HexaGene | Structural Physics for Biological Systems

The Science

Physics of the Genetic Code

HexaGene applies established thermodynamic and biophysical principles to quantify structural stress in DNA sequences.

DNA is not just information — it's a physical polymer with measurable mechanical properties. Base-pair stacking energies, hydrogen bond strengths, and local flexibility are governed by well-understood thermodynamics. HexaGene formalizes these principles into a deterministic scoring framework.

Base-Pair Thermodynamics → Binary Encoding

G ≡ C: 3 hydrogen bonds, ΔG ≈ -21 kJ/mol → 1 (Rigid)

A = T: 2 hydrogen bonds, ΔG ≈ -14 kJ/mol → 0 (Flexible)
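
As an illustration of the encoding above, a minimal Python sketch might map each base to its rigidity bit. The name `encode_rigidity` and the error handling are assumptions made for this example, not part of the HexaGene engine.

```python
# Hypothetical lookup: G/C pairs (3 hydrogen bonds) -> "1", A/T pairs (2) -> "0".
RIGIDITY_BIT = {"G": "1", "C": "1", "A": "0", "T": "0"}

def encode_rigidity(seq: str) -> str:
    """Return one bit per base: G/C -> 1 (rigid), A/T -> 0 (flexible)."""
    bits = []
    for base in seq.upper():
        if base not in RIGIDITY_BIT:
            raise ValueError(f"unsupported base: {base!r}")
        bits.append(RIGIDITY_BIT[base])
    return "".join(bits)

print(encode_rigidity("GATTACA"))  # prints 1000010
```

Ambiguity codes such as N are rejected here rather than guessed; a production pipeline would need its own policy for them.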

Nearest-Neighbor Thermodynamics

The stability of DNA depends on stacking interactions between adjacent base pairs — the same principle used in RNA folding models and PCR primer design.

ΔG = Σ ΔG(stack) + ΔG(init)
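
The sum above can be sketched in a few lines of Python. The stacking and initiation values below approximate commonly cited unified nearest-neighbor DNA parameters at 37 °C (in kcal/mol; 1 kcal = 4.184 kJ) and are included only as an illustration. They are an assumption of this example, not HexaGene's constants, and should be checked against a primary table before use.

```python
# Illustrative stacking free energies (kcal/mol), keyed by the 5'->3'
# dinucleotide on one strand. Assumed values, not HexaGene's own.
STACK_DG = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}
# Assumed initiation penalty per duplex end, keyed by the terminal base.
INIT_DG = {"G": 0.98, "C": 0.98, "A": 1.03, "T": 1.03}

def duplex_dg(seq: str) -> float:
    """Compute dG = sum(dG_stack) + dG_init for a strand read 5'->3'."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two bases")
    stack = sum(STACK_DG[seq[i:i + 2]] for i in range(len(seq) - 1))
    init = INIT_DG[seq[0]] + INIT_DG[seq[-1]]
    return stack + init

# A GC-rich duplex comes out more negative (more stable) than an AT-rich one.
print(round(duplex_dg("GCGC"), 2), round(duplex_dg("ATAT"), 2))
```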

Sequence Context Effects

A mutation's impact depends on its neighbors. CpG dinucleotides, trinucleotide contexts, and codon position all modulate local physical stability.

6bp sliding window analysis
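
A 6 bp sliding window can be sketched as follows. The page does not state the step size or how sequence ends are handled, so a step of one base and no padding are assumed; the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

def sliding_windows(seq: str, size: int = 6) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for every full window of `size` bases."""
    for start in range(len(seq) - size + 1):
        yield start, seq[start:start + size]

def windows_covering(seq: str, pos: int, size: int = 6) -> List[Tuple[int, str]]:
    """Return the windows that contain the 0-based position `pos`."""
    return [(s, w) for s, w in sliding_windows(seq, size) if s <= pos < s + size]

for start, window in sliding_windows("ACGTACGT"):
    print(start, window)
```

A variant in the middle of a sequence falls inside up to six overlapping windows, so each mutation can be scored in several local contexts.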

Structural Stress Quantification

Mutations that disrupt local stiffness, introduce torsional strain, or break symmetric patterns create measurable "structural dissonance."

SDS = f(stiffness, lability, harmony)

Seven Biophysical Features

Nucleotide Transition Severity

Transversions cost more energy than transitions

T = 9.51
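
The transition/transversion distinction is standard: a transition swaps purine for purine (A↔G) or pyrimidine for pyrimidine (C↔T), while a transversion crosses between the two classes. A small sketch, with a hypothetical function name and no claim about how HexaGene weights each class:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base change as transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        return "identity"
    same_ring_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_ring_class else "transversion"

print(substitution_class("A", "G"))  # prints transition
print(substitution_class("A", "C"))  # prints transversion
```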

GC-Content Perturbation

Local rigidity changes from base composition shifts

T = 7.70
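
One simple way to express a composition shift is the change in GC fraction between the reference and mutated window. This is a sketch under that assumption; the source does not give the formula HexaGene uses, and both function names are hypothetical.

```python
def gc_fraction(window: str) -> float:
    """Fraction of bases in the window that are G or C."""
    if not window:
        raise ValueError("window must not be empty")
    window = window.upper()
    return sum(base in "GC" for base in window) / len(window)

def gc_perturbation(ref_window: str, alt_window: str) -> float:
    """Signed change in GC fraction caused by the mutation (alt minus ref)."""
    return gc_fraction(alt_window) - gc_fraction(ref_window)

# An A>G change in an AT-only 6 bp window raises the GC fraction by 1/6.
print(gc_perturbation("ATATAT", "ATGTAT"))
```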

Harmonic Balance

Symmetry of the central 4-base "nuclear core"

T = 7.36

Local Sequence Stiffness

Mechanical resistance to conformational change

Validated

Codon Position Impact

Wobble vs. critical position effects

Validated
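
Codon position follows directly from a base's offset within the coding sequence. The sketch below assumes a 0-based index counted from the first base of the start codon and treats the third position as the wobble position; how HexaGene weights each position is not stated in the source.

```python
def codon_position(cds_index: int) -> int:
    """Return 1, 2 or 3 for a 0-based index into the coding sequence."""
    if cds_index < 0:
        raise ValueError("cds_index must be non-negative")
    return cds_index % 3 + 1

def is_wobble(cds_index: int) -> bool:
    """True when the base sits at the third (wobble) codon position."""
    return codon_position(cds_index) == 3

print([codon_position(i) for i in range(6)])  # prints [1, 2, 3, 1, 2, 3]
```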

Compositional Complexity

Information density of local context

Validated
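
The source does not define its complexity measure. As a stand-in, Shannon entropy of the base composition is a common way to quantify the information density of a short window; the sketch below uses it under that assumption.

```python
import math
from collections import Counter

def shannon_entropy(window: str) -> float:
    """Shannon entropy in bits of the base composition of a window."""
    if not window:
        raise ValueError("window must not be empty")
    counts = Counter(window.upper())
    n = len(window)
    # Sum of p * log2(1/p) over the bases that occur in the window.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(shannon_entropy("AAAAAA"))  # prints 0.0
print(shannon_entropy("ACGT"))    # prints 2.0
```

For DNA the value ranges from 0 bits (a single repeated base) to 2 bits (all four bases equally frequent).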

Neighbor-Codon Transitions

Elemental conflicts between adjacent codons

T = 2.33

📐 Thermodynamic Basis

🧬 Context-Aware

✓ ACMG-Compatible

🔄 Null-Model Validated

Validation Evidence

Proven on Real Biological Data

Every module is independently validated on published datasets. No simulations. No synthetic benchmarks.

Primary Validation

ClinVar Pathogenicity Study

38,000 genetic variants from NCBI ClinVar. The engine was blind to clinical labels. Physics-based risk scores differed between benign and pathogenic variants with very high statistical significance.

T-Statistic: 16.39

P-Value: 10⁻⁶⁰

Variants Tested: 38,000

Training: None (zero-shot)

Orthogonality Proof • VUS Rescue

REVEL Grey Zone Resolution Study

2,000 ClinVar missense variants benchmarked against the REVEL ensemble predictor. HexaGene maintains discrimination in the range where conservation-based tools are ambiguous, and its low correlation with REVEL (r = 0.27) indicates a largely independent, complementary signal.

Overall Separation: T = 13.20

Grey Zone (0.4-0.6): T = 3.62

VUS Rescue: AUC = 0.67

REVEL Correlation: r = 0.27

Grey Zone Variants: 145

Grey Zone P-Value: 5.1×10⁻⁴

Fragile Genes (High Volatility): T = 10.30 (TTN, BRCA2, cardiac panels)

Robust Genes (High Stiffness): T = 7.33

Separation is about 40% stronger in fragile genes (10.30 / 7.33 ≈ 1.41).

Population Health

NHANES Reverse Imputation

7,939 participants from CDC national survey. Structural constants inferred from routine biomarkers without genetic data.

Separation: 4.62σ

Accuracy: 98.4%

Manufacturing

β-Lactam Enzyme Stability

IPNS enzymes across 15 organisms. Structural conflict rate shows a strong negative correlation with expression stability: the higher the conflict rate, the lower the stability.

Correlation: ρ = -0.92

P-Value: 10⁻⁷
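
The symbol ρ usually denotes Spearman's rank correlation, although the page does not say so explicitly. Assuming that reading, the statistic can be sketched in plain Python as the Pearson correlation of the ranks, with tied values given their average rank. The input numbers are made up for demonstration and are not study data.

```python
import math
from typing import List, Sequence

def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Made-up example: conflict rate rises while stability falls.
conflict_rate = [0.10, 0.20, 0.30, 0.40]
stability = [10.0, 9.0, 7.0, 1.0]
print(spearman_rho(conflict_rate, stability))
```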

Frequently Asked Questions

Understanding HexaGene

CORE What scientific standards does HexaGene align with?

HexaGene builds on established principles already standard across genomics and biophysics:

1. Nearest-Neighbor Thermodynamics: The same energy models used in RNA folding (Vienna, Mfold) and PCR primer design.

2. Sequence-Context Modeling: Consistent with CpG mutability, trinucleotide signatures, and codon context effects in translation.

3. ACMG Orthogonal Evidence: Clinical variant interpretation explicitly encourages independent evidence lines — HexaGene qualifies as it doesn't reuse conservation features.

4. Null-Model Validation: Features validated against sequence-shuffled controls and GC-matched nulls, matching gold-standard biophysics methodology.

HexaGene is not new biology — it formalizes well-understood physical constraints into a deterministic framework.
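
A sequence-shuffled control of the kind mentioned in point 4 can be sketched as below. Shuffling individual bases keeps the base composition, and therefore the GC content, identical while destroying the original order. The scoring function is a placeholder passed in by the caller, and all names here are hypothetical rather than taken from the HexaGene validation scripts.

```python
import random
from typing import Callable, List

def shuffled_null(seq: str, n_shuffles: int,
                  score: Callable[[str], float], seed: int = 0) -> List[float]:
    """Score `n_shuffles` base-shuffled copies of `seq` (composition preserved)."""
    rng = random.Random(seed)  # fixed seed keeps the control reproducible
    bases = list(seq)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        null_scores.append(score("".join(bases)))
    return null_scores

def empirical_p(observed: float, null_scores: List[float]) -> float:
    """One-sided empirical p-value with the usual +1 correction."""
    hits = sum(s >= observed for s in null_scores)
    return (hits + 1) / (len(null_scores) + 1)

def cpg_count(seq: str) -> int:
    """Toy score used only for this example: number of CG dinucleotides."""
    return seq.count("CG")

sequence = "ACGCGCGTATATACGCG"
null = shuffled_null(sequence, 200, cpg_count)
print(empirical_p(cpg_count(sequence), null))
```

A feature that scores the real sequence no higher than its shuffled copies is reflecting base composition alone, not sequence order.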

CORE What makes HexaGene different from AI/ML tools?

HexaGene is a deterministic physics engine, not a machine learning model. It calculates structural stress, friction, and decay from first principles using equations — not patterns learned from data. This means it can assess novel sequences, rare variants, and never-seen-before mutations because it computes physics, not history. There is no training, no fitting, no black box.

CORE What does "zero-shot capability" mean?

Zero-shot means HexaGene can assess sequences it has never encountered before. Traditional tools require similar examples in their training data. HexaGene calculates the physical stress on any DNA structure from scratch — including ultra-rare variants (AF < 0.0001) and completely novel synthetic sequences. This was validated in the ClinVar study where the engine successfully predicted pathogenicity for variants with no population history.

CORE What are k, μ, λ, and SRI?

These are the four structural constants that HexaGene calculates:

k (Structural Resilience): How quickly the system returns to equilibrium after stress.
μ (Metabolic Friction): Turbulence, viscosity, and inflammatory drag in the system.
λ (Structural Decay): Accelerated aging and accumulated material fatigue.
SRI (Structural Risk Index): Composite measure of global structural tension.

These are analogous to material constants in engineering — they describe how biological structures respond to stress.

CORE Does HexaGene replace existing tools?

No. HexaGene complements existing tools by adding a physics-based layer. Conservation scores (SIFT, PolyPhen) tell you what evolution has filtered. Codon optimization (CAI) tells you about translation efficiency. HexaGene tells you about structural stress — an orthogonal dimension. The best results come from combining HexaGene with existing pipelines, not replacing them.

CORE Is the methodology published?

Validation results and benchmark datasets are publicly available on GitHub and Zenodo. A technical preprint describing the REVEL benchmark study is available on bioRxiv. The core mathematical framework is protected by US Provisional Patent #63/918,749. We believe in open science for validation while protecting the underlying intellectual property.

CORE How reproducible are the results?

Fully reproducible. The pipeline is deterministic — same input always produces same output. There is no training, no random initialization, no stochastic components. Any institution with access to the same public datasets (ClinVar, NHANES) can reproduce our validation results exactly. Validation scripts are available on GitHub.

MFG How does HexaGene improve expression yields?

HexaGene identifies structural stress patterns in DNA sequences that lead to expression failure — independent of codon adaptation. In our β-lactam enzyme study, structural conflict rate predicted stability with ρ = -0.92 (p < 10⁻⁶). This means we can flag problematic sequences before wet lab, reducing failed batches and optimization cycles.

MFG Can HexaGene predict aggregation?

Yes. In our GLP-1 peptide validation, structural conflict rate at junction regions correlated with aggregation propensity (ρ = 0.67, p = 0.002). Constructs that failed due to aggregation had 14% higher junction conflict rates. This allows early identification of aggregation-prone sequences before formulation development.

DX How accurate is HexaGene for variant classification?

In the ClinVar validation study (38,000 variants), HexaGene achieved a T-statistic of 16.39 (p = 3.43 × 10⁻⁶⁰). For context, in large samples a |T| of about 2 corresponds to the conventional p = 0.05 threshold; 16.39 indicates a highly significant difference between the benign and pathogenic groups. The engine was completely blind to clinical labels during assessment.
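
For readers who want to see how such a T-statistic is formed, here is a minimal sketch of a two-sample test. The source does not say which variant of the t-test was used; Welch's unequal-variance form is assumed, and the score lists are made-up placeholders rather than ClinVar results.

```python
import math
from statistics import mean, variance
from typing import Sequence

def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t: difference in means divided by its standard error."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical structural-stress scores for two labelled groups.
pathogenic_scores = [0.62, 0.71, 0.58, 0.80, 0.66]
benign_scores = [0.41, 0.38, 0.52, 0.45, 0.36]
print(welch_t(pathogenic_scores, benign_scores))
```

Because the standard error shrinks as the groups grow, T reflects both the size of the difference and the sample size; with tens of thousands of variants, even a modest mean difference yields a large T.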

DX Can HexaGene explain WHY a variant is pathogenic?

Yes — this is a key differentiator. While ML tools output "probably damaging" without explanation, HexaGene provides mechanistic interpretation: "High friction mutation in low-stiffness region causes structural failure." The physics constants (stiffness, friction, decay) explain what makes a mutation break the structure. This supports clinical reasoning and suggests intervention targets.

DX Is HexaGene a diagnostic test?

No. HexaGene does not diagnose disease or assign clinical labels. It provides structural state metrics (stiffness, friction, decay, risk scores) that support clinical reasoning. Think of it as a "structural health lens" — a complementary layer of physics-based evidence to help interpret variants, not replace clinical judgment.

DX How does HexaGene perform in REVEL grey zones?

When REVEL scores fall between 0.4 and 0.6 (the "grey zone" affecting 15-25% of variants), HexaGene maintains discrimination with T = 3.62 (p = 5.1×10⁻⁴) and AUC = 0.67 across 145 grey-zone variants. The low correlation with REVEL (r = 0.27) indicates that HexaGene captures a largely distinct signal: structural physics rather than evolutionary conservation. This makes it particularly valuable for VUS rescue.
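
The two statistics quoted here can be computed with a few lines of plain Python. AUC is taken as the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one (ties count half), and r as the Pearson correlation between two score lists. The function names and example numbers are assumptions for illustration, not benchmark data.

```python
import math
from typing import Sequence

def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up scores: three of the four pathogenic/benign pairs are ordered correctly.
print(auc([0.6, 0.4], [0.5, 0.3]))  # prints 0.75
```

An AUC of 0.5 means no discrimination and 1.0 means perfect ranking, so 0.67 sits between chance and strong separation.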

LONG How does reverse imputation work?

HexaGene can infer structural physics from biomarker patterns without requiring genetic data. The same equations that map DNA → structure can be inverted to map biomarkers → structure. In the NHANES validation, we computed k, μ, λ from routine labs (HbA1c, triglycerides, albumin, CRP, etc.) and achieved 4.62σ separation between healthy and metabolically stressed cohorts.

LONG What biomarkers are required?

The validated minimal panel includes: HbA1c, fasting glucose, triglycerides or LDL, albumin, creatinine, and hs-CRP. Optional additions include WBC and lipid panel. These are standard markers available in any routine blood panel — no specialized tests required. The engine can also integrate wearable data (HRV, activity) in hybrid mode.

LONG How early can HexaGene detect risk?

The NHANES validation showed that structural decay (λ) and friction (μ) rise before biomarkers cross diagnostic thresholds. Early estimates suggest 6-18 months of lead time for metabolic syndrome detection. This is because HexaGene measures system-wide structural stress, not just single pathway markers. Standard risk tools typically achieve 1.5-2σ separation; HexaGene achieves 4.62σ.

Biology is Structural Physics.
Now We Can Calculate It.
