Computational Biology

Nobel Data Intelligence

Physics-informed deep learning for protein stability and enzyme kinetics. The project treats proteins as vibrating machines: it extracts the Vibrational Density of States (VDOS) from Normal Mode Analysis and uses it as one input to a novel tri-modal prediction architecture.

PyTorch Geometric, ProDy, BioPython, RDKit, Transformers
Molecule → Transform → Graph
Example: ethanol (SMILES: CCO), with its molecular structure represented as atoms and bonds.

Fig. 1 — Molecular structure transformed into a graph representation for GNN processing.
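The transformation in Fig. 1 can be illustrated with a minimal sketch. It assumes the heavy-atom graph of ethanol is written out by hand rather than parsed; in a real pipeline a toolkit such as RDKit would read the SMILES string. The helper names `to_edge_index` and `one_hot`, and the three-element vocabulary, are hypothetical and chosen only for illustration.

```python
# Heavy-atom graph for ethanol (SMILES "CCO"), written out by hand.
# Assumption: a real pipeline would parse the SMILES with a chemistry toolkit.
atoms = ["C", "C", "O"]      # node labels, list index = atom id
bonds = [(0, 1), (1, 2)]     # undirected bonds: C-C and C-O


def to_edge_index(bonds):
    """Return a 2 x E edge list containing both directions of every bond."""
    src, dst = [], []
    for i, j in bonds:
        src += [i, j]
        dst += [j, i]
    return [src, dst]


def one_hot(symbol, vocabulary=("C", "N", "O")):
    """Hypothetical minimal node feature: one-hot element type."""
    return [1.0 if symbol == v else 0.0 for v in vocabulary]


edge_index = to_edge_index(bonds)
node_features = [one_hot(a) for a in atoms]
```

Storing each bond in both directions matches the directed edge-list layout that graph neural network libraries commonly expect for undirected graphs.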

Architecture

Tri-Modal Fusion

Three encoders capture sequence, structure+dynamics, and chemical features. Gated attention fusion combines them for prediction.
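The page does not specify the exact form of the gated attention fusion, so the following is only a sketch of one common variant: each modality is projected to a shared width, a learned gate vector scores it, and a softmax over the three scores weights the sum. The function `gated_fusion`, the shared width of 8, the structure embedding size of 16, and the random weights are all assumptions; only the 1024 and 512 dimensions come from the encoder descriptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()


def gated_fusion(embeddings, projections, gate_vectors):
    """Fuse per-modality embeddings into one vector.

    embeddings:   list of 1-D arrays, one per modality (sizes may differ)
    projections:  list of (d_model, d_in) matrices mapping each to d_model
    gate_vectors: list of (d_model,) vectors, one score per modality
    """
    projected = [np.tanh(P @ e) for P, e in zip(projections, embeddings)]
    scores = np.array([g @ p for g, p in zip(gate_vectors, projected)])
    weights = softmax(scores)  # one weight per modality, sums to 1
    fused = sum(w * p for w, p in zip(weights, projected))
    return fused, weights


d_model = 8
dims = [1024, 16, 512]  # sequence, structure (hypothetical size), chemistry
embeddings = [rng.normal(size=d) for d in dims]
projections = [rng.normal(size=(d_model, d)) / np.sqrt(d) for d in dims]
gate_vectors = [rng.normal(size=d_model) for _ in dims]

fused, weights = gated_fusion(embeddings, projections, gate_vectors)
```

The returned `weights` show how much each modality contributed to a given prediction, which is one reason a gated design may be preferred over plain concatenation.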

ProtT5

Sequence Encoder

Protein language model (Rostlab/prot_t5_xl_uniref50) generates 1024-dim embeddings from amino acid sequences.
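Before a sequence reaches the encoder it needs to be formatted for the tokenizer. The sketch below assumes the convention described on the ProtT5 model card: uppercase residues separated by spaces, with the rare amino acids U, Z, O and B mapped to X. The function name `prepare_prott5_input` is hypothetical.

```python
import re


def prepare_prott5_input(sequence):
    """Format a raw amino acid string as space-separated residue tokens."""
    cleaned = re.sub(r"[UZOB]", "X", sequence.upper())  # rare residues -> X
    return " ".join(cleaned)                            # one token per residue


example = prepare_prott5_input("MKTUAYB")
```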

GATv2 + VDOS

Structure & Dynamics

Graph Attention Network encodes protein topology. Vibrational Density of States from Normal Mode Analysis captures dynamics invisible to static models.
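To make the VDOS idea concrete, here is a rough sketch that uses a Gaussian Network Model as a stand-in for the project's Normal Mode Analysis. It is not the project's code: the page lists ProDy in its stack, whereas this version uses only NumPy, and the 7 Å cutoff, the bin count, and the idealized helix coordinates are assumptions. Mode frequencies are taken as the square roots of the nonzero eigenvalues of the connectivity matrix, in arbitrary units.

```python
import numpy as np


def kirchhoff(coords, cutoff=7.0):
    """Gaussian Network Model connectivity matrix from C-alpha coordinates."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)
    np.fill_diagonal(gamma, contact.sum(axis=1))  # degree on the diagonal
    return gamma


def vdos(coords, n_bins=10, cutoff=7.0):
    """Histogram of mode frequencies (arbitrary units), normalized to sum to 1."""
    eigenvalues = np.linalg.eigvalsh(kirchhoff(coords, cutoff))
    nonzero = eigenvalues[eigenvalues > 1e-8]  # drop the trivial zero mode
    frequencies = np.sqrt(nonzero)             # omega scales as sqrt(eigenvalue)
    counts, edges = np.histogram(frequencies, bins=n_bins)
    return counts / counts.sum(), edges


# Hypothetical toy input: 20 points on an ideal helix, about 3.8 Å between neighbors.
t = np.arange(20)
coords = np.stack(
    [2.3 * np.cos(1.745 * t), 2.3 * np.sin(1.745 * t), 1.5 * t], axis=1
)
density, bin_edges = vdos(coords)
```

The resulting fixed-length `density` vector is the kind of dynamics feature that could sit alongside the graph embedding, since it has the same size regardless of protein length.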

ChemBERTa + DRFP

Chemical Encoder

Chemical language model (seyonec/ChemBERTa-zinc-base-v1) with Differential Reaction Fingerprints encodes substrates into 512-dim embeddings.

GATv2

Attention-Based Message Passing

Each node aggregates information from its neighbors using learned attention weights. After k layers, a node's embedding reflects its k-hop neighborhood, so it captures the local chemical context that influences protein behavior.

Fig. 2 — GATv2 message passing with attention-weighted neighbor aggregation.

GATv2 message passing, shown for node 0 aggregating from its neighbors:

$$h_i = \sigma\Big(\sum_{j \in N(i)} \alpha_{ij} \, W h_j\Big)$$
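The update rule above can be sketched for a single attention head in NumPy. The attention scores follow the GATv2 formulation, where the nonlinearity is applied before the attention vector. Several details are assumptions made for illustration: the weight matrix is split into a neighbor part `W_src` and a receiver part `W_dst`, tanh stands in for σ, and the five-node star graph and random weights are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gatv2_layer(h, neighbors, W_src, W_dst, a):
    """One single-head GATv2-style update.

    h:         (N, d_in) node features
    neighbors: dict mapping node i -> list of neighbor indices N(i)
    W_src:     (d_out, d_in) weight applied to neighbor features h_j
    W_dst:     (d_out, d_in) weight applied to the receiving node h_i
    a:         (d_out,) attention vector
    """
    out = np.zeros((h.shape[0], W_src.shape[0]))
    for i, nbrs in neighbors.items():
        messages = h[nbrs] @ W_src.T                      # W h_j per neighbor
        scores = leaky_relu(messages + W_dst @ h[i]) @ a  # e_ij for j in N(i)
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                              # softmax over N(i)
        out[i] = np.tanh(alpha @ messages)                # sigma(sum alpha_ij W h_j)
    return out


rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
# Hypothetical star graph: node 0 is connected to nodes 1 through 4.
neighbors = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
W_src, W_dst = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
a = rng.normal(size=3)
h_next = gatv2_layer(h, neighbors, W_src, W_dst, a)
```

A node with a single neighbor receives attention weight 1 for it, so its update reduces to the transformed neighbor feature passed through the activation.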

Framework

What Makes It Unique

Physics-informed approach: VDOS from Normal Mode Analysis captures protein vibrations, not just static structure
MM-Drop training: multimodal dropout is intended to keep predictions robust when one modality is missing
Two-part architecture: the QDD framework (Phases 1-2) plus VibroPredict enzyme kinetics (Phase 3)
139 unit tests across the core modules and the VibroPredict subsystem
8 Jupyter notebooks, from quickstart to ablation studies
CLI tools + batch inference pipeline for production use
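The MM-Drop item in the list above is not described in detail, so the following is only a guess at the general idea of multimodal dropout: during training, whole modality embeddings are randomly zeroed so that the model learns to cope when one is absent. The function `mm_drop`, the drop probability, and the rule that at least one modality is always kept are assumptions.

```python
import random


def mm_drop(embeddings, p_drop=0.3, training=True, rng=random):
    """Randomly zero whole modalities, always keeping at least one.

    embeddings: dict of modality name -> list of floats
    """
    if not training:
        return dict(embeddings)
    names = list(embeddings)
    kept = [n for n in names if rng.random() >= p_drop]
    if not kept:  # never drop every modality at once
        kept = [rng.choice(names)]
    return {
        n: (v if n in kept else [0.0] * len(v)) for n, v in embeddings.items()
    }


batch = {"sequence": [0.2, 0.7], "structure": [1.1, -0.4], "chemistry": [0.5, 0.5]}
dropped = mm_drop(batch, p_drop=0.5, rng=random.Random(0))
unchanged = mm_drop(batch, training=False)
```

At inference time the function returns its input untouched, so a genuinely missing modality can be passed in as zeros and will look like a case the model has already seen during training.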
jayhemnani9910/nobel-dataintelligence
139 Tests · 8 Notebooks · 3 Phases