Computational Biology · Researcher

Nobel Data Intelligence.

Physics-informed deep learning for protein stability and enzyme kinetics. A tri-modal architecture fuses ProtT5 sequence embeddings, a vibrational density-of-states (VDOS) spectrum computed by normal mode analysis, and ChemBERTa substrate-chemistry embeddings through learned gated attention.

Role: Researcher
When: 2024
Stack: Python, PyTorch, ProDy, Transformers
Scale: 3-branch tri-modal fusion


ProtT5 · VDOS · ChemBERTa

3-branch tri-modal fusion

1000-point VDOS spectrum

Unit-tested across both subsystems

2 phases: Quantum Data Decoder (QDD), then VibroPredict

The problem

Most protein-property models treat a protein as a static sequence or a single frozen structure, ignoring that a protein is a moving object whose vibrations carry information about how it behaves. The goal was a prediction framework for protein stability and enzyme kinetics that adds a physics-based dynamics signal to sequence and chemistry, and that keeps working when one of those inputs is missing.

What it does

A tri-modal architecture that encodes three views of a protein: ProtT5 for sequence, a 1D SpectralCNN over a vibrational density-of-states (VDOS) spectrum for dynamics, and ChemBERTa plus differential reaction fingerprints for substrate chemistry.
A VDOS spectrum is computed per structure from normal mode analysis (anisotropic or Gaussian network models, ANM/GNM, via ProDy), turning the protein's vibrations into a 1000-point spectral feature that sequence and chemistry models never see.
A learned gating network emits softmax attention weights over the three branches, so the model decides how much to trust each modality per prediction instead of concatenating them blindly.
MM-Drop randomly masks the spectral branch during training, so the model degrades gracefully at inference when no structure (and therefore no VDOS) is available.
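The VDOS step described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the project's code. Instead of calling ProDy (whose ANM class provides `buildHessian` and `calcModes`), it hand-builds an anisotropic network model Hessian from Cα coordinates, joining residues closer than a cutoff (15 Å here) with springs of stiffness `gamma`. It then takes the eigenvalues, discards the six zero-frequency rigid-body modes, and treats the square root of each eigenvalue as a frequency (unit masses, arbitrary units). Finally, it Gaussian-broadens the modes onto a 1000-point grid normalized to unit area. The function names, cutoff, and broadening width are hypothetical choices.

```python
import numpy as np

# Hypothetical sketch: names, cutoff and broadening width are assumptions,
# not the project's implementation (which uses ProDy).


def anm_hessian(coords: np.ndarray, cutoff: float = 15.0, gamma: float = 1.0) -> np.ndarray:
    """Anisotropic Network Model Hessian for (N, 3) C-alpha coordinates.

    Every residue pair closer than `cutoff` (Angstrom) is joined by a
    harmonic spring of stiffness `gamma`.
    """
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff**2:
                continue
            block = -gamma * np.outer(d, d) / dist2  # off-diagonal 3x3 super-element
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements equal minus the sum of their row's off-diagonals
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


def vdos_spectrum(coords: np.ndarray, n_points: int = 1000, cutoff: float = 15.0):
    """Gaussian-broadened vibrational density of states on an n_points grid."""
    eigvals = np.linalg.eigvalsh(anm_hessian(coords, cutoff))  # ascending order
    eigvals = eigvals[6:]  # drop the 6 zero modes (3 translations + 3 rotations)
    freqs = np.sqrt(np.clip(eigvals, 0.0, None))  # omega ~ sqrt(lambda), unit masses
    grid = np.linspace(0.0, 1.1 * freqs.max(), n_points)
    dx = grid[1] - grid[0]
    sigma = 5.0 * dx  # broadening width of a few grid bins (arbitrary choice)
    # One Gaussian per normal mode, summed over modes
    spectrum = np.exp(-0.5 * ((grid[:, None] - freqs[None, :]) / sigma) ** 2).sum(axis=1)
    spectrum /= spectrum.sum() * dx  # normalize to unit area (Riemann sum)
    return grid, spectrum


# Toy input: an idealized 30-residue alpha-helix C-alpha trace
k = np.arange(30)
theta = np.deg2rad(100.0) * k
helix = np.stack([2.3 * np.cos(theta), 2.3 * np.sin(theta), 1.5 * k], axis=1)
grid, vdos = vdos_spectrum(helix)
print(grid.shape, vdos.shape)  # (1000,) (1000,)
```

The toy input is an idealized α-helix (radius 2.3 Å, 1.5 Å rise and 100° per residue). A real pipeline would read Cα coordinates from a PDB file instead.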
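The SpectralCNN, softmax gating, and MM-Drop steps described above can be illustrated with a small PyTorch module. This is a hypothetical sketch, not the project's implementation. The class names, the CNN layout, the masking probability `p_drop_spec`, and the embedding widths are all assumptions: 1024 for a mean-pooled ProtT5 embedding, 768 for a ChemBERTa embedding, and 128 for the SpectralCNN output. MM-Drop is modeled here as per-sample Bernoulli masking of the spectral branch in training mode. A masked sample has its spectral embedding zeroed and its gate logit set to -inf, so the softmax redistributes all weight over sequence and chemistry. The same mask path handles inference when no structure is available.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: class names, widths and layout are assumptions,
# not the project's implementation.


class SpectralCNN(nn.Module):
    """1D CNN that maps a (B, n_points) VDOS spectrum to a (B, d_out) embedding."""

    def __init__(self, d_out: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, d_out),
        )

    def forward(self, vdos: torch.Tensor) -> torch.Tensor:
        return self.net(vdos.unsqueeze(1))  # add a channel axis: (B, 1, n_points)


class GatedTriModalFusion(nn.Module):
    """Softmax-gated fusion of sequence, spectral and chemistry embeddings."""

    def __init__(self, d_seq=1024, d_chem=768, d_spec=128, d_model=256, p_drop_spec=0.3):
        super().__init__()
        self.spectral = SpectralCNN(d_spec)
        self.proj_seq = nn.Linear(d_seq, d_model)
        self.proj_spec = nn.Linear(d_spec, d_model)
        self.proj_chem = nn.Linear(d_chem, d_model)
        self.gate = nn.Linear(3 * d_model, 3)  # one logit per branch
        self.head = nn.Linear(d_model, 1)      # scalar target, e.g. stability or log k_cat
        self.p_drop_spec = p_drop_spec

    def forward(self, seq, vdos, chem, spec_available=None):
        batch = seq.shape[0]
        if spec_available is None:
            spec_available = torch.ones(batch, dtype=torch.bool, device=seq.device)
        if self.training:
            # MM-Drop as sketched here: hide the spectral branch for a random subset
            keep = torch.rand(batch, device=seq.device) >= self.p_drop_spec
            spec_available = spec_available & keep
        ones = torch.ones_like(spec_available)
        mask = torch.stack([ones, spec_available, ones], dim=1)  # (B, 3) branch presence

        h = torch.stack([
            self.proj_seq(seq),
            self.proj_spec(self.spectral(vdos)),
            self.proj_chem(chem),
        ], dim=1)  # (B, 3, d_model)
        h = h * mask.unsqueeze(-1)  # zero out masked branches

        # Masked branches get -inf logits, hence exactly zero softmax weight
        logits = self.gate(h.flatten(1)).masked_fill(~mask, float("-inf"))
        weights = torch.softmax(logits, dim=-1)  # per-sample trust in each branch
        fused = (weights.unsqueeze(-1) * h).sum(dim=1)
        return self.head(fused).squeeze(-1), weights


torch.manual_seed(0)
model = GatedTriModalFusion().eval()
seq, chem = torch.randn(4, 1024), torch.randn(4, 768)
vdos = torch.rand(4, 1000)  # placeholder spectra; ignored where has_structure is False
has_structure = torch.tensor([True, False, True, False])
pred, weights = model(seq, vdos, chem, spec_available=has_structure)
print(pred.shape, weights.shape)  # torch.Size([4]) torch.Size([4, 3])
```

Masking the gate logit, rather than only zeroing the embedding, keeps the weights summing to one over the branches that are actually present. A sample without a structure can pass a zero placeholder spectrum together with `spec_available=False`.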

Impact

Two-phase codebase: a Quantum Data Decoder core for general molecular property prediction, and VibroPredict on top of it for enzyme catalytic-turnover (k_cat) prediction.
Unit-tested across both subsystems, with eight Jupyter notebooks from quickstart through ablation and SOTA comparison, plus Colab training notebooks.
CLI entry points, a batch inference pipeline, and an interactive deployed demo that renders the VDOS spectrum live.