PhenoGenX Documentation

Comprehensive technical documentation, implementation guides, and scientific specifications for the PhenoGenX HIV drug resistance interpretation platform.

Technical Specifications
System Architecture
  • Backend: Flask + Python 3.10+
  • Frontend: Bootstrap 5 + JavaScript
  • ML Framework: Scikit-learn, LightGBM, XGBoost
  • Database: SQLite/PostgreSQL
API Documentation
  • RESTful API endpoints
  • JSON request/response formats
  • Authentication & rate limiting
Database & Schemas
Database Schema
  • Sequence metadata storage
  • Mutation profile tables
  • Prediction results schema
  • User & session management
Data Models
  • ER Diagrams
  • Primary/Foreign key relationships
  • Data migration scripts
SOPs & Protocols
Standard Operating Procedures
  • Sequence quality control SOP
  • ML model validation protocol
  • Data security & privacy SOP
  • Report generation workflow
Implementation Guides
  • Deployment checklist
  • Maintenance procedures
  • Troubleshooting guide
Technical Implementation Details

HXB2 Alignment Pipeline

Multi-stage alignment process for accurate mutation calling:

  1. MAFFT Alignment: Initial sequence alignment using MAFFT v7.525
  2. MUSCLE Refinement: Consensus refinement using MUSCLE v5.1
  3. Biopython Fallback: Custom Python alignment for edge cases
  4. HXB2 Coordinate Mapping: Standardized position mapping
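
The coordinate-mapping step can be illustrated with a minimal sketch. It assumes the alignment stage has already produced two equal-length gapped strings (reference and query). The function name `map_to_reference` and the tuple layout are hypothetical and are not taken from the PhenoGenX codebase.

```python
def map_to_reference(aligned_ref: str, aligned_query: str) -> list[tuple[int, str, str]]:
    """Return (ref_position, ref_char, query_char) for every alignment column.

    ref_position is 1-based and advances only on non-gap reference columns,
    so a query insertion keeps the position of the preceding reference residue.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    position = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            position += 1  # only real reference residues consume a coordinate
        columns.append((position, ref_char, query_char))
    return columns
```

Numbering against the reference rather than the query is what keeps positions comparable across sequences of different lengths.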
Mutation Calling Logic
  • Amino acid substitution detection
  • Insertion/deletion handling
  • Ambiguous base resolution
  • Quality score integration
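
As a rough illustration of substitution and insertion/deletion calling, the sketch below compares an aligned amino-acid reference with an aligned query. The function name, the `del`/`ins` notation, and the choice to skip `X` as an unresolved residue are assumptions made for this example. Quality-score integration is not shown.

```python
def call_mutations(aligned_ref: str, aligned_query: str) -> list[str]:
    """Call mutations from two equal-length, gapped amino-acid strings."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    calls = []
    position = 0    # 1-based reference position of the last reference residue seen
    inserted = ""   # query residues collected while the reference shows a gap
    for ref_aa, query_aa in zip(aligned_ref, aligned_query):
        if ref_aa == "-":
            if query_aa != "-":
                inserted += query_aa
            continue
        if inserted:
            # Report the insertion after the preceding reference position.
            calls.append(f"{position}ins{inserted}")
            inserted = ""
        position += 1
        if query_aa == "-":
            calls.append(f"{ref_aa}{position}del")
        elif query_aa == "X":
            continue  # assumption: unresolved residues are not called
        elif query_aa != ref_aa:
            calls.append(f"{ref_aa}{position}{query_aa}")
    if inserted:
        calls.append(f"{position}ins{inserted}")
    return calls
```

For the toy pair `"MK-VL"` and `"MNAV-"`, this returns `["K2N", "2insA", "L4del"]`. Real input would use HXB2-numbered protein alignments rather than these made-up strings.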

Algorithm Architecture

Multi-source rule integration for comprehensive resistance interpretation:

  • Stanford HIVDB: Weighted mutation scoring system
  • ANRS Algorithm: Rule set from the French National Agency for Research on AIDS and Viral Hepatitis (ANRS)
  • IAS-USA Guidelines: Major/accessory mutation classification
  • WHO SDRM: Surveillance drug resistance mutations
Scoring System
Resistance Level      Score Range   Clinical Interpretation
Susceptible           0-9           No significant resistance
Potential Low-Level   10-14         Possible reduced susceptibility
Low-Level             15-29         Reduced susceptibility
Intermediate          30-59         Significant resistance
High-Level            ≥60           High-level resistance
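
The score ranges above translate directly into a threshold lookup. The sketch below is a minimal example. The name `resistance_level` is hypothetical, and treating every score below 10 (including negative totals, which the table does not cover) as Susceptible is an assumption.

```python
# (lower bound, level) pairs taken from the scoring table, highest first.
LEVELS = [
    (60, "High-Level"),
    (30, "Intermediate"),
    (15, "Low-Level"),
    (10, "Potential Low-Level"),
]


def resistance_level(score: float) -> str:
    """Map a total resistance score to its resistance level."""
    for lower_bound, level in LEVELS:
        if score >= lower_bound:
            return level
    # Assumption: anything below 10, including negative totals, is Susceptible.
    return "Susceptible"
```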

CRPS-Optimized Ensemble

Continuous Ranked Probability Score (CRPS) optimization for model selection:

Base Models
  • ElasticNet: Regularized linear regression
  • LightGBM: Gradient boosting framework
  • XGBoost: Extreme gradient boosting
  • Random Forest: Ensemble decision trees
Optimization
  • CRPS-based weighting
  • Cross-validation (5-fold)
  • Hyperparameter tuning
  • Model calibration
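
One way to picture CRPS-based weighting is to treat the base models' point predictions as a weighted empirical distribution and search for the weights with the lowest mean CRPS on held-out data. The sketch below does this with a coarse grid over the weight simplex, using only the standard library. The function names, the grid search, and the empirical-distribution view are assumptions for this example, since the documentation does not state how the weights are actually fitted.

```python
from itertools import product


def ensemble_crps(preds, weights, y):
    """CRPS of a weighted empirical distribution for one observation.

    CRPS = sum_i w_i |x_i - y| - 0.5 * sum_i sum_j w_i w_j |x_i - x_j|
    """
    error = sum(w * abs(x - y) for x, w in zip(preds, weights))
    spread = sum(
        wi * wj * abs(xi - xj)
        for xi, wi in zip(preds, weights)
        for xj, wj in zip(preds, weights)
    )
    return error - 0.5 * spread


def mean_crps(pred_rows, weights, targets):
    """Average CRPS over all observations (one row of member predictions each)."""
    total = sum(ensemble_crps(row, weights, y) for row, y in zip(pred_rows, targets))
    return total / len(targets)


def grid_search_weights(pred_rows, targets, steps=10):
    """Try every weight vector on a simplex grid and keep the lowest mean CRPS."""
    n_models = len(pred_rows[0])
    best_weights, best_score = None, float("inf")
    for combo in product(range(steps + 1), repeat=n_models):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = [c / steps for c in combo]
        score = mean_crps(pred_rows, weights, targets)
        if score < best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

With a single member, `ensemble_crps` reduces to the absolute error, which is why CRPS is often described as a generalization of MAE to distributional forecasts. In practice the predictions passed in would come from the cross-validation folds, and a numerical optimizer would likely replace the grid.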
Training Dataset
  • Size: 45,000+ genotype-phenotype pairs
  • Sources: Stanford, Los Alamos, EPHI clinical data
  • Subtypes: A, B, C, CRF01_AE, CRF02_AG
  • Drugs: 22 ARV medications

Supported Input Formats
FASTA Format
>Sequence_ID
ATGACC...
Mutation CSV
ID,Mutations
SEQ001,K103N,M184V
Plain Mutation List
K103N, M184V, G190A
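
The two mutation-based inputs can be parsed with the standard library alone. The sketch below is a hedged example: the names `Mutation`, `parse_mutation_list`, and `parse_mutation_csv` are hypothetical, only simple substitutions such as `K103N` are recognized, and every CSV field after the ID is treated as a mutation so that the sample row shown above parses as written.

```python
import csv
import io
import re
from typing import NamedTuple

# Single-letter wild type, numeric position, single-letter variant (e.g. K103N).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class Mutation(NamedTuple):
    wild_type: str
    position: int
    variant: str


def parse_mutation(token: str) -> Mutation:
    match = MUTATION_RE.match(token.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation: {token!r}")
    return Mutation(match.group(1), int(match.group(2)), match.group(3))


def parse_mutation_list(text: str) -> list[Mutation]:
    """Parse a comma- or whitespace-separated list of substitutions."""
    return [parse_mutation(t) for t in re.split(r"[,\s]+", text.strip()) if t]


def parse_mutation_csv(text: str) -> dict[str, list[Mutation]]:
    """Parse the mutation CSV into {sequence ID: [Mutation, ...]}."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    result = {}
    for row in reader:
        if not row:
            continue
        seq_id, *fields = row
        # Assumption: all fields after the ID are mutations, quoted or not.
        result[seq_id.strip()] = parse_mutation_list(",".join(fields))
    return result
```

Insertions, deletions, and amino-acid mixtures would need a richer pattern than the one used here.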
Output Formats
  • JSON: Complete structured response
  • CSV (Wide): One row per sequence
  • CSV (Long): One row per sequence-drug pair
  • PDF Reports: Clinical summary reports
  • Excel: Formatted worksheets
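
The difference between the wide and long CSV layouts is easiest to see as a conversion. The sketch below reshapes a wide table (one row per sequence, one column per drug) into a long one (one row per sequence-drug pair). The column names `sequence_id`, `drug`, and `score` and the sample values are placeholders, not the platform's actual export schema.

```python
import csv
import io


def wide_to_long(wide_csv: str, id_column: str = "sequence_id") -> str:
    """Convert wide CSV text into long CSV text with columns id, drug, score."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_column, "drug", "score"])
    for row in reader:
        for column, value in row.items():
            if column != id_column:
                # Every non-ID column is assumed to hold one drug's score.
                writer.writerow([row[id_column], column, value])
    return out.getvalue()


# Placeholder data: drug abbreviations are real, scores are arbitrary.
wide = "sequence_id,EFV,3TC\nSEQ001,10,30\n"
long_csv = wide_to_long(wide)
```

The long layout is usually easier to filter and aggregate per drug, while the wide layout is easier to read one sequence at a time.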
Release Notes & Changelog
Version History
v1.0 (Current)

Initial production release with core rule-based and ML engines.

v1.1 (Planned)

Batch processing enhancements and API improvements.

v1.2 (Planned)

Subtype inference and advanced visualization tools.

v2.0 (Roadmap)

Full API ecosystem and integration capabilities.

Summary Reports & Analytics
Platform Analytics
  • Usage statistics dashboard
  • Data processing metrics
  • Performance benchmarks
  • User activity reports
Validation Reports
  • Model performance validation
  • Concordance analysis reports
  • Monthly summary reports
Documentation Status

Complete: Technical specifications, API documentation, database schemas

In Progress: SOPs, implementation guides, detailed protocol documentation

Planned: Interactive API playground, video tutorials, comprehensive PDF manual (v1.3)