MOVR DataHub Analytics
Python Package | GSoC 2026 | MOVR Legacy DataHub Library
A Python package for analyzing clinical registry data from the MOVR DataHub (2019-2025): 3,570 participants, 11,501 encounters across ALS, DMD, SMA, BMD, LGMD, FSHD, and Pompe disease.
Overview
MOVR DataHub Analytics transforms raw Excel registry exports into analysis-ready datasets through automated data wrangling, quality validation, and cohort management.
Key Capabilities
- Data Pipeline: Excel to Parquet conversion with audit logging
- Data Wrangling: YAML-configurable transformation rules
- Cohort Management: Flexible patient cohort building
- Analytics: Descriptive statistics and reporting
- Data Dictionary: Field search and metadata exploration
- Plugin System: Custom transformation extensions
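As a rough illustration of how configurable transformation rules and custom extensions could fit together, the sketch below registers transformation functions by name and applies them according to a rule list. The rule list is a Python literal standing in for a parsed YAML file. The names `register`, `apply_rules`, the rule keys, and the column names are all hypothetical and are not taken from the package.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a rule name to a transformation function.
TRANSFORMS: Dict[str, Callable[[object], object]] = {}


def register(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func):
        TRANSFORMS[name] = func
        return func
    return decorator


@register("strip")
def strip_whitespace(value):
    # Non-string values pass through unchanged.
    return value.strip() if isinstance(value, str) else value


@register("upper")
def to_upper(value):
    return value.upper() if isinstance(value, str) else value


def apply_rules(rows: List[dict], rules: List[dict]) -> List[dict]:
    """Return new rows with each rule's transform applied to its column."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        for rule in rules:
            column = rule["column"]
            if column in new_row:
                new_row[column] = TRANSFORMS[rule["transform"]](new_row[column])
        cleaned.append(new_row)
    return cleaned


# Stand-in for rules loaded from a YAML file (assumed structure).
RULES = [
    {"column": "disease", "transform": "strip"},
    {"column": "disease", "transform": "upper"},
]

rows = [{"patient": "P1", "disease": " dmd "}]  # synthetic example row
print(apply_rules(rows, RULES))
```

Under this design a custom extension would only need to decorate a new function with `register` and reference its name in the rule list.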
Project Structure
    movr-datahub-analytics/
    ├── src/movr/          # Main package
    │   ├── config/        # Configuration management
    │   ├── data/          # Excel/Parquet loading
    │   ├── wrangling/     # Data cleaning
    │   ├── cohorts/       # Cohort management
    │   ├── analytics/     # Analysis framework
    │   ├── dictionary/    # Data dictionary tools
    │   └── cli/           # Command-line interface
    ├── config/            # YAML configuration files
    ├── data/              # Data storage (gitignored)
    ├── notebooks/         # Jupyter examples
    ├── tests/             # Test suite
    └── docs/              # Documentation
Roadmap
Phase 1: Core Library (2025)
- Package structure
- Excel to Parquet conversion
- Data wrangling
- Cohort management
- CLI implementation
Phase 2: Advanced (2026)
- Config-driven cohort builder
- Disease-specific analysis rules
- Workflow orchestration
- Visualization tools
- Web interface (FastAPI)
Installation
Note: This package is not yet on PyPI. Install locally in editable mode.
Requirements
- Python 3.9+
- Git
- Virtual environment (recommended)
Basic Installation
    # Clone the repository
    git clone https://github.com/OpenMOVR/movr-datahub-analytics.git
    cd movr-datahub-analytics

    # Create virtual environment
    python3 -m venv .venv
    source .venv/bin/activate  # Windows: .venv\Scripts\activate

    # Install package
    pip install -e .
Full Installation (recommended)
    # Install with all optional dependencies
    pip install -e ".[dev,viz,notebooks]"

    # This includes:
    # - dev: pytest, black, mypy, pre-commit
    # - viz: matplotlib, seaborn, plotly
    # - notebooks: jupyter, ipywidgets
Configuration Setup
    # Run the setup wizard (first time)
    movr setup

    # This will:
    # - Create config/config.yaml
    # - Set up data directories
    # - Configure Excel file paths
    # - Initialize audit logging
Verify Installation
    # Check CLI is working
    movr --help

    # Check version
    movr --version

    # Run tests
    pytest
CLI Reference
The movr command-line interface provides access to all package functionality.
Setup Commands
    # Interactive setup wizard
    movr setup

    # Show current configuration
    movr config show

    # Validate configuration
    movr config validate
Data Pipeline
    # Convert Excel files to Parquet
    movr convert

    # Convert specific file
    movr convert --file "path/to/file.xlsx"

    # Validate data quality
    movr validate

    # View data summary
    movr summary --registry datahub --metric all
    movr summary --registry datahub --metric enrollment
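To give a sense of what audit logging around a conversion step might record, here is a minimal sketch using only the standard library. It does not perform the Excel to Parquet conversion itself; it only hashes the source file and appends a JSON line to a log. The function names, the record fields, and the JSON-lines log format are assumptions for illustration, not the package's actual audit format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_audit_record(source: Path, output: Path, log_path: Path) -> dict:
    """Append one JSON line describing a conversion to the audit log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": str(source),
        "source_sha256": file_sha256(source),  # ties the output to exact input bytes
        "output": str(output),
    }
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

An append-only log with a content hash makes it possible to tell later whether a Parquet file was produced from the same export that is currently on disk.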
Data Dictionary
    # Search for fields
    movr dictionary search "age"
    movr dictionary search "medication" --diseases "DMD,SMA"
    movr dictionary search "ambulation" --diseases "all"

    # List all fields
    movr dictionary list-fields

    # Show specific field details
    movr dictionary show-field FACPATID

    # Export dictionary
    movr dictionary export --format csv
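The search commands above combine a text term with an optional disease filter. One way such a lookup could behave is sketched below over an in-memory list. The field entries are invented placeholders, and `parse_diseases` and `search_fields` are hypothetical names; the real dictionary structure and matching rules may differ.

```python
from typing import List, Optional, Set

# Invented placeholder entries, not real registry fields.
FIELDS = [
    {"name": "EXAMPLE_AGE", "description": "Age at enrollment", "diseases": {"DMD", "SMA"}},
    {"name": "EXAMPLE_MED", "description": "Current medication", "diseases": {"DMD"}},
    {"name": "EXAMPLE_AMB", "description": "Ambulation status", "diseases": {"ALS"}},
]


def parse_diseases(value: str) -> Optional[Set[str]]:
    """Turn "DMD,SMA" into {"DMD", "SMA"}; "all" means no filtering."""
    if value.strip().lower() == "all":
        return None
    return {part.strip().upper() for part in value.split(",") if part.strip()}


def search_fields(fields: List[dict], term: str, diseases: str = "all") -> List[str]:
    """Return names of fields whose name or description contains `term`."""
    needle = term.lower()
    wanted = parse_diseases(diseases)
    matches = []
    for field in fields:
        text = f"{field['name']} {field['description']}".lower()
        if needle not in text:
            continue
        # Keep the field only if it applies to at least one requested disease.
        if wanted is not None and not wanted & field["diseases"]:
            continue
        matches.append(field["name"])
    return matches


print(search_fields(FIELDS, "medication", "DMD,SMA"))
```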
Cohort Management
    # Create cohort from YAML config
    movr cohort create --config cohort_config.yaml

    # List existing cohorts
    movr cohort list

    # Export cohort
    movr cohort export --name "my_cohort" --format parquet
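The idea of building a cohort from a configuration file can be sketched as a filter driven by a dictionary of criteria. The dictionary below stands in for a parsed `cohort_config.yaml`; its keys (`name`, `diseases`, `min_encounters`), the `build_cohort` function, and the participant records are all hypothetical and synthetic.

```python
from typing import List

# Stand-in for a parsed cohort_config.yaml; every key here is an assumption.
COHORT_CONFIG = {
    "name": "my_cohort",
    "diseases": ["DMD", "BMD"],
    "min_encounters": 2,
}


def build_cohort(participants: List[dict], config: dict) -> List[dict]:
    """Keep participants matching the disease list and encounter threshold."""
    diseases = set(config.get("diseases", []))  # empty set means any disease
    min_encounters = config.get("min_encounters", 0)
    return [
        p for p in participants
        if (not diseases or p["disease"] in diseases)
        and p["encounters"] >= min_encounters
    ]


# Synthetic records for illustration only.
participants = [
    {"id": "P1", "disease": "DMD", "encounters": 3},
    {"id": "P2", "disease": "ALS", "encounters": 5},
    {"id": "P3", "disease": "BMD", "encounters": 1},
]

cohort = build_cohort(participants, COHORT_CONFIG)
print(COHORT_CONFIG["name"], [p["id"] for p in cohort])
```

Keeping the criteria in data rather than code means a cohort definition can be versioned and shared alongside the analysis that uses it.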
Analytics
    # Run descriptive statistics
    movr analytics describe --cohort "my_cohort"

    # Generate report
    movr analytics report --cohort "my_cohort" --output report.html
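For readers unfamiliar with what a descriptive summary of a cohort typically contains, the following sketch computes a few common statistics with the standard library. It is not the output format of `movr analytics describe`; the `describe` function and the sample values are hypothetical.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Summarize a numeric column with count, mean, median, min, and max."""
    if not values:
        raise ValueError("describe() needs at least one value")
    ordered = sorted(values)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
    }


# Synthetic ages, not registry data.
summary = describe([8, 12, 15, 21])
print(summary)
```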
Contributing
We welcome contributions from both the research and software engineering communities.
GSoC 2026 Priority Project
This is a priority project for Google Summer of Code 2026. Students should review the Contributing Guide.
Development Setup
    # Install with development dependencies
    pip install -e ".[dev]"

    # Install pre-commit hooks
    pre-commit install

    # Run all checks
    pre-commit run --all-files
Code Quality
    # Run tests
    pytest

    # Run tests with coverage
    pytest --cov=src/movr

    # Format code
    black src/ tests/

    # Type checking
    mypy src/

    # Lint
    ruff check src/ tests/
Pull Request Process
- Fork the repository on GitHub
- Create a feature branch: git checkout -b feature/amazing-feature
- Make changes following coding standards
- Add tests for new functionality
- Run tests and linters
- Commit with clear messages
- Push and create a Pull Request
Contribution Areas
Code
- Core library features
- Bug fixes
- Performance improvements
- Plugin development
Documentation
- Tutorials and guides
- API documentation
- Example notebooks
- Translations
Resources
- README - Project overview
- Quick Start Guide
- Contributing Guide
- Technical Docs
- Example Notebooks