Data Scientist, Genomic, Multi-Omics

📁 Information Technology
💼 Genomic Med Rsch Department 710562
📅 Requisition #: 175771

The Institute for Data Science in Oncology (IDSO) aims to enable and support MD Anderson’s growth as a data-driven cancer center where data, data science and continuous learning converge to transform insights into impact through advanced patient treatments, better experiences and innovative care. One component of this mission is supporting a major collaborative effort, the Break Through Cancer Data Science Hub TeamLab. Break Through Cancer TeamLabs are composed of individuals from many different institutions, all dedicated to “Radical Collaboration.” The Data Science Hub TeamLab helps develop methods and pipelines for the analyses of the other, cancer-specific TeamLabs and supports their data transfer needs.

Function Summary

We are seeking a highly skilled and mission-driven Data Scientist to support workflows for the Break Through Cancer initiative. This individual will play a critical role in managing complex, multi-modal research datasets (primarily genomic and molecular data) by ensuring their usability, quality, and integration into translational research pipelines. The position is responsible for metadata curation, data structure assessment, data flow tracking, and implementation of scalable data transfer and harmonization strategies across clinical and research environments. In this role, the data scientist will interface with a wide variety of clinical and laboratory-based researchers across MD Anderson Cancer Center, Dana-Farber Cancer Institute, MIT, Memorial Sloan Kettering Cancer Center, Johns Hopkins University and other institutions. We anticipate that this person will present regularly to dozens of faculty across these institutions.

We are looking for a person who will think independently and take ownership of their projects and responsibilities. The work will focus on gathering, organizing, and initiating analyses of complex datasets and communicating results in talks and papers. Familiarity with both quantitative analysis and biomedical principles is required, as is the ability to write code. Equally important is the ability to work well with people from various backgrounds and to communicate effectively.

Key Responsibilities:

* Serve as the primary data lead for a portfolio of research projects under the Break Through Cancer initiative.
* Evaluate and interpret a broad range of biomedical data types, with an emphasis on genomic, transcriptomic, and other omics data, as well as relevant clinical annotations.
* Design and manage systems for metadata compilation, curation, and validation to ensure data integrity and reusability.
* Oversee and execute data transfer workflows, including secure exchange with external collaborators, labs, and data commons.
* Track and document data flow, provenance, and usage across the research lifecycle.
* Work closely with investigators, data analysts, bioinformaticians, and IT teams to align data operations with research goals.
* Implement or contribute to automation and reproducibility of data-related processes (e.g., using scripting, workflow languages, or APIs).
* Ensure compliance with data governance, privacy, and security regulations, including HIPAA and institutional IRB protocols.
* Provide data coordination support during the design, launch, and conduct of translational and precision oncology studies.

Other Duties:

* Attend collaborator meetings and team working group meetings.
* Prioritize and manage multiple projects in a timely and resource-effective manner.
* Stay up to date with relevant literature, gather information systematically, and confer with the supervisor regarding new procedures.

Ideal candidate will have:

* Master’s or PhD in Bioinformatics, Computational Biology, Biomedical Informatics, Data Science, or related field.
* At least 3 years of experience managing biomedical research data, ideally in oncology or precision medicine contexts.
* Deep understanding of genomic and multi-omics data types, data structures, and file formats (e.g., VCF, BAM, FASTQ).
* Experience working with or designing metadata schemas, familiarity with data standards, and experience sharing large datasets via cloud tools (e.g., AWS).
* Proficiency with data wrangling tools and programming/scripting languages (e.g., Python, R, bash, SQL).
* Experience using or integrating with data platforms such as cBioPortal, LabKey, Terra, Seven Bridges, Cirro, or other research data environments.
* Strong understanding of data governance and privacy requirements in healthcare research.
* Excellent communication and documentation skills, especially in translating technical data concepts for non-technical stakeholders.
* Experience in an academic medical center, NCI-designated cancer center, or large-scale biomedical research initiative.
* Experience contributing to or maintaining data dictionaries, data models, or common data elements in support of data harmonization.

Other duties as assigned. 

EDUCATION: 
Required: Bachelor's degree in Biomedical Engineering, Electrical Engineering, Computer Engineering, Physics, Applied Mathematics, Science, Engineering, Computer Science, Statistics, Computational Biology, or related field.
Preferred: Master’s or PhD in Bioinformatics, Computational Biology, Biomedical Informatics, Data Science, or related field.

EXPERIENCE: 
Required: Three years of experience in scientific software or industry development/analysis. With a Master's degree, one year of experience is required. With a PhD, no experience is required.

Preferred: The skills and experience listed under "Ideal candidate will have" above.
