Sema4 Research

Sema4 is a patient-centered health intelligence company dedicated to advancing healthcare through data-driven insights. Our cutting-edge research is essential to this mission.

We publish regularly in peer-reviewed, high-impact journals and collaborate extensively with health systems, clinicians, and pharmaceutical companies to deliver insights that drive precision medicine. Our research team includes world leaders in data science, machine learning, network modeling, and genomics. Ongoing collaborations with scientists and clinicians in the Mount Sinai Health System and other health systems keep Sema4’s research patient-centered and clinically relevant.

Our research

Sema4’s research focuses on turning the multidimensional data collected from our advanced genomic tests and from real-world data sources into structured clinical insights. We use this structured data and our proprietary technology platforms, including bioinformatics pipelines and integrative predictive modeling, to discover disease mechanisms, identify clinically actionable biomarkers, and create products that support patient care, from information-driven genomic tests to digital tools.

Much of our research is powered by Centrellis®, Sema4’s innovative health intelligence platform, which enables us to generate a more complete understanding of disease and wellness and deliver data-driven insights to our clinical collaborators to help drive better health decisions.

Since 2017, Sema4’s scientists have published more than 210 papers in peer-reviewed, high-impact journals, and we regularly present our research at national and international conferences. Our research focuses on five interlinked areas:

Data science and engineering: We develop, test, and maintain big data solutions that transform, aggregate, and abstract data into high-quality formats optimized for query and analysis. We then apply advanced technologies, including artificial intelligence and natural language processing (NLP), to extract meaningful predictive information from these data and further our understanding of disease and wellness.
Real-world evidence: Centrellis enables us to aggregate, abstract, and structure real-world data, such as the data contained in electronic health records (EHRs). We run unstructured EHR data through multiple pipelines that use machine learning-enabled NLP, supplemented as needed by human annotators, to extract information and knowledge. Our multiscale, integrative strategy then lets us connect the processed EHR data with complex biological data from many sources, including the genome, transcriptome, and proteome, and conduct real-world evidence studies.
Network modeling: Sema4 has developed methodologies to integrate diverse multi-omics data, including genomic, transcriptomic, and proteomic data, into causal probabilistic network models. These machine learning-based models help us understand disease processes and identify key biomarkers through advanced network analysis. Our scientists have also pioneered the use of DNA variation to statistically infer causal relationships among any number of traits that share genetic variance components. We then apply these inferred relationships systematically to build probabilistic causal network structures, which we can mine for a broad range of discoveries.
Molecular profiling: We continually design, develop, and improve assays across a range of sequencing technologies. One example is our pharmacogenomics (PGx) research, which focuses on developing tests that identify medically actionable, clinically relevant genetic variants associated with drug response. Such tests can help clinicians make more informed treatment decisions.
Digital tools: Informed by years of experience in patient-centered care, we build digital tools and solutions that let our clinician partners and their patients engage with complex data and insights through user-friendly interfaces. These technologies support mobile health research, medical record integration, and patient input into the research process, enabling multidimensional clinical insights.

How can digital risk assessment tools combine artificial intelligence and real-world data to reduce morbidity and mortality in postpartum hemorrhage (PPH)?

View our latest Sema4 Research Highlight, which showcases two publications in the special print issue “Informatics for Sex- and Gender-Related Health” of the Journal of the American Medical Informatics Association (JAMIA), to learn how:

  • Sema4 developed a novel PPH risk stratification tool using longitudinal real-world data and advanced machine learning algorithms
  • Digital phenotyping algorithms identified clinical thresholds, not currently used in standard clinical practice, that may indicate increased PPH risk
  • Digital risk prediction tools can enable earlier PPH identification, prevention, and intervention to improve maternal health outcomes

Our partnerships

As a research partner, Sema4 can support a range of studies, from optimizing biomarker discovery to leveraging our biorepository services and accelerating clinical trial enrollment. We bring a variety of capabilities to partnerships, including:

  • A portfolio of sequencing solutions across the family health journey, including reproductive and women’s health, pediatrics, hereditary cancer, and rare disorders.
  • Support for in-depth interpretation of the resulting sequencing data. Our precision medicine experts work with partners to interpret results, discover de novo biomarker signatures, and build predictive network models.
  • Expert structuring and analysis of large data sets, backed by industry-leading bioinformatics pipelines. Using machine learning and natural language processing, we draw insights from structured data at the individual and cohort levels and make these insights accessible to partners through our digital tools.

We currently partner with numerous pharmaceutical companies, major health systems, research consortia, clinicians, and advocacy groups.
