Big Data Solutions Architects | Distributed Computing and Optimization | Spark and Scala Coaching

about us.

Diatom is a team of experts in distributed and cloud-based computing, database/ETL application architectures, software development best practices, artificial intelligence / machine learning, bioinformatics, and electronic medical record (EMR) data processing. We are based in Boston, Massachusetts.


Healthcare Informatics and Big Data
Diatom personnel transformed the electronic medical record processing of a major healthcare analytics company from a traditional, batch-oriented Oracle data factory to a scalable distributed-computing approach built on Scala and Apache Spark on Hadoop. This platform integrates and analyzes over 90 million patient medical records in near real time. Diatom also planned and executed a training initiative on functional programming in Scala and on Spark for dozens of Java developers, SQL developers, analysts, and clinical informaticians, fundamentally changing the way the organization works with data.

Rethinking the way the CDC collects disease notifications from the states
A core mission of our nation's Centers for Disease Control and Prevention (CDC) is to collect data from state health departments to monitor cases of dozens of diseases, from chickenpox to tuberculosis. Collecting this data helps CDC epidemiologists analyze trends and better understand the state of health throughout the country. In a year-long engagement, Diatom partnered with CDC epidemiologists and several high-profile consulting agencies to prototype and deploy a new NEDSS (National Electronic Disease Surveillance System) messaging system. The resulting system was less expensive to maintain, easier to query, and more performant.

Improving the functional annotation of newly-discovered bacterial genes
COMBREX (COMputational BRidges to EXperiments, http://combrex.bu.edu) is an NIH-funded initiative that brings together computational and experimental biologists with the goal of greatly improving our overall understanding of microbial protein function. Diatom collaborated with Boston University researchers to develop COMBREX components for user management and other system tools.

Helping a life-sciences company optimize its platform for next-generation sequencing
A leader in next-generation sequencing technologies hired Diatom to work with its in-house informatics team to develop a multiplex PCR primer design pipeline critical to its technology's sample preparation protocol. Diatom drew on Amazon EC2 cloud computing resources and its expertise in multi-objective optimization to help the company's informatics staff clarify some of the assay design tradeoffs involved.

Helping to identify the right BI solution for your business
Diatom personnel conducted a thorough review of business intelligence (BI) solutions for a leading robotics company, helping it select the solution best suited to its needs.

Pioneering the analysis of biological pathways
Diatom worked with a major pharmaceutical company to create a web-based tool for genomic data integration and biological pathway analysis. This effort resulted in a massive knowledge network that documented connections between genes, proteins, functional annotations, diseases, biological pathways, compounds, bioassays, and clinical trials.