We've built a state-of-the-art robotic lab that processes saliva and blood samples, outputting DNA sequence results without any human interaction. The lab is powered by our custom automation stack of hundreds of virtual servers and device drivers that schedule and execute operations on the robots. Our current tests identify mutations that cause early childhood disease and mutations that cause cancer, and detect fetal chromosomal abnormalities early in pregnancy using a standard blood sample from the woman. Ops Pipeline is a next-generation advancement to our automation infrastructure that allows us to ship products and technologies faster.
The Ops Pipeline framework exposes a functional DSL describing a directed graph of jobs that need to be executed in the laboratory. New lab protocols can be defined using an interactive web interface. This way, scientists and automation engineers can quickly reconfigure the laboratory to run experimental protocols. The user interface includes a complete management dashboard and visualization of lab operations, as shown in the figure; Ops (ovals) are applied to plates (squares) as a timestamp-ordered functional composition.
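To make the idea of a directed graph of jobs concrete, here is a minimal sketch in Python. It is not the Ops Pipeline DSL itself, whose syntax is not shown here; the `Protocol` class and the op names are hypothetical, chosen only to illustrate how jobs with dependencies can be declared and then ordered for execution.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Protocol:
    """Hypothetical stand-in for a lab protocol: a directed graph of ops."""

    def __init__(self):
        self._deps = {}  # op name -> set of ops that must finish first

    def op(self, name, after=()):
        """Declare an op and the ops it depends on; returns self for chaining."""
        self._deps[name] = set(after)
        return self

    def execution_order(self):
        """Return one valid order; raises graphlib.CycleError on a cycle."""
        return list(TopologicalSorter(self._deps).static_order())


# Illustrative op names only; they are assumptions, not real lab steps.
protocol = (
    Protocol()
    .op("extract_dna")
    .op("amplify", after=["extract_dna"])
    .op("quantify", after=["extract_dna"])
    .op("sequence", after=["amplify", "quantify"])
)
order = protocol.execution_order()
print(order)
```

Keeping the graph as plain data means the same definition could drive both a visualization and a scheduler, which is the kind of reuse the dashboard described above suggests.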
The framework also has an execution layer that is used to manage priority queues and schedule jobs on the robots. The execution layer is able to perform load balancing of jobs among multiple robots, and is designed to tolerate a variety of failures, such as network outages and mechanical failures.
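The scheduling behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual execution layer: the `Scheduler` class, the robot names, and the "least-loaded healthy robot" policy are all hypothetical.

```python
import heapq
import itertools


class Scheduler:
    """Hypothetical sketch: priority queue of jobs, least-loaded robot wins."""

    def __init__(self, robots):
        self._queue = []                     # heap of (priority, seq, job)
        self._seq = itertools.count()        # tie-breaker keeps FIFO order
        self._load = {r: 0 for r in robots}  # jobs assigned so far per robot
        self._healthy = set(robots)

    def submit(self, job, priority):
        # Lower number means higher priority.
        heapq.heappush(self._queue, (priority, next(self._seq), job))

    def mark_failed(self, robot):
        # A failed robot simply stops receiving new jobs.
        self._healthy.discard(robot)

    def dispatch(self):
        """Assign the most urgent job to the least-loaded healthy robot.

        Returns (job, robot), or None if nothing can be dispatched; in that
        case any queued job stays in the queue.
        """
        if not self._queue or not self._healthy:
            return None
        _, _, job = heapq.heappop(self._queue)
        robot = min(sorted(self._healthy), key=self._load.__getitem__)
        self._load[robot] += 1
        return job, robot


scheduler = Scheduler(["robot_a", "robot_b"])
scheduler.submit("routine_plate", priority=5)
scheduler.submit("urgent_plate", priority=1)
first = scheduler.dispatch()    # the urgent job goes out first
scheduler.mark_failed("robot_a")
second = scheduler.dispatch()   # remaining job avoids the failed robot
```

A real system would also need to requeue jobs that were in flight when a robot failed; that is left out here to keep the sketch short.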
In the short term, Ops Pipeline makes it easier to optimize assay performance, debug machine errors, and validate standard operating procedures. In the long term, this framework allows us to track and improve turn-around-time, ensure quality control of processes, and reduce cost.
Modernizing healthcare and making it patient-centric is a tough but critical challenge: the solution involves both software and people. We use modern media technologies to help patients access, interpret and understand their results. And if they have further questions, we give them easy access to scheduled or immediate consultation with medical experts. We're trying to automate healthcare's last-mile problem while still keeping it personal.
The screen for Fragile X mutations determines the length of a CGG repeat sequence in the FMR1 gene, which is responsible for neural development. The sequence, normally 5-44 repeats, can elongate to 200+ repeats, causing the FMR1 gene to fail to produce a necessary protein.
Unfortunately, the high homology makes this genetic variation hard to detect through normal sequencing. Instead, capillary electrophoresis of the amplified CGG region is performed: the positions of the two distinct sample peaks along the capillary are compared against a series of calibration peaks of known length to determine the number of CGG repeats.
Algorithm: Find the optimal assignment of calibration peaks from the raw data through a dynamic programming algorithm that minimizes the size standard deviation (SSD) against trained statistical priors. These peaks serve as the basis for fitting a logistic calibration curve that maps peak position to CGG repeat number; the fit has a residual sum of squared error (RSS). Since the accuracy of the calibration curve is essential for correct results, we perform an orthogonal quality control (QC) measurement against this entire process. The figure shows a 2-D histogram of training data from production batches in a vector space of the QC parameters SSD and RSS, transformed by a non-linear kernel.
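The peak-assignment step can be illustrated with a small dynamic program. This is a sketch under assumptions, not the production algorithm: it uses the summed squared deviation from assumed prior positions as a stand-in for the SSD cost, and the function name and example numbers are made up.

```python
import math


def assign_calibration_peaks(candidates, expected):
    """Pick len(expected) candidates, in order, minimizing squared deviation.

    candidates: sorted observed peak positions (may include noise peaks).
    expected:   sorted prior positions of the calibration peaks (assumed known).
    Returns (total_cost, chosen_candidates).
    """
    n, m = len(expected), len(candidates)
    if m < n:
        raise ValueError("fewer candidate peaks than calibration peaks")
    # cost[i][j]: best cost of matching the first i expected peaks
    # using only the first j candidates.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0] = [0.0] * (m + 1)
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            skip = cost[i][j - 1]
            take = cost[i - 1][j - 1] + (candidates[j - 1] - expected[i - 1]) ** 2
            cost[i][j] = min(skip, take)
    # Walk back through the table to recover which candidates were taken.
    chosen, i, j = [], n, m
    while i > 0:
        if cost[i][j] == cost[i][j - 1]:
            j -= 1  # candidate j was skipped
        else:
            chosen.append(candidates[j - 1])
            i, j = i - 1, j - 1
    return cost[n][m], chosen[::-1]


# Made-up data: three true calibration peaks plus two noise peaks.
observed = [98.0, 150.0, 203.0, 260.0, 301.0]
priors = [100.0, 200.0, 300.0]
total_cost, peaks = assign_calibration_peaks(observed, priors)
```

The table has one cell per (expected peak, candidate) pair, so the search stays quadratic even though the number of possible assignments grows combinatorially.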
Ultimately, we want to determine which QC values are acceptable, meaning they correspond to validated assay performance, and which observations are outliers that should be re-tested. We train a binary classifier by a semi-heuristic EM method:
1. Maximum likelihood estimate of a multivariate Gaussian model of the non-outlier data (green elliptical contours for mean µ and covariance matrix Σ).
2. Expectation of the classification boundary for outliers as the p-value for the Gaussian at which the model and data distributions diverge.
This gives the solid QC boundary seen in the figure.
This sort of anomaly detection for QC provides assay performance benchmarks that are essential for running a fully operational lab and for ensuring that patients receive the most accurate testing possible.
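The Gaussian part of this procedure can be sketched in a few lines. The code below is an illustration only: it fits a 2-D Gaussian by maximum likelihood and flags points by p-value, but it uses a fixed, arbitrary threshold `alpha` instead of deriving the boundary from the divergence between model and data, and it omits the EM iteration and the kernel transform. The function names and data are hypothetical.

```python
import math


def fit_gaussian_2d(points):
    """Maximum likelihood mean and covariance for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def p_value(point, mean, cov):
    """Probability of a point at least this far from the mean.

    For a 2-D Gaussian the squared Mahalanobis distance d2 follows a
    chi-square distribution with 2 degrees of freedom, whose tail
    probability is exactly exp(-d2 / 2).
    """
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-d2 / 2)


def is_outlier(point, mean, cov, alpha=0.001):
    # alpha is an arbitrary illustrative threshold, not a validated one.
    return p_value(point, mean, cov) < alpha


# Made-up (SSD, RSS) pairs standing in for non-outlier training batches.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
mean, cov = fit_gaussian_2d(training)
typical = is_outlier((0.5, 0.5), mean, cov)
extreme = is_outlier((5.0, 5.0), mean, cov)
```

Contours of constant p-value are ellipses around the mean, which matches the elliptical contours described for the fitted model.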
“Someday people will know genetics like they know smartphones.”
“Every day, I come to work and learn something new about engineering, statistics, biology, or robotics. It's energizing to work at the crossroads of these fields, and rewarding to know our products help patients at important times in their lives.”
As a dev, you would have the chance to work across areas you've never seen before: your work could range from web design and UX engineering to backend web services and relational data modeling to next-gen DNA sequencing analysis to laboratory robotics. Ideally, you'd be an expert in some area of software or science, but you don't need to be a scientist to work at Counsyl!
Devs work on projects across all areas of the business:
People here come from many different backgrounds: computer science and software engineering, statistics, genetics, electrical engineering, chemistry, bioinformatics, clinical diagnostics, genetic counseling. There are weekly journal clubs, tech talks and workshops.
Work on things that are actually important. Your genome is the software that built your body. How it works, and how it fails, determines how you, and everyone else you know, live and die. Modeling how the body works is the most significant puzzle, with the most significant impact, that anyone can wrestle with.
Whether it’s releasing open-source software or blogging about the problems we tackle, we do our best to give back to the community.
Python is uniquely positioned between science and dev. We built an efficient stack around it that powers everything we do.
Summer and winter interns get an opportunity to own a project that will genuinely make an impact on one of our dev teams.