Wednesday, April 22 • 1:30pm - 2:10pm
Interval Trees for Genomic Feature Retrieval in Neo4j


When you grow corn, yield is paramount. How much a corn seed will yield is encoded in its genome. If you visualized that genome as ASCII text, you would see a seemingly random string of more than 2 billion A, C, G, and T characters. Somewhere in that string are the interesting portions that govern how much the plant will yield, as well as others that give it useful traits, such as resilience to different environmental pressures.

At Bayer Crop Science, we track these interesting portions of the corn genome as numeric intervals: start and stop indices within that string. Our goal is to find relationships between intervals of interest to determine how to breed plants that produce the greatest yield and stay resilient under different conditions around the world.
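To make the interval idea concrete, here is a minimal sketch that assumes 0-based, half-open coordinates [start, stop), a convention the abstract does not specify. The `Feature` type and the feature names are hypothetical. "Related" is reduced here to simple overlap, and the brute-force scan compares a query against every stored feature. That linear, per-query baseline is what an interval tree is meant to improve on.

```python
from __future__ import annotations

from typing import NamedTuple


class Feature(NamedTuple):
    """A hypothetical genomic feature: half-open interval [start, stop)."""
    name: str
    start: int
    stop: int


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two half-open intervals share at least one position."""
    return a.start < b.stop and b.start < a.stop


def find_overlapping(features: list[Feature], query: Feature) -> list[Feature]:
    """Brute-force baseline: test the query against every stored feature."""
    return [f for f in features if overlaps(f, query)]


features = [
    Feature("marker_1", 100, 250),
    Feature("gene_a", 200, 900),
    Feature("gene_b", 1_000, 1_500),
]
region = Feature("region_of_interest", 300, 1_050)
print([f.name for f in find_overlapping(features, region)])  # ['gene_a', 'gene_b']
```

With half-open coordinates, intervals that merely touch, such as [0, 10) and [10, 20), do not count as overlapping.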

Our team, which develops an internal genomics software stack within Bayer Crop Science, has been tasked with providing our internal customers an API for efficiently finding these related intervals. While exploring solutions, we came upon the interval tree data structure and implemented a way to store and query interval trees in Neo4j.
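The abstract does not say which interval tree variant the team implemented. As an illustration only, the sketch below shows the textbook augmented interval tree. It is a binary search tree ordered by start, and each node also records the largest stop in its subtree so that whole branches can be skipped during a search. It assumes half-open coordinates and a static set of intervals built once into a balanced tree. `Node`, `build`, and `query` are hypothetical names, not Bayer's code or its Neo4j stored procedures.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    start: int                      # interval start (inclusive)
    stop: int                       # interval stop (exclusive)
    label: str
    max_stop: int                   # largest stop anywhere in this subtree
    left: Optional[Node] = None
    right: Optional[Node] = None


def build(intervals: list[tuple[int, int, str]]) -> Optional[Node]:
    """Build a balanced tree, ordered by start, from a static list of intervals."""
    ordered = sorted(intervals)

    def _build(lo: int, hi: int) -> Optional[Node]:
        if lo >= hi:
            return None
        mid = (lo + hi) // 2        # median element becomes the subtree root
        start, stop, label = ordered[mid]
        left = _build(lo, mid)
        right = _build(mid + 1, hi)
        max_stop = max([stop] + [c.max_stop for c in (left, right) if c is not None])
        return Node(start, stop, label, max_stop, left, right)

    return _build(0, len(ordered))


def query(node: Optional[Node], qstart: int, qstop: int,
          out: Optional[list[str]] = None) -> list[str]:
    """Return labels of all intervals overlapping [qstart, qstop), in start order."""
    if out is None:
        out = []
    # Prune: no interval in this subtree ends after the query begins.
    if node is None or node.max_stop <= qstart:
        return out
    query(node.left, qstart, qstop, out)
    if node.start < qstop and qstart < node.stop:
        out.append(node.label)
    # Right-subtree starts are >= node.start, so skip it once starts reach qstop.
    if node.start < qstop:
        query(node.right, qstart, qstop, out)
    return out


tree = build([
    (100, 250, "marker_1"),
    (200, 900, "gene_a"),
    (1_000, 1_500, "gene_b"),
    (5_000, 5_200, "qtl_x"),
])
print(query(tree, 300, 1_050))  # ['gene_a', 'gene_b']
```

In this example the `max_stop` check lets the search skip `marker_1` without testing it, because nothing under that node ends after position 300. The same pruning is what lets a tree avoid the full scan that a flat list requires.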

In this session we will discuss and demonstrate our approach, along with a set of Neo4j stored procedures created by Bayer Crop Science for managing, retrieving, and searching interval tree data structures at large scale.

Speakers

Jason Clark

Lead Data Engineer, Bayer
Jason Clark is a Lead Data Engineer in Bayer's Crop Science business unit with deep experience delivering fit-for-purpose data and software solutions for genetics and genomics datasets using software design principles. As a founding member of Crop Science's Product360 data...



Wednesday April 22, 2020 1:30pm - 2:10pm
Room 4