News
22 November 2024
Stowers scientists reveal how a key protein drives gut lining regeneration
Discovery offers potential therapeutic target for certain cancers
The Computational Biology team works at the interface of biology, computer science, and statistics, developing and applying algorithms and models to understand biological systems and relationships.
The Computational Biology team at the Stowers Institute assists investigators with the analysis of biological data. The group combines software development and technical skills with biological insight to help find answers in large, complex datasets. The group's bioinformatics expertise spans a wide range of topics, including but not limited to:
Open source software
Custom Tools
Depending on the research question, Computational Biology can develop custom tools to meet specific needs.
Tools We Use
The Computational Biology group uses a variety of software packages, both open-source and commercial, in its analyses. Here are a few tools we currently use to process next-generation sequencing (NGS) data:
Team Contact
Hua Li
Director, Computational Biology, Bioinformatics, and Biostatistics
Stowers Institute for Medical Research
Hua Li joined the Stowers Institute in 2006 and completed her postdoctoral training in 2007. During that time, Li applied bivariate analysis to improve the power of genome-wide association studies and constructed a Bayesian network using relaxed gene ordering. She became the Computational Biology group leader in 2009 and Head of Computational Biology in 2017. With over 10 years of bioinformatics experience, Li was appointed Director of Computational Biology, Bioinformatics, and Biostatistics in 2019.
Bioinformatics is a fast-moving field that constantly changes and adapts to new developments in biological protocols and experiments. Our group keeps up with these changes by following the literature, attending conferences, and acquiring new software.
Single cell technologies, starting with single cell RNA sequencing (scRNA-seq) and expanding to methods like scATAC-seq (which measures regions of open chromatin), allow researchers to measure genes or other features of interest in thousands of individual cells at a time. This increases both the resolution of the data and the range of questions researchers can answer, but it also magnifies the complexity of the data. We are always looking for new and better ways of analyzing and visualizing these data to give our collaborators more useful and meaningful results. Currently, we use the Cell Ranger software from 10x Genomics and the Seurat R package, among other tools, to check data quality, cluster cells into types, identify marker genes, and compare samples.
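As a concrete example of one early step in this kind of analysis, Seurat's default `LogNormalize` method divides each cell's counts by that cell's total count, multiplies by a scale factor (10,000 by default), and applies a natural-log transform with log1p. The sketch below reproduces that arithmetic in NumPy, assuming a dense genes-by-cells count matrix. The function name `log_normalize` is hypothetical; this is an illustration, not Seurat's implementation or our pipeline code.

```python
import numpy as np

def log_normalize(counts, scale_factor=10_000.0):
    """Log-normalize a genes x cells count matrix (hypothetical helper).

    Each cell's counts are divided by that cell's total count, multiplied
    by scale_factor, and transformed with log(1 + x).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0)      # total counts per cell (one per column)
    totals[totals == 0] = 1.0        # leave empty cells at zero instead of dividing by 0
    return np.log1p(counts / totals * scale_factor)

# Two genes (rows) measured in two cells (columns)
counts = np.array([[1, 0],
                   [3, 5]])
norm = log_normalize(counts)
```

Dividing by per-cell totals removes differences in sequencing depth between cells before cells are compared or clustered; the log transform then compresses the large dynamic range of expression values.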
New techniques such as Slide-seq and 10x Visium place a slice of tissue on a slide and use barcoded beads or spots with known positions to link each spatial position to expression measurements for thousands of genes at a time. This lets biologists draw on their knowledge of anatomy to help identify cell types, and to find genes with interesting, spatially variable expression patterns in their systems of study. Many new approaches for analyzing spatial data and integrating it with single cell data are under development, and our group is putting them to use.
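One common way to score whether a gene's expression is spatially patterned is Moran's I, a spatial autocorrelation statistic: I = (N / W) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ is a spot's deviation from mean expression, wᵢⱼ are neighbor weights, N is the number of spots, and W is the sum of all weights. Values near +1 mean neighboring spots have similar expression, values near −1 mean neighbors alternate, and values near 0 suggest little spatial structure. The minimal NumPy sketch below assumes binary weights for spots within a hypothetical `radius` of each other; real analyses often use other weighting schemes, and this is not necessarily the approach our group uses.

```python
import numpy as np

def morans_i(values, coords, radius=1.0):
    """Moran's I for one gene's expression measured at spatial spots.

    Spots within `radius` of each other (Euclidean distance) are neighbors
    with weight 1; all other pairs, including a spot with itself, get 0.
    """
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all spot positions
    dist = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    w = ((dist > 0) & (dist <= radius)).astype(float)
    z = x - x.mean()                     # deviations from mean expression
    total_weight = w.sum()
    denom = (z ** 2).sum()
    if total_weight == 0 or denom == 0:
        return float("nan")              # undefined: no neighbors or a constant gene
    # I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return float((x.size / total_weight) * (w * np.outer(z, z)).sum() / denom)

# Toy 4 x 4 grid of spots with unit spacing
grid = [(r, c) for r in range(4) for c in range(4)]
gradient = [c for r, c in grid]            # expression rises left to right
checker = [(r + c) % 2 for r, c in grid]   # neighboring spots alternate
print(morans_i(gradient, grid))  # positive, about 0.667
print(morans_i(checker, grid))   # about -1.0
```

In practice the statistic is computed for every gene, and genes with high, significant Moran's I are flagged as spatially variable.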
Deep learning has been widely adopted in biology to learn patterns from rapidly growing datasets and solve a variety of problems. Our team has applied a set of deep learning tools for SNP and indel calling (DeepVariant), peak calling (LanceOtron), motif discovery (BPNet), and protein structure prediction (AlphaFold). We also actively develop deep learning methods for other applications, such as flow cytometry image classification and multi-omics integration. We participate in an institute-wide deep learning journal club to keep abreast of the latest developments and to better understand how deep learning can help researchers at the institute.
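Sequence-based models such as BPNet take DNA as input in one-hot encoded form, with one channel per base. A minimal sketch of that encoding follows, assuming A/C/G/T channel order and all-zero rows for ambiguous bases such as N (conventions vary between tools). The function name `one_hot` is hypothetical, and this shows only the input representation, not a model.

```python
import numpy as np

BASES = "ACGT"  # assumed channel order

def one_hot(seq):
    """Encode a DNA sequence as a (length, 4) one-hot matrix.

    Bases outside A/C/G/T (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    mat = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            mat[pos, index[base]] = 1.0
    return mat

x = one_hot("ACGTN")   # shape (5, 4); the last row is all zeros
```

Encoding each base as its own channel avoids implying any numeric ordering between nucleotides, which is why convolutional sequence models commonly use this representation.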
For the data types we encounter most often (RNA-seq, ChIP-seq, and single cell RNA-seq), we have developed robust pipelines that automatically run the first few steps of analysis and quality control. This frees up time for the more challenging and interesting downstream analyses, which vary from project to project.
For analyses that we or our collaborators perform regularly, we have developed several in-house web applications: RNA-seq differential expression, gene ontology enrichment, Venn diagram construction, and estimating the sequencing depth needed for a given experiment. These tools let institute members who don’t want to learn programming perform some basic analyses themselves.
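Gene ontology enrichment is commonly framed as a one-sided hypergeometric test: given a background of genes, how surprising is it that a gene list contains at least this many genes annotated to a term? The sketch below assumes that framing and uses only Python's standard library. The function name and argument names are hypothetical and do not describe the internals of our web application; real enrichment tools also correct for testing many terms at once.

```python
from math import comb

def go_enrichment_pvalue(overlap, list_size, term_size, background_size):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background_size: genes in the background (e.g., all expressed genes)
    term_size:       background genes annotated to the GO term
    list_size:       genes in the user's list (e.g., differentially expressed genes)
    overlap:         genes in the list that are annotated to the term
    """
    possible = comb(background_size, list_size)
    upper = min(term_size, list_size)
    # Count the ways to draw at least `overlap` annotated genes into the list
    extreme = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return extreme / possible

# All 3 listed genes hit a term that covers 4 of 10 background genes
print(go_enrichment_pvalue(3, 3, 4, 10))  # 4/120, about 0.0333
```

Exact integer arithmetic with `math.comb` keeps the sketch dependency-free; for very large backgrounds, a statistics library's hypergeometric survival function would be the more practical choice.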
Our work requires many different types of software. From initial alignment and processing to custom scripts for making figures and gene tables, we are constantly installing and testing software, reading documentation, and writing our own code to help our collaborators answer their questions.
When data comes off a sequencing machine, it is encoded in a binary file format that is only meaningful to a computer. The first few steps of analysis turn these raw files into files of DNA sequences, usually millions of short (50-100 base) reads made up of As, Cs, Gs, and Ts. Once we have built these FASTQ (.fastq) files of sequences and quality values, we use alignment software to map the reads to a genome or transcriptome so that we can identify what they represent. Depending on the type of data, the aligned reads may tell us how highly a gene is expressed in a given condition (or cell), how open a region of chromatin is, or whether two regions of a genome are in contact.
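Each FASTQ record spans four lines: a header starting with `@`, the read sequence, a separator line starting with `+`, and one quality character per base. A minimal reader is sketched below, assuming single-line records and Phred+33 quality encoding (the convention used by current Illumina output). The names `FastqRecord` and `read_fastq` are hypothetical; production pipelines rely on established parsers and aligners rather than code like this.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO

class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]

def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record starting with {header!r}")
        # Phred+33: score = ASCII code - 33; error probability = 10 ** (-score / 10)
        yield FastqRecord(header[1:], sequence, [ord(ch) - 33 for ch in quality])

example = io.StringIO("@read1\nACGT\n+\nII#5\n")
for record in read_fastq(example):
    print(record.name, record.sequence, record.qualities)  # read1 ACGT [40, 40, 2, 20]
```

A quality score of 40 corresponds to an estimated 1-in-10,000 chance that the base call is wrong, while a score of 2 marks a base that is very likely wrong; quality control steps use these values to trim or filter reads before alignment.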
Custom data analysis in R and Python makes up most of our day-to-day work. This can mean running a package to analyze a specific type of data, generating data visualizations in R or Python, making interactive plots, or developing R/Shiny applications that let users interact with their data through a graphical interface.
News
20 February 2024
The Computational Biology Scholars Program seeks to create new possibilities to perform cutting-edge, collaborative, and multidisciplinary computational science in the Midwest.
Press Release
12 May 2022
New research examines how cavefish developed unique metabolic adaptations to survive in nutrient-scarce environments.
Bhattacharya S, Levy MJ, Zhang N, Li H, Florens L, Washburn MP, Workman JL. Nat Commun. 2021;12:1443. doi: 10.1038/s41467-021-21663-w.
Peuß R, Box AC, Chen S, Wang Y, Tsuchiya D, Persons JL, Kenzior A, Maldonado E, Krishnan J, Scharsack JP, Slaughter BD, Rohner N. Nat Ecol Evol. 2020;4:1416-1430.
Translation of small downstream ORFs enhances translation of canonical main open reading frames.
Wu Q, Wright M, Gogol MM, Bradford WD, Zhang N, Bazzini AA. EMBO J. 2020;39:e104763. doi: 10.15252/embj.2020104763.
Zeng A, Li H, Guo L, Gao X, McKinney S, Wang Y, Yu Z, Park J, Semerad C, Ross E, Cheng LC, Davies E, Lei K, Wang W, Perera A, Hall K, Peak A, Box A, Sánchez Alvarado A. Cell. 2018;173:1593-1608.e20.
Set2 methylation of histone H3 lysine 36 suppresses histone exchange on transcribed genes.
Venkatesh S, Smolle M, Li H, Gogol MM, Saint M, Kumar S, Natarajan K, Workman JL. Nature. 2012;489:452-455.
Qian P, De Kumar B, He XC, Nolte C, Gogol M, Ahn Y, Chen S, Li Z, Xu H, Perry JM, Hu D, Tao F, Zhao M, Han Y, Hall K, Peak A, Paulson A, Zhao C, Venkatraman A, Box A, Perera A, Haug JS, Parmely T, Li H, Krumlauf R, Li L. Cell Stem Cell. 2018;22:740-754 e747.