Abstract: The DOE Systems Biology Knowledgebase (KBase) is a platform designed to address the grand challenges of systems biology. KBase implements bioinformatics tools that support multiple workflows, including genome annotation, comparative genomics, and metabolic modeling. We selected a phylogenetically diverse set of approximately 1600 genomes and constructed draft genome-scale metabolic models (GEMs) for them using the ModelSEED pipeline implemented in KBase. We then used this set of genomes as a test set to improve the quality of the models produced by ModelSEED.
First, we updated our biochemistry database to include reaction data from KEGG, MetaCyc, BiGG, and published models. To reconcile pathway representations across these databases, we manually curated pathways before including them in our reconstruction templates. Next, we curated our mapping of RAST functional roles to biochemistry by reconciling it with data mined from KEGG and published metabolic models. We corrected errors in our reaction reversibility assertions to improve overall model constraints, and we refined our gap-filling procedure to prevent draft models from over-producing ATP. We show how these pipeline improvements increase the number of gene associations, decrease the number of gap-filled reactions, improve the accuracy of growth and ATP production yield predictions, and decrease the number of blocked reactions across all models. Finally, we selected 17 genomes for which comprehensive Tn-seq data are available and compared model predictions with all of the experimental results, showing a significant improvement in accuracy over models generated by the original ModelSEED.
The improvements listed above will be available as an update to our reconstruction pipeline, ModelSEED release 2.
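
As an illustration of the kind of accuracy comparison described above, the following minimal sketch scores gene essentiality predictions from two model versions against experimental calls. It is not the authors' evaluation code: the function name `essentiality_accuracy`, the gene identifiers, and all values are hypothetical toy data chosen only to show the calculation.

```python
def essentiality_accuracy(predicted, observed):
    """Return the fraction of shared genes whose predicted essentiality
    (True = essential) matches the experimental call."""
    shared = predicted.keys() & observed.keys()
    if not shared:
        raise ValueError("no genes in common between predictions and data")
    correct = sum(1 for gene in shared if predicted[gene] == observed[gene])
    return correct / len(shared)


# Hypothetical toy data, not taken from the study.
observed = {"geneA": True, "geneB": False, "geneC": True, "geneD": False}
original_model = {"geneA": True, "geneB": True, "geneC": False, "geneD": False}
updated_model = {"geneA": True, "geneB": False, "geneC": True, "geneD": True}

original_acc = essentiality_accuracy(original_model, observed)
updated_acc = essentiality_accuracy(updated_model, observed)
print(f"original: {original_acc:.2f}, updated: {updated_acc:.2f}")
```

Simple accuracy is used here only for brevity; with imbalanced essential and non-essential gene sets, a measure that accounts for class balance may be more informative.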