Hayai-Annotation: A GUI R-Package for an Ultrafast Gene Annotation System in Plants
Abstract: The main goals in plant science and breeding are to understand biological systems, to describe patterns of evolution and diversity, and to increase crop productivity and quality by improving biotic and abiotic stress tolerance. It is critical for molecular biologists and breeders to understand the gene profiles of genomes broadly and accurately. Because recent advances have made genome sequencing faster and cheaper, even for crops with complex genomes and high ploidy levels, a high-throughput and especially fast annotation workflow is required.
In this study, we propose Hayai-Annotation, a GUI R-package that provides an automated, ultrafast, and accurate gene annotation system for plant species (model and non-model organisms). The workflow is based on sequence similarity searches using USEARCH against a database of UniProtKB entries restricted to the taxonomy Embryophyta (plants). Hayai-Annotation uses UniProtKB's complete set of protein information to provide five levels of annotation: gene name; gene ontology (GO), consisting of three main categories (biological process, molecular function, and cellular component); enzyme commission (EC) code; protein existence level; and evidence type.
Hayai-Annotation was used to compare the annotations of five plant species (sweet cherry, peach, strawberry, fig, and Arabidopsis) with regard to the distribution of genes for each GO term (gene level and parental level) and each EC code. We concluded that Hayai-Annotation is an ultrafast tool for detecting differences that arise from particularities of gene prediction methodology, such as the presence of transposons and retrotransposons in fig.
Additionally, we observed an increased number of genes per GO term in Arabidopsis compared with the other studied species, particularly for fast-evolving genes. The tool also detected a larger number of genes related to disease resistance in sweet cherry and peach than in strawberry and fig. Finally, the results suggest that the pattern of defense response may differ between Arabidopsis and the other studied species.
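The cross-species comparison described above comes down to counting, for each species, how many genes are annotated with each GO term. The following Python fragment is a minimal sketch of that counting step only. It is not Hayai-Annotation's own code (the package is written in R), and the function name `genes_per_go_term`, the species labels, and the gene identifiers are hypothetical placeholders rather than data from the study.

```python
from collections import Counter


def genes_per_go_term(annotations):
    """Count how many distinct genes carry each GO term.

    `annotations` maps a gene ID to an iterable of GO term IDs.
    This input layout is an assumption made for illustration.
    """
    counts = Counter()
    for gene_id, go_terms in annotations.items():
        # set() ensures a term listed twice for one gene is counted once
        counts.update(set(go_terms))
    return counts


# Hypothetical toy input, not results from the study.
toy = {
    "species_A": {
        "geneA1": ["GO:0006952", "GO:0005524"],
        "geneA2": ["GO:0006952", "GO:0006952"],
    },
    "species_B": {
        "geneB1": ["GO:0005524"],
    },
}

per_species = {name: genes_per_go_term(ann) for name, ann in toy.items()}

# Compare one GO term across species; a missing term counts as zero.
for name, counts in per_species.items():
    print(name, counts["GO:0006952"])
```

Because a `Counter` returns zero for absent keys, GO terms that are missing from one species can be compared directly against species in which they are present.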