Global and Local Approach of Part-of-Speech Tagging for Large Corpora
We present Global-Local POS tagging, a framework for training generative stochastic Part-of-Speech models on large corpora. Global Taggers offer several advantages over their counterparts trained on small, curated corpora, including the ability to automatically extend and update their models as new text arrives. Global Taggers also avoid a fundamental limitation of current models, whose performance relies heavily on curated text with manually assigned labels. We illustrate our approach by training several Global Taggers, implemented with generative stochastic models, on two large corpora using a high-performance computing architecture. We further demonstrate that Global Taggers can be improved by incorporating models trained on curated text, called Local Taggers, yielding better tagging performance on specific topics.
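The abstract does not specify the particular generative stochastic models used, but a bigram hidden Markov model with Viterbi decoding is the standard instance of this family. The sketch below is an illustration only, not the paper's implementation: it trains two HMMs (a "global" one on a large corpus, a "local" one on curated text) and combines them by linear interpolation with a hypothetical mixing weight `lam`; the add-alpha smoothing constant is likewise an assumption.

```python
import math
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from (word, tag) sentences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def smoothed(table, ctx, x, vocab, alpha=0.1):
    """Add-alpha smoothed P(x | ctx) over a vocabulary of size `vocab`."""
    counts = table[ctx]
    return (counts[x] + alpha) / (sum(counts.values()) + alpha * vocab)

def make_scorer(glob, loc, tagset, vocab, lam=0.7):
    """Interpolate global and local HMMs: lam * P_global + (1-lam) * P_local.
    `lam` is a hypothetical mixing weight, not a value from the paper."""
    g_trans, g_emit = glob
    l_trans, l_emit = loc
    def score(prev, tag, word):
        t = lam * smoothed(g_trans, prev, tag, len(tagset)) \
            + (1 - lam) * smoothed(l_trans, prev, tag, len(tagset))
        e = lam * smoothed(g_emit, tag, word, vocab) \
            + (1 - lam) * smoothed(l_emit, tag, word, vocab)
        return t * e
    return score

def viterbi(words, tagset, score):
    """Standard Viterbi decoding with a pluggable probability scorer."""
    V = [{t: (math.log(score("<s>", t, words[0])), None) for t in tagset}]
    for w in words[1:]:
        col = {}
        for t in tagset:
            p, prev = max((V[-1][q][0] + math.log(score(q, t, w)), q)
                          for q in tagset)
            col[t] = (p, prev)
        V.append(col)
    tag = max(tagset, key=lambda t: V[-1][t][0])
    path = [tag]
    for col in reversed(V[1:]):  # follow back-pointers
        path.append(col[path[-1]][1])
    return path[::-1]
```

A usage example: a local model trained on a small domain corpus lets the interpolated tagger handle domain words (e.g. "gene") the global model never saw, which is the intuition behind combining Global and Local Taggers.

```python
global_corpus = [[("the", "DT"), ("dog", "NN"), ("barks", "VB")],
                 [("a", "DT"), ("cat", "NN"), ("sleeps", "VB")]]
local_corpus = [[("the", "DT"), ("gene", "NN"),
                 ("expresses", "VB"), ("protein", "NN")]]
tagset = ["DT", "NN", "VB"]
score = make_scorer(train_hmm(global_corpus), train_hmm(local_corpus),
                    tagset, vocab=9)
print(viterbi(["the", "gene", "barks"], tagset, score))
```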
Shi Yu's main research interests are cloud computing, biomedical text mining, computational linguistics, parametric and non-parametric statistical learning methods, consensus learning and data integration, and crowdsourcing and game-based machine learning theory. He has been working in the cross-disciplinary areas of these topics. He obtained a Bachelor's degree in Mechanical and Electrical Engineering at China Textile University, and a Master's degree in Artificial Intelligence and a Ph.D. in Electrical Engineering (main area in Bioinformatics) at the University of Leuven. He is now a postdoctoral scholar at the Institute for Genomics and Systems Biology, University of Chicago. He has published one book and more than 10 papers in the areas of machine learning, text mining, computational linguistics, and bioinformatics.