ABSTRACT:
SUBMITTER: Zvyagin M
PROVIDER: S-EPMC9709791 | biostudies-literature | 2022 Nov
REPOSITORIES: biostudies-literature
Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael Irvin, J. Gregory Pauloski, Logan Ward, Valerie Hayot-Sasson, Murali Emani, Sam Foreman, Zhen Xie, Diangen Lin, Maulik Shukla, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Ian Foster, James J. Davis, Michael E. Papka, Thomas Brettin, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan
bioRxiv: the preprint server for biology, 2022-11-23
We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify var ...