MIMt-18S: A new 18S database for taxonomic identification of eukaryotic metagenomic samples
ABSTRACT: MIMt-18S is a database of eukaryotic sequences drawn from Target Loci type material, RefSeq genomes and GenBank genomes.
To create the MIMt-18S database, we collected all the small subunit rRNA sequences from Target Loci and appended the 18S regions predicted with RNAmmer 1.2 from all available eukaryotic genomes in RefSeq.
The result is a complete database in which most sequences have been manually curated by RefSeq curators and are properly identified at species level, or even at subspecies or strain level. The full version of MIMt-18S additionally contains 18S sequences from the genomes of new species deposited in GenBank, always keeping the full 18S region and recording the exact species name so that the full taxonomic classification can be obtained.
Thus, all sequences included in both versions of MIMt-18S are full length and well identified at all taxonomic levels.
MIMt-18S has been trained for use in QIIME, and the trained classifier is also provided.
The database format is:
>SeqID K__kingdom;P__phylum;C__class;O__order;F__family;G__genus;S__Genus_species
CGCGACTACGACTACGCTCAGACGCATCGTACGCAGACTACGTCAGTCAGACGTCGCTGCTCGTCGTACGTACGCT
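As an illustration of the header format above, the following is a minimal Python sketch that splits one header line into a sequence identifier and a rank-to-name mapping. It assumes the identifier and the taxonomy string are separated by whitespace, which is not confirmed by this record; the function name `parse_header` and the example identifier in the usage note are hypothetical.

```python
# Map the one-letter rank prefixes used in the header to readable rank names.
RANK_NAMES = {
    "K": "kingdom", "P": "phylum", "C": "class", "O": "order",
    "F": "family", "G": "genus", "S": "species",
}


def parse_header(header):
    """Split a MIMt-18S style FASTA header into (seq_id, {rank: name}).

    Assumption: the sequence ID and the taxonomy string are separated
    by whitespace (a space or a tab).
    """
    seq_id, taxonomy = header.lstrip(">").strip().split(None, 1)
    ranks = {}
    for field in taxonomy.split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        prefix, _, name = field.partition("__")
        ranks[RANK_NAMES.get(prefix, prefix)] = name
    return seq_id, ranks
```

Called on a made-up header such as `>Seq1 K__Eukaryota;G__Homo;S__Homo_sapiens`, the function would return the identifier `Seq1` together with a dictionary keyed by `kingdom`, `genus` and `species`.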
A file containing only the taxonomy associated with each sequence is also available, in the format:
SeqID Full_taxonomy
Another file lists the species that share 100% of their 18S sequence, which classification programs therefore cannot tell apart.
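To make the idea of species sharing an identical sequence concrete, here is a rough Python sketch that groups FASTA records by exact sequence and reports the species names that collide. It is not the procedure used to build the provided file, which this record does not describe; it treats "sharing 100% of the sequence" as exact string identity of the full sequence, and the function names `read_fasta` and `species_sharing_sequences` are hypothetical.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def species_sharing_sequences(records):
    """Return sorted groups of species whose sequences are exactly identical.

    Assumption: the species name is whatever follows the last "S__"
    in the header, as in the database format shown above.
    """
    by_sequence = defaultdict(set)
    for header, sequence in records:
        species = header.rsplit("S__", 1)[-1].strip()
        by_sequence[sequence.upper()].add(species)
    return sorted(sorted(group) for group in by_sequence.values() if len(group) > 1)
```

With an open file handle `fh` for the FASTA file, `species_sharing_sequences(read_fasta(fh))` would return one list per group of species that cannot be separated by sequence alone.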
All files are available for both the curated-only version and the full version (which also includes predicted 18S regions from GenBank genomes).
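For the taxonomy-only file, a small Python sketch of loading it into a lookup table might look as follows. It assumes one record per line with the sequence identifier separated from the taxonomy string by whitespace, which this record does not confirm; `load_taxonomy` is a hypothetical helper name.

```python
def load_taxonomy(lines):
    """Map each sequence ID to its full taxonomy string.

    Assumption: each non-empty line holds a sequence ID, whitespace,
    and then the semicolon-delimited taxonomy.
    """
    taxonomy = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seq_id, lineage = line.split(None, 1)
        taxonomy[seq_id] = lineage
    return taxonomy
```

Because the function accepts any iterable of lines, it can be applied to an open file handle for either the curated-only or the full version of the taxonomy file.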
ORGANISM(S): Eukaryotes
SUBMITTER:
PROVIDER: S-BSST2009 | biostudies-other |
REPOSITORIES: biostudies-other