Monarch geneset OGS2.0

DPOGS205501
Transcript | DPOGS205501-TA | 2784 bp
Protein | DPOGS205501-PA | 927 aa
Genomic position | DPSCF300056 - 450093-461192
RNAseq coverage | 2632x (Rank: top 5%)
Annotation
Heliconius | HMEL011284 | 0.0 | 79.85%
Bombyx | BGIBMGA000132-TA | 0.0 | 71.92%
Drosophila | CG14476-PE | 0.0 | 48.89%
EBI UniRef50 | UniRef50_Q7KMM4 | 0.0 | 48.89% | BcDNA.GH04962 n=35 Tax=Coelomata RepID=Q7KMM4_DROME
NCBI RefSeq | XP_968738.2 | 0.0 | 52.82% | PREDICTED: similar to CG14476 CG14476-PB [Tribolium castaneum]
NCBI nr blastp | gi|307206462 | 0.0 | 52.51% | Neutral alpha-glucosidase AB [Harpegnathos saltator]
NCBI nr blastx | gi|383864889 | 0.0 | 51.59% | PREDICTED: neutral alpha-glucosidase AB-like [Megachile rotundata]
Group
Gene Ontology | GO:0004553 | 0 | hydrolase activity, hydrolyzing O-glycosyl compounds
Gene Ontology | GO:0005975 | 0 | carbohydrate metabolic process
Gene Ontology | GO:0030246 | 1.1e-19 | carbohydrate binding
Gene Ontology | GO:0003824 | 1.1e-19 | catalytic activity
KEGG pathway | tca:657174 | 0.0
KEGG pathway | K05546 (GANAB) | maps to: Protein processing in endoplasmic reticulum; N-Glycan biosynthesis
InterPro domain | [1-928] | IPR000322 | 0 | Glycoside hydrolase, family 31
InterPro domain | [343-704] | IPR017853 | 1.7e-78 | Glycoside hydrolase, superfamily
InterPro domain | [28-342] | IPR011013 | 1.1e-19 | Glycoside hydrolase-type carbohydrate-binding
Orthology group | MCL10634 | Multiple-copy universal gene

Nucleotide sequence:

>DPOGS205501-TA
ATGAAGACGCTGAGTCTTCTTCTGGTGGTTGCACTATCGATTATCAGTAGCTTGGCTGTAGATAGAAACAACTTTAAAACTTGCGAACAGTCGGGCTTTTGTAAGCGGCTTCGGCCATTCAAGTCTGAAAAATCACAGTATGCCTTGAACTTGGATACAGTGATGGTACATGGGAATGTACTGGGAGCGGAAGTCGTCACCCAGGATAATCAAGGAGAGAAAAATAACGTTCTGTGGCGTTATACGCTTAAACTATCAGCTCTCGTCGACGGAACCTTCAGGGTTGAGTTGGATGAATCTGAACCTCTATATCCGAGGTATAGGACACAGTTAGCCCTCGACGGAGAACCGAAAGAAGATAGTCTGAAACTGATATCCAATGAGAGCGGTAAACTGACGGTGGTCAACAGTCAAGGTCATAAGGTCATTATAACAGCTGACCCCTTGAAGTTTGAGTTCTACAACAAGAACGGTGACCTCGCCGTAGTACTGAATGACAACAACCAGCTGATAGTCGAACCGCTGCGGGTGAAGAGGGAGAAGATTGGTGATGATGATGAAGCAGCTGCAGTTGAGGAGGATGAAGGTGCTTGGAGTGAAAATTTCAAATCTCATCACGATAGCAAACCGAGAGGCAACGAGGCTGTGTCCCTGGACGTGGCCTTCCCTGACGCTAACCAGGTTTACGGTATCCCACAACACACGGATAACTTCTATCTCAAGACCACGACGTCCGGTGAGCCCTACCGTTTGTATAACTTGGATGTCTTCGAGTATGAGTTAGACAGTCGCATGGCTATATACGGCGCTGTGCCCGTCCTGTACTCACACAGTAAGCGTCACAGTGCCGGTGTGTTCTGGCACAACTCGGCTGAAACGTGGGTGGATGTGGTGAACTACGCTGACGAAACAGTGGTGTCCTCTCTCGTGAACCTGGTGACTGGGGGGAGGAAGACCAGGGTGGACGCCAGGTTCATGAGTGAGTCCGGTGTGATAGACGTGTTCGTGTTGCTCGGAGACAAGCCCTCCGACGTGTTTAGACAGTACACCAGGCTGACGGGAGTGGCGCCGCTACCGCCGAAATTCTCTCTGGCGTACCATCAATCAAGATGGAACTACGCTGATGAAAACGAAGTGAGGTCTGTGGACGAGGGATTCGACGCAAATGATATACCCGCGGACGTTATCTGGCTGGACATTGAATATACGGATAGGAAAAAATATTTCACCTGGGACCCGGAGAAGTTTGCTCACCCAGCCGAGATGGTGGCGAATCTGACTGCCAAGGGTAGAAAACTGGTGGTCATCATAGACCCGCACATCAAGAGGGAGGCCGGGTACTTCCTGCACGAGGACGCCACCGAGCAAGGGCTATACGTCAAGAACAAGGACGGGAATGACTATGAGGGTTGGTGTTGGCCGGGGTCGTCTTCGTACCTCGACTTCTTCAACCCTAAAGTCATGGATTACTACGTCAAGAGGTATCAGTTCGATAACTTCCCGGGGACCAGCAAGGATGTGCACATATGGAACGATATGAACGAACCTAGTGTATTCAATGGACCGGAAATAACAATGCCAAAGGACTGTCGCCACTACAAACCACCTCAAGACGGACATGACGGTCTCGCGTCTTTCTGGGAACACAGACACGTCCACAACGAGTACGGCCTGTTCCACATCAGCGGCACCCACCAGGGCGTCTTGGATAGGGCGGGCGGGAGATACAGGCCTTTCGTACTAACTCGCTCCACCTTCGCCGGCACACAGCGGTACGCCGCAGTGTGGACCGGCGACAACTCAGCGGAGTGGGGTTTCTTGGAGGCGTCGGTGAGGATGTGCGTGTCGCTGGCGGCGAGCGGCATCAGTCACTGCGGATCGGACGTCGGCGGGTTCTTTAAGTACCCTGAGGAAGAGCTCATGACGAGGTGGTACCAGGCCGCCGCGTATCAACCGTTCTTCCGAGCTCACTCTCACATAGAAACCAAGAGACGGGAGCC
CTGGCTGTACCCGGCCGCCACCATGGGCAGGATAAGAGACGCGGTCAGACGGAGATACGCCTTGCTGGACTTTTGGTACACGTTGTTCTACGAGCACTCGGTAGACGGTCTACCAGTCATGAGACCATTGTTCCAGGAATTCCCGGAGGAGGAAGAGACGTTCACTATAGATGACACATATCTGTTGGGCGATCGCTTGCTAGTAAGGCCGGTGTTGTCAGAGGGCGCCACTAGTGTTAAAGTTTATTTCCCCGGAAAGGATTCCAAGACACTGTGGTATGATACAGATTCATATCAGGCATACCCCGGAAACGGATACACTACCATCGATGTCAACATAGCCAAGACTCCGGTGTACCAGCGAGGCGGCACAGTGATCTTCCGCAAGGAGAGGGTCAGGCGAGCATCCCCACTCATGGCGGACGACCCTTACACTGTAGTGGTGACGCTCGACCAACAGAACACGGCGCGCGGCTCGCTGTACATCGACGACGGGGAAACGTACGAGTACACGAAGCACAAGTACACGTACGGGCGACTCGCGTACTCCGCGGACAGGATGGCCTACACGTTCATAGACAAGAACGCACATTACCCGACGCGTTCGTGGGTGGAGCGTATAGTCATAGCGGGTATTAAGAACCCACCGAAATCGGCCAAACTCGTCCAGGACGGTAAAGTCACGCCGCTGCAGATGACCTTGCACCGGGGCAACGACGTGCTGGTGGTGAGGAAACCAGCCGCCGCCATGGCCAAGGAGTGGGAAATACAATTCACATATTAA

Protein sequence:

>DPOGS205501-PA
MKTLSLLLVVALSIISSLAVDRNNFKTCEQSGFCKRLRPFKSEKSQYALNLDTVMVHGNVLGAEVVTQDNQGEKNNVLWRYTLKLSALVDGTFRVELDESEPLYPRYRTQLALDGEPKEDSLKLISNESGKLTVVNSQGHKVIITADPLKFEFYNKNGDLAVVLNDNNQLIVEPLRVKREKIGDDDEAAAVEEDEGAWSENFKSHHDSKPRGNEAVSLDVAFPDANQVYGIPQHTDNFYLKTTTSGEPYRLYNLDVFEYELDSRMAIYGAVPVLYSHSKRHSAGVFWHNSAETWVDVVNYADETVVSSLVNLVTGGRKTRVDARFMSESGVIDVFVLLGDKPSDVFRQYTRLTGVAPLPPKFSLAYHQSRWNYADENEVRSVDEGFDANDIPADVIWLDIEYTDRKKYFTWDPEKFAHPAEMVANLTAKGRKLVVIIDPHIKREAGYFLHEDATEQGLYVKNKDGNDYEGWCWPGSSSYLDFFNPKVMDYYVKRYQFDNFPGTSKDVHIWNDMNEPSVFNGPEITMPKDCRHYKPPQDGHDGLASFWEHRHVHNEYGLFHISGTHQGVLDRAGGRYRPFVLTRSTFAGTQRYAAVWTGDNSAEWGFLEASVRMCVSLAASGISHCGSDVGGFFKYPEEELMTRWYQAAAYQPFFRAHSHIETKRREPWLYPAATMGRIRDAVRRRYALLDFWYTLFYEHSVDGLPVMRPLFQEFPEEEETFTIDDTYLLGDRLLVRPVLSEGATSVKVYFPGKDSKTLWYDTDSYQAYPGNGYTTIDVNIAKTPVYQRGGTVIFRKERVRRASPLMADDPYTVVVTLDQQNTARGSLYIDDGETYEYTKHKYTYGRLAYSADRMAYTFIDKNAHYPTRSWVERIVIAGIKNPPKSAKLVQDGKVTPLQMTLHRGNDVLVVRKPAAAMAKEWEIQFTY-
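
The transcript and protein records above can be cross-checked against each other. The following is a minimal sketch, assuming the standard genetic code and that the trailing "-" in the protein record marks the stop codon; the helper name `translate` is hypothetical and not part of the database. It translates only the first and last few codons of DPOGS205501-TA, copied from the nucleotide record, and checks that the stated lengths (2784 bp, 927 aa) agree.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((x, y, z) for x in BASES for y in BASES for z in BASES), AMINO
    )
}


def translate(cds: str, stop_symbol: str = "-") -> str:
    """Translate a coding sequence codon by codon (hypothetical helper).

    Stop codons are written as `stop_symbol` to match the "-" that ends
    the protein record. Trailing bases that do not fill a codon are ignored.
    """
    cds = cds.upper()
    residues = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        residues.append(stop_symbol if aa == "*" else aa)
    return "".join(residues)


# First 24 and last 12 bases of DPOGS205501-TA, copied from the record.
first_codons = "ATGAAGACGCTGAGTCTTCTTCTG"
last_codons = "TTCACATATTAA"

n_term = translate(first_codons)
c_term = translate(last_codons)

# Length check: 927 residues plus one stop codon should account for 2784 bp.
transcript_bp = 2784
protein_aa = 927
lengths_agree = transcript_bp == 3 * (protein_aa + 1)
```

Passing the full nucleotide sequence (with the line break removed) to the same function should reproduce the whole protein record, provided the assumptions above hold.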