Monarch geneset OGS2.0

DPOGS202361
Transcript: DPOGS202361-TA; 2754 bp
Protein: DPOGS202361-PA; 917 aa
Genomic position: DPSCF300104 - 186667-189420
RNAseq coverage: 125x (Rank: top 57%)
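The three lengths in this record can be cross-checked with simple arithmetic. The sketch below assumes the genomic coordinates are 1-based and inclusive, and that the 917 aa protein length excludes the stop codon; neither convention is stated in the record, and the variable names are illustrative only.

```python
# Values copied from the record above.
transcript_bp = 2754
protein_aa = 917
start, end = 186667, 189420

# Assumption: coordinates are 1-based and inclusive.
span_bp = end - start + 1

# Assumption: the protein length excludes the stop codon,
# so the coding sequence has one extra codon.
coding_bp = (protein_aa + 1) * 3

print(span_bp, coding_bp, transcript_bp)
```

Under these assumptions all three values agree, which would be consistent with a gene model whose genomic span contains no introns.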
Annotation
Heliconius: HMEL002893; 0.0; 64.18%
Bombyx: BGIBMGA013995-TA; 0.0; 58.70%
Drosophila: CG14476-PE; 8e-16; 23.44%
EBI UniRef50: UniRef50_C0X659; 0.0; 56.20%; Glycosyl hydrolase n=34 Tax=cellular organisms RepID=C0X659_ENTFA
NCBI RefSeq: XP_002161972.1; 2e-36; 31.21%; PREDICTED: similar to alpha glucosidase II alpha subunit [Hydra magnipapillata]
NCBI nr blastp: gi|227519011; 0.0; 56.20%; glycosyl hydrolase [Enterococcus faecalis TX0104]
NCBI nr blastx: gi|307287998; 0.0; 56.16%; LPXTG-motif protein cell wall anchor domain protein [Enterococcus faecalis TX0109]
Group
Gene Ontology: GO:0004553; 4.8e-117; hydrolase activity, hydrolyzing O-glycosyl compounds
    GO:0005975; 4.8e-117; carbohydrate metabolic process
    GO:0030246; 1.4e-27; carbohydrate binding
    GO:0003824; 1.4e-27; catalytic activity
KEGG pathway: dtu:Dtur_0650; 7e-43
    K01187 (E3.2.1.20, malZ) maps-> Starch and sucrose metabolism
        Galactose metabolism
InterPro domain: [413-796] IPR000322; 4.8e-117; Glycoside hydrolase, family 31
    [1-224] IPR011013; 1.4e-27; Glycoside hydrolase-type carbohydrate-binding
    [421-558] IPR017853; 1.2e-23; Glycoside hydrolase, superfamily
Orthology group: MCL26161 Lepidoptera specific

Nucleotide sequence:

>DPOGS202361-TA
ATGATAGGTGCTGTGAAAAGCATCACAAAGGTCACTAAGTACTACCAAATTAATTTCTCGACTGGCGAGGAAGCGAGGTTGTATGTCCTCAATGATCATGTTTTCAGATACTACGTGTCACCCAAAGGAATCTTTCTAGACTATCCGGAACCCATGAACCCAGAACATGAAGCCAGAATCGTTTACAAACACGAAGACGCATACGGCTTACAAGCATTCAAAGAGTCCACTTTAAAAGACGACGATTCCCGTTACATCATAGAAACTAAAGATGTGAAAATTATATTTCATAAAACGCTTGGTACTATGGAGGTACACGATTTGAGAAGGGGCAAGGAAGTATTCGGCGAATTAAGGCCTTTGTCGTATAAAAATTGTCACTCTATGCAAACCCTTCGTCAGAGACGAGATGAGTATTTTTTTGGGGGCGGCATGCAGAACGGCAGATTCACCCACAAAGGAGAAGTCATAGAGATAGTCAACACCAACAAGTGGAACGACGGAGACGTCGCTTCGCCTTGCCCATTTTATTGGTCCTCGTCCGGCTATGGCGCACTGAGAAACACTTTTCGACCAGGCGAATACGATTTTGGGATAAAATCTATGAGTTACATAGAAACAACTCATAACGGCGTAGATTTTGATGCGTTCTACTTCATAAATGAACTGCCGAGAGACATTTTAAACGATTACTACGAACTAACAGGGAAACCGATACTGTTGCCAGAATACGCGTATTATGAGGCACATTTGAACGCGTTCAATCGAGATTATTGGGTCAAAGTAACATCCGACATAAATGGCGCCATACTATTCGAAGACGGACAATACTATAAATGCTTTCAGCCAAATCAAATCGGTGACAAGACGGGGATTTTGGAGTCTTTGAATGGAGACGAGAATAATTATCAATTTTCCGCCCGCGCTATGATAGATAGATATAAAAAGCACGATTTACCGCTGGGCTGGTTTATACCCAACGACGGCTACGGCTCAGGATATGGACAGACTGATTCTATGGATGGTGACATACAGAATTTGAAACGTTTCTCTGATTACGCGTTACAAAATGGCGTAGAGTGTGCTCTGTGGACGGAGAGCAACCTGACACCCAAGGATCCTTTGAACCCAAAAAAGGGAGAAAGAGATTTATCCAAAGAGGTGGGAATCGCTAATGTGGTAGCATTAAAATGCGACGTAGCCTGGGTGGGCAGCGGATATTCGTTCGGACTCTCAGCTATAGAGAACGCGACGGACATATTCGTTAAAAGCACGAGAAATAATGTCAGACCCTTCATCATCATGGTGGATGGATGGGCTGGATGTCAACGCTATGCGGGTATATGGAGTGGAGACCAGAAGGGCGGGGAGTGGGAGTACATAAGGTTTCATATACCGACTTATATCGGTGCTGGGCTCTCTGGAATACCGCTCGTCGGTTCAGATATGGATGGCATCTACGCCGGGGGTGATAAGGAAATTAACATACGAGAATATCAATGGAAAACTTTCACCCCGATACAACTTAACATGGATGGGTGGGGTCACGTACAAAAGACGCCATTCACGTTCGGCGAGGAAGCGACATACATAAACAGGGGATATTTGAAATTAAAATCAATGTTGATCCCATACAACTACAGCATTGGCTATGAATCGATCCACGGCCTGCCCATGGTTAGGGCCATGTTCTTAGAATATCCAGGAGAAGTAACGGCTTACACTTTAGAATCCCAATACCAGTACATGTGGGGTCCGAATATTTTGGTTGCTCCTATATACAGCGGCGAGAAATTAGGCAAAGACTCACTACGTGATGGAATCTACTTGCCAGATTCCAATCAGATATGGATAGACTTTCTAACCGGCGAGAAATATCAAGGGGGGAAGATTTACAACAACATTGTATCACCTTTATGGAAAATACCTGTTTTTGTTAAGGACGGCGCCATAATACCAACCACAAATCCGAACAACAATCCTTACGAAATAAAACGAGA
TCTTAGAGTATTCACCGTGTATCCCAACGGCACGTCTAGCTTTATCGTATACGAAGACGATGGAATTACGTCCGATTATCTAAAAGGTTCGTACGCCACGACCAAAATCCATGCCAGTGGTCCTGTGTCTAATAAAAATGGGGATTTAATTATAAAAATACACAAAACTAAAGGACACTATAAGAACATTGTGAAGGAAAGACGTACTTTGATACAAGTAATGTGCTCCAGAGCGGTCGGACGGATAAAGGTATCAGCCAACGAAAAATCTATCAGATTAAAGAAAGTTCGTAACTCAGACGAGTTCGCTAACAACGACGACTGTTTCTATCACGATGAGAATTTCCAATTCAACCCCTACCTTAAAAACTATGCCGAGACGAAGCAAAAATTCCTACTAATAAAACTTAGTAAATTAGACGTCACAGCGTGTGAGATAATCATAAAGATCAAAGATGTATCTAATAAAAGTGCTGTTTACGGCAAAATTGATGTCGACAACGGAATTGAAGTCCCGAAAAATGTTAAAGAAGTGGAACATGGGACCTCAACCATAGGTTTGCAATGGGATAACAGCAATTACGATTATAACGAAGTAGAAAAAGACGGTGTTATATACACAAATATAAAAAACAATTCGTTCATATTTAATCATGCGGACGGCGTACATGAATGTAGAGTCCGTTCAGTAGTAGGCGTAAAAGCATCTAAATGGAGTGAGAAAGTAATTTTAAATAATCAGAATGTACTATAA

Protein sequence:

>DPOGS202361-PA
MIGAVKSITKVTKYYQINFSTGEEARLYVLNDHVFRYYVSPKGIFLDYPEPMNPEHEARIVYKHEDAYGLQAFKESTLKDDDSRYIIETKDVKIIFHKTLGTMEVHDLRRGKEVFGELRPLSYKNCHSMQTLRQRRDEYFFGGGMQNGRFTHKGEVIEIVNTNKWNDGDVASPCPFYWSSSGYGALRNTFRPGEYDFGIKSMSYIETTHNGVDFDAFYFINELPRDILNDYYELTGKPILLPEYAYYEAHLNAFNRDYWVKVTSDINGAILFEDGQYYKCFQPNQIGDKTGILESLNGDENNYQFSARAMIDRYKKHDLPLGWFIPNDGYGSGYGQTDSMDGDIQNLKRFSDYALQNGVECALWTESNLTPKDPLNPKKGERDLSKEVGIANVVALKCDVAWVGSGYSFGLSAIENATDIFVKSTRNNVRPFIIMVDGWAGCQRYAGIWSGDQKGGEWEYIRFHIPTYIGAGLSGIPLVGSDMDGIYAGGDKEINIREYQWKTFTPIQLNMDGWGHVQKTPFTFGEEATYINRGYLKLKSMLIPYNYSIGYESIHGLPMVRAMFLEYPGEVTAYTLESQYQYMWGPNILVAPIYSGEKLGKDSLRDGIYLPDSNQIWIDFLTGEKYQGGKIYNNIVSPLWKIPVFVKDGAIIPTTNPNNNPYEIKRDLRVFTVYPNGTSSFIVYEDDGITSDYLKGSYATTKIHASGPVSNKNGDLIIKIHKTKGHYKNIVKERRTLIQVMCSRAVGRIKVSANEKSIRLKKVRNSDEFANNDDCFYHDENFQFNPYLKNYAETKQKFLLIKLSKLDVTACEIIIKIKDVSNKSAVYGKIDVDNGIEVPKNVKEVEHGTSTIGLQWDNSNYDYNEVEKDGVIYTNIKNNSFIFNHADGVHECRVRSVVGVKASKWSEKVILNNQNVL-