Monarch geneset OGS2.0

DPOGS202008
Transcript        DPOGS202008-TA  1569 bp
Protein           DPOGS202008-PA  522 aa
Genomic position  DPSCF300053 - 1260983-1269511
RNAseq coverage   249x (Rank: top 42%)
Annotation
Heliconius      HMEL012825        0.0     72.94%
Bombyx          BGIBMGA002450-TA  2e-176  67.55%
Drosophila      CG9701-PA         6e-144  48.41%
EBI UniRef50    UniRef50_B2DBM6   0.0     71.07%  Similar to CG9701-PA (Fragment) n=4 Tax=Obtectomera RepID=B2DBM6_9NEOP
NCBI RefSeq     XP_001850321.1    1e-150  49.81%  glycoside hydrolase [Culex quinquefasciatus]
NCBI nr blastp  gi|183979247      0.0     71.07%  similar to CG9701-PA [Papilio xuthus]
NCBI nr blastx  gi|183979247      0.0     73.11%  similar to CG9701-PA [Papilio xuthus]
Group
Gene Ontology  GO:0004553  1.8e-229  hydrolase activity, hydrolyzing O-glycosyl compounds
               GO:0005975  1.8e-229  carbohydrate metabolic process
               GO:0043169  1.4e-169  cation binding
               GO:0003824  1.4e-169  catalytic activity
KEGG pathway  tca:664577  5e-132
              K05350 (bglB) maps-> Starch and sucrose metabolism
                                   Phenylpropanoid biosynthesis
                                   Cyanoamino acid metabolism
InterPro domain  [39-510] IPR001360  1.8e-229  Glycoside hydrolase, family 1
                 [38-504] IPR013781  1.4e-169  Glycoside hydrolase, subgroup, catalytic core
                 [38-517] IPR017853  1.2e-160  Glycoside hydrolase, superfamily
Orthology group  MCL10040  Multiple-copy universal gene
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS202008-TA
ATGCAGAGCAAGAGTAGAGTCACCCATAGTGCTGCCATGTTCGCGCTGTTTAGCCTGATGTTATTAACAAATGGAGTAAACGCGCTTATAAGCAGTAATGGGTTGAGCAACTATTCGTTTCCAGACAATTTTATTTTTGGAGTAGCGACGGCTGCATTTCAAATAGAAGGAGGTTGGAATGAGGGTGGTAAAGGTGAAAGCATGTGGGATACATATTTACACAAACACCCTAAATTCACGGTGGACCAATCGAACGGAGACGTGGCCGCCGATTCATATCACAAATACAAACAAGATTTAATAATGATCAAGTCAATCGAAGTAAAATATTACCGACTTTCAATATCATGGCCAAGAATATTACCACATGGAACTGACAACTACATCAGCAAAGATGGAGTTAGATACTATCGAAAGCTTTTCGAAGAACTAATAAATGCCAATATAACTCCCGTTGTGACACTGTATCATTGGGATATGCCAACAGCTCTAATGGATTTAGGCGGATGGACTAATCCCAAAATGGTGGATTACTTTGAGGACTACGCGAGAGTAGCGTTCACACTGTTCGGAGATATTGTGAAAACGTGGACCACTATGAACGAATTGCATCAACATTGCTTTAACGGCTATGGCGGTAATTTTTTCGTCCCTGCCCTAAAATCACATGGTGTTGGTGCATATTTATGTTCACATTACATGCTGTTGGCGCACGCACGAGCTTATCGGTTGTATGACAAGCAATTTAGACCACATCAGAAAGGAAAAGTTGGTATAACTTTAGACGCATTTTGGGCTGAACCTAAAGATTATAATAAAGAGGAAGATCATGAAGCAGCAGAACGGTATCTTCAGATGCATGTGGGTTTATTCGCTCATCCAATTTATTCAGACGAAGGAGACTATCCTCTTCTCGTTCGAAACAGGATTGATGATATGAGCCGCAATCAAGGTTTTGCCAGATCTCGATTACCATTTTTTACCCCTGAAGAAGTGGCCATGGTTCGAGGTAGTTCAGATTTCTTTGGCATCAATCACTACACCACATACTTAATGTCAAACTCATCTATGGAACCTGAATGGGTTATTCCCTCTGTGGACCATGACACTGGAGTAAAAATTGAACAGAGCAAAGAATGGCCTATACCAGGCGCCGAATGGCTCTCAGTTTATCCCCCCGGATTTCGAAAACTCATTAATTGGATAACCAAGAGTTATGGTAAAAGAGTGCCTATCATTGTAACAGAAAATGGGGTATCGGATTTCGGTGGTAAGAACGATTACTCTCGAGTGTCATATTTTAATAACTATTTGGAACAACTTTTATTGGCGATTCACGAAGACGGTTGTAATGTATCCGGATACTTCGCTTGGACTTTAATGGACGATTTTGAATGGAACGATGGATACAAGGTGAAATTTGGTCTATTTCACGTGGACTTCAACAGCCCGGGTAAAGAAAGGACTCCAAAATTATCAGCGCTCAATTACGGCGAAATAGTTCGCACGAGGCGAGTCAATTTCAACTACATAAAGATGCCATCGTATAAATATAATACTCTATTGTAA

Protein sequence:

>DPOGS202008-PA
MQSKSRVTHSAAMFALFSLMLLTNGVNALISSNGLSNYSFPDNFIFGVATAAFQIEGGWNEGGKGESMWDTYLHKHPKFTVDQSNGDVAADSYHKYKQDLIMIKSIEVKYYRLSISWPRILPHGTDNYISKDGVRYYRKLFEELINANITPVVTLYHWDMPTALMDLGGWTNPKMVDYFEDYARVAFTLFGDIVKTWTTMNELHQHCFNGYGGNFFVPALKSHGVGAYLCSHYMLLAHARAYRLYDKQFRPHQKGKVGITLDAFWAEPKDYNKEEDHEAAERYLQMHVGLFAHPIYSDEGDYPLLVRNRIDDMSRNQGFARSRLPFFTPEEVAMVRGSSDFFGINHYTTYLMSNSSMEPEWVIPSVDHDTGVKIEQSKEWPIPGAEWLSVYPPGFRKLINWITKSYGKRVPIIVTENGVSDFGGKNDYSRVSYFNNYLEQLLLAIHEDGCNVSGYFAWTLMDDFEWNDGYKVKFGLFHVDFNSPGKERTPKLSALNYGEIVRTRRVNFNYIKMPSYKYNTLL-
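
The reported transcript and protein lengths can be cross-checked against each other: a 522 aa protein plus one stop codon corresponds to 3 x (522 + 1) = 1569 bp. The following is a minimal Python sketch of that check, not part of the original record. The function names `parse_fasta` and `lengths_consistent` are hypothetical, the short `demo-TA`/`demo-PA` records are made-up stand-ins for the full sequences, and reading the trailing "-" of the protein record as a stop marker is an assumption.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record id: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token after '>'.
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def lengths_consistent(transcript_bp, protein_aa):
    """True if the transcript is exactly the coding sequence plus one stop codon."""
    return transcript_bp == 3 * (protein_aa + 1)


# Made-up miniature records in the same layout as the entries above.
example = """>demo-TA
ATGCAGAGC
AAGTAA
>demo-PA
MQSK-
"""

records = parse_fasta(example)
# Assumption: a trailing '-' marks the stop codon and is not an amino acid.
protein = records["demo-PA"].rstrip("-")
print(lengths_consistent(len(records["demo-TA"]), len(protein)))

# Lengths as reported in this record: 1569 bp transcript, 522 aa protein.
print(lengths_consistent(1569, 522))
```

To run the same check on the real entry, paste the two FASTA records above into one string and look them up under `DPOGS202008-TA` and `DPOGS202008-PA`.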