Monarch geneset OGS2.0

DPOGS200047
Transcript: DPOGS200047-TA (897 bp)
Protein: DPOGS200047-PA (298 aa)
Genomic position: scaffold DPSCF300365, + strand, 57565-59797
RNAseq coverage: 99x (rank: top 61%)
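The length and position fields can be cross-checked against each other. Below is a minimal Python sketch. It rests on two assumptions that the record does not state: the transcript length counts only the coding sequence through its stop codon, as the nucleotide sequence in this record does, and the genomic coordinates are 1-based and inclusive.

```python
# Values copied from this record.
TRANSCRIPT_BP = 897
PROTEIN_AA = 298
GENOMIC_START, GENOMIC_END = 57565, 59797  # scaffold DPSCF300365, + strand

# A complete reading frame must be a whole number of codons.
assert TRANSCRIPT_BP % 3 == 0
codons = TRANSCRIPT_BP // 3          # 299 codons
residues = codons - 1                # the last codon is the stop, which encodes no residue

# Assumption: coordinates are 1-based and inclusive, so add 1 to the difference.
genomic_span = GENOMIC_END - GENOMIC_START + 1
# Genomic bases in the span that do not appear in the transcript.
non_coding = genomic_span - TRANSCRIPT_BP

print(codons, residues, genomic_span, non_coding)  # 299 298 2233 1336
```

If both assumptions hold, the 1336 bp of the genomic span that are missing from the transcript would most likely be intronic sequence.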
Annotation
Heliconius: HMEL007370 (E-value 1e-84, identity 59.69%)
Bombyx: BGIBMGA013988-TA (E-value 2e-109, identity 59.11%)
Drosophila: CG5731-PA (E-value 2e-67, identity 40.98%)
EBI UniRef50: UniRef50_B7PDZ5 (E-value 1e-69, identity 38.15%) Alpha-D-galactosidase, putative n=3 Tax=Arthropoda RepID=B7PDZ5_IXOSC
NCBI RefSeq: NP_001040191.1 (E-value 3e-72, identity 43.85%) alpha-N-acetylgalactosaminidase [Bombyx mori]
NCBI nr blastp: gi|307213390 (E-value 6e-72, identity 41.25%) Alpha-N-acetylgalactosaminidase [Harpegnathos saltator]
NCBI nr blastx: gi|189238968 (E-value 2e-72, identity 43.92%) PREDICTED: similar to AGAP005846-PA [Tribolium castaneum]
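E-value and percent identity do not necessarily rank these hits the same way. The short Python sketch below uses values transcribed from the annotation table. It treats the unlabeled percentage column as percent identity, which is an assumption.

```python
# (source, hit ID, E-value, % identity) transcribed from the annotation table.
hits = [
    ("Heliconius", "HMEL007370", 1e-84, 59.69),
    ("Bombyx", "BGIBMGA013988-TA", 2e-109, 59.11),
    ("Drosophila", "CG5731-PA", 2e-67, 40.98),
    ("EBI UniRef50", "UniRef50_B7PDZ5", 1e-69, 38.15),
    ("NCBI RefSeq", "NP_001040191.1", 3e-72, 43.85),
    ("NCBI nr blastp", "gi|307213390", 6e-72, 41.25),
    ("NCBI nr blastx", "gi|189238968", 2e-72, 43.92),
]

# A smaller E-value means a match is less likely to have arisen by chance.
best_by_evalue = min(hits, key=lambda h: h[2])
# Identity is simply the fraction of identical aligned residues.
best_by_identity = max(hits, key=lambda h: h[3])

print(best_by_evalue[1], best_by_identity[1])  # BGIBMGA013988-TA HMEL007370
```

In this table the Bombyx hit has the smallest E-value, while the Heliconius hit has the highest identity. The two measures can disagree because E-values also depend on alignment length and database size.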
Group
Gene Ontology:
    GO:0008152 (E-value 1.6e-53) metabolic process
    GO:0003824 (E-value 1.6e-53) catalytic activity
    GO:0004553 (E-value 6.3e-28) hydrolase activity, hydrolyzing O-glycosyl compounds
    GO:0005975 (E-value 6.3e-28) carbohydrate metabolic process
KEGG pathway: aag:AaeL_AAEL005188 (E-value 8e-71)
    K01189 (GLA) maps to:
        Galactose metabolism
        Lysosome
        Glycerolipid metabolism
        Sphingolipid metabolism
        Glycosphingolipid biosynthesis - globo series
InterPro domain:
    [18-278] IPR017853 (E-value 1.1e-73) Glycoside hydrolase, superfamily
    [17-161] IPR013785 (E-value 1.6e-53) Aldolase-type TIM barrel
    [20-39] IPR002241 (E-value 6.3e-28) Glycoside hydrolase, family 27
    [51-132] IPR000111 (E-value 1.7e-15) Glycoside hydrolase, clan GH-D
Orthology group: MCL26158 (Lepidoptera specific)
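The InterPro ranges are residue positions in DPOGS200047-PA. The minimal Python sketch below shows how to read them. It assumes the ranges are 1-based and inclusive, which is the usual convention for InterPro match positions but is not stated in this record. The helper `domain_slice` is hypothetical, and the 40-residue prefix is copied from the protein sequence in this record.

```python
# First 40 residues of DPOGS200047-PA, copied from the protein sequence.
PROTEIN_PREFIX = "MYLCLFLFIVYLCGVNLLNNGLAQKPPMGWMSWGYYMCGV"
PROTEIN_LENGTH = 298

# InterPro matches from this record: accession -> (start, end).
DOMAINS = {
    "IPR017853": (18, 278),
    "IPR013785": (17, 161),
    "IPR002241": (20, 39),
    "IPR000111": (51, 132),
}


def domain_slice(seq: str, start: int, end: int) -> str:
    """Hypothetical helper: residues start..end, assumed 1-based and inclusive."""
    return seq[start - 1:end]


# Every match must lie inside the 298-residue protein.
assert all(1 <= s <= e <= PROTEIN_LENGTH for s, e in DOMAINS.values())

# Inclusive ranges contain end - start + 1 residues.
lengths = {ipr: end - start + 1 for ipr, (start, end) in DOMAINS.items()}

gh27 = domain_slice(PROTEIN_PREFIX, *DOMAINS["IPR002241"])
print(lengths)  # {'IPR017853': 261, 'IPR013785': 145, 'IPR002241': 20, 'IPR000111': 82}
print(gh27)     # NGLAQKPPMGWMSWGYYMCG
```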
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS200047-TA
ATGTATCTCTGTTTATTTTTATTTATTGTTTATTTATGTGGTGTGAATTTATTAAATAACGGCCTGGCACAGAAGCCTCCCATGGGCTGGATGTCATGGGGATACTATATGTGTGGTGTGGACTGTAAAAGGAATCCTCATAAGTGTCTTAATGAGGAATTAATACTATCAGTGGTTGACTCGTTTTACGATGAAGGGTATCAGGAAGCTGGGTACGAATATATCATTATTGATGATTGCTGGTCGGAGAGAATACGTGATAAAAATGGTCGTCTCGTACCAGACAGGACAAGGTTTCCGAGAGGCATGAAATTTATCGCTGATTATATACATGCTAGAGGTCTTAAATTTGGATTGTACACTAATGTAGCGGACACCACATGTATGGGTTATCCCGGTTCAAGAGATCATTTCGCCATCGACGCCAAACAGTTTGCGCAGTGGGAAATAGATTATCTTAAAGTCGATGGTTGTTTTGTCAGCGAAGAATATCTTAATATTGATTATAAAAGTGTGGCTGAATATTGCAACATGTGGCGGAATTACCACGATGTGGCTACATCATGGGAGGCAGTTAAGGCCATTATAACACACTATCAAGGGGTATATAACGATATTAATGGTTATCACGGACCAGGCCATTGGAATGATCCAGATATGTTAATATTTGGAACTAATTCGCTGTCGGAGAGTCAGAGCAGAGTACAAATATCAGTATACTCAATGCTTGCCGCACCGTTGTTGTTAAGCTGTGACATGAAAAATATTAATGATTACGAGAAACAGATGTTACTTAATTTAGATCTAATAGCGATAGGTGTTCTGCTAGACCCCTATTACCGACGGCCTTACGCGGTATGTCCTCAAGCATATACTGACTTCAGGATAGGAGGTTAA

Protein sequence:

>DPOGS200047-PA
MYLCLFLFIVYLCGVNLLNNGLAQKPPMGWMSWGYYMCGVDCKRNPHKCLNEELILSVVDSFYDEGYQEAGYEYIIIDDCWSERIRDKNGRLVPDRTRFPRGMKFIADYIHARGLKFGLYTNVADTTCMGYPGSRDHFAIDAKQFAQWEIDYLKVDGCFVSEEYLNIDYKSVAEYCNMWRNYHDVATSWEAVKAIITHYQGVYNDINGYHGPGHWNDPDMLIFGTNSLSESQSRVQISVYSMLAAPLLLSCDMKNINDYEKQMLLNLDLIAIGVLLDPYYRRPYAVCPQAYTDFRIGG-
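The protein sequence is the conceptual translation of the transcript. The minimal Python sketch below performs that translation on short stretches of the transcript. It assumes the standard genetic code, since the record does not name a translation table, and it uses only the first 22 and last 6 codons copied from the nucleotide sequence. The record writes the stop codon as "-", whereas this sketch uses "*".

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the third base varying fastest, matching itertools.product.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate in frame 1, codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


# 5' and 3' ends of DPOGS200047-TA, copied from the nucleotide sequence.
five_prime = "ATGTATCTCTGTTTATTTTTATTTATTGTTTATTTATGTGGTGTGAATTTATTAAATAACGGCCTG"
three_prime = "TTCAGGATAGGAGGTTAA"

print(translate(five_prime))   # MYLCLFLFIVYLCGVNLLNNGL
print(translate(three_prime))  # FRIGG*
```

These outputs agree with the start of DPOGS200047-PA (MYLCLFLFIVYLCGVNLLNNGL...) and its end (...FRIGG-). A codon containing an ambiguous base such as N would raise a KeyError in this sketch.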