Monarch geneset OGS2.0

DPOGS215886
Transcript	DPOGS215886-TA	2835 bp
Protein	DPOGS215886-PA	944 aa
Genomic position	DPSCF300029 - 96789-127808
RNAseq coverage	239x (Rank: top 43%)
Annotation
Heliconius	HMEL005325	0.0	94.33%
Bombyx	BGIBMGA000442-TA	0.0	76.21%
Drosophila	sfl-PB	0.0	62.35%
EBI UniRef50	UniRef50_Q9V3L1	0.0	62.35%	Bifunctional heparan sulfate N-deacetylase/N-sulfotransferase n=16 Tax=Bilateria RepID=NDST_DROME
NCBI RefSeq	XP_001603996.1	0.0	65.14%	PREDICTED: similar to heparan sulfate n-deacetylase/n-sulfotransferase [Nasonia vitripennis]
NCBI nr blastp	gi|156549989	0.0	65.14%	PREDICTED: bifunctional heparan sulfate N-deacetylase/N-sulfotransferase-like [Nasonia vitripennis]
NCBI nr blastx	gi|156549989	0.0	65.80%	PREDICTED: bifunctional heparan sulfate N-deacetylase/N-sulfotransferase-like [Nasonia vitripennis]
Group
Gene Ontology	GO:0016787	2.5e-215	hydrolase activity
	GO:0015016	2.5e-215	[heparan sulfate]-glucosamine N-sulfotransferase activity
	GO:0008146	1.4e-32	sulfotransferase activity
KEGG pathway	nvi:100120347	0.0
	K02577 (NDST2)	maps to: Glycosaminoglycan biosynthesis - heparan sulfate
InterPro domain	[48-527]	IPR021930	2.5e-215	Heparan sulphate-N-deacetylase
	[617-917]	IPR000863	1.4e-32	Sulfotransferase domain
Orthology group	MCL10385	Multiple-copy universal gene
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS215886-TA
ATGGCGGGCGAGGGACGCGGGTCGCGTGCCCAGCTGTTGGAGTGCGGCGACTATATGCATCCTCATAAAACTGCTACACCGCGCTGCTGTTTATGGCTAGCCAGCCATATCAATGTTAGGAAGTGTGTAGCCGGCGTTATGCTGCTATCAATACTCACAATATTTTTCTATACGTACTATGTAACGGCACCGATAACAAGTTTAGTGTGGCGCGATCGTGTACCGCGACCATTGTCACAATGCTCGCTACTGGCGTCTCAGCAACAGACAGCGCGCGACCATCGCTCAGACGCTCGACTCCGCATAGACGCTAAAGTTCTAGTTATAGCGGAGTCCCTGTATTCTAGACTTGGACGAGACATAGCCGAACTGCTTGTCGCTAATCGAATTAGGTACAAAGTAGAAGTAGCTGGTAAGAGTCTGCCAGTGCTTACCACTTTAGATAAGGGCCGTTATGGAGTTATCGTGTTCGAGTCGCTATCGAAATACGCGAACATGGATAAATGGAATCGTGAACTTCTCGATAAATACTGTCGAGAATACTCAGTTGGGGTCGTCGCTTTCGCAACACCGGGGGAGGAAAGCCTTGTTGGCGCTCAGCTGAGAGGATTTCCACTCTTCATGCATACCAATCTGAGGCTTAAGGATGCAGCCCTTAATCCAGCATCACCTGTACTACGACTTGCCCGAGCTGGTGAGACGGCCTGGGGTCCTCTACCAGGCGATCATTGGACCGTCTTCAGAGCCAACTCCTCAACATACGAACCAGTAGCATGGGCTCTAAGACAGAACGAGTACGGCTCCAACGAGGAACGTCTCCCTTTAGCGACTGTAGTTCAGGACCATGGTCGTTTGGACGGAGTACAGAGAGTGCTGTTTGGGTCTGGGCTTCAGTTTTGGCTTCATAGGATACTGTTCTTGGATGCTCTGAGCTACCTCAGCCACGGGCAGCTCAGCCTCAGCTTGGACAGATGGATACTCGTGGATATAGACGACATCTTCGTAGGAGAAAGAGGTACACGTCTCCACGTAGAGGATGTGTCAGCGTTACTGGCGTCTCAGACAGCCTTACAACGACTTGTCCCAGGCTTCAGGTTTAACCTTGGCTTCAGTGCCAAATATTATCACCACGGAACGCTACTAGAAAATTTGGGCGATGACGCGCTCTTAAAGAATAGAGAGCACTTTAACTGGTTCTGTCATATGTGGAATCACCAACAGCCTCATTTGTACAACAATGTGTCCCAACTCGAAGCCGAGATGACGTTGAACAAGCAATTTGCTCTGGAGCACGGTATTCCAACTAATTCGTGTTATTCGGTGTCGCCTCACCATTCTGGAGTGTATCCTGTCCACGAGCCATTGTATGAAGCTTGGAGGAAAGTGTGGGATGTCAAGGTCACCAGTACTGAAGAATATCCTCATCTACGACCAGCTAGATTGCGGCGCGGTTTCCGTCACCGCGGTGTTATGGTCCTACCACGTCAGACCTGTGGCCTTTTCACACATACTCTACTTCTGGAGCGGTATCCAGGAGGCAGGCAGCGTCTCGACCGCTCCATACAGGGCGGGGAGTTGTTCCAGACAGTTATTAACAACCCGATAAACGTGTTCATGACTCATATGTCAAACTACGGGAACGATCGTCTCGCGTTGTACACGTTTGAATCCGTCGTTAAGTTTCTGAGATGCTGGACGAATGTGCGTCTAGCCTCGGCGCCACCACTATCACTAGCCGAAAAATATTTCCAACTGAGACCAGACGAACTGAACCCACTATGGGGGAACCCATGTGATGACATCCGTCATAGAAAAATCTGGTCGAAATCAAAATGGTGCGAGACATTACCTAAGGTTTTGGTAATAGGTCCCCAGAAGACGGGTAGCACAGCCCTATATACTTTCCTCGCGATGCATCCAGCACTGGTGCCAAATCTTCCCAGTCCAACCACGTACGAAGAATTACAGTTCTTCAACAATAACAATTACCTCAAAGGATT
AGATTGGTACTTAAATTTCTTCCCTCCGAGCCAAAACAACGGCACTCAGATAACTTTTGAGAAGTCAGCAACTTACTTCGACGGGGATTTGGTACCACGGCGCGCCCACGCTCTGCTTCCAAACGCCAAGATAATTGCCATACTTATATCGCCCTCTAAAAGGTACTTAAATTTCTTCCCTCCGAGCCAAAACAACGGCACTCAGATAACTTTTGAGAAGTCAGCAACTTACTTCGACGGGGATTTGGTACCACGGCGCGCCCACGCTCTGCTTCCAAACGCCAAGATAATTGCCATACTTATATCGCCCTCTAAAAGGGCGTATTCGTGGTACCAACATATCCGTTCTCATGGGGATCCCGTAGCTAACAACTACACCTTCCACACAATCATCACAGCGAACGACTCAGCAGCGAAGCCGTTAAGAGACCTCAGGAACCGTTGTCTGAACCCTGGGAAGTACAGCCACTACCTGGAGCGTTGGCTGGTGGAGTACAGCGCTCATCAGATTCACGTGATGGACGGCTCACTGCTAAGATCTGAACCAGCTACAGCAATGCATGGACTTCAAAAGTTCCTTAAGATACAACACGTCGACTACGACAAGCTACTGAAATACGATCCCAAAAAAGGTTTCTTCTGTCAGGCCGTCAGCAACGAGAAGACGAAGTGCCTGGGCAAGTCCAAAGGCAGAATATATCCGCCTATGGAGGAGAGGTCGGCTAAATTCTTGAGGCGGTACTACACGCCTCACAACACGGCGTTGTCCAAACTGCTGGTCAGACTCGGCCGGCCAGTGCCGCAGTGGCTCAAGGACGAACTGACGAACGGATAA

Protein sequence:

>DPOGS215886-PA
MAGEGRGSRAQLLECGDYMHPHKTATPRCCLWLASHINVRKCVAGVMLLSILTIFFYTYYVTAPITSLVWRDRVPRPLSQCSLLASQQQTARDHRSDARLRIDAKVLVIAESLYSRLGRDIAELLVANRIRYKVEVAGKSLPVLTTLDKGRYGVIVFESLSKYANMDKWNRELLDKYCREYSVGVVAFATPGEESLVGAQLRGFPLFMHTNLRLKDAALNPASPVLRLARAGETAWGPLPGDHWTVFRANSSTYEPVAWALRQNEYGSNEERLPLATVVQDHGRLDGVQRVLFGSGLQFWLHRILFLDALSYLSHGQLSLSLDRWILVDIDDIFVGERGTRLHVEDVSALLASQTALQRLVPGFRFNLGFSAKYYHHGTLLENLGDDALLKNREHFNWFCHMWNHQQPHLYNNVSQLEAEMTLNKQFALEHGIPTNSCYSVSPHHSGVYPVHEPLYEAWRKVWDVKVTSTEEYPHLRPARLRRGFRHRGVMVLPRQTCGLFTHTLLLERYPGGRQRLDRSIQGGELFQTVINNPINVFMTHMSNYGNDRLALYTFESVVKFLRCWTNVRLASAPPLSLAEKYFQLRPDELNPLWGNPCDDIRHRKIWSKSKWCETLPKVLVIGPQKTGSTALYTFLAMHPALVPNLPSPTTYEELQFFNNNNYLKGLDWYLNFFPPSQNNGTQITFEKSATYFDGDLVPRRAHALLPNAKIIAILISPSKRYLNFFPPSQNNGTQITFEKSATYFDGDLVPRRAHALLPNAKIIAILISPSKRAYSWYQHIRSHGDPVANNYTFHTIITANDSAAKPLRDLRNRCLNPGKYSHYLERWLVEYSAHQIHVMDGSLLRSEPATAMHGLQKFLKIQHVDYDKLLKYDPKKGFFCQAVSNEKTKCLGKSKGRIYPPMEERSAKFLRRYYTPHNTALSKLLVRLGRPVPQWLKDELTNG-