Monarch geneset OGS2.0

DPOGS208540
Transcript: DPOGS208540-TA, 3288 bp
Protein: DPOGS208540-PA, 1095 aa
Genomic position: DPSCF300064 + 842874-850288
RNAseq coverage: 585x (Rank: top 22%)
Annotation
Heliconius      HMEL008751        0.0  99.05%
Bombyx          BGIBMGA010354-TA  0.0  97.78%
Drosophila      tutl-PG           0.0  75.24%
EBI UniRef50    UniRef50_Q7Q3K8   0.0  75.41%  AGAP007928-PA n=7 Tax=Endopterygota RepID=Q7Q3K8_ANOGA
NCBI RefSeq     XP_317553.4       0.0  75.41%  AGAP007928-PA [Anopheles gambiae str. PEST]
NCBI nr blastp  gi|158297293      0.0  75.41%  AGAP007928-PA [Anopheles gambiae str. PEST]
NCBI nr blastx  gi|157113626      0.0  72.45%  turtle protein, isoform [Aedes aegypti]
Group
Gene Ontology   GO:0005515     1.9e-10  protein binding
KEGG pathway    mdo:100029182  1e-33
                K06766 (NEO1) maps-> Cell adhesion molecules (CAMs)
InterPro domain
  [150-251]  IPR013783  1.1e-23  Immunoglobulin-like fold
  [527-623]  IPR008957  4.2e-22  Fibronectin type III domain
  [152-240]  IPR013098  8.4e-18  Immunoglobulin I-set
  [165-230]  IPR003598  4.7e-16  Immunoglobulin subtype 2
  [250-337]  IPR003599  1.5e-11  Immunoglobulin subtype
  [531-614]  IPR003961  1.9e-10  Fibronectin, type III
  [34-139]   IPR013106  1.4e-06  Immunoglobulin V-set
Orthology group  MCL10701  Multiple-copy universal gene
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS208540-TA
ATGGGCTGGCGCGCCGAGCAACCGCCGCACATCGCCGCCGGCCTGCTCTTCTTGCTGCTAGTCTACCTGCCGGTCACCTGCCATCAAGATCATCAAGATGCTGTACACATCACGGCGATCCTCGGAGAGAGCGTCGTATTCAACTGCCAGGTCGATTTCCCCGAAGACATCCCTGTGCCGTACGTGTTGCAGTGGGAGAAGAAGGTAGGCGAAACGGGACAGGACATTCCGATCTATATCTGGTATGAGAGCTATCCGACGCACAGCGGCGAAGGTTACGAGGGGCGAGTGTCGCGAGTGGCTCCTGACTCGCCCTACGGAGCGGCCAGCCTCAACCTCACTAATATTAGAGAGTCGGATCAGGGTTGGTACGAGTGTAAGGTGGTGTTCCTCAACCGATCCCCAAACCAACACAAGAATGGGACCTGGTTTCATCTGGACGTGCACGCGCCACCTAGATTCTCCATCACACCGGAAGACATTATATACGTCAATTTAGGTGATGCCATAATCCTAAACTGTCAAGCCGAAGGGACACCAACTCCCGAAATACTATGGTACAAAGACGCGAATCCAGTGGAACCTTCAGGCACTGTTGGCATATTCAACGACGGCACCGAACTGAGGATAAGCAACATCCGCCATGAGGACATCGGAGACTATACATGTATAGCGAGGAACGGGGAAGGTCAGGTGTCACACACGGCTCGCGTCATCATCGCTGGAGGAGCTGTCATTACTATGCCACCAACGAACCAAACGAAACTGGAAGGGGAAAAGGTACAATTTTCTTGCGAAGCGAAGGCTCTACCGGGAAATGTTACTGTGAAATGGTTCCGCGAGGGGGCGCCAGTGGCTGAAGTGGCGGCTTTGGAGACCCGCGTCACGATCAGACGAGACGGAGCCCTGGTCATCAACCCCGTGGCAGCGGACGATTCCGGCCAGTACTTGTGCGAAGTGTCCAACGGCATCGGCGATCCCCAAAGCGCCTCTGCGTATCTCAACGTCGAATATCCAGCGAAGGTGACTTTTACGCCAACAGTACAATACCTGCCGTTCCGGTTGGCGGGAGTAGTTCAGTGTTACATAAAAGCCAACCCGCCCCTTCAGTATGTCACGTGGACGAAGGACAAAAGACTGTTGGAGCCGTATCAGACCAAGGACATAGTTATCATGAACAATGGCTCGCTGCTGTTCACCCGCGTCAATCAAAACCATCAAGGAAGATACACTTGTACGCCGTACAACGCCCAAGGGACGCAAGGGTCTTCAGGCCCTATGGAGGTGCTAGTACGTAAACCGCCAGTATTCACAGTGGAACCGGAACCTTTGTATCAGAGAAAAGTAGGAGAATCAGTGGAAATGCACTGCGAGGCTCAAGAGGCTGAGGGGACGCAGCGACCGAGCGTAGTGTGGAGGCGACGAGATGGACTCCCTCTACAGAAGAGTCGAGTGAGGGCGCTGGGCGGCAACATCACCATCGACACGCTCAGGAGACAGGACTTCGGAATATACCAGTGTGTGGCTTCCAACGAGGTGGCGACGATAGTAGCGGACACTCAGCTCGTCATAGAGGGCACGCAGCCCCATGCGCCATACAACGTGTCAGGCACGGCCACCGAGTTCCAGGTGACACTCCGATGGCAGCCAGGCTACGCGGGGGGACCGGACTATAAACAAGACTACACCATATGGTACAGAGAGGCTGGCTTCTCAGAGTGGACTAAAGTACCAGTCACGCCATCTGGTGCCACCTCCGTGACAATAAATCGTCTCCAGCCCGGTACGACTTACGAGTTTCAAGTGAACAGTAAGAACACGATCGGTGAGGGGATGATGAGTAAAGCTATCACTATAAGGACTCTCGACGTAGGTGCCAAGCCAAAGGCTGCCCCCACCGCCGCGGGGCCGATAGATGAAAAGATATTTCAGAACGCACCCGAAGGCTCCGGTCCTAAGTCCGGACCCCCCCGCAACCTGACAGTGACAGAGGTCCACAA
TGGTTTCCTGATAACATGGCAAGCGCCTCTGGAGCGGTCTCACTTGGTCCAGTACTACACTATCAAGTACCGCACAGACGCTCAGTGGAAGACACTCAATAGAGGACAGATAAGGCCCGAGGAGACCAGCTACTTGGTCAAGAATCTAGTCGGAGGTAGGACGTATTATTTCCGCGTGCTGGCGAACTCCGCGACCAGCTACGAGAGTTCCGAGGAAGTGAAGTTTCCTGTGCCGGCGCGGGTCAAACACAAGGCCATAACGGCCGGGGTCGTTGGAGGGATATTGTTCTTCATAGTGGCCATCATACTGTCCGTCTGCGCGGTCAAGATATGCAATAAGAGGAAACGCCGCAAGCAGGAGAAAGCATACAACATGGTAGCCGCGCGACTCACCGACCTGCGCGCGGCTGACAGCACTCAAGTGCCTTTTAAGAAATTTAGAGAAAGCGGAATATCGAGTTTAGTACAATGTTTGCGATTCACGGCGAACTGGGTGTGGCCGGCGTCTCGATGTGGCGACGAGTCCCGCGTTTGGCCGGCGCCCGTCGCCTCCCTGGCGGAGTCTCCAGCGCCCTCGCCGGCTCCCTCCGCCGCGCCTTCCTCCTCAGACGACGGCGGGTTCCTCCCGCGACTGCGTGCGCCGCTATCCCCCGCCGCCGCGCCGCCTCTGTTCCGCGCCTCCTCGCCGGCCCTCACCCTGGCCTGGCCGCCGTGGCCGCCCTGGCCGTCCTGGCCGGTGTGGGCGCCCGCCTGGACTCCGTGGTCGCCTCTCCACATCTCCGACCTCAGCTCGGTACCGTTCCCCAGTTCGGCGGACGGCTCGTTTCCGACGCCCCCCTCTTTCCGCTCCCGCCCCCCACGCGTGTCCCTCGATGTCCCGTCCCGAGTGTGCGTGCTCGGTCGCCCCCGGTCCCGGCCGCCGCAGCCGCACGGGAAGCCTGGCCGAGTGGCGTCCCCCGCCCTACCGGCCGCCGCCGCCCGTGCCCGTCTCGCGGCCGGGGCGGGGCGGCTCGAGGCGGCGGCGGAGGCGGCGGCCGCCGAAGCCGCCGACGCGGGCTCCGTGGACGTCCACTACGAGTTCGATCGCGCGACTCGCACCCCGACGCCCTCGACGCCGGAACGAACCCGCGCCCGTCCCTCGCGAGACGACGTAGAGGCTCGCGTGCGCGCTATGAAGGAGGAGTTCCTGGAGTTCCGCAAGCGCCAGGCGCTCCGCCGTCGCTCCCCGGAGCCGCTGTCCCCGCTGGCGCCGCTGTCCTCGCTGGCCCCCGCCGAGACGGTGTGCTGA

Protein sequence:

>DPOGS208540-PA
MGWRAEQPPHIAAGLLFLLLVYLPVTCHQDHQDAVHITAILGESVVFNCQVDFPEDIPVPYVLQWEKKVGETGQDIPIYIWYESYPTHSGEGYEGRVSRVAPDSPYGAASLNLTNIRESDQGWYECKVVFLNRSPNQHKNGTWFHLDVHAPPRFSITPEDIIYVNLGDAIILNCQAEGTPTPEILWYKDANPVEPSGTVGIFNDGTELRISNIRHEDIGDYTCIARNGEGQVSHTARVIIAGGAVITMPPTNQTKLEGEKVQFSCEAKALPGNVTVKWFREGAPVAEVAALETRVTIRRDGALVINPVAADDSGQYLCEVSNGIGDPQSASAYLNVEYPAKVTFTPTVQYLPFRLAGVVQCYIKANPPLQYVTWTKDKRLLEPYQTKDIVIMNNGSLLFTRVNQNHQGRYTCTPYNAQGTQGSSGPMEVLVRKPPVFTVEPEPLYQRKVGESVEMHCEAQEAEGTQRPSVVWRRRDGLPLQKSRVRALGGNITIDTLRRQDFGIYQCVASNEVATIVADTQLVIEGTQPHAPYNVSGTATEFQVTLRWQPGYAGGPDYKQDYTIWYREAGFSEWTKVPVTPSGATSVTINRLQPGTTYEFQVNSKNTIGEGMMSKAITIRTLDVGAKPKAAPTAAGPIDEKIFQNAPEGSGPKSGPPRNLTVTEVHNGFLITWQAPLERSHLVQYYTIKYRTDAQWKTLNRGQIRPEETSYLVKNLVGGRTYYFRVLANSATSYESSEEVKFPVPARVKHKAITAGVVGGILFFIVAIILSVCAVKICNKRKRRKQEKAYNMVAARLTDLRAADSTQVPFKKFRESGISSLVQCLRFTANWVWPASRCGDESRVWPAPVASLAESPAPSPAPSAAPSSSDDGGFLPRLRAPLSPAAAPPLFRASSPALTLAWPPWPPWPSWPVWAPAWTPWSPLHISDLSSVPFPSSADGSFPTPPSFRSRPPRVSLDVPSRVCVLGRPRSRPPQPHGKPGRVASPALPAAAARARLAAGAGRLEAAAEAAAAEAADAGSVDVHYEFDRATRTPTPSTPERTRARPSRDDVEARVRAMKEEFLEFRKRQALRRRSPEPLSPLAPLSSLAPAETVC-
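
Consistency check (sketch):

The lengths in this record can be cross-checked with a little arithmetic. The sketch below is a minimal illustration rather than part of the original record. The constants are copied from the record. The function names `codon_count` and `genomic_span` are hypothetical helpers introduced here. It assumes that the 3288 bp transcript is entirely coding sequence ending in a single stop codon (the nucleotide sequence ends in TGA and the protein sequence ends in "-"), and that the genomic coordinates are 1-based and inclusive.

```python
# Values copied from the record for DPOGS208540.
TRANSCRIPT_BP = 3288
PROTEIN_AA = 1095
GENOMIC_START, GENOMIC_END = 842874, 850288


def codon_count(cds_length_bp: int) -> int:
    """Return the number of codons in a coding sequence of the given length."""
    if cds_length_bp % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return cds_length_bp // 3


def genomic_span(start: int, end: int) -> int:
    """Return the span in bp, assuming 1-based inclusive coordinates."""
    return end - start + 1


codons = codon_count(TRANSCRIPT_BP)
# Assumption: exactly one terminal stop codon that is not counted as a residue.
residues = codons - 1
span = genomic_span(GENOMIC_START, GENOMIC_END)

print(f"codons: {codons}, residues excluding stop: {residues}")
print(f"genomic span: {span} bp, transcript: {TRANSCRIPT_BP} bp")
```

Under these assumptions the transcript gives 1096 codons, which matches the 1095 aa protein plus one stop codon. The genomic span is 7415 bp, longer than the transcript, which would be consistent with the gene containing introns, although the record itself does not list exon coordinates.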