Monarch geneset OGS2.0

DPOGS210359
Transcript: DPOGS210359-TA, 3369 bp
Protein: DPOGS210359-PA, 1122 aa
Genomic position: DPSCF300025 + 383112-389224
RNAseq coverage: 422x (Rank: top 29%)

Annotation (source: hit, E-value, identity, description)
Heliconius: HMEL007256, 0.0, 61.81%
Bombyx: BGIBMGA011917-TA, 0.0, 57.18%
Drosophila: CG4557-PA, 3e-49, 41.36%
EBI UniRef50: UniRef50_D6WF84, 2e-141, 35.59%, Putative uncharacterized protein n=1 Tax=Tribolium castaneum RepID=D6WF84_TRICA
NCBI RefSeq: XP_969463.1, 3e-142, 35.59%, PREDICTED: similar to CG4557 CG4557-PA [Tribolium castaneum]
NCBI nr blastp: gi|91078510, 7e-141, 35.59%, PREDICTED: similar to CG4557 CG4557-PA [Tribolium castaneum]
NCBI nr blastx: gi|91078510, 7e-158, 35.28%, PREDICTED: similar to CG4557 CG4557-PA [Tribolium castaneum]

Group:
KEGG pathway:
InterPro domain: [652-725] IPR022092, 6.5e-19, TATA element modulatory factor 1 DNA binding
InterPro domain: [1053-1112] IPR022091, 1e-11, TATA element modulatory factor 1 TATA binding
Orthology group: MCL12318, Single-copy universal gene

Nucleotide sequence:

>DPOGS210359-TA
ATGAATTGGTTCGATGCATCGGGGCTAACAAGTTTAGCCAAATCAGCTTTAAAGGAGGCCCAGAAAACTATCGATAAGGCTCTCGACATTGATGATGACAGCAGCGAGGATCAAGAAGAGCCAACTGGGACATCTACTTCTAAGTCAACACCTACCAGAAGTATGAATGAAAAAGATAATTCAGACTTCTTTTCTTCTTGGGGCTTAACGGTGAGTGCGGAAAGTGAGAGAGAAAATCCAATACAGGAACAGCCCGTCGTCACAATGAGTCCATCTAAATCTAATTCCCAAAGTTTATGGGGATCCTTTGCTGGTTCCTTTTTTGAGCAAACAAAATCTGAAAGTGAGACAATAGTCCGACCACCCAAGGCAAAGTCTATGAACCTTATTTCAGATAAATATGACAGCCAAGACGATTTGTTTTCTTCTAGTCAGTTAGTGATGTCGGATGGTGGAGAAAGTGGGAATGTAAAGAAAGAAGTTCCTAAATCAGAGGACCGTCGTGACTCAACTATGTCAAATGTTTTGTCATTTATGTCCAGCAGAAATAGCTCTGATTCAGTGGAGGTTCTGTCACAAAGTTTGAAGTCATCACCAGAGTCAGAAGCAGCTTCTTGTCATACAATCTCCAATTCCCACAGCAGTAGCGTTGGTGTTAAGCATAATTCTGAATCAGTTGAAATATTACCTGACAGCCTTGTGAGTCCCAGCTCTATAGAGTGCTTAGGCTTTGACAGTTATGCAAGTGACAAAAACAGCAGCAATTCTTCTAATCTGTCTCCAGGCAAGACATCAGACAAGAAGTCTACACCGGTCGGAGAAAGAACTGAGAGGGCTGAAACTGCAGACAGTGTGAGTTTGGTAGCAGATGACGATGAGGATACCATGTCTTATAACTCAATTTCTGAATGCACGGCCCCCACAGTCCTCGATACCGATGACAACTCCATGAATCCTTTTTCAAAACTTGCCAGATCCGAATTTAAAAAAACGAATGAAAGGGATCTCGTTTATTTAGAGCAGCCTCTGGCATCCCATAAGATGCAATTAAGTGAAAACTCTTCTAATGACGGCTCTTGGTCAGACAGAACCCTAAATGCTGATAATGAAAGTGTTATCTTAGAAAAATCTATAGAAGAGCAGAAAAAATATAGCCAGGAGGATGTTCTAATAGATAAACTAAGTGATTCATCTTCATTTTATAATGTCAACGTCACCAGTGATTTATTGCAGTCTGAGAGTTCTGCTTTTGTGAATGTAGAGAAACAACAATGTAGTCACTCTACTAGTAATGATTCATCCCAGAGAGATACCAGTGTGAAGGAGAGGACCTCTCCTGTGAGTTCCGACAGCAAAAGTGATCTGGTCAAGATTGGTTCCGACCAAACCTCGGGTCATACCTCTGGTGATGAACTGGAGACTGCCACATCTTCTGACATAGAAATCATTCCGAGCCCAAACGGCGAAAGCAGCAACGGCTGCAGAAATAGTCCAGGAAAATATGGCTTCAAAGCAAAAGTAGACGGGGCCACTTCACCCAATCTTGTAGATTTAGTTTTAGGAAAGAGCCTGGCTTCTAAGATACGCGGACATAATAGAGAATTGTCAGAAGCCTCGATACAGAGCAACACCAGTGACGATAGCCAGGGTTCAGATAATGATAAACTGATGCGAAGGTTGTGTGAGATGACCGAGATCTTAGATGCGAGAGAATCGAGGTTGATGGAGGTCAGCCGGAATAACGCGGAGCTGGCCGAATGTAACGCCAGCCTCAAGAGTCAGATCGAAAGTTTACTGAACAAGCATGACGGAGGAGACATCAACACGATCACAGAGGACTACACTCAGAGGATGTCCGCTCTGGAAAAGAAGTTCCAGCAGGCTATCAGGGAGAAGGATCAATTGAGGAAGCAGTTAGACACCTTGAAATCGGACACGACACGCAAGAACTCGTCGGAGCTGGAGAACACTATAAAGGAGAAGGACGAGATGATCTCCCA
GCTCCAGGAGGAGGGAGAGAAGCTGGCGAGGCACGAGCTGCAGCACACCAACATCATCAAGAAACTACGCGCTAAGGAAAAAGACAACGAACAGGTCATAAAGGGATTGAGAGACAAGATAGCTGATCAGACGAGTGAACTGGAGCGAATGAAGAGGTGTCTGTCAGCCAAGGAGGAGCTGGAGGTCAACCAGATAGAGGCCGTGTACAGGCTCACGGCCACTAACAAGACTCTGGAGGCCGAGCTGGCAGAGACAAAGAGCTCGCTGGACGACACGACTCAGAGGCTTGCGACGAGCCGCGCGTCTCTGGAGGCGGCGCGGCGGGAGCTGGCCGAGCTGCAGAGAGGAGGGGCAGAGACGACCAGGCTCAGGGACGAGCTGCAGCACGCTCGGGAGGAAGCCCGCCTCGCCCGGGAACACGCGGCCGCCCTGCTAGAGGAGACGAGGCTGCTCAGGACTGAGAGACGAGCCGGAGCAGCGCTGGGCGGCACGCGAGGAAGCTCTGCGGCGCGAGGTGCGAGCCGCCGCGGAGGACTCTACCGGACCGCGCTCGACGCTAGGCTCGCGGAGGCAGAGACGACGGCCGCCAAGGCCAAGGAGAGGGAGAGACTGCTGAGAGAAGACAACACCTCCCTGGCGGAGACACTGGCGGAGGAAAGGAGCCGGGGGGAGGAGCGGGAGGAGAGGAGCAGGGCGCTGGAGCAAGAGCTGCGGGAGGCCAGGGGCACGATACACACGCTCACATGCGACCTGGACAGAAAAACAACGGAGCTAGAACAGATCCGGGTAGAAAGTGAGAGGCAGATAGAGGAACTGAGGACGAGAGTGAGCGAGACGGAACACTCGCTGGCCGAGGAAAAGGCGGCCCTGGACACTGAGAGGAGGAGGAACGCCATACTGCAGGTACGGGGAGCCAGTGAGGGCAGCGGGGATGGGACTCGTACCACTTATGTAAATGAGCAAACGGAGCAAGTGTCCAGTCGCGGCGACGTGTCCCCGGCGCGCTCCGTCACTTCCGACCTCGGCTCCACTTCGTTTTGGACTGAGGAGGCGGCGGGGAGCAGCGCGCTGGCCGTGGAGCAGGCGCTGAGTCAGATGACGCGCCAGTGGGGCTCCCGCCGCGGCCGAGACGACGTGCTGGCCCGCCTGGCCGCCGAGCGCGCGGCGCTGGCCGGGGAGCTGGCGGCCTTGCGGGCGCGCCTCGCGGACCACGAGCACGCGCGCTACGACGAGCTGCTGCAGATGTACGGCGAGAAGGAGGAGCAGCTGCACGAGCTGCGGCTCGACCTCCACGACGTCACGCAGCTCTACAAGCAACAGCTGGACGAGCTGCTGCTGCTCAGGAGACGCCTCGACGAGCGCACCTGA

Protein sequence:

>DPOGS210359-PA
MNWFDASGLTSLAKSALKEAQKTIDKALDIDDDSSEDQEEPTGTSTSKSTPTRSMNEKDNSDFFSSWGLTVSAESERENPIQEQPVVTMSPSKSNSQSLWGSFAGSFFEQTKSESETIVRPPKAKSMNLISDKYDSQDDLFSSSQLVMSDGGESGNVKKEVPKSEDRRDSTMSNVLSFMSSRNSSDSVEVLSQSLKSSPESEAASCHTISNSHSSSVGVKHNSESVEILPDSLVSPSSIECLGFDSYASDKNSSNSSNLSPGKTSDKKSTPVGERTERAETADSVSLVADDDEDTMSYNSISECTAPTVLDTDDNSMNPFSKLARSEFKKTNERDLVYLEQPLASHKMQLSENSSNDGSWSDRTLNADNESVILEKSIEEQKKYSQEDVLIDKLSDSSSFYNVNVTSDLLQSESSAFVNVEKQQCSHSTSNDSSQRDTSVKERTSPVSSDSKSDLVKIGSDQTSGHTSGDELETATSSDIEIIPSPNGESSNGCRNSPGKYGFKAKVDGATSPNLVDLVLGKSLASKIRGHNRELSEASIQSNTSDDSQGSDNDKLMRRLCEMTEILDARESRLMEVSRNNAELAECNASLKSQIESLLNKHDGGDINTITEDYTQRMSALEKKFQQAIREKDQLRKQLDTLKSDTTRKNSSELENTIKEKDEMISQLQEEGEKLARHELQHTNIIKKLRAKEKDNEQVIKGLRDKIADQTSELERMKRCLSAKEELEVNQIEAVYRLTATNKTLEAELAETKSSLDDTTQRLATSRASLEAARRELAELQRGGAETTRLRDELQHAREEARLAREHAAALLEETRLLRTERRAGAALGGTRGSSAARGASRRGGLYRTALDARLAEAETTAAKAKERERLLREDNTSLAETLAEERSRGEEREERSRALEQELREARGTIHTLTCDLDRKTTELEQIRVESERQIEELRTRVSETEHSLAEEKAALDTERRRNAILQVRGASEGSGDGTRTTYVNEQTEQVSSRGDVSPARSVTSDLGSTSFWTEEAAGSSALAVEQALSQMTRQWGSRRGRDDVLARLAAERAALAGELAALRARLADHEHARYDELLQMYGEKEEQLHELRLDLHDVTQLYKQQLDELLLLRRRLDERT-
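
Consistency check:

The lengths reported in this record can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the database record. It assumes the listed transcript is coding sequence only (the sequence above starts with ATG and ends with the stop codon TGA) and that the genomic coordinates are 1-based and inclusive. The function names are hypothetical helpers.

```python
def expected_protein_length(cds_bp: int, has_stop: bool = True) -> int:
    """Number of amino acids encoded by a CDS of cds_bp nucleotides."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = cds_bp // 3
    # The terminal stop codon does not encode a residue.
    return codons - 1 if has_stop else codons


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range (assumed convention)."""
    return end - start + 1


# Values copied from the record above.
transcript_bp = 3369
protein_aa = 1122
genomic_start, genomic_end = 383112, 389224

# 3369 / 3 = 1123 codons; minus one stop codon gives 1122 residues.
assert expected_protein_length(transcript_bp) == protein_aa

# Genomic span under the inclusive-coordinate assumption.
locus_bp = span_length(genomic_start, genomic_end)

# Bases in the span not present in the transcript (introns, if the
# transcript is CDS-only as assumed).
non_transcript_bp = locus_bp - transcript_bp
print(locus_bp, non_transcript_bp)
```

Under these assumptions the transcript and protein lengths agree: 1123 codons including the stop codon give 1122 amino acids.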