Monarch geneset OGS2.0

DPOGS203760
Transcript: DPOGS203760-TA (2829 bp)
Protein: DPOGS203760-PA (942 aa)
Genomic position: DPSCF300010 (+ strand), 98417-113786
RNAseq coverage: 129x (rank: top 56%)
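The length figures above can be cross-checked with a little arithmetic. This is a minimal sketch that rests on two assumptions the record does not state: the 2829 bp transcript is the coding sequence including its stop codon (the nucleotide sequence below does start with ATG and end with TGA), and the genomic coordinates are 1-based and inclusive. The function names are illustrative only.

```python
# Assumptions (not stated on the record): the 2829 bp transcript is the CDS
# including its stop codon, and genomic coordinates are 1-based and inclusive.

def protein_length_from_cds(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a CDS of cds_bp nucleotides."""
    if cds_bp % 3 != 0:
        raise ValueError(f"CDS length {cds_bp} is not a multiple of 3")
    codons = cds_bp // 3
    # A stop codon is counted in the CDS but encodes no residue.
    return codons - 1 if includes_stop else codons


def genomic_span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval."""
    return end - start + 1


span = genomic_span(98417, 113786)
print(protein_length_from_cds(2829))                   # 942
print(span)                                            # 15370
print(f"coding fraction of locus: {2829 / span:.1%}")  # 18.4%
```

Because 2829 is divisible by 3 and leaves 942 residues once the stop codon is dropped, the transcript and protein lengths agree. Under the same assumptions, the remaining ~12.5 kb of the locus would be intronic or untranslated sequence.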
Annotation
Source            Hit                E-value  Identity  Description
Heliconius        HMEL002565         0.0      50.29%
Bombyx            BGIBMGA011490-TA   0.0      67.27%
Drosophila        spel1-PA           0.0      40.97%
EBI UniRef50      UniRef50_E2C8G1    0.0      43.31%    DNA mismatch repair protein Msh2 n=9 Tax=Formicidae RepID=E2C8G1_HARSA
NCBI RefSeq       XP_001121207.1     0.0      44.39%    PREDICTED: similar to mutS homolog 2 [Apis mellifera]
NCBI nr blastp    gi|350423484       0.0      45.65%    PREDICTED: DNA mismatch repair protein Msh2-like [Bombus impatiens]
NCBI nr blastx    gi|350423484       0.0      45.69%    PREDICTED: DNA mismatch repair protein Msh2-like [Bombus impatiens]
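Every hit in the annotation table has an E-value of 0.0, which is what BLAST prints for extremely small E-values. The E-value column therefore cannot rank these hits, and percent identity becomes the useful tie-breaker. The sketch below uses the values from the table; the tuple layout is an assumption made for illustration.

```python
# (source, hit ID, E-value, percent identity), copied from the Annotation table
hits = [
    ("Heliconius", "HMEL002565", 0.0, 50.29),
    ("Bombyx", "BGIBMGA011490-TA", 0.0, 67.27),
    ("Drosophila", "spel1-PA", 0.0, 40.97),
    ("EBI UniRef50", "UniRef50_E2C8G1", 0.0, 43.31),
    ("NCBI RefSeq", "XP_001121207.1", 0.0, 44.39),
    ("NCBI nr blastp", "gi|350423484", 0.0, 45.65),
    ("NCBI nr blastx", "gi|350423484", 0.0, 45.69),
]

# Sort by ascending E-value first, then by descending identity to break ties.
ranked = sorted(hits, key=lambda h: (h[2], -h[3]))
for source, hit_id, evalue, identity in ranked:
    print(f"{source:<15} {hit_id:<18} {evalue:<4} {identity:.2f}%")
```

With this ordering the Bombyx hit comes first and the Drosophila hit last.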
Group
Gene Ontology
  GO:0005524  3e-91  ATP binding
  GO:0006298  3e-91  mismatch repair
  GO:0030983  3e-91  mismatched DNA binding
KEGG pathway
  ame:725348  0.0
  K08735 (MSH2), maps to:
    Colorectal cancer
    Pathways in cancer
    Mismatch repair
InterPro domain
  [685-872]  IPR000432  3e-91    DNA mismatch repair protein MutS, C-terminal domain
  [353-640]  IPR007696  2.3e-55  DNA mismatch repair protein MutS, core
  [198-338]  IPR007860  3.1e-13  DNA mismatch repair protein MutS, connector
  [81-176]   IPR007695  3.7e-08  DNA mismatch repair protein MutS-like, N-terminal
Orthology group: MCL11845 (single-copy universal gene)
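The InterPro ranges are read here as 1-based, inclusive residue positions in DPOGS203760-PA. That is an assumption, though it is consistent with how InterPro reports matches. The sketch below converts the ranges to Python slices and reports each domain's length and the unannotated gap before it. `DOMAINS`, `domain_slice` and `domain_report` are illustrative names.

```python
# (start, end, InterPro ID, short name): 1-based inclusive, N- to C-terminal order
DOMAINS = [
    (81, 176, "IPR007695", "MutS-like, N-terminal"),
    (198, 338, "IPR007860", "MutS, connector"),
    (353, 640, "IPR007696", "MutS, core"),
    (685, 872, "IPR000432", "MutS, C-terminal"),
]


def domain_slice(protein: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein string."""
    return protein[start - 1:end]


def domain_report(domains):
    """Yield (name, InterPro ID, length, gap before) for each domain."""
    prev_end = 0
    for start, end, ipr, name in domains:
        length = end - start + 1
        gap = start - prev_end - 1  # unannotated residues before this domain
        yield name, ipr, length, gap
        prev_end = end


for name, ipr, length, gap in domain_report(DOMAINS):
    print(f"{ipr}  {name:<22} {length:>4} aa  (gap before: {gap} aa)")
```

With the protein sequence below as a string, `domain_slice(protein, 685, 872)` would return the 188-residue C-terminal domain.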

Nucleotide sequence:

>DPOGS203760-TA
ATGCCGGACCTGCAAGCTTTGGCCCGGAGACTGGCTAGGAAGAAAGCTGGCTTACAGGACTGTTACAGAATATACCAGGCTATCAACCGCATTCCCGTCCTATTGAAGTGTCTGTCTGAGTTCAACGACCCCACGATACATTCGGTGCTCTGTGAACCGATAGCTGAACTTAACAACGACCTGGAAAAGTTCCAGCAGATGATTGAAACTACCATCGACCTAGAAGCTGTTGACAGAGGTTCGAAACCTCCAACAACAGTACGTATATTTCACAGAAATGAGTATTACAGCGTTCACGGGGCCGACGCTACGACCGCTGCCAGAGAAGTATTCTCCTCCACATCAAACATCAAGAGAATGGGCATCGAGCCTAACAAACTAGACTATTTGGTCCTATCGAAGGGAAACTTTGAGATACTCATCAGGAAATTACTATTGGTACGGAGATACAGAGTCGAGATATTTGTGTCGGAGGGATCAGTGAAGTCCTGTGATTGGTCGCTCAGGTACAAAGGTTCTCCTGGATACCTGTCCCAATTGGAGGAAATTGTCGGGGACGGTTTAGGATCCGCCAATGAGCAATCTACATGCTTGATGGCCGTCAATGTCAAGAGTGACGCCATCAGTAAGGGCCGCCTAGTGGGCATAGCGTGCGTGTATCAGAACGATTACACTTTATCAGTGTCGGAGTTCACTGATGATGTTGACTTCACCCAGCTAGAGTCGATCGTCGTACAAGTGGCGCCCTCTGAGTGCGTTGCGGCGCCGGCTGATAACGATTATAAAGCCTTAAAGAAGGTTATGGACAGAGCGAGTGTGACGGTGACGAAGGTCAAGAAGTCGGAGTTCACGACGGAAGGTCTCATCCAGGATCTGAACAGACTTCTCAAGTTCAAAGAGGATCAGCAAAAAGATGCCAATGGATTCCAGGAAACCAAACTACCGGTGGCCATGAGCGCTCTGGCAGCCGCCGTTAGATATACGTCGCTGTTAAACGATGACACGAACTTTGGAAGGTTCCGCATATCGTCAGTGAAGGCCGACTACCTTCAGCTGGACTCCTCGGCCCTGTCGGCACTGAATGTGTTCCCTGAACTCGGTGATACGAACACTTCGCCAACCAGGAGCATCTACGGACTACTCGACAGATGTAGAACACAGCATGGAAAACGACTTCTGTGCCAGTTGCTTCGTCAGCCTCTTAGAGACATCAACCTGATCAACGAGCGCCTGGACATTATCCAGCTGTTGCAGTTGCATGAAGATCATCTTAGGCGGATGCCGGACCTGCAAGCTTTGGCCCGGAGACTGGCTAGGAAGAAAGCTGGCTTACAGGACTGTTACAGAATATACCAGGCTATCAACCGCATTCCCGTCCTATTGAAGTGTCTGTCTGAGTTCAACGACCCCACGATACATTCGGTGCTCTGTGAACCGATAGCTGAACTTAACAACGACCTGGAAAAGTTCCAGCAGATGATTGAAACTACCATCGACCTAGAAGCTGTTGACAGAGGTGATTTTCTCGTGAAGCCATCTTTCGATGAAGAGTTACAGGTACTAGCGAATGATCTGGAAAAATTACAAAACTCAGCTGAGAAAGAATTAAACAAAGCGGCCAGGGATCTTGACATGGAAGCGGGGAAAACTATTAAATTAGAAAATAATCCACAGCACGGTTTTAAATACACGATAGTGGATGCCATTAAAGGTGGGGTCAGATTCAGGAACAGTTGCTTAGGAGACATCACAGAGAACTACCTCCAGGCGAAGGCTGCGTACGAGAAGGAGCAAGATAAAGTAGTCGCCGAAATCATTAATATAGCTTCCACTTATTCGGAGTGTCTGTATTGCCTGTCCAATATAATATCTAAGTTGGATGTATTGGTGTCACTGTCTGTGGTGGCGAGTACCTCTTCATCCAAGTACACTCGACCAGTTCTCACTACCAGTATCCAGGATCTGGTGCTGAAGGATGTACGGCATCCGTGCCTCGAACT
ACAGGAAGGCGTCTCGTATATACCCAATGATGTTGTTCTCGAACGAGATTCGAGTCTGATGCATATAGTGACGGGCGCCAATATGGGTGGTAAATCCACGTGGATGAGGTCGTGTGGGGTGGCTGTGATCCTCGCTCACGTGGGGTCCTTCGTGCCAGCCGAATACGCCAAAATACCCATCCTAAGGTCTCTATGCGCTAGAATCGGTGCCAGCGATAGAGAGGAGAAAGGCCAGAGTACTTTCATGCTAGAGATGCTAGAGACGGCTGGGATATTGAGGAACGCTACGGCCGATTCTCTGGTCCTGATCGACGAACTCGGTCGTGGAACATCTACGTACGAGGGTTGCGGCATCGCTTGGGCTATCGCTGAAAAACTTTCAAAGGAGATCCAATGCTTCTGTCTGTTCGCGACCCACTACCACGAGCTGACCCGGCTGGCGTCGTGTGGTTCTCGCGTCGTCAACTCGCAGGCGCTGGCGGATGTCGTCGACGGCCGGCTCGTGTTGCTGCATCGCGTGGTACAGGGGCCAGCCGCCAAGTCTCTGGGGCTGCACGTCGCTAAGATCGCTGACTTACCGGAAGATATACTGCAGTTCGCAGAAGAGAAGCAGGCGGAGTTAGAAACGGATCTTTGCGAGGTCGAATCCGAAGTTAGATCTGAAGATACATCCGAAGGGCAGGCGTTCATCAAAGAGTTTCTCATAAAATGCAAGCAAATACAGGAAAAGAACGAGTCGGATGAAAAAATGATGGCTGAAATAAAGAAGCTGAAACAAGAAATGTTGCAGACGGATAACAAATATGTGGCCGCGTTGCTCAGCCGCTGA
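The protein sequence should be the conceptual translation of this CDS. The sketch below performs that check with the standard genetic code (NCBI translation table 1), which is assumed to be appropriate for this nuclear gene. Like the protein record, it writes the stop codon as `-`. Only the first 30 and last 36 nucleotides are used, to keep the example short.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order,
# with the first codon position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, stop_symbol: str = "-") -> str:
    """Translate a CDS codon by codon; stop codons become stop_symbol."""
    cds = "".join(cds.split()).upper()  # drop line breaks from wrapped FASTA
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)


# First 30 nt and last 36 nt of DPOGS203760-TA
print(translate("ATGCCGGACCTGCAAGCTTTGGCCCGGAGA"))         # MPDLQALARR
print(translate("GATAACAAATATGTGGCCGCGTTGCTCAGCCGCTGA"))   # DNKYVAALLSR-
```

To check the full record, pass both sequence lines to `translate` and compare the result with the protein sequence.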

Protein sequence:

>DPOGS203760-PA
MPDLQALARRLARKKAGLQDCYRIYQAINRIPVLLKCLSEFNDPTIHSVLCEPIAELNNDLEKFQQMIETTIDLEAVDRGSKPPTTVRIFHRNEYYSVHGADATTAAREVFSSTSNIKRMGIEPNKLDYLVLSKGNFEILIRKLLLVRRYRVEIFVSEGSVKSCDWSLRYKGSPGYLSQLEEIVGDGLGSANEQSTCLMAVNVKSDAISKGRLVGIACVYQNDYTLSVSEFTDDVDFTQLESIVVQVAPSECVAAPADNDYKALKKVMDRASVTVTKVKKSEFTTEGLIQDLNRLLKFKEDQQKDANGFQETKLPVAMSALAAAVRYTSLLNDDTNFGRFRISSVKADYLQLDSSALSALNVFPELGDTNTSPTRSIYGLLDRCRTQHGKRLLCQLLRQPLRDINLINERLDIIQLLQLHEDHLRRMPDLQALARRLARKKAGLQDCYRIYQAINRIPVLLKCLSEFNDPTIHSVLCEPIAELNNDLEKFQQMIETTIDLEAVDRGDFLVKPSFDEELQVLANDLEKLQNSAEKELNKAARDLDMEAGKTIKLENNPQHGFKYTIVDAIKGGVRFRNSCLGDITENYLQAKAAYEKEQDKVVAEIINIASTYSECLYCLSNIISKLDVLVSLSVVASTSSSKYTRPVLTTSIQDLVLKDVRHPCLELQEGVSYIPNDVVLERDSSLMHIVTGANMGGKSTWMRSCGVAVILAHVGSFVPAEYAKIPILRSLCARIGASDREEKGQSTFMLEMLETAGILRNATADSLVLIDELGRGTSTYEGCGIAWAIAEKLSKEIQCFCLFATHYHELTRLASCGSRVVNSQALADVVDGRLVLLHRVVQGPAAKSLGLHVAKIADLPEDILQFAEEKQAELETDLCEVESEVRSEDTSEGQAFIKEFLIKCKQIQEKNESDEKMMAEIKKLKQEMLQTDNKYVAALLSR-
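One quick sanity check on a predicted protein is to look for long verbatim internal repeats, which can point to a duplicated exon in a gene model. In this record the first 80 residues (MPDLQALARR...TIDLEAVDRG) appear again later in DPOGS203760-PA, directly after ...QLHEDHLRR. The sketch below finds the longest prefix of a sequence that recurs at a later position. `longest_recurring_prefix` is a hypothetical helper, not part of any annotation pipeline. Its simple scan is adequate for a protein of this size.

```python
def longest_recurring_prefix(seq: str, min_len: int = 10) -> tuple[int, int]:
    """Return (length, 0-based start of the later copy) for the longest prefix
    of seq that occurs again at a later position, or (0, -1) if no prefix of
    at least min_len residues recurs."""
    best = (0, -1)
    for k in range(min_len, len(seq)):
        pos = seq.find(seq[:k], 1)  # search from index 1 to skip the prefix itself
        if pos == -1:
            # If the k-residue prefix never recurs, no longer prefix can either.
            break
        best = (k, pos)
    return best


# Toy sequence with the same layout: prefix, linker, then a second copy.
toy = "MPDLQALARR" + "SKPP" + "HLRR" + "MPDLQALARR" + "DFLV"
print(longest_recurring_prefix(toy, min_len=5))  # (10, 18)
```

Applied to the full protein string above, the helper should report a recurring prefix of 80 residues, because the two copies diverge at the next residue (S in the first copy, D in the second).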