Monarch geneset OGS2.0

DPOGS202600
Transcript: DPOGS202600-TA (3360 bp)
Protein: DPOGS202600-PA (1119 aa)
Genomic position: DPSCF300140, 413973-418719
RNAseq coverage: 7x (Rank: top 86%)
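The header lengths can be cross-checked with simple arithmetic. This sketch makes two assumptions that the record does not state. First, it treats the transcript as coding sequence only, since it begins with ATG and ends with the TAA stop codon. Second, it treats the genomic coordinates as 1-based and inclusive. The helper names are illustrative.

```python
# Values copied from the DPOGS202600 header above.
TRANSCRIPT_BP = 3360
PROTEIN_AA = 1119
SCAFFOLD_START, SCAFFOLD_END = 413973, 418719  # on scaffold DPSCF300140


def codon_count(cds_length_bp: int) -> int:
    """Number of codons in a CDS; raises if the length is not a multiple of 3."""
    if cds_length_bp % 3 != 0:
        raise ValueError(f"{cds_length_bp} bp is not a whole number of codons")
    return cds_length_bp // 3


def inclusive_span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval (assumed coordinate convention)."""
    return end - start + 1


codons = codon_count(TRANSCRIPT_BP)                  # 1120 codons
residues = codons - 1                                # drop the terminal stop codon
span = inclusive_span(SCAFFOLD_START, SCAFFOLD_END)  # genomic interval length
outside_transcript = span - TRANSCRIPT_BP            # bp of the interval not in the transcript

assert residues == PROTEIN_AA
print(codons, residues, span, outside_transcript)    # 1120 1119 4747 1387
```

If those assumptions hold, about 1.4 kb of the genomic interval lies outside the transcript, presumably as intronic sequence. The record itself does not list exon boundaries.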
Annotation
Heliconius: (none listed)
Bombyx: BGIBMGA006345-TA (E-value 1e-50, 38.59% identity)
Drosophila: (none listed)
EBI UniRef50: UniRef50_UPI0001CB94FF (E-value 8e-31, 30.00% identity) UPI0001CB94FF related cluster n=1 Tax=unknown RepID=UPI0001CB94FF
NCBI RefSeq: XP_002730726.1 (E-value 2e-31, 30.00% identity) PREDICTED: MutL protein homolog 1-like [Saccoglossus kowalevskii]
NCBI nr blastp: gi|291221411 (E-value 3e-30, 30.00% identity) PREDICTED: MutL protein homolog 1-like [Saccoglossus kowalevskii]
NCBI nr blastx: gi|242022721 (E-value 6e-58, 22.79% identity) DNA mismatch repair protein, putative [Pediculus humanus corporis]
Group
Gene Ontology: GO:0005524 ATP binding (E-value 3.7e-61)
  GO:0006298 mismatch repair (E-value 3.7e-61)
  GO:0030983 mismatched DNA binding (E-value 3.7e-61)
KEGG pathway: tgu:100230617 (E-value 5e-30)
  K08739 (MLH3) maps to: Mismatch repair
InterPro domains: [7-1106] IPR002099 DNA mismatch repair protein (E-value 3.7e-61)
  [7-131] IPR003594 ATPase-like, ATP-binding domain (E-value 8.8e-13)
  [893-1054] IPR014790 MutL, C-terminal, dimerisation (E-value 1.7e-12)
Orthology group: MCL22120 (insect specific)
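InterPro reports each hit as a bracketed residue range on DPOGS202600-PA. This sketch assumes those ranges are 1-based and inclusive, which is the usual InterPro convention. It converts the ranges to Python slices, computes each domain's length and its share of the 1119-aa protein, and checks that the two smaller domains lie inside the IPR002099 interval. `domain_length`, `domain_segment` and `nested_in` are hypothetical helpers, not part of any InterPro tool.

```python
# InterPro hits for DPOGS202600-PA, copied from the record above.
# Ranges are treated as 1-based, inclusive residue positions (assumption).
PROTEIN_LENGTH = 1119
DOMAINS = [
    ("IPR002099", 7, 1106, "DNA mismatch repair protein"),
    ("IPR003594", 7, 131, "ATPase-like, ATP-binding domain"),
    ("IPR014790", 893, 1054, "MutL, C-terminal, dimerisation"),
]


def domain_length(start: int, end: int) -> int:
    """Residue count of an inclusive [start-end] range."""
    return end - start + 1


def domain_segment(protein: str, start: int, end: int) -> str:
    """Residues start..end (1-based, inclusive) as a 0-based Python slice."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError(f"[{start}-{end}] lies outside a {len(protein)}-aa protein")
    return protein[start - 1:end]


def nested_in(inner: tuple, outer: tuple) -> bool:
    """True if the inner hit's range falls entirely within the outer hit's range."""
    return outer[1] <= inner[1] and inner[2] <= outer[2]


for acc, start, end, name in DOMAINS:
    length = domain_length(start, end)
    print(f"{acc} [{start}-{end}] {length} aa ({length / PROTEIN_LENGTH:.1%}) {name}")

print(nested_in(DOMAINS[1], DOMAINS[0]), nested_in(DOMAINS[2], DOMAINS[0]))  # True True
print(domain_segment("MCTLKSAPNKYG", 7, 9))  # APN (first residues of DPOGS202600-PA)
```

Applied to the full DPOGS202600-PA string, `domain_segment(protein, 893, 1054)` would return the 162 residues of the MutL C-terminal dimerisation region.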

Nucleotide sequence:

>DPOGS202600-TA
ATGTGTACATTGAAATCTGCCCCTAATAAATATGGTTATCGTGGTTTATCTTTAGCAAGTGTTATAGGAATTTCACAGACTGTTTTAATTACTTCAAGATATAATGATTCTGACTCAACATGGCTAAAAACGTTTTGTAATGGAACAGAGAAGAATATTTGTATTGTATCAACAAGACCATCAAAAGGCACAACGGTAGAAATCAGAGGATTCCTATACAATCTAAACATTCAAAGAAAAGCAATAAATCCTATAAATGAATTACAAAACATCAAATCATCTTTAGAGAAATTGTCATTAATTCACTGTGATGTATCTATTAGCTTAAGGGATGATTATAAGAATAAGATTATATTTAAAATGTACAAAAAAAGAGATATTTATCAAACTTTATGGTCTTTATTTGATATTAATAAAGAAGATGTTCAAGAATTGCAAGTTGAAAAAAATAATTATAAAGCAAAAGCATTTATTGCCAATGAAAATATAATGAAAACCAGACATTTTAATCATCAATGGGTATATTTAAATGGAAAATTTGTTACAAAATCTGAAATACATACAAAAATAAACAGAGTTTTTAAAAAAACTTTCCATAAAGTACAAAAAATTACAAAAATTAAGAATAATATTGATGAAGACCATAACAGTGATATACCATTTTATTTTATATTTATATCATGTCCATTTTATGATTTTGACATAACATATAAGCACAAACAAACAATAGTTGAGTTTAAAGATTGGTCAGAAAAATCAAAATATATCGATCATATTGCCCTTAAATTCAAAAATTTAACCGATAGCGTTAGAGATAGACGTAAAATATTATCATCGGAAGCACGAAATTTAATTAAAAAGAAAGAATCTAATACTGAAACTATCAAAATAACTTACACAGCAGAGTATAACGAAACTTATTCGTATTATAAGGGGAAAGACGAATTAAATATTGATTATAGATTTGTTAGTAAAACCGATATCAGAGATACTCATCGTATATTCGATTCTGAAACATTTAAATCATTACATCCGAAAATTACATTTGACAGTCACAACTGTCAAAATTATGCGAAGACAAGGAATATATTCACAAAAAATTTAAACTGTGCTAACACAACTGATAAAATACATTTGGATGAACATGATTCGAATATTTGCCACGAAAATTTAAAAAGATTTAGTGATAACGGATGTTATGATGAAAATAGTTTTATAAAATATAATAATGAGCAAATGTCGTATACGATAAATAGTGATACCAGTAAAAGAATAATTATTAGTAATAATTCATATTTAAATAAATATATAGAGACAGAAACTAACGCAGATAAGATTAATTTATTCGATTTGATCGATAAACGATTATCGAATAAATCAAATTTAAAGACATCATACGATAAAGAAAATTTTCAAAATAAAAGAAACAGTAGTTATAATAAACCATATCATGCATTGATAGCTGATAATATAAGACAAAATAATGGGATAGATTTAACGTACCTAAGTTACAAAAACTATCAAAAAAAATATACTAATACTGTACGAAATAACACAGAAAGGGTCAAGTCAATAGGCGACGCACACTATAAAGAATTATCCCAAAATTTTCATTCTGTAAACTATTGTGAAACGATTTACCATAGCAATGAAATTAGTTTCGTAGCCGGAGACCTCCAAAATCGTAATAATTGTGTCAAAATAAGTCCTGATGATTTCAAGTCGCCTGAACAAACTCACAAAGAAAATTCAAATTTCTTTTTGCTAAATAATGAAGATATTCATCACGATAATAATATATTTCAAGAAGCAAATACTAATACGGAATTGCACCCTGTAGTAGAAAATCCGGTTCAAGACTTAAATAGTTATCAATTTATAAAGAGCCATGGCTGCAAGAAATCAAATATCGCAATATATTTCGATGCAGAAGATTTTCCGAATAAAGACAAATTTAACCTCAATGAAACATATTCCGTAATAAAAACAAATAACTTAAG
GGAAAACAATGACGTAGATTTGAATTTAATGAGTAATAAAAATTATACACAAGATTATAATCAAAACGAAATGAATAAAACTGAAATAATTGATAATAAGGAATTATCACAAAGTTTGGATCCTATGAATTATTGCGAAACGTTTTTCCGACGCAGCGAAATGAGTGACATTGCCAAAGAATTTATGAACAGCTTCAACATAAATACAAACGACCTGGGTTCTAACGAATGTGATCCTATTTCTAATGCTCACAATGAAAATTTCAAACTTAATTACTCCAATGATGAAAAAATACAAGACGAAGAAAAAATATCACAAAATTCAAACACTAATTCCGAACTGCACAGCGCAAAACGAAATTGTGTTGAAGATTTAAAAACTTTTGAATTGAAAAAACGTCATGACTTGATGCCGAAAGGTATGTCCCAAGTCTACAAGACTAGACTACAAAAACAAACTAATATAAGTATATCTCAAATCGACTATTACGAGAATATAATGTATGACAAATTTGCAGACGATGTTTTCGTAAAATCTAAAATATTTGCACCATCGATACAGAATGCTGAAGTCAATTCAAGGAAATTGAAGAATTGTGATATTAGAAATGATGATCTAATATTTAATGCCACGTCTTTGAGACAAGCCAAAGTAAGCTGGGGTGTTAGTACCGAGATTCTAGGTCAAATAGATCGTAAATTTATTGCCACAAAAATGAACGGGAAGAAAACTGACGTTAATGTAGATTTTTTGGTACTCTTCGATCAGCACGCGGTCGATGAAAGAGTTAAACTTGAAAGGAATTTAGCGGAATACTTTGACGGAGAACTCTGGCGTAGCGTTAAAGTAGATTCAATACCACTCAAGCTGAATGAAAACGAACTTGTCTATTTGCATAACCACAGACATAAATTCTCGCAATTCGGTTTACAGTGGACATTTCAAGAGAACAAAATATCGATCAATTCTATACCTAAAGCAATTATAGGCAAAAATGCCAGACAGGAGCAAATAGTTCTTAAAGCTGTTCACCGTCTGATATTAGAACAAATTGATGTCATTGAAACGATTGGTGGTAATCTGAATGTATTTCCCAAAGCAATTATGGATCTTGTTTTCAGTGAGGCTTGTCGGAATGCAATTAAATTTGGCGATAACGTATCTCTAAGTGATTGTACAACTTTGCTTAAGTCACTTTCATCCTGCAAAATCCCATTTCAATGCGCACATGGACGTCCTGTGATGACAGTCGTAATGGAACTTCCTAAAAACATTCGTAATTACAGGGTGGACAAGGAAAAGATTAAACAATTCAAATCACGTAAATATAATTCGAATAAATATATTGCTAGACATTAA
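The nucleotide record above wraps across two lines, which is valid FASTA. A record's sequence consists of every line between its `>` header and the next header, concatenated. The following is a minimal reader for that layout. `read_fasta` is a hypothetical helper, not a library API; Biopython's `Bio.SeqIO.parse(handle, "fasta")` is the usual library route.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word after '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # wrapped sequence lines are joined without separators
        # sequence lines appearing before any header are ignored
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = ">DPOGS202600-TA\nATGTGTACATTGAAATCTG\nCCCCTAATAAA\n"
print(read_fasta(example))  # {'DPOGS202600-TA': 'ATGTGTACATTGAAATCTGCCCCTAATAAA'}
```

For the record above, the joined sequence should match the 3360 bp stated in the header.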

Protein sequence:

>DPOGS202600-PA
MCTLKSAPNKYGYRGLSLASVIGISQTVLITSRYNDSDSTWLKTFCNGTEKNICIVSTRPSKGTTVEIRGFLYNLNIQRKAINPINELQNIKSSLEKLSLIHCDVSISLRDDYKNKIIFKMYKKRDIYQTLWSLFDINKEDVQELQVEKNNYKAKAFIANENIMKTRHFNHQWVYLNGKFVTKSEIHTKINRVFKKTFHKVQKITKIKNNIDEDHNSDIPFYFIFISCPFYDFDITYKHKQTIVEFKDWSEKSKYIDHIALKFKNLTDSVRDRRKILSSEARNLIKKKESNTETIKITYTAEYNETYSYYKGKDELNIDYRFVSKTDIRDTHRIFDSETFKSLHPKITFDSHNCQNYAKTRNIFTKNLNCANTTDKIHLDEHDSNICHENLKRFSDNGCYDENSFIKYNNEQMSYTINSDTSKRIIISNNSYLNKYIETETNADKINLFDLIDKRLSNKSNLKTSYDKENFQNKRNSSYNKPYHALIADNIRQNNGIDLTYLSYKNYQKKYTNTVRNNTERVKSIGDAHYKELSQNFHSVNYCETIYHSNEISFVAGDLQNRNNCVKISPDDFKSPEQTHKENSNFFLLNNEDIHHDNNIFQEANTNTELHPVVENPVQDLNSYQFIKSHGCKKSNIAIYFDAEDFPNKDKFNLNETYSVIKTNNLRENNDVDLNLMSNKNYTQDYNQNEMNKTEIIDNKELSQSLDPMNYCETFFRRSEMSDIAKEFMNSFNINTNDLGSNECDPISNAHNENFKLNYSNDEKIQDEEKISQNSNTNSELHSAKRNCVEDLKTFELKKRHDLMPKGMSQVYKTRLQKQTNISISQIDYYENIMYDKFADDVFVKSKIFAPSIQNAEVNSRKLKNCDIRNDDLIFNATSLRQAKVSWGVSTEILGQIDRKFIATKMNGKKTDVNVDFLVLFDQHAVDERVKLERNLAEYFDGELWRSVKVDSIPLKLNENELVYLHNHRHKFSQFGLQWTFQENKISINSIPKAIIGKNARQEQIVLKAVHRLILEQIDVIETIGGNLNVFPKAIMDLVFSEACRNAIKFGDNVSLSDCTTLLKSLSSCKIPFQCAHGRPVMTVVMELPKNIRNYRVDKEKIKQFKSRKYNSNKYIARH-
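The protein record is consistent with a conceptual translation of the transcript under the standard genetic code. The following is a minimal sketch of that translation. It builds NCBI translation table 1 from its compact TCAG-ordered string and marks stop codons as `*`; this record prints the terminal stop as `-`. Codons containing ambiguous bases become `X`. The helper name `translate` is hypothetical. Biopython's `Bio.Seq.Seq.translate` performs the same operation and supports the other NCBI tables.

```python
BASES = "TCAG"
# NCBI translation table 1 (standard code); codon order is TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence codon by codon ('*' = stop, 'X' = unknown)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3))


# First 15 codons of DPOGS202600-TA, and its last 5 codons ending in the TAA stop.
print(translate("ATGTGTACATTGAAATCTGCCCCTAATAAATATGGTTATCGTGGT"))  # MCTLKSAPNKYGYRG
print(translate("ATTGCTAGACATTAA"))                                # IARH*
```

Run over the full joined transcript, this should reproduce DPOGS202600-PA with `*` in place of the final `-`.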