Monarch geneset OGS2.0

DPOGS210900
Transcript: DPOGS210900-TA, 1074 bp
Protein: DPOGS210900-PA, 357 aa
Genomic position: DPSCF300045 - 561678-584644
RNAseq coverage: 165x (Rank: top 51%)
Annotation
Heliconius      HMEL003506        7e-152  86.13%
Bombyx          BGIBMGA003082-TA  2e-65   84.62%
Drosophila      Sox21b-PA         9e-61   56.82%
EBI UniRef50    UniRef50_E2B1E2   6e-73   58.36%  Transcription factor SOX-21 n=11 Tax=Neoptera RepID=E2B1E2_CAMFO
NCBI RefSeq     XP_971967.1       4e-82   56.87%  PREDICTED: similar to Sox21b CG32139-PA [Tribolium castaneum]
NCBI nr blastp  gi|270008130      4e-82   57.10%  Sox21b [Tribolium castaneum]
NCBI nr blastx  gi|270008130      3e-94   58.09%  Sox21b [Tribolium castaneum]
Group:
Gene Ontology:
    GO:0003677  1.8e-33  DNA binding
    GO:0005515  2.6e-30  protein binding
KEGG pathway:
InterPro domain:
    [65-147]  IPR000910  1.8e-33  High mobility group, HMG1/HMG2
    [47-139]  IPR009071  2.6e-30  High mobility group, superfamily
Orthology group: MCL16621 Insect specific
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS210900-TA
ATGTACACCCTTTCACAACTCGACTACCAAACGTCGATGAACTCTGCCGCTGGTGCTATGTGCGATCGAGCTGGTTATGGTAGCACCATGAGCCACTTCCCCATGGGCCCAATGTCGGCCATGGGAGCCATCGGCACAATGGGAGCGATCTCGGCCATGAACGGCCAATCACACCAGAAGAAATCCCAGGAGGAACACATCAAGCGACCCATGAACGCGTTCATGGTGTGGTCCAGATTACAAAGGAGGAAAATTGCCCAGGACAATCCAAAAATGCATAACTCGGAGATCTCAAAGAGATTAGGCGCTGAATGGAAGTTGCTCTCAGAAGACGAGAAGAGACCCTTCATTGATGAGGCGAAGCGTTTGAGAGCCATGCACATGAAGGAACACCCAGACTATAAGTACAGACCGCGGAGGAAGCCCAAGACTTTGAGGAAGGAAGGCTATCCCTACTCCATCCCCTATCCCAGTGTACCGATGGACGCCTTGAGAGCCGGTATGGCCGGTGGTGGTATGACCCAGGCCATGGGTGGCTACTATGGTGCTGCATACGGACCTCTGGGAGCGAGTATGGCCGCAGCAGCAGCGGCGGCGGCGCAGCAGAACGCCATATCGGCAGCGTTGACGCCGAACGCACAGGTTGGTTCGTCAATGGATATGTCTAAATACGGTATAGACGACAAATACAGGAGCTATGGTATGTATCCCGACCCGTCCCGGGGCTACTTGGATTCGGCGGCCCTCTCCAAGGCCTACATGTACATGGATCAACAGCAGCAACGCTCGTATCCTATGGATATCAGCAAGATGTACTCCGAGGCCTCGGTGGCTATGGCTGGATTGAGCTCCACTGTTAACTCTGCATCCAGCTTGTCTCCGAGGTCTCCAGCGGAGTCGCCCGATACTAAGCAGGACCGCCCCGAGGGGTCAACGTCCTCGGGGTCCAATCCCTCCCCCTCCCTCCCCTACTACCAGTCCTCGGGGCTCCTGATGCCCCAGTATCCCGGCCAGTATCCTCAGAACACCCAAGCCGGTCCGGAGTTCAGGCGGCCGCTCACTGTTATATTCTGA

Protein sequence:

>DPOGS210900-PA
MYTLSQLDYQTSMNSAAGAMCDRAGYGSTMSHFPMGPMSAMGAIGTMGAISAMNGQSHQKKSQEEHIKRPMNAFMVWSRLQRRKIAQDNPKMHNSEISKRLGAEWKLLSEDEKRPFIDEAKRLRAMHMKEHPDYKYRPRRKPKTLRKEGYPYSIPYPSVPMDALRAGMAGGGMTQAMGGYYGAAYGPLGASMAAAAAAAAQQNAISAALTPNAQVGSSMDMSKYGIDDKYRSYGMYPDPSRGYLDSAALSKAYMYMDQQQQRSYPMDISKMYSEASVAMAGLSSTVNSASSLSPRSPAESPDTKQDRPEGSTSSGSNPSPSLPYYQSSGLLMPQYPGQYPQNTQAGPEFRRPLTVIF-
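
The two FASTA records above can be checked against each other: the record lists a 1074 bp transcript and a 357 aa protein, and 357 codons plus one stop codon account for exactly 1074 bases. The following is a minimal Python sketch of that check, not part of the original record. It assumes the standard genetic code (NCBI translation table 1) and follows this record's convention of writing the stop codon as "-". The function names read_fasta_sequence and translate are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_fasta_sequence(text):
    """Return the sequence of a single-record FASTA string, header removed."""
    lines = [line.strip() for line in text.strip().splitlines()]
    return "".join(
        line for line in lines if line and not line.startswith(">")
    ).upper()


def translate(cds, stop_symbol="-"):
    """Translate a coding sequence codon by codon.

    Stop codons are written as stop_symbol ("-" by assumption, to match
    the protein record above); unrecognised codons become "X".
    """
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)


if __name__ == "__main__":
    # First five codons of DPOGS210900-TA, copied from the record above.
    fragment = read_fasta_sequence(">DPOGS210900-TA\nATGTACACCCTTTCA\n")
    print(translate(fragment))  # MYTLS
```

To check the full record, pass the complete DPOGS210900-TA sequence to translate and compare the result with the DPOGS210900-PA sequence, including the trailing "-".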