Monarch geneset OGS2.0

DPOGS200324
Transcript        DPOGS200324-TA  912 bp
Protein           DPOGS200324-PA  303 aa
Genomic position  DPSCF300026 + 175569-183244
RNAseq coverage   57x (Rank: top 69%)
Annotation
Heliconius      HMEL011564        4e-112  95.41%
Bombyx          BGIBMGA005619-TA  1e-92   85.00%
Drosophila      unc-4-PA          5e-75   78.70%
EBI UniRef50    UniRef50_O77215   7e-73   78.70%  Homeobox protein unc-4 n=34 Tax=Coelomata RepID=UNC4_DROME
NCBI RefSeq     NP_573242.1       1e-73   78.70%  unc-4 [Drosophila melanogaster]
NCBI nr blastp  gi|56411860       2e-72   76.70%  paired-type homeodomain protein [Drosophila sechellia]
NCBI nr blastx  gi|347966025      1e-75   81.87%  AGAP001495-PA [Anopheles gambiae str. PEST]
Group
Gene Ontology     GO:0003677  1.1e-27  DNA binding
                  GO:0006355  1.1e-27  regulation of transcription, DNA-dependent
                  GO:0043565  1.7e-24  sequence-specific DNA binding
                  GO:0003700  1.7e-24  sequence-specific DNA binding transcription factor activity
                  GO:0005515  2e-23    protein binding
KEGG pathway 
InterPro domain   [31-100]  IPR012287  1.1e-27  Homeodomain-related
                  [40-102]  IPR001356  1.7e-24  Homeobox
                  [27-97]   IPR009057  2e-23    Homeodomain-like
Orthology group   MCL15011  Insect specific
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS200324-TA
ATGGTATTAAGTTTAAAAGTTCGGTTTAAAAAGGCTGTGGCTTATGGTTGTGAGATACGGTACATTTGGGCCCGGGTTATAGGTGGTCTCGCTGATGATGATGGCGATAGCGGCAAGCGACGCCGCTCCCGCACCAACTTCAACTCCTGGCAGCTGGAGGAGCTGGAGAGAGCGTTCCTCGCGTCCCACTACCCGGATGTATTCATGAGGGAGGCGCTGGCGATGCGCCTTGACCTCAAGGAGAGTAGAGTTGCCGTATGGTTCCAAAACCGCCGCGCGAAGTGGCGTAAGAAGGAGCACACGAAGAAGGGTCCAGGTCGACCAGCTCACAACGCTCATCCTCAGAGCTGTTCCGGGGAACCGATCCCACCACACGAGCTGCGAGCCAGGGAGAGGGCGAGAAGAAGAAAGAAGCTGGCAAAAGCTCTTGAGAGACAGGCGAGGAAGCTTCGAGCCAAAGGCATCGCGGTAGATCTCGAGGCCTTGAAGGCCGAATATTTAGCACAGCACAGAAACAGCGGTTTGTACTCCGACTCGGACGGAGACATCGATGGAGAAGAATGTATGATAGATGTAGTCGGCGGGGACTCCTGCCACGACTCAGGTCCCGAGGACTTCTCACTATCAGGACGTCTCCGAGCCCCGCCCGGAGCGGACGCTATCAATCATTCAGACAGCTCCACATCGGATGGCCACTCGGCGCGTAACTTCAGCCCGCTGCCGGACCCGCAGACATCATTCTTCAAACAGTTCACATCAGCCGCCACAAACTTCGACAAATTCGACTTCGACTCCGGCCGAGCCGTTTCCTTAGACCTCAGCGGTAGTTCCAACACGGAACAGGGTTTATCCAAGAGCGGGGGCGCGAGGAAGCACAATCCGTTCAGTATAGAATCTCTCCTTAACACGTGA

Protein sequence:

>DPOGS200324-PA
MVLSLKVRFKKAVAYGCEIRYIWARVIGGLADDDGDSGKRRRSRTNFNSWQLEELERAFLASHYPDVFMREALAMRLDLKESRVAVWFQNRRAKWRKKEHTKKGPGRPAHNAHPQSCSGEPIPPHELRARERARRRKKLAKALERQARKLRAKGIAVDLEALKAEYLAQHRNSGLYSDSDGDIDGEECMIDVVGGDSCHDSGPEDFSLSGRLRAPPGADAINHSDSSTSDGHSARNFSPLPDPQTSFFKQFTSAATNFDKFDFDSGRAVSLDLSGSSNTEQGLSKSGGARKHNPFSIESLLNT-
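
The transcript and protein entries above can be checked against each other: a 303 aa protein plus one stop codon corresponds to (303 + 1) x 3 = 912 bp, and the protein record writes the stop as a trailing "-". The Python sketch below shows one way to reproduce that translation. It assumes the standard genetic code (NCBI translation table 1), and the function name `translate` and the `stop_symbol` parameter are hypothetical, chosen for illustration only; the record itself does not state which tool produced the protein sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds, stop_symbol="-"):
    """Translate a coding sequence codon by codon.

    Stop codons are written as `stop_symbol`; "-" is assumed here only
    because the protein record above ends with that character.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    residues = [CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3)]
    return "".join(residues).replace("*", stop_symbol)


# First 12 codons and last 4 codons of DPOGS200324-TA, copied from the record.
start = translate("ATGGTATTAAGTTTAAAAGTTCGGTTTAAAAAGGCT")
end = translate("CTTAACACGTGA")

# Length relation between the 303 aa protein and the 912 bp transcript.
expected_bp = (303 + 1) * 3
```

Passing the full DPOGS200324-TA sequence to `translate` should give the DPOGS200324-PA sequence, including the trailing "-" for the TGA stop codon.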