Monarch geneset OGS2.0

DPOGS203153
Transcript: DPOGS203153-TA, 1107 bp
Protein: DPOGS203153-PA, 368 aa
Genomic position: DPSCF300035 - 981250-983580
RNAseq coverage: 402x (Rank: top 30%)
Annotation
Heliconius      HMEL006505        1e-145  90.49%
Bombyx          BGIBMGA011498-TA  2e-165  85.29%
Drosophila      sr-PA             3e-77   89.58%
EBI UniRef50    UniRef50_E0VJ72   2e-75   81.29%  Early growth response protein, putative n=3 Tax=Neoptera RepID=E0VJ72_PEDHC
NCBI RefSeq     XP_970833.2       1e-77   74.09%  PREDICTED: similar to conserved hypothetical protein [Tribolium castaneum]
NCBI nr blastp  gi|270002538      2e-76   74.09%  hypothetical protein TcasGA2_TC004846 [Tribolium castaneum]
NCBI nr blastx  gi|189234146      2e-80   44.90%  PREDICTED: similar to conserved hypothetical protein [Tribolium castaneum]
Group
Gene Ontology    GO:0003676     2.4e-14  nucleic acid binding
                 GO:0008270     1.1e-05  zinc ion binding
                 GO:0005622     1.1e-05  intracellular
KEGG pathway     nvi:100117007  1e-75
                 K09203 (EGR1)  maps-> Prion diseases
InterPro domain  [325-355]      IPR013087  2.4e-14  Zinc finger, C2H2-type/integrase, DNA-binding
Orthology group  MCL25861       Lepidoptera specific

Nucleotide sequence:

>DPOGS203153-TA
ATGGGCGCGGATGGAGACCCACCGACCGGGCCTCACCTCCTCTCGCTGGCGGACGTCGGGGCGCTGGGCTTCGACTGCGCTCTGAAGCCGGTGACCGCGCCTATGACAGGCGGCGCTCCGGCCGATCTCAACACACCCGTGTCCACATCGGAACTTCCCGCTTTCTTCCCGAGCCTGCTCGAGCCTCCTCCGATATCAGGTACTTTACCAGGCGATGAGTTACTGGGGTGCTCCCCTCGTCGTCACAAGCACGAAGCGTCTTTGTCACCGGGAGCGAGGGCTGAGGACGCTAGCAATGCCTCTAGTGCTAGCGCCTCTCTATACGGACCGCCGATGGGCGGCAAAAGAGCTCCCTCACCACCACTACAATGGTTGCTACCATCTGGACCCGGTCCCGGCAGCGTCGATAAATACTTCCAACAAGAATACGAGGAACGCGTCGAGCTTCTACCGCCCGAATGTCAGCCTTCTTACTGTACAGCACCGCAGCAATGCCAGCCGCAACACTGCGACTACAGACCCCAACCTCCACCACAACCCCAACACTCGTGGGAGACGCAGGAGTACGCGAGCGTGCCGCAGCCAACACCGGGTCCCTCCGGAGTCCCCAAAAGAGAACCCTATCCAAACACAACAGGCGACAGACCCGTGCAACTAGCAGAATACAACCCGTCCACGAGCAAAGGCCATGAGATATTATCTCAAGTGTATCAACAGAGCGCTCAACCACTGCGTCTAGTCGCCGTCAAACCTCGCAAGTATCCCAACCGTCCGAGTAAAACACCCGTACATGAAAGGCCCTATGCCTGTCCAGTGGACGAGTGTGATCGCAGGTTTTCGAGATCAGACGAGCTGACAAGGCACATACGCATACACACAGGACAAAAACCGTTCCAGTGTCGTATCTGTATGCGCTCGTTCAGTCGATCGGATCATTTGACGACACATGTCAGAACTCACACAGGGGAGAAGCCGTTTGCGTGCGACGTGTGCGGTCGTAAGTTCGCGAGGTCTGATGAGAAGAAGCGTCACGCGAAGGTTCACCTTAAGCAGCGTCTCAAACGCGAGCGGGGCAGTGGACCGGCTCACCCACACGCGCCGCTCTAG

Protein sequence:

>DPOGS203153-PA
MGADGDPPTGPHLLSLADVGALGFDCALKPVTAPMTGGAPADLNTPVSTSELPAFFPSLLEPPPISGTLPGDELLGCSPRRHKHEASLSPGARAEDASNASSASASLYGPPMGGKRAPSPPLQWLLPSGPGPGSVDKYFQQEYEERVELLPPECQPSYCTAPQQCQPQHCDYRPQPPPQPQHSWETQEYASVPQPTPGPSGVPKREPYPNTTGDRPVQLAEYNPSTSKGHEILSQVYQQSAQPLRLVAVKPRKYPNRPSKTPVHERPYACPVDECDRRFSRSDELTRHIRIHTGQKPFQCRICMRSFSRSDHLTTHVRTHTGEKPFACDVCGRKFARSDEKKRHAKVHLKQRLKRERGSGPAHPHAPL-
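
Consistency check (illustrative):

The transcript and protein records above should agree: 1107 bp corresponds to 368 codons plus one stop codon, and the trailing "-" in the protein sequence appears to mark that stop. The Python sketch below shows one way this could be checked. It assumes the standard genetic code and that the transcript is a complete coding sequence; the names CODON_TABLE, translate and lengths_consistent are hypothetical helpers, not part of the database.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases in T, C, A, G order.
# Assumption: the record uses the standard code; "*" marks stop codons here.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds: str, stop_symbol: str = "-") -> str:
    """Translate a coding sequence codon by codon.

    Stop codons are written as `stop_symbol` to match the "-" that ends
    the protein sequence in the record above.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    residues = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        residues.append(stop_symbol if aa == "*" else aa)
    return "".join(residues)


def lengths_consistent(transcript_bp: int, protein_aa: int) -> bool:
    """True if the transcript holds exactly the protein's codons plus one stop."""
    return transcript_bp == 3 * (protein_aa + 1)


if __name__ == "__main__":
    # First six codons and last three codons of DPOGS203153-TA.
    print(translate("ATGGGCGCGGATGGAGAC"))
    print(translate("CCGCTCTAG"))
    print(lengths_consistent(1107, 368))
```

Passing the full DPOGS203153-TA sequence to translate and comparing the result with DPOGS203153-PA would extend the same check to every residue.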