Monarch geneset OGS2.0

DPOGS203366
Transcript        DPOGS203366-TA  2799 bp
Protein           DPOGS203366-PA  932 aa
Genomic position  DPSCF300003 + 165114-174934
RNAseq coverage   424x (Rank: top 29%)
Annotation
Heliconius        HMEL013531              0.0     59.09%
Bombyx            BGIBMGA003891-TA        0.0     65.41%
Drosophila        Taf4-PE                 6e-89   65.08%
EBI UniRef50      UniRef50_UPI00022C96A5  9e-137  51.33%  UPI00022C96A5 related cluster n=2 Tax=unknown RepID=UPI00022C96A5
NCBI RefSeq       XP_001664327.1          1e-101  66.92%  transcription initiation factor [Aedes aegypti]
NCBI nr blastp    gi|340716310            7e-170  42.76%  PREDICTED: transcription initiation factor TFIID subunit 4-like isoform 1 [Bombus terrestris]
NCBI nr blastx    gi|383853794            0.0     43.47%  PREDICTED: transcription initiation factor TFIID subunit 4-like [Megachile rotundata]
Group
Gene Ontology     GO:0005669  1.4e-71  transcription factor TFIID complex
                  GO:0006352  1.4e-71  transcription initiation, DNA-dependent
                  GO:0006355  8.4e-44  regulation of transcription, DNA-dependent
                  GO:0003700  8.4e-44  sequence-specific DNA binding transcription factor activity
                  GO:0003677  4.3e-13  DNA binding
KEGG pathway      aag:AaeL_AAEL005975  4e-101
                  K03129 (TFIID3, TAF4) maps-> Huntington's disease
                                               Basal transcription factors
InterPro domain   [686-930]  IPR007900  1.4e-71  Transcription initiation factor TFIID component TAF4
                  [391-481]  IPR003894  8.4e-44  TAFH/NHR1
                  [724-771]  IPR009072  4.3e-13  Histone-fold
Orthology group   MCL12440 Single-copy universal gene
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS203366-TA
ATGGCGTCAGCGGAGTTTTTAGAGCAAGCTTTGTCTACAGACGTTGATGAGAATGCAGTTAACGCGATAGTAGGTTCCCTAGAAAATCATTTAGTGACGTCCGTTCCGTCAATATCGTCACAGAACAATTTGTTGACTGTTATTCCAAGTCAACTTAGCCTTGCAACAAGTGAAAATACCATTATTGGACAAAAATATAACAAAGAGAATAGTGATGGCGATATAGGGAGTGTAAACTTTAGACCAAATATTGTTTCAAGTTCCTCGTTTAGTTTACCCTCAACTTTTATTAACCAAACAAGCCTGTCTCAAAACATTTCAAATGGTACTGATTTGGTAAAAGTTATAAGTTCTCAACCGCTAACTTTATCCGTCTCTGATAATAGTGTTGTGTTCTCAGCGCCATCATACGCAAACGGTTGTCCTTCTTTGCCGTTATCCCAAGCTCAGATAATTCAGACTGTACAAGGAAGTAGTGCAATAAATCAGCCAATTAATAAATCTATTACTATGCAAAATCCTCCTTTGGTTATAAAACAGGGAACGACTTCTGGTCAAGTCAGTATGCAAGCCAATATGGTACCAATGACAGTGAATTCTAGCATGCCGGGTTCTATTTCTAACGTGATGACTATTAATAAGCCAGGAGGGCAGAACGTCGTTGTCACAACACAGAATCTCGGTACAGGCCAACCTGCTATATTGCCCAATGTTCAAATTTTAAACATGAGGCCGGGTGCGCCTGCGGTGGCGGCTCAAAAATCGGTCGCAACTGTGTCCCCGCGCGTTGTTATCGGAACTCCTCAGGTTGTTGGACAGAGAGCAGCTGCCCCTGGAATAACGCTGCAAACACTACAAAGTCTACAACAGGGGCAGCAGGGTCATTTGTTATTAAAGACTGAGAACGGTCATTACCAGCTGTTAAGAGTGGGTCCAGGGCCCGGGGCCAGCACGCTGGCGCCGCAGCAACAGACGATGCGACTGTCCACAGTGCCGGCACATCCCGGGGTGTCAACGGTGTCCACGAGCGTGCCGGCCCCGGTACAGATACCTGGTCAGATGCCGCAGGGGCCGGTCGCTACCCCGGTGCCAGCGGCATCTGTGACCGTGCCTCTGCCTTCGCCACAATCACTCCAACCCACAGTCACTACTCAGAAGCCGTTGGACAACACTAAGGAGAAGTGTCGCAACTTCCTGGCCAACCTGCTGGACCTGTCTAGCAAGGAGCCGAAGTCCGTGGAGAGGAGTGTCAGGAACCTCATACAGGAACTGATCGACGCTCAGGTGGAACCGGAGGAGTTCTGTGATAGGTTGGAAAGACTCCTGAACGCCAGCCCACAACCCTGCCTCATCGGCTTCCTGAAGAAAAGTCTACCGTTGCTGCGTCAGTCCCTCGTCACCAAGGAGCTGGTGATAGAGGGCATCAACCCGCCGTCTCCGCACGTGGCGTTCTCAGCGATATCGCCGCAAGCACCCAACACGGCGGTAGCCACCAGCAACATACAGATGCCGGGCCTAACGTTAGTGGTTCGGCAGCCAGATGACGAAGGCAGCTCAAGTCCGACCCTAACGCCCCTCCTGCCTCCCGTGATGCCGGTCATCCCGCCCCAACCACCATCACCGAAACAGATTAACATAGTCGCCGTGCAGTCGGCGCAGGTTTGTCGCTCGCGTTACACCAGCCTTAGCACCAGTAGAGTCATCGGAATCATTCGCTACGGTGGTTCCACACGTGATGCGGCTCTTTATATAACTTATATGTCAATCCGTCTCCATCATTGCGCTCTTCCACATCAGCCGAAGCCTCAGCCCAAGTCTGGTGGTACGATAGCGGTGCTCCAGAACATTCCAGTGCATCCGAAGATCAACGTCAGCAAAGTGGGCAAGACTATGACGGTGAACAGTAAGGCTGGCTTCACGCGACCCACGGGCTCCGCTAACACGGGCCTCTCGACTGTGCTCACGGCGGGGAAGTCTCTGCTGCGGGACAGGGAGAGGAG
ATCAGCTCAGTTCTCGCAGAGCTTCGTGGACGACAAGATGGCCGGCGATGACGACATCAATGACGTAGCAGCCATGGGAGGAGTAAATCTCGCTGAGGAGAGCCAGCGGATATTGGGCTCCACGGAAATGATCGGAGCACAGATCAGATCCTGTAAAGACGAGACCTTAGTACCAATGGCGGTGATGCAGGCCAGGATACGTGCGGTGTCTCTGAGACACGGCCTGGAGGAGCCCCCGGCGGAGGTTGGGGCCTTACTGAGCCACGCGCTGCAAGAACGACTCAAATCGCTACTAGAGAAGCTAGCGGTCATATCACAGCACCGGATAGACACGCATGTCAAGATGGATTCGCGTTATGAAGTGACTCAGGATGTGAAGGGCCAGCTGAAGTTCCTTGAAGAACTGGACAGAGTGGATAAGAAGAGACGAGAGGACTCAGAGAGGGAGATGTTGTTGAGAGCAGCCAAGTCGCGATCCAAGAACGAGGACCCTGAACAGGCCAAGCTTAAGGCGAAGGCCAAGGAGATGCAGCGCGCTGAGTTAGAGGAGCTGAGGCAGCGGGAGGCGAATCTGACAGCATTACAGGCGATCGGGCCAAGGAAGAAGCCGCGAACTGACGGCTCAGCAGCTGGAGATAATCTGGGATCCAGCGGTCAGAGTACTGGACCTTCCGGCCGAGGTCAGCTCCCACAGCGAACCCGTCTGAAGAGAGTCAACATGCGAGATATGCTGTTCATGATGGAACAAGAGCCTGAATATAGACACTCGGCGCTACTATACCGTGCCTACCTCAAGTAA

Protein sequence:

>DPOGS203366-PA
MASAEFLEQALSTDVDENAVNAIVGSLENHLVTSVPSISSQNNLLTVIPSQLSLATSENTIIGQKYNKENSDGDIGSVNFRPNIVSSSSFSLPSTFINQTSLSQNISNGTDLVKVISSQPLTLSVSDNSVVFSAPSYANGCPSLPLSQAQIIQTVQGSSAINQPINKSITMQNPPLVIKQGTTSGQVSMQANMVPMTVNSSMPGSISNVMTINKPGGQNVVVTTQNLGTGQPAILPNVQILNMRPGAPAVAAQKSVATVSPRVVIGTPQVVGQRAAAPGITLQTLQSLQQGQQGHLLLKTENGHYQLLRVGPGPGASTLAPQQQTMRLSTVPAHPGVSTVSTSVPAPVQIPGQMPQGPVATPVPAASVTVPLPSPQSLQPTVTTQKPLDNTKEKCRNFLANLLDLSSKEPKSVERSVRNLIQELIDAQVEPEEFCDRLERLLNASPQPCLIGFLKKSLPLLRQSLVTKELVIEGINPPSPHVAFSAISPQAPNTAVATSNIQMPGLTLVVRQPDDEGSSSPTLTPLLPPVMPVIPPQPPSPKQINIVAVQSAQVCRSRYTSLSTSRVIGIIRYGGSTRDAALYITYMSIRLHHCALPHQPKPQPKSGGTIAVLQNIPVHPKINVSKVGKTMTVNSKAGFTRPTGSANTGLSTVLTAGKSLLRDRERRSAQFSQSFVDDKMAGDDDINDVAAMGGVNLAEESQRILGSTEMIGAQIRSCKDETLVPMAVMQARIRAVSLRHGLEEPPAEVGALLSHALQERLKSLLEKLAVISQHRIDTHVKMDSRYEVTQDVKGQLKFLEELDRVDKKRREDSEREMLLRAAKSRSKNEDPEQAKLKAKAKEMQRAELEELRQREANLTALQAIGPRKKPRTDGSAAGDNLGSSGQSTGPSGRGQLPQRTRLKRVNMRDMLFMMEQEPEYRHSALLYRAYLK-