Monarch geneset OGS2.0

DPOGS212040
Transcript: DPOGS212040-TA; 1686 bp
Protein: DPOGS212040-PA; 561 aa
Genomic position: DPSCF300054; + strand; 20703-29554
RNAseq coverage: 306x (Rank: top 37%)
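
The lengths listed above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the transcript is entirely coding sequence with a terminal stop codon and that the genomic coordinates are 1-based and inclusive; neither assumption is stated in the record, and all variable names are illustrative.

```python
# Values copied from the record above.
transcript_bp = 1686
protein_aa = 561
genomic_start, genomic_end = 20703, 29554

# Assumption: the whole transcript is coding sequence, so its length
# should be a multiple of three.
assert transcript_bp % 3 == 0
codons = transcript_bp // 3

# Assumption: the final codon is a stop codon and is not counted
# in the protein length.
residues = codons - 1

# Assumption: coordinates are 1-based and inclusive.
genomic_span = genomic_end - genomic_start + 1

print(codons, residues, genomic_span)
```

Under these assumptions the transcript length agrees with the listed protein length, and the genomic span is considerably longer than the transcript, which would be consistent with the gene containing introns.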
Annotation
Heliconius: HMEL018066; 5e-108; 73.61%
Bombyx: BGIBMGA010089-TA; 0.0; 75.52%
Drosophila: Tfb1-PA; 7e-160; 50.42%
EBI UniRef50: UniRef50_Q960E8; 1e-157; 50.42%; General transcription factor IIH subunit 1 n=14 Tax=Neoptera RepID=TF2H1_DROME
NCBI RefSeq: XP_002081487.1; 2e-160; 51.52%; GD11042 [Drosophila simulans]
NCBI nr blastp: gi|195583354; 4e-159; 51.52%; GD11042 [Drosophila simulans]
NCBI nr blastx: gi|195485946; 6e-154; 50.51%; GE13578 [Drosophila yakuba]
Group
Gene Ontology: GO:0005515; 1.4e-39; protein binding
KEGG pathway: dsi:Dsim_GD11042; 5e-160
    K03141 (TFIIH1) maps to: Basal transcription factors
    Nucleotide excision repair
InterPro domain: [1-109] IPR011993; 1.4e-39; Pleckstrin homology-type
    [10-81] IPR013876; 1.8e-20; TFIIH p62 subunit, N-terminal
    [179-234] IPR005607; 4.7e-15; BSD
Orthology group: MCL13922; Single-copy universal gene
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS212040-TA
ATGACCACATCGTCGGAGGACGTTCTTTTGAGTGTAGGACATGTAAGGTATAAAAAGGGCGATGGCACTTTGTATGTGATGAACCAAAGATTGGCTTGGATGCTTGAGAACAAGGACACTGTTGCTGTCTCTCACAAGTATGCAGATATAAAAACTCAAAAAATCTCACCAGCTGGAAAACCAAAAGTTCAACTACAAGTGGTGTTACATGATGGGGCATGTTCCACATTTCATTTTGTCAATCCGGCCGGGGCAGAGGCTCAGGCTAAAGACAGGGACCAAGTTAAAATGTTATTACAGAATCTATTACCCAAGTTTAAGAGACAGATAGACGGAGAGTTGGAGATGAAATCTAAGCTACTGTCGTTACATCCCACATTAAAGCATTTATATGAAGATTTAGTTATATCAAAAGTTATAAATAGTGAAGAGTATTGGAATACGCCGACATTGAAACATTACACAGAATCTACTAACATGAAACAAGAGGCCGGCGTGTCGGGTGCGTTTCTAGCCGATATACAGCCGCAGACTGATGGATGCAATGGACTTAAGTATAACCTGACGCAGGACATTATAGATGCCATATTCAAAACATATCCGGCGGTTAGGAAGAAACATGTGGATTATGTGCCAAATAAGATGACAGAGGCTGAGTTTTGGACAAAATTCTTTCAATCCCATTACTTTCATAGAGATCGTATAATGTCGTCATCGAGTAAGGACTTATTTGGGGAGTGCGCTAAACTTGATGACCAAGCGATCGCCTCCGCTATGAAACACACAACCTTGGACTTGACTGTGGATCTACCCTCATTCAAAGAACCAATCCCACTCTTACCCGACGATGAAACACACGAAAAGGAAAAAGATGGTACATCGATACACAGGAACATGATAAAAAGATTCAACCAACACTCCATAATGGTGCTAAAAGCTAGTCATAAAAATTCCAATAGCAGTAGTAGTAAAACAAATAAAGTCGAGAATGGTATGAAAGAAACGAACGGCGTCGAAAAAAGGCCGAGTGCAGACAAAGATGTGACGGAGCCGGTTGATAAGAAACGTAGGATAATGGAGAAGATACATTACGAGGACCTGGATAATGTGGATAGCAATGAAGATACTCAGGAGTTGAAACTGTCAAAGGTAGAACGATATCTGTTGGGTCCAGCGTCTCAAGTGGGTCACACAGGAACGAGTTCCAGTAATCCACCACCACTGTCCGCCCTGGCATCTGTCTGTCAGGCGTGGAGTAGTGGTCAACAGTGTAGTCGCCCTGTCCGCGTGAGTGCTGCGGCCGCTGTCGGAGCTCTGGGCGAATTAAGCCCGGGAGGAGCTCTAATGAGGCAACACCACGCGGCGAGCATGGCCCAGCTGGTCCCGCCGCCCGCCCGCCAGGAGCTCCAGCGTCTGTACCTGTCATGTGGCGAACTGCTCCGTGAGTTGTGGCGTTGTTTCCCTCAGCCGGGCGCTCCGCCCGATGACGACGCCGGCACCAGGGCGGAGAGGTTCTATGACGCTATCATGAGGTTCAGGAACCTTAAGCTGAGGCCGTTTGAGGAAAAAATGCTACGTGACCTGACACCGCTGGCGTCGTCATTAACAAGACATATGAATCAAATGATCGAAACAGCCTGCGCCAAATACGCTGTTTGGCAACAGAGACAAGCTAAACTTCGGTAG

Protein sequence:

>DPOGS212040-PA
MTTSSEDVLLSVGHVRYKKGDGTLYVMNQRLAWMLENKDTVAVSHKYADIKTQKISPAGKPKVQLQVVLHDGACSTFHFVNPAGAEAQAKDRDQVKMLLQNLLPKFKRQIDGELEMKSKLLSLHPTLKHLYEDLVISKVINSEEYWNTPTLKHYTESTNMKQEAGVSGAFLADIQPQTDGCNGLKYNLTQDIIDAIFKTYPAVRKKHVDYVPNKMTEAEFWTKFFQSHYFHRDRIMSSSSKDLFGECAKLDDQAIASAMKHTTLDLTVDLPSFKEPIPLLPDDETHEKEKDGTSIHRNMIKRFNQHSIMVLKASHKNSNSSSSKTNKVENGMKETNGVEKRPSADKDVTEPVDKKRRIMEKIHYEDLDNVDSNEDTQELKLSKVERYLLGPASQVGHTGTSSSNPPPLSALASVCQAWSSGQQCSRPVRVSAAAAVGALGELSPGGALMRQHHAASMAQLVPPPARQELQRLYLSCGELLRELWRCFPQPGAPPDDDAGTRAERFYDAIMRFRNLKLRPFEEKMLRDLTPLASSLTRHMNQMIETACAKYAVWQQRQAKLR-
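
The protein sequence above should be the direct translation of the nucleotide sequence. The following is a minimal sketch of that check, assuming the standard genetic code; the `translate` helper is illustrative and not part of any database tooling. Only the first seven and last four codons of DPOGS212040-TA are included to keep the example short, and the stop codon is written as `*` where the record uses `-`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 1; '*' marks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# First seven codons and last four codons of DPOGS212040-TA.
first_codons = "ATGACCACATCGTCGGAGGAC"
last_codons = "AAACTTCGGTAG"

print(translate(first_codons))
print(translate(last_codons))
```

To check the whole record, the full nucleotide sequence can be passed to `translate` and the result compared with DPOGS212040-PA after replacing the trailing `-` with `*`.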