Monarch geneset OGS2.0

DPOGS213403
Transcript: DPOGS213403-TA (1662 bp)
Protein: DPOGS213403-PA (553 aa)
Genomic position: DPSCF300109 + 540261-544883
RNAseq coverage: 4775x (Rank: top 3%)
Annotation
Heliconius       HMEL014498        5e-152  74.77%
Bombyx           BGIBMGA009139-TA  0.0     75.82%
Drosophila       26-29-p-PA        0.0     60.79%
EBI UniRef50     UniRef50_Q9V3U6   0.0     60.79%  26-29kD-proteinase n=45 Tax=Coelomata RepID=Q9V3U6_DROME
NCBI RefSeq      NP_001164088.1    0.0     61.94%  cathepsin L precursor [Tribolium castaneum]
NCBI nr blastp   gi|6448469        0.0     64.42%  homologue of Sarcophaga 26,29kDa proteinase [Periplaneta americana]
NCBI nr blastx   gi|6448469        0.0     64.66%  homologue of Sarcophaga 26,29kDa proteinase [Periplaneta americana]
Group
Gene Ontology    GO:0008234  2.6e-182  cysteine-type peptidase activity
                 GO:0006508  1.5e-91   proteolysis
KEGG pathway     nve:NEMVE_v1g181181  5e-63
                 K01365 (CTSL) maps to: Lysosome; Phagosome; Antigen processing and presentation
InterPro domain  [213-553]  IPR013128  2.6e-182  Peptidase C1A, papain
                 [335-552]  IPR000668  1.5e-91   Peptidase C1A, papain C-terminal
                 [248-304]  IPR013201  4.5e-20   Proteinase inhibitor I29, cathepsin propeptide
Orthology group  MCL14711 (Insect specific)
Genotypes for resequenced monarchs and outgroup Danaus species
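
The InterPro coordinates listed in this record can be sanity-checked against the stated protein length. The sketch below is only illustrative: it assumes the bracketed ranges are 1-based, inclusive residue positions on DPOGS213403-PA, which the record does not state explicitly, and the names `domains`, `PROTEIN_LENGTH`, and `domain_length` are hypothetical.

```python
# InterPro accessions and ranges copied from this record.
# ASSUMPTION: ranges are 1-based, inclusive residue positions on the protein.
domains = {
    "IPR013128": (213, 553),  # Peptidase C1A, papain
    "IPR000668": (335, 552),  # Peptidase C1A, papain C-terminal
    "IPR013201": (248, 304),  # Proteinase inhibitor I29, cathepsin propeptide
}
PROTEIN_LENGTH = 553  # "553 aa" from the record


def domain_length(start: int, end: int) -> int:
    """Number of residues in a 1-based, inclusive range."""
    return end - start + 1


lengths = {}
for accession, (start, end) in domains.items():
    # Every domain should fit inside the protein.
    if not (1 <= start <= end <= PROTEIN_LENGTH):
        raise ValueError(f"{accession} lies outside the protein")
    lengths[accession] = domain_length(start, end)

for accession, length in lengths.items():
    print(accession, length)
```

Under that assumption, the propeptide inhibitor domain (IPR013201) and the papain C-terminal domain (IPR000668) both fall within the span of the broader Peptidase C1A match (IPR013128).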

Nucleotide sequence:

>DPOGS213403-TA
ATGTTTGTCTACACTCTATTGTGTTTCTACTTGGGCTCCGTTGTAGGACTTCGCATCGATAAAGATAATCCACCGCAATGGAGCGATGTTTACACGGTCAAGGGTTTATTAAATATTCCCTACGCAGAACTTCACGAACCTTTTTATGCGTGGTTCGACAGCAAGAATGGCAAGTCTCGTATTGACTACTACGGTACTATGGTGAAGACCTACCAGCTGTCTGCCTCCGTCTACCCTCAGTATGGTACATCCATTAAGATAGCTCCGGTGACTACTGAGCATGTCCTGAACCAGGACACCTGTCTTCAGGTGAACGGTACGGAGGGAGAGAATATTAACATTCAGACCGTACTCCCTGACATGACCGACTTCAAGTTTGTAGGAACAGAGACTATGAAAGACTCCGACACCTTCAAGTGGCGCATGGTGACCTCTGTAGGGGATAAGGTCAACAAATACACGATGTGGGTCAAGTACAGGAAGAGTCTGAGAGGAGACAACATTGCTATACCAGTCAGGTACGAGATGAAGGGTTTTAACTCTCTGCTGGGCTCTCACTACGACCACTATTATTTGGATTACACGGACTTTGACAACAGCGATATCGAGCCCGACGTCTTCAAAGTAGATTCCAGCTTCAAGTGTTCGTCGTTCCCGGGCCCGGGTTTTCGCCACATGGCCACCTTCAACCCCATGAAGGAGTTCGTTCACCCCGCCAGCGATGAGCATGTCCATCACGAGTTCGACCGGTTCGTCAATAAACACAACAAGCAGTACGCCTCGGAGGTCGAGAAGACTAAGAGGATCAATATATTCAGACAGAATTTAAGATTGATTCACTCTCACAATCGCGCTCACCGCGGCTTCTCTCTGGCCGTGAATCATCTCGCAGACCACACGGACGAGGAGCTCGCCGCGCGCCGGGGCAGGAGATACACGGGACACAACGCAGGGCTGCCGTTCCCGTACGGCGAGGCGGAGCTGGCGGACATGAGCGTCAAGCTGCCGCCGGAGTTCGACTGGAGGCTGTTCGGCGCCGTGACGCCCGTCAAAGACCAGTCGGTGTGCGGGTCTTGTTGGTCGTTCGGCACGGTGGGGGCGGTGGAGGGCGCGCTGTTCCTCAGCAACGGAGGACATCTCGTGAGACTCAGCCAACAGGCGCTCGTGGACTGCTCCTGGGGTTTCGGTAACAACGGCTGTGACGGCGGCGAGGACTACCGCGCCTACCAGTGGATCATGAGACACGGCCTGCCCACGGAGGACGACTACGGAGGATACCTCGGACAGGACGGCTACTGCCACATGGAGAACGTGACGGTCGCCACCAAGATGAAGGGCTGGGTGAACGTCACCGCCAAGAACGAGAACGCGCTGAAGTTGGCGATCTTCAAACACGGCCCGGTGTCGGTGGCCATCGACGCCTCGCACAAGACCTTCAGCTTCTACTCCAACGGAGTCTACTTCGAGCCCAAATGTAAGAACAGCGTGGAGGAGCTGGACCACGCGGTGCTGGCGGTCGGGTTCGGCGTTCTGAACGGACACAAGTACTGGCTCGTCAAGAACAGCTGGTCCAACATGTGGGGGAACGACGGGTACGTGCTCATGTCGGCCAGAGACGACAACTGTGGGGTCCAGGCCGCCCCCACCTACGTCATCATATAG

Protein sequence:

>DPOGS213403-PA
MFVYTLLCFYLGSVVGLRIDKDNPPQWSDVYTVKGLLNIPYAELHEPFYAWFDSKNGKSRIDYYGTMVKTYQLSASVYPQYGTSIKIAPVTTEHVLNQDTCLQVNGTEGENINIQTVLPDMTDFKFVGTETMKDSDTFKWRMVTSVGDKVNKYTMWVKYRKSLRGDNIAIPVRYEMKGFNSLLGSHYDHYYLDYTDFDNSDIEPDVFKVDSSFKCSSFPGPGFRHMATFNPMKEFVHPASDEHVHHEFDRFVNKHNKQYASEVEKTKRINIFRQNLRLIHSHNRAHRGFSLAVNHLADHTDEELAARRGRRYTGHNAGLPFPYGEAELADMSVKLPPEFDWRLFGAVTPVKDQSVCGSCWSFGTVGAVEGALFLSNGGHLVRLSQQALVDCSWGFGNNGCDGGEDYRAYQWIMRHGLPTEDDYGGYLGQDGYCHMENVTVATKMKGWVNVTAKNENALKLAIFKHGPVSVAIDASHKTFSFYSNGVYFEPKCKNSVEELDHAVLAVGFGVLNGHKYWLVKNSWSNMWGNDGYVLMSARDDNCGVQAAPTYVII-
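
The transcript and protein lengths above are consistent with each other: 1662 bp is 554 codons, that is 553 amino acids plus one stop codon. A minimal sketch of that check follows, assuming the standard genetic code (the record does not name a translation table) and using only the first and last few codons of DPOGS213403-TA rather than the full sequence. The function name `translate` is hypothetical, and the stop codon is written as `*` here, whereas the record writes it as `-`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
# ASSUMPTION: the standard code applies to this nuclear transcript.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate a coding sequence in frame 1; stop codons become '*'."""
    seq = nucleotides.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# First seven and last four codons of DPOGS213403-TA, copied from the record.
first_codons = "ATGTTTGTCTACACTCTATTG"
last_codons = "GTCATCATATAG"

print(translate(first_codons))  # start of the protein
print(translate(last_codons))   # end of the protein, including the stop

# Length bookkeeping: codons in the transcript minus the stop codon.
expected_protein_length = 1662 // 3 - 1
print(expected_protein_length)
```

The same function can be applied to the full nucleotide sequence to compare the result against the protein sequence given in the record.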