Monarch geneset OGS2.0

DPOGS209955
Transcript: DPOGS209955-TA  1026 bp
Protein: DPOGS209955-PA  341 aa
Genomic position: DPSCF300148 - 179979-183597
RNAseq coverage: 9820x (Rank: top 1%)
Annotation
Heliconius: HMEL009684  2e-91  47.88%
Bombyx: BGIBMGA011342-TA  7e-163  78.01%
Drosophila: Cp1-PC  3e-139  69.44%
EBI UniRef50: UniRef50_G6CNL5  0.0  100.00%  Cathepsin L-like protease n=3 Tax=Metazoa RepID=G6CNL5_DANPL
NCBI RefSeq: NP_001037464.2  3e-161  78.01%  fibroinase [Bombyx mori]
NCBI nr blastp: gi|306992173  2e-162  78.20%  cathepsin L-like proteinase [Spodoptera frugiperda]
NCBI nr blastx: gi|254746340  4e-164  79.06%  putative C1A cysteine protease precursor [Manduca sexta]
Group
Gene Ontology: GO:0008234  3.9e-191  cysteine-type peptidase activity
               GO:0006508  1.8e-129  proteolysis
KEGG pathway: aag:AaeL_AAEL002833  8e-146
              K01365 (CTSL) maps-> Lysosome
                                   Phagosome
                                   Antigen processing and presentation
InterPro domain: [1-341]  IPR013128  3.9e-191  Peptidase C1A, papain
                 [124-340]  IPR000668  1.8e-129  Peptidase C1A, papain C-terminal
                 [27-87]  IPR013201  4.6e-21  Proteinase inhibitor I29, cathepsin propeptide
Orthology group: MCL11408  Multiple-copy universal gene
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS209955-TA
ATGAAAATTTTACTCGTATTATGTGCTGTGGTGGCGGCTGGCACTGCCGTCAGCTTCTTCGACCTCGTCCGCGAGGAGTGGAACACCTTTAAGCTAGAGCACAAGAAGCAGTACGACAGCGAGACGGAGGAGAAGTTCCGTATGAAGATATACGCGGAGAACAAACACAAGGTCGCCAAACACAACCAGCGGTACCAGAAGGGTCTGGTCTCCTACAGGCTGAAGACGAACAAGTACTCCGACATGCTGCACCACGAGTTCGTCAACACCATGAACGGATTCAACAAGACCGTGAAACACAACAAGGGGCTGTACGCGAAGGGTAACGATATCCGCGGGGCCACTTTCGTGTCCCCGGCCAACGTGGCGGCGCCTCCCACCGTGGACTGGAGGCAGCACGGAGCCGTCACCCCCGTCAAGGACCAGGGCAAATGTGGATCATGCTGGTCGTTCTCTACCACGGGAGCACTGGAGGGCCAACACTTCCGTAAGAGCGGCTTCCTGGTGTCTCTCTCGGAGCAGAACCTCATCGACTGCTCCTCCGCGTACGGAAACAACGGATGTAACGGCGGCCTCATGGACAACGCCTTCAAGTACATCAAGGACAACGACGGCATCGACACCGAGAAGACCTACCCCTACGAGGCCGTGGACGACAAGTGCAGGTACAACCCCAAGAACTCGGGCGCCGAGGACGTGGGCTTCGTGGACATCCCCGCCGGAGACGAGCACAAGCTGATGCTGGCGCTGGCCACCGTGGGACCCGTGTCCGTCGCCATAGACGCGAGCCAGGAGTCCTTCCAGCTCTACTCTGACGGCGTCTACTACGACGAGAACTGCTCCTCCGAAAACCTCGACCATGGAGTGTTGGTGGTGGGTTACGGCACGGACGAGGACGGCGGCGACTACTGGCTGGTGAAGAACTCGTGGGGGCCGTCCTGGGGAGACGAGGGCTACATCAAGATGGCCCGCAACAGAGACAACCACTGCGGCATCGCCTCCTCCGCCTCCTACCCGCTCGTGTAG

Protein sequence:

>DPOGS209955-PA
MKILLVLCAVVAAGTAVSFFDLVREEWNTFKLEHKKQYDSETEEKFRMKIYAENKHKVAKHNQRYQKGLVSYRLKTNKYSDMLHHEFVNTMNGFNKTVKHNKGLYAKGNDIRGATFVSPANVAAPPTVDWRQHGAVTPVKDQGKCGSCWSFSTTGALEGQHFRKSGFLVSLSEQNLIDCSSAYGNNGCNGGLMDNAFKYIKDNDGIDTEKTYPYEAVDDKCRYNPKNSGAEDVGFVDIPAGDEHKLMLALATVGPVSVAIDASQESFQLYSDGVYYDENCSSENLDHGVLVVGYGTDEDGGDYWLVKNSWGPSWGDEGYIKMARNRDNHCGIASSASYPLV-
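
Consistency check (illustrative):

The record lists a 1026 bp transcript and a 341 aa protein. This is consistent if the transcript is a coding sequence of 341 codons plus one stop codon. The sketch below checks that arithmetic and translates two short excerpts copied from the nucleotide sequence above. It assumes the standard genetic code; the helper names `translate` and `expected_cds_length` are hypothetical and not part of any database tooling. Note that the record writes the stop as "-" while this sketch uses "*".

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard nuclear code applies to this transcript).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def expected_cds_length(protein_aa: int) -> int:
    """Length in bp of a CDS encoding protein_aa residues plus one stop codon."""
    return 3 * (protein_aa + 1)


if __name__ == "__main__":
    # First 18 nt and last 15 nt of DPOGS209955-TA, copied from the record.
    print(translate("ATGAAAATTTTACTCGTA"))
    print(translate("TACCCGCTCGTGTAG"))
    print(expected_cds_length(341))
```

The two excerpts translate to the first six residues (MKILLV) and the last four residues plus stop (YPLV*) of DPOGS209955-PA, and 3 x (341 + 1) = 1026 matches the listed transcript length.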