Monarch geneset OGS2.0

DPOGS213340
Transcript: DPOGS213340-TA; 1953 bp
Protein: DPOGS213340-PA; 650 aa
Genomic position: DPSCF300109 - 468838-477747
RNAseq coverage: 2195x (Rank: top 6%)
Annotation
Heliconius: HMEL014509; 2e-109; 60.47%
Bombyx: BGIBMGA009139-TA; 9e-78; 35.29%
Drosophila: 26-29-p-PA; 1e-81; 35.59%
EBI UniRef50: UniRef50_C7BWZ1; 3e-140; 49.13%; Putative C1A cysteine protease n=1 Tax=Spodoptera frugiperda RepID=C7BWZ1_SPOFR
NCBI RefSeq: NP_001128673.1; 1e-88; 36.12%; cathepsin L like protein [Bombyx mori]
NCBI nr blastp: gi|254746348; 1e-139; 49.13%; putative C1A cysteine protease precursor [Spodoptera frugiperda]
NCBI nr blastx: gi|254746348; 3e-142; 49.13%; putative C1A cysteine protease precursor [Spodoptera frugiperda]
Group
Gene Ontology: GO:0008234; 9.9e-104; cysteine-type peptidase activity
    GO:0006508; 1.1e-78; proteolysis
KEGG pathway: oaa:100093403; 1e-55
    K01365 (CTSL) maps-> Lysosome
    Phagosome
    Antigen processing and presentation
InterPro domain: [342-650] IPR013128; 9.9e-104; Peptidase C1A, papain
    [435-649] IPR000668; 1.1e-78; Peptidase C1A, papain C-terminal
    [349-404] IPR013201; 3.2e-10; Proteinase inhibitor I29, cathepsin propeptide
Orthology group: MCL34975; Lepidoptera specific
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS213340-TA
ATGATCGCAACACACTGTAACTCGCCTACAGACTGTGCGTGCTCAGCTAATTTGGAACCAAAATATTACAAAACCCAGCTTCATGACCTCTGGCGCTCTCCAAGTTCAGTCCTCAGCTCAGAACACGAGGAACTGCAGCAGATACACCTCCAGAAATTTAAAAGGAACTCTAACTCTTTAAAGATCTATATAAAGGTGTCATTGTTCCTGCTGGCGGTGGTGTGCGTGGTGACTCAGCGACGTTACGTTCCTCCCCGAACGACGACTGTCGCAGACGAGTTCGTGGAGGTGATCAGCTTGAAGTGCGACGTCCTCGACAACGACCACTGCGGCTCCTTGGACCCCAAACAACAGAAGTATTATGTGACCCGTCCACCCCCTAAGAAACTTACCAGGCCCCCTCCCAGACCCCGTGGAGGGAACGCCGGCGAGGATGGAGTCGTGTGGCCCAAGGAGTATCATTTGAAAGGCGAAATAACTTTCATGAGCGTAGGACTCCAGGAGCCGTTTGAAATCTGGTATAGTGCTGCAGAAAATAAGTCCAGGATTGATTTTTACGATGGCACCGTGAAGAGGTACGTCATCGGGGAGGAGGAGGACGACGGTGAAGAATATAAGGTATTTCCGGTGTTCATTGACAAGGAGATGACCGTCATGTGTGTGAGAGAACCAACAGACGGGGAGTCTATGGACTTTTTAATAGATCCTAGTAATTTTACATATTTCGACACAACGTCATATAATGGAAAAACGGTTCAGGTGTGGAAGAGTATTGAAGTGGAATTAAACGAACAGAAAGTCGAAAAAGTGTTGTTTGTGTATAAACAAGATGGGTTTCATGTGCCGATAAGAGTTGAAGAAATCAAACATAATCTATGGACGGGTGCTTTAGAGGGACATAAAATTACCACCTTCTACGATTACAGGAAACCGACAAAGGACGACTTCAATGTCGCCTTAGTGACTGAATGTGAAGACGCCACTGACTTTTACAAAGACCTGCGCGTGCTTCATCCGATGATACCTTCCGATGTGGATAGGCTCTATCACAGTTATACAAAGCATCACAACAGAAACTACAAAGCTGAAGAGCATAGTTTGCGTAAATCAATATTAGAACAGAATTGGCAGCGCGTCCTCCTTCACAACAAAAAAAACTTAGGCTTCAAGTTGACTCTCAACAAATATTCTGATCGCACTAAGGAAGAACTATCCTTCCTAACTGGCACGAGACCCTCATTAGGGACAGGCACCGTCTCCTTTCCGCACACTGATGAGGAAGTGGAACAGATGGTGTTGGATCTTCCTGAGAATTATGATATGAGGCTGGAAGGAACTATTAGTGCAGTCAAGAATCAGGGTCGCTGTGGCTCCTGCTGGACGTTCTCAACTGCAGCTGCTGTGGAGGGAGCGCTGGCCAGGAAGAACGGGGGACGAGACCTGGACCTGAGCGAGCAGTCCATCGTGGATTGTGCGTGGGGATATCATAACGCTGGCTGTGACGGAGGCATGATAGACACGGCGTTTAAGTACATCCTGGACTACGGCATCCCGACTCAGATAGAATACGGAGATTACTTAGGAGAAGATGGCTACTGCCACATCGAGAACGTCACTGACGTCTACAATATCATTGGGTTCGTGCAAGTGCCGTCCAAGAGTGTGAATGCTATGAAAGTAGCCCTTTACAAGTACGGGCCGGTGTCCGTGGCTATTAATGCGAACAAGCTCTTGGTGGCCTATGAAAGTGGCATCTTCTTCGACCCTGAGTGTAACGAGGACCACATCAACCACGCCGTGACCGTAGTAGGTTATGGTGTCCGCGATGGTGCCACCTACTGGATCGTGAAGAACTCCTGGGGAGAGGACTGGGGTCAGGACGGCTACCTGCTCATCTCTGCTACCGACAATAACTGTCATATACTAGAATACGCCTACTATCCTCTAGTCTGA

Protein sequence:

>DPOGS213340-PA
MIATHCNSPTDCACSANLEPKYYKTQLHDLWRSPSSVLSSEHEELQQIHLQKFKRNSNSLKIYIKVSLFLLAVVCVVTQRRYVPPRTTTVADEFVEVISLKCDVLDNDHCGSLDPKQQKYYVTRPPPKKLTRPPPRPRGGNAGEDGVVWPKEYHLKGEITFMSVGLQEPFEIWYSAAENKSRIDFYDGTVKRYVIGEEEDDGEEYKVFPVFIDKEMTVMCVREPTDGESMDFLIDPSNFTYFDTTSYNGKTVQVWKSIEVELNEQKVEKVLFVYKQDGFHVPIRVEEIKHNLWTGALEGHKITTFYDYRKPTKDDFNVALVTECEDATDFYKDLRVLHPMIPSDVDRLYHSYTKHHNRNYKAEEHSLRKSILEQNWQRVLLHNKKNLGFKLTLNKYSDRTKEELSFLTGTRPSLGTGTVSFPHTDEEVEQMVLDLPENYDMRLEGTISAVKNQGRCGSCWTFSTAAAVEGALARKNGGRDLDLSEQSIVDCAWGYHNAGCDGGMIDTAFKYILDYGIPTQIEYGDYLGEDGYCHIENVTDVYNIIGFVQVPSKSVNAMKVALYKYGPVSVAINANKLLVAYESGIFFDPECNEDHINHAVTVVGYGVRDGATYWIVKNSWGEDWGQDGYLLISATDNNCHILEYAYYPLV-
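
The transcript and protein entries above can be checked against each other. The following is a minimal sketch, not part of the database record: it assumes the transcript is coding sequence only (it starts with ATG and ends with the stop codon TGA, as the nucleotide sequence above does) and that the standard genetic code applies. The function names `translate` and `expected_protein_length` are illustrative, and the stop codon is written as "-" to match the protein record.

```python
# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cds, stop_symbol="-"):
    """Translate a coding sequence in frame 1; stop codons become stop_symbol."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    residues = (CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))
    return "".join(stop_symbol if aa == "*" else aa for aa in residues)


def expected_protein_length(transcript_bp):
    """Residue count for a CDS-only transcript: codons minus the stop codon."""
    return transcript_bp // 3 - 1


# First 30 and last 15 nucleotides of DPOGS213340-TA, copied from the record.
print(translate("ATGATCGCAACACACTGTAACTCGCCTACA"))  # MIATHCNSPT
print(translate("TATCCTCTAGTCTGA"))                 # YPLV-
print(expected_protein_length(1953))                # 650
```

Under these assumptions, 1953 bp gives 651 codons, that is 650 residues plus the stop codon, which agrees with the 650 aa listed for DPOGS213340-PA. Passing the full nucleotide sequence to `translate` should reproduce the protein sequence above, including the trailing "-".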