Monarch geneset OGS2.0

DPOGS202102
Transcript        DPOGS202102-TA    1638 bp
Protein           DPOGS202102-PA    545 aa
Genomic position  DPSCF300150 - 446011-450451
RNAseq coverage   1737x (Rank: top 7%)
Annotation
Heliconius        HMEL022141          1e-109   73.88%
Bombyx            BGIBMGA009139-TA    9e-98    39.79%
Drosophila        26-29-p-PA          6e-96    38.81%
EBI UniRef50      UniRef50_C7BWZ0     6e-157   50.57%   Putative C1A cysteine protease n=1 Tax=Spodoptera frugiperda RepID=C7BWZ0_SPOFR
NCBI RefSeq       NP_001128673.1      4e-146   48.11%   cathepsin L like protein [Bombyx mori]
NCBI nr blastp    gi|254746346        2e-156   50.57%   putative C1A cysteine protease precursor [Spodoptera frugiperda]
NCBI nr blastx    gi|254746346        4e-157   50.95%   putative C1A cysteine protease precursor [Spodoptera frugiperda]
Group
Gene Ontology     GO:0008234    1.1e-115   cysteine-type peptidase activity
                  GO:0006508    8.7e-77    proteolysis
KEGG pathway      hsa:1512      2e-54
                  K01366 (CTSH) maps-> Lysosome
InterPro domain   [220-544]   IPR013128   1.1e-115   Peptidase C1A, papain
                  [330-544]   IPR000668   8.7e-77    Peptidase C1A, papain C-terminal
                  [243-298]   IPR013201   3.9e-21    Proteinase inhibitor I29, cathepsin propeptide
Orthology group   MCL23659    Lepidoptera specific

Nucleotide sequence:

>DPOGS202102-TA
ATGATTGCCTTTATATTAAAAATTTTTCCGTTACTCATTGTTGCATCGGTTGCTGGGAAAAATGTTCTTGAAGATGACCTTCCAAAACTGAAATGGCCCAAAAAGTATTCGTTCGAAGCCGAATCTCTGTCACTGACGTCAGGTTTGGTTCAAGATGTCACCTACTGGCGAGTCAGCAAAAAATCGAGGGTAGATTTTAACAAAGGTGCCGTAAAACTGATATCAATTAAGGGCCAGAGGAAGTCAAAATTTCCTTTCGGTGTAAAATATGAGATTCATCCCGAAAGTAATGAAGAATATGAGAACAAATTCATCTGCACGGGAATGAAAGGAAACATCTTCAGACAAGCCAAACTGGATAAGGTTTTGCCAGATGTTGACGATTTTGTCCACATTGGGAAGGAGAAACTTGAATTAGGTGAGGTGGAAAAGTTTACATTCTTTGAAGACAAAGATTATATTAACTCTCAAACGAGGCAGAATTTATGGGTGTTACAAAATGATTCAACATTTATACCCGTTAGATATGAGAAGATAATATATAATACTTGGATTAAAAATGTGAAAGATCACACAATTTGGAACATCTTCAACTTCAAAACCGATTTCAGCGAAGACGTCTTCGACACAGATGACTATGATTGCAAAATTAATTCGCCCAAAAATAACAATGAAAATGAAGAGGTTGATAGTGATGAAAGCACAAACTTGGATTCGGATCACGTATTCGCAGAATTTATGCAAAAGCACAATAAAAACTACGACGGTCCTGAACATGAGCAGCGCAGAAAAATTTTTGAAACTAATTTAAGAAAGATTGAGGAACATAATAGAAGTAATAAAAACTTCAAGCTAGCAATAAACAAGTTTGCTGATCTTACCCACAAAGAAATGGAAAAACGGAAGGGTCTCAAACGACGAGGCAAATCATCAGGCGCAATTCCATTTCCGTATAGCAAATCGAAGATCGCTGAAATGTCTGATACTCTACCGAAAGAATATGACGCGAGGATGTACGGCCTAGTAACATCGGTTAAGGATCAACAGGATTGTGGATCGTGTTGGACTTTTGGAACAACTAGCGCGGTAGAGGGAGCTCTAGCAAGAATAAATGGTGGAAGACTTATGAGACTCGCCAACCAAGCTCTTATAGACTGTGCCTGGGGATATGAGAATTTTGGCTGTGACGGGGGTACAGACACGGGAGCGTATCACTGGATGTTGAATTATGGCATGCCCACTGAAGAGGAGTATGGTCCATATGTGAACAAAGACGGTTTCTGTAGAATACACAATATGACGCAAACCTACAAGATAAAAGGATTTACTAACGTTACACCCTACAGCGTTGAAGCTCTTAAGGTGGCCTTGGTGAACCACGGTCCGTTGTCGGTGTCCATCGACGCTACAGACATGCTTACTTATTACAACGGCGGTATCTACTCCGATAGTGACTGCAGTACTACAAATTTAAACCATGAAGTAACTCTCGTCGGCTACGGTGAATTGGACGGTGAAGAGTATTGGATAGTGAAAAATTCTTGGGGTAGGGATTGGGGTGTTGACGGCTATTTCCATATCACAACCCGGGATAACAGCTGCGGGATCACCACTGAACCTACTTATGTAGTTTTCTAA

Protein sequence:

>DPOGS202102-PA
MIAFILKIFPLLIVASVAGKNVLEDDLPKLKWPKKYSFEAESLSLTSGLVQDVTYWRVSKKSRVDFNKGAVKLISIKGQRKSKFPFGVKYEIHPESNEEYENKFICTGMKGNIFRQAKLDKVLPDVDDFVHIGKEKLELGEVEKFTFFEDKDYINSQTRQNLWVLQNDSTFIPVRYEKIIYNTWIKNVKDHTIWNIFNFKTDFSEDVFDTDDYDCKINSPKNNNENEEVDSDESTNLDSDHVFAEFMQKHNKNYDGPEHEQRRKIFETNLRKIEEHNRSNKNFKLAINKFADLTHKEMEKRKGLKRRGKSSGAIPFPYSKSKIAEMSDTLPKEYDARMYGLVTSVKDQQDCGSCWTFGTTSAVEGALARINGGRLMRLANQALIDCAWGYENFGCDGGTDTGAYHWMLNYGMPTEEEYGPYVNKDGFCRIHNMTQTYKIKGFTNVTPYSVEALKVALVNHGPLSVSIDATDMLTYYNGGIYSDSDCSTTNLNHEVTLVGYGELDGEEYWIVKNSWGRDWGVDGYFHITTRDNSCGITTEPTYVVF-
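
Length check: the 1638 bp transcript corresponds to 546 codons, that is 545 residues plus one stop codon, which the protein record writes as a trailing "-". The sketch below shows one way to reproduce that check in Python. It assumes the standard genetic code (the record does not state which translation table was used). The function name translate is illustrative, and the two short fragments are copied from the start and end of DPOGS202102-TA.

```python
from itertools import product

# Standard genetic code (assumed), with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds, stop_symbol="-"):
    """Translate a coding sequence codon by codon.

    Stop codons are written as stop_symbol to match the "-" used at the
    end of the protein record.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    residues = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        residues.append(stop_symbol if aa == "*" else aa)
    return "".join(residues)


# First six and last four codons of DPOGS202102-TA.
start_fragment = translate("ATGATTGCCTTTATATTA")
end_fragment = translate("GTAGTTTTCTAA")

# 1638 bp = 546 codons = 545 residues + 1 stop codon.
residue_count = 1638 // 3 - 1
```

Passing the full nucleotide sequence to translate and comparing the result with the DPOGS202102-PA record would extend the same check to the whole transcript.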