Monarch geneset OGS2.0

DPOGS214009
Transcript: DPOGS214009-TA; 1149 bp
Protein: DPOGS214009-PA; 382 aa
Genomic position: DPSCF300313 - 3156-12658
RNAseq coverage: 568x (Rank: top 22%)
Annotation
Heliconius: HMEL010717; 6e-135; 70.59%
Bombyx: BGIBMGA009230-TA; 6e-98; 78.47%
Drosophila: CG12163-PA; 4e-36; 30.79%
EBI UniRef50: UniRef50_E2AWA7; 9e-84; 43.82%; Cathepsin O n=7 Tax=Formicidae RepID=E2AWA7_CAMFO
NCBI RefSeq: XP_623690.1; 5e-90; 44.36%; PREDICTED: similar to Cathepsin O precursor [Apis mellifera]
NCBI nr blastp: gi|350415610; 2e-92; 45.43%; PREDICTED: cathepsin O-like [Bombus impatiens]
NCBI nr blastx: gi|383852175; 1e-91; 46.35%; PREDICTED: cathepsin O-like [Megachile rotundata]
Group
Gene Ontology: GO:0008234; 1.9e-92; cysteine-type peptidase activity
Gene Ontology: GO:0006508; 1.1e-64; proteolysis
KEGG pathway: ame:551290; 1e-89
KEGG pathway: K01374 (CTSO) maps-> Lysosome
InterPro domain: [5-382]; IPR013128; 1.9e-92; Peptidase C1A, papain
InterPro domain: [166-380]; IPR000668; 1.1e-64; Peptidase C1A, papain C-terminal
InterPro domain: [37-99]; IPR013201; 3.9e-10; Proteinase inhibitor I29, cathepsin propeptide
Orthology group: MCL15452; Patchy
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS214009-TA
ATGAAGAAATGGTGGAATTGGATTCTTGTTGTGGCCTTAGTGTGTTTGTTATTCGTTGCTATACCTCTTTCATATCCCGATAGGACTAAAGAATCCCTTCGTCCCATGTTTGATGAGTATATAGAAAATTTCAATAAAACTTATAAGGACGACCCCGCCGAGTACGAAAAAAGATTAGAGCATTTTGTGGCCTCCGTAAAAGAGATAGATAGATTGAACTCAGCAGCAAGAGGTCCCGAACAGCACAGGGCGAGGTATGGACTCACACAAATGTCAGATATGTCGAAAGATGAATTCAGAGATGTACATCTATCAGACGAACAACCTCATCGATATAGAAGACATAAGCTAGGGAAGAGTTGGAGCAAAGGTAGAGTGAAGGATATTGAGGACGTGGCCGATAACATGGATGATTACGATGATGAGGATGATGATGATAAGGAGGGTAGTCCGCATCATAATATTTATATTGTCATCAGAAAGAAACGCGCCATGCTACCACTTCAGGTTGATTGGAGAACTAAGGGTGTGATAGGTCCCGTACGCGATCAGGGTCTGTGTGGAGCGTGCTGGGCTTTCAGTACGATTGGCACAATGGAAGCCATGGCTGCCATAGACACCGGCAAGCTTAACACGCTCAGTGTCCAGGAAGTTATAGACTGCGCTGGTTTGGGGAACAGCGGTTGTGCTGGTGGCGATATATGCCTTTTATTAGACTGGTTGCTCATGACGGATACCGCTGTCCAAGTTGAGAAGGAGTATCCTCTCAAGCTGACGAACGGTGTATGTCAGGCTAAGAAAAATGCAACCGGTGTCAAAGTCGCCAAGTTCACGTGTACCGATCTGGTGGGCGCGGAGGATAAGATAATCGAGTCTATAGCAACCCATGGTCCAGTGGCCGTCGCGGTGAACGCGCTCACGTGGCAGAACTACCTTGGCGGTGTCATACAGTACCATTGCAGCGGTAGCCCCAAAGAACTGAACCACGCTGTAGAGCTAGTAGGTTATGATCTAACAGCAGAGGTACCTTACTACATAGCCAAGAACTCGTGGGGCCAAGGTTTTGGTCTCGACGGATATCTTAAACTGGCGATCGGATGCAACATATGCGGACTAGCCAATGAGGTAGCTAGCATAGACATTAAATAG

Protein sequence:

>DPOGS214009-PA
MKKWWNWILVVALVCLLFVAIPLSYPDRTKESLRPMFDEYIENFNKTYKDDPAEYEKRLEHFVASVKEIDRLNSAARGPEQHRARYGLTQMSDMSKDEFRDVHLSDEQPHRYRRHKLGKSWSKGRVKDIEDVADNMDDYDDEDDDDKEGSPHHNIYIVIRKKRAMLPLQVDWRTKGVIGPVRDQGLCGACWAFSTIGTMEAMAAIDTGKLNTLSVQEVIDCAGLGNSGCAGGDICLLLDWLLMTDTAVQVEKEYPLKLTNGVCQAKKNATGVKVAKFTCTDLVGAEDKIIESIATHGPVAVAVNALTWQNYLGGVIQYHCSGSPKELNHAVELVGYDLTAEVPYYIAKNSWGQGFGLDGYLKLAIGCNICGLANEVASIDIK-
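
The transcript and protein lengths listed above are consistent with each other: 382 codons plus one stop codon give (382 + 1) x 3 = 1149 bp. The relationship between the two FASTA records can be checked with a minimal Python sketch. It assumes the standard genetic code and that the trailing "-" in the protein record stands for the stop codon. The function name `translate` and the `stop_symbol` parameter are illustrative and not part of the database.

```python
# Minimal sketch: translate a coding sequence with the standard genetic code.
# Assumption: the record writes the stop codon as "-" (as in DPOGS214009-PA).

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(cds, stop_symbol="-"):
    """Translate a DNA coding sequence codon by codon (hypothetical helper)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    protein = (CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))
    return "".join(aa.replace("*", stop_symbol) for aa in protein)


# First five and last four codons of DPOGS214009-TA, copied from the record.
print(translate("ATGAAGAAATGGTGG"))
print(translate("GACATTAAATAG"))
```

Passing the full DPOGS214009-TA sequence to `translate` should give a string that can be compared directly against the DPOGS214009-PA record.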