Monarch geneset OGS2.0

DPOGS210899
Transcript        DPOGS210899-TA    1011 bp
Protein           DPOGS210899-PA    336 aa
Genomic position  DPSCF300045 - 601615-603604
RNAseq coverage   1231x (Rank: top 10%)
Annotation
Heliconius        HMEL003507        8e-57   45.49%
Bombyx            BGIBMGA005131-TA  1e-58   42.35%
Drosophila        CG12163-PA        2e-51   36.48%
EBI UniRef50      UniRef50_Q8V5U0   4e-62   40.62%   Viral cathepsin n=8 Tax=Alphabaculovirus RepID=CATV_NPVHZ
NCBI RefSeq       XP_002734978.1    1e-59   43.14%   PREDICTED: cysteine proteinase inhibitor-like [Saccoglossus kowalevskii]
NCBI nr blastp    gi|209978824      1e-64   44.08%   cathepsin [Adoxophyes orana nucleopolyhedrovirus]
NCBI nr blastx    gi|209978824      5e-65   44.08%   cathepsin [Adoxophyes orana nucleopolyhedrovirus]
Group
Gene Ontology     GO:0008234   2.8e-115   cysteine-type peptidase activity
                  GO:0006508   4.2e-77    proteolysis
KEGG pathway      tet:TTHERM_01276400   4e-60
                  K01373 (CTSF)   maps-> Lysosome
InterPro domain   [1-322]     IPR013128   2.8e-115   Peptidase C1A, papain
                  [126-335]   IPR000668   4.2e-77    Peptidase C1A, papain C-terminal
                  [41-96]     IPR013201   1.1e-14    Proteinase inhibitor I29, cathepsin propeptide
Orthology group 

Nucleotide sequence:

>DPOGS210899-TA
ATGATCGTTTTCGTACTCTGCGCCATCTCCTTCACAGCGGCTGCACCGCAGAATGATGTGAGCGATGTGGAGAAAGTACGGAAACCAGTATTTTATTCTATGGACGAAGCTCCAATACTCTTTGAAAACTTCATCAGAGAATATAATAAAAAGTATGACTCCAAAGAAAAGGAAGAGAGATTCAAGATATTTGTAAACAATTTAAAGAGAATAAATGATCTAAACCACAAGAGTACGAACGCTGTTCACGGTATTAACAAGTTCACAGATCTGAGCAAAGAAGAGTTCAAAAAGTTTTATACAGGTTTCAAGCCGGACAAAAGCTTTTTGGATGATAACATTAAAAAACCGAGTCAATTATCATTTAATATCACCGCACCGCCTGCGTTTGATTGGCGAGATAAAGGAGTCGTCACCAGAGTGAAGAACCAAGGAACATGTGGCTCATGCTGGGCATTTAGTACAATCGGTAACGTGGAAAGTGTGAACGCAATCAAACACGGGAACCTTGTGGAATTATCAGAACAACAATTGGTAGACTGTGACAGCAAAGATGAGGCGTGTGACAGCGGATTACCAGATAACGCACAACAATACCTCGTATCACACGGTGCTATCTCTGAACAATCTTACCCATACAAAGGATATGCCGCAAACTGTACATACGATAGCAGTCAGGTTGTTGTTAGATTAAGTAATTTTGAAAAAGTTGTATTGTCAGAGTGTCAAATGGCCGAAAAGCTTTACAGCACCGCACCATTGAGTATAGTTATTGCTGCAGAAGTATTAGGTACATATACTAAGGGTATCCTCGTCAATGAATGTGAACAAAGTCAAGACCTCAATCATGCTGTGCTTTTGGTAGGCTACGGAAACGAGGGAGGCACTAACTTCTGGATCCTCAAGAATTCTTGGGGAACTAACTGGGGTGAAGGCGGTTACTTCAGAATAAAGCGAGGTGTCAACTGTCTTATGATCACCGATTACGGAGTCCTTTCAGGAATCATATAA

Protein sequence:

>DPOGS210899-PA
MIVFVLCAISFTAAAPQNDVSDVEKVRKPVFYSMDEAPILFENFIREYNKKYDSKEKEERFKIFVNNLKRINDLNHKSTNAVHGINKFTDLSKEEFKKFYTGFKPDKSFLDDNIKKPSQLSFNITAPPAFDWRDKGVVTRVKNQGTCGSCWAFSTIGNVESVNAIKHGNLVELSEQQLVDCDSKDEACDSGLPDNAQQYLVSHGAISEQSYPYKGYAANCTYDSSQVVVRLSNFEKVVLSECQMAEKLYSTAPLSIVIAAEVLGTYTKGILVNECEQSQDLNHAVLLVGYGNEGGTNFWILKNSWGTNWGEGGYFRIKRGVNCLMITDYGVLSGII-