Monarch geneset OGS2.0

DPOGS209245
Transcript        DPOGS209245-TA  1005 bp
Protein           DPOGS209245-PA  334 aa
Genomic position  DPSCF300111 - 463744-464748
RNAseq coverage   1769x (Rank: top 7%)
Annotation
Heliconius      HMEL016743        2e-157  76.05%
Bombyx          BGIBMGA007061-TA  1e-152  73.13%
Drosophila      CG10992-PA        2e-107  56.56%
EBI UniRef50    UniRef50_P10605   7e-105  61.27%  Cathepsin B n=58 Tax=Opisthokonta RepID=CATB_MOUSE
NCBI RefSeq     NP_001036850.1    3e-150  72.84%  cathepsin B [Bombyx mori]
NCBI nr blastp  gi|118424551      4e-155  76.90%  cathepsin B-like cysteine proteinase [Spodoptera exigua]
NCBI nr blastx  gi|254746338      1e-163  77.41%  putative C1A cysteine protease precursor [Manduca sexta]
Group
Gene Ontology    GO:0008234  1.4e-134  cysteine-type peptidase activity
                 GO:0006508  2.5e-91   proteolysis
                 GO:0004197  3.4e-13   cysteine-type endopeptidase activity
                 GO:0050790  3.4e-13   regulation of catalytic activity
KEGG pathway     cqu:CpipJ_CPIJ000574  8e-114
                 K01363 (CTSB) maps-> Lysosome
                                      Antigen processing and presentation
InterPro domain  [1-334]   IPR015643  1.4e-134  Peptidase C1A, cathepsin B
                 [1-334]   IPR013128  1.4e-134  Peptidase C1A, papain
                 [82-330]  IPR000668  2.5e-91   Peptidase C1A, papain C-terminal
                 [24-63]   IPR012599  3.4e-13   Peptidase C1A, propeptide
Orthology group  MCL17277 Patchy
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS209245-TA
ATGATACTGATTCGCGCTATCTGTTTAGTGTTTCTATGTGGAATTGCAGTTTCAGAAATTCCTCACCCATTATCTGATAAATTTATAGACCTTATAAATTCTAAACAAAACACATGGATCGCTGGACGTAATTTCGACATCGGTAGGACATTAAAATCTATCAAGAAACTCATGGGAGCTCTTGAAGATAAATACCTTCATAAGTTGTACACAGTTGAACACGATGATGACACAATAAATAATCTGCCTGAAAACTTTGACCCGCGCGACAAATGGCCAAATTGCCCTACTTTAAACGAAATAAGAGATCAAGGATCTTGTGGAAGCTGCTGGGCCTTCGGAGCAGTTGAAGCTATGACTGATCGTTATTGCACTTATTCAAATGGTACAAAACATTTCCATTTTTCGGCAGAAGATTTACTTAGCTGCTGTCCTGTTTGTGGACTGGGATGTAATGGTGGTATTCCTTCTTTTGCTTGGGAGTACTGGAAACATTTTGGTATTGTATCCGGAGGTAACTACAACTCATCACAAGGATGTCTCCCTTATGAAATACCTCCCTGCGAGCATCATGTACCCGGCAACAGAATCCCATGTAATGGTGAAACAAGCACTCCCAAATGTCACAGGAGTTGCAGGAAAGAATATACAAATTCGTATAAATCTGATAAAAAGTACGGAAAACATGTGTACTCCGTAGGAGGAGGTGAGGAACATATAAAAGCGGAAATATTTAAAAACGGTCCAGTTGAAGGTGCATTTACTGTGTATGCGGATTTGCTTACATACAAAAGTGGTGTCTATAAGCATACCGAGGGTGAAGCTCTTGGCGGACATGCAATTAAAATAATGGGATGGGGAGTTGAAAATGGAAACAAATATTGGTTAATTGCTAACTCTTGGAATTCAGATTGGGGAGACAACGGCTTCTTCAAAATCCTACGCGGTGAGGACCATTGCGGAATTGAAAGTTCAATTGTCGCCGGTGAACCATCGTATGATTAA

Protein sequence:

>DPOGS209245-PA
MILIRAICLVFLCGIAVSEIPHPLSDKFIDLINSKQNTWIAGRNFDIGRTLKSIKKLMGALEDKYLHKLYTVEHDDDTINNLPENFDPRDKWPNCPTLNEIRDQGSCGSCWAFGAVEAMTDRYCTYSNGTKHFHFSAEDLLSCCPVCGLGCNGGIPSFAWEYWKHFGIVSGGNYNSSQGCLPYEIPPCEHHVPGNRIPCNGETSTPKCHRSCRKEYTNSYKSDKKYGKHVYSVGGGEEHIKAEIFKNGPVEGAFTVYADLLTYKSGVYKHTEGEALGGHAIKIMGWGVENGNKYWLIANSWNSDWGDNGFFKILRGEDHCGIESSIVAGEPSYD-
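
The lengths and sequences in this record can be cross-checked with a few lines of Python. The following is a minimal sketch, not part of the original record: the helper names `translate` and `span_length` are hypothetical, the genomic coordinates are assumed to be 1-based and inclusive, and the standard genetic code is assumed. The trailing `-` in the protein sequence is treated as the stop-codon symbol.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds, stop_symbol="-"):
    """Translate a coding sequence codon by codon (hypothetical helper).

    Stop codons are written as `stop_symbol`. Ambiguous bases such as N
    are not handled and raise KeyError.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    protein = []
    for pos in range(0, len(cds), 3):
        aa = CODON_TABLE[cds[pos:pos + 3]]
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)


def span_length(start, end):
    """Length of a coordinate range, assuming 1-based inclusive positions."""
    return end - start + 1


# Values copied from the record above.
transcript_bp = 1005
protein_aa = 334
genomic_start, genomic_end = 463744, 464748

# The genomic span equals the transcript length.
print(span_length(genomic_start, genomic_end) == transcript_bp)  # True

# 334 codons for the residues plus one stop codon account for all 1005 bp.
print(transcript_bp == 3 * (protein_aa + 1))  # True

# First 30 and last 15 nucleotides of DPOGS209245-TA.
print(translate("ATGATACTGATTCGCGCTATCTGTTTAGTG"))  # MILIRAICLV
print(translate("CCATCGTATGATTAA"))  # PSYD-
```

Under these assumptions the genomic span matches the transcript length exactly, which is consistent with a gene model that has no introns, and the translated fragments agree with the start and end of DPOGS209245-PA.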