Monarch geneset OGS2.0

DPOGS210925
Transcript: DPOGS210925-TA; 4254 bp
Protein: DPOGS210925-PA; 1417 aa
Genomic position: DPSCF300045 + 605146-627899
RNAseq coverage: 283x (Rank: top 39%)
Annotation
Heliconius: HMEL003507; 1e-149; 35.80%
Bombyx: BGIBMGA005131-TA; 1e-57; 40.39%
Drosophila: CG12163-PA; 1e-53; 37.62%
EBI UniRef50: UniRef50_G7Y397; 9e-64; 35.19%; Cathepsin F n=1 Tax=Clonorchis sinensis RepID=G7Y397_CLOSI
NCBI RefSeq: XP_003142769.1; 1e-58; 43.64%; ctsf protein [Loa loa]
NCBI nr blastp: gi|46309423; 1e-64; 42.33%; ORF31 [Agrotis segetum granulovirus]
NCBI nr blastx: gi|46309423; 8e-66; 42.68%; ORF31 [Agrotis segetum granulovirus]
Group
Gene Ontology: GO:0008234; 9.2e-130; cysteine-type peptidase activity
Gene Ontology: GO:0006508; 6.7e-89; proteolysis
KEGG pathway: ppp:PHYPADRAFT_209158; 4e-57
KEGG pathway: K01373 (CTSF) maps-> Lysosome
InterPro domain: [51-346] IPR013128; 9.2e-130; Peptidase C1A, papain
InterPro domain: [1206-1414] IPR000668; 6.7e-89; Peptidase C1A, papain C-terminal
InterPro domain: [397-452] IPR013201; 3.9e-17; Proteinase inhibitor I29, cathepsin propeptide
Orthology group: MCL34854; Lepidoptera specific
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS210925-TA
ATGGCCACCTCCAAGAAACTATCAGAAATCGAGTCTAGAAAATTAACACTCTCCTACTCTGTCGCCATTAAGCATTATAACGATCTGTCACAAAAAAATAATAATAAAGTTACATTACAATATAATGTATATTCTAGGGAGTTAGGTCAAAAGCATCTGTACTCTCTGGAGGAGGCTCCAACACTTTTCGAAAGGTTTATAAAAGATTACAATAAAGAGTATGATGAGAGCGAGAAAGAAGAAAGGTTTAAAATATTTGTGAACAATTTAAAGGATATTAACGCTATGAACGAGAGGAGTTCGAATGCTGTTTACGGTATCAATAAGTTTTCAGATCTAAGCAAAGAAGAATTCGTAAAATATTATACTGGTTTGAAACGAGAAGAGAGTCCATCGAATGAGGATCATAAAAAAACTGATTTGCCAGAATCATTTAATGTTACTGCACCGGATCAATTTGATTGGCGAAAGAAAGGAGTTGTCAGCAGCATTAAAAATCAAAAACATTGTGGTTCATGCTGGGCATTTAGTGCAGCTGCTAATGTCGAAAGTATACATGCTATAAAGACTGGTAAGCTCATAGACGTGTCTGAGCAACAACTACTGGACTGTGATAAATATGATTCGGGATGTTCAGGAGGACTCCCATGGGATGCTTTGAGATATTTCGTCGCTAATGGTGCAATGTCTTTGAAGTCTTATCCTTACGTTGCTAAAGAAGGAAAATGCCGTTATGATAGCAGTAAAGTTGAAATAAGATTAAAGGAATATAAACACAAAGAAAAACTATCGGAAGACCAAATTAAGGAACATCTTTACAATATCGGACCGTTGAGTATAGCTATAACGTCATCACCACTTGCATCGTATAATGGGGGAATTCTTATTGAAGAGTGTCATAGAAGTTATCTAATCAATCACGCTGTTCTTTTGGTAGGATACGGAAAAGAAAACGGCGTTAAATACTGGATCGTCAAGAATTCCTGGGGTCAGAATTGGGGGGAAAATGGTTATTTTAGAATGAAGATGGGAGTGAATTGCTTATTGAGAGTTAGGTCAAAAGTGACTGAGCAACAACCAGTAGACTATGATATCTGGGATGAGGGATGTTCAGGAGGGATGCAGTGGTTGGCGATAAGGGAGTTAGGTCAAAGGCGTCTGTACTCTCTGGAGGAGGCTCCAACACTTTTCGAACAGTTTATAAAAGATTACAATAAAGAGTATGATGAGAGCGAGAAGGAAGAAAGGTTTAAAATATTTGTGAACAATTTAAAGGATATTAACGCTATGAACGAGAGGAGTTCGAATGCTGTTTACGGTATCAATAAGTTTTCGGATCTGAGCAAAGAAGAATTCATAAAATATTATACTGGTCTGAAACGAGACAGGTGTACAACGACTGAGCATCATAAAAGTACTGATTTGCCAAAATCATTTAATATTACTGCACCGGATCAATTTGATTGGCGAAAGAAAGGAGTTGTCAGCAGCGTTAAAAATCAAAGACATTGTGGTTCATGCTGGGCATTTAGTGCAGCTGCTAATGTCGAAAGTATACATGCTATAAAGACTGGTAAGCTCATAGACGTGTCTGAGCAACAACTACTGGACTGTGATAAATATGATTCGGGATGTTCAGGAGGACTCGAATGGATTGCCATGAGAGAGTTAGGTCAAAGGCGTCTGTACTCTCTGGAGGAGGCTCCAACACTTTATGAACAGATTATAAAAGATTACAAGAAAGAGTATGATGTGACCGAGAAGGAAGAAAGGTTTAAAATATATTCTAGGGAGTTAGGTCAAAGGCGTCTGTACTCTCTGGAGGAGGCTCCAACACTTTTCGAACAGTTTATAAAAGATTACAATAAAGAGTATGATGAGAGCGAGAAGGAAGAAAGGTTTAAAATATTTGTGAACAATTTAAAGGATATTAACGCTATGAACGAGAGGAGTTCGAATGCTGTTTACGGTATCAATAAGTTTTCGGATCTGAGCAAAGA
AGAATTCATAAAATATTATACTGGTCTGAAACGAGAGGAGAGTCCATCGAATGAGGATCATAAAAAAACTGATTTGCCAGAATCATTTAATGTTACTGCACCGGATCAATTTGATTGGCGAAAGAAAGGAGTTGTCAGCAGCATTAAAAATCAAAAACATTGTGGTTCATGCTGGGCATTTAGTGCAGCTGGTAATGTCGAAAGTATACATGCTATAAAGACTGGTAAGCTCGTACACGTGTCTGAGCAACAACTAGTGGATTGTGATAGCCAGGATTCGGGATGTTCAGGAGGCTTGACATGGAATGCCATGAGATATTTCCGTACAAATGGTGCAGTGTCTTTGAAATCTTATCCTTACGTGGCTCAAAACGAAAATTGCCGCTATGATAGCAATAAAGTTGTAATCAGATTAAAGGACTACAAACACATCACACAACTGTCAGAAGATCAAATTAAGGAACATCTTTACAATATAGGACTATTGAGTATAGATATAACTTCAACGCAACTTACATGGTATGAAGGTGGAATTCTTATTGAAGAGTGTCGTAGAAGCGATCTAGTCGATCACGCTGTTCTTTTGGTAGAATACGGAAAAGAAAACAGCGTTGAATACTGGATCGTCAAGAATTCCTGGGGTCAGAATGGGGGGGAAAAAGTTGCATTACAATATAATGTATATTCTAGGGAGTTAGGTCAAAAGCATCTGTACTCTCTAGAGGAGGCTCCAACACTTTTCGAACAGTTTATAAAAGATTACAATAAAGAGTATGATGAGAGCGAGAAGGAAGAAAGGTTTAAAATATTTGTGAACAATTTAAAGGATATTAACGCTATGAACGAGAGGAGTTCGAATGCTGTTTACGGTATCAATAAGTTTTCGGATCTGAGCAAAGACGAATTCGTGAAATTTTATACCGGTCTGAAACGAGAAGAGAGTCCATCGAATGAGGATCATAAAAAAACTGATTTGCCAAAATCATTTAATGTTACTGCACCGGATCAATTTGATTGGCGAAAGAAAGGAGTTGTCAGCAGCGTAAAGTTTCAAGGACATTGTGTTTCATGCTGGGCATTTAGTGTGGCTGGTAATGTTGAAAGTATAAATGCTATAAAGACTGGTAAGCTCATAGACGTGTCTGAGCAACAACTAGTGGATTGTGATGAGTGGAATTTTGGATGTTCAGGAGGGATTGCCGTTGTTGGCGATGAGAGAGTTAGGTCAAAAGTGACTGAGCAACAACCAGTAGACTATGATATCTGGGATGAGGGATGTTCAGGAGGGATGCAGTGGTTGGCGATAAGGGAGTTAGGTCAAAGGCGTCTGTACTCTCTGGAGGAGGCTCCAACACTTTTCGAACAGTTTATAAAAGATTACAATAAAGAGTATGATGAGAGCGAGAAGGAAGAAAGGTTTAAAATATTTGTGAACAATTTAAAGGATATTAACGCTATGAACGAGAGGAGTTCGAATGCTGTTTACGGTATCAATAAGTTTTCGGATCTGAGCAAAGAAGAATTCATAAAATATTATACTGGTCTGAAACGAGAGGAGAGTCCATCGAATGAGGATCATAAAAAAACTGATTTGCCAGAATCATTTAATGTTACTGCACCGGATCAATTTGATTGGCGAAAGAAAGGAGTTGTCAGCAGCATTAAAAATCAAAAACATTGTGGTTCATGCTGGGCATTTAGTGCAGCTGCTAATGTCGAAAGTATACATGCTATAAAGACTGGTAAGCTCATAGACGTGTCTGAGCAACAACTACTGGACTGTGATAAATATGATTCGGGATGTTCAGGAGGACTCCCATGGGATGCTTTGAGATATTTCGTCGCTAATGGTGCAATGTCTTTGAAGTCTTATCCTTACGTTGCTAAAGAAGGAAAATGCCGTTATGATAGCAGTAAAGTTGAAATAAGATTAAAGGGCTATAAAATCTTCAGCAAAATATCGGAAGACCAAATTAAGGAACATCTTTACAATATCGGAC
CGTTGAGCATAGCTATTGATGTATCACCCATTAAGCCGTATGTAGGGGGGATTGTTATGGAAGAGTGTCATGAAGTCTGTCAGGTCAATCACGCAGTTCTTTTGGTAGGATACGGAAAAGAATACTCCGTTGAATACTGGATCGTCAAAAATTCTTGGGGTCCCAATTGGGGGGAAAATGGTTATTTTAGGATGGAGAGGGGAGTGAATTGCTTATTGTTAACTTCAACCGGAATTACAACAGCTGTTATATAA

Protein sequence:

>DPOGS210925-PA
MATSKKLSEIESRKLTLSYSVAIKHYNDLSQKNNNKVTLQYNVYSRELGQKHLYSLEEAPTLFERFIKDYNKEYDESEKEERFKIFVNNLKDINAMNERSSNAVYGINKFSDLSKEEFVKYYTGLKREESPSNEDHKKTDLPESFNVTAPDQFDWRKKGVVSSIKNQKHCGSCWAFSAAANVESIHAIKTGKLIDVSEQQLLDCDKYDSGCSGGLPWDALRYFVANGAMSLKSYPYVAKEGKCRYDSSKVEIRLKEYKHKEKLSEDQIKEHLYNIGPLSIAITSSPLASYNGGILIEECHRSYLINHAVLLVGYGKENGVKYWIVKNSWGQNWGENGYFRMKMGVNCLLRVRSKVTEQQPVDYDIWDEGCSGGMQWLAIRELGQRRLYSLEEAPTLFEQFIKDYNKEYDESEKEERFKIFVNNLKDINAMNERSSNAVYGINKFSDLSKEEFIKYYTGLKRDRCTTTEHHKSTDLPKSFNITAPDQFDWRKKGVVSSVKNQRHCGSCWAFSAAANVESIHAIKTGKLIDVSEQQLLDCDKYDSGCSGGLEWIAMRELGQRRLYSLEEAPTLYEQIIKDYKKEYDVTEKEERFKIYSRELGQRRLYSLEEAPTLFEQFIKDYNKEYDESEKEERFKIFVNNLKDINAMNERSSNAVYGINKFSDLSKEEFIKYYTGLKREESPSNEDHKKTDLPESFNVTAPDQFDWRKKGVVSSIKNQKHCGSCWAFSAAGNVESIHAIKTGKLVHVSEQQLVDCDSQDSGCSGGLTWNAMRYFRTNGAVSLKSYPYVAQNENCRYDSNKVVIRLKDYKHITQLSEDQIKEHLYNIGLLSIDITSTQLTWYEGGILIEECRRSDLVDHAVLLVEYGKENSVEYWIVKNSWGQNGGEKVALQYNVYSRELGQKHLYSLEEAPTLFEQFIKDYNKEYDESEKEERFKIFVNNLKDINAMNERSSNAVYGINKFSDLSKDEFVKFYTGLKREESPSNEDHKKTDLPKSFNVTAPDQFDWRKKGVVSSVKFQGHCVSCWAFSVAGNVESINAIKTGKLIDVSEQQLVDCDEWNFGCSGGIAVVGDERVRSKVTEQQPVDYDIWDEGCSGGMQWLAIRELGQRRLYSLEEAPTLFEQFIKDYNKEYDESEKEERFKIFVNNLKDINAMNERSSNAVYGINKFSDLSKEEFIKYYTGLKREESPSNEDHKKTDLPESFNVTAPDQFDWRKKGVVSSIKNQKHCGSCWAFSAAANVESIHAIKTGKLIDVSEQQLLDCDKYDSGCSGGLPWDALRYFVANGAMSLKSYPYVAKEGKCRYDSSKVEIRLKGYKIFSKISEDQIKEHLYNIGPLSIAIDVSPIKPYVGGIVMEECHEVCQVNHAVLLVGYGKEYSVEYWIVKNSWGPNWGENGYFRMERGVNCLLLTSTGITTAVI-
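
Length consistency check:

The lengths reported in this record can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the OGS2.0 record: the function names are hypothetical, it assumes the transcript is pure coding sequence ending in one stop codon (the nucleotide sequence above ends in TAA and the protein ends in "-"), and it assumes the genomic coordinates are 1-based and inclusive, which the record does not state.

```python
def expected_cds_length(protein_length_aa: int, has_stop_codon: bool = True) -> int:
    """Coding-sequence length in bp implied by a protein length.

    Each amino acid corresponds to one codon (3 bp); a terminal stop
    codon, if present in the transcript, adds one more codon.
    """
    codons = protein_length_aa + (1 if has_stop_codon else 0)
    return 3 * codons


def genomic_span(start: int, end: int) -> int:
    """Length of a genomic interval, assuming 1-based inclusive coordinates."""
    return end - start + 1


# Values copied from the record above.
transcript_bp = 4254
protein_aa = 1417

# True when the transcript length matches the protein length plus a stop codon.
lengths_agree = expected_cds_length(protein_aa) == transcript_bp

# Span of the gene model on scaffold DPSCF300045 (assumed inclusive coordinates).
span_bp = genomic_span(605146, 627899)

print(lengths_agree, span_bp)
```

Under these assumptions the genomic span is several times longer than the 4254 bp transcript, which would be consistent with an intron-containing gene model, although the record itself does not list exon coordinates.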