Monarch geneset OGS2.0

DPOGS204835
Transcript: DPOGS204835-TA (3231 bp)
Protein: DPOGS204835-PA (1076 aa)
Genomic position: DPSCF300227 - 281427-285251
RNAseq coverage: 248x (Rank: top 42%)
Annotation
Heliconius: HMEL013904 | 0.0 | 78.12%
Bombyx: BGIBMGA011732-TA | 0.0 | 70.34%
Drosophila: CG4998-PB | 0.0 | 49.46%
EBI UniRef50: UniRef50_Q7QIM7 | 0.0 | 44.11% | AGAP006954-PA n=1 Tax=Anopheles gambiae RepID=Q7QIM7_ANOGA
NCBI RefSeq: XP_308802.2 | 0.0 | 51.46% | CLIP-domain serine protease subfamily A (AGAP006954-PA) [Anopheles gambiae str. PEST]
NCBI nr blastp: gi|347965251 | 0.0 | 44.11% | AGAP006954-PA [Anopheles gambiae str. PEST]
NCBI nr blastx: gi|189242269 | 0.0 | 46.32% | PREDICTED: similar to CLIP-domain serine protease subfamily A (AGAP006954-PA) [Tribolium castaneum]
Group
Gene Ontology: GO:0003824 | 1.5e-87 | catalytic activity
    GO:0004252 | 9.5e-80 | serine-type endopeptidase activity
    GO:0006508 | 9.5e-80 | proteolysis
KEGG pathway: cfa:475624 | 2e-41
    K01324 (KLKB1) maps-> Complement and coagulation cascades
InterPro domain: [810-1071] | IPR009003 | 1.5e-87 | Peptidase cysteine/serine, trypsin-like
    [826-1066] | IPR001254 | 9.5e-80 | Peptidase S1/S6, chymotrypsin/Hap
    [856-871] | IPR001314 | 6.6e-14 | Peptidase S1A, chymotrypsin-type
Orthology group: MCL14055 | Single-copy universal gene
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS204835-TA
ATGCGCTGGTTTATATGTTTCTGTGTGCTGCTATTATCTATAGCGAGTACATCAGCAGACTGGTCATGGGGTGGAGACGATTCTGATAAAAAAGATGAACCATCCAATGCTAAACCAACTGATCTATTGCAAGGCGAGGAGATAGATGTCGCACAAGCCAAAAACTTCAATTCGAATGGAACTATTCTTGATGATATCGTAGATGAATTAGTCAGCAATAAGCAAGGCAGAAGTTTGAGCGGATTTGACGATGTGTACAGTGACCCCACCATCAAGGAAGCGTTAGACTCGGGTGACGATAGCGAAGCAAGAAATTTGATAAAGGGCCGACTGTGTACCTTGGGATTAATACAATGCGAAGATGATGACACTCAAGAAAAGAGATTTTTATCACCAGACGAACTCATTTATGCTCAACCCGTTGACATTAAGCCTATTGGCAAACCTATCGCTTCCATACCTGTCCGTGGACCTCCAAGAGCTTATGGACCGCCAAAACCTATGTTGTACCCCCCACGCCCCCCGAAGATTCCATTAAAGAGGCCTGGATATGGAAATGTACGCCCTGGATTTTCGGAGAAGTATGGAGTAGCCGGTAATAACTATCAATTTTCACAAAGCAGCGGCTCGTTTAATGGATATGAAGCTAATTACGTAACAAAACCACCAAGTTTTGCAAACAATGAACCTTACAATTTTGAACATTCAAAACCAACCTACAACAAAATACCCTCAGGAAGCAACAATATTAAATCTGAATCTATTGTACAGCAACACGTTCATCACCACTATGTTCACGATGATTCTAATAAAGAACCTAAGGTTATAATCAAATCTGTGGCTATACCAGTTGGATCTGTAGGCCATTTAGCATCTCAAGTAAACACTCAATCATCATCCAACATCCTAACAGCATCTGGAGGAGATTTTAACACATTTAATTCAGGAGGATTCAAACCTATGACAGGCGGTTTTTCTCCCAGTAGCAAACCTGTCTACGAAACTGATACTATTTATGGATCTCAATACAGCCACAATAATTATAACAAGGGCAGTTCTAACGTTTTCAATCAAGGCTTACCAAATCAATTCGGTAGCAATACTTTTGAAGAGCAAAAATATGGCAATTCGCTAGGTTCTTATGCATCTCAGAATGAGTTTTATAAAAAAGAACTAAATGTAGGTTCCACCGGCAACTTATACACCCAAGGTCCAGCAACATTTTCACAAAATAATTTGTACCAACAAAATCAACATGAAGCCAAAGCTCAAGGTTTTGAATGTGTATGTGTAAAATACGACCAATGTCCAAGCCAAGAAATTATTGGGCGCAGAGACGACTTGTATTTACCAATTGATCCGCGTAACAAAGGTTCCGAAATATTAGCATTAACTGAAGAACAACTAGATGGAGTCAATAAAACTTCTGAGGAGATCAATGTCAGTCAAAATTCCACTGAGGCAAAGAAAATTAGTAAACGAGACGTTGATGAGGCCAAAACTAAAGATGCTGCAAAGGAAATTGAACCGCGCCTTCTGGGGTTAGCTGGATATGGAGGCAATGGCGGTAATAGCGACAAAAAAGTGCAGCCCACGTTTGGTGTTTCTTTTGGGTTGCCCCAGCCCTCCCATAGTTATCCCATTAATCCTTTCAACTCAAATCCTTTACACAATCCATACGGGCCCGCTCTAAATGGAGGCGGTCTTAATTTAGGCTTGGTCTCAGTTAATCCACTTTTAGCTGTTCAAGTGACGAAAAATGATTACGGGGAAAAGGTAGTAAAACCTTTTGTAAATTTGCATGTGACTCCAAATGAACACGTAGTTAATAAACTGGGTCATATATTCCACGAAAAAAAGCAATACCTTTTGAATAAACATGAACACTACCATCATTATAACCCCCATCCTTACCAGCCTTATCCTCATAGGCCATATGTCCCGCATCCTGTAGGTTATTCAGACCATTATGGTTTACATAGGCCACATTCTTACTCCCC
ACATTTTCCGCATTACGATGCTGGTCACTATCGCGTTAATCCATACAATGCACCTTCTGATAATGATGACTATTATGATGATGATGATGATAACAGTTACAACGCTGCTATAAGCTATAATGATCAAAATTACAATTTTGCCAAATCTGCTCAAACTAAAGAGAGCAATGGTCAAAATGATAATTATGCAAATAGATATTCATATTCGCGTTCCCTAACTATTCCTTCACAATCTGGGGCAAACCAGAAAAGCCAGACTGTAAGATTCCCAACAAACAGAAGGAAAAGGGAAGCCTCTCTAGCATCTGAAAAGATTAGTATACAAGAGCGTCAAGGTTATTTCGGTGGACCATCAATTCCCCAATGTAATCAAAATCAAGTTTGTTGCCGCCGGCCACTTAGACCGCAGGCATCAAATCGCGGTCAGTGTGGTATCAGACATTCCCAGGGAATCAATGGTAGAATAAAGACTCCATCATACATCGACGGGGATAGTGAATTCGGAGAGTATCCTTGGCAGGCTGCTATTTTGAAGAAGGACCCTAAAGAATCAGTTTACGTTTGTGGAGGCACACTTATTGATGGACTTCATATTATGACGGCGGCTCATTGCATCAAATCATACAAAGGATTCGAGCTGAGAGTTCGTCTAGGTGAATGGGACGTTAACCACGATGTTGAATTTTACCCATACATTGAACGAGATGTTATATCTGTTCATGTACACCCACAATACTACGCCGGCACATTAGACAACGACCTTGCTATTTTAAAATTAGAGCATCCAGTTGATTGGACCAAATATCCTCATATAAGTCCTGCATGTCTCCCTGATAAATACACCGATTACGCTGGACAAAGATGTTGGACAACTGGTTGGGGCAAGGATGCGTTTGGATCTAACGGAAAGTACCAAAATATTCTTAAGGAAGTAGATGTACCAATTCTACCCCATGGTCAATGCCAACAACAATTAAGACAAACTCGTTTGGGCTACAACTATGAGTTAAATCCTGGTTTTGTCTGTGCCGGTGGCGAAGATGGCAAAGATGCATGCAAAGGGGACGGTGGTGGCCCATTGGTTTGCGAGCGCAGCGGAACCTGGCAACTTGTAGGTGTTGTGTCTTGGGGAATCGGATGCGGTCAGGCTGGTGTACCAGGAGTTTACGTAAAAGTAGCTCATTATTTGGACTGGATCTCTCAAGTGACTGGGAAATTTTCCCAATTCTAA

Protein sequence:

>DPOGS204835-PA
MRWFICFCVLLLSIASTSADWSWGGDDSDKKDEPSNAKPTDLLQGEEIDVAQAKNFNSNGTILDDIVDELVSNKQGRSLSGFDDVYSDPTIKEALDSGDDSEARNLIKGRLCTLGLIQCEDDDTQEKRFLSPDELIYAQPVDIKPIGKPIASIPVRGPPRAYGPPKPMLYPPRPPKIPLKRPGYGNVRPGFSEKYGVAGNNYQFSQSSGSFNGYEANYVTKPPSFANNEPYNFEHSKPTYNKIPSGSNNIKSESIVQQHVHHHYVHDDSNKEPKVIIKSVAIPVGSVGHLASQVNTQSSSNILTASGGDFNTFNSGGFKPMTGGFSPSSKPVYETDTIYGSQYSHNNYNKGSSNVFNQGLPNQFGSNTFEEQKYGNSLGSYASQNEFYKKELNVGSTGNLYTQGPATFSQNNLYQQNQHEAKAQGFECVCVKYDQCPSQEIIGRRDDLYLPIDPRNKGSEILALTEEQLDGVNKTSEEINVSQNSTEAKKISKRDVDEAKTKDAAKEIEPRLLGLAGYGGNGGNSDKKVQPTFGVSFGLPQPSHSYPINPFNSNPLHNPYGPALNGGGLNLGLVSVNPLLAVQVTKNDYGEKVVKPFVNLHVTPNEHVVNKLGHIFHEKKQYLLNKHEHYHHYNPHPYQPYPHRPYVPHPVGYSDHYGLHRPHSYSPHFPHYDAGHYRVNPYNAPSDNDDYYDDDDDNSYNAAISYNDQNYNFAKSAQTKESNGQNDNYANRYSYSRSLTIPSQSGANQKSQTVRFPTNRRKREASLASEKISIQERQGYFGGPSIPQCNQNQVCCRRPLRPQASNRGQCGIRHSQGINGRIKTPSYIDGDSEFGEYPWQAAILKKDPKESVYVCGGTLIDGLHIMTAAHCIKSYKGFELRVRLGEWDVNHDVEFYPYIERDVISVHVHPQYYAGTLDNDLAILKLEHPVDWTKYPHISPACLPDKYTDYAGQRCWTTGWGKDAFGSNGKYQNILKEVDVPILPHGQCQQQLRQTRLGYNYELNPGFVCAGGEDGKDACKGDGGGPLVCERSGTWQLVGVVSWGIGCGQAGVPGVYVKVAHYLDWISQVTGKFSQF-
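
The lengths reported in this record can be cross-checked against each other. The sketch below is only an illustration: it assumes the transcript consists solely of coding sequence plus one stop codon, and that the genomic coordinates are 1-based and inclusive (neither is stated in the record). The function names are hypothetical helpers, not part of any MonarchBase tooling.

```python
# Values copied from the record for DPOGS204835.
TRANSCRIPT_BP = 3231                          # length of DPOGS204835-TA
PROTEIN_AA = 1076                             # length of DPOGS204835-PA
GENOMIC_START, GENOMIC_END = 281427, 285251   # position on DPSCF300227


def expected_cds_length(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues, optionally plus a stop codon."""
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons


def span_length(start: int, end: int) -> int:
    """Length of a span, ASSUMING 1-based inclusive coordinates."""
    return end - start + 1


cds_bp = expected_cds_length(PROTEIN_AA)
locus_bp = span_length(GENOMIC_START, GENOMIC_END)
extra_bp = locus_bp - TRANSCRIPT_BP

print("CDS length matches transcript:", cds_bp == TRANSCRIPT_BP)
print("Genomic span (bp):", locus_bp)
print("Genomic bases not in transcript:", extra_bp)
```

Under these assumptions, 1076 residues plus a stop codon account for all 3231 bp of the transcript, while the genomic span is 3825 bp. The 594 bp difference would be consistent with intronic sequence, although the record itself does not describe the exon structure.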