Monarch geneset OGS2.0

DPOGS210481
Transcript        DPOGS210481-TA  3459 bp
Protein           DPOGS210481-PA  1152 aa
Genomic position  DPSCF300062 + 531982-542316
RNAseq coverage   445x (Rank: top 28%)
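
The genomic position field packs scaffold, strand, and coordinates into one string. Below is a minimal Python sketch for splitting it. The helper name `parse_position` and the `Position` type are hypothetical, and the span calculation assumes 1-based inclusive coordinates, which the record does not state.

```python
import re
from typing import NamedTuple


class Position(NamedTuple):
    scaffold: str
    strand: str
    start: int
    end: int

    @property
    def span(self) -> int:
        # Assumption: coordinates are 1-based and inclusive.
        return self.end - self.start + 1


def parse_position(text: str) -> Position:
    """Split a 'SCAFFOLD STRAND START-END' string into its parts."""
    m = re.fullmatch(r"\s*(\S+)\s*([+-])\s*(\d+)-(\d+)\s*", text)
    if m is None:
        raise ValueError(f"unrecognized position: {text!r}")
    scaffold, strand, start, end = m.groups()
    return Position(scaffold, strand, int(start), int(end))


pos = parse_position("DPSCF300062 + 531982-542316")
print(pos.scaffold, pos.strand, pos.span)  # DPSCF300062 + 10335
```

Under that coordinate assumption the locus spans 10335 bp, which is longer than the 3459 bp transcript and would be consistent with a spliced gene.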
Annotation
Heliconius        HMEL009724        0.0  81.61%
Bombyx            BGIBMGA001836-TA  0.0  74.42%
Drosophila        Tsp-PF            0.0  51.21%
EBI UniRef50      UniRef50_E0VQ35   0.0  57.85%  Thrombospondin-3, putative n=5 Tax=Coelomata RepID=E0VQ35_PEDHC
NCBI RefSeq       XP_308033.4       0.0  58.98%  AGAP002157-PA [Anopheles gambiae str. PEST]
NCBI nr blastp    gi|347967276      0.0  58.98%  AGAP002157-PA [Anopheles gambiae str. PEST]
NCBI nr blastx    gi|328706983      0.0  58.81%  PREDICTED: thrombospondin-3-like [Acyrthosiphon pisum]
Group
Gene Ontology     GO:0007155  5.6e-103  cell adhesion
                  GO:0005509  5.6e-103  calcium ion binding
                  GO:0005576  5.6e-103  extracellular region
KEGG pathway      cqu:CpipJ_CPIJ011343  0.0
                  K04659 (THBS) maps->
                      Malaria
                      TGF-beta signaling pathway
                      Focal adhesion
                      Phagosome
                      ECM-receptor interaction
InterPro domain   [911-1126]  IPR008985  2e-110    Concanavalin A-like lectin/glucanase
                  [914-1126]  IPR013320  6.7e-107  Concanavalin A-like lectin/glucanase, subgroup
                  [928-1128]  IPR008859  5.6e-103  Thrombospondin, C-terminal
                  [514-547]   IPR013091  6.3e-08   EGF calcium-binding
                  [514-565]   IPR001881  2e-06     EGF-like calcium-binding
Orthology group   MCL10285  Multiple-copy universal gene
Genotypes for resequenced monarchs and outgroup Danaus species
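
The InterPro domain rows above give residue ranges in square brackets. As a rough sanity check, the Python sketch below parses those ranges, confirms each one fits inside the 1152 aa protein, and reports each domain's length. The function names are hypothetical, and the ranges are assumed to be 1-based and inclusive, which the record does not state.

```python
import re

PROTEIN_LENGTH = 1152  # aa, from the Protein field of this record

# (range, InterPro accession) pairs copied from the record
DOMAINS = [
    ("[911-1126]", "IPR008985"),
    ("[914-1126]", "IPR013320"),
    ("[928-1128]", "IPR008859"),
    ("[514-547]", "IPR013091"),
    ("[514-565]", "IPR001881"),
]


def parse_range(text: str) -> tuple[int, int]:
    """Turn '[start-end]' into a (start, end) pair of integers."""
    m = re.fullmatch(r"\[(\d+)-(\d+)\]", text)
    if m is None:
        raise ValueError(f"unrecognized range: {text!r}")
    start, end = int(m.group(1)), int(m.group(2))
    if not 1 <= start <= end:
        raise ValueError(f"invalid range: {text!r}")
    return start, end


def domain_lengths(domains, protein_length: int) -> dict[str, int]:
    """Map each accession to its domain length, checking the bounds."""
    lengths = {}
    for text, accession in domains:
        start, end = parse_range(text)
        if end > protein_length:
            raise ValueError(f"{accession} extends past the protein end")
        # Assumption: ranges are 1-based and inclusive.
        lengths[accession] = end - start + 1
    return lengths


lengths = domain_lengths(DOMAINS, PROTEIN_LENGTH)
for accession, n in lengths.items():
    print(accession, n)
```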

Nucleotide sequence:

>DPOGS210481-TA
ATGCTCATGCCTTGCTGTAACAGTGGACTGTTAAGACACAGATGCTTGAAGACTGAGCGCGCTACTAAAATACTTTCTACAACAATTTCATATAAAGTTGTAAGTTTAGATACTTTCAGAATAATCCCAGTTAAACTTAATAGGTTAGAAGCAACAAATGACGTCATAGCCGCCGCTTCAGCCACCGAAGATGGTGAAGTAGCTATTATAGTCCGAGGTCCGTACGGGGATAACTTAGTCCGTGAGGAATTGCTTCACGCGAAGAGCACAGACGATAACTCCGTCTCACTTTATTACAATAGTAAAAGTAAAAAGGTGTCATTGGAAAGTCTGAACGGGAATCACATCAAGTCAGTTTCCTGGAGTTTGGGTTCTCATTTTCATGGCACATTGATTCTTATCGTGACCCACTCCAGAATAAAGTTGGCGGTGGGATGCAAGCCGCTTCATTGGCATCCAATGTCCGGTAGGCATGACGTGCTAACACTTCTAGCGAACGAAAAGTTAAAATTGTACCACGAAGAGAATGCTCCGGTGGAGGTGTATGACAGCGAAAAGACAGCGTTAGACGCCTTGAACTGCAACCACAGGGACCTTAAACCTCCGACCTTATTGACAGTGGACTCTGATGTGGAGGAAGTCAAAGATTTTATAAAACGCGAAGAGAGAATGAAGATGGAAGATGAGATGCAAGGGGACGATCCGCGTAATAACTATATAGATCCTAACATTTACGCCCCACTGCCTCTGCCACCAACGACACCTGGCTCACAAAGAGGAGACATTCCTGCGACGGATATAGAATCTTGTGATGATGAAGTGATCCGTCAACTGAAACTTCTCCGTCAGACGATTGAACTTCTGCGTCGTGAGCTTGCAGACCAAAAAGGAACTATAGACGGACTCAGAAACCAACTCCGAGCTTGTTGCAACCGAGTCTCGCCACCTCCCATAGATAGATGTTCCGGATCTTCGTGCTATCCTGGCGTGCAGTGTCGCAACACGGCGACAGGCATCCAGTGCGGACCCTGTCCATCAGGGATGGAAGGTGATGGAAGAACATGCAGACCTATAACTTGCAATCGACGCCCATGCTCTAAAAACGAATATTGCATCGACACGGAACAAGGGTTTAGATGCGAGCGGTGTCCAGGAAGACAGACCAGCGACGGACAAACATGTCAATCAGCTTGTAGCTCCAATCCTTGCTTTGGAGGAAGAGTTCAATGTCAAGATTTACCGGATGGTAGGTATCGTTGTGGGTCTTGCCCCGCCGGTTATACAGGGAATGGGGAGCAGTGTGTTAGACTGTCTTGCCGTTCCAACACTTGCTTCCAAGGAGTTGAATGCCAGGAGACGGCGTCAGGTCCACGGTGTGGACCGTGTCCCCGGGGATACGACGGTGATGGTGTTCGTTGTGCACACGTTTGCTCGCGTCGACCCTGCGGGGAGAGACGCTGCAGCCCCTCGAACAGCAGTCCCTACTACATCTGCGAAGGTTGCCCCAAGGGCTACGAATGGAACGGTTACACATGCGTTGATATGGACGAGTGTGATTTAATACGTCCGTGTGACGAACTGGTGTCGTGTCGTAATACGGAGGGAGGGTTCGAGTGTGGCGCATGTCCGACAGGGTACAGGGGCAGTTCGGGATGGAGCGGTGCTGGCCAGGAGAGACGGAAGGAGGGATGCGTTGATGTAGACGAGTGTGACCAAGACGTCTGTCCTCGGGGACGGCTGTGTGTCAACACACCTGGTTCGTTCACGTGCGTTCCCTGCGGCGGCCACTACTACGTGAACACGTCTCGGCCGTGCATAGAGGCGGACTCCTTGCGGCGCTGCGACCCAGCCTTCTGCCGCTCTCATAACGCCGTGTGTGGCTTCGGACAGGGCTGTGTGTGTGCGACGGGCTGGGCCGGTAATGGTACTGTTTGCGGTACGGACAGTGATCTAGACGGATATCCGGATCAACAGTTGCCTTGTACTGAATTGCAATGCAC
AGCTGATAACTGTCCCCATGTGTCCAACTCGGGACAGGAGGACGCAGATAAGGACGGTATCGGAGATTCTTGCGATCCTGATGCTGATGGTGACGGCATACCGAATGTCCCGGACAATTGTCCCTTAACACCTAATCCAGATCAGCTAGATAGGGACGAGGATCGCAGTGACAAACGTGGGGATGCTTGTGACAATTGTCCAAGAAGATTTAACCCTGGACAAGAAGATGCAGATAACGATGGACTCGGAAACGTCTGCGATCCCGACATGGATAATGATGGCATTCCCAACGACCACGACAATTGTCCTCTCGTGTTCAACCCACAACAGGAAGATATGGATGGAGATGGTGTGGGTGATCTGTGCGACAACTGTCCAAGAGTACGGAACCCCTCCCAGGATGACTCCGACAAAGATAACGTTGGTGACGCCTGTGACAGTGACGTGGATAGAGACCAGGACGGCATACAGGACGGTTTGGATAATTGTCCGAATTTAGCGAACAGTGATCAGCAAGATGTTGATAATGATGGCAAGGGAGACGCTTGTGATGATGATATAGACGGTGATGGGATCCCGAACCTCGAAGACAACTGTCCTTTGGTGTACAATCCTGATCAGGCTGACGCTAATGGTGACGGTGTCGGGAACGTTTGCGACAACGACTTCGATGGAGACAACATCACTAACGCACTCGACAATTGCCCGAATAATTCGAGGATTTTTCGCACCGACTTCAGGAAGTATATGACGGTAAGGTTGGACCCAGAAGGTACCTCCCAGCAAGACCCACGCTGGCAGCTCGCACACGAGGGCGCTGAGATCACTCAAACCCTCAACTCAGATCCTGGACTGGCGGTCGGATTCGACAGCTTCGGAGGAGTTGACTTTGAAGGCACCTTATTTGTCGACTCGCACATAGACGATGACTACGTCGGCTTCATATTCGGCTACCAGAACAACAAGCGGTTTTATGTGGTGATGTGGAAGAAGAACAGCCAGACGTATTGGCAGACGACGCCGTTCAGGGCGGTCGCGGAGCCGGGGATACAGCTGAAGTTGGTGCACTCTAGCACTGGACCTGGGAAGATACTGAGGAACGCGCTCTGGAACACGGAGTCTACTCCTGATCAGGTGACACTTCTGTGGAAGGATCCTCGAAACGTCGGCTGGCGAGAGAAGACCGCGTACCGCTGGCGTCTCATACACAGACCCAAGATAGGACTGATTAGACTGAAGATATATGAGAACAACAGTCTCGTGGCTGACTCCGGGAACGTTTACGACTTCACGCTTAAGGGTGGAAGGCTGGGAGTTTTCTGCTTTTCCCAGGAAATGATCATTTGGTCCAACCTTGTGTACCGCTGTAACGATAAAATACCAACGAACATAGTATCAGAACTGCCACCAAGGCTCCTTAAAAAGTTGGATATAGACCACGACTTCGTTTATTTGTAG

Protein sequence:

>DPOGS210481-PA
MLMPCCNSGLLRHRCLKTERATKILSTTISYKVVSLDTFRIIPVKLNRLEATNDVIAAASATEDGEVAIIVRGPYGDNLVREELLHAKSTDDNSVSLYYNSKSKKVSLESLNGNHIKSVSWSLGSHFHGTLILIVTHSRIKLAVGCKPLHWHPMSGRHDVLTLLANEKLKLYHEENAPVEVYDSEKTALDALNCNHRDLKPPTLLTVDSDVEEVKDFIKREERMKMEDEMQGDDPRNNYIDPNIYAPLPLPPTTPGSQRGDIPATDIESCDDEVIRQLKLLRQTIELLRRELADQKGTIDGLRNQLRACCNRVSPPPIDRCSGSSCYPGVQCRNTATGIQCGPCPSGMEGDGRTCRPITCNRRPCSKNEYCIDTEQGFRCERCPGRQTSDGQTCQSACSSNPCFGGRVQCQDLPDGRYRCGSCPAGYTGNGEQCVRLSCRSNTCFQGVECQETASGPRCGPCPRGYDGDGVRCAHVCSRRPCGERRCSPSNSSPYYICEGCPKGYEWNGYTCVDMDECDLIRPCDELVSCRNTEGGFECGACPTGYRGSSGWSGAGQERRKEGCVDVDECDQDVCPRGRLCVNTPGSFTCVPCGGHYYVNTSRPCIEADSLRRCDPAFCRSHNAVCGFGQGCVCATGWAGNGTVCGTDSDLDGYPDQQLPCTELQCTADNCPHVSNSGQEDADKDGIGDSCDPDADGDGIPNVPDNCPLTPNPDQLDRDEDRSDKRGDACDNCPRRFNPGQEDADNDGLGNVCDPDMDNDGIPNDHDNCPLVFNPQQEDMDGDGVGDLCDNCPRVRNPSQDDSDKDNVGDACDSDVDRDQDGIQDGLDNCPNLANSDQQDVDNDGKGDACDDDIDGDGIPNLEDNCPLVYNPDQADANGDGVGNVCDNDFDGDNITNALDNCPNNSRIFRTDFRKYMTVRLDPEGTSQQDPRWQLAHEGAEITQTLNSDPGLAVGFDSFGGVDFEGTLFVDSHIDDDYVGFIFGYQNNKRFYVVMWKKNSQTYWQTTPFRAVAEPGIQLKLVHSSTGPGKILRNALWNTESTPDQVTLLWKDPRNVGWREKTAYRWRLIHRPKIGLIRLKIYENNSLVADSGNVYDFTLKGGRLGVFCFSQEMIIWSNLVYRCNDKIPTNIVSELPPRLLKKLDIDHDFVYL-
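
The transcript and protein lengths in this record can be cross-checked with simple arithmetic. The sketch below assumes the 3459 bp transcript is entirely coding and ends in a stop codon, as the ATG start and TAG end of the nucleotide sequence suggest. The trailing `-` in the protein sequence is taken to mark that stop. The function name is hypothetical.

```python
def expected_protein_length(cds_bp: int, has_stop: bool = True) -> int:
    """Return the amino-acid count implied by a coding sequence length."""
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    codons = cds_bp // 3
    # The stop codon does not encode an amino acid.
    return codons - 1 if has_stop else codons


print(expected_protein_length(3459))  # 1152
```

The result matches the 1152 aa listed for DPOGS210481-PA: 3459 / 3 = 1153 codons, minus one stop codon.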