Monarch geneset OGS2.0

DPOGS204282
Transcript         DPOGS204282-TA   3210 bp
Protein            DPOGS204282-PA   1069 aa
Genomic position   DPSCF300046 + 100970-112712
RNAseq coverage    8071x (Rank: top 2%)
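
The summary figures can be cross-checked with simple arithmetic. The sketch below assumes that the transcript listed here is coding sequence only (start codon through stop codon) and that the genomic coordinates are 1-based and inclusive. Neither convention is stated in the record, and the variable names are illustrative.

```python
# Values copied from the record above.
transcript_bp = 3210
protein_aa = 1069
genomic_start, genomic_end = 100970, 112712

# Assumption: the transcript is CDS only, so its length should equal
# three nucleotides per residue plus one stop codon.
expected_cds_bp = (protein_aa + 1) * 3

# Assumption: coordinates are 1-based and inclusive.
genomic_span_bp = genomic_end - genomic_start + 1

print("expected CDS length:", expected_cds_bp)
print("matches transcript length:", expected_cds_bp == transcript_bp)
print("genomic span:", genomic_span_bp)
print("span minus transcript:", genomic_span_bp - transcript_bp)
```

Under these assumptions the transcript and protein lengths agree, and the gene spans 11743 bp on the scaffold. Reading the remaining 8533 bp as intronic sequence is an interpretation, not something the record states.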
Annotation
Heliconius        HMEL003312         0.0      48.27%
Bombyx            BGIBMGA007558-TA   7e-154   47.54%
Drosophila        (none listed)
EBI UniRef50      UniRef50_D6WKI9    4e-79    42.40%   Putative uncharacterized protein n=3 Tax=Tribolium castaneum RepID=D6WKI9_TRICA
NCBI RefSeq       XP_973726.1        1e-94    36.15%   PREDICTED: similar to inter-alpha (globulin) inhibitor H4 (plasma Kallikrein-sensitive glycoprotein) [Tribolium castaneum]
NCBI nr blastp    gi|47218988        4e-100   34.33%   unnamed protein product [Tetraodon nigroviridis]
NCBI nr blastx    gi|47218988        7e-101   34.20%   unnamed protein product [Tetraodon nigroviridis]
Group
Gene Ontology     GO:0005515   4.8e-14   protein binding
KEGG pathway      (none listed)
InterPro domain   [40-170]     IPR006587   1.7e-34   Vault protein inter-alpha-trypsin, metazoa
                  [55-170]     IPR013694   3.4e-33   Vault protein inter-alpha-trypsin
                  [453-697]    IPR002035   4.8e-14   von Willebrand factor, type A
Orthology group   MCL11007     Patchy
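
The InterPro ranges can be turned into domain lengths and an overall coverage figure. This is a minimal sketch that assumes the bracketed ranges are 1-based, inclusive residue positions on the 1069 aa protein. The record does not state the convention, and the names `DOMAINS`, `domain_length` and `covered_residues` are illustrative.

```python
PROTEIN_LENGTH = 1069  # aa, from the record

# (InterPro accession, start, end) as listed in the record.
DOMAINS = [
    ("IPR006587", 40, 170),
    ("IPR013694", 55, 170),
    ("IPR002035", 453, 697),
]


def domain_length(start, end):
    """Length of a domain, assuming 1-based inclusive coordinates."""
    return end - start + 1


def covered_residues(domains):
    """Count residues covered by at least one domain (overlaps counted once)."""
    covered = set()
    for _, start, end in domains:
        covered.update(range(start, end + 1))
    return len(covered)


for accession, start, end in DOMAINS:
    print(accession, domain_length(start, end))

total = covered_residues(DOMAINS)
print("covered residues:", total)
print("fraction of protein:", round(total / PROTEIN_LENGTH, 3))
```

Because the IPR013694 range lies entirely inside the IPR006587 range, the two vault-protein hits contribute 131 residues together. With the 245-residue von Willebrand factor type A domain, the annotated regions cover 376 residues, roughly 35% of the protein.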
Genotypes for resequenced monarchs and outgroup Danaus species

Nucleotide sequence:

>DPOGS204282-TA
ATGAAGAAATCCTGGATATACCTGTTTTACATAGTTTTTATCGTTGCAAAAGCACAAACCGCCTCAATTTCTAGCACCGAAACTTTGGTTGTTGCCAAGACAGATGATGAGGCGTCAACGGCTGCTCCGTCTGAACCAGTAACCGACGAACCAAACGCTCCTATCAAAGTGACAGAAATGAGAGTTAATTCGGAGGTGACGATGCGGTACGCACATACAGCTGTTGTCACACACGTCAGAAACCCAGCTTCCAAAGCACAGGAGGCAACCTTCCATGTGCTGTTGCCAGAGACAGCCTTCATCAGCGGCTTCATAATGACGTTGGGCGGGAAATCGTATAAGGCTTACGTAAAAGAAAAAAATGAAGCGAAACAAATTTTCAACGAAGCTGTCTCTCACGGGACTGGGGCGGCCCACATCGCGGCCAAAGCTCGTGATTCAAACCATTTCACAGTATCAGTGAATGTGGAGCCGAAGAGTGTTGCTATATTCAATCTGACCTATGAAGAGTTATTGGTGCGTCGCAACGGCGTTTACAACCACGCAATCAACCTTCACCCGGGAACCTTAGTACCCAAGCTGGAGGTGGTGGTACACATCAAGGAGTCCCAGAAGATCACGACGCTCCGAGTGCCTGAGGTCAGGACTGGCAATGAAATCGATGCTACAGAAAACGACGCACAAAATTCAAAGGCTGTCCAAACTAGAAATGGCGACAAGGAAGCTACCATTACATTCACGCCCGACTTGGACGAACAGATGAACCTTATTAAGATATATAAGGACAAAACAAAAGATACCGTGGCACATCATTATTGGGACAACAATGAGGAAGAAGACAACAGAGACGGAGTTTTGGGACAATTTGTTGTTCAATACGACGTGGAACGTTCGAACGATGGAGAAGTCTTGGTGAATGATGGATATTTTGTGCACTTCCTGGCACCCAGCTCGTTGCCACCACTCAACAAGTACGTGGTATTTGTGCTGGACACTTCCAGCTCTATGATCGGTCGCAAGGTGGAACAATTGATTGCAGCTATGGACGCCATACTGTCCGACCTCAACCCGAAAAATTCGAAGGCTGTCCAAACTAGAAATGGCGACAAGGAAGCTACCATTACATTCACGCCCGACTTGGACGAACAGATGAACCTTATTAAGATATATAAGGAAAAAACAAAAGATACCGTGACACATCATTATTGGGACAACAATGAGGAAGAAGACAACAGAGACGGAGTTTTGGGACAATTTGTTGTTCAATACGACGTGGAACGTTCTAACGATGGAGAAGTCTTGGTGAATGATGGATATTTTGTGCACTTCCTGGCACCCAGCTCGTTGCCACCACTCAACAAGTACGTGGTATTTGTGCTGGACACATCCAGCTCTATGATCGGTCGCAAGGTGGAACAATTGATCGCAGCTATGGACGCCATACTGTCGGACCTCAACCCGAGTGATTACTTCAGCATTGTTGAATTTAACTCCGACTACTCGGTCCATGAGCTGAAAGAAGCGGATGAGCCTCAACCTGAACCTCAAAAGTTTTCTTGGTATGGATCAACGTCATCATCAAACAAGGAACTTGTCTCACCATCACTTGCTTCACCTGAGAACATCGCTAAGGCCAAGGTTATCATTTCCAGATTACGGGCTAATGGAGGAACCAATATCCACAGCGCTTTGAGCGTAGCTATGGATCTTATTCATAAGTTCTCTGGAAAGCACGATATTTCTTCTGAAAAATCGAATTCAAGTGACGCTGCAAACGAAAAAGCGATAGCAAATGCTAACGACTTGAAAACCAAACCAGTCCATGAATTGGAGCCCATCATTATTTTCCTGACGGACGGCGACCCGACCGTCGGAGAGACCAGCACCTCGCGTATCATCTCACACGTCACCGAGAAGAACTCCGGAGAAATGAGGGCTTCCTTGTTCTCACTTGCTTTCGGTGAGGATGCGGATCGCAACTTCTTGAGAAAGCTATCACT
GCGTAACGAAGGCTTCATGCGGCACATCTACGAGGCGGCGGATGCGGCGCTTCAGCTGAGAGACTTCTACAAACAGGTCTCCTCTCCACTGCTGGCTCACGTCAAGTTCACATACCCACGGGAACAGATAAAAGAGGGTTCAGTTAGTAAGAACAAGTTCCGCACCGTGTACGCGGGTTCAGAGGTAGTAGTGGCTGGGGAGCTCTCTGACGACGACGTTGATTTGAGACCTGTCGTTAGTGGCTTCTGCGGGAACCAAAATGGAAAATTGATTCCATATGAAAATGATCAGTCCAAGATCAAAGTCACTCGCGTGAAGGAGTTCTTACCTCTGGAGCGCCTGTGGGCGTACCTGAGTATCCATCAGCTATTGGACCAACGTGACGCCTCCGAAGATACAGCCGCCAAAGAGCATGAGAAGAAAGCACTCAATTTAGCGCTGAAGTACTCGTTCGTGACTCCCCTAACGTCGTTGGTGGTGGTAAAGCCGAACGAAACGAACGCCGTGGACGCTGAATCTGTAGACAAAAATAACAACACACTGTCGTTTAATGCAATGCCTCAAGCGCCTTTAAGTCATCATTTATTGATAGCACCACCAGCGTACAGACCCATGGTTATGGGTGGGAATGGAGACGCACTCGCGTTGGTAGGAGGTTTCCATGCTCAAGTAGAAGACGAGGAGGTCGACGAAAAATATGACGACATTGGCCAGATCAGTCTCAACAGAGCTGGTTACAGATTCGATTCAGACGAGGACGATTATGATGGCTTTATAGGTTCAAGTTCATTTATTACAACACCAGCACCAGTGCAGGACTTTTTTGAAACTGTCGCTACCGAAGTCCCAGATCAGGACAAATACCATTTAGAGAACTACATGTGGGCTTTAGCTTTAGTGAACAACACCGCTGACGCCCTCGTGTTTATGGATAATGGAACCGAAATCGTTTTACAGCTCTCTAAAGATAGTAATGCTCCTCGTGGTAGCTCTGAGGAGTCCTGCACGAACGTGCCCGTTGACGCGGCGAGCCCTGCTTCGGGCCCTGAACCCGTGAAGGCCTCCTGTGTCTATATCACTCGCTGTTCCGCAGCCAGGAACATCACCGAAGATGACTATCGCAGATCATACTGTCGCGTTGACAACAAATACGCTGGTGTTTGCTGCCCGAGTAGCCAAATAGACACCGAAGTGCTACCTCTTATCTAA

Protein sequence:

>DPOGS204282-PA
MKKSWIYLFYIVFIVAKAQTASISSTETLVVAKTDDEASTAAPSEPVTDEPNAPIKVTEMRVNSEVTMRYAHTAVVTHVRNPASKAQEATFHVLLPETAFISGFIMTLGGKSYKAYVKEKNEAKQIFNEAVSHGTGAAHIAAKARDSNHFTVSVNVEPKSVAIFNLTYEELLVRRNGVYNHAINLHPGTLVPKLEVVVHIKESQKITTLRVPEVRTGNEIDATENDAQNSKAVQTRNGDKEATITFTPDLDEQMNLIKIYKDKTKDTVAHHYWDNNEEEDNRDGVLGQFVVQYDVERSNDGEVLVNDGYFVHFLAPSSLPPLNKYVVFVLDTSSSMIGRKVEQLIAAMDAILSDLNPKNSKAVQTRNGDKEATITFTPDLDEQMNLIKIYKEKTKDTVTHHYWDNNEEEDNRDGVLGQFVVQYDVERSNDGEVLVNDGYFVHFLAPSSLPPLNKYVVFVLDTSSSMIGRKVEQLIAAMDAILSDLNPSDYFSIVEFNSDYSVHELKEADEPQPEPQKFSWYGSTSSSNKELVSPSLASPENIAKAKVIISRLRANGGTNIHSALSVAMDLIHKFSGKHDISSEKSNSSDAANEKAIANANDLKTKPVHELEPIIIFLTDGDPTVGETSTSRIISHVTEKNSGEMRASLFSLAFGEDADRNFLRKLSLRNEGFMRHIYEAADAALQLRDFYKQVSSPLLAHVKFTYPREQIKEGSVSKNKFRTVYAGSEVVVAGELSDDDVDLRPVVSGFCGNQNGKLIPYENDQSKIKVTRVKEFLPLERLWAYLSIHQLLDQRDASEDTAAKEHEKKALNLALKYSFVTPLTSLVVVKPNETNAVDAESVDKNNNTLSFNAMPQAPLSHHLLIAPPAYRPMVMGGNGDALALVGGFHAQVEDEEVDEKYDDIGQISLNRAGYRFDSDEDDYDGFIGSSSFITTPAPVQDFFETVATEVPDQDKYHLENYMWALALVNNTADALVFMDNGTEIVLQLSKDSNAPRGSSEESCTNVPVDAASPASGPEPVKASCVYITRCSAARNITEDDYRRSYCRVDNKYAGVCCPSSQIDTEVLPLI-
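
The relationship between the two sequences can be illustrated with a small translation routine. The sketch below assumes the standard genetic code and assumes that the trailing "-" in the protein record marks the stop codon. Both are assumptions rather than statements from the record, and `translate` is an illustrative helper, not part of any database tooling.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides, stop_symbol="-"):
    """Translate a DNA string in frame 1, writing stop codons as stop_symbol.

    Any trailing bases that do not fill a complete codon are ignored.
    """
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3
    residues = []
    for i in range(0, usable, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        residues.append(stop_symbol if aa == "*" else aa)
    return "".join(residues)


# First six and last three codons of DPOGS204282-TA, copied from the record.
print(translate("ATGAAGAAATCCTGGATA"))
print(translate("CTTATCTAA"))
```

The first call reproduces the opening residues of DPOGS204282-PA (MKKSWI). The second yields LI followed by the stop symbol, which matches the end of the protein record.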