Monarch geneset OGS2.0

DPOGS211470
Transcript: DPOGS211470-TA, 1806 bp
Protein: DPOGS211470-PA, 601 aa
Genomic position: DPSCF300113 - 298524-406644
RNAseq coverage: 152x (Rank: top 53%)
Annotation
Heliconius: HMEL015072, 6e-82, 63.38%
Bombyx: BGIBMGA002740-TA, 8e-41, 72.27%
Drosophila: CG42342-PH, 2e-42, 50.60%
EBI UniRef50: UniRef50_E0VQF7, 4e-52, 41.92%, Collagen alpha-1, putative n=20 Tax=Coelomata RepID=E0VQF7_PEDHC
NCBI RefSeq: XP_970497.2, 8e-58, 43.72%, PREDICTED: similar to LP07855p [Tribolium castaneum]
NCBI nr blastp: gi|189241587, 2e-56, 43.72%, PREDICTED: similar to LP07855p [Tribolium castaneum]
NCBI nr blastx: gi|189241587, 5e-156, 49.68%, PREDICTED: similar to LP07855p [Tribolium castaneum]
Group
KEGG pathway: tgu:100224679, 9e-24
 K06237 (COL4A) maps to:
    Small cell lung cancer
    Pathways in cancer
    Amoebiasis
    Focal adhesion
    ECM-receptor interaction
InterPro domain: [117-164], IPR008160, 5.4e-10, Collagen triple helix repeat
Orthology group: MCL22679, Lepidoptera specific

Nucleotide sequence:

>DPOGS211470-TA
ATGGGGGGGAAGCCTCCGGGCAAGAGTCCTCCGGAGAAGGAAAAGGAAAAAGAGACTAAGAAAAAATGGGAGAAATGCGAGCGCTGCCCAGCGGACCCTTGGACTTACTGTGTCGTGTTATGGTGTGCGTGTGCGATGAGCCTCATATCTAGTGGGTACAGTCTATACAAGCAGCAGGGTCTACAGGGGAGGCTGTCCTTGTTGGAGGAGCAGCATCTAGCTTTACGTAGTGCGGTCCTGGAGCCGCAGCAGCCTCTAGTGGAGCGTCTGAGGAGGGATCTTCACACGAGACCGTTGAGCTCCTGGAGAGCCAGGAGGAGTATTAGAGACTACGGCACCTGCGTTTGTCCACCAGGTCCTCCCGGGCCCCCCGGCAAGCGTGGCAAGAAGGGCAAGAAAGGTGACCCCGGTGACCCAGGCCCCCCGGGGTTGATGGGAGCTCCGGGGAAAAATGGATTCCCGGGTAGCAAGGGCGATAGAGGCGAGCGCGGCTTCATGGTAAGCGCCGCGCTGGCGATCCCCGTAGTGTCGCTAGAACCCTTGCGCCTCATTAACCTTACATCAACCATTTACAAATTGCCTAACCATGTCCCGCGTCCTAACTCTAGATATATCTGTAGTTTGACTTACAAACATATGATTAAGGGCCCTATAGGACTGGACGGACCTAAAGGAGATCCGGGTCGGCCGGGGGACAAGGGACAAAAAGGAGAACATGGCAGTCCAGGCTTTGATGTTTTCTCTGCAGTGAAGGGAGTCAAAAGATCAGTGGACAACTATAAGATGAGCCCCTACACGACCGCAGAGATCATAGCCGTTAAGGCCCTGCAGGCGACGGGGCACAACATCTCAGCGCAGTCCGTCATACAGTTGAAGGGGGAACCTGGAGAGCCGGGACCTCCGGGACCACCCGGACCAACAGGAGCAGAAGGTGTTGCTGGAGCAGAAGGACGCGTGGGTCCTGCGGGGACGCCCGGTCCTCCTGGACCAATAGGCCCTACGGGGCCTGCAGGATCTGCTGGACCGATAGGGCCCCCAGGACCAGTAGGACATAAGGGAGACAAGGGAGACAAGGGTGAACGTGGTTTCACGACGACACTGAAAGGCGATGCGTTCCCAACTGGCATCATCGAGGGTCCACCAGGTCCCCCCGGGCCTCCCGGGGCGGAAGGTGCGCGCGGCGAGCGCGGAGCGGGGGGTGCTCCCGGCCCCCCCGGGGAGCGCGGCGCGAGAGGCAAGCGGGGCAAGCGGGTAACACCACCCACTTCTGAATACGACCGCTATTGTGCGGTAGGCAAGGAAGGTGCGTCAGGACCTCGCGGACCGCCTGGTTCGGACGGCCGACCCGGGGTCGCCGGGGTTCCAGGCCCGCCGGGAAAACCGGGAGAAATTGGACCAAAGGGTGAAAAGGGCGACTACGGTGACATGGGGTCCCCGGGCATGCTCGGAGCTCCGGGACTTCCTGGACCCCCGGGATACCCAGGCCTTAAGGGGGAGAAAGGAGACAAGGGGGACTCGGGAGACGGGACCGGGTACGAGCTTTATGGACACGAACTGATGATGGGCCCCCCGGGCTCGCCGGGCCCCGCGGGTCCCCCGGGCGTGGCGGGCCCGCCCGGTATCAAGGGCGACAAGGGCGAGCCCGGAACACGCGGCAAGACTGGTGAGCGCGGAGAGAAAGGTGACCCAGGACCCATGGGACTCCCGGGCCCAGTAGGTCTCCCGGGGGAGGCGGGCGAGCCGGGCCGGCCGGGCGATACGGGGCCGAGGGAGAACCGCTGGCCTCCGGACTTCGCCTTCACGTAG

Protein sequence:

>DPOGS211470-PA
MGGKPPGKSPPEKEKEKETKKKWEKCERCPADPWTYCVVLWCACAMSLISSGYSLYKQQGLQGRLSLLEEQHLALRSAVLEPQQPLVERLRRDLHTRPLSSWRARRSIRDYGTCVCPPGPPGPPGKRGKKGKKGDPGDPGPPGLMGAPGKNGFPGSKGDRGERGFMVSAALAIPVVSLEPLRLINLTSTIYKLPNHVPRPNSRYICSLTYKHMIKGPIGLDGPKGDPGRPGDKGQKGEHGSPGFDVFSAVKGVKRSVDNYKMSPYTTAEIIAVKALQATGHNISAQSVIQLKGEPGEPGPPGPPGPTGAEGVAGAEGRVGPAGTPGPPGPIGPTGPAGSAGPIGPPGPVGHKGDKGDKGERGFTTTLKGDAFPTGIIEGPPGPPGPPGAEGARGERGAGGAPGPPGERGARGKRGKRVTPPTSEYDRYCAVGKEGASGPRGPPGSDGRPGVAGVPGPPGKPGEIGPKGEKGDYGDMGSPGMLGAPGLPGPPGYPGLKGEKGDKGDSGDGTGYELYGHELMMGPPGSPGPAGPPGVAGPPGIKGDKGEPGTRGKTGERGEKGDPGPMGLPGPVGLPGEAGEPGRPGDTGPRENRWPPDFAFT-
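
Length consistency check:

The transcript and protein lengths listed in this record (1806 bp and 601 aa) can be cross-checked with a short calculation. The sketch below is illustrative only and is not part of the original record. It assumes the transcript is coding sequence only (no UTRs) and ends in a single stop codon, which fits the listed nucleotide sequence starting with ATG and ending with TAG. The function name cds_length_matches is hypothetical.

```python
def cds_length_matches(transcript_bp: int, protein_aa: int) -> bool:
    """Return True if the transcript length equals three bases per residue
    plus one trailing stop codon (assumption: CDS only, no UTRs)."""
    return transcript_bp == 3 * (protein_aa + 1)


# Lengths taken from the record: 1806 bp transcript, 601 aa protein.
print(cds_length_matches(1806, 601))  # True: (601 + 1) * 3 = 1806
```

If the lengths did not agree, that could indicate UTR sequence in the transcript, a partial gene model, or a missing stop codon. The check does not distinguish between these cases.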