Converting Protein Sequences to Images for Classification with a Convolutional Neural Network
Subject areas: General
Reza Ahsan 1, Mansour Ebrahimi 2, Rouhollah Dianat 3
1 - Faculty member
2 - Faculty of Basic Sciences, University of Qom, Qom, Iran
3 - Faculty of Engineering, University of Qom, Qom, Iran
Keywords: converting protein sequence to image, Gabor filter, convolutional neural network, protein sequence classification
Abstract:
Because machine learning methods designed for sequence classification have not succeeded in separating healthy from cancerous proteins, it is essential to find a representation of these sequences that allows healthy and ill individuals to be classified with deep learning approaches. This study examined several methods of protein sequence representation for classifying the protein sequences of healthy individuals and of individuals with leukemia. The results showed that converting amino acid letters to a one-dimensional feature vector did not succeed in two-class classification: only one class, the diseased class, was detected. Changing the feature vector to color numbers slightly improved the detection accuracy for the healthy class. Representing the protein sequence in an integrated binary form that preserves the order of the sequence, in both a one-dimensional and a two-dimensional form (an image with a Gabor filter applied), was more effective than the previous methods. Representing the protein sequence as a binary image with a Gabor filter applied classified the protein sequences of healthy individuals with 100% accuracy and those of individuals with leukemia with 98.6% accuracy. These findings indicate that representing protein sequences as binary images with a Gabor filter applied can serve as a new and effective method of representing protein sequences for classification.
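The representation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact binary encoding, the image dimensions, or the Gabor parameters, so everything below is an assumption made for illustration: a one-hot encoding over the 20 standard amino acids (one row per residue, which preserves sequence order), a hypothetical fixed image height, and arbitrary filter settings. The function names are hypothetical as well, and the code uses only the Python standard library to stay self-contained. The filtered image would then be passed to a convolutional neural network, which is not shown here.

```python
import math

# Assumption: the 20 standard residues in one-letter code define the image columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_to_binary_image(sequence, height):
    """One-hot encode a protein sequence as a height x 20 binary image.

    Row i corresponds to residue i, so residue order is preserved.
    Sequences are truncated or zero-padded to `height` rows, and
    letters outside AMINO_ACIDS produce an all-zero row.
    """
    image = [[0.0] * len(AMINO_ACIDS) for _ in range(height)]
    for row, residue in enumerate(sequence.upper()[:height]):
        col = AMINO_ACIDS.find(residue)
        if col >= 0:
            image[row][col] = 1.0
    return image


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter on a size x size grid (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope multiplied by a cosine carrier.
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_same(image, kernel):
    """Cross-correlate image with kernel, zero-padded so the output keeps the input shape."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - oy, j + v - ox
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[u][v]
            out[i][j] = acc
    return out


# Illustrative usage with a made-up 10-residue sequence and arbitrary parameters.
image = sequence_to_binary_image("MKTAYIAKQR", height=16)
kernel = gabor_kernel(size=5, wavelength=4.0, theta=math.pi / 4, sigma=2.0)
filtered = filter_same(image, kernel)
```

In practice a filter bank with several orientations and wavelengths is a common choice, with each response stacked as an input channel; whether the study did so is not stated in the abstract.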