PREDICTION OF THE PROTEIN O-GLYCOSYLATION SITES BY COMBINING SUPPORT VECTOR MACHINES AND INDEPENDENT COMPONENT ANALYSIS
Journal: Asian Pacific Journal of Microbiology Research (AJMR)
Authors: Xue Mei Yang, Zhen Su
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited
O-glycosylation is one of the main types of mammalian protein glycosylation. It occurs at specific serine and threonine residues and plays important roles in the secretion, antigenicity, and metabolism of glycoproteins, so predicting O-glycosylation sites is important for pharmacy, food science, and disease control. To improve prediction accuracy, we propose a new method, ICA+SVM. The samples (protein sequences) used in the experiments are encoded by sparse coding with window size w=21; 120 independent components (features) are extracted by independent component analysis (ICA) and input to a support vector machine (SVM), which then performs the prediction (classification) in feature space. The experimental results show that ICA+SVM performs better than both PCA+SVM and SVM alone, with a prediction accuracy of about 88%. Furthermore, we investigated the same protein sequences under various window sizes; the results indicate that the longer the protein sequence segment, the higher the prediction accuracy.
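The sparse coding step with window size w=21 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a one-hot code over the 20 standard amino acids plus a hypothetical padding symbol `X` for window positions that fall outside the sequence, so that each candidate serine or threonine site yields a 21 x 21 = 441-dimensional binary vector. The alphabet order, the padding rule, the function names, and the example sequence are all assumptions made for illustration.

```python
# Sparse (one-hot) window encoding around candidate S/T sites.
# Assumptions (not from the paper): alphabet order, the padding symbol "X",
# and mapping any unknown residue to the padding symbol.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
PAD = "X"                              # hypothetical padding symbol
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def candidate_sites(sequence):
    """Return the positions of serine (S) and threonine (T) residues."""
    return [i for i, aa in enumerate(sequence) if aa in "ST"]


def encode_window(sequence, center, w=21):
    """One-hot encode the window of w residues centered on `center`."""
    half = w // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        # Positions beyond either end of the sequence are treated as padding.
        residue = sequence[pos] if 0 <= pos < len(sequence) else PAD
        one_hot = [0] * len(ALPHABET)
        one_hot[INDEX.get(residue, INDEX[PAD])] = 1
        vector.extend(one_hot)
    return vector


# Made-up example sequence, for illustration only.
seq = "MKTAYSPLVT"
sites = candidate_sites(seq)
features = [encode_window(seq, i) for i in sites]
```

Under these assumptions, each row of `features` is the kind of sparse vector that a feature extraction step such as ICA would reduce (to 120 components in the abstract) before classification by the SVM.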