Outlier Modification and Gene Selection for Binary Cancer Classification using Gaussian Linear Bayes Classifier
Md. Hadiul Kabir, Md. Nurul Haque Mollah
Pages - 13 - 24     |    Revised - 31-08-2015     |    Published - 30-09-2015
Volume - 9   Issue - 2    |    Publication Date - September 2015
Gene Expression, Outlier Modification, Top DE Genes Selection, Binary Classification, Gaussian Bayes Classifier, Misclassification Error Rate (MER).
The Gaussian linear Bayes classifier is one of the most popular approaches to classification. However, it is seldom used for cancer classification from gene expression data, because its covariance matrix cannot be inverted reliably when the training dataset contains a large number of gene variables but only a small number of cancer patients/samples. To overcome this problem, we propose using only a few top differentially expressed (DE) genes, drawn from both the upregulated and downregulated groups, for binary cancer classification with the Gaussian linear Bayes classifier. Top DE genes are usually selected by ranking the p-values of the t-test. However, both the t-test statistic and the Gaussian linear Bayes classifier are sensitive to outliers. We therefore also propose modifying outliers in the gene expression dataset before applying the proposed method, since gene expression datasets are often contaminated by outliers arising from the several steps involved in the data-generating process, from hybridization to image analysis. The performance of the proposed method is investigated using both simulated and real gene expression datasets. The results show that outlier modification improves the performance of the proposed method for binary cancer classification.
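The three steps named in the abstract (outlier modification, top DE gene selection, Gaussian linear Bayes classification) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the outlier rule, so the median/MAD replacement below is an assumption. Ranking by the Welch t-statistic stands in for ranking by t-test p-values. All function names, the cutoff of 3, and the simulated data are hypothetical.

```python
import math
import random
import statistics

def modify_outliers(values, cutoff=3.0):
    """ASSUMED rule: replace values more than `cutoff` scaled MADs from the median by the median."""
    med = statistics.median(values)
    mad = 1.4826 * statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)
    return [med if abs(v - med) > cutoff * mad else v for v in values]

def clean_class(samples):
    """Apply outlier modification gene by gene within one class (rows = samples)."""
    cols = [modify_outliers(col) for col in zip(*samples)]
    return [list(row) for row in zip(*cols)]

def t_statistic(a, b):
    """Welch t-statistic for mean(a) - mean(b)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0

def select_top_de_genes(class0, class1, k):
    """Return indices of the k most downregulated and k most upregulated genes in class 1."""
    n_genes = len(class0[0])
    t = [t_statistic([s[g] for s in class1], [s[g] for s in class0]) for g in range(n_genes)]
    order = sorted(range(n_genes), key=lambda g: t[g])
    return order[:k] + order[-k:]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_linear_bayes(class0, class1, genes):
    """Gaussian linear Bayes rule with a pooled covariance matrix on the selected genes."""
    X0 = [[s[g] for g in genes] for s in class0]
    X1 = [[s[g] for g in genes] for s in class1]
    p, dof = len(genes), len(X0) + len(X1) - 2
    mu0 = [statistics.mean(col) for col in zip(*X0)]
    mu1 = [statistics.mean(col) for col in zip(*X1)]
    S = [[0.0] * p for _ in range(p)]
    for X, mu in ((X0, mu0), (X1, mu1)):
        for x in X:
            d = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j] / dof
    w = solve(S, [m1 - m0 for m0, m1 in zip(mu0, mu1)])
    mid = [(m0 + m1) / 2 for m0, m1 in zip(mu0, mu1)]
    bias = math.log(len(X1) / len(X0)) - sum(wi * mi for wi, mi in zip(w, mid))
    return w, bias

def predict(x, genes, w, bias):
    """Assign class 1 when the linear discriminant score is positive."""
    return int(sum(wi * x[g] for wi, g in zip(w, genes)) + bias > 0)

def demo(seed=0):
    """Simulated data (hypothetical): 20 genes, genes 0-1 up and 2-3 down in class 1."""
    rng = random.Random(seed)
    shift = [5.0, 5.0, -5.0, -5.0] + [0.0] * 16
    def sample(label):
        return [rng.gauss(label * s, 1.0) for s in shift]
    train0 = [sample(0) for _ in range(15)]
    train1 = [sample(1) for _ in range(15)]
    train0[0][0], train1[3][2] = 40.0, -60.0  # inject two gross outliers
    train0, train1 = clean_class(train0), clean_class(train1)
    genes = select_top_de_genes(train0, train1, k=2)
    w, bias = fit_linear_bayes(train0, train1, genes)
    test = [(sample(y), y) for y in [0, 1] * 20]
    mer = sum(predict(x, genes, w, bias) != y for x, y in test) / len(test)
    return sorted(genes), mer

if __name__ == "__main__":
    print(demo())
```

Restricting the classifier to 2k selected genes keeps the pooled covariance matrix small enough to invert with only a handful of training samples, which is the motivation the abstract gives for gene selection.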
Mr. Md. Hadiul Kabir
Department of Statistics, University of Rajshahi, Bangladesh
Professor Md. Nurul Haque Mollah
University of Rajshahi, Bangladesh
