
This is an Open Access publication published under CSC-OpenAccess Policy.
An Empirical Comparison of Supervised Learning Processes.
Sanjeev Manchanda, Mayank Dave, S. B. Singh
Pages: 21-38 | Revised: 15-06-2007 | Published: 30-06-2007
Volume: 1, Issue: 1 | Publication Date: June 2007
Keywords: Data Mining, Knowledge Discovery in Databases, Supervised Learning Algorithms, Stacking
Data mining as a formal discipline is only two decades old, yet in this short span it has developed rapidly and matured considerably. In this paper, we present an empirical study of supervised learning processes based on an evaluation of different classification algorithms. We include most supervised learning processes based on different pre-pruning and post-pruning criteria, applied to ten datasets collected from internationally renowned agencies. Specific models are presented, results are generated, and issues related to the different processes are analyzed. We also compare our results with published benchmark results for the same datasets and classification algorithms. Results for all algorithms are reported using fifteen performance measures, selected from a set of twenty-three calculated measures, making this a comprehensive study.
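The comparison described in the abstract rests on computing many performance measures for each algorithm-dataset pair. As a minimal illustration (not the paper's actual code; the function names and toy label vectors below are our own), the following sketch derives five commonly used measures, accuracy, precision, recall, F1, and Cohen's kappa, from a binary confusion matrix:

```python
# Illustrative sketch: five of the performance measures typically used in
# empirical comparisons of classifiers, computed from binary predictions.
# Function names and the toy label vectors are hypothetical examples.

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

def measures(y_true, y_pred):
    """Return a dict of accuracy, precision, recall, F1, and Cohen's kappa."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_exp = (((tp + fp) / n) * ((tp + fn) / n)
             + ((fn + tn) / n) * ((fp + tn) / n))
    kappa = (accuracy - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}

# Toy example: 8 instances, predictions from some hypothetical classifier.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
for name, value in measures(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

In a study of this kind, such measures would be computed for every classifier on every dataset (typically via cross-validation) and then tabulated side by side to support the comparison.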
This publication is indexed and abstracted in: Google Scholar, Academic Journals Database, ScientificCommons, Academic Index, CiteSeerX, refSeek, Socol@r, ResearchGATE, Libsearch, Bielefeld Academic Search Engine (BASE), Scribd, WorldCat, SlideShare, PdfSR, and the Chinese Directory Of Open Access.
Mr. Sanjeev Manchanda - India
Mr. Mayank Dave - India
Mr. S. B. Singh - India