Reconstruction of a Complete Dataset from an Incomplete Dataset by ARA (Attribute Relation Analysis): Some Results

Sameer S. Prabhune; S. R. Sathe

Pages - 35 - 42 | Revised - 31-01-2011 | Published - 08-02-2011

Published in International Journal of Data Engineering (IJDE)

Volume - 1 Issue - 5 | Publication Date - January / February

KEYWORDS

Data mining, Data preprocessing, Missing data

ABSTRACT

Preprocessing is a crucial step in a variety of data warehousing and mining applications. Real-world data is noisy and often suffers from corrupted or incomplete values that may affect the models built from it. The accuracy of any mining algorithm depends greatly on the input data sets. Incomplete data sets have become almost ubiquitous in a wide variety of application domains. Common examples can be found in climate and image data sets, sensor data sets, and medical data sets. The incompleteness in these data sets may arise from a number of factors: in some cases it may simply reflect certain measurements not being available at the time; in others the information may be lost due to partial system failure; or it may result from users being unwilling to specify attributes due to privacy concerns. When a significant fraction of the entries is missing in all of the attributes, it becomes very difficult to perform any kind of reasonable extrapolation on the original data. For such cases, we introduce the novel idea of attribute weightage, in which we assign a weight to every attribute in order to predict a complete data set from an incomplete one, so that data mining algorithms can be applied to it directly. The attraction of the idea lies in placing weights on the attributes and finally averaging them. We demonstrate the effectiveness of the approach on a variety of real data sets. This paper describes the theory and implementation of a new filter, ARA (Attribute Relation Analysis), for the WEKA workbench, which finds the complete dataset from an incomplete dataset.
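
The abstract does not spell out how the attribute weights are computed, so the following is only a minimal sketch of one way a "weight every attribute, then average" imputation could look, and it should not be read as the authors' ARA filter. It assumes numeric attributes, uses the absolute Pearson correlation between two attributes as the weight, predicts each missing value from every other observed attribute with a simple least-squares line, and falls back to the column mean when no usable relation exists. The names `impute`, `_fit`, and `_pairs` are hypothetical helpers introduced for illustration.

```python
from math import sqrt


def _pairs(rows, j, k):
    """Collect (x, y) = (attribute k, attribute j) from rows where both are observed."""
    return [(r[k], r[j]) for r in rows if r[j] is not None and r[k] is not None]


def _fit(pairs):
    """Fit y = a + b*x by least squares; return (a, b, |Pearson r|) or None."""
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant attribute carries no usable relation
    b = sxy / sxx
    a = my - b * mx
    return a, b, abs(sxy) / sqrt(sxx * syy)


def impute(rows):
    """Return a copy of rows (lists with None for missing) with gaps filled.

    Assumption, not the paper's method: weight = |correlation| between the
    target attribute and each predictor attribute; estimate = weighted
    average of the per-attribute linear predictions.
    """
    n_attr = len(rows[0])
    # Column means serve as the fallback estimate.
    means = []
    for j in range(n_attr):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    # Pairwise relations: fits[(j, k)] predicts attribute j from attribute k.
    fits = {
        (j, k): _fit(_pairs(rows, j, k))
        for j in range(n_attr)
        for k in range(n_attr)
        if j != k
    }
    completed = []
    for r in rows:
        new_row = list(r)
        for j in range(n_attr):
            if r[j] is not None:
                continue
            num = den = 0.0
            for k in range(n_attr):
                if k == j or r[k] is None or fits[(j, k)] is None:
                    continue
                a, b, weight = fits[(j, k)]
                num += weight * (a + b * r[k])
                den += weight
            # Weighted average of predictions, else the column mean.
            new_row[j] = num / den if den > 0 else means[j]
        completed.append(new_row)
    return completed


data = [
    [1.0, 2.0, 10.0],
    [2.0, 4.0, 20.0],
    [3.0, 6.0, 30.0],
    [4.0, None, 40.0],
]
filled = impute(data)
```

Here the second attribute is exactly twice the first, so both predictor attributes agree on the estimate for the missing entry. If an attribute has no observed values at all, this sketch leaves its entries as `None` rather than guessing.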

ABSTRACTING & INDEXING

1	Google Scholar

2	CiteSeerX

3	Scribd

4	SlideShare

5	PdfSR

MANUSCRIPT AUTHORS

Mr. Sameer S. Prabhune

Shri Sant Gajanan Maharaj College of Engineering, Shegaon - India

ssprabhune@ssgmce.ac.in

Dr. S. R. Sathe

Visvesvaraya National Institute of Technology, Nagpur - India
