M. Stat. 505

Data Mining
Full marks – 75
(Examination 60, Tutorial/Terminal 11.25, and Attendance 3.75)
Number of Lectures – Minimum 45
(Duration of Examination: 4 Hours)

****

Aim of the Course

Data relevant to managerial decisions are accumulating at an incredible rate owing to a host of technological advances. From this flood of digital data, we must extract meaningful information and knowledge for the benefit of business, government, and the scientific community. Data mining is a class of analytical techniques that examine large amounts of data to discover new and valuable information. This course is designed to introduce the core concepts of data mining.

Objectives of the Course

After completing this course, the students should

⇒ understand the basic concepts of data mining;

⇒ be able to explore categorical and numerical data and apply appropriate techniques for pre-processing the data;

⇒ understand the fundamental concepts and algorithms of supervised, unsupervised, and semi-supervised learning, which provide the necessary background for applying data mining to real problems; and

⇒ develop and apply critical-thinking, problem-solving, and decision-making skills.

Learning Outcomes

At the end of the course, the student will be able to

⇒ know what data mining is and how it is used;

⇒ know the different techniques for pre-processing data;

⇒ know different supervised, semi-supervised, and unsupervised learning methods for classifying, predicting, and clustering data;

⇒ know about association rules and model evaluation techniques; and

⇒ know how to use data mining techniques to analyse real data and interpret the results.

************

Course Contents

Introduction: What Is Data Mining? Why Data Mining? Need for Human Direction of Data Mining, Fallacies of Data Mining, Data Mining Tasks, Data Mining Process, Data Preprocessing, Data Cleaning, Handling Missing Data, Identifying Misclassifications, Graphical Methods for Identifying Outliers, Data Transformation, and Numerical Methods for Identifying Outliers.

Exploratory Data Analysis (EDA): Introduction, Hypothesis Testing versus Exploratory Data Analysis, Getting to Know the Data Set, Dealing with Correlated Variables, Exploring Categorical Variables, Using EDA to Uncover Anomalous Fields, Exploring Numerical Variables, Exploring Multivariate Relationships, Selecting Interesting Subsets of the Data, Binning.

Unsupervised Learning: Association Rules: Affinity Analysis and Market Basket Analysis, Data Representation for Market Basket Analysis, Support, Confidence, Frequent Itemsets, and the A Priori Property, Genetic Algorithm.

Supervised Learning: Introduction, Decision Tree, Random Forest, Ensemble Learning, Neural Network, k-Nearest-Neighbor Methods, Support Vector Machines, Deep Learning.

Semi-Supervised Learning: Overview of Semi-Supervised Learning, Learning from Both Labeled and Unlabeled Data, Inductive and Transductive Semi-Supervised Learning, Mixture Models for Semi-Supervised Learning.

****

Main Books:
	1)	Hastie, T., Tibshirani, R. and Friedman, J. (2008). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics.

	2)	Daniel T. Larose (2005). Discovering Knowledge in Data: An Introduction to Data Mining, Wiley-Interscience, N.J., USA. [Kantardzic]


References:
	3)	Patricia B. Cerrito (2006). Introduction to Data Mining Using SAS® Enterprise Miner, SAS Institute Inc., Cary, NC, USA.

	4)	Bertrand Clarke, Ernest Fokoué and Hao Helen Zhang (2009). Principles and Theory for Data Mining and Machine Learning, Springer Science+Business Media, LLC, Dordrecht Heidelberg, Germany.


	5)	B. D. Ripley (2002). Statistical Data Mining, Springer-Verlag, New York.
	6)	S. Sumathi and S.N. Sivanandam (2006). Introduction to Data Mining and its Applications, Springer-Verlag Berlin Heidelberg.

	7)	Zhu, X. J. and Goldberg, A.B. (2009). Introduction to Semi-Supervised Learning, Morgan & Claypool Publishers.

	8)	John Shawe-Taylor and Nello Cristianini (2004). Kernel Methods for Pattern Analysis, Cambridge University Press, New York, USA.

	9)	Daniel T. Larose (2006). Data Mining Methods and Models, John Wiley & Sons, Inc.
