COMP07190 2020 Data Mining

General Details

Full Title

Data Mining

Transcript Title

Data Mining

Code

COMP07190

Attendance

N/A %

Subject Area

COMP - 0613 Computer Science

Department

HEAL - Health & Nutritional Sciences

Level

07 - Level 7

Credit

05 - 05 Credits

Duration

Semester

Fee

€

Start Term

2020 - Full Academic Year 2020-21

End Term

9999 - The End of Time

Author(s)

Padraig McGourty, Thomas Smyth, Richeal Burns, Dr. Sasirekha Palaniswamy Lecturer

Programme Membership

SG_SINFO_B07 202000 Bachelor of Science in Health and Medical Information Science
SG_SDATA_E07 202000 Certificate in Health Data Analytics

Description

This module aims to introduce the basic concepts, principles, methods and techniques of data mining and its applications. It will help learners develop skills and techniques for the practical application of data mining and engage in pattern discovery on big data. The importance of pattern discovery and interesting applications of data mining will be discussed. Data mining tasks such as clustering, classification and rule learning will be presented, together with data mining processes, namely data preparation, task identification and classification/prediction algorithms. Machine learning algorithms, neural networks, clustering approaches and text mining applications in big data will be introduced.

Learning Outcomes

On completion of this module the learner will/should be able to:

1. Understand the basic concepts, principles, methods and techniques of data mining and its applications.

2. Develop skills and techniques for the practical application of data mining and engage in pattern discovery on big data (Data Exploration).

3. Apply techniques of Knowledge Discovery in Databases (KDD), moving from data to knowledge using data mining tools and techniques (Data Mining).

4. Apply data visualization techniques to aid data mining and to display the results of data mining (Data Presentation).

Teaching and Learning Strategies

Teaching and learning for this module will be carried out through a combination of online lectures, computer-based critical appraisal and online practicals. Blended learning approaches will be adopted, consistent with digital learning paradigms.

One online lecture will be delivered per week, supported by self-directed learning. Guidance will be provided on relevant areas for self-directed study.

A 2-hour online workshop will be delivered weekly, in which students will complete interactive activities to enhance their study skills and knowledge.

Question-and-answer sessions will be provided in the live classroom.

A variety of instructional methods, such as discussion, group work, interactive exercises, online resources and audio/visual material, will be used. Core skills, which may include academic writing, oral presentations, reading techniques or research abilities, will be embedded into all modules to ensure that all students have an equal opportunity to succeed. Accessible materials, including slides, documents, audio/visual material and textbooks, will be provided, enabling students to slow down or speed up recordings, for example.

All module content will be based on the principles of Universal Design for Learning (UDL) to ensure equitable access to content and learning.

Module Assessment Strategies

This module will be assessed by a final project (50%) and continuous assessment (50%).

Repeat Assessments

Repeat assessments will follow a similar format, as applicable.

Indicative Syllabus

Introduction to basic concepts, principles, methods and techniques of data mining and its applications
Introduction to tools, methods and techniques for practical applications of data mining
Practical application of pattern/knowledge discovery on big data
Working with APIs
The importance of pattern discovery and interesting applications of data mining
Data mining tasks such as clustering, classification and rule learning
Data mining processes, namely data preparation, task identification and classification/prediction algorithms
Machine learning algorithms, neural networks, clustering approaches and text mining applications in big data
Knowledge discovery in databases
Privacy, security and legal aspects of data mining
Data mining applications, e.g. healthcare and retail
Pitfalls of data mining

Coursework & Assessment Breakdown

Coursework & Continuous Assessment

100 %

Coursework Assessment

No.	Title	Type	Form	Percent	Week	Learning Outcomes Assessed
1	Data Mining - Assessment	Coursework Assessment	Assessment	50 %	Ongoing	1, 2
2	Data Mining - Project	Project	Project	50 %	End of Semester	2, 3, 4

Online Learning Mode Workload

Type	Location	Description	Hours	Frequency	Avg Workload
Lecture	Online	Data Mining - Lecture	1	Weekly	1.00
Problem Based Learning	Online	Data Mining - PBL	2	Weekly	2.00
Independent Learning	Not Specified	Independent study	4	Weekly	4.00

Total Online Learning Average Weekly Learner Contact Time 1.00 Hours

Required & Recommended Book List

Required Reading
2011-08-08 Data Mining John Wiley & Sons
ISBN 1118029127 ISBN-13 9781118029121

This book reviews state-of-the-art methodologies and techniques for analyzing enormous quantities of raw data in high-dimensional data spaces, to extract new information for decision making. The goal of this book is to provide a single introductory source, organized in a systematic way, in which we could direct the readers in analysis of large data sets, through the explanation of basic concepts, models and methodologies developed in recent decades. If you are an instructor or professor and would like to obtain instructors materials, please visit http://booksupport.wiley.com. If you are an instructor or professor and would like to obtain a solutions manual, please send an email to: pressbooks@ieee.org

Required Reading
2011-04-18 Data Mining and Statistics for Decision Making Wiley
ISBN 0470688297 ISBN-13 9780470688298

Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized linear models, regularized regression, PLS regression, decision trees, neural networks, support vector machines, Vapnik theory, naive Bayesian classifier, ensemble learning and detection of association rules. They are discussed along with illustrative examples throughout the book to explain the theory of these methods, as well as their strengths and limitations.

Key Features:
Presents a comprehensive introduction to all techniques used in data mining and statistical learning, from classical to latest techniques.
Starts from basic principles up to advanced concepts.
Includes many step-by-step examples with the main software (R, SAS, IBM SPSS) as well as a thorough discussion and comparison of those software.
Gives practical tips for data mining implementation to solve real world problems.
Looks at a range of tools and applications, such as association rules, web mining and text mining, with a special focus on credit scoring.
Supported by an accompanying website hosting datasets and user analysis.

Statisticians and business intelligence analysts, students as well as computer science, biology, marketing and financial risk professionals in both commercial and government organizations across all business and industry sectors will benefit from this book.

Required Reading
2012-12-20 Data Mining - Concepts and Techniques Morgan Kaufmann Series in Data Management Systems

Required Reading
2020-06-20 Data Mining Methods and Models Wiley

Required Reading
2003-05-29 Exploratory Data Mining and Data Cleaning Wiley-Interscience
ISBN 0471268518 ISBN-13 9780471268512

Written for practitioners of data mining, data cleaning and database management. Presents a technical treatment of data quality including process, metrics, tools and algorithms. Focuses on developing an evolving modeling strategy through an iterative data exploration loop and incorporation of domain knowledge. Addresses methods of detecting, quantifying and correcting data quality issues that can have a significant impact on findings and decisions, using commercially available tools as well as new algorithmic approaches. Uses case studies to illustrate applications in real life scenarios. Highlights new approaches and methodologies, such as the DataSphere space partitioning and summary based analysis techniques. Exploratory Data Mining and Data Cleaning will serve as an important reference for serious data analysts who need to analyze large amounts of unfamiliar data, managers of operations databases, and students in undergraduate or graduate level courses dealing with large-scale data analysis and data mining.

Module Resources

Other Resources

Python

SQL