COMP09014 2019 Data Analytics and Visualisation

General Details

Full Title

Data Analytics and Visualisation

Transcript Title

Data Analytics and Visualisati

Code

COMP09014

Attendance

N/A %

Subject Area

COMP - 0613 Computer Science

Department

COEL - Computing & Electronic Eng

Level

09 - Level 9

Credit

05 - 05 Credits

Duration

Semester

Fee

€

Start Term

2019 - Full Academic Year 2019-20

End Term

9999 - The End of Time

Author(s)

Donny Hurley, Saritha Unnikrishnan

Programme Membership

SG_KDATA_M09 201900 Master of Science in Data Science
SG_KDATA_M09 202000 Master of Science in Data Science
SG_KCOMP_N09 202300 Postgraduate Certificate in Computing (Data Science)
SG_KDATA_M09 202100 Master of Science in Computing (Data Science)

Description

This module covers the data analysis and visualisation skills required for a Masters in Data Science. It introduces the learner to state-of-the-art data analysis tools and techniques that help to interpret and extract meaningful information from data, using public datasets. The learner will gain expertise in data preprocessing, exploratory data analysis and visualisation, pattern recognition and discriminative classification. The learner will also design and create interactive dashboards for data visualisation and solve real-world data analysis and classification problems.

Learning Outcomes

On completion of this module the learner will be able to:

Apply techniques such as feature scaling, standardisation, missing data handling and encoding to preprocess and clean data.

Visualise the processed data graphically, identify the correlation between the features, and interpret the linear/non-linear relationships in the data.

Make meaningful inferences from the data, and remove or retain features based on variable importance.

Create interactive dashboards for data visualisation.

Identify patterns in the data using exploratory data analysis and clustering techniques.

Discriminate patterns in new data using trained discriminative classification models.

Summarise, analyse, and relate research in the area of exploratory data analysis and pattern recognition in writing. Appreciate the data ethics and constraints that apply to the use of data in real-world scenarios.

Design, implement and test a solution to a real-world problem using the techniques learned above.

Teaching and Learning Strategies

The theory part of the key topics will be delivered through lectures.

Code will be demonstrated and executed in Jupyter notebooks.

Module Assessment Strategies

Three individual assignments and a final project are used to assess the learning outcomes.

20% data preprocessing assessment

25% data visualisation and analysis assessment

10% research assessment

45% project in which the learner proposes and solves a research problem or use case in the data visualisation area.

Repeat Assessments

Repeat the research assessment and project.

Indicative Syllabus

Data preprocessing:

Feature scaling and standardisation
Handling noise and missing data
Handling categorical data and label encoding
Splitting the dataset into training and testing sets
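
The preprocessing steps listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the module's prescribed method: the helper names (`impute_mean`, `min_max_scale`, `standardise`, `label_encode`, `train_test_split`) and the toy data are hypothetical, mean imputation is just one of several ways to handle missing values, and in practice a library would normally be used instead of hand-written helpers.

```python
import random
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Feature scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Standardisation: shift to zero mean and scale to unit variance."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def label_encode(categories):
    """Map each distinct category to an integer code (alphabetical order)."""
    codes = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [codes[c] for c in categories], codes


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly, then split into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: a numeric column with one missing value
# and a categorical column.
ages = [22.0, None, 35.0, 48.0]
colours = ["red", "green", "red", "blue"]

ages_filled = impute_mean(ages)
ages_scaled = min_max_scale(ages_filled)
ages_standard = standardise(ages_filled)
colour_codes, mapping = label_encode(colours)
train, test = train_test_split(list(zip(ages_standard, colour_codes)))
```

For brevity the sketch scales the whole column before splitting. In a real workflow the scaling statistics would usually be computed on the training set only and then applied to the test set, so that no information from the test data leaks into preprocessing.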

Data visualisation:

Graphics fundamentals
How data is visually encoded for human perception and understanding
Mapping visualisation techniques to specific datasets
2D and 3D visualisation
Python/R libraries will be used
Public data sets, for example from Kaggle, will be used as data sources
Data visualisation tools like Tableau and Power BI will be introduced

Pattern recognition:

How to perform exploratory data analysis
How to identify interesting patterns in the data
Unsupervised techniques such as K-means clustering and Principal Component Analysis (PCA) will be discussed
Programming languages including, but not restricted to, Python and R will be used
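
As an illustration of the clustering topic above, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, the toy points and the choice of initial centroids are assumptions made for the example. Real implementations typically use smarter initialisation and a numerical library, and PCA is not covered here.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2D data forming two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]

# Assumption: start from one point in each group to keep the run deterministic.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

Each point ends up labelled with the index of its nearest centroid, and the returned centroids are the means of the points assigned to them.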

Discriminative Classification:

How to classify new data into categories using supervised machine learning techniques
Logistic Regression, Linear Discriminant Analysis (LDA), Decision Trees and Random Forests will be discussed
Libraries including, but not restricted to, those of Python and R will be used
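
To make the idea of a trained discriminative classifier concrete, here is a minimal sketch of logistic regression on a single feature, fitted by gradient descent on the mean log-loss. The names `fit_logistic` and `predict`, the learning rate, the number of epochs and the toy data are all assumptions for illustration. The feature is assumed to be standardised already, and a library implementation would normally be preferred in practice.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean log-loss with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(x, w, b, threshold=0.5):
    """Classify a new observation as 1 if its probability reaches the threshold."""
    return int(sigmoid(w * x + b) >= threshold)


# Hypothetical training data: one standardised feature and a binary label.
scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(scores, labels)
new_predictions = [predict(x, w, b) for x in (-3.0, 3.0)]
```

The fitted model is then applied to observations it has not seen, which is the sense in which new data is classified into categories.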

Coursework & Assessment Breakdown

End of Semester / Year Formal Exam

100 %

Coursework Assessment

	Title	Type	Form	Percent	Week	Learning Outcomes Assessed
1	Moodle Quiz	Coursework Assessment	Multiple Choice/Short Answer Test	20 %	Week 6	1,2
2	Problem based assignment	Coursework Assessment	Assignment	25 %	Week 9	3,4,5
3	Written assignment on literature review	Coursework Assessment	Written Report/Essay	10 %	Week 11	7
4	Final project	Project	Individual Project	45 %	Week 13	1,2,3,4,5,6,8

Online Learning Mode Workload

Type	Location	Description	Hours	Frequency	Avg Workload
Lecture	Online	Lecture	1	Weekly	1.00
Practical / Laboratory	Online	Practical	2	Weekly	2.00
Independent Learning	Not Specified	Independent Learning	4	Weekly	4.00

Total Online Learning Average Weekly Learner Contact Time 3.00 Hours

Required & Recommended Book List

Required Reading
2007 Competing on Analytics Harvard Business Press
ISBN 1422103323 ISBN-13 9781422103326

You have more information at hand about your business environment than ever before. But are you using it to out-think your rivals? If not, you may be missing out on a potent competitive tool. In Competing on Analytics: The New Science of Winning, Thomas H. Davenport and Jeanne G. Harris argue that the frontier for using data to make decisions has shifted dramatically. Certain high-performing enterprises are now building their competitive strategies around data-driven insights that in turn generate impressive business results. Their secret weapon? Analytics: sophisticated quantitative and statistical analysis and predictive modeling. Exemplars of analytics are using new tools to identify their most profitable customers and offer them the right price, to accelerate product innovation, to optimize supply chains, and to identify the true drivers of financial performance. A wealth of examples, from organizations as diverse as Amazon, Barclays, Capital One, Harrah's, Procter & Gamble, Wachovia, and the Boston Red Sox, illuminate how to leverage the power of analytics.

Required Reading
2016-07-20 Computer Age Statistical Inference Cambridge University Press
ISBN 1107149894 ISBN-13 9781107149892

Take an exhilarating journey through the modern revolution in statistics with two of the ringleaders.

Required Reading
2011-04-18 Data Mining and Statistics for Decision Making Wiley
ISBN 0470688297 ISBN-13 9780470688298

Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized linear models, regularized regression, PLS regression, decision trees, neural networks, support vector machines, Vapnik theory, naive Bayesian classifier, ensemble learning and detection of association rules. They are discussed along with illustrative examples throughout the book to explain the theory of these methods, as well as their strengths and limitations. Key Features: Presents a comprehensive introduction to all techniques used in data mining and statistical learning, from classical to latest techniques. Starts from basic principles up to advanced concepts. Includes many step-by-step examples with the main software (R, SAS, IBM SPSS) as well as a thorough discussion and comparison of those software. Gives practical tips for data mining implementation to solve real world problems. Looks at a range of tools and applications, such as association rules, web mining and text mining, with a special focus on credit scoring. Supported by an accompanying website hosting datasets and user analysis. Statisticians and business intelligence analysts, students as well as computer science, biology, marketing and financial risk professionals in both commercial and government organizations across all business and industry sectors will benefit from this book.