COMP09014 2019 Data Analytics and Visualisation

General Details

Full Title
Data Analytics and Visualisation
Transcript Title
Data Analytics and Visualisati
Code
COMP09014
Attendance
N/A %
Subject Area
COMP - 0613 Computer Science
Department
COEL - Computing & Electronic Eng
Level
09 - Level 9
Credit
05 - 05 Credits
Duration
Semester
Fee
Start Term
2019 - Full Academic Year 2019-20
End Term
9999 - The End of Time
Author(s)
Donny Hurley, Saritha Unnikrishnan
Programme Membership
SG_KDATA_M09 201900 Master of Science in Data Science
SG_KDATA_M09 202000 Master of Science in Data Science
SG_KDATA_M09 202100 Master of Science in Computing (Data Science)
SG_KCOMP_N09 202300 Postgraduate Certificate in Computing (Data Science)
Description

This module covers the data analysis and visualisation skills required for a Masters in Data Science. It introduces the learner to state-of-the-art data analysis tools and techniques that help to interpret and extract meaningful information from data, using public datasets. The learner will gain expertise in data preprocessing, exploratory data analysis and visualisation, pattern recognition and discriminative classification. The learner will also design and create interactive dashboards for data visualisation and solve real-world data analysis and classification problems.

Learning Outcomes

On completion of this module the learner will/should be able to:

1.

Apply techniques such as feature scaling, standardisation, missing data handling and encoding to preprocess and clean data.  

2.

Visualise the processed data graphically, identify the correlation between the features, and interpret the linear/non-linear relationships in the data.

3.

Make meaningful inferences from the data, and remove or retain features based on variable importance.

4.

Create interactive dashboards for data visualisation.

5.

Identify patterns in the data using exploratory data analysis and clustering techniques.

6.

Discriminate patterns in new data using trained discriminative classification models.

7.

Summarise, analyse, and relate research in the area of exploratory data analysis and pattern recognition in writing. Appreciate the data ethics and constraints that apply to the use of data in real-world scenarios.

8.

Design, implement and test a solution to a real-world problem using the techniques learned above.

Teaching and Learning Strategies

The theory part of the key topics will be delivered through lectures.

Code demonstrations and execution will be performed using Jupyter notebooks.

Module Assessment Strategies

Three individual assignments and a final project are used to assess the learning outcomes.

20% data preprocessing assessment

25% data visualisation and analysis assessment

10% research assessment

45% project to find and solve a research problem/use case provided by the learner in the data visualisation area. 

 

 

Repeat Assessments

Repeat the research assessment and project.

Indicative Syllabus

Data preprocessing:

  • Feature scaling and standardisation 
  • Handling Noise / Missing data
  • Handling categorical data and label encoding
  • Splitting the dataset into training and testing sets
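The preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal, library-free illustration using only the standard library, not the module's prescribed tooling; the function names and the toy columns are hypothetical, and mean imputation is just one of several ways to handle missing values.

```python
import random
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardise(column):
    """Z-score scaling: zero mean and unit (population) standard deviation.

    Assumes the column is not constant, otherwise the deviation is zero.
    """
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


def min_max_scale(column):
    """Feature scaling to the range [0, 1]. Assumes max > min."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def label_encode(column):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy columns, invented for illustration only.
ages = impute_mean([22.0, None, 35.0, 41.0])
ages_z = standardise(ages)
colours, colour_codes = label_encode(["red", "blue", "red", "green"])
train, test = train_test_split(list(zip(ages_z, colours)), test_fraction=0.25)
```

In practice a library such as scikit-learn or pandas would normally provide these operations; the hand-written versions are only meant to show what each step does to the data.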

Data visualisation:

  • Graphics fundamentals
  • How data is visually encoded for human perception and understanding
  • Mapping visualisation techniques to specific datasets
  • 2D and 3D visualisation
  • Python/R libraries will be used
  • Public datasets, for example from Kaggle, will be used as data sources
  • Data visualisation tools like Tableau and Power BI will be introduced
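Before choosing a chart, it often helps to compute the quantity that will be plotted. The sketch below computes a pairwise Pearson correlation matrix, the kind of table a correlation heatmap would display. It uses only the Python standard library; the function names, feature names and values are hypothetical, and a plotting library or a dashboard tool would be needed to render the result graphically.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(features):
    """Return a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(features)
    return {
        a: {b: pearson(features[a], features[b]) for b in names}
        for a in names
    }


# Hypothetical toy features, invented for illustration only.
features = {
    "height_cm": [150, 160, 170, 180],
    "weight_kg": [50, 58, 69, 80],
    "commute_min": [40, 15, 35, 20],
}
matrix = correlation_matrix(features)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Values close to 1 or -1 suggest a strong linear relationship; a value near 0 does not rule out a non-linear one, which is one reason to plot the data as well.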

Pattern recognition:

  • How to perform exploratory data analysis
  • How to identify interesting patterns in the data
  • Unsupervised techniques such as K-means clustering and Principal Component Analysis (PCA) for dimensionality reduction will be discussed
  • Programming languages such as Python and R will be used, although the module is not restricted to these
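As an illustration of the clustering topic above, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is an assumption-laden teaching example, not the module's reference implementation: the function name, the random initialisation from the data points, and the toy points are all hypothetical. PCA is not shown.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points

    def nearest(p):
        return min(range(k), key=lambda i: squared_distance(p, centroids[i]))

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments will not change
            break
        centroids = updated

    labels = [nearest(p) for p in points]
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

The cluster indices themselves are arbitrary; only which points share a label is meaningful. On less cleanly separated data the result can depend on the initialisation, so several restarts are commonly used.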

Discriminative Classification:

  • How to classify new data into categories using supervised machine learning techniques
  • Logistic Regression, Linear Discriminant Analysis (LDA), Decision trees/Random Forest will be discussed
  • Python/R libraries will be used, although the module is not restricted to these
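To make the supervised classification topic concrete, the following sketch trains a logistic regression classifier with batch gradient descent and then classifies new values. It uses plain Python only; the function names, learning rate, epoch count and the one-feature toy dataset are assumptions chosen for illustration, and a library implementation would normally be used instead.

```python
from math import exp


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(X, y, learning_rate=0.3, epochs=5000):
    """Fit weights and bias by gradient descent on the mean log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            score = sum(w * v for w, v in zip(weights, xi)) + bias
            error = sigmoid(score) - yi  # derivative of log-loss w.r.t. score
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical toy data: one feature, class 1 for the larger values.
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic(X, y)
print(predict(weights, bias, [0.0]), predict(weights, bias, [5.0]))
```

The learned decision boundary is the feature value where the score crosses zero; for this toy data it falls between 1.5 and 3.0, so unseen values on either side are assigned to the corresponding class.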

 

Coursework & Assessment Breakdown

End of Semester / Year Formal Exam
100 %

Coursework Assessment

No. | Title | Type | Form | Percent | Week | Learning Outcomes Assessed
1 | Moodle Quiz | Coursework Assessment | Multiple Choice/Short Answer Test | 20 % | Week 6 | 1,2
2 | Problem based assignment | Coursework Assessment | Assignment | 25 % | Week 9 | 3,4,5
3 | Written assignment on literature review | Coursework Assessment | Written Report/Essay | 10 % | Week 11 | 7
4 | Final project | Project | Individual Project | 45 % | Week 13 | 1,2,3,4,5,6,8

Online Learning Mode Workload


Type | Location | Description | Hours | Frequency | Avg Workload
Lecture | Online | Lecture | 1 | Weekly | 1.00
Practical / Laboratory | Online | Practical | 2 | Weekly | 2.00
Independent Learning | Not Specified | Independent Learning | 4 | Weekly | 4.00
Total Online Learning Average Weekly Learner Contact Time: 3.00 Hours

Required & Recommended Book List

Required Reading
2007 Competing on Analytics Harvard Business Press
ISBN 1422103323 ISBN-13 9781422103326

You have more information at hand about your business environment than ever before. But are you using it to out-think your rivals? If not, you may be missing out on a potent competitive tool. In Competing on Analytics: The New Science of Winning, Thomas H. Davenport and Jeanne G. Harris argue that the frontier for using data to make decisions has shifted dramatically. Certain high-performing enterprises are now building their competitive strategies around data-driven insights that in turn generate impressive business results. Their secret weapon? Analytics: sophisticated quantitative and statistical analysis and predictive modeling. Exemplars of analytics are using new tools to identify their most profitable customers and offer them the right price, to accelerate product innovation, to optimize supply chains, and to identify the true drivers of financial performance. A wealth of examples, from organizations as diverse as Amazon, Barclays, Capital One, Harrah's, Procter & Gamble, Wachovia, and the Boston Red Sox, illuminate how to leverage the power of analytics.

Required Reading
2016-07-20 Computer Age Statistical Inference Cambridge University Press
ISBN 1107149894 ISBN-13 9781107149892

Take an exhilarating journey through the modern revolution in statistics with two of the ringleaders.

Required Reading
2011-04-18 Data Mining and Statistics for Decision Making Wiley
ISBN 0470688297 ISBN-13 9780470688298

Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized linear models, regularized regression, PLS regression, decision trees, neural networks, support vector machines, Vapnik theory, naive Bayesian classifier, ensemble learning and detection of association rules. They are discussed along with illustrative examples throughout the book to explain the theory of these methods, as well as their strengths and limitations. Key Features: Presents a comprehensive introduction to all techniques used in data mining and statistical learning, from classical to latest techniques. Starts from basic principles up to advanced concepts. Includes many step-by-step examples with the main software (R, SAS, IBM SPSS) as well as a thorough discussion and comparison of those software. Gives practical tips for data mining implementation to solve real world problems. Looks at a range of tools and applications, such as association rules, web mining and text mining, with a special focus on credit scoring. Supported by an accompanying website hosting datasets and user analysis. Statisticians and business intelligence analysts, students as well as computer science, biology, marketing and financial risk professionals in both commercial and government organizations across all business and industry sectors will benefit from this book.

Module Resources