COMP09014 2019 Data Analytics and Visualisation
This module covers the data analysis and visualisation skills required for a Masters in Data Science. It introduces the learner to state-of-the-art data analysis tools and techniques, which help to interpret and extract meaningful information from data (using public datasets). The learner will gain expertise in data preprocessing, exploratory data analysis and visualisation, pattern recognition and discriminative classification. The learner will work on designing and creating interactive dashboards for data visualisation and also on solving real-world data analysis and classification problems.
Learning Outcomes
On completion of this module the learner will/should be able to:
Apply techniques such as feature scaling, standardisation, missing data handling and encoding to preprocess and clean data.
Visualise the processed data graphically, identify the correlation between the features, and interpret the linear/non-linear relationships in the data.
Make meaningful inferences from the data and remove/retain features based on variable importance.
Create interactive dashboards for data visualisation.
Identify patterns in the data using exploratory data analysis and clustering techniques.
Discriminate patterns in new data using trained discriminative classification models.
Summarise, analyse, and relate research in the area of exploratory data analysis and pattern recognition in writing. Appreciate the data ethics and constraints that apply to the use of data in real-world scenarios.
Design, implement and test a solution to a real-world problem using the techniques learned above.
Teaching and Learning Strategies
The theory part of the key topics will be delivered through lectures.
The code demonstrations and execution will be performed using Jupyter notebooks.
Module Assessment Strategies
Three individual assignments and a final project are given to assess the Learning outcomes.
20% data preprocessing assessment
25% data visualisation and analysis assessment
10% research assessment
45% project to find and solve a research problem/use case provided by the learner in the data visualisation area.
Repeat Assessments
Repeat the research assessment and project.
Indicative Syllabus
Data preprocessing:
- Feature scaling and standardisation
- Handling Noise / Missing data
- Handling categorical data and label encoding
- Splitting the dataset into training and testing sets
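The preprocessing steps listed above can be sketched as follows. This is a minimal illustration, not the module's prescribed workflow: it assumes scikit-learn and NumPy are available, and the toy array (hypothetical age and income columns with a yes/no label) is made up for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data, invented for illustration: two numeric features (one value missing)
# and a categorical target stored as strings.
X = np.array([
    [25.0, 50000.0],
    [32.0, np.nan],
    [47.0, 82000.0],
    [51.0, 61000.0],
    [38.0, 73000.0],
    [29.0, 45000.0],
])
labels = np.array(["no", "yes", "yes", "no", "yes", "no"])

# Label encoding: map the string classes to integers (classes are sorted).
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Split before imputing or scaling, so that the statistics used for
# preprocessing are learned from the training set only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2, random_state=0, stratify=y
)

# Missing data handling: replace NaN with the training-set column mean.
imputer = SimpleImputer(strategy="mean")
# Standardisation: rescale each feature to zero mean and unit variance.
scaler = StandardScaler()

X_train_prepared = scaler.fit_transform(imputer.fit_transform(X_train))
X_test_prepared = scaler.transform(imputer.transform(X_test))
```

Fitting the imputer and scaler on the training split alone is a design choice that avoids leaking information from the test set into the preprocessing statistics.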
Data visualisation:
- Graphics fundamentals
- How data is visually encoded for human perception and understanding
- Mapping visualisation techniques to specific datasets
- 2D and 3D visualisation
- Python/R libraries will be used
- Public data sets, such as those on Kaggle, will be used as data sources
- Data visualisation tools like Tableau and Power BI will be introduced
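As one possible illustration of mapping visualisation techniques to a dataset, the sketch below draws a scatter plot and a correlation heatmap. It assumes Matplotlib, NumPy and scikit-learn are installed, and it uses the Iris dataset bundled with scikit-learn purely as a stand-in for a public dataset; the module itself does not name a specific dataset or plotting library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris

# Iris is used here only as a small, bundled example dataset.
iris = load_iris()
X, feature_names = iris.data, iris.feature_names

# Pearson correlation between features (columns are the variables).
corr = np.corrcoef(X, rowvar=False)

fig, (ax_scatter, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: position encodes two features, colour encodes the class.
ax_scatter.scatter(X[:, 2], X[:, 3], c=iris.target, cmap="viridis", s=15)
ax_scatter.set_xlabel(feature_names[2])
ax_scatter.set_ylabel(feature_names[3])
ax_scatter.set_title("Petal length vs petal width")

# Heatmap: colour encodes the strength and sign of each pairwise correlation.
image = ax_heat.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax_heat.set_xticks(range(len(feature_names)))
ax_heat.set_xticklabels(feature_names, rotation=45, ha="right")
ax_heat.set_yticks(range(len(feature_names)))
ax_heat.set_yticklabels(feature_names)
ax_heat.set_title("Feature correlation")
fig.colorbar(image, ax=ax_heat)
fig.tight_layout()
```

In a notebook the figure is displayed inline; in a script, `fig.savefig("some_name.png")` (file name hypothetical) would write it to disk.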
Pattern recognition:
- How to perform exploratory data analysis
- How to identify interesting patterns in the data
- Unsupervised techniques such as K-means clustering and Principal Component Analysis (PCA) for dimensionality reduction will be discussed
- Programming languages such as Python/R (but not restricted to these) will be used
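A minimal sketch of the two unsupervised techniques named above, assuming scikit-learn and NumPy. The data are synthetic (two Gaussian blobs generated for the example), so the clean separation shown here should not be expected on real datasets.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data, invented for illustration: two well-separated groups
# of 50 points each in 4 dimensions.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 4))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 4))
X = np.vstack([blob_a, blob_b])

# Standardise first: both PCA and K-means are sensitive to feature scale.
X_scaled = StandardScaler().fit_transform(X)

# PCA: project onto the two directions of greatest variance,
# e.g. to plot high-dimensional data in 2D.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# K-means: assign each point to one of two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_scaled)
```

`pca.explained_variance_ratio_` reports how much of the total variance each retained component captures, which is one way to judge how faithful the 2D projection is.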
Discriminative Classification:
- How to classify new data into categories using supervised machine learning techniques
- Logistic Regression, Linear Discriminant Analysis (LDA), Decision trees/Random Forest will be discussed
- Python/R libraries (but not restricted to these) will be used
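The classifiers listed above can be compared on held-out data along these lines. This is a sketch under assumptions: scikit-learn is used as the library, the bundled Iris dataset stands in for a public dataset, and the hyperparameters are illustrative choices rather than recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris is used here only as a small, bundled example dataset.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to estimate performance on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    # Scaling is bundled with logistic regression in a pipeline so that it
    # is fitted on the training data only.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "lda": LinearDiscriminantAnalysis(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Train each model, then score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
```

Accuracy is used for brevity; on imbalanced data, metrics such as precision, recall or a confusion matrix are usually more informative.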
Coursework & Assessment Breakdown
Coursework Assessment
No. | Title | Type | Form | Percent | Week | Learning Outcomes Assessed |
---|---|---|---|---|---|---|
1 | Moodle Quiz | Coursework Assessment | Multiple Choice/Short Answer Test | 20 % | Week 6 | 1,2 |
2 | Problem based assignment | Coursework Assessment | Assignment | 25 % | Week 9 | 3,4,5 |
3 | Written assignment on literature review | Coursework Assessment | Written Report/Essay | 10 % | Week 11 | 7 |
4 | Final project | Project | Individual Project | 45 % | Week 13 | 1,2,3,4,5,6,8 |
Online Learning Mode Workload
Type | Location | Description | Hours | Frequency | Avg Workload |
---|---|---|---|---|---|
Lecture | Online | Lecture | 1 | Weekly | 1.00 |
Practical / Laboratory | Online | Practical | 2 | Weekly | 2.00 |
Independent Learning | Not Specified | Independent Learning | 4 | Weekly | 4.00 |
Required & Recommended Book List
2007 Competing on Analytics Harvard Business Press
ISBN 1422103323 ISBN-13 9781422103326
You have more information at hand about your business environment than ever before. But are you using it to out-think your rivals? If not, you may be missing out on a potent competitive tool. In Competing on Analytics: The New Science of Winning, Thomas H. Davenport and Jeanne G. Harris argue that the frontier for using data to make decisions has shifted dramatically. Certain high-performing enterprises are now building their competitive strategies around data-driven insights that in turn generate impressive business results. Their secret weapon? Analytics: sophisticated quantitative and statistical analysis and predictive modeling. Exemplars of analytics are using new tools to identify their most profitable customers and offer them the right price, to accelerate product innovation, to optimize supply chains, and to identify the true drivers of financial performance. A wealth of examples, from organizations as diverse as Amazon, Barclays, Capital One, Harrah's, Procter & Gamble, Wachovia, and the Boston Red Sox, illuminate how to leverage the power of analytics.
2016-07-20 Computer Age Statistical Inference Cambridge University Press
ISBN 1107149894 ISBN-13 9781107149892
Take an exhilarating journey through the modern revolution in statistics with two of the ringleaders.
2011-04-18 Data Mining and Statistics for Decision Making Wiley
ISBN 0470688297 ISBN-13 9780470688298
Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized linear models, regularized regression, PLS regression, decision trees, neural networks, support vector machines, Vapnik theory, naive Bayesian classifier, ensemble learning and detection of association rules. They are discussed along with illustrative examples throughout the book to explain the theory of these methods, as well as their strengths and limitations. Key Features: Presents a comprehensive introduction to all techniques used in data mining and statistical learning, from classical to latest techniques. Starts from basic principles up to advanced concepts. Includes many step-by-step examples with the main software (R, SAS, IBM SPSS) as well as a thorough discussion and comparison of those software. Gives practical tips for data mining implementation to solve real world problems. Looks at a range of tools and applications, such as association rules, web mining and text mining, with a special focus on credit scoring. Supported by an accompanying website hosting datasets and user analysis. Statisticians and business intelligence analysts, students as well as computer science, biology, marketing and financial risk professionals in both commercial and government organizations across all business and industry sectors will benefit from this book.