ECM3901 - Data Analytics and Machine Learning (2023)

MODULE TITLE: Data Analytics and Machine Learning | CREDIT VALUE: 15
MODULE CODE: ECM3901 | MODULE CONVENER: Dr Saptarshi Das (Coordinator)
DURATION: TERM 1 2 3
DURATION: WEEKS 5 6
Number of Students Taking Module (anticipated): 25
DESCRIPTION - summary of the module content

Classical statistical methods were developed at a time when data collection was expensive. Recent advances in science and computing technology have resulted in an explosion of available data in fields as diverse as medicine, finance, marketing and biology. This has led to the development of new statistical methodologies aimed at meeting the challenges associated with processing and understanding “big data”.

In this problem-solving oriented module, you will develop the hands-on skills and techniques needed to turn complex data sets into useful information, implementing techniques developed in data mining and machine learning and learning how to apply them in various data analytics packages and their open-source alternatives.

The module spans two terms. The first term develops a basic understanding of data analytics and encourages group work; the second term moves on to more involved big data problems.

Prerequisite modules: “Scientific Computing 1” (ECM1914) and “Statistical Modelling” (ECM2907), or equivalent.

AIMS - intentions of the module

This module aims to lay the foundations for an understanding of statistical learning approaches and multivariate statistics. It also aims to provide the practical skills needed to implement these techniques, and to analyse and present “big data” effectively.

INTENDED LEARNING OUTCOMES (ILOs) (see assessment section below for how ILOs will be assessed)

On successful completion of this module you should be able to:

Module Specific Skills and Knowledge:

1.  Understand the challenges associated with collecting, manipulating and interpreting “big data”;

2.  Learn the fundamental concepts of predictive modelling and pattern recognition;

3.  Gain knowledge and insight into the latest developments in these fields;

4.  Understand and apply statistical learning techniques in a variety of applications, using Matlab and open-source software Python/R;

Discipline Specific Skills and Knowledge:

5.  Learn and apply advanced statistical methods to process complex data sets;

6.  Improve computational skills, and gain a better understanding of the practical implementation of these approaches;

Personal and Key Transferable/Employment Skills and Knowledge:

7.   Demonstrate key skills in data analytics, including practical implementation;

8.   Understand the challenges of “big data” and communicate reasoning and solutions effectively in writing;

9.   Demonstrate appropriate use of learning resources;

10. Demonstrate self-management and time management skills.

SYLLABUS PLAN - summary of the structure and academic content of the module

- Heterogeneous, multimedia datasets such as images, audio, video, text, financial time series, bioinformatics, remote sensing and medical images; benchmark big datasets from engineering, physical/life sciences, business and social sciences; data cleaning and pre-processing, data visualization; descriptive statistics, feature extraction; introduction to statistical learning paradigms: supervised, unsupervised and semi-supervised learning; connections to statistical signal processing and information theory [3 hours];

- Big-data management and processing, parallel computing on CPUs and GPUs, algorithm scalability, signal/image filtering, wavelets, colour image processing, challenges in computer vision [3 hours];

- Multivariate analysis and dimensionality reduction: principal component analysis, independent component analysis [3 hours];

- Regression: linear and nonlinear, univariate, multiple and multivariate regression, least squares, regularization and shrinkage methods, model selection and resampling methods [3 hours];

- Introduction to Gaussian processes and kernel methods; spatial and temporal random processes [3 hours];

- Classification: probabilistic and non-probabilistic classifiers, feature selection, logistic regression, discriminant analysis, k-nearest neighbour, support vector machine, decision tree, ensemble learning, generalised linear models, multilayer perceptron [3 hours];

- Clustering: k-means, mixture models and expectation-maximisation, hierarchical and spectral clustering [3 hours];

- Recent advances in artificial intelligence and machine learning, in particular deep learning (convolutional neural networks, autoencoders, transfer learning, recurrent neural networks, deep generative models) [6 hours];

- Fuzzy inference, single and multi-objective swarm/evolutionary optimisation algorithms, reinforcement learning; Bayesian optimisation [3 hours];

- Sampling and inference, graphical models, Bayesian machine learning, combining models [3 hours].

LEARNING AND TEACHING
LEARNING ACTIVITIES AND TEACHING METHODS (given in hours of study time)
Scheduled Learning & Teaching Activities: 33.00 | Guided Independent Study: 117.00 | Placement / Study Abroad: 0.00
DETAILS OF LEARNING ACTIVITIES AND TEACHING METHODS
Category | Hours of study time | Description
Scheduled Learning & Teaching activities | 11 | Formal lectures of new material
Scheduled Learning & Teaching activities | 22 | Computer classes and tutorials
Guided Independent Study | 117 | Lecture & assessment preparation, wider reading

ASSESSMENT
FORMATIVE ASSESSMENT - for feedback and development purposes; does not count towards module grade
Form of Assessment | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method
Fortnightly exercise | 2 x 5 hours | 1-10 | Questions marked by tutors; feedback given on all questions during tutorials

SUMMATIVE ASSESSMENT (% of credit)
Coursework: 100 | Written Exams: 0 | Practical Exams: 0
DETAILS OF SUMMATIVE ASSESSMENT
Form of Assessment | % of Credit | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method
1 x Coursework (mixture of dimensionality reduction, classification, regression, clustering, AI), based on the skills learned in the formative assessment and the Matlab/Python/R practical classes: submission of code for an in-depth analysis of a chosen big dataset, together with a detailed individual report | 50 | Approx. 6-10 page essay (1 x 3 hours, individual report) (Term 2) | 1-10 | Written and oral
1 x Presentation on the group reports | 10 | 1 x 15 mins (Term 1) | 1-10 | Written and oral
1 x Group report/poster on an in-depth analysis of a medium-size dataset | 20 | Approx. 4-6 page essay (1 x 3 hours, group report) (Term 1) | 1-10 | Written and oral
1 x In-class open-book test on basic understanding of methods/programming (10 marks programming and 10 marks quiz) | 20 | 90 mins (Term 1) | 1-10 | Written and oral

DETAILS OF RE-ASSESSMENT (where required by referral or deferral)
Original Form of Assessment | Form of Re-assessment | ILOs Re-assessed | Time Scale for Re-assessment
All above | Coursework (100%) | All | August Ref/Def period

RE-ASSESSMENT NOTES

If a module is normally assessed entirely by coursework, all referred/deferred assessments will normally be by assignment.

If a module is normally assessed by examination or examination plus coursework, referred and deferred assessment will normally be by examination. For referrals, only the examination will count, a mark of 40% being awarded if the examination is passed. For deferrals, candidates will be awarded the higher of the deferred examination mark or the deferred examination mark combined with the original coursework mark.

RESOURCES
INDICATIVE LEARNING RESOURCES - The following list is offered as an indication of the type & level of information that you are expected to consult. Further guidance will be provided by the Module Convener.

Basic reading:

ELE: http://vle.exeter.ac.uk

Reading list for this module:

Type | Author | Title | Edition | Publisher | Year | ISBN
Set | Sergios Theodoridis, Aggelos Pikrakis, Konstantinos Koutroumbas & Dionisis Cavouras | Introduction to Pattern Recognition: A Matlab Approach | 1st | Academic Press | 2010 | B008KO4GQ2
Set | Simon Rogers & Mark Girolami | A First Course in Machine Learning | 2nd | CRC Press | 2016 | B01N7ZEBK8
Set | Kevin P. Murphy | Machine Learning: A Probabilistic Perspective | 1st | MIT Press | 2012 | 978-0262018029
Set | Wendy L. Martinez & Angel R. Martinez | Computational Statistics Handbook with MATLAB | 3rd | CRC Press | 2015 | 978-1466592735
Set | Wendy L. Martinez, Angel R. Martinez & Jeffrey Solka | Exploratory Data Analysis with MATLAB | 3rd | CRC Press | 2017 | 978-1498776066
Set | Trevor Hastie, Robert Tibshirani & Jerome Friedman | The Elements of Statistical Learning: Data Mining, Inference, and Prediction | 2nd | Springer | 2009 | 978-0387848587
Set | Christopher Bishop | Pattern Recognition and Machine Learning | | Springer | 2007 | 978-0387310732
Set | David Barber | Bayesian Reasoning and Machine Learning | | Cambridge University Press | 2012 | 978-0521518147
Set | Sebastian Raschka & Vahid Mirjalili | Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow | 2nd | Packt Publishing | 2017 | 978-1787125933
Set | Aurelien Geron | Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow | | O'Reilly | 2019 | 978-1492032649
Set | Francois Chollet | Deep Learning with Python | | Manning Publications | 2017 | 978-1617294433
Set | Ian Goodfellow, Yoshua Bengio, Aaron Courville & Francis Bach | Deep Learning | | MIT Press | 2017 | 978-0262035613
Set | Andreas C. Muller & Sarah Guido | Introduction to Machine Learning with Python | | O'Reilly Media | 2016 | B01M0LNE8C
Set | Carl Edward Rasmussen & Christopher K. I. Williams | Gaussian Processes for Machine Learning | | MIT Press | 2006 | 978-0262182539
Set | Bharath Ramsundar & Reza Bosagh Zadeh | TensorFlow for Deep Learning | | O'Reilly | 2018 | 978-1491980453
Set | Matthew Scarpino | TensorFlow for Dummies | | John Wiley & Sons | 2018 | 978-1119466215
CREDIT VALUE: 15 | ECTS VALUE: 7.5
PRE-REQUISITE MODULES: ECM2907, ECM1914
CO-REQUISITE MODULES:
NQF LEVEL (FHEQ): 6 | AVAILABLE AS DISTANCE LEARNING: No
ORIGIN DATE: Thursday 07 May 2015 | LAST REVISION DATE: Wednesday 18 January 2023
KEY WORDS SEARCH: Big data; machine learning; pattern recognition; multivariate analysis; classification; clustering; regression; deep learning; artificial intelligence.