# Computer Science

## COMM511 - Statistical Data Modelling (2023)

MODULE TITLE: Statistical Data Modelling
CREDIT VALUE: 15
MODULE CODE: COMM511
MODULE CONVENER: Dr Matthew Thomas (Coordinator)
DURATION: TERM 1 2 3
DURATION: WEEKS 11
Number of Students Taking Module (anticipated): 35
DESCRIPTION - summary of the module content

Statistical modelling lies at the heart of modern data analysis and is a vital part of data science, particularly when decision making is involved. Simple statistical models include linear regression, familiar from most foundation courses in statistics. This module places linear regression into the very broad framework of Bayesian statistical data modelling, which has become one of the most popular approaches to data analysis. Bayesian inference will be introduced as a unifying modelling framework, and the module will introduce modelling concepts such as Generalized Linear Models, Generalized Additive Models, Hierarchical Models, Multi-Level Models, Discrete Mixture Models, Models for Flawed Data and predictive model validation. These will provide you with a toolbox and the ability to analyse any real-world data set, including binary data, count data, contingency tables, data with temporal and spatial structure, as well as data that are missing or partially missing. We will use the statistical software R as the main platform to fit this wide range of models, and will use it in practical sessions so that, as well as gaining a sound theoretical basis, you will develop an understanding of how to apply the techniques discussed in the module in practical data analysis.

Pre-requisite Modules: MTH2006 or equivalent (knowledge of linear regression) and MTH3041 or equivalent (e.g. self-learning of bitesize pre-recorded essential material from MTH3041)

AIMS - intentions of the module

Statistical data modelling offers a systematic and rigorous way of describing data and thus the mechanisms and processes that generated them. Uncertainty is formally quantified in terms of probability. This module will formally define statistical data modelling as a process by which we can use the data and subjective judgement to construct a mathematical description of the data. It will then argue that Bayesian inference is truly a unifying framework with which we can build and check the validity of statistical data models, while fully quantifying the different sources of uncertainty that result in the apparently haphazard nature of real data sets. The module will introduce well-established but fairly restrictive models such as GLMs, then move on to present more state-of-the-art approaches such as GAMs and Bayesian Hierarchical Models, as well as a conceptual framework for correcting flaws in observational data sets (such as censoring). The module will introduce a plethora of real data sets spanning a wide range of applications such as public health, weather, climate, ecology, biology, epidemiology, natural hazards and many others.

INTENDED LEARNING OUTCOMES (ILOs) (see assessment section below for how ILOs will be assessed)

On successful completion of this module you should be able to:

Module Specific Skills and Knowledge

1. Show understanding of the many different types of data structures that can commonly occur and the need to respect the nature of the data in building statistical models;

2. Demonstrate awareness of, and ability to apply, the unifying power of Bayesian inference for data analysis and its use in inference (e.g. quantifying relationships) and prediction;

3. Reveal awareness of, and ability to apply, related modern developments in statistical modelling techniques, including nonparametric and semi-parametric formulations (GAMs), Bayesian hierarchical modelling and models for flawed data;

4. Utilise appropriate software and a suitable computer language for advanced modelling of data;

Discipline Specific Skills and Knowledge

5. Demonstrate understanding and appreciation of, and aptitude in, the mathematical definition of stochastic models for data perceived to arise at random;

6. Apply simulation-based numerical integration methods in the context of Bayesian statistical modelling;

7. Appreciate and apply the concept of piecewise processes and their use in semi-parametric statistical models;

8. Understand the multivariate Normal distribution and its use in Bayesian statistical modelling;

Personal and Key Transferable / Employment Skills and Knowledge

9. Show advanced data analysis skills and be able to communicate associated reasoning and interpretations effectively in writing;

10. Apply relevant computer software competently;

11. Use learning resources appropriately;

12. Exemplify self-management and time-management skills;

13. Gain experience in problem solving using data analysis.

SYLLABUS PLAN - summary of the structure and academic content of the module

- Introduction of linear regression as a special case of a statistical model and of statistical modelling as a method;

- Value of Bayesian inference as a unifying modelling framework;

- Posterior predictive model checking;

- Generalised linear models (GLMs): definition and historical use;

- Generalised Additive Models (GAMs): definition and a method to capture space-time structures;

- Normal approximation to the posterior and connection to maximum likelihood;

- Hierarchical Models: definition and links to random effects and multi-level models;

- Discrete mixture models and zero-inflation;

- Models for flawed data.
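
As a rough illustration of three of the syllabus items above (Bayesian inference, the Normal approximation to the posterior with its link to maximum likelihood, and simulation-based integration), the following minimal sketch fits a Poisson rate to count data under a conjugate Gamma prior. The module itself works in R; Python's standard library is used here only to keep the example self-contained. The data, the prior values `a0` and `b0`, and all variable names are hypothetical and are not taken from the module materials.

```python
import math
import random

# Hypothetical count data, invented purely for illustration.
counts = [3, 5, 4, 6, 2, 4, 5, 3]
n, total = len(counts), sum(counts)

# Assumed weak Gamma(shape=a0, rate=b0) prior on the Poisson rate.
a0, b0 = 1.0, 0.1

# Conjugacy: the posterior is Gamma(a0 + sum(y), b0 + n).
post_shape, post_rate = a0 + total, b0 + n
post_mean = post_shape / post_rate

# The maximum likelihood estimate of a Poisson rate is the sample mean.
mle = total / n

# Normal approximation: centre at the posterior mode, with variance equal to
# the inverse of the negative second derivative of the log posterior there.
post_mode = (post_shape - 1) / post_rate
approx_sd = math.sqrt(post_shape - 1) / post_rate

# Simulation-based integration: average functions of posterior draws.
# random.gammavariate takes (shape, scale), so the rate is inverted.
rng = random.Random(0)
draws = [rng.gammavariate(post_shape, 1.0 / post_rate) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)

# Posterior predictive probability of a zero count, E[exp(-rate) | data],
# estimated by Monte Carlo and compared with the closed-form value.
mc_p_zero = sum(math.exp(-lam) for lam in draws) / len(draws)
exact_p_zero = (post_rate / (post_rate + 1.0)) ** post_shape

print(f"MLE: {mle:.3f}  mode: {post_mode:.3f}  mean: {post_mean:.3f}")
print(f"Normal approximation sd: {approx_sd:.3f}")
print(f"Monte Carlo mean: {mc_mean:.3f}")
print(f"P(new count = 0): MC {mc_p_zero:.4f}, exact {exact_p_zero:.4f}")
```

With the weak prior assumed here, the posterior mode sits close to the maximum likelihood estimate; a more informative prior would pull the two apart.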

LEARNING AND TEACHING
LEARNING ACTIVITIES AND TEACHING METHODS (given in hours of study time)
| Scheduled Learning & Teaching Activities | Guided Independent Study |
| --- | --- |
| 33 | 117 |
DETAILS OF LEARNING ACTIVITIES AND TEACHING METHODS
| Category | Hours of study time | Description |
| --- | --- | --- |
| Scheduled learning and teaching activities | 33 | Lectures/practical classes |
| Guided Independent Study | 33 | Post-lecture study and reading |
| Guided Independent Study | 40 | Formative and summative coursework preparation |
| Guided Independent Study | 44 | Exam revision/preparation |

ASSESSMENT
FORMATIVE ASSESSMENT - for feedback and development purposes; does not count towards module grade
| Form of Assessment | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
| --- | --- | --- | --- |
| Unassessed practical modelling exercises 1 | 10 hours | 1-13 | Verbal, in class |
| Unassessed practical modelling exercises 2 | 10 hours | 1-13 | Verbal, in class |

SUMMATIVE ASSESSMENT (% of credit)
| Coursework | Written Exams |
| --- | --- |
| 50 | 50 |
DETAILS OF SUMMATIVE ASSESSMENT
| Form of Assessment | % of Credit | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
| --- | --- | --- | --- | --- |
| Coursework – practical modelling exercises and theoretical problems 1 | 25 | 10 hours | 1-13 | Written and oral |
| Coursework – practical modelling exercises and theoretical problems 2 | 25 | 10 hours | 1-13 | Written and verbal |
| Coursework – project on data analysis | 50 | 20 hours | 1-13 | Written and verbal |

DETAILS OF RE-ASSESSMENT (where required by referral or deferral)
| Original Form of Assessment | Form of Re-assessment | ILOs Re-assessed | Time Scale for Re-assessment |
| --- | --- | --- | --- |
| Coursework – practical modelling exercises and theoretical problems 1 | Coursework – practical modelling exercises and theoretical problems 1 | All | August referral/deferral period |
| Coursework – practical modelling exercises and theoretical problems 2 | Coursework – practical modelling exercises and theoretical problems 2 | All | August referral/deferral period |
| Coursework – project on data analysis | Coursework – project on data analysis | All | August referral/deferral period |

RE-ASSESSMENT NOTES

Reassessment will be by coursework in the failed or deferred element only. For referred candidates, the module mark will be capped at 40%. For deferred candidates, the module mark will be uncapped.

RESOURCES
INDICATIVE LEARNING RESOURCES - The following list is offered as an indication of the type & level of information that you are expected to consult. Further guidance will be provided by the Module Convener.

Reading list for this module:

| Type | Author | Title | Edition | Publisher | Year | ISBN |
| --- | --- | --- | --- | --- | --- | --- |
| Set | Gelman, A. | Bayesian Data Analysis | 3rd | CRC Press | 2013 | 9781439840955 |
| Set | Faraway, J.J. | Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models | | Chapman & Hall | 2006 | 158488424X |
| Set | Wood, Simon N. | Generalized Additive Models: An Introduction with R | | Chapman & Hall/CRC | 2006 | 978-1584884743 |

CREDIT VALUE: 15
ECTS VALUE: 7.5
PRE-REQUISITE MODULES None None
NQF LEVEL (FHEQ): 7
AVAILABLE AS DISTANCE LEARNING: No
Tuesday 16 February 2021 Tuesday 24 January 2023
KEY WORDS SEARCH: Generalised Linear Models; Additive Models; Bayesian data analysis; Hierarchical Models; censoring; MCMC.