## COMM511 - Statistical Data Modelling (2023)

MODULE TITLE | Statistical Data Modelling | CREDIT VALUE | 15 |
---|---|---|---|
MODULE CODE | COMM511 | MODULE CONVENER | Dr Matthew Thomas (Coordinator) |

DURATION: TERM | 1 | 2 | 3 |
---|---|---|---|
DURATION: WEEKS | 11 | | |

Number of Students Taking Module (anticipated) | 35 |
---|---|

Statistical modelling lies at the heart of modern data analysis and is a vital part of data science, particularly when decision making is involved. Simple statistical models include linear regression, familiar from most foundation courses in statistics. This module places linear regression into the very broad framework of Bayesian statistical data modelling, which has become one of the most popular approaches to data analysis. Bayesian inference will be introduced as a unifying modelling framework, and the module will cover modelling concepts such as Generalized Linear Models, Generalized Additive Models, Hierarchical Models, Multi-Level Models, Discrete Mixture Models, Models for Flawed Data and predictive model validation. Together these provide a toolbox for analysing a wide range of real-world data sets, including binary data, count data, contingency tables, data with temporal and spatial structure, and data that are missing or partially missing. We will use the statistical software R as the main platform for fitting these models, including in practical sessions, so that alongside a sound theoretical basis you will learn how to apply the techniques discussed in the module to practical data analysis.

Pre-requisite Modules: MTH2006 or equivalent (knowledge of linear regression) and MTH3041 or equivalent (e.g. self-study of bite-sized pre-recorded essential material from MTH3041)

Statistical data modelling offers a systematic and rigorous way of describing data and thus the mechanisms and processes that generated them. Uncertainty is formally quantified in terms of probability. This module will formally define statistical data modelling as a process by which we use the data and subjective judgement to construct a mathematical description of the data. It will then argue that Bayesian inference is a unifying framework with which we can build statistical data models and check their validity, while fully quantifying the different sources of uncertainty that give real data sets their apparently haphazard nature. The module will introduce well-established but fairly restrictive models such as GLMs, then move on to more state-of-the-art approaches such as GAMs and Bayesian Hierarchical Models, as well as a conceptual framework for correcting flaws in observational data sets (such as censoring). The module will use many real data sets spanning a wide range of applications, such as public health, weather, climate, ecology, biology, epidemiology and natural hazards.

On successful completion of this module **you should be able to**:

**Module Specific Skills and Knowledge**

2. Demonstrate awareness of, and ability to apply, the unifying power of Bayesian inference for data analysis and its use in inference (e.g. quantifying relationships) and prediction;

**Discipline Specific Skills and Knowledge**

6. Apply simulation-based numerical integration methods in the context of Bayesian statistical modelling

7. Appreciate and apply the concept of piecewise processes and their use in semi-parametric statistical models

8. Demonstrate an understanding of the multivariate Normal distribution and its use in Bayesian statistical modelling
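As an illustration of ILO 6, the sketch below uses a random-walk Metropolis sampler (one simulation-based integration method; the module's own choice of methods may differ) to approximate a posterior mean. It assumes binomial data with a Beta prior, chosen only because the exact posterior, Beta(y + a, n - y + b), is known and can serve as a check. The module itself uses R; Python is used here purely for illustration, and the function names `log_posterior` and `metropolis` are hypothetical.

```python
import math
import random


def log_posterior(theta, y, n, a=1.0, b=1.0):
    """Unnormalised log posterior of a binomial success probability theta,
    given y successes in n trials and an assumed Beta(a, b) prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # the posterior density is zero outside (0, 1)
    return (y + a - 1) * math.log(theta) + (n - y + b - 1) * math.log(1 - theta)


def metropolis(y, n, n_iter=20000, step=0.2, seed=1):
    """Random-walk Metropolis sampler; returns the chain of draws of theta."""
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting value inside (0, 1)
    current = log_posterior(theta, y, n)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        candidate = log_posterior(proposal, y, n)
        # Accept with probability min(1, p(proposal | y) / p(theta | y));
        # 1 - rng.random() lies in (0, 1], so its log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        draws.append(theta)  # on rejection the current value is repeated
    return draws


# Example: 7 successes in 20 trials with a uniform Beta(1, 1) prior,
# so the exact posterior is Beta(8, 14) with mean 8 / 22.
chain = metropolis(y=7, n=20)
kept = chain[1000:]  # discard an initial burn-in period
posterior_mean = sum(kept) / len(kept)
print(f"MCMC estimate: {posterior_mean:.3f}, exact: {8 / 22:.3f}")
```

Comparing the Monte Carlo average with the exact value 8/22 (about 0.364) is a quick sanity check on the sampler; in practice one would also inspect trace plots and effective sample sizes.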

**Personal and Key Transferable / Employment Skills and Knowledge**

10. Apply relevant computer software competently;

11. Use learning resources appropriately;

12. Exemplify self-management and time-management skills;

- Introduction of linear regression as a special case of a statistical model and of statistical modelling as a method;

- Value of Bayesian inference as a unifying modelling framework;

- Posterior predictive model checking;

- Generalised linear models (GLMs): definition and historical use;

- Generalised Additive Models (GAMs): definition and a method to capture space-time structures;

- Normal approximation to the posterior and connection to maximum likelihood;

- Hierarchical Models: definition and links to random effects and multi-level models;

- Discrete mixture models and zero-inflation;

- Models for flawed data.
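To make the idea of zero-inflation concrete, the sketch below evaluates the probability mass function of a zero-inflated Poisson model, a two-component discrete mixture: with probability `pi` a count is a structural zero, and otherwise it follows a Poisson distribution with mean `lam`, so that P(Y = 0) = pi + (1 - pi)e^(-lam) and P(Y = y) = (1 - pi)e^(-lam) lam^y / y! for y ≥ 1. The function and parameter names are illustrative assumptions, not taken from the module materials, and Python stands in for R.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) for mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson probability P(Y = y).

    Mixture of a point mass at zero (weight pi) and a Poisson(lam)
    distribution (weight 1 - pi)."""
    if y < 0:
        return 0.0
    if y == 0:
        return pi + (1 - pi) * poisson_pmf(0, lam)
    return (1 - pi) * poisson_pmf(y, lam)


# With lam = 2 and pi = 0.3, zeros are much more likely than under a plain Poisson.
print(f"P(Y=0), Poisson:       {poisson_pmf(0, 2.0):.4f}")
print(f"P(Y=0), zero-inflated: {zip_pmf(0, 2.0, 0.3):.4f}")
print(f"Total probability (y = 0..50): {sum(zip_pmf(y, 2.0, 0.3) for y in range(51)):.6f}")
```

Setting `pi = 0` recovers the ordinary Poisson model, which is why zero-inflation is naturally treated as an extension of a Poisson GLM rather than a separate family.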

Scheduled Learning & Teaching Activities | 33.00 | Guided Independent Study | 117.00 | Placement / Study Abroad | |
---|---|---|---|---|---|

Category | Hours of study time | Description |
---|---|---|
Scheduled learning and teaching activities | 33 | Lectures/practical classes |
Guided Independent Study | 33 | Post-lecture study and reading |
Guided Independent Study | 40 | Formative and summative coursework preparation |
Guided Independent Study | 44 | Exam revision/preparation |

Form of Assessment | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
---|---|---|---|
Unassessed practical modelling exercises 1 | 10 hours | 1-13 | Verbal, in class |
Unassessed practical modelling exercises 2 | 10 hours | 1-13 | Verbal, in class |

Coursework | 50 | Written Exams | 50 | Practical Exams | |
---|---|---|---|---|---|

Form of Assessment | % of Credit | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
---|---|---|---|---|
Coursework – practical modelling exercises and theoretical problems 1 | 25 | 10 hours | 1-13 | Written and oral |
Coursework – practical modelling exercises and theoretical problems 2 | 25 | 10 hours | 1-13 | Written and verbal |
Coursework – project on data analysis | 50 | 20 hours | 1-13 | Written and verbal |

Original Form of Assessment | Form of Re-assessment | ILOs Re-assessed | Time Scale for Re-assessment |
---|---|---|---|
Coursework – practical modelling exercises and theoretical problems 1 | Coursework – practical modelling exercises and theoretical problems 1 | All | August referral/deferral period |
Coursework – practical modelling exercises and theoretical problems 2 | Coursework – practical modelling exercises and theoretical problems 2 | All | August referral/deferral period |
Coursework – project on data analysis | Coursework – project on data analysis | All | August referral/deferral period |

Reassessment will be by coursework in the failed or deferred element only. For referred candidates, the module mark will be capped at 40%. For deferred candidates, the module mark will be uncapped.

The following list indicates the type of information that you are expected to consult. Further guidance will be provided by the Module Convener.

Reading list for this module:

Type | Author | Title | Edition | Publisher | Year | ISBN |
---|---|---|---|---|---|---|
Set | Gelman, A. | Bayesian Data Analysis | 3rd | CRC Press | 2013 | 9781439840955 |
Set | Faraway, J.J. | Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models | | Chapman & Hall | 2006 | 158488424X |
Set | Wood, S.N. | Generalized Additive Models: An Introduction with R | | Chapman & Hall/CRC | 2006 | 978-1584884743 |

CREDIT VALUE | 15 | ECTS VALUE | 7.5 |
---|---|---|---|

PRE-REQUISITE MODULES | None |
---|---|
CO-REQUISITE MODULES | None |

NQF LEVEL (FHEQ) | 7 | AVAILABLE AS DISTANCE LEARNING | No |
---|---|---|---|
ORIGIN DATE | Tuesday 16 February 2021 | LAST REVISION DATE | Tuesday 24 January 2023 |

KEY WORDS SEARCH | Generalised Linear Models; Additive Models; Bayesian data analysis; Hierarchical Models; censoring; MCMC. |
---|---|