## MTHM502 - Introduction to Data Science and Statistical Modelling (2019)

MODULE TITLE | Introduction to Data Science and Statistical Modelling | CREDIT VALUE | 15 |
---|---|---|---|
MODULE CODE | MTHM502 | MODULE CONVENER | Dorottya Fekete (Coordinator) |

DURATION: TERM | 1 | 2 | 3 |
---|---|---|---|
DURATION: WEEKS | 11 | 0 | 0 |

Number of Students Taking Module (anticipated) | 15 |
---|---|

In this module you will learn the basics of statistical inference, including probability, sampling variability and hypothesis testing, and how to identify patterns in data and represent them using statistical models. You will learn the essential mathematical techniques required for the implementation and interpretation of statistical and machine learning methods. You will learn how to fit statistical models to data, how to evaluate whether models are appropriate given the context of the data, and how models can be used to quantify relationships and make predictions.

Pre-requisites: None

The aim of this module is to equip students with the skills they will need to apply data science techniques, carry out statistical analysis, and understand and interpret the outputs. Initially the focus will be on understanding the essential concepts in probability and mathematics that underpin statistical analysis. Statistical distributions will be explored and used as the basis of hypothesis testing, with an emphasis on how data can inform decision making. Regression modelling will be introduced as a method for understanding relationships between variables and for prediction. Model diagnostics and methods for assessing model fit will be used to evaluate whether regression models are fit for purpose. An introduction to machine learning and clustering techniques will be given, together with examples using real-world datasets.

Activities will include data analysis, regression modelling, machine learning, report writing and presentation. Assessment will be based on an examination and on practical exercises using real-world data.

On successful completion of this module, **you should be able to:**

**Module Specific Skills and Knowledge:**

1 Understand principles of probability and sampling

2 Apply statistical regression models to data, choosing the appropriate form based on the form and origins of the data

3 Perform regression and machine learning in R/RStudio

**Discipline Specific Skills and Knowledge:**

4 Understand random sampling and statistical distributions

5 Understand the methodology, and practical use, of regression modelling

**Personal and Key Transferable/ Employment Skills and Knowledge:**

7 Apply statistical analysis skills

8 Use R/RStudio and other software to implement statistical and data science methods

9 Use learning resources effectively

10 Communicate the results of data analysis clearly and accurately, both in writing and verbally

Topics will include:

• Data and variables;

• Initial data analysis;

• Probability;

• Sampling;

• Statistical distributions;

• Hypothesis testing;

• Linear regression;

• Model selection;

• Non-parametric statistics;

• Machine learning;

• Clustering.

Scheduled Learning & Teaching Activities | 36.00 | Guided Independent Study | 114.00 | Placement / Study Abroad | 0.00 |
---|---|---|---|---|---|

Category | Hours of study time | Description |
---|---|---|
Scheduled Learning and Teaching Activities | 24 | Lectures |
Scheduled Learning and Teaching Activities | 12 | Hands-on practical sessions |
Guided Independent Study | 50 | Self-study & background reading |
Guided Independent Study | 64 | Assessed data analyses, report writing |

Form of Assessment | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
---|---|---|---|
Feedback on unassessed data analysis examples (which will include report writing) | 24 | All | Oral |

Coursework | 0 | Written Exams | 60 | Practical Exams | 40 |
---|---|---|---|---|---|

Form of Assessment | % of Credit | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method |
---|---|---|---|---|
Assessed data analyses and reports from practical sessions (selected ones from the weekly sessions) | 40 | 1.5 hours x 4 | All | Oral and Written |
Examination (Closed Book) | 60 | 2 hours | 1, 2, 4-7 | Oral (on request) |

Deferral – if you miss an assessment for certificated reasons judged acceptable by the Mitigation Committee, you will normally either be deferred in the assessment or be granted an extension. The mark given for a re-assessment taken as a result of deferral will not be capped and will be treated as if it were your first attempt at the assessment.

Referral – if you have failed the module overall (i.e. a final overall module mark of less than 50%) you will be required to re-take some or all parts of the assessment, as decided by the Module Convener. The final mark given for a module where re-assessment was taken as a result of referral will be capped at 50%.

The following list indicates the type and level of information that you are expected to consult. Further guidance will be provided by the Module Convener.

**Basic Reading:**

Faraway, J.J., Linear Models with R, (2nd edition), Chapman & Hall

Dobson, A.J., Introduction to Statistical Modelling, Springer

Heumann, C., Schomaker, M., Shalabh, Introduction to Statistics and Data Analysis: With Exercises, Solutions and Applications in R, Springer

Reading list for this module:

Type | Author | Title | Edition | Publisher | Year | ISBN | Search |
---|---|---|---|---|---|---|---|
Set | Faraway, J.J. | Linear Models with R | | Chapman and Hall/CRC (Texts in Statistical Science) | 2004 | 978-1584884255 | [Library] |
Set | Dobson, A.J. | Introduction to Statistical Modelling | 1st | Springer | 1983 | 978-0412248603 | [Library] |
Set | Heumann, C., Schomaker, M., Shalabh | Introduction to Statistics and Data Analysis: With Exercises, Solutions and Applications in R | 1st | Springer | 2016 | 978-3319834566 | [Library] |

CREDIT VALUE | 15 | ECTS VALUE | 7.5 |
---|---|---|---|

PRE-REQUISITE MODULES | None |
---|---|
CO-REQUISITE MODULES | None |

NQF LEVEL (FHEQ) | 7 | AVAILABLE AS DISTANCE LEARNING | No |
---|---|---|---|
ORIGIN DATE | Monday 17 June 2019 | LAST REVISION DATE | Friday 13 September 2019 |

KEY WORDS SEARCH | None Defined |
---|---|