COM3021 - Data Science at Scale (2023)

MODULE TITLE: Data Science at Scale | CREDIT VALUE: 15
MODULE CODE: COM3021 | MODULE CONVENER: Dr Hugo Barbosa (Coordinator)
DURATION (WEEKS): Term 1: 11 | Term 2: 0 | Term 3: 0
Number of students taking module (anticipated): 30
DESCRIPTION - summary of the module content

Data science relies on large amounts of data to be effective, and many commercial and scientific applications require the analysis of large quantities of heterogeneous, noisy data on distributed machines. This module will examine how algorithms for data science can be implemented for large datasets and will discuss new algorithms designed specifically for large-scale data. You will also work with large-scale distributed and cloud systems for storing and computing with big data.

AIMS - intentions of the module

Through theory and practice, this module aims to give you an understanding of: the principles of distributed computing, particularly on cloud-based systems; the ways in which data can be stored and accessed to allow efficient computation; and efficient algorithms for large-scale computation.

Studying distributed cloud computing will give you the underpinning knowledge required to develop and implement machine learning and artificial intelligence algorithms on distributed high-performance computing systems.

INTENDED LEARNING OUTCOMES (ILOs) (see assessment section below for how ILOs will be assessed)

On successful completion of this module, you should be able to:

Module Specific Skills and Knowledge:

1 Explain the common challenges encountered in large-scale data science projects;

2 Display competence in the use of a range of abstraction and programming models for large-scale data processing;

3 Analyse and use a range of data storage models for parallel query processing;

4 Understand the principles of, and use, cloud and distributed systems for data processing;

5 Understand and design algorithms for machine learning on large-scale distributed systems;

Discipline Specific Skills and Knowledge:

6 Describe a number of different programming paradigms and associated data structures;

7 Learn a variety of data science methods and apply them to real problems;

Personal and Key Transferable / Employment Skills and Knowledge:

8 Plan and write a technical report;

9 Adapt existing technical knowledge to learning new methods.

SYLLABUS PLAN - summary of the structure and academic content of the module

• Introduction: the size of data and impediments to efficient computation;

• Data storage and retrieval: relational databases and NoSQL systems;

• Distributed systems and data: cloud computing and supercomputing; data distribution and consistency;

• The MapReduce paradigm and implementations;

• Algorithms for large-scale learning: stochastic gradient descent, large-scale linear algebra;

• Stream processing;

• Future architectures; co-design of hardware and algorithms.

LEARNING AND TEACHING
LEARNING ACTIVITIES AND TEACHING METHODS (given in hours of study time)
Scheduled Learning & Teaching Activities: 35 hours | Guided Independent Study: 115 hours | Placement / Study Abroad: 0 hours
DETAILS OF LEARNING ACTIVITIES AND TEACHING METHODS
Category | Hours of study time | Description
Scheduled Learning and Teaching | 20 | Lectures
Scheduled Learning and Teaching | 15 | Workshops and tutorials
Guided Independent Study | 115 | Coursework; private study; reading

ASSESSMENT
FORMATIVE ASSESSMENT - for feedback and development purposes; does not count towards module grade
Form of Assessment | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method
Not applicable

SUMMATIVE ASSESSMENT (% of credit)
Coursework: 30 | Written exams: 70 | Practical exams: 0
DETAILS OF SUMMATIVE ASSESSMENT
Form of Assessment | % of Credit | Size of Assessment (e.g. duration/length) | ILOs Assessed | Feedback Method
Written Exam | 70 | 2 hours (Summer) | 1-6 | Orally, on request
Technical Exercise and Report 1 | 10 | 10 hours | 2-5, 7-9 | Written
Technical Exercise and Report 2 | 20 | 20 hours | 2-5, 7-9 | Written

DETAILS OF RE-ASSESSMENT (where required by referral or deferral)
Original Form of Assessment | Form of Re-assessment | ILOs Re-assessed | Time Scale for Re-assessment
Written Exam | Written Exam (2 hours) | All | August Ref/Def Period
Technical Exercise and Report 1 | Technical Exercise and Report 1 | 2-5, 7-9 | August Ref/Def Period
Technical Exercise and Report 2 | Technical Exercise and Report 2 | 2-5, 7-9 | August Ref/Def Period

RE-ASSESSMENT NOTES

Reassessment will be by coursework and/or written exam in the failed or deferred element only. For referred candidates, the module mark will be capped at 40%. For deferred candidates, the module mark will be uncapped.

RESOURCES
INDICATIVE LEARNING RESOURCES - The following list is offered as an indication of the type and level of information that you are expected to consult. Further guidance will be provided by the Module Convener.

Basic Reading:

ELE: http://vle.exeter.ac.uk/

Reading list for this module:

Type | Author | Title | Edition | Publisher | Year | ISBN
Set | Kleppmann, M. | Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems | 1st | O'Reilly | 2016 | 1449373321
Set | White, T. | Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale | 4th | O'Reilly | 2015 | 1491901632
Set | Narkhede, N., Shapira, G., Palino, T. | Kafka: The Definitive Guide | 1st | O'Reilly | 2016 | 978-1491936160
Set | Chambers, B., Zaharia, M. | Spark: The Definitive Guide | 1st | O'Reilly | 2018 | 1491912219
CREDIT VALUE: 15 | ECTS VALUE: 7.5
PRE-REQUISITE MODULES: ECM2419, COM2013
CO-REQUISITE MODULES:
NQF LEVEL (FHEQ): 6 | AVAILABLE AS DISTANCE LEARNING: No
ORIGIN DATE: Friday 12 April 2019 | LAST REVISION DATE: Tuesday 24 January 2023
KEY WORDS SEARCH: None defined