Online Master of Data Science: Course Structure

Curriculum Details

12–16 subjects required

You can complete your online Master of Data Science course in 2 years with 16 subjects if you choose to study full-time.

If you have an undergraduate degree in a related field, you may be eligible for credit or Advanced Standing for some of the IT fundamental subjects, which could reduce the course to 12 subjects.

If you have an undergraduate degree in an unrelated field, you’ll learn everything you need to know with four IT fundamentals subjects.

Electives (Select 30 credit points): Select a level four or level five STA, STM, MAT or CSE coded subject to the value of 15 credit points.

For more information about the duration of the program or the course structure, speak with an enrolment advisor on (+61 3) 9917 3009 or request more information now.

CORE (105 Credit Points)

Credits

The Academic Integrity Module will introduce you to academic integrity standards, so you’re informed about how to avoid plagiarism and academic misconduct. You’ll complete four parts that cover academic misconduct and academic integrity decisions, such as cheating, plagiarism and collusion. You’ll learn about the text-matching tool, Turnitin, that is used at La Trobe, how to get help and where to go to develop referencing skills.

This subject starts with an overview of the architecture and management of database systems, and a discussion of different existing database models. The main focus includes relational database analysis, design, and implementation. The students learn: relational algebra as the formal foundation of relational databases; relational conceptual design using an entity-relationship diagram; relational logical database design; security and integrity; and SQL implementation of relational database queries. Students will also learn advanced normalization theory and the techniques to remove data anomalies and redundancies. In this subject, students are required to design a database application that meets the needs of a system requirement specification, and to implement the system using a commercial standard database system such as ORACLE or POSTGRESQL. In addition, a selection of advanced topics in databases will be introduced and discussed.

In this subject, you will be introduced to the steps involved in designing and creating software solutions for a range of practical problems. To enable you to design and implement solutions, you will be introduced to methods for analysing requirements, developing the overall structure of a solution and identifying its key parts, and, on this basis, incrementally building and testing the solution. To develop your problem-solving skills, problems of increasing complexity, drawn from different domains, will be presented for your practice. You will be introduced to the concepts of class and object, which represent real-world objects in order to solve problems arising from an application domain. Python is used as the programming language in the subject. The strengths of Python, in particular its support for quick testing of ideas, are exploited to facilitate the development of your problem-solving skills and effective software development practice.

Important mathematical ideas which underpin the theory and techniques of data science are introduced and consolidated in this subject. Matrices are used to store and work with quantitative information, and the methods of calculus are used to find extreme values and accumulation. The Gamma and Beta functions are introduced, as are eigenvalues, eigenvectors and the rank of a matrix. Emphasis is placed on the relevance of the mathematics to data science applications (such as least squares estimators and calculation of variance in data), and on the development of clear communication in explaining technical ideas. This is a foundational subject for the Master of Data Science.
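As a small illustration of the two applications named above, the following sketch fits a least squares line and computes a variance using only the Python standard library. The function name `least_squares_line` and the sample data are hypothetical, chosen for illustration, and are not taken from the subject materials.

```python
from statistics import mean, pvariance


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimising the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sums of cross-deviations and squared deviations about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = least_squares_line(xs, ys)
y_variance = pvariance(ys)  # population variance of the responses

print(intercept, slope, y_variance)
```

Because the sample points lie exactly on a line, the fitted intercept and slope recover that line; with noisy data the same formulas give the best-fitting line in the least squares sense.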

This subject develops an understanding of probability and statistics applied to Data Science. Probability topics include joint and conditional probability, Bayes’ Theorem and distributions such as the uniform, binomial, Poisson and normal distributions, as well as properties of random variables and the Central Limit Theorem. Statistical inference and data analysis are also considered, covering, among other topics, significance testing and confidence intervals, with an introduction to methods such as ANOVA, linear and nonlinear regression and model verification. Applications to data science are considered and students will be exposed to the R statistical package as well as the mathematical typesetting package LaTeX.
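To give a flavour of the conditional probability topics listed above, here is a minimal sketch of Bayes’ Theorem applied to a diagnostic test. The function name `posterior_probability` and the numbers used are hypothetical, and the subject itself uses R rather than Python.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' Theorem."""
    # Total probability of a positive test, over both states of the condition.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


# Hypothetical values: 1% prevalence, 95% sensitivity, 5% false positive rate.
p = posterior_probability(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

Even with an accurate test, the posterior probability stays well below one half here because the condition is rare, which is the kind of result the theorem makes precise.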

In this subject you will be provided with specialist knowledge and tools required to formulate solutions to complex data problems encountered by data scientists. You will learn various data exploration techniques and analysis tools. Selected topics include data cleaning, data normalisation, data visualisation and data exploration. One or more applications associated with each problem will also be discussed. You will learn the fundamentals of exploratory data analysis techniques, statistical learning, and correlation analysis to solve these problems. You will also learn to implement data exploration methods and analysis tools using the R programming language.

The purpose of this subject is to outline the basic principles of Entrepreneurship. It will examine the steps required in developing an idea into a business and will explore the tools and necessary insights to make a successful venture. The subject will involve theory, case studies and guest speakers on start-up issues, pitfalls, and ingredients for success. Students will also develop professional skills related to ethical and moral decision making and evaluate the social implications of their work and the broader global context. The subject requires active participation in group discussions and activities.

Companies are acquiring massive amounts of data and also providing internet-based services to millions of people. This is extremely challenging due to the large scale of data involved and the huge number of concurrent requests by users. In this subject we will study the current state-of-the-art technologies for analysing huge amounts of data and responding to millions of user requests within one second. Currently the most cost-efficient way of achieving this aim is to use large-scale cloud-based services offered by vendors such as Amazon, Google, IBM and Microsoft. We will study how to use the cloud services provided by Amazon Web Services to meet the big data needs of businesses. We will also teach how to program Hadoop, the world’s most popular Big Data analytics framework. In particular, the Hadoop software systems that we will learn in detail include MapReduce, Hive and Apache Spark. This subject will also teach the following topics: cloud architectures, parallel database systems, key-value stores, transaction support in the cloud, virtualization, and multi-tenant database systems.
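The MapReduce programming model mentioned above can be sketched on a single machine in plain Python. This is only an illustration of the map, shuffle and reduce phases, not the Hadoop API; the function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input; a real framework would split this across many machines.
counts = word_count(["big data big cloud", "cloud data"])
print(counts)
```

In a real cluster, the map and reduce calls run in parallel on different nodes and the framework performs the shuffle over the network; the structure of the computation is the same.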

Core Choice – Statistics Thesis (60CP)

Credits

The Statistical Science thesis is a minor thesis written during the second master year. In this thesis, you will work on statistical research problems. You will carry out studies in statistics leading to new theoretical results or novel analysis of real-life data. The code for this thesis is split into two parts: STA5002 (which is taken in the first semester of this second year) and STA5THB (which is taken in the second semester of this second year). STA5002 and STA5THB are allocated 30 credit points each, resulting in a total of 60 credit points for the thesis. The same combined grade is given for STA5002 and STA5THB when STA5THB is completed.

The Master of Statistical Science thesis is a minor thesis written during the second year of the Master of Data Science. In this thesis, the student does at least one of the following: (a) demonstrates an understanding of a statistical research problem, (b) produces a numerical solution of a statistical research problem, (c) carries out research in statistics leading to new results, or (d) gains experience in analysing real-life data. The code for this thesis is split into two parts: STA5002 (which is taken in the first semester of this second year) and STA5THB (which is taken in the second semester of this second year). STA5002 is allocated 30 credit points and STA5THB is allocated 30 credit points, resulting in a total of 60 credit points for the thesis. The same combined grade is given for STA5002 and STA5THB when STA5THB is completed.

Core Choice – Computer Science Thesis (60CP)

Credits

Students undertake research, across both CSE5001 and CSE5TSB, that takes the equivalent of eight or nine months of continuous work under the supervision of a member of staff. In the first semester, a literature review is written up and submitted as a hurdle requirement for the subject. A list of prospective thesis topics is available from the Department of Computer Science and Information Technology.

Students undertake research, across both CSE5001 and CSE5TSB, that takes the equivalent of eight or nine months of continuous work under the supervision of a member of staff. In the second semester, a minor thesis is written up and submitted as a hurdle requirement for the subject. The student is also required to deliver a presentation based on the research at the end of the semester.

Core Choice – Industry Based Learning (60CP) & Choose 2 Electives (30CP)

Credits

This subject provides necessary skills and techniques to manage large-scale information technology projects, with strong focus on the analytical side of project management, referring to scheduling, cost, and resource management, as well as the ‘people’ and client management issues that must be dealt with in order to ensure successful projects. Students learn to design Information Technology projects covering network management or software development or data science for efficiency, portability and re-use, as well as to take advantage of different standards and system utilities, data and information management techniques.

The project focuses on developing students’ skills in teamwork, system design, implementation, testing and documentation. Students learn to design Information Technology projects covering network management or software development or data science for efficiency, portability and re-use, as well as to take advantage of different standards and system utilities, data and information management techniques. The projects require students to work in small development teams and result in the development of a small-scale industry-based system. The laboratory work is designed to bring students up to speed on relevant development skills and to provide them with a working knowledge sufficient for industrial-type network, data science and software development projects. The subject also integrates previously learned project management skills and knowledge relating to social and ethical issues.

The project focuses on developing students’ skills in system design, implementation, testing and documentation for solving research problems in Information Technology. Students learn to understand the underlying research question specific to the chosen project, and to design, develop, test and document a software (or simulation) system for the analysis of the research problem.

This subject has been developed to allow you to enhance your formal learning in a practical setting and develop your understanding of (Australian) workplace culture. You will undertake approximately 180 hours of placement (typically 2 days of work per week over 12 weeks) where you will work with an industry host in an appropriate workplace role. Approximately 96 additional hours will be needed for preparation and report writing.

Students undertake a 12-week, 20-hours-per-week (or equivalent) industry-based learning program. A member of the academic staff acting in the capacity of industry-based learning coordinator liaises with the placement provider to formulate and structure a program of learning for each student. This program is normally project based, with day-to-day supervision by the placement provider. Progress is monitored at regular intervals.

Specialisation: Big Data and Cloud Computing (60CP)

Credits

Quantitative analysis plays an important role in industrial data analytics and knowledge engineering, which makes it very useful to develop computing skills for data regression and classification. This subject covers fundamentals of machine learning techniques in theory and practice. The subject is designed to focus on solving industrial data modelling problems using neural networks. You will learn how to test various learning algorithms and compare performance evaluations. Some advanced machine learning techniques for data classification will also be addressed. You will work with industrial data modelling in labs and assignments to consolidate your knowledge and gain hands-on experience with machine learning applications.

Data Mining refers to various techniques which can be used to uncover hidden information from a database. The data to be mined may be complex data including big data, multimedia, spatial and temporal data, biological and health data. Data Mining has evolved from several areas including: databases, artificial intelligence, algorithms, information retrieval and statistics. This subject is designed to provide you with a solid understanding of data mining concepts and tools. The subject covers algorithms and techniques for data pre-processing, data classification, association rule mining, and data clustering. The subject also covers domain applications where data mining techniques are used.

Creating web sites that scale to serve hundreds of millions of users with acceptable response times is a very challenging task. The main focus of this subject is on cloud computing concepts and tools that are needed to make web sites scalable. This subject assumes that HTML, CSS and basic JavaScript have already been taught in CSE4IFU. The subject will cover topics such as frontend fundamentals (Git, responsive web design, popular frontend frameworks and the React framework), advanced frontend and backend development (Redux, Docker, REST APIs, stateless web servers and Node.js), and web server storage and deployment in Microsoft Azure (fundamental cloud computing concepts, continuous integration and delivery with Microsoft Azure, database and NoSQL storage with Microsoft Azure, authentication and authorization, and integration of third-party services such as Twitter, Google Maps and Weather).

The subject introduces you to spatial data analysis. It surveys the theory of spatial random processes, spatial statistics models, and their applications to a wide range of areas, including image analysis and GIS (geographic information system). The subject will cover the methodology and modern developments for spatial-temporal modelling, estimation and prediction, spectral analysis of spatial processes and working with big spatial data. All the methods presented will be introduced and illustrated in the context of specific datasets with GRASS and R software. You will get experience with analysis of real-world data.

The literature abounds with findings that collectively may offer important new insights for the betterment of the medical, psychological and life sciences, to name just a few. This subject is designed to provide students with the ability to combine estimated measures of evidence, known as effects, from comparable studies to increase power. Estimators are introduced which are commonly found in meta-analytic research and pitfalls are discussed. On completion of the subject, the student will have an understanding of the different effects that can be collected from the literature as well as an appreciation of how effect sizes arising from data measured on different scales can be combined. Importantly, this subject also shows students how meta-regression can be used to account for study-specific covariates that cannot be adequately accounted for using random-effects models. The freely available software packages R and RevMan are used throughout the subject.
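As a rough illustration of how combining effects from comparable studies increases precision, the sketch below pools two study estimates with inverse-variance weights under a fixed-effect assumption. This is one common estimator, assumed here for illustration; the function name `pooled_effect` and the numbers are hypothetical, and the subject itself uses R and RevMan.

```python
def pooled_effect(effects, variances):
    """Fixed-effect pooled estimate and its variance via inverse-variance weighting."""
    # More precise studies (smaller variance) receive larger weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return estimate, 1.0 / total_weight


# Hypothetical effect estimates and their variances from two studies.
effects = [0.2, 0.5]
variances = [0.04, 0.01]

estimate, variance = pooled_effect(effects, variances)
print(estimate, variance)
```

The pooled variance is smaller than the variance of either individual study, which is the sense in which combining studies increases power.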

Specialisation: Data Modelling and Analytics (60CP)

Credits

Repeated measures data is used commonly in many disciplines including health, psychology, economics and biology. This subject provides students with the knowledge of how to perform the appropriate statistical analysis in a repeated measures data environment by using models such as the linear mixed model, correlated random effects model and marginal model. Students will learn how to examine research questions by applying these models using the R statistical package.

The literature abounds with findings that collectively may offer important new insights for the betterment of the medical, psychological and life sciences, to name just a few. This subject is designed to provide students with the ability to combine estimated measures of evidence, known as effects, from comparable studies to increase power. Estimators are introduced which are commonly found in meta-analytic research and pitfalls are discussed. On completion of the subject, the student will have an understanding of the different effects that can be collected from the literature as well as an appreciation of how effect sizes arising from data measured on different scales can be combined. Importantly, this subject also shows students how meta-regression can be used to account for study-specific covariates that cannot be adequately accounted for using random-effects models. The freely available software packages R and RevMan are used throughout the subject.

Advances in omics technology have led to an exponential increase in the volume of biological data in the last ten years. Statistical models play important roles in drawing conclusions from and making sense of the complex and often noisy omics data. This subject will introduce students to statistical issues and potential solutions to problems commonly encountered at various stages of omics data analysis, including data acquisition, alignment, quality control, data analysis, visualization and interpretation. Topics covered will include an introduction to next-generation sequencing and microarray technologies, batch effects and other unwanted variation, multiple hypothesis testing problems, statistical tests and models for high-dimensional data, data visualization and the use of biological databases via pathway-based analysis. Students will also be introduced to intermediate-level R programming, including writing customized scripts and functions, developing R packages and working with the ‘pipe’ operator. Bioconductor packages ( www.bioconductor.org ) and other freely available bioinformatics software will be used for all lab sessions.

Quantitative analysis plays an important role in industrial data analytics and knowledge engineering, which makes it very useful to develop computing skills for data regression and classification. This subject covers fundamentals of machine learning techniques in theory and practice. The subject is designed to focus on solving industrial data modelling problems using neural networks. You will learn how to test various learning algorithms and compare performance evaluations. Some advanced machine learning techniques for data classification will also be addressed. You will work with industrial data modelling in labs and assignments to consolidate your knowledge and gain hands-on experience with machine learning applications.

Data Mining refers to various techniques which can be used to uncover hidden information from a database. The data to be mined may be complex data including big data, multimedia, spatial and temporal data, biological and health data. Data Mining has evolved from several areas including: databases, artificial intelligence, algorithms, information retrieval and statistics. This subject is designed to provide you with a solid understanding of data mining concepts and tools. The subject covers algorithms and techniques for data pre-processing, data classification, association rule mining, and data clustering. The subject also covers domain applications where data mining techniques are used.

The subject introduces you to spatial data analysis. It surveys the theory of spatial random processes, spatial statistics models, and their applications to a wide range of areas, including image analysis and GIS (geographic information system). The subject will cover the methodology and modern developments for spatial-temporal modelling, estimation and prediction, spectral analysis of spatial processes and working with big spatial data. All the methods presented will be introduced and illustrated in the context of specific datasets with GRASS and R software. You will get experience with analysis of real-world data.