Welcome to the Advanced Statistics Workshops M.Res. unit BIOL63112. The following video (as with all the others in this unit) is best viewed in fullscreen mode at 1080p resolution. I have recorded the audio with a podcasting microphone, so it’s best listened to with headphones. YouTube generates subtitles automatically, so please turn those on if you’d find them useful.

As well as contacting me via email, you can also reach me via our dedicated Slack channel. Details of how to join our Slack channel can be found on Blackboard. You can install the Slack app on your computer or phone - I recommend you do this, as interacting with Slack via the app is much more convenient.

You can speak to me in one of the timetabled Zoom sessions (the details of those are on Blackboard) or via a personal meeting request - just ping me via email or Slack to request that. Additionally, you can follow me on Twitter where you’ll see me tweet and re-tweet lots of Open Research and R content.

There are two assignments associated with this unit. The full details of these (plus hand-in dates) can be found on the Blackboard page for this unit.

   

Mixed Models Part 1

In this first part we will see how mixed models combine aspects of linear regression and ANOVA while relaxing the requirement that observations be independent of each other. We will also examine how we model the influence of random effects in our mixed models, and see how mixed models can cope with unbalanced designs and missing data.
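
To give a flavour of what this looks like in practice, here is a minimal sketch using the lme4 package; the data frame `my_data` and its columns `rt`, `condition`, and `subject` are hypothetical stand-ins, not data from the workshop itself.

```r
# A minimal sketch (not the workshop's own code): a linear mixed model with
# lme4, assuming a hypothetical data frame with columns rt (the outcome),
# condition (a fixed effect), and subject (the grouping factor).
library(lme4)

# The random intercept for subject captures the non-independence of repeated
# observations from the same person.
model <- lmer(rt ~ condition + (1 | subject), data = my_data)
summary(model)
```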

 

 

  

Mixed Models Part 2

In the second part we will examine mixed models for factorial designs, and explore how to model non-continuous dependent variables (e.g., binary and ordinal outcome variables) using generalised linear mixed models fitted with glmer().
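
As a rough illustration, a binomial mixed model might be specified along the following lines; again, `my_data` and its columns `accuracy`, `condition`, `subject`, and `item` are hypothetical.

```r
library(lme4)

# Hypothetical example: a binary outcome (accuracy coded 0/1) modelled with a
# logit link, with random intercepts for both subjects and items.
model <- glmer(accuracy ~ condition + (1 | subject) + (1 | item),
               data = my_data,
               family = binomial)
summary(model)
```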

 

 

  

Data Simulation

In this session we will look at some R programming techniques that you can use to create simulated data sets. Data simulation is a really powerful tool, as it allows you to estimate how likely you are to detect an effect (for an assumed effect size) by simulating (for example) 10,000 experiments of a particular design with a particular sample size. It’s also a great technique for checking whether your dataset will be rich enough for you to build the kinds of models you need to test a particular set of predictions.
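
As a small taste of the idea, the sketch below estimates power by simulation for a simple two-group design; the effect size, sample size, and alpha level are illustrative assumptions only.

```r
# Power by simulation: 10,000 simulated two-group experiments with an assumed
# effect size (Cohen's d = 0.5) and 30 participants per group.
set.seed(1234)

n_sims <- 10000
n_per_group <- 30
effect_size <- 0.5

p_values <- replicate(n_sims, {
  group_a <- rnorm(n_per_group, mean = 0, sd = 1)
  group_b <- rnorm(n_per_group, mean = effect_size, sd = 1)
  t.test(group_a, group_b)$p.value
})

# Estimated power = the proportion of simulations detecting the effect at alpha = .05
mean(p_values < .05)
```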

 

 

  

Text Mining

In this session we will cover the basics of text mining using R. We will examine how to count the occurrences of words in a text, and carry out a basic sentiment analysis to explore which kinds of sentiments are most common in a text. We’ll also cover how to use measures such as term frequency-inverse document frequency (tf-idf) to understand which words (or phrases) are most uniquely associated with one text compared to another set of texts.
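
To give a sense of what this involves, here is a minimal sketch using the tidytext package with a tiny made-up corpus; the `docs` data frame and its contents are purely illustrative.

```r
library(dplyr)
library(tidytext)

# A tiny made-up corpus, purely for illustration
docs <- tibble(
  document = c("doc1", "doc2"),
  text = c("the cat sat on the mat",
           "the dog chased the happy cat")
)

# Tokenise into one word per row and count occurrences within each document
word_counts <- docs %>%
  unnest_tokens(word, text) %>%
  count(document, word, sort = TRUE)

# tf-idf highlights words that are distinctive to a particular document
word_counts %>%
  bind_tf_idf(word, document, n)

# A basic sentiment analysis: join against the "bing" sentiment lexicon
word_counts %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(document, sentiment, wt = n)
```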

 

 

  

Hackathon (Two Sessions)

This will be a fully Zoom-based activity so come along to the timetabled Zoom sessions to find out more…

  

Technical Details

All of the material in this workshop was created using open-source software where possible, on an Entroware Apollo laptop running the GNU/Linux distro Ubuntu 20.04 LTS (Focal Fossa). The audio was captured with a Fifine USB podcasting microphone and the video with a Razer Kiyo webcam. The audio and video were recorded using Open Broadcaster Software (OBS) and edited using Shotcut. The R code was written using R 3.6.3 and run in the RStudio Desktop IDE version 1.3.959. Ubuntu, OBS, Shotcut, R, and RStudio Desktop are all open source.

The structure for this unit was very much inspired by the Sharing on Short Notice webinar by Alison Hill and Desirée De Leon.

The repo for each workshop can be accessed via the ‘Improve this Workshop’ link at the bottom of each workshop page. The workshops and this website were all written using R Markdown, and the website is hosted on Netlify via continuous deployment from this GitHub repository.

The source code for each of the Workshops above is licensed under the MIT license, and the lecture content under CC-BY.