Resources for learning Data Science

Books

  1. Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning (2021)
    by Alex J. Gutman, Jordan Goldmeier
  • Description: “You’ll learn how to: Think statistically and understand the role variation plays in your life and decision-making; Speak intelligently and ask the right questions about the statistics and results you encounter in the workplace; Understand what’s really going on with machine learning, text analytics, deep learning, and artificial intelligence; Avoid common pitfalls when working with and interpreting data. Becoming a Data Head is a complete guide for data science in the workplace: covering everything from the personalities you’ll work with to the math behind the algorithms. The authors have spent years in data trenches and sought to create a fun, approachable, and eminently readable book. Anyone can become a Data Head–an active participant in data science, statistics, and machine learning. Whether you’re a business professional, engineer, executive, or aspiring data scientist, this book is for you.”
  • Reason for this recommendation: This book is a superb, gentle introduction to Data Science, especially for those entering the field without any prior knowledge or background. It is an ideal guide for absolute beginners who find the subject’s complexities hard to grasp.
    If you have been tasked with handling data and want a clear understanding of the critical concepts in Data Science, this book can be your stepping stone. The authors skillfully avoid the overwhelming technical terminology and abstract mathematical jargon often associated with the field.
    They have put tremendous effort into simplifying everything, breaking complex ideas down into easily digestible chunks. They use relatable examples and explain each concept in the simplest terms, so beginners can build foundational knowledge without feeling overwhelmed.
  • Link: https://www.goodreads.com/uk/book/show/56357967-becoming-a-data-head

  1. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Second Edition (2023) 
    by Hadley Wickham, Mine Çetinkaya-Rundel, Garrett Grolemund
  • Description: “This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, and visualize.
    In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualizing, and exploring data.”
  • Reason for this recommendation: This book focuses on basic Data Science topics such as importing data, transforming it into a useful form, and visualizing it in R. Since its authors include Hadley Wickham, Chief Scientist at Posit (formerly RStudio), you can expect high-quality educational content. At 576 pages, the book explains in detail every step needed to work with data at a basic yet extremely crucial level, laying solid foundations for more complex methods. It is highly recommended for beginners in R.
  • Link: https://r4ds.hadley.nz/

  1. Learning Statistics with R: A Tutorial for Psychology Students and Other Beginners 
    by Danielle Navarro
  • Description: “Learning Statistics with R covers the contents of an introductory statistics class, as typically taught to undergraduate psychology students, focusing on the use of the R statistical software. The book discusses how to get started in R as well as giving an introduction to data manipulation and writing scripts. From a statistical perspective, the book discusses descriptive statistics and graphing first, followed by chapters on probability theory, sampling and estimation, and null hypothesis testing. After introducing the theory, the book covers the analysis of contingency tables, t-tests, ANOVAs, and regression. Bayesian statistics are covered at the end of the book.”
  • Reason for this recommendation: Danielle Navarro wrote this tutorial specifically for an introductory statistics class for psychology students taught with R. It is written in an easily understandable manner for readers without a technical background. The resource is freely available in multiple formats, including a PDF file and an HTML website (a bookdown adaptation). It is highly recommended for beginners in R and Data Science.
  • Link: https://learningstatisticswithr.com/

  1. An Introduction to Statistical Learning with Applications in R. Second Edition (2021)
    by G. James, D. Witten, T. Hastie, and R. Tibshirani
  • Description: “As the scale and scope of data collection continue to increase across virtually all fields, statistical learning has become a critical toolkit for anyone who wishes to understand data. An Introduction to Statistical Learning provides a broad and less technical treatment of key topics in statistical learning. This book is appropriate for anyone who wishes to use contemporary tools for data analysis.
    The first edition of this book, with applications in R (ISLR), was released in 2013. A 2nd Edition of ISLR was published in 2021. It has been translated into Chinese, Italian, Japanese, Korean, Mongolian, Russian, and Vietnamese. The Python edition (ISLP) was published in 2023. Each edition contains a lab at the end of each chapter, which demonstrates the chapter’s concepts in either R or Python.”
  • Reason for this recommendation: The first edition of this book, co-authored by Trevor Hastie and Rob Tibshirani, was already a tremendous read and is one of the best introductory books on these topics. Hastie and Tibshirani also created the MOOC “Statistical Learning”, one of the best I’ve ever taken. Both resources are highly recommended. However, I suggest gaining some prior experience with R first, so you can focus on the core topics.
  • Link: https://www.statlearning.com/

Online courses

  1. Statistical Learning
    by Trevor Hastie and Robert Tibshirani — Professors of Biomedical Data Science and Statistics at Stanford University.
  • Description: “This is an introductory-level course in supervised learning, with a focus on regression and classification methods. […] This is not a math-heavy class, so we try and describe the methods without heavy reliance on formulas and complex mathematics. We focus on what we consider to be the important elements of modern data science. Computing is done in R. There are lectures devoted to R, giving tutorials from the ground up, and progressing with more detailed sessions that implement the techniques in each chapter”.
  • Reason for this recommendation: I completed this course in 2014, and it remains one of the best MOOCs I’ve ever taken. It offers top-notch introductory content on supervised learning, but the real highlight is the instructors and their teaching style. They deliver their lectures in a highly engaging way, interacting with each other and explaining complex concepts in easy-to-understand terms; I found myself smiling throughout the video lessons. To get a taste of the course, you can watch this sample video: https://youtu.be/9vlDVxG4ulA?si=sdBTnF5trQ9xRAtU. There is also a fantastic free accompanying book, which I highly recommend; it is included in the Books section of this list.
  • Link: https://mitxonline.mit.edu/courses/course-v1:MITxT+JPAL101x/

  1. Evaluating Social Programs
    by J-PAL (Abdul Latif Jameel Poverty Action Lab) and MITx (Massachusetts Institute of Technology).
  • Description: “Learn why and when randomized evaluations can be used to rigorously evaluate the impact of social programs and how findings can inform the design of evidence-based policies and programs”.
  • Reason for this recommendation: I finished this course with distinction in 2014. One of the instructors was Esther Duflo, Professor of Poverty Alleviation and Development Economics at MIT, who went on to win the Nobel Prize in Economic Sciences in 2019. Though not an easy course, it offered very high-quality content. I recommend it especially to anyone interested in the intersection of social sciences, randomized evaluations, and social programs. It is better suited to advanced students and might not be the best choice for beginners.
  • Link: https://mitxonline.mit.edu/courses/course-v1:MITxT+JPAL101x/

If you’re a newcomer eager to explore the world of Data Science and R programming, I have some news. A few of the MOOCs I previously recommended for beginners are no longer available, but the good news is that there are now more options. I haven’t tried them all myself, but I’ve researched them to curate the list below. Once you’ve tried them, feel free to share your feedback: just click “Contact” and send me an email to let me know whether they were worth your time.

  1. Data Analysis with R Programming
    by Google Career Certificates
  • Description: “These courses will equip you with the skills needed to apply to introductory-level data analyst jobs. In this course, you’ll learn about the programming language known as R. You’ll find out how to use RStudio, the environment that allows you to work with R. This course will also cover the software applications and tools that are unique to R, such as R packages. You’ll discover how R lets you clean, organize, analyze, visualize, and report data in new and more powerful ways.”
  • Reason for this recommendation: The course appears to focus on the most basic topics and to work through them at a deliberate pace. It teaches R Markdown rather than its successor, Quarto, but this is not a significant concern. The course also has a high rating (4.8/5) and has been recommended by thousands of participants, so it is likely well worth your time.
  • Link: https://www.coursera.org/learn/data-analysis-r

  1. R Programming Fundamentals
    by Susan Holmes — Professor of Statistics at Stanford University
  • Description: “This course covers an introduction to R, from installation to basic statistical functions. You will learn to work with variable and external data sets, write functions, and hear from one of the co-creators of the R language, Robert Gentleman.”
  • Reason for this recommendation: This course may not be very popular, but it focuses on the tasks that matter most to beginners and is worth exploring.
  • Link: https://www.edx.org/learn/r-programming/stanford-university-r-programming-fundamentals

  1. Data Science: R Basics
    by Rafael Irizarry — Professor of Biostatistics at Harvard University
  • Description: “This course will introduce you to the basics of R programming. You can better retain R when you learn it to solve a specific problem, so you’ll use a real-world dataset about crime in the United States. You will learn the R skills needed to answer essential questions about differences in crime across the different states.”
  • Reason for this recommendation: This course also touches on technical skills a Data Scientist needs, such as file organization, Unix/Linux, and Git/GitHub. In addition, it covers the basics of R syntax and data structures, along with data visualization. It is certainly worth checking out.
  • Link: https://www.edx.org/learn/r-programming/harvard-university-data-science-r-basics

  1. The Analytics Edge
    by Dimitris Bertsimas, Boeing Professor of Operations Research at the Massachusetts Institute of Technology (MIT), and others
  • Description: “We will examine real-world examples of how analytics have been used to significantly improve a business or industry. These examples include Moneyball, eHarmony, the Framingham Heart Study, Twitter, IBM Watson, and Netflix. Through these examples and many more, we will teach you the following analytics methods: linear regression, logistic regression, trees, text analytics, clustering, visualization, and optimization. We will be using the statistical software R to build models and work with data. The contents of this course are essentially the same as those of the corresponding MIT class (The Analytics Edge). It is a challenging class, but it will enable you to apply analytics to real-world applications.”
  • Reason for this recommendation: I completed this online course some time ago and found it very valuable. However, I recommend it mainly for people who already have some experience in the subject; complete beginners may want to start with less demanding courses.
  • Link: https://www.edx.org/learn/analytics/massachusetts-institute-of-technology-the-analytics-edge

  1. Analyzing Data with R
    by IBM
  • Description: “By playing the role of a data analyst who is analyzing airline departure and arrival data to predict flight delays, you will build hands-on experience delivering insights using data. Using an Airline Reporting Carrier On-Time Performance Dataset, you will practice reading data files, preprocessing data, creating models, improving models, and evaluating them to ultimately choose the best one to use. Note: The prerequisite for this course is basic R programming skills.”
  • Reason for this recommendation: This course suits those looking to build their skills in developing predictive models with regression methods. It may not be ideal for absolute beginners, but those who already have basic R programming skills may find it useful and enjoyable.
  • Link: https://www.edx.org/learn/data-analysis/ibm-analyzing-data-with-r