Data Science with R
Data Analysis and prediction algorithms with R. Third Edition
2025-12-25
Preface
Welcome to the third edition of Data Science with R!
In an era where data-driven decisions shape industries from healthcare to finance, mastering R gives you the power to extract insights, build predictive models, and communicate findings effectively. This book has evolved from personal learning notes into a comprehensive resource that takes you from fundamentals to advanced data science techniques using practical, hands-on exercises.
This book is designed for beginners with no prior R experience who want a structured path into data science, as well as analysts looking to upgrade from spreadsheets to reproducible R workflows. It also serves students, professionals, and practitioners seeking to modernize their machine learning skills with tidymodels. Basic familiarity with statistical concepts is helpful but not required, as all code examples are self-contained and explained step by step.
The third edition reflects the latest developments in the R ecosystem. We have updated everything to run on R 4.5.2 and RStudio 2025.09.2. A major shift in this edition is the full migration to tidymodels for machine learning and the adoption of modern tidyverse patterns, including the native pipe operator (|>). We have also introduced entirely new topics such as Generative AI and LLM integration with R, AI-assisted coding workflows, ethics in data science, and enhanced text mining. You will also find expanded content on interactive visualization, deep learning with Keras and TensorFlow, big data processing with sparklyr, and reproducible workflows using Git and GitHub.
Each chapter builds on previous concepts, but you can also jump to topics of interest. If you are learning R from scratch, start with the Fundamentals in the first two chapters. Chapter 3 covers visualization with ggplot2, while Chapter 12 dives into building machine learning models. For those interested in working with Large Language Models, Chapter 14 covers Generative AI. Throughout the book, you will find hands-on exercises to test your understanding. Solutions are provided, but we encourage you to try them yourself first.
As in previous editions, many exercises are inspired by practical classroom experiences and activities from the Professional Certificate in Data Science by HarvardX. The code used to generate this book is available on GitHub to encourage transparency and reproducibility.
Support This Work
Over 700 hours went into creating this resource. If you find it valuable, consider purchasing the PDF on Leanpub. Your purchase includes:
- Future updates at no extra cost
- Three months of direct Q&A access with the author
- Support for keeping the web version free for everyone
The web version, available on GitHub Pages, seeks to democratize data science knowledge. Share it, and let’s work together to make knowledge freely available.
Stay Connected
This book has reached readers in Mexico, Colombia, Spain, Peru, Chile, and many other countries. I deeply thank readers of previous editions for their comments and suggestions, which have been fundamental to improving each version.
If you have questions or suggestions, write to me at dparedesi@uni.pe. I usually respond within 48 hours.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.