This self-contained guide introduces two pillars of data science, probability theory and statistics, side by side, illuminating the connections between probabilistic concepts and the statistical techniques that build on them, such as the relationship between random variables and nonparametric and parametric models. Other topics covered include hypothesis testing, principal component analysis, correlation, and regression. Examples throughout the book draw on real-world datasets, demonstrating concepts in practice and confronting readers with fundamental challenges in data science, such as overfitting, the curse of dimensionality, and causal inference. Python code reproducing these examples is available on the book's website, along with videos, slides, and solutions to exercises. This accessible book is ideal for undergraduate and graduate students, data science practitioners, and others interested in the theoretical concepts underlying data science methods.
A self-contained introduction to probability and statistics for data science with examples involving real-world datasets.

About the Author
Carlos Fernandez-Granda is Associate Professor of Mathematics and Data Science at New York University, where he has taught probability and statistics to data science students since 2015. The goal of his research is to design and analyze data science methodology, with a focus on machine learning, artificial intelligence, and their application to medicine, climate science, biology, and other scientific domains.

Book Information
ISBN 9781009180085
Author Carlos Fernandez-Granda
Format Hardback
Page Count 700
Imprint Cambridge University Press
Publisher Cambridge University Press