This book connects data cleaning and preprocessing to help you make effective business decisions using data analytics.

Key Features
* Become well-versed in the core concepts of data cleaning, data fusion, data reduction, and data integration
* Get ready to make the most of your data with powerful data transformation and massaging techniques
* Learn how to apply a multilayer perceptron (MLP) to clean and create issue-free data

Book Description
Data preprocessing is the first step in data visualization, data analytics, and machine learning, where data is prepared for analytics functions to get the best possible insights. Around 90% of the time spent on data analytics, data visualization, and machine learning projects is dedicated to performing data preprocessing.

This book will equip you with optimal data preprocessing techniques from multiple perspectives. You'll learn different technical and analytical aspects of data preprocessing (data collection, data cleaning, data integration, data reduction, and data transformation) and get to grips with implementing them using the open-source Python programming environment. The book provides a comprehensive account of data preprocessing, its whys and hows, and helps you identify opportunities where data analytics could lead to more effective decision making. It also demonstrates the role of data management systems and technologies in effective analytics and shows how to create queries to pull data from relational databases.

By the end of this Python data preprocessing book, you'll be able to use Python to read, manipulate, and analyze data; perform data cleaning, integration, and reduction; handle outliers and missing values; and implement the appropriate data transformation method.

What You Will Learn
* Use Python to perform analytics functions on your data
* Learn the role of databases and connect to them effectively for your analytics requirements
* Perform data cleaning and preprocessing defined by your analytics goals
* Understand and resolve the challenges faced while performing data integration
* Discover different data reduction methods and learn how to execute them effectively
* Explore a variety of data transformation methods and choose the most suitable method for your use case

Who This Book Is For
Junior and senior data analysts, business intelligence professionals, engineering undergraduates, and data enthusiasts looking to perform preprocessing and data cleaning on large amounts of data will find this book useful. Basic programming skills such as working with variables, conditionals, and loops, along with beginner-level knowledge of Python and simple analytics experience, are assumed.

Table of Contents
* Review of the Core Modules NumPy and Pandas
* Review of Another Core Module: Matplotlib
* Data - What Is It Really?
* Databases
* Data Visualization
* Prediction
* Classification
* Clustering Analysis
* Data Cleaning Level I - Clean Up the Table
* Data Cleaning Level II - Unpack, Restructure, and Reformulate the Table
* Data Cleaning Level III - Missing Values, Outliers, and Errors
* Data Fusion and Integration
* Data Reduction
* Data Massaging and Transformation
* Case Study 1: Mental Health in Tech
* Case Study 2: Predict COVID Hospitalization
* Case Study 3: United States Counties Clustering Analysis
* Practice Cases

Book Information
ISBN 9781801072137
Author Roy Jafari
Format Paperback
Page Count 554
Imprint Packt Publishing Limited
Publisher Packt Publishing Limited
Weight(grams) 75g