Material: Data Collection and Wrangling

Tutorial

Data Collection and Wrangling

Data wrangling is the process of transforming and mapping data from a raw form into another format so that it is more appropriate and valuable for downstream purposes such as analytics. Here I have combined multiple data wrangling steps into groups on which the success of any data analytics or data science project depends.

Topics to be covered:
• Data Cleaning: Missing Value Treatment, Outlier Treatment, Data Validation
• Data Manipulation: Subsetting, Indexing, Groupby, Aggregation, Pivot tables, Data Merge, Reshaping, Creating new variables, Sorting

Relevant Libraries:
• Python: pandas, NumPy, SciPy, Matplotlib, Seaborn, folium, Bokeh
• R: dplyr, sqldf, data.table, stringr, tm, ggplot2, ggvis, rworldmap
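To give a feel for how the cleaning and manipulation topics fit together, here is a minimal pandas sketch. It is not taken from the tutorial itself: the `sales` and `managers` tables, their column names, and all values are invented for illustration, and median imputation plus IQR clipping are just one possible choice for missing value and outlier treatment.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and values are invented for illustration only.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "East", "East"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, np.nan, 80.0, 90.0, 5000.0, 110.0],
})
managers = pd.DataFrame({
    "region": ["North", "South", "East"],
    "manager": ["Ana", "Ben", "Chen"],
})

# --- Data cleaning ---
# Missing value treatment: fill gaps with the column median (one option of many).
clean = sales.assign(revenue=sales["revenue"].fillna(sales["revenue"].median()))

# Outlier treatment: clip values outside 1.5 * IQR of the quartiles.
q1, q3 = clean["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["revenue"] = clean["revenue"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Data validation: simple sanity checks on the cleaned column.
assert clean["revenue"].notna().all()
assert (clean["revenue"] >= 0).all()

# --- Data manipulation ---
# Data merge: attach the manager for each region.
merged = clean.merge(managers, on="region", how="left")

# Creating a new variable: each row's share of total revenue.
merged["revenue_share"] = merged["revenue"] / merged["revenue"].sum()

# Subsetting and indexing: first-quarter rows, two columns only.
q1_rows = merged.loc[merged["quarter"] == "Q1", ["region", "revenue"]]

# Groupby, aggregation, and sorting: total revenue per region, largest first.
totals = (
    merged.groupby("region", as_index=False)["revenue"]
    .sum()
    .sort_values("revenue", ascending=False)
)

# Pivot table: regions as rows, quarters as columns.
pivot = merged.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)

# Reshaping: turn the wide pivot back into long format.
long_form = pivot.reset_index().melt(
    id_vars="region", var_name="quarter", value_name="revenue"
)

print(totals)
print(pivot)
```

The same steps map onto dplyr verbs in R; pandas is used here only to keep the example in one language.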

Skills you will gain from this material

Data Wrangling
