Tutorial
Data Collection and Wrangling
Data wrangling is the process of transforming and mapping data from its raw form into another format, with the intent of making it more appropriate and valuable for downstream purposes such as analytics. Here I have grouped the main Data Wrangling steps into two categories, on which the success of any Data Analytics or Data Science project depends.
Topics to be covered:
• Data Cleaning: Missing Value Treatment, Outlier Treatment, Data Validation
• Data Manipulation: Subsetting, Indexing, Groupby, Aggregation, Pivot tables, Data Merge, Reshaping, Creating new variables, Sorting
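As a rough illustration of the Data Cleaning topics, here is a minimal Pandas sketch on a small hypothetical table. The column names and values are invented, and the specific treatments (median/mode imputation for missing values, capping at Tukey's 1.5 × IQR fences for outliers, simple assert checks for validation) are one common choice among several, not the only correct approach.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing age, one missing city, one extreme income.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 35],
    "income": [40_000.0, 52_000.0, 48_000.0, 1_000_000.0, 45_000.0, 50_000.0],
    "city": ["Pune", "Delhi", "Pune", None, "Delhi", "Pune"],
})

# Missing value treatment: median for a numeric column, mode for a categorical one.
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Outlier treatment: cap values outside the 1.5 * IQR fences (Tukey's rule).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["income"] = df["income"].clip(lower=lower, upper=upper)

# Data validation: fail loudly if any rule is violated.
assert df.notna().all().all(), "missing values remain"
assert df["age"].between(0, 120).all(), "age out of plausible range"
assert df["income"].between(lower, upper).all(), "income outside IQR fences"

print(df)
```

Capping (rather than dropping) the outlier keeps the row count unchanged, which is often preferable when other columns in that row are still useful.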
Relevant Libraries:
• Python: Pandas, NumPy, SciPy, Matplotlib, Seaborn, Folium, Bokeh
• R: dplyr, sqldf, data.table, stringr, tm, ggplot2, ggvis, rworldmap
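Using Pandas from the Python list above, the Data Manipulation topics can be sketched on two small hypothetical tables, `sales` and `regions`, invented purely for illustration. Each step maps to one topic from that list; the data and column names are assumptions, while the methods used (`loc`, `merge`, `groupby().agg`, `pivot_table`, `melt`, `sort_values`) are standard Pandas DataFrame methods.

```python
import pandas as pd

# Hypothetical sales and region tables used only for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
    "units": [10, 12, 7, 9, 15],
    "price": [2.0, 2.0, 3.0, 3.0, 1.5],
})
regions = pd.DataFrame({"store": ["A", "B", "C"],
                        "region": ["North", "South", "North"]})

# Creating new variables: revenue per row.
sales["revenue"] = sales["units"] * sales["price"]

# Subsetting: keep only January rows with a boolean mask.
jan = sales[sales["month"] == "Jan"]

# Indexing: label-based lookup on a sorted (store, month) MultiIndex.
indexed = sales.set_index(["store", "month"]).sort_index()
store_b_feb = indexed.loc[("B", "Feb"), "units"]

# Data merge: attach each store's region (left join on the key column).
merged = sales.merge(regions, on="store", how="left")

# Groupby and aggregation: named aggregations per region.
by_region = merged.groupby("region").agg(
    total_units=("units", "sum"),
    total_revenue=("revenue", "sum"),
)

# Pivot table: stores as rows, months as columns, summed units as values.
pivot = sales.pivot_table(index="store", columns="month", values="units",
                          aggfunc="sum", fill_value=0)

# Reshaping: melt the wide pivot table back into long format.
long = pivot.reset_index().melt(id_vars="store", var_name="month",
                                value_name="units")

# Sorting: highest revenue first.
ranked = merged.sort_values("revenue", ascending=False)

print(by_region, pivot, long, ranked, sep="\n\n")
```

Here `by_region` reports 37 total units for North (stores A and C) and 16 for South (store B); `fill_value=0` fills the pivot cell for store C in February, which has no sales row.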
Skills you will gain from this material