In this example, we'll be working with data from ShoeFly.com, a fictional online shoe store.
# Before we analyze anything, we need to import pandas
# and load our data
import pandas as pd
df = pd.read_csv('shoefly_page_visits.csv')
Let's examine the first 10 rows of our data!
df.head(10)
Notice that there's a column called "utm_source". This column tells us which website sent each user to ShoeFly.com. There's also a column called "month", which tells us the month in which each user visited ShoeFly.com.
We want to know how the number of visits from each source has changed from month to month. Let's investigate!
# This command shows us how many users visited the site from different sources in different months.
df.groupby(['month', 'utm_source']).id.count().reset_index()
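To see what this grouping does on a small scale, here is a minimal sketch using a made-up DataFrame. The rows and the month labels below are assumptions for illustration only; they are not taken from shoefly_page_visits.csv. Only the column names ("id", "month", "utm_source") come from the example above.

```python
import pandas as pd

# Hypothetical toy data (NOT the real ShoeFly.com file): six visits over two months.
visits = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'month': ['1 - January', '1 - January', '1 - January',
              '2 - February', '2 - February', '2 - February'],
    'utm_source': ['email', 'email', 'google', 'email', 'google', 'google'],
})

# Count the ids in each (month, utm_source) group.
# visits.groupby(...)['id'] selects the same column as the .id shorthand.
grouped = visits.groupby(['month', 'utm_source'])['id'].count()

# reset_index() turns the group labels back into ordinary columns,
# giving one row per (month, utm_source) pair.
counts = grouped.reset_index()
print(counts)
```

The result is in "long" format: each row holds one month, one source, and the number of visits for that pair in the "id" column.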
# Pivot the same counts so that each row is a source and each column is a month,
# which makes it easier to compare months side by side.
(df.groupby(['month', 'utm_source']).id.count()
   .reset_index()
   .pivot(columns='month', index='utm_source', values='id'))
Over the course of these three months, it looks like we got more visits from "email" and "google", but fewer visits from "facebook" and "twitter". The number of visits from "yahoo" stayed mostly the same.
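Rather than comparing the columns by eye, one option is to subtract the first month from the last. The sketch below does this on made-up counts; the numbers and month labels are assumptions for illustration and are not ShoeFly.com's actual figures. It also assumes the month labels sort in chronological order, since pivot sorts the resulting columns.

```python
import pandas as pd

# Hypothetical long-format counts (NOT the real data), shaped like the
# output of groupby(...).id.count().reset_index().
counts = pd.DataFrame({
    'month': ['1 - January', '1 - January', '2 - February', '2 - February'],
    'utm_source': ['email', 'google', 'email', 'google'],
    'id': [20, 10, 30, 5],
})

# One row per source, one column per month.
wide = counts.pivot(index='utm_source', columns='month', values='id')

# Change in visits between the first and last month columns.
# Positive means more visits in the last month; negative means fewer.
change = wide.iloc[:, -1] - wide.iloc[:, 0]
print(wide)
print(change)
```

Sorting the result with change.sort_values() would then rank the sources from largest drop to largest gain.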