This is a tutorial on how to download data from Etsy.

Etsy makes a lot of data available for systematic download through its Application Programming Interface (API). If you sign up as an Etsy developer, you can obtain a key to make queries directly to Etsy. With this key, you can obtain, among other data, product listings and the attributes of shops and users.

Details about the developer program are here:

https://www.etsy.com/developers/

This tutorial will demonstrate how to download such data with R. In particular, we will search for 1) listings, 2) store-level attributes, 3) user-level attributes, and 4) user-level connections.

First, load the library required to parse the API output.

Etsy returns query results in JSON format.

More on JSON here: http://en.wikipedia.org/wiki/JSON

The jsonlite package can translate JSON output into R data frames.

library(jsonlite)
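
To see what fromJSON does before touching the API, here is a minimal sketch that parses a small, made-up JSON string. The field names and values are invented for illustration and are not real Etsy output; the only thing borrowed from the API is the idea of a top-level object holding a "results" array.

```r
library(jsonlite)

# A made-up JSON document: a top-level object with a "results" array of records
json_txt <- '{"count": 2, "results": [
  {"listing_id": 1, "price": "73.00", "quantity": 2},
  {"listing_id": 2, "price": "45.00", "quantity": 1}
]}'

# fromJSON returns an R list with one element per top-level key
parsed <- fromJSON(json_txt)

# An array of flat records is simplified to a data frame by default
toy_results <- parsed$results
str(toy_results)
```

Note that the price stays a character column because it is quoted in the JSON; as.numeric() converts it when needed.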

Get listings.

There is a wealth of data available, but we'll start by getting the listings.

You will need an API key, which is freely available on request.

Queries are formed by constructing URLs with specific parameters.
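
As a sketch of how such a URL can be assembled from its parts, the helper below pastes an endpoint path and named parameters onto the base address. The function build_etsy_url is hypothetical (it is not part of jsonlite or of Etsy's tooling), and the limit parameter is an assumption used only to show a second parameter.

```r
# Hypothetical helper: build a query URL from an endpoint path and named parameters
build_etsy_url <- function(path, ..., base = "https://openapi.etsy.com/v2") {
  params <- list(...)
  # Percent-encode each value so spaces and reserved characters are safe in a URL
  values <- vapply(
    params,
    function(p) utils::URLencode(as.character(p), reserved = TRUE),
    character(1)
  )
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0(base, path, "?", query)
}

# "limit" is an assumed parameter name, shown for illustration only
url <- build_etsy_url("/listings/active", api_key = "REDACTED", limit = 25)
url
```

The resulting string can be passed to fromJSON in place of a hand-typed URL.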

# The URL queries the API to get listings
# The fromJSON function parses the JSON response into an R list
listing <- fromJSON(txt='https://openapi.etsy.com/v2/listings/active?api_key=REDACTED')

The JSON is parsed into a list of elements, one of which (results) is a data frame of item listings.

# Parse into dataframe
etsy_json <- as.data.frame(listing$results)

# The data are quite extensive, so we'll just peek at select variables
head(etsy_json)[,c("listing_id", "price", "quantity", "category_path")]
##   listing_id   price quantity                category_path
## 1  228814339   73.00        2           Housewares, Pillow
## 2  213822853 5900.00        1          Accessories, Wallet
## 3  227779303   45.00        1       Clothing, Women, Dress
## 4  228817180   25.00        1                Art, Painting
## 5   77609441  766.00      100                Jewelry, Ring
## 6  118479349   12.00        1 Accessories, Hair, Scrunchie

List store-level attributes.

We can examine a number of store-level attributes.

We'll choose the store “EPUU”.

# The URL queries the API to get listings
# The fromJSON function parses the JSON response into an R list
store <- fromJSON(txt='https://openapi.etsy.com/v2/shops/epuu/listings/active?api_key=REDACTED')

# Parse into dataframe
store_json <- as.data.frame(store$results)

# The data are quite extensive, so we'll just peek at select variables
head(store_json)[,c("state", "category_id", "category_path", "price", 
                    "views", "num_favorers", "materials")]
##    state category_id         category_path price views num_favorers
## 1 active    69151567     Jewelry, Necklace 40.00    38           13
## 2 active    69151501     Jewelry, Earrings 38.00   893          214
## 3 active    69154963 Weddings, Accessories 46.00    37            7
## 4 active    69154963 Weddings, Accessories 44.00    27            3
## 5 active    69154963 Weddings, Accessories 38.00     9            1
## 6 active    69154963 Weddings, Accessories 36.00    22            4
##                   materials
## 1                          
## 2 gold plated earring hooks
## 3                          
## 4                          
## 5                          
## 6
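
Once the store's listings are in a data frame, simple summaries follow directly. The sketch below rebuilds the six rows printed above by hand so that it runs without an API key; storing price as text is an assumption about how the API returns it.

```r
# Toy data frame copied from the printed output (price as text is an assumption)
store_toy <- data.frame(
  price = c("40.00", "38.00", "46.00", "44.00", "38.00", "36.00"),
  views = c(38, 893, 37, 27, 9, 22),
  num_favorers = c(13, 214, 7, 3, 1, 4),
  stringsAsFactors = FALSE
)

# Convert price to numeric before doing arithmetic on it
store_toy$price <- as.numeric(store_toy$price)

# Share of views that resulted in a favorite, per listing
store_toy$fav_rate <- store_toy$num_favorers / store_toy$views

mean_price <- mean(store_toy$price)
store_toy[order(-store_toy$fav_rate), ]
```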

List individual-level attributes.

We can examine a number of individual-level attributes.

We'll redact the identity of this individual, but we can get their user ID, login name, and feedback count and score.

# The URL queries the API to get a user's profile
# The fromJSON function parses the JSON response into an R list
user <- fromJSON(txt='https://openapi.etsy.com/v2/users/REDACTED?api_key=REDACTED')

# Parse into dataframe
user_json <- as.data.frame(user$results)

# This is a list of the available variables. 
# Some of them are marked "NA" because they are not publicly available.
user_json
##    user_id  login_name creation_tsz user_pub_key referred_by_user_id
## 1 REDACTED    REDACTED   1322632980           NA                  NA
##   feedback_info.count feedback_info.score
## 1                   7                 100
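
The creation_tsz value looks like a Unix timestamp. Assuming it counts seconds since 1970-01-01 UTC, a minimal sketch for turning it into a readable date is:

```r
# creation_tsz value copied from the output above
creation_tsz <- 1322632980

# Assumption: the value is seconds since the Unix epoch, in UTC
created_at <- as.POSIXct(creation_tsz, origin = "1970-01-01", tz = "UTC")
created_label <- format(created_at, "%Y-%m-%d %H:%M:%S")
created_label
```

The same call works on a whole column, for example as.POSIXct(user_json$creation_tsz, origin = "1970-01-01", tz = "UTC").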

List an individual's connections to others.

We can examine the individual's connections with others (i.e., their ego network).

We'll redact the identity of this individual and of the connected users, but we can get each connected user's ID, login name, and account creation timestamp.

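# The URL queries the API to get the users connected to this individual
# The URL is split with paste0 so that no line break ends up inside the string
# The fromJSON function parses the JSON response into an R list
connection <- fromJSON(txt=paste0('https://openapi.etsy.com/v2/users/REDACTED',
                                  '/connected_users?api_key=REDACTED'))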

# Parse into dataframe
connection_json <- as.data.frame(connection$results)

# The data are quite extensive, so we'll just peek at select variables
head(connection_json[,c("user_id", "login_name", "creation_tsz")])
##    user_id       login_name creation_tsz
## 1     uid1                1   1354288084
## 2     uid2                2   1322441388
## 3     uid3                3   1360415993
## 4     uid4                4   1282958774
## 5     uid5                5   1350755065
## 6     uid6                6   1282339820
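
For network analysis, the connections are usually easier to work with as an edge list, with one row per tie between the focal user and a connected user. The sketch below uses placeholder IDs like those in the redacted output; ego_id is a hypothetical stand-in for the focal user's ID, and the ties are treated as running from the focal user outward, which is an assumption about how to read the result.

```r
# Placeholder connections mirroring the redacted output above
connection_toy <- data.frame(
  user_id = c("uid1", "uid2", "uid3"),
  creation_tsz = c(1354288084, 1322441388, 1360415993),
  stringsAsFactors = FALSE
)

# Hypothetical ID for the focal user whose connections were queried
ego_id <- "ego"

# One row per tie from the focal user to each connected user
edges <- data.frame(
  from = ego_id,
  to = connection_toy$user_id,
  stringsAsFactors = FALSE
)
edges
```

Repeating the query for each connected user and stacking the resulting edge lists with rbind() would extend the ego network by one step.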