Data Science for Librarians Final Project

Analyzing the Popularity of Content in the Astrophysics Data System

by Alex Holachek

The Data Science for Librarians course introduced us to an array of technical tools for capturing and making sense of the streams of information that surround us every day. We learned a lot in a short time: how to access information through APIs, how to manipulate data with Python and process natural language text, how to screen-scrape information that APIs do not expose well, and finally, how to analyze and visualize the gathered data. The four-part project described on these pages explores applications of the different techniques covered in the course.

It was empowering to learn that you can become reasonably proficient with these tools over the course of a few months. The projects featured here were undertaken in the spirit of "learning by doing," so any feedback would be appreciated! You can use the navigation at the top to look at the four distinct but interrelated projects I undertook to learn different ways of answering the question, "How can we analyze the popularity of different topics within the Astrophysics Data System?"

One thing that came up again and again was the need to clean and refine data before it was usable. To this end, I wrote a Python class that wraps the most useful functions from the NLTK module so they are even easier to use right away.
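The original class is not reproduced here, so the following is only a minimal sketch of what such a convenience wrapper might look like. The class name `TextCleaner`, its method names, and the small stop list are all hypothetical. It assumes NLTK is installed and deliberately uses only pieces that need no extra data downloads: `wordpunct_tokenize`, `PorterStemmer`, and `FreqDist`.

```python
# Hypothetical convenience wrapper around a few NLTK helpers.
# Assumes NLTK is installed (pip install nltk). Only components that need
# no nltk.download() step are used here.
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Small illustrative stop list (an assumption); nltk.corpus.stopwords offers
# a fuller list but requires nltk.download("stopwords") first.
DEFAULT_STOPWORDS = frozenset(
    {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "with"}
)


class TextCleaner:
    """Tokenize, filter, stem, and count the words in a block of text."""

    def __init__(self, text, stopwords=DEFAULT_STOPWORDS):
        self.text = text
        self.stopwords = set(stopwords)
        self._stemmer = PorterStemmer()

    def tokens(self):
        """Return lowercased alphabetic tokens with stopwords removed."""
        return [
            tok.lower()
            for tok in wordpunct_tokenize(self.text)
            if tok.isalpha() and tok.lower() not in self.stopwords
        ]

    def stems(self):
        """Return the Porter stem of each filtered token."""
        return [self._stemmer.stem(tok) for tok in self.tokens()]

    def top_terms(self, n=10, stem=False):
        """Return the n most frequent terms as (term, count) pairs."""
        words = self.stems() if stem else self.tokens()
        return FreqDist(words).most_common(n)


if __name__ == "__main__":
    abstract = "Dark matter halos and dark energy: models of galaxy clusters."
    cleaner = TextCleaner(abstract)
    print(cleaner.tokens())
    print(cleaner.top_terms(1))
```

Bundling tokenizing, filtering, stemming, and counting behind one object means each new batch of abstracts or titles can be cleaned with a single call instead of repeating the same NLTK boilerplate in every script.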