The New York Times "Astronomy and Astrophysics" section offers an excellent series of articles explaining recent breakthroughs in the field. At the beginning of this class, we set out to find an automated way to associate articles in the Astronomy and Astrophysics section with papers in the Astrophysics Data System.
Unfortunately, the NY Times API was too limited to make this association easy, so I decided to use screen scraping instead. I used the Python Scrapy framework to write a spider that crawled every Astronomy and Astrophysics article in the NY Times, looking for a link to the scientific publication in which the research under discussion appeared. If it found such a link, it followed it and searched the publication's page for a DOI (digital object identifier). Once a DOI was found, it was easy to automate a search in the ADS API for the bibcode that uniquely identifies that paper in the ADS (a bibcode is similar to a DOI, but specific to the ADS).
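The last two steps (pulling a DOI out of a publication page, then asking the ADS for the matching bibcode) can be sketched roughly as follows. This is a minimal illustration, not the spider itself: the function names `find_doi` and `ads_query_url` are hypothetical, the regular expression is a loose pattern that will miss some unusual DOIs, and the search endpoint and the `doi:` / `fl` query parameters are my assumptions about the ADS API and should be checked against its documentation. The sketch only builds the query URL; actually sending the request would also need an API token.

```python
import re
from urllib.parse import urlencode

# Loose pattern for DOI-like strings: "10.", a 4-9 digit prefix, "/", a suffix.
# Assumption for illustration only; real pages may need a stricter pattern.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"\'<>]+')

# Assumed ADS search endpoint; verify against the ADS API documentation.
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def find_doi(page_text):
    """Return the first DOI-like string found in page_text, or None."""
    match = DOI_PATTERN.search(page_text)
    if match is None:
        return None
    # Strip punctuation that often trails a DOI in running text.
    return match.group(0).rstrip(".,;)")


def ads_query_url(doi):
    """Build a search URL asking the ADS for the bibcode of a given DOI."""
    params = {"q": f'doi:"{doi}"', "fl": "bibcode"}
    return f"{ADS_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder HTML with a made-up DOI, standing in for a publisher page.
    html = '<a href="https://doi.org/10.1234/example.5678">Full text</a>'
    doi = find_doi(html)
    if doi is not None:
        print(ads_query_url(doi))
```

In a Scrapy spider, `find_doi` would be called on the body of the publication page's response, and the resulting URL would be requested with the ADS token in the request headers.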