Today we’ll be looking at how to acquire data from the popular movie site, Rotten Tomatoes. To follow along, you’ll want to sign up for an API key here. When you get your key, make a note of your usage limit, if there is one. You don’t want to make too many calls to their API or you may get your key revoked. Finally, it’s always a very good idea to read the documentation of the API you will be using. Here are a couple of links:
- Regular Rotten Tomatoes documentation
- Dynamic documentation
Once you’ve perused that or decided that you’ll save it for later, we’ll continue our journey.
Starting the Show
Rotten Tomatoes’ API provides a set of JSON feeds that we can extract data from. We’ll be using the requests and simplejson packages to pull the data down and process it. Let’s write a little script that can get the currently playing movies.
import requests
import simplejson

#----------------------------------------------------------------------
def getInTheaterMovies():
    """
    Get a list of movies in theaters.
    """
    key = "YOUR API KEY"
    url = "http://api.rottentomatoes.com/api/public/v1.0/lists/movies/in_theaters.json?apikey=%s"
    res = requests.get(url % key)

    data = res.content

    js = simplejson.loads(data)

    movies = js["movies"]
    for movie in movies:
        print movie["title"]

#----------------------------------------------------------------------
if __name__ == "__main__":
    getInTheaterMovies()
If you run this code, you’ll see a list of movies printed to stdout. When this script was run at the time of this writing, I got the following output:
Free Birds
Gravity
Ender's Game
Jackass Presents: Bad Grandpa
Last Vegas
The Counselor
Cloudy with a Chance of Meatballs 2
Captain Phillips
Carrie
Escape Plan
Enough Said
Insidious: Chapter 2
12 Years a Slave
We're The Millers
Prisoners
Baggage Claim
In the code above, we build a URL using our API key and use requests to download the feed. Then we load the data into simplejson, which returns a nested Python dictionary. Next we loop over the list of movies stored under the dictionary’s "movies" key and print out each movie’s title. Now we’re ready to create a function to extract additional information from Rotten Tomatoes about each of these movies.
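If you want to try the parsing step without spending an API call, here is a minimal sketch that runs the same logic against a hand-written sample. The SAMPLE_FEED string and the extract_titles helper are made up for illustration. The sample only mimics the "movies" and "title" keys used above, and Python’s built-in json module stands in for simplejson. Unlike the listings in this article, it sticks to syntax that also runs on Python 3.

```python
import json

# Hypothetical stand-in for the feed; only the keys used above are mimicked.
SAMPLE_FEED = '{"movies": [{"title": "Gravity"}, {"title": "Captain Phillips"}]}'


def extract_titles(raw):
    """Return the title of every entry in the feed's "movies" list."""
    feed = json.loads(raw)  # JSON text -> nested dicts and lists
    return [movie["title"] for movie in feed["movies"]]


if __name__ == "__main__":
    for title in extract_titles(SAMPLE_FEED):
        print(title)
```

Keeping the parsing in its own function like this also makes it easy to test without touching the network.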
import requests
import simplejson
import urllib

#----------------------------------------------------------------------
def getMovieDetails(key, title):
    """
    Get additional movie details
    """
    if " " in title:
        parts = title.split(" ")
        title = "+".join(parts)

    link = "http://api.rottentomatoes.com/api/public/v1.0/movies.json"
    url = "%s?apikey=%s&q=%s&page_limit=1"
    url = url % (link, key, title)
    res = requests.get(url)
    js = simplejson.loads(res.content)

    for movie in js["movies"]:
        print "rated: %s" % movie["mpaa_rating"]
        print "movie synopsis: " + movie["synopsis"]
        print "critics_consensus: " + movie["critics_consensus"]

        print "Major cast:"
        for actor in movie["abridged_cast"]:
            print "%s as %s" % (actor["name"], actor["characters"][0])

        ratings = movie["ratings"]
        print "runtime: %s" % movie["runtime"]
        print "critics score: %s" % ratings["critics_score"]
        print "audience score: %s" % ratings["audience_score"]
        print "for more information: %s" % movie["links"]["alternate"]
    print "-" * 40
    print

#----------------------------------------------------------------------
def getInTheaterMovies():
    """
    Get a list of movies in theaters.
    """
    key = "YOUR API CODE"
    url = "http://api.rottentomatoes.com/api/public/v1.0/lists/movies/in_theaters.json?apikey=%s"
    res = requests.get(url % key)

    data = res.content

    js = simplejson.loads(data)

    movies = js["movies"]
    for movie in movies:
        print movie["title"]
        getMovieDetails(key, movie["title"])
        print

#----------------------------------------------------------------------
if __name__ == "__main__":
    getInTheaterMovies()
This new code pulls out a lot of data about each of the movies, but the JSON feed contains quite a bit more that is not shown in this example. You can see what you’re missing out on by printing the js dictionary to stdout, or you can look at an example JSON feed on the Rotten Tomatoes docs page. If you’ve been paying close attention, you’ll notice that the Rotten Tomatoes API doesn’t cover a lot of the data on their website. For example, there is no way to pull information about the actors themselves: if we wanted to know what movies Jim Carrey was in, there is no URL endpoint to query against. You also cannot look up anyone else who worked on a film, such as the director or producer. The information is on the website, but is not exposed by the API. For that, we would have to turn to the Internet Movie Database (IMDB), but that will be the topic of a different article.
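One thing to watch in getMovieDetails: it only replaces spaces with "+", so titles containing other special characters (the apostrophe in Ender's Game, the colon in Insidious: Chapter 2) go into the URL unescaped. Here is a minimal sketch of one way to build the query string with the standard library instead. The build_search_url name and the SEARCH_LINK constant are my own for this illustration; the URL and parameter names are the ones used in the code above.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

SEARCH_LINK = "http://api.rottentomatoes.com/api/public/v1.0/movies.json"


def build_search_url(key, title, page_limit=1):
    """Build the movie search URL with every parameter percent-encoded."""
    # A list of pairs keeps the parameters in a predictable order.
    query = urlencode([("apikey", key), ("q", title), ("page_limit", page_limit)])
    return "%s?%s" % (SEARCH_LINK, query)


if __name__ == "__main__":
    print(build_search_url("YOUR API KEY", "Insidious: Chapter 2"))
```

urlencode turns spaces into "+" just like the hand-rolled version, and additionally escapes characters such as the apostrophe and colon.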
Let’s spend some time improving this example. One simple improvement would be to put the API key into a config file. Another would be to actually store the information we’re downloading into a database. A third improvement would be to add some code that checks if we’ve already downloaded today’s current movies because there really isn’t a good reason to download today’s releases more than once a day. Let’s add those features!
Adding a Config File
I prefer and recommend ConfigObj for dealing with config files. Let’s create a simple “config.ini” file with the following contents:
[Settings]
api_key = API KEY
last_downloaded =
Now let’s change our code to import ConfigObj and change the getInTheaterMovies function to use it:
import requests
import simplejson
import urllib

from configobj import ConfigObj

#----------------------------------------------------------------------
def getInTheaterMovies():
    """
    Get a list of movies in theaters.
    """
    config = ConfigObj("config.ini")
    key = config["Settings"]["api_key"]
    url = "http://api.rottentomatoes.com/api/public/v1.0/lists/movies/in_theaters.json?apikey=%s"
    res = requests.get(url % key)

    data = res.content

    js = simplejson.loads(data)

    movies = js["movies"]
    for movie in movies:
        print movie["title"]
        getMovieDetails(key, movie["title"])
        print

#----------------------------------------------------------------------
if __name__ == "__main__":
    getInTheaterMovies()
As you can see, we import ConfigObj from the configobj package and pass it our filename. You could also pass it the fully qualified path. Next we pull out the value of api_key and use it in our URL. Since we have a last_downloaded value in our config, we should go ahead and add that to our code so we can prevent downloading the data multiple times a day.
import datetime
import requests
import simplejson
import urllib

from configobj import ConfigObj

#----------------------------------------------------------------------
def getInTheaterMovies():
    """
    Get a list of movies in theaters.
    """
    today = datetime.datetime.today().strftime("%Y%m%d")
    config = ConfigObj("config.ini")

    if today != config["Settings"]["last_downloaded"]:
        config["Settings"]["last_downloaded"] = today

        try:
            with open("config.ini", "w") as cfg:
                config.write(cfg)
        except IOError:
            print "Error writing file!"
            return

        key = config["Settings"]["api_key"]
        url = "http://api.rottentomatoes.com/api/public/v1.0/lists/movies/in_theaters.json?apikey=%s"
        res = requests.get(url % key)

        data = res.content

        js = simplejson.loads(data)

        movies = js["movies"]
        for movie in movies:
            print movie["title"]
            getMovieDetails(key, movie["title"])
            print

#----------------------------------------------------------------------
if __name__ == "__main__":
    getInTheaterMovies()
Here we import Python’s datetime module and use it to get today’s date in the following format: YYYYMMDD. Next we check if the config file’s last_downloaded value equals today’s date. If it does, we do nothing. However, if they don’t match, we set last_downloaded to today’s date and then we download the movie data. Now we’re ready to learn how to save the data to a database.
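The once-a-day check can also be pulled out into a small function, which makes it easy to try with different dates. This is only a sketch: needs_download is a name I made up, and it takes the stored last_downloaded string as an argument rather than reading config.ini itself.

```python
import datetime


def needs_download(last_downloaded, today=None):
    """Return (should_download, stamp) for a stored YYYYMMDD string.

    `today` defaults to the current date; passing it in makes testing easy.
    """
    if today is None:
        today = datetime.date.today()
    stamp = today.strftime("%Y%m%d")
    # An empty or older stamp means we have not downloaded today yet.
    return stamp != last_downloaded, stamp


if __name__ == "__main__":
    should, stamp = needs_download("")
    print("download needed: %s (new stamp: %s)" % (should, stamp))
```

The caller would then write stamp back to last_downloaded in the config file before fetching the feed, just as the code above does.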
Saving the Data with SQLite
Python has supported SQLite natively since version 2.5, so unless you’re using a really old version of Python, you should be able to follow along with this part of the article without any problems. Basically, we just need to add a function that can create a database and save our data into it. Here is the function:
import os
import sqlite3

#----------------------------------------------------------------------
def saveData(movie):
    """
    Save the data to a SQLite database
    """
    if not os.path.exists("movies.db"):
        # create the database
        conn = sqlite3.connect("movies.db")

        cursor = conn.cursor()

        # each table gets an id column so the foreign keys below
        # have a real primary key to point at
        cursor.execute("""CREATE TABLE movies
        (id integer primary key, title text, rated text,
        movie_synopsis text, critics_consensus text, runtime integer,
        critics_score integer, audience_score integer)""")

        cursor.execute("""
        CREATE TABLE cast
        (id integer primary key, actor text, character text)
        """)

        cursor.execute("""
        CREATE TABLE movie_cast
        (movie_id integer,
        cast_id integer,
        FOREIGN KEY(movie_id) REFERENCES movies(id),
        FOREIGN KEY(cast_id) REFERENCES cast(id)
        )
        """)
    else:
        conn = sqlite3.connect("movies.db")
        cursor = conn.cursor()

    # insert the data; NULL lets SQLite assign the id
    print
    sql = "INSERT INTO movies VALUES(NULL, ?, ?, ?, ?, ?, ?, ?)"
    cursor.execute(sql, (movie["title"],
                         movie["mpaa_rating"],
                         movie["synopsis"],
                         movie["critics_consensus"],
                         movie["runtime"],
                         movie["ratings"]["critics_score"],
                         movie["ratings"]["audience_score"]
                         )
                   )
    movie_id = cursor.lastrowid

    for actor in movie["abridged_cast"]:
        print "%s as %s" % (actor["name"], actor["characters"][0])
        sql = "INSERT INTO cast VALUES(NULL, ?, ?)"
        cursor.execute(sql, (actor["name"],
                             actor["characters"][0]
                             )
                       )
        cast_id = cursor.lastrowid

        # link this cast member to the movie
        sql = "INSERT INTO movie_cast VALUES(?, ?)"
        cursor.execute(sql, (movie_id, cast_id))

    conn.commit()
    conn.close()
This code first checks to see if the database file already exists. If it does not, it creates the database along with three tables; otherwise it simply connects to the existing file. Either way, saveData ends up with a connection and a cursor object. Next it inserts the data using the movie dictionary that is passed to it, and finally it commits the data to the database and closes the connection. We’ll call this function from getMovieDetails and pass it the movie dictionary.
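To get a feel for how the three tables fit together, here is a small sketch that reads a movie’s cast back out with a join. It runs against an in-memory database filled with placeholder rows, so it won’t touch movies.db. Note the assumptions: the build_demo_db and cast_for_movie helpers are my own, the schema is trimmed down to a few columns, and it assumes both the movies and cast tables have an integer primary key named id for movie_cast to point at.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a trimmed-down, assumed schema."""
    conn = sqlite3.connect(":memory:")
    # "cast" is quoted because CAST is also an SQL keyword.
    conn.executescript("""
        CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE "cast" (id INTEGER PRIMARY KEY, actor TEXT, character TEXT);
        CREATE TABLE movie_cast (
            movie_id INTEGER REFERENCES movies(id),
            cast_id INTEGER REFERENCES "cast"(id));
    """)
    # Placeholder rows, not real feed data.
    movie_id = conn.execute(
        "INSERT INTO movies (title) VALUES (?)", ("Example Movie",)).lastrowid
    for actor, role in [("Actor One", "Role One"), ("Actor Two", "Role Two")]:
        cast_id = conn.execute(
            'INSERT INTO "cast" (actor, character) VALUES (?, ?)',
            (actor, role)).lastrowid
        conn.execute("INSERT INTO movie_cast VALUES (?, ?)", (movie_id, cast_id))
    conn.commit()
    return conn


def cast_for_movie(conn, title):
    """Return (actor, character) pairs linked to the given movie title."""
    sql = """SELECT c.actor, c.character
             FROM movies AS m
             JOIN movie_cast AS mc ON mc.movie_id = m.id
             JOIN "cast" AS c ON c.id = mc.cast_id
             WHERE m.title = ?
             ORDER BY c.id"""
    return conn.execute(sql, (title,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for actor, role in cast_for_movie(demo, "Example Movie"):
        print("%s as %s" % (actor, role))
```

The movie_cast table is what ties a row in movies to its rows in cast, which is why saveData records lastrowid after each insert.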
You’re probably wondering what the complete code looks like. Well, here it is:
import datetime
import os
import requests
import simplejson
import sqlite3
import urllib

from configobj import ConfigObj

#----------------------------------------------------------------------
def getMovieDetails(key, title):
    """
    Get additional movie details
    """
    if " " in title:
        parts = title.split(" ")
        title = "+".join(parts)

    link = "http://api.rottentomatoes.com/api/public/v1.0/movies.json"
    url = "%s?apikey=%s&q=%s&page_limit=1"
    url = url % (link, key, title)
    res = requests.get(url)
    js = simplejson.loads(res.content)

    for movie in js["movies"]:
        print "rated: %s" % movie["mpaa_rating"]
        print "movie synopsis: " + movie["synopsis"]
        print "critics_consensus: " + movie["critics_consensus"]

        print "Major cast:"
        for actor in movie["abridged_cast"]:
            print "%s as %s" % (actor["name"], actor["characters"][0])

        ratings = movie["ratings"]
        print "runtime: %s" % movie["runtime"]
        print "critics score: %s" % ratings["critics_score"]
        print "audience score: %s" % ratings["audience_score"]
        print "for more information: %s" % movie["links"]["alternate"]
        saveData(movie)
    print "-" * 40
    print

#----------------------------------------------------------------------
def getInTheaterMovies():
    """
    Get a list of movies in theaters.
    """
    today = datetime.datetime.today().strftime("%Y%m%d")
    config = ConfigObj("config.ini")

    if today != config["Settings"]["last_downloaded"]:
        config["Settings"]["last_downloaded"] = today

        try:
            with open("config.ini", "w") as cfg:
                config.write(cfg)
        except IOError:
            print "Error writing file!"
            return

        key = config["Settings"]["api_key"]
        url = "http://api.rottentomatoes.com/api/public/v1.0/lists/movies/in_theaters.json?apikey=%s"
        res = requests.get(url % key)

        data = res.content

        js = simplejson.loads(data)

        movies = js["movies"]
        for movie in movies:
            print movie["title"]
            getMovieDetails(key, movie["title"])
            print

#----------------------------------------------------------------------
def saveData(movie):
    """
    Save the data to a SQLite database
    """
    if not os.path.exists("movies.db"):
        # create the database
        conn = sqlite3.connect("movies.db")

        cursor = conn.cursor()

        # each table gets an id column so the foreign keys below
        # have a real primary key to point at
        cursor.execute("""CREATE TABLE movies
        (id integer primary key, title text, rated text,
        movie_synopsis text, critics_consensus text, runtime integer,
        critics_score integer, audience_score integer)""")

        cursor.execute("""
        CREATE TABLE cast
        (id integer primary key, actor text, character text)
        """)

        cursor.execute("""
        CREATE TABLE movie_cast
        (movie_id integer,
        cast_id integer,
        FOREIGN KEY(movie_id) REFERENCES movies(id),
        FOREIGN KEY(cast_id) REFERENCES cast(id)
        )
        """)
    else:
        conn = sqlite3.connect("movies.db")
        cursor = conn.cursor()

    # insert the data; NULL lets SQLite assign the id
    print
    sql = "INSERT INTO movies VALUES(NULL, ?, ?, ?, ?, ?, ?, ?)"
    cursor.execute(sql, (movie["title"],
                         movie["mpaa_rating"],
                         movie["synopsis"],
                         movie["critics_consensus"],
                         movie["runtime"],
                         movie["ratings"]["critics_score"],
                         movie["ratings"]["audience_score"]
                         )
                   )
    movie_id = cursor.lastrowid

    for actor in movie["abridged_cast"]:
        print "%s as %s" % (actor["name"], actor["characters"][0])
        sql = "INSERT INTO cast VALUES(NULL, ?, ?)"
        cursor.execute(sql, (actor["name"],
                             actor["characters"][0]
                             )
                       )
        cast_id = cursor.lastrowid

        # link this cast member to the movie
        sql = "INSERT INTO movie_cast VALUES(?, ?)"
        cursor.execute(sql, (movie_id, cast_id))

    conn.commit()
    conn.close()

#----------------------------------------------------------------------
if __name__ == "__main__":
    getInTheaterMovies()
If you use Firefox, there’s a fun plugin called SQLite Manager that you can use to visualize the database that we’ve created. Here is a screenshot of what was produced at the time of writing:
Wrapping Up
There are still lots of things that should be added. For example, we need some code in the getInTheaterMovies function that will load the details from the database if we’ve already got the current data. We also need to add some logic to the database to prevent us from adding the same actor or movie multiple times. It would be nice if we had some kind of GUI or web interface as well. These are all things you can add as a fun little exercise.
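If you want a head start on the duplicate problem, one possible approach (certainly not the only one) is to let SQLite enforce uniqueness and use INSERT OR IGNORE. The sketch below uses an in-memory database and a cut-down movies table. The open_demo_db and add_movie helpers are invented for this example, and treating the title alone as unique is a simplifying assumption, since remakes can share a title.

```python
import sqlite3


def open_demo_db():
    """Open an in-memory database whose movies table rejects repeat titles."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT UNIQUE)")
    return conn


def add_movie(conn, title):
    """Insert the title if it is new and return its id either way."""
    # OR IGNORE silently skips the insert when the UNIQUE constraint would fail.
    conn.execute("INSERT OR IGNORE INTO movies (title) VALUES (?)", (title,))
    row = conn.execute(
        "SELECT id FROM movies WHERE title = ?", (title,)).fetchone()
    return row[0]


if __name__ == "__main__":
    demo = open_demo_db()
    first = add_movie(demo, "Gravity")
    second = add_movie(demo, "Gravity")  # same id, no second row
    print(first == second)
```

The same pattern would work for the cast table with a uniqueness constraint on whatever combination of columns you decide identifies a cast member.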
By the way, this article was inspired by the Real Python for the Web book by Michael Herman. It has lots of neat ideas and samples in it. You can check it out here.
Related Reading
- Python: A Simple Step-by-Step SQLite Tutorial
- wxPython and SQLAlchemy: Loading Random SQLite Databases for Viewing
- StackOverflow: SQLite foreign key examples
- Python’s official documentation on the sqlite3 module
I wasn’t aware that requests had json support built in. I probably should have used Python’s json module though…
What version of Python are you guys using? I’m getting a syntax error with v3. I’m also getting “global name ‘requests’ is not defined”. Has anyone been able to use this code with just copy and paste?
I was using Python 2.6 or 2.7 when I wrote the article. I also mentioned at the beginning that you’ll need to get the requests and simplejson packages: http://www.python-requests.org/en/latest/, https://pypi.python.org/pypi/simplejson/
Is there any Rotten Tomatoes API that can return all the movies to date, with the same info in the result?
I don’t know, actually. You’ll want to read their API documentation to find out. I don’t think it had that capability when I wrote this article, but that was nearly two years ago, so it has hopefully changed.
It’s done, Mike. I found an API to do that.
Thanks.
Hi, I’ve tried your first script, but it doesn’t display anything. May I know why?
I am guessing that something in their API has changed. You will note that this article was written almost 4 years ago originally. I will have to take a look at the JSON to determine what’s going on.
Rotten Tomatoes’ developer site is having issues. It looks like my current API key needs to be reset, but as I can’t get their site to work correctly, I am unable to debug this further at this time.
Yep, my developer API key has expired, so I didn’t actually receive any JSON data in the response. Instead, I got a line of HTML saying the developer account is inactive, so simplejson couldn’t find any JSON in the response.
Your code works like a gem!
That’s what I am getting as well. But I can’t reset my account’s password as they won’t send the reset email. I also tried signing up for a new account, but they said an error occurred sending me the verification email, so they have major issues in their email system. Maybe they don’t support 3rd party developers any more.