This tutorial provides an overview of tools for geocoding – converting addresses or names of locations into latitudes and longitudes – using the Google Maps API. For those interested in a little demystification of the term, supplementary information on what “API” actually means is also provided below, although it is not central to the tutorial.
Google offers a service that allows users to submit requests, from within R, for the latitudes and longitudes associated with different addresses or place names, and to get back results that are easy to work with in R. This service is called the Google Geocoding API (Section 2 below discusses what an API is in general terms).
Basically, the Google Maps API will accept any query you could type into Google Maps and return information on Google’s best guess for the latitude and longitude associated with your query. The tool for doing this from within R is found in the ggmap library, and the basic syntax is as follows:
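library(ggmap)
# Recent versions of ggmap need a Google API key registered first,
# e.g. with register_google(key = "YOUR_API_KEY")
addresses <- c("1600 Pennsylvania NW, Washington, DC", "denver, co")
# output = "more" also returns the type and loctype columns discussed below
locations <- geocode(addresses, source = "google", output = "more")
locations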
lon lat type loctype
1 -77.03648 38.89768 street_address rooftop
2 -104.99025 39.73924 locality approximate
address north south
1 1600 pennsylvania ave nw, washington, dc 20500, usa 38.89902 38.89633
2 denver, co, usa 39.91425 39.61443
east west street_number route
1 -77.03513 -77.03783 1600 Pennsylvania Avenue Northwest
2 -104.60030 -105.10993 <NA> <NA>
neighborhood locality administrative_area_level_1
1 Northwest Washington Washington District of Columbia
2 <NA> Denver Colorado
country postal_code administrative_area_level_2
1 United States 20500 <NA>
2 United States <NA> Denver County
Note that the output option can be set to “latlon”, “latlona”, “more”, or “all” depending on how much information you want back. I would STRONGLY recommend always using the output="more" option so that you get information on how certain Google is about its guess!
geocode results
Geocoding results include two fields that are very important to understand: loctype and type.
Google thinks of the world as containing a number of different types of locations. Some are points (like houses), while others are areas (like cities). The type field tells you if Google is giving you the location of a house, or just the centroid of a city. A full list of the different types is available in Google’s documentation, but the most common results (in my experience) are:
street_address: indicates a precise street address
locality: indicates an incorporated city or town political entity
point_of_interest: indicates a named point of interest. Typically, these “POI”s are prominent local entities that don’t easily fit in another category, such as “Empire State Building” or “Statue of Liberty.”
administrative_area_level_[SOME NUMBER]: these are “civil entities”, where 1 is the first administrative level below the country (in the US, states), 2 is below that (in the US, counties), etc.
This is important because if you get a locality or administrative area, the latitude and longitude you get is just the centroid of that area, and you should interpret it as such!
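One way to act on this is to flag results whose type describes an area rather than a point. The sketch below rebuilds, by hand, a few columns of the output shown above, so it runs without querying Google. The is_centroid column name and the area_pattern variable are my own choices for illustration, not something ggmap returns.

```r
# Rebuild the relevant columns of the geocode() output shown earlier by hand,
# so this example runs without sending any queries to Google.
locations <- data.frame(
  address = c("1600 pennsylvania ave nw, washington, dc 20500, usa",
              "denver, co, usa"),
  lon     = c(-77.03648, -104.99025),
  lat     = c(38.89768, 39.73924),
  type    = c("street_address", "locality"),
  stringsAsFactors = FALSE
)

# Types that describe an area, so the coordinates are only a centroid.
# "is_centroid" is a hypothetical column name used for illustration.
area_pattern <- "^(locality|administrative_area_level_[0-9]+)$"
locations$is_centroid <- grepl(area_pattern, locations$type)

locations[, c("address", "type", "is_centroid")]
```

Here the White House row is treated as a point, while the Denver row is flagged as a centroid that should be interpreted with care.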
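The loctype column provides similar but distinct information to type, including more information about street_address results. In order of precision, the possible values for this field are:
rooftop: a precise location, accurate down to the street address
range_interpolated: an approximation interpolated between two precise points (such as intersections)
geometric_center: the geometric center of a feature such as a street or a region
approximate: an approximate location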
Google limits individual (free) users to 2,500 queries per day and 10 queries per second. You can see how many queries you have remaining by typing geocodeQueryCheck(). You can also buy additional requests (up to 100,000 a day) for $0.50 / 1000 requests, and there is a paid subscription service that will provide up to 100,000 requests a day.
Because of this, it is important that you not test your code on your entire dataset or you’ll waste your queries when you’re debugging your code!
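A simple habit that helps here is to run your code on only the first few rows until it works. The sketch below is one way to do that. Note that geocode_sample and fake_geocoder are hypothetical helpers written for this illustration, and the addresses are made up; the fake geocoder stands in for ggmap's geocode so that the example uses no queries at all.

```r
# Hypothetical helper: geocode only the first n addresses while debugging.
# "geocoder" is whatever function does the real work (e.g. ggmap's geocode).
geocode_sample <- function(addresses, n = 5, geocoder) {
  sample_addresses <- head(addresses, n)
  geocoder(sample_addresses)
}

# Stand-in geocoder that makes no web requests. It returns NA coordinates
# and exists only so this example runs offline.
fake_geocoder <- function(x) {
  data.frame(address = x, lon = NA_real_, lat = NA_real_,
             stringsAsFactors = FALSE)
}

# Made-up addresses, used purely for illustration
all_addresses <- paste(1:1000, "Main St, Springfield")
test_results <- geocode_sample(all_addresses, n = 5, geocoder = fake_geocoder)
nrow(test_results)  # 5 rows, so 5 queries would be used instead of 1000
```

Once the code works on the sample, you could pass something like geocoder = function(x) geocode(x, source = "google", output = "more") and raise n.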
Solutions at end of tutorial, but no cheating!
Load the addresses_to_geocode.csv from the RGIS4_Data folder into a data frame using read.csv().
Geocode the addresses using the geocode function. Note that if read.csv imported the addresses as a “factor” variable you may have to convert it to a character vector.
Did you get latitudes and longitudes for all the addresses?
Look at the results for Observation 3, which is misspelled. What did Google do?
Look at the results for Observation 4 (there is no place called “the zz room 123” at Stanford). What did Google do? Do you like this behavior or not?
Why might you want to take the latitude and longitude for Observation 6 with a grain of salt?
API is an (uninformative) acronym for “Application Programming Interface”.
It is often the case that a user working in one programming environment (like R) would like to take advantage of tools written in a different language. The goal of an API is to facilitate this by acting as a cross between a translator and a messenger between two different programs.
Let’s use an example of an API that you are already quite familiar with, even if you didn’t realize it. The rgeos library from the RGIS2 tutorial is actually just an API for a program called GEOS written in a language called C++. When people started realizing that they wanted to do GIS analysis in R, they realized that it didn’t make sense to write a whole new set of tools in R for geometric operations when there was already a really sophisticated program written for this exact purpose. At the same time, however, they didn’t want to require R users to export their data to the hard drive, open a different program in a language they might not know, use that program to execute a calculation, then re-import the data.
In steps the API, rgeos. When you use rgeos, here’s what’s really happening:
You tell rgeos – in R – what you would like to happen and give it the data you want to manipulate.
rgeos translates those commands into commands that GEOS can understand.
rgeos converts your data to data that GEOS can use.
GEOS then does its analysis, and gives the results to rgeos.
rgeos then converts those results back to R data, and gives it back to you!
That’s it. It’s just a middle man who “speaks” both R and GEOS, and who’s willing to run back and forth between these programs to make your life easier!
(Wanna know a secret? Many of the most popular libraries in R are actually just APIs for libraries written in C or C++!)
A web API is just a special kind of API that stands between an internet browser and the servers of a company like Google or Twitter. Its job is to accept requests for data sent as URLs (over HTTP), convert them into whatever language a company’s servers use, get the result requested, and return it to the user in a format that’s easy to work with (usually something called JSON).
This last point is key – most of the time, what a web API does is take a web page you are familiar with (like Google Maps) and strip away everything that is distracting to a computer program, like nice pictures and fancy formatting.
The way your computer accesses information on the internet is by composing a request in the form of a web address, known formally as a URL. Most of the time, however, this happens without you knowing it. For example, if you type “42” into Google and click search, the result just appears. But look in the address bar above, and you can see the exact code used to get you those search results, which is sometimes as simple as https://www.google.com/#q=42 (though, depending on your computer, it may be much longer). Basically, when you click buttons in your browser, the browser converts your clicks into a URL and sends that request out to the internet.
With a Web API, we skip the step where a user clicks a button in the browser with their mouse, and instead just create customized URLs to ask for the data we need. This is a little less intuitive for humans, but is much easier for computers.
URL Components
There are a couple of specific components in most URLs, and understanding these components will be helpful both for web APIs and in other situations like web-scraping. For example, consider the following link to a YouTube video: https://www.youtube.com/watch?v=dQw4w9WgXcQ.
Query Strings are how you send extra information to an internet server. They generally start with a ? or a # (strictly speaking, the part after a # is called a fragment and is read by the browser rather than the server), and are followed by a number of variable-value pairs. In this case, for example, you’re saying that you want YouTube to know that in your request, the variable v has the value dQw4w9WgXcQ (the internal name for this particular video). So basically, this YouTube link says: “Hey, YouTube – I would like to access the program you keep in your /watch directory, and when you call that program, please tell it that the variable v should be set to dQw4w9WgXcQ.”
In this way, a URL is a lot like a function call in R where the query string contains the arguments you are passing to the function!
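To make the analogy concrete, here is a minimal sketch in base R that pulls a URL apart into the “function” (the address) and its “arguments” (the query string). parse_query is a hypothetical helper written for this illustration; it assumes a single ? in the URL and simple key=value pairs, which real URLs do not always follow.

```r
# Split a URL into its address and its query-string "arguments" using base R.
# parse_query is a hypothetical helper written for this illustration.
parse_query <- function(url) {
  # Everything before the "?" is the address; everything after is the query
  parts <- strsplit(url, "?", fixed = TRUE)[[1]]
  # Variable-value pairs are separated by "&", and each pair is "name=value"
  pairs <- strsplit(parts[2], "&", fixed = TRUE)[[1]]
  key_value <- strsplit(pairs, "=", fixed = TRUE)
  values <- vapply(key_value, function(kv) utils::URLdecode(kv[2]), character(1))
  names(values) <- vapply(key_value, function(kv) kv[1], character(1))
  list(address = parts[1], arguments = as.list(values))
}

request <- parse_query("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
request$address      # the "function" being called
request$arguments$v  # the value passed for the "argument" v
```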
The main value of a Web API is that the response it provides to a query is written in a stripped-down format that’s easy for a computer to understand. Consider the two following results provided by Google Maps for the search term “Kalamazoo, MI”, one through Google Maps and one through the Google Maps API:
The figure on the left – the “human-readable” result – is easy to look at, but consider how hard it would be to tell a computer to zero in on the one piece of data you want and to ignore the map background, the Google Earth button, the photos from Kalamazoo, the various options for modifying the map, etc.
ggmap
When I said that the Google Maps API allows users to make queries to Google from R, I was actually oversimplifying. The Google Geocoding API operates by responding to specially crafted URLs with nicely formatted outputs. The geocode command in ggmap is actually an API to the Google API that does the work of converting your query into a URL, sending the URL to the Google API, getting the results back, and converting them into an R data frame.
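To give a feel for the first of those steps, the sketch below builds (but does not send) the kind of URL the Google Geocoding API expects. This is not ggmap’s actual code, and its internals may differ: build_geocode_url is a hypothetical helper, and "YOUR_API_KEY" is a placeholder. The address and key parameters and the endpoint itself are the ones Google documents for this service.

```r
# Build (but do not send) a Google geocoding request URL with base R.
# build_geocode_url is a hypothetical helper, not a ggmap function, and
# "YOUR_API_KEY" is a placeholder for a real key.
build_geocode_url <- function(address, key = "YOUR_API_KEY") {
  base <- "https://maps.googleapis.com/maps/api/geocode/json"
  # reserved = TRUE also escapes characters like "," and "&", so the
  # address cannot be mistaken for extra query-string arguments
  query <- paste0("address=", utils::URLencode(address, reserved = TRUE),
                  "&key=", key)
  paste0(base, "?", query)
}

url <- build_geocode_url("denver, co")
url
```

The remaining steps (sending the URL, reading the JSON that comes back, and reshaping it into a data frame) are what geocode takes care of for you.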
This, as it turns out, is really common – people often build APIs on top of APIs. Google, in writing an API that takes requests for data in the form of URL queries and returns results in a generic format (called JSON), is creating a very generalizable platform that can be used from lots of programs. Then other people can build on this for specific applications, like Python, R, etc.
So, to recap:
The geocode function in ggmap? It’s an API to the Google Maps API!
Solutions to Exercise 1
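library(ggmap)
# Load the addresses to geocode
addresses <- read.csv("RGIS4_Data/addresses_to_geocode.csv")
# read.csv may import text as a factor; geocode() needs a character vector
addresses$Location <- as.character(addresses$Location)
results <- geocode(addresses$Location, source = "google", output = "more")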
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.