Geopy is a useful Python package for working with spatial data: it can look up the coordinates of addresses, cities, countries, and landmarks, and it can do the reverse, finding an address from a pair of coordinates. In the Two Sigma Connect competition on Kaggle, I found that some of the given latitudes and longitudes did not match the addresses, so I used Geopy to clean up the mismatched coordinates of the rental listings.
Import Packages and Read Data
I am working with rental listings in New York City, so every address should be in NYC. I append 'New York City' to each address to help the geocoder resolve it to the right place.
from geopy.geocoders import GoogleV3

# bad_location holds the listings whose coordinates do not match their addresses
bad_location['street_address'] = bad_location['street_address'] + ', New York City'
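How the mismatched listings were identified is not shown here. One simple way to flag suspicious coordinates is to check whether they fall inside a rough bounding box around the city. The sketch below assumes approximate box limits, and the function name is_outside_nyc and the sample listings are hypothetical, used only for illustration.

```python
# Approximate bounding box around New York City (assumed values for this sketch).
NYC_LAT_RANGE = (40.47, 40.93)
NYC_LON_RANGE = (-74.27, -73.68)


def is_outside_nyc(latitude, longitude):
    """Return True when a coordinate pair falls outside the rough NYC box."""
    lat_ok = NYC_LAT_RANGE[0] <= latitude <= NYC_LAT_RANGE[1]
    lon_ok = NYC_LON_RANGE[0] <= longitude <= NYC_LON_RANGE[1]
    return not (lat_ok and lon_ok)


# Hypothetical listings as (address, latitude, longitude) tuples.
listings = [
    ('792 Metropolitan Avenue', 40.7145, -73.9425),
    ('145 East 16th Street', 0.0, 0.0),  # placeholder coordinates
]

# Keep only the rows whose coordinates look wrong.
suspicious = [row for row in listings if is_outside_nyc(row[1], row[2])]
```

With a pandas DataFrame, the same check could be applied row by row, for example with df.apply(lambda row: is_outside_nyc(row['latitude'], row['longitude']), axis=1), to build a table like bad_location.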
Geocoding
There are several geolocation services we can use, such as Google Maps, Bing Maps, or Yahoo BOSS. I chose the Google Maps Geocoding API (GoogleV3 in Geopy) because it copes well with incomplete addresses. With the standard API, we get 2,500 free requests per day and up to 50 requests per second. Before using it, we need to register for a free API key by following this link.
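from geopy.exc import GeocoderTimedOut

def do_geocode(address):
    # address is a pandas Series of address strings
    geolocator = GoogleV3(api_key='your_api_key')
    try:
        # geocode() returns None when no match is found, so guard against that
        location = address.apply(geolocator.geocode)
        latitude = location.apply(lambda x: x.latitude if x else None)
        longitude = location.apply(lambda x: x.longitude if x else None)
        return (latitude, longitude)
    except GeocoderTimedOut:
        # Retry the whole batch if a request times out
        return do_geocode(address)

(latitude, longitude) = do_geocode(bad_location['street_address'])
bad_location.loc[:, 'latitude'] = latitude
bad_location.loc[:, 'longitude'] = longitude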
It is quite simple: within a few minutes, we can generate coordinates for all of the addresses.
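The do_geocode function above retries the entire batch whenever a single request times out, and it has no limit on the number of retries. A minimal sketch of a per-address alternative with a bounded number of attempts might look like the following. The name geocode_with_retry and its parameters are hypothetical and not part of Geopy; the geocoding function is passed in as an argument, so the helper itself depends only on the standard library.

```python
import time


def geocode_with_retry(geocode, address, retry_on=(Exception,),
                       max_retries=3, wait_seconds=1.0):
    """Call geocode(address), retrying a bounded number of times.

    Returns (latitude, longitude), or (None, None) if the address cannot be
    resolved or every attempt raised one of the retry_on exceptions.
    """
    for attempt in range(max_retries):
        try:
            location = geocode(address)
        except retry_on:
            # Wait a little longer after each failed attempt, then try again.
            time.sleep(wait_seconds * (attempt + 1))
            continue
        if location is None:
            # The service answered but found no match for this address.
            return (None, None)
        return (location.latitude, location.longitude)
    # All attempts failed with a retryable error.
    return (None, None)
```

With Geopy, this could be called for each address as geocode_with_retry(geolocator.geocode, address, retry_on=(GeocoderTimedOut,)), so a timeout on one listing only repeats that one request and an address that cannot be resolved simply yields missing coordinates.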