If you haven’t already read my previous posts on how we did some interesting clustering and data mining using tweets from Twitter, I would suggest reading the posts here, here, here, here and here. If you have read them, you already know that we used users’ locations as one of the factors while clustering the users. This information is not directly available from user profiles: unless a user posts some geocoded tweets, there is no easy way to find out their location. Twitter does have a location field in the user profile, but it is free text rather than geocoded, so we cannot extract the location directly from there either. Considering that only about 1 percent of the tweets on Twitter are geocoded, it becomes quite difficult to do anything based on users’ locations.

We used the Google Maps API to obtain the complete address of the user from the location given in their Twitter profile. Doing this is extremely simple using the API: get yourself an API key, and then it's just one call with the address to obtain the complete geocoded information. Here's how to do it with Java.

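// Build the geocoding request; the address is URL-encoded so spaces and commas survive.
URL url = new URL("http://maps.google.com/maps/geo?output=json&q="
		+ URLEncoder.encode(address, "UTF-8") + "&key=" + key);
URLConnection conn = url.openConnection();

// Read the whole response body into memory (IOUtils is presumably Apache Commons IO).
ByteArrayOutputStream output = new ByteArrayOutputStream(1024);
IOUtils.copy(conn.getInputStream(), output);
output.close();

// Decode the bytes once, explicitly as UTF-8, and parse the JSON.
String response = output.toString("UTF-8");
GAddress gaddr = new GAddress();
JSONObject json = JSONObject.fromObject(response);

// Anything other than status code 200 means the lookup failed.
if (json.getJSONObject("Status").getInt("code") != 200) {
	return null;
}
// Use the first match in the response.
JSONObject placemark = (JSONObject) query(json, "Placemark[0]");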

// Most fields live under these common prefixes in the response.
final String commonId = "AddressDetails.Country.AdministrativeArea";
final String locality = commonId + ".SubAdministrativeArea.Locality";

// Caution: if query(...) yields null for a path the response lacks (likely for
// vague locations such as a bare country name), the toString() calls below throw.
gaddr.setFullAddress(query(placemark, "address").toString());
gaddr.setZipCode(query(placemark, locality + ".PostalCode.PostalCodeNumber").toString());
gaddr.setAddress(query(placemark, locality + ".Thoroughfare.ThoroughfareName").toString());
gaddr.setCity(query(placemark,
		commonId + ".SubAdministrativeArea.SubAdministrativeAreaName").toString());
gaddr.setState(query(placemark, commonId + ".AdministrativeAreaName").toString());

// Coordinates are ordered [longitude, latitude].
gaddr.setLat(Double.parseDouble(query(placemark, "Point.coordinates[1]").toString()));
gaddr.setLng(Double.parseDouble(query(placemark, "Point.coordinates[0]").toString()));

gaddr.setCountryName(query(placemark, "AddressDetails.Country.CountryName").toString());
gaddr.setCountryCode(query(placemark, "AddressDetails.Country.CountryNameCode").toString());
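
The snippet above leans on a query helper that isn't shown. As a rough idea of what it might do, here is a minimal sketch that resolves a dotted path with optional [index] suffixes. The class name JsonPath is made up for illustration, and the sketch walks plain java.util.Map and java.util.List values rather than the JSON library's own types so that it runs without dependencies. It also assumes that returning null for a missing step is the desired behaviour; the real helper may well differ.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the query helper; not the original implementation. */
final class JsonPath {
    private JsonPath() {}

    /**
     * Resolves a path such as "Placemark[0].Point.coordinates[1]".
     * Returns null as soon as any step of the path is missing.
     */
    static Object query(Object root, String path) {
        Object current = root;
        for (String segment : path.split("\\.")) {
            String name = segment;
            int index = -1;

            // Split "coordinates[1]" into the key "coordinates" and the index 1.
            int bracket = segment.indexOf('[');
            if (bracket >= 0 && segment.endsWith("]")) {
                name = segment.substring(0, bracket);
                index = Integer.parseInt(segment.substring(bracket + 1, segment.length() - 1));
            }

            // Every segment names a key, so the current node must be an object.
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(name);

            // Apply the optional array index to the value just looked up.
            if (index >= 0) {
                if (!(current instanceof List)) {
                    return null;
                }
                List<?> list = (List<?>) current;
                if (index >= list.size()) {
                    return null;
                }
                current = list.get(index);
            }
        }
        return current;
    }
}
```

With a helper along these lines, a null check before each toString() call would let a vague location (say, only a country) produce a partially filled GAddress instead of an exception.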

Here, GAddress is a custom Java class that holds the zip code, address, city, state and so on.
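
For completeness, such a class could look roughly like the sketch below. The field names are inferred from the setters called in the snippet above; the field types, the getters and the toString() format are assumptions made for illustration, not the original class.

```java
/** Hypothetical sketch of the GAddress holder; types and getters are assumptions. */
final class GAddress {
    private String fullAddress;
    private String address;
    private String city;
    private String state;
    private String zipCode;
    private String countryName;
    private String countryCode;
    private double lat;
    private double lng;

    public String getFullAddress() { return fullAddress; }
    public void setFullAddress(String fullAddress) { this.fullAddress = fullAddress; }

    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }

    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }

    public String getState() { return state; }
    public void setState(String state) { this.state = state; }

    public String getZipCode() { return zipCode; }
    public void setZipCode(String zipCode) { this.zipCode = zipCode; }

    public String getCountryName() { return countryName; }
    public void setCountryName(String countryName) { this.countryName = countryName; }

    public String getCountryCode() { return countryCode; }
    public void setCountryCode(String countryCode) { this.countryCode = countryCode; }

    public double getLat() { return lat; }
    public void setLat(double lat) { this.lat = lat; }

    public double getLng() { return lng; }
    public void setLng(double lng) { this.lng = lng; }

    /** Compact form for logging: full address followed by the coordinates. */
    @Override
    public String toString() {
        return fullAddress + " (" + lat + ", " + lng + ")";
    }
}
```

Keeping latitude and longitude as primitive doubles matches the Double.parseDouble calls above, while the remaining fields stay as strings because that is how they arrive from the geocoder.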