nvkelso edited this page Jan 7, 2013 · 42 revisions

Geocoding is the hardest part of going open source geo.

Web services:

Open source tools:

  • twofishes - A coarse splitting geocoder in Scala from Foursquare, by David Blackman. Based primarily on GeoNames.org data.
  • datasciencetoolkit - US and UK street address geocoding, some reverse geocoding options. Free, download a VM instance.
  • pySHPgeocode - This Python package reverse geocodes data points to shapefile regions. It is developed primarily for assigning administrative region codes to a set of geo coordinates (in standard latitude/longitude). All you need is a shapefile of the regions you want to geocode to.
  • OSM Nominatim - Used in OpenStreetMap, tied to OSM data.
  • OSM Imposm.geocoder - Optimized for German addresses.
  • MapQuest's version of Nominatim - Warning, assumes USA addresses.
  • GeoCommons GeoCoder - An open-source geocoder using open data, mostly just for the US, based on Census geography files.
  • Gisgraphy - an open-source Geocoder using open data
  • GPLplanet - Runtime PHP/MySQL libs for Yahoo! GeoPlanet
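
Reverse geocoding points to regions, as pySHPgeocode is described as doing above, comes down to a point-in-polygon test. The following is a minimal sketch in plain Python, assuming the region outlines have already been read out of a shapefile into lists of (lon, lat) vertices. The region codes and coordinates are made up for illustration, and the function names are hypothetical; this is not pySHPgeocode's actual API.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def reverse_geocode(lon, lat, regions):
    """Return the code of the first region containing the point, or None."""
    for code, ring in regions.items():
        if point_in_polygon(lon, lat, ring):
            return code
    return None


# Hypothetical regions: two adjacent boxes, not real administrative boundaries.
REGIONS = {
    "WEST": [(-78.0, 38.0), (-77.0, 38.0), (-77.0, 39.0), (-78.0, 39.0)],
    "EAST": [(-77.0, 38.0), (-76.0, 38.0), (-76.0, 39.0), (-77.0, 39.0)],
}

print(reverse_geocode(-77.5, 38.5, REGIONS))
print(reverse_geocode(-76.5, 38.5, REGIONS))
print(reverse_geocode(-70.0, 38.5, REGIONS))
```

A linear scan over all regions is fine for a few hundred polygons. For larger region sets a bounding-box check or spatial index in front of the test would likely be needed, and polygons with holes or multiple parts need more than a single ring.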

###Not a batch geocoder, but useful:

###Data sources:###

  • Yahoo! GeoPlanet - Yahoo! GeoPlanet helps bridge the gap between the real and virtual worlds by providing an open, permanent, and intelligent infrastructure for geo-referencing data on the Internet. Includes some neighborhood names. Can be paired with Alpha Shapes below for rough lat/long and shape outlines. CC-BY.
  • Flickr Alpha Shapes - Shapes for roughly two hundred and seventy thousand (270K) WOE IDs. Determined by analyzing the geography tags in millions of Flickr photos. Woo.
  • GeoNames.org - The GeoNames geographical database covers all countries and contains between 2 and 8 million placenames that are available for download free of charge. Readme for files and layout. Download list.
  • Natural Earth - Several thousand names of populated places gets you 1/2 of the world's population. What other database can claim that? ;)
  • NGA GeoNet Names Service - The Geographic Names Server is the official repository of standard spellings of all foreign place names, sanctioned by the United States Board on Geographic Names. The database also contains variant spellings (cross-references), which are useful for finding purposes. We are starting to hold the native script spellings of these names. All the geographic features in the database contain information about location, administrative division, and quality. The database can be used for a variety of purposes, including establishing official spellings of foreign place names, cartography, GIS, GEOINT, and finding places.
  • US Board on Geographic Names - The U.S. Board on Geographic Names is a Federal body created in 1890 and established in its present form by Public Law in 1947 to maintain uniform geographic name usage throughout the Federal Government. The Board promulgates official geographic feature names with locative attributes as well as principles, policies, and procedures governing the use of domestic names, foreign names, Antarctic names, and undersea feature names.
  • OSM Address Guide - See also State of the Map presentation on the State of Findability by Steven Johnson @geomantic.

###Entity Extraction###

  • Placemaker - Yahoo! Placemaker is a freely available geoparsing Web service. It helps developers make their applications location-aware by identifying places in unstructured and atomic content and returning geographic metadata for geographic indexing and markup. Provided with free-form text, the service identifies places mentioned in text, disambiguates those places, and returns unique identifiers (WOEIDs) for each, as well as information about how many times the place was found in the text, and where in the text it was found.
  • OpenCalais - Using natural language processing (NLP), machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.
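
The kind of output described for Placemaker (an identifier per place, how many times it was found, and where in the text) can be illustrated with a toy gazetteer lookup. This is only a sketch under stated assumptions: the gazetteer, its identifiers, and the `extract_places` function are hypothetical, the identifiers are not real WOEIDs, and no disambiguation is attempted, which is the hard part the real services handle.

```python
import re

# Hypothetical gazetteer: place name -> made-up identifier (not real WOEIDs).
GAZETTEER = {
    "Dupont Circle": "place-001",
    "White House": "place-002",
    "Washington": "place-003",
}


def extract_places(text, gazetteer=GAZETTEER):
    """Find gazetteer names in text; return id, count, and character offsets per name."""
    results = {}
    # List longer names first so they win over shorter names starting at the same spot.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    for match in pattern.finditer(text):
        name = match.group(1)
        entry = results.setdefault(
            name, {"id": gazetteer[name], "count": 0, "offsets": []}
        )
        entry["count"] += 1
        entry["offsets"].append(match.start())
    return results


text = "From Dupont Circle we walked to the White House, then back to Dupont Circle."
for name, info in extract_places(text).items():
    print(name, info)
```

Exact string matching like this misses variant spellings and cannot tell "Washington" the city from "Washington" the state, so it is a starting point rather than a replacement for a geoparsing service.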

###What's in a location?###

Location Types:

  • Street Address: - An address consists of a street number, a street name, and a quadrant (NE, NW, SE, SW). The address number and street name are required.

    Example: 441 4th ST NW

    Input, gazetteer: a point XY, matches free text to points, polys, or street segment lines with address ranges.

    Result: New point location. The NAME normalized version of the address, including City, State, [Country], and ZIP+[4]. Optional: aggregated form placed back onto the POLYs/POINTS/etc used for matching.

  • Intersection: - An Intersection consists of two streets.

    Example: 14th ST NW and Pennsylvania Avenue NW, or 14th ST NW & Pennsylvania Avenue NW, or 16th ST NW over Military RD NW, or Military RD NW under 16th ST NW

    Input, gazetteer: a point XY, matches free text to points, polys.

    Result: New point location. The NAME normalized version of the intersection. Optional: aggregated form placed back onto the POLYs used for matching.

  • Block: - A block consists of a street and any other cross streets.

    Example: 4th ST NW from D Street NW to E Street NW or 400 Block of 4th St NW

    Input, gazetteer: a point XY or line segment, matches free text to points, polys, or street segment lines with address ranges.

    Result: New point location at mid zooms, new lines at high zooms, now with NAME normalized form and uniqueID added to their attributes. Optional: aggregated form placed back onto the POLYs/LINES/PTS used for matching.

  • Place, POI Names: - A place name consists of common place names (neighborhoods etc.) and public/institutional building names.

    Example: "White House" or "Dupont Circle" or "Wilson Building" or "Wilson High School" or "Woodrow Wilson House"

    Input, gazetteer: a point XY or polygon, matches free text to points, polys.

    Result: New points, now with NAME normalized form and uniqueID added to their attributes. Optional: aggregated form placed back onto the POLYs/PTS used for matching.

  • Cities: - Both incorporated and unincorporated. Usually disambiguated as a city-state pair, sometimes with a country context as well.

    Example: "Washington, DC" with context being USA.

    Input, gazetteer: a point XY or polygon, matches free text to points, polys.

    Result: New points or same points that went in, now with NAME normalized form and uniqueID added to their attributes. Optional: aggregated form placed back onto the POLYs used for matching.

  • Zipcodes: - Used for postal routing. Point-in-polygon tests often add the name of the containing polygon to the point's attribute columns.

    Example: "95501" with context being USA.

    Input, gazetteer: a point XY or polygon, matches free text to points, polys.

    Result: New points or same points that went in, now with ZIP normalized form added to their attributes. Optional: aggregated form placed back onto the POLYs used for matching.

  • States, counties, countries: - Other administrative units.

    Example: "Humboldt County, California" with context being USA. Example: "California" with context being USA. Example: "United States" with context being Planet Earth.

    Input, gazetteer: a point XY or polygon, matches free text to points, polys.

    Result: New points or same points that went in, now with NAME normalized form added to their attributes. Optional: aggregated form placed back onto the POLYs used for matching.
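
Matching a street address against street segment lines with address ranges, as described for the Street Address and Block types above, usually ends with linear interpolation along the matched segment to produce the new point location. The following is a minimal sketch, assuming the segment has already been matched and carries a from/to house number and two endpoints. The `Segment` type, its field names, and the coordinates are hypothetical, and the sketch ignores odd/even street sides and offsets from the centerline.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical street segment with an address range (illustrative only)."""
    street: str
    from_number: int
    to_number: int
    start: tuple  # (lon, lat) at from_number
    end: tuple    # (lon, lat) at to_number


def interpolate_address(number, segment):
    """Linearly place a house number along a segment; None if outside its range."""
    low = min(segment.from_number, segment.to_number)
    high = max(segment.from_number, segment.to_number)
    if not low <= number <= high:
        return None
    span = segment.to_number - segment.from_number
    # Fraction of the way from start to end; a single-number range maps to the start.
    t = 0.0 if span == 0 else (number - segment.from_number) / span
    lon = segment.start[0] + t * (segment.end[0] - segment.start[0])
    lat = segment.start[1] + t * (segment.end[1] - segment.start[1])
    return (lon, lat)


# Made-up coordinates standing in for the "400 Block of 4th ST NW" example.
block = Segment("4TH ST NW", 400, 498, (-77.0160, 38.8950), (-77.0160, 38.8970))
print(interpolate_address(441, block))
print(interpolate_address(600, block))
```

Interpolated points are estimates: house numbers are rarely spaced evenly along a block, so the result is best treated as approximate at high zooms.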

Tile Previews

Image source: DC MAR
