NAME
Geo::Coder::US - Geocode (estimate latitude and longitude for) any US
address
SYNOPSIS
use Geo::Coder::US;
Geo::Coder::US->set_db( "geocoder.db" );
my @matches = Geo::Coder::US->geocode(
"1600 Pennsylvania Ave., Washington, DC" );
@matches = Geo::Coder::US->geocode(
"42nd & Broadway New York NY" );
my ($ora) = Geo::Coder::US->geocode(
"1005 Gravenstein Hwy N, 95472" );
print "O'Reilly is located at $ora->{lat} degrees north, ",
"$ora->{long} degrees east.\n";
DESCRIPTION
Geo::Coder::US provides a complete facility for geocoding US addresses,
that is, estimating the latitude and longitude of any street address or
intersection in the United States. Geo::Coder::US relies on US Census
Bureau TIGER/Line data. Using Geo::TigerLine to parse this data, and
DB_File to store a highly compressed distillation of it, the
Geo::Coder::US suite also implements a reasonably forgiving US address
parser, which it uses to break addresses into normalized components
suitable for looking up in its database.
You can find a live demo of this code at . The
demo.cgi script is included in the eg/ directory distributed with this
module, along with a whole bunch of other goodies. See
Geo::Coder::US::Import for how to build your own Geo::Coder::US
database.
Consider using a web service to access this geocoder over the Internet,
rather than going to all the trouble of building a database yourself.
See eg/soap-client.pl, eg/xmlrpc-client.pl, and eg/rest-client.pl for
different examples of working clients for the rpc.geocoder.us geocoder
web service.
METHODS
In general, the only methods you are likely to need to call on
Geo::Coder::US are set_db() and geocode(). The following documentation
is included for completeness's sake, and for the benefit of developers
interested in using bits of the module's internals.
Note: Calling conventions for address and intersection specifiers are
discussed in the following section on CALLING CONVENTIONS.
Geo::Coder::US->geocode( $string )
Given a string containing a street address or intersection, return a
list of specifiers including latitude and longitude for all matching
entities in the database. To keep from churning over the entire
database, the given address string must contain either a city and
state, or a ZIP code (or both), or geocode() will return undef.
geocode() will attempt to normalize directional prefixes and
suffixes, street types, and state abbreviations, as well as
substitute TIGER/Line's idea of the "primary street name", if an
alternate street name was provided instead.
If geocode() can parse the address, but not find a match in the
database, it will return a hashref containing the parsed and
normalized address or intersection, but without the "lat" and "long"
keys specifying the location. If geocode() cannot even parse the
address, it will return undef. Be sure to check for the existence of
"lat" and "long" keys in the hashes returned from geocode() before
attempting to use the values! Their presence or absence distinguishes
addresses that were parsed but not found from addresses that are
completely unparseable.
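The check described above can be sketched as follows. This is only an
illustration: classify_match() is a hypothetical helper, not part of
Geo::Coder::US, and the sample hashes are hand-written stand-ins shaped
like the results described here, not real database output.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::Coder::US): classify one element
# of the list returned by geocode(), following the rules described above.
sub classify_match {
    my ($match) = @_;
    return 'unparseable'
        unless defined $match and ref($match) eq 'HASH';
    return 'not found'
        unless defined $match->{lat} and defined $match->{long};
    return 'found';
}

# Hand-written stand-ins for geocode() results; the values are made up.
my @samples = (
    { number => 1005, street => 'Gravenstein', type => 'Hwy',
      suffix => 'N', zip => '95472', lat => 38.41, long => -122.84 },
    { number => 1, street => 'Nowhere', type => 'St', zip => '95472' },
    undef,
);

print classify_match($_), "\n" for @samples;
```

In real code the same test would be applied to each element of the
list returned by geocode() before reading its "lat" or "long" values.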
geocode() attempts to be as forgiving as possible when geocoding an
address. If you say "Mission Ave" and all it knows about is "Mission
St", then "Mission St" is what you'll get back. If you leave off
directional identifiers, geocode() will return the address geocoded
in all the variants it can find, e.g. both "N Main St" *and* "S Main
St".
Don't be surprised if geocoding an intersection returns more than
one lat/long pair for a single intersection. If one of the streets
curves greatly or doglegs even slightly, this will be the likely
outcome.
geocode() is probably the method you want to use. See more in the
following section on the structure of the returned address and
intersection specifiers.
Geo::Coder::US->normalize_address( $spec )
Takes an address or intersection specifier, and normalizes its
components, stripping out all leading and trailing whitespace and
punctuation, and substituting official abbreviations for prefix,
suffix, type, and state values. Also, city names that are prefixed
with a directional abbreviation (e.g. N, NE, etc.) have the
abbreviation expanded. The normalized specifier is returned.
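As a rough illustration of this kind of normalization, the sketch below
trims whitespace and punctuation and abbreviates a few values.
normalize_sketch() and its two lookup tables are assumptions made for
this example; they are not the module's code or its real tables, and
the expansion of directional abbreviations in city names is left out.

```perl
use strict;
use warnings;

# Tiny illustrative lookup tables; any real table would be far larger.
my %TYPE = ( street => 'St', avenue => 'Ave', road => 'Rd', highway => 'Hwy' );
my %DIR  = ( north => 'N', south => 'S', east => 'E', west => 'W' );

# Hypothetical helper: trim whitespace and punctuation from each field,
# then abbreviate prefix, suffix, and type where a table entry exists.
sub normalize_sketch {
    my ($spec) = @_;
    my %out = %$spec;
    for my $value (values %out) {    # values() aliases the hash entries
        next unless defined $value;
        $value =~ s/^[\s[:punct:]]+//;
        $value =~ s/[\s[:punct:]]+$//;
    }
    for my $field (qw(prefix suffix)) {
        next unless defined $out{$field};
        $out{$field} = $DIR{ lc $out{$field} } // $out{$field};
    }
    $out{type} = $TYPE{ lc $out{type} } // $out{type} if defined $out{type};
    return \%out;
}

my $norm = normalize_sketch({
    number => '1600', street => ' Pennsylvania ', type => 'Avenue.',
    suffix => 'North', city => 'Washington,', state => 'DC',
});
print join( ' ', @{$norm}{qw(number street type suffix city state)} ), "\n";
```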
Geo::Coder::US->parse_location( $string )
Parses any address or intersection string and returns the
appropriate specifier, by calling parse_intersection() or
parse_address() as needed.
Geo::Coder::US->parse_address( $address_string )
Parses a street address into an address specifier, returning undef
if the address cannot be parsed. You probably want to use
parse_location() instead.
Geo::Coder::US->parse_intersection( $intersection_string )
Parses an intersection string into an intersection specifier,
returning undef if the intersection cannot be parsed. You probably
want to use parse_location() instead.
Geo::Coder::US->filter_ranges( $spec, @candidates )
Filters a list of address specifiers (presumably from the database)
against a query specifier, filtering by prefix, type, suffix, or
primary name if possible. Returns a list of matching specifiers.
filter_ranges() will ignore a filtering step if it would result in
no specifiers being returned. You probably won't need to use this.
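The "skip a step rather than return nothing" behaviour can be sketched
like this. filter_sketch() is a hypothetical stand-in written for this
example, not the module's implementation; it covers only the prefix,
type, and suffix fields and leaves out the primary-name step.

```perl
use strict;
use warnings;

# Hypothetical sketch of "filter, but never down to nothing".
# Each field is applied in turn; a step is skipped when the query lacks
# that field or when applying it would leave no candidates.
sub filter_sketch {
    my ($query, @candidates) = @_;
    for my $field (qw(prefix type suffix)) {
        next unless defined $query->{$field};
        my @kept = grep {
            defined $_->{$field} and $_->{$field} eq $query->{$field}
        } @candidates;
        @candidates = @kept if @kept;
    }
    return @candidates;
}

# Made-up candidate ranges.
my @ranges = (
    { prefix => 'N', street => 'Main', type => 'St' },
    { prefix => 'S', street => 'Main', type => 'St' },
);

# "N Main Ave": the prefix narrows the list; the type filter would
# empty it and is therefore ignored.
my @hits = filter_sketch(
    { prefix => 'N', street => 'Main', type => 'Ave' }, @ranges );
print scalar(@hits), " match: @{ $hits[0] }{qw(prefix street type)}\n";
```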
Geo::Coder::US->find_ranges( $address_spec )
Given a normalized address specifier, return all the address ranges
in the database that appear to cover that address. find_ranges()
ignores prefix, suffix, and type fields in the specifier for search
purposes, and then filters against them ex post facto. The intention
is for find_ranges() to find the closest match possible in
preference to returning nothing. You probably want to use
geocode_ranges() instead, which will call find_ranges() for you.
Geo::Coder::US->geocode_ranges( $address_spec, @ranges )
Given an address specifier and (optionally) some address ranges from
the database, interpolate the street address into the street segment
referred to by the address range, and return a latitude and
longitude for the given address within each of the given ranges. If
@ranges is not given, geocode_ranges() calls find_ranges() with the
given address specifier, and uses those returned. You probably want
to just use geocode() instead, which also parses an address string
and determines whether it's a proper address or an intersection
automatically.
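The interpolation idea can be illustrated with a straight-line sketch
between the two ends of a segment. Everything here is an assumption
made for the example: interpolate_sketch() is hypothetical, the field
names (from_number, to_number, from, to) are invented, and the
coordinates are made up. The module's own representation of ranges and
its handling of segment geometry may well differ.

```perl
use strict;
use warnings;

# Hypothetical sketch of linear interpolation along one street segment.
# The field names are assumptions made for this example, not the
# module's internal representation.
sub interpolate_sketch {
    my ($number, $range) = @_;
    my ($lo, $hi) = @{$range}{qw(from_number to_number)};
    # Fraction of the way along the segment; midpoint for a one-number range.
    my $t = $hi == $lo ? 0.5 : ( $number - $lo ) / ( $hi - $lo );
    my ($lat0, $long0) = @{ $range->{from} };
    my ($lat1, $long1) = @{ $range->{to} };
    return (
        $lat0  + $t * ( $lat1  - $lat0 ),
        $long0 + $t * ( $long1 - $long0 ),
    );
}

# A made-up block running from house number 100 to house number 200.
my %block = (
    from_number => 100, to_number => 200,
    from => [ 38.0,   -122.0 ],
    to   => [ 38.002, -122.004 ],
);
my ($lat, $long) = interpolate_sketch( 150, \%block );
printf "%.4f, %.4f\n", $lat, $long;
```

Because the estimate assumes house numbers are spread evenly along the
segment, it is only as good as that assumption.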
Geo::Coder::US->find_segments( $intersection_spec )
Given a normalized intersection specifier, find all of the street
segments in the database matching the two given streets in the given
locale or ZIP code. find_segments() ignores prefix, suffix, and type
fields in the specifier for search purposes, and then filters
against them ex post facto. The intention is for find_segments() to
find the closest match possible in preference to returning nothing.
You probably want to use geocode_intersection() instead, which will
call find_segments() for you.
Geo::Coder::US->geocode_intersection( $intersection_spec )
Given an intersection specifier, return all of the intersections in
the database between the two streets specified, plus a latitude and
longitude for each intersection. You probably want to just use
geocode() instead, which also parses an address string and
determines whether it's a proper address or an intersection
automatically.
CALLING CONVENTIONS
Most Geo::Coder::US methods take a reference to a hash containing
address or intersection information as one of their arguments. This
"address specifier" hash may contain any of the following fields for a
given address:
ADDRESS SPECIFIER
number
House or street number.
prefix
Directional prefix for the street, such as N, NE, E, etc. A given
prefix should be one to two characters long.
street
Name of the street, without directional or type qualifiers.
type
Abbreviated street type, e.g. Rd, St, Ave, etc. See the USPS
official type abbreviations at
for a list of
abbreviations used.
suffix
Directional suffix for the street, as above.
city
Name of the city, town, or other locale that the address is situated
in.
state
The state which the address is situated in, given as its two-letter
postal abbreviation. See
for a list of
abbreviations used.
zip Five digit ZIP postal code for the address, including leading zero,
if needed.
lat The latitude of the address, as returned by geocode() et al. If you
provide this as part of an argument to a Geo::Coder::US method,
it will be ignored.
long
The longitude of the address, as returned by geocode() et al. If you
provide this as part of an argument to a Geo::Coder::US method,
it will be ignored.
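For illustration, an address specifier built by hand might look like
the hash below. spec_problems() is a hypothetical helper written for
this example, not a Geo::Coder::US method; it only checks the field
constraints listed above.

```perl
use strict;
use warnings;

# An address specifier is an ordinary hash reference.
my $spec = {
    number => 1005,
    street => 'Gravenstein',
    type   => 'Hwy',
    suffix => 'N',
    zip    => '95472',    # quoted so that a leading zero would survive
};

# Hypothetical helper: report fields that break the constraints above.
sub spec_problems {
    my ($s) = @_;
    my @problems;
    for my $field (qw(prefix suffix)) {
        push @problems, "$field should be one or two letters"
            if defined $s->{$field} and $s->{$field} !~ /^[A-Za-z]{1,2}$/;
    }
    push @problems, 'state should be a two-letter abbreviation'
        if defined $s->{state} and $s->{state} !~ /^[A-Za-z]{2}$/;
    push @problems, 'zip should be five digits'
        if defined $s->{zip} and $s->{zip} !~ /^\d{5}$/;
    return @problems;
}

my @problems = spec_problems($spec);
print @problems
    ? join( "\n", @problems ) . "\n"
    : "specifier looks well formed\n";
```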
INTERSECTION SPECIFIER
prefix1, prefix2
Directional prefixes for the streets in question.
street1, street2
Names of the streets in question.
type1, type2
Street types for the streets in question.
suffix1, suffix2
Directional suffixes for the streets in question.
city
City or locale containing the intersection, as above.
state
State abbreviation, as above.
zip Five digit ZIP code, as above.
lat, long
A single latitude and longitude for the intersection, as specified
above. If you provide these values as part of an argument to a
Geo::Coder::US method, they will be ignored.
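As an illustration, an intersection specifier can be written by hand in
the same way. spec_kind() and street_label() are hypothetical helpers
written for this example, not part of the module.

```perl
use strict;
use warnings;

# An intersection specifier uses numbered fields for the two streets.
my $corner = {
    street1 => '42nd',     type1 => 'St',
    street2 => 'Broadway',
    city    => 'New York', state => 'NY',
};

# Hypothetical helper: tell the two specifier kinds apart by their keys.
sub spec_kind {
    my ($spec) = @_;
    my $numbered = exists $spec->{street1} || exists $spec->{street2};
    return $numbered ? 'intersection' : 'address';
}

# Hypothetical helper: join the defined parts of street 1 or street 2.
sub street_label {
    my ($spec, $n) = @_;
    return join ' ',
        grep { defined $_ and length $_ }
        map  { $spec->{"$_$n"} } qw(prefix street type suffix);
}

print spec_kind($corner), ': ',
    street_label( $corner, 1 ), ' & ', street_label( $corner, 2 ), "\n";
```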
DATA MODEL
Geo::Coder::US instantiates several subclasses for the purposes of
modelling TIGER/Line street data. Although you will probably not need to
use or think about these subclasses, they are described here for
completeness's sake, yadda yadda.
BUGS, CAVEATS, ETC.
The TIGER/Line data is notoriously buggy and inaccurate, but it seems to
work reasonably well for urban areas. Geo::Coder::US uses interpolation
to estimate the position of a particular address within a block, which
means that it will necessarily be slightly inaccurate. Hey, it's only 14
meters off for *my* house, which is better than the 300 meter error
given by another prominent geocoder, and definitely close enough for
navigation.
In rural areas, TIGER/Line doesn't give names for lots of putative
roads, even if the roads have names. Maybe the sign blew down the day
before the Census agents got there, assuming there was ever a sign. What
can you do? Similarly, lots of rural areas have official county
subdivision names that an ordinary user would never think to give.
Probably the right thing to do is map in names from a ZIP code database,
but that data's not in TIGER/Line. What can you do? In general, you
should expect the geocoder to be a lot more accurate in urban areas
than in rural ones.
There may be many kinds of US street addresses which Geo::Coder::US
can't parse. In particular, Geo::Coder::US strips out letters and dashes
from house numbers, which may cause ambiguous results in certain parts
of the country (particularly rural Michigan and Illinois, I think). Mea
culpa. Send patches.
The full TIGER/Line data set is one heck of a lot of data -- about four
gigabytes compressed, and over 24 gigs uncompressed. The BerkeleyDB
database covering the whole US runs to 750+ megabytes uncompressed, or
about 305 megs compressed. I'd really like to host this somewhere so
that people can just download the database itself, rather than having to
download the whole TIGER/Line data set just to build the geocoder
database. Contact us if you can offer a mirror site.
It would be nice to see a version of this for other countries, e.g.
Geo::Coder::CA, Geo::Coder::DK, Geo::Coder::UK, with the same methods.
Contact your local legislator about why the public geographic data for
your country isn't freely available like it is in the US. If street
address data is freely available for your country of choice, what are
you waiting for?
TODO
A web service and client to access this data, so that people can reuse
other people's geocoders.
A means of importing FIPS-55-3 or perhaps ZIP code data so that places
can be geocoded by alternate place names. For example, this geocoder
won't geocode addresses in Ventura, CA properly because the official
name of the city is San Buenaventura, but no one actually calls it that.
Reverse geocoding methods, to retrieve the nearest street address from a
given lat/long.
SEE ALSO
DB_File(3pm), Geo::TigerLine(3pm), Geo::Coder::US::Import(3pm)
TIGER/Line is a registered trademark of the US Census Bureau. Find out
more, and get the latest TIGER/Line files from
. Actually, the best place to
download the data from is .
You can find a live demo of this code at .
AUTHORS
Schuyler Erle
Jo Walsh
COPYRIGHT AND LICENSE
Copyright (C) 2004 by Schuyler Erle and Jo Walsh
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself, either Perl version 5.8.3 or, at
your option, any later version of Perl 5 you may have available.