Archive for the ‘SQL’ Category
Terminal one-liner to create shapefile from WKT using ogr2ogr
Fair warning, this is a Linux-themed solution. But I expect it could be ported to Windows without too much trouble.
Today I needed to create a shapefile with a single point geometry in it to use as an input for a random GIS utility executable, which required its input to be a an actual shapefile rather than an array of coordinate pairs, or even a single coordinate pair, or even separate x=/y= input parameters for a single coordinate pair. I already had the point coordinate I wanted to use, and I didn’t want to go to the trouble to make this shapefile! So I got to wondering if I could use ogr2ogr to render a simple shapefile from the terminal.
Fortunately, with the help of some utilities built in to my Ubuntu shell, I was able to come up with the following solution. This compound command 1) creates an empty file (dataset.csv), 2) pushes two features into it, and 3) uses ogr2ogr to translate the CSV into a shapefile projected accordingly–in this case, to EPSG:4326.
Here’s the full command. Note that a pair of ampersands (&&) are used to daisy-chain the individual steps together into a single instruction:
touch dataset.csv && printf "gid,WKT,some_value\n1,POINT(-82.048051 33.567181),Alpha\n2,POINT(-92.7774 35.9829),Bravo\n" > dataset.csv && ogr2ogr -f "ESRI Shapefile" dataset.shp -dialect sqlite -sql "SELECT gid, GeomFromText(WKT), some_value FROM dataset" dataset.csv -a_srs EPSG:4326
Here’s a breakdown of what happens..
touch dataset.csv
- Create a file (in the current directory) named “dataset.csv”
printf "gid,WKT,some_value\n1,POINT(-82.048051 33.567181),Alpha\n2,POINT(-92.7774 35.9829),Bravo\n" > dataset.csv
- Push a string through standard output (STDOUT) and INTO (>) the new file “dataset.csv”
- The string imposes line breaks using the “\n” sequence and establishes a CSV type containing three lines
- The three lines of content include: a column header, feature 1, and feature 2
ogr2ogr -f "ESRI Shapefile" dataset.shp -dialect sqlite -sql "SELECT gid, GeomFromText(WKT), some_value FROM dataset" dataset.csv -a_srs EPSG:4326
- The ogr2ogr command exports a shapefile, “dataset.shp”
- The input file, “dataset.csv” is included toward the end, along with the declaration of the source data’s CRS (EPSG:4326/WGS84)
- ogr2ogr is instructed to use the “SQLITE” dialect of sql to pull data from the target dataset
- The SQL command makes sure ogr2ogr knows to identify the WKT column as the source data’s geometry field
It runs pretty quickly, and does exactly what I wanted: Quickly crank out a shapefile if all I have is a point or a polygon or something, already in WKT format. I’m sure I’ll be using this again, so I wanted to document it here for easy reference later. Hopefully someone else finds it useful!
MySQL Implementation of Google’s Encoded Polyline Algorithm
I just noticed someone created a MySQL implementation of Google’s Encoded Polyline Algorithm incorporating some elements of my PostgreSQL/PostGIS approach, specifically, support for multi-part, multi-ring polygon geometries.
And this is exciting, at the bottom of his post, Fabien says he’s working on a solution for consuming the Google encoded geometries in Leaflet. Nice! I love open source software!
I thought I’d mention the MySQL approach here to help expose Fabien’s solution to English searches and drive a little more traffic to his site. If you can’t read French, Google Translate is your friend!
FWIW–if anyone is interested and so inclined–there is a potential improvement that could be made to the Google Encoded Polyline implementation allowing users to specify a desired number of significant digits before rounding the Latitude and Longitude coordinate values prior to encoding. If memory serves, the documentation for Google’s algorithm allows/expects only 5 significant digits. But on some datasets with curved line features (often just heavily sampled line-segments), this limitation of 5 significant digits degrades the data, and you end up with jagged line features. So an improved solution would provide an optional parameter to specify significant digits.
All else equal–nice work, Fabien!
PostGREsql/PostGIS Implementation of Google’s Encoded Polyline Algorithm
[Edit 30 Jan, 2014]
I added an additional PostGREsql method to perform Polygon encoding by concatenating polygon geometries (delimiter: †) and their inner rings (delimiter: ‡) together into one massive encoded block of ring features. I also provided an example JavaScript method demonstrating how to bring the amalgamated polygon feature encodings into your Google Map.
By some uncanny twist of the fates, I’ve elected to use, had to use, and/or been asked to develop applications that use Google Maps ASCII Encoded Polyline expressions. In previous encounters, I’ve used a PHP class to handle the encoding task, and most recently I wrote a Python method to decode these expressions so that I could return a 3rd-party’s encoded geometries to WKT and import them into a spatially aware database.
So far so good.
However one thing has always bugged me about using the PHP solution–I don’t like using a piece of middleware to handle what I consider to be a responsibility of the data layer. Mark McClure’s page, which is basically the seminal authority on this topic, provides external links to implementations for Perl, Ruby, PHP (note: I prefer the PHP class linked, above), Java, and Mathematica. Also, by searching Stack Overflow, you can find implementations of the algorithm in both C# and VB.Net. But for all my efforts searching, I could never dredge up an implementation for either MySQL or PostGREsql/PostGIS. Bummer.
Looking up, it seems version 2.2 of PostGIS might include some built-in Google encoding conversion methods. While this is cool enough for a hat tip, unfortunately, it’s too inconvenient to wait that long, and even then, there’s no guarantee the implementation will work the way I expect with complex Polygon geometries; for instance, maybe it will encode only the exterior ring of Polygons, ignoring MultiPolygons completely, etc. For that matter, it’s equally possible there could be some bugs. So with this said, and even though the previously-mentioned PHP implementation does the job, my boss was cool-enough to let me take a crack at implementing the algorithm as a PostGREsql/PostGIS function, and then share the results with the world. Since some initial testing confirms my PostGIS implementation works, I’ll just post the code parts and hope others find it useful.
For what it’s worth, if anyone finds a bug or has recommendations for improvements, please don’t hesitate to drop me a line.
Sample query calling the first encoding function on the EXTERIOR RING of Polygon geometries:
(Also works on single-part LINESTRING features.)
/************************************************************************ * Note that the encoding method can accept a LINESTRING only, which * is the geometry type used to represent the ring parts of a Polygon. * To help understand this, and why, please see the trailing discussion * section, which elaborates on this situation further. ************************************************************************/ SELECT GoogleEncodeLine(ST_ExteriorRing(wkb_geometry)) as Google FROM polygons_wgs84 WHERE ST_GeometryType(wkb_geometry) = 'ST_Polygon' LIMIT 10 ;
[Added 30 Jan, 2014]
Sample query calling the second encoding function on Polygon and MultiPolygon geometries:
(Preserves multi-part polygons and their inner-ring parts, a.k.a. “holes”.)
/************************************************************************ * This encoding method will accept Polygon and MultiPolygon geom types. * The output returned is an amalgamation of Polyline encodings, where * individual geometries and their interior rings are concatenated * together using string delimiters, †, and ‡, respectively. ************************************************************************/ SELECT GoogleEncodePolygon(wkb_geometry) as GooglePolygon FROM polygons_wgs84 LIMIT 10 ;
Implementation functions to execute/save in your PostGREsql instance:
[Added 30 Jan, 2014]
/************************************************************* * Pass in either a Polygon or MultiPolygon geometry. Returns * an array of ASCII-encoded Polygon feature parts, including * multi-part geometries and their interior rings. ************************************************************/ CREATE OR REPLACE FUNCTION GoogleEncodePolygon ( g1 GEOMETRY ) RETURNS TEXT AS $$ DECLARE ng INT; -- Store number of Geometries in the Polygon. g INT; -- Counter for the current geometry number during outer loop. g2 GEOMETRY; -- Current geometry feature isolated by the outer loop. nr INT; -- Store number of internal ring parts in the Polygon. r INT; -- Counter for the current inner-ring part. r1 GEOMETRY; -- Exterior ring part isolated BEFORE the inner loop. r2 GEOMETRY; -- Inner-ring part isolated within the inner loop. gEncoded TEXT; -- Completed Google Encoding. BEGIN gEncoded = ''; ng = ST_NumGeometries(g1); g = 1; FOR g IN 1..ng BY 1 LOOP g2 = ST_GeometryN(g1, g); if g > 1 then gEncoded = gEncoded || chr(8224); END IF; -- Get ExteriorRing now; if there are any holes, get them later in the loop.. r1 = ST_ExteriorRing(g2); gEncoded = gEncoded || GoogleEncodeLine(r1); nr = ST_NRings(g2); if nr > 1 then -- One (1) is because interior rings is one-based. -- And nr-1 is because ring count includes the boundary. FOR r IN 1..(nr-1) BY 1 LOOP r2 = ST_InteriorRingN(g2, r); gEncoded = gEncoded || chr(8225) || GoogleEncodeLine(r2); END LOOP; END IF; END LOOP; RETURN gEncoded; End $$ LANGUAGE plpgsql;
/************************************************************* * First of two methods. Pass in a geometry (LINESTRING only). * Returns ASCII-encoded point array for use in Google Maps. ************************************************************/ CREATE OR REPLACE FUNCTION GoogleEncodeLine ( g GEOMETRY ) RETURNS TEXT AS $$ DECLARE pt1 GEOMETRY; pt2 GEOMETRY; p INT; np INT; deltaX INT; deltaY INT; enX VARCHAR(255); enY VARCHAR(255); gEncoded TEXT; BEGIN gEncoded = ''; np = ST_NPoints(g); IF np > 3 THEN g = ST_SimplifyPreserveTopology(g, 0.00001); np = ST_NPoints(g); END IF; pt1 = ST_SetSRID(ST_MakePoint(0, 0),4326); FOR p IN 1..np BY 1 LOOP pt2 = ST_PointN(g, p); deltaX = (floor(ST_X(pt2)*1e5)-floor(ST_X(pt1)*1e5))::INT; deltaY = (floor(ST_Y(pt2)*1e5)-floor(ST_Y(pt1)*1e5))::INT; enX = GoogleEncodeSignedInteger(deltaX); enY = GoogleEncodeSignedInteger(deltaY); gEncoded = gEncoded || enY || enX; pt1 = ST_SetSRID(ST_MakePoint(ST_X(pt2), ST_Y(pt2)),4326); END LOOP; RETURN gEncoded; End $$ LANGUAGE plpgsql;
/************************************************************** * Second of two methods. Accepts a signed integer (LON or LAT * by 1e5) and returns an ASCII-encoded coordinate expression. *************************************************************/ CREATE OR REPLACE FUNCTION GoogleEncodeSignedInteger(c INT) RETURNS VARCHAR(255) AS $$ DECLARE e VARCHAR(255); s BIT(32); b BIT(6); n INT; BEGIN e = ''; s = (c::BIT(32))<<1; IF s::INT < 0 THEN s = ~s; END IF; WHILE s::INT >= B'100000'::INT LOOP b = B'100000' | (('0'||substring(s, 28, 5))::BIT(6)); n = b::INT + 63; e = e || chr(n); s = s >> 5; END LOOP; e = e || chr(s::INT+63); RETURN e; End $$ LANGUAGE plpgsql;
[Added 30 Jan, 2014]
JavaScript method demonstrating how to add Polygon encodings to a Google Map object:
(This client implementation works for either the single or the multi-part polygons.)
/************************************************************* * JavaScript! Pass-in an encoded text block created by either * PostGIS method, GoogleEncodePolygon() or GoogleEncodeLine(), * and render it in your Google Map object. If you don't want * the map to zoom to each rendering, just remove the "bounds" * variable and any references to it. ************************************************************/ function renderEncoded(encoded_path) { var bounds = new google.maps.LatLngBounds(); var $encodedGeoms = encoded_path.split("†"); for (var i=0; i<$encodedGeoms.length; i++) { var encodedGeom = $encodedGeoms[i]; var $encodedRings = encodedGeom.split("‡"); var polyPaths = []; for (var j=0; j<$encodedRings.length; j++) { var ptarray = google.maps.geometry.encoding.decodePath($encodedRings[j]); polyPaths.push(ptarray); } var polygonObject = new google.maps.Polygon( { paths: polyPaths, strokeColor: '#890000', strokeOpacity: 1.0, strokeWeight: 2 }); polygonObject.setMap(map); polygonObject.getPath().forEach(function(e) { bounds.extend(e); }); } map.fitBounds(bounds); }
And some additional discussion..
There are two “gotchas” when it comes to implementing the encoding algorithm with respect to Polygons:
1) Polygons, as geometries, can be composed of many rings. The outer ring is considered to be the boundary, and various inner rings are often called “holes”. So this is a specified, understood, and accepted built-in many-to-one relationship between polygons and their internal ring geometries.
And 2) It’s not rare to find polygon tables containing both Polygon and MultiPolygon data types. I think this happens because ESRI allows it, and so in an effort to play well with others, other GIS systems have accommodated it. At least, I know this is true for MySQL and PostGIS.
Here’s why this makes trouble–Google’s encoding algorithm is only intended to represent individual point arrays as a singular geometry. Basically, as long as your first point equals your last point, it’s considered to be a closed geometry, and you can add it and render it in a Google Map as a polygon. The algorithm itself isn’t designed to represent nested arrays, which would be necessary to render either a Polygon with “holes” or a MultiPolygon, which could potentially define many polygons with holes of their own! As such, I suspect there could be considerable disagreement as to how a Polygon-to-Google-Encoded method should actually handle Polygons..
The only solutions I can imagine for this problem would require “faking” a one-to-many relationship by perhaps delimiting together several encodings to account for MultiPolygons and/or single feature Polygons with interior rings. But this starts to get weird. So to keep things somewhat simple for the sake of the post, I chose to stay true to the algorithm’s intent and return a single encoded geometry expression from my method. And the sample query demonstrates this by calling the method against the outermost ring (i.e. the boundary) of a Polygon geometry type, which PostGREsql regards as a LineString, anyway.
[Added 30 Jan, 2014]
Since I wanted to handle the more complex geometries, I wrote the wrapper method GoogleEncodePolygon()
, to first iterate over ST_NumGeometries()
and gain access to any multi-part features, then second, iterate over ST_NRings()
using ST_InteriorRingN()
–you could also do this using ST_DumpRings()
–and gain access to any interior rings of the Polygon geometries, themselves. Then, for each ring part, I call GoogleEncodeLine()
, and concatenate together all those expressions into one massive block of “compound” expressions. I chose to delimit each geometry encoding using an extra-special character that would never be used by Google’s algorithm; for example chr(8224)
, which corresponds to “†”. I then further delimit the internal ring parts using another special character, chr(8225)
, which corresponds to “‡”, and return all these concatenated together as a compound encoding expression. Then, on the client-side (a JavaScript example is provided above), I merely split the compound expression against my delimiters, loop over the various expressions, and add them to the map individually. Note if you are attaching attributes to your features, you’ll need to remember to include them explicitly to each unique Polygon added to your map.
PostGIS: query all multipolygon parcels with at least one hole
I was writing some code to iterate over Well Known Text expressions for polygon features, and I decided I needed to test the most complex edge-case I could think of–multipolygon geometries where at least one of the bound polygons has a hole (i.e. an interior ring).
I ended up with the following query. This seems like the kind of thing I’ll want to reuse later, so I’m noting it here. For good measure, I also use a rudimentary technique to sort the output with the most complicated geometries in the table at the top of the list. Basically, the more “text” it takes to describe the geometry using Well Known Text, the larger and more complex I figure it must be!
SELECT SomePrimaryId, /* your primary key, i.e. ogc_fid, etc. */ SomeUniqueId, /* your descriptive id, i.e. a parcel number */ ST_NumGeometries(wkb_geometry) AS num_geoms, ST_NRings(wkb_geometry) AS num_rings, ST_AsText(ST_Centroid(wkb_geometry)) AS center, Char_Length(ST_AsText(wkb_geometry)) AS len, ST_AsText(wkb_geometry) AS wkt FROM SomePolygonTable WHERE ST_NumGeometries(wkb_geometry) > 1 AND ST_NRings(wkb_geometry) > ST_NumGeometries(wkb_geometry) ORDER BY Char_Length(ST_AsText(wkb_geometry)) ASC ;
Just for the sake of promoting caution, I’m not certain this is a definitive approach for identifying the largest geometry in a table, as the length of the binary representation and the length of the readable text representation do not correspond one-to-one. Moreover, a feature could have more vertices that required less precision to express (fewer decimal position), than a geometry with fewer vertices that needed more precision, and then you have to ask, which is bigger, fewer vertices and more text, or more vertices that coincidentally did not require as much text? My conclusion is, the “most complicated geometry” is probably relative to the one asking the question. However for my purposes, this was close enough to put the most complicated stuff at the top of the list.