Proprietary Spatial Data Silos Considered Harmful

Andrew Hallam | 11 August 2006, 07:26

Why is it that some organisations are happy to have their valuable spatial data locked in proprietary data silos? This question came to mind while discussing access to spatial data with some GIS guys. Their spatial data is stored in an RDBMS, but in a proprietary format.

GIS has traditionally been a back room discipline requiring expensive tools and people with specific skills. It still requires the skilled people. It still can be very expensive, but as the technology improves it offers more potential value for the money.

What doesn’t seem to have changed as fast as the technology is the perception of who owns the spatial data. Most GIS people continue to see the data as their asset, to be accessed with their tools, using their skills. This is human nature, and we’re all human, but the result is that spatial data gets partitioned off from the rest of the enterprise, usually in a proprietary silo.

[Image: Data silo]

Vendors will happily go along with such behaviour. They’ll have a list of reasons why you should use their proprietary data storage methods. Doing so just happens to “lock you in” even further. That means more licensing revenue for the vendor, and potentially higher costs to create business solutions.

While the perception remains within an organisation that “spatial data is special” it will be difficult to get maximum value from that data. A change in attitude is required.

As a discipline, true GIS is still a speciality. No number of Google Earths or ArcGIS Explorers on desktops will change that. Someone still needs to maintain the spatial data and do the high end analysis. However, the required attitude change is that the spatial data itself must be considered enterprise data, and should be available to everyone in the organisation.

There are many users in an organisation who can benefit from spatial data without having an expensive desktop GIS. They can benefit from that data without even knowing it is spatial data. However, just making the data available to more people is not enough. You also have to provide the tools that let those people get business value from that data, and do so at the lowest possible total cost of access. Proprietary storage methods can make that difficult.

[Image: Vendor holding the key to your data]

Total Cost of Access

This is where it gets interesting. There is a jumble of business requirements, skills and tools that have to be considered. No one approach can do everything well, and each approach has a different cost profile.

Consider a web mapping application focused on a particular business function. The user needs to interact with a map in their browser to create point features and store them in the spatial database. Creating such an application will require expensive .Net or Java GIS software developers, and a web mapping framework. Staying within the realm of commercial products it’s highly likely that the web mapping framework will be provided by the same vendor that is used to manage the spatial data store.

So far so good, but now Kim in management wants to know how many of these points were created in her region last month, and have them broken down by type.

The GIS software developer’s first instinct is to write some more code, using their shiny powerful complex spatial engine of choice, to allow the manager to answer this question. Developers like shiny interesting things. (I can say that because I’m a developer.)

Such code costs money to write. It will cost even more to maintain over its lifetime, e.g. when Kim wants to slice and dice the data differently, or the vendor of the spatial engine decides that to upgrade you have to rewrite all your .Net 1.1 applications in .Net 2.0, or Java 1.4 to Java 5.0.

So, the cost of access to that data could be quite high over the lifetime of the reporting application. Are there alternative approaches that should be considered? Why yes, there are.

Spatially Enabled Databases

Think about how you would approach this project if the spatial data was stored in an open format, in a spatially enabled relational database, and that database also provided the tools to operate on the spatial data. You could answer Kim’s original request by writing one SQL statement.

Here’s an example SQL statement:

[sql]
SELECT point.type, count(*)
FROM point, polygon
WHERE (polygon.id = 'myRegion')
AND point.the_geom && polygon.the_geom
AND contains(polygon.the_geom, point.the_geom)
AND point.date >= '2006-07-01'
AND point.date < '2006-08-01'
GROUP BY point.type;
[/sql]

You just saved a whole lot of development time and cost. You might even be able to use that SQL statement in an off the shelf reporting tool. What, no code? What’s happening here?


  • The need for custom code is minimised, and possibly eliminated. The expensive developers can go work on more interesting problems.

  • A declarative SQL statement replaces procedural code. You’re telling the system what to do, not how to do it. This higher level of abstraction is important because it lets you create a working solution much faster. That solution will have fewer defects, and will be a lot easier to maintain.

  • SQL is a standardised language, and plenty of people know SQL. Sure, every database vendor has their own flavour of SQL, but if you know one you know enough to use another. Your SQL statements are also unlikely to break when the database version is updated.

  • Some power users could even, gasp, create their own queries. (Subject to strict scrutiny and testing, of course.)

All this adds up to lower cost to access your spatial data, and that means more value can be obtained from that data.
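To make the maintenance point concrete, suppose Kim later asks for the same counts for every region rather than just her own. This is only a sketch: it reuses the table and column names from the example statement above and the same PostGIS-style contains() function (later PostGIS releases spell it ST_Contains), and the num_points alias is made up for illustration.

[sql]
-- Points created in July 2006, counted per region and per type.
SELECT polygon.id AS region, point.type, count(*) AS num_points
FROM point, polygon
WHERE point.the_geom && polygon.the_geom
AND contains(polygon.the_geom, point.the_geom)
AND point.date >= '2006-07-01'
AND point.date < '2006-08-01'
GROUP BY polygon.id, point.type
ORDER BY polygon.id, point.type;
[/sql]

Compared with the original statement, the region filter is dropped and the region id is added to the SELECT and GROUP BY clauses. Nothing needs to be recompiled or redeployed.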

By now the geospatial developers are crying “but you’re still locked into a database vendor”. Yes, you are, but you still need a database. The key difference is that you now have a much wider choice of tools to use to access that spatial data. Anything that can create an ODBC or JDBC connection, and anyone who knows SQL, can leverage your valuable spatial data. You are not locked into the vendor who provides your spatial data gateway. You can pick the most cost effective tool for the job.
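One way to open the data up to those tools, sketched here under the same assumptions as the earlier example (PostGIS-style functions, the point and polygon tables), is to hide the spatial join behind a view. The view name points_by_region and its column names are hypothetical.

[sql]
-- The spatial join lives in the view, maintained by someone who knows the spatial data.
CREATE VIEW points_by_region AS
SELECT polygon.id AS region, point.type AS point_type, point.date AS created_on
FROM point, polygon
WHERE point.the_geom && polygon.the_geom
AND contains(polygon.the_geom, point.the_geom);

-- A reporting tool connected over ODBC or JDBC then needs only ordinary SQL.
SELECT point_type, count(*) AS num_points
FROM points_by_region
WHERE region = 'myRegion'
AND created_on >= '2006-07-01'
AND created_on < '2006-08-01'
GROUP BY point_type;
[/sql]

The person writing the report never has to know that a geometry column is involved.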

Despite all these advantages, a spatial RDBMS is not a panacea. Commercial spatially enabled databases are expensive. Spatial queries can be resource hogs so someone needs to be able to tune the spatial component of the database (this can make a huge difference to performance). Portability of your applications to another database platform is also a problem, increasing lock in.
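As one example of that tuning, and assuming PostGIS again, the bounding box operator (&&) in the example statement only pays off if the geometry columns have a spatial index. The index names below are made up, and the exact CREATE INDEX syntax depends on the database product and version, so treat this as a sketch rather than a recipe.

[sql]
-- GiST indexes let the && bounding box test avoid scanning every row.
CREATE INDEX point_the_geom_idx ON point USING GIST (the_geom);
CREATE INDEX polygon_the_geom_idx ON polygon USING GIST (the_geom);

-- Refresh the planner statistics so the new indexes are actually considered.
VACUUM ANALYZE point;
VACUUM ANALYZE polygon;
[/sql]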

So, the next time you need to build a spatial application consider the total cost of having your spatial data locked in a proprietary silo. Software, hardware, people, all of it. See what gives you the lowest cost of access to your data.

Note: I would be remiss if I did not acknowledge Simon Greener for educating me, a software developer at heart, on the benefits of spatial databases. If you need someone who really knows about this stuff please take advantage of Simon’s consulting services.

[tags]spatial data, silo, spatial database[/tags]

Comments [9] »

  1. This is a good summary of what many of us have been saying for years.

    The spatial SQL example you give (from PostGIS) is capable of being run within Oracle Locator/Spatial with some trivial changes:

    [sql]
    SELECT pt.type, count(*)
    FROM point pt, polygon pg
    WHERE (pg.id = 'myRegion')
    AND Sdo_Contains(pg.the_geom, pt.the_geom) = 'TRUE'
    AND pt.date >= '2006-07-01'
    AND pt.date < '2006-08-01'
    GROUP BY pt.type;
    [/sql]
    Similarly, for Manifold GIS, this would be:
    [sql]
    SELECT pt.type, count(*)
    FROM point pt, polygon pg
    WHERE (pg.id = 'myRegion')
    AND Contains(pg.[Geom ID], pt.[Geom (I)])
    AND pt.date >= 2006-07-01
    AND pt.date < 2006-08-01
    GROUP BY pt.type;
    [/sql]
    Now Kim in management can use these queries in her existing off-the-shelf IT reporting tool. Perhaps she is using Crystal Reports, Oracle Discoverer, Microsoft's SQL Reports: it doesn't matter. She can even include - shock horror - a spatial computation that returns non-spatial data as in the following example:
    [sql]
    SELECT vp.type,
    Sum(
    Intersection(vp.the_geom,lu.the_geom).Area
    ) as Total_Area
    FROM landuse_polygons lu,
    vegetation_polygons vp
    WHERE (lu.id = 'New Development ID')
    AND Intersects(vp.the_geom, lu.the_geom)
    GROUP BY vp.type;
    [/sql]

    Remember, the output from a GIS need not be a map!

    But if Kim wanted to see the the_geom in the query then she would need some sort of visualisation tool, of which a GIS is only one! She might use SVG to render the shape, with an image element displaying data from Google Maps as background (some clever IT person gave her a "mashup" for this), mixed in with some SVG-based piecharts (see Oracle Application Express tool for good examples of this) and all output in a PDF.

    SQL provides applications with a logical, declarative interface that is independent of the physical implementation details. We don't worry about how a number is stored in Oracle/SQLServer/PostgreSQL so why should we worry about how the_geom is stored in these databases?

    What we really need is for the vendors of spatial extensions for databases to use a standardised (ISO SQL/MM) set of query operators and functions so that the translation of the SQL above becomes unnecessary. Oracle has had an ISO SQL/MM object wrapper since 10gR1. Now ESRI is providing one of their own in Oracle for their own users ('cos no one else will be using it).

    The lack of ISO SQL/MM SQL implementations has been one of the main issues hindering Spatial SQL. It is great to see them finally appearing.

    Finally, having spatial SQL does not mean we don't need a GIS or GIS professionals in an organisation. What I have found is that the use of such technology tends to generate MORE work for geospatial professionals, not less. The pie grows, not shrinks, when one stops trying to own spatial data and processing from the back office to the front office.

    Well done, Andrew.

    regards
    Simon

    Simon Greener | 11 August 2006, 13:02

  2. Excellent article. I have been trying to push these ideas in my dept. here at the University- with some luck. We have moved a large portion of the vector data that we use on a regular basis into PostGIS- I feel like kicking myself for not doing it sooner every time I get something done in half of the time that I would have otherwise spent!

    Here is a quick example of using USDA-NCSS soil survey data from within PostGIS:

    http://casoilresource.lawr.ucdavis.edu/drupal/node/268

    Cheers,

    Dylan Beaudette | 11 August 2006, 16:07

  3. Excellent analysis Andrew. Many of our clients are complaining about the impacts when vendors decide to stop supporting a development environment and move to the latest and greatest.

    Another client recently bemoaned the learning curve of the SDKs provided by GIS vendors. I'd be interested to know if my casual poll of our customers (mainly software developers from non GIS backgrounds) is true when I say that most of them have at least a basic understanding of SQL and extending that to spatial SQL is not as huge a barrier as learning the object models presented by the vendors in their SDKs.

    GIS people are always quick to worry about the possibility for non GIS people to draw the wrong conclusions when they don't fully understand the tools they are using. I don't believe this is really a true statement as I have seen people develop new skills in software tools such as Spreadsheets to a point where they can produce complex financial analysis. Does the same hold true for spatial analysis?

    Angus Scown | 11 August 2006, 19:02

  4. Great article Andrew. I agree with most points in your article. Having worked in the scientific community for many years I have to say that getting data out of Fortran source code models has to be the worst!

    I know that ArcSDE is not popular in these circles, but full open SQL access within sde is supported now for Informix and DB2. Oracle will be supported at ArcSDE release 9.2. I have had a chance to play with it and it is going to be a boon for the community.

    One of the real powers of GIS is the ability to visualize tabular data from a database. Nothing beats creating a couple of views of multi-dimensional data summarized by location and then joining that to your spatial data to visualize it.

    http://geosql.blogspot.com/2006/03/database-gis-powerful-visualization.html

    Cheers,

    Jeremy

    Jeremy | 11 August 2006, 19:41

  5. Hi Jeremy,

    I've heard about the SQL access via ArcSDE, but I'm curious about how it is implemented. The ArcSDE 9.1 documentation states:

    "ArcSDE does not specifically deliver a SQL API. However, SQL functions are available to users of IBM DB2 Spatial Extender and Informix Spatial DataBlade. The available functions are documented here for convenience."

    Since there is only one ArcSDE option for storing geometry in DB2 and Informix, the native ST_Geometry, my guess is that the SQL just gets passed through to the database for execution. i.e. ArcSDE isn't doing a whole lot, and you could get the same result by accessing the database directly.

    If, however, ArcSDE is managing versions and returning a single unified result set that would be beneficial.

    It will be interesting to see whether the Oracle implementation works with only Oracle's native SDO_GEOMETRY format, or with all storage formats supported by ArcSDE.

    ArcSDE is not the villain here. It's up to those who implement ArcSDE to decide which data type to use. If they choose the data type that is supported natively by their spatial database then they are giving themselves a lot more options on how they can access their data. i.e. If you use Oracle then SDO_GEOMETRY is the way to go.

    Andrew

    Andrew Hallam | 12 August 2006, 07:08

  6. As I understand it, the ArcSDE SQL implementation for Oracle will be based on custom proprietary datatypes developed by ESRI, NOT on the native geometry types in Oracle Spatial. ESRI had a major role in developing the types for DB2 and Informix, so they have the technical ability to do this, and no doubt it will be a reasonable implementation. However, this just perpetuates the proprietary data silo, albeit with the major benefit of SQL functionality.

    It will be interesting to see what the market response is to this.

    Martin Davis | 12 August 2006, 17:34

  7. Excellent points made - this is a key issue not only facing organisations, but inadequate addressing of the siloed mentality has serious and potentially dire implications for society. Geospatial technology must evolve towards enabling delivery of a new fundamental human right, i.e. providing humanity (with access to the internet) the right to information pertaining to the sustainability of places and decision outcomes. In the face of the serious challenges we now face in today's society, proprietary gagging of longitudinal access across the silos must be unacceptable and considered hostile to the realisation of any of the wellbeings of sustainability.

    We urgently need to evolve our thinking well beyond the legacy constraints of the 'filing clerk' metaphor and fully embrace a spatial metaphor in the collaborative vision of a 'Digital Earth'. This is a topic of discussion at the forthcoming Digital Earth Summit on Sustainability in New Zealand (27-30 August). Refer: www.digitalearth.org.nz

    Richard Simpson | 14 August 2006, 00:45

  8. [...] 3) I will try to write a follow up on why the full relational DB route would be no picnic now. [...]

    Steve’s Little world » Steve’s list of improvements to geodatabases | 16 August 2006, 16:37

  9. [...] Andrew H has a great post talking about why people should keep their geospatial data in a spatially enabled database (such as Oracle Spatial or PostGIS). And Dylan talks about the functionality of PostGIS in my comments.  And I was intrigued by the ideas presented. Heck I was intrigued by uDIG as I was getting ready to leave ESRI. What fun to write in Java and work on geospatial technology and open source to boot. So I have dabbled some reading the doc, using some of the software, cruising the discussion forums and my conclusion is that there is serious work that needs to be done before this would be ready for a shop like ours. And by a shop like ours I mean one that has 5-6 GIS folks without a lot of programming time who need to be doing work that directly generates revenue. [...]

    Steve’s Little world » The GIS user is stuck in the middle | 24 August 2006, 01:05
