SQLite
SQLite (sqlite.org) is a local database management system. Data is stored in a single local file in a well-documented file format.
SQLite is just a container format; the choice of what can go where in the DB, i.e. the schema, is made by the program that creates the database. For example, the DB can store simple geo data, such as latitude and longitude columns; store structured geo data, when an appropriate schema is used; store pre-computed tiles (which can then be served offline, for example in OsmAnd); or serve as a basic, generic container on which to build new standard file formats.
SpatiaLite extends SQLite with spatial indexing and spatial functions.
Converting OSM data into an SQLite DB
OSMLib can convert to and from SQLite and OSM XML. See osmlib-sqlite docs
HOT Exports web tool will generate an SQLite DB as one of several exported formats
SQLite files and QGIS
Often, it's more useful to store geodata in a plain SQLite DB. Unlike SpatiaLite, it has no geo-functions or spatial index, but these are often not needed at all. SQLite gives a very lightweight way to store such information.
QGIS can produce both kinds of DBs, but it does not add the correct CRS/SRS (information about the coordinate system being used). That might be a bug; in any case it does not work up to version 2.18.3, and maybe later. As a result, QGIS complains about a missing CRS/SRS when re-opening the file and falls back to EPSG:4326. That is fine for raw OpenStreetMap data, but not really useful for anything else.
To add the missing CRS/SRS, remember to store metadata in QGIS and select "SQLite" as export format (not "SpatiaLite"!). In the following example, the SRID is set to 3035, which is used in Europe for statistical maps:
- Open the SQLite file, e.g., with sqlite3:
sqlite3 database.sqlite
- Add the information about spatial reference to the geometry column (table is "demo_data"):
UPDATE geometry_columns SET srid = 3035 WHERE f_table_name = 'demo_data';
- Add the definition of the spatial reference:
INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid, srtext) VALUES (3035, 'epsg', 3035, 'PROJCS["ETRS89 / ETRS-LAEA",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_1989",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6258"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4258"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Lambert_Azimuthal_Equal_Area"],PARAMETER["latitude_of_center",52],PARAMETER["longitude_of_center",10],PARAMETER["false_easting",4321000],PARAMETER["false_northing",3210000],AUTHORITY["EPSG","3035"],AXIS["X",EAST],AXIS["Y",NORTH]]');
Now you're done. The spatial reference string for other EPSG codes can be looked up, for example, at https://spatialreference.org/ref/epsg/3035/postgis/. When copying the PostGIS INSERT statement from there, two adjustments are important:
- Remove "proj4text," from the first bracketed part and remove the corresponding "+proj=..." string from the second one.
- Make the first number of the second bracketed part equal to your EPSG code, e.g. "3035".
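The UPDATE step above can be sanity-checked on a toy database before touching a real export. Here is a minimal sketch in Python; the geometry_columns schema is simplified to two columns, and it assumes the OGR-style convention where the layer name column is called f_table_name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the metadata table found in OGR-created SQLite files
conn.execute("CREATE TABLE geometry_columns(f_table_name TEXT, srid INTEGER)")
conn.execute("INSERT INTO geometry_columns VALUES ('demo_data', -1)")

# The fix: point the layer's geometry at SRID 3035
conn.execute("UPDATE geometry_columns SET srid = 3035 WHERE f_table_name = 'demo_data'")

(srid,) = conn.execute(
    "SELECT srid FROM geometry_columns WHERE f_table_name = 'demo_data'"
).fetchone()
print(srid)  # 3035
```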
Tools using SQLite
Microcosm is a map API server that uses SQLite and its R*Tree module.
Derived formats
Some standard formats are defined as a SQLite DB with a specific schema.
- GeoPackage can contain vector features and raster tile images.
- MBTiles can contain raster tile images.
Optimize tag search
ogr2ogr stores OSM tags in SQLite or GeoPackage files as plain text strings, either in JSON format ('{"key":"value"}') or in HSTORE format ('"key"=>"value"'), in the column all_tags or other_tags.[1]
This violation of first normal form forces you to learn the JSON / HSTORE parsing functions of QGIS, GDAL and SQLite, makes queries complex, and slows them down (due to JSON / HSTORE string parsing).
Instead, an OSMer and a DBMS think in these terms:
SELECT (feature identification number)
FROM (data source)
WHERE ... = 'bench'
AND ... LIKE '%wood%'
-- LIKE is to handle multi-valued tags and leading / trailing semicolons, such as ';wood;metal;'
So we make a table of (feature identification number, key, value).[2][3]
This table improves query performance drastically. Indices can be optionally defined on the new columns, to improve performance further.
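To see why the normalized table makes queries simpler, here is a toy comparison in Python with made-up data; the table names follow this tutorial, and the first query assumes an SQLite built with JSON support:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points(ogc_fid INTEGER PRIMARY KEY, all_tags VARCHAR)")
conn.execute("""INSERT INTO points VALUES
    (1, '{"amenity":"bench","material":"wood"}'),
    (2, '{"amenity":"fountain"}')""")

# Querying the raw JSON string needs the JSON parsing functions:
rows = conn.execute(
    "SELECT ogc_fid FROM points WHERE json_extract(all_tags, '$.amenity') = 'bench'"
).fetchall()
print(rows)  # [(1,)]

# With the normalized (ogc_fid, key, value) table the query is plain SQL:
conn.execute("CREATE TABLE points_fid_to_tag(ogc_fid INTEGER, key VARCHAR, value VARCHAR)")
conn.executemany("INSERT INTO points_fid_to_tag VALUES (?, ?, ?)", [
    (1, "amenity", "bench"), (1, "material", "wood"), (2, "amenity", "fountain"),
])
rows = conn.execute(
    "SELECT ogc_fid FROM points_fid_to_tag WHERE key = 'amenity' AND value = 'bench'"
).fetchall()
print(rows)  # [(1,)]
```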
This tutorial assumes you have a table points(ogc_fid INTEGER PRIMARY KEY, all_tags VARCHAR).[4]
Replace the names accordingly, based on your DB. See the ogr2ogr schema for info about how to find the names of the feature identifier column and the OSM tags column in your DB.
We'll only work on the table points for simplicity, but you can repeat this work for each of the tables listed in the SQLite's vector_layers, i.e. lines, multilinestrings, multipolygons, other_relations, points.
Extract tags from HSTORE
Do this if the tags column looks like "key"=>"value".
You do not need to install PostgreSQL, but you do need SQLAlchemy. For example, on Debian, install the package python3-sqlalchemy. Save the following into a file a.py, then run chmod +x a.py ; ./a.py --target /path/to/file.sqlite --table points
#! /usr/bin/env python3
import argparse
import sqlite3

parser = argparse.ArgumentParser()
parser.add_argument("--target")
parser.add_argument("--table")
args = parser.parse_args()
target, table = args.target, args.table

# Reuse SQLAlchemy's HSTORE parser, per <https://stackoverflow.com/a/56152470>
from sqlalchemy.dialects.postgresql import HSTORE
hstore_to_dict = HSTORE().result_processor(None, None)

create = (
    f"CREATE TABLE {table}_fid_to_tag("
    "ogc_fid INTEGER, key VARCHAR, value VARCHAR,"
    f" FOREIGN KEY(ogc_fid) REFERENCES {table}(ogc_fid));"
)
select = f"SELECT ogc_fid, all_tags FROM {table} WHERE all_tags IS NOT NULL;"
insert = f"INSERT INTO {table}_fid_to_tag VALUES(?, ?, ?);"

conn = sqlite3.connect(target)
conn.execute(create)
for (fid, tags) in conn.execute(select).fetchall():
    for key, value in hstore_to_dict(tags).items():
        conn.execute(insert, (fid, key, value))
conn.commit()  # required, otherwise close() discards the inserts
conn.close()
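If you want a feel for what the HSTORE parser produces before running the script on a real DB, the conversion can be approximated with a small regex. This is a simplification for illustration only: it ignores escaped quotes and NULL values, which SQLAlchemy's parser handles properly.

```python
import re

def hstore_to_dict(text):
    # Matches "key"=>"value" pairs; escaped quotes and NULLs are not handled
    return dict(re.findall(r'"([^"]*)"\s*=>\s*"([^"]*)"', text))

tags = hstore_to_dict('"amenity"=>"bench","material"=>"wood;metal"')
print(tags)  # {'amenity': 'bench', 'material': 'wood;metal'}
```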
Extract tags from JSON
This section applies if the tags column looks like {"key":"value"}. This format is supported since GDAL 3.7, which was not yet released at the time of writing; so this section has not been tested, but here is a sketch of how it would work. Note that, for SQLite versions older than 3.38.0, redistributors of SQLite were allowed to disable JSON support, in which case the HSTORE route is your only option.[5]
-- SQLite does not allow column definitions in CREATE TABLE ... AS SELECT,
-- so create the table first, then fill it:
CREATE TABLE points_fid_to_tag(
    ogc_fid INTEGER,
    key VARCHAR,
    value VARCHAR,
    FOREIGN KEY(ogc_fid) REFERENCES points(ogc_fid));
INSERT INTO points_fid_to_tag
    SELECT ogc_fid, key, value
    FROM points, json_tree(all_tags)
    WHERE key IS NOT NULL;
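As noted, this route is untested against real ogr2ogr output, but the json_tree extraction itself can be tried on a toy database. A sketch in Python, assuming an SQLite built with JSON support:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points(ogc_fid INTEGER PRIMARY KEY, all_tags VARCHAR)")
conn.execute("""INSERT INTO points VALUES (1, '{"amenity":"bench","material":"wood"}')""")

conn.execute("CREATE TABLE points_fid_to_tag(ogc_fid INTEGER, key VARCHAR, value VARCHAR)")
# json_tree emits one row per JSON node; the root node has key NULL, so
# filtering on key keeps exactly the (key, value) members of the object
conn.execute("""INSERT INTO points_fid_to_tag
    SELECT ogc_fid, key, value
    FROM points, json_tree(all_tags)
    WHERE key IS NOT NULL""")

rows = conn.execute("SELECT * FROM points_fid_to_tag ORDER BY key").fetchall()
print(rows)  # [(1, 'amenity', 'bench'), (1, 'material', 'wood')]
```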
Create indices
Query performance on our new table can already be acceptable if the DB is not too large. Otherwise, you want the WHERE clause to use an index. The indices created below should be enough.[6] See below to check that a query uses indices instead of iterating over all the rows of a table, and for help in making custom indices on other DB tables.
CREATE UNIQUE INDEX index_points_fid_to_tag_by_id_key
ON points_fid_to_tag(ogc_fid, key);
CREATE INDEX index_points_fid_to_tag_by_all
ON points_fid_to_tag(key, value, ogc_fid);
CREATE INDEX index_points_fid_to_tag_by_value
ON points_fid_to_tag(value);
Example queries
Remember:
- To compare multi-valued tags (separated by semicolons or similar) you want LIKE '%value%'.
- Quote strings with ' (single quotes), not " (double quotes).
Get the OpenStreetMap ids of features:
SELECT osm_id
FROM points, points_fid_to_tag
WHERE points.ogc_fid = points_fid_to_tag.ogc_fid
AND key = 'amenity'
AND value = 'bench';
Get all the tags of a feature:
SELECT key, value
FROM points, points_fid_to_tag
WHERE points.ogc_fid = points_fid_to_tag.ogc_fid
AND osm_id = ...;
Intersect queries:
SELECT COUNT(*)
FROM (
SELECT ogc_fid
FROM points_fid_to_tag
WHERE key = 'amenity'
AND value = 'bench'
) AS benches,
(
SELECT ogc_fid
FROM points_fid_to_tag
WHERE key = 'material'
AND value LIKE '%wood%'
) AS wooden
WHERE benches.ogc_fid = wooden.ogc_fid;
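The intersect query above can be tried end to end on made-up data. A minimal sketch in Python with the tutorial's table name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points_fid_to_tag(ogc_fid INTEGER, key VARCHAR, value VARCHAR)")
conn.executemany("INSERT INTO points_fid_to_tag VALUES (?, ?, ?)", [
    (1, "amenity", "bench"), (1, "material", "wood;metal"),   # wooden bench
    (2, "amenity", "bench"), (2, "material", "plastic"),      # plastic bench
    (3, "amenity", "fountain"),                                # not a bench
])

(count,) = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT ogc_fid FROM points_fid_to_tag
          WHERE key = 'amenity' AND value = 'bench') AS benches,
         (SELECT ogc_fid FROM points_fid_to_tag
          WHERE key = 'material' AND value LIKE '%wood%') AS wooden
    WHERE benches.ogc_fid = wooden.ogc_fid""").fetchone()
print(count)  # 1: only feature 1 is both a bench and wooden
```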
General usage
Visualisation tools
General tools to navigate in a SQLite database:
- sqlitebrowser: open the tab Browse Data, select the table in the dropdown box, and filter columns by typing something into the text filters below the table column headers.
- sqlite3 command-line program:
  - Type sqlite3 to run SQL interactively in the terminal window.
  - Or give it SQL as an argument. For example, using the above optimization tutorial:
sqlite3 file.sqlite "select ogc_fid from points_fid_to_tag where key = 'amenity' and value = 'atm' limit 10;"
Indices
You can make indices on any column of any DB table. You can pass a query to .expert to auto-suggest indices that speed up that specific query.[7][8]
Beware that a WHERE clause only uses indices whose first columns are the WHERE clause's columns[9], and that at most one index is used for each table in a query[10].
For example, ogr2ogr makes a column osm_id. To make an index to speed up the WHERE osm_id = ... part of a query, run:
CREATE INDEX index_points_by_osm_id ON points(osm_id);
To check that a query uses an index (for example, for the optimization tutorial above):
username@machine$ sqlite3 the_file.sqlite
sqlite> EXPLAIN QUERY PLAN
...> SELECT points.geometry
...> FROM points, points_fid_to_tag
...> WHERE points.ogc_fid = points_fid_to_tag.ogc_fid
...> AND points_fid_to_tag.key = 'amenity'
...> AND points_fid_to_tag.value = 'bench';
QUERY PLAN
|--SCAN points_fid_to_tag
`--SEARCH points USING INTEGER PRIMARY KEY (rowid=?)
sqlite>
sqlite> -- Make indices per above
sqlite> CREATE INDEX ...
sqlite>
sqlite> EXPLAIN QUERY PLAN
...> SELECT points.geometry
...> FROM points, points_fid_to_tag
...> WHERE points.ogc_fid = points_fid_to_tag.ogc_fid
...> AND points_fid_to_tag.key = 'amenity'
...> AND points_fid_to_tag.value = 'bench';
QUERY PLAN
|--SEARCH points_fid_to_tag USING COVERING INDEX index_points_fid_to_tag_by_all (key=? AND value=?)
`--SEARCH points USING INTEGER PRIMARY KEY (rowid=?)
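The same check can be scripted instead of typed interactively. A sketch in Python, using the index from the tutorial above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points_fid_to_tag(ogc_fid INTEGER, key VARCHAR, value VARCHAR)")

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output is the human-readable detail
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT ogc_fid FROM points_fid_to_tag WHERE key = 'amenity' AND value = 'bench'"
before = plan(query)
print(before)  # without an index: a full SCAN of points_fid_to_tag

conn.execute("CREATE INDEX index_points_fid_to_tag_by_all"
             " ON points_fid_to_tag(key, value, ogc_fid)")
after = plan(query)
print(after)   # now a SEARCH using the covering index
```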
See also
References
- ↑ Note that GDAL supports JSON since version 3.7. Before that, ogr2ogr can only emit HSTORE.
- ↑ An alternative approach is to make one column per key. ogr2ogr supports this style for user-chosen keys that are hard-coded in ogrconf.ini's attributes key. If you do this, a query for a rare key can fail with "column not found" instead of returning 0 records, which is inconvenient when working with QGIS or sqlite3 interactively, and forces you to handle special cases when querying the DB programmatically (user script, cron job). Also, if you ever want to update the DB, it is easier (and possibly more performant) to add / remove rows than to alter the table schema.
- ↑ Ideally, we would make a materialized view, so that a change to the original table points (adding or removing a feature, or changing a feature's tags) would auto-update our new table. SQLite does not have materialized views, so we make a table instead; if the original table is changed, this new table will not be auto-updated (if you make a trigger, please add instructions). We do not make a STORED generated column, because generating two columns would need twice the parsing.
- ↑ Having the primary key of exact type "INTEGER" (not "INT", not another type) exploits an internal SQLite optimization called rowid tables.
- ↑ If you want to learn more about json_tree, see the docs or run .mode box and then SELECT * FROM json_tree('{"outer":{"inner":"value"}}');.
- ↑ They were returned by passing the queries on this page to .expert.
- ↑ .expert is documented here in section 17.
- ↑ If you run .expert and it gives the error sqlite3_expert_new: no such module: VirtualSpatialIndex on Debian, then install the package libsqlite3-mod-spatialite and run SELECT load_extension("mod_spatialite");.
- ↑ Query planner documentation's section "3. WHERE Clause Analysis"
- ↑ Query planner documentation's section "8. Choosing Between Multiple Indexes"