SQLite
SQLite (sqlite.org) is a local database management system. Data is stored in a single local file in a well-documented file format.
SQLite is just a container format; the choice of what can go where in the DB, i.e. the schema, is made by the program that creates the database. For example, the DB can store simple geo data, such as latitude and longitude columns; store structured geo data, when an appropriate schema is used; store pre-computed tiles (which can then be served offline, for example in OsmAnd); or serve as a basic, generic container on which to build new standard file formats.
SpatiaLite extends SQLite with spatial indexing and spatial functions.
Converting OSM data into an SQLite DB
OSMLib can convert to and from SQLite and OSM XML. See osmlib-sqlite docs
HOT Exports web tool will generate an SQLite DB as one of several exported formats
SQLite files and QGIS
Often, it's more useful to store geodata in a plain SQLite DB. Unlike SpatiaLite, it has no geo-functions or spatial index, but these are often not needed at all. SQLite gives a very lightweight way to store such information.
QGIS can produce both kinds of DBs, but it does not add the correct CRS/SRS (information about the coordinate system being used). That might be a bug; in any case it does not work up to version 2.18.3, and maybe later. As a result, QGIS complains about a missing CRS/SRS when re-opening the file and falls back to EPSG:4326. That is fine for raw OpenStreetMap data, but not really useful for anything else.
To add the missing CRS/SRS, remember to store metadata in QGIS and select "SQLite" as export format (not "SpatiaLite"!). In the following example, the SRID is set to 3035, which is used in Europe for statistical maps:
- Open the SQLite file, e.g., with sqlite3:
sqlite3 database.sqlite
- Add the information about spatial reference to the geometry column (table is "demo_data"):
UPDATE geometry_columns SET srid = 3035 WHERE f_table_name = 'demo_data';
- Add the definition of the spatial reference:
INSERT INTO spatial_ref_sys (srid, auth_name, auth_srid, srtext) VALUES (3035, 'epsg', 3035, 'PROJCS["ETRS89 / ETRS-LAEA",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_1989",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6258"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4258"]],UNIT["metre",1,AUTHORITY["EPSG","9001"]],PROJECTION["Lambert_Azimuthal_Equal_Area"],PARAMETER["latitude_of_center",52],PARAMETER["longitude_of_center",10],PARAMETER["false_easting",4321000],PARAMETER["false_northing",3210000],AUTHORITY["EPSG","3035"],AXIS["X",EAST],AXIS["Y",NORTH]]');
Now you're done. The spatial reference string for other EPSG codes can be looked up, for example, at https://spatialreference.org/ref/epsg/3035/postgis/. When copying the PostGIS INSERT statement from there, two adjustments are important:
- Remove "proj4text," from the first bracketed part and remove the corresponding "+proj=..." string from the second one.
- Make the first number of the second bracketed part equal to your EPSG code, e.g. "3035".
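The UPDATE step above can be sanity-checked on a toy database before touching a real export. Here is a minimal sketch in Python; the geometry_columns schema is simplified to two columns, and it assumes the OGR-style convention where the layer name column is called f_table_name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the metadata table found in OGR-created SQLite files
conn.execute("CREATE TABLE geometry_columns(f_table_name TEXT, srid INTEGER)")
conn.execute("INSERT INTO geometry_columns VALUES ('demo_data', -1)")

# The fix: point the layer's geometry at SRID 3035
conn.execute("UPDATE geometry_columns SET srid = 3035 WHERE f_table_name = 'demo_data'")

(srid,) = conn.execute(
    "SELECT srid FROM geometry_columns WHERE f_table_name = 'demo_data'"
).fetchone()
print(srid)  # 3035
```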
Tools using SQLite
Microcosm is a map API server that uses SQLite and its R*Tree module.
Derived formats
Some standard formats are defined as a SQLite DB with a specific schema.
- GeoPackage can contain vector features and raster tile images.
- MBTiles can contain raster tile images.
Optimize tag search
ogr2ogr stores OSM tags in SQLite or GeoPackage files as plain text strings, either in JSON format ('{"key":"value"}') or in HSTORE format ('"key"=>"value"'), in the column all_tags or other_tags.[1]
This violation of first normal form forces you to learn the JSON / HSTORE parsing functions of QGIS, GDAL and SQLite, makes queries complex, and slows them down (due to JSON / HSTORE string parsing).
Instead, an OSMer and a DBMS think in these terms:
SELECT (feature identification number)
FROM (data source)
WHERE ... = 'bench'
AND ... LIKE '%wood%'
-- LIKE is to handle multi-valued tags and leading / trailing semicolons, such as ';wood;metal;'
So we make a table of (feature identification number, key, value).[2][3]
This table improves query performance drastically. Indices can be optionally defined on the new columns, to improve performance further.
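To see why the normalized table makes queries simpler, here is a toy comparison in Python with made-up data; the table names follow this tutorial, and the first query assumes an SQLite built with JSON support:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points(ogc_fid INTEGER PRIMARY KEY, all_tags VARCHAR)")
conn.execute("""INSERT INTO points VALUES
    (1, '{"amenity":"bench","material":"wood"}'),
    (2, '{"amenity":"fountain"}')""")

# Querying the raw JSON string needs the JSON parsing functions:
rows = conn.execute(
    "SELECT ogc_fid FROM points WHERE json_extract(all_tags, '$.amenity') = 'bench'"
).fetchall()
print(rows)  # [(1,)]

# With the normalized (ogc_fid, key, value) table the query is plain SQL:
conn.execute("CREATE TABLE points_fid_to_tag(ogc_fid INTEGER, key VARCHAR, value VARCHAR)")
conn.executemany("INSERT INTO points_fid_to_tag VALUES (?, ?, ?)", [
    (1, "amenity", "bench"), (1, "material", "wood"), (2, "amenity", "fountain"),
])
rows = conn.execute(
    "SELECT ogc_fid FROM points_fid_to_tag WHERE key = 'amenity' AND value = 'bench'"
).fetchall()
print(rows)  # [(1,)]
```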
This tutorial assumes you have a table points(ogc_fid INTEGER PRIMARY KEY, all_tags VARCHAR).[4]
Replace the names accordingly, based on your DB. See the ogr2ogr schema for info about how to find the names of the feature identifier column and the OSM tags column in your DB.
We'll only work on the table points for simplicity, but you can repeat this work for each of the tables listed in the SQLite's vector_layers, i.e. lines, multilinestrings, multipolygons, other_relations, points.
Extract tags from HSTORE
Do this if the tags column looks like "key"=>"value".
You do not need to install PostgreSQL, but you do need SQLAlchemy. For example, on Debian, install the package python3-sqlalchemy. Save the following into a file a.py, then run chmod +x a.py ; ./a.py --target /path/to/file.sqlite --table points
#! /usr/bin/env python3
import argparse
import sqlite3

parser = argparse.ArgumentParser()
parser.add_argument("--target")
parser.add_argument("--table")
args = parser.parse_args()
target, table = args.target, args.table

# Reuse SQLAlchemy's HSTORE parser, per <https://stackoverflow.com/a/56152470>
from sqlalchemy.dialects.postgresql import HSTORE
hstore_to_dict = HSTORE().result_processor(None, None)

create = (
    f"CREATE TABLE {table}_fid_to_tag("
    "ogc_fid INTEGER, key VARCHAR, value VARCHAR,"
    f" FOREIGN KEY(ogc_fid) REFERENCES {table}(ogc_fid));"
)
select = f"SELECT ogc_fid, all_tags FROM {table} WHERE all_tags IS NOT NULL;"
insert = f"INSERT INTO {table}_fid_to_tag VALUES(?, ?, ?);"

conn = sqlite3.connect(target)
conn.execute(create)
for (fid, tags) in conn.execute(select).fetchall():
    for key, value in hstore_to_dict(tags).items():
        conn.execute(insert, (fid, key, value))
conn.commit()  # required, otherwise close() discards the inserts
conn.close()
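If you want a feel for what the HSTORE parser produces before running the script on a real DB, the conversion can be approximated with a small regex. This is a simplification for illustration only: it ignores escaped quotes and NULL values, which SQLAlchemy's parser handles properly.

```python
import re

def hstore_to_dict(text):
    # Matches "key"=>"value" pairs; escaped quotes and NULLs are not handled
    return dict(re.findall(r'"([^"]*)"\s*=>\s*"([^"]*)"', text))

tags = hstore_to_dict('"amenity"=>"bench","material"=>"wood;metal"')
print(tags)  # {'amenity': 'bench', 'material': 'wood;metal'}
```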
Extract tags from JSON
This section applies if the tags column looks like {"key":"value"}. This format is supported since GDAL 3.7, which was not yet released at the time of writing; so this section has not been tested, but here is a sketch of how it would work. Note that, for SQLite versions older than 3.38.0, redistributors of SQLite were allowed to disable JSON support, in which case the HSTORE route is your only option.[5]
-- SQLite does not allow column definitions in CREATE TABLE ... AS SELECT,
-- so create the table first, then fill it:
CREATE TABLE points_fid_to_tag(
    ogc_fid INTEGER,
    key VARCHAR,
    value VARCHAR,
    FOREIGN KEY(ogc_fid) REFERENCES points(ogc_fid));
INSERT INTO points_fid_to_tag
    SELECT ogc_fid, key, value
    FROM points, json_tree(all_tags)
    WHERE key IS NOT NULL;
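As noted, this route is untested against real ogr2ogr output, but the json_tree extraction itself can be tried on a toy database. A sketch in Python, assuming an SQLite built with JSON support:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points(ogc_fid INTEGER PRIMARY KEY, all_tags VARCHAR)")
conn.execute("""INSERT INTO points VALUES (1, '{"amenity":"bench","material":"wood"}')""")

conn.execute("CREATE TABLE points_fid_to_tag(ogc_fid INTEGER, key VARCHAR, value VARCHAR)")
# json_tree emits one row per JSON node; the root node has key NULL, so
# filtering on key keeps exactly the (key, value) members of the object
conn.execute("""INSERT INTO points_fid_to_tag
    SELECT ogc_fid, key, value
    FROM points, json_tree(all_tags)
    WHERE key IS NOT NULL""")

rows = conn.execute("SELECT * FROM points_fid_to_tag ORDER BY key").fetchall()
print(rows)  # [(1, 'amenity', 'bench'), (1, 'material', 'wood')]
```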
Create indices
Query performance on our new table can already be acceptable if the DB is not too large. Otherwise, you want the WHERE clause to use an index. The indices created below should be enough.[6] See below to check that a query uses indices instead of iterating over all the rows of a table, and for help in making custom indices on other DB tables.
CREATE UNIQUE INDEX index_points_fid_to_tag_by_id_key
ON points_fid_to_tag(ogc_fid, key);
CREATE INDEX index_points_fid_to_tag_by_all
ON points_fid_to_tag(key, value, ogc_fid);
CREATE INDEX index_points_fid_to_tag_by_value
ON points_fid_to_tag(value);
Example queries
Remember:
- To compare multi-valued tags (separated by semicolons or similar) you want LIKE '%value%'.
- Quote strings with ' (single quotes), not " (double quotes).
Get the OpenStreetMap ids of features:
SELECT osm_id
FROM points, points_fid_to_tag
WHERE points.ogc_fid = points_fid_to_tag.ogc_fid
AND key = 'amenity'
AND value = 'bench';
Get all the tags of a feature:
SELECT key, value
FROM points, points_fid_to_tag
WHERE points.ogc_fid = points_fid_to_tag.ogc_fid
AND osm_id = ...;
Intersect queries:
SELECT COUNT(*)
FROM (
SELECT ogc_fid
FROM points_fid_to_tag
WHERE key = 'amenity'
AND value = 'bench'
) AS benches,
(
SELECT ogc_fid
FROM points_fid_to_tag
WHERE key = 'material'
AND value LIKE '%wood%'
) AS wooden
WHERE benches.ogc_fid = wooden.ogc_fid;
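The intersect query above can be tried end to end on made-up data. A minimal sketch in Python with the tutorial's table name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points_fid_to_tag(ogc_fid INTEGER, key VARCHAR, value VARCHAR)")
conn.executemany("INSERT INTO points_fid_to_tag VALUES (?, ?, ?)", [
    (1, "amenity", "bench"), (1, "material", "wood;metal"),   # wooden bench
    (2, "amenity", "bench"), (2, "material", "plastic"),      # plastic bench
    (3, "amenity", "fountain"),                                # not a bench
])

(count,) = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT ogc_fid FROM points_fid_to_tag
          WHERE key = 'amenity' AND value = 'bench') AS benches,
         (SELECT ogc_fid FROM points_fid_to_tag
          WHERE key = 'material' AND value LIKE '%wood%') AS wooden
    WHERE benches.ogc_fid = wooden.ogc_fid""").fetchone()
print(count)  # 1: only feature 1 is both a bench and wooden
```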
General usage
Visualisation tools
General tools to navigate in a SQLite database:
- sqlitebrowser: open the tab Browse Data, select the table in the dropdown box, and filter columns by typing something into the text filters below the table column headers.
- sqlite3 command-line program:
  - Type sqlite3 to run SQL interactively in the terminal window.
  - Or give it SQL as an argument. For example, using the above optimization tutorial:
sqlite3 file.sqlite "select ogc_fid from points_fid_to_tag where key = 'amenity' and value = 'atm' limit 10;"
Indices
You can make indices on any column of any DB table. You can pass a query to .expert to auto-suggest indices that speed up that specific query.[7][8]
Beware that a WHERE clause only uses indices whose first columns are the WHERE clause's columns[9], and that at most one index is used for each table in a query[10].
For example, ogr2ogr makes a column osm_id. To make an index to speed up the WHERE osm_id = ... part of a query, run:
CREATE INDEX index_points_by_osm_id ON points(osm_id);
To check that a query uses an index (for example, for the optimization tutorial above):
username@machine$ sqlite3 the_file.sqlite
sqlite> EXPLAIN QUERY PLAN
...> SELECT points.geometry
...> FROM points, points_fid_to_tag
...> WHERE points.ogc_fid = points_fid_to_tag.ogc_fid
...> AND points_fid_to_tag.key = 'amenity'
...> AND points_fid_to_tag.value = 'bench';
QUERY PLAN
|--SCAN points_fid_to_tag
`--SEARCH points USING INTEGER PRIMARY KEY (rowid=?)
sqlite>
sqlite> -- Make indices per above
sqlite> CREATE INDEX ...
sqlite>
sqlite> EXPLAIN QUERY PLAN
...> SELECT points.geometry
...> FROM points, points_fid_to_tag
...> WHERE points.ogc_fid = points_fid_to_tag.ogc_fid
...> AND points_fid_to_tag.key = 'amenity'
...> AND points_fid_to_tag.value = 'bench';
QUERY PLAN
|--SEARCH points_fid_to_tag USING COVERING INDEX index_points_fid_to_tag_by_all (key=? AND value=?)
`--SEARCH points USING INTEGER PRIMARY KEY (rowid=?)
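The same check can be scripted instead of typed interactively. A sketch in Python, using the index from the tutorial above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points_fid_to_tag(ogc_fid INTEGER, key VARCHAR, value VARCHAR)")

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output is the human-readable detail
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT ogc_fid FROM points_fid_to_tag WHERE key = 'amenity' AND value = 'bench'"
before = plan(query)
print(before)  # without an index: a full SCAN of points_fid_to_tag

conn.execute("CREATE INDEX index_points_fid_to_tag_by_all"
             " ON points_fid_to_tag(key, value, ogc_fid)")
after = plan(query)
print(after)   # now a SEARCH using the covering index
```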
See also
References
- ↑ Note that GDAL supports JSON since version 3.7. Before that, ogr2ogr can only emit HSTORE.
- ↑ An alternative approach is to make one column per key. ogr2ogr supports this style for user-chosen keys that are hard-coded in ogrconf.ini's attributes key. If you do this, a query for a rare key can fail with "column not found" instead of returning 0 records, which is inconvenient when working with QGIS or sqlite3 interactively, and forces you to handle special cases when querying the DB programmatically (user script, cron job). Also, if you ever want to update the DB, it is easier (and possibly more performant) to add / remove rows than to alter the table schema.
- ↑ Ideally, we would make a materialized view, so that a change to the original table points (adding or removing a feature, or changing a feature's tags) would auto-update our new table. SQLite does not have materialized views, so we make a table instead; if the original table is changed, this new table will not be auto-updated (if you make a trigger, please add instructions). We do not make a STORED generated column, because generating two columns would need twice the parsing.
- ↑ Having the primary key of exact type "INTEGER" (not "INT", not another type) exploits an internal SQLite optimization called rowid tables.
- ↑ If you want to learn more about json_tree, see the docs or run .mode box and then SELECT * FROM json_tree('{"outer":{"inner":"value"}}');.
- ↑ They were returned by passing the queries on this page to .expert.
- ↑ .expert is documented here in section 17.
- ↑ If you run .expert and it gives the error sqlite3_expert_new: no such module: VirtualSpatialIndex on Debian, then install the package libsqlite3-mod-spatialite and run SELECT load_extension("mod_spatialite");.
- ↑ Query planner documentation's section "3. WHERE Clause Analysis"
- ↑ Query planner documentation's section "8. Choosing Between Multiple Indexes"