Databases and data access APIs
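This page provides an overview of the databases that can be used to store and manipulate OSM data, how to obtain data to populate them, and how to query them to find the useful parts.
It is intended as an overview for new developers who want to write software that uses OSM data, not as information for end users.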
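Sources of OSM data
See also Downloading data for a summary of the basic options.
The various sources of OSM data (for the whole world as well as for small regions) are linked below to other wiki pages with more detail.
Most of the methods below return data in the OSM XML format, which other tools can use to populate a database. The data format is described in detail in Data primitives.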
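As a rough illustration of what a consumer of OSM XML has to do, the following sketch reads nodes and ways from a small document using only the Python standard library. The sample data is hand-written for this example and is not a real extract. The function name `parse_osm` and the shape of the returned dictionaries are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the OSM XML layout (illustrative data only).
SAMPLE_OSM_XML = """<osm version="0.6" generator="example">
  <node id="1" lat="51.5000" lon="-0.1000"/>
  <node id="2" lat="51.5010" lon="-0.1010">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def read_tags(element):
    """Collect the <tag k=... v=...> children of an element into a dict."""
    return {tag.get("k"): tag.get("v") for tag in element.findall("tag")}


def parse_osm(xml_text):
    """Return (nodes, ways) dictionaries keyed by integer OSM id."""
    root = ET.fromstring(xml_text)
    nodes = {}
    ways = {}
    for node in root.findall("node"):
        nodes[int(node.get("id"))] = {
            "lat": float(node.get("lat")),
            "lon": float(node.get("lon")),
            "tags": read_tags(node),
        }
    for way in root.findall("way"):
        ways[int(way.get("id"))] = {
            # The order of <nd> references defines the shape of the way.
            "nodes": [int(nd.get("ref")) for nd in way.findall("nd")],
            "tags": read_tags(way),
        }
    return nodes, ways


nodes, ways = parse_osm(SAMPLE_OSM_XML)
```

For files of any real size, a streaming parser or a dedicated OSM library is likely a better fit than loading the whole document into memory as done here.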
Planet.osm
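Every week, a complete copy of the current OSM data set is saved as Planet.osm in several formats. A number of people split this file into smaller files for different regions and make the extracts available on separate mirror servers. If needed, many tools exist for cutting the Planet file into smaller regions.
The differences between the live OSM data and the Planet file are also published every minute as change files (diffs), which makes it possible to maintain an up-to-date copy of the OSM data set.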
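A client that follows the minutely diffs typically needs to turn a sequence number into the location of the matching change file. The sketch below assumes the directory layout in which the sequence number is zero-padded to nine digits and split into three groups of three, with a `.osc.gz` suffix. The function name `replication_diff_path` is hypothetical, and the layout should be checked against the replication server actually used.

```python
def replication_diff_path(sequence_number):
    """Return the relative path of the change file for a sequence number.

    Assumes a nine-digit, zero-padded sequence number split into
    three directory levels, e.g. 4123456 -> 004/123/456.osc.gz.
    """
    if not 0 <= sequence_number <= 999_999_999:
        raise ValueError("sequence number must fit in nine digits")
    padded = f"{sequence_number:09d}"
    return f"{padded[0:3]}/{padded[3:6]}/{padded[6:9]}.osc.gz"


print(replication_diff_path(4123456))
```

The returned path would be appended to the base URL of the chosen replication directory, which is not shown here.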
XAPI
The Xapi servers allow OSM data to be downloaded in XML format for a given region of the globe, filtered by tag. On request, Xapi will return fairly large areas (city level), which distinguishes it from the standard OSM API described below.
API
The main API is the method of obtaining OSM data used by editors, as it is the only method of changing the OSM data in the live database. The API page links to the specification of the protocol used to obtain data.
Its limitations are:
Overpass API
The Overpass API allows fairly complex queries over larger areas.
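To give an idea of what such a query looks like, the sketch below assembles a simple Overpass QL request for nodes carrying one tag inside a bounding box. It only builds the query text and does not contact any server. The helper name `build_overpass_query` and its parameters are assumptions made for this example.

```python
def build_overpass_query(key, value, bbox, timeout=25):
    """Build an Overpass QL query for nodes with key=value inside bbox.

    bbox is (south, west, north, east) in decimal degrees, which is the
    order Overpass QL expects for bounding-box filters.
    """
    south, west, north, east = bbox
    if south > north:
        raise ValueError("south must not be greater than north")

    def escape(text):
        # Escape backslashes and double quotes inside quoted strings.
        return text.replace("\\", "\\\\").replace('"', '\\"')

    return (
        f"[out:json][timeout:{timeout}];\n"
        f'node["{escape(key)}"="{escape(value)}"]({south},{west},{north},{east});\n'
        "out body;"
    )


print(build_overpass_query("amenity", "cafe", (51.5, -0.2, 51.6, -0.1)))
```

The resulting text would be sent to the interpreter endpoint of an Overpass instance. Which instance to use and its usage policy are left to the reader.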
GeoFabrik
Offers downloads of pre-selected regions, for example by state. Options include which data to include and the file format.
ProtoMaps
Offers downloads of .pbf data by bounding polygon and a time-limited link to re-download the same data. Some meta-data is omitted from tag-less nodes to minimize space.
Choice of DBMS
Several different database systems are used by OSM users:
Database | Benefits | Drawbacks | Used by |
---|---|---|---|
PostgreSQL | Can handle large datasets. The PostGIS extension adds geographic capabilities | Requires a database server to be installed, with associated administrative overhead | Main OSM API, Mapnik renderer |
MySQL | Can handle large datasets | Does not have geographic extensions. Requires a database server to be installed, with associated administrative overhead | The main database API used MySQL until version 0.6, when it was changed to PostgreSQL |
SQLite | Small, does not require a database server | May struggle with large datasets - See Mail Archive (from 2008, may not be current) | Microcosm |
MongoDB | Native geospatial indexes and queries | | MongOSM, Node-Mongosm |
Hadoop / Hive | Can handle very large datasets (known as big data). Extensions available for geospatial queries (for example ESRI GIS for Hadoop) | Requires a Hadoop cluster to be installed, with associated administrative overhead | OSM2Hive |
Database Schemas
The database schema for the main database (openstreetmap.org) can be found here: Rails port/Database schema.
OSM uses different database schemas for different applications.
- Updatable
- Whether the schema supports updating with OsmChange format "diffs".
- This can be extremely important for keeping world-wide databases up-to-date, as it allows the database to be kept up-to-date without requiring a complete (and space- and time-consuming) full, worldwide re-import. However, if you only need a small extract, then re-importing that extract may be a quicker and easier method to keep up-to-date than using the OsmChange diffs.
- Geometries
- Whether the schema has pre-built geometries.
- Some database schemas provide native (e.g. PostGIS) geometries, which can be used directly by other software that reads those geometry formats. Other schemas may hold enough data to produce the geometries (e.g. nodes, ways, relations and their linkage) but not in a native format. Some provide both. If you want to use the database with other software such as a GIS editor, you probably want a schema with pre-built geometries. However, if you are doing your own analysis, or are using software written to work with OSM nodes, ways and relations, you may not need the geometries.
- Lossless
- Whether the full set of OSM data is kept.
- Some schemas will retain the full set of OSM data, including versioning, user IDs, changeset information and all tags. This information is important for editors, and may be of importance to someone doing analysis. However, if it is not important then it may be better to choose a "lossy" schema, as it is likely to take up less disk space and may be quicker to import.
- hstore columns
- Whether the schema uses a key-value pair datatype for tags. (This datatype is called hstore in PostgreSQL.)
- hstore is perhaps the most straightforward approach to represent OSM's freeform tagging in PostgreSQL. However, not all tools use it and other databases might not have (or need) an equivalent.
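To make the "Updatable" property above more concrete, the following sketch applies a small OsmChange document to an in-memory set of nodes. The `create`, `modify` and `delete` blocks follow the OsmChange layout. The sample data, the function name `apply_osmchange` and the node store (a plain dictionary of coordinates) are assumptions made for illustration. A real schema would also handle ways, relations, tags and versions.

```python
import xml.etree.ElementTree as ET

# Hand-written OsmChange sample (illustrative data only).
SAMPLE_OSC = """<osmChange version="0.6">
  <create>
    <node id="3" version="1" lat="51.502" lon="-0.102"/>
  </create>
  <modify>
    <node id="2" version="2" lat="51.5011" lon="-0.1011"/>
  </modify>
  <delete>
    <node id="1" version="2" lat="51.5" lon="-0.1"/>
  </delete>
</osmChange>
"""


def apply_osmchange(nodes, osc_text):
    """Return a copy of nodes ({id: (lat, lon)}) with the diff applied."""
    updated = dict(nodes)
    root = ET.fromstring(osc_text)
    for action in root:
        for node in action.findall("node"):
            node_id = int(node.get("id"))
            if action.tag == "delete":
                updated.pop(node_id, None)
            elif action.tag in ("create", "modify"):
                updated[node_id] = (float(node.get("lat")), float(node.get("lon")))
    return updated


before = {1: (51.5, -0.1), 2: (51.501, -0.101)}
after = apply_osmchange(before, SAMPLE_OSC)
print(sorted(after))
```

Applying each diff in sequence is what keeps a large database current without a full re-import.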
Schema name | Created with | Used by | Primary use case | Updatable | Geometries (PostGIS) | Lossless | hstore columns | Database |
---|---|---|---|---|---|---|---|---|
osm2pgsql | osm2pgsql | Mapnik, Kothic JS | Rendering | yes | yes | no | optional | PostgreSQL |
apidb | osmosis | API | Mirroring | yes | no | yes | no | PostgreSQL, MySQL |
pgsnapshot | osmosis | jXAPI | Analysis | yes | optional | yes | yes | PostgreSQL |
imposm | Imposm | | Rendering | no | yes | no | Imposm2: no, Imposm3: yes | PostgreSQL |
nominatim | osm2pgsql | Nominatim | Search, Geocoding | yes | yes | yes | ? | PostgreSQL |
ogr2ogr | ogr2ogr | | Analysis | no | yes | no | optional | various |
osmsharp | OsmSharp | | Routing | yes | no | ? | ? | Oracle |
overpass | Overpass API | | Analysis | yes | ? | yes | ? | custom |
mongosm | MongOSM | | Analysis | maybe | ? | ? | ? | MongoDB |
node-mongosm | Mongoosejs | | Analysis | yes | yes | yes | NA | MongoDB |
osmium | Osmium | | Analysis | no | yes | no | yes | PostgreSQL |
osm2pgsql
The osm2pgsql schema has historically been the standard way to import OSM data for use in rendering software such as Mapnik. It is also useful for analysis, although the schema does not support versioning or history directly. The import is handled by the Osm2pgsql software, which has two modes of operation, slim and non-slim. The mode controls how much memory is used during import and whether the database can be updated afterwards. Slim mode supports updates, but import time depends heavily on disk speed and may reach several days for the full planet, even on a fast machine. Non-slim mode is faster, but does not support updates and requires a vast amount of memory.
The import process is lossy, and controlled by a configuration file in which the keys of elements of interest are listed. The values of these "interesting" elements are imported as columns in the points, lines and polygons tables. (Alternatively, the values of all tags can be imported into an "hstore" type column.) These tables can be very large, and care must be taken to get good indexed performance. If the set of "interesting" keys changes after the import and no hstore column has been used, then the import must be re-run.
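The effect of a lossy, key-driven import can be illustrated with a few lines of Python. This is not osm2pgsql code, only a sketch of the idea: tags whose keys are not listed are dropped unless a catch-all column is kept. The function name `project_tags` and the catch-all column name `tags` are assumptions made for this example.

```python
def project_tags(tags, interesting_keys, keep_hstore=False):
    """Map an object's tags onto a fixed set of columns.

    Keys not listed in interesting_keys are discarded, unless
    keep_hstore is set, in which case all tags are also kept in a
    catch-all "tags" column.
    """
    columns = {key: tags.get(key) for key in interesting_keys}
    if keep_hstore:
        columns["tags"] = dict(tags)
    return columns


street = {"highway": "residential", "name": "Main St", "lit": "yes"}

# Without a catch-all column, the "lit" tag is lost at import time.
print(project_tags(street, ["highway", "name"]))

# With a catch-all column, "lit" can still be queried later.
print(project_tags(street, ["highway", "name"], keep_hstore=True))
```

In the first case, adding `lit` to the list later cannot recover values that were never stored, which is why the import has to be re-run.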
Starting with version 1.3.0, configuration became more flexible: a Lua script now describes the names, fields and types of the database tables. For each processed OSM object, a Lua callback is called in which you decide which tables the object should be written to.
Osm2pgsql is used by Nominatim, too.
For more information, please see the Osm2pgsql website.
apidb
ApiDB is a schema designed to replicate the storage of OSM data in the same manner as the main API schema and can be produced using the Osmosis commands for writing ApiDBs or updating ApiDBs with changes. This schema does not have any native geometry, although in the nodes, ways and relations tables there is enough data to reconstruct the geometries. This schema is not recommended for users who need geometries.
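Reconstructing a geometry from such data amounts to looking up each node of a way in order. The sketch below builds a WKT LINESTRING from an ordered list of node ids and a coordinate lookup. It assumes the coordinates have already been read into Python floats and does not reflect how the apidb tables themselves are laid out. The function name `way_to_wkt` is hypothetical.

```python
def way_to_wkt(way_node_ids, node_coords):
    """Build a WKT LINESTRING from ordered node ids.

    node_coords maps node id -> (lat, lon). WKT expects x before y,
    so each point is written as "lon lat".
    """
    if len(way_node_ids) < 2:
        raise ValueError("a line needs at least two nodes")
    points = []
    for node_id in way_node_ids:
        lat, lon = node_coords[node_id]
        points.append(f"{lon} {lat}")
    return "LINESTRING(" + ", ".join(points) + ")"


coords = {1: (51.5, -0.1), 2: (51.501, -0.101)}
print(way_to_wkt([1, 2], coords))
```

Doing this lookup for every way at query time is costly, which is one reason schemas with pre-built geometries exist.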
This schema does support history, although the import process does not, so it can be used for mirroring of the main OSM DB. A history will be generated as replication diffs are applied.
The import process can take several weeks for the full planet, even on good hardware. As of April 2012, the database takes approximately 1 TB.
For more information, please see the detailed usage page for Osmosis.
pgsnapshot
The pgsnapshot schema is a modified and simplified version of the main OSM DB schema which provides a number of useful features, including generating geometries and storing tags in a single hstore column for easier use and indexing. JXAPI's schema is built on pgsnapshot.
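For readers unfamiliar with hstore, the sketch below shows how a set of OSM tags looks in hstore's text representation, where each pair is written as `"key"=>"value"`. The helper name `to_hstore_literal` is an assumption made for this example. In real code, a database driver's own hstore adaptation is usually preferable to building the text by hand.

```python
def to_hstore_literal(tags):
    """Render a dict of tags in hstore text form, sorted by key."""

    def quote(text):
        # Backslashes and double quotes must be escaped inside quotes.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    pairs = (f"{quote(key)}=>{quote(value)}" for key, value in sorted(tags.items()))
    return ", ".join(pairs)


print(to_hstore_literal({"name": "Main St", "highway": "residential"}))
```

Because all tags live in one column, a single index on that column can serve queries on any key.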
imposm
Imposm is an import tool that generates schemas from a fully configurable mapping. As such it is not really a schema of its own, but it is listed here for completeness. The ability to break data out thematically into different tables greatly simplifies the problem of indexing performance, and may result in smaller table and index sizes on disk.
nominatim
Nominatim is a forward and reverse geocoder. The database is produced by a special back-end of Osm2pgsql. It is a special-purpose database, and may not be suitable for other problem domains such as rendering. The Nominatim homepage provides links to the detailed technical documentation, change logs, etc.
ogr2ogr
The OGR library can read OSM data (XML and PBF) and can write into various other formats, including PostgreSQL/PostGIS, SQLite/Spatialite, and MS SQL databases (though I've tried only PostGIS). The ogr2ogr utility can do the conversion without any programming, using a schema configuration reminiscent of osm2pgsql. One interesting feature is that it resolves relations into geometries: OSM multipolygons and boundaries become OGC MultiPolygon, OSM multilinestrings and routes become OGC MultiLineString, and other OSM relations become OGC GeometryCollection.
It is listed as lossy because membership info, such as nodes in ways and relation members, is not preserved. Metadata is optional. Untagged/unused nodes and ways are optional.
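The relation-to-geometry mapping described above can be written down as a small lookup. The sketch assumes that the kind of relation is read from its `type` tag. The function name `ogc_type_for_relation` is hypothetical and this is not code from the OGR library.

```python
def ogc_type_for_relation(tags):
    """Return the OGC geometry type a relation would be resolved into."""
    relation_type = tags.get("type")
    if relation_type in ("multipolygon", "boundary"):
        return "MultiPolygon"
    if relation_type in ("multilinestring", "route"):
        return "MultiLineString"
    # Any other relation is kept as a generic collection of geometries.
    return "GeometryCollection"


print(ogc_type_for_relation({"type": "boundary", "admin_level": "8"}))
print(ogc_type_for_relation({"type": "route", "route": "bus"}))
print(ogc_type_for_relation({"type": "restriction"}))
```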
overpass
The Overpass API is a query language built on top of a custom back-end database, implemented by software called OSM3S (see OSM3S/install for install and setup instructions). Because this is a custom database engine, it is hard to compare with other database schemas. The complete planet file could be recreated from the database. It is geared towards good performance on locally concentrated datasets.
osmsharp
OsmSharp is a toolbox of OSM-related routines, including some to import OSM data into Oracle databases.
mongosm
MongOSM is a set of Python scripts for importing, querying and (maybe) keeping up-to-date OSM data in a MongoDB database.
node-mongosm
Inspired by MongOSM, Node-MongOSM uses Mongoose to provide schemas and insert vs upsert options via a command line interface.
osmium
The Osmium toolset can read OSM data and, with osmium export, write it into PostgreSQL/PostGIS.
Objects are loaded into a single osmdata table with geom and tags columns.
OSHDB
The OSHDB is a high-performance data analysis framework for analysing OSM's full-history data. Data can be stored in a relational (JDBC) or distributed database (Apache Ignite).