pystatis package

Below you find all pystatis submodules.

pystatis.cache module

Module provides functions/decorators to cache downloaded data as well as remove cached data.

pystatis.cache.cache_data(cache_dir, name, params, data, content_type)

Compress and archive data within the configured cache directory.

Data will be stored in a zip file within the cache directory. The folder structure will be <name>/<endpoint>/<method>/<hash(params)>. This makes it possible to cache different results for different params.

Parameters:
  • cache_dir (str) – The cache directory as configured in the config.

  • name (str) – The unique identifier in GENESIS-Online.

  • params (dict) – The dictionary holding the params for this data request.

  • data (bytes) – The raw bytes content of the response from GENESIS-Online.

  • content_type (str) – The content type of the data, e.g. “csv” or “zip”.

Return type:

None
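
The folder layout described above can be illustrated with a small standard-library sketch. This is not the pystatis implementation: the helpers `build_cache_path`, `write_zip` and `read_zip`, the SHA-256 digest over JSON-serialised params, the truncated digest length, and the table name "12345-0001" are all assumptions made for illustration. The endpoint and method are passed explicitly here only to make the path construction visible.

```python
import hashlib
import json
import tempfile
import zipfile
from pathlib import Path


def build_cache_path(cache_dir, name, endpoint, method, params):
    """Return <cache_dir>/<name>/<endpoint>/<method>/<hash(params)> (hypothetical helper)."""
    # Sort keys so that equal params always produce the same digest.
    serialized = json.dumps(params, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(serialized).hexdigest()[:16]
    return Path(cache_dir) / name / endpoint / method / digest


def write_zip(target_dir, file_name, data):
    """Store raw bytes in a compressed zip archive inside target_dir (hypothetical helper)."""
    target_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / (file_name + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(file_name, data)
    return archive


def read_zip(archive):
    """Return the uncompressed bytes of the first member of the archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(zf.namelist()[0])


with tempfile.TemporaryDirectory() as tmp:
    params_a = {"name": "12345-0001", "language": "de"}
    params_b = {"name": "12345-0001", "language": "en"}
    path_a = build_cache_path(tmp, "12345-0001", "data", "tablefile", params_a)
    path_b = build_cache_path(tmp, "12345-0001", "data", "tablefile", params_b)
    archive = write_zip(path_a, "result.csv", b"col;value\n")
    roundtrip = read_zip(archive)
    # Different params end up in different folders, so both results can coexist.
    different_dirs = path_a != path_b
```

Hashing the sorted params keeps the folder name stable across calls with the same arguments, regardless of the order in which the dictionary was built.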

pystatis.cache.clear_cache(name=None)

Clean the data cache completely or just a specified name.

Parameters:

name (str, optional) – Unique name to be deleted from cached data.

Return type:

None

pystatis.cache.hit_in_cash(cache_dir, name, params)

Check if data is already cached.

Parameters:
  • cache_dir (str) – The cache directory as configured in the config.

  • name (str) – The unique identifier in GENESIS-Online.

  • params (dict) – The dictionary holding the params for this data request.

Returns:

True, if combination of name, endpoint, method and params is already cached.

Return type:

bool

pystatis.cache.normalize_name(name)

Normalize a Destatis object name by omitting the optional job id.

Parameters:

name (str) – The unique identifier in GENESIS-Online.

Returns:

The unique identifier without the optional job id.

Return type:

str
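
As a rough illustration of the normalization, the sketch below strips a job id suffix from a name. It assumes that the job id follows the first underscore; that separator, the helper name `strip_job_id`, and both sample names are assumptions for illustration, not details confirmed by this documentation.

```python
def strip_job_id(name: str) -> str:
    """Drop an optional job id suffix (assumed to follow the first underscore)."""
    return name.split("_", 1)[0]


# Placeholder names; the second one carries a made-up job id suffix.
plain = strip_job_id("12345-0001")
with_job = strip_job_id("12345-0001_123456789")
```

Both calls yield the same identifier, which is what allows a cached result to be found again regardless of whether the request ran as a background job.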

pystatis.cache.read_from_cache(cache_dir, name, params)

Read and return compressed data from cache.

Parameters:
  • cache_dir (str) – The cache directory as configured in the config.

  • name (str) – The unique identifier in GENESIS-Online.

  • params (dict) – The dictionary holding the params for this data request.

Returns:

The uncompressed raw text data as bytes.

Return type:

bytes

pystatis.config module

Module for handling config.ini files.

This package stores core information in the config.ini, which is stored under the user home directory. The user can change the default config directory by setting the environment variable PYSTATIS_CONFIG_DIR or by passing a custom directory to the init_config function.

The current config directory is always stored in the settings.ini file, which is located under the default config directory, which the user cannot change. If the user does not specify a custom config directory, the default config directory is used, which is ~/.pystatis on Linux and %USERHOME%/.pystatis on Windows.

The config.ini holds all relevant information about all supported databases, such as user credentials. When the package is loaded for the first time, a default config will be created with empty credentials. Subsequent calls to other pystatis functions will throw an error until the user has filled in the credentials.
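
The directory lookup described above could be sketched as follows. The helper `resolve_config_dir` is hypothetical and not part of pystatis, and the precedence shown (explicit argument first, then PYSTATIS_CONFIG_DIR, then the default) is an assumption; the module description only states that both mechanisms exist.

```python
import os
from pathlib import Path

# Default location as described in the module documentation.
DEFAULT_CONFIG_DIR = Path.home() / ".pystatis"


def resolve_config_dir(custom_dir=None, environ=None):
    """Pick the config directory (hypothetical helper, assumed precedence)."""
    environ = os.environ if environ is None else environ
    if custom_dir:
        # An explicitly passed directory wins in this sketch.
        return Path(custom_dir)
    env_dir = environ.get("PYSTATIS_CONFIG_DIR")
    if env_dir:
        return Path(env_dir)
    return DEFAULT_CONFIG_DIR


from_argument = resolve_config_dir("/tmp/custom", environ={"PYSTATIS_CONFIG_DIR": "/tmp/env"})
from_env = resolve_config_dir(environ={"PYSTATIS_CONFIG_DIR": "/tmp/env"})
fallback = resolve_config_dir(environ={})
```

Passing the environment as a parameter keeps the sketch easy to test without modifying the real process environment.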

pystatis.config.config_exists()

Check if the config file exists.

Return type:

bool

pystatis.config.create_default_config()

Create a default config parser with empty credentials.

Return type:

None
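
A default config with empty credentials might look roughly like the sketch below, built with the standard-library configparser. The section names reuse the database names mentioned elsewhere in this reference ("genesis", "zensus", "regio"); the option names `username` and `password` and the helpers `make_default_config` and `missing_credentials` are assumptions for illustration and may differ from the real config.ini.

```python
import configparser
import io

# Database names as listed in the db_name parameter descriptions.
SUPPORTED_DBS = ["genesis", "zensus", "regio"]


def make_default_config():
    """Build a config parser with one section per database and empty credentials."""
    config = configparser.ConfigParser()
    for db in SUPPORTED_DBS:
        # Option names are illustrative assumptions.
        config[db] = {"username": "", "password": ""}
    return config


def missing_credentials(config):
    """Return the sections whose username or password is still empty."""
    return [
        db
        for db in config.sections()
        if not config[db]["username"] or not config[db]["password"]
    ]


config = make_default_config()
buffer = io.StringIO()
config.write(buffer)  # Same format that would be written to an .ini file.
ini_text = buffer.getvalue()
incomplete = missing_credentials(config)
```

A check like `missing_credentials` is one way a package could refuse to continue until the user has filled in the credentials.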

pystatis.config.delete_config()

Delete the config file.

Return type:

None

pystatis.config.get_cache_dir()

Get the cache directory.

Return type:

str

pystatis.config.get_db_identifiers()

Get a list of regex patterns matching item codes in the supported databases.

Return type:

dict[str, Pattern[str]]
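
The returned mapping can be used to work out which database an item code belongs to. The sketch below uses a hand-written stand-in dictionary so that it runs without a config; the two patterns, the sample codes, and the helper `match_databases` are hypothetical and are not the patterns pystatis ships with.

```python
import re

# Hypothetical patterns for illustration only; the real ones would come
# from pystatis.config.get_db_identifiers().
DB_PATTERNS = {
    "genesis": re.compile(r"^\d{5}-\d{4}$"),
    "zensus": re.compile(r"^\d{4}[A-Z]-\d{4}$"),
}


def match_databases(item_code, patterns):
    """Return the names of all databases whose pattern matches the item code."""
    return [db for db, pattern in patterns.items() if pattern.match(item_code)]


genesis_hit = match_databases("12345-0001", DB_PATTERNS)
zensus_hit = match_databases("1000A-0000", DB_PATTERNS)
no_hit = match_databases("not-a-code", DB_PATTERNS)
```

Returning a list rather than a single name leaves room for codes that match more than one database.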

pystatis.config.get_supported_db()

Get a list of supported database names.

Return type:

list[str]

pystatis.config.init_config()

Create a new config .ini file in the given directory.

One-time function to be called for new users to create a new config.ini with default values (empty credentials).

Parameters:

config_dir (str, optional) – Path to the root config directory. Defaults to the user home directory.

Return type:

None

pystatis.config.load_config(config_file=None)

Load a config from a file.

Parameters:

config_file (Path | None)

Return type:

ConfigParser

pystatis.config.setup_credentials()

Set up credentials for all supported databases.

Return type:

None

pystatis.config.write_config()

Write a config to a file.

Return type:

None

pystatis.custom_exceptions module

Define custom “speaking” Exception and Error classes.

exception pystatis.exception.DestatisStatusError

Bases: ValueError

Raised when Destatis status code indicates an error (“Fehler”).

exception pystatis.exception.NoNewerDataError

Bases: Exception

Raised when no newer data is available for download (parameter stand, API Error Code 50).

exception pystatis.exception.PystatisConfigError

Bases: Exception

Raised when pystatis configuration is invalid.

exception pystatis.exception.TableNotFoundError

Bases: Exception

Raised when table is not found in the database (API Error Code 90).
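
How the three API-related exceptions relate to a response can be sketched with local stand-in classes. The class names, base classes, error codes 50 and 90, and the “Fehler” status come from the descriptions above; the function `check_status`, the order of the checks, and the sample code and status values passed in are assumptions for illustration.

```python
class DestatisStatusError(ValueError):
    """Local stand-in mirroring the documented exception."""


class NoNewerDataError(Exception):
    """Local stand-in mirroring the documented exception."""


class TableNotFoundError(Exception):
    """Local stand-in mirroring the documented exception."""


def check_status(code, status, content=""):
    """Raise the matching exception for a status (hypothetical helper)."""
    if code == 50:
        raise NoNewerDataError(content)
    if code == 90:
        raise TableNotFoundError(content)
    if status == "Fehler":
        raise DestatisStatusError(content)


def classify(code, status):
    """Return the name of the raised exception, or 'ok' if none was raised."""
    try:
        check_status(code, status)
    except (NoNewerDataError, TableNotFoundError, DestatisStatusError) as exc:
        return type(exc).__name__
    return "ok"


# Code 0 with an empty status and code 104 are placeholder inputs.
outcomes = [
    classify(0, ""),
    classify(50, "Fehler"),
    classify(90, "Fehler"),
    classify(104, "Fehler"),
]
```

Checking the specific codes before the generic status lets callers catch the narrower exceptions separately.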

pystatis.find module

Implements the find endpoint to retrieve results based on a query.

class pystatis.find.Find(query, db_name, top_n_preview=5)

Bases: object

A class representing the find object that includes Result objects for variables, statistics, cubes and tables.

Parameters:
  • query (str)

  • db_name (str)

  • top_n_preview (int)

query

The query that is provided to find endpoint.

Type:

str

db_name

The database that is used for the query. One of “genesis”, “zensus”, “regio”.

Type:

str

statistics

Statistics that match with the query.

Type:

Results

tables

Tables that match with the query.

Type:

Results

variables

Variables that match with the query.

Type:

Results

cubes

Cubes that match with the query.

Type:

Results

run()

Queries the API and prints summary.

summary()

Prints summary of all results.

run()

Execute the search to find statistics, variables, tables and cubes.

summary()

Returns:

String that contains summary statistics.

Return type:

str

pystatis.helloworld module

Module provides wrapper for HelloWorld GENESIS REST-API functions.

pystatis.helloworld.logincheck(db_name)

Wrapper method which constructs a URL for testing the Destatis API logincheck method, which tests the login credentials (from the config.ini).

Parameters:

db_name (str) – Name of the database to login to

Returns:

text logincheck response from Destatis

Return type:

str

pystatis.helloworld.whoami(db_name)

Wrapper method which constructs a URL for testing the Destatis API whoami method, which returns host name and IP address.

Parameters:

db_name (str) – Name of the database to test

Returns:

text test response from Destatis

Return type:

str

pystatis.http_helper module

Wrapper module for the data endpoint.

pystatis.http_helper.get_data_from_endpoint(endpoint, method, params, db_name=None)

Wrapper method which constructs a url for querying data from Destatis and sends a GET request.

Parameters:
  • endpoint (str) – Destatis endpoint (e.g. data, catalogue, …)

  • method (str) – Destatis method (e.g. tablefile, …)

  • params (dict) – dictionary of query parameters

  • db_name (str, optional) – The database to use for this data request. One of “genesis”, “zensus”, “regio”. Defaults to None.

Returns:

the response object holding the response from calling the Destatis endpoint.

Return type:

requests.Response

pystatis.http_helper.get_data_from_resultfile(job_id, db_name=None)

Get data from a job once it is finished or when the timeout is reached.

Parameters:
  • job_id (str) – Job ID generated by Destatis API.

  • db_name (str, optional) – The database to use for this data request. One of “genesis”, “zensus”, “regio”. Defaults to None.

Returns:

the response object holding the response from calling the Destatis endpoint.

Return type:

requests.Response
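
The "finished or timeout" behaviour can be pictured as a polling loop. The sketch below is generic and does not reflect how pystatis talks to the API: `poll_until_ready`, its parameters, and the fake fetch function are hypothetical, and a result of `None` is assumed to mean "not ready yet".

```python
import time


def poll_until_ready(fetch, timeout_s=5.0, interval_s=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it returns a result or the timeout has passed."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            # Give up and let the caller decide how to handle the timeout.
            return None
        sleep(interval_s)


attempts = {"count": 0}


def fake_fetch():
    """Pretend the job finishes on the third request."""
    attempts["count"] += 1
    return b"data" if attempts["count"] >= 3 else None


result = poll_until_ready(fake_fetch, timeout_s=1.0, interval_s=0.0)
timed_out = poll_until_ready(lambda: None, timeout_s=0.0, interval_s=0.0)
```

Using a monotonic clock avoids surprises when the system time changes while a job is being polled.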

pystatis.http_helper.get_job_id_from_response(response)

Get the job ID of a successfully started job.

Parameters:

response (requests.Response) – Response from endpoint request with job set equal to true.

Returns:

the job id.

Return type:

str

pystatis.http_helper.load_data(endpoint, method, params, db_name=None)

Load data identified by endpoint, method and params.

Either load data from cache (previous download) or from Destatis. If no database is given, params has to have a valid value for “name” key.

Parameters:
  • endpoint (str) – The endpoint for this data request.

  • method (str) – The method for this data request.

  • params (dict) – The dictionary holding the params for this data request.

  • db_name (str, optional) – The database to use for this data request. One of “genesis”, “zensus”, “regio”. Defaults to None.

Returns:

The response content as bytes data.

Return type:

bytes
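
The cache-or-download decision can be reduced to a few lines. The sketch below uses an in-memory dictionary instead of the zip-based cache and a fake download function; `load_with_cache`, the key layout, and the table name are assumptions for illustration only.

```python
def load_with_cache(cache, key, download):
    """Return cached bytes for key, downloading and storing them on a miss."""
    if key in cache:
        return cache[key]
    data = download()
    cache[key] = data
    return data


calls = []


def fake_download():
    """Record each call so the number of real downloads can be counted."""
    calls.append("download")
    return b"payload"


cache = {}
key = ("data", "tablefile", "12345-0001")  # Placeholder endpoint, method, name.
first = load_with_cache(cache, key, fake_download)
second = load_with_cache(cache, key, fake_download)
```

The second call is served from the cache, so the download function runs only once.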

pystatis.http_helper.start_job(endpoint, method, params)

Small helper function to start a job in the background.

Parameters:
  • endpoint (str) – Destatis endpoint (e.g. data, catalogue, …)

  • method (str) – Destatis method (e.g. tablefile, …)

  • params (dict) – dictionary of query parameters

Returns:

the response object holding the response from calling the Destatis endpoint.

Return type:

requests.Response

pystatis.profile module

Module provides wrapper for Profile GENESIS REST-API functions.

pystatis.profile.change_password(db_name, new_password)

Changes Genesis REST-API password and updates local config.

Parameters:
  • new_password (str) – New password for the Genesis REST-API

  • db_name (str) – Database for which the password should be changed (genesis, zensus, regio)

Returns:

text response from Destatis

Return type:

str

pystatis.profile.remove_result(name, area='all')

Remove ‘Ergebnistabellen’ from the permission space ‘area’. Should only apply to manually saved data, visible in ‘Meine Tabellen’ in the web interface.

Parameters:
  • name (str) – ‘Ergebnistabelle’ to be removed

  • area (str) – permission area in which the ‘Ergebnistabelle’ resides

Returns:

text response from Destatis

Return type:

str

pystatis.table module

Module contains business logic related to Destatis tables.

class pystatis.table.Table(name)

Bases: object

A wrapper class holding all relevant data and metadata about a given table.

Parameters:
  • name (str) – The unique identifier of this table.

  • raw_data (str) – The raw tablefile data as returned by the /data/table endpoint.

  • data (pd.DataFrame) – The parsed data as a pandas data frame.

  • metadata (dict) – Metadata as returned by the /metadata/table endpoint.

static extract_ags_col(data, codes, label)

Extracts the AGS column from the data if present.

Parameters:
  • data (pd.DataFrame) – The data frame to extract the AGS column from.

  • codes (list[str]) – The AGS codes to look for in the data.

  • label (str) – The label of the AGS column.

Returns:

The AGS column if present, otherwise None.

Return type:

pd.Series | None

get_data(*, prettify=True, area='all', startyear='', endyear='', timeslices='', regionalvariable='', regionalkey='', stand='', language='de', quality='off')

Downloads raw data and metadata from GENESIS-Online.

Additional keyword arguments are passed on to the GENESIS-Online GET request for tablefile.

Parameters:
  • prettify (bool, optional) – Reformats the table into a readable format. Defaults to True.

  • area (str, optional) – Area to search for the object in GENESIS-Online. Defaults to “all”.

  • startyear (str, optional) – Data beginning with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.

  • endyear (str, optional) – Data ending with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.

  • timeslices (str, optional) – Number of time slices to be returned. This parameter is cumulative to startyear and endyear.

  • regionalvariable (str, optional) –

    “code” of the regional classification (RKMerkmal), to which the selection using regionalkey is to be applied. Accepts 1-6 characters. Possible values:

    • Regionalstatistik (only for tables ending with “B”, see /catalogue/variables):
      • “DG” (Deutschland, 1) -> will not return extra column

      • “DLAND” (Bundesländer, 16)

      • “REGBEZ” (Regierungsbezirke, 44)

      • “KREISE” (Kreise und kreisfreie Städte, 489)

      • “GEMEIN” (Gemeinden, 13564)

    • Zensusdatenbank (for all tables, see /catalogue/variables):
      • “GEODL1” (Deutschland, 1) -> will not return extra column

      • “GEODL3” (Deutschland, 1) -> will not return extra column

      • “GEOBL1” (Bundesländer, 16)

      • “GEOBL3” (Bundesländer, 16)

      • “GEOBZ1” (Bezirke (Hamburg und Berlin), 19)

      • “GEOGM1” (Gemeinden, 11340)

      • “GEOGM2” (Gemeinden mit min. 10_000 Einwohnern, 1574)

      • “GEOGM3” (Gemeinden mit min. 10_000 Einwohnern, 1574)

      • “GEOGM4” (Gemeinden (Gebietsstand 15.05.2022), 10787)

      • “GEOLK1” (Landkreise und kreisfreie Städte, 412)

      • “GEOLK3” (Landkreise und kreisfreie Städte, 412)

      • “GEOLK4” (Landkreise u. krsfr. Städte (Stand 15.05.22), 400)

      • “GEORB1” (Regierungsbezirke/Statistische Regionen, 36)

      • “GEORB3” (Regierungsbezirke/Statistische Regionen, 36)

      • “GEOVB1” (Gemeindeverbände, 1333)

      • “GEOVB2” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 338)

      • “GEOVB3” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 157)

      • “GEOVB4” (Gemeindeverbände (Gebietsstand 15.05.2022), 1207)

  • regionalkey (str, optional) – Official municipality key (AGS). Multiple values can be passed as a comma-separated list. Accepts 1-12 characters. “*” can be used as wildcard.

  • stand (str, optional) – Only download the table if it is newer than the status date. “tt.mm.jjjj hh:mm” or “tt.mm.jjjj”. Example: “24.12.2001 19:15”.

  • language (str, optional) – Messages and data descriptions are supplied in this language. For GENESIS and Zensus, [‘de’, ‘en’] are supported. For Regionalstatistik, only ‘de’ is supported.

  • quality (str) – One of “on” or “off”. If “on”, quality symbols are part of the download and additional columns (__q) are displayed. Defaults to “off”. The explanation of the quality labels can be found online after retrieving the table values, table -> explanation of symbols or at e.g. https://www-genesis.destatis.de/genesis/online?operation=ergebnistabelleQualitaet&language=en&levelindex=3&levelid=1719342760835#abreadcrumb.
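
The string formats accepted by startyear, endyear and stand can be checked locally before a request is sent. The sketch below is an illustration only: the helpers `is_valid_year` and `parse_stand` are hypothetical and not part of pystatis, and the range check is applied to the first four digits only, since the meaning of the optional two-digit suffix is not spelled out here.

```python
import re
from datetime import datetime

# Four digits, optionally followed by "/" and two more digits (jjjj or jjjj/jj).
YEAR_PATTERN = re.compile(r"^(\d{4})(/\d{2})?$")


def is_valid_year(value):
    """Check the jjjj or jjjj/jj format and the documented 1900-2100 range."""
    match = YEAR_PATTERN.match(value)
    if not match:
        return False
    return 1900 <= int(match.group(1)) <= 2100


def parse_stand(value):
    """Parse 'tt.mm.jjjj hh:mm' or 'tt.mm.jjjj' into a datetime."""
    for fmt in ("%d.%m.%Y %H:%M", "%d.%m.%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported stand format: {value!r}")


year_checks = [is_valid_year(v) for v in ("2020", "2020/21", "1899", "20")]
stand_with_time = parse_stand("24.12.2001 19:15")
stand_date_only = parse_stand("24.12.2001")
```

Validating these values up front gives a clearer error than a failed request, although the API remains the final authority on what it accepts.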

static parse_genesis_and_regio_table(data, language)

Parse ffcsv format for tables from GENESIS and Regionalstatistik into a more readable format.

Parameters:
  • data (DataFrame)

  • language (str)

Return type:

DataFrame

static parse_zensus_table(data, language)

Parse Zensus table ffcsv format into a more readable format.

Parameters:
  • data (DataFrame)

  • language (str)

Return type:

DataFrame

static prettify_table(data, db_name, language)

Reformat the data into a more readable table.

Parameters:
  • data (pd.DataFrame) – A pandas dataframe created from raw_data

  • db_name (str) – The name of the database.

  • language (str) – The requested language. One of “de” or “en”.

Returns:

Formatted dataframe that omits all unnecessary Code columns and includes informative column names.

Return type:

pd.DataFrame

Overall Module contents

pystatis is a Python wrapper for the GENESIS web service interface (API).

Basic usage:

```python
import pystatis

print("Version:", pystatis.__version__)
```

class pystatis.Find(query, db_name, top_n_preview=5)

Bases: object

A class representing the find object that includes Result objects for variables, statistics, cubes and tables.

Parameters:
  • query (str)

  • db_name (str)

  • top_n_preview (int)

query

The query that is provided to find endpoint.

Type:

str

db_name

The database that is used for the query. One of “genesis”, “zensus”, “regio”.

Type:

str

statistics

Statistics that match with the query.

Type:

Results

tables

Tables that match with the query.

Type:

Results

variables

Variables that match with the query.

Type:

Results

cubes

Cubes that match with the query.

Type:

Results

run()

Queries the API and prints summary.

summary()

Prints summary of all results.

run()

Execute the search to find statistics, variables, tables and cubes.

summary()

Returns:

String that contains summary statistics.

Return type:

str

class pystatis.Table(name)

Bases: object

A wrapper class holding all relevant data and metadata about a given table.

Parameters:
  • name (str) – The unique identifier of this table.

  • raw_data (str) – The raw tablefile data as returned by the /data/table endpoint.

  • data (pd.DataFrame) – The parsed data as a pandas data frame.

  • metadata (dict) – Metadata as returned by the /metadata/table endpoint.

static extract_ags_col(data, codes, label)

Extracts the AGS column from the data if present.

Parameters:
  • data (pd.DataFrame) – The data frame to extract the AGS column from.

  • codes (list[str]) – The AGS codes to look for in the data.

  • label (str) – The label of the AGS column.

Returns:

The AGS column if present, otherwise None.

Return type:

pd.Series | None

get_data(*, prettify=True, area='all', startyear='', endyear='', timeslices='', regionalvariable='', regionalkey='', stand='', language='de', quality='off')

Downloads raw data and metadata from GENESIS-Online.

Additional keyword arguments are passed on to the GENESIS-Online GET request for tablefile.

Parameters:
  • prettify (bool, optional) – Reformats the table into a readable format. Defaults to True.

  • area (str, optional) – Area to search for the object in GENESIS-Online. Defaults to “all”.

  • startyear (str, optional) – Data beginning with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.

  • endyear (str, optional) – Data ending with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.

  • timeslices (str, optional) – Number of time slices to be returned. This parameter is cumulative to startyear and endyear.

  • regionalvariable (str, optional) –

    “code” of the regional classification (RKMerkmal), to which the selection using regionalkey is to be applied. Accepts 1-6 characters. Possible values:

    • Regionalstatistik (only for tables ending with “B”, see /catalogue/variables):
      • “DG” (Deutschland, 1) -> will not return extra column

      • “DLAND” (Bundesländer, 16)

      • “REGBEZ” (Regierungsbezirke, 44)

      • “KREISE” (Kreise und kreisfreie Städte, 489)

      • “GEMEIN” (Gemeinden, 13564)

    • Zensusdatenbank (for all tables, see /catalogue/variables):
      • “GEODL1” (Deutschland, 1) -> will not return extra column

      • “GEODL3” (Deutschland, 1) -> will not return extra column

      • “GEOBL1” (Bundesländer, 16)

      • “GEOBL3” (Bundesländer, 16)

      • “GEOBZ1” (Bezirke (Hamburg und Berlin), 19)

      • “GEOGM1” (Gemeinden, 11340)

      • “GEOGM2” (Gemeinden mit min. 10_000 Einwohnern, 1574)

      • “GEOGM3” (Gemeinden mit min. 10_000 Einwohnern, 1574)

      • “GEOGM4” (Gemeinden (Gebietsstand 15.05.2022), 10787)

      • “GEOLK1” (Landkreise und kreisfreie Städte, 412)

      • “GEOLK3” (Landkreise und kreisfreie Städte, 412)

      • “GEOLK4” (Landkreise u. krsfr. Städte (Stand 15.05.22), 400)

      • “GEORB1” (Regierungsbezirke/Statistische Regionen, 36)

      • “GEORB3” (Regierungsbezirke/Statistische Regionen, 36)

      • “GEOVB1” (Gemeindeverbände, 1333)

      • “GEOVB2” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 338)

      • “GEOVB3” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 157)

      • “GEOVB4” (Gemeindeverbände (Gebietsstand 15.05.2022), 1207)

  • regionalkey (str, optional) – Official municipality key (AGS). Multiple values can be passed as a comma-separated list. Accepts 1-12 characters. “*” can be used as wildcard.

  • stand (str, optional) – Only download the table if it is newer than the status date. “tt.mm.jjjj hh:mm” or “tt.mm.jjjj”. Example: “24.12.2001 19:15”.

  • language (str, optional) – Messages and data descriptions are supplied in this language. For GENESIS and Zensus, [‘de’, ‘en’] are supported. For Regionalstatistik, only ‘de’ is supported.

  • quality (str) – One of “on” or “off”. If “on”, quality symbols are part of the download and additional columns (__q) are displayed. Defaults to “off”. The explanation of the quality labels can be found online after retrieving the table values, table -> explanation of symbols or at e.g. https://www-genesis.destatis.de/genesis/online?operation=ergebnistabelleQualitaet&language=en&levelindex=3&levelid=1719342760835#abreadcrumb.

static parse_genesis_and_regio_table(data, language)

Parse ffcsv format for tables from GENESIS and Regionalstatistik into a more readable format.

Parameters:
  • data (DataFrame)

  • language (str)

Return type:

DataFrame

static parse_zensus_table(data, language)

Parse Zensus table ffcsv format into a more readable format.

Parameters:
  • data (DataFrame)

  • language (str)

Return type:

DataFrame

static prettify_table(data, db_name, language)

Reformat the data into a more readable table.

Parameters:
  • data (pd.DataFrame) – A pandas dataframe created from raw_data

  • db_name (str) – The name of the database.

  • language (str) – The requested language. One of “de” or “en”.

Returns:

Formatted dataframe that omits all unnecessary Code columns and includes informative column names.

Return type:

pd.DataFrame

pystatis.clear_cache(name=None)

Clean the data cache completely or just a specified name.

Parameters:

name (str, optional) – Unique name to be deleted from cached data.

Return type:

None

pystatis.logincheck(db_name)

Wrapper method which constructs a URL for testing the Destatis API logincheck method, which tests the login credentials (from the config.ini).

Parameters:

db_name (str) – Name of the database to login to

Returns:

text logincheck response from Destatis

Return type:

str

pystatis.setup_credentials()

Set up credentials for all supported databases.

Return type:

None

pystatis.whoami(db_name)

Wrapper method which constructs a URL for testing the Destatis API whoami method, which returns host name and IP address.

Parameters:

db_name (str) – Name of the database to test

Returns:

text test response from Destatis

Return type:

str