pystatis package
Below you will find all pystatis submodules.
pystatis.cache module
Module provides functions/decorators to cache downloaded data as well as remove cached data.
- pystatis.cache.cache_data(cache_dir, name, params, data, content_type)
Compress and archive data within the configured cache directory.
Data is stored in a zip file within the cache directory. The folder structure is <name>/<endpoint>/<method>/<hash(params)>, which allows different results to be cached for different params.
- Parameters:
cache_dir (str) – The cache directory as configured in the config.
name (str) – The unique identifier in GENESIS-Online.
params (dict) – The dictionary holding the params for this data request.
data (bytes) – The raw bytes content of the response from GENESIS-Online.
content_type (str) – The content type of the data, e.g. “csv” or “zip”.
- Return type:
None
- pystatis.cache.clear_cache(name=None)
Clear the entire data cache, or only the cached data for a specified name.
- Parameters:
name (str, optional) – Unique name to be deleted from cached data.
- Return type:
None
- pystatis.cache.hit_in_cash(cache_dir, name, params)
Check if data is already cached.
- Parameters:
cache_dir (str) – The cache directory as configured in the config.
name (str) – The unique identifier in GENESIS-Online.
params (dict) – The dictionary holding the params for this data request.
- Returns:
True, if combination of name, endpoint, method and params is already cached.
- Return type:
bool
- pystatis.cache.normalize_name(name)
Normalize a Destatis object name by omitting the optional job id.
- Parameters:
name (str) – The unique identifier in GENESIS-Online.
- Returns:
The unique identifier without the optional job id.
- Return type:
str
- pystatis.cache.read_from_cache(cache_dir, name, params)
Read and return compressed data from cache.
- Parameters:
cache_dir (str) – The cache directory as configured in the config.
name (str) – The unique identifier in GENESIS-Online.
params (dict) – The dictionary holding the params for this data request.
- Returns:
The uncompressed raw text data as bytes.
- Return type:
bytes
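The folder layout described for cache_data, and the round trip through hit_in_cash and read_from_cache, can be illustrated with a small standard-library sketch. This is not the pystatis implementation: the sketch_* function names, the explicit endpoint and method arguments, the SHA-256 hash, the data.zip file name, and the table name "12345-0001" are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
import zipfile
from pathlib import Path

ARCHIVE_NAME = "data.zip"  # assumed file name, not taken from pystatis


def sketch_cache_dir(cache_dir, name, endpoint, method, params) -> Path:
    """Build <cache_dir>/<name>/<endpoint>/<method>/<hash(params)>."""
    # Hash a canonical JSON dump so that key order does not change the path.
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:20]
    return Path(cache_dir) / name / endpoint / method / digest


def sketch_cache_data(cache_dir, name, endpoint, method, params, data, content_type) -> Path:
    """Compress the raw bytes into a zip archive inside the cache folder."""
    target_dir = sketch_cache_dir(cache_dir, name, endpoint, method, params)
    target_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / ARCHIVE_NAME
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"data.{content_type}", data)
    return archive


def sketch_hit_in_cache(cache_dir, name, endpoint, method, params) -> bool:
    """Return True if an archive exists for this combination of inputs."""
    return (sketch_cache_dir(cache_dir, name, endpoint, method, params) / ARCHIVE_NAME).is_file()


def sketch_read_from_cache(cache_dir, name, endpoint, method, params) -> bytes:
    """Return the uncompressed bytes of the single archived file."""
    archive = sketch_cache_dir(cache_dir, name, endpoint, method, params) / ARCHIVE_NAME
    with zipfile.ZipFile(archive) as zf:
        return zf.read(zf.namelist()[0])


# Round trip in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    params = {"name": "12345-0001", "area": "all"}
    sketch_cache_data(tmp, "12345-0001", "data", "tablefile", params, b"a;b\n1;2\n", "csv")
    print(sketch_read_from_cache(tmp, "12345-0001", "data", "tablefile", params))
```

Because the params are hashed into the path, a request with different params lands in a different folder and does not overwrite earlier results.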
pystatis.config module
Module for handling config.ini files.
- This package stores core information in the config.ini,
which is stored under the user home directory.
- The user can change the default config directory
by setting the environment variable PYSTATIS_CONFIG_DIR or pass a custom directory to the init_config function.
- The current config directory is always stored in the settings.ini file,
which is located under the default config directory, which the user cannot change.
- If the user does not specify a custom config directory,
the default config directory is used, which is ~/.pystatis on Linux and %USERHOME%/.pystatis on Windows.
- The config.ini holds all relevant information
about all supported databases like user credentials.
- When the package is loaded for the first time,
a default config will be created with empty credentials. Subsequent calls to other pystatis functions will throw an error until the user has filled in the credentials.
- pystatis.config.config_exists()
Check if the config file exists.
- Return type:
bool
- pystatis.config.create_default_config()
Create a default config parser with empty credentials.
- Return type:
None
- pystatis.config.delete_config()
Delete the config file.
- Return type:
None
- pystatis.config.get_cache_dir()
Get the cache directory.
- Return type:
str
- pystatis.config.get_db_identifiers()
Get the regex patterns that match item codes in the supported databases.
- Return type:
dict[str, Pattern[str]]
- pystatis.config.get_supported_db()
Get a list of supported database names.
- Return type:
list[str]
- pystatis.config.init_config()
Create a new config.ini file in the given directory.
One-time function to be called for new users to create a new config.ini with default values (empty credentials).
- Parameters:
config_dir (str, optional) – Path to the root config directory. Defaults to the user home directory.
- Return type:
None
- pystatis.config.load_config(config_file=None)
Load a config from a file.
- Parameters:
config_file (Path | None)
- Return type:
ConfigParser
- pystatis.config.setup_credentials()
Set up credentials for all supported databases.
- Return type:
None
- pystatis.config.write_config()
Write a config to a file.
- Return type:
None
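The rules from the module description (custom directory via argument or PYSTATIS_CONFIG_DIR, fallback to ~/.pystatis, and a default config with empty credentials) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the function names, the section and key names in the config, and the precedence of the explicit argument over the environment variable are not taken from pystatis.

```python
import os
from configparser import ConfigParser
from pathlib import Path

DEFAULT_CONFIG_DIR = Path.home() / ".pystatis"
SUPPORTED_DBS = ("genesis", "zensus", "regio")


def resolve_config_dir(config_dir=None, environ=None) -> Path:
    """Pick the config directory: argument first, then env var, then default."""
    environ = os.environ if environ is None else environ
    if config_dir:
        return Path(config_dir)
    if environ.get("PYSTATIS_CONFIG_DIR"):
        return Path(environ["PYSTATIS_CONFIG_DIR"])
    return DEFAULT_CONFIG_DIR


def build_default_config() -> ConfigParser:
    """Create a config with one section per database and empty credentials."""
    config = ConfigParser()
    for db_name in SUPPORTED_DBS:
        # Assumed layout: section = database name, keys = username/password.
        config[db_name] = {"username": "", "password": ""}
    return config


def missing_credentials(config: ConfigParser) -> list:
    """List the databases whose username or password is still empty."""
    return [
        db_name
        for db_name in SUPPORTED_DBS
        if not (config[db_name]["username"] and config[db_name]["password"])
    ]


config = build_default_config()
print(resolve_config_dir(environ={}))
print(missing_credentials(config))
```

A check like missing_credentials mirrors the documented behaviour that calls fail until the user has filled in the credentials.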
pystatis.custom_exceptions module
Define custom “speaking” Exception and Error classes.
- exception pystatis.exception.DestatisStatusError
Bases:
ValueError
Raised when the Destatis status code indicates an error (“Fehler”).
- exception pystatis.exception.NoNewerDataError
Bases:
Exception
Raised when no newer data is available for download (parameter stand, API Error Code 50).
- exception pystatis.exception.PystatisConfigError
Bases:
Exception
Raised when pystatis configuration is invalid.
- exception pystatis.exception.TableNotFoundError
Bases:
Exception
Raised when table is not found in the database (API Error Code 90).
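To show how the documented conditions (API error code 50, API error code 90, and a “Fehler” status) could map onto these exceptions, here is a minimal sketch. The exception classes are local stand-ins with the same names and bases so the example runs without pystatis installed; the check_status helper and the shape of the status dictionary (keys "Code", "Type", "Content") are assumptions, not the library's API.

```python
class DestatisStatusError(ValueError):
    """Stand-in: status indicates an error ("Fehler")."""


class NoNewerDataError(Exception):
    """Stand-in: no newer data available (API error code 50)."""


class TableNotFoundError(Exception):
    """Stand-in: table not found (API error code 90)."""


def check_status(status: dict) -> None:
    """Raise the matching exception for an assumed status dictionary."""
    code = status.get("Code")
    message = status.get("Content", "")
    if code == 50:
        raise NoNewerDataError(message)
    if code == 90:
        raise TableNotFoundError(message)
    if status.get("Type") == "Fehler":
        raise DestatisStatusError(message)


# Callers can handle the specific cases separately from generic errors.
try:
    check_status({"Code": 90, "Type": "Fehler", "Content": "table missing"})
except TableNotFoundError as error:
    print("not found:", error)
```

Checking the specific codes before the generic “Fehler” type keeps the more informative exception from being masked.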
pystatis.find module
Implements the find endpoint to retrieve results based on a query.
- class pystatis.find.Find(query, db_name, top_n_preview=5)
Bases:
object
A class representing the find object that includes Result objects for variables, statistics, cubes and tables.
- Parameters:
query (str)
db_name (str)
top_n_preview (int)
- query
The query that is provided to find endpoint.
- Type:
str
- db_name
The database that is used for the query. One of “genesis”, “zensus”, “regio”.
- Type:
str
- statistics
Statistics that match with the query.
- Type:
Results
- tables
Tables that match with the query.
- Type:
Results
- variables
Variables that match with the query.
- Type:
Results
- cubes
Cubes that match with the query.
- Type:
Results
- run()
Queries the API and prints summary.
- Return type:
None
- summary()
Prints summary of all results.
- Return type:
str
- run()
Execute the search to find statistics, variables, tables and cubes.
- Return type:
None
- summary()
- Returns:
String that contains summary statistics.
- Return type:
str
pystatis.helloworld module
Module provides wrapper for HelloWorld GENESIS REST-API functions.
- pystatis.helloworld.logincheck(db_name)
Wrapper method which constructs a URL for testing the Destatis API logincheck method, which tests the login credentials (from the config.ini).
In addition, this method automatically terminates requests that have been running for longer than 15 minutes if too many requests are running in parallel. This method restores the ability to work by cleaning up if the limit has been exceeded.
- Parameters:
db_name (str) – Name of the database to login to
- Returns:
text logincheck response from Destatis
- Return type:
str
- pystatis.helloworld.whoami(db_name)
Wrapper method which constructs a URL for testing the Destatis API whoami method, which returns host name and IP address.
- Parameters:
db_name (str) – Name of the database to test
- Returns:
text test response from Destatis
- Return type:
str
pystatis.http_helper module
Wrapper module for the data endpoint.
- pystatis.http_helper.get_data_from_endpoint(endpoint, method, params, db_name=None)
Wrapper method which constructs a URL for querying data from Destatis and sends a GET request.
- Parameters:
endpoint (str) – Destatis endpoint (e.g. data, catalogue, …)
method (str) – Destatis method (e.g. tablefile, …)
params (dict) – dictionary of query parameters
db_name (str, optional) – The database to use for this data request. One of “genesis”, “zensus”, “regio”. Defaults to None.
- Returns:
the response object holding the response from calling the Destatis endpoint.
- Return type:
requests.Response
- pystatis.http_helper.get_data_from_resultfile(job_id, db_name=None)
Get data from a job once it is finished or when the timeout is reached.
- Parameters:
job_id (str) – Job ID generated by Destatis API.
db_name (str, optional) – The database to use for this data request. One of “genesis”, “zensus”, “regio”. Defaults to None.
- Returns:
the response object holding the response from calling the Destatis endpoint.
- Return type:
requests.Response
- pystatis.http_helper.get_job_id_from_response(response)
Get the job ID of a successfully started job.
- Parameters:
response (requests.Response) – Response from endpoint request with job set equal to true.
- Returns:
the job id.
- Return type:
str
- pystatis.http_helper.load_data(endpoint, method, params, db_name=None)
Load data identified by endpoint, method and params.
Either load data from the cache (previous download) or from Destatis. If no database is given, params must contain a valid value for the “name” key.
- Parameters:
endpoint (str) – The endpoint for this data request.
method (str) – The method for this data request.
params (dict) – The dictionary holding the params for this data request.
db_name (str, optional) – The database to use for this data request. One of “genesis”, “zensus”, “regio”. Defaults to None.
- Returns:
The response content as bytes data.
- Return type:
bytes
- pystatis.http_helper.start_job(endpoint, method, params)
Small helper function to start a job in the background.
- Parameters:
endpoint (str) – Destatis endpoint (e.g. data, catalogue, …)
method (str) – Destatis method (e.g. tablefile, …)
params (dict) – dictionary of query parameters
- Returns:
the response object holding the response from calling the Destatis endpoint.
- Return type:
requests.Response
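The job workflow (start_job, then get_data_from_resultfile once the job is finished or the timeout is reached) amounts to a polling loop. The following is a minimal sketch of that pattern only: wait_for_job, its parameters, and the default timeout and interval values are hypothetical and do not reflect how pystatis implements it.

```python
import time
from typing import Callable


def wait_for_job(
    is_finished: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until is_finished() is true or the timeout has elapsed.

    Returns True if the job finished, False if the timeout was reached.
    """
    deadline = clock() + timeout
    while True:
        if is_finished():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example with a fake job that reports success on the third check.
checks = []


def fake_job_finished() -> bool:
    checks.append(1)
    return len(checks) >= 3


print(wait_for_job(fake_job_finished, timeout=1.0, interval=0.01))
```

Passing sleep and clock as arguments keeps the loop testable without real waiting.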
pystatis.profile module
Module provides wrapper for Profile GENESIS REST-API functions.
- pystatis.profile.change_password(db_name, new_password)
Changes Genesis REST-API password and updates local config.
- Parameters:
new_password (str) – New password for the Genesis REST-API
db_name (str) – Database for which the password should be changed (genesis, zensus, regio)
- Returns:
text response from Destatis
- Return type:
str
- pystatis.profile.remove_result(name, area='all')
Remove ‘Ergebnistabellen’ from the permission space ‘area’. Should only apply to manually saved data, visible under ‘Meine Tabellen’ in the web interface.
- Parameters:
name (str) – ‘Ergebnistabelle’ to be removed
area (str) – permission area in which the ‘Ergebnistabelle’ resides
- Returns:
text response from Destatis
- Return type:
str
pystatis.table module
Module contains business logic related to Destatis tables.
- class pystatis.table.Table(name)
Bases:
object
A wrapper class holding all relevant data and metadata about a given table.
- Parameters:
name (str) – The unique identifier of this table.
raw_data (str) – The raw tablefile data as returned by the /data/table endpoint.
data (pd.DataFrame) – The parsed data as a pandas data frame.
metadata (dict) – Metadata as returned by the /metadata/table endpoint.
- static extract_ags_col(data, codes, label)
Extracts the AGS column from the data if present.
- Parameters:
data (pd.DataFrame) – The data frame to extract the AGS column from.
codes (list[str]) – The AGS codes to look for in the data.
label (str) – The label of the AGS column.
- Returns:
The AGS column if present, otherwise None.
- Return type:
pd.Series | None
- get_data(*, prettify=True, area='all', startyear='', endyear='', timeslices='', regionalvariable='', regionalkey='', stand='', language='de', quality='off')
Downloads raw data and metadata from GENESIS-Online.
Additional keyword arguments are passed on to the GENESIS-Online GET request for tablefile.
- Parameters:
prettify (bool, optional) – Reformats the table into a readable format. Defaults to True.
area (str, optional) – Area to search for the object in GENESIS-Online. Defaults to “all”.
startyear (str, optional) – Data beginning with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.
endyear (str, optional) – Data ending with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.
timeslices (str, optional) – Number of time slices to be returned. This parameter is cumulative to startyear and endyear.
regionalvariable (str, optional) –
“code” of the regional classification (RKMerkmal), to which the selection using regionalkey is to be applied. Accepts 1-6 characters. Possible values:
- Regionalstatistik (only for tables ending with “B”, see /catalogue/variables):
“DG” (Deutschland, 1) -> will not return extra column
“DLAND” (Bundesländer, 16)
“REGBEZ” (Regierungsbezirke, 44)
“KREISE” (Kreise und kreisfreie Städte, 489)
“GEMEIN” (Gemeinden, 13564)
- Zensusdatenbank (for all tables, see /catalogue/variables):
“GEODL1” (Deutschland, 1) -> will not return extra column
“GEODL3” (Deutschland, 1) -> will not return extra column
“GEOBL1” (Bundesländer, 16)
“GEOBL3” (Bundesländer, 16)
“GEOBZ1” (Bezirke (Hamburg und Berlin), 19)
“GEOGM1” (Gemeinden, 11340)
“GEOGM2” (Gemeinden mit min. 10_000 Einwohnern, 1574)
“GEOGM3” (Gemeinden mit min. 10_000 Einwohnern, 1574)
“GEOGM4” (Gemeinden (Gebietsstand 15.05.2022), 10787)
“GEOLK1” (Landkreise und kreisfreie Städte, 412)
“GEOLK3” (Landkreise und kreisfreie Städte, 412)
“GEOLK4” (Landkreise u. krsfr. Städte (Stand 15.05.22), 400)
“GEORB1” (Regierungsbezirke/Statistische Regionen, 36)
“GEORB3” (Regierungsbezirke/Statistische Regionen, 36)
“GEOVB1” (Gemeindeverbände, 1333)
“GEOVB2” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 338)
“GEOVB3” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 157)
“GEOVB4” (Gemeindeverbände (Gebietsstand 15.05.2022), 1207)
regionalkey (str, optional) – Official municipality key (AGS). Multiple values can be passed as a comma-separated list. Accepts 1-12 characters. “*” can be used as wildcard.
stand (str, optional) – Only download the table if it is newer than the status date. “tt.mm.jjjj hh:mm” or “tt.mm.jjjj”. Example: “24.12.2001 19:15”.
language (str, optional) – Messages and data descriptions are supplied in this language. For GENESIS and Zensus, [‘de’, ‘en’] are supported. For Regionalstatistik, only ‘de’ is supported.
quality (str) – One of “on” or “off”. If “on”, quality symbols are part of the download and additional columns (__q) are displayed. Defaults to “off”. The explanation of the quality labels can be found online after retrieving the table values, table -> explanation of symbols or at e.g. https://www-genesis.destatis.de/genesis/online?operation=ergebnistabelleQualitaet&language=en&levelindex=3&levelid=1719342760835#abreadcrumb.
- Return type:
None
- static parse_v4_table(data, language)
Parse the ffcsv format for tables from GENESIS and Regionalstatistik into a more readable format.
- Parameters:
data (DataFrame)
language (str)
- Return type:
DataFrame
- static parse_v5_table(data, db_name, language)
Parse the Zensus table ffcsv format into a more readable format.
- Parameters:
data (DataFrame)
db_name (str)
language (str)
- Return type:
DataFrame
- static prettify_table(data, db_name, language)
Reformat the data into a more readable table.
- Parameters:
data (pd.DataFrame) – A pandas dataframe created from raw_data
db_name (str) – The name of the database.
language (str) – The requested language. One of “de” or “en”.
- Returns:
Formatted dataframe that omits all unnecessary Code columns and includes informative column names.
- Return type:
pd.DataFrame
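The string formats that get_data documents for startyear, endyear, and stand can be checked before a request is sent. The sketch below only illustrates those documented formats; the helper names is_valid_year and parse_stand are hypothetical, and pystatis is not claimed to perform this validation itself.

```python
import re
from datetime import datetime

# "jjjj" or "jjjj/jj", as described for startyear and endyear.
YEAR_PATTERN = re.compile(r"([0-9]{4})(?:/[0-9]{2})?")


def is_valid_year(value: str) -> bool:
    """Accept "" (unset), "jjjj" or "jjjj/jj" with the year in 1900..2100."""
    if value == "":
        return True
    match = YEAR_PATTERN.fullmatch(value)
    return match is not None and 1900 <= int(match.group(1)) <= 2100


def parse_stand(value: str) -> datetime:
    """Parse "tt.mm.jjjj hh:mm" or "tt.mm.jjjj" into a datetime."""
    for fmt in ("%d.%m.%Y %H:%M", "%d.%m.%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"stand must look like '24.12.2001 19:15', got {value!r}")


print(is_valid_year("2020/21"))
print(parse_stand("24.12.2001 19:15"))
```

Validating locally avoids spending a request on a value the API would reject.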
Overall Module contents
pystatis is a Python wrapper for the GENESIS web service interface (API).
Basic usage:
```python
import pystatis

print("Version:", pystatis.__version__)
```
- class pystatis.Find(query, db_name, top_n_preview=5)
Bases:
object
A class representing the find object that includes Result objects for variables, statistics, cubes and tables.
- Parameters:
query (str)
db_name (str)
top_n_preview (int)
- query
The query that is provided to find endpoint.
- Type:
str
- db_name
The database that is used for the query. One of “genesis”, “zensus”, “regio”.
- Type:
str
- statistics
Statistics that match with the query.
- Type:
Results
- tables
Tables that match with the query.
- Type:
Results
- variables
Variables that match with the query.
- Type:
Results
- cubes
Cubes that match with the query.
- Type:
Results
- run()
Queries the API and prints summary.
- Return type:
None
- summary()
Prints summary of all results.
- Return type:
str
- run()
Execute the search to find statistics, variables, tables and cubes.
- Return type:
None
- summary()
- Returns:
String that contains summary statistics.
- Return type:
str
- class pystatis.Table(name)
Bases:
object
A wrapper class holding all relevant data and metadata about a given table.
- Parameters:
name (str) – The unique identifier of this table.
raw_data (str) – The raw tablefile data as returned by the /data/table endpoint.
data (pd.DataFrame) – The parsed data as a pandas data frame.
metadata (dict) – Metadata as returned by the /metadata/table endpoint.
- static extract_ags_col(data, codes, label)
Extracts the AGS column from the data if present.
- Parameters:
data (pd.DataFrame) – The data frame to extract the AGS column from.
codes (list[str]) – The AGS codes to look for in the data.
label (str) – The label of the AGS column.
- Returns:
The AGS column if present, otherwise None.
- Return type:
pd.Series | None
- get_data(*, prettify=True, area='all', startyear='', endyear='', timeslices='', regionalvariable='', regionalkey='', stand='', language='de', quality='off')
Downloads raw data and metadata from GENESIS-Online.
Additional keyword arguments are passed on to the GENESIS-Online GET request for tablefile.
- Parameters:
prettify (bool, optional) – Reformats the table into a readable format. Defaults to True.
area (str, optional) – Area to search for the object in GENESIS-Online. Defaults to “all”.
startyear (str, optional) – Data beginning with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.
endyear (str, optional) – Data ending with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.
timeslices (str, optional) – Number of time slices to be returned. This parameter is cumulative to startyear and endyear.
regionalvariable (str, optional) –
“code” of the regional classification (RKMerkmal), to which the selection using regionalkey is to be applied. Accepts 1-6 characters. Possible values:
- Regionalstatistik (only for tables ending with “B”, see /catalogue/variables):
“DG” (Deutschland, 1) -> will not return extra column
“DLAND” (Bundesländer, 16)
“REGBEZ” (Regierungsbezirke, 44)
“KREISE” (Kreise und kreisfreie Städte, 489)
“GEMEIN” (Gemeinden, 13564)
- Zensusdatenbank (for all tables, see /catalogue/variables):
“GEODL1” (Deutschland, 1) -> will not return extra column
“GEODL3” (Deutschland, 1) -> will not return extra column
“GEOBL1” (Bundesländer, 16)
“GEOBL3” (Bundesländer, 16)
“GEOBZ1” (Bezirke (Hamburg und Berlin), 19)
“GEOGM1” (Gemeinden, 11340)
“GEOGM2” (Gemeinden mit min. 10_000 Einwohnern, 1574)
“GEOGM3” (Gemeinden mit min. 10_000 Einwohnern, 1574)
“GEOGM4” (Gemeinden (Gebietsstand 15.05.2022), 10787)
“GEOLK1” (Landkreise und kreisfreie Städte, 412)
“GEOLK3” (Landkreise und kreisfreie Städte, 412)
“GEOLK4” (Landkreise u. krsfr. Städte (Stand 15.05.22), 400)
“GEORB1” (Regierungsbezirke/Statistische Regionen, 36)
“GEORB3” (Regierungsbezirke/Statistische Regionen, 36)
“GEOVB1” (Gemeindeverbände, 1333)
“GEOVB2” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 338)
“GEOVB3” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 157)
“GEOVB4” (Gemeindeverbände (Gebietsstand 15.05.2022), 1207)
regionalkey (str, optional) – Official municipality key (AGS). Multiple values can be passed as a comma-separated list. Accepts 1-12 characters. “*” can be used as wildcard.
stand (str, optional) – Only download the table if it is newer than the status date. “tt.mm.jjjj hh:mm” or “tt.mm.jjjj”. Example: “24.12.2001 19:15”.
language (str, optional) – Messages and data descriptions are supplied in this language. For GENESIS and Zensus, [‘de’, ‘en’] are supported. For Regionalstatistik, only ‘de’ is supported.
quality (str) – One of “on” or “off”. If “on”, quality symbols are part of the download and additional columns (__q) are displayed. Defaults to “off”. The explanation of the quality labels can be found online after retrieving the table values, table -> explanation of symbols or at e.g. https://www-genesis.destatis.de/genesis/online?operation=ergebnistabelleQualitaet&language=en&levelindex=3&levelid=1719342760835#abreadcrumb.
- Return type:
None
- static parse_v4_table(data, language)
Parse the ffcsv format for tables from GENESIS and Regionalstatistik into a more readable format.
- Parameters:
data (DataFrame)
language (str)
- Return type:
DataFrame
- static parse_v5_table(data, db_name, language)
Parse the Zensus table ffcsv format into a more readable format.
- Parameters:
data (DataFrame)
db_name (str)
language (str)
- Return type:
DataFrame
- static prettify_table(data, db_name, language)
Reformat the data into a more readable table.
- Parameters:
data (pd.DataFrame) – A pandas dataframe created from raw_data
db_name (str) – The name of the database.
language (str) – The requested language. One of “de” or “en”.
- Returns:
Formatted dataframe that omits all unnecessary Code columns and includes informative column names.
- Return type:
pd.DataFrame
- pystatis.clear_cache(name=None)
Clear the entire data cache, or only the cached data for a specified name.
- Parameters:
name (str, optional) – Unique name to be deleted from cached data.
- Return type:
None
- pystatis.logincheck(db_name)
Wrapper method which constructs a URL for testing the Destatis API logincheck method, which tests the login credentials (from the config.ini).
In addition, this method automatically terminates requests that have been running for longer than 15 minutes if too many requests are running in parallel. This method restores the ability to work by cleaning up if the limit has been exceeded.
- Parameters:
db_name (str) – Name of the database to login to
- Returns:
text logincheck response from Destatis
- Return type:
str
- pystatis.setup_credentials()
Set up credentials for all supported databases.
- Return type:
None
- pystatis.whoami(db_name)
Wrapper method which constructs a URL for testing the Destatis API whoami method, which returns host name and IP address.
- Parameters:
db_name (str) – Name of the database to test
- Returns:
text test response from Destatis
- Return type:
str