pystatis package
All pystatis submodules are listed below.
pystatis.cache module
Module provides functions/decorators to cache downloaded data as well as remove cached data.
- pystatis.cache.cache_data(cache_dir, name, params, data, content_type)
Compress and archive data within the configured cache directory.
Data will be stored in a zip file within the cache directory. The folder structure will be <name>/<endpoint>/<method>/<hash(params)>. This makes it possible to cache different results for different params.
- Parameters:
cache_dir (str) – The cache directory as configured in the config.
name (str) – The unique identifier in GENESIS-Online.
params (dict) – The dictionary holding the params for this data request.
data (bytes) – The raw bytes content of the response from GENESIS-Online.
content_type (str) – The content type of the data, e.g. “csv” or “zip”.
- Return type:
None
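The storage scheme described above can be illustrated with a minimal sketch. It is not the pystatis implementation: the helper names `params_hash` and `store_in_cache`, the SHA-256 digest, the archive name `data.zip` and the placeholder table name are all assumptions, and the path is simplified to <name>/<hash(params)> because the documented signature takes no endpoint or method argument.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def params_hash(params: dict) -> str:
    """Return a stable hex digest for a params dict (key order does not matter)."""
    canonical = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:16]


def store_in_cache(cache_dir: str, name: str, params: dict, data: bytes, content_type: str) -> Path:
    """Write data to <cache_dir>/<name>/<hash(params)>/data.zip and return the zip path.

    Hypothetical helper for illustration only.
    """
    target_dir = Path(cache_dir) / name / params_hash(params)
    target_dir.mkdir(parents=True, exist_ok=True)
    zip_path = target_dir / "data.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # The archive member carries the content type as its file extension.
        archive.writestr(f"data.{content_type}", data)
    return zip_path
```

Because the params are serialized with sorted keys before hashing, two requests that differ only in key order map to the same cache folder, while any change in a value yields a different folder.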
- pystatis.cache.clear_cache(name=None)
Clear the entire data cache, or only the cached data for a specified name.
- Parameters:
name (str, optional) – Unique name to be deleted from cached data.
- Return type:
None
- pystatis.cache.hit_in_cash(cache_dir, name, params)
Check if data is already cached.
- Parameters:
cache_dir (str) – The cache directory as configured in the config.
name (str) – The unique identifier in GENESIS-Online.
params (dict) – The dictionary holding the params for this data request.
- Returns:
True if the combination of name, endpoint, method and params is already cached.
- Return type:
bool
- pystatis.cache.normalize_name(name)
Normalize a Destatis object name by omitting the optional job id.
- Parameters:
name (str) – The unique identifier in GENESIS-Online.
- Returns:
The unique identifier without the optional job id.
- Return type:
str
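A minimal sketch of this normalization follows. It assumes that the optional job id is appended to the name after an underscore; this section does not state the separator, so treat that as an assumption. The helper name `strip_job_id` and the sample names are hypothetical.

```python
def strip_job_id(name: str) -> str:
    """Return the part of name before the first underscore.

    Assumption: the optional job id follows an underscore.
    """
    return name.split("_", 1)[0]
```

A name without an underscore is returned unchanged, so the function is safe to call on already normalized names.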
- pystatis.cache.read_from_cache(cache_dir, name, params)
Read and return compressed data from cache.
- Parameters:
cache_dir (str) – The cache directory as configured in the config.
name (str) – The unique identifier in GENESIS-Online.
params (dict) – The dictionary holding the params for this data request.
- Returns:
The uncompressed raw text data as bytes.
- Return type:
bytes
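Checking for a cache hit and reading the cached bytes back could look roughly like the sketch below. The helpers `is_cached` and `read_cached_zip` are hypothetical and work on an explicit zip path rather than on cache_dir, name and params; the assumption that each archive holds exactly one file is also not taken from pystatis.

```python
import zipfile
from pathlib import Path


def is_cached(zip_path: Path) -> bool:
    """A hit here means the archive exists and is a valid zip file."""
    return zip_path.is_file() and zipfile.is_zipfile(zip_path)


def read_cached_zip(zip_path: Path) -> bytes:
    """Return the uncompressed bytes of the first file stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        first_member = archive.namelist()[0]
        return archive.read(first_member)
```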
pystatis.config module
Module for handling config.ini files.
- This package stores core information in the config.ini, which is stored under the user home directory.
- The user can change the default config directory by setting the environment variable PYSTATIS_CONFIG_DIR or by passing a custom directory to the init_config function.
- The current config directory is always stored in the settings.ini file, which is located under the default config directory, which the user cannot change.
- If the user does not specify a custom config directory, the default config directory is used, which is ~/.pystatis on Linux and %USERHOME%/.pystatis on Windows.
- The config.ini holds all relevant information about all supported databases, such as user credentials.
- When the package is loaded for the first time, a default config will be created with empty credentials. Subsequent calls to other pystatis functions will throw an error until the user has filled in the credentials.
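The directory lookup described in the list above can be sketched as follows. The function `resolve_config_dir` is hypothetical, and the precedence (explicit argument first, then PYSTATIS_CONFIG_DIR, then the default) is an assumption; the text only says that both mechanisms exist.

```python
import os
from pathlib import Path
from typing import Optional

# Default location named in the text above (~/.pystatis).
DEFAULT_CONFIG_DIR = Path.home() / ".pystatis"


def resolve_config_dir(custom_dir: Optional[str] = None) -> Path:
    """Pick the config directory.

    Assumed precedence: explicit argument, then environment variable, then default.
    """
    if custom_dir:
        return Path(custom_dir)
    env_dir = os.environ.get("PYSTATIS_CONFIG_DIR")
    if env_dir:
        return Path(env_dir)
    return DEFAULT_CONFIG_DIR
```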
- pystatis.config.config_exists()
Check if the config file exists.
- Return type:
bool
- pystatis.config.create_default_config()
Create a default config parser with empty credentials.
- Return type:
None
- pystatis.config.delete_config()
Delete the config file.
- Return type:
None
- pystatis.config.get_cache_dir()
Get the cache directory.
- Return type:
str
- pystatis.config.get_db_identifiers()
Get the regex patterns that match item codes in the supported databases.
- Return type:
dict[str, Pattern[str]]
- pystatis.config.get_supported_db()
Get a list of supported database names.
- Return type:
list[str]
- pystatis.config.init_config()
Create a new config.ini file in the given directory.
One-time function to be called for new users to create a new config.ini with default values (empty credentials).
- Parameters:
config_dir (str, optional) – Path to the root config directory. Defaults to the user home directory.
- Return type:
None
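A default config with empty credentials can be illustrated with the standard library's configparser. This is a sketch only: the section layout (one section per database) and the option names `username` and `password` are assumptions, and the helpers `build_default_config`, `credentials_missing` and `to_ini_text` are hypothetical, not part of pystatis.

```python
import configparser
import io
from typing import Iterable


def build_default_config(db_names: Iterable[str]) -> configparser.ConfigParser:
    """Create one section per database with empty credentials (assumed layout)."""
    config = configparser.ConfigParser()
    for db_name in db_names:
        config[db_name] = {"username": "", "password": ""}
    return config


def credentials_missing(config: configparser.ConfigParser, db_name: str) -> bool:
    """Return True while username or password for db_name is still empty."""
    username = config.get(db_name, "username", fallback="")
    password = config.get(db_name, "password", fallback="")
    return not (username and password)


def to_ini_text(config: configparser.ConfigParser) -> str:
    """Serialize the config in .ini syntax, as it would be written to disk."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()
```

A check like `credentials_missing` is one way a package could decide to raise an error until the user has filled in the credentials.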
- pystatis.config.load_config(config_file=None)
Load a config from a file.
- Parameters:
config_file (Path | None)
- Return type:
ConfigParser
- pystatis.config.setup_credentials()
Set up credentials for all supported databases.
- Return type:
None
- pystatis.config.write_config()
Write a config to a file.
- Return type:
None
pystatis.custom_exceptions module
Define custom “speaking” Exception and Error classes.
- exception pystatis.exception.DestatisStatusError
Bases:
ValueError
Raised when the Destatis status code indicates an error (“Fehler”).
- exception pystatis.exception.NoNewerDataError
Bases:
Exception
Raised when no newer data is available for download (parameter stand, API Error Code 50).
- exception pystatis.exception.PystatisConfigError
Bases:
Exception
Raised when pystatis configuration is invalid.
- exception pystatis.exception.TableNotFoundError
Bases:
Exception
Raised when table is not found in the database (API Error Code 90).
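The relationship between API codes and these exceptions can be sketched as below. The classes are local stand-ins that mirror the documented names and base classes so that the example runs without pystatis; the function `raise_for_api_code` and the order of the checks are assumptions.

```python
class DestatisStatusError(ValueError):
    """Stand-in: status indicates an error ("Fehler")."""


class NoNewerDataError(Exception):
    """Stand-in: no newer data available (API Error Code 50)."""


class TableNotFoundError(Exception):
    """Stand-in: table not found (API Error Code 90)."""


def raise_for_api_code(code: int, status: str = "") -> None:
    """Raise the matching exception, or return None if nothing is wrong.

    Hypothetical dispatcher; pystatis may check responses differently.
    """
    if code == 50:
        raise NoNewerDataError("no newer data available")
    if code == 90:
        raise TableNotFoundError("table not found")
    if status == "Fehler":
        raise DestatisStatusError(f"error status with code {code}")
```

Since DestatisStatusError derives from ValueError, callers that already catch ValueError will also catch it, while the other two must be caught explicitly or via Exception.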
pystatis.find module
Implements the find endpoint to retrieve results based on a query.
- class pystatis.find.Find(query, db_name, top_n_preview=5)
Bases:
object
A class representing the find object that includes Result objects for variables, statistics, cubes and tables.
- Parameters:
query (str)
db_name (str)
top_n_preview (int)
- query
The query that is passed to the find endpoint.
- Type:
str
- db_name
The database that is used for the query. One of “genesis”, “zensus”, “regio”.
- Type:
str
- statistics
Statistics that match the query.
- Type:
Results
- tables
Tables that match the query.
- Type:
Results
- variables
Variables that match the query.
- Type:
Results
- cubes
Cubes that match the query.
- Type:
Results
- run()
Queries the API and prints summary.
- summary()
Prints summary of all results.
- run()
Execute the search to find statistics, variables, tables and cubes.
- summary()
- Returns:
String that contains summary statistics.
- Return type:
summary_string
pystatis.helloworld module
Module provides wrapper for HelloWorld GENESIS REST-API functions.
- pystatis.helloworld.logincheck(db_name)
Wrapper method which constructs a URL for testing the Destatis API logincheck method, which tests the login credentials (from the config.ini).
- Parameters:
db_name (str) – Name of the database to log in to.
- Returns:
text logincheck response from Destatis
- Return type:
str
- pystatis.helloworld.whoami(db_name)
Wrapper method which constructs a URL for testing the Destatis API whoami method, which returns host name and IP address.
- Parameters:
db_name (str) – Name of the database to test
- Returns:
text test response from Destatis
- Return type:
str
pystatis.http_helper module
Wrapper module for the data endpoint.
- pystatis.http_helper.get_data_from_endpoint(endpoint, method, params, db_name=None)
Wrapper method which constructs a URL for querying data from Destatis and sends a GET request.
- Parameters:
endpoint (str) – Destatis endpoint (e.g. data, catalogue, …)
method (str) – Destatis method (e.g. tablefile, …)
params (dict) – dictionary of query parameters
db_name (str, optional) – The database to use for this data request. One of “genesis”, “zensus”, “regio”. Defaults to None.
- Returns:
the response object holding the response from calling the Destatis endpoint.
- Return type:
requests.Response
- pystatis.http_helper.get_data_from_resultfile(job_id, db_name=None)
Get data from a job once it is finished or when the timeout is reached.
- Parameters:
job_id (str) – Job ID generated by Destatis API.
db_name (str, optional) – The database to use for this data request. One of “genesis”, “zensus”, “regio”. Defaults to None.
- Returns:
the response object holding the response from calling the Destatis endpoint.
- Return type:
requests.Response
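Waiting for a job "until it is finished or the timeout is reached" is a polling pattern, sketched generically below. Nothing here is taken from pystatis: `poll_until_ready`, its parameters, and the convention that `fetch` returns None while the job is still running are all assumptions for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_ready(
    fetch: Callable[[], Optional[T]],
    timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Call fetch repeatedly until it returns a result or the timeout expires.

    Returns the first non-None result, or None if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

The `sleep` and `clock` arguments are injectable so that the loop can be tested without real waiting.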
- pystatis.http_helper.get_job_id_from_response(response)
Get the job ID of a successfully started job.
- Parameters:
response (requests.Response) – Response from endpoint request with job set equal to true.
- Returns:
the job id.
- Return type:
str
- pystatis.http_helper.load_data(endpoint, method, params, db_name=None)
Load data identified by endpoint, method and params.
Either load data from cache (previous download) or from Destatis. If no database is given, params has to have a valid value for the “name” key.
- Parameters:
endpoint (str) – The endpoint for this data request.
method (str) – The method for this data request.
params (dict) – The dictionary holding the params for this data request.
db_name (str, optional) – The database to use for this data request. One of “genesis”, “zensus”, “regio”. Defaults to None.
- Returns:
The response content as bytes data.
- Return type:
bytes
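The cache-or-download decision can be sketched with an in-memory dictionary standing in for the on-disk cache. The names `cache_key` and `load_or_download` and the `download` callable are hypothetical; the real function works with the cache module and HTTP requests instead.

```python
import json
from typing import Callable, Dict


def cache_key(name: str, params: dict) -> str:
    """Build a key that is stable with respect to the order of params."""
    return name + "|" + json.dumps(params, sort_keys=True)


def load_or_download(
    cache: Dict[str, bytes],
    name: str,
    params: dict,
    download: Callable[[], bytes],
) -> bytes:
    """Return cached bytes if present; otherwise download once and remember them."""
    key = cache_key(name, params)
    if key not in cache:
        cache[key] = download()
    return cache[key]
```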
- pystatis.http_helper.start_job(endpoint, method, params)
Small helper function to start a job in the background.
- Parameters:
endpoint (str) – Destatis endpoint (e.g. data, catalogue, …)
method (str) – Destatis method (e.g. tablefile, …)
params (dict) – dictionary of query parameters
- Returns:
the response object holding the response from calling the Destatis endpoint.
- Return type:
requests.Response
pystatis.profile module
Module provides wrapper for Profile GENESIS REST-API functions.
- pystatis.profile.change_password(db_name, new_password)
Changes the GENESIS REST-API password and updates the local config.
- Parameters:
new_password (str) – New password for the Genesis REST-API
db_name (str) – Database for which the password should be changed (genesis, zensus, regio)
- Returns:
text response from Destatis
- Return type:
str
- pystatis.profile.remove_result(name, area='all')
Remove ‘Ergebnistabellen’ from the permission space ‘area’. Should only apply to manually saved data, visible in ‘Meine Tabellen’ in the web interface.
- Parameters:
name (str) – ‘Ergebnistabelle’ to be removed
area (str) – permission area in which the ‘Ergebnistabelle’ resides
- Returns:
text response from Destatis
- Return type:
str
pystatis.table module
Module contains business logic related to Destatis tables.
- class pystatis.table.Table(name)
Bases:
object
A wrapper class holding all relevant data and metadata about a given table.
- Parameters:
name (str) – The unique identifier of this table.
raw_data (str) – The raw tablefile data as returned by the /data/table endpoint.
data (pd.DataFrame) – The parsed data as a pandas data frame.
metadata (dict) – Metadata as returned by the /metadata/table endpoint.
- static extract_ags_col(data, codes, label)
Extracts the AGS column from the data if present.
- Parameters:
data (pd.DataFrame) – The data frame to extract the AGS column from.
codes (list[str]) – The AGS codes to look for in the data.
label (str) – The label of the AGS column.
- Returns:
The AGS column if present, otherwise None.
- Return type:
pd.Series | None
- get_data(*, prettify=True, area='all', startyear='', endyear='', timeslices='', regionalvariable='', regionalkey='', stand='', language='de', quality='off')
Downloads raw data and metadata from GENESIS-Online.
Additional keyword arguments are passed on to the GENESIS-Online GET request for tablefile.
- Parameters:
prettify (bool, optional) – Reformats the table into a readable format. Defaults to True.
area (str, optional) – Area to search for the object in GENESIS-Online. Defaults to “all”.
startyear (str, optional) – Data beginning with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.
endyear (str, optional) – Data ending with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.
timeslices (str, optional) – Number of time slices to be returned. This parameter is cumulative to startyear and endyear.
regionalvariable (str, optional) –
“code” of the regional classification (RKMerkmal), to which the selection using regionalkey is to be applied. Accepts 1-6 characters. Possible values:
- Regionalstatistik (only for tables ending with “B”, see /catalogue/variables):
“DG” (Deutschland, 1) -> will not return extra column
“DLAND” (Bundesländer, 16)
“REGBEZ” (Regierungsbezirke, 44)
“KREISE” (Kreise und kreisfreie Städte, 489)
“GEMEIN” (Gemeinden, 13564)
- Zensusdatenbank (for all tables, see /catalogue/variables):
“GEODL1” (Deutschland, 1) -> will not return extra column
“GEODL3” (Deutschland, 1) -> will not return extra column
“GEOBL1” (Bundesländer, 16)
“GEOBL3” (Bundesländer, 16)
“GEOBZ1” (Bezirke (Hamburg und Berlin), 19)
“GEOGM1” (Gemeinden, 11340)
“GEOGM2” (Gemeinden mit min. 10_000 Einwohnern, 1574)
“GEOGM3” (Gemeinden mit min. 10_000 Einwohnern, 1574)
“GEOGM4” (Gemeinden (Gebietsstand 15.05.2022), 10787)
“GEOLK1” (Landkreise und kreisfreie Städte, 412)
“GEOLK3” (Landkreise und kreisfreie Städte, 412)
“GEOLK4” (Landkreise u. krsfr. Städte (Stand 15.05.22), 400)
“GEORB1” (Regierungsbezirke/Statistische Regionen, 36)
“GEORB3” (Regierungsbezirke/Statistische Regionen, 36)
“GEOVB1” (Gemeindeverbände, 1333)
“GEOVB2” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 338)
“GEOVB3” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 157)
“GEOVB4” (Gemeindeverbände (Gebietsstand 15.05.2022), 1207)
regionalkey (str, optional) – Official municipality key (AGS). Multiple values can be passed as a comma-separated list. Accepts 1-12 characters. “*” can be used as wildcard.
stand (str, optional) – Only download the table if it is newer than the status date. “tt.mm.jjjj hh:mm” or “tt.mm.jjjj”. Example: “24.12.2001 19:15”.
language (str, optional) – Messages and data descriptions are supplied in this language. For GENESIS and Zensus, [‘de’, ‘en’] are supported. For Regionalstatistik, only ‘de’ is supported.
quality (str) – One of “on” or “off”. If “on”, quality symbols are part of the download and additional columns (__q) are displayed. Defaults to “off”. The explanation of the quality labels can be found online after retrieving the table values, table -> explanation of symbols or at e.g. https://www-genesis.destatis.de/genesis/online?operation=ergebnistabelleQualitaet&language=en&levelindex=3&levelid=1719342760835#abreadcrumb.
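The string formats accepted by startyear, endyear and stand can be checked before a request is sent. The sketch below is an illustration under assumptions: `is_valid_year` and `parse_stand` are hypothetical helpers, not part of pystatis, and they only encode the formats stated in the parameter list above.

```python
import re
from datetime import datetime


def is_valid_year(value: str) -> bool:
    """Accept "jjjj" or "jjjj/jj" with the four-digit year between 1900 and 2100."""
    match = re.fullmatch(r"(\d{4})(/\d{2})?", value)
    return bool(match) and 1900 <= int(match.group(1)) <= 2100


def parse_stand(stand: str) -> datetime:
    """Parse "tt.mm.jjjj hh:mm" or "tt.mm.jjjj" into a datetime."""
    for fmt in ("%d.%m.%Y %H:%M", "%d.%m.%Y"):
        try:
            return datetime.strptime(stand, fmt)
        except ValueError:
            continue
    raise ValueError(f"stand must look like 'tt.mm.jjjj hh:mm' or 'tt.mm.jjjj', got {stand!r}")
```

With the example from the parameter list, `parse_stand("24.12.2001 19:15")` yields `datetime(2001, 12, 24, 19, 15)`; a date without a time is interpreted as midnight.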
- static parse_genesis_and_regio_table(data, language)
Parse the ffcsv format for tables from GENESIS and Regionalstatistik into a more readable format.
- Parameters:
data (DataFrame)
language (str)
- Return type:
DataFrame
- static parse_zensus_table(data, language)
Parse the Zensus table ffcsv format into a more readable format.
- Parameters:
data (DataFrame)
language (str)
- Return type:
DataFrame
- static prettify_table(data, db_name, language)
Reformat the data into a more readable table.
- Parameters:
data (pd.DataFrame) – A pandas dataframe created from raw_data
db_name (str) – The name of the database.
language (str) – The requested language. One of “de” or “en”.
- Returns:
Formatted dataframe that omits all unnecessary Code columns and includes informative column names.
- Return type:
pd.DataFrame
Overall Module contents
pystatis is a Python wrapper for the GENESIS web service interface (API).
Basic usage:
```python
import pystatis

print("Version:", pystatis.__version__)
```
- class pystatis.Find(query, db_name, top_n_preview=5)
Bases:
object
A class representing the find object that includes Result objects for variables, statistics, cubes and tables.
- Parameters:
query (str)
db_name (str)
top_n_preview (int)
- query
The query that is passed to the find endpoint.
- Type:
str
- db_name
The database that is used for the query. One of “genesis”, “zensus”, “regio”.
- Type:
str
- statistics
Statistics that match the query.
- Type:
Results
- tables
Tables that match the query.
- Type:
Results
- variables
Variables that match the query.
- Type:
Results
- cubes
Cubes that match the query.
- Type:
Results
- run()
Queries the API and prints summary.
- summary()
Prints summary of all results.
- run()
Execute the search to find statistics, variables, tables and cubes.
- summary()
- Returns:
String that contains summary statistics.
- Return type:
summary_string
- class pystatis.Table(name)
Bases:
object
A wrapper class holding all relevant data and metadata about a given table.
- Parameters:
name (str) – The unique identifier of this table.
raw_data (str) – The raw tablefile data as returned by the /data/table endpoint.
data (pd.DataFrame) – The parsed data as a pandas data frame.
metadata (dict) – Metadata as returned by the /metadata/table endpoint.
- static extract_ags_col(data, codes, label)
Extracts the AGS column from the data if present.
- Parameters:
data (pd.DataFrame) – The data frame to extract the AGS column from.
codes (list[str]) – The AGS codes to look for in the data.
label (str) – The label of the AGS column.
- Returns:
The AGS column if present, otherwise None.
- Return type:
pd.Series | None
- get_data(*, prettify=True, area='all', startyear='', endyear='', timeslices='', regionalvariable='', regionalkey='', stand='', language='de', quality='off')
Downloads raw data and metadata from GENESIS-Online.
Additional keyword arguments are passed on to the GENESIS-Online GET request for tablefile.
- Parameters:
prettify (bool, optional) – Reformats the table into a readable format. Defaults to True.
area (str, optional) – Area to search for the object in GENESIS-Online. Defaults to “all”.
startyear (str, optional) – Data beginning with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.
endyear (str, optional) – Data ending with that year will be returned. Parameter is cumulative to timeslices. Supports 4 digits (jjjj) or 4+2 digits (jjjj/jj). Accepts values between “1900” and “2100”.
timeslices (str, optional) – Number of time slices to be returned. This parameter is cumulative to startyear and endyear.
regionalvariable (str, optional) –
“code” of the regional classification (RKMerkmal), to which the selection using regionalkey is to be applied. Accepts 1-6 characters. Possible values:
- Regionalstatistik (only for tables ending with “B”, see /catalogue/variables):
“DG” (Deutschland, 1) -> will not return extra column
“DLAND” (Bundesländer, 16)
“REGBEZ” (Regierungsbezirke, 44)
“KREISE” (Kreise und kreisfreie Städte, 489)
“GEMEIN” (Gemeinden, 13564)
- Zensusdatenbank (for all tables, see /catalogue/variables):
“GEODL1” (Deutschland, 1) -> will not return extra column
“GEODL3” (Deutschland, 1) -> will not return extra column
“GEOBL1” (Bundesländer, 16)
“GEOBL3” (Bundesländer, 16)
“GEOBZ1” (Bezirke (Hamburg und Berlin), 19)
“GEOGM1” (Gemeinden, 11340)
“GEOGM2” (Gemeinden mit min. 10_000 Einwohnern, 1574)
“GEOGM3” (Gemeinden mit min. 10_000 Einwohnern, 1574)
“GEOGM4” (Gemeinden (Gebietsstand 15.05.2022), 10787)
“GEOLK1” (Landkreise und kreisfreie Städte, 412)
“GEOLK3” (Landkreise und kreisfreie Städte, 412)
“GEOLK4” (Landkreise u. krsfr. Städte (Stand 15.05.22), 400)
“GEORB1” (Regierungsbezirke/Statistische Regionen, 36)
“GEORB3” (Regierungsbezirke/Statistische Regionen, 36)
“GEOVB1” (Gemeindeverbände, 1333)
“GEOVB2” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 338)
“GEOVB3” (Gemeindeverbände mit mindestens 10 000 Einwohnern, 157)
“GEOVB4” (Gemeindeverbände (Gebietsstand 15.05.2022), 1207)
regionalkey (str, optional) – Official municipality key (AGS). Multiple values can be passed as a comma-separated list. Accepts 1-12 characters. “*” can be used as wildcard.
stand (str, optional) – Only download the table if it is newer than the status date. “tt.mm.jjjj hh:mm” or “tt.mm.jjjj”. Example: “24.12.2001 19:15”.
language (str, optional) – Messages and data descriptions are supplied in this language. For GENESIS and Zensus, [‘de’, ‘en’] are supported. For Regionalstatistik, only ‘de’ is supported.
quality (str) – One of “on” or “off”. If “on”, quality symbols are part of the download and additional columns (__q) are displayed. Defaults to “off”. The explanation of the quality labels can be found online after retrieving the table values, table -> explanation of symbols or at e.g. https://www-genesis.destatis.de/genesis/online?operation=ergebnistabelleQualitaet&language=en&levelindex=3&levelid=1719342760835#abreadcrumb.
- static parse_genesis_and_regio_table(data, language)
Parse the ffcsv format for tables from GENESIS and Regionalstatistik into a more readable format.
- Parameters:
data (DataFrame)
language (str)
- Return type:
DataFrame
- static parse_zensus_table(data, language)
Parse the Zensus table ffcsv format into a more readable format.
- Parameters:
data (DataFrame)
language (str)
- Return type:
DataFrame
- static prettify_table(data, db_name, language)
Reformat the data into a more readable table.
- Parameters:
data (pd.DataFrame) – A pandas dataframe created from raw_data
db_name (str) – The name of the database.
language (str) – The requested language. One of “de” or “en”.
- Returns:
Formatted dataframe that omits all unnecessary Code columns and includes informative column names.
- Return type:
pd.DataFrame
- pystatis.clear_cache(name=None)
Clear the entire data cache, or only the cached data for a specified name.
- Parameters:
name (str, optional) – Unique name to be deleted from cached data.
- Return type:
None
- pystatis.logincheck(db_name)
Wrapper method which constructs a URL for testing the Destatis API logincheck method, which tests the login credentials (from the config.ini).
- Parameters:
db_name (str) – Name of the database to log in to.
- Returns:
text logincheck response from Destatis
- Return type:
str
- pystatis.setup_credentials()
Set up credentials for all supported databases.
- Return type:
None
- pystatis.whoami(db_name)
Wrapper method which constructs a URL for testing the Destatis API whoami method, which returns host name and IP address.
- Parameters:
db_name (str) – Name of the database to test
- Returns:
text test response from Destatis
- Return type:
str