{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "14555430", "metadata": {}, "outputs": [], "source": [ "from pprint import pprint\n", "\n", "import pystatis" ] }, { "cell_type": "markdown", "id": "151ca2e5", "metadata": {}, "source": [ "`pystatis` Presentation for Data Journalists\n", "============================================" ] }, { "cell_type": "code", "execution_count": 2, "id": "bbf0b1de", "metadata": {}, "outputs": [], "source": [ "pystatis.clear_cache()" ] }, { "cell_type": "markdown", "id": "baa7eec1", "metadata": {}, "source": [ "CorrelAid\n", "---------\n", "\n", "https://www.correlaid.org/en/about/\n", "\n", "### Our Mission\n", "~~~~~~~~~~~~~~~" ] }, { "cell_type": "markdown", "id": "e85b575c", "metadata": {}, "source": [ "CorrelAid is a **non-profit community of data science enthusiasts** who want to change the world using data science. We dedicate our work to the people, initiatives and organizations that strive to make the world a better place.\n", "\n", "We value open knowledge management and transparency in our work wherever possible, while complying with the GDPR and following strong principles of data ethics.\n", "\n", "### Our Work\n", "~~~~~~~~~~~~" ] }, { "cell_type": "markdown", "id": "33bc0783", "metadata": {}, "source": [ "Our work is based on three pillars:\n", "\n", "1. **Using data**: We enable data analysts and scientists to apply their knowledge for the common good, and we help social organizations increase their impact on society, by **conducting pro-bono data for good (Data4Good) projects** and providing consulting on data topics.\n", "2. **Education**: We strongly believe in sharing our knowledge. It is no coincidence that we have chosen \"education\" as our association's official purpose. This is why we offer numerous education formats for nonprofits and volunteers. In addition, we share our knowledge, code, and materials publicly.\n", "3. **Community**: Our community is the basis of our work. 
We unite data scientists of different backgrounds and experience levels. We organize ourselves both online and on-site within our CorrelAidX local groups." ] }, { "cell_type": "markdown", "id": "ce8ea466", "metadata": {}, "source": [ "`pystatis`\n", "----------\n", "\n", "`pystatis` is a small Python library that wraps the different GENESIS web services (APIs) in a centralized, user-friendly way.\n", "\n", "It lets users browse the supported databases and download any table as a `pandas` `DataFrame`, ready for further analysis." ] }, { "cell_type": "markdown", "id": "9d462d61", "metadata": {}, "source": [ "### Setup\n", "~~~~~~~~~" ] }, { "cell_type": "markdown", "id": "d25940a8", "metadata": {}, "source": [ "We won't cover the one-time initial setup here because it requires users to enter their credentials for the supported databases (GENESIS, Regionalstatistik, Zensus). There is a dedicated notebook, [Setup](./00_Setup.ipynb), with examples and explanations.\n", "\n", "Main Use Cases\n", "--------------\n", "\n", "### Find\n", "~~~~~~~~" ] }, { "cell_type": "markdown", "id": "48011553", "metadata": {}, "source": [ "`pystatis.Find` lets a user browse the data available in the chosen database by keyword. It finds three kinds of objects:\n", "- Tables: tables whose title contains the keyword\n", "- Statistics: larger collections of tables on a topic; `Find` returns those whose title contains the keyword\n", "- Variables: the values (DE: Merkmal) in the columns of the tables; `Find` returns those whose label contains the keyword\n", "\n", "It returns the titles of the relevant tables, statistics and variables together with their [EVAS](https://www.destatis.de/DE/Service/Bibliothek/Abloesung-Fachserien/uebersicht-fs.html) numbers, which makes it a useful tool for looking these up (the EVAS number is required by the `Table` method).\n", "\n", "1. call `Find` with a keyword (`query=`) and a database (`db_name=`)\n", "2. 
actually query the API and print the results using `.run()`\n", "3. access the various objects, their EVAS numbers, or preview using metadata" ] }, { "cell_type": "code", "execution_count": 3, "id": "27ff0988", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "##### Results #####\n", "----------------------------------------\n", "# Number of tables: 5\n", "# Preview:\n", "| | Code | Content | Time |\n", "|---:|:----------------|:-------------------------------------------------------------------------------------------|:-------|\n", "| 0 | 32121-01-02-4 | Haushaltsabfälle - Jahr - regionale Tiefe: Kreise und krfr. Städte | |\n", "| 1 | 32121-01-02-4-B | Haushaltsabfälle - Jahr - regionale Ebenen | |\n", "| 2 | 32151-01-01-4 | Primär nachgewiesene Abfallmengen - Jahressumme - regionale Tiefe: Kreise und krfr. Städte | |\n", "| 3 | 32151-01-01-4-B | Primär nachgewiesene Abfallmengen - Jahressumme - regionale Ebenen | |\n", "| 4 | AI019 | Regionalatlas Deutschland Themenbereich \"Umwelt\" Indikatoren zu \"Haushaltsabfälle\" | |\n", "----------------------------------------\n", "# Number of statistics: 3\n", "# Preview:\n", "| | Code | Content | Cubes | Information |\n", "|---:|-------:|:----------------------------------------------------------------------|--------:|:--------------|\n", "| 0 | 32121 | Erhebung der öffentlich-rechtlichen Abfallentsorgung | 8 | true |\n", "| 1 | 32151 | Erhebung der gefährlichen Abfälle, über die Nachweise zu führen sind | 8 | true |\n", "| 2 | 99910 | Regionalatlas Deutschland | 208 | true |\n", "----------------------------------------\n", "# Number of variables: 3\n", "# Preview:\n", "| | Code | Content | Type | Values | Information |\n", "|---:|:-------|:--------------------------------------------------|:---------|---------:|:--------------|\n", "| 0 | ABFA01 | Abfallarten von Haushaltsabfällen | sachlich | 6 | false |\n", "| 1 | AI1904 | Abfälle aus der Biotonne je EW | Wert | -1 | true |\n", "| 
2 | RVUEA1 | Reg. Verbleib der überwachungsbedürftigen Abfälle | sachlich | 2 | false |\n", "----------------------------------------\n", "# Number of cubes: 16\n", "# Preview:\n", "| | Code | Content | State | Time | LatestUpdate | Information |\n", "|---:|:-----------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------|:----------|:---------------------|:--------------|\n", "| 0 | 32121BJ002 | Erhebung der öffentlich-rechtlichen Abfallentsorgung, Aufkommen an Haushaltsabfällen (oh.Elektroaltger.), Deutschland, Abfallarten von Haushaltsabfällen, Jahr | vollständig mit Werten | 2004-2021 | 12.05.2023 08:58:59h | false |\n", "| 1 | 32121KJ002 | Erhebung der öffentlich-rechtlichen Abfallentsorgung, Aufkommen an Haushaltsabfällen (oh.Elektroaltger.), Kreise und kreisfreie Städte, Abfallarten von Haushaltsabfällen, Jahr | vollständig mit Werten | 2004-2021 | 09.10.2023 13:13:52h | false |\n", "| 2 | 32121LJ002 | Erhebung der öffentlich-rechtlichen Abfallentsorgung, Aufkommen an Haushaltsabfällen (oh.Elektroaltger.), Bundesländer, Abfallarten von Haushaltsabfällen, Jahr | vollständig mit Werten | 2004-2021 | 01.09.2023 14:41:23h | false |\n", "| 3 | 32121RJ002 | Erhebung der öffentlich-rechtlichen Abfallentsorgung, Aufkommen an Haushaltsabfällen (oh.Elektroaltger.), Regierungsbezirke / Statistische Regionen, Abfallarten von Haushaltsabfällen, Jahr | vollständig mit Werten | 2004-2021 | 01.09.2023 14:41:23h | false |\n", "| 4 | 32151BJ001 | Erhebung der gefährlichen Abfälle, über die Nachweise zu führen sind , Erzeuger von primär nachgewiesenen Abfallmengen, Abgegebene Abfallmenge an Entsorger, Deutschland, Jahr | vollständig mit Werten | 2001-2021 | 26.07.2023 14:30:27h | false |\n", "----------------------------------------\n", "# Use object.tables, object.statistics, object.variables or object.cubes to get 
all results.\n", "----------------------------------------\n" ] } ], "source": [ "results = pystatis.Find(query=\"Abfall\", db_name=\"regio\")\n", "results.run()" ] }, { "cell_type": "markdown", "id": "42bb915b", "metadata": {}, "source": [ "If you are interested in one specific kind of object, you can access `results.tables`, `results.statistics`, or `results.variables` directly." ] }, { "cell_type": "code", "execution_count": 4, "id": "7af5949f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "| | Code | Content | Time |\n", "|---:|:----------------|:-------------------------------------------------------------------------------------------|:-------|\n", "| 0 | 32121-01-02-4 | Haushaltsabfälle - Jahr - regionale Tiefe: Kreise und krfr. Städte | |\n", "| 1 | 32121-01-02-4-B | Haushaltsabfälle - Jahr - regionale Ebenen | |\n", "| 2 | 32151-01-01-4 | Primär nachgewiesene Abfallmengen - Jahressumme - regionale Tiefe: Kreise und krfr. Städte | |\n", "| 3 | 32151-01-01-4-B | Primär nachgewiesene Abfallmengen - Jahressumme - regionale Ebenen | |\n", "| 4 | AI019 | Regionalatlas Deutschland Themenbereich \"Umwelt\" Indikatoren zu \"Haushaltsabfälle\" | |" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results.tables" ] }, { "cell_type": "markdown", "id": "fb64eaf2", "metadata": {}, "source": [ "Append `.df` to get the results as a dataframe for easier handling." ] }, { "cell_type": "code", "execution_count": 5, "id": "e4bc3879", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Code Content Time\n", "0 32121-01-02-4 Haushaltsabfälle - Jahr - regionale Tiefe: Kreise und krfr. Städte \n", "1 32121-01-02-4-B Haushaltsabfälle - Jahr - regionale Ebenen \n", "2 32151-01-01-4 Primär nachgewiesene Abfallmengen - Jahressumme - regionale Tiefe: Kreise und krfr. Städte \n", "3 32151-01-01-4-B Primär nachgewiesene Abfallmengen - Jahressumme - regionale Ebenen \n", "4 AI019 Regionalatlas Deutschland Themenbereich \"Umwelt\" Indikatoren zu \"Haushaltsabfälle\" " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results.tables.df" ] }, { "cell_type": "markdown", "id": "7e32a668", "metadata": {}, "source": [ "We can then retrieve the relevant codes with `.get_code([#])`, where `#` stands for the row indices. This returns a list of the codes in the specified rows, which can then be passed to the `Table` method." ] }, { "cell_type": "code", "execution_count": 6, "id": "17564ec7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['32121-01-02-4', '32121-01-02-4-B', '32151-01-01-4']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results.tables.get_code([0, 1, 2])" ] }, { "cell_type": "markdown", "id": "dcc60b06", "metadata": {}, "source": [ "To check that an object contains the relevant data, we can preview its columns with the `.get_metadata()` method."
] }, { "cell_type": "code", "execution_count": 7, "id": "63123a8f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TABLES 32121-01-02-4-B - 1\n", "Name:\n", "Erhebung der öffentlich-rechtlichen Abfallentsorgung\n", "--------------------\n", "Columns:\n", "Aufkommen an Haushaltsabfällen (oh.Elektroaltger.)\n", "--------------------\n", "Rows:\n", "Kreise und kreisfreie Städte\n", "----------------------------------------\n", "TABLES 32151-01-01-4 - 2\n", "Name:\n", "Erhebung der gefährlichen Abfälle, über die Nachweise zu führen sind \n", "--------------------\n", "Columns:\n", "Erzeuger von primär nachgewiesenen Abfallmengen\n", "Abgegebene Abfallmenge an Entsorger\n", "--------------------\n", "Rows:\n", "Kreise und kreisfreie Städte\n", "----------------------------------------\n" ] } ], "source": [ "results.tables.get_metadata([1, 2])" ] }, { "cell_type": "markdown", "id": "d7d7aa04", "metadata": {}, "source": [ "`pystatis.Find` is a useful search tool for browsing a database by keyword. It is quicker than downloading a table and does not require an EVAS number.\n", "\n", "Use it to identify the tables of interest and to look up their EVAS numbers for further analysis with `pystatis.Table`.\n", "\n", "### Table\n", "~~~~~~~~~" ] }, { "cell_type": "markdown", "id": "446063b1", "metadata": {}, "source": [ "`pystatis.Table` offers a simple interface to get any table via its \"name\" ([EVAS](https://www.destatis.de/DE/Service/Bibliothek/Abloesung-Fachserien/uebersicht-fs.html) number).\n", "\n", "1. Create a new `Table` instance by passing `name=`\n", "2. Download the actual data with `.get_data(prettify=)`\n", "3. 
Access the data via either `.raw_data` or `.data`, and the metadata via `.metadata`" ] }, { "cell_type": "code", "execution_count": 3, "id": "66d0e9cf", "metadata": {}, "outputs": [], "source": [ "# GENESIS - https://www-genesis.destatis.de/genesis//online?operation=table&code=31231-0001&bypass=true&levelindex=1&levelid=1706599948340#abreadcrumb\n", "t = pystatis.Table(name=\"31231-0001\")" ] }, { "cell_type": "markdown", "id": "11a26d11", "metadata": {}, "source": [ "By default, `prettify` is set to `True`, which returns a more readable format. Here we show the original format first." ] }, { "cell_type": "code", "execution_count": 6, "id": "927c0a2c", "metadata": {}, "outputs": [], "source": [ "t.get_data(prettify=False)" ] }, { "cell_type": "code", "execution_count": 7, "id": "4308ec4d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Statistik_Code;Statistik_Label;Zeit_Code;Zeit_Label;Zeit;1_Merkmal_Code;1_Merkmal_Label;1_Auspraegung_Code;1_Auspraegung_Label;WOH001__Wohnungen_in_Wohn-_und_Nichtwohngebaeuden__Anzahl;WOH004__Wohnungen_je_1000_Einwohner__Anzahl;FLC001__Wohnflaeche__1000_qm;FLC102__Wohnflaeche_je_Wohnung__qm;FLC103__Wohnflaeche_je_Einwohner__qm;RME001__Raeume__Anzahl;RME002__Raeume_je_Wohnung__Anzahl;RME003__Raeume_je_Einwohner__Anzahl',\n", " '31231;Fortschreibung Wohngebäude- und Wohnungsbestand;STAG;Stichtag;31.12.2015;DINSG;Deutschland insgesamt;DG;Deutschland;41446271;504;3794976;91,6;46,2;182295713;4,4;2,2',\n", " '31231;Fortschreibung Wohngebäude- und Wohnungsbestand;STAG;Stichtag;31.12.2016;DINSG;Deutschland insgesamt;DG;Deutschland;41703347;505;3822507;91,7;46,3;183354291;4,4;2,2',\n", " '31231;Fortschreibung Wohngebäude- und Wohnungsbestand;STAG;Stichtag;31.12.2017;DINSG;Deutschland insgesamt;DG;Deutschland;41968066;507;3850742;91,8;46,5;184427760;4,4;2,2',\n", " '31231;Fortschreibung Wohngebäude- und Wohnungsbestand;STAG;Stichtag;31.12.2018;DINSG;Deutschland 
insgesamt;DG;Deutschland;42235402;509;3878901;91,8;46,7;185491224;4,4;2,2',\n", " '31231;Fortschreibung Wohngebäude- und Wohnungsbestand;STAG;Stichtag;31.12.2019;DINSG;Deutschland insgesamt;DG;Deutschland;42512771;511;3908347;91,9;47,0;186594482;4,4;2,2',\n", " '31231;Fortschreibung Wohngebäude- und Wohnungsbestand;STAG;Stichtag;31.12.2020;DINSG;Deutschland insgesamt;DG;Deutschland;42803737;515;3938871;92,0;47,4;187746588;4,4;2,3',\n", " '31231;Fortschreibung Wohngebäude- und Wohnungsbestand;STAG;Stichtag;31.12.2021;DINSG;Deutschland insgesamt;DG;Deutschland;43084122;518;3967765;92,1;47,7;188829383;4,4;2,3',\n", " '31231;Fortschreibung Wohngebäude- und Wohnungsbestand;STAG;Stichtag;31.12.2022;DINSG;Deutschland insgesamt;DG;Deutschland;43366919;514;3996995;92,2;47,4;189920514;4,4;2,3']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.raw_data.splitlines()" ] }, { "cell_type": "code", "execution_count": 8, "id": "d37138dd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Statistik_Code Statistik_Label Zeit_Code Zeit_Label Zeit 1_Merkmal_Code 1_Merkmal_Label 1_Auspraegung_Code 1_Auspraegung_Label WOH001__Wohnungen_in_Wohn-_und_Nichtwohngebaeuden__Anzahl WOH004__Wohnungen_je_1000_Einwohner__Anzahl FLC001__Wohnflaeche__1000_qm FLC102__Wohnflaeche_je_Wohnung__qm FLC103__Wohnflaeche_je_Einwohner__qm RME001__Raeume__Anzahl RME002__Raeume_je_Wohnung__Anzahl RME003__Raeume_je_Einwohner__Anzahl\n", "0 31231 Fortschreibung Wohngebäude- und Wohnungsbestand STAG Stichtag 31.12.2015 DINSG Deutschland insgesamt DG Deutschland 41446271 504 3794976 91,6 46,2 182295713 4,4 2,2\n", "1 31231 Fortschreibung Wohngebäude- und Wohnungsbestand STAG Stichtag 31.12.2016 DINSG Deutschland insgesamt DG Deutschland 41703347 505 3822507 91,7 46,3 183354291 4,4 2,2\n", "2 31231 Fortschreibung Wohngebäude- und Wohnungsbestand STAG Stichtag 31.12.2017 DINSG Deutschland insgesamt DG Deutschland 41968066 507 3850742 91,8 46,5 184427760 4,4 2,2\n", "3 31231 Fortschreibung Wohngebäude- und Wohnungsbestand STAG Stichtag 31.12.2018 DINSG Deutschland insgesamt DG Deutschland 42235402 509 3878901 91,8 46,7 185491224 4,4 2,2\n", "4 31231 Fortschreibung Wohngebäude- und Wohnungsbestand STAG Stichtag 31.12.2019 DINSG Deutschland insgesamt DG Deutschland 42512771 511 3908347 91,9 47,0 186594482 4,4 2,2\n", "5 31231 Fortschreibung Wohngebäude- und Wohnungsbestand STAG Stichtag 31.12.2020 DINSG Deutschland insgesamt DG Deutschland 42803737 515 3938871 92,0 47,4 187746588 4,4 2,3\n", "6 31231 Fortschreibung Wohngebäude- und Wohnungsbestand STAG Stichtag 31.12.2021 DINSG Deutschland insgesamt DG Deutschland 43084122 518 3967765 92,1 47,7 188829383 4,4 2,3\n", "7 31231 Fortschreibung Wohngebäude- und Wohnungsbestand STAG Stichtag 31.12.2022 DINSG Deutschland insgesamt DG Deutschland 43366919 514 3996995 92,2 47,4 189920514 4,4 2,3" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.data" ] }, { "cell_type": 
"markdown", "id": "1f193771", "metadata": {}, "source": [ "As you can see, the original format has a lot of redundant information and columns with metadata like the codes for the different variables. Let's rerun `get_data` with `prettify=True`." ] }, { "cell_type": "code", "execution_count": 9, "id": "6a433995", "metadata": {}, "outputs": [], "source": [ "t.get_data()" ] }, { "cell_type": "code", "execution_count": 10, "id": "88cb7561", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Stichtag Deutschland insgesamt Wohnungen_in_Wohn-_und_Nichtwohngebaeuden Wohnungen_je_1000_Einwohner Wohnflaeche Wohnflaeche_je_Wohnung Wohnflaeche_je_Einwohner Raeume Raeume_je_Wohnung Raeume_je_Einwohner\n", "0 31.12.2015 Deutschland 41446271 504 3794976 91,6 46,2 182295713 4,4 2,2\n", "1 31.12.2016 Deutschland 41703347 505 3822507 91,7 46,3 183354291 4,4 2,2\n", "2 31.12.2017 Deutschland 41968066 507 3850742 91,8 46,5 184427760 4,4 2,2\n", "3 31.12.2018 Deutschland 42235402 509 3878901 91,8 46,7 185491224 4,4 2,2\n", "4 31.12.2019 Deutschland 42512771 511 3908347 91,9 47,0 186594482 4,4 2,2\n", "5 31.12.2020 Deutschland 42803737 515 3938871 92,0 47,4 187746588 4,4 2,3\n", "6 31.12.2021 Deutschland 43084122 518 3967765 92,1 47,7 188829383 4,4 2,3\n", "7 31.12.2022 Deutschland 43366919 514 3996995 92,2 47,4 189920514 4,4 2,3" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t.data" ] }, { "cell_type": "markdown", "id": "1be153f2", "metadata": {}, "source": [ "You can also access the metadata as returned by the Catalogue endpoint." 
] }, { "cell_type": "code", "execution_count": 11, "id": "f1f4957d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Copyright': '© Statistisches Bundesamt (Destatis), 2024',\n", " 'Ident': {'Method': 'table', 'Service': 'metadata'},\n", " 'Object': {'Code': '31231-0001',\n", " 'Content': 'Wohnungen in Wohn- und Nichtwohngebäuden, Wohnfläche, '\n", " 'Räume:\\n'\n", " 'Deutschland, Stichtag',\n", " 'Structure': {'Columns': [{'Code': 'WOH001',\n", " 'Content': 'Wohnungen in Wohn- und '\n", " 'Nichtwohngebäuden',\n", " 'Selected': None,\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see parent',\n", " 'Values': None},\n", " {'Code': 'WOH004',\n", " 'Content': 'Wohnungen je 1000 Einwohner',\n", " 'Selected': None,\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see parent',\n", " 'Values': None},\n", " {'Code': 'FLC001',\n", " 'Content': 'Wohnfläche',\n", " 'Selected': None,\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see parent',\n", " 'Values': None},\n", " {'Code': 'FLC102',\n", " 'Content': 'Wohnfläche je Wohnung',\n", " 'Selected': None,\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see parent',\n", " 'Values': None},\n", " {'Code': 'FLC103',\n", " 'Content': 'Wohnfläche je Einwohner',\n", " 'Selected': None,\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see parent',\n", " 'Values': None},\n", " {'Code': 'RME001',\n", " 'Content': 'Räume',\n", " 'Selected': None,\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see parent',\n", " 'Values': None},\n", " {'Code': 'RME002',\n", " 'Content': 'Räume je Wohnung',\n", " 'Selected': None,\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see parent',\n", " 'Values': None},\n", " {'Code': 'RME003',\n", " 'Content': 'Räume je Einwohner',\n", " 'Selected': None,\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see 
parent',\n", " 'Values': None}],\n", " 'Head': {'Code': '31231',\n", " 'Content': 'Fortschreibung Wohngebäude- und '\n", " 'Wohnungsbestand',\n", " 'Selected': None,\n", " 'Structure': [{'Code': 'DINSG',\n", " 'Content': 'Deutschland '\n", " 'insgesamt',\n", " 'Selected': '1',\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see parent',\n", " 'Values': '1'}],\n", " 'Type': 'Statistik',\n", " 'Updated': 'see parent',\n", " 'Values': None},\n", " 'Rows': [{'Code': 'STAG',\n", " 'Content': 'Stichtag',\n", " 'Selected': None,\n", " 'Structure': None,\n", " 'Type': 'Merkmal',\n", " 'Updated': 'see parent',\n", " 'Values': None}],\n", " 'Subheading': None,\n", " 'Subtitel': None},\n", " 'Time': {'From': '31.12.2015', 'To': '31.12.2022'},\n", " 'Updated': '16.08.2023 16:38:39h',\n", " 'Valid': 'false'},\n", " 'Parameter': {'area': 'Alle',\n", " 'language': 'de',\n", " 'name': '31231-0001',\n", " 'password': '********************',\n", " 'username': '********************'},\n", " 'Status': {'Code': 0, 'Content': 'erfolgreich', 'Type': 'Information'}}\n" ] } ], "source": [ "pprint(t.metadata)" ] }, { "cell_type": "markdown", "id": "0d8805d9", "metadata": {}, "source": [ "You can use any EVAS number from the supported databases like GENESIS, Regionalstatistik or Zensus. The library identifies the database for you so you don't have to care about this." ] }, { "cell_type": "code", "execution_count": 12, "id": "533a7e96", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Jahr | Deutschland insgesamt | Energieträger | Elektrizitaetserzeugung_(brutto) | Elektrizitaetserzeugung_(netto) | Nettowaermeerzeugung | Brennstoffeinsatz |
|---:|---:|:---|:---|---:|---:|---:|---:|
| 0 | 2022 | Deutschland | Steinkohlen | 60759486.0 | 55443585.0 | 19106288.0 | 5.712968e+08 |
| 1 | 2022 | Deutschland | Steinkohlenkoks | NaN | NaN | NaN | NaN |
| 2 | 2022 | Deutschland | Steinkohlenbriketts | NaN | NaN | NaN | NaN |
| 3 | 2022 | Deutschland | Kohlenwertstoffe aus Steinkohle | NaN | NaN | NaN | NaN |
| 4 | 2022 | Deutschland | Sonstige Steinkohlen | NaN | NaN | NaN | NaN |
| 5 | 2022 | Deutschland | Rohbraunkohlen | 113611255.0 | 105943521.0 | 6558509.0 | 1.039784e+09 |
| 6 | 2022 | Deutschland | Hartbraunkohlen | NaN | NaN | NaN | NaN |
| 7 | 2022 | Deutschland | Braunkohlenbriketts | 43460.0 | 40444.0 | NaN | 4.102690e+05 |
| 8 | 2022 | Deutschland | Braunkohlenkoks | NaN | NaN | NaN | NaN |
| 9 | 2022 | Deutschland | Wirbelschichtkohle | NaN | NaN | NaN | NaN |
| 10 | 2022 | Deutschland | Staub- und Trockenkohle | NaN | NaN | NaN | NaN |
| 11 | 2022 | Deutschland | Sonstige Braunkohlen | NaN | NaN | NaN | NaN |
| 12 | 2022 | Deutschland | Dieselkraftstoff | NaN | NaN | NaN | NaN |
| 13 | 2022 | Deutschland | Heizöl, leicht | 829009.0 | 773967.0 | 1386039.0 | 1.301587e+07 |
| 14 | 2022 | Deutschland | Heizöl, schwer | 237152.0 | 214908.0 | 33706.0 | 2.366438e+06 |
| 15 | 2022 | Deutschland | Flüssiggas | NaN | NaN | NaN | NaN |
| 16 | 2022 | Deutschland | Raffineriegas | NaN | NaN | NaN | NaN |
| 17 | 2022 | Deutschland | Petrolkoks | NaN | NaN | NaN | NaN |
| 18 | 2022 | Deutschland | Sonstige Mineralölprodukte | NaN | NaN | NaN | 1.827200e+05 |
| 19 | 2022 | Deutschland | Erdgas, Erdölgas | 46636174.0 | 45163692.0 | 41934364.0 | 4.312443e+08 |
| 20 | 2022 | Deutschland | Grubengas | 582597.0 | 577321.0 | NaN | 5.708188e+06 |
| 21 | 2022 | Deutschland | Kokereigas | NaN | NaN | NaN | NaN |
| 22 | 2022 | Deutschland | Hochofengas | NaN | NaN | NaN | NaN |
| 23 | 2022 | Deutschland | Sonstige hergestellte Gase | NaN | NaN | NaN | NaN |
| 24 | 2022 | Deutschland | Wasserstoff | NaN | NaN | NaN | NaN |
| 25 | 2022 | Deutschland | Laufwasser | 13404202.0 | 13281638.0 | NaN | NaN |
| 26 | 2022 | Deutschland | Speicherwasser | 614789.0 | 611162.0 | NaN | NaN |
| 27 | 2022 | Deutschland | Pumpspeicherwasser | NaN | NaN | NaN | NaN |
| 28 | 2022 | Deutschland | Pumpspeicher mit natürlichem Zufluss | 480142.0 | 480142.0 | NaN | NaN |
| 29 | 2022 | Deutschland | Pumpspeicher ohne natürlichen Zufluss | NaN | NaN | NaN | NaN |
| 30 | 2022 | Deutschland | Windkraft | NaN | NaN | NaN | NaN |
| 31 | 2022 | Deutschland | Photovoltaik | NaN | NaN | NaN | NaN |
| 32 | 2022 | Deutschland | Geothermie | NaN | NaN | NaN | NaN |
| 33 | 2022 | Deutschland | Wärmepumpen (Erd- und Umweltwärme) | NaN | NaN | NaN | NaN |
| 34 | 2022 | Deutschland | Solarthermie | NaN | NaN | NaN | NaN |
| 35 | 2022 | Deutschland | Feste biogene Stoffe | 5157301.0 | 4555379.0 | 5496277.0 | 7.738774e+07 |
| 36 | 2022 | Deutschland | Flüssige biogene Stoffe | NaN | NaN | NaN | NaN |
| 37 | 2022 | Deutschland | Biogas | 2781281.0 | 2662858.0 | 2238806.0 | 2.547129e+07 |
| 38 | 2022 | Deutschland | Biomethan (Bioerdgas) | 1031257.0 | 1007770.0 | 1172103.0 | 9.574762e+06 |
| 39 | 2022 | Deutschland | Klärgas | NaN | NaN | NaN | NaN |
| 40 | 2022 | Deutschland | Deponiegas | 59864.0 | 56080.0 | 35286.0 | 6.237530e+05 |
412022DeutschlandSonstige erneuerbare EnergienNaNNaNNaNNaN
422022DeutschlandKlärschlamm206726.0181848.0107537.02.241021e+06
432022DeutschlandAbfall (Hausmüll, Industrie)NaNNaNNaNNaN
442022DeutschlandAbfall (Industrie)542529.0368799.01083604.01.036228e+07
452022DeutschlandAbfall (Hausmüll, Siedlungsabfälle)10494870.08248924.016865152.01.899352e+08
462022DeutschlandKernenergie34709262.032765409.0NaNNaN
472022DeutschlandWärme346035.0330543.01115125.08.318414e+06
482022DeutschlandStrom (Elektrokessel)NaNNaNNaNNaN
492022DeutschlandSonstige EnergieträgerNaNNaNNaNNaN
502022DeutschlandAndere SpeicherNaNNaNNaNNaN
512022DeutschlandInsgesamt293024354.0273144778.098388835.02.395591e+09
\n", "
" ], "text/plain": [ " Jahr Deutschland insgesamt Energieträger Elektrizitaetserzeugung_(brutto) Elektrizitaetserzeugung_(netto) Nettowaermeerzeugung Brennstoffeinsatz\n", "0 2022 Deutschland Steinkohlen 60759486.0 55443585.0 19106288.0 5.712968e+08\n", "1 2022 Deutschland Steinkohlenkoks NaN NaN NaN NaN\n", "2 2022 Deutschland Steinkohlenbriketts NaN NaN NaN NaN\n", "3 2022 Deutschland Kohlenwertstoffe aus Steinkohle NaN NaN NaN NaN\n", "4 2022 Deutschland Sonstige Steinkohlen NaN NaN NaN NaN\n", "5 2022 Deutschland Rohbraunkohlen 113611255.0 105943521.0 6558509.0 1.039784e+09\n", "6 2022 Deutschland Hartbraunkohlen NaN NaN NaN NaN\n", "7 2022 Deutschland Braunkohlenbriketts 43460.0 40444.0 NaN 4.102690e+05\n", "8 2022 Deutschland Braunkohlenkoks NaN NaN NaN NaN\n", "9 2022 Deutschland Wirbelschichtkohle NaN NaN NaN NaN\n", "10 2022 Deutschland Staub- und Trockenkohle NaN NaN NaN NaN\n", "11 2022 Deutschland Sonstige Braunkohlen NaN NaN NaN NaN\n", "12 2022 Deutschland Dieselkraftstoff NaN NaN NaN NaN\n", "13 2022 Deutschland Heizöl, leicht 829009.0 773967.0 1386039.0 1.301587e+07\n", "14 2022 Deutschland Heizöl, schwer 237152.0 214908.0 33706.0 2.366438e+06\n", "15 2022 Deutschland Flüssiggas NaN NaN NaN NaN\n", "16 2022 Deutschland Raffineriegas NaN NaN NaN NaN\n", "17 2022 Deutschland Petrolkoks NaN NaN NaN NaN\n", "18 2022 Deutschland Sonstige Mineralölprodukte NaN NaN NaN 1.827200e+05\n", "19 2022 Deutschland Erdgas, Erdölgas 46636174.0 45163692.0 41934364.0 4.312443e+08\n", "20 2022 Deutschland Grubengas 582597.0 577321.0 NaN 5.708188e+06\n", "21 2022 Deutschland Kokereigas NaN NaN NaN NaN\n", "22 2022 Deutschland Hochofengas NaN NaN NaN NaN\n", "23 2022 Deutschland Sonstige hergestellte Gase NaN NaN NaN NaN\n", "24 2022 Deutschland Wasserstoff NaN NaN NaN NaN\n", "25 2022 Deutschland Laufwasser 13404202.0 13281638.0 NaN NaN\n", "26 2022 Deutschland Speicherwasser 614789.0 611162.0 NaN NaN\n", "27 2022 Deutschland Pumpspeicherwasser NaN NaN NaN NaN\n", "28 
2022 Deutschland Pumpspeicher mit natürlichem Zufluss 480142.0 480142.0 NaN NaN\n", "29 2022 Deutschland Pumpspeicher ohne natürlichen Zufluss NaN NaN NaN NaN\n", "30 2022 Deutschland Windkraft NaN NaN NaN NaN\n", "31 2022 Deutschland Photovoltaik NaN NaN NaN NaN\n", "32 2022 Deutschland Geothermie NaN NaN NaN NaN\n", "33 2022 Deutschland Wärmepumpen (Erd- und Umweltwärme) NaN NaN NaN NaN\n", "34 2022 Deutschland Solarthermie NaN NaN NaN NaN\n", "35 2022 Deutschland Feste biogene Stoffe 5157301.0 4555379.0 5496277.0 7.738774e+07\n", "36 2022 Deutschland Flüssige biogene Stoffe NaN NaN NaN NaN\n", "37 2022 Deutschland Biogas 2781281.0 2662858.0 2238806.0 2.547129e+07\n", "38 2022 Deutschland Biomethan (Bioerdgas) 1031257.0 1007770.0 1172103.0 9.574762e+06\n", "39 2022 Deutschland Klärgas NaN NaN NaN NaN\n", "40 2022 Deutschland Deponiegas 59864.0 56080.0 35286.0 6.237530e+05\n", "41 2022 Deutschland Sonstige erneuerbare Energien NaN NaN NaN NaN\n", "42 2022 Deutschland Klärschlamm 206726.0 181848.0 107537.0 2.241021e+06\n", "43 2022 Deutschland Abfall (Hausmüll, Industrie) NaN NaN NaN NaN\n", "44 2022 Deutschland Abfall (Industrie) 542529.0 368799.0 1083604.0 1.036228e+07\n", "45 2022 Deutschland Abfall (Hausmüll, Siedlungsabfälle) 10494870.0 8248924.0 16865152.0 1.899352e+08\n", "46 2022 Deutschland Kernenergie 34709262.0 32765409.0 NaN NaN\n", "47 2022 Deutschland Wärme 346035.0 330543.0 1115125.0 8.318414e+06\n", "48 2022 Deutschland Strom (Elektrokessel) NaN NaN NaN NaN\n", "49 2022 Deutschland Sonstige Energieträger NaN NaN NaN NaN\n", "50 2022 Deutschland Andere Speicher NaN NaN NaN NaN\n", "51 2022 Deutschland Insgesamt 293024354.0 273144778.0 98388835.0 2.395591e+09" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# GENESIS\n", "t = pystatis.Table(name=\"43311-0001\")\n", "t.get_data()\n", "t.data" ] }, { "cell_type": "code", "execution_count": 13, "id": "b00f1c76", "metadata": {}, "outputs": [ { "name": 
"stderr", "output_type": "stream", "text": [ "/Users/miay/git/github/CorrelAid/pystatis/src/pystatis/table.py:48: DtypeWarning: Columns (7) have mixed types. Specify dtype option on import or set low_memory=False.\n", " self.data = pd.read_csv(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SemesterKreise und kreisfreie StädteGeschlechtNationalität (inkl. insgesamt)Fächergruppe (mit Insgesamt)Kreise und kreisfreie Städte_CodeGeschlecht_CodeNationalität (inkl. insgesamt)_CodeFächergruppe (mit Insgesamt)_CodeStudierende_(im_Kreisgebiet)
0WS 2021/22DeutschlandInsgesamtInsgesamtGeisteswissenschaftenDGINSGESAMTINSGESAMTHS-FG01316442.0
1WS 2021/22DeutschlandInsgesamtInsgesamtSportDGINSGESAMTINSGESAMTHS-FG0231157.0
2WS 2021/22DeutschlandInsgesamtInsgesamtRechts-, Wirtschafts- und SozialwissenschaftenDGINSGESAMTINSGESAMTHS-FG031138785.0
3WS 2021/22DeutschlandInsgesamtInsgesamtMathematik/NaturwissenschaftenDGINSGESAMTINSGESAMTHS-FG04314060.0
4WS 2021/22DeutschlandInsgesamtInsgesamtHumanmedizin/GesundheitswissenschaftenDGINSGESAMTINSGESAMTHS-FG05196239.0
.................................
48415WS 2021/22Altenburger Land, LandkreisweiblichDeutscheAgrar-, Forst- und Ernährungswissensch., Veterinär16077GESWNATDHS-FG07NaN
48416WS 2021/22Altenburger Land, LandkreisweiblichDeutscheIngenieurwissenschaften16077GESWNATDHS-FG08NaN
48417WS 2021/22Altenburger Land, LandkreisweiblichDeutscheKunst, Kunstwissenschaft16077GESWNATDHS-FG09NaN
48418WS 2021/22Altenburger Land, LandkreisweiblichDeutscheAußerhalb der Studienbereichsgliederung16077GESWNATDHS-FG10NaN
48419WS 2021/22Altenburger Land, LandkreisweiblichDeutscheInsgesamt16077GESWNATDINSGESAMTNaN
\n", "

48420 rows × 10 columns

\n", "
" ], "text/plain": [ " Semester Kreise und kreisfreie Städte Geschlecht Nationalität (inkl. insgesamt) Fächergruppe (mit Insgesamt) Kreise und kreisfreie Städte_Code Geschlecht_Code Nationalität (inkl. insgesamt)_Code Fächergruppe (mit Insgesamt)_Code Studierende_(im_Kreisgebiet)\n", "0 WS 2021/22 Deutschland Insgesamt Insgesamt Geisteswissenschaften DG INSGESAMT INSGESAMT HS-FG01 316442.0\n", "1 WS 2021/22 Deutschland Insgesamt Insgesamt Sport DG INSGESAMT INSGESAMT HS-FG02 31157.0\n", "2 WS 2021/22 Deutschland Insgesamt Insgesamt Rechts-, Wirtschafts- und Sozialwissenschaften DG INSGESAMT INSGESAMT HS-FG03 1138785.0\n", "3 WS 2021/22 Deutschland Insgesamt Insgesamt Mathematik/Naturwissenschaften DG INSGESAMT INSGESAMT HS-FG04 314060.0\n", "4 WS 2021/22 Deutschland Insgesamt Insgesamt Humanmedizin/Gesundheitswissenschaften DG INSGESAMT INSGESAMT HS-FG05 196239.0\n", "... ... ... ... ... ... ... ... ... ... ...\n", "48415 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Agrar-, Forst- und Ernährungswissensch., Veterinär 16077 GESW NATD HS-FG07 NaN\n", "48416 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Ingenieurwissenschaften 16077 GESW NATD HS-FG08 NaN\n", "48417 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Kunst, Kunstwissenschaft 16077 GESW NATD HS-FG09 NaN\n", "48418 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Außerhalb der Studienbereichsgliederung 16077 GESW NATD HS-FG10 NaN\n", "48419 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Insgesamt 16077 GESW NATD INSGESAMT NaN\n", "\n", "[48420 rows x 10 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Regionalstatistik\n", "t = pystatis.Table(name=\"21311-01-01-4\")\n", "t.get_data()\n", "t.data" ] }, { "cell_type": "code", "execution_count": 4, "id": "4ce4c9d5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StichtagDeutschlandHöchster SchulabschlussPersonen ab 15 Jahren
02011-05-09DeutschlandMittlerer Schulabschluss und gymnasiale Oberstufe1
12011-05-09DeutschlandHaupt-/ Volksschulabschluss2
22011-05-09DeutschlandOhne Abschluss4932710
32011-05-09DeutschlandInsgesamt6
42011-05-09DeutschlandAllg./fachgebundene Hochschulreife (Abitur)1
52011-05-09DeutschlandFachhochschulreife5531480
62011-05-09DeutschlandNoch in schulischer Ausbildung1691700
72011-05-09DeutschlandAllg./fachgebundene Hochschulreife (Abitur)1
82011-05-09DeutschlandSchüler/-innen der gymnasialen Oberstufe1339490
92011-05-09DeutschlandRealschul- oder gleichwertiger Abschluss1
102011-05-09DeutschlandHaupt-/ Volksschulabschluss2
112011-05-09DeutschlandFachhochschulreife5531480
122011-05-09DeutschlandOhne Schulabschluss3241010
\n", "
" ], "text/plain": [ " Stichtag Deutschland Höchster Schulabschluss Personen ab 15 Jahren\n", "0 2011-05-09 Deutschland Mittlerer Schulabschluss und gymnasiale Oberstufe 1\n", "1 2011-05-09 Deutschland Haupt-/ Volksschulabschluss 2\n", "2 2011-05-09 Deutschland Ohne Abschluss 4932710\n", "3 2011-05-09 Deutschland Insgesamt 6\n", "4 2011-05-09 Deutschland Allg./fachgebundene Hochschulreife (Abitur) 1\n", "5 2011-05-09 Deutschland Fachhochschulreife 5531480\n", "6 2011-05-09 Deutschland Noch in schulischer Ausbildung 1691700\n", "7 2011-05-09 Deutschland Allg./fachgebundene Hochschulreife (Abitur) 1\n", "8 2011-05-09 Deutschland Schüler/-innen der gymnasialen Oberstufe 1339490\n", "9 2011-05-09 Deutschland Realschul- oder gleichwertiger Abschluss 1\n", "10 2011-05-09 Deutschland Haupt-/ Volksschulabschluss 2\n", "11 2011-05-09 Deutschland Fachhochschulreife 5531480\n", "12 2011-05-09 Deutschland Ohne Schulabschluss 3241010" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Zensus\n", "t = pystatis.Table(name=\"2000S-1006\")\n", "t.get_data()\n", "t.data" ] }, { "cell_type": "markdown", "id": "4a869399", "metadata": {}, "source": [ "The `get_data()` method supports all parameters that you can pass to the API, like `startyear`, `endyear` or `timeslices`" ] }, { "cell_type": "code", "execution_count": 5, "id": "f912518e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
JahrDeutschland insgesamtEnergieträgerElektrizitaetserzeugung_(brutto)Elektrizitaetserzeugung_(netto)NettowaermeerzeugungBrennstoffeinsatz
02002DeutschlandSteinkohlen121062371.0111426568.0NaN1.081030e+09
12002DeutschlandSteinkohlenkoksNaNNaNNaNNaN
22002DeutschlandSteinkohlenbrikettsNaNNaNNaNNaN
32002DeutschlandKohlenwertstoffe aus SteinkohleNaNNaNNaNNaN
42002DeutschlandSonstige SteinkohlenNaNNaNNaNNaN
........................
10872022DeutschlandWärme346035.0330543.01115125.08.318414e+06
10882022DeutschlandStrom (Elektrokessel)NaNNaNNaNNaN
10892022DeutschlandSonstige EnergieträgerNaNNaNNaNNaN
10902022DeutschlandAndere SpeicherNaNNaNNaNNaN
10912022DeutschlandInsgesamt293024354.0273144778.098388835.02.395591e+09
\n", "

1092 rows × 7 columns

\n", "
" ], "text/plain": [ " Jahr Deutschland insgesamt Energieträger Elektrizitaetserzeugung_(brutto) Elektrizitaetserzeugung_(netto) Nettowaermeerzeugung Brennstoffeinsatz\n", "0 2002 Deutschland Steinkohlen 121062371.0 111426568.0 NaN 1.081030e+09\n", "1 2002 Deutschland Steinkohlenkoks NaN NaN NaN NaN\n", "2 2002 Deutschland Steinkohlenbriketts NaN NaN NaN NaN\n", "3 2002 Deutschland Kohlenwertstoffe aus Steinkohle NaN NaN NaN NaN\n", "4 2002 Deutschland Sonstige Steinkohlen NaN NaN NaN NaN\n", "... ... ... ... ... ... ... ...\n", "1087 2022 Deutschland Wärme 346035.0 330543.0 1115125.0 8.318414e+06\n", "1088 2022 Deutschland Strom (Elektrokessel) NaN NaN NaN NaN\n", "1089 2022 Deutschland Sonstige Energieträger NaN NaN NaN NaN\n", "1090 2022 Deutschland Andere Speicher NaN NaN NaN NaN\n", "1091 2022 Deutschland Insgesamt 293024354.0 273144778.0 98388835.0 2.395591e+09\n", "\n", "[1092 rows x 7 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# GENESIS\n", "t = pystatis.Table(name=\"43311-0001\")\n", "t.get_data(startyear=2000)\n", "t.data" ] }, { "cell_type": "markdown", "id": "19ef667d", "metadata": {}, "source": [ "Advanced Features\n", "-----------------\n", "\n", "- Caching\n", "- Handling background jobs\n", "- Cubes\n", "\n", "Geo-Visualization\n", "-----------------\n", "\n", "Case study: international students in Germany\n", "- time evolution\n", "- regional differences (at the level of federal states)" ] }, { "cell_type": "code", "execution_count": 24, "id": "2f74a00f", "metadata": {}, "outputs": [], "source": [ "# !pip install geopandas\n", "# !pip install matplotlib" ] }, { "cell_type": "code", "execution_count": 55, "id": "01a68933", "metadata": {}, "outputs": [], "source": [ "import geopandas\n", "import pandas as pd\n", "from matplotlib import pyplot as plt" ] }, { "cell_type": "markdown", "id": "0a0c49e2", "metadata": {}, "source": [ "### Load Data from Regionalstatistik\n", 
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" ] }, { "cell_type": "code", "execution_count": 56, "id": "e5e9b02c", "metadata": {}, "outputs": [], "source": [ "students = pystatis.Table(name=\"21311-01-01-4\")" ] }, { "cell_type": "code", "execution_count": 57, "id": "76a868f5", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/macbookpro/Git Repos/pystatis/src/pystatis/table.py:46: DtypeWarning: Columns (7) have mixed types. Specify dtype option on import or set low_memory=False.\n", " self.data = pd.read_csv(data_str, sep=\";\", na_values = [\"...\",\".\",\"-\",\"/\",\"x\"])\n" ] } ], "source": [ "students.get_data(startyear=2015)" ] }, { "cell_type": "markdown", "id": "4de03a89", "metadata": {}, "source": [ "### Set Proper Column Types\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~" ] }, { "cell_type": "code", "execution_count": 61, "id": "87aa68c5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 DG\n", "1 DG\n", "2 DG\n", "3 DG\n", "4 DG\n", " ... \n", "338935 16077\n", "338936 16077\n", "338937 16077\n", "338938 16077\n", "338939 16077\n", "Name: Kreise und kreisfreie Städte_Code, Length: 338940, dtype: object" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.data[\"Kreise und kreisfreie Städte_Code\"] = students.data[\n", " \"Kreise und kreisfreie Städte_Code\"\n", "].astype(str)\n", "students.data[\"Kreise und kreisfreie Städte_Code\"]" ] }, { "cell_type": "code", "execution_count": 59, "id": "37e7f2be", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 DG\n", "1 DG\n", "2 DG\n", "3 DG\n", "4 DG\n", " ... 
\n", "338935 16077\n", "338936 16077\n", "338937 16077\n", "338938 16077\n", "338939 16077\n", "Name: Kreise und kreisfreie Städte_Code, Length: 338940, dtype: object" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.data[\"Kreise und kreisfreie Städte_Code\"] = students.data[\n", " \"Kreise und kreisfreie Städte_Code\"\n", "].apply(lambda x: \"0\" + x if len(x) <= 1 else x)\n", "students.data[\"Kreise und kreisfreie Städte_Code\"]" ] }, { "cell_type": "markdown", "id": "62097373", "metadata": {}, "source": [ "### Inspect Dataframe\n", "~~~~~~~~~~~~~~~~~~~~~" ] }, { "cell_type": "code", "execution_count": 62, "id": "b7056a78", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SemesterKreise und kreisfreie StädteGeschlechtNationalität (inkl. insgesamt)Fächergruppe (mit Insgesamt)Kreise und kreisfreie Städte_CodeGeschlecht_CodeNationalität (inkl. insgesamt)_CodeFächergruppe (mit Insgesamt)_CodeStudierende_(im_Kreisgebiet)
0WS 2015/16DeutschlandInsgesamtInsgesamtGeisteswissenschaftenDGINSGESAMTINSGESAMTHS-FG01339730.0
1WS 2015/16DeutschlandInsgesamtInsgesamtSportDGINSGESAMTINSGESAMTHS-FG0227771.0
2WS 2015/16DeutschlandInsgesamtInsgesamtRechts-, Wirtschafts- und SozialwissenschaftenDGINSGESAMTINSGESAMTHS-FG031006645.0
3WS 2015/16DeutschlandInsgesamtInsgesamtMathematik/NaturwissenschaftenDGINSGESAMTINSGESAMTHS-FG04309194.0
4WS 2015/16DeutschlandInsgesamtInsgesamtHumanmedizin/GesundheitswissenschaftenDGINSGESAMTINSGESAMTHS-FG05166331.0
.................................
338935WS 2021/22Altenburger Land, LandkreisweiblichDeutscheAgrar-, Forst- und Ernährungswissensch., Veterinär16077GESWNATDHS-FG07NaN
338936WS 2021/22Altenburger Land, LandkreisweiblichDeutscheIngenieurwissenschaften16077GESWNATDHS-FG08NaN
338937WS 2021/22Altenburger Land, LandkreisweiblichDeutscheKunst, Kunstwissenschaft16077GESWNATDHS-FG09NaN
338938WS 2021/22Altenburger Land, LandkreisweiblichDeutscheAußerhalb der Studienbereichsgliederung16077GESWNATDHS-FG10NaN
338939WS 2021/22Altenburger Land, LandkreisweiblichDeutscheInsgesamt16077GESWNATDINSGESAMTNaN
\n", "

338940 rows × 10 columns

\n", "
" ], "text/plain": [ " Semester Kreise und kreisfreie Städte Geschlecht Nationalität (inkl. insgesamt) Fächergruppe (mit Insgesamt) Kreise und kreisfreie Städte_Code Geschlecht_Code Nationalität (inkl. insgesamt)_Code Fächergruppe (mit Insgesamt)_Code Studierende_(im_Kreisgebiet)\n", "0 WS 2015/16 Deutschland Insgesamt Insgesamt Geisteswissenschaften DG INSGESAMT INSGESAMT HS-FG01 339730.0\n", "1 WS 2015/16 Deutschland Insgesamt Insgesamt Sport DG INSGESAMT INSGESAMT HS-FG02 27771.0\n", "2 WS 2015/16 Deutschland Insgesamt Insgesamt Rechts-, Wirtschafts- und Sozialwissenschaften DG INSGESAMT INSGESAMT HS-FG03 1006645.0\n", "3 WS 2015/16 Deutschland Insgesamt Insgesamt Mathematik/Naturwissenschaften DG INSGESAMT INSGESAMT HS-FG04 309194.0\n", "4 WS 2015/16 Deutschland Insgesamt Insgesamt Humanmedizin/Gesundheitswissenschaften DG INSGESAMT INSGESAMT HS-FG05 166331.0\n", "... ... ... ... ... ... ... ... ... ... ...\n", "338935 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Agrar-, Forst- und Ernährungswissensch., Veterinär 16077 GESW NATD HS-FG07 NaN\n", "338936 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Ingenieurwissenschaften 16077 GESW NATD HS-FG08 NaN\n", "338937 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Kunst, Kunstwissenschaft 16077 GESW NATD HS-FG09 NaN\n", "338938 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Außerhalb der Studienbereichsgliederung 16077 GESW NATD HS-FG10 NaN\n", "338939 WS 2021/22 Altenburger Land, Landkreis weiblich Deutsche Insgesamt 16077 GESW NATD INSGESAMT NaN\n", "\n", "[338940 rows x 10 columns]" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "students.data" ] }, { "cell_type": "markdown", "id": "2785f6e4", "metadata": {}, "source": [ "### Determine Ratio of International Students per Year and Region\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" ] }, { "cell_type": "code", "execution_count": 63, "id": "427e1320", 
"metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ratio_internationalyear
Kreise und kreisfreie StädteKreise und kreisfreie Städte_CodeSemester
Aachen, Kreis05354WS 2015/16NaN2015
WS 2016/17NaN2016
WS 2018/19NaN2018
WS 2020/21NaN2020
5354WS 2017/18NaN2017
...............
DeutschlandDGWS 2017/180.1316652017
WS 2018/190.1375992018
WS 2019/200.1423712019
WS 2020/210.1414462020
WS 2021/220.1497542021
\n", "

3767 rows × 2 columns

\n", "
" ], "text/plain": [ " ratio_international year\n", "Kreise und kreisfreie Städte Kreise und kreisfreie Städte_Code Semester \n", " Aachen, Kreis 05354 WS 2015/16 NaN 2015\n", " WS 2016/17 NaN 2016\n", " WS 2018/19 NaN 2018\n", " WS 2020/21 NaN 2020\n", " 5354 WS 2017/18 NaN 2017\n", "... ... ...\n", "Deutschland DG WS 2017/18 0.131665 2017\n", " WS 2018/19 0.137599 2018\n", " WS 2019/20 0.142371 2019\n", " WS 2020/21 0.141446 2020\n", " WS 2021/22 0.149754 2021\n", "\n", "[3767 rows x 2 columns]" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratio_international = (\n", " students.data[\n", " (students.data.Geschlecht == \"Insgesamt\")\n", " & (students.data[\"Fächergruppe (mit Insgesamt)\"] == \"Insgesamt\")\n", " ]\n", " .groupby(\n", " by=[\n", " \"Kreise und kreisfreie Städte\",\n", " \"Kreise und kreisfreie Städte_Code\",\n", " \"Semester\",\n", " ]\n", " )[\"Studierende_(im_Kreisgebiet)\"]\n", " .apply(lambda x: x.iloc[1] / x.iloc[0] if x.count() == 3 else None)\n", ")\n", "ratio_international.rename(\"ratio_international\", inplace=True)\n", "\n", "ratio_international = pd.DataFrame(ratio_international)\n", "ratio_international[\"year\"] = [\n", " int(semester[3:7]) for semester in ratio_international.index.get_level_values(2)\n", "]\n", "\n", "ratio_international" ] }, { "cell_type": "code", "execution_count": 64, "id": "cf4be61f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ratio_internationalyear
Kreise und kreisfreie StädteKreise und kreisfreie Städte_CodeSemester
Bayern09WS 2015/160.1142882015
WS 2016/170.1204342016
WS 2017/180.1289482017
WS 2018/190.1391242018
WS 2019/200.1471572019
WS 2020/210.1508182020
WS 2021/220.1676802021
\n", "
" ], "text/plain": [ " ratio_international year\n", "Kreise und kreisfreie Städte Kreise und kreisfreie Städte_Code Semester \n", " Bayern 09 WS 2015/16 0.114288 2015\n", " WS 2016/17 0.120434 2016\n", " WS 2017/18 0.128948 2017\n", " WS 2018/19 0.139124 2018\n", " WS 2019/20 0.147157 2019\n", " WS 2020/21 0.150818 2020\n", " WS 2021/22 0.167680 2021" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratio_international[ratio_international.index.get_level_values(0) == \" Bayern\"]" ] }, { "cell_type": "markdown", "id": "ddb17d52", "metadata": {}, "source": [ "### Plot Evolution of International Students over Time\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" ] }, { "cell_type": "code", "execution_count": 65, "id": "10b60366", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for region in [\n", " \"Deutschland\",\n", " \" Baden-Württemberg\",\n", " \" Bayern\",\n", " \" Nordrhein-Westfalen\",\n", " \" Thüringen\",\n", " \" Sachsen\",\n", " \" Niedersachsen\",\n", " \" Schleswig-Holstein\",\n", " \" Berlin\",\n", "]:\n", " plt.plot(\n", " ratio_international[\n", " ratio_international.index.get_level_values(0) == region\n", " ].year,\n", " ratio_international[\n", " ratio_international.index.get_level_values(0) == region\n", " ].ratio_international,\n", " label=region,\n", " )\n", "plt.legend()" ] }, { "cell_type": "markdown", "id": "c816c0df", "metadata": {}, "source": [ "### Load Shape File for the Map of Germany\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" ] }, { "cell_type": "code", "execution_count": 66, "id": "26590591", "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "path_to_data = \"./data/VG2500_LAN.shp\"\n", "gdf = geopandas.read_file(path_to_data)" ] }, { "cell_type": "code", "execution_count": 67, "id": "bb91abc6", "metadata": {}, "outputs": [], "source": [ "gdf.loc[:, \"area\"] = gdf.area" ] }, { "cell_type": "code", "execution_count": 68, "id": "5d2bb0f2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gdf.plot(\"area\", legend=True)" ] }, { "cell_type": "code", "execution_count": 69, "id": "27c5cf8e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Schleswig-Holstein\n", "1 Hamburg\n", "2 Niedersachsen\n", "3 Bremen\n", "4 Nordrhein-Westfalen\n", "5 Hessen\n", "6 Rheinland-Pfalz\n", "7 Baden-Württemberg\n", "8 Bayern\n", "9 Saarland\n", "10 Berlin\n", "11 Brandenburg\n", "12 Mecklenburg-Vorpommern\n", "13 Sachsen\n", "14 Sachsen-Anhalt\n", "15 Thüringen\n", "16 Schleswig-Holstein\n", "17 Hamburg\n", "18 Niedersachsen\n", "19 Niedersachsen\n", "20 Bremen\n", "21 Mecklenburg-Vorpommern\n", "22 Baden-Württemberg (Bodensee)\n", "23 Bayern (Bodensee)\n", "Name: GEN, dtype: object" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdf.GEN" ] }, { "cell_type": "code", "execution_count": 70, "id": "53069624", "metadata": {}, "outputs": [], "source": [ "gdf.AGS = gdf.AGS.astype(str)" ] }, { "cell_type": "markdown", "id": "fce92e3e", "metadata": {}, "source": [ "### Merge `GeoDataFrame` with Data and Visualize on the Map\n", "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" ] }, { "cell_type": "code", "execution_count": 102, "id": "877c61cc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = plt.figure(figsize=(10, 5))\n", "\n", "ax1 = fig.add_subplot(131)\n", "year = 2015\n", "plt.title(str(year))\n", "gdf_merged = pd.merge(\n", " left=gdf,\n", " right=ratio_international[ratio_international.year == year],\n", " left_on=\"AGS\",\n", " right_on=\"Kreise und kreisfreie Städte_Code\",\n", ")\n", "gdf_merged.ratio_international\n", "gdf_merged.plot(\n", " \"ratio_international\",\n", " ax=ax1,\n", " legend=True,\n", " missing_kwds={\"color\": \"lightgrey\"},\n", " legend_kwds={\n", " \"label\": \"ratio of int. students\",\n", " \"orientation\": \"horizontal\",\n", " },\n", " vmin=0.08,\n", " vmax=0.23,\n", ")\n", "\n", "ax2 = fig.add_subplot(132)\n", "year = 2018\n", "plt.title(str(year))\n", "gdf_merged = pd.merge(\n", " left=gdf,\n", " right=ratio_international[ratio_international.year == year],\n", " left_on=\"AGS\",\n", " right_on=\"Kreise und kreisfreie Städte_Code\",\n", ")\n", "gdf_merged.ratio_international\n", "gdf_merged.plot(\n", " \"ratio_international\",\n", " ax=ax2,\n", " legend=True,\n", " missing_kwds={\"color\": \"lightgrey\"},\n", " legend_kwds={\n", " \"label\": \"ratio of int. students\",\n", " \"orientation\": \"horizontal\",\n", " },\n", " vmin=0.08,\n", " vmax=0.23,\n", ")\n", "\n", "ax3 = fig.add_subplot(133)\n", "year = 2021\n", "plt.title(str(year))\n", "gdf_merged = pd.merge(\n", " left=gdf,\n", " right=ratio_international[ratio_international.year == year],\n", " left_on=\"AGS\",\n", " right_on=\"Kreise und kreisfreie Städte_Code\",\n", ")\n", "gdf_merged.ratio_international\n", "gdf_merged.plot(\n", " \"ratio_international\",\n", " ax=ax3,\n", " legend=True,\n", " missing_kwds={\"color\": \"lightgrey\"},\n", " legend_kwds={\n", " \"label\": \"ratio of int. 
students\",\n", " \"orientation\": \"horizontal\",\n", " },\n", " vmin=0.08,\n", " vmax=0.23,\n", ")" ] }, { "cell_type": "markdown", "id": "7d7fffc9", "metadata": {}, "source": [ "Outlook\n", "-------\n", "\n", "- `quality=on` -> handle the different quality identifiers\n", "- `Find` to work across all databases -> search all supported databases with a single query\n", "- `LLM`? -> idea: provide an interface that lets users talk to GENESIS through an LLM such as ChatGPT" ] }, { "cell_type": "markdown", "id": "d8b9d035", "metadata": {}, "source": [] } ], "metadata": { "jupytext": { "formats": "ipynb,py:percent" }, "kernelspec": { "display_name": "pystatis", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.6" } }, "nbformat": 4, "nbformat_minor": 5 }