insee
INSEE and data.gouv.fr connectors for Snowpark Stored Procedures.
Covers three data sources:
- Melodi — INSEE's dissemination API for statistical datasets (catalog, file download, range queries, direct SQL-style queries).
- INSEE Metadata — geographic reference lookups (communes, départements, régions).
- data.gouv.fr / SIRENE — Parquet file references and S3 constants for the official SIRENE register published by data.gouv.fr.
All requests imports are lazy, so the module can be imported even when requests is not installed.
DGF_S3_ENDPOINT = 'https://object.files.data.gouv.fr'
module-attribute
Base URL for the data.gouv.fr S3-compatible object store.
DGF_SIRENE_S3_BUCKET = 'insee'
module-attribute
S3 bucket name for SIRENE Parquet exports.
DGF_SIRENE_S3_PREFIX = 'sirene'
module-attribute
S3 key prefix under which SIRENE Parquet files are stored.
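As a rough illustration of how the three constants fit together, the sketch below joins them into an object URL. The helper `sirene_object_url` is hypothetical, the path-style layout (endpoint, then bucket, then prefix, then file name) is an assumption, and `example.parquet` is a placeholder rather than a real file name.

```python
# Values copied from the module attributes documented above.
DGF_S3_ENDPOINT = "https://object.files.data.gouv.fr"
DGF_SIRENE_S3_BUCKET = "insee"
DGF_SIRENE_S3_PREFIX = "sirene"


def sirene_object_url(filename: str) -> str:
    """Build a path-style URL for a file under the SIRENE prefix.

    Hypothetical helper; the actual URL layout used by the store
    is assumed, not confirmed by the reference.
    """
    parts = [
        DGF_S3_ENDPOINT.rstrip("/"),
        DGF_SIRENE_S3_BUCKET,
        DGF_SIRENE_S3_PREFIX.strip("/"),
        filename.lstrip("/"),
    ]
    return "/".join(parts)


print(sirene_object_url("example.parquet"))
```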
SireneFile
Bases: BaseModel
Reference to a SIRENE Parquet file on data.gouv.fr S3.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | File name (e.g. …). |
| `url` | `str` | Full download URL. |
| `size_bytes` | `int` | Reported file size in bytes. |
| `last_modified` | `str` | ISO-8601 last-modification timestamp. |

Source code in src/pinky_connect/insee.py (lines 34–47)
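To show how the four attributes might be consumed, here is a small stand-in. `SireneFileSketch` is a hypothetical dataclass used so the example runs without any third-party package; it is not the real `SireneFile`, and the two derived properties are additions for illustration. The sample values are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SireneFileSketch:
    """Stand-in mirroring the documented SireneFile attributes."""

    name: str
    url: str
    size_bytes: int
    last_modified: str

    @property
    def size_mib(self) -> float:
        # Convert the reported byte count to mebibytes.
        return self.size_bytes / (1024 * 1024)

    @property
    def modified_at(self) -> datetime:
        # Parse the ISO-8601 string; an explicit offset such as
        # "+00:00" parses on all supported Python versions.
        return datetime.fromisoformat(self.last_modified)


ref = SireneFileSketch(
    name="example.parquet",  # placeholder file name
    url="https://example.invalid/example.parquet",  # placeholder URL
    size_bytes=3 * 1024 * 1024,
    last_modified="2024-01-01T00:00:00+00:00",
)
print(f"{ref.name}: {ref.size_mib:.1f} MiB, modified {ref.modified_at:%Y-%m-%d}")
```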
dgf_dataset_resources(dataset_id)
List all resources attached to a data.gouv.fr dataset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_id` | `str` | The data.gouv.fr dataset identifier (UUID). | required |

Returns:

| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of resource metadata dicts as returned by the data.gouv.fr API. |

Raises:

| Type | Description |
|---|---|
| `HTTPError` | On non-2xx responses. |

Source code in src/pinky_connect/insee.py (lines 50–62)
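A typical follow-up to listing resources is to keep only the Parquet files. The sketch below works on plain dicts so it runs offline; `parquet_resources` is a hypothetical helper, and the `format`, `title`, and `url` keys are assumed to be present in the resource dicts. The sample entries are invented for illustration.

```python
from typing import Any


def parquet_resources(resources: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only resources whose declared format is Parquet.

    Assumes each resource dict may carry a "format" key; entries
    without it are skipped rather than raising.
    """
    return [
        resource
        for resource in resources
        if str(resource.get("format", "")).lower() == "parquet"
    ]


# Invented sample standing in for the list returned by
# dgf_dataset_resources(dataset_id).
sample = [
    {"title": "a", "format": "parquet", "url": "https://example.invalid/a.parquet"},
    {"title": "b", "format": "csv", "url": "https://example.invalid/b.csv"},
    {"title": "c", "format": "PARQUET", "url": "https://example.invalid/c.parquet"},
    {"title": "d"},
]

for resource in parquet_resources(sample):
    print(resource["title"], resource["url"])
```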
geo_commune(code)
Look up a French commune by its INSEE code.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `code` | `str` | 5-digit INSEE commune code. | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Commune metadata dict (name, département, région, population…). |

Raises:

| Type | Description |
|---|---|
| `HTTPError` | If the code is not found. |

Source code in src/pinky_connect/insee.py (lines 125–137)
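Because an unknown code results in an `HTTPError`, it can be useful to reject malformed codes before making a request. The following is a sketch of such a check; `is_commune_code` is a hypothetical helper, not part of the module. It also accepts the `2A` and `2B` prefixes used by Corsican communes, although whether `geo_commune` accepts those codes is not stated in this reference.

```python
import re

# Five characters: two digits (or 2A / 2B for Corsica) followed by
# three digits. ASCII digits only.
_COMMUNE_CODE = re.compile(r"(?:[0-9]{2}|2[AB])[0-9]{3}")


def is_commune_code(code: str) -> bool:
    """Return True if the string has the shape of an INSEE commune code."""
    return _COMMUNE_CODE.fullmatch(code) is not None


for candidate in ("75056", "2A004", "7505", "2C004"):
    print(candidate, is_commune_code(candidate))
```

The check is purely syntactic: a well-formed code may still not correspond to an existing commune.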
geo_departement(code)
Look up a French département by its INSEE code.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `code` | `str` | 2–3 character département code (e.g. …). | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Département metadata dict (name, région…). |

Source code in src/pinky_connect/insee.py (lines 140–149)
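The 2–3 character length follows from how département codes relate to commune codes: the département is the first two characters of the commune code, or the first three when the commune code starts with `97` (overseas départements). A minimal sketch of that derivation, using a hypothetical helper `departement_of` that is not part of the module:

```python
def departement_of(commune_code: str) -> str:
    """Derive the département code from a 5-character commune code.

    Hypothetical helper. Codes starting with "97" yield a
    3-character département code; all others yield 2 characters.
    """
    if len(commune_code) != 5:
        raise ValueError(f"expected a 5-character code, got {commune_code!r}")
    if commune_code.startswith("97"):
        return commune_code[:3]
    return commune_code[:2]


for code in ("75056", "2A004", "97105"):
    print(code, "->", departement_of(code))
```

The result could then be passed to `geo_departement` to fetch the parent département of a commune.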
geo_region(code)
Look up a French région by its INSEE code.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `code` | `str` | 2-digit région code (e.g. …). | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Région metadata dict (name, chef-lieu…). |

Source code in src/pinky_connect/insee.py (lines 152–161)
melodi_catalog(theme=None)
Fetch the Melodi dataset catalog.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `theme` | `str \| None` | Optional theme code to filter the catalog (e.g. …). | `None` |

Returns:

| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of dataset descriptor dicts. |

Source code in src/pinky_connect/insee.py (lines 65–74)
melodi_fetch_file(dataset_id, *, variant=None)
Download a file from a Melodi dataset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_id` | `str` | Melodi dataset identifier. | required |
| `variant` | `str \| None` | Optional file variant (e.g. …). | `None` |

Returns:

| Type | Description |
|---|---|
| `bytes` | Raw file content as bytes. |

Raises:

| Type | Description |
|---|---|
| `HTTPError` | On non-2xx responses. |

Source code in src/pinky_connect/insee.py (lines 77–94)
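Since the function returns raw bytes, the caller has to decide how to interpret them. The sketch below assumes, without confirmation from this reference, that a download may be either a ZIP archive or a single plain file, and handles both. `unpack_payload` is a hypothetical helper, not part of the module.

```python
import io
import zipfile


def unpack_payload(data: bytes, default_name: str = "payload") -> dict[str, bytes]:
    """Map member names to contents for a downloaded payload.

    If the bytes form a ZIP archive, every non-directory member is
    extracted in memory; otherwise the bytes are returned unchanged
    under ``default_name``.
    """
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            return {
                info.filename: archive.read(info)
                for info in archive.infolist()
                if not info.is_dir()
            }
    return {default_name: data}
```

A call such as `unpack_payload(melodi_fetch_file(dataset_id))` would then give a uniform dict regardless of which form the server returned.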
melodi_query(dataset_id, filters=None)
Query a Melodi dataset and return matching rows.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_id` | `str` | Melodi dataset identifier. | required |
| `filters` | `dict[str, Any] \| None` | Key-value filters applied server-side. | `None` |

Returns:

| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of row dictionaries. |

Source code in src/pinky_connect/insee.py (lines 97–110)
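A list of row dictionaries is often easier to load into a table or DataFrame once pivoted into columns. The sketch below shows one way to do that while tolerating rows with missing keys. `rows_to_columns` is a hypothetical helper, and the `period` and `value` field names in the sample are invented; actual Melodi rows may use different keys.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row dicts into a dict of equal-length column lists.

    Keys missing from a row are filled with None so every column
    has one entry per row.
    """
    columns: dict[str, list[Any]] = {}
    for index, row in enumerate(rows):
        # Register keys seen for the first time, backfilling earlier rows.
        for key in row:
            if key not in columns:
                columns[key] = [None] * index
        # Append this row's value (or None) to every known column.
        for key, values in columns.items():
            values.append(row.get(key))
    return columns


# Invented rows standing in for the result of melodi_query(...).
rows = [
    {"period": "2020", "value": 1},
    {"period": "2021", "value": 2, "flag": "p"},
]
print(rows_to_columns(rows))
```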
melodi_ranges(dataset_id)
Return the available time ranges for a Melodi dataset.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_id` | `str` | Melodi dataset identifier. | required |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dict with … |

Source code in src/pinky_connect/insee.py (lines 113–122)