insee

INSEE and data.gouv.fr connectors for Snowpark Stored Procedures.

Covers three data sources:

  • Melodi — INSEE's dissemination API for statistical datasets (catalog, file download, range queries, direct SQL-style queries).
  • INSEE Metadata — geographic reference lookups (communes, départements, régions).
  • data.gouv.fr / SIRENE — Parquet file references and S3 constants for the official SIRENE register published by data.gouv.fr.

All requests imports are lazy, so the module can be imported even when requests is not installed.
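
The lazy-import behaviour described above can be illustrated with a small sketch. The helper names `lazy_module` and `fetch_status` are hypothetical and are not part of this module; the sketch only shows the general pattern of deferring a third-party import until a function actually needs it.

```python
import importlib
from types import ModuleType


def lazy_module(name: str) -> ModuleType:
    """Import `name` on first use and fail with a clear message if it is missing.

    Hypothetical helper: the real module may simply place `import requests`
    inside each function body instead.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise RuntimeError(f"{name!r} is required for this call; install it first") from exc


def fetch_status(url: str) -> int:
    """Return the HTTP status code for `url`; requests is imported only here."""
    requests = lazy_module("requests")
    return requests.get(url, timeout=30).status_code
```

Because nothing touches requests at module import time, importing a module written this way succeeds in environments where requests is absent; the error only surfaces when a network function is called.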

DGF_S3_ENDPOINT = 'https://object.files.data.gouv.fr' module-attribute

Base URL for the data.gouv.fr S3-compatible object store.

DGF_SIRENE_S3_BUCKET = 'insee' module-attribute

S3 bucket name for SIRENE Parquet exports.

DGF_SIRENE_S3_PREFIX = 'sirene' module-attribute

S3 key prefix under which SIRENE Parquet files are stored.
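
As an illustration of how the three constants might combine, the sketch below builds a download URL for a named file. It assumes path-style addressing (`endpoint/bucket/prefix/name`), which this page does not confirm; `sirene_object_url` is a hypothetical helper, and the constants are redeclared locally so the snippet is self-contained.

```python
from urllib.parse import quote

# Values copied from the module attributes documented above.
DGF_S3_ENDPOINT = "https://object.files.data.gouv.fr"
DGF_SIRENE_S3_BUCKET = "insee"
DGF_SIRENE_S3_PREFIX = "sirene"


def sirene_object_url(file_name: str) -> str:
    """Build a path-style URL for a SIRENE file (assumed layout, not verified)."""
    key = f"{DGF_SIRENE_S3_PREFIX}/{file_name}"
    # quote() leaves "/" intact and escapes characters that are unsafe in a path.
    return f"{DGF_S3_ENDPOINT}/{DGF_SIRENE_S3_BUCKET}/{quote(key)}"
```

If the object store uses a different key layout, only the `key` line would need to change.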

SireneFile

Bases: BaseModel

Reference to a SIRENE Parquet file on data.gouv.fr S3.

Attributes:

Name Type Description
name str

File name (e.g. StockEtablissement_utf8.parquet).

url str

Full download URL.

size_bytes int

Reported file size in bytes.

last_modified str

ISO-8601 last-modification timestamp.

Source code in src/pinky_connect/insee.py
class SireneFile(BaseModel):
    """Reference to a SIRENE Parquet file on data.gouv.fr S3.

    Attributes:
        name: File name (e.g. ``StockEtablissement_utf8.parquet``).
        url: Full download URL.
        size_bytes: Reported file size in bytes.
        last_modified: ISO-8601 last-modification timestamp.
    """

    name: str
    url: str
    size_bytes: int
    last_modified: str
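
A short sketch of how these four fields might be consumed. To stay dependency-free it uses a dataclass stand-in (`SireneFileStandIn`, hypothetical) with the same attribute names rather than the pydantic model itself; the helpers only rely on attribute access, so they should work equally with a real `SireneFile`. The example URL and timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SireneFileStandIn:
    """Hypothetical stand-in mirroring the fields of SireneFile."""

    name: str
    url: str
    size_bytes: int
    last_modified: str


def size_mib(file) -> float:
    """Reported size converted from bytes to mebibytes."""
    return file.size_bytes / (1024 * 1024)


def modified_at(file) -> datetime:
    """Parse the ISO-8601 timestamp string into a datetime."""
    return datetime.fromisoformat(file.last_modified)


def newest(files):
    """Return the most recently modified file.

    Assumes all timestamps are either timezone-aware or all naive;
    mixing the two makes datetime comparison raise TypeError.
    """
    return max(files, key=modified_at)


files = [
    SireneFileStandIn("old.parquet", "https://example.invalid/old", 1_048_576, "2024-01-01T00:00:00+00:00"),
    SireneFileStandIn("new.parquet", "https://example.invalid/new", 2_097_152, "2024-05-01T08:30:00+00:00"),
]
latest = newest(files)
```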

dgf_dataset_resources(dataset_id)

List all resources attached to a data.gouv.fr dataset.

Parameters:

Name Type Description Default
dataset_id str

The data.gouv.fr dataset identifier (UUID).

required

Returns:

Type Description
list[dict[str, Any]]

List of resource metadata dicts as returned by the data.gouv.fr API.

Raises:

Type Description
HTTPError

On non-2xx responses.

Source code in src/pinky_connect/insee.py
def dgf_dataset_resources(dataset_id: str) -> list[dict[str, Any]]:
    """List all resources attached to a data.gouv.fr dataset.

    Args:
        dataset_id: The data.gouv.fr dataset identifier (UUID).

    Returns:
        List of resource metadata dicts as returned by the data.gouv.fr API.

    Raises:
        requests.HTTPError: On non-2xx responses.
    """
    raise NotImplementedError("pending migration from _BACKUP/snowflake-kit-connect-bak")
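
The function above is currently a stub. One way it could be implemented is sketched below, assuming the public data.gouv.fr endpoint `/api/1/datasets/{id}/` whose JSON payload carries a `resources` list. The name `dgf_dataset_resources_sketch`, the `DGF_API_BASE` constant, and the injectable `http_get` parameter are all hypothetical; the injection point exists only so the logic can be exercised without network access or requests installed.

```python
from typing import Any, Callable, Optional

# Assumed API root; not defined anywhere on this page.
DGF_API_BASE = "https://www.data.gouv.fr/api/1"


def dgf_dataset_resources_sketch(
    dataset_id: str,
    http_get: Optional[Callable[[str], Any]] = None,
) -> list[dict[str, Any]]:
    """List resources for a dataset; raises on non-2xx via raise_for_status()."""
    if http_get is None:
        import requests  # lazy import, matching the module's stated convention

        http_get = lambda url: requests.get(url, timeout=30)

    response = http_get(f"{DGF_API_BASE}/datasets/{dataset_id}/")
    response.raise_for_status()  # requests raises HTTPError on 4xx/5xx
    return response.json().get("resources", [])
```

Called without `http_get`, the sketch falls back to requests; passing a fake getter keeps unit tests offline.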

geo_commune(code)

Look up a French commune by its INSEE code.

Parameters:

Name Type Description Default
code str

5-character INSEE commune code (e.g. "75056", or "2A004" in Corsica).

required

Returns:

Type Description
dict[str, Any]

Commune metadata dict (name, département, région, population…).

Raises:

Type Description
HTTPError

If the code is not found.

Source code in src/pinky_connect/insee.py
def geo_commune(code: str) -> dict[str, Any]:
    """Look up a French commune by its INSEE code.

    Args:
        code: 5-character INSEE commune code (e.g. ``"75056"``, or ``"2A004"`` in Corsica).

    Returns:
        Commune metadata dict (name, département, région, population…).

    Raises:
        requests.HTTPError: If the code is not found.
    """
    raise NotImplementedError("pending migration from _BACKUP/snowflake-kit-connect-bak")
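
Since a lookup failure only surfaces as an HTTP error, a caller may want to reject malformed codes before making a request. The sketch below is one possible client-side check; `is_valid_commune_code` is a hypothetical helper, not part of this module. It accepts five digits, or the Corsican prefixes 2A and 2B followed by three digits, and says nothing about whether the commune actually exists.

```python
import re

# Two digits (or 2A / 2B for Corsica) followed by three digits.
_COMMUNE_CODE = re.compile(r"(?:\d{2}|2[AB])\d{3}")


def is_valid_commune_code(code: str) -> bool:
    """Return True when `code` has the shape of an INSEE commune code."""
    return _COMMUNE_CODE.fullmatch(code) is not None
```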

geo_departement(code)

Look up a French département by its INSEE code.

Parameters:

Name Type Description Default
code str

2–3 character département code (e.g. "75", "2A").

required

Returns:

Type Description
dict[str, Any]

Département metadata dict (name, région…).

Source code in src/pinky_connect/insee.py
def geo_departement(code: str) -> dict[str, Any]:
    """Look up a French département by its INSEE code.

    Args:
        code: 2–3 character département code (e.g. ``"75"``, ``"2A"``).

    Returns:
        Département metadata dict (name, région…).
    """
    raise NotImplementedError("pending migration from _BACKUP/snowflake-kit-connect-bak")
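
Département codes often arrive in inconsistent forms (for example `1` instead of `01`, or a lowercase `2a`). A minimal normalisation sketch follows; `normalize_departement_code` is a hypothetical helper, and whether the underlying service itself tolerates such variants is not stated on this page.

```python
def normalize_departement_code(code: str) -> str:
    """Trim, uppercase, and zero-pad a département code to 2-3 characters."""
    cleaned = code.strip().upper()
    # A lone digit such as "1" is padded to the canonical "01".
    if cleaned.isdigit() and len(cleaned) == 1:
        cleaned = cleaned.zfill(2)
    if not 2 <= len(cleaned) <= 3:
        raise ValueError(f"unexpected département code: {code!r}")
    return cleaned
```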

geo_region(code)

Look up a French région by its INSEE code.

Parameters:

Name Type Description Default
code str

2-digit région code (e.g. "11" for Île-de-France).

required

Returns:

Type Description
dict[str, Any]

Région metadata dict (name, chef-lieu…).

Source code in src/pinky_connect/insee.py
def geo_region(code: str) -> dict[str, Any]:
    """Look up a French région by its INSEE code.

    Args:
        code: 2-digit région code (e.g. ``"11"`` for Île-de-France).

    Returns:
        Région metadata dict (name, chef-lieu…).
    """
    raise NotImplementedError("pending migration from _BACKUP/snowflake-kit-connect-bak")
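
Geographic reference data changes rarely, so repeated lookups for the same code inside one stored procedure run could reasonably be memoised. The sketch below wraps any code-to-dict lookup (such as `geo_commune`, `geo_departement`, or `geo_region`) with `functools.lru_cache`. The wrapper name `cached_lookup` is hypothetical and caching is a suggestion, not documented behaviour of this module.

```python
from functools import lru_cache
from typing import Any, Callable


def cached_lookup(
    lookup: Callable[[str], dict[str, Any]],
    maxsize: int = 1024,
) -> Callable[[str], dict[str, Any]]:
    """Return a memoised version of `lookup`, keyed by the code string.

    The same dict object is returned on cache hits, so callers should
    treat the result as read-only.
    """

    @lru_cache(maxsize=maxsize)
    def wrapper(code: str) -> dict[str, Any]:
        return lookup(code)

    return wrapper
```

Failed lookups are not cached, because `lru_cache` stores return values only and an exception propagates without populating the cache.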

melodi_catalog(theme=None)

Fetch the Melodi dataset catalog.

Parameters:

Name Type Description Default
theme str | None

Optional theme code to filter the catalog (e.g. "DEMOGRAPHIC").

None

Returns:

Type Description
list[dict[str, Any]]

List of dataset descriptor dicts.

Source code in src/pinky_connect/insee.py
def melodi_catalog(theme: str | None = None) -> list[dict[str, Any]]:
    """Fetch the Melodi dataset catalog.

    Args:
        theme: Optional theme code to filter the catalog (e.g. ``"DEMOGRAPHIC"``).

    Returns:
        List of dataset descriptor dicts.
    """
    raise NotImplementedError("pending migration from _BACKUP/snowflake-kit-connect-bak")

melodi_fetch_file(dataset_id, *, variant=None)

Download a file from a Melodi dataset.

Parameters:

Name Type Description Default
dataset_id str

Melodi dataset identifier.

required
variant str | None

Optional file variant (e.g. "parquet", "csv").

None

Returns:

Type Description
bytes

Raw file content as bytes.

Raises:

Type Description
HTTPError

On non-2xx responses.

Source code in src/pinky_connect/insee.py
def melodi_fetch_file(
    dataset_id: str,
    *,
    variant: str | None = None,
) -> bytes:
    """Download a file from a Melodi dataset.

    Args:
        dataset_id: Melodi dataset identifier.
        variant: Optional file variant (e.g. ``"parquet"``, ``"csv"``).

    Returns:
        Raw file content as bytes.

    Raises:
        requests.HTTPError: On non-2xx responses.
    """
    raise NotImplementedError("pending migration from _BACKUP/snowflake-kit-connect-bak")
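
Because the function returns raw bytes, the caller has to decide how to persist them. The sketch below shows one possible approach: detect Parquet content by its `PAR1` magic bytes (present at both the start and the end of a Parquet file) and write the payload to disk with a matching suffix. Both helper names are hypothetical, and the `.bin` fallback suffix is an arbitrary choice.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(payload: bytes) -> bool:
    """Check the Parquet magic bytes at both ends of the payload."""
    return (
        len(payload) >= 12  # two 4-byte magics plus a 4-byte footer length
        and payload[:4] == PARQUET_MAGIC
        and payload[-4:] == PARQUET_MAGIC
    )


def save_payload(payload: bytes, directory: Path, stem: str) -> Path:
    """Write `payload` under `directory`, choosing the suffix from its content."""
    suffix = ".parquet" if looks_like_parquet(payload) else ".bin"
    target = directory / f"{stem}{suffix}"
    target.write_bytes(payload)
    return target
```

The magic-byte check is a cheap sanity test only; it does not validate the file's footer or schema.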

melodi_query(dataset_id, filters=None)

Query a Melodi dataset and return matching rows.

Parameters:

Name Type Description Default
dataset_id str

Melodi dataset identifier.

required
filters dict[str, Any] | None

Key-value filters applied server-side.

None

Returns:

Type Description
list[dict[str, Any]]

List of row dictionaries.

Source code in src/pinky_connect/insee.py
def melodi_query(
    dataset_id: str,
    filters: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
    """Query a Melodi dataset and return matching rows.

    Args:
        dataset_id: Melodi dataset identifier.
        filters: Key-value filters applied server-side.

    Returns:
        List of row dictionaries.
    """
    raise NotImplementedError("pending migration from _BACKUP/snowflake-kit-connect-bak")
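
How the key-value filters reach the server is not specified here. One plausible encoding, shown purely as an assumption, is a URL query string in which list values repeat the key. `encode_filters` is a hypothetical helper and the filter names in the usage line (`GEO`, `TIME_PERIOD`) are illustrative only.

```python
from typing import Any, Optional
from urllib.parse import urlencode


def encode_filters(filters: Optional[dict[str, Any]]) -> str:
    """Encode filters as a query string; keys are sorted for stable output."""
    if not filters:
        return ""
    ordered = sorted(filters.items())
    # doseq=True expands list values into repeated key=value pairs.
    return urlencode(ordered, doseq=True)


query_string = encode_filters({"TIME_PERIOD": ["2020", "2021"], "GEO": "75056"})
```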

melodi_ranges(dataset_id)

Return the available time ranges for a Melodi dataset.

Parameters:

Name Type Description Default
dataset_id str

Melodi dataset identifier.

required

Returns:

Type Description
dict[str, Any]

Dict with start, end, and frequency keys.

Source code in src/pinky_connect/insee.py
def melodi_ranges(dataset_id: str) -> dict[str, Any]:
    """Return the available time ranges for a Melodi dataset.

    Args:
        dataset_id: Melodi dataset identifier.

    Returns:
        Dict with ``start``, ``end``, and ``frequency`` keys.
    """
    raise NotImplementedError("pending migration from _BACKUP/snowflake-kit-connect-bak")
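
The returned dict could be used to enumerate the periods worth querying. The sketch below handles only the simplest case and rests on assumptions this page does not confirm: that `start` and `end` begin with a four-digit year and that an annual series is flagged with the frequency value `"A"`. `annual_periods` is a hypothetical helper.

```python
from typing import Any


def annual_periods(ranges: dict[str, Any]) -> list[str]:
    """List every year from start to end inclusive for an annual series."""
    if ranges["frequency"] != "A":  # "A" for annual is an assumed convention
        raise ValueError(f"unsupported frequency: {ranges['frequency']!r}")
    first = int(str(ranges["start"])[:4])
    last = int(str(ranges["end"])[:4])
    return [str(year) for year in range(first, last + 1)]


periods = annual_periods({"start": "2018", "end": "2021", "frequency": "A"})
```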