Tricoteuses-Assemblee

Retrieve, clean up and handle the French Assemblée nationale's open data

Requirements

  • Node >= 18

Installation

git clone https://git.en-root.org/tricoteuses/tricoteuses-assemblee
cd tricoteuses-assemblee/
npm install

Download and clean data

Basic usage

Create a folder where the data will be downloaded and run the following command to download, reorganize and clean the data.

mkdir ../assemblee-data/

# Download and clean open data
npm run data:download ../assemblee-data

Data from other sources is also available:

# Retrieval of députés' pictures from Assemblée nationale's website
npm run data:retrieve_deputes_photos ../assemblee-data

# Retrieval of sénateurs' pictures from Assemblée nationale's website
npm run data:retrieve_senateurs_photos ../assemblee-data

# Retrieval of pending amendments (not yet processed by the Assemblée's services) from Assemblée nationale's website
npm run data:retrieve_pending_amendements ../assemblee-data
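The download and retrieval commands above can also be chained from a Node script. The sketch below is a hypothetical convenience wrapper, not part of the project: `pipelineCommands` and `runPipeline` are illustrative names, and the wrapper simply shells out to the same `npm run` scripts listed above, in order, stopping at the first failure.

```typescript
import { spawnSync } from "node:child_process";

// Scripts documented in this README, in the order they should run.
// (Hypothetical wrapper: the project itself does not ship this helper.)
const SCRIPTS = [
  "data:download",
  "data:retrieve_deputes_photos",
  "data:retrieve_senateurs_photos",
  "data:retrieve_pending_amendements",
] as const;

// Build the argument list passed to `npm` for each script.
function pipelineCommands(dataDir: string): string[][] {
  return SCRIPTS.map((script) => ["run", script, dataDir]);
}

// Run each command sequentially; throw on the first non-zero (or null) exit status.
function runPipeline(dataDir: string): void {
  for (const args of pipelineCommands(dataDir)) {
    const result = spawnSync("npm", args, { stdio: "inherit" });
    if (result.status !== 0) {
      throw new Error(`npm ${args.join(" ")} failed with status ${result.status}`);
    }
  }
}
```

Call `runPipeline("../assemblee-data")` after creating the folder. On Windows, `npm` is a `.cmd` shim, so you may need to pass `shell: true` to `spawnSync`.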

Notes:

Filtering options

Downloading and cleaning all the data takes a long time and uses a lot of disk space. To reduce the load, you can restrict the retrieval to the data you actually need.

To download only one type of dataset, use the --categories option (shortcut -k):

# Available options: ActeursEtOrganes, Agendas, Amendements, DossiersLegislatifs, Photos, Scrutins, Questions, ComptesRendusSeances
npm run data:download ../assemblee-data -- --categories Amendements

To download only a specific legislature, use the --legislature option (shortcut -l):

# Available options: 14, 15, 16, 17
npm run data:download ../assemblee-data -- --legislature 17

If you use these options, pass them to every subsequent command as well (data:reorganize_data and data:clean_data).
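Because the same filters must be repeated on every command, it can help to build the argument lists in one place. A minimal sketch, assuming only the option names and values documented above: `buildNpmArgs` and `Filters` are hypothetical helpers, and a single category is passed per command, as in the `--categories Amendements` example.

```typescript
// Values documented in this README for --categories and --legislature.
type Category =
  | "ActeursEtOrganes"
  | "Agendas"
  | "Amendements"
  | "DossiersLegislatifs"
  | "Photos"
  | "Scrutins"
  | "Questions"
  | "ComptesRendusSeances";

type Legislature = 14 | 15 | 16 | 17;

interface Filters {
  category?: Category; // one category, as in the example above
  legislature?: Legislature;
}

// Build `npm run <script> <dataDir> -- [flags]`; the flags follow `--`
// so that npm forwards them to the script instead of parsing them itself.
function buildNpmArgs(script: string, dataDir: string, filters: Filters = {}): string[] {
  const flags: string[] = [];
  if (filters.category !== undefined) {
    flags.push("--categories", filters.category);
  }
  if (filters.legislature !== undefined) {
    flags.push("--legislature", String(filters.legislature));
  }
  // Only add the `--` separator when there are flags to forward.
  return flags.length > 0 ? ["run", script, dataDir, "--", ...flags] : ["run", script, dataDir];
}

// Reuse the same filters for each command, e.g. download then clean.
const filters: Filters = { category: "Amendements", legislature: 17 };
for (const script of ["data:download", "data:clean_data"]) {
  console.log(["npm", ...buildNpmArgs(script, "../assemblee-data", filters)].join(" "));
}
```

Typing the categories and legislatures as unions lets the compiler reject a misspelled category before any command is run.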

Download using Docker

A Docker image that downloads and cleans the data in a single run is available. Build it locally or pull it from the container registry:

docker pull registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest

Create a volume for the downloaded data and, if needed, set the LEGISLATURE and CATEGORIES environment variables:

docker volume create assemblee-data
docker run --name tricoteuses-assemblee -v assemblee-data:/app/assemblee -e LEGISLATURE=17 -d registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest
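If you launch the container from a script, the `docker run` invocation above can be assembled programmatically. A minimal sketch: `dockerRunArgs` is a hypothetical helper; the container name, volume, mount point `/app/assemblee` and image are copied from the command above, and passing one category name in `CATEGORIES` is an assumption made by analogy with the `--categories` option.

```typescript
const IMAGE = "registry.en-root.org/tricoteuses/tricoteuses-assemblee:latest";

interface DockerOptions {
  legislature?: number;
  categories?: string; // assumption: same value as the --categories CLI option
}

// Build the arguments for `docker run`, adding an -e flag only for each variable that is set.
function dockerRunArgs(opts: DockerOptions = {}): string[] {
  const env: string[] = [];
  if (opts.legislature !== undefined) env.push("-e", `LEGISLATURE=${opts.legislature}`);
  if (opts.categories !== undefined) env.push("-e", `CATEGORIES=${opts.categories}`);
  return [
    "run",
    "--name", "tricoteuses-assemblee",
    "-v", "assemblee-data:/app/assemblee",
    ...env,
    "-d",
    IMAGE,
  ];
}

// Usage (not executed here), with spawnSync imported from "node:child_process":
// spawnSync("docker", dockerRunArgs({ legislature: 17 }), { stdio: "inherit" });
```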

Using the data

Once the data is downloaded and cleaned, you can read it with loaders. To use them in your project, install the @tricoteuses/assemblee package and import the iterator functions you need:

npm install @tricoteuses/assemblee

import {
  iterLoadAssembleeActeurs,
  iterLoadAssembleeOrganes,
  iterLoadAssembleeReunions,
  iterLoadAssembleeScrutins,
  iterLoadAssembleeDocuments,
  iterLoadAssembleeDossiersParlementaires,
  iterLoadAssembleeAmendements,
  iterLoadAssembleeQuestions,
  iterLoadAssembleeComptesRendus,
} from "@tricoteuses/assemblee/lib/loaders";

// Pass the data directory and the legislature number as arguments
for (const { acteur } of iterLoadAssembleeActeurs("../assemblee-data", 17)) {
  console.log(acteur.uid);
}
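Since the loaders are consumed with `for...of`, their results can be processed with ordinary iterable helpers, for example to stop after the first few matches. A minimal sketch with a hypothetical generic `takeWhere` helper; `fakeActeurs` is a stand-in generator with made-up uids so the example runs without downloaded data. With real data you would pass `iterLoadAssembleeActeurs("../assemblee-data", 17)` instead, relying only on the `acteur.uid` field shown above.

```typescript
// Collect at most `limit` items from any iterable that satisfy `predicate`,
// stopping early instead of walking the whole iterable.
function takeWhere<T>(items: Iterable<T>, predicate: (item: T) => boolean, limit: number): T[] {
  const out: T[] = [];
  if (limit <= 0) return out;
  for (const item of items) {
    if (predicate(item)) {
      out.push(item);
      // Breaking out of for...of calls the iterator's return(), closing a generator.
      if (out.length >= limit) break;
    }
  }
  return out;
}

// Hypothetical stand-in for a loader: yields objects shaped like `{ acteur: { uid } }`.
function* fakeActeurs(): Generator<{ acteur: { uid: string } }> {
  for (const uid of ["PA1", "PA2", "PA3", "PA4"]) {
    yield { acteur: { uid } };
  }
}

const firstTwo = takeWhere(fakeActeurs(), ({ acteur }) => acteur.uid.startsWith("PA"), 2);
console.log(firstTwo.map(({ acteur }) => acteur.uid));
```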

Generating schemas and documentation (for contributors only)

View instructions here