Configuration

hydrodatasource uses a central YAML configuration file to manage data paths and connections to remote services. This makes it easy to adapt the library to your local environment.

The `hydro_setting.yml` File

The configuration is managed through a file named hydro_setting.yml located in your user home directory (e.g., C:\Users\YourUser on Windows or /home/YourUser on Linux).

If this file does not exist, hydrodatasource will automatically create and use a default directory named hydrodatasource_data in your home directory.
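The lookup described above can be sketched with the standard library alone. This is an illustration, not hydrodatasource's own code: the helper name `locate_settings` is hypothetical, and only the file name `hydro_setting.yml` and the fallback directory name `hydrodatasource_data` come from this page.

```python
from pathlib import Path

# Names taken from this page; everything else below is illustrative.
SETTING_FILE_NAME = "hydro_setting.yml"
DEFAULT_DATA_DIR_NAME = "hydrodatasource_data"


def locate_settings(home=None):
    """Return (settings_path, settings_exists, fallback_data_dir).

    `home` defaults to the current user's home directory. This hypothetical
    helper only inspects the file system; it does not create anything.
    """
    home = Path(home) if home is not None else Path.home()
    settings_path = home / SETTING_FILE_NAME
    fallback_dir = home / DEFAULT_DATA_DIR_NAME
    return settings_path, settings_path.is_file(), fallback_dir


settings_path, exists, fallback_dir = locate_settings()
if exists:
    print(f"Found settings file: {settings_path}")
else:
    print(f"No settings file; the default directory would be: {fallback_dir}")
```

Running this before importing the library is a quick way to confirm which of the two cases applies on your machine.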

Local Data Path Configuration

For most use cases, you only need to configure the local_data_path section. This tells the library where to find and store your data.

Here is an example of a minimal hydro_setting.yml:

local_data_path:
  root: 'D:\data\hydro_data' # The main directory for all your data
  datasets-origin: 'D:\data\hydro_data\origin' # For raw, unprocessed data
  datasets-interim: 'D:\data\hydro_data\interim' # For intermediate, processed data
  cache: 'D:\data\hydro_data\.cache' # For storing cached files like NetCDFs

root: The top-level directory for all project-related data.
datasets-origin: The directory where you place your original, raw datasets.
datasets-interim: The directory for data that has been processed or transformed from the raw datasets.
cache: hydrodatasource uses this directory to store cached data, such as the NetCDF files generated by SelfMadeHydroDataset. This speeds up data loading on subsequent runs.
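If you prefer to generate this layout rather than type it by hand, the following sketch builds the four paths under one root, creates the directories, and writes a matching `local_data_path` section. The function names are hypothetical, and the subdirectory names (`origin`, `interim`, `.cache`) simply mirror the example above; this page does not say whether hydrodatasource requires them to exist in advance, so creating them is an assumption. The demo writes into a temporary directory, not your real home directory.

```python
from pathlib import Path
import tempfile


def build_local_data_paths(root):
    """Map each local_data_path key to a directory under `root`.

    The subdirectory names follow the example on this page.
    """
    root = Path(root)
    return {
        "root": root,
        "datasets-origin": root / "origin",
        "datasets-interim": root / "interim",
        "cache": root / ".cache",
    }


def render_settings(paths):
    """Render the paths as a YAML local_data_path section.

    Single-quoted YAML scalars keep backslashes literal, which suits
    Windows paths; a literal single quote is escaped by doubling it.
    """
    lines = ["local_data_path:"]
    for key, value in paths.items():
        quoted = str(value).replace("'", "''")
        lines.append(f"  {key}: '{quoted}'")
    return "\n".join(lines) + "\n"


def write_settings(root, settings_file):
    """Create the directories and write the settings file (hypothetical helper)."""
    paths = build_local_data_paths(root)
    for directory in paths.values():
        directory.mkdir(parents=True, exist_ok=True)
    Path(settings_file).write_text(render_settings(paths), encoding="utf-8")
    return paths


# Demo in a throwaway directory so nothing in your home directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "hydro_setting.yml"
    write_settings(Path(tmp) / "hydro_data", demo_file)
    print(demo_file.read_text(encoding="utf-8"))
```

To use it for real, you would pass your own data root and a settings path in your home directory, after checking that you are not overwriting an existing file.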

Other Configurations

The hydro_setting.yml file can also be used to configure connections to a MinIO object storage server and a PostgreSQL database, but these are not required for basic local file-based operations.
