hydrodatasource
- Free software: BSD license
- Documentation: https://iHeadWater.github.io/hydrodatasource
Overview
While libraries like hydrodataset exist for accessing standardized, public hydrological datasets (e.g., CAMELS), a common challenge is working with data that isn't in a ready-to-use format. This includes non-public industry data, data from local authorities, or custom datasets compiled for specific research projects.
hydrodatasource is designed to solve this problem. It provides a flexible framework to read, process, and clean these custom datasets, preparing them for hydrological modeling and analysis.
The core of this framework is the SelfMadeHydroDataset class, which allows you to easily access your own data by organizing it into a simple, predefined directory structure.
Reading Custom Datasets with SelfMadeHydroDataset
This is the primary use case for hydrodatasource. If you have your own basin-level time series and attribute data, you can use this class to load it seamlessly.
1. Prepare Your Data Directory
First, organize your data into the following folder structure:
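    <your_dataset>/
    ├── attributes/
    │   └── attributes.csv
    ├── shapes/
    │   └── basins.shp
    └── timeseries/
        ├── 1D/
        │   ├── <basin_id>.csv
        │   └── ...
        └── 1D_units_info.json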
- attributes/attributes.csv: A CSV file containing static basin attributes (e.g., area, mean elevation). It must include a basin_id column whose values match the filenames in the timeseries folder.
- shapes/basins.shp: A shapefile with the polygon geometry for each basin.
- timeseries/1D/: One folder per time resolution (e.g., 1D for daily, 3h for 3-hourly). Inside, each CSV file contains the time series data for a single basin and is named after its basin_id.
- timeseries/1D_units_info.json: A JSON file defining the units for each variable in your time series CSVs (e.g., {"precipitation": "mm/d", "streamflow": "m3/s"}).
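Before pointing the reader at a new dataset, it can help to check that the folder follows the layout described above. The sketch below is not part of hydrodatasource: check_layout is a hypothetical helper written with pandas and pathlib only, and the demo dataset (basin names, values, and the time column name) is made up for illustration.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def check_layout(dataset_dir, time_unit="1D"):
    """Return a list of layout problems (an empty list means none were found).

    Hypothetical helper, not part of hydrodatasource.
    """
    root = Path(dataset_dir)
    attr_file = root / "attributes" / "attributes.csv"
    shape_file = root / "shapes" / "basins.shp"
    ts_dir = root / "timeseries" / time_unit
    units_file = root / "timeseries" / f"{time_unit}_units_info.json"

    problems = []
    for path in (attr_file, shape_file, ts_dir, units_file):
        if not path.exists():
            problems.append(f"missing: {path.relative_to(root).as_posix()}")
    if not attr_file.exists() or not ts_dir.exists():
        return problems

    # Read every column as text so numeric-looking IDs keep leading zeros.
    attrs = pd.read_csv(attr_file, dtype=str)
    if "basin_id" not in attrs.columns:
        problems.append("attributes.csv has no basin_id column")
        return problems

    # Each basin_id should have exactly one CSV named after it, and vice versa.
    attr_ids = set(attrs["basin_id"])
    file_ids = {p.stem for p in ts_dir.glob("*.csv")}
    for basin_id in sorted(attr_ids - file_ids):
        problems.append(f"no time series file for basin {basin_id}")
    for basin_id in sorted(file_ids - attr_ids):
        problems.append(f"no attributes row for basin {basin_id}")
    return problems


# Demo on a throwaway dataset that is deliberately missing one basin file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_dataset"
    (root / "attributes").mkdir(parents=True)
    (root / "shapes").mkdir()
    (root / "timeseries" / "1D").mkdir(parents=True)

    pd.DataFrame(
        {"basin_id": ["basin_01", "basin_02"], "area": [120.5, 80.0]}
    ).to_csv(root / "attributes" / "attributes.csv", index=False)
    (root / "shapes" / "basins.shp").touch()  # empty stand-in, not a real shapefile
    pd.DataFrame(
        {
            "time": ["2020-01-01", "2020-01-02"],  # column name is an assumption
            "precipitation": [1.0, 0.0],
            "streamflow": [3.2, 3.0],
        }
    ).to_csv(root / "timeseries" / "1D" / "basin_01.csv", index=False)
    (root / "timeseries" / "1D_units_info.json").write_text(
        json.dumps({"precipitation": "mm/d", "streamflow": "m3/s"})
    )

    problems = check_layout(root)

print(problems)
```

In the demo, basin_02 appears in attributes.csv but has no CSV under timeseries/1D/, so the helper reports that single mismatch. The check only tests that basins.shp exists; it does not open the shapefile or compare its geometries against the basin IDs.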
2. Read the Data in Python
Once your data is organized, you can use SelfMadeHydroDataset to read it with just a few lines of code.
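To make the layout concrete, here is a minimal sketch of what reading it involves, written with plain pandas. It does not use SelfMadeHydroDataset or reproduce its API; for the actual class and method signatures, consult the API Reference. The function read_timeseries, the time column name, and the demo values are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def read_timeseries(dataset_dir, basin_ids, t_range, variables,
                    time_unit="1D", time_col="time"):
    """Load selected variables for several basins from the folder layout.

    Hypothetical pandas-only helper; `time_col` is an assumed column name.
    Returns (data, units): a DataFrame indexed by (basin_id, time) and a
    dict mapping each requested variable to its unit string.
    """
    ts_root = Path(dataset_dir) / "timeseries"
    all_units = json.loads((ts_root / f"{time_unit}_units_info.json").read_text())
    start, end = pd.Timestamp(t_range[0]), pd.Timestamp(t_range[1])

    frames = {}
    for basin_id in basin_ids:
        df = pd.read_csv(
            ts_root / time_unit / f"{basin_id}.csv",
            parse_dates=[time_col],
            index_col=time_col,
        ).sort_index()
        # Label-based slicing on a sorted DatetimeIndex includes both ends.
        frames[basin_id] = df.loc[start:end, variables]

    data = pd.concat(frames, names=["basin_id", time_col])
    units = {name: all_units.get(name) for name in variables}
    return data, units


# Demo on a throwaway dataset with two basins and three daily records each.
with tempfile.TemporaryDirectory() as tmp:
    ts_root = Path(tmp) / "my_dataset" / "timeseries"
    (ts_root / "1D").mkdir(parents=True)
    days = ["2020-01-01", "2020-01-02", "2020-01-03"]
    flows = {"basin_01": [3.2, 3.0, 2.9], "basin_02": [1.1, 1.0, 0.9]}
    for basin_id, flow in flows.items():
        pd.DataFrame(
            {"time": days, "precipitation": [1.0, 0.0, 2.5], "streamflow": flow}
        ).to_csv(ts_root / "1D" / f"{basin_id}.csv", index=False)
    (ts_root / "1D_units_info.json").write_text(
        json.dumps({"precipitation": "mm/d", "streamflow": "m3/s"})
    )

    data, units = read_timeseries(
        Path(tmp) / "my_dataset",
        basin_ids=["basin_01", "basin_02"],
        t_range=["2020-01-02", "2020-01-03"],
        variables=["streamflow"],
    )

print(data)
print(units)
```

The sketch keeps the units next to the data because the values in each CSV are only meaningful together with the entries in 1D_units_info.json.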
Other Features
Beyond reading data, hydrodatasource also includes modules for:
- processor: Perform advanced calculations such as identifying rainfall-runoff events (dmca_esr.py) and calculating basin-wide mean rainfall from station data (basin_mean_rainfall.py).
- cleaner: Clean raw time series data. This includes tools for smoothing noisy streamflow data, correcting anomalies in rainfall and water level records, and back-calculating reservoir inflow.
The usage of these modules is described in the API Reference. We will add more examples in the future.
Installation
For standard use, install the package from PyPI:

    pip install hydrodatasource
Development Setup
For developers, it is recommended to use uv to manage the environment, as this project has local dependencies (e.g., hydroutils, hydrodataset).
1. Clone the repository:

    git clone https://github.com/iHeadWater/hydrodatasource.git
    cd hydrodatasource

2. Sync the environment with uv. This command installs all dependencies, including the local editable packages:

    uv sync --all-extras