Segmented Power Law

Segmented Power Law#

There are several approaches to fitting a stage-discharge rating curve. This notebook demonstrates an simple way to approximate the classic approach, which uses a segmented power law.

# Uncomment below to setup Google Colab. It will take a minute or so.
# %%capture
# !pip install pymc==5.1.2
# %env MKL_THREADING_LAYER=GNU
# !pip install git+https://github.com/thodson-usgs/ratingcurve.git

%load_ext autoreload
%autoreload 2

import pymc as pm
import arviz as az
from ratingcurve.ratings import PowerLawRating

WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

Load Data#

Begin by loading the Green Channel dataset

from ratingcurve import data
df = data.load('green channel')
df.head()

	datetime	stage	q	q_sigma
0	2020-05-21 14:13:41 [UTC-07:00]	7.04	12199.342	199.172931
1	2020-04-16 14:55:31 [UTC-07:00]	4.43	4921.953	95.425619
2	2020-03-04 13:54:10 [UTC-07:00]	2.99	2331.665	61.860500
3	2020-03-04 13:16:51 [UTC-07:00]	2.94	2289.220	47.886745
4	2020-01-23 11:04:32 [UTC-07:00]	2.96	2408.210	99.522964

and plotting the observations.

ax = df.plot.scatter(x='q', y='stage', marker='o')
ax.set_xlabel("Discharge (cfs)")
ax.set_ylabel("Stage (ft)")

Text(0, 0.5, 'Stage (ft)')

../_images/47b98e7116a9193a4ed31a32fb890df46f667fce0f48b92c78aacf541c78b77f.png

Setup model#

Now, setup the rating model. This make take a minute the first time while the model compiles but will be faster on subsequent runs.

powerrating = PowerLawRating(segments=2,
                             prior={'distribution':'uniform'})

There are of variety of ways to adjust the optimization. Here we’ll use the defaults, which uses ADVI and runs for 200,000 iterations, though the model should coverge well before that.

trace = powerrating.fit(q=df['q'],
                        h=df['stage'], 
                        q_sigma=df['q_sigma'])

/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/rich/live.py:231: UserWarning: install 
"ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

Convergence achieved at 139500

Interrupted at 139,499 [69%]: Average Loss = 40.4

Sampling: [a, b, hs_, model_q, sigma]

Sampling: [model_q]

/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/rich/live.py:231: UserWarning: install 
"ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

Once fit, we can plot the rating curve.

powerrating.plot()

Sampling: [model_q]

/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/rich/live.py:231: UserWarning: install 
"ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

../_images/81f0b0cbf2eb43205cccb34d51d8ffdc2a6d921accf3b231800732951c583ab3.png

or as a table of stage-discharge values.

table = powerrating.table()
table.head()

Sampling: [model_q]

/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/rich/live.py:231: UserWarning: install 
"ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

	stage	discharge	median	gse
0	2.21	1384.055929	1383.631462	1.017378
1	2.22	1396.196507	1395.951473	1.017410
2	2.23	1408.403309	1408.008281	1.017098
3	2.24	1420.336842	1420.002411	1.017101
4	2.25	1432.380651	1431.898741	1.017345

Exercise#

What happens if we choose the wrong number of segments? Increase the number of segments by one and rerun the model. In fact, we can use this to select the correct number of segments, which is demonstrated in the model evaluation notebook.

Simulated Example#

This example uses a simulated rating curve, which allows you to test how changing the number of segments affects the rating curve.

First, load the ‘3-segment simulated’ tutorial dataset.

sim_df = data.load('3-segment simulated')

This rating contains observations of every 0.01 inch. increment in stage, which is much more than we’d have for a natural rating. Try sampling to n_sample=15 or n_sample=30 and see how that affects the model fit.

# subsample the simulated rating curve
n_sample = 30
df = sim_df.sample(n_sample, random_state=12345)

ax = sim_df.plot(x='q', y='stage', color='grey', ls='-', legend=False)
df.plot.scatter(x='q', y='stage', marker='o', color='blue', ax=ax)
ax.set_xlabel("Discharge (cfs)");
ax.set_ylabel("Stage (ft)");

../_images/b3c03efd2a84cec2f0e7aed836e73216fb6bc843985cb22dfd3ff90e4907375a.png

Setup a rating model with 3 segments

powerrating = PowerLawRating(segments=3,
                             prior={'distribution':'uniform'},
                             #prior={'distribution':'normal', 'mu':[5, 8, 11], 'sigma':[1, 1, 0.2]}
                            )

now fit the model using ADVI

trace = powerrating.fit(q=df['q'],
                        h=df['stage'],
                        q_sigma=None,
                        method='advi')

/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/rich/live.py:231: UserWarning: install 
"ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

Finished [100%]: Average Loss = -48.508

Sampling: [a, b, hs_, model_q, sigma]

Sampling: [model_q]

/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/rich/live.py:231: UserWarning: install 
"ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

and visualize the results.

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
powerrating.plot(ax=ax)

# plot the original data for comparison
sim_df.plot(x='q', y='stage', color='red', ls=':', legend=False, ax=ax)

Sampling: [model_q]

/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/site-packages/rich/live.py:231: UserWarning: install 
"ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

<Axes: xlabel='q', ylabel='Stage'>

../_images/bc84e102837532c4f23c08d497cdba3e7e6fa30bb45d8eae49647b8e65323a73.png

powerrating.summary(var_names=["b", "a", "sigma", "hs"])

arviz - WARNING - Shape validation failed: input_shape: (1, 10000), minimum_shape: (chains=2, draws=4)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
b[0]	0.984	0.002	0.980	0.988	0.000	0.000	9598.0	9163.0	NaN
b[1]	0.353	0.008	0.338	0.368	0.000	0.000	9792.0	9747.0	NaN
b[2]	0.049	0.056	-0.055	0.155	0.001	0.000	10085.0	9675.0	NaN
a	-1.007	0.003	-1.013	-1.002	0.000	0.000	9838.0	9719.0	NaN
sigma	0.015	0.002	0.012	0.019	0.000	0.000	9832.0	9679.0	NaN
hs[0, 0]	4.928	0.001	4.926	4.930	0.000	0.000	10334.0	10066.0	NaN
hs[1, 0]	10.545	0.032	10.486	10.607	0.000	0.000	9758.0	10008.0	NaN
hs[2, 0]	12.317	0.166	12.024	12.532	0.002	0.001	9858.0	9659.0	NaN

Fitting this model can be tricky. The most common issue is a poor initialization of the breakpoints. A fix is under development, but for now, try

reinitializing the model PowerLawRating();
increasing the number of iterations for the fitting algorithm fit(n=300_000);
a prior on the breakpoints, example, try prior={'distribution':'normal', 'mu':[5, 9.5, 10.5], 'sigma':[1, 1, 0.2]}), which implies we know the true breakpoint within +-0.5 ft; or
fitting the model with NUTS fit(method='nuts').

%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor,xarray

Last updated: Mon Jun 03 2024

Python implementation: CPython
Python version       : 3.11.9
IPython version      : 8.25.0

pytensor: 2.22.1
xarray  : 2024.5.0

matplotlib : 3.9.0
pymc       : 5.15.1
arviz      : 0.18.0
ratingcurve: 0.1.dev1+g54c5899

Watermark: 2.4.3