API reference¶
Auto-generated from the source docstrings.
esnfed.esn¶
Echo State Network (ESN).
An ESN is a recurrent neural network from the Reservoir Computing paradigm (Jaeger, 2001). A large, fixed, randomly connected recurrent layer -- the reservoir -- projects the input into a high-dimensional dynamical feature space; only a linear readout is trained, here by ridge regression. Training is therefore convex and fast, with no backpropagation through time.
State update (leaky-integrator neurons, Jaeger et al. 2007):
x(t) = (1 - a) x(t-1) + a * tanh( W_in [b; u(t)] + W x(t-1) )
Readout:
y(t) = W_out [b; u(t); x(t)]
where a is the leaking rate, b a bias constant, W the (spectral-radius scaled) reservoir matrix and W_in the input matrix. W_out is the only trained quantity.
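The two equations above can be sketched in a few lines of NumPy. This is only an illustration of the update rule, not the library's implementation: the helper names esn_step and extended_state, the weight ranges, and the reservoir size are all assumptions made for the example.

```python
import numpy as np

def esn_step(x_prev, u_t, W, W_in, a=0.3, b=1.0):
    """x(t) = (1 - a) x(t-1) + a * tanh(W_in [b; u(t)] + W x(t-1))."""
    bu = np.concatenate(([b], u_t))  # [b; u(t)]
    return (1.0 - a) * x_prev + a * np.tanh(W_in @ bu + W @ x_prev)

def extended_state(x_t, u_t, b=1.0):
    """[b; u(t); x(t)], the vector that the readout W_out multiplies."""
    return np.concatenate(([b], u_t, x_t))

rng = np.random.default_rng(0)
n_res, n_in = 20, 1  # hypothetical sizes
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9
W_in = rng.uniform(-1.0, 1.0, (n_res, 1 + n_in))

u = rng.uniform(0.0, 0.5, (50, n_in))
x = np.zeros(n_res)  # zero initial state
for u_t in u:
    x = esn_step(x, u_t, W, W_in)
z = extended_state(x, u[-1])  # length 1 + n_in + n_res
```

Because each update is a convex combination of the previous state and a tanh output, every state component stays strictly inside (-1, 1) when starting from zeros.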
Performance
The state-harvesting loop is the hot path. Three optional accelerations are available and all keep the public API and results unchanged:
- Numba -- if installed (pip install "esnfed[fast]"), the dense float64 harvest loop is JIT-compiled to native speed automatically; otherwise a pure-NumPy fallback is used.
- Sparse reservoirs (sparse=True) -- store W as a SciPy CSR matrix, turning the per-step matrix-vector product from O(N^2) into O(edges); a large win for big, sparse reservoirs.
- float32 (dtype=np.float32) -- halves memory traffic for a modest speed gain on memory-bound workloads.
EchoStateNetwork dataclass ¶
EchoStateNetwork(n_inputs: int, n_outputs: int, reservoir: ndarray, spectral_radius: float = 0.9, input_scaling: float = 1.0, leaking_rate: object = 1.0, activation: object = 'tanh', ridge: float = 1e-06, washout: int = 100, bias: float = 1.0, seed: int | None = None, input_weights: ndarray | None = None, dtype: object = np.float64, sparse: bool = False, use_numba: bool | None = None)
A leaky-integrator Echo State Network with a ridge-regression readout.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_inputs | int | Input dimensionality. | required |
| n_outputs | int | Output dimensionality. | required |
| reservoir | ndarray | Square reservoir weight matrix | required |
| spectral_radius | float | Target largest absolute eigenvalue of | 0.9 |
| input_scaling | float | Scaling applied to the random input weights. | 1.0 |
| leaking_rate | object | Leaky-integrator rate | 1.0 |
| activation | object | Node nonlinearity. | 'tanh' |
| ridge | float | Tikhonov (ridge) regularisation strength for the readout. | 1e-06 |
| washout | int | Number of initial timesteps discarded when harvesting states, to remove the dependence on the (zero) initial state. | 100 |
| bias | float | Constant bias fed to the reservoir and readout. | 1.0 |
| seed | int \| None | Seed for the input-weight RNG (the reservoir itself is supplied). | None |
| input_weights | ndarray \| None | Optional caller-supplied input matrix of shape | None |
| dtype | object | Floating-point type for the reservoir and states ( | float64 |
| sparse | bool | If true, store the reservoir as a SciPy CSR matrix (needs SciPy); the per-step matvec then costs O(edges) instead of O(N^2). Best for large, low-density reservoirs. | False |
| use_numba | bool \| None | | None |
harvest ¶
Run the reservoir over inputs u and return extended states.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| u | ndarray | Input array of shape (T, n_inputs). | required |
| x0 | ndarray \| None | Optional initial reservoir state (defaults to zeros). | None |
Returns:
| Type | Description |
|---|---|
| Z | Extended-state matrix of shape (T, 1 + n_inputs + n_reservoir), each row being |
Source code in esnfed/esn.py
fit ¶
Train the readout by ridge regression on a single sequence.
Source code in esnfed/esn.py
predict ¶
Predict outputs for inputs u using the trained readout.
Source code in esnfed/esn.py
local_statistics ¶
Return the ridge sufficient statistics (A, B) for local data.
A = Z^T Z and B = Z^T Y over the post-washout extended states. Summing these across clients and solving once yields exactly the readout that pooled training would produce -- the basis of exact federated ridge regression.
Source code in esnfed/esn.py
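The claim that summed statistics reproduce pooled training can be checked numerically. The sketch below uses random matrices in place of harvested states; the helper names ridge_stats and solve_readout are chosen for the example and are not taken from the library. The readout here is stored as (D, n_outputs), i.e. the transpose of the W_out used in y(t) = W_out [b; u(t); x(t)].

```python
import numpy as np

def ridge_stats(Z, Y):
    """Sufficient statistics A = Z^T Z and B = Z^T Y."""
    return Z.T @ Z, Z.T @ Y

def solve_readout(A, B, ridge=1e-6):
    """Solve (A + ridge * I) W = B; W has shape (D, n_outputs)."""
    return np.linalg.solve(A + ridge * np.eye(A.shape[0]), B)

rng = np.random.default_rng(1)
# Three clients, each with 40 stand-in "extended states" of dimension 5.
clients = [(rng.normal(size=(40, 5)), rng.normal(size=(40, 1))) for _ in range(3)]

# Federated: each client sends only (A_k, B_k); the server sums and solves once.
stats = [ridge_stats(Z, Y) for Z, Y in clients]
A_sum = sum(A for A, _ in stats)
B_sum = sum(B for _, B in stats)
W_fed = solve_readout(A_sum, B_sum)

# Pooled: stack all raw data and solve (what federation avoids doing).
Z_all = np.vstack([Z for Z, _ in clients])
Y_all = np.vstack([Y for _, Y in clients])
W_pooled = solve_readout(*ridge_stats(Z_all, Y_all))
```

The two readouts agree up to floating-point round-off because Z_all^T Z_all is exactly the sum of the per-client Gram matrices.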
ridge_statistics ¶
Return sufficient statistics A = Z^T Z and B = Z^T Y.
The Gram matrix is accumulated in float64 even when the states are float32: the harvest keeps the float32 speed/memory benefit, while the (small, ill-conditioned) ridge solve stays numerically stable.
Source code in esnfed/esn.py
solve_readout ¶
Solve (A + ridge * I) W_out = B for the readout weights.
esnfed.deep¶
Hierarchical (deep) Echo State Networks.
A deep ESN stacks several reservoir layers: the states of layer l are the input of layer l+1 (Gallicchio, Micheli & Pedrelli, 2017). Giving each layer its own properties --- size, spectral radius, leaking rate / time-scale, topology or node nonlinearity --- builds a hierarchy of progressively slower, more abstract dynamics, which markedly increases memory and nonlinear capacity over a single-layer reservoir of the same total size.
Only a single linear readout is trained, on the concatenation of all layers' states, so training stays a convex ridge regression and the exact federated scheme of esnfed.federated applies unchanged (a DeepEchoStateNetwork is a drop-in for esnfed.esn.EchoStateNetwork in a federated esnfed.federated.Client).
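As a rough illustration of the stacking described above, the following sketch feeds the states of one layer into the next and concatenates [bias, u(t), x^(1)(t), x^(2)(t)]. Layer sizes, leaking rates, and the helper names (scaled_reservoir, run_layer, deep_harvest) are assumptions for the example, not the library's code.

```python
import numpy as np

def scaled_reservoir(n, rho, rng):
    """Random dense matrix rescaled to spectral radius rho."""
    W = rng.uniform(-0.5, 0.5, (n, n))
    return W * (rho / np.max(np.abs(np.linalg.eigvals(W))))

def run_layer(inputs, W, W_in, a, b=1.0):
    """Leaky-integrator states of one layer for a (T, d) input sequence."""
    x = np.zeros(W.shape[0])
    states = np.empty((inputs.shape[0], W.shape[0]))
    for t, u_t in enumerate(inputs):
        x = (1.0 - a) * x + a * np.tanh(W_in @ np.concatenate(([b], u_t)) + W @ x)
        states[t] = x
    return states

def deep_harvest(u, layers, b=1.0):
    """Concatenate bias, input, and the states of every layer."""
    feats = [np.full((u.shape[0], 1), b), u]
    layer_input = u
    for W, W_in, a in layers:
        layer_input = run_layer(layer_input, W, W_in, a, b)  # feeds the next layer
        feats.append(layer_input)
    return np.hstack(feats)

rng = np.random.default_rng(0)
sizes, rates, n_in = [10, 8], [0.9, 0.3], 1  # faster first layer, slower second
layers, d = [], n_in
for n, a in zip(sizes, rates):
    layers.append((scaled_reservoir(n, 0.9, rng), rng.uniform(-1, 1, (n, 1 + d)), a))
    d = n  # the next layer's input is this layer's state

u = rng.uniform(0.0, 0.5, (30, n_in))
Z = deep_harvest(u, layers)  # shape (30, 1 + 1 + 10 + 8)
```

A single ridge readout trained on Z then sees every time-scale in the stack at once.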
DeepEchoStateNetwork dataclass ¶
DeepEchoStateNetwork(n_inputs: int, n_outputs: int, reservoirs: list, spectral_radius: object = 0.9, leaking_rate: object = 0.7, activation: object = 'tanh', input_scaling: object = 1.0, ridge: float = 1e-06, washout: int = 100, bias: float = 1.0, seed: int | None = None)
A stack of reservoir layers with a single ridge readout over all states.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_inputs | int | Input dimensionality. | required |
| n_outputs | int | Output dimensionality. | required |
| reservoirs | list | List of square reservoir matrices, one per layer (e.g. from :mod: | required |
| spectral_radius | object | Per-layer hyper-parameters. Each may be a single value (shared by all layers) or a list with one entry per layer. Notably | 0.9 |
| leaking_rate | object | Per-layer hyper-parameters. Each may be a single value (shared by all layers) or a list with one entry per layer. Notably | 0.7 |
| activation | object | Per-layer hyper-parameters. Each may be a single value (shared by all layers) or a list with one entry per layer. Notably | 'tanh' |
| input_scaling | object | Per-layer hyper-parameters. Each may be a single value (shared by all layers) or a list with one entry per layer. Notably | 1.0 |
| ridge | float | As in :class: | 1e-06 |
| washout | int | As in :class: | 100 |
| bias | float | As in :class: | 1.0 |
| seed | int \| None | As in :class: | None |
harvest ¶
Run the stack and return [bias, u(t), x^(1)(t), ..., x^(L)(t)].
Source code in esnfed/deep.py
esnfed.topologies¶
Reservoir topology generators.
The recurrent connectivity of an ESN reservoir can be modelled as a directed graph. The choice of graph model shapes the reservoir dynamics and is the central experimental variable of this project. Each generator returns a dense n x n weight matrix whose non-zero entries are drawn from a symmetric uniform distribution; the Echo State Network rescales the matrix to the desired spectral radius afterwards.
Topologies
- random -- Erdos-Renyi G(n, p): each directed edge present with prob. p.
- small_world -- Watts-Strogatz ring lattice with random rewiring.
- scale_free -- Barabasi-Albert preferential attachment (degree power law).
- ring -- Simple deterministic uni-directional ring (delay line).
All functions accept a numpy.random.Generator (or an int seed) so that experiments are fully reproducible.
random_reservoir ¶
Erdos-Renyi reservoir: each directed edge present with prob. density.
Source code in esnfed/topologies.py
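A minimal sketch of an Erdos-Renyi reservoir, followed by the spectral-radius rescaling that the ESN is described as applying afterwards. The function names and the weight range [-1, 1] are assumptions for illustration; the library's own generator may differ in detail.

```python
import numpy as np

def er_reservoir(n, density, rng):
    """Dense n x n matrix; each directed edge is present with prob. density."""
    mask = rng.random((n, n)) < density
    weights = rng.uniform(-1.0, 1.0, (n, n))  # symmetric uniform weights
    return np.where(mask, weights, 0.0)

def rescale_spectral_radius(W, rho):
    """Scale W so that its largest absolute eigenvalue equals rho."""
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (rho / radius)

rng = np.random.default_rng(0)
W_raw = er_reservoir(100, 0.1, rng)
W = rescale_spectral_radius(W_raw, 0.9)
edge_fraction = np.count_nonzero(W) / W.size  # close to the requested density
```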
small_world_reservoir ¶
Watts-Strogatz small-world reservoir.
A ring lattice where each node connects to its k nearest neighbours, with each edge rewired with probability p. Small-world reservoirs combine high clustering with short path lengths.
Source code in esnfed/topologies.py
scale_free_reservoir ¶
Barabasi-Albert scale-free reservoir.
Growth with preferential attachment; each new node attaches to m existing nodes. Produces a power-law degree distribution with a few high-degree hubs.
Source code in esnfed/topologies.py
ring_reservoir ¶
Deterministic uni-directional ring (a.k.a. simple cycle reservoir).
Each node feeds the next; node n-1 feeds node 0. Despite its simplicity this minimal-complexity reservoir is competitive on many tasks (Rodan & Tino, 2011).
Source code in esnfed/topologies.py
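The ring topology is simple enough to write down directly. In this sketch a single constant weight w is used on every edge, which is an assumption for the example; with the convention x(t) = W x(t-1), the entry W[j, i] carries the connection from node i to node j.

```python
import numpy as np

def ring_matrix(n, w=0.9):
    """Uni-directional ring: node i feeds node (i + 1) mod n with weight w."""
    # Rolling the identity down by one row puts a 1 at position [(i + 1) % n, i].
    return w * np.roll(np.eye(n), 1, axis=0)

W_ring = ring_matrix(8, w=0.9)
radius = np.max(np.abs(np.linalg.eigvals(W_ring)))
```

All eigenvalues of a scaled cyclic permutation lie on a circle of radius |w|, so the spectral radius of this matrix is set directly by the edge weight.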
make_reservoir ¶
Dispatch to a named generator from GENERATORS.
Source code in esnfed/topologies.py
graph_metrics ¶
Return basic graph descriptors of a reservoir weight matrix.
Useful to characterise structural heterogeneity across federated nodes.
Source code in esnfed/topologies.py
leaking_rates ¶
Per-node leaking rates a for a heterogeneous / multi-scale reservoir.
Different neurons integrate at different speeds, which helps tasks with mixed time-scales (finance, chaotic systems such as Mackey-Glass).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n | | Number of reservoir nodes. | required |
| kind | | | 'uniform' |
| low | | Range of leaking rates (each in | 0.1 |
| high | | Range of leaking rates (each in | 0.1 |
| n_layers | | Number of decreasing blocks for | 3 |
| rng | | Seed or | None |
Returns:
| Name | Type | Description |
|---|---|---|
| a | ndarray of shape (n,) | Per-node leaking rates, ready to pass as |
Source code in esnfed/topologies.py
mixed_activations ¶
Assign a node nonlinearity to each reservoir node (multi-type reservoir).
Mixing activation functions within one reservoir mimics biological neuronal diversity and broadens the basis of dynamics. Returns an array of activation names ready to pass as EchoStateNetwork(activation=...).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n | | Number of reservoir nodes. | required |
| types | | Activation names to mix (see :data: | ('tanh', 'sigmoid', 'sin') |
| weights | | Optional mixing proportions (defaults to uniform). | None |
| rng | | Seed or | None |
Source code in esnfed/topologies.py
esnfed.datasets¶
Benchmark sequential tasks for Echo State Networks.
These are the standard tasks used to evaluate reservoir computing systems.
- NARMA-10 -- A non-linear auto-regressive moving-average system with a 10-step memory (Atiya & Parlos, 2000). The task is to reproduce y from the random input u; it stresses both memory and non-linearity.
- Mackey-Glass -- A delay differential equation (Mackey & Glass, 1977) that is mildly chaotic for tau = 17. The task is one-step-ahead prediction.
- Lorenz -- The classic 3-variable chaotic attractor (Lorenz, 1963); we predict the next x-coordinate.
- Real-world finance -- from_array / load_csv turn any 1-D series into a forecasting task; load_ted_spread loads a bundled real counterparty-risk series (the TED spread); load_fred downloads any series from FRED on demand. These power the federated counterparty-risk use case: institutions that cannot share raw data jointly forecast a credit/counterparty-risk signal.
All generators return (u, y) as float arrays of shape (T, 1).
SequenceDataset ¶
Bases: NamedTuple
A labelled sequence-classification dataset with a natural client split.
X_* are lists (or 3-D arrays) of per-sequence arrays (T_i, n_features); groups_* give the natural federation unit (speaker / subject id).
narma10 ¶
Generate a NARMA-10 input/target sequence of length n.
Source code in esnfed/datasets.py
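For orientation, here is a sketch of the NARMA-10 recurrence in the form most commonly used in the reservoir computing literature, with input drawn uniformly from [0, 0.5]. Whether the library uses exactly these coefficients, this input range, or this indexing is an assumption; the function name narma10_sketch is hypothetical.

```python
import numpy as np

def narma10_sketch(n, seed=0):
    """y(t+1) = 0.3 y(t) + 0.05 y(t) sum_{i=0..9} y(t-i) + 1.5 u(t-9) u(t) + 0.1"""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, n)
    y = np.zeros(n)  # the first 10 targets stay at zero
    for t in range(9, n - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * np.sum(y[t - 9:t + 1])  # 10-step memory term
                    + 1.5 * u[t - 9] * u[t]                 # non-linear input term
                    + 0.1)
    return u.reshape(-1, 1), y.reshape(-1, 1)

u, y = narma10_sketch(200, seed=0)
```

The product u(t-9) u(t) is what forces a model to remember inputs ten steps back, which is why the task stresses memory and non-linearity together.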
mackey_glass ¶
mackey_glass(n: int, tau: int = 17, beta: float = 0.2, gamma: float = 0.1, power: int = 10, dt: float = 1.0, seed: int | None = None, discard: int = 250) -> tuple[np.ndarray, np.ndarray]
Generate a Mackey-Glass series; return (x_t, x_{t+1}) for prediction.
Source code in esnfed/datasets.py
lorenz ¶
lorenz(n: int, dt: float = 0.02, sigma: float = 10.0, rho: float = 28.0, beta: float = 8.0 / 3.0, seed: int | None = None, discard: int = 500) -> tuple[np.ndarray, np.ndarray]
Generate the Lorenz attractor; return (x_t, x_{t+1}) for prediction.
Source code in esnfed/datasets.py
split ¶
split(u: ndarray, y: ndarray, train_frac: float = 0.7) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]
Chronological train/test split (no shuffling, to preserve dynamics).
Source code in esnfed/datasets.py
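A chronological split simply cuts the sequence at one index. The sketch below assumes the return order (u_train, y_train, u_test, y_test), which the signature above does not spell out; chrono_split is a hypothetical name.

```python
import numpy as np

def chrono_split(u, y, train_frac=0.7):
    """Cut at int(T * train_frac); earlier samples train, later samples test."""
    n_train = int(len(u) * train_frac)
    return u[:n_train], y[:n_train], u[n_train:], y[n_train:]

u = np.arange(8, dtype=float).reshape(-1, 1)
y = u + 1.0
u_tr, y_tr, u_te, y_te = chrono_split(u, y, train_frac=0.75)
```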
partition_iid ¶
partition_iid(u: ndarray, y: ndarray, n_clients: int, rng=None) -> list[tuple[np.ndarray, np.ndarray]]
Split a sequence into n_clients contiguous blocks (federated clients).
Contiguous blocks (rather than shuffled samples) keep each client's slice a valid time series, which is required for reservoir state harvesting.
Source code in esnfed/datasets.py
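One way to obtain contiguous client blocks is np.array_split, sketched below. This illustrates the idea only; the library's own handling of uneven lengths and of the rng argument is not shown in the source, and partition_contiguous is a hypothetical name.

```python
import numpy as np

def partition_contiguous(u, y, n_clients):
    """Split (u, y) into n_clients consecutive, non-overlapping blocks."""
    return list(zip(np.array_split(u, n_clients), np.array_split(y, n_clients)))

u = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * u
parts = partition_contiguous(u, y, 3)
sizes = [len(u_k) for u_k, _ in parts]
```

Each block keeps its samples in time order, so a client can run its reservoir over its own slice without seeing gaps inside it.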
from_array ¶
from_array(series, *, predict: str = 'next', normalize: bool = True) -> tuple[np.ndarray, np.ndarray]
Turn a 1-D real series into a one-step-ahead forecasting task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| series | | 1-D sequence of observations (list or array). | required |
| predict | str | | 'next' |
| normalize | bool | If true, z-score the series (recommended for tanh reservoirs). | True |
Returns:
| Type | Description |
|---|---|
| (u, y) | Input/target arrays of shape (T-1, 1). |
Source code in esnfed/datasets.py
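The one-step-ahead construction amounts to a z-score followed by a shift by one sample. The sketch below covers only that case; the other values accepted by predict are not documented above and are not modelled here, and one_step_task is a hypothetical name.

```python
import numpy as np

def one_step_task(series, normalize=True):
    """Return (u, y) with u = s[:-1] and y = s[1:], each of shape (T-1, 1)."""
    s = np.asarray(series, dtype=float)
    if normalize:
        s = (s - s.mean()) / s.std()  # z-score
    return s[:-1].reshape(-1, 1), s[1:].reshape(-1, 1)

u, y = one_step_task([1.0, 2.0, 4.0, 8.0, 16.0])
```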
load_csv ¶
load_csv(path, column=None, *, predict: str = 'next', normalize: bool = True) -> tuple[np.ndarray, np.ndarray]
Load a column from a CSV file and build a forecasting task.
column may be a header name, an integer index, or None (last column). Missing markers ('', '.', 'NA', 'NaN', 'null') are dropped.
Source code in esnfed/datasets.py
load_ted_spread ¶
Load the bundled TED spread series (interbank/counterparty-risk measure).
The TED spread is the difference between the 3-month interbank rate and the 3-month Treasury bill; it is a classic gauge of perceived counterparty/credit risk in the banking system. Daily, 1986--2022.
Source: Federal Reserve Bank of St. Louis, FRED, series TEDRATE.
With raw=True returns the unprocessed 1-D level array instead of (u, y).
Source code in esnfed/datasets.py
load_fred ¶
load_fred(series_id: str, *, predict: str = 'next', normalize: bool = True, cache_dir=None, timeout: float = 30.0, raw: bool = False)
Download a series from FRED (no API key) and build a forecasting task.
Useful counterparty/credit-risk series include TEDRATE (TED spread), BAMLH0A0HYM2 (US high-yield credit spread), BAA10Y (Baa credit spread) and STLFSI4 / NFCI (financial stress/conditions). The CSV is cached on disk so repeated calls are offline.
Data are retrieved from FRED (Federal Reserve Bank of St. Louis) and are subject to FRED's terms of use.
Source code in esnfed/datasets.py
load_fred_matrix ¶
load_fred_matrix(series_ids, *, target=None, predict: str = 'next', normalize: bool = True, cache_dir=None, timeout: float = 30.0, raw: bool = False)
Download several FRED series and align them into a multivariate task.
The series are intersected on their common dates into a matrix (T, d); the task is to forecast the next-step value of the target series (default: the first) from the full d-dimensional vector. This exercises high-dimensional multivariate scaling --- e.g. forecasting counterparty risk (TEDRATE) from a panel such as ["TEDRATE", "VIXCLS", "DGS10", "DFF"].
Returns (u, y) with u of shape (T-1, d) and y of shape (T-1, 1) (or the raw (T, d) matrix if raw=True).
Source code in esnfed/datasets.py
load_japanese_vowels ¶
Japanese Vowels (UCI 128): 9 male speakers uttering /ae/, encoded as 12-dimensional LPC cepstra, with variable-length utterances (7--29 frames). The task is speaker identification (9 classes). The natural federation unit is the speaker --- which is also the label, giving an extreme label-skew split in which local-only training cannot work and federation is essential.
Source code in esnfed/datasets.py
load_har ¶
Human Activity Recognition Using Smartphones (UCI 240): 30 subjects, 6 activities, raw 3-axial accelerometer + gyroscope signals windowed into 128-timestep, 9-channel sequences at 50 Hz. The natural federation unit is the subject (groups): every subject performs all activities, so the split is feature-non-i.i.d. (people move differently) without label skew --- ideal for comparing the prediction ensemble against parameter aggregation.
The ~60 MB archive is downloaded once and the parsed arrays are cached as a compressed .npz for fast re-loading.
Source code in esnfed/datasets.py
group_clients ¶
Split labelled sequences into per-group client datasets.
Returns a list of (sequences, labels) pairs, one per unique value in groups (e.g. one client per speaker or per subject).
Source code in esnfed/datasets.py
esnfed.metrics¶
Error metrics for reservoir computing tasks.
mse ¶
rmse ¶
nrmse ¶
Normalised RMSE, scaled by the standard deviation of the target.
NRMSE = 1 corresponds to a trivial predictor outputting the target mean. This is the standard figure of merit for NARMA and Mackey-Glass tasks.
Source code in esnfed/metrics.py
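The statement that NRMSE = 1 for a mean predictor follows directly from the definition, as this sketch shows. It assumes the population standard deviation (NumPy's default) as the normaliser; nrmse_sketch is a name chosen for the example.

```python
import numpy as np

def nrmse_sketch(y_true, y_pred):
    """RMSE divided by the standard deviation of the target."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.std(y_true)

rng = np.random.default_rng(0)
y_true = rng.normal(size=(100, 1))
mean_predictor = np.full_like(y_true, y_true.mean())

score_mean = nrmse_sketch(y_true, mean_predictor)  # trivial baseline
score_perfect = nrmse_sketch(y_true, y_true)       # perfect prediction
```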
r2_score ¶
Coefficient of determination R^2.
Source code in esnfed/metrics.py
esnfed.federated¶
Federated learning strategies for Echo State Networks.
In an ESN only the linear readout is trained, so the federated problem reduces to learning a shared (or combinable) readout across clients whose reservoirs may or may not share the same structure.
Strategies implemented
- train_centralized -- Pool all client data and fit one readout (the privacy-violating upper bound).
- train_local -- Each client fits its own readout on its own data only (no collaboration).
- federated_ridge -- Exact federated training for a shared reservoir: clients exchange the ridge sufficient statistics A = Z^T Z and B = Z^T Y; the server sums them and solves once. Mathematically identical to pooled training, but no raw data leaves a client.
- fedavg -- Iterative FedAvg (McMahan et al., 2017) on the readout for a shared reservoir: clients run local gradient steps and the server averages weights each round. Produces an accuracy-vs-rounds curve.
- ensemble_predict -- For heterogeneous reservoirs: each client keeps its own ESN and the server averages their predictions (an ensemble), so no parameter averaging is needed.
- structural_alignment -- Interpolates heterogeneous reservoirs toward a shared target structure; at full alignment the readouts become averageable and exact federated ridge applies. Sweeps the alignment level to expose the transition.
Client dataclass ¶
A federated client: an ESN plus a local time-series partition.
states ¶
Harvest (and cache) post-washout extended states for local data.
train_centralized ¶
Fit one readout on the concatenation of all client states (upper bound).
Source code in esnfed/federated.py
train_local ¶
Each client fits its own readout on its own data only.
Source code in esnfed/federated.py
federated_ridge ¶
Exact federated readout via summed sufficient statistics (shared reservoir).
Returns the shared W_out. Clients transmit only A_k and B_k (whose dimensions do not depend on the dataset size and which contain no individual samples), never raw data. The result equals train_centralized.
Source code in esnfed/federated.py
federated_ridge_dp ¶
Differentially private federated ridge (shared reservoir).
Each client privatises its own statistics with the Gaussian mechanism (clip + noise, see esnfed.privacy.dp_statistics) before they are summed and solved once. The readout is then (epsilon, delta)-DP w.r.t. every client's records. Independent noise is drawn per client from cfg.seed. Unlike federated_ridge this is not exact -- the clipping and noise are the price of the formal privacy guarantee. The noisy, ill-conditioned Gram is solved with a ridge augmented by the spectral scale of the injected noise, so the readout stays well-posed at any budget.
Source code in esnfed/federated.py
federated_ridge_secure ¶
federated_ridge_secure(clients: list[Client], esn: EchoStateNetwork, *, seed: int | None = None, mask_scale: float = 1.0) -> np.ndarray
Exact federated ridge via secure aggregation (additive masking).
Clients mask their (A_k, B_k) with pairwise-cancelling noise (see esnfed.privacy.secure_sum), so the server obtains only the masked sum and never an individual client's statistics. The masks cancel, so the solved readout equals federated_ridge up to floating-point round-off (it is exact in the fixed-point/modular arithmetic of a real protocol).
Source code in esnfed/federated.py
fedavg ¶
fedavg(clients: list[Client], esn: EchoStateNetwork, u_test: ndarray, y_test: ndarray, *, rounds: int = 30, local_epochs: int = 5, lr: float = 1.0) -> tuple[np.ndarray, list[float]]
Iterative FedAvg on the readout for a shared reservoir.
lr is a normalised step in (0, 2); see _local_gradient_step. Returns the final W_out and the list of global test NRMSE values, one per communication round.
Source code in esnfed/federated.py
ensemble_predict ¶
ensemble_predict(clients: list[Client], u_test: ndarray, weights: ndarray | None = None) -> np.ndarray
Average the predictions of locally-trained, heterogeneous client ESNs.
Each client must already have a trained readout (see train_local). No parameter averaging is performed, so the clients may have completely different reservoir structures and input weights.
Source code in esnfed/federated.py
interpolate_reservoir ¶
Blend a client reservoir toward a shared target: (1-a)W_local + a W*.
structural_alignment ¶
structural_alignment(local_reservoirs: list[ndarray], target_reservoir: ndarray, partitions: list[tuple[ndarray, ndarray]], u_test: ndarray, y_test: ndarray, *, alphas=np.linspace(0.0, 1.0, 11), esn_kwargs: dict | None = None, shared_input_seed: int = 0) -> list[dict]
Sweep alignment level and report ensemble vs. parameter-averaging error.
At each alpha every client's reservoir is blended toward the shared target. Two readouts are evaluated on the global test set:
- ensemble -- clients keep individual readouts, predictions averaged;
- fedavg/ridge -- exact federated ridge over the (now more similar) reservoirs, valid in the limit alpha = 1 when structures coincide.
Returns one record per alpha with both test NRMSEs and the mean pairwise reservoir distance (a measure of remaining heterogeneity).
Source code in esnfed/federated.py
make_shared_clients ¶
make_shared_clients(reservoir: ndarray, partitions: list[tuple[ndarray, ndarray]], *, input_seed: int = 0, esn_kwargs: dict | None = None) -> tuple[list[Client], EchoStateNetwork]
Build clients that all share one reservoir and input weights (homogeneous).
Source code in esnfed/federated.py
make_heterogeneous_clients ¶
make_heterogeneous_clients(reservoirs: list[ndarray], partitions: list[tuple[ndarray, ndarray]], *, esn_kwargs: dict | None = None) -> list[Client]
Build clients each with its own reservoir and input weights (heterogeneous).
Source code in esnfed/federated.py
federated_prompt_average ¶
FedAvg of FedResPrompt controllers (the prompt-specific counterpart of federated_ridge's statistics sharing).
Averages each client's readout W_out and bottleneck projection P and broadcasts the means back, so the edge devices converge on a shared prompt controller while keeping their data local. Clients are duck-typed: each must expose W_out and projection.P (see esnfed.llm_orchestration.EdgeClient).
Source code in esnfed/federated.py
esnfed.classification¶
Sequence classification with Echo State Networks, and its federated variants.
For classification each input is a (possibly variable-length) sequence that maps to a single class label. The reservoir turns each sequence into a fixed feature vector (the mean or last extended state); a ridge readout on one-hot targets then classifies by argmax. Because the readout is again a ridge regression, the exact federated scheme of esnfed.federated carries over unchanged: clients exchange the summed statistics A = F^T F and B = F^T Y and the server solves once, recovering the pooled classifier exactly.
reservoir_features ¶
Map each sequence to a fixed feature vector via the reservoir.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequences | | Iterable of arrays of shape | required |
| pool | str | | 'mean' |
Source code in esnfed/classification.py
class_statistics ¶
Ridge sufficient statistics (A, B) for a set of labelled sequences.
train_classifier ¶
Centralised classifier readout (ridge on reservoir features).
Source code in esnfed/classification.py
federated_classifier ¶
Exact federated classifier: sum each client's (A_k, B_k) and solve once.
client_data is a list of (sequences, labels) pairs. The result equals centralised training on the pooled data, but no client shares raw sequences.
Source code in esnfed/classification.py
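The equivalence between the federated and the pooled classifier can be sketched end to end with a small stand-in reservoir. Everything here is illustrative: the reservoir size, the ridge value, the use of leaking rate 1, and the helper names (mean_feature, class_stats, solve) are assumptions, and the labels are arbitrary, so only the exactness of the aggregation is demonstrated, not accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_feat, n_classes = 8, 2, 3
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-1.0, 1.0, (n_res, 1 + n_feat))

def mean_feature(seq, b=1.0):
    """Mean extended state [b; u(t); x(t)] over one variable-length sequence."""
    x = np.zeros(n_res)
    rows = []
    for u_t in seq:
        x = np.tanh(W_in @ np.concatenate(([b], u_t)) + W @ x)
        rows.append(np.concatenate(([b], u_t, x)))
    return np.mean(rows, axis=0)

def class_stats(seqs, labels):
    """A = F^T F and B = F^T Y with one-hot targets Y."""
    F = np.array([mean_feature(s) for s in seqs])
    Y = np.eye(n_classes)[labels]
    return F.T @ F, F.T @ Y

def solve(A, B, ridge=1e-3):
    return np.linalg.solve(A + ridge * np.eye(A.shape[0]), B)

seqs = [rng.normal(size=(int(rng.integers(7, 30)), n_feat)) for _ in range(30)]
labels = np.arange(30) % n_classes
client_data = [(seqs[:10], labels[:10]), (seqs[10:], labels[10:])]

stats = [class_stats(s, l) for s, l in client_data]
W_fed = solve(sum(A for A, _ in stats), sum(B for _, B in stats))
W_pooled = solve(*class_stats(seqs, labels))

features = np.array([mean_feature(s) for s in seqs])
pred = np.argmax(features @ W_fed, axis=1)  # predicted class per sequence
```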
predict_labels ¶
Predict class labels (argmax of the readout) for each sequence.
predict_proba ¶
Softmax class probabilities for each sequence.
Source code in esnfed/classification.py
ensemble_classify ¶
Average the class probabilities of heterogeneous member classifiers.
members is a list of (esn, W_out) pairs (each may have its own reservoir); predictions are combined by averaging softmax probabilities.
Source code in esnfed/classification.py
esnfed.privacy¶
Privacy-preserving aggregation for exact federated ridge.
Exact federated ridge (esnfed.federated) already keeps raw samples on the device: clients exchange only the sufficient statistics A = Z^T Z and B = Z^T Y. Those summed second moments still encode information about individual records, so this module hardens the exchange in two complementary ways, both NumPy-only:
- Differential privacy (dp_statistics) -- output privacy. Each client clips its per-record contribution and adds calibrated Gaussian noise to (A, B), giving a formal (epsilon, delta) guarantee on the statistics it releases (Dwork & Roth, 2014). Even the released statistics cannot be used to single out a record.
- Secure aggregation (secure_sum) -- aggregation privacy. Clients add pairwise-cancelling random masks, so the server learns only the sum sum_k (A_k, B_k) and never any individual client's statistics (Bonawitz et al., 2017).
The two compose: a client can privatise its statistics and then mask them. See esnfed.federated.federated_ridge_dp and esnfed.federated.federated_ridge_secure for the client-level wrappers.
PrivacyConfig dataclass ¶
PrivacyConfig(epsilon: float, delta: float = 1e-05, clip_state: float = 1.0, clip_target: float = 1.0, seed: int | None = None)
(epsilon, delta)-DP configuration for releasing (A, B).
clip_state (C_z) and clip_target (C_y) bound the per-record L2 norms of the extended state and the target; this fixes the mechanism's sensitivity. Smaller clips mean less noise but more bias.
sensitivity ¶
Joint L2 (Frobenius) sensitivity of (A, B) to one record.
Adding/removing one record changes A by z z^T and B by z y^T; with ||z|| <= C_z and ||y|| <= C_y their Frobenius norms are C_z^2 and C_z C_y, so the joint sensitivity of the released pair is C_z * sqrt(C_z^2 + C_y^2).
Source code in esnfed/privacy.py
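The sensitivity formula can be checked against a worst-case record whose norms sit exactly on the clipping bounds. This is a numerical sanity check under those assumptions, with hypothetical names, not the library's code.

```python
import math
import numpy as np

def joint_sensitivity(clip_state, clip_target):
    """C_z * sqrt(C_z^2 + C_y^2)."""
    return clip_state * math.sqrt(clip_state ** 2 + clip_target ** 2)

C_z, C_y = 2.0, 0.5
rng = np.random.default_rng(0)
z = rng.normal(size=6)
z *= C_z / np.linalg.norm(z)  # record with ||z|| = C_z
y = rng.normal(size=2)
y *= C_y / np.linalg.norm(y)  # target with ||y|| = C_y

dA = np.outer(z, z)  # change in A from this record
dB = np.outer(z, y)  # change in B from this record
worst_case = math.sqrt(np.linalg.norm(dA) ** 2 + np.linalg.norm(dB) ** 2)
```

Here np.linalg.norm on a 2-D array is the Frobenius norm, so worst_case combines ||z z^T||_F = C_z^2 and ||z y^T||_F = C_z C_y exactly as in the derivation above.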
gaussian_sigma ¶
gaussian_sigma(epsilon: float, delta: float, sensitivity: float, *, method: str = 'analytic') -> float
Noise std dev for the Gaussian mechanism under (epsilon, delta)-DP.
method="analytic" (default) uses the analytic Gaussian mechanism (Balle & Wang, 2018): the smallest sigma for which the mechanism is (epsilon, delta)-DP, found by a short bisection. It is valid for any epsilon > 0 and is never looser than the classic bound.
method="classic" uses the textbook bound (Dwork & Roth, 2014, App. A), sigma = Delta_2 * sqrt(2 ln(1.25/delta)) / epsilon, which is only valid for epsilon <= 1 (a warning is issued above that).
Source code in esnfed/privacy.py
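Only the classic bound is simple enough to write in one line; the analytic variant needs a bisection and is not reproduced here. The sketch below implements that textbook formula with the standard library; the function name and the exact warning text are assumptions.

```python
import math
import warnings

def classic_gaussian_sigma(epsilon, delta, sensitivity):
    """sigma = Delta_2 * sqrt(2 ln(1.25 / delta)) / epsilon (valid for epsilon <= 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if epsilon > 1:
        warnings.warn("classic Gaussian bound is only valid for epsilon <= 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

sigma_1 = classic_gaussian_sigma(1.0, 1e-5, sensitivity=1.0)
sigma_half = classic_gaussian_sigma(0.5, 1e-5, sensitivity=1.0)
```

Under this bound the noise scale grows linearly with the sensitivity and inversely with epsilon, so halving the privacy budget doubles sigma.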
clip_rows ¶
Scale each row of M so its L2 norm is at most max_norm.
Source code in esnfed/privacy.py
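Row clipping can be sketched as a per-row rescale that leaves short rows untouched. The small floor on the norm, used to avoid dividing by zero, is a choice made for this example.

```python
import numpy as np

def clip_rows_sketch(M, max_norm):
    """Scale each row of M so that its L2 norm is at most max_norm."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return M * scale

M = np.array([[3.0, 4.0],    # norm 5, gets scaled down
              [0.3, 0.4],    # norm 0.5, left unchanged
              [0.0, 0.0]])   # zero row, left unchanged
clipped = clip_rows_sketch(M, max_norm=1.0)
```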
dp_statistics ¶
Differentially private sufficient statistics (A, B) for one client.
Clips each post-washout record to the norms in cfg, forms A, B from the clipped data, and adds Gaussian noise calibrated by gaussian_sigma to every entry (the Gaussian mechanism). A is symmetrised afterwards (post-processing, which preserves DP). The result is (epsilon, delta)-DP w.r.t. the client's records; summing the private statistics across clients and solving yields a private federated readout. Because of clipping and noise this is not exact (unlike esnfed.federated.federated_ridge) -- it trades accuracy for privacy.
Source code in esnfed/privacy.py
zero_sum_masks ¶
Return n mask arrays of shape that sum to (numerically) zero.
Pairwise masking (Bonawitz et al., 2017): for each pair (i, j) a shared random mask is added by client i and subtracted by client j, so every client's contribution is hidden yet all masks cancel on summation.
Source code in esnfed/privacy.py
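Pairwise masking is easy to illustrate with floating-point arrays. In this sketch one shared generator stands in for the per-pair shared randomness that clients would agree on in a real protocol; the function name and the Gaussian mask distribution are assumptions.

```python
import numpy as np

def zero_sum_masks_sketch(n, shape, rng, scale=1.0):
    """n masks that cancel: for each pair (i, j), client i adds m and client j subtracts m."""
    masks = [np.zeros(shape) for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.normal(scale=scale, size=shape)
            masks[i] += m
            masks[j] -= m
    return masks

rng = np.random.default_rng(0)
arrays = [rng.normal(size=(3, 3)) for _ in range(4)]   # per-client statistics
masks = zero_sum_masks_sketch(4, (3, 3), rng)
masked = [a + m for a, m in zip(arrays, masks)]        # what the server receives
secure_total = sum(masked)                             # masks cancel in the sum
plain_total = sum(arrays)
```

The server only ever handles the entries of masked, yet secure_total matches plain_total up to round-off.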
secure_sum ¶
Sum per-client arrays without revealing any individual one.
Each client adds a pairwise-cancelling mask before sending; the masks cancel in the sum, so the server recovers sum_k arrays[k] (exactly in fixed-point/modular arithmetic; here up to floating-point round-off) while never seeing an unmasked client array.
Source code in esnfed/privacy.py
esnfed.streaming¶
Incremental / streaming ridge for continual (federated) learning.
The ridge readout depends on the data only through the sums A = Z^T Z and B = Z^T Y (see esnfed.federated), so training can be made incremental simply by accumulating those sums as new data arrives -- the result is identical to batch training on all data seen so far. Two tools are provided:
- StreamingRidge -- accumulate (A, B) over chunks and solve on demand. StreamingRidge.merge adds another accumulator's statistics, which is exactly the federated sum, so clients can stream locally and the server periodically merges and re-solves (continual federated learning).
- RLSReadout -- recursive least squares: rank-1 updates of the readout and of the inverse Gram matrix (Sherman-Morrison) at O(D^2) per sample, for true per-sample online learning without re-solving. With unit forgetting it converges to the same readout as batch ridge.
StreamingRidge dataclass ¶
Accumulate ridge sufficient statistics incrementally; solve on demand.
Exact: after any sequence of update calls the readout equals batch ridge over all data seen so far.
update ¶
Accumulate a new batch of extended states Z and targets Y.
Source code in esnfed/streaming.py
merge ¶
Add another accumulator's statistics (exactly the federated sum).
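A stripped-down accumulator shows why update and merge are exact. The class below is a sketch with assumed attribute and method names (A, B, solve); it is not the library's dataclass.

```python
import numpy as np

class StreamingRidgeSketch:
    """Accumulates A = sum Z^T Z and B = sum Z^T Y; solves on demand."""

    def __init__(self, dim, n_outputs, ridge=1e-6):
        self.A = np.zeros((dim, dim))
        self.B = np.zeros((dim, n_outputs))
        self.ridge = ridge

    def update(self, Z, Y):
        self.A += Z.T @ Z
        self.B += Z.T @ Y

    def merge(self, other):
        self.A += other.A  # same operation as the federated sum
        self.B += other.B

    def solve(self):
        return np.linalg.solve(self.A + self.ridge * np.eye(len(self.A)), self.B)

rng = np.random.default_rng(0)
Z, Y = rng.normal(size=(90, 4)), rng.normal(size=(90, 1))

client_a, client_b = StreamingRidgeSketch(4, 1), StreamingRidgeSketch(4, 1)
for chunk in range(0, 60, 20):              # client A streams three chunks
    client_a.update(Z[chunk:chunk + 20], Y[chunk:chunk + 20])
client_b.update(Z[60:], Y[60:])             # client B sees the rest
client_a.merge(client_b)                    # server-side merge

W_stream = client_a.solve()
W_batch = np.linalg.solve(Z.T @ Z + 1e-6 * np.eye(4), Z.T @ Y)
```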
RLSReadout dataclass ¶
Recursive least squares readout (online ridge via Sherman-Morrison).
Maintains the readout W and the inverse Gram P = (sum z z^T + ridge I)^{-1} and updates both with each sample at O(D^2) cost. Initialised with P = I / ridge so the ridge term acts as the usual Tikhonov prior; with forgetting = 1.0 the readout after processing all samples equals batch ridge. forgetting < 1 down-weights old samples (useful for non-stationary streams).
update ¶
One rank-1 update from a single record (z, y).
Source code in esnfed/streaming.py
update_batch ¶
Apply update to each row of (Z, Y) in order.
Source code in esnfed/streaming.py
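The recursive update can be sketched with the standard RLS equations. The class and attribute names are assumptions, and the readout is stored as (D, n_outputs); with forgetting = 1.0 the result matches the batch ridge solution up to round-off.

```python
import numpy as np

class RLSSketch:
    """Recursive least squares with P initialised to I / ridge."""

    def __init__(self, dim, n_outputs, ridge=1e-2, forgetting=1.0):
        self.W = np.zeros((dim, n_outputs))
        self.P = np.eye(dim) / ridge  # inverse of the (ridge-only) Gram matrix
        self.lam = forgetting

    def update(self, z, y):
        Pz = self.P @ z
        k = Pz / (self.lam + z @ Pz)                    # gain vector
        err = y - self.W.T @ z                          # prediction error before update
        self.W += np.outer(k, err)
        self.P = (self.P - np.outer(k, Pz)) / self.lam  # Sherman-Morrison downdate

    def update_batch(self, Z, Y):
        for z, y in zip(Z, Y):
            self.update(z, y)

rng = np.random.default_rng(0)
Z, Y = rng.normal(size=(200, 4)), rng.normal(size=(200, 2))
ridge = 1e-2

rls = RLSSketch(4, 2, ridge=ridge)
rls.update_batch(Z, Y)

gram = Z.T @ Z + ridge * np.eye(4)
W_batch = np.linalg.solve(gram, Z.T @ Y)
```

Each update costs O(D^2) because it touches only matrix-vector and outer products, never a fresh solve.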
esnfed.interop¶
Interoperability with ReservoirPy.
Use ReservoirPy (https://reservoirpy.readthedocs.io) to design reservoirs (its strength: rich node API, hyper-parameter search) and esnfed to federate them. These adapters lift the reservoir and input weights out of a ReservoirPy Reservoir node and wrap them in an esnfed.EchoStateNetwork, so a reservoir tuned in ReservoirPy can be dropped straight into the federated strategies of esnfed.federated.
ReservoirPy is an optional dependency:
pip install "esnfed[reservoirpy]"
reservoir_matrix ¶
Extract the dense recurrent weight matrix W from a ReservoirPy reservoir.
input_matrix ¶
Build an esnfed input matrix [bias | Win] from a ReservoirPy reservoir.
esnfed packs the bias into column 0 of the input matrix, whereas ReservoirPy keeps a separate bias vector; this helper reconciles the two conventions.
Source code in esnfed/interop.py
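The two bias conventions can be compared with plain arrays. The sketch below assumes dense NumPy arrays and a bias constant b = 1; the random `Win` and `bias` are stand-ins for the weights a ReservoirPy node would hold, and how the real helper treats sparse matrices or other bias constants is not shown here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_inputs = 6, 3
Win = rng.normal(size=(n_units, n_inputs))   # stand-in for the ReservoirPy input weights
bias = rng.normal(size=n_units)              # stand-in for the separate bias vector

# Pack the bias into column 0, giving shape (n_units, 1 + n_inputs).
W_in_packed = np.hstack([bias.reshape(-1, 1), Win])

u = rng.normal(size=n_inputs)
separate = Win @ u + bias                             # separate-bias convention
packed = W_in_packed @ np.concatenate([[1.0], u])     # packed convention, [b; u] with b = 1
```

Both expressions give the same pre-activation, which is why a single packed matrix can reproduce a reservoir that stores its bias separately.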
to_esn ¶
to_esn(reservoir, *, n_inputs: int = 1, n_outputs: int = 1, use_input_weights: bool = True, spectral_radius: float | None = None, **esn_kwargs) -> EchoStateNetwork
Wrap a ReservoirPy Reservoir as an esnfed.EchoStateNetwork.
By default the reservoir's own spectral radius and leaking rate are preserved (so the dynamics are unchanged) and its input weights and bias are reused. Pass spectral_radius or other ESN keyword arguments to override.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reservoir | | A ReservoirPy Reservoir node. | required |
| n_inputs | int | Task dimensions (used to initialise the reservoir if needed). | 1 |
| n_outputs | int | Task dimensions (used to initialise the reservoir if needed). | 1 |
| use_input_weights | bool | If true, reuse ReservoirPy's input weights and bias; otherwise let esnfed draw fresh random input weights. | True |
Source code in esnfed/interop.py
esnfed.viz¶
Visualisation utilities for Echo State Networks (optional).
Four ways to see an ESN:
- plot_reservoir -- the reservoir connectivity as a network graph (nodes sized/coloured by degree); makes the topology differences visible.
- plot_spectrum -- the reservoir eigenvalues in the complex plane with the unit circle and spectral radius; makes the echo state property visible.
- plot_states -- a sample of reservoir activations over time (the "echoes"); shows whether the reservoir is rich or saturated.
- plot_forecast -- predicted vs. actual with the NRMSE in the title.
Three backends are supported and selected with backend=: "plotly" (the default, interactive), "matplotlib" and "seaborn" (seaborn is matplotlib with the seaborn theme applied). The plotting libraries are imported lazily, so importing this module never pulls them in; install them with:
pip install "esnfed[viz]"
Each function returns the native figure object (a Plotly Figure or a Matplotlib Figure); use save or the object's own methods to render.
plot_reservoir ¶
plot_reservoir(obj, *, backend: str = DEFAULT_BACKEND, layout: str = 'spring', seed: int = 0, title: str | None = None)
Draw the reservoir connectivity as a network graph.
Source code in esnfed/viz.py
plot_spectrum ¶
Plot the reservoir eigenvalues with the unit circle and spectral radius.
Source code in esnfed/viz.py
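The quantities shown in this plot can be computed directly with NumPy. The sketch below uses a random dense matrix as a stand-in reservoir and is not the plotting code itself; a spectral radius below 1 is a commonly used heuristic for the echo state property rather than a guarantee.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(50, 50))                # stand-in reservoir matrix

eigenvalues = np.linalg.eigvals(W)           # complex eigenvalues of the reservoir
rho = np.max(np.abs(eigenvalues))            # spectral radius = largest modulus

# Rescale so the spectral radius becomes 0.9; eigenvalues scale linearly with W.
W_scaled = W * (0.9 / rho)
rho_scaled = np.max(np.abs(np.linalg.eigvals(W_scaled)))
```

Plotting `eigenvalues.real` against `eigenvalues.imag` together with a circle of radius 1 gives the same picture in any plotting library.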
plot_states ¶
plot_states(esn, u, *, n_neurons: int = 8, max_steps: int = 300, backend: str = DEFAULT_BACKEND, seed: int = 0, title: str | None = None)
Plot a sample of reservoir activations over time.
Source code in esnfed/viz.py
plot_forecast ¶
plot_forecast(y_true, y_pred, *, backend: str = DEFAULT_BACKEND, washout: int = 0, title: str | None = None, max_points: int = 2000)
Overlay predicted vs. actual, with the NRMSE in the title.
Source code in esnfed/viz.py
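The NRMSE normalisation is not stated on this page. The sketch below assumes the RMSE is divided by the standard deviation of the target and that `washout` drops the leading samples; the function name `nrmse_sketch` is hypothetical.

```python
import numpy as np


def nrmse_sketch(y_true, y_pred, washout=0):
    """RMSE divided by the standard deviation of the target (assumed normalisation)."""
    y_true = np.asarray(y_true, dtype=float)[washout:]
    y_pred = np.asarray(y_pred, dtype=float)[washout:]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.std(y_true)


t = np.linspace(0.0, 10.0, 500)
y_true = np.sin(t)

perfect = nrmse_sketch(y_true, y_true)                                   # exact forecast
mean_only = nrmse_sketch(y_true, np.full_like(y_true, y_true.mean()))    # constant forecast
```

Under this normalisation a perfect forecast scores 0 and predicting the target mean scores 1, which gives the number in the title a natural scale.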
save ¶
Save a figure produced by this module.
Plotly figures are written to .html (interactive) or, if the extension is an image format and kaleido is installed, to a static image. Matplotlib figures are written with savefig.
Source code in esnfed/viz.py
esnfed.llm_orchestration¶
FedResPrompt - Federated Reservoir Prompt Orchestration (experimental).
A split-federated architecture in which an Echo State Network acts as an ultra-lightweight prompt controller at the edge. The reservoir maps local context to a small "bottleneck" vector, which a linear projection lifts into a language model's embedding space to form a soft prompt. Only the soft-prompt embedding (uplink) and its loss gradient (downlink) cross the client-server boundary; the heavy language model lives on the server.
local context --ESN--> state z --W_out--> bottleneck b --P--> soft prompt p
                                                                          | (uplink)
                                                                          v
                                                    server LM: loss + dL/dp
                                                                          | (downlink)
                                                                          v
update P and W_out locally <--- backprop dL/dp through P and W_out
Why it matters. Classical federated LLM tuning (e.g. federated LoRA) ships adapter weights for every layer each round; FedResPrompt ships only a single soft-prompt vector and its gradient, which is orders of magnitude smaller, and keeps the forward/backward pass of the frozen model on the server, off the edge device. The communication and edge-compute savings are analysed in experiments/exp7_fedres_prompt.py.
The server language model is pluggable. By default a lightweight NumPy surrogate is used: a frozen linear head that maps the soft prompt to vocabulary logits and returns the exact cross-entropy gradient with respect to the prompt, so the architecture runs anywhere with only NumPy. If transformers (with a torch backend) is installed, a real AutoModelForCausalLM can be plugged in via TransformersLM (the gradient w.r.t. the input embedding is obtained from the autograd backward pass).
This module is experimental and optional: pip install "esnfed[llm]".
BottleneckProjection ¶
Linear projection from the bottleneck space (R^k) to the LM embedding space (R^{n_tokens x d}), trained by SGD.
The soft prompt is p = P b reshaped to (n_tokens, d).
Source code in esnfed/llm_orchestration.py
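The shape bookkeeping behind p = P b can be illustrated as follows. The sizes, the initialisation scale and the row-major reshape are assumptions chosen for the example, not values taken from the library.

```python
import numpy as np

rng = np.random.default_rng(4)
k, n_tokens, d = 8, 2, 16                            # bottleneck size, prompt tokens, embedding width

P = rng.normal(scale=0.1, size=(n_tokens * d, k))    # projection matrix (illustrative init)
b = rng.normal(size=k)                               # bottleneck vector from the readout

prompt = (P @ b).reshape(n_tokens, d)                # soft prompt, one row per token
```

The projection therefore holds n_tokens * d * k trainable values, and each prompt token is a different linear function of the same bottleneck vector.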
SurrogateLM ¶
A lightweight, frozen stand-in for a server-side language model.
A single linear head U maps a (mean-pooled) soft prompt to logits over a small vocabulary. It is frozen (as the LLM is in prompt tuning) and returns the cross-entropy loss together with the exact gradient with respect to the prompt embedding -- exactly the signal a real CausalLM would back-propagate to its input embeddings.
Source code in esnfed/llm_orchestration.py
loss_and_grad ¶
Takes a prompt of shape (n_tokens, d) and returns (loss, dL/dprompt), where the gradient has the same shape as the prompt.
Source code in esnfed/llm_orchestration.py
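A surrogate of this kind can be written in a few lines. The sketch below follows the description above (mean pooling, one frozen linear head, softmax cross-entropy) but is not the library code; the function name `surrogate_loss_and_grad` and the sizes are assumptions.

```python
import numpy as np


def surrogate_loss_and_grad(prompt, U, target):
    """Cross-entropy of a frozen linear head on the mean-pooled prompt, plus dL/dprompt."""
    n_tokens = prompt.shape[0]
    pooled = prompt.mean(axis=0)                  # (d,)
    logits = U @ pooled                           # (vocab,)
    logits = logits - logits.max()                # stabilise the softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    loss = -np.log(probs[target])
    dlogits = probs.copy()
    dlogits[target] -= 1.0                        # softmax cross-entropy gradient
    dpooled = U.T @ dlogits                       # (d,)
    # Mean pooling spreads the gradient equally over the tokens.
    grad = np.tile(dpooled / n_tokens, (n_tokens, 1))
    return loss, grad


rng = np.random.default_rng(5)
U = rng.normal(size=(10, 16))                     # frozen head: vocabulary of 10, width 16
prompt = rng.normal(size=(2, 16))
loss, grad = surrogate_loss_and_grad(prompt, U, target=3)
```

The head U is never updated; only the gradient with respect to the prompt leaves the function, which is the signal the client needs.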
EdgeClient dataclass ¶
EdgeClient(esn: EchoStateNetwork, bottleneck_dim: int, embed_dim: int, n_prompt_tokens: int = 1, seed: int = 0, lr: float = 0.05)
An edge device: a (frozen) reservoir + a trainable readout and projection.
The reservoir and its input weights are fixed; only W_out (state -> bottleneck) and the BottleneckProjection are learned, by back-propagating the gradient the server returns.
apply_server_gradient ¶
Back-propagate dL/dp through P and W_out and take an SGD step.
Source code in esnfed/llm_orchestration.py
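Since p = P (W_out z) is a composition of two linear maps, the chain rule yields two outer products. The sketch below works under assumed shapes and uses a hypothetical quadratic "server" loss purely to produce a gradient; `prompt_gradients` and `toy_server_loss` are illustrative names, not part of esnfed.

```python
import numpy as np


def prompt_gradients(W_out, P, z, grad_prompt):
    """Chain rule for p = P (W_out z): returns (dL/dW_out, dL/dP)."""
    b = W_out @ z                      # bottleneck, shape (k,)
    g = grad_prompt.ravel()            # dL/dp flattened to (n_tokens * d,)
    grad_P = np.outer(g, b)            # p = P b      =>  dL/dP = g b^T
    grad_b = P.T @ g                   # dL/db = P^T g
    grad_W_out = np.outer(grad_b, z)   # b = W_out z  =>  dL/dW_out = (dL/db) z^T
    return grad_W_out, grad_P


rng = np.random.default_rng(6)
n_state, k, n_tokens, d = 20, 4, 2, 8
z = np.tanh(rng.normal(size=n_state))              # stand-in reservoir state
W_out = rng.normal(scale=0.1, size=(k, n_state))
P = rng.normal(scale=0.1, size=(n_tokens * d, k))
target = rng.normal(size=(n_tokens, d))


def toy_server_loss(W_out, P):
    # Hypothetical server: squared distance between the prompt and a fixed target.
    prompt = (P @ (W_out @ z)).reshape(n_tokens, d)
    return 0.5 * np.sum((prompt - target) ** 2), prompt - target


loss, grad_prompt = toy_server_loss(W_out, P)
grad_W_out, grad_P = prompt_gradients(W_out, P, z, grad_prompt)

lr = 0.05
W_out_new = W_out - lr * grad_W_out    # plain SGD step on the readout
P_new = P - lr * grad_P                # and on the projection
```

The client needs only z, its own parameters and the returned gradient, so nothing about the server model has to be known on the edge device.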
Server dataclass ¶
The server hosting the (frozen) language model.
evaluate ¶
Run the LM forward/backward; return (loss, dL/dprompt).
Source code in esnfed/llm_orchestration.py
TransformersLM ¶
A real, frozen server-side CausalLM (optional), e.g. a small Qwen.
The soft prompt is fed as inputs_embeds; the next-token logits after the prompt give the cross-entropy against a target token id, and the gradient with respect to the prompt is read from the autograd backward pass. This is the same interface as SurrogateLM, so it is a drop-in for Server.
Requires transformers and torch (pip install "esnfed[llm]" plus a torch build). Tested with Qwen/Qwen2.5-0.5B.
Source code in esnfed/llm_orchestration.py
loss_and_grad ¶
Next-token cross-entropy against token id target and dL/dprompt.
Source code in esnfed/llm_orchestration.py
restricted_loss_and_grad ¶
Cross-entropy over only the candidate class tokens (the standard way to do classification with an LLM) and dL/dprompt. target_idx indexes into class_ids.
Source code in esnfed/llm_orchestration.py
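Restricting the loss to candidate tokens amounts to taking the softmax over a slice of the logit vector. The sketch below shows only that step on a plain NumPy vector and returns the gradient with respect to the logits; the real method would continue the backward pass to the prompt. The function name `restricted_cross_entropy` is hypothetical.

```python
import numpy as np


def restricted_cross_entropy(logits, class_ids, target_idx):
    """Cross-entropy over the candidate tokens only, plus its gradient w.r.t. all logits."""
    sub = logits[class_ids]                    # logits of the candidate class tokens
    sub = sub - sub.max()                      # stabilise the softmax
    probs = np.exp(sub) / np.exp(sub).sum()
    loss = -np.log(probs[target_idx])          # target_idx indexes into class_ids
    dsub = probs.copy()
    dsub[target_idx] -= 1.0
    dlogits = np.zeros_like(logits)
    dlogits[class_ids] = dsub                  # non-candidate tokens get zero gradient
    return loss, dlogits


logits = np.array([2.0, -1.0, 0.5, 3.0, 0.0])  # toy vocabulary of five tokens
class_ids = [1, 3]                             # two candidate class tokens
loss, dlogits = restricted_cross_entropy(logits, class_ids, target_idx=1)
```

Tokens outside `class_ids` cannot absorb probability mass, so the loss measures only how well the model separates the classes of interest.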
logits ¶
Next-token logit vector (numpy) for the given soft prompt.
split_federated_step ¶
One split-federated example: client builds prompt, server scores it, client updates from the returned gradient. Returns the loss.
Source code in esnfed/llm_orchestration.py
fedresprompt_bytes_per_round ¶
fedresprompt_bytes_per_round(d_model: int, n_prompt_tokens: int = 1, dtype_bytes: int = FLOAT_BYTES) -> int
Client<->server bytes per example: soft prompt up + its gradient down.
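The description translates into a one-line count. The sketch below assumes 4-byte floats, since the value of FLOAT_BYTES is not shown on this page; `prompt_bytes_sketch` is an illustrative name and 1024 is an arbitrary embedding width.

```python
FLOAT_BYTES_ASSUMED = 4  # assumption: float32; the library's FLOAT_BYTES is not shown here


def prompt_bytes_sketch(d_model, n_prompt_tokens=1, dtype_bytes=FLOAT_BYTES_ASSUMED):
    """Bytes for one soft prompt sent up plus one gradient of the same shape sent down."""
    one_way = n_prompt_tokens * d_model * dtype_bytes
    return 2 * one_way


example_bytes = prompt_bytes_sketch(d_model=1024, n_prompt_tokens=1)
```

With these assumed values a one-token prompt costs 8192 bytes per example, and the cost grows linearly in the number of prompt tokens.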
fedlora_bytes_per_round ¶
fedlora_bytes_per_round(d_model: int, n_layers: int, rank: int, adapters_per_layer: int = 2, dtype_bytes: int = FLOAT_BYTES) -> int
Bytes per round for federated LoRA: adapter weights up + aggregate down.
Each adapted projection contributes two low-rank factors of size rank x d_model (A and B), so 2 * rank * d_model parameters; there are adapters_per_layer of them per layer (e.g. query and value).
Source code in esnfed/llm_orchestration.py
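The parameter count above can be turned into a byte estimate as follows. The sketch assumes 4-byte floats and that the aggregate sent down has the same size as the upload; the model sizes are arbitrary and `lora_bytes_sketch` is an illustrative name.

```python
def lora_bytes_sketch(d_model, n_layers, rank, adapters_per_layer=2, dtype_bytes=4):
    """Bytes for adapter weights sent up plus an aggregate of the same size sent down."""
    params_per_adapter = 2 * rank * d_model                       # factors A and B
    params = n_layers * adapters_per_layer * params_per_adapter
    return 2 * params * dtype_bytes


lora_bytes = lora_bytes_sketch(d_model=1024, n_layers=24, rank=8)
prompt_bytes = 2 * 1024 * 4            # one-token soft prompt up + gradient down, same width
ratio = lora_bytes // prompt_bytes
```

For these assumed sizes a LoRA exchange is 768 times larger than a single soft-prompt exchange. Note that the two figures use different units (per round versus per example), so the ratio compares message sizes rather than total traffic.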
llm_flops ¶
esn_edge_flops ¶
Edge cost of FedResPrompt: reservoir run + readout (no LLM on device).