Planning & Management
The coexistence of multiple conflicting water users is a major challenge for traditional optimization of reservoir operation based on Dynamic Programming (DP), and Direct Policy Search (DPS) is emerging as a potential alternative. DPS defines the operating policy within a given functional parameterization and explores the policy parameter space, searching for the best solution with respect to a given set of objectives. The selection of the functional parameterization is crucial, as a poor choice may restrict the policy search to a decision subspace that does not include the optimal solution.
Many DPS works apply a piecewise-linear policy parameterization, although they condition the release decision on a simple one-dimensional state vector. Nonlinear multi-input multi-output approximating networks provide a more flexible operating policy shape, but require the specification of a topology (e.g., number and layering of nodes and connections), which determines the network's processing capability and calibration requirements.
Moreover, in multi-objective problems the fitness landscape changes with the selected tradeoff, so the optimal network topology should be set accordingly.
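As an illustration of a piecewise-linear DPS policy, the following minimal Python sketch conditions the release on storage alone and explores the parameter space by random sampling. Everything in it is an assumption made for illustration and is not taken from this work: the four-parameter policy shape, the toy mass balance, the two objectives (mean squared supply deficit and fraction of steps above a flood level), the parameter ranges, and the function names. Random sampling stands in for the evolutionary search normally used in DPS.

```python
import random


def piecewise_linear_release(storage, theta):
    """Release as a piecewise-linear function of storage (hypothetical shape).

    theta = (s1, s2, r_mid, slope): ramp up to r_mid until s1, stay at r_mid
    until s2, then increase with the given slope above s2.
    """
    s1, s2, r_mid, slope = theta
    if storage <= s1:
        release = r_mid * storage / s1
    elif storage <= s2:
        release = r_mid
    else:
        release = r_mid + slope * (storage - s2)
    return min(release, storage)  # cannot release more than is stored


def simulate(theta, inflows, s0=50.0, demand=10.0, flood_level=100.0):
    """Run a toy mass balance and return two objectives to be minimized."""
    storage, deficit, flood_steps = s0, 0.0, 0
    for q in inflows:
        release = piecewise_linear_release(storage, theta)
        storage = storage + q - release
        deficit += max(demand - release, 0.0) ** 2
        flood_steps += storage > flood_level
    n = len(inflows)
    return deficit / n, flood_steps / n


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def random_search(inflows, n_samples=200, seed=0):
    """Sample policy parameters at random and keep the non-dominated ones."""
    rng = random.Random(seed)
    evaluated = []
    for _ in range(n_samples):
        s1 = rng.uniform(1.0, 60.0)
        theta = (s1, s1 + rng.uniform(0.0, 60.0),
                 rng.uniform(0.0, 20.0), rng.uniform(0.0, 2.0))
        evaluated.append((theta, simulate(theta, inflows)))
    return [(t, f) for t, f in evaluated
            if not any(dominates(g, f) for _, g in evaluated)]
```

Calling `random_search` on an inflow series returns the sampled policies that no other sample beats on both objectives. The fixed four-parameter shape is precisely the kind of predefined structure that limits which policies the search can reach.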
This work builds on a recent branch of reinforcement learning called Neuro-Evolution, which employs Evolutionary Algorithms to generate optimal network topologies. By embedding Neuro-EVOlutionary techniques into the DPS framework (NEVODPS), the policy search problem can be expanded to search the parameter space and the topology space jointly and dynamically across the entire Pareto Front. NEVODPS begins with a population of minimally structured networks and progressively builds more sophisticated ones by applying topological and parametric mutation and crossover and selecting the fittest individuals, until convergence is reached.
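The idea of growing networks from a minimal structure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the NEVODPS implementation: the `Genome` encoding, the add-node mutation (which splits an existing connection, in the style of NEAT-like algorithms), the mutation probability, and the `evolve` loop are all hypothetical. Crossover is omitted, and truncation selection on a single scalar fitness replaces the Pareto-based selection a multi-objective search would need.

```python
import copy
import math
import random


class Genome:
    """Feedforward network stored as a weighted connection list (hypothetical encoding)."""

    def __init__(self, n_inputs, n_outputs, rng):
        self.inputs = list(range(n_inputs))
        self.outputs = list(range(n_inputs, n_inputs + n_outputs))
        self.hidden = []  # hidden node ids, kept in evaluation order
        self.next_id = n_inputs + n_outputs
        # Minimal structure: every input wired directly to every output.
        self.conns = {(i, o): rng.gauss(0.0, 1.0)
                      for i in self.inputs for o in self.outputs}

    def activate(self, x):
        values = dict(zip(self.inputs, x))
        for node in self.hidden + self.outputs:
            total = sum(w * values[s] for (s, d), w in self.conns.items() if d == node)
            # tanh on hidden nodes, linear output nodes
            values[node] = math.tanh(total) if node in self.hidden else total
        return [values[o] for o in self.outputs]

    def mutate_weights(self, rng, sigma=0.3):
        """Parametric mutation: perturb every weight with Gaussian noise."""
        for key in self.conns:
            self.conns[key] += rng.gauss(0.0, sigma)

    def mutate_add_node(self, rng):
        """Topological mutation: split one connection with a new hidden node."""
        (src, dst), w = rng.choice(list(self.conns.items()))
        del self.conns[(src, dst)]
        new = self.next_id
        self.next_id += 1
        # Insert before dst (or at the end) so the evaluation order stays valid.
        pos = self.hidden.index(dst) if dst in self.hidden else len(self.hidden)
        self.hidden.insert(pos, new)
        self.conns[(src, new)] = 1.0
        self.conns[(new, dst)] = w


def evolve(fitness, n_inputs, n_outputs, pop_size=20, generations=30,
           p_add_node=0.1, seed=0):
    """Elitist loop: keep the better half, refill with mutated copies (lower is better)."""
    rng = random.Random(seed)
    population = [Genome(n_inputs, n_outputs, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]
        children = []
        for parent in parents:
            child = copy.deepcopy(parent)
            if rng.random() < p_add_node:
                child.mutate_add_node(rng)
            else:
                child.mutate_weights(rng)
            children.append(child)
        population = parents + children
    return min(population, key=fitness)
```

In a DPS setting, `fitness` would run the reservoir simulation with the network as the release policy. Because structure is only added by mutation and retained when it survives selection, the topology is an outcome of the search rather than an input to it.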
We tested the NEVODPS approach on the case study of Lake Como, a regulated lake in Northern Italy that is operated to balance irrigation supply and flood control. Numerical results show that the Pareto-dynamic structural and parametric policy search of NEVODPS outperforms the solutions designed via traditional DPS with predefined policy topologies.