### The flexibility of large-scale batteries and Virtual Power Plants (VPPs) is used for near real-time imbalance management and (continuous intra-day) energy trading.

The underlying decision-making problem can be quite daunting. This is a direct result of its sequential nature, inter-temporal dynamics, uncertainty (and underlying risk), partial observability and non-linearities [when being a price-maker, next to being in a multi-agent setting competing against agents with similar strategies]. In recent years there has been growing interest in imbalance management and intra-day trading, due to the limited market depth of battery-friendly ancillary services and a surge in (announced) battery projects putting pressure on prices. In the UK, for example, the revenue from ancillary services has all but dried up, and nowadays any optimizer worth their salt flaunts their multi-market algorithmic trading prowess. We consider three paradigms to solve the underlying optimization problem.

A first paradigm, which is uncomfortably powerful (if one likes tinkering with algorithms), is that of “*simple*” expert-based rules captured in e.g. a decision tree.
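To make this first paradigm a bit more tangible, here is a minimal sketch of such a rule set for a single battery reacting to the imbalance price. The `Battery` class, the two price thresholds and the 15-minute settlement period are illustrative assumptions on our side, not a description of any production strategy.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    """Hypothetical battery model used only for this illustration."""
    capacity_mwh: float
    max_power_mw: float
    soc_mwh: float  # current state of charge


def rule_based_action(battery, imbalance_price, low_threshold, high_threshold,
                      dt_hours=0.25):
    """Return a power setpoint in MW (positive = charge, negative = discharge).

    A two-level decision tree: charge when the price is low, discharge when
    it is high, otherwise stay idle. The setpoint is clipped so the state of
    charge stays within [0, capacity] over one settlement period.
    """
    if imbalance_price <= low_threshold:
        headroom_mwh = battery.capacity_mwh - battery.soc_mwh
        return min(battery.max_power_mw, headroom_mwh / dt_hours)
    if imbalance_price >= high_threshold:
        return -min(battery.max_power_mw, battery.soc_mwh / dt_hours)
    return 0.0


if __name__ == "__main__":
    battery = Battery(capacity_mwh=2.0, max_power_mw=1.0, soc_mwh=1.0)
    for price in (10.0, 50.0, 100.0):
        print(price, rule_based_action(battery, price, 20.0, 80.0))
```

The tinkering then happens in the thresholds and in how many extra branches (time of day, state of charge, forecast signals) one is willing to add.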

A second paradigm, followed by many (if not nearly all) algo-traders, is that of mathematical programming. Scores of PhD students have mulled over solving near real-time decision making in the context of the problem above with multi-stage stochastic programming in all shapes and forms. If the reader would like to catch up on this, an excellent PhD thesis by Priyanka can be found here. In our view, this strong focus on mathematical programming is to a large extent driven by the presence of a fundamental linear structure in the decision-making problem (linear cost function, linear dynamics), in combination with easy (and, at research level, cheap) access to powerful solvers such as CPLEX, Gurobi and the like. These methods, in our opinion, can get you pretty far, but they start showing cracks when one is confronted with large fleets of assets (with coupling constraints), when the setting is a realistic multi-market one with coupled trading decisions, and when one can no longer be considered a price taker and/or is competing with other agents. To solve some of these challenges, elaborate bi-level optimization schemes have been explored, e.g. by Zugno and Smets. Although these are powerful concepts, in our view they sacrifice (too much) model fidelity on the altar of mathematical programming by shoehorning the decision-making problem into a structure that typical mathematical programming approaches can get away with. Currently, however, they are the main workhorse for most algo-traders.
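The linear structure referred to above (linear cost, linear dynamics) can be illustrated with a toy price-taker problem: schedule one battery against a known price vector. The sketch below is our own simplification, assuming perfect price foresight, a single market and a constant round-trip efficiency. The function name and its parameters are hypothetical; the solver call is `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def schedule_battery(prices, capacity_mwh, max_power_mw, soc0_mwh=0.0,
                     efficiency=0.95, dt_hours=1.0):
    """Toy price-taker schedule as a linear program (illustrative only).

    Decision vector x = [charge_0..charge_T-1, discharge_0..discharge_T-1] in MW.
    Returns (charge, discharge, profit).
    """
    prices = np.asarray(prices, dtype=float)
    n_steps = len(prices)

    # Linear cost: pay for energy charged, get paid for energy discharged.
    cost = dt_hours * np.concatenate([prices, -prices])

    # Linear dynamics: soc_t = soc0 + dt * sum_{k<=t}(eta * charge_k - discharge_k / eta)
    cumulative = np.tril(np.ones((n_steps, n_steps)))
    soc_map = dt_hours * np.hstack([efficiency * cumulative,
                                    -cumulative / efficiency])

    # Keep 0 <= soc_t <= capacity for every time step.
    a_ub = np.vstack([soc_map, -soc_map])
    b_ub = np.concatenate([np.full(n_steps, capacity_mwh - soc0_mwh),
                           np.full(n_steps, soc0_mwh)])

    result = linprog(cost, A_ub=a_ub, b_ub=b_ub,
                     bounds=[(0.0, max_power_mw)] * (2 * n_steps),
                     method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:n_steps], result.x[n_steps:], -result.fun


if __name__ == "__main__":
    charge, discharge, profit = schedule_battery(
        [10.0, 50.0], capacity_mwh=1.0, max_power_mw=1.0, efficiency=1.0)
    print(charge, discharge, profit)
```

For a lossless 1 MW / 1 MWh battery and prices of 10 and 50, the schedule buys in the first hour and sells in the second, for a profit of 40. Everything that makes the real problem hard (uncertainty, price impact, coupled markets, fleet-level constraints) is exactly what this sketch leaves out.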

A third paradigm is that of reinforcement learning, a potpourri of concepts, tricks and algorithmic coterie. In recent years these approaches have demonstrated remarkable performance, e.g. in the context of nuclear fusion or drug discovery. Reinforcement learning carries the promise that one has to make far fewer compromises on model fidelity compared to mathematical programming. A challenge, however, is that it is not trivial to obtain stable and reproducible solutions that generalize in an intuitive way over non-observed states. An example of how reinforcement learning can be used for continuous intra-day trading can be found here.

Closer to Beebop, Soroush quite recently published the results of close to a year of research on how to get stable and reliable solutions in the context of a battery for imbalance management in a European power system. We consider distributional reinforcement learning in combination with a policy gradient method and careful policy design as an excellent **starting** point to obtain a high-performance policy without making uncomfortable model assumptions.
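For readers new to the term: "distributional" means learning the distribution of returns rather than only its expectation, which gives a handle on risk. One common way to do this is quantile regression with the pinball loss. The text above does not say which variant was used in the research, so the sketch below is purely our illustration, and the function names and hyperparameters are hypothetical.

```python
import numpy as np


def pinball_loss(quantile_estimates, sampled_return, taus):
    """Quantile-regression (pinball) loss of one sampled return."""
    errors = sampled_return - quantile_estimates
    return float(np.mean(np.maximum(taus * errors, (taus - 1.0) * errors)))


def fit_return_quantiles(returns, n_quantiles=5, lr=0.01, epochs=50, seed=0):
    """Estimate quantiles of a return distribution by stochastic subgradient steps.

    Stand-in for the critic update of a distributional method: each estimate
    moves up by lr * tau when the sample lies above it and down by
    lr * (1 - tau) otherwise, which is a descent step on the pinball loss.
    """
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    taus = (np.arange(n_quantiles) + 0.5) / n_quantiles  # quantile midpoints
    estimates = np.zeros(n_quantiles)
    for _ in range(epochs):
        for sampled_return in rng.permutation(returns):
            estimates += lr * np.where(sampled_return > estimates,
                                       taus, taus - 1.0)
    return taus, estimates


if __name__ == "__main__":
    # Synthetic returns, uniformly spread between -1 and 1 (made-up data).
    taus, estimates = fit_return_quantiles(np.linspace(-1.0, 1.0, 201))
    for tau, estimate in zip(taus, estimates):
        print(f"tau={tau:.1f}  estimated quantile={estimate:+.2f}")
```

In a full agent, such quantile estimates would be conditioned on state and action, and a policy gradient method would then improve the policy against a statistic of that distribution (the mean, or a more risk-averse one).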

At Beebop, we look at these problems from the perspective of large heterogeneous fleets of decentralized assets such as heat pumps, batteries and electric vehicles embedded in a distribution grid, which has (at least) all the complexities of the above. To obtain a truly practical, scalable and performant approach, the Beebop team is blending all of the paradigms above (mathematical programming, reinforcement learning and rule-based control) into one framework. This framework allows us to harness the flexibility of large heterogeneous fleets of assets in a grid-secure, multi-market optimization approach that respects each asset’s constraints and cost.

**Let’s get a whiteboard, a strong coffee and start cracking on future energy solutions**