Fisheries play multi-faceted roles in our society, economy, and environment, and management decisions often involve competing driving forces. The need to account for multiple, and possibly conflicting, objectives makes sustainable fishery management a highly challenging task. This is further compounded by the substantial uncertainty present in the problem: in particular, our knowledge of the fishery system is limited, and the state of the fishery system is not directly observable. Partially observable Markov decision processes (POMDPs) - a general, principled framework for sequential decision making in partially observable environments - are well-suited to sustainable fishery management: they can account for the long-term effects of actions, and they can conveniently take uncertainty into account. A few recent works have explored the potential of POMDPs for sustainable fishery management.

In this paper, we leverage recent advances in two sub-fields of machine learning, namely deep learning and reinforcement learning, to develop a novel POMDP-based approach to sustainable fishery management. We first propose an offline reinforcement learning formulation of the problem. While typical reinforcement learning approaches learn an optimal policy by directly interacting with the environment, offline reinforcement learning approaches learn an optimal policy from a dataset of past interactions with the environment. The use of past data instead of direct interventions is a highly desirable feature for fishery management, and it has also been exploited in the management strategy evaluation literature. We believe this perspective will allow us to tap into recent advances in offline reinforcement learning. Our second contribution is a new algorithm, MOOR, which stands for MOdel-based Offline Reinforcement learning for sustainable fishery management. MOOR first learns a POMDP model of the fishery dynamics from catch and effort data, and then solves the POMDP using a state-of-the-art solver. In the model learning step, we view the POMDP dynamics model as a recurrent neural network (RNN), and leverage RNN training techniques to learn the model. This presents some new challenges, but we show that they can be overcome with a few tricks, yielding a highly effective learning algorithm.

Finally, MOOR demonstrates strong performance in preliminary simulation studies. The learned models are generally very similar to the true models, and the management policies obtained from the learned models perform similarly to the optimal management policies for the true models. While previous POMDP studies for fishery management evaluate policy performance in the learned model, we evaluate policies in the true model; our results thus suggest that it is possible to develop a POMDP approach that is robust against mild model learning error. Moreover, although this paper focuses on fisheries applications, the approach is general enough for other problems with nonlinear dynamics, though further research is needed to understand its applicability and efficiency in other domains. Our source code will be made available after the publication of this work.
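To make the RNN view of the POMDP dynamics model concrete, the following is a minimal sketch, not the authors' implementation: a GRU cell stands in for the unknown state transition, the logged fishing effort plays the role of the POMDP action, and a linear head emits the predicted catch observation. The model is fit offline to a fixed dataset of past effort-catch trajectories, with no new interaction with the fishery. All module names, shapes, and hyperparameters below are illustrative assumptions.

```python
# Hedged sketch: a POMDP fishery dynamics model viewed as an RNN and
# trained offline on logged catch-and-effort data. Names such as
# FisheryRNN are hypothetical, not from the paper.
import torch
import torch.nn as nn

class FisheryRNN(nn.Module):
    """Latent stock state evolves via a GRU; catch is emitted from it."""
    def __init__(self, hidden_dim=16):
        super().__init__()
        # Input at each step: the applied fishing effort (the action).
        self.cell = nn.GRUCell(input_size=1, hidden_size=hidden_dim)
        # Observation model: predict the catch from the latent state.
        self.emit = nn.Linear(hidden_dim, 1)

    def forward(self, effort_seq):
        # effort_seq: (batch, T, 1) logged efforts; returns predicted catches.
        batch, T, _ = effort_seq.shape
        h = torch.zeros(batch, self.cell.hidden_size)  # latent stock state
        preds = []
        for t in range(T):
            h = self.cell(effort_seq[:, t, :], h)  # state transition
            preds.append(self.emit(h))             # catch observation
        return torch.stack(preds, dim=1)

# Offline training loop: fit the model to a fixed dataset of past
# (effort, catch) trajectories. Random tensors stand in for real data.
model = FisheryRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
effort = torch.rand(32, 50, 1)   # placeholder logged effort sequences
catch = torch.rand(32, 50, 1)    # placeholder logged catch sequences
for epoch in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(effort), catch)
    loss.backward()              # backpropagation through time
    opt.step()
```

Once such a model is fit, it can be handed to an off-the-shelf POMDP solver, as described above.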
Citation: Ju, J., Kurniawati, H., Kroese, D., & Ye, N. (2021). MOOR: Model-based Offline Reinforcement Learning for Sustainable Fishery Management. In Proceedings of the International Congress on Modelling and Simulation, MODSIM (pp. 771-777). Modelling and Simulation Society of Australia and New Zealand Inc. (MSSANZ). https://doi.org/10.36334/modsim.2021.m2.ju