Deep reinforcement learning for practical closed-loop reservoir management

Abstract/Contents

Abstract
Traditional closed-loop reservoir management (CLRM) entails the repeated application of history matching (based on newly observed data) followed by optimization of well settings. These procedures are computationally expensive due to the large number of flow simulations required for history matching and optimization. The history matching step can also be particularly challenging in cases where both the geological style (scenario) and individual model realizations are uncertain. In this thesis, we introduce a general control policy framework based on deep reinforcement learning (DRL) for CLRM. The CLRM problem is formulated here as a partially observable Markov decision process (POMDP), with the associated optimization problem solved using a proximal policy optimization algorithm. This provides a control policy that instantaneously maps flow data measured at wells (as are available in practice) to optimal well pressure settings. The policy is represented by temporal convolution and gated transformer blocks. Training is performed in a preprocessing step with an ensemble of prior geological models, which can be drawn from multiple geological scenarios. Example cases involving the production of oil via water injection, with both 2D and 3D geological models, are presented. The DRL-based methodology is shown to result in an increase in net present value (NPV) of 15% (for the 2D cases) and 33% (3D cases) relative to robust optimization over prior models, and to an improvement of 2-7% in NPV relative to traditional CLRM. The solutions from the control policy are found to be comparable to those from deterministic optimization, in which the geological model is assumed to be known, even when multiple geological scenarios are considered. The control policy approach results in a 76% decrease in computational cost relative to traditional CLRM with the algorithms and parameter settings considered in this work.

Next, we incorporate treatments into the control-policy-based framework to facilitate the practical applicability of the CLRM results. Existing CLRM treatments can provide well settings that fluctuate substantially between control steps, which may not be acceptable in practice. In addition, the time frame for the optimization is often specified somewhat arbitrarily. We introduce a procedure in which we train control policies, using DRL, to find optimal well bottom-hole pressures for prescribed relative changes between control steps, with the project life also treated as an optimization variable. The goal of the optimizations is to maximize NPV, with project life determined such that a minimum acceptable rate of return (MARR) is achieved. We again apply the framework to 2D and 3D water-flooding cases. Solutions from the control-policy approach are shown to be comparable, in terms of NPV, to those from deterministic realization-by-realization optimization, and clearly superior to results from robust optimization over prior models. These observations hold for a range of specified MARR and relative-change values. The optimal well settings provided by the control policy are shown to vary gradually, consistent with operational requirements.

Finally, we extend the framework to treat multiple assets with varying numbers of wells. Existing CLRM procedures are applied asset by asset, without exploiting information that could be useful over a range of assets. We use DRL to train a single global control policy that is applicable to all assets considered. Embedding layers are incorporated into the representation to handle the different numbers of decision variables in different assets. Because the global control policy learns a unified representation of useful features from multiple assets, it is less expensive to construct than asset-by-asset training (we observe about a factor of 3 speedup in our examples). Four assets (in 2D and 3D) with different well counts, well configurations, and geostatistical descriptions are considered. Results demonstrate that the global control policy provides NPVs, for 2D and 3D water-flooding cases, that are nearly identical to those from control policies trained individually for each asset. This promising finding suggests that multi-asset CLRM may indeed represent a viable practical strategy.

This thesis also includes an appendix describing a two-stage strategy for determining the optimal well locations, types, counts, and drilling sequence to be applied for field development. A traditional optimization framework is used for this work.
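
To make the policy architecture concrete, the following is a minimal, hypothetical PyTorch sketch of a network that maps histories of measured well data to normalized bottom-hole pressure (BHP) settings, using a temporal convolution followed by standard transformer encoder blocks as a stand-in for the gated transformer blocks described above. Layer sizes, class names, and the per-well tokenization are illustrative assumptions, not the architecture used in the thesis.

import torch
import torch.nn as nn

class ControlPolicySketch(nn.Module):
    """Maps observed well-data histories to one normalized BHP setting per well."""
    def __init__(self, n_features: int, d_model: int = 64):
        super().__init__()
        # Temporal convolution summarizes each well's measurement history.
        self.temporal_conv = nn.Conv1d(n_features, d_model, kernel_size=3, padding=1)
        # Standard transformer encoder blocks stand in for gated transformer blocks.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # one action (BHP setting) per well

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_wells, n_steps, n_features), e.g. rates/pressures at wells
        b, w, t, f = obs.shape
        x = obs.reshape(b * w, t, f).transpose(1, 2)        # (b*w, features, time)
        x = torch.relu(self.temporal_conv(x)).mean(dim=-1)  # temporal summary per well
        x = self.encoder(x.reshape(b, w, -1))               # attention across wells
        return torch.sigmoid(self.head(x)).squeeze(-1)      # (batch, n_wells) in [0, 1]

# Example: 5 wells, 4 measured quantities, 10 past control steps.
policy = ControlPolicySketch(n_features=4)
bhp_settings = policy(torch.randn(2, 5, 10, 4))

In a PPO training loop such a network would act as the actor, with rewards supplied by the incremental NPV of each control step computed from flow simulations over the prior geological ensemble.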
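
Similarly, the MARR-based choice of project life can be illustrated with a small NumPy sketch. This is a deliberately simplified assumption (truncate the project where, discounting at the MARR, continued operation no longer adds value), not the economics formulation used in the thesis; all prices and cash flows below are made up.

import numpy as np

def npv(cash_flows: np.ndarray, rate: float, dt_years: float = 1.0) -> float:
    """Discounted sum of per-period cash flows at the given annual rate."""
    t = np.arange(1, len(cash_flows) + 1) * dt_years
    return float(np.sum(cash_flows / (1.0 + rate) ** t))

def project_life(cash_flows: np.ndarray, marr: float, dt_years: float = 1.0) -> int:
    """Last period worth operating: stop once the MARR-discounted
    cash flow of a period is no longer positive."""
    t = np.arange(1, len(cash_flows) + 1) * dt_years
    discounted = cash_flows / (1.0 + marr) ** t
    positive = np.nonzero(discounted > 0.0)[0]
    return int(positive[-1] + 1) if positive.size else 0

# Example: yearly cash flows (revenue minus injection/lifting costs), in $MM.
cf = np.array([120.0, 90.0, 60.0, 25.0, 5.0, -10.0])
life = project_life(cf, marr=0.15)        # number of periods worth operating
print(life, npv(cf[:life], rate=0.10))    # NPV over the selected project life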

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Nasir, Yusuf
Degree supervisor Durlofsky, Louis
Thesis advisor Durlofsky, Louis
Thesis advisor Horne, Roland N
Thesis advisor Volkov, Oleg, 1975-
Degree committee member Horne, Roland N
Degree committee member Volkov, Oleg, 1975-
Associated with Stanford University, Department of Energy Resources Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Yusuf Nasir.
Note Submitted to the Department of Energy Resources Engineering.
Thesis Thesis (Ph.D.), Stanford University, 2022.
Location https://purl.stanford.edu/kc218fg2056

Access conditions

Copyright
© 2022 by Yusuf Nasir
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
