Deep reinforcement learning for practical closed-loop reservoir management

Abstract/Contents

Abstract
Traditional closed-loop reservoir management (CLRM) entails the repeated application of history matching (based on newly observed data) followed by optimization of well settings. These procedures are computationally expensive due to the large number of flow simulations required for history matching and optimization. The history matching step can also be particularly challenging in cases where both the geological style (scenario) and individual model realizations are uncertain. In this thesis, we introduce a general control policy framework based on deep reinforcement learning (DRL) for CLRM. The CLRM problem is formulated here as a partially observable Markov decision process (POMDP), with the associated optimization problem solved using a proximal policy optimization algorithm. This provides a control policy that instantaneously maps flow data measured at wells (as are available in practice) to optimal well pressure settings. The policy is represented by temporal convolution and gated transformer blocks. Training is performed in a preprocessing step with an ensemble of prior geological models, which can be drawn from multiple geological scenarios. Example cases involving the production of oil via water injection, with both 2D and 3D geological models, are presented. The DRL-based methodology is shown to result in an increase in net present value (NPV) of 15% (for the 2D cases) and 33% (3D cases) relative to robust optimization over prior models, and to an improvement of 2-7% in NPV relative to traditional CLRM. The solutions from the control policy are found to be comparable to those from deterministic optimization, in which the geological model is assumed to be known, even when multiple geological scenarios are considered. The control policy approach results in a 76% decrease in computational cost relative to traditional CLRM with the algorithms and parameter settings considered in this work.

Next, we incorporate treatments into the control-policy-based framework to facilitate the practical applicability of the CLRM results. Existing CLRM treatments can provide well settings that fluctuate substantially between control steps, which may not be acceptable in practice. In addition, the time frame for the optimization is often specified somewhat arbitrarily. We introduce a procedure in which we train control policies, using DRL, to find optimal well bottom-hole pressures for prescribed relative changes between control steps, with the project life also treated as an optimization variable. The goal of the optimizations is to maximize NPV, with project life determined such that a minimum acceptable rate of return (MARR) is achieved. We again apply the framework to 2D and 3D water-flooding cases. Solutions from the control-policy approach are shown to be comparable, in terms of NPV, to those from deterministic realization-by-realization optimization, and clearly superior to results from robust optimization over prior models. These observations hold for a range of specified MARR and relative-change values. The optimal well settings provided by the control policy are shown to vary gradually, consistent with operational requirements.

Finally, we extend the framework to treat multiple assets with varying numbers of wells. Existing CLRM procedures are applied asset by asset, without exploiting information that could be useful over a range of assets. We use DRL to train a single global control policy that is applicable to all assets considered. Embedding layers are incorporated into the representation to handle the different numbers of decision variables in different assets. Because the global control policy learns a unified representation of useful features from multiple assets, it is less expensive to construct than asset-by-asset training (we observe about a factor of 3 speedup in our examples). Four assets (in 2D and 3D) with different well counts, well configurations, and geostatistical descriptions are considered. Results demonstrate that the global control policy provides NPVs, for 2D and 3D water-flooding cases, that are nearly identical to those from control policies trained individually for each asset. This promising finding suggests that multi-asset CLRM may indeed represent a viable practical strategy.

This thesis also includes an appendix describing a two-stage strategy for determining the optimal well locations, types, counts, and drilling sequence to be applied for field development. A traditional optimization framework is used for this work.
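
To make the policy architecture concrete, the following is a minimal, hypothetical PyTorch sketch of a network that maps histories of measured well data to normalized bottom-hole pressure (BHP) settings, using a temporal convolution followed by standard transformer encoder blocks as a stand-in for the gated transformer blocks described above. Layer sizes, class names, and the per-well tokenization are illustrative assumptions, not the architecture used in the thesis.

import torch
import torch.nn as nn

class ControlPolicySketch(nn.Module):
    """Maps observed well-data histories to one normalized BHP setting per well."""
    def __init__(self, n_features: int, d_model: int = 64):
        super().__init__()
        # Temporal convolution summarizes each well's measurement history.
        self.temporal_conv = nn.Conv1d(n_features, d_model, kernel_size=3, padding=1)
        # Standard transformer encoder blocks stand in for gated transformer blocks.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # one action (BHP setting) per well

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_wells, n_steps, n_features), e.g. rates/pressures at wells
        b, w, t, f = obs.shape
        x = obs.reshape(b * w, t, f).transpose(1, 2)        # (b*w, features, time)
        x = torch.relu(self.temporal_conv(x)).mean(dim=-1)  # temporal summary per well
        x = self.encoder(x.reshape(b, w, -1))               # attention across wells
        return torch.sigmoid(self.head(x)).squeeze(-1)      # (batch, n_wells) in [0, 1]

# Example: 5 wells, 4 measured quantities, 10 past control steps.
policy = ControlPolicySketch(n_features=4)
bhp_settings = policy(torch.randn(2, 5, 10, 4))

In a PPO training loop such a network would act as the actor, with rewards supplied by the incremental NPV of each control step computed from flow simulations over the prior geological ensemble.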
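
Similarly, the MARR-based choice of project life can be illustrated with a small NumPy sketch. This is a deliberately simplified assumption (truncate the project where, discounting at the MARR, continued operation no longer adds value), not the economics formulation used in the thesis; all prices and cash flows below are made up.

import numpy as np

def npv(cash_flows: np.ndarray, rate: float, dt_years: float = 1.0) -> float:
    """Discounted sum of per-period cash flows at the given annual rate."""
    t = np.arange(1, len(cash_flows) + 1) * dt_years
    return float(np.sum(cash_flows / (1.0 + rate) ** t))

def project_life(cash_flows: np.ndarray, marr: float, dt_years: float = 1.0) -> int:
    """Last period worth operating: stop once the MARR-discounted
    cash flow of a period is no longer positive."""
    t = np.arange(1, len(cash_flows) + 1) * dt_years
    discounted = cash_flows / (1.0 + marr) ** t
    positive = np.nonzero(discounted > 0.0)[0]
    return int(positive[-1] + 1) if positive.size else 0

# Example: yearly cash flows (revenue minus injection/lifting costs), in $MM.
cf = np.array([120.0, 90.0, 60.0, 25.0, 5.0, -10.0])
life = project_life(cf, marr=0.15)        # number of periods worth operating
print(life, npv(cf[:life], rate=0.10))    # NPV over the selected project life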

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Nasir, Yusuf
Degree supervisor Durlofsky, Louis
Thesis advisor Durlofsky, Louis
Thesis advisor Horne, Roland N
Thesis advisor Volkov, Oleg, 1975-
Degree committee member Horne, Roland N
Degree committee member Volkov, Oleg, 1975-
Associated with Stanford University, Department of Energy Resources Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Yusuf Nasir.
Note Submitted to the Department of Energy Resources Engineering.
Thesis Thesis (Ph.D.), Stanford University, 2022.
Location https://purl.stanford.edu/kc218fg2056

Access conditions

Copyright
© 2022 by Yusuf Nasir
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
