Dynamic scheduling in large-scale manufacturing processing systems using multi-agent reinforcement learning

Abstract/Contents

Abstract
Scheduling plays an essential role in building smart manufacturing from multiple points of view, including social, economic, and environmental. Optimal scheduling, i.e., the allocation of jobs with different requirements to a manufacturing processing system so as to meet various objectives, has been studied for several decades. Despite this extensive research, however, scheduling methods for modern processing systems have not advanced significantly, nor have they been widely adopted by staff working on manufacturing production lines. Most traditional scheduling methods rely on statistical assumptions that cannot support operations in a dynamic, stochastic modern processing system. In addition, most proposed scheduling methods do not scale to real-world, large-scale processing systems. To address these limitations, we take a data-driven approach to dynamic scheduling, that is, scheduling in response to real-time events in large-scale modern manufacturing systems. We apply reinforcement learning (RL) to learn adaptive, scalable, and optimal dynamic scheduling policies, since RL can learn the underlying processing system's patterns and adaptively make allocation decisions based on real-time job and server measurements. Directly applying existing RL methods to the scheduling problem in such large-scale processing systems is impractical, however, because of the extremely high computational complexity of learning a good scheduling policy. This thesis presents a practical and systematic computational framework that integrates RL with existing expert knowledge at three levels: (1) System-level planning. The planning procedure characterizes the processing system by the nominal feasible region of the scheduling problem. (2) Algorithm-level design. The RL algorithm is designed as an index-policy-based, multi-agent RL method, which significantly reduces the complexity of the control-policy search. (3) Learning-level demonstration. During RL training, existing expert knowledge serves as demonstration to increase search efficiency and stabilize the learning process. We conduct experiments in both real factory scenarios and simulated environments to evaluate the framework's performance on processing system scheduling problems. The effectiveness of the proposed index-policy-based multi-agent reinforcement learning (MARL) method is evidenced by its superior performance over traditional dynamic scheduling methods, achieved with computational time complexity that is linear in the number of machines and job classes.
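To make the index-policy idea concrete, the following is a minimal, illustrative Python sketch, not the thesis's actual implementation: each machine acts as an independent agent that scores every waiting job class with a priority index and serves the highest-scoring class. A hand-crafted c-mu-style index stands in here for the learned index, and all names (JobClass, priority_index, dispatch) and numbers are hypothetical. Because each agent evaluates only one index per job class per decision epoch, the decision cost is linear in the number of machines and job classes, mirroring the complexity claim in the abstract.

    from dataclasses import dataclass

    @dataclass
    class JobClass:
        name: str
        holding_cost: float   # cost per unit time a job of this class waits
        queue_length: int     # jobs of this class currently waiting

    def priority_index(machine_rate: float, job: JobClass) -> float:
        """A c-mu-style index: holding cost times the machine's service rate.
        A learned index policy would replace this hand-crafted formula with
        the output of a trained index/value network."""
        return job.holding_cost * machine_rate * (job.queue_length > 0)

    def dispatch(machine_rates: dict[str, float], jobs: list[JobClass]) -> dict[str, str]:
        """Each idle machine independently picks the job class with the highest
        index -- O(machines x job classes) work per decision epoch."""
        assignment = {}
        for machine, rate in machine_rates.items():
            best = max(jobs, key=lambda j: priority_index(rate, j))
            if best.queue_length > 0:
                assignment[machine] = best.name
                best.queue_length -= 1   # job leaves the queue for service
        return assignment

    jobs = [JobClass("urgent", 5.0, 3), JobClass("standard", 1.0, 10)]
    print(dispatch({"M1": 0.8, "M2": 1.2}, jobs))

Ranking by a per-agent index replaces an exhaustive search over joint machine-job assignments, which is what makes the approach tractable at factory scale; in the thesis the index itself is learned by the MARL method rather than fixed as above.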

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Qu, Shuhui
Degree supervisor Leckie, James M.
Thesis advisor Leckie, James M.
Thesis advisor Law, K. H. (Kincho H.)
Thesis advisor Wang, Jie
Degree committee member Law, K. H. (Kincho H.)
Degree committee member Wang, Jie
Associated with Stanford University, Civil & Environmental Engineering Department.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Shuhui Qu.
Note Submitted to the Department of Civil and Environmental Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Shuhui Qu
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
