CCE Theses and Dissertations

Date of Award

2022

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

College of Computing and Engineering

Advisor

Michael J. Laszlo

Committee Member

Francisco J. Mitropoulos

Committee Member

Sumitra Mukherjee

Keywords

context revealing elements, deep reinforcement learning, domain randomization, reality gap, synthetic environments, Unity Machine Learning Agents (ML-Agents)

Abstract

Deep Reinforcement Learning (DRL) can solve many complex tasks in robotics, self-driving cars, smart grids, finance, healthcare, and intelligent autonomous systems. During training, DRL agents interact freely with the environment to arrive at an inference model. Under real-world conditions, this training raises safety, cost, and time concerns. Training in synthetic environments helps overcome these difficulties; however, synthetic environments only approximate real-world conditions, resulting in a ‘reality gap’. Synthetic training of agents has proven advantageous but requires methods to bridge this reality gap. This work addressed the gap through a methodology that supports agent learning. A framework incorporating a modifiable synthetic environment integrated with an unmodified DRL algorithm was used to train, test, and evaluate agents using a modified Structured Domain Randomization (SDR+) technique.
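
To make the sampling idea concrete: structured domain randomization draws scene parameters from context-dependent distributions rather than independently, so randomized elements remain mutually plausible. The Python sketch below illustrates this under assumed, hypothetical parameter names and ranges; it is not the dissertation’s actual SDR+ configuration, nor the Unity ML-Agents API.

    import random

    # Hypothetical sketch of structured domain randomization (SDR)-style
    # sampling: a context is chosen first, then each scene parameter is
    # drawn from that context's range, so the randomized elements stay
    # mutually consistent. All names and ranges here are illustrative
    # assumptions, not the dissertation's actual SDR+ settings.
    CONTEXTS = {
        "indoor": {"light_intensity": (0.4, 0.8), "fog_density": (0.0, 0.1)},
        "outdoor": {"light_intensity": (0.7, 1.5), "fog_density": (0.0, 0.4)},
    }

    def sample_episode_params(rng: random.Random) -> dict:
        """Draw one episode's randomization: pick a context, sample each
        naturalistic parameter from that context's range, and add a
        physics-based perturbation (here, a drone mass scale)."""
        context = rng.choice(list(CONTEXTS))
        ranges = CONTEXTS[context]
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        params["context"] = context
        params["drone_mass_scale"] = rng.uniform(0.9, 1.1)  # physics-based DR
        return params

    if __name__ == "__main__":
        rng = random.Random(0)
        for _ in range(3):
            print(sample_episode_params(rng))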

It was hypothesized that applying environment domain randomization (DR) during the learning process would allow the agent to learn variability and adapt accordingly. Experiments using the SDR+ technique included naturalistic and physics-based DR while applying the concept of context-aware elements (CAE) to guide and speed up agent training. Drone racing served as the use case. The experimental framework workflow generated the following results. First, a baseline was established by training and validating an agent in a generic synthetic environment devoid of DR and CAE. The agent was then tested in environments with DR applied, which showed degraded performance. This validated the reality gap phenomenon under synthetic conditions and established a metric for comparison. Second, an SDR+ agent was successfully trained and validated under various applications of DR and CAE. Ablation studies determined that most of the applied DR and CAE effects had equivalent impacts on agent performance.
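
The reality-gap measurement in the first step can be illustrated as follows: evaluate one fixed inference model in the plain training environment and again in a DR-perturbed variant, then compare mean returns. The sketch below assumes a hypothetical environment protocol (reset/step) and policy callable; it stands in for, rather than reproduces, the dissertation’s drone-racing evaluation.

    from statistics import mean

    # Illustrative reality-gap comparison: the same fixed policy is
    # evaluated in the plain training environment and in a DR-perturbed
    # variant, and the relative drop in mean return is reported.
    # The environment protocol (reset()/step() returning obs, reward,
    # done) and the policy callable are hypothetical assumptions.

    def evaluate(policy, make_env, n_episodes=20):
        """Mean episode return of a fixed policy in a given environment."""
        returns = []
        for seed in range(n_episodes):
            env = make_env(seed)
            obs, done, total = env.reset(), False, 0.0
            while not done:
                obs, reward, done = env.step(policy(obs))
                total += reward
            returns.append(total)
        return mean(returns)

    def reality_gap(policy, make_plain_env, make_dr_env):
        """Relative performance degradation when DR effects are applied."""
        base = evaluate(policy, make_plain_env)
        randomized = evaluate(policy, make_dr_env)
        return (base - randomized) / base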

In comparison, the SDR+ agent’s performance exceeded that of the baseline agent in every test where single or combined DR effects were applied. These tests indicated that the SDR+ agent’s performance improved in environments where the applied DR was of the same order as that received during training. The last result came from testing the SDR+ agent’s inference model in a completely new synthetic environment with more extreme and additional DR effects applied. The SDR+ agent’s performance degraded to the point where it was inconclusive whether generalization, in the form of learning to adapt to variations, had occurred. It is assumed that with improvements to the agent’s navigational capabilities, the control/feedback from the DRL algorithm, and the use of visual sensing, future work could exhibit indications of generalization using the SDR+ technique.
