CEC Theses and Dissertations

Campus Access Only

All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.

Date of Award


Document Type

Dissertation - NSU Access Only

Degree Name

Doctor of Philosophy in Computer Science (CISD)


Graduate School of Computer and Information Sciences


Michael Laszlo

Committee Member

Wei Li

Committee Member

Sumitra Mukherjee


The growing popularity of online virtual communities such as Second Life and ActiveWorlds creates a need for intelligent agents that assist users in their daily online activities (e.g., exploring, shopping, and socializing). As these virtual environments become more crowded, multiple agents are needed to support the increasing number of users. Multi-agent environments, however, can suffer from resource competition among agents. Agents in such environments therefore need a coordination mechanism to prevent unrealistic behaviors. They must also exhibit some form of intelligence, or the ability to learn, both to support realism and to eliminate the need for developers to write a separate script for each task the agents are required to perform. This research presents a coordinated reinforcement learning framework that can be used to develop task-oriented intelligent agents in multi-agent virtual environments. The framework combines a "next available agent" coordination model with a reinforcement learning model consisting of existing temporal difference reinforcement learning algorithms. The framework also supports evaluations of reinforcement learning algorithms to determine which methods are best suited for task-oriented intelligent agents in dynamic, multi-agent virtual environments.
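The abstract does not spell out how the "next available agent" coordination model works. As an illustration only, one plausible reading is a dispatcher that hands each incoming task to the first idle agent and queues tasks when every agent is busy, so that no two agents compete for the same task. The class name, method names, and task labels below are hypothetical and are not taken from the dissertation.

```python
from collections import deque


class NextAvailableDispatcher:
    """Hypothetical sketch: assign each task to the first idle agent."""

    def __init__(self, agent_ids):
        self.idle = deque(agent_ids)   # agents waiting for work, in FIFO order
        self.pending = deque()         # tasks waiting for an agent
        self.assignments = {}          # agent id -> task currently held

    def submit(self, task):
        """Queue a task and return any (agent, task) pairs created."""
        self.pending.append(task)
        return self._dispatch()

    def release(self, agent_id):
        """Mark an agent as finished; it may pick up a waiting task."""
        self.assignments.pop(agent_id)  # raises KeyError if the agent was not busy
        self.idle.append(agent_id)
        return self._dispatch()

    def _dispatch(self):
        """Pair idle agents with pending tasks until one side runs out."""
        made = []
        while self.idle and self.pending:
            agent = self.idle.popleft()
            task = self.pending.popleft()
            self.assignments[agent] = task
            made.append((agent, task))
        return made
```

Under this assumption, a task submitted while all agents are busy simply waits in `pending` until some agent calls `release`, which is what keeps two agents from being assigned the same task.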

To assess the effectiveness of the temporal difference reinforcement learning algorithms used in this study (Q-learning and Sarsa), experiments were conducted that measured an agent's ability to learn three tasks commonly performed by workers in a café environment: basic sandwich making (BSM), complex sandwich making (CSM), and dynamic sandwich making (DSM). The BSM task consisted of four steps; the CSM and DSM tasks each contained an additional fifth step. The agent learned the BSM and CSM tasks from scratch, while it learned the DSM task after becoming skillful in BSM. Three measurements were used to evaluate the Q-learning and Sarsa algorithms: the percentage of successful episodes, the percentage of optimally successful episodes, and the average number of time steps the agent took to complete a successful episode. The experiments were run using both a fixed (FEP) and a variable (VEP) ε-greedy probability rate. Results showed that Sarsa, on average, outperformed Q-learning in almost all experiments. The exception was the percentage of successfully completed episodes using FEP for CSM and DSM, where Sarsa performed almost as well as Q-learning. Overall, experiments using VEP yielded higher percentages of successes and optimal successes, and showed convergence to the optimal policy when measuring the average number of time steps per successful episode.
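For readers unfamiliar with the two algorithms compared above, the following minimal sketch shows the standard tabular update rules: Q-learning bootstraps from the best next action, while Sarsa bootstraps from the action the agent actually takes next. It also shows a fixed and a decaying ε for ε-greedy action selection. The decay schedule, the default parameter values, and all function names are assumptions made for illustration; the abstract does not state how the dissertation's variable rate (VEP) was defined.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """With probability epsilon explore; otherwise pick a highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q.get((state, a), 0.0) for a in actions)
    return rng.choice([a for a in actions if q.get((state, a), 0.0) == best])


def q_learning_target(q, reward, next_state, actions, gamma, done):
    """Off-policy target: reward plus discounted value of the best next action."""
    if done:
        return reward
    return reward + gamma * max(q.get((next_state, a), 0.0) for a in actions)


def sarsa_target(q, reward, next_state, next_action, gamma, done):
    """On-policy target: reward plus discounted value of the action actually chosen."""
    if done:
        return reward
    return reward + gamma * q.get((next_state, next_action), 0.0)


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha of the way toward the target."""
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)


def fixed_epsilon(episode, value=0.1):
    """Fixed exploration rate (assumed value, for illustration)."""
    return value


def decaying_epsilon(episode, start=1.0, decay=0.99, floor=0.01):
    """One possible variable rate: exponential decay with a lower bound (assumed)."""
    return max(floor, start * decay ** episode)
```

The only difference between the two learners is which target is passed to `td_update`, which is why exploration settings such as a fixed versus a variable ε can affect them differently.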

