Learning Turtles

Demonstration of Reinforcement Learning Algorithms to Search for a Goal

Learning Turtles is a program I developed in NetLogo that demonstrates a reinforcement learning search technique and compares its effectiveness with other search techniques used in Artificial Intelligence programming. The program is written in NetLogo's native agent-based programming language and makes use of NetLogo's tools and modelling environment.

The program begins by generating an obstacle-filled, grid-like environment with a lone agent (or turtle) placed in a random location (i.e. patch). One additional patch, chosen at random, is colored red to indicate the goal patch. The agent's objective is to search for and move to this goal patch using one of three provided AI search algorithms:

Random - The agent moves in random directions until it finds the goal patch. Visited patches are not remembered, so it is possible for the agent to visit a patch more than once. This method involves no planning and, on average, takes the agent the longest of the three search algorithms to find the goal. Although it is not really considered an AI search method, it is used to demonstrate how other search algorithms that involve planning or heuristics can be more efficient.

The random search is the only search in which the agent attempts to guess where the goal patch is. For the following searches, the agent analyzes the environment, remembers the patches it has analyzed, and plans a path to the goal before it begins to traverse the environment.
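As a rough illustration, one tick of the random movement might look like the NetLogo sketch below. The names used here (an obstacle? patch variable, a goal-patch global, and a value patch variable used by the reinforcement learning sketches further down) are placeholders of my own, not identifiers from the actual program, and the procedure would be called from a standard go loop.

    globals [ goal-patch ]            ;; hypothetical global holding the red goal patch
    patches-own [ obstacle? value ]   ;; hypothetical patch variables: wall flag, learned value

    to random-step                            ;; turtle procedure, run once per tick
      if patch-here = goal-patch [ stop ]     ;; already at the goal: nothing to do
      let open-neighbors neighbors4 with [ not obstacle? ]
      if any? open-neighbors [
        move-to one-of open-neighbors         ;; step onto any open neighboring patch
      ]
    end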

Greedy Movement (Heuristic) - The agent now has an idea of where the goal is, using a heuristic as its basis for reaching it. The heuristic it uses is: "h(n) = the shorter the distance between the goal patch and a specific patch, n, the greater the value that n has." The distance between a patch and the goal is calculated using the distance formula. The agent uses this to evaluate its four surrounding patches and see which one is closest to the goal (i.e. which has the greatest heuristic value). The agent then moves to that patch and repeats the process until it finds the goal.
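Continuing with the placeholder names from the sketch above, one tick of the greedy movement might look roughly like this:

    to greedy-step                            ;; turtle procedure, run once per tick
      if patch-here = goal-patch [ stop ]
      let open-neighbors neighbors4 with [ not obstacle? ]
      if any? open-neighbors [
        ;; h(n) favors patches closer to the goal, so step onto the open
        ;; neighbor with the smallest straight-line distance to goal-patch
        move-to min-one-of open-neighbors [ distance goal-patch ]
      ]
    end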

The main issue with this method is that the agent does not plan a path to the goal before it traverses the environment. Although this method could potentially have optimal performance (i.e. traverse the environment in the shortest possible amount of time) and find the shortest possible path to the goal, the agent will occasionally reach dead ends. The more obstacles the environment has, the more likely the agent is to be blocked and fail to find the goal. Consequently, it is best for the agent not only to have an idea of where the goal is, but also to plan its path before it begins to move.

Reinforcement Learning - The reinforcement learning algorithm evaluates every accessible patch in the environment and assigns it a value based on where it lies in relation to the goal patch. The values are assigned using an algorithm that utilizes a variation of the Bellman equation. Every node in the environment starts off with a value of 0. After every iteration, each node is reassigned the lowest value among its child nodes, plus the path cost between the node being evaluated and that lowest-valued child. The goal node is always assigned 0 after every iteration. This allows the values of its surrounding nodes to stabilize and thus creates a path of decreasing value toward the goal from every direction in the environment. If a patch or set of patches is blocked off from the goal, or if a patch is an obstacle, it is given a value of -1 and is highlighted in yellow.
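A minimal sketch of this value-assignment step is shown below, again using my own placeholder names. It differs from the description above in one small detail: non-goal patches start from a large sentinel value rather than 0, which gives the loop a clean stopping condition, but the Bellman-style update (lowest neighboring value plus a unit step cost) is the same idea.

    to compute-values
      let unreached 99999                     ;; sentinel meaning "no path found yet"
      ask patches [ set value unreached ]
      ask goal-patch [ set value 0 ]          ;; the goal always keeps a value of 0
      let changed? true
      while [ changed? ] [
        set changed? false
        ask patches with [ not obstacle? and value > 0 ] [
          let reachable neighbors4 with [ not obstacle? and value < unreached ]
          if any? reachable [
            ;; Bellman-style update: cheapest neighbor plus a step cost of 1
            let best 1 + min [ value ] of reachable
            if best < value [
              set value best
              set changed? true
            ]
          ]
        ]
      ]
      ;; obstacles and patches walled off from the goal are marked with -1
      ask patches with [ obstacle? or value = unreached ] [
        set value -1
        set pcolor yellow
      ]
    end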

After all patches are evaluated and the environment has stabilized, the agent follows a heuristic different from the previous search method's: "h(n) = the smaller the node's value, the greater the heuristic value." From its starting position, the agent looks at its surrounding nodes and moves to the node with the lowest value. Doing so moves it one step closer to the goal patch. The agent continues this until it reaches the goal.
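With the values in place, the movement step reduces to descending that gradient. A sketch, with the same placeholder names as above:

    to follow-values                          ;; turtle procedure, run once per tick
      if patch-here = goal-patch [ stop ]
      ;; skip obstacles and blocked-off patches, which carry a value of -1
      let candidates neighbors4 with [ value >= 0 ]
      if any? candidates [
        move-to min-one-of candidates [ value ]   ;; lowest value = one step nearer the goal
      ]
    end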

Although the agent may take a considerable amount of time to evaluate the environment, especially if the environment is large, this method not only guarantees the shortest possible path to the goal, but also ensures the agent will never hit a dead end unless it is completely blocked off by obstacles, in which case it will report failure before it even moves.

I created this program as part of a project for an Artificial Intelligence class I enrolled in while attending Brooklyn College. The objective of the project was to demonstrate learning and planning techniques and to compare them with other search methods.

In addition to the searches mentioned, the program also provides minor features such as displaying the number of steps the agent takes and the ability to see the value of each patch in real time. Additionally, the lower a patch's value, the lighter green it is shaded. The end result is a gradient that spreads outward from the goal node.
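One way to produce that shading in NetLogo is with the built-in scale-color reporter; the sketch below (still using my placeholder names) is one possible approach rather than the program's actual code. Passing the two range arguments in descending order inverts the scaling, so smaller values come out lighter.

    to color-by-value
      let max-value max [ value ] of patches with [ value >= 0 ]
      ask patches with [ value >= 0 ] [
        ;; reversed ranges: value 0 (the goal) is lightest, max-value is darkest
        set pcolor scale-color green value max-value 0
      ]
      ask goal-patch [ set pcolor red ]       ;; keep the goal visible
    end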