Final Project | Machine Learning

1.0 Proposal

My proposal for the final project of this directed study is a small game featuring what I am calling an ‘adaptive companion AI storyteller’. Essentially, this is a learning AI system which does not directly influence the story, but acts as a companion and commentator. This system (henceforth Entity) is given a goal at the beginning of each game. As you play, Entity is also learning about the game and how to accomplish its own goal. Its goal may or may not be the same as yours, and it will try to get you to complete its goal regardless.

2.0 Story

The basic story I’ve designed for this prototype is a small puzzle/adventure style game. You start in a town, and there are three areas you can travel to (river, mountains, fields). Each of these areas presents a different opportunity for leaving the area and, therefore, a different ending. Each area also allows you to pick up one useful and one useless object. Each useful object is required in one of the other locations in order to end the game.

3.0 Entity

Entity, as a system, has a few key goals. The following subsections detail how the system aims to accomplish each goal, as well as presenting sample/pseudo code to demonstrate the idea(s) where necessary.

→ 3.1 Entity will discover its own goals through learning

This part of Entity will feature a reward/punishment system. My design for this system is fairly straightforward: whenever the player performs an action, Entity will learn whether that action has brought it closer to, or moved it farther away from, its goal. Also, whenever locations or objects are discovered, or new information is learned, Entity will receive this information along with how positive or negative it is in relation to its goals. NOTE: it would be better to make it one of Entity’s responsibilities to use past and future experiences (trial and error) to learn and update the relevance of each discovery. This system will modify these values, but learning them from nothing would require a much longer, more complicated world and play session. Entity can then use this information to track which locations and objects are helpful to it, and decide which actions are likely to further its progress or even complete its goal.

This is also a good place to discuss the way Entity will store data. My plans take ideas from neural networks and decision trees. Because Entity will need to both update and parse this data quickly to make decisions, it needs to be organized very efficiently. Here is where I can take advantage of JavaScript objects. Entity will store an object to represent each location, action and item that it learns about, and each of these objects will store a list of related locations, actions and objects. This creates a web of data points for Entity to look at. Furthermore, we can evaluate each node to find its most important related object, as well as any objects on which its importance depends (for the same reasons mentioned above, these dependencies will be pre-defined). With this data included in the objects, Entity can traverse the web in a similar way to a decision tree, looking for the end node of paths that stem from the most important objects.
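
As a rough illustration of the web described above, the sketch below stores each location and item as a plain JavaScript object holding an importance value plus lists of related nodes and dependencies. The helper names (makeNode, link, mostImportantRelated), the field names, the example items (boat, rope) and the importance values are all hypothetical and chosen only for the example.

```javascript
//hypothetical helper: creates one node in Entity's web of knowledge
function makeNode(name, type) {
    return {
        name: name,
        type: type,          //'location', 'item' or 'action'
        importance: 0,       //relevance to Entity's goal (assumed scale: -1 to 1)
        related: [],         //other nodes this node is connected to
        dependencies: []     //pre-defined nodes this node's importance depends on
    };
}

//connects two nodes in both directions, skipping links that already exist
function link(a, b) {
    if (a.related.indexOf(b) === -1) a.related.push(b);
    if (b.related.indexOf(a) === -1) b.related.push(a);
}

//returns the related node with the highest importance, or null if there are none
function mostImportantRelated(node) {
    var best = null;
    for (var i = 0; i < node.related.length; i++) {
        if (best === null || node.related[i].importance > best.importance) {
            best = node.related[i];
        }
    }
    return best;
}

//example data (invented for illustration)
var river = makeNode('river', 'location');
var boat = makeNode('boat', 'item');
var rope = makeNode('rope', 'item');
boat.importance = 0.8;
rope.importance = 0.2;
link(river, boat);
link(river, rope);
```

Because the nodes hold direct references to each other, following a relationship is a single property lookup rather than a search through a separate table.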

The question then becomes: this data is a web, not a tree, so where do we start? Neural networks work like a brain, with neurons firing due to external stimuli. In the same way, we can trace paths through our web based on the ‘stimulation’ of a single base node. The only events that Entity is aware of are the player’s decisions and the generation of new knowledge. My design for this system will keep track of the current most important object and use this node as the base for its decision tree whenever a decision is required. The most important object can change over time as the importance of items and locations changes with new knowledge. Again, we should consider performance, as this system exists in a game. We cannot simply traverse every object to find the highest-ranked one whenever new information is gained, so the system will keep a ranked list of objects which it can update more quickly by moving changed items up and down in rank.
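
One possible way to keep the ranked list up to date is sketched below: when a node's importance changes, it is swapped with its neighbours until it sits in the right position. The function name updateRank and the importance field are assumptions made for this example, not a fixed part of the design.

```javascript
//hypothetical ranked list, kept sorted from most to least important
var rankedObjects = [];

//inserts a new node, or re-ranks an existing one after its importance changed,
//by swapping it with its neighbours until it sits in the right place
function updateRank(node) {
    var i = rankedObjects.indexOf(node);
    if (i === -1) {
        rankedObjects.push(node);
        i = rankedObjects.length - 1;
    }
    //move up while more important than the node above
    while (i > 0 && rankedObjects[i - 1].importance < node.importance) {
        rankedObjects[i] = rankedObjects[i - 1];
        rankedObjects[i - 1] = node;
        i--;
    }
    //move down while less important than the node below
    while (i < rankedObjects.length - 1 &&
           rankedObjects[i + 1].importance > node.importance) {
        rankedObjects[i] = rankedObjects[i + 1];
        rankedObjects[i + 1] = node;
        i++;
    }
}
```

A small change in importance only moves the node a few places, so most updates touch very little of the list. Note that indexOf is still a linear scan; storing each node's current index on the node itself would be one way to avoid it.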

Below I have created some sample/pseudo code which represents the overall process of this in the system. The code shows the type of data that will be exchanged, how Entity will deal with new discoveries, and the function which will deal with the feedback that is given to Entity.


//this is an object which will store information about the
//current goal of Entity. The data in here can be null, which
//represents an unknown or inapplicable type of data.
//Entity will update this when new information is received, and
//use it to help decide how to act
var currentGoal = {
    location : null,
    object : null
};

//used to store all discovered locations,
//their effect on goals,
//and objects which can be found there
var locations = [];

//used to store discovered objects,
//their effect on goals,
//and locations where they can be found
var items = [];

//type: enumerator for the type of discovery (location, item etc)
//data: varies based on type of discovery, an object containing relevant information
//effect: describes the positive or negative effect of the discovery on goal progression
function onDiscovery(type, data, effect)
{
    switch(type)
    {
        //for each case
            //store the data
            //remember its effect
            //look for patterns
            //update current goal
    }
}

//actionType: enumerator for type of action creating the effect (travel, pickup, etc)
//data: varies based on effect, an object containing relevant info to the action
//effect: the amount of positive or negative effect of the action on goal progression
function onFeedbackRecieved( actionType, data, effect )
{
    switch(actionType)
    {
        //for each case
            //update the entry based on effect
            //look for patterns
            //update current goal
    }
}
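
To make the "update the entry based on effect" step more concrete, here is a minimal sketch of one way the stored value could be nudged toward each new piece of feedback. The names knowledge, LEARNING_RATE, recordDiscovery and applyFeedback are hypothetical, as is the choice of a simple weighted update; the real system may well use something different.

```javascript
//hypothetical store of everything Entity knows, keyed by name
var knowledge = {};

//hypothetical learning rate: how strongly one piece of feedback shifts a value
var LEARNING_RATE = 0.1;

//records a newly discovered location or item along with its initial effect
function recordDiscovery(type, data, effect) {
    knowledge[data.name] = { type: type, name: data.name, importance: effect };
    return knowledge[data.name];
}

//nudges the stored importance of the affected entry toward the observed effect
function applyFeedback(data, effect) {
    var entry = knowledge[data.name];
    if (!entry) {
        return null; //feedback about something Entity has not discovered yet
    }
    entry.importance += LEARNING_RATE * (effect - entry.importance);
    //keep the value inside the assumed -1 to 1 range
    entry.importance = Math.max(-1, Math.min(1, entry.importance));
    return entry;
}
```

With a small learning rate, a single surprising result does not overturn what Entity already believes, but repeated feedback in the same direction gradually does.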

→ 3.2 Entity will suggest actions to the player in order to accomplish goals

As previously discussed, once Entity has enough information about its goal from the environment, it can use this information to start suggesting actions to the player. In this simple example, it will suggest that the player travel to locations and collect items relevant to its goal. It may also simply suggest that the player travel elsewhere, or avoid certain actions if they are believed to have a negative impact on the goal’s progression.

In this example, we also need to deal with cause and effect. Because our goal will always require an item to be used in a specific location, Entity must be able to learn and understand these relationships.

The first step in making a suggestion is to decide on the best action based on what Entity has learned. The following sample/pseudo code shows the basic decision process that Entity will follow:


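function findObjective()
{
    //first we find the most important object
    //(rankedObjects is the ranked list kept by Entity, most important first)
    var objective = rankedObjects[0];

    //starting at this node, we traverse the
    //web of objects until an end point is reached
    //(assumes each object stores its dependencies in an array,
    //sorted with the most important dependency first)
    while( objective.dependencies.length > 0 )
    {
        objective = objective.dependencies[0];
    }

    return objective;
}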
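
Because the data is a web rather than a tree, a chain of dependencies could loop back on itself and the traversal would never finish. The variant below is a sketch of one way to guard against that by remembering visited nodes. The function name findObjectiveSafe, the dependencies and importance fields, and the choice to stop on the current node when a loop is found are all assumptions for illustration.

```javascript
//follows the most important dependency from the top-ranked node,
//stopping at an end point or when a node would be visited twice
function findObjectiveSafe(rankedObjects) {
    if (rankedObjects.length === 0) {
        return null; //nothing has been learned yet
    }
    var objective = rankedObjects[0];
    var visited = [objective];
    while (objective.dependencies.length > 0) {
        //pick the dependency with the highest importance
        var next = objective.dependencies[0];
        for (var i = 1; i < objective.dependencies.length; i++) {
            if (objective.dependencies[i].importance > next.importance) {
                next = objective.dependencies[i];
            }
        }
        if (visited.indexOf(next) !== -1) {
            break; //loop detected: stay on the current node
        }
        visited.push(next);
        objective = next;
    }
    return objective;
}
```

This also reflects the cause and effect relationship described earlier: if an exit depends on an item, the traversal ends on the item, so the item is what gets suggested first.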

→ 3.3 Entity will make observations about the player and game state

In the simple example that I am creating here, Entity’s comments will be limited by the amount of information available. Also, because Entity’s decision making is event based, we will limit its comments to times when it is unsure of what to do next and while it is waiting for new information. Entity should also be able to make observations which are relative to its own knowledge (i.e. expressing dislike for certain objects).

Entity will track how long it has been since the last player decision / knowledge event. It can then make observations when the player takes too long to make a decision. The length of time taken by the player will also be included in the player model (discussed in 3.4). The following sample/pseudo code shows the basic process of this timer.


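//variable to store the elapsed time
var timer = 0;

//how long to wait before making an observation
//(placeholder value in milliseconds, to be tuned)
var threshold = 10000;

//called every frame
function update( timeElapsed )
{
    timer += timeElapsed;

    if(timer > threshold)
    {
        timer = 0;
        makeObservation();
    }
}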
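
The timer above only restarts after an observation is made. Since the idea is to measure the time since the last player decision or knowledge event, the timer would presumably also need to restart when one of those events arrives. The sketch below shows one way to do that; the idleTimer object, its reset method and the threshold value are assumptions for illustration.

```javascript
//hypothetical idle timer that restarts whenever something happens
var idleTimer = {
    elapsed: 0,
    threshold: 10000, //assumed value in milliseconds

    //call when the player makes a decision or new knowledge arrives
    reset: function () {
        this.elapsed = 0;
    },

    //call every frame; returns true when an observation is due
    update: function (timeElapsed) {
        this.elapsed += timeElapsed;
        if (this.elapsed > this.threshold) {
            this.elapsed = 0;
            return true;
        }
        return false;
    }
};
```

Returning true instead of calling makeObservation() directly keeps the timer independent of the rest of Entity, so the game loop decides what to do when the player has been idle for too long.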

Because Entity’s observations have no higher purpose aside from entertainment and immersion for the player, it will likely select at random what to make an observation on (i.e. the player, the location, an item in the location, etc.) and then make a statement based on what it knows (or doesn’t know) about the selected object.
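
A minimal sketch of this random selection might look like the following. The function name makeObservationAbout, the shape of the subject objects and the wording of the statements are all invented for illustration. The optional random parameter exists only so the choice can be made predictable when testing.

```javascript
//picks a random subject and builds a statement from what Entity knows about it
//subjects: array of { name, importance } where importance may be null (unknown)
//random: optional function returning a number in [0, 1), defaults to Math.random
function makeObservationAbout(subjects, random) {
    if (subjects.length === 0) {
        return null; //nothing to comment on
    }
    var pick = random || Math.random;
    var subject = subjects[Math.floor(pick() * subjects.length)];

    if (subject.importance === null) {
        return "I don't know anything about the " + subject.name + " yet.";
    }
    if (subject.importance < 0) {
        return "I don't like the " + subject.name + ".";
    }
    return "The " + subject.name + " looks useful.";
}
```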

→ 3.4 Entity will change tone based on the player’s level of cooperation

Because everything that Entity says is a suggestion, the player will be able to ignore or deliberately disobey it. The cooperation of the player, along with Entity’s level of success, will have an impact on Entity’s tone. In essence, I am simulating a mood for Entity. In order to decide how Entity feels about the player, it will construct and maintain a model that represents the player using the following metrics:

Time to make decisions
Number of decisions made
Level of cooperation
Percentage of cooperation
Overall impact on goal progression

These metrics will be calculated and boiled down into floating point numbers between either -1 and 1, or 0 and 1, where applicable. Together, this set of floating point numbers creates a genome which describes the necessary information about the player and the current play session. The following sample/pseudo code describes some of the processes that will be used to store and update this model:


//this is how I will set up a genome in JavaScript
var playerModel = {
    decisionTime  : 0,    //average over time in milliseconds
    decisionsMade : 0,    //integer count of decisions/actions
    coopLevel     : 0.0,  //float between -1 and 1
    coopPercent   : 0.0,  //float between  0 and 1
    impact        : 0.0   //float between -1 and 1
};

//to update decision time
//we add in the new time assuming 10 data points.
//This allows the average to change over time
//without saving past data
playerModel.decisionTime = 0.1 * newDecisionTime
    + 0.9 * playerModel.decisionTime;

//to update decisionsMade
//simply add one whenever a decision is made
playerModel.decisionsMade++;

//to update coopLevel
//we get +1 or -1 depending on if the player
//listened to (+1) or didn't listen to (-1) the last suggestion
//again, we average this result into the genome
playerModel.coopLevel = 0.1 * result + 0.9 * playerModel.coopLevel;

//to update coopPercent
//here we use the number of decisions made to find the percentage
//of suggestions that have been followed
//this value is absolute whereas coopLevel changes over time
//and is forgiving of past events
//(the extra parentheses around the ternary are required,
//otherwise the whole sum is treated as its condition)
playerModel.coopPercent = ( playerModel.coopPercent *
    ( playerModel.decisionsMade-1 ) + ( ( result > 0 ) ? 1 : 0 ) )
    / playerModel.decisionsMade;

//to update impact
//the player may be following suggestions but the suggestions
//may not have been good ones or the player only listens to
//simple suggestions which have little impact on Entity's goals
//here we average in a helpfulness metric, which Entity will calculate
//based on how important the completion of that suggestion was
//this will likely be directly tied to the relevance of
//the suggestion's related object
playerModel.impact = 0.1 * helpfulness + 0.9 * playerModel.impact;
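
The individual updates above could be gathered into a single function that is called once per player decision. The sketch below is one way to do that. The names createPlayerModel, updatePlayerModel and WEIGHT are hypothetical, and it keeps the assumption from the snippets above that every decision is a response to a suggestion.

```javascript
//hypothetical weight given to each new sample in the running averages
var WEIGHT = 0.1;

//creates a fresh model with the same fields as the genome above
function createPlayerModel() {
    return {
        decisionTime: 0,
        decisionsMade: 0,
        coopLevel: 0.0,
        coopPercent: 0.0,
        impact: 0.0
    };
}

//model: the player model to update
//newDecisionTime: milliseconds the player took to decide
//result: +1 if the last suggestion was followed, -1 if it was not
//helpfulness: float between -1 and 1
function updatePlayerModel(model, newDecisionTime, result, helpfulness) {
    model.decisionTime = WEIGHT * newDecisionTime
        + (1 - WEIGHT) * model.decisionTime;
    //must be incremented before coopPercent so we never divide by zero
    model.decisionsMade++;
    model.coopLevel = WEIGHT * result + (1 - WEIGHT) * model.coopLevel;
    model.coopPercent = (model.coopPercent * (model.decisionsMade - 1)
        + (result > 0 ? 1 : 0)) / model.decisionsMade;
    model.impact = WEIGHT * helpfulness + (1 - WEIGHT) * model.impact;
    return model;
}

//example: one followed suggestion, then one ignored suggestion
var model = createPlayerModel();
updatePlayerModel(model, 2000, 1, 0.5);
updatePlayerModel(model, 4000, -1, 0.0);
```

After these two calls coopPercent is 0.5, since exactly one of the two suggestions was followed, while coopLevel has only moved slightly away from zero because each sample carries a weight of 0.1.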

→ 3.5 Entity will adapt based on player’s level of cooperation

With all the player metrics being recorded, Entity can now respond to the player and their choices. This response has two basic purposes. The first is to have Entity change tone based on player cooperation (i.e. get mad or frustrated, or happy and thankful). The second is to decide whether to make suggestions in line with what it wants (because the player follows suggestions) or to lie and ask for the opposite (because the player rarely follows suggestions). I will not post sample/pseudo code here because this process will work into the comment and decision making processes already defined. Essentially, Entity will evaluate relevant metrics in the player model against pre-defined thresholds to decide what to do.
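
For illustration only, the threshold check itself might look something like the following. The threshold names and values, the tone labels and the two function names are all assumptions; real values would need tuning through play testing.

```javascript
//hypothetical thresholds
var ANGRY_BELOW = -0.5;
var HAPPY_ABOVE = 0.5;
var LIE_BELOW = 0.3;

//chooses a tone from the recent cooperation level (-1 to 1)
function chooseTone(playerModel) {
    if (playerModel.coopLevel < ANGRY_BELOW) return 'angry';
    if (playerModel.coopLevel > HAPPY_ABOVE) return 'thankful';
    return 'neutral';
}

//decides whether to ask for the opposite of what Entity wants,
//based on the overall share of suggestions followed (0 to 1)
function shouldLie(playerModel) {
    return playerModel.coopPercent < LIE_BELOW;
}
```

Using coopLevel for tone and coopPercent for lying means that Entity's mood reacts to recent behaviour, while the decision to lie depends on the player's record over the whole session.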
