Final Project Designs

1.0 Proposal

My proposal for the final project of this directed study is a small game which has what I am calling an ‘adaptive companion AI storyteller’. Essentially, this is a learning AI system which does not directly influence the story, but acts as a companion and commentator. This system (henceforth Entity) is given a goal at the beginning of each game. As you play, Entity is also learning about the game and how to accomplish its own goal. Its goal may or may not be the same as yours, and it will try to get you to complete its goal regardless.

2.0 Story

The basic story I’ve designed for this prototype is a small puzzle/adventure-style game. You start in a town, and there are three areas you can travel to (river, mountains, fields). Each of these areas presents a different opportunity for leaving the area and, therefore, an ending. Each area also allows you to pick up one useful and one useless object. Each useful object is required in one of the other locations in order to end the game.

3.0 Entity

Entity, as a system, has a few key goals. The following subsections detail how the system aims to accomplish each goal, as well as presenting sample/pseudo code to demonstrate the idea(s) where necessary.

→ 3.1 Entity will discover its own goals through learning

This part of Entity will feature a reward/punishment system. My designs for this system are fairly straightforward: essentially, whenever the player performs an action, Entity will learn whether that action has brought it closer to, or moved it farther away from, its goal. Also, whenever locations or objects are discovered, or new information is learned, Entity will receive this information along with how positive or negative it is in relation to its goals. Note: ideally, it would be one of Entity’s responsibilities to use past and future experiences (trial and error) to learn and update the relevance of each discovery on its own. This system will modify these values, but learning them from nothing would require a much longer, more complicated world and play session. Entity can then use this information to track which locations and objects are helpful to it, and decide what actions are likely to further its progress or even complete its goal.

This is also a good place to discuss the way Entity will store data. My plans take ideas from neural networks and decision trees. Because Entity will need to both update this data quickly and parse it quickly to make decisions, it needs to be organized in a very efficient way. Here is where I can take advantage of JavaScript objects. Entity will store an object to represent each location, action, and item that it learns about, and each of these objects will store a list of related locations, actions, and objects. This creates a web of data points for Entity to look at. Furthermore, we can evaluate each node to find its most important related object, as well as any objects on which its importance depends (for the same reasons mentioned above, these dependencies will be pre-defined). Including this data in the objects lets Entity traverse the web in a way similar to a decision tree, looking for the end nodes of paths that stem from the most important objects.
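To make this concrete, here is one possible shape for a knowledge node; the field names (`importance`, `related`, `dependsOn`) are placeholders of my own, not a final design:

```javascript
// Hypothetical shape of one knowledge node in Entity's web of data.
// Every location, action, and item gets one of these objects.
var riverNode = {
  name: "river",
  type: "location",
  importance: 0.4,                     // learned relevance to Entity's goal (-1 to 1)
  related: ["rope", "travelToRiver"],  // names of connected nodes
  dependsOn: ["rope"]                  // pre-defined dependencies for its importance
};

// The full web is just a lookup table of nodes by name,
// so related nodes can be resolved in constant time.
var knowledge = {};
knowledge[riverNode.name] = riverNode;

// Finding a node's most important related node is then a simple scan:
function mostImportantRelated(node) {
  var best = null;
  node.related.forEach(function (name) {
    var other = knowledge[name];
    if (other && (best === null || other.importance > best.importance)) {
      best = other;
    }
  });
  return best;
}
```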

The question here becomes: this data is a web, not a tree, so where do we start? Neural networks work like a brain, with neurons firing due to external stimuli. In the same way, we can trace paths through our web based on the ‘stimulation’ of a single base node. The only events that Entity is aware of are the player’s decisions and the generation of new knowledge. My design for this system will keep track of the most important object and use this node as the base for its decision tree whenever a decision is required. The most important object can change over time as the importance of items and locations changes with new knowledge. Again, we should consider performance, as this system exists in a game. We cannot simply traverse every object to find the highest whenever new information is gained, so the system will keep a ranked list of objects which it can update more quickly by moving changed items up and down in rank.
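A rough sketch of that ranked list, assuming each node exposes an `importance` value: instead of re-sorting everything when one value changes, we nudge the changed entry up or down until it sits in the right place.

```javascript
// Ranked importance list (illustrative only): node names, highest first.
var ranked = [];

function updateRank(knowledge, name) {
  var i = ranked.indexOf(name);
  if (i === -1) { ranked.push(name); i = ranked.length - 1; }
  // Bubble up while more important than the entry above...
  while (i > 0 && knowledge[ranked[i]].importance > knowledge[ranked[i - 1]].importance) {
    var up = ranked[i - 1]; ranked[i - 1] = ranked[i]; ranked[i] = up; i--;
  }
  // ...or down while less important than the entry below.
  while (i < ranked.length - 1 && knowledge[ranked[i]].importance < knowledge[ranked[i + 1]].importance) {
    var down = ranked[i + 1]; ranked[i + 1] = ranked[i]; ranked[i] = down; i++;
  }
}

// The current base node for decision making is then simply ranked[0].
```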

Below I have created some sample/pseudo code which represents the overall process of this in the system. The code shows the type of data that will be exchanged, how Entity will deal with new discoveries, and the function which will deal with the feedback that is given to Entity.
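One possible shape for that exchange; all names here (`onDiscover`, `onFeedback`, the learning rate) are placeholders rather than a final design:

```javascript
// Sketch of Entity's discovery and feedback handling.
var entity = {
  goal: "leaveByRiver",
  knowledge: {}  // name -> { type, importance, related }
};

// Called whenever a new location, item, or action is revealed to Entity.
// `valence` is the pre-defined relevance to Entity's goal (-1 to 1).
function onDiscover(name, type, valence, related) {
  entity.knowledge[name] = {
    type: type,
    importance: valence,
    related: related || []
  };
}

// Called after every player action with a reward/punishment signal.
// Positive feedback reinforces the action's importance; negative erodes it.
function onFeedback(actionName, reward) {
  var node = entity.knowledge[actionName];
  if (!node) return;
  var rate = 0.25;  // learning rate: how quickly old beliefs are revised
  node.importance += rate * (reward - node.importance);
  // Clamp into the expected [-1, 1] range.
  node.importance = Math.max(-1, Math.min(1, node.importance));
}
```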

→ 3.2 Entity will suggest actions to the player in order to accomplish goals

As previously discussed, once Entity has enough information about its goal from the environment, it can use this information to start suggesting actions to the player. In this simple example, it will suggest that the player travel to locations and collect items relevant to its goal. It may also simply suggest that the player travel elsewhere, or not attempt certain actions, if they are believed to have a negative impact on the goal’s progression.

In this example, we also need to deal with cause and effect. Because our goal will always require an item to be used in a specific location, Entity must be able to learn and understand these relationships.

The first step in making a suggestion is to decide on the best action based on what Entity has learned. The following sample/pseudo code shows the basic decision process that Entity will follow:
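A minimal sketch of that decision process, assuming the knowledge web described in section 3.1 (node and field names are placeholders): starting from the most important known node, follow related nodes and return the reachable action with the highest importance.

```javascript
// Illustrative decision step, not the final implementation.
function chooseSuggestion(knowledge, baseName) {
  var bestAction = null;
  var visited = {};
  var stack = [baseName];
  while (stack.length > 0) {
    var name = stack.pop();
    if (visited[name]) continue;
    visited[name] = true;
    var node = knowledge[name];
    if (!node) continue;
    if (node.type === "action" &&
        (bestAction === null || node.importance > knowledge[bestAction].importance)) {
      bestAction = name;
    }
    node.related.forEach(function (n) { stack.push(n); });
  }
  return bestAction;  // null means Entity is unsure what to suggest
}
```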

→ 3.3 Entity will make observations about the player and game state

In the simple example that I am creating here, Entity’s comments will be limited by the amount of information available. Also, because Entity’s decision making is event-based, we will limit its comments to times when it is unsure of what to do next and while it is waiting for new information. Entity should also be able to make observations which are relative to its own knowledge (e.g. expressing dislike for certain objects).

Entity will watch how long it has been since the last player decision or knowledge event. It can then make observations when the player takes too long to make a decision. The length of time taken by the player will also be included in the player model (discussed in 3.4). The following sample/pseudo code shows the basic process of this timer:
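A sketch of that timer; the delay value is an arbitrary placeholder:

```javascript
// Observation timer sketch.
var lastEventTime = Date.now();
var OBSERVATION_DELAY = 10000;  // ms of player inactivity before commenting

// Reset whenever a player decision or knowledge event arrives.
function onGameEvent() {
  lastEventTime = Date.now();
}

// Polled from the game loop; returns true when Entity should comment.
function shouldObserve(now) {
  if (now - lastEventTime >= OBSERVATION_DELAY) {
    lastEventTime = now;  // avoid repeating the observation immediately
    return true;
  }
  return false;
}
```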

Because Entity’s observations have no higher purpose aside from entertainment and immersion for the player, it will likely select randomly what to make an observation about (e.g. the player, the location, an item in the location, etc.) and then make a statement based on what it knows (or doesn’t know) about the selected object.

→ 3.4 Entity will change tone based on the player’s level of cooperation

Because everything that Entity says is a suggested action, the player will be able to ignore or deliberately disobey its suggestions. The cooperation of the player, along with Entity’s level of success, will have an impact on Entity’s tone. In essence, I am simulating a mood for Entity. In order to decide how Entity feels about the player, it will construct and maintain a model representing the player across the following metrics:

  • Time to make decisions
  • Number of decisions made
  • Level of cooperation
  • Percentage of cooperation
  • Overall impact on goal progression

These metrics will be calculated and boiled down into floating point numbers between either -1 and 1, or 0 and 1, where applicable. Together, this set of floating point numbers creates a genome which describes the necessary information about the player and the current play session. The following sample/pseudo code describes some of the processes that will be used to store and update this model:
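A sketch of that model and one possible update routine; the metric handling here is simplified, and the field names and step sizes are placeholders:

```javascript
// Player model genome sketch. Each value is a float in [-1, 1] or [0, 1].
var playerModel = {
  decisionTime: 0,      // normalized average time per decision
  decisionCount: 0,     // decisions made so far (normalized when read)
  cooperation: 0,       // running cooperation level, -1 to 1
  cooperationRate: 0,   // fraction of suggestions followed, 0 to 1
  goalImpact: 0         // net effect of player choices on the goal, -1 to 1
};

var suggestionsGiven = 0;
var suggestionsFollowed = 0;

// Update the model after each suggestion is resolved.
function recordSuggestionResult(followed, impact) {
  suggestionsGiven++;
  if (followed) suggestionsFollowed++;
  playerModel.cooperationRate = suggestionsFollowed / suggestionsGiven;
  // Nudge the running cooperation level toward +1 or -1.
  playerModel.cooperation += followed ? 0.1 : -0.1;
  playerModel.cooperation = Math.max(-1, Math.min(1, playerModel.cooperation));
  playerModel.goalImpact = Math.max(-1, Math.min(1, playerModel.goalImpact + impact));
}
```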

→ 3.5 Entity will adapt based on the player’s level of cooperation

With all the player metrics being recorded, Entity can now respond to the player and their choices. This response has two basic purposes: to have Entity change tone based on player cooperation (i.e. get mad and frustrated, or happy and thankful), and to decide whether to make suggestions in line with what it wants (when the player follows suggestions) or to lie and ask for the opposite (when the player rarely follows suggestions). I will not post sample/pseudo code here because this process will work into the comment and decision-making processes already defined. Essentially, Entity will evaluate relevant metrics in the player model against pre-defined thresholds to decide what to do.

Basic Learning Prototypes

My next challenge is to explore the implementation of machine learning applications in JavaScript. I have made several prototypes which each explore a different style of learning or implementation. Although these prototypes are all very simple, they demonstrate the basic ideas of each algorithm.

1.0 – Basic Supervised

This prototype explores a very simple classification system that uses supervised learning. The application generates training data based on user-defined parameters and then feeds it to the learning system. The system uses the data to learn an acceptable space, and also to define a space of uncertainty.

This prototype can be found here:

 http://machlearn.ryanbottriell.com/prototypes/basicSupervised/

2.0 – Simple Decision Tree

Exploring a similar classification problem, this example creates a decision tree which it can use to classify new data points. The tree used in this prototype is a binary tree which looks at one variable per node.
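As an illustration only (the field names are mine, not the prototype’s actual code), such a one-variable-per-node binary tree and its classification routine can be sketched as:

```javascript
// Sketch of a binary decision tree that tests one variable per node.
// Internal nodes split on a threshold; leaves carry a class label.
var tree = {
  variable: "x", threshold: 5,
  below: { label: "A" },          // x < 5  -> class A
  above: {                        // x >= 5 -> test y next
    variable: "y", threshold: 2,
    below: { label: "B" },
    above: { label: "C" }
  }
};

function classify(node, point) {
  if (node.label !== undefined) return node.label;  // reached a leaf
  var next = point[node.variable] < node.threshold ? node.below : node.above;
  return classify(next, point);
}
```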

This prototype can be found here:

 http://machlearn.ryanbottriell.com/prototypes/simpleTree/

3.0 – Player Modelling

This prototype explores the idea of player modelling through the game of tic-tac-toe. The system will play several matches with you and prepare a model of your play techniques based on three measures: How you start, where you play, and your common winning lines.

http://machlearn.ryanbottriell.com/prototypes/basicModelling/

ML in Games – Practical Examples

This post will discuss some existing games which employ machine learning techniques and, as much as possible, how they do it.

1.0 – Creatures

Creatures is a video game series which came out in the 1990s and uses neural networks and biochemistry to create artificial life. The creatures in the game, called Norns, are taught how to act and live by the players through interaction, and breed using simulated DNA.

1.1 – Neural Network & Learning

The neural network used for each creature was designed to be efficient and dynamic. The network can be mutated and recombined during reproduction while still maintaining a good, if not better, level of function. Grand and Cliff define many different types of input data as well as 6 parameters for each neuron: input types, input gain, rest state, relaxation rate, threshold (output remains zero until the threshold is passed), and state-variable rule (used to compute the value of the neuron from one or more input signals) (Grand and Cliff, 1998). The neurons are grouped into different lobes which perform various decision-making and logic tasks to control the creature’s behaviour. The machine learning really comes into play in the concept and decision lobes. The concept lobe contains neurons which ‘watch’ 4 input signals from the creature’s sensory system. These neurons fire when all 4 inputs are activated, and are given new connections when all 4 drop to zero. Another level of algorithms attempts to keep a pool of repeatedly firing neurons connected for a long time while also ensuring that there are available neurons to be committed to new connections (Grand and Cliff, 1998). The decision lobe then has one cell for each possible action, and these cells have many connections to signals from the concept layer. Both positive and negative signals come in and are summed by the cell. The decision cell with the highest value is taken as the choice of action by the creature. These connections also retain a susceptibility based on their current influence and are adjusted if positive or negative feedback is received. Connections that are deemed to have little influence on the connected action will seek out new sources of input from the concept layer (Grand and Cliff, 1998).
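To illustrate just the decision-layer summation described above (this is a toy sketch of the mechanism, not Grand and Cliff’s implementation):

```javascript
// Toy decision layer: each action cell sums weighted signals from the
// concept layer, and the highest-valued cell wins.
function chooseAction(conceptSignals, decisionCells) {
  var bestAction = null;
  var bestValue = -Infinity;
  decisionCells.forEach(function (cell) {
    var sum = 0;
    cell.connections.forEach(function (c) {
      // Negative weights act as inhibitory connections.
      sum += (conceptSignals[c.concept] || 0) * c.weight;
    });
    if (sum > bestValue) { bestValue = sum; bestAction = cell.action; }
  });
  return bestAction;
}
```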

1.2 – DNA & Reproduction

Each creature has a genome which defines the way it looks as well as influencing some of the internal structure of the brain and decision process. When creatures are bred, the genes of both parents are spliced and mixed together with a small amount of random mutation (Grand and Cliff, 1998). Although this process isn’t exactly machine learning, it does require each creature to live past puberty, and it allows the player to keep favorable traits alive and produce creatures which model their parents both visually and behaviorally.

2.0 – Black & White

In Black & White the player acts as a god, gathering followers and helping them survive and thrive. At the beginning of the game, the player selects a creature who will help them along the way. The creature learns how the player plays and constructs decision trees based on feedback from the player and observations of the world (Wexler, 2002). Details on the implementation of the creature AI in Black & White are scarce, but I gather that a very simple neural network was used in conjunction with the aforementioned decision trees to create the behaviour (Champandard, 2007).

3.0 – Forza Motorsport

As mentioned in my post on ML for player modelling, the Forza Motorsport series implements machine learning in their Drivatar™ technology.

4.0 – City Conquest

City Conquest is a Kickstarter game that was successfully funded on April 29, 2012. There is not a whole lot of documentation on this game or how machine learning was used, but an interview with the game’s designer and engineer, Paul Tozour, provides useful insight into the system that he calls The Evolver.

As a tower-defense-style game, City Conquest requires a lot of playing and tuning to ensure that the cost of each building/unit is fair and that there are no gameplay strategies or loopholes that can cause one player to greatly over- or under-power the other. The Evolver is a system which uses randomly generated scripts to run game matches in an evolutionary tournament. Each script defines the order of purchase and placement of buildings in the game. After an initial tournament of 400 scripts, The Evolver uses simple processes to breed/combine successful scripts, adding random mutations as well. Tozour could run The Evolver for one or two days straight to try to discover the best strategy in the game. Once the best script was found, the designers could play it back and see why the method came out on top. Tozour noted that although this approach doesn’t replace human gameplay testing, it provides a great development resource at a reasonably low cost that greatly expedites their tuning process (Champandard, 2012).
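The overall loop can be sketched roughly as follows; the population handling, mutation rate, and `playMatch` fitness function are stand-ins of my own, not details of Tozour’s actual Evolver:

```javascript
// Rough sketch of an evolutionary tournament over build-order scripts.
// Each script is { genes: [...], fitness: number }.
function evolve(population, generations, playMatch) {
  for (var g = 0; g < generations; g++) {
    // Score every script by running it through matches.
    population.forEach(function (script) { script.fitness = playMatch(script); });
    population.sort(function (a, b) { return b.fitness - a.fitness; });
    var half = Math.floor(population.length / 2);
    var survivors = population.slice(0, half);
    // Refill the population: crossover of two survivors plus rare mutation.
    while (survivors.length < population.length) {
      var a = survivors[Math.floor(Math.random() * half)];
      var b = survivors[Math.floor(Math.random() * half)];
      var cut = Math.floor(Math.random() * a.genes.length);
      var child = { genes: a.genes.slice(0, cut).concat(b.genes.slice(cut)), fitness: 0 };
      if (Math.random() < 0.05) {  // small random mutation
        child.genes[Math.floor(Math.random() * child.genes.length)] =
          Math.floor(Math.random() * 10);
      }
      survivors.push(child);
    }
    population = survivors;
  }
  // Final evaluation so the returned script reflects the last generation.
  population.forEach(function (script) { script.fitness = playMatch(script); });
  population.sort(function (a, b) { return b.fitness - a.fitness; });
  return population[0];
}
```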


References

Grand, Steven, and Dave Cliff. “Creatures: Entertainment Software Agents with Artificial Life.” Autonomous Agents and Multi-Agent Systems 1 (1998): 39-57. SpringerLink. Web. 9 July 2014.

Champandard, Alex J. “Evolving with Creatures’ AI: 15 Tricks to Mutate into Your Own Game.” AIGameDev.com, 1 Oct. 2007. Web. 10 July 2014. <http://aigamedev.com/open/review/creatures-ai/>.

Wexler, James. “Artificial Intelligence in Games.” Rochester: University of Rochester (2002).

Champandard, Alex J. “Top 10 Most Influential AI Games.” AIGameDev.com, 12 Sept. 2007. Web. 11 July 2014. <http://aigamedev.com/open/highlights/top-ai-games/>.

Champandard, Alex J. “Making Designers Obsolete? Evolution in Game Design.” AIGameDev.com, 6 Feb. 2012. Web. 13 July 2014. <http://aigamedev.com/open/interview/evolution-in-cityconquest/>.

ML in Games – Part 5

5.0 Procedural Content Generation & Machine Learning

I’m hoping to find some interesting things here. Going into my final year, I am working on my own senior project that involves procedural content and story generation, and I’m hoping that some of the ideas of machine learning may carry over into it. The term ‘content’ is rather vague, but for the purposes of this research, I will be looking at level/environment generation along with story/plot generation.

5.1 – Novelty Evaluation

Silvester explores some interesting ideas about how to evaluate the novelty of procedurally generated levels in his paper “Using Novelty Search to Generate Video Game Levels”. He discusses how we can evaluate generated content for its own novelty, or evaluate novelty based on the sequence of actions required to complete a level (“Using Novelty Search to Generate Video Game Levels”). Although his methods do not involve machine learning, I believe there is great potential for integrating his ideas with a learning system. Perhaps we could solve some of the issues that he mentions, such as generated levels being hard to distinguish from simple random generation, by implementing a supervised learning system which is trained ahead of time, using novelty search methods and user feedback, to create levels that are both novel and enjoyable.

5.2 – Enjoyment Potential

Another fairly straightforward task for a machine learning system in procedural content generation is the evaluation of the potential enjoyment associated with generated content. This also opens the possibility of training a system in the creation of enjoyable content, rather than just evaluating it afterwards. Pedersen, Togelius, and Yannakakis used supervised learning to train an algorithm which aims to generate enjoyable levels for Infinite Mario Bros., an open-source version of Super Mario Bros. They created levels with variations in each of the features that their algorithm would control, and then surveyed a test population after they played each variation. This was used as training data for their system, which succeeded in identifying what led to an enjoyable experience in a level, though they did express concerns that their original data set was too small to achieve truly great results (Pedersen, Togelius, and Yannakakis, 2010).

5.3 – Story Implementations

As mentioned previously in Part 2, and similar to the above, machine learning can be used to model a player as they play and feed this model into an interactive story engine to help it choose the arc of the story (Thue, Bulitko, Spetch, and Wasylishen, 2014).

Barber and Kudenko take a different approach to interactive storytelling by basing their generated narratives on dilemmas. Their system, entitled GADIN (Generator of Adaptive Dilemma-based Interactive Narratives), uses information about the story world (including characters), a list of possible actions, and dilemmas to generate story nodes for the viewer to navigate (Barber and Kudenko, 2009). The potential for machine learning here is quite exciting. Beyond incorporating a learning aspect which collects data about and models the player’s real attributes and traits for use in GADIN, this could also be extended into the massively multiplayer online role-playing game (MMORPG) realm. Such a system could collect metrics about all players in the world and generate interactive dilemma-based narratives which are catered to and involve a larger group of players, or perhaps represent a player while they are away, giving out quests to other players based on the metrics collected about them.

References

Silvester, Jim. “Using Novelty Search to Generate Video Game Levels”.

Pedersen, Christopher, Julian Togelius, and Georgios N. Yannakakis. “Modeling player experience for content creation.” Computational Intelligence and AI in Games, IEEE Transactions on 2.1 (2010): 54-67.

Thue, David, Vadim Bulitko, Marcia Spetch, and Eric Wasylishen. “Interactive Storytelling: A Player Modelling Approach.” Association for the Advancement of Artificial Intelligence. Web. 23 June 2014.

Barber, Heather, and Daniel Kudenko. “Generation of adaptive Dilemma-Based interactive narratives.” Computational Intelligence and AI in Games, IEEE Transactions on 1.4 (2009): 309-326.

ML in Games – Part 4

4.0 – ML for Procedural Animation

3D animation poses many challenges that a machine learning system can help address. In games, it is often neither possible nor feasible to pre-animate every possible action for a character, or every transition between actions; this is an area where a lot of research has been done. In the same vein, animation can be a slow process, and systems that can increase an animator’s productivity can be equally useful.

4.1 – Animation Blending

I was looking for, and hoping to find, machine learning applications for animation blending (i.e. connecting animations and poses in a natural, fluid way). However, it seems that this type of work is easier with, and better suited for, procedural-type algorithms. A good, well-implemented algorithm provides a good solution for blending animation and doesn’t generally require any of the benefits that might be gained by using a learning system (improvement over time, improvement from user feedback, etc.).

4.2 – Style-Based IK System

Grochow, Martin, Hertzmann, and Popović created a system which could very well make animators a lot more efficient in a production environment. Their style-based system works by learning segments of animation as a ‘style’ (e.g. a baseball pitch). The learning system uses the animation data from the pitch as input and maps the probability of various poses based on the poses within the captured data. An animator can then animate variances of this style using a very small number of IK handles, and the system can interpolate natural poses and animation from the style space (Grochow, Martin, Hertzmann, and Popović, 2004). Therefore, instead of moving tens or hundreds of individual controls on a character rig, animators can collect a library of styles for this system and use them to produce realistic animations very quickly.

This is a supervised learning system, as it requires training data to create style spaces and starts with knowledge of what the data represents. The algorithm is called a Gaussian Process model; it essentially maps an input x to a y and then uses what is called its kernel function to map the similarities between them. This kernel matrix, along with the GP mapping, is used by the learning algorithm to map the 2D space which is used to extrapolate/interpolate other natural poses for variant animations. From what I gather, the learning algorithm optimizes the 2D space to fit the given data in a way that allows easy, real-time creation of new poses (Grochow, Martin, Hertzmann, and Popović, 2004).

4.3 – Motion Capture Pose Segmentation

Another interesting use for machine learning with respect to animation is in motion capture. I thought this research was worth mentioning because of the section on the style-based IK system. Using some pattern recognition and machine learning processes, we can create a system which automatically looks for and divides motion capture data into distinct actions and poses (Barbič, Jernej, et al., 2004). This type of system, integrated with the style-based IK system, would create an extremely fast and efficient animation pipeline in a production environment.

References

Keith Grochow, Steven L. Martin, Aaron Hertzmann, Zoran Popović. Style-based Inverse Kinematics. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2004), 2004.

Barbič, Jernej, et al. “Segmenting motion capture data into distinct behaviors.”Proceedings of the 2004 Graphics Interface Conference. Canadian Human-Computer Communications Society, 2004.

ML in Games – Part 3

3.0 – ML in Multiplayer Environments

3.1 – Player Modelling

In my previous post I discussed the uses of player modelling in the Forza Motorsport series. Capturing player behaviour allows online games to simulate multiplayer gameplay without the need for both players to be actively involved at the same time. Musick, Bowling, Fürnkranz, and Graepel also mention the potential for this technology in MMORPGs (Massively Multiplayer Online Role-Playing Games), where a large persistent universe may require a character to be present regardless of the availability of their human counterpart (Musick, Bowling, Fürnkranz, and Graepel, 2006).

3.2 – Match-making

Machine learning has great potential in online matchmaking. As discussed in Part 1, players enjoy themselves the most when an enemy AI can create just enough challenge to match their skill level and play style. The same ideas hold true for online play, and it becomes a significant challenge to create a system which can match players and teams together in a way that is satisfying for all.

Xbox Live employs a matchmaking system called TrueSkill™. The system addresses some of the major challenges in multiplayer matchmaking systems: team-based match outcomes need to provide skill information about each individual player, and estimating the skill level of a team requires knowledge of each member’s skill and how the members may act together. The math may be a little beyond me, but essentially the learning part of TrueSkill™ is in how it deals with a player’s skill level. The system uses the outcome of a game, along with each player’s estimated performance within their team, to update the player’s recorded skill level. TrueSkill™ can then use the skill levels of each player to create increasingly even, fair matches that are enjoyable to all members (Herbrich, Minka, and Graepel, 2006).

References

Musick, Ron, Michael Bowling, Johannes Fürnkranz, and Thore Graepel. “Machine learning and games.” Machine Learning 63 (2006): 211-215. SpringerLink. Web. 25 June 2014.

Graepel, Thore, Joaquin Quiñonero Candela, and Ralf Herbrich. “Machine Learning in Games: The Magic of Research in Microsoft Products.” Microsoft Research Cambridge, 1 Jan. 2008. Web. 20 June 2014. <http://research.microsoft.com/en-us/events/2011summerschool/jqcandela2011.pdf>.

Herbrich, Ralf, Tom Minka, and Thore Graepel. “Trueskill™: A Bayesian skill rating system.” Advances in Neural Information Processing Systems. 2006.

ML in Games – Part 2

2.0 – Player Behaviour Capture

Another use of machine learning in video games is player modelling. Creating a model that represents one or more aspects of the player (playstyle, preferences, skill level, etc.) allows the game to adapt to the player, predict the player’s choices, or represent the player in their absence, among many other uses.

2.1 Drivatar™ Example

Forza Motorsport uses machine learning to model and reproduce player behaviour. This allows the computer to represent the player and their racing style in online games without the user being present (“Drivatar™ – Microsoft Research”). The game AI uses a ‘racing line model’, which is a smooth driving path through each segment of track (Graepel, Candela, and Herbrich, 2008). The player modelling process then uses information about how you navigate the turns in a track to train your Drivatar™. The variables used relate to: how consistently you drive; how smoothly you drive through corners; how you brake before a turn and consequently how you enter it; how quickly and accurately you navigate the apex of a corner; and how you exit the corner. The algorithm is trained from 5 races which represent a specific population sample of cars and tracks (“Drivatar™ in Forza Motorsport”).

This is a good example of supervised learning in a video game.

2.2 Storytelling Example

Thue, Bulitko, Spetch, and Wasylishen did a very interesting study on the use of player modelling to create dynamic story lines based on individual preferences. They use the player types defined by Robin D. Laws in his book Robin’s Laws of Good Game Mastering (2001) to evaluate their players. Whenever the player is faced with a decision, each outcome has defined player-type association(s) which are used to update a genome representing the current player’s preferences. The interactive story system then uses this genome to decide which events / plot points should come up next (Thue, Bulitko, Spetch, and Wasylishen, 2014).

This is another example of supervised learning. This implementation does not require training ahead of time, but information about the player must be collected before accurate decisions can be made.
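As a toy sketch of the genome idea (the player-type names and weights here are illustrative, not those from the study):

```javascript
// Running genome of player-type preferences.
var genome = { fighter: 0, storyteller: 0, tactician: 0, powerGamer: 0 };

// Each decision outcome carries pre-defined player-type associations,
// e.g. { fighter: 0.3 }, which are accumulated into the genome.
function recordChoice(associations) {
  for (var type in associations) {
    if (genome.hasOwnProperty(type)) genome[type] += associations[type];
  }
}

// The story engine can then pick the event that best matches the genome.
function pickNextEvent(events) {
  var best = null, bestScore = -Infinity;
  events.forEach(function (ev) {
    var score = 0;
    for (var type in ev.appealsTo) score += (genome[type] || 0) * ev.appealsTo[type];
    if (score > bestScore) { bestScore = score; best = ev; }
  });
  return best;
}
```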

References

“Drivatar™ – Microsoft Research.” Drivatar™ – Microsoft Research. Microsoft Research, n.d. Web. 23 June 2014. <http://research.microsoft.com/en-us/projects/drivatar/>.

“Drivatar™ in Forza Motorsport.” Drivatar in Forza Motorsport. Microsoft Research, n.d. Web. 23 June 2014. <http://research.microsoft.com/en-us/projects/drivatar/forza.aspx>.

Graepel, Thore, Joaquin Quiñonero Candela, and Ralf Herbrich. “Machine Learning in Games: The Magic of Research in Microsoft Products.” Microsoft Research Cambridge, 1 Jan. 2008. Web. 20 June 2014. <http://research.microsoft.com/en-us/events/2011summerschool/jqcandela2011.pdf>.

Thue, David, Vadim Bulitko, Marcia Spetch, and Eric Wasylishen. “Interactive Storytelling: A Player Modelling Approach.” Association for the Advancement of Artificial Intelligence. Web. 23 June 2014.

ML in Games – Part 1

The next step in my research process is to look more specifically at games. My goal in this post is to discover and discuss several situations where machine learning could be useful in a game. I will draw on published studies where available, as well as my own opinions.

1.0 – Adaptive Artificial Intelligence

An important area of focus in many newer computer games is their artificial intelligence (AI) systems. AI systems are used to control non-player characters (NPCs) or other game entities in order to create challenge for the player. As mentioned before, every player is unique, and providing them with just enough challenge relative to their play style and skill level will make the experience more enjoyable (Hagelback and Johansson, 51).

1.1 Some Theory

In general, an AI requires architecture and algorithms, knowledge, and an interface to the environment (Laird and van Lent, 2005). When we start looking at adaptive, learning AI, there are other factors that come into play. Nick Palmer outlines his methods for creating what he calls ‘Learning Agents’ in his online essay Machine Learning in Games Development. His setup involves the following components:

Learning Element – which is responsible for actually changing the behavior of the AI based on its past level of success or failure

Performance Element – which decides which action the AI will take based on its current knowledge of the environment

Performance Analyzer – which judges the actions of the performance element. Palmer clarifies that the judging must be made on the same information that is used by the performance element to make decisions. The analyses made by this element will help decide how, or if, the learning element alters the behaviour of the AI.

Curiosity Element – which understands the goals of the AI and will challenge the performance element to try new behaviour possibilities which may improve the state of the AI with respect to its overall goals. This element helps keep the AI from settling for moderately successful behaviour (like hiding in cover the whole game so as not to die).

In general, developers who use adaptive AI in game development will train the AI and allow it to improve during development, but then capture the best results and build them statically into the final game (Palmer).

1.2 Supervised Learning and AI

Palmer mentions that his proposed system utilizes the ideas of reinforcement learning, and I found that this is often the case with game AI; I wanted to look into why. Thore Graepel explored both methods for two different games in his talk “Learning to Play: Machine Learning in Games”. Supervised learning involves a lot of training and is targeted more at predicting outcomes than generating real-time behaviour (Brownlee, 2013). This type of learning is used to create game-playing AIs for games such as chess and Go because these games have past move sets and results that can be used for training. They are also not real-time games, so the AI has time to think about the best solution (Graepel).

1.4 Behaviour-Based Example

In their paper “Dynamic Game Difficulty Scaling Using Adaptive Behavior-Based AI”, Tan, Tan and Tay explore one way to implement an adaptive AI system using a behaviour-based controller. In essence, they have created a system which gives the AI seven different behaviours. These behaviours can then be activated or deactivated to change the skill level of the AI. A digital chromosome is used to store seven real numbers from 0 to 1 which represent the probability of each behaviour being activated. These values are then updated at run-time to keep the AI competitive but not too difficult. In JavaScript, we could set up a chromosome quite simply with an object; this way you can even assign readable names to each value:
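
For example (the seven behaviour names below are my own placeholders, not the paper's actual behaviours):

```javascript
// A chromosome as a plain object: seven real numbers in [0, 1], one per
// behaviour, each giving the probability that the behaviour is activated.
const chromosome = {
  aggression: 0.5,
  retreat:    0.5,
  takeCover:  0.5,
  flank:      0.5,
  useItems:   0.5,
  guard:      0.5,
  pursue:     0.5,
};

// Roll each behaviour on or off for the coming round.
function activateBehaviours(chrom) {
  const active = {};
  for (const name in chrom) {
    active[name] = Math.random() < chrom[name];
  }
  return active;
}
```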

The idea here is that upon any win or loss, the values within the chromosome are increased or decreased depending on whether they were activated or deactivated during the round. Over time, this process allows the chromosome to adjust in favour of the behaviours that cause the AI to win. Because each behaviour may be activated or deactivated for each round, a properly tuned algorithm will vary in skill around the threshold of the player’s ability (Tan, Tan and Tay, 290-293).
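
A sketch of that update step (the learning rate and the clamping to [0, 1] are my own choices, not taken from the paper):

```javascript
// After each round, nudge every behaviour's activation probability: behaviours
// active in a winning round move up, behaviours active in a losing round move
// down, and inactive behaviours move the opposite way.
function updateChromosome(chrom, active, won, delta = 0.05) {
  const clamp = v => Math.min(1, Math.max(0, v));
  for (const name in chrom) {
    const direction = (active[name] ? 1 : -1) * (won ? 1 : -1);
    chrom[name] = clamp(chrom[name] + direction * delta);
  }
  return chrom;
}
```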

References

Laird, John; van Lent, Michael. “Machine Learning for Computer Games.” Game Developers Conference. Moscone Center West, San Francisco, CA. 10 Mar. 2005. Lecture.

Palmer, Nick. “Machine Learning in Games Development.” Machine Learning in Games Development. N.p., n.d. Web. 20 June 2014. <http://ai-depot.com/GameAI/Learning.html>.

Graepel, Thore. “Learning to Play: Machine Learning in Games.” Microsoft Research Cambridge, n.d. Web. 20 June 2014. <http://www.admin.cam.ac.uk/offices/research/documents/local/events/downloads/tm/06_ThoreGraepel.pdf>.

Brownlee, Jason. “A Tour of Machine Learning Algorithms.” Machine Learning Mastery. N.p., 25 Nov. 2013. Web. 2 June 2014. <http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/>.

Hagelback, J.; Johansson, S.J., “Measuring player experience on runtime dynamic difficulty scaling in an RTS game,” Computational Intelligence and Games, 2009. CIG 2009. IEEE Symposium on , vol., no., pp.46,52, 7-10 Sept. 2009

Chin Hiong Tan; Kay Chen Tan; Tay, A., “Dynamic Game Difficulty Scaling Using Adaptive Behavior-Based AI,” Computational Intelligence and AI in Games, IEEE Transactions on , vol.3, no.4, pp.289,301, Dec. 2011

Tang, H.; Tan, C.H.; Tan, K.C.; Tay, A., “Neural network versus behavior based approach in simulated car racing game,” Evolving and Self-Developing Intelligent Systems, 2009. ESDIS ’09. IEEE Workshop on , vol., no., pp.58,65, March 30 2009-April 2 2009


ML – Some Basics

In this post I will cover some basic machine learning definitions to help me understand things moving forward.

Problem Types

Classification

Here, the goal of the algorithm is to use information about a given item or data instance, and assign it a result or category (Laird, van Lent, 2005).

Clustering

Similar to classification, but the resulting categories are not pre-defined, and multiple data instances are used to define groups (Laird, van Lent, 2005).

Optimization

Modifying function input variables to find the highest or most optimal output result.
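
As a toy illustration (my own example), a hill climber perturbs the input and keeps the change only when the output improves:

```javascript
// Toy hill climbing: repeatedly perturb x and keep the candidate when it
// yields a higher (more optimal) output from f.
function hillClimb(f, x, steps = 5000, stepSize = 0.1) {
  let best = f(x);
  for (let i = 0; i < steps; i++) {
    const candidate = x + (Math.random() * 2 - 1) * stepSize;
    const value = f(candidate);
    if (value > best) {
      x = candidate;
      best = value;
    }
  }
  return x;
}
```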

Algorithm Lingo

Decision Tree

A decision tree represents a process of classification through a multi-step decision-making tree (SAS Institute).
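
A hand-built toy in JavaScript (real trees are learned from data; the features and thresholds here are invented):

```javascript
// Each internal node tests one feature against a threshold; each leaf holds
// a class label. Classification walks from the root down to a leaf.
const tree = {
  feature: "health",
  threshold: 30,
  below: { label: "retreat" },        // health below 30
  above: {
    feature: "enemyDistance",
    threshold: 10,
    below: { label: "attack" },       // enemy closer than 10
    above: { label: "patrol" },
  },
};

function classify(node, instance) {
  if (node.label) return node.label;  // reached a leaf
  const next = instance[node.feature] < node.threshold ? node.below : node.above;
  return classify(next, instance);
}
```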

Rules

With respect to machine learning, a rule is something that a system may define, based on observations, which describes how it should react to a given input.

Neural Network

Neural networks learn by example rather than through explicit programming, by simulating a highly interconnected network of simple units similar to the human brain (Siganos, D. and Stergiou, C.). Large networks can benefit from specialised hardware, though they do not strictly require it.
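
At toy scale this works fine in plain software; for instance, a single simulated neuron (a perceptron) can learn the AND function purely from labelled examples (my own sketch):

```javascript
// A single perceptron: weighted sum plus bias, thresholded at zero. Training
// nudges the weights toward examples it misclassifies (learning by example).
function trainPerceptron(examples, epochs = 50, rate = 0.1) {
  let w = [0, 0];
  let b = 0;
  const predict = x => (w[0] * x[0] + w[1] * x[1] + b > 0 ? 1 : 0);
  for (let e = 0; e < epochs; e++) {
    for (const { x, y } of examples) {
      const err = y - predict(x); // -1, 0, or 1
      w = [w[0] + rate * err * x[0], w[1] + rate * err * x[1]];
      b += rate * err;
    }
  }
  return predict;
}
```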

Model

A model is a data set or structure generated by the computer which is used to represent something. For example, a player model could be a set of variables whose values represent the behaviours of a player.
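
A sketch of that idea (the variable names and event fields are invented):

```javascript
// A player model in miniature: a set of variables whose values summarize
// observed play.
const playerModel = {
  aggression: 0, // fraction of encounters the player initiated
  accuracy: 0,   // shots hit / shots fired
};

// Update the model from a round's worth of observed events.
function updatePlayerModel(model, events) {
  model.aggression = events.initiated / Math.max(1, events.encounters);
  model.accuracy = events.hits / Math.max(1, events.shots);
  return model;
}
```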

References

Laird, John; van Lent, Michael. “Machine Learning for Computer Games.” Game Developers Conference. Moscone Center West, San Francisco, CA. 10 Mar. 2005. Lecture.

“Decision Trees — What Are They?” Statistical Analysis System Institute, n.d. Web. <http://support.sas.com/publishing/pubcat/chaps/57587.pdf>.

Siganos, Dimitrios, and Stergiou, Christos. “Neural Networks.” N.p., n.d. Web. 18 June 2014. <http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html>.

What is Machine Learning?

Before I get into specific machine learning algorithms and ideas, I thought it would be best to look at machine learning from farther back. In this post I am going to wrap my head around the basic definition of machine learning along with some of the different learning styles that machine learning algorithms are categorized by. My hope is that this preliminary research will give me the basic understanding that I need to start looking into more specific ideas and solutions while also maybe focusing my interests for my project.

After my preliminary research I have found that the definition of machine learning varies wildly depending on the field and purpose for which it’s used. Like all algorithms, those used in machine learning take input data and aim to create an accurate output result. In general, what differentiates machine learning algorithms is their ability to ‘learn’ or decipher the model which connects the input data and result and even adjust this model over time to remain as accurate as possible.

Jason Brownlee has some wonderful descriptions of the different learning styles that algorithms use (link). I’ve done some other digging as well and tried to put some of the definitions into my own words as I understand them.

Supervised Learning:

Supervised learning seems to be the most common implementation of machine learning. In this setup, an algorithm uses known data with known results to create a model which describes their relationship. This model is then tested and refined with new data by seeing how accurate the results are. The supervised part comes from this need to ‘train’ the algorithm with test data and refine it before implementation.
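
A miniature example of the idea (my own, not from the sources): a 1-nearest-neighbour classifier, whose "model" is simply the labelled training data and which predicts by reusing the label of the closest known example:

```javascript
// "Train" on labelled (x, label) pairs; predict by finding the closest x
// among the known examples and returning its label.
function nearestNeighbour(training) {
  return function predict(x) {
    let best = training[0];
    for (const example of training) {
      if (Math.abs(example.x - x) < Math.abs(best.x - x)) best = example;
    }
    return best.label;
  };
}
```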

Unsupervised Learning:

The idea here is that the algorithm will attempt to create a model relating input data without having any predefined data sets or predefined responses/results to learn from. This type of algorithm is often used for clustering data and is also very good at finding hidden patterns within data. One of the main differences here is that there is no way for the algorithm to be told whether a result is correct or accurate, which drastically changes the workflow.
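
A toy sketch of the clustering case (my own example): split one-dimensional points into two groups by alternating assign and update steps, with no labels involved:

```javascript
// Toy 2-means clustering on 1-D points: assign each point to the nearer
// centre, then move each centre to the mean of its assigned points.
function twoMeans(points, iterations = 20) {
  let c1 = Math.min(...points);
  let c2 = Math.max(...points);
  for (let i = 0; i < iterations; i++) {
    const g1 = points.filter(p => Math.abs(p - c1) <= Math.abs(p - c2));
    const g2 = points.filter(p => Math.abs(p - c1) > Math.abs(p - c2));
    if (g1.length) c1 = g1.reduce((s, p) => s + p, 0) / g1.length;
    if (g2.length) c2 = g2.reduce((s, p) => s + p, 0) / g2.length;
  }
  return [c1, c2];
}
```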

Semi-Supervised Learning:

This is used when there are both labelled (known) and unlabelled (unknown) data samples.

Reinforcement Learning:

So far, this is the type of machine learning that I am most interested in. Reinforcement learning is when the algorithm is not trained, but given a set of metrics which are used to measure the success of its decisions. This creates an inherent trial-and-error type system, where the algorithm must discover the actions and decisions which yield the best results. It should then favor these actions but must also continue to explore new options to see if there are better outcomes.
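
A miniature sketch of that loop (my own example, an "epsilon-greedy" scheme rather than a full reinforcement learning algorithm): the agent keeps a running value estimate per action, usually exploits the best one, and still explores occasionally:

```javascript
// Epsilon-greedy action selection with incremental value estimates.
function makeBandit(actions, epsilon = 0.1) {
  const value = Object.fromEntries(actions.map(a => [a, 0]));
  const count = Object.fromEntries(actions.map(a => [a, 0]));
  return {
    choose() {
      if (Math.random() < epsilon) {
        // Explore: try a random action in case it yields better outcomes.
        return actions[Math.floor(Math.random() * actions.length)];
      }
      // Exploit: favour the action with the best observed results.
      return actions.reduce((b, a) => (value[a] > value[b] ? a : b));
    },
    reward(action, r) {
      count[action] += 1;
      value[action] += (r - value[action]) / count[action]; // incremental mean
    },
    value,
  };
}
```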

Looking forward to games:

In the coming weeks, I will look at how machine learning is and can be applied to games. After this initial research, I think that reinforcement learning will be the most interesting learning style to explore. It also seems as though its structure lends itself very well to the interactive nature of video games. Reinforcement learning’s ability to explore and adapt makes it a good choice for player-specific decision making. If the algorithm is making decisions which affect the game world, the player will be constantly interacting with that world and providing indirect feedback to the algorithm. This seems like a great environment in which to set up reinforcement-style metrics.


REFERENCES:

“Supervised Learning (Machine Learning) Workflow and Algorithms.” Supervised Learning (Machine Learning) Workflow and Algorithms. MathWorks, n.d. Web. 2 June 2014. <http://www.mathworks.com/help/stats/supervised-learning-machine-learning-workflow-and-algorithms.html>

“Unsupervised Learning.” MathWorks, n.d. Web. 2 June 2014. <http://www.mathworks.com/discovery/unsupervised-learning.html>.

Brownlee, Jason. “What is Machine Learning?” Machine Learning Mastery. N.p., 17 Nov. 2013. Web. 2 June 2014. <http://machinelearningmastery.com/what-is-machine-learning/>.

Brownlee, Jason. “Practical Machine Learning Problems.” Machine Learning Mastery. N.p., 23 Nov. 2013. Web. 2 June 2014. <http://machinelearningmastery.com/practical-machine-learning-problems/>.

Brownlee, Jason. “A Tour of Machine Learning Algorithms.” Machine Learning Mastery. N.p., 25 Nov. 2013. Web. 2 June 2014. <http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/>.

Sutton, Richard, and Andrew Barto. Reinforcement Learning: An Introduction. Cambridge, Massachusetts: The MIT Press, 1998. Print.

Taiwo Oladipupo Ayodele (2010). Types of Machine Learning Algorithms, New Advances in Machine Learning, Yagang Zhang (Ed.), ISBN: 978-953-307-034-6, InTech, DOI: 10.5772/9385. Available from: http://www.intechopen.com/books/new-advances-in-machine-learning/types-of-machine-learning-algorithms