Antimatroid, The

thoughts on computer science, electronics, mathematics

Minesweeper Agent

with one comment

Introduction

Lately I’ve been brushing up on probability, statistics and machine learning and thought I’d play around with writing a Minesweeper agent based solely on these fields. The following is an overview of the game’s mechanics, verification of an implementation, some different approaches to writing the agent and some thoughts on the efficacy of each approach.

Minesweeper

Background

Minesweeper was created by Curt Johnson in the late eighties and later ported to Windows by Robert Donner while at Microsoft. With the release of Windows 3.1 in 1992, the game became a staple of the operating system and has since found its way onto multiple platforms and spawned several variants. The game has been shown to be NP-Complete, but in practice, algorithms can be developed to solve a board in a reasonable amount of time for the most common board sizes.

Specification

Gameplay
An agent, \mathcal{A}, is presented a n \times m grid containing M uniformly distributed mines. The agent’s objective is to expose all the empty grid locations and none of the mines. Information about the mines’ grid locations is gained by exposing empty grid locations which will indicate how many mines exist within a unit (Chebyshev) distance of the grid location. If the exposed grid location is a mine, then the player loses the game. Otherwise, once all empty locations are exposed, the player wins. \textbf{PLAY-GAME}(\mathcal{A}, n, m, M)  \newline \indent H \leftarrow \textbf{INITIALIZE-HIDDEN}(n,m,M)  \newline \indent V \leftarrow \textbf{INITIALIZE-VISIBLE}(n,m,M)  \newline  \newline \indent \textbf{do}  \newline \indent \indent (i,j) \leftarrow \mathcal{A}(V)  \newline \indent \indent \textbf{EXPOSE}(H, V, i, j)  \newline \indent \indent \textbf{if} \quad V_{i,j} = \text{*}  \newline \indent \indent \indent \textbf{return} \quad \textit{Failure}  \newline \indent \textbf{while} \quad M \neq \lvert \textbf{U-LOCATIONS}(V) \rvert  \newline  \newline \indent \textbf{return} \quad \textit{Success}
Initialization
The board consists of hidden and visible states. To represent the hidden, H, and visible state, V, of the board, two character matrices of dimension n \times m are used.

Characters ‘0’-‘8’ represent the number of neighboring mines, character ‘U’ to represent an unexposed grid location and character ‘*’ for a mine.

Neighbors of a grid location (i,j) is the set of grid locations such that 1 \le \lVert (u,v) - (i,j) \rVert_{\infty} \le r. By default, r = 1.

\textbf{INITIALIZE-VISIBLE}(n, m)   \newline \indent \textbf{return} \quad \textbf{INITIALIZE}(n,m, \text{U})

\textbf{INITIALIZE-HIDDEN}(n, m, M)  \newline \indent H \leftarrow \textbf{INITIALIZE}(n,m,\text{0})  \newline  \newline \indent S \leftarrow \emptyset  \newline \indent \textbf{while} \quad \lvert S \lvert < M  \newline \indent \indent S \leftarrow S \cup \textbf{RANDOM}(\textbf{LOCATIONS}(H))  \newline   \newline \indent \textbf{foreach} \quad (i, j) \in S  \newline \indent \indent H_{i, j} = \text{*}  \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(n, m, i, j) \setminus S   \newline \indent \indent \indent H_{u,v} \leftarrow H_{u,v} + 1  \newline  \newline \indent \textbf{return} \quad H

Exposing Cells
The expose behavior can be thought of as a flood fill on the grid, exposing any empty region bordered by grid locations containing mine counts and the boundaries of the grid.

A matrix, F \in \mathbb{Z}^{n \times m}, represents the topography of the board. A value of zero is reserved for sections of the board that have yet to be visited, a value of one for those that have, two for those that are boundaries and three for mines. A stack, S, keeps track of locations that should be inspected.

If a cell location can be exposed, then each of its neighbors will be added to the stack to be inspected. Those neighbors that have already been inspected will be skipped. Once all the reachable grid locations have been inspected, the process terminates.

\textbf{EXPOSE}(H, V, i, j)  \newline \indent \textbf{foreach} \quad (u,v) \in \textbf{LOCATIONS}(H)  \newline \indent \indent \textbf{if} \quad H_{u,v} = \text{0}  \newline \indent \indent \indent F_{u,v} \leftarrow 0  \newline \indent \indent \textbf{if} \quad H_{i,j} = \text{*}  \newline \indent \indent \indent F_{u,v} \leftarrow 3  \newline \indent \indent \textbf{else}  \newline \indent \indent \indent F_{u,v} \leftarrow 2  \newline  \newline \indent \textbf{PUSH}(S, (i, j))  \newline \indent \textbf{do}  \newline \indent \indent (u, v) \leftarrow \textbf{POP}(S)  \newline \indent \indent \textbf{if} \quad F_{u,v} = 0  \newline \indent \indent \indent F_{u,v} \leftarrow 1  \newline \indent \indent \indent \textbf{foreach} \quad (r,s) \in \textbf{NEIGHBORS}(H, u, v)  \newline \indent \indent \indent \indent \textbf{PUSH}(S, (r,s))  \newline \indent \indent \textbf{elseif} \quad F_{u,v} \in (1, 2)  \newline \indent \indent \indent V_{u,v} \leftarrow H_{u, v}  \newline \indent \textbf{while} \quad \textbf{COUNT}(S) > 0

Verification

Methodology

Statistical tests are used to verify the random aspects of the game’s implementation. I will skip the verification of the game’s logic as it requires use of a number of different methods that are better suited for their own post.

There are two random aspects worth thinking about: the distribution of mines and the distribution of success (i.e., not clicking a mine) for random trials. In both scenarios it made since to conduct Pearson’s chi-squared test. Under this approach there are two hypotheses:

  • H_{0}: The distribution of experimental data follows the theoretical distribution
  • H_{a}: The distribution experimental data does not follow the theoretical distribution

H_{0} is accepted when the test statistic, \chi^2, is less than the critical value, \chi_{\alpha}^2. The critical value is determined by deciding on a p-value (e.g., 0.05, 0.01, 0.001), \alpha, that results in the tail area beneath the chi-squared distribution \chi_{k}^2(x) equal to \alpha. k is the degrees of freedom in the observation.

Mine distribution

The first aspect to verify was that mines were being uniformly placed on the board. For a standard 9 \times 9 board with 10 mines, the expectation is that each grid location should be assigned \frac{10}{81} N times for N trials. k = 80 for this experiment.

In the above experiment, \chi^2 = 71.78 and \chi_{0.05}^2 = 101.87. Since \chi^2 < \chi_{0.05}^2, this affirms H_{0} and that the implemented distribution of mines is indeed uniform with a statistical significance of 0.05.

Distribution of successful clicks

The second aspect to verify is that the number of random clicks before exposing a mine follows a hypergeometric distribution. The hypergeometric distribution is appropriate since we are sampling (exposing) without replacement (the grid location remains exposed after clicking). This hypothesis relies on a non-flood-fill exposure.

The distribution has four parameters. The first is the number of samples drawn (number of exposures), the second the number of successes in the sample (number of empty exposures), the third the number of successes in the population (empty grid locations) and the last the size of the population (grid locations): h(nm-M;nm-M,nm-M, nm).

The expected frequencies for the hypergeometric distribution is given by N h(nm - M; nm - M, nm - M, nm) for N trials. k = 70 in this case.

In the above experiment \chi^2 = 47.24 and \chi_{0.05}^2 = 90.53. Since \chi^2 < \chi_{0.05}^2, this affirms H_{0} and that the number of locations exposed prior to exposing a mine follows a hypergeometric distribution with a statistical significance of 0.05.

Also included in the plot is the observed distribution for a flood based exposure. As one might expect, the observed frequency of more exposures decreases more rapidly than that of the non-flood based exposure.

Agents

Methodology

Much like how a human player would learn to play the game, I decided that each model would have knowledge of game’s mechanics and no prior experience with the game. An alternative class of agents would have prior experience with the game as the case would be in a human player who had studied other player’s strategies.

To evaluate the effectiveness of the models, each played against a series of randomly generated grids and their respective success rates were captured. Each game was played on a standard beginner’s 9 \times 9 grid containing between [1, 10] mines.

For those models that refer to a probability measure, \mathbb{P}, it is assumed that the measure is determined empirically and treated as an estimate of the probability of an event and not as an a priori measure.

Marginal Model

Development

The first model to consider is the Marginal Model. It is designed to simulate the behavior of a naive player who believes that if he observes a mine at a grid location that the location should be avoid in future trials.

The model treats the visible board, V, as a matrix of discrete random variables where each grid location is interpreted as either \textit{Empty} or (a) \textit{Mine}. This model picks the grid location with the greatest empirical probability of being empty:

\underset{(i,j)}{\arg\max} \quad \mathbb{P}(X_{i,j} = \textit{Empty})

\textbf{MARGINAL-MODEL}(V)  \newline \indent p_{\textit{max}} \leftarrow \perp  \newline \indent (i, j)_{\textit{max}} \leftarrow \perp  \newline  \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{U-LOCATIONS}(V)  \newline \indent \indent p \leftarrow \mathbb{P}(V_{i,j} = \textit{Empty})  \newline \indent \indent \textbf{if} \quad p > p_{\textit{max}}   \newline \indent \indent  \indent p_{\textit{max}} \leftarrow p  \newline \indent \indent \indent (i,j)_{\textit{max}} \leftarrow (i,j)  \newline  \newline \indent \textbf{return} \quad (i,j)_{\textit{max}}

Test Results

Since the mine distribution is uniform, the model should be equivalent to selecting locations at random. The expected result is that avoiding previously occupied grid locations is an ineffective strategy as the number of mines increases. This does however, provide an indication of what the success rate should look like for chance alone.

Conditional Model

Development

One improvement over the Marginal Model is to take into account the visual clues made visible when an empty grid location is exposed. Since an empty grid location represents the number of neighboring mines, the Conditional Model can look at these clues to determine whether or not an unexposed grid location contains a mine.

This boils down to determining the probability of \mathbb{P}(\textit{Mine} \lvert \textrm{Evidence}). A simplification in calculating the probability is to assume that each piece of evidence is independent. Under this assumption the result is a Naïve Bayes Classifier:

\mathbb{P}( C = c \vert X = x ) = \dfrac{\mathbb{P}(C = c) \prod \mathbb{P}( X_{i} = x_{i} \vert C = c)}{\prod\mathbb{P}(X_{i} = x_{i})}

As in the case of the Marginal Model, the Conditional Model returns the grid location that it has determined has the greatest probability of being empty given its neighbors:

\underset{(i,j)}{\arg\max} \quad \mathbb{P}(C_{i,j} = \textit{Empty} | N(V_{i,j}) )

\textbf{CONDITIONAL-MODEL}(V, r)  \newline \indent C \leftarrow \lbrace \textit{Empty}, \textit{Mine} \rbrace  \newline \indent S \leftarrow \textbf{U-LOCATIONS}(V)  \newline  \newline \indent T \leftarrow \emptyset    \newline \indent \textbf{foreach} \quad (i,j) \in S  \newline \indent \indent N \leftarrow \textbf{NEIGHBORS}(V, i, j, r)  \newline \indent \indent p_{\textit{max}} \leftarrow \perp  \newline \indent \indent c_{\textit{max}} \leftarrow \perp  \newline \indent \indent \textbf{foreach} \quad c \in C  \newline \indent \indent \indent p \leftarrow \mathbb{P}(C = c)  \newline \indent \indent \indent \textbf{foreach} \quad (u,v) \in N  \newline \indent \indent \indent \indent p \leftarrow p * \mathbb{P}( X_{i, j} = V_{i, j} \vert C = c )  \newline \indent \indent \indent \textbf{if} \quad p > p_{\textit{max}}  \newline \indent \indent \indent \indent p_{\textit{max}} \leftarrow p  \newline \indent \indent \indent \indent c_{\textit{max}} \leftarrow c  \newline \indent \indent \textbf{if} \quad c_{\textit{max}} = \textit {Empty}  \newline \indent \indent \indent T \leftarrow T \cup (i, j)  \newline  \newline \indent \textbf{return} \quad \textbf{RANDOM}(T)
Test Results

The Naïve Bayes Classifier is regarded as being an effective approach to classifying situations for a number of different tasks. In this case, it doesn’t look like it is effective at classifying mines from non-mines. The results are only slightly better than the Marginal Model.

Graphical Model

Development

One shortfall of the Conditional Model is that it takes a greedy approach in determining which action to take. A more sophisticated approach is to not just consider the next action, but the possible sequence of actions that will minimize the possibility of exposing a mine.

Each of the possible observable grids, S, can be thought of as a set of vertices in a graph whose corresponding set of edges represent the transition between a state, s, to the next observable state, s^{\prime}. Each transition was achieved by performing an action, a \in A, on the state. The specific action, \alpha : S \times S \to A, is chosen from a subset of permitted actions given the state. Each transition has a probability, \mathbb{P}(s^{\prime} \vert s), of taking place.

It is possible to pick a path, \pi, through this graph that minimizes the risk by assigning a reward, \rho : S \to \mathbb{R}, to each state and attempting to identify an optimal path, \pi^{*}_{s}, from the present state that yields the greatest aggregate reward,

\displaystyle \varrho(\pi) = \sum_{(s, s^{\prime}) \in \pi} \rho(s^{\prime}) \mathbb{P}(s^{\prime} \vert s)

Solving for \pi^{*}_{s} is equivalent to solving the Longest Path Problem and can be computed efficiently using a dynamic programming solution.

\displaystyle \pi_{s}^{*} \gets \underset{\pi_{s}}{\arg\max} \ \sum_{(\sigma, \sigma^{\prime}) \in \pi_{s}} \rho(\sigma^{\prime}) \mathbb{P}(\sigma^{\prime} \vert \sigma) \ \pi_{s} \in \Pi_{s}

\textbf{GRAPHICAL-MODEL}(V)  \newline \indent (a, r)_{\textit{max}} \gets (\bot, \bot)  \newline \indent T \gets \emptyset  \newline  \newline \indent \textbf{foreach} \quad U \in \textbf{SUB-GRIDS}(V)  \newline \indent \indent (a, r)_{U} \gets \textbf{OPTIMAL-ACTION}(U, \bot)  \newline \indent \indent \textbf{if} \quad r_{U} > r_{\textit{max}}  \newline \indent \indent \indent (a,r)_{\textit{max}} \gets (a,r)_{U}  \newline \indent \indent \indent T \gets \emptyset  \newline \indent \indent \textbf{if} \quad r_{U} = r_{\textit{max}}  \newline \indent \indent \indent T \gets T \cup (a,r)_{U}  \newline  \newline \indent (a, r)_{\textit{max}} \gets \textbf{RANDOM}(T)  \newline  \newline \indent \textbf{return} \quad a_{\textit{max}}  \newline  \newline  \textbf{OPTIMAL-ACTION}(V)  \newline \indent (a, r)_{\textit{max}} \gets (\bot, \bot)  \newline  \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{U-LOCATIONS}(V)  \newline \indent \indent \textbf{foreach} \quad V^{\prime} \in \textbf{OUTCOMES}(V, (i,j))  \newline \indent \indent \indent (a, r)_{V^{\prime}} \gets \textbf{OPTIMAL-ACTION}(V^{\prime})  \newline \indent \indent \indent r \gets r_{V^{\prime}} + \mathbb{P}(V^{\prime} \vert V) * \textbf{REWARD}(V^{\prime})  \newline \indent \indent \indent \textbf{if} \quad r \ge r_{\textit{max}}  \newline \indent \indent \indent \indent (a, r)_{\textit{max}} \gets ((i,j), r)  \newline  \newline \indent \textbf{return} \quad (a,r)_{\textit{max}}  \newline  \newline  \textbf{REWARD}(V)  \newline \indent \textbf{if} \quad \textbf{SUCCESS}(V)  \newline \indent \indent \indent \textbf{return} \quad +100  \newline \indent \textbf{if} \quad \textbf{FAILURE}(V)  \newline \indent \indent \indent \textbf{return} \quad -100  \newline  \newline \indent \textbf{return} +1

From the optimal walk, a sequence of optimal actions is determined by mapping \alpha over the path. Taking the first action gives the optimal grid location to expose given the current visible state of the board.

This description constitutes a Markov Decision Process. As is the case for most stochastic processes, it is assumed that the process holds the Markov Property; that future states only depend upon the current states and none of the prior states. In addition to being a Markov Decision Process, this is also an example of Reinforcement Learning.

First thing to observe is that the game state space is astronomical. For a standard beginner’s grid there is at most a sesvigintillion (10^{81}) possible grids that a player can encounter. Which as an aside, is on the order of the number of atoms in the observable universe! The set of actions at each state is slightly more manageable with at most eighty-one actions.

To simplify the state space, I chose to only consider 3 \times 3 boards and when evaluating a full grid, consider the possible sub-grids and evaluate the optimal sequence of actions for each sub-grid and pick the maximum reward associated for each sub-grid that was evaluated as the action to take on the full grid.

Test Results

The Graphical Model produces results that are only a margin better than those of the Conditional Model.

Semi-deterministic Model

Development

The last model I’m going to talk about is a semi-deterministic model. It works by using the visible grid to infer the topology of the hidden grid and from the hidden grid, the topology that the visible grid can become.

The grid can be viewed as a graph. Each grid location is a vertex and an edge is an unexposed grid location’s influence on another grid location’s neighbor mine count.

For each of the exposed grid locations on the board, v_{i,j}, it’s neighbors, N(v_{i,j}), are all mines when the number of inbound edges E_{i,j} = d(v_{i,j}), matches the visible mine count V_{i,j}.

The model produces its inferred version, F, of the influence graph E by using the determined mine locations M.

For each of the grid locations that are exposed and the inferred influence matches the visible count, then each of the neighbors about that location can be exposed provided they are not already exposed and not an inferred mine.

From this set of possibilities, a mine location is chosen. When no mine locations can be determined, then an alternative model can be used.

\textbf{SEMI-DETERMINISTIC-MODEL}(V)  \newline \indent E \leftarrow 0_{n, m}  \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{LOCATIONS}(V)  \newline \indent \indent \textbf{if} \quad V_{i,j} \ne \textit{U}  \newline \indent \indent \indent \textbf{continue}  \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(V, i, j)  \newline \indent \indent \indent E_{u,v} \leftarrow E_{u,v} + 1  \newline  \newline \indent M \leftarrow \textit{False}_{n,m}  \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{LOCATIONS}(V)  \newline \indent \indent \textbf{if} \quad V_{i,j} \in \lbrace \textit{U}, \textit{0} \rbrace \lor V_{i,j} \neq E_{i,j}  \newline \indent \indent \indent \textbf{continue}  \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(V, i, j)  \newline \indent \indent \indent \textbf{if} \quad V_{i,j} = \textit{U}  \newline \indent \indent \indent \indent M_{u,v} \leftarrow \textit{True}   \newline  \newline \indent F \leftarrow 0_{n,m}  \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{LOCATIONS}(V)  \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(V, i, j)  \newline \indent \indent \indent \textbf{if} \quad M_{u,v}  \newline \indent \indent \indent \indent F_{i,j} \leftarrow F_{i,j} + 1  \newline  \newline \indent S \leftarrow \emptyset  \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{LOCATIONS}(V)  \newline \indent \indent \textbf{if} \quad V_{i,j} = \textit{U} \lor F_{i,j} \ne V_{i,j}  \newline \indent \indent \indent \textbf{continue}  \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(V, i, j)  \newline \indent \indent \indent \textbf{if} \quad V_{i,j} \ne \textit{U} \lor M_{u,v}  \newline \indent \indent \indent \indent \textbf{continue}  \newline \indent \indent \indent S \leftarrow S \cup (u, v)  \newline   \newline \indent \textbf{return} \quad \textbf{RANDOM}(S)

Test Results

Since the model is a more direct attempt at solving the board, its results are superior to the previously presented models. As the number of mines increases, it is more likely that it has to rely on a more probabilistic approach.

Summary

Each of the models evaluated offered incremental improvements over their predecessors. Randomly selecting locations to expose is on par with choosing a location based on previously observed mine locations. The Conditional Model and Graphical Model yield similar results since they both make decisions based on conditioned probabilities. The Semi-deterministic Model stands alone as the only one model that produced reliable results.

The success rate point improvement between the Condition and Marginal models is most notable for boards consisting of three mines and the improvement between Graphical and Semi-deterministic models for seven mines. Improvements between Random and Marginal models is negligible and between Conditional and Graphical is minor for all mine counts fewer than seven.

Given the mathematical complexity and nondeterministic nature of the machine learning approaches, (in addition the the complexity and time involved in implementing those approaches) they don’t seem justified when more deterministic and simpler approaches exist. In particular, it seems like most people have implemented their agents using heuristics and algorithms designed to solve constraint satisfaction problems. Nonetheless, this was a good refresher to some of the elementary aspects of probability, statistics and machine learning.

References

Classification – Naïve Bayes.” Data Mining Algorithms in R. Wikibooks. 3 Nov. 2010. Web. 30 Oct. 2011.

Windows Minesweeper.” MinesweeperWiki. 8 Sept. 2011. Web. 30 Oct. 2011.

Kaye, Richard. “Minesweeper Is NP-complete.” [pdf] Mathematical Intelligencer 22.2 (2000): 9-15. Web. 30 Oct. 2011.

Nakov, Preslav, and Zile Wei. “MINESWEEPER, #MINESWEEPER.” 14 May 2003. Web. 14 Apr. 2012.

Richard, Sutton, and Andrew G. Barto. “3.6 Markov Decision Processes.” Reinforcement Learning: An Introduction. Cambridge, Massachusetts: Bradford Book, 1998. 4 Jan. 2005. Web. 30 Oct. 2011.

Rish, Irene “An Empirical Study of the Naive Bayes Classifer.” [pdf] IJCAI-01 Workshop on Empirical Methods in AI (2001). Web. 30 Oct. 2011.

Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Prentice Hall/PearsonEducation., 2003. Print.

Sun, Yijun, and Jian Li. “Adaptive Learning Approach to Landmine Detection.” [pdf] IEEE Transactions of Aerospace and Electronic Systems 41.3 (2005): 1-9. 10 Jan. 2006. Web. 30 Oct. 2011.

Taylor, John R. An introduction to error analysis: the study of uncertainties in physical measurements. Sausalito, CA: University Science Books, 1997. Print.

One Response

Subscribe to comments with RSS.

  1. Reblogged this on Ed George Developer and commented:
    Fantastic post! Worth a read.

    Ed George

    2012-05-03 at 6:45 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: