Antimatroid, The

thoughts on computer science, electronics, mathematics

Tropical Representation of the All-Pairs Shortest Path Problem

Motivation

While I was doing my Abstract Algebra research the other month, I came across an interesting way of simplifying the representation of the all-pairs shortest path problem using Tropical Geometry. I thought it was pretty clever, so I thought I’d do a quick write-up.

Problem Statement

The all-pairs shortest path problem is to identify the minimum path cost, $\Omega(\pi) = \sum_{e \in \pi} \omega(e)$, out of the possible paths $\pi_{i,j} \in \Pi_{i,j}$ between vertices $v_{i}$ and $v_{j}$.

Proposition

Consider a weighted directed graph (digraph), $G = (V, E, \omega)$, consisting of vertices, $V$, and directed edges (arcs), $E \subseteq V \times V$, and a function, $\omega : E \to \overline{\mathbb{R}}_{+}$, yielding the weight of an edge. Only those weights from the positive affinely extended real numbers, $\overline{\mathbb{R}}_{+} = \mathbb{R}_{+} \cup \lbrace \infty \rbrace$, are allowed per the problem statement. The adjacency matrix representation, $D \in \overline{\mathbb{R}}_{+}^{\lvert V \rvert \times \lvert V \rvert}$, of $G$ is given by the following matrix:

$D_{i, j} = \begin{cases} 0 & i = j \\ \omega((v_{i}, v_{j})) & (v_{i}, v_{j}) \in E \\ \infty & \text{otherwise} \end{cases}$

Now, consider a semi-ring over $x, y \in \overline{\mathbb{R}}_{+}$ whose additive operator, $\oplus \in \overline{\mathbb{R}}_{+} \to \overline{\mathbb{R}}_{+}$, is given by the minimum function, $x \oplus y = \min(x,y)$, and whose multiplicative operator, $\odot \in \overline{\mathbb{R}}_{+} \to \overline{\mathbb{R}}_{+}$, is given by addition, $x \odot y = x + y$. The additive unit is given by infinity, $x \oplus \infty = x$, and the multiplicative unit by zero, $x \odot 0 = x$. This semi-ring is the Tropical Semi-ring $\mathbb{T} = \left ( \overline{\mathbb{R}}_{+}, \oplus, \odot \right )$. (The namesake of tropical is in honor of Brazilian Imre Simon who developed this branch of mathematics.)

Linear algebra constructs can be tropicalized to yield the familiar definitions for matrix addition and multiplication for matricies $A, B \in \overline{\mathbb{R}}_{+}^{n \times m}$ and $C \in \overline{\mathbb{R}}_{+}^{m \times n}$.

$\displaystyle \left (A \oplus B \right )_{i, j} = A_{i,j} \oplus B_{i, j}$

$\displaystyle (A \odot C)_{i,j} = \bigoplus_{k}^{m} A_{i, k} \odot C_{k, j}$

Given the two prior statements, the elegant solution to the all-pairs shortest path problem is given by taking powers of the adjacency matrix: $D_{G}^{\odot \lvert V \rvert - 1}$.

Proof

To see how this works out, start with $D_{G}$. The matrix represents the minimum cost between any two adjacent vertices. In other words, the minimum cost for all paths containing a single edge. The next inductive step is to consider paths containing at most two adjacent edges. Squaring the adjacency matrix yields all such paths. When the matrix is squared, each edge is concatenated to all other adjacent edges and the minimum weight of the paths is selected. This thought process can iterated as follows:

$\displaystyle D_{G}^{\odot r} = D_{G}^{\odot r - 1} \odot D_{G}$
$\displaystyle D_{i, j}^{\odot r} = \bigoplus_{k}^{m} D_{i, k}^{\odot r - 1} \odot D_{k, j}$
$\displaystyle D_{i, j}^{\odot r} = \min { \lbrace D_{i, k}^{\odot r - 1} + D_{k, j} \rbrace }$ $\forall k \in [1, m]$

The result is a typical Bellman equation. A graph can have at most $\lvert V \rvert - 1$ edges between any two vertices, thus, the solution to the all-pairs shortest path problem is given by $\displaystyle D_{G}^{\odot \lvert V \rvert - 1}$.

Example

As a worked example, consider the following graph whose set of vertices is given by the set $V = \lbrace a, b, c, d \rbrace$, set of arcs by $E = \lbrace (a,b), (a,c), (a,d), (b,c), (b, d), (c,d) \rbrace$ and weight function, $\omega$, as labeled on the graph.

The all-pairs shortest paths are given by the following calculations where the row and column coordinates correspond to the vertices of $V$. Values in bold denote a change in the shortest path between two vertices.

$D_{G} = \begin{pmatrix}0 & 1 & 8 & 12\\\infty & 0 & 2 & 10\\\infty & \infty & 0 & 3 \\ \infty & \infty & \infty & 0 \end{pmatrix}$ $D_{G}^{\odot 2} = \begin{pmatrix}0 & 1 & \textbf{3} & \textbf{11}\\\infty & 0 & 2 & \textbf{5}\\\infty & \infty & 0 & 3 \\ \infty & \infty & \infty & 0 \end{pmatrix}$ $D_{G}^{\odot 3} = \begin{pmatrix}0 & 1 & 3 & \textbf{6}\\\infty & 0 & 2 & 5\\\infty & \infty & 0 & 3 \\ \infty & \infty & \infty & 0 \end{pmatrix}$

Computational Complexity

From asymptotic standpoint, tropical matrix multiplication is still on the order of traditional matrix multiplication of $\mathcal{O}(\lvert V \rvert^{3} )$. Computing the all-pairs shortest path problem using this approach is on the order of $\mathcal{O}(\lvert V \rvert^{4})$ since we must perform the tropical matrix multiplication $\lvert V \rvert - 1$ times. Now, This can be improved slightly since tropical matrix multiplication is associative, so we can leverage the repeated squaring approach and reduce the time complexity down to $\mathcal{O}(\lvert V \rvert^{3} \log{\lvert V \rvert})$.

The time complexity can be further reduced to $\mathcal{O}(\lvert V \rvert^{3} )$ using the Floyd-Warshall Algorithm, which is another dynamic programming approach that is similar in form to the tropical representation of the problem. In essence, it follows the same base case, but it’s recurrence statement only considers a range of vertices with respect to the two vertices being considered. A more in depth review of the algorithm can be found in the references.

References

Floyd-Warshall’s Algorithm.” Algorithmist. Web. 12 Apr. 2012.

Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. “25.2 The Floyd-Warshall Algorithm.” Introduction to Algorithms. 2nd ed. Cambridge, MA: MIT, 2001. 629-35. Print.

Diestel, Reinhard. Graph theory. Heidelberg New York: Springer, 2010.

Laface, Antonio. Introduction to Tropical Geometry [pdf]. 29 Nov. 2006. Web. 11 Apr. 2012.

Maclagan, Diane, and Bernd Sturmfels. Introduction to Tropical Geometry [pdf]. 4 Nov. 2009. Web. 9 Apr. 2012.

Mohri, Mehryar. “Semiring Frameworks and Algorithms for Shortest-Distance Problems” [pdf]. Journal of Automata, Languages and Combinatorics 7 (2002) 3: 321-50. 8 Aug. 2002. Web. 31 Mar. 2012.

Written by lewellen

2012-06-01 at 8:00 am

Minesweeper Agent

with one comment

Introduction

Lately I’ve been brushing up on probability, statistics and machine learning and thought I’d play around with writing a Minesweeper agent based solely on these fields. The following is an overview of the game’s mechanics, verification of an implementation, some different approaches to writing the agent and some thoughts on the efficacy of each approach.

Minesweeper

Background

Minesweeper was created by Curt Johnson in the late eighties and later ported to Windows by Robert Donner while at Microsoft. With the release of Windows 3.1 in 1992, the game became a staple of the operating system and has since found its way onto multiple platforms and spawned several variants. The game has been shown to be NP-Complete, but in practice, algorithms can be developed to solve a board in a reasonable amount of time for the most common board sizes.

Specification

Gameplay
An agent, $\mathcal{A}$, is presented a $n \times m$ grid containing $M$ uniformly distributed mines. The agent’s objective is to expose all the empty grid locations and none of the mines. Information about the mines’ grid locations is gained by exposing empty grid locations which will indicate how many mines exist within a unit (Chebyshev) distance of the grid location. If the exposed grid location is a mine, then the player loses the game. Otherwise, once all empty locations are exposed, the player wins. $\textbf{PLAY-GAME}(\mathcal{A}, n, m, M) \newline \indent H \leftarrow \textbf{INITIALIZE-HIDDEN}(n,m,M) \newline \indent V \leftarrow \textbf{INITIALIZE-VISIBLE}(n,m,M) \newline \newline \indent \textbf{do} \newline \indent \indent (i,j) \leftarrow \mathcal{A}(V) \newline \indent \indent \textbf{EXPOSE}(H, V, i, j) \newline \indent \indent \textbf{if} \quad V_{i,j} = \text{*} \newline \indent \indent \indent \textbf{return} \quad \textit{Failure} \newline \indent \textbf{while} \quad M \neq \lvert \textbf{U-LOCATIONS}(V) \rvert \newline \newline \indent \textbf{return} \quad \textit{Success}$
Initialization
The board consists of hidden and visible states. To represent the hidden, $H$, and visible state, $V$, of the board, two character matrices of dimension $n \times m$ are used.

Characters ‘0’-‘8’ represent the number of neighboring mines, character ‘U’ to represent an unexposed grid location and character ‘*’ for a mine.

Neighbors of a grid location $(i,j)$ is the set of grid locations such that $1 \le \lVert (u,v) - (i,j) \rVert_{\infty} \le r$. By default, $r = 1$.

$\textbf{INITIALIZE-VISIBLE}(n, m) \newline \indent \textbf{return} \quad \textbf{INITIALIZE}(n,m, \text{U})$

$\textbf{INITIALIZE-HIDDEN}(n, m, M) \newline \indent H \leftarrow \textbf{INITIALIZE}(n,m,\text{0}) \newline \newline \indent S \leftarrow \emptyset \newline \indent \textbf{while} \quad \lvert S \lvert < M \newline \indent \indent S \leftarrow S \cup \textbf{RANDOM}(\textbf{LOCATIONS}(H)) \newline \newline \indent \textbf{foreach} \quad (i, j) \in S \newline \indent \indent H_{i, j} = \text{*} \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(n, m, i, j) \setminus S \newline \indent \indent \indent H_{u,v} \leftarrow H_{u,v} + 1 \newline \newline \indent \textbf{return} \quad H$

Exposing Cells
The expose behavior can be thought of as a flood fill on the grid, exposing any empty region bordered by grid locations containing mine counts and the boundaries of the grid.

A matrix, $F \in \mathbb{Z}^{n \times m}$, represents the topography of the board. A value of zero is reserved for sections of the board that have yet to be visited, a value of one for those that have, two for those that are boundaries and three for mines. A stack, $S$, keeps track of locations that should be inspected.

If a cell location can be exposed, then each of its neighbors will be added to the stack to be inspected. Those neighbors that have already been inspected will be skipped. Once all the reachable grid locations have been inspected, the process terminates.

$\textbf{EXPOSE}(H, V, i, j) \newline \indent \textbf{foreach} \quad (u,v) \in \textbf{LOCATIONS}(H) \newline \indent \indent \textbf{if} \quad H_{u,v} = \text{0} \newline \indent \indent \indent F_{u,v} \leftarrow 0 \newline \indent \indent \textbf{if} \quad H_{i,j} = \text{*} \newline \indent \indent \indent F_{u,v} \leftarrow 3 \newline \indent \indent \textbf{else} \newline \indent \indent \indent F_{u,v} \leftarrow 2 \newline \newline \indent \textbf{PUSH}(S, (i, j)) \newline \indent \textbf{do} \newline \indent \indent (u, v) \leftarrow \textbf{POP}(S) \newline \indent \indent \textbf{if} \quad F_{u,v} = 0 \newline \indent \indent \indent F_{u,v} \leftarrow 1 \newline \indent \indent \indent \textbf{foreach} \quad (r,s) \in \textbf{NEIGHBORS}(H, u, v) \newline \indent \indent \indent \indent \textbf{PUSH}(S, (r,s)) \newline \indent \indent \textbf{elseif} \quad F_{u,v} \in (1, 2) \newline \indent \indent \indent V_{u,v} \leftarrow H_{u, v} \newline \indent \textbf{while} \quad \textbf{COUNT}(S) > 0$

Verification

Methodology

Statistical tests are used to verify the random aspects of the game’s implementation. I will skip the verification of the game’s logic as it requires use of a number of different methods that are better suited for their own post.

There are two random aspects worth thinking about: the distribution of mines and the distribution of success (i.e., not clicking a mine) for random trials. In both scenarios it made since to conduct Pearson’s chi-squared test. Under this approach there are two hypotheses:

• $H_{0}$: The distribution of experimental data follows the theoretical distribution
• $H_{a}$: The distribution experimental data does not follow the theoretical distribution

$H_{0}$ is accepted when the test statistic, $\chi^2$, is less than the critical value, $\chi_{\alpha}^2$. The critical value is determined by deciding on a p-value (e.g., 0.05, 0.01, 0.001), $\alpha$, that results in the tail area beneath the chi-squared distribution $\chi_{k}^2(x)$ equal to $\alpha$. $k$ is the degrees of freedom in the observation.

Mine distribution

The first aspect to verify was that mines were being uniformly placed on the board. For a standard $9 \times 9$ board with $10$ mines, the expectation is that each grid location should be assigned $\frac{10}{81} N$ times for $N$ trials. $k = 80$ for this experiment.

In the above experiment, $\chi^2 = 71.78$ and $\chi_{0.05}^2 = 101.87$. Since $\chi^2 < \chi_{0.05}^2$, this affirms $H_{0}$ and that the implemented distribution of mines is indeed uniform with a statistical significance of $0.05$.

Distribution of successful clicks

The second aspect to verify is that the number of random clicks before exposing a mine follows a hypergeometric distribution. The hypergeometric distribution is appropriate since we are sampling (exposing) without replacement (the grid location remains exposed after clicking). This hypothesis relies on a non-flood-fill exposure.

The distribution has four parameters. The first is the number of samples drawn (number of exposures), the second the number of successes in the sample (number of empty exposures), the third the number of successes in the population (empty grid locations) and the last the size of the population (grid locations): $h(nm-M;nm-M,nm-M, nm)$.

The expected frequencies for the hypergeometric distribution is given by $N h(nm - M; nm - M, nm - M, nm)$ for $N$ trials. $k = 70$ in this case.

In the above experiment $\chi^2 = 47.24$ and $\chi_{0.05}^2 = 90.53$. Since $\chi^2 < \chi_{0.05}^2$, this affirms $H_{0}$ and that the number of locations exposed prior to exposing a mine follows a hypergeometric distribution with a statistical significance of $0.05$.

Also included in the plot is the observed distribution for a flood based exposure. As one might expect, the observed frequency of more exposures decreases more rapidly than that of the non-flood based exposure.

Agents

Methodology

Much like how a human player would learn to play the game, I decided that each model would have knowledge of game’s mechanics and no prior experience with the game. An alternative class of agents would have prior experience with the game as the case would be in a human player who had studied other player’s strategies.

To evaluate the effectiveness of the models, each played against a series of randomly generated grids and their respective success rates were captured. Each game was played on a standard beginner’s $9 \times 9$ grid containing between $[1, 10]$ mines.

For those models that refer to a probability measure, $\mathbb{P}$, it is assumed that the measure is determined empirically and treated as an estimate of the probability of an event and not as an a priori measure.

Marginal Model

Development

The first model to consider is the Marginal Model. It is designed to simulate the behavior of a naive player who believes that if he observes a mine at a grid location that the location should be avoid in future trials.

The model treats the visible board, $V$, as a matrix of discrete random variables where each grid location is interpreted as either $\textit{Empty}$ or (a) $\textit{Mine}$. This model picks the grid location with the greatest empirical probability of being empty:

$\underset{(i,j)}{\arg\max} \quad \mathbb{P}(X_{i,j} = \textit{Empty})$

$\textbf{MARGINAL-MODEL}(V) \newline \indent p_{\textit{max}} \leftarrow \perp \newline \indent (i, j)_{\textit{max}} \leftarrow \perp \newline \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{U-LOCATIONS}(V) \newline \indent \indent p \leftarrow \mathbb{P}(V_{i,j} = \textit{Empty}) \newline \indent \indent \textbf{if} \quad p > p_{\textit{max}} \newline \indent \indent \indent p_{\textit{max}} \leftarrow p \newline \indent \indent \indent (i,j)_{\textit{max}} \leftarrow (i,j) \newline \newline \indent \textbf{return} \quad (i,j)_{\textit{max}}$

Test Results

Since the mine distribution is uniform, the model should be equivalent to selecting locations at random. The expected result is that avoiding previously occupied grid locations is an ineffective strategy as the number of mines increases. This does however, provide an indication of what the success rate should look like for chance alone.

Conditional Model

Development

One improvement over the Marginal Model is to take into account the visual clues made visible when an empty grid location is exposed. Since an empty grid location represents the number of neighboring mines, the Conditional Model can look at these clues to determine whether or not an unexposed grid location contains a mine.

This boils down to determining the probability of $\mathbb{P}(\textit{Mine} \lvert \textrm{Evidence})$. A simplification in calculating the probability is to assume that each piece of evidence is independent. Under this assumption the result is a NaÃ¯ve Bayes Classifier:

$\mathbb{P}( C = c \vert X = x ) = \dfrac{\mathbb{P}(C = c) \prod \mathbb{P}( X_{i} = x_{i} \vert C = c)}{\prod\mathbb{P}(X_{i} = x_{i})}$

As in the case of the Marginal Model, the Conditional Model returns the grid location that it has determined has the greatest probability of being empty given its neighbors:

$\underset{(i,j)}{\arg\max} \quad \mathbb{P}(C_{i,j} = \textit{Empty} | N(V_{i,j}) )$

$\textbf{CONDITIONAL-MODEL}(V, r) \newline \indent C \leftarrow \lbrace \textit{Empty}, \textit{Mine} \rbrace \newline \indent S \leftarrow \textbf{U-LOCATIONS}(V) \newline \newline \indent T \leftarrow \emptyset \newline \indent \textbf{foreach} \quad (i,j) \in S \newline \indent \indent N \leftarrow \textbf{NEIGHBORS}(V, i, j, r) \newline \indent \indent p_{\textit{max}} \leftarrow \perp \newline \indent \indent c_{\textit{max}} \leftarrow \perp \newline \indent \indent \textbf{foreach} \quad c \in C \newline \indent \indent \indent p \leftarrow \mathbb{P}(C = c) \newline \indent \indent \indent \textbf{foreach} \quad (u,v) \in N \newline \indent \indent \indent \indent p \leftarrow p * \mathbb{P}( X_{i, j} = V_{i, j} \vert C = c ) \newline \indent \indent \indent \textbf{if} \quad p > p_{\textit{max}} \newline \indent \indent \indent \indent p_{\textit{max}} \leftarrow p \newline \indent \indent \indent \indent c_{\textit{max}} \leftarrow c \newline \indent \indent \textbf{if} \quad c_{\textit{max}} = \textit {Empty} \newline \indent \indent \indent T \leftarrow T \cup (i, j) \newline \newline \indent \textbf{return} \quad \textbf{RANDOM}(T)$
Test Results

The NaÃ¯ve Bayes Classifier is regarded as being an effective approach to classifying situations for a number of different tasks. In this case, it doesn’t look like it is effective at classifying mines from non-mines. The results are only slightly better than the Marginal Model.

Graphical Model

Development

One shortfall of the Conditional Model is that it takes a greedy approach in determining which action to take. A more sophisticated approach is to not just consider the next action, but the possible sequence of actions that will minimize the possibility of exposing a mine.

Each of the possible observable grids, $S$, can be thought of as a set of vertices in a graph whose corresponding set of edges represent the transition between a state, $s$, to the next observable state, $s^{\prime}$. Each transition was achieved by performing an action, $a \in A$, on the state. The specific action, $\alpha : S \times S \to A$, is chosen from a subset of permitted actions given the state. Each transition has a probability, $\mathbb{P}(s^{\prime} \vert s)$, of taking place.

It is possible to pick a path, $\pi$, through this graph that minimizes the risk by assigning a reward, $\rho : S \to \mathbb{R}$, to each state and attempting to identify an optimal path, $\pi^{*}_{s}$, from the present state that yields the greatest aggregate reward,

$\displaystyle \varrho(\pi) = \sum_{(s, s^{\prime}) \in \pi} \rho(s^{\prime}) \mathbb{P}(s^{\prime} \vert s)$

Solving for $\pi^{*}_{s}$ is equivalent to solving the Longest Path Problem and can be computed efficiently using a dynamic programming solution.

$\displaystyle \pi_{s}^{*} \gets \underset{\pi_{s}}{\arg\max} \ \sum_{(\sigma, \sigma^{\prime}) \in \pi_{s}} \rho(\sigma^{\prime}) \mathbb{P}(\sigma^{\prime} \vert \sigma) \ \pi_{s} \in \Pi_{s}$

$\textbf{GRAPHICAL-MODEL}(V) \newline \indent (a, r)_{\textit{max}} \gets (\bot, \bot) \newline \indent T \gets \emptyset \newline \newline \indent \textbf{foreach} \quad U \in \textbf{SUB-GRIDS}(V) \newline \indent \indent (a, r)_{U} \gets \textbf{OPTIMAL-ACTION}(U, \bot) \newline \indent \indent \textbf{if} \quad r_{U} > r_{\textit{max}} \newline \indent \indent \indent (a,r)_{\textit{max}} \gets (a,r)_{U} \newline \indent \indent \indent T \gets \emptyset \newline \indent \indent \textbf{if} \quad r_{U} = r_{\textit{max}} \newline \indent \indent \indent T \gets T \cup (a,r)_{U} \newline \newline \indent (a, r)_{\textit{max}} \gets \textbf{RANDOM}(T) \newline \newline \indent \textbf{return} \quad a_{\textit{max}} \newline \newline \textbf{OPTIMAL-ACTION}(V) \newline \indent (a, r)_{\textit{max}} \gets (\bot, \bot) \newline \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{U-LOCATIONS}(V) \newline \indent \indent \textbf{foreach} \quad V^{\prime} \in \textbf{OUTCOMES}(V, (i,j)) \newline \indent \indent \indent (a, r)_{V^{\prime}} \gets \textbf{OPTIMAL-ACTION}(V^{\prime}) \newline \indent \indent \indent r \gets r_{V^{\prime}} + \mathbb{P}(V^{\prime} \vert V) * \textbf{REWARD}(V^{\prime}) \newline \indent \indent \indent \textbf{if} \quad r \ge r_{\textit{max}} \newline \indent \indent \indent \indent (a, r)_{\textit{max}} \gets ((i,j), r) \newline \newline \indent \textbf{return} \quad (a,r)_{\textit{max}} \newline \newline \textbf{REWARD}(V) \newline \indent \textbf{if} \quad \textbf{SUCCESS}(V) \newline \indent \indent \indent \textbf{return} \quad +100 \newline \indent \textbf{if} \quad \textbf{FAILURE}(V) \newline \indent \indent \indent \textbf{return} \quad -100 \newline \newline \indent \textbf{return} +1$

From the optimal walk, a sequence of optimal actions is determined by mapping $\alpha$ over the path. Taking the first action gives the optimal grid location to expose given the current visible state of the board.

This description constitutes a Markov Decision Process. As is the case for most stochastic processes, it is assumed that the process holds the Markov Property; that future states only depend upon the current states and none of the prior states. In addition to being a Markov Decision Process, this is also an example of Reinforcement Learning.

First thing to observe is that the game state space is astronomical. For a standard beginner’s grid there is at most a sesvigintillion $(10^{81})$ possible grids that a player can encounter. Which as an aside, is on the order of the number of atoms in the observable universe! The set of actions at each state is slightly more manageable with at most eighty-one actions.

To simplify the state space, I chose to only consider $3 \times 3$ boards and when evaluating a full grid, consider the possible sub-grids and evaluate the optimal sequence of actions for each sub-grid and pick the maximum reward associated for each sub-grid that was evaluated as the action to take on the full grid.

Test Results

The Graphical Model produces results that are only a margin better than those of the Conditional Model.

Semi-deterministic Model

Development

The last model I’m going to talk about is a semi-deterministic model. It works by using the visible grid to infer the topology of the hidden grid and from the hidden grid, the topology that the visible grid can become.

The grid can be viewed as a graph. Each grid location is a vertex and an edge is an unexposed grid locationâ€™s influence on another grid locationâ€™s neighbor mine count.

For each of the exposed grid locations on the board, $v_{i,j}$, it’s neighbors, $N(v_{i,j})$, are all mines when the number of inbound edges $E_{i,j} = d(v_{i,j})$, matches the visible mine count $V_{i,j}$.

The model produces its inferred version, $F$, of the influence graph $E$ by using the determined mine locations $M$.

For each of the grid locations that are exposed and the inferred influence matches the visible count, then each of the neighbors about that location can be exposed provided they are not already exposed and not an inferred mine.

From this set of possibilities, a mine location is chosen. When no mine locations can be determined, then an alternative model can be used.

$\textbf{SEMI-DETERMINISTIC-MODEL}(V) \newline \indent E \leftarrow 0_{n, m} \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{LOCATIONS}(V) \newline \indent \indent \textbf{if} \quad V_{i,j} \ne \textit{U} \newline \indent \indent \indent \textbf{continue} \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(V, i, j) \newline \indent \indent \indent E_{u,v} \leftarrow E_{u,v} + 1 \newline \newline \indent M \leftarrow \textit{False}_{n,m} \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{LOCATIONS}(V) \newline \indent \indent \textbf{if} \quad V_{i,j} \in \lbrace \textit{U}, \textit{0} \rbrace \lor V_{i,j} \neq E_{i,j} \newline \indent \indent \indent \textbf{continue} \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(V, i, j) \newline \indent \indent \indent \textbf{if} \quad V_{i,j} = \textit{U} \newline \indent \indent \indent \indent M_{u,v} \leftarrow \textit{True} \newline \newline \indent F \leftarrow 0_{n,m} \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{LOCATIONS}(V) \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(V, i, j) \newline \indent \indent \indent \textbf{if} \quad M_{u,v} \newline \indent \indent \indent \indent F_{i,j} \leftarrow F_{i,j} + 1 \newline \newline \indent S \leftarrow \emptyset \newline \indent \textbf{foreach} \quad (i,j) \in \textbf{LOCATIONS}(V) \newline \indent \indent \textbf{if} \quad V_{i,j} = \textit{U} \lor F_{i,j} \ne V_{i,j} \newline \indent \indent \indent \textbf{continue} \newline \indent \indent \textbf{foreach} \quad (u,v) \in \textbf{NEIGHBORS}(V, i, j) \newline \indent \indent \indent \textbf{if} \quad V_{i,j} \ne \textit{U} \lor M_{u,v} \newline \indent \indent \indent \indent \textbf{continue} \newline \indent \indent \indent S \leftarrow S \cup (u, v) \newline \newline \indent \textbf{return} \quad \textbf{RANDOM}(S)$

Test Results

Since the model is a more direct attempt at solving the board, its results are superior to the previously presented models. As the number of mines increases, it is more likely that it has to rely on a more probabilistic approach.

Summary

Each of the models evaluated offered incremental improvements over their predecessors. Randomly selecting locations to expose is on par with choosing a location based on previously observed mine locations. The Conditional Model and Graphical Model yield similar results since they both make decisions based on conditioned probabilities. The Semi-deterministic Model stands alone as the only one model that produced reliable results.

The success rate point improvement between the Condition and Marginal models is most notable for boards consisting of three mines and the improvement between Graphical and Semi-deterministic models for seven mines. Improvements between Random and Marginal models is negligible and between Conditional and Graphical is minor for all mine counts fewer than seven.

Given the mathematical complexity and nondeterministic nature of the machine learning approaches, (in addition the the complexity and time involved in implementing those approaches) they don’t seem justified when more deterministic and simpler approaches exist. In particular, it seems like most people have implemented their agents using heuristics and algorithms designed to solve constraint satisfaction problems. Nonetheless, this was a good refresher to some of the elementary aspects of probability, statistics and machine learning.

References

Classification – NaÃ¯ve Bayes.” Data Mining Algorithms in R. Wikibooks. 3 Nov. 2010. Web. 30 Oct. 2011.

Windows Minesweeper.” MinesweeperWiki. 8 Sept. 2011. Web. 30 Oct. 2011.

Kaye, Richard. “Minesweeper Is NP-complete.” [pdf] Mathematical Intelligencer 22.2 (2000): 9-15. Web. 30 Oct. 2011.

Nakov, Preslav, and Zile Wei. “MINESWEEPER, #MINESWEEPER.” 14 May 2003. Web. 14 Apr. 2012.

Richard, Sutton, and Andrew G. Barto. “3.6 Markov Decision Processes.” Reinforcement Learning: An Introduction. Cambridge, Massachusetts: Bradford Book, 1998. 4 Jan. 2005. Web. 30 Oct. 2011.

Rish, Irene “An Empirical Study of the Naive Bayes Classifer.” [pdf] IJCAI-01 Workshop on Empirical Methods in AI (2001). Web. 30 Oct. 2011.

Russell, Stuart J., and Peter Norvig. Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Prentice Hall/PearsonEducation., 2003. Print.

Sun, Yijun, and Jian Li. “Adaptive Learning Approach to Landmine Detection.” [pdf] IEEE Transactions of Aerospace and Electronic Systems 41.3 (2005): 1-9. 10 Jan. 2006. Web. 30 Oct. 2011.

Taylor, John R. An introduction to error analysis: the study of uncertainties in physical measurements. Sausalito, CA: University Science Books, 1997. Print.

Written by lewellen

2012-05-01 at 8:00 am

Multi-agent ordering

Given a set of agents $A = \{ a_{0}, a_{1}, \dotsc, a_{n} \}$ and a set of entities $V = \{ v_{0}, v_{1}, \dotsc, v_{m} \}$, agents independently provide preferences $I_a \colon V^{2} \to \{ -1, 0, +1\}$ between different pairs of entities for a subset $S_{a}$ of the possible $E = V^{2}$ pairs. From these preferences, we wish to form a poset that contains the entities in order of least preferred to most preferred.

An agent may prefer an individual pair $e_{i,j}$ in the following ways: $-1$, the agent prefers $v_{j}$ to $v_{i}$; $0$, the agent is indifferent; $+1$, the agent prefers $v_{i}$ to $v_{j}$. Let $G_{a} = (V, S_{a})$ be the undirected graph that represents the preferences of an individual agent.

To form a consensus we are interested in $G= \sum_{a \in A} G_{a}$. Let the adjacency matrix $M$ represent $G$. In doing so, we may define $M$ in the following way: $M_{i,j} = \sum_{a \in A} I_a (e_{i,j})$. It should be evident that $M_{i,i} = 0$ and $M_{i,j} = -M_{j,i}$. We wish to simplify $M$ by turning it into a directed graph whereby all negative entries are removed. We shall define $N_{i,j} = \max{(0, M_{i,j})}$ to be this directed graph.

Let $deg_{j}^{in} = \sum_{i = 0}^{m} N_{i, j}$ be the sum of weights into vertex $v_{j}$, $deg_{i}^{out} = \sum_{j = 0}^{m} N_{i,j}$ be the sum of weights out of vertex $v_{i}$ and $\delta_i = (deg_{i}^{in}, deg_{i}^{out})$ be the set of tuples for each vertex in $V$. Thus, let $\delta$ along with the relation: increasing first element, then by decreasing second element be the poset $P$ of entities order from least preferred to most preferred.

Consider the following example: on a merchant’s website, customers $A = \{a,b,c,d,e,f\}$ independently provide comparisons between products that the merchant sells. The merchant wishes to know his customers’ preferences about his products $V = \{1,2,3,4,5,6\}$.

Customers have provided the following set of preferences:

The consensus graph is produced from these preferences, followed by the directed graph until $\delta = \begin{pmatrix} (1,4) & (4,3) & (2,0) & (1,4) & (0,2) & (4,0) \end{pmatrix}$ is computed, which, when used to form $P$, will yield $P = \{6,2,3,1,4,5\}$. The merchant can infer from this analysis that customers least preferred product is 6, and most preferred product is 5.

Now, this model assumes that the agents are being truthful in their preferences. It is conceivable that a consortium of agents could force an entity to become the least or most preferred entity by providing false preferences. This premise presents two questions: how do we identity fallacious behavior, and two: how do we prevent it. The answer to the later of these two questions is simple: we add an additional term $\tau_a \in [0,1]$ when constructing $M_{i,j} = \sum_{a \in A} (1 - \tau_a) I_a (e_{i,j})$ where $0$ indicates that the agent provides factual preferences and $1$ indicates non factual preferences. The prior of these two questions is more involved and ties into calculating $\tau_a$.

Lets consider two plausible hypotheses: a single agent tells the truth $\phi_{a}$ or he does not tell the truth about a single comparison $e$. An agent is telling the truth about a single comparison if his preference $I_{a}(e)$ is identical to the majority preference $\sum_{a \in A} I_{a}(e)$. An agent is not telling the truth if his preference is not identical to the majority preference. Let $\phi_{a}(e) = 1$ when $I_{a}(e)\sum_{\alpha \in A} I_{\alpha}(e) < 0$, $0$ otherwise formalize this membership. Thus, we may define $\displaystyle \tau_{a} = \frac{\sum_{e \in S_{a}} \phi_{a}(e)}{ \lvert S_{a} \rvert }$ as the probability that a’s comparisons are majority comparisons.

This formalization assumes that there is an objective comparison and naturally, this formalization would not hold for subjective comparisons.

Consider the following example: seven robots are asked to order a collection of five blocks from smallest to largest and the result is an incorrectly ordered list. We wish to identify the defective robot(s), remove their preferences and reproduce the corrected list. The following table summarizes each robot’s preferences starting from the top left to right then down to the bottom left to right.

Going through the process above we arrive at an order of $\{0, 1, 3, 2, 4\}$ which is incorrect, thus we look at what our values are for this first iteration: $\tau_{a}^{(0)} = \begin{pmatrix}0 & 0.25 & 0 & 0 & 0 & 0.\overline{1} & 0\end{pmatrix}$. Robots $a_{1}, a_{5}$ are more likely than their piers to deviate from the established majority. So, applying these values, we continue to perform this process iteratively until we arrive at $\{0,1,2,3,4\}$ using values of $\tau_{a}^{(2)} = \begin{pmatrix}0 & 0.41\overline{6} & 0 & 0 & 0 & 0.\overline{2} & 0 \end{pmatrix}$. This process can be visualized the iterations of $N$ below. Edge weights represent the consensus. Thicker lines for the majority, thinner for the minority.

While not previously alluded to, it is possible that it may take several iterations to achieve a correct ordering. Which raises a couple questions: under what conditions will the correct ordering exist, how many iterations will it take to reach the correct ordering and are the number of iterations finite. The last one is left to the reader.

As a thought experiment, it is evident that there has to be at least $m - 1$ preferences for a correct ordering to be produced. Starting at any one vertex, there exists a path to every other vertex, otherwise a consensus cannot be achieved because $G$ could be decomposed into a set of disjoint subgraphs and no ordering is possible.

If there is an objective order $P^{\star}$, it is still possible to arrive at a separate order $P^{\prime}$ that is produced after successive iterations if the majority preferences are all incorrect with respect to $P^{\star}$. Thus, we cannot guarantee that the process will converge. If the preferences are correct with respect to $P^{\star}$, then convergence will be achieved as the weight of dissenting agent’s preferences tends towards zero.

Written by lewellen

2010-01-01 at 8:00 am

Posted in Graph Theory

Tagged with