# Antimatroid, The

thoughts on computer science, electronics, mathematics

## A Greedy Approximation Algorithm for the Linear Assignment Problem

Starting today, I will be posting some of the related source code for articles on GitHub.

### Introduction

The Linear Assignment Problem (LAP) is concerned with uniquely matching an equal number of workers to tasks, $n$, such that the overall cost of the pairings is minimized. A polynomial time algorithm was developed in the late fifties by [6], and further refined by [9], called the Hungarian method. Named so after the work of Hungarian mathematicians König and Egerváry whose theorems in the 1930s form the basis for the method. While the Hungarian Method can solve LAP instances in $\mathcal{O}\left(n^3\right)$ time, we wish to find faster algorithms even if it means sacrificing optimality in the process. Here we examine a greedy $\alpha$-approximation algorithm with $\mathcal{O}\left(n^2 \log n \right)$ runtime in terms of its approximation factor and compare it empirically to the Hungarian method.

### Linear Assignment Problem

\displaystyle \begin{aligned} C_n = \min & \sum_{i=1}^{n} \sum_{j=1}^{n} M_{i,j} x_{i,j} \\ s.t. & \sum_{i=1}^{n} x_{i,j} = 1, \quad j = 1, \ldots, n \\ & \sum_{j=1}^{n} x_{i,j} = 1, \quad i = 1, \dots, n \label{eqn:lap} \end{aligned}

The above linear program has cost, $M \in \mathbb{Z}_{+}^{n \times n}$, and assignment, $x \in \lbrace 0,1 \rbrace^{n \times n}$, matrices that specify the terms of the LAP. This is equivalent to finding a perfect matching in a weighted bipartite graph. A minimal cost may have several possible assignments, but we are only interested in finding just one. It is assumed that no one worker can do all jobs more efficiently by themselves than the distributing work across all workers. Likewise, if the costs are thought of as durations, then the minimum cost is the minimum sequential rather than parallel time taken to complete the tasks.

From a practical point of view, we may relax the integral constraint on $M$ and allow all positive real-valued costs. For instances where there are more jobs than workers, and vice versa, dummy entries valued greater than the existing maximum may be added. Minimizing the cost is the default objective, but the maximum cost can be found by finding the optimal assignment for $M^{\prime}_{i,j} = M_{max} - M_{i,j}$, then finding the cost relative to $M$.

### Algorithms

Brute Force Rather than using the mathematical programming or graph theoretic representation of the problem, we can instead view the problem as finding the assignment that minimizes the cost out of all possible assignments:

$\displaystyle \pi^{*} = \underset{\pi \in \Pi_n}{\arg\min} \sum_{i=1}^{n} M_{i, \pi_i}$

There are $n!$ such assignments which can be produced using an iterative version of Heap’s algorithm [5] in $\mathcal{O}\left(n!\right)$ time assuming one does differential scoring (opposed to calculating the score for each permutation which would result in an $\mathcal{O}\left(n^2 (n-1)!\right)$ algorithm.)

Random The random algorithm selects a permutation $\pi \in \Pi_n$ uniformly from the set of all possible assignment permutations in $\mathcal{O}\left(n\right)$ time using the Fisher-Yates shuffle [4]. This obviously does not produce an optimal or near-optimal solution, but serves as a straw man to compare other results.

Greedy The greedy heuristic continues to cover the row and column of the smallest uncovered entry in the cost matrix until all entries are covered. The resulting set of entries then constitutes the assignment of workers to jobs. An inefficient $\mathcal{O}\left(n^3\right)$ algorithm can be used to find the smallest entry every iteration, or a more efficient result of $\mathcal{O}\left(n^2 \log n\right)$ can be obtained through the use of a sorted, array indexed hybrid mesh and queue. Let $\texttt{QNode}$ represent a tuple consisting of row, column, and value; the previous entry in the matrix $\le$ this value, and the next entry in this matrix $\ge$ this value; and the $\texttt{QNode}s$ (left, above, right, below) that are adjacent to this node.

Algorithm 1 A greedy algorithm for the LAP.

• $\textbf{procedure } \textsc{Greedy}(M)$
• $A[i] \gets \bot \text{ for } i = 0 \ldots n - 1$
• $Q[i] \gets \texttt{QNode} \text{ for } i = 0 \ldots n^2 - 1$
• $\textsc{LinkMesh}(Q)$ // Adjacent node left, above, right, below properties
• $\textsc{Sort}(Q)$ // Sort in ascending order by node value
• $\textsc{LinkQueue}(Q)$ // Adjacent node previous and next properties
• $\qquad Q_{min} \gets Q[0]$
• $\textbf{while } Q_{min} \neq nil \textbf{ do}$
• $A[ Q_{min} \rightarrow row ] \gets Q_{min} \rightarrow col$
• $Q_{min} \gets \textsc{DeleteNode}(Q, Q_{min})$ // Deletes row and col of $Q_{min}$
• $\textbf{end while}$
• $\qquad \textbf{return } A$

Allocating and linking for assignment is $\mathcal{O}\left(n\right)$; mesh $\mathcal{O}\left(n^2\right)$; queue $\mathcal{O}\left(2n^2\log n + n^2\right)$. Therefore, initialization requires $\mathcal{O}\left(n^2 \log n\right)$ time. The body of the loop requires a constant time assignment of worker to job, and $\mathcal{O}\left(2k - 1\right)$ time to remove the row and column from a $k \times k$ matrix using a modified depth first search. Thus, the loop itself accounts for $\mathcal{O}\left(n^2\right)$ time. The resulting time complexity is therefore $\mathcal{O}\left(n^2 \log n\right) \square$.

$\displaystyle \begin{pmatrix} 62 & 31 & 79 & \fbox{6} & 21 & 37 \\ 45 & 27 & 23 & 66 & \fbox{9} & 17 \\ 83 & 59 & 25 & 38 & 63 & \fbox{25} \\ \fbox{1} & 37 & 53 & 100 & 80 & 51 \\ 69 & \fbox{72} & 74 & 32 & 82 & 31 \\ 34 & 95 & \fbox{61} & 64 & 100 & 82 \\ \end{pmatrix} \quad \begin{pmatrix} 62 & 31 & 79 & \fbox{6} & 21 & 37 \\ 45 & 27 & 23 & 66 & \fbox{9} & 17 \\ 83 & 59 & \fbox{25} & 38 & 63 & 25 \\ \fbox{1} & 37 & 53 & 100 & 80 & 51 \\ 69 & 72 & 74 & 32 & 82 & \fbox{31} \\ 34 & \fbox{95} & 61 & 64 & 100 & 82 \\ \end{pmatrix}$

Breaking ties for the minimum uncovered value can result in different costs. This drawback is shown in the above example were choosing $25$ at $(3,6)$ yields a minimum cost of $174$, where as the one at $(3, 3)$ gives a minimum cost of $167$. The next progression in the design of the greedy algorithm would be to try all minimum positions and keep the top $k$ performing paths.

Hungarian The general idea behind the Kuhn-Munkres algorithm is that if we are given an initial assignment, we can make further assignments and potentially reassign workers until all workers have been tasked with a job. The high-level sketch of the algorithm starts with an initial assignment. While we have jobs that are unassigned, we look for qualified workers, ie, the zero entries. If a worker is already assigned to a job, but is also qualified for another, then we prime the alternative and continue to the next qualified worker, but if that is the only job the worker is qualified for, then we’d like to reassign any other worker already tasked to that job. This leads to a natural ripple effect represented by an alternating path of starred and primed entries. In Munkres’ paper [9] “starred” zero’s represent assignments of workers to jobs, and “primed” zero’s are alternative assignments. By flipping the bits of the path, we reassign workers to their alternative tasks while ensuring the assignment continues to be minimal by construction. After assigning as many workers as we have to, we then deduct the lowest cost to create a new qualified worker. Thus, every iteration we are guaranteed to make positive progress towards our goal of finding an optimal assignment. This scheme results in the worst case $\mathcal{O}\left(n^3\right)$ time to complete.

Algorithm 2 The Hungarian method for the LAP.

• $\textbf{procedure } \textsc{HungarianMethod}(M)$
• $M_{i,j} \gets M_{i,j} - \min_j M_{i,j} \text{ for } i = 0 \ldots n - 1$
• $M_{i,j} \gets M_{i,j} - \min_i M_{i,j} \text{ for } j = 0 \ldots n - 1$
• Star the first uncovered zero in row $i$, cover the corresponding column $j$ for $i = 0 \ldots n - 1$
• $\textbf{while }$ All columns not covered
• $\textbf{while }$ Uncovered zeros
• Prime the current uncovered zero
• $\textbf{if }$ There’s a starred zero in this row
• Uncover the starred zero’s column and cover the row
• $\textbf{else }$
• Find an alternating augmented path from the primed zero
• Unstar the starred zeros on the path and star the primed zeros on the path
• Remove all the prime markings and cover all stared zeros
• $\textbf{break}$
• $\textbf{end if}$
• $\textbf{end while}$
• $\textbf{if }$ Found path
• $\textbf{continue}$
• $\textbf{end if}$
• $M^* = \min M_{i,j}$ over all uncovered $i, j$
• $M_{i,j} = M_{i,j} - M^*$ for all uncovered columns $j$
• $M_{i,j} = M_{i,j} + M^*$ for all covered rows $i$
• $\textbf{end while }$
• $\textbf{return}$ Starred zeros // These are all the assignments
• $\textbf{end procedure}$

To further illustrate the algorithm, consider the following example where starred entries are denoted by red, and primed entries by green:

### Analysis

The prevailing convention in the literature is to look at the approximation factor, $\alpha$, to determine how close the results of an approximation algorithm are to optimal [10]. Here this ratio is the expected minimum cost assignment of the algorithm under test to the same quantity given by the expected minimum assignment cost. Let $M_{i,j} \sim \text{Exp}(1)$ be an $n \times n$ a standard exponential random cost matrix. We resort to the exponential distribution for its ease of analyis and prominence in related literature. Cf. the works of [7], [8] for analysis based on $M_{i,j} \sim \mathcal{U}(0,1)$.

Exponential Distribution Properties Let $X \sim \text{Exp}(\lambda)$ have cumulative distribution function $F_X(x) = 1 - \exp{\left(-\lambda x\right)}$ and expectation $\mathbb{E}(X) = \lambda^{-}$. The distribution demonstrates the memoryless property for expectations $\mathbb{E}(X \lvert X > a) = \mathbb{E}(X) + a$. Define the order statistic $X_{1:n} = \min \lbrace X_{1}, \ldots, X_{n} \rbrace$ to be the minimum of $n$ draws from $\text{Exp}(\lambda)$. $X_{1:n} \sim \text{Exp}(n \lambda)$ [2] with expectation $\mathbb{E}(X_{1:n}) = \left(n \lambda\right)^-$. If $Y_n = \sum_{i = 1}^{n} X_i$ then $Y_n \sim \text{Gamma}(n, \lambda)$ with expectation $\mathbb{E}(Y_n) = n \lambda^{-}$.

Expected Minimum Cost The expected minimum assignment cost for $M$ is given by [1]:

$\displaystyle \mathbb{E}(C_n) = \sum_{k = 1}^{n} \frac{1}{k^2} = H_{n}^{(2)}$

Which is the generalized harmonic number of order two and converges to $\zeta(2) = \pi^2/6$. For the generalized harmonic numbers, $H_{n}^{(k)}$, $\lim_{k\to\infty} H_{n}^{(k)} = \zeta(k)$ for $k > 1$.

Greedy The minimum value of an $n \times n$ matrix is given by the order statistic $M_{1:n^2}$ with expectation $\mathbb{E}(M_{1:n^2}) = n^{-2}$. The expected value of the minimum cost assignment is not just $\sum_{i=0}^{n-1} (n-i)^{-2}$ because the expectation doesn’t take into account the previous iteration’s minimum value. To accomplish this we make use of the memoryless property of the exponential distribution to observe that the expected difference in minimums between iterations is the expected minimum value given by $M_{i:k^2}$. If we add up all these differences we get the expected minimum value of the k’th iteration; summing all these expectations then yields the expected minimum cost assignment:

$\displaystyle \mathbb{E}(C_n) = \sum_{i=0}^{n-1} \sum_{j=0}^{i} \frac{1}{(n - j)^2} = \sum_{j=0}^{n-1} \frac{(n-j)}{(n-j)^2} = \sum_{j=0}^{n-1} \frac{1}{n-j} = H_n$

This is the harmonic number of order one which does not converge. The resulting approximation factor is:

$\displaystyle \alpha_n = \frac{H_n}{H_n^{(2)}}$

Random The random algorithm will simply select an assignment permutation, so we are just adding up $n$ $\text{Exp}(1)$ distributed random variables leading to an expected cost of:

$\displaystyle \mathbb{E}(C_n) = \sum_{i=1}^n \mathbb{E}(M_{i, \pi_i}) = n$

And approximation factor:

$\displaystyle \alpha_n = \frac{n}{H_n^{(2)}}$

From this analysis one concludes that the greedy algorithm has an unbounded approximation factor that grows significantly slower than that of randomly selecting assignments.

### Evaluation

To illustrate the preceding results, Figure 1 shows the approximation factor for the greedy algorithm implementations against the derived approximation factor. The simulated results are based on 120 $n \times n$ standard exponentially distributed matrices for $1 \le n \le 1000$. Using the same conventions for the approximation factor, Figure 2 illustrates the runtime characteristics of the algorithms after rejecting outliers due to system fluctuations. Results obtained from source code compiled with -O3 flags and ran on a Xeon E3-1245 v5 3.5 Ghz system with 32 GBs of 2133Mhz DDR4 RAM. The algorithms coincide with the theoretical time complexities as shown in Table 2.

Solver MSE
GREEDY-EFFICIENT 0.002139
GREEDY-NAIVE 0.014161
HUNGARIAN 0.232998
Table 1: Mean square error of fitted model to mean runtime for each solver. Models given by the corresponding time complexity. Fit by Levenberg-Marquardt.

### Summary

Brute Random Greedy Hungarian
Complexity $\mathcal{O}\left(n!\right)$ $\mathcal{O}\left(n\right)$ $\mathcal{O}\left(n^2 \log n\right)$ $\mathcal{O}\left(n^3\right)$
$\alpha_n$ 1 $n / H_n^{(2)}$ $H_n / H_n^{(2)}$ 1
Table 2: Merits of each approach.

Exact solutions can be delivered by the brute method when a handful of workers are being considered, and the Hungarian method should be considered for all other instances. Approximate solutions can be provided by the greedy algorithm with logarithmic degeneracy while providing a linear factor improvement over the Hungarian method. For inputs greater than those considered, the parallel Auction algorithm [3] is a suitable alternative and the subject of future work.

### References

1. Aldous, D. J. The $\zeta(2)$ limit in the random assignment problem. Random Structures & Algorithms 18, 4 (2001), 381-418.
2. Balakrishnan, N., and Rao, C. Handbook of statistics 16: Order statistics-theory and methods, 2000.
3. Bertsekas, D. P. The auction algorithm: A distributed relaxation method for the assignment problem. Annals of operation research 4, 1 (1988), 105-123.
4. Durtenfeld, R. Algorithm 235: random permutation. Communications of the ACM 7, 7 (1964), 420.
5. Heap, B. Permutations by interchanges. The Computer Journal 6, 3 (1963), 293-298.
6. Kuhn, H. W. The hungarian method for the assignment problem. Naval research logistics quarterly 2, 1-2 (1955), 83097.
7. Kurtzberg, J. M. On approximation methods for the assignment problem. Journal of the ACM (JACM) 9, 4 (1962), 419-439.
8. Steele, M. J. Probability and statistics in the service of computer science: illustrations using the assignment problem. Communications in Statistics-Theory and Methods 19, 11 (1990), 4315-4329.
9. Munkres, J. Algorithms for the assignment and transportation problems. Journal of the society for industrial and applied mathematics 5, 1 (1957), 32-38.
10. Williamson, D. P., and Shmoys, D. B. The design of approximation algorithms. Cambridge university press, 2011.

Written by lewellen

2017-03-21 at 11:12 am

## Expected Maximum and Minimum of Real-Valued Continuous Random Variables

### Introduction

This is a quick paper exploring the expected maximum and minimum of real-valued continuous random variables for a project that I’m working on. This paper will be somewhat more formal than some of my previous writings, but should be an easy read beginning with some required definitions, problem statement, general solution and specific results for a small handful of continuous probability distributions.

### Definitions

Definition (1) : Given the probability space, $(\Omega, \mathcal{F}, \mathbb{P})$, consisting of a set representing the sample space, $\Omega$, a $\text{Borel }\sigma \text{-algebra}$, $\mathcal{F}$, and a Lebesgue measure, $\mathbb{P}$, the following properties hold true:

1. Non-negativity: $\mathbb{P}(F) \ge 0 \quad \forall F \in \mathcal{F}$
2. Null empty set: $\mathbb{P}(\emptyset) = 0$
3. Countable additivity of disjoint sets $\displaystyle \mathbb{P}\left( \bigcup_{i=0}^{\infty} F_i \right) = \sum_{i=0}^{\infty} \mathbb{P}(F_i) \quad F_i \subset \mathcal{F}$

Definition (2) : Given a real-valued continuous random variable such that $X : \Omega \to \mathbb{R}$, the event the random variable takes on a fixed value, $x \in \mathbb{R}$, is the event $\lbrace \omega : X(\omega) = x \rbrace \in \mathcal{F}$ measured by the probability distribution function $f_X(x) = \mathbb{P}(X = x)$. Similarly, the event that the random variable takes on a range of values less than some fixed value, $x \in \mathbb{R}$, is the event $\lbrace \omega : X(\omega) \le x \rbrace \in \mathcal{F}$ measured by the cumulative distribution function $F_X(x) = \mathbb{P}(X \le x)$. By Definition, the following properties hold true:

1. $\displaystyle F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt$
2. $\displaystyle \lim_{x \to \infty} F_X(x) = 1$
3. $\displaystyle 1 - \int_{-\infty}^{x} t f_X(t) \, dt = \int_{x}^{\infty} t f_X(t) \, dt$

Defintion (3) : Given a second real-valued continuous random variable, $Y : \Omega \to \mathbb{R}$, The joint event $\lbrace \omega : X(\omega) = x, Y(\omega) = y \rbrace \in \mathcal{F}$ $(x,y) \in \mathbb{R}^2$ will be measured by the joint probability distribution $f_{X, Y}(x,y) = \mathbb{P}(X = x, Y = y)$. If $X$ and $Y$ are statistically independent, then $f_{X,Y}(x,y) = f_X(x) f_Y(y)$.

Definition (4) : Given a real-valued continuous random variable, $X : \Omega \to \mathbb{R}$, the expected value is $\displaystyle \mathbb{E}(X) = \int_{-\infty}^{\infty} x f_X(x) \, dx$.

Definition (5) : (Law of the unconscious statistician) Given a real-valued continuous random variable, $X$, and a function, $g : \mathbb{R} \to \mathbb{R}$, then $g(X)$ is also a real-valued continuous random variable and its expected value is $\displaystyle \mathbb{E}(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x) \, dx$ provided the integral converges. Given two real-valued continuous random variables, $X, Y$, and a function, $g : \mathbb{R}^2 \to \mathbb{R}$, then $g(X, Y)$ is also a real-valued continuous random variable and its expected value is $\displaystyle \mathbb{E}(g(X,Y)) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y) f_{X,Y}(x,y) \, dx \, dy$. Under the independence assumption of Definition (3), the expected value becomes $\displaystyle \mathbb{E}(g(X,Y)) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y) f_X(x) f_Y(y) \, dx \, dy$.

Remark (1) : For the remainder of this paper, all real-valued continuous random variables will be assumed to be independent.

### Problem Statement

Theorem (1) : Given two real-valued continuous random variables $X, Y \in \Omega \to \mathbb{R}$, then the expected value of the minimum of the two variables is $\mathbb{E} \left( \min{ \left( X, Y \right) } \right ) = \mathbb{E} \left( X \right ) + \mathbb{E} \left( Y \right ) - \mathbb{E} \left( \max{ \left( X, Y \right) } \right )$.

Lemma (1) : Given two real-valued continuous random variables $X, Y \in \Omega \to \mathbb{R}$, then the expected value of the maximum of the two variables is $\displaystyle \mathbb{E} \left( \max{ \left ( X, Y \right) } \right ) = \int_{-\infty}^{\infty} x f_X(x) F_Y(x) \, dx + \int_{-\infty}^{\infty} y f_Y(y) F_X(y) \, dy$

Proof of Lemma (1) :

$\displaystyle \mathbb{E} \left( \max{ \left ( X, Y \right) } \right ) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \max{\left( x, y \right)} f_X(x) f_Y(y) \, dx \, dy$ (Definition (5))

$\displaystyle = \int_{-\infty}^{\infty} \int_{-\infty}^{x} x f_X(x) f_Y(y) \, dy \, dx + \int_{-\infty}^{\infty} \int_{-\infty}^{y} y f_X(x) f_Y(y) \, dx \, dy$ (Definition (1.iii))

$\displaystyle = \int_{-\infty}^{\infty} x f_X(x) \left ( \int_{-\infty}^{x} f_Y(y) \, dy \right ) \, dx + \int_{-\infty}^{\infty} y f_Y(y) \left ( \int_{-\infty}^{y} f_X(x) \, dx \right ) \, dy$ (Fubini’s theorem)

$\displaystyle = \int_{-\infty}^{\infty} x f_X(x) F_Y(x) \, dx + \int_{-\infty}^{\infty} y f_Y(y) F_X(y) \, dy \quad \square$ (Definition (2.i))

Proof of Theorem (1)

$\displaystyle \mathbb{E} \left( \min{ \left ( X, Y \right) } \right ) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \min{\left( x, y \right)} f_X(x) f_Y(y) \, dx \, dy$ (Definition (4))

$\displaystyle = \int_{-\infty}^{\infty} \int_{x}^{\infty} x f_X(x) f_Y(y) \, dy \, dx + \int_{-\infty}^{\infty} \int_{y}^{\infty} y f_X(x) f_Y(y) \, dx \, dy$ (Definition (1.iii))

$\displaystyle = \int_{-\infty}^{\infty} x f_X(x) \left ( \int_{x}^{\infty} f_Y(y) \, dy \right ) \, dx + \int_{-\infty}^{\infty} y f_Y(y) \left ( \int_{y}^{\infty} f_X(x) \, dx \right ) \, dy$ (Fubini’s theorem)

$\displaystyle = \int_{-\infty}^{\infty} x f_X(x) \left (1 - \int_{-\infty}^{x} f_Y(y) \, dy \right ) \, dx + \int_{-\infty}^{\infty} y f_Y(y) \left (1 - \int_{-\infty}^{y} f_X(x) \, dx \right ) \, dy$ (Definition (2.iii))

$\displaystyle = \int_{-\infty}^{\infty} x f_X(x) \left (1 - F_Y(x) \right ) \, dx + \int_{-\infty}^{\infty} y f_Y(y) \left( 1 - F_X(y) \right ) \, dy$ (Definition (2.i))

$\displaystyle = \int_{-\infty}^{\infty} x f_X(x) \, dx - \int_{-\infty}^{\infty} x f_X(x) F_Y(x) \, dx + \int_{-\infty}^{\infty} y f_Y(y) \, dy - \int_{-\infty}^{\infty} y f_Y(y) F_X(y) \, dy$

$\displaystyle = \int_{-\infty}^{\infty} x f_X(x) \, dx + \int_{-\infty}^{\infty} y f_Y(y) \, dy - \left ( \int_{-\infty}^{\infty} x f_X(x) F_Y(x) \, dx + \int_{-\infty}^{\infty} y f_Y(y) F_X(y) \, dy \right )$

$\displaystyle = \mathbb{E}(X) + \mathbb{E}(Y) - \mathbb{E} \left( \max{ \left( X, Y \right) } \right) \quad \blacksquare$ (Definition (4), Lemma (1))

Remark (2) : For real values $x, y \in \mathbb{R}$, $\min{\left(x,y\right)} = x + y - \max{ \left( x, y \right) }$.

Proof Remark (2) : If $x \ge y$, then $\min{\left(x,y\right)} = y$, otherwise $x$. If $x \ge y$, then $\max{\left(x,y\right)} = x$, otherwise $y$. If $x \ge y$, then $\min{\left(x,y\right)} = y + \left( x - \max{\left(x,y\right)} \right )$, otherwise, $\min{\left(x,y\right)} = x + \left( y - \max{\left(x,y\right)} \right )$. Therefore, $\min{\left(x,y\right)} = x + y - \max{\left(x,y\right)} \quad \square$

### Worked Continuous Probability Distributions

The following section of this paper derives the expected value of the maximum of real-valued continuous random variables for the exponential distribution, normal distribution and continuous uniform distribution. The derivation of the expected value of the minimum of real-valued continuous random variables is omitted as it can be found by applying Theorem (1).

#### Exponential Distribution

Definition (6) : Given a real-valued continuous exponentially distributed random variable, $X \sim \text{Exp}(\alpha)$, with rate parameter, $\alpha > 0$, the probability density function is $\displaystyle f_X(x) = \alpha e^{-\alpha x}$ for all $x \ge 0$ and zero everywhere else.

Corollary (6.i) The cumulative distribution function of a real-valued continuous exponentially distributed random variable, $X \sim \text{Exp}(\alpha)$, is therefore $\displaystyle F_X(x) = 1 - e^{-\alpha x}$ for all $x \ge 0$ and zero everywhere else.

Proof of Corollary (6.i)

$\displaystyle F_X(x) = \int_{-\infty}^{x} f_x(t) \, dt = \int_{0}^{x} \alpha e^{-\alpha t} \, dt = -\frac{\alpha}{\alpha} e^{-\alpha t} \bigg|_{0}^{x} = 1 - e^{- \alpha x} \quad \square$

Corollary (6.ii) : The expected value of a real-valued continuous exponentially distributed random variable, $X \sim \text{Exp}(\alpha)$, is therefore $\displaystyle \frac{1}{\alpha}$.

Proof of Corollary (6.ii)

The expected value is $\mathbb{E}(X) = \frac{1}{\alpha}$ by Definition (4) and Lemma (2) $\square$.

Lemma (2) : Given real values $\alpha, \gamma \in \mathbb{R} \quad \gamma \neq 0$, then $\displaystyle \int_{0}^{\infty} \alpha x e^{-\gamma x} \, dx = \frac{\alpha}{\gamma^2}$.

Proof of Lemma (2) :

$\displaystyle \int_{0}^{\infty} \alpha x e^{-\gamma x} \, dx = - x \frac{\alpha}{\gamma} e^{-\alpha x} \bigg|_{0}^{\infty} + \int_{0}^{\infty} \frac{\alpha}{\gamma} e^{-\gamma x} \, dx = - x \frac{\alpha}{\gamma} e^{-\alpha x} - \frac{\alpha}{\gamma}^2 e^{-\alpha x} \bigg|_{0}^{\infty}$

$\displaystyle = \lim_{x \to \infty} \left( - x \frac{\alpha}{\gamma} e^{-\alpha x} - \frac{\alpha}{\gamma^2} e^{-\alpha x} \right ) - \left( - \frac{\alpha}{\gamma^2} \right) = \frac{\alpha}{\gamma^2} \quad \square$

Theorem (2) : The expected value of the maximum of the real-valued continuous exponentially distributed random variables $X \sim \text{Exp}(\alpha)$, $Y \sim \text{Exp}(\beta)$ is $\displaystyle \frac{1}{\alpha} + \frac{1}{\beta} - \frac{1}{\alpha + \beta}$.

Proof of Theorem (2) :

$\displaystyle \mathbb{E} \left ( \max{\left( X, Y \right)} \right ) = \int_{-\infty}^{\infty} x f_X(x) F_Y(x) \, dx + \int_{-\infty}^{\infty} y f_Y(y) F_X(y) \, dy$ (Lemma (1))

$\displaystyle = \int_{0}^{\infty} x \alpha e^{-\alpha x} \left( 1 - e^{-\beta x} \right ) \, dx + \int_{0}^{\infty} y \beta e^{-\beta y} \left( 1 - e^{-\alpha y} \right ) \, dy$ (Corollary (6.i))

$\displaystyle = \left( \int_{0}^{\infty} x \alpha e^{-\alpha x} \, dx \right )- \left( \int_{0}^{\infty} x \alpha e^{-(\alpha + \beta) x} \, dx \right ) + \left( \int_{0}^{\infty} y \beta e^{-\beta y} \, dy \right ) - \left( \int_{0}^{\infty} y \beta e^{-(\alpha + \beta) y} \, dy \right)$ (Integral linearity)

$\displaystyle = \frac{1}{\alpha} - \frac{\alpha}{(\alpha + \beta)^2} + \frac{1}{\beta} - \frac{\beta}{(\alpha + \beta)^2}$ (Lemma (2), Corollary (6.ii))

$\displaystyle = \frac{1}{\alpha} + \frac{1}{\beta} - \frac{1}{\alpha+\beta} \quad \blacksquare$

#### Normal Distribution

Definition (7) : The following Gaussian integral is the error function $\displaystyle \text{erf}(x) = \frac{2}{ \sqrt{\pi} } \int_{0}^{x} e^{ - u^2 } \, du$ for which the following properties hold true:

1. Odd function: $\displaystyle \text{erf}(-x) = -\text{erf}(x)$
2. Limiting behavior: $\displaystyle \lim_{x \to \infty} \text{erf}(x) = 1$

Definition (8) : Given a real-valued continuous normally distributed random variable , $X \sim \mathcal{N}(\mu, \sigma)$, with mean parameter, $\mu$. and standard deviation parameter, $\sigma \neq 0$, the probability density function is $\displaystyle f_X(x) = \frac{1}{\sigma \sqrt{2 \pi} } e^{ -\frac{1}{2} \left ( \frac{x - \mu}{\sigma} \right )^2 }$ for all values on the real line.

Corollary (8.i) : The cumulative distribution function of a real-valued continuous normally distributed random variable, $X \sim \mathcal{N}(\mu, \sigma)$, is therefore $\displaystyle F_X(x) = \frac{1}{2} \left (1 + \text{erf} \left ( \frac{x-\mu}{\sqrt{2} \sigma} \right ) \right )$.

Proof of Corollary (8.i) :

$\displaystyle F_X(x) = \int^{x}_{-\infty} \frac{1}{\sigma \sqrt{2 \pi} } e^{ - \left ( \frac{t - \mu}{\sqrt{2} \sigma} \right )^2 } \, dt$ (Definition (2.i))

$\displaystyle = \frac{1}{ \sqrt{\pi} } \int_{-\infty}^{\frac{x - \mu}{\sqrt{2} \sigma}} e^{ - u^2 } \, du$ (U-substitution with $\displaystyle u = \frac{t - \mu}{\sqrt{2} \sigma} \implies du = \frac{1}{\sqrt{2} \sigma} dt$)

$\displaystyle = \frac{1}{ \sqrt{\pi} } \int_{-\infty}^{ 0 } e^{ - u^2 } \, du + \frac{1}{ \sqrt{\pi} } \int_{0}^{ \frac{x - \mu}{\sqrt{2} \sigma} } e^{ - u^2 } \, du$ (Definition (2.iii))

$\displaystyle = - \frac{1}{ \sqrt{\pi} } \int_{0}^{-\infty} e^{ - u^2 } \, du + \frac{1}{ \sqrt{\pi} } \int_{0}^{ \frac{x - \mu}{\sqrt{2} \sigma} } e^{ - u^2 } \, du$ (Reverse limits of integration)

$\displaystyle = \frac{1}{2} \lim_{u \to \infty} - \text{erf}(-u) + \frac{1}{2} \text{erf} \left ( \frac{x - \mu}{\sqrt{2} \sigma} \right )$ (Definition (7))

$\displaystyle = \frac{1}{2} \lim_{u \to \infty} \text{erf}(u) + \frac{1}{2} \text{erf} \left ( \frac{x - \mu}{\sqrt{2} \sigma} \right )$ (Definition (7.i))

$\displaystyle = \frac{1}{2} + \frac{1}{2} \text{erf} \left ( \frac{x - \mu}{\sqrt{2} \sigma} \right )$ (Definition (7.ii))

$\displaystyle = \frac{1}{2} \left (1 + \text{erf} \left ( \frac{x-\mu}{\sqrt{2} \sigma} \right ) \right ) \quad \square$

Corollary (8.ii) : The expected value of a real-valued continuous normally distributed random variable, $X \sim \mathcal{N}(\mu, \sigma)$, is therefore $\mathbb{E}(X) = \mu$.

Proof of Corollary (8.ii) :

$\displaystyle \mathbb{E}(X) = \int_{-\infty}^{\infty} x f_X(x) \, dx = \int_{-\infty}^{\infty} x \frac{1}{\sigma \sqrt{2 \pi} } e^{ -\frac{1}{2} \left ( \frac{x - \mu}{\sigma} \right )^2 } \, dx$ (Definition (4))

$\displaystyle = \int_{-\infty}^{\infty} (\sqrt{2}\sigma u + \mu) \frac{1}{\sqrt{\pi} } e^{ - u^2 } \, du$ (U-substitution with $\displaystyle u = \frac{x-\mu}{\sqrt{2} \sigma} \implies du = \frac{1}{\sqrt{2}\sigma} dx$ $\sqrt{2}\sigma u + \mu = x$)

$\displaystyle = \frac{\sqrt{2}\sigma}{\sqrt{\pi}} \int_{-\infty}^{\infty} u e^{ - u^2 } \, du + \frac{\mu}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{ - u^2 } \, du$ (Integral linearity)

$\displaystyle = \frac{\sqrt{2}\sigma}{\sqrt{\pi}} \left( \int_{-\infty}^{0} u e^{ - u^2 } \, du + \int_{0}^{\infty} u e^{ - u^2 } \, du \right ) + \frac{\mu}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{ - u^2 } \, du$ (Definition (1.iii))

$\displaystyle = \frac{\sqrt{2}\sigma}{\sqrt{\pi}} \left( - \int_{0}^{\infty} u e^{ - u^2 } \, du + \int_{0}^{\infty} u e^{ - u^2 } \, du \right ) + 2 \frac{\mu}{\sqrt{\pi}} \int_{0}^{\infty} e^{ - u^2 } \, du$ ($u e^{-u^2}$ is odd, $e^{-u^2}$ is even)

$\displaystyle = \mu \frac{2}{\sqrt{\pi}} \left ( \frac{\sqrt{\pi}}{2} \lim_{x \to \infty} \text{erf}(x) \right ) = \mu \quad \square$ (Definition (7), Definition (7.ii))

Definition (9) : Given a real-valued continuous normally distributed random variable, $X \sim \mathcal{N}(0, 1)$, the probability distribution function will be denoted as standard normal probability distribution function, $\phi(x)$, and the cumulative distribution function as the standard normal cumulative distribution function, $\Phi(x)$. By definition, the following properties hold true:

1. Non-standard probability density function: If $X \sim \mathcal{N}(\mu, \sigma)$, then $\displaystyle f_X(x) = \frac{1}{\sigma} \phi \left( \frac{x - \mu}{\sigma} \right )$
2. Non-standard cumulative distribution function: If $X \sim \mathcal{N}(\mu, \sigma)$, then $\displaystyle F_X(x) = \Phi\left( \frac{x - \mu}{\sigma} \right )$
3. Complement: $\Phi(-x) = 1 - \Phi(x)$

Definition (10) : [PaRe96] Given $\phi(x)$ and $\Phi(x)$, the following integrals hold true:

1. $\displaystyle \int_{-\infty}^\infty x\Phi(a+bx)\phi(x) \, dx = \frac{b}{\sqrt{1+b^2}} \phi \left( \frac{a}{\sqrt{1+b^2}} \right )$
2. $\displaystyle \int_{-\infty}^\infty \Phi(a+bx)\phi(x) \, dx = \Phi \left ( \frac{a}{\sqrt{1+b^2}} \right )$

Theorem (3) : The expected value of the maximum of the real-valued continuous normally distributed random variables $X \sim \mathcal{N}(\mu, \sigma)$, $Y \sim \mathcal{N}(\nu, \tau)$ is $\displaystyle \sqrt{ \sigma^2 + \tau^2 } \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + (\nu - \mu) \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right ) + \mu$.

Lemma (3) : Given real-valued continuous normally distributed random variables $X \sim \mathcal{N}(\mu, \sigma)$, $Y \sim \mathcal{N}(\nu, \tau)$, $\displaystyle \int_{-\infty}^{\infty} y f_Y(y) F_X(y) \, dy = \frac{\tau^2}{\sqrt{\sigma^2 + \tau^2 }} \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + \nu \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right )$.

Proof of Lemma (3) :

$\displaystyle \int_{-\infty}^{\infty} y f_Y(y) F_X(y) \, dy = \int_{-\infty}^{\infty} y \frac{1}{\tau} \phi \left ( \frac{y-\nu}{\tau} \right ) \Phi \left ( \frac{y-\mu}{\sigma} \right ) \, dy$ (Definition (9.i), Definition (9.ii))

$\displaystyle = \int_{-\infty}^{\infty} (u \tau + \nu) \phi(u) \Phi \left ( \frac{u \tau + \nu -\mu}{\sigma} \right ) \, du$ (U-substitution with $\displaystyle u = \frac{y-\nu}{\tau} \implies du = \frac{1}{\tau} dy$, $y = u \tau + \nu$)

$\displaystyle = \tau \int_{-\infty}^{\infty} u \phi(u) \Phi \left ( \frac{u \tau + \nu -\mu}{\sigma} \right ) \, du + \nu \int_{-\infty}^{\infty} \phi(u) \Phi \left ( \frac{u \tau + \nu -\mu}{\sigma} \right ) \, du$ (Integral linearity)

$\displaystyle = \frac{\tau^2}{\sqrt{\sigma^2 + \tau^2 }} \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + \nu \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right ) \, \square$ (Definition (10.i), Definition (10.ii))

Proof of Theorem (3) :

$\displaystyle \mathbb{E} \left( \max{ \left ( X, Y \right) } \right ) = \int_{-\infty}^{\infty} x f_X(x) F_Y(x) \, dx + \int_{-\infty}^{\infty} y f_Y(y) F_X(y) \, dy$ (Lemma (1))

$\displaystyle = \int_{-\infty}^{\infty} x \frac{1}{\sigma} \phi \left ( \frac{x-\mu}{\sigma} \right ) \Phi \left ( \frac{x-\nu}{\tau} \right ) \, dy + \int_{-\infty}^{\infty} y \frac{1}{\tau} \phi \left ( \frac{y-\nu}{\tau} \right ) \Phi \left ( \frac{y-\mu}{\sigma} \right ) \, dy$ (Definition (11.i), Definition (11.ii))

$\displaystyle = \frac{\sigma^2}{\sqrt{\sigma^2 + \tau^2}} \phi \left( \frac{\mu - \nu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + \mu \Phi \left ( \frac{\mu - \nu}{\sqrt{\sigma^2 + \tau^2}} \right ) + \frac{\tau^2}{\sqrt{\sigma^2 + \tau^2 }} \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + \nu \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right )$ (Lemma (3))

$\displaystyle = \frac{\sigma^2}{\sqrt{\sigma^2 + \tau^2}} \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + \mu \left ( 1 - \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right ) \right ) + \frac{\tau^2}{\sqrt{\sigma^2 + \tau^2 }} \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + \nu \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right )$ (Definition (9.iii))

$\displaystyle = \frac{\sigma^2}{\sqrt{\sigma^2 + \tau^2}} \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) - \mu \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right ) + \mu + \frac{\tau^2}{\sqrt{\sigma^2 + \tau^2 }} \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + \nu \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right )$

$\displaystyle = \sqrt{ \sigma^2 + \tau^2 } \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + (\nu - \mu) \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right ) + \mu \quad \blacksquare$

#### Continuous Uniform Distribution

Definition (11) : Given a real-valued continuous uniformly distributed random variable, $X \sim U(a,b)$, with inclusive boundaries $a, b$ such that $a < b$, the probability density function is $\displaystyle f_X(x) = \frac{1}{b-a}$ for all $x \in [a, b]$ and zero everywhere else.

Corollary (11.i) : The cumulative distribution function of a real-valued continuous uniformly distributed random variable, $X \sim U(a,b)$, is therefore $\displaystyle F_X(x) = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & x \in [a,b] \\ 1 & x > b \end{cases}$.

Proof of Corollary (11.i) :

$\displaystyle F_X(x) = \int_{-\infty}^{\infty} f_X(t) \, dt = \int_{a}^{b} \frac{1}{b-a} \, dt = \frac{1}{b-a} x \bigg|_{a}^{x} = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & x \in [a,b] \\ 1 & x > b \end{cases}\quad \square$.

Corollary (11.ii) : The expected value of a real-valued continuous uniformly distributed random variable, $X \sim U(a,b)$, is therefore $\displaystyle \frac{a+b}{2}$.

Proof of Corollary (11.ii)

$\displaystyle \mathbb{E}(X) = \int_{-\infty}^{\infty} x f_X(x) \, dx = \int_{a}^{b} x \frac{1}{b-a} \, dx = \frac{x^2}{2(b-a)} \bigg|_{a}^{b} = \frac{ b^2 -a^2 }{ 2(b-a) } = \frac{b+a}{2} \quad \square$

Theorem (4) : The expected value of the maximum of real-valued continuous uniformly distributed random variables $X \sim U(a,b)$, $Y \sim U(c,d)$ is $\displaystyle \begin{cases} \frac{c+d}{2} & a < b \le c < d \\ \frac{4 (b^3 - c^3) - 3 (c + a) (b^2 - c^2) }{6(d-c)(b-a)} + \frac{d^2 - b^2}{2(d-c)} & a \le c < b \le d \\ \frac{4(b^3-c^3) - 3(c+a)(b^2-c^2)}{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} & a \le c < d \le b \\ \frac{4 (b^3-a^3) - 3 (c + a) (b^2 - a^2) }{6(d-c)(b-a)} + \frac{d^2-b^2}{2(d-c)} & c \le a < b \le d\\ \frac{4 (d^3 - a^3) -3 (c+a) (d^2-a^2) }{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} & c \le a < d \le b \\ \frac{a+b}{2} & c < d \le a < b \end{cases}$.

Proof of Theorem (4) :

$\displaystyle \mathbb{E} \left ( \max{ \left ( X, Y \right )} \right ) = \int_{-\infty}^{\infty} x f_X(x) F_Y(x) \, dx + \int_{-\infty}^{\infty} y f_Y(y) F_X(y) \, dy$ (Lemma (1))

$\displaystyle = \int_{a}^{b} x \frac{1}{b-a} \begin{cases} 0 & x < c \\ \frac{x - c}{d-c} & x \in [c,d] \\ 1 & \text{otherwise} \end{cases} \, dx + \int_{c}^{d} y \frac{1}{d-c} \begin{cases} 0 & y < a \\ \frac{y - a}{b-a} & y \in [a,b] \\ 1 & \text{otherwise} \end{cases} \, dy$

Case (1) : $a < b \le c < d$

$\displaystyle = \left ( \int_{a}^{b} x \frac{1}{b-a} 0 \, dx \right ) + \left ( \int_{c}^{d} y \frac{1}{d-c} 1 \, dy \right )$

$\displaystyle = \frac{c+d}{2} \quad \square$

Case (2) : $a \le c < b \le d$

$\displaystyle = \left ( \int_{a}^{c} x \frac{1}{b-a} 0 \, dx + \int_{c}^{b} x \frac{1}{b-a} \frac{x-c}{d-c} \, dx \right ) + \left ( \int_{c}^{b} y \frac{1}{d-c} \frac{y-a}{b-a} \, dy + \int_{b}^{d} y \frac{1}{d-c} 1 \, dy \right )$

$\displaystyle = \frac{2 x^3 - 3 c x^2}{6(b-a)(d-c)} \bigg|_{c}^{b} + \frac{2 y^3 - 3ay^2 }{6(d-c)(b-a)} \bigg|_{c}^{b} + \frac{y^2}{2(d-c)} \bigg|_{b}^{d}$

$\displaystyle = \frac{2 (b^3 - c^3) - 3 c (b^2 - c^2) }{6(b-a)(d-c)} + \frac{2 (b^3 - c^3) - 3 a (b^2 - c^2) }{6(d-c)(b-a)} + \frac{d^2 - b^2}{2(d-c)}$

$\displaystyle = \frac{4 (b^3 - c^3) - 3 (c + a) (b^2 - c^2) }{6(d-c)(b-a)} + \frac{d^2 - b^2}{2(d-c)} \quad \square$

Case (3) : $a \le c < d \le b$

$\displaystyle = \left ( \int_{a}^{c} x \frac{1}{b-a} 0 \, dx + \int_{c}^{b} x \frac{1}{b-a} \frac{x-c}{d-c} \, dx + \int_{d}^{b} x \frac{1}{b-a} 1 \, dx \right ) + \left ( \int_{c}^{d} y \frac{1}{d-c} \frac{y-a}{b-a} \, dy \right)$

$\displaystyle = \frac{2x^3 - 3cx^2}{6(b-a)(d-c)} \bigg|_{c}^{b} + \frac{x^2}{2(b-a)} \bigg|_{d}^{b} + \frac{2y^3 - 3ay^2}{6(b-a)(d-c)} \bigg|_{c}^{d}$

$\displaystyle = \frac{2(b^3-c^3) - 3c(b^2-c^2)}{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} + \frac{2(d^3-c^3) - 3a(d^2-c^2)}{6(b-a)(d-c)}$

$\displaystyle = \frac{4(b^3-c^3) - 3(c+a)(b^2-c^2)}{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} \quad \square$

Case (4) : $c \le a < b \le d$

$\displaystyle = \left( \int_{a}^{b} x \frac{1}{b-a} \frac{x-c}{d-c} \, dx \right ) + \left( \int_{c}^{a} y \frac{1}{d-c} 0 \, dy + \int_{a}^{b} y \frac{1}{d-c} \frac{y-a}{b-a} \, dy + \int_{b}^{d} y \frac{1}{d-c} 1 \, dy \right )$

$\displaystyle = \frac{2 x^3 - 3 c x^2 }{6(d-c)(b-a)} \bigg|_{a}^{b} + \frac{2 y^3 - 3 a y^2 }{6(d-c)(b-a)} \bigg|_{a}^{b} + \frac{y^2}{2(d-c)} \bigg|_{b}^{d}$

$\displaystyle = \frac{2 (b^3-a^3) - 3 c (b^2 - a^2) }{6(d-c)(b-a)} + \frac{2 (b^3-a^3) - 3 a (b^2 -a^2) }{6(d-c)(b-a)} + \frac{d^2-b^2}{2(d-c)}$

$\displaystyle = \frac{4 (b^3-a^3) - 3 (c + a) (b^2 - a^2) }{6(d-c)(b-a)} + \frac{d^2-b^2}{2(d-c)} \quad \square$

Case (5) : $c \le a < d \le b$

$\displaystyle = \left ( \int_{a}^{d} x \frac{1}{b-a} \frac{x-c}{d-c} \, dx + \int_{d}^{b} x \frac{1}{b-a} 1 \, dx \right ) + \left ( \int_{c}^{a} y \frac{1}{d-c} 0 \, dy + \int_{a}^{d} y \frac{1}{d-c} \frac{y-a}{b-a} \, dy \right )$

$\displaystyle = \frac{2 x^3 -3 c x^2}{6(b-a)(d-c)} \bigg|_{a}^{d} + \frac{x^2}{2(b-a)} \bigg|_{d}^{b} + \frac{2 y^3 -3 a y^2}{6(b-a)(d-c)} \bigg|_{a}^{d}$

$\displaystyle = \frac{2 (d^3 - a^3) -3 c (d^2-a^2) }{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} + \frac{2 (d^3-a^3) -3 a (d^2-a^2)}{6(b-a)(d-c)}$

$\displaystyle = \frac{4 (d^3 - a^3) -3 (c+a) (d^2-a^2) }{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} \quad \square$

Case (6) : $c < d \le a < b$

$\displaystyle = \left ( \int_{a}^{b} x \frac{1}{b-a} 1 \, dx \right ) + \left ( \int_{c}^{d} y \frac{1}{d-c} 0 \, dy \right )$

$\displaystyle = \frac{a+b}{2}$

$\displaystyle \therefore \mathbb{E} \left ( \max{\left ( X, Y \right )} \right ) = \begin{cases} \frac{c+d}{2} & a < b \le c < d \\ \frac{4 (b^3 - c^3) - 3 (c + a) (b^2 - c^2) }{6(d-c)(b-a)} + \frac{d^2 - b^2}{2(d-c)} & a \le c < b \le d \\ \frac{4(b^3-c^3) - 3(c+a)(b^2-c^2)}{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} & a \le c < d \le b \\ \frac{4 (b^3-a^3) - 3 (c + a) (b^2 - a^2) }{6(d-c)(b-a)} + \frac{d^2-b^2}{2(d-c)} & c \le a < b \le d\\ \frac{4 (d^3 - a^3) -3 (c+a) (d^2-a^2) }{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} & c \le a < d \le b \\ \frac{a+b}{2} & c < d \le a < b \end{cases} \quad \blacksquare$

### Summary Table

The following summary table lists the expected value of the maximum of real-valued continuous random variables for the exponential distribution, normal distribution and continuous uniform distribution. The corresponding minimum can be obtained by Theorem (1).

Random Variables Maximum
$X \sim$ $Y \sim$
$\text{Exp}(\alpha)$ $\text{Exp}(\beta)$ $\displaystyle \frac{1}{\alpha} + \frac{1}{\beta} - \frac{1}{\alpha + \beta}$
$\mathcal{N}(\mu, \sigma)$ $\mathcal{N}(\nu, \tau)$ $\displaystyle \sqrt{ \sigma^2 + \tau^2 } \phi \left( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2 }} \right ) + (\nu - \mu) \Phi \left ( \frac{\nu - \mu}{\sqrt{\sigma^2 + \tau^2}} \right ) + \mu$
$\text{U}(a, b)$ $\text{U}(c, d)$ $\displaystyle \begin{cases} \frac{c+d}{2} & a < b \le c < d \\ \frac{4 (b^3 - c^3) - 3 (c + a) (b^2 - c^2) }{6(d-c)(b-a)} + \frac{d^2 - b^2}{2(d-c)} & a \le c < b \le d \\ \frac{4(b^3-c^3) - 3(c+a)(b^2-c^2)}{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} & a \le c < d \le b \\ \frac{4 (b^3-a^3) - 3 (c + a) (b^2 - a^2) }{6(d-c)(b-a)} + \frac{d^2-b^2}{2(d-c)} & c \le a < b \le d\\ \frac{4 (d^3 - a^3) -3 (c+a) (d^2-a^2) }{6(b-a)(d-c)} + \frac{b^2-d^2}{2(b-a)} & c \le a < d \le b \\ \frac{a+b}{2} & c < d \le a < b \end{cases}$

### References

[GrSt01] Grimmett, Geoffrey, and David Stirzaker. Probability and Random Processes. Oxford: Oxford UP, 2001. Print.

[PaRe96] Patel, Jagdish K., and Campbell B. Read. Handbook of the Normal Distribution. 2nd ed. New York: Marcel Dekker, 1996. Print.

Written by lewellen

2013-01-01 at 8:00 am