# Antimatroid, The

thoughts on computer science, electronics, mathematics

## Large-Scale Detection and Tracking of Aircraft from Satellite Images

Abstract
In this paper a distributed system for detecting and tracking commercial aircraft from medium resolution synthetic satellite images is presented. The system’s Apache Spark distributed architecture, correlation-based template matching, and heuristic-based tracking are discussed, along with an evaluation of the system’s Standalone, Cluster, and Cloud modes of operation, before concluding with future work. Experimental results support 10 m detection accuracy, a 96% detection rate, and Cloud processing performance of 200 ms/image and 10 MB/s. The goal of this work is to demonstrate that a satellite-based aircraft tracking system is feasible and that the system’s oversimplifying assumptions offer a baseline against which similar real-world systems may be compared. Applications of this system include real-time tracking of commercial aircraft, flight deviation detection, and the automatic discovery of commercial aircraft from historic imagery. To the best of the author’s knowledge, a system of this nature has not been published publicly.

## Introduction

### Motivation

Historically, search and rescue operations have relied on on-site volunteers to expedite the recovery of missing aircraft. However, on-site volunteers are not always an option when the search area is very broad, or difficult to access or traverse, as was the case when Malaysia Airlines Flight 370 disappeared over the Indian Ocean in March 2014. Crowd-sourcing services such as tomnod.com have stepped up to the challenge by having volunteers manually inspect satellite images for missing aircraft in an attempt to expedite the recovery process. This process is effective, but slow. Given the urgency of these events, an automated image processing solution is needed.

### Prior Work

The idea of detecting and tracking objects from satellite images is not new. There is plenty of academic literature on detecting or tracking (though rarely both) objects such as oil tankers, aircraft, and even clouds. Most distributed image processing literature focuses on using grid, cloud, or specialized hardware for processing large streams of image data with either specialized software or general platforms like Hadoop. Commercially, companies like DigitalGlobe have vast amounts of satellite data, but have not publicized their processing frameworks. Start-ups like Planet Labs have computing and satellite resources capable of processing and providing whole earth coverage, but have not publicized any work on this problem.

The FAA has mandated a more down-to-earth approach through ADS-B. By 2020, all aircraft flying above 10,000 ft will be required to have Automatic Dependent Surveillance-Broadcast (ADS-B) transceivers in order to broadcast their location, bearing, speed, and identifying information so that they can be easily tracked. Adoption of the standard is increasing, with sites like Flightradar24.com and FlightAware.com providing real-time access to ongoing flights.

### Problem Statement

Fundamentally, there are two variants of this problem: online and offline. The offline variant most closely resembles the tomnod.com paradigm, which is concerned with processing historic satellite images. The online variant most closely resembles air traffic control systems, where we’d be processing a continual stream of images coming from a satellite constellation providing whole earth coverage. The focus of this work is on the online variant, since it presents a series of more interesting problems (of which the offline problem can be seen as a sub-problem).

In both cases, we’d need to be able to process large volumes of satellite images. One complication is that large volumes of satellite images are difficult and expensive to come by. To work around this limitation, synthetic images will be generated based on OpenFlights.org data, with the intent of evaluating natural images on the system in the future. To generate data, we’ll pretend there are enough satellites in orbit to provide whole earth coverage, capturing simulated flights over a fixed region of space and window of time. The following video is an example of the approximately 60,000 flights in the dataset being simulated to completion:

To detect all these aircraft from a world-wide image, we’ll use correlation based template matching. There are many ways to parallelize and distribute this operation, but an intuitive distributed processing of image patches will be done with each cluster node performing a parallelized Fast Fourier Transform to identify any aircraft in a given patch. Tracking will be done using an online heuristic algorithm to “connect the dots” recovered from detection. At the end of the simulation, these trails of dots will be paired with simulated routes to evaluate how well the system did.

The remainder of this post will cover the architecture of the system based on Apache Spark, its configurations for running locally and on Amazon Web Services, and how well it performs before concluding with possible future work and cost analysis of what it would take to turn this into a real-world system.

## System Architecture

### Overview

The system relies on the data pipeline architecture presented above. Data is read in from the OpenFlights.org dataset, consisting of airport information and flight routes. A fixed number of national flights are selected and passed along to a simulation module. At each time step, the simulator identifies which airplanes to launch, updates latitude and longitude coordinates, and removes those that have arrived at their destination.

To minimize the amount of network traffic being exchanged between nodes, flights are placed into buckets based on their current latitude and longitude. Buckets having flights are then processed in parallel by the Spark Workers. Each worker receives a bucket and generates a synthetic satellite image; this image is then given to the detection module which in turn recovers the coordinates from the image.

These coordinates are coalesced at the Spark Driver and passed along to the tracking module where the coordinates are appended to previously grown flight trails. Once 24 hours worth of simulated time has elapsed (in simulated 15 minute increments), the resulting tracking information is passed along to a reporting module which matches the simulated routes with the flight trails. These results are then visually inspected for quality.

### Simulation Assumptions

All latitude and longitude calculations are done under the equirectangular projection. A corresponding flight exists for each route in the OpenFlights.org dataset (Open Database License). Flights depart hourly and follow a straight line trajectory between destinations. Once en route, flights are assumed to be Boeing 747s traveling at an altitude of 35,000 ft with a cruising speed of 575 mph.
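As a rough sketch of what a single simulation update might look like under these assumptions, the following hypothetical helper advances a flight along the straight line between its origin and destination. Positions are in degrees, a crude scale of about 69 miles per degree is assumed for the equirectangular projection, and all names are illustrative rather than the system’s actual code:

```java
public class FlightStep {
    /**
     * Latitude and longitude (degrees) of a flight h hours after departure,
     * moving in a straight line from origin to dest at speed mph.
     * milesPerDegree is a crude equirectangular scale (~69 mi per degree);
     * the position is clamped at the destination once the flight arrives.
     */
    public static double[] position(double[] origin, double[] dest,
                                    double h, double mph, double milesPerDegree) {
        double dLat = dest[0] - origin[0], dLon = dest[1] - origin[1];
        double totalMiles = Math.hypot(dLat, dLon) * milesPerDegree;
        double f = Math.min(1.0, (mph * h) / totalMiles); // fraction of trip completed
        return new double[] { origin[0] + f * dLat, origin[1] + f * dLon };
    }
}
```

The simulator would call something like this every fifteen simulated minutes, launching flights whose departure time has passed and retiring those with a completed fraction of 1.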

### Generation

Flights are mapped to one of $4000 \times 8000$ buckets based on each flight’s latitude and longitude coordinate. Each bucket spans a $0.045 \times 0.045$ degree region, as illustrated in the middle layer of Fig. (2). Given a bucket, a $512 \times 512$ oversimplified synthetic medium-resolution monochromatic satellite image is created with an aircraft silhouette for each $71 \times 65$ m Boeing 747 airliner in the bucket. (Visual obstructions such as clouds or nightfall are not depicted.) This image, along with the latitude and longitude of its top-left and bottom-right corners, is then passed along to the detection module.
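The bucketing itself reduces to integer division of the coordinate by the bucket size; a minimal sketch, assuming buckets are indexed from the south-west corner of the grid (names hypothetical):

```java
public class BucketGrid {
    static final int ROWS = 4000, COLS = 8000; // latitude x longitude buckets
    static final double CELL = 0.045;          // degrees per bucket side

    /** Row and column of the bucket containing the given coordinate. */
    public static int[] bucketOf(double lat, double lon) {
        // Shift to non-negative degrees, divide by the cell size, and clamp
        // the poles / antimeridian onto the last bucket.
        int row = (int) Math.min(ROWS - 1, Math.floor((lat + 90.0) / CELL));
        int col = (int) Math.min(COLS - 1, Math.floor((lon + 180.0) / CELL));
        return new int[] { row, col };
    }
}
```

Since flights in distinct buckets never share an image, each bucket can be handed to a worker independently, which is what makes the per-bucket parallelism above embarrassingly parallel.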

### Detection

Given an image and its world coordinate frame, the detection module performs textbook Fourier-based correlation template matching to identify silhouettes of airplanes, $X$, in the image, $Y$:

 $\displaystyle Z = \mathcal{F}^{-} \left( \mathcal{F}(X) \circ \mathcal{F}(Y) \right )$ (1)

Where the two-dimensional Discrete Fourier Transform and inverse transform are defined as:

 $\displaystyle \mathcal{F}(f)(u,v) = \frac{1}{M N} \sum_{m = 0}^{M - 1} \sum_{n = 0}^{N - 1} f(m, n) \exp{ \left( -2 \pi i \left( \frac{mu}{M} + \frac{nv}{N} \right ) \right ) }$ (2)

 $\displaystyle \mathcal{F^{-}}(F)(m,n) = \sum_{u = 0}^{M - 1} \sum_{v = 0}^{N - 1} F(u, v) \exp{ \left( 2 \pi i \left( \frac{mu}{M} + \frac{nv}{N} \right ) \right ) }$ (3)

To carry out these calculations efficiently, a parallelized two-dimensional Fast Fourier Transform (FFT) $\mathcal{O}(N \log N)$ time algorithm was implemented for both forward and inverse operations by leveraging the fact that operations (2), (3) can be factored so that the FFT can be computed row-wise, and then on those results column-wise. Hadamard (element-wise) product of the frequency domain representation of the airplane and satellite image is done naively in quadratic time.
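To make the row-column factorization concrete, here is a compact recursive sketch. It handles power-of-two dimensions only; the forward pass carries Eqn. (2)’s $1/(MN)$ normalization and the inverse carries none, matching Eqn. (3). This is an illustration of the factorization, not the system’s parallelized implementation:

```java
public class FFT2D {
    // Recursive radix-2 FFT of a complex sequence held as parallel re/im arrays.
    // Forward uses exp(-2*pi*i*k/n) twiddles, inverse uses exp(+2*pi*i*k/n).
    public static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        if (n == 1) return;
        double[] evenRe = new double[n / 2], evenIm = new double[n / 2];
        double[] oddRe = new double[n / 2], oddIm = new double[n / 2];
        for (int i = 0; i < n / 2; i++) {
            evenRe[i] = re[2 * i];    evenIm[i] = im[2 * i];
            oddRe[i] = re[2 * i + 1]; oddIm[i] = im[2 * i + 1];
        }
        fft(evenRe, evenIm, inverse);
        fft(oddRe, oddIm, inverse);
        double sign = (inverse ? 2 : -2) * Math.PI / n;
        for (int k = 0; k < n / 2; k++) {
            double c = Math.cos(sign * k), s = Math.sin(sign * k);
            double tr = c * oddRe[k] - s * oddIm[k];
            double ti = c * oddIm[k] + s * oddRe[k];
            re[k] = evenRe[k] + tr;         im[k] = evenIm[k] + ti;
            re[k + n / 2] = evenRe[k] - tr; im[k + n / 2] = evenIm[k] - ti;
        }
    }

    // Two-dimensional transform via the row-column factorization: transform
    // each row, then each column. The forward pass applies Eqn. (2)'s 1/(MN)
    // normalization; the inverse applies none, matching Eqn. (3).
    public static void fft2d(double[][] re, double[][] im, boolean inverse) {
        int m = re.length, n = re[0].length;
        for (int r = 0; r < m; r++)
            fft(re[r], im[r], inverse);                       // row-wise
        double[] colRe = new double[m], colIm = new double[m];
        for (int c = 0; c < n; c++) {                         // column-wise
            for (int r = 0; r < m; r++) { colRe[r] = re[r][c]; colIm[r] = im[r][c]; }
            fft(colRe, colIm, inverse);
            for (int r = 0; r < m; r++) { re[r][c] = colRe[r]; im[r][c] = colIm[r]; }
        }
        if (!inverse)
            for (int r = 0; r < m; r++)
                for (int c = 0; c < n; c++) { re[r][c] /= m * n; im[r][c] /= m * n; }
    }
}
```

Because each row transform is independent of the others (and likewise for columns), the two loops above are the natural places to introduce the parallelism described in the text.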

To denoise the results, the recovered spatial product, $Z$, is thresholded so that any real values greater than 90% of the product’s real maximum, $Z^*$, are kept:

 $\displaystyle Z_{x,y} = \Re{ \left( Z_{x, y} \right ) } 1_{ [ \Re{ \left( Z_{x, y} \right ) } \ge 0.9 Z^{*} ] }$ (4)

Since there are many values greater than the threshold, a linear time (in number of nodes) connected component labeling algorithm is applied to identify the most likely aircraft locations. The algorithm treats each pixel of the image as a node if that pixel’s value is greater than the threshold. An edge exists between two nodes if the nodes’ pixel coordinates are within an $\ell_\infty$ distance of two. The centroid of each connected component is then taken to be the true coordinate of each detected aircraft. Next, only those centroids derived from components having more than half the average number of pixels per component are kept. Finally, these centroids are transformed to latitude and longitude coordinates given the world coordinate frame.
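The labeling step can be sketched as a breadth-first flood fill over the thresholded pixels, merging any two pixels within an $\ell_\infty$ distance of two; the size-based filtering and world-coordinate transform are omitted here, and the names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class CentroidFinder {
    /**
     * Centroids (row, col) of connected components among pixels with value
     * at or above the threshold; two above-threshold pixels are connected
     * when their l-infinity distance is at most two.
     */
    public static List<double[]> centroids(double[][] z, double threshold) {
        int h = z.length, w = z[0].length;
        boolean[][] seen = new boolean[h][w];
        List<double[]> out = new ArrayList<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (seen[y][x] || z[y][x] < threshold) continue;
                // Breadth-first flood fill of one component, accumulating its centroid.
                Deque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] { y, x });
                seen[y][x] = true;
                double sy = 0, sx = 0;
                int n = 0;
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    sy += p[0]; sx += p[1]; n++;
                    for (int dy = -2; dy <= 2; dy++) {
                        for (int dx = -2; dx <= 2; dx++) {
                            int ny = p[0] + dy, nx = p[1] + dx;
                            if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                            if (seen[ny][nx] || z[ny][nx] < threshold) continue;
                            seen[ny][nx] = true;
                            queue.add(new int[] { ny, nx });
                        }
                    }
                }
                out.add(new double[] { sy / n, sx / n });
            }
        }
        return out;
    }
}
```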

### Tracking

The tracking module uses a $181 \times 361$ grid of buckets, with each bucket representing approximately a square degree region, as illustrated as the top layer in Fig. (2). Each individual bucket consists of a stack of sightings, where a sighting is a timestamped collection of coordinates. Here an individual coordinate is allowed to “connect” to up to two other coordinates. Coordinates connected in this fashion form a trail, which is the primary output of the module.

For each latitude and longitude coordinate from the detection module, $d \in D$, the tracking module picks all the previous time step’s coordinates, $p \in P$, from the neighboring ($\ell_\infty \le 5$) buckets of $d$‘s bucket. From $P$, only those coordinates that satisfy the following criteria are considered:

• $p$ must be free to “connect” to another coordinate.
• $d$ must be collinear to the coordinates of the trail headed by $p$, i.e., $\lvert AC - AB - BC \rvert \le \epsilon$ as in Fig. (3).
• Given the predecessor of $p$, the inner product of the vectors formed from the predecessor to $p$ and $d$ must be positive, i.e., $\langle \overrightarrow{AB}, \overrightarrow{AC} \rangle > 0$ as in Fig. (3).
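With the Fig. (3) labeling (A the predecessor of $p$, B the coordinate of $p$, C the candidate $d$), the last two criteria might be checked as follows; this is a sketch over planar coordinates with hypothetical names:

```java
public class TrackFilter {
    /** Collinearity and forward-progress test for points a = A, b = B, c = C. */
    public static boolean admissible(double[] a, double[] b, double[] c, double eps) {
        double ab = dist(a, b), bc = dist(b, c), ac = dist(a, c);
        boolean collinear = Math.abs(ac - ab - bc) <= eps;  // |AC - AB - BC| <= eps
        double dot = (b[0] - a[0]) * (c[0] - a[0])
                   + (b[1] - a[1]) * (c[1] - a[1]);         // <AB, AC>
        return collinear && dot > 0;
    }

    static double dist(double[] u, double[] v) {
        return Math.hypot(u[0] - v[0], u[1] - v[1]);
    }
}
```

The inner product check rejects candidates that would fold the trail back on itself even when they happen to lie on the same line.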

Next, the nearest neighbor of $d$ is chosen from this remaining set of points. If a nearest neighbor exists, then $d$ is appended to the end of the nearest neighbor’s trail; otherwise a new trail is created. Finally, $d$ is added to its designated bucket so that it can be used for future trail building.

When the simulation completes, all trails from the tracking module are analyzed and matched to the known routes used in the simulation. Matching is done by minimizing the distance from the trail’s origin and destination to a route’s origin and destination, respectively. If the mean orthogonal distance:

 $\displaystyle MOD(x) = \frac{1}{N} \sum_{i} \frac{ \lvert \langle w, x_i \rangle + b \rvert }{ \lVert w \rVert }$ (5)

from the coordinates in the trail to the line formed by connecting the route’s origin and destination is greater than 25 m, then the match is rejected.
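With $w$ taken normal to the segment from the route’s origin to its destination, Eqn. (5) reduces to the familiar mean point-to-line distance; a sketch over planar coordinates (names hypothetical):

```java
public class RouteMatch {
    /**
     * Mean orthogonal distance (Eqn. 5) from the given points to the line
     * through p and q; w = (q_y - p_y, p_x - q_x) is normal to the segment
     * and b = -<w, p> places the line through p.
     */
    public static double meanOrthogonalDistance(double[][] points, double[] p, double[] q) {
        double wx = q[1] - p[1], wy = p[0] - q[0];
        double b = -(wx * p[0] + wy * p[1]);
        double norm = Math.hypot(wx, wy);
        double sum = 0.0;
        for (double[] x : points)
            sum += Math.abs(wx * x[0] + wy * x[1] + b) / norm;
        return sum / points.length;
    }
}
```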

### Reporting

The reporting module is responsible for summarizing the system’s performance: the average mean orthogonal distance given by Eqn. (5) over all identified routes, the total number of images processed and coordinates detected, and the portion of routes correctly matched.

## System Configurations

### Standalone

Standalone mode runs the application in a single JVM without using Spark. Experiments were run on jaws, a quad-core Intel i7-3630QM laptop with 8 GB of memory and a 500 GB hard drive, running Windows 8.1 with Java SE 7.

### Cluster

Cluster mode runs the application on a Spark cluster in standalone mode. Experiments were run on a network consisting of two laptop computers connected to a private 802.11n wireless network. In addition to jaws, the laptop oddjob was used. oddjob is a quad-core Intel i7-2630QM laptop with 6 GB of memory and a 500 GB hard drive running Windows 7. Atop each machine, Oracle VM VirtualBox hosts two cloned Ubuntu 14.04 guest operating systems. Each virtual machine has two cores, 2 GB of memory, and an 8 GB hard drive, and connects to the network through a network adapter bridged to its host’s. Host and guest operating systems run Java SE 7, Apache Hadoop 2.6, and Scala 2.11.6 as prerequisites for Apache Spark 1.3.1. In total, there are four Spark Workers that report to a single Spark Master, which is driven by a single Spark Driver.

### Cloud

Cloud mode runs the application on an Amazon Web Services (AWS) provisioned Spark cluster managed by Apache Yarn. Experiments were run using AWS’s Elastic MapReduce (EMR) platform to provision the maximum allowable twenty[1] Elastic Compute Cloud (EC2) previous generation m1.medium instances (one master, nineteen core) by scheduling jobs to execute the application JARs from the Simple Storage Service (S3). Each m1.medium instance consists of one 2 GHz CPU, 3.7 GB of memory, and a 3.9 GB hard drive running Amazon Machine Image (AMI) 3.6 equipped with Red Hat 4.8, Java SE 7, and Spark 1.3.0. In total, there are nineteen Spark Workers and one Spark Master (one per virtual machine) managed by a Yarn Resource Manager driven by a single Yarn Client hosting the application.

## System Evaluation

### Detection Rate

 $\displaystyle D_R = \frac{\# \left( \text{Detected coordinates} \right)}{\# \left( \text{Expected coordinates} \right)}$ (6)

When an image is sparsely populated, the system consistently detects the presence of the aircraft; however, as the density increases, the system is less adept at finding all flights in the region, as shown in Fig. (7). This is unexpected behavior. Explanations include the possibility that the threshold needs to be made adaptive, or that a different approach needs to be taken altogether. In terms of real-world implications, FAA regulations (JO 7110.65V 5-4-4) state that flights must maintain a minimum lateral separation of 3 to 5 miles (4.8 to 8 km). In principle, there could be at most four flights in a given image under these guidelines, and the system would still have a 96.6% chance of identifying all four positions.

### Detection Accuracy

A flight from DIA to CTL was simulated to measure how accurately the template matching approach works, as illustrated in Fig. (8). Two errors were measured: the mean orthogonal distance given by Eqn. (5), and the mean distance between the detected and actual coordinates over all time steps:

 $\displaystyle MD(x, y) = \frac{1}{N} \sum_i \lVert x_i - y_i \rVert_2$ (7)

For Eqn. (5) a mean error of $9.99 \pm 3.2$ m was found, and for Eqn. (7) $19.95 \pm 3.3$ m. Both errors are acceptable given a single pixel represents approximately $10$ m. (For context, the global positioning system (GPS) has a 7.8 m error.)

In terms of how accurate detection is at the macro level, 500 flights were simulated and the resulting mean orthogonal distance was analyzed. Fig. (9) illustrates the bimodal distribution that was observed. 65% of the flights observed an accuracy less than 26 m with an average accuracy of $14.2 \pm 5.8$ m, while the remaining 35% saw an average accuracy of $111.9 \pm 100$ km which is effectively being off by a full degree. It is assumed that this 35% are cases where trails are paired incorrectly with routes. Based on these findings, the system enforces that all pairings not exceed a mean orthogonal distance of 25 m.

### Tracking Rate

 $\displaystyle T_R = \frac{\# \left( \text{Correctly paired trails} \right)}{\# \left( \text{Expected trails} \right)}$ (8)

For Fig. (10), an increasing number of random flights were simulated to completion and the resulting mean tracking rate reported. Based on these findings, the tracking module has difficulty correctly handling many concurrent flights originating from different airports. This behavior is likely a byproduct of how quickly the detection rate degrades when many flights occupy a single region of space. When running a similar simulation where all flights originate from the same airport, the tracking rate is consistently near-perfect, independent of the number of flights. This suggests the system has difficulty with flights that cross paths. (For context, there are on average 7,000 concurrent flights over US airspace at any given time that we would wish to track.)

### Performance

A series of experiments was conducted against the three configurations to measure how quickly the system could process different volumes of flights across the United States over a 24-hour period. The results are illustrated in Fig. (11). Unsurprisingly, the Cloud mode outperforms both the Standalone and Cluster modes by a considerable factor as the number of flights increases.

| Configuration | ms/image | MB/s | Time (min) |
| --- | --- | --- | --- |
| Standalone | 704 | 3.00 | 260 |
| Cluster | 670 | 2.84 | 222 |
| Cloud | 207 | 9.67 | 76 |

Table 1: Processing time per image, megabytes of image data processed per second, and overall processing time for 22k images by system configuration.

Table (1) lists the overall processing time for 22k images representing roughly 550k km² and 43 GB of image data. If the Cloud configuration were used to monitor the entire United States, it would need approximately 22 hours to process a single snapshot consisting of 770 GB of image data. Obviously, this processing time is inadequate to keep up with a recurring avalanche of data every fifteen minutes; to do so, a real-world system would need to be capable of processing an image every 2 ms. To achieve this, (1) more instances could be added, (2) the implementation could be refined to be more efficient, (3) the implementation could leverage GPUs for detection, or (4) a custom-tailored alternative to Spark could be used.

## Discussion

### Future Work

There are many opportunities to exchange the underlying modules with more robust techniques that both scale and are able to handle real-world satellite images. The intake and generation modules can be adapted to either generate more realistic flight paths and resulting satellite imagery, or adapted to handle real-world satellite imagery from a vendor such as Skybox Imaging, Planet Labs, or DigitalGlobe.

For detection, the correlation-based approach can be replaced with a normalized cross-correlation approach, or with the more involved Scale-Invariant Feature Transform (SIFT) method, which would be more robust at handling aircraft of different sizes and orientations. Alternatively, the parallelism granularity can be changed so that the two-dimensional FFT row-wise and column-wise operations are distributed over the cluster, permitting the processing of larger images.

Tracking remains an open issue for this system. Getting the detection rate to be near perfect will go a long way, but the age of historical sightings considered could be increased to account for “gaps” in the detection trail. Yilmaz et al. provide an exhaustive survey of deterministic and statistical point tracking methods that can be applied here, in particular the Joint Probability Data Association Filter (JPDAF) and Multiple Hypothesis Tracking (MHT) methods which are worth exploring further.

On the reporting end of the system, a dashboard showing all of the detected trails and coordinates would provide an accessible user interface to end-users to view and analyze flight trails, discover last known locations, and detect anomalies.

### Real-world Feasibility

While the scope of this work has focused on system internals, it is important to recognize that a real-world system requires a supporting infrastructure of satellites, ground stations, computing resources, facilities, and staff, each of which imposes its own set of limitations on the system. To evaluate the system’s feasibility, its expected cost is compared to the expected cost of the ADS-B approach.

Following the CubeSat model and a 1970 study by J. G. Walker, a constellation of 25 satellites ($1M ea.) in low earth orbit is needed to provide continuous whole earth coverage, for a total of $25M. Ground stations ($120k ea.), each able to communicate with one satellite at a time, bring total costs to $50M.[2] Assuming that a single computer is responsible for a square degree region, the system will require 64,800 virtual machines, equivalently 1,440 quad-core servers ($1k ea.), bringing the running total to $51M.

ADS-B costs are handled by aircraft owners. Average upgrade costs are $5k, with prices varying by vendor and aircraft. Airports already have Universal Access Transceivers (UATs) to receive ADS-B signals. FAA statistics list approximately 200,000 registered aircraft, suggesting a total cost of $1B.

Given that these are very rough estimates, an unobtrusive $51M system would be a good alternative to a $1B exchange between private owners and ADS-B vendors. (Operational costs of the system were estimated to be $1.7M/year based on market rates for co-locations and staff salaries.)

### Conclusions

In this work, a distributed system has been presented to detect and track commercial aircraft from synthetic satellite imagery. The system’s accuracy and detection rates are acceptable given established technologies. Given suitable hardware resources, it would be an effective tool in assisting search-and-rescue teams in locating airplanes from historic satellite images. More work needs to be done to improve the system’s tracking abilities before it is a viable real-world air traffic control system. While not implemented, the data needed to support flight deviation, flight collision detection, and other air traffic control functionality is readily available. Spark is an effective tool for quickly distributing work to a cluster, but more specialized high performance computing approaches may yield better runtime performance.

## References

[1] Automatic dependent surveillance-broadcast (ads-b) out equipment and use. Technical Report 14 CFR 91.225, U.S. Department of Transportation Federal Aviation Administration, May 2010.

[2] General aviation and air taxi activity survey. Technical Report Table 1.2, U.S. Department of Transportation Federal Aviation Administration, Sep 2014.

[3] Nanosats are go! The Economist, June 7 2014.

[4] A. Eisenberg. Microsatellites: What big eyes they have. New York Times, August 10 2013.

[5] A. Fasih. Machine vision. Lecture given at Transportation Informatics Group, ALPEN-ADRIA University of Klagenfurt, 2009.

[6] S. Kang, K. Kim, and K. Lee. Tablet application for satellite image processing on cloud computing platform. In Geoscience and Remote Sensing Symposium (IGARSS), 2013 IEEE International, pages 1710-1712, July 2013.

[7] W. Lee, Y. Choi, K. Shon, and J. Kim. Fast distributed and parallel pre-processing on massive satellite data using grid computing. Journal of Central South University, 21(10):3850-3855, 2014.

[8] J. Lewis. Fast normalized cross-correlation. In Vision interface, volume 10, pages 120-123, 1995.

[9] W. Li, S. Xiang, H. Wang, and C. Pan. Robust airplane detection in satellite images. In Image Processing (ICIP), 2011 18th IEEE International conference on, pages 2821-2824, Sept 2011.

[10] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91-110, 2004.

[11] H. Prajapati and S. Vij. Analytical study of parallel and distributed image processing. In Image Information Processing (ICIIP), 2011 International Conference on, pages 1-6, Nov 2011.

[12] E. L. Ray. Air traffic organization policy. Technical Report Order JO 7110.65V, U.S. Department of Transportation Federal Aviation Administration, April 2014.

[13] J. Tunaley. Algorithms for ship detection and tracking using satellite imagery. In Geoscience and Remote Sensing Symposium, 2004. IGARSS ’04. Proceedings. 2004 IEEE International, volume 3, pages 1804-1807, Sept 2004.

[14] J. G. Walker. Circular orbit patterns providing continuous whole earth coverage. Technical report, DTIC Document, 1970.

[15] Y. Yan and L. Huang. Large-scale image processing research cloud. In CLOUD COMPUTING 2014, The Fifth International Conference on Cloud Computing, GRIDs, and Virtualization, pages 88-93, 2014.

[16] A. Yilmaz, O. Javed, and M. Shah. Object tracking: A survey. Acm computing surveys (CSUR), 38(4):13, 2006.

Written by lewellen

2015-05-20 at 8:00 am

## Parallel Merge Sort in Java


### Introduction

This past November I was pretty busy getting settled into a new job and trying to balance life’s other priorities. With a new job also came a new technology stack, and while I’ll continue to do C# development in my free time, I’m going to be going back to doing Java development after a seven-year hiatus. Before starting the new job, I decided to refresh my memory of the language’s finer details when it comes to generics and threading. So, I decided to implement something simple and settled on a parallel implementation of merge sort. This article focuses on making use of Java’s various features and evaluating the theoretical and empirical run time performance of the sequential and parallel versions of the algorithm.

### Sequential Approach

#### Specification

Given a list of values, the list is sorted by employing a divide and conquer method that partitions the list into two (roughly) equal sized partitions, followed by recursively sorting each partition and then merging the two resulting sorted partitions into the final sorted list.

#### Pseudocode

 $\displaystyle \textbf{MERGE}(X, Y) \newline \indent L_X \leftarrow \textbf{LENGTH}(X) \newline \indent L_Y \leftarrow \textbf{LENGTH}(Y) \newline \indent L_Z \leftarrow L_X + L_Y \newline \indent Z \leftarrow [L_Z] \newline \indent i, j, k \leftarrow 0, 0, 0 \newline \newline \indent \textbf{while} \quad k < L_Z \newline \indent \indent \textbf{if} \quad i < L_X \land j \ge L_Y \newline \indent \indent \indent Z[k] \leftarrow X[i] \newline \indent \indent \indent i \leftarrow i + 1 \newline \indent \indent \textbf{else-if} \quad i \ge L_X \land j < L_Y \newline \indent \indent \indent Z[k] \leftarrow Y[j] \newline \indent \indent \indent j \leftarrow j + 1 \newline \indent \indent \textbf{else-if} \quad i < L_X \land j < L_Y \newline \indent \indent \indent \textbf{if} \quad X[i] \le Y[j] \newline \indent \indent \indent \indent Z[k] \leftarrow X[i] \newline \indent \indent \indent \indent i \leftarrow i + 1 \newline \indent \indent \indent \textbf{else} \newline \indent \indent \indent \indent Z[k] \leftarrow Y[j] \newline \indent \indent \indent \indent j \leftarrow j + 1 \newline \indent \indent k \leftarrow k + 1 \newline \newline \indent \textbf{return} \quad Z$

 $\displaystyle \textbf{MERGE-SORT}(X) \newline \indent L \leftarrow \textbf{LENGTH}(X) \newline \indent \textbf{if} \quad L \le 1 \newline \indent \indent \textbf{return} \quad X \newline \newline \indent \textbf{return} \quad \textbf{MERGE} ( \newline \indent \indent \textbf{MERGE-SORT} ( \newline \indent \indent \indent \textbf{PARTITION}(X, 0, \lfloor L / 2 \rfloor + L \mod 2) \newline \indent \indent ), \newline \indent \indent \textbf{MERGE-SORT} ( \newline \indent \indent \indent \textbf{PARTITION}(X, \lfloor L / 2 \rfloor + L \mod 2, \lfloor L / 2 \rfloor) \newline \indent \indent ) \newline \indent )$

 $\displaystyle \textbf{PARTITION}(X, s, L) \newline \indent Y \leftarrow [L] \newline \indent k \leftarrow 0 \newline \newline \indent \textbf{while} \quad k < L \newline \indent \indent Y[k] \leftarrow X[s + k] \newline \indent \indent k \leftarrow k + 1 \newline \newline \indent \textbf{return} \quad Y$

#### Time Complexity

In terms of time complexity, the algorithm is on the order of $\mathcal{O}(n \log_2(n))$. To show this, observe that the input size, $n$, is divided into two equal parts, $2 T(n/2)$, followed by a merge operation, $f(n)$. This leads to the recurrence relation given by $\displaystyle T(n) = \begin{cases} 1 & n \le 1 \\ 2 T(n/2) + f(n) & n > 1 \end{cases}$. By induction, the recurrence relation is reduced to $\displaystyle T(n) = 2^k T(n/2^k) + \sum_{m = 0}^{k-1} 2^m f \left ( \frac{n}{2^m} \right )$. Observing that the merge function is on the order $\mathcal{O}(n)$, i.e., $f(n) = c n$, the expression reduces further to $\displaystyle T(n) = 2^k T \left ( \frac{n}{2^k} \right ) + \sum_{m = 0}^{k-1} c n$ and then to $\displaystyle T(n) = 2^k T \left ( \frac{n}{2^k} \right ) + c n k$. As the number of subdivisions increases, eventually $n$ will be reduced to $1$. As such, let $1 = n/2^k$, which implies $2^k = n$, which implies $k = \log_2(n)$, and thus $T(n) = n T(1) + c n \log_2(n)$. Therefore, $T(n) \subset \mathcal{O}(n \log_2 n) \quad \square$

#### Implementation

In attempting to implement a generic version of merge sort there were a few matters that needed to be addressed. First, the type being sorted required an order relation to be specified so that the merge operation could take place. This is facilitated by restricting the type parameter T to Comparable<T>. Secondly, I had forgotten that you can’t initialize arrays of generics in Java like you can in C# [1]. To work around this limitation, I settled on specifying the desired operations over implementations of the List<T> interface. Finally, since the List<T> interface makes no guarantees that its implementations provide (near) constant time reading or writing of elements from the list, an additional generic parameter, L, was added so that only those implementations of the List<T> and RandomAccess [2] interfaces could use this implementation of merge sort. The rest of the implementation is a near facsimile of the pseudocode.

package com.wordpress.antimatroid;

import java.util.List;
import java.util.RandomAccess;

public interface IListOperation<T, L extends List<T> & RandomAccess> {
    L execute();
}

package com.wordpress.antimatroid;

import java.util.ArrayList;
import java.util.List;
import java.util.RandomAccess;

public class CopyListOperation<T, L extends List<T> & RandomAccess>
        implements IListOperation<T, L> {

    private final L source;
    private final int length, initialIndex;

    public CopyListOperation(L source, int length, int initialIndex) {
        if (source == null)
            throw new IllegalArgumentException("source must be non-null.");

        if (length < 0)
            throw new IllegalArgumentException(String.format(
                    "length, %d, must be greater than or equal to zero.", length));

        if (initialIndex < 0)
            throw new IllegalArgumentException(String.format(
                    "initialIndex, %d, must be greater than or equal to zero.", initialIndex));

        if (initialIndex + length > source.size())
            throw new IllegalArgumentException(String.format(
                    "initialIndex, %d, + length, %d, must be less than or equal to source.size(), %d.",
                    initialIndex, length, source.size()));

        this.source = source;
        this.length = length;
        this.initialIndex = initialIndex;
    }

    @Override
    @SuppressWarnings("unchecked")
    public L execute() {
        L destination = (L) new ArrayList<T>(length);
        for (int i = 0; i < length; i++)
            destination.add(source.get(initialIndex + i));
        return destination;
    }
}

package com.wordpress.antimatroid;

import java.util.ArrayList;
import java.util.List;
import java.util.RandomAccess;

public class MergeListOperation
<T extends Comparable<T>, L extends List<T> & RandomAccess>
implements IListOperation<T, L> {

private final L a, b;

public MergeListOperation(L a, L b) {
if(a == null)
throw new IllegalArgumentException("a must not be null.");

if(b == null)
throw new IllegalArgumentException("b must not be null.");

this.a = a;
this.b = b;
}

@Override
public L execute() {
int length = a.size() + b.size();
L c = (L) new ArrayList<T>(length);

int i = 0, j = 0;
for(int k = 0; k < length; k++) {
if(i < a.size() && j < b.size()) {
if(a.get(i).compareTo(b.get(j)) <= 0) {
c.add(a.get(i++));
} else {
c.add(b.get(j++));
}
} else if (i < a.size() && j >= b.size()) {
c.add(a.get(i++));
} else if (i >= a.size() && j < b.size()) {
c.add(b.get(j++));
} else {
break;
}
}

return c;
}
}

package com.wordpress.antimatroid;

import java.util.List;
import java.util.RandomAccess;

public class MergeSortListOperation <
T extends Comparable<T>,
L extends List<T> & RandomAccess
> implements IListOperation<T, L> {

private final L a;

public MergeSortListOperation(L a) {
if(a == null)
throw new IllegalArgumentException("a must not be null.");

this.a = a;
}

@Override
public L execute() {
if(a.size() <= 1)
return a;

CopyListOperation<T, L> leftPartition
= new CopyListOperation<T, L>(a, (a.size() / 2) +  a.size() % 2, 0);
CopyListOperation<T, L> rightPartition
= new CopyListOperation<T, L>(a, (a.size() / 2), (a.size() / 2) +  a.size() % 2);

MergeSortListOperation<T, L> leftSort
= new MergeSortListOperation<T, L>(leftPartition.execute());
MergeSortListOperation<T, L> rightSort
= new MergeSortListOperation<T, L>(rightPartition.execute());

MergeListOperation<T, L> merge
= new MergeListOperation<T, L>(leftSort.execute(), rightSort.execute());

return merge.execute();
}
}
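To see the pieces above in action, the following is a minimal, self-contained sketch of the same top-down algorithm compacted into a single static method (the class is hypothetical and not part of the original listings; it mirrors the partition-copy-merge pipeline of CopyListOperation, MergeListOperation, and MergeSortListOperation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeSortDemo {
    // Top-down merge sort over a List<T>, mirroring the article's
    // CopyListOperation / MergeListOperation / MergeSortListOperation pipeline.
    static <T extends Comparable<T>> List<T> mergeSort(List<T> a) {
        if (a.size() <= 1)
            return a;

        // Partition: the left half takes the extra element when the size is
        // odd, matching the (n / 2) + n % 2 split used in the listings.
        int mid = a.size() / 2 + a.size() % 2;
        List<T> left = mergeSort(new ArrayList<T>(a.subList(0, mid)));
        List<T> right = mergeSort(new ArrayList<T>(a.subList(mid, a.size())));

        // Merge the two sorted halves into a new list.
        List<T> c = new ArrayList<T>(a.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size())
            c.add(left.get(i).compareTo(right.get(j)) <= 0
                ? left.get(i++) : right.get(j++));
        while (i < left.size()) c.add(left.get(i++));
        while (j < right.size()) c.add(right.get(j++));
        return c;
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<Integer>(Arrays.asList(5, 1, 4, 2, 3));
        System.out.println(mergeSort(xs)); // prints [1, 2, 3, 4, 5]
    }
}
```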


#### Run Time Analysis

Noting that the theoretical time complexity is $\mathcal{O}(n \log_2 n)$, inputs of the form $2^k$ will yield a $k 2^k$ curve, whose logarithm is $\log(k) + k$. As $k$ increases, the linear term dominates the expression, so the curve should look nearly linear in logarithmic space except for small values of $k$. This means that conducting a linear least squares regression of the empirical run times in logarithmic space will yield a satisfactory approximation to the theoretical time complexity.

To verify that the implementation follows the theoretical time complexity, increasing values of $k$ were used to generate lists containing $2^k$ random values. These lists were then sorted, and the System.nanoTime() before and after values were used to determine the elapsed time. These values were collected and a total of 50 identical trials were conducted on an Intel Core i7-2630QM CPU @ 2.00 GHz based machine with 6.00 GB RAM.
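The trial harness can be sketched roughly as follows; Collections.sort stands in for the merge sort under test, and the constants (5 trials, $k \le 12$) are illustrative rather than the 50 trials actually used:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TimingHarness {
    public static void main(String[] args) {
        Random rng = new Random();
        int trials = 5; // the article used 50 identical trials
        for (int k = 1; k <= 12; k++) {
            int n = 1 << k; // input sizes of the form 2^k
            long total = 0;
            for (int t = 0; t < trials; t++) {
                // Generate a list of 2^k random values.
                List<Integer> xs = new ArrayList<Integer>(n);
                for (int i = 0; i < n; i++)
                    xs.add(rng.nextInt());

                // Time the sort with System.nanoTime() before/after values.
                long start = System.nanoTime();
                Collections.sort(xs); // stand-in for the merge sort under test
                total += System.nanoTime() - start;
            }
            System.out.printf("n = %d, mean elapsed = %d ns%n", n, total / trials);
        }
    }
}
```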

As presented in the plot, the regressed linear model in logarithmic space yields a satisfactory theoretical curve whose relative error to the empirical curve diminishes to zero as the input size increases.

### Parallel Approach

#### Specification

The parallel implementation operates under the premise that the divide portion of merge sort can be easily parallelized by sorting one partition on the present thread and sorting the other partition on a secondary thread. Once the secondary thread has completed, then the two threads join, and consequently, the two sorted lists are merged. To avoid copious thread creation, whenever the input size is less than a threshold, $\tau$, the sequential version of the algorithm is used.

This process can be easily visualized below, where each left-hand branch is the originating thread processing the first partition, each right-hand branch is the secondary thread processing the second partition, and the junction of those edges represents the consequent merge operation after the secondary thread has joined back in with the originating thread.

#### Time Complexity

The introduction of parallelism changes the original recurrence relation to the following:

$T(n) = \begin{cases} 1 & n \le 1 \\ 2T(n/2) + f(n) & 1 < n \le \tau \\ \max{\left (T(n/2),T(n/2)\right )} + f(n) & n > \tau \end{cases}$

Assuming $\tau = 1$, and that there is no asymptotic difference in sorting the first and second partition, the time complexity is on the order of $\mathcal{O}(n)$. To see this, observe that the recurrence relation becomes $T(n) = \begin{cases} 1 & n \le 1 \\ T(n/2) + f(n) & n > 1 \end{cases}$ under the presented assumptions. Following the same process of induction as in the sequential case, the recurrence relation reduces to $\displaystyle T(n) = T \left ( \frac{n}{2^k} \right ) + \sum_{m=0}^{k-1} f \left ( \frac{n}{2^m} \right )$ and is simplified further under the assumption $f(n) = c n$ to $\displaystyle T(n) = T \left ( \frac{n}{2^k} \right ) + c n \sum_{m=0}^{k-1} \frac{1}{2^m}$. Observing that the sum is a finite geometric series leads to $\displaystyle T(n) = T \left ( \frac{n}{2^k} \right ) + 2 c n \left (1 - \frac{1}{2^{k}} \right )$ and, letting $k = \log_2 n$ as before, to $T(n) = T(1) + 2 c (n - 1)$. Thus, the time complexity of the parallel merge sort specified is $T(n) \subset \mathcal{O}(n) \quad \square$

Assuming $\tau = \infty$, the time complexity of the algorithm is still on the order of $\mathcal{O}(n \log_2 n)$. Thus, for various values of $\tau \in [0, \infty)$ and $n \ge 2$, the time complexity lies between $\mathcal{O}(n) \le T(n) \le \mathcal{O}(n \log_2 n)$.

#### Implementation

In terms of parallelizing the sequential implementation, an additional interface, IThreadedListOperation, was added to provide the BeginOperation/EndOperation asynchronous programming model found in the .NET world. After looking around the Java world, I didn’t encounter a preferred idiom, so I went with what I knew.

As I mentioned in the sequential approach, the original data structures were going to be arrays, which guarantee thread safe reads, but not necessarily thread safe writes. To avoid the issue altogether, I decided that the IListOperations should always return a new List<T> instance so that only one thread at a time would be reading or manipulating that memory. Since I knew my implementation would not be sharing IListOperations between threads, I decided not to gold plate the implementation with synchronization constructs. If in the future such ability were required, I would go back and modify the code accordingly.

For the parallel implementation I took advantage of the fact that method arguments are evaluated left-to-right [3] to save some space, but if the specification ever changed, then it would be more appropriate to move the leftSort.execute() and rightSort.executeEnd() calls up a line to form a more explicit sequence of operations.

package com.wordpress.antimatroid;

import java.util.List;
import java.util.RandomAccess;

public abstract class IThreadedListOperation
<T, L extends List<T> & RandomAccess>
implements Runnable, IListOperation<T, L> {

private Thread thread;

public void executeBegin() {
if(thread != null)
throw new IllegalStateException();

thread = new Thread(this);
thread.start();
}

public L executeEnd() {
if(thread == null)
throw new IllegalStateException();

try {
thread.join();
} catch (InterruptedException e) {

}

return getResult();
}

public L execute() {
if(thread != null)
throw new IllegalStateException();

run();
return getResult();
}

abstract protected L getResult();
}

package com.wordpress.antimatroid;

import java.util.List;
import java.util.RandomAccess;

public class MergeSortThreadedListOperation
<T extends Comparable<T>, L extends List<T> & RandomAccess>
extends IThreadedListOperation<T, L> {

private final L a;
private L b;

private final int threshold;

public MergeSortThreadedListOperation(L a) {
this(a, 1024);
}

public MergeSortThreadedListOperation(L a, int threshold) {
if(a == null)
throw new IllegalArgumentException("a must be non-null.");

if(threshold <= 0)
throw new IllegalArgumentException("threshold must be greater than zero.");

this.a = a;
this.threshold = threshold;
}

@Override
public void run() {
if(a.size() <= 1) {
b = a;
return;
}

if(a.size() <= threshold) {
MergeSortListOperation<T, L> mergeSort = new MergeSortListOperation<T, L>(a);
b = mergeSort.execute();
return;
}

CopyListOperation<T, L> leftPartition
= new CopyListOperation<T, L>(a, (a.size() / 2) +  a.size() % 2, 0);

CopyListOperation<T, L> rightPartition
= new CopyListOperation<T, L>(a, (a.size() / 2), (a.size() / 2) +  a.size() % 2);

MergeSortThreadedListOperation<T, L> leftSort
= new MergeSortThreadedListOperation<T, L>(leftPartition.execute(), threshold);

MergeSortThreadedListOperation<T, L> rightSort
= new MergeSortThreadedListOperation<T, L>(rightPartition.execute(), threshold);

rightSort.executeBegin();

MergeListOperation<T, L> merge
= new MergeListOperation<T, L>(leftSort.execute(), rightSort.executeEnd());

b = merge.execute();
}

@Override
protected L getResult() {
return b;
}
}
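A compacted, self-contained sketch of the same fork-one-thread strategy is given below (the class name is hypothetical, and Collections.sort stands in for the sequential merge sort below the threshold): the secondary thread sorts the right partition while the current thread sorts the left, after which the two join and merge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParallelMergeSortDemo {
    static final int THRESHOLD = 1024; // fall back to sequential sorting below this

    static <T extends Comparable<T>> List<T> sort(List<T> a) throws InterruptedException {
        if (a.size() <= THRESHOLD) {
            List<T> b = new ArrayList<T>(a);
            Collections.sort(b); // stand-in for the sequential merge sort
            return b;
        }

        int mid = a.size() / 2 + a.size() % 2;
        final List<T> rightIn = new ArrayList<T>(a.subList(mid, a.size()));
        final List<List<T>> rightOut = new ArrayList<List<T>>(1);

        // Secondary thread sorts the right partition...
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    rightOut.add(sort(rightIn));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.start();

        // ...while the current thread sorts the left; then join and merge.
        List<T> left = sort(new ArrayList<T>(a.subList(0, mid)));
        worker.join();
        List<T> right = rightOut.get(0);

        List<T> c = new ArrayList<T>(a.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size())
            c.add(left.get(i).compareTo(right.get(j)) <= 0
                ? left.get(i++) : right.get(j++));
        while (i < left.size()) c.add(left.get(i++));
        while (j < right.size()) c.add(right.get(j++));
        return c;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> xs = new ArrayList<Integer>();
        for (int i = 5000; i > 0; i--)
            xs.add(i);
        List<Integer> sorted = sort(xs);
        System.out.println(sorted.get(0) + " .. " + sorted.get(sorted.size() - 1)); // prints 1 .. 5000
    }
}
```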


#### Run Time Analysis

Noting that the time complexity for the parallel approach is $\mathcal{O}(n)$, a simple linear least squares regression of the empirical run times in normal space will yield a satisfactory approximation to the theoretical time complexity.

The trial methodology used in the sequential run time analysis is used once again to produce the following plot. Note that it begins at 2048 instead of 1. This was done so that only the parallel implementation was considered and not the sequential implementation when the input size is $\le 1024$.

As presented in the plot, the regressed linear model in normal space yields a satisfactory theoretical curve whose relative error to the empirical curve diminishes to zero as the input size increases.

#### Threshold Selection

As a thought experiment, it makes sense that as the threshold approaches infinity there is no difference between the sequential implementation and the parallel one. Likewise, as the threshold approaches one, the number of threads being created becomes exceedingly large and, as a result, places a higher cost on parallelizing the operation. Someplace in the middle ought to be an optimal threshold that yields better run time performance than both the sequential implementation and a pure parallel implementation. So a fixed input size should produce a convex curve as a function of the threshold and hence have a global minimum.

Conducting a similar set of trials as the ones conducted in the sequential run time analysis gives fully parallel and fully sequential curves against which to evaluate where the optimal threshold resides. As the plot depicts, as the threshold approaches one, there is an increase in processing time taking the form of a convex curve. As the threshold exceeds the input size, the sequential approach dominates. By conducting a paired t-test against the means of the two curves at each input size, 1024 was determined to be the optimal threshold on the hardware used to conduct the trials. As the input size grows, it is evident that for input sizes less than 1024 the sequential approach requires less time, and beyond that the parallel approach is favorable.

### Conclusion

In comparing the sequential and parallel implementations it was observed that the specified parallel implementation produced as much as a 2.65 factor improvement over the specified sequential implementation for megabyte sized lists.

Larger sized lists exhibited a declining improvement factor. It is presumed that as the input size grows that the amount of memory being created is causing excessive paging and as a result increasing the total run time and consequently reducing the improvement factor. To get around this limitation, the algorithm would need to utilize an in-place approach and appropriate synchronization constructs put into place to guarantee thread safety.

From a theoretical point of view, the improvement factor is the ratio of the run time of the sequential implementation to that of the parallel implementation. Using the time complexities presented, $\displaystyle S = \frac{n \log_2 n}{n} = \log_2 n$. Taking the limit as the input size grows to infinity gives $\displaystyle \lim_{n \to \infty} \log_2 n = \infty$. So if there is any upper bound to the improvement factor, it should be purely technical.
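As a worked instance of this model, a megabyte-scale list of $n = 2^{20}$ elements would admit a theoretical speedup of

$\displaystyle S = \log_2 n = \log_2 2^{20} = 20$

which is well above the observed 2.65 factor, consistent with the technical limitations (thread creation overhead and memory pressure) discussed above.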

### Footnotes

[1] This design decision is discussed in §4.7 of the Java Language Specification (3rd Edition) on reifiable types.

[2] The only two java.util classes providing this guarantee are ArrayList and Vector, both of which implement the RandomAccess interface, which is intended to indicate that the class provides (near) constant time reading and writing of elements.

[3] The left-to-right order of evaluation is specified by §15.7.4 of the Java Language Specification (3rd Edition). It is also worth noting that the specification recommends against relying on this convention in §15.7:

… It is recommended that code not rely crucially on this specification. Code is usually clearer when each expression contains at most one side effect, as its outermost operation, and when code does not depend on exactly which exception arises as a consequence of the left-to-right evaluation of expressions.

Written by lewellen

2012-12-01 at 8:00 am

## Viderefit: A Fitness Tracking App for Android Tablets


### Introduction

Earlier this year I talked a bit about how I wanted to do some Android development to broaden my skill set. A little after that post I finally joined the 21st century and got an Android smartphone. Over the course of the summer I recorded all of my hikes and bike rides using Google’s My Tracks app. With my season coming to a close, I began to think about what I could do with all the data I’d collected and what kinds of insights I could discover. As a result, I came up with Viderefit, a simple Android tablet app that allows me to review changes in my performance over time. In this post I’m going to go over the product design cycle that went into making this first phase of the app: brainstorming, requirements building, user interface design, development, and post-production. I’ll finish up with some thoughts on the next set of features I’ll be contemplating to make the app commercially viable on Google Play.

### Goals

For the first phase of the project, I set out with a few simple goals that I wanted to focus on:

• Since I’d been focusing on research projects lately, I wanted to return to my roots and focus a bit on user experience and interface design. As a result, I decided it was important that I create an intuitive and visually appealing user interface that utilized a number of appropriate and meaningful information visualization techniques.
• I’ve done a lot of C# and Haskell development lately, and I wanted to do something relatively new, so I decided that I would develop on Android and dust off my Java skills from my college days.
• I wanted a “quick win”, something that I knew that I could complete over the course of a couple weeks. As a result, I decided that I would spend two weeks planning the project starting in mid-September, followed by another two weeks of development wrapping up mid-October, and the remaining two weeks of October to put together this post for a November publication.

### Brain Storming

In thinking about what exactly it was I was going to build, I began by asking myself what questions I should be answering. So I opened up Word and began typing out a bullet point list of questions to understand where I was going with this project. The first thing I began to think about was: what exactly is physical fitness? What exactly is being measured over time to show improvement? What exactly is improvement? I had some ideas from my experience, but nothing formal, so like anyone else, I jumped on Wikipedia and came across the following quotation on the topic’s page:

Physical fitness has been defined as a set of attributes or characteristics that people have or achieve that relates to the ability to perform physical activity. – Physical Activity and Health: A Report of the Surgeon General

Not being completely satisfied with this, I looked around a bit more and found several pages outlining five areas that constitute physical fitness: aerobic or cardiovascular endurance, body composition, muscular strength, muscular endurance and finally, flexibility. Having felt like some progress was made, I moved on to the next question pertaining to measurements. Researching each of the five areas yielded some insights in the types of tests and measurements being used to assess these abilities such as VO2 max, BMI, ROM, S&R and a whole slew of alphabet soup measurements that I unfortunately did not have access to nor were they obtainable from the available set of data.

Thinking a bit more about the data that was available to me, it was clear the only area of physical fitness I could focus on was aerobic endurance. Despite the fact that I lacked sufficient data to derive some of the formal measures of physical fitness, I could derive some common sense measures to relate my performance over time. Am I going longer, going further, going faster as I get deeper into my season? Is my improvement uniform over time or did I hit any plateaus? And so on. To explore these ideas, I exported all of the My Tracks data from my smartphone to an SD card, combined the results using a throwaway C# application, and loaded the combined CSV file into Excel.

Based on my explorations in Excel, I decided that I had the data I needed to answer the types of common sense questions I was thinking about, and that what I was after was three different views of my data: a summary dashboard, performance reporting and a raw view of the data.

### Requirements

In deciding on my requirements, I thought a bit about what I had read in Ben Fry‘s Visualizing Data: Exploring and Explaining Data, that exploring most data sets consists of acquiring, parsing, filtering, mining, representing, refining and interacting with the data set. Keeping that in mind, I decided that I would likely have a series of tabs providing different views of the underlying data and sets of tools on each tab to assist in that data’s interpretation.

The summary dashboard is about capturing the “big picture” of the underlying data. In thinking about what was important to me, I wanted to capture how far I’d gone, how long I’d spent and how many times I went out. Each of these three sections would be broken down into a total, a percentage of some reference figure (e.g., the distance between cities), a chart showing the total broken out by activity type and month, a box plot showing the underlying distribution, a stacked bar chart showing the underlying activity distribution, and finally the longest, furthest, or most common track was to be presented.

Performance reporting is about enabling the end user to explore the underlying data. In essence, enabling the end user to chart different features plotted against one another and summarized according to some scheme. The user would then be able to filter by activity type and break the data set out into pre-season, mid-season and post-season components to better see trends taking place over time.

Finally, the raw view of the data provides a listing of all the tracks that were captured. Selecting a track displays a speed and altitude plot along with box plots for speed and altitude for the track in addition to box plots for the season so that the user can compare how a particular track compares to seasonal statistics.

### Design

With an idea of the type of requirements that would be needed, it was time to flesh out what the user interface would look like and how the user would interact with it. There are a lot of great products out there for mocking up user interfaces; I’ve used Balsamiq and really enjoyed it, but decided for this project I would keep things simple and just mock things up in Photoshop, since it’s easy to work with and low fidelity designs are all that’s needed at this point.

The summary dashboard incorporates the requirements into a vertical three panel design capturing distance, time and frequency of the underlying data. The dashboard is meant to be looked at and not interacted with, as a result the only thing the end user can do at this point is click on other tabs.

The bulk of the features in the application will appear on the performance reporting tab. The user will be able to select x-axis and y-axis features and a y-axis feature aggregation strategy in order to plot out the results in the right-hand chart area. Beneath the selection criteria are checkboxes for splitting the data out into full season, pre-season, mid-season and post-season components. Finally, the user can filter out different activities by checking each activity for exclusion or inclusion.

The view of the raw data provides a listing of all of the user’s tracks. Each track listing includes the name of the track, date, length, duration and an altitude sparkline. Clicking on a track reveals a speed and altitude plot along with the box plots described in the previous section.

### Development

Based on the planning done earlier in the project, development was a matter of spending some time in IntelliJ, translating the design into the appropriate Android conventions and implementing the necessary logic to calculate the various statistics and data transformations.

Much of what was needed to implement the application was available in the JDK and Android SDK, however there were a few areas I felt I could leverage existing open source libraries without having to roll my own solution (especially given the timeline I had decided upon):

• For charting I decided to use achartengine (1.0.0) since it looked to be the most stable and used charting library for Android.
• To parse the CSV file containing all of the track information, I went with opencsv (2.3) since it seems to be the most widely used. It does look like an Apache Commons CSV package is in the works, but it is not yet final.
• Since the time and date handling in JDK 1.6 is embarrassingly lacking, I ended up using joda-time (2.1) for presenting end-user friendly date and time representations.

In terms of code organization and events that take place, the application is structured around Android’s fragment approach to deal with having to provide different views based on the device being used. Since the focus of the application was to develop a tablet application, no additional layouts were developed to support smaller screen sizes. The main activity consists of loading data from an SD card and creating handlers for tab events for each of the three tabs. Each individual tab consists of a replicated paradigm of master-detail fragments and additional settings that are passed between fragments as bundles whenever an end user event takes place.

The referenced packages: common, controls, reporting and serialization contain classes for binding views to data, data aggregation (essentially a watered-down version of .NET’s LINQ and Haskell’s higher order functions), and classes for loading track data into memory.

### Post-production

With development complete, I set out to do some polishing and make the application visually consistent. To start things off, I wanted to settle on a color palette for the application. This was done by sampling the HSB space on 60 degree increments of hue, offset by 0, 15, and 30 degrees, with fixed 100% saturation and 80% brightness, giving a vibrant 24 color palette to work with.
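One way to reproduce such a palette is with java.awt.Color.getHSBColor. The sketch below is an assumption on my part rather than the code actually used: it adds a fourth offset of 45 degrees, since 6 base hues times 4 offsets is what yields the stated 24 colors.

```java
import java.awt.Color;
import java.util.ArrayList;
import java.util.List;

public class PaletteDemo {
    // Sample HSB space: 60-degree hue increments shifted by a set of offsets,
    // at fixed 100% saturation and 80% brightness.
    static List<Color> palette() {
        List<Color> colors = new ArrayList<Color>();
        int[] offsets = {0, 15, 30, 45}; // the 45-degree offset is assumed to reach 24 colors
        for (int offset : offsets)
            for (int hue = offset; hue < 360; hue += 60)
                colors.add(Color.getHSBColor(hue / 360f, 1.0f, 0.8f));
        return colors;
    }

    public static void main(String[] args) {
        System.out.println(palette().size()); // prints 24
    }
}
```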

Once the color scheme was established, it was a matter of going through the application and making sure each user interface element was using the correct color resource. Once the colors were applied, I focused on applying a consistent spacing between all of the UI elements, specifically 16 dp and 8 dp depending on the context and the bordering between components. From there, each chart was examined to make sure that the axis units and labels were presented correctly. One gripe about achartengine is that I was unable to find a way to set the axis title’s padding, so there is some overlap between the axis value labels and the axis title.

With the application spruced up, it was on to icon design and selection. For the application icon I decided to do a simple vector-based tri-folded map with an overlaid panel and chart.

For the icons to be used in the application, I used those found in Google’s My Tracks app since those icons are available under a Creative Commons 3.0 BY-SA license and represent the vast majority of data that would be imported. Worth noting that most of those icons are based on the National Park Service’s Map Symbols Collection. Future versions of Viderefit will likely switch over to the NPS set since they represent a more considerable collection of activities and the collection is under a public domain license.

Last thing to settle on was the name of the application. During development the name was simply “My Track Visualizer”, but I wanted the app to be more than that, so I decided on “SeeFit” or “cFit”, whichever happened to be available. After a Google search, neither were available so, I decided to use the Latin word for “to see”, Videre, and luckily “Viderefit” didn’t show up in any Google search results, so presumably nobody else has taken it.

### End result

After finishing post-production, the application was looking much more consistent and polished. Below are screenshots of each tab taken from my 10.1″ Acer Iconia Tab A500 development device.

### Future Work

In thinking about what it will take to get this product market worthy, there are a few things that stand out in my mind:

• Data importing – for this project the data resided on an external SD card and was loaded into memory from a parsed CSV file. Ideally, the end user should be able to import data from multiple applications and in varying formats such that the data is stored more efficiently using the built-in SQLite capabilities of the platform.
• Data exporting – one open question is how this application fits into the broader fitness application ecosystem and what exportable data is of use to other applications.
• Territory – the project omits any presentation of actual GPS data recorded during each session. For those individuals who are more interested in where they’ve been over their measurements, it seems a territory view would be a selling point. In addition, some form of integration done with the Google Maps API would also help visualize territory and speeds on the map over time.
• Additional screen support – right now the application was designed specifically for tablets, but other users may wish to use this application on their smartphone.
• Goal support – having reviewed other applications on the market, the idea of goal tracking is a recurring theme. To make the product more marketable, it would be useful to add in this kind of functionality.

### Conclusion

Reflecting back on the goals I set out at the beginning of the project, I feel I made an intuitive and easy to use product consisting of appropriate information visualization techniques; worked on a new platform other than my typical C# and Haskell environments; finally, managed to complete the project according to the timeline I established by finishing all work a full two weeks sooner than originally scheduled. Working on a new platform and hardware was a fun experience. With the work that I’ve done, I feel that I can proceed onto the next phase of the project and determine how to get the application ready for the marketplace and hopefully bring in some revenue in the process.

Written by lewellen

2012-11-01 at 8:00 am

Posted in Projects
