Posts Tagged ‘Grad School’
End of the Grad School Era
If you find a path with no obstacles, it probably doesn’t lead anywhere. ~ Frank A. Clark
After a yearandahalf of intense work, I’m happy to announce that I’ve completed my Master of Science (M.S.) in Computer Science. It has been a unique experience full of challenges and opportunities that I did not fully anticipate when I quit my job as a Senior Software Developer last year. I met a diverse group of people at different points in their lives who all shared a common goal to seek out new knowledge and improve themselves; I among them. But it also meant studying something I’m passionate about, and I’m looking forward to putting into practice my new found machine learning knowledge.
When I announced going back to school I talked about making a jump; this idea that there was something better waiting for me out there. After being in the air for the past yearandahalf, I’ve landed on the west coast where I’ll be writing software related to artificial intelligence research. It’s a departure from the life I’ve built here in Colorado, but it’s also an opportunity to continue advancing my knowledge and cultivating expertise in a unique area; it is a jump to bigger and better things that I hope will be a rewarding experience.
It’s tough to say what opportunities I’ll get to work on side projects in the coming year, but I do hope to work on some miscellaneous projects related to clustering GPS trajectories; put some time into my fitness tracking application Viderefit; and put more time into GPUbased speech recognition system I’ve been steadily working on. As always, check out the archive for old posts, and subscribe by email (see side column), RSS, or Twitter (@antimatroidthe) for the latest happenings.
Using GPUs to solve Spatial FitzHughNagumo Equation Numerically
Introduction
Modeling Action Potentials
In neurophysiology, we wish to model how electrical impulses travel along the complex structures of neurons. In particular, along the axon since it is the principle outbound channel of the neuron. Along the axon are voltage gated ion channels in which concentrations of potassium and sodium are exchanged. During depolarization, a fast influx of sodium causes the voltage to gradually increase. During repolarization, a slow outflux of potassium causes the voltage to decrease gradually. Because the potassium gates are slow to close, there is a negative voltage dip and recovery phase afterwards called hyperpolarization.
Attempts to provide a continuous time model of these phases began with work in the 1950s by Hodgkin and Huxley [HH52]. The duo formulated a fourdimensional nonlinear system of ordinary differential equations based on an electrical circuit model of giant squid axons. This landmark contribution was later simplified in the 1960s by FitzHugh [Fit61]. Here FitzHugh casted the problem in terms of Van der Pol oscillators. This reformulation allowed him to explain the qualitative behavior of the system in terms of a twodimensional excitation/relaxation statespace model. The following year Nagumo [NAY62] extended these equations spatially to model the desired action potential propagation along an axon, and demonstrated its behavior on analog computers. This spatial model will be the focus of this work, and a more comprehensive account of these developments can be found in [Kee02].
FitzHughNagumo Equation
The Spatial FitzHughNagumo equation is a twodimensional nonlinear reactiondiffusion system:

(1) 
Here represents the action potential (voltage) along the axon, and the recovery of the system. Constant is absent from the literature, and is inversely proportional to recovery time.
Mathematical Analysis
The system does not admit a general analytic solution. Mathematical analysis of the system is concerned with the existence and stability of traveling waves with [AK15] providing a thorough account of these aspects. Other analyses are concerned with the statespace model and the relationship of parameters with observed physiological behavior. In particular: selfexcitatory, impulse trains, single traveling wavefronts and impulses, doubly traveling divergent wave impulses, and nonexcitatory behavior that returns the system to a resting state. Since the FitzHughNagumo equations are well understood, more recent literature is focused on higher dimensional and stochastic variants of the system which [Tuc13] discusses in detail.
Numerical Analysis
A survey of the literature revealed several numerical approaches to solving the FitzHughNagumo equations consisting of a Finite Element method with Backward Differentiation Formulae [Ott10], and the Method of Lines [Kee02] approach. In this work, three approaches based on the Finite Difference method are investigated: an explicit scheme, an adaptive explicit scheme, and an implicit scheme; along with their associated sequential CPU bound and parallel GPU bound algorithms.
Explicit Finite Difference Scheme
To study the basic properties of the FitzHughNagumo equations an explicit scheme was devised using forward and central differences for the temporal and spatial derivatives respectively.

(2) 
Truncation errors are linear in time, , and quadratic, , in space. For stability, , implying the use of inconveniently small time steps, however in practice it seems rare to find any mention of in the literature. Each time step can be computed sequentially or in parallel in linear time, . However, there is a significant constant factor improvement delivered by parallel GPU implementations since modern GPUs allow millions of nodes to be updated in parallel giving near constant runtime performance. Further parallelization is possible by trivially distributing contiguous node sets to a combination of multiple machines and multiple GPUs.
Experiments
The explicit scheme was used to investigate the traveling wave and divergent wave behaviors of the FitzHughNagumo equations. Fig. (1) demonstrate the application of a constant impulse, , at the end of an unexcited axon, , giving rise to oscillatory behavior. Fig. (2) shows a Gaussian impulse applied initially to the center of the axon, . As the system evolves, the impulse collapses giving rise to two separate impulses that travel in opposite directions before dissipating at the boundaries returning the axon to a completely unexcited state. Both test cases are qualitatively consistent with the literature and share the following parameters:
(3) 
Error Analysis
Figure 3: Varying values of w.r.t at . fixed to 0.0098. 
Figure 4: Varying values of w.r.t at different values of . fixed to 0.5. 
Two experiments were ran to verify the suggested truncation errors. The numerical solution given by a sufficiently small step size serves as an analytic solution to , which is then used to evaluate how larger steps sizes deviate as measured by the root mean squared error. Fig. (3) looks at varying and shows that as the step size as halved, the resulting RMSE is quartered consistent with the expect quadratic truncation term of the scheme. Similarly, Fig. (4) looks at varying and shows that independent of , halving the step size results in an equally reduced RMSE consistent with the expected linear truncation term of the scheme.
Runtime Performance
Figure 5: Wall time for memory allocation, memory transfers, and core loop for both CPU and GPU. 
Figure 6: Wall time for just core loop for both CPU and GPU. 
Comparison of sequential CPU and parallel GPU bound algorithms is based on the wall time taken to perform a single iteration to calculate and . Reported figures are a function of spatial node quantity and the mean run time of 30 iterations. The main bottleneck to performance is transferring buffers between the host and device. Fig. (5) illustrates this effect. For the largest test case considered, , “GPU w/o tx” delivers 19x faster runtime over “GPU w/ tx” by not transferring intermediate results back to the host. Similarly, it delivers significantly faster runtime over the CPU by a factor of 73x. To better understand this performance boost, Fig. (6) looks at just the core loop execution time. For , the core loops account for only 3.7% and 14.9% of the execution time on the CPU and GPU respectively with the GPU delivering 18x better performance than the CPU. These percentages increase monotonically, and suggest that memory transfers will eventually be dominated by sufficiently large inputs on GPUs with an abundance of memory.
Adaptive Explicit Finite Difference Scheme
While investigating the traveling wavefront solution of the system, numerical oscillations were observed as the simulation zeroed in on the steady state solution. To address this issue, an adaptive explicit scheme was devised. A survey of the literature suggested a family of moving grid methods to solve the FitzHughNagumo system based on a Lagrangian formulation [VBSS90] and a method of lines formulation [Zwa11]. Here a heuristic is used to concentrate grid points where the most change takes place.
The formulation of the scheme is identical to the former section with second order spatial derivatives being approximated by Lagrange interpolating polynomials since the technique supports nonuniform grids. A three point, first order truncation error scheme is used. A five point, third order truncation error scheme was considered, but abandoned in favor of the empirically adequate three point scheme.
(4) 
The first and second spatial derivatives given by the Lagrange interpolating polynomials are used to decide how much detail is needed in the domain in addition to the first temporal derivative given by finite differences. First, a coarse grid is laid down across the entire domain to minimize the expected distance between nodes since the second order derivative has first order truncation error. Next, the magnitude of the first order spatial derivative (second order truncation error) is used to lay down a finer grid when the derivative is greater than a specified threshold. This corresponds to where the waves of the solution are.
Next, the temporal derivative is used to lay down an even finer grid in those areas having an absolute change above a specified threshold. The change in time corresponds to the dynamics of the system, by adding detail in these areas, we can preserve the behavior of the system. Finally, the zeros of the second spatial derivative serve as indicators of inflection points in the solution. These correspond most closely to the location of the traveling wavefronts of the equation. Here, the most detail is provided around a fixed radius of the inflection points since the width of the wavefronts does not depend on parameterization.
Each iteration, the explicit scheme is evaluated on the grid from the previous iteration and those results are then used to perform the grid building scheme. To map the available solution values to the new grid, the points are linearly interpolated if a new grid point falls between two previous points, or mapped directly if there is a stationary grid point between the two iterations. The latter will be the more common case since all grid points are chosen from an underlying uniform grid specified by the user. Linear interpolation will only take place when extra grid points are included in an area.
Experiments
Figure 7: Top: Numerical oscillation of a centered Gaussian impulse with and all other parameters the same as those given in previous section for . Bottom: Eliminated numerical oscillation based on the adaptive explicit scheme.
Fig. (7) is the motivating example for this scheme and demonstrates how numerical oscillations can be avoided by avoiding calculations in regions with stationary solutions. To demonstrate that the scheme works well for other test cases, the more interesting and dynamic divergent impulse test case is shown in Fig. (8). Here we can see that as time progresses, points are allocated to regions of the grid that are most influenced by the system’s dynamics without sacrificing quality. For the time steps shown, the adaptive scheme used 23x fewer nodes than the explicit scheme from the previous section.
Figure 8: Example of adaptive grid on the divergent impulse example for { 15.06, 30.13, 45.05, 60.12, 75.05, 90.11 }. Red lines given by the explicit scheme from the previous section, green dots given by adaptive explicit scheme.
Implicit Finite Difference Scheme
An implicit CrankNicolson scheme is used in this section to solve the FitzHughNagumo equations. To simplify the computations, is solved explicitly using a second order central difference scheme before solving leading to the following formulation:

(5) 
The truncation error for this scheme is with an improved, albeit still restrictive, requirement over the explicit scheme that . Based on this formulation, the righthand side is completely known when is calculated. This gives a simpler expression to consider:
(6) 
Newton’s method is used to solve the nonlinear function with an initial guess equal to the previous time step’s values, . To refine the estimate, the following system is solved iteratively until the magnitude of the refinement, , is less than a specified tolerance or machine epsilon.
(7) 
This formulation gives rise to a tridiagonal Jacobian matrix, , where the first and last row are specified by noflux boundary conditions, and constant offdiagonal entries and variable diagonal entries given by the partial derivatives of each nodal equation.

This tridiagonal system can be solved sequentially in time using the Thomas algorithm or in time using the Cyclic Reduction algorithm of [Hoc65]. Cyclic Reduction begins with a forward phase in which all odd indexed unknowns are eliminated recursively until a or system remains that can be solved directly. Cyclic Reduction ends with a backward phase that walks up the recursion stack to solve for the previously eliminated odd indexed unknowns. Since the algorithm decouples unknowns at each recursion level of the forward and backwards phases, these reductions can be done in parallel in time on the GPU assuming fold parallelism.
Further parallelism can be achieved by solving points explicitly along the domain, then using those results to create implicit subdomains that can be solved using either Thomas or Cyclic Reduction algorithms on a combination of multiple machines and multiple GPUs at the expense of additional communication and coordination.
Error Analysis
Figure 9: Varying values of w.r.t at . fixed to 0.000071. 
Figure 10: Varying values of w.r.t at different values of . fixed to 0.52. 
Evaluation of the spatial error revealed an unexpected linear behavior as shown in Fig. (9). As the spatial step is halved, the resulting error was expected to become quartered, instead it is halved. No clear explanation was discovered to account for this discrepancy. With respect to time, Fig. (10) shows that both the Thomas and Cyclic Reduction algorithms were quadratic as multiple points in time were evaluated. The Thomas algorithm produced aberrations as the step size increased eventually producing numerical instability, while the Cyclic Reduction algorithm was immune to this issue.
Figure 11: Stability of implicit solvers. 
Figure 12: Convergence of implicit solvers. 
In terms of stability, the implicit scheme is stable up to depending on which tridiagonal solver is used, and for comparison, the explicit scheme is stable up to precisely as shown in Fig. (11). The Thomas algorithm demonstrates a slightly weaker stability guarantee than the Cyclic Reduction algorithm becoming unstable around and respectively. In terms of convergence, the Thomas algorithm typically takes more iterations than Cyclic Reduction to obtain the same degree of accuracy as shown in Fig. (12). Taking the sequence of adjustments up to machine epsilon , Thomas algorithm gives suggesting qlinear convergence. Likewise, Cyclic Reduction gives suggesting qquadratic convergence.
Runtime Performance
Figure 13: Performance comparison of CPU and GPU bound Thomas and Cyclic Reduction algorithms. 
Figure 14: Performance comparison of Jacobian solvers. 
Sequential Thomas and Cyclic Reduction routines perform equally well on CPU as shown in Fig. (13). The parallel Cyclic Reduction method did not demonstrate significant performance gains on the GPU. However, looking at just the time taken to solve the Jacobian each iteration (not including initialization or memory transfers), parallel Cyclic Reduction on the GPU was 45x faster than both sequential CPU solvers as shown in Fig. (14).
To explain the poor performance of Cyclic Reduction on the GPU, there are a number of different factors at play. The algorithm is susceptible to warp divergence due to the large number of conditionals that take place. Reliance on global memory access with varying strides contributes to slow performance since shared memory can’t be effectively utilized, and each iteration the adjustments are transferred from device to host to decide if Newton’s method should terminate. These different factors suggest alternative GPU implementations need to be investigated to address these different issues.
Discussion
Work Environment
All results in this paper were based on code written in C#, and compiled using Microsoft Visual Studio Express with Release settings to run on a commodity class Intel Core i72360QM quad core processor. Open source CUDAfy.NET is used to run C# to CUDA translated code on a commodity class NVIDIA GeForce GT 525m having two streaming multiprocessors providing 96 CUDA cores.
Future Work
Numerically, additional work could be done on the adaptive explicit scheme. In preparing the scheme, a cubic splinebased approach was abandoned in favor of simpler approaches due to time pressures. It would be worthwhile to explore how to solve the system on a splinebased “grid”.
In addition, further work could be done to optimize the implementation of the parallel Cyclic Reduction algorithm on the GPU since it delivered disappointing runtime behavior compared to the sequential algorithm on the CPU. [CmWH14] mention several different optimizations to try, and I believe better global memory access will improve runtime at the expense of more complicated addressing. As an alternative to Cyclic Reduction, both [CmWH14] and [ZCO10] detail several different parallel tridiagonal solvers that could be deployed.
In terms of models considered, there are a number of different directions that could be pursued including higher dimensional [MC04], coupled [Cat14]], [Ril06], and stochastic variants [Tuc13] of the spatial FitzHughNagumo equation. Coming from a probabilistic background, I would be interested in investing time in learning how to solve stochastic ordinary and partial differential equations.
Conclusion
Three finite difference schemes were evaluated. An explicit scheme shows great performance on both the CPU and GPU, but it is susceptible to numerical oscillations. To address this issue, an adaptive explicit scheme based on heuristics was devised and is able to eliminate these issues while requiring fewer nodes to produce results onpar with the explicit scheme. An implicit scheme was evaluated which demonstrated a principled, and robust solution for a variety of test cases and is the favored approach of the three evaluated.
References
[AK15] Gianni Arioli and Hans Koch. Existence and stability of traveling pulse solutions of the fitzhughnagumo equation. Nonlinear Analysis: Theory, Methods and Applications, 113:5170, 2015
[Cat14] Anna Cattani. Fitzhughnagumo equations with generalized diffusive coupling. Mathematical Biosciences and Engineering, 11(2):203215, April 2014
[CmWH14] LiWen Chang and Wen mei W. Hwu. A guide for implementing tridiagonal solvers on gpus. In Numerical Computations with GPUs, pages 2944. Springer International Publishing, 2014.
[Fit61] Richard FitzHugh. Impulses and physiological states in theoretical models of nerve membrane. Biophysical journal, 1(6):445, 1961.
[HH52] Alan L. Hodgkin and Andrew F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of physiology, 117(4):500544, 1952.
[Hoc65] R. W. Hockney. A fast direct solution of poisson’s equation using fourier analysis. J. ACM, 12(1):95113, Jan 1965.
[Kee02] James P. Keener. Spatial modeling. In Computational Cell Biology,volume 20 of Interdisciplinary Applied Mathematics, pages 171197. Springer New York, 2002.
[MC04] Maria Murillo and XiaoChuan Cai. A fully implicit parallel algorithm for simulating the nonlinear electrical activity of the heart. Numerical linear algebra with applications, 11(23):261277, 2004.
[NAY62] J. Nagumo, S. Arimoto, and S. Yoshizawa. An active pulse transmission line simulating nerve axon. Proceedings of the IRE, 50(10):20612070, Oct 1962.
[Ott10] Denny Otten. Mathematical models of reaction diffusion systems, their numerical solutions and the freezing method with comsol multiphysics. 2010.
[Ril06] Caroline Jane Riley. Reaction and diffusion on the sierpinkski gasket. PhD thesis, University of Manchester, 2006.
[Tuc13] Henry C. Tuckwell. Stochastic partial differential equations in neurobiology. Linear and nonlinear models for spiking neurons. In Stochastic Biomathematical Models, volume 2058 of Lecture Notes in Mathematics. Springer Berlin Heidelberg, 2013.
[VBSS89] J. G. Verwer, J. G. Blom, and J. M. SanzSerna. An adaptive moving grid method for one dimensional systems of partial differential equations. Journal of Computational Physics, 82(2):454486, 1989.
[ZCO10] Yao Zhang, Jonathan Cohen, and John D. Owens. Fast tridiagonal solvers on the gpu. SIGPLAN Not., 45(5):127136, Jan 2010.
[Zwa11] M. N. Zwarts. A test set for an adaptive moving grid pde solver with timedependent adaptivity. Master’s thesis. Utrecht University, Utrecht, Netherlands, 2011.
Notes from SIGGRAPH 2015
Introduction
I recently flew out to Los Angeles to attend the 42nd International Conference and Exhibition on Computer Graphics and Interactive Techniques. SIGGRAPH‘s theme this year was the crossroads of discovery bringing it closer to its roots that began here in Boulder, Colorado back in 1974. For me it was a chance to dig a little deeper into Computer Graphics research following my recent studies and develop a better understanding of the industries pushing the domain forward. As with most posts on this site, this is a short reminder to myself, and hopefully gives others an idea of what they could expect if they went.
Production Sessions
Disney – Pixar’s “Lava”: Moving Mountains was an informative production session detailing the process of bringing “Lava” to the screen. “Lava” is the story of Uku, a lonely volcano in search of love. As millions of years go by, he begins to lose hope as he recedes back into the ocean. But all is not lost. Uku finds renewed hope for love as newly formed volcano Lele rises to the surface. After the Pixar magicians reveal their secrets, technical details, and engrossing backstory, “Lava” becomes an even more enjoyable short film.
The presentation began with director James Murphy explaining his personal story inspiring the short before giving a live performance of the titular song. Colin Levy followed Murphy’s conceptualization, story boarding, and clay mockups with how the film would be framed for maximal emotional impact. Levy explain the exploratory process of filming the opening scene of the film to find the right combination of lenses, and flight paths based on realworld references to help illustrate the size and scale of Uku, the hopeless volcano.
Both Aaron Hartline and Austin Lee continued discussing the challenges of animating and rigging Uku, Lele, a pair of dolphins, birds, whales, and turtles (the last four representing young love, newly weds, established lives, and life long love). In particular, the different approaches for animating and rigging the facial features of Uku (eyelids, lips, checks, and so on) and how the teams iterated to find a balance between what the audience might expect from an anthropomorphic mountain and what they wanted to achieve as story tellers.
Perhaps the most interesting moment in the presentation was Dirk Van Gelder’s sneak peak of the enhancements the team made to Presto (Pixar’s inhouse animation tool) to provide animators final render quality realtime feedback of their changes through a clever combination of Rendermanbased final renders and OpenGL hardware texturing. Aside from the technical novelty, it’s a great example of time saving enhancements that make it easier for people to freely experiment and explore different approaches leading to better results.
The closing discussion by Byron Bashforth and Farhez Rayani on shading and lighting was informative and it was interesting to see how the procedural approaches were done to give Uku both a physically realistic and visually appealing biome consisting of different shaders, and static and procedural assets. Overall, a very interesting peak into the workflow of one of the most venerable studios in the industry.
Birds of a Feather
Having worked in the healthcare space for a fair bit of time, I was attracted to meetings on Volume Rendering and Medical Visualization and HealthTech: Modeling, Interaction, Hardware, and Analysis to see what people have been working on and to get a glimpse of where things are heading.
Nicholas Polys of Virginia Tech and Michael Aratow (MD) (both chairs of the Web3D Consortium Medical Working Group) began the medical visualization discussion by going over common libraries such as VTK (The Visualization Toolkit) and Voreen (Volume Rendering Engine), before discussing general purpose analysis and visualization tools such as Paraview. Volume oriented applications such as Seg3D (volume segmentation tool), OsiriX (DICOM viewer) were covered and finally, tools for exploring biomolecular systems such as Chimera, VMD (Visual Molecular Dynamics) and PathSim (EpsteinBarr Virus exploration) were discussed giving the audience a good lay of the land. Brief bit of time was given to surgical training tools based on 3D technologies and haptic feedback (e.g. H3D).
These were all interesting applications and seeing how they all work using different types of humanmachine interfaces (standard workstations, within CAVE environments, or even in virtual reality headsets and gloves) was eye opening. The second main theme of the discussion was on standardization when it comes to interoperability and reproducibility. There was a heavy push for X3D along with interoperability with DICOM. Like a lot of massive standards, DICOM has some wiggle room in it that leads to inconsistent implementations from vendors. That makes portability of data between disparate systems complicated (not to mention DICOM incorporates nongraphical metadata such as complex HL7). Suffice to say X3D is biting off a big chunk of work, and I think it will take some time for them to make progress in healthcare since it’s a fragmented industry that is not in the least bit technologically progressive.
One area I felt was absent during the discussion was how 3D graphics could be used to benefit everyday patients. There is a wealth of fMRI and ECoG data that patients could benefit from seeing in an accessible way for example showing a patient a healthy baseline, then accentuating parts of their own data and explaining how those anomalies affect their wellbeing. If a component can be developed to deliver that functionality, then it can be incorporated into a patient portal alongside all other charts and information that providers have accumulated for the patient.
The HealthTech discussion was presented by Ramesh Raskar, and his graduate students and postdocs from the MIT Media Lab. They presented a number of lowcost, lowpower diagnostic devices for retinal imaging and electroretinography, highspeed tomography, cellphonebased microscopy, skin perfusion photography, and dental imaging. Along with more social oriented technologies for identify safe streets to travel, and automatically discerning mental health from portraits. There were plenty of interesting applications being developed by the group, but it was more of a show and tell by the group than discussing the types of challenges beyond the scope of the work by MIT Media Lab (as impressive as they are). (For example, The fine work 3Shape A/S has done with fast scanning of teeth for digital dentistry.)
One thing that was discussed of key interest was Meddit a way for medical practitioners and researchers to define open problems to maturity, then presenting those challenges to computer scientists to work on and develop solutions. While the company name is uninspired, I think this is the right kind of collaboration platform for the “toolmaker” view of hardware engineers, computer scientists and software engineers as it identifies a real issue, presents an opportunity, and gives a pool of talented, bright people a way to make a difference. I am skeptical that it will take off (I think it would have more success as a niche community within an umbrella collaboration platform i.e. Stack Exchange model), but the idea is sound and something people should get excited about.
RealTime Live!
The challenge of realtime graphics is very appealing to me and getting to see what different software studios are working on was a real treat. While there were several presentations and awards given during the two hour long event, three demos stood out to me. Balloon Burst given by Miles Macklin of NVIDIA, BabyX presented by Mark Sagar of University of Auckland, and award winner A Boy and His Kite demoed by Nick Penwarden of Epic Games.
Macklin’s demo was impressive in that it simulated more than 750,000 particles (250,000 by their solver Flex, and 512,000 for mist and droplets) and their paper [pdf] Fast GridFree Surface Tracking gave some technical background into how they achieved their results. Fluid simulation is something I’d like to spend some time exploring, obviously won’t be able to create something as technical as Macklin’s group, but would like to spend some time on SmoothedParticle Hydrodynamics, and seeing NVIDIA’s work was a good motivation boost to explore the subject further on my own.
Perhaps the most unexpected entry in the series was Sagar’s BabyX. It was a fascinating assemblage of neural networks, real time graphics, natural language processing, computer vision, and image processing to create the ultimate “Sims” like character a baby that could learn and invoke different emotional responses based on external stimuli. Realtime graphics were photorealistic, and seeing the modeling behind the system to emulate how the brain behaves in the presence of different dopamine levels (and how those levels correspond to things like Parkinson’s and schizophrenia) was impressive as well. Overall, a fantastic technical achievement and I look forward to following Sagar’s work as it continues to evolve.
My main interest in going to RealTime Live! was to see Penwarden’s work on A Boy and His Kite. This impressive demo spanning hundred square miles inspired by the Isle of Skye really puts to shame my prior work in creating procedural environments. Nonetheless, it goes to show to far the medium can be pushed and how small the divide between realtime and film is becoming. Computer Graphics World published (JulyAugust 2015) a very thorough technical overview [p. 4048] of how Penwarden’s team produced the short, in addition to the features added to Unreal Engine 4 to make the demo shine.
Wrapup
There were many other things I explored that I won’t go into detail namely the VR Village, Emerging Technologies, Research Posters, Exhibition, and Job Fair. I’m still quite skeptical that virtual reality (and to the same extent augmented reality) technologies will come into the mainstream; I think they’ll continue to be the subject of researchers, gaming enthusiasts, and industry solutions for automotive, and healthcare problems. One thing that was a bit of a disappointment was the Job Fair as there were barely any companies participating. Overall, a positive experience learning what other people are doing in the industry, and getting to see how research is being applied in a variety of different domains including automotive, entertainment, engineering, healthcare, and science.
Deep Learning for Automatic Speech Recognition
Introduction
The problem of automatic speech recognition, and details of the traditional Hidden Markov Model and Gaussian Mixture Model hybrid architecture (HMMGMM) for acoustic modeling are detailed in [JM08], but will be skipped here. Instead, the focus of this literature review is to discuss how [DYDA12] uses a context dependent Hidden Markov Model and Deep Neural Network hybrid architecture (CDHMMGMM) for acoustic modeling as it represents a significant improvement over the traditional HMMGMM approach. This review will begin with motivation for the architecture, then go into detail the algorithms used for pretraining, and outline the algorithms used for training before concluding with how well the approach outperforms the standard HMMGMM approach.
Architecture
To motivate their architecture, [DYDA12] rely on the standard noisy channel model for speech recognition presented in [JM08] where we wish to maximize the likelihood of a decoded word sequence given our input audio observations:
(1) 
Where and represent the language and acoustic models respectively. [JM08] state that the language model can be computed via an Ngram model; [DYDA12] acknowledge using this approach, but focus their efforts into explaining their acoustic model:
(2) 
Here the acoustic model is viewed as a sequence of transitions between states of tiedstate triphones which [DYDA12] refer to as senones giving us the context dependent aspect of the architecture. [FLMS14] explains that senones represent the pronunciation of words and are derived by decision trees. By tying triphone states together, this approach is able to avoid having to process a large number of triphones and avoid the likely sparseness of training examples for every possible triphone.
The model assumes that there is a probability for the starting state, probabilities of transitioning to the state observed at step to step , and finally, the probability of the acoustics given the current state . [DYDA12] expand this last term further into:
(3) 
Where models the tied triphone senone posterior given melfrequency cepstral coefficients (MFCCs) based on 11 sampled frames of audio. While MFCCs come from signal processing, they have proven to be effective features for automatic speech recognition. Based on the power spectrum derived from sample audio frames, MFCCs represent characteristics of the audio that our ears are sensitive to as explained in [Ada10]. is the prior probability of the senone, and can be ignored since it does not vary based on the decoded word sequence we are trying to find.
Based on this formalism, [DYDA12] chose to use a pretrained Deep Neural Network to estimate using MFCCs as DNN inputs and taking the senone posterior probabilities as DNN outputs. The transitioning between events is best modeled by a Hidden Markov Model whose notation, appears in Eq. (2). Now that we have an overview of the general CDDNNHMM architecture, we can look at how [DYDA12] train their model.
PreTraining
Given the DNN model we wish to fit the parameters of the model to a training set. This is usually accomplished by minimizing a likelihood function and deploying a gradient descent procedure to update the weights. One complication to this approach is that the likelihood can be computationally expensive for multilayer networks with many nodes rendering the approach unusable. As an alternative, one can attempt to optimize a computationally tractable surrogate to the likelihood. In this case the surrogate is the contrastive divergence method developed by [Hin02]. This sidestep enabled [HOT06] to develop an efficient unsupervised greedy pretraining process whose results can then be refined using a few iterations of the traditional supervised backpropagation approach. In this portion of the paper we discuss the work of [Hin02] and explain the greedy algorithm of [HOT06] before going on to discuss the highlevel training procedure of [DYDA12].
To understand the pretraining process, it is necessary to discuss the Restricted Boltzmann Machine (RBM) and Deep Belief Network (DBN) models. RBMs are an undirected bipartite graphical model with Gaussian distributed input nodes in a visible layer connecting to binary nodes in a hidden layer. Every possible arrangement of hidden, , and visible, , nodes is given an energy under the RBM model:
(4) 
Where is the weight of connections between nodes and vectors and correspond to the visible and hidden biases respectively. The resulting probability is then given by:
(5) 
Where is a normalization factor. Based on the assumptions of the RBM, [DYDA12] derive expressions for and given by:
(6) 
Where is an elementwise logistic function. [DYDA12] argue that Eq. (6) allows one to repurpose the RBM parameters to initialize a neural network. Training of the RBM is done by stochastic gradient descent against the negative log likelihood since we wish to find a stable energy configuration for the model:
(7) 
however [DYDA12] point out that the gradient of the negative log likelihood cannot be computed exactly since the term takes exponential time. As a result, the contrastive divergence method is used to approximate the derivative:
(8) 
where is a single step Gibbs sampled expectation. These terms are expectations in which nodes are simultaneously active given the training data and model. Given this insight, regular stochastic gradient descent can be performed and the parameters of a RBM fitted to training data.
Now that we have an understanding of RBMs, we can shift our focus to DBNs. A Deep Belief Network is a multilayer model with undirected connections between the top two layers and directed between other layers. To train these models, [HOT06] had the insight to treat adjacent layers of nodes as RBMs. One starts with the bottom two layers and trains them as though they were a single RBM. Once those two layers are trained, then the top layer of the RBM is treated as the input layer of a new RBM with the layer above that layer acting as the hidden layer of the new RBM. The sliding window over the layers continues until the full DBN is trained. After this, [HOT06] describe an “updown” algorithm to further refine the learned weights. The learned parameters of this greedy approach can then be used as the parameters of a DNN as explained earlier in the discussion of Eq. (6).
Training
Training of the CDDNNHMM model consists of roughly a dozen involved steps. We won’t elaborate here on the full details of each step, but will instead provide a highlevel sketch of the procedure to convey its general mechanics.
The first highlevel step of the procedure is to initialize the CDDNNHMM model. This is done by first training a decision tree to find the best tying of triphone states which are then used to train a CDGMMHMM system. Next, the unique tied state triphones are each assigned a unique senone identifier. This mapping will then be used to label each of the tied state triphones. (These identifiers will be used later to refine the DNN.) Finally, the trained CDGMMHMM is converted into a CDDNNHMM by retaining the triphone and senone structure and HMM parameters. This resulting DNN goes through the previously discussed pretraining procedure.
The next highlevel step iteratively refines the CDDNNHMM. To do this, first the originally trained CDGMMHMM model is used to generate a raw alignment of states which is then mapped to its corresponding senone identifier. This resulting alignment is then used to refine the DBN by backpropagation. Next, the prior senone probability is estimated based on the number of frames paired with the senone and the total number of frames. These estimates are then used to refine the HMM transition probabilities to maximize the features. Finally, if this newly estimated parameters do not improve accuracy against a development set, then the training procedure terminates; otherwise, the procedure repeats this highlevel step.
Experimental Results
System Configurations
[DYDA12] report that their system relies on nationwide language model consisting of 1.5 million trigrams. For their acoustic model, they use a five hidden layer DNN with each layer containing 2,048 hidden units. Training the system from scratch on 24 hours of training data takes four days on a Dell T3500 workstation with an NVIDIA Tesla GPU. [DYDA12] emphasize the importance of the GPU in obtaining acceptable training time, and that without it, training time would be 30x slower.
Datasets and Metrics
Comparison of automatic speech recognition system consists of three principle error metrics: sentence (SER), word (WER), and phoneme (PER) error rates. These look at the ratio of incorrect entities to the number of total entities with the exception of word error rate which uses a Levenshtein approach to measure the number of insertions, substitutions, and deletions relative to the total number of words. A sentence is considered incorrect if there is at least one incorrect word.
These error metrics often coincide with different datasets, in particular WER is reported for Switchboard, SER for Bing Mobile Voice Search (BMVS), and PER on TIMIT. Switchboard is a collection of phone conversations between two people, while BMVS is a collection of short spoken questions such as “The Med” or “Chautauqua Park” that are used to find these locations, while TIMIT is a phonetic focused corpus of spoken sentences that are phonetically rich.
Results
Switchboard  BMVS  TIMIT  
(WER)  (SER)  (PER)  
GMM  23.6^{[2]}  36.2^{[1]}  21.7^{[2]} 
DNN  16.1^{[2]}  30.4^{[1]}  21.9^{[3]} 
CNN  –  –  20.2^{[3]} 
RNN  –  –  17.7^{[4]} 
Direct comparison of models is complicated by the variety of error metrics and datasets; [DBL12] is used to fill in these gaps to make a meaningful comparison. As one can see from Table (1), the neural network approaches do better on average over the traditional GMM approach. To illustrate that it is not only DNN approaches that do better, the work of [AMJ+14] using a Convolutional Neural Network (CNN) and [GMH13] using a Recurrent Neural Network (RNN) are included to further drive the point that neural network architectures are viable alternatives to GMMs.
Conclusions
[DYDA12], [AMJ+14], and [GMH13] have shown that neural network architectures exhibit better performance over Gaussian Mixture Models. [DYDA12] believes that a more capable first layer model provided by meancovariance restricted Boltzmann machines will increase performance, while [AMJ+14] plans to investigate unexpected improvements in largevocabulary speech recognition where they were absent in phone recognition tasks when using convolutional restricted Boltzmann machines. Both routes seem promising and are likely to produce improved error rates inline with [GMH13]’s results.
In [DBL12], the authors of both research groups suggest key gains will come from improved understanding of the pretraining process and how the types of units used in these models affect error rates. They conclude that distributed training is the largest hurdle to overcome for these systems to make use of more training data. (Parallelization is limited by the sequential stochastic gradient descent at the heart of the pretraining and training processes.) As [DYDA12] point out in their paper, GPUbased approaches can assist in reducing computation time, but more foundational approaches need to be pursued.
In a 2014 talk [Hin14], Hinton criticizes existing neural network architectures on philosophical grounds arguing that they do not correspond well enough to how the brain functions citing inadequate structural complexity. His proposed solution is a new neural network approach that clusters neurons together into capsules, which he believes will better model how the cortical columns of the brain behave. If Hinton is right (which his track record suggests), then it is likely we’ll see this capsule approach outperform existing models, and consequently, yield improved error rates in automatic speech recognition.
References
[Ada10] Andre Gustavo Adami. Automatic speech recognition: From the beginning to the portuguese language. In The Int. Conf. on Computational Processing of Portuguese (PROPOR). Rio Grande do Sul: Porto Alegre, 2010.
[AMJ+14] Ossama AbdelHamid, Abdelrahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Ui. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech & Language Processing, 22(10):15331545, 2014.
[DBL12] Deep neural networks for acoustic modeling in speech recognition: The shared views of four research grounds. IEEE Signal Process. Mag., 29(6):8297, 2012.
[DYDA12] George E. Dahl, Dong Ui, Li Deng, and Alex Acero. Contextdependent pretrained deep neural networks for largevocabulary speech recognition. IEEE Transactions on Audio, speech & Language Processing, 20(1):3042, 2012.
[FLMS14] Luciana Ferrer, Yun Lei, Mitchell McLaren, and Nicolas Scheffer. Spoken language recognition based on senone posteriors. In INTERSPEECH 2014, 15th Annual Conference of the International Speech Communnication Association, Singapore, September 1418, 2014, pages 21502154. ISCA, 2014.
[GMH13] Alex Graves, Abdelrahman Mohamed, and Geoffrey E. Hinton, Speech recognition with deep recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 2631, 2013, pages 66456649, 2013.
[Hin02] Geoffrey E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):17711800, 2002.
[Hin14] Geoffrey E. Hinton. What’s wrong with convolutional nets? Massachusetts Institute of Technology, Department of Brain and Cognitive Sciences, Fall Colloquium Series, 2014.
[HOT06] Geoffrey E. Hinton, Simon Osindero, and Yee Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):15271554, 2006.
[JM08] Daniel Jurafsky and James H. Martin. Speech and Language Processing, 2nd Edition. Prentice Hall, 2008.
LargeScale Detection and Tracking of Aircraft from Satellite Images
Introduction
Motivation
Historically, search and rescue operations have relied on onsite volunteers to expedite the recovery of missing aircraft. However, onsite volunteers are not always an option when the search area is very broad, or difficult to access or traverse as was the case when Malaysia Airlines Flight 370 disappeared over the Indian Ocean in March, 2014. Crowdsourcing services such as tomnod.com have stepped up to the challenge by offering volunteers to manually inspect satellite images for missing aircraft in an attempt to expedite the recovery process. This process is effective, but slow. Given the urgency of these events, an automated image processing solution is needed.
Prior Work
The idea of detecting and tracking objects from satellite images is not new. There is plenty of academic literature on detection or tracking, but often not both for things like oil tankers, aircraft, and even clouds. Most distributed image processing literature is focused on using grid, cloud, or specialized hardware for processing large streams of image data using specialized software or general platforms like Hadoop. Commercially, companies like DigitalGlobe have lots of satellite data, but haven’t publicized their processing frameworks. Startups like Planet Labs have computing and satellite resources capable of processing and providing whole earth coverage, but have not publicized any work on this problem.
The FAA has mandated a more down to earth approach through ADSB. By 2020, all aircraft flying above 10,000 ft will be required to have Automatic dependent surveillance broadcast transceivers in order to broadcast their location, bearing, speed and identifying information so that they can be easily tracked. Adoption of the standard is increasing with sites like Flightrader24.com and Flightaware.org providing realtime access to ongoing flights.
Problem Statement
Fundamentally, there are two variants of this problem: online and offline. The offline variant most closely resembles the tomnod.com paradigm which is concerned with processing historic satellite images. The online variant most closely resembles air traffic control systems where we’d be processing a continual stream of images coming from a satellite constellation providing whole earth coverage. The focus of this work is on the online variant since it presents a series of more interesting problems (of which the offline problem can be seen as a subproblem.)
In both cases, we’d need to be able to process large volumes of satellite images. One complication is that large volumes of satellite images are difficult and expensive to come by. To work around this limitation, synthetic images will be generated based off OpenFlights.org data with the intent of evaluating natural images on the system in the future. To generate data we’ll pretend there are enough satellites in orbit to provide whole earth coverage capturing simulated flights over a fixed region of space and window of time. The following video is an example of the approximately 60,000 flights in the dataset being simulated to completion:
To detect all these aircraft from a worldwide image, we’ll use correlation based template matching. There are many ways to parallelize and distribute this operation, but an intuitive distributed processing of image patches will be done with each cluster node performing a parallelized Fast Fourier Transform to identify any aircraft in a given patch. Tracking will be done using an online heuristic algorithm to “connect the dots” recovered from detection. At the end of the simulation, these trails of dots will be paired with simulated routes to evaluate how well the system did.
The remainder of this post will cover the architecture of the system based on Apache Spark, its configurations for running locally and on Amazon Web Services, and how well it performs before concluding with possible future work and cost analysis of what it would take to turn this into a realworld system.
System Architecture
Overview
The system relies on the data pipeline architecture presented above. Data is read in from the OpenFlights.org dataset consisting of airport information and flight routes. A fixed number of national flights are selected and passed along to a simulation module. At each time step, the simulator identifies which airplanes to launch, update latitude and longitude coordinates, and remove those that have arrived at their destination.
To minimize the amount of network traffic being exchanged between nodes, flights are placed into buckets based on their current latitude and longitude. Buckets having flights are then processed in parallel by the Spark Workers. Each worker receives a bucket and generates a synthetic satellite image; this image is then given to the detection module which in turn recovers the coordinates from the image.
These coordinates are coalesced at the Spark Driver and passed along to the tracking module where the coordinates are appended to previously grown flight trails. Once 24 hours worth of simulated time has elapsed (in simulated 15 minute increments), the resulting tracking information is passed along to a reporting module which matches the simulated routes with the flight trails. These results are then visually inspected for quality.
Simulation Assumptions
All latitude and longitude calculations are done under the Equirectangular projection. A corresponding flight exists for each route in the OpenFlights.org dataset (Open Database License). Flights departing hourly follow a straight line trajectory between destinations. Once en route, flights are assumed to be Boeing 747s traveling at altitude of 35,000 ft with a cruising speed of 575 mph.
Generation
Flights are mapped to one of buckets based on each flight’s latitude and longitude coordinate. Each bucket spans a degree region as illustrated in the middle layer of Fig. (2). Given a bucket, a oversimplified synthetic mediumresolution monochromatic satellite image is created with adorning aircraft silhouettes for each m Boeing 747 airliner in the bucket. (Visual obstructions such as clouds or nightfall will not be depicted.) This image, in addition to the latitude and longitude of the topleft and bottomright of the image, are then passed along to the detection module.
Detection
Given an image and its world coordinate frame, the detection module performs textbook Fourierbased correlation template matching to identify silhouettes of airplanes, , in the image, :
(1) 
Where the twodimensional Discrete Fourier Transform and inverse transform are defined as:
(2) 
(3) 
To carry out these calculations efficiently, a parallelized twodimensional Fast Fourier Transform (FFT) time algorithm was implemented for both forward and inverse operations by leveraging the fact that operations (2), (3) can be factored so that the FFT can be computed rowwise, and then on those results columnwise. Hadamard (elementwise) product of the frequency domain representation of the airplane and satellite image is done naively in quadratic time.
To denoise the results, the recovered spatial product, , is thresholded so that any real values greater than 90% of the product’s real maximum, , are kept:
(4) 
Since there are many values greater than the threshold, a linear time (in number of nodes) connected component labeling algorithm is applied to identify the most likely aircraft locations. The algorithm treats each pixel of the image as a node if that pixel’s value is greater than the threshold. An edge exists between two nodes if the nodes’ pixel coordinates are within an distance of two. The centroid of each connected component is then taken to be the true coordinate of each detected aircraft. Next only those centroids derived from clusters having more than half the average number of pixels per cluster are kept. Finally, these centroids are transformed to latitude and longitude coordinates given the world coordinate frame.
Tracking
The tracking module uses an grid of buckets with each bucket representing approximately a square degree region as illustrated as the top layer in Fig. (2). Each individual bucket consists of a stack of sightings where a sighting is a timestamped collection of coordinates. Here an individual coordinate is allowed to “connect” up to two other coordinates. Coordinates connected in this fashion form a trail, which is the primary output of the module.
For each latitude and longitude coordinate from the detection module, , the tracking module picks all the previous time step’s coordinates, , from the neighboring () buckets of ‘s bucket. From , only those coordinates that satisfy the following criteria are considered:
 must be free to “connect” to another coordinate.
 must be collinear to the coordinates of the trail headed by , i.e., as in Fig. (3).
 Given the predecessor of , the inner product of the vectors formed from the predecessor to and must be positive, i.e., as in Fig. (3).
Next, the nearest neighbor of is chosen from this remaining set of points. If a nearest neighbor exists, then is appended to the end of the nearest neighbor’s trail, otherwise a new trail is created. Finally, is added to its designated bucket so that it can be used for future trail building.
When the simulation completes, all trails from tracking module are analyzed and matched to the known routes used in the simulation. Matching is done by minimizing the distance from the trail’s origin and destination to a route’s origin and destination respectively. If the mean orthogonal distance:
(5) 
from the coordinates in the trail to the line formed by connecting the route’s origin and destination is greater than 25 m, then the match is rejected.
Reporting
The reporting module is responsible for summarizing the system’s performance. The average mean orthogonal distance given by Eqn. (5) is reported for all identified routes, total number of images processed and coordinates detected, and the portion of routes correctly matched is reported.
System Configurations
Standalone
Standalone mode runs the application in a single JVM without using Spark. Experiments were ran on the quadcore Intel i7 3630QM laptop jaws, which has 8 GB of memory, 500 GB hard drive, and is running Windows 8.1 with Java SE 7.
Cluster
Cluster mode runs the application on a Spark cluster in standalone mode. Experiments were ran on a network consisting of two laptop computers connected to a private 802.11n wireless network. In addition to jaws, the laptop oddjob was used. oddjob is a quadcore Intel i7 2630QM laptop with 6 GB of memory, 500 GB hard drive running Windows 7. Atop each machine, Oracle VM VirtualBox hosts two cloned Ubuntu 14.04 guest operating systems. Each virtual machine has two cores, 2 GB of memory and a 8 GB hard drive. Each virtual machine connects to the network using a bridged network adapter to its host’s. Host and guest operating systems are running Java SE 7, Apache Hadoop 2.6, and Scala 2.11.6 as prerequisites for Apache Spark 1.3.1. In total, there are four Spark Workers who report to a single Spark Master which is driven by a single Spark Driver.
Cloud
Cloud mode runs the application on an Amazon Web Services (AWS) provisioned Spark cluster managed by Apache Yarn. Experiments were ran using AWS’s Elastic Map Reduce (EMR) platform to provision the maximum allowable twenty^{[1]} Elastic Compute Cloud (EC2) previous generation m1.medium instances (one master, nineteen core) by scheduling jobs to execute the application JARs from the Simple Storage Service (S3). Each m1.medium instance consists of one 2 GHz CPU, 3.7 GB of memory, 3.9 GB hard drive running Amazon Machine Image (AMI) 3.6 equipped with Red Hat 4.8, Java SE 7, Spark 1.3.0. In total, there are nineteen Spark Workers and one Spark Master – one per virtual machine – managed by a Yarn Resource Manager driven by a single Yarn Client hosting the application.
System Evaluation
Detection Rate
(6) 
When an image is sparsely populated, the system consistently detects the presence of the aircraft, however as the density increases, the system is less adapt at finding all flights in the region as shown in Fig. (7). This is unexpected behavior. Explanations include the possibility that the threshold needs to be made to be adaptive, or that a different approach needs to be taken all together. In terms of real world implications, FAA regulations (JO 7110.65V 544) state that flights must maintain a minimum lateral distance of 3 and 5 miles (4.8 to 8 km). In principle, there could be at most four flights in a given image under these guidelines and the system would still have a 96.6% chance of identifying all four positions.
Detection Accuracy
A flight from DIA to CTL was simulated to measure how accurate the template matching approach works as illustrated in Fig. (8). Two errors were measured: the mean orthogonal distance given by Eqn. (5) and the mean distance between the detected and actual coordinate for all time steps:
(7) 
For Eqn. (5) a mean error of m was found, and for Eqn. (7) m. Both errors are acceptable given a single pixel represents approximately m. (For context, the global positioning system (GPS) has a 7.8 m error.)
In terms of how accurate detection is at the macro level, 500 flights were simulated and the resulting mean orthogonal distance was analyzed. Fig. (9) illustrates the bimodal distribution that was observed. 65% of the flights observed an accuracy less than 26 m with an average accuracy of m, while the remaining 35% saw an average accuracy of km which is effectively being off by a full degree. It is assumed that this 35% are cases where trails are paired incorrectly with routes. Based on these findings, the system enforces that all pairings not exceed a mean orthogonal distance of 25 m.
Tracking Rate
(8) 
For Fig. (10), an increasing number of random flights were simulated to completion and the resulting mean tracking rate reported. Based on these findings, the tracking module is having difficulty correctly handling many concurrent flights originating from different airports. This behavior is likely a byproduct of how quickly the detection rate degrades when many flights occupy a single region of space. When running a similar simulation where all flights originate from the same airport, the tracking rate is consistently nearperfect independent of the number of flights. This would suggest the system has difficulty with flights that cross paths. (For context, there is on average 7,000 concurrent flights over US airspace at any given time that we would wish to track.)
Performance
A series of experiments was conducted against the three configurations to measure how quickly the system could process different volumes of flights across the United States over a 24hours period. The results are illustrated in Fig. (11). Unsurprisingly, the Cloud mode outperforms both the Standalone and Cluster modes by a considerable factor as the number of flights increases.
Configuration  ms/image  mb/sec  Time (min) 

Standalone  704  3.00  260 
Cluster  670  2.84  222 
Cloud  207  9.67  76 
Table (1) lists the overall processing time for 22k images representing roughly 550k km^{2}, and 43 GB of image data. If the Cloud configuration was used to monitor the entire United States, then it would need approximately 22 hours to process a single snapshot consisting of 770 GB of image data. Obviously, the processing time is inadequate to keep up with a recurring avalanche of data every fifteen minutes. To do so, a realworld system would need to be capable of processing an image every 2 ms. To achieve this 1) more instances could be added, 2) the implementation can be refined to be more efficient, 3) the implementation can leverage GPUs for detection, and 4) a custom tailored alternative to Spark could be used.
Discussion
Future Work
There are many opportunities to exchange the underlying modules with more robust techniques that both scale and are able to handle realworld satellite images. The intake and generation modules can be adapted to either generate more realistic flight paths and resulting satellite imagery, or adapted to handle realworld satellite imagery from a vendor such as Skybox Imaging, Planet Labs, or DigitalGlobe.
For detection, the correlation based approach can be replaced with a crosscorrelation approach, or with the more involved Scale Invariant Feature Transformation (SIFT) method which would be more robust at handling aircraft of different sizes and orientations. Alternatively, the parallelism granularity can be changed so that the twodimensional FFT rowwise and columnwise operations are distributed over the cluster permitting the processing of larger images.
Tracking remains an open issue for this system. Getting the detection rate to be near perfect will go a long way, but the age of historical sightings considered could be increased to account for “gaps” in the detection trail. Yilmaz et al. provide an exhaustive survey of deterministic and statistical point tracking methods that can be applied here, in particular the Joint Probability Data Association Filter (JPDAF) and Multiple Hypothesis Tracking (MHT) methods which are worth exploring further.
On the reporting end of the system, a dashboard showing all of the detected trails and coordinates would provide an accessible user interface to endusers to view and analyze flight trails, discover last known locations, and detect anomalies.
Realworld Feasibility
While the scope of this work has focused on system internals, it is important to recognize that a realworld system requires a supporting infrastructure of satellites, ground stations, computing resources, facilities and staff each of which imposes its own set of limitations on the system. To evaluate the system’s feasibility, its expected cost is compared to the expected cost of the ADSB approach.
Following the CubeSat model and a 1970 study by J. G. Walker, 25 satellites ($1M ea.) forming a constellation in low earth orbit is needed to provide continuous whole earth coverage for $25M. Ground stations ($120k ea.) can communicate with a satellite at a time bringing total costs to $50M.^{[2]} Assuming that a single computer is responsible for square degree region, the system will require 64,800 virtual machines, equivalently 1,440 quadcore servers ($1k ea.) bringing the running total to $51M.
ADSB costs are handled by aircraft owners. Average upgrade costs are $5k with prices varying by vendor and aircraft. Airports already have Universal Access Transceivers (UATs) to receive the ADSB signals. FAA statistics list approximately 200,000 registered aircraft suggesting total cost of $1B.
Given that these are very rough estimates, an unobtrusive $51M system would be a good alternative to a $1B dollar exchange between private owners to ADSB vendors. (Operational costs of the system were estimated to be $1.7M/year based on market rates for colocations and staff salaries.)
Conclusions
In this work, a distributed system has been presented to detect and track commercial aircraft from synthetic satellite imagery. The system’s accuracy and detection rates are acceptable given established technologies. Given suitable hardware resources, it would be an effective tool in assisting searchandrescue teams locate airplanes from historic satellite images. More work needs to be done to improve the system’s tracking abilities for it to be a viable realworld air traffic control system. While not implemented, the data needed to support flight deviation, flight collision detection and other air traffic control functionality is readily available. Spark is an effective tool for quickly distributing work to a cluster, but more specialized high performance computing approaches may yield better runtime performance.
References
[1] Automatic dependent surveillancebroadcast (adsb) out equipment and use. Technical Report 14 CFR 91.225, U.S. Department of Transportation Federal Aviation Administration, May 2010.
[2] General aviation and air taxi activity survey. Technical Report Table 1.2, U.S. Department of Transportation Federal Aviation Administration, Sep 2014.
[3] Nanosats are go! The Economist, June 7 2014.
[4] A. Eisenberg. Microsatellites: What big eyes they have. New York Times, August 10 2013.
[5] A. Fasih. Machine vision. Lecture given at Transportation Informatics Group, ALPENADRIA University of Klagenfurt, 2009.
[6] S. Kang, K. Kim, and K. Lee. Tablet application for satellite image processing on cloud computing platform In Geoscience and Remote Sensing Symposium (IGARSS), 2013 IEEE International, pages 17101712, July 2013.
[7] W. Lee, Y. Choi, K. Shon, and J. Kim. Fast distributed and parallel preprocessing on massive satellite data using grid computing. Journal of Central South University, 21(10):38503855, 2014.
[8] J. Lewis. Fast normalized crosscorrelation. In Vision interface, volume 10, pages 120123, 1995.
[9] W. Li, S. Xiang, H. Wang, and C. Pan. Robust airplane detection in satellite images. In Image Processing (ICIP), 2011 18th IEEE International conference on, pages 28212824, Sept 2011.
[10] D. G. Lowe. Distinctive image features from scaleinvariant keypoints. International journal of computer vision, 60(2):91110, 2004.
[11] H. Prajapati and S. Vij. Analytical study of parallel and distributed image processing. In Image Information Processing (ICIIP), 2011 International Conference on, pages 16, Nov 2011.
[12] E. L. Ray. Air traffic organization policy. Technical Report Order JO 7110.65V, U.S. Department of Transportation Federal Aviation Administration, April 2014.
[13] J. Tunaley. Algorithms for ship detection and tracking using satellite imagery. In Geoscience and Remote Sensing Symposium, 2004. IGARSS ’04. Proceedings. 2004 IEEE International, volume 3, pages 18041807, Sept 2004.
[14] J. G. Walker. Circular orbit patterns providing continuous whole earth coverage. Technical report, DTIC Document, 1970.
[15] Y. Yan and L. Huang. Largescale image processing research cloud. In CLOUD COMPUTING 2014, The Fifth International Conference on Cloud Computing, GRIDs, and Virtualization, pages 8893, 2014.
[16] A. Yilmaz, O. Javed, and M. Shah. Object tracking: A survey. Acm computing surveys (CSUR), 38(4):13, 2006.