Notes from SIGGRAPH 2015

Introduction

I recently flew out to Los Angeles to attend the 42nd International Conference and Exhibition on Computer Graphics and Interactive Techniques. SIGGRAPH’s theme this year was the crossroads of discovery, bringing the conference closer to its roots, which began here in Boulder, Colorado back in 1974. For me it was a chance to dig a little deeper into Computer Graphics research following my recent studies and to develop a better understanding of the industries pushing the domain forward. As with most posts on this site, this is a short reminder to myself, and hopefully it gives others an idea of what they could expect if they went.

Production Sessions

Disney – Pixar’s “Lava”: Moving Mountains was an informative production session detailing the process of bringing “Lava” to the screen. “Lava” is the story of Uku, a lonely volcano in search of love. As millions of years go by, he begins to lose hope as he recedes back into the ocean. But all is not lost. Uku finds renewed hope for love as newly formed volcano Lele rises to the surface. After the Pixar magicians reveal their secrets, technical details, and engrossing backstory, “Lava” becomes an even more enjoyable short film.

The presentation began with director James Murphy explaining the personal story that inspired the short before giving a live performance of the titular song. Colin Levy followed Murphy’s conceptualization, storyboarding, and clay mockups with a discussion of how the film would be framed for maximal emotional impact. Levy explained the exploratory process of filming the opening scene to find the right combination of lenses and flight paths, based on real-world references, to help illustrate the size and scale of Uku, the hopeless volcano.

Both Aaron Hartline and Austin Lee continued discussing the challenges of animating and rigging Uku, Lele, a pair of dolphins, birds, whales, and turtles (the last four representing young love, newlyweds, established lives, and lifelong love). In particular, they covered the different approaches for animating and rigging the facial features of Uku (eyelids, lips, cheeks, and so on) and how the teams iterated to find a balance between what the audience might expect from an anthropomorphic mountain and what they wanted to achieve as storytellers.

Perhaps the most interesting moment in the presentation was Dirk Van Gelder’s sneak peek of the enhancements the team made to Presto (Pixar’s in-house animation tool) to provide animators with final-render-quality, real-time feedback on their changes through a clever combination of Renderman-based final renders and OpenGL hardware texturing. Aside from the technical novelty, it’s a great example of time-saving enhancements that make it easier for people to freely experiment and explore different approaches, leading to better results.

The closing discussion by Byron Bashforth and Farhez Rayani on shading and lighting was informative, and it was interesting to see how procedural approaches were used to give Uku both a physically realistic and visually appealing biome consisting of different shaders, and static and procedural assets. Overall, a very interesting peek into the workflow of one of the most venerable studios in the industry.

Birds of a Feather

Having worked in the healthcare space for a fair bit of time, I was attracted to meetings on Volume Rendering and Medical Visualization and HealthTech: Modeling, Interaction, Hardware, and Analysis to see what people have been working on and to get a glimpse of where things are heading.

Credit: (Left) X3D Example Archives: Basic, Medical: “Skeleton Complete Normals” as seen in H3DViewer. (Right) Virginia Tech Visionarium examples: “Blended body internals”.

Nicholas Polys of Virginia Tech and Michael Aratow (MD) (both chairs of the Web3D Consortium Medical Working Group) began the medical visualization discussion by going over common libraries such as VTK (The Visualization Toolkit) and Voreen (Volume Rendering Engine), before discussing general purpose analysis and visualization tools such as ParaView. Volume-oriented applications such as Seg3D (a volume segmentation tool) and OsiriX (a DICOM viewer) were covered, and finally, tools for exploring biomolecular systems such as Chimera, VMD (Visual Molecular Dynamics) and PathSim (Epstein-Barr Virus exploration) were discussed, giving the audience a good lay of the land. A brief bit of time was given to surgical training tools based on 3D technologies and haptic feedback (e.g., H3D).

These were all interesting applications, and seeing how they work using different types of human-machine interfaces (standard workstations, within CAVE environments, or even in virtual reality headsets and gloves) was eye-opening. The second main theme of the discussion was standardization when it comes to interoperability and reproducibility. There was a heavy push for X3D along with interoperability with DICOM. Like a lot of massive standards, DICOM has some wiggle room in it that leads to inconsistent implementations from vendors. That makes portability of data between disparate systems complicated (not to mention that DICOM incorporates non-graphical metadata such as complex HL7). Suffice it to say, X3D is biting off a big chunk of work, and I think it will take some time for them to make progress in healthcare since it’s a fragmented industry that is not in the least bit technologically progressive.

One area I felt was absent during the discussion was how 3D graphics could be used to benefit everyday patients. There is a wealth of fMRI and ECoG data that patients could benefit from seeing in an accessible way: for example, showing a patient a healthy baseline, then accentuating parts of their own data and explaining how those anomalies affect their well-being. If a component can be developed to deliver that functionality, then it can be incorporated into a patient portal alongside all the other charts and information that providers have accumulated for the patient.

The HealthTech discussion was presented by Ramesh Raskar and his graduate students and postdocs from the MIT Media Lab. They presented a number of low-cost, low-power diagnostic devices for retinal imaging and electroretinography, high-speed tomography, cellphone-based microscopy, skin perfusion photography, and dental imaging, along with more socially oriented technologies for identifying safe streets to travel and automatically discerning mental health from portraits. There were plenty of interesting applications being developed by the group, but it was more of a show and tell than a discussion of the kinds of challenges being tackled beyond the scope of the MIT Media Lab’s work (as impressive as it is). (For example, the fine work 3Shape A/S has done with fast scanning of teeth for digital dentistry.)

One discussion of key interest was Meddit, a way for medical practitioners and researchers to define open problems to maturity, then present those challenges to computer scientists to work on and develop solutions for. While the company name is uninspired, I think this is the right kind of collaboration platform for the “toolmaker” view of hardware engineers, computer scientists and software engineers as it identifies a real issue, presents an opportunity, and gives a pool of talented, bright people a way to make a difference. I am skeptical that it will take off (I think it would have more success as a niche community within an umbrella collaboration platform, i.e., the Stack Exchange model), but the idea is sound and something people should get excited about.

Real-Time Live!

The challenge of real-time graphics is very appealing to me, and getting to see what different software studios are working on was a real treat. While there were several presentations and awards given during the two-hour-long event, three demos stood out to me: Balloon Burst, given by Miles Macklin of NVIDIA; BabyX, presented by Mark Sagar of the University of Auckland; and award winner A Boy and His Kite, demoed by Nick Penwarden of Epic Games.

Credit: Figure 1 of Fast Grid-Free Surface Tracking. Chentanez, Mueller, Macklin, Kim. 2015.

Macklin’s demo was impressive in that it simulated more than 750,000 particles (250,000 by their solver Flex, and 512,000 for mist and droplets), and their paper [pdf] Fast Grid-Free Surface Tracking gives some technical background into how they achieved their results. Fluid simulation is something I’d like to spend some time exploring. I obviously won’t be able to create something as technical as Macklin’s group, but I would like to spend some time on Smoothed-Particle Hydrodynamics, and seeing NVIDIA’s work was a good motivation boost to explore the subject further on my own.

Perhaps the most unexpected entry in the series was Sagar’s BabyX. It was a fascinating assemblage of neural networks, real-time graphics, natural language processing, computer vision, and image processing to create the ultimate “Sims”-like character: a baby that could learn and invoke different emotional responses based on external stimuli. The real-time graphics were photorealistic, and seeing the modeling behind the system that emulates how the brain behaves in the presence of different dopamine levels (and how those levels correspond to conditions like Parkinson’s and schizophrenia) was impressive as well. Overall, a fantastic technical achievement, and I look forward to following Sagar’s work as it continues to evolve.

My main interest in going to Real-Time Live! was to see Penwarden’s work on A Boy and His Kite. This impressive demo, spanning a hundred square miles inspired by the Isle of Skye, really puts to shame my prior work in creating procedural environments. Nonetheless, it goes to show how far the medium can be pushed and how small the divide between real-time and film is becoming. Computer Graphics World published (July-August 2015) a very thorough technical overview [p. 40-48] of how Penwarden’s team produced the short, in addition to the features added to Unreal Engine 4 to make the demo shine.

Wrap-up

There were many other things I explored that I won’t go into in detail, namely the VR Village, Emerging Technologies, Research Posters, Exhibition, and Job Fair. I’m still quite skeptical that virtual reality (and to the same extent augmented reality) technologies will come into the mainstream; I think they’ll continue to be the subject of researchers, gaming enthusiasts, and industry solutions for automotive and healthcare problems. One disappointment was the Job Fair, as there were barely any companies participating. Overall, it was a positive experience learning what other people are doing in the industry and getting to see how research is being applied in a variety of different domains including automotive, entertainment, engineering, healthcare, and science.

Written by lewellen

2015-08-14 at 1:41 pm

Posted in Computer Graphics

Algorithms for Procedurally Generated Environments

Fig. 1: Demonstration of different graphics techniques and features of the procedurally generated environment.

Introduction

I recently completed a graduate course in Computer Graphics that required us to demonstrate a significant understanding of OpenGL and general graphics techniques. Given the short amount of time to work with, I chose to work on creating a procedurally generated environment consisting of land, water, trees, a cabin, smoke, and flying insects. The following write-up explains my approach and the established algorithms that were used to create the different visual effects showcased in the video above.

Terrain

Fig. 2: First few iterations of the Midpoint Displacement algorithm.

There are a variety of different techniques for creating terrain. More complex ones rely on visualizing a three dimensional scalar field, while simpler ones visualize a two dimensional surface defined by a fixed image, or dynamically using a series of specially crafted functions or random behavior. I chose to take a fractal-based approach given by [FFC82]’s Midpoint Displacement algorithm. A two dimensional grid of size (2^n + 1)^2 (for n > 1) is allocated with each entry representing the height of the terrain at that row and column position. Beginning with the corners, the four edge midpoints, as well as the center, are found and their heights calculated by drawing from a uniform random variable whose support is given by the respective corners. The newly assigned values now form four separate squares which can be assigned values recursively as shown above. The Midpoint Displacement algorithm produces noisy surfaces that are jagged in appearance. To smooth out the results, a 3 \times 3 Gaussian Filter is applied to the surface twice in order to produce a more natural, smooth-looking surface.
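
Below is a minimal C++ sketch of the recursive subdivision just described. The seed heights, the exact support of each uniform draw (edge midpoints between their two adjacent corners, the center between the extrema of the four corners), and the omission of the Gaussian smoothing pass are assumptions made for illustration rather than the project's actual parameters.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Midpoint Displacement sketch for a grid of side length 2^n + 1.
class MidpointDisplacement {
public:
    explicit MidpointDisplacement(int n)
        : size_((1 << n) + 1),
          height_(size_ * size_, 0.0f),
          assigned_(size_ * size_, false),
          rng_(std::random_device{}()) {}

    std::vector<float> generate() {
        // Seed the four corners with an arbitrary starting range (assumption).
        displace(0, 0, 0.25f, 0.75f);
        displace(0, size_ - 1, 0.25f, 0.75f);
        displace(size_ - 1, 0, 0.25f, 0.75f);
        displace(size_ - 1, size_ - 1, 0.25f, 0.75f);
        subdivide(0, 0, size_ - 1, size_ - 1);
        return height_;
    }

private:
    float at(int r, int c) const { return height_[r * size_ + c]; }

    float uniform(float a, float b) {
        if (a > b) std::swap(a, b);
        return std::uniform_real_distribution<float>(a, b)(rng_);
    }

    // Assign a height once; shared edge midpoints are not redrawn.
    void displace(int r, int c, float a, float b) {
        int i = r * size_ + c;
        if (assigned_[i]) return;
        assigned_[i] = true;
        height_[i] = uniform(a, b);
    }

    void subdivide(int r0, int c0, int r1, int c1) {
        if (r1 - r0 < 2) return;                 // square can no longer be split
        int rm = (r0 + r1) / 2, cm = (c0 + c1) / 2;

        // Four edge midpoints: support given by the two adjacent corners.
        displace(r0, cm, at(r0, c0), at(r0, c1));
        displace(r1, cm, at(r1, c0), at(r1, c1));
        displace(rm, c0, at(r0, c0), at(r1, c0));
        displace(rm, c1, at(r0, c1), at(r1, c1));

        // Center: support given by the extrema of the four corners.
        float corners[] = {at(r0, c0), at(r0, c1), at(r1, c0), at(r1, c1)};
        displace(rm, cm, *std::min_element(corners, corners + 4),
                         *std::max_element(corners, corners + 4));

        // Recurse into the four newly formed squares.
        subdivide(r0, c0, rm, cm);
        subdivide(r0, cm, rm, c1);
        subdivide(rm, c0, r1, cm);
        subdivide(rm, cm, r1, c1);
    }

    int size_;
    std::vector<float> height_;
    std::vector<bool> assigned_;
    std::mt19937 rng_;
};
```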

Face normals can be found by taking the cross product of the forward differences in both the row and column directions; however, this leads to a faceted-looking surface. For a more natural appearance, vertex normals are calculated using central differences in the row and column directions. The cross product of these approximations then gives the approximate surface normal at each vertex to ensure proper lighting.
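
As a small illustration of the vertex normal calculation, the following sketch assumes a z-up height field with the column index mapped to x, the row index mapped to y, and unit grid spacing; those axis conventions are assumptions, and boundary vertices would need one-sided differences instead.

```cpp
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Vertex normal at interior grid position (r, c) of a height field stored
// row-major in h with the given side length.
Vec3 vertexNormal(const std::vector<float>& h, int size, int r, int c) {
    auto H = [&](int i, int j) { return h[i * size + j]; };
    // Central differences approximate the partial derivatives of the height.
    float dRow = (H(r + 1, c) - H(r - 1, c)) / 2.0f;
    float dCol = (H(r, c + 1) - H(r, c - 1)) / 2.0f;
    // Cross product of the tangents (1, 0, dCol) and (0, 1, dRow).
    Vec3 n{-dCol, -dRow, 1.0f};
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return Vec3{n.x / len, n.y / len, n.z / len};
}
```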

Fig. 3: Example process of blending an elevation band texture with the base texture to create geographically correct looking terrain textures.

Texturing of the terrain is done by dynamically blending eight separate textures together based on terrain height as shown above. The process begins by loading in a base texture which is applied to all heights of the terrain. The next texture to apply is loaded, and an alpha mask is created and applied to the next texture based on random noise and a specific terrain height band. Blending of the masked texture and base texture is a function of the terrain’s height where the normalized height is passed through a logistic function to decide what portion of each texture should be used. The combined texture then serves as a new base texture and the process repeats until all textures have been blended together.
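
A hedged sketch of the per-texel blending decision: the normalized height is passed through a logistic function to produce a blend weight between the current base texture and the alpha-masked band texture. The steepness constant and band center below are placeholders, not the values used in the project.

```cpp
#include <cmath>

// Logistic blend weight as a function of normalized terrain height in [0, 1].
// bandCenter and k (steepness) are illustrative assumptions.
float blendWeight(float normalizedHeight, float bandCenter, float k = 10.0f) {
    return 1.0f / (1.0f + std::exp(-k * (normalizedHeight - bandCenter)));
}

// Per-channel blend of the running base texture with the masked band texture.
float blendChannel(float base, float masked, float w) {
    return (1.0f - w) * base + w * masked;
}
```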

Water

In order to generate realistic looking water, a number of different OpenGL capabilities were employed to accurately capture a half dozen different water effects consisting of reflections, refraction, waves, ripples, lighting, and Fresnel effects. Compared to other elements of the project, water required the largest graphics effort to get right.

Fig. 4: The different rendering passes for creating realistic looking water with reflections and refraction.

To obtain reflections, a three-pass rendering process is used. In the first pass, the scene is clipped and rendered to a frame buffer (with color and depth attachments) from above the water revealing only what is below the surface for the refraction effects. The second pass clips and renders the scene to a frame buffer (with only a color attachment) from below the water surface revealing only what is above the water for the reflection effects. The third pass then combines these buffers on the water surface through vertex and fragment shaders to give the desired appearance. To map the frame buffer renderings to the water surface, the clipped coordinates calculated in the vertex shader are converted to normalized device coordinates through perspective division in the fragment shader which allows one to map the (u, v) coordinate of the texture as it should appear on the screen to coordinates of the water surface.
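
The mapping from clip coordinates to frame buffer texture coordinates can be illustrated with the following sketch; in the project this math lives in the fragment shader, and the helper types here are hypothetical stand-ins shown on the CPU for clarity.

```cpp
// Hypothetical helper types for the coordinate conversion.
struct Vec4 { float x, y, z, w; };
struct Vec2 { float u, v; };

Vec2 clipToScreenUV(const Vec4& clip) {
    // Perspective division yields normalized device coordinates in [-1, 1].
    float ndcX = clip.x / clip.w;
    float ndcY = clip.y / clip.w;
    // Remap to [0, 1] so the reflection or refraction frame buffer can be sampled.
    return Vec2{ndcX * 0.5f + 0.5f, ndcY * 0.5f + 0.5f};
}
```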

Fig. 5: Reflection, ripples normal map, water depth, refraction, specular normal map, and depth buffer textures used to achieve different visual effects.

To create the appearance of water ripples, a normal map is sampled and the resulting time varying displacement is used to sample the reflection texture for the fragment’s reflection color. Next, a similar sampling of the normal map is done at a coarser level to emulate the specular lighting that would appear on the subtle water waves created by the vertex shader. Refraction ripples and caustic lighting are achieved by sampling from the normal map just as the surface ripples and specular lighting were. To make the water appear cloudy, the depth buffer from the refraction rendering is used in conjunction with water depth so that terrain deeper under water is less visible as it would be in real life.

To combine the reflection and refraction components, the Fresnel Effect is used. This effect causes the surface of the water to vary in appearance based on viewing angle. When the viewer’s gaze is shallow to the water surface, the water is dominated by the reflection component, while when gazing downward, the water is more transparent giving way to the refraction component. The final combination effect is to adjust the transparency of the texture near the shore so that shallower water reveals more of the underlying terrain.
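
One common way to realize this viewing-angle dependence is Schlick's approximation, sketched below; the project may well have used a different formulation, and the normal-incidence reflectance is an assumed constant.

```cpp
#include <algorithm>
#include <cmath>

// Schlick's approximation of the Fresnel term. cosTheta is the dot product
// of the view direction and the water surface normal; f0 is the reflectance
// at normal incidence (assumed value for water-like surfaces).
float fresnelWeight(float cosTheta, float f0 = 0.02f) {
    cosTheta = std::clamp(cosTheta, 0.0f, 1.0f);
    return f0 + (1.0f - f0) * std::pow(1.0f - cosTheta, 5.0f);
}

// finalColor = fresnel * reflectionColor + (1 - fresnel) * refractionColor
```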

Flora

Fig. 6: Examples of plants generated by using the Stochastic Lindenmayer Systems framework.

The scene consists of single cottonwood trees, but the underlying algorithm, based on Stochastic Lindenmayer Systems [Lin68], can produce a large variety of flora as shown above. The idea is that an n-ary tree is created with geometric annotations consisting of length and radius, and relative position, yaw, and pitch with respect to its parent node. Internal nodes of the tree are rendered as branches or stems, while leaf nodes are rendered as groups of leaves, flowers, fruits and so on. Depending on the type of plant one wishes to generate, these parameters take on different values, and the construction of the n-ary tree varies. For a palm tree, a linked list is created, whereas a flower may have several linked lists sharing a common head, and a bush may be a factorial tree with a regular pitch.
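
The following is a minimal sketch of a stochastic Lindenmayer system expressed as probabilistic string rewriting; the grammar, symbols, and probabilities are illustrative assumptions rather than the rules used for the cottonwood trees, and the project interprets the result as an n-ary tree of annotated branch segments rather than a flat string.

```cpp
#include <iostream>
#include <map>
#include <random>
#include <string>
#include <vector>

struct Rule { std::string successor; double probability; };

// Expand the axiom for a number of iterations, choosing among each symbol's
// rules according to their probabilities.
std::string expand(std::string axiom,
                   const std::map<char, std::vector<Rule>>& rules,
                   int iterations, std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    for (int i = 0; i < iterations; ++i) {
        std::string next;
        for (char symbol : axiom) {
            auto it = rules.find(symbol);
            if (it == rules.end()) { next += symbol; continue; }
            const std::vector<Rule>& options = it->second;
            double roll = uniform(rng), cumulative = 0.0;
            std::size_t chosen = options.size() - 1;     // fall back to the last rule
            for (std::size_t k = 0; k < options.size(); ++k) {
                cumulative += options[k].probability;
                if (roll <= cumulative) { chosen = k; break; }
            }
            next += options[chosen].successor;
        }
        axiom = next;
    }
    return axiom;
}

int main() {
    std::mt19937 rng(42);
    // 'F' draws a branch segment; '[' and ']' push and pop turtle state;
    // '+' and '-' adjust yaw or pitch when the string is later interpreted.
    std::map<char, std::vector<Rule>> rules = {
        {'F', {{"F[+F]F[-F]F", 0.6}, {"F[+F]F", 0.4}}}
    };
    std::cout << expand("F", rules, 3, rng) << "\n";
}
```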

Smoke

Fig. 7: Examples of smoke plumes generated using random particles, Metaballs, and Marching Tetrahedra algorithms.

The primary algorithmic challenge of the project was to visualize smoke coming from the chimney of the cabin in the scene. To achieve this, a simple particle system was written in conjunction with [Bli82]’s Metaballs and [PT+90]’s Marching Tetrahedra algorithms. The tetrahedral variant was chosen since it is easier to implement from scratch than [LC87]’s original Marching Cubes algorithm. The resulting smoke plumes produced from this chain of algorithms are shown above.

Fig. 8: Visual explanation of the smoke generation process in two dimensions.

Particles possess three-dimensional position and velocity components, and are added to the system in batches at a fixed interval with random initial positions on the xy-plane and zero velocity terms. A uniform random vector field is created with random x, y components and a fixed, positive z component. Euler’s Forward Method is applied to the system to update each particle’s position and velocity. Any particles that escape from the unit bounding cube are removed from the system. This process produces the desired Brownian paths that are typical of smoke particles. To visualize each particle, the Metaballs algorithm is used to create a potential field about each particle. The three dimensional grid is populated in linear time with respect to the number of particles by iterating over a fixed volume about each particle, since there is no need to go outside of the fixed volume where the particle’s potential field is surely zero.
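
A compact sketch of the two steps described above, the forward Euler update and the bounded accumulation of each particle's potential into the grid, follows. The grid resolution, falloff constant, and Gaussian-style potential are assumptions; Blinn's formulation uses a similarly decaying exponential.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Particle { float px, py, pz, vx, vy, vz; };

// Forward Euler step: the velocity is nudged by the vector field sample
// (fx, fy, fz) at the particle, and the position by the velocity.
void eulerStep(Particle& p, float dt, float fx, float fy, float fz) {
    p.vx += fx * dt;  p.vy += fy * dt;  p.vz += fz * dt;
    p.px += p.vx * dt;  p.py += p.vy * dt;  p.pz += p.vz * dt;
}

// Accumulate each particle's potential into an n x n x n grid over the unit
// cube, only touching cells within a fixed radius so the pass stays linear
// in the number of particles.
void accumulateField(std::vector<float>& field, int n, float radius,
                     const std::vector<Particle>& particles) {
    auto idx = [n](int x, int y, int z) { return (z * n + y) * n + x; };
    int r = static_cast<int>(radius * n) + 1;
    for (const Particle& p : particles) {
        int cx = static_cast<int>(p.px * n);
        int cy = static_cast<int>(p.py * n);
        int cz = static_cast<int>(p.pz * n);
        for (int z = std::max(0, cz - r); z <= std::min(n - 1, cz + r); ++z)
            for (int y = std::max(0, cy - r); y <= std::min(n - 1, cy + r); ++y)
                for (int x = std::max(0, cx - r); x <= std::min(n - 1, cx + r); ++x) {
                    float dx = x / float(n) - p.px;
                    float dy = y / float(n) - p.py;
                    float dz = z / float(n) - p.pz;
                    float d2 = dx * dx + dy * dy + dz * dz;
                    // Gaussian-style falloff; the constant is an assumption.
                    field[idx(x, y, z)] += std::exp(-32.0f * d2);
                }
    }
}
```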

Fig. 9: Cube segmented into 6 tetrahedra, and the two primary cases of the Marching Tetrahedra algorithm.

The resulting scalar field from this process is passed along to the Marching Tetrahedra algorithm. The algorithm will inspect each volume of the grid in cubic time with respect to the grid edge size. The eight points of the volume are then assigned inside / outside labellings with those volumes completely inside or outside ignored. Those having mixed labellings contain a segment of the surface we wish to render. A single volume is segmented into 6 tetrahedra[1], with two tetrahedra facing each plane resulting in a common corner shared by all as shown above. Each tetrahedron then has sixteen cases to examine leading to different surfaces. Two of these cases are degenerate; all inside or all outside. The remaining fourteen cases can be reduced to two by symmetry as shown above. To ensure the surface is accurate, the surface vertices are found by linearly interpolating between inside / outside grid points.

Face normals can be computed directly from the resulting surface planes by taking the usual cross product. In order to calculate vertex normals, numerical differentiation is used to derive the gradient of the scalar field at each grid point using backward, central, and forward differences depending on availability. Taking the calculated normals at each grid point, the surface normals at the previously interpolated surface vertices are then the linear interpolation of the corresponding grid point normals. Given more time, I would have liked to look into surface tracking and related data structures to reduce the cubic surface generation process down to just those volumes that require a surface to be drawn.

Butterflies

Fig. 10: Example series of movements and completed pursuit curve of predator and prey.

For the final stretch goal, the proposed static options were eschewed in favor of adding in a dynamic element that would help bring the scene to life without being obtrusive. As a result, a kaleidoscope of butterflies that meander through the scene was introduced.

Each butterfly follows a pursuit curve as it chases after an invisible particle following a random walk. Both butterfly and target are assigned positions drawn from a uniform random variable with unit support. At the beginning of a time step, the direction to where the particle will be at the next time step from where the butterfly is at the current time step is calculated, and the butterfly’s position is then incremented in that direction. Once the particle escapes the unit cube, or the butterfly catches the particle, then the particle is assigned a new position, and the game of cat and mouse continues.
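
The per-step pursuit update can be summarized with the sketch below, where the butterfly moves at a fixed speed toward the target's next position; the speed constant and helper type are assumptions.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// One pursuit step: move the butterfly toward where the target will be at
// the next time step. The caller respawns the target once it is caught or
// escapes the unit cube.
Vec3 pursuitStep(const Vec3& butterfly, const Vec3& targetNext, float speed, float dt) {
    Vec3 dir{targetNext.x - butterfly.x, targetNext.y - butterfly.y, targetNext.z - butterfly.z};
    float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    if (len < 1e-6f) return butterfly;            // target caught
    float step = speed * dt / len;
    return Vec3{butterfly.x + dir.x * step, butterfly.y + dir.y * step, butterfly.z + dir.z * step};
}
```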

Each time step, the butterfly’s wings and body are rotated a slight amount by a fixed value, and by the Euler angles defined by its direction, to give the correct appearance of flying. To add variety to the butterflies, each takes on one of three different appearances (Monarch, and Blue and White Morpho varieties) based on a fair die roll. One flaw with the butterflies is that they do not take into account the positions of other objects in the scene and can often be seen flying into the ground, or through the cabin.

Conclusions

This write-up discussed a large variety of topics related to introductory computer graphics, but did not cover other details that were developed, including navigation, camera control, lighting, algorithms for constructing basic primitives, and the underlying design of the C++ program or implementation of the GLSL shaders. While most of the research applied to this project dates back nearly 35 years, the combination of techniques lends itself to a diverse and interesting virtual environment. Given more time, additional work could be done to expand the scene to include more procedurally generated plants, objects, and animals, as well as additional work done to make the existing elements look more photorealistic.

References

[Bli82] James F. Blinn. A generalization of algebraic surface drawing. ACM Trans. Graph., 1(3):235-256, 1982.

[FFC82] Alain Fournier, Donald S. Fussell, and Loren C. Carpenter. Computer rendering of stochastic models. Commun. ACM, 25(6):371-384, 1982.

[LC87] William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3d surface construction algorithm. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1987, pages 163-169, 1987.

[Lin68] Aristid Lindenmayer. Mathematical models for cellular interactions in development I. Filaments with one-sided inputs. Journal of Theoretical Biology, 18(3):280-299, 1968.

[PT+90] Bradley Payne, Arthur W Toga, et al. Surface mapping brain function on 3d models. Computer Graphics and Applications, IEEE, 10(5):33-41, 1990.

Category Recognition of Golden and Silver Age Comic Book Covers

Introduction

Motivation

For a while now, I’ve been wanting to work on a computer vision project, and I decided that for my next research-focused project I would learn some image processing and machine learning algorithms in order to build a system that classifies the visual contents of images: a category recognizer. Over the course of the summer I researched several techniques and built the system I had envisioned. The end result is by no means state of the art, but it is satisfactory for four months of on-and-off development and research. The following post includes my notes on the techniques and algorithms that were used in the project, followed by a summary of the system and its performance against a comic book data set that was produced during development.

Subject Matter

The original subject matter of this project was paintings from the 1890s done in the Cloisonnism art style. Artists of the style are exemplified by Émile Bernard, Paul Gauguin and Paul Sérusier. The style is characterized by large regions of flat colors outlined by dark lines; characteristics that would be easy to work with using established image processing techniques. During development, it became evident that no one approach would work well with these images. As an alternative, I decided to work with Golden and Silver Age comic book covers from the 1940s to 1960s, which also typify this art style. Many of the comic books were drawn by the same individuals such as Jack Kirby, Joe Shuster and Bob Kane. As an added benefit, there are thousands of comic book covers available online compared to the dozens of Cloisonnism paintings.

Image Processing

Representation

An image is a function, I : \mathbb{Z}^2 \to \mathbb{Z}^3, where each input vector, \vec{x}, represents an image coordinate and each output vector, \vec{y}, represents the red, blue and green (RGB) channels, \vec{y}_c, of an image. Individual input values are bound between zero and the width, w, or height, h, of the image and output values are between zero and 255. Each output vector represents a unique color in RGB space giving rise to 2^{24} possible colors. Below is a basic sample image broken down into its individual channels.

From left to right: original image, red channel, blue channel and green channel.
Original image: Action Comics 1 (1938). Art by Joe Shuster. Copyright © 1938 DC Comics, Inc.

Like any other vector field, transformations can be applied to the image to yield a new image, \hat{I}. In image processing, these transformations are referred to as image filters and come in three varieties: point-based, neighbor-based and image-based filters. As the names suggest, point-based filters map a single output vector to a new output vector, neighbor-based filters map neighboring output vectors to a single output vector, and image-based filters map the whole image and a single or neighboring set of output vectors to a single output vector.

There are many different instances of these types of filters, but only those used in this project are discussed below. Computational complexity and efficient algorithms for each type of filter are also discussed where appropriate.

Point-based Filters

Point-based filters, f : \mathbb{Z}^3 \to \mathbb{Z}^3 , map an output vector to a new output vector in the form \hat{I}(\vec{x}) = f(I(\vec{x})). Application of a point-based filter is done in quadratic time with respect to the dimensions of the image \mathcal{O}(N^2).

Grayscale Projection

It is helpful to collapse the RGB channels of an image down to a single channel for the purpose of simplifying filter results. This can be done by using a filter of the form f(\vec{y})_c = \frac{ \lVert \vec{y} \rVert_2 }{ \sqrt{3} }. Alternatively one can use a filter of the form f(\vec{y})_c = (0.2126, 0.7152, 0.0722)^T \cdot \vec{y} to represent the luminance of the output vector.
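
Both projections are straightforward to express per pixel, as in the following sketch (channel values assumed to lie in [0, 255]).

```cpp
#include <cmath>
#include <cstdint>

struct RGB { std::uint8_t r, g, b; };

// Euclidean norm of the channel vector scaled back into [0, 255].
std::uint8_t grayNorm(RGB y) {
    double v = std::sqrt(double(y.r) * y.r + double(y.g) * y.g + double(y.b) * y.b) / std::sqrt(3.0);
    return static_cast<std::uint8_t>(v);
}

// Rec. 709 luminance weights, matching the dot product form above.
std::uint8_t grayLuminance(RGB y) {
    return static_cast<std::uint8_t>(0.2126 * y.r + 0.7152 * y.g + 0.0722 * y.b);
}
```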

Thresholding

A threshold filter serves as a way to accentuate all values in the image greater than or equal to a threshold, \gamma, or to attenuate all those values less than the threshold.

The first variety is the step threshold filter, f(\vec{y})_c = \begin{cases} 255 & \vec{y}_{c} \ge \gamma \\ 0 & \text{otherwise} \end{cases}, which exhibits an ideal threshold cutoff after the threshold value.

The second variety is a logistic threshold filter, \displaystyle f(\vec{y})_c = \frac{255}{ 1.0 + \exp{\sigma(\gamma - \vec{y}_c)} }, with an additional parameter, \sigma, allowing for wiggle room about the threshold yielding a tapered step function as \sigma increases in size.

From left to right: original image, step threshold filter (\gamma = 0.5), logistic threshold filter (\gamma = 0.5, \sigma = 4), and logistic threshold filter (\gamma = 0.5, \sigma = 16).
Original image: The Amazing Spider-Man 1 (1963). Art by Jack Kirby & Steve Ditko. Copyright © 1963 Non-Pareil Publishing Corp.
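
A sketch of the two threshold filters applied to a single channel value follows; note that it assumes \gamma is expressed on the same 0-255 scale as the channel values, whereas the caption above quotes normalized values.

```cpp
#include <cmath>
#include <cstdint>

// Step threshold: accentuate values at or above gamma, attenuate the rest.
std::uint8_t stepThreshold(std::uint8_t yc, double gamma) {
    return yc >= gamma ? 255 : 0;
}

// Logistic threshold: a tapered step controlled by sigma.
std::uint8_t logisticThreshold(std::uint8_t yc, double gamma, double sigma) {
    return static_cast<std::uint8_t>(255.0 / (1.0 + std::exp(sigma * (gamma - yc))));
}
```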

Neighbor-based Filters

All neighbor-based filters take the output vectors neighboring an input vector to calculate a new output vector value. How the neighboring output vectors should be aggregated together is given by a kernel image, K, and the computation is represented as a two-dimensional, discrete convolution.

\hat{I}_c = \displaystyle (I \star K)(u,v)_c = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} I(i,j)_c K(u-i,v-j)_c

Neighbor-based filters can be applied naïvely in quartic time as a function of the image and kernel dimensions, \mathcal{O}(N^4). However, a more efficient implementation allows for \mathcal{O}(N^2 \log_2 N) time by way of the Discrete Fourier Transform.

\displaystyle \mathcal{F}(f)(x) = \sum_{n=0}^{N-1} f(n) \exp{ -2 \pi i \frac{n}{N} x}

The Discrete Fourier Transform is a way of converting a signal residing in the spatial domain into a signal in the frequency domain by aggregating waveforms of varying frequencies where each waveform is amplified by its corresponding value in the input signal. The Inverse Discrete Fourier Transform maps a frequency domain signal back to the spatial domain.

\displaystyle \mathcal{F}^{-1}(f)(x) = \frac{1}{N} \sum_{n=0}^{N-1} f(n) \exp{ 2 \pi i \frac{n}{N} x}

Applying the Discrete Fourier Transform to a convolution, \mathcal{F}(f \star g), comes with the convenient property that the transformed convolution can be rewritten as the product of the transformed functions, \mathcal{F}(f) \mathcal{F}(g), by way of the Convolution Theorem.

The improved time complexity is achieved by using a divide and conquer algorithm known as the Fast Fourier Transform, which takes advantage of the Danielson-Lanczos Lemma: the Discrete Fourier Transform of a signal can be calculated by splitting the signal into two equal-sized signals (the even and odd indexed entries) and combining their Discrete Fourier Transforms.

\displaystyle \mathcal{F}(f)(x) =  \sum_{n=0}^{\frac{N}{2} - 1} f(2n) \exp{ -2 \pi i \frac{2n}{N} x} + \sum_{n=0}^{\frac{N}{2} - 1} f(2n+1) \exp{ -2 \pi i \frac{2n+1}{N} x }

\displaystyle \mathcal{F}(f)(x) =  \sum_{n=0}^{\frac{N}{2} - 1} f(2n) \exp{ -2 \pi i \frac{n}{N / 2} x} + \exp{ -2 \pi i \frac{x}{N}  } \sum_{n=0}^{\frac{N}{2} - 1} f(2n+1) \exp{ -2 \pi i \frac{n}{ N / 2 } x }

\displaystyle \mathcal{F}(f)(x) = \mathcal{F}(f_{even})(x) + \mathcal{F}(f_{odd})(x) \exp{ -2 \pi i \frac{x}{N}  }
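
A compact recursive radix-2 implementation of this split is sketched below for signal lengths that are a power of two; it is a reference illustration rather than the implementation used in the project.

```cpp
#include <complex>
#include <vector>

using Signal = std::vector<std::complex<double>>;

// Recursive Fast Fourier Transform via the Danielson-Lanczos split.
Signal fft(const Signal& f) {
    const std::size_t N = f.size();
    if (N == 1) return f;

    Signal even(N / 2), odd(N / 2);
    for (std::size_t n = 0; n < N / 2; ++n) {
        even[n] = f[2 * n];
        odd[n] = f[2 * n + 1];
    }
    Signal Feven = fft(even), Fodd = fft(odd);

    Signal F(N);
    const double pi = 3.14159265358979323846;
    for (std::size_t x = 0; x < N / 2; ++x) {
        // Twiddle factor exp(-2 pi i x / N) from the lemma above.
        std::complex<double> w = std::polar(1.0, -2.0 * pi * double(x) / double(N));
        F[x] = Feven[x] + w * Fodd[x];
        F[x + N / 2] = Feven[x] - w * Fodd[x];
    }
    return F;
}
```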

For the purposes of image processing, we use the two-dimensional Discrete and Inverse Discrete Fourier Transform.

\displaystyle \mathcal{F}(f)(u, v) = \frac{1}{w h} \sum_{m = 0}^{w - 1} \sum_{n = 0}^{h-1} f(m,n) \exp{-2 \pi i \left (\frac{m u}{w} + \frac{nv}{h} \right) }

The expression can be rearranged so that the Discrete Fourier Transform is first taken of each column of the image, and then the Discrete Fourier Transform of those results is taken to obtain the full two-dimensional Discrete Fourier Transform.

\displaystyle \mathcal{F}(f)(u, v) = \frac{1}{w} \sum_{m = 0}^{w-1} \left (  \frac{1}{h} \sum_{n = 0}^{h-1} f(m,n) \exp{-\frac{2 \pi i n v}{h}}  \right) \exp{-\frac{2 \pi i m u}{w}}

As a result, we can extend the Fast Fourier Transform in one dimension easily into two dimensions producing a much more efficient time complexity.

Weighted Means: Gaussian and Inverse Distance

Weighted mean filters are used to modify the morphology of an image by averaging neighboring output vectors together according to some scheme.

A Gaussian filter is used to blur an image by using the Gaussian distribution with standard deviation, \sigma, as a kernel.

\displaystyle K(u, v)_c = \frac{1}{2 \pi \sigma^2} \exp{-\frac{u^2+v^2}{2 \sigma^2} }
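
Building the discrete kernel is a direct evaluation of this formula over a (2k+1) x (2k+1) window; the renormalization so that the weights sum to one is an assumption, though it is common practice.

```cpp
#include <cmath>
#include <vector>

// Build a (2k+1) x (2k+1) Gaussian kernel with standard deviation sigma.
std::vector<std::vector<double>> gaussianKernel(int k, double sigma) {
    std::vector<std::vector<double>> K(2 * k + 1, std::vector<double>(2 * k + 1));
    const double pi = 3.14159265358979323846;
    double sum = 0.0;
    for (int u = -k; u <= k; ++u)
        for (int v = -k; v <= k; ++v) {
            K[u + k][v + k] = std::exp(-(u * u + v * v) / (2.0 * sigma * sigma))
                              / (2.0 * pi * sigma * sigma);
            sum += K[u + k][v + k];
        }
    // Renormalize so the discrete weights sum to one (assumed convention).
    for (auto& row : K)
        for (double& w : row) w /= sum;
    return K;
}
```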

The inverse distance filter calculates how far the neighboring output vectors are with respect to the new output vector being calculated. Each result is also scaled by the parameter, \gamma, allowing for contrast adjustments.

\displaystyle K(u, v)_c = \begin{cases}  \gamma  \lVert (u-i, v-j) \rVert^{-1}  & (u,v) \neq (i,j)  \\ 0 & \text{otherwise} \end{cases}

Laplacian

A Laplacian filter detects changes in an image and can be used for sharpening and edge detection. Much like in calculus of a single variable, the slope of a surface can be calculated by the Gradient operator, \displaystyle \nabla = \left ( \frac{\partial}{\partial x}, \frac{\partial}{\partial y} \right ). Since it is easier to work with a scalar than a vector, the divergence of the gradient is taken, giving the Laplacian operator, \displaystyle \Delta = \nabla \cdot \nabla = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}.

Since an image is a discrete function, the Laplacian operator needs to be approximated numerically using a central difference. h represents the spacing between successive samples of the underlying function. Since the finest resolution that can be achieved in an image is an individual displacement, h = 1.

\displaystyle \Delta I \approx  \frac{I(x + h, y) + I(x, y + h) - 4 I(x,y) + I(x - h, y) + I(x, y - h)}{h^2}

\displaystyle \Delta I \approx I \star K = I \star \frac{1}{h^2} \begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix}

From left to right: original image, Gaussian filter (\sigma = 4.0), inverse distance filter (\gamma = 2.0) and Laplacian filter.
Original image: Batman 1 (1940). Art by Bob Kane. Copyright © 1940 DC Comics, Inc.

Image-based Filters

Image-based filters calculate some information about the contents of the image and then use that information to generate the appropriate point-based and neighbor-based filters.

Normalization

The normalization process computes the minimum, \vec{y}_{c}^{\text{min}} and maximum, \vec{y}_{c}^{\text{max}} values of each channel and linearly maps all values between those extrema to new values between the possible channel extrema of c^{\text{min}} = 0 and c^{\text{max}} = 255.

\displaystyle \hat{I}(\vec{x})_c =  \frac{I(\vec{x})_c - \vec{y}^{\text{min}}_c}{\vec{y}^{\text{max}}_c - \vec{y}^{\text{min}}_c} (c^\text{max} - c^\text{min}) + c^\text{min}

This particular image-based filter can be applied in quadratic time, \mathcal{O}(N^2), to calculate the extrema of the image and apply the linear map.

From left to right: original image, normalized image and accompanying histograms.
Original image: Daredevil 1 (1964). Art by Jack Kirby. Copyright © 1964 Marvel Comics.

Edge Detection

Edge detection is the process of identifying changes (e.g., texture, color, luminance and so on) in an image. As alluded to in the image processing section, the Laplacian filter is central to detecting edges within an image. As a result, a sequence of filters is used before and after a Laplacian filter to produce a detector that consistently segments comic book covers. The following sequence of filters was used.

  1. Grayscale Projection – Since all logical components of a comic book cover are separated by inked lines, it is permissible to ignore the full set of RGB channel information and collapse the image down to a grayscale image.
  2. Normalization – It is conceivable that the input image has poor contrast and brightness. To ensure that the full range of luminance values is present, the image is normalized.
  3. Gaussian (\sigma = 1.5) – An image may have some degree of noise superimposed on it. To reduce the noise, the image is blurred using a Gaussian filter with a standard deviation of 1.5. This is enough to smooth out the image without distorting finer image detail.
  4. Laplacian – Once the image has been prepared, its edges are calculated using the Laplacian filter.
  5. Normalization – Most of the changes in the image may be subtle, so a normalization filter is applied to make sure that all edge information is accentuated as much as possible.
  6. Step Threshold (\gamma = 0.05) – Since a partial edge isn’t particularly useful information, any edge RGB value less than 12.75 is attenuated to zero and all other values accentuated to 255.
  7. Inverse Distance (\gamma = 1.5) – It is possible that discontinuities were introduced into some of the edges during the threshold process. To mitigate this impact, an inverse distance filter is used to inflate existing edges and intensify the result with a gain of 1.5.
From left to right, top to bottom: original image, grayscale projection, normalization, Gaussian (\sigma = 1.5), Laplacian, normalization, step threshold (\gamma = 0.05), inverse distance (\gamma = 1.5).
Original image: Captain America 111 (1964). Art by Jim Steranko. Copyright © 1968 Marvel Comics.

The complete edge detection process takes computational complexity of \mathcal{O}(N^2 \log_2 N) due to the neighbor-based filters used to eliminate noise and smooth edge discontinuities.

Segmentation

With the edge image it is possible to segment the image into its visual components. This is achieved by doing a flood fill on the image and using the edge image as the boundaries for the fill. Once the fill runs out of points to flood, the segment is complete and the next remaining point in the image is considered. To reduce the number of minuscule segments, only those segments representing at least 0.1 \% of the image are included.
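
A hedged sketch of the fill itself: a breadth-first flood fill that grows a segment from a seed pixel and treats edge pixels as boundaries. The edge value convention (255 marks an edge) and the label bookkeeping are assumptions.

```cpp
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Grow one segment from (seedX, seedY); labels of 0 mean "unassigned".
// Returns the segment size so the caller can discard minuscule segments.
std::size_t floodFill(const std::vector<std::uint8_t>& edges, std::vector<int>& labels,
                      int w, int h, int seedX, int seedY, int label) {
    auto idx = [w](int x, int y) { return y * w + x; };
    std::size_t count = 0;
    std::queue<std::pair<int, int>> frontier;
    frontier.push({seedX, seedY});
    while (!frontier.empty()) {
        auto [x, y] = frontier.front();
        frontier.pop();
        if (x < 0 || y < 0 || x >= w || y >= h) continue;           // outside the image
        if (labels[idx(x, y)] != 0 || edges[idx(x, y)] == 255) continue;  // visited or boundary
        labels[idx(x, y)] = label;
        ++count;
        frontier.push({x + 1, y});
        frontier.push({x - 1, y});
        frontier.push({x, y + 1});
        frontier.push({x, y - 1});
    }
    return count;   // keep the segment only if count >= 0.001 * w * h
}
```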

From left to right: original image, edge image, sample overlaid segments and sample segments.
Original image: Fantastic Four Vol 1 59 (1967). Art by Jack Kirby. Copyright © 1967 Marvel Comics.

Machine Learning

Classifiers

The task of classification is to identify decision boundaries separating all of the classifications within the data set. Such data sets can be linearly or non-linearly separable, and as a result, classifiers were developed to solve the linear case and then adapted to deal with the more complicated non-linear case. While there are a number of classifiers, only the K-Nearest Neighbor and Support Vector Machine classifiers were researched and implemented in this project.

K-Nearest Neighbor

The K-Nearest Neighbor classifier is an online classifier which operates under the assumption that a yet to be classified vector is most likely to be the same classification as those k training vectors which are closest to the vector based on a distance measure, d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}.

Distance can be measured in a variety of ways for arbitrary vectors, \vec{x}, \vec{y} \in \mathbb{R}^n, residing in some real space. The most common of which are specialized cases of the Minkowski distance.

\displaystyle d_{p}(\vec{x},\vec{y}) = \left ( \sum_{i = 0}^{n} \lvert \vec{x}_{i} - \vec{y}_{i} \rvert^{p} \right )^{ \frac{1}{p} }

The Manhattan distance, d_{1}(\vec{x},\vec{y}), yields the distance traveled along a grid between two vectors (hence a name in reference to the New York City borough). The Euclidean distance, d_{2}(\vec{x}, \vec{y}), gives the distance between the vectors in the usual familiar sense. The last specialized case considered is the Chebyshev distance, d_{\infty}(\vec{x},\vec{y}), which gives the maximum distance between any one dimension of the two vectors.

Two factors affect the efficacy of the algorithm. The first is the dimension of the data, n, and the second is the size of the training data set, N. As the training data set increases with size, there are more vectors which a test vector must be compared against. As a result, an efficient means of searching the training set must be used to yield satisfactory performance. This can be achieved by using kd-Trees which give \mathcal{O}(\log N) search performance or branch and bound methods giving similar performance. As the dimensionality of the dataset increases, the efficacy of kd-Trees diminishes to a near linear search of the training data set due to the “curse of dimensionality.”

From left to right: Test point (green triangle) near two categories, identification of nearest neighbors, and resolved category.
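
A brute-force version of the classifier using the Euclidean distance d_2 is sketched below; the kd-tree and branch and bound accelerations mentioned above are omitted, and k is assumed to be no larger than the training set.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Example { std::vector<double> x; int label; };

// Classify a query vector by majority vote among its k nearest neighbors.
int knnClassify(const std::vector<Example>& training, const std::vector<double>& query, int k) {
    std::vector<std::pair<double, int>> scored;   // (distance, label)
    for (const Example& e : training) {
        double d2 = 0.0;
        for (std::size_t i = 0; i < query.size(); ++i) {
            double diff = e.x[i] - query[i];
            d2 += diff * diff;
        }
        scored.push_back({std::sqrt(d2), e.label});
    }
    // Only the k smallest distances need to be ordered.
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end());

    std::map<int, int> votes;
    for (int i = 0; i < k; ++i) ++votes[scored[i].second];
    return std::max_element(votes.begin(), votes.end(),
                            [](const auto& a, const auto& b) { return a.second < b.second; })->first;
}
```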

Support Vector Machine
Formulation

The Support Vector Machine classifier is an offline linear, binary classifier which operates under the assumption that a training set, (\vec{x}, y)^{(i)} \in \mathcal{D}, consists of linearly separable classifications, y \in \lbrace -1, +1 \rbrace , of data, \vec{x} \in \mathbb{R}^n, by some optimal hyperplane of the form \langle \vec{w}, \vec{x} \rangle + b = 0. Where \langle \cdot, \cdot \rangle is the inner product, \vec{w} \in \mathbb{R}^n and b \in \mathbb{R}. When \langle \vec{w}, \vec{x} \rangle + b \ge 1, then the classification +1 is presented and when \langle \vec{w}, \vec{x} \rangle + b \le -1, the classification -1 is presented.

From left to right: Training data of two categories, optimally separating hyperplane, and resolved categories in \mathbb{R}^2.

The hyperplane is padded by two hyperplanes separated by an equal distance to the nearest training examples of each classification. The span between the supporting hyperplanes is the margin. The goal then is to pick a hyperplane which provides the largest margin between the two separable classifications. The margin between the supporting hyperplanes is given by \displaystyle \frac{2}{\lVert \vec{w} \rVert}. Given the demarcation criteria, the maximum margin will also be subject to the constraint that all training examples satisfy y^{(i)} (\langle \vec{w}, \vec{x}^{(i)} \rangle + b) - 1 \ge 0. As a result of the objective function and accompanying linear constraint, the problem is stated in terms of its native primal Quadratic Programming form.

\min \mathcal{L}_P(\vec{w}) = \frac{1}{2} \langle \vec{w}, \vec{w} \rangle subject to y^{(i)} (\langle \vec{w}, \vec{x}^{(i)} \rangle + b) - 1 \ge 0 \forall (\vec{x}, y)^{(i)} \in \mathcal{D}

To find the optimal parameters, it is easier to translate the problem into a dual form by applying the technique of Lagrange Multipliers. The technique takes an objective function, f, and constraint functions, g^{(i)}, and yields a new function \mathcal{L}(\vec{x}, \vec{\alpha}) = f(\vec{x}) + \sum \vec{\alpha}_i g(\vec{x})^{(i)} to be optimized subject to the added constraint \vec{\alpha}_i \ge 0 \forall i.

\displaystyle \max_{\vec{\alpha}} \mathcal{L}(\vec{w}, b, \vec{\alpha}) = \frac{1}{2} \langle \vec{w}, \vec{w} \rangle - \sum_{i=0}^{\lvert \mathcal{D} \rvert-1} \vec{\alpha}_{i} (y^{(i)} (\langle \vec{w}, \vec{x}^{(i)} \rangle + b) - 1) \vec{\alpha} \in \mathbb{R}^{\lvert \mathcal{D} \rvert } subject to \vec{\alpha}_{i} \ge 0 \forall i

The next step is to differentiate the objective function with respect to the parameters to determine the optimal solution. Since the function is concave, setting the derivatives to zero yields the desired maximum.

\displaystyle \frac{\partial \mathcal{L}}{\partial b} = 0 \implies \sum_{i=0}^{\lvert \mathcal{D} \rvert-1} \vec{\alpha}_i y^{(i)} = 0 \displaystyle \frac{\partial \mathcal{L}}{\partial \vec{w}} = 0 \implies \vec{w} = \sum_{i=0}^{\lvert \mathcal{D} \rvert-1} \vec{\alpha}_i y^{(i)} \vec{x}^{(i)}

As a result the dual problem can be written as the following:

\displaystyle \max \mathcal{L}_D(\vec{\alpha}) = \frac{1}{2} \sum_{i=0}^{\lvert \mathcal{D} \rvert-1} \sum_{j=0}^{\lvert \mathcal{D} \rvert-1} \vec{\alpha}_i \vec{\alpha}_j y^{(i)} y^{(j)} \langle \vec{x}^{(i)}, \vec{x}^{(j)} \rangle - \sum_{k=0}^{\lvert \mathcal{D} \rvert-1} \vec{\alpha}_k subject to \vec{\alpha}_{i} \ge 0 \forall i, \displaystyle \sum_{j=0}^{\lvert \mathcal{D} \rvert-1} \vec{\alpha}_j y^{(j)} = 0

Handling of non-linearly separable data

In the event that the data is not linearly separable, then an additional parameter, C, is added as a penalty factor for those values that reside on the wrong side of the hyperplane. The derivation for the quadratic program is identical to the one presented above with the exception that the Lagrange multipliers now have an upper boundary 0 \le \vec{\alpha}_i \le C \forall i.

Non-linear classification

By way of Mercer’s Theorem, the linear Support Vector Machine can be modified to allow for non-linear classification through the introduction of symmetric, positive semidefinite kernel functions, \Phi : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}. The idea is that if the data is not linearly separable in its present dimensional space, then by mapping it to a higher dimensional space the data may become linearly separable by some higher dimensional hyperplane. The benefit of a kernel function is that the higher dimensional vector need not be computed explicitly. This “kernel trick” allows for all inner products in the dual representation to be substituted with a kernel.

\displaystyle \max \mathcal{L}_D(\vec{\alpha}) = \frac{1}{2} \sum_{i=0}^{\lvert \mathcal{D} \rvert-1} \sum_{j=0}^{\lvert \mathcal{D} \rvert-1} \vec{\alpha}_i \vec{\alpha}_j y^{(i)} y^{(j)} \Phi(\vec{x}^{(i)}, \vec{x}^{(j)}) - \sum_{k=0}^{\lvert \mathcal{D} \rvert-1} \vec{\alpha}_k subject to 0 \le \vec{\alpha}_{i} \le C \forall i, \displaystyle \sum_{j=0}^{\lvert \mathcal{D} \rvert-1} \vec{\alpha}_j y^{(j)} = 0

And the decision hyperplane function then becomes:

\displaystyle f(\vec{x}) = \sum_{i=0}^{\lvert \mathcal{D} \rvert-1} \alpha_i y^{(i)} \Phi (\vec{x}^{(i)}, \vec{x}) + b

The following are some typical kernels:

  • Linear – \Phi(\vec{x}, \vec{y}) = \langle \vec{x}, \vec{y} \rangle
  • Polynomial – \Phi(\vec{x}, \vec{y}) = (\gamma \langle \vec{x}, \vec{y} \rangle + r)^d \gamma > 0
  • Radial basis function – \Phi(\vec{x}, \vec{y}) = \exp{-\gamma \langle \vec{x} - \vec{y}, \vec{x} - \vec{y} \rangle} \gamma > 0
  • Sigmoid – \Phi(\vec{x}, \vec{y}) = \tanh (\gamma \langle \vec{x}, \vec{y} \rangle + r)

From a practical point of view, only the linear and radial basis function kernels from this list should be considered since the polynomial kernel has too many parameters to optimize and the sigmoid kernel does not satisfy the positive semidefinite kernel matrix requirement of Mercer’s Theorem.

Algorithmic details

The Support Vector Machine classifier can be implemented using a quadratic programming solver or by incremental descent algorithms. Both methods work, but are difficult to implement and expensive to procure. An alternative is the Sequential Minimal Optimization algorithm developed by John Platt at Microsoft Research. The algorithm works by analytically solving the dual problem for the case of two training examples, then iterating over all of the Lagrange multipliers and verifying that the constraints are satisfied. For those that are not, the algorithm computes new Lagrange multiplier values. The full details of the algorithm can be found in Platt’s paper.

The time complexity of the algorithm is quadratic with respect to the number of training samples and support vectors \mathcal{O}(N^2).

The time complexity of evaluating the decision function is linear with respect to the number of support vectors \mathcal{O}(N).

Multiclass Classification

The classification methods presented in the previous section are binary classifiers. They can be extended to multiple classifications by employing a one-vs-all or all-vs-all approach. In the former, a single classification is separated from the remaining classifications to produce N classifiers for the N classifications. Each classifier is then used to evaluate a vector, and the classifier with the highest confidence is then used to declare the classification.

In the latter, a single classification is compared individually to each other classification resulting in \frac{N(N - 1)}{2} classifiers. All of the classifiers are then evaluated against the test vector and the classification with the greatest consensus from the classifiers is declared the classification of the test vector.
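
The all-vs-all vote counting can be sketched as follows, with each pairwise classifier represented as a hypothetical callable that returns the winning class id.

```cpp
#include <functional>
#include <map>
#include <vector>

// Hypothetical representation of a trained pairwise classifier.
struct PairwiseClassifier {
    int classA, classB;
    std::function<int(const std::vector<double>&)> decide;  // returns classA or classB
};

// Evaluate every pairwise classifier and return the class with the most votes.
int allVsAllClassify(const std::vector<PairwiseClassifier>& classifiers,
                     const std::vector<double>& x) {
    std::map<int, int> votes;
    for (const auto& c : classifiers) ++votes[c.decide(x)];
    int best = -1, bestVotes = -1;
    for (const auto& [label, count] : votes)
        if (count > bestVotes) { best = label; bestVotes = count; }
    return best;
}
```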

Both methods have their place. The benefit of a one-vs-all approach is that there are fewer classifiers to maintain. However, training a single classifier on a complete data set is time consuming and can give deceptive performance measures. All-vs-all does result in more classifiers, but it also provides for faster training which can be easily parallelized on a single machine and distributed to machines on a network.

Classifier Evaluation

Individual classifiers are evaluated by training the classifier against a data set and then determining how many correct and incorrect classifications were produced. This evaluation produces a confusion matrix.

                    Predicted Positive          Predicted Negative          Total
Actual Positive     (TP) True Positive          (FN) False Negative         (AP) Actual Positives
Actual Negative     (FP) False Positive         (TN) True Negative          (AN) Actual Negatives
Total               (PP) Predicted Positives    (PN) Predicted Negatives    (N) Examples

Confusion matrix definition and associated terms.

The confusion matrix is used to calculate a number of values which are used to evaluate the performance of the classifier. The first of which is the accuracy and error of the classifier. Accuracy measures the number of instances where the actual and predicted classifications matched up and the error for when they do not.

\displaystyle \text{Accuracy} = \frac{TP + TN}{N} \displaystyle \text{Error} = \frac{FP + FN}{N}

Since we should expect to get different results each time we evaluate a classifier, the values that we obtain above are sample estimates of the true values that are expected. Given enough trials and measurements, it is possible to determine empirically what the true values actually are. However, this is time consuming, and it is instead easier to use confidence intervals to determine what interval of values a measurement is most likely to fall into.

Training and Testing

Each of the classifiers presented has some number of parameters that must be determined. The parameters can be selected by having some prior knowledge or by exploring the parameter space and determining which parameters yield optimal performance. This is done by performing a simple grid search over the parameter space, evaluating the error at each grid location, and attempting to minimize it.

K-folds cross-validation is used at each grid location to produce a reliable measure of the error. The idea is that a data set is split into k disjoint sets. The first set is used as a validation set and the remaining k - 1 sets are used in unison as the training data set for the classifier. This process is done on the next set and so on until all k sets have been used as a validation set.
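
A minimal sketch of the procedure over an abstract data set follows; trainAndScore is a hypothetical callback that trains a classifier on the training folds and returns its validation error.

```cpp
#include <vector>

// Split the data into k folds; each fold serves once as the validation set
// while the remaining folds train the classifier. Returns the mean error.
template <typename Sample, typename TrainAndScore>
double kFoldError(const std::vector<Sample>& data, int k, TrainAndScore trainAndScore) {
    double totalError = 0.0;
    std::size_t foldSize = data.size() / k;
    for (int fold = 0; fold < k; ++fold) {
        std::vector<Sample> train, validate;
        for (std::size_t i = 0; i < data.size(); ++i) {
            bool inFold = i >= fold * foldSize && i < (fold + 1) * foldSize;
            (inFold ? validate : train).push_back(data[i]);
        }
        totalError += trainAndScore(train, validate);
    }
    return totalError / k;
}
```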

System

Implementation

The system was implemented in C# 4.0 on top of the Microsoft .NET Framework. The user interface was written by hand using the WinForms library. No other third-party libraries or frameworks were used. When possible, all algorithms were parallelized to take advantage of multi-core capabilities to improve processing times.

Summary

The system consists of two modes of operation: training and production. In training, a human classifier labels image segments with an appropriate classification. New image segments are then taken into consideration during the training of machine learning algorithms. Those algorithms producing the lowest error for a given classification are then used in production mode. During production, a user submits an image and each image segment is then evaluated against the available classifiers. Those image segments are then presented to the user with the most likely classification. These two modes along with their workflows and components are illustrated in the following diagram.

System inputs, components and outputs used for training and production workflows.

Training Mode

Data Set Construction

The user interface of the system allows users to add an image to a local data set of images. Once added, the image is then processed to yield image segments. The user can then label an image segment by editing the segment and moving on to the next image segment. This allows for easy and efficient human classification of the data. If the user does not wish to keep the image, he or she may remove the image from the data set as well.

Data set construction tab.
Input image: Superman 14 (1942). Art by Fred Ray. Copyright © 1942 DC Comics, Inc.

Data Set Cleaning

During the construction phase, errors may be introduced into the data set, typically in the form of typos or forgetting which segment was currently being edited. The data set is cleaned by listing out all available classifications and presenting the user with all available segments associated with that classification. The user can then review the image segment as it was identified in the source image. If the user does not wish to keep the classification, he or she may remove it from the data set as well.

Data set cleaning tab.
Input image: Action Comics 62 (1943). Art by Jack Burnley. Copyright © 1943 DC Comics, Inc.

Data Set Statistics

The data set consists of 496 comic book covers pulled from the Cover Browser database of comic book covers. The first 62 consecutively published comic book covers were used from Action Comics, Amazing Spider-man, Batman, Captain America, Daredevil, Detective Comics, Superman, and Wonder Woman, and then processed by the image processing subsystem, yielding 24,369 image segments. 11,463 of these segments represented classifiable segments, which were then labeled by hand over the course of two weeks; the remaining segments were then discarded.

Data set distribution: predominantly clothing and text, with equal representation from all comic book series sources.

In total, there were 239 classifications identified in the data set among 18 categories, with the text, clothing, geography, and transportation categories accounting for 90% of the data set. Since the majority of classifications were incidental, only those classifications having 50 or more image segments were considered by the application, leaving a total of 38 classifications.

Classifier Evaluation

For the 38 classifications meeting the minimum criteria for classification, the K-Nearest Neighbor approach worked well in distinguishing text classifications from other classifications, and between intra-text classifications, for both all-vs-all and one-vs-all schemes.

All-vs-All K-Nearest Neighbor Performance. One-vs-All K-Nearest Neighbor Performance.

The Support Vector Machine approach presented unremarkable results for both all-vs-all and one-vs-all methods. In the former, only a few pairings resulted in acceptable error rates, whereas the latter presented only a couple of acceptable error rates.

All-vs-All Support Vector Machine Performance. One-vs-All Support Vector Machine Performance.

For both classification methods presented, the all-vs-all method yielded superior results to the one-vs-all method. In comparing the two classifier methods, the K-Nearest Neighbor approach seems to have done better than the Support Vector Machine approach, contrary to what was expected from the literature. Both classifier methods are used in production mode.

Production Mode

Production mode allows the end user to add an image to the data set and then review the most likely classifications produced by evaluating each image segment against the available set of classifiers. The end user is then expected to review each segment and accept or reject the suggested classification. Aside from this additional functionality, production mode is nearly identical in functionality to training mode.

Production data set construction tab.
Input image: Detective Comics 36 (1940). Art by Bob Kane. Copyright © 1940 DC Comics, Inc.

Conclusions

The time spent on this project was well worth it. I met the objectives that I laid out at the beginning of the project and now have a better understanding of the image processing algorithms and machine learning concepts from both a theoretical and practical point of view.

Future Work

Segmentation

One issue with the existing implementation is that it over-segments the image. Ideally, fewer segments would be produced, more closely aligned with their conceptual classification. There are a number of popular alternatives to the approach taken, such as level set methods, which should be investigated further.

Classification

The approach taken, mapping scaled versions of the image segments to a 2^{10}-dimensional space, is simple to implement, but it did not serve the classification process well. Alternative mappings, such as histogram models, should be evaluated in the future to decrease classification times and to determine whether classification error rates can be reduced.
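As a rough sketch of what such a histogram mapping could look like (illustrative C# using System.Drawing; the names here are hypothetical and not the project’s actual types), each segment would be reduced to a short, size-independent vector of intensity-bin frequencies:

using System.Drawing;

// Illustrative only: map an image segment to a normalized grayscale histogram so
// that segments of different sizes produce comparable, low-dimensional feature vectors.
public static class HistogramFeature {
    public static double[] Compute(Bitmap segment, int bins) {
        double[] histogram = new double[bins];

        for (int y = 0; y < segment.Height; y++) {
            for (int x = 0; x < segment.Width; x++) {
                Color c = segment.GetPixel(x, y);

                // Luminance-weighted grayscale value in [0, 255].
                double gray = 0.299 * c.R + 0.587 * c.G + 0.114 * c.B;

                histogram[(int)(gray * bins / 256.0)]++;
            }
        }

        // Normalize by pixel count so the vector sums to one.
        double total = segment.Width * segment.Height;
        for (int i = 0; i < bins; i++)
            histogram[i] /= total;

        return histogram;
    }
}

A vector of, say, 32 bins is far smaller than the 2^{10} pixel mapping, which should cut down the K-Nearest Neighbor distance computations and may help with Support Vector Machine training times as well.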

System User Interface

While it was disappointing to have spent so much time building a data set only to have to limit what was considered, doing so pushed me to build a user interface that was fast and easy to use. The application can certainly be developed further and adapted to allow other data sets to be constructed, additional image segmentation methods to be added, and additional classifications to be evaluated.

System Scalability

The system is currently limited to a single machine. To grow and handle more classifications, it would need to be modified to run on multiple machines, a web-based user interface would need to be developed, and a capable database would be required to handle the massive amount of data needed to support a data set on the scale of the complete Cover Browser database or those of similar sites (e.g., 450,000 comic book covers scaled linearly would require 546 GiB of storage), not to mention the data center considerations for overall system availability and scalability.
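For a sense of what that estimate implies per item: 546 GiB is about 559,104 MiB, and \frac{559{,}104}{450{,}000} \approx 1.24, so the linear scaling assumes roughly 1.24 MiB of segment and feature data per cover.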

References

Aly, Mohamed. Survey on Multiclass Classification Methods. [pdf] Rep. Oct. 2011. Caltech. 24 Aug. 2012.

Asmar, Nakhle H. Partial Differential Equations: With Fourier Series and Boundary Value Problems. 2nd ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2004. Print.

Bousquet, Olivier, Stephane Boucheron, and Gabor Lugosi. “Introduction to Statistical Learning Theory.” [pdf] Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2-14, 2003, Tübingen, Germany, August 4-16, 2003 (2004): 169-207. 7 July 2012.

Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization [pdf]. N.p.: Cambridge UP, 2004. Web. 28 June 2012.

Burden, Richard L., and J. Douglas. Faires. Numerical Analysis. 8th ed. Belmont, CA: Thomson Brooks/Cole, 2005. Print.

Caruana, Rich, Nikos Karampatziakis, and Ainur Yessenalina. “An Empirical Evaluation of Supervised Learning in High Dimensions.” [pdf] ICML ’08 Proceedings of the 25th international conference on Machine learning (2008): 96-103. 2 May 2008. 6 June 2012.

Fukunaga, Keinosuke, and Patrenahalli M. Narendra. “A Branch and Bound Algorithm for Computing k-Nearest Neighbors.” [pdf] IEEE Transactions on Computers (1975): 750-53. 9 Jan. 2004. 27 Aug. 2012.

Gerlach, U. H. Linear Mathematics in Infinite Dimensions: Signals, Boundary Value Problems and Special Functions. Beta ed. 09 Dec. 2010. Web. 29 June 2012.

Glynn, Earl F. “Fourier Analysis and Image Processing.” [pdf] Lecture. Bioinformatics Weekly Seminar. 14 Feb. 2007. Web. 29 May 2012.

Gunn, Steve R. “Support Vector Machines for Classification and Regression” [pdf]. Working paper. 10 May 1998. University of Southampton. 6 June 2012.

Hlavac, Vaclav. “Fourier Transform, in 1D and in 2D.” [pdf] Lecture. Czech Technical University in Prague, 6 Mar. 2012. Web. 30 May 2012.

Hsu, Chih-Wei, Chih-Chung Chang, and Chih-Jen Lin. A Practical Guide to Support Vector Classification. [pdf] Tech. 18 May 2010. National Taiwan University. 6 June 2012.

Kibriya, Ashraf M. and Eibe Frank. “An empirical comparison of exact nearest neighbour algorithms.” [pdf] Proc 11th European Conference on Principles and Practice of Knowledge Discovery in Databases. (2007): 140-51. 27 Aug. 2012.

Marshall, A. D. “Vision Systems.” Vision Systems. Web. 29 May 2012.

Panigrahy, Rina. Nearest Neighbor Search using Kd-trees. [pdf] Tech. 4 Dec. 2006. Stanford University. 27 Aug. 2012.

Pantic, Maja. “Lecture 11-12: Evaluating Hypotheses.” [pdf] Imperial College London. 27 Aug. 2012.

Platt, John C. “Fast Training of Support Vector Machines Using Sequential Minimal Optimization.” [pdf] Advances in Kernel Methods – Support Vector Learning (1999): 185-208. Microsoft Research. Web. 29 June 2012.

Sonka, Milan, Vaclav Hlavac, and Roger Boyle. Image Processing, Analysis, and Machine Vision. 2nd ed. CL-Engineering, 1998. 21 Aug. 2000. Web. 29 May 2012.

Szeliski, Richard. Computer vision: Algorithms and applications. London: Springer, 2011. Print.

Tan, Pang-Ning, Michael Steinbach, and Vipin Kumar. “Classification: Basic Concepts, Decision Trees, and Model Evaluation.” [pdf] Introduction to Data Mining. Addison-Wesley, 2005. 145-205. 24 Aug. 2012.

Vajda, Steven. Mathematical programming. Mineola, NY: Dover Publications, 2009. Print.

Welling, Max. “Support Vector Machines.” [pdf] 27 Jan. 2005. University of Toronto. 28 June 2012.

Weston, Jason. “Support Vector Machine (and Statistical Learning Theory) Tutorial.” [pdf] Columbia University, New York City. 7 Nov. 2007. 28 June 2012.

Zhang, Hui, Jason E. Fritts, and Sally A. Goldman. “Image Segmentation Evaluation: A Survey of Unsupervised Methods.” [pdf] Computer Vision and Image Understanding 110 (2008): 260-80. 24 Aug. 2012.

Copyright

Images in this post are used under §107 (Limitations on exclusive rights: Fair use) of Chapter 1 (Subject Matter and Scope of Copyright) of the Copyright Act of 1976, Title 17 of the United States Code.

Menger Sponge in C++ using OpenGL

with one comment

This past summer I was going through some old projects and came across a Menger Sponge visualizer that I wrote back in college. A Menger Sponge is a simple fractal that has infinite surface area and encloses zero volume. The sponge is constructed in successive iterations, and the first four iterations are rendered in the video below.

The sponge starts as a single cube that is segmented into twenty-seven equally sized cubes. The center cube of each face and the center cube of the parent are then discarded, and the process is applied again to each of the remaining twenty cubes. Visually, the process looks like the following:

The geometry of the process is straightforward. Starting with a cube’s origin, \vec{o}, and edge length, e, each child’s attributes can be calculated. Each child’s edge length is given by e_{Child} = \frac{1}{3} e_{Parent}. Each child’s origin is given by \vec{o}_{Child} = \vec{o}_{Parent} + e_{Child} \vec{c}_{Child}, where the constant \vec{c}_{Child} represents the child’s position relative to its parent (e.g., (1, -1, 0)).
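For example, with a parent cube at the origin with edge length e_{Parent} = 8 (the edge length used in main below) and a child at relative position \vec{c}_{Child} = (1, -1, 0), the child has edge length e_{Child} = \frac{8}{3} and origin \vec{o}_{Child} = (0, 0, 0) + \frac{8}{3} (1, -1, 0) = (\frac{8}{3}, -\frac{8}{3}, 0), which is exactly the offset ActAsContainer computes with ne = e / 3.0.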

The following implementation isn’t particularly well written, but it accomplishes the desired end result. The point and Cube classes implement the logic outlined above. A Cube can be thought of as a tree structure that is generated upon instantiation. The visualize() method emits the OpenGL commands that produce the faces of the leaf cubes.

#include <GL/glut.h>

#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

//=================================================================================
//=================================================================================
class point
{
public:
	point(GLfloat x, GLfloat y, GLfloat z, point* ref = NULL);
	void visualize();

	GLfloat x,y,z;
};

point::point(GLfloat x, GLfloat y, GLfloat z, point* ref)
{
	this->x = x;
	this->y = y;
	this->z = z;

	if(ref != NULL)
	{
		this->x += ref->x;
		this->y += ref->y;
		this->z += ref->z;
	}
}

//=================================================================================
//=================================================================================

class Cube
{
public:
	Cube(point* origin, GLfloat edgelength, GLfloat depth);
	~Cube();

	void visualize();

private:
	void MakeFace(int i, int j, int k, int l);
	void ActAsContainer(point* o, GLfloat e, GLfloat d);
	void ActAsCube(point* o, GLfloat e);

	point** PointCloud;
	Cube** SubCubes;
};

Cube::Cube(point* origin, GLfloat edgelength, GLfloat depth)
{
	if(depth <= 1.0)
	{
		ActAsCube(origin, edgelength);
	} else {
		ActAsContainer(origin, edgelength, depth);
	}
}

Cube::~Cube()
{
	int i;

	if(PointCloud != NULL)
	{
		for(i = 0; i < 8; i++)
			delete PointCloud[i];
		delete[] PointCloud;
	}

	if(SubCubes != NULL)
	{
		for(i = 0; i < 20; i++)
			delete SubCubes[i];
		delete[] SubCubes;
	}
}

void Cube::ActAsCube(point* o, GLfloat e)
{
	GLfloat ne = e / 2.0;

	PointCloud = new point*[8];		// The eight corner coordinates of this leaf cube
	SubCubes = NULL;

	PointCloud[0] = new point( ne,  ne,  ne, o);	// net
	PointCloud[1] = new point( ne, -ne,  ne, o);	// set
	PointCloud[2] = new point(-ne,  ne,  ne, o);	// nwt
	PointCloud[3] = new point(-ne, -ne,  ne, o);	// swt
	PointCloud[4] = new point( ne,  ne, -ne, o);	// neb
	PointCloud[5] = new point( ne, -ne, -ne, o);	// seb
	PointCloud[6] = new point(-ne,  ne, -ne, o);	// nwb
	PointCloud[7] = new point(-ne, -ne, -ne, o);	// swb
}

void Cube::ActAsContainer(point* o, GLfloat e, GLfloat d)
{
	GLfloat ne = e / 3.0;

	SubCubes = new Cube*[20];	// 27 children minus the 6 face centers and the body center leaves 20 sub-cubes
	PointCloud = NULL;

	SubCubes[0] = new Cube(new point(-ne,  ne,  ne, o), ne, d-1.0);
	SubCubes[1] = new Cube(new point(0.0,  ne,  ne, o), ne, d-1.0);
	SubCubes[2] = new Cube(new point( ne,  ne,  ne, o), ne, d-1.0);
	SubCubes[3] = new Cube(new point( ne, 0.0,  ne, o), ne, d-1.0);
	SubCubes[4] = new Cube(new point( ne, -ne,  ne, o), ne, d-1.0);
	SubCubes[5] = new Cube(new point(0.0, -ne,  ne, o), ne, d-1.0);
	SubCubes[6] = new Cube(new point(-ne, -ne,  ne, o), ne, d-1.0);
	SubCubes[7] = new Cube(new point(-ne, 0.0,  ne, o), ne, d-1.0);
	
	SubCubes[8] = new Cube(new point( ne,  ne,  0.0, o), ne, d-1.0);
	SubCubes[9] = new Cube(new point( ne, -ne,  0.0, o), ne, d-1.0);
	SubCubes[10] = new Cube(new point(-ne, ne,  0.0, o), ne, d-1.0);
	SubCubes[11] = new Cube(new point(-ne, -ne,  0.0, o), ne, d-1.0);
	
	SubCubes[12] = new Cube(new point(-ne,  ne, -ne, o), ne, d-1.0);
	SubCubes[13] = new Cube(new point(0.0,  ne, -ne, o), ne, d-1.0);
	SubCubes[14] = new Cube(new point( ne,  ne, -ne, o), ne, d-1.0);
	SubCubes[15] = new Cube(new point( ne, 0.0, -ne, o), ne, d-1.0);
	SubCubes[16] = new Cube(new point( ne, -ne, -ne, o), ne, d-1.0);
	SubCubes[17] = new Cube(new point(0.0, -ne, -ne, o), ne, d-1.0);
	SubCubes[18] = new Cube(new point(-ne, -ne, -ne, o), ne, d-1.0);
	SubCubes[19] = new Cube(new point(-ne, 0.0, -ne, o), ne, d-1.0);
}

void Cube::MakeFace(int i, int j, int k, int l)
{
		glVertex3f(PointCloud[i]->x, PointCloud[i]->y, PointCloud[i]->z);
		glVertex3f(PointCloud[j]->x, PointCloud[j]->y, PointCloud[j]->z);
		glVertex3f(PointCloud[k]->x, PointCloud[k]->y, PointCloud[k]->z);
		glVertex3f(PointCloud[l]->x, PointCloud[l]->y, PointCloud[l]->z);

}

void Cube::visualize()
{
	int i;

	if(PointCloud != NULL)
	{
		glBegin(GL_QUADS);
			glColor3f(1.0,0.0,0.0);// top
			MakeFace(0,2,3,1);
			glColor3f(0.0,1.0,1.0);//bottom
			MakeFace(4,6,7,5);
			
			glColor3f(0.0,1.0,0.0);// north
			MakeFace(0,2,6,4);
			glColor3f(1.0,0.0,1.0);// south
			MakeFace(1,3,7,5);
			
			glColor3f(0.0,0.0,1.0);//east
			MakeFace(0,4,5,1);
			glColor3f(1.0,1.0,0.0);// west
			MakeFace(2,6,7,3);
		glEnd();
	}

	if(SubCubes != NULL)
	{
		for(i = 0; i < 20; i++)
		{
			SubCubes[i]->visualize();
		}
	}
}

The rest of the program is run-of-the-mill OpenGL boilerplate. The application takes an argument dictating what order of sponge it should produce, sets up the camera, and positions the sponge at the origin. The view transform stays fixed while the model is rotated a little further on each display(); on idle(), a redisplay message is posted so the scene is continuously redrawn, giving the effect that the sponge is spinning.

//=================================================================================
//=================================================================================
Cube* MengerCube;

void idle()
{
	glutPostRedisplay();
}

void display()
{
	static GLfloat rtri = 0.0;
	
	glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
	glMatrixMode(GL_MODELVIEW);
	glLoadIdentity();
	gluLookAt(1.0,1.0,1.0, 0.0,0.0,0.0,0.0,1.0,0.0);
	glRotatef((rtri+=0.932), 1.0, 0.5, -1.0);

	MengerCube->visualize();

	glutSwapBuffers();
}

void reshape(int w, int h)
{
	glViewport(0,0,w,h);
	glMatrixMode(GL_PROJECTION);
	glLoadIdentity();
	glOrtho(-8.0, 8.0,-8.0, 8.0,-8.0, 8.0);
}

void init()
{
	glShadeModel(GL_SMOOTH);
	glClearColor(0.0, 0.0, 0.0, 0.0);
	glClearDepth(1.0f);
	glEnable(GL_DEPTH_TEST);
	glColor3f(1.0, 1.0, 1.0);
}

GLfloat getDepth(char* depth)
{
	int k = atoi(depth);
	if(k <= 1) return 1.0;
	else if (k>= 5) return 5.0;
	else return (GLfloat) k;
}

int main(int argc, char* argv[])
{
	GLfloat depth;
	bool viewAsPointCloud = false;
	point origin(0.0, 0.0, 0.0);

	printf("%d\n",argc);

	switch(argc)
	{
	case 2:
		depth = getDepth(argv[1]);
		break;
	default:
		depth = 2.0;
		break;
	}

	MengerCube = new Cube(&origin, 8.0, depth);

	glutInit(&argc, argv);
	glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
	glutInitWindowSize(500,500);
	glutInitWindowPosition(0,0);
	glutCreateWindow(*argv);
	glutReshapeFunc(reshape);
	glutDisplayFunc(display);
	glutIdleFunc(idle);
	init();
	glutMainLoop();

	delete MengerCube;
}

Written by lewellen

2012-02-01 at 8:00 am

Tree Drawing: Force-based Algorithm

I’ve written a couple times on different data visualization techniques for viewing hierarchical information. One method that I haven’t discussed is the force-based approach. After digging around a bit, it looks like this formulation draws on Birbil and Fang’s 2003 publication [pdf], “An Electromagnetism-like Mechanism for Global Optimization” (force-directed layouts themselves have a longer history in graph drawing). In terms of tree drawing, the idea is fairly simple: treat the tree as a dynamical system composed of springs and point charges, apply the appropriate physical laws, and over time you will get a nicely laid out tree. My implementation in the clip below demonstrates this approach.


Let’s think about why we want a dynamical system. An idealized tree drawing is aesthetically pleasing and compact, with uniformly distributed nodes and no crossing edges. Under the proposed system we can’t guarantee these criteria, but we can get “close enough” to produce something that is both nice on paper and sensible in practice. Treating the nodes as point charges makes them repel one another. The magnitude of the repulsion is determined by Coulomb’s Law, which states that the force applied to a point charge by another point charge follows an inverse square law. If all we had was Coulomb’s Law, the nodes would fly away from one another and eventually come to rest, but they would be spread too far across the screen. To alleviate this, we can treat the edges between nodes as springs. Following Hooke’s Law, the springs tether the nodes together and preserve the compactness criterion; the magnitude of this force is linear in the spring’s displacement from its rest length.

Let’s jump into some quick physics. To determine the net force on a given node, we need the spring forces applied to the node plus all of the point charge forces applied to it. The key to getting this right is making sure that the direction and magnitude of each force are correct. For the direction, let \bold{d}_{i,j} = \bold{x}_{i} - \bold{x}_{j} be the displacement between the i’th and j’th nodes (\bold{x}_{i} being the i’th node’s location), and let \hat{\bold{d}}_{i,j} = \frac{\bold{d}_{i,j}}{\lVert \bold{d}_{i,j} \rVert} be the unit vector giving the direction in which the force is applied. The spring force is \bold{F}_{i, j} = -k (\lVert \bold{d}_{i,j} \rVert - r) \hat{\bold{d}}_{i,j}, where k is the spring constant, r is the length of the spring at rest, and \bold{F}_{i,j} is the force between the i’th and j’th nodes. The point charge force is \bold{F}_{i,j} = k \frac{ Q_{i} Q_{j} }{ \lVert \bold{d}_{i,j} \rVert ^{2} } \hat{\bold{d}}_{i,j}, where Q_{i} is the charge of the i’th node and k is the Coulomb constant. (I default all constants to 1.0 in my implementation.)

With the net force on each node figured out, we now need the velocity and location of each node. Recall from Newton’s laws that force is the product of mass and acceleration, F = ma. Given the acceleration, we can integrate to get the velocity and location each iteration. If a node has present velocity v_{n}, then v_{n+1} = a h + v_{n}; if it has current location p_{n}, then its next location is p_{n+1} = \frac{h^{2}}{2} a + h v_{n} + p_n. After advancing by the time step h, remember to zero out the net force on each node. (In my implementation I chose a time step of 0.1.) One important detail is the need to apply damping to the velocity over time; otherwise the structure continues to move through space, and we run the risk of exceeding the machine’s numeric range and overflowing.

Implementing all of this, you’ll find the time complexity of the algorithm is on the order of O(\lVert V \rVert^2). The square order comes from applying Coulomb’s law. By Newton’s third law (equal and opposite forces), you can reduce this to \frac{\lVert V \rVert^2 - \lVert V \rVert}{2} force evaluations, but for significantly sized V you are still looking at O(\lVert V \rVert^{2}). You could improve on this by using a spatial partitioning data structure (such as an R-tree or one of its variants) and only considering nodes within a certain radius when calculating the Coulomb forces. Testing my implementation (using Lists), I found things get a little sluggish for a real-time render at around 100 nodes on my 2.4 GHz laptop; depending on your domain and implementation, your mileage may vary. Following are some test cases I used in writing my solution.

  • Fan: each node has a number of child nodes equal to its number of siblings minus one.
  • Flat: a tree containing one level of nodes.
  • Complete binary: every node has exactly two children.
  • Random: a randomly generated tree of depth four.

I chose to implement my solution in C# using the Microsoft .NET 3.5 Framework (as a side note, 4.0 was recently released on the 12th of April). I used my custom mathematics library together with the Framework’s WinForms library (in the future I would like to migrate this over to WPF) for displaying the tree and letting the end user manipulate the algorithm’s variables. The display is a vanilla UserControl with setters for the tree data structure, the algorithm to apply to the structure, and the settings for that algorithm. Upon setting the tree, a non-reentrant Timer kicks off and invokes the algorithm below every 50 ms. To get the tree to display correctly, the algorithm calculates the bounding region of the nodes and the control performs a linear map from model space to view space.

The first implementation used the DoubleBuffered property of the UserControl, but I found it ineffective at reducing flicker, so I implemented custom double buffering using the BufferedGraphicsContext class. It’s worth noting that most implementations track the kinetic energy of the system to determine when to terminate the algorithm; I chose not to, as I didn’t find enough value in the additional calculation.
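As a sketch of that custom double buffering (hypothetical member names; the actual control differs), the control allocates a BufferedGraphics surface, draws the frame into its Graphics object, and renders the finished frame in a single blit:

using System;
using System.Drawing;
using System.Windows.Forms;

// Illustrative double buffering with BufferedGraphicsContext inside a UserControl.
public class TreeLayoutControlSketch : UserControl {
    private BufferedGraphics buffer;

    protected override void OnPaint(PaintEventArgs e) {
        if (ClientRectangle.Width <= 0 || ClientRectangle.Height <= 0)
            return;

        if (buffer == null)
            buffer = BufferedGraphicsManager.Current.Allocate(e.Graphics, ClientRectangle);

        buffer.Graphics.Clear(BackColor);
        // ... draw the tree's edges and nodes into buffer.Graphics here ...
        buffer.Render(e.Graphics);
    }

    protected override void OnResize(EventArgs e) {
        base.OnResize(e);

        // Throw the old surface away so it gets reallocated at the new size.
        if (buffer != null) {
            buffer.Dispose();
            buffer = null;
        }
        Invalidate();
    }
}

The layout algorithm that the timer invokes follows.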

using System;
using System.Collections.Generic;
using System.Drawing;
using Library.Mathematics;

namespace ForceBasedTreeLayout {
    public class ForceBasedTreeLayout {
        private LayoutSettings settings;

        public ForceBasedTreeLayout(LayoutSettings settings) {
            this.settings = settings;
        }

        public RectangleF Evaluate(Node root) {
            List<Node> nodes = new List<Node>();
            foreachNode(root, (x) => nodes.Add(x));

            // Apply Hooke's law
            // F = -k x
            foreachNode(root, (parent) => {
                parent.Children.ForEach((child) => {
                    Vector dist = parent.Location - child.Location;
                    Vector restingDist = settings.MinimumSpringLength * dist.Normalize();
                    Vector force = -settings.SpringConstant * (dist - restingDist);

                    parent.Acceleration += force;
                    child.Acceleration -= force;
                });
            });

            // Apply Coulomb's Law
            // F = k Q1 Q2 / d^2 (constants and charges default to 1; the +1 in the
            // denominator below softens the force when nodes are very close together)
            for (int n = 0; n < nodes.Count; n++) {
                for (int m = n + 1; m < nodes.Count; m++) {
                    Vector dist = nodes[n].Location - nodes[m].Location;
                    Vector norm = dist.Normalize();
                    Vector force = new Vector(2, (i) => norm[i] / Math.Pow(dist.Norm() + 1.0, 2.0));

                    nodes[n].Acceleration += force;
                    nodes[m].Acceleration -= force;
                }
            }

            Vector xExtrema = new Vector(2);
            xExtrema[0] = double.MaxValue;
            xExtrema[1] = double.MinValue;

            Vector yExtrema = new Vector(2);
            yExtrema[0] = double.MaxValue;
            yExtrema[1] = double.MinValue;

            // update the locations & velocity && figure out new bounding region
            foreach (Node node in nodes) {
                // p = a0t^2/2 + v0t + p0
                node.Location = (settings.TimeStep * settings.TimeStep * 0.5) * node.Acceleration + (settings.TimeStep) * node.Velocity + node.Location;

                // v = at + v0
                node.Velocity = (settings.TimeStep) * node.Acceleration + node.Velocity;
                node.Velocity = (settings.VelocityDampening) * node.Velocity;
                
                node.Acceleration = new Vector(2, (i) => 0.0);

                xExtrema[0] = Math.Min(xExtrema[0], node.Location[0]);
                xExtrema[1] = Math.Max(xExtrema[1], node.Location[0]);

                yExtrema[0] = Math.Min(yExtrema[0], node.Location[1]);
                yExtrema[1] = Math.Max(yExtrema[1], node.Location[1]);
            }

            RectangleF R = new RectangleF();
            R.X = (float)xExtrema[0];
            R.Y = (float)yExtrema[0];
            R.Width = (float)(xExtrema[1] - xExtrema[0]);
            R.Height = (float)(yExtrema[1] - yExtrema[0]);

            R.X -= R.Width / 2;
            R.Y -= R.Height / 2;
            R.Width *= 2;
            R.Height *= 2;

            return R;
        }

        private void foreachNode(Node root, Action<Node> act) {
            Stack<Node> stack = new Stack<Node>();
            stack.Push(root);
            while (stack.Count > 0) {
                Node node = stack.Pop();
                act(node);
                node.Children.ForEach((x) => stack.Push(x));
            }
        }
    }
}

Edit: 2010-10-21

By popular demand, here is the vector class:

// Delegate type used by the component-wise constructor below.
public delegate double VectorInitializer(int n);

public class Vector {
	private double[] V;

	public int Dimension { get { return V.Length; } }

	public double this[int n] {
		get {
			if (n < 0 || n >= Dimension)
				throw new Exception(string.Format("{0} must be between 0 and {1}", n, Dimension));
			return V[n];
		}
		set {
			if (n < 0 || n >= Dimension)
				throw new Exception(string.Format("{0} must be between 0 and {1}", n, Dimension));
			V[n] = value;
		}
	}

	public Vector() : this(0) { }

	public Vector(int n) { V = new double[n]; }

	public Vector(int n, VectorInitializer initializer) : this(n) {
		for (int i = 0; i < Dimension; i++)
			V[i] = initializer(i);
	}

	public double dot(Vector y) {
		if (Dimension != y.Dimension)
			throw new Exception();
		double d = 0.0;
		for (int n = 0; n < Dimension; n++)
			d += this[n] * y[n];
		return d;
	}

	// Euclidean length and unit vector; both are used by the layout code above.
	public double Norm() {
		return System.Math.Sqrt(dot(this));
	}

	public Vector Normalize() {
		return (1.0 / Norm()) * this;
	}

	public override bool Equals(object obj) {
		Vector x = obj as Vector;
		if (x == null || x.Dimension != Dimension)
			return false;

		for (int n = 0; n < Dimension; n++)
			if (this[n] != x[n])
				return false;
		return true;
	}

	static public Vector operator +(Vector x, Vector y) {
		if (x.Dimension != y.Dimension)
			throw new Exception();

		Vector z = new Vector(x.Dimension);
		for (int n = 0; n < z.Dimension; n++)
			z[n] = x[n] + y[n];
		return z;
	}

	static public Vector operator -(Vector x, Vector y) {
		if (x.Dimension != y.Dimension)
			throw new Exception();

		Vector z = new Vector(x.Dimension);
		for (int n = 0; n < z.Dimension; n++)
			z[n] = x[n] - y[n];
		return z;
	}

	static public double operator *(Vector x, Vector y) {
		return x.dot(y);
	}

	static public Vector operator *(double c, Vector x) {
		Vector y = new Vector(x.Dimension);
		for (int n = 0; n < y.Dimension; n++)
			y[n] = c*x[n];
		return y;
	}
}

Written by lewellen

2010-05-01 at 8:00 am

Tree Drawing: The radial tree

leave a comment »

Although not as interesting as a sunburst diagram, the radial tree view can hold its own against a number of more primitive information visualizations. A radial tree view places the root at the center of the screen and then fans out each child node; each child in turn fans out its own children within a restricted span, and so on until every leaf is reached. The strength of the technique is an easy-to-digest depiction of the structure behind the data in a compact space. A common application is visualizing computer networks. It is worthwhile to examine the algorithm behind the technique because it is an exercise in identifying simplicity.

While in college, I would have approached this problem by trying to identify the location of each node directly in terms of (r, \theta); after all, I want a radial tree view, so it makes sense to sprint out of the gate with a polar system, right? While possible, this is a bad path to head down, as you end up drowning in a sea of extraneous details. It is better to think in terms of (x, y) and then map to (r, \theta). To clarify that position, let’s think about how we’d go about drawing a run-of-the-mill tree view as in the figure below:

A run-of-the-mill tree view.

First some observations:

  • Every node at a given depth lies on the same line.
  • Every child is given an equal share of its parent’s horizontal space, regardless of how much of it the child actually needs.

We can construct a simple recursive definition for drawing the tree from these two facts. Given a node and a region, center the node at the top of the region, carve the region into as many equally wide sub-regions as the node has children (each the same height as the parent’s region minus a layering distance), draw a line from the parent node to each child node, and recurse. Continue doing so until all of the nodes have been drawn. All that remains is mapping this tree to the radial tree view below:

The resulting radial tree view.

To achieve this last step, we map each node at (x, y) to the point \hat{C} + r \cdot (\cos \theta, \sin \theta), where \hat{C} is the center of the display area. The radius r is simply the node’s present y coordinate, and \theta is the node’s present x coordinate divided by the display width, times 2 \pi. And thus, the mapping is complete.
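A compact sketch of the two steps (a hypothetical Node type and method names, not tied to any particular implementation) might look like the following:

using System;
using System.Collections.Generic;
using System.Drawing;

// Step 1: lay the tree out top-down on a grid; step 2: map each (x, y) to polar
// coordinates about the center of the display area.
public static class RadialTreeLayoutSketch {
    public class Node {
        public List<Node> Children = new List<Node>();
        public PointF Location;
    }

    public static void Layout(Node root, float displayWidth, float displayHeight, float layerHeight) {
        LayoutGrid(root, 0.0f, displayWidth, 0.0f, layerHeight);
        ToPolar(root, new PointF(displayWidth / 2.0f, displayHeight / 2.0f), displayWidth);
    }

    // Center the node at the top of [left, right), then give each child an equally
    // wide sub-region one layer further down.
    private static void LayoutGrid(Node node, float left, float right, float y, float layerHeight) {
        node.Location = new PointF((left + right) / 2.0f, y);
        if (node.Children.Count == 0)
            return;

        float width = (right - left) / node.Children.Count;
        for (int i = 0; i < node.Children.Count; i++)
            LayoutGrid(node.Children[i], left + i * width, left + (i + 1) * width, y + layerHeight, layerHeight);
    }

    // r is the node's y coordinate; theta is its x coordinate scaled to [0, 2 pi).
    private static void ToPolar(Node node, PointF center, float displayWidth) {
        double theta = 2.0 * Math.PI * node.Location.X / displayWidth;
        float r = node.Location.Y;

        node.Location = new PointF(
            center.X + r * (float)Math.Cos(theta),
            center.Y + r * (float)Math.Sin(theta));

        foreach (Node child in node.Children)
            ToPolar(child, center, displayWidth);
    }
}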

Written by lewellen

2008-08-31 at 6:05 pm

Posted in Computer Graphics


Tree Drawing: Sunburst diagram

with one comment

Information Visualization is an interesting subject for me because it deals with the aesthetics of distilling large amounts of information into an easy-to-digest picture of the metrics of interest. One branch of the subject deals with methods for visualizing hierarchical information; techniques such as tree maps, hyperbolic tree views, and the like are typically deployed. Of these methods, one of the less common is the sunburst diagram.

A sunburst diagram is essentially the polar form of a tree icicle visualization. At the core is the root of the tree; each concentric ring represents a level of child nodes and is partitioned by the metric of our choosing to show the percentage each node consumes relative to its siblings. Sunburst diagrams are ideal for displaying any kind of tree data where nodes have weights and the totality of the nodes represents a whole.
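To make the partitioning concrete, a minimal sketch (hypothetical types, not tied to the tool described below) could assign each node a slice of its parent’s angular span in proportion to its weight, with the ring determined by its depth:

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sunburst angle assignment: each node receives [StartAngle, StartAngle + SweepAngle)
// on ring Depth, and splits that span among its children by weight.
public class SunburstNode {
    public double Weight;
    public List<SunburstNode> Children = new List<SunburstNode>();

    public int Depth;
    public double StartAngle;   // radians
    public double SweepAngle;   // radians

    public void Layout(double start, double sweep, int depth) {
        Depth = depth;
        StartAngle = start;
        SweepAngle = sweep;

        double total = Children.Sum(c => c.Weight);
        double cursor = start;

        foreach (SunburstNode child in Children) {
            double childSweep = total > 0.0 ? sweep * child.Weight / total : 0.0;
            child.Layout(cursor, childSweep, depth + 1);
            cursor += childSweep;
        }
    }
}

Calling root.Layout(0.0, 2.0 * Math.PI, 0) on a weighted tree yields, for every node, the ring and angular segment to draw.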

While there are certainly several existing software solutions on the market that utilize this visualization technique, I feel that most of them are too focused on solving one specific problem rather than stepping back to identify the general problem that sunburst diagrams solve. As a result, I find it appropriate to develop my own software solution.

I want to try and incorporate some of the following features that I feel are lacking in other applications:

  • Ability to visualize any XML document by way of import and option to export
  • Choose to visualize metrics based on the attributes defined on each node of an XML document
  • Capacity to search and filter information in the document
  • Freedom to navigate the tree in an intuitive manner
  • A clean and sharp looking user interface that is easy on the eyes

This project really boils down to a Swiss Army knife of simple data analysis tools. Depending on my schedule this project may or may not become a reality but could turn out to be useful for a variety of problems. As the project grows and matures I’ll try to post updates as appropriate.

Written by lewellen

2008-08-03 at 6:00 pm

Posted in Computer Graphics
