Notes from SIGGRAPH 2015
Introduction
I recently flew out to Los Angeles to attend the 42nd International Conference and Exhibition on Computer Graphics and Interactive Techniques. SIGGRAPH's theme this year was "the crossroads of discovery," bringing the conference closer to its roots, which began here in Boulder, Colorado, back in 1974. For me it was a chance to dig a little deeper into computer graphics research following my recent studies and to develop a better understanding of the industries pushing the domain forward. As with most posts on this site, this is a short reminder to myself, and hopefully it gives others an idea of what they could expect if they went.
Production Sessions
Disney – Pixar’s “Lava”: Moving Mountains was an informative production session detailing the process of bringing “Lava” to the screen. “Lava” is the story of Uku, a lonely volcano in search of love. As millions of years go by, he begins to lose hope as he recedes back into the ocean. But all is not lost. Uku finds renewed hope for love as newly formed volcano Lele rises to the surface. After the Pixar magicians reveal their secrets, technical details, and engrossing backstory, “Lava” becomes an even more enjoyable short film.
The presentation began with director James Murphy explaining the personal story that inspired the short before giving a live performance of the titular song. Colin Levy followed Murphy's conceptualization, storyboarding, and clay mockups with a discussion of how the film would be framed for maximal emotional impact. Levy explained the exploratory process of filming the opening scene to find the right combination of lenses and flight paths, based on real-world references, to help illustrate the size and scale of Uku, the hopeless volcano.
Aaron Hartline and Austin Lee continued with the challenges of animating and rigging Uku, Lele, a pair of dolphins, birds, whales, and turtles (the last four representing young love, newlyweds, established lives, and lifelong love). In particular, they covered the different approaches for animating and rigging the facial features of Uku (eyelids, lips, cheeks, and so on) and how the teams iterated to find a balance between what the audience might expect from an anthropomorphic mountain and what they wanted to achieve as storytellers.
Perhaps the most interesting moment in the presentation was Dirk Van Gelder's sneak peek of the enhancements the team made to Presto (Pixar's in-house animation tool) to give animators real-time, final-render-quality feedback on their changes through a clever combination of Renderman-based final renders and OpenGL hardware texturing. Aside from the technical novelty, it's a great example of a time-saving enhancement that makes it easier for people to freely experiment and explore different approaches, leading to better results.
The closing discussion by Byron Bashforth and Farhez Rayani on shading and lighting covered the procedural approaches used to give Uku both a physically realistic and visually appealing biome consisting of different shaders along with static and procedural assets. Overall, a very interesting peek into the workflow of one of the most venerable studios in the industry.
Birds of a Feather
Having worked in the healthcare space for a fair bit of time, I was attracted to meetings on Volume Rendering and Medical Visualization and HealthTech: Modeling, Interaction, Hardware, and Analysis to see what people have been working on and to get a glimpse of where things are heading.
Nicholas Polys of Virginia Tech and Michael Aratow, MD (both chairs of the Web3D Consortium Medical Working Group) began the medical visualization discussion by going over common libraries such as VTK (The Visualization Toolkit) and Voreen (Volume Rendering Engine) before discussing general-purpose analysis and visualization tools such as ParaView. Volume-oriented applications such as Seg3D (a volume segmentation tool) and OsiriX (a DICOM viewer) were covered, and finally tools for exploring biomolecular systems such as Chimera, VMD (Visual Molecular Dynamics), and PathSim (Epstein-Barr virus exploration) were discussed, giving the audience a good lay of the land. A brief bit of time was given to surgical training tools based on 3D technologies and haptic feedback (e.g., H3D).
These were all interesting applications, and seeing how they work across different types of human-machine interfaces (standard workstations, CAVE environments, and even virtual reality headsets and gloves) was eye-opening. The second main theme of the discussion was standardization with respect to interoperability and reproducibility. There was a heavy push for X3D along with interoperability with DICOM. Like a lot of massive standards, DICOM has some wiggle room that leads to inconsistent implementations from vendors, which makes portability of data between disparate systems complicated (not to mention that DICOM incorporates non-graphical metadata such as complex HL7). Suffice it to say, X3D is biting off a big chunk of work, and I think it will take some time to make progress in healthcare since it's a fragmented industry that is not in the least bit technologically progressive.
One area I felt was absent from the discussion was how 3D graphics could be used to benefit everyday patients. There is a wealth of fMRI and ECoG data that patients could benefit from seeing in an accessible way: for example, showing a patient a healthy baseline, then accentuating parts of their own data and explaining how those anomalies affect their well-being. If a component can be developed to deliver that functionality, it could be incorporated into a patient portal alongside all the other charts and information that providers have accumulated for the patient.
The HealthTech discussion was presented by Ramesh Raskar and his graduate students and postdocs from the MIT Media Lab. They presented a number of low-cost, low-power diagnostic devices for retinal imaging and electroretinography, high-speed tomography, cellphone-based microscopy, skin perfusion photography, and dental imaging, along with more socially oriented technologies for identifying safe streets to travel and automatically discerning mental health from portraits. There were plenty of interesting applications being developed by the group, but the session was more of a show-and-tell than a discussion of the challenges beyond the scope of the MIT Media Lab's work (as impressive as it is). (For example, the fine work 3Shape A/S has done with fast scanning of teeth for digital dentistry.)
One item of key interest was Meddit, a way for medical practitioners and researchers to define open problems to maturity and then present those challenges to computer scientists to work on and develop solutions. While the company name is uninspired, I think this is the right kind of collaboration platform for the "toolmaker" view of hardware engineers, computer scientists, and software engineers, as it identifies a real issue, presents an opportunity, and gives a pool of talented, bright people a way to make a difference. I am skeptical that it will take off (I think it would have more success as a niche community within an umbrella collaboration platform, i.e., the Stack Exchange model), but the idea is sound and something people should get excited about.
Real-Time Live!
The challenge of real-time graphics is very appealing to me, and getting to see what different software studios are working on was a real treat. While there were several presentations and awards given during the two-hour-long event, three demos stood out to me: Balloon Burst, given by Miles Macklin of NVIDIA; BabyX, presented by Mark Sagar of the University of Auckland; and award winner A Boy and His Kite, demoed by Nick Penwarden of Epic Games.
Macklin's demo was impressive in that it simulated more than 750,000 particles (250,000 by their solver Flex and 512,000 for mist and droplets), and their paper [pdf] Fast Grid-Free Surface Tracking gives some technical background on how they achieved their results. Fluid simulation is something I'd like to spend some time exploring. I obviously won't be able to create something as technical as Macklin's group did, but I would like to spend some time on Smoothed-Particle Hydrodynamics, and seeing NVIDIA's work was a good motivation boost to explore the subject further on my own.
Perhaps the most unexpected entry in the series was Sagar's BabyX. It was a fascinating assemblage of neural networks, real-time graphics, natural language processing, computer vision, and image processing to create the ultimate "Sims"-like character: a baby that could learn and invoke different emotional responses based on external stimuli. The real-time graphics were photorealistic, and seeing the modeling behind the system that emulates how the brain behaves in the presence of different dopamine levels (and how those levels correspond to conditions like Parkinson's and schizophrenia) was impressive as well. Overall, a fantastic technical achievement, and I look forward to following Sagar's work as it continues to evolve.
My main interest in going to Real-Time Live! was to see Penwarden's work on A Boy and His Kite. This impressive demo, spanning a hundred square miles inspired by the Isle of Skye, really puts to shame my prior work in creating procedural environments. Nonetheless, it goes to show how far the medium can be pushed and how small the divide between real-time and film is becoming. Computer Graphics World (July-August 2015) published a very thorough technical overview [pp. 40-48] of how Penwarden's team produced the short, in addition to the features added to Unreal Engine 4 to make the demo shine.
Wrap-up
There were many other things I explored that I won't go into detail on, namely the VR Village, Emerging Technologies, Research Posters, Exhibition, and Job Fair. I'm still quite skeptical that virtual reality (and to the same extent augmented reality) technologies will come into the mainstream; I think they'll continue to be the subject of researchers, gaming enthusiasts, and industry solutions for automotive and healthcare problems. One disappointment was the Job Fair, as there were barely any companies participating. Overall, it was a positive experience learning what other people are doing in the industry and getting to see how research is being applied in a variety of domains including automotive, entertainment, engineering, healthcare, and science.
Category Recognition of Golden and Silver Age Comic Book Covers
Introduction
Motivation
For a while now, I've been wanting to work on a computer vision project, and I decided for my next research-focused project that I would learn some image processing and machine learning algorithms in order to build a system that would classify the visual contents of images: a category recognizer. Over the course of the summer I researched several techniques and built the system I had envisioned. The end result is by no means state of the art, but it is satisfactory for four months of on-and-off development and research. The following post includes my notes on the techniques and algorithms that were used in the project, followed by a summary of the system and its performance against a comic book data set that was produced during development.
Subject Matter
The original subject matter of this project was paintings from the 1890s done in the Cloisonnism art style, exemplified by artists such as Émile Bernard, Paul Gauguin, and Paul Sérusier. The style is characterized by large regions of flat colors outlined by dark lines: characteristics that would be easy to work with using established image processing techniques. During development, it became evident that no one approach would work well with these images. As an alternative, I decided to work with Golden and Silver Age comic book covers from the 1940s to 1960s, which typify the same characteristics. Many of the comic books were drawn by the same individuals, such as Jack Kirby, Joe Shuster, and Bob Kane. As an added benefit, there are thousands of comic book covers available online, compared to the dozens of Cloisonnism paintings.
Image Processing
Representation
An image is a function, $f : \mathbb{Z}^2 \to \mathbb{Z}^3$, where each input vector, $\langle x, y \rangle$, represents an image coordinate and each output vector, $\langle r, g, b \rangle$, represents the red, green, and blue (RGB) channels of an image. Individual input values are bound between zero and the width, $w$, or height, $h$, of the image, and output values are bound between $0$ and $255$. Each output vector represents a unique color in RGB space, giving rise to $256^3 \approx 16.7$ million possible colors. Below is a basic sample image broken down into its individual channels.
[Figure: a sample image and its individual red, green, and blue channels]
Like any other vector field, transformations can be applied to the image to yield a new image, $g = T(f)$. In image processing, these transformations are referred to as image filters and come in three varieties: point-based, neighbor-based, and image-based filters. As the names suggest, point-based filters map a single output vector to a new output vector, neighbor-based filters map a neighborhood of output vectors to a single output vector, and image-based filters map the whole image and a single or neighboring set of output vectors to a single output vector.
There are many different instances of these types of filters, but only those used in this project are discussed below. Computational complexity and efficient algorithms for each type of filter are also discussed where appropriate.
Point-based Filters
Point-based filters, $p$, map an output vector to a new output vector in the form $g(x, y) = p(f(x, y))$. Application of a point-based filter is done in quadratic time with respect to the dimensions of the image, $\mathcal{O}(N^2)$.
Grayscale Projection
It is helpful to collapse the RGB channels of an image down to a single channel for the purpose of simplifying filter results. This can be done by using a filter of the form $s(\langle r, g, b \rangle) = \frac{r + g + b}{3}$. Alternatively, one can use a weighted filter of the form $s(\langle r, g, b \rangle) = w_r r + w_g g + w_b b$ (e.g., the common Rec. 601 weights $0.299 r + 0.587 g + 0.114 b$) to represent the luminance of the output vector.
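The two projections can be sketched in a few lines. This is a Python/NumPy illustration (the original system was written in C#), and the Rec. 601 luminance weights are one common choice rather than the exact weights the original system used:

```python
import numpy as np

def grayscale_average(image):
    """Collapse an RGB image (H x W x 3) to one channel by averaging."""
    return image.mean(axis=2)

def grayscale_luminance(image):
    """Collapse to one channel with perceptual luminance weights.

    The Rec. 601 weights used here are an assumption; the original
    system's exact weights are not documented in the post.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return image @ weights

# A 1x2 image: a pure red pixel and a pure white pixel.
img = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=float)
print(grayscale_average(img)[0, 0])           # 85.0
print(round(grayscale_luminance(img)[0, 1]))  # 255
```

Note how the luminance projection weighs green most heavily, matching the human eye's sensitivity, whereas the plain average treats all three channels equally.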
Thresholding
A threshold filter serves as a way to accentuate all values in the image greater than or equal to a threshold, $t$, or to attenuate all those values less than the threshold. The first variety is the step threshold filter, $h(v) = \begin{cases} 255 & v \ge t \\ 0 & \text{otherwise} \end{cases}$, which exhibits an ideal cutoff at the threshold value. The second variety is a logistic threshold filter, $h(v) = \frac{255}{1 + e^{-(v - t)/\sigma}}$, with an additional parameter, $\sigma$, allowing for wiggle room about the threshold, yielding a tapered step function as $\sigma$ increases in size.
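A minimal sketch of both threshold varieties (Python/NumPy for illustration; the original system was C#):

```python
import numpy as np

def step_threshold(v, t):
    """Ideal cutoff: accentuate values >= t to 255, attenuate the rest to 0."""
    return np.where(v >= t, 255.0, 0.0)

def logistic_threshold(v, t, sigma):
    """Tapered cutoff: approaches the step filter as sigma shrinks."""
    return 255.0 / (1.0 + np.exp(-(v - t) / sigma))

v = np.array([0.0, 127.0, 128.0, 255.0])
print(step_threshold(v, 128))               # [  0.   0. 255. 255.]
print(np.round(logistic_threshold(v, 128.0, 10.0), 1))
```

At the threshold itself the logistic filter yields exactly half intensity (127.5), and values well away from $t$ saturate toward 0 or 255.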
Neighbor-based Filters
All neighbor-based filters take the output vectors neighboring an input vector to calculate a new output vector value. How the neighboring output vectors should be aggregated is given by a kernel image, $k$, and the computation is represented as a two-dimensional, discrete convolution, $g = f \ast k$.
Neighbor-based filters can be applied naïvely in quartic time as a function of the image and kernel dimensions, $\mathcal{O}(N^2 M^2)$. However, a more efficient implementation allows for $\mathcal{O}(N^2 \log N)$ time by way of the Discrete Fourier Transform.
The Discrete Fourier Transform is a way of converting a signal residing in the spatial domain into a signal in the frequency domain by aggregating waveforms of varying frequencies where each waveform is amplified by its corresponding value in the input signal. The Inverse Discrete Fourier Transform maps a frequency domain signal back to the spatial domain.
Applying the Discrete Fourier Transform to a convolution comes with the convenient property that the transformed convolution can be rewritten as the product of the transformed functions, $\mathcal{F}(f \ast k) = \mathcal{F}(f) \, \mathcal{F}(k)$, by way of the Convolution Theorem.
The improved time complexity is achieved by using a divide-and-conquer algorithm known as the Fast Fourier Transform, which takes advantage of the Danielson-Lanczos Lemma: the Discrete Fourier Transform of a signal can be calculated by splitting the signal into two half-length signals (the even- and odd-indexed samples), computing their Discrete Fourier Transforms, and combining the results.
For the purposes of image processing, we use the two-dimensional Discrete and Inverse Discrete Fourier Transform.
The expression can be rearranged as the Discrete Fourier Transform of each column in the image, followed by the Discrete Fourier Transform of each row of those intermediate results, to obtain the full two-dimensional Discrete Fourier Transform. As a result, we can extend the one-dimensional Fast Fourier Transform easily into two dimensions, producing a much more efficient time complexity.
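Both claims can be verified numerically. The NumPy sketch below uses `np.fft` as a stand-in for a hand-rolled FFT; it checks that a direct circular convolution matches the product of transforms, and that the 2D transform is just 1D transforms over columns and then rows:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((8, 8))   # image
k = rng.random((8, 8))   # kernel, zero-padded to the image size

# Direct circular convolution: quartic in the general case.
direct = np.zeros_like(f)
for x in range(8):
    for y in range(8):
        for i in range(8):
            for j in range(8):
                direct[x, y] += f[i, j] * k[(x - i) % 8, (y - j) % 8]

# Convolution Theorem: F(f * k) = F(f) . F(k), computed in O(N^2 log N).
via_fft = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(k)))
print(np.allclose(direct, via_fft))  # True

# Separability: 1D FFTs over columns, then over rows, equals the 2D FFT.
by_axis = np.fft.fft(np.fft.fft(f, axis=0), axis=1)
print(np.allclose(by_axis, np.fft.fft2(f)))  # True
```

In practice, linear (non-circular) convolution requires padding both signals before transforming, but the algebra is identical.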
Weighted Means: Gaussian and Inverse Distance
Weighted mean filters are used to modify the morphology of an image by averaging neighboring output vectors together according to some scheme.
A Gaussian filter is used to blur an image by using the Gaussian distribution with standard deviation, $\sigma$, as a kernel, $k(x, y) = \frac{1}{2 \pi \sigma^2} e^{-\frac{x^2 + y^2}{2 \sigma^2}}$.
The inverse distance filter weights neighboring output vectors by how far they are from the new output vector being calculated. Each result is also scaled by a gain parameter, $g$, allowing for contrast adjustments.
Laplacian
A Laplacian filter detects changes in an image and can be used for sharpening and edge detection. Much like in calculus of a single variable, the slope of a surface can be calculated by the gradient operator, $\nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle$. Since it is easier to work with a scalar than a vector, the divergence of the gradient is given by the Laplacian operator, $\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$.

Since an image is a discrete function, the Laplacian operator needs to be approximated numerically using central differences, $\nabla^2 f \approx \frac{f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 f(x, y)}{h^2}$, where $h$ represents the spacing between successive samples of the underlying function. Since the finest resolution that can be achieved in an image is an individual displacement, $h = 1$.
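With $h = 1$, the central-difference approximation collapses into the familiar five-point stencil, which can be applied as an ordinary convolution kernel (illustrative Python/NumPy sketch):

```python
import numpy as np

# Five-point central-difference approximation of the Laplacian with h = 1.
LAPLACIAN_KERNEL = np.array([[0,  1, 0],
                             [1, -4, 1],
                             [0,  1, 0]], dtype=float)

def convolve2d(image, kernel):
    """Naive zero-padded 2D convolution (clarity over speed)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2,), (kw // 2,)), mode="constant")
    out = np.zeros_like(image, dtype=float)
    for x in range(image.shape[0]):
        for y in range(image.shape[1]):
            out[x, y] = np.sum(padded[x:x + kh, y:y + kw] * kernel[::-1, ::-1])
    return out

# On the quadratic surface f(x, y) = y^2 the true Laplacian is exactly 2,
# and the stencil reproduces it at interior points.
xs = np.arange(8, dtype=float)
f = np.tile(xs**2, (8, 1))
print(convolve2d(f, LAPLACIAN_KERNEL)[3, 3])  # 2.0
```

Flat regions produce a response of zero, which is why the filter highlights only the changes (edges) in an image.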
Image-based Filters
Image-based filters calculate some information about the contents of the image and then use that information to generate the appropriate point-based and neighbor-based filters.
Normalization
The normalization process computes the minimum, $c_{\min}$, and maximum, $c_{\max}$, values of each channel and linearly maps all values between those extrema to new values between the possible channel extrema of $0$ and $255$, i.e., $v' = 255 \, \frac{v - c_{\min}}{c_{\max} - c_{\min}}$. This particular image-based filter can be applied in quadratic time, $\mathcal{O}(N^2)$, to calculate the extrema of the image and apply the linear map.
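A sketch of the linear map (Python/NumPy for illustration):

```python
import numpy as np

def normalize(channel, lo=0.0, hi=255.0):
    """Linearly map a channel's values from [min, max] onto [lo, hi]."""
    c_min, c_max = channel.min(), channel.max()
    if c_max == c_min:                 # flat channel: nothing to stretch
        return np.full_like(channel, lo)
    return lo + (hi - lo) * (channel - c_min) / (c_max - c_min)

# A poor-contrast channel occupying only [100, 150] stretches to [0, 255].
c = np.array([100.0, 125.0, 150.0])
print(normalize(c))  # maps to 0, 127.5, 255
```

The flat-channel guard avoids a division by zero when an image has no contrast at all.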
Edge Detection
Edge detection is the process of identifying changes (e.g., texture, color, luminance, and so on) in an image. As alluded to in the image processing section, the Laplacian filter is central to detecting edges within an image, so a sequence of filters is used before and after a Laplacian filter to produce a detector that consistently segments comic book covers. The following sequence of filters was used.
- Grayscale Projection – Since all logical components of a comic book cover are separated by inked lines, it is permissible to ignore the full set of RGB channel information and collapse the image down to a grayscale image.
- Normalization – It is conceivable that the input image has poor contrast and brightness. To ensure that the full range of luminescence values are presented, the image is normalized.
- Gaussian ($\sigma$) – An image may have some degree of noise superimposed on it. To reduce the noise, the image is blurred using a Gaussian filter with a small standard deviation, $\sigma$: enough to smooth out the image without distorting finer image detail.
- Laplacian – Once the image has been prepared, its edges are calculated using the Laplacian filter.
- Normalization – Most of the changes in the image may be subtle, so a normalization filter is applied to make sure that all edge information is accentuated as much as possible.
- Step Threshold ($t$) – Since a partial edge isn't particularly useful information, any edge value less than the threshold, $t$, is attenuated to zero and all other values are accentuated to $255$.
- Inverse Distance ($g$) – It is possible that the threshold process introduced discontinuities into some of the edges. To mitigate this, an inverse distance filter is used to inflate existing edges and intensify the result with a gain of $g$.
The complete edge detection process has a computational complexity of $\mathcal{O}(N^2 \log N)$ due to the neighbor-based filters used to eliminate noise and smooth edge discontinuities.
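The pipeline can be sketched end to end as follows (Python/NumPy for illustration; the `sigma` and `t` defaults are placeholders rather than the tuned values from the original system, and the final inverse-distance step is omitted for brevity):

```python
import numpy as np

def convolve(img, k):
    """Naive edge-padded 2D convolution (fine for the symmetric kernels here)."""
    kh, kw = k.shape
    p = np.pad(img, ((kh // 2,), (kw // 2,)), mode="edge")
    return np.array([[np.sum(p[x:x + kh, y:y + kw] * k)
                      for y in range(img.shape[1])]
                     for x in range(img.shape[0])])

def normalize(img):
    lo, hi = img.min(), img.max()
    return (img - lo) * (255.0 / (hi - lo)) if hi > lo else np.zeros_like(img)

def gaussian_kernel(sigma, radius=2):
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def detect_edges(rgb, sigma=1.0, t=128.0):
    """Grayscale -> normalize -> Gaussian -> Laplacian -> normalize -> threshold."""
    gray = normalize(rgb.mean(axis=2))
    blurred = convolve(gray, gaussian_kernel(sigma))
    laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    edges = normalize(np.abs(convolve(blurred, laplacian)))
    return np.where(edges >= t, 255.0, 0.0)

# A dark square on a light background yields edges along its border.
img = np.full((16, 16, 3), 230.0)
img[4:12, 4:12] = 20.0
mask = detect_edges(img)
print(mask.max(), mask[0, 0])  # 255.0 0.0
```

Edge padding (rather than zero padding) keeps the image borders from registering as spurious edges.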
Segmentation
With the edge image, it is possible to segment the image into its visual components. This is achieved by doing a flood fill on the image, using the edge image as the boundaries for the fill. Once the fill runs out of points to flood, the segment is complete and the next remaining point in the image is considered. To reduce the number of minuscule segments, only those segments representing at least a minimum fraction of the image are included.
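A minimal flood-fill segmenter over a binary edge mask might look like the following (Python sketch; the original C# implementation and its minimum-segment-size cutoff are not reproduced here):

```python
from collections import deque

def segment(edge_mask):
    """Flood-fill the non-edge pixels of a binary edge mask (1 = edge).

    Returns a list of segments, each a set of (row, col) coordinates.
    """
    h, w = len(edge_mask), len(edge_mask[0])
    seen = [[False] * w for _ in range(h)]
    segments = []
    for sr in range(h):
        for sc in range(w):
            if seen[sr][sc] or edge_mask[sr][sc]:
                continue  # already filled, or an edge boundary pixel
            fill, queue = set(), deque([(sr, sc)])
            seen[sr][sc] = True
            while queue:
                r, c = queue.popleft()
                fill.add((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < h and 0 <= nc < w \
                            and not seen[nr][nc] and not edge_mask[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            segments.append(fill)
    return segments

# A vertical edge splits a 4x5 image into two segments of 8 pixels each.
mask = [[0, 0, 1, 0, 0]] * 4
print([len(s) for s in segment(mask)])  # [8, 8]
```

An explicit queue is used instead of recursion so large segments cannot overflow the call stack.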
Machine Learning
Classifiers
The task of classification is to identify decision boundaries separating all of the classifications within the data set. Such data sets can be linearly or non-linearly separable; as a result, classifiers were developed to solve the linear case and then adapted to deal with the more complicated non-linear case. While there are a number of classifiers, only the K-Nearest Neighbor and Support Vector Machine classifiers were researched and implemented in this project.
K-Nearest Neighbor
The K-Nearest Neighbor classifier is an online classifier which operates under the assumption that a yet-to-be-classified vector is most likely to have the same classification as the training vectors closest to it, based on a distance measure, $d(\vec{x}, \vec{y})$.
Distance can be measured in a variety of ways for arbitrary vectors, $\vec{x}, \vec{y} \in \mathbb{R}^n$, residing in some real space. The most common measures are specialized cases of the Minkowski distance, $d_p(\vec{x}, \vec{y}) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^p \right)^{1/p}$. The Manhattan distance, $d_1$, yields the distance traveled along a grid between two vectors (hence a name in reference to the New York City borough). The Euclidean distance, $d_2$, gives the distance between the vectors in the usual, familiar sense. The last specialized case considered is the Chebyshev distance, $d_\infty = \max_i \lvert x_i - y_i \rvert$, which gives the maximum distance between any one dimension of the two vectors.
Two factors affect the efficacy of the algorithm. The first is the dimension of the data, $n$, and the second is the size of the training data set, $N$. As the training data set increases in size, there are more vectors against which a test vector must be compared. As a result, an efficient means of searching the training set must be used to yield satisfactory performance. This can be achieved by using kd-Trees, which give $\mathcal{O}(\log N)$ expected search performance, or branch-and-bound methods giving similar performance. As the dimensionality of the data set increases, the efficacy of kd-Trees diminishes to a near linear search of the training data set due to the "curse of dimensionality."
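The Minkowski family and the resulting classifier can be sketched directly (Python for illustration; a naive linear scan stands in for the kd-Tree, and the data points are made up):

```python
from collections import Counter

def minkowski(x, y, p):
    """Minkowski distance; p=1 Manhattan, p=2 Euclidean, p=inf Chebyshev."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return max(diffs) if p == float("inf") else sum(d**p for d in diffs) ** (1 / p)

def knn_classify(train, x, k=3, p=2):
    """train: list of (vector, label). Majority vote among the k nearest."""
    nearest = sorted(train, key=lambda vl: minkowski(vl[0], x, p))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(minkowski((0, 0), (3, 4), 1))             # 7.0
print(minkowski((0, 0), (3, 4), 2))             # 5.0
print(minkowski((0, 0), (3, 4), float("inf")))  # 4

train = [((0, 0), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_classify(train, (4, 4), k=3))  # b
```

The `sorted` scan is $\mathcal{O}(N \log N)$ per query, which is exactly the cost a kd-Tree or branch-and-bound search is meant to avoid on large training sets.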
Support Vector Machine
Formulation
The Support Vector Machine classifier is an offline, linear, binary classifier which operates under the assumption that a training set of classifications, $y_i \in \{ -1, +1 \}$, and data, $\vec{x}_i \in \mathbb{R}^n$, is linearly separable by some optimal hyperplane of the form $\langle \vec{w}, \vec{x} \rangle + b = 0$, where $\langle \cdot, \cdot \rangle$ is the inner product, $\vec{w} \in \mathbb{R}^n$, and $b \in \mathbb{R}$. When $\langle \vec{w}, \vec{x} \rangle + b \ge 1$, the classification $+1$ is presented, and when $\langle \vec{w}, \vec{x} \rangle + b \le -1$, the classification $-1$ is presented.
The hyperplane is padded by two supporting hyperplanes separated by an equal distance to the nearest training examples of each classification. The span between the supporting hyperplanes is the margin. The goal then is to pick a hyperplane which provides the largest margin between the two separable classifications. The margin between the supporting hyperplanes is given by $\frac{2}{\lVert \vec{w} \rVert}$. Given the demarcation criteria, the maximum margin will also be subject to the constraint that all training examples satisfy $y_i \left( \langle \vec{w}, \vec{x}_i \rangle + b \right) \ge 1$. As a result of the objective function and accompanying linear constraints, the problem is stated in terms of its native primal Quadratic Programming form:

$\min_{\vec{w}, b} \frac{1}{2} \lVert \vec{w} \rVert^2$ subject to $y_i \left( \langle \vec{w}, \vec{x}_i \rangle + b \right) \ge 1 \quad \forall i$
To find the optimal parameters, it is easier to translate the problem into a dual form by applying the technique of Lagrange multipliers. The technique takes an objective function, $f$, and constraint functions, $g_i \le 0$, and yields a new function, $\mathcal{L}(\vec{x}, \vec{\alpha}) = f(\vec{x}) + \sum_i \alpha_i g_i(\vec{x})$, to be optimized subject to the added constraint $\alpha_i \ge 0$. For the problem above this yields:

$\mathcal{L}(\vec{w}, b, \vec{\alpha}) = \frac{1}{2} \lVert \vec{w} \rVert^2 - \sum_i \alpha_i \left( y_i \left( \langle \vec{w}, \vec{x}_i \rangle + b \right) - 1 \right)$ subject to $\alpha_i \ge 0$
The next step is to differentiate the objective function with respect to the parameters to determine the optimal solution. Since the function is concave, the results will yield the desired maximum: $\frac{\partial \mathcal{L}}{\partial \vec{w}} = 0$ gives $\vec{w} = \sum_i \alpha_i y_i \vec{x}_i$, and $\frac{\partial \mathcal{L}}{\partial b} = 0$ gives $\sum_i \alpha_i y_i = 0$. As a result, the dual problem can be written as the following:

$\max_{\vec{\alpha}} \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \langle \vec{x}_i, \vec{x}_j \rangle$ subject to $\alpha_i \ge 0$, $\sum_i \alpha_i y_i = 0$
Handling of non-linearly separable data
In the event that the data is not linearly separable, an additional parameter, $C$, is added as a penalty factor for those values that reside on the wrong side of the hyperplane. The derivation of the quadratic program is identical to the one presented above, with the exception that the Lagrange multipliers now have an upper bound: $0 \le \alpha_i \le C$.
Non-linear classification
By way of Mercer's Theorem, the linear Support Vector Machine can be modified to allow for non-linear classification through the introduction of symmetric, positive semidefinite kernel functions, $K(\vec{x}, \vec{y}) = \langle \phi(\vec{x}), \phi(\vec{y}) \rangle$. The idea is that if the data is not linearly separable in its present dimensional space, then by mapping it to a higher dimensional space the data may become linearly separable by some higher dimensional hyperplane. The benefit of a kernel function is that the higher dimensional vectors need not be computed explicitly. This "kernel trick" allows all inner products in the dual representation to be substituted with a kernel:

$\max_{\vec{\alpha}} \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(\vec{x}_i, \vec{x}_j)$ subject to $0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$

And the decision hyperplane function then becomes $f(\vec{x}) = \operatorname{sign} \left( \sum_i \alpha_i y_i K(\vec{x}_i, \vec{x}) + b \right)$.
The following are some typical kernels:
- Linear – $K(\vec{x}, \vec{y}) = \langle \vec{x}, \vec{y} \rangle$
- Polynomial – $K(\vec{x}, \vec{y}) = \left( \gamma \langle \vec{x}, \vec{y} \rangle + r \right)^d$
- Radial basis function – $K(\vec{x}, \vec{y}) = e^{-\gamma \lVert \vec{x} - \vec{y} \rVert^2}$
- Sigmoid – $K(\vec{x}, \vec{y}) = \tanh \left( \gamma \langle \vec{x}, \vec{y} \rangle + r \right)$
From a practical point of view, only the linear and radial basis function kernels from this list should be considered since the polynomial kernel has too many parameters to optimize and the sigmoid kernel does not satisfy the positive semidefinite kernel matrix requirement of Mercer’s Theorem.
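The two practical kernels are simple to state in code (Python sketch; the `gamma` parameterization follows the common convention and is an assumption, not a detail from the original system):

```python
import math

def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

x, y = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, y))        # 11.0
print(round(rbf_kernel(x, y), 4))  # exp(-0.5 * 8) = exp(-4), about 0.0183
```

Note that the RBF kernel depends only on the distance between the two vectors, which is what makes it a natural default when nothing is known about the data's geometry.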
Algorithmic details
The Support Vector Machine classifier can be implemented using a quadratic programming solver or by incremental descent algorithms. Both methods work, but are difficult to implement and expensive to procure. An alternative is the Sequential Minimal Optimization algorithm developed by John Platt at Microsoft Research. The algorithm works by analytically solving the dual problem for the case of two training examples, then iterating over all of the Lagrange multipliers, verifying that the constraints are satisfied. For those that are not, the algorithm computes new Lagrange multiplier values. The full details of the algorithm can be found in Platt's paper.
The time complexity of the algorithm is quadratic with respect to the number of training samples and support vectors, $\mathcal{O}(N^2)$. The time complexity of evaluating the decision function is linear with respect to the number of support vectors, $\mathcal{O}(S)$.
Multiclass Classification
The classification methods presented in the previous section are binary classifiers. These classifiers can be used to classify multiple classifications by employing a one-vs-all or all-vs-all approach. In the former, a single classification is separated from the remaining classifications, producing $M$ classifiers for the $M$ classifications. Each classifier is then used to evaluate a vector, and the classifier with the highest confidence is used to declare the classification. In the latter, each classification is compared individually to each other classification, resulting in $\binom{M}{2} = \frac{M(M-1)}{2}$ classifiers. All of the classifiers are then evaluated against the test vector, and the classification with the greatest consensus among the classifiers is declared the classification of the test vector.
Both methods have their place. The benefit of a one-vs-all approach is that there are fewer classifiers to maintain. However, training a single classifier on a complete data set is time consuming and can give deceptive performance measures. All-vs-all does result in more classifiers, but it also provides for faster training which can be easily parallelized on a single machine and distributed to machines on a network.
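The bookkeeping for the two schemes can be sketched using four of the data set's categories as stand-in labels (Python; the votes below are hard-coded stand-ins for real classifier outputs):

```python
from collections import Counter
from itertools import combinations

classes = ["text", "clothing", "geography", "transportation"]

# One-vs-all: one binary classifier per class; M classifiers total.
one_vs_all = [(c, "rest") for c in classes]
print(len(one_vs_all))  # 4

# All-vs-all: one classifier per unordered pair; M(M-1)/2 classifiers total.
all_vs_all = list(combinations(classes, 2))
print(len(all_vs_all))  # 6

# All-vs-all decision: each pairwise classifier votes, majority wins.
votes = ["text", "text", "clothing", "text", "geography", "text"]
print(Counter(votes).most_common(1)[0][0])  # text
```

With 38 classifications, the gap is far larger: 38 one-vs-all classifiers versus 703 all-vs-all classifiers, though each of the latter trains on a much smaller pair of classes.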
Classifier Evaluation
Individual classifiers are evaluated by training the classifier against a data set and then determining how many correct and incorrect classifications were produced. This evaluation produces a confusion matrix.
| | Predicted Positive | Predicted Negative | Total |
|---|---|---|---|
| Actual Positive | (TP) True Positive | (FN) False Negative | (AP) Actual Positives |
| Actual Negative | (FP) False Positive | (TN) True Negative | (AN) Actual Negatives |
| Total | (PP) Predicted Positives | (PN) Predicted Negatives | (N) Examples |
The confusion matrix is used to calculate a number of values which evaluate the performance of the classifier. The first of these are the accuracy and error of the classifier. Accuracy measures the fraction of instances where the actual and predicted classifications match, $\text{accuracy} = \frac{TP + TN}{N}$, and error the fraction where they do not, $\text{error} = \frac{FP + FN}{N}$.
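These two measures follow directly from the matrix (Python sketch; the counts are made-up example values):

```python
def accuracy(tp, fn, fp, tn):
    """(TP + TN) / N: fraction of predictions that matched the actual class."""
    return (tp + tn) / (tp + fn + fp + tn)

def error(tp, fn, fp, tn):
    """(FP + FN) / N: fraction of predictions that did not match."""
    return (fp + fn) / (tp + fn + fp + tn)

# Example confusion matrix: 45 TP, 5 FN, 10 FP, 40 TN (N = 100).
print(accuracy(45, 5, 10, 40))  # 0.85
print(error(45, 5, 10, 40))     # 0.15
```

The two always sum to one, so reporting either is sufficient; the post tracks error since the grid search below minimizes it.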
Since we should expect to get different results each time we evaluate a classifier, the values obtained above are sample estimates of the true values. Given enough trials and measurements, it is possible to determine the true values empirically. However, this is time consuming, and it is easier to instead use confidence intervals to determine the interval of values a measurement is most likely to fall into.
Training and Testing
Each of the classifiers presented has some number of parameters that must be determined. The parameters can be selected using prior knowledge, or by exploring the parameter space and determining which parameters yield optimal performance. This is done by performing a simple grid search over the parameter space, evaluating the error at each point and attempting to minimize it.
K-folds cross-validation is used at each grid location to produce a reliable measure of the error. The idea is that the data set is split into $k$ disjoint sets. The first set is used as a validation set, and the remaining $k - 1$ sets are used in unison as the training data set for the classifier. This process is repeated with the next set, and so on, until all $k$ sets have been used as a validation set.
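A sketch of the k-folds split (Python; the interleaved split is one simple way to form the $k$ disjoint sets):

```python
def k_fold_splits(data, k):
    """Yield (validation, training) pairs for k-folds cross-validation."""
    folds = [data[i::k] for i in range(k)]   # k disjoint, near-equal sets
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield validation, training

data = list(range(10))
splits = list(k_fold_splits(data, 5))
for validation, training in splits:
    assert len(validation) == 2 and len(training) == 8
    assert sorted(validation + training) == data   # disjoint and exhaustive
print(len(splits), "folds verified")  # 5 folds verified
```

Every example serves as validation data exactly once, which is what makes the averaged error a less biased estimate than a single train/test split.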
System
Implementation
The system was implemented in C# 4.0 on top of the Microsoft .NET Framework. The user interface was written by hand using the WinForms library. No other third-party libraries or frameworks were used. When possible, all algorithms were parallelized to take advantage of multi-core capabilities to improve processing times.
Summary
The system consists of two modes of operation: training and production. In training, a human classifier labels image segments with an appropriate classification. New image segments are then taken into consideration during the training of machine learning algorithms. Those algorithms producing the lowest error for a given classification are then used in production mode. During production, a user submits an image and each image segment is then evaluated against the available classifiers. Those image segments are then presented to the user with the most likely classification. These two modes along with their workflows and components are illustrated in the following diagram.
[Figure: system workflow for training and production modes]
Training Mode
Data Set Construction
The user interface of the system allows users to add an image to a local data set of images. Once added, the image is processed to yield image segments. The user can then label an image segment by editing the segment and moving on to the next one. This allows for easy and efficient human classification of the data. If the user does not wish to keep the image, he or she may remove it from the data set as well.
Data Set Cleaning
During the construction phase, errors may be introduced into the data set, typically in the form of typos or forgetting which segment was being edited. The data set is cleaned by listing all available classifications and presenting the user with the segments associated with each classification. The user can then review each image segment as it was identified in the source image. If the user does not wish to keep a classification, he or she may remove it from the data set as well.
Data Set Statistics
The data set consists of 496 comic book covers pulled from the Cover Browser database of comic book covers. The first 62 consecutively published covers were used from each of Action Comics, Amazing Spider-man, Batman, Captain America, Daredevil, Detective Comics, Superman, and Wonder Woman, and then processed by the image processing subsystem, yielding 24,369 image segments. 11,463 of these represented classifiable segments, which were then labeled by hand over the course of two weeks; the remaining segments were discarded.
In total, there were 239 classifications identified in the data set among 18 categories, with the text, clothing, geography, and transportation categories accounting for 90% of the data set. Since the majority of classifications were incidental, only those having 50 or more image segments were considered by the application, leaving a total of 38 classifications.
Classifier Evaluation
For the 38 classifications meeting the minimum criteria, the K-Nearest Neighbor approach worked well both at distinguishing text classifications from other classifications and at distinguishing among the text classifications themselves, under both all-vs-all and one-vs-all schemes.
All-vs-All K-Nearest Neighbor Performance. | One-vs-All K-Nearest Neighbor Performance.
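The k-nearest-neighbor scheme itself is compact enough to sketch. The following is C++ rather than the system's C#, and the feature vectors and label names are hypothetical stand-ins for the scaled image segments; it classifies a query by majority vote among its k nearest labeled neighbors:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Sample {
    std::vector<double> features;  // image segment mapped to a feature vector
    std::string label;             // human-assigned classification
};

// Squared Euclidean distance between two feature vectors of equal length.
static double distance2(const std::vector<double>& a, const std::vector<double>& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) {
        double t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Classify a query by majority vote among its k nearest labeled samples.
std::string knnClassify(const std::vector<Sample>& training,
                        const std::vector<double>& query, std::size_t k) {
    std::vector<std::pair<double, std::string>> dists;
    for (const Sample& s : training)
        dists.push_back({distance2(s.features, query), s.label});

    std::size_t kk = std::min(k, dists.size());
    std::partial_sort(dists.begin(), dists.begin() + kk, dists.end());

    std::map<std::string, int> votes;
    for (std::size_t i = 0; i < kk; i++)
        votes[dists[i].second]++;

    std::string best;
    int bestCount = -1;
    for (const auto& v : votes)
        if (v.second > bestCount) { best = v.first; bestCount = v.second; }
    return best;
}
```

In practice the distance computation dominates, which is why the references above cover k-d trees and branch-and-bound nearest-neighbor search.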
The Support Vector Machine approach produced unremarkable results for both the all-vs-all and one-vs-all methods. In the former, only a few pairings resulted in acceptable error rates, whereas the latter presented only a couple of acceptable error rates.
All-vs-All Support Vector Machine Performance. | One-vs-All Support Vector Machine Performance.
For both classification methods presented, the all-vs-all scheme yielded superior results to one-vs-all. Comparing the two classifiers, K-Nearest Neighbor seems to have done better than the Support Vector Machine approach, contrary to what was expected from the literature. Both classifier methods are used in production mode.
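For contrast, a minimal linear SVM can be sketched as well. This is not the solver used by the system (a practical implementation would be closer to Platt's SMO, per the references); it is a toy sub-gradient descent on the hinge loss, in C++ with made-up constants, illustrating the one-vs-all labeling scheme (+1 for the target class, -1 for everything else):

```cpp
#include <cstddef>
#include <vector>

// Minimal linear SVM trained by sub-gradient descent on the hinge loss.
// Labels are +1 (target class) or -1 (everything else), as in one-vs-all.
struct LinearSvm {
    std::vector<double> w;
    double b = 0.0;

    double dot(const std::vector<double>& x) const {
        double d = 0.0;
        for (std::size_t j = 0; j < w.size(); j++) d += w[j] * x[j];
        return d;
    }

    void train(const std::vector<std::vector<double>>& xs,
               const std::vector<int>& ys,
               double lambda = 0.01, double rate = 0.1, int epochs = 200) {
        w.assign(xs[0].size(), 0.0);
        for (int e = 0; e < epochs; e++) {
            for (std::size_t i = 0; i < xs.size(); i++) {
                double margin = ys[i] * (dot(xs[i]) + b);
                for (std::size_t j = 0; j < w.size(); j++) {
                    double grad = lambda * w[j];                 // regularizer
                    if (margin < 1.0) grad -= ys[i] * xs[i][j];  // hinge term
                    w[j] -= rate * grad;
                }
                if (margin < 1.0) b += rate * ys[i];
            }
        }
    }

    int predict(const std::vector<double>& x) const {
        return dot(x) + b >= 0.0 ? 1 : -1;
    }
};
```

One such classifier is trained per classification, and a query segment is assigned to the classifier reporting the largest margin.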
Production Mode
Production mode allows the end user to add an image and then review the most likely classifications produced by evaluating each image segment against the available set of classifiers. The end user is expected to review each segment and accept or reject the suggested classification. Aside from this additional functionality, production mode is nearly identical to training mode.
Conclusions
This project was time well spent. I met the objectives I laid out at the beginning and now have a better understanding of image processing algorithms and machine learning concepts from both a theoretical and a practical point of view.
Future Work
Segmentation
One issue with the existing implementation is that it over-segments the image. Ideally, fewer segments would be produced, more closely aligned with their conceptual classification. There are a number of popular alternatives to the approach taken, such as level set methods, which should be investigated further.
Classification
The approach of mapping scaled versions of the image segments to a feature space is simple to implement, but it did not assist well in the classification process. Alternative mappings, such as histogram models, should be evaluated in the future to decrease classification times and to determine whether classification error rates can be reduced.
System User Interface
While it was disappointing to have spent so much time building a data set only to have to limit what was considered, doing so pushed me to build a user interface that had to be easy and fast to use. The application can certainly be developed further and adapted to allow other data sets to be constructed, image segmentation methods to be added, and additional classifications to be evaluated.
System Scalability
The system is currently limited to a single machine. To grow and handle more classifications, it would need to be modified to run on multiple machines, have a web-based user interface developed, and use a database capable of handling the massive amount of data required to support a data set on the scale of Cover Browser's complete database or similar sites' databases (e.g., 450,000 comic book covers scaled linearly would require 546 GiB of storage), not to mention data center considerations for overall system availability and scalability.
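That projection can be sanity-checked with a little arithmetic; the 546 GiB figure implies roughly 1.2 MiB of stored data per cover:

```cpp
#include <cstdint>

// Linear scaling estimate from the post: 450,000 covers -> 546 GiB.
// Derive the implied per-cover footprint in bytes.
int64_t perCoverBytes() {
    const int64_t totalBytes = 546LL * 1024 * 1024 * 1024;  // 546 GiB
    const int64_t covers = 450000;
    return totalBytes / covers;  // roughly 1.24 MiB per cover
}
```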
References
Aly, Mohamed. Survey on Multiclass Classification Methods. [pdf] Rep. Oct. 2011. Caltech. 24 Aug. 2012.
Asmar, Nakhle H. Partial Differential Equations: With Fourier Series and Boundary Value Problems. 2nd ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2004. Print.
Bousquet, Olivier, Stephane Boucheron, and Gabor Lugosi. “Introduction to Statistical Learning Theory.” [pdf] Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2-14, 2003, Tübingen, Germany, August 4-16, 2003 (2004): 169-207. 7 July 2012.
Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization [pdf]. N.p.: Cambridge UP, 2004. Web. 28 June 2012.
Burden, Richard L., and J. Douglas Faires. Numerical Analysis. 8th ed. Belmont, CA: Thomson Brooks/Cole, 2005. Print.
Caruana, Rich, Nikos Karampatziakis, and Ainur Yessenalina. “An Empirical Evaluation of Supervised Learning in High Dimensions.” [pdf] ICML ’08 Proceedings of the 25th international conference on Machine learning (2008): 96-103. 2 May 2008. 6 June 2012.
Fukunaga, Keinosuke, and Patrenahalli M. Narendra. “A Branch and Bound Algorithm for Computing k-Nearest Neighbors.” [pdf] IEEE Transactions on Computers (1975): 750-53. 9 Jan. 2004. 27 Aug. 2012.
Gerlach, U. H. Linear Mathematics in Infinite Dimensions: Signals, Boundary Value Problems and Special Functions. Beta ed. 09 Dec. 2010. Web. 29 June 2012.
Glynn, Earl F. “Fourier Analysis and Image Processing.” [pdf] Lecture. Bioinformatics Weekly Seminar. 14 Feb. 2007. Web. 29 May 2012.
Gunn, Steve R. “Support Vector Machines for Classification and Regression” [pdf]. Working paper. 10 May 1998. University of Southampton. 6 June 2012.
Hlavac, Vaclav. “Fourier Transform, in 1D and in 2D.” [pdf] Lecture. Czech Technical University in Prague, 6 Mar. 2012. Web. 30 May 2012.
Hsu, Chih-Wei, Chih-Chung Chang, and Chih-Jen Lin. A Practical Guide to Support Vector Classification. [pdf] Tech. 18 May 2010. National Taiwan University. 6 June 2012.
Kibriya, Ashraf M. and Eibe Frank. “An empirical comparison of exact nearest neighbour algorithms.” [pdf] Proc 11th European Conference on Principles and Practice of Knowledge Discovery in Databases. (2007): 140-51. 27 Aug. 2012.
Marshall, A. D. “Vision Systems.” Vision Systems. Web. 29 May 2012.
Panigrahy, Rina. Nearest Neighbor Search Using Kd-trees. [pdf] Tech. 4 Dec. 2006. Stanford University. 27 Aug. 2012.
Pantic, Maja. “Lecture 11-12: Evaluating Hypotheses.” [pdf] Imperial College London. 27 Aug. 2012.
Platt, John C. “Fast Training of Support Vector Machines Using Sequential Minimal Optimization.” [pdf] Advances in Kernel Methods – Support Vector Learning (1999): 185-208. Microsoft Research. Web. 29 June 2012.
Sonka, Milan, Vaclav Hlavac, and Roger Boyle. Image Processing, Analysis, and Machine Vision. 2nd ed. CL-Engineering, 1998. 21 Aug. 2000. Web. 29 May 2012.
Szeliski, Richard. Computer vision: Algorithms and applications. London: Springer, 2011. Print.
Tan, Pang-Ning, Michael Steinbach, and Vipin Kumar. “Classification: Basic Concepts, Decision Trees, and Model Evaluation.” [pdf] Introduction to Data Mining. Addison-Wesley, 2005. 145-205. 24 Aug. 2012.
Vajda, Steven. Mathematical programming. Mineola, NY: Dover Publications, 2009. Print.
Welling, Max. “Support Vector Machines.” [pdf] 27 Jan. 2005. University of Toronto. 28 June 2012.
Weston, Jason. “Support Vector Machine (and Statistical Learning Theory) Tutorial.” [pdf] Columbia University, New York City. 7 Nov. 2007. 28 June 2012.
Zhang, Hui, Jason E. Fritts, and Sally A. Goldman. “Image Segmentation Evaluation: A Survey of Unsupervised Methods.” [pdf] Computer Vision and Image Understanding 110 (2008): 260-80. 24 Aug. 2012.
Copyright
Images in this post are used under §107(2) (Limitations on exclusive rights: Fair use) of Chapter 1 (Subject Matter and Scope of Copyright) of the Copyright Act of 1976, Title 17 of the United States Code.
Menger Sponge in C++ using OpenGL
This past summer I was going through some old projects and came across a Menger Sponge visualizer that I wrote back in college. A Menger Sponge is a simple fractal that has infinite surface area and encloses zero volume. The sponge is constructed in successive iterations, and the first four iterations are rendered in the video below.
The sponge starts as a single cube that is segmented into twenty-seven equally sized cubes. The center cube of each face, along with the cube at the center of the parent, is then discarded, and the process is applied again to each of the remaining twenty cubes. Visually, the process looks like the following:
The geometry of the process is straightforward. Starting with a cube’s origin, o, and edge length, e, each of the children’s attributes can be calculated. Each child’s edge length is e/3. Each child’s origin is given by o + (e/3)c, where the constant vector c, with components drawn from {-1, 0, 1}, represents a child’s position (e.g., (-1, 1, 1)) relative to its parent.
The following implementation isn’t particularly well written, but it accomplishes the desired end result. The point and Cube classes implement the logic outlined above. Cube can be thought of as a tree structure that is generated upon instantiation. The visualize() method emits the OpenGL commands to produce the faces of the cubes.
```cpp
#include <GL/glut.h>
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

//=============================================================================
// A point in space, optionally expressed as an offset from a reference point.
//=============================================================================
class point {
public:
    point(GLfloat x, GLfloat y, GLfloat z, point* ref = NULL);

    GLfloat x, y, z;
};

point::point(GLfloat x, GLfloat y, GLfloat z, point* ref) {
    this->x = x;
    this->y = y;
    this->z = z;

    if (ref != NULL) {
        this->x += ref->x;
        this->y += ref->y;
        this->z += ref->z;
    }
}

//=============================================================================
// A cube is either a leaf (eight corner points) or a container of twenty
// sub cubes, depending on the remaining recursion depth.
//=============================================================================
class Cube {
public:
    Cube(point* origin, GLfloat edgelength, GLfloat depth);
    ~Cube();

    void visualize();

private:
    void MakeFace(int i, int j, int k, int l);
    void ActAsContainer(point* o, GLfloat e, GLfloat d);
    void ActAsCube(point* o, GLfloat e);

    point** PointCloud;
    Cube** SubCubes;
};

Cube::Cube(point* origin, GLfloat edgelength, GLfloat depth) {
    if (depth <= 1.0)
        ActAsCube(origin, edgelength);
    else
        ActAsContainer(origin, edgelength, depth);
}

Cube::~Cube() {
    int i;
    if (PointCloud != NULL) {
        for (i = 0; i < 8; i++)
            delete PointCloud[i];
        delete[] PointCloud;
    }
    if (SubCubes != NULL) {
        for (i = 0; i < 20; i++)
            delete SubCubes[i];
        delete[] SubCubes;
    }
}

void Cube::ActAsCube(point* o, GLfloat e) {
    GLfloat ne = e / 2.0;
    PointCloud = new point*[8]; // the actual physical cube coordinates
    SubCubes = NULL;

    PointCloud[0] = new point( ne,  ne,  ne, o); // net
    PointCloud[1] = new point( ne, -ne,  ne, o); // set
    PointCloud[2] = new point(-ne,  ne,  ne, o); // nwt
    PointCloud[3] = new point(-ne, -ne,  ne, o); // swt
    PointCloud[4] = new point( ne,  ne, -ne, o); // neb
    PointCloud[5] = new point( ne, -ne, -ne, o); // seb
    PointCloud[6] = new point(-ne,  ne, -ne, o); // nwb
    PointCloud[7] = new point(-ne, -ne, -ne, o); // swb
}

void Cube::ActAsContainer(point* o, GLfloat e, GLfloat d) {
    GLfloat ne = e / 3.0;
    SubCubes = new Cube*[20]; // these are the centers of each sub cube
    PointCloud = NULL;

    SubCubes[0]  = new Cube(new point(-ne,  ne,  ne, o), ne, d - 1.0);
    SubCubes[1]  = new Cube(new point(0.0,  ne,  ne, o), ne, d - 1.0);
    SubCubes[2]  = new Cube(new point( ne,  ne,  ne, o), ne, d - 1.0);
    SubCubes[3]  = new Cube(new point( ne, 0.0,  ne, o), ne, d - 1.0);
    SubCubes[4]  = new Cube(new point( ne, -ne,  ne, o), ne, d - 1.0);
    SubCubes[5]  = new Cube(new point(0.0, -ne,  ne, o), ne, d - 1.0);
    SubCubes[6]  = new Cube(new point(-ne, -ne,  ne, o), ne, d - 1.0);
    SubCubes[7]  = new Cube(new point(-ne, 0.0,  ne, o), ne, d - 1.0);
    SubCubes[8]  = new Cube(new point( ne,  ne, 0.0, o), ne, d - 1.0);
    SubCubes[9]  = new Cube(new point( ne, -ne, 0.0, o), ne, d - 1.0);
    SubCubes[10] = new Cube(new point(-ne,  ne, 0.0, o), ne, d - 1.0);
    SubCubes[11] = new Cube(new point(-ne, -ne, 0.0, o), ne, d - 1.0);
    SubCubes[12] = new Cube(new point(-ne,  ne, -ne, o), ne, d - 1.0);
    SubCubes[13] = new Cube(new point(0.0,  ne, -ne, o), ne, d - 1.0);
    SubCubes[14] = new Cube(new point( ne,  ne, -ne, o), ne, d - 1.0);
    SubCubes[15] = new Cube(new point( ne, 0.0, -ne, o), ne, d - 1.0);
    SubCubes[16] = new Cube(new point( ne, -ne, -ne, o), ne, d - 1.0);
    SubCubes[17] = new Cube(new point(0.0, -ne, -ne, o), ne, d - 1.0);
    SubCubes[18] = new Cube(new point(-ne, -ne, -ne, o), ne, d - 1.0);
    SubCubes[19] = new Cube(new point(-ne, 0.0, -ne, o), ne, d - 1.0);
}

void Cube::MakeFace(int i, int j, int k, int l) {
    glVertex3f(PointCloud[i]->x, PointCloud[i]->y, PointCloud[i]->z);
    glVertex3f(PointCloud[j]->x, PointCloud[j]->y, PointCloud[j]->z);
    glVertex3f(PointCloud[k]->x, PointCloud[k]->y, PointCloud[k]->z);
    glVertex3f(PointCloud[l]->x, PointCloud[l]->y, PointCloud[l]->z);
}

void Cube::visualize() {
    int i;
    if (PointCloud != NULL) {
        glBegin(GL_QUADS);
        glColor3f(1.0, 0.0, 0.0); // top
        MakeFace(0, 2, 3, 1);
        glColor3f(0.0, 1.0, 1.0); // bottom
        MakeFace(4, 6, 7, 5);
        glColor3f(0.0, 1.0, 0.0); // north
        MakeFace(0, 2, 6, 4);
        glColor3f(1.0, 0.0, 1.0); // south
        MakeFace(1, 3, 7, 5);
        glColor3f(0.0, 0.0, 1.0); // east
        MakeFace(0, 4, 5, 1);
        glColor3f(1.0, 1.0, 0.0); // west
        MakeFace(2, 6, 7, 3);
        glEnd();
    }
    if (SubCubes != NULL) {
        for (i = 0; i < 20; i++)
            SubCubes[i]->visualize();
    }
}
```
The rest of the program is your run-of-the-mill OpenGL boilerplate. The application takes an argument dictating what order of sponge it should produce. It sets up the camera and positions the sponge at the origin. The sponge is left stationary, while the camera is made to orbit a little further upon each display(). On idle(), a redisplay message is sent back to the OpenGL system to achieve the effect of the sponge spinning.
```cpp
//=============================================================================
Cube* MengerCube;

void idle() {
    glutPostRedisplay();
}

void display() {
    static GLfloat rtri = 0.0;
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(1.0, 1.0, 1.0,  0.0, 0.0, 0.0,  0.0, 1.0, 0.0);
    glRotatef((rtri += 0.932), 1.0, 0.5, -1.0);
    MengerCube->visualize();
    glutSwapBuffers();
}

void reshape(int w, int h) {
    glViewport(0, 0, w, h);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(-8.0, 8.0, -8.0, 8.0, -8.0, 8.0);
}

void init() {
    glShadeModel(GL_SMOOTH);
    glClearColor(0.0, 0.0, 0.0, 0.0);
    glClearDepth(1.0f);
    glEnable(GL_DEPTH_TEST);
    glColor3f(1.0, 1.0, 1.0);
}

// Clamp the requested sponge order to the range [1, 5].
GLfloat getDepth(char* depth) {
    int k = atoi(depth);
    if (k <= 1)      return 1.0;
    else if (k >= 5) return 5.0;
    else             return (GLfloat) k;
}

int main(int argc, char* argv[]) {
    GLfloat depth;
    point origin(0.0, 0.0, 0.0);

    switch (argc) {
    case 2:
        depth = getDepth(argv[1]);
        break;
    default:
        depth = 2.0;
        break;
    }
    MengerCube = new Cube(&origin, 8.0, depth);

    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    glutInitWindowSize(500, 500);
    glutInitWindowPosition(0, 0);
    glutCreateWindow(*argv);
    glutReshapeFunc(reshape);
    glutDisplayFunc(display);
    glutIdleFunc(idle);
    init();
    glutMainLoop(); // never returns; MengerCube is reclaimed at process exit
}
```
Tree Drawing: Force-based Algorithm
I’ve written a couple of times on different data visualization techniques for viewing hierarchical information. One method that I haven’t discussed is the force-based approach. After digging around a bit, it looks like this particular electromagnetism-like formulation comes from Birbil’s and Fang’s 2003 publication [pdf], “An Electromagnetism-like Mechanism for Global Optimization”. In terms of tree drawing, the idea is fairly simple: treat the tree as a dynamical system composed of springs and point charges, apply the appropriate physical laws, and over time you will get a nicely laid out tree. My implementation in the clip below demonstrates this approach.
Let’s think about why we want a dynamical system. An idealized tree drawing is aesthetically pleasing and compact, its nodes are uniformly distributed, and edges between nodes do not cross. Under the proposed system, we can’t guarantee these criteria, but we can get “close enough” to produce something that is both nice on paper and sensible in practice. Treating the nodes as point charges makes the nodes repel one another. The magnitude of the repulsion is determined by Coulomb’s Law, which states that the force applied to a point charge by another point charge follows an inverse square law. If all we had was Coulomb’s Law, the nodes would fly away from one another and finally come to rest spread too distantly across the screen. To alleviate this, we can treat the edges between nodes as springs. Following Hooke’s Law, we can tether the nodes together to preserve the compactness criterion. The magnitude of this force is linear with respect to the distance between the nodes.
Let’s jump into some quick physics. To determine the net force applied to a given node, we need to calculate the spring force applied to the node, and then all the point charge forces applied to it. The key to getting this right is making sure that we get the direction and magnitude of the forces correct. To keep the direction straight, let d_ij = x_i - x_j be the vector between the i’th and j’th nodes (x_i representing the i’th node’s location), and let n_ij = d_ij / ||d_ij|| be the normalized vector representing the direction the force needs to be applied. For the spring force we are looking at F_ij = -k (||d_ij|| - r) n_ij, where k is the spring constant, r is the length of the spring at rest, and F_ij is the force between the i’th and j’th nodes. For the point charge we are interested in F_ij = k_e (q_i q_j / ||d_ij||^2) n_ij, where q_i is the charge of the i’th node and k_e is the Coulomb constant. (I default all constants to 1.0 in my implementation.)
We’ve got the net force on each node figured out, so now we need to figure out the velocity and location of each node. Recall that by Newton’s second law, a force is the product of a mass and an acceleration, F = ma (taking each node’s mass to be 1, the net force is the acceleration). Knowing the acceleration, we can integrate to get the velocity and location each iteration. Thus, if a node has current velocity v and acceleration a, its next velocity will be v' = a Δt + v; if it has current location x, its next location will be x' = a Δt²/2 + v Δt + x. Once we’ve determined these values for a time step Δt, we need to remember to zero out the net force on each node in our implementation. (I chose a value of 0.1 for my time step.) One important aspect of this algorithm is the need to apply dampening to the velocity over time; otherwise the structure continues to move through space, and we run the risk of running out of numbers on the machine, resulting in an overflow error.
Implementing all of this, you’ll find the time complexity of the algorithm is on the order of O(n²) per iteration. The quadratic term comes from applying Coulomb’s law. By Newton’s third law (equal and opposite forces), you can reduce the number of pairs evaluated to n(n-1)/2, but for significantly sized n you are still looking at O(n²). You could improve on this by applying a spatial partitioning data structure (such as an R-tree or one of its variants) and only considering those nodes within a certain radius when calculating the Coulomb forces. Testing my implementation (using Lists), I found things get a little sluggish for a real-time render around 100 nodes on my 2.4 GHz laptop. Depending on your application (domain and implementation), your mileage may vary. Following are some test cases I used in writing my solution.
| Fan | Flat |
|---|---|
| Each node has a number of children equal to its number of siblings minus one | Tree containing one level of nodes |

| Complete Binary | Random |
|---|---|
| Every node has exactly two children | Randomly generated tree of depth four |
I chose to implement my solution in C# using the Microsoft .NET 3.5 Framework (as a side note, 4.0 was recently released on the 12th of April). I went with my custom mathematics library and the WinForms library for displaying and end-user manipulation of the algorithm’s variables (in the future I would like to migrate this over to WPF). The display is a vanilla UserControl that contains setters for the tree data structure, the algorithm to apply to the structure, and the settings for that algorithm. Upon setting the tree, a Timer instance kicks off (non-reentrant) to invoke the algorithm below every 50 ms. To get the tree to display correctly on the control, the algorithm calculates the boundary of the nodes, and the control then performs a linear map from model space to view space. The first implementation used the DoubleBuffered property of the UserControl, but I found it ineffective in reducing flickering, so I implemented custom double buffering using the BufferedGraphicsContext class. It’s worth noting that most implementations track the kinetic energy of the system to determine when to terminate the algorithm. I chose not to do this, as I didn’t find value in adding the additional calculation.
```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using Library.Mathematics;

namespace ForceBasedTreeLayout {
    public class ForceBasedTreeLayout {
        private LayoutSettings settings;

        public ForceBasedTreeLayout(LayoutSettings settings) {
            this.settings = settings;
        }

        public RectangleF Evaluate(Node root) {
            List<Node> nodes = new List<Node>();
            foreachNode(root, (x) => nodes.Add(x));

            // Apply Hooke's law: F = -k x
            foreachNode(root, (parent) => {
                parent.Children.ForEach((child) => {
                    Vector dist = parent.Location - child.Location;
                    Vector restingDist = settings.MinimumSpringLength * dist.Normalize();
                    Vector force = -settings.SpringConstant * (dist - restingDist);
                    parent.Acceleration += force;
                    child.Acceleration -= force;
                });
            });

            // Apply Coulomb's law: F = q1 q2 / d^2
            // (each pair is visited once; Newton's third law gives the reaction)
            for (int n = 0; n < nodes.Count; n++) {
                for (int m = n + 1; m < nodes.Count; m++) {
                    Vector dist = nodes[n].Location - nodes[m].Location;
                    Vector norm = dist.Normalize();
                    Vector force = new Vector(2, (i) => norm[i] / Math.Pow(dist.Norm() + 1.0, 2.0));
                    nodes[n].Acceleration += force;
                    nodes[m].Acceleration -= force;
                }
            }

            Vector xExtrema = new Vector(2);
            xExtrema[0] = double.MaxValue;
            xExtrema[1] = double.MinValue;
            Vector yExtrema = new Vector(2);
            yExtrema[0] = double.MaxValue;
            yExtrema[1] = double.MinValue;

            // Update the locations & velocities and find the new bounding region.
            foreach (Node node in nodes) {
                // p = a0 t^2 / 2 + v0 t + p0
                node.Location = (settings.TimeStep * settings.TimeStep * 0.5) * node.Acceleration
                              + (settings.TimeStep) * node.Velocity
                              + node.Location;

                // v = a t + v0, then dampen
                node.Velocity = (settings.TimeStep) * node.Acceleration + node.Velocity;
                node.Velocity = (settings.VelocityDampening) * node.Velocity;

                // Zero out the net force for the next iteration.
                node.Acceleration = new Vector(2, (i) => 0.0);

                xExtrema[0] = Math.Min(xExtrema[0], node.Location[0]);
                xExtrema[1] = Math.Max(xExtrema[1], node.Location[0]);
                yExtrema[0] = Math.Min(yExtrema[0], node.Location[1]);
                yExtrema[1] = Math.Max(yExtrema[1], node.Location[1]);
            }

            // Pad the bounding region so the tree isn't flush against the edges.
            RectangleF R = new RectangleF();
            R.X = (float)xExtrema[0];
            R.Y = (float)yExtrema[0];
            R.Width = (float)(xExtrema[1] - xExtrema[0]);
            R.Height = (float)(yExtrema[1] - yExtrema[0]);
            R.X -= R.Width / 2;
            R.Y -= R.Height / 2;
            R.Width *= 2;
            R.Height *= 2;
            return R;
        }

        private void foreachNode(Node root, Action<Node> act) {
            Stack<Node> stack = new Stack<Node>();
            stack.Push(root);
            while (stack.Count > 0) {
                Node node = stack.Pop();
                act(node);
                node.Children.ForEach((x) => stack.Push(x));
            }
        }
    }
}
```
Edit: 2010-10-21
By popular demand, here is the vector class:
```csharp
using System;

// Delegate used to initialize each component of a new vector.
public delegate double VectorInitializer(int n);

public class Vector {
    private double[] V;

    public int Dimension {
        get { return V.Length; }
    }

    public double this[int n] {
        get {
            if (n < 0 || n >= Dimension)
                throw new Exception(string.Format("{0} must be between 0 and {1}", n, Dimension));
            return V[n];
        }
        set {
            if (n < 0 || n >= Dimension)
                throw new Exception(string.Format("{0} must be between 0 and {1}", n, Dimension));
            V[n] = value;
        }
    }

    public Vector() : this(0) { }

    public Vector(int n) {
        V = new double[n];
    }

    public Vector(int n, VectorInitializer initializer) : this(n) {
        for (int i = 0; i < Dimension; i++)
            V[i] = initializer(i);
    }

    public double dot(Vector y) {
        if (Dimension != y.Dimension)
            throw new Exception();
        double d = 0.0;
        for (int n = 0; n < Dimension; n++)
            d += this[n] * y[n];
        return d;
    }

    // Euclidean length and unit vector; both are used by the layout code.
    public double Norm() {
        return Math.Sqrt(dot(this));
    }

    public Vector Normalize() {
        return (1.0 / Norm()) * this;
    }

    public override bool Equals(object obj) {
        Vector x = obj as Vector;
        if (x == null || x.Dimension != Dimension)
            return false;
        for (int n = 0; n < Dimension; n++)
            if (this[n] != x[n])
                return false;
        return true;
    }

    public override int GetHashCode() {
        int h = 0;
        for (int n = 0; n < Dimension; n++)
            h ^= V[n].GetHashCode();
        return h;
    }

    static public Vector operator +(Vector x, Vector y) {
        if (x.Dimension != y.Dimension)
            throw new Exception();
        Vector z = new Vector(x.Dimension);
        for (int n = 0; n < z.Dimension; n++)
            z[n] = x[n] + y[n];
        return z;
    }

    static public Vector operator -(Vector x, Vector y) {
        if (x.Dimension != y.Dimension)
            throw new Exception();
        Vector z = new Vector(x.Dimension);
        for (int n = 0; n < z.Dimension; n++)
            z[n] = x[n] - y[n];
        return z;
    }

    static public double operator *(Vector x, Vector y) {
        return x.dot(y);
    }

    static public Vector operator *(double c, Vector x) {
        Vector y = new Vector(x.Dimension);
        for (int n = 0; n < y.Dimension; n++)
            y[n] = c * x[n];
        return y;
    }
}
```
Tree Drawing: The radial tree
Although not as interesting as a sunburst diagram, the radial tree view can hold its own against a number of more primitive information visualizations. A radial tree view places the root at the center of the screen and then fans out each child node. Each child node in turn fans out its children within a restricted span, and so on until each leaf is reached. The strength of the technique is an easy-to-digest depiction of the structure behind the data in a compact space. A common application is visualizing computer networks. It is worthwhile to examine the algorithm behind the technique because it is an exercise in identifying simplicity.
While in college, I would have approached this problem by trying to identify the location of nodes directly in terms of (r, θ); after all, I want a radial tree view, so it makes sense to sprint out the gate with a polar system, right? While possible, this is a bad path to head down, as you end up drowning in a sea of extraneous details. Rather, it is better to think in terms of (x, y) and then map to (r, θ). To clarify that position, let’s think about how we’d go about drawing the run-of-the-mill tree view as in the figure below:
First some observations:
- Every node at a given depth lies on the same line.
- Every child at a given depth is given an equal share of the horizontal space owned by its parent, independent of necessity.
With these two facts in mind, we can construct a simple recursive definition for drawing the tree. Given a node, center the node at the top of its region, carve the region up into as many equally wide sub-regions as the node has children (each the same height as the parent’s region minus a layering distance), then draw a line from the parent node to each child node. Continue doing so until all of the nodes have been drawn. All that remains is mapping this tree to the radial tree view below:
To achieve this last step, we want to map each node at (x, y) to a point (r cos θ + c_x, r sin θ + c_y), where (c_x, c_y) is the center of the display area. The radius r is simply the node’s present y coordinate. θ can be determined as the ratio between the node’s present x coordinate and the display width, times 2π. And thus, the mapping is complete.
Tree Drawing: Sunburst diagram
Information Visualization is an interesting subject for me because it is about the aesthetics of distilling large amounts of information into an easy-to-digest depiction of the metrics of interest. One branch of the subject deals with methods for visualizing hierarchical information; techniques such as tree maps, hyperbolic tree views, et al. are typically deployed. Among these methods, one less common option is the sunburst diagram.
A sunburst diagram is essentially the polar form of an icicle visualization. At the core is the root of the tree; each concentric ring represents a level of child nodes and is partitioned by the metric of our choosing to represent the percentage a node consumes relative to its siblings. Sunburst diagrams are ideal for displaying any kind of tree data where nodes have weights and the totality of the nodes represents a whole.
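The angular bookkeeping at the heart of the technique is simple to sketch. In the C++ below (the Arc structure and its weight field are assumptions for illustration), each node’s angular span is divided among its children in proportion to their weights, and the depth picks the concentric ring to draw on:

```cpp
#include <cmath>
#include <vector>

struct Arc {
    double weight = 1.0;      // metric chosen for partitioning
    double startAngle = 0.0;  // in [0, 1), as a fraction of the full circle
    double endAngle = 1.0;
    int depth = 0;            // ring index; 0 is the root disc
    std::vector<Arc> children;
};

// Partition each node's angular span among its children in proportion
// to their weights, recursing outward one ring at a time.
void assignArcs(Arc& node) {
    double total = 0.0;
    for (const Arc& c : node.children)
        total += c.weight;

    double angle = node.startAngle;
    for (Arc& c : node.children) {
        double share = (node.endAngle - node.startAngle) * (c.weight / total);
        c.startAngle = angle;
        c.endAngle = angle + share;
        c.depth = node.depth + 1;
        angle = c.endAngle;
        assignArcs(c);
    }
}
```

Rendering then reduces to drawing one filled annular sector per node, from startAngle to endAngle at radius proportional to depth.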
While there are certainly several existing software solutions out on the market that utilize this information visualization technique, I feel that the majority of which are too focused on solving one specific problem rather than stepping back to identify the general problem that sunburst diagrams solve. As a result, I find it appropriate to develop my own software solution.
I want to try and incorporate some of the following features that I feel are lacking in other applications:
- Ability to visualize any XML document by way of import and option to export
- Choose to visualize metrics based on the attributes defined on each node of an XML document
- Capacity to search and filter information in the document
- Freedom to navigate the tree in an intuitive manner
- A clean and sharp looking user interface that is easy on the eyes
This project really boils down to a Swiss Army knife of simple data analysis tools. Depending on my schedule this project may or may not become a reality but could turn out to be useful for a variety of problems. As the project grows and matures I’ll try to post updates as appropriate.