Wednesday, October 22, 2014

Combining Corners from Multiple Segmenters



Bibliography Information:
Wolin, Field, Hammond. "Combining Corners from Multiple Segmenters." SBIM '11: Proceedings of the Eighth Eurographics Symposium on Sketch-Based Interfaces and Modeling, pages 117-124. ACM, New York, NY, USA, 2011.

URL:
http://dl.acm.org/citation.cfm?id=2021164.2021185&coll=DL&dl=GUIDE

This paper seeks to vastly improve the corner-finding process by combining the corners found by five different segmentation algorithms and deciding which candidate points to keep and which to discard using a variation of feature subset selection. The algorithm uses sequential floating backward selection (as opposed to SFS, sequential forward selection) together with a mean-squared error (MSE) metric to find the best subset of candidate corners, which become the final reported corners. This combination reaches an all-or-nothing accuracy of 0.926.

Corner finding, also known as "segmentation", "fragmentation", or "cusp detection", is a vital aspect of sketch recognition. As the "segmentation" name implies, it allows an algorithm to divide one polyline into a variable number of straight segments, which can then be recognized and processed individually. From there, it is feasible to recognize lines, arcs, curves, ellipses, spirals, and helices. Existing corner-finding algorithms each focus on one technique, such as curvature detection or a combination of speed and curvature. Each technique has considerable drawbacks, however: over-segmentation (finding too many corners), under-segmentation (finding too few corners), and incorrect segmentation (finding the wrong corners). Another issue is the use of thresholds, which are typically determined empirically and are far less flexible than parameters trained from data. The paper invokes the "No Free Lunch" theorem: the more an algorithm is tuned toward one particular area, the more focus (and performance) it gives up in another. The basic idea of this paper is that by combining five major corner-finding algorithms, the shortcomings of any one recognizer can be offset by the strengths of the others.

The five segmenters used are ShortStraw, Douglas-Peucker, PaleoSketch, Sezgin's segmentation algorithm, and Kim et al.'s. ShortStraw, Douglas-Peucker, and PaleoSketch rely on line tests to determine whether the segment between two identified corners is a line, while Sezgin's and Kim's algorithms are multiple-primitive segmenters that split strokes into lines, curves, and arcs. The corners from all five are pooled, with duplicates removed, and the pooled set is then fed to a feature subset selection procedure dubbed "corner subset selection".
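As a rough illustration of the pooling step, the sketch below merges the corner indices returned by several segmenters and collapses near-duplicates. The tolerance used to treat two indices as the same corner is an assumption for illustration, not a value from the paper.

    # Hypothetical sketch: pooling candidate corners from several segmenters.
    # Each segmenter returns corner locations as indices into the stroke's point list;
    # corners closer than `tol` points apart are treated as duplicates.
    def combine_corners(corner_sets, tol=2):
        pooled = sorted(idx for corners in corner_sets for idx in corners)
        combined = []
        for idx in pooled:
            if not combined or idx - combined[-1] > tol:
                combined.append(idx)
        return combined

    # Example: candidate corners for one stroke from three different segmenters.
    candidates = [[0, 14, 30], [0, 15, 30], [0, 14, 22, 30]]
    print(combine_corners(candidates))   # [0, 14, 22, 30]  (15 merged into 14)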

Feature subset selection is a technique for removing unneeded dimensions from the classification process that contribute only noise. In this case, each candidate corner point acts as a "feature". If removing a point changes performance by only 1%, it is discarded, since it likely provides little useful information. Conversely, if removing a point degrades performance by 20%, it is an important corner and is preserved.

This algorithm uses sequential floating backward selection to sift through the initial corner data, which differs from the more traditional sequential forward selection in that it allows the algorithm to reconsider previously discarded points. Say, for instance, we remove feature Fs; in a following step, the algorithm considers whether adding Fs back into the subset yields better performance. Bookkeeping techniques prevent the same Fs from being added and removed back and forth in subsequent steps long after it has been shown to be irrelevant to the subset.
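A minimal sketch of what this corner subset selection might look like is below, assuming an error(corners) function that returns the MSE of the fit implied by a candidate corner subset. The penalty threshold, the iteration guard, and the exact accept/reject criteria are assumptions for illustration rather than details taken from the paper.

    # Hedged sketch of sequential floating backward selection over candidate corners.
    def corner_subset_selection(corners, error, penalty=0.01, max_iter=100):
        subset = list(corners)
        removed = []
        for _ in range(max_iter):          # simple bookkeeping guard against oscillation
            changed = False
            # Backward step: drop the corner whose removal hurts the fit least,
            # provided the error increase stays under the penalty threshold.
            base = error(subset)
            drops = [(error([x for x in subset if x != c]), c) for c in subset]
            if drops:
                best_err, best_drop = min(drops)
                if best_err - base < penalty:
                    subset.remove(best_drop)
                    removed.append(best_drop)
                    changed = True
            # Floating step: reconsider previously removed corners if adding one back
            # clearly improves the fit.
            for c in list(removed):
                trial = sorted(subset + [c])
                if error(trial) < error(subset) - penalty:
                    subset = trial
                    removed.remove(c)
                    changed = True
            if not changed:
                break
        return subset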

To determine the performance of a candidate subset, the mean-squared error (MSE) is calculated between the original stroke and an "ideal" shape created by linking the corners:

    MSE = (1/N) * Σ_{i=1..N} ||p_i - opt_i||²

where p_i is the i-th point of the original stroke, opt_i is the point on the ideal shape vertically closest to p_i, and N is the number of points in the original stroke.
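As a rough illustration, the sketch below computes this MSE by taking opt_i to be the nearest point (in plain Euclidean distance) on the polyline linking the candidate corners; the paper's exact correspondence between p_i and opt_i may differ, so treat the distance choice as an assumption.

    import math

    def point_segment_dist(p, a, b):
        # Distance from point p to the line segment a-b.
        (ax, ay), (bx, by), (px, py) = a, b, p
        dx, dy = bx - ax, by - ay
        if dx == 0 and dy == 0:
            return math.hypot(px - ax, py - ay)
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
        return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

    def mse(stroke, corner_indices):
        # `stroke` is a list of (x, y) points; `corner_indices` index into it.
        ideal = [stroke[i] for i in corner_indices]      # vertices of the ideal polyline
        segs = list(zip(ideal, ideal[1:]))
        total = 0.0
        for p in stroke:
            d = min(point_segment_dist(p, a, b) for a, b in segs)
            total += d * d
        return total / len(stroke)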

Because MSE values change with the scale of the shape, the change in MSE between one subset and the next is normalized before comparison. Overall, using this technique the combined accuracy was found to be much higher than that of any individual segmenter, and the added run time was not significant enough to affect performance.

I think the paper was well presented and explained. I feel the evaluation section could have been elaborated upon more, though it is possible that user testing was not needed given that accuracy was measured on an all-or-nothing basis.

Tuesday, October 21, 2014

Sketch Based Interfaces: Early Processing for Sketch Understanding


Bibliographical Information:
Sezgin, Stahovich, Davis. "Sketch Based Interfaces: Early Processing for Sketch Understanding." Proceedings of PUI 2001. ACM Press, New York, NY, 2001.

URL:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.8291

This paper is one of the earlier efforts at developing corner-finding and curve recognition algorithms. The broader intention is to recognize shape primitives in a way that makes geometric sense: drawn squares are converted into geometric squares that a second system can then understand. The paper describes the multiple processes used to recognize freehand sketches.

There are three phases to this "early processing" approach to sketch recognition: Approximation, Beautification, and Basic Recognition.

Approximation
Fits lines and curves to a given set of sampled points. This is intended to streamline the data so that the geometric properties of a given freehand shape are easier to identify.

The first process in this phase is vertex detection, which is essentially what more modern sketch recognition systems call "corner finding". The paper uses three sources of information to identify vertices: direction, curvature (change in direction with respect to arc length), and speed data. Interestingly, speed data was largely considered irrelevant in basic sketch recognition when it was introduced by Rubine, and Long in fact removed it entirely from his modified feature set.

Vertex detection uses the extrema of curvature and speed, and relies on the observation that maxima in curvature and minima in speed are correlated. We cannot, for instance, simply take a minimum in speed to mean the user was drawing a corner, even though users typically slow the pen at corners; there must also be a change in curvature, or any abrupt slow-down on a straight line would be counted as a "corner". Even using both attributes at once yields false positives, so the paper applies a technique called "average based filtering", which looks for extrema only within the portions of the curvature and speed data that lie beyond a threshold based on the mean of each data set. These thresholds divide the data into regions, and the local and global maxima/minima within those regions are used to weed out false positives.

From there, the paper generates hybrid fits that combine the two sources to identify true vertices: it computes vertex certainties, generates a set of candidate hybrid fits, and selects the best fit among them.
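A minimal sketch of the average-based filtering idea is below: it keeps only the single extremum of each contiguous region where the signal lies beyond a mean-based threshold. The threshold multipliers are assumptions; the paper tunes its curvature and speed thresholds separately.

    # Hedged sketch of average-based filtering on a per-point signal (curvature or speed).
    def average_based_filter(values, above=True, scale=1.0):
        threshold = scale * sum(values) / len(values)
        candidates = []
        i = 0
        while i < len(values):
            # Find a contiguous region beyond the threshold.
            if (values[i] > threshold) if above else (values[i] < threshold):
                j = i
                while j < len(values) and ((values[j] > threshold) if above else (values[j] < threshold)):
                    j += 1
                region = range(i, j)
                # Keep the single extremum of the region as a vertex candidate.
                best = max(region, key=lambda k: values[k]) if above else min(region, key=lambda k: values[k])
                candidates.append(best)
                i = j
            else:
                i += 1
        return candidates

    # curvature_candidates = average_based_filter(curvature, above=True)
    # speed_candidates     = average_based_filter(speed, above=False, scale=0.9)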

Beautification
This is the process in which small alterations are made to the input sketch. For instance, lines that were clearly meant to be parallel (e.g. the sides of a rectangle) are adjusted so that they are geometrically parallel, and imperfectly drawn circles are given geometrically accurate diameters. Curves are approximated as several small, geometrically accurate elliptical arcs pieced together to form one larger curve. The intended slope of a line is determined as a weighted average of the slopes of each segmented section of the shape, to estimate what the user meant while drawing it.
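As a small illustration of the slope-averaging idea, the sketch below estimates a line's intended angle as the length-weighted average of the angles of its pieces. The length weighting, and the assumption that the pieces do not straddle the angle wrap-around at ±π, are simplifications for illustration only.

    import math

    def intended_angle(segments):
        # segments: list of ((x1, y1), (x2, y2)) pieces belonging to one intended line.
        total_len, weighted = 0.0, 0.0
        for (x1, y1), (x2, y2) in segments:
            length = math.hypot(x2 - x1, y2 - y1)
            weighted += length * math.atan2(y2 - y1, x2 - x1)
            total_len += length
        return weighted / total_len   # assumes at least one non-degenerate piece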

Basic Recognition
Objects in the user's input sketch are classified into lines and curves, and then further into a set of basic geometric shapes such as ovals, circles, squares, and rectangles. These classifications are hand-tailored from the information produced by the paper's algorithm: using the sections delimited by the vertices found in the approximation phase, template matching checks whether each section satisfies the geometric rules for each shape.

To evaluate this system, thirteen students used both it and a Unix tool called Xfig, which supports freehand sketching but requires users to indicate which sections of a sketch are meant to be straight lines, curves, and so on. Users were given the chance to get comfortable with both systems, and their experiences were documented with qualitative analyses. The students expressed approval of the automated recognition in the new system, and all but one preferred it to Xfig. Additionally, higher-level recognizers were written to classify and beautify additional shapes, and the paper reports ongoing work toward supporting overtracing.

Overall, I felt this paper provides good information on early efforts toward vertex recognition and on using vertices to construct a roughly hierarchical recognition system that identifies curves and lines to eventually produce geometrically correct shapes. Some of the evaluation, particularly the user study, was unclear and could have been expanded. For instance, what happened when the system presented an incorrect interpretation to the user? Was the user expected to redraw the shape? Was the shortcoming communicated clearly? How did the user react? The only evaluation in the user study was whether participants preferred this system or Xfig and whether they would use it again, yet there is a significant amount of additional information that could be discerned from the user experience. Additionally, some of the algorithmic details were unclear and required significant explanation to fully understand.

Thursday, October 16, 2014

PaleoSketch: Accurate Primitive Sketch Recognition and Beautification



Bibliography Information:
Paulson, Hammond. "PaleoSketch: Accurate Primitive Sketch Recognition and Beautification." IUI '08: Proceedings of the 13th International Conference on Intelligent User Interfaces, pages 1-10. ACM, New York, NY, USA, 2008.

URL:
http://dl.acm.org/citation.cfm?id=1378775

PaleoSketch, a novel low-level shape recognizer, is one of the flagship projects of the Sketch Recognition Lab. It is a recognition and beautification system that recognizes eight primitive shapes, as well as combinations of those primitives, achieving a recognition rate of 98.56%, and it beautifies input sketches to aid classification at an early level. Much of this success comes from the inclusion of two new classification features: Normalized Distance between Direction Extremes (NDDE) and Direction Change Ratio (DCR).

To calculate NDDE, the point with the highest direction value and the point with the lowest direction value are recorded (direction being the angle of travel, i.e. the arctangent of the change in Y over the change in X). The stroke length between these two points is then divided by the total stroke length, which essentially gives the percentage of the stroke lying between the two extremes. Shapes such as arcs have been observed to yield particularly high NDDE values, while polylines with many "spikes" yield lower NDDE values.

DCR is used to tell polylines and curved strokes apart. It is similar to NDDE in that it also captures whether there are "spikes" in the drawn shape: DCR is the maximum change in direction divided by the average change in direction. Polylines typically have very high DCR values (and, as noted above, lower NDDE values), whereas smooth curves have low DCR values.
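A rough sketch of how these two features could be computed on a raw stroke is below. The resampling and direction-smoothing steps PaleoSketch performs beforehand (and handling of angle wrap-around) are omitted, so treat this as an approximation rather than the paper's exact procedure.

    import math

    # Hedged sketch of NDDE and DCR on a stroke given as a list of (x, y) points
    # (at least three points, with non-zero total length, are assumed).
    def ndde_dcr(points):
        # Direction of each consecutive segment and the arc length travelled so far.
        dirs, arc = [], [0.0]
        for (x1, y1), (x2, y2) in zip(points, points[1:]):
            dirs.append(math.atan2(y2 - y1, x2 - x1))
            arc.append(arc[-1] + math.hypot(x2 - x1, y2 - y1))
        total_len = arc[-1]

        # NDDE: stroke length between the points of highest and lowest direction value,
        # divided by total stroke length (arcs near 1.0, spiky polylines lower).
        hi = max(range(len(dirs)), key=lambda i: dirs[i])
        lo = min(range(len(dirs)), key=lambda i: dirs[i])
        ndde = abs(arc[hi] - arc[lo]) / total_len

        # DCR: maximum change in direction divided by the average change in direction
        # (polylines spike, so their DCR is high; smooth curves stay low).
        changes = [abs(b - a) for a, b in zip(dirs, dirs[1:])]
        dcr = max(changes) / (sum(changes) / len(changes))
        return ndde, dcr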

From there, PaleoSketch performs a series of classification tests, each with its own guidelines:

  • Line Test
  • Polyline Test
  • Ellipse Test
  • Circle Test
  • Arc Test
  • Spiral Test
  • Helix Test
  • Complex Test

After collecting two data sets (900 shapes from 10 users for training, and the same amount for testing), it was found that the correct shape appeared among the interpretations 99.89% of the time, but was the top interpretation only 98.56% of the time; polylines accounted for 1.44% of the error. Overall, the results rival those of state-of-the-art low-level recognizers that do not recognize as many primitives. These are promising results, as the recognized primitives can easily be fed into higher-level, more complex interpretations.

As previously mentioned, this paper proved to be one of the most important in the Sketch Recognition Lab, as it was among the most accurate and novel implementations of low-level sketch recognition at the time. The range of tests, and the fact that, for instance, a circle and an ellipse are treated as two distinct primitives, make it clear that the classification categories considered before this paper left significant room for improvement.

What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition


Bibliographical Information:
Paulson, Rajan, Davalos, Gutierrez-Osuna, Hammond. "What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition." Visual Languages and Human Centric Computing Workshop on Sketch Tools for Diagramming, Herrsching am Ammersee, Germany, September 5, 2008

URL:
http://srl.cse.tamu.edu/srlng/research/paper/23?from=/srlng/research/

At the time of this paper's publication, there were two main types of recognizers: gesture-based and geometric-based. The former analyzes a user's input gesture to recognize some low-level shape, but frequently ignores the resulting shape itself. The latter takes a single stroke and tries to classify it as one of several different primitives, which can then be combined hierarchically to recognize more complex shapes. Each has its own advantages and disadvantages, and the trade-offs typically mean that an application using one must forgo the advantages of the other. This paper focuses on combining the two, resulting in a set of features that largely does not conform to the standard set forth in the Rubine paper.

The paper uses a quadratic classifier over a combined set of geometric and gesture features. The initial set contained 44 features: 31 geometric-based, plus the 13 features proposed by Rubine. Tests were performed on 1800 sketch samples from 20 different users. The main goal was to see whether a statistical classifier could classify single-stroke sketched primitives using both geometric and gestural features, and whether such a system would produce results at least as good as PaleoSketch's.

Employing a greedy sequential forward selection (SFS) technique, the team selected a subset of the features. They ran SFS over 10 folds, and the best-performing features (a total of 14, mostly anything above 50% accuracy) were chosen. Using only 6 of these features, an accuracy of up to 93% was observed, suggesting that these few features carry most of the information required to correctly classify the vast majority of the input data. The most important observation is that only one gesture-based feature turned out to be statistically relevant to classification (total rotation), which incidentally is also used in PaleoSketch. This suggests the field may be moving beyond the Rubine features that had previously dominated the sketch recognition discussion.
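For a sense of what this pipeline looks like in practice, the sketch below uses scikit-learn stand-ins: a quadratic discriminant classifier wrapped in greedy forward feature selection with 10-fold cross-validation. The classifier choice, the synthetic placeholder data, and the stopping point of 14 features are stand-ins and assumptions, not the paper's exact implementation.

    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
    from sklearn.feature_selection import SequentialFeatureSelector

    # Placeholder data standing in for the paper's 1800 samples x 44 features
    # (31 geometric + 13 Rubine); real stroke features would be computed per sample.
    X = np.random.rand(1800, 44)
    y = np.random.randint(0, 8, size=1800)   # 8 primitive classes, purely illustrative

    qda = QuadraticDiscriminantAnalysis()
    sfs = SequentialFeatureSelector(qda, n_features_to_select=14, direction="forward", cv=10)
    sfs.fit(X, y)
    selected = sfs.get_support(indices=True)  # indices of the retained features
    qda.fit(X[:, selected], y)                # final classifier on the reduced feature set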

I found this a valuable paper, mostly because it contextualizes and puts into practice both geometric and gesture-based sketch recognition. By making a practical comparison of classification accuracy between the two types, the paper provides much-needed insight into whether the Rubine features remained closely relevant at this point in time.