Wednesday, October 22, 2014

Combining Corners from Multiple Segmenters



Bibliography Information:
Wolin, Field, and Hammond. "Combining Corners from Multiple Segmenters." SBIM '11: Proceedings of the Eighth Eurographics Symposium on Sketch-Based Interfaces and Modeling, pages 117-124. ACM, New York, NY, USA, 2011.

URL:
http://dl.acm.org/citation.cfm?id=2021164.2021185&coll=DL&dl=GUIDE

This paper seeks to improve the corner-finding process by combining the output of five different corner-finding algorithms and then deciding which of the pooled candidate points to keep and which to discard using a variation of feature subset selection. The algorithm uses sequential floating backward selection (as opposed to SFS, sequential forward selection) together with a mean-squared error (MSE) measure to find the best subset of candidates, which becomes the final set of corners. This combination reaches an all-or-nothing accuracy of 0.926.

Corner finding, also known as "segmentation", "fragmentation", or "cusp detection", is a vital aspect of sketch recognition. As the name "segmentation" implies, it allows an algorithm to divide one polyline into a varying number of straight lines, which can easily be recognized and processed individually. From there, it is feasible to recognize lines, arcs, curves, ellipses, spirals, and helices. Existing corner-finding algorithms each focus on one technique, such as curvature detection or a combination of speed and curvature. Each technique has considerable drawbacks, however: over-segmentation (finding too many corners), under-segmentation (finding too few corners), and incorrect segmentation (finding the wrong corners). Another issue is the use of thresholds, which are typically determined empirically but are far less flexible than parameters trained from data. The paper invokes the "No Free Lunch" theorem: the more an algorithm is tuned toward one particular area, the more focus (and performance) it loses in another. The basic idea of this paper is that by combining five major corner-finding algorithms, the shortcomings of one recognizer can be mitigated by the strengths of the next.

The five segmenters used are ShortStraw, Douglas-Peucker, PaleoSketch, Sezgin's segmentation algorithm, and Kim et al.'s. ShortStraw, Douglas-Peucker, and PaleoSketch rely on line tests to determine whether a segment between two identified corners is a line, while Sezgin's and Kim's algorithms are multiple-primitive segmenters that split strokes into lines, curves, and arcs. The results from all five are pooled, with duplicates removed, as sketched below. This algorithm then applies a feature subset selection technique dubbed "corner subset selection".
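To make the pooling step concrete, here is a minimal sketch (my own, not the paper's code) of how candidate corners from several segmenters might be unioned with duplicates removed; the index-based merge tolerance and the example outputs are illustrative assumptions.

# Sketch of pooling corner candidates from several segmenters.
# Assumes each segmenter returns point indices into the same stroke;
# the merge tolerance is an illustrative assumption, not a value from the paper.
def pool_corner_candidates(segmenter_outputs, tolerance=1):
    pooled = []
    for indices in segmenter_outputs:
        for idx in sorted(indices):
            # Treat indices within `tolerance` of an already-kept corner as duplicates.
            if all(abs(idx - kept) > tolerance for kept in pooled):
                pooled.append(idx)
    return sorted(pooled)

# Hypothetical outputs from three of the segmenters:
shortstraw = [0, 14, 30, 52]
douglas_peucker = [0, 15, 30, 52]
paleosketch = [0, 14, 29, 41, 52]
print(pool_corner_candidates([shortstraw, douglas_peucker, paleosketch]))
# -> [0, 14, 30, 41, 52]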

Feature subset selection is a technique for removing unneeded dimensions that only contribute noise to the classification process. In this case, each candidate corner point acts as a "feature". If removing a point reduces performance by only 1%, we discard it, since it is unlikely to provide useful information. Conversely, if removing a point reduces performance by 20%, it is an important point and we keep it.
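As a toy illustration of that criterion (with made-up error numbers and a hypothetical 5% cutoff, not figures from the paper):

# Toy illustration of the keep/discard criterion with made-up numbers.
baseline_error = 1.00          # fit error of the current corner subset (hypothetical)
error_without_corner = 1.01    # fit error after removing one candidate (hypothetical)

relative_increase = (error_without_corner - baseline_error) / baseline_error
if relative_increase < 0.05:   # small impact: the point adds little information
    decision = "discard"
else:                          # large impact (e.g. 20%): the point is essential
    decision = "keep"
print(decision)                # -> discard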

This algorithm uses sequential floating backward selection to sift through the initial corner candidates. It differs from the more traditional sequential forward selection in that previously discarded points can be reconsidered. Say, for instance, we remove feature Fs; in a following step we analyze whether adding Fs back into the subset yields better performance. Bookkeeping is employed so that the same Fs is not added and removed back and forth in subsequent steps long after it has been shown to be irrelevant to the subset.
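The sketch below shows how such a floating backward search over the pooled corners might look; the evaluate function, the relative-error threshold, and the bookkeeping details are assumptions of mine rather than the paper's exact implementation.

# Simplified sketch of sequential floating backward selection over pooled
# corner candidates. `evaluate` is assumed to map a corner subset to its fit
# error (e.g. MSE); the threshold is illustrative, not taken from the paper.
def floating_backward_selection(corners, evaluate, threshold=0.1):
    current = list(corners)
    tried = {tuple(current)}             # bookkeeping: subsets already visited
    error = evaluate(current)

    while len(current) > 2:
        # Backward step: find the corner whose removal hurts the fit the least.
        best = None
        for c in current:
            subset = [x for x in current if x != c]
            e = evaluate(subset)
            if best is None or e < best[0]:
                best = (e, subset)
        new_error, subset = best

        if new_error - error > threshold * max(error, 1e-9):
            break                        # every remaining corner is important
        current, error = subset, new_error
        tried.add(tuple(current))

        # Floating step: reconsider corners discarded earlier (like Fs above).
        for c in set(corners) - set(current):
            trial = tuple(sorted(current + [c]))
            if trial not in tried and evaluate(list(trial)) < error:
                current, error = list(trial), evaluate(list(trial))
                tried.add(trial)
    return sorted(current)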

To determine the performance of the system, the mean-squared error (MSE) is calculated between the original stroke points and an "ideal" shape created by connecting the corners. The MSE is calculated as:

MSE = (1/N) Σ_{i=1}^{N} (opt_i − p_i)²

where i indexes the points of the original stroke, p_i is a point in the original stroke, opt_i is the corresponding point on the ideal shape that is vertically closest to p_i, and N is the number of original points.
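A minimal implementation of this measure might look like the following; here opt_i is taken as the closest point on the ideal polyline, and the projection helper is my own assumption about how opt_i is obtained rather than a detail from the paper. A function like this could serve as the evaluate argument in the selection sketch above.

# Sketch of the MSE between a stroke and the "ideal" polyline formed by
# connecting the selected corners. opt_i is approximated as the closest
# point on the ideal polyline (the write-up above describes it as the
# vertically closest point); the helpers are assumptions, not the paper's code.
def segmentation_mse(stroke, corners):
    # stroke: list of (x, y) points; corners: sorted indices of chosen corners
    segments = [(stroke[a], stroke[b]) for a, b in zip(corners, corners[1:])]
    total = 0.0
    for p in stroke:
        opt = min((closest_point_on_segment(p, s) for s in segments),
                  key=lambda q: squared_dist(p, q))
        total += squared_dist(p, opt)
    return total / len(stroke)           # divide by N, the number of original points

def closest_point_on_segment(p, segment):
    (ax, ay), (bx, by) = segment
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return (ax, ay)
    t = max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / length2))
    return (ax + t * dx, ay + t * dy)

def squared_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2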

Because MSE values change with the scale of the shape, the MSE is normalized between the current subset and the next rather than compared in absolute terms. Overall, this technique achieved much higher accuracy than any of the individual segmenters, and the additional run time of combining them was not significant enough to affect performance.

I think the paper was well presented and explained. I feel the evaluation section could have been elaborated upon further, but it is possible that user testing was not needed given that accuracy was measured on an all-or-nothing basis.
