Sunday, December 7, 2014

An image-based, trainable symbol recognizer for hand-drawn sketches


Bibliographical Information:
Levent Burak Kara, Thomas F. Stahovich, An image-based, trainable symbol recognizer for hand-drawn sketches, Computers & Graphics, Volume 29, Issue 4, August 2005, Pages 501-517, ISSN 0097-8493, http://dx.doi.org/10.1016/j.cag.2005.05.004.
(http://www.sciencedirect.com/science/article/pii/S0097849305000853)

URL:
http://www.sciencedirect.com.lib-ezproxy.tamu.edu:2048/science/article/pii/S0097849305000853

This paper describes a trainable recognizer for hand-drawn symbols that is built on the concept of "multiple layers" of recognition. Unlike most symbol recognizers, it relies extensively on template matching, and it does not seem to be at any disadvantage for doing so.

The goal of this system is a "portable" ink recognition utility that can be used in multiple applications. For that reason, the system's "trainability" by lay users is an important feature. A user can draw, and redraw, the symbols that serve as templates at will, so new template data can be added without leaving the main application.

One of the most significant and interesting contributions of this paper is how it handles rotation during template matching. Rather than relying only on the usual Cartesian (x-y) representation, the sketches are also examined in a polar coordinate system, which makes comparing a sketch with a template simpler and more robust. The reason is that a rotation about the symbol's center, which is awkward to handle in x-y coordinates, becomes a simple shift along the angle axis in polar coordinates.
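
To make the rotation argument concrete, here is a small Python sketch (my own illustration, not code from the paper). The helpers to_polar and rotate and the sample points are hypothetical. Converting points to polar coordinates about their centroid shows that rotating a shape leaves every radius unchanged and adds the same constant to every angle, modulo 2π.

```python
import math


def to_polar(points):
    """Convert (x, y) points to (r, theta) measured about their centroid.

    theta is in radians, normalized to [0, 2*pi)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    polar = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        polar.append((math.hypot(dx, dy), math.atan2(dy, dx) % (2 * math.pi)))
    return polar


def rotate(points, angle):
    """Rotate (x, y) points counterclockwise by `angle` radians about the origin.

    The centroid moves too, but every centroid-relative vector turns by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


shape = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]  # hypothetical sample points
rotated = rotate(shape, math.pi / 6)  # rotate by 30 degrees

for (r1, t1), (r2, t2) in zip(to_polar(shape), to_polar(rotated)):
    # The radius is unchanged and every angle grows by the same pi/6 (mod 2*pi).
    print(f"dr = {r2 - r1:+.6f}, dtheta = {(t2 - t1) % (2 * math.pi):.6f}")
```

Because the whole rotation collapses into one constant angular offset, matching a rotated sketch reduces to finding a single shift instead of re-rotating the image.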

The architecture of the recognition system is as follows. Raw "ink" input from the user may either be stored as a new template or be recognized against the existing ones. In both cases the ink is first pre-processed: it is rasterized into a 48x48 grid, and the symbol is represented by the grid cells that its strokes pass through (the "on" cells). This compact representation makes both storage and recognition easier.
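
The rasterization step might look roughly like the following Python sketch. Only the 48x48 grid size comes from the paper; the function name rasterize, the uniform scaling of the bounding box to fill the grid, and the dense sampling along each segment are my own assumptions about how such a step could work.

```python
import math

GRID_SIZE = 48  # the 48x48 grid described in the paper


def rasterize(strokes, size=GRID_SIZE):
    """Rasterize strokes (each a list of (x, y) points) into a size x size binary grid.

    Assumption: the symbol is scaled uniformly so that the longer side of its
    bounding box spans the whole grid; the paper's exact normalization may differ."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # avoid dividing by zero for a dot
    scale = (size - 1) / extent
    grid = [[0] * size for _ in range(size)]

    def mark(x, y):
        # Rows follow y, columns follow x; round() snaps to the nearest cell.
        grid[round((y - min_y) * scale)][round((x - min_x) * scale)] = 1

    for stroke in strokes:
        mark(*stroke[0])
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            # Step at most half a cell per axis, so consecutive marked cells touch.
            steps = max(1, math.ceil(2 * scale * max(abs(x1 - x0), abs(y1 - y0))))
            for i in range(1, steps + 1):
                t = i / steps
                mark(x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return grid


grid = rasterize([[(0.0, 0.0), (10.0, 0.0)]])  # a single horizontal stroke
print(sum(map(sum, grid)), "cells are on")
```

A horizontal stroke ends up as one full row of on cells, and the same fixed-size grid is produced whether the ink is saved as a template or passed on for recognition.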

From there, if the symbol is to be recognized, the grid is transformed into polar coordinates so that it can be compared against each template without having to correct for differences caused by rotation. This is motivated largely by the fact that explicitly rotating a sketch to match the orientation of each template is computationally expensive. The image above shows the result: the letter "P" in its upright and rotated positions yields very similar graphs once both are converted to polar coordinates. The rotation simply shifts the polar graph along the angle axis, so at worst the last segment of one graph wraps around to the beginning, while the graph as a whole stays very similar. The system uses this as a "pre-recognition" step, in which differences in the polar coordinate data help determine whether an input symbol is likely to match a template.
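
The wrap-around behavior can be illustrated with a rough Python sketch. This is not the paper's exact transform: the names polar_grid and best_rotation, the bin counts (8 radial by 36 angular bins) and the simple mismatch count are hypothetical choices. The idea is to bin the "on" pixels by radius and angle about the centroid, then try every cyclic shift of the angle axis to find the orientation that best lines the unknown up with a template.

```python
import math


def polar_grid(pixels, n_r=8, n_theta=36):
    """Map "on" pixels, given as (row, col) pairs, to an n_r x n_theta occupancy grid.

    Angles are measured about the pixels' centroid, and radii are divided by the
    largest radius so the grid is also insensitive to scale."""
    cy = sum(r for r, _ in pixels) / len(pixels)
    cx = sum(c for _, c in pixels) / len(pixels)
    vecs = [(c - cx, r - cy) for r, c in pixels]
    r_max = max(math.hypot(dx, dy) for dx, dy in vecs) or 1.0
    grid = [[0] * n_theta for _ in range(n_r)]
    for dx, dy in vecs:
        ri = min(int(math.hypot(dx, dy) / r_max * n_r), n_r - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        ti = int(theta / (2 * math.pi) * n_theta) % n_theta
        grid[ri][ti] = 1
    return grid


def best_rotation(unknown, template):
    """Return (shift, mismatches) for the cyclic theta shift that best aligns two grids.

    A rotation about the centroid moves every cell along the theta axis by the
    same amount (wrapping around at 2*pi), so trying each shift of the polar grid
    stands in for re-rotating the image itself."""
    n_r, n_theta = len(template), len(template[0])
    best = None
    for shift in range(n_theta):
        mismatches = sum(
            unknown[i][(j + shift) % n_theta] != template[i][j]
            for i in range(n_r)
            for j in range(n_theta)
        )
        if best is None or mismatches < best[1]:
            best = (shift, mismatches)
    return best


p_shape = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (0, 1), (0, 2), (1, 2), (2, 1)]
p_rotated = [(c, -r) for r, c in p_shape]  # the same pixels rotated by 90 degrees
print(best_rotation(polar_grid(p_rotated), polar_grid(p_shape)))
```

With 36 ten-degree bins, a 90-degree rotation should show up as a shift of 9 bins with no mismatches. A large minimum mismatch count would suggest that the template is unlikely to match, which is the kind of early filtering a pre-recognition step can do.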

The classifiers used for template matching are the Hausdorff distance, a modified Hausdorff distance, the Tanimoto coefficient, and the Yule coefficient. The outputs of these four classifiers are combined to produce the recognition result. Together with the conversion to polar coordinates, this yields a flexible, trainable system, as the wide variety of symbols in the user study shows. The symbols used were beam, pivot, root, pump, cantilever beam, piston, sum symbol, random number symbol, square, spring, current, sine wave, matrix, damper, circular sum, pulley, differentiator, diode, integrator, and signum.
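
For reference, here is a minimal Python sketch of the four measures applied to binary grids like the 48x48 one above. The function names are hypothetical. The Tanimoto and Yule formulas are the textbook forms based on exact cell overlap, and the modified Hausdorff distance follows Dubuisson and Jain's definition; the paper's variants (for example, any tolerance for nearby pixels) may differ.

```python
import math


def on_pixels(grid):
    """Return the set of (row, col) cells that are 1 in a binary grid."""
    return {(i, j) for i, row in enumerate(grid) for j, v in enumerate(row) if v}


def _directed(a, b):
    """For each point in a, the distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def hausdorff(a, b):
    """Classic Hausdorff distance between two non-empty point sets (lower = more similar)."""
    return max(max(_directed(a, b)), max(_directed(b, a)))


def modified_hausdorff(a, b):
    """Dubuisson-Jain modified Hausdorff distance: the larger of the two mean
    directed distances, which is less sensitive to a few outlying pixels."""
    da, db = _directed(a, b), _directed(b, a)
    return max(sum(da) / len(da), sum(db) / len(db))


def _counts(grid_a, grid_b):
    """Count on/on, on/off, off/on and off/off cells in two equally sized grids."""
    n11 = n10 = n01 = n00 = 0
    for row_a, row_b in zip(grid_a, grid_b):
        for x, y in zip(row_a, row_b):
            if x and y:
                n11 += 1
            elif x:
                n10 += 1
            elif y:
                n01 += 1
            else:
                n00 += 1
    return n11, n10, n01, n00


def tanimoto(grid_a, grid_b):
    """Tanimoto coefficient of the on cells: overlap / union (1 = identical)."""
    n11, n10, n01, _ = _counts(grid_a, grid_b)
    union = n11 + n10 + n01
    return n11 / union if union else 1.0


def yule(grid_a, grid_b):
    """Yule coefficient in [-1, 1]: +1 for perfect agreement, -1 for perfect disagreement."""
    n11, n10, n01, n00 = _counts(grid_a, grid_b)
    denom = n11 * n00 + n10 * n01
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0
```

Note that the two distances are dissimilarities (lower is better) while the two coefficients are similarities (higher is better), so they have to be put on a common scale before they can be combined; this sketch does not attempt to reproduce the paper's combination step.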

I think the polar coordinate representation could support many additional features that have not been explored yet. For instance, most rotation-based features, such as Rubine's rotation features, could probably be computed in polar coordinates, possibly with much better analysis times, much as this recognizer benefits from the faster run time the polar transformation gives it. Another interesting application would be recognizing shapes that are themselves drawn in polar coordinates, a rarely explored and largely overlooked possibility in the field of sketch recognition.

1 comment:

  1. I too was fascinated by their polar transformation. I should have thought about how useful a change of basis could be, since it's used all the time in techniques like principal component analysis and linear discriminant analysis, but I hadn't thought of applying it to polar coordinates. In this program, it gives them some very obvious benefits in the form of rotational invariance, and I agree that it would be an interesting experiment to see how such a transformation could benefit other recognizers and their features.

    By the way, I really like your blog. The background and layout are really nice, but most of all, your use of images from each paper is very effective! I'm seeing a lot of blogs with excellent use of imagery; maybe I should have used some myself...
