Bibliography Information:
Sezgin, Davis. "HMM-Based Efficient Sketch Recognition". 2005 International Conference on Intelligent User Interfaces.
URL:
http://rationale.csail.mit.edu/publications/Sezgin2005HMM-extended.pdf
This paper focuses on the usage of Hidden Markov Models (HMM) to help classify and group sketched objects together. The idea is that, even though there exists a large variety of different methods for sketch recognition that classify individual shapes together, the ordering of the shapes and how this can be used to predict the shape itself was not something that had been explored at the time.
The paper begins by describing what it calls a "sketching style", which is the user's preferred, though "not necessarily conscious" order in which the strokes of a shape are drawn. Sketch recognition in the paper is described as three steps: segmentation, classification, and labeling. In this sense, what the paper talks about is largely similar to what most every other paper read in this course talks about. However, the clue lies in the ordering in which the strokes are completed. Indeed, by analyzing the order in which the sketches were produced, it would give insight into the style of the individual's sketch as well as the general process by which all users draw. Though technically the amount of permutations of a shape of n strokes is n!, the actual number of permutations in a real-world sketching scenario is far less. For instance, when drawing a stick figure the vast majority of people either draw the stick representing the body, or the circle representing the head. One would rarely see the right leg or arm being drawn first.
Sezgin et. al. conducted a research study first in which this behavior is observed before proceeding into classifying the drawn shapes using HMMs. The sketches that the users were asked to draw were diagrams of finite state machines, Unified Modeling Language diagrams, scenes with "stick figures playing certain sports", digital circuit diagrams, and emoticons. This produced a total of 180 sketches. This user study revealed that users draw these sketches in a "highly stylized" fashion, that these styles persist across different sketches, that users preferred a particular order when drawing sketches, and that enclosed shapes (e.g. circles, squares, triangles) were drawn first.
In order to apply HMMs to object recognition, the system used had to support for multiple classes and different orders of drawing the same sketch, supporting any variations in drawing/encoding length, being able to compute a probabilistic matching score, and be able to support system-level learning of the user's intuition in drawing order. In terms of encoding, a total of 13 symbols were used as observations to encode the output sequences. 4 were used to encode lines, 3 encoded ovals, 4 for polylines, and one to encode any complex approximations that consisted of two or more of the above. Additionally, there were two kinds of HMMs: fixed input length and variable input length. Because the construction of the grouping graphs is made easy due to the "hard coding" of the fixed length data, and because it requires artificial grouping on a per-case basis, the former has somewhat of a limited use in "real world" sketch recognition. The sketching data is grouped by similar "sketching styles" in order to avoid this artificial partitioning, and Sezgin et. al. introduce weights in the graph to account for the differing lengths in the data. This HMM was implemented via BNT in Matlab. The Bakis left-to-right topology was used due to the incremental nature of sketching.
This work was evaluated by both evaluating real data and comparing the performance of this new algorithm to an existing "baseline method". With the real data coming from a total of 88 sketched objects from users, the fixed input HMM method yielded an accuracy of 96%, while the variable input method yielded 81% if they had no explicit ending states, and 97% if they did. "Negative examples" were introduced into the input data, such as superfluous or intentionally unrecognizable shapes to gauge how well the algorithm performed. The results varied depending of the size of the "neighborhood of misrecognition": 1 in 57% of the trials, 2 in 35%, 3 or more of 8%. With the simulated "low level errors", 43.5% of them were within a neighborhood of 1, 22.5% within 2, and 3% within 3 or more.
In order to test the "baseline recognizer", a feature-based one was used. This would assign spacial context to different shapes and would provide additional structural object descriptions. The running time for the baseline method increased exponentially, whereas the HMM-based recognition system scaled linearly.
Overall, I think the work presented in this paper was substantial. It presented not only a very accurate method of classification, but it also opens the door to many future experiments in identifying and exploring the differing "sketching styles" of people and how they affect sketch recognition. User studies can become far richer when we begin applying HMMs not only to describe their sketching styles, but also to predict them in the future. Using neural networks these can also be used to create original sketches made by a computer in the "style" of a specific user's sketch.
No comments:
Post a Comment