Monday, December 8, 2014

KimCHI: A Sketch-Based Developmental Skill Classifier to Enhance Pen-Driven Educational Interfaces for Children


Bibliography Information:
Hong-hoe Kim, Paul Taele, Stephanie Valentine, Erin McTigue, and Tracy Hammond. 2013. KimCHI: a sketch-based developmental skill classifier to enhance pen-driven educational interfaces for children. In Proceedings of the International Symposium on Sketch-Based Interfaces and Modeling (SBIM '13), Stephen N. Spencer (Ed.). ACM, New York, NY, USA, 33-42. DOI=10.1145/2487381.2487389 http://doi.acm.org/10.1145/2487381.2487389

URL:
http://dl.acm.org/citation.cfm?id=2487389

KimCHI is a sketch recognition system developed to recognize sketches drawn by children. Specifically, it aims to classify whether the drawer of a sketch is a pre-school child or an elementary-school child. The way a sketch is drawn can additionally be analyzed to tell whether the user is male or female, and whether the user is an adult or a child. The system is novel because very little work has been done on sketch recognition for children, who draw shapes very differently depending on their fine motor skills as well as many other factors.

From the data of 725 sketches created by 20 children, classification techniques were used to select the optimal features for distinguishing pre-schoolers from grade-schoolers: average curvature of the stroke, direction change ratio, error of the best-fit line of the direction graph, and the ratio of maximum curvature to average curvature. Each of these features classified with 100% accuracy. The optimal features for distinguishing pre-schoolers from adults were average curvature of the stroke, direction change ratio, the angle of the major axis relative to the center, the error of the best-fit line of the direction graph, the ratio of maximum curvature to average curvature, and the slope of the direction graph, each also at 100% accuracy.
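To make these features concrete, below is a minimal sketch (my own illustration, not the authors' code) of how two of them might be computed from a stroke sampled as (x, y) points. The exact definitions of curvature and of the direction change ratio here are assumptions based on standard practice.

    import math

    def directions(points):
        # Angle of each segment between consecutive points.
        return [math.atan2(y2 - y1, x2 - x1)
                for (x1, y1), (x2, y2) in zip(points, points[1:])]

    def curvatures(points):
        # Curvature approximated as change in direction per point
        # (angle wrap-around ignored for brevity).
        d = directions(points)
        return [abs(b - a) for a, b in zip(d, d[1:])]

    def kimchi_features(points):
        c = curvatures(points)
        avg_curv = sum(c) / len(c)
        d = directions(points)
        deltas = [b - a for a, b in zip(d, d[1:])]
        # Fraction of points where the turning direction flips sign.
        flips = sum(1 for a, b in zip(deltas, deltas[1:]) if a * b < 0)
        return {
            "avg_curvature": avg_curv,
            "max_to_avg_curvature": max(c) / avg_curv if avg_curv else 0.0,
            "direction_change_ratio": flips / max(len(deltas) - 1, 1),
        }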

Sketch recognition for children is a highly overlooked domain. I believe addressing it is imperative if we want to move sketch recognition as a field into the realm of education. We could use it for number recognition to teach children handwriting as well as basic arithmetic, both of which are pillars of early grade-school education.

Sunday, December 7, 2014

World of Workout: A Contextual Mobile RPG to Encourage Long Term Fitness


Bibliographical Information:
Joey Bartley, Jonathon Forsyth, Prachi Pendse, Da Xin, Garrett Brown, Paul Hagseth, Ashish Agrawal, Daniel W. Goldberg, and Tracy Hammond. 2013. World of workout: a contextual mobile RPG to encourage long term fitness. In Proceedings of the Second ACM SIGSPATIAL International Workshop on the Use of GIS in Public Health (HealthGIS '13). ACM, New York, NY, USA, 60-67. DOI=10.1145/2535708.2535718 http://doi.acm.org/10.1145/2535708.2535718

URL:
http://dl.acm.org/citation.cfm?id=2535718

World of Workout is an alternative exercise application for mobile phones. It combines the increasingly common "game-ification" of exercise applications with a dedicated game. The idea behind World of Workout is to create an avatar in a game world whose statistics depend directly on the exercise data of that particular user.

The user's avatar has three main statistics: strength, speed, and stamina, all of which have a direct role in how the avatar plays in the game, and all of which are directly influenced by the exercise data. To detect user exercises, the application uses a phone's built-in GPS and accelerometer to track runs and stationary exercises. The avatar's speed, strength, and stamina are driven respectively by sprints; weights and push-ups; and long-distance runs and biking. The game also contains a random-reward function, in which the user is recommended one particular exercise carrying an immediate and significant bonus to the avatar's statistics if completed. This is one of two components that encourage users to try a variety of exercises, which is both safer for the user and makes for a more well-rounded exercise program.

The second of these components is the "fatigue" mechanic, which is integrated deeply into how exercise translates to avatar progress. Without fatigue, the rate at which the avatar's stats rise would be linear in the amount of exercise performed; with it, the biggest progress comes from performing the widest variety of exercises. "Fatigue" here is an implementation of diminishing returns: a user who does nothing but sprint will see the improvements in the avatar's speed shrink the more he or she sprints, to the point where the gain is negligible.
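As an illustration of how such diminishing returns might be implemented (the paper does not publish its formula, so the exponential decay below is an assumption of mine):

    import math

    def stat_gain(base_gain, recent_reps, decay=0.3):
        # Award less progress the more often the same exercise was
        # performed recently; the decay curve is an assumed choice.
        return base_gain * math.exp(-decay * recent_reps)

    # A user who only sprints sees speed gains shrink toward zero:
    for reps in range(6):
        print(reps, round(stat_gain(10.0, reps), 2))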

The reasons for implementing this mechanic are numerous. First, as mentioned previously, it encourages a well-rounded exercise program. Second, repeated and concentrated use of any one exercise can strain the muscles and eventually become dangerous, so fatigue prevents users from over-exerting themselves for the sake of improving their avatar. Third, the system deters cheating by a user who has figured out how to "trick" the phone's accelerometer into registering a run while he or she is really shaking the phone in place.

I think that the fatigue factor in this application is easily its most novel contribution. Even though some applications exist to game-ify exercise activities, there are not very many mobile games that are directly linked to an exercise application, and certainly few where the game avatar's performance is directly linked to the user's exercise regimen. The fatigue factor also performs many functions at once and creates a level of abstraction between player performance and avatar progress that is vital to making a video game "fun". I believe that a larger set of "fatigue"-like mechanics, together with a very robust game built around the exercise portion of the activity, could make this a very useful tool for exercise.

Recognizing Interspersed Sketches Quickly



Bibliographical Information:
Tracy A. Hammond and Randall Davis. 2009. Recognizing interspersed sketches quickly. In Proceedings of Graphics Interface 2009 (GI '09). Canadian Information Processing Society, Toronto, Ont., Canada, Canada, 157-166.

URL:
http://dl.acm.org.lib-ezproxy.tamu.edu:2048/citation.cfm?id=1555917

One of the biggest challenges in sketch recognition is the recognition of "interspersed sketches", in which the strokes of one shape are drawn out of order with the strokes of other shapes. When drawing two triangles, for instance, one may draw the first two sides of one triangle, start on the second triangle, and only later return to draw the third side. This proves challenging for many recognition systems, whose algorithms end up running in exponential time. By establishing a system of constraints that narrows where to look, this recognizer greatly reduces the runtime of recognition that supports interspersed drawing, even though it still runs in exponential time in the worst case. The algorithm uses an indexing technique built on geometric constraints for ranking and pruning.

The geometric constraints fall into four categories: orientation-dependent constraints (such as "horizontal" and "vertical"), orientation-independent constraints ("acute", "bisects", "same side"), composite constraints ("smaller", "centered left"), and "other" ("equal", "greater than"). Signal noise, while significant, is handled by the recognition system itself: template shapes are described without signal noise, so no signal noise is accounted for when a user shape is recognized. Additional noise can occur if the user takes more than one stroke to draw a shape; this is alleviated by splicing bigger shapes together from smaller strokes when they correctly match a given shape.

The system also uses a collection of constraints, each with its own error tolerance; a candidate interpretation is discarded if it falls outside a constraint's tolerance. Two types of "requirements" must be satisfied. A shape must be not only "geometrically satisfied", meeting the purely geometric constraints, but also "perceptually satisfied", meeting the set of constraints that an ordinary human would be able to perceive. For instance, a stroke that is "to the left" of another may be geometrically satisfied even if it is only one pixel to the left, but a human would rarely make the same call given so small a discrepancy. Ensuring a human can perceive these constraints is key to making them meaningful in sketch recognition. The constraints were both computed geometrically and validated in experiments with users.
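A hedged sketch of how one such constraint test might look; the shape representation and the "perceptual" margin below are my assumptions, not the paper's numbers:

    def satisfies_left_of(shape_a, shape_b, perceptual_margin=10.0):
        # Geometric satisfaction: a's center is left of b's center.
        # Perceptual satisfaction: the gap is large enough that a person
        # would actually call it "to the left" (margin is an assumption).
        dx = shape_b["cx"] - shape_a["cx"]
        return dx > 0 and dx > perceptual_margin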

The indexing algorithm that ranks and scores these constraints is divided into three sections: domain-independent primitive finding, domain-independent constraint indexing, and domain-dependent shape formation. The first breaks the given strokes down into primitives such as lines, arcs, ellipses, circles, and corners, the last of which are found via speed and curvature data. The second indexes each shape exactly once, so that the algorithm does not impose too many rules on the shapes it recognizes: each shape is indexed based only on its own constraints and NOT on the constraints of the shapes around it. The final section attempts to group indexed shapes into compound, more complex versions of the primitives. The system assigns recognized shapes to a limited number of "slots", which are then used for comparison and possible combination; the constraints applied to the compound shapes dictate whether the shapes in the slots are removed, retained, or exchanged. The system also makes extensive use of pruning to prevent unnecessary branching, which cuts the run time drastically.

I think that this constraint-based approach is an interesting alternative solution to the problem of interspersed-shape sketch recognition. It is especially useful in educational situations, where the user is expected to be a student who is not an expert in the field of study and is thus unlikely to have developed a standard, industry-followed drawing order, if one even exists. In education, sketch recognition needs to be highly flexible to allow different students to learn in their own way, so support for interspersed-shape sketching can prove very useful.

An image-based, trainable symbol recognizer for hand-drawn sketches


Bibliographical Information:
Levent Burak Kara, Thomas F. Stahovich, An image-based, trainable symbol recognizer for hand-drawn sketches, Computers & Graphics, Volume 29, Issue 4, August 2005, Pages 501-517, ISSN 0097-8493, http://dx.doi.org/10.1016/j.cag.2005.05.004.
(http://www.sciencedirect.com/science/article/pii/S0097849305000853)

URL:
http://www.sciencedirect.com.lib-ezproxy.tamu.edu:2048/science/article/pii/S0097849305000853

This paper is about a trainable, hand-drawn symbol recognizer that is based on the concept of "multiple layers". While this system differs from most symbol recognizer systems in that it uses template matching extensively, it is in no way at a disadvantage for doing so.

The focus in this system is to develop a "portable" ink recognition utility that could be used in multiple applications. For that reason, the system's "trainability" by a layman user is an important feature. Symbols intended to be treated as "templates" can easily be drawn and re-drawn by the user at will, creating a flexible system in which a user can input new template data without having to navigate away from the main app.

One of the significant and curious contributions of this paper is how it deals with rotation during template matching. Rather than relying on the traditional Cartesian x-y representation, sketches are examined in a polar coordinate system, which makes template-sketch comparison much easier and more complete: rotation is "native" to polar coordinates, so rotated images can be analyzed directly.

The architecture of the recognition system is as follows. The raw "ink" input from the user may itself be stored as a template. Whether or not it is, some pre-processing is done on the ink: the image is rasterized onto a 48x48 grid, with the stroke represented by which cells are filled. This compact form makes the data much easier to store and to recognize.

From there, if the stroke is to be used for recognition, the grid is converted to polar coordinates so that it can be compared against its template without accounting for discrepancies caused by rotation. This change is motivated largely by the fact that manually re-rotating sketches to match the angle of the template is extremely computationally expensive. In the paper's example, the letter "P" in its upright and rotated positions still yields remarkably similar graphs when both are converted to polar coordinates. At worst, the rotation causes the latter segment of the polar graph to shift to the beginning, but the graph as a whole remains very similar. The system uses this as a form of "pre-recognition", where discrepancies in polar coordinate data help determine whether an input stroke matches a template.
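A minimal sketch of the polar transform described here, assuming the angle is taken about the image centroid:

    import math

    def to_polar(points):
        # Centroid of the rasterized points.
        cx = sum(x for x, y in points) / len(points)
        cy = sum(y for x, y in points) / len(points)
        # Each point becomes (radius, angle) about the centroid; rotating
        # the source image only shifts the angle column, so comparison
        # against a template no longer requires re-rotating the sketch.
        return [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
                for x, y in points]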

The classifiers used for template matching are the Hausdorff distance, a modified Hausdorff distance, the Tanimoto coefficient, and the Yule coefficient; the results from each are combined to generate the recognition result. Together with the conversion to polar coordinates, this yields a very flexible and trainable system, as the user study utilized a wide variety of symbols: beam, pivot, root, pump, cantilever beam, piston, sum symbol, random number symbol, square, spring, current, sine wave, matrix, damper, circular sum, pulley, differentiator, diode, integrator, and signum.
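Of these classifiers, the Tanimoto coefficient is the simplest to illustrate. Below is a sketch of the standard definition applied to two binarized grids (my own rendering, not the paper's code):

    def tanimoto(grid_a, grid_b):
        # Tanimoto similarity of two flattened binary grids: the number
        # of cells filled in both, over the number filled in either.
        a = {i for i, v in enumerate(grid_a) if v}
        b = {i for i, v in enumerate(grid_b) if v}
        if not a and not b:
            return 1.0
        return len(a & b) / len(a | b)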

I think that the use of the polar coordinate system could yield a large variety of additional features that have not been explored yet. For instance, most rotation-based features, such as the various rotation features from Rubine, could be applied in polar coordinates and would likely yield much better analysis times, much as this system benefits from its improved run time. Another interesting application could be sketch recognition of shapes drawn in polar coordinates themselves, a rarely used but vastly overlooked possibility in the field.

Tahuti: A Sketch Recognition System for UML Class Diagrams



Bibliography Information:
Hammond, T., & Davis, R. (2002). Tahuti: A sketch recognition system for uml class diagrams. In AAAI Spring Symposium on Sketch Understanding (Vol. 3, No. 2).

URL:
http://rationale.csail.mit.edu/pubs/hammond/hammondsketchsymp2001.pdf

Tahuti is a sketch recognition environment designed around creating a "smarter" form of sketch recognition. The system recognizes shapes via geometric recognition, rather than requiring users to draw a shape matching a template. It uses a multi-layer framework for recognition: strokes are initially recognized as line segments, then grouped into more complex shapes when they match any particular shape.

The system recognizes seven "viewable" objects: a general class, an interface class, an inheritance association, an aggregation association, a dependency association, an interface association, or a collection of unrecognized strokes. These objects can be further edited, a highly important feature that sets Tahuti apart from many other sketch recognition systems. For geometric recognition, Tahuti uses a variety of geometric "rules" that each shape follows. For instance, Tahuti recognizes arrows as follows:


  1. Locate the arrow shaft by identifying the two points farthest from each other; in the paper's figure, this returns the line segment between A and B. (A code sketch of this step follows the list.)
  2. Locate the arrow head points as the points farthest away from the shaft; this returns C and D.
  3. Locate E, the point on segment AB whose distance from B is twice the distance from the line through C and D to the arrowhead B.
  4. Classify these line segments as the arrow shaft, part of the arrow head, or "unclassified".
  5. Based on the results of step 4, classify the arrow type as "dependency", "inheritance", or "aggregation", or leave it unclassified.
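A minimal sketch of the shaft-finding step, using a brute-force farthest-pair search for clarity:

    import math

    def locate_shaft(points):
        # Step 1: the shaft endpoints A and B are the pair of stroke
        # points farthest from each other.
        best, a, b = -1.0, None, None
        for i, p in enumerate(points):
            for q in points[i + 1:]:
                d = math.dist(p, q)
                if d > best:
                    best, a, b = d, p, q
        return a, b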
A preliminary study showed that the system's high accuracy was pleasing to most users, who scored it 4.375/5 for drawing and 4.825/5 for editing. The user experience was compared against Rational Rose, a UML design tool, and most users preferred Tahuti for its ease of use, especially when it came to editing.

I believe that Tahuti was one of the early papers that influenced the Sketch Recognition Lab's focus on geometric recognition over template matching; indeed, at this point in the field, very little sketch recognition is simple template matching. I also believe it smartly borrows geometric-property concepts from other sketch recognition techniques, such as Rubine's features. This style of sketch recognition strikes me as intuitive for educational applications in geometry, where the very systems used for recognition could also grade basic geometry problems.

Fitts' Law as a Research and Design Tool in Human-Computer Interaction


Bibliographical Information:
I. Scott MacKenzie. 1992. Fitts' law as a research and design tool in human-computer interaction. Hum.-Comput. Interact. 7, 1 (March 1992), 91-139. DOI=10.1207/s15327051hci0701_3 http://dx.doi.org/10.1207/s15327051hci0701_3

URL:
http://dl.acm.org/citation.cfm?id=1461857

This paper outlines the principles of Fitts' law and how they can be applied to the domain of human-computer interaction. Previously, this research and its applications had been confined to psychology, measuring the cognitive effects of differing "targets". The main focus is quantifying task completion time through a mathematical formula that depends on both the distance the test subject must traverse and the size of the target he or she is expected to "hit". The formula Fitts developed is shown below:

 T = a + b(ID) = a + b \log_2 \Bigg(\frac{2D}{W}\Bigg)

where T is the average time it will take the subject to perform the task, a and b are empirically fitted parameters of the particular model, and ID is the "index of difficulty": the base-2 logarithm of twice the distance D to the target divided by the width W of the target, measured along the axis of motion. The variant most frequently used in human-computer interaction is the "Shannon form", named after the Shannon-Hartley theorem, which it closely resembles:

T = a + b \log_2 \Bigg(1+\frac{D}{W}\Bigg)
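As a worked example (the a and b values below are arbitrary placeholders; in practice they are fit per device and task):

    import math

    def fitts_time(a, b, D, W):
        # Shannon formulation: T = a + b * log2(1 + D/W)
        return a + b * math.log2(1 + D / W)

    # Assumed parameters: a = 0.1 s, b = 0.15 s/bit. A target 8 cm away
    # and 2 cm wide has ID = log2(1 + 4) ~ 2.32 bits, so T ~ 0.45 s.
    print(fitts_time(0.1, 0.15, 8.0, 2.0))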

The model expresses human performance as an information rate, in bits per second. This is an interesting abstraction, since at the time of the form's development little work had been done on quantifying human activity using information-theoretic metrics. Interestingly, the law was originally conceived within psychology rather than computer science, yet it transferred remarkably easily once researchers decided to apply it that way.

Fitts' law in computer science has traditionally been used for gauging user interface usability with a traditional mouse pointer. However, I believe we can apply it in numerous other ways. We could apply Fitts' law to psychology exams delivered through computer systems to better analyze this information, and we could likely develop additional metrics and more information-rich variations of Fitts' law once we can analyze this information at much finer granularity.

Wednesday, November 19, 2014

Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams


Bibliographical Information:
Bhat, Akshay, and Tracy Hammond. "Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams." In IJCAI, vol. 9, pp. 1395-1400. 2009.

URL:
http://www.aaai.org/ocs/index.php/IJCAI/IJCAI-09/paper/download/592/906

This paper from the Sketch Recognition Lab distinguishes hand-written text from free-hand diagram shapes using only one feature: the zero-order entropy rate. The paper found that relying solely on this entropy rate yields correct classification between text and diagram strokes 92.06% of the time. The feature also performed favorably in domains for which no training examples were provided.

The main motivation behind this paper is that many sources of sketches mix hand-written text with shape-based diagrams: engineering diagrams where the author annotates and labels components with text, military course-of-action diagrams, and UML sketches all have both text and shapes as considerable components. Separating shapes from hand-drawn text is necessary for accurate recognition in both domains. Because recognition of the two types differs greatly in procedure and final classification, yet both appear frequently in traditional sketching, there is strong motivation to separate them successfully.

The paper uses the zero-order "entropy rate" to classify strokes as shape or text. Entropy is essentially a measure of "the degree of randomness from an information source"; in sketching, this randomness appears with far higher frequency in text strokes than in shape strokes. The paper develops an entropy-model "alphabet" to characterize how strokes tend to be drawn: the range of possible angles along a stroke is divided into discrete thresholds, each assigned a symbol (the range from 0 to pi/6, for instance, is given the symbol "A"; other ranges correspond uniquely to B, C, D, E, and F; and stroke end-points are given the symbol X).

With this classification, the strokes are grouped and each group is transcribed into a string over this angle alphabet. Text strokes yield more varied strings than shape strokes, and that variety is what is used as the "entropy" measure. Strokes are labeled "shape", "text", or "unclassified", the last being the case where the entropy measurement is above the threshold for "shape" but below the threshold for "text". The results are promising, with 92.06% correct classification when the system is required to produce a decision. Military course-of-action diagrams and other sketch types that had not been assigned a studied "domain" performed favorably as well.
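A minimal sketch of the zero-order entropy computation, assuming the stroke has already been transcribed into its symbol string; the threshold values here are assumptions for illustration:

    import math
    from collections import Counter

    def zero_order_entropy(symbols):
        # Shannon entropy (bits per symbol) of the angle-symbol string,
        # e.g. "AABXFE..." produced by the alphabet described above.
        counts = Counter(symbols)
        n = len(symbols)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Text strokes mix many symbols (high entropy); shapes repeat a few.
    def classify(symbols, shape_max=1.0, text_min=2.0):
        h = zero_order_entropy(symbols)
        if h <= shape_max:
            return "shape"
        if h >= text_min:
            return "text"
        return "unclassified"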

I think that this paper is both well motivated and implemented, with the idea of "entropy" a particularly clever one that ends up yielding very favorable results. I think that the paper's explanation in how the classification worked was easy to understand and well elaborated, with a surprising amount of intuition applied to the general back-end of how this classification technique works. Using higher-order entropy models to determine accuracy that way would be an interesting continuation of this study.



Sunday, November 16, 2014

SketchREAD: A Multi-Domain Sketch Recognition Engine



Bibliographical Information:
Christine Alvarado and Randall Davis. 2004. SketchREAD: a multi-domain sketch recognition engine. In Proceedings of the 17th annual ACM symposium on User interface software and technology (UIST '04). ACM, New York, NY, USA, 23-32. DOI=10.1145/1029632.1029637 http://doi.acm.org.lib-ezproxy.tamu.edu:2048/10.1145/1029632.1029637

URL:
http://dl.acm.org.lib-ezproxy.tamu.edu:2048/citation.cfm?id=1029637

SketchREAD is a system developed by Christine Alvarado and Randall Davis that combines many different principles of sketch recognition into an algorithm that is as unintrusive to the sketching experience as possible. The main motivation is that most systems regarded as "accurate" are either overly restrictive in the sketches they consider (e.g., each primitive shape must be drawn in a single stroke), require the user to input a significant amount of additional data (e.g., training data or shape labels), or severely sacrifice response time or accuracy. SketchREAD was designed to work "in the background" and continuously update its own recognition with 1) no prior training data and 2) no feedback provided to the user, since such feedback is considered distracting to the sketching experience.

The algorithm makes extensive use of Bayesian networks, which model predictions under uncertainty using background data. A Bayesian network consists of two parts: a directed acyclic graph that encodes which factors of the world influence each other, and a set of probabilities that specify how these factors affect each other. This is used especially in one of the system's two test cases: the drawing of a family tree. Because the interpretation of any one stroke in a family tree depends on the interpretation of the strokes around it, the system continues to update its interpretation as more sketch components are added. Bottom-up and top-down approaches are combined to improve the run time of the recognition algorithm, distinctly similar to how the ShortStraw algorithm is structured. Additionally, pruning is performed on the probability calculations, which we have also seen in HMMs.
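As a toy illustration of the Bayesian reasoning involved (not SketchREAD's actual network), here is the update of belief in an "arrow" hypothesis after observing a stroke that could be its shaft; all of the numbers are assumed:

    def posterior(prior, p_obs_given_h, p_obs_given_not_h):
        # Bayes' rule for a single binary hypothesis.
        num = p_obs_given_h * prior
        return num / (num + p_obs_given_not_h * (1 - prior))

    # Assume arrows make up 20% of shapes; a long straight stroke
    # appears in 90% of arrows but only 30% of other shapes.
    print(posterior(0.2, 0.9, 0.3))  # belief rises to about 0.43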

Overall, I think this paper is an amalgam of a large number of techniques employed in several other recognition algorithms. The system is shown to significantly improve the user experience in both the family-tree domain and the circuit design domain. Finally, it makes heavy use of hierarchical sketch recognition, which has also appeared in other systems such as the one currently in place for SRL's Mechanix. I think this system could easily be applied to several other domains, and the fact that it draws on so many other sketch recognition algorithms is an indication of a solid application of the best aspects of each.

Saturday, November 15, 2014

ShortStraw: A Simple and Effective Corner Finder for Polylines


Bibliographical Information:
A. Wolin, B. Eoff, and T. Hammond. 2008. ShortStraw: a simple and effective corner finder for polylines. In Proceedings of the Fifth Eurographics conference on Sketch-Based Interfaces and Modeling (SBM'08), Christine Alvarado and Marie-Paule Cani (Eds.). Eurographics Association, Aire-la-Ville, Switzerland, Switzerland, 33-40. DOI=10.2312/SBM/SBM08/033-040 http://dx.doi.org/10.2312/SBM/SBM08/033-040

URL:
http://dl.acm.org.lib-ezproxy.tamu.edu:2048/citation.cfm?id=2386308

ShortStraw is a corner-finding algorithm that relies entirely on one "feature". The technique was originally intended for use in conjunction with other features that help identify corners in a polyline, but this particular feature proved so accurate that corners could be identified using ShortStraw alone.

The technique is relatively simple. Similar to $1, the algorithm begins by resampling the shape, except that unlike $1 the shape is always resampled to N=40 points. From there, "straws" are created: short Euclidean distances between points on either side of a candidate corner. One point is chosen as the "center", and a "window" of three points is taken on each side: the "positive window" is the three points after the center, and the "negative window" the three before it. In short, the frontmost point is p[i+w], where i is the current center point and w is the window size (here w=3), and the backmost point is p[i-w]. Straws are calculated as follows: straw 1 is the Euclidean distance between p[i+w] and p[i-w], straw 2 between p[i+w-1] and p[i-w+1], and straw 3 between p[i+w-2] and p[i-w+2].

ShortStraw is divided into two approaches: top-down and bottom-up. Bottom-up is the process by which the straws are compared to first identify a "first pass" of detected corners, while the top-down approach takes these corners and runs another pass to eliminate false positives and identify missed corners.

The bottom-up process iterates through each point and creates the straws described above. If the largest straw is nearly identical in length (within a 95% threshold) to the path distance along the stroke between its endpoints, the straw is "lying flat" on top of the shape, so the point cannot be a corner. If the chord spans a corner, however, the straw will be much shorter than the path. The points where the straws are smallest are taken as corners.
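A condensed sketch of the bottom-up pass; this uses the single straw-per-point formulation with window w=3, applies the 0.95 threshold to the median straw length, and simplifies the local-minimum test:

    import math
    import statistics

    def straws(points, w=3):
        # Straw for point i: chord length between p[i-w] and p[i+w].
        return [math.dist(points[i - w], points[i + w])
                for i in range(w, len(points) - w)]

    def bottom_up_corners(points, w=3):
        s = straws(points, w)
        threshold = 0.95 * statistics.median(s)
        corners = []
        for i in range(1, len(s) - 1):
            # A corner candidate is a local minimum below the threshold.
            if s[i] < threshold and s[i] <= s[i - 1] and s[i] <= s[i + 1]:
                corners.append(i + w)  # index back into the stroke
        return corners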

The top-down approach then takes these corners and runs another pass, ensuring that every segment between consecutive corners is a "flat" line. Where this is not the case, corners are removed or added depending on whether the fault was a false positive or a false negative. This top-down pass is vital for removing erroneous results from the initial bottom-up pass.

One of the points raised by this paper is that its definition of "accuracy" is strictly "all-or-nothing". Previous corner-finding papers reported very high accuracy rates, but this paper argues those rates are somewhat inflated because there is no "penalty" for false positives. By that logic we could return every single point as a corner and still report 100% accuracy. While that may make a corner finder sound accurate on paper, in practice reporting too many corners is just as detrimental to the user experience as reporting too few, or the wrong ones, or none at all: in any case, the user has to re-draw or correct the input diagram. The more false positives reported, the worse the user experience, even if the reported accuracy is 100%. Because of this, the algorithm is scrutinized under an all-or-nothing standard: if the corners found do not match the actual corners exactly (no more, no less), the result is counted as a "miss". Under this much more practical definition of accuracy, ShortStraw scores 0.741 out of 1, while Sezgin's algorithm scores 0.278 and Kim & Kim's scores 0.297. ShortStraw is thus far more accurate at finding corners under an all-or-nothing model than previously established algorithms.

I believe this paper to be a very interesting one, both in concept and in execution. The idea of "straws" and of identifying local minima in straw length may at first sound unorthodox, but in practice it is shown to be highly effective. Most intriguing of all is that the algorithm uses only straw length to determine corners, whereas other corner-finding algorithms rely heavily on multiple features. Finally, the discussion of all-or-nothing accuracy may have been inconvenient for several established algorithms, but since these techniques are ultimately intended for use by laymen, a technical definition of accuracy that ignores false positives does not help practical usability at all. In some ways this adoption of all-or-nothing accuracy changes the discussion of what it means for a recognition algorithm to be "accurate".

Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes


Bibliographical information:
Jacob O. Wobbrock, Andrew D. Wilson, and Yang Li. 2007. Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In Proceedings of the 20th annual ACM symposium on User interface software and technology (UIST '07). ACM, New York, NY, USA, 159-168. DOI=10.1145/1294211.1294238 http://doi.acm.org.lib-ezproxy.tamu.edu:2048/10.1145/1294211.1294238

URL:
http://dl.acm.org.lib-ezproxy.tamu.edu:2048/citation.cfm?id=1294238

The $1 recognizer was designed to be small, fast, and implementable in under 100 lines of code. With only one loaded template, $1 achieved 97% accuracy, and with three or more loaded templates as high as 99%. Although this paper is one of the most referenced in the field of sketch recognition, none of the algorithms involved were particularly novel at the time of publication. However, because nobody had actually published the technique before, this paper was the first to fully document it; it can be credited with standardizing the procedure and providing the base from which other researchers developed recognizers such as $2 and $N.

The algorithm works as follows. The sketches are divided into two sets of points: the "candidate points" C given by the user sketch, and the "template points" T of the stored template that best matches the candidate. The candidate points are sampled by a combination of hardware and software. Because a naive template comparison would rarely produce a match, several more steps must be taken before such a comparison can succeed (a condensed code sketch follows the list):

  1. Resample the point path. The optimal number of samples varies by shape, but for any N between 32 and 256 the algorithm recognizes well; the optimal number of sample points was found to be N=64.
  2. Rotate once based on the indicative angle: the angle between the shape's centroid and its first point. The candidate is rotated so this angle is zero, giving candidate and template a common orientation for comparison.
  3. Scale and translate. The candidate shape is scaled (non-uniformly) to a "reference square" and then translated so that its centroid sits at a reference point, matching the two shapes as closely as possible.
  4. Find the optimal angle for the best score. The "score" is determined from the path distance, i.e. the average distance between corresponding template and candidate points; a search over nearby rotation angles finds the alignment that minimizes it.
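A condensed sketch of steps 1 and 2, adapted from the paper's published pseudocode; scaling, translation, and the golden-section search over angles are omitted for brevity:

    import math

    def resample(points, n=64):
        # Step 1: resample the stroke into n evenly spaced points.
        interval = sum(math.dist(a, b)
                       for a, b in zip(points, points[1:])) / (n - 1)
        pts, out, acc = list(points), [points[0]], 0.0
        i = 1
        while i < len(pts):
            d = math.dist(pts[i - 1], pts[i])
            if acc + d >= interval:
                t = (interval - acc) / d
                q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                     pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
                out.append(q)
                pts.insert(i, q)  # q becomes the new "previous" point
                acc = 0.0
            else:
                acc += d
            i += 1
        while len(out) < n:  # guard against floating-point shortfall
            out.append(points[-1])
        return out

    def rotate_to_zero(points):
        # Step 2: rotate about the centroid so the angle from the
        # centroid to the first point (the indicative angle) is zero.
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        theta = -math.atan2(points[0][1] - cy, points[0][0] - cx)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        return [(cx + (x - cx) * cos_t - (y - cy) * sin_t,
                 cy + (x - cx) * sin_t + (y - cy) * cos_t)
                for x, y in points]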
The paper identifies several limitations of the recognizer. First, because it is deliberately invariant to rotation and scale, shapes whose identifying feature is a specific rotation (for example, a rhombus differing from a square only by rotation) cannot be treated as distinct. Second, non-uniform scaling distorts purely horizontal or vertical strokes, so shapes consisting of a single such line can be troublesome to recognize. Third, because the system uses no time-based features, shapes that rely on drawing time or speed cannot be recognized.

The $1 recognizer, as the paper itself suggests, is used today mostly to get a recognizer easily integrated into a system. It works well with simple shapes such as letters of the alphabet. However, due to the significant limitations of the algorithm, serious sketch recognition research and most modern systems do not use something as simple as $1 by itself; they combine it with several other kinds of algorithms, such as feature-based ones, to produce the most accurate and useful recognition for specific applications.

Monday, November 10, 2014

HMM-Based Efficient Sketch Recognition

Bibliography Information:
Sezgin, Davis. "HMM-Based Efficient Sketch Recognition". 2005 International Conference on Intelligent User Interfaces.

URL:
http://rationale.csail.mit.edu/publications/Sezgin2005HMM-extended.pdf

This paper focuses on the usage of Hidden Markov Models (HMM) to help classify and group sketched objects together. The idea is that, even though there exists a large variety of different methods for sketch recognition that classify individual shapes together, the ordering of the shapes and how this can be used to predict the shape itself was not something that had been explored at the time.

The paper begins by describing what it calls a "sketching style": the user's preferred, though "not necessarily conscious", order in which the strokes of a shape are drawn. Sketch recognition is described as three steps: segmentation, classification, and labeling. In that sense, the paper covers much the same ground as the other papers read in this course; the key difference lies in the ordering in which strokes are completed. By analyzing the order in which sketches were produced, one gains insight into both the individual's sketching style and the general process by which users draw. Though technically the number of stroke orderings for a shape of n strokes is n!, the number actually observed in real-world sketching is far smaller. When drawing a stick figure, for instance, the vast majority of people first draw either the stick representing the body or the circle representing the head; one would rarely see the right leg or arm drawn first.

Sezgin et al. first conducted a study in which this behavior was observed before proceeding to classify the drawn shapes using HMMs. Users were asked to draw diagrams of finite state machines, Unified Modeling Language diagrams, scenes of "stick figures playing certain sports", digital circuit diagrams, and emoticons, producing a total of 180 sketches. The study revealed that users draw in a "highly stylized" fashion, that these styles persist across different sketches, that users prefer a particular drawing order, and that enclosed shapes (e.g., circles, squares, triangles) tend to be drawn first.

To apply HMMs to object recognition, the system had to support multiple classes and different drawing orders of the same sketch, handle variations in drawing/encoding length, compute a probabilistic matching score, and learn the user's intuitive drawing order. For encoding, a total of 13 symbols were used as observations for the output sequences: 4 encoded lines, 3 encoded ovals, 4 encoded polylines, and one encoded complex approximations consisting of two or more of the above. There were two kinds of HMMs: fixed input length and variable input length. Because constructing the grouping graphs is made easy by the "hard coding" of fixed-length data, and because it requires artificial grouping on a per-case basis, the former has somewhat limited use in "real world" sketch recognition. To avoid this artificial partitioning, the sketching data is grouped by similar "sketching styles", and Sezgin et al. introduce weights in the graph to account for differing data lengths. The HMMs were implemented via BNT in Matlab, using the Bakis left-to-right topology owing to the incremental nature of sketching.

The work was evaluated both on real data and by comparing the new algorithm's performance against an existing "baseline method". On real data comprising 88 sketched objects, the fixed-input HMM method yielded an accuracy of 96%, while the variable-input method yielded 81% without explicit ending states and 97% with them. "Negative examples", such as superfluous or intentionally unrecognizable shapes, were introduced into the input data to gauge performance. The results varied with the size of the "neighborhood of misrecognition": 1 in 57% of the trials, 2 in 35%, and 3 or more in 8%. With the simulated "low level errors", 43.5% were within a neighborhood of 1, 22.5% within 2, and 3% within 3 or more.

As the "baseline recognizer", a feature-based recognizer was used, which assigns spatial context to different shapes and provides additional structural object descriptions. The running time of the baseline method increased exponentially, whereas the HMM-based recognition system scaled linearly.

Overall, I think the work presented in this paper was substantial. It presented not only a very accurate method of classification, but it also opens the door to many future experiments in identifying and exploring the differing "sketching styles" of people and how they affect sketch recognition. User studies can become far richer when we begin applying HMMs not only to describe their sketching styles, but also to predict them in the future. Using neural networks these can also be used to create original sketches made by a computer in the "style" of a specific user's sketch.

Wednesday, November 5, 2014

A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

Bibliographical Information:
Rabiner. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition". Proceedings of the IEEE, Vol. 77, No. 2, February 1989.

URL:
 http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf

This paper is a tutorial on Markov models, and in particular the hidden Markov model, in which information must be derived about a hidden state, usually the underlying source, from the observations it produces.

The paper begins by explaining a standard Markov model, which is essentially a state machine with probabilities attached to its transitions, where the probability of the next state depends on the current state. Such a model can be evaluated much like any probabilistic problem, by multiplying the relevant probabilities together according to a set of rules. The paper then extends this explanation to hidden Markov models, where some piece of information, usually the identity of the underlying states, is unknown. Using the probabilities of the outputs, an HMM lets us numerically estimate the probabilities of the hidden inputs. The total number of unknown parameters in an HMM depends on the model: for instance, in an HMM that estimates the coins used in a coin-toss experiment, 1 coin gives one unknown parameter, 2 coins give 4 unknowns, and 3 coins give 9.

The paper describes the basic elements of an HMM:
1. The number of states in the model. Although the states are hidden, they often have "physical significance": in the coin-toss experiment, the states correspond to the coins being tossed.
2. The number of "observation symbols", i.e., the distinct outputs. With "heads" and "tails", this number is two.
3. The state transition probability distribution: how likely the model is to move from each state to each other state. If a biased coin is more likely to be swapped out after the same side has come up three times in a row, for instance, that shows up here.
4. The observation symbol probability distribution: for each state, how likely each output is, e.g., how likely a given coin is to come up heads or tails.
5. The initial state distribution: the probabilities over which state the model starts in.

The paper continues with the three main "problems" that can be solved with an HMM:
1. Computing the probability of an observation sequence, given a model
2. Computing the "best" state sequence that explains the observations, given a model
3. Adjusting the model parameters to maximize the probability of the observation sequence

Rabiner describes all three of these problems using basic examples that highlight the main concepts, and goes over the general steps by which each may be solved.
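For problem 1, the standard solution is the forward algorithm. Below is a minimal sketch in the tutorial's notation of a transition matrix A, emission matrix B, and initial distribution pi (my own condensation, not Rabiner's code):

    def forward_probability(A, B, pi, observations):
        # P(O | model) via the forward algorithm.
        # A[i][j]: probability of moving from state i to state j.
        # B[i][k]: probability of emitting symbol k in state i.
        # pi[i]:   probability of starting in state i.
        n = len(pi)
        alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
        for o in observations[1:]:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]
        return sum(alpha)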

The paper then goes over some additional "types" of HMMs. It describes the "left-right" or "Bakis" model, so called because the state index can only increase (or stay the same) as time progresses; no transitions are allowed to states whose indices are lower than the current state. It also discusses the many instances where observations are not discrete but continuous, with data and probabilities that change over time. Another type applicable to speech processing is briefly discussed, the "auto-regressive HMM", where observations are drawn from what the paper calls an "auto-regressive process", ensuring each sample signal is counted exactly once.

Additional, more advanced information is covered that involves optimization of models, variants on the discussed HMM structures, scaling to include additional sequences, accounting for multiple observation sequences, choosing models, covering what happens when there is too little data, and an extended example of an application of an HMM structure to speech recognition.

Overall, I think this paper was very informative, if a little exhaustive. The basics of Markov models, and HMMs specifically, are not as daunting as the paper may initially make them appear. Some of the examples, such as the ball-and-urn model or the coin-toss model, were used only to establish terminology, rather than to run through a bottom-up description of an entire HMM with easy-to-understand models. Indeed, the only real extended example is speech recognition, which is itself difficult to understand, so framing HMMs through speech recognition is likely too esoteric for a beginner: someone unfamiliar with HMMs is very likely to be unfamiliar with speech recognition too, making it a poor choice for a "tutorial". Some of the additional information, such as the different types of HMMs, was also presented before the simpler types had been sufficiently explained, which further hindered the learning experience. The subject is, however, very relevant to sketch recognition and important for deriving what a user likely meant to write on a digital pen system rather than relying only on what the user actually wrote.

Saturday, November 1, 2014

LADDER, a sketching language for user interface developers


Bibliography Information:
Hammond, Davis. "LADDER, a sketching language for user interface developers". Computers & Graphics 29.  Pages 518-532. 2005.

URL:
http://www.sciencedirect.com/science/article/pii/S0097849305000865

LADDER was designed around the increasing ubiquity and expanding use cases of applied sketch recognition. In areas where the user and creator are both expected to share knowledge of a particular domain (such as characters in a new language, mathematical symbols, or free-hand physics diagrams), creating a sketch recognition system previously required the "instructor" (the individual providing the domain-specific materials) to also have significant experience with the concepts behind sketch recognition. LADDER was built specifically so that user-interface developers could create sketch recognition domains without expertise in recognition itself; instead they describe how the diagrams in the domain are drawn, displayed, and edited. To this end, the LADDER system handles shape and editing recognizers, shape exhibitors (rules for how shapes are displayed on screen), and an extensive sketch recognition system. A code generator was built so these descriptions can be fed into the base recognition system.

LADDER relies on shapes described with as few strokes as possible (an "artistic rendition" of a cat, complete with thousands of small strokes for facial and shadow detail, would not be considered), relying instead on the most essential geometric rules of a shape to craft its definition. A shape is defined by a list of components (an arrow, for instance, is "built from three lines"), constraints that define the relationships between the components, a set of aliases that simplify elements of the description, editing behaviors that distinguish "drawing" from "editing" actions (cut and paste, translation, rubber-band resizing, etc.), and display methods such as beautification of input strokes (strokes intended as lines but exhibiting minor hand tremors are "redrawn" as perfectly straight lines).
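To make the flavor of such a description concrete, here is an illustrative rendering of an arrow definition as plain data; this mirrors the structure of a LADDER description (components, constraints, display behavior) but is not the language's actual syntax:

    arrow_description = {
        "components": {
            "shaft": "Line",
            "head1": "Line",
            "head2": "Line",
        },
        "constraints": [
            # Constraint names here are illustrative placeholders.
            ("coincident", "shaft.p2", "head1.p1"),
            ("coincident", "shaft.p2", "head2.p1"),
            ("equalLength", "head1", "head2"),
            ("acuteAngle", "shaft", "head1"),
        ],
        "display": ["beautify: straighten all three lines"],
    }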

These recognized shapes are then grouped into more complex shapes, usually domain-specific shapes unique to one particular field (shapes depicting chemistry diagrams, Kanji characters, etc.). The language that describes LADDER shapes is hierarchical, providing descriptors such as the bounding box, center point, width, and height. LADDER has several predefined constraints built in, such as perpendicular, parallel, sameSide, and oppositeSide. Some constraints are valid only in a particular direction, and some directions can be specified in degrees relative to other shapes. The researchers found that a description system based on orientation is most useful to the content providers who would use a system like LADDER.

Recognition of domain shapes happens over the course of several bottom-up data triggers, where a recognized shape in the drawing represents a particular "fact about the world". Shape recognition uses the Jess rule-based system, which performs "clean up" on these shapes; recognized shapes are then added back into the Jess system for later recognition. Similar rules exist for user input intended to edit existing shapes, except those rules are applied to the user input rather than to the resulting sketch.

I believe the work in LADDER is an extremely important advancement in the field of sketch recognition, since it works a great deal toward 1) contextualizing sketch input so the technology is useful for real-world applications and 2) making it easier for content providers to create and support their own domains without expertise in sketch recognition. This system and its underlying principles have driven many of the projects in the Sketch Recognition Lab.

Wednesday, October 22, 2014

Combining Corners from Multiple Segmenters



Bibliography Information:
Wolin, Field, Hammond. "Combining Corners from Multiple Segmenters". SBIM '11 Proceedings of the Eighth Eurographics Symposium on Sketch-Based Interfaces and Modeling. Pages 117-124. ACM New York, NY, USA, 2011.

URL:
http://dl.acm.org/citation.cfm?id=2021164.2021185&coll=DL&dl=GUIDE

This paper seeks to substantially improve corner finding by combining the output of five different corner-finding algorithms and deciding which candidate points to keep using a variation of feature subset selection. The algorithm uses sequential floating backward selection (distinct from SFS, sequential forward selection) and a mean-squared error (MSE) measure to find the best subset of candidate corners. The combination reaches an all-or-nothing accuracy of 0.926.

Corner finding, also known as "segmentation", "fragmentation", or "cusp detection", is a vital aspect of sketch recognition. As the name "segmentation" implies, it allows an algorithm to divide one polyline into a varying number of straight segments, which can be recognized and processed individually; from there it becomes feasible to recognize lines, arcs, curves, ellipses, spirals, and helices. Existing corner finders each focus on one technique, such as curvature detection or a combination of speed and curvature, and each has considerable drawbacks: over-segmentation (finding too many corners), under-segmentation (finding too few), and incorrect segmentation (finding the wrong ones). Another issue is the use of thresholds, which are typically determined empirically and are far less flexible than parameters trained from data. The paper invokes the "No Free Lunch" theorem: the more an algorithm focuses on one particular area, the more focus (and performance) it gives up elsewhere. The basic idea of this paper is that by combining five major corner-finding algorithms, the shortcomings of one recognizer can be mitigated by the strengths of another.

The five segmenters used are ShortStraw, Douglas-Peucker, PaleoSketch, Sezgin's segmentation algorithm, and Kim et al.'s. ShortStraw, Douglas-Peucker, and PaleoSketch rely on line tests to determine whether the segment between two identified corners is a line, while Sezgin's and Kim's algorithms use multiple-primitive segmenters that split strokes into lines, curves, and arcs. The results from all five are combined, with duplicates removed, and fed to a feature subset selection algorithm dubbed "corner subset selection".

Feature subset selection is a technique for removing unneeded dimensions that contribute only noise to a classification process. Here, each candidate corner plays the role of a feature. If removing a point barely reduces performance (say, by 1%), we remove it, since it is unlikely to provide useful information; conversely, if removing a point reduces performance substantially (say, by 20%), it is an important point and we keep it.

The algorithm sifts through the initial corner data using sequential floating backward selection, which differs from traditional sequential forward selection by allowing previously discarded points to be reconsidered. Say, for instance, we remove feature Fs; in a following step we analyze whether adding Fs back into the subset yields better performance. Bookkeeping techniques mitigate the possibility of adding and removing the same Fs back and forth in subsequent steps, long after Fs has been shown to be irrelevant to the subset.

To measure the performance of a corner subset, the mean-squared error (MSE) is calculated between the original stroke and the "ideal" polyline created by linking the corners:

 \text{MSE} = \frac{1}{N}\sum_{i=1}^{N} (p_i - opt_i)^2

where i is the index into the original stroke, p_i is a point in the original stroke, opt_i is the closest point on the optimal polyline, and N is the number of original points.

Because MSE changes with the scale of the shape, the MSE values are normalized between steps. Overall, this technique achieved accuracy much higher than any of the individual segmenters, and the combination's added run time was not significant enough to affect performance.
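A compressed sketch of the selection loop, assuming an mse(corner_subset) scoring function like the one above (this is my condensation of plain backward elimination; the paper's floating variant additionally re-tests dropped corners at each step):

    def select_corners(candidates, mse, tol=1.05):
        # Backward elimination over the merged corner candidates.
        # mse(subset) scores the polyline through `subset` against the
        # stroke (lower is better). tol is an assumed tolerance: drop a
        # corner if its removal raises the error by less than 5%.
        current = set(candidates)
        changed = True
        while changed and len(current) > 2:
            changed = False
            base = mse(current)
            corner, score = min(((c, mse(current - {c})) for c in current),
                                key=lambda pair: pair[1])
            if score <= base * tol:
                current.discard(corner)
                changed = True
        return sorted(current)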

I think that the paper was well presented and explained. I feel the evaluation segment could have been more elaborated upon, but it is possible that user testing was not needed when the accuracy was considered on an all-or-nothing basis.

Tuesday, October 21, 2014

Sketch Based Interfaces: Early Processing for Sketch Understanding


Bibliographical Information:
Sezgin, Stahovich, Davis. "Sketch Based Interfaces: Early Processing for Sketch Understanding". Proceedings of PUI-2001. NY. 2001. ACM Press.

URL:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.8291

This paper is one of the earlier efforts at corner-finding and curve recognition algorithms. The broader intention is to recognize shape primitives in a way that makes geometric sense: drawn squares are converted into geometric squares that a second system can then understand. This paper describes the multiple processes used to recognize freehand sketches.

There are three phases to this "early processing" approach to sketch recognition: approximation, beautification, and basic recognition.

Approximation
Fits lines and curves to a given set of pixels. This is intended to streamline the data so that it is easier to identify the geometric properties of a given freehand shape.

The first process in this phase is vertex detection, which is essentially what we refer to as "corner finding" in more modern sketch recognition systems. This paper uses three sources of information to help identify vertices: direction, curvature (change in direction with respect to arc length), and speed data. Interestingly enough, speed data was largely considered irrelevant in basic sketch recognition when it was introduced by Rubine, and Long, in fact, removed it entirely from his modified set of features.

This vertex detection process uses the extrema in curvature and speed, observing that maxima in curvature and minima in speed are correlated. We cannot, for instance, simply take a minimum in speed to mean that the user was drawing a corner, even though a typical user slows his or her pen when drawing one; there must also be a change in curvature, or else we would consider any abrupt change in speed, even on a straight line, a "corner". However, even using both of these attributes at once yields false positives, so the paper uses a technique called "average-based filtering" that looks for extrema in the portions of the curvature and speed data lying beyond a specified threshold, with each threshold based on the mean of its data set. These thresholds separate the data into regions, within which local and global maxima/minima are used to identify which candidates are false positives.

From there, the paper generates hybrid fits that combine these signals to identify true positives in vertex detection: it computes vertex certainties, generates a set of candidate hybrid fits, and selects the best fit among them.
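
A simplified sketch of average-based filtering might look like the following; the pooling of both candidate sets and the 0.9 speed factor are my own illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def average_based_filter(curvature, speed, speed_factor=0.9):
    """Keep local curvature maxima above the curvature mean and local speed
    minima below a fraction of the mean speed; indices are corner candidates."""
    curv_thresh = np.mean(curvature)
    speed_thresh = speed_factor * np.mean(speed)
    corners = []
    for i in range(1, len(curvature) - 1):
        curv_max = (curvature[i - 1] < curvature[i] > curvature[i + 1]
                    and curvature[i] > curv_thresh)
        speed_min = (speed[i - 1] > speed[i] < speed[i + 1]
                     and speed[i] < speed_thresh)
        if curv_max or speed_min:
            corners.append(i)
    return corners
```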

Beautification
This is the process in which small alterations are made to the input sketch. For instance, lines that were clearly meant to be parallel (e.g., the sides of a rectangle) are altered so that they are geometrically parallel. Circles drawn imperfectly are changed to have geometrically accurate diameters. Curves that are a hybrid of the two are approximated as several small, geometrically accurate ellipses pieced together into one large curve. The intended slope of a line is determined as a weighted average of the slopes of each segmented section of the shape.
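
As a small illustration of the slope step, here is a hedged sketch that estimates a line's intended angle as a length-weighted average of the angles of its pieces; this is my own simplification, not the paper's exact formula:

```python
import math

def intended_angle(segments):
    """segments: list of ((x0, y0), (x1, y1)) pieces of one drawn line.
    Returns a length-weighted average of the pieces' angles. Note: a robust
    version would average unit direction vectors to handle angle wraparound."""
    total_len, weighted = 0.0, 0.0
    for (x0, y0), (x1, y1) in segments:
        length = math.hypot(x1 - x0, y1 - y0)
        weighted += length * math.atan2(y1 - y0, x1 - x0)
        total_len += length
    return weighted / total_len
```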

Basic Recognition
Objects in the user's input sketch are classified into lines and curves, and then further into a set of basic geometric shapes such as ovals, circles, squares, and rectangles. These classification rules are hand-tailored from the information produced by the paper's algorithm: it takes the sections of the shape divided by the vertices recognized in the approximation phase, and uses template matching to see whether they align with the geometric rules for each shape.

To evaluate this system, thirteen students used both it and a Unix tool called Xfig. Xfig supports freehand sketching but requires users to indicate which sections of a sketch are meant to be straight lines, curves, etc. Users were given the chance to get comfortable with both systems, and their experiences were documented with qualitative analyses. The students expressed approval of the new system's automated recognition, and all but one preferred it over Xfig. Additionally, high-level recognizers were written to classify and beautify additional shapes. The paper also reports ongoing work toward supporting overtracing.

Overall, I felt this paper provides good information on early efforts toward vertex detection, and on using detected vertices to construct a roughly hierarchical recognition system that identifies curves and lines to eventually generate geometrically correct shapes. Some of the evaluation, particularly the user study, was unclear and I felt could be expanded. For instance, what happened when the system presented an incorrect interpretation to the user? Was the user expected to redraw the shape? Was the shortcoming communicated clearly? How did the user react? The only measures in the user study were whether participants preferred this system or Xfig and whether they would use it again, but there is significantly more information that could be discerned from the user experience. Additionally, some of the algorithmic details were unclear and required significant explanation to fully understand.

Thursday, October 16, 2014

PaleoSketch: Accurate Primitive Sketch Recognition and Beautification



Bibliography Information:
Paulson, Hammond. "PaleoSketch: Accurate Primitive Sketch Recognition and Beautification." IUI '08 Proceedings of the 13th international conference on Intelligent user interfaces. Pages 1-10. ACM New York, NY, USA, 2008.

URL:
http://dl.acm.org/citation.cfm?id=1378775

PaleoSketch, a novel low-level shape recognizer, is one of the flagship projects of the Sketch Recognition Lab. It is a recognition and beautification system that recognizes eight primitive shapes, as well as combinations of those primitives, achieving recognition rates upwards of 98.56%; it also beautifies input sketches to aid classification at an early level. Much of this success is due to the inclusion of two new classification features: Normalized Distance between Direction Extremes (NDDE) and Direction Change Ratio (DCR).

To calculate NDDE, the points with the highest and lowest direction values (the angle of the stroke's tangent) are recorded. The stroke length between these two points is then divided by the total stroke length, which essentially gives the percentage of the stroke that lies between the two direction extremes. Shapes such as arcs have been observed to yield particularly high NDDE values. Polylines with many "spikes" in their shapes, on the other hand, yield lower NDDE values.

DCR is used to tell polylines and curved strokes apart. It is similar to NDDE in that it also detects "spikes" in the drawn shape: DCR is the maximum change in direction divided by the average change in direction. Polylines will typically have very high DCR values along with their lower NDDE values.
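
A hedged sketch of both features, computed from the stroke's direction graph; the function names and use of numpy are mine, and `points` is assumed to be an (N, 2) array of x/y samples:

```python
import numpy as np

def direction_values(points):
    """Tangent angle of each segment; unwrapped to avoid +/-pi jumps."""
    d = np.diff(points, axis=0)
    return np.unwrap(np.arctan2(d[:, 1], d[:, 0]))

def ndde(points):
    """Normalized Distance between Direction Extremes."""
    dirs = direction_values(points)
    seg_len = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc = np.concatenate(([0.0], np.cumsum(seg_len)))  # arc length per point
    lo, hi = sorted((int(np.argmin(dirs)), int(np.argmax(dirs))))
    return (arc[hi] - arc[lo]) / arc[-1]

def dcr(points):
    """Direction Change Ratio: max direction change over the average."""
    change = np.abs(np.diff(direction_values(points)))
    return change.max() / change.mean()
```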

From there, PaleoSketch performs a series of classification tests, each with its own guidelines (a hedged sketch of a simple line test follows the list):

  • Line Test
  • Polyline Test
  • Ellipse Test
  • Circle Test
  • Arc Test
  • Spiral Test
  • Helix Test
  • Complex Test
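
As an illustration of what such a test might look like, here is a hedged sketch of a simple line test: fit a least-squares line and accept the stroke when the normalized orthogonal fit error is small. The threshold is illustrative; PaleoSketch's actual tests and constants differ:

```python
import numpy as np

def line_test(points, threshold=0.02):
    """Accept the stroke as a line when the mean orthogonal distance to the
    best-fit line, normalized by stroke length, falls below the threshold."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # Principal direction via SVD gives the best-fit line through the centroid.
    _, _, vt = np.linalg.svd(pts - centroid)
    direction = vt[0]
    normal = np.array([-direction[1], direction[0]])
    error = np.abs((pts - centroid) @ normal).mean()
    stroke_len = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    return error / stroke_len < threshold
```
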
After collecting two sets of data (900 shapes from 10 users for training, and the same amount for testing), it was found that the correct shape appeared among the returned interpretations 99.89% of the time, and was the top interpretation 98.56% of the time. Polylines accounted for 1.44% of the error. Overall, the results rival those of state-of-the-art low-level recognizers that recognize fewer primitives. These are promising results, as the recognized primitives can readily be fitted into higher-level complex interpretations.

As previously mentioned, this paper proved to be one of the most important in the Sketch Recognition Lab, as it was among the most accurate and novel implementations of low-level sketch recognition at the time. The implementation of the different tests, and the fact that, for instance, a circle and an ellipse are treated as two different primitives, makes it clear that the classification categories considered prior to this paper stood to be significantly improved.

What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition


Bibliographical Information:
Paulson, Rajan, Davalos, Gutierrez-Osuna, Hammond. "What!?! No Rubine Features?: Using Geometric-based Features to Produce Normalized Confidence Values for Sketch Recognition." Visual Languages and Human Centric Computing Workshop on Sketch Tools for Diagramming, Herrsching am Ammersee, Germany, September 5, 2008

URL:
http://srl.cse.tamu.edu/srlng/research/paper/23?from=/srlng/research/

At the time of this paper's publication, there were two main types of recognizers: gesture-based and geometric-based. The former focused on a user's input gesture and analyzed it to recognize some low-level shape, but frequently did not analyze the resulting shape itself. The latter would take a single stroke and try to classify it as one of several different primitives that can then be combined hierarchically to recognize more complex shapes. Both have their individual advantages and disadvantages, and their trade-offs typically meant that one application would have to forgo the advantages of the other. This paper focuses on combining the two, resulting in a set of features that largely does not conform to the standards set forth in the Rubine paper.

The paper uses a quadratic classifier over a feature set drawn from both geometric and gesture features. The initial set contained 44 features: 31 geometric-based, plus the 13 features proposed by Rubine. Tests were performed on 1,800 sketch samples from 20 different users. The main goal was to see whether it was possible to create a statistical classifier that classifies single-stroke sketched primitives using both geometric and gestural features. The paper also sought to determine whether such a system would produce results at least as good as PaleoSketch's.

Employing a greedy Sequential Forward Selection (SFS) technique, the team selected a subset of the features. They ran SFS over 10 folds, and the best-performing features (roughly anything above 50% accuracy) were chosen, for a total of 14. The paper found that using only 6 of these features yielded an accuracy of up to 93%, suggesting these few features contained most of the information required to correctly classify the vast majority of input data. The most important observation is that only one of the gesture-based features proved statistically relevant to the classification (total rotation), which incidentally was also used in PaleoSketch. This suggests the field may be moving beyond the Rubine features that had previously dominated the sketch recognition discussion.
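
A minimal sketch of greedy SFS, assuming a hypothetical evaluate function that trains and scores the classifier on a given feature subset (names are illustrative, not the paper's):

```python
def sequential_forward_selection(all_features, evaluate):
    """Greedy SFS: repeatedly add the feature that most improves the score
    from evaluate(subset); stop when no remaining feature helps."""
    selected, best_score = [], 0.0
    while len(selected) < len(all_features):
        remaining = [f for f in all_features if f not in selected]
        score, feature = max(
            ((evaluate(selected + [f]), f) for f in remaining),
            key=lambda pair: pair[0])
        if score <= best_score:
            break  # no remaining feature improves accuracy
        selected.append(feature)
        best_score = score
    return selected, best_score
```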

I found this a valuable paper, mostly because it contextualizes and puts into practice both geometric and gesture-based sketch recognition. By making a practical comparison that analyzed the classification accuracy of both types, the paper provided some much-needed insight into whether the Rubine features remained relevant at this point in time.

Tuesday, September 30, 2014

Sketch System Principles and Mechanix Feedback

What a Sketch System Should Do:

  1. Pressure-based line thickness. Though this is a recent development and is highly dependent on hardware, it greatly enhances the feeling of drawing with a real pen on paper
  2. Use a stylus as its primary input method. Most touch systems of today use finger touch, but finger paint is an extremely limited subset of all kinds of paper sketching/painting. Stylus sketching is far more reflective of the type of sketching most people do on paper
  3. Beautify strokes to appear as seamless as possible. Work to remove any jarring angular changes in sketches resulting from the touchscreen hardware. The more pixelated a sketch looks, the more it breaks the immersion of writing as one would on paper
  4. Stroke deletion. Ensure that the system supports deleting entire strokes at a time. Although this does not reflect the realism of writing on actual paper, deleting entire strokes at a time is extremely convenient for anything other than drawing applications. It should at least be an option
  5. Zoom in/out, translate. Since touch surfaces are typically smaller than regular pieces of paper (especially true of tablets), the user should be able to zoom in, out, and translate the page across a zoomed in version to better suit the user's preferred comfort of sketch size.
  6. Have a non-intrusive interface. Interfaces in sketching systems are unlike those of any other kind of system. Features cannot be buried under menus, nor can every feature be listed at once. Features should be smartly presented and hidden in such a way that they interfere as little as possible with the actual sketching
  7. As big a sketch area as possible. Sketch areas should not stop strictly at the edge of the window; they should be easily and intuitively resizable. Avoid the "Windows resize arrow" style of resizing, as it does not work well with a stylus or touchscreen.
  8. Have a grid, unless it is a drawing application. Nearly every piece of paper, whether for regular writing, architecture sketching, or mathematical formulas, has ruled lines or a grid to aid the user's writing task. Completely blank pages are only suitable for art drawing and should only be the default if the application is intended for that domain
  9. Provide ample stroke size and color options. Users are more creative when given stroke size and color choices, which frequently aid them when sketching for homework or productivity tasks. These options are both fun in recreational use and useful in productivity tasks.
  10. Simulate the on-paper experience as much as possible. The most successful sketch systems are the ones where a user acclimated to the system "forgets" he or she is writing on a digital surface when sketching or writing. All sketching systems should strive for this kind of user experience.


What a Sketch System Should Not Do:

  1. Aggressively auto-correct. While some beautification is expected, snapping lines to make them perfectly straight and at predetermined angles 100% of the time is jarring and removes a user's sense of agency.
  2. Be laggy. A sketch system should have little to no lag between the user's input and the stroke appearing. Significant lag breaks immersion at best and renders the entire application useless at worst. A lag-free system is vital to capturing the illusion of writing on a regular piece of paper
  3. Be "heavy". "Heavy" systems, whether in terms of computing power, or in terms of file size when saving sketches, is vastly inefficient and adds a layer of technology-borne burden that otherwise would not exist on regular paper.
  4. Use or preserve a "mouse-based" interface. Do not assume that users have access to a mouse to complement their sketching needs, and thus do not use small interface items optimized for mice. Assume that the user will want to reach every last item of the UI through stylus or finger only.
  5. Be strict in sketch order. Especially with sketches that are to be analyzed by a system, expecting users to know the order in which sketches should be drawn is highly unintuitive and should be completely avoided.
  6. Fault the user for misinterpreted shapes. Nearly everyone is sure that the shapes they draw are "correct". If a stroke is misinterpreted, do not say "you have drawn this incorrectly", or something to that effect, as it is overly antagonistic and isn't in the spirit of free-hand sketching
  7. Have the user spend more time on the UI than sketching. Digitized applications can always come with a plethora of options and features, but with sketching, less is more. Sketch systems should not see their users spend more time navigating and "figuring out" the UI than actually sketching.
  8. Mix stylus with touch sketching at the same time. If the user is sketching with a stylus, disable all touch-based input and vice versa. Most people rest their hands on the paper when writing, so they do not expect the side of their hand to actually be involved with the writing process.
  9. Use pre-determined shapes as a primary method of sketching. Paper sketching does not afford dragging and dropping items as a primary input method. These tasks are far more suited for more traditional, mouse-based systems.
  10. Use non-paper backgrounds or patterns. Do not visually model the drawing surface as anything other than what a person would reasonably expect of a writing surface. Avoid unusual colors or patterns that do not in any way resemble a drawing surface. This does not mean all interfaces should be skeuomorphic; it simply means they should not be so far removed from a writing experience that they jar users.

Improvements for Mechanix:
  1. Optimize idle CPU load. The CPU is always being "used" in Mechanix, even during idle time. This greatly drains batteries and runs hardware hot, which in a more tablet-based industry is quickly becoming unacceptable for users.
  2. Save student progress on the database. Maintaining saved, partially completed homework assignments locally is not portable. Since the MCX system is online-only, there shouldn't be "I left my homework at home" scenarios of the offline world.
  3. Re-think the items displayed on the drop-down menus. Some features are experimental, some antiquated, some buggy. For the user build, we should comb through what is shown on these and remove the ones that aren't intended for a student to use in his or her homework assignment.
  4. Alter color palette. Some of the contrasting colors can be rethought to better suit the homework style for MCX. Green-on-white for the description of solved problems, for instance, strains the eyes too much.
  5. UI scaling. Although this is a big task, the wide range of display resolutions across computers means Mechanix should implement UI scaling to preserve the user experience. Users with lower resolutions currently contend with considerably smaller sketching areas.

Sunday, September 28, 2014

Visual Similarity of Pen Gestures

Bibliographical Information:
Long, Landay, Rowe, Michiels. "Visual Similarity of Pen Gestures." CHI '00 Proceedings of the SIGCHI conference on Human Factors in Computing Systems. Pages 360-367. ACM New York, NY, USA

URL:
http://dl.acm.org.lib-ezproxy.tamu.edu:2048/citation.cfm?id=332458

This paper provides an early look into the computational interpretation of sketches. Specifically, the authors gear this research toward one-stroke gestures intended to be used as commands, as was common the year this paper was published: PDA devices with smaller screens relied on one-stroke gestures to perform commands, and while this proved convenient, many users felt frustration and confusion when the system failed to interpret their gestures as intended.

The paper presents two experiments intended to provide better insight into how humans perceive "similarity" between two gestures. Indeed, one of the more common complaints when a user is presented with a gesture recognition failure is "but my gesture really does look like a triangle!" This leads to the paper's main motivation: identifying what the paper calls "perceptual similarity", and determining whether this similarity can be computed empirically. Both experiments involve users being presented with similar shapes and selecting the shape in the set that is least similar. The first experiment had users pick among sets of three gestures, called triads; every triad was seen exactly once. The second experiment involved three new gesture sets of nine gestures each, with those gestures serving as the points of comparison.

From these responses, the authors identified the gesture features that played the most vital role in determining whether two gestures are perceived as "similar" or "different", and presented a computable model of perceptual similarity, which they found to correlate with reported similarity at 0.56. The authors found these results encouraging and significant, as a computational model could now help determine what had previously been a purely subjective judgment.
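
To give a flavor of what a "computable" similarity model might mean, here is a hedged sketch: predicted dissimilarity as a weighted sum of absolute feature differences, with weights that would be fit by regression against the human judgments. This is illustrative only, not the paper's actual model:

```python
import numpy as np

def predicted_dissimilarity(features_a, features_b, weights):
    """Weighted sum of absolute feature differences between two gestures;
    lower values mean the pair is predicted to look more similar."""
    diff = np.abs(np.asarray(features_a, float) - np.asarray(features_b, float))
    return float(np.dot(weights, diff))
```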

The paper provides a good foundation for the concept of perception and what would eventually become geometric recognition. I did take issue with the fact that none of the gestures compared were drawn by users; rather, they were picked from a menu of pre-drawn shapes. While the significant reduction in experiment time was important, the task observed here is one of simple image comparison, which is very different from the actual practice of a user drawing a gesture and how a computer would perceive it. Additionally, even though subjectivity is important to capture, relying on purely subjective comparisons among individual users to generate a computational formula would, at best, only reflect the perceptual differences of that particular group. Still, this paper provided a significant first step toward computationally determining similarity as a human being would report it.