Monday, December 8, 2014

KimCHI: A Sketch-Based Developmental Skill Classifier to Enhance Pen-Driven Educational Interfaces for Children


Bibliographical Information:
Hong-hoe Kim, Paul Taele, Stephanie Valentine, Erin McTigue, and Tracy Hammond. 2013. KimCHI: a sketch-based developmental skill classifier to enhance pen-driven educational interfaces for children. In Proceedings of the International Symposium on Sketch-Based Interfaces and Modeling (SBIM '13), Stephen N. Spencer (Ed.). ACM, New York, NY, USA, 33-42. DOI=10.1145/2487381.2487389 http://doi.acm.org/10.1145/2487381.2487389

URL:
http://dl.acm.org/citation.cfm?id=2487389

KimCHI is a sketch recognition system built around sketches drawn by children. Specifically, it analyzes a child's sketches to determine whether the drawer is at a pre-school or elementary-school level. Additionally, the way sketches are drawn can be analyzed to tell apart whether the user is male or female, and whether the user is an adult or a child. The novelty of this recognition system lies in the fact that very little prior work has analyzed sketch recognition for children, who draw shapes very differently depending on their fine motor skills as well as many other factors.

From the data of 725 sketches created by 20 children, classification techniques were used to identify the optimal features for distinguishing pre-schoolers from grade schoolers: the average curvature of the stroke, the direction change ratio, the error of the best-fit line of the direction strength, and the ratio of maximum curvature to average curvature. Each of these features achieved 100% accuracy on this task. The optimal features for distinguishing pre-schoolers from adults are the average curvature of the stroke (100%), the direction change ratio (100%), the angle of the major axis relative to the center (100%), the error of the best-fit line of the direction graph (100%), the ratio of maximum curvature to average curvature (100%), and the slope of the direction graph (100%).
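
To make these features concrete, here is a minimal Python sketch of two of them. The exact formulas are my own assumptions based on the feature names, not the paper's implementation.

```python
import math

def _directions(points):
    """Angle of each consecutive point-to-point segment, in radians."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

def average_curvature(points):
    """Mean absolute change of direction between consecutive segments."""
    dirs = _directions(points)
    turns = [abs(math.atan2(math.sin(b - a), math.cos(b - a)))  # wrapped angle difference
             for a, b in zip(dirs, dirs[1:])]
    return sum(turns) / len(turns) if turns else 0.0

def direction_change_ratio(points):
    """Fraction of consecutive turns whose sign flips (left turn to right turn or back)."""
    dirs = _directions(points)
    turns = [math.atan2(math.sin(b - a), math.cos(b - a)) for a, b in zip(dirs, dirs[1:])]
    flips = sum(1 for t1, t2 in zip(turns, turns[1:]) if t1 * t2 < 0)
    return flips / len(turns) if turns else 0.0

# A jittery, child-like stroke tends to score higher on both features than a smooth one.
stroke = [(0, 0), (10, 2), (20, -1), (30, 3), (40, 0)]
print(average_curvature(stroke), direction_change_ratio(stroke))
```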

Sketch recognition for children is a largely overlooked domain. I believe addressing it is imperative if we want to move sketch recognition as a field into the realm of education. We could use it, for example, for number recognition to teach children handwriting as well as basic arithmetic, both important pillars of early grade-school education.

Sunday, December 7, 2014

World of Workout: A Contextual Mobile RPG to Encourage Long Term Fitness


Bibliographical Information:
Joey Bartley, Jonathon Forsyth, Prachi Pendse, Da Xin, Garrett Brown, Paul Hagseth, Ashish Agrawal, Daniel W. Goldberg, and Tracy Hammond. 2013. World of workout: a contextual mobile RPG to encourage long term fitness. In Proceedings of the Second ACM SIGSPATIAL International Workshop on the Use of GIS in Public Health (HealthGIS '13). ACM, New York, NY, USA, 60-67. DOI=10.1145/2535708.2535718 http://doi.acm.org/10.1145/2535708.2535718

URL:
http://dl.acm.org/citation.cfm?id=2535718

World of Workout is an alternative exercise application for mobile phones. It combines the increasingly common "game-ification" of exercise applications with a dedicated game application. The idea behind World of Workout is to create an avatar in a game world whose statistics depend directly on that user's exercise data.

The user's avatar has three main statistics: strength, speed, and stamina, all of which play a direct role in how the avatar performs in the game, and all of which are directly influenced by the exercise data. To detect user exercises, the application uses the phone's built-in GPS and accelerometer to track runs and stationary exercises. The in-game attributes of speed, strength, and stamina are driven by sprints; weights and push-ups; and long-distance runs and biking, respectively. The game also contains a random-reward function, in which the user is recommended one particular exercise and receives an immediate, significant bonus to the avatar's statistics if the exercise is completed. This is one of two components that encourage users to try a variety of exercises, which is both safer for the user and creates a more well-rounded exercise program.
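
The random-reward mechanic is simple enough to sketch in a few lines. The exercise names, stat mapping, and bonus value below are placeholders of my own, not values from the paper.

```python
import random

EXERCISES = {            # exercise -> the avatar stat it trains
    "sprint": "speed",
    "push_ups": "strength",
    "long_run": "stamina",
}

def daily_recommendation():
    """Pick one exercise at random to recommend for today."""
    exercise = random.choice(list(EXERCISES))
    return exercise, EXERCISES[exercise]

def apply_reward(avatar_stats, exercise, completed, bonus=5):
    """Grant an immediate, outsized boost to the linked stat if the recommended
    exercise was completed."""
    if completed:
        avatar_stats[EXERCISES[exercise]] += bonus
    return avatar_stats

stats = {"speed": 10, "strength": 10, "stamina": 10}
exercise, stat = daily_recommendation()
print(f"Recommended today: {exercise} (trains {stat})")
print(apply_reward(stats, exercise, completed=True))
```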

The second of these components is the "fatigue" mechanic, which is integrated deeply into how exercise translates to avatar progress. The rate at which the player's avatar gains stats is linear in how much exercise the user performs, so the biggest progress comes from performing the widest variety of exercises possible. "Fatigue" here is an implementation of diminishing returns: a user who does nothing but sprint will see the improvements to the avatar's speed shrink with each repetition, to the point where the gain becomes negligible.
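
Here is a toy illustration of the fatigue idea as diminishing returns. The geometric decay is my own assumption, since the paper does not give an exact formula.

```python
def stat_gain(base_gain, repetitions_so_far, fatigue_rate=0.5):
    """Diminishing returns: each repeat of the same exercise earns less than the last."""
    return base_gain * (fatigue_rate ** repetitions_so_far)

speed = 0.0
for rep in range(6):                      # a user who does nothing but sprint
    gain = stat_gain(1.0, rep)
    speed += gain
    print(f"sprint #{rep + 1}: +{gain:.3f} speed (running total {speed:.3f})")
# The gains approach zero, so mixing in strength and stamina work is the faster path.
```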

The reasons for implementing this mechanic are numerous. First, as mentioned previously, it encourages a well-rounded exercise program. Second, repeated and concentrated use of any one exercise can strain the muscles and eventually become dangerous, so fatigue prevents users from over-exerting themselves for the sake of improving their avatar. Third, it deters cheating by a user who has figured out how to "cheat" the phone's accelerometer into thinking he or she is running while actually standing in place.

I think that the fatigue factor is easily the most novel contribution this application makes to the space. Even though some applications exist to game-ify exercise activities, there are not many mobile games directly linked to an exercise application, and certainly not one where the game avatar's performance is tied directly to the user's exercise regimen. Finally, the fatigue factor performs many functions at once and creates a level of abstraction between player effort and avatar progress that is vital to making a video game "fun". I believe a larger set of fatigue-like mechanics, along with a more fully developed game to accompany the exercise portion, could make this a very useful tool for exercise.

Recognizing Interspersed Sketches Quickly



Bibliographical Information:
Tracy A. Hammond and Randall Davis. 2009. Recognizing interspersed sketches quickly. In Proceedings of Graphics Interface 2009 (GI '09). Canadian Information Processing Society, Toronto, Ont., Canada, Canada, 157-166.

URL:
http://dl.acm.org.lib-ezproxy.tamu.edu:2048/citation.cfm?id=1555917

One of the biggest challenges in sketch recognition is the recognition of "interspersed sketches", in which the strokes of one shape are drawn out of order, interleaved with the strokes of other shapes. When drawing two triangles, for instance, one may draw two sides of the first triangle, start on the second triangle, and only later return to finish the third side. This proves challenging for many recognition systems, whose algorithms end up running in exponential time. By defining a system of constraints that narrows where to look, this recognition system greatly reduces the runtime of recognition that supports interspersed drawing, even though it still runs in exponential time in the worst case. The algorithm uses an indexing technique built on geometric recognition for ranking and pruning.

The geometric constraints fall into four categories: orientation-dependent constraints (such as "horizontal" and "vertical"), orientation-independent constraints ("acute", "bisects", "same side"), composite constraints ("smaller", "centered left"), and "other" constraints ("equal", "greater than"). Signal noise, while significant, is handled by the recognition system itself: template shapes are typically described without noise, so when a user's shape is recognized, no signal noise needs to be accounted for either. Additional noise can occur if the user takes more than one stroke to draw a shape; this is alleviated by splicing larger shapes together from smaller strokes when they correctly match a given shape. The system also uses a collection of constraints, each with its own error tolerance, and a recognition is discarded if it falls outside that tolerance. There are two types of "requirements" that must be satisfied: a shape must be not only "geometrically satisfied", meeting the purely geometric constraints, but also "perceptually satisfied", meeting the set of constraints that an ordinary human would be able to perceive. For instance, a stroke that is "to the left" of another may be geometrically satisfied even if it is only one pixel to the left, but a human would rarely make the same call given so small a discrepancy. Making sure a human can perceive these constraints is key to making them meaningful in sketch recognition. The constraint tolerances were both calculated and validated in experiments with users.
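
To illustrate the geometric-versus-perceptual distinction, here is a simplified sketch using a hypothetical "left of" constraint; the 10% margin is an illustrative tolerance I chose, not the paper's value.

```python
def center_x(shape):                      # shape = list of (x, y) points
    xs = [x for x, _ in shape]
    return (min(xs) + max(xs)) / 2

def width(shape):
    xs = [x for x, _ in shape]
    return max(xs) - min(xs)

def left_of_geometric(shape_a, shape_b):
    """Pure geometry: the center of A is strictly left of the center of B."""
    return center_x(shape_a) < center_x(shape_b)

def left_of_perceptual(shape_a, shape_b, min_gap_ratio=0.1):
    """Perceptual: A must be left of B by a margin a person would actually notice,
    here at least 10% of B's width (an assumed tolerance)."""
    gap = center_x(shape_b) - center_x(shape_a)
    return gap > min_gap_ratio * width(shape_b)

square_a = [(0, 0), (10, 0), (10, 10), (0, 10)]
square_b = [(1, 0), (11, 0), (11, 10), (1, 10)]   # only one unit further right
print(left_of_geometric(square_a, square_b))      # True: geometrically satisfied
print(left_of_perceptual(square_a, square_b))     # False: not perceptually satisfied
```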

The indexing algorithm that ranks and scores these constraints is divided into three stages: domain-independent primitive finding, domain-independent constraint indexing, and domain-dependent shape formation. The first stage breaks the given strokes down into primitives such as arcs, ellipses, circles, and corners, the last of which are found via speed and curvature data. The second stage indexes each shape only once, so that the algorithm does not impose too many rules on the shapes it recognizes: each shape is indexed based on its own constraints and NOT on the constraints of the shapes around it. The final stage attempts to group indexed shapes into compound, more complex versions of the primitives. The system assigns recognized shapes to a limited number of "slots", which are then used for comparison and possible combination; the constraints applied to the compound shapes dictate whether the shapes in the slots are removed, kept, or exchanged. The system also makes extensive use of pruning to prevent unnecessary branching, which cuts the run time drastically.
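
The primitive-finding stage can be sketched roughly as follows; the thresholds and the exact speed and curvature estimates are assumptions on my part, not the paper's algorithm.

```python
import math

def find_corners(points, times, speed_thresh=0.25, curv_thresh=math.radians(45)):
    """points: [(x, y)], times: [t] in seconds. Returns indices of likely corners,
    i.e. points where the pen moved slowly or the direction changed sharply."""
    speeds, curvs = [0.0], [0.0]
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        speeds.append(math.hypot(x2 - x0, y2 - y0) / max(times[i + 1] - times[i - 1], 1e-6))
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        curvs.append(abs(math.atan2(math.sin(a2 - a1), math.cos(a2 - a1))))
    mean_speed = sum(speeds) / len(speeds)
    return [i for i in range(1, len(points) - 1)
            if speeds[i] < speed_thresh * mean_speed or curvs[i] > curv_thresh]
```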

I think this constraint-based approach is an interesting alternative solution to the problem of interspersed-shape sketch recognition. It is especially useful in educational settings, where the user is expected to be a student who is not yet an expert in the field and thus is unlikely to follow a standard, industry-established drawing order, if one even exists. In education, sketch recognition needs to be highly flexible so that different students can learn in their own way, and support for interspersed-shape sketching can prove very useful there.

An image-based, trainable symbol recognizer for hand-drawn sketches


Bibliographical Information:
Levent Burak Kara, Thomas F. Stahovich, An image-based, trainable symbol recognizer for hand-drawn sketches, Computers & Graphics, Volume 29, Issue 4, August 2005, Pages 501-517, ISSN 0097-8493, http://dx.doi.org/10.1016/j.cag.2005.05.004.
(http://www.sciencedirect.com/science/article/pii/S0097849305000853)

URL:
http://www.sciencedirect.com.lib-ezproxy.tamu.edu:2048/science/article/pii/S0097849305000853

This paper describes a trainable, hand-drawn symbol recognizer built on the concept of "multiple layers". While the system differs from most symbol recognizers in its extensive use of template matching, it is in no way at a disadvantage for doing so.

The focus of this system is to develop a "portable" ink recognition utility that could be used in multiple applications. For that reason, the system's trainability by a layman user is an important feature. Symbols intended to serve as "templates" can easily be drawn and re-drawn by the user at will, creating a flexible system in which the user can input new template data without having to navigate away from the main application.

One of the significant new contributions of this paper is how it handles rotation during template matching. Rather than relying on the traditional Cartesian x-y coordinate system, sketches are examined in a polar coordinate system, which makes comparing a template against a sketch much easier and more complete, because rotated images can be analyzed natively in polar coordinates.

The architecture of the recognition system is as follows: the raw "ink" input from the user may itself be used as a template. Whether it is used for training or for recognition, some pre-processing is done on the ink: the image is rasterized onto a 48x48 grid, so the symbol is represented by which of the grid's cells are colored. This representation makes the system much easier to use for both storage and recognition.

From there, if the stroke is to be used for recognition, the grid is converted to polar coordinates so it can be compared against its template without accounting for data discrepancies caused by rotation. This change is motivated largely by the fact that manually rotating sketches to match the angle of the template is extremely computationally expensive. In the paper's example, the letter "P" in its upright and rotated positions still yields remarkably similar graphs when both are converted to polar coordinates. At worst, the rotation causes the latter segment of the polar graph to shift to the beginning of the graph, but the graph as a whole remains very similar. The system uses this as a way of "pre-recognizing": discrepancies in the polar coordinate data help determine whether an input stroke matches a template.
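
A condensed sketch of this preprocessing pipeline follows; the 48x48 grid size comes from the paper, while the centroid-based polar conversion and everything else here is my own simplification.

```python
import math

def rasterize(points, size=48):
    """Scale the stroke points onto a size x size grid of 0/1 cells."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    w, h = max(max(xs) - x0, 1e-6), max(max(ys) - y0, 1e-6)
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        col = min(int((x - x0) / w * (size - 1)), size - 1)
        row = min(int((y - y0) / h * (size - 1)), size - 1)
        grid[row][col] = 1
    return grid

def to_polar(grid):
    """Return (radius, angle) for every inked cell, measured from the ink's centroid,
    so that rotating the symbol mostly just shifts the angle values."""
    cells = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v]
    cr = sum(r for r, _ in cells) / len(cells)
    cc = sum(c for _, c in cells) / len(cells)
    return [(math.hypot(r - cr, c - cc), math.atan2(r - cr, c - cc)) for r, c in cells]
```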

The classifiers used for template matching are the Hausdorff distance, a modified version of the Hausdorff distance, the Tanimoto coefficient, and the Yule coefficient; the result from each is combined to generate the recognition results. Together with the conversion to polar coordinates, this yields a very flexible and trainable system, and the user study utilized a wide variety of symbols: beam, pivot, root, pump, cantilever beam, piston, sum symbol, random number symbol, square, spring, current, sine wave, matrix, damper, circular sum, pulley, differentiator, diode, integrator, and signum.
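
For reference, here are textbook versions of two of those classifiers on binary grids; these are the standard definitions, not the paper's tuned variants.

```python
import math

def inked(grid):
    """Coordinates of all colored cells in a 0/1 grid."""
    return [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v]

def directed_hausdorff(grid_a, grid_b):
    """Largest distance from any inked cell of A to its nearest inked cell of B."""
    a, b = inked(grid_a), inked(grid_b)
    return max(min(math.hypot(ra - rb, ca - cb) for rb, cb in b) for ra, ca in a)

def tanimoto(grid_a, grid_b):
    """Shared ink over total ink: 1.0 for identical images, 0.0 for disjoint ones."""
    a, b = set(inked(grid_a)), set(inked(grid_b))
    return len(a & b) / len(a | b)
```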

I think the use of the polar coordinate system could yield a large variety of additional features that have not yet been explored. For instance, most rotation-based features, such as Rubine's rotation features, could be applied in polar coordinates and would likely yield much faster analysis, much as this system uses the representation to improve its analysis run time. Another interesting application would be applying sketch recognition to shapes drawn directly in polar coordinates, a rarely explored possibility in the field.

Tahuti: A Sketch Recognition System for UML Class Diagrams



Bibliographical Information:
Hammond, T., & Davis, R. (2002). Tahuti: A sketch recognition system for uml class diagrams. In AAAI Spring Symposium on Sketch Understanding (Vol. 3, No. 2).

URL:
http://rationale.csail.mit.edu/pubs/hammond/hammondsketchsymp2001.pdf

Tahuti is a sketch recognition environment designed around creating a "smarter" form of sketch recognition. The system recognizes shapes via geometric recognition rather than requiring users to draw a shape that matches a template, and it uses a multi-layer framework for recognition: strokes are initially recognized as line segments, then grouped into more complex shapes when they are recognized as a particular shape.

The system recognizes seven "viewable" objects: a general class, an interface class, an inheritance association, an aggregation association, a dependency association, an interface association, or a collection of unrecognized strokes. These objects can be further edited, which is a highly important feature of Tahuti that sets it apart from many other sketch recognition systems. For geometric recognition, Tahuti uses a variety of geometric "rules" that each shape follows. For instance, Tahuti recognizes arrows as follows (a rough code sketch of the first two steps appears after the list):


  1. Locate the arrow shaft: identify the two points furthest away from each other (call them A and B; the shaft is the segment AB).
  2. Locate the arrowhead points: identify the points furthest away from the shaft AB (call them C and D).
  3. Locate E, the point on line AB whose distance from B is twice the distance from the line through C and D to the arrowhead point B.
  4. Classify the resulting line segments as the arrow shaft, part of the arrowhead, or "unclassified".
  5. Based on the results of step 4, classify the arrow as "dependency", "inheritance", or "aggregation", or leave it unclassified.
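
Here is a rough sketch of steps 1 and 2; the geometry helpers are my own, and the later classification of the arrowhead segments is omitted.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_to_segment(p, a, b):
    """Distance from point p to the line segment ab."""
    ax, ay, bx, by = *a, *b
    t = ((p[0] - ax) * (bx - ax) + (p[1] - ay) * (by - ay)) / max(dist(a, b) ** 2, 1e-9)
    t = max(0.0, min(1.0, t))
    return dist(p, (ax + t * (bx - ax), ay + t * (by - ay)))

def arrow_shaft_and_heads(points):
    # Step 1: the shaft endpoints A and B are the two points furthest from each other.
    a, b = max(((p, q) for p in points for q in points), key=lambda pq: dist(*pq))
    # Step 2: the arrowhead points C and D are the points furthest from the shaft AB.
    c, d = sorted(points, key=lambda p: point_to_segment(p, a, b), reverse=True)[:2]
    return (a, b), (c, d)
```
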
A preliminary study showed that the system's high accuracy pleased most users, who scored it 4.375/5 for drawing and 4.825/5 for editing. The user experience was compared against the UML design tool Rational Rose, and most users preferred Tahuti for its ease of use, especially when it came to editing.

I believe that Tahuti was one of the early papers that influenced the Sketch Recognition Lab's focus on geometric recognition over template matching; indeed, at this point in the field, very little work relies on template matching alone. I also believe it smartly builds on geometric properties from other sketch recognition techniques, such as Rubine's features. This style of sketch recognition seems intuitive for educational applications in geometry, where the very systems used for recognition could also be used to grade basic geometry problems.

Fitts' Law as a Research and Design Tool in Human-Computer Interaction


Bibliographical Information:
I. Scott MacKenzie. 1992. Fitts' law as a research and design tool in human-computer interaction. Hum.-Comput. Interact. 7, 1 (March 1992), 91-139. DOI=10.1207/s15327051hci0701_3 http://dx.doi.org/10.1207/s15327051hci0701_3

URL:
http://dl.acm.org/citation.cfm?id=1461857

This paper outlines the principles of Fitts' Law and how they can be applied to the domain of Human-Computer Interaction. Previously, this research and its applications had been confined to psychology, measuring the cognitive effects of differing "targets". The main focus is quantifying task completion time through a mathematical formula that depends on both the distance the test subject must traverse and the size of the target he or she is expected to "hit". The formula Fitts developed is shown below:

T = a + b \cdot ID = a + b \log_2 \left(\frac{2D}{W}\right)

where T is the average time it takes the subject to perform the task, a and b are parameters of the particular model, and ID is the "index of difficulty", calculated as the base-2 logarithm of twice the distance D to the target divided by the width W of the target, measured along the axis of motion. The variation of this law most frequently used in Human-Computer Interaction is called the "Shannon form", named after the Shannon-Hartley theorem, which its notation closely resembles:

T = a + b \log_2 \left(1 + \frac{D}{W}\right)
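
A tiny worked example of both forms follows; the intercept a and slope b below are made-up illustration values, since in practice they come from regressing measured movement times.

```python
import math

def fitts_original(a, b, distance, width):
    return a + b * math.log2(2 * distance / width)

def fitts_shannon(a, b, distance, width):
    return a + b * math.log2(1 + distance / width)

# Assumed a = 0.1 s, b = 0.15 s/bit, target 200 px away and 20 px wide.
print(fitts_original(0.1, 0.15, 200, 20))   # ID = log2(20) ~ 4.32 bits -> ~0.75 s
print(fitts_shannon(0.1, 0.15, 200, 20))    # ID = log2(11) ~ 3.46 bits -> ~0.62 s
```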

The purpose of this model is to express human performance as an information rate, in "bits per second". This is an interesting abstraction, since at the time of its development little work had been done on quantifying human performance using information-theoretic metrics. Interestingly, the model was originally conceived not as an application to computer science but to psychology, and it was only later, and quite naturally, applied to computer science.

Fitts' Law in computer science has traditionally been used for gauging the usability of interfaces driven by a traditional mouse pointer. However, I believe we can apply Fitts' Law in numerous other ways. We could apply it to psychology exams delivered through computer systems to better analyze that data, and we could likely develop additional metrics and more information-rich variations of Fitts' Law once we can analyze this information at much finer granularity.

Wednesday, November 19, 2014

Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams


Bibliographical Information:
Bhat, Akshay, and Tracy Hammond. "Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams." In IJCAI, vol. 9, pp. 1395-1400. 2009.

URL:
http://www.aaai.org/ocs/index.php/IJCAI/IJCAI-09/paper/download/592/906

This paper from the Sketch Recognition Lab distinguishes hand-written text from free-hand diagram shapes using only one feature: the zero-order entropy rate. The paper found that relying solely on this entropy rate yields correct classification between text strokes and shape strokes 92.06% of the time. The feature also performed favorably in domains for which no training examples were provided.

The main motivation behind this paper is that many sources of sketches mix hand-written text with shape-based diagrams: engineering diagrams in which the author annotates and labels components with text, military course-of-action diagrams, and UML sketches all contain both text and shapes as considerable components. Separating hand-drawn shapes from hand-drawn text is necessary for accurate sketch recognition in both domains. Because the recognition of the two types varies greatly in procedure and final classification, and because the two are frequently found together in everyday sketching, there is strong motivation to be able to separate them successfully.

The paper uses the zero-order "entropy rate" to classify strokes as either shape or text. Entropy is essentially a measure of "the degree of randomness of an information source", and in sketching this randomness appears with far higher frequency in text strokes than in shape strokes. The paper develops an entropy-model "alphabet" to characterize how strokes tend to be drawn: the range of angles along any stroke is divided into discrete bins and each bin is assigned a symbol (the range from 0 to pi/6, for instance, is given the symbol "A", other ranges correspond uniquely to B, C, D, E, and F, and end-points are given the symbol X).

With this classification, the strokes are grouped and the "alphabet" string of their angles is computed. Text strokes yield more varied letters than shape strokes, and that variation is what serves as the "entropy" measure here. Strokes were labeled "shape", "text", or "unclassified", the last being the case where the entropy measurement is above the threshold for a "shape" but below the threshold for "text". The results are promising, with 92.06% correct classification when the system is required to produce a classification. Military course-of-action diagrams and other sketch types from domains not seen in training performed favorably as well.
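
To make the entropy feature concrete, here is a compact sketch; the six angle bins and the classification thresholds are stand-ins I chose, not the paper's exact alphabet or values.

```python
import math
from collections import Counter

def to_alphabet(points):
    """One letter per segment, based on which sixth of the half-circle its angle falls in."""
    letters = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        angle = math.atan2(y2 - y1, x2 - x1) % math.pi           # fold to [0, pi)
        letters.append("ABCDEF"[int(angle / (math.pi / 6)) % 6])
    return letters + ["X"]                                        # end-point symbol

def zero_order_entropy(letters):
    """Shannon entropy of the letter frequencies, ignoring letter order."""
    counts, total = Counter(letters), len(letters)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def classify(points, shape_max=1.5, text_min=2.0):
    """Low entropy -> shape, high entropy -> text, in between -> unclassified."""
    h = zero_order_entropy(to_alphabet(points))
    return "shape" if h <= shape_max else "text" if h >= text_min else "unclassified"
```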

I think this paper is both well motivated and well implemented, with the idea of "entropy" a particularly clever one that ends up yielding very favorable results. The paper's explanation of how the classification works is easy to understand and well elaborated, with a surprising amount of intuition behind the back-end of the classification technique. Studying whether higher-order entropy models improve accuracy would be an interesting continuation of this work.