Bibliographical Information:
Rabiner, L. R. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition". Proceedings of the IEEE, Vol. 77, No. 2, February 1989.
URL:
http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
This paper is a tutorial on Markov models, and in particular on the Hidden Markov Model (HMM), in which the observed outputs are produced by an underlying sequence of hidden states that we must infer.
The paper begins by explaining a standard (observable) Markov model, which is essentially a state machine in which transitions between states occur with given probabilities; in the first-order case the paper focuses on, the probability of the next state depends only on the current state. The probability of any particular state sequence can then be evaluated much like any other probabilistic problem, by multiplying the appropriate probabilities together. The paper then extends this explanation to Hidden Markov Models, in which the state sequence itself is not observed; only the outputs produced by the states are visible. Using the probabilities of those outputs, an HMM lets us numerically estimate what is going on in the hidden states. The total number of unknown parameters in an HMM depends on the kind of model chosen. For instance, in the paper's coin-toss experiment, an HMM that models the tosses with 1 coin has only one unknown parameter, a 2-coin model has 4 unknowns, and a 3-coin model has 9 unknown parameters.
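To make the "multiply the probabilities together" idea concrete for an ordinary (non-hidden) Markov chain, here is a minimal Python sketch. The three weather-like states echo the paper's introductory example, but the specific probability values below are invented for illustration, not taken from the paper.

import numpy as np

# Hypothetical 3-state Markov chain; values are made up for illustration.
states = ["rain", "cloudy", "sunny"]
A = np.array([            # A[i][j] = P(next state = j | current state = i)
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
])
pi = np.array([0.3, 0.3, 0.4])   # initial state distribution

def sequence_probability(seq):
    """Probability of a fully observed state sequence: pi[s1] * product of A[s_(t-1)][s_t]."""
    idx = [states.index(s) for s in seq]
    p = pi[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        p *= A[prev][cur]
    return p

print(sequence_probability(["sunny", "sunny", "rain", "cloudy"]))

In the hidden case this direct multiplication is no longer possible, because we do not know which state sequence produced the observations; that is exactly the gap the HMM machinery below fills.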
The paper describes the five basic elements of an HMM (a short code sketch of these elements follows the list):
1. The number of states in the model, N. Though the states are frequently the hidden, unknown part of the model, they often carry some "physical significance". In the coin-toss experiment, the states correspond to the coins themselves: which coin is being tossed is hidden, while heads and tails are the visible outcomes.
2. The number of distinct observation symbols per state, M. With only "heads" and "tails" as possible outcomes, this number is two.
3. The state transition probability distribution, A, where each entry gives the probability of moving from one state to another. How likely the process is to switch from tossing one coin to tossing the other between flips, for instance, is captured here.
4. The observation symbol probability distribution, B, giving the probability of each observation symbol in each state. For instance, how likely a given coin (state) is to come up heads or tails.
5. The initial state distribution, giving the probability that the model starts in each state.
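Putting these five elements together, here is a minimal sketch of the paper's 2-coin model as a parameter set (A, B, pi). The specific probability values are invented for illustration and are not the paper's numbers.

import numpy as np

# Hypothetical 2-coin HMM: states are the coins, observations are heads/tails.
states = ["coin1", "coin2"]          # N = 2 hidden states
symbols = ["H", "T"]                 # M = 2 observation symbols

A = np.array([[0.7, 0.3],            # A[i][j]: probability of switching from coin i to coin j
              [0.4, 0.6]])
B = np.array([[0.5, 0.5],            # B[j][k]: probability that coin j shows symbol k
              [0.8, 0.2]])           # e.g. coin2 is biased toward heads
pi = np.array([0.6, 0.4])            # probability of starting with each coin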
The paper goes on to describe three main "problems" that can be solved with an HMM:
1. Computing the probability of the observation sequence given a model
2. Uncovering the hidden part of the model, i.e., finding the state sequence that "best" explains a given observation sequence
3. Adjusting the parameters of the model to maximize the probability of the observation sequence (i.e., training the model)
Rabiner describes all three of these problems using basic examples that highlight the key concepts involved, and he outlines general procedures for solving each one.
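For Problem 1, the procedure the paper presents is the forward algorithm, which sums over all possible hidden state sequences efficiently rather than enumerating them. A rough sketch, continuing from the hypothetical 2-coin parameters defined above:

def forward_probability(obs, A, B, pi, symbols):
    """P(observation sequence | model) via the forward procedure."""
    o = [symbols.index(s) for s in obs]
    alpha = pi * B[:, o[0]]                  # initialization: alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, len(o)):
        alpha = (alpha @ A) * B[:, o[t]]     # induction: sum over predecessor states
    return alpha.sum()                       # termination: sum over final states

print(forward_probability(["H", "T", "H"], A, B, pi, symbols))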
The paper goes over some additional "types" of HMMs. For instance, it describes the "left-right", or "Bakis", model, so called because the state index can only stay the same or increase as time goes on; no transitions are allowed to states whose indices are lower than the current state. It also discusses the many cases in which the observations are not discrete symbols but continuous signals, which are modeled with continuous observation probability densities rather than a finite symbol alphabet. Another type relevant to speech processing is briefly discussed, the "auto-regressive HMM", in which the observations are drawn from what the paper calls an auto-regressive process, meaning each observation sample is modeled as a combination of the preceding samples plus an error term.
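One quick way to see the left-right constraint is in the shape of the transition matrix: only the diagonal and the entries to its right can be nonzero. A small sketch with made-up values (Rabiner additionally limits how many states may be skipped in one transition):

# Hypothetical 4-state left-right (Bakis) transition matrix:
# A[i][j] = 0 whenever j < i, so the state index never decreases.
A_left_right = np.array([
    [0.5, 0.4, 0.1, 0.0],
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 1.0],   # final state is absorbing
])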
Additional, more advanced material is covered as well: optimizing the model parameters, variants on the HMM structures already discussed, scaling the forward-backward computations to avoid numerical underflow on long sequences, training from multiple observation sequences, choosing models, dealing with insufficient training data, and an extended example applying an HMM structure to speech recognition.
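The scaling issue arises because the forward probabilities shrink exponentially with the length of the observation sequence. The remedy the paper describes is to renormalize at each step and accumulate the scale factors in log form; a rough sketch, again continuing from the hypothetical parameters above:

def log_forward_probability(obs, A, B, pi, symbols):
    """log P(observations | model), with per-step scaling to avoid underflow."""
    o = [symbols.index(s) for s in obs]
    alpha = pi * B[:, o[0]]
    log_prob = np.log(alpha.sum())           # log of the first scale factor
    alpha = alpha / alpha.sum()              # rescale so the alphas sum to 1
    for t in range(1, len(o)):
        alpha = (alpha @ A) * B[:, o[t]]
        log_prob += np.log(alpha.sum())      # accumulate the log of each scale factor
        alpha = alpha / alpha.sum()
    return log_prob

print(log_forward_probability(["H", "T", "H", "H", "T"], A, B, pi, symbols))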
Overall, I think this paper was very informative, if a little exhaustive. The basics of Markov models and, more specifically, HMMs are not as daunting as this paper may initially make them seem. Some of the examples the paper goes over, such as the ball-and-urn model and the coin-toss model, are used only to establish terminology rather than to walk through a bottom-up description of an entire HMM structure with these easy-to-understand models. Indeed, the only real extended example is speech recognition, which is itself difficult to understand, so framing the understanding of an HMM structure around speech recognition is likely too esoteric for a beginner. In fact, someone unfamiliar with HMMs is very likely to be unfamiliar with speech recognition as well, so this extended example is probably not the best choice for a "tutorial". Additional information, such as the different types of HMMs, is also presented before the simpler types have been explained sufficiently, which hindered the learning experience somewhat. The subject is, however, very relevant to the field of sketch recognition and important for estimating what a user meant to write on a digital pen system rather than relying only on what the user actually wrote.