In Chapter~\ref{ch:HmmDigraphs} we were concerned only with the HMM state
transition structure. In this chapter we are concerned with developing and
using statistical models for the observation vectors.
\begin{haskell}{HmmDensities}

> module HmmDensities(
>     module MathTypes, module Phones,
>     GaussianComponent, TiedMixture(..), TmTable,
>     LogDensityTable,
>     eval_log_densities, readMixtures, readMixture, extern_to_intern
>     ) where

> import Native          -- hbc library modules

> import Lists           -- general library modules
> import MathTypes
> import MaybeStateT

> import Phones          -- application-specific modules
> import HmmConstants

> import Array           -- 1.3

#if __HASKELL1__ < 5
#define amap map
#else
#define amap fmap
#endif

\end{haskell}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {Multivariate Gaussian Densities}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Let $x$ be a vector of dimension $N$ with real-valued components, i.e.,
$x \in \reals^N$. The formula for a multivariate Gaussian probability
density function having mean vector $\mu \in \reals^N$ and covariance
matrix $\Sigma \in \reals^{N \times N}$ is
\begin{equation}
\scriptN(x \verticalbar \mu, \Sigma) =
  \frac{1}{ (2\pi)^{N/2} (\det \Sigma)^{1/2} }
  \exp \{ -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \}.
\label{eq:gauss-gen}
\end{equation}
The superscript $T$ denotes the matrix/vector transpose; vectors without
the superscript $T$ are taken to be column vectors.

If the covariance matrix is diagonal, i.e., $\Sigma =
\mbox{diag}\{\sigma_1^2,\sigma_2^2,\ldots,\sigma_N^2\}$, and if we take
$x = (x_1, x_2, \ldots, x_N)$ and $\mu = (\mu_1, \mu_2, \ldots, \mu_N)$,
then (\ref{eq:gauss-gen}) can be simplified to
\begin{equation}
\scriptN(x \verticalbar \mu, \Sigma) =
  \frac{1}{ (2\pi)^{N/2} \prod_{j=1}^N \sigma_j}
  \exp \{ -\frac{1}{2} \sum_{j=1}^N (x_j - \mu_j)^2 / \sigma_j^2 \} .
\label{eq:gauss-diag}
\end{equation}
We will assume diagonal covariance matrices for the rest of this chapter.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {Mixture Densities}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Given $M$ density functions $f_1, f_2, \ldots, f_M$ defined over
$\reals^N$ and $M$ nonnegative real numbers $c_1, c_2, \ldots, c_M$ for
which
\[
\sum_{i=1}^M c_i \ = \ 1,
\]
we can define a new density function $g$ over $\reals^N$ by the equation
\[
g = \sum_{i=1}^M c_i f_i.
\]
The function $g$ is called a {\em mixture density function}. The $c_i$'s
are the {\em mixture coefficients\/} or the {\em component
probabilities}, and the $f_i$'s are the {\em component densities}. An
excellent introduction to the theory of mixture distributions is provided
in~\cite{EverHand81}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {Gaussian Mixture Densities}
\label{sc:gms}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

A {\em Gaussian mixture density\/} is a mixture density for which each
component density is Gaussian. Thus, a Gaussian mixture density has the
form
\begin{equation}
g(x) = \sum^M_{i=1} c_i \scriptN(x \verticalbar \mu_i,\Sigma_i)
\label{eq:mixG-gen}
\end{equation}
where the $i$th component density has mean $\mu_i$ and covariance matrix
$\Sigma_i$.
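Equation (\ref{eq:mixG-gen}) with diagonal covariances can be transcribed
into Haskell almost verbatim. The following sketch is expository only and
is not part of the module; the names are hypothetical, and it assumes
that a \verb~Vector~ is a list of \verb~Float~s, as in the module
\verb~MathTypes~.
\begin{verbatim}
> -- Direct transcription of (eq:gauss-diag) and (eq:mixG-gen);
> -- expository only, not extracted into HmmDensities.
> gauss_diag :: Vector -> Vector -> Vector -> Float
> gauss_diag x mu vars =
>     exp (-0.5 * sum [ (xj-uj)^2 / vj | (xj,uj,vj) <- zip3 x mu vars ])
>       / ((2*pi) ** (fromIntegral (length x) / 2) * sqrt (product vars))

> mixture :: [(Float, Vector, Vector)] -> Vector -> Float
> mixture components x =
>     sum [ c * gauss_diag x mu vars | (c, mu, vars) <- components ]
\end{verbatim}
Evaluated this way, the normalization factor in front of the exponential
is recomputed on every call; the representation developed below factors
it out once and for all.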
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection {Representation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Recall that we are assuming that the covariance matrix is diagonal;
hence, it can be represented as a vector. The parameters of a Gaussian
mixture can be stored in a list of triples. The first component of each
triple is the mixture coefficient, the second component is the mean
vector, and the third component is the variance vector.
\begin{haskell}{GaussianComponent}

> type GaussianComponent = (Float, Vector, Vector)

\end{haskell}
\fixhaskellspacing\begin{haskell}{GaussianMixture}

> type GaussianMixture = [GaussianComponent]

\end{haskell}
For example, a 3-component mixture density would be represented by the
list
\[
\startlist
(c_1, \mu_1, \Sigma_1),\ (c_2, \mu_2, \Sigma_2),\ (c_3, \mu_3, \Sigma_3)
\stoplist
\]
This is how we will store the mixture parameters in external files.
However, to make the evaluation of a mixture density more efficient, it
pays to modify the {\em internal\/} representation somewhat.

Using a subscript $i$ to index the mixture components and another
subscript $j$ to index the vector components, we can replace each
$\scriptN(x \verticalbar \mu_i, \Sigma_i)$ in (\ref{eq:mixG-gen}) by its
definition (\ref{eq:gauss-diag}) to get
\begin{equation}
g(x) = \sum^M_{i=1}
  \frac{c_i}{ (2\pi)^{N/2} \prod_{j=1}^N \sigma_{i,j} }
  \exp \{ -\frac{1}{2} \sum_{j=1}^N (x_j - \mu_{i,j})^2 / \sigma_{i,j}^2 \}.
\end{equation}
Denote the ``normalized'' mixture coefficient by $c'_i$:
\begin{equation}
c'_i = \frac{c_i}{ (2\pi)^{N/2} \prod_{j=1}^N \sigma_{i,j} }.
\label{eq:normalized-coefficient}
\end{equation}
Then
\begin{eqnarray}
g(x) & = & \sum^M_{i=1} c'_i
           \exp \{ -\frac{1}{2} \sum^N_{j=1}
                   (x_j - \mu_{i,j})^2 / \sigma_{i,j}^2 \}\\
     & = & \sum^M_{i=1} c'_i
           \exp \{ - \sum^N_{j=1}
                   (x_j - \mu_{i,j})^2 / (2\sigma_{i,j}^2) \}.
\label{eq:mixture-final-form}
\end{eqnarray}
The internal representation has the same structure as the external
representation, i.e., a list of triples of type
\verb~GaussianComponent~, except that the mixture coefficient is replaced
by the normalized mixture coefficient and each of the variances is
multiplied by two. For example, a 3-component mixture density would be
represented internally by the list
\[
\startlist
(c'_1, \mu_1, \Sigma'_1),\ (c'_2, \mu_2, \Sigma'_2),\ (c'_3, \mu_3, \Sigma'_3)
\stoplist
\]
where
\[
\Sigma_i = \mbox{diag}\{\sigma_{i,1}^2, \sigma_{i,2}^2, \ldots,
\sigma_{i,N}^2 \}
\ \Rightarrow\
\Sigma'_i = \mbox{diag}\{2\sigma_{i,1}^2, 2\sigma_{i,2}^2, \ldots,
2\sigma_{i,N}^2 \}
\]
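To make the conversion concrete, consider one component of a
2-dimensional mixture with $c = 0.5$ and variance vector $(1, 4)$, so
that $\sigma_{1} = 1$ and $\sigma_{2} = 2$. By
(\ref{eq:normalized-coefficient}),
\[
c' = \frac{0.5}{(2\pi)^{2/2} \cdot 1 \cdot 2}
   = \frac{0.5}{4\pi} \approx 0.0398,
\]
and the stored variance vector becomes $(2, 8)$.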
We need to know the dimension of the observation vectors to calculate the
Gaussian normalization coefficient efficiently. We could do the
calculation without explicit knowledge of the observation dimension by
putting a $\sqrt{2\pi}$ term inside the multiplication over the
$\sigma_{i,j}$'s (see (\ref{eq:normalized-coefficient})), but this
increases the amount of computation. Since we need the observation
dimension for other functions, we assume that we have access to it via
the module \verb~HmmConstants~.
\begin{haskell}{gaussian_factor}

> gaussian_factor = sqrt (2.0*pi) ^ observation_dimen :: Double

\end{haskell}
The normalizing calculation is performed using double precision
arithmetic. The expression \verb~fromRational.toRational~ is a Haskell
idiom for conversion between floating point types.
\begin{haskell}{extern_to_intern}

> extern_to_intern :: GaussianMixture -> GaussianMixture
> extern_to_intern = map extern_to_intern'

> extern_to_intern' :: GaussianComponent -> GaussianComponent
> extern_to_intern' (c, mu, vars) =
>       let
>           vars_d = map (fromRational.toRational) vars :: [Double]
>           t      = gaussian_factor * sqrt (product vars_d)
>           denom  = (fromRational.toRational) t :: Float
>       in
>           (c/denom, mu, map (2*) vars)

\end{haskell}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection {Evaluation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In the Viterbi alignment algorithm~(Chapter~\ref{ch:Viterbi}), we are
actually going to do our calculations on a logarithmic scale. Thus, the
function we really want to compute is
\begin{equation}
\ln g(x) = \ln\ \sum^M_{i=1} c'_i
           \exp \{ - \sum^N_{j=1} (x_j - \mu_{i,j})^2 / (2\sigma_{i,j}^2) \}.
\label{eq:log-mixture-final-form}
\end{equation}
The function for evaluating (\ref{eq:log-mixture-final-form}) with
parameters \verb~igm~ (for ``internal Gaussian mixture'') at the point
\verb~x~ can be defined as follows:
\begin{haskell}{eval_log_mixture}

> eval_log_mixture :: Vector -> GaussianMixture -> Float
> eval_log_mixture x igm =
>       density_log (sum [c' * exp( eval_exponent x mu vars' )
>                        | (c', mu, vars') <- igm])

\end{haskell}
\fixhaskellspacing\begin{haskell}{eval_exponent}

> eval_exponent (x:rxs) (u:rus) (v:rvs) =
>       let t = x - u
>       in  (eval_exponent rxs rus rvs) - (t*t)/v
> eval_exponent [] [] [] = 0.0

\end{haskell}
The function \verb~density_log~ is used to avoid an unnecessary logarithm
calculation when the density value is too small to matter.
\begin{haskell}{density_log}

> density_log x = if x <= density_threshold
>                   then smallest_log_density
>                   else log x

\end{haskell}
\fixhaskellspacing\begin{haskell}{density_threshold}

> density_threshold = 0.1023 ^ observation_dimen :: Float

\end{haskell}
\fixhaskellspacing\begin{haskell}{smallest_log_density}

> smallest_log_density = log density_threshold

\end{haskell}
We now explain how we picked the formula for the density threshold.
Equation (\ref{eq:gauss-diag}) is the probability density function for
$N$ mutually independent Gaussian random variables. The probability that
an individual zero-mean unit variance Gaussian random variable takes on
an absolute value exceeding 1.65 is about 0.10. The value of the density
function at this point is about 0.1023. Since the Gaussian density
function is unimodal, if all $N$ components took values exceeding 1.65 in
absolute value, the value of the density function at such a point would
be smaller than $0.1023^N$. But the probability of this happening is no
greater than $0.10^N$. So, if our density is this small, it is quite
unlikely that the observed vector came from this particular mixture, and
we don't bother to compute the logarithm precisely.\footnote{Of course,
there are other ways that the density function can be that small; this
calculation is just an engineering approximation.}
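The value 0.1023 is simply the standard normal density evaluated at 1.65,
so the constants above are easy to check. The following one-liner is
expository only and is not part of the module:
\begin{verbatim}
> -- Standard normal density; at 1.65 it is about 0.10226,
> -- and P(|Z| > 1.65) is about 0.10.
> std_normal_density z = exp (-z*z/2) / sqrt (2*pi)
\end{verbatim}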
Note that, like the Gaussian normalizing factor, these constants depend
on the dimension of the observation vector, which is defined in the
module \verb~HmmConstants~; see Chapter~\ref{ch:Constants}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {The Log-Density Table}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The states of each HMM are indexed by integers.
\begin{verbatim}
> type HmmState = Int
\end{verbatim}
Each state of a phonetic HMM either has its own mixture density or it is
{\em tied\/} to the density of another state, possibly belonging to
another HMM.
\begin{haskell}{TiedMixture}

> data TiedMixture = Gm GaussianMixture | Tie Phone HmmState

\end{haskell}
The tied mixtures are stored in a two-dimensional table that is
implemented as an array of arrays; this saves storage space when building
tables for HMMs with different numbers of states.
\begin{haskell}{TmTable}

> type TmTable = Array Phone (Array HmmState TiedMixture)

\end{haskell}
The log-density values are stored in a table that has the same structure
as the mixture parameter table.
\begin{haskell}{LogDensityTable}

> type LogDensityTable = Array Phone (Array HmmState Float)

\end{haskell}
The function \verb~eval_log_densities~ computes a table of log-density
values from an observation vector and a tied mixture table. In other
words, this function evaluates the log-densities for all of the states of
all of the HMMs, placing the values in an array for efficient retrieval.
\begin{haskell}{eval_log_densities}

> eval_log_densities :: TmTable -> Vector -> LogDensityTable
> eval_log_densities tmt x = ldt
>   where ldt = amap (amap eval_tied_mixture) tmt
>         eval_tied_mixture (Gm gm)   = eval_log_mixture x gm
>         eval_tied_mixture (Tie p k) = ldt!p!k

\end{haskell}
Note that the definition of \verb~ldt~ is circular: a tied state simply
looks up the entry for the state it is tied to in the very table being
built. Lazy evaluation makes this work, provided chains of ties
ultimately end at a real mixture.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section {Reading Gaussian Mixture Densities from Binary Files}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The function \verb~readMixtures~ reads all of the mixtures in a binary
file that is assumed to contain only mixture data.
\begin{haskell}{readMixtures}

> readMixtures :: Bytes -> [GaussianMixture]
> readMixtures bs =
>       case readMixture bs of
>         Nothing       -> if null bs
>                            then []
>                            else error "readMixtures: left-over bytes"
>         Just (m, bs') -> m : readMixtures bs'

\end{haskell}
The function \verb~readMixture~ reads the parameters for one mixture
density. First it reads the number of component densities in the mixture,
then it reads each of the components. This function and the two that
follow are implemented using the Maybe State Transformer
(Chapter~\ref{ch:MaybeStateT}).
\begin{haskell}{readMixture}

> readMixture :: MST Bytes GaussianMixture
> readMixture =
>       readBytes                     `bindMST` \ num_components ->
>       readGaussians num_components  `bindMST` \ gs ->
>       returnMST gs

\end{haskell}
The first argument to the function \verb~readGaussians~ is the number of
component Gaussian densities in the mixture.
\begin{haskell}{readGaussians}

> readGaussians :: Int -> MST Bytes GaussianMixture
> readGaussians 0 = returnMST []
> readGaussians m =
>       readGaussian        `bindMST` \ g ->
>       readGaussians (m-1) `bindMST` \ gs ->
>       returnMST (g:gs)

\end{haskell}
The function \verb~readGaussian~ reads the mixture coefficient, the mean
vector, and the variance vector for a multivariate Gaussian density.
\begin{haskell}{readGaussian}

> readGaussian :: MST Bytes GaussianComponent
> readGaussian =
>       readBytes                        `bindMST` \ c ->
>       listReadBytes observation_dimen  `bindMST` \ m ->
>       listReadBytes observation_dimen  `bindMST` \ v ->
>       returnMST (c, m, v)

\end{haskell}
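To close the chapter, here is a sketch of how these readers might be
driven. It is expository only: \verb~getFileBytes~ is a hypothetical
helper standing in for whatever the surrounding system uses to obtain the
raw \verb~Bytes~ of a file, and the monadic IO style is assumed. The
point is the composition: parameters read from an external file must be
passed through \verb~extern_to_intern~ before \verb~eval_log_mixture~ can
be applied to them.
\begin{verbatim}
> -- Hypothetical driver (not part of the module): read a parameter
> -- file and convert every mixture to the internal representation
> -- expected by eval_log_mixture.
> loadMixtures :: String -> IO [GaussianMixture]
> loadMixtures name =
>     getFileBytes name >>= \ bs ->
>     return (map extern_to_intern (readMixtures bs))
\end{verbatim}

%%%%%%%%%% End of HmmDensities.lhs %%%%%%%%%%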