raw math

It is commonly not taught explicitly, but in machine learning you quite often come across problems which contain the following quantity and knowing the trick can help a lot. Let's say we have an nn-dimensional vector and want to calculate:

y=logi=1nexiy = \log\sum\limits_{i=1}^n e^{x_i}

For example in filtering problems for which the posterior is calculated recursively:

p(htv1:t)α(ht)=p(vtht)htα(ht1)p(htht1)p(h_t\vert v_{1:t}) \equiv \alpha(h_t) = p(v_t\vert h_t)\prod_{h_t}\alpha(h_{t-1})p(h_t\vert h_{t-1})

As α\alpha can be quite small, it's a common method to solve the problem in log-space:

logα(ht)=logp(vtht)+loghtexp(logα(ht1)+logp(htht1))\log \alpha(h_t) = \log p(v_t\vert h_t)+ \log \sum_{h_t}\exp(\log \alpha(h_{t-1}) + \log p(h_t\vert h_{t-1}))

Another example is a multinomial distribution which you want to parameterize with softmax, like e.g. logistic regression with more than 2 unordered categories. If you now want to calculate the log-likelihood you get the quantity due to the normalization constant.

Both problems have in common, that if you try to calculate it naively, you quite quickly will encounter underflows or overflows, depending on the scale of xix_i. Even if you work in log-space, the limited precision of computers is not enough and the result will be INF or -INF. So what can we do?

We can show, that the following equation holds:

logi=1nexi=a+logi=1nexia\log\sum\limits_{i=1}^n e^{x_i} = a + \log\sum\limits_{i=1}^n e^{x_i-a}

For an arbitrary aa. This means, you can shift the center of the exponential sum. A typical value is setting aa to the maximum, which forces the greatest value to be zero and even if the other values would underflow, you get a reasonable result:

a=maxixia=\underset{i}{max} x_i

Proof of log-sum-exp trick

y=logi=1nexiey=i=1nexieaey=eai=1nexieya=i=1neaexiya=logi=1nexiay=a+logi=1nexia \begin{array}{rll} & y &= \log\sum\limits_{i=1}^n e^{x_i} \\ \Leftrightarrow & e^y &= \sum\limits_{i=1}^n e^{x_i} \\ \Leftrightarrow & e^{-a} e^y &= e^{-a} \sum\limits_{i=1}^n e^{x_i} \\ \Leftrightarrow & e ^{y-a} &= \sum\limits_{i=1}^n e^{-a} e^{x_i} \\ \Leftrightarrow & y-a &= \log\sum\limits_{i=1}^n e^{x_i-a} \\ \Leftrightarrow & y &= a+\log\sum\limits_{i=1}^n e^{x_i-a} \end{array}