Jensen’s Inequality is one of those lovely maths tricks that comes up in the most random of places and is just useful to have in your back pocket. In particular it’s great for proving a number of other useful inequalities by choosing an appropriate function to use within the theorem and doing some rearranging.
The goal in this post is to prove the weighted version of Jensen’s Inequality and then give an example usage of it. We start by talking about convex functions and then use convex functions when talking about Jensen’s Inequality.
At a high level a function is convex on an interval if any line segment between two points within that interval lie above the function. The diagram below shows this quite nicely.
The definition is all good and well … but how do we tell if a given function is actually convex? For certain (common) functions we can use a little bit of calculus to help us out here and we state the below theorem without proof.
As an example a well known convex function is where . The proof that this is convex is very straightforward since . In addition I managed to find a proof which doesn’t rely on derivatives here: How to prove e^x is convex.
Formal Statement of Jensen’s Inequality
In addition to the above and to ensure the requirement that we can instead re-write the inequality for Jensen as
Proving the Theorem
We give a short proof by induction.
- For the case the theorem trivially holds;
- For we have from the definition of a convex function so again the theorem holds;
For the inductive step we assume the theorem holds for where and show it holds for . At a high level we do this be re-expressing the various summations in terms of and by scaling the weights to fit this re-scaling. We use
Then since . We define .
The values of are our scaled values that go from to . From this we can write
Hence the can be used as weighting factors for from to . Then
Where , , and . Since and we have a convex function we can use case of and write
Next we use the inductive assumption that our theorem holds for to get
Then by using we can re-write the RHS of the above inequality as which gives us our final result
And we are done.
Young’s Inequality and the Arithmetic-Geometric Mean (Cauchy)
To arrive at this we start with the Jensen Inequality and use . We also use the properties of logs:
From these we can write
Dropping the logs in the above from both sides gives Young’s Inequality (the weighted Arithmetic-Geometric Mean):
The special case of for all gives the Arithmetic-Geometric Mean Inequality – also known as the Cauchy Inequality:
In the case of the Arithmetic-Geometric Mean Inequality we can replace the with to arrive at an inequality involving harmonic means:
In words, the geometric mean of a finite collection of positive numbers is always greater than their harmonic mean, unless all the numbers are equal when the two means coincide.
An inequality with Natural Exponents
For this example we take and for all . This is strictly convex on so we can use
Hence we get
and for the special case of we use and to get