Jensen’s Inequality is one of those lovely maths tricks that comes up in the most random of places and is just useful to have in your back pocket. In particular it’s great for proving a number of other useful inequalities by choosing an appropriate function to use within the theorem and doing some rearranging.

The goal in this post is to prove the weighted version of Jensen’s Inequality and then give an example usage of it. We start by talking about convex functions and then use convex functions when talking about Jensen’s Inequality.

## Convex Functions

**Definition (Convex Function)**

*The function such that is convex on if for every and we have *

At a high level a function is convex on an interval if any line segment between two points within that interval lie above the function. The diagram below shows this quite nicely.

The definition is all good and well … but how do we tell if a given function is actually convex? For certain (common) functions we can use a little bit of calculus to help us out here and we state the below theorem without proof.

**Theorem (How to Show Convexity)**

*Let be twice differentiable on the interval . Then is convex on . *

As an example a well known convex function is where . The proof that this is convex is very straightforward since . In addition I managed to find a proof which doesn’t rely on derivatives here: How to prove e^x is convex.

## Formal Statement of Jensen’s Inequality

**Theorem (Jensen)**

*Let be a convex function defined on an interval . If and are such that then*

In addition to the above and to ensure the requirement that we can instead re-write the inequality for Jensen as

**Theorem (Jensen – alternative)**

*Let be a convex function defined on an interval . If and then*

## Proving the Theorem

We give a short proof by induction.

- For the case the theorem trivially holds;
- For we have from the definition of a convex function so again the theorem holds;

For the inductive step we assume the theorem holds for where and show it holds for . At a high level we do this be re-expressing the various summations in terms of and by scaling the weights to fit this re-scaling. We use

Then since . We define .

The values of are our scaled values that go from to . From this we can write

Hence the can be used as weighting factors for from to . Then

(1)

Where , , and . Since and we have a convex function we can use case of and write

Next we use the inductive assumption that our theorem holds for to get

Then by using we can re-write the RHS of the above inequality as which gives us our final result

And we are done.

## Young’s Inequality and the Arithmetic-Geometric Mean (Cauchy)

To arrive at this we start with the Jensen Inequality and use . We also use the properties of logs:

From these we can write

Dropping the logs in the above from both sides gives Young’s Inequality (the weighted Arithmetic-Geometric Mean):

The special case of for all gives the Arithmetic-Geometric Mean Inequality – also known as the Cauchy Inequality:

Harmonic Means

In the case of the Arithmetic-Geometric Mean Inequality we can replace the with to arrive at an inequality involving harmonic means:

In words, the geometric mean of a finite collection of positive numbers is always greater than their harmonic mean, unless all the numbers are equal when the two means coincide.

## An inequality with Natural Exponents

For this example we take and for all . This is strictly convex on so we can use

Hence we get

and for the special case of we use and to get