Extremisation of 2D functionals; application to a differential equation problem

[In this post I use calculus that I don’t understand rigorously. I hope that is excused, for this seemed a pretty interesting thing to discover.]

One is familiar with the Euler-Lagrange equation in one dimension, where the function y(x) that extremises \int_{x_{1}}^{x_{2}} L(y, \dot{y}) \, dx necessitates that the following equation on L be satisfied:

\frac{ \partial L}{ \partial y}= \frac{d}{dx} \left( \frac{ \partial L}{ \partial \dot{y}} \right) \, \ldots \ldots (1)

Using (1) one can find y(x) as the solution of a differential equation, which can be completely solved if initial conditions are given. As an aside of some interest (in the Lagrangian formulation of classical mechanics), one can note that there can be at most two undetermined constants in the solution of (1), so taking y(x_{1}) and y(x_{2}) as known will provide a unique solution for y(x) , as will taking y(x_{1}) and \dot{y}(x_{1}) . This equivalence of the two kinds of conditions means that the behaviour of the solution y is completely specified for all defined x if the initial coordinate y(x_{1}) and the derivative of the coordinate \dot{y}(x_{1}) are given and an equation of the form (1) is known, even though (1) was derived by fixing the coordinates of the two end-points (but not the derivatives of the coordinates) and imposing the condition of extremum on the functional between x_{1} and x_{2} .
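As a quick illustration (my own addition, not part of the original argument): sympy ships a helper that carries out exactly the computation in (1). The sample Lagrangian below, L= \frac{1}{2} \dot{y}^{2}- \frac{1}{2}y^{2} , is an arbitrary illustrative choice.

```python
from sympy import Function, Rational, symbols
from sympy.calculus.euler import euler_equations

x = symbols('x')
y = Function('y')

# Sample Lagrangian (illustrative choice): L = y'^2/2 - y^2/2
L = Rational(1, 2) * y(x).diff(x)**2 - Rational(1, 2) * y(x)**2

# euler_equations returns dL/dy - d/dx(dL/dy') = 0, i.e. equation (1) rearranged
print(euler_equations(L, y(x), x))
# [Eq(-y(x) - Derivative(y(x), (x, 2)), 0)]  =>  y'' = -y
```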

This post aims to derive the two dimensional analogue of (1), i.e. what equation a function of two independent variables x and y, u(x,y), must satisfy so that an integral of a functional,

I(u)= \oint_{S} L(u,u_{x},u_{y}) \, dS \, \ldots \ldots (2)

is extremised. S is a simply connected subset of \mathbb{R}^{2} , colloquially a patch, with a defined closed boundary (the boundary can be piecewise smooth; the only requirement is that it has a finite number of points of non-differentiability). u(x,y) is assumed to be differentiable up to at least the second order, and so is L . It is to be noted that S need not be convex in general. But any such region can be subdivided into a finite number of subregions, each of which is simple with respect to both axes, i.e. every vertical and every horizontal line meets it in at most one segment. This will ensure that we can define (3) below in terms of complete iterated integrals. Since any subregion of S must separately satisfy the extremisation principle, in this proof it shall be assumed that S itself has the property above.

Since L is differentiable up to the second order, and S has the property above, the form (2) can be written in cartesian form using double integrals as

I(u)= \int_{x_i}^{x_f} \left[ \int_{y_{1}(x)}^{y_{2}(x)}L(u,u_{x},u_{y}) \, dy \right] \, dx= \int_{y_i}^{y_f} \left[ \int_{x_{1}(y)}^{x_{2}(y)}L(u,u_{x},u_{y}) \, dx \right] \, dy \, \ldots \ldots (3)

The notation I have used for the limits of the integrals in (3) is: y_{1}(x) and y_{2}(x) are the lower and upper limits of a small rectangular strip of S of width dx at a specific x , and likewise for interchanging x and y ; x_{i},x_{f} are the x-bounds of S itself, and likewise y_{i},y_{f} are the y-bounds of S. It is to be noted that all the limits of integration in (3) are boundary points.

Attempting to tackle the problem as in the one-dimensional case, let us assume u(x,y)=f(x,y) for (x,y) \, \in \, C , where C is the closed boundary of S and f(x,y) is known; u(x,y) is then specified at the boundary of S .

Assume that the specific function that extremises (2) is u_{0}(x,y) , i.e. I(u_{0}) is a local extremum with respect to all u within a sufficiently small neighbourhood of u_{0} (in an infinite dimensional function space). Working analogously with this loose intuitive idea of an infinite dimensional function space, let us effect a variation in u with respect to u_{0} , i.e. replace u_{0} by a point u close to u_{0} , analogous to the general procedure we use for the one dimensional problem. This variation can be intuitively pictured as having a magnitude and a direction, and so we can write

u(x,y)=u_{0}(x,y)+ \alpha \eta(x,y) \ldots \ldots (4)

where \alpha is the magnitude of the displacement along the direction of \eta . Due to the boundary conditions being fixed, we have

\eta(x,y)=0 \, \, \forall \, \, (x,y) \, \in C \, \ldots \ldots (5)

Differentiating (4) wrt x and y and then wrt \alpha, one can get the equations

u_{x}=(u_{0})_{x}+ \alpha \eta_{x} \, , \, \, \, u_{y}=(u_{0})_{y}+ \alpha \eta_{y} \, \ldots \ldots (6)

and so

\frac{du}{d \alpha}= \eta \, , \, \, \,\frac{du_{x}}{d \alpha}= \eta_{x} \, , \, \, \, \frac{du_{y}}{d \alpha}= \eta_{y} \, \ldots \ldots (7)

One can now say that for u_{0} to extremise I(u) , we must have \frac{dI(u)}{d \alpha}=0 at \alpha=0 , i.e. at u=u_{0} .

This means, from (2),

\oint_{S} \frac{dL}{d \alpha} \, dS=0

Expanding using the chain rule, one has

\oint_{S} \left[ \frac{ \partial L}{ \partial u} \frac{du}{d \alpha}+ \frac{ \partial L}{ \partial u_{x}} \frac{du_{x}}{d \alpha}+ \frac{ \partial L}{ \partial u_{y}} \frac{du_{y}}{d \alpha} \right] \, dS=0

Using (7) the above can be compacted as

\oint_{S} \frac{ \partial L}{ \partial u} \eta \, dS+ \oint_{S} \frac{ \partial L}{ \partial u_{x}} \eta_{x} \, dS+ \oint_{S} \frac{ \partial L}{ \partial u_{y}} \eta_{y} \, dS=0 \, \ldots \ldots (8)

Consider the integral \oint_{S} \frac{ \partial L}{ \partial u_{x}} \eta_{x} \, dS . It can be transformed as follows, using (3) and the fact that \eta (x,y)=0 at boundary points:

\begin{array}{rcl} \oint_{S} \frac{ \partial L}{ \partial u_{x}} \eta_{x} \, dS &=& \int_{y_{i}}^{y_{f}} \left[ \int_{x_{1}(y)}^{x_{2}(y)} \frac{ \partial L}{ \partial u_{x}} \eta_{x} \, dx \right] \, dy \\ \\ &=& \int_{y_{i}}^{y_{f}} \left[ \left[ \frac{ \partial L}{ \partial u_{x}} \eta \right]_{x_{1}(y)}^{x_{2}(y)}- \int_{x_{1}(y)}^{x_{2}(y)} \frac{ \partial}{ \partial x} \left( \frac{ \partial L}{ \partial u_{x}} \right) \eta \, dx \right] \, dy \\ \\ &=& - \int_{y_{i}}^{y_{f}} \left[ \int_{x_{1}(y)}^{x_{2}(y)} \frac{ \partial}{ \partial x} \left( \frac{ \partial L}{ \partial u_{x}} \right) \eta \, dx \right] \, dy \\ \\ &=& - \oint_{S} \frac{ \partial}{ \partial x} \left( \frac{ \partial L}{ \partial u_{x}} \right) \eta \, dS \, \ldots \ldots (9) \end{array}

Similarly, interchanging the roles of x and y as in (9) and again using (3), one can get

\begin{array}{rcl} \oint_{S} \frac{ \partial L}{ \partial u_{y}} \eta_{y} \, dS &=& - \oint_{S} \frac{ \partial}{ \partial y} \left( \frac{ \partial L}{ \partial u_{y}} \right) \eta \, dS \, \ldots \ldots (10) \end{array}

Using (9) and (10), (8) becomes

\oint_{S} \left[ \frac{ \partial L}{ \partial u}- \frac{ \partial}{ \partial x} \left( \frac{ \partial L}{ \partial u_{x}} \right)- \frac{ \partial}{ \partial y} \left( \frac{ \partial L}{ \partial u_{y}} \right) \right] \eta \, dS=0 \, \ldots \ldots (11)

Since (11) must hold for any admissible \eta , and indeed over any subdomain S' of S , we conclude that the term in brackets must vanish identically. This gives us the required equation L must satisfy to extremise (2), which we set out to find:

\frac{ \partial L}{ \partial u}= \frac{ \partial}{ \partial x} \left( \frac{ \partial L}{ \partial u_{x}} \right)+ \frac{ \partial}{ \partial y} \left( \frac{ \partial L}{ \partial u_{y}} \right)

This holds true for all kinds of simply connected surfaces, convex or not. \Box
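As a sanity check of the boxed result (again my own addition), sympy's euler_equations helper handles several independent variables; for the illustrative Dirichlet-type density L= \frac{1}{2}(u_{x}^{2}+u_{y}^{2}) the equation derived above predicts Laplace's equation.

```python
from sympy import Function, Rational, symbols
from sympy.calculus.euler import euler_equations

x, y = symbols('x y')
u = Function('u')

# Illustrative density: L = (u_x^2 + u_y^2)/2; the derived equation predicts
# u_xx + u_yy = 0 (Laplace's equation)
L = Rational(1, 2) * (u(x, y).diff(x)**2 + u(x, y).diff(y)**2)
print(euler_equations(L, u(x, y), [x, y]))
# [Eq(-Derivative(u(x, y), (x, 2)) - Derivative(u(x, y), (y, 2)), 0)]
```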

Application to the stretched membrane problem:

[Some calculation credit is due to more analytically adept people.]

We are given a closed frame in space that forms a closed loop L . In a three dimensional cartesian system the loop can be described by its height h at each point above the x-y plane, together with the x- and y-coordinates of the projection of the loop on the x-y plane. The loop is assumed to be without any kinks, i.e. if parameterised, all functions will be differentiable up to the second order. An elastic rubber membrane is stretched over the loop so that the edges of the membrane adhere to the loop. What will be the equation of the membrane?

A physical assumption is that the membrane assumes the shape of minimal possible area, this being the position of least potential energy. This means that an integral of the form

\oint_{S} V(h,h_{x},h_{y}) \, dS \, \ldots \ldots (12)

is to be minimised (there is clearly no maximum limit to the area). Here S is the region enclosed by the projection of the loop L on the x-y plane, and V(h,h_{x},h_{y}) is a function which encodes how ‘steep’ the membrane is at a particular point (h,x,y) : if a small patch of the membrane of area dA makes an angle of \theta with a plane parallel to the x-y plane, then dA=V \, dS= \sec \theta \, dS .

One can obtain through some geometry (skipping the details) that V(h,h_{x},h_{y})= \sqrt{1+h_{x}^{2}+h_{y}^2} . Plugging this form into the Euler-Lagrange equation derived above (note that \frac{ \partial V}{ \partial h}=0 ) gives

\frac{ \partial}{ \partial x} \left[ \frac{h_{x}}{ \sqrt{1+h_{x}^{2}+h_{y}^2}} \right]+ \frac{ \partial}{ \partial y} \left[ \frac{h_{y}}{ \sqrt{1+h_{x}^{2}+h_{y}^{2}}} \right] =0

Simplifying the above, one gets the (almost Laplacian) minimal surface equation

(1+h_{y}^{2})h_{xx}-2h_{x}h_{y}h_{xy}+(1+h_{x}^{2})h_{yy}=0

(for small slopes h_{x},h_{y} this reduces to the genuinely Laplacian h_{xx}+h_{yy}=0 ),

which has as boundary values the heights h(x,y) on the loop. I’m not sure how to show that this is sufficient to determine the membrane uniquely, if at all. \Box
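One can at least check the minimal surface equation above against a known minimal surface. The helicoid h= \tan^{-1}(y/x) is a classical example; a short sympy sketch (my addition) confirms that it satisfies the equation identically:

```python
import sympy as sp

x, y = sp.symbols('x y')
h = sp.atan(y / x)  # a helicoid: a classical minimal surface

hx, hy = h.diff(x), h.diff(y)
hxx, hyy, hxy = h.diff(x, 2), h.diff(y, 2), h.diff(x, y)

# residual of (1 + h_y^2) h_xx - 2 h_x h_y h_xy + (1 + h_x^2) h_yy = 0
residual = (1 + hy**2) * hxx - 2 * hx * hy * hxy + (1 + hx**2) * hyy
print(sp.simplify(residual))  # 0
```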


Tao Exercise: “Formal limits are actual limits”

I have been learning real analysis from Terence Tao’s Analysis books, the two volumes Analysis I and Analysis II. One particularly important juncture in the construction of the real numbers is the unification of the limit with the ‘formal limit’, a scaffolding of sorts that was introduced to construct the reals from the rationals (or rather, from Cauchy sequences of rationals). Tao leaves it as an exercise (6.1.6) to verify that formal limits are actual limits. That will be done in this blogpost, partly for its instructiveness and partly for my own greater understanding. Much of my terminology will be as I have learnt it from Tao (because it is elegant and precise). But first, a little background (meaning a long digression) as to what ‘formal limits’ and ‘actual limits’ mean:

A real number is defined as a formal limit of a Cauchy sequence of rational numbers. Up to this point only the rationals have been constructed, with the four operations on them, exponentiation of rationals by integers, and order amongst themselves. One can intuitively think of the formal limit as being ‘what the sequence seems to narrow down to’, and it is with that idea in the back of one’s mind that one will proceed to define how formal limits interact and are worked with. Note that the term ‘formal limit’ as of now is not defined; it is just an ‘operation’ on a Cauchy sequence that gives ‘something’. That ‘something’ might be a rational number, might not be; we have not defined anything yet. We will soon be able to get a better idea of formal limits once more axioms are made about them. As of now, we start with defining a real number to be the formal limit LIM_{n \rightarrow \infty} a_{n} of a Cauchy sequence (a_{n})_{n=m}^{ \infty} . Note that here the parameter \epsilon of a Cauchy sequence (the tolerance within which any two sufficiently late members of the sequence must eventually lie of each other) is restricted to be rational, for we have not constructed the reals yet.

We can define two real numbers x and y to be equal iff they have equivalent Cauchy sequences, i.e. if x=LIM_{n \rightarrow \infty} x_{n} and y=LIM_{n \rightarrow \infty} y_{n} for Cauchy sequences (x_{n})_{n=m}^{ \infty} and (y_{n})_{n=m}^{ \infty} with

\forall \, \, \epsilon>0 \in \mathbb{Q}, \, \exists \, \, N_{ \epsilon} \in \mathbb{N} \, | \, |x_{p}-y_{p}| \leq \epsilon \, \, \forall \, p \geq N_{ \epsilon}

This definition of equality can be easily shown to obey what is expected of an equality, i.e. reflexivity, symmetry and transitivity, so that there are no misalignments with the idea of reals we all have.

We can define addition and multiplication for real numbers:

LIM_{n \rightarrow \infty} \left[ (x_{n})_{n=m}^{ \infty} \right] \oplus LIM_{n \rightarrow \infty} \left[ (y_{n})_{n=m}^{ \infty} \right] =LIM_{n \rightarrow \infty} \left[ (x_{n}+y_{n})_{n=m}^{ \infty} \right]

LIM_{n \rightarrow \infty} \left[ (x_{n})_{n=m}^{ \infty} \right] \odot LIM_{n \rightarrow \infty} \left[ (y_{n})_{n=m}^{ \infty} \right] =LIM_{n \rightarrow \infty} \left[ (x_{n} \cdot y_{n})_{n=m}^{ \infty} \right]

We have to use new symbols for now because we have not yet embedded the rationals into the reals. We shall do this now, though: we can axiomatise that the formal limit of a constant Cauchy sequence (a Cauchy sequence where all the terms are equal to a certain rational number) is that rational number. In this way we have embedded the rationals into the reals, by associating each rational with the formal limit of the special constant Cauchy sequence it defines. Now we can pretend that the operation \oplus was the ordinary addition + ‘all along’, and the operation \odot was the ordinary multiplication \cdot ‘all along’; this is just a way of embedding the rationals into the reals by seeing the ‘original’ + and \cdot of the rationals as ‘special cases’ of the all-encompassing + and \cdot of the reals. This form of embedding (which is essentially a bunch of nomenclature tricks) can also be done for subtraction, division and exponentiation. This generalisation does not affect the system in any way; there are no contradictions. I could go on more about the formalisation of order, etc., but the digression is turning out to be too long, and anyway the basic idea is given. Note that we have already got two of our axioms about how Cauchy sequences are to be operated upon, on substituting \oplus by + and \odot by \cdot .

Now let’s fast forward. The rationals have been successfully integrated into the reals, except for one thing: limits. The formal LIM is defined to work only on rational Cauchy sequences. We can define the limit \lim for general (not necessarily rational) Cauchy sequences, with a real \epsilon parameter, as follows: if L is a real number, the real Cauchy sequence (x_{n})_{n=m}^{ \infty} is said to converge to L if

\forall \, \, \epsilon>0 \in \mathbb{R}, \, \exists \, \, N_{ \epsilon} \in \mathbb{N} \, | \, |x_{p}-L| \leq \epsilon \, \, \forall \, \, p \geq N_{ \epsilon}

This is expressed by saying

\lim_{n \rightarrow \infty} x_{n}=L

It is to be noted that real Cauchy sequences are also Cauchy when \epsilon is restricted to rational values, so the new definition is consistent with the original definition of a rational Cauchy sequence. Now we come to the original question: showing that this new ‘genuine’ limit agrees with the old formal limit (which is then no longer needed):
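Before the proof, a small computational illustration (my addition) of the rational- \epsilon Cauchy definition: the Babylonian iteration for \sqrt{2} , kept in exact rational arithmetic, together with an explicit N_{ \epsilon} for a rational \epsilon .

```python
from fractions import Fraction

# Babylonian iteration: a Cauchy sequence of rationals whose formal limit is sqrt(2)
a = [Fraction(2)]
for _ in range(6):
    a.append((a[-1] + 2 / a[-1]) / 2)

eps = Fraction(1, 10**6)  # a rational epsilon, as the definition above requires

# find an N with |a_p - a_q| <= eps for all p, q >= N (within the terms computed)
N = next(n for n in range(len(a))
         if all(abs(a[p] - a[q]) <= eps
                for p in range(n, len(a))
                for q in range(n, len(a))))
print(N, float(a[-1]))  # e.g. N = 4; a[-1] ~ 1.41421356...
```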

If (a_{n})_{n=m}^{ \infty} is a Cauchy sequence of rational numbers, it converges to LIM_{n \rightarrow \infty} a_{n} , i.e.

\lim_{n \rightarrow \infty} a_{n}=LIM_{n \rightarrow \infty} a_{n}

To prove this, let us assume for contradiction that the Cauchy sequence does not converge to L= LIM_{n \rightarrow \infty} a_{n} . This means that for all rational numbers \epsilon less than or equal to a certain real number \delta , there will be infinitely many terms a_{i} such that |a_{i}-L|> \epsilon (note that we can write this with a non-strict bound on \epsilon only because of the least upper bound property of subsets of the reals. Also, we are particular about rational Cauchy sequences here because we need to step down to the level of the old definition to prevent circularity.) This means, mathematically, that

\exists \, \, \delta >0 \, \, | \, \, \forall \, \, 0 < \epsilon \leq \delta \, , \forall \, \, n \in \mathbb{N} \, , \, \exists \, \, i \geq n \, | \, |a_{i}-L|> \epsilon

One naturally wonders whether the number of terms close to L is finite or not: we have only said that some terms, not all terms, satisfy |a_{i}-L|> \epsilon . One can venture to examine whether the number of a_{i} that satisfy |a_{i}-L| \leq \epsilon , for whatever \epsilon we take, is finite or not. Let us fix an \epsilon \leq \delta . Suppose there exist infinitely many a_{i} such that |a_{i}-L| \leq \frac{ \epsilon}{2} . This means that for every natural number n there exists at least one term a_{i}, i \geq n , such that |a_{i}-L| \leq \frac{ \epsilon}{2} (we may use \frac{ \epsilon}{2} here because \frac{ \epsilon}{2} \leq \delta too). There will also exist at least one a_{j}, j \geq n , such that |a_{j}-L|> \epsilon . Combining the two inequalities via the modulus property |x-y| \geq |x|-|y| , one sees that for any choice of n there exist i,j \geq n such that |a_{i}-a_{j}|> \frac{ \epsilon}{2} . However, since (a_{i})_{i=0}^{ \infty} is a Cauchy sequence, for the value \frac{ \epsilon}{2} there must exist a natural number N_{ \frac{ \epsilon}{2}} such that |a_{i}-a_{j}| \leq \frac{ \epsilon}{2} \, \, \forall \, \, i,j \geq N_{ \frac{ \epsilon}{2}} . Putting n=N_{ \frac{ \epsilon}{2}} gives a contradiction. Hence there are only finitely many a_{i} such that |a_{i}-L| \leq \frac{ \epsilon}{2} , i.e. after a certain integer M_{ \epsilon} all terms of the sequence satisfy |a_{i}-L|> \frac{ \epsilon}{2} . Since \epsilon was arbitrary, we can generalise (relabelling \frac{ \epsilon}{2} as \epsilon ) and say that for every sufficiently small positive rational \epsilon there exists a natural number N_{ \epsilon} such that |a_{i}-a_{j}| \leq \epsilon <|a_{i}-L| \, \, \forall \, \, i,j \geq N_{ \epsilon} .

Separating the a_{i} with indices beyond N_{ \epsilon} into two sets, depending on whether they are less than L- \epsilon or greater than L+ \epsilon , we can form respectively two sets of indices S_{1} and S_{2} , both being disjoint subsets of \mathbb{N}- \{ 1,2,...,N_{ \epsilon}-1 \} , such that we have

a_{i}<L- \epsilon \, \, \forall \, \, i \in S_{1}

a_{i}>L+ \epsilon \, \, \forall \, \, i \in S_{2}

At least one of the sets S_{1} and S_{2} will be of cardinality \aleph_{0} ; call it S_{x} , where x is 1 or 2. We can replace the subsequence (a_{i})_{i \in S_{x}} with an equivalent sequence (b_{i})_{i=0}^{ \infty} , reindexing so that a termwise bijection exists between the two (this is slightly unrigorous, as have been some other points in this post, but those are not troubling at all). Then we have |b_{i}-L|> \epsilon for all i \in \mathbb{N} . By a theorem [see footnote for a short proof], we must then have LIM_{n \rightarrow \infty} b_{n} \neq L , which implies L \neq L , since the sequence (b_{i})_{i=0}^{ \infty} is equivalent to (a_{i})_{i \in S_{x}} and hence to the original Cauchy sequence. This is absurd. So that original claim of ours, that (a_{n})_{n=m}^{ \infty} does not converge to L , is false.

Footnote:

If there is a rational \epsilon>0 such that |a_{n}-L|> \epsilon \, \, \forall \, \, n (as is the case for the (b_{i}) above), then LIM_{n \rightarrow \infty} a_{n} \neq L :

Suppose instead that LIM_{n \rightarrow \infty} a_{n}=L . Since (a_{n}) is a Cauchy sequence, there exists an N such that |a_{p}-a_{q}| \leq \frac{ \epsilon}{2} \, \, \forall \, \, p,q \geq N . Fixing any p \geq N , the real number |a_{p}-L|=|a_{p}-LIM_{q \rightarrow \infty} a_{q}|=LIM_{q \rightarrow \infty}|a_{p}-a_{q}| \leq \frac{ \epsilon}{2}< \epsilon , contradicting the hypothesis. So we must have LIM_{n \rightarrow \infty} a_{n} \neq L .

Decomposing a matrix in terms of eigenvalues

Suppose a matrix with real entries A=(a_{ij}), 1 \leq i,j \leq n , has distinct real eigenvalues \lambda_1 , \lambda_2, \ldots , \lambda_n , where n \geq 2 . Then we intend to prove the known identity that for every integer k \geq 1 , the k th power of A can be written as

A^{k}= \sum_{i=1}^{n} \lambda_{i}^{k} \, \frac{ |y_{i} \rangle \langle x_{i}|}{ \langle x_{i}| y_{i} \rangle}

Here the \langle x_{i}| and |y_{i} \rangle are respectively the left and right eigenvectors of A corresponding to the eigenvalues \lambda_{i} . Since the solutions of \det(A- \lambda I)=0 are the \lambda_i , each bra \langle x_{i}| corresponds to the ket |y_{i} \rangle as having the same eigenvalue \lambda_{i} . This means that the \langle x_{i}| and the |y_{i} \rangle satisfy the equations, for all 1 \leq i \leq n ,

\langle x_{i}|A= \lambda_{i} \langle x_{i}|

A|y_{i} \rangle = \lambda_{i}|y_{i} \rangle

Also, none of the eigenbras or eigenkets is equal to 0.

To prove that the expansion asserted is consistent, we first need a couple of straightforward lemmas:

1. \langle x_{i}|y_{j} \rangle =0 if i \neq j

To prove this, let us consider the two equations

\langle x_{i}|A= \lambda_{i} \langle x_{i}|

A|y_{j} \rangle = \lambda_{j}|y_{j} \rangle

Multiplying the first on the right by |y_{j} \rangle and the second on the left by \langle x_{i}| , and subtracting, we have

(\lambda_{i}- \lambda_{j}) \langle x_{i}|y_{j} \rangle =0

Since the eigenvalues are distinct it follows that \langle x_{i}|y_{j} \rangle =0

2. The \langle x_{i}| are all linearly independent. So are the |y_{i} \rangle

Let us prove it for the bras; the proof is identical in structure for the kets. Let us start with the equation

\sum_{i=1}^{n} c_{i} \langle x_{i}|=0

where c_{i} \, \in \mathbb{R}. Multiplying by the matrix A on the right side, and using the eigenvalue conditions, one also has

\sum_{i=1}^{n} c_{i} \lambda_{i} \langle x_{i}|=0

Let us select one value of the index i, say i=p. We then have from the first equation, multiplying by \lambda_{p},

c_{p} \lambda_{p} \langle x_{p}|+ \sum_{i \neq p} c_{i} \lambda_{p} \langle x_{i}|=0

Subtracting this from the second equation gives

\sum_{i \neq p}c_{i}( \lambda_{i}- \lambda_{p}) \langle x_{i}|=0

We have hence ‘eliminated’ a bra from the equation. We can proceed further, eliminating bras one by one, and ultimately obtain a solitary bra whose coefficient is some c_{i} multiplied by factors of the form \lambda_{i}- \lambda_{j} , all of which are non-zero; since the bra itself is non-zero, that c_{i} must be 0 . But we can do this keeping any one of the bras for last, so all the c_{i} are 0 . So the \langle x_{i}| are linearly independent.

Now let us take up the original problem. At first let us consider k=1. We can see upon inspection that if the expansion

B= \sum_{i=1}^{n} \lambda_{i} \frac{|y_{i} \rangle \langle x_{i}|}{ \langle x_{i}|y_{i} \rangle}

is multiplied on the right by any of the |y_{j} \rangle , or on the left by \langle x_{j}| , we have, for all j ,

\langle x_{j}|B= \lambda_{j} \langle x_{j}|

B|y_{j} \rangle = \lambda_{j}|y_{j} \rangle

This happens because, due to the orthogonality condition of lemma 1, the terms with i \neq j automatically vanish. Also, the vector space being real, the inner product \langle x_{i}|y_{i} \rangle is a real scalar and commutes with everything.

Let us try to prove that B=A . It is so far obvious that B ‘mimics’ all the properties A has towards the eigenkets of A; however that is not enough to prove their equality. If we construct the matrices

X= \begin{pmatrix} \langle x_{1}|^{(1)} & \ldots & \langle x_{1}|^{(n)} \\ \vdots & \ldots & \vdots \\ \langle x_{n}|^{(1)} & \ldots & \langle x_{n}|^{(n)} \end{pmatrix}

Y= \begin{pmatrix} |y_{1} \rangle ^{(1)} & \ldots & |y_{n} \rangle ^{(1)} \\ \vdots & \ldots & \vdots \\ |y_{1} \rangle ^{(n)} & \ldots & |y_{n} \rangle ^{(n)} \end{pmatrix}

\Lambda=( \lambda_{i} \delta_{ij})

i.e. arranging the bras along the rows, the kets along the columns, and the eigenvalues along the diagonal (keeping the other elements of the matrix 0), we can see that X and Y are both invertible, because the rows of X and the columns of Y are linearly independent, and the determinant of a matrix vanishes only when some non-trivial linear combination of its rows or columns is zero (geometrically, this is flattening the n-dimensional parallelepiped the matrix turns the unit n-cube into, the determinant giving the n-volume of that parallelepiped). So, taking the matrix X, one has

XA=XB= \Lambda X

Multiplying by X^{-1} on the left on both sides, one has A=B . This proves the relation for k=1 .

The proof for any general k is easily done through induction. The base case k=1 is proven. Assume the identity holds for k=p . Then

A^{p}= \sum_{i=1}^{n} \lambda_{i}^{p} \frac{|y_{i} \rangle \langle x_{i}|}{ \langle x_{i}|y_{i} \rangle}

Then,

A^{p+1}= \left( \sum_{i=1}^{n} \lambda_{i}^{p} \frac{|y_{i} \rangle \langle x_{i}|}{ \langle x_{i}|y_{i} \rangle} \right) \left( \sum_{j=1}^{n} \lambda_{j} \frac{|y_{j} \rangle \langle x_{j}|}{ \langle x_{j}|y_{j} \rangle} \right)

The above sum expands termwise, and every cross term with i \neq j vanishes because of the factor \langle x_{i}|y_{j} \rangle . The diagonal terms simply reproduce themselves: writing

P_{i}= \frac{ |y_{i} \rangle \langle x_{i}|}{ \langle x_{i}|y_{i} \rangle}

we have P_{i}P_{i}=P_{i} .

So we ultimately have

A^{p+1}= \sum_{i=1}^{n} \lambda_{i}^{p+1} \, \frac{ |y_{i} \rangle \langle x_{i}|}{ \langle x_{i}| y_{i} \rangle}

and hence the induction step, and consequently the identity, is verified. \Box
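A numerical sanity check of the identity (my addition; a sketch using numpy/scipy, with a randomly conjugated diagonal matrix so that the eigenvalues are guaranteed real and distinct):

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
# distinct real eigenvalues 1, 2, 3, 4 by construction
A = S @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(S)

# vl[:, i] and vr[:, i] are the left and right eigenvectors for lam[i]
lam, vl, vr = eig(A, left=True, right=True)

k = 3
B = sum(lam[i]**k * np.outer(vr[:, i], vl[:, i].conj())
        / (vl[:, i].conj() @ vr[:, i]) for i in range(4))
print(np.allclose(B.real, np.linalg.matrix_power(A, k)))  # True
```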

Functional analysis #1

[This was stimulated by this discussion on a facebook page, the initial question being put forward by Yehezkiel Tumewu. The part assuming double differentiability is introduced by me. I can’t really ‘prove’ every assumption taken, because they resulted purely from intuition, and I couldn’t make much progress anyway; but I found this interesting, and my blog has been dormant, so…]

This post aims to examine (inconclusively for the moment) under what conditions something that is evident to one’s intuition is derivable rigorously. Suppose one is given two continuous, differentiable functions f,g: \mathbb{R} \rightarrow \mathbb{R} , neither of which is a constant function. Suppose the two functions obey the equations

\begin{array}{rcl} f(x+y)&=& f(x)f(y)- g(x)g(y) \, \ldots \ldots \, (1) \\ \\ g(x+y) &=& g(x)f(y)+f(x)g(y) \,\ldots \ldots \, (2) \end{array}

\forall \, \, x,y \in \mathbb{R}. We can say from observation that the functions f(x)= \cos kx , g(x)= \sin kx are solutions, where k \in \mathbb{R}. But it is not clear whether, with the given data, we can explicitly show that these are the only functions satisfying (1) and (2); nor is it clear what more we need in order to show that. Intuiting further, it seems that if we were to get differential equations for f and g , the values of f , f' , g , g' at specific given points would determine the constant(s) of integration, which might conceivably fix the particular value of k . Before making any specific assumptions about initial points, one sees that one can get explicit values of f(0) and g(0) from the non-constancy of the functions, in the manner described below.

In (1) and (2) , set x=y=0 . This gives

\begin{array}{rcl} f(0) &=& f^{2}(0)-g^{2}(0) \, \ldots \ldots \, (3) \\ \\ g(0) &=& 2f(0)g(0) \, \ldots \ldots \, (4) \end{array}

From (4), one gets that either g(0)=0 , or f(0)= \frac{1}{2} , or both. But f(0)= \frac{1}{2} cannot be true, because substitution in (3) would yield g^{2}(0)= - \frac{1}{4} , which is absurd. So g(0)=0 . Substituting this in (3) yields f(0)=0 or f(0)=1 . But we can reject f(0)=0 , because putting y=0 in (1) would then imply

f(x+0)=f(x)f(0)-g(x)g(0)=0 \, \, \forall \, \, x

which would contradict the non-constant nature of f . Hence we have as a pair of ‘initial conditions’,

f(0)=1

g(0)=0

which we didn’t have to assume anything about. One might think that if the goal is to get some set of differential equations, something would need to be said about differentiability. Also, it seems that any route to a differential equation might lead to a second order equation in one function entirely, obtained by eliminating the other function, which we could then solve separately. Moreover, boundary conditions are needed. It seems plausible that through manipulation of the functional equations, as was done for f(0) and g(0) , one can’t explicitly find values for f'(0) and g'(0) , because we never get to separate the variables x and y on both sides, except identically. We might find some relation between f' and g' , but that will be dependent on constants and/or the original functions. So the issue of boundary values of first derivatives will not be considered here. But we shall make the assumption that

f and g have continuous first and second order derivatives.

So let’s proceed.

We can differentiate (1) and (2) in y , keeping x constant while differentiating (ah, the joys of functional equations!), and then set y=0 , to obtain the following pair:

\begin{array}{rcl} f'(x+y) &=& f(x)f'(y)-g(x)g'(y) \\ \\ \implies f'(x) &=& f'(0)f(x)-g'(0)g(x) \, \ldots \ldots \, (5) \\ \\ g'(x+y) &=& f(x)g'(y)+g(x)f'(y) \\ \\ \implies g'(x) &=& g'(0)f(x)+f'(0)g(x) \, \ldots \ldots \, (6) \end{array}

(Solving this purely in terms of matrix equations, involving some linear algebra, is rather tedious, and I do not have the acumen or patience for that now. I hope to cover it in another blogpost, or to make some partial progress in a footnote.)

At this stage, we can ‘make a gamble’: seeing that we want to express f(x) in a purely cosine form, we can set as an initial condition f'(0)=0 , as that is independent of k , and moreover leaves one free to take the other boundary condition as g'(0)= \alpha , where \alpha is a real number. This choice of initial conditions is consistent with the trigonometric functions. So now (5) and (6) become

\begin{array}{rcl} f'(x) &=& -\alpha g(x) \, \ldots \ldots \, (7) \\ \\ g'(x) &=& \alpha f(x)\, \ldots \ldots \, (8) \end{array}

Differentiating one and substituting into the other yields two very familiar equations

\begin{array}{rcl} f''(x) &=& -\alpha^{2} f(x) \, \ldots \ldots \, (9) \\ \\ g''(x) &=& - \alpha^{2} g(x)\, \ldots \ldots \, (10) \end{array}

It is clear that \alpha can’t be 0, for that will mean from (7) and (8) that the functions are both constant.

The solutions of (9) and (10) are

\begin{array}{rcl} f(x) &=& a_{1} \cos \alpha x \, + b_{1} \sin \alpha x \, \ldots \ldots \, (11) \\ \\ g(x) &=& a_{2} \cos \alpha x \, + b_{2} \sin \alpha x \ldots \ldots \, (12) \end{array}

for real a_{1},b_{1},a_{2},b_{2}.
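(As an aside, sympy’s dsolve can impose the initial conditions of the next step directly; a sketch of (9) with f(0)=1 , f'(0)=0 , my addition:)

```python
import sympy as sp

x = sp.symbols('x')
alpha = sp.symbols('alpha', positive=True)
f = sp.Function('f')

# f'' = -alpha^2 f with f(0) = 1, f'(0) = 0
sol = sp.dsolve(sp.Eq(f(x).diff(x, 2), -alpha**2 * f(x)),
                f(x), ics={f(0): 1, f(x).diff(x).subs(x, 0): 0})
print(sol)  # Eq(f(x), cos(alpha*x))
```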

Plugging f(0)=1 into (11) yields a_{1}=1 , and then plugging f'(0)=0 into the derivative of (11) gives b_{1}=0 . Similarly, plugging g(0)=0 into (12) yields a_{2}=0 , and g'(0)= \alpha yields b_{2}=1 . So we have found the intuited pair of solutions:

\begin{array}{rcl} f(x) &=& \cos \alpha x \\ \\ g(x) &=& \sin \alpha x \end{array}
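A quick numerical check (my addition) that this pair really does satisfy the original functional equations (1) and (2); the value \alpha=1.7 below is just an illustrative choice.

```python
import numpy as np

alpha = 1.7  # illustrative value; any real alpha should work
f = lambda x: np.cos(alpha * x)
g = lambda x: np.sin(alpha * x)

rng = np.random.default_rng(1)
xs, ys = rng.uniform(-5, 5, 100), rng.uniform(-5, 5, 100)
print(np.allclose(f(xs + ys), f(xs) * f(ys) - g(xs) * g(ys)))  # (1): True
print(np.allclose(g(xs + ys), g(xs) * f(ys) + f(xs) * g(ys)))  # (2): True
```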

It seems that second differentiability and the fixing of four initial values (of which two are evident from the very definition of the functions) are sufficient to determine the two functions by the method of second order differential equations. However, this post has been inadequate in answering whether these conditions are necessary, or whether the solution is unique without them. What if we did not assume second differentiability, or assigned a non-zero value to f'(0) ?

Algebra #1

[This problem came in the Putnam this year. I don’t remember the exact wording used so am giving my own.]

Suppose we have a set S with a binary operation * , such that * is commutative and associative for all elements in S . This algebraic structure (S,*) has the property that for each x and y in S , there exists a (not necessarily unique) z in S such that x*z=y . If for three elements a , b , c in S , one has a*b=a*c , show that b=c (a cancellation law.)

Before providing a solution (which was surprisingly easy) I would like to digress a bit on how one can intuit the solution process. The given condition, that every element of S can be mapped to every other element of S via the operation, is clearly very powerful, for it greatly constrains the forms S can take. The commutativity of * adds to this. With these tools we can begin our attack on the two elements to be shown equal, b and c .

Let p be an element in S such that b*p=c . Similarly, let q be an element in S such that c*q=b . Using these one has, by substituting one in another alternately,

\begin{array}{rcl} (b*p)*q=b &,& (c*q)*p=c \\ \implies b*(p*q)=b &,& c*(q*p)=c \\ \end{array}

Since * is commutative, we see that r=p*q maps both b and c to themselves. Let k be an element in S such that a*k=r . We then have:

\begin{array}{rcl} a*b &=& a*c \\ \implies (k*a)*b &=& (k*a)*c \\ \implies r*b &=& r*c \\ \therefore \, \, b &=& c \end{array}

where we were able to write the second step due to the associativity condition (together with k*a=a*k=r by commutativity). \Box
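A brute-force illustration (my addition): the hypotheses and the cancellation law can be checked exhaustively on a small finite example such as (\mathbb{Z}_{5},+) . This is of course only an illustration; the proof above is fully general.

```python
from itertools import product

S = range(5)
op = lambda a, b: (a + b) % 5  # (Z_5, +): commutative and associative

assert all(op(a, b) == op(b, a) for a, b in product(S, S))
assert all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(S, S, S))
# for each x, y there is a z with x * z = y
assert all(any(op(x, z) == y for z in S) for x, y in product(S, S))
# cancellation, as proved above
assert all(b == c for a, b, c in product(S, S, S) if op(a, b) == op(a, c))
print("cancellation holds on (Z_5, +)")
```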

Mechanics #2

A parabola rolls on a straight line without slipping. Show that its focus traces out a catenary.

Let us consider the situation given below in the figure. We are in the frame of reference (X-Y) of the parabola, in black, whereas the frame of reference of the line along which the parabola rolls is (X’-Y’) , in green. Let the parabola be represented in X-Y as x^{2}=4ay . E is the point of contact of the parabola with the line at the given instant.

Let (x,y) and (x',y') be the coordinates of any point in the X-Y and X’-Y’ systems respectively. We shall use x , y etc. to denote general coordinates of points. If t is the parameter representing the ‘evolution’ of the point E , we can represent E in the X-Y frame by the parametric coordinates (2at,at^{2}) .

The coordinates of F in the X-Y frame are (0,a) . Hence

EF= \sqrt{4a^{2}t^{2}+a^{2}(t^{2}-1)^{2}}=a(t^{2}+1)

O'E is the arc length of the parabola from O up to the point E (the parabola rolls without slipping); at E the slope of the parabola is \frac{dy}{dx} =t . Hence, performing the change of variables from x to t (so that dx=2a \, dt ),

\begin{array}{rcl} O'E &=& \int_{0}^{t} \sqrt{1+ \left( \frac{dy}{dx} \right)^{2}} \,(2a \, dt) \\ \\ &=& 2a \int_{0}^{t} \sqrt{1+t^{2}} \, dt \\ \\ &=& a \left[ t \sqrt{1+t^{2}}+ \sinh^{-1} t \right] \end{array}
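(The integral can be checked symbolically; a one-off sympy sketch, my addition:)

```python
import sympy as sp

a, t = sp.symbols('a t', positive=True)
tau = sp.symbols('tau')

arc = sp.integrate(2 * a * sp.sqrt(1 + tau**2), (tau, 0, t))
# difference from the closed form a*(t*sqrt(1+t^2) + asinh(t)) should vanish
print(sp.simplify(arc - a * (t * sp.sqrt(1 + t**2) + sp.asinh(t))))  # 0
```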

There is a property of the parabola, easily provable by elementary calculus, that any line perpendicular to the directrix makes the same acute angle with the tangent at a point of the parabola as the line joining that point to the focus does. This finds use in geometric optics, where all light beams parallel to the axis falling on a parabolic mirror can be made to converge to, or appear to diverge from, the focus. Using that property, it is evident from the figure that \tan \theta = \frac{1}{t} . So we have,

\begin{array}{rcl} O'P &=& O'E-EF \cos \theta \\ \\ &=& a \sinh^{-1} t \\ \\ \\ PF &=& EF \sin \theta \\ \\ &=& a \sqrt{t^{2}+1} \end{array}

So, in the X’-Y’ system the coordinates of F are x'=a \sinh^{-1} t , y'=a \sqrt{1+t^{2}} . Eliminating t between these two equations gives

y'=a \cosh \left( \frac{x'}{a} \right)

which is an equation of a catenary. \Box
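A final numerical check (my addition) that the parametric coordinates of F found above do trace a catenary; since a \cosh( \sinh^{-1} t)= a \sqrt{1+t^{2}} is an identity, the test should pass up to floating point.

```python
import numpy as np

a = 2.0  # illustrative value of the parabola's parameter
t = np.linspace(-3, 3, 601)

xp = a * np.arcsinh(t)       # x' = a sinh^{-1} t
yp = a * np.sqrt(1 + t**2)   # y' = a sqrt(1 + t^2)

print(np.allclose(yp, a * np.cosh(xp / a)))  # True: F lies on y' = a cosh(x'/a)
```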