Nonlinear function (“model”) $M$ of the initial state ${x}_{0}$ (state at time $0$), giving the state ${x}_{t}$ at time $t$:

$${x}_{t}=M\left({x}_{0}\right)$$

${x}_{0}$ and ${x}_{t}$ are vectors with $I$ components, respectively ${x}_{i}^{0}$ and ${x}_{i}^{t}$. $M$ is then a vector function.

Let $M\left({x}_{0}\right)$ denote the Jacobian matrix of the vector function $M$, containing its first derivatives with respect to the initial variables.

Since $M$ is a nonlinear function, its derivatives depend on the initial state, $M=M\left({x}_{0}\right)$. This dependence is not indicated hereafter for simplicity. As a matrix, $M$ is:

$$M=\left[\begin{array}{cccc}\frac{\partial {M}_{1}}{\partial {x}_{1}^{0}}& \frac{\partial {M}_{1}}{\partial {x}_{2}^{0}}& \cdots & \frac{\partial {M}_{1}}{\partial {x}_{I}^{0}}\\ \frac{\partial {M}_{2}}{\partial {x}_{1}^{0}}& \frac{\partial {M}_{2}}{\partial {x}_{2}^{0}}& \cdots & \frac{\partial {M}_{2}}{\partial {x}_{I}^{0}}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial {M}_{I}}{\partial {x}_{1}^{0}}& \frac{\partial {M}_{I}}{\partial {x}_{2}^{0}}& \cdots & \frac{\partial {M}_{I}}{\partial {x}_{I}^{0}}\end{array}\right]$$

The first order variation is then obtained as a row-by-column product:

$$\delta {x}_{t}=M\delta {x}_{0}$$

or, in components:

$$\delta {x}_{i}^{t}={\sum}_{j=1}^{I}\frac{\partial {M}_{i}}{\partial {x}_{j}^{0}}\delta {x}_{j}^{0}$$

The Jacobian matrix $M$ is also called the tangent linear operator: it is applied linearly to variations of the initial state (tangent vectors), and it depends on the initial state because $M$ is nonlinear.
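The tangent linear relation $\delta {x}_{t}=M\delta {x}_{0}$ can be checked numerically. The sketch below uses a hypothetical two-variable model (the model, its hand-coded Jacobian, and the perturbation are illustrative assumptions, not from the text) and compares the tangent linear prediction with the actual nonlinear difference:

```python
import numpy as np

# Hypothetical nonlinear model M : R^2 -> R^2, chosen only for illustration
def M(x):
    return np.array([x[0] * x[1], x[0] + np.sin(x[1])])

# Jacobian of M, entries dM_i / dx_j^0, derived by hand for this model
def jacobian(x):
    return np.array([[x[1],        x[0]],
                     [1.0, np.cos(x[1])]])

x0  = np.array([1.0, 2.0])
dx0 = 1e-6 * np.array([0.3, -0.7])   # small perturbation of the initial state

# Tangent linear prediction: delta x_t = M' delta x_0
dxt_tl = jacobian(x0) @ dx0

# Actual nonlinear difference, for comparison
dxt_nl = M(x0 + dx0) - M(x0)

print(np.allclose(dxt_tl, dxt_nl, rtol=1e-4))
```

For a small enough $\delta {x}_{0}$ the two agree to first order; the residual shrinks quadratically with the perturbation size, as expected from the neglected second-order terms.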

Now let $J$ be a scalar function of the state at time $t$. Its first variation is obtained by multiplying the row vector obtained by transposing its gradient with the vector $\delta {x}_{t}$, which can be expressed as above:

$$\delta J={\left(\frac{\partial J}{\partial {x}_{t}}\right)}^{T}\delta {x}_{t}={\left(\frac{\partial J}{\partial {x}_{t}}\right)}^{T}M\delta {x}_{0}$$

Since $\delta J$ can also be written as ${\left(\partial J/\partial {x}_{0}\right)}^{T}\delta {x}_{0}$, the gradient with respect to the initial condition is identified as:

$${\left(\frac{\partial J}{\partial {x}_{0}}\right)}^{T}={\left(\frac{\partial J}{\partial {x}_{t}}\right)}^{T}M$$

So the transpose of the Jacobian matrix, the transpose operator, is applied linearly to the gradient with respect to the **final**-time variables to give the gradient with respect to the **initial**-time variables.
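This gradient relation can also be verified numerically. A minimal sketch, reusing the same hypothetical two-variable model and an assumed scalar function $J={\textstyle\frac{1}{2}}|{x}_{t}|^{2}$ (both are illustrative choices, not from the text), checks ${M}^{T}\left(\partial J/\partial {x}_{t}\right)$ against a finite-difference gradient of $J\left(M\left({x}_{0}\right)\right)$:

```python
import numpy as np

# Same hypothetical model and Jacobian as before (illustrative assumptions)
def M(x):
    return np.array([x[0] * x[1], x[0] + np.sin(x[1])])

def jacobian(x):
    return np.array([[x[1],        x[0]],
                     [1.0, np.cos(x[1])]])

# Assumed scalar function of the final state: J = 1/2 |x_t|^2
def J(xt):
    return 0.5 * np.dot(xt, xt)

x0 = np.array([1.0, 2.0])
xt = M(x0)

grad_xt = xt                          # dJ/dx_t for this choice of J
grad_x0 = jacobian(x0).T @ grad_xt    # transpose (adjoint) step: M^T grad

# Central finite-difference check of dJ/dx_0
eps = 1e-6
fd = np.array([(J(M(x0 + eps * e)) - J(M(x0 - eps * e))) / (2 * eps)
               for e in np.eye(2)])

print(np.allclose(grad_x0, fd, rtol=1e-5))
```

One application of the transposed Jacobian yields the whole gradient at once, whereas finite differences need one model run per component of ${x}_{0}$; this is why adjoint methods are attractive when $I$ is large.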

The transpose operator is sometimes called the “adjoint” operator, though the two are not exactly the same: the adjoint operator depends on the definition of a scalar product. The gradients are then “adjoint vectors”. Note that if the state components have physical dimensions, the tangent vectors have the same dimensions, while the components of the adjoint vectors have the inverse physical dimensions (apart from possible physical dimensions of $J$) of their corresponding tangent or state components.
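The distinction between transpose and adjoint can be made concrete. With the Euclidean scalar product the adjoint of a linear operator $L$ is its transpose ${L}^{T}$; with a weighted scalar product ${\langle u,v\rangle}_{W}={u}^{T}Wv$ the adjoint becomes ${W}^{-1}{L}^{T}W$. The sketch below (the operator and the weight matrix are arbitrary assumed examples) verifies the defining property ${\langle Lu,v\rangle}_{W}={\langle u,{L}^{*}v\rangle}_{W}$:

```python
import numpy as np

rng = np.random.default_rng(0)

L = rng.standard_normal((3, 3))     # an arbitrary linear (tangent) operator
W = np.diag([1.0, 4.0, 9.0])        # assumed SPD weight matrix: <u, v>_W = u^T W v

def inner(u, v):
    return u @ W @ v

# Adjoint of L with respect to <., .>_W : L_adj = W^{-1} L^T W
L_adj = np.linalg.inv(W) @ L.T @ W

u = rng.standard_normal(3)
v = rng.standard_normal(3)

# Defining property of the adjoint: <L u, v>_W == <u, L_adj v>_W
print(np.isclose(inner(L @ u, v), inner(u, L_adj @ v)))
```

Only when $W$ is the identity does the adjoint reduce to the plain transpose, which is the sense in which the two terms are often, but not rigorously, used interchangeably.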