Differential calculus

At the core of calculus lie derivatives. The derivative of a function is defined as its instantaneous rate of change with respect to one of its variables, and the process of finding a derivative is known as differentiation. Geometrically, the derivative at a given point is the slope of the tangent line to the graph of the function at that point, provided that the derivative exists and is defined there.
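As a small illustration of the slope interpretation, the following sketch approximates a derivative numerically with a central finite difference; the function f(x) = x**2, the point x = 3, and the step size h are chosen here purely for illustration:

def f(x):
    return x ** 2

def numerical_derivative(f, x, h=1e-5):
    # Central difference: (f(x + h) - f(x - h)) / (2 * h) approximates f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

print(numerical_derivative(f, 3.0))  # close to the exact derivative 2 * 3 = 6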

Differentiation is the inverse of integration. Differentiation has several applications; in physics, for example, the derivative of displacement is velocity, and the derivative of velocity is acceleration. Derivatives are also widely used to find the maxima or minima of a function.
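To make the physics example concrete, the short sketch below uses SymPy on an arbitrary displacement function, chosen only for illustration, and differentiates it to obtain velocity and acceleration:

import sympy as sp

t = sp.symbols('t')
s = 5 * t ** 2 + 2 * t               # displacement s(t), chosen only for illustration

velocity = sp.diff(s, t)             # ds/dt
acceleration = sp.diff(velocity, t)  # dv/dt

print(velocity)      # 10*t + 2
print(acceleration)  # 10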

Within machine learning, we deal with functions that operate on variables, or features, with hundreds or more dimensions. We compute the derivative of the function along each dimension of the variable and combine these partial derivatives into a vector, which gives us what is called the gradient. Similarly, taking the derivatives of the gradient, that is, the second-order partial derivatives of the function, gives us a matrix called the Hessian.
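As a sketch of these two objects, the example below uses SymPy on a small two-variable function, chosen only for illustration, to collect the first-order partial derivatives into a gradient vector and the second-order partial derivatives into a Hessian matrix:

import sympy as sp

x, y = sp.symbols('x y')
f = x ** 2 + 3 * x * y + y ** 2                        # illustrative function

gradient = sp.Matrix([sp.diff(f, v) for v in (x, y)])  # vector of first-order partials
hess = sp.hessian(f, (x, y))                           # matrix of second-order partials

print(gradient)  # Matrix([[2*x + 3*y], [3*x + 2*y]])
print(hess)      # Matrix([[2, 3], [3, 2]])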

Knowledge of the gradient and the Hessian helps us define things such as the direction of descent and the rate of descent, which tell us how to move through the function space to reach its lowest point and so minimize the function.
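A minimal gradient descent sketch in NumPy, assuming the simple quadratic f(w) = ||w||**2, whose gradient is 2*w, and an illustrative starting point, learning rate, and iteration count:

import numpy as np

def gradient(w):
    return 2 * w                 # gradient of f(w) = ||w||**2

w = np.array([3.0, -4.0])        # illustrative starting point
learning_rate = 0.1
for _ in range(100):
    w = w - learning_rate * gradient(w)  # step in the direction of steepest descent

print(w)  # approaches the minimizer [0, 0]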

The following is an example of a simple objective function (linear regression with weights x, N data points, and D dimensions) in vectorized notation:
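In the usual vectorized form, and assuming a design matrix A in R^{N x D} and a target vector b in R^N (symbols chosen here for illustration, not taken from the original), the standard least-squares objective can be written as:

f(x) = \frac{1}{2N} \lVert A x - b \rVert_2^2 = \frac{1}{2N} \sum_{n=1}^{N} \left( a_n^\top x - b_n \right)^2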

The method of Lagrange multipliers is a standard way in calculus to maximize or minimize functions when there are constraints involved.
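As a brief sketch of the technique, the example below uses SymPy on an illustrative problem (maximize f(x, y) = x*y subject to x + y = 1, chosen here only as a demonstration) and solves the stationarity conditions of the Lagrangian:

import sympy as sp

x, y, lam = sp.symbols('x y lambda')
f = x * y              # objective to maximize (illustrative choice)
g = x + y - 1          # constraint written as g(x, y) = 0

L = f - lam * g        # Lagrangian
stationarity = [sp.diff(L, v) for v in (x, y, lam)]  # set all partial derivatives to zero

solution = sp.solve(stationarity, (x, y, lam), dict=True)
print(solution)  # [{lambda: 1/2, x: 1/2, y: 1/2}]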