Defining the derivative

From Applied Science

Before discussing it, I should point out a confusion I once had myself. Textbooks usually discuss the problem of finding a tangent line before defining the derivative of a function. If you have ever watched a video on the topic, maybe the music video "I will derive", you have probably seen a tangent line behaving like a roller-coaster, riding over the graph of a function. Careful there! The tangent line is one thing; the derivative of a function is not the tangent line itself. The limits we first learn to calculate yield a number, or tell us that the expression grows to infinity (or that the limit does not exist). The definition of the derivative is also a limit, but because it is computed at every point x, its result is another function. When the derivative yields the same number at every point, it is a constant function.

I'm mentioning that confusion because I think many people are misled into thinking that differentiating a function is the same thing as finding the tangent line. Not quite. For polynomials of degree greater than 2 and for any transcendental function, differentiation yields another function that is not linear: its graph is not a straight line! The derivative is not a single curve that touches the original graph at many points; its value at each x is the slope of the tangent line at that x.
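To make the distinction concrete, here is a small Python sketch. The function f(x) = x³ and the helper names `f_prime` and `tangent_line` are my own choices for illustration: the derivative is one function, 3x², while every point p has its own tangent line y = f(p) + f'(p)(x − p).

```python
def f(x):
    return x ** 3  # example function (assumed for illustration)

def f_prime(x):
    # The derivative of x**3 is the function 3x**2: a parabola, not a straight line.
    return 3 * x ** 2

def tangent_line(p):
    """Return (slope, intercept) of the tangent line to f at x = p."""
    slope = f_prime(p)
    intercept = f(p) - slope * p
    return slope, intercept

for p in [-1, 0, 1, 2]:
    slope, intercept = tangent_line(p)
    print(f"p = {p}: f'(p) = {slope}, tangent line y = {slope}x + ({intercept})")
```

Each p gives a different line (slope 3 and intercept −2 at p = 1, slope 12 and intercept −16 at p = 2), while `f_prime` stays the same nonlinear function throughout.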

The tangent line problem


In a right triangle, the tangent of an angle is the ratio of the opposite side to the adjacent side, that is, rise / run. At school we are given the lengths of the triangle's sides, or we measure them with a ruler. In analytic geometry, the distance between two points on a line parallel to one of the axes is [math]\displaystyle{ |a - b| }[/math], where [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] are their coordinates along that axis. When the rise is close to zero, the angle is close to zero, meaning that the ramp is barely steep at all. At the opposite extreme, when the rise is much greater than the run, the angle approaches 90° and the steepness grows without bound.
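A quick sketch of how the rise / run ratio relates to the angle, assuming a positive run so that `math.atan` returns the angle of the ramp. The function name `steepness` is hypothetical.

```python
import math

def steepness(rise, run):
    """Return the tangent (rise / run) and the corresponding angle in degrees; assumes run > 0."""
    ratio = rise / run
    angle = math.degrees(math.atan(ratio))
    return ratio, angle

for rise, run in [(0.01, 1.0), (1.0, 1.0), (100.0, 1.0)]:
    ratio, angle = steepness(rise, run)
    print(f"rise = {rise}, run = {run}: tan = {ratio}, angle = {angle:.2f} degrees")
```

A tiny rise gives an angle of about half a degree, equal rise and run give 45°, and a rise a hundred times the run gives roughly 89.4°, creeping toward 90° without reaching it.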

[math]\displaystyle{ \tan\theta = \frac{f(x) - f(p)}{x - p} }[/math]

Careful here! The triangle's hypotenuse is not a tangent. It's a secant, because it crosses the graph at two points. To turn that secant into a tangent, we need a limit that brings the distance between the two points arbitrarily close to zero.

[math]\displaystyle{ \lim_{x \ \to \ p} \frac{f(x) - f(p)}{x - p} }[/math]
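Numerically, we can watch the secant slopes settle toward the limit. This sketch assumes f(x) = x² and p = 1 (my choice); the quotient simplifies to x + 1, so the slopes approach 2. The helper `secant_slope` is hypothetical.

```python
def f(x):
    return x ** 2  # example function (assumed)

def secant_slope(f, x, p):
    """Slope of the secant through (x, f(x)) and (p, f(p)); requires x != p."""
    return (f(x) - f(p)) / (x - p)

p = 1.0
for x in [2.0, 1.5, 1.1, 1.01, 1.001]:
    print(f"x = {x}: secant slope = {secant_slope(f, x, p):.6f}")
```

The loop never sets x equal to p; it only brings x closer, which is exactly what the limit describes.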

What that limit calculates is the slope of the graph at the point [math]\displaystyle{ (p, f(p)) }[/math], which is the slope of the tangent line there. If we could draw a right triangle at a microscopic scale, its rise / run ratio would be equal to that number.

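Footnote about the order of the points: depending on the textbook, the graph is drawn concave up or concave down, and [math]\displaystyle{ p }[/math] may appear to the right or to the left of [math]\displaystyle{ x }[/math], which is why some books write the points in the opposite order. The order doesn't change the value, because [math]\displaystyle{ \frac{f(x) - f(p)}{x - p} = \frac{f(p) - f(x)}{p - x} }[/math], and the limit takes [math]\displaystyle{ x }[/math] toward [math]\displaystyle{ p }[/math] from both sides, so the two one-sided limits must agree for it to exist. Since the standard notation is [math]\displaystyle{ f(x) }[/math], it's more natural to write [math]\displaystyle{ x \to p }[/math] than the other way around.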
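A sketch checking both points of the footnote, assuming f(x) = x³ and p = 2 (so f'(2) = 12). The helper `quotients` is hypothetical: it returns the quotient with the points in one order, the same quotient with the order swapped, and the quotient with x to the left of p.

```python
def f(x):
    return x ** 3  # example function (assumed)

def quotients(f, p, h):
    """Difference quotients around p for a step h > 0."""
    x = p + h                                    # x to the right of p
    q = (f(x) - f(p)) / (x - p)
    q_swapped = (f(p) - f(x)) / (p - x)          # same two points, opposite order
    x_left = p - h                               # x to the left of p
    q_left = (f(x_left) - f(p)) / (x_left - p)
    return q, q_swapped, q_left

for h in [0.1, 0.01, 0.001]:
    q, q_swapped, q_left = quotients(f, 2.0, h)
    print(f"h = {h}: right {q:.4f}, swapped {q_swapped:.4f}, left {q_left:.4f}")
```

Swapping the order gives the identical number, and the right-hand values (above 12) and left-hand values (below 12) close in on the same limit.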

The derivative

I'm repeating the same graph as above with the same points, except that I've changed the variables to a different notation. The notation emphasizes the idea of a limit more than the geometric idea of the rise / run ratio. Now we have [math]\displaystyle{ \text{run} = |(x + h) - x| = |h| }[/math] and [math]\displaystyle{ \text{rise} = |f(x + h) - f(x)| }[/math].

[math]\displaystyle{ f'(x) = \lim_{h \ \to \ 0} \frac{f(x + h) - f(x)}{h} }[/math]

In the previous graph, the idea was to make the distance between the two points arbitrarily small but not equal to zero. Now [math]\displaystyle{ h }[/math] plays that role: it is a nonzero quantity that we let get as close to zero as we like. This is important: when solving exercises with this definition, we can simplify [math]\displaystyle{ h/h = 1 }[/math] because [math]\displaystyle{ h \neq 0 }[/math], so we are not dividing zero by zero. What happens when [math]\displaystyle{ h \to 0 }[/math]? The distance between [math]\displaystyle{ (x,f(x)) }[/math] and [math]\displaystyle{ (x + h,f(x + h)) }[/math] shrinks, and in the limit we no longer have two points and a secant, but one point and a tangent line.
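As a worked example (the function [math]\displaystyle{ f(x) = x^2 }[/math] is my choice, not from the text), notice that the cancellation of [math]\displaystyle{ h }[/math] happens while [math]\displaystyle{ h }[/math] is still nonzero, and only afterwards do we take the limit:

[math]\displaystyle{ \frac{f(x + h) - f(x)}{h} = \frac{(x + h)^2 - x^2}{h} = \frac{2xh + h^2}{h} = \frac{h(2x + h)}{h} = 2x + h \qquad (h \neq 0) }[/math]

[math]\displaystyle{ f'(x) = \lim_{h \ \to \ 0} (2x + h) = 2x }[/math]

The result is the function [math]\displaystyle{ 2x }[/math], not a single number; plugging in a particular [math]\displaystyle{ x }[/math] gives the slope of the tangent line at that point.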

In some places [math]\displaystyle{ h }[/math] is replaced by [math]\displaystyle{ \Delta x }[/math] or [math]\displaystyle{ \Delta h }[/math]. In physics the Greek letter Delta denotes a change or difference, as in [math]\displaystyle{ \Delta S = S_2 - S_1 }[/math], the displacement used to compute average velocity. It's another way to write the idea of an increment from one point to the next.

Leibniz's notation: [math]\displaystyle{ \frac{dy}{dx} }[/math]. It looks like a ratio, but it doesn't mean a ratio. In the tangent line problem, the tangent is a ratio associated with two points: we have [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math], and [math]\displaystyle{ f(a) }[/math] and [math]\displaystyle{ f(b) }[/math]. We use those to draw a triangle and view the rate of change as a ratio, because it really is one. In physics it's common to interpret [math]\displaystyle{ \Delta S/\Delta t }[/math] as (average) velocity, because it's the ratio of a change in position to a change in time. So it's quite fine to associate the derivative with the tangent and with a ratio. Centuries ago, when Leibniz was developing calculus, he probably had the same geometric perspective.

Now [math]\displaystyle{ \frac{df}{dx} = \frac{d}{dx}f(x) }[/math] raises an issue about the meaning of "infinitesimal". We usually don't learn about this in calculus, unless the teacher takes the time to explain it, because it requires concepts more advanced than calculus itself. Why can we divide by [math]\displaystyle{ dx }[/math]? Because it is meant to represent something "infinitely close to zero" but not zero. Likewise, a function itself is not a number: [math]\displaystyle{ df }[/math] represents a variation from [math]\displaystyle{ f(a) }[/math] to [math]\displaystyle{ f(b) }[/math] when [math]\displaystyle{ a \neq b }[/math]. The whole problem behind the word "infinitesimal" is that it tries to convey the idea of the smallest positive number, which does not exist: for any positive number there is a smaller one (half of it, for example). For the same reason, there is no largest number either.

I mentioned "to divide" and that [math]\displaystyle{ \frac{dy}{dx} }[/math] is not a ratio. Let's make things clear:

[math]\displaystyle{ \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} }[/math] (In fact, Leibniz used the same notation as dividing one number by another because the expression inside the limit really is a quotient. However, we are dealing with a limit, and with limits we have the special case of dividing by something that gets close to zero but is never zero itself.)

When we have a derivative and want to calculate its value at [math]\displaystyle{ x = a }[/math], it's quite fine to associate it with the tangent line problem and the rise / run ratio. The problem arises when we interpret the derivative as a function. A function involves two sets, the domain and the range, and the idea of a ratio between whole sets of numbers doesn't make sense.

For the second derivative we write [math]\displaystyle{ \frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) }[/math]. This notation is confusing at first because the 2s are not powers: nothing is being squared. The parentheses don't mean a product either. We take the derivative, and then the derivative again: the derivative of a derivative. In a first calculus course the notation [math]\displaystyle{ f''(x) }[/math] often appears instead, while the related partial-derivative notation [math]\displaystyle{ \frac{\partial^2 f}{\partial x^2} }[/math] is standard with multivariable functions.
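The difference between the derivative of a derivative and a square can be checked numerically. This is a sketch under my own assumptions: the hypothetical helper `derivative` approximates a derivative with a central difference (step h = 1e-4) and returns a new function, and the example is y = x³, where d²y/dx² = 6x but (dy/dx)² = 9x⁴.

```python
def derivative(f, h=1e-4):
    """Return a new function that approximates f' with a central difference."""
    def df(x):
        return (f(x + h) - f(x - h)) / (2 * h)
    return df

def y(x):
    return x ** 3  # example function (assumed)

dy_dx = derivative(y)        # approximates 3x^2
d2y_dx2 = derivative(dy_dx)  # the derivative of the derivative: approximates 6x

def dy_dx_squared(x):
    # (dy/dx)^2 is a product, approximately 9x^4: a different function altogether
    return dy_dx(x) ** 2

print(d2y_dx2(2.0), dy_dx_squared(2.0))
```

At x = 2 the first value comes out close to 12 and the second close to 144, so reading d²y/dx² as a square gives a very different answer.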

Lagrange's notation: [math]\displaystyle{ f'(x) }[/math]. The advantage of this notation is that it makes clear that the derivative is, in fact, a function. We read it as "f prime". Differentiate again and we have [math]\displaystyle{ f''(x) }[/math] ("f double prime"), and we can continue as many times as we like.

This [math]\displaystyle{ \frac{dy}{dx} \Bigg|_{x \ = \ 1} }[/math] means the same as [math]\displaystyle{ f'(1) }[/math].
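Because f' is again a function, we can keep differentiating. A sketch, assuming polynomials are stored as coefficient lists [a0, a1, a2, ...] (a representation I chose for illustration; `differentiate` and `evaluate` are hypothetical helpers) and applying the power rule exactly. The last line evaluates the first derivative at x = 1.

```python
def differentiate(coeffs):
    """Differentiate a polynomial stored as [a0, a1, a2, ...] (a0 + a1*x + a2*x**2 + ...).

    Power rule: the term a_k * x**k becomes k * a_k * x**(k - 1).
    """
    result = [k * a for k, a in enumerate(coeffs)][1:]
    return result or [0]  # the derivative of a constant is the zero polynomial

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's method."""
    total = 0
    for a in reversed(coeffs):
        total = total * x + a
    return total

f = [1, 0, 0, 2]  # f(x) = 1 + 2x^3 (example, assumed)
derivatives = [f]
for _ in range(4):
    derivatives.append(differentiate(derivatives[-1]))

for n, poly in enumerate(derivatives):
    print("f" + "'" * n, poly)  # f, f', f'', f''', f''''

print("f'(1) =", evaluate(derivatives[1], 1))
```

The lists go from 1 + 2x³ to 6x², 12x, 12 and finally 0, and f'(1) = 6.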

Newton's notation: [math]\displaystyle{ \dot{x} }[/math], [math]\displaystyle{ \ddot{x} }[/math], and so on. Each dot denotes a derivative with respect to time (Newton called these rates of change "fluxions"), which is why the notation is common in mechanics and other physics equations.
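A sketch of the dot notation with sampled data, assuming the positions of a falling object follow x(t) = 4.9t² and are recorded every 0.1 s (values generated from that formula, not measurements). A central difference of the positions approximates ẋ (velocity), and doing it again approximates ẍ (acceleration). The helper `central_difference` is hypothetical.

```python
# Assumed data: positions x(t) = 4.9 t^2 sampled every dt seconds (generated, not measured).
dt = 0.1
times = [i * dt for i in range(6)]
x = [4.9 * t ** 2 for t in times]

def central_difference(values, dt):
    """Approximate the time derivative at each interior sample."""
    return [(values[i + 1] - values[i - 1]) / (2 * dt)
            for i in range(1, len(values) - 1)]

x_dot = central_difference(x, dt)       # velocity at t = 0.1 ... 0.4, approximately 9.8 t
x_ddot = central_difference(x_dot, dt)  # acceleration at t = 0.2 and 0.3, approximately 9.8
print(x_dot)
print(x_ddot)
```

For a quadratic like this one, central differences are exact apart from floating-point rounding, so ẋ comes out as 9.8t and ẍ as 9.8.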

Differential operator: [math]\displaystyle{ D }[/math] is called the differential operator, and [math]\displaystyle{ Df }[/math] is the derivative of [math]\displaystyle{ f }[/math]. The second derivative is [math]\displaystyle{ D^2f }[/math], meaning [math]\displaystyle{ D }[/math] applied twice. The process of calculating a derivative is called differentiation. I don't know functional analysis, but the word "operator" is akin to the arithmetic operations we all know: addition takes numbers and produces another number, while [math]\displaystyle{ D }[/math] takes a function and produces another function.
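Here the operator idea is written as a higher-order function: `D` takes a function and returns a new function. This is a numerical sketch under my own assumptions (a central difference with h = 1e-4; the names `D` and `D_power` are hypothetical), not a symbolic differentiator.

```python
import math
from typing import Callable

Func = Callable[[float], float]

def D(f: Func, h: float = 1e-4) -> Func:
    """Differential operator (numerical sketch): takes f, returns an approximation of f'."""
    def df(x: float) -> float:
        return (f(x + h) - f(x - h)) / (2 * h)
    return df

def D_power(n: int) -> Callable[[Func], Func]:
    """Return the operator D^n, which applies D to a function n times."""
    def operator(f: Func) -> Func:
        for _ in range(n):
            f = D(f)
        return f
    return operator

print(D(math.sin)(1.0))           # close to cos(1)
print(D_power(2)(math.sin)(1.0))  # close to -sin(1)
```

For example, `D(math.sin)(1.0)` is close to cos(1) ≈ 0.5403, and `D_power(2)(math.sin)(1.0)` is close to −sin(1) ≈ −0.8415: the operator turned sin into cos, and applying it twice turned sin into −sin.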