Logic in mathematics

We start by outlining how mathematics is built and introducing the mathematical language; in particular, we point out some conventions that need to be understood. Then we look at proofs.

Language of mathematics

Mathematics is the study of artificial worlds of thought that have one important feature: everything there is black or white, and therefore formal logic can be applied; in particular, the results one gets are totally reliable, since their truthfulness is proved. To see how one gets there we have to start at the beginning. Mathematics studies objects, but not real ones (stones, frogs, stars); rather, it studies ideal objects, or better yet, categories of ideal objects. One such category may be real numbers, another category can be even numbers, another one may be real functions etc.

If we want to have reliable answers, we need these categories to be clear, without any grays. For instance, when we take some number, then it must be clear whether it is an even number or not. In other words, there must be some test that gives definite answers. In this case the test is simple: you try to express the given number as a double of an integer. If it can be done, you've got yourself an even number, otherwise not. The great thing about it is that everybody, anywhere in the world, can apply the same test and arrive at exactly the same answer.
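To make the notion of a test concrete, here is a minimal sketch in Python of the evenness test just described (the function name is_even is our own choice, used only for illustration).

def is_even(n):
    # n is even exactly when it can be written as 2*k for some integer k,
    # which happens exactly when division by 2 leaves no remainder
    return n % 2 == 0

print(is_even(14))   # True
print(is_even(7))    # False

Whoever runs this test on the same number gets the same answer, which is the whole point.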

Hopefully you appreciate by now how crucial such a clear-cut situation is. This is a big difference compared to art (no test whether something is beautiful or not), psychology (no definite test whether somebody is normal or not), philosophy etc. This lack of precision is what makes reliable results impossible in these fields.

Back to mathematics. Obviously before we start investigating something, we have to first make clear what we actually talk about, that is, define precisely what objects we investigate. Thus mathematical texts traditionally start with definitions. A definition is just a piece of text where we introduce some new category (a name for an object or its property) and a test that allows us to decide whether an arbitrary object we pick belongs to this new category or not.

Writing such a definition (or any other mathematical statement) using purely logical symbols would be possible, but also very hard to decipher. Therefore you rarely find those beautiful logical connectives in mathematical books and papers; they are hard on the eyes and mathematicians prefer to see human words when possible. Thus mathematics developed a language of its own which makes logical statements more similar to the way normal humans talk, while preserving precision of thought. Along the way it developed some quirks that one should know about in order to understand things properly.

For example, a definition of an even number may look like this.

Let n be an integer. We say that it is even if there is an integer k such that n = 2⋅k.

The first sentence is a preamble, an introduction that focuses our attention on a specific type of object that is already known (defined). In the second sentence we single out a special kind of integer (the even ones) using a condition that decides whether an integer has this new property or not. Note that the preamble is not really necessary and we may also use a different category there; for instance, we may let n belong to the set of real numbers or even try something more general. Allowing more objects into play does not spoil anything, since numbers other than integers have no chance of passing the test of being even anyway. But it might make things harder for the reader, so we usually use the first sentence to focus our attention on objects where the new notion makes sense.

In general, a definition typically has the following form.

We say that an object x satisfies property P (or we call this object P) if it satisfies the following condition C.

The idea is simple. When you are given an object, you apply the specified test and depending on the outcome, this particular object either is or is not called P. Note that there is a two-way relationship, which means logical equivalence. However, the definition above is written in the form of an implication! This is because of tradition; for some reason mathematicians decided a long time ago to write definitions as implications and it stuck. Although it is not technically correct, all mathematicians and their dogs know that in fact definitions are equivalences, so they feel no need to change this tradition; after all, if something has been happening for over a hundred years in most languages, it is really hard to change. You can think of it as a trade secret: definitions are equivalences although they are not written as such.
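Spelled out in logical symbols, the definition of an even number above therefore really asserts an equivalence:

∀ n∈ℤ:  ( n is even  <=>  ∃ k∈ℤ: n = 2⋅k ).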

This is essentially the only open breach between mathematical language and formal logic. Now we pass on. Having defined some types of objects, we start investigating them and discover some facts about them, that is, some statements that are always true in the world of mathematics; we often call them theorems. Again, we prefer to use a more human language when stating them.

To show this we will consider one typical mathematical statement (a theorem in the form of an implication) and look at various ways in which mathematicians can express it. Since there are no strict rules, the opinion on what is proper varies according to how picky a particular mathematician is. Some are really into formal correctness (in my misspent youth I even went as far as actually writing definitions as equivalences, but I have since come to my senses), some are more interested in how well things read. A typical theorem in a book may look like this.

Theorem.
Let f be a function defined on some open interval I. If f is differentiable on I, then it is also continuous on I.

Again we start with a preamble; it states what objects we will talk about in this theorem. Again, while this form is used perhaps most often, it is not completely correct from a logical point of view as written. Namely, the word "arbitrary" is missing at two crucial places. We want this theorem to work for all functions on all intervals. A better version may therefore be the following one.

Theorem.
Let f be an arbitrary function defined on an arbitrary open interval I. If f is differentiable on I, then it is also continuous on I.

However, very few mathematicians are picky enough to actually write it like this. Thus we have another convention to remember: the word "let" in preambles has "for every" hidden in it, unless specified otherwise (say, a theorem may also start "Let there exist something"; then it is an existential quantifier there). A correct translation of our theorem into logical language would therefore be

Theorem.
For every open interval I and for every function f defined on I the following is true:  If f is differentiable on I, then it is also continuous on I.

Now it is written exactly in the proper logical way, so one can even rewrite it using quantifiers:

Theorem.
∀ a < b ∈ ℝ*   ∀ f a function on I = (a,b):  ( f is differentiable on I  =>  f is continuous on I ).

With a bit of practice one can readily translate statements as written by mathematicians into proper logical expressions with quantifiers. Mostly it is obvious.

We started with the usual mathematical language and moved towards a more formal shape. Sometimes we move in the opposite way, especially when we just talk about mathematics. A very concise formulation is this:

Theorem.
Differentiable functions are also continuous.

This is a borderline case, used mostly informally, and quite a few mathematicians would raise an eyebrow in a significant way if they saw this in print. Above all, this statement does not say what functions are meant, which is - from the formal point of view - quite a serious breach of good manners (and most mathematicians are rather picky about such things). However, an experienced mathematician can readily fill in the missing parts (preamble) and appreciates how this short version captures the essence, which is useful for instance in lectures or when you just want to refer to this well-known fact.

Proofs

One thing that mathematicians are proud of is that every statement is 100% reliable, since everything is proved (well, everything apart from axioms, see below). Given that most statements are implications, we will focus here mostly (but not only) on various ways of proving implications. So consider one particular implication p => q. How does one go about proving it?

Every implication is satisfied when its assumption is false, so this case is irrelevant when deciding whether a particular implication is true or not. The key situation is when the assumption is true. Then we have to ask whether also the conclusion is true. If it is always so, then the implication is satisfied in all cases. If there is a case when the assumption is true but the conclusion is not, then the implication fails. In other words, one has to show that the case "assumption true—conclusion false" can never happen.

This brings us to one interesting point. When facing a statement, there are two things we may attempt: Prove it, or prove that it is false, which we call "disproving" that statement. Thus to disprove an implication, it is enough to show just one instance when its assumption is true but the conclusion is not. A typical example: The implication "If a number is a prime, then it is odd" is false, which we prove by exhibiting a counterexample, namely number 2. On the other hand, the implication "If a number larger than 2 is a prime, then it is odd" is true. One way to prove it would be to go through all primes larger than 2, but there are infinitely many of them and most of us do not have that much time on our hands. Thus one has to apply different methods, methods that can handle infinitely many numbers at the same time. This situation is fairly typical. Most mathematical statements start with a general quantifier, which in most cases means that we are supposed to show that something works in infinitely many cases. However, to disprove a general quantifier, it is enough to find just one counterexample.
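To illustrate this asymmetry, here is a small Python sketch (our own, purely illustrative; the helper is_prime is a naive test made up for this purpose). It disproves the first implication by finding a counterexample, while for the second implication it can only check finitely many cases, which supports the claim but of course does not prove it.

def is_prime(n):
    # naive primality test, good enough for small numbers
    return n > 1 and all(n % d != 0 for d in range(2, n))

# Disproving "if a number is a prime, then it is odd": one counterexample suffices.
print([n for n in range(1, 100) if is_prime(n) and n % 2 == 0])   # [2]

# "If a number larger than 2 is a prime, then it is odd":
# checking primes up to 10000 is evidence, not a proof.
print(all(n % 2 == 1 for n in range(3, 10000) if is_prime(n)))    # True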

This duality is in a sense universal. Some mathematical statements to be proved involve the existential quantifier, for instance, "there is a solution to such and such equation." Then the roles are reversed. To prove such a statement, it is enough to exhibit just one solution that works. On the other hand, to disprove such a statement, one has to prove that no conceivable candidate works, which means showing something about infinitely many objects.

But now let us return to the problem of proving an implication.

Direct proof.

A direct proof works as follows. Since we are only interested in what happens when the assumption is true, we simply assume that it is so and then we use this assumption when building some argument showing that the conclusion is also true. Since the step from the assumption to the conclusion is usually far from simple (easy things are not called theorems), we often try to break it into very small steps that are obvious. That is, instead of trying to somehow justify the leap p => q, one shows lots of small jumps:

p => p1 => p2 => ... => pn => q.

Each of those little implications must be either something so simple that its validity is obvious, or something that was already proved before; one can also expect that the assumption p should be used in justifying some of those little steps.

Many proofs fall into the direct proof box; however, such a direct path is not always possible. Sometimes the proof goes through more twisted paths that may even fork and merge, but as long as we start with p and after a while end up with q, it is a direct proof.

Example: Prove the following statement:

Let x be a real number. If it is positive, then also x⋅(x + 1) is positive.

We can use the definition of a positive number to rewrite this into a logical statement.

For every real number x:  [if x > 0, then also x⋅(x + 1) > 0].

In order to prove this statement, one has to show the validity of that implication for all real numbers; it is definitely not enough to just choose one concrete number (say, 13) and show that the implication works for it. We have to prove that implication for all real numbers, which is best done by choosing an arbitrary real number x and seeing whether the given implication works for it or not. Since we are taking an arbitrary number, we do not know which one it is, so the only things we can do with it are those that are true for all real numbers. That is, we will rely only on its general properties.

For this particular real number x (whichever it is) we need to show that the stated implication is true. We will try the direct proof, that is, we will take the assumption for granted and see what can be deduced from it. Thus we now have a number x about which we know that it is real and also that it is positive. As the first intermediate step we will show that then also x + 1 > 0. We start by taking the given inequality x > 0 and adding 1 to both sides, which is a valid operation for all inequalities with real numbers and therefore a correct step. We get x + 1 > 1. Thus we have the chain of inequalities x + 1 > 1 > 0 and therefore x + 1 > 0 as claimed.

Now we will complete the proof by doing a second step. We know that every inequality can be multiplied by a positive number and still stay true. Thus we can multiply x + 1 > 0 by x (we assume and therefore take as true that it is positive) and obtain x⋅(x + 1) > 0, which completes the proof.
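Although no amount of computation can replace the argument above (the theorem speaks about all real numbers at once), it does no harm to sanity-check the claim numerically. The following Python sketch is only an illustration: it samples random real numbers and verifies that whenever the assumption x > 0 holds, the conclusion holds as well.

import random

random.seed(0)
for _ in range(100000):
    x = random.uniform(-1000, 1000)
    if x > 0:                      # the implication claims something only in this case
        assert x * (x + 1) > 0     # the conclusion must then hold
print("no counterexample found among the sampled values")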

A somewhat more involved, but also more educational, direct proof is here. We suggest that you look at it.

Indirect proof.

An indirect proof works as follows. Instead of proving the desired implication, we prove its contrapositive. This makes sense since we know that an implication and its contrapositive have exactly the same validity, so whatever we prove about the contrapositive (true or false) must also apply to the original implication. Since this contrapositive is itself an implication, we can prove it for instance directly as explained above.

Example: We will prove that every prime larger than 2 is odd. One way to express it formally is this:

∀ n∈{x∈ℕ; x > 2}: [if n is a prime, then n is odd].

This statement starts with a general quantifier again, so we start by taking an arbitrary natural number n that is more than 2. This means that when we start our proof, we will be able to use everything that is known about natural numbers and also the fact that n > 2. We need to prove that for this n the following implication is satisfied:

n is a prime  =>  n is odd.

However, a direct proof is somewhat awkward here. We should start by assuming that n is a prime, but the definition of being a prime is not very convenient (it says that n cannot be written in a certain way, which is a negative statement) and it is not clear how one can go on from there. We therefore decide to prove the contrapositive instead:

n is not odd  =>  n is not a prime.

That is,

n is even  =>  n is not a prime.

Proving this implication seems much easier, since it starts with definite information: n is even. Thus we now assume that we have a natural number n greater than 2 and moreover, it is even. This means that there is an integer k such that n = 2k. Now everything depends on k: what do we know about it? Since n > 2, necessarily k > 1. We managed to decompose n as a product of two natural numbers different from 1, which means that n is not a prime and the contrapositive is proved. Consequently, the original implication is also true.
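The contrapositive can also be checked mechanically on small cases. The following Python sketch (illustrative only; is_prime is the same naive test as before, repeated here so the snippet is self-contained) verifies for a range of even numbers greater than 2 that each of them indeed fails to be a prime.

def is_prime(n):
    # naive primality test for small natural numbers
    return n > 1 and all(n % d != 0 for d in range(2, n))

for n in range(4, 1000, 2):        # even natural numbers greater than 2
    k = n // 2                     # the integer k with n = 2*k
    assert k > 1                   # because n > 2
    assert not is_prime(n)         # n = 2*k is a nontrivial factorization
print("every even number between 4 and 998 is composite")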

This example shows why it may be useful to pass to a contrapositive: the properties used in the original implication may become much nicer when negated. A typical example would be properties that involve the relation "not equal to"; the negation changes it into equality, a much better relation from a practical point of view. This is the case with the definition of a 1-1 function. Another good reason for trying an indirect proof may be that negation changes quantifiers from general to existential and vice versa, which may also sometimes help. Finally, sometimes we simply prefer to start working at the "other end" of an implication.

Remark: The statement about primes can be formally expressed in other ways, for instance like this:

∀ n∈ℕ: [if (n is a prime and n > 2), then n is odd].

This is an equally good way to express our fact, but when passing to the contrapositive we would have to do a negation of the conjunction on the left, which seems a bit more work than what we had in our proof above. This is in a sense typical. Often there are more ways to express formally a certain idea, and depending which way we choose, the proof may range from relatively simple to quite long.

Proof by contradiction.

Consider some statement r (not necessarily an implication). One possibility to prove its validity is to proceed by contradiction. We assume that r fails and then show that this is somehow in conflict with known facts. This then shows that r must be true. It is worthwhile to look at this reasoning more closely. Formally, a proof by contradiction means that we will prove the validity of the following implication:

¬ r => F.

What does it give us? The conclusion of this implication is a false statement, and the only way such an implication can be true is if the assumption is also false. But ¬ r being false means that r is true, which is exactly what we wanted.

Proof by contradiction is useful when proving that something cannot happen. This is "negative" information, which is generally hard to prove. A proof by contradiction starts with the negation, that is, with the assumption that something can happen, which gives us something definite to start from.

Now we apply this to the case of an implication. We need to negate it: a proof by contradiction of an implication p => q means that we should prove the validity of

(p and non q) =>  F.

Thus we assume that p is true while q is not and then show that these two assumptions together lead to some nonsense.
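That this really does the job can be confirmed by a little truth table. The following Python sketch (our own illustration) checks that the implication (p and not q) => F has exactly the same truth value as p => q in every row, so proving the former indeed establishes the latter.

def implies(a, b):
    # truth value of the implication a => b
    return (not a) or b

F = False                          # some false statement

for p in (True, False):
    for q in (True, False):
        assert implies(p and not q, F) == implies(p, q)
print("the two implications agree in all four cases")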

Example: We will try to prove the above statement about primes again, this time by contradiction. So take an arbitrary natural number n larger than 2. We need to show that if it is a prime, then it is odd.

Since we want to go by contradiction, we will assume that this number is a prime but it is not odd. But then this number must be even so we can write it as n = 2k for some natural number k. Since we assume that n is a prime, we must have k = 1 and therefore n = 2. But we also assume that n > 2, which brings us to the conclusion that 2 > 2, definitely a false statement. Thus the proof by contradiction is complete.

In practice we rarely go all the way to some explicit false statement; we usually stop a bit earlier, the moment we arrive at a conflicting situation. In the above proof we would go like this: "...and therefore n = 2, but that contradicts our assumption n > 2. The proof is complete." We will show another proof by contradiction to make the basic idea clearer.

Example: We will prove that there is no smallest positive rational number; in particular, this means that positive rational numbers cannot be listed one by one in increasing order, since there is no number to start such a list with. We will express it precisely.

The set of positive rational numbers does not have its smallest element (minimum).

Negative statements are often close to impossible to prove directly, so negation, that is, a proof by contradiction, is an obvious start. So assume that the statement above is false, that is, assume that the set of positive rational numbers has a least element, a minimum (see Topology of real numbers in Functions - Theory - Real functions); call it r. Since r is a rational number, the number r/2 must also be a rational number, and it is positive as well. Thus r/2 is an element of the set of positive rational numbers and r/2 < r, which contradicts the assumption that r is the minimum of that set. The proof is complete.

Note that we did not really arrive at a statement that is false, but stopped at the first conflicting situation, just as we advertised above. If somebody really wanted to see this done all the way, here it comes: Since r/2 is a positive rational number and r is the minimal element of this set, we must have r/2 ≥ r. Dividing by r (a positive number, so dividing by it does not change the inequality) and rearranging a bit, we arrive at the inequality 1 ≥ 2, which is the desired nonsense.
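The key step of the argument, namely that for every candidate minimum r the number r/2 is a strictly smaller positive rational, can be illustrated with exact rational arithmetic in Python. This is only an illustration of that step on a few sample values, not the proof itself.

from fractions import Fraction

for r in [Fraction(1, 3), Fraction(7, 2), Fraction(1, 1000000)]:
    half = r / 2                   # still a rational number
    assert 0 < half < r            # positive, yet smaller than r
    print(r, "is beaten by", half)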

We offer another proof here. It is a direct proof, but somewhat more difficult, so it may be a challenge for you to go through it and try to understand it.

This concludes the section on proofs.

Appendix: Axioms and things one can rely on.

You might have noticed that in proofs we rely on things that are already known, but these had to be proved as well, and those proofs had to use something reliable, and so on; where does it all end? Mathematical theories are like trees: from known statements and properties we branch up to more and more statements, each proved using things that are lower in that tree. But what are the roots, what is at the beginning? Actually, nothing. There are no basic facts in mathematics that are somehow true by themselves.

This may seem unsettling, so let's look at it a little closer. Mathematicians spent a lot of time tracking down proofs and came up with a relatively short list of things that are needed in order to build all the rest. It was decided to take those key things for granted; these basic facts are called axioms. They define the basic rules and all the rest of mathematics stems from them. Thus when a mathematician says that something is true, it actually means that it is true assuming that all the standard axioms are accepted as true. And where do those axioms come from? Since we want math to be useful, axioms are chosen in such a way that they agree with the way the world appears to us (something like "1 + 1 = 2"). So far it seems that the standard axioms were chosen well, since the results we obtained based on them work.

However, this does not mean that mathematicians actually need to believe in these axioms. As a matter of fact, many mathematicians like to play, and just for fun they try to see what would happen if they took a different set of axioms. They obtain alternative mathematical theories, describing worlds where things work differently from ours; even a small change in basic axioms may have far-reaching consequences. Some of these worlds are fascinating and deserve further attention, some are funny and some weird. Probably the most amazing thing is that some of these different worlds turn out to be useful. For instance, in the 19th century mathematicians tried to see how things would work in a world where geometry (lines, angles, parallels etc.) is almost but not quite the same as the one we are used to (they changed one axiom). To everyone's surprise, these results were exactly what Einstein later needed to describe the universe. Similarly, mathematicians were curious about what would happen with spaces if the dimension was increased to infinity. Such infinite-dimensional spaces then became a key tool for physicists when they got to playing with atoms (quantum theory).

To sum it up, mathematics can also be viewed as follows: we create a world by specifying its objects and some basic rules governing them (definitions), and mathematics then deduces reliable information about such a world. Investigating worlds whose basic rules are not exactly the same as those of our own world is hard. We cannot use our experience with how things work, we cannot make experiments, we cannot use our senses. There is a constant danger: the temptation to automatically use something that we consider obvious because we know it from our world. The only defence against such a mistake is logic; it makes sure that whatever we claim about a certain world follows only from its basic rules. Its role as a guardian angel of correctness is crucial also when investigating our world using the standard set of axioms, where it protects us from mistaken assumptions and other errors of reasoning. This concludes this remark and this section.