Thomas A. Alspaugh

In ordinary usage

Something is said to be formal if it follows rules, conventions, or standard forms.

In logic and computer science

Here formal means something more specific. The things that are formal are languages. A formal language can be textual, graphical, sonic, … you name it. A formal language has:


At a minimum, it has a formal syntax, meaning rules by which anyone can determine whether a candidate utterance in the language is of a correct form. Preferably these rules can be interpreted by a program, so that syntactic membership in the language can be determined automatically.

The usual (and most useful) kind of syntax is a recursive one, in which utterances can be made not only as long as you want, by adding more pieces onto the end, but as deep as you want, by recursively taking a part of the utterance and replacing a piece at its core with a deeper piece.


It is desirable that a formal language also have formal semantics, which define the meaning of some or all syntactically-correct utterances. The semantics of the language can rule out some syntactically-correct utterances, as being ones that have no meaning (because the semantics defines no meaning for them).

As a practical matter, meanings will be recursive just as the syntax of utterances is, and the recursive structure of the meaning of an utterance will map onto the recursive structure of the syntax of the utterance.

Inference System

Finally, if a formal language has formal syntax and formal semantics, it is also desirable that it have a formal inference system as well, that let one (or a program, ideally) draw conclusions about semantically-meaningful utterances.


Even if a language only has #1, a formal syntax, this could be a step forward. It would be possible to focus only on the utterances in the language, with confidence that none were being missed. And a program could be written to determine membership—given a possible utterance, is it in the language or not? The program could even produce a derivation for utterances in the language.

Adding #2, the formal semantics, could be a further step forward. It would conclusively resolve disagreements about what specific utterances met, and if people are using the language to communicate, #2 would make it possible to eliminate ambiguity in the communication. If there are syntactically correct utterances that have no meaning, #2 would unambiguously identify them. And a program could be written to evaluate the meaning of each utterance in the language that has one, and flag those that don't.

Adding #3, the inference rules, could be yet another step forward. With inference rules, one could reason about utterances in general rather than only about specific ones. A program could be written to draw specific kinds of inferences supported by the rules.

But …

A means, not an end

Formality is a means for achieving other goals, not the goal itself. The goals are ideally:

  1. To abstract away what is unimportant or less important, and focus on what is important.
  2. To express the essence of the problem the language addresses.
  3. To make solving the problem as simple as it can be (but no simpler).
  4. To replace learning a lot of details with learning a few powerful principles. This doesn't necessarily mean that formality will make a language easier to learn (although it can), but it does mean that formality can make each of the things that have to be learned be a lot more worthwhile.

Formal from the beginning

#2 and #3, if present, will have necessarily influenced (usually strongly) the language given formal syntax in #1. In general it is not possible to take an informal language, or an informal language that has been neatened up for #1, and add #2 much less #3 satisfactorily.

A language that is planned from the beginning to have formal semantics and possibly inference rules will be quite different from a language that is planned without them to achieve informal goals.



The language of arithmetic—fully parenthesized—with + − × ÷ ( ) is formal, and here's why:

  1. It has a formal syntax, given by this grammar for arithmetic expressions E, with arithmetic expressions E1 and E2 and rational numbers R:
    E R A rational number is an expression
    (E1) An expression in parentheses is an expression
    +(E) A positive expression is an expression
    −(E) A negative expression is an expression
    (E1)+(E2) The sum of two expressions is an expression
    (E1)−(E2) The difference of two expressions is an expression
    (E1)×(E2) The product of two expressions is an expression
    (E1)÷(E2) The ratio of two expressions is an expression

    (When we are writing arithmetic expressions ourselves, we often leave out superfluous parentheses, but in this fully-parenthesized language they are all present.)

    These production rules let anyone, and even (and especially) a program, determine whether strings made out of digits, dots, parentheses, and + − × ÷ are or are not syntactically correct arithmetic expressions. For example,

    • −1
    • 2+3
    • (5)

    are syntactically correct, because we can construct a derivation for each one using the rules of the grammar, and

    • −×
    • 2.3.4
    • ((6+

    are not, because we can't for them.

  2. It has a formal semantics, which assigns a rational number value V  to each syntactically correct expression. (For this formal language, there are no syntactically correct expressions that are not also semantically correct.)

    We will use the notation V⟦E⟧ for the value of expression E. Because we don't like to do unnecessary work, we'll reuse the form of the syntax rules for the semantic rules.

    V⟦R⟧ R A rational number is its own value
    V⟦(E)⟧ V⟦E⟧ Enclosing in parentheses doesn't affect the value
    V⟦+(E)⟧ V⟦E⟧ Prepending + to a parenthesized subexpression doesn't affect the value
    V⟦−(E)⟧ V⟦E⟧ The value of a negated expression is the negative of the value of the subexpression
    V⟦(E1)+(E2)⟧ V⟦E1⟧ +V⟦E2 The value of a sum is the sum of the values of the subexpressions
    V⟦(E1)−(E2)⟧ V⟦E1⟧ −V⟦E2 The value of a difference is the difference of the values of the subexpressions
    V⟦(E1)×(E2)⟧ V⟦E1⟧ ×V⟦E2 The value of a product is the product of the values of the subexpressions
    V⟦(E1)÷(E2)⟧ V⟦E1⟧ ÷V⟦E2 The value of a ratio is the ratio of the values of the subexpressions
  3. It doesn't have to have inference rules, since for arithmetic we are primarily concerned with values, but it could; for example, you could probably construct rules to identify expressions whose parentheses could be simplified, and rules to do the simplification.

    For a slightly more complex language, in which variables could also be expressions, inference rules would be extremely useful. You might, for example, be able to infer whether an expression produced only integral values; or only positive values; and so forth.

Of course, the language of arithmetic wasn't initially designed to be this formal (think of wedges on clay tablets, and Roman numerals). It has evolved over the millennia, and in recent centuries the evolution has been towards greater formality and towards formalism that expresses the essence of each problem being addressed. It's interesting to examine the steps in this evolution from the point of view of what new goals were to be supported.

flip bgunflip