Something is said to be formal if it follows rules, conventions, or standard forms.
Here formal
means something more specific.
The things that are formal are languages.
A formal language can be textual,
graphical,
sonic,
…
you name it.
A formal language has:
At a minimum, it has a formal syntax, meaning rules by which anyone can determine whether a candidate utterance in the language is of a correct form. Preferably these rules can be interpreted by a program, so that syntactic membership in the language can be determined automatically.
The usual (and most useful) kind of syntax is a recursive one, in which utterances can be made not only as long as you want, by adding more pieces onto the end, but as deep as you want, by recursively taking a part of the utterance and replacing a piece at its core with a deeper piece.
It is desirable that a formal language also have formal semantics, which define the meaning of some or all syntactically-correct utterances. The semantics of the language can rule out some syntactically-correct utterances, as being ones that have no meaning (because the semantics defines no meaning for them).
As a practical matter, meanings will be recursive just as the syntax of utterances is, and the recursive structure of the meaning of an utterance will map onto the recursive structure of the syntax of the utterance.
Finally, if a formal language has formal syntax and formal semantics, it is also desirable that it have a formal inference system as well, that let one (or a program, ideally) draw conclusions about semantically-meaningful utterances.
Even if a language only has #1, a formal syntax, this could be a step forward. It would be possible to focus only on the utterances in the language, with confidence that none were being missed. And a program could be written to determine membership—given a possible utterance, is it in the language or not? The program could even produce a derivation for utterances in the language.
Adding #2, the formal semantics, could be a further step forward. It would conclusively resolve disagreements about what specific utterances met, and if people are using the language to communicate, #2 would make it possible to eliminate ambiguity in the communication. If there are syntactically correct utterances that have no meaning, #2 would unambiguously identify them. And a program could be written to evaluate the meaning of each utterance in the language that has one, and flag those that don't.
Adding #3, the inference rules, could be yet another step forward. With inference rules, one could reason about utterances in general rather than only about specific ones. A program could be written to draw specific kinds of inferences supported by the rules.
Formality is a means for achieving other goals, not the goal itself. The goals are ideally:
#2 and #3,
if present,
will have necessarily influenced (usually strongly)
the language given formal syntax in #1.
In general it is not possible to take an informal language,
or an informal language that has been neatened up
for #1,
and add #2 much less #3
satisfactorily.
A language that is planned from the beginning to have formal semantics and possibly inference rules will be quite different from a language that is planned without them to achieve informal goals.
The language of arithmetic—fully parenthesized—with + − × ÷ ( ) is formal, and here's why:
E | R | A rational number is an expression | |
(E1) | An expression in parentheses is an expression | ||
+(E) | A positive expression is an expression | ||
−(E) | A negative expression is an expression | ||
(E1)+(E2) | The sum of two expressions is an expression | ||
(E1)−(E2) | The difference of two expressions is an expression | ||
(E1)×(E2) | The product of two expressions is an expression | ||
(E1)÷(E2) | The ratio of two expressions is an expression |
(When we are writing arithmetic expressions ourselves, we often leave out superfluous parentheses, but in this fully-parenthesized language they are all present.)
These production rules let anyone, and even (and especially) a program, determine whether strings made out of digits, dots, parentheses, and + − × ÷ are or are not syntactically correct arithmetic expressions. For example,
are syntactically correct, because we can construct a derivation for each one using the rules of the grammar, and
are not, because we can't for them.
We will use the notation V⟦E⟧ for the value of expression E. Because we don't like to do unnecessary work, we'll reuse the form of the syntax rules for the semantic rules.
V⟦R⟧ | R | A rational number is its own value | |
V⟦(E)⟧ | V⟦E⟧ | Enclosing in parentheses doesn't affect the value | |
V⟦+(E)⟧ | V⟦E⟧ | Prepending + to a parenthesized subexpression doesn't affect the value | |
V⟦−(E)⟧ | −V⟦E⟧ | The value of a negated expression is the negative of the value of the subexpression | |
V⟦(E1)+(E2)⟧ | V⟦E1⟧ +V⟦E2⟧ | The value of a sum is the sum of the values of the subexpressions | |
V⟦(E1)−(E2)⟧ | V⟦E1⟧ −V⟦E2⟧ | The value of a difference is the difference of the values of the subexpressions | |
V⟦(E1)×(E2)⟧ | V⟦E1⟧ ×V⟦E2⟧ | The value of a product is the product of the values of the subexpressions | |
V⟦(E1)÷(E2)⟧ | V⟦E1⟧ ÷V⟦E2⟧ | The value of a ratio is the ratio of the values of the subexpressions |
For a slightly more complex language, in which variables could also be expressions, inference rules would be extremely useful. You might, for example, be able to infer whether an expression produced only integral values; or only positive values; and so forth.
Of course, the language of arithmetic wasn't initially designed to be this formal (think of wedges on clay tablets, and Roman numerals). It has evolved over the millennia, and in recent centuries the evolution has been towards greater formality and towards formalism that expresses the essence of each problem being addressed. It's interesting to examine the steps in this evolution from the point of view of what new goals were to be supported.