Character set and tokens
The words (tokens) and statements of any language are formed from its basic character set. For example, the words of the English language are formed from its alphabet of 26 letters, written in uppercase (A, B, ..., Z) or lowercase (a, b, ..., z). The characters used in C are divided into four categories:
- Letters (alphabets)
- Digits
- Special characters
- White spaces
The letters are the uppercase (A, B, ..., Z) and lowercase (a, b, ..., z) letters of the English alphabet. The digits are 0, 1, 2, ..., 9. C also uses special characters such as ; (semicolon), ' (single quote), " (double quote), +, -, *, /, %, > and = for various purposes. White space characters separate words or tokens; they are the blank (space), tab and newline.
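The four categories can be illustrated with a short C sketch. The names char_class and classify_char are hypothetical, chosen only for this example, and the function assumes the default "C" locale. As a simplification, it treats every character that is not a letter, a digit, or one of the three white spaces as a special character.

```c
#include <ctype.h>

/* Hypothetical names for the four categories described above. */
enum char_class { CC_LETTER, CC_DIGIT, CC_WHITESPACE, CC_SPECIAL };

/* Classify one character of C source text. */
enum char_class classify_char(char c)
{
    /* The <ctype.h> functions expect a non-negative value. */
    unsigned char uc = (unsigned char)c;

    if (isalpha(uc))
        return CC_LETTER;       /* A..Z or a..z */
    if (isdigit(uc))
        return CC_DIGIT;        /* 0..9 */
    if (c == ' ' || c == '\t' || c == '\n')
        return CC_WHITESPACE;   /* blank, tab or newline */
    return CC_SPECIAL;          /* simplification: everything else */
}
```

For instance, classify_char('A') gives CC_LETTER and classify_char(';') gives CC_SPECIAL.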
Characters combine to form a symbol or word known as a token. Examples of tokens are the keywords that name data types, such as int and float. Operators such as +, -, *, /, % and >, and punctuation marks such as ; (semicolon) and the braces { and }, are also tokens, as are constants used in programs, such as 12 and 12.65.
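To make the idea of a token concrete, the following is a minimal sketch of a token counter, not how a real C compiler works. The function name count_tokens is hypothetical. It assumes only names (keywords and identifiers), plain numeric constants, and single-character operators or punctuation marks appear in the input, so it does not handle string literals, comments, or multi-character operators such as >=.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the tokens in a line of simplified C source (hypothetical helper). */
size_t count_tokens(const char *src)
{
    size_t count = 0;
    const char *p = src;

    while (*p != '\0') {
        unsigned char c = (unsigned char)*p;

        if (c == ' ' || c == '\t' || c == '\n') {
            p++;        /* white space only separates tokens */
        } else if (isalpha(c) || c == '_') {
            /* keyword or identifier: letters, digits, underscores */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            count++;
        } else if (isdigit(c)) {
            /* numeric constant such as 12 or 12.65 */
            while (isdigit((unsigned char)*p) || *p == '.')
                p++;
            count++;
        } else {
            p++;        /* one operator or punctuation mark */
            count++;
        }
    }
    return count;
}
```

For the statement float rate = 12.65; the function counts five tokens: float, rate, =, 12.65 and the semicolon.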