Explain the different phases of a compiler with the example a+b+c*d/e-f
Answers
Analysis part
• The analysis part breaks the source program into constituent pieces and imposes a grammatical structure on them. It then uses this structure to create an intermediate representation of the source program.
• It is also termed the front end of the compiler.
• Information about the source program is collected and stored in a data structure called the symbol table.
Synthesis part
• The synthesis part takes the intermediate representation as input and transforms it into the target program.
• It is also termed the back end of the compiler.
The design of a compiler can be decomposed into several phases, each of which converts one representation of the source program into another.
The different phases of a compiler are as follows:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Code optimization
6. Code generation
All of the aforementioned phases involve the following tasks:
• Symbol table management.
• Error handling.
Lexical Analysis
• Lexical analysis is the first phase of the compiler and is also termed scanning.
• The source program is read as a stream of characters, and those characters are grouped into meaningful sequences called lexemes. For each lexeme, the lexical analyzer produces a token as output.
• Token: A token is a pair consisting of a token name and an optional attribute value. The token name represents a kind of lexical unit, such as a keyword, operator or identifier.
• Lexeme: A lexeme is an instance of a token, i.e., the sequence of characters in the source program that matches the pattern for that token.
• Pattern: A pattern describes the form that the lexemes of a token may take. It is the structure that strings must match.
• Once a token is generated for an identifier, the corresponding entry is made in the symbol table.
Input: stream of characters
Output: Tokens
Token template: <token-name, attribute-value>
(e.g.) c=a+b*5;
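The grouping of characters into lexemes and tokens described above can be sketched as follows. This is only a minimal illustration: the token names (id, num, op, punct), the patterns and the tokenize function are assumptions made for this example, not part of any particular compiler. It is applied to the expression from the question, a+b+c*d/e-f.

```python
import re

# Hypothetical token names and patterns, chosen only for this illustration.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[+\-*/=]"),
    ("punct", r"[;()]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    tokens, symbol_table = [], []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue  # whitespace produces no token
        if kind == "id":
            # An identifier gets a symbol-table entry; the token's
            # attribute value is the 1-based position of that entry.
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "num":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table


tokens, table = tokenize("a+b+c*d/e-f")
print(tokens)
print(table)  # prints: ['a', 'b', 'c', 'd', 'e', 'f']
```

With this sketch, the lexeme a becomes the token <id, 1>, + becomes <op, +>, b becomes <id, 2>, and so on up to f, which becomes <id, 6>.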
Syntax Analysis
• Syntax analysis is the second phase of the compiler and is also called parsing.
• The parser converts the tokens produced by the lexical analyzer into a tree-like representation called a parse tree.
• A parse tree describes the syntactic structure of the input.
(Figure: parse tree)
• A syntax tree is a condensed representation of the parse tree in which the operators appear as interior nodes and the operands of an operator are the children of the node for that operator.
Input: Tokens
Output: Syntax tree
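As an illustration of how tokens become a syntax tree, here is a minimal recursive-descent sketch for the expression a+b+c*d/e-f. The grammar it assumes (expr, term, factor, with * and / binding tighter than + and -, all left-associative) and the nested-tuple tree format are choices made for this example only.

```python
import re


def parse(source):
    """Build a syntax tree for an arithmetic expression as nested tuples.

    A leaf is an identifier string; an interior node is (operator, left, right).
    Assumed grammar:
        expr   -> term   (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> identifier | '(' expr ')'
    """
    # Simplified scanning: characters outside these patterns are ignored.
    tokens = re.findall(r"[A-Za-z_]\w*|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())  # left-associative
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())  # left-associative
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("a+b+c*d/e-f"))
# prints: ('-', ('+', ('+', 'a', 'b'), ('/', ('*', 'c', 'd'), 'e')), 'f')
```

The root of the tree is the last operator applied (-), and c*d/e forms its own subtree because * and / have higher precedence than + and -.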
Semantic Analysis
• Semantic analysis is the third phase of the compiler.
• It checks the source program for semantic consistency with the language definition.
• Type information is gathered and stored in the symbol table or in the syntax tree.
• It performs type checking.
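Type checking over a syntax tree can be sketched as below. The type rules used here (int and float may be mixed and give float, anything else is an error) and the type names are assumptions made for illustration; a real language defines its own rules. The tree is the one for a+b+c*d/e-f, written as nested (operator, left, right) tuples.

```python
def check(node, symbols):
    """Return the type of an expression tree, or raise TypeError.

    node is an identifier string or a tuple (operator, left, right);
    symbols is a hypothetical symbol table mapping each declared
    identifier to "int", "float" or "string".
    """
    if isinstance(node, str):
        if node not in symbols:
            raise TypeError(f"identifier {node!r} used before declaration")
        return symbols[node]
    op, left, right = node
    lt, rt = check(left, symbols), check(right, symbols)
    numeric = ("int", "float")
    if lt in numeric and rt in numeric:
        # Assumed rule: mixing int and float yields float.
        return "float" if "float" in (lt, rt) else "int"
    raise TypeError(f"operator {op!r} cannot combine {lt} and {rt}")


tree = ("-", ("+", ("+", "a", "b"), ("/", ("*", "c", "d"), "e")), "f")
symbols = {"a": "int", "b": "int", "c": "int", "d": "int", "e": "float", "f": "int"}
print(check(tree, symbols))  # prints: float
```

If f were declared as "string", or not declared at all, check would raise a TypeError instead of returning a type.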
Answer:
Explanation:
Lexical Analysis:
The lexical analyzer (LA), or scanner, reads the source program one character at a time and separates
it into a sequence of atomic units called tokens. The usual tokens are keywords such as WHILE, FOR, DO or
IF, identifiers such as X or NUM, operator symbols such as <, <=, +, >, >= and punctuation symbols such
as parentheses or commas. The output of the lexical analyzer is a stream of tokens, which is passed to
the next phase.
Syntax Analysis:
The second phase is called syntax analysis, or parsing. In this phase expressions, statements, declarations
etc. are identified by using the results of lexical analysis. It takes the tokens produced by lexical
analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements are
checked against the grammar of the source language, i.e., the parser checks whether the expression formed
by the tokens is syntactically correct.
Semantic Analysis:
Semantic analysis checks whether the parse tree constructed follows the rules of the language. For
example, it checks that values are assigned between compatible data types and reports errors such as
adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types and
expressions, and checks whether identifiers are declared before use. It produces an annotated syntax
tree as output.
Intermediate Code Generation:
After semantic analysis, the compiler generates an intermediate representation of the source program.
This intermediate code is a program for some abstract machine and lies between the high-level language
and the machine language. It should be generated in such a way that it is easy to translate into the
target machine code. This phase bridges the analysis and synthesis
phases of translation.
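One common form of intermediate code is three-address code, in which each instruction has at most one operator on the right-hand side. The sketch below assumes that form, and assumes the syntax tree for a+b+c*d/e-f is given as nested (operator, left, right) tuples; the temporary names t1, t2, ... are generated only for this illustration.

```python
def three_address_code(tree):
    """Translate a syntax tree of nested (op, left, right) tuples into
    a list of three-address instructions such as 't1 = a + b'."""
    code = []

    def emit(node):
        if isinstance(node, str):
            return node  # an identifier is its own address
        op, left, right = node
        left_addr = emit(left)
        right_addr = emit(right)
        temp = f"t{len(code) + 1}"  # one new temporary per instruction
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    emit(tree)
    return code


tree = ("-", ("+", ("+", "a", "b"), ("/", ("*", "c", "d"), "e")), "f")
for line in three_address_code(tree):
    print(line)
# prints:
# t1 = a + b
# t2 = c * d
# t3 = t2 / e
# t4 = t1 + t3
# t5 = t4 - f
```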
Code Generation:
The last phase of translation is code generation. A number of optimizations to reduce the length of
the machine language program are carried out during this phase. The output of the code generator is the
machine language program for the specified computer.
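To make the final step concrete, the sketch below turns three-address instructions for a+b+c*d/e-f into code for a hypothetical register machine. The instruction names (LD, ST, ADD, SUB, MUL, DIV) and the registers R0 and R1 are assumptions for illustration, not a real instruction set, and the translation is deliberately naive: it performs no optimization.

```python
# Hypothetical opcodes for an imaginary target machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(tac):
    """Translate instructions of the form 'dest = x op y' into target code."""
    asm = []
    for line in tac:
        dest, _, x, op, y = line.split()
        asm += [
            f"LD R0, {x}",               # load first operand
            f"LD R1, {y}",               # load second operand
            f"{OPCODES[op]} R0, R0, R1",  # R0 = R0 op R1
            f"ST {dest}, R0",            # store the result
        ]
    return asm


tac = ["t1 = a + b", "t2 = c * d", "t3 = t2 / e", "t4 = t1 + t3", "t5 = t4 - f"]
for instruction in generate(tac):
    print(instruction)
```

A real code generator would keep intermediate results in registers rather than storing and reloading every temporary, which is one way the length of the machine language program is reduced.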