Building a Parser: Blocks, Nested Scopes, and Different AST Formats
In our previous post, we implemented statement lists. Now, let’s add support for block statements and nested scopes.
Block Statements

Block statements group multiple statements together and create new scopes. They’re fundamental for:
- Function bodies
- If statement branches
- Loop bodies
- Local scope creation
Updated Grammar
Let’s extend our grammar to support blocks:
Statement
: ExpressionStatement
| BlockStatement
| EmptyStatement
;
BlockStatement
: "{" OptStatementList "}"
;
OptStatementList
: StatementList
| ε
;
EmptyStatement
: ";"
;
Implementation
Here’s how we implement block statements:
class Parser {
/**
* Statement
* : ExpressionStatement
* | BlockStatement
* | EmptyStatement
* ;
*/
Statement() {
switch (this._lookahead.type) {
case '{':
return this.BlockStatement();
case ';':
return this.EmptyStatement();
default:
return this.ExpressionStatement();
}
}
/**
* BlockStatement
* : '{' OptStatementList '}'
* ;
*/
BlockStatement() {
this._eat('{');
const body = this._lookahead.type !== '}'
? this.StatementList('}')
: [];
this._eat('}');
return {
type: 'BlockStatement',
body,
};
}
/**
* EmptyStatement
* : ';'
* ;
*/
EmptyStatement() {
this._eat(';');
return {
type: 'EmptyStatement',
};
}
/**
* StatementList
* : Statement
* | Statement StatementList
* ;
*/
StatementList(stopLookahead = null) {
const statements = [this.Statement()];
while (this._lookahead !== null &&
this._lookahead.type !== stopLookahead) {
statements.push(this.Statement());
}
return statements;
}
}
Testing Block Statements
Let’s write comprehensive tests for our block implementation:
describe('Parser: Block Statements', () => {
let parser;
beforeEach(() => {
parser = new Parser();
});
describe('Basic Blocks', () => {
test('parses empty block', () => {
const ast = parser.parse('{}');
expect(ast).toEqual({
type: 'Program',
body: [{
type: 'BlockStatement',
body: []
}]
});
});
test('parses block with statements', () => {
const ast = parser.parse(`{
42;
"hello";
}`);
expect(ast).toEqual({
type: 'Program',
body: [{
type: 'BlockStatement',
body: [
{
type: 'ExpressionStatement',
expression: { type: 'NumericLiteral', value: 42 }
},
{
type: 'ExpressionStatement',
expression: { type: 'StringLiteral', value: 'hello' }
}
]
}]
});
});
});
describe('Nested Blocks', () => {
test('parses nested blocks', () => {
const ast = parser.parse(`{
42;
{
"hello";
}
}`);
expect(ast).toEqual({
type: 'Program',
body: [{
type: 'BlockStatement',
body: [
{
type: 'ExpressionStatement',
expression: { type: 'NumericLiteral', value: 42 }
},
{
type: 'BlockStatement',
body: [
{
type: 'ExpressionStatement',
expression: { type: 'StringLiteral', value: 'hello' }
}
]
}
]
}]
});
});
});
describe('Empty Statements', () => {
test('parses empty statement', () => {
const ast = parser.parse(';');
expect(ast).toEqual({
type: 'Program',
body: [{
type: 'EmptyStatement'
}]
});
});
test('parses multiple empty statements', () => {
const ast = parser.parse(';;;');
expect(ast).toEqual({
type: 'Program',
body: [
{ type: 'EmptyStatement' },
{ type: 'EmptyStatement' },
{ type: 'EmptyStatement' }
]
});
});
});
});
Key Implementation Details
-
Predictive Parsing: We use the lookahead token to determine which type of statement to parse:
{indicates a block statement;indicates an empty statement- Otherwise, we assume an expression statement
-
Optional Statement Lists: Block statements can be empty or contain multiple statements. We handle this by checking if the next token is
}before parsing the statement list. -
Nested Scopes: Block statements can naturally contain other block statements, creating nested scopes. This falls out naturally from our recursive descent parser design.
Common Edge Cases
Watch out for these scenarios:
// Empty blocks
{}
// Nested empty blocks
{ {} }
// Multiple empty statements
;;;
// Mixed blocks and empty statements
{
42;
;
{
"hello";
}
;
}
AST Format Flexibility
While we’ve been using a specific AST format throughout our implementation, it’s worth exploring how we can make our parser more flexible by supporting different AST formats.
Current Approach vs. Factory Pattern
Currently, we create AST nodes directly in our parser methods:
BlockStatement() {
this._eat('{');
const body = this._lookahead.type !== '}'
? this.StatementList('}')
: [];
this._eat('}');
return {
type: 'BlockStatement',
body,
};
}
A more maintainable approach is to use the Factory pattern:
// AST Node Factories
const DefaultFactory = {
Program(body) {
return {
type: 'Program',
body,
};
},
BlockStatement(body) {
return {
type: 'BlockStatement',
body,
};
},
EmptyStatement() {
return {
type: 'EmptyStatement',
};
}
};
// S-expression Factory for LISP-like format
const SExpressionFactory = {
Program(body) {
return ['begin', ...body];
},
BlockStatement(body) {
return ['begin', ...body];
},
EmptyStatement() {
return [];
}
};
class Parser {
constructor(astFormat = 'default') {
this._factory = astFormat === 'default' ? DefaultFactory : SExpressionFactory;
}
BlockStatement() {
this._eat('{');
const body = this._lookahead.type !== '}'
? this.StatementList('}')
: [];
this._eat('}');
return this._factory.BlockStatement(body);
}
}
Benefits of Factory Pattern
- Format Flexibility: Easy to switch between different AST formats
- Separation of Concerns: Parsing logic is decoupled from AST structure
- Maintainability: AST format changes only require updating the factory
- Multiple Outputs: Support for various use cases (e.g., different tools or languages)
Example Outputs
Default AST format:
{
type: 'Program',
body: [{
type: 'BlockStatement',
body: [{
type: 'ExpressionStatement',
expression: { type: 'NumericLiteral', value: 42 }
}]
}]
}
S-expression format:
['begin',
['begin',
42
]
]
Exercise for Readers
Try extending the parser to support:
- Block-level comments inside blocks
- Multiple blocks at the top level
- Error recovery for mismatched braces
Here’s a starting point for brace matching:
_validateBraces() {
let depth = 0;
for (const token of this._tokens) {
if (token.type === '{') depth++;
if (token.type === '}') depth--;
if (depth < 0) throw new SyntaxError('Unexpected closing brace');
}
if (depth !== 0) throw new SyntaxError('Missing closing brace');
}