LR(1)分析器

需求

  1. 构建项目活前缀DFA
  2. 构建分析预测表
  3. 完成对输入串的解析

设计思路

  • 文法处理

使用文件储存文法,对文法的要求时不同符号使用空格隔开,这样有利于解析文法时使用流符号对行分割。
使用map对文法进行数据上的储存,key为产生式左部,value为string的二维数组,这样做的好处是在于对于左部相同的不同产生式,其右部能被储存到一起,方便访问。

  • 构建项目活前缀

首先需要设计计算闭包函数和gotoTransf函数,这里的goto和goto表的goto含义不同,这里的goto对应算法的动作是小点后移
使用数据结构LR1Item,定义在~/include/LR.h(~这里指代项目根目录),使用dotPos来记录点所处的位置。
闭包函数的思路是,先将输入的项目集放入输出的项目集(最简单的闭包就是自身),同时将它们放入一个队列,依次访问队列计算其闭包,对每一个节点的产生式左部在文法中查找,生成一个含有新的向前搜索符号(lookahead)的项目,在项目集中查重,如果不存在则将它放入闭包项目集和队列
gotoTransf的思路是,将待约和移进项目的小点位置+1,然后计算闭包

  • 构建分析预测表

使用二维vector储存状态,遍历每个状态的项目集,根据其产生式左部(symbol)和小点位置计算

  • 完成对输入串的解析

书上有算法

实现

遇到的问题

没什么大的问题。最大的问题可能是发现忘记实现lookahead的计算规则了

一些感想

标准库真好用,不用手写轮子

目录说明

文档

设计、实现、测试的说明即为本markdown文档, 输出文件位于~/result/,截图位于下方
Screenshot-from-2024-05-20-06-59-53.png
Screenshot-from-2024-05-20-06-59-34.png

源代码

~/source/main.cc, ~/source/LR.cpp为源程序, 采用命令行参数的设计,使用时使用-g “file directory” -i "input string"运行
LR.h

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
#ifndef _LR_H
#define _LR_H

#include <string>
#include <vector>
#include <map>
#include <set>
#include <unordered_set>
//LR1项目
struct LR1Item {
std::string lhs;
std::vector<std::string> rhs;
int dotPos;
std::string lookahead;

bool operator==(const LR1Item& other) const {
return lhs == other.lhs && rhs == other.rhs && dotPos == other.dotPos && lookahead == other.lookahead;
}
};

using ItemSet = std::set<LR1Item>;

class LR {
public:
LR(const std::string& filename);
void readGrammar();
void extendGrammar();
std::map<std::string, std::vector<std::vector<std::string>>> getGrammar() const;
std::unordered_set<std::string> computeFirst(const std::vector<std::string>& symbols) const;
std::vector<LR1Item> computeClosure(const std::vector<LR1Item>& items, const std::map<std::string, std::vector<std::vector<std::string>>>& grammar);
std::vector<LR1Item> gotoTransf(const std::vector<LR1Item>& items, const std::string& symbol, const std::map<std::string, std::vector<std::vector<std::string>>>& grammar);
void constructDFA();
void printDFA() const;
void constructParsingTable();
void printParsingTable() const;
std::vector<std::string> lex(const std::string& input);
void parse(const std::string& input);
private:
std::string filename;
std::map<std::string, std::vector<std::vector<std::string>>> grammar;
void parseLine(const std::string& line);
std::vector<std::vector<LR1Item>> states;
std::map<std::pair<int, std::string>, int> transitions;
std::map<int, std::map<std::string, std::string>> actionTable;
std::map<int, std::map<std::string, int>> gotoTable;
std::string startSymbol;
std::string originStartSymbol;
};

#endif

LR.cpp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
#include "LR.h"
#include <fstream>
#include <sstream>
#include <iostream>
#include <algorithm>
#include <queue>
#include <stack>

LR::LR(const std::string& filename) : filename(filename) {}

void LR ::readGrammar() {
std::ifstream file(filename);
if (!file.is_open()) {
std::cerr << "Error opening file: " << filename << "\n";
return;
}

std::string line;
while (std::getline(file, line)) {
parseLine(line);
}

file.close();
}

//要求产生式的左部,箭头,右部隔开
void LR::parseLine(const std::string& line) {
std::istringstream iss(line);
std::string lhs;
std::string arrow;
iss >> lhs >> arrow;

std::vector<std::string> rhs;
std::string symbol;
while (iss >> symbol) {
rhs.push_back(symbol);
}

grammar[lhs].push_back(rhs);
}

std::map<std::string, std::vector<std::vector<std::string>>> LR::getGrammar() const {
return grammar;
}

void LR ::extendGrammar() {
auto it = grammar.begin();
if (it != grammar.end()) {
startSymbol = it->first + "'";
originStartSymbol = it->first;
grammar[startSymbol] = {{it->first}};
}
}

std::unordered_set<std::string> LR::computeFirst(const std::vector<std::string>& symbols) const {
std::unordered_set<std::string> firstSet;
for (const auto& symbol : symbols) {
if (grammar.find(symbol) == grammar.end()) { // 终结符
firstSet.insert(symbol);
break;
} else { // 非终结符
bool hasEpsilon = false;
for (const auto& production : grammar.at(symbol)) {
auto first = computeFirst(production);
firstSet.insert(first.begin(), first.end());
if (firstSet.find("ε") != firstSet.end()) {
hasEpsilon = true;
firstSet.erase("ε");
}
}
if (!hasEpsilon) {
break;
}
}
}
return firstSet;
}

std::vector<LR1Item> LR::computeClosure(const std::vector<LR1Item>& items, const std::map<std::string, std::vector<std::vector<std::string>>>& grammar) {
std::vector<LR1Item> closureItems = items;
std::queue<LR1Item> workQueue;

for (const auto& item : items) {
workQueue.push(item);
}

while (!workQueue.empty()) {
LR1Item item = workQueue.front();
workQueue.pop();

//如果是归约,则不计算
if (item.dotPos < item.rhs.size()) {
std::string symbol = item.rhs[item.dotPos];
if (grammar.count(symbol)) {
std::vector<std::string> beta = {item.rhs.begin() + item.dotPos + 1, item.rhs.end()};
beta.push_back(item.lookahead);
auto firstSet = computeFirst(beta);
//grammar.at返回的结果是vector<vector<string>>,表示symbol的所有产生式
for (const auto& production : grammar.at(symbol)) {
for (const auto& lookahead : firstSet) {
LR1Item newItem = {symbol, production, 0, lookahead};
if (std::find(closureItems.begin(), closureItems.end(), newItem) == closureItems.end()) {
closureItems.push_back(newItem);
workQueue.push(newItem);
}
}
}
}
}
}

return closureItems;
}


//待约
std::vector<LR1Item> LR::gotoTransf(const std::vector<LR1Item>& items, const std::string& symbol, const std::map<std::string, std::vector<std::vector<std::string>>>& grammar) {
std::vector<LR1Item> gotoItems;

for (const auto& item : items) {
if (item.dotPos < item.rhs.size() && item.rhs[item.dotPos] == symbol) {
LR1Item newItem = item;
newItem.dotPos++;
gotoItems.push_back(newItem);
}
}

return computeClosure(gotoItems, grammar);
}

void LR::constructDFA() {
LR1Item startItem = {startSymbol, {originStartSymbol}, 0, "$"};
std::vector<LR1Item> startState = computeClosure({startItem}, grammar);

states.push_back(startState);
std::queue<int> stateQueue;
stateQueue.push(0);

while (!stateQueue.empty()) {
int stateIndex = stateQueue.front();
stateQueue.pop();

std::set<std::string> symbols;
for (const auto& item : states[stateIndex]) {
if (item.dotPos < item.rhs.size()) {
symbols.insert(item.rhs[item.dotPos]);
}
}

for (const auto& symbol : symbols) {
std::vector<LR1Item> newState = gotoTransf(states[stateIndex], symbol, grammar);
if (newState.empty()) continue;

int newStateIndex;
if (auto it = std::find(states.begin(), states.end(), newState); it == states.end()) {
states.push_back(newState);
newStateIndex = states.size() - 1;
stateQueue.push(newStateIndex);
} else {
newStateIndex = it - states.begin();
}

transitions[{stateIndex, symbol}] = newStateIndex;
}
}
}

void LR::printDFA() const {
for (size_t i = 0; i < states.size(); ++i) {
std::cout << "State " << i << ":\n";
for (const auto& item : states[i]) {
std::cout << item.lhs << " -> ";
for (size_t j = 0; j < item.rhs.size(); ++j) {
if (j == item.dotPos) std::cout << ".";
std::cout << item.rhs[j] << " ";
}
if (item.dotPos == item.rhs.size()) std::cout << ".";
std::cout << ", " << item.lookahead << "\n";
}
std::cout << "\n";
}

std::cout << "Transitions:\n";
for (const auto& transition : transitions) {
std::cout << "State " << transition.first.first << " --" << transition.first.second << "--> State " << transition.second << "\n";
}
}

void LR::constructParsingTable() {
for (size_t i = 0; i < states.size(); ++i) {
for (const auto& item : states[i]) {
if (item.dotPos < item.rhs.size()) {
std::string symbol = item.rhs[item.dotPos];
if (!grammar.count(symbol)) { // 如果是终结符
int nextState = transitions[{i, symbol}];
actionTable[i][symbol] = "shift " + std::to_string(nextState);
}
} else {
if (item.lhs == startSymbol) {
actionTable[i]["$"] = "accept";
} else {
std::string production = item.lhs + " ->";
for (const auto& sym : item.rhs) {
production += " " + sym;
}
actionTable[i][item.lookahead] = "reduce " + production;
}
}
}
for (const auto& transition : transitions) {
//遍历转移表,如果找到与状态i匹配且是终结符,那么将转移表的要转移的状态添加都goto表中
if (transition.first.first == i && grammar.count(transition.first.second)) {
gotoTable[i][transition.first.second] = transition.second;
}
}
}
}

void LR::printParsingTable() const {
std::cout << "ACTION Table:\n";
for (const auto& state : actionTable) {
std::cout << "State " << state.first << ":\n";
for (const auto& action : state.second) {
std::cout << " " << action.first << ": " << action.second << "\n";
}
}

std::cout << "GOTO Table:\n";
for (const auto& state : gotoTable) {
std::cout << "State " << state.first << ":\n";
for (const auto& go : state.second) {
std::cout << " " << go.first << ": " << go.second << "\n";
}
}
}


std::vector<std::string> LR::lex(const std::string& input) {
std::vector<std::string> tokens;
std::string token;
for (size_t i = 0; i < input.size(); ++i) {
char ch = input[i];
if (std::isspace(ch)) {
continue;
} else if (std::isdigit(ch)) {
token += ch;
while (i + 1 < input.size() && std::isdigit(input[i + 1])) {
token += input[++i];
}
tokens.push_back("num");
token.clear();
} else {
tokens.push_back(std::string(1, ch));
}
}
tokens.push_back("$");
return tokens;
}

void LR::parse(const std::string& input) {
std::cout << "Input string: " << input << "\n";
std::stack<int> stateStack;
std::stack<std::string> symbolStack;
stateStack.push(0);
auto tokens = lex(input);
size_t ip = 0;

while (true) {
int state = stateStack.top();
std::string symbol = tokens[ip];

if (actionTable[state].find(symbol) == actionTable[state].end()) {
std::cout << "Error: Unexpected symbol '" << symbol << "' at position " << ip << "\n";
std::cerr << "Error: Unexpected symbol '" << symbol << "' at position " << ip << "\n";
return;
}

std::string action = actionTable[state][symbol];

if (action.find("shift") == 0) {
int newState = std::stoi(action.substr(6));
stateStack.push(newState);
symbolStack.push(symbol);
ip++;
} else if (action.find("reduce") == 0) {
std::string production = action.substr(7);
size_t pos = production.find(" -> ");
std::string lhs = production.substr(0, pos);
std::string rhs = production.substr(pos + 4);
std::istringstream iss(rhs);
std::vector<std::string> rhsSymbols;
std::string sym;
while (iss >> sym) {
rhsSymbols.push_back(sym);
}

for (size_t i = 0; i < rhsSymbols.size(); ++i) {
stateStack.pop();
symbolStack.pop();
}

state = stateStack.top();
stateStack.push(gotoTable[state][lhs]);
symbolStack.push(lhs);

std::cout << "Reduce by: " << production << "\n";
} else if (action == "accept") {
std::cout << "Accepted!" << "\n";
return;
} else {
std::cerr << "Error: Invalid action '" << action << "'" << "\n";
return;
}
}
}

main.cpp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
#include "LR.h"
#include <iostream>
#include <boost/program_options.hpp>

namespace po = boost::program_options;

int main(int argc, char* argv[]) {
std::string filename;
std::string input;

po::options_description desc("options");
desc.add_options()
("help,h", "help message")
("grammar,g", po::value<std::string>(&filename)->required(), "grammar file")
("input,i", po::value<std::string>(&input)->required(), "input string");
po::variables_map vm;
try {
po::store(po::parse_command_line(argc, argv, desc), vm);

if (vm.count("help")) {
std::cout << desc << "\n";
return 0;
}

po::notify(vm);
} catch (const po::error &e) {
std::cerr << "ERROR: " << e.what() << "\n";
std::cerr << desc << "\n";
return 1;
}

LR parser(filename);
parser.readGrammar();
parser.extendGrammar();
parser.constructDFA();
parser.printDFA();
parser.constructParsingTable();
parser.printParsingTable();


parser.parse(input);

return 0;
}

测试用例集

在输出文件中末尾有input的打印