The ANTLR mega tutorial

Parsers are powerful tools, and with ANTLR you can write all sorts of parsers usable from many different languages.

In this complete tutorial we are going to:

- explain the basics: what a parser is and what it can be used for
- see how to set up ANTLR to be used from JavaScript, Python, Java and C#
- discuss how to test your parser
- present the most advanced and useful features of ANTLR: you will learn all you need to parse all possible languages
- show tons of examples

Maybe you have read a tutorial that was too complicated, or so partial that it seemed to assume you already knew how to use a parser. This is not that kind of tutorial. We just expect you to know how to code and how to use a text editor or an IDE. That’s it.

At the end of this tutorial:

- you will be able to write a parser to recognize different formats and languages
- you will be able to create all the rules you need to build a lexer and a parser
- you will know how to deal with the common problems you will encounter
- you will understand errors and you will know how to avoid them by testing your grammar

In other words, we will start from the very beginning, and when we reach the end you will have learned all you could possibly need to learn about ANTLR.


[Figure: ANTLR Mega Tutorial giant list of content]

What is ANTLR?

ANTLR is a parser generator, a tool that helps you to create parsers. A parser takes a piece of text and transforms it into an organized structure, such as an Abstract Syntax Tree (AST). You can think of the AST as a story describing the content of the code, or as its logical representation, created by putting together the various pieces.


[Figure: graphical representation of an AST for the Euclidean algorithm]

What you need to do to get an AST:

1. define a lexer and parser grammar
2. invoke ANTLR: it will generate a lexer and a parser in your target language (e.g., Java, Python, C#, JavaScript)
3. use the generated lexer and parser: you invoke them, passing the code to recognize, and they return an AST to you

So you need to start by defining a lexer and parser grammar for the thing that you are analyzing. Usually the “thing” is a language, but it could also be a data format, a diagram, or any kind of structure that is represented with text.
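To make the lexer, parser and tree vocabulary concrete before ANTLR enters the picture, here is a minimal hand-written sketch in Python. It is not what ANTLR generates and it does not use ANTLR at all. The toy arithmetic language, the function names `lex` and `parse`, and the tuple-based tree are all assumptions made up for this illustration.

```python
import re


def lex(text):
    """Lexer step: split raw text into tokens (numbers or single symbols)."""
    # Whitespace is skipped; any other character becomes a one-character token.
    return re.findall(r"\d+|\S", text)


def parse(tokens):
    """Parser step: turn the token list into a nested tuple tree.

    Toy grammar (an assumption for this sketch):
        expr   : term ('+' term)*
        term   : factor ('*' factor)*
        factor : NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        tok = advance()
        if tok == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse(lex("1 + 2 * 3"))
print(tree)  # ('+', 1, ('*', 2, 3))
```

The flat text became a tree in which the multiplication sits below the addition. Producing that kind of organized structure is the job a generated parser does for you, driven by a grammar rather than by hand-written functions.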

Aren’t regular expressions enough?

If you are the typical programmer you may ask yourself: why can’t I use a regular expression? A regular expression is quite useful, for example when you want to find a number in a string of text, but it also has many limitations.

The most obvious is the lack of recursion: you can’t find a (regular) expression inside another one unless you code it by hand for each level, something that quickly becomes unmaintainable. But the larger problem is that regular expressions don’t really scale: if you put together even just a few of them, you create a fragile mess that is hard to maintain.
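The recursion point can be illustrated with a small Python sketch using nested parentheses. The pattern `ONE_LEVEL` and the helper `parse_group` are hypothetical names invented for this example. The regular expression is hand-coded for exactly one level of nesting, while the recursive function handles any depth.

```python
import re

# Hand-coded for ONE level of nesting inside the outer parentheses.
# Supporting two levels means writing a longer pattern, and so on.
ONE_LEVEL = re.compile(r"\((?:[^()]|\([^()]*\))*\)")


def parse_group(text, pos=0):
    """Parse a parenthesized group starting at text[pos].

    Returns (items, next_pos), where nested groups become nested lists.
    """
    if pos >= len(text) or text[pos] != "(":
        raise ValueError("expected '('")
    pos += 1
    items = []
    while pos < len(text) and text[pos] != ")":
        if text[pos] == "(":
            # The rule refers to itself: this is the recursion a regex lacks.
            child, pos = parse_group(text, pos)
            items.append(child)
        else:
            items.append(text[pos])
            pos += 1
    if pos >= len(text):
        raise ValueError("missing ')'")
    return items, pos + 1


print(ONE_LEVEL.fullmatch("(a(b))") is not None)     # True
print(ONE_LEVEL.fullmatch("(a(b(c)))") is not None)  # False
print(parse_group("(a(b(c)))")[0])                   # ['a', ['b', ['c']]]
```

A grammar rule that can refer to itself plays the role of `parse_group` here, without you having to write the bookkeeping.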

It’s not that easy to use regular expressions

Have you ever tried parsing HTML with a regular expression? It’s a terrible idea: for one, you risk summoning Cthulhu, but more importantly it doesn’t really work. You don’t believe me? Let’s see: you want to find the elements of a table, so you try a regular expression like this one: <table>(.*?)</table>. Brilliant! You did it! Except somebody adds attributes to their table, such as style or id. It doesn’t matter, you do this: <table.*?>(.*?)</table>. But you actually cared about the data inside the table, so you then need to parse tr and td, but they are full of tags.

So you need to eliminate those, too. And somebody even dares to use comments like <!-- my comment -->. Comments can be used everywhere, and that is not easy to handle with your regular expression, is it?

So you forbid the internet to use comments in HTML: problem solved.

Or alternatively you use ANTLR, whatever seems simpler to you.
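The failure modes described in this section can be reproduced in a few lines of Python. The two patterns are the ones quoted in the text (the `re.S` flag is added so that `.` also matches newlines). The sample HTML strings are made up for this illustration.

```python
import re

NAIVE = re.compile(r"<table>(.*?)</table>", re.S)
WITH_ATTRS = re.compile(r"<table.*?>(.*?)</table>", re.S)

plain = "<table><tr><td>1</td></tr></table>"
styled = '<table id="t"><tr><td>1</td></tr></table>'
commented = "<table><!-- </table> --><tr><td>1</td></tr></table>"

# Works on the simplest input.
print(NAIVE.search(plain).group(1))         # <tr><td>1</td></tr>

# Breaks as soon as the tag has attributes.
print(NAIVE.search(styled))                 # None

# The second pattern copes with attributes...
print(WITH_ATTRS.search(styled).group(1))   # <tr><td>1</td></tr>

# ...but a comment containing "</table>" cuts the match short.
print(repr(WITH_ATTRS.search(commented).group(1)))  # '<!-- '
```

Each fix handles one more case and leaves the next one broken, which is the pattern the section is warning about.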

ANTLR vs writing your own parser by hand

Okay, you are convinced, you need a parser, but why use a parser generator like ANTLR instead of building your own?

The main advantage of ANTLR is productivity

If you actually have to work with a parser all the time, because your language or format is evolving, you need to be able to keep pace, something you can’t do if you have to deal with the details of implementing a parser. Since you are not parsing for parsing’s sake, you must have the chance to concentrate on accomplishing your goals. And ANTLR makes it much easier to do that, rapidly and cleanly.

Second, once you have defined your grammars you can ask ANTLR to generate multiple parsers in different languages. For example, you can get a parser in C# and one in JavaScript to parse the same language in a desktop application and in a web application.

Some people argue that by writing a parser by hand you can make it faster and produce better error messages. There is some truth in this, but in my experience parsers generated by ANTLR are always fast enough. If you really need to, you can tweak them and improve both performance and error handling by working on your grammar. And you can do that once you are happy with your grammar.

Table of Contents or ok, I am convinced, show me what you got

Two small notes:

- in the companion repository of this tutorial you are going to find all the code with testing, even where we don’t see it in the article
- the examples will be in different languages, but the knowledge would be generally applicable to any language

Setup
Beginner
Mid-Level
  Setting Up the Chat Project in Javascript
  Working with a Listener
  Solving Ambiguities with Semantic Predicates
  Continuing the Chat in Python
  The Python Way of Working with a Listener
Advanced
  The Markup Project in Java
  Transforming Code with ANTLR
  Joy and Pain of Transforming Code
  Dealing with Expressions
  The Spreadsheet Project in C#
Final Remarks

Setup

In this section we prepare our development environment to work with ANTLR: the parser generator tool, the supporting tools and the runtimes for each language.

1. Setup ANTLR

ANTLR is actually made up of two main parts: the tool, used to generate the lexer and parser, and the runtime, needed to run them.

The tool will be needed just by you, the language engineer, while the runtime will be included in the final software using your language.

The tool is always the same no matter which language you are targeting: it’s a Java program that you need on your development machine. The runtime, on the other hand, is different for every language and must be available both to the developer and to the user.

The only requirement for the tool is that you have installed at least Ja
