Quantcast
Channel: CodeSection,代码区,Python开发技术文章_教程 - CodeSec
Viewing all articles
Browse latest Browse all 9596

The Fastest FizzBuzz In The West

$
0
0

Wherein our lowly protagonist gets fed up with the state of software development interviews, and creates a new programming language which is particularly well-suited to implementing FizzBuzz (with the help of python and RPLY ).

I'm sure almost everyone's heard of FizzBuzz , but if you haven't, it's is a mythical interview question to see how well you can write code in an environment in which you will never actually write code.

The goal is to sufficiently intimidate junior developers so they will know to steer clear of your organization. It also has the nice effect of offending senior developers as well, so it's great for quickly narrowing down your hiring pipeline!

The general idea is as follows:

Write a program that prints the numbers 1 through n , however for every number divisible by 3 , print 'fizz' and for every number divisible by 5 , print 'buzz' . If a number is divisible by both 3 and 5 , print 'fizzbuzz' .

We don't use FizzBuzz or anything like it when we screen potential new hires at PromptWorks, and I've never actually been asked to implement it. But, if you gave me this question in an interview, after my initial shock wore off, I might have asked, "Can I use Python?" And if you said yes, I might have written something like this:

def fizzbuzz(n): for i in range(1, n+1): if i % 15 == 0: print 'fizzbuzz' elif i % 3 == 0: print 'fizz' elif i % 5 == 0: print 'buzz' else: print i

Which would technically produce the correct output:

>>> fizzbuzz(15) 1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz

But that's no fun!

Plus, it took way too long to write. Which is why I created my own programming language, called DIVSPL (Dustin Ingram's Very Special Programming Language) , which is particularly well-suited for implementing FizzBuzz, quickly! And in this article, I'm going to walk you through each step in building DIVSPL from scratch.

We'll build DIVSPL with RPLY , an implementation of PLY , but with a "cooler" API, and compatible with RPython , a restricted subset of the Python programming language.

PLY is an implementation of two very old and battle tested Unix tools, Lex , which generates lexical analyzers, and Yacc , which is a parser generator.

Let's design a language

Here's what I propose as an example syntax of DIVSPL , which just happens to correspond to an implementation of the FizzBuzz problem mentioned above:

1...15 fizz=3 buzz=5

This allows us to express two things in the language: a range of numbers, and some form of variable assignment.

Let's make a lexer

First things first: we need to build a lexer. The lexer takes a stream of characters as input, and outputs a list (or an iterator) of tokens from the language. This is referred to as "tokenization." When using RPLY, tokens are represented as a named regular expression. In DIVSPL , we'll have a total of four tokens:

ELLIPSIS , which is literally three periods in a row: ... ; NUMBER , which is one or more numeric characters in a row, such as 42 ; EQUALS , which is just the equals character = ; and WORD , which is one or more lowercase alphabetic characters, such as foo .

Also, because we're cool, we'll ignore all whitespace characters, and because we're good developers, we'll allow for comments, which involves ignoring everything between a # symbol and a newline character \n .

Here's how we create our LexerGenerator as described:

from rply import LexerGenerator lg = LexerGenerator() lg.add("ELLIPSIS", r"\.\.\.") lg.add("NUMBER", r"\d+") lg.add("EQUALS", r"=") lg.add("WORD", r"[a-z]+") lg.ignore(r"\s+") # Ignore whitespace lg.ignore(r"#.*\n") # Ignore comments

And here's how we build the lexer from the lexer generator:

lexer = lg.build()

We can use the lexer as an iterator: when given a stream of characters, it gives us a new token in our "language" every time we call it:

>>> iterator = lexer.lex('...foo42hut=') >>> iterator.next() Token('ELLIPSIS', '...') >>> iterator.next() Token('WORD', 'foo') >>> iterator.next() Token('NUMBER', '42') >>> iterator.next() Token('WORD', 'hut')

If our stream has invalid tokens, it works as expected, until we get to a token the lexer doesn't understand:

>>> iterator = lexer.lex('foobar!!!') >>> iterator.next() Token('WORD', 'foobar') >>> iterator.next() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "lexer.py", line 53, in next raise LexingError(...) rply.errors.LexingError

Let's make a parser

Specifically, we're going to make a Look-Ahead Left-Right parser. The parser will take the token stream from the lexer and separate and analyze the tokens according to a set of production rules specified by a formal grammar.

Let's define our grammar

Using only the tokens in our language ( ELLIPSIS , NUMBER , EQUALS , and WORD ), we'll now define a formal grammar as a Context Free Grammar (CFG), which is a series of rules called "productions" which describe how to combine the tokens of the language into a syntax, thus describing all possible strings in the language.

For example, if we simplified the English language to only the tokens ARTICLE , NOUN , and VERB (which would match all articles, nouns, and verbs respectively), we might create a production rule called sentence as such:

$$ sentence \longrightarrow \mathit{ARTICLE}\ \ \mathit{NOUN}\ \ \mathit{VERB} $$

This would match the phrase, "The boy runs," but not the phrase, "The boy runs quickly."

Productions can also refer to other previously defined productions, or themselves. To support adverbs, we could add the token ADVERB and the following rule:

$$ \begin{align} sentence &\longrightarrow \mathit{ARTICLE}\ \ \mathit{NOUN}\ \ \mathit{VERB} \\ sentence &\longrightarrow sentence\ \ \mathit{ADVERB} \\ \end{align} $$

And now the phrase, "The boy runs quickly," would be recognized by our grammar as a syntactically correct sentence.

Now, we can define the grammar for DIVSPL as follows:

$$ \begin{align} main &\longrightarrow range\ \ assignments \\ assignments &\longrightarrow \epsilon \\ assignments &\longrightarrow assignments\ \ assignment \\ assignment &\longrightarrow word\ \ \mathit{EQUALS}\ \ number \\ range &\longrightarrow number\ \ \mathit{ELLIPSIS}\ \ number \\ word &\longrightarrow \mathit{WORD}\ \\ number &\longrightarrow \mathit{NUMBER}\ \\ \end{align} $$

Here, the \(\epsilon\) character represents an epsilon production, which matches on an empty string, and, in combination with the previous production, allows assignments to match anywhere from zero to an infinite number of assignment rules.

The \(main\) rule represents an entire valid DIVSPL program, and evaluating it will produce the output of our entire program.

Let's play with boxes

To build this grammar with RPLY, we'll create what is called a "box" in RPLY to represent the return value for each of these expressions.

"Boxes" are just Python classes which extend from rply.token.BaseBox . From https://github.com/alex/rply#why-do-we-have-the-boxes :

In RPython, like other statically typed languages, a variable must have a specific type, we take advantage of polymorphism to keep values in a box so that everything is statically typed. You can write whatever boxes you need for your project.

Note: While we don't necessarily need to use "boxes" here, if we want DIVSPL to be compatible with RPython as well, we'll need to use them. There's no particular reason why DIVSPL needs to be compatible with RPython, but since RPLY already is, we'll follow suit and do the same. It does have the one advantage of being significantly faster that using Plain ol' Python.

First up is the implementation of our \(number\) rule, the IntBox :

class IntBox(BaseBox): def __init__(self, value): self.value = int(value.getstr()) def int(self): return self.value

The IntBox takes a single value, and stores it as an integer. It has one function, int , which returns that value.

The box for our next rule, \(word\), is similar:

class WordBox(BaseBox): def __init__(self, value): self.value = value.getstr() def str(self): return self.value

Here, the WordBox also takes one value, which it stores as a string, and which is the return value for its str function.

Next is the implementation for our \(range\) rule, the RangeBox :

from rply.token import BaseBox class RangeBox(BaseBox): def __init__(self, low, high): self.low = low self.high = high def range(self): return range(self.low.int(), self.high.int() + 1)

A RangeBox takes two IntBox es, low and high , and when evaluated, returns an inclusive range:

>>> low = IntBox(Token('NUMBER', '1')) >>> high = IntBox(Token('NUMBER', '3')) >>> box = RangeBox(low, high) >>> box.range() [1, 2, 3]

Next is the "box" for our \(assignment\) (singular) rule, AssignmentBox , which looks like this:

class AssignmentBox(BaseBox): def __init__(self, word, number): self.word = word self.number = number def eval_with(self, i): if not i % int(self.number.int()): return self.word.str() return ''

The AssignmentBox has a quirky eval_with function which takes an argument i as a parameter. If i is divisible by self.number , it returns self.word . Otherwise, it returns an empty string:

>>> word = WordBox(Token('WORD', 'foo')) >>> number = IntBox(Token('NUMBER', 7)) >>> box = AssignmentBox(word, number) >>> box.eval_with(42) 'foo' >>> box.eval_with(40) ''

Our \(assignments\) rule also needs a "box":

class AssignmentsBox(BaseBox): def __init__(self, assignments=None, assignment=None): self.assignments = assignments self.assignment = assignment def list(self): if self.assignments: return self.assignments.list() + [self.assignment] return []

This "box" has a function list which will return a list of assignments if there are any, otherwise it returns an empty list.

Finally, we implement the "box" for our \(main\) rule, MainBox , as follows:

class MainBox(BaseBox): def __init__(self, range_box, assignments): self.range_box = range_box self.assignments = assignments def eval(self): lines = [] for i in self.range_box.range(): line = "" for assignment in self.assignments.list(): line += assignment.eval_with(i) lines.append(line or str(i)) return "\n".join(lines) + "\n"

(Note that we can't use fancy list comprehensions here, because RPython prohibits it... :disappointed: )

The __init__ method of the MainBox is straightforward enough, but that eval function looks crazy. Let's break it down:

First, it creates an empty list, lines . For every integer i in the resulting range of RangeBox , it creates an empty string, line . For every AssignmentBox in the list of assignments, it evaluates each one with i and concatenates the results to line . If the line is not an empty string, append it to lines , otherwise cast the integer i to a string instead and append that. Finally, join the results with newlines and return.

Not sure what that does? You will soon see...

Let's finally make a parser

Now, we're ready to build our parser. First, we'll tell the ParserGenerator what tokens are valid:

from rply import ParserGenerator pg = ParserGenerator([ "ELLIPSIS", "EQUALS", "NUMBER", "WORD" ])

Next, we'll implement each of our rules from our grammar as "productions" using the @pg.production decorator. This decorator takes a string which corresponds quite nicely to the rule in our grammar. For example, here's the rule for \(number\):

$$ number \longrightarrow \mathit{NUMBER} $$

And here's the production:

@pg.production("number: NUMBER") def number(p): return IntBox(p[0])

A production gets a single argument p , which is a list of tokens which match its rule. Here's a nearly identical production for \(word\):

@pg.production("word: WORD") def word(p): return WordBox(p[0])

Next is our \(range\) rule, which is as follows:

$$ range \longrightarrow number\ \ \mathit{ELLIPSIS}\ \ number $$

And here's the production:

@pg.production("range : number ELLIPSIS number") def range_op(p): return RangeBox(p[0], p[2])

For \(range\), we don't care about ELLIPSIS token (aside from the fact that it's there) so we only pass the two number tokens to RangeBox (as integers).

Next, the \(assignment\) rule:

$$ assignment \longrightarrow word\ \ \mathit{EQUALS}\ \ number $$

And its production:

@pg.production("assignment : word EQUALS number") def assignment_op(p): return AssignmentBox(p[0], p[2])

Again, we don't care about the EQUALS token, and only pass word and number to the AssignmentBox .

The \(assignments\) rules are similar:

$$ \begin{align} assignments &\longrightarrow assignments\ \ assignment\\ assignments &\longrightarrow \epsilon\\ \end{align} $$

Here are their productions:

@pg.production("assignments : assignments assignment") def expr_assignments(p): return AssignmentsBox(p[0], p[1]) @pg.production("assignments : ") def expr_empty_assignments(p): return AssignmentsBox()

The latter production takes advantage of the default arguments in AssignmentsBox to eventually return an empty list.

Last is our \(main\) rule:

$$ main \longrightarrow range\ \ assignments $$

And its production:

@pg.production("main : range assignments") def main(p): return MainBox(p[0], p[1])

And we build the parser!

parser = pg.build()

Let's make an interpreter

What does an interpreter do? In our case, not much:

def begin(argv): if len(argv) > 1: with open(argv[1], 'r') as f: result = parser.parse(lexer.lex(f.read())) os.write(1, result.eval()) else: os.write(1, "Please provide a filename.")

It is a command-line utility which reads in a file via a filename which is the first argument. Then it hands the character stream to the lexer, which hands the token stream to the parser. When the parser returns a result (which will be a MainBox ), we evaluate it and print the result.

Let's code

So now we have an interpreter, let's write some code! As you might have guessed, our DIVSPL language is really only good for one thing: implementing FizzBuzz as quickly as possible. Let's do it! Open up a file fizzbuzz.divspl and write:

1...15 fizz=3 buzz=5

With the divspl interpreter, (which you can get via pip install divspl in case you weren't following along), you can execute that file:

$ divspl fizzbuzz.divspl 1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz

Look at that! The fastest FizzBuzz in the West.

Let's figure out what happened

When we defined our fizzbuzz.divspl program, it looked like this:

1...15 fizz=3 buzz=5

When we passed the character stream to the lexer, it returned a list containing Token s which looked like this:

[ Token('NUMBER', '1'), Token('ELLIPSIS', '...'), Token('NUMBER', '15'), Token('WORD', 'fizz'), Token('EQUALS', '='), Token('NUMBER', '3'), Token('WORD', 'buzz'), Token('EQUALS', '='), Token('NUMBER', '5'), ]

Since the entire stream was exhausted and successfully turned into Token s, the lexer decided everything is OK, and passed the list of tokens to the parser, which turned it into a combination of "boxes," like so:

MainBox( RangeBox( IntBox(Token('NUMBER', '1')), IntBox(Token('NUMBER', '15')) ), AssignmentsBox( AssignmentsBox( AssignmentsBox(), AssignmentBox( WordBox(Token('WORD', 'fizz')), IntBox(Token('NUMBER', '3')) ) ), AssignmentBox( WordBox(Token('WORD', 'buzz')), IntBox(Token('NUMBER', '5')) ) ) )

The two assignment rules together made up an \(assignments\) rule, and that, preceded by a \(range\) rule, satisfied the \(main\) rule, making our program syntactically valid.

Since the parser was able to make the token stream satisfy the rules of our grammar, it decides everything is OK and passes the MainBox to our interpreter.

Finally, the interpreter calls .eval() on the result, and it returns the following string:

"1\n2\nfizz\n4\nbuzz\nfizz\n7\n8\nfizz\nbuzz\n11\nfizz\n13\n14\nfizzbuzz\n"

Which, when printed to the console, gives us the FizzBuzz we know and love:

1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz

Let's take it a step further

Not only is DIVSPL particularly well-suited to implementing FizzBuzz, it's actually far better at implementing FizzBuzz than most modern languages.

Consider if we wanted to modify the standard FizzBuzz problem as follows:

instead of printing integers up to 15 , print up to 105 ; whenever a number in the range is divisible by 7 , we print 'fuzz' (and the resulting combinations of 'fuzz' with 'fizz' and 'buzz' ).

In our original example in Python, we would not only need to add case for i%7 (for 'fuzz' ), but also:

i%21 (for 'fizzfuzz' ); i%35 (for 'buzzfuzz' ); and i%105 (for 'fizzbuzzfuzz' ).

This would quickly become a huge pain, especially if we had to add any other cases. Luckily, doing the same in DIVSPL requires only one new line:

1...105 fizz=3 buzz=5 fuzz=7

And it works perfectly:

$ divspl fizzbuzzfuzz.divspl ...(intermediate output omitted) 97 fuzz fizz buzz 101 fizz 103 104 fizzbuzzfuzz

Let's see the source

DIVSPLis open source and available at https://github.com/di/divspl . Pull requests are welcome!

Thanks to Patrick Smith, Ray Zane, and Brian Duggan for reading drafts of this post. Also, thanks to @alex for creating RPLY , and to @kirbyfan64 for creating The Magic of RPython , which was invaluable in helping distinguish RPython from magic. :tophat::rabbit2::sparkles:


Viewing all articles
Browse latest Browse all 9596

Trending Articles