author    Chris Lattner <sabre@nondot.org> 2007-10-22 04:32:37 +0000
committer Chris Lattner <sabre@nondot.org> 2007-10-22 04:32:37 +0000
commit    dbca7df0f705638a0a0b336cae87829cf07eb54d
tree      5e94542a46166f64687f1187dd5c685762f2b088 /docs
parent    33d7a7657bc06363fe41d8773fa5467653d7e99b
add part 1, review appreciated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@43215 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- docs/tutorial/LangImpl1.html | 263
-rw-r--r-- docs/tutorial/index.html     |   7
2 files changed, 267 insertions, 3 deletions
diff --git a/docs/tutorial/LangImpl1.html b/docs/tutorial/LangImpl1.html
new file mode 100644
index 0000000..49ad181
--- /dev/null
+++ b/docs/tutorial/LangImpl1.html
@@ -0,0 +1,263 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
+ "http://www.w3.org/TR/html4/strict.dtd">
+
+<html>
+<head>
+ <title>Kaleidoscope: The basic language, with its lexer</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+ <meta name="author" content="Chris Lattner">
+ <link rel="stylesheet" href="../llvm.css" type="text/css">
+</head>
+
+<body>
+
+<div class="doc_title">Kaleidoscope: The basic language, with its lexer</div>
+
+<div class="doc_author">
+ <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="intro">Tutorial Introduction</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>Welcome to the "Implementing a language with LLVM" tutorial. This tutorial
+runs through the implementation of a simple language, showing how fun and easy
+it can be. It will get you up and running and help you build a framework that
+you can extend to other languages or use as a playground for other ideas.
+</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="language">The basic language</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>This tutorial will be illustrated with a toy language that we'll call
+"<a href="http://en.wikipedia.org/wiki/Kaleidoscope">Kaleidoscope</a>".
+Kaleidoscope is a procedural language that allows you to define functions, use
+conditionals, math, etc. Over the course of the tutorial, we'll extend
+Kaleidoscope to support if/then/else, operator overloading, JIT compilation with
+a simple command line interface, etc.</p>
+
+<p>Because we want to keep things simple, in Kaleidoscope the only datatype is a
+64-bit floating point type (aka 'double' in C parlance). As such, all values
+are implicitly double precision and the language doesn't require type
+declarations. This gives the language a very nice and simple syntax. For
+example, the following simple function computes <a
+href="http://en.wikipedia.org/wiki/Fibonacci_number">Fibonacci numbers</a>:</p>
+
+<div class="doc_code">
+<pre>
+# Compute the x'th fibonacci number.
+def fib(x)
+  if x &lt; 3 then
+    1
+  else
+    fib(x-1)+fib(x-2)
+
+# This expression will compute the 40th number.
+fib(40)
+</pre>
+</div>
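+<p>To make the "everything is a double" rule concrete, here is a hand-written
+C++ translation of the <tt>fib</tt> example above. It is only a sketch of the
+intended semantics (assuming if/then/else behaves like C's conditional), not
+code produced by the Kaleidoscope compiler.</p>

```cpp
#include <cassert>

// Every Kaleidoscope value is a 64-bit double, so fib takes and returns
// double. This is a hand translation for illustration only.
double fib(double x) {
  if (x < 3)
    return 1;
  return fib(x - 1) + fib(x - 2);
}
```

+<p>For instance, <tt>fib(10)</tt> evaluates to 55.</p>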
+
+<p>We also allow Kaleidoscope to call into standard library functions (the LLVM
+JIT makes this completely trivial). This means that you can use the 'extern'
+keyword to declare a function before you use it (this is also useful for
+mutually recursive functions). For example:</p>
+
+<div class="doc_code">
+<pre>
+extern sin(arg);
+extern cos(arg);
+extern atan2(arg1 arg2);
+
+atan2(sin(.4), cos(42))
+</pre>
+</div>
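+<p>Assuming the JIT resolves each <tt>extern</tt> to the C math library
+function of the same name, the expression above computes the same value as
+this C++ sketch, in which every argument and result is a <tt>double</tt>:</p>

```cpp
#include <cassert>
#include <cmath>

// Hand translation of the Kaleidoscope expression: atan2(sin(.4), cos(42))
// Each extern is assumed to map onto the <cmath> function with that name.
double externExample() {
  return std::atan2(std::sin(0.4), std::cos(42.0));
}
```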
+
+<p>In the first incarnation of the language, we will only support basic
+arithmetic; if/then/else will be added in a future installment. Another
+interesting aspect of the first implementation is that it is a completely
+functional language: it does not allow side effects. We will eventually add
+side effects for those who prefer them.</p>
+
+<p>In order to make this tutorial maximally understandable and hackable, we
+choose to implement everything in C++ instead of using lexer and parser
+generators. LLVM obviously works just fine with these tools, and the choice of
+tools doesn't impact the overall design.</p>
+
+<p>A note about this tutorial: we expect you to extend the language and play
+with it on your own. Take the code and go crazy hacking away at it. It can be
+a lot of fun to play with languages!</p>
+
+</div>
+
+<!-- *********************************************************************** -->
+<div class="doc_section"><a name="lexer">The Lexer</a></div>
+<!-- *********************************************************************** -->
+
+<div class="doc_text">
+
+<p>When it comes to implementing a language, the first thing needed is
+the ability to process a text file and recognize what it says. The traditional
+way to do this is to use a "<a
+href="http://en.wikipedia.org/wiki/Lexical_analysis">lexer</a>" (aka 'scanner')
+to break the input up into "tokens". Each token returned by the lexer includes
+a token code and potentially some metadata (e.g. the numeric value of a number).
+First, we define the possible tokens:
+</p>
+
+<div class="doc_code">
+<pre>
+#include &lt;cctype&gt;   // isalpha, isalnum, isdigit, isspace
+#include &lt;cstdio&gt;   // getchar, EOF
+#include &lt;cstdlib&gt;  // strtod
+#include &lt;string&gt;   // std::string
+
+// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
+// of these for known things.
+enum Token {
+  tok_eof = -1,
+
+  // commands
+  tok_def = -2, tok_extern = -3,
+
+  // primary
+  tok_identifier = -4, tok_number = -5
+};
+
+static std::string IdentifierStr; // Filled in if tok_identifier
+static double NumVal;             // Filled in if tok_number
+</pre>
+</div>
+
+<p>Each token returned by our lexer will either be one of the Token enum values
+or an 'unknown' character like '+', which is returned as its ASCII value. If
+the current token is an identifier, the <tt>IdentifierStr</tt> global variable
+holds the name of the identifier. If the current token is a numeric literal
+(like 1.0), <tt>NumVal</tt> holds its value. Note that we use global variables
+for simplicity; this is not the best choice for a real language implementation
+:).
+</p>
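+<p>Since known tokens are negative and everything else is a character code, a
+small helper can turn any value returned by the lexer into readable text, which
+is handy when debugging. The <tt>tokenName</tt> function below is hypothetical
+(it is not part of the tutorial's code) and repeats the <tt>Token</tt> enum so
+that the sketch compiles on its own:</p>

```cpp
#include <cassert>
#include <string>

enum Token {
  tok_eof = -1,
  tok_def = -2, tok_extern = -3,
  tok_identifier = -4, tok_number = -5
};

// Hypothetical debugging helper: describe a value returned by the lexer.
// Negative values are Token codes; anything else is a single character
// that the lexer returned as its ASCII value.
std::string tokenName(int Tok) {
  switch (Tok) {
  case tok_eof:        return "eof";
  case tok_def:        return "def";
  case tok_extern:     return "extern";
  case tok_identifier: return "identifier";
  case tok_number:     return "number";
  default:             return std::string("'") + char(Tok) + "'";
  }
}
```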
+
+<p>The actual implementation of the lexer is a single function <tt>gettok</tt>.
+<tt>gettok</tt> is called to return the next token from standard input. Its
+definition starts as:</p>
+
+<div class="doc_code">
+<pre>
+/// gettok - Return the next token from standard input.
+static int gettok() {
+  static int LastChar = ' ';
+
+  // Skip any whitespace.
+  while (isspace(LastChar))
+    LastChar = getchar();
+</pre>
+</div>
+
+<p>
+<tt>gettok</tt> works by calling the C <tt>getchar()</tt> function to read
+characters one at a time from standard input. It eats them as it recognizes
+them and stores the last character read, but not yet processed, in
+<tt>LastChar</tt>. The first thing that it has to do is ignore whitespace
+between tokens. This is accomplished with the loop above.</p>
+
+<p>The next thing it needs to do is recognize identifiers and specific keywords
+like "def". Kaleidoscope does this with this simple loop:</p>
+
+<div class="doc_code">
+<pre>
+  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
+    IdentifierStr = LastChar;
+    while (isalnum((LastChar = getchar())))
+      IdentifierStr += LastChar;
+
+    if (IdentifierStr == "def") return tok_def;
+    if (IdentifierStr == "extern") return tok_extern;
+    return tok_identifier;
+  }
+</pre>
+</div>
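+<p>Because <tt>gettok</tt> reads from standard input, its branches are hard to
+try in isolation. The following sketch copies the identifier branch into a
+hypothetical <tt>lexWord</tt> function that reads from a string instead. It
+shows that keywords are only recognized when the whole word matches, so
+<tt>define</tt> lexes as an ordinary identifier:</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum Token { tok_def = -2, tok_extern = -3, tok_identifier = -4 };

// Hypothetical string-based copy of gettok's identifier branch.
// Precondition: Src[Pos] is a letter. Consumes [a-zA-Z][a-zA-Z0-9]*,
// leaves Pos just past the word, and stores the word in Ident.
int lexWord(const std::string &Src, std::size_t &Pos, std::string &Ident) {
  Ident = Src[Pos++];
  while (Pos < Src.size() &&
         std::isalnum(static_cast<unsigned char>(Src[Pos])))
    Ident += Src[Pos++];

  // Keywords are matched only after the whole word has been read.
  if (Ident == "def") return tok_def;
  if (Ident == "extern") return tok_extern;
  return tok_identifier;
}
```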
+
+<p>Note that it sets the '<tt>IdentifierStr</tt>' global whenever it lexes an
+identifier. Also, since language keywords are matched by the same loop, we
+handle them here inline. Numeric values are similar:</p>
+
+<div class="doc_code">
+<pre>
+  if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
+    std::string NumStr;
+    do {
+      NumStr += LastChar;
+      LastChar = getchar();
+    } while (isdigit(LastChar) || LastChar == '.');
+
+    NumVal = strtod(NumStr.c_str(), 0);
+    return tok_number;
+  }
+</pre>
+</div>
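+<p>The number branch hands the collected characters to <tt>strtod</tt>, which
+stops at the first character it cannot use. One possible stricter check (a
+hypothetical extension, not part of the tutorial's lexer) is to inspect
+<tt>strtod</tt>'s end pointer and reject the token unless the whole string was
+consumed:</p>

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert NumStr with strtod and report whether the
// entire string formed one valid number. Val receives whatever prefix
// strtod managed to convert (e.g. 1.23 for "1.23.45.67").
bool parseNumber(const std::string &NumStr, double &Val) {
  const char *Begin = NumStr.c_str();
  char *End = nullptr;
  Val = std::strtod(Begin, &End);
  // Fail if nothing was converted or if characters were left over.
  return End != Begin && *End == '\0';
}
```

+<p>With this check, "1.5" is accepted, while "1.23.45.67" is rejected instead
+of silently becoming 1.23.</p>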
+
+<p>This is all pretty straightforward code for processing input. When reading
+a numeric value from input, we use the C <tt>strtod</tt> function to convert it
+to a numeric value that we store in <tt>NumVal</tt>. Note that this isn't doing
+sufficient error checking: it will incorrectly read "1.23.45.67" and handle it
+as if you typed in "1.23". Feel free to extend it :). Next we handle comments:
+</p>
+
+<div class="doc_code">
+<pre>
+  if (LastChar == '#') {
+    // Comment until end of line.
+    do LastChar = getchar();
+    while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
+
+    if (LastChar != EOF)
+      return gettok();
+  }
+</pre>
+</div>
+
+<p>We handle comments by skipping to the end of the line and then returning the
+next token. Finally, if the input doesn't match one of the above cases, it is
+either an operator character like '+' or the end of file. These are handled
+with this code:</p>
+
+<div class="doc_code">
+<pre>
+  // Check for end of file.  Don't eat the EOF.
+  if (LastChar == EOF)
+    return tok_eof;
+
+  // Otherwise, just return the character as its ASCII value.
+  int ThisChar = LastChar;
+  LastChar = getchar();
+  return ThisChar;
+}
+</pre>
+</div>
+
+<p>With this, we have the complete lexer for the basic Kaleidoscope language.
+Next we'll <a href="LangImpl2.html">build a simple parser that uses this to
+build an Abstract Syntax Tree</a>. If you prefer, you can jump to the <a
+href="index.html">main tutorial index page</a>.
+</p>
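+<p>Because <tt>gettok</tt> reads from standard input, it is awkward to test
+directly. As a sketch, the same logic can be wrapped in a hypothetical
+<tt>StringLexer</tt> struct that reads from an in-memory string: the
+<tt>nextChar</tt> method stands in for <tt>getchar()</tt>, and the state the
+tutorial keeps in globals lives in members instead. Apart from that, each
+branch mirrors the lexer above:</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

enum Token {
  tok_eof = -1,
  tok_def = -2, tok_extern = -3,
  tok_identifier = -4, tok_number = -5
};

// Hypothetical string-based version of the tutorial's lexer. State kept in
// globals (IdentifierStr, NumVal) or a static local (LastChar) in the
// tutorial lives in members here.
struct StringLexer {
  std::string Src;
  std::size_t Pos = 0;
  int LastChar = ' ';
  std::string IdentifierStr; // Filled in if tok_identifier
  double NumVal = 0;         // Filled in if tok_number

  explicit StringLexer(std::string S) : Src(std::move(S)) {}

  // Stand-in for getchar(): the next character, or EOF at end of input.
  int nextChar() {
    if (Pos < Src.size())
      return static_cast<unsigned char>(Src[Pos++]);
    return EOF;
  }

  int gettok() {
    // Skip any whitespace.
    while (std::isspace(LastChar))
      LastChar = nextChar();

    // identifier: [a-zA-Z][a-zA-Z0-9]*
    if (std::isalpha(LastChar)) {
      IdentifierStr = static_cast<char>(LastChar);
      while (std::isalnum(LastChar = nextChar()))
        IdentifierStr += static_cast<char>(LastChar);
      if (IdentifierStr == "def") return tok_def;
      if (IdentifierStr == "extern") return tok_extern;
      return tok_identifier;
    }

    // Number: [0-9.]+
    if (std::isdigit(LastChar) || LastChar == '.') {
      std::string NumStr;
      do {
        NumStr += static_cast<char>(LastChar);
        LastChar = nextChar();
      } while (std::isdigit(LastChar) || LastChar == '.');
      NumVal = std::strtod(NumStr.c_str(), nullptr);
      return tok_number;
    }

    // Comment until end of line, then lex the following token.
    if (LastChar == '#') {
      do LastChar = nextChar();
      while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
      if (LastChar != EOF)
        return gettok();
    }

    // Check for end of input. Don't eat the EOF.
    if (LastChar == EOF)
      return tok_eof;

    // Otherwise, return the character as its ASCII value.
    int ThisChar = LastChar;
    LastChar = nextChar();
    return ThisChar;
  }
};
```

+<p>Lexing <tt>"def fib(x) # comment\n  x+1.5"</tt> with this sketch yields
+<tt>tok_def</tt>, <tt>tok_identifier</tt>, '(', <tt>tok_identifier</tt>, ')',
+<tt>tok_identifier</tt>, '+', <tt>tok_number</tt> and then <tt>tok_eof</tt>;
+the comment is skipped entirely.</p>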
+
+</div>
+
+<!-- *********************************************************************** -->
+<hr>
+<address>
+ <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
+ src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
+ <a href="http://validator.w3.org/check/referer"><img
+ src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!" /></a>
+
+ <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
+ <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
+ Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
+</address>
+</body>
+</html>
diff --git a/docs/tutorial/index.html b/docs/tutorial/index.html
index 43c5dab..acaee03 100644
--- a/docs/tutorial/index.html
+++ b/docs/tutorial/index.html
@@ -25,17 +25,18 @@
<li><!--<a href="Tutorial5.html">-->Invoking the JIT</li>
</ol>
</li>
- <li>Implementing a simple language with LLVM
+ <li>Implementing a language with LLVM: Kaleidoscope
<ol>
<li><a href="LangImpl1.html">The basic language, with its lexer</a></li>
<li>Implementing a Parser and AST</li>
<li>Implementing code generation to LLVM IR</li>
+ <li>Adding JIT codegen support</li>
<li>Extending the language: if/then/else</li>
<li>Extending the language: operator overloading</li>
- <li>Adding JIT codegen support</li>
+ <li>Extending the language: mutable variables</li>
<li>Thoughts and ideas for extensions</li>
</ol></li>
</ol>
</body>
-</html> \ No newline at end of file
+</html>