author Pirama Arumuga Nainar <pirama@google.com> 2015-05-06 11:46:36 -0700
committer Pirama Arumuga Nainar <pirama@google.com> 2015-05-18 10:52:30 -0700
commit 2c3e0051c31c3f5b2328b447eadf1cf9c4427442
tree c0104029af14e9f47c2ef58ca60e6137691f3c9b /docs
parent e1bc145815f4334641be19f1c45ecf85d25b6e5a
Update aosp/master LLVM for rebase to r235153
Change-Id: I9bf53792f9fc30570e81a8d80d296c681d005ea7
(cherry picked from commit 0c7f116bb6950ef819323d855415b2f2b0aad987)
Diffstat (limited to 'docs')
-rw-r--r-- docs/Bugpoint.rst | 2
-rw-r--r-- docs/ExceptionHandling.rst | 40
-rw-r--r-- docs/ExtendingLLVM.rst | 97
-rw-r--r-- docs/Extensions.rst | 23
-rw-r--r-- docs/LangRef.rst | 35
-rw-r--r-- docs/LibFuzzer.rst | 364
-rw-r--r-- docs/Phabricator.rst | 19
-rw-r--r-- docs/ProgrammersManual.rst | 13
-rw-r--r-- docs/R600Usage.rst | 60
-rw-r--r-- docs/Vectorizers.rst | 2
-rw-r--r-- docs/YamlIO.rst | 4
-rw-r--r-- docs/index.rst | 4
-rw-r--r-- docs/tutorial/LangImpl5.rst | 4
-rw-r--r-- docs/tutorial/LangImpl7.rst | 2
-rw-r--r-- docs/tutorial/OCamlLangImpl5.rst | 2
15 files changed, 573 insertions, 98 deletions
diff --git a/docs/Bugpoint.rst b/docs/Bugpoint.rst
index 8fa64bc..6bd7ff9 100644
--- a/docs/Bugpoint.rst
+++ b/docs/Bugpoint.rst
@@ -208,7 +208,7 @@ point---a simple binary search may not be sufficient, as transformations that
interact may require isolating more than one call. In TargetLowering, use
``return SDNode();`` instead of ``return false;``.
-Now that that the number of transformations is down to a manageable number, try
+Now that the number of transformations is down to a manageable number, try
examining the output to see if you can figure out which transformations are
being done. If that can be figured out, then do the usual debugging. If which
code corresponds to the transformation being performed isn't obvious, set a
diff --git a/docs/ExceptionHandling.rst b/docs/ExceptionHandling.rst
index 21de19b..72ed78a 100644
--- a/docs/ExceptionHandling.rst
+++ b/docs/ExceptionHandling.rst
@@ -519,37 +519,29 @@ action.
A code of ``i32 1`` indicates a catch action, which expects three additional
arguments. Different EH schemes give different meanings to the three arguments,
-but the first argument indicates whether the catch should fire, the second is a
-pointer to stack object where the exception object should be stored, and the
-third is the code to run to catch the exception.
+but the first argument indicates whether the catch should fire, the second is
+the frameescape index of the exception object, and the third is the code to run
+to catch the exception.
For Windows C++ exception handling, the first argument for a catch handler is a
-pointer to the RTTI type descriptor for the object to catch. The third argument
-is a pointer to a function implementing the catch. This function returns the
-address of the basic block where execution should resume after handling the
-exception.
+pointer to the RTTI type descriptor for the object to catch. The second
+argument is an index into the argument list of the ``llvm.frameescape`` call in
+the main function. The exception object will be copied into the provided stack
+object. If the exception object is not required, this argument should be -1.
+The third argument is a pointer to a function implementing the catch. This
+function returns the address of the basic block where execution should resume
+after handling the exception.
For Windows SEH, the first argument is a pointer to the filter function, which
indicates if the exception should be caught or not. The second argument is
-typically null. The third argument is the address of a basic block where the
-exception will be handled. In other words, catch handlers are not outlined in
-SEH. After running cleanups, execution immediately resumes at this PC.
+typically negative one. The third argument is the address of a basic block
+where the exception will be handled. In other words, catch handlers are not
+outlined in SEH. After running cleanups, execution immediately resumes at this
+PC.
In order to preserve the structure of the CFG, a call to '``llvm.eh.actions``'
-must be followed by an ':ref:`indirectbr <i_indirectbr>`' instruction that jumps
-to the result of the intrinsic call.
-
-``llvm.eh.unwindhelp``
-----------------------
-
-.. code-block:: llvm
-
- void @llvm.eh.unwindhelp(i8*)
-
-This intrinsic designates the provided static alloca as the unwind help object.
-This object is used by Windows native exception handling on non-x86 platforms
-where xdata unwind information is used. It is typically an 8 byte chunk of
-memory treated as two 32-bit integers.
+must be followed by an ':ref:`indirectbr <i_indirectbr>`' instruction that
+jumps to the result of the intrinsic call.
SJLJ Intrinsics
diff --git a/docs/ExtendingLLVM.rst b/docs/ExtendingLLVM.rst
index 2552c07..56c48af 100644
--- a/docs/ExtendingLLVM.rst
+++ b/docs/ExtendingLLVM.rst
@@ -178,42 +178,46 @@ Adding a new instruction
to maintain compatibility with the previous version. Only add an instruction
if it is absolutely necessary.
-#. ``llvm/include/llvm/Instruction.def``:
+#. ``llvm/include/llvm/IR/Instruction.def``:
add a number for your instruction and an enum name
-#. ``llvm/include/llvm/Instructions.h``:
+#. ``llvm/include/llvm/IR/Instructions.h``:
add a definition for the class that will represent your instruction
-#. ``llvm/include/llvm/Support/InstVisitor.h``:
+#. ``llvm/include/llvm/IR/InstVisitor.h``:
add a prototype for a visitor to your new instruction type
-#. ``llvm/lib/AsmParser/Lexer.l``:
+#. ``llvm/lib/AsmParser/LLLexer.cpp``:
add a new token to parse your instruction from assembly text file
-#. ``llvm/lib/AsmParser/llvmAsmParser.y``:
+#. ``llvm/lib/AsmParser/LLParser.cpp``:
add the grammar on how your instruction can be read and what it will
construct as a result
-#. ``llvm/lib/Bitcode/Reader/Reader.cpp``:
+#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:
add a case for your instruction and how it will be parsed from bitcode
-#. ``llvm/lib/VMCore/Instruction.cpp``:
+#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:
+
+ add a case for your instruction and how it will be written to bitcode
+
+#. ``llvm/lib/IR/Instruction.cpp``:
add a case for how your instruction will be printed out to assembly
-#. ``llvm/lib/VMCore/Instructions.cpp``:
+#. ``llvm/lib/IR/Instructions.cpp``:
implement the class you defined in ``llvm/include/llvm/Instructions.h``
#. Test your instruction
-#. ``llvm/lib/Target/*``:
+#. ``llvm/lib/Target/*``:
add support for your instruction to code generators, or add a lowering pass.
@@ -236,69 +240,88 @@ Adding a new type
Adding a fundamental type
-------------------------
-#. ``llvm/include/llvm/Type.h``:
+#. ``llvm/include/llvm/IR/Type.h``:
add enum for the new type; add static ``Type*`` for this type
-#. ``llvm/lib/VMCore/Type.cpp``:
+#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/IR/ValueTypes.cpp``:
add mapping from ``TypeID`` => ``Type*``; initialize the static ``Type*``
-#. ``llvm/lib/AsmReader/Lexer.l``:
+#. ``llvm/llvm/llvm-c/Core.cpp``:
+
+ add enum ``LLVMTypeKind`` and modify
+ ``LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)`` for the new type
+
+#. ``llvm/include/llvm/IR/TypeBuilder.h``:
+
+ add new class to represent new type in the hierarchy
+
+#. ``llvm/lib/AsmParser/LLLexer.cpp``:
add ability to parse in the type from text assembly
-#. ``llvm/lib/AsmReader/llvmAsmParser.y``:
+#. ``llvm/lib/AsmParser/LLParser.cpp``:
add a token for that type
+#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:
+
+ modify ``static void WriteTypeTable(const ValueEnumerator &VE,
+ BitstreamWriter &Stream)`` to serialize your type
+
+#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:
+
+ modify ``bool BitcodeReader::ParseTypeType()`` to read your data type
+
+#. ``include/llvm/Bitcode/LLVMBitCodes.h``:
+
+ add enum ``TypeCodes`` for the new type
+
Adding a derived type
---------------------
-#. ``llvm/include/llvm/Type.h``:
+#. ``llvm/include/llvm/IR/Type.h``:
add enum for the new type; add a forward declaration of the type also
-#. ``llvm/include/llvm/DerivedTypes.h``:
+#. ``llvm/include/llvm/IR/DerivedTypes.h``:
add new class to represent new class in the hierarchy; add forward
declaration to the TypeMap value type
-#. ``llvm/lib/VMCore/Type.cpp``:
+#. ``llvm/lib/IR/Type.cpp`` and ``llvm/lib/IR/ValueTypes.cpp``:
- add support for derived type to:
+ add support for derived type, notably `enum TypeID` and `is`, `get` methods.
- .. code-block:: c++
+#. ``llvm/llvm/llvm-c/Core.cpp``:
- std::string getTypeDescription(const Type &Ty,
- std::vector<const Type*> &TypeStack)
- bool TypesEqual(const Type *Ty, const Type *Ty2,
- std::map<const Type*, const Type*> &EqTypes)
+ add enum ``LLVMTypeKind`` and modify
+ `LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty)` for the new type
- add necessary member functions for type, and factory methods
+#. ``llvm/include/llvm/IR/TypeBuilder.h``:
-#. ``llvm/lib/AsmReader/Lexer.l``:
+ add new class to represent new class in the hierarchy
- add ability to parse in the type from text assembly
+#. ``llvm/lib/AsmParser/LLLexer.cpp``:
-#. ``llvm/lib/Bitcode/Writer/Writer.cpp``:
+ modify ``lltok::Kind LLLexer::LexIdentifier()`` to add ability to
+ parse in the type from text assembly
- modify ``void BitcodeWriter::outputType(const Type *T)`` to serialize your
- type
+#. ``llvm/lib/Bitcode/Writer/BitcodeWriter.cpp``:
-#. ``llvm/lib/Bitcode/Reader/Reader.cpp``:
+ modify ``static void WriteTypeTable(const ValueEnumerator &VE,
+ BitstreamWriter &Stream)`` to serialize your type
- modify ``const Type *BitcodeReader::ParseType()`` to read your data type
+#. ``llvm/lib/Bitcode/Reader/BitcodeReader.cpp``:
-#. ``llvm/lib/VMCore/AsmWriter.cpp``:
+ modify ``bool BitcodeReader::ParseTypeType()`` to read your data type
- modify
+#. ``include/llvm/Bitcode/LLVMBitCodes.h``:
- .. code-block:: c++
+ add enum ``TypeCodes`` for the new type
- void calcTypeName(const Type *Ty,
- std::vector<const Type*> &TypeStack,
- std::map<const Type*,std::string> &TypeNames,
- std::string &Result)
+#. ``llvm/lib/IR/AsmWriter.cpp``:
+ modify ``void TypePrinting::print(Type *Ty, raw_ostream &OS)``
to output the new derived type
diff --git a/docs/Extensions.rst b/docs/Extensions.rst
index 271c085..c8ff07c 100644
--- a/docs/Extensions.rst
+++ b/docs/Extensions.rst
@@ -165,6 +165,29 @@ and ``.bar`` is associated to ``.foo``.
.section .foo,"bw",discard, "sym"
.section .bar,"rd",associative, "sym"
+
+ELF-Dependent
+-------------
+
+``.section`` Directive
+^^^^^^^^^^^^^^^^^^^^^^
+
+In order to support creating multiple sections with the same name and comdat,
+it is possible to add a unique number at the end of the ``.section`` directive.
+For example, the following code creates two sections named ``.text``.
+
+.. code-block:: gas
+
+ .section .text,"ax",@progbits,unique,1
+ nop
+
+ .section .text,"ax",@progbits,unique,2
+ nop
+
+
+The unique number is not present in the resulting object at all. It is just used
+in the assembler to differentiate the sections.
+
Target Specific Behaviour
=========================
diff --git a/docs/LangRef.rst b/docs/LangRef.rst
index 5eaea1c..a5a8869 100644
--- a/docs/LangRef.rst
+++ b/docs/LangRef.rst
@@ -348,13 +348,13 @@ added in the future:
"``anyregcc``" - Dynamic calling convention for code patching
This is a special convention that supports patching an arbitrary code
sequence in place of a call site. This convention forces the call
- arguments into registers but allows them to be dynamcially
+ arguments into registers but allows them to be dynamically
allocated. This can currently only be used with calls to
llvm.experimental.patchpoint because only this intrinsic records
the location of its arguments in a side table. See :doc:`StackMaps`.
"``preserve_mostcc``" - The `PreserveMost` calling convention
- This calling convention attempts to make the code in the caller as little
- intrusive as possible. This calling convention behaves identical to the `C`
+ This calling convention attempts to make the code in the caller as
+ unintrusive as possible. This convention behaves identically to the `C`
calling convention on how arguments and return values are passed, but it
uses a different set of caller/callee-saved registers. This alleviates the
burden of saving and recovering a large register set before and after the
@@ -1012,6 +1012,19 @@ Currently, only the following parameter attributes are defined:
array), however ``dereferenceable(<n>)`` does imply ``nonnull`` in
``addrspace(0)`` (which is the default address space).
+``dereferenceable_or_null(<n>)``
+ This indicates that the parameter or return value isn't both
+ non-null and non-dereferenceable (up to ``<n>`` bytes) at the same
+ time. All non-null pointers tagged with
+ ``dereferenceable_or_null(<n>)`` are ``dereferenceable(<n>)``.
+ For address space 0 ``dereferenceable_or_null(<n>)`` implies that
+ a pointer is exactly one of ``dereferenceable(<n>)`` or ``null``,
+ and in other address spaces ``dereferenceable_or_null(<n>)``
+ implies that a pointer is at least one of ``dereferenceable(<n>)``
+ or ``null`` (i.e. it may be both ``null`` and
+ ``dereferenceable(<n>)``). This attribute may only be applied to
+ pointer typed parameters.
+
.. _gc:
Garbage Collector Strategy Names
@@ -3235,21 +3248,15 @@ arguments (``DW_TAG_arg_variable``). In the latter case, the ``arg:`` field
specifies the argument position, and this variable will be included in the
``variables:`` field of its :ref:`MDSubprogram`.
-If set, the ``inlinedAt:`` field points at an :ref:`MDLocation`, and the
-variable represents an inlined version of a variable (with all other fields
-duplicated from the non-inlined version).
-
.. code-block:: llvm
!0 = !MDLocalVariable(tag: DW_TAG_arg_variable, name: "this", arg: 0,
scope: !3, file: !2, line: 7, type: !3,
- flags: DIFlagArtificial, inlinedAt: !4)
+ flags: DIFlagArtificial)
!1 = !MDLocalVariable(tag: DW_TAG_arg_variable, name: "x", arg: 1,
- scope: !4, file: !2, line: 7, type: !3,
- inlinedAt: !6)
+ scope: !4, file: !2, line: 7, type: !3)
!1 = !MDLocalVariable(tag: DW_TAG_auto_variable, name: "y",
- scope: !5, file: !2, line: 7, type: !3,
- inlinedAt: !6)
+ scope: !5, file: !2, line: 7, type: !3)
MDExpression
""""""""""""
@@ -3370,7 +3377,7 @@ instructions (loads, stores, memory-accessing calls, etc.) that carry
``noalias`` metadata can specifically be specified not to alias with some other
collection of memory access instructions that carry ``alias.scope`` metadata.
Each type of metadata specifies a list of scopes where each scope has an id and
-a domain. When evaluating an aliasing query, if for some some domain, the set
+a domain. When evaluating an aliasing query, if for some domain, the set
of scopes with that domain in one instruction's ``alias.scope`` list is a
subset of (or equal to) the set of scopes for that domain in another
instruction's ``noalias`` list, then the two memory accesses are assumed not to
@@ -6577,7 +6584,7 @@ Arguments:
""""""""""
The '``ptrtoint``' instruction takes a ``value`` to cast, which must be
-a a value of type :ref:`pointer <t_pointer>` or a vector of pointers, and a
+a value of type :ref:`pointer <t_pointer>` or a vector of pointers, and a
type to cast it to ``ty2``, which must be an :ref:`integer <t_integer>` or
a vector of integers type.
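
Reviewer note on the ``dereferenceable_or_null(<n>)`` hunk in docs/LangRef.rst above (not part of the commit): the condition in the added text reduces to "null, or dereferenceable for ``<n>`` bytes". A minimal C++ sketch of that predicate follows. The function name and the two boolean inputs are hypothetical stand-ins for facts about a pointer; this is not an LLVM API.

```cpp
// Hypothetical model of the rule stated in the LangRef text; not LLVM code.
// IsNull:   the pointer compares equal to null.
// IsDerefN: the pointer is dereferenceable for the attribute's <n> bytes.
bool SatisfiesDereferenceableOrNull(bool IsNull, bool IsDerefN) {
  // The only state the attribute rules out is
  // "non-null and not dereferenceable".
  return IsNull || IsDerefN;
}
```

In address space 0 the state (IsNull, IsDerefN) = (true, true) does not arise, because ``dereferenceable(<n>)`` implies ``nonnull`` there. That is why the text says "exactly one" for address space 0 and "at least one" for other address spaces.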
diff --git a/docs/LibFuzzer.rst b/docs/LibFuzzer.rst
new file mode 100644
index 0000000..a31f83d
--- /dev/null
+++ b/docs/LibFuzzer.rst
@@ -0,0 +1,364 @@
+========================================================
+LibFuzzer -- a library for coverage-guided fuzz testing.
+========================================================
+.. contents::
+ :local:
+ :depth: 4
+
+Introduction
+============
+
+This library is intended primarily for in-process coverage-guided fuzz testing
+(fuzzing) of other libraries. The typical workflow looks like this:
+
+* Build the Fuzzer library as a static archive (or just a set of .o files).
+ Note that the Fuzzer contains the main() function.
+ Preferably do *not* use sanitizers while building the Fuzzer.
+* Build the library you are going to test with -fsanitize-coverage=[234]
+ and one of the sanitizers. We recommend building the library in several
+ different modes (e.g. asan, msan, lsan, ubsan, etc.) and even using different
+ optimization options (e.g. -O0, -O1, -O2) to diversify testing.
+* Build a test driver using the same options as the library.
+ The test driver is a C/C++ file containing interesting calls to the library
+ inside a single function ``extern "C" void TestOneInput(const uint8_t *Data, size_t Size);``
+* Link the Fuzzer, the library and the driver together into an executable
+ using the same sanitizer options as for the library.
+* Collect the initial corpus of inputs for the
+ fuzzer (a directory with test inputs, one file per input).
+ The better your inputs are the faster you will find something interesting.
+ Also try to keep your inputs small, otherwise the Fuzzer will run too slowly.
+* Run the fuzzer with the test corpus. As new interesting test cases are
+ discovered they will be added to the corpus. If a bug is discovered by
+ the sanitizer (asan, etc) it will be reported as usual and the reproducer
+ will be written to disk.
+ Each Fuzzer process is single-threaded (unless the library starts its own
+ threads). You can run the Fuzzer on the same corpus in multiple processes
+ in parallel. For run-time options run the Fuzzer binary with '-help=1'.
+
+
+The Fuzzer is similar in concept to AFL_,
+but uses in-process Fuzzing, which is more fragile, more restrictive, but
+potentially much faster as it has no overhead for process start-up.
+It uses LLVM's SanitizerCoverage_ instrumentation to get in-process
+coverage feedback.
+
+The code resides in the LLVM repository, requires a fresh Clang compiler to build
+and is used to fuzz various parts of LLVM,
+but the Fuzzer itself does not (and should not) depend on any
+part of LLVM and can be used for other projects w/o requiring the rest of LLVM.
+
+Usage examples
+==============
+
+Toy example
+-----------
+
+A simple function that does something interesting if it receives the input "HI!"::
+
+ cat << EOF > test_fuzzer.cc
+ extern "C" void TestOneInput(const unsigned char *data, unsigned long size) {
+ if (size > 0 && data[0] == 'H')
+ if (size > 1 && data[1] == 'I')
+ if (size > 2 && data[2] == '!')
+ __builtin_trap();
+ }
+ EOF
+ # Get lib/Fuzzer. Assuming that you already have fresh clang in PATH.
+ svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
+ # Build lib/Fuzzer files.
+ clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
+ # Build test_fuzzer.cc with asan and link against lib/Fuzzer.
+ clang++ -fsanitize=address -fsanitize-coverage=3 test_fuzzer.cc Fuzzer*.o
+ # Run the fuzzer with no corpus.
+ ./a.out
+
+You should get ``Illegal instruction (core dumped)`` pretty quickly.
+
+PCRE2
+-----
+
+Here we show how to use lib/Fuzzer on something real, yet simple: pcre2_::
+
+ COV_FLAGS=" -fsanitize-coverage=4 -mllvm -sanitizer-coverage-8bit-counters=1"
+ # Get PCRE2
+ svn co svn://vcs.exim.org/pcre2/code/trunk pcre
+ # Get lib/Fuzzer. Assuming that you already have fresh clang in PATH.
+ svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
+ # Build PCRE2 with AddressSanitizer and coverage.
+ (cd pcre; ./autogen.sh; CC="clang -fsanitize=address $COV_FLAGS" ./configure --prefix=`pwd`/../inst && make -j && make install)
+ # Build lib/Fuzzer files.
+ clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
+ # Build the actual function that does something interesting with PCRE2.
+ cat << EOF > pcre_fuzzer.cc
+ #include <string.h>
+ #include "pcre2posix.h"
+ extern "C" void TestOneInput(const unsigned char *data, size_t size) {
+ if (size < 1) return;
+ char *str = new char[size+1];
+ memcpy(str, data, size);
+ str[size] = 0;
+ regex_t preg;
+ if (0 == regcomp(&preg, str, 0)) {
+ regexec(&preg, str, 0, 0, 0);
+ regfree(&preg);
+ }
+ delete [] str;
+ }
+ EOF
+ clang++ -g -fsanitize=address $COV_FLAGS -c -std=c++11 -I inst/include/ pcre_fuzzer.cc
+ # Link.
+ clang++ -g -fsanitize=address -Wl,--whole-archive inst/lib/*.a -Wl,-no-whole-archive Fuzzer*.o pcre_fuzzer.o -o pcre_fuzzer
+
+This will give you a binary of the fuzzer, called ``pcre_fuzzer``.
+Now, create a directory that will hold the test corpus::
+
+ mkdir -p CORPUS
+
+For simple input languages like regular expressions this is all you need.
+For more complicated inputs populate the directory with some input samples.
+Now run the fuzzer with the corpus dir as the only parameter::
+
+ ./pcre_fuzzer ./CORPUS
+
+You will see output like this::
+
+ Seed: 1876794929
+ #0 READ cov 0 bits 0 units 1 exec/s 0
+ #1 pulse cov 3 bits 0 units 1 exec/s 0
+ #1 INITED cov 3 bits 0 units 1 exec/s 0
+ #2 pulse cov 208 bits 0 units 1 exec/s 0
+ #2 NEW cov 208 bits 0 units 2 exec/s 0 L: 64
+ #3 NEW cov 217 bits 0 units 3 exec/s 0 L: 63
+ #4 pulse cov 217 bits 0 units 3 exec/s 0
+
+* The ``Seed:`` line shows you the current random seed (you can change it with ``-seed=N`` flag).
+* The ``READ`` line shows you how many input files were read (since you passed an empty dir there were no inputs, but one dummy input was synthesised).
+* The ``INITED`` line shows you how many inputs will be fuzzed.
+* The ``NEW`` lines appear when the fuzzer finds a new interesting input, which is saved to the CORPUS dir. If multiple corpus dirs are given, the first one is used.
+* The ``pulse`` lines appear periodically to show the current status.
+
+Now, interrupt the fuzzer and run it again the same way. You will see::
+
+ Seed: 1879995378
+ #0 READ cov 0 bits 0 units 564 exec/s 0
+ #1 pulse cov 502 bits 0 units 564 exec/s 0
+ ...
+ #512 pulse cov 2933 bits 0 units 564 exec/s 512
+ #564 INITED cov 2991 bits 0 units 344 exec/s 564
+ #1024 pulse cov 2991 bits 0 units 344 exec/s 1024
+ #1455 NEW cov 2995 bits 0 units 345 exec/s 1455 L: 49
+
+This time you were running the fuzzer with a non-empty input corpus (564 items).
+As the first step, the fuzzer minimized the set to produce 344 interesting items (the ``INITED`` line).
+
+You may run ``N`` independent fuzzer jobs in parallel on ``M`` CPUs::
+
+ N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M
+
+This is useful when you already have an exhaustive test corpus.
+If you've just started fuzzing with no good corpus, running independent
+jobs will create a corpus with too many duplicates.
+One way to avoid this and still use all of your CPUs is to use the flag ``-exit_on_first=1``
+which will cause the fuzzer to exit on the first new synthesised input::
+
+ N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M -exit_on_first=1
+
+Heartbleed
+----------
+Remember Heartbleed_?
+As it was recently `shown <https://blog.hboeck.de/archives/868-How-Heartbleed-couldve-been-found.html>`_,
+fuzzing with AddressSanitizer can find Heartbleed. Indeed, here are the step-by-step instructions
+to find Heartbleed with LibFuzzer::
+
+ wget https://www.openssl.org/source/openssl-1.0.1f.tar.gz
+ tar xf openssl-1.0.1f.tar.gz
+ COV_FLAGS="-fsanitize-coverage=4" # -mllvm -sanitizer-coverage-8bit-counters=1"
+ (cd openssl-1.0.1f/ && ./config &&
+ make -j 32 CC="clang -g -fsanitize=address $COV_FLAGS")
+ # Get and build LibFuzzer
+ svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
+ clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
+ # Get examples of key/pem files.
+ git clone https://github.com/hannob/selftls
+ cp selftls/server* . -v
+ cat << EOF > handshake-fuzz.cc
+ #include <openssl/ssl.h>
+ #include <openssl/err.h>
+ #include <assert.h>
+ SSL_CTX *sctx;
+ int Init() {
+ SSL_library_init();
+ SSL_load_error_strings();
+ ERR_load_BIO_strings();
+ OpenSSL_add_all_algorithms();
+ assert (sctx = SSL_CTX_new(TLSv1_method()));
+ assert (SSL_CTX_use_certificate_file(sctx, "server.pem", SSL_FILETYPE_PEM));
+ assert (SSL_CTX_use_PrivateKey_file(sctx, "server.key", SSL_FILETYPE_PEM));
+ return 0;
+ }
+ extern "C" void TestOneInput(unsigned char *Data, size_t Size) {
+ static int unused = Init();
+ SSL *server = SSL_new(sctx);
+ BIO *sinbio = BIO_new(BIO_s_mem());
+ BIO *soutbio = BIO_new(BIO_s_mem());
+ SSL_set_bio(server, sinbio, soutbio);
+ SSL_set_accept_state(server);
+ BIO_write(sinbio, Data, Size);
+ SSL_do_handshake(server);
+ SSL_free(server);
+ }
+ EOF
+ # Build the fuzzer.
+ clang++ -g handshake-fuzz.cc -fsanitize=address \
+ openssl-1.0.1f/libssl.a openssl-1.0.1f/libcrypto.a Fuzzer*.o
+ # Run 20 independent fuzzer jobs.
+ ./a.out -jobs=20 -workers=20
+
+Voila::
+
+ #1048576 pulse cov 3424 bits 0 units 9 exec/s 24385
+ =================================================================
+ ==17488==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x629000004748 at pc 0x00000048c979 bp 0x7fffe3e864f0 sp 0x7fffe3e85ca8
+ READ of size 60731 at 0x629000004748 thread T0
+ #0 0x48c978 in __asan_memcpy
+ #1 0x4db504 in tls1_process_heartbeat openssl-1.0.1f/ssl/t1_lib.c:2586:3
+ #2 0x580be3 in ssl3_read_bytes openssl-1.0.1f/ssl/s3_pkt.c:1092:4
+
+Advanced features
+=================
+
+Tokens
+------
+
+By default, the fuzzer is not aware of complexities of the input language
+and when fuzzing e.g. a C++ parser it will mostly stress the lexer.
+It is very hard for the fuzzer to come up with something like ``reinterpret_cast<int>``
+from a test corpus that doesn't have it.
+See a detailed discussion of this topic at
+http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html.
+
+lib/Fuzzer implements a simple technique that allows fuzzing input languages with
+long tokens. All you need is to prepare a text file containing up to 253 tokens, one token per line,
+and pass it to the fuzzer as ``-tokens=TOKENS_FILE.txt``.
+Three implicit tokens are added: ``" "``, ``"\t"``, and ``"\n"``.
+The fuzzer itself will still be mutating a string of bytes
+but before passing this input to the target library it will replace every byte ``b`` with the ``b``-th token.
+If there are fewer than ``b`` tokens, a space will be added instead.
+
+AFL compatibility
+-----------------
+LibFuzzer can be used in parallel with AFL_ on the same test corpus.
+Both fuzzers expect the test corpus to reside in a directory, one file per input.
+You can run both fuzzers on the same corpus in parallel::
+
+ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program -r @@
+ ./llvm-fuzz testcase_dir findings_dir # Will write new tests to testcase_dir
+
+Periodically restart both fuzzers so that they can use each other's findings.
+
+How good is my fuzzer?
+----------------------
+
+Once you implement your target function ``TestOneInput`` and fuzz it to death,
+you will want to know whether the function or the corpus can be improved further.
+One easy-to-use metric is, of course, code coverage.
+You can get the coverage for your corpus like this::
+
+ ASAN_OPTIONS=coverage_pcs=1 ./fuzzer CORPUS_DIR -runs=0
+
+This will run all the tests in the CORPUS_DIR but will not generate any new tests
+and dump covered PCs to disk before exiting.
+Then you can subtract the set of covered PCs from the set of all instrumented PCs in the binary,
+see SanitizerCoverage_ for details.
+
+Fuzzing components of LLVM
+==========================
+
+clang-format-fuzzer
+-------------------
+The inputs are random pieces of C++-like text.
+
+Build (make sure to use fresh clang as the host compiler)::
+
+ cmake -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=YES -DCMAKE_BUILD_TYPE=Release /path/to/llvm
+ ninja clang-format-fuzzer
+ mkdir CORPUS_DIR
+ ./bin/clang-format-fuzzer CORPUS_DIR
+
+Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc).
+
+TODO: commit the pre-fuzzed corpus to svn (?).
+
+Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052
+
+clang-fuzzer
+------------
+
+The default behavior is very similar to ``clang-format-fuzzer``.
+Clang can also be fuzzed with Tokens_ using the ``-tokens=$LLVM/lib/Fuzzer/cxx_fuzzer_tokens.txt`` option.
+
+Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
+
+FAQ
+=========================
+
+Q. Why does Fuzzer not use any of the LLVM support?
+---------------------------------------------------
+
+There are two reasons.
+
+First, we want this library to be used outside of LLVM w/o users having to
+build the rest of LLVM. This may sound unconvincing for many LLVM folks,
+but in practice the need for building the whole LLVM frightens many potential
+users -- and we want more users to use this code.
+
+Second, there is a subtle technical reason not to rely on the rest of LLVM, or
+any other large body of code (maybe not even STL). When coverage instrumentation
+is enabled, it will also instrument the LLVM support code which will blow up the
+coverage set of the process (since the fuzzer is in-process). In other words, by
+using more external dependencies we will slow down the fuzzer while the main
+reason for it to exist is extreme speed.
+
+Q. What about Windows then? The Fuzzer contains code that does not build on Windows.
+------------------------------------------------------------------------------------
+
+The sanitizer coverage support does not work on Windows either as of 01/2015.
+Once it's there, we'll need to re-implement OS-specific parts (I/O, signals).
+
+Q. When is this Fuzzer not a good solution for a problem?
+---------------------------------------------------------
+
+* If the test inputs are validated by the target library and the validator
+ asserts/crashes on invalid inputs, the in-process fuzzer is not applicable
+ (we could use fork() w/o exec, but it comes with extra overhead).
+* Bugs in the target library may accumulate w/o being detected. E.g. a memory
+ corruption that goes undetected at first and then leads to a crash while
+ testing another input. This is why it is highly recommended to run this
+ in-process fuzzer with all sanitizers to detect most bugs on the spot.
+* It is harder to protect the in-process fuzzer from excessive memory
+ consumption and infinite loops in the target library (still possible).
+* The target library should not have significant global state that is not
+ reset between the runs.
+* Many interesting target libs are not designed in a way that supports
+ the in-process fuzzer interface (e.g. require a file path instead of a
+ byte array).
+* If a single test run takes a considerable fraction of a second (or
+ more) the speed benefit from the in-process fuzzer is negligible.
+* If the target library runs persistent threads (that outlive
+ execution of one test) the fuzzing results will be unreliable.
+
+Q. So, what exactly is this Fuzzer good for?
+--------------------------------------------
+
+This Fuzzer might be a good choice for testing libraries that have relatively
+small inputs, each input takes < 1ms to run, and the library code is not expected
+to crash on invalid inputs.
+Examples: regular expression matchers, text or binary format parsers.
+
+.. _pcre2: http://www.pcre.org/
+
+.. _AFL: http://lcamtuf.coredump.cx/afl/
+
+.. _SanitizerCoverage: https://code.google.com/p/address-sanitizer/wiki/AsanCoverage
+
+.. _Heartbleed: http://en.wikipedia.org/wiki/Heartbleed
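
Reviewer note on the "Tokens" section of docs/LibFuzzer.rst above (not part of the commit): the byte-to-token substitution it describes can be sketched in a few lines of C++. This is an illustration under stated assumptions, not the lib/Fuzzer implementation. ``SubstituteTokens`` is a hypothetical name, indexing is assumed to be zero-based, the caller is assumed to have already appended the three implicit tokens to the table, and no separator is inserted between tokens because the text does not mention one.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of lib/Fuzzer: expands a mutated byte string
// into text by treating each byte as an index into the token table.
std::string SubstituteTokens(const std::vector<std::string> &Tokens,
                             const uint8_t *Data, size_t Size) {
  std::string Out;
  for (size_t I = 0; I < Size; ++I) {
    size_t B = Data[I];
    if (B < Tokens.size())
      Out += Tokens[B];  // Byte b selects the b-th token (zero-based here).
    else
      Out += ' ';        // Out-of-range index: a space is used instead.
  }
  return Out;
}
```

With the table ``{"int", "reinterpret_cast", "<", ">"}``, the four bytes ``1, 2, 0, 3`` expand to ``reinterpret_cast<int>``. The fuzzer therefore needs to discover only a four-byte input rather than the full 21-character string, which is the point of the technique.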
diff --git a/docs/Phabricator.rst b/docs/Phabricator.rst
index 3f4f72a..b0f62e0 100644
--- a/docs/Phabricator.rst
+++ b/docs/Phabricator.rst
@@ -64,7 +64,9 @@ To upload a new patch:
* Paste the text diff or upload the patch file.
Note that TODO
* Leave the drop down on *Create a new Revision...* and click *Continue*.
-* Enter a descriptive title and summary; add reviewers and mailing
+* Enter a descriptive title and summary. The title and summary are usually
+ in the form of a :ref:`commit message <commit messages>`.
+* Add reviewers and mailing
lists that you want to be included in the review. If your patch is
for LLVM, add llvm-commits as a subscriber; if your patch is for Clang,
add cfe-commits.
@@ -85,8 +87,11 @@ Reviewing code with Phabricator
Phabricator allows you to add inline comments as well as overall comments
to a revision. To add an inline comment, select the lines of code you want
to comment on by clicking and dragging the line numbers in the diff pane.
+When you have added all your comments, scroll to the bottom of the page and
+click the Submit button.
-You can add overall comments or submit your comments at the bottom of the page.
+You can add overall comments in the text box at the bottom of the page.
+When you're done, click the Submit button.
Phabricator has many useful features, for example allowing you to select
diffs between different versions of the patch as it was reviewed in the
@@ -128,6 +133,16 @@ This allows people reading the version history to see the review for
context. This also allows Phabricator to detect the commit, close the
review, and add a link from the review to the commit.
+Abandoning a change
+-------------------
+
+If you decide you should not commit the patch, you should explicitly abandon
+the review so that reviewers don't think it is still open. In the web UI,
+scroll to the bottom of the page where normally you would enter an overall
+comment. In the drop-down Action list, which defaults to "Comment," you should
+select "Abandon Revision" and then enter a comment explaining why. Click the
+Submit button to finish closing the review.
+
Status
------
diff --git a/docs/ProgrammersManual.rst b/docs/ProgrammersManual.rst
index 2c7e4a9..6a4c22a 100644
--- a/docs/ProgrammersManual.rst
+++ b/docs/ProgrammersManual.rst
@@ -940,7 +940,7 @@ There are a variety of ways to pass around and use strings in C and C++, and
LLVM adds a few new options to choose from. Pick the first option on this list
that will do what you need, they are ordered according to their relative cost.
-Note that is is generally preferred to *not* pass strings around as ``const
+Note that it is generally preferred to *not* pass strings around as ``const
char*``'s. These have a number of problems, including the fact that they
cannot represent embedded nul ("\0") characters, and do not have a length
available efficiently. The general replacement for '``const char*``' is
@@ -1687,8 +1687,8 @@ they will automatically convert to a ptr-to-instance type whenever they need to.
Instead of derferencing the iterator and then taking the address of the result,
you can simply assign the iterator to the proper pointer type and you get the
dereference and address-of operation as a result of the assignment (behind the
-scenes, this is a result of overloading casting mechanisms). Thus the last line
-of the last example,
+scenes, this is a result of overloading casting mechanisms). Thus the second
+line of the last example,
.. code-block:: c++
@@ -2582,8 +2582,9 @@ doxygen info: `Type Clases <http://llvm.org/doxygen/classllvm_1_1Type.html>`_
The Core LLVM classes are the primary means of representing the program being
inspected or transformed. The core LLVM classes are defined in header files in
-the ``include/llvm/`` directory, and implemented in the ``lib/VMCore``
-directory.
+the ``include/llvm/IR`` directory, and implemented in the ``lib/IR``
+directory. It's worth noting that, for historical reasons, this library is
+called ``libLLVMCore.so``, not ``libLLVMIR.so`` as you might expect.
.. _Type:
@@ -2651,7 +2652,7 @@ Important Derived Types
Subclass of SequentialType for vector types. A vector type is similar to an
ArrayType but is distinguished because it is a first class type whereas
ArrayType is not. Vector types are used for vector operations and are usually
- small vectors of of an integer or floating point type.
+ small vectors of an integer or floating point type.
``StructType``
Subclass of DerivedTypes for struct types.
diff --git a/docs/R600Usage.rst b/docs/R600Usage.rst
index 48a30c8..093cdd7 100644
--- a/docs/R600Usage.rst
+++ b/docs/R600Usage.rst
@@ -6,22 +6,51 @@ Introduction
============
The R600 back-end provides ISA code generation for AMD GPUs, starting with
-the R600 family up until the current Sea Islands (GCN Gen 2).
+the R600 family up until the current Volcanic Islands (GCN Gen 3).
Assembler
=========
-The assembler is currently a work in progress and not yet complete. Below
-are the currently supported features.
+The assembler is currently considered experimental.
+
+For syntax examples, look in test/MC/R600.
+
+Below are some of the currently supported features (modulo bugs). These
+all apply to the Southern Islands ISA; Sea Islands and Volcanic Islands
+are also supported but may be missing some instructions and have more bugs:
+
+DS Instructions
+---------------
+All DS instructions are supported.
+
+MUBUF Instructions
+------------------
+All non-atomic MUBUF instructions are supported.
+
+SMRD Instructions
+-----------------
+Only the s_load_dword* SMRD instructions are supported.
+
+SOP1 Instructions
+-----------------
+All SOP1 instructions are supported.
+
+SOP2 Instructions
+-----------------
+All SOP2 instructions are supported.
+
+SOPC Instructions
+-----------------
+All SOPC instructions are supported.
SOPP Instructions
-----------------
-Unless otherwise mentioned, all SOPP instructions that with an operand
-accept a integer operand(s) only. No verification is performed on the
-operands, so it is up to the programmer to be familiar with the range
-or acceptable values.
+Unless otherwise mentioned, all SOPP instructions that have one or more
+operands accept integer operands only. No verification is performed
+on the operands, so it is up to the programmer to be familiar with the
+range of acceptable values.
s_waitcnt
^^^^^^^^^
@@ -41,3 +70,20 @@ wait for.
// Wait for vmcnt counter to be 1.
s_waitcnt vmcnt(1)
+VOP1, VOP2, VOP3, VOPC Instructions
+-----------------------------------
+
+All 32-bit and 64-bit encodings should work.
+
+The assembler will automatically detect which encoding size to use for
+VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
+a specific encoding size, you can add an _e32 (for 32-bit encoding) or
+_e64 (for 64-bit encoding) suffix to the instruction. Most, but not all,
+instructions support an explicit suffix. These are all valid assembly
+strings:
+
+.. code-block:: nasm
+
+ v_mul_i32_i24 v1, v2, v3
+ v_mul_i32_i24_e32 v1, v2, v3
+ v_mul_i32_i24_e64 v1, v2, v3
diff --git a/docs/Vectorizers.rst b/docs/Vectorizers.rst
index 2b70217..65c19aa 100644
--- a/docs/Vectorizers.rst
+++ b/docs/Vectorizers.rst
@@ -366,7 +366,7 @@ The decision to unroll the loop depends on the register pressure and the generat
Performance
-----------
-This section shows the the execution time of Clang on a simple benchmark:
+This section shows the execution time of Clang on a simple benchmark:
`gcc-loops <http://llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/UnitTests/Vectorizer/>`_.
This benchmarks is a collection of loops from the GCC autovectorization
`page <http://gcc.gnu.org/projects/tree-ssa/vectorization.html>`_ by Dorit Nuzman.
diff --git a/docs/YamlIO.rst b/docs/YamlIO.rst
index 76dd021..3cc683b 100644
--- a/docs/YamlIO.rst
+++ b/docs/YamlIO.rst
@@ -332,7 +332,7 @@ as a field type:
}
};
-When reading YAML, if the string found does not match any of the the strings
+When reading YAML, if the string found does not match any of the strings
specified by enumCase() methods, an error is automatically generated.
When writing YAML, if the value being written does not match any of the values
specified by the enumCase() methods, a runtime assertion is triggered.
@@ -767,7 +767,7 @@ add "static const bool flow = true;". For instance:
};
With the above, if you used MyList as the data type in your native data
-structures, then then when converted to YAML, a flow sequence of integers
+structures, then when converted to YAML, a flow sequence of integers
will be used (e.g. [ 10, -3, 4 ]).
diff --git a/docs/index.rst b/docs/index.rst
index adb5419..2cc5b8b 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -177,6 +177,7 @@ For developers of applications which use LLVM as a library.
HowToSetUpLLVMStyleRTTI
ProgrammersManual
Extensions
+ LibFuzzer
:doc:`LLVM Language Reference Manual <LangRef>`
Defines the LLVM intermediate representation and the assembly form of the
@@ -218,6 +219,9 @@ For developers of applications which use LLVM as a library.
:doc:`CompilerWriterInfo`
A list of helpful links for compiler writers.
+:doc:`LibFuzzer`
+ A library for writing in-process guided fuzzers.
+
Subsystem Documentation
=======================
diff --git a/docs/tutorial/LangImpl5.rst b/docs/tutorial/LangImpl5.rst
index 72e34b1..ca2ffeb 100644
--- a/docs/tutorial/LangImpl5.rst
+++ b/docs/tutorial/LangImpl5.rst
@@ -254,7 +254,7 @@ In `Chapter 7 <LangImpl7.html>`_ of this tutorial ("mutable variables"),
we'll talk about #1 in depth. For now, just believe me that you don't
need SSA construction to handle this case. For #2, you have the choice
of using the techniques that we will describe for #1, or you can insert
-Phi nodes directly, if convenient. In this case, it is really really
+Phi nodes directly, if convenient. In this case, it is really
easy to generate the Phi node, so we choose to do it directly.
Okay, enough of the motivation and overview, lets generate code!
@@ -388,7 +388,7 @@ code:
The first two lines here are now familiar: the first adds the "merge"
block to the Function object (it was previously floating, like the else
-block above). The second block changes the insertion point so that newly
+block above). The second changes the insertion point so that newly
created code will go into the "merge" block. Once that is done, we need
to create the PHI node and set up the block/value pairs for the PHI.
diff --git a/docs/tutorial/LangImpl7.rst b/docs/tutorial/LangImpl7.rst
index c445908..6489407 100644
--- a/docs/tutorial/LangImpl7.rst
+++ b/docs/tutorial/LangImpl7.rst
@@ -632,7 +632,7 @@ own local variables, lets add this next!
User-defined Local Variables
============================
-Adding var/in is just like any other other extensions we made to
+Adding var/in is just like any other extension we made to
Kaleidoscope: we extend the lexer, the parser, the AST and the code
generator. The first step for adding our new 'var/in' construct is to
extend the lexer. As before, this is pretty trivial, the code looks like
diff --git a/docs/tutorial/OCamlLangImpl5.rst b/docs/tutorial/OCamlLangImpl5.rst
index b8ae3c5..0faecfb 100644
--- a/docs/tutorial/OCamlLangImpl5.rst
+++ b/docs/tutorial/OCamlLangImpl5.rst
@@ -336,7 +336,7 @@ for the 'then' block.
let phi = build_phi incoming "iftmp" builder in
The first two lines here are now familiar: the first adds the "merge"
-block to the Function object. The second block changes the insertion
+block to the Function object. The second changes the insertion
point so that newly created code will go into the "merge" block. Once
that is done, we need to create the PHI node and set up the block/value
pairs for the PHI.