author | Justin Holewinski <justin.holewinski@gmail.com> | 2011-08-11 17:34:16 +0000 |
---|---|---|
committer | Justin Holewinski <justin.holewinski@gmail.com> | 2011-08-11 17:34:16 +0000 |
commit | dceb002f826db10e260a7843e12a48b9fadde349 (patch) | |
tree | e1164a5506d108dfdbbd6a89f59d7b9906ccdc20 /docs | |
parent | 6236f7f2b66cecdfe18c7c9b77b59cb922617f3f (diff) | |
PTX: Add basic documentation to CodeGenerator.html
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137315 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r-- | docs/CodeGenerator.html | 65 |
1 files changed, 65 insertions, 0 deletions
diff --git a/docs/CodeGenerator.html b/docs/CodeGenerator.html
index db62780..248a85c 100644
--- a/docs/CodeGenerator.html
+++ b/docs/CodeGenerator.html
@@ -114,6 +114,7 @@
       <li><a href="#ppc_prolog">Prolog/Epilog</a></li>
       <li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
     </ul></li>
+    <li><a href="#ptx">The PTX backend</a></li>
   </ul></li>
 </ol>
 
@@ -2914,6 +2915,70 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
 
 </div>
 
+<!-- ======================================================================= -->
+<h3>
+  <a name="ptx">The PTX backend</a>
+</h3>
+
+<div>
+
+<p>The PTX code generator lives in the lib/Target/PTX directory. It is
+   currently a work-in-progress, but already supports most of the code
+   generation functionality needed to generate correct PTX kernels for
+   CUDA devices.</p>
+
+<p>The code generator can target PTX 2.0+, and shader model 1.0+. The
+   PTX ISA Reference Manual is used as the primary source of ISA
+   information, though an effort is made to make the output of the code
+   generator match the output of the NVidia nvcc compiler, whenever
+   possible.</p>
+
+<p>Code Generator Options:</p>
+<table border="1" cellspacing="0">
+  <tr>
+    <th>Option</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td><code>double</code></td>
+    <td align="left">If enabled, the map_f64_to_f32 directive is
+      disabled in the PTX output, allowing native double-precision
+      arithmetic</td>
+  </tr>
+  <tr>
+    <td><code>no-fma</code></td>
+    <td align="left">Disable generation of Fused-Multiply Add
+      instructions, which may be beneficial for some devices</td>
+  </tr>
+  <tr>
+    <td><code>smxy / computexy</code></td>
+    <td align="left">Set shader model/compute capability to x.y,
+      e.g. sm20 or compute13</td>
+  </tr>
+</table>
+
+<p>Working:</p>
+<ul>
+  <li>Arithmetic instruction selection (including combo FMA)</li>
+  <li>Bitwise instruction selection</li>
+  <li>Control-flow instruction selection</li>
+  <li>Function calls (only on SM 2.0+ and no return arguments)</li>
+  <li>Addresses spaces (0 = global, 1 = constant, 2 = local, 4 =
+      shared)</li>
+  <li>Thread synchronization (bar.sync)</li>
+  <li>Special register reads ([N]TID, [N]CTAID, PMx, CLOCK, etc.)</li>
+</ul>
+
+<p>In Progress:</p>
+<ul>
+  <li>Robust call instruction selection</li>
+  <li>Stack frame allocation</li>
+  <li>Device-specific instruction scheduling optimizations</li>
+</ul>
+
+
+</div>
+
 </div>
 
 <!-- *********************************************************************** -->
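
Reviewer note: the address-space numbering documented in this patch (0 = global, 1 = constant, 2 = local, 4 = shared) might be easier to follow as a small lookup. The sketch below is illustrative only. `PTXAddressSpace` and `stateSpaceName` are hypothetical names, not identifiers from lib/Target/PTX, and the directive spellings are assumed to be the standard PTX state-space names (`.global`, `.const`, `.local`, `.shared`).

```cpp
#include <string>

// Hypothetical enum (not taken from the PTX backend sources): the LLVM
// address-space numbers exactly as listed in the documentation patch.
enum PTXAddressSpace : unsigned {
  Global = 0,
  Constant = 1,
  Local = 2,
  Shared = 4
};

// Hypothetical helper: returns the PTX state-space directive assumed to
// correspond to an address-space number, or an empty string for numbers
// the patch does not list (for example 3).
std::string stateSpaceName(unsigned AddrSpace) {
  switch (AddrSpace) {
  case Global:
    return ".global";
  case Constant:
    return ".const";
  case Local:
    return ".local";
  case Shared:
    return ".shared";
  default:
    return "";
  }
}
```

Under these assumptions, a pointer in address space 4 would be emitted against the `.shared` state space, while address space 3 has no mapping in the list above.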