path: root/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
Commit message | Author | Age | Files | Lines
* i965/fs/generator: Don't use the address immediate for MOV_INDIRECT | Jason Ekstrand | 2016-11-01 | 1 | -28/+27
  The address immediate field is only 9 bits and, since the value is in bytes, the highest GRF we can point to with it is g15. This makes it pretty close to useless for MOV_INDIRECT. There were already piles of restrictions preventing us from using it prior to Broadwell, so let's get rid of the gen8+ code path entirely.

  Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97779
  Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
  (cherry picked from commit 2a4a86862c949055c71637429f6d5f2e725d07d8)
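The g15 limit quoted in the message above can be checked with a little arithmetic. This is a minimal sketch, assuming a 32-byte GRF and treating the field as an unsigned 9-bit byte offset as the message describes; the constant and function names are hypothetical and do not come from the Mesa sources.

```cpp
// Illustrative constants (hypothetical names, not taken from Mesa).
constexpr unsigned kAddrImmBits = 9;    // field width stated in the commit message
constexpr unsigned kGrfBytes = 32;      // assumed size of one GRF in bytes

// Largest byte offset that fits in the address immediate field.
constexpr unsigned max_addr_imm_bytes() {
    return (1u << kAddrImmBits) - 1;
}

// Highest GRF number that a byte offset of that size can still reach.
constexpr unsigned highest_addressable_grf() {
    return max_addr_imm_bytes() / kGrfBytes;
}
```

With these assumptions the largest offset is 511 bytes, which lands inside the sixteenth register, so only g0 through g15 are reachable.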
* i965: Introduce downcast helpers for prog_data structures. | Kenneth Graunke | 2016-10-05 | 1 | -4/+3
  Similar to brw_context(...), intel_texture_object(...), and so on.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
* i965/ir: Pass identity mask to brw_find_live_channel() in the packed dispatch case. | Francisco Jerez | 2016-09-21 | 1 | -1/+4
  This avoids emitting a few extra instructions required to take the dispatch mask into account when it's known to be tightly packed.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Take Dispatch/Vector mask into account in FIND_LIVE_CHANNEL | Jason Ekstrand | 2016-09-21 | 1 | -2/+5
  On at least Sky Lake, ce0 does not contain the full story as far as enabled channels goes. It is possible to have completely disabled channels where the corresponding bits in ce0 are 1. In order to get the correct execution mask, you have to mask off those channels which were disabled from the beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on the shader stage. Failure to do so can result in FIND_LIVE_CHANNEL returning a completely dead channel.

  Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
  Cc: Francisco Jerez <currojerez@riseup.net>
  [ Francisco Jerez: Fix a couple of typos, add mask register type assertion, clarify reason why ce0 can have bits set for disabled channels, clarify that this may only be a problem when thread dispatch doesn't pack channels tightly in the SIMD thread. Apply same treatment to Align16 path. ]
  Reviewed-by: Francisco Jerez <currojerez@riseup.net>
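The masking step described in the message above can be modelled on plain integers. This is only a sketch: `ce0` and `dispatch_mask` are ordinary 32-bit values standing in for the hardware registers (with `dispatch_mask` playing the role of sr0.2 or sr0.3), and both function names are hypothetical.

```cpp
#include <cstdint>

// Index of the lowest set bit in mask, or -1 if no bit is set.
constexpr int find_first_set(uint32_t mask, int bit = 0) {
    return mask == 0 ? -1
                     : ((mask & 1u) ? bit : find_first_set(mask >> 1, bit + 1));
}

// Pick a live channel only among channels that were enabled at dispatch time.
constexpr int find_live_channel(uint32_t ce0, uint32_t dispatch_mask) {
    return find_first_set(ce0 & dispatch_mask);
}
```

For example, if `ce0` reads 0xFFFF but the dispatch mask is 0xFFF0, looking at `ce0` alone would pick channel 0, which was never dispatched; the masked version picks channel 4.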
* i965/fs: Handle arbitrary offsets in brw_reg_from_fs_reg for MRF/VGRF registers. | Francisco Jerez | 2016-09-14 | 1 | -3/+2
  This restriction seemed rather artificial... Removing it actually simplifies things slightly.

  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Replace fs_inst::regs_written with ::size_written field in bytes. | Francisco Jerez | 2016-09-14 | 1 | -6/+10
  The previous regs_written field can be recovered by rewriting each rvalue reference of regs_written like 'x = i.regs_written' to 'x = DIV_ROUND_UP(i.size_written, reg_unit)', and each lvalue reference like 'i.regs_written = x' to 'i.size_written = x * reg_unit'.

  For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here.

  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
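The two rewrite rules in the message above amount to a pair of small conversions. The sketch below spells them out on plain integers; the helper names are hypothetical, and `reg_unit` is passed in explicitly because the message notes that its value depends on context.

```cpp
// Round-up integer division, mirroring what a DIV_ROUND_UP helper is expected to do.
constexpr unsigned div_round_up(unsigned a, unsigned b) {
    return (a + b - 1) / b;
}

// Rvalue rule: x = DIV_ROUND_UP(i.size_written, reg_unit)
constexpr unsigned regs_written_from_size(unsigned size_written, unsigned reg_unit) {
    return div_round_up(size_written, reg_unit);
}

// Lvalue rule: i.size_written = x * reg_unit
constexpr unsigned size_from_regs_written(unsigned regs_written, unsigned reg_unit) {
    return regs_written * reg_unit;
}
```

With a 32-byte unit, a 40-byte write still counts as two registers, which is why the rvalue rule rounds up rather than truncating.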
* i965/fs: Replace fs_reg::subreg_offset with fs_reg::offset expressed in bytes. | Francisco Jerez | 2016-09-14 | 1 | -2/+2
  The fs_reg::subreg_offset and ::offset fields are now redundant, the sub-GRF offset can just be added to the single ::offset field expressed in byte units. The current subreg_offset value can be recovered by applying the following rule: Replace each rvalue reference of subreg_offset like 'x = r.subreg_offset' with 'x = r.offset % reg_unit', and each lvalue reference like 'r.subreg_offset = x' with 'r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x'.

  For the same reason as in the previous patches, this doesn't attempt to be particularly clever about simplifying the result in the interest of keeping the rather lengthy patch as obvious as possible. I'll come back later to clean up any ugliness introduced here.

  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
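As a worked illustration of the subreg_offset rule above, the following sketch applies it to a plain byte offset. The helper names are hypothetical, and `round_down_to` is a generic stand-in for ROUND_DOWN_TO that does not assume a power-of-two unit.

```cpp
// Round v down to a multiple of align (generic stand-in for ROUND_DOWN_TO).
constexpr unsigned round_down_to(unsigned v, unsigned align) {
    return v - v % align;
}

// Rvalue rule: x = r.offset % reg_unit
constexpr unsigned get_subreg_offset(unsigned offset, unsigned reg_unit) {
    return offset % reg_unit;
}

// Lvalue rule: r.offset = ROUND_DOWN_TO(r.offset, reg_unit) + x
constexpr unsigned set_subreg_offset(unsigned offset, unsigned reg_unit, unsigned x) {
    return round_down_to(offset, reg_unit) + x;
}
```

Taking a byte offset of 72 with a 32-byte unit, the sub-register part is 8; setting it to 4 keeps the whole-register part (64) and yields 68.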
* i965/fs: Replace fs_reg::reg_offset with fs_reg::offset expressed in bytes. | Francisco Jerez | 2016-09-14 | 1 | -1/+1
  The fs_reg::offset field in byte units introduced in this patch is a more straightforward alternative to the current register offset representation split between fs_reg::reg_offset and ::subreg_offset. The split representation makes it too easy to forget about one of the offsets while dealing with the other, which has led to multiple back-end bugs in the past. To make the matter worse the unit reg_offset was expressed in was rather inconsistent, for uniforms it would be expressed in either 4B or 16B units depending on the back-end, and for most other things it would be expressed in 32B units.

  This encodes reg_offset as a new offset field expressed consistently in byte units. Each rvalue reference of reg_offset in existing code like 'x = r.reg_offset' is rewritten to 'x = r.offset / reg_unit', and each lvalue reference like 'r.reg_offset = x' is rewritten to 'r.offset = r.offset % reg_unit + x * reg_unit'.

  Because the change affects a lot of places and is rather non-trivial to verify due to the inconsistent value of reg_unit, I've tried to avoid making any additional changes other than applying the rewrite rule above in order to keep the patch as simple as possible, sometimes at the cost of introducing obvious stupidity (e.g. algebraic expressions that could be simplified given some knowledge of the context) -- I'll clean those up later on in a second pass.

  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
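The reg_offset rule above can be sketched in the same way, again on plain integers with hypothetical helper names. The point of the lvalue rule is that it replaces the whole-register part of the byte offset while leaving the sub-register remainder untouched.

```cpp
// Rvalue rule: x = r.offset / reg_unit
constexpr unsigned get_reg_offset(unsigned offset, unsigned reg_unit) {
    return offset / reg_unit;
}

// Lvalue rule: r.offset = r.offset % reg_unit + x * reg_unit
constexpr unsigned set_reg_offset(unsigned offset, unsigned reg_unit, unsigned x) {
    return offset % reg_unit + x * reg_unit;
}
```

For a byte offset of 72 and a 32-byte unit, the register part is 2; setting it to 5 keeps the 8-byte remainder and gives 168.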
* i965: Pass start_offset to brw_set_uip_jip(). | Matt Turner | 2016-08-31 | 1 | -1/+1
  Without this, we would pass over the instructions in the SIMD8 program (which is located earlier in the buffer) when brw_set_uip_jip() is called to handle the SIMD16 program.

  The assertion about compacted control flow was bogus: halt, cont, break cannot be compacted because they have both JIP and UIP. Instead, we should never see a compacted instruction in this code at all.

  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/fs: Define framebuffer read virtual opcode. | Francisco Jerez | 2016-08-25 | 1 | -0/+20
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Drop bogus writemasking disable bit from HALT instructions. | Francisco Jerez | 2016-08-18 | 1 | -4/+0
  This may have been the reason people ran into problems with non-uniform HALT instructions and ended up using the inefficient ANY16H/ANY8H predicates instead of ANY4H or NORMAL in order to prevent non-uniform discard. The HALT instruction is able to handle non-uniform execution masks just fine.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Delete the FS_OPCODE_INTERPOLATE_AT_CENTROID virtual opcode. | Kenneth Graunke | 2016-07-20 | 1 | -5/+0
  We no longer use this message. As far as I can tell, it's fairly useless - the equivalent information is provided in the payload.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Chris Forbes <chrisforbes@google.com>
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Use LZD to implement nir_op_ufind_msb | Ian Romanick | 2016-07-19 | 1 | -0/+3
  This uses one less instruction.

  v2: Move emit_find_msb_using_lzd out of the visitor classes. Suggested by Curro.

  Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
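The relationship between a leading-zero count and the most significant set bit can be sketched in software. This is an illustration rather than the emitted instruction sequence: `lzd_model` is a hypothetical software model that assumes the count is 32 for a zero input, and `ufind_msb_model` assumes the result for zero should be -1.

```cpp
#include <cstdint>

// Software model of a 32-bit leading-zero count; returns 32 when x is zero (assumed).
constexpr int lzd_model(uint32_t x, int n = 0) {
    return (n == 32 || (x & 0x80000000u)) ? n : lzd_model(x << 1, n + 1);
}

// Index of the most significant set bit, derived from the leading-zero count.
constexpr int ufind_msb_model(uint32_t x) {
    return 31 - lzd_model(x);
}
```

Under that assumption the zero case needs no special handling, since 31 - 32 gives -1 directly.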
* i965: enable the emission of the DIM instruction | Samuel Iglesias Gonsálvez | 2016-07-14 | 1 | -0/+7
  v2 (Matt):
  - Take a DF source argument for the DIM instruction emission in the visitors.
  - Indentation.

  Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: do not require force_writemask_all with exec_size 4 | Samuel Iglesias Gonsálvez | 2016-07-13 | 1 | -1/+1
  So far we only used instructions with this size in situations where we did not operate per-channel and we wanted to ignore the execution mask, but gen7 fp64 will need to emit code with a width of 4 that needs normal execution masking.

  v2:
  - Modify the assert instead of deleting it (Curro)

  Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: Defeat the register stride checker in pull uniform messages. | Samuel Iglesias Gonsálvez | 2016-06-13 | 1 | -1/+1
  Pulling DF uniforms from pull constant buffer generates messages like:

    send(4) g12<1>DF g12<0,1,0>F sampler ld SIMD4x2 Surface = 1 Sampler = 0 mlen 1 rlen 1

  which produces GPU hangs in Cherryview/Braswell:

    "For 64-bit Align1 operation or multiplication of dwords in CHV, source horizontal stride must be aligned to qword."

  This seems to be documented in the Cherryview PRM, Volume 7, Page 843:

    "When source or destination datatype is 64b or operation is integer DWord multiply, regioning in Align1 must follow these rules: 1. Source and Destination horizontal stride must be aligned to the same qword."

  We should set the destination type to UD, D, or F so that the register stride checker doesn't notice. The destination type of send messages is basically irrelevant anyway.

  Cc: "12.0" <mesa-stable@lists.freedesktop.org>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
  Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Defeat the register stride checker in URB reads. | Kenneth Graunke | 2016-06-13 | 1 | -1/+1
  Pulling DF inputs from the URB generates messages like:

    send(8) g23<1>DF g1<8,8,1>UD urb 3 SIMD8 read mlen 1 rlen 2 { align1 1Q };

  which makes the simulator angry:

    "For 64-bit Align1 operation or multiplication of dwords in CHV, source horizontal stride must be aligned to qword."

  This seems to be documented in the Cherryview PRM, Volume 7, Page 823:

    "When source or destination datatype is 64b or operation is integer DWord multiply, regioning in Align1 must follow these rules: 1. Source and Destination horizontal stride must be aligned to the same qword."

  Setting the source horizontal stride to QWord is insane, as it's the message header containing 8 URB handles in a single 32-bit DWord. Instead, we should whack the destination type to UD, D, or F so that the register stride checker doesn't notice. The destination type of send messages is basically irrelevant anyway.

  Cc: "12.0" <mesa-stable@lists.freedesktop.org>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95462
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/fs: Add (sub)reg_offset asserts to brw_reg_from_fs_reg. | Francisco Jerez | 2016-05-27 | 1 | -0/+2
  These are completely ignored by the conversion to brw_reg, so they better be zero.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Expose arbitrary channel execution groups to the IR. | Francisco Jerez | 2016-05-27 | 1 | -3/+4
  This generalizes the current fs_inst::force_sechalf flag to allow specifying channel enable groups other than 0 or 8. At some point it will likely make sense to fix the vec4 generator to support arbitrary execution groups and then move the definition of fs_inst::group into backend_instruction (e.g. so we can do FP64 in the VEC4 back-end).

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/ir: Make BROADCAST emit an unmasked single-channel move. | Francisco Jerez | 2016-05-27 | 1 | -0/+1
  Alternatively we could have extended the current semantics to 32-wide mode by changing brw_broadcast() to emit multiple indexed MOV instructions in the generator copying the selected value to all destination registers, but it seemed rather silly to waste EU cycles unnecessarily copying the exact same value 32 times in the GRF.

  The vstride change in the Align16 path is required to avoid assertions in validate_reg() since the change causes the execution size of the MOV and SEL instructions to be equal to the source region width.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Lower 32-wide scratch writes in the generator. | Francisco Jerez | 2016-05-27 | 1 | -6/+24
  The hardware has messages that can write 32 32bit components at once but the channel enable mask gets messed up. We need to split them into several 16-wide scratch writes for the channel enables to be applied correctly. The SIMD lowering pass cannot be used for this because scratch writes are emitted rather late during register allocation long after SIMD lowering has been done.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
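The splitting described in the message above can be pictured as cutting the channel range into pieces of at most 16 channels. The sketch below only computes those pieces; the `ScratchChunk` type and both functions are hypothetical and say nothing about how the generator actually emits the messages.

```cpp
// One piece of a wider scratch write (hypothetical type, for illustration only).
struct ScratchChunk {
    unsigned first_channel;  // index of the first channel covered by this piece
    unsigned width;          // number of channels in this piece
};

// Number of pieces needed to cover exec_size channels with at most max_width each.
constexpr unsigned scratch_chunk_count(unsigned exec_size, unsigned max_width) {
    return (exec_size + max_width - 1) / max_width;
}

// The i-th piece; the last piece may be narrower than max_width.
constexpr ScratchChunk scratch_chunk(unsigned exec_size, unsigned max_width, unsigned i) {
    return ScratchChunk{i * max_width,
                        exec_size - i * max_width < max_width
                            ? exec_size - i * max_width
                            : max_width};
}
```

A 32-wide write with a 16-channel limit therefore becomes two pieces, starting at channels 0 and 16, so each piece can carry its own channel enables.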
* i965/fs: Implement scratch reads and writes of 4 GRFs at a time. | Francisco Jerez | 2016-05-27 | 1 | -0/+4
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Clean up remaining uses of dispatch_width in the generator. | Francisco Jerez | 2016-05-27 | 1 | -6/+7
  Most of these are bugs because the intended execution size of an instruction and the dispatch width of the shader aren't necessarily the same (especially in SIMD32 programs).

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/eu: Remove brw_codegen::compressed and ::compressed_stack. | Francisco Jerez | 2016-05-27 | 1 | -5/+5
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: No need to reset predicate control after emitting some instructions. | Francisco Jerez | 2016-05-27 | 1 | -2/+0
  Trivial clean-up.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Pass current execution size to brw_IF() and brw_DO(). | Francisco Jerez | 2016-05-27 | 1 | -2/+2
  This gets IF and DO instructions working in SIMD32 programs. brw_IF() and brw_DO() should probably behave in the same way as other generator functions that emit control flow instructions and just figure out the right execution size by themselves from the current execution controls specified through the brw_codegen argument. Changing that will require updating lots of Gen4-5 clipper code though, so for the moment just pass the current value redundantly from the FS generator.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Extend region width calculation to allow arbitrary execution sizes. | Francisco Jerez | 2016-05-27 | 1 | -16/+23
  Instead of just halving the execution size when the instruction is compressed hoping that it will give a legal source region width, we can calculate the maximum legal width value in closed form from the component size and stride. This makes sure that brw_reg_from_fs_reg() always returns a valid hardware region even for virtual 32-wide instructions (e.g. send-like instructions) that would seem to exceed the hardware region width limit after halving.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
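One plausible reading of the closed-form calculation mentioned above is that a region row may not extend past a single register, so the width is bounded by how many strided elements fit in one. The sketch below encodes only that reading: it assumes a 32-byte register, ignores compression and sub-register start offsets, and uses hypothetical names, so it should not be taken as the patch's actual code.

```cpp
constexpr unsigned kRegSizeBytes = 32;  // assumed register size in bytes

// Upper bound on the region width for a given execution size, element size
// in bytes, and horizontal stride in elements. A stride of 0 is treated as
// a scalar region of width 1 (an assumption of this sketch).
constexpr unsigned max_region_width(unsigned exec_size, unsigned type_size, unsigned stride) {
    return stride == 0
               ? 1
               : (kRegSizeBytes / (type_size * stride) < exec_size
                      ? kRegSizeBytes / (type_size * stride)
                      : exec_size);
}
```

Under these assumptions a 32-wide instruction on 4-byte elements gets a width of 8, whereas halving the execution size would have suggested 16.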
* i965/fs: Pass the compression mode to brw_reg_from_fs_reg(). | Kenneth Graunke | 2016-05-27 | 1 | -5/+6
  Curro is planning to eliminate p->compressed, so let's avoid using it here and just pass in the value directly.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  [ Francisco Jerez: Pass boolean flag instead of brw_compression enum. ]
  Reviewed-by: Francisco Jerez <currojerez@riseup.net>
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Simplify per-instruction compression control setup in generator. | Francisco Jerez | 2016-05-27 | 1 | -27/+17
  By using the new compression/group control interface. This will allow easier extension to support arbitrary channel enable groups at the IR level.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: No need to set compression control at the top of generate_code(). | Francisco Jerez | 2016-05-27 | 1 | -2/+0
  The right value is dependent on the specific IR instruction being generated so it has to be reset in every iteration of the loop anyway.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/eu: Fix a bunch of compression control bugs in the generator. | Francisco Jerez | 2016-05-27 | 1 | -1/+1
  Most of these were resetting quarter control to zero incorrectly even though everything they needed to do was disable instruction compression -- The brw_SAMPLE() case was doing the right thing but it can be simplified slightly by using the new compression control interface.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Remove FS_OPCODE_PACK_STENCIL_REF virtual instruction. | Francisco Jerez | 2016-05-27 | 1 | -45/+0
  It's just a byte MOV with strided source.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Remove extract virtual opcodes. | Francisco Jerez | 2016-05-27 | 1 | -22/+0
  These can be easily represented in the IR as a MOV instruction with strided source so they seem rather redundant.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Remove manual splitting of DDY ops in the generator. | Francisco Jerez | 2016-05-27 | 1 | -37/+1
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Remove manual unrolling of BFI instructions from the generator. | Francisco Jerez | 2016-05-27 | 1 | -34/+2
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Drop Gen7 CMP SIMD unrolling workaround from the generator. | Francisco Jerez | 2016-05-27 | 1 | -36/+10
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Drop lowering code for a few three-source instructions from the generator. | Francisco Jerez | 2016-05-27 | 1 | -47/+4
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Set default access mode to Align1 for all instructions in the generator. | Francisco Jerez | 2016-05-27 | 1 | -0/+1
  Currently the generator code for most opcodes honours the default access mode (which should typically be Align1 in the scalar back-end), but generate_code() doesn't set it explicitly which means that the access mode from a previous instruction could leak into the following ones if you did something special and weren't careful enough to save and restore the previous access mode.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Remove handcrafted math SIMD lowering from the generator. | Francisco Jerez | 2016-05-27 | 1 | -91/+21
  Most of this wouldn't have worked for SIMD32 and had various dispatch_width and compression control bugs. It's mostly dead now with SIMD lowering of math instructions turned on in the compiler.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Rename Gen4 physical varying pull constant load opcode. | Francisco Jerez | 2016-05-27 | 1 | -5/+5
  For consistency with the Gen7 variant. I'm not doing the same to the uniform pull constant message at this point because the non-GEN7 one is still overloaded to be either an expression-like logical instruction or a Gen4-specific physical send message.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Hide varying pull constant load message setup behind logical opcode. | Francisco Jerez | 2016-05-27 | 1 | -7/+2
  This will allow the SIMD lowering pass to split 32-wide varying pull constant loads (not natively supported by the hardware) into 16-wide instructions.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965/fs: Mark UBO uniform pull constant loads as force_writemask_all. | Francisco Jerez | 2016-05-23 | 1 | -0/+2
  This lets the rest of the backend know that the uniform pull constant load opcodes don't respect channel enables -- Without this the register allocator has no way to know that the return payload of a pull constant load is not per-channel and spills of the destination will be broken under non-uniform control flow.

  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Delete dead dFdy flipping code. | Kenneth Graunke | 2016-05-20 | 1 | -19/+5
  Rob's nir_lower_wpos_ytransform() pass flips dFdy in the opposite case of what I expected, so we always take the negate_value case. It doesn't really matter.

  v2: Write src0 before src1 in ADD instructions (requested by Matt).

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965, anv: Use NIR FragCoord re-center and y-transform passes. | Kenneth Graunke | 2016-05-20 | 1 | -4/+4
  This handles gl_FragCoord transformations and other window system vs. user FBO coordinate system flipping by multiplying/adding uniform values, rather than recompiles.

  This is much better because we have no decent way to guess whether the application is going to use a shader with the window system FBO or a user FBO, much less the drawable height. This led to a lot of recompiles in many applications.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Add infrastructure for sample lod-zero operations. | Matt Turner | 2016-05-19 | 1 | -0/+14
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make brw_reg_from_fs_reg() halve exec_size when compressed. | Kenneth Graunke | 2016-05-17 | 1 | -4/+6
  In a5d7e144eaf43fee37e6ff9e2de194407087632b, Connor generalized the exec_size halving code to handle more cases. As part of this, he made it not halve anything if the region accessed falls completely in a single register. Unfortunately, it started producing some invalid regions:

    -add(16) g6<1>F g10<8,8,1>UW -g1<0,1,0>F { align1 compr };
    -add(16) g8<1>F g12<8,8,1>UW -g1.1<0,1,0>F { align1 compr };
    +add(16) g6<1>F g10<16,16,1>UW -g1<0,1,0>F { align1 compr };
    +add(16) g8<1>F g12<16,16,1>UW -g1.1<0,1,0>F { align1 compr };

  Here, the UW source region completely fits within a register. However, we have to use instruction compression because the destination region spans two registers. <16,16,1> is invalid because it's compressed.

  To handle this, skip the "everything fits in one register" case and fall through to the exec_size halving case when compressed.

  Fixes hundreds of Piglit regressions on GM965.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Move compression decisions before brw_reg_from_fs_reg(). | Kenneth Graunke | 2016-05-17 | 1 | -26/+26
  brw_reg_from_fs_reg() needs to know whether the instruction will be compressed or not.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95370
  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/blorp: Delete the old blorp shader emit code | Jason Ekstrand | 2016-05-14 | 1 | -21/+0
  Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Fix undefined df bits in brw_reg comparisons. | Kenneth Graunke | 2016-05-14 | 1 | -1/+1
  Commit 5310bca024f77da40ea6f4c275455f9cb0528f9e added a new "double df" field to the brw_reg struct, adding an extra 4 bytes of data that isn't usually initialized (or may contain irrelevant garbage if the struct is mutated). This means that it's no longer safe to memcmp().

  Instead, add a brw_regs_equal() function which ignores the extra df bits unless they matter. To keep the implementation cheap, we wrap the first set of fields in a union/struct so that we can use a single DWord comparison.

  v2: Drop unnecessary casts (caught by Francisco Jerez).

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Francisco Jerez <currojerez@riseup.net>
  Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
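The idea of comparing the packed fields in one go and looking at the wide payload only when it is meaningful can be sketched on a simplified stand-in type. Everything here is hypothetical: `SketchReg`, its layout, and the `kIs64BitImm` flag are invented for illustration and do not reflect the real brw_reg definition or the real brw_regs_equal().

```cpp
#include <cstdint>

// Simplified stand-in for a register descriptor (hypothetical layout).
struct SketchReg {
    uint32_t bits;  // all descriptor fields packed into one DWord
    uint64_t imm;   // payload; the upper 32 bits may hold garbage unless 64-bit
};

// Hypothetical flag inside bits marking a 64-bit immediate payload.
constexpr uint32_t kIs64BitImm = 1u << 0;

// Equality that ignores the upper payload bits unless they matter.
constexpr bool sketch_regs_equal(SketchReg a, SketchReg b) {
    return a.bits == b.bits &&
           ((a.bits & kIs64BitImm)
                ? a.imm == b.imm
                : static_cast<uint32_t>(a.imm) == static_cast<uint32_t>(b.imm));
}
```

A bytewise comparison of two such values would report a mismatch whenever the unused upper bits differ, which is the hazard the message above describes for memcmp().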
* i965/fs: extend exec_size halving in the generator | Connor Abbott | 2016-05-10 | 1 | -6/+10
  The HW has a restriction that only vertical stride may cross register boundaries. Previously, this only mattered for SIMD16 instructions where we needed to use the same regioning parameters as the equivalent SIMD8 instruction but double the exec size. But we need to do the same splitting for 64-bit instructions as well as instructions with a stride of 2 (which effectively consume 64 bits per element). Fix up the code to do the right thing instead of special-casing SIMD16.

  Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
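A rough way to see which of the cases listed above need splitting is to count the bytes a single row of the region would cover. The sketch below assumes a 32-byte register and a region that starts at the beginning of a register; the names are hypothetical.

```cpp
constexpr unsigned kRegBytes = 32;  // assumed register size in bytes

// True when exec_size elements of type_size bytes, spaced stride elements
// apart, would cover more than one register if laid out as a single row.
constexpr bool region_spans_multiple_regs(unsigned exec_size, unsigned type_size, unsigned stride) {
    return exec_size * stride * type_size > kRegBytes;
}
```

With these assumptions, SIMD8 on 32-bit elements fits in one register, while SIMD8 on 64-bit elements, SIMD8 on 32-bit elements with a stride of 2, and SIMD16 on 32-bit elements all exceed it, which matches the three cases the message names.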