summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri
Commit message (Collapse)AuthorAgeFilesLines
* i965/gen6: Set up layer constraints properly for depth buffers.Kenneth Graunke2015-07-101-1/+5
| | | | | | | | | | This ports over Chris Forbes' equivalent fixes in gen7_misc_state.c from commit 77d55ef4819436ebbf9786a1e720ec00707bbb19. No Piglit changes on Sandybridge. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Label the repclear shader "meta repclear" rather than "meta clear".Kenneth Graunke2015-07-101-1/+1
| | | | | | | | | | | | | | Color clears can be performed via two separate shaders - one is the generic "meta clear" shader (in meta.c); the other is the i965 specific "repclear" shader (in brw_meta_fast_clear.c). Giving them separate names makes them distinguishable when reading INTEL_DEBUG=shader_time output. v2: Call it "meta repclear", as suggested by Jason. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965: Fix indentation in emit_control_data_bits().Kenneth Graunke2015-07-101-72/+70
| | | | | | | The last patch left the code indented too far. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/gs: Move vertex_count != 0 check up a level; skip one caller.Kenneth Graunke2015-07-101-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Paul's original code had emit_control_data_bits() skip the URB write if vertex_count was 0. This meant wrapping every control data write in a conditional write. We accumulate control data bits in a single UD (32-bit) register. For simple shaders that don't emit many vertices, the control data header will be <= 32-bits long, so we only need to write it once at the end of the shader. For shaders with larger headers, we write out batches of control data bits at EmitVertex(), when (vertex_count * bits_per_vertex) % 32 == 0. On the first EmitVertex() call, the above expression will evaluate to true simply because vertex_count == 0. But we want to avoid emitting the control data bits, because we haven't accumulated 32-bits worth yet. In other words, the vertex_count != 0 check is really only necessary in the EmitVertex() batching case, not the end-of-thread case. This saves a CMP/IF/ENDIF in every shader that uses EndPrimitive() or multiple streams. The only downside is that a shader which emits no vertices at all will execute an additional URB write---but such shaders are pointless and not worth optimizing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vs: Get rid of brw_vs_compile completely.Kenneth Graunke2015-07-093-40/+31
| | | | | | | | | After tearing it out another level or two, and just passing the key and vp directly, we can finally remove this struct. It also eliminates a pointless memcpy() of the key. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/vs: Remove 'c'/vs_compile from vec4_vs_visitor.Kenneth Graunke2015-07-094-15/+15
| | | | | | | | | At this point, the brw_vs_compile structure only contains the key and gl_vertex_program pointer. We may as well pass and store them directly; it's simpler and more convenient (key-> instead of vs_compile->key...). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/vec4: Move c->last_scratch into vec4_visitor.Kenneth Graunke2015-07-098-22/+15
| | | | | | | | | | | | Nothing outside of vec4_visitor uses it, so we may as well keep it internal. Commit db9c915abcc5ad78d2d11d0e732f04cc94631350 for the vec4 backend. (The empty class will be going away soon.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/vec4: Move total_scratch calculation into the visitor.Kenneth Graunke2015-07-093-10/+7
| | | | | | | | This is more consistent with how we do it in the FS backend, and reduces a tiny bit of duplication. It'll also allow for a bit more tidying. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/vec4: Move perf_debug about register spilling into the visitor.Kenneth Graunke2015-07-093-11/+13
| | | | | | | | | | | | This patch makes us only issue the performance warning about register spilling if we actually spilled registers. We also use scratch space for indirect addressing and the like. This is basically commit c51163b0cf7aff0375b1a5ea4cb3da9d9e164044 for the vec4 backend. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/vec4: Plumb log_data through so the backend_shader field gets set.Kenneth Graunke2015-07-098-8/+18
| | | | | | | | | | Jason plumbed this through a while back in the FS backend, but apparently we were just passing NULL in the vec4 backend. This patch passes brw in as intended. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965: Switch on shader stage in nir_setup_outputs().Kenneth Graunke2015-07-091-26/+33
| | | | | | | | | Adding new shader stages to a switch statement is less confusing than an if-else-if ladder where all but the first case are fragment shader specific (but don't claim to be). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965: Set brw->batch.emit only #ifdef DEBUG.Matt Turner2015-07-092-1/+3
| | | | | | | | | | | | | | | It's only used inside #ifdef DEBUG. Cuts ~1.7k of .text, and more importantly prevents a larger code size regression in the next commit when the .used field is replaced and calculated on demand. text data bss dec hex filename 4945468 195152 26192 5166812 4ed6dc i965_dri.so before 4943740 195152 26192 5165084 4ed01c i965_dri.so after And surround the emit and total fields with #ifdef DEBUG to prevent such mistakes from happening again. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/hsw: Implement end of batch workaroundBen Widawsky2015-07-092-2/+29
| | | | | | | | | | | | | | | | | | | | This patch can cause an infinite recursion if the previous patch titled, "i965: Track finished batch state" isn't present (backporters take notice). v2: Sent out the wrong patch originally. This patches switches the order of flushes, doing the generic flush before the CC_STATE, and the required workaround flush afterwards v3: Only perform workaround for render ring Add text to the BATCH_RESERVE comments v4 (By Ken): Rebase; update citation to mention PRM and Wa name; combine two blocks. http://otc-mesa-ci.jf.intel.com/job/bwidawsk/171/ Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move pipecontrol workaround bo to brw_pipe_controlChris Wilson2015-07-086-37/+64
| | | | | | | | | | With the exception of gen8, the sole user of the workaround bo are for emitting pipe controls. Move it out of the purview of the batchbuffer and into the pipecontrol. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
* i965: Query whether we have kernel support for the TIMESTAMP register onceChris Wilson2015-07-083-5/+25
| | | | | | | | | | | | | | | | Move the query for the TIMESTAMP register from context init to the screen, so that it is only queried once for all contexts. On 32bit systems, some old kernels trigger a hw bug resulting in the TIMESTAMP register being shifted and the low 32bits always zero. Detect this by repeating the read a few times and check the register is incrementing every 80ns as expected and not stuck on zero (as would be the case with the buggy kernel/hw.). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Martin Peres <martin.peres@linux.intel.com> Reviewed-by: Martin Peres <martin.peres@linux.intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vs: Fix matNxM vertex attributes where M != 4.Kenneth Graunke2015-07-071-4/+11
| | | | | | | | | | | | | Matrix vertex attributes have their columns padded out to vec4s, which I was failing to account for. Scalar NIR expects them to be packed, however. Fixes 1256 dEQP tests on Broadwell. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965/gen4-5: Enable 16-wide dispatch on shaders with control flow.Francisco Jerez2015-07-071-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was probably disabled due to a combination of several bugs in the generator code (fixed earlier in this series) and a misunderstanding of the hardware spec. The documentation for most control flow instructions mentions among other restrictions: "Instruction compression is not allowed." This however doesn't have any implications on 16 wide not being supported, because none of the control flow instructions have multi-register operands (control flow instructions are not compressed on more recent hardware either, except maybe SNB's IF with inline compare). In fact Gen4-5 had 16-wide control flow masks and stacks, and the spec mentions in several places that control flow instructions push and pop 16 channels worth of data -- Otherwise there doesn't seem to be any indication that it shouldn't work. Causes no piglit regressions, and gives the following shader-db results on ILK: total instructions in shared programs: 4711384 -> 4711384 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 GAINED: 1215 LOST: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen4-5: Program the execution size correctly for DO/WHILE instructions.Francisco Jerez2015-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | From the hardware docs for the DO instruction: "Execution size is ignored for this instruction." My observation on ILK hardware contradicts the spec though, channels over the execution size of a DO instruction won't enter the loop, and channels over the execution size of a WHILE instruction will exit the loop after the first iteration -- The latter is consistent with the spec though, there's no claim about the execution size being ignored for the WHILE instruction so it's not completely unexpected that it has an influence on the evaluation of EMask. The execute_size argument of brw_DO() shouldn't have any effect on Gen6 and newer hardware. On Gen4-5 WHILE instructions inherit the execution size from the matching DO, so this patch should fix them too. The execution size of BREAK and CONT instructions was already being set correctly. Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with conditional and loop instructions 16-wide, e.g. shaders/glsl-fs-continue-inside-do-while. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen4-5: Set ENDIF dst and src0 fields to the null register.Francisco Jerez2015-07-071-2/+2
| | | | | | | | | | | | | | The hardware docs don't mention explicitly what these fields should be, but I've verified experimentally on ILK that using a GRF as destination causes the register to be corrupted when the execution size of an ENDIF instruction is higher than 8 -- and because the destination we were using was g0, eventually a hang. Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Reserve more batch space to accomodate Gen6 perfmonitors.Kenneth Graunke2015-07-061-2/+2
| | | | | | | | | | | | | | | Ben noticed that I said each PIPE_CONTROL was 4 DWords, but it's actually 5 DWords on Gen6-7. We've been reserving insufficient space for performance monitoring on Sandybridge, which means it would likely break if you used that functionality. (Thankfully, no one does...) Also, the existing number of 146 was the result of me flubbing up the arithmetic: it should have actually been 140. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* i965/skl: Set the pulls bary bit in 3DSTATE_PS_EXTRANeil Roberts2015-07-064-0/+9
| | | | | | | | | | | On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if the shader sends a message to the pixel interpolator. This fixes the interpolateAt* tests on SKL, apart from interpolateatsample-nonconst but that is not implemented anywhere so it's not a regression. Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
* dri/common: allow BGRX sRGB visualsMarek Olšák2015-07-031-0/+1
|
* i965/fs: Don't disable SIMD16 when using the pixel interpolatorNeil Roberts2015-07-031-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a comment saying that in SIMD16 mode the pixel interpolator returns coords interleaved 8 channels at a time and that this requires extra work to support. However, this interleaved format is exactly what the PLN instruction requires so I don't think anything needs to be done to support it apart from removing the line to disable it and to ensure that the message lengths for the send message are correct. I am more convinced that this is correct because as it says in the comment this interleaved output is identical to what is given in the thread payload. The code generated to apply the plane equation to these coordinates is identical on SIMD16 and SIMD8 except that the dispatch width is larger which implies no special unmangling is needed. Perhaps the confusion stems from the fact that the description of the PLN instruction in the IVB PRM seems to imply that the src1 inputs are not interleaved so it wouldn't work. However, in the HSW and BDW PRMs, the pseudo-code is different and looks like it expects the interleaved format. Mesa doesn't seem to generate different code on IVB to uninterleave the payload registers and everything is working so I can only assume that the PRM is wrong. I tested the interpolateAt tests on HSW and did a full Piglit run on IVB on there were no regressions. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: allocate at least 1 BLEND_STATE elementMike Stroyan2015-07-021-1/+1
| | | | | | | | | | | | | | | | When there are no color buffer render targets, gen6 and gen7 still use the first BLEND_STATE element to determine alpha test. gen6_upload_blend_state was allocating zero elements when ctx->Color.AlphaEnabled was false. That left _3DSTATE_CC_STATE_POINTERS or _3DSTATE_BLEND_STATE_POINTERS pointing to random data from some previous brw_state_batch(). That sometimes suppressed depth rendering when those bits happened to mean COMPAREFUNC_NEVER. This produced flickering shadows for dota2 reborn. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80500 Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen9: use an unreserved surface alignment valueNanley Chery2015-07-011-4/+4
| | | | | | | | | | | Although the horizontal and vertical alignment fields are ignored here, 0 is a reserved value for them and may cause undefined behavior. Change the default value to an abitrary valid one. v2: add comment about chosen value (Topi). Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
* i965/fs: Use the builder directly for the gen6 interpolation add(32)Jason Ekstrand2015-07-011-6/+5
| | | | | | | Now that we can create builders with a bigger width than their parent as long as it's exec_all, we don't need to create the instruction manually. Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Relax fs_builder channel group assertion when force_writemask_all ↵Francisco Jerez2015-07-013-7/+7
| | | | | | | | | | | | | | | | | | | is on. This assertion was meant to catch code inadvertently escaping the control flow jail determined by the group of channel enable signals selected by some caller, however it seems useful to be able to increase the default execution size as long as force_writemask_all is enabled, because force_writemask_all is an explicit indication that there is no longer a one-to-one correspondence between channels and SIMD components so the restriction doesn't apply. In addition reorder the calls to fs_builder::group and ::exec_all in a couple of places to make sure that we don't temporarily break this invariant in the future for instructions with exec_size higher than the dispatch width. Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* nouveau: rename var name for nouveau_vieux to avoid conflict with nouveauIlia Mirkin2015-07-011-2/+2
| | | | | | | | | | | | We want to require different versions for nouveau and nouveau_vieux. autoconf will only check for NOUVEAU once if both drivers are enabled, meaning both version checks don't get executed. Rename the nouveau_vieux one to NVVIEUX to avoid the issue. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Alexandre Courbot <acourbot@nvidia.com> Tested-by: Martin Peres <martin.peres@free.fr> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
* i965/fs: Fix PIXEL_X/Y in regs_read()Jason Ekstrand2015-06-301-1/+1
| | | | PIXEL_X/Y takes a vec2 in the first argument
* i965/fs: Remove the width field from fs_regJason Ekstrand2015-06-308-107/+30
| | | | | | | | | | | | | As of now, the width field is no longer used for anything. The width field "seemed like a good idea at the time" but is actually entirely redundant with the instruction's execution size. Initially, it gave us the ability to easily set the instructions execution size based entirely on register widths. With the builder, we can easiliy set the sizes explicitly and the width field doesn't have as much purpose. At this point, it's just redundant information that can get out of sync so it really needs to go. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs_generator: Use inst->exec_size for determining hardware reg widthsJason Ekstrand2015-06-301-7/+7
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Use exec_size instead of dst.width for computing component sizeJason Ekstrand2015-06-305-8/+8
| | | | | | | | There are a variety of places where we use dst.width / 8 to compute the size of a single logical channel. Instead, we should be using exec_size. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Use the builder dispatch_width for computing register offsetsJason Ekstrand2015-06-301-1/+1
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Use the builder dispatch width instead of dst.width for pull constantsJason Ekstrand2015-06-301-4/+4
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Remove exec_size guessing from fs_inst::init()Jason Ekstrand2015-06-301-22/+0
| | | | | | | | Now that all of the non-explicit constructors are gone, we don't need to guess anymore. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs_builder: Use the dispatch width for setting exec sizesJason Ekstrand2015-06-301-9/+11
| | | | | | | Previously we used dst.width but the two *should* be the same. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Use exec_size for determining regs read/written and partial writesJason Ekstrand2015-06-301-3/+3
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Remove fs_inst constructors that don't take an explicit exec_sizeJason Ekstrand2015-06-304-39/+8
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Make better use of the builder in shader_timeJason Ekstrand2015-06-301-6/+8
| | | | | | | | | Previously, we were just depending on register widths to ensure that various things were exec_size of 1 etc. Now, we do so explicitly using the builder. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Add a builder argument to offset()Jason Ekstrand2015-06-307-123/+132
| | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Move offset(fs_reg, unsigned) to brw_fs.hJason Ekstrand2015-06-302-21/+21
| | | | | | | | Shortly, offset() will depend on the builder so we need it moved to some place where it has access to that. Reviewed-by: Iago Toral Quiroga <itoral@igali.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/blorp: Explicitly set execution sizes for new'd instructionsJason Ekstrand2015-06-301-4/+5
| | | | | | | | This doesn't affect instructions allocated using the builder. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Set the builder group for emitting FB-write stencil/AA alphaJason Ekstrand2015-06-301-1/+1
| | | | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Explicitly set the exec_size on the add(32) in interpolation setupJason Ekstrand2015-06-301-4/+6
| | | | | | | | | | | | | | | | | Soon we will start using the builder to explicitly set all the execution sizes. We could make a 32-wide builder, but the builder asserts that we never grow it which is usually a reasonable assumption. Since this one instruction is a bit of an odd-ball, we just set the exec_size explicitly. v2: Explicitly new the fs_inst instead of using the builder and setting exec_size after the fact. v3: Set force_writemask_all with the builder instead of directly. The builder over-writes it if we set it manually. Also, if we don't have force_writemask_all in the builder it will assert-fail on SIMD32. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Properly handle LOAD_PAYLOAD in fs_inst::regs_readJason Ekstrand2015-06-301-0/+5
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Report the right value in fs_inst::regs_read() for PIXEL_X/YJason Ekstrand2015-06-301-2/+9
| | | | | | Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Fix fs_inst::regs_read() for uniform pull constant loadsJason Ekstrand2015-06-301-0/+6
| | | | | | | | | | | | Previously, fs_inst::regs_read() fell back to depending on the register width for the second source. This isn't really correct since it isn't a SIMD8 value at all, but a SIMD4x2 value. This commit changes it to explicitly be always one register. v2: Use mlen for determining the number of registers read Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Acked-by: Francisco Jerez <currojerez@riseup.net>
* i965/fs: Actually set/use the mlen for gen7 uniform pull constant loadsJason Ekstrand2015-06-302-13/+15
| | | | | | | | | Previously, we were allocating the payload with different sizes per gen and then figuring out the mlen in the generator based on gen. This meant, among other things, that the higher level passes knew nothing about it. Acked-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Use a switch statement in fs_inst::regs_read()Jason Ekstrand2015-06-301-22/+23
| | | | | | | | This makes things a little simpler, more efficient, and quite a bit more readable. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
* i965/fs: emit constants only onceConnor Abbott2015-06-302-13/+16
| | | | | | | | | | | | | | | | | | | Before, we would lazily emit a MOV whenever we encountered a use of a constant. Now that we have a dedicated file for SSA values, we can instead only emit the MOV's once, which is more consistent and prevents us from relying on CSE to re-combine the constants when they aren't absorbed into the instruction. total instructions in shared programs: 6078991 -> 6073118 (-0.10%) instructions in affected programs: 402221 -> 396348 (-1.46%) helped: 1527 HURT: 0 GAINED: 8 LOST: 2 v2: split this out from the previous commit (Jason) Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>