summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_nir.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Don't use nir_assign_var_locations for VS/TES/GS outputs.Kenneth Graunke2016-10-271-9/+4
| | | | | | | | | | | | Fixes spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3. v2: Remove nir_outputs field from fs_visitor (caught by Tim and Iago). Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com> (cherry picked from commit 59864e8e02057cc6fa0448a8af067a3cf53389da)
* i965: Silence unused parameter warningsIan Romanick2016-10-171-1/+0
| | | | | | | | | | | | | | | | | | | | brw_link.cpp:76:44: warning: unused parameter ‘shader_type’ [-Wunused-parameter] gl_shader_stage shader_type, ^ brw_nir.c: In function ‘brw_nir_lower_vs_inputs’: brw_nir.c:194:55: warning: unused parameter ‘devinfo’ [-Wunused-parameter] const struct gen_device_info *devinfo, ^ brw_vec4_visitor.cpp:914:37: warning: unused parameter ‘sampler’ [-Wunused-parameter] uint32_t sampler, ^ brw_vec4_visitor.cpp:1146:34: warning: unused parameter ‘stream_id’ [-Wunused-parameter] vec4_visitor::gs_emit_vertex(int stream_id) ^ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Engestrom <eric@engestrom.ch>
* nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.Eric Anholt2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract. This pass seems like it was was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks. For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does looks like this may be useful to replace freedreno code. Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4. vc4 shader-db: total instructions in shared programs: 101282 -> 99543 (-1.72%) instructions in affected programs: 17365 -> 15626 (-10.01%) total uniforms in shared programs: 31295 -> 31172 (-0.39%) uniforms in affected programs: 3580 -> 3457 (-3.44%) total estimated cycles in shared programs: 225182 -> 223746 (-0.64%) estimated cycles in affected programs: 26085 -> 24649 (-5.51%) v2: Update shader-db output. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
* i965/nir: Roll set_default_interpolation into lower_fs_inputsJason Ekstrand2016-09-151-0/+24
| | | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Use NIR for handling forced per-sample interpolationJason Ekstrand2016-09-151-2/+7
| | | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* nir: Add a flag to lower_io to force "sample" interpolationJason Ekstrand2016-09-151-9/+9
| | | | | | Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* nir: Report progress from nir_lower_phis_to_scalar.Kenneth Graunke2016-09-141-1/+1
| | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* nir: Report progress from nir_lower_alu_to_scalar.Kenneth Graunke2016-09-141-1/+1
| | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-3/+3
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* i965/fs: Rework representation of fragment output locations in NIR.Francisco Jerez2016-08-251-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem with the current approach is that driver output locations are represented as a linear offset within the nir_outputs array, which makes it rather difficult for the back-end to figure out what color output and index some nir_intrinsic_load/store_output was meant for, because the offset of a given output within the nir_output array is dependent on the type and size of all previously allocated outputs. Instead this defines the driver location of an output to be the pair formed by its GLSL-assigned location and index (I've borrowed the bitfield macros from brw_defines.h in order to represent the pair of integers as a single scalar value that can be assigned to nir_variable_data::driver_location). nir_assign_var_locations is no longer useful for fragment outputs. Because fragment outputs are now allocated independently rather than within the nir_outputs array, the get_frag_output() helper becomes necessary in order to obtain the right temporary register for a given location-index pair. The type_size helper passed to nir_lower_io is now type_size_dvec4 rather than type_size_vec4_times_4 so that output array offsets are provided in terms of whole array elements rather than in terms of scalar components (dvec4 is the largest vector type supported by the GLSL so this will cause all individual fragment outputs to have a size of one regardless of the type). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/nir: Enable NIR lowering of txf and rect offsetsJason Ekstrand2016-07-221-0/+2
| | | | | | | | | | | | | | This fixes the following piglit tests on gen6+: tex-miplevel-selection textureProjGradOffset 2DRect tex-miplevel-selection textureGradOffset 2DRect tex-miplevel-selection textureGradOffset 2DRectShadow tex-miplevel-selection textureProjGradOffset 2DRect_ProjVec4 tex-miplevel-selection textureProjGradOffset 2DRectShadow Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: "12.0" <mesa-dev@lists.freedesktop.org>
* i965: enable component packing for vs and fsTimothy Arceri2016-07-211-4/+4
| | | | | | | | | | | Rather than trying to work out the total number of components used at a location we simply treat all outputs as vec4s. This removes the need for complex code looping over varyings to match packed locations and the need for storing the total number of components used at each location. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Rewrite FS input handling to use the new NIR intrinsics.Kenneth Graunke2016-07-201-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This eliminates the need to walk the list of input variables, recurse into their types (via logic largely redundant with nir_lower_io), and interpolate all possible inputs up front. The backend no longer has to care about variables at all, which eliminates complications from trying to pack multiple variables into the same location. Instead, each intrinsic specifies exactly what's needed. This should unblock Timothy's work on GL_ARB_enhanced_layouts. Each load_interpolated_input intrinsic corresponds to PLN instructions, while load_barycentric_at_* intrinsics correspond to pixel interpolator messages. The pixel/centroid/sample barycentric intrinsics simply refer to payload fields (delta_xy[]), and don't actually generate any code. Because we use a single intrinsic for both centroid-qualified variables and interpolateAtCentroid(), they become indistinguishable. We stop sending pixel interpolator messages for those, and instead use the payload provided data, which should be considerably faster. On Broadwell: total instructions in shared programs: 9067751 -> 9067570 (-0.00%) instructions in affected programs: 145902 -> 145721 (-0.12%) helped: 422 HURT: 209 total spills in shared programs: 2849 -> 2899 (1.76%) spills in affected programs: 760 -> 810 (6.58%) helped: 0 HURT: 10 total fills in shared programs: 3910 -> 3950 (1.02%) fills in affected programs: 617 -> 657 (6.48%) helped: 0 HURT: 10 LOST: 3 GAINED: 3 The differences mostly appear to be slight changes in MOVs. v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics flag rather than passing it directly to nir_lower_io. Use the unreachable() macro rather than assert in one place. (Review feedback from Chris Forbes.) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisforbes@google.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>
* nir: use the same driver location for packed varyingsTimothy Arceri2016-07-071-3/+5
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: remove type_size_vec4_times_4()Timothy Arceri2016-06-151-2/+2
| | | | | | | | type_size_vec4_times_4() was introduced as a fix in 8dcf807cb43383 however since 3810c1561 we can just use type_size_scalar() and get the actual number of outputs we need. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Move brw_create_nir to brw_program.cJason Ekstrand2016-05-261-70/+0
| | | | | | | | This way it's no longer part of libi965_compiler.la since it depends on GLSL and ARB program stuff. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965/nir: Move the type_size_*_bytes functions to brw_nir.hJason Ekstrand2016-05-261-12/+0
| | | | | Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* compiler: Move glsl_to_nir to libglsl.laJason Ekstrand2016-05-261-1/+1
| | | | | | | | Right now libglsl.la depends on libnir.la so putting it in libnir.la adds a dependency on libglsl.la that goes the wrong direction. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Invoke lowering pass for YUV texturesKristian Høgsberg Kristensen2016-05-241-0/+4
| | | | Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
* nir: remove dead glsl variables before lowering io.Dave Airlie2016-05-211-0/+1
| | | | | | | | | | | | | For cull distance GLSL will let unsized unused arrays get into the backend, we should nuke those straight away, to save caring about them later. This fixes: arb_separate_shader_objects/linker/large-number-of-unused-varyings as a side effect (even without culling changes). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
* i965, anv: Use NIR FragCoord re-center and y-transform passes.Kenneth Graunke2016-05-201-0/+13
| | | | | | | | | | | | | | This handles gl_FragCoord transformations and other window system vs. user FBO coordinate system flipping by multiplying/adding uniform values, rather than recompiles. This is much better because we have no decent way to guess whether the application is going to use a shader with the window system FBO or a user FBO, much less the drawable height. This led to a lot of recompiles in many applications. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: take care of doubles when lowering VS inputsJuan A. Suarez Romero2016-05-171-1/+2
| | | | | | | Input attributes can require 2 vec4 or 1 vec4 depending on whether they are double-precision or not. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: take care of doubles when remapping VS attributesJuan A. Suarez Romero2016-05-171-15/+11
| | | | | | | Double-precision types require 1 slot in VUE for double and dvec2, and 2 slots for anything else. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* nir/lower-io: add support for lowering inputsRob Clark2016-05-111-1/+3
| | | | | Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* nir: rename lower_outputs_to_temporaries -> lower_io_to_temporariesRob Clark2016-05-111-1/+1
| | | | | | Since it will gain support to lower inputs, give it a more generic name. Signed-off-by: Rob Clark <robclark@freedesktop.org>
* nir: move callsite of lower_outputs_to_temporariesRob Clark2016-05-111-0/+1
| | | | | | | | | | Going to convert this pass to parameterized lower_io_to_temporaries, and we want the user to be able to specify whether to lower outputs or inputs or both. The restriction of running this pass before validate to avoid output reads no longer applies. Signed-off-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: use pack/unpackDouble loweringConnor Abbott2016-05-101-0/+1
| | | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* i965: use double lowering passConnor Abbott2016-05-101-0/+9
| | | | | | | | | | v2: also lower trunc, ceil, floor, fract and roundEven (Iago) v3: also lower mod for doubles (Sam) Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* nir: Switch the arguments to nir_foreach_functionJason Ekstrand2016-04-281-6/+6
| | | | | | | | | This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression: s/nir_foreach_function(\([^,]*\),\s*\([^,]*\))/nir_foreach_function(\2, \1)/ Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* nir: Switch the arguments to nir_foreach_instrJason Ekstrand2016-04-281-4/+4
| | | | | | | | | | | This matches the "foreach x in container" pattern found in many other programming languages. Generated by the following regular expression: s/nir_foreach_instr(\([^,]*\),\s*\([^,]*\))/nir_foreach_instr(\2, \1)/ and similar expressions for nir_foreach_instr_safe etc. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/nir: fixup for new foreach_block()Connor Abbott2016-04-281-51/+39
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* nir: rename nir_foreach_block*() to nir_foreach_block*_call()Connor Abbott2016-04-201-5/+5
| | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* nir/dead_variables: Configurably work with any variable modeJason Ekstrand2016-04-131-1/+1
| | | | | | | The old version of the pass only worked on globals and locals and always left inputs, outputs, uniforms, etc. alone. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Port INTEL_PRECISE_TRIG=1 to NIR.Kenneth Graunke2016-04-111-0/+3
| | | | | | | | | | | | | | | | | | | | This makes the extra multiply visible to NIR's algebraic optimizations (for constant reassociation) as well as constant folding. This means that when the result of sin/cos are multiplied by an constant, we can eliminate the extra multiply altogether, reducing the cost of the workaround. It also means we only have to implement it one place, rather than in both backends. This makes INTEL_PRECISE_TRIG=1 cost nothing on GPUTest/Volplosion, which has a ton of sin() calls, but always multiplies them by an immediate constant. The extra multiply gets folded away. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Pass brw_compiler into brw_preprocess_nir() instead of is_scalar.Kenneth Graunke2016-04-111-2/+4
| | | | | | | | | | I want to be able to read other fields. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: rename nir_const_value fields to include bitsize informationIago Toral Quiroga2016-03-171-2/+2
| | | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
* nir: update opcode definitions for different bit sizesConnor Abbott2016-03-171-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some opcodes need explicit bitsizes, and sometimes we need to use the double version when constant folding. v2: fix output type for u2f (Iago) v3: do not change vecN opcodes to be float. The next commit will add infrastructure to enable 64-bit integer constant folding so this is isn't really necessary. Also, that created problems with source modifiers in some cases (Iago) v4 (Jason): - do not change bcsel to work in terms of floats - leave ldexp generic Squashed changes to handle different bit sizes when constant folding since otherwise we would break the build. v2: - Use the bit-size information from the opcode information if defined (Iago) - Use helpers to get type size and base type of nir_alu_type enum (Sam) - Do not fallback to sized types to guess bit-size information. (Jason) Squashed changes in i965 and gallium/nir drivers to support sized types. These functions should only see sized types, but we can't make that change until we make sure that nir uses the sized versions in all the relevant places. A later commit will address this. Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/nir: Lower nir compute shader shared variablesJordan Justen2016-03-171-0/+8
| | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* i965: Simplify brw_nir_lower_vue_inputs() slightly.Kenneth Graunke2016-02-261-11/+6
| | | | | | | The same code appeared in both branches; pull it above the if statement. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Avoid recalculating the normal VUE map for IO lowering.Kenneth Graunke2016-02-261-24/+3
| | | | | | | | The caller already computes it. Now that we have stage specific functions, it's really easy to pass this in. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Avoid recalculating the tessellation VUE map for IO lowering.Kenneth Graunke2016-02-261-11/+8
| | | | | | | | The caller already computes it. Now that we have stage specific functions, it's really easy to pass this in. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Eliminate brw_nir_lower_{inputs,outputs,io} functions.Kenneth Graunke2016-02-261-81/+7
| | | | | | | | | | | | Now that each stage is directly calling brw_nir_lower_io(), and we have per-stage helper functions, it makes sense to just call the relevant one directly, rather than going through multiple switch statements. This also eliminates stupid function parameters, such as the two that only apply to vertex attributes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Split brw_nir_lower_inputs/outputs into per-stage functions.Kenneth Graunke2016-02-261-130/+174
| | | | | | | | These functions are both giant switch statements where most cases don't overlap at all. Let's put the bulk of the work in per-stage helpers. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Remove catch-all nir_lower_io call with specific cases.Kenneth Graunke2016-02-261-1/+4
| | | | | | | | | Most cases already call nir_lower_io explicitly for input and output lowering. This catch all isn't very useful anymore - we can just add it to the remaining cases. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Move optimizations from brw_nir_lower_io to brw_postprocess_nir.Kenneth Graunke2016-02-261-1/+3
| | | | | | | | | | This simplifies things. Every caller of brw_nir_lower_io() immediately calls brw_postprocess_nir(). The only real change this will have is that we get an extra brw_nir_optimize() call when compiling compute shaders, but that seems fine. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Always do NIR IO lowering at specialization time.Kenneth Graunke2016-02-261-8/+0
| | | | | | | | | | | | | | We've now hit literally every case other than geometry shaders (and compute shaders, but those are a no-op). So, let's just move geometry shaders over too and be done with it. The only advantage to doing this at link time was to save the expense of running the pass on recompiles. But we're already running a lot of passes, and the extra code complexity isn't worth it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/nir: Do lower_io late for fragment shadersJason Ekstrand2016-02-261-1/+2
| | | | | | | | | | | | The Vulkan driver wants to be able to delete fragment outputs that are beyond key.nr_color_regions; this is a lot easier if we lower outputs at specialization time rather than link time. (Rationale added to commit message by Ken) Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Apply VS attribute workarounds in NIR.Kenneth Graunke2016-02-091-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch re-implements the pre-Haswell VS attribute workarounds. Instead of emitting shader code in the vec4 backend, we now simply call a NIR pass to emit the necessary code. This simplifies the vec4 backend. Beyond deleting code, it removes the primary use of ATTR as a destination. It also eliminates the requirement that the vec4 VS backend express the ATTR file in terms of VERT_ATTRIB_* locations, giving us a bit more flexibility. This approach is a little different: rather than munging the attributes at the top, we emit code to fix them up when they're accessed. However, we run the optimizer afterwards, so CSE should eliminate the redundant math. It may even be able to fuse it with other calculations based on the input value. shader-db does not handle non-default NOS settings, so I have no statistics about this patch. Note that the scalar backend does not implement VS attribute workarounds, as they are unnecessary on hardware which allows SIMD8 VS. v2: Do one multiply for FIXED rescaling and select components from either the original or scaled copy, rather than multiplying each component separately (suggested by Matt Turner). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
* i965: Use nir_lower_load_const_to_scalar().Kenneth Graunke2016-02-081-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I don't know why, but we never hooked up this pass Eric wrote. Otherwise, you can end up with stupid scalarized code such as: vec4 ssa_7 = load_const (0.0, 0.0, 0.0, 0.0) vec4 ssa_8 = ... vec1 ssa_9 = feq ssa_8, ssa_7 vec1 ssa_10 = feq ssa_8.y, ssa_7.y vec1 ssa_11 = feq ssa_8, ssa_7.z vec1 ssa_12 = feq ssa_8.y, ssa_7.w ssa_8.xyxy == <0, 0, 0, 0> should only take two feq instructions. shader-db on Skylake: total instructions in shared programs: 9121153 -> 9120749 (-0.00%) instructions in affected programs: 32421 -> 32017 (-1.25%) helped: 277 HURT: 69 total cycles in shared programs: 69003364 -> 69000912 (-0.00%) cycles in affected programs: 899186 -> 896734 (-0.27%) helped: 313 HURT: 403 This also prevents regressions when disabling channel expressions. v2: Don't call opt_cse afterwards (requested by Matt). It should happen in the optimization loop below anyway. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
* nir: move to compiler/Emil Velikov2016-01-261-2/+2
| | | | | | Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Matt Turner <mattst88@gmail.com> Acked-by: Jose Fonseca <jfonseca@vmware.com>