summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965
Commit message (Collapse)AuthorAgeFilesLines
* i965/peephole_ffma: Only match a mul+add if none of the ops are exactJason Ekstrand2016-03-281-0/+11
|
* i965: Allow mul+add fusing againJason Ekstrand2016-03-251-1/+1
|
* i965/vec4: Get rid of a stray predicate inverse in opquantizef16Jason Ekstrand2016-03-251-1/+0
| | | | This fixes 30 opquantize CTS tests on HSW
* Merge remote-tracking branch 'public/master' into vulkanJason Ekstrand2016-03-2430-172/+262
|\
| * mesa: replace gl_context->Multisample._Enabled with ↵Bas Nieuwenhuizen2016-03-246-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | _mesa_is_multisample_enabled. This removes any dependency on driver validation of the number of framebuffer samples. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Brian Paul <brianp@vmware.com>
| * i965/peephole_ffma: Don't fuse exact addsJason Ekstrand2016-03-231-1/+3
| | | | | | | | Reviewed-by: Francisco Jerez <currojerez@riseup.net>
| * i965/fs: Don't constant-fold RCPJason Ekstrand2016-03-221-15/+0
| | | | | | | | | | | | No shader-db changes on Broadwell Reviewed-by: Matt Turner <mattst88@gmail.com>
| * i965: Remove the RCP+RSQ algebraic optimizationsJason Ekstrand2016-03-222-22/+0
| | | | | | | | | | | | | | | | | | NIR already has this optimization and it can do much better than the little peephole in the backend. No shader-db change on Haswell or Broadwell. Reviewed-by: Matt Turner <mattst88@gmail.com>
| * i965: Have NIR lower flrp on pre-GEN6 vec4 backendIan Romanick2016-03-221-2/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we were doing the lowering by hand in vec4_visitor::emit_lrp. By doing it in NIR, we have the opportunity for NIR to do additional optimization of the expanded code. This also enables optimizations added by the next commit. shader-db results: G4X / Ironlake total instructions in shared programs: 4024401 -> 4016538 (-0.20%) instructions in affected programs: 447686 -> 439823 (-1.76%) helped: 2623 HURT: 0 total cycles in shared programs: 84375846 -> 84328296 (-0.06%) cycles in affected programs: 16964960 -> 16917410 (-0.28%) helped: 2556 HURT: 41 Unsurprisingly, no changes on later platforms. v2: Formatting and comment changes suggested by Matt. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
| * i965: fix invalid memory writeMarc-André Lureau2016-03-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed some heap corruption running virgl tests, and valgrind helped me to track it down to the following error: ==29272== Invalid write of size 4 ==29272== at 0x90283D4: push_loop_stack (brw_eu_emit.c:1307) ==29272== by 0x9029A7D: brw_DO (brw_eu_emit.c:1750) ==29272== by 0x90554B0: fs_generator::generate_code(cfg_t const*, int) (brw_fs_generator.cpp:1999) ==29272== by 0x904491F: brw_compile_fs (brw_fs.cpp:5685) ==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137) ==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638) ==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51) ==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260) ==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006) ==29272== by 0x8C84325: _mesa_link_program (shaderapi.c:1042) ==29272== by 0x8C851D7: _mesa_LinkProgram (shaderapi.c:1515) ==29272== by 0x4E4B8E8: add_shader_program (vrend_renderer.c:880) ==29272== Address 0xf2f3cb0 is 0 bytes after a block of size 112 alloc'd ==29272== at 0x4C2AA98: calloc (vg_replace_malloc.c:711) ==29272== by 0x8ED11F7: ralloc_size (ralloc.c:113) ==29272== by 0x8ED1282: rzalloc_size (ralloc.c:134) ==29272== by 0x8ED14C0: rzalloc_array_size (ralloc.c:196) ==29272== by 0x9019C7B: brw_init_codegen (brw_eu.c:291) ==29272== by 0x904F565: fs_generator::fs_generator(brw_compiler const*, void*, void*, void const*, brw_stage_prog_data*, unsigned int, bool, gl_shader_stage) (brw_fs_generator.cpp:124) ==29272== by 0x9044883: brw_compile_fs (brw_fs.cpp:5675) ==29272== by 0x8FC5DC5: brw_codegen_wm_prog (brw_wm.c:137) ==29272== by 0x8FC7663: brw_fs_precompile (brw_wm.c:638) ==29272== by 0x8FA4040: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_link.cpp:51) ==29272== by 0x8FA4A9A: brw_link_shader (brw_link.cpp:260) ==29272== by 0x8DEF751: _mesa_glsl_link_shader (ir_to_mesa.cpp:3006) if_depth_in_loop is an array of size p->loop_stack_array_size, and push_loop_stack() will access if_depth_in_loop[p->loop_stack_depth+1], thus the condition to grow the array should be p->loop_stack_array_size <= (p->loop_stack_depth + 1) (it's currently off by 2...) This can be reproduced by running the following test with virgl test server: LIBGL_ALWAYS_SOFTWARE=y GALLIUM_DRIVER=virpipe bin/shader_runner ./tests/shaders/glsl-fs-unroll-explosion.shader_test -auto Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
| * i965: Fix assert conditions for src/dst x/y offsetsAnuj Phogat2016-03-211-3/+3
| | | | | | | | | | | | Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
| * i965/blorp: Make BlitFramebuffer() do sRGB encoding in ES 3.x.Kenneth Graunke2016-03-211-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the ES 3.0 and GL 4.4 specifications, glBlitFramebuffer is supposed to perform sRGB decoding and encoding whenever sRGB formats are in use. The ES 3.0 specification is completely clear, and has always stated this. However, the GL specification has changed behavior in 4.1, 4.2, and 4.4. The original behavior stated that no sRGB encoding should occur. The 4.4 behavior matches ES 3.0's wording. However, implementing the new behavior appears to break applications such as Left 4 Dead 2. This patch changes Meta to apply the ES 3.x rules in ES 3.x, but leaves OpenGL alone for now, to avoid breaking applications. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965/blorp: Refactor sRGB encoding/decoding.Kenneth Graunke2016-03-214-11/+23
| | | | | | | | | | | | | | | | | | | | | | | | Because the rules for sRGB are so insane, we change brw_blorp_miptrees to take decode_srgb and encode_srgb flags, which control linearization of the source and destination separately. This should make it easy to implement whatever crazy combination of rules people throw at us. For now, it should be equivalent. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965: Stop XY clipping point and line primitives.Kenneth Graunke2016-03-181-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wide points and lines are not supposed to be clipped by the viewport. Rather, they should be rendered, and any fragments outside of the viewport should be discarded. The traditional use case for this behavior is rendering moving wide point particles. When the center of the point approaches the viewport edge, clipping would make it pop out of view early. Fixes: - dEQP-GLES2.functional.clipping.point.wide_point_clip - dEQP-GLES3.functional.clipping.point.wide_point_clip - dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center - dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner - dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_center - dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_corner Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94453 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94454 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
| * i965: Scissor to the viewport when rendering points/lines.Kenneth Graunke2016-03-182-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We're about to start allowing wide points/lines whose vertices are outside the viewport past the clipper. This scissoring hack ensures that any fragments generated are still restricted to the viewport. It is not necessary on Gen8+ as those platforms already discard fragments which are outside the viewport. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94453 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94454 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
| * i965: Include the viewport in the scissor rectangle.Kenneth Graunke2016-03-181-4/+4
| | | | | | | | | | | | | | | | | | | | We'll need to use scissoring to restrict fragments to the viewport soon. It seems harmless to include it generally, so let's do that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94453 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94454 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
| * i965: Introduce an is_drawing_lines() helper.Kenneth Graunke2016-03-181-0/+30
| | | | | | | | | | | | | | | | | | Similar to is_drawing_points(). v2: Account for isoline tessellation output topology. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
| * i965: Move is_drawing_points to brw_state.h.Kenneth Graunke2016-03-182-24/+24
| | | | | | | | | | | | | | | | | | I need to use this in multiple source files. v2: Rebase on TES output domain fix. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
| * i965: Fix gl_TessLevelOuter[] for isolines.Kenneth Graunke2016-03-182-6/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thanks to James Legg for finding this! From the ARB_tessellation_shader spec: "The number of isolines generated is derived from the first outer tessellation level; the number of segments in each isoline is derived from the second outer tessellation level." According to the PRM, "TF.LineDensity determines # lines" while "TF.LineDetail determines # segments". Line Density is stored at DWord 6, while Line Detail is at DWord 7. So, they're not reversed like they are for triangles and quads. Fixes Piglit's spec/arb_tessellation_shader/execution/isoline, and about 24 dEQP isoline tests (with GL_EXT_tessellation_shader hacked on - it's not normally enabled). Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94524 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
| * i965: Decode non-normalized coordinates bit in SAMPLER_STATE.Kenneth Graunke2016-03-181-2/+3
| | | | | | | | | | | | | | We weren't printing this for some reason. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
| * i965: Account for TES in is_drawing_points().Kenneth Graunke2016-03-181-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Now that we implement tessellation shaders, the TES might be the last stage enabled. If it's outputting points, then the primitive type reaching the SF is points. We need to account for this. Caught by Ilia Mirkin. v2: Update dirty bit comment above caller (caught by Iago) Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
| * nir: add a bit_size parameter to nir_ssa_dest_initConnor Abbott2016-03-171-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: Squash multiple commits addressing the new parameter in different files so we don't break the build (Iago) v3: Fix tgsi (Samuel) v4: Fix nir_clone.c (Samuel) v5: Fix vc4 and freedreno (Iago) v6 (Sam) - Fix build errors in nir_lower_indirect_derefs - Use helper to get type size from nir_alu_type. Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
| * nir: rename nir_const_value fields to include bitsize informationIago Toral Quiroga2016-03-176-63/+63
| | | | | | | | | | Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
| * nir: update opcode definitions for different bit sizesConnor Abbott2016-03-171-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some opcodes need explicit bitsizes, and sometimes we need to use the double version when constant folding. v2: fix output type for u2f (Iago) v3: do not change vecN opcodes to be float. The next commit will add infrastructure to enable 64-bit integer constant folding so this is isn't really necessary. Also, that created problems with source modifiers in some cases (Iago) v4 (Jason): - do not change bcsel to work in terms of floats - leave ldexp generic Squashed changes to handle different bit sizes when constant folding since otherwise we would break the build. v2: - Use the bit-size information from the opcode information if defined (Iago) - Use helpers to get type size and base type of nir_alu_type enum (Sam) - Do not fallback to sized types to guess bit-size information. (Jason) Squashed changes in i965 and gallium/nir drivers to support sized types. These functions should only see sized types, but we can't make that change until we make sure that nir uses the sized versions in all the relevant places. A later commit will address this. Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
| * i965/nir: fix check to resolve booleans to work with sized nir_alu_typeSamuel Iglesias Gonsálvez2016-03-171-1/+1
| | | | | | | | | | | | | | | | | | | | As nir_alu_type has now embedded the data size, the check for the instruction's output type (to see if a boolean resolve is required) should ignore the data size part. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* | Merge remote-tracking branch 'origin/master' into vulkanJordan Justen2016-03-1710-56/+93
|\ \ | |/
| * i965/nir: Lower nir compute shader shared variablesJordan Justen2016-03-173-0/+11
| | | | | | | | | | Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
| * i965: Skip execution size adjustment for instructions of width 4Iago Toral Quiroga2016-03-171-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This code in brw_set_dest adjusts the execution size of any instruction with a dst.width < 8. However, we don't want to do this with instructions operating on doubles, since these will have a width of 4, but still need an execution size of 8 (for SIMD8). Unfortunately, we can't just check the size of the operands involved to detect if we are doing an operation on doubles, because we can have instructions that do operations on double operands interpreted as UD, operating on any of its 2 32-bit components. Previous commits have made it so we never emit instructions with a horizontal width of 4 that don't have the correct execution size set for gen6+, so we can skip it in this case, avoiding the conflicts with fp64 requirements. Expanding the same fix to other hardware generations requires many more changes but since we are not targetting fp64 support on them wer don't really care for now. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965/vec4/gen6: fix exec_size for MOV with a width of 4 in generate_gs_ff_sync()Samuel Iglesias Gonsalvez2016-03-171-1/+3
| | | | | | | | | | Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965/vec4/gen6: fix exec_size for instructions with destination width of 4Samuel Iglesias Gonsalvez2016-03-171-0/+6
| | | | | | | | | | Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965/vec4/gen6: fix exec_size for instructions with width of 4 in ↵Samuel Iglesias Gonsalvez2016-03-171-0/+3
| | | | | | | | | | | | | | generate_gs_svb_write() Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965/gs/gen6: fix execsize for instructions with width of 4 in ↵Samuel Iglesias Gonsalvez2016-03-171-1/+10
| | | | | | | | | | | | | | | | | | | | gen6_sol_program() v2: - Add assert (Topi). Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965: set correct execsize for MOVS with a width of 4 in brw_find_live_channelIago Toral Quiroga2016-03-171-0/+3
| | | | | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965/eu: set execution size for SEND message in brw_send_indirect_messageIago Toral Quiroga2016-03-171-0/+3
| | | | | | | | Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965/fs: Set exec size for gen7 pull const loadsIago Toral Quiroga2016-03-171-0/+1
| | | | | | | | | | | | | | | | | | | | v2 (Topi): - No need to set the execsize for the indirect send message, the next patch will handle that. - Set the execution size explicitly instead of taking it from the width of the dst that we set before. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * i965/eu: set correct execution size in brw_NOPIago Toral Quiroga2016-03-171-2/+3
| | | | | | | | | | | | | | v2: NOP should have an execsize of 1 (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * meta: Don't use integer handles for shaders or programs.Kenneth Graunke2016-03-163-33/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we gave our internal clear/blit shaders actual GL handles and stored them in the shader/program hash table. We used ordinary GL API entrypoints to work with them. We thought this shouldn't be a problem because GL doesn't allow applications to invent their own names for shaders or programs. GL allocates all names via glCreateShader and glCreateProgram. However, having them in the hash table is a bit risky: if a broken application guesses the name of our shaders or programs, it could alter them, potentially screwing up future meta operations. Also, test cases can observe the programs in the hash table. Running a single dEQP process that executes the following test list: dEQP-GLES3.functional.negative_api.buffer.clear dEQP-GLES3.functional.negative_api.shader.compile_shader dEQP-GLES3.functional.negative_api.shader.delete_shader would result in the last two tests breaking. The compile_shader test calls glCompileShader(9) straight away, and since it hasn't even created any shaders or programs, it expects to get a GL_INVALID_VALUE error because there's no such name. However, because the clear test ran first, it created Meta programs, so an object named "9" did exist. This patch reworks Meta to work with gl_shader and gl_shader_program pointers directly. These internal programs have bogus names, and are never stored in the hash tables, so they're invisible to applications. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94485 Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * meta: Use the _mesa_meta_compile_and_link_program helper more places.Kenneth Graunke2016-03-161-11/+3
| | | | | | | | | | | | | | Less boilerplate. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
| * meta: Use ARB_explicit_attrib_location in the rest of the meta shaders.Kenneth Graunke2016-03-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is cleaner than using glBindAttribLocation(). Not all drivers support the extension, but I don't think those drivers use GLSL in the first place. Apparently some Meta shaders already use GL_ARB_explicit_attrib_location, so I think it should be okay. Honestly, I'm not sure how the old code worked anyway - we bound the attribute location for "texcoords", while all the shaders capitalized or spelled it differently. v2: Convert another instance in brw_meta_fast_clear.c. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>
* | Merge remote-tracking branch 'public/master' into vulkanJason Ekstrand2016-03-1534-322/+645
|\ \ | |/
| * i965/fs: Restrict inequality that can only hold equal in saturate propagation.Francisco Jerez2016-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | Should have no functional change. The IP value of an instruction that reads src_var cannot possibly be after the end of the live interval of the variable it's reading from, by the definition of live interval. Might save future readers a momentary WTF while trying to understand this code. Reviewed-by: Matt Turner <mattst88@gmail.com>
| * i965/vec4: Consider removal of no-op MOVs as progress during register coalesce.Francisco Jerez2016-03-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Bug found by the liveness analysis validation pass that will be introduced in a later commit. The no-op MOV check in opt_register_coalesce() was removing instructions which makes the cached liveness analysis calculation inconsistent with the shader IR. We were failing to set progress to true in that case though, which means that invalidate_live_intervals() wouldn't necessarily be called at the end of the function. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>
| * i965/fs: Add missing analysis invalidation in fixup_3src_null_dest().Francisco Jerez2016-03-141-0/+6
| | | | | | | | | | | | | | | | | | | | Bug found by the liveness analysis validation pass that will be introduced in a later commit. fixup_3src_null_dest() was allocating registers which makes the cached liveness analysis calculation incomplete, so it must be invalidated. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>
| * i965/fs: Add missing analysis invalidation in opt_sampler_eot().Francisco Jerez2016-03-141-1/+4
| | | | | | | | | | | | | | | | | | | | | | Bug found by the liveness analysis validation pass that will be introduced in a later commit. opt_sampler_eot() was allocating registers and inserting and removing instructions, which makes the cached liveness analysis calculation inconsistent with the shader IR, so it must be invalidated. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>
| * mesa: docs: Intel i965 hardware limits.Sarah Sharp2016-03-141-7/+48
| | | | | | | | | | | | | | This should help the next person working on hardware enabling figure out where in the Intel PRMs to find the magic platform hardware values. Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
| * mesa: docs: i965: Use correct doxygen groupings syntaxSarah Sharp2016-03-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reading the source code, it's useful to indicate that a group of fields in a struct are related in someway. There were several places where people tried to group related structure members with the {@ syntax, without realizing they also needed to add the \name syntax in order to generate correct doxygen html. There are several files with groupings that look like this: struct foo { /** * Related fields description * @{ */ int bar; char baz; /** @} */ long qux; } However, the doxygen syntax for grouping is: struct foo { /** * \name Related fields description * @{ */ int bar; char baz; /** @} */ long qux; } https://www.stack.nl/~dimitri/doxygen/manual/grouping.html Without the group name definition, the fields don't get properly grouped. Instead, the group description is applied to the first field. Fix the Intel hardware information structure, brw_device_info to properly group the GPU hardware limitations and hardware quirks fields. Once you've run `cd doxygen; make clean; make all`, updated documentation can be found at mesa/doxygen/i965/structbrw__device__info.html Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
| * i965: Remove useless IR self-destruct backend_shader method.Francisco Jerez2016-03-132-8/+0
| | | | | | | | | | | | | | | | | | | | | | From the point it's constructed the CFG contains the only existing copy of the program IR, and it never becomes invalid. Calling backend_shader::invalidate_cfg would have destroyed the program structure irrecoverably -- We weren't calling it at all for a good reason. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org Reviewed-by: Matt Turner <mattst88@gmail.com>
| * i965: Use foreach_in_list_reverse_safe() macro.Matt Turner2016-03-121-12/+2
| | | | | | | | Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
| * i965/chv: Display proper brandingBen Widawsky2016-03-113-5/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "Braswell" is a Cherryview based *thing*. It unfortunately requires extra information to determine its marketing name. Unlike all previous products, and hopefully all future ones, there is no unique 1:1 mapping of PCI device ID to brand string. I put up a fight about adding any complexity to our GL renderer string code for a very long time. However, a wise man made a comment to me that I couldn't argue with: if a user installs Windows on their hardware, the brand string should be the same as what we display in Linux. The Windows driver apparently does this check, so we should too. Note that I did manage to find a good use for this info anyway in the compute shader thread counts. v2: memcpy instead of strncpy, and some minor changes (Matt) Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com
| * i965/chv: Update lower min for CS threadsBen Widawsky2016-03-111-1/+1
| | | | | | | | | | | | | | | | We have better information now, and 28 was not a valid thing to support. 6 EUs per sublice with 7 threads per EU is the minimum supported config. Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com