external_mesa3d.git - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	i965: Transplant PIPE_CONTROL routines to brw_pipe_control	Chris Wilson	2015-06-24	1	-0/+1
	Start trimming the fat from intel_batchbuffer.c. First by moving the set of routines for emitting PIPE_CONTROLS (along with the lore concerning hardware workarounds) to a separate brw_pipe_control.c Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Split VUE map handling out of brw_vs.c into brw_vue_map.c.	Kenneth Graunke	2015-06-22	1	-0/+1
	This was originally only used by the vertex shader, but it's now used by the geometry shader as well, and will also eventually be used for tessellation control and evaluation shaders. I suspect it will be easier to find in a file named after the concept. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965/fs: Introduce FS IR builder.	Francisco Jerez	2015-06-09	1	-0/+1
	The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, which in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Remove the old fragment program code	Jason Ekstrand	2015-05-28	1	-1/+0
	Now that everything is running through NIR, this is all dead. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: add brw_cs.h to the sources list	Emil Velikov	2015-05-19	1	-0/+1
	Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
*	i965: Use predicate enable bit for conditional rendering w/o stalling	Neil Roberts	2015-05-12	1	-0/+1
	Previously whenever a primitive is drawn the driver would call _mesa_check_conditional_render which blocks waiting for the result of the query to determine whether to render. On Gen7+ there is a bit in the 3DPRIMITIVE command which can be used to disable the primitive based on the value of a state bit. This state bit can be set based on whether two registers have different values using the MI_PREDICATE command. We can load these two registers with the pixel count values stored in the query begin and end to implement conditional rendering without stalling. Unfortunately these two source registers were not in the whitelist of available registers in the kernel driver until v3.19. This patch uses the command parser version from intel_screen to detect whether to attempt to set the predicate data registers. The predicate enable bit is currently only used for drawing 3D primitives. For blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For most of these it would probably just work to call the new brw_check_conditional_render function instead of _mesa_check_conditional_render because they already work in terms of rendering primitives. However it's a bit trickier for blits because it can use the BLT ring or the blorp codepath. I think these operations are less useful for conditional rendering than rendering primitives so it might be best to leave it for a later patch. v2: Use the command parser version to detect whether we can write to the predicate data registers instead of trying to execute a register load command. v3: Simple rebase v4: Changes suggested by Kenneth Graunke: Split the load_64bit_register function out to a separate patch so it can be a shared public function. Avoid calling _mesa_check_conditional_render if we've already determined that there's no query object. Some styling fixes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Implement DispatchCompute() back-end	Paul Berry	2015-05-02	1	-0/+1
	brw_emit_gpgpu_walker will be implemented in a subsequent patch. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/cache: Add support for CS in program state cache	Jordan Justen	2015-05-02	1	-0/+1
	Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Create NIR during LinkShader() and ProgramStringNotify().	Kenneth Graunke	2015-04-11	1	-0/+1
	Previously, we translated into NIR and did all the optimizations and lowering as part of running fs_visitor. This meant that we did all of that work twice for fragment shaders - once for SIMD8, and again for SIMD16. We also had to redo it every time we hit a state based recompile. We now generate NIR once at link time. ARB programs don't have linking, so we instead generate it at ProgramStringNotify time. Mesa's fixed function vertex program handling doesn't bother to inform the driver about new programs at all (which is rather mean), so we generate NIR at the last minute, if it hasn't happened already. shader-db runs ~9.4% faster on my i7-5600U, with a release build. v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother using _mesa_program_enum_to_shader_stage as we already know it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
*	i965: add the remaining files to the tarball	Emil Velikov	2015-03-24	1	-0/+3
	Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Add a NIR analysis pass for determining when a boolean resolve is needed	Jason Ekstrand	2015-03-23	1	-0/+2
	v2: Fix the spelling of analyze and re-arrange code for better readability as per Connor's comments. v3: Make the naming of things more consistent and add a pile of comments v4: Stop trying to avoid vectors Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
*	i965/fs: Add pass to combine immediates.	Matt Turner	2015-02-17	1	-0/+1
	total instructions in shared programs: 5885407 -> 5940958 (0.94%) instructions in affected programs: 3617311 -> 3672862 (1.54%) helped: 3 HURT: 23556 GAINED: 31 LOST: 165 ... but will allow us to always emit MAD instructions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Refactor tiled memcpy functions and move them into their own file	Sisinty Sasmita Patra	2015-01-26	1	-0/+2
	This commit refactors the tiled_memcpy code in intel_tex_subimage.c and moves it into its own file intel_tiled_memcpy files. Also, xtile_copy and ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled respectively. The *_faster functions are similarly renamed. There was also a bit of logic to select between the libc provided memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle. This was moved into an intel_get_memcpy function so that rgba8_copy can live (and be inlined) in intel_tiled_memcpy.c. v2: Jason Ekstrand <jason.ekstrand@intel.com> - Better commit message - Fix up the copyright on the intel_tiled_memcpy files - Various whitespace fixes - Moved a bunch of stuff that did not need to be exposed from intel_tiled_memcpy.h to intel_tiled_memcpy.c - Added proper documentation for intel_get_memcpy - Incorporated the ptrdiff_t tweaks from commit 225a09790 v3: Jason Ekstrand <jason.ekstrand@intel.com> - Fixed a comment - Move the tile size constants into the .c file Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Chad Versace <chad.versace@intel.com>
*	i965/fs: Add pass to propagate conditional modifiers.	Matt Turner	2015-01-23	1	-0/+1
	total instructions in shared programs: 5974160 -> 5959463 (-0.25%) instructions in affected programs: 1743737 -> 1729040 (-0.84%) GAINED: 0 LOST: 12 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965/fs: add a NIR frontend	Connor Abbott	2015-01-15	1	-0/+1
	This is similar to the GLSL IR frontend, except consuming NIR. This lets us test NIR as part of an actual compiler. v2: Jason Ekstrand <jason.ekstrand@intel.com>: Make brw_fs_nir build again Only use NIR if INTEL_USE_NIR is set whitespace fixes
*	i965: Add headers to distribution.	Matt Turner	2014-12-12	1	-0/+47
*	i965: Alphabetize source list.	Matt Turner	2014-12-12	1	-35/+35
*	i965/vec4: Rewrite dead code elimination to use live in/out.	Matt Turner	2014-12-01	1	-0/+1
	Improves 359 shaders by >=10% 114 shaders by >=20% 91 shaders by >=30% 82 shaders by >=40% 22 shaders by >=50% 4 shaders by >=60% 2 shaders by >=80% total instructions in shared programs: 5845346 -> 5822422 (-0.39%) instructions in affected programs: 364979 -> 342055 (-6.28%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Add functions to convert float <-> VF.	Matt Turner	2014-11-25	1	-0/+1
	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Rename brw_vec4_gs.[ch] to brw_gs.[ch].	Kenneth Graunke	2014-10-29	1	-1/+1
	These source files support actual geometry shaders, so using "gs" for the name makes a lot of sense. We're going to be adding SIMD8 geometry shader support as well, at which point "vec4_gs" will be a misnomer. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com> Acked-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965: Rename brw_gs{,_emit}.[ch] to brw_ff_gs{,_emit}.[ch].	Kenneth Graunke	2014-10-29	1	-2/+2
	The brw_gs.[ch] and brw_gs_emit.c source files contain code for emulating fixed-function unit functionality (VF primitive decomposition or SOL) using the GS unit. They do not contain code to support proper geometry shaders. We've taken to calling that code "ff_gs" (see brw_ff_gs_prog_key, brw_ff_gs_prog_data, brw_context::ff_gs, brw_ff_gs_compile, brw_ff_gs_prog). So it makes sense to make the filenames match. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason.ekstrand@intel.com> Acked-by: Iago Toral Quiroga <itoral@igalia.com>
*	i965/gen6/gs: Add initial implementation for a gen6 geometry shader visitor.	Iago Toral Quiroga	2014-09-19	1	-0/+1
	Geometry shaders in gen6 are significantly different from gen7+ so it is better to have them implemented in a different file rather than adding gen6 branching paths all over brw_vec4_gs_visitor.cpp. This commit adds an initial implementation that only handles point output, which is the simplest case. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
*	i965: Split gen6 depth hiz state out from brw	Jordan Justen	2014-08-15	1	-0/+1
	We will program the gen6 hiz depth state differently to enable layered rendering on gen6. v2: * Remove unneeded gen6_emit_depthbuffer as suggested by Topi Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Split gen6 renderbuffer surface state from gen5 and older	Jordan Justen	2014-08-15	1	-0/+1
	We will program the gen6 renderbuffer surface state differently to enable layered rendering on gen6. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Implement fast color clears using meta operations	Kristian Høgsberg	2014-08-15	1	-1/+1
	This patch uses the infrastructure put in place by previous patches to implement fast color clears and replicated color clears in terms of meta operations. This works all the way back to gen7 where fast clear was introduced and adds support for fast clear on gen8. It replaces the blorp path completely and improves on a few cases. Layered clears are now done using instanced rendering and multiple render-target clears use a MRT shader with rep16 writes. Signed-off-by: Kristian Høgsberg <krh@bitplanet.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>
*	android: dri/i9*5: remove used _INCLUDES variable	Emil Velikov	2014-08-13	1	-6/+1
	No longer needed as of last commit. Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
*	i965: Delete the Gen8 code generators.	Kenneth Graunke	2014-08-12	1	-4/+0
	We now use the brw_eu_emit.c code instead. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Matt Turner <mattst88@gmail.com>
*	i965: Add support for ARB_copy_image	Jason Ekstrand	2014-08-11	1	-0/+1
	This, together with the meta path, provides a complete implementation of ARB_copy_image. v2: Add a fallback memcpy path for when the texture is too big for the blitter v3: Properly support copying between two places on the same texture in the memcpy fallback v4: Properly handle blit between the same two images in the fallback path v5: Properly handle blit between the same two compressed images in the fallback path v6: Fix a typo in a comment Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com> Reviewed-by: Neil Roberts <neil@linux.intel.com>
*	i965: Stop using gen7_update_sampler_state; rm gen7_sampler_state.c.	Kenneth Graunke	2014-08-02	1	-1/+0
	The code in brw_sampler_state.c now handles all generations; we don't need the extra Gen7+ only code anymore. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965: Rename brw_wm_sampler_state.c to brw_sampler_state.c.	Kenneth Graunke	2014-08-02	1	-1/+1
	When the driver was originally written, it only supported texturing in the pixel shader backend; vertex and geometry shader texturing came much later. Originally, the pixel shader was referred to as "WM" (the Windowizer/Masker unit). So, this code happened to only be relevant for the WM stage, at the time. However, sampler state really applies to all stages, so putting "wm" in the filename doesn't make sense. I dropped it in gen7_sampler_state.c; at this point the asymmetry just trips people up. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965/vec4: Add basic common subexpression elimination.	Kenneth Graunke	2014-07-06	1	-0/+1
	[mattst88]: Modified to perform CSE on instructions with the same writemask. Offered no improvement before. total instructions in shared programs: 1995633 -> 1995185 (-0.02%) instructions in affected programs: 14410 -> 13962 (-3.11%) Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Rename intel_asm_printer -> intel_asm_annotation.	Matt Turner	2014-07-05	1	-1/+1
	The #ifndef include guards already said the right thing :) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965/disasm: Delete gen8_disasm.c.	Kenneth Graunke	2014-06-30	1	-1/+0
	The functionality has been merged into brw_disasm.c; use that instead. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
*	i965: Add annotation data structure and support code.	Matt Turner	2014-05-24	1	-0/+1
	Will be used to print disassembly after jump targets are set and instructions are compacted, while still retaining higher-level IR annotations and basic block information. An array of 'struct annotation' will live along side the generated assembly. The generators will populate the array with their IR annotations, and basic block pointers if the instructions began or ended a basic block pointer. We'll then update the instruction offset when we compact instructions and then using the annotations print the disassembly. Reviewed-by: Eric Anholt <eric@anholt.net>
*	i965/meta: Stencil blits	Topi Pohjolainen	2014-05-15	1	-0/+1
	v2: Create the intel renderbuffer with level hardcoded to zero instead of overriding it in the surface state configuration. Also moved the dimension adjustments for tiling, mip level, msaa into the render buffer creation. Finally prepares for another blit path needed for miptree updownsampling. v3 (Ken): Dropped unnecessary memory context for "ralloc_asprintf()" Cc: "10.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
*	i965/blorp: Expose coordinate scissoring and mirroring	Topi Pohjolainen	2014-05-12	1	-0/+1
	Cc: "10.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	i965: Delete the intel_regions.c code.	Eric Anholt	2014-05-01	1	-1/+0
	Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
*	i965/fs: Reimplement dead_code_elimination().	Matt Turner	2014-04-15	1	-0/+1
	total instructions in shared programs: 1653399 -> 1651790 (-0.10%) instructions in affected programs: 92157 -> 90548 (-1.75%) GAINED: 2 LOST: 2 Also significantly reduces the number of optimization loop iterations: total loop iterations in shared programs: 39724 -> 31651 (-20.32%) loop iterations in affected programs: 21617 -> 13544 (-37.35%) Including some great pathological cases, like 29 -> 3 in Strike Suit Zero and 24 -> 3 in Dota2. Reviewed-by: Eric Anholt <eric@anholt.net>
*	i965/fs: Split fs_visitor::register_coalesce() into its own file.	Matt Turner	2014-04-05	1	-0/+1
	The function has gotten large, and brw_fs.cpp is the largest source file in the driver. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
*	i965/gen8: Change the winsys MSAA blits from blorp to meta.	Eric Anholt	2014-03-24	1	-0/+1
	This gets us equivalent code paths on BDW and pre-BDW, except for stencil (where we don't have MSAA stencil resolve code yet) Improves MSAA-forced citybench by 7.94496% +/- 2.38429% (n=16). Reduces DRI2 MSAA glxgears performance by -12.3559% +/- 1.52845% (n=9). v2: Move the new meta code to brw_meta_updownsample.c, name it brw_meta_updownsample(), add a comment about intel_rb_storage_first_mt_slice(), and rename that function and move the RB generation into it (review ideas by Ken). v3: Fix 2 src vs dst pasteos in previous change. v4: Skip this path pre-gen8 for now, until we can analyze the glxgears performance delta some more. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*	glsl/i965: move lower_offset_array up to GLSL compiler level.	Dave Airlie	2014-02-25	1	-1/+0
	This lowering pass will be useful for gallium drivers as well, in order to support the GL TG4 oddity that is textureGatherOffsets. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	i965: Update GS state for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
	This is quite similar to the Gen7 code. The main changes: - 48-bit relocations - Thread count is specified as U/2-1 instead of U-1. - An extra DWord (DW9) with clip planes, URB entry output length/offsets - We need to program the "Expected Vertex Count" (VerticesIn) v2: Set the number of binding table entries so they can be prefetched (requested by Eric Anholt). v3: Add a WARN_ONCE for a missing workaround. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
*	i965: Update multisampling state for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
	On previous platforms, 3DSTATE_MULTISAMPLE contained the number of samples, pixel location, and the positions of each sample within a pixel for each multisampling mode (4x and 8x). It was also a non-pipelined command, presumably since changing the sample positions is fairly drastic. Broadwell improves upon this by splitting the sample positions out into a separate non-pipelined state packet, 3DSTATE_SAMPLE_PATTERN. With that removed, 3DSTATE_MULTISAMPLE becomes a pipelined state packet. Broadwell also supports 2x and 16x multisampling, in addition to the 4x and 8x supported by Gen7. This patch, however, does not implement 2x and 16x. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Chad Versace <chad.versace@linux.intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>
*	i965: Update 3DSTATE_{DEPTH,STENCIL,...}_BUFFER and such for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
	The amount of cut and paste from Gen7 is rather ugly, and should probably be cleaned up in the future. Even the Gen7 code is in need of some tidying though; many of the function parameters aren't used on platforms that use level/layer rather than tile offsets. Tidying both can be left to a future patch series. This at least gets things going. v2: Rebase on Paul's rename of NumLayers -> MaxNumLayers. v3: Shift QPitch by 2 when storing it in the packet. Bits 14:0 store bits 16:2 of the actual value. Fixes tests. v4: Add missing stencil buffer QPitch. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Eric Anholt <eric@anholt.net>
*	i965: Update SF_CLIP_VIEWPORT for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
	It has additional fields to support clipping to the viewport even if guardband clipping is enabled. v2: Update for viewport array changes. v3: No, seriously, update for viewport array changes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
*	i965: Rework SURFACE_STATE entries for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
	v2: Add missing SCS setting in gen8_emit_buffer_surface_state (caught by Eric Anholt). v3: Use stored QPitch rather than recomputing it. v4: Shift QPitch by 2 when setting it in the packet; bits 14:0 store bits 16:2 of the actual value (fixes myriads of cube and array texturing tests). Also, only enable cube face bits for cubemaps (matches Chris Forbes' commit on master). Port to use offset64. v5: s/gl_format/mesa_format/g v6: Fix DW5 of renderbuffer state, which neglected to subtract irb->mt->first_level. Use vertical_alignment() rather than hardcoding 4. Use ffs for multisample counts rather than a large switch statement (all caught/suggested by Eric). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
*	i965: Update SOL state for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
	Unlike on Gen7, we can directly set the offset via the state packet. We also -have- to: the kernel SOL reset code won't work anymore. v2: Fix copy and paste mistake in buffer stride setup; drop stale comment (caught by Eric Anholt). Add a perf_debug for missing MOCS setup. v3: Rebase on Paul Berry's changes to CurrentVertexProgram. v4: Fix SO Write Offset handling. We need to set bits 20 and 21 so the hardware both loads and saves the offset. There's also a restriction that 3DSTATE_SO_BUFFER can only be programmed once per buffer between primitives, so the "reset to zero" code needed reworking. Fixes most of the transform feedback Piglit tests. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
*	i965: Update the code that disables unused shader stages for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
	v2: Also disable 3DSTATE_WM_CHROMAKEY for safety. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
*	i965: Rework vertex uploads for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
	v2: Emit a dummy 3DSTATE_VF_SGVS packet when not needed. v3: Add WARN_ONCE and perf_debugs requested by Eric Anholt. v4: Program 3DSTATE_SGVS even in the no-elements case so gl_VertexID continues working. Fix 3DSTATE_VF_INSTANCING to not use an element index to access the buffers array. Some ARB_draw_indirect prep work. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>
*	i965: Update STATE_BASE_ADDRESS for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
	v2: Fix missing "change" bit on instruction state base address (caught by Haihao Xiang). v3: Add a perf_debug for missing MOCS setup, requested by Eric. v4: Fix buffer sizes. The value, specified at bit 12 and up, is actually measured in 4k pages. We need to round up to the next multiple of 4k. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> [v3] Reviewed-by: Matt Turner <mattst88@gmail.com> [v4]