summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* tgsi: add option to dump floats as hex valuesDave Airlie2015-10-233-2/+30
| | | | | | | | | | | This adds support to the parser to accept hex values as floats, and then adds support to the dumper to allow the user to select to dump float as 32-bit hex numbers. This is required to get accurate values for virgl use of TGSI. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* svga: Condition preemptive flush on draw emissionSinclair Yeh2015-10-224-5/+25
| | | | | | | | | | | | | | On ultra high resolution modes, the preemptive flush flag can be set midway through command submission, a condition that cannot be recovered from a flush-retry, causing rendering artifacts. This patch prevents a preemtive_flush until a draw has been emitted. Signed-off-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>
* svga: try to avoid index generation for some primitive typesBrian Paul2015-10-221-0/+14
| | | | | | | | | | The svga device doesn't directly support quads, quad strips or polygons so we have to convert those types to indexed triangle lists. But we can sometimes avoid that if we're drawing flat/constant-colored prims and we don't have to worry about provoking vertex. Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>
* svga: avoid provoking vertex conversion when possibleBrian Paul2015-10-221-1/+14
| | | | | | | | | | | | Provoking vertex comes into play when doing flat shading. But if we know that all fragments in a primitive are the same color, the provoking vertex doesn't matter. Check for that case and use whichever provoking vertex convention is supported by the device. This avoids generating an index buffer to do the PV conversion. Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>
* svga: detect constant color writes in fragment shadersBrian Paul2015-10-225-2/+77
| | | | | | | | | | | | | | | | | | Examine the fragment shader to try to detect TGSI shaders which use "MOV OUT[0], CONST[i]" to write a constant value for the fragment color. In this case, all fragments will have the same color (unless blending is enabled). This is a common case for OpenGL code such as: glColor(), glBegin(), glVertex(), ..., glEnd() when lighting/fog/etc are disabled. In this case, the Mesa/gallium state tracker actually generates a simple "MOV OUT[0], CONST[i]" fragment shader. This will be used by the next commit to avoid provoking vertex conversion (creating/rewriting an index buffer) when drawing flat-shaded primitives. Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: José Fonseca <jfonseca@vmware.com>
* mesa: check for unchanged line width before error checkingBrian Paul2015-10-221-3/+4
| | | | Reviewed-by: Matt Turner <mattst88@gmail.com>
* st/mesa: use _mesa_RasterPos() when possibleBrian Paul2015-10-221-0/+10
| | | | | | | | | | | | | | | The st_RasterPos() function goes to great pains to implement the rasterpos transformation. It basically uses gallium's draw module to execute the vertex shader to draw a point, then capture that point's attributes. But glRasterPos isn't typically used with a vertex shader so we can usually use the old/fixed-function implementation which is a lot simpler and faster. This can add up for legacy apps that make a lot of calls to glRasterPos. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* tnl: remove t_rasterpos.cBrian Paul2015-10-222-479/+0
| | | | Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* drivers/common: use _mesa_RasterPos instead of _tnl_RasterPosBrian Paul2015-10-221-1/+2
| | | | Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* mesa: copy rasterpos evaluation code into core MesaBrian Paul2015-10-222-0/+444
| | | | | | | We'll remove it from the tnl module next. By lifting this code into core Mesa we can use it from the gallium state tracker. Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* vbo: optimize vertex copying when 'wrapping'Brian Paul2015-10-222-17/+14
| | | | | | | Instead of calling memcpy() 'n' times, we can do it all at once since the source and dest regions are all contiguous. Reviewed-by: Matt Turner <mattst88@gmail.com>
* radeon/uvd: don't expose HEVC on old UVD hw (v3)Alex Deucher2015-10-221-32/+18
| | | | | | | | | | | | | | | | The section for UVD 2 and older was not updated when HEVC support was added. Reported by Kano on irc. v2: integrate the UVD2 and older checks into the main switch statement. v3: handle encode checking as well. Encode is already checked in the top case statement, so drop encode checks in the lower case statement. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: mesa-stable@lists.freedesktop.org
* i965/vec4: print predicate control at brw_vec4 dump_instructionAlejandro Piñeiro2015-10-223-3/+5
| | | | | | | v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding a copy on brw_vec4.c, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: use an envvar to decide to print the assembly on cmod_propagation ↵Alejandro Piñeiro2015-10-222-2/+2
| | | | | | | | | | | | | | | | | tests The complete way to do this would be parse INTEL_DEBUG and print the output if DEBUG_VS (or a new one) is present (see intel_debug.c). But that seems like an overkill for the unit tests, that after all, the most common use case is being run when calling make check. v2: use the same idea for the fs counterpart too, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Add unit tests for cmod propagation passAlejandro Piñeiro2015-10-222-0/+829
| | | | | | | | | | | | | | | | | | This include the same tests coming from test_fs_cmod_propagation, (non vector glsl types included) plus some new with vec4 types, inspired on the regressions found while the optimization was a work in progress. Additionally, the check of number of instructions after the optimization was changed from EXPECT_EQ to ASSERT_EQ. This was done to avoid a crash on failing tests that expected no optimization, as after checking the number of instructions, there were some checks related to this last instruction opcode/conditional mod. v2: update tests after Matt Turner's review of the optimization pass v3: tweaks on the tests (mostly on the comments), after Matt Turner's review Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: adding vec4_cmod_propagation optimizationAlejandro Piñeiro2015-10-224-0/+160
| | | | | | | | | | | | | | | | | | | | | vec4 port of fs_cmod_propagation. Shader-db results (no vec4 grepping): total instructions in shared programs: 6240413 -> 6235841 (-0.07%) instructions in affected programs: 401933 -> 397361 (-1.14%) total loops in shared programs: 1979 -> 1979 (0.00%) helped: 2265 HURT: 0 v2: remove extra space and combine two if blocks, as suggested by Matt Turner v3: add condition check to bail out if current inst and inst being scanned has different writemask, as pointed by Matt Turner v3: updated shader-db numbers v4: remove block from foreach_inst_in_block_*_starting_from after commit 801f151917fedb13c5c6e96281a18d833dd6901f Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: track and use independently each flag channelAlejandro Piñeiro2015-10-223-14/+52
| | | | | | | | | | | | | | | vec4_live_variables tracks now each flag channel independently, so vec4_dead_code_eliminate can update the writemask of null registers, based on which component are alive at the moment. This would allow vec4_cmod_propagation to optimize out several movs involving null registers. v2: added support to track each flag channel independently at vec4 live_variables, as v1 assumed that it was already doing it, as pointed by Francisco Jerez v3: general cleaningn after Matt Turner's review Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: nir_emit_if doesn't need to predicate based on all the channelsAlejandro Piñeiro2015-10-221-1/+3
| | | | | | | v2: changed comment, as suggested by Matt Turner Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/vec4/gs: Fix signed/unsigned comparison warning.Matt Turner2015-10-221-1/+1
|
* i965/fs: Emit a single ADD instruction for SET_SAMPLE_ID on Gen8+.Matt Turner2015-10-221-1/+1
| | | | | | | | | Gen8+ lifted the register region restriction that an instruction whose destination spans two registers must have sources that also span two registers. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/fs: Drop unnecessary write-enable-all from SET_SAMPLE_ID.Matt Turner2015-10-221-5/+5
| | | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/fs: Trim unneeded channels in SampleID setup.Matt Turner2015-10-221-6/+6
| | | | | | | | | The AND and SHR produce a scalar value that we had been replicating across $dispatch_width channels. The immediate MOV produces only four useful channels of data. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/fs: Use type-W for immediate in SampleID setup.Matt Turner2015-10-222-3/+3
| | | | | | | | | | | Not a functional difference, but register is loaded with a signed immediate (V) and added to a signed type (D) producing a signed result (D). Also change the type of g0 to allow for compaction. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* i965/vec4: Initialize LOD to 0.0f for textureQueryLevels() and texture().Matt Turner2015-10-221-0/+12
| | | | | | | | | | | | | We implement textureQueryLevels (which takes no arguments, save the sampler) using the resinfo message (which takes an argument of LOD). Without initializing it, we'd generate a MOV from the null register to load the LOD argument. Essentially the same logic applies to texture. A vertex shader cannot compute derivatives and so cannot produce an LOD, so TXL with an LOD of 0.0 is used. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Note that the UV immediate type is Gen6+.Matt Turner2015-10-221-1/+1
|
* gallivm: Translate all util_cpu_caps bits to LLVM attributes.Jose Fonseca2015-10-221-2/+34
| | | | | | | | | | | | | | This should prevent disparity between features Mesa and LLVM believe are supported by the CPU. http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990 Tested on a i7-3720QM w/ LLVM 3.3 and 3.6. v2: Increase SmallVector initial size as suggested by Gustaw Smolarczyk. Reviewed-by: Roland Scheidegger <sroland@vmware.com> CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
* i965/fs: Disable CSE optimization for untyped & typed surface readsJordan Justen2015-10-223-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | An untyped surface read is volatile because it might be affected by a write. In the ES31-CTS.compute_shader.resources-max test, two back to back read/modify/writes of an SSBO variable looked something like this: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r3 = untyped_surface_read(ssbo_float) r4 = r3 + 1 untyped_surface_write(ssbo_float, r4) And after CSE, we had: r1 = untyped_surface_read(ssbo_float) r2 = r1 + 1 untyped_surface_write(ssbo_float, r2) r4 = r1 + 1 untyped_surface_write(ssbo_float, r4) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* ilo: make sure there is HiZ before resolvingChia-I Wu2015-10-221-2/+4
| | | | We do not want to perform a depth resolve on an MCS enabled surface.
* ilo: fix max thread count for HS on Gen8Chia-I Wu2015-10-221-3/+5
| | | | It is in DW2 on Gen8.
* i965: Advertise ARB_shader_stencil_export (gen9+)Ben Widawsky2015-10-212-0/+2
| | | | | Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Implement ARB_shader_stencil_export (gen9+)Ben Widawsky2015-10-219-3/+98
| | | | | | | | | | | | | v2: remove useless source_stencil_to_render_target (Ken) Squash in the actual packing function, which also got to v2: Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt) Reorder the regioning to be in VWH order (Matt) Don't retype src in the backend, just assert instead (Matt) Rename the debug prints to something better (Matt) Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/fs: Enumerate logical fb writes argumentsBen Widawsky2015-10-213-21/+29
| | | | | | | | | | | | | | | | | | | Gen9 adds the ability to write out a stencil value, so we need to expand the virtual payload by one. Abstracting this now makes that change easier to read. I was admittedly confused early on about some of the hardcoding. If people believe the resulting code is inferior, I am not super attached to the patch. v2: Remove explicit numbering from the enumeration (Matt). Use a real naming scheme, and reference it in the opcode definition (Curro) Add a missed hardcoded logical position in get_lowered_simd_width (Ben) Add an assertion to make sure the component numbering is correct (Ben) Cc: Matt Turner <mattst88@gmail.com> Cc: Francisco Jerez <currojerez@riseup.net> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* svga: fix clip plane regression after recent tgsi_scan changeBrian Paul2015-10-211-2/+2
| | | | | | | | | Before the change "tgsi/scan: use properties for clip/cull distance writemasks", the tgsi_shader_info::num_written_clipdistance field was a multiple of four, now it's an accurate count. In the svga driver, we need a minor change to the loop test. Reviewed-by: Charmaine Lee <charmainel@vmware.com>
* i965: Implement gl_InvocationID.Kenneth Graunke2015-10-211-0/+13
| | | | | | | It's stored in bits 31:27 of g1 (along with the URB handles). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Implement nir_intrinsic_load_primitive.Kenneth Graunke2015-10-211-0/+8
| | | | | Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Add a fs_visitor constructor that takes a brw_gs_compile.Kenneth Graunke2015-10-212-3/+39
| | | | | | | | | | | Unlike the vs/wm structs, brw_gs_compile is actually useful: it contains the input VUE map and information about the control data headers. Passing this in allows us to share that code in brw_gs.c, and calculate them before deciding on vec4 vs. scalar mode, as it's independent of that choice. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Add a brw->scalar_gs flag controlled by INTEL_SCALAR_GS=1.Kenneth Graunke2015-10-213-1/+8
| | | | | | | | | | | This patch introduces a brw->scalar_gs flag, similar to brw->scalar_vs, which controls whether or not to use SIMD8 geometry shaders. For now, we control it via a new environment variable, INTEL_SCALAR_GS. This provides a convenient way to try it out. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Make emit_urb_writes() reserve space for GS header information.Kenneth Graunke2015-10-211-2/+16
| | | | | | | | | | | Geometry shaders have additional header data at the beginning of their output URB entries. Shaders that use EndPrimitive() or multiple streams have a control data header; shaders with a dynamic vertex count have an additional vec4 slot to hold the 32-bit vertex count (and 96 bits of padding). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Make emit_urb_writes() only set EOT for the VS.Kenneth Graunke2015-10-211-1/+1
| | | | | | | | | The GS will emit a bunch of vertices, and we don't want to do an EOT prematurely. We'll emit GS_OPCODE_THREAD_END when we want to terminate the thread. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Make fs_visitor::emit_urb_writes reusable for scalar GS.Kenneth Graunke2015-10-211-7/+7
| | | | | | | | GS doesn't have ClampVertexColor, and we don't want to go through VS structures. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Introduce a brw_vue_prog_data::include_vue_handles flag.Kenneth Graunke2015-10-212-0/+5
| | | | | | | | | Tessellation shaders and SIMD8 geometry shaders may need to resort to the pull model for inputs at times. When set, the state upload code will tell the hardware to provide URB handles for input data. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Introduce a new SHADER_OPCODE_URB_READ_SIMD8 opcode.Kenneth Graunke2015-10-215-0/+40
| | | | | | | | | | | | In scalar mode, geometry shader inputs can easily take up hundreds of registers. This makes pushing VUE entries impractical; we'll need to resort to the pull model in some cases. To support this, we introduce a new opcode corresponding to the "URB Read SIMD8" message. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965: Introduce new SHADER_OPCODE_URB_WRITE_SIMD8_MASKED/PER_SLOT opcodes.Kenneth Graunke2015-10-215-0/+33
| | | | | | | | | | | | | In the vec4 backend, we have a vec4_instruction::urb_write_flags field. There are many kinds of flags for SIMD4x2 messages. However, there are really only two (per-slot offset, use channel masks) for SIMD8 messages. Rather than adding a boolean flag for per-slot offsets (polluting all instructions), I decided to just make three new opcodes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
* i965/gs: Do prog_data setup and other calculations in brw_compile_gsJason Ekstrand2015-10-214-220/+222
| | | | | | | | | | | | | | | | This commit moves the large pile of setup calculations we have to do for geometry shaders out of brw_gs_emit and into brw_compile_gs. This has a couple of nice implications. First, it's less work that the caller of brw_compile_gs has to do. Second, it's consistent with the vertex and fragment stages. Finally, it allows us to put brw_gs_compile back behind the API boundary where it belongs. v2 (Jason Ekstrand): - Pull the changes to use nir info into a separate patch - Put brw_gs_compile into brw_shader.h rather than brw_vec4_gs_visitor.h so that we can use it for scalar GS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: Use NIR info for setting up prog_dataJason Ekstrand2015-10-211-11/+13
| | | | | | | | | | Previously, we were pulling bits from GL data structures in order to set up the prog_data. However, in this brave new world of NIR, we want to be pulling it out of the NIR shader whenever possible. This way, we can move all this setup code into brw_compile_gs without depending on the old GL stuff. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: Pull prog_data out of brw_gs_compileJason Ekstrand2015-10-217-79/+80
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: Use NIR instead of the brw_geometry_program for GS metadataJason Ekstrand2015-10-214-12/+9
| | | | | | With this, we can remove the geometry program from brw_gs_compile. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: Move the mem_ctx argument to brw_compile_gsJason Ekstrand2015-10-213-4/+4
| | | | | | This makes it better match the other brw_compile_* functions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gs: Set static_vertex_count unconditionally on GEN8+Jason Ekstrand2015-10-211-1/+1
| | | | | | We always have NIR, so there's no reason for the check. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* nir: Constify nir_gs_count_verticesJason Ekstrand2015-10-212-2/+2
| | | | Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>