path: root/src/mesa
Commit message | Author | Age | Files | Lines
* i965: Always do NIR IO lowering at specialization time. | Kenneth Graunke | 2016-02-26 | 2 | -8/+1
  We've now hit literally every case other than geometry shaders (and compute shaders, but those are a no-op). So, let's just move geometry shaders over too and be done with it.

  The only advantage to doing this at link time was to save the expense of running the pass on recompiles. But we're already running a lot of passes, and the extra code complexity isn't worth it.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Make an is_scalar boolean in brw_compile_gs(). | Kenneth Graunke | 2016-02-26 | 1 | -4/+4
  Shorter than compiler->scalar_stage[MESA_SHADER_GEOMETRY], which can help with line-wrapping.

  Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965/nir: Do lower_io late for fragment shaders | Jason Ekstrand | 2016-02-26 | 2 | -1/+3
  The Vulkan driver wants to be able to delete fragment outputs that are beyond key.nr_color_regions; this is a lot easier if we lower outputs at specialization time rather than link time.

  (Rationale added to commit message by Ken)

  Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
* i965: Set dest type to UW for several send messages | Jordan Justen | 2016-02-26 | 2 | -2/+5
  Without this, on SIMD 16 the send instruction destination will appear to write more than one destination register, causing the simulator to report an error.

  Of course, the send instruction can actually write more than one destination register regardless of the type set for the destination, so this is a bit strange.

  Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
  Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
  Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* program: Remove extra reference_program() | Miklós Máté | 2016-02-25 | 1 | -2/+0
  It was already done in get_mesa_program()

  Signed-off-by: Marek Olšák <marek.olsak@amd.com>
* i965/fs: Allow saturate propagation to propagate negations into MADs. | Matt Turner | 2016-02-25 | 1 | -0/+4
  Allows us to transform

     mad res src0 src1 src2
     mov.sat dst -res

  into

     mad.sat dst -src0 -src1 src2

  instructions in affected programs: 3712 -> 3688 (-0.65%)
  helped: 24

  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Allow saturate propagation to propagate negations into ADDs. | Matt Turner | 2016-02-25 | 2 | -4/+52
  Allows us to transform

     add res src0 src1
     mov.sat dst -res

  into

     add.sat dst -src0 -src1

  No shader-db changes.

  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965/fs: Allow saturate propagation to propagate negations into MULs. | Matt Turner | 2016-02-25 | 2 | -3/+137
  Allows us to transform

     mul res src0 src1
     mov.sat dst -res

  into

     mul.sat dst src0 -src1

  instructions in affected programs: 45246 -> 45054 (-0.42%)
  helped: 162

  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
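The MUL rewrite is value-preserving because negating a product gives the same result as negating one of its operands, so the saturate can move from the MOV onto the multiply. A minimal sketch in C, assuming `sat()` as a hypothetical stand-in for the hardware saturate modifier (NaN handling is ignored; this is not i965 code):

```c
/* Hypothetical stand-in for the .sat modifier: clamp to [0.0, 1.0]. */
static float sat(float x)
{
   if (x < 0.0f)
      return 0.0f;
   if (x > 1.0f)
      return 1.0f;
   return x;
}

/* Original sequence: mul res src0 src1; mov.sat dst -res */
static float mul_then_negated_sat_mov(float src0, float src1)
{
   float res = src0 * src1;
   return sat(-res);
}

/* Transformed sequence: mul.sat dst src0 -src1 */
static float mul_sat_with_negated_source(float src0, float src1)
{
   return sat(src0 * -src1);
}
```

Both functions return the same value for any pair of finite inputs, which is the property the optimization relies on.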
* i965/fs: Don't CSE negated multiplies with saturation. | Matt Turner | 2016-02-25 | 1 | -0/+2
  It's not correct to CSE these multiplies

     mul.sat dst1, -a, b
     mul.sat dst2, a, b

  by emitting a negated MOV from dst1 to dst2:

     mul.sat dst1, -a, b
     mov dst2, -dst1

  Take 2.0*2.0 for example. The first multiply would produce 0.0 and the second would produce 1.0.

  Fixes bad generated code in 18 to 22 shaders:

  instructions in affected programs: 432 -> 464 (7.41%)
  helped: 4
  HURT: 18

  Cc: mesa-stable@lists.freedesktop.org
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
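The 2.0*2.0 case can be checked by hand. The sketch below, in C, compares the directly computed second multiply with the value the invalid CSE would produce; `sat()` is a hypothetical stand-in for the saturate modifier, and the function names are illustrative only:

```c
/* Hypothetical stand-in for the .sat modifier: clamp to [0.0, 1.0]. */
static float sat(float x)
{
   if (x < 0.0f)
      return 0.0f;
   if (x > 1.0f)
      return 1.0f;
   return x;
}

/* What the program asks for: mul.sat dst2, a, b */
static float second_multiply_direct(float a, float b)
{
   return sat(a * b);
}

/* What the invalid CSE emits: mul.sat dst1, -a, b; mov dst2, -dst1 */
static float second_multiply_via_cse(float a, float b)
{
   float dst1 = sat(-a * b);
   return -dst1;
}
```

With a = b = 2.0, the direct form yields 1.0, while the CSE form clamps -4.0 to 0.0 first and then negates it, so the saturate has already discarded the information the negation would need.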
* i965: Enable tiled mem_copy with sRGB-formatted resources | Nanley Chery | 2016-02-24 | 1 | -2/+6
  RGBA8 and BGRA8 unorm formats are compatible with the various mem_copy functions. Their sRGB counterparts are also compatible because they're also color-renderable (of importance when the specified resource is a readbuffer) and they share the same physical layout.

  Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
  Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
* mesa: replace for loop with bitshifting in supported_buffer_bitmask() | Brian Paul | 2016-02-24 | 1 | -4/+1
  Reviewed-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
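The log does not show the changed code, so the following is only a sketch of the general technique the subject line names: a loop that sets one bit per buffer can be replaced by a single shift expression. The function names and parameters are hypothetical and are not taken from `supported_buffer_bitmask()`:

```c
#include <stdint.h>

/* Loop form: set `count` consecutive bits starting at bit `first`. */
static uint32_t mask_with_loop(unsigned first, unsigned count)
{
   uint32_t mask = 0;
   for (unsigned i = 0; i < count; i++)
      mask |= UINT32_C(1) << (first + i);
   return mask;
}

/* Shift form: (2^count - 1) << first.
 * Assumes count < 32 and first + count <= 32, so no shift is out of range. */
static uint32_t mask_with_shift(unsigned first, unsigned count)
{
   return ((UINT32_C(1) << count) - 1) << first;
}
```

The two forms agree within the stated range; the shift form simply avoids the per-bit iteration.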
* mesa: updates some comments in buffers.c | Brian Paul | 2016-02-24 | 1 | -3/+6
  Reviewed-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* mesa: make _mesa_draw_buffers() static | Brian Paul | 2016-02-24 | 2 | -11/+7
  Reviewed-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* mesa: make _mesa_draw_buffer() static | Brian Paul | 2016-02-24 | 2 | -9/+6
  Reviewed-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* mesa: make _mesa_read_buffer() static | Brian Paul | 2016-02-24 | 2 | -10/+7
  Not called from any other file. Remove _mesa_ prefix and update comments.

  Reviewed-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* mesa: move declaration of buffer var in handle_first_current() | Brian Paul | 2016-02-24 | 1 | -2/+4
  Declare the var in the scopes where it's used.

  Reviewed-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* mesa: use gl_buffer_index in a few places | Brian Paul | 2016-02-24 | 3 | -5/+6
  Reviewed-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* st/mesa: remove useless break statement | Brian Paul | 2016-02-24 | 1 | -1/+0
  Reviewed-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* st/mesa: rename st_readpixels to st_ReadPixels | Brian Paul | 2016-02-24 | 1 | -2/+2
  To match the convention of other device driver functions.

  Reviewed-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* st/mesa: fix frontbuffer glReadPixels regressions | Brian Paul | 2016-02-24 | 1 | -2/+11
  The change "mesa/readpix: Don't clip in _mesa_readpixels()" caused a few piglit regressions. The failing tests use glReadPixels to read from the front color buffer. The problem is we were trying to read from a nonexistent front color buffer. The front color buffer is created on demand in st/mesa. Since the missing buffer bounds were effectively 0 x 0 the glReadPixels was totally clipped and returned early.

  The fix involves creating the real front color buffer when we're about to try reading from it.

  Tested with llvmpipe and VMware driver on Linux, Windows.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94253
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94254
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94257
  Cc: mesa-stable@lists.freedesktop.org
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* mesa: use sizeof on the correct type | Thomas Hindoe Paaboel Andersen | 2016-02-23 | 1 | -1/+1
  Before the luminance stride was based on the size of GL_FLOAT which is just the type constant (0x1406). Change it to use the size of GLfloat.

  Reviewed-by: Brian Paul <brianp@vmware.com>
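The difference between the two `sizeof` operands can be shown in isolation. In the sketch below, `GL_FLOAT` and `GLfloat` are redefined locally so the example is self-contained; they mirror the usual GL definitions but are stand-ins here, and the helper names are hypothetical:

```c
#include <stddef.h>

/* Local stand-ins for the GL definitions, so the sketch compiles on its own. */
#define GL_FLOAT 0x1406   /* type enum: an integer constant, not a type */
typedef float GLfloat;    /* the data type the stride should be based on */

/* sizeof applied to the enum constant yields the size of an int. */
static size_t size_of_type_constant(void)
{
   return sizeof(GL_FLOAT);
}

/* sizeof applied to the type yields the size of a float. */
static size_t size_of_float_type(void)
{
   return sizeof(GLfloat);
}
```

On common platforms both expressions evaluate to 4, which would explain why the old code produced the right stride by coincidence; the corrected form states the intent and does not depend on that coincidence.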
* i965/fs: Return result of image atomic in a register of the expected type. | Francisco Jerez | 2016-02-22 | 1 | -1/+1
  So the result is of float type if we're implementing the float overload of imageAtomicExchange. This is the only back-end change required to support OES_shader_image_atomic AFAICT.

  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* mesa: Add extension table entry for OES_shader_image_atomic. | Francisco Jerez | 2016-02-22 | 1 | -0/+1
  v2: No need for extension enable bits (Ilia).

  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* mesa: add GL_EXT_texture_border_clamp support | Ilia Mirkin | 2016-02-22 | 1 | -0/+1
  This extension is identical to GL_OES_texture_border_clamp. But dEQP has tests that want the EXT variant.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
* mesa: add GL_OES_texture_border_clamp support | Ilia Mirkin | 2016-02-22 | 4 | -6/+22
  Only minor differences to the existing ARB_texture_border_clamp support.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
* st/mesa: force depth mode to GL_RED for sized depth/stencil formats | Ilia Mirkin | 2016-02-19 | 1 | -9/+25
  See commit 9db2098d for the i965 version of this. This fixes depth in a bunch of dEQP EXT_texture_border_clamp tests. And probably other ones as well.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Cc: mesa-stable@lists.freedesktop.org
* meta/copy_image: use precomputed dst_internal_format to avoid segfault | Ilia Mirkin | 2016-02-19 | 1 | -1/+1
  If the destination is a renderbuffer, dst_tex_image will be NULL. This fixes the *to_renderbuffer dEQP copy image tests.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
  Cc: mesa-stable@lists.freedesktop.org
* mesa: add GL_OES_texture_stencil8 support | Ilia Mirkin | 2016-02-19 | 3 | -0/+11
  It's basically the same thing as GL_ARB_texture_stencil8 except that glCopyTexImage isn't supported, so add STENCIL_INDEX to the list of invalid GLES formats for glCopyTexImage.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
* st/mesa: fix pbo uploads | Ilia Mirkin | 2016-02-19 | 1 | -10/+18
  - LOD must be provided in .w for TXF (even for buffer textures)
  - User buffer must be valid at draw time
  - Must have a sampler associated with the sampler view

  This makes PBO uploads work again on nouveau.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* mesa: check fbo completeness based on internal format, not driver format | Ilia Mirkin | 2016-02-19 | 1 | -3/+2
  The base format is a function of the user-requested format, while the driver format is not. So we should use the base format instead.

  The driver format can be anything. Specifically in the stencil-only case, it might be a depth/stencil format. However we still want to refuse such an attachment when bound to GL_DEPTH.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Brian Paul <brianp@vmware.com>
* mesa: small optimization of _mesa_expand_bitmap() | Brian Paul | 2016-02-19 | 1 | -7/+4
  Avoid a per-pixel multiply.

  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* mesa: add special case ubyte[4] / BGRA conversion function | Brian Paul | 2016-02-19 | 1 | -5/+69
  This reduces a glTexImage(GL_RGBA, GL_UNSIGNED_BYTE) hot spot when storing the texture as BGRA.

  Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
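A special-case path of this kind boils down to swapping the red and blue bytes of each pixel while copying. The following is a minimal sketch in C under that assumption; `rgba8_to_bgra8` is a hypothetical name, not the function added by the commit, and it ignores row strides and pixel-store state:

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `pixels` tightly packed RGBA ubyte[4] texels into BGRA byte order. */
static void rgba8_to_bgra8(const uint8_t *src, uint8_t *dst, size_t pixels)
{
   for (size_t i = 0; i < pixels; i++) {
      dst[4 * i + 0] = src[4 * i + 2]; /* B taken from source byte 2 */
      dst[4 * i + 1] = src[4 * i + 1]; /* G unchanged */
      dst[4 * i + 2] = src[4 * i + 0]; /* R taken from source byte 0 */
      dst[4 * i + 3] = src[4 * i + 3]; /* A unchanged */
   }
}
```

A dedicated loop like this avoids going through a generic format-conversion path for what is one of the most common upload cases.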
* st/mesa: implement a simple cache for glDrawPixels | Brian Paul | 2016-02-19 | 3 | -0/+97
  Instead of discarding the texture we created, keep it around in case the next glDrawPixels draws the same image again. This is intended to help applications which draw the same image several times in a row, either within a frame or subsequent frames.

  Reviewed-by: Charmaine Lee <charmainel@vmware.com>
* st/mesa: disable depth/stencil/alpha tests in PBO upload | Nicolai Hähnle | 2016-02-18 | 1 | -0/+8
  Noticed by Brian Paul.

  Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* mesa: fix new gcc6 warnings | Rob Clark | 2016-02-18 | 1 | -3/+0
  src/mesa/main/texstore.c:92:22: warning: ‘map_1032’ defined but not used [-Wunused-const-variable]
   static const GLubyte map_1032[6] = { 1, 0, 3, 2, ZERO, ONE };
                        ^~~~~~~~
  src/mesa/main/texstore.c:91:22: warning: ‘map_3210’ defined but not used [-Wunused-const-variable]
   static const GLubyte map_3210[6] = { 3, 2, 1, 0, ZERO, ONE };
                        ^~~~~~~~
  src/mesa/main/texstore.c:90:22: warning: ‘map_identity’ defined but not used [-Wunused-const-variable]
   static const GLubyte map_identity[6] = { 0, 1, 2, 3, ZERO, ONE };
                        ^~~~~~~~~~~~

  These appear to be unused since:

    commit 8ec6534b266549cdc2798e2523bf6753924f6cde
    Author: Iago Toral Quiroga <itoral@igalia.com>
    AuthorDate: Wed Oct 15 13:42:11 2014 +0200

      mesa: Use _mesa_format_convert to implement texstore_rgba.

  Signed-off-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* i965: fix new gcc6 warnings | Rob Clark | 2016-02-18 | 1 | -1/+1
  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp:244:1: warning: ‘void {anonymous}::fs_copy_prop_dataflow::dump_block_data() const’ defined but not used [-Wunused-function]
   fs_copy_prop_dataflow::dump_block_data() const
   ^~~~~~~~~~~~~~~~~~~~~

  From looking at git history, it looks like this is intended to be unused (ie. just for adding on-demand debug prints)

  Signed-off-by: Rob Clark <robdclark@gmail.com>
  Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
* Android: fix build break in libmesa_program | Rob Herring | 2016-02-18 | 1 | -1/+1
  Commit 5fd848f6c9ee ("program: Use _mesa_geometric_samples to calculate gl_NumSamples") broke Android builds. Add the missing include path "main" to framebuffer.h like other includes in prog_statevars.c.

  Cc: Neil Roberts <neil@linux.intel.com>
  Cc: Ilia Mirkin <imirkin@alum.mit.edu>
  Signed-off-by: Rob Herring <robh@kernel.org>
  Reviewed-by: Neil Roberts <neil@linux.intel.com>
  Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
* mesa: gl_NumSamples should always be at least one | Ilia Mirkin | 2016-02-18 | 1 | -1/+1
  From ARB_sample_shading:

    "gl_NumSamples is the total number of samples in the framebuffer, or one if rendering to a non-multisample framebuffer"

  So make sure to always pass in at least 1.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Edward O`Callaghan <eocallaghan@alterapraxis.com>
  Reviewed-by: Neil Roberts <neil@linux.intel.com>
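The rule quoted from ARB_sample_shading amounts to clamping the sample count to a minimum of one. A minimal sketch in C, assuming a non-multisample framebuffer reports 0 samples; the function name is hypothetical:

```c
/* Value to expose as gl_NumSamples: the framebuffer's sample count,
 * or 1 when the framebuffer is not multisampled (reported here as 0). */
static unsigned num_samples_for_shader(unsigned framebuffer_samples)
{
   return framebuffer_samples > 1 ? framebuffer_samples : 1;
}
```

For example, a 4x multisample framebuffer yields 4, while a single-sample framebuffer yields 1 rather than 0.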
* compiler/glsl: Fix uniform location counting. | Plamena Manolova | 2016-02-18 | 1 | -0/+8
  This patch moves the calculation of current uniforms to link_uniforms, which makes use of UniformRemapTable which stores all the reserved uniform locations.

  Location assignment for implicit uniforms now tries to use any gaps left in the table after the location assignment for explicit uniforms. This gives us more space to store more uniforms.

  Patch is based on earlier patch with following changes/additions:

  1: Move the counting of explicit locations to check_explicit_uniform_locations and then pass the number to link_assign_uniform_locations.
  2: Count the number of empty slots in UniformRemapTable and store them in a list_head.
  3: Try to find an empty slot for implicit locations from the list, if that fails resize UniformRemapTable.

  Fixes following CTS tests:
  ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max
  ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-array

  Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
  Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
  Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93696
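The gap-filling idea in step 3 can be sketched as follows. This is a simplification and not the linker's code: the patch keeps empty-slot blocks in a list, whereas this sketch scans a plain array of "used" flags, and every name in it is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Return the first index of a run of `count` unused entries, or -1 if the
 * table has no such gap. Assumes count >= 1. */
static int find_free_run(const bool *used, size_t table_size, size_t count)
{
   size_t run = 0;
   for (size_t i = 0; i < table_size; i++) {
      run = used[i] ? 0 : run + 1;
      if (run == count)
         return (int)(i + 1 - count);
   }
   return -1;
}

/* Choose a location for an implicit uniform occupying `count` slots:
 * reuse a gap left by explicit locations if one fits, otherwise return
 * the current table size, meaning the caller would have to grow the table. */
static size_t assign_implicit_location(const bool *used, size_t table_size,
                                       size_t count)
{
   int gap = find_free_run(used, table_size, count);
   return gap >= 0 ? (size_t)gap : table_size;
}
```

With slots 0 and 3 taken in a five-entry table, a two-slot uniform lands at location 1, while a three-slot uniform does not fit in any gap and is placed at location 5, past the current end.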
* st/mesa: new st_DrawAtlasBitmaps() function for drawing bitmap text | Brian Paul | 2016-02-17 | 2 | -3/+141
  This basically saves the current pipeline state, sets up state for rendering, constructs a set of textured quads, renders, then restores the previous pipeline state.

  It shouldn't be hard to implement a similar function for non-gallium drivers. With some code refactoring, the vertex definition code could probably be shared.

  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* mesa: implement a display list / glBitmap texture atlas | Brian Paul | 2016-02-17 | 5 | -0/+448
  This improves the performance of applications which use glXUseXFont() or wglUseFontBitmaps() and glCallLists() to draw bitmap text.

  Basically, we collect all the glBitmap images from the display lists and put them into a texture atlas. To render the bitmaps for a glCallLists() command, we render a set of textured quads where each quad is textured with one bitmap image. Actually, the rendering part has to be done by the Mesa driver or Mesa/gallium state tracker.

  Note that GLUT demos that use glutBitmapCharacter() don't benefit from this.

  v2, per Nicolai Hähnle:
  - check the max tex rect size is at least 1024.
  - add comment in dd.h that texture_rectangle is required.
  - in _mesa_DeleteLists(), try to delete the atlas before the list(s)

  Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
* st/mesa: apply DepthMode swizzle to stencil texturing as well | Ilia Mirkin | 2016-02-17 | 1 | -2/+0
  Gallium doesn't present these as GL_RED-style. A swizzle is necessary to present the proper data in the unused components.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* mesa: allow multisampled format info to be returned on GLES 3.1 | Ilia Mirkin | 2016-02-17 | 1 | -1/+4
  The restriction on multisampled integer texture formats only applies to GLES 3.0, so don't apply it to GLES 3.1 contexts.

  This fixes a slew of dEQP-GLES31.functional.state_query.internal_format.* tests, which now all pass.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
* i965: Extract push constant state to a new file | Ben Widawsky | 2016-02-17 | 4 | -164/+191
  Every stage has a corresponding 3DSTATE_CONSTANT_XS packet, so having the code to create and emit push constant buffers in genX_vs_state.c is a little strange. Moving it to a separate file seems more logical.

  v2 [Ken]: Rebase on master, explain motivation in the commit message.

  Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Make emit_minmax return an instruction*. | Matt Turner | 2016-02-17 | 3 | -10/+10
  And use it in brw_fs_nir.cpp.
* i965: Lower min/max after optimization on Gen4/5. | Matt Turner | 2016-02-17 | 8 | -44/+88
  Gen4/5's SEL instruction cannot use conditional modifiers, so min/max are implemented as CMP + SEL. Handling that after optimization lets us CSE more.

  On Ironlake:

  total instructions in shared programs: 6426035 -> 6422753 (-0.05%)
  instructions in affected programs: 326604 -> 323322 (-1.00%)
  helped: 1411

  total cycles in shared programs: 129184700 -> 129101586 (-0.06%)
  cycles in affected programs: 18950290 -> 18867176 (-0.44%)
  helped: 2419
  HURT: 328

  Reviewed-by: Francisco Jerez <currojerez@riseup.net>
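The CMP + SEL lowering can be modelled in scalar C. This is only an illustration of the two-instruction pattern, with hypothetical helper names; it does not reproduce the i965 IR or its NaN behaviour:

```c
#include <stdbool.h>

/* Models a CMP that writes a flag: true when a < b. */
static bool cmp_less(float a, float b)
{
   return a < b;
}

/* Models a CMP that writes a flag: true when a >= b. */
static bool cmp_greater_equal(float a, float b)
{
   return a >= b;
}

/* Models a predicated SEL: pick a when the flag is set, otherwise b. */
static float sel(bool flag, float a, float b)
{
   return flag ? a : b;
}

/* min lowered to CMP + SEL. */
static float lowered_min(float a, float b)
{
   return sel(cmp_less(a, b), a, b);
}

/* max lowered to CMP + SEL. */
static float lowered_max(float a, float b)
{
   return sel(cmp_greater_equal(a, b), a, b);
}
```

Keeping min/max as a single operation until after optimization means that repeated min/max expressions still look identical to CSE, instead of being split into flag-writing CMPs and predicated SELs beforehand.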
* i965/vec4: Initialize force_writemask_all in vec4_builder(). | Matt Turner | 2016-02-17 | 1 | -1/+2
  Reviewed-by: Francisco Jerez <currojerez@riseup.net>
* st/mesa: fix up result_src.type when doing i2u/u2i conversions | Ilia Mirkin | 2016-02-17 | 1 | -0/+1
  Even though it's a no-op, it's important to keep track of the type so that we can pick the properly-signed op later on.

  This fixes dEQP-GLES3.functional.shaders.precision.uint.highp_div_fragment, which ended up using IDIV instead of UDIV.

  Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Cc: mesa-stable@lists.freedesktop.org
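Why the tracked type matters even for a no-op conversion: the same 32 bits divide differently depending on whether a signed or an unsigned divide is selected. A small C sketch of that difference, with hypothetical function names and assuming the usual two's-complement reinterpretation on conversion:

```c
#include <stdint.h>

/* What a UDIV computes: both operands treated as unsigned. */
static uint32_t divide_as_unsigned(uint32_t a, uint32_t b)
{
   return a / b;
}

/* What an IDIV computes on the same bits: operands reinterpreted as signed.
 * Assumes two's-complement conversion; INT32_MIN / -1 must be avoided. */
static uint32_t divide_as_signed(uint32_t a, uint32_t b)
{
   return (uint32_t)((int32_t)a / (int32_t)b);
}
```

For a = 0xFFFFFFFE and b = 2, the unsigned divide gives 0x7FFFFFFF, while the signed divide sees -2 / 2 and gives 0xFFFFFFFF, so choosing the wrong op silently changes the result for large unsigned values.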
* st/mesa: use cso_set_viewport_dims() in try_pbo_upload_common() | Brian Paul | 2016-02-17 | 1 | -12/+1
  Note that this results in a different transformation for the viewport's Z axis (depth range), but that doesn't matter for this case.

  Reviewed-by: Roland Scheidegger <sroland@vmware.com>
* i965/gen7: Use predicated rendering for indirect compute | Jordan Justen | 2016-02-17 | 2 | -14/+83
  On gen7 (Ivy Bridge, Haswell), we will get a GPU hang if an indirect dispatch is used, but one of the dimensions is 0.

  Therefore we use predicated rendering on the GPGPU_WALKER command to handle this case.

  Fixes piglit test: spec/arb_compute_shader/zero-dispatch-size

  From the ARB_compute_shader spec, under DispatchCompute:

    "If the work group count in any dimension is zero, no work groups are dispatched."

  And then for DispatchComputeIndirect:

    ... "is equivalent (assuming no errors are generated) to calling DispatchCompute with <num_groups_x>, <num_groups_y> and <num_groups_z>" ...

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94100
  Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
  Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
  Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
  Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
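The condition that the predicate has to encode is simple to state on the CPU side. The sketch below, in C, only expresses the rule quoted from the spec; the driver evaluates it on the GPU because the counts live in a buffer, and the function name here is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* A dispatch launches work only if every work group count is non-zero. */
static bool dispatch_launches_work(const uint32_t num_groups[3])
{
   return num_groups[0] != 0 && num_groups[1] != 0 && num_groups[2] != 0;
}
```

When this condition is false, the walker command has to be skipped entirely, which is what the predication achieves without reading the indirect buffer back to the CPU.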