external_mesa3d.git

	Commit message	Author	Age	Files	Lines
*	draw: improve vertex fetch (v2)	Roland Scheidegger	2016-10-19	1	-86/+104
	The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile time cost). Similarly, it looks easier swapping the fetch loops (outer loop per attrib, inner loop filling up the per vertex elements - this way the aos->soa conversion also can be done per attrib and not just at the end though again this doesn't really make much of a difference in the generated code). (This would also make it possible to vectorize the calculations leading to the fetches.) There's also some minimal change simplifying the overflow math slightly. All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR). v2: adapt to other draw change. No changes with piglit. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: improved handling of undefined inputs	Roland Scheidegger	2016-10-19	1	-21/+32
	Previous attempts to zero initialize all inputs were not really optimal (though no performance impact was measurable). In fact this is not really necessary, since we know the max number of inputs used. Instead, just generate fetch for up to max inputs used by the shader, directly replacing inputs for which there was no vertex element by zero. This also cleans up key generation, which previously would have stored some garbage for these elements. And also drop the assertion which indicates such bogus usage by a debug_printf (the whole point of initializing the undefined inputs was to make this case safe to handle). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: initialize shader inputs	Roland Scheidegger	2016-10-12	1	-0/+7
	This should make the code more robust if a shader tries to use inputs which aren't defined by the vertex element layout (which usually shouldn't happen). No piglit change. Reviewed-by: Brian Paul <brianp@vmware.com>
*	gallium: Use enum pipe_shader_type in set_sampler_views()	Kai Wasserbäch	2016-08-29	4	-9/+11
	Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Brian Paul <brianp@vmware.com>
*	gallium: Use enum pipe_shader_type in bind_sampler_states() (v2)	Kai Wasserbäch	2016-08-29	2	-6/+10
	v1 → v2: - Fixed indentation (noted by Brian Paul) - Removed second assert from nouveau's switch statements (suggested by Brian Paul) Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>
*	draw: Avoid aliasing violations.	Matt Turner	2016-08-01	2	-3/+6
	Reviewed-by: Marek Olšák <marek.olsak@amd.com>
*	gallivm: Use llvm.fmuladd.*.	Jose Fonseca	2016-06-10	1	-10/+5
	Reviewed-by: Roland Scheidegger <sroland@vmware.com>
*	draw: stop using CULLDIST semantic.	Dave Airlie	2016-05-23	10	-48/+31
	The way the HW works doesn't really fit with having two semantics for this. The GLSL compiler emits 2 vec4s and two properties, this makes draw use those instead of CULLDIST semantics. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	draw: s/Elements/ARRAY_SIZE/	Brian Paul	2016-04-27	7	-24/+24
	Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	tgsi: accept a starting PC value for exec machine.	Dave Airlie	2016-04-27	2	-2/+2
	This will be used later to restart barriered execution threads in compute, for now we just want to change the API. Acked-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	tgsi: move to using vector for system values.	Dave Airlie	2016-04-27	2	-5/+5
	For compute support some of the system values are .xyz types, so move to using a vector instead of a single channel. [airlied: squash swizzle fix from compute series]. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	tgsi: pass a shader type to the machine create and clean up.	Dave Airlie	2016-04-26	2	-2/+2
	There were definitely bugs here mixing up the PIPE_ and TGSI_ defines, hopefully they didn't cause any problems, since mostly it was special cases for GEOMETRY. This clarifies at shader machine create what type of shader this machine will execute. This is needed also for compute shaders where we don't want to allocate inputs/outputs. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	gallium/tgsi: move tgsi_exec.h header out of draw_context.h	Dave Airlie	2016-04-26	2	-1/+1
	It gets annoying that changing the tgsi exec rebuilds the state tracker unnecessarily. Putting this include into draw_gs.h which uses it causes a lot less rebuilds. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	gallivm: convert size query to using a set of parameters.	Dave Airlie	2016-04-19	1	-18/+4
	This isn't currently that easy to expand, so fix it up before expanding it later to include dynamic samplers. [airlied: use some local variables (Roland)] Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	draw: add support for passing buffers to vs/gs shaders.	Dave Airlie	2016-04-12	5	-3/+32
	Like the image code, but for shader buffers this time. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	tgsi: add support for buffer/atomic operations to tgsi_exec.	Dave Airlie	2016-04-12	2	-2/+2
	This adds support for doing load/store/atomic operations on buffer objects. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	tgsi: set nonhelpermask for vertex shaders	Dave Airlie	2016-04-12	1	-0/+1
	For atomic operations we really need to avoid executing unnecessary shaders, so for some tests that just draw a single point we only want one vertex to get processed not 4, this fixes a number of the atomic counters tests. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	draw/aaline: stronger guard against no free samplers (v2)	Nicolai Hähnle	2016-04-07	1	-2/+4
	Line anti-aliasing will fail when there is no free sampler available. Make the corresponding guard more robust in preparation of raising PIPE_MAX_SAMPLERS to 32. The literal 1 is a (signed) int, and shifting into the sign bit is undefined in C, so change occurrences of 1 to 1u. v2: add an assert for bitfield size and use 1u << idx Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1) Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
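The shift issue described in this commit can be sketched as follows. This is only an illustration, not the code from the commit: `find_free_sampler` and its parameters are hypothetical names, and the sketch assumes `unsigned` is at least 32 bits wide.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: return the lowest sampler slot whose bit is clear
 * in used_mask, or -1 when all max_samplers slots are taken. */
static int
find_free_sampler(unsigned used_mask, unsigned max_samplers)
{
   /* The mask needs one bit per sampler slot. */
   assert(max_samplers <= sizeof(used_mask) * CHAR_BIT);

   for (unsigned idx = 0; idx < max_samplers; idx++) {
      /* 1u is unsigned, so shifting it into bit 31 is well defined.
       * With a plain (signed) 1 that shift would be undefined behavior. */
      if (!(used_mask & (1u << idx)))
         return (int)idx;
   }
   return -1;
}
```

The assert plays the same role as the bitfield-size assert mentioned for v2: it catches a slot count that no longer fits the mask type.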
*	gallivm: Use standard LLVMSetAlignment from LLVM 3.4 onwards.	Jose Fonseca	2016-04-03	1	-2/+2
	Only provide a fallback for LLVM 3.3. One less dependency on LLVM C++ interface. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>
*	draw: add support for passing images to vs/gs shaders.	Dave Airlie	2016-03-31	5	-2/+29
	This just adds support for passing through images to the tgsi execution stage. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	tgsi: add support for image operations to tgsi_exec. (v2.1)	Dave Airlie	2016-03-31	2	-2/+2
	This adds support for load/store/atomic operations on images along with image tracking support. v2: add RESQ support. (Ilia) v2.1: constify interface (Brian) split get_image_coord_dim (Brian) Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	tgsi: drop unused set_exec/kill_mask interfaces.	Dave Airlie	2016-03-22	2	-12/+0
	These don't get used and haven't been in git history from what I can see, so drop them. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
*	gallium/tgsi: pass TGSI tex target to tgsi_transform_tex_inst()	Brian Paul	2016-03-21	1	-5/+5
	Instead of hard-coded 2D tex target in tgsi_transform_tex_2d_inst() Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>
*	draw: fix line stippling	Roland Scheidegger	2016-03-15	1	-15/+15
	The logic was comparing actual ints, not true/false values. This meant that it was emitting always multiple line segments instead of just one even if the stipple test had the same result, which looks inefficient, and the segments also overlapped thus breaking line aa as well. (In practice, with the no-op default line stipple pattern, for a 10-pixel long line from 0-9 it was emitting 10 segments, with the individual segments ranging from 0-1, 0-2, 0-3 and so on.) This fixes https://bugs.freedesktop.org/show_bug.cgi?id=94193 Reviewed-by: Jose Fonseca <jfonseca@vmware.com> CC: <mesa-stable@lists.freedesktop.org>
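The int-versus-boolean problem described in this commit can be illustrated with a minimal sketch. It is not the draw module's code: `stipple_test` and `count_segments` are hypothetical, and the sketch assumes a 16-bit pattern with no repeat factor.

```c
#include <stdbool.h>

/* Hypothetical stipple test: returns the masked pattern bit, which is
 * 0 or some power of two, not 0 or 1. */
static unsigned
stipple_test(unsigned pattern, unsigned counter)
{
   return pattern & (1u << (counter & 0xf));
}

/* Count the segments needed for `length` pixels. A new segment starts
 * each time the stipple state flips from off to on. */
static unsigned
count_segments(unsigned pattern, unsigned length)
{
   bool state = false;
   unsigned segments = 0;

   for (unsigned i = 0; i < length; i++) {
      /* Normalize to true/false before comparing. Comparing the raw
       * masked ints would report a change on every pixel, because the
       * set bit moves even though the test result stays "on". */
      bool result = stipple_test(pattern, i) != 0;

      if (result != state) {
         if (result)
            segments++;
         state = result;
      }
   }
   return segments;
}
```

With the all-ones default pattern, a 10-pixel line yields a single segment here, which is the behavior the commit message says the fix restores.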
*	draw: use util_pstipple_* function for stipple pattern textures and samplers	Nicolai Hähnle	2016-02-09	1	-110/+11
	This reduces code duplication. Suggested-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: use util_pstipple_create_fragment_shader	Nicolai Hähnle	2016-02-09	1	-197/+12
	This reduces code duplication. It also adds support for drivers where the fragment position is a system value. Suggested-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	llvmpipe,i915: add back NEW_RASTERIZER dependency when computing vertex info	Roland Scheidegger	2016-01-21	1	-0/+6
	I removed this mistakenly in 2dbc20e45689e09766552517a74e2270e49817b5. I actually thought it should not be necessary and a piglit run didn't show any differences, but this shouldn't have been in there. draw_prepare_shader_outputs() is in fact dependent on NEW_RASTERIZER. The new polygon-mode-facing test indeed shows why this is necessary, there's lots of invalid reads and writes with valgrind (also crashes without valgrind), because the pre-pipeline vertex size doesn't match the post-pipeline vertex size (note this won't help much with stages which don't have the prepare hook which can grow the vertex size, in particular the wide point stage, but this isn't used by llvmpipe). The test still won't pass, of course, but it is only usage of uninitialized values now, which is much less dangerous... (Albeit I'm pretty sure for i915 it really is not needed anymore as it doesn't care about the extra outputs and doesn't call draw_prepare_shader_outputs().) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: fix key comparison with uninitialized value	Roland Scheidegger	2016-01-13	2	-6/+6
	Discovered by accident, valgrind was complaining (could have possibly caused us to create redundant geometry shader variants). v2: convinced by Brian and Jose, just use memset for both gs and vs keys, just as easy and less error prone.
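The memset approach mentioned for v2 can be sketched as below. `struct variant_key`, `make_key` and `keys_equal` are hypothetical stand-ins, not the draw module's key types. The sketch relies on the common (though not strictly standard-guaranteed) behavior that member stores leave previously zeroed padding bytes alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical variant key. The compiler will typically insert padding
 * bytes between `flag` and `nr_inputs`. */
struct variant_key {
   unsigned char flag;
   unsigned nr_inputs;
};

static void
make_key(struct variant_key *key, unsigned char flag, unsigned nr_inputs)
{
   assert(key);
   /* Zero the whole key first so padding and any fields left unset
    * have a known value before the key is compared bytewise. */
   memset(key, 0, sizeof(*key));
   key->flag = flag;
   key->nr_inputs = nr_inputs;
}

/* Bytewise comparison, as a variant cache lookup would do. */
static bool
keys_equal(const struct variant_key *a, const struct variant_key *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```

Without the memset, two logically identical keys could compare unequal, which matches the commit's concern about creating redundant shader variants.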
*	draw: initialize prim header flags when clipping lines	Roland Scheidegger	2016-01-08	1	-0/+2
	Otherwise, clipped lines would have undefined stippling reset bit if line stippling is enabled. (Untested, and I just assume copying over the bits from the original line is actually the right thing to do.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: fix line stippling with unfilled prims	Roland Scheidegger	2016-01-08	1	-18/+38
	The unfilled stage was not filling in the prim header, and the line stage then decided to reset the stipple counter or not based on the uninitialized data. This causes some failures in conform linestipple test (albeit quite randomly happening depending on environment). So fill in the prim header in the unfilled stage - I am not entirely sure if anybody really needs determinant after that stage, but there's at least later stages (wide line for instance) which copy over the determinant as well. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>
*	draw: nuke the interp parameter from vertex_info	Roland Scheidegger	2016-01-07	1	-16/+1
	draw emit couldn't care less what the interpolation mode is... This somehow looked like it would matter, all drivers more or less dutifully filled that in correctly. But this is only used for emit, if draw needs to know about interpolation mode (for clipping for instance) it will get that information from the vs anyway. softpipe actually used to depend on that interpolation parameter, as it abused that structure quite a bit but no longer. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
*	draw: rework handling of non-existing outputs in emit code	Roland Scheidegger	2016-01-07	3	-23/+46
	Previously the code would just redirect requests for attributes which don't exist to use output 0. Rework this to output all zeros instead which seems more useful - in particular some extensions like ARB_fragment_layer_viewport require 0 in the fs even if it wasn't output by previous stages. That way, drivers don't have to special case this depending if the vs/gs outputs some attribute or not. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
*	gallium: Remove unnecessary semicolons	Edward O'Callaghan	2016-01-06	1	-1/+1
	Fix silly issue with MSVC case fall-through support to need an extra 'break;' Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Reviewed-by: Brian Paul <brianp@vmware.com>
*	draw: minor indentation fix	Brian Paul	2016-01-05	1	-1/+1
*	draw: fix clip test with NaNs	Roland Scheidegger	2015-12-18	2	-14/+18
	NaNs mean it should be clipped, otherwise the NaNs might get passed to the next stages (if clipping didn't happen for another reason already), which might cause all kind of problems. The llvm path got this right already (possibly by luck), but this isn't used when there's a gs active. Found by code inspection, verified with some hacked piglit test and some more hacked debug output. (Note the clipper can still itself incorrectly generate NaN and INF position values in its output prims (at least after w divide / viewport transform) even if the inputs weren't NaNs, if the position data of the vertices is "sufficiently bad".) Reviewed-by: Brian Paul <brianp@vmware.com>
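One way to make a clip test treat NaNs as clipped is to phrase each test as a negated "inside" comparison. The sketch below shows that idea only; `clip_mask_x` and the `CLIP_*` bits are hypothetical names, and it is not claimed to be how the commit implements the fix.

```c
#include <math.h>

#define CLIP_RIGHT (1u << 0)   /* hypothetical mask bits */
#define CLIP_LEFT  (1u << 1)

/* Hypothetical clip test of x against the range -w <= x <= w. */
static unsigned
clip_mask_x(float x, float w)
{
   unsigned mask = 0;

   /* Any ordered comparison involving NaN is false, so negating the
    * "inside" test flags NaN as outside. Writing `x > w` instead would
    * silently let a NaN through as unclipped. */
   if (!(x <= w))
      mask |= CLIP_RIGHT;
   if (!(x >= -w))
      mask |= CLIP_LEFT;

   return mask;
}
```

Note that this relies on IEEE comparison semantics, so it assumes the code is not built with options such as `-ffast-math` that let the compiler assume NaNs never occur.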
*	draw: fix pstipple and aaline stages wrt sampler_views/samplers	Roland Scheidegger	2015-12-18	2	-7/+9
	Those stages only really work for OGL-style texturing (so number of samplers and views mostly the same, certainly for the max values). These get often set up all at once, thus there might be max number of both even if all of them are just NULL. We must not set the max number of samplers and views to the same value since that will lead to terrible things if a driver supports more views than samplers (and the state tracker set up all the views). (This will not make these stages magically work if a shader uses dx10-style texturing, they might still replace an actually used sview in that case.) Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: handle edge flags in llvm path	Roland Scheidegger	2015-12-16	2	-26/+61
	We just ignored them altogether. While this feature is rather old-fashioned supporting it is actually rather trivial. This fixes the associated piglit tests (2 gl-1.0-edgeflag, 2 gl-2.0-edgeflag and all (7) of point-vertex-id). v2: comment fixes, and make the use of the edgeflag in clipmask consistent with when it's actually there (should be impossible to hit a case where the difference would actually matter but still...) Reviewed-by: Brian Paul <brianp@vmware.com>
*	draw: don't set start_instance and instance id for pt emit	Roland Scheidegger	2015-12-16	1	-31/+31
	This just adds confusion, these parameters are used when fetching vertices by translate, but certainly not when emitting hw vertices for drivers, they make no sense there (setting them has no consequences otherwise since there won't be any elements with instance_divisor set). So just set them to 0 (the draw_pipe_vbuf code for emitting vertices when the draw pipeline is run already does exactly that). Also while here do some whitespace cleanup. Reviewed-by: Brian Paul <brianp@vmware.com>
*	draw: remove clip_vertex from vertex header	Roland Scheidegger	2015-12-15	5	-40/+54
	vertex header had both clip_pos and clip_vertex. We only really need one (clip_pos) because the draw llvm shader would overwrite the position output from the vs with the viewport transformed. However, we don't really need the second one, which was only really used for gl_ClipVertex - if the shader didn't have that the values were just duplicated to both clip_pos and clip_vertex. So, just use this from the vs output instead when we actually need it. Also change clip debug to output both the data from clip_pos and the clipVertex output (if available). Makes some things more complex, some things less complex, but seems more easy to understand what clipping actually does (and what values it uses to do its magic). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: use clip_pos, not clip_vertex for the fake guardband xy point clipping	Roland Scheidegger	2015-12-15	1	-3/+3
	Seems obvious now this should use the data from position and not clip_vertex (albeit might not really make a difference). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: rename vertex header members	Roland Scheidegger	2015-12-15	6	-42/+46
	clip -> clip_vertex and pre_clip_pos -> clip_pos. Looks more obvious to me what these values actually represent (so use something resembling the vs output names). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: don't pretend have_clipdist is per-vertex	Roland Scheidegger	2015-12-15	5	-18/+20
	This is just for code cleanup, conceptually the have_clipdist really isn't per-vertex state, so don't put it there (just dependent on the shader). Even though there wasn't really any overhead associated with this, we shouldn't store random shader information in the vertex header. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: use position not clipVertex output for xyz view volume clipping	Roland Scheidegger	2015-12-15	1	-1/+10
	I'm pretty sure this should use position (i.e. pre_clip_pos) and not the output from clipVertex. Albeit piglit doesn't care. It is what we use in the clip test, and it is what every other driver does (as they don't even have clipVertex output and lower the additional planes to clip distances). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: don't assume fixed offset for data in struct vertex_info	Roland Scheidegger	2015-12-11	1	-5/+3
	Otherwise, if struct vertex_info is changed, you're in for some surprises... Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: fix clipping with linear interpolated values and gl_ClipVertex	Roland Scheidegger	2015-12-11	1	-4/+4
	Discovered this when working on other clip code, apparently didn't work correctly - the combination of linear interpolated values and using gl_ClipVertex produced wrong values (failing all such combinations in piglits glsl-1.30 interpolation tests, named interpolation-noperspective-XXX-vertex). Use the pre-clip-pos values when determining the interpolation factor to fix this. No one really understands this code well, but everybody agrees this looks sane... This fixes all those failing tests (10 in total) both with the llvm and non-llvm draw paths, with no piglit regressions. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Dave Airlie <airlied@redhat.com>
*	gallium: Remove redundant NULL ptr checks	Edward O'Callaghan	2015-12-06	1	-2/+1
	Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
*	gallium/auxiliary: Sanitize NULL checks into canonical form	Edward O'Callaghan	2015-12-06	23	-35/+35
	Use NULL tests of the form `if (ptr)' or `if (!ptr)'. They do not depend on the definition of the symbol NULL. Further, they avoid the opportunity for accidental assignment, and are clear and succinct. Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
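A small illustration of the `if (!ptr)` form this commit standardizes on. `safe_strlen` is a hypothetical helper written for this example and does not come from the Mesa tree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a string, treating NULL as empty. */
static size_t
safe_strlen(const char *str)
{
   /* Canonical form: there is no comparison operator here, so `=`
    * cannot be typed by mistake in place of `==`. */
   if (!str)
      return 0;

   return strlen(str);
}
```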
*	gallium/auxiliary: Trivial code style cleanup	Edward O'Callaghan	2015-12-06	2	-2/+2
	Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
*	draw: fix clipping of layer/vp index outputs	Roland Scheidegger	2015-12-04	1	-139/+186
	This was just plain broken. It used always the value from v0 (for vp_index) but would pass the value from the provoking vertex to later stages - but only if there was a corresponding fs input, otherwise the layer/vp index would get lost completely (as it would try to interpolate the (unsigned) values as floats). So, make it obey provoking vertex rules (drivers relying on draw will need to do the same). And make sure that the default interpolation mode (when no corresponding fs input is found) for them is constant. Also, change the code a bit so constant inputs aren't interpolated then copied over later. Fixes the new piglit test gl-layer-render-clipped. v2: more consistent whitespaces fixes for function defs, and more tab killing (overall still not quite right however). Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	llvmpipe: add cache for compressed textures	Roland Scheidegger	2015-11-04	1	-1/+4
	Compressed textures are very slow because decoding is rather complex (and because there's no jit code to decode them, for non-technical reasons). Thus, add some texture cache which holds a couple of decoded blocks. Right now this handles only s3tc format albeit it could be extended to work with other formats rather trivially as long as the result of decode fits into 32bit per texel (ideally, rgtc actually would decode to more than 8 bits per channel, but even then making it work for it shouldn't be too difficult). This can improve performance noticeably but don't expect wonders (uncompressed is unsurprisingly still faster). It's also possible it might be slower in some cases (using nearest filtering for example or if there's otherwise not many cache hits, the cache is only direct mapped which isn't great). Also, actual decode of a block relies on util code, thus even though always full blocks are decoded it is done texel by texel - this could obviously benefit greatly from simd-optimized code decoding full blocks at once... Note the cache is per (raster) thread, and currently only used for fragment shaders. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
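The direct-mapped cache of decoded blocks described in this commit can be sketched roughly as follows. Everything here is an assumption made for illustration: the entry count, the tag scheme, and the names `block_cache`, `cache_lookup` and `fake_decode` are hypothetical and do not reflect llvmpipe's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_ENTRIES    64u   /* hypothetical size */
#define TEXELS_PER_BLOCK 16u   /* one 4x4 block, 32 bits per texel */

struct cache_entry {
   bool valid;
   uint64_t tag;                       /* identifies the source block */
   uint32_t texels[TEXELS_PER_BLOCK];  /* decoded result */
};

struct block_cache {
   struct cache_entry entries[CACHE_ENTRIES];
   unsigned hits, misses;
};

/* Caller-supplied decoder, standing in for a real s3tc block decode. */
typedef void (*decode_fn)(uint64_t block_addr, uint32_t *out_texels);

/* Stand-in decoder for the example: fills the block with its address. */
static void
fake_decode(uint64_t block_addr, uint32_t *out_texels)
{
   for (unsigned i = 0; i < TEXELS_PER_BLOCK; i++)
      out_texels[i] = (uint32_t)block_addr;
}

static const uint32_t *
cache_lookup(struct block_cache *cache, uint64_t block_addr, decode_fn decode)
{
   assert(cache && decode);

   /* Direct mapped: each block address maps to exactly one slot. */
   struct cache_entry *e = &cache->entries[block_addr % CACHE_ENTRIES];

   if (e->valid && e->tag == block_addr) {
      cache->hits++;
   } else {
      /* Miss: decode the full block, evicting whatever used the slot. */
      decode(block_addr, e->texels);
      e->tag = block_addr;
      e->valid = true;
      cache->misses++;
   }
   return e->texels;
}
```

Because two addresses that differ by a multiple of `CACHE_ENTRIES` share a slot, alternating between them misses every time, which is the weakness the commit message alludes to when it calls a direct-mapped cache "not great".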