external_mesa3d.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge remote-tracking branch 'mesa/13.0' into nougat-x86	Chih-Wei Huang	2016-11-25	7	-14/+50
\|\
\| *	radeonsi: store group_size_variable in struct si_compute	Nicolai Hähnle	2016-11-24	1	-5/+8
\| \|	For compute shaders, we free the selector after the shader has been compiled, so we need to save this bit somewhere else. Also, make sure that this type of bug cannot re-appear, by NULL-ing the selector pointer after we're done with it. This bug has been there since the feature was added, but was only exposed in piglit arb_compute_variable_group_size-local-size by commit 9bfee7047b70cb0aa026ca9536465762f96cb2b1 (which is totally unrelated). Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 42d5e91a2ae235c007c5d17935be9bb1c4ff388e)
\| *	vc4: Fix register class handling of DDX/DDY arguments.	Eric Anholt	2016-11-24	1	-1/+1
\| \|	I had this exactly backwards, but apparently the piglit tests were all landing in r0-r3 anyway. Cc: "13.0" <mesa-stable@lists.freedesktop.org> (cherry picked from commit 977d8b526b983c8d19df00af224033389f8ab7c8)
\| *	vc4: Clamp the shadow comparison value.	Eric Anholt	2016-11-23	1	-0/+9
\| \|	Fixes piglit glsl-fs-shadow2D-clamp-z. Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 08d51487e3b8cfb14ca2ece9545b2e2ed344e3cc)
\| *	vc4: Don't abort when a shader compile fails.	Eric Anholt	2016-11-23	6	-8/+32
\| \|	It's much better to just skip the draw call entirely. Getting this information out of register allocation will also be useful for implementing threaded fragment shaders, which will need to retry non-threaded if RA fails. Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 4d019bd703e7c20d56d5b858577607115b4926a3)
* \|	Merge remote-tracking branch 'mesa/13.0' into nougat-x86	Chih-Wei Huang	2016-11-16	8	-70/+86
\|\ \ \| \|/
\| *	gallium/hud: protect against an initialization race	Steven Toth	2016-11-14	4	-8/+41
\| \|	In the event that multiple threads attempt to install a graph concurrently, protect the shared list. Signed-off-by: Steven Toth <stoth@kernellabs.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 381edca826ee27b1a49f19b0731c777bdf241b20)
\| *	gallium/hud: close a previously opened handle	Steven Toth	2016-11-14	3	-1/+6
\| \|	We're missing the closedir() to the matching opendir(). Signed-off-by: Steven Toth <stoth@kernellabs.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 5a58323064b32442e2de23c95642bc421be696f8)
\| *	gallium/hud: fix a problem where objects are free'd while in use.	Steven Toth	2016-11-14	4	-55/+0
\| \|	Instead of trying to maintain a reference counted list of valid HUD objects, and freeing them accordingly, creating race conditions between unanticipated multiple threads, simply accept they're allocated once and never released until the process terminates. They're a shared resource between multiple threads, so accept they're always available for use. Signed-off-by: Steven Toth <stoth@kernellabs.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 6ffed086795aaa84ab35668bb59d712cdde34da3)
\| *	vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain.	Eric Anholt	2016-11-09	1	-1/+1
\| \|	The 1/W was apparently not accurate enough, and we were getting sparklies in the distance. The closed driver also did a N-R step here. Cc: <mesa-stable@lists.freedesktop.org> (cherry picked from commit 283d4d18e598793bbff7d9ba5a601bced9b36542)
\| *	Revert "st/vdpau: use linear layout for output surfaces"	Dave Airlie	2016-11-09	1	-1/+1
\| \|	This reverts commit d180de35320eafa3df3d76f0e82b332656530126. This is a radeon specific hack that causes problems on nouveau when combined with the SHARED flag later. If radeonsi needs a fix for this, please fix it in the driver. [chk] Using linear surfaces for this makes sense because tiling isn't beneficial and the surfaces can potentially be shared with other GPUs using the VDPAU OpenGL interop. [airlied] I think we need a flag that isn't SHARED/LINEAR that is more SHARED_OTHER_GPU. [mareko] Does radeonsi need PIPE_BIND_VIDEO_DECODE_OUTPUT that it would translate into linear? [mareko] My only concern is decoding performance. If the decoder works in 64x1 blocks, tiling will hurt. That's the theory. I don't know how the decoder works. Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> Tested-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> (I+A) (cherry picked from commit d0d5f7600c2e8ab8d0c153787185f7a534753edd)
\| *	radeonsi: fix an assertion failure in si_decompress_sampler_color_textures	Marek Olšák	2016-11-09	1	-1/+3
\| \|	This fixes a crash in Deus Ex: Mankind Divided. Release builds were unaffected, so it's not too serious. Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 00baaa4752ab7e721218a2840cf0952d8c7c6eca)
\| *	radeonsi: fix BFE/BFI lowering for GLSL semantics	Nicolai Hähnle	2016-11-09	1	-3/+34
\| \|	Fixes spec/arb_gpu_shader5/execution/built-in-functions/*-bitfield{Extract,Insert} Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 5aef14932ac047dc5f1af311a26b7f41b140d79f)
* \|	st/dri: remove trailing whitespace	Mauro Rossi	2016-11-01	1	-1/+1
\| \|
* \|	android: fix building errors on Android 7.0	Chih-Wei Huang	2016-11-01	3	-9/+3
\| \|
* \|	nv30 locking fixes	Ilia Mirkin	2016-11-01	2	-2/+22
\| \|
* \|	nouveau: more locking - make sure that fence work is always done with	Ilia Mirkin	2016-11-01	4	-4/+17
\| \|	the push mutex acquired
* \|	WIP nouveau: add locking	Ilia Mirkin	2016-11-01	30	-45/+372
\| \|
* \|	android: more fixes for llvmpipe software rendering	Chih-Wei Huang	2016-11-01	2	-2/+11
\| \|	* add dri2_create_from_texture to driswImageExtension * add dri2FenceExtension to drisw_screen_extensions
* \|	virgl: fix null pointer exceptions	Chih-Wei Huang	2016-11-01	1	-0/+2
\| \|
* \|	android: support swrast	WuZhen	2016-11-01	8	-7/+168
\| \|	System boots up with gles_mesa/softpipe/llvmpipe. NO_REF_TASK Tested: local run Change-Id: I629ed0ca9fad12e32270eb8e8bfa9f7681b68474 Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
* \|	android: print debug info to logcat	WuZhen	2016-11-01	3	-3/+12
\| \|	Redirect logs printed to stderr to logcat. NO_REF_TASK Tested: local run Change-Id: I58e3966a608af361b86c54b4c95a92561b711968 Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
* \|	st/dri: fix double free of dri_drawable	WuZhen	2016-11-01	1	-2/+0
\| \|	In the callchain destroy_surface->destroyDrawable->dri_put_drawable-> dri_put_drawable->DestroyBuffer, dri_destroy_buffer should not, by its semantics, free the drawable struct; none of the vendor-specific or legacy swrast versions of the function do. I wonder why nobody else ran into this. NO_REF_TASK Tested: local run Change-Id: Ibe82d82d2e34b162e64bf0b8805f8a4553d362d5 Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
* \|	android: change some PIPE to SVGA3D format mappings	Chih-Wei Huang	2016-11-01	1	-0/+6
\| \|	This is a trial-and-error patch which fixes the Android-x86 black screen issue of VMware on a Linux host. Tested OK on VMware Workstation 12 Player, but the red and blue colors are exchanged. Note it doesn't affect VMware on a Windows host.
* \|	gallium/radeon: define some prototypes of LLVMInitialize functions	Chih-Wei Huang	2016-11-01	1	-8/+1
\| \|
* \|	gallium: introduce load_pipe_screen()	Rob Herring	2016-11-01	4	-3/+14
\| \|	Introduce load_pipe_screen() public entry point for other code which dlopen()'s gralloc_dri.so for purposes of loading a pipe_screen. This way drm_gralloc can avoid static linking of each gallium winsys and driver, and avoid duplicated logic to figure out which pipe driver to load. This is based on Rob Clark's work. I moved it into pipe_loader which seems to be a better spot. Signed-off-by: Rob Herring <robh@kernel.org>
* \|	Android: Export gallium_dri include files	Rob Herring	2016-11-01	1	-0/+6
\| \|	This doesn't work yet because the exported include files can't be picked up by the android build system unless the library has a 'lib' prefix. Signed-off-by: Rob Herring <robh@kernel.org>
* \|	android: add support for LLVM 3.7.0 for marshmallow	Mauro Rossi	2016-11-01	1	-1/+1
\|/	The changes add support for LLVM 3.7.0 for marshmallow, while keeping support for LLVM 3.5.0 with lollipop. MESA_LLVM_VERSION_PATCH=0 is compatible with the radeonsi build in lollipop-x86, since mesa 11.0 and newer no longer check for LLVM 3.5.2. These changes, combined with specific R600 patches for external/llvm, enable building the gallium radeonsi driver in marshmallow-x86. The patch is applicable to 11.2.0devel, 11.1 and 11.0 branches.
*	st/omx/dec: disable tunnel for size different case	Leo Liu	2016-11-01	3	-1/+11
\|	When the video coded size differs from the frame size, the result buffers need to be the same as the coded size, which is not compatible with the size the encoder requires, so simply use no tunnel for this case instead of converting frame by frame. Signed-off-by: Leo Liu <leo.liu@amd.com> Cc: 13.0 <mesa-stable@lists.freedesktop.org> (cherry picked from commit 06e3cd6a45ae2ad19f77e0f283c46d5f85112847)
*	st/omx/dec: result buffers size should match codec decoder size	Leo Liu	2016-11-01	3	-19/+18
\|	Otherwise the check that the decoder size matches the buffer size fails in the kernel. Signed-off-by: Leo Liu <leo.liu@amd.com> Cc: 13.0 <mesa-stable@lists.freedesktop.org> (cherry picked from commit d9b2c4048d55011bb04bd9848a3b47af7216389f)
*	radeonsi: fix behavior of GLSL findLSB(0)	Marek Olšák	2016-11-01	1	-4/+13
\|	12.0 and older need the same fix but elsewhere. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 4bf45a6079b5cc6b0360b637c0c7baa456b8257d)
*	radeonsi: set VGT_GS_ONCHIP_CNTL on CIK and later	Marek Olšák	2016-11-01	1	-0/+8
\|	Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org> (cherry picked from commit e24dc4316487eeaa6ee8aa5c709546d814e96f03)
*	nvc0/ir: fix emission of IMAD with NEG modifiers	Samuel Pitoiset	2016-11-01	2	-2/+2
\|	The emitter tried to emit sub instead of subr when src0 actually has a NEG modifier. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "11.0 12.0 13.0" <mesa-stable@lists.freedesktop.org> (cherry picked from commit 84e946380b2d5ddc62a107b667be39abf1932704)
*	nvc0/ir: fix emission of SHLADD with NEG modifiers	Samuel Pitoiset	2016-10-27	2	-2/+2
\|	This affects GF100:GK110 chipsets, but not GM107+ where the logic is a bit different. The emitters tried to emit sub instead of subr when src0 has a NEG modifier. This fixes the following piglit tests: glsl-fs-loop-nested and glsl-vs-loop-nested. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "13.0" <mesa-stable@lists.freedesktop.org> (cherry picked from commit 1ec7227d44dceae8de7b93f846bbd33d66007909)
*	winsys/amdgpu: fix radeon_surf::macro_tile_index for imported textures	Marek Olšák	2016-10-27	1	-0/+17
\|	Maybe this is why SDMA has been broken for many amdgpu users? SDMA is the only block which is used with imported textures and relies on this variable. DB also uses it, but it doesn't get imported textures, so it's unaffected. I do get SDMA failures on Tonga before this patch if R600_DEBUG=testdma is changed to use imported textures. Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 6ec3b2a4b1d41b83a4721d06b42c49f55e695cbf)
*	gallium/radeon: make sure the address of separate CMASK is aligned properly	Marek Olšák	2016-10-27	1	-2/+3
\|	This should fix random GPU hangs on Hawaii and Fiji. Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit dce05b342355eac9296ee7110385b16d6edb059d)
*	gallium/radeon: fix incorrect bpe use in si_set_optimal_micro_tile_mode	Marek Olšák	2016-10-27	1	-7/+7
\|	Oh my god, I wonder what catastrophic issues this was causing on SI. Cc: 13.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (cherry picked from commit 8a21f52d73936e23a314a288a36782a698c7c1b9)
*	nvc0: use correct bufctx when invalidating CP textures	Samuel Pitoiset	2016-10-27	1	-1/+1
\|	Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org> (cherry picked from commit 7b2712c367891e96384226a1fa94679a814235d0)
*	st/nine: Fix locking CubeTexture surfaces.	Axel Davy	2016-10-27	1	-0/+1
\|	Only one face of Cubetextures was locked when in DEFAULT Pool. Fixes: https://github.com/iXit/Mesa-3D/issues/129 CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Axel Davy <axel.davy@ens.fr> (cherry picked from commit eed605a473554575305e1bf10c3641761a85feb9)
*	st/nine: Fix mistake in Volume9 UnlockBox	Axel Davy	2016-10-27	1	-1/+1
\|	In the format fallback path, the height was used instead of the depth. CC: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Axel Davy <axel.davy@ens.fr> (cherry picked from commit fe7bb46134162c9a9a18832f1746991aa78121e8)
*	st/nine: Fix leak with integer and boolean constants	Axel Davy	2016-10-27	1	-21/+18
\|	Leak introduced by: a83dce01284f220b1bf932774730e13fca6cdd20 The patch also moves the part to release changed.vs_const_i and changed.vs_const_b before the if (!cb.buffer_size) check, to avoid reuploading every draw call if integer or boolean constants are dirty, but the shaders use no constants. Signed-off-by: Axel Davy <axel.davy@ens.fr> CC: "13.0" <mesa-stable@lists.freedesktop.org> (cherry picked from commit 25beccb379731b0e6fc728982190779da47aa6fd)
*	nvc0: do not break 3D state by pushing MS coordinates on Fermi	Samuel Pitoiset	2016-10-24	1	-43/+44
\|	Long story short, 3D and CP are aliased on Fermi and initializing compute after pushing the MS sample coordinate offsets seems to corrupt 3D state for weird reasons. I still don't have the faintest clue what is going on, but this seems to only affect Fermi generation. A possible fix could be to use two different channels, one for 3D and one for CP. This fixes a bunch of regressions pinpointed by piglit. Fixes: "nvc0: fix up image support for allowing multiple samples" Cc: "13.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (cherry picked from commit 42273edf79c2500957f51690499aa3405cc689db)
*	radeonsi: fix 64-bit loads from LDS	Nicolai Hähnle	2016-10-24	1	-1/+1
\|	Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among others. Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 4a2dbfff05f7be271c2aa72e783e24b31906db51)
*	nv50/ir: process texture offset sources as regular sources	Ilia Mirkin	2016-10-24	1	-53/+94
\|	With ARB_gpu_shader5, texture offsets can be any source, including TEMPs and IN's. Make sure to process them as regular sources so that we pick up masks, etc. This should fix some CTS tests that feed offsets directly to textureGatherOffset, and we were not picking up the input use, thus not advertising it in the shader header. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Dave Airlie <airlied@redhat.com> Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org> (cherry picked from commit cd45d758ff87305ceecca899fe7325779bb6755b)
*	nv50,nvc0: avoid reading out of bounds when getting bogus so info	Ilia Mirkin	2016-10-24	2	-2/+8
\|	The state tracker tries to attach the info to the wrong shader. This is easy enough to protect against. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org> (cherry picked from commit 313fba5ee1de9416930e45da8aff63a24763940b)
*	radeonsi: remove cb0_is_integer handling	Marek Olšák	2016-10-19	3	-13/+3
\|	st/mesa does this for us. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
*	draw: improve vertex fetch (v2)	Roland Scheidegger	2016-10-19	3	-86/+134
\|	The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile time cost). Similarly, it looks easier swapping the fetch loops (outer loop per attrib, inner loop filling up the per vertex elements - this way the aos->soa conversion also can be done per attrib and not just at the end though again this doesn't really make much of a difference in the generated code). (This would also make it possible to vectorize the calculations leading to the fetches.) There's also some minimal change simplifying the overflow math slightly. All in all, the generated code seems to look slightly simpler (depending on the actual vs), but more importantly I've seen a significant reduction in compile times for some vs (albeit with old (3.3) llvm version, and the time reduction is only really for the optimizations run on the IR). v2: adapt to other draw change. No changes with piglit. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	draw: improved handling of undefined inputs	Roland Scheidegger	2016-10-19	1	-21/+32
\|	Previous attempts to zero initialize all inputs were not really optimal (though no performance impact was measurable). In fact this is not really necessary, since we know the max number of inputs used. Instead, just generate fetch for up to max inputs used by the shader, directly replacing inputs for which there was no vertex element by zero. This also cleans up key generation, which previously would have stored some garbage for these elements. And also drop the assertion which indicates such bogus usage by a debug_printf (the whole point of initializing the undefined inputs was to make this case safe to handle). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf	Roland Scheidegger	2016-10-19	1	-0/+11
\|	Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*	gallivm: Use native packs and unpacks for the lerps	Roland Scheidegger	2016-10-19	3	-13/+156
\|	For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack, however, works fine with a single instruction, albeit only with llvm 3.8 - the vpmovzxbw). Ideally we'd use more clever pack for llvmpipe backend conversion too since we actually use the "wrong" shuffle (which is more work) when doing the fs twiddle just so we end up with the wrong order for being able to do native pack when converting from 2x8f -> 1x16b. But this requires some refactoring, since the untwiddle is separate from conversion. This is only used for avx2 256bit pack/unpack for now. Improves openarena scores by 8% or so, though overall it's still pretty disappointing how much faster 256bit vectors are even with avx2 (or rather, aren't...). And, of course, eliminating the needless packs/unpacks in the first place would eliminate most of that advantage (not quite all) from this patch. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>