- Index
- December 2023
VPERMB — Permute Packed Bytes Elements
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
EVEX.128.66.0F38.W0 8D /r VPERMB xmm1 {k1}{z}, xmm2, xmm3/m128 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in xmm3/m128 using byte indexes in xmm2 and store the result in xmm1 using writemask k1. |
EVEX.256.66.0F38.W0 8D /r VPERMB ymm1 {k1}{z}, ymm2, ymm3/m256 | A | V/V | AVX512VL AVX512_VBMI | Permute bytes in ymm3/m256 using byte indexes in ymm2 and store the result in ymm1 using writemask k1. |
EVEX.512.66.0F38.W0 8D /r VPERMB zmm1 {k1}{z}, zmm2, zmm3/m512 | A | V/V | AVX512_VBMI | Permute bytes in zmm3/m512 using byte indexes in zmm2 and store the result in zmm1 using writemask k1. |
Instruction Operand Encoding ¶
Op/En | Tuple Type | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|---|
A | Full Mem | ModRM:reg (w) | EVEX.vvvv (r) | ModRM:r/m (r) | N/A |
Description ¶
Copies bytes from the second source operand (the third operand) to the destination operand (the first operand) according to the byte indices in the first source operand (the second operand). Note that this instruction permits a byte in the source operand to be copied to more than one location in the destination operand.
Only the low 6(EVEX.512)/5(EVEX.256)/4(EVEX.128) bits of each byte index is used to select the location of the source byte from the second source operand.
The first source operand is a ZMM/YMM/XMM register. The second source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location. The destination operand is a ZMM/YMM/XMM register updated at byte granularity by the writemask k1.
Operation ¶
VPERMB (EVEX encoded versions) ¶
(KL, VL) = (16, 128), (32, 256), (64, 512) IF VL = 128: n := 3; ELSE IF VL = 256: n := 4; ELSE IF VL = 512: n := 5; FI; FOR j := 0 TO KL-1: id := SRC1[j*8 + n : j*8] ; // location of the source byte IF k1[j] OR *no writemask* THEN DEST[j*8 + 7: j*8] := SRC2[id*8 +7: id*8]; ELSE IF zeroing-masking THEN DEST[j*8 + 7: j*8] := 0; *ELSE DEST[j*8 + 7: j*8] remains unchanged* FI ENDFOR DEST[MAX_VL-1:VL] := 0;
Intel C/C++ Compiler Intrinsic Equivalent ¶
VPERMB __m512i _mm512_permutexvar_epi8( __m512i idx, __m512i a);
VPERMB __m512i _mm512_mask_permutexvar_epi8(__m512i s, __mmask64 k, __m512i idx, __m512i a);
VPERMB __m512i _mm512_maskz_permutexvar_epi8( __mmask64 k, __m512i idx, __m512i a);
VPERMB __m256i _mm256_permutexvar_epi8( __m256i idx, __m256i a);
VPERMB __m256i _mm256_mask_permutexvar_epi8(__m256i s, __mmask32 k, __m256i idx, __m256i a);
VPERMB __m256i _mm256_maskz_permutexvar_epi8( __mmask32 k, __m256i idx, __m256i a);
VPERMB __m128i _mm_permutexvar_epi8( __m128i idx, __m128i a);
VPERMB __m128i _mm_mask_permutexvar_epi8(__m128i s, __mmask16 k, __m128i idx, __m128i a);
VPERMB __m128i _mm_maskz_permutexvar_epi8( __mmask16 k, __m128i idx, __m128i a);
SIMD Floating-Point Exceptions ¶
None.
Other Exceptions ¶
See Exceptions Type E4NF.nb in Table 2-50, “Type E4NF Class Exception Conditions.”