
ASTC weights SIMD encoding #298

Open · wants to merge 1 commit into master
Conversation

@ronanbel commented May 6, 2022

ssse3 (i5-6300): 163 => 136 ms
arm (A53): 340 => 282 ms

I moved the block weight transform code into a single function: pack_astc_block_weights. You can enable/disable the SIMD code with the BASISD_ASTC_SIMD define. All the SIMD code is annotated.

Tested x86_64 on Windows, compiled with VS2019 and clang 11. Tested arm and arm64 on Android, compiled with the latest NDK (clang 11).

If needed, you can get in touch at:
[email protected]
[email protected]

@richgel999 (Contributor)
Thank you - this is great. I normally shy away from merging code that I can't easily maintain, but let me see what I can do. How much does this help encoding performance?

@richgel999 added the "enhancement" (New feature or request) label on May 12, 2022
Commit: fix previous issue + optimize unpack
@ronanbel (Author) commented May 12, 2022 via email
@ronanbel (Author) commented May 12, 2022 via email
@ronanbel (Author) commented May 12, 2022 via email

Review comment on lines +12580 to +12581:
uint16x8_t bitMask0 = vshlq_u16( vdupq_n_u16(1), bitNum0 ); // bitMask = (1U << n) - 1U
uint16x8_t bitMask1 = vshlq_u16( vdupq_n_u16(1), bitNum1 ); // bitMask = (1U << n) - 1U


This fails on GCC without -flax-vector-conversions:

../subprojects/basis_universal/transcoder/basisu_transcoder.cpp: In function 'void basist::pack_astc_block_weights(uint8_t*, const uint8_t*, int, int)':
../subprojects/basis_universal/transcoder/basisu_transcoder.cpp:12010:80: note: use '-flax-vector-conversions' to permit conversions between vectors with differing element types or numbers of subparts
12010 |                                 uint8x8_t       rev8 = vqmovn_u16( vcombine_u16( rev8lohi, vdup_n_u8(0) ) );                    //      8bits in 4 u8 (clear lower 32)
      |                                                                    ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
../subprojects/basis_universal/transcoder/basisu_transcoder.cpp:12010:101: error: cannot convert 'uint8x8_t' to 'uint16x4_t'
12010 |                                 uint8x8_t       rev8 = vqmovn_u16( vcombine_u16( rev8lohi, vdup_n_u8(0) ) );                    //      8bits in 4 u8 (clear lower 32)
      |                                                                                            ~~~~~~~~~^~~
      |                                                                                                     |
      |                                                                                                     uint8x8_t

An explicit cast fixes it:

Suggested change:
- uint16x8_t bitMask0 = vshlq_u16( vdupq_n_u16(1), bitNum0 ); // bitMask = (1U << n) - 1U
- uint16x8_t bitMask1 = vshlq_u16( vdupq_n_u16(1), bitNum1 ); // bitMask = (1U << n) - 1U
+ uint16x8_t bitMask0 = vshlq_u16( vdupq_n_u16(1), (int16x8_t)bitNum0 ); // bitMask = (1U << n) - 1U
+ uint16x8_t bitMask1 = vshlq_u16( vdupq_n_u16(1), (int16x8_t)bitNum1 ); // bitMask = (1U << n) - 1U

@ronanbel (Author) commented Apr 26, 2023 via email

Labels: enhancement (New feature or request)
3 participants