Releases: alpaka-group/alpaka
0.3.5: Enhance OpenMP atomics performance
0.3.4: Support for CUDA 10 and bug fixes
Compatibility Changes:
- added support for boost-1.68.0
- added support for CUDA 10
- support for glibc < 2.18 (fix missing macros)
- added checks for available OpenMP versions
Bug Fixes:
- fixed empty(StreamCpuAsync) returning true even though the last task was still in progress
- fixed integer overflows in case of int16_t being used as accelerator index type
- made some potentially throwing destructors non-throwing to support clang 7
- fixed broken alpaka::math::min for non-integral types
New Features:
- added prepareForAsyncCopy which can be called to enable async copies for a specific buffer (if it is supported)
- allowed alpaka OpenMP 2 block-accelerated kernels to run within an existing parallel region
- added alpaka::ignore_unused which can be used in kernels
0.3.3: RNG enhancements
0.3.2: Support for CUDA 9.2
Bug fix and compatibility release for version 0.3.1.
New Features:
- Enhanced the compiler compatibility checks within the CMake scripts
Bugs Fixed:
- fixed a missing error when the runtime used a wrong OpenMP thread count; the check was previously only triggered in debug mode
- fixed CUDA driver API error handling
- fixed CUDA memcpy and memset for zero sized buffers (division by zero)
- fixed OpenMP 4 execution
- fixed the VS2017 CUDA build (not officially supported)
- fixed CUDA callback execution not waiting for the task to finish executing
- fixed the cudaOnly test being part of `make test` when CUDA-only mode is not enabled
Compatibility Changes:
- added support for CUDA 9.2
0.3.1: A Bunch of CMake Fixes
Bug fix release for version 0.3.0.
New Features:
- CMake: added the `BUILD_TESTING` option to control building of tests
- CMake: unified requirement of CMake 3.7.0+
- CMake: used targets for Boost dependencies
- CMake: made alpaka a pure interface library
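`BUILD_TESTING` follows the standard CMake/CTest convention; a typical configure step might look like this (paths are placeholders):

```cmake
# Configure alpaka with its tests enabled; set to OFF to skip
# building the test targets (standard CTest convention).
cmake -DBUILD_TESTING=ON ..
```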
Bugs Fixed:
- fixed getDevCount documentation
- fixed undefined define warnings
- fixed self containing header check for CUDA
Compatibility Changes:
- n/a
0.3.0: Bug Fixes and New Features
This is the third release of alpaka.
Bugs Fixed:
- fixed multiple bugs where CPU streams/events could deadlock or behaved differently from the native CUDA events
- fixed a bug where the block synchronization of the Boost.Fiber backend crashed due to uninitialized variables
New Features / Enhancements:
- added support for stream callbacks, allowing arbitrary host code to be enqueued via `alpaka::stream::enqueue(stream, [&](){...});`
- added support for compiling for multiple architectures, e.g. `ALPAKA_CUDA_ARCH="20;35"`
- added support for using `__host__ constexpr` code within `__device__` code
- enhanced the CUDA error handling
- enhanced the documentation for mapping CUDA to alpaka
Compatibility Changes:
- added support for CUDA 9.0 and 9.1
- added support for CMake 3.9 and 3.10
- removed support for CMake 3.6 and older
- added support for boost-1.65.0
- removed support for boost-1.61.0 and older
- added support for gcc 7
- added support for clang 4 and 5
- removed support for VS2015
0.2.0: Compatibility Fixes and Enhancements
This is the second release of alpaka. It contains no major changes, only compatibility fixes and small enhancements:
- the documentation has been greatly enhanced
- adds support for CUDA 8.0
- adds support for CMake versions 3.6, 3.7 and 3.8
- adds support for Boost 1.62, 1.63 and 1.64
- adds support for clang-3.9
- adds support for Visual Studio 2017
- alpaka now compiles cleanly even with clang `-Weverything`
- re-enabled the `boost::fiber` accelerator backend, which was disabled in the last release
API changes:
- `mapIdx` was moved from namespace `alpaka::core` to `alpaka::idx`
- `Vec` was moved from namespace `alpaka` to `alpaka::vec`
- `vec::Vec` is now allowed to be zero-dimensional (was previously forbidden)
- added `vec::concat`
- added element-wise `operator<` for `vec::Vec`, which returns a vector of bool
- CPU accelerators now support arbitrary dimensionality (both kernel execution and memory operations)
- added support for `syncBlockThreadsPredicate` with `block::sync::op::LogicalOr`, `block::sync::op::LogicalAnd` and `block::sync::op::Count`
- memory allocations are now aligned optimally for the underlying architecture (16 bytes for SSE, 32 bytes for AVX, 64 bytes for AVX-512) instead of 16 bytes for all architectures as in the previous release
Thanks to all contributors!