Releases: libocca/occa
v2.0
Features
- The maximum number of kernel arguments can be adjusted at build time. [#718]
- SYCL subgroup size can be set via a kernel property or the @simd_length attribute. [#726]
- Initial support for compiler attribute statements. [#729]
Breaking Changes
- memory::size() returns the number of dtype entries instead of the byte length. [#711]
- Memory copies are now datatype-aware for consistency. [#728]
- The CMake variables OCCA_<MODE>_ENABLED are set in parent scope. [#720]
- CMake build options ENABLE_<OPTION> have been renamed OCCA_ENABLE_<OPTION>. [#733]
- memoryPool has graduated from an experimental feature and is now in the main occa namespace. [#741]
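Code that relied on the old byte-length result of memory::size() needs to multiply by the element size. As a minimal plain C++ sketch (TypedBuffer is a hypothetical stand-in, not an OCCA class), the two quantities relate as follows:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a typed allocation such as occa::memory.
// It is not the OCCA API; it only contrasts entry count with byte length.
template <typename T>
class TypedBuffer {
 public:
  explicit TypedBuffer(std::size_t entries) : data_(entries) {}

  // Number of T entries: the v2.0 meaning of memory::size().
  std::size_t size() const { return data_.size(); }

  // Byte length: what memory::size() reported before v2.0.
  std::size_t byteLength() const { return data_.size() * sizeof(T); }

 private:
  std::vector<T> data_;
};
```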
Bugfixes
- Correctly sync all streams in device::finishAll(). [#723]
- Corruption of memory datatypes when using slices. [#727]
Contributors
OCCA is a community-driven project that relies on the support of people like you. Thank you to everyone who contributed to this release!
Full Changelog: v1.6.0...v2.0.0
v1.6.0
Features
- Devices can be shared by multiple host threads [#672]
- Pass general objects to kernels by value [#676]
- Quick return from some memory functions for zero-sized allocations [#678]
- OKL support for typedef enums and unions [#705]
Bugfixes
- Correctly set source and binary filenames when building a launchedKernel [#666]
Contributors
OCCA is a community-driven project that relies on the support of people like you. Thank you to everyone who contributed to this release!
- @ooreilly
- @BenWibking
- @topazus
- @cjatin
- @mkbosmans
- @TejaX-Alaghari
- @kian-ohara
- @deukhyun-cha
- @amikstcyr
- @thilinarmtb
- @noelchalmers
- @kris-rowe
Full Changelog: v1.5.0...v1.6.0
v1.5.0
Features
Memory Pools
A device memory pool implementation is available in the occa::experimental namespace, targeting applications that frequently allocate/deallocate memory. See Example 17 for more details.
Provide feedback or share your use cases for this feature in the Experimental discussion category.
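The idea behind a pool can be sketched in plain C++: released blocks are kept and handed back out on the next request of the same size, so frequent allocate/free cycles avoid the underlying allocator. SimplePool below is hypothetical and is not how occa::experimental implements its pool; it only illustrates the reuse pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical size-bucketed pool (not OCCA's implementation).
class SimplePool {
 public:
  using Block = std::unique_ptr<char[]>;

  char* allocate(std::size_t bytes) {
    auto& freeList = free_[bytes];
    if (!freeList.empty()) {
      char* ptr = freeList.back();  // reuse a previously released block
      freeList.pop_back();
      return ptr;
    }
    storage_.push_back(Block(new char[bytes]));  // fall back to the system
    ++systemAllocations_;
    return storage_.back().get();
  }

  // Return a block to the pool instead of releasing it to the system.
  void release(char* ptr, std::size_t bytes) { free_[bytes].push_back(ptr); }

  std::size_t systemAllocations() const { return systemAllocations_; }

 private:
  std::vector<Block> storage_;                      // owns every block
  std::map<std::size_t, std::vector<char*>> free_;  // size -> free blocks
  std::size_t systemAllocations_ = 0;
};
```

After a release, the next request of the same size returns the same block without a new system allocation.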
Outward Interoperability
An unwrap function has been added to the core classes (device, memory, stream, and streamTag); it returns a void* pointer to the mode-specific object used to implement each class.
This advanced feature is intended to facilitate interoperability between occa and other accelerated libraries. Application developers are responsible for casting the returned pointer to the correct mode-specific type.
In the future, a type-safe interface will be provided for the C++ API.
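A minimal sketch of the pattern unwrap() implies, using hypothetical types (FakeCudaStream, StreamWrapper) rather than OCCA's: the wrapper exposes its native object only as void*, so the caller should confirm the mode before casting, because a cast to the wrong type is undefined behavior the compiler cannot detect.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical mode-specific object (not a real CUDA or OCCA type).
struct FakeCudaStream {
  int id;
};

// Hypothetical wrapper mimicking unwrap(): the native object is only
// reachable through an untyped void* pointer.
class StreamWrapper {
 public:
  StreamWrapper(std::string mode, FakeCudaStream native)
      : mode_(std::move(mode)), native_(native) {}

  const std::string& mode() const { return mode_; }
  void* unwrap() { return &native_; }

 private:
  std::string mode_;
  FakeCudaStream native_;
};

// Caller-side helper: check the mode first, then cast.
FakeCudaStream* asCudaStream(StreamWrapper& s) {
  if (s.mode() != "CUDA") return nullptr;
  return static_cast<FakeCudaStream*>(s.unwrap());
}
```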
Breaking Changes
- Compiler flags set via occa::json kernel properties now take precedence over the corresponding environment variables. [#622]
Bugfixes
- Dynamic @exclusive sizes [#121]
- Build artifacts (e.g., binaries for kernel + launcher) are not durable [#515]
- Missing initial index value on @inner/@outer loops causes a segfault during translation. [#610]
- A segfault is encountered when destroying an occa::kernel that was created via a multi-kernel OKL file in CUDA, HIP, or DPC++ modes. [#624]
Contributors
OCCA is a community-driven project that relies on the support of people like you. Thank you to everyone who contributed to this release!
Full Changelog: v1.4.0...v1.5.0
Release Version 1.4.0
Features
Stream and device synchronization
- The member function stream::finish() allows for synchronization with a specific stream.
- The member function device::finishAll() synchronizes all streams on a device.
  - This is in contrast to device::finish(), which only synchronizes the current stream.
- Related C and Fortran interfaces have been included for both functions.
- Example 07 streams has been updated to demonstrate intended usage.
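The difference in synchronization scope can be modeled in plain C++. In this sketch, ModelStream and ModelDevice are hypothetical classes (not the OCCA API), and deferred std::future tasks stand in for queued device work: finishing the current stream leaves work on other streams pending, while finishAll() drains every stream.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical model: deferred futures represent queued work; get()
// forces that work to complete, standing in for a device-side wait.
class ModelStream {
 public:
  void enqueue(std::function<void()> work) {
    pending_.push_back(std::async(std::launch::deferred, std::move(work)));
  }
  // Like stream::finish(): wait only for work on this stream.
  void finish() {
    for (auto& f : pending_) f.get();
    pending_.clear();
  }
  std::size_t pendingCount() const { return pending_.size(); }

 private:
  std::vector<std::future<void>> pending_;
};

class ModelDevice {
 public:
  explicit ModelDevice(std::size_t streamCount) : streams_(streamCount) {}
  ModelStream& stream(std::size_t i) { return streams_[i]; }
  void setStream(std::size_t i) { current_ = i; }
  // Like device::finish(): synchronize the current stream only.
  void finish() { streams_[current_].finish(); }
  // Like device::finishAll(): synchronize every stream on the device.
  void finishAll() {
    for (auto& s : streams_) s.finish();
  }

 private:
  std::vector<ModelStream> streams_;
  std::size_t current_ = 0;
};
```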
Breaking Changes
- All MPI related functionality has been removed from OCCA.
Bugfixes
- A race condition that occurs when writing kernel caches. [#594]
- Reported CPU frequencies are now scaled correctly. [#601]
- Redundant OpenMP and library flags have been removed from the CMake build. [#607]
Contributors
OCCA is a community-driven project that relies on the support of people like you. Thank you to everyone who contributed to this release!
Full Changelog: v1.3.0...v1.4.0
Release version 1.3.0
Features
CMake Package Files [#533]
OCCA now provides CMake package files which are configured during installation. These package files define an imported target, OCCA::libocca, and look for all required dependencies.
For example, the CMakeLists.txt of downstream projects using OCCA would include:
find_package(OCCA REQUIRED)
add_executable(downstream-app ...)
target_link_libraries(downstream-app PRIVATE OCCA::libocca)
add_library(downstream-lib ...)
target_link_libraries(downstream-lib PUBLIC OCCA::libocca)
In the case of a downstream library, linking OCCA using the PUBLIC specifier ensures that CMake will automatically forward OCCA's dependencies to applications which use the library.
Environment Module [#580]
During installation, the Env Modules file /modulefiles/occa is generated. When this module is loaded, paths to the installed bin, lib, and include directories are appended to environment variables such as PATH and LD_LIBRARY_PATH.
To use this modulefile, add the following line to your .modulerc file
module use -a <occa-install-prefix>/modulefiles
then call
module load occa
Non-blocking Streams [#498]
The CUDA and HIP backends now support the creation of non-blocking streams.
An example has been added demonstrating how to enable this feature.
Additionally, a new API has been added to wrap native backend streams. [#525]
Profiling and Debugging
An interface has been added for logging the memory high watermark. [#522]
OCCA preprocessor error messages have also been improved [#572]
OKL
A new attribute, @nobarrier, prevents the automatic addition of barriers to @inner loop blocks. [#544]
Kernel Loop Ranges [#531]
When @inner loop ranges are known at compile-time, compiler optimization directives are added to translated kernel code for the CUDA, HIP, OpenCL, and SYCL backends.
If @inner loop ranges are passed as a kernel argument, the OKL translator will not automatically add optimization directives. In this case, the attribute @max_inner_dims can be used to achieve the same effect.
Dependency Changes
- The minimum version of CMake required is now v3.17 [#528]
Bugfixes
- Compilation on MacOS [#485]
- streamTag timings for OpenCL were corrected to give meaningful results [#518]
- Git ignore build dir if it's a symlink [#536]
- Use mpi_f08 module to fix Intel compiler warnings [#539]
- Use correct directory in run_examples script [#540]
- HIP compiler error and warnings [#547]
- Examples JSON [#549]
- Broken caching for "output" file [#554]
Contributors
OCCA is a community-driven project that relies on the support of people like you. Thank you to everyone who contributed to this release!
- @Luthaf
- @AljenU
- @deukhyun-cha
- @wjhorne
- @MalachiTimothyPhillips
- @stgeke
- @SFrijters
- @noelchalmers
- @kris-rowe
Full Changelog: v1.2.0...v1.3.0
v1.2.0
Exciting News
We introduce 3 awesome new features to the OCCA library. They are still in the experimental stage, mainly for performance reasons. We found an initial approach to enabling inlined lambdas and wanted to see how far we could go with them.
Future work includes profiling and optimizing the build + launch of the inlined lambdas. How we cache kernel builds and fetch from the cache is still up in the air, but we're looking forward to tackling this fun problem.
occa::forLoop and inlined kernels
Basic Example
Here we generate a for-loop that iterates through [0, N), tiled by tileSize:
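Conceptually, tiling [0, N) by tileSize splits each index into a tile start and an offset within the tile. The plain C++ sketch below (not OCCA code; tiledVisitCounts is a hypothetical helper) assumes N need not be a multiple of tileSize, so it guards the last, partially filled tile, and records how often each index is visited:

```cpp
#include <cassert>
#include <vector>

// Plain C++ model of tiling [0, N) by tileSize (not OCCA code).
// The outer loop walks tile starts 0, tileSize, 2 * tileSize, ...;
// the inner loop walks offsets within a tile.
std::vector<int> tiledVisitCounts(int N, int tileSize) {
  std::vector<int> visits(N, 0);
  for (int outerIndex = 0; outerIndex < N; outerIndex += tileSize) {
    for (int innerIndex = 0; innerIndex < tileSize; ++innerIndex) {
      const int index = outerIndex + innerIndex;
      if (index < N) {  // skip padding in the last, partial tile
        ++visits[index];
      }
    }
  }
  return visits;
}
```

Every entry of the returned vector is 1, i.e. each index in [0, N) is visited exactly once.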
occa::forLoop()
.tile({N, tileSize})
.run(scope, OCCA_FUNCTION([=](const int index) -> void {
// ...
}));
We can do the same manually by calling .outer and .inner:
occa::forLoop()
.outer(occa::range(0, N, tileSize))
.inner(tileSize)
.run(scope, OCCA_FUNCTION([=](const int outerIndex, const int innerIndex) -> void {
const int index = outerIndex + innerIndex;
// ...
}));
Indices + Multiple Dimensions
We give an example where an index array is passed rather than a simple occa::range.
Additionally, this @inner loop has 2 dimensions, so the OCCA_FUNCTION is expected to take an int2 for the inner indices:
occa::array<int> indices;
// ...
occa::forLoop()
.outer(indices)
.inner(X, Y)
.run(scope, OCCA_FUNCTION([=](const int outerIndex, const int2 innerIndex) -> void {
// ...
}));
occa::array and Functional Programming
We introduce a simple typed wrapper around occa::memory that provides some of the core map and reduce functional methods.
Example
const double dynamicValue = 10;
const double compileTimeValue = 100;
occa::scope scope({
// Passed as arguments
{"dynamicValue", dynamicValue}
}, {
// Passed as compile-time #defines
{"defines/compileTimeValue", compileTimeValue}
});
// Assumes `values` is an existing occa::array<int>
occa::array<double> doubleValues = (
values.map(OCCA_FUNCTION(scope, [](int value) -> double {
return compileTimeValue + (dynamicValue * value);
}))
);
We also include a helper method occa::range, which implements most of the occa::array methods but can be used without allocating data before iteration. It's useful when there is no specific input/output but you still need to call a generic map or reduce function.
// Iterates through [0, 1, 2]
occa::range(3).map(...);
// Iterates through [3, 4, 5]
occa::range(3, 6).map(...);
// Iterates through [6, 5, 4]
occa::range(6, 3).map(...);
// Iterates through [0, 2, 4]
occa::range(0, 6, 2).map(...);
// Iterates through [6, 4, 2]
occa::range(6, 0, -2).map(...);
// No-op since there isn't anything to iterate through
occa::range(6, 0, 1).map(...);
Core methods
forEach, mapTo, map, reduce
Reduction
every, max, min, some
Re-indexing
reverse, shiftLeft, shiftRight
Utility methods
cast, clamp, clampMax, clampMin, concat, dot, fill, slice
Search
findIndex, find, includes, indexOf, lastIndexOf
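The iteration order documented for occa::range above can be reproduced in plain C++. This is a sketch under assumptions, not the OCCA implementation: rangeValues and sumOf are hypothetical helpers, `end` is treated as exclusive, and an omitted step is taken as +1 or -1 depending on direction, which matches the listed examples.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Values visited from start toward end (exclusive) by step.
// A step pointing away from end, or a zero step, yields nothing.
std::vector<int> rangeValues(int start, int end, int step) {
  std::vector<int> values;
  if (step > 0) {
    for (int i = start; i < end; i += step) values.push_back(i);
  } else if (step < 0) {
    for (int i = start; i > end; i += step) values.push_back(i);
  }
  return values;
}

// Omitted step: +1 for ascending ranges, -1 for descending ones.
std::vector<int> rangeValues(int start, int end) {
  return rangeValues(start, end, start <= end ? 1 : -1);
}

// Single argument: iterate [0, end).
std::vector<int> rangeValues(int end) { return rangeValues(0, end); }

// A sum reduction over the visited values.
int sumOf(const std::vector<int>& values) {
  return std::accumulate(values.begin(), values.end(), 0);
}
```

For instance, sumOf(rangeValues(3, 6)) sums 3, 4 and 5.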
Atomics
It's still in its experimental stage, but OKL now allows for basic atomic operations!
Note: @atomic should be fully available for Serial and OpenMP modes. There is probably still room for improvement in the OpenMP implementation!
Other modes (HIP, CUDA, OpenCL) don't have general atomics implemented; they only support the following basic updates:
@atomic value += update;
@atomic value -= update;
@atomic value &= update;
@atomic value |= update;
@atomic value ^= update;
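On the host, the guarantee these updates ask for is the one std::atomic provides: concurrent read-modify-write operations are not lost. As a plain C++ analogy (not OKL, and not how OCCA translates @atomic), += corresponds to fetch_add, and -=, &=, |= and ^= to fetch_sub, fetch_and, fetch_or and fetch_xor. The hypothetical atomicSum helper below shows the += case:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Several threads increment one counter; fetch_add makes each
// increment indivisible, so no update is lost.
int atomicSum(int threadCount, int updatesPerThread) {
  std::atomic<int> value{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threadCount; ++t) {
    workers.emplace_back([&value, updatesPerThread] {
      for (int i = 0; i < updatesPerThread; ++i) {
        value.fetch_add(1, std::memory_order_relaxed);  // like @atomic value += 1
      }
    });
  }
  for (auto& w : workers) w.join();
  return value.load();
}
```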
Inlined @atomic
@atomic *ptr += value;
Block @atomic
If you prefer, you can use blocks, which are equivalent to inlined @atomic use when possible:
@atomic {
*ptr += value;
}
However, generic @atomic blocks are also possible:
@atomic {
*ptr += value;
*ptr2 += value2;
}
DPC++ Backend
The DPC++ backend was added by the great work completed jointly by ALCF and Intel, with contributions from:
- Anoop Madhusoodhanan Prabha (Intel)
- Cedric Andreolli (Intel)
- Kris Rowe (ALCF)
- Phillipe Thierry (Intel)
- Saumil Patel (ALCF)
Notes
Currently only building with CMake is supported.
Code Transformation Rewrite
The way statement and expression code transformations are done has been fully rewritten! A functional occa::lang::array class was introduced to help with statement (statement_t) and expression (exprNode) iteration and transformation. More information in PR #404.
Additionally the occa::lang::expr class helps create expressions easily without having to worry about pointers or underlying node objects. More information on PR #407.
Breaking Changes
- This is more of a potential breaking change, but in a series of commits we finally split up the public/private API!
- occa::properties is now deprecated and replaced with occa::json
occa::properties wasn't adding much on top of occa::json, and instead made auto-casting harder since we had to handle both json and prop objects. We still keep the properties and props naming convention throughout the library, since that's what they are, but have transitioned the types to occa::json.
We still have a typedef json properties; so there shouldn't be any type-breaking changes for C++. The big difference is how std::string is being cast to json/properties:
  - std::string → occa::properties: The std::string value is parsed into its JSON value. For example, we can pass {key: 1} or key: 1
  - std::string → occa::json: The occa::json value is a literal string value. For example, if we pass {key: 1}, then the occa::json value will be a string whose value is "{key: 1}".
Details about the refactor:
  - [C++] The only breaking change is that property strings now need the surrounding braces ({}) to make them valid JSON
  - [C] All property methods have been removed and should be replaced with the Json methods
  - [Fortran] All property methods have been removed and should be replaced with the Json methods
- [#475] We're removing umalloc + UVA since it only adds extra overhead and introduces a 3rd way to manage memory along with occa::memory and occa::array.
Features
- [#376] Adds the host: true option to malloc for better host-allocation strategies (Thanks @noelchalmers!)
- [#404] Code transformation just got easier with the introduction of the very functional statementArray and exprNodeArray, which make it easy to:
  - Iterate through statements (forEach or nestedForEach (recursive))
  - Filter statements (filter or flatFilter (recursive))
  - Transform expressions (exprNode) through exprNodeArray::inplaceMap
- [#407] Introduces the occa::lang::expr helper class to build expressions without having to know the underlying exprNode types or worry about pointers!
- [#408] Adds the okl/strict_headers kernel property to avoid erroring on headers OCCA can't find. Useful for mode-specific system headers.
- [#409] Adds sourceCodeStatement to inject non-standard source code when needed.
- [#410] Adds @atomic support (TODO: Finish most base implementations)
- [#411] Updates bash autocomplete
- [#420] Adds occa::setOccaCacheDir to programmatically set the OCCA_CACHE_DIR at runtime
- [#421] Handle new HIP output formats in our builds (Thanks @dmcdougall!)
- [#427] Adds occa::getDeviceCount (Thanks @noelchalmers!)
- [#425] We now test CMake builds in our Github Actions (Thanks @noelchalmers!)
- [#435] Adds device.wrapMemory to wrap native pointers into occa::memory objects
- [#459] Adds occaKernelRunWithArgs, which takes an occaType pointer
- [#494] Adds DPC++ backend (Thanks @kris-rowe!)
- [#490] Change...
v1.1.0
Exciting News
Fortran API
I'm super excited to announce the Fortran API! This was single-handedly designed and built by @awehrfritz, so huge thanks!! The API is not finalized, but it will most likely not change much in the future since the design matches our other language APIs.
For more information, the initial PR can be found here: #341
Collaboration
For most of OCCA's development, the work was done by a very small group of people. Over the last few years, the project has grown from a research project into one used by a few organizations.
During this release, we added CMake support. While it doesn't directly add any development features, it opens the OCCA library to a greater audience, which some might say is even more impactful than adding features. What makes this even more exciting is how many unrelated collaborators took part in this work!
Lots of PRs that made this happen: #310, #313, #319, #323, #329, #344, #345, #357
Many thanks to
Breaking Changes
- [de598e6] OCCA now compiles with C++11. C++ projects will need to add the -std=c++11 flag (for most compilers) to their compilation.
- [f4fea62] Renamed occa::hash_t methods:
  - hash_t::toString() → hash_t::getString()
  - hash_t::toFullString() → hash_t::getFullString()
- [#322] Updates occaFree to take in the argument by reference rather than value:
occaFree(value)
→
occaFree(&value)
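The release notes do not state the motivation for this change. One common reason for passing a handle by reference (a pointer, in C) to a free function is that the function can then reset the caller's handle, so a later use is detectable. A minimal C++ sketch with a hypothetical Handle type (not OCCA's occaType):

```cpp
#include <cassert>

// Hypothetical opaque handle owning one resource.
struct Handle {
  int* resource = nullptr;
};

// Because the caller's handle is passed by pointer, the free function
// can null it out after releasing the resource.
void freeHandle(Handle* handle) {
  delete handle->resource;
  handle->resource = nullptr;
}

bool isInitialized(const Handle& handle) { return handle.resource != nullptr; }
```

Freeing the same handle twice is harmless here, since deleting a null pointer is a no-op.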
Experimental
- [#341] The Fortran API
- [08b3a68] Adds the OCCA_JIT and OCCA_JIT_WITH_SCOPE macros. Examples are available for C++ and C. For example:
OCCA_JIT(
  (entries, a, b, ab),
  (
    for (int i = 0; i < entries; ++i; @tile(16, @outer, @inner)) {
      ab[i] = 100 * (a[i] + b[i]);
    }
  )
);
- [0a77696] Adds okl-mode.el for editing OKL kernels in Emacs
Features
- [f813c34] Adds templated malloc for easier use while keeping backwards compatibility
Original malloc
occa::memory mem = occa::malloc(10 * sizeof(float), src);
→
Initial dtype malloc
occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
→
New malloc
occa::memory mem = occa::malloc<float>(10, src);
- [92ffb58] Adds templated umalloc for easier use while keeping backwards compatibility
float *a = (float*) occa::umalloc(10, occa::dtype::float_, src);
void *b = occa::umalloc(10 * sizeof(float), src);
→
float *a = occa::umalloc<float>(10, src);
void *b = occa::umalloc(10 * sizeof(float), src);
- [c61d636] Adds templated ptr for easier use. Defaults to returning void* for backwards compatibility.
occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
float *ptr = (float*) mem.ptr();
→
occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
float *ptr = mem.ptr<float>();
- [c61d636] Adds use_host_pointer to memory props to auto-wrap source pointers during malloc calls
float *hostPtr = new float[10];
occa::memory mem = occa::malloc<float>(10, hostPtr, "use_host_pointer: true");
mem.ptr<float>() == hostPtr; // true: mem wraps hostPtr
- Adds polyfills to test compilation of locally unsupported modes
- [284aff8] Adds methods to get the kernel hash
  - C++: occa::kernel::hash(), which returns an occa::hash_t object
  - C: occaKernelGetHash and occaKernelGetFullHash, which return the hash as a const char*
- [f2f21a3] Adds Metal backend for GPGPU in MacOS
  - Requires MacOS 10.14 (Mojave) or later
  - Requires Xcode 10.2.1 or later
  - Metal does not support double or long types
  - Issues with global typedef due to missing address space qualifiers
- [386bc4c] Adds occa translate --launcher to get the host code needed to launch the device kernels (CUDA, HIP, OpenCL, Metal modes)
- [#246] Adds the @directive preprocessor attribute to add directives inside macros, such as OCCA_JIT
@directive("#pragma ivdep")
→
#pragma ivdep
- [#265] Adds the OCCA_CONFIG config file to set defaults. There is a config.defaults.json file explaining the properties that can be set, including mode-specific properties.
- [#266] Allows HIP to compile CUDA kernels (Thanks @noelchalmers!)
- [#270] Adds occa::null for passing a NULL equivalent to occa::kernels (occaNull in C)
- [#284] Adds OCCA_LDFLAGS along with kernel/compiler_linker_flags (Thanks @stgeke!)
- [#308] Adds OCCA_SHARED_FLAGS along with kernel/compiler_shared_flags
- [#308] Adds support to build native C kernels by disabling OKL (set okl/enabled to false) and setting kernel/compiler_language to C (it defaults to C++) (Thanks @amikstcyr!)
- [#346] Supports #include of standard C and C++ headers in OKL kernels. Note that this prints warnings, since adding these headers is not a portable solution across supported backends.
- [#347] Adds some standard defines to OKL kernels so users can check whether a kernel is being processed as OKL. This is useful when reusing source code for OCCA kernels and non-OCCA kernels.
- [#349] Keeps some comments around after applying OKL transformations for cleaner generated kernels.
- [#354] Adds the OKL_KERNEL_HASH define to help debug which kernel is currently being run (Tip: printf and std::cout are available in Serial and OpenMP modes!)
- [#349][#355][#358][#364] Keeps comments around when transpiling kernels
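The templated malloc in [f813c34] lets the element type determine the byte count. A minimal sketch of that pattern in plain C++, where byteMalloc and typedMalloc are hypothetical stand-ins (backed by std::malloc) for the byte-based and templated OCCA calls:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical byte-based allocator, standing in for the original
// occa::malloc(bytes, src) signature.
void* byteMalloc(std::size_t bytes) { return std::malloc(bytes); }

// Templated overload: callers pass an entry count, the byte count is
// derived from sizeof(T), and the result is already typed.
template <typename T>
T* typedMalloc(std::size_t entries) {
  return static_cast<T*>(byteMalloc(entries * sizeof(T)));
}
```

The byte-based function stays available, which is how such an overload can remain backwards compatible.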
Bugs Fixed
- [ebdb659] Updates to HIP backend (Thanks @noelchalmers!)
- [ac117fb] Fixed caching bugs (Thanks Nigel Nunn!)
- [5420005] Use .dylib instead of .so on MacOS (Thanks @thilinarmtb!)
- [ce4df26] Properly copy over artifacts when building with PREFIX (Thanks @thilinarmtb!)
- [#243] Properly avoid overriding and duplicating compiler shared flags (Thanks @noelchalmers!)
- [f23ce88] Avoids writing lockfile when checking compiler vendor
- [3df3955] Properly fixed untyped umalloc in C
- [4d5d5bc] Kernels from strings were badly generating the launcher kernel
- [27a7420] OpenCL translation was converting the const qualifier of const pointer typedefs → __constant
- [#261] Invalid read in json->properties unsafe cast (Thanks for pointing it out @stgeke!)
- [#265] Fixes object/mode-specific properties not propagating
- [86dead2] OpenCL timing was done backwards, resulting in negative times. (Thanks @tcew!)
- [#293] Fixed some reference counting issues with the kernelBuilder
- [#400] CUDA context was not being set in a few places (Thanks @amikstcyr!)
Contributors
v1.0.9
Features
- [beec086] Added struct support to OKL
There are still a few missing features when using structs, such as:
  - typedef-ing structs
typedef struct { } foo;
  - Expanding @attributes on struct variables
struct mat3 {
int *values @dim(3, 3);
};
mat3 m;
// Error since the parser right now doesn't "know" `values` is a @dim(3, 3)
m.values(0, 0);
  - Access level modifiers are not supported at the moment
struct foo {
private:
...
};
- [bf1dd16] @restrict expands to __restrict__ by default
  - OpenCL mode overrides it to restrict
  - Setting the property options/restrict overrides either of those two values. For example:
    - disable will make it so @restrict is ignored
    - Any other value will be used instead (e.g. setting it to '__declspec(restrict)' would be preferred on Windows)
- [897f600] Defaults compiler flags to optimize compilation (e.g. -O3)
Bugs Fixed
- [e21962d] CPU wrapped memory was being freed by the occa::memory object
v1.0.8
Announcement
Python API released!
Check it out at libocca/occa.py or install it by running
pip install occa
- Most of the core API is ported to Python
- NumPy arrays are used seamlessly with occa.memory objects
- First steps to supporting JIT-compiled Python functions as OKL kernels:
@okl.kernel
def py_add_vectors(a: Const[List[np.float32]],
                   b: Const[List[np.float32]],
                   ab: List[np.float32]) -> None:
    for i in okl.range(entries).tile(16):
        ab[i] = a[i] + b[i]
→
@kernel void py_add_vectors(const float *a,
const float *b,
float *ab) {
for (int i = 0; i < entries; ++i; @tile(16, @outer, @inner)) {
ab[i] = a[i] + b[i];
}
}
Features
- [54f4003] Added dtypes, which can optionally be used for runtime type checking
  - New class occa::dtype_t
  - Optional typed occa::memory allocation
occa::malloc(10 * sizeof(float)); // Regular malloc
occa::malloc(10, occa::dtype::float_); // Typed malloc
occa::malloc(10, occa::dtype::get<float>()); // Templated typed malloc
occaMalloc(10 * sizeof(float), NULL, occaDefault); // Regular malloc
occaTypedMalloc(10, occaDtypeFloat, NULL, occaDefault); // Typed malloc
  - API for creating custom dtypes, for example:
occa::dtype_t vec3; // { float x, y, z }
vec3.addField("x", occa::dtype::float_);
vec3.addField("y", occa::dtype::float_);
vec3.addField("z", occa::dtype::float_);
- [994eb2a] Added more kernel methods for the C API
void occaKernelPushArg(occaKernel kernel, occaType arg);
void occaKernelClearArgs(occaKernel kernel);
void occaKernelRunFromArgs(occaKernel kernel);
void occaKernelVaRun(occaKernel kernel, const int argc, va_list args);
- [f6333f2] Custom kernel library paths, for example:
// Application code
occa::io::addLibraryPath("mylibrary", "./path/to/kernels/dir");
occa::io::addLibraryPath("mylibrary", "${MY_LIBRARY_DIR}");
// Kernel
#include "occa://mylibrary/kernel.okl"
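The occa:// include scheme maps a registered library name to a directory. A minimal sketch of that lookup in plain C++. LibraryPaths is hypothetical and keeps only one directory per library (the example above registers two for mylibrary, and how OCCA searches multiple paths is not described here). It also does not expand environment variables such as ${MY_LIBRARY_DIR}:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical resolver for "occa://<library>/<file>" include paths.
class LibraryPaths {
 public:
  void add(const std::string& library, const std::string& dir) {
    dirs_[library] = dir;  // a later call replaces an earlier one
  }

  // Returns the filesystem path, or "" if the URI is not an occa:// URI
  // or the library was never registered.
  std::string resolve(const std::string& uri) const {
    const std::string prefix = "occa://";
    if (uri.compare(0, prefix.size(), prefix) != 0) return "";
    const std::string rest = uri.substr(prefix.size());
    const std::size_t slash = rest.find('/');
    if (slash == std::string::npos) return "";
    auto it = dirs_.find(rest.substr(0, slash));
    if (it == dirs_.end()) return "";
    return it->second + rest.substr(slash);  // keeps the leading '/'
  }

 private:
  std::map<std::string, std::string> dirs_;  // library name -> directory
};
```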
Bugs Fixed
v1.0.7
Breaking Changes
- [cd68708] Updated wrapMemory to take in an occa::device and occa::properties
Before
occa::cpu::wrapMemory(void* ptr, const udim_t bytes)
After
occa::cpu::wrapMemory(occa::device device, void* ptr, const udim_t bytes, occa::properties props)
- [959ec4a] Renamed occaSetDeviceFromInfos to fit the rest of the methods
Before
occaSetDeviceFromInfos(const char *info)
After
occaSetDeviceFromString(const char *info)
- [7735c66] Removed some redundant stream methods
Before
occa::device::freeStream(occa::stream) // C++
occaDeviceFreeStream(occaStream) // C
After (Not new)
occa::stream::free() // C++
occaFree(occaStream) // C
- [f81054d] Removed occa::opencl::event() and moved it to occa::opencl::streamTag::clEvent
- [f81054d] Removed occa::cuda::event() and moved it to occa::cuda::streamTag::cuEvent
- [f81054d] Removed occa::streamTag::tagTime. Tags can only be used for:
  - Waiting for queued tasks to finish (e.g. launched kernels or memory copies)
  - Measuring the time gap between 2 tags
Features
- [daf0300] Faster make build and added make info @v-dobrev
- [1024a62] Switched garbage collection strategy to NULL out existing device/kernel/memory objects when one is freed. This turns SEGFAULT issues into occa::exception errors that can be more easily debugged.
- [527494c] Linalg methods reuse device buffers for reductions
- [ce46013] Loading cached kernels is sped up by avoiding locks if possible
- [e27b29e] Added occaJson
- [fdd2d7c] Added occaCreateDeviceFromString
- [fdd2d7c] Added CLI to C example OpenCL mode
- [959ec4a] Added UVA methods to C API
- [7735c66] The occa::stream class can now be extended
- [f81054d] The occa::streamTag class can now be extended
Bugs Fixed
- [99ce6fb] Linalg properly deletes array allocations @jdahm
- [b7384bc] Kernel hashes are generated only from needed props (e.g. ignores verbose)
- [780a06a] OpenCL __global, __local, and __kernel are properly inserted in the beginning
- [dba0db9] memory::slice was improperly freeing UVA pointers in
- [3260a05] The verbose property was being overwritten in CUDA mode