Exciting News
We introduce 3 awesome new features to the OCCA library. They are still in the experimental stage, mainly due to performance reasons. We found an initial approach to enabling inlined lambdas and wanted to see how far we could go with them.
Future work includes profiling and optimizing the build and launch of the inlined lambdas. How we cache kernel builds and fetch from the cache is still up in the air, but we're looking forward to tackling this fun problem.
occa::forLoop and inlined kernels
Basic Example
Here we generate a for-loop that iterates over [0, N) in tiles of size tileSize:
occa::forLoop()
  .tile({N, tileSize})
  .run(scope, OCCA_FUNCTION([=](const int index) -> void {
    // ...
  }));
We can also do it manually by calling .outer and .inner:
occa::forLoop()
  .outer(occa::range(0, N, tileSize))
  .inner(tileSize)
  .run(scope, OCCA_FUNCTION([=](const int outerIndex, const int innerIndex) -> void {
    // outerIndex steps through 0, tileSize, 2 * tileSize, ...
    const int index = outerIndex + innerIndex;
    // ...
  }));
Indices + Multiple Dimensions
Here is an example where an index array is passed rather than a simple occa::range.
Additionally, this @inner loop has 2 dimensions, so the OCCA_FUNCTION is expected to take an int2 for the inner indices:
occa::array<int> indices;
// ...
occa::forLoop()
  .outer(indices)
  .inner(X, Y)
  .run(scope, OCCA_FUNCTION([=](const int outerIndex, const int2 innerIndex) -> void {
    // ...
  }));
occa::array and Functional Programming
We introduce occa::array, a simple typed wrapper around occa::memory that provides some of the core functional map and reduce methods.
Example
const double dynamicValue = 10;
const double compileTimeValue = 100;
occa::scope scope({
  // Passed as arguments
  {"dynamicValue", dynamicValue}
}, {
  // Passed as compile-time #defines
  {"defines/compileTimeValue", compileTimeValue}
});
// `values` is assumed to be an existing occa::array<int>
occa::array<double> doubleValues = (
  values.map(OCCA_FUNCTION(scope, [=](int value) -> double {
    return compileTimeValue + (dynamicValue * value);
  }))
);
We also include a helper method, occa::range, which implements most of the occa::array methods but can be used without allocating data before iteration. It's useful when there is no specific input or output but you still need to call a generic map or reduce function.
// Iterates through [0, 1, 2]
occa::range(3).map(...);
// Iterates through [3, 4, 5]
occa::range(3, 6).map(...);
// Iterates through [6, 5, 4]
occa::range(6, 3).map(...);
// Iterates through [0, 2, 4]
occa::range(0, 6, 2).map(...);
// Iterates through [6, 4, 2]
occa::range(6, 0, -2).map(...);
// No-op since there isn't anything to iterate through
occa::range(6, 0, 1).map(...);
Core methods
- forEach
- mapTo
- map
- reduce
Reduction
- every
- max
- min
- some
Re-indexing
- reverse
- shiftLeft
- shiftRight
Utility methods
- cast
- clamp
- clampMax
- clampMin
- concat
- dot
- fill
- slice
Search
- findIndex
- find
- includes
- indexOf
- lastIndexOf
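To make the occa::range iteration rules from the examples earlier in this section concrete, here is a minimal plain C++ sketch that lists the values each call would visit. rangeValues is a hypothetical helper written for illustration only. It is not part of OCCA, and it assumes nothing beyond the semantics shown in those examples: the end is exclusive, and a missing step defaults to +1 or -1 depending on direction.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an OCCA function): returns the values that
// range(start, end, step) would visit, with an exclusive end.
std::vector<int> rangeValues(int start, int end, int step) {
  assert(step != 0 && "step must be non-zero");
  std::vector<int> values;
  if (step > 0) {
    for (int i = start; i < end; i += step) values.push_back(i);
  } else {
    for (int i = start; i > end; i += step) values.push_back(i);
  }
  return values;
}

// Two-argument form: the step is inferred from the direction
std::vector<int> rangeValues(int start, int end) {
  return rangeValues(start, end, (start <= end) ? 1 : -1);
}

// One-argument form: iterates through [0, end)
std::vector<int> rangeValues(int end) {
  return rangeValues(0, end);
}
```

With this sketch, rangeValues(6, 0, 1) returns an empty vector, which mirrors the no-op case above: a positive step can never walk from 6 down to 0.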
Atomics
It's still in its experimental stage, but OKL now allows for basic atomic operations!
@atomic should be fully available for Serial and OpenMP modes. There is probably still room for improvement in the OpenMP implementation!
The HIP, CUDA, and OpenCL modes don't have general atomics implemented; they only support the following basic updates:
- @atomic value += update;
- @atomic value -= update;
- @atomic value &= update;
- @atomic value |= update;
- @atomic value ^= update;
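For readers more familiar with standard C++ than OKL, the five basic updates have close counterparts in the compound-assignment operators of std::atomic. The sketch below is only an analogy in plain C++. It does not show how OCCA implements @atomic, and basicAtomicUpdates is a hypothetical name chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Plain C++ analogy, not OCCA code: applies the five basic updates to a
// std::atomic<int> and records the value after each one.
std::vector<int> basicAtomicUpdates(int initial) {
  std::atomic<int> value(initial);
  std::vector<int> history;

  value += 5;        // counterpart of `@atomic value += update;`
  history.push_back(value.load());

  value -= 2;        // counterpart of `@atomic value -= update;`
  history.push_back(value.load());

  value &= 0b1100;   // counterpart of `@atomic value &= update;`
  history.push_back(value.load());

  value |= 0b0011;   // counterpart of `@atomic value |= update;`
  history.push_back(value.load());

  value ^= 0b0101;   // counterpart of `@atomic value ^= update;`
  history.push_back(value.load());

  return history;
}
```

Each of these operators is a single atomic read-modify-write, so the same statements stay safe when several threads update one shared variable. That is the property the basic @atomic updates are meant to provide inside a kernel.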
Inlined @atomic
@atomic *ptr += value;
Block @atomic
If you prefer, you can use blocks, which are equivalent to the inlined @atomic form when possible:
@atomic {
  *ptr += value;
}
However, generic @atomic blocks are also possible:
@atomic {
  *ptr += value;
  *ptr2 += value2;
}
DPC++ Backend
The DPC++ backend was added thanks to great work completed jointly by ALCF and Intel, with contributions from:
- Anoop Madhusoodhanan Prabha (Intel)
- Cedric Andreolli (Intel)
- Kris Rowe (ALCF)
- Phillipe Thierry (Intel)
- Saumil Patel (ALCF)
Notes
Currently only building with CMake is supported.
Code Transformation Rewrite
The way statement and expression code transformations are done has been fully rewritten! A functional occa::lang::array class was introduced to help with statement (statement_t) and expression (exprNode) iteration and transformation. More information in PR #404.
Additionally, the occa::lang::expr class helps create expressions easily without having to worry about pointers or underlying node objects. More information in PR #407.
Breaking Changes
- This is more of a potential breaking change, but in a series of commits we finally split up the public/private API!
- occa::properties is now deprecated and replaced with occa::json
occa::properties wasn't adding much on top of occa::json, and instead made auto-casting harder since we had to handle both json and prop objects. We still keep the properties and props naming convention throughout the library, since that's what they are, but have transitioned the types to occa::json.
We still have a
typedef json properties;
so there shouldn't be any type-breaking changes for C++. The big difference is how std::string is cast to json/properties:
- std::string -> occa::properties: The std::string value is parsed into its JSON value. For example, we can pass {key: 1} or key: 1
- std::string -> occa::json: The occa::json value is a literal string value. For example, if we pass {key: 1} then the occa::json value will be a string whose value is "{key: 1}".
Details about the refactor:
- [C++] The only breaking change is that property strings now need the surrounding braces ({}) to be valid JSON
- [C] All property methods have been removed and should be replaced with the Json methods
- [Fortran] All property methods have been removed and should be replaced with the Json methods
- [Removing umalloc on v1.2.0 (Feedback Wanted) #475] We're removing umalloc + UVA since it only adds extra overhead and introduces a 3rd way to manage memory along with occa::memory and occa::array.
Features
- [[Core] Adding hostMalloc API #376] Adds host: true option to malloc for better host-allocation strategies (Thanks @noelchalmers!)
- [[Lang] Add statementArray methods and switch over CUDA #404] Code transformation just got easier with the introduction of the very functional statementArray and exprNodeArray, which make it easy to:
  - Iterate through statements (forEach or nestedForEach (recursive))
  - Filter statements (filter or flatFilter (recursive))
  - Transform expressions (exprNode) through exprNodeArray::inplaceMap
- [[Lang][Expr] Adds expr: expression builder helper #407] Introduces the occa::lang::expr helper class to build expressions without having to know the underlying exprNode types or worry about pointers!
- [[Lang][Preprocessor] Adds okl/strict_headers option #408] Adds okl/strict_headers kernel property to avoid erroring on headers OCCA can't find. Useful for mode-specific system headers.
- [[Lang] Adds source code statement #409] Adds sourceCodeStatement to inject non-standard source code when needed.
- [[Lang] Adds @.atomic and a few implementations #410] Adds @atomic support (TODO: Finish most base implementations)
- [[CLI] Better autocomplete #411] Updates bash autocomplete
- [[QOL] Adds occa::env::setOccaCacheDir #420] Adds occa::setOccaCacheDir to programmatically set the OCCA_CACHE_DIR at runtime
- [Make build system understand new hipconfig output #421] Handle new HIP output formats in our builds (Thanks @dmcdougall!)
- [[Modes] Adding a getDeviceCount API to query number of available devices in an enabled mode #427] Adds occa::getDeviceCount (Thanks @noelchalmers!)
- [[CMake] Enable use of CMake in github actions .yml (continued from libocca/occa#342) #425] We now test CMake builds in our GitHub Actions (Thanks @noelchalmers!)
- [[Core] Adds wrapMemory #435] Adds device.wrapMemory to wrap native pointers into occa::memory objects
- [[C] Adds occaKernelRunWithArgs #459] Adds occaKernelRunWithArgs which takes an occaType pointer
- [[Modes] DPC++ backend #494] Adds DPC++ backend (Thanks @kris-rowe)
- [[Core][Modes] New buffer object to track underlying mode allocations … #490] Changes how occa::memory tracks objects to properly handle slicing (Thanks @noelchalmers)
Bugs Fixed
- [[CMake] Fix intermittent errors when building Fortran examples/tests #405] [[CLI] Report OCCA_MPI_ENABLED #412] [[CMake] Fix changed subdirs in tests #413] [[Fix] shadowing warnings #414] Lots of fixes from @SFrijters, thank you!
- [[#396] Avoids kernel calls on noop dim sizes #431] Avoid calling kernels when the run sizes are 0
- [[#399][Sys] Remove sysctl usage #432] Removed use of sysctl since it was deprecated and later removed from glibc
- [[HIP] Update ptr ref hackery #437] Fix HIP pointer reference
- [Presumed race condition in caching #449] Replacing locks with temporary files to avoid race conditions (Thanks @SFrijters + @jedbrown!!)
- [[Lang] Adding missing if statement condition to inner statement array #456] Iterates through an if statement's condition (Thanks @noelchalmers!)
- [[Fortran] Generate fixed arity interface #477] Generate fixed arity interface (Thanks @SFrijters!)
- [Invalid reads in occa::primitive #495] Fixes unsafe use of strncmp (Thanks @MalachiTimothyPhillips!)