Public API
User-facing functions supported by cuNumeric.jl
cuNumeric.versioninfo Method
versioninfo()
Prints the cuNumeric build configuration summary, including package metadata, Julia and compiler version, and paths to core dependencies.
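Examples
A minimal usage sketch; the printed summary depends on the local installation:
cuNumeric.versioninfo()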
cuNumeric.disable_gc! Method
disable_gc!()
Disables the automatic garbage collection heuristics. This gives the user full control over memory management.
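Examples
A minimal sketch; after this call, managing device memory is the user's responsibility:
cuNumeric.disable_gc!()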
cuNumeric.init_gc! Method
init_gc!()
Initializes the cuNumeric garbage collector by querying the available device memory and enabling the automatic GC heuristics.
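Examples
A minimal sketch; this queries available device memory and enables the automatic heuristics:
cuNumeric.init_gc!()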
cuNumeric.get_time_microseconds Method
Returns the timestamp in microseconds. Blocks on all Legate operations preceding the call to this function.
cuNumeric.get_time_nanoseconds Method
Returns the timestamp in nanoseconds. Blocks on all Legate operations preceding the call to this function.
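Examples
A timing sketch using the microsecond timer; this assumes the returned timestamps can be subtracted to obtain an elapsed time, and elapsed_us is an illustrative variable name:
t0 = cuNumeric.get_time_microseconds()
A = cuNumeric.rand(NDArray, 1000, 1000)
B = A + A
t1 = cuNumeric.get_time_microseconds()
elapsed_us = t1 - t0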
Base.:- Method
Base.:-(val::Number, arr::NDArray)
Base.:-(arr::NDArray, val::Number)
Base.:-(lhs::NDArray, rhs::NDArray)
Perform subtraction involving an NDArray and a scalar, or between two NDArrays.
val - arr subtracts each element of arr from the scalar val.
arr - val subtracts the scalar val from each element of arr.
lhs - rhs performs element-wise subtraction between two NDArrays.
Broadcasting is also supported for these operations.
Examples
lhs - 3
3 - rhs
lhs - rhs
Base.:/ Method
Base.:/(arr::NDArray, val::Scalar)
Base.Broadcast.broadcasted(::typeof(/), arr::NDArray, val::Scalar)
Divides each element of arr by the scalar val, implemented as element-wise multiplication by the reciprocal 1 / val.
Examples
arr = cuNumeric.ones(2, 2)
arr / 2
Base.:== Method
==(arr::NDArray, julia_arr::Array)
==(julia_arr::Array, arr::NDArray)
Compare an NDArray
and a Julia Array
for element-wise equality.
Returns true
if both arrays have the same shape and all corresponding elements are equal. Returns false
otherwise (including if sizes differ, with a warning).
Warning
This function uses scalar indexing and should not be used in production code. This is meant for testing.
Examples
arr = cuNumeric.ones(2, 2)
julia_arr = ones(2, 2)
arr == julia_arr
julia_arr == arr
julia_arr2 = zeros(2, 2)
arr == julia_arr2
Base.:== Method
==(arr1::NDArray, arr2::NDArray)
Check if two NDArrays are equal element-wise.
Returns true
if both arrays have the same shape and all corresponding elements are equal. Currently supports arrays up to 3 dimensions. For higher dimensions, returns false
with a warning.
Warning
This function uses scalar indexing and should not be used in production code. This is meant for testing.
Examples
a = cuNumeric.ones(2, 2)
b = cuNumeric.ones(2, 2)
a == b
c = cuNumeric.zeros(2, 2)
a == c
Base.Broadcast.broadcasted Method
Base.Broadcast.broadcasted(::typeof(/), lhs::NDArray, rhs::NDArray)
Perform element-wise division of two NDArrays.
Examples
A = cuNumeric.rand(2, 2)
B = cuNumeric.ones(2, 2)
C = A ./ B
typeof(C)
Base.Broadcast.broadcasted Method
Base.Broadcast.broadcasted(::typeof(/), val::Scalar, arr::NDArray)
Throws an error since element-wise division of a scalar by an NDArray is not supported yet.
Examples
arr = cuNumeric.ones(2, 2)
# 2 ./ arr # ERROR
Base.copy Method
Base.copy(arr::NDArray)
Create and return a deep copy of the given NDArray
.
Examples
a = cuNumeric.ones(2, 2)
b = copy(a)
b === a
b[1,1] == a[1,1]
Base.firstindex Method
Base.firstindex(arr::NDArray, dim::Int)
Base.lastindex(arr::NDArray, dim::Int)
Base.lastindex(arr::NDArray)
Provide the first and last valid indices along a given dimension dim for NDArray.
firstindex always returns 1, since Julia arrays are 1-indexed.
lastindex(arr, dim) returns the size of the array along the specified dimension.
lastindex(arr) returns the size along the first dimension.
Examples
arr = cuNumeric.rand(4, 5);
firstindex(arr, 2)
lastindex(arr, 2)
lastindex(arr)
Base.isapprox Method
isapprox(arr1::NDArray, arr2::NDArray; atol=0, rtol=0)
isapprox(arr::NDArray, julia_array::AbstractArray; atol=0, rtol=0)
isapprox(julia_array::AbstractArray, arr::NDArray; atol=0, rtol=0)
Approximate equality comparison between two NDArray
s or between an NDArray
and a Julia AbstractArray
.
Returns true
if the arrays have the same shape and all corresponding elements are approximately equal within the given absolute tolerance atol
and relative tolerance rtol
.
The second and third methods handle comparisons between NDArray
and Julia arrays by forwarding to a common comparison function.
Warning
This function uses scalar indexing and should not be used in production code. This is meant for testing.
Examples
arr1 = cuNumeric.ones(2, 2)
arr2 = cuNumeric.ones(2, 2)
julia_arr = ones(2, 2)
isapprox(arr1, arr2)
isapprox(arr1, julia_arr)
isapprox(julia_arr, arr2)
Base.size Method
Base.size(arr::NDArray)
Base.size(arr::NDArray, dim::Int)
Return the size of the given NDArray
.
Base.size(arr) returns a tuple of the dimensions of the array.
Base.size(arr, dim) returns the size of the array along the specified dimension dim.
These override Base's size methods for the NDArray type, using the underlying cuNumeric API to query the array shape.
Examples
arr = cuNumeric.rand(3, 4, 5);
size(arr)
size(arr, 2)
LinearAlgebra.mul! Method
LinearAlgebra.mul!(out::NDArray, arr1::NDArray, arr2::NDArray)
Compute the matrix multiplication (dot product) of arr1
and arr2
, storing the result in out
.
This function performs the operation in-place, modifying out
.
Examples
a = cuNumeric.ones(2, 3)
b = cuNumeric.ones(3, 2)
out = cuNumeric.zeros(2, 2)
LinearAlgebra.mul!(out, a, b)
Random.rand! Method
cuNumeric.rand!(arr::NDArray)
Fills arr
with Float64s uniformly at random.
cuNumeric.rand(NDArray, dims::Int...)
cuNumeric.rand(NDArray, dims::Tuple)
Create a new NDArray
of element type Float64, filled with uniform random values.
This function uses the same signature as Base.rand
with a custom backend, and currently supports only Float64
with uniform distribution (code = 0
).
Examples
cuNumeric.rand(NDArray, 2, 2)
cuNumeric.rand(NDArray, (4, 1))
A = cuNumeric.zeros(2, 2); cuNumeric.rand!(A)
cuNumeric.add! Method
add!(out::NDArray, arr1::NDArray, arr2::NDArray)
Compute element-wise addition of arr1
and arr2
storing the result in out
.
This is an in-place operation and is used to support .+=
style syntax.
Examples
a = cuNumeric.ones(2, 2)
b = cuNumeric.ones(2, 2)
out = similar(a)
add!(out, a, b)
cuNumeric.assign Method
assign(arr::NDArray, other::NDArray)
Assign the contents of other
to arr
element-wise.
This function overwrites the data in arr
with the values from other
. Both arrays must have the same shape.
Examples
a = cuNumeric.zeros(2, 2)
b = cuNumeric.ones(2, 2)
cuNumeric.assign(a, b);
a[1,1]
cuNumeric.full Method
cuNumeric.full(dims::Tuple, val)
cuNumeric.full(dim::Int, val)
Create an NDArray
filled with the scalar value val
, with the shape specified by dims
.
Examples
cuNumeric.full((2, 3), 7.5)
cuNumeric.full(4, 0)
cuNumeric.multiply! Method
multiply!(out::NDArray, arr1::NDArray, arr2::NDArray)
Compute element-wise multiplication of arr1
and arr2
, storing the result in out
.
This function performs the operation in-place, modifying out
.
Examples
a = cuNumeric.ones(2, 2)
b = cuNumeric.ones(2, 2)
out = similar(a)
multiply!(out, a, b)
cuNumeric.ones Method
cuNumeric.ones([T=Float32,] dims::Int...)
cuNumeric.ones([T=Float32,] dims::Tuple)
Create an NDArray with element type T, filled with ones, with the size specified by dims. This function has the same signature as Base.ones, so be sure to call it as cuNumeric.ones.
Examples
cuNumeric.ones(2, 2)
cuNumeric.ones(Float32, 3)
cuNumeric.ones(Int32, (2, 3))
cuNumeric.zeros Method
cuNumeric.zeros([T=Float32,] dims::Int...)
cuNumeric.zeros([T=Float32,] dims::Tuple)
Create an NDArray with element type T, filled with zeros, with the size specified by dims. This function mirrors the signature of Base.zeros, and defaults to Float32 when the type is omitted.
Examples
cuNumeric.zeros(2, 2)
cuNumeric.zeros(Float64, 3)
cuNumeric.zeros(Int32, (2,3))
cuNumeric.unary_op_map_no_args Constant
Supported Unary Operations
The following unary operations are supported and can be applied directly to NDArray
values:
• abs
• acos
• asin
• asinh
• atan
• atanh
• cbrt
• conj
• cos
• cosh
• deg2rad
• exp
• exp2
• expm1
• floor
• log
• log10
• log1p
• log2
• - (negation)
• rad2deg
• sin
• sinh
• sqrt
• square
• tan
• tanh
These operations are applied elementwise by default and follow standard Julia semantics.
Examples
A = NDArray(randn(Float32, 3, 3))
abs(A)
log.(A .+ 1)
-sqrt(abs(A))
square(A)
cuNumeric.unary_reduction_map Constant
Supported Unary Reduction Operations
The following unary reduction operations are supported and can be applied directly to NDArray
values:
• maximum
• minimum
• prod
• sum
These operations follow standard Julia semantics.
Examples
A = NDArray(randn(Float32, 3, 3))
maximum(A)
sum(A)
cuNumeric.binary_op_map Constant
Supported Binary Operations
The following binary operations are supported and can be applied elementwise to pairs of NDArray
values:
• +
• -
• *
• /
• ^
• div
• atan
• hypot
These operations are applied elementwise by default and follow standard Julia semantics.
Examples
A = NDArray(randn(Float64, 4))
B = NDArray(randn(Float64, 4))
A + B
A / B
hypot.(A, B)
div.(A, B)
A .^ 2
CUDA.jl Tasking
This section will detail how to use custom CUDA.jl kernels with the Legate runtime.
cuNumeric.@cuda_task Macro
@cuda_task(f(args...))
Compile a Julia GPU kernel to PTX, register it with the Legate runtime, and return a CUDATask
object for later launch.
Arguments
• f — The name of the Julia CUDA.jl GPU kernel function to compile.
• args... — Example arguments to the kernel, used to determine the argument type signature when generating PTX.
Description
This macro automates the process of:
1. Inferring the CUDA argument types for the given args using map_ndarray_cuda_types.
2. Using CUDA.code_ptx to compile the specified GPU kernel (f) into raw PTX text for the inferred types.
3. Extracting the kernel's function symbol name from the PTX using extract_kernel_name.
4. Registering the compiled PTX and kernel name with the Legate runtime via ptx_task, making it available for GPU execution.
5. Returning a CUDATask struct that stores the kernel name and type signature, which can be used to configure and launch the kernel later.
Notes
• The args... are not executed; they are used solely for type inference.
• This macro is intended for use with the Legate runtime and assumes a CUDA context is available.
• Make sure your kernel code is GPU-compatible and does not rely on unsupported Julia features.
Example
mytask = @cuda_task my_kernel(A, B, C)
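For reference, my_kernel above would be an ordinary CUDA.jl kernel. The definition below is a hypothetical sketch of such a kernel, not part of the cuNumeric API:
using CUDA

# Hypothetical element-wise addition kernel; inside the kernel A, B, C are device arrays.
function my_kernel(A, B, C)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(C)
        @inbounds C[i] = A[i] + B[i]
    end
    return nothing
end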
cuNumeric.@launch Macro
@launch(; task, blocks=(1,), threads=(256,), inputs=(), outputs=(), scalars=())
Launch a GPU kernel (previously registered via @cuda_task
) through the Legate runtime.
Keywords
• task — A CUDATask object, typically returned by @cuda_task.
• blocks — Tuple or single element specifying the CUDA grid dimensions. Defaults to (1,).
• threads — Tuple or single element specifying the CUDA block dimensions. Defaults to (256,).
• inputs — Tuple or single element of input NDArray objects.
• outputs — Tuple or single element of output NDArray objects.
• scalars — Tuple or single element of scalar values.
Description
The @launch macro validates the provided keywords, ensuring that only the allowed set (:task, :blocks, :threads, :inputs, :outputs, :scalars) is present. It then expands to a call to cuNumeric.launch, passing the given arguments to the Legate runtime for execution.
This macro is meant to provide a concise, declarative syntax for launching GPU kernels, separating kernel compilation (via @cuda_task) from execution configuration.
Notes
• task must be a kernel registered with the runtime, usually from @cuda_task.
• All keyword arguments must be specified as assignments, e.g. blocks=(2,2), not as positional arguments.
• Defaults are chosen for single-block, 256-thread 1D launches.
• The macro escapes its body so that the values of inputs/outputs/scalars are captured from the surrounding scope at macro expansion time.
Example
mytask = @cuda_task my_kernel(A, B, C)
@launch task=mytask blocks=(8,8) threads=(32,32) inputs=(A, B) outputs=(C)
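A slightly fuller, hypothetical sketch that sizes the grid from the array length; my_kernel is the user-defined kernel from the @cuda_task example, and n and n_threads are illustrative names:
n = 1024
A = cuNumeric.rand(NDArray, n)
B = cuNumeric.rand(NDArray, n)
C = cuNumeric.zeros(Float64, n)
mytask = @cuda_task my_kernel(A, B, C)
n_threads = 256
n_blocks = cld(n, n_threads)   # enough blocks to cover all n elements
@launch task=mytask blocks=(n_blocks,) threads=(n_threads,) inputs=(A, B) outputs=(C,)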
CNPreferences
This section details how to set custom build configuration options. For more details, see the install guide.
CNPreferences.check_unchanged Method
CNPreferences.check_unchanged()
Throws an error if the preferences have been modified in the current Julia session, or if they are modified after this function is called.
This should be called from the __init__() function of any package which relies on the values of CNPreferences.
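A sketch of how a dependent package might use this (MyPackage is a hypothetical package name):
module MyPackage

using CNPreferences

function __init__()
    # Error out early if CNPreferences were changed after the package was loaded.
    CNPreferences.check_unchanged()
end

end # module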
CNPreferences.use_conda Method
CNPreferences.use_conda(conda_env::String; export_prefs = false, force = true)
Tells cuNumeric.jl to use an existing conda install. We make no guarantees of compiler compatibility at this time.
Expects conda_env to be the absolute path to the root of the environment, for example /home/julialegate/.conda/envs/cunumeric-gpu.
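For example, from the Julia REPL (using the path shown above); a Julia restart is typically required for new preferences to take effect:
using CNPreferences
CNPreferences.use_conda("/home/julialegate/.conda/envs/cunumeric-gpu")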
CNPreferences.use_developer_mode Method
CNPreferences.use_developer_mode(; wrapper_branch="main", use_cupynumeric_jll=true, cupynumeric_path=nothing, export_prefs = false, force = true)
Tells cuNumeric.jl to enable developer mode. This will clone cunumeric_jl_wrapper into cuNumeric.jl/deps.
To specify a cunumeric_jl_wrapper branch: wrapper_branch="some-branch"
To disable using cupynumeric_jll: use_cupynumeric_jll=false
If you disable cupynumeric_jll, then you need to set a path to cuPyNumeric with cupynumeric_path="/path/to/cupynumeric"
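For example, to develop against a local cuPyNumeric build (the branch name and path are illustrative):
using CNPreferences
CNPreferences.use_developer_mode(wrapper_branch="main", use_cupynumeric_jll=false, cupynumeric_path="/path/to/cupynumeric")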
CNPreferences.use_jll_binary Method
CNPreferences.use_jll_binary(; export_prefs = false, force = true)
Tells cuNumeric.jl to use the JLL binaries. This is the default option.
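For example, to switch back to the default JLL binaries:
using CNPreferences
CNPreferences.use_jll_binary()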
Internal API
cuNumeric.NDArray Type
Internal API
The NDArray type represents a multi-dimensional array in cuNumeric. It is a wrapper around a Legate array and provides various methods for array manipulation and operations. Finalizer calls nda_destroy_array
to clean up the underlying Legate array when the NDArray is garbage collected.
Base.eltype Method
Base.eltype(arr::NDArray)
Internal API
Returns the element type of the NDArray
.
This method uses nda_array_type_code
internally to map to the appropriate Julia element type.
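A minimal sketch:
arr = cuNumeric.ones(Float32, 3)
eltype(arr)   # Float32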
cuNumeric.LegateType Method
LegateType(T::Type)
Internal API
Converts a Julia type T
to the corresponding Legate type.
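A minimal sketch (internal API; shown for illustration only):
cuNumeric.LegateType(Float64)   # the Legate type corresponding to Float64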
cuNumeric.compare Method
compare(x, y, max_diff)
Internal API
Compare two arrays x
and y
for approximate equality within a maximum difference max_diff
.
Supports comparisons between:
• an NDArray and a Julia AbstractArray
• two NDArrays
• a Julia AbstractArray and an NDArray
Returns true
if the arrays have the same shape and element type (for mixed types), and all corresponding elements differ by no more than max_diff
.
Emits warnings when array sizes or element types differ.
Warning
This function uses scalar indexing and should not be used in production code. This is meant for testing.
Notes
• This is an internal API used by higher-level approximate equality functions.
• Does not support relative tolerance (rtol).
Behavior
• Checks size compatibility.
• Checks element type compatibility for NDArray vs Julia array.
• Iterates over elements using CartesianIndices to compare element-wise differences.
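A minimal sketch (internal API, intended for testing only); the tolerance value is illustrative:
a = cuNumeric.ones(2, 2)
b = cuNumeric.ones(2, 2)
cuNumeric.compare(a, b, 1e-6)   # true when every element differs by at most 1e-6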
cuNumeric.padded_shape Method
padded_shape(arr::NDArray)
Internal API
Return the size of the given NDArray, including any padding.
cuNumeric.shape Method
shape(arr::NDArray)
Internal API
Return the size of the given NDArray
.
cuNumeric.slice_array Method
slice_array(slices::Vararg{Tuple{Union{Int,Nothing},Union{Int,Nothing}},N}) where {N}
Internal API
Constructs a vector of cuNumeric.Slice
objects from a variable number of (start, stop)
tuples.
Each tuple corresponds to a dimension slice, using slice
internally.
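A minimal sketch (internal API); each tuple below is a (start, stop) pair for one dimension, and the exact meaning of nothing is determined by the underlying slice helper:
cuNumeric.slice_array((1, 3), (nothing, nothing))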
cuNumeric.to_cpp_index Function
to_cpp_index(d::Int64, int_type::Type=UInt64)
Internal API
Converts a single Julia 1-based index d
to a zero-based C++ style index wrapped in StdVector
.
cuNumeric.to_cpp_index Method
to_cpp_index(idx::Dims{N}, int_type::Type=UInt64) where {N}
Internal API
Converts a Julia 1-based index tuple idx
to a zero-based C++ style index wrapped in StdVector
of the specified integer type.
Each element of idx
is decremented by 1 to adjust from Julia’s 1-based indexing to C++ 0-based indexing.
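A minimal sketch; the Julia index tuple (2, 3) would be wrapped as the zero-based indices (1, 2):
cuNumeric.to_cpp_index((2, 3))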