Public API
User-facing functions supported by cuNumeric.jl
cuNumeric.versioninfo Method
versioninfo()
Prints the cuNumeric build configuration summary, including package metadata, Julia and compiler versions, and paths to core dependencies.
cuNumeric.disable_gc! Method
disable_gc!()
Disables the automatic garbage collection heuristics. This gives the user full control over memory management.
cuNumeric.init_gc! Method
init_gc!()
Initializes the cuNumeric garbage collector by querying the available device memory and enabling the automatic GC heuristics.
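A minimal sketch of using the two together; when and whether to call them is application-specific:
using cuNumeric
cuNumeric.init_gc!()     # query device memory and enable the automatic GC heuristics
# ... run cuNumeric workloads ...
cuNumeric.disable_gc!()  # take full manual control of memory management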
cuNumeric.@cunumeric Macro
@cunumeric expr
Wraps a block of code so that all temporary NDArray allocations (e.g. from slicing or function calls) are tracked and safely freed at the end of the block. Ensures proper cleanup of GPU memory by inserting maybe_insert_delete calls automatically.
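A hedged sketch, assuming the macro is applied to a begin ... end block:
A = cuNumeric.rand(NDArray, 100, 100)
cuNumeric.@cunumeric begin
    B = A[1:50, 1:50]     # temporary from slicing is tracked
    s = sum(B .+ 1)       # temporaries from broadcasting are tracked too
end                       # tracked temporaries are freed here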
cuNumeric.allowpromotion Function
allowpromotion([true])
allowpromotion([true]) do
...
end
Use this function to allow or disallow promotion to double precision, either globally or for the duration of the do block.
See also: @allowpromotion.
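For example, a sketch of the global and do-block forms:
cuNumeric.allowpromotion(true)        # allow promotion to double precision globally

cuNumeric.allowpromotion(false) do    # disallow promotion only inside this block
    # operations here must not promote to Float64
end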
cuNumeric.allowscalar Function
allowscalar([true])
allowscalar([true]) do
...
end
Use this function to allow or disallow scalar indexing, either globally or for the duration of the do block.
See also: @allowscalar.
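For example, a sketch of the global and do-block forms:
arr = cuNumeric.ones(2, 2)
cuNumeric.allowscalar(false)          # disallow scalar indexing globally

cuNumeric.allowscalar(true) do        # re-enable it only inside this block
    arr[1, 1]
end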
cuNumeric.assertpromotion Method
assertpromotion(op)
Assert that a certain operation op performs promotion to a wider type. If this is not allowed, an error will be thrown (see allowpromotion).
cuNumeric.assertscalar Method
assertscalar(op::String)
Assert that a certain operation op performs scalar indexing. If this is not allowed, an error will be thrown (see allowscalar).
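A sketch of guarding a code path with these assertions; the operation names passed here are illustrative:
cuNumeric.assertscalar("getindex")              # throws unless scalar indexing is currently allowed
cuNumeric.assertpromotion("Float32 + Float64")  # throws unless promotion is currently allowed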
cuNumeric.@allowpromotion Macro
@allowpromotion() begin
# code that can use promotion to double precision
end
Denote which operations are allowed to promote to double precision.
See also: allowpromotion.
cuNumeric.@allowscalar Macro
@allowscalar() begin
# code that can use scalar indexing
end
Denote which operations can use scalar indexing.
See also: allowscalar.
cuNumeric.get_time_microseconds Method
Returns the timestamp in microseconds. Blocks on all Legate operations preceding the call to this function.
cuNumeric.get_time_nanoseconds Method
Returns the timestamp in nanoseconds. Blocks on all Legate operations preceding the call to this function.
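A hedged timing sketch; because the call blocks on preceding Legate operations, the difference brackets the enclosed work (assuming a zero-argument call and that the returned timestamps subtract directly):
A = cuNumeric.rand(NDArray, 1000, 1000)
t0 = cuNumeric.get_time_nanoseconds()
B = A .+ A                              # workload to time
t1 = cuNumeric.get_time_nanoseconds()
elapsed_ns = t1 - t0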
Base.:== Method
==(arr::NDArray, julia_arr::Array)
==(julia_arr::Array, arr::NDArray)
Compare an NDArray and a Julia Array for element-wise equality.
Returns true if both arrays have the same shape and all corresponding elements are equal. Returns false otherwise (including if sizes differ, with a warning).
Warning
This function uses scalar indexing and should not be used in production code. This is meant for testing.
Examples
arr = cuNumeric.ones(2, 2)
julia_arr = ones(2, 2)
arr == julia_arr
julia_arr == arr
julia_arr2 = zeros(2, 2)
arr == julia_arr2
Base.:== Method
==(arr1::NDArray, arr2::NDArray)
Check if two NDArrays are equal element-wise.
Returns true if both arrays have the same shape and all corresponding elements are equal. Currently supports arrays up to 3 dimensions. For higher dimensions, returns false with a warning.
Warning
This function uses scalar indexing and should not be used in production code. This is meant for testing.
Examples
a = cuNumeric.ones(2, 2)
b = cuNumeric.ones(2, 2)
a == b
c = cuNumeric.zeros(2, 2)
a == c
Base.copy Method
Base.copy(arr::NDArray)
Create and return a deep copy of the given NDArray.
Examples
a = cuNumeric.ones(2, 2)
b = copy(a)
b === a
b[1,1] == a[1,1]
Base.copyto! Method
copyto!(arr::NDArray, other::NDArray)
Assign the contents of other to arr element-wise.
This function overwrites the data in arr with the values from other. Both arrays must have the same shape.
Examples
a = cuNumeric.zeros(2, 2)
b = cuNumeric.ones(2, 2)
copyto!(a, b);
a[1,1]
Base.firstindex Method
Base.firstindex(arr::NDArray, dim::Int)
Base.lastindex(arr::NDArray, dim::Int)
Base.lastindex(arr::NDArray)
Provide the first and last valid indices along a given dimension dim for NDArray.
firstindex always returns 1, since Julia arrays are 1-indexed. lastindex returns the size of the array along the specified dimension. lastindex(arr) returns the size along the first dimension.
Examples
arr = cuNumeric.rand(4, 5);
firstindex(arr, 2)
lastindex(arr, 2)
lastindex(arr)
Base.isapprox Method
isapprox(arr1::NDArray, arr2::NDArray; atol=0, rtol=0)
isapprox(arr::NDArray, julia_array::AbstractArray; atol=0, rtol=0)
isapprox(julia_array::AbstractArray, arr::NDArray; atol=0, rtol=0)
Approximate equality comparison between two NDArrays or between an NDArray and a Julia AbstractArray.
Returns true if the arrays have the same shape and all corresponding elements are approximately equal within the given absolute tolerance atol and relative tolerance rtol.
The second and third methods handle comparisons between NDArray and Julia arrays by forwarding to a common comparison function.
Warning
This function uses scalar indexing and should not be used in production code. This is meant for testing.
Examples
arr1 = cuNumeric.ones(2, 2)
arr2 = cuNumeric.ones(2, 2)
julia_arr = ones(2, 2)
isapprox(arr1, arr2)
isapprox(arr1, julia_arr)
isapprox(julia_arr, arr2)
Base.size Method
Base.size(arr::NDArray)
Base.size(arr::NDArray, dim::Int)
Return the size of the given NDArray.
Base.size(arr) returns a tuple of the dimensions of the array. Base.size(arr, dim) returns the size of the array along the specified dimension dim.
These override Base's size methods for the NDArray type, using the underlying cuNumeric API to query array shape.
Examples
arr = cuNumeric.rand(3, 4, 5);
size(arr)
size(arr, 2)
Random.rand! Method
cuNumeric.rand!(arr::NDArray)
Fills arr with floating-point values drawn uniformly at random.
cuNumeric.rand(NDArray, dims::Int...)
cuNumeric.rand(NDArray, dims::Tuple)
Create a new NDArray of element type Float64, filled with uniform random values.
This function uses the same signature as Base.rand with a custom backend, and currently supports only Float64 with uniform distribution (code = 0). To support other floating-point element types, the values are type-converted automatically for the user.
Examples
cuNumeric.rand(NDArray, 2, 2)
cuNumeric.rand(NDArray, (4, 1))
A = cuNumeric.zeros(2, 2); cuNumeric.rand!(A)
cuNumeric.as_type Method
as_type(arr::NDArray, t::Type{T}) where {T}
Convert the element type of arr to type T, returning a new NDArray with elements cast to T.
Arguments
arr::NDArray: Input array.
t::Type{T}: Target element type.
Returns
A new NDArray with the same shape as arr but with elements of type T.
Examples
arr = cuNumeric.rand(4, 5);
as_type(arr, Float32)
cuNumeric.falses Method
cuNumeric.falses(dims::Tuple)
cuNumeric.falses(dim::Int)
cuNumeric.falses(dims::Int...)
Create an NDArray filled with false, with the shape specified by dims.
Examples
cuNumeric.falses(2, 3)
cuNumeric.full Method
cuNumeric.full(dims::Tuple, val)
cuNumeric.full(dim::Int, val)
Create an NDArray filled with the scalar value val, with the shape specified by dims.
Examples
cuNumeric.full((2, 3), 7.5)
cuNumeric.full(4, 0)
cuNumeric.ones Method
cuNumeric.ones([T=Float32,] dims::Int...)
cuNumeric.ones([T=Float32,] dims::Tuple)
Create an NDArray with element type T, of all ones with size specified by dims. This function has the same signature as Base.ones, so be sure to call it as cuNumeric.ones.
Examples
cuNumeric.ones(2, 2)
cuNumeric.ones(Float32, 3)
cuNumeric.ones(Int32, (2, 3))
cuNumeric.trues Method
cuNumeric.trues(dims::Tuple)
cuNumeric.trues(dim::Int)
cuNumeric.trues(dims::Int...)
Create an NDArray filled with true, with the shape specified by dims.
Examples
cuNumeric.trues(2, 3)
cuNumeric.zeros Method
cuNumeric.zeros([T=Float32,] dims::Int...)
cuNumeric.zeros([T=Float32,] dims::Tuple)
Create an NDArray with element type T, of all zeros with size specified by dims. This function mirrors the signature of Base.zeros, and defaults to Float32 when the type is omitted.
Examples
cuNumeric.zeros(2, 2)
cuNumeric.zeros(Float64, 3)
cuNumeric.zeros(Int32, (2,3))
cuNumeric.floaty_unary_ops_no_args Constant
Supported Unary Operations
The following unary operations are supported and can be broadcast over NDArray:
- (negation), ! (logical not), abs, acos, acosh, asin, asinh, atan, atanh, cbrt, cos, cosh, deg2rad, exp, exp2, expm1, floor, isfinite, log, log10, log1p, log2, rad2deg, sign, signbit, sin, sinh, sqrt, tan, tanh, ^2, and ^-1 (or inv)
Differences
- The acosh function in Julia will error on inputs outside of its domain (x >= 1), but cuNumeric.jl will return NaN.
Examples
A = cuNumeric.ones(Float32, 3, 3)
abs.(A)
log.(A .+ 1)
-sqrt.(abs.(A))
cuNumeric.unary_reduction_map Constant
Supported Unary Reduction Operations
The following unary reduction operations are supported and can be applied directly to NDArray values:
• all • any • maximum • minimum • prod • sum
These operations follow standard Julia semantics.
Examples
A = cuNumeric.ones(5)
maximum(A)
sum(A)
LinearAlgebra.mul! Method
LinearAlgebra.mul!(out::NDArray, arr1::NDArray, arr2::NDArray)
Compute the matrix multiplication of arr1 and arr2, storing the result in out.
This function performs the operation in-place, modifying out.
Examples
a = cuNumeric.ones(2, 3)
b = cuNumeric.ones(3, 2)
out = cuNumeric.zeros(2, 2)
LinearAlgebra.mul!(out, a, b)
CUDA.jl Tasking
This section will detail how to use custom CUDA.jl kernels with the Legate runtime.
cuNumeric.@cuda_task Macro
@cuda_task(f(args...))
Compile a Julia GPU kernel to PTX, register it with the Legate runtime, and return a CUDATask object for later launch.
Arguments
f — The name of the Julia CUDA.jl GPU kernel function to compile.
args... — Example arguments to the kernel, used to determine the argument type signature when generating PTX.
Description
This macro automates the process of:
1. Inferring the CUDA argument types for the given args using map_ndarray_cuda_types.
2. Using CUDA.code_ptx to compile the specified GPU kernel (f) into raw PTX text for the inferred types.
3. Extracting the kernel's function symbol name from the PTX using extract_kernel_name.
4. Registering the compiled PTX and kernel name with the Legate runtime via ptx_task, making it available for GPU execution.
5. Returning a CUDATask struct that stores the kernel name and type signature, which can be used to configure and launch the kernel later.
Notes
The args... are not executed; they are used solely for type inference.
This macro is intended for use with the Legate runtime and assumes a CUDA context is available.
Make sure your kernel code is GPU-compatible and does not rely on unsupported Julia features.
Example
mytask = @cuda_task my_kernel(A, B, C)
cuNumeric.@launch Macro
@launch(; task, blocks=(1,), threads=(256,), inputs=(), outputs=(), scalars=())
Launch a GPU kernel (previously registered via @cuda_task) through the Legate runtime.
Keywords
task — A CUDATask object, typically returned by @cuda_task.
blocks — Tuple or single element specifying the CUDA grid dimensions. Defaults to (1,).
threads — Tuple or single element specifying the CUDA block dimensions. Defaults to (256,).
inputs — Tuple or single element of input NDArray objects.
outputs — Tuple or single element of output NDArray objects.
scalars — Tuple or single element of scalar values.
Description
The @launch macro validates the provided keywords, ensuring only the allowed set (:task, :blocks, :threads, :inputs, :outputs, :scalars) are present. It then expands to a call to cuNumeric.launch, passing the given arguments to the Legate runtime for execution.
This macro is meant to provide a concise, declarative syntax for launching GPU kernels, separating kernel compilation (via @cuda_task) from execution configuration.
Notes
task must be a kernel registered with the runtime, usually from @cuda_task.
All keyword arguments must be specified as assignments, e.g. blocks=(2,2), not positional arguments.
Defaults are chosen for single-block, 256-thread 1D launches.
The macro escapes its body so that the values of inputs/outputs/scalars are captured from the surrounding scope at macro expansion time.
Example
mytask = @cuda_task my_kernel(A, B, C)
@launch task=mytask blocks=(8,8) threads=(32,32) inputs=(A, B) outputs=(C)
CNPreferences
This section details how to set custom build configuration options. To see more details visit our install guide here.
CNPreferences.check_unchanged Method
CNPreferences.check_unchanged()
Throws an error if the preferences have been modified in the current Julia session, or if they are modified after this function is called.
This should be called from the __init__() function of any package which relies on the values of CNPreferences.
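A sketch of such an __init__ (the module name here is hypothetical):
module MyDownstreamPackage

using CNPreferences

function __init__()
    CNPreferences.check_unchanged()   # error out if preferences were modified this session
end

end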
CNPreferences.use_conda Method
CNPreferences.use_conda(conda_env::String; export_prefs = false, force = true)
Tells cuNumeric.jl to use an existing conda install. We make no guarantees of compiler compatibility at this time.
Expects conda_env to be the absolute path to the root of the environment. For example, /home/julialegate/.conda/envs/cunumeric-gpu
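For example, pointing at the environment root shown above:
using CNPreferences
CNPreferences.use_conda("/home/julialegate/.conda/envs/cunumeric-gpu")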
CNPreferences.use_developer_mode Method
CNPreferences.use_developer_mode(; wrapper_branch="main", use_cupynumeric_jll=true, cupynumeric_path=nothing, export_prefs = false, force = true)
Tells cuNumeric.jl to enable developer mode. This will clone cunumeric_jl_wrapper into cuNumeric.jl/deps.
To specify a cunumeric_jl_wrapper branch, pass wrapper_branch="some-branch". To disable using cupynumeric_jll, pass use_cupynumeric_jll=false. If you disable cupynumeric_jll, you need to set a path to cuPyNumeric with cupynumeric_path="/path/to/cupynumeric".
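For example, a sketch that tracks a wrapper branch and points at a local cuPyNumeric build (the branch name and path are illustrative):
using CNPreferences
CNPreferences.use_developer_mode(;
    wrapper_branch = "some-branch",
    use_cupynumeric_jll = false,
    cupynumeric_path = "/path/to/cupynumeric")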
CNPreferences.use_jll_binary Method
CNPreferences.use_jll_binary(; export_prefs = false, force = true)
Tells cuNumeric.jl to use JLL binaries. This is the default option.
Internal API
cuNumeric.NDArray Type
Internal API
The NDArray type represents a multi-dimensional array in cuNumeric. It is a wrapper around a Legate array and provides various methods for array manipulation and operations. Finalizer calls nda_destroy_array to clean up the underlying Legate array when the NDArray is garbage collected.
cuNumeric.LegateType Method
LegateType(T::Type)
Internal API
Converts a Julia type T to the corresponding Legate type.
cuNumeric.compare Method
compare(x, y, max_diff)
Internal API
Compare two arrays x and y for approximate equality within a maximum difference max_diff.
Supports comparisons between:
an NDArray and a Julia AbstractArray
two NDArrays
a Julia AbstractArray and an NDArray
Returns true if the arrays have the same shape and element type (for mixed types), and all corresponding elements differ by no more than max_diff.
Emits warnings when array sizes or element types differ.
Warning
This function uses scalar indexing and should not be used in production code. This is meant for testing.
Notes
This is an internal API used by higher-level approximate equality functions.
Does not support relative tolerance (rtol).
Behavior
Checks size compatibility.
Checks element type compatibility for NDArray vs Julia array.
Iterates over elements using CartesianIndices to compare element-wise differences.
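For example (a sketch; intended for tests only):
a = cuNumeric.ones(Float64, 2, 2)
b = ones(2, 2) .+ 1e-10
cuNumeric.compare(a, b, 1e-8)   # true: every element differs by at most max_diff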
cuNumeric.padded_shape Method
padded_shape(arr::NDArray)
Internal API
Return the size of the given NDArray. This will include the padded size.
cuNumeric.shape Method
shape(arr::NDArray)
Internal API
Return the size of the given NDArray.
cuNumeric.slice_array Method
slice_array(slices::Vararg{Tuple{Union{Int,Nothing},Union{Int,Nothing}},N}) where {N}
Internal API
Constructs a vector of cuNumeric.Slice objects from a variable number of (start, stop) tuples.
Each tuple corresponds to a dimension slice, using slice internally.
cuNumeric.to_cpp_index Method
to_cpp_index(d::Int64, ::Type{T}=UInt64)
Internal API
Converts a single Julia 1-based index d to a zero-based C++ style index wrapped in StdVector.
cuNumeric.to_cpp_index Method
to_cpp_index(idx::Dims{N}, ::Type{T}=UInt64) where {N}
Internal API
Converts a Julia 1-based index tuple idx to a zero-based C++ style index wrapped in StdVector of the specified integer type.
Each element of idx is decremented by 1 to adjust from Julia’s 1-based indexing to C++ 0-based indexing.
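A hedged sketch of the conversion, with results described per the text above:
cuNumeric.to_cpp_index((3, 5))   # StdVector{UInt64} holding [2, 4]
cuNumeric.to_cpp_index(7)        # StdVector{UInt64} holding [6]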