Skip to content

Public API

User facing functions supported by cuNumeric.jl

cuNumeric.versioninfo Method
julia
versioninfo()

Prints the cuNumeric build configuration summary, including package metadata, Julia and compiler version, and paths to core dependencies.

source
cuNumeric.disable_gc! Method
julia
disable_gc!()

Disables the automatic garbage collection heuristics. This gives the user full control over memory management.

source
cuNumeric.init_gc! Method
julia
init_gc!()

Initializes the cuNumeric garbage collector by querying the available device memory and enabling the automatic GC heuristics.

source
cuNumeric.get_time_microseconds Method

Returns the timestamp in microseconds. Blocks on all Legate operations preceding the call to this function.

source
cuNumeric.get_time_nanoseconds Method

Returns the timestamp in nanoseconds. Blocks on all Legate operations preceding the call to this function.

source
Base.:- Method
julia
Base.:-(val::Number, arr::NDArray)
Base.:-(arr::NDArray, val::Number)
Base.:-(lhs::NDArray, rhs::NDArray)

Perform subtraction involving an NDArray and a scalar or between two NDArrays.

  • val - arr subtracts val by arr.

  • arr - val subtracts scalar val from each element of arr.

  • Element-wise subtraction is supported between two NDArrays.

Broadcasting is also supported for these operations.

Examples

julia
lhs - 3
3 - rhs
lhs - rhs
source
Base.:/ Method
julia
Base.:/(arr::NDArray, val::Scalar)
Base.Broadcast.broadcasted(::typeof(/), arr::NDArray, val::Scalar)

Returns the element-wise multiplication of arr by the scalar reciprocal 1 / val.

Examples

julia
arr = cuNumeric.ones(2, 2)
arr / 2
source
Base.:== Method
julia
==(arr::NDArray, julia_arr::Array)
==(julia_arr::Array, arr::NDArray)

Compare an NDArray and a Julia Array for element-wise equality.

Returns true if both arrays have the same shape and all corresponding elements are equal. Returns false otherwise (including if sizes differ, with a warning).

Warning

This function uses scalar indexing and should not be used in production code. This is meant for testing.

Examples

julia
arr = cuNumeric.ones(2, 2)
julia_arr = ones(2, 2)
arr == julia_arr
julia_arr == arr
julia_arr2 = zeros(2, 2)
arr == julia_arr2
source
Base.:== Method
julia
==(arr1::NDArray, arr2::NDArray)

Check if two NDArrays are equal element-wise.

Returns true if both arrays have the same shape and all corresponding elements are equal. Currently supports arrays up to 3 dimensions. For higher dimensions, returns false with a warning.

Warning

This function uses scalar indexing and should not be used in production code. This is meant for testing.

Examples

julia
a = cuNumeric.ones(2, 2)
b = cuNumeric.ones(2, 2)
a == b
c = cuNumeric.zeros(2, 2)
a == c
source
Base.Broadcast.broadcasted Method
julia
Base.Broadcast.broadcasted(::typeof(/), lhs::NDArray, rhs::NDArray)

Perform element-wise division of two NDArrays.

Examples

julia
A = cuNumeric.rand(2, 2)
B = cuNumeric.ones(2, 2)
C = A ./ B
typeof(C)
source
Base.Broadcast.broadcasted Method
julia
Base.Broadcast.broadcasted(::typeof(/), val::Scalar, arr::NDArray)

Throws an error since element-wise division of a scalar by an NDArray is not supported yet.

Examples

julia
arr = cuNumeric.ones(2, 2)
# 2 ./ arr # ERROR
source
Base.copy Method
julia
Base.copy(arr::NDArray)

Create and return a deep copy of the given NDArray.

Examples

julia
a = cuNumeric.ones(2, 2)
b = copy(a)
b === a
b[1,1] == a[1,1]
source
Base.firstindex Method
julia
Base.firstindex(arr::NDArray, dim::Int)
Base.lastindex(arr::NDArray, dim::Int)
Base.lastindex(arr::NDArray)

Provide the first and last valid indices along a given dimension dim for NDArray.

  • firstindex always returns 1, since Julia arrays are 1-indexed.

  • lastindex returns the size of the array along the specified dimension.

  • lastindex(arr) returns the size along the first dimension.

Examples

julia
arr = cuNumeric.rand(4, 5);
firstindex(arr, 2)
lastindex(arr, 2)
lastindex(arr)
source
Base.isapprox Method
julia
isapprox(arr1::NDArray, arr2::NDArray; atol=0, rtol=0)
isapprox(arr::NDArray, julia_array::AbstractArray; atol=0, rtol=0)
isapprox(julia_array::AbstractArray, arr::NDArray; atol=0, rtol=0)

Approximate equality comparison between two NDArrays or between an NDArray and a Julia AbstractArray.

Returns true if the arrays have the same shape and all corresponding elements are approximately equal within the given absolute tolerance atol and relative tolerance rtol.

The second and third methods handle comparisons between NDArray and Julia arrays by forwarding to a common comparison function.

Warning

This function uses scalar indexing and should not be used in production code. This is meant for testing.

Examples

julia
arr1 = cuNumeric.ones(2, 2)
arr2 = cuNumeric.ones(2, 2)
julia_arr = ones(2, 2)
isapprox(arr1, arr2)
isapprox(arr1, julia_arr)
isapprox(julia_arr, arr2)
source
Base.size Method
julia
Base.size(arr::NDArray)
Base.size(arr::NDArray, dim::Int)

Return the size of the given NDArray.

  • Base.size(arr) returns a tuple of dimensions of the array.

  • Base.size(arr, dim) returns the size of the array along the specified dimension dim.

These override Base's size methods for the NDArray type, using the underlying cuNumeric API to query array shape.

Examples

julia
arr = cuNumeric.rand(3, 4, 5);
size(arr)
size(arr, 2)
source
LinearAlgebra.mul! Method
julia
LinearAlgebra.mul!(out::NDArray, arr1::NDArray, arr2::NDArray)

Compute the matrix multiplication (dot product) of arr1 and arr2, storing the result in out.

This function performs the operation in-place, modifying out.

Examples

julia
a = cuNumeric.ones(2, 3)
b = cuNumeric.ones(3, 2)
out = cuNumeric.zeros(2, 2)
LinearAlgebra.mul!(out, a, b)
source
Random.rand! Method
julia
cuNumeric.rand!(arr::NDArray)

Fills arr with Float64s uniformly at random.

cuNumeric.rand(NDArray, dims::Int...)
cuNumeric.rand(NDArray, dims::Tuple)

Create a new NDArray of element type Float64, filled with uniform random values.

This function uses the same signature as Base.rand with a custom backend, and currently supports only Float64 with uniform distribution (code = 0).

Examples

julia
cuNumeric.rand(NDArray, 2, 2)
cuNumeric.rand(NDArray, (4, 1))
A = cuNumeric.zeros(2, 2); cuNumeric.rand!(A)
source
cuNumeric.add! Method
julia
add!(out::NDArray, arr1::NDArray, arr2::NDArray)

Compute element-wise addition of arr1 and arr2 storing the result in out.

This is an in-place operation and is used to support .+= style syntax.

Examples

julia
a = cuNumeric.ones(2, 2)
b = cuNumeric.ones(2, 2)
out = similar(a)
add!(out, a, b)
source
cuNumeric.assign Method
julia
assign(arr::NDArray, other::NDArray)

Assign the contents of other to arr element-wise.

This function overwrites the data in arr with the values from other. Both arrays must have the same shape.

Examples

julia
a = cuNumeric.zeros(2, 2)
b = cuNumeric.ones(2, 2)
cuNumeric.assign(a, b);
a[1,1]
source
cuNumeric.full Method
julia
cuNumeric.full(dims::Tuple, val)
cuNumeric.full(dim::Int, val)

Create an NDArray filled with the scalar value val, with the shape specified by dims.

Examples

julia
cuNumeric.full((2, 3), 7.5)
cuNumeric.full(4, 0)
source
cuNumeric.multiply! Method
julia
multiply!(out::NDArray, arr1::NDArray, arr2::NDArray)

Compute element-wise multiplication of arr1 and arr2, storing the result in out.

This function performs the operation in-place, modifying out.

Examples

julia
a = cuNumeric.ones(2, 2)
b = cuNumeric.ones(2, 2)
out = similar(a)
multiply!(out, a, b)
source
cuNumeric.ones Method
julia
cuNumeric.ones([T=Float32,] dims::Int...)
cuNumeric.ones([T=Float32,] dims::Tuple)

Create an NDArray with element type T, of all zeros with size specified by dims. This function has the same signature as Base.ones, so be sure to call it as cuNuermic.ones.

Examples

julia
cuNumeric.ones(2, 2)
cuNumeric.ones(Float32, 3)
cuNumeric.ones(Int32, (2, 3))
source
cuNumeric.zeros Method
julia
cuNumeric.zeros([T=Float32,] dims::Int...)
cuNumeric.zeros([T=Float32,] dims::Tuple)

Create an NDArray with element type T, of all zeros with size specified by dims. This function mirrors the signature of Base.zeros, and defaults to Float32 when the type is omitted.

Examples

julia
cuNumeric.zeros(2, 2)
cuNumeric.zeros(Float64, 3)
cuNumeric.zeros(Int32, (2,3))
source
cuNumeric.unary_op_map_no_args Constant

Supported Unary Operations

The following unary operations are supported and can be applied directly to NDArray values:

  • abs

  • acos

  • asin

  • asinh

  • atan

  • atanh

  • cbrt

  • conj

  • cos

  • cosh

  • deg2rad

  • exp

  • exp2

  • expm1

  • floor

  • log

  • log10

  • log1p

  • log2

  • - (negation)

  • rad2deg

  • sin

  • sinh

  • sqrt

  • square

  • tan

  • tanh

These operations are applied elementwise by default and follow standard Julia semantics.

Examples

julia
A = NDArray(randn(Float32, 3, 3))

abs(A)
log.(A .+ 1)
-sqrt(abs(A))
square(A)
source
cuNumeric.unary_reduction_map Constant

Supported Unary Reduction Operations

The following unary reduction operations are supported and can be applied directly to NDArray values:

maximumminimumprodsum

These operations follow standard Julia semantics.

Examples

julia
A = NDArray(randn(Float32, 3, 3))

maximum(A)
sum(A)
source
cuNumeric.square Function
julia
square(arr::NDArray)

Elementwise square of each element in arr.

source
cuNumeric.binary_op_map Constant

Supported Binary Operations

The following binary operations are supported and can be applied elementwise to pairs of NDArray values:

+-*/^divatanhypot

These operations are applied elementwise by default and follow standard Julia semantics.

Examples

julia
A = NDArray(randn(Float64, 4))
B = NDArray(randn(Float64, 4))

A + B
A / B
hypot.(A, B)
div.(A, B)
A .^ 2
source

CUDA.jl Tasking

This section will detail how to use custom CUDA.jl kernels with the Legate runtime.

cuNumeric.@cuda_task Macro
julia
@cuda_task(f(args...))

Compile a Julia GPU kernel to PTX, register it with the Legate runtime, and return a CUDATask object for later launch.

Arguments

  • f — The name of the Julia CUDA.jl GPU kernel function to compile.

  • args... — Example arguments to the kernel, used to determine the argument type signature when generating PTX.

Description

This macro automates the process of:

  1. Inferring the CUDA argument types for the given args using map_ndarray_cuda_types.

  2. Using CUDA.code_ptx to compile the specified GPU kernel (f) into raw PTX text for the inferred types.

  3. Extracting the kernel's function symbol name from the PTX using extract_kernel_name.

  4. Registering the compiled PTX and kernel name with the Legate runtime via ptx_task, making it available for GPU execution.

  5. Returning a CUDATask struct that stores the kernel name and type signature, which can be used to configure and launch the kernel later.

Notes

  • The args... are not executed; they are used solely for type inference.

  • This macro is intended for use with the Legate runtime and assumes a CUDA context is available.

  • Make sure your kernel code is GPU-compatible and does not rely on unsupported Julia features.

Example

julia
mytask = @cuda_task my_kernel(A, B, C)
source
cuNumeric.@launch Macro
julia
@launch(; task, blocks=(1,), threads=(256,), inputs=(), outputs=(), scalars=())

Launch a GPU kernel (previously registered via @cuda_task) through the Legate runtime.

Keywords

  • task — A CUDATask object, typically returned by @cuda_task.

  • blocks — Tuple or single element specifying the CUDA grid dimensions. Defaults to (1,).

  • threads — Tuple or single element specifying the CUDA block dimensions. Defaults to (256,).

  • inputs — Tuple or single element of input NDArray objects.

  • outputs — Tuple or single element of output NDArray objects.

  • scalars — Tuple or single element of scalar values.

Description

The @launch macro validates the provided keywords, ensuring only the allowed set (:task, :blocks, :threads, :inputs, :outputs, :scalars) are present. It then expands to a call to cuNumeric.launch, passing the given arguments to the Legate runtime for execution.

This macro is meant to provide a concise, declarative syntax for launching GPU kernels, separating kernel compilation (via @cuda_task) from execution configuration.

Notes

  • task must be a kernel registered with the runtime, usually from @cuda_task.

  • All keyword arguments must be specified as assignments, e.g. blocks=(2,2) not positional arguments.

  • Defaults are chosen for single-block, 256-thread 1D launches.

  • The macro escapes its body so that the values of inputs/outputs/scalars are captured from the surrounding scope at macro expansion time.

Example

julia
mytask = @cuda_task my_kernel(A, B, C)

@launch task=mytask blocks=(8,8) threads=(32,32) inputs=(A, B) outputs=(C)
source

CNPreferences

This section details how to set custom build configuration options. To see more details visit our install guide here.

CNPreferences.check_unchanged Method
julia
CNPreferences.check_unchanged()

Throws an error if the preferences have been modified in the current Julia session, or if they are modified after this function is called.

This is should be called from the __init__() function of any package which relies on the values of CNPreferences.

source
CNPreferences.use_conda Method
julia
CNPreferences.use_conda(conda_env::String; export_prefs = false, force = true)

Tells cuNumeric.jl to use existing conda install. We make no gurantees of compiler compatability at this time.

Expects conda_env to be the absolute path to the root of the environment. For example, /home/julialegate/.conda/envs/cunumeric-gpu

source
CNPreferences.use_developer_mode Method
julia
CNPreferences.use_developer_mode(; wrapper_branch="main", use_cupynumeric_jll=true, cupynumeric_path=nothing, export_prefs = false, force = true)

Tells cuNumeric.jl to enable developer mode. This will clone cunumeric_jl_wrapper into cuNumeric.jl/deps.

To specify a cunumeric_jl_wrapper branch: wrapper_branch="some-branch" To disable using cupynumeric_jll: use_cupynumeric_jll=false If you disable cupynumeric_jll, then you need to set a path to cuPyNumeric with cupynumeric_path="/path/to/cupynumeric"

source
CNPreferences.use_jll_binary Method
julia
CNPreferences.use_jll_binary(; export_prefs = false, force = true)

Tells Legate.jl to use JLLs. This is the default option.

source

Internal API

cuNumeric.NDArray Type

Internal API

The NDArray type represents a multi-dimensional array in cuNumeric. It is a wrapper around a Legate array and provides various methods for array manipulation and operations. Finalizer calls nda_destroy_array to clean up the underlying Legate array when the NDArray is garbage collected.

source
Base.eltype Method
julia
Base.eltype(arr::NDArray)

Internal API

Returns the element type of the NDArray.

This method uses nda_array_type_code internally to map to the appropriate Julia element type.

source
cuNumeric.LegateType Method
julia
LegateType(T::Type)

Internal API

Converts a Julia type T to the corresponding Legate type.

source
cuNumeric.compare Method
julia
compare(x, y, max_diff)

Internal API

Compare two arrays x and y for approximate equality within a maximum difference max_diff.

Supports comparisons between:

  • an NDArray and a Julia AbstractArray

  • two NDArrays

  • a Julia AbstractArray and an NDArray

Returns true if the arrays have the same shape and element type (for mixed types), and all corresponding elements differ by no more than max_diff.

Emits warnings when array sizes or element types differ.

Warning

This function uses scalar indexing and should not be used in production code. This is meant for testing.

Notes

  • This is an internal API used by higher-level approximate equality functions.

  • Does not support relative tolerance (rtol).

Behavior

  • Checks size compatibility.

  • Checks element type compatibility for NDArray vs Julia array.

  • Iterates over elements using CartesianIndices to compare element-wise difference.

source
cuNumeric.padded_shape Method
julia
padded_shape(arr::NDArray)

Internal API

Return the size of the given NDArray. This will include the padded size.

source
cuNumeric.shape Method
julia
shape(arr::NDArray)

Internal API

Return the size of the given NDArray.

source
cuNumeric.slice_array Method
julia
slice_array(slices::Vararg{Tuple{Union{Int,Nothing},Union{Int,Nothing}},N}) where {N}

Internal API

Constructs a vector of cuNumeric.Slice objects from a variable number of (start, stop) tuples.

Each tuple corresponds to a dimension slice, using slice internally.

source
cuNumeric.to_cpp_index Function
julia
to_cpp_index(d::Int64, int_type::Type=UInt64)

Internal API

Converts a single Julia 1-based index d to a zero-based C++ style index wrapped in StdVector.

source
cuNumeric.to_cpp_index Method
julia
to_cpp_index(idx::Dims{N}, int_type::Type=UInt64) where {N}

Internal API

Converts a Julia 1-based index tuple idx to a zero-based C++ style index wrapped in StdVector of the specified integer type.

Each element of idx is decremented by 1 to adjust from Julia’s 1-based indexing to C++ 0-based indexing.

source