REPORTS & PUBLICATIONS

Accelerating photonic quantum simulations with GPUs

MAR 12 2026

High-performance simulation tools are essential for the development of scalable quantum computing systems. For photonic quantum processors in particular, accurate simulation enables algorithm prototyping, architecture validation, and performance benchmarking. However, most widely available quantum simulators today are designed around qubit-based systems, while near-term photonic systems are more naturally described in terms of “photons” and “qumodes”. As a result, the photonic quantum computing community has been underserved by existing simulation infrastructure.

To address this gap, ORCA has developed a tensor-network–based photonic simulator built on NVIDIA’s cuTensorNet library. Tensor networks provide a structured, compressed representation of quantum states that can dramatically reduce computational cost when circuit structure and entanglement patterns permit. By leveraging cuTensorNet’s highly optimized GPU tensor contraction routines, we can efficiently simulate larger-scale photonic circuits than would be feasible with conventional methods. This approach allows ORCA to take full advantage of NVIDIA GPUs and the broader CUDA-Q software ecosystem.
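As a rough illustration of the compressed representation described above, the sketch below stores a two-mode state as a chain of small tensors (a matrix product state) and reads out individual amplitudes without ever building the full state vector. It uses plain NumPy on the CPU as a stand-in and is not ORCA's simulator or the cuTensorNet API; the function names `split_photon_mps` and `amplitude` are hypothetical and chosen only for this example.

```python
import numpy as np

def split_photon_mps():
    """Hypothetical example state: one photon shared equally between two modes,
    (|1,0> + |0,1>) / sqrt(2), stored as two site tensors.

    Each tensor has shape (left_bond, occupation, right_bond).
    """
    first = np.zeros((1, 2, 2), dtype=complex)
    first[0, 1, 0] = 1.0  # photon in mode 0 -> bond index 0
    first[0, 0, 1] = 1.0  # mode 0 empty     -> bond index 1

    second = np.zeros((2, 2, 1), dtype=complex)
    second[0, 0, 0] = 1 / np.sqrt(2)  # bond 0: mode 1 must be empty
    second[1, 1, 0] = 1 / np.sqrt(2)  # bond 1: mode 1 holds the photon
    return [first, second]

def amplitude(mps, occupation):
    """Contract the chain for one occupation pattern, one site at a time."""
    env = np.ones((1,), dtype=complex)
    for tensor, n in zip(mps, occupation):
        # Select the matrix for occupation n and absorb it into the running vector.
        env = env @ tensor[:, n, :]
    return env[0]

mps = split_photon_mps()
for pattern in [(1, 0), (0, 1), (0, 0), (1, 1)]:
    print(pattern, abs(amplitude(mps, pattern)) ** 2)
```

The cost of each amplitude here grows with the number of modes times the square of the bond dimension, rather than with the size of the full state space, which is the property a GPU contraction library can exploit at larger scale.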

Benchmarking against a CPU-based simulator

To evaluate performance, we compared our GPU-based tensor-network simulator against a traditional CPU-based state-vector simulator. State-vector simulation provides a complete description of the quantum state, but its memory and compute requirements scale exponentially with the number of qumodes. In practice, this limits CPU-based state-vector approaches to approximately 20 qumodes for realistic photonic simulations. Tensor networks provide a more scalable alternative that is particularly well aligned with ORCA’s time-bin processor architecture. By exploiting circuit structure and entanglement locality, the tensor-network approach avoids constructing the full state vector explicitly, significantly reducing both memory footprint and compute time.
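The scaling contrast described above can be made concrete with a back-of-the-envelope memory estimate. The sketch below assumes a fixed number of photons, double-precision complex amplitudes (16 bytes each), and a matrix-product-state layout for the tensor network; the photon counts, local dimension, and bond dimension in the usage loop are illustrative assumptions, not figures from ORCA's benchmark.

```python
from math import comb

BYTES_PER_AMPLITUDE = 16  # assumption: complex128 amplitudes

def fock_dimension(num_modes, num_photons):
    """Number of ways to distribute identical photons over the modes."""
    return comb(num_modes + num_photons - 1, num_photons)

def state_vector_bytes(num_modes, num_photons):
    """Memory needed to store every amplitude of the fixed-photon-number state."""
    return fock_dimension(num_modes, num_photons) * BYTES_PER_AMPLITUDE

def mps_bytes(num_modes, local_dim, bond_dim):
    """Upper bound on memory for one site tensor per mode,
    each of shape (bond_dim, local_dim, bond_dim)."""
    return num_modes * bond_dim * local_dim * bond_dim * BYTES_PER_AMPLITUDE

# Illustrative assumption: half as many photons as modes, bond dimension 64.
for modes in (8, 16, 24, 32, 48):
    photons = modes // 2
    sv = state_vector_bytes(modes, photons)
    tn = mps_bytes(modes, local_dim=photons + 1, bond_dim=64)
    print(f"{modes:2d} modes: state vector {sv:.3e} B, tensor network <= {tn:.3e} B")
```

The tensor-network estimate grows linearly with the number of modes for a fixed bond dimension, whereas the state-vector estimate grows combinatorially. Whether a modest bond dimension suffices depends on how much entanglement the circuit generates.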

The figure below shows the wall-clock time required to simulate one circuit execution of a photonic quantum system with each of these two methods, averaged over five runs. While the CPU-based state-vector simulator scales poorly and cannot go much beyond 20 qumodes, the GPU-based tensor-network simulator running on a single NVIDIA A100 scales efficiently up to 48 qumodes. Since 48 qumodes corresponds to the operating regime of the ORCA PT-2 processor, this capability provides a practical tool for understanding, validating, and optimizing real hardware performance.
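A measurement of this kind can be sketched with a small timing harness. The code below is a generic example under stated assumptions, not the benchmark script used for the figure: `average_wall_clock` is a hypothetical helper, and `run` stands for any callable that performs one simulated circuit execution.

```python
import time
from statistics import mean

def average_wall_clock(run, repeats=5):
    """Call run() `repeats` times and return the mean elapsed wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return mean(timings)

# Placeholder workload standing in for one simulated circuit execution.
def dummy_simulation():
    sum(i * i for i in range(100_000))

print(f"mean over 5 runs: {average_wall_clock(dummy_simulation):.6f} s")
```

When timing GPU work, `run` would also need to wait for the device to finish before returning, since kernel launches are typically asynchronous; otherwise the measured time understates the real cost.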

Open-sourcing to support the community

ORCA intends to open-source this photonic simulator to support the broader photonic quantum computing community. By making these tools publicly available, we aim to accelerate research, enable reproducible benchmarking, and lower barriers to entry for developers building quantum applications. Release of this simulator is scheduled to align with upcoming updates to CUDA-Q.

Our collaboration with NVIDIA ensures that the simulator integrates cleanly with CUDA-Q and leverages cuTensorNet’s ongoing optimizations. This partnership not only strengthens support for photonic quantum workflows within the NVIDIA ecosystem, but also expands the range of GPU-accelerated tools available to the quantum computing community as a whole.
