
NEWS

ORCA Computing, PCSS and Imperial College collaborate with NVIDIA to demonstrate a distributed quantum neural network in a hybrid HPC environment
Quantum computing is rapidly evolving from theory to practice, and a key step on that journey is the integration of quantum processors with the high-performance computing (HPC) systems that already underpin today’s AI and scientific workloads. Researchers from ORCA Computing, the Poznań Supercomputing and Networking Center (PCSS) and Imperial College London, in collaboration with NVIDIA, are working to achieve this integration and to develop algorithms that can leverage it.
Here, we announce a first demonstration of a distributed photonic quantum neural network in a hybrid HPC environment. This work highlights the potential of quantum photonic processors as building blocks in distributed machine learning architectures. By coupling quantum devices to GPU nodes, we can expand the capabilities of classical neural networks with new quantum methods and optimization pathways.
This work leverages the world’s first HPC environment that allows multiple users to access multiple QPUs and GPUs, using the CUDA-Q platform to combine ORCA quantum processors with NVIDIA GPUs. We are pleased to announce the release of our first paper describing this environment, which also presents additional applications in machine learning and optimization.
A multi-QPU, multi-GPU and multi-user environment
At the heart of this effort is PCSS’s HPC center, where the team deployed two ORCA Computing PT photonic quantum processors (shown in Fig. 1) alongside existing NVIDIA GPU nodes. PCSS hosts many GPU nodes, including the PROXIMA system, which contains 87 GPU nodes, each with four NVIDIA H100 GPUs. The ORCA PT systems, operating at room temperature in standard 19-inch rack cabinets, integrate into this environment via Ethernet, requiring no special networking, cooling, or facility adaptations.
Fig. 1 An ORCA Computing PT system.
As described in our paper, the hybrid environment is orchestrated using Slurm, the industry-standard workload manager. We extended Slurm to schedule jobs across CPUs, GPUs, and QPUs simultaneously, so that users can access heterogeneous resources with the tools they already know.
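To make this concrete, the sketch below shows what requesting heterogeneous resources might look like from a user’s perspective. The partition name, the "qpu:orca_pt" GRES label, and the script name are purely illustrative assumptions; the actual resource names depend on the site’s Slurm configuration described in the paper.

```python
import subprocess

# Hypothetical batch script: the partition name and the "qpu:orca_pt" GRES
# label are illustrative only; actual labels depend on the site's Slurm setup.
batch_script = """#!/bin/bash
#SBATCH --job-name=hybrid-qnn
#SBATCH --partition=proxima
#SBATCH --gres=gpu:h100:4,qpu:orca_pt:2
#SBATCH --time=20:00:00

srun python train_distributed_qnn.py
"""

with open("hybrid_job.sbatch", "w") as f:
    f.write(batch_script)

# Jobs are submitted through the standard Slurm CLI; no new user-facing tools are required.
subprocess.run(["sbatch", "hybrid_job.sbatch"], check=True)
```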
On the software side, the CUDA-Q platform plays a central role. CUDA-Q is an open-source, QPU-agnostic platform for programming the heterogeneous quantum-classical architectures needed to run useful, large-scale quantum computing applications. The integration of quantum processors within AI supercomputers is widely accepted as a necessary architecture for building large-scale, useful quantum computers. Additionally, development platforms for quantum-classical architectures must provide user-friendly hybrid programming workflows able to maximize performance across a heterogeneous compute architecture, as referenced here.
We extended CUDA-Q with a backend supporting ORCA’s photonic processors, which are the first photonic processors to be integrated into CUDA-Q. This integration allows users to access these processors from Python and C++ environments and to program classical and quantum resources seamlessly. ORCA’s photonic processors and simulators for photonic systems are available within CUDA-Q. Learn more here.
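As a flavor of what this looks like in practice, the sketch below submits a small time-bin interferometer (boson sampling) job through CUDA-Q’s Python interface. It follows the pattern documented for CUDA-Q’s ORCA backend; the endpoint URL and the interferometer parameters are placeholders, and exact details may differ across versions and deployments.

```python
import math
import os
import cudaq

# Point CUDA-Q at an ORCA PT endpoint. The URL is a placeholder; in a real
# deployment it is provided by the site (e.g. via an environment variable).
cudaq.set_target("orca", url=os.getenv("ORCA_ACCESS_URL", "http://localhost/sample"))

# A small time-bin interferometer experiment: three photons, one delay loop,
# two beam-splitter angles (illustrative values only).
input_state = [1, 1, 1]                 # photon occupation per time bin
loop_lengths = [1]                      # delay-loop configuration
bs_angles = [math.pi / 3, math.pi / 6]  # beam-splitter angles

# Request samples of output photon-number patterns from the device.
counts = cudaq.orca.sample(input_state, loop_lengths, bs_angles, 10000)
print(counts)
```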
To demonstrate uses of this platform, we have implemented a range of hybrid applications including:
- Optimization: implementing the Binary Bosonic Solver (BBS) algorithm for problems such as Max-Cut and Job Shop Scheduling.
- Hybrid machine learning: embedding photonic quantum layers inside classical neural networks for classification tasks (a minimal sketch of this pattern follows this list).
- Neural architecture search: evolving ensembles of classical networks using quantum processors to guide the search for high-performing models.
These early demonstrations showcase the flexibility of this environment and provide a foundation for more advanced hybrid workflows.
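As an illustration of the hybrid machine-learning item above, the sketch below embeds a quantum layer inside a small PyTorch classifier. The photonic_expectation function is a mock stand-in for submitting a parameterized circuit to a PT system (for example via CUDA-Q); it is not ORCA’s SDK, and the layer sizes are arbitrary.

```python
import math
import torch
import torch.nn as nn


def photonic_expectation(thetas: torch.Tensor) -> torch.Tensor:
    """Mock stand-in for a photonic circuit evaluation.

    In the real workflow this would submit a parameterized time-bin
    interferometer job to a PT system and return per-mode photon statistics;
    here it is mocked so the sketch runs anywhere.
    """
    return torch.cos(thetas)  # placeholder for measured photon-number features


class PhotonicLayer(nn.Module):
    """A quantum 'layer': trainable circuit angles producing a feature vector."""

    def __init__(self, n_params: int):
        super().__init__()
        self.thetas = nn.Parameter(torch.rand(n_params) * math.pi)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q_features = photonic_expectation(self.thetas)  # shape: (n_params,)
        return x * q_features                           # modulate classical features


model = nn.Sequential(
    nn.Linear(16, 8), nn.ReLU(),
    PhotonicLayer(8),          # photonic layer embedded mid-network
    nn.Linear(8, 2),
)

# Standard classical training step around the hybrid model (toy data).
x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()  # on hardware, circuit gradients would use parameter-shift-style rules
```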
A distributed quantum neural network
Building on an initial theory proposal from researchers at Imperial College London, which received the best paper award in the photonics track of IEEE Quantum Week 2025, we performed a demonstration of a novel distributed quantum neural network algorithm that is particularly well suited to this environment. Dr. Louis Chen, from the Distributed Quantum Computing team and Imperial QuEST at Imperial College London, visited PCSS to carry out this work jointly with the PCSS team led by Dr. Piotr Rydlichowski, supported by the technical expertise of the PCSS engineering group.
Two ORCA PT systems were used to train a classical neural network; inference then ran on NVIDIA GPUs. Specifically, the outputs of quantum circuits running on the two PT systems were mapped onto the weights of a neural network using a tensor network. These weights were then trained by updating the quantum circuit parameters using a gradient-based method. Once trained, the neural network was deployed and evaluated on NVIDIA GPUs. This approach is parameter-efficient, since in our work only 14 quantum circuit parameters were used to train a neural network with 8000 weights.
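In highly simplified form, the sketch below illustrates the weight-generation idea: feature vectors derived from two (mocked) photonic circuits are combined through an outer-product contraction to produce a much larger weight matrix, and a gradient step updates only the 14 circuit parameters. The mock sampling routine, tensor shapes, and update rule are assumptions for illustration and do not reproduce the exact tensor-network construction used in the experiment.

```python
import torch

# Two small sets of circuit parameters, one per PT system (7 + 7 = 14 total,
# matching the scale described above; the mapping itself is simplified).
theta_a = torch.rand(7, requires_grad=True)
theta_b = torch.rand(7, requires_grad=True)


def mock_photon_features(thetas: torch.Tensor, n_modes: int) -> torch.Tensor:
    """Mock stand-in for per-mode photon statistics from a PT system."""
    proj = torch.linspace(0.1, 1.0, n_modes).outer(torch.ones(len(thetas)))
    return torch.tanh(proj @ thetas)  # shape: (n_modes,)


# Map the two feature vectors onto a large weight matrix via an outer-product
# (rank-1 tensor-network) construction: 100 x 80 = 8000 weights from 14 parameters.
feat_a = mock_photon_features(theta_a, n_modes=100)
feat_b = mock_photon_features(theta_b, n_modes=80)
weights = torch.einsum("i,j->ij", feat_a, feat_b)  # shape: (100, 80)

# Use the generated weights in a linear map and take one gradient step on the
# circuit parameters only (toy data).
x, y = torch.randn(32, 80), torch.randn(32, 100)
loss = torch.nn.functional.mse_loss(x @ weights.T, y)
loss.backward()                    # gradients flow back to theta_a and theta_b
with torch.no_grad():
    theta_a -= 0.1 * theta_a.grad  # on hardware: parameter-shift or SPSA-style updates
    theta_b -= 0.1 * theta_b.grad
```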
The initial validation of this approach was performed using the MNIST dataset, a standard benchmark for neural network architectures. The training process ran continuously for nearly 20 hours without error or human intervention, underscoring the stability and reliability of the Quantum–HPC system. Importantly, the real hardware experiment achieved performance comparable to the noiseless simulation (shown in Fig. 2), confirming both the robustness of the algorithm to noise and the viability of the framework under practical operating conditions. Building on this foundation, we are extending the framework to more challenging domains, including biological and medical imaging datasets, where distributed quantum–classical pipelines could provide significant advantages in processing high-dimensional data.
Fig. 2 Accuracy comparison of real quantum hardware and noiseless simulation.
This demonstration highlights not only the robustness of photonic quantum processors under distributed operation but also their ability to support workloads in a way that is naturally aligned with modern HPC environments. Using Slurm and CUDA-Q, our architecture seamlessly combines classical and quantum workflows. The integration with NVIDIA GPU acceleration remains a critical enabler, supporting gradient evaluation, tensor-network mapping, and neural network forward/backward passes.
Conclusion
This collaboration demonstrates how quantum computing can move from theoretical proposals to practical implementations inside operational HPC environments. By combining photonic quantum processors with GPU-accelerated clusters under a unified scheduling and programming model, we have created the first platform capable of supporting multi-user, multi-QPU, and multi-GPU workflows. We have also co-developed a novel use case for this platform, where multiple QPUs are used to train the parameters of a neural network running on GPUs.
Looking ahead, this architecture will serve as a useful testbed for new hybrid algorithms and applications in machine learning, optimization, and beyond. Future work will focus on reducing latency, broadening algorithmic support, and scaling to next-generation quantum processors. Together, these steps will help pave the way toward useful, large-scale quantum computing applications, unlocking new computational possibilities at the intersection of HPC and quantum computing.