
Quantum Graph Neural Networks Applied

Tackling particle reconstruction with hybrid quantum-classical graph neural networks

This article is for you if:
  • you’re interested in the ins & outs of intriguing QML applications
  • you’re keen to get a snapshot understanding of how classical graph neural networks work
  • you’re the kind of person that gets dopamine hits when you make cross-domain connections between ideas
Here’s the roadmap:
  1. We’ll start by understanding our problem setting at CERN, touching on what’s being done to solve it right now, and why graph ML could be the key to solving it.
  2. Then we’ll make an interlude to understand classical graph neural networks
  3. So that we can break down the quantum graph neural network in detail
  4. And end off by sharing how my group & I contributed to the existing work.

Section 1: GraphML meets High Energy Physics

Before diving right into the QML, let’s understand the problem we’re hoping to solve along with the context surrounding it. That takes us to Geneva.

CERN general operations

Diagram of the particle accelerator rings at CERN (27km in circumference) | Image credit: CERN
An internal view of the ultra-high vacuum tubes that accelerate the particles within the ring | Image credit: CERN

Particle track reconstruction problem description

With a grasp of what’s going on at CERN, let’s define the specific task we’re trying to tackle with quantum graph neural networks.

A visualization of complex sprays of subatomic particles, produced from colliding proton beams in CERN’s CMS detector at the Large Hadron Collider near Geneva, Switzerland in mid-April of 2018. Credit: CERN

Status quo & Opportunity

Right now, computational algorithms for particle track reconstruction can keep up with the current collision rate, but they scale worse than quadratically with the number of hits (for example, O(n⁶)), so they struggle as collision rates climb. Meanwhile, CERN is upgrading the LHC to pack more particles into each beam (higher luminosity), meaning each collision will spawn even more particles and set off more hits. Scientists realize that the current algorithms will become unwieldy for the upcoming High-Luminosity LHC (HL-LHC) experiments. That is why the efficient reconstruction of particle trajectories is one of the most important challenges of the HL-LHC upgrade.

Graph problem formulation and motivation for graph neural networks

A simple reframing makes it clear that this problem lives squarely in graph-theory land: treat each detector hit as a node, draw candidate edges between hits that could belong to the same particle, and the task becomes deciding which of those edges are genuine track segments, i.e. an edge-level prediction problem.


Section 2: Classical graph neural networks

We’re now going to take one step back to take two steps forward. As exciting as the problem I just described is, let’s first understand the premise of graph neural networks before biting into the intricacies of the hybrid quantum-classical graph neural network.

Graph representation problems

Graphs are everywhere! A graph is a mathematical object that represents relations (edges) between a collection of entities (nodes). In other words, it’s a collection of nodes connected by edges.

Graph-structured data shows up in all sorts of domains:

  • Understanding natural language (semantic analysis)
  • Traffic data
  • Modelling molecules

And each domain comes with natural questions to ask of its graphs:

  • Given the traffic data across American cities, what is the predicted traffic in New York City tomorrow (node classification)?
  • Knowing the network of LinkedIn connections, how likely is it that Henry has already met Terry (link prediction)?
  • Given a molecule, what is its probability of being toxic (global graph-level prediction)?

These tasks fall into three levels (a small sketch of how such graph data can be stored in code follows the list):

  1. Node-level — predicting some property for each node in a graph
  2. Edge-level — predicting the property or presence of edges in a graph
  3. Graph-level — predicting a property for a whole graph
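To make the data format concrete, here is a minimal sketch of how a small graph could be stored as plain arrays; the node features, edges and labels are made-up numbers purely for illustration.

```python
import numpy as np

# A tiny toy graph: 4 nodes, 3 edges, each node carrying a 2-dimensional feature vector.
node_features = np.array([
    [0.1, 1.0],   # node 0
    [0.4, 0.2],   # node 1
    [0.9, 0.7],   # node 2
    [0.3, 0.8],   # node 3
])

# Edges stored as pairs of node indices (an "edge list").
edges = np.array([
    [0, 1],
    [1, 2],
    [2, 3],
])

# Depending on the task, we attach labels at different levels:
node_labels = np.array([0, 1, 1, 0])   # node-level task
edge_labels = np.array([1, 0, 1])      # edge-level task (e.g. "is this edge real?")
graph_label = 1                        # graph-level task (e.g. "is this molecule toxic?")
```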

Challenges with graph data

Yet it’s not so trivial. How does a neural network even take a graph as input? The question turns tricky because neural networks have traditionally operated on fixed-size and/or regularly ordered inputs (such as sentences, images and video). Neural networks simply don’t speak the language of graphs natively.


Naive graph neural network

In the most fundamental sense, a graph neural network learns the properties of graph-structured data. If we fix an indexing of the nodes, we can package a graph reliably into a conventional feed-forward neural network by flattening its node features into one input vector, and then train that network on the desired task given proper labels. The simplest such architecture could be the following (as detailed here).
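As a rough sketch of what that naive approach could look like in code (my own illustrative example, assuming a fixed number of nodes so that the flattened input always has the same size):

```python
import torch
import torch.nn as nn

class NaiveGraphMLP(nn.Module):
    """Flatten all node features (in a fixed index order) into one long vector
    and push it through an ordinary feed-forward network."""

    def __init__(self, num_nodes: int, feat_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_nodes * feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),   # e.g. one graph-level prediction
        )

    def forward(self, node_features: torch.Tensor) -> torch.Tensor:
        # node_features: (num_nodes, feat_dim), flattened to (num_nodes * feat_dim,)
        return torch.sigmoid(self.net(node_features.flatten()))

# Usage: a graph with 4 nodes carrying 2 features each.
model = NaiveGraphMLP(num_nodes=4, feat_dim=2)
print(model(torch.rand(4, 2)))   # a single probability for the whole graph
```

Note that this ignores the edge structure entirely and breaks as soon as the number or ordering of nodes changes, which is exactly why it counts as "poorly architected".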

Graph convolutional neural network (GCN)

I’ve shown you what poorly architected GNNs look like, but what about the good? To arrive there, let’s think through a few questions and reconcile what we already know.

Illustration of 2D Convolutional Neural Networks (left) and Graph Convolutional Networks (right)
A single graph-convolution layer updates each node embedding by averaging its neighbours’ embeddings from the previous layer, applying a shared weight matrix, and passing the result through a nonlinearity:

h_i^{(k)} = f^{(k)}( W^{(k)} · (1/|N(h_i)|) · Σ_{u ∈ N(h_i)} h_u^{(k-1)} )

Where N(h_i) is the index set of all neighbouring feature vectors of node h_i, h_u^{(k-1)} is a neighbouring node’s feature vector from the prior layer, |N(h_i)| is the number of neighbouring nodes of h_i, and f^{(k)} and W^{(k)} are the ReLU function and weight matrix of the kth layer, respectively.
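Here is a minimal, loop-based sketch of that update rule (illustrative only; real GCN implementations vectorize this with sparse adjacency matrices):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(h_prev, neighbors, W):
    """One message-passing layer: each node averages its neighbours' previous
    embeddings, applies the shared weight matrix W, then the ReLU nonlinearity.

    h_prev:    (num_nodes, d_in) array of embeddings h^(k-1)
    neighbors: dict mapping node index i -> list of neighbour indices N(h_i)
    W:         (d_in, d_out) shared weight matrix W^(k)
    """
    h_next = np.zeros((h_prev.shape[0], W.shape[1]))
    for i, nbrs in neighbors.items():
        agg = h_prev[nbrs].mean(axis=0)   # (1/|N(h_i)|) * sum over neighbours
        h_next[i] = relu(agg @ W)         # f^(k)(W^(k) applied to the aggregated vector)
    return h_next

# Tiny usage example: a 3-node path graph 0 - 1 - 2.
h = np.random.rand(3, 2)
nbrs = {0: [1], 1: [0, 2], 2: [1]}
W = np.random.rand(2, 4)
print(gcn_layer(h, nbrs, W).shape)   # (3, 4)
```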
Depending on where the relevant information lives, pooling can flow in either direction between edges and nodes:

  • (Edge→Node) If the task of the GNN is node classification but key information is stored in the edges, you will need to pool information from the edges into the nodes. This is done by aggregating (sum/max/mean) the feature vectors of all edges touching a given node and either directly updating the node embedding with the result, or updating the node with the output of an update function that has been fed the aggregated vector. A small sketch follows this list.
  • (Node→Edge) This is the procedure explained at the start of this section.
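A hedged sketch of the Edge→Node pooling just described (illustrative; sum aggregation and a plain additive update are arbitrary choices among the options listed above):

```python
import numpy as np

def pool_edges_to_nodes(node_emb, edge_emb, edges):
    """Aggregate (here: sum) the feature vectors of all edges touching each node,
    then update the node embeddings with the pooled result."""
    pooled = np.zeros_like(node_emb)
    for (u, v), e in zip(edges, edge_emb):
        pooled[u] += e   # this edge touches node u ...
        pooled[v] += e   # ... and node v
    # A direct additive update; alternatively, feed [node_emb, pooled] to an update network.
    return node_emb + pooled

node_emb = np.random.rand(4, 3)   # 4 nodes, 3-dimensional embeddings
edge_emb = np.random.rand(3, 3)   # 3 edges (same dimension here, so the additive update works)
edges = [(0, 1), (1, 2), (2, 3)]
print(pool_edges_to_nodes(node_emb, edge_emb, edges).shape)   # (4, 3)
```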

Section 3: The hybrid quantum-classical graph neural network (QGNN)

Remember the particle reconstruction problem? Let’s piece together the inner workings of the hybrid quantum-classical graph neural network designed to tackle exactly that.

Broad overview

Recall that the point of our QGNN is to predict, for each edge in an input event graph, the probability that it is a track segment. Each “event”, being the collision of two particle beams, generates a hit graph of 80,000+ nodes. So at the end of all this, we hope to feed new event graphs into the trained QGNN to infer edge predictions which, when connected together, form particle trajectories.

Schematic of the QGNN architecture, from [4]: the pre-processed graph is fed to an Input Network, which increases the dimension of the node features. The graph’s features are then updated iteratively by the Edge and Node Networks, repeated N_I times. Finally, the same Edge Network is used one more time to extract the edge features that predict the track segments. There is only one Edge Network in the pipeline; two are drawn only for visual purposes.

Input Network

The input network takes as input a matrix of all the hit data per node and applies a single-layer neural network to cast the hit data to an embedding space of a higher dimension. The number of hidden dimensions, N_D, is a hyperparameter that dictates the size of all node embeddings.
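As a sketch of what the input network boils down to (illustrative code; treating each hit as three coordinates and using a tanh activation are my assumptions, not details from the article):

```python
import torch
import torch.nn as nn

N_D = 8            # hidden-dimension hyperparameter: the size of every node embedding
NUM_FEATURES = 3   # assumed per-hit features, e.g. (r, phi, z) coordinates

# A single fully connected layer that lifts each hit's raw features
# into the higher-dimensional embedding space.
input_network = nn.Sequential(
    nn.Linear(NUM_FEATURES, N_D),
    nn.Tanh(),
)

hits = torch.rand(80_000, NUM_FEATURES)   # one event: a matrix of hit data, one row per node
node_embeddings = input_network(hits)     # shape: (80_000, N_D)
```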

Edge Network

The edge network is used to predict the probability that a given edge corresponds to a particle travelling from its inner hit to its outer hit. In other words, it answers the question: does this edge exist? It’s worth noting that each edge carries a natural direction, from inner to outer, since particles only travel outwards from the origin.

The last layer of the edge network is just one neuron wide. Adapted from the reference paper.
RY angle embedding scheme
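Pulling these pieces together, here is a hedged sketch of what a hybrid edge network with an RY angle-embedding PQC could look like in PennyLane + PyTorch. This is my own illustrative reconstruction, not the authors’ code: the qubit count, the classical compression layer, the generic entangling ansatz, and mapping the single expectation value straight to a probability are all placeholder choices.

```python
import torch
import torch.nn as nn
import pennylane as qml

N_D = 8        # node embedding size
N_QUBITS = 4   # placeholder qubit count
N_LAYERS = 2   # placeholder number of ansatz layers

dev = qml.device("default.qubit", wires=N_QUBITS)

@qml.qnode(dev, interface="torch")
def edge_circuit(angles, weights):
    # RY angle embedding: the compressed classical features become rotation angles.
    qml.AngleEmbedding(angles, wires=range(N_QUBITS), rotation="Y")
    # A generic entangling ansatz standing in for the paper's circuit.
    qml.BasicEntanglerLayers(weights, wires=range(N_QUBITS))
    return qml.expval(qml.PauliZ(0))

class EdgeNetwork(nn.Module):
    """Hybrid edge network: compress the two node embeddings classically,
    feed them to the PQC as rotation angles, and read out one probability."""

    def __init__(self):
        super().__init__()
        self.compress = nn.Sequential(nn.Linear(2 * N_D, N_QUBITS), nn.Tanh())
        self.q_weights = nn.Parameter(0.01 * torch.randn(N_LAYERS, N_QUBITS))

    def forward(self, inner, outer):
        angles = self.compress(torch.cat([inner, outer], dim=-1))
        expval = edge_circuit(angles, self.q_weights)   # expectation value in [-1, 1]
        return (expval + 1.0) / 2.0                     # map to a probability in [0, 1]

edge_net = EdgeNetwork()
inner, outer = torch.rand(N_D), torch.rand(N_D)
print(edge_net(inner, outer))   # probability that this edge is a real track segment
```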

Node Network

Since the edge network is the sub-component of the QGNN responsible for predicting track segments, it’s easy to assume that the node network is rather unimportant. This couldn’t be further from the truth. The node network plays a vital role in iteratively updating the node embeddings which condense hidden graph attributes. This allows the edge network to work off of richer hints.

Node network, adapted from the reference paper
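For a feel of the shapes involved, here is a purely classical stand-in for the node network update (illustrative only; in the actual hybrid pipeline this update also runs through a PQC, and grouping neighbours into inner and outer sets is my assumption about the layout):

```python
import torch
import torch.nn as nn

N_D = 8

# Node network: takes each node's own embedding concatenated with aggregated
# neighbour information, and produces a refreshed embedding of the same size.
node_network = nn.Sequential(
    nn.Linear(3 * N_D, N_D),
    nn.Tanh(),
)

own = torch.rand(N_D)
from_inner = torch.rand(N_D)   # aggregated embeddings of neighbours on the inner side
from_outer = torch.rand(N_D)   # aggregated embeddings of neighbours on the outer side
new_embedding = node_network(torch.cat([own, from_inner, from_outer]))   # shape: (N_D,)
```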

Ansatz choice

So far, we’ve referred to the parameterized quantum circuit (PQC) without showing what the circuit actually looks like, partly because it can vary so widely!

[Left] Single layer of circuit 10 from Sim2019; [Right] Single layer of the ansatz developed during QHack
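To give a flavour of what a single ansatz layer can look like in code, here is an illustrative PennyLane layer using a generic pattern of trainable RY rotations followed by a ring of entangling gates; it is not claimed to match either circuit in the figure exactly.

```python
import numpy as np
import pennylane as qml

def ansatz_layer(params, wires):
    """One illustrative layer: a trainable RY rotation on every qubit,
    followed by a ring of CZ entangling gates."""
    for i, w in enumerate(wires):
        qml.RY(params[i], wires=w)
    for i in range(len(wires)):
        qml.CZ(wires=[wires[i], wires[(i + 1) % len(wires)]])

def ansatz(params, wires):
    # params has shape (num_layers, num_qubits); stack the layers.
    for layer_params in params:
        ansatz_layer(layer_params, wires)

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def circuit(params):
    ansatz(params, wires=list(range(4)))
    return qml.expval(qml.PauliZ(0))

params = np.random.uniform(0, 2 * np.pi, size=(2, 4))
print(circuit(params))
```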

Training and loss function

Once you’ve understood how a QGNN works, training it is rather simple. The process is very similar to that of any other binary classification model, with the caveat that the gradients of the quantum weights have to be computed separately via the analytic parameter-shift rule, which is exact but costly (two extra circuit evaluations per quantum parameter).

  1. Feed in a batch of events (individual graphs of hits) and conduct a forward pass to arrive at the predicted edge structure for each event.
  2. Calculate the binary cross-entropy loss between the predicted edge values and the labels. See below for the mathematical formulation.
  3. Backpropagate the loss to find the gradients of each classical and quantum weight. The quantum gradients are found analytically using the parameter-shift rule; the classical gradients are computed as usual via auto-differentiation.
  4. Update each weight using the calculated gradients with the optimizer of your choice. [4] uses the Adam optimizer with a learning rate of 0.01.
  5. Repeat the above for all batches in an epoch, over 10–100 epochs.

Binary cross-entropy loss function, where ŷ_i is the predicted probability for edge i, y_i is its ground-truth label, and N is the number of edges:

L = −(1/N) Σ_i [ y_i · log(ŷ_i) + (1 − y_i) · log(1 − ŷ_i) ]
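A compressed sketch of that loop in PyTorch (illustrative; `qgnn` and `data_loader` are placeholders for the real hybrid model and the event dataset, and the quantum-gradient bookkeeping is assumed to be handled by the quantum framework when `.backward()` is called):

```python
import torch
import torch.nn as nn

def train(qgnn, data_loader, epochs=10, lr=0.01):
    """Minimal training loop: binary cross-entropy on edge predictions,
    Adam updates for both classical and quantum weights."""
    loss_fn = nn.BCELoss()
    optimizer = torch.optim.Adam(qgnn.parameters(), lr=lr)
    for epoch in range(epochs):
        for event_graph, edge_labels in data_loader:
            edge_probs = qgnn(event_graph)           # forward pass: one probability per edge
            loss = loss_fn(edge_probs, edge_labels)  # binary cross-entropy
            optimizer.zero_grad()
            loss.backward()   # classical autograd; quantum gradients via parameter-shift or simulator backprop
            optimizer.step()
        print(f"epoch {epoch}: loss = {loss.item():.4f}")
```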

Recapping QGNNs

Equipped with a deeper understanding of the components, let’s recap how they all mesh together and touch on the motivation behind some design choices.

There is only one Edge Network in the pipeline; two are drawn only for visual purposes.

Section 4: Architecture improvements made in QHack

Thus far, I’ve detailed at great length how the original QGNN works as presented in [4].

Vanishing/exploding gradient problem

A known issue with recurrent neural networks is the vanishing gradient problem, and the QGNN used here is no exception: applying the same Edge and Node Networks over and over makes the unrolled computation graph behave just like a recurrent network. Training an RNN requires back-propagation through time (BPTT), which means that you choose a number of time steps N and unroll your network so that it becomes a feed-forward neural network (FFNN) with N duplicates of the original network.

An unrolled recurrent neural network (source)
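A toy numerical illustration of why this hurts (made-up numbers, not from the paper): the gradient reaching the earliest unrolled copy is a product of per-step Jacobians, so if those factors sit even slightly below 1 in magnitude, the product decays geometrically with the number of iterations.

```python
import numpy as np

steps = 8                     # number of unrolled iterations N
jacobian = 0.7 * np.eye(2)    # toy per-step Jacobian with norm < 1

grad = np.ones(2)             # gradient arriving at the final iteration
for n in range(1, steps + 1):
    grad = jacobian.T @ grad  # back-propagate through one more unrolled copy
    print(f"after {n} steps, gradient norm = {np.linalg.norm(grad):.4f}")
# The norm shrinks like 0.7**n, i.e. the gradient vanishes; with factors above 1 it would explode instead.
```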

Assumptions, gaps & future work

  • Assumptions — The 5-day time constraint meant that we could never train the complete QGNN over the complete set of all events (a full run takes a whole week). This implies there’s a chance our findings on the vanishing gradient problem wouldn’t hold up over a full training run.
  • Future work — There remains a lot of tinkering to be done on the architecture of this primitive QGNN to see how performance could be improved. In particular, I’d be interested to see how a QGNN with a more conventional message-passing and aggregation scheme performs.

pavanjay.com | Invested in QC + ML | EECS @UWaterloo | Seeker of rigorous truth
