
Quantum Generative Adversarial Networks

A gentle introduction to QGANs

What are GANs in the first place?

Before we jump into QGANs, it’s always helpful to review the vanilla GAN architecture first. Feel free to skip ahead if you’re already comfortable with GANs.

Generator

Given a real distribution of data (e.g. images), the generator generates fake data samples that aim to mimic the fixed distribution of real data. As an unsupervised network, the generator takes as input a uniform random vector/variable z, with no other clue about the real data distribution. With each training step of G (the generator), we update its weights, and it gets better at transforming this noise source z into data samples that mimic the distribution of real data.

Discriminator

The discriminator (D) takes as input either real data samples or fake data samples (not knowing which is which), and its goal is to discriminate between these two classes, outputting a binary random variable.
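To make the two roles concrete, here is a minimal, hedged PyTorch sketch of a generator, a discriminator, and one adversarial update for each. The layer sizes, learning rates, and the toy "real" distribution are arbitrary placeholders for illustration, not a recommended architecture.

import torch
import torch.nn as nn

latent_dim, data_dim = 8, 2

# G: maps a random latent vector z to a fake data sample
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
# D: maps a (real or fake) sample to the probability that it is real
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(64, data_dim) + 3.0   # stand-in for samples from the real distribution
z = torch.randn(64, latent_dim)          # the noise source z

# Discriminator step: label real samples 1 and generated samples 0
d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: try to make D output 1 ("real") on generated samples
g_loss = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()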

The fake artist & art expert

The classic analogy: the generator is a forger producing fake artwork, while the discriminator is an art expert trying to tell the forgeries from the originals. As both get better at their jobs, the forgeries become harder and harder to spot.

This person does not exist.

QGANs

Source: Dallaire-Demers et al. (2018)

An aside on variational quantum algorithms

Despite the name, a QGAN is not a fully quantum algorithm. In fact, all near-term quantum machine learning algorithms are hybrid quantum-classical algorithms: they contain both classical and quantum parts.

Variational Quantum Algorithms — How do they work?
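In short, a variational quantum algorithm runs a parameterized quantum circuit on a quantum device, measures the expectation value of some observable, feeds that value into a cost function, and lets a classical optimizer update the circuit parameters; the loop repeats until the cost converges. Here is a minimal, hedged sketch of that loop; the circuit, observable, and hyperparameters are arbitrary illustrations, not part of the QGAN construction.

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def cost(params):
    # quantum part: a parameterized circuit followed by a measurement
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    return qml.expval(qml.PauliZ(0))

# classical part: gradient descent on the measured expectation value
opt = qml.GradientDescentOptimizer(stepsize=0.3)
params = np.array([0.1, 0.2], requires_grad=True)
for _ in range(100):
    params = opt.step(cost, params)

The QGAN below is exactly this kind of hybrid loop, just with two parameterized circuits (G and D) optimizing against each other.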

Here’s the gist

Let’s equip you with a high-level overview of what’s going on before jumping into the technicalities.

The general structure of QGANs. The real source R or the parametrized generator G(θ_G) is applied on an initial state |0⟩, and each outputs a mixed or pure quantum state described by a density matrix. The discriminator is denoted as D(θ_D). We will discuss details on each register later.

A quick note before we jump into the fun stuff

QGANs actually come in several degrees of “quantumness.” In the interest of brevity, we will focus on QGANs in which the data, the generator, and the discriminator are all quantum. Although any mixture of classical and quantum parts is possible, the principles that underlie a fully quantum QGAN carry over to the other variations.

Timeline of the development of quantum generative adversarial network models (source). The implementation we’ll be covering (circled in red) was first proposed by Dallaire-Demers et al. (source), and several concepts in this article are drawn from that original paper.

Let’s dive a bit deeper.

Now that you have a high-level intuition about how a QGAN works, let’s solidify it piece by piece, starting with the discriminator.

On the discriminator

The discriminator takes as input both data sources (R and G) but is clueless about whether it has been given a real data sample or a fake one. After running the quantum circuit, it outputs a single binary decision variable, say ‘+1’ for real and ‘−1’ for fake. Recall the analogy of the fake artist and the art expert.

The cost function — for the mathematically motivated

Now that we have these expectation values, what do we do with them? Well, we need to construct a cost function for our classical optimizer to optimize over. In the next few minutes, I’ll guide you through arriving at our final cost function, though I must preface that some basic linear algebra and quantum computing fundamentals are assumed.

Quantum adversarial learning strategy — Eq 3
The simplified cost function for the generator
Classical adversarial learning strategy — log-likelihood functions
The general form of the Z operator:

Z = |0⟩⟨0| − |1⟩⟨1|      # when we label |real⟩ as |0⟩ and |fake⟩ as |1⟩
⟨real|Z|real⟩ = +1       # expectation values of statevectors with respect to Z
⟨fake|Z|fake⟩ = −1       # are proportional to the correct classification
The real source R or the generator G(θ_G) is applied on the initial state |0, z⟩, defined on the Out R|G and Bath R|G registers respectively. The discriminator takes the state output by the source, together with an initial state |0, 0⟩ defined on the Out D and Bath D registers, and writes its answer |real⟩ or |fake⟩ into the Out D register. Bath D and Bath R|G are workspaces for the discriminator and generator, respectively.
Eq 4: recall that the expectation value of an observable Z with respect to a density matrix is the trace of that Z operator applied to the density matrix. Read here to learn more.
The final quantum optimization problem for QGANs
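The equation figures above are not reproduced here, but the shape of the argument is worth writing out. The classical adversarial learning strategy is the familiar log-likelihood minimax game, and the quantum version replaces the log-likelihoods with expectation values of Z on the discriminator's output register. Schematically, up to normalization and with register details suppressed (see the original paper for the exact expressions):

V_{\text{classical}}(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

V(\theta_D, \theta_G) \;\propto\; \mathrm{Tr}\!\left[ Z \, D(\theta_D)\, \rho_R \, D^{\dagger}(\theta_D) \right] \;-\; \mathrm{Tr}\!\left[ Z \, D(\theta_D)\, \rho_{G(\theta_G)} \, D^{\dagger}(\theta_D) \right]

\min_{\theta_G} \max_{\theta_D} V(\theta_D, \theta_G)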

Now what? Update rule.

Using the final cost function V(θ_D, θ_G), we can use classical gradient descent to arrive at the optimal parameters. Depending on the specific training step k, the update rule for the discriminator parameters θ_D^k or the generator parameters θ_G^k is given by
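The original figure with the exact update rule isn't reproduced here; as a standard gradient ascent/descent form (a hedged reconstruction with step size α), the updates read:

\theta_D^{\,k+1} = \theta_D^{\,k} + \alpha \, \nabla_{\theta_D} V(\theta_D^{\,k}, \theta_G^{\,k}) \qquad \text{(D performs gradient ascent on } V\text{)}

\theta_G^{\,k+1} = \theta_G^{\,k} - \alpha \, \nabla_{\theta_G} V(\theta_D^{\,k}, \theta_G^{\,k}) \qquad \text{(G performs gradient descent on } V\text{)}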

Convergence!

With sufficient training steps of G, the statistics produced by G become approximately equivalent to the real data source! At this point, we’ve converged, leaving D unable to differentiate between R and G, as the probability of successful classification reaches its equilibrium value of 1/2 (meaning that D is basically guessing).

The cross-entropy between R and G

The algorithmic flow summed up

Look at the flowchart below. You understand all of it now! Isn’t that crazy? In just minutes, you learned the algorithmic flow of a QGAN, backed by mathematical understanding. (A code sketch of this training loop follows after the flowchart.)

  1. We start by initializing our parameter vectors for G and D arbitrarily
  2. With a fixed θ_G, we evaluate the cost for D and update parameters accordingly for a sufficient number of steps to maximize V(θ_D, θ_G)
  3. Then, fixing θ_D, we evaluate the cost for G and update its parameters to minimize V(θ_D, θ_G)
  4. We then repeat steps 2 and 3 until we have converged at a point where all gradients vanish, and the discriminator’s best strategy becomes to guess with an accuracy of 50%.
  5. Return the optimal parameter vector for G
The algorithmic flow of a QGAN, source: Dallaire-Demers et al. (2018)
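To make the loop concrete, here is a hedged, toy-sized PennyLane sketch of the same alternation. The circuits, wire layout, and hyperparameters are my own illustrative choices (a single data qubit and no bath registers), not the construction from the paper; the point is only the structure: two parameterized circuits, a shared value function, and alternating updates.

import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)   # wire 0: data, wire 1: discriminator output

def real_source():
    qml.RY(1.2, wires=0)                     # fixed circuit standing in for R

def generator(theta_g):
    qml.RY(theta_g[0], wires=0)
    qml.RZ(theta_g[1], wires=0)

def discriminator(theta_d):
    qml.RY(theta_d[0], wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RY(theta_d[1], wires=1)

@qml.qnode(dev)
def d_on_real(theta_d):
    real_source()
    discriminator(theta_d)
    return qml.expval(qml.PauliZ(1))         # +1 ~ "real", -1 ~ "fake"

@qml.qnode(dev)
def d_on_fake(theta_d, theta_g):
    generator(theta_g)
    discriminator(theta_d)
    return qml.expval(qml.PauliZ(1))

# Step 1: arbitrary initial parameters
theta_d = np.array([0.2, -0.1], requires_grad=True)
theta_g = np.array([0.1, 0.3], requires_grad=True)
opt = qml.GradientDescentOptimizer(0.1)

def disc_cost(td):
    # D wants "real" on R and "fake" on G: maximize the gap (so minimize its negative)
    return -(d_on_real(td) - d_on_fake(td, theta_g))

def gen_cost(tg):
    # G wants D to answer "real" on its fakes
    return -d_on_fake(theta_d, tg)

for step in range(100):
    theta_d = opt.step(disc_cost, theta_d)   # Step 2: train D with theta_g fixed
    theta_g = opt.step(gen_cost, theta_g)    # Step 3: train G with theta_d fixed
# Steps 4-5: in practice, iterate until the costs plateau, then return theta_g

In practice (and in the paper's description), the discriminator may be updated for several steps before switching back to the generator; the single-step alternation above is just the simplest version of the loop.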

Some thoughts to keep in mind

Evaluating the D and G cost on quantum circuits

If you look back at our cost function from before, you’ll see that two of its terms depend on θ_D. This makes sense, since each term measures the discriminator’s efficacy on one of the data sources. What this means in practice is that you must double the number of circuit evaluations when calculating the discriminator’s cost compared to the generator’s cost.

The final quantum optimization problem for QGANs
Simplified individual cost functions for D and G

The D and G training tradeoff

A common problem with classical GANs that carries over to QGANs is the training tradeoff between the generator and discriminator: if the discriminator is trained too aggressively relative to the generator, the generator’s gradients shrink toward zero and training stalls; if it is trained too little, it gives the generator a weak learning signal. Balancing how many update steps each network gets per round is therefore part of the tuning.

Thinking on G and |z⟩

Think of the quantum generator as transforming an unstructured latent input space into a generated sample space. What this means in practice is that we input a random variable to the generator, which it then maps to a sample in the k-dimensional generated probability distribution. The whole act of training G is for it to get better at transforming this random variable |z⟩ to something matching R. One detail that I’ve excluded thus far is this random vector/variable, so let’s talk about it.

Beware: classical data encoding

Although the variation of QGANs we discuss here deals with quantum data, I want to flag a non-trivial assumption about the real data circuit that surfaces as soon as you step outside the quantum-data domain: if the real data is classical, it is not handed to you as a quantum state, so you first need a state-preparation (encoding) circuit, and that circuit can itself be expensive.
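As one hedged example of what such an encoding could look like (amplitude encoding is only one of several strategies, and the data below is an arbitrary placeholder):

import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def real_source(sample):
    # load a length-4 classical vector into the amplitudes of 2 qubits
    qml.AmplitudeEmbedding(features=sample, wires=[0, 1], normalize=True)
    return qml.state()

print(real_source(np.array([0.3, 0.1, 0.4, 0.2])))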

An ansatz suggestion

Until now, I purposely avoided touching on potential ansatzes, since the choice of ansatz depends considerably on your use case.

A practical universal circuit ansatz for either G or D. Each layer is composed of parameterized single-qubit X rotations followed by parameterized Z rotations. A layer of staggered sets of parameterized nearest-neighbour ZZ rotations follows the single-qubit rotations.
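Here is a hedged PennyLane sketch of one such layer: parameterized RX and RZ rotations on every qubit, followed by staggered nearest-neighbour ZZ rotations. The parameter grouping and wire ordering are my own assumptions for illustration.

import pennylane as qml

def ansatz_layer(x_angles, z_angles, zz_angles, wires):
    # parameterized single-qubit rotations: X then Z on every qubit
    for i, w in enumerate(wires):
        qml.RX(x_angles[i], wires=w)
        qml.RZ(z_angles[i], wires=w)
    # staggered nearest-neighbour ZZ rotations: even-indexed pairs, then odd-indexed pairs
    pairs = [(wires[i], wires[i + 1]) for i in range(0, len(wires) - 1, 2)]
    pairs += [(wires[i], wires[i + 1]) for i in range(1, len(wires) - 1, 2)]
    for k, (a, b) in enumerate(pairs):
        qml.IsingZZ(zz_angles[k], wires=[a, b])

Stacking several such layers, each with its own parameters, gives a circuit usable for either G(θ_G) or D(θ_D).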

Why quantum?

Apart from QGANs simply being cool, I should probably address the motivation behind quantizing GANs. Why do researchers believe there is a potential speedup over classical GANs? In short, quantum circuits can represent, sample from, and manipulate certain probability distributions (such as those arising from highly correlated quantum systems) that are intractable to handle classically, so a quantum generator can in principle learn data sources that a classical generator cannot efficiently model.

Wrapping it up

Okay, I must admit that was a lot. Let’s recap:

  • A QGAN uses two parameterized quantum circuits, the discriminator and the generator. The generator learns the optimal parameters to transform a random latent input space into a k-dimensional generated probability distribution space to ensure the greatest intersection with the real probability distribution. Inversely, the discriminator optimizes against this to uphold the cross-entropy between the two data distributions.
  • Reformulating the classical log-likelihood cost function in our quantum formalism as a linear, convex optimization problem (which we derived above) ensures reliable gradient descent.
  • The QGAN has finished training once the trained discriminator cannot distinguish between generated and real quantum states with probability greater than 1/2. At this point, all gradients vanish, and the cross-entropy between the real and generated density matrices is 0.
  • The unique fixed point, at which the generator produces data equivalent to the real data, is the Nash equilibrium, and the QGAN will always converge to this point.
  • QGANs have the potential for an exponential speedup in sampling, manipulating, and reproducing probability distributions that are classically intractable (e.g. correlated fermionic states)

Resources

Want to see a QGAN built? Look through this PennyLane tutorial. Having understood what we’ve discussed here, you will breeze through it with a newfound appreciation for the simplicity of QGANs.
