Data Encoding Analysis¶

Copyright (c) 2022 Institute for Quantum Computing, Baidu Inc. All Rights Reserved.

Overview¶

Many recent studies have demonstrated the promise of Variational Quantum Algorithms (VQAs) for achieving quantum advantages on near-term quantum devices, with particular emphasis on supervised learning tasks. A VQA is built on a Parameterized Quantum Circuit (PQC), which in this tutorial is divided into two main parts: the data encoding circuit and the quantum neural network. The data encoding circuit converts classical data into quantum states, and the quality of these quantum states has a direct impact on the classification results. When data encoding is analyzed from the perspective of kernel functions, different encoding methods correspond to different kernel functions, which play a critical role in the classification of quantum states [1,2]. Analyzing data encoding from the standpoint of statistical learning theory, in turn, often determines the expressiveness and generalization ability of the method [3,4]. A systematic analysis of data encoding is therefore necessary. From the standpoint of quantum information, reference [5] critically analyzes the effect of the width and depth of the data encoding circuit on the resulting quantum states.

The rest of the tutorial is organized around [5] and is divided into two parts: theory and Paddle Quantum implementation. The fundamental concepts and main conclusions are introduced first, followed by the concrete implementation.

Theory¶

Fundamental Concepts¶

Figure 1 shows the flowchart for encoding classical data into quantum states. Assume that the classical data are independent and identically distributed samples from a distribution $\mathcal{D}$. Each classical data point $\mathbf{x}$ becomes a quantum state $\rho(\mathbf{x})$ after passing through the data encoding circuit. This motivates the concept of the average quantum state over the distribution $\mathcal{D}$, i.e.

$$\overline{\rho} := \mathbb{E}_{\mathbf{x} \sim \mathcal{D}}\left[\rho(\mathbf{x})\right]. \tag{1}$$

Given a classical dataset $S$ consisting of $M$ samples, we usually approximate the average quantum state by the empirical average over $S$:

$$\overline{\rho} := \frac{1}{M}\sum_{j=1}^{M}\rho(\mathbf{x}_j). \tag{2}$$
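
In code, Eq. (2) amounts to averaging the outer products of the encoded state vectors. The following is a minimal NumPy sketch, with random placeholder states standing in for actual encoded data (the helper average_state is illustrative; the tutorial's own version appears later as average_encode_state):

import numpy as np

def average_state(states):
    """Empirical average density matrix of Eq. (2) from M pure-state vectors.

    states: complex array of shape [M, 2**n], one encoded state per row.
    """
    # Outer products |psi_j><psi_j|, then the mean over j
    rhos = states[:, :, None] @ states[:, None, :].conj()
    return rhos.mean(axis=0)

# Placeholder data: M = 4 random normalized 2-qubit states
rng = np.random.default_rng(0)
psi = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
rho_bar = average_state(psi)
print(np.trace(rho_bar).real)  # a density matrix has unit trace: ~1.0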

Figure 1: Flowchart for encoding classical data $\mathbf{x}$ into quantum states $\rho(\mathbf{x})$.

There are various ways to measure the distance between quantum states, including the trace distance, fidelity, the Petz-Rényi divergence, and others. In this tutorial, the Petz-Rényi divergence is used as the measure of distance between quantum states. Specifically, the Petz-Rényi divergence of order 2 between quantum states $\rho_0$ and $\rho_1$ is defined as

$$D_2(\rho_0 \| \rho_1) = \log \operatorname{Tr}\left[\rho_0^2 \rho_1^{-1}\right], \tag{3}$$

the closer the two quantum states are, the smaller the value of $D_2$.
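
For intuition, Eq. (3) is straightforward to evaluate numerically. Below is a minimal NumPy sketch, assuming a base-2 logarithm (the convention used by the plotting code later in this tutorial) and that $\rho_1$ is full-rank so that its inverse exists:

import numpy as np

def petz_renyi_2(rho0, rho1):
    """Petz-Renyi divergence of order 2, Eq. (3): log Tr[rho0^2 rho1^{-1}]."""
    return float(np.log2(np.trace(rho0 @ rho0 @ np.linalg.inv(rho1)).real))

# Example: a pure state vs. the maximally mixed state on one qubit
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
rho1 = np.eye(2, dtype=complex) / 2               # I/2
print(petz_renyi_2(rho0, rho1))                   # log2(2) = 1.0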

Main Result¶

The Petz-Rényi divergence is then used to quantify the distance between the average encoded quantum state and the maximally mixed state. Intuitively, this distance depends both on the properties of the classical dataset and on how the data are encoded. Assume that the feature dimensions of the classical dataset satisfy certain independence conditions and each has a standard deviation of at least $\sigma$. Then the following inequality holds for the encoding circuit depicted in Figure 2 (with width $n$ and depth $D$):

$$D_2\left(\overline{\rho} \,\|\, I/2^n\right) \le \log\left(1 + (2^n - 1)\,2^{-D\sigma^2}\right), \tag{4}$$

where $I/2^n$ is the maximally mixed state on $n$ qubits and $I$ is the identity matrix.
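
To get a feel for the decay rate, the right-hand side of Eq. (4) can be tabulated for the (width, depth) pairs used in the experiments below. This sketch uses $\sigma = 1$ as an illustrative assumption, not the actual MNIST feature statistic:

import numpy as np

def renyi_bound(n, D, sigma):
    """Right-hand side of Eq. (4): log2(1 + (2^n - 1) * 2^(-D * sigma^2))."""
    return np.log2(1 + (2 ** n - 1) * 2 ** (-D * sigma ** 2))

sigma = 1.0  # illustrative standard deviation per feature dimension
for n, D in zip([8, 6, 4, 3, 2], [2, 3, 4, 6, 8]):  # (width, depth) pairs used below
    print(f"n = {n}, D = {D}, bound = {renyi_bound(n, D, sigma):.4f}")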

Figure 2: The data encoding circuit of width $n$ and depth $D$.

A more rigorous statement and proof of the theorem can be found in [5]; the focus of this tutorial is on what this conclusion implies:

  • The bound means that the average quantum state converges to the maximally mixed state at an exponential rate as the circuit depth increases. If, for example, the average quantum states of both classes 0 and 1 eventually converge to the maximally mixed state, it becomes impossible to distinguish the two classes from the standpoint of quantum information, i.e., impossible to distinguish the classical data features in an average sense.

  • When the classical data features have high dimensionality (e.g., image data), angle encoding may not be the best option. This is because it easily leads to very deep encoding circuits, while widening the circuit instead may run into the barren plateau problem, which severely limits the capability of VQAs. As a result, some dimensionality reduction must be performed on the classical data before it is fed into the circuit, as sketched below.
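
As an example of such a reduction, a 28×28 MNIST image (784 features) can be resized down to 16 features before encoding. The following minimal sketch illustrates the idea with Pillow (an assumption made for illustration only; the MNIST dataset class used later performs its own 'resize' downscaling internally):

import numpy as np
from PIL import Image

def downscale(image, target_dimension=16):
    """Resize a square grayscale image to a flat vector of target_dimension features."""
    side = int(np.sqrt(target_dimension))  # 16 -> a 4x4 image
    img = Image.fromarray(image.astype(np.uint8))
    small = np.asarray(img.resize((side, side)), dtype=np.float64)
    return small.flatten()

# Placeholder image standing in for a 28x28 MNIST sample
image = np.random.default_rng(0).integers(0, 256, size=(28, 28))
print(downscale(image).shape)  # (16,)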

Paddle Quantum Implementation¶

This section focuses on two experiments using Paddle Quantum on the MNIST dataset:

  • Exploring how the Petz-Rényi divergence between the average quantum state and the maximally mixed state varies with the depth of the data encoding circuit
  • Investigating how the classification accuracy changes as the data encoding circuit becomes deeper

First, import the relevant packages¶

In [2]:
# import numpy, paddle, and paddle_quantum
import numpy as np
import paddle
import paddle_quantum

# import circuit module
from paddle_quantum.ansatz import Circuit

# import some function
from numpy import pi as PI
from paddle import matmul, transpose, reshape, real, argmax, cast, mean, concat
from paddle_quantum.qinfo import pauli_str_to_matrix 
from paddle_quantum.linalg import dagger
import paddle.nn.functional as F

# dataset tool
from paddle_quantum.dataset import MNIST

# plot and time module
from matplotlib import pyplot as plt
from pylab import axes
import time

Parameterized Quantum Circuit¶

Figure 3: The $R_y$-based data encoding circuit (red box, left) followed by the quantum neural network (blue box, right).

The red box on the left in Figure 3 is the data encoding circuit, a special case of Figure 2, and the blue box on the right is the quantum neural network. The data encoding circuit here is made up of $R_y$ and CNOT gates, and the specific circuit depth $D$ is determined by the data feature dimension. This tutorial uses the MNIST dataset, with the images downscaled to 16-dimensional feature vectors. The number of qubits is chosen as 8, 6, 4, 3, 2, and the corresponding circuit depths are 2, 3, 4, 6, 8. Positions beyond the 16 feature dimensions are filled with 0, i.e., $R_y(0)$. The quantum neural network part consists of the single-qubit universal gate $U_3$ and CNOT, and its circuit depth $L$ can be set freely.
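
As a check of the padding rule: each $R_y$ layer on $n$ qubits consumes $n$ angles, so a 16-dimensional feature vector determines the depth $D = \lceil 16/n \rceil$, with trailing positions set to zero. A minimal NumPy sketch (pad_features is a hypothetical helper for illustration; the MNIST dataset class used below handles this internally):

import numpy as np

def pad_features(x, num_qubits):
    """Zero-pad features so each Ry layer receives exactly num_qubits angles."""
    depth = int(np.ceil(len(x) / num_qubits))  # circuit depth D
    padded = np.zeros(num_qubits * depth)
    padded[:len(x)] = x                        # trailing entries stay 0, i.e. Ry(0)
    return padded.reshape(depth, num_qubits)   # one row of angles per layer

x = np.arange(16, dtype=np.float64)            # a 16-dimensional feature vector
for n in [8, 6, 4, 3, 2]:
    print(f"n = {n}: depth D = {pad_features(x, n).shape[0]}")
# prints D = 2, 3, 4, 6, 8, matching the depths stated above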

Encoding Classical Data into Quantum States¶

Here we use the dataset module provided by Paddle Quantum. The MNIST data are encoded into quantum states using the data encoding circuit shown in Figure 3 before being fed into the quantum neural network for training.

In [4]:
train_data, test_data = [], []

# Binary classification task 
classes = [3,6]

training_data_num = 1000
testing_data_num = 200
qubit_num_list = [8, 6, 4, 3, 2]

# Encode classical data using circuits of different widths and depths and save them
for qubit_num in qubit_num_list:
    
    # training dataset
    train_dataset = MNIST(mode='train', encoding='real_entangled_encoding', num_qubits=qubit_num,
                          classes=classes,
                          data_num=training_data_num,
                          downscaling_method='resize', target_dimension=16,
                          need_relabel=True, return_state=True)

    # validation dataset
    val_dataset = MNIST(mode='test', encoding='real_entangled_encoding', num_qubits=qubit_num,
                        classes=classes,
                        data_num=testing_data_num,
                        downscaling_method='resize', target_dimension=16,
                        need_relabel=True, return_state=True)

    # x and y
    train_x, train_y = train_dataset.quantum_image_states, train_dataset.labels
    test_x, test_y = val_dataset.quantum_image_states, val_dataset.labels
    train_data.append((train_x, train_y))
    test_data.append((test_x, test_y))
In [5]:
print(train_data[0][0].shape)
print(train_data[0][1].shape)
print(test_data[0][0].shape)
print(test_data[0][1].shape)
[1000, 256]
(1000,)
[200, 256]
(200,)
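
Each encoded sample is a state vector of dimension $2^n$ (here $2^8 = 256$, since the first entry of qubit_num_list is 8), with one integer label per sample.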

Building Quantum Neural Network¶

In [6]:
# Construct model
class Net(paddle.nn.Layer):
    """
    construct network
    """
    def __init__(self, n, depth):
        # Initialize the circuit: n, depth
        super(Net, self).__init__()
        self.n = n
        self.depth = depth
        
        self.circuit = Circuit(n)
        # Add layers of rotation gates
        for i in range(n):
            self.circuit.rz(qubits_idx=i)
            self.circuit.ry(qubits_idx=i)
            self.circuit.rz(qubits_idx=i)

        # Add `depth` layers, each an entangling ring of CNOTs followed by Ry rotations
        for _ in range(depth):
            for i in range(n-1):
                self.circuit.cnot(qubits_idx=[i, i + 1])
            self.circuit.cnot(qubits_idx=[n-1, 0])
            for i in range(n):
                self.circuit.ry(qubits_idx=i)

    # Define forward propagation mechanism, and then calculate loss function and cross-validation accuracy
    def forward(self, state_in, label):
        """
        Input: 
            state_in: input quantum state, shape: [-1, 1, 2^n] -- Here is [BATCH, 1, 2^n]
            label: labels of input quantum state, shape: [-1, 1]
        Loss function:
            The cross entropy loss 
        """
        # Compute the unitary matrix of the parameterized circuit
        Utheta = self.circuit.unitary_matrix()

        # row vector operations here to speed up 
        state_out = matmul(state_in, Utheta)  # shape [-1, 1, 2 ** n]

        # Measure the expectation value of the Pauli Z operator on qubit 0
        Ob1 = paddle.to_tensor(pauli_str_to_matrix([[1.0, 'z0']], self.n))
        E_Ob1 = matmul(matmul(state_out, Ob1), transpose(paddle.conj(state_out), perm=[0, 2, 1]))
        E_Ob1_re = reshape(real(E_Ob1), [-1, 1])

        # Measure the expectation value of the Pauli X operator on qubit 0
        Ob2 = paddle.to_tensor(pauli_str_to_matrix([[1.0, 'x0']], self.n))
        E_Ob2 = matmul(matmul(state_out, Ob2), transpose(paddle.conj(state_out), perm=[0, 2, 1]))
        E_Ob2_re = reshape(real(E_Ob2), [-1, 1])

        outputs = concat([E_Ob1_re, E_Ob2_re], axis=-1)
    
        # Calculate loss and accuracy
        loss = F.cross_entropy(outputs, label)
        acc = mean(cast(argmax(outputs, axis=-1) == label, "float32"))
        
        return loss, acc
In [7]:
# Define a classifier
def QClassifier(train_x, train_y, test_x, test_y, N, D, EPOCH, LR, BATCH, seed=0):
    """
    Quantum binary classifier
    """
    train_y = paddle.to_tensor(train_y, dtype="int64")
    test_y = paddle.to_tensor(test_y, dtype="int64")

    N_train, in_dim = train_x.shape
    
    # Initialize the neural network with the given random seed
    paddle.seed(seed)
    net = Net(n=N, depth=D)

    # Generally speaking, we use the Adam optimizer to obtain relatively good convergence;
    # you can change it to SGD or RMSProp.
    opt = paddle.optimizer.Adam(learning_rate=LR, parameters=net.parameters())

    # Reshape the test data once, outside the training loop
    test_input_state = reshape(test_x, [-1, 1, 2 ** N])

    # Optimization loop
    for ep in range(EPOCH):
        for itr in range(N_train // BATCH):
            input_state = train_x[itr * BATCH:(itr + 1) * BATCH]
            input_state = reshape(input_state, [-1, 1, 2 ** N])
            label = train_y[itr * BATCH:(itr + 1) * BATCH]

            # Forward propagation to calculate loss and accuracy
            train_loss, train_acc = net(state_in=input_state, label=label)

            if itr % 3 == 0:
                # Compute test accuracy and loss
                loss_useless, test_acc = net(state_in=test_input_state, label=test_y)
                print("epoch:", ep, "iter:", itr,
                      "train loss: %.4f" % train_loss.numpy(),
                      "train acc: %.4f" % train_acc,
                      "test acc: %.4f" % test_acc)

            # Use back propagation to minimize the loss function
            train_loss.backward()
            opt.minimize(train_loss)
            opt.clear_grad()
    
    # Compute the final test accuracy
    _, test_acc = net(state_in=test_input_state, label=test_y)
    
    return test_acc.numpy()
In [8]:
time_start = time.time()

acc_list = []

for i in range(5):
    print('***************************** qubit num : %s *****************************'%qubit_num_list[i])
    train_x, train_y = train_data[i]
    test_x, test_y = test_data[i]

    acc = QClassifier(
        train_x,                  # training data x
        train_y,                  # training data label
        test_x,                   # test data x
        test_y,                   # test data label
        N=qubit_num_list[i],      # Number of qubits
        D=qubit_num_list[i] + 2,  # Circuit depth
        EPOCH=5,                  # Number of training epochs
        LR=0.05,                  # Learning rate
        BATCH=200,                # Batch size
        seed=0
    )
    acc_list.append(acc)      

time_span = time.time() - time_start
print('time used:', time_span, 's')
***************************** qubit num : 8 *****************************
epoch: 0 iter: 0 train loss: 0.6913 train acc: 0.5200 test acc: 0.4250
epoch: 0 iter: 3 train loss: 0.6836 train acc: 0.6500 test acc: 0.5200
epoch: 1 iter: 0 train loss: 0.6759 train acc: 0.6850 test acc: 0.5550
epoch: 1 iter: 3 train loss: 0.6709 train acc: 0.6950 test acc: 0.5650
epoch: 2 iter: 0 train loss: 0.6651 train acc: 0.6650 test acc: 0.5700
epoch: 2 iter: 3 train loss: 0.6621 train acc: 0.7050 test acc: 0.6050
epoch: 3 iter: 0 train loss: 0.6589 train acc: 0.6900 test acc: 0.6000
epoch: 3 iter: 3 train loss: 0.6597 train acc: 0.7050 test acc: 0.6250
epoch: 4 iter: 0 train loss: 0.6563 train acc: 0.7150 test acc: 0.6550
epoch: 4 iter: 3 train loss: 0.6566 train acc: 0.7250 test acc: 0.6700
***************************** qubit num : 6 *****************************
epoch: 0 iter: 0 train loss: 0.6966 train acc: 0.4900 test acc: 0.5250
epoch: 0 iter: 3 train loss: 0.6938 train acc: 0.4850 test acc: 0.5450
epoch: 1 iter: 0 train loss: 0.6884 train acc: 0.5350 test acc: 0.5450
epoch: 1 iter: 3 train loss: 0.6862 train acc: 0.5400 test acc: 0.5700
epoch: 2 iter: 0 train loss: 0.6775 train acc: 0.5750 test acc: 0.5850
epoch: 2 iter: 3 train loss: 0.6744 train acc: 0.6100 test acc: 0.6000
epoch: 3 iter: 0 train loss: 0.6642 train acc: 0.6350 test acc: 0.5950
epoch: 3 iter: 3 train loss: 0.6615 train acc: 0.6450 test acc: 0.6200
epoch: 4 iter: 0 train loss: 0.6526 train acc: 0.6900 test acc: 0.6300
epoch: 4 iter: 3 train loss: 0.6560 train acc: 0.6250 test acc: 0.6350
***************************** qubit num : 4 *****************************
epoch: 0 iter: 0 train loss: 0.7081 train acc: 0.4650 test acc: 0.5350
epoch: 0 iter: 3 train loss: 0.6994 train acc: 0.4950 test acc: 0.5900
epoch: 1 iter: 0 train loss: 0.6902 train acc: 0.5450 test acc: 0.5750
epoch: 1 iter: 3 train loss: 0.6942 train acc: 0.5150 test acc: 0.5800
epoch: 2 iter: 0 train loss: 0.6869 train acc: 0.6100 test acc: 0.5850
epoch: 2 iter: 3 train loss: 0.6923 train acc: 0.5150 test acc: 0.6000
epoch: 3 iter: 0 train loss: 0.6825 train acc: 0.5700 test acc: 0.6050
epoch: 3 iter: 3 train loss: 0.6917 train acc: 0.5200 test acc: 0.5950
epoch: 4 iter: 0 train loss: 0.6776 train acc: 0.5850 test acc: 0.5800
epoch: 4 iter: 3 train loss: 0.6901 train acc: 0.5450 test acc: 0.6050
***************************** qubit num : 3 *****************************
epoch: 0 iter: 0 train loss: 0.7104 train acc: 0.4550 test acc: 0.5000
epoch: 0 iter: 3 train loss: 0.6931 train acc: 0.4950 test acc: 0.4900
epoch: 1 iter: 0 train loss: 0.6928 train acc: 0.4600 test acc: 0.4700
epoch: 1 iter: 3 train loss: 0.6968 train acc: 0.4800 test acc: 0.4800
epoch: 2 iter: 0 train loss: 0.6964 train acc: 0.5100 test acc: 0.4750
epoch: 2 iter: 3 train loss: 0.7004 train acc: 0.5150 test acc: 0.4600
epoch: 3 iter: 0 train loss: 0.6961 train acc: 0.5050 test acc: 0.4800
epoch: 3 iter: 3 train loss: 0.6957 train acc: 0.5300 test acc: 0.4850
epoch: 4 iter: 0 train loss: 0.6938 train acc: 0.5250 test acc: 0.5150
epoch: 4 iter: 3 train loss: 0.6919 train acc: 0.5150 test acc: 0.5250
***************************** qubit num : 2 *****************************
epoch: 0 iter: 0 train loss: 0.7031 train acc: 0.5450 test acc: 0.4800
epoch: 0 iter: 3 train loss: 0.6960 train acc: 0.5350 test acc: 0.4550
epoch: 1 iter: 0 train loss: 0.6932 train acc: 0.5100 test acc: 0.5100
epoch: 1 iter: 3 train loss: 0.6930 train acc: 0.5250 test acc: 0.5200
epoch: 2 iter: 0 train loss: 0.6930 train acc: 0.5150 test acc: 0.4750
epoch: 2 iter: 3 train loss: 0.6933 train acc: 0.5050 test acc: 0.4600
epoch: 3 iter: 0 train loss: 0.6925 train acc: 0.5400 test acc: 0.4600
epoch: 3 iter: 3 train loss: 0.6909 train acc: 0.5350 test acc: 0.4700
epoch: 4 iter: 0 train loss: 0.6939 train acc: 0.5450 test acc: 0.4750
epoch: 4 iter: 3 train loss: 0.6853 train acc: 0.5600 test acc: 0.4750
time used: 184.6011986732483 s

The Petz-Rényi divergence between the average quantum state and the maximally mixed state¶

In [9]:
fig = plt.figure(1)
ax = axes([0.15, 0.15, 0.8, 0.8])


def average_encode_state(train_x):
    """Compute the empirical average density matrix (Eq. 2) over encoded state vectors."""
    d1, d2 = train_x.shape
    density_matrices = np.reshape(train_x, [d1, d2, 1]) @ np.reshape(np.conj(train_x), [d1, 1, d2])
    return np.mean(density_matrices, axis=0)


depth_list = [2, 3, 4, 6, 8]

Q_2Renyi_D3_list = []  # divergence values for digit 3 (label 0)
Q_2Renyi_D6_list = []  # divergence values for digit 6 (label 1)

for i, q in enumerate(qubit_num_list):
    train_x, train_y = train_data[i]
    train_x = train_x.numpy()

    # Split the samples by class label (digit 3 -> 0, digit 6 -> 1)
    train_x0 = train_x[train_y == 0]
    train_x1 = train_x[train_y == 1]
    # The Ry/CNOT encoding produces real amplitudes, so the real part suffices
    average_state0 = average_encode_state(train_x0.real)
    average_state1 = average_encode_state(train_x1.real)
    # For rho1 = I/2^n, Eq. (3) reduces to D2 = log2(2^n * Tr[rho^2])
    Q_2Renyi_D3 = np.log2(np.trace(average_state0 @ average_state0) * 2 ** q)
    Q_2Renyi_D6 = np.log2(np.trace(average_state1 @ average_state1) * 2 ** q)
    Q_2Renyi_D3_list.append(Q_2Renyi_D3)
    Q_2Renyi_D6_list.append(Q_2Renyi_D6)

print(Q_2Renyi_D3_list)
print(Q_2Renyi_D6_list)


func3, = ax.plot(depth_list, Q_2Renyi_D3_list, linewidth=1.5,
                 marker="<",
                 linestyle=":",
                 color="r"
                 )

func6, = ax.plot(depth_list, Q_2Renyi_D6_list, linewidth=1.5,
                 marker="o",
                 linestyle="-.",
                 color="g"
                 )

plt.xticks(fontsize=22)
plt.yticks(fontsize=22)

plt.xlabel("Depth", fontsize=22)
plt.ylabel(r"$D_2(\overline{\rho} || I/2^n)$", fontsize=22)
ax.semilogy()
ax.legend(handles=[func3, func6],
          labels=["Digit 3", "Digit 6"],
          loc="best",
          fontsize=18)
ax.grid(axis="y")
plt.show()
[1.9842463368536747, 0.5256437464833253, 0.1964853652901484, 0.05162865627740749, 0.022790754263846934]
[1.8628793706046225, 0.6064395532199834, 0.1529884926031612, 0.04173701231534178, 0.009023512622560221]
(Figure: Petz-Rényi divergence $D_2(\overline{\rho}\,\|\,I/2^n)$ versus encoding circuit depth, for digits 3 and 6.)

Investigating the effect on classification accuracy¶

In [10]:
print(acc_list)

fig = plt.figure(1)
ax = axes([0.15, 0.15, 0.8, 0.8])

func36_acc, = ax.plot(depth_list, acc_list, linewidth=1.5,
                      marker="*",
                      linestyle="--",
                      color="b"
                      )

plt.xticks(fontsize=22)
plt.yticks(fontsize=22)
plt.ylim(0.48, 0.75)
plt.xlabel("Depth", fontsize=22)
plt.ylabel(r"Test Accuracy", fontsize=22)
ax.legend(handles=[func36_acc,],
          labels=["classifying 3 and 6"],
          loc="best", fontsize=22)
ax.grid(axis="y")
plt.show()
[array([0.675], dtype=float32), array([0.61], dtype=float32), array([0.585], dtype=float32), array([0.535], dtype=float32), array([0.475], dtype=float32)]
(Figure: test accuracy versus encoding circuit depth for classifying digits 3 and 6.)

Conclusion¶

According to the experimental results, the average quantum state of each class tends to the maximally mixed state at an exponential rate as the data encoding circuit deepens, and the classification accuracy of the quantum neural network decreases accordingly. Guided by [5], we have come to understand some of the limitations of angle encoding. It also becomes clear that designing a data encoding strategy capable of handling real-world problems (often with high-dimensional data features) is both urgent and difficult.


References¶

[1] Schuld, Maria. "Quantum machine learning models are kernel methods." arXiv preprint arXiv:2101.11020 (2021).

[2] Lloyd, Seth, et al. "Quantum embeddings for machine learning." arXiv preprint arXiv:2001.03622 (2020).

[3] Caro, Matthias C., et al. "Encoding-dependent generalization bounds for parametrized quantum circuits." Quantum 5 (2021): 582.

[4] Banchi, Leonardo, et al. "Generalization in quantum machine learning: A quantum information standpoint." PRX Quantum 2.4 (2021): 040321.

[5] Li, Guangxi, et al. "Concentration of Data Encoding in Parameterized Quantum Circuits." arXiv preprint arXiv:2206.08273 (2022).