Ink-washing Paintings
Ink-washing painting is a form of art that depicts nature using water and ink. Over its long history, ink painting has developed various forms, such as coloured ink and the elaborate style, offering rich artistic value. In terms of content, there are many ways to classify ink paintings. According to the traditional saying that “painting is divided into three subjects”, the content can be summarized into three categories: man and nature, man and life, and man and society.
Given time constraints, we will focus on collecting landscape paintings in this project, i.e., paintings that mainly depict mountains, rivers, and natural landscapes.
Because the university library holds only a limited number of high-definition digitized landscape paintings, we spent considerable time collecting and sorting landscape paintings ourselves. With the assistance of many parties, we successfully obtained more than 3,000 paintings. We would like to express our warm gratitude to the Art Museum of The Chinese University of Hong Kong, the Chinese University of Hong Kong Library, and the New Asia College Ch’ien Mu Library.
This collection of raw data includes digitized image files, images obtained by scanning physical books, and photos taken by mobile phones.
Data Preprocessing
In this section, we discuss our data preprocessing methods and the reasoning behind them. Since fitting the raw data directly into the model may lead to inaccurate or unsatisfactory results, we preprocessed the data in advance. The procedure is: (i) manually removing the meaningless background of an image; (ii) splitting an image into several pieces depending on its height-to-width ratio; (iii) cropping the largest square from each piece; (iv) resizing each square piece to 1024×1024. The full procedure is demonstrated below.
1. Background Removal
This part was done using Python 3.9. We mainly relied on OpenCV to crop and resize images. Another popular library, the Python Imaging Library (PIL), is also suitable for image manipulation, but we did not use it in this project. Libraries we used:
import cv2 # Image manipulation
import matplotlib.pyplot as plt # Plot
import numpy as np # Math operation
from tqdm import tqdm # Progress bar
Then, we defined a function for loading an image and a function for displaying it.
# Function for reading an image
def img_read(img_path):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB for display
    return img
# Function for displaying an image
def img_display(img):
    plt.figure(figsize=(4, 4))
    plt.imshow(img)
    plt.axis("off")
    plt.tight_layout()
    plt.show()
Example
Fig. 1 was taken with a smartphone. Apparently, it contains a meaningless background (e.g., the table and lighting from outside), or “noise”, that may affect the later training process. Although removing the background sounds trivial at first, it is not easy to determine a general rule for extracting the desired part from raw images with various kinds of “noise”. Thus, we did not automate this step and removed the background manually.
Fig. 1: Original image
Fig. 2: After removing the meaningless background
2. Image Cropping and Resizing
The model we used works best with square images whose side length is a power of 2 (e.g., 512×512 and 1024×1024). Since generating high-resolution images is of interest to us, we chose 1024×1024. As a reminder, the input size is a predefined parameter of a deep-learning model; in other words, we have to resize all images to 1024×1024. Below is the function for resizing an image of arbitrary size.
# Function for resizing an image
def img_resize(img, size):
    img = cv2.resize(img, dsize=size, interpolation=cv2.INTER_AREA)
    return img
We adopted the following cropping method based on the height-to-width ratio (HWR) or width-to-height ratio (WHR) of an image. When the height of an image is greater than its width, there are 3 cases depending on the HWR:
- Case 1 – HWR < 1.5: The image is nearly square, so we extract the largest square sub-image from its center.
- Case 2 – 1.5 <= HWR < 3: We extract the largest square sub-images from the top, the center, and the bottom. In total, 3 square images are obtained.
- Case 3 – HWR >= 3: Similar to Case 2, but one more square image is cut, giving 4 in total.
The cropping method for W > H is similar. In the end, each image was split into pieces, giving us more data for training; roughly speaking, this is a simple data augmentation technique for increasing the amount of training data. Note that the cropping method above is not the only valid rule for cutting an image; other reasonable cropping rules are also applicable.
Turning this into a program, we have the following function for image cropping:
# Function for image cropping
def img_split(img):
    h, w, _ = img.shape
    split_hor = h > w
    ratio = h / w if split_hor else w / h
    if split_hor:
        if ratio < 1.5:
            split = np.linspace(start=0, stop=h, num=3)
            split = list(map(round, split))
            return [img[(split[1] - w // 2) : (split[1] + w // 2), :]]
        elif 1.5 <= ratio < 3:
            split = np.linspace(start=0, stop=h, num=3)
            split = list(map(round, split))
            img_1 = img[split[0] : w, :]
            img_2 = img[(split[1] - w // 2) : (split[1] + w // 2), :]
            img_3 = img[(split[2] - w) : split[2], :]
            return [img_1, img_2, img_3]
        elif ratio >= 3:
            split = np.linspace(start=0, stop=h, num=4)
            split = list(map(round, split))
            img_1 = img[split[0] : w, :]
            img_2 = img[(split[1] - w // 2) : (split[1] + w // 2), :]
            img_3 = img[(split[2] - w // 2) : (split[2] + w // 2), :]
            img_4 = img[(split[3] - w) : split[3], :]
            return [img_1, img_2, img_3, img_4]
    else:
        if ratio < 1.5:
            split = np.linspace(start=0, stop=w, num=3)
            split = list(map(round, split))
            return [img[:, (split[1] - h // 2) : (split[1] + h // 2)]]
        elif 1.5 <= ratio < 3:
            split = np.linspace(start=0, stop=w, num=3)
            split = list(map(round, split))
            img_1 = img[:, split[0] : h]
            img_2 = img[:, (split[1] - h // 2) : (split[1] + h // 2)]
            img_3 = img[:, (split[2] - h) : split[2]]
            return [img_1, img_2, img_3]
        elif ratio >= 3:
            split = np.linspace(start=0, stop=w, num=4)
            split = list(map(round, split))
            img_1 = img[:, split[0] : h]
            img_2 = img[:, (split[1] - h // 2) : (split[1] + h // 2)]
            img_3 = img[:, (split[2] - h // 2) : (split[2] + h // 2)]
            img_4 = img[:, (split[3] - h) : split[3]]
            return [img_1, img_2, img_3, img_4]
Finally, a function for preprocessing a folder of images is given below:
import os  # Needed for listing the image folder

def crop_and_resize(img_path, output_path, save_category, size=(1024, 1024)):
    count = 1
    for file in tqdm(os.listdir(img_path)):
        if file == '.DS_Store':  # Skip the macOS metadata file
            continue
        image = img_read(os.path.join(img_path, file))
        image_splited = img_split(image)
        for img in image_splited:
            img = img_resize(img, size)
            cv2.imwrite(output_path + save_category + "_" + str(count) + ".jpg", img)
            count = count + 1
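As a hypothetical usage example (the folder names and the category label below are placeholders, not the actual paths used in our project), the whole preprocessing step can be run in one call:
# Hypothetical example: preprocess every image in a folder of landscape paintings
# (the paths and the "landscape" label are placeholders)
crop_and_resize(img_path="raw_paintings/", output_path="processed/", save_category="landscape", size=(1024, 1024))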
Demonstration
Finally, we need to read back the preprocessed images for further training. As a demonstration, we first load the image after background removal; the output of the code below corresponds to Fig. 2.
img_path = "/Users/yihongan/Desktop/Example.png"
example_img = img_read(img_path)
img_display(example_img)
Then, we apply the img_split and img_resize functions to example_img and display the results one by one.
cropped_img = img_split(example_img)
for img in cropped_img:
    img = img_resize(img, (1024, 1024))  # Resize each cropped piece to 1024×1024 as in the preprocessing step
    img_display(img)
Fig. 3
Modelling
*For the following explanation, an image will be referred to as “fake” if it is generated by a model.
From a statistical point of view, generating a new observation (i.e., a computer-generated ink-washing painting) requires knowledge of the probability distribution of ink-washing paintings. Since the true distribution is unknown, our goal is to estimate it from the data. However, density estimation for high-dimensional data such as images is difficult due to the curse of dimensionality. To address this issue, we focus on the Generative Adversarial Network (GAN), a class of state-of-the-art deep learning models capable of effectively learning the data distribution, allowing good-looking “fake” images to be generated.
Generative Adversarial Network (GAN)
Although GANs are commonly used for image generation, the framework is general and handles many other applications, including but not limited to image style transfer, music generation, regression, and classification. The mathematical foundation behind GANs is game theory. Each GAN consists of 2 neural networks: the generator and the discriminator. As their names suggest, the generator is responsible for sampling “fake” images, while the discriminator verifies whether a given image is a “fake” image or real observed data. In other words, they play a zero-sum game against each other. Through this game, the GAN arrives at a strategy for learning the data distribution.
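For reference, this zero-sum game is usually formalized (Goodfellow et al., 2014) as the minimax problem

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],$$

where the discriminator D tries to assign high scores to real images and low scores to “fake” ones, while the generator G tries to fool D.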
Many existing GAN models such as DCGAN require a large dataset in order to achieve satisfactory results. We employed StyleGAN 2 ADA, proposed by Nvidia, as it addresses this problem by adopting adaptive discriminator augmentation (ADA). We trained the model on Google Colab, an online Jupyter notebook environment whose computation is done by Google. The main reason for choosing this platform is that it provides a powerful CUDA GPU, which is needed for training the model. After upgrading to the pro plan, an A100-SXM4-40GB GPU was available for efficient training. In addition, Colab allows users to connect to Google Drive to upload or download their dataset. Accordingly, users can make good use of the resources provided there and refine the training in the future.
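For readers who wish to reproduce the setup, a minimal Colab preamble (assuming a GPU runtime and that the dataset is stored in Google Drive; the exact folder layout is up to the user) looks like this:
# Check which CUDA GPU Colab has assigned to the runtime
!nvidia-smi
# Mount Google Drive so the dataset and training snapshots persist between sessions
from google.colab import drive
drive.mount('/content/drive')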
Training StyleGAN 2 ADA
Again, the training process is computationally intensive, and a powerful CUDA GPU is necessary. The basic code for training StyleGAN 2 ADA can be found online. First, we downloaded the model from
!git clone https://github.com/NVlabs/stylegan2-ada-pytorch # or
!git clone https://github.com/dvschultz/stylegan2-ada-pytorch
The former is the official implementation by Nvidia; the latter is essentially the same but includes additional utilities contributed by others. We therefore recommend downloading from the second link for broader utility.
In our project, we applied the data preprocessing tool provided by Nvidia. Running the step below is suggested so that any errors in the dataset can be spotted early:
data_path = "/content/drive/MyDrive/data/gan/images/山水"
output_path = "/content/drive/MyDrive/data/gan/dataset/山水"
cmd = f"python /content/stylegan2-ada-pytorch/dataset_tool.py --source={data_path} --dest={output_path}"
!{cmd}
There are many hyperparameters in the model, so we defined some important hyperparameters explicitly.
# Hyperparameter settings
# User-specified
data_path = "/content/drive/MyDrive/data/gan/dataset/山水"
output_path = "/content/drive/MyDrive/data/gan/training/"
aug_strength = 0
train_count = 0
# No need to change these unless for a special purpose
gamma_value = 50.0
augs = 'bg'
config = '11gb-gpu'
snapshot_count = 4
Start training the model:
cmd = f"!python /content/stylegan2-ada-pytorch/train.py --gpus 1 --cfg={config} --metrics None --outdir={output_path} --data={data_path} --snap={snapshot_count} --augpipe={augs} --initstrength={aug_strength} --gamma={gamma_value} --nkimg={train_count}"
!{cmd}
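Colab sessions are frequently interrupted, so long training runs are usually resumed from the latest snapshot. The sketch below is our assumption of how this would be done with the repository's --resume option (the snapshot filename is a placeholder); when resuming, aug_strength and train_count are typically updated to the augmentation strength and kimg count reported in the last training log.
# Resume training from a saved snapshot (the filename below is a placeholder)
resume_path = "/content/drive/MyDrive/data/gan/training/network-snapshot-000100.pkl"
cmd = f"python /content/stylegan2-ada-pytorch/train.py --gpus 1 --cfg={config} --metrics None --outdir={output_path} --data={data_path} --snap={snapshot_count} --augpipe={augs} --initstrength={aug_strength} --gamma={gamma_value} --nkimg={train_count} --resume={resume_path}"
!{cmd}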
Image Generation with Trained Model
First of all, we imported all necessary libraries and loaded the generator from the trained model. Note that a CUDA GPU is needed for using the model.
# Import libraries
import pickle
import os
import numpy as np
import PIL.Image
from IPython.display import Image
import matplotlib.pyplot as plt
import IPython.display
import torch
import dnnlib
import legacy
from math import ceil
# Load the generator
network_pkl = "/content/drive/MyDrive/data/gan/training/ink.pkl"
device = torch.device('cuda')
with dnnlib.util.open_url(network_pkl) as f:
    G = legacy.load_network_pkl(f)['G_ema'].to(device)
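As an optional sanity check (assuming the snapshot loaded correctly), the loaded generator exposes its latent dimension, label dimension, and output resolution:
# Optional check on the loaded generator
print(G.z_dim)           # latent dimension, expected to be 512
print(G.c_dim)           # label dimension, 0 for our unconditional model
print(G.img_resolution)  # output resolution, expected to be 1024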
The generator in StyleGAN 2 ADA accepts a 512-dimensional vector for generating an image, so we need to feed it a vector of 512 numbers. Since it is convenient to generate such inputs reproducibly from a random seed, we defined the following:
# Function to randomly generate a 512-dimensional vector from a seed
def seed2vec(G, seed):
    return np.random.RandomState(seed).randn(1, G.z_dim)
Then, we defined a function for generating an image from a 512-dimensional vector. Additionally, a function for defining the class label was adopted, even though our dataset has only one class.
# Function for defining the class label
def get_label(G, device):
    return torch.zeros([1, G.c_dim], device=device)

# Function for generating an image from a vector
def generate_image(device, G, z, truncation_psi=1.0, noise_mode='const'):
    z = torch.from_numpy(z).to(device)
    label = get_label(G, device)
    img = G(z, label, truncation_psi=truncation_psi, noise_mode=noise_mode)
    img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)  # Map the [-1, 1] output to uint8 pixel values
    return PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB')
Combining the above functions, we had a function for generating multiple images from multiple seeds.
# Function for generating images from seeds
def get_images_from_seeds(seeds):
    imgs = []
    for i in seeds:
        print(f"Seed {i}")
        z = seed2vec(G, i)
        img = generate_image(device, G, z)
        imgs.append(img)
    return imgs
The function below enables us to visualize several images in a grid.
def plot_images(images, scale=0.25, rows=1):
    w, h = images[0].size
    w = int(w * scale)
    h = int(h * scale)
    height = rows * h
    cols = ceil(len(images) / rows)
    width = cols * w
    canvas = PIL.Image.new('RGBA', (width, height), 'white')
    for i, img in enumerate(images):
        img = img.resize((w, h), PIL.Image.ANTIALIAS)
        canvas.paste(img, (w * (i % cols), h * (i // cols)))
    return canvas
Demonstration
Considering the list of seeds 2000, 2001, …, 2009 as the input of the function get_images_from_seeds, we obtained the images shown in Fig. 4. The generated images intuitively look like real ink-washing paintings, which suggests that the model has successfully learned the distribution of the training data.
*It should be noted that this judgment relies on intuitive experience rather than professional knowledge. Given the limited data, our current results may not stand up to professional judgement. However, they already match the idea of ink-washing paintings held by ordinary viewers.
seeds = range(2000, 2010)
images = get_images_from_seeds(seeds)
plot_images(images, rows=2)
Fig. 4
Model Exploration
Having successfully used the model to generate landscape paintings, we could not help wondering what this result means. How does the computer learn? How exactly is a picture generated? In order to further explore the patterns behind the training data, we decided to start with the latent vectors.
The generator needs a 512-dimensional vector to generate an image. The collection of all 512-dimensional input vectors is called the Z-space. Usually, the Z-space follows either a uniform or a normal distribution. However, a uniform or normal distribution is often not a good approximation of the distribution of the training data. Therefore, StyleGAN 2 ADA has an additional mapping network. It maps a point in the Z-space to another space called the W-space. The mapping network is also learned during training, so the distribution of the W-space is much closer to the distribution of the training data. Both the Z-space and the W-space are referred to as the latent space of the model.
Intuitively, one may think that the Z-space and the W-space can be separated into many regions, with each region corresponding to a certain type or style of image. This property motivated us to investigate which directions of movement in the latent space lead to changes in the style of an image. In other words, our goal is to find important feature vectors that represent different features of ink-washing paintings.
Function for mapping a vector from Z-space to W-space:
def z2w(device, G, z):
    z = torch.from_numpy(z)
    z = z.to(device)
    label = get_label(G, device)
    return G.mapping(z, label, truncation_psi=1, truncation_cutoff=8)
Function for synthesizing an image from a point in W-space:
def synthesize_image(G, w, noise_mode='const'):
    img = G.synthesis(w, noise_mode=noise_mode, force_fp32=True)
    img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
    return PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB')
Linear Interpolation
Before jumping into feature vectors, we shall first understand the concept of linear interpolation. Linear interpolation means finding points along a straight line between a predefined starting point and end point. Imagine that we pick two points, A and B, in the latent space, where A and B are associated with image A and image B respectively. If we move one small step from A towards B, we get an image that is slightly different from image A. Continuing the movement, we get images that gradually look more akin to image B as we get closer to B. Therefore, we are able to examine the effect of a change in style along one direction by using linear interpolation.
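Formally, a point on the straight line between two latent points A and B can be written as

$$x(t) = (1 - t)\,A + t\,B, \qquad t \in [0, 1],$$

so sweeping t from 0 to 1 moves the image smoothly from image A to image B; this is exactly what the lerp function defined later in this section computes for equally spaced values of t.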
It should be emphasized that even if the starting point and end point are the same, the effect of linear interpolation in Z-space and in W-space is slightly different. As mentioned above, the W-space is more structured, so the change in style is more stable, whereas the changes in Z-space fluctuate more. We illustrate the differences in the demonstration below.
There are many kinds of interpolation, such as circular interpolation, in which the movement is along a circle. If you are interested, you may explore them.
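As an illustration only (this is our own sketch and was not used in the project), spherical linear interpolation between two latent vectors could be implemented as follows:
# Sketch of spherical linear interpolation (slerp); not used in this project
def slerp(z1, z2, steps):
    v1 = z1.flatten() / np.linalg.norm(z1)
    v2 = z2.flatten() / np.linalg.norm(z2)
    omega = np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0))  # angle between the two vectors
    out = []
    for index in range(steps):
        t = index / float(steps)
        if np.isclose(omega, 0.0):
            out.append(z1 * (1 - t) + z2 * t)  # fall back to lerp for nearly parallel vectors
        else:
            out.append((np.sin((1 - t) * omega) * z1 + np.sin(t * omega) * z2) / np.sin(omega))
    return out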
For the implementation of linear interpolation, we first defined a function for calculating points between two given points.
def lerp(zs, steps):
    out = []
    for i in range(len(zs) - 1):
        for index in range(steps):
            t = index / float(steps)
            out.append(zs[i + 1] * t + zs[i] * (1 - t))
    return out
Then, a function for linear interpolation in Z-space or W-space is given below.
def get_linear_interpolation(seed1, seed2, space=["z", "w"], num_frame=10):
    z1 = seed2vec(G, seed1)
    z2 = seed2vec(G, seed2)
    images = []
    if space == "z":
        frame = lerp([z1, z2], num_frame)
        for idx, z in enumerate(frame):
            print('Generating frame %d/%d' % (idx, len(frame)))
            img = generate_image(device, G, z)
            images.append(img)
    elif space == "w":
        w1 = z2w(device, G, z1)
        w2 = z2w(device, G, z2)
        frame = lerp([w1, w2], num_frame)
        for idx, w in enumerate(frame):
            print('Generating frame %d/%d' % (idx + 1, len(frame)))
            img = synthesize_image(G, w)
            images.append(img)
    return images
Feature Vectors Extraction
As mentioned before, feature vectors indicate directions of movement that lead to significant changes in image style. There are many methods, both supervised and unsupervised, for feature vector extraction. In particular, we considered Closed-Form Factorization, which belongs to the class of unsupervised methods. The advantages of this method are its low computational cost and its applicability to many kinds of GAN models. Since the dimension of the Z-space in our model is 512, there are in total 512 feature vectors, listed in descending order of importance. In particular, the first feature vector induces the greatest change in style.
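Concretely, closed-form factorization stacks the weight matrices A of the generator's style-modulation (affine) layers and looks for unit directions n in the latent space that change the modulation output the most,

$$n^{*} = \arg\max_{\|n\| = 1} \|A\,n\|_2^{2},$$

and the solutions are the right singular vectors of A, ordered by their singular values. This is what the singular value decomposition (SVD) in the code below computes.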
Computing feature vectors:
# Generate feature vectors
modulate = {
    k[0]: k[1]
    for k in G.named_parameters()
    if "affine" in k[0] and "torgb" not in k[0] and "weight" in k[0] or ("torgb" in k[0] and "b4" in k[0] and "weight" in k[0] and "affine" in k[0])
}

weight_mat = []
for k, v in modulate.items():
    weight_mat.append(v)
W = torch.cat(weight_mat, 0)

# Feature vectors (detached so they can be used outside autograd)
eigvec = torch.linalg.svd(W).Vh.detach().to("cpu")
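As a quick optional check, eigvec should be a 512×512 matrix whose rows are the candidate feature directions, sorted from most to least important:
# The rows of eigvec are the feature directions, ordered by singular value
print(eigvec.shape)  # expected: torch.Size([512, 512])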
Linear interpolation in Z-space or W-space along the direction of a feature vector:
def get_feature_linear_interpolation(seed, feature_vector, space=["z", "w"], vector_index=0, degree=10, num_frame=10):
    z = seed2vec(G, seed)
    z = torch.from_numpy(z)
    current_eigvec = feature_vector[vector_index]
    direction = degree * current_eigvec
    z0 = z - direction
    z1 = z + direction
    images = []
    if space == "z":
        frame = lerp([z0.numpy(), z1.numpy()], num_frame)
        for idx, z in enumerate(frame):
            print('Generating frame %d/%d' % (idx, len(frame)))
            img = generate_image(device, G, z)
            images.append(img)
    elif space == "w":
        w0 = z2w(device, G, z0.numpy())
        w1 = z2w(device, G, z1.numpy())
        frame = lerp([w0, w1], num_frame)
        for idx, w in enumerate(frame):
            print('Generating frame %d/%d' % (idx + 1, len(frame)))
            img = synthesize_image(G, w)
            images.append(img)
    return images
Demonstration
We will first display the difference between linear interpolation in Z-space and W-space. Seeds 301 and 401 are chosen as the starting point and the end point respectively.
Linear interpolation in W-space:
images = get_linear_interpolation(301, 401, "w", 7)
plot_images(images, rows=1))
Fig. 5
Linear interpolation in Z-space:
images = get_linear_interpolation(301, 401, "z", 7)
plot_images(images, rows=1)
Fig. 6
With seed 301 as the starting point, we use the first and the second feature vectors as the directions of movement.
Linear interpolation in W-space along with the direction of the first feature vector:
feature_0 = get_feature_linear_interpolation(301, eigvec, space = "w", num_frame = 7)
plot_images(feature_0)
Fig. 7
Linear interpolation in Z-space along with the direction of the first feature vector:
feature_0 = get_feature_linear_interpolation(301, eigvec, space = "z", num_frame = 7)
plot_images(feature_0)
Fig. 8
Linear interpolation in W-space along with the direction of the second feature vector:
feature_1 = get_feature_linear_interpolation(301, eigvec, space = "w", vector_index=1, num_frame=7)
plot_images(feature_1)
Fig. 9
Linear interpolation in Z-space along with the direction of the second feature vector:
feature_1 = get_feature_linear_interpolation(301, eigvec, space = "z", vector_index=1, num_frame=7)
plot_images(feature_1)
It may not be easy to tell the differences between these interpolation processes. The changes may concern the height of a mountain, the density of trees, the brightness, and so on. With limited time and expertise, we cannot provide a complete answer here, so the meaning or interpretation of these feature vectors is left for future study. We believe that clarifying these features may help beginners or ordinary people gain a clearer idea of the techniques and artistic conceptions in ink-washing paintings.
Produced by CUHK Library | Eva So | Henry Ngan