Writing a Synthetic Data Generator Using OpenCV
In this story, we will write a simple script that generates synthetic data for anomaly detection, which can then be used to train neural networks.
Synthetic data has gained a lot of hype this year, and I thought I would toy with it a little. The practical challenge when you want to use neural networks is “data”. Good, clean, labelled data is any computer vision engineer’s or researcher’s dream.
This year I faced an interesting problem where I had to build an anomaly detection tool. The challenge was that the assembly line is extremely efficient and produces only about 10–15 faulty parts every quarter! In such a scenario, how does one devise a strategy to collect data?
We did a lot of brainstorming on the whiteboard around this issue. Suddenly I recalled a podcast where the creators of AirSim discussed how they trained their models in a simulator and how those models then performed extremely well in real-world scenarios.
Then, without telling the anomaly detection team anything, I went to my coding cocoon to build a simple synthetic anomaly data generator using nothing but OpenCV. I wanted a proof of concept that could be used to train neural networks; if it worked, we could go ahead and generate realistic-looking data. I am glad that the idea worked!
We will start coding by importing the necessary modules:
# author - Vipul Vaibhaw
import cv2
import numpy as np
import random
Now we need a blank image, basically a NumPy array, which we will use as the canvas for generating images.
height = 500
width = 500
blank_image = np.zeros((height,width,3), np.uint8)
Okay, now if you call cv2.imshow here (for example cv2.imshow("blank", blank_image) followed by cv2.waitKey(0)), you will see a black image because all the pixel values are zero.
Great! Our drawing board is ready. We now need to tinker with pixel values and channels to get the type of image we want.
Obviously, I cannot show the images we got from the plant here. They were close-up shots of a crankshaft that looked green under certain lights and chemicals. The scratches on those crankshafts looked like silverish lines.
Okay, then let’s pick the second channel of the image (green is the second channel, index 1, in both RGB and BGR order) and modify it.
We needed two types of images: one with a grainy background and one with a smooth background.
# Each row is a view into the green channel, so assignments modify blank_image in place.
for row in blank_image[:, :, 1]:
    for element in range(len(row)):
        row[element] = random.choice([100, 255])
We iterate through the second channel and randomly set each pixel to either 100 (dark green) or 255 (light green) to get the grainy effect.
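The per-pixel loop is easy to read but slow in pure Python once the images get larger. As a minimal sketch, assuming NumPy's random generator is an acceptable stand-in for random.choice, the same grainy green channel can be filled in a single vectorized call. The function name make_grainy_background and its seed parameter are illustrative and not part of the original script.

```python
import numpy as np


def make_grainy_background(height=500, width=500, seed=None):
    """Return a BGR image whose green channel is random 100/255 noise."""
    rng = np.random.default_rng(seed)
    image = np.zeros((height, width, 3), np.uint8)
    # Pick 100 (dark green) or 255 (light green) independently for every pixel.
    levels = np.array([100, 255], dtype=np.uint8)
    image[:, :, 1] = rng.choice(levels, size=(height, width))
    return image


grainy = make_grainy_background(seed=0)
```

Passing a seed makes the noise reproducible, which helps when comparing training runs.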
To get the clean (smooth) background, we can simply draw a filled circle centered at the center of the image, with a radius large enough to cover the whole image.
cv2.circle(blank_image, (250,250), 400, (0,100,0), thickness=-1, lineType=8, shift=0)
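A radius of 400 is large enough because the pixels farthest from the center (250, 250) are the corners, roughly 354 pixels away. The sketch below checks that arithmetic and shows an equivalent way to get the same uniform dark green image by filling the green channel directly with NumPy. The helper name make_smooth_background is hypothetical.

```python
import math

import numpy as np


def make_smooth_background(height=500, width=500, green=100):
    """Return a BGR image that is uniformly (0, green, 0)."""
    image = np.zeros((height, width, 3), np.uint8)
    image[:, :, 1] = green
    return image


center = (250, 250)
radius = 400
# Distance from the center to the farthest corner, (0, 0).
corner_distance = math.hypot(center[0], center[1])
covers_whole_image = radius >= corner_distance

smooth = make_smooth_background()
```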
We observed that on the grainy background the scratches looked whitish, whereas on the clean background they looked silverish.
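num_scratches = random.randint(0, 5)
for _ in range(num_scratches):
    row_random = random.randint(150, 400)
    # One-pixel-high silver line starting at (row_random, row_random), 25 to 75 pixels long.
    blank_image[row_random:(row_random + 1), row_random:(row_random + random.randint(25, 75))] = (192, 192, 192)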
For the grainy background, we drew white scratches instead, using (255,255,255) as the color.
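The scratch snippet above uses the same random number for the starting row and the starting column, so every scratch begins on the image diagonal. As one possible variation (an assumption on my part, not what we ran at the plant), the sketch below draws the row and column independently and takes the scratch color as a parameter, so the same helper covers both background types. The names add_scratches, SILVER and WHITE are illustrative.

```python
import random

import numpy as np

SILVER = (192, 192, 192)  # scratches on the smooth background
WHITE = (255, 255, 255)   # scratches on the grainy background


def add_scratches(image, color, max_scratches=5, rng=None):
    """Draw 0..max_scratches one-pixel-high horizontal lines; return the count."""
    rng = rng or random.Random()
    num_scratches = rng.randint(0, max_scratches)
    for _ in range(num_scratches):
        # Row and column are sampled separately, unlike the original snippet.
        row = rng.randint(150, 400)
        col = rng.randint(150, 400)
        length = rng.randint(25, 75)
        image[row:row + 1, col:col + length] = color
    return num_scratches


smooth = np.zeros((500, 500, 3), np.uint8)
smooth[:, :, 1] = 100
count = add_scratches(smooth, SILVER, rng=random.Random(42))
```

Returning the number of scratches makes it easy to label an image afterwards.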
Great! Now we were able to generate data. We separated the data into two folders, i.e. defected and good. We now had the ability to generate as much data as we wanted. This data was then used to train neural networks to classify whether an image has scratches or not.
Tip: use cv2.createCLAHE() to add luminous effects to your images.
This was a proof of concept. We are now using Blender to generate more data. I am personally working on a project that takes simple parameters from the user and generates time-series data, different kinds of synthetic image data, and so on.
Get in touch if you need some help in generating your synthetic data.
I hope that you liked the article! Keep learning and sharing knowledge. Follow me on GitHub, Stack Overflow, LinkedIn or Twitter.