Generate Dataset

About⌗

Training a successful model requires a lot of data, and accurate data in particular. Custom data is often hard to find in high quality on the internet, so this script multiplies the size of your existing dataset by applying different image manipulation methods, with the aim of giving you a more accurate and stable model at the end of training.

Starting from your currently available data, this simple script goes over each image and extends the dataset using the following image manipulation methods:

Blurred
Black and White
Noisy
Flip (over X, Y, Center)
Bright
Hue
More to come…

Requirements⌗

numpy
opencv-python
scikit-image
random (part of the Python standard library, no installation required)

Input Images⌗

The test input images are as follows:

Input Images

Blurred Images⌗

The blurred output images are as follows:

Blurred Images

Noisy Images⌗

The noisy output images are as follows:

Noisy Images
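
The kind of noise the script adds is not stated here, so the following is only a sketch that assumes additive Gaussian noise on 8-bit images, written with NumPy alone. The names `add_gaussian_noise`, `sigma`, and `seed` are hypothetical. scikit-image's `skimage.util.random_noise` offers a comparable operation on float images.

```python
import numpy as np


def add_gaussian_noise(image: np.ndarray, sigma: float = 25.0, seed=None) -> np.ndarray:
    """Return a copy of an 8-bit image with zero-mean Gaussian noise added.

    `sigma` (noise standard deviation in pixel units) and `seed` are
    hypothetical parameters chosen for this illustration.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=image.shape)
    # Work in float so the sum can go below 0 or above 255 before clipping.
    noisy = image.astype(np.float64) + noise
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```

Passing a fixed `seed` makes the generated dataset reproducible, which helps when comparing training runs.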

Flip Images⌗

Images flipped over X, Y, and Center are shown below:

Flipped Images 1 Flipped Images 2 Flipped Images 3
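
The three flips can be sketched with plain NumPy slicing. This assumes that "X" means flipping around the horizontal axis (upside down), "Y" means flipping around the vertical axis (mirror image), and "Center" means both at once. The function name `flip_variants` is hypothetical; the comments note the equivalent `cv2.flip` calls.

```python
import numpy as np


def flip_variants(image: np.ndarray) -> dict:
    """Return the three flipped versions of `image`, keyed by flip type."""
    return {
        # Reverse the rows: upside down, same as cv2.flip(image, 0).
        "x": image[::-1, :].copy(),
        # Reverse the columns: mirror image, same as cv2.flip(image, 1).
        "y": image[:, ::-1].copy(),
        # Reverse both: 180 degree rotation, same as cv2.flip(image, -1).
        "center": image[::-1, ::-1].copy(),
    }
```

Each input image yields three extra images this way, at essentially no computational cost.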

Black&White Images⌗

The Black & White output images are as follows:

Black&White Images

Bright Images⌗

The bright output images are as follows:

Bright Images
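
One simple way to change brightness is to add a constant to every pixel and clip the result. The sketch below assumes 8-bit images; the name `adjust_brightness` and the `delta` parameter are hypothetical, and the script may scale values or work in another color space instead.

```python
import numpy as np


def adjust_brightness(image: np.ndarray, delta: int = 40) -> np.ndarray:
    """Return a brighter (delta > 0) or darker (delta < 0) copy of an 8-bit image."""
    # Widen to int16 first so the addition cannot wrap around at 0 or 255.
    shifted = image.astype(np.int16) + delta
    return np.clip(shifted, 0, 255).astype(np.uint8)
```

Adding directly to a `uint8` array would overflow (for example 250 + 40 wraps to 34), which is why the sketch widens the type before adding.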

Hue Images⌗

The hue output images are as follows:

Hue Images

Running⌗

# Windows:
python main.py

# Linux & macOS:
python3 main.py

Folder Structure⌗

Generate-Dataset
│
└── BlackAndWhite
│    └── __init__.py
│
└── Blur
│    └── __init__.py
│
└── Brightness
│    └── __init__.py
│
└── Flip
│    └── __init__.py
│
└── Hue
│    └── __init__.py
│
└── Noise
│    └── __init__.py
│
└── Shear
│   └── __init__.py
│
└── main.py
│
│
└── README.md
│
└── src
└── input
│    └──  image1.jpg
│    │
│    └──  image2.jpg
│    │
│    └──  image3.jpg
│    │
│    └──  image4.jpg
│    │
│    └──  image5.jpg
└── output

Features to come⌗

The current version of the script lets users extend their data before annotating it. The next version is planned to also work on ready-to-train data: the script will automatically generate the label data for the outputs it produces, so that users spend only a few seconds on generation and can then move straight to training. New image manipulation methods to grow the dataset even further are planned as well, and more!
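
To give an idea of what automatic label generation involves, the sketch below shows how a bounding-box label would have to change when its image is flipped. It assumes labels are axis-aligned boxes stored as `(x_min, y_min, x_max, y_max)` in pixels; that format and both function names are hypothetical, since the planned label format is not specified here.

```python
def flip_box_over_y(box, image_width):
    """Mirror a box left-to-right, matching a flip of the image over Y."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def flip_box_over_x(box, image_height):
    """Mirror a box top-to-bottom, matching a flip of the image over X."""
    x_min, y_min, x_max, y_max = box
    # The old bottom edge becomes the new top edge, and vice versa.
    return (x_min, image_height - y_max, x_max, image_height - y_min)
```

Pixel-level changes such as blur, noise, brightness, and hue leave box coordinates untouched, so only geometric transforms like flips need this kind of label update.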

Source Code⌗

https://github.com/woosal1337/Generate-Dataset