Object Detection with YOLOv2 on Keras

Shi Yongxiang

2020/11/03

In this lab, we train a YOLOv2 network for object detection. An online (Google Colab) version of this lab is also available.

Objectives

1. Learn the image and label format of an object detection dataset.
2. Learn how to set up a YOLO network and its loss function, then train it.
3. Use the trained YOLO network to detect objects in an image.

Preparation for Running Locally

Download the Yolo lab and unzip it. The unzipped folder contains all the necessary files and the Blood Cells dataset.
Prepare a conda Python 3 environment that includes TensorFlow 1.15, Keras 2.2, opencv-python, matplotlib, scikit-learn, h5py, numpy, pandas, imgaug and tqdm.
Open Yolo.ipynb in this environment with Jupyter Notebook to run all the commands.

Preparation for Running Online

Before running the following code cells in Google Colab, prepare a Python 3 environment:

Use TensorFlow 1.x. The default TensorFlow version in Colab is 2.2, and our code only works with 1.x.
Download the pre-trained weights. The YOLO network has 23 convolutional layers, 22 of which are pretrained.
Download the necessary functions and datasets from GitHub. In this lab, we apply YOLO to two projects: detecting blood cells and detecting license plates.

Use TensorFlow 1.x

Run the following cell, then restart the runtime (Runtime > Factory reset runtime) to refresh the TensorFlow version.

%tensorflow_version 1.x
import tensorflow as tf
version = tf.__version__
if version.startswith('2.'):
    print('Failed to switch to TensorFlow 1.x; unexpected errors may occur')
    print('You have to reset the runtime to refresh the TensorFlow version')
else:
    print('Successfully activate tensorflow 1.x')
    print(tf.__version__)

TensorFlow 1.x selected.
Successfully activate tensorflow 1.x
1.15.2

Download Pre-trained Weights

The weight file is yolov2.weights. I have shared it with everyone by email and Google Drive.

Method 1: Upload it to the working folder, /content/. The uploaded file is deleted when you log out of the notebook, and the upload requires a fast connection to Colab.
Method 2: Download it from the website by running ! wget https://pjreddie.com/media/files/yolov2.weights. The download speed is only about 128 KB/s.
Method 3: Mount your Google Drive. Upload the weights to your Google Drive, then copy them to the working folder when you run the notebook. You have to reset the runtime after giving Colab access.

Choose one method and uncomment its commands with Ctrl+/.

import os

# Method 1
# upload your local file via the Files panel (Files > Upload)

# Method 2
# ! wget https://pjreddie.com/media/files/yolov2.weights

# Method 3
if 'drive' in os.listdir('./'):
    print('you have successfully mounted your google drive')
else:
    from google.colab import drive
    drive.mount('/content/drive')
    print('successfully mounted your google drive, please restart the runtime')
! cp drive/My\ Drive/yolov2.weights ./

# check that the weights file is now in the working folder
if not os.path.isfile('yolov2.weights'):
    raise FileNotFoundError('You have to upload yolov2.weights into the working folder')

you have successfully mounted your google drive

Download Necessary Functions and Datasets from GitHub

We use two datasets: one for blood cell detection and one for license plate detection. The table below lists more datasets, with their reported mAP, demos and model links.

Dataset	mAP	Demo	Config	Model
Kangaroo Detection (1 class) (https://github.com/experiencor/kangaroo)	95%	https://youtu.be/URO3UDHvoLY	check zoo	https://bit.ly/39rLNoE
License Plate Detection (European in Romania) (1 class) (https://github.com/RobertLucian/license-plate-dataset)	90%	https://youtu.be/HrqzIXFVCRo	check zoo	https://bit.ly/2tIpvPl
Raccoon Detection (1 class) (https://github.com/experiencor/raccoon_dataset)	98%	https://youtu.be/lxLyLIL7OsU	check zoo	https://bit.ly/39rLNoE
Red Blood Cell Detection (3 classes) (https://github.com/experiencor/BCCD_Dataset)	84%	https://imgur.com/a/uJl2lRI	check zoo	https://bit.ly/39rLNoE
VOC (20 classes) (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/)	72%	https://youtu.be/0RmOI6hcfBI	check zoo	https://bit.ly/39rLNoE

! pwd
! git clone https://github.com/experiencor/keras-yolo2.git
# preprocessing.py provides parse_annotation and BatchGenerator
! mv keras-yolo2/preprocessing.py ./
# utils.py provides WeightReader, decode_netout and draw_boxes
! mv keras-yolo2/utils.py ./
# blood images
! git clone https://github.com/Shenggan/BCCD_Dataset.git
# license plate dataset
! git clone https://github.com/RobertLucian/license-plate-dataset.git

/content
Cloning into 'keras-yolo2'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 330 (delta 1), reused 1 (delta 0), pack-reused 324
Receiving objects: 100% (330/330), 53.90 MiB | 45.69 MiB/s, done.
Resolving deltas: 100% (180/180), done.
Cloning into 'BCCD_Dataset'...
remote: Enumerating objects: 800, done.
remote: Total 800 (delta 0), reused 0 (delta 0), pack-reused 800
Receiving objects: 100% (800/800), 7.39 MiB | 38.20 MiB/s, done.
Resolving deltas: 100% (378/378), done.

Code and Procedures

1 Import Python Packages

from keras.models import Sequential, Model
from keras.layers import Reshape, Activation, Conv2D, Input, MaxPooling2D, BatchNormalization, Flatten, Dense, Lambda
from keras.layers.advanced_activations import LeakyReLU
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard
from keras.optimizers import SGD, Adam, RMSprop
from keras.layers.merge import concatenate
import matplotlib.pyplot as plt
import keras.backend as K
import tensorflow as tf
import imgaug as ia
from tqdm import tqdm
from imgaug import augmenters as iaa
import numpy as np
import pickle
import os, cv2
from preprocessing import parse_annotation, BatchGenerator
from utils import WeightReader, decode_netout, draw_boxes
import keras

print("Keras version used: ", keras.__version__)
print("GPU information:", tf.test.gpu_device_name())
from tensorflow.python.client import device_lib
print(tf.__version__)
device_lib.list_local_devices()

Using TensorFlow backend.

Keras version used:  2.3.1
GPU information: /device:GPU:0
1.15.2

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 4101478489354035466, name: "/device:XLA_CPU:0"
 device_type: "XLA_CPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 4013299294323971709
 physical_device_desc: "device: XLA_CPU device", name: "/device:XLA_GPU:0"
 device_type: "XLA_GPU"
 memory_limit: 17179869184
 locality {
 }
 incarnation: 5022800919874508597
 physical_device_desc: "device: XLA_GPU device", name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 14912136807
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 10349791960940202535
 physical_device_desc: "device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5"]

2 Define Configuration

Define the dataset paths and the weights path.

wt_path = 'yolov2.weights'
# BCCD dataset
train_image_folder = 'BCCD_Dataset/BCCD/JPEGImages/'
train_annot_folder = 'BCCD_Dataset/BCCD/Annotations/'
# BCCD has no separate validation split here, so validation reuses the training folders
valid_image_folder = 'BCCD_Dataset/BCCD/JPEGImages/'
valid_annot_folder = 'BCCD_Dataset/BCCD/Annotations/'
# license plate dataset
# train_image_folder = '/content/license-plate-dataset/dataset/train/images/'
# train_annot_folder = '/content/license-plate-dataset/dataset/train/annots/'
# valid_image_folder = '/content/license-plate-dataset/dataset/valid/images/'
# valid_annot_folder = '/content/license-plate-dataset/dataset/valid/annots/'
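For BCCD, the training and validation paths above point to the same files, so validation scores will be optimistic. One way to hold out data is to split the annotation file list reproducibly. The following is a minimal sketch under that assumption and is not part of the lab. `split_annotations` is a hypothetical helper, the file names are made up, and the resulting lists would still need to be wired into the data pipeline.

```python
import random

def split_annotations(filenames, valid_fraction=0.2, seed=0):
    """Shuffle annotation filenames reproducibly and split them into train/valid lists."""
    # sort first so the split does not depend on the directory listing order
    files = sorted(filenames)
    rng = random.Random(seed)
    rng.shuffle(files)
    n_valid = int(round(len(files) * valid_fraction))
    return files[n_valid:], files[:n_valid]

# hypothetical file list; in the lab it could come from os.listdir(train_annot_folder)
names = ['BloodImage_%05d.xml' % i for i in range(10)]
train_files, valid_files = split_annotations(names)
print(len(train_files), len(valid_files))  # 8 2
```

A fixed seed keeps the split identical across runs, so validation numbers from different training sessions stay comparable.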

Define the class labels, the image size and the other parameters for training and prediction.

# Blood Cells
LABELS = ["RBC",'WBC','Platelets']

# license plate dataset
# LABELS = ['license-plate']

# Input image size (height, width)
IMAGE_H, IMAGE_W = 416, 416
# Number of grid cells in the output feature map (416 / 32 = 13)
GRID_H,  GRID_W  = 13 , 13

# Number of classes and per-class loss weights
CLASS            = len(LABELS)
CLASS_WEIGHTS    = np.ones(CLASS, dtype='float32')

# Number of boxes (anchors) predicted per grid cell
BOX              = 5
# Anchor sizes of the 5 boxes as (width, height) pairs, in grid-cell units
ANCHORS          = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828]

# Prediction thresholds: minimum object score, and IoU threshold for non-maximum suppression
OBJ_THRESHOLD    = 0.3  # 0.5
NMS_THRESHOLD    = 0.3  # 0.45

# Weights of the loss terms
NO_OBJECT_SCALE  = 1.0
OBJECT_SCALE     = 5.0
COORD_SCALE      = 1.0
CLASS_SCALE      = 1.0

# Training parameters
BATCH_SIZE       = 16
WARM_UP_BATCHES  = 0
TRUE_BOX_BUFFER  = 50  # pass all true boxes of an image (up to 50) to the loss for the no-object term
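OBJ_THRESHOLD and NMS_THRESHOLD are used at prediction time. Boxes scoring below OBJ_THRESHOLD are dropped. Among overlapping boxes, non-maximum suppression (NMS) keeps the highest-scoring box and removes any other box whose intersection-over-union (IoU) with it exceeds NMS_THRESHOLD. This is presumably what decode_netout from utils.py does with these values, but the following is a generic greedy NMS sketch in NumPy, not that code. Its assumptions are boxes given as (xmin, ymin, xmax, ymax) and a single class; in practice NMS is usually applied per class. The names `iou` and `nms` are hypothetical.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (xmin, ymin, xmax, ymax)."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, obj_threshold=0.3, nms_threshold=0.3):
    """Return indices of the boxes kept after score filtering and greedy NMS."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # 1) drop low-confidence boxes
    candidates = np.where(scores >= obj_threshold)[0]
    # 2) visit the remaining boxes from highest to lowest score
    order = candidates[np.argsort(-scores[candidates])]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # 3) suppress boxes that overlap the kept box more than nms_threshold
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= nms_threshold]
    return keep

boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140], [0, 0, 5, 5]]
scores = [0.9, 0.8, 0.7, 0.1]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU of about 0.82, so it is suppressed. The last box is dropped because its score is below 0.3.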

3 Check Images and Labels

Images and labels are stored in separate folders. The labels of each image are stored in an XML file.
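The parsing code below relies on the following structure in each XML file:

- a `<filename>` element;
- one `<object>` element per labelled box, with the class in `<name>` and the pixel corners `xmin`, `ymin`, `xmax`, `ymax` inside `<bndbox>`.

This follows the Pascal VOC annotation layout. The following sketch parses a small hand-written annotation in that layout. The XML string and its values are made up for illustration, not taken from the BCCD files, and `parse_voc` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# a hypothetical annotation in the VOC-style layout read by the lab code
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>RBC</name>
    <bndbox><xmin>100</xmin><ymin>150</ymin><xmax>200</xmax><ymax>260</ymax></bndbox>
  </object>
  <object>
    <name>Platelets</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>40</xmax><ymax>55</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return the image filename and a list of (class, xmin, ymin, xmax, ymax) boxes."""
    root = ET.fromstring(xml_text.strip())
    filename = root.find('filename').text
    boxes = []
    for obj in root.findall('object'):
        bb = obj.find('bndbox')
        # coordinates may be written as floats in some files, so go through float()
        coords = [int(float(bb.find(k).text)) for k in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((obj.find('name').text, *coords))
    return filename, boxes

print(parse_voc(SAMPLE_XML))
```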

import xml.etree.ElementTree as ET
import csv
from random import seed
import os.path
from random import randint
import random
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import patches

image_idx = 1

# parse the XML annotation of one image
annot_path = train_annot_folder + "BloodImage_%05d.xml" % (image_idx)
root = ET.parse(annot_path).getroot()
filename = root.find('filename').text
print(filename)

# write one CSV row per labelled object: image name, class, box corners
csv_head = ['Image_names', 'cell_type', 'xmin', 'xmax', 'ymin', 'ymax']
with open('./train.csv', 'w', newline='') as csv_data:
    csvwriter = csv.writer(csv_data)
    csvwriter.writerow(csv_head)
    for region in root.findall('object'):
        bndbox = region.find('bndbox')
        # use a name other than `csv` so the csv module is not shadowed
        row = [filename, region.find('name').text]
        row += [bndbox.find(k).text for k in ('xmin', 'xmax', 'ymin', 'ymax')]
        csvwriter.writerow(row)

# read the csv file using read_csv function of pandas
train = pd.read_csv('./train.csv')
train.head()

BloodImage_00001.jpg

These are the ground-truth boxes of this image. Let's display them.

# plot the raw image (left) and the same image with its ground-truth boxes (right)
fig = plt.figure(figsize=[10,10], dpi=100)
newimage = train_image_folder + 'BloodImage_%05d.jpg' % (image_idx)
image = plt.imread(newimage)

plt.subplot(121)
plt.imshow(image)
plt.title('image')

# plot labels
ax = plt.subplot(122)
plt.imshow(image)

# one color per class: RBC red, WBC blue, Platelets green (yellow for anything else)
colors = {'RBC': 'r', 'WBC': 'b', 'Platelets': 'g'}

# iterate over the objects of this image
for _, row in train[train.Image_names == "BloodImage_%05d.jpg" % (image_idx)].iterrows():
    width = row.xmax - row.xmin
    height = row.ymax - row.ymin
    edgecolor = colors.get(row.cell_type, 'y')
    ax.annotate(row.cell_type, xy=(row.xmax - 40, row.ymin + 20))

    # add the bounding box to the image
    rect = patches.Rectangle((row.xmin, row.ymin), width, height,
                             edgecolor=edgecolor, facecolor='none')
    ax.add_patch(rect)
plt.title('True Bounding Box')

[Figure: the original image (left) and the same image with its true bounding boxes (right)]
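Before training, each ground-truth box must be expressed in the network's output coordinates, in three steps:

1. Rescale the box from the original image to the 13 x 13 grid.
2. Make the grid cell containing the box center responsible for it.
3. Select the anchor whose shape best matches the box, measured by IoU with both boxes centered at the same point.

The following sketch shows this assignment under those assumptions. It mirrors what YOLOv2-style batch generators typically do, but it is not the BatchGenerator code from preprocessing.py, `assign_to_grid` is a hypothetical name, and the example box and image size are made up.

```python
import numpy as np

GRID_H, GRID_W = 13, 13
ANCHORS = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434,
           7.88282, 3.52778, 9.77052, 9.16828]

def shape_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same center (only width and height matter)."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)

def assign_to_grid(xmin, ymin, xmax, ymax, image_w, image_h):
    """Map a pixel box to (grid_row, grid_col, best_anchor, [cx, cy, w, h] in grid units)."""
    # box center and size in grid-cell units
    cx = 0.5 * (xmin + xmax) / image_w * GRID_W
    cy = 0.5 * (ymin + ymax) / image_h * GRID_H
    w = (xmax - xmin) / image_w * GRID_W
    h = (ymax - ymin) / image_h * GRID_H
    # the cell containing the center is responsible; clamp boxes touching the border
    col = min(int(np.floor(cx)), GRID_W - 1)
    row = min(int(np.floor(cy)), GRID_H - 1)
    # pick the anchor (width, height pair) whose shape overlaps the box most
    anchor_ious = [shape_iou(w, h, ANCHORS[2 * i], ANCHORS[2 * i + 1])
                   for i in range(len(ANCHORS) // 2)]
    best_anchor = int(np.argmax(anchor_ious))
    return row, col, best_anchor, [cx, cy, w, h]

# a hypothetical 100 x 80 pixel box in a 640 x 480 image
print(assign_to_grid(200, 150, 300, 230, image_w=640, image_h=480))
```

For this box, the center falls in cell (row 5, column 5). Its size of about 2.0 x 2.2 grid cells best matches the second anchor (index 1).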

4 Create CNN Model

Here, we build a YOLO network in Keras with 23 convolutional layers (the YOLOv2 architecture).
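The passthrough branch (layer 21) takes the 26 x 26 x 64 feature map saved as skip_connection and runs it through space_to_depth_x2. This moves every 2 x 2 spatial block into the channel axis, giving 13 x 13 x 256, which can then be concatenated with the 13 x 13 x 1024 main branch (1024 + 256 = 1280 channels, as the model summary shows). The NumPy sketch below reproduces that rearrangement for a single image in (H, W, C) layout. `space_to_depth_np` is a hypothetical helper written to follow the channel ordering documented for tf.nn.space_to_depth, not code from the lab.

```python
import numpy as np

def space_to_depth_np(x, block=2):
    """Move each block x block spatial patch of an (H, W, C) array into the channel axis."""
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    # split H and W into (number of blocks, offset inside the block)
    x = x.reshape(h // block, block, w // block, block, c)
    # reorder to (H/b, W/b, dy, dx, C) so the in-block offsets sit next to the channels
    x = x.transpose(0, 2, 1, 3, 4)
    # flatten (dy, dx, C) into one channel axis of size block * block * C
    return x.reshape(h // block, w // block, block * block * c)

feat = np.random.rand(26, 26, 64).astype('float32')
out = space_to_depth_np(feat)
print(out.shape)  # (13, 13, 256)
```

No values are lost or mixed: the layer only relocates the fine 26 x 26 detail into extra channels so it can be combined with the coarser 13 x 13 features.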

# reorganization ("reorg") layer: moves each 2x2 spatial block into the channels
# (thanks to github.com/allanzelener/YAD2K)
def space_to_depth_x2(x):
    return tf.nn.space_to_depth(x, block_size=2)
input_image = Input(shape=(IMAGE_H, IMAGE_W, 3))
true_boxes  = Input(shape=(1, 1, 1, TRUE_BOX_BUFFER , 4))

# Layer 1
x = Conv2D(32, (3,3), strides=(1,1), padding='same', name='conv_1', use_bias=False)(input_image)
x = BatchNormalization(name='norm_1')(x)
x = LeakyReLU(alpha=0.1)(x)
x = MaxPooling2D(pool_size=(2, 2))(x)

# Layer 2
x = Conv2D(64, (3,3), strides=(1,1), padding='same', name='conv_2', use_bias=False)(x)
x = BatchNormalization(name='norm_2')(x)
x = LeakyReLU(alpha=0.1)(x)
x = MaxPooling2D(pool_size=(2, 2))(x)

# Layer 3
x = Conv2D(128, (3,3), strides=(1,1), padding='same', name='conv_3', use_bias=False)(x)
x = BatchNormalization(name='norm_3')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 4
x = Conv2D(64, (1,1), strides=(1,1), padding='same', name='conv_4', use_bias=False)(x)
x = BatchNormalization(name='norm_4')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 5
x = Conv2D(128, (3,3), strides=(1,1), padding='same', name='conv_5', use_bias=False)(x)
x = BatchNormalization(name='norm_5')(x)
x = LeakyReLU(alpha=0.1)(x)
x = MaxPooling2D(pool_size=(2, 2))(x)

# Layer 6
x = Conv2D(256, (3,3), strides=(1,1), padding='same', name='conv_6', use_bias=False)(x)
x = BatchNormalization(name='norm_6')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 7
x = Conv2D(128, (1,1), strides=(1,1), padding='same', name='conv_7', use_bias=False)(x)
x = BatchNormalization(name='norm_7')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 8
x = Conv2D(256, (3,3), strides=(1,1), padding='same', name='conv_8', use_bias=False)(x)
x = BatchNormalization(name='norm_8')(x)
x = LeakyReLU(alpha=0.1)(x)
x = MaxPooling2D(pool_size=(2, 2))(x)

# Layer 9
x = Conv2D(512, (3,3), strides=(1,1), padding='same', name='conv_9', use_bias=False)(x)
x = BatchNormalization(name='norm_9')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 10
x = Conv2D(256, (1,1), strides=(1,1), padding='same', name='conv_10', use_bias=False)(x)
x = BatchNormalization(name='norm_10')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 11
x = Conv2D(512, (3,3), strides=(1,1), padding='same', name='conv_11', use_bias=False)(x)
x = BatchNormalization(name='norm_11')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 12
x = Conv2D(256, (1,1), strides=(1,1), padding='same', name='conv_12', use_bias=False)(x)
x = BatchNormalization(name='norm_12')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 13
x = Conv2D(512, (3,3), strides=(1,1), padding='same', name='conv_13', use_bias=False)(x)
x = BatchNormalization(name='norm_13')(x)
x = LeakyReLU(alpha=0.1)(x)

skip_connection = x

x = MaxPooling2D(pool_size=(2, 2))(x)

# Layer 14
x = Conv2D(1024, (3,3), strides=(1,1), padding='same', name='conv_14', use_bias=False)(x)
x = BatchNormalization(name='norm_14')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 15
x = Conv2D(512, (1,1), strides=(1,1), padding='same', name='conv_15', use_bias=False)(x)
x = BatchNormalization(name='norm_15')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 16
x = Conv2D(1024, (3,3), strides=(1,1), padding='same', name='conv_16', use_bias=False)(x)
x = BatchNormalization(name='norm_16')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 17
x = Conv2D(512, (1,1), strides=(1,1), padding='same', name='conv_17', use_bias=False)(x)
x = BatchNormalization(name='norm_17')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 18
x = Conv2D(1024, (3,3), strides=(1,1), padding='same', name='conv_18', use_bias=False)(x)
x = BatchNormalization(name='norm_18')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 19
x = Conv2D(1024, (3,3), strides=(1,1), padding='same', name='conv_19', use_bias=False)(x)
x = BatchNormalization(name='norm_19')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 20
x = Conv2D(1024, (3,3), strides=(1,1), padding='same', name='conv_20', use_bias=False)(x)
x = BatchNormalization(name='norm_20')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 21
skip_connection = Conv2D(64, (1,1), strides=(1,1), padding='same', name='conv_21', use_bias=False)(skip_connection)
skip_connection = BatchNormalization(name='norm_21')(skip_connection)
skip_connection = LeakyReLU(alpha=0.1)(skip_connection)
skip_connection = Lambda(space_to_depth_x2)(skip_connection)

x = concatenate([skip_connection, x])

# Layer 22
x = Conv2D(1024, (3,3), strides=(1,1), padding='same', name='conv_22', use_bias=False)(x)
x = BatchNormalization(name='norm_22')(x)
x = LeakyReLU(alpha=0.1)(x)

# Layer 23
# This layer generates 5 boxes for each of the 13x13 grid cells.
# For each box, the network predicts a vector [X, Y, W, H, objectness, class_1, ..., class_n]:
# the box center (X, Y), its width and height (W, H), an object confidence and one score per class.
# So the output shape of the YOLO network is [13, 13, 5, 4 + 1 + CLASS].
x = Conv2D(BOX * (4 + 1 + CLASS), (1,1), strides=(1,1), padding='same', name='conv_23')(x)
output = Reshape((GRID_H, GRID_W, BOX, 4 + 1 + CLASS))(x)

# small hack to allow true_boxes to be registered when Keras build the model 
# for more information: https://github.com/fchollet/keras/issues/2790
output = Lambda(lambda args: args[0])([output, true_boxes])

model = Model([input_image, true_boxes], output)
model.summary()

WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/keras/backend/tensorflow_backend.py:4070: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 416, 416, 3)  0                                            
__________________________________________________________________________________________________
conv_1 (Conv2D)                 (None, 416, 416, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
norm_1 (BatchNormalization)     (None, 416, 416, 32) 128         conv_1[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 416, 416, 32) 0           norm_1[0][0]                     
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 208, 208, 32) 0           leaky_re_lu_1[0][0]              
__________________________________________________________________________________________________
conv_2 (Conv2D)                 (None, 208, 208, 64) 18432       max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
norm_2 (BatchNormalization)     (None, 208, 208, 64) 256         conv_2[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 208, 208, 64) 0           norm_2[0][0]                     
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 104, 104, 64) 0           leaky_re_lu_2[0][0]              
__________________________________________________________________________________________________
conv_3 (Conv2D)                 (None, 104, 104, 128 73728       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
norm_3 (BatchNormalization)     (None, 104, 104, 128 512         conv_3[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 104, 104, 128 0           norm_3[0][0]                     
__________________________________________________________________________________________________
conv_4 (Conv2D)                 (None, 104, 104, 64) 8192        leaky_re_lu_3[0][0]              
__________________________________________________________________________________________________
norm_4 (BatchNormalization)     (None, 104, 104, 64) 256         conv_4[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 104, 104, 64) 0           norm_4[0][0]                     
__________________________________________________________________________________________________
conv_5 (Conv2D)                 (None, 104, 104, 128 73728       leaky_re_lu_4[0][0]              
__________________________________________________________________________________________________
norm_5 (BatchNormalization)     (None, 104, 104, 128 512         conv_5[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 104, 104, 128 0           norm_5[0][0]                     
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 52, 52, 128)  0           leaky_re_lu_5[0][0]              
__________________________________________________________________________________________________
conv_6 (Conv2D)                 (None, 52, 52, 256)  294912      max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
norm_6 (BatchNormalization)     (None, 52, 52, 256)  1024        conv_6[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 52, 52, 256)  0           norm_6[0][0]                     
__________________________________________________________________________________________________
conv_7 (Conv2D)                 (None, 52, 52, 128)  32768       leaky_re_lu_6[0][0]              
__________________________________________________________________________________________________
norm_7 (BatchNormalization)     (None, 52, 52, 128)  512         conv_7[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 52, 52, 128)  0           norm_7[0][0]                     
__________________________________________________________________________________________________
conv_8 (Conv2D)                 (None, 52, 52, 256)  294912      leaky_re_lu_7[0][0]              
__________________________________________________________________________________________________
norm_8 (BatchNormalization)     (None, 52, 52, 256)  1024        conv_8[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 52, 52, 256)  0           norm_8[0][0]                     
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 26, 26, 256)  0           leaky_re_lu_8[0][0]              
__________________________________________________________________________________________________
conv_9 (Conv2D)                 (None, 26, 26, 512)  1179648     max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
norm_9 (BatchNormalization)     (None, 26, 26, 512)  2048        conv_9[0][0]                     
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 26, 26, 512)  0           norm_9[0][0]                     
__________________________________________________________________________________________________
conv_10 (Conv2D)                (None, 26, 26, 256)  131072      leaky_re_lu_9[0][0]              
__________________________________________________________________________________________________
norm_10 (BatchNormalization)    (None, 26, 26, 256)  1024        conv_10[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU)      (None, 26, 26, 256)  0           norm_10[0][0]                    
__________________________________________________________________________________________________
conv_11 (Conv2D)                (None, 26, 26, 512)  1179648     leaky_re_lu_10[0][0]             
__________________________________________________________________________________________________
norm_11 (BatchNormalization)    (None, 26, 26, 512)  2048        conv_11[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU)      (None, 26, 26, 512)  0           norm_11[0][0]                    
__________________________________________________________________________________________________
conv_12 (Conv2D)                (None, 26, 26, 256)  131072      leaky_re_lu_11[0][0]             
__________________________________________________________________________________________________
norm_12 (BatchNormalization)    (None, 26, 26, 256)  1024        conv_12[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU)      (None, 26, 26, 256)  0           norm_12[0][0]                    
__________________________________________________________________________________________________
conv_13 (Conv2D)                (None, 26, 26, 512)  1179648     leaky_re_lu_12[0][0]             
__________________________________________________________________________________________________
norm_13 (BatchNormalization)    (None, 26, 26, 512)  2048        conv_13[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU)      (None, 26, 26, 512)  0           norm_13[0][0]                    
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 13, 13, 512)  0           leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
conv_14 (Conv2D)                (None, 13, 13, 1024) 4718592     max_pooling2d_5[0][0]            
__________________________________________________________________________________________________
norm_14 (BatchNormalization)    (None, 13, 13, 1024) 4096        conv_14[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU)      (None, 13, 13, 1024) 0           norm_14[0][0]                    
__________________________________________________________________________________________________
conv_15 (Conv2D)                (None, 13, 13, 512)  524288      leaky_re_lu_14[0][0]             
__________________________________________________________________________________________________
norm_15 (BatchNormalization)    (None, 13, 13, 512)  2048        conv_15[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU)      (None, 13, 13, 512)  0           norm_15[0][0]                    
__________________________________________________________________________________________________
conv_16 (Conv2D)                (None, 13, 13, 1024) 4718592     leaky_re_lu_15[0][0]             
__________________________________________________________________________________________________
norm_16 (BatchNormalization)    (None, 13, 13, 1024) 4096        conv_16[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU)      (None, 13, 13, 1024) 0           norm_16[0][0]                    
__________________________________________________________________________________________________
conv_17 (Conv2D)                (None, 13, 13, 512)  524288      leaky_re_lu_16[0][0]             
__________________________________________________________________________________________________
norm_17 (BatchNormalization)    (None, 13, 13, 512)  2048        conv_17[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU)      (None, 13, 13, 512)  0           norm_17[0][0]                    
__________________________________________________________________________________________________
conv_18 (Conv2D)                (None, 13, 13, 1024) 4718592     leaky_re_lu_17[0][0]             
__________________________________________________________________________________________________
norm_18 (BatchNormalization)    (None, 13, 13, 1024) 4096        conv_18[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU)      (None, 13, 13, 1024) 0           norm_18[0][0]                    
__________________________________________________________________________________________________
conv_19 (Conv2D)                (None, 13, 13, 1024) 9437184     leaky_re_lu_18[0][0]             
__________________________________________________________________________________________________
norm_19 (BatchNormalization)    (None, 13, 13, 1024) 4096        conv_19[0][0]                    
__________________________________________________________________________________________________
conv_21 (Conv2D)                (None, 26, 26, 64)   32768       leaky_re_lu_13[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_19 (LeakyReLU)      (None, 13, 13, 1024) 0           norm_19[0][0]                    
__________________________________________________________________________________________________
norm_21 (BatchNormalization)    (None, 26, 26, 64)   256         conv_21[0][0]                    
__________________________________________________________________________________________________
conv_20 (Conv2D)                (None, 13, 13, 1024) 9437184     leaky_re_lu_19[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU)      (None, 26, 26, 64)   0           norm_21[0][0]                    
__________________________________________________________________________________________________
norm_20 (BatchNormalization)    (None, 13, 13, 1024) 4096        conv_20[0][0]                    
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 13, 13, 256)  0           leaky_re_lu_21[0][0]             
__________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU)      (None, 13, 13, 1024) 0           norm_20[0][0]                    
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 13, 13, 1280) 0           lambda_1[0][0]                   
                                                                 leaky_re_lu_20[0][0]             
__________________________________________________________________________________________________
conv_22 (Conv2D)                (None, 13, 13, 1024) 11796480    concatenate_1[0][0]              
__________________________________________________________________________________________________
norm_22 (BatchNormalization)    (None, 13, 13, 1024) 4096        conv_22[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU)      (None, 13, 13, 1024) 0           norm_22[0][0]                    
__________________________________________________________________________________________________
conv_23 (Conv2D)                (None, 13, 13, 40)   41000       leaky_re_lu_22[0][0]             
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 13, 13, 5, 8) 0           conv_23[0][0]                    
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 1, 1, 1, 50,  0                                            
__________________________________________________________________________________________________
lambda_2 (Lambda)               (None, 13, 13, 5, 8) 0           reshape_1[0][0]                  
                                                                 input_2[0][0]                    
==================================================================================================
Total params: 50,588,936
Trainable params: 50,568,264
Non-trainable params: 20,672
__________________________________________________________________________________________________

5 Load Pre-trained Weights

Because we do not have enough images to train such a large network from scratch, we load pre-trained weights for all layers and re-initialize only the last convolutional layer (conv_23) so that it can be fitted to the objects in the BCCD dataset.

import numpy as np

# Load pre-trained weights for all layers.
# WeightReader, wt_path and model come from earlier cells.
weight_reader = WeightReader(wt_path)
weight_reader.reset()
nb_conv = 23

for i in range(1, nb_conv+1):

    conv_layer = model.get_layer('conv_' + str(i))
    print('load weight for conv_{}'.format(i))
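    # BatchNormalization weights: conv_1 ... conv_22 are each followed by a BN layer
    if i < nb_conv:
        norm_layer = model.get_layer('norm_' + str(i))

        size = np.prod(norm_layer.get_weights()[0].shape)

        # the weight file stores BN parameters in the order beta, gamma, mean, var
        beta  = weight_reader.read_bytes(size)
        gamma = weight_reader.read_bytes(size)
        mean  = weight_reader.read_bytes(size)
        var   = weight_reader.read_bytes(size)

        # Keras expects gamma, beta, moving_mean, moving_variance
        norm_layer.set_weights([gamma, beta, mean, var])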
    # weights in convolutions
    if len(conv_layer.get_weights()) > 1:
        bias   = weight_reader.read_bytes(np.prod(conv_layer.get_weights()[1].shape))
        kernel = weight_reader.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
        kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
        kernel = kernel.transpose([2,3,1,0])
        conv_layer.set_weights([kernel, bias])
    else:
        kernel = weight_reader.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
        kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
        kernel = kernel.transpose([2,3,1,0])
        conv_layer.set_weights([kernel])

# Re-initialize the last convolutional layer (conv_23) with small random weights
# so that it can be trained for the BCCD classes
layer   = model.get_layer('conv_' + str(nb_conv))
weights = layer.get_weights()

new_kernel = np.random.normal(size=weights[0].shape)/(GRID_H*GRID_W)
new_bias   = np.random.normal(size=weights[1].shape)/(GRID_H*GRID_W)

layer.set_weights([new_kernel, new_bias])

load weight for conv_1
load weight for conv_2
load weight for conv_3
load weight for conv_4
load weight for conv_5
load weight for conv_6
load weight for conv_7
load weight for conv_8
load weight for conv_9
load weight for conv_10
load weight for conv_11
load weight for conv_12
load weight for conv_13
load weight for conv_14
load weight for conv_15
load weight for conv_16
load weight for conv_17
load weight for conv_18
load weight for conv_19
load weight for conv_20
load weight for conv_21
load weight for conv_22
load weight for conv_23
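The loop above does two conversions: it reorders the batch-normalization parameters and transposes each convolution kernel. Both can be checked in isolation. The following is a minimal numpy sketch. It assumes Darknet stores each kernel row-major as (out, in, kh, kw). The helper names are hypothetical and not part of the notebook. The loop above reshapes to `reversed(shape)`, that is (out, in, kw, kh), which is the same thing for the square kernels used in this network.

```python
import numpy as np


def darknet_to_keras_kernel(flat, keras_shape):
    """Reshape a flat Darknet conv kernel into the Keras Conv2D layout.

    Assumes Darknet stores kernels row-major as (out, in, kh, kw);
    Keras expects (kh, kw, in, out).
    """
    kh, kw, c_in, c_out = keras_shape
    kernel = flat.reshape((c_out, c_in, kh, kw))
    return kernel.transpose([2, 3, 1, 0])  # (out, in, kh, kw) -> (kh, kw, in, out)


def darknet_to_keras_bn(beta, gamma, mean, var):
    """Reorder BN parameters read as beta, gamma, mean, var
    into the Keras order gamma, beta, moving_mean, moving_variance."""
    return [gamma, beta, mean, var]


# Toy 3x3 convolution with 2 input channels and 4 output channels
demo_shape = (3, 3, 2, 4)
demo_flat = np.arange(np.prod(demo_shape), dtype=np.float32)
demo_kernel = darknet_to_keras_kernel(demo_flat, demo_shape)
demo_darknet = demo_flat.reshape((4, 2, 3, 3))  # (out, in, kh, kw)

print(demo_kernel.shape)  # (3, 3, 2, 4)
# Keras element [h, w, i, o] equals Darknet element [o, i, h, w]
print(demo_kernel[1, 2, 0, 3] == demo_darknet[3, 0, 1, 2])  # True
```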

6 Define the Loss Function

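$$
\begin{aligned}
Loss =\ & \lambda_{coord}\sum_{i=0}^{s^2}\sum_{j=0}^{B} L_{i,j}^{obj}\left[(x_{i,j}-\hat{x}_{i,j})^2+(y_{i,j}-\hat{y}_{i,j})^2\right] \\
&+ \lambda_{coord}\sum_{i=0}^{s^2}\sum_{j=0}^{B} L_{i,j}^{obj}\left[(w_{i,j}-\hat{w}_{i,j})^2+(h_{i,j}-\hat{h}_{i,j})^2\right] \\
&+ \lambda_{obj}\sum_{i=0}^{s^2}\sum_{j=0}^{B} L_{i,j}^{obj}\left(C_{i,j}-\hat{C}_{i,j}\right)^2 \\
&+ \lambda_{noobj}\sum_{i=0}^{s^2}\sum_{j=0}^{B} L_{i,j}^{noobj}\left(C_{i,j}-\hat{C}_{i,j}\right)^2 \\
&- \lambda_{class}\sum_{i=0}^{s^2}\sum_{j=0}^{B} L_{i,j}^{obj}\sum_{c\in classes}\hat{p}_{i,j}(c)\log p_{i,j}(c)
\end{aligned}
$$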

Where:

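$s^2$ is the number of grid cells in the last feature map ($13 \times 13$ here)
$B$ is the number of anchors for each grid cell (5 here)
$C$ is the confidence that there is an object in the proposal box of an anchor
$p(c)$ is the probability that the object in the proposal box of an anchor belongs to class $c$
$\lambda_{*}$ (e.g. $\lambda_{coord}$, $\lambda_{noobj}$) is the weight of the corresponding term
$L_{i,j}^{obj}$ and $L_{i,j}^{noobj}$ are the masks for object and no-object anchor boxes. Note that $L^{obj}\cup L^{noobj} \neq U_{anchors}$: some anchors belong to neither set.
A hat (e.g. $\hat{C}$) denotes the ground-truth value.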

There are several tricks:

Warm-Up Stage: While the number of trained batches is smaller than WARM_UP_BATCHES, the target X&Y of every anchor with $\hat{C}=0$ is set to the center of its grid cell, and its target W&H is set to the W&H of the anchor; during this stage the coordinate loss is applied to all anchors. After warm-up, these targets go back to 0 and such anchors no longer contribute to the coordinate loss.
No-Object Anchor: {Best_IOU < 0.6} and {$\hat{C}=0$}. If the box predicted from an anchor has an IOU smaller than 0.6 with every true box, and $\hat{C}=0$, the anchor is treated as a no-object anchor (the fourth term of the loss function).
Object Anchor: {$\hat{C}=1$}. For each true box, only the anchor that lies in the grid cell containing the box center and has the largest IOU with the box is labeled $\hat{C}=1$.
Neglected Anchor: {Best_IOU >= 0.6} and {$\hat{C}=0$}. These anchors do not appear in the loss function: their predicted box may already match a true box well, but they are not the anchor responsible for it, so it is hard to justify penalizing them.
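The three anchor categories above can be made concrete with a small numpy sketch. For each anchor's predicted box, it computes the best IOU against all true boxes. It then applies the object, no-object and neglected rules with the 0.6 threshold. The boxes and labels are made-up toy values in grid-cell units. `iou_xywh` and `classify_anchors` are hypothetical helpers. In the notebook, the same logic is expressed with TensorFlow masks inside `custom_loss`.

```python
import numpy as np


def iou_xywh(boxes_a, boxes_b):
    """Pairwise IOU for boxes given as (x_center, y_center, w, h).

    boxes_a: (A, 4), boxes_b: (T, 4) -> IOU matrix of shape (A, T).
    """
    a = boxes_a[:, None, :]
    b = boxes_b[None, :, :]
    a_min, a_max = a[..., :2] - a[..., 2:] / 2, a[..., :2] + a[..., 2:] / 2
    b_min, b_max = b[..., :2] - b[..., 2:] / 2, b[..., :2] + b[..., 2:] / 2
    inter_wh = np.maximum(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0.0)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    union = a[..., 2] * a[..., 3] + b[..., 2] * b[..., 3] - inter
    return inter / union


def classify_anchors(pred_boxes, gt_boxes, c_hat, iou_thresh=0.6):
    """Split anchors into object / no-object / neglected boolean masks."""
    best_iou = iou_xywh(pred_boxes, gt_boxes).max(axis=1)  # best IOU per anchor
    obj = c_hat == 1
    noobj = (c_hat == 0) & (best_iou < iou_thresh)
    neglected = (c_hat == 0) & (best_iou >= iou_thresh)
    return obj, noobj, neglected


demo_gt = np.array([[5.0, 5.0, 2.0, 2.0]])       # one ground-truth box
demo_pred = np.array([[5.0, 5.0, 2.0, 2.0],      # responsible anchor
                      [5.1, 5.0, 2.0, 2.0],      # good overlap, not responsible
                      [9.0, 9.0, 2.0, 2.0]])     # no overlap
demo_c_hat = np.array([1, 0, 0])
demo_obj, demo_noobj, demo_neglected = classify_anchors(demo_pred, demo_gt, demo_c_hat)
print(demo_obj.tolist(), demo_noobj.tolist(), demo_neglected.tolist())
# [True, False, False] [False, False, True] [False, True, False]
```

The second anchor overlaps the true box with an IOU of about 0.9, so it is neglected rather than pushed toward zero confidence.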

[Figure: Anchor]

# This loss is built from TensorFlow 1.x graph ops and is fragile.
# To train another model (e.g. for license-plate detection), factory-reset the runtime first.
def custom_loss(y_true, y_pred):
    # size of y: [N, 13, 13, 5, 4+1+NUM_class]
    seen = tf.Variable(0.)
    total_recall = tf.Variable(0.)
    
    '''
        calculate meshgrid
    '''
    mask_shape = tf.shape(y_true)[:4]
    # mask_shape=[N_batch, 13,13, 5]
    
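    # X, Y meshgrid: grid coordinate of each cell
    # (cell_y is obtained by transposing cell_x, which assumes GRID_H == GRID_W)
    cell_x = tf.cast(tf.reshape(tf.tile(tf.range(GRID_W), [GRID_H]), (1, GRID_H, GRID_W, 1, 1)),'float32')
    cell_y = tf.compat.v1.transpose(cell_x, (0,2,1,3,4))

    # tile over the batch and over the BOX anchors of each cell
    cell_grid = tf.compat.v1.tile(tf.concat([cell_x,cell_y], -1), [BATCH_SIZE, 1, 1, BOX, 1])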
    
    coord_mask = tf.zeros(mask_shape)
    conf_mask  = tf.zeros(mask_shape)
    class_mask = tf.zeros(mask_shape)

    '''
        get slice of predicted [X Y], predicted [W H], confidence, and class
    '''
    ### adjust x and y: sigmoid offset within each cell + the cell's grid coordinate
    pred_box_xy = tf.math.sigmoid(y_pred[..., :2]) + cell_grid
    
    ### adjust w and h: anchor size (in grid cells) scaled by exp of the raw output
    pred_box_wh = tf.math.exp(y_pred[..., 2:4]) * np.reshape(ANCHORS, [1,1,1,BOX,2])
    
    ### adjust confidence: whether there is an object (1) or not (0) in each box of each grid cell
    pred_box_conf = tf.math.sigmoid(y_pred[..., 4])
    
    ### class logits (softmax is applied inside the cross-entropy loss below)
    pred_box_class = y_pred[..., 5:]
    
    '''
        Adjust the ground truth: slice the labels the same way
    '''
    ### adjust x and y
    true_box_xy = y_true[..., 0:2] # box center in grid-cell units (same frame as pred_box_xy)
    
    ### adjust w and h
    true_box_wh = y_true[..., 2:4] # number of cells across, horizontally and vertically
    
    '''
        get box corners (top-left and bottom-right, in image coordinates) to calculate IOU
    '''
    true_wh_half = true_box_wh / 2.
    true_mins    = true_box_xy - true_wh_half # top-left corner of a box
    true_maxes   = true_box_xy + true_wh_half # bottom-right corner of a box
    
    pred_wh_half = pred_box_wh / 2.
    pred_mins    = pred_box_xy - pred_wh_half
    pred_maxes   = pred_box_xy + pred_wh_half       
    
    # calculate intersection over union (IOU)
    intersect_mins  = tf.math.maximum(pred_mins,  true_mins)
    intersect_maxes = tf.math.minimum(pred_maxes, true_maxes)
    intersect_wh    = tf.math.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]
    
    true_areas = true_box_wh[..., 0] * true_box_wh[..., 1]
    pred_areas = pred_box_wh[..., 0] * pred_box_wh[..., 1]

    union_areas = pred_areas + true_areas - intersect_areas
    iou_scores  = tf.math.truediv(intersect_areas, union_areas)
    # when there is an object(1) true_box_conf = iou_scores else: true_box_conf=0
    true_box_conf = iou_scores * y_true[..., 4]
    

    ### adjust class probabilities
    true_box_class = tf.math.argmax(y_true[..., 5:], -1)
    
    '''
        Determine the masks
    '''
    ### coordinate mask: simply the position of the ground truth boxes (the predictors)
    coord_mask = tf.expand_dims(y_true[..., 4], axis=-1) * COORD_SCALE
    
    ### confidence mask: penalize predictors + penalize boxes with low IOU
    # penalize the confidence of boxes whose IOU with every ground truth box is < 0.6
    # (true_boxes is the second model input, input_2, defined in an earlier cell)
    true_xy = true_boxes[..., 0:2] # shape [N, 1, 1, 1, TRUE_BOX_BUFFER, 2]
    true_wh = true_boxes[..., 2:4] # shape [N, 1, 1, 1, TRUE_BOX_BUFFER, 2]
    
    true_wh_half = true_wh / 2.
    true_mins    = true_xy - true_wh_half
    true_maxes   = true_xy + true_wh_half
    
    pred_xy = tf.expand_dims(pred_box_xy, 4) # shape [N, H, W, BOX, 1, 2]
    pred_wh = tf.expand_dims(pred_box_wh, 4) # shape [N, H, W, BOX, 1, 2]
    
    pred_wh_half = pred_wh / 2.
    pred_mins    = pred_xy - pred_wh_half # shape [N, H, W, BOX, 1, 2]
    pred_maxes   = pred_xy + pred_wh_half    
    
    
    intersect_mins  = tf.math.maximum(pred_mins,  true_mins)
    intersect_maxes = tf.math.minimum(pred_maxes, true_maxes)
    intersect_wh    = tf.math.maximum(intersect_maxes - intersect_mins, 0.) # must have some overlap, otherwise 0
    intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1] # shape [N, H, W, BOX, TRUE_BOX_BUFFER]
    
    true_areas = true_wh[..., 0] * true_wh[..., 1]
    pred_areas = pred_wh[..., 0] * pred_wh[..., 1]

    union_areas = pred_areas + true_areas - intersect_areas
    iou_scores  = tf.math.truediv(intersect_areas, union_areas) # shape [N, H, W, BOX, TRUE_BOX_BUFFER]
    
    best_ious = tf.math.reduce_max(iou_scores, axis=4) # size[N, H, W, B]
    conf_mask = conf_mask + tf.cast(best_ious < 0.6, 'float32') * (1 - y_true[..., 4]) * NO_OBJECT_SCALE
    
    # penalize the confidence of the boxes that are responsible for the corresponding ground truth box
    conf_mask = conf_mask + y_true[..., 4] * OBJECT_SCALE
    
    ### class mask: simply the position of the ground truth boxes (the predictors)
    class_mask = y_true[..., 4] * tf.gather(CLASS_WEIGHTS, true_box_class) * CLASS_SCALE       
    
    """
    Warm-up training
    """
    no_boxes_mask = tf.cast(coord_mask < COORD_SCALE/2.,'float32')
    seen = tf.assign_add(seen, 1.)
    
    true_box_xy, true_box_wh, coord_mask = tf.cond(tf.math.less(seen, WARM_UP_BATCHES), 
                          lambda: [true_box_xy + (0.5 + cell_grid) * no_boxes_mask, 
                                   true_box_wh + tf.ones_like(true_box_wh) * np.reshape(ANCHORS, [1,1,1,BOX,2]) * no_boxes_mask, 
                                   tf.ones_like(coord_mask)],
                          lambda: [true_box_xy, 
                                   true_box_wh,
                                   coord_mask])
    
    """
    Finalize the loss
    """
    nb_coord_box = tf.math.reduce_sum(tf.compat.v1.to_float(coord_mask > 0.0))
    nb_conf_box  = tf.math.reduce_sum(tf.compat.v1.to_float(conf_mask  > 0.0))
    nb_class_box = tf.math.reduce_sum(tf.compat.v1.to_float(class_mask > 0.0))
    
    loss_xy    = tf.math.reduce_sum(tf.math.square(true_box_xy-pred_box_xy)     * coord_mask) / (nb_coord_box + 1e-6) / 2.
    loss_wh    = tf.math.reduce_sum(tf.math.square(true_box_wh-pred_box_wh)     * coord_mask) / (nb_coord_box + 1e-6) / 2.
    loss_conf  = tf.math.reduce_sum(tf.math.square(true_box_conf-pred_box_conf) * conf_mask)  / (nb_conf_box  + 1e-6) / 2.
    loss_class = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=true_box_class, logits=pred_box_class)
    loss_class = tf.math.reduce_sum(loss_class * class_mask) / (nb_class_box + 1e-6)
    
    loss = loss_xy + loss_wh + loss_conf + loss_class
    
    nb_true_box = tf.math.reduce_sum(y_true[..., 4])
    nb_pred_box = tf.math.reduce_sum(tf.cast(true_box_conf > 0.5,'float32') * tf.cast(pred_box_conf > 0.3,'float32'))

    """
    Debugging code
    """    
    current_recall = nb_pred_box/(nb_true_box + 1e-6)
    total_recall = tf.assign_add(total_recall, current_recall) 

    loss = tf.compat.v1.Print(loss, [tf.zeros((1))], message='Dummy Line \t', summarize=1000)
    loss = tf.compat.v1.Print(loss, [loss_xy], message='Loss XY \t', summarize=1000)
    loss = tf.compat.v1.Print(loss, [loss_wh], message='Loss WH \t', summarize=1000)
    loss = tf.compat.v1.Print(loss, [loss_conf], message='Loss Conf \t', summarize=1000)
    loss = tf.compat.v1.Print(loss, [loss_class], message='Loss Class \t', summarize=1000)
    loss = tf.compat.v1.Print(loss, [loss], message='Total Loss \t', summarize=1000)
    loss = tf.compat.v1.Print(loss, [current_recall], message='Current Recall \t', summarize=1000)
    loss = tf.compat.v1.Print(loss, [total_recall/seen], message='Average Recall \t', summarize=1000)
    
    return loss
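`custom_loss` decodes `y_pred` in three steps. The box center is the cell grid plus a sigmoid offset. The width and height are the anchor sizes scaled by an exponential. The confidence is a sigmoid. These steps can be reproduced for a single image with numpy. This is a sketch for understanding the formulas, not part of the training pipeline. The anchor values are hypothetical. The (13, 13, 5, 8) shape matches the `reshape_1` output in the model summary above.

```python
import numpy as np


def decode_yolo_v2(raw, anchors):
    """Decode raw YOLOv2 outputs for one image into boxes in grid-cell units.

    raw:     (grid_h, grid_w, n_box, 4 + 1 + n_class) network output
    anchors: (n_box, 2) anchor widths/heights in grid-cell units
    """
    grid_h, grid_w, n_box, _ = raw.shape
    # cell_x[r, c] = c and cell_y[r, c] = r, like cell_grid in custom_loss
    cell_x, cell_y = np.meshgrid(np.arange(grid_w), np.arange(grid_h))
    cell_grid = np.stack([cell_x, cell_y], axis=-1)[:, :, None, :]  # (grid_h, grid_w, 1, 2)

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    xy = sigmoid(raw[..., 0:2]) + cell_grid                       # center: cell + offset in (0, 1)
    wh = np.exp(raw[..., 2:4]) * anchors.reshape(1, 1, n_box, 2)  # size: anchor * exp(raw)
    conf = sigmoid(raw[..., 4])                                    # objectness in (0, 1)
    return xy, wh, conf


# hypothetical anchors, (w, h) in grid cells
demo_anchors = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [4.0, 4.0], [6.0, 6.0]])
demo_raw = np.zeros((13, 13, 5, 8))  # all-zero raw outputs
demo_xy, demo_wh, demo_conf = decode_yolo_v2(demo_raw, demo_anchors)
print(demo_xy[2, 3, 0])    # [3.5 2.5]: center of the cell in row 2, column 3
print(demo_wh[2, 3, 1])    # [2. 3.]: exp(0) = 1, so the box equals the anchor
print(demo_conf[0, 0, 0])  # 0.5
```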

7 Train the model

Define the configuration passed to the batch generators.

generator_config = {
    'IMAGE_H'         : IMAGE_H, 
    'IMAGE_W'         : IMAGE_W,
    'GRID_H'          : GRID_H,  
    'GRID_W'          : GRID_W,
    'BOX'             : BOX,
    'LABELS'          : LABELS,
    'CLASS'           : len(LABELS),
    'ANCHORS'         : ANCHORS,
    'BATCH_SIZE'      : BATCH_SIZE,
    'TRUE_BOX_BUFFER' : 50,
}
def normalize(image):
    return image / 255.

Load all images and create the training and validation sets.

# Parse the annotations and build the training and validation batch generators
train_imgs, seen_train_labels = parse_annotation(train_annot_folder, train_image_folder, labels=LABELS)
train_batch = BatchGenerator(train_imgs, generator_config, norm=normalize)

valid_imgs, seen_valid_labels = parse_annotation(valid_annot_folder, valid_image_folder, labels=LABELS)
valid_batch = BatchGenerator(valid_imgs, generator_config, norm=normalize, jitter=False)

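Set several callbacks: ModelCheckpoint saves the weights to weights.h5 whenever val_loss improves, and EarlyStopping stops training once val_loss has not improved by at least 0.001 for 3 consecutive epochs.

In our run, training stopped early after 26 epochs; the best weights (val_loss 0.06327, epoch 23) are the ones saved in weights.h5.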

# callbacks
early_stop = EarlyStopping(monitor='val_loss', 
                           min_delta=0.001, 
                           patience=3, 
                           mode='min', 
                           verbose=1)

checkpoint = ModelCheckpoint('weights.h5', 
                             monitor='val_loss', 
                             verbose=1, 
                             save_best_only=True, 
                             mode='min', 
                             period=1)
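# TensorBoard logging: defined here but not passed to fit_generator below;
# add it to `callbacks` to enable it
tensorboard = TensorBoard(log_dir='./logs', 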
                          histogram_freq=0, 
                          write_graph=True, 
                          write_images=False)

optimizer = Adam(lr=0.5e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
#optimizer = SGD(lr=1e-4, decay=0.0005, momentum=0.9)
#optimizer = RMSprop(lr=1e-4, rho=0.9, epsilon=1e-08, decay=0.0)

model.compile(loss=custom_loss, optimizer=optimizer)
model.fit_generator(generator = train_batch, 
                    steps_per_epoch  = len(train_batch), 
                    epochs           = 100, 
                    verbose          = 1,
                    validation_data  = valid_batch,
                    validation_steps = len(valid_batch),
                    callbacks        = [early_stop, checkpoint], 
                    max_queue_size   = 3)

WARNING:tensorflow:From <ipython-input-10-0f9ad60bf9d7>:138: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From <ipython-input-10-0f9ad60bf9d7>:159: Print (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2018-08-20.
Instructions for updating:
Use tf.print instead of tf.Print. Note that tf.print returns a no-output operator that directly prints the output. Outside of defuns or eager mode, this operator will not be executed unless it is directly specified in session.run or used as a control dependency for other operators. This is only a concern in graph mode. Below is an example of how to ensure tf.print executes in graph mode:

WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/keras/backend/tensorflow_backend.py:431: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/keras/backend/tensorflow_backend.py:438: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

Epoch 1/100
23/23 [==============================] - 38s 2s/step - loss: 1.4136 - val_loss: 0.9960

Epoch 00001: val_loss improved from inf to 0.99602, saving model to weights.h5
Epoch 2/100
23/23 [==============================] - 21s 934ms/step - loss: 0.7795 - val_loss: 0.5534

Epoch 00002: val_loss improved from 0.99602 to 0.55342, saving model to weights.h5
Epoch 3/100
23/23 [==============================] - 21s 928ms/step - loss: 0.4942 - val_loss: 0.4059

Epoch 00003: val_loss improved from 0.55342 to 0.40593, saving model to weights.h5
Epoch 4/100
23/23 [==============================] - 22s 957ms/step - loss: 0.3733 - val_loss: 0.3364

Epoch 00004: val_loss improved from 0.40593 to 0.33644, saving model to weights.h5
Epoch 5/100
23/23 [==============================] - 22s 974ms/step - loss: 0.3076 - val_loss: 0.2827

Epoch 00005: val_loss improved from 0.33644 to 0.28271, saving model to weights.h5
Epoch 6/100
23/23 [==============================] - 22s 944ms/step - loss: 0.2581 - val_loss: 0.2464

Epoch 00006: val_loss improved from 0.28271 to 0.24639, saving model to weights.h5
Epoch 7/100
23/23 [==============================] - 21s 925ms/step - loss: 0.2277 - val_loss: 0.2170

Epoch 00007: val_loss improved from 0.24639 to 0.21699, saving model to weights.h5
Epoch 8/100
23/23 [==============================] - 22s 978ms/step - loss: 0.2003 - val_loss: 0.1739

Epoch 00008: val_loss improved from 0.21699 to 0.17392, saving model to weights.h5
Epoch 9/100
23/23 [==============================] - 22s 955ms/step - loss: 0.1804 - val_loss: 0.1610

Epoch 00009: val_loss improved from 0.17392 to 0.16098, saving model to weights.h5
Epoch 10/100
23/23 [==============================] - 22s 953ms/step - loss: 0.1736 - val_loss: 0.1381

Epoch 00010: val_loss improved from 0.16098 to 0.13810, saving model to weights.h5
Epoch 11/100
23/23 [==============================] - 22s 936ms/step - loss: 0.1601 - val_loss: 0.1322

Epoch 00011: val_loss improved from 0.13810 to 0.13216, saving model to weights.h5
Epoch 12/100
23/23 [==============================] - 23s 979ms/step - loss: 0.1515 - val_loss: 0.1325

Epoch 00012: val_loss did not improve from 0.13216
Epoch 13/100
23/23 [==============================] - 21s 920ms/step - loss: 0.1412 - val_loss: 0.1103

Epoch 00013: val_loss improved from 0.13216 to 0.11034, saving model to weights.h5
Epoch 14/100
23/23 [==============================] - 22s 969ms/step - loss: 0.1403 - val_loss: 0.0948

Epoch 00014: val_loss improved from 0.11034 to 0.09478, saving model to weights.h5
Epoch 15/100
23/23 [==============================] - 21s 928ms/step - loss: 0.1312 - val_loss: 0.1092

Epoch 00015: val_loss did not improve from 0.09478
Epoch 16/100
23/23 [==============================] - 22s 944ms/step - loss: 0.1278 - val_loss: 0.0946

Epoch 00016: val_loss improved from 0.09478 to 0.09463, saving model to weights.h5
Epoch 17/100
23/23 [==============================] - 23s 995ms/step - loss: 0.1217 - val_loss: 0.0918

Epoch 00017: val_loss improved from 0.09463 to 0.09180, saving model to weights.h5
Epoch 18/100
23/23 [==============================] - 22s 957ms/step - loss: 0.1225 - val_loss: 0.0964

Epoch 00018: val_loss did not improve from 0.09180
Epoch 19/100
23/23 [==============================] - 22s 947ms/step - loss: 0.1128 - val_loss: 0.0882

Epoch 00019: val_loss improved from 0.09180 to 0.08823, saving model to weights.h5
Epoch 20/100
23/23 [==============================] - 22s 965ms/step - loss: 0.1101 - val_loss: 0.0778

Epoch 00020: val_loss improved from 0.08823 to 0.07779, saving model to weights.h5
Epoch 21/100
23/23 [==============================] - 23s 979ms/step - loss: 0.1100 - val_loss: 0.0743

Epoch 00021: val_loss improved from 0.07779 to 0.07433, saving model to weights.h5
Epoch 22/100
23/23 [==============================] - 23s 985ms/step - loss: 0.1076 - val_loss: 0.0667

Epoch 00022: val_loss improved from 0.07433 to 0.06666, saving model to weights.h5
Epoch 23/100
23/23 [==============================] - 21s 932ms/step - loss: 0.1043 - val_loss: 0.0633

Epoch 00023: val_loss improved from 0.06666 to 0.06327, saving model to weights.h5
Epoch 24/100
23/23 [==============================] - 23s 995ms/step - loss: 0.1041 - val_loss: 0.0692

Epoch 00024: val_loss did not improve from 0.06327
Epoch 25/100
23/23 [==============================] - 22s 963ms/step - loss: 0.0938 - val_loss: 0.0710

Epoch 00025: val_loss did not improve from 0.06327
Epoch 26/100
23/23 [==============================] - 22s 970ms/step - loss: 0.0938 - val_loss: 0.0804

Epoch 00026: val_loss did not improve from 0.06327
Epoch 00026: early stopping

<keras.callbacks.callbacks.History at 0x7f2cc5a57f28>
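The stopping point in the log can be reproduced by replaying the callback rules on the logged validation losses. The sketch below is plain Python and relies on our reading of Keras's behavior. With mode='min', EarlyStopping counts an epoch as an improvement only if val_loss < best - min_delta. ModelCheckpoint with save_best_only=True saves whenever val_loss < best. The values are copied from the log above (rounded as printed). `replay_training` is a hypothetical helper, not a Keras function.

```python
# val_loss for epochs 1-26, copied from the training log above
demo_val_loss = [0.99602, 0.55342, 0.40593, 0.33644, 0.28271, 0.24639, 0.21699,
                 0.17392, 0.16098, 0.13810, 0.13216, 0.1325, 0.11034, 0.09478,
                 0.1092, 0.09463, 0.09180, 0.0964, 0.08823, 0.07779, 0.07433,
                 0.06666, 0.06327, 0.0692, 0.0710, 0.0804]


def replay_training(val_losses, min_delta=0.001, patience=3):
    """Return (stop_epoch, best_checkpoint_epoch), both 1-based.

    Early stopping (mode='min'): an epoch is an improvement only if
    val_loss < best - min_delta; stop after `patience` epochs without one.
    Checkpoint (save_best_only): save whenever val_loss < best saved loss.
    """
    es_best, wait = float("inf"), 0
    ckpt_best, ckpt_epoch = float("inf"), None
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < ckpt_best:                 # ModelCheckpoint rule
            ckpt_best, ckpt_epoch = loss, epoch
        if loss < es_best - min_delta:       # EarlyStopping rule
            es_best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, ckpt_epoch
    return len(val_losses), ckpt_epoch


print(replay_training(demo_val_loss))  # (26, 23)
```

With min_delta=0.001, epoch 16 (0.09463) counts as an improvement for the checkpoint but not for early stopping, so the two callbacks can disagree about which epochs "improved".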

8 Detect Objects

Here we load the best weights saved by ModelCheckpoint (weights.h5) and use them to detect cells in an image.

model.load_weights('weights.h5')

image = cv2.imread('/content/BCCD_Dataset/BCCD/JPEGImages/BloodImage_00001.jpg')
#image = cv2.imread('/content/license-plate-dataset/dataset/valid/images/dayride_type1_001.mp4#t=1135.jpg')

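# placeholder for the second model input (true_boxes); it is not used at inference time
dummy_array = np.zeros((1,1,1,1,TRUE_BOX_BUFFER,4))

plt.figure(figsize=(10,10), dpi=100)
# preprocess: resize to the network input size, scale to [0, 1], BGR -> RGB, add a batch axis
input_image = cv2.resize(image, (IMAGE_W, IMAGE_H))
input_image = input_image / 255.
input_image = input_image[:,:,::-1]
input_image = np.expand_dims(input_image, 0)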

# do prediction and decode output to be predicted bounding boxes
netout = model.predict([input_image, dummy_array])

boxes = decode_netout(netout[0], 
                      obj_threshold=0.2,
                      nms_threshold=NMS_THRESHOLD,
                      anchors=ANCHORS, 
                      nb_class=CLASS)
plt.subplot(121)
plt.imshow(image[:,:,::-1]);
plt.title('raw image')
plt.subplot(122)            
image = draw_boxes(image, boxes, labels=LABELS)
plt.imshow(image[:,:,::-1])
plt.title('labeled image')

[Figure: raw blood-cell image (left) and labeled image with detected boxes (right)]
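`decode_netout` is a helper, presumably adapted from keras-yolo2, that is loaded earlier in the notebook; its code is not shown here. Conceptually, `obj_threshold` discards low-confidence boxes, and `nms_threshold` controls non-maximum suppression (NMS). NMS removes duplicate boxes that overlap a higher-scoring one too much. The following is a minimal numpy sketch of this idea using generic, class-agnostic greedy NMS on corner-format boxes. The real helper may differ in details, such as how classes are handled. The boxes and threshold values are illustrative.

```python
import numpy as np


def filter_and_nms(boxes, scores, obj_threshold=0.2, nms_threshold=0.3):
    """Keep boxes with score >= obj_threshold, then apply greedy NMS.

    boxes:  (N, 4) as (xmin, ymin, xmax, ymax)
    scores: (N,) confidence scores
    Returns indices of the kept boxes, highest score first.
    """
    candidates = np.flatnonzero(scores >= obj_threshold)
    order = candidates[np.argsort(scores[candidates])[::-1]]  # sort by score, descending
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # IOU between the current best box and every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(x2 - x1, 0.0) * np.maximum(y2 - y1, 0.0)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        # drop boxes that overlap the kept box by more than nms_threshold
        order = rest[iou <= nms_threshold]
    return keep


demo_boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11],
                       [20, 20, 30, 30], [40, 40, 50, 50]], dtype=float)
demo_scores = np.array([0.9, 0.8, 0.7, 0.1])
print(filter_and_nms(demo_boxes, demo_scores))  # [0, 2]
```

Box 1 overlaps box 0 with an IOU of about 0.68 and is suppressed, and box 3 falls below `obj_threshold`. Raising `nms_threshold` keeps more overlapping boxes, and lowering `obj_threshold` keeps more low-confidence ones.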

Some small fossils in rock look similar to blood cells, so we also try applying the trained Yolo network to detect fossils.

image = cv2.imread('2.png')
#image = cv2.imread('/content/license-plate-dataset/dataset/valid/images/dayride_type1_001.mp4#t=1135.jpg')

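# placeholder for the second model input (true_boxes); it is not used at inference time
dummy_array = np.zeros((1,1,1,1,TRUE_BOX_BUFFER,4))

plt.figure(figsize=(10,10), dpi=100)
# same preprocessing as for the blood-cell image: resize, scale to [0, 1], BGR -> RGB, add a batch axis
input_image = cv2.resize(image, (IMAGE_W, IMAGE_H))
input_image = input_image / 255.
input_image = input_image[:,:,::-1]
input_image = np.expand_dims(input_image, 0)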

netout = model.predict([input_image, dummy_array])

boxes = decode_netout(netout[0], 
                      obj_threshold=0.2,
                      nms_threshold=NMS_THRESHOLD,
                      anchors=ANCHORS, 
                      nb_class=CLASS)
plt.subplot(121)
plt.imshow(image[:,:,::-1]);
plt.title('raw image')
plt.subplot(122)            
image = draw_boxes(image, boxes, labels=LABELS)
plt.imshow(image[:,:,::-1])
plt.title('labeled image')

[Figure: raw fossil image (left) and labeled image with detected boxes (right)]

Exercises

1 Run the Keras code to build a Yolo network and load the pre-trained weights.
2 Use the prepared Blood Cells dataset to train the Yolo network.
3 Use the trained Yolo network to detect cells in an image and observe its performance.
4 Try applying the trained Yolo network to a fossil image.
5 Change the number and sizes of the anchor boxes and compare the resulting performance.
6 Modify the relevant parameters and train Yolo on the license-plate dataset.

Thanks to the keras-yolo2 project on GitHub.

	Image_names	cell_type	xmin	xmax	ymin	ymax
0	BloodImage_00001.jpg	WBC	68	286	315	480
1	BloodImage_00001.jpg	RBC	346	446	361	454
2	BloodImage_00001.jpg	RBC	53	146	179	299
3	BloodImage_00001.jpg	RBC	449	536	400	480
4	BloodImage_00001.jpg	RBC	461	548	132	212
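The rows above give each box by its pixel corners. The following is a minimal sketch of how such a row could be converted into the grid-unit targets used by the loss: the box center and size measured in grid cells, plus the responsible anchor chosen by shape-only IOU, following the Object Anchor rule. The 640x480 image size for BCCD images and the anchor values are assumptions, and `encode_box` is a hypothetical helper. The notebook's BatchGenerator does this internally and may differ in details.

```python
import numpy as np


def encode_box(xmin, xmax, ymin, ymax, img_w, img_h, grid_w=13, grid_h=13, anchors=None):
    """Convert a pixel-corner box into YOLO grid-unit targets.

    Returns (grid_x, grid_y, center_xy, wh, best_anchor); center_xy and wh are
    in grid-cell units, the frame used by true_box_xy / true_box_wh.
    """
    cell_w, cell_h = img_w / grid_w, img_h / grid_h  # pixels per grid cell
    center = np.array([(xmin + xmax) / 2 / cell_w, (ymin + ymax) / 2 / cell_h])
    wh = np.array([(xmax - xmin) / cell_w, (ymax - ymin) / cell_h])
    grid_x, grid_y = int(np.floor(center[0])), int(np.floor(center[1]))  # containing cell

    best_anchor = None
    if anchors is not None:
        # shape-only IOU: anchor and box are compared as if they shared a center
        inter = np.minimum(anchors[:, 0], wh[0]) * np.minimum(anchors[:, 1], wh[1])
        union = anchors[:, 0] * anchors[:, 1] + wh[0] * wh[1] - inter
        best_anchor = int(np.argmax(inter / union))
    return grid_x, grid_y, center, wh, best_anchor


# First row of the table above (WBC), assuming a 640x480 image
demo_anchors = np.array([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]])  # hypothetical anchors
demo_gx, demo_gy, demo_center, demo_wh, demo_k = encode_box(
    68, 286, 315, 480, img_w=640, img_h=480, anchors=demo_anchors)
print(demo_gx, demo_gy, demo_k)           # 3 10 2
print(np.round(demo_center, 3).tolist())  # [3.595, 10.766]
print(np.round(demo_wh, 3).tolist())      # [4.428, 4.469]
```

The anchor index selects which of the BOX slots in cell (grid_y, grid_x) receives $\hat{C}=1$. All other anchors of that box are then no-object or neglected anchors, depending on their best IOU.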