In the past we have used TensorFlow Object Detection to detect sharks, social distancing and squirrels. Detecting objects is fun, and we can build on top of it. Our main task will be to detect the two teams on a soccer field: we will use TensorFlow Object Detection to find the people, and then we'll use unsupervised learning, specifically k-means, to cluster the detected people by shirt color.
We'll start with the regular TensorFlow Object Detection sample and then follow a few steps to build our little project.
This will be our end result:

First thing we'll need to do is modify the method visualize_boxes_and_labels_on_image_array. This will allow us to use a different bounding box color for each team. Although we need to copy-paste the whole method, the change is pretty small:
        '''
        if agnostic_mode:
          box_to_color_map[box] = 'DarkOrange'
        elif track_ids is not None:
          prime_multipler = _get_multiplier_for_color_randomness()
          box_to_color_map[box] = STANDARD_COLORS[
              (prime_multipler * track_ids[i]) % len(STANDARD_COLORS)]
        else:
          box_to_color_map[box] = STANDARD_COLORS[
              classes[i] % len(STANDARD_COLORS)]
        '''
        box_to_color_map[box] = STANDARD_COLORS[team[i]]
We commented out the original color-assignment logic and instead pick the color from a team array, which holds a different number for each team.
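To make the effect of that one line concrete, here is a toy sketch: each k-means label indexes into STANDARD_COLORS, so every player in the same cluster gets the same box color. (The two color names below are just a short excerpt of the list in the API's visualization utils, and the team labels are made up for illustration.)

```python
# Excerpt of the STANDARD_COLORS list from the visualization utils.
STANDARD_COLORS = ['AliceBlue', 'Chartreuse']

# Hypothetical cluster labels as k-means would return them, one per player.
team = [0, 1, 0, 0, 1]

# Each player's box color is simply indexed by their cluster label.
box_colors = [STANDARD_COLORS[t] for t in team]
print(box_colors)  # every player in a cluster shares one color
```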
Then we’ll have our main method which will let us detect the teams. At a high level this method performs the following steps:
- Performs object detection and keeps only the people detections
- Processes the coordinates to build the features for k-means
- Uses k-means to find the clusters
- Displays the image with the teams detected
def detect_team(model, frame, df):
    # The array-based representation of the image will be used later in order
    # to prepare the result image with boxes and labels on it.
    person_class = 1
    original_image = frame
    image_np = frame
    # Actual detection.
    output_dict = run_inference_for_single_image(model, image_np)
    # Keep only the detections classified as "person" (class id 1 in COCO).
    boolPersons = output_dict['detection_classes'] == person_class
    output_dict['detection_scores'] = output_dict['detection_scores'][boolPersons]
    output_dict['detection_classes'] = output_dict['detection_classes'][boolPersons]
    output_dict['detection_boxes'] = output_dict['detection_boxes'][boolPersons]
    r_points = []
    b_points = []
    g_points = []
    for i in output_dict['detection_boxes']:
        # Boxes come back normalized to [0, 1]; convert them to pixel
        # coordinates and crop the person out of the frame.
        new_box = denormalize_coordinates(i, original_image.shape[1], original_image.shape[0])
        im2 = original_image[int(new_box[0]):int(new_box[2]), int(new_box[1]):int(new_box[3]), :]
        # Average each color channel of the crop to get one 3-D point per
        # person. (cv2 frames are BGR, so the channel names are nominal;
        # k-means only needs three consistent features.)
        r_points.append(im2[:, :, 0].mean())
        b_points.append(im2[:, :, 1].mean())
        g_points.append(im2[:, :, 2].mean())
        new_row = {'R': im2[:, :, 0].mean(), 'G': im2[:, :, 1].mean(), 'B': im2[:, :, 2].mean()}
        df = df.append(new_row, ignore_index=True)  # pandas < 2.0; use pd.concat on newer versions
        # print(df.shape)
    if len(output_dict['detection_boxes']) > 1:
        # Cluster the mean colors into two groups, one per team.
        kmeans = KMeans(n_clusters=2, init='k-means++', max_iter=1000, n_init=100, random_state=0)
        y_kmeans = kmeans.fit_predict(df)
        visualize_boxes_and_labels_on_image_array(
            image_np,
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            instance_masks=output_dict.get('detection_masks_reframed', None),
            use_normalized_coordinates=True,
            line_thickness=8,
            team=y_kmeans)
        '''
        fig = plt.figure()
        ax = fig.add_subplot(111, projection='3d')
        ax.scatter(r_points, b_points, g_points, c=y_kmeans)
        plt.show()
        '''
    return image_np
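The function relies on a denormalize_coordinates helper, which isn't part of the Object Detection API. A minimal version, assuming boxes arrive in the API's normalized [ymin, xmin, ymax, xmax] order, could look like this:

```python
def denormalize_coordinates(box, width, height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates."""
    ymin, xmin, ymax, xmax = box
    return [ymin * height, xmin * width, ymax * height, xmax * width]

# A box covering the bottom-right quarter of a 400x200 image:
print(denormalize_coordinates([0.5, 0.5, 1.0, 1.0], 400, 200))  # [100.0, 200.0, 200.0, 400.0]
```

Note the argument order matches the call above: width is original_image.shape[1] and height is original_image.shape[0].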
Another interesting part is how we apply k-means. Since images in NumPy are represented as three-dimensional arrays (one layer per color channel: red, green, blue), we average each layer of a person's crop and get three numbers per person. We feed those three features into k-means and get the clusters.
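To see the idea without any TensorFlow machinery, here is a self-contained toy version: we average the channels of two "reddish" and two "bluish" fake shirt crops, then separate the resulting 3-D points with a naive two-cluster k-means standing in for scikit-learn's KMeans. All the data here is made up for illustration.

```python
def mean_color(crop):
    # crop is a list of (c0, c1, c2) pixels; average each channel.
    n = len(crop)
    return tuple(sum(p[c] for p in crop) / n for c in range(3))

def two_means(points, iters=10):
    # Naive 2-cluster k-means (a stand-in for sklearn's KMeans).
    c0, c1 = points[0], points[-1]
    for _ in range(iters):
        g0, g1 = [], []
        for p in points:
            d0 = sum((a - b) ** 2 for a, b in zip(p, c0))
            d1 = sum((a - b) ** 2 for a, b in zip(p, c1))
            (g0 if d0 <= d1 else g1).append(p)
        if g0:
            c0 = tuple(sum(v) / len(g0) for v in zip(*g0))
        if g1:
            c1 = tuple(sum(v) / len(g1) for v in zip(*g1))
    return [0 if sum((a - b) ** 2 for a, b in zip(p, c0))
               <= sum((a - b) ** 2 for a, b in zip(p, c1)) else 1
            for p in points]

# Two reddish and two bluish fake shirt crops (2x2 pixels each).
crops = [
    [(220, 40, 30), (200, 50, 40), (210, 45, 35), (205, 55, 45)],
    [(215, 42, 33), (198, 48, 38), (207, 44, 36), (202, 52, 42)],
    [(30, 40, 220), (40, 50, 200), (35, 45, 210), (45, 55, 205)],
    [(33, 42, 215), (38, 48, 198), (36, 44, 207), (42, 52, 202)],
]
features = [mean_color(c) for c in crops]  # one 3-D point per "player"
labels = two_means(features)
print(labels)  # the reddish crops share one label, the bluish ones the other
```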
You can also display the k-means visualization by uncommenting these lines:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(r_points, b_points, g_points, c=y_kmeans)
plt.show()

I also added a code snippet that you can use to read a video and generate another video with the detected teams:
import pathlib
from google.colab.patches import cv2_imshow
import cv2

FILE_OUTPUT = "test.avi"
PATH_TO_TEST_IMAGES_DIR = pathlib.Path('models/research/object_detection/test_images/soccer.avi')
vcap = cv2.VideoCapture('models/research/object_detection/test_images/soccer.avi')
# Match the output size to the input stream.
frame_width = int(vcap.get(3))   # cv2.CAP_PROP_FRAME_WIDTH
frame_height = int(vcap.get(4))  # cv2.CAP_PROP_FRAME_HEIGHT
out = cv2.VideoWriter(FILE_OUTPUT, cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'),
                      24, (frame_width, frame_height))
ret, frame = vcap.read()
i = 0
while i < 1:  # raise this limit to process more frames
    ret, frame = vcap.read()
    im = detect_team(detection_model, frame, df)
    # cv2_imshow(im)
    out.write(im)
    i = i + 1
vcap.release()
out.release()
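Note that the loop above only processes a single frame (while i < 1), which is handy for a quick test. To convert a whole clip, the usual pattern is to read until VideoCapture.read() reports the end of the stream. Here is a sketch of that pattern with a stubbed reader and detector so it runs anywhere; in the real script you would call vcap.read() and detect_team instead:

```python
def make_reader(frames):
    # Stand-in for cv2.VideoCapture: read() yields (True, frame)
    # until the stream runs out, then (False, None).
    it = iter(frames)
    def read():
        try:
            return True, next(it)
        except StopIteration:
            return False, None
    return read

read = make_reader(["frame0", "frame1", "frame2"])
processed = []
while True:
    ret, frame = read()
    if not ret:  # end of the video
        break
    processed.append(frame.upper())  # stand-in for detect_team(...) + out.write(...)
print(processed)
```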
Take a look at the video:
You can find the code on this repository.