Deep Traffic Signs

Real-time road sign assistance

This project is maintained by octaexon


Data

Before we can decide what to do, we need to get some understanding of what we have at our disposal.

Bounding the extent of the challenge

Focussing on the recognition stage for a moment, a comprehensive road sign classifier is a significant undertaking. The number of distinct signs on our roads is incredibly large and ever increasing. For example, here’s a sign we didn’t have on our roads until recently:

Evidently, I need to restrict the scope to keep the project feasible.

GTSDB

As I stated earlier, the core dataset comes from the GTSDB (German Traffic Sign Detection Benchmark). This link downloads the associated zip file, which on the surface is a relatively meaty 1.6 GB. Unfortunately, the files are in the rather inefficient Portable Pixmap (PPM) format: 24-bit-color images whose pixels are stored without any compression. On closer inspection, there are only 900 images, with signs spread across 43 classes. Here is their distribution:
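A distribution like this can be tallied directly from the annotations. The following is a minimal sketch, not necessarily how the original plot was produced. It assumes the annotations come as semicolon-separated lines of the form `filename;left;top;right;bottom;class_id`; the function name `class_distribution` and the sample lines are made up for illustration.

```python
from collections import Counter


def class_distribution(annotation_lines):
    """Count annotated signs per class id.

    Assumes each non-empty line has the form
    'filename;left;top;right;bottom;class_id' (an assumption about the
    annotation layout, to be checked against the downloaded files).
    """
    counts = Counter()
    for line in annotation_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(";")
        counts[int(fields[-1])] += 1  # the class id is the last field
    return counts


# Made-up sample lines for illustration only, not taken from the dataset.
sample = [
    "a.ppm;10;20;50;60;1",
    "a.ppm;100;20;140;60;2",
    "b.ppm;5;5;30;30;1",
    "",
]

counts = class_distribution(sample)
for class_id, n in sorted(counts.items()):
    print(f"class {class_id}: {n} sign(s)")
```

Since an open file yields lines when iterated, the same function can be handed a file object for the real annotation file once its name and layout have been confirmed.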

As its name suggests, this is a benchmarking dataset, so one course of action would be to compare any model I construct against the benchmark. However, I’m not motivated to go down that road, and you can see why if you look inside the dataset: even within the more frequently occurring classes, the environment is not particularly diverse. Of course, road signs occur beside roads, so one shouldn’t expect a magical wonderland of diversity, but:

Certainly, some of the signs are difficult to read, so models that do better on this benchmark are probably better in real life too, but the benchmark error rates are unlikely to hold any precise meaning for real-world applications.

Given the constraints detailed earlier, I find it more interesting (and indeed more feasible) to develop a model restricted to a very small number of signs and attempt to evaluate on live test data in a variety of conditions.

Statement of intent

I picked the following signs:

priority road, give way, speed limit 30, speed limit 50

for several reasons:

With preliminary data at hand, let’s pick out the models.