Every now and again things align so perfectly you can’t not do something.
- In the past I’ve dabbled with AI, searching for Lego bricks with limited success.
- I had recently discovered Unity’s Perception package for generating labelled synthetic “photo-realistic” images for AI / deep learning.
- I’m familiar with Unity, having used it to create a VR game.
- I had also been playing with Unity’s Lego Microgame package.
Then, on a recent NVIDIA AI Experts webinar for my day job, I took on board one major takeaway: more data and bigger models tend to improve accuracy more than tuning a particular model’s parameters.
It is well known that labelling images can be the most time-consuming part of an AI project. Inspired by the above, I set out to use my Unity game development skills and the Lego part assets from their Microgame sample project to rapidly create a large set of labelled images of scattered Lego bricks for AI training.
A “fake” dataset in place of the real one I lack the time to make.
Volume, Variety, Veracity
Randomisation is the key here. We can randomise many factors to create a huge volume of varied data. In my simulation I drop a random number (50-150) of randomly coloured (from 45 Lego colours) random Lego parts (74 unique part models) from random positions and starting velocities onto a randomised floor texture (99 textures).
I then randomise the lighting conditions and direction, as well as the camera position and orientation, within constraints that ensure a good number of the simulated parts are captured in the frame. The number of unique images possible is limited only by the computer’s ability to randomise.
Plus, as the labelling data is captured from within the rendering pipeline, the labels have pixel-perfect veracity.
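To make the randomisation concrete, here is a minimal Python sketch of the per-frame parameter sampling described above. The real project does this with Unity Perception randomizers in C#; only the ranges (50-150 bricks, 45 colours, 74 part models, 99 floor textures) come from the text, while the function and field names and the numeric ranges for positions, lighting and camera are purely illustrative.

```python
import random
from dataclasses import dataclass

# Domain sizes taken from the text; everything else is illustrative.
NUM_COLOURS = 45        # Lego colours
NUM_PART_MODELS = 74    # unique Lego part models
NUM_FLOOR_TEXTURES = 99

@dataclass
class SceneConfig:
    bricks: list            # (part_id, colour_id, position, velocity) per brick
    floor_texture: int
    light_intensity: float
    light_direction: tuple
    camera_position: tuple
    camera_look_at: tuple

def sample_scene(rng: random.Random) -> SceneConfig:
    """Sample one randomised scene, mirroring the randomisers described above."""
    brick_count = rng.randint(50, 150)
    bricks = [
        (
            rng.randrange(NUM_PART_MODELS),                               # which part model
            rng.randrange(NUM_COLOURS),                                   # which Lego colour
            (rng.uniform(-1, 1), rng.uniform(2, 4), rng.uniform(-1, 1)),  # drop position
            (rng.uniform(-0.5, 0.5), 0.0, rng.uniform(-0.5, 0.5)),        # starting velocity
        )
        for _ in range(brick_count)
    ]
    return SceneConfig(
        bricks=bricks,
        floor_texture=rng.randrange(NUM_FLOOR_TEXTURES),
        light_intensity=rng.uniform(0.5, 1.5),
        light_direction=(rng.uniform(-1, 1), -1.0, rng.uniform(-1, 1)),
        camera_position=(rng.uniform(-2, 2), rng.uniform(1, 3), rng.uniform(-2, 2)),
        camera_look_at=(0.0, 0.0, 0.0),  # keep most of the dropped bricks in frame
    )

# Each call yields a new labelled-scene recipe; generate as many as you like.
rng = random.Random(42)
scenes = [sample_scene(rng) for _ in range(5)]
```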
Training
Training was quite straightforward. Working on the notion that a larger model with more data is better, I chose a resnet50 architecture – about the largest my machine can cope with. The concept of finding Lego bricks isn’t much different from any other object detection task, so I borrowed heavily from a notebook from Kaggle, adapting it to handle multiple labels and wrangling my data into a similar shape to the example’s.
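This is not the exact Kaggle notebook I adapted, just a minimal sketch of the kind of setup involved: torchvision’s Faster R-CNN with a pre-trained ResNet-50 backbone, its box-prediction head swapped out for the Lego part classes. The class count and training loop are placeholders.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Placeholder: one class per Lego part model, plus background.
NUM_CLASSES = 74 + 1

def build_model(num_classes: int = NUM_CLASSES):
    # Transfer learning: start from a detector pre-trained on COCO.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the classification head with one sized for the Lego part classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

def train_one_epoch(model, data_loader, optimizer, device):
    """Minimal training loop; data_loader must yield (images, targets) in
    torchvision detection format: targets with 'boxes' and 'labels' tensors."""
    model.train()
    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)   # returns a dict of losses in train mode
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = build_model().to(device)
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.005, momentum=0.9
)
```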
After a few hours of transfer learning from the pre-trained architecture I already have a model that can detect a fair number of parts with reasonable accuracy. I’ve commented before that a brick finder doesn’t need to find every brick, just the most probable detections for a given part, so I am reasonably happy with my early progress.
What I’m most pleased with is that it works quite well inferencing on real-world photos, as can be seen in my video. For the next stage I would like to deploy this, or an improved model, with something like TorchServe, and then point a mobile app (likely built in Unity) at it to get a real-world brick finder.
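For completeness, here is a small sketch of what that real-world inference looks like, assuming the torchvision detector from the earlier snippet; the confidence threshold and file path are illustrative.

```python
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def detect_bricks(model, image_path: str, score_threshold: float = 0.5, device="cpu"):
    """Run the trained detector on a real-world photo and keep confident detections."""
    model.eval().to(device)
    image = to_tensor(Image.open(image_path).convert("RGB")).to(device)
    with torch.no_grad():
        output = model([image])[0]  # torchvision detectors take a list of images
    keep = output["scores"] >= score_threshold
    return {
        "boxes": output["boxes"][keep].cpu(),    # [x1, y1, x2, y2] per detection
        "labels": output["labels"][keep].cpu(),  # Lego part class indices
        "scores": output["scores"][keep].cpu(),
    }

# Example usage (path is a placeholder):
# results = detect_bricks(model, "photos/desk_bricks.jpg", score_threshold=0.7)
```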
Scale up
I used my gaming laptop for these experiments, but the concepts scale up beautifully. The process of generating synthetic data is embarrassingly parallel: where it took my laptop about 6 hours to generate 100,000 images, a cluster of more powerful servers could generate many more in much less time. Unity can even push your simulation to multiple worker nodes in the cloud right from within the editor.
Plus, AI training servers with more, and more powerful, GPUs than my laptop’s RTX chip are accessible. Whilst few of us can afford a system on the scale of one of Tesla’s supercomputers, a cloud instance the size of NVIDIA’s flagship DGX A100 server can be rented for $32.77 an hour.
Scale of course costs money, but compared to the cost of labelling a huge dataset the compute is the easy part.
Fake it?
I think “synthetic” data can be a good starting point on one’s AI journey, as well as for rapid prototyping. The volume of good-quality data is almost like a cheat code, getting you to training and testing models for your own use case much sooner. Yes, there are plenty of large sample datasets you can download and run code against, but I feel you learn a lot more diving into your own project; data labelling is usually the largest hurdle there.
Before investing time or considerable money in generating a real-world dataset, I would encourage you to consider the benefits of synthetic data. Prove the concept on a “fake” dataset, gain experience, then augment it with real data once you’re confident your idea has legs.