Starship builds a fleet of robots to deliver packages locally on demand. To achieve this successfully, robots must be safe, kind and fast. But how to achieve this with small computing resources and without expensive sensors such as LIDAR? This is an engineering reality that you have to deal with unless you live in a universe where customers are happy to pay $ 100 for shipping.
For starters, robots start by sensing the world with radar, a multitude of cameras and ultrasound.
However, the challenge is that most of this knowledge is low-level and non-semantic. For example, a robot may sense that an object is ten meters away, but without knowing the category of the object, it is difficult to make a decision about safe driving.
Neural network machine learning is surprisingly useful in converting this unstructured low-level data into higher-level information.
Star robots mostly drive on sidewalks and cross the streets when they need to. This presents a different set of challenges compared to self-driving cars. Highway traffic is more structured and predictable. Cars move in lanes and do not change direction too often, while people often stop abruptly, meander, can be accompanied by a dog on a leash, and do not signal their intentions with turn signals.
To understand the environment in real time, the central component of the robot is an object detection module – a program that enters images and returns a list of object boxes.
That’s all very well, but how do you write such a program?
An image is a large three-dimensional array consisting of countless numbers that represent the intensity of pixels. These values change significantly when the image is taken at night instead of during the day; when the color, scale, or position of an object changes, or when the object itself is shortened or clogged.
For some complex problems, teaching is more natural than programming.
In robot software we have a set of units that can be trained, mostly neural networks, where the code is written by the model itself. The program is presented with a set of weights.
Initially, these numbers are randomly initialized, and program output is also random. Engineers present examples of models of what they would like to predict and ask the network to be better the next time it sees a similar input. By iteratively changing weights, the optimization algorithm searches for programs that predict boundary frames more and more precisely.
However, it is necessary to think deeply about the examples used to train models.
- Should a model be punished or rewarded when she discovers a car in a window reflection?
- What will he do when he discovers a picture of a man on a poster?
- Should a trailer full of cars be marked as a whole or should each of the cars be marked separately?
These are all examples that happened during the construction of the object detection module in our robots.
When teaching a machine, big data is simply not enough. The data collected must be rich and varied. For example, using only uniformly sampled images and tagging them would show many pedestrians and cars, but the model would lack examples of motorcycles or skaters to reliably detect these categories.
The team must pay special attention to difficult examples and rare cases, otherwise the model would not progress. Starship operates in several different countries, and different weather conditions enrich it with examples. Many people were surprised when the starship delivery robots worked during the snowstorm ‘Ema’ in the UK, however, airports and schools remained closed.
At the same time, tagging data requires time and resources. Ideally, it is best to train and improve models with less data. This is where architectural engineering comes into play. We encode prior knowledge into architecture and optimization processes to reduce search space to programs that are more likely in the real world.
In some computer-based applications, such as pixel segmentation, it is useful for the model to know whether the robot is on a sidewalk or a road crossing. To provide a hint, we encode global traces at the image level into neural network architecture; the model then decides whether to use it or not without having to learn it from scratch.
After data engineering and architecture, the model could work well. However, deep learning models require a significant amount of computing power, and this is a big challenge for the team because we cannot take advantage of the most powerful graphics cards on cheap battery-powered robots.
Starship wants our deliveries to be low priced, which means our hardware has to be cheap. That’s why Starship doesn’t use LIDAR (a radar-based detection system that uses laser light) that would make it much easier to understand the world – but we don’t want our customers to pay more than they need to deliver.
The most modern object detection systems published in academic papers work at about 5 frames per second [MaskRCNN], and real-time object detection papers do not report rates significantly higher than 100 FPS [Light-Head R-CNN, tiny-YOLO, tiny-DSOD]. Moreover, these numbers are shown in one figure; however, we need a 360-degree understanding (equivalent to processing approximately 5 individual images).
To provide perspective, Starship models run over 2,000 FPS when measured on a consumer GPU and process a full 360-degree panoramic image in one forward pass. This is equivalent to 10,000 FPS when processing 5 single images with a batch size of 1.
Neural networks are better than humans in many visual problems, although they can still contain errors. For example, the boundary frame may be too wide, reliability too low, or the object may be hallucinating in a place that is actually empty.
Correcting these errors is challenging.
Neural networks are considered to be black boxes that are difficult to analyze and understand. However, to improve the model, engineers must understand failure cases and dive deep into the specifics of what the model has learned.
The model is represented by a set of weights and one can visualize what each specific neuron is trying to detect. For example, the first layers of the Starship network are activated according to standard patterns such as horizontal and vertical edges. The next block of layers reveals more complex textures, while the higher layers reveal car parts and full objects.
Technical debt took on a different meaning with machine learning models. Engineers are continuously improving architectures, optimization processes, and datasets. As a result, the model becomes more accurate. However, changing the detection model for the better does not necessarily guarantee success in the overall behavior of the robot.
There are dozens of components that use the output of object detection models, each of which requires different precision and levels of recollection that are set based on the existing model. However, the new model may work differently in different ways. For example, the probability distribution of the output may be biased to higher values or wider. While average performance is better, they can be worse for a particular group like big cars. To avoid these obstacles, the team calibrates the probabilities and checks the regressions on multiple stratified datasets.
Monitoring trainable software components presents a different set of challenges than monitoring standard software. Little care is taken about lock time or memory usage, as they are mostly constant.
However, the shift of the data set becomes a primary concern – the distribution of data used to train the model differs from that in which the model is currently deployed.
For example, electric scooters driving on sidewalks can suddenly occur. If the model does not take this class into account, the model will have difficulty classifying it correctly. The information derived from the object detection module will not match other sensory information, which will result in seeking help from human operators and thus slowing down delivery.
Neural networks allow Starship robots to be safe at road crossings by avoiding obstacles like cars and on sidewalks by understanding all the different directions people and other obstacles can choose to go.
Starship robots achieve this by using cheap hardware that poses many engineering challenges, but makes robot deliveries a strong reality today. Starship robots make real deliveries seven days a week in several cities around the world, and we are glad to see how our technology continuously brings people increased comfort in their lives.