I'm curious how this works on the other side of the warehouse - in shipping. Especially once you start thinking about the games of Tetris that are played when loading the container. Boxes are often not uniform in dimensions or weight, nor should they be stacked in straight columns like they are in the picture if it can be avoided. They need to be interlocked (assuming non-cubic boxes) so that the entire load doesn't come crashing down if it shifts.
I used to work at a factory that was doing this (loading the trucks for shipping with a robot). It took much longer to implement than expected, but eventually worked pretty well. In the beginning, we would test by driving the loaded truck on a typical route and then checking the contents. The first few times were a disaster with boxes thrown all over the trailer. It was very cool to watch the robot work.
That's routinely done by automatic palletizers. Here's a standalone robotic palletizer for mixed pallets.[1]
Here's an entire automated distribution center.[2] This is more traditional automation. Pallets come in, are broken apart, and items put into storage. Then items are assembled to fill orders, and stacked on pallets. They even call it "automated Tetris". They try to stack the items to match a retail store's planogram, so the people doing shelf restocking usually have what they need next on top. Objects are lifted from the bottom, or sometimes grabbed from two sides, so they don't rely on super-strong cardboard.
The forklifts that empty and fill trucks with pallets are still human-driven in that system.
Automated forklifts do exist.[3] They're still rare, though.
All this gear seems to come from more advanced countries that don't have an underclass for cheap labor. New Zealand, Germany, the Scandinavian countries.
Perhaps we can use the golden hammer of deep learning for this? Maybe the input layer would be a 3D tensor with a 1 for the space taken by each box, or the edge of each box, and a zero for everything else. The output layer would be a quality score that would be the result of a simulation of how badly the boxes were damaged given the truck shifting wildly. Then the model, when sufficiently trained, could be used by the packer robot to iterate solutions on the boxes it had for packing to achieve the best predicted quality score?
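To make the proposed input encoding concrete, here is a minimal sketch of the occupancy grid only, not of any model or damage simulation. It assumes boxes are axis-aligned and snapped to integer grid cells; the function names and the `(x, y, z, dx, dy, dz)` box format are made up for illustration.

```python
def occupancy_grid(container, boxes):
    """Build grid[x][y][z] with 1 where a box occupies a cell, else 0.

    container: (size_x, size_y, size_z) in grid cells (hypothetical format).
    boxes: iterable of (x, y, z, dx, dy, dz), corner position plus extents.
    """
    cx, cy, cz = container
    grid = [[[0] * cz for _ in range(cy)] for _ in range(cx)]
    for x, y, z, dx, dy, dz in boxes:
        inside = x >= 0 and y >= 0 and z >= 0
        inside = inside and x + dx <= cx and y + dy <= cy and z + dz <= cz
        if not inside:
            raise ValueError("box sticks out of the container")
        for i in range(x, x + dx):
            for j in range(y, y + dy):
                for k in range(z, z + dz):
                    if grid[i][j][k]:
                        raise ValueError("boxes overlap")
                    grid[i][j][k] = 1
    return grid


def fill_ratio(grid):
    """Fraction of container cells that are occupied."""
    cells = [c for plane in grid for row in plane for c in row]
    return sum(cells) / len(cells)


if __name__ == "__main__":
    # Two boxes in a 4 x 3 x 2 container.
    g = occupancy_grid((4, 3, 2), [(0, 0, 0, 2, 3, 1), (2, 0, 0, 2, 1, 2)])
    print(fill_ratio(g))
```

A tensor like this would be the network input; the quality score would still have to come from a separate physics simulation, which is not sketched here.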
You don't need (unreliable, prone to out-of-distribution breakage) deep learning for this. Bin packing is a well-known problem in the optimization literature, and there are established algorithms for solving it with bounds on the distance to optimality.
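As an illustration of what such an algorithm looks like, here is a sketch of first-fit decreasing, a classic heuristic for the one-dimensional version of the problem. Classical analysis bounds it at roughly 11/9 of the optimal bin count plus a small constant. Truck loading is a 3D problem with weight and stability constraints on top, so treat this as a toy; the function name and the input format are assumptions.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack item sizes into as few bins of the given capacity as the
    first-fit decreasing heuristic manages. Returns a list of bins,
    each bin being a list of item sizes."""
    bins = []
    for size in sorted(sizes, reverse=True):  # largest items first
        if size > capacity:
            raise ValueError("item larger than bin capacity")
        for contents in bins:
            # Put the item into the first bin that still has room.
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([size])
    return bins


if __name__ == "__main__":
    print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
    # prints [[8, 2], [4, 4, 1, 1]]
```

Here the items sum to 20 and each bin holds 10, so two bins is also the optimum for this particular input.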
But when it comes to actually controlling the placement of the boxes in the truck the way the bin packing algorithm wants, you need deep learning, in case the robot slips a bit or changes direction slightly on someone's handkerchief on the floor or whatever.
There exist methods to guarantee robustness within a certain L_p ball on your parameters (in this case the box positions). I'm sure with only a few hundred boxes they would be tractable to solve to optimality.
The actual robotics part is definitely deep learning though.
If you can provide exact coordinates of every box, then yes, mostly* (including correct guesses of where the box can be grabbed without ripping it). Of course exact coordinates are required even when the box is moving. If it shifts when you're grabbing/lifting/putting it down, you still have to complete the action correctly. Or to put it more in engineer wording: the machine learning approach is going to be far more robust to environment changes.
And I would argue that whilst the machine learning way is pretty complex it's still simpler than 3d motion planning of moving robot platforms. And one machine learning solution can adapt to many robots with just retraining, without redoing the formulas from scratch.
* technically on a moving robot platform it hasn't entirely been solved, but good enough solutions do exist.
Ah, right, image recognition for the boxes. But I don't think they would use it for moving the arm.
> And I would argue that whilst the machine learning way is pretty complex it's still simpler than 3d motion planning of moving robot platforms.
On what grounds do you think this? 3d motion planning isn't complex in these scenarios.
> And one machine learning solution can adapt to many robots with just retraining, without redoing the formulas from scratch.
You don't redo the formulas from scratch, you just plug in the specs of the robot and then you have it. Positioning and moving arm parts is a solved problem. Redoing machine learning for every arm seems much more cumbersome.
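A toy version of "plug in the specs": the closed-form inverse kinematics of a planar two-link arm, where the only robot-specific inputs are the two link lengths. Real loading arms have more joints and need collision checking, so this is a sketch under heavy simplification, and the function names are made up for illustration.

```python
import math


def forward(l1, l2, t1, t2):
    """End-effector position of a planar two-link arm with link lengths
    l1, l2 and joint angles t1 (shoulder) and t2 (elbow), in radians."""
    x = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
    y = l1 * math.sin(t1) + l2 * math.sin(t1 + t2)
    return x, y


def inverse(l1, l2, x, y):
    """Joint angles that reach (x, y); returns one of the two elbow
    configurations. Raises ValueError if the target is out of reach."""
    cos_t2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_t2 <= 1.0:
        raise ValueError("target out of reach")
    t2 = math.acos(cos_t2)
    t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2),
                                       l1 + l2 * math.cos(t2))
    return t1, t2


if __name__ == "__main__":
    # Same target, two different arms: only the link lengths change.
    for l1, l2 in [(1.0, 0.8), (2.0, 1.5)]:
        t1, t2 = inverse(l1, l2, 1.2, 0.5)
        print((l1, l2), forward(l1, l2, t1, t2))
```

Swapping in a different arm means changing `l1` and `l2`, not rederiving anything, which is the point being made above.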
Speaking from experience, there are two main strategies:
1) Diligence, care, time, and luck on the part of the loader, carefully stacking boxes so that there isn't any room for them to shift in transit or fall over when unloading.
2) Just throw it all in a big pile and hope nothing breaks.