Moving MNIST
(10 input frames, 10 output frames)
No Spatial Frame Dependencies
VPN
Robotic Pushing
(2 input frames, 18 output frames)
Validation Set
with seen objects
and unseen actions ​ 
Test Set
with unseen objects
and unseen actions
Continuations from same 2 input frames
No Spatial Frame Dependencies
VPN