We present CLIPort, a language-conditioned imitation learning agent that combines the broad semantic understanding of CLIP [1] with the spatial precision of Transporter. Our end-to-end framework is capable of solving a variety of language-specified tabletop tasks, from packing unseen objects to folding cloths, all without any explicit representations of object poses, instance segmentations, history, symbolic states, or syntactic structures.
2021: Mohit Shridhar, Lucas Manuelli, D. Fox
https://arxiv.org/pdf/2109.12098v1.pdf