Object recognition and pose estimation are two fundamental problems in computer vision. Recognizing objects and their poses (viewpoints) is a critical component of many vision and robotic systems. Multiple viewpoints of an object lie on an intrinsic low-dimensional manifold in the input space (i.e., descriptor space). Different objects captured from the same set of viewpoints have manifolds with a common topology. In this paper we exploit this common topology between object manifolds by learning a low-dimensional latent space that non-linearly maps between a common unified manifold and the object manifold in the input space. The latent space is computed using a supervised embedding approach and is used to jointly infer the category and pose of objects. We empirically validate our model using multiple inference approaches on multiple challenging datasets. We compare our results with the state of the art and report improved category recognition and pose estimation accuracy.
|Original language||English (US)|
|Title of host publication||2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|State||Published - May 23 2016|