Multi-view, Multi-instance, and Multi-label Learning (M3L) can model complex objects (bags) that are represented with different feature views, made up of diverse instances, and annotated with discrete, non-exclusive labels. Existing M3L approaches assume a complete correspondence between bags and views, and also assume complete annotations for training. In practice, however, neither the correspondence between bags across views nor the bags' annotations are complete. To tackle this weakly-supervised M3L task, a solution called WSM3L is introduced. WSM3L adapts multimodal dictionary learning to learn a dictionary (representational space) shared across views, together with individual encoding vectors of bags for each view. The label similarity and feature similarity of the encoded bags are jointly used to match bags across views. In addition, WSM3L replenishes the annotations of a bag based on the annotations of its neighboring bags, and introduces a dispatch and aggregation term that dispatches bag-level annotations to instances and, in the reverse direction, aggregates instance-level annotations to bags. WSM3L unifies these objectives and processes in a joint objective function to predict the instance-level and bag-level annotations in a coordinated fashion, and it further introduces an alternative solution for optimizing the objective function. Extensive experimental results show the effectiveness of WSM3L on benchmark datasets.
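As a rough illustration of two ideas mentioned in the abstract, replenishing a bag's annotations from its neighboring bags and aggregating instance-level annotations to the bag level, the following is a minimal sketch and not the WSM3L objective itself. It assumes cosine similarity between encoded bags, a simple k-nearest-neighbor average for replenishment, and max-pooling for aggregation. The function names `replenish_labels` and `aggregate_instances` and the parameter `k` are hypothetical and do not come from the paper.

```python
import numpy as np


def replenish_labels(Y, X, k=1):
    """Fill in missing bag labels from the k most similar bags.

    Y: (n_bags, n_labels) array of 0/1 labels, possibly incomplete.
    X: (n_bags, d) array of encoded bag vectors (assumed given).
    Returns an (n_bags, n_labels) array of scores in [0, 1].
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Cosine similarity between encoded bags (an assumed similarity measure).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)  # a bag is not its own neighbor
    # Indices of the k most similar bags for each bag.
    nbrs = np.argsort(-S, axis=1)[:, :k]
    # Average the neighbors' labels: shape (n_bags, n_labels).
    neighbor_avg = Y[nbrs].mean(axis=1)
    # Keep known positives; add neighbor-supported scores elsewhere.
    return np.maximum(Y, neighbor_avg)


def aggregate_instances(instance_scores):
    """Aggregate instance-level scores to one bag-level score per label.

    instance_scores: (n_instances, n_labels) array for a single bag.
    Uses max-pooling, a common multi-instance assumption (a bag has a
    label if at least one of its instances has it).
    """
    return np.asarray(instance_scores, dtype=float).max(axis=0)


# Toy data: bag 1 has no labels but is very similar to bag 0.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
Y = np.array([[1, 0], [0, 0], [0, 1]])
Y_filled = replenish_labels(Y, X, k=1)
bag_scores = aggregate_instances([[0.2, 0.9], [0.7, 0.1]])
```

In this toy example, bag 1 inherits the first label from its nearest neighbor, bag 0. WSM3L, as described above, learns such quantities jointly within one objective function rather than in separate post-hoc steps like these.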
|Original language|English (US)|
|Title of host publication|Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence|
|Publisher|International Joint Conferences on Artificial Intelligence Organization|
|Number of pages|7|
|State|Published - Jul 2020|