One transfer learning approach that has gained a wide popularity lately is attribute-based zero-shot learning. Its goal is to learn novel classes that were never seen during the training stage. The classical route towards realizing this goal is to incorporate a prior knowledge, in the form of a semantic embedding of classes, and to learn to predict classes indirectly via their semantic attributes. Despite the amount of research devoted to this subject lately, no known algorithm has yet reported a predictive accuracy that could exceed the accuracy of supervised learning with very few training examples. For instance, the direct attribute prediction (DAP) algorithm, which forms a standard baseline for the task, is known to be as accurate as supervised learning when as few as two examples from each hidden class are used for training on some popular benchmark datasets! In this paper, we argue that this lack of significant results in the literature is not a coincidence; attribute-based zero-shot learning is fundamentally an ill-posed strategy. The key insight is the observation that the mechanical task of predicting an attribute is, in fact, quite different from the epistemological task of learning the “correct meaning” of the attribute itself. This renders attribute-based zero-shot learning fundamentally ill-posed. In more precise mathematical terms, attribute-based zero-shot learning is equivalent to the mirage goal of learning with respect to one distribution of instances, with the hope of being able to predict with respect to any arbitrary distribution. We demonstrate this overlooked fact on some synthetic and real datasets. The data and software related to this paper are available at https://mine. kaust.edu.sa/Pages/zero-shot-learning.aspx. © Springer International Publishing AG 2016.