One of the major problems in content-based image retrieval (CBIR) is the so-called `semantic gap': the difference between the low-level features extracted from images and the high-level `information need' of the user. Bridging this gap can be regarded as a quest for similar `concepts', where a concept is loosely defined as ``what words (or images) stand for, signify, or mean'' [1]. We first seek to establish a metaphysical basis for CBIR. To this end, we examine ontological questions such as `what is similarity?' and `what is an image?' in the context of CBIR, and we investigate them through thought experiments. We argue that the meaning of an image, that is, the concept it stands for, rests on at least three pillars: what can actually be seen in the image (its ontology), convention, and imagination.