Intelligent access to multimedia databases for ``naive'' users should probably be based on query formulation by ``intelligent agents''. These agents should ``understand'' the semantics of the source contents, learn user preferences, and deliver to the user a subset of the source contents for further navigation. The goal of such systems should be to enable ``zero-command'' access to the contents while preserving the user's freedom of choice. Such systems should interpret multimedia contents in terms of multiple audiovisual objects (from entire videos down to individual visual or audio objects) and in terms of actions and scenarios. In our project we have developed a method for segmenting images into semantic objects, which works even in the case of still images, where no motion information is available. We use this method, together with user-defined collections of such objects, to facilitate temporal segmentation of videos into semantic granules at multiple levels, from story and sequence down to object, and to characterize story contents. For this purpose, we also use audio information from selected parts of the video. Stories are characterized by a set of visual concepts and words, and semantic similarity between stories is evaluated using information retrieval methods. The system learns user preferences and incrementally builds a user profile, which is used to present relevant stories in an appropriate order. This approach was used to build a mockup of a simple ``push'' engine, which is currently undergoing experimental evaluation.
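
The abstract does not name the specific information retrieval model used to compare stories. As a minimal sketch, assuming each story is represented as a bag of terms (visual concept labels plus words drawn from the audio track) that are TF-IDF weighted and compared by cosine similarity, a standard choice in information retrieval, the computation could look as follows; all names here are hypothetical:

\begin{verbatim}
import math
from collections import Counter

def tfidf_vectors(stories):
    """Build a TF-IDF vector per story.

    `stories` maps a story id to its list of terms, where a term is
    either a detected visual concept label or a word taken from the
    story's audio track (an assumed representation).
    """
    n = len(stories)
    df = Counter()                      # document frequency per term
    for terms in stories.values():
        df.update(set(terms))
    vectors = {}
    for sid, terms in stories.items():
        tf = Counter(terms)
        vectors[sid] = {t: (1 + math.log(c)) * math.log(n / df[t])
                        for t, c in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy usage: two news stories sharing election terms score higher
# than an unrelated sports story.
stories = {"s1": ["anchor", "studio", "election", "vote"],
           "s2": ["election", "candidate", "vote", "crowd"],
           "s3": ["goal", "stadium", "football"]}
vecs = tfidf_vectors(stories)
print(cosine(vecs["s1"], vecs["s2"]))   # > 0, shared terms
print(cosine(vecs["s1"], vecs["s3"]))   # 0.0, disjoint vocabularies
\end{verbatim}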
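
Similarly, the abstract does not detail how the user profile is built or updated. One common possibility, given here purely as an assumption, is a Rocchio-style incremental update that decays the current profile and moves it toward stories the user selects (and away from those skipped), then orders candidate stories by their similarity to the profile. The sketch below reuses the cosine function from the previous fragment, and the parameter values are arbitrary:

\begin{verbatim}
def update_profile(profile, story_vec, liked,
                   alpha=0.9, beta=0.5, gamma=0.2):
    """Rocchio-style incremental profile update (illustrative only).

    The old profile is decayed by `alpha`, then pushed toward the
    term vector of a story the user viewed (`liked` is True) or
    away from one the user skipped.
    """
    new = {t: alpha * w for t, w in profile.items()}
    step = beta if liked else -gamma
    for t, w in story_vec.items():
        new[t] = new.get(t, 0.0) + step * w
    return new

def rank_stories(profile, vectors):
    """Present candidate stories in decreasing profile similarity."""
    return sorted(vectors,
                  key=lambda sid: cosine(profile, vectors[sid]),
                  reverse=True)

# Toy usage: after the user views story s1, related election
# stories move ahead of the unrelated sports story.
profile = {}
profile = update_profile(profile, vecs["s1"], liked=True)
print(rank_stories(profile, vecs))      # ['s1', 's2', 's3']
\end{verbatim}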