The Video4IMX workshop, sponsored by the TRANSMIXR project and co-organized with the i-Game project, aims to address the increasing importance and relevance of classical (linear 2D), interactive (non-linear), 360° and volumetric video assets in the creation of immersive experiences, in connection with the use of state-of-the-art Generative AI models. Richly granular and semantically expressive descriptive metadata about video assets is necessary for their discovery, adaptation and re-use in immersive content experiences, both in automated ways (e.g. the automatic insertion of a journalist’s video recordings into an immersive experience for a breaking news story) and in semi-automated ways (e.g. creatives searching for and re-using videos as part of a theatrical or cultural immersive experience).
The workshop will solicit the latest research and development in all areas around the extraction, modeling and management of descriptive metadata for video, as well as approaches to adapting or converting video according to its purpose and use in an immersive experience. It aims to support the growth of a community of researchers and practitioners interested in creating an ecosystem of tools, specifications and best practices for video discovery, adaptation, summarization or generation, particularly in the context of video (re-)use in immersive experiences.
Topics for the workshop include, but are not limited to:
• Extraction and modeling of descriptive metadata about traditional 2D video, 360° video and volumetric video (decomposition, semantic representation, categorization, annotation, emotion/mood extraction, etc.)
• Tools and algorithms for the (semi-automatic) adaptation, summarization or remixing of any type of video asset (traditional, interactive, 360°, volumetric), particularly for re-use in immersive content
• Generative AI (foundation vision models, vision-language models) for visual understanding and the extraction of descriptive metadata from traditional, interactive, 360° or volumetric video
• Generative AI for the creation of video assets from other input modalities, such as textual prompts or image sets
• Generative AI for the transformation of, or between, any type of video, such as generating (possibly multimodal) video summaries or converting an input video into immersive content (3D objects or scenes)
• Artificial intelligence and machine learning for volumetric video content analysis, understanding and retrieval, to facilitate XR content generation
• Explainable AI methods for visual content understanding and for immersive multimedia applications (e.g. game design)
• Examples and use cases of video (esp. 360° or volumetric), or of immersive content generated from video, in immersive experiences
• Evaluations of the user experience with video (esp. 360° or volumetric), or with immersive content generated from video, as part of an immersive experience
• Multimedia tools and algorithms for multimodal immersive simulations