The 31st International Conference on Multimedia Modeling (MMM2025)

Special Session

Spatial Intelligence in Multimedia Analytics (SpIMA)

Georeferenced multimedia data, such as satellite imagery and video, is vital for Earth Observation, urban computing, and lifelogging. Managing this data is challenging, however, because of its diversity and complexity. Deep learning and multimodal analytics offer promising ways to unlock its potential. This special session focuses on merging spatial information with other data modalities to enrich the original data. It also explores the use of foundation models, such as large language models, in spatial AI to improve interpretability and performance; integrating these models can yield new insights from georeferenced multimedia datasets. Although cross-modal retrieval and image captioning are thriving research areas, their integration into location-based services remains relatively unexplored. In addition, interpretable machine learning techniques are essential for extracting insights from multimodal geospatial data. The SpIMA special session aims to unite the various communities working on georeferenced data and to foster the exchange of ideas and methods.

Topics of interest for the SpIMA special session include, but are not limited to:

Important Dates


Submissions are limited to 12 pages, including all figures, tables, and appendices, and must follow the Springer LNCS style guidelines. Up to 2 additional pages are permitted exclusively for cited references. These requirements match the guidelines for the MMM2025 main conference.