Active Geo-Localization (AGL) within a goal-reaching reinforcement learning (RL) context.
(a) AGL focuses on localizing a target (goal) within a predefined search area (environment), presented in bird's-eye view, by navigating the agent towards it. At a given time, the agent observes a state, i.e., a patch representing a limited observation of the environment, and selects an action, i.e., a decision that modifies the agent's position and the observed state (a minimal sketch of this loop follows the caption).
(b) The location of the goal is unknown during inference, but its content can be described in various modalities:
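A minimal, illustrative sketch of this observe-act loop, assuming the search area is split into a regular grid of patches and each action moves the agent by one cell; the class and method names (AGLGridEnv, observe, step) are hypothetical and not part of the GeoExplorer code.

```python
import numpy as np

class AGLGridEnv:
    """Search area as a grid_size x grid_size grid of patches; one cell per action."""

    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, search_area, grid_size, start_cell, goal_cell):
        self.area = search_area                      # bird's-eye image, shape (H, W, 3)
        self.grid = grid_size
        self.patch_h = search_area.shape[0] // grid_size
        self.patch_w = search_area.shape[1] // grid_size
        self.pos = start_cell                        # (row, col) of the agent
        self.goal = goal_cell                        # unknown to the agent at inference

    def observe(self):
        """State s_t: the image patch at the agent's current cell."""
        r, c = self.pos
        return self.area[r * self.patch_h:(r + 1) * self.patch_h,
                         c * self.patch_w:(c + 1) * self.patch_w]

    def step(self, action):
        """Apply an action, stay inside the grid, and report whether the goal is reached."""
        dr, dc = self.ACTIONS[action]
        r = int(np.clip(self.pos[0] + dr, 0, self.grid - 1))
        c = int(np.clip(self.pos[1] + dc, 0, self.grid - 1))
        self.pos = (r, c)
        return self.observe(), self.pos == self.goal
```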
In this work, we introduce GeoExplorer, an AGL agent that:
The learning process can be divided into three sequential stages: feature representation, Action-State Dynamics Modeling (DM), and Curiosity-Driven Exploration (CE).
(a) Feature Representation. The environment (s_t) and goal (s_goal) are encoded with different but aligned encoders, according to their modalities (e.g., aerial images (I_goal), ground-level images (G_goal), or text (T_goal)).
(b) Action-State Dynamics Modeling. A causal Transformer is trained to jointly capture action-state dynamics, guided by supervision from generated action-state trajectories for environment modeling.
(c) Curiosity-Driven Exploration. Based on the state prediction from (b), a curiosity-driven intrinsic reward (r_in) encourages the agent to explore the environment by measuring the difference between the predicted and the observed state at each step.
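A minimal sketch of how such a curiosity signal can be computed from the dynamics model's prediction error, assuming the model outputs a predicted next-state embedding from the action-state history; the function and argument names below are illustrative, not the released implementation.

```python
import torch

def intrinsic_reward(dynamics_model, state_embs, actions, next_state_emb):
    """r_in: prediction error between the predicted and the observed next-state embedding."""
    with torch.no_grad():
        pred_next = dynamics_model(state_embs, actions)   # predicted embedding, shape (B, D)
    # Poorly predicted (surprising) observations yield a larger intrinsic reward,
    # pushing the agent towards unexplored or semantically novel patches.
    return ((pred_next - next_state_emb) ** 2).sum(dim=-1)
```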
Our proposed SwissView dataset is constructed from Swisstopo’s SWISSIMAGE 10cm imagery, with two distinct components:
We evaluate GeoExplorer in four settings:
Average success rate of GeoExplorer and the baseline over start-goal distances from C=4 to C=8 on the validation set (Masa dataset) and in cross-domain generalization (xBD and SwissView100 datasets).
Average success rate of GeoExplorer and the baseline over start-goal distances from C=4 to C=8 in cross-modal generalization (MM-GAG dataset). Green, blue, and yellow denote aerial image, ground-level image, and text as the goal, respectively.
Average success rate of GeoExplorer and the baseline for C=4, C=5, and C=6 on the SwissViewMonuments dataset: (a) aerial image as the goal; (b) ground-level image as the goal.
To provide insights into the intrinsic reward and its impact on exploration, we design the following analyses and visualizations:
Generated path visualization on the SwissViewMonuments dataset. Given one {start (◦), goal (△)} pair per search area, the models generate four trials with a stochastic policy, shown in four different colors. Compared with the baseline, the paths generated by GeoExplorer are more robust (adapted to varied environments), diverse (different paths for the same {start, goal} pair), and content-aware (related to state observations).
Statistics of path endpoints and visited patches on the Masa dataset. (a) Path endpoints: we count the end locations of the 895 paths in the Masa test set for the ground truth (goal location), GOMAA-Geo, and GeoExplorer at C=4 and C=8. (b) Visited patches: we count all patches visited along the 895 paths in the Masa test set for GOMAA-Geo and GeoExplorer at C=4 and C=8.
Intrinsic reward visualization with images from the SwissViewMonuments dataset. For each sample, from left to right: the search area, the path visualization, and the intrinsic reward per patch. The patch with the highest intrinsic reward is highlighted with an orange rectangle in the search area. Patches with higher intrinsic reward turn out to be more “interesting”, i.e., their semantic content can hardly be predicted from the surrounding patches.
@misc{mi2025geoexplorer,
title = {GeoExplorer: Active Geo-localization with Curiosity-Driven Exploration},
author = {Mi, Li and Béchaz, Manon and Chen, Zeming and Bosselut, Antoine and Tuia, Devis},
url = {https://limirs.github.io/GeoExplorer/},
year = {2025}
}