
The Segment Anything Model (SAM) is a prompt-driven foundation model for image segmentation. Users specify regions of interest interactively through prompts, which makes segmentation flexible across diverse datasets. Recent research has explored automating SAM’s prompting process to achieve semantic segmentation without human intervention. While this approach works well on standard image datasets, SAM struggles to generalize in zero-shot segmentation tasks in uncommon data domains such as seismic or medical imaging. These domains feature highly specialized patterns that differ significantly from natural images, making it difficult for SAM to generate meaningful segmentations without manual guidance.
To address this problem, we propose Human-Initiated Prompt Optimization (HIPPO), a novel method that enhances SAM’s segmentation capabilities in specialized domains. HIPPO leverages human input to establish an effective prompting strategy, allowing for better segmentation of seismic facies with minimal human effort.
The HIPPO framework begins with a human expert providing an initial set of prompts on an exemplar image containing at least one instance of the geological feature of interest. Unlike traditional manual prompting, HIPPO introduces an optimization process that evaluates the effectiveness of each prompt. It assigns necessity and sufficiency scores to each inclusion and exclusion point to determine which prompts contribute most to accurate segmentation. The goal is to find the smallest set of prompts that achieves the highest mean Intersection over Union (mIoU), a common metric for segmentation accuracy. This optimization lets SAM segment the desired geological body with as few prompts as possible, reducing the effect of over-prompting.
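
The prompt-scoring step can be illustrated with a minimal sketch. The abstract does not define the necessity and sufficiency scores, so the definitions below are assumptions made for illustration: necessity is taken as the IoU lost when a prompt is removed from the full set, and sufficiency as the IoU obtained from that prompt alone. The names `toy_segment`, `score_prompts`, and `smallest_best_subset` are hypothetical, and `toy_segment` is a simple stand-in for SAM rather than a call into the real model. The exhaustive subset search is only practical for a handful of prompts.

```python
from itertools import combinations

import numpy as np


def iou(pred, target):
    """Intersection over Union of two boolean masks."""
    union = np.logical_or(pred, target).sum()
    return float(np.logical_and(pred, target).sum() / union) if union else 1.0


def toy_segment(prompts, shape=(32, 32), radius=6):
    """Hypothetical stand-in for SAM: the mask is the union of discs around
    inclusion points (label 1) minus discs around exclusion points (label 0)."""
    yy, xx = np.mgrid[: shape[0], : shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for y, x, label in prompts:
        if label == 1:
            mask |= (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
    for y, x, label in prompts:
        if label == 0:
            mask &= ~((yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2)
    return mask


def score_prompts(prompts, target, segment):
    """Return one (necessity, sufficiency) pair per prompt.

    Assumed definitions (not taken from the source):
      necessity   = IoU(all prompts) - IoU(all prompts except this one)
      sufficiency = IoU(this prompt alone)
    """
    full = iou(segment(prompts), target)
    scores = []
    for i, prompt in enumerate(prompts):
        without = prompts[:i] + prompts[i + 1:]
        necessity = full - iou(segment(without), target)
        sufficiency = iou(segment([prompt]), target)
        scores.append((necessity, sufficiency))
    return scores


def smallest_best_subset(prompts, target, segment):
    """Exhaustively search for the smallest subset with the highest IoU.

    Subsets are visited in order of increasing size and only a strict
    improvement replaces the current best, so ties go to the smaller set.
    """
    best, best_iou = [], -1.0
    for k in range(1, len(prompts) + 1):
        for subset in combinations(prompts, k):
            score = iou(segment(list(subset)), target)
            if score > best_iou + 1e-9:
                best, best_iou = list(subset), score
    return best, best_iou


# Two useful inclusion points and one exclusion point that changes nothing.
target = toy_segment([(10, 10, 1), (20, 20, 1)])
prompts = [(10, 10, 1), (20, 20, 1), (2, 30, 0)]
print(score_prompts(prompts, target, toy_segment))
print(smallest_best_subset(prompts, target, toy_segment))
```

In this toy setting the exclusion point receives zero necessity and zero sufficiency and is dropped from the selected subset, which mirrors the stated aim of avoiding over-prompting. With a real model, `segment` would wrap the actual SAM prediction call.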

Once the optimal prompts are identified, they are transferred to new images in the dataset. To achieve this, HIPPO matches image embeddings between the exemplar image and the new input image. By computing the cosine similarity between embeddings, HIPPO locates the regions in the new image that correspond to the originally segmented feature. The optimized prompts are then applied at these locations, enabling automated segmentation of multiple instances of the geological body across different images. This transfer mechanism allows SAM to operate efficiently in seismic facies segmentation without requiring new prompts for every image, significantly reducing the workload for human analysts.
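
The embedding-matching idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: embeddings are assumed to be arrays of shape (channels, height, width), prompt locations are given in embedding-grid coordinates, and each prompt is moved to the single location in the new image whose feature vector has the highest cosine similarity to the exemplar feature at the prompt. The function name `transfer_prompts` is hypothetical.

```python
import numpy as np


def transfer_prompts(exemplar_emb, new_emb, points):
    """Map (row, col) prompt locations from the exemplar grid to the new grid.

    For each point, the exemplar feature vector at that location is compared
    with every location of the new embedding by cosine similarity, and the
    best-matching location is returned as (row, col, similarity).
    """
    channels, height, width = new_emb.shape
    flat = new_emb.reshape(channels, -1)
    # Normalize each location's feature vector to unit length.
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-12)

    matches = []
    for row, col in points:
        query = exemplar_emb[:, row, col]
        query = query / (np.linalg.norm(query) + 1e-12)
        similarity = query @ flat  # cosine similarity, shape (height * width,)
        idx = int(np.argmax(similarity))
        matches.append((idx // width, idx % width, float(similarity[idx])))
    return matches


# Synthetic check: the "new image" is the exemplar shifted by (2, 3) cells.
rng = np.random.default_rng(0)
exemplar = rng.normal(size=(16, 12, 12))
shifted = np.roll(exemplar, shift=(2, 3), axis=(1, 2))
print(transfer_prompts(exemplar, shifted, [(5, 5), (1, 7)]))
```

In practice the matched grid coordinates would still have to be scaled back to pixel coordinates before being passed to the model as prompts, and a similarity threshold could be used to find several instances rather than a single best match; both steps are omitted here.
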
HIPPO offers several key advantages. First, it enhances SAM’s adaptability to specialized domains where standard prompting strategies fail. Second, it reduces human effort by identifying the most effective prompts for a given task, allowing the segmentation process to scale across large datasets. Third, by leveraging image embedding matching, HIPPO allows the optimized prompts to be transferred and reused, making segmentation more efficient and reliable. Most importantly, HIPPO achieves state-of-the-art (SOTA) results when compared to other automated prompting approaches in the literature, demonstrating superior segmentation accuracy with fewer prompts.