PointPrompt: A Visual Prompting Dataset based on the Segment Anything Model

Dataset Description

PointPrompt is the first visual segmentation prompting dataset based on the Segment Anything Model (SAM). It is a comprehensive collection of human-generated prompts across 6000 images spanning 16 image categories and 4 data modalities (natural, seismic, medical, and underwater). The prompting data was collected interactively: for each image, annotators produced a sequence of triplets comprising:

  • Prompts: the spatial locations of the inclusion/exclusion points chosen by the annotator
  • Mask: the segmentation mask produced by SAM for the current set of prompts
  • Score: the Intersection over Union (IoU) between the SAM-generated mask and the ground-truth mask

At each step of the prompting process, the generated mask and associated score were shown to the annotator so they could adapt their strategy or move on to the next image.
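
To make the collection procedure concrete, the sketch below shows one step of such a loop using the official segment-anything Python package. This is a minimal illustration, not code shipped with the dataset: the checkpoint filename is a placeholder, and the iou and prompt_step helpers are our own naming.

    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
        """Intersection over Union between two boolean masks."""
        inter = np.logical_and(pred_mask, gt_mask).sum()
        union = np.logical_or(pred_mask, gt_mask).sum()
        return float(inter) / float(union) if union > 0 else 0.0

    # Load SAM once; the checkpoint filename is a placeholder.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
    predictor = SamPredictor(sam)

    def prompt_step(image, points, labels, gt_mask):
        """One annotation step: 'points' are (x, y) pixel coordinates,
        'labels' are 1 for inclusion points and 0 for exclusion points."""
        predictor.set_image(image)  # image: HxWx3 uint8 RGB array
        masks, _, _ = predictor.predict(
            point_coords=np.asarray(points, dtype=np.float32),
            point_labels=np.asarray(labels, dtype=np.int32),
            multimask_output=False,
        )
        mask = masks[0]             # boolean HxW mask produced by SAM
        score = iou(mask, gt_mask)  # feedback shown to the annotator
        return {"prompts": (points, labels), "mask": mask, "score": score}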

Why is human prompting data relevant?

We compared the segmentation scores obtained by our 48 human annotators against several existing automated prompting methods and found that human prompting is consistently superior. This suggests that humans strategize and adapt their prompts in ways that automated methods cannot yet replicate, and that there is value in understanding the underlying patterns of human visual prompting.
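
The specific automated methods evaluated are described in the paper. As a hypothetical point of comparison, the snippet below implements one common automated strategy, placing a single inclusion point at the centroid of the ground-truth mask; it can be scored with the same prompt_step helper sketched above.

    import numpy as np

    def centroid_prompt(gt_mask: np.ndarray):
        """Automated baseline: one inclusion point at the centroid of the
        ground-truth mask (may fall outside the mask for non-convex shapes)."""
        ys, xs = np.nonzero(gt_mask)
        return [(xs.mean(), ys.mean())], [1]

    # points, labels = centroid_prompt(gt_mask)
    # record = prompt_step(image, points, labels, gt_mask)
    # print(record["score"])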

[Figure: Comparison of Human and Automated Strategies]

Access Code and Dataset

Code: https://github.com/olivesgatech/PointPrompt

Dataset: https://zenodo.org/records/11187949
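
For programmatic access, the files attached to the Zenodo record can be listed through Zenodo's public REST API. A minimal sketch, assuming network access and the requests package:

    import requests

    # Fetch the Zenodo record metadata and list its downloadable files.
    record = requests.get("https://zenodo.org/api/records/11187949", timeout=30).json()
    for f in record["files"]:
        print(f["key"], f["links"]["self"])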
