ML4Self: Self-supervised and Robust Learning Research Directions

Program Overview

This program develops a unified framework that combines self-supervised learning (SSL) with other learning paradigms, such as federated learning, to learn high-quality representations from unlabeled field data and synthetic data. The learned representations are optimized for properties of the feature space, such as dimensionality and spread, so that they generalize to downstream applications and transfer to large-scale field data. Harnessing unlabeled data at scale in this way aims to reduce annotation costs significantly and to address the transferability gap between synthetic and field data. The program lays the foundation for robust, generalizable frameworks that learn from synthetic data and unlabeled seismic volumes to provide insight into complex subsurface structures under the label-scarce reality of field data.

Objectives

Representation Learning via Pseudo-tasks that Capture Seismic Data Properties: Integrate volumetric constraints into contrastive loss objectives. Develop adaptive and stage-wise SSL losses.
Seismic Representation Analysis: Analyze the properties, such as spread and dimensionality, of learned representation spaces.
Global and Local Collapse Prevention in the Representation Space: Develop regularization techniques including global uniformity and local dispersion penalties tailored to seismic volumetric correlations.
Representation Alignment across Domains for Generalization on Real Data: Develop introspective gradient-based alignment that enforces both direction and magnitude consistency between synthetic and field data.
Federated Learning to Develop Decentralized Training Frameworks.
Model and Data Uncertainty to Combat Label Scarcity.
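
As an illustration of the representation analysis and collapse prevention objectives above, the following minimal sketch computes two common diagnostics on a matrix of embeddings. The program description does not specify which metrics are used. The entropy-based effective rank (as a proxy for dimensionality) and the pairwise Gaussian-potential uniformity score (as a proxy for spread) are assumptions chosen for illustration, and the function names and the temperature t are hypothetical.

```python
import numpy as np


def effective_rank(features: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of an (n_samples, dim) feature matrix.

    Values near 1 suggest the features lie along a single direction
    (dimensional collapse); values near dim suggest all dimensions are used.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    # Normalize the singular values into a probability distribution.
    p = singular_values / (singular_values.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))


def uniformity(features: np.ndarray, t: float = 2.0, eps: float = 1e-12) -> float:
    """Log of the mean Gaussian potential over distinct pairs of unit-norm features.

    Lower (more negative) values indicate features spread more evenly over the
    unit hypersphere; a value near 0 indicates the features have collapsed.
    """
    norms = np.maximum(np.linalg.norm(features, axis=1, keepdims=True), eps)
    z = features / norms
    # For unit vectors, ||zi - zj||^2 = 2 - 2 * <zi, zj>.
    sq_dists = 2.0 - 2.0 * (z @ z.T)
    upper = np.triu_indices(len(z), k=1)  # distinct pairs only
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical embeddings: one well-spread set and one collapsed set.
    spread = rng.normal(size=(256, 16))
    direction = rng.normal(size=(1, 16))
    collapsed = (np.abs(rng.normal(size=(256, 1))) + 0.1) * direction
    for name, feats in [("spread", spread), ("collapsed", collapsed)]:
        print(name, effective_rank(feats), uniformity(feats))
```

In this sketch, either quantity could be monitored during training or, in a differentiable framework, adapted into a regularization penalty. The local dispersion penalties tailored to seismic volumetric correlations named in the objectives are not modeled here.
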

Key Themes

SSL on synthetic data and abundant unlabeled data.
Robust representation learning.
Global and local representation collapse prevention.
Federated learning.
Uncertainty quantification.
Synthetic to real domain adaptation.
Weakly-supervised learning.
Learning with limited labels.
Representation learning.

Participation and Governance

This CRP operates within the ML4Seismic philanthropic partnership framework. Participation requires enrollment as an Executive Member with at least one CRP selected. Partners are invited to:

Participate in CRP-specific workshops and benchmarking initiatives
Collaborate on shared scientific objectives and data challenges
Engage with students and researchers in community events

All research outcomes are shared via open-source repositories and peer-reviewed publications in accordance with ML4Seismic operating guidelines.

Contact

Prof. Ghassan AlRegib (alregib@gatech.edu)