Configuration¶
This tutorial covers the configuration system for the displacement calibration workflow.
Philosophy: The config system uses Pydantic for validation at serialization boundaries (see the toy sketch after this list). This means:
- Type-safe configurations with automatic validation
- Clear error messages when something is wrong
- YAML files that map directly to Python objects
- No scattered validation checks throughout the codebase
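To make that concrete, here is a toy Pydantic model (not part of cal_disp, just an illustration of the mechanism) showing the kind of validation every config class in the workflow inherits:

from pydantic import BaseModel, ValidationError

class ToyConfig(BaseModel):
    # Hypothetical stand-in for a real config group
    n_workers: int = 1

try:
    ToyConfig(n_workers="four")  # wrong type -> rejected at construction
except ValidationError as e:
    print(e)  # names the offending field and the expected type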
Quick Start: Minimal Configuration¶
Let's start with the simplest possible config:
from pathlib import Path
from cal_disp.config import CalibrationWorkflow
# Create minimal config - just directories
config = CalibrationWorkflow.create_minimal()
print(config.summary())
Note: This config isn't ready to run yet. It needs input files.
Step 1: Adding Required Input Files¶
Every workflow needs:
- A displacement file (DISP-S1 product)
- A calibration reference grid
- Dynamic ancillary files (DEM, LOS, masks, etc.)
from cal_disp.config import DynamicAncillaryFileGroup, InputFileGroup

# Set up input files
config.input_options = InputFileGroup(
    disp_file=Path("data/OPERA_L3_DISP-S1_T001-000001-IFG_20240101_20240113_v1.0.nc"),
    calibration_reference_grid=Path("data/reference_grid.parquet"),
    frame_id=1,
)

# Set up dynamic ancillaries
config.dynamic_ancillary_file_options = DynamicAncillaryFileGroup(
    dem_file=Path("data/dem.tif"),
    los_east_file=Path("data/los_east.tif"),
    los_north_file=Path("data/los_north.tif"),
)

# Check if ready to run
status = config.validate_ready_to_run()
print(f"Ready to run: {status['ready']}")
if status['errors']:
    print("Errors:")
    for error in status['errors']:
        print(f"  - {error}")
Step 2: Working with YAML Files¶
Configs are typically stored as YAML files. Here's the round-trip:
# Save to YAML
yaml_path = Path("workflow_config.yaml")
config.to_yaml(yaml_path)
print(f"Saved to {yaml_path}")
# View the YAML
print("\nYAML contents:")
print(yaml_path.read_text())
# Load from YAML
loaded_config = CalibrationWorkflow.from_yaml(yaml_path)
print(loaded_config.summary())
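The exact layout of the file is defined by the Pydantic models, but for a config like the one above it will look roughly like this (an illustrative sketch, assuming fields serialize under their attribute names):

work_directory: work
output_directory: output
input_options:
  disp_file: data/OPERA_L3_DISP-S1_T001-000001-IFG_20240101_20240113_v1.0.nc
  calibration_reference_grid: data/reference_grid.parquet
  frame_id: 1
dynamic_ancillary_file_options:
  dem_file: data/dem.tif
  los_east_file: data/los_east.tif
  los_north_file: data/los_north.tif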
Step 3: Worker Configuration¶
Control parallelism and memory usage:
from cal_disp.config import WorkerSettings

# Default settings (auto-detect CPU count)
config.worker_settings = WorkerSettings.create_standard()
print(f"Workers: {config.worker_settings.n_workers}")
print(f"Threads per worker: {config.worker_settings.threads_per_worker}")
print(f"Total threads: {config.worker_settings.total_threads}")

# Custom settings for a specific machine
config.worker_settings = WorkerSettings(
    n_workers=4,
    threads_per_worker=2,
    block_shape=(512, 512),
)
print(f"\nCustom: {config.worker_settings.total_threads} total threads")
Step 4: File Validation¶
Check which files exist before running:
# Check all input files
file_status = config.validate_input_files_exist()
print("File validation results:")
for filename, info in file_status.items():
    status = "✓" if info['exists'] else "✗"
    print(f"  {status} {filename}: {info['path']}")

# Get just the missing files
missing = config.get_missing_files()
if missing:
    print(f"\nMissing files: {', '.join(missing)}")
else:
    print("\nAll files exist!")
Step 5: Static Ancillary Files (Optional)¶
For algorithm parameter overrides, custom databases, and similar files:
from cal_disp.config import StaticAncillaryFileGroup

config.static_ancillary_file_options = StaticAncillaryFileGroup(
    algorithm_parameters_overrides_json=Path("data/custom_params.json"),
    # Add other static files as needed
)
print(config.summary())
Step 6: Directory Management¶
Create output directories and set up logging:
# Create directories
config.create_directories()
print(f"Work directory: {config.work_directory}")
print(f"Output directory: {config.output_directory}")
print(f"Log file: {config.log_file}")
# Set up logging
import logging
logger = config.setup_logging(level=logging.INFO)
logger.info("Workflow initialized")
Complete Example: From Scratch¶
Putting it all together:
# Create a complete config in one go
complete_config = CalibrationWorkflow(
    work_directory=Path("./processing/work"),
    output_directory=Path("./processing/output"),
    input_options=InputFileGroup(
        disp_file=Path("data/disp.nc"),
        calibration_reference_grid=Path("data/ref_grid.parquet"),
        frame_id=1,
    ),
    dynamic_ancillary_file_options=DynamicAncillaryFileGroup(
        dem_file=Path("data/dem.tif"),
        los_east_file=Path("data/los_east.tif"),
        los_north_file=Path("data/los_north.tif"),
    ),
    worker_settings=WorkerSettings(
        n_workers=4,
        threads_per_worker=2,
    ),
    keep_paths_relative=True,  # Keep relative for portability
)

# Save and verify
complete_config.to_yaml("production_config.yaml")
print(complete_config.summary())
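Before saving a config you plan to run unattended, it's worth repeating the readiness check from Step 1 as a one-line guard:

status = complete_config.validate_ready_to_run()
assert status['ready'], f"Config not ready: {status['errors']}"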
Path Resolution¶
By default, paths are resolved to absolute form. Control this with keep_paths_relative:
# Absolute paths (default)
abs_config = CalibrationWorkflow(
    work_directory=Path("./work"),
    output_directory=Path("./output"),
    keep_paths_relative=False,
)
print(f"Work dir: {abs_config.work_directory}")

# Relative paths (for portability)
rel_config = CalibrationWorkflow(
    work_directory=Path("./work"),
    output_directory=Path("./output"),
    keep_paths_relative=True,
)
print(f"Work dir: {rel_config.work_directory}")
Error Handling¶
The config system validates at creation time:
try:
    # This will fail - invalid type for n_workers
    bad_config = CalibrationWorkflow(
        worker_settings=WorkerSettings(n_workers="not a number")
    )
except Exception as e:
    print(f"Validation error: {e}")
# Check readiness before running
incomplete_config = CalibrationWorkflow.create_minimal()
status = incomplete_config.validate_ready_to_run()
if not status['ready']:
    print("Not ready to run!")
    print("Errors:")
    for error in status['errors']:
        print(f"  - {error}")
else:
    print("Ready to run!")
Command-Line Usage¶
Typical workflow from the command line:
# Create a template config
python -c "from cal_disp.config import CalibrationWorkflow; \
CalibrationWorkflow.create_example().to_yaml('config.yaml')"
# Edit config.yaml with your paths
vim config.yaml
# Run the workflow
cal-disp run config.yaml
Tips and Best Practices¶
- Start with create_minimal() or create_example() - don't build configs from scratch
- Use relative paths when configs need to be portable across machines
- Always call validate_ready_to_run() before starting a long job
- Set up logging early with setup_logging() to catch issues
- Version control your YAML configs - they're small and readable
- Use summary() to sanity-check your configuration before running
Common Patterns¶
Pattern 1: Load, Modify, Save¶
# Load existing config
config = CalibrationWorkflow.from_yaml("config.yaml")
# Modify one thing
config.worker_settings.n_workers = 8
# Save as new config
config.to_yaml("config_8workers.yaml")
Pattern 2: Batch Processing¶
# Create configs for multiple frames
base_config = CalibrationWorkflow.from_yaml("base_config.yaml")

for frame_id in [1, 2, 3, 4]:
    config = base_config.model_copy(deep=True)
    config.input_options.frame_id = frame_id
    config.input_options.disp_file = Path(f"data/frame_{frame_id}.nc")
    config.output_directory = Path(f"output/frame_{frame_id}")
    config.to_yaml(f"config_frame_{frame_id}.yaml")
Pattern 3: Conditional Configuration¶
import os

# Different settings for local vs HPC
if os.getenv("SLURM_JOB_ID"):
    # On HPC cluster
    config.worker_settings = WorkerSettings(
        n_workers=int(os.getenv("SLURM_CPUS_PER_TASK", "16")),
        threads_per_worker=1,
    )
else:
    # Local machine
    config.worker_settings = WorkerSettings.create_standard()