Configuration¶
This tutorial covers the configuration system for the displacement calibration workflow.
Philosophy: The config system uses Pydantic for validation at serialization boundaries (see the toy sketch after this list). This means:
- Type-safe configurations with automatic validation
- Clear error messages when something is wrong
- YAML files that map directly to Python objects
- No scattered validation checks throughout the codebase
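To make that concrete, here is a toy Pydantic model (not part of cal_disp, just an illustration of the mechanism) showing the kind of validation every config class in the workflow inherits:

from pydantic import BaseModel, ValidationError

class ToyConfig(BaseModel):
    # Hypothetical stand-in for a real config group
    n_workers: int = 1

try:
    ToyConfig(n_workers="four")  # wrong type -> rejected at construction
except ValidationError as e:
    print(e)  # names the offending field and the expected type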
Quick Start: Minimal Configuration¶
Let's start with the simplest possible config:
from pathlib import Path
from cal_disp.config import CalibrationWorkflow
# Create minimal config - just directories
config = CalibrationWorkflow.create_minimal()
print(config.summary())
Note: This config isn't ready to run yet. It needs input files.
Step 1: Adding Required Input Files¶
Every workflow needs:
- A displacement file (DISP-S1 product)
- A calibration reference grid
- Dynamic ancillary files (DEM, LOS, masks, etc.)
from cal_disp.config import DynamicAncillaryFileGroup, InputFileGroup

# Set up input files
config.input_options = InputFileGroup(
    disp_file=Path("data/OPERA_L3_DISP-S1_T001-000001-IFG_20240101_20240113_v1.0.nc"),
    calibration_reference_grid=Path("data/reference_grid.parquet"),
    frame_id=1,
)

# Set up dynamic ancillaries
config.dynamic_ancillary_file_options = DynamicAncillaryFileGroup(
    dem_file=Path("data/dem.tif"),
    los_east_file=Path("data/los_east.tif"),
    los_north_file=Path("data/los_north.tif"),
)

# Check if ready to run
status = config.validate_ready_to_run()
print(f"Ready to run: {status['ready']}")
if status['errors']:
    print("Errors:")
    for error in status['errors']:
        print(f"  - {error}")
Step 2: Working with YAML Files¶
Configs are typically stored as YAML files. Here's the round-trip:
# Save to YAML
yaml_path = Path("workflow_config.yaml")
config.to_yaml(yaml_path)
print(f"Saved to {yaml_path}")
# View the YAML
print("\nYAML contents:")
print(yaml_path.read_text())
# Load from YAML
loaded_config = CalibrationWorkflow.from_yaml(yaml_path)
print(loaded_config.summary())
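The exact layout of the file is defined by the Pydantic models, but for a config like the one above it will look roughly like this (an illustrative sketch, assuming fields serialize under their attribute names):

work_directory: work
output_directory: output
input_options:
  disp_file: data/OPERA_L3_DISP-S1_T001-000001-IFG_20240101_20240113_v1.0.nc
  calibration_reference_grid: data/reference_grid.parquet
  frame_id: 1
dynamic_ancillary_file_options:
  dem_file: data/dem.tif
  los_east_file: data/los_east.tif
  los_north_file: data/los_north.tif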
Step 3: Worker Configuration¶
Control parallelism and memory usage:
from cal_disp.config import WorkerSettings

# Default settings (auto-detect CPU count)
config.worker_settings = WorkerSettings.create_standard()
print(f"Workers: {config.worker_settings.n_workers}")
print(f"Threads per worker: {config.worker_settings.threads_per_worker}")
print(f"Total threads: {config.worker_settings.total_threads}")

# Custom settings for a specific machine
config.worker_settings = WorkerSettings(
    n_workers=4,
    threads_per_worker=2,
    block_shape=(512, 512),
)
print(f"\nCustom: {config.worker_settings.total_threads} total threads")
Step 4: File Validation¶
Check which files exist before running:
# Check all input files
file_status = config.validate_input_files_exist()
print("File validation results:")
for filename, info in file_status.items():
    status = "✓" if info['exists'] else "✗"
    print(f"  {status} {filename}: {info['path']}")

# Get just the missing files
missing = config.get_missing_files()
if missing:
    print(f"\nMissing files: {', '.join(missing)}")
else:
    print("\nAll files exist!")
Step 5: Static Ancillary Files (Optional)¶
For algorithm parameter overrides, custom databases, and similar files:
from cal_disp.config import StaticAncillaryFileGroup

config.static_ancillary_file_options = StaticAncillaryFileGroup(
    algorithm_parameters_overrides_json=Path("data/custom_params.json"),
    # Add other static files as needed
)
print(config.summary())
Step 6: Directory Management¶
Create output directories and set up logging:
# Create directories
config.create_directories()
print(f"Work directory: {config.work_directory}")
print(f"Output directory: {config.output_directory}")
print(f"Log file: {config.log_file}")
# Set up logging
import logging
logger = config.setup_logging(level=logging.INFO)
logger.info("Workflow initialized")
Complete Example: From Scratch¶
Putting it all together:
# Create a complete config in one go
complete_config = CalibrationWorkflow(
    work_directory=Path("./processing/work"),
    output_directory=Path("./processing/output"),
    input_options=InputFileGroup(
        disp_file=Path("data/disp.nc"),
        calibration_reference_grid=Path("data/ref_grid.parquet"),
        frame_id=1,
    ),
    dynamic_ancillary_file_options=DynamicAncillaryFileGroup(
        dem_file=Path("data/dem.tif"),
        los_east_file=Path("data/los_east.tif"),
        los_north_file=Path("data/los_north.tif"),
    ),
    worker_settings=WorkerSettings(
        n_workers=4,
        threads_per_worker=2,
    ),
    keep_paths_relative=True,  # Keep relative for portability
)

# Save and verify
complete_config.to_yaml("production_config.yaml")
print(complete_config.summary())
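Before saving a config you plan to run unattended, it's worth repeating the readiness check from Step 1 as a one-line guard:

status = complete_config.validate_ready_to_run()
assert status['ready'], f"Config not ready: {status['errors']}"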
Path Resolution¶
By default, paths are resolved to absolute form. Control this with keep_paths_relative:
# Absolute paths (default)
abs_config = CalibrationWorkflow(
    work_directory=Path("./work"),
    output_directory=Path("./output"),
    keep_paths_relative=False,
)
print(f"Work dir: {abs_config.work_directory}")

# Relative paths (for portability)
rel_config = CalibrationWorkflow(
    work_directory=Path("./work"),
    output_directory=Path("./output"),
    keep_paths_relative=True,
)
print(f"Work dir: {rel_config.work_directory}")
Error Handling¶
The config system validates at creation time:
try:
    # This will fail - invalid type for n_workers
    bad_config = CalibrationWorkflow(
        worker_settings=WorkerSettings(n_workers="not a number")
    )
except Exception as e:
    print(f"Validation error: {e}")
# Check readiness before running
incomplete_config = CalibrationWorkflow.create_minimal()
status = incomplete_config.validate_ready_to_run()
if not status['ready']:
    print("Not ready to run!")
    print("Errors:")
    for error in status['errors']:
        print(f"  - {error}")
else:
    print("Ready to run!")
Command-Line Usage¶
Typical workflow from the command line:
# Create a template config
python -c "from cal_disp.config import CalibrationWorkflow; \
CalibrationWorkflow.create_example().to_yaml('config.yaml')"
# Edit config.yaml with your paths
vim config.yaml
# Run the workflow
cal-disp run config.yaml
Tips and Best Practices¶
- Start with create_minimal() or create_example() - don't build configs from scratch
- Use relative paths when configs need to be portable across machines
- Always call validate_ready_to_run() before starting a long job
- Set up logging early with setup_logging() to catch issues
- Version control your YAML configs - they're small and readable
- Use summary() to sanity-check your configuration before running
Common Patterns¶
Pattern 1: Load, Modify, Save¶
# Load existing config
config = CalibrationWorkflow.from_yaml("config.yaml")
# Modify one thing
config.worker_settings.n_workers = 8
# Save as new config
config.to_yaml("config_8workers.yaml")
Pattern 2: Batch Processing¶
# Create configs for multiple frames
base_config = CalibrationWorkflow.from_yaml("base_config.yaml")

for frame_id in [1, 2, 3, 4]:
    config = base_config.model_copy(deep=True)
    config.input_options.frame_id = frame_id
    config.input_options.disp_file = Path(f"data/frame_{frame_id}.nc")
    config.output_directory = Path(f"output/frame_{frame_id}")
    config.to_yaml(f"config_frame_{frame_id}.yaml")
Pattern 3: Conditional Configuration¶
import os

# Different settings for local vs HPC
if os.getenv("SLURM_JOB_ID"):
    # On HPC cluster
    config.worker_settings = WorkerSettings(
        n_workers=int(os.getenv("SLURM_CPUS_PER_TASK", "16")),
        threads_per_worker=1,
    )
else:
    # Local machine
    config.worker_settings = WorkerSettings.create_standard()