# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Spring Boot 3.5.7 CLI application that converts PostgreSQL PostGIS spatial data to ESRI shapefiles and GeoJSON formats. The application uses **Spring Batch** for memory-efficient processing of large datasets (1M+ records) and supports automatic GeoServer layer registration via REST API.
**Key Features**:
- Memory-optimized batch processing (90-95% reduction: 2-13GB → 150-200MB)
- Chunk-based streaming with cursor pagination (fetch-size: 1000)
- Automatic geometry validation and type conversion (MultiPolygon → Polygon)
- Coordinate system validation (EPSG:5186 Korean 2000 / Central Belt)
- Dual execution modes: Spring Batch (recommended) and Legacy mode
## Build and Run Commands
### Build
```bash
./gradlew build # Full build with tests
./gradlew clean build -x test # Skip tests
./gradlew spotlessApply # Apply Google Java Format (2-space indentation)
./gradlew spotlessCheck # Verify formatting without applying
```
Output: `build/libs/shp-exporter.jar` (fixed name, no version suffix)
### Run Application
#### Spring Batch Mode (Recommended)
```bash
# Generate shapefile + GeoJSON
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252"
# With GeoServer registration
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"
# Using JAR (production)
java -jar build/libs/shp-exporter.jar \
  --batch \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.batch-ids[0]=252 \
  --converter.batch-ids[1]=253
```
#### Legacy Mode (Small Datasets Only)
```bash
./gradlew bootRun # No --batch flag
# Warning: May OOM on large datasets
```
#### Upload Shapefile to GeoServer
Set environment variables first:
```bash
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
```
Then upload:
```bash
./gradlew bootRun --args="--upload-shp /path/to/file.shp --layer layer_name"
```
Or using JAR:
```bash
java -jar build/libs/shp-exporter.jar --upload-shp /path/to/file.shp --layer layer_name
```
#### Override Configuration via Command Line
Using Gradle (recommended; avoids shell quoting issues):
```bash
./gradlew bootRun --args="--converter.inference-id=ABC123 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.mode=MERGED"
```
Using JAR with zsh (quote arguments with brackets):
```bash
java -jar build/libs/shp-exporter.jar '--converter.inference-id=ABC123' '--converter.map-ids[0]=35813030'
```
### Code Formatting
Apply Google Java Format (2-space indentation) before committing:
```bash
./gradlew spotlessApply
```
Check formatting without applying:
```bash
./gradlew spotlessCheck
```
### Active Profile
By default, the application runs with `spring.profiles.active=prod` (set in `application.yml`). Profile-specific configurations are in `application-{profile}.yml` files.
## Architecture
### Dual Execution Modes
The application supports two execution modes with distinct processing pipelines:
#### Spring Batch Mode (Recommended)
**Trigger**: `--batch` flag
**Use Case**: Large datasets (100K+ records), production workloads
**Memory**: 150-200MB constant (chunk-based streaming)
**Pipeline Flow**:
```
ConverterCommandLineRunner
→ JobLauncher.run(mergedModeJob)
→ Step 1: GeometryTypeValidationTasklet (validates geometry homogeneity)
→ Step 2: generateShapefileStep (chunk-oriented)
→ JdbcCursorItemReader (fetch-size: 1000)
→ FeatureConversionProcessor (InferenceResult → SimpleFeature)
→ StreamingShapefileWriter (chunk-based append)
→ Step 3: generateGeoJsonStep (chunk-oriented, same pattern)
→ Step 4: CreateZipTasklet (creates .zip for GeoServer)
→ Step 5: GeoServerRegistrationTasklet (conditional, if --geoserver.enabled=true)
→ Step 6: generateMapIdFilesStep (partitioned, sequential map_id processing)
```
**Key Components**:
- `JdbcCursorItemReader`: Cursor-based streaming (no full result set loading)
- `StreamingShapefileWriter`: Opens GeoTools transaction, writes chunks incrementally, commits at end
- `GeometryTypeValidationTasklet`: Pre-validates with SQL `DISTINCT ST_GeometryType()`, auto-converts MultiPolygon
- `CompositeItemWriter`: Simultaneously writes shapefile and GeoJSON in map_id worker step
#### Legacy Mode
**Trigger**: No `--batch` flag (deprecated)
**Use Case**: Small datasets (<10K records)
**Memory**: 1.4-9GB (loads entire result set)
**Pipeline Flow**:
```
ConverterCommandLineRunner
→ ShapefileConverterService.convertAll()
→ InferenceResultRepository.findByBatchIds() (full List<InferenceResult>)
→ validateGeometries() (in-memory validation)
→ ShapefileWriter.write() (DefaultFeatureCollection accumulation)
→ GeoJsonWriter.write()
```
### Key Design Patterns
**Geometry Type Validation & Auto-Conversion**:
- Pre-validation step runs SQL `SELECT DISTINCT ST_GeometryType(geometry)` to detect mixed types
- Supports automatic conversion: `ST_MultiPolygon` → `ST_Polygon` (extracts the first polygon only)
- Fails fast on unsupported mixed types (e.g., Polygon + LineString)
- Validates EPSG:5186 coordinate bounds (X ∈ [125 km, 530 km], Y ∈ [−600 km, 988 km]) and `ST_IsValid()`
- See `GeometryTypeValidationTasklet` (batch/tasklet/GeometryTypeValidationTasklet.java:1-290)
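The pass/convert/fail decision described above can be sketched in plain Java. This is an illustrative model of the logic, not the actual `GeometryTypeValidationTasklet`; the class, enum, and method names are hypothetical:

```java
import java.util.Set;

// Hypothetical sketch of the homogeneity decision on the DISTINCT
// ST_GeometryType() result set; not the real tasklet implementation.
public class GeometryTypeCheck {

  /** Outcome of inspecting the distinct geometry types in a batch. */
  public enum Decision { PASS, CONVERT_MULTIPOLYGON, FAIL }

  public static Decision decide(Set<String> distinctTypes) {
    if (distinctTypes.equals(Set.of("ST_Polygon"))) {
      return Decision.PASS; // Already homogeneous polygons
    }
    if (Set.of("ST_Polygon", "ST_MultiPolygon").containsAll(distinctTypes)
        && distinctTypes.contains("ST_MultiPolygon")) {
      return Decision.CONVERT_MULTIPOLYGON; // Auto-convert to Polygon
    }
    return Decision.FAIL; // e.g. Polygon + LineString: fail fast
  }
}
```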
**WKT to JTS Conversion Pipeline**:
1. PostGIS query returns `ST_AsText(geometry)` as WKT string
2. `GeometryConvertingRowMapper` converts ResultSet row to `InferenceResult` with WKT string (batch/reader/GeometryConvertingRowMapper.java:1-74)
3. `FeatureConversionProcessor` uses `GeometryConverter.parseGeometry()` to convert WKT → JTS Geometry (service/GeometryConverter.java:1-92)
4. `StreamingShapefileWriter` wraps JTS geometry in GeoTools `SimpleFeature` and writes to shapefile
**Chunk-Based Transaction Management** (Spring Batch only):
```java
// StreamingShapefileWriter
@BeforeStep
public void open() {
  transaction = new DefaultTransaction("create");
  featureStore.setTransaction(transaction); // Long-running transaction
}

@Override
public void write(Chunk<SimpleFeature> chunk) {
  ListFeatureCollection collection = new ListFeatureCollection(featureType, chunk.getItems());
  featureStore.addFeatures(collection); // Append chunk to shapefile
  // chunk goes out of scope → GC eligible
}

@AfterStep
public void afterStep() {
  transaction.commit(); // Commit all chunks at once
  transaction.close();
}
```
**PostgreSQL Array Parameter Handling**:
```java
// InferenceResultItemReaderConfig uses PreparedStatementSetter
ps -> {
  Array batchIdsArray = ps.getConnection().createArrayOf("bigint", batchIds.toArray());
  ps.setArray(1, batchIdsArray); // WHERE batch_id = ANY(?)
  ps.setString(2, mapId);
}
```
**Output Directory Strategy**:
- Batch mode (MERGED): `{output-base-dir}/{inference-id}/merge/` → Single merged shapefile + GeoJSON
- Batch mode (map_id partitioning): `{output-base-dir}/{inference-id}/{map-id}/` → Per-map_id files
- Legacy mode: `{output-base-dir}/{inference-id}/{map-id}/` (no merge folder)
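The layout above can be expressed as simple path construction. A minimal sketch using `java.nio.file` (the class and method names are illustrative, not the project's actual path-building code):

```java
import java.nio.file.Path;

// Illustrative sketch of the output directory layout; not the project's code.
public class OutputDirs {

  /** Batch MERGED mode: {output-base-dir}/{inference-id}/merge/ */
  public static Path mergedDir(String baseDir, String inferenceId) {
    return Path.of(baseDir, inferenceId, "merge");
  }

  /** Per-map_id output: {output-base-dir}/{inference-id}/{map-id}/ */
  public static Path mapIdDir(String baseDir, String inferenceId, String mapId) {
    return Path.of(baseDir, inferenceId, mapId);
  }
}
```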
**GeoServer Registration**:
- Only shapefile ZIP is uploaded (GeoJSON not registered)
- Requires pre-created workspace 'cd' and environment variables for auth
- Conditional execution via JobParameter `geoserver.enabled`
- Non-blocking: failures logged but don't stop batch job
## Configuration
### Profile System
- Default profile: `prod` (set in application.yml)
- Configuration hierarchy: `application.yml` → `application-{profile}.yml`
- Override via: `--spring.profiles.active=dev`
### Key Configuration Properties
**Converter Settings** (`ConverterProperties.java`):
```yaml
converter:
  inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6' # Output folder name
  batch-ids: [252, 253, 257] # PostgreSQL batch_id filter (required)
  map-ids: [] # Legacy mode only (ignored in batch mode)
  mode: 'MERGED' # Legacy mode only: MERGED, MAP_IDS, or RESOLVE
  output-base-dir: '/data/model_output/export/'
  crs: 'EPSG:5186' # Korean 2000 / Central Belt
  batch:
    chunk-size: 1000 # Records per chunk (affects memory usage)
    fetch-size: 1000 # JDBC cursor fetch size
    skip-limit: 100 # Max skippable records per chunk
    enable-partitioning: false # Future: parallel map_id processing
```
**GeoServer Settings** (`GeoServerProperties.java`):
```yaml
geoserver:
  base-url: 'https://kamco.geo-dev.gs.dabeeo.com/geoserver'
  workspace: 'cd' # Must be pre-created in GeoServer
  overwrite-existing: true # Delete existing layer before registration
  connection-timeout: 30000 # 30 seconds
  read-timeout: 60000 # 60 seconds
  # Credentials from environment variables (preferred):
  #   GEOSERVER_USERNAME, GEOSERVER_PASSWORD
```
**Spring Batch Metadata**:
```yaml
spring:
  batch:
    job:
      enabled: false # Prevent auto-run on startup
    jdbc:
      initialize-schema: always # Auto-create BATCH_* tables
```
## Database Integration
### Query Strategies
**Spring Batch Mode** (streaming):
```sql
-- InferenceResultItemReaderConfig.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) AS geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?)
  AND ST_GeometryType(geometry) IN ('ST_Polygon', 'ST_MultiPolygon')
  AND ST_SRID(geometry) = 5186
  AND ST_X(ST_Centroid(geometry)) BETWEEN 125000 AND 530000
  AND ST_Y(ST_Centroid(geometry)) BETWEEN -600000 AND 988000
  AND ST_IsValid(geometry) = true
ORDER BY map_id, uid
-- Uses server-side cursor with fetch-size=1000
```
**Legacy Mode** (full load):
```sql
-- InferenceResultRepository.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) AS geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?) AND map_id = ?
-- Returns the full List<InferenceResult> in memory
```
**Geometry Type Validation**:
```sql
-- GeometryTypeValidationTasklet.java
SELECT DISTINCT ST_GeometryType(geometry)
FROM inference_results_testing
WHERE batch_id = ANY(?) AND geometry IS NOT NULL
-- Pre-validates homogeneous geometry requirement
```
### Field Mapping
Database columns map to shapefile fields (10-character limit):
| Database Column | DB Type | Shapefile Field | Shapefile Type | Notes |
|-----------------|---------|-----------------|----------------|-------|
| uid | uuid | chnDtctId | String | Change detection ID |
| map_id | text | mpqd_no | String | Map quadrant number |
| probability | float8 | chn_dtct_p | Double | Change detection probability |
| before_year | bigint | cprs_yr | Long | Comparison year |
| after_year | bigint | crtr_yr | Long | Criteria year |
| before_c | text | bf_cls_cd | String | Before classification code |
| before_p | float8 | bf_cls_pro | Double | Before classification probability |
| after_c | text | af_cls_cd | String | After classification code |
| after_p | float8 | af_cls_pro | Double | After classification probability |
| geometry | geom | the_geom | Polygon | Geometry in EPSG:5186 |
**Field name source**: See `FeatureTypeFactory.java` (batch/util/FeatureTypeFactory.java:1-104)
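The mapping in the table can be captured as a lookup, with the DBF 10-character limit checked programmatically. A sketch for reference (field names are taken from the table above; `FeatureTypeFactory.java` remains the authoritative source, and the class here is hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reference sketch of the column-to-field mapping; names copied from the
// field-mapping table. Shapefile DBF field names are limited to 10 characters.
public class FieldMapping {

  public static final Map<String, String> DB_TO_SHP = new LinkedHashMap<>();
  static {
    DB_TO_SHP.put("uid", "chnDtctId");
    DB_TO_SHP.put("map_id", "mpqd_no");
    DB_TO_SHP.put("probability", "chn_dtct_p");
    DB_TO_SHP.put("before_year", "cprs_yr");
    DB_TO_SHP.put("after_year", "crtr_yr");
    DB_TO_SHP.put("before_c", "bf_cls_cd");
    DB_TO_SHP.put("before_p", "bf_cls_pro");
    DB_TO_SHP.put("after_c", "af_cls_cd");
    DB_TO_SHP.put("after_p", "af_cls_pro");
    DB_TO_SHP.put("geometry", "the_geom");
  }

  /** True if every shapefile field name fits the 10-character DBF limit. */
  public static boolean allWithinDbfLimit() {
    return DB_TO_SHP.values().stream().allMatch(n -> n.length() <= 10);
  }
}
```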
### Coordinate Reference System
- **CRS**: EPSG:5186 (Korean 2000 / Central Belt)
- **Valid Coordinate Bounds**: X ∈ [125km, 530km], Y ∈ [-600km, 988km]
- **Encoding**: WKT in SQL → JTS Geometry → GeoTools SimpleFeature → `.prj` file
- **Validation**: Automatic in batch mode via `ST_X(ST_Centroid())` range check
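The bounds check above is performed in SQL via `ST_X(ST_Centroid())`; the equivalent predicate in Java looks like this (class and method names are illustrative only):

```java
// Sketch of the EPSG:5186 bounds predicate described above; in batch mode the
// equivalent check runs in SQL. Class and method names are illustrative.
public class CoordinateBounds {

  // Valid EPSG:5186 range in metres: X in [125000, 530000], Y in [-600000, 988000]
  public static boolean withinEpsg5186(double x, double y) {
    return x >= 125_000 && x <= 530_000 && y >= -600_000 && y <= 988_000;
  }
}
```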
## Dependencies
**Core Framework**:
- Spring Boot 3.5.7
- `spring-boot-starter`: DI container, logging
- `spring-boot-starter-jdbc`: JDBC template, HikariCP
- `spring-boot-starter-batch`: Spring Batch framework, job repository
- `spring-boot-starter-web`: RestTemplate for GeoServer API calls
- `spring-boot-starter-validation`: @NotBlank annotations
**Spatial Libraries**:
- GeoTools 30.0 (via OSGeo repository)
- `gt-shapefile`: Shapefile I/O (DataStore, FeatureStore, Transaction)
- `gt-geojson`: GeoJSON encoding/decoding
- `gt-referencing`: CRS transformations
- `gt-epsg-hsql`: EPSG database for CRS lookups
- JTS 1.19.0: Geometry primitives (Polygon, MultiPolygon, GeometryFactory)
- PostGIS JDBC 2.5.1: PostGIS geometry type support
**Database**:
- PostgreSQL JDBC Driver (latest)
- HikariCP (bundled with Spring Boot)
**Build Configuration**:
```gradle
// build.gradle
configurations.all {
  exclude group: 'javax.media', module: 'jai_core' // Conflicts with GeoTools
}

bootJar {
  archiveFileName = "shp-exporter.jar" // Fixed JAR name
}

spotless {
  java {
    googleJavaFormat('1.19.2') // 2-space indentation
  }
}
```
## Development Patterns
### Adding a New Step to Spring Batch Job
When adding steps to `mergedModeJob`, follow this pattern:
1. **Create Tasklet or ItemWriter** in `batch/tasklet/` or `batch/writer/`
2. **Define Step Bean** in `MergedModeJobConfig.java`:
```java
@Bean
public Step myNewStep(
    JobRepository jobRepository,
    PlatformTransactionManager transactionManager,
    MyTasklet tasklet,
    BatchExecutionHistoryListener historyListener) {
  return new StepBuilder("myNewStep", jobRepository)
      .tasklet(tasklet, transactionManager)
      .listener(historyListener) // REQUIRED for history tracking
      .build();
}
```
3. **Add to Job Flow** in `mergedModeJob()`:
```java
.next(myNewStep)
```
4. **Always include `BatchExecutionHistoryListener`** to track execution metrics
### Modifying ItemReader Configuration
ItemReaders are **not thread-safe**. Each step requires its own instance:
```java
// WRONG: Sharing one reader between steps
@Bean
public JdbcCursorItemReader<InferenceResult> reader() { ... }

// RIGHT: Separate readers with @StepScope
@Bean
@StepScope // Creates a new instance per step
public JdbcCursorItemReader<InferenceResult> shapefileReader() { ... }

@Bean
@StepScope
public JdbcCursorItemReader<InferenceResult> geoJsonReader() { ... }
```
See `InferenceResultItemReaderConfig.java` for working examples.
### Streaming Writers Pattern
When writing custom streaming writers, follow `StreamingShapefileWriter` pattern:
```java
@Component
@StepScope
public class MyStreamingWriter implements ItemStreamWriter<MyType> {

  private Transaction transaction;

  @Override
  public void open(ExecutionContext context) {
    // ItemStream callback (not @BeforeStep): open resources, start transaction
    transaction = new DefaultTransaction("create");
  }

  @Override
  public void write(Chunk<? extends MyType> chunk) {
    // Write the chunk incrementally -- do NOT accumulate items in memory
  }

  @AfterStep
  public ExitStatus afterStep(StepExecution stepExecution) {
    transaction.commit(); // Commit all chunks
    transaction.close();
    return ExitStatus.COMPLETED;
  }
}
```
### JobParameters and StepExecutionContext
**Pass data between steps** using `StepExecutionContext`:
```java
// Step 1: Store data in the step's ExecutionContext
stepExecution.getExecutionContext().putString("geometryType", "ST_Polygon");
// (promote the key to the job ExecutionContext with
// ExecutionContextPromotionListener so later steps can read it)

// Step 2: Retrieve data from the job's ExecutionContext
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
  String geomType = stepExecution.getJobExecution()
      .getExecutionContext()
      .getString("geometryType");
}
```
**Job-level parameters** from command line:
```java
// ConverterCommandLineRunner.buildJobParameters()
JobParametersBuilder builder = new JobParametersBuilder();
builder.addString("inferenceId", converterProperties.getInferenceId());
builder.addLong("timestamp", System.currentTimeMillis()); // Ensures uniqueness
```
### Partitioning Pattern (Map ID Processing)
The `generateMapIdFilesStep` uses partitioning but runs **sequentially** to avoid DB connection pool exhaustion:
```java
@Bean
public Step generateMapIdFilesStep(...) {
  return new StepBuilder("generateMapIdFilesStep", jobRepository)
      .partitioner("mapIdWorker", partitioner)
      .step(mapIdWorkerStep)
      .taskExecutor(new SyncTaskExecutor()) // SEQUENTIAL execution
      .build();
}
```
For parallel execution in future (requires connection pool tuning):
```java
.taskExecutor(new SimpleAsyncTaskExecutor())
.gridSize(4) // 4 concurrent workers
```
### GeoServer REST API Integration
GeoServer operations use `RestTemplate` with custom error handling:
```java
// GeoServerRegistrationService.java
try {
  restTemplate.exchange(url, HttpMethod.PUT, entity, String.class);
} catch (HttpClientErrorException e) {
  if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
    // Handle workspace not found
  }
}
```
Always check workspace existence before layer registration.
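For illustration only, the existence check can be expressed with the JDK's `HttpClient` types (the project itself uses `RestTemplate`). The request targets GeoServer's REST path `/rest/workspaces/{name}` with HTTP Basic auth; a 200 response means the workspace exists. Everything below is a hypothetical sketch:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Base64;

// Stdlib sketch of building the workspace-existence request; the project
// uses RestTemplate instead. Class and method names are illustrative.
public class WorkspaceCheck {

  public static HttpRequest buildRequest(
      String baseUrl, String workspace, String user, String password) {
    // HTTP Basic auth header from the GEOSERVER_USERNAME/GEOSERVER_PASSWORD pair
    String auth = Base64.getEncoder().encodeToString((user + ":" + password).getBytes());
    return HttpRequest.newBuilder()
        .uri(URI.create(baseUrl + "/rest/workspaces/" + workspace))
        .header("Authorization", "Basic " + auth)
        .GET() // send with HttpClient; 200 = exists, 404 = create it first
        .build();
  }
}
```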
### Testing Considerations
- **Unit tests**: Mock `JdbcTemplate`, `DataSource` for repository tests
- **Integration tests**: Use `@SpringBatchTest` with embedded H2 database
- **GeoTools**: Use `MemoryDataStore` for shapefile writer tests
- **Current state**: Limited test coverage (focus on critical path validation)
Refer to `claudedocs/SPRING_BATCH_MIGRATION.md` for detailed batch architecture documentation.