CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Spring Boot 3.5.7 / Java 21 CLI application that converts PostgreSQL PostGIS spatial data to ESRI shapefiles and GeoJSON formats. The application uses Spring Batch for memory-efficient processing of large datasets (1M+ records) and supports automatic GeoServer layer registration via REST API.

Key Features:

  • Memory-optimized batch processing (90-95% reduction: 2-13GB → 150-200MB)
  • Chunk-based streaming with cursor pagination (fetch-size: 1000)
  • Automatic geometry validation and type conversion (MultiPolygon → Polygon)
  • Coordinate system validation (EPSG:5186 Korean 2000 / Central Belt)
  • Three execution modes: Spring Batch (recommended), Legacy, and GeoServer registration-only

Build and Run Commands

Build

./gradlew build                  # Full build with tests
./gradlew clean build -x test   # Skip tests
./gradlew spotlessApply         # Apply Google Java Format (2-space indentation)
./gradlew spotlessCheck         # Verify formatting without applying

Output: build/libs/shp-exporter.jar (fixed name, no version suffix)

Note: The Dockerfile currently references shp-exporter-v2.jar in its COPY step, which does not match the actual build output (shp-exporter.jar). Update the Dockerfile before building a Docker image.
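A minimal sketch of the fix, assuming the default Gradle output path (the destination path and entrypoint are illustrative — adjust them to match the actual image layout):

```dockerfile
# Copy the fixed-name JAR produced by bootJar (see build.gradle)
COPY build/libs/shp-exporter.jar /app/shp-exporter.jar
ENTRYPOINT ["java", "-jar", "/app/shp-exporter.jar"]
```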

Run Application

# Generate shapefile + GeoJSON
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252"

# With GeoServer registration
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"

# Using JAR (production)
java -jar build/libs/shp-exporter.jar \
  --batch \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.batch-ids[0]=252 \
  --converter.batch-ids[1]=253

Legacy Mode (Small Datasets Only)

./gradlew bootRun  # No --batch flag
# Warning: May OOM on large datasets

Upload Shapefile to GeoServer

Set environment variables first:

export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver

Then upload:

./gradlew bootRun --args="--upload-shp /path/to/file.shp --layer layer_name"

Or using JAR:

java -jar build/libs/shp-exporter.jar --upload-shp /path/to/file.shp --layer layer_name

Override Configuration via Command Line

Using Gradle (recommended - no quoting issues):

./gradlew bootRun --args="--converter.inference-id=ABC123 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.mode=MERGED"

Using JAR with zsh (quote arguments with brackets):

java -jar build/libs/shp-exporter.jar '--converter.inference-id=ABC123' '--converter.map-ids[0]=35813030'

Code Formatting

Apply Google Java Format (2-space indentation) before committing:

./gradlew spotlessApply

Check formatting without applying:

./gradlew spotlessCheck

Active Profile

By default, the application runs with spring.profiles.active=prod (set in application.yml). Profile-specific configurations are in application-{profile}.yml files.

Architecture

Dual Execution Modes

The application's two data-export modes, Spring Batch and Legacy, use distinct processing pipelines:

Spring Batch Mode (Recommended)

Trigger: --batch flag
Use Case: Large datasets (100K+ records), production workloads
Memory: 150-200MB constant (chunk-based streaming)

Pipeline Flow:

ConverterCommandLineRunner
  → JobLauncher.run(mergedModeJob)
    → Step 1: GeometryTypeValidationTasklet (validates geometry homogeneity)
    → Step 2: generateShapefileStep (chunk-oriented)
        → JdbcCursorItemReader (fetch-size: 1000)
        → FeatureConversionProcessor (InferenceResult → SimpleFeature)
        → StreamingShapefileWriter (chunk-based append)
    → Step 2-1: PostShapefileUpdateTasklet (post-export DB UPDATE hook)
    → Step 3: generateGeoJsonStep (chunk-oriented, same pattern)
    → Step 4: CreateZipTasklet (creates .zip for GeoServer)
    → Step 5: GeoServerRegistrationTasklet (conditional, if --geoserver.enabled=true)
    → Step 6: generateMapIdFilesStep (partitioned, sequential map_id processing)

Key Components:

  • JdbcCursorItemReader: Cursor-based streaming (no full result set loading)
  • StreamingShapefileWriter: Opens GeoTools transaction, writes chunks incrementally, commits at end
  • GeometryTypeValidationTasklet: Pre-validates with SQL DISTINCT ST_GeometryType(), auto-converts MultiPolygon
  • CompositeItemWriter: Simultaneously writes shapefile and GeoJSON in map_id worker step
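The chunk-oriented core of Steps 2-3 can be illustrated with a plain-Java sketch (no Spring or GeoTools; all names here are hypothetical stand-ins): only one chunk is alive at a time, which is why memory stays flat regardless of total row count.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkStreamingSketch {
  /** Drains a cursor in fixed-size chunks; never materializes the full result set. */
  static int streamInChunks(Iterator<String> cursor, int chunkSize, List<String> sink) {
    int chunks = 0;
    while (cursor.hasNext()) {
      List<String> chunk = new ArrayList<>(chunkSize); // stands in for Chunk<SimpleFeature>
      while (cursor.hasNext() && chunk.size() < chunkSize) {
        chunk.add(cursor.next().toUpperCase()); // stands in for WKT -> SimpleFeature conversion
      }
      sink.addAll(chunk); // stands in for featureStore.addFeatures(...)
      chunks++; // chunk goes out of scope here and becomes GC-eligible
    }
    return chunks;
  }
}
```

The real pipeline follows the same shape: JdbcCursorItemReader plays the cursor, FeatureConversionProcessor the per-item conversion, and StreamingShapefileWriter the sink.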

Legacy Mode

Trigger: No --batch flag (deprecated)
Use Case: Small datasets (<10K records)
Memory: 1.4-9GB (loads entire result set)

Pipeline Flow:

ConverterCommandLineRunner
  → ShapefileConverterService.convertAll()
    → InferenceResultRepository.findByBatchIds() (full List<InferenceResult>)
    → validateGeometries() (in-memory validation)
    → ShapefileWriter.write() (DefaultFeatureCollection accumulation)
    → GeoJsonWriter.write()

Key Design Patterns

Geometry Type Validation & Auto-Conversion:

  • Pre-validation step runs SQL SELECT DISTINCT ST_GeometryType(geometry) to detect mixed types
  • Supports automatic conversion: ST_MultiPolygon → ST_Polygon (extracts the first polygon only)
  • Fails fast on unsupported mixed types (e.g., Polygon + LineString)
  • Validates EPSG:5186 coordinate bounds (X: 125-530 km, Y: -600 to 988 km) and ST_IsValid()
  • See GeometryTypeValidationTasklet (batch/tasklet/GeometryTypeValidationTasklet.java:1-290)
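The coordinate-bounds rule can be expressed as a small pure-Java predicate (a sketch only — the production check runs in SQL via ST_X(ST_Centroid(...)), see the query in Database Integration):

```java
public class Epsg5186Bounds {
  // EPSG:5186 (Korean 2000 / Central Belt) plausibility window, in metres.
  static final double MIN_X = 125_000, MAX_X = 530_000;
  static final double MIN_Y = -600_000, MAX_Y = 988_000;

  /** True if a geometry centroid lies inside the valid EPSG:5186 window. */
  static boolean inBounds(double x, double y) {
    return x >= MIN_X && x <= MAX_X && y >= MIN_Y && y <= MAX_Y;
  }
}
```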

WKT to JTS Conversion Pipeline:

  1. PostGIS query returns ST_AsText(geometry) as WKT string
  2. GeometryConvertingRowMapper converts ResultSet row to InferenceResult with WKT string (batch/reader/GeometryConvertingRowMapper.java:1-74)
  3. FeatureConversionProcessor uses GeometryConverter.parseGeometry() to convert WKT → JTS Geometry (service/GeometryConverter.java:1-92)
  4. StreamingShapefileWriter wraps JTS geometry in GeoTools SimpleFeature and writes to shapefile

Chunk-Based Transaction Management (Spring Batch only):

// StreamingShapefileWriter
@BeforeStep
public void open() {
    transaction = new DefaultTransaction("create");
    featureStore.setTransaction(transaction);  // Long-running transaction
}

@Override
public void write(Chunk<SimpleFeature> chunk) {
    ListFeatureCollection collection = new ListFeatureCollection(featureType, chunk.getItems());
    featureStore.addFeatures(collection);  // Append chunk to shapefile
    // chunk goes out of scope → GC eligible
}

@AfterStep
public void afterStep() {
    transaction.commit();  // Commit all chunks at once
    transaction.close();
}

PostgreSQL Array Parameter Handling:

// InferenceResultItemReaderConfig uses PreparedStatementSetter
ps -> {
    Array batchIdsArray = ps.getConnection().createArrayOf("bigint", batchIds.toArray());
    ps.setArray(1, batchIdsArray);  // WHERE batch_id = ANY(?)
    ps.setString(2, mapId);
}

Output Directory Strategy:

  • Batch mode (MERGED): {output-base-dir}/{inference-id}/merge/ → Single merged shapefile + GeoJSON
  • Batch mode (map_id partitioning): {output-base-dir}/{inference-id}/{map-id}/ → Per-map_id files
  • Legacy mode: {output-base-dir}/{inference-id}/{map-id}/ (no merge folder)
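A hypothetical helper mirroring this directory strategy (the real path construction lives elsewhere in the codebase; this is just the rule above made executable):

```java
import java.nio.file.Path;

public class OutputDirSketch {
  /** Resolves the export directory: merge/ for MERGED batch mode, per-map_id otherwise. */
  static Path outputDir(String baseDir, String inferenceId, boolean merged, String mapId) {
    return merged
        ? Path.of(baseDir, inferenceId, "merge")
        : Path.of(baseDir, inferenceId, mapId);
  }
}
```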

GeoServer Registration:

  • Only shapefile ZIP is uploaded (GeoJSON not registered)
  • Requires pre-created workspace 'cd' and environment variables for auth
  • Conditional execution via JobParameter geoserver.enabled
  • Non-blocking: failures logged but don't stop batch job
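The non-blocking behavior can be sketched as follows (hypothetical names; the real logic lives in GeoServerRegistrationTasklet): failures are caught and logged, and the batch job continues.

```java
public class NonBlockingRegistrationSketch {
  interface Registrar {
    void register(String layer) throws Exception;
  }

  /** Attempts registration; logs and swallows failures so the job can proceed. */
  static boolean tryRegister(Registrar registrar, String layer) {
    try {
      registrar.register(layer);
      return true;
    } catch (Exception e) {
      System.err.println("GeoServer registration failed for " + layer + ": " + e.getMessage());
      return false; // batch job continues regardless
    }
  }
}
```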

Configuration

Profile System

  • Default profile: prod (set in application.yml)
  • Configuration hierarchy: application.yml → application-{profile}.yml
  • Override via: --spring.profiles.active=dev

Key Configuration Properties

Converter Settings (ConverterProperties.java):

converter:
  inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6'  # Output folder name
  batch-ids: [252, 253, 257]  # PostgreSQL batch_id filter (required)
  map-ids: []                 # Legacy mode only (ignored in batch mode)
  mode: 'MERGED'              # Legacy mode only: MERGED, MAP_IDS, or RESOLVE
  output-base-dir: '/data/model_output/export/'
  crs: 'EPSG:5186'            # Korean 2000 / Central Belt

  batch:
    chunk-size: 1000          # Records per chunk (affects memory usage)
    fetch-size: 1000          # JDBC cursor fetch size
    skip-limit: 100           # Max skippable records per step
    enable-partitioning: false  # Future: parallel map_id processing
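The skip-limit semantics can be illustrated with a plain-Java sketch (hypothetical; Spring Batch implements this via faultTolerant().skipLimit(...) on the step builder): bad records are skipped until the limit is exceeded, at which point the step fails.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipLimitSketch {
  /** Processes records, tolerating up to skipLimit failures before aborting. */
  static List<Integer> processWithSkips(List<String> records, int skipLimit) {
    List<Integer> out = new ArrayList<>();
    int skipped = 0;
    for (String r : records) {
      try {
        out.add(Integer.parseInt(r)); // stands in for WKT parsing / feature conversion
      } catch (NumberFormatException e) {
        if (++skipped > skipLimit) {
          throw new IllegalStateException("skip limit exceeded", e); // step fails
        }
        // otherwise: record skipped, processing continues
      }
    }
    return out;
  }
}
```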

GeoServer Settings (GeoServerProperties.java):

geoserver:
  base-url: 'https://kamco.geo-dev.gs.dabeeo.com/geoserver'
  workspace: 'cd'              # Must be pre-created in GeoServer
  overwrite-existing: true     # Delete existing layer before registration
  connection-timeout: 30000    # 30 seconds
  read-timeout: 60000          # 60 seconds
  # Credentials from environment variables (preferred):
  # GEOSERVER_USERNAME, GEOSERVER_PASSWORD

Spring Batch Metadata:

spring:
  batch:
    job:
      enabled: false           # Prevent auto-run on startup
    jdbc:
      initialize-schema: always  # Auto-create BATCH_* tables

Database Integration

Query Strategies

Spring Batch Mode (streaming):

-- InferenceResultItemReaderConfig.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) as geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?)
  AND ST_GeometryType(geometry) IN ('ST_Polygon', 'ST_MultiPolygon')
  AND ST_SRID(geometry) = 5186
  AND ST_X(ST_Centroid(geometry)) BETWEEN 125000 AND 530000
  AND ST_Y(ST_Centroid(geometry)) BETWEEN -600000 AND 988000
  AND ST_IsValid(geometry) = true
ORDER BY map_id, uid
-- Uses server-side cursor with fetch-size=1000

Legacy Mode (full load):

-- InferenceResultRepository.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) as geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?) AND map_id = ?
-- Returns full List<InferenceResult> in memory

Geometry Type Validation:

-- GeometryTypeValidationTasklet.java
SELECT DISTINCT ST_GeometryType(geometry)
FROM inference_results_testing
WHERE batch_id = ANY(?) AND geometry IS NOT NULL
-- Pre-validates homogeneous geometry requirement

Field Mapping

Database columns map to shapefile fields (10-character limit):

| Database Column | DB Type | Shapefile Field | Shapefile Type | Notes |
|-----------------|---------|-----------------|----------------|-------|
| uid | uuid | chnDtctId | String | Change detection ID |
| map_id | text | mpqd_no | String | Map quadrant number |
| probability | float8 | chn_dtct_p | Double | Change detection probability |
| before_year | bigint | cprs_yr | Long | Comparison year |
| after_year | bigint | crtr_yr | Long | Criteria year |
| before_c | text | bf_cls_cd | String | Before classification code |
| before_p | float8 | bf_cls_pro | Double | Before classification probability |
| after_c | text | af_cls_cd | String | After classification code |
| after_p | float8 | af_cls_pro | Double | After classification probability |
| geometry | geom | the_geom | Polygon | Geometry in EPSG:5186 |

Field name source: See FeatureTypeFactory.java (batch/util/FeatureTypeFactory.java:1-104)
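The 10-character limit comes from the DBF attribute table used by shapefiles. A tiny guard like the following (a sketch; the authoritative names are defined in FeatureTypeFactory) can catch oversized names before GeoTools silently truncates them:

```java
import java.util.List;

public class DbfFieldNameCheck {
  /** DBF (shapefile attribute table) field names are limited to 10 characters. */
  static boolean allValid(List<String> names) {
    return names.stream().allMatch(n -> !n.isEmpty() && n.length() <= 10);
  }
}
```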

Coordinate Reference System

  • CRS: EPSG:5186 (Korean 2000 / Central Belt)
  • Valid Coordinate Bounds: X ∈ [125km, 530km], Y ∈ [-600km, 988km]
  • Encoding: WKT from SQL → JTS Geometry → GeoTools SimpleFeature; the CRS is written to the .prj sidecar file
  • Validation: Automatic in batch mode via ST_X(ST_Centroid()) range check

Dependencies

Core Framework:

  • Spring Boot 3.5.7
    • spring-boot-starter: DI container, logging
    • spring-boot-starter-jdbc: JDBC template, HikariCP
    • spring-boot-starter-batch: Spring Batch framework, job repository
    • spring-boot-starter-web: RestTemplate for GeoServer API calls
    • spring-boot-starter-validation: @NotBlank annotations

Spatial Libraries:

  • GeoTools 30.0 (via OSGeo repository)
    • gt-shapefile: Shapefile I/O (DataStore, FeatureStore, Transaction)
    • gt-geojson: GeoJSON encoding/decoding
    • gt-referencing: CRS transformations
    • gt-epsg-hsql: EPSG database for CRS lookups
  • JTS 1.19.0: Geometry primitives (Polygon, MultiPolygon, GeometryFactory)
  • PostGIS JDBC 2.5.1: PostGIS geometry type support

Database:

  • PostgreSQL JDBC Driver (latest)
  • HikariCP (bundled with Spring Boot)

Build Configuration:

// build.gradle
configurations.all {
  exclude group: 'javax.media', module: 'jai_core'  // Conflicts with GeoTools
}

bootJar {
  archiveFileName = "shp-exporter.jar"  // Fixed JAR name
}

spotless {
  java {
    googleJavaFormat('1.19.2')  // 2-space indentation
  }
}

Development Patterns

Adding a New Step to Spring Batch Job

When adding steps to mergedModeJob, follow this pattern:

  1. Create Tasklet or ItemWriter in batch/tasklet/ or batch/writer/
  2. Define Step Bean in MergedModeJobConfig.java:
@Bean
public Step myNewStep(JobRepository jobRepository,
                      PlatformTransactionManager transactionManager,
                      MyTasklet tasklet,
                      BatchExecutionHistoryListener historyListener) {
  return new StepBuilder("myNewStep", jobRepository)
      .tasklet(tasklet, transactionManager)
      .listener(historyListener)  // REQUIRED for history tracking
      .build();
}
  3. Add to Job Flow in mergedModeJob():
.next(myNewStep)
  4. Always include BatchExecutionHistoryListener to track execution metrics

Post-Export DB Hook (PostShapefileUpdateTasklet)

PostShapefileUpdateTasklet runs immediately after generateShapefileStep and serves as a placeholder for post-export UPDATE SQL (e.g., marking rows as exported). The SQL body is intentionally left as a // TODO; add your UPDATE statement inside execute():

// batch/tasklet/PostShapefileUpdateTasklet.java
int updated = jdbcTemplate.update(
    "UPDATE some_table SET status = 'EXPORTED' WHERE batch_id = ANY(?)",
    ps -> {
      ps.setArray(1, ps.getConnection().createArrayOf("bigint", batchIdList.toArray()));
    });

Job parameters available: inferenceId (String), batchIds (comma-separated String → List<Long>).

Modifying ItemReader Configuration

ItemReaders are not thread-safe. Each step requires its own instance:

// WRONG: Sharing reader between steps
@Bean
public JdbcCursorItemReader<InferenceResult> reader() { ... }

// RIGHT: Separate readers with @StepScope
@Bean
@StepScope  // Creates new instance per step
public JdbcCursorItemReader<InferenceResult> shapefileReader() { ... }

@Bean
@StepScope
public JdbcCursorItemReader<InferenceResult> geoJsonReader() { ... }

See InferenceResultItemReaderConfig.java for working examples.

Streaming Writers Pattern

When writing custom streaming writers, follow StreamingShapefileWriter pattern:

@Component
@StepScope
public class MyStreamingWriter implements ItemStreamWriter<MyType> {
    private Transaction transaction;

    @Override  // ItemStream callback (not @BeforeStep, whose methods take a StepExecution)
    public void open(ExecutionContext context) {
        // Open resources, start the long-running transaction
        transaction = new DefaultTransaction("create");
    }

    @Override
    public void write(Chunk<? extends MyType> chunk) {
        // Write chunk incrementally
        // Do NOT accumulate in memory
    }

    @AfterStep
    public ExitStatus afterStep(StepExecution stepExecution) {
        transaction.commit();  // Commit all chunks
        transaction.close();
        return ExitStatus.COMPLETED;
    }
}

JobParameters and StepExecutionContext

Pass data between steps via the job-level ExecutionContext (a step's own context is not visible to later steps unless its keys are promoted with ExecutionContextPromotionListener):

// Step 1: Store data in the job execution context
stepExecution.getJobExecution().getExecutionContext().putString("geometryType", "ST_Polygon");

// Step 2: Retrieve data
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    String geomType = stepExecution.getJobExecution()
        .getExecutionContext()
        .getString("geometryType");
}

Job-level parameters from command line:

// ConverterCommandLineRunner.buildJobParameters()
JobParametersBuilder builder = new JobParametersBuilder();
builder.addString("inferenceId", converterProperties.getInferenceId());
builder.addLong("timestamp", System.currentTimeMillis());  // Ensures uniqueness
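Why the timestamp matters: Spring Batch identifies a JobInstance by its identifying parameters, and re-launching a completed instance with identical parameters fails. The identity rule reduces to map equality, sketched here with plain maps (a toy; real code uses JobParameters):

```java
import java.util.Map;

public class JobInstanceIdentitySketch {
  /** Two launches collide iff their identifying parameter maps are equal. */
  static boolean sameInstance(Map<String, ?> a, Map<String, ?> b) {
    return a.equals(b);
  }
}
```

Adding a fresh timestamp parameter guarantees the maps differ, so every launch creates a new JobInstance.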

Partitioning Pattern (Map ID Processing)

The generateMapIdFilesStep uses partitioning but runs sequentially to avoid DB connection pool exhaustion:

@Bean
public Step generateMapIdFilesStep(...) {
    return new StepBuilder("generateMapIdFilesStep", jobRepository)
        .partitioner("mapIdWorker", partitioner)
        .step(mapIdWorkerStep)
        .taskExecutor(new SyncTaskExecutor())  // SEQUENTIAL execution
        .build();
}

For parallel execution in future (requires connection pool tuning):

.taskExecutor(new SimpleAsyncTaskExecutor())
.gridSize(4)  // 4 concurrent workers

GeoServer REST API Integration

GeoServer operations use RestTemplate with custom error handling:

// GeoServerRegistrationService.java
try {
    restTemplate.exchange(url, HttpMethod.PUT, entity, String.class);
} catch (HttpClientErrorException e) {
    if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
        // Handle workspace not found
    }
}

Always check workspace existence before layer registration.

Testing Considerations

  • Unit tests: Mock JdbcTemplate, DataSource for repository tests
  • Integration tests: Use @SpringBatchTest with embedded H2 database
  • GeoTools: Use MemoryDataStore for shapefile writer tests
  • Current state: Limited test coverage (focus on critical path validation)

Refer to claudedocs/SPRING_BATCH_MIGRATION.md for detailed batch architecture documentation.