CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Spring Boot 3.5.7 / Java 21 CLI application that converts PostgreSQL PostGIS spatial data to ESRI shapefiles and GeoJSON formats. The application uses Spring Batch for memory-efficient processing of large datasets (1M+ records) and supports automatic GeoServer layer registration via REST API.

Key Features:

  • Memory-optimized batch processing (90-95% reduction: 2-13GB → 150-200MB)
  • Chunk-based streaming with cursor pagination (fetch-size: 1000)
  • Automatic geometry validation and type conversion (MultiPolygon → Polygon)
  • Coordinate system validation (EPSG:5186 Korean 2000 / Central Belt)
  • Three execution modes: Spring Batch (recommended), Legacy, and GeoServer registration-only

Build and Run Commands

Build

./gradlew build                  # Full build with tests
./gradlew clean build -x test   # Skip tests
./gradlew spotlessApply         # Apply Google Java Format (2-space indentation)
./gradlew spotlessCheck         # Verify formatting without applying

Output: build/libs/shp-exporter.jar (fixed name, no version suffix)

Note: The Dockerfile currently references shp-exporter-v2.jar in its COPY step, which does not match the actual build output (shp-exporter.jar). Update the Dockerfile before building a Docker image.
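A minimal sketch of the fix, assuming the default Gradle output path (the destination path and entrypoint are illustrative — adjust them to match the actual image layout):

```dockerfile
# Copy the fixed-name JAR produced by bootJar (see build.gradle)
COPY build/libs/shp-exporter.jar /app/shp-exporter.jar
ENTRYPOINT ["java", "-jar", "/app/shp-exporter.jar"]
```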

Run Application

# Generate shapefile + GeoJSON
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252"

# With GeoServer registration
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"

# Using JAR (production)
java -jar build/libs/shp-exporter.jar \
  --batch \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.batch-ids[0]=252 \
  --converter.batch-ids[1]=253

Legacy Mode (Small Datasets Only)

./gradlew bootRun  # No --batch flag
# Warning: May OOM on large datasets

Upload Shapefile to GeoServer

Set environment variables first:

export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver

Then upload:

./gradlew bootRun --args="--upload-shp /path/to/file.shp --layer layer_name"

Or using JAR:

java -jar build/libs/shp-exporter.jar --upload-shp /path/to/file.shp --layer layer_name

Override Configuration via Command Line

Using Gradle (recommended - no quoting issues):

./gradlew bootRun --args="--converter.inference-id=ABC123 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.mode=MERGED"

Using JAR with zsh (quote arguments with brackets):

java -jar build/libs/shp-exporter.jar '--converter.inference-id=ABC123' '--converter.map-ids[0]=35813030'

Code Formatting

Apply Google Java Format (2-space indentation) before committing:

./gradlew spotlessApply

Check formatting without applying:

./gradlew spotlessCheck

Active Profile

By default, the application runs with spring.profiles.active=prod (set in application.yml). Profile-specific configurations are in application-{profile}.yml files.

Architecture

Dual Execution Modes

The application's two data-export modes, Spring Batch and Legacy, use distinct processing pipelines:

Spring Batch Mode (Recommended)

Trigger: --batch flag
Use Case: Large datasets (100K+ records), production workloads
Memory: 150-200MB constant (chunk-based streaming)

Pipeline Flow:

ConverterCommandLineRunner
  → JobLauncher.run(mergedModeJob)
    → Step 1: GeometryTypeValidationTasklet (validates geometry homogeneity)
    → Step 2: generateShapefileStep (chunk-oriented)
        → JdbcCursorItemReader (fetch-size: 1000)
        → FeatureConversionProcessor (InferenceResult → SimpleFeature)
        → StreamingShapefileWriter (chunk-based append)
    → Step 2-1: PostShapefileUpdateTasklet (post-export DB UPDATE hook)
    → Step 3: generateGeoJsonStep (chunk-oriented, same pattern)
    → Step 4: CreateZipTasklet (creates .zip for GeoServer)
    → Step 5: GeoServerRegistrationTasklet (conditional, if --geoserver.enabled=true)
    → Step 6: generateMapIdFilesStep (partitioned, sequential map_id processing)

Key Components:

  • JdbcCursorItemReader: Cursor-based streaming (no full result set loading)
  • StreamingShapefileWriter: Opens GeoTools transaction, writes chunks incrementally, commits at end
  • GeometryTypeValidationTasklet: Pre-validates with SQL DISTINCT ST_GeometryType(), auto-converts MultiPolygon
  • CompositeItemWriter: Simultaneously writes shapefile and GeoJSON in map_id worker step
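The chunk-oriented core of Steps 2-3 can be illustrated with a plain-Java sketch (no Spring or GeoTools; all names here are hypothetical stand-ins): only one chunk is alive at a time, which is why memory stays flat regardless of total row count.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkStreamingSketch {
  /** Drains a cursor in fixed-size chunks; never materializes the full result set. */
  static int streamInChunks(Iterator<String> cursor, int chunkSize, List<String> sink) {
    int chunks = 0;
    while (cursor.hasNext()) {
      List<String> chunk = new ArrayList<>(chunkSize); // stands in for Chunk<SimpleFeature>
      while (cursor.hasNext() && chunk.size() < chunkSize) {
        chunk.add(cursor.next().toUpperCase()); // stands in for WKT -> SimpleFeature conversion
      }
      sink.addAll(chunk); // stands in for featureStore.addFeatures(...)
      chunks++; // chunk goes out of scope here and becomes GC-eligible
    }
    return chunks;
  }
}
```

The real pipeline follows the same shape: JdbcCursorItemReader plays the cursor, FeatureConversionProcessor the per-item conversion, and StreamingShapefileWriter the sink.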

Legacy Mode

Trigger: No --batch flag (deprecated)
Use Case: Small datasets (<10K records)
Memory: 1.4-9GB (loads entire result set)

Pipeline Flow:

ConverterCommandLineRunner
  → ShapefileConverterService.convertAll()
    → InferenceResultRepository.findByBatchIds() (full List<InferenceResult>)
    → validateGeometries() (in-memory validation)
    → ShapefileWriter.write() (DefaultFeatureCollection accumulation)
    → GeoJsonWriter.write()

Key Design Patterns

Geometry Type Validation & Auto-Conversion:

  • Pre-validation step runs SQL SELECT DISTINCT ST_GeometryType(geometry) to detect mixed types
  • Supports automatic conversion: ST_MultiPolygon → ST_Polygon (extracts the first polygon only)
  • Fails fast on unsupported mixed types (e.g., Polygon + LineString)
  • Validates EPSG:5186 coordinate bounds (X: 125-530 km, Y: -600 to 988 km) and ST_IsValid()
  • See GeometryTypeValidationTasklet (batch/tasklet/GeometryTypeValidationTasklet.java:1-290)
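The coordinate-bounds rule can be expressed as a small pure-Java predicate (a sketch only — the production check runs in SQL via ST_X(ST_Centroid(...)), see the query in Database Integration):

```java
public class Epsg5186Bounds {
  // EPSG:5186 (Korean 2000 / Central Belt) plausibility window, in metres.
  static final double MIN_X = 125_000, MAX_X = 530_000;
  static final double MIN_Y = -600_000, MAX_Y = 988_000;

  /** True if a geometry centroid lies inside the valid EPSG:5186 window. */
  static boolean inBounds(double x, double y) {
    return x >= MIN_X && x <= MAX_X && y >= MIN_Y && y <= MAX_Y;
  }
}
```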

WKT to JTS Conversion Pipeline:

  1. PostGIS query returns ST_AsText(geometry) as WKT string
  2. GeometryConvertingRowMapper converts ResultSet row to InferenceResult with WKT string (batch/reader/GeometryConvertingRowMapper.java:1-74)
  3. FeatureConversionProcessor uses GeometryConverter.parseGeometry() to convert WKT → JTS Geometry (service/GeometryConverter.java:1-92)
  4. StreamingShapefileWriter wraps JTS geometry in GeoTools SimpleFeature and writes to shapefile

Chunk-Based Transaction Management (Spring Batch only):

// StreamingShapefileWriter
@BeforeStep
public void open() {
    transaction = new DefaultTransaction("create");
    featureStore.setTransaction(transaction);  // Long-running transaction
}

@Override
public void write(Chunk<SimpleFeature> chunk) {
    ListFeatureCollection collection = new ListFeatureCollection(featureType, chunk.getItems());
    featureStore.addFeatures(collection);  // Append chunk to shapefile
    // chunk goes out of scope → GC eligible
}

@AfterStep
public void afterStep() {
    transaction.commit();  // Commit all chunks at once
    transaction.close();
}

PostgreSQL Array Parameter Handling:

// InferenceResultItemReaderConfig uses PreparedStatementSetter
ps -> {
    Array batchIdsArray = ps.getConnection().createArrayOf("bigint", batchIds.toArray());
    ps.setArray(1, batchIdsArray);  // WHERE batch_id = ANY(?)
    ps.setString(2, mapId);
}

Output Directory Strategy:

  • Batch mode (MERGED): {output-base-dir}/{inference-id}/merge/ → Single merged shapefile + GeoJSON
  • Batch mode (map_id partitioning): {output-base-dir}/{inference-id}/{map-id}/ → Per-map_id files
  • Legacy mode: {output-base-dir}/{inference-id}/{map-id}/ (no merge folder)
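A hypothetical helper mirroring this directory strategy (the real path construction lives elsewhere in the codebase; this is just the rule above made executable):

```java
import java.nio.file.Path;

public class OutputDirSketch {
  /** Resolves the export directory: merge/ for MERGED batch mode, per-map_id otherwise. */
  static Path outputDir(String baseDir, String inferenceId, boolean merged, String mapId) {
    return merged
        ? Path.of(baseDir, inferenceId, "merge")
        : Path.of(baseDir, inferenceId, mapId);
  }
}
```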

GeoServer Registration:

  • Only shapefile ZIP is uploaded (GeoJSON not registered)
  • Requires pre-created workspace 'cd' and environment variables for auth
  • Conditional execution via JobParameter geoserver.enabled
  • Non-blocking: failures logged but don't stop batch job
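The non-blocking behavior can be sketched as follows (hypothetical names; the real logic lives in GeoServerRegistrationTasklet): failures are caught and logged, and the batch job continues.

```java
public class NonBlockingRegistrationSketch {
  interface Registrar {
    void register(String layer) throws Exception;
  }

  /** Attempts registration; logs and swallows failures so the job can proceed. */
  static boolean tryRegister(Registrar registrar, String layer) {
    try {
      registrar.register(layer);
      return true;
    } catch (Exception e) {
      System.err.println("GeoServer registration failed for " + layer + ": " + e.getMessage());
      return false; // batch job continues regardless
    }
  }
}
```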

Configuration

Profile System

  • Default profile: prod (set in application.yml)
  • Configuration hierarchy: application.yml → application-{profile}.yml
  • Override via: --spring.profiles.active=dev

Key Configuration Properties

Converter Settings (ConverterProperties.java):

converter:
  inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6'  # Output folder name
  batch-ids: [252, 253, 257]  # PostgreSQL batch_id filter (required)
  map-ids: []                 # Legacy mode only (ignored in batch mode)
  mode: 'MERGED'              # Legacy mode only: MERGED, MAP_IDS, or RESOLVE
  output-base-dir: '/data/model_output/export/'
  crs: 'EPSG:5186'            # Korean 2000 / Central Belt

  batch:
    chunk-size: 1000          # Records per chunk (affects memory usage)
    fetch-size: 1000          # JDBC cursor fetch size
    skip-limit: 100           # Max skippable records per step
    enable-partitioning: false  # Future: parallel map_id processing
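The skip-limit semantics can be illustrated with a plain-Java sketch (hypothetical; Spring Batch implements this via faultTolerant().skipLimit(...) on the step builder): bad records are skipped until the limit is exceeded, at which point the step fails.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipLimitSketch {
  /** Processes records, tolerating up to skipLimit failures before aborting. */
  static List<Integer> processWithSkips(List<String> records, int skipLimit) {
    List<Integer> out = new ArrayList<>();
    int skipped = 0;
    for (String r : records) {
      try {
        out.add(Integer.parseInt(r)); // stands in for WKT parsing / feature conversion
      } catch (NumberFormatException e) {
        if (++skipped > skipLimit) {
          throw new IllegalStateException("skip limit exceeded", e); // step fails
        }
        // otherwise: record skipped, processing continues
      }
    }
    return out;
  }
}
```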

GeoServer Settings (GeoServerProperties.java):

geoserver:
  base-url: 'https://kamco.geo-dev.gs.dabeeo.com/geoserver'
  workspace: 'cd'              # Must be pre-created in GeoServer
  overwrite-existing: true     # Delete existing layer before registration
  connection-timeout: 30000    # 30 seconds
  read-timeout: 60000          # 60 seconds
  # Credentials from environment variables (preferred):
  # GEOSERVER_USERNAME, GEOSERVER_PASSWORD

Spring Batch Metadata:

spring:
  batch:
    job:
      enabled: false           # Prevent auto-run on startup
    jdbc:
      initialize-schema: always  # Auto-create BATCH_* tables

Database Integration

Query Strategies

Spring Batch Mode (streaming):

-- InferenceResultItemReaderConfig.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) as geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?)
  AND ST_GeometryType(geometry) IN ('ST_Polygon', 'ST_MultiPolygon')
  AND ST_SRID(geometry) = 5186
  AND ST_X(ST_Centroid(geometry)) BETWEEN 125000 AND 530000
  AND ST_Y(ST_Centroid(geometry)) BETWEEN -600000 AND 988000
  AND ST_IsValid(geometry) = true
ORDER BY map_id, uid
-- Uses server-side cursor with fetch-size=1000

Legacy Mode (full load):

-- InferenceResultRepository.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) as geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?) AND map_id = ?
-- Returns full List<InferenceResult> in memory

Geometry Type Validation:

-- GeometryTypeValidationTasklet.java
SELECT DISTINCT ST_GeometryType(geometry)
FROM inference_results_testing
WHERE batch_id = ANY(?) AND geometry IS NOT NULL
-- Pre-validates homogeneous geometry requirement

Field Mapping

Database columns map to shapefile fields (10-character limit):

| Database Column | DB Type | Shapefile Field | Shapefile Type | Notes |
|-----------------|---------|-----------------|----------------|-------|
| uid | uuid | chnDtctId | String | Change detection ID |
| map_id | text | mpqd_no | String | Map quadrant number |
| probability | float8 | chn_dtct_p | Double | Change detection probability |
| before_year | bigint | cprs_yr | Long | Comparison year |
| after_year | bigint | crtr_yr | Long | Criteria year |
| before_c | text | bf_cls_cd | String | Before classification code |
| before_p | float8 | bf_cls_pro | Double | Before classification probability |
| after_c | text | af_cls_cd | String | After classification code |
| after_p | float8 | af_cls_pro | Double | After classification probability |
| geometry | geom | the_geom | Polygon | Geometry in EPSG:5186 |

Field name source: See FeatureTypeFactory.java (batch/util/FeatureTypeFactory.java:1-104)
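The 10-character limit comes from the DBF attribute table used by shapefiles. A tiny guard like the following (a sketch; the authoritative names are defined in FeatureTypeFactory) can catch oversized names before GeoTools silently truncates them:

```java
import java.util.List;

public class DbfFieldNameCheck {
  /** DBF (shapefile attribute table) field names are limited to 10 characters. */
  static boolean allValid(List<String> names) {
    return names.stream().allMatch(n -> !n.isEmpty() && n.length() <= 10);
  }
}
```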

Coordinate Reference System

  • CRS: EPSG:5186 (Korean 2000 / Central Belt)
  • Valid Coordinate Bounds: X ∈ [125km, 530km], Y ∈ [-600km, 988km]
  • Encoding: WKT from SQL → JTS Geometry → GeoTools SimpleFeature; the CRS is written to the .prj sidecar file
  • Validation: Automatic in batch mode via ST_X(ST_Centroid()) range check

Dependencies

Core Framework:

  • Spring Boot 3.5.7
    • spring-boot-starter: DI container, logging
    • spring-boot-starter-jdbc: JDBC template, HikariCP
    • spring-boot-starter-batch: Spring Batch framework, job repository
    • spring-boot-starter-web: RestTemplate for GeoServer API calls
    • spring-boot-starter-validation: @NotBlank annotations

Spatial Libraries:

  • GeoTools 30.0 (via OSGeo repository)
    • gt-shapefile: Shapefile I/O (DataStore, FeatureStore, Transaction)
    • gt-geojson: GeoJSON encoding/decoding
    • gt-referencing: CRS transformations
    • gt-epsg-hsql: EPSG database for CRS lookups
  • JTS 1.19.0: Geometry primitives (Polygon, MultiPolygon, GeometryFactory)
  • PostGIS JDBC 2.5.1: PostGIS geometry type support

Database:

  • PostgreSQL JDBC Driver (latest)
  • HikariCP (bundled with Spring Boot)

Build Configuration:

// build.gradle
configurations.all {
  exclude group: 'javax.media', module: 'jai_core'  // Conflicts with GeoTools
}

bootJar {
  archiveFileName = "shp-exporter.jar"  // Fixed JAR name
}

spotless {
  java {
    googleJavaFormat('1.19.2')  // 2-space indentation
  }
}

Development Patterns

Adding a New Step to Spring Batch Job

When adding steps to mergedModeJob, follow this pattern:

  1. Create Tasklet or ItemWriter in batch/tasklet/ or batch/writer/
  2. Define Step Bean in MergedModeJobConfig.java:
@Bean
public Step myNewStep(JobRepository jobRepository,
                      PlatformTransactionManager transactionManager,
                      MyTasklet tasklet,
                      BatchExecutionHistoryListener historyListener) {
  return new StepBuilder("myNewStep", jobRepository)
      .tasklet(tasklet, transactionManager)
      .listener(historyListener)  // REQUIRED for history tracking
      .build();
}
  3. Add to Job Flow in mergedModeJob():
.next(myNewStep)
  4. Always include BatchExecutionHistoryListener to track execution metrics

Post-Export DB Hook (PostShapefileUpdateTasklet)

PostShapefileUpdateTasklet runs immediately after generateShapefileStep and serves as a placeholder for post-export UPDATE SQL (e.g., marking rows as exported). The SQL body is intentionally left as a // TODO; add your UPDATE statement inside execute():

// batch/tasklet/PostShapefileUpdateTasklet.java
int updated = jdbcTemplate.update(
    "UPDATE some_table SET status = 'EXPORTED' WHERE batch_id = ANY(?)",
    ps -> {
      ps.setArray(1, ps.getConnection().createArrayOf("bigint", batchIdList.toArray()));
    });

Job parameters available: inferenceId (String), batchIds (comma-separated String → List<Long>).

Modifying ItemReader Configuration

ItemReaders are not thread-safe. Each step requires its own instance:

// WRONG: Sharing reader between steps
@Bean
public JdbcCursorItemReader<InferenceResult> reader() { ... }

// RIGHT: Separate readers with @StepScope
@Bean
@StepScope  // Creates new instance per step
public JdbcCursorItemReader<InferenceResult> shapefileReader() { ... }

@Bean
@StepScope
public JdbcCursorItemReader<InferenceResult> geoJsonReader() { ... }

See InferenceResultItemReaderConfig.java for working examples.

Streaming Writers Pattern

When writing custom streaming writers, follow StreamingShapefileWriter pattern:

@Component
@StepScope
public class MyStreamingWriter implements ItemStreamWriter<MyType> {
    private Transaction transaction;

    @Override  // ItemStream callback (not @BeforeStep, whose methods take a StepExecution)
    public void open(ExecutionContext context) {
        // Open resources, start the long-running transaction
        transaction = new DefaultTransaction("create");
    }

    @Override
    public void write(Chunk<? extends MyType> chunk) {
        // Write chunk incrementally
        // Do NOT accumulate in memory
    }

    @AfterStep
    public ExitStatus afterStep(StepExecution stepExecution) {
        transaction.commit();  // Commit all chunks
        transaction.close();
        return ExitStatus.COMPLETED;
    }
}

JobParameters and StepExecutionContext

Pass data between steps via the job-level ExecutionContext (a step's own context is not visible to later steps unless its keys are promoted with ExecutionContextPromotionListener):

// Step 1: Store data in the job execution context
stepExecution.getJobExecution().getExecutionContext().putString("geometryType", "ST_Polygon");

// Step 2: Retrieve data
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    String geomType = stepExecution.getJobExecution()
        .getExecutionContext()
        .getString("geometryType");
}

Job-level parameters from command line:

// ConverterCommandLineRunner.buildJobParameters()
JobParametersBuilder builder = new JobParametersBuilder();
builder.addString("inferenceId", converterProperties.getInferenceId());
builder.addLong("timestamp", System.currentTimeMillis());  // Ensures uniqueness
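Why the timestamp matters: Spring Batch identifies a JobInstance by its identifying parameters, and re-launching a completed instance with identical parameters fails. The identity rule reduces to map equality, sketched here with plain maps (a toy; real code uses JobParameters):

```java
import java.util.Map;

public class JobInstanceIdentitySketch {
  /** Two launches collide iff their identifying parameter maps are equal. */
  static boolean sameInstance(Map<String, ?> a, Map<String, ?> b) {
    return a.equals(b);
  }
}
```

Adding a fresh timestamp parameter guarantees the maps differ, so every launch creates a new JobInstance.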

Partitioning Pattern (Map ID Processing)

The generateMapIdFilesStep uses partitioning but runs sequentially to avoid DB connection pool exhaustion:

@Bean
public Step generateMapIdFilesStep(...) {
    return new StepBuilder("generateMapIdFilesStep", jobRepository)
        .partitioner("mapIdWorker", partitioner)
        .step(mapIdWorkerStep)
        .taskExecutor(new SyncTaskExecutor())  // SEQUENTIAL execution
        .build();
}

For parallel execution in future (requires connection pool tuning):

.taskExecutor(new SimpleAsyncTaskExecutor())
.gridSize(4)  // 4 concurrent workers

GeoServer REST API Integration

GeoServer operations use RestTemplate with custom error handling:

// GeoServerRegistrationService.java
try {
    restTemplate.exchange(url, HttpMethod.PUT, entity, String.class);
} catch (HttpClientErrorException e) {
    if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
        // Handle workspace not found
    }
}

Always check workspace existence before layer registration.

Testing Considerations

  • Unit tests: Mock JdbcTemplate, DataSource for repository tests
  • Integration tests: Use @SpringBatchTest with embedded H2 database
  • GeoTools: Use MemoryDataStore for shapefile writer tests
  • Current state: Limited test coverage (focus on critical path validation)

Refer to claudedocs/SPRING_BATCH_MIGRATION.md for detailed batch architecture documentation.