# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Spring Boot 3.5.7 / Java 21 CLI application that converts PostgreSQL PostGIS spatial data to ESRI shapefiles and GeoJSON formats. The application uses Spring Batch for memory-efficient processing of large datasets (1M+ records) and supports automatic GeoServer layer registration via REST API.
Key Features:
- Memory-optimized batch processing (90-95% reduction: 2-13GB → 150-200MB)
- Chunk-based streaming with cursor pagination (fetch-size: 1000)
- Automatic geometry validation and type conversion (MultiPolygon → Polygon)
- Coordinate system validation (EPSG:5186 Korean 2000 / Central Belt)
- Three execution modes: Spring Batch (recommended), Legacy, and GeoServer registration-only
## Build and Run Commands

### Build

```bash
./gradlew build                 # Full build with tests
./gradlew clean build -x test   # Skip tests
./gradlew spotlessApply         # Apply Google Java Format (2-space indentation)
./gradlew spotlessCheck         # Verify formatting without applying
```

Output: `build/libs/shp-exporter.jar` (fixed name, no version suffix)
**Note**: The `Dockerfile` currently references `shp-exporter-v2.jar` in its `COPY` step, which does not match the actual build output. Update the Dockerfile if building a Docker image.
### Run Application

#### Spring Batch Mode (Recommended)

```bash
# Generate shapefile + GeoJSON
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252"

# With GeoServer registration
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"

# Using JAR (production)
java -jar build/libs/shp-exporter.jar \
  --batch \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.batch-ids[0]=252 \
  --converter.batch-ids[1]=253
```

#### Legacy Mode (Small Datasets Only)

```bash
./gradlew bootRun   # No --batch flag
# Warning: may OOM on large datasets
```
### Upload Shapefile to GeoServer

Set environment variables first:

```bash
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
```

Then upload:

```bash
./gradlew bootRun --args="--upload-shp /path/to/file.shp --layer layer_name"
```

Or using the JAR:

```bash
java -jar build/libs/shp-exporter.jar --upload-shp /path/to/file.shp --layer layer_name
```
### Override Configuration via Command Line

Using Gradle (recommended; no quoting issues):

```bash
./gradlew bootRun --args="--converter.inference-id=ABC123 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.mode=MERGED"
```

Using the JAR with zsh (quote arguments containing brackets):

```bash
java -jar build/libs/shp-exporter.jar '--converter.inference-id=ABC123' '--converter.map-ids[0]=35813030'
```
### Code Formatting

Apply Google Java Format (2-space indentation) before committing:

```bash
./gradlew spotlessApply
```

Check formatting without applying:

```bash
./gradlew spotlessCheck
```
### Active Profile

By default, the application runs with `spring.profiles.active=prod` (set in `application.yml`). Profile-specific configurations live in `application-{profile}.yml` files.
## Architecture

### Dual Execution Modes

The application supports two execution modes with distinct processing pipelines:

#### Spring Batch Mode (Recommended)

- **Trigger**: `--batch` flag
- **Use Case**: Large datasets (100K+ records), production workloads
- **Memory**: 150-200MB constant (chunk-based streaming)
Pipeline flow:

```text
ConverterCommandLineRunner
  → JobLauncher.run(mergedModeJob)
    → Step 1: GeometryTypeValidationTasklet (validates geometry homogeneity)
    → Step 2: generateShapefileStep (chunk-oriented)
        → JdbcCursorItemReader (fetch-size: 1000)
        → FeatureConversionProcessor (InferenceResult → SimpleFeature)
        → StreamingShapefileWriter (chunk-based append)
    → Step 2-1: PostShapefileUpdateTasklet (post-export DB UPDATE hook)
    → Step 3: generateGeoJsonStep (chunk-oriented, same pattern)
    → Step 4: CreateZipTasklet (creates .zip for GeoServer)
    → Step 5: GeoServerRegistrationTasklet (conditional, if --geoserver.enabled=true)
    → Step 6: generateMapIdFilesStep (partitioned, sequential map_id processing)
```
Key components:

- `JdbcCursorItemReader`: Cursor-based streaming (no full result set loading)
- `StreamingShapefileWriter`: Opens a GeoTools transaction, writes chunks incrementally, commits at the end
- `GeometryTypeValidationTasklet`: Pre-validates with SQL `DISTINCT ST_GeometryType()`, auto-converts MultiPolygon
- `CompositeItemWriter`: Writes shapefile and GeoJSON simultaneously in the map_id worker step
#### Legacy Mode

- **Trigger**: No `--batch` flag (deprecated)
- **Use Case**: Small datasets (<10K records)
- **Memory**: 1.4-9GB (loads the entire result set)

Pipeline flow:

```text
ConverterCommandLineRunner
  → ShapefileConverterService.convertAll()
    → InferenceResultRepository.findByBatchIds() (full List<InferenceResult>)
    → validateGeometries() (in-memory validation)
    → ShapefileWriter.write() (DefaultFeatureCollection accumulation)
    → GeoJsonWriter.write()
```
### Key Design Patterns

**Geometry Type Validation & Auto-Conversion:**

- Pre-validation step runs `SELECT DISTINCT ST_GeometryType(geometry)` to detect mixed types
- Supports automatic conversion: `ST_MultiPolygon` → `ST_Polygon` (extracts the first polygon only)
- Fails fast on unsupported mixed types (e.g., Polygon + LineString)
- Validates EPSG:5186 coordinate bounds (X: 125-530km, Y: -600-988km) and `ST_IsValid()`
- See `GeometryTypeValidationTasklet` (batch/tasklet/GeometryTypeValidationTasklet.java:1-290)
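The validation decision described above can be sketched in plain Java. This is a hypothetical helper mirroring the tasklet's logic, not the actual implementation; the class and method names are illustrative:

```java
import java.util.Set;

public class GeometryTypeCheck {
  enum Action { OK, CONVERT_MULTIPOLYGON, FAIL }

  // Decide what to do with the distinct geometry types returned by
  // SELECT DISTINCT ST_GeometryType(geometry). Hypothetical sketch of the
  // tasklet's decision logic.
  static Action decide(Set<String> distinctTypes) {
    if (distinctTypes.equals(Set.of("ST_Polygon"))) {
      return Action.OK; // homogeneous, nothing to do
    }
    if (distinctTypes.equals(Set.of("ST_MultiPolygon"))
        || distinctTypes.equals(Set.of("ST_Polygon", "ST_MultiPolygon"))) {
      return Action.CONVERT_MULTIPOLYGON; // auto-convert via first-polygon extraction
    }
    return Action.FAIL; // unsupported mix, e.g. Polygon + LineString
  }
}
```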
**WKT to JTS Conversion Pipeline:**

- PostGIS query returns `ST_AsText(geometry)` as a WKT string
- `GeometryConvertingRowMapper` converts each ResultSet row to an `InferenceResult` carrying the WKT string (batch/reader/GeometryConvertingRowMapper.java:1-74)
- `FeatureConversionProcessor` uses `GeometryConverter.parseGeometry()` to convert WKT → JTS Geometry (service/GeometryConverter.java:1-92)
- `StreamingShapefileWriter` wraps the JTS geometry in a GeoTools `SimpleFeature` and writes it to the shapefile
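The first stage of that pipeline, reading the geometry type tag off the WKT string, can be illustrated with a self-contained helper. This is illustrative only; the project delegates full WKT parsing to `GeometryConverter.parseGeometry()` (presumably backed by a real parser such as JTS's `WKTReader`):

```java
public class WktTypeTag {
  // Extract the leading geometry type keyword from a WKT string,
  // e.g. "POLYGON((0 0, 1 0, 1 1, 0 0))" → "POLYGON".
  // Illustrative only; full parsing belongs to a real WKT parser.
  static String typeOf(String wkt) {
    String trimmed = wkt.trim();
    int i = 0;
    while (i < trimmed.length() && Character.isLetter(trimmed.charAt(i))) {
      i++;
    }
    return trimmed.substring(0, i).toUpperCase();
  }
}
```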
**Chunk-Based Transaction Management (Spring Batch only):**

```java
// StreamingShapefileWriter
@BeforeStep
public void open() {
  transaction = new DefaultTransaction("create");
  featureStore.setTransaction(transaction); // Long-running transaction
}

@Override
public void write(Chunk<SimpleFeature> chunk) throws IOException {
  ListFeatureCollection collection = new ListFeatureCollection(featureType, chunk.getItems());
  featureStore.addFeatures(collection); // Append chunk to shapefile
  // chunk goes out of scope → GC eligible
}

@AfterStep
public void afterStep() throws IOException {
  transaction.commit(); // Commit all chunks at once
  transaction.close();
}
```
**PostgreSQL Array Parameter Handling:**

```java
// InferenceResultItemReaderConfig uses a PreparedStatementSetter
ps -> {
  Array batchIdsArray = ps.getConnection().createArrayOf("bigint", batchIds.toArray());
  ps.setArray(1, batchIdsArray); // WHERE batch_id = ANY(?)
  ps.setString(2, mapId);
}
```
**Output Directory Strategy:**

- Batch mode (MERGED): `{output-base-dir}/{inference-id}/merge/` → single merged shapefile + GeoJSON
- Batch mode (map_id partitioning): `{output-base-dir}/{inference-id}/{map-id}/` → per-map_id files
- Legacy mode: `{output-base-dir}/{inference-id}/{map-id}/` (no merge folder)
**GeoServer Registration:**

- Only the shapefile ZIP is uploaded (GeoJSON is not registered)
- Requires the pre-created workspace `cd` and environment variables for auth
- Conditional execution via the `geoserver.enabled` JobParameter
- Non-blocking: failures are logged but don't stop the batch job
## Configuration

### Profile System

- Default profile: `prod` (set in `application.yml`)
- Configuration hierarchy: `application.yml` → `application-{profile}.yml`
- Override via: `--spring.profiles.active=dev`
### Key Configuration Properties

Converter settings (`ConverterProperties.java`):

```yaml
converter:
  inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6'  # Output folder name
  batch-ids: [252, 253, 257]                      # PostgreSQL batch_id filter (required)
  map-ids: []                                     # Legacy mode only (ignored in batch mode)
  mode: 'MERGED'                                  # Legacy mode only: MERGED, MAP_IDS, or RESOLVE
  output-base-dir: '/data/model_output/export/'
  crs: 'EPSG:5186'                                # Korean 2000 / Central Belt
  batch:
    chunk-size: 1000            # Records per chunk (affects memory usage)
    fetch-size: 1000            # JDBC cursor fetch size
    skip-limit: 100             # Max skippable records per chunk
    enable-partitioning: false  # Future: parallel map_id processing
```
GeoServer settings (`GeoServerProperties.java`):

```yaml
geoserver:
  base-url: 'https://kamco.geo-dev.gs.dabeeo.com/geoserver'
  workspace: 'cd'            # Must be pre-created in GeoServer
  overwrite-existing: true   # Delete existing layer before registration
  connection-timeout: 30000  # 30 seconds
  read-timeout: 60000        # 60 seconds
  # Credentials from environment variables (preferred):
  # GEOSERVER_USERNAME, GEOSERVER_PASSWORD
```
Spring Batch metadata:

```yaml
spring:
  batch:
    job:
      enabled: false             # Prevent auto-run on startup
    jdbc:
      initialize-schema: always  # Auto-create BATCH_* tables
```
## Database Integration

### Query Strategies

Spring Batch mode (streaming):

```sql
-- InferenceResultItemReaderConfig.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) AS geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?)
  AND ST_GeometryType(geometry) IN ('ST_Polygon', 'ST_MultiPolygon')
  AND ST_SRID(geometry) = 5186
  AND ST_X(ST_Centroid(geometry)) BETWEEN 125000 AND 530000
  AND ST_Y(ST_Centroid(geometry)) BETWEEN -600000 AND 988000
  AND ST_IsValid(geometry) = true
ORDER BY map_id, uid
-- Uses a server-side cursor with fetch-size=1000
```
Legacy mode (full load):

```sql
-- InferenceResultRepository.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) AS geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?) AND map_id = ?
-- Returns a full List<InferenceResult> in memory
```
Geometry type validation:

```sql
-- GeometryTypeValidationTasklet.java
SELECT DISTINCT ST_GeometryType(geometry)
FROM inference_results_testing
WHERE batch_id = ANY(?) AND geometry IS NOT NULL
-- Pre-validates the homogeneous-geometry requirement
```
### Field Mapping

Database columns map to shapefile fields (DBF attribute names are limited to 10 characters):
| Database Column | DB Type | Shapefile Field | Shapefile Type | Notes |
|---|---|---|---|---|
| uid | uuid | chnDtctId | String | Change detection ID |
| map_id | text | mpqd_no | String | Map quadrant number |
| probability | float8 | chn_dtct_p | Double | Change detection probability |
| before_year | bigint | cprs_yr | Long | Comparison year |
| after_year | bigint | crtr_yr | Long | Criteria year |
| before_c | text | bf_cls_cd | String | Before classification code |
| before_p | float8 | bf_cls_pro | Double | Before classification probability |
| after_c | text | af_cls_cd | String | After classification code |
| after_p | float8 | af_cls_pro | Double | After classification probability |
| geometry | geom | the_geom | Polygon | Geometry in EPSG:5186 |
Field name source: see `FeatureTypeFactory.java` (batch/util/FeatureTypeFactory.java:1-104)
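Since DBF limits attribute names to 10 characters, a quick sanity check over the mapping above can be sketched as follows (hypothetical helper; the field names are taken from the table, the class itself is illustrative):

```java
import java.util.List;

public class FieldNameCheck {
  // Shapefile attribute names from the field-mapping table above.
  static final List<String> FIELDS = List.of(
      "chnDtctId", "mpqd_no", "chn_dtct_p", "cprs_yr", "crtr_yr",
      "bf_cls_cd", "bf_cls_pro", "af_cls_cd", "af_cls_pro", "the_geom");

  // DBF restricts attribute names to 10 characters; longer names may be
  // silently truncated by some writers, so it pays to validate up front.
  static boolean allWithinDbfLimit(List<String> names) {
    return names.stream().allMatch(n -> n.length() <= 10);
  }
}
```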
### Coordinate Reference System

- CRS: EPSG:5186 (Korean 2000 / Central Belt)
- Valid coordinate bounds: X ∈ [125km, 530km], Y ∈ [-600km, 988km]
- Encoding: WKT in SQL → JTS Geometry → GeoTools SimpleFeature → `.prj` file
- Validation: automatic in batch mode via an `ST_X(ST_Centroid())` range check
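An application-side equivalent of that range check might look like this (hypothetical helper; in this project the validation actually happens in SQL):

```java
public class Epsg5186Bounds {
  // Valid EPSG:5186 coordinate bounds used by the batch validation,
  // expressed in metres (X: 125-530 km, Y: -600-988 km).
  static final double MIN_X = 125_000, MAX_X = 530_000;
  static final double MIN_Y = -600_000, MAX_Y = 988_000;

  // True when a geometry centroid falls inside the expected Korean 2000
  // / Central Belt extent. Sketch only; production validation is done in
  // SQL via ST_X(ST_Centroid(geometry)) BETWEEN ... .
  static boolean centroidInBounds(double x, double y) {
    return x >= MIN_X && x <= MAX_X && y >= MIN_Y && y <= MAX_Y;
  }
}
```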
## Dependencies

Core framework:

- Spring Boot 3.5.7
  - `spring-boot-starter`: DI container, logging
  - `spring-boot-starter-jdbc`: JDBC template, HikariCP
  - `spring-boot-starter-batch`: Spring Batch framework, job repository
  - `spring-boot-starter-web`: RestTemplate for GeoServer API calls
  - `spring-boot-starter-validation`: @NotBlank annotations

Spatial libraries:

- GeoTools 30.0 (via OSGeo repository)
  - `gt-shapefile`: Shapefile I/O (DataStore, FeatureStore, Transaction)
  - `gt-geojson`: GeoJSON encoding/decoding
  - `gt-referencing`: CRS transformations
  - `gt-epsg-hsql`: EPSG database for CRS lookups
- JTS 1.19.0: Geometry primitives (Polygon, MultiPolygon, GeometryFactory)
- PostGIS JDBC 2.5.1: PostGIS geometry type support
Database:

- PostgreSQL JDBC Driver (latest)
- HikariCP (bundled with Spring Boot)

Build configuration:

```groovy
// build.gradle
configurations.all {
  exclude group: 'javax.media', module: 'jai_core' // Conflicts with GeoTools
}

bootJar {
  archiveFileName = "shp-exporter.jar" // Fixed JAR name
}

spotless {
  java {
    googleJavaFormat('1.19.2') // 2-space indentation
  }
}
```
## Development Patterns

### Adding a New Step to Spring Batch Job

When adding steps to `mergedModeJob`, follow this pattern:

1. Create a Tasklet or ItemWriter in `batch/tasklet/` or `batch/writer/`
2. Define a Step bean in `MergedModeJobConfig.java`:
```java
@Bean
public Step myNewStep(JobRepository jobRepository,
                      PlatformTransactionManager transactionManager,
                      MyTasklet tasklet,
                      BatchExecutionHistoryListener historyListener) {
  return new StepBuilder("myNewStep", jobRepository)
      .tasklet(tasklet, transactionManager)
      .listener(historyListener) // REQUIRED for history tracking
      .build();
}
```

3. Add it to the job flow in `mergedModeJob()`:

```java
.next(myNewStep)
```

4. Always include `BatchExecutionHistoryListener` to track execution metrics
### Post-Export DB Hook (PostShapefileUpdateTasklet)

`PostShapefileUpdateTasklet` runs immediately after `generateShapefileStep` and is designed as a placeholder for running UPDATE SQL after shapefile export (e.g., marking rows as exported). The SQL body is intentionally left as a `// TODO`; add your UPDATE statement inside `execute()`:

```java
// batch/tasklet/PostShapefileUpdateTasklet.java
int updated = jdbcTemplate.update(
    "UPDATE some_table SET status = 'EXPORTED' WHERE batch_id = ANY(?)",
    ps -> {
      ps.setArray(1, ps.getConnection().createArrayOf("bigint", batchIdList.toArray()));
    });
```

Job parameters available: `inferenceId` (String), `batchIds` (comma-separated String → `List<Long>`).
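Converting the `batchIds` job parameter from its comma-separated String form to `List<Long>` can be sketched as follows (hypothetical helper; the actual runner may perform this conversion differently):

```java
import java.util.Arrays;
import java.util.List;

public class BatchIdsParam {
  // Parse the comma-separated batchIds job parameter into List<Long>,
  // e.g. "252,253,257" → [252, 253, 257]. Illustrative sketch only.
  static List<Long> parse(String batchIds) {
    return Arrays.stream(batchIds.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .map(Long::valueOf)
        .toList();
  }
}
```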
### Modifying ItemReader Configuration

ItemReaders are not thread-safe. Each step requires its own instance:

```java
// WRONG: sharing a reader between steps
@Bean
public JdbcCursorItemReader<InferenceResult> reader() { ... }

// RIGHT: separate readers with @StepScope
@Bean
@StepScope // Creates a new instance per step
public JdbcCursorItemReader<InferenceResult> shapefileReader() { ... }

@Bean
@StepScope
public JdbcCursorItemReader<InferenceResult> geoJsonReader() { ... }
```
See `InferenceResultItemReaderConfig.java` for working examples.
### Streaming Writers Pattern

When writing custom streaming writers, follow the StreamingShapefileWriter pattern:

```java
@Component
@StepScope
public class MyStreamingWriter implements ItemStreamWriter<MyType> {

  private Transaction transaction;

  @BeforeStep
  public void beforeStep(StepExecution stepExecution) {
    // Open resources, start the long-running transaction
    transaction = new DefaultTransaction("create");
  }

  @Override
  public void write(Chunk<? extends MyType> chunk) throws Exception {
    // Write the chunk incrementally
    // Do NOT accumulate items in memory
  }

  @AfterStep
  public ExitStatus afterStep(StepExecution stepExecution) throws IOException {
    transaction.commit(); // Commit all chunks
    transaction.close();
    return ExitStatus.COMPLETED;
  }
}
```
### JobParameters and StepExecutionContext

Pass data between steps via the job-level ExecutionContext (entries stored only in the step-level context need an `ExecutionContextPromotionListener` to become visible to later steps):

```java
// Step 1: store data at the job level
stepExecution.getJobExecution()
    .getExecutionContext()
    .putString("geometryType", "ST_Polygon");

// Step 2: retrieve it
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
  String geomType = stepExecution.getJobExecution()
      .getExecutionContext()
      .getString("geometryType");
}
```
Job-level parameters from the command line:

```java
// ConverterCommandLineRunner.buildJobParameters()
JobParametersBuilder builder = new JobParametersBuilder();
builder.addString("inferenceId", converterProperties.getInferenceId());
builder.addLong("timestamp", System.currentTimeMillis()); // Ensures uniqueness
```
### Partitioning Pattern (Map ID Processing)

The generateMapIdFilesStep uses partitioning but runs sequentially to avoid DB connection pool exhaustion:

```java
@Bean
public Step generateMapIdFilesStep(...) {
  return new StepBuilder("generateMapIdFilesStep", jobRepository)
      .partitioner("mapIdWorker", partitioner)
      .step(mapIdWorkerStep)
      .taskExecutor(new SyncTaskExecutor()) // SEQUENTIAL execution
      .build();
}
```

For parallel execution in the future (requires connection pool tuning):

```java
.taskExecutor(new SimpleAsyncTaskExecutor())
.gridSize(4) // 4 concurrent workers
```
### GeoServer REST API Integration

GeoServer operations use RestTemplate with custom error handling:

```java
// GeoServerRegistrationService.java
try {
  restTemplate.exchange(url, HttpMethod.PUT, entity, String.class);
} catch (HttpClientErrorException e) {
  if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
    // Handle workspace not found
  }
}
```
Always check workspace existence before layer registration.
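Building the URL for that workspace pre-check can be sketched as follows (hypothetical helper; GeoServer's REST API exposes workspaces under `/rest/workspaces/{name}`, but the project's service may assemble URLs differently):

```java
public class GeoServerUrls {
  // Build the REST URL used to check that a workspace exists before
  // registering a layer, e.g. GET {base}/rest/workspaces/cd.json.
  // Hypothetical helper, not the project's actual code.
  static String workspaceUrl(String baseUrl, String workspace) {
    String base = baseUrl.endsWith("/")
        ? baseUrl.substring(0, baseUrl.length() - 1)
        : baseUrl;
    return base + "/rest/workspaces/" + workspace + ".json";
  }
}
```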
## Testing Considerations

- Unit tests: mock `JdbcTemplate` and `DataSource` for repository tests
- Integration tests: use `@SpringBatchTest` with an embedded H2 database
- GeoTools: use `MemoryDataStore` for shapefile writer tests
- Current state: limited test coverage (focus on critical-path validation)

Refer to `claudedocs/SPRING_BATCH_MIGRATION.md` for detailed batch architecture documentation.