# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Spring Boot 3.5.7 CLI application that converts PostgreSQL PostGIS spatial data to ESRI shapefile and GeoJSON formats. The application uses **Spring Batch** for memory-efficient processing of large datasets (1M+ records) and supports automatic GeoServer layer registration via REST API.

**Key Features**:

- Memory-optimized batch processing (90-95% reduction: 2-13GB → 150-200MB)
- Chunk-based streaming with cursor pagination (fetch-size: 1000)
- Automatic geometry validation and type conversion (MultiPolygon → Polygon)
- Coordinate system validation (EPSG:5186 Korean 2000 / Central Belt)
- Dual execution modes: Spring Batch (recommended) and Legacy mode

## Build and Run Commands

### Build

```bash
./gradlew build                # Full build with tests
./gradlew clean build -x test  # Skip tests
./gradlew spotlessApply        # Apply Google Java Format (2-space indentation)
./gradlew spotlessCheck        # Verify formatting without applying
```

Output: `build/libs/shp-exporter.jar` (fixed name, no version suffix)

### Run Application

#### Spring Batch Mode (Recommended)

```bash
# Generate shapefile + GeoJSON
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252"

# With GeoServer registration
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"

# Using JAR (production)
java -jar build/libs/shp-exporter.jar \
  --batch \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.batch-ids[0]=252 \
  --converter.batch-ids[1]=253
```

#### Legacy Mode (Small Datasets Only)

```bash
./gradlew bootRun  # No --batch flag
# Warning: May OOM on large datasets
```

#### Upload Shapefile to GeoServer

Set environment variables first:

```bash
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
```

Then upload:

```bash
./gradlew bootRun --args="--upload-shp /path/to/file.shp --layer layer_name"
```

Or using JAR:

```bash
java -jar build/libs/shp-exporter.jar --upload-shp /path/to/file.shp --layer layer_name
```

#### Override Configuration via Command Line

Using Gradle (recommended - no quoting issues):

```bash
./gradlew bootRun --args="--converter.inference-id=ABC123 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.mode=MERGED"
```

Using JAR with zsh (quote arguments with brackets):

```bash
java -jar build/libs/shp-exporter.jar '--converter.inference-id=ABC123' '--converter.map-ids[0]=35813030'
```

### Code Formatting

Apply Google Java Format (2-space indentation) before committing:

```bash
./gradlew spotlessApply
```

Check formatting without applying:

```bash
./gradlew spotlessCheck
```

### Active Profile

By default, the application runs with `spring.profiles.active=prod` (set in `application.yml`). Profile-specific configurations are in `application-{profile}.yml` files.

## Architecture

### Dual Execution Modes

The application supports two execution modes with distinct processing pipelines:

#### Spring Batch Mode (Recommended)

**Trigger**: `--batch` flag
**Use Case**: Large datasets (100K+ records), production workloads
**Memory**: 150-200MB constant (chunk-based streaming)

**Pipeline Flow**:

```
ConverterCommandLineRunner
  → JobLauncher.run(mergedModeJob)
    → Step 1: GeometryTypeValidationTasklet (validates geometry homogeneity)
    → Step 2: generateShapefileStep (chunk-oriented)
      → JdbcCursorItemReader (fetch-size: 1000)
      → FeatureConversionProcessor (InferenceResult → SimpleFeature)
      → StreamingShapefileWriter (chunk-based append)
    → Step 3: generateGeoJsonStep (chunk-oriented, same pattern)
    → Step 4: CreateZipTasklet (creates .zip for GeoServer)
    → Step 5: GeoServerRegistrationTasklet (conditional, if --geoserver.enabled=true)
    → Step 6: generateMapIdFilesStep (partitioned, sequential map_id processing)
```

**Key Components**:

- `JdbcCursorItemReader`: Cursor-based streaming (no full result set loading)
- `StreamingShapefileWriter`: Opens GeoTools transaction, writes chunks incrementally, commits at end
- `GeometryTypeValidationTasklet`: Pre-validates with SQL `DISTINCT ST_GeometryType()`, auto-converts MultiPolygon
- `CompositeItemWriter`: Simultaneously writes shapefile and GeoJSON in the map_id worker step

#### Legacy Mode

**Trigger**: No `--batch` flag (deprecated)
**Use Case**: Small datasets (<10K records)
**Memory**: 1.4-9GB (loads entire result set)

**Pipeline Flow**:

```
ConverterCommandLineRunner
  → ShapefileConverterService.convertAll()
    → InferenceResultRepository.findByBatchIds() (full List)
    → validateGeometries() (in-memory validation)
    → ShapefileWriter.write() (DefaultFeatureCollection accumulation)
    → GeoJsonWriter.write()
```

### Key Design Patterns

**Geometry Type Validation & Auto-Conversion**:

- Pre-validation step runs SQL `SELECT DISTINCT ST_GeometryType(geometry)` to detect mixed types
- Supports automatic conversion: `ST_MultiPolygon` → `ST_Polygon` (extracts first polygon only)
- Fails fast on unsupported mixed types (e.g., Polygon + LineString)
- Validates EPSG:5186 coordinate bounds (X: 125-530km, Y: -600-988km) and `ST_IsValid()`
- See `GeometryTypeValidationTasklet` (batch/tasklet/GeometryTypeValidationTasklet.java:1-290)

**WKT to JTS Conversion Pipeline**:

1. PostGIS query returns `ST_AsText(geometry)` as a WKT string
2. `GeometryConvertingRowMapper` converts each ResultSet row to an `InferenceResult` with the WKT string (batch/reader/GeometryConvertingRowMapper.java:1-74)
3. `FeatureConversionProcessor` uses `GeometryConverter.parseGeometry()` to convert WKT → JTS Geometry (service/GeometryConverter.java:1-92)
4. `StreamingShapefileWriter` wraps the JTS geometry in a GeoTools `SimpleFeature` and writes it to the shapefile

**Chunk-Based Transaction Management** (Spring Batch only):

```java
// StreamingShapefileWriter
@BeforeStep
public void open() {
  transaction = new DefaultTransaction("create");
  featureStore.setTransaction(transaction); // Long-running transaction
}

@Override
public void write(Chunk chunk) {
  ListFeatureCollection collection =
      new ListFeatureCollection(featureType, chunk.getItems());
  featureStore.addFeatures(collection); // Append chunk to shapefile
  // chunk goes out of scope → GC eligible
}

@AfterStep
public void afterStep() {
  transaction.commit(); // Commit all chunks at once
  transaction.close();
}
```

**PostgreSQL Array Parameter Handling**:

```java
// InferenceResultItemReaderConfig uses PreparedStatementSetter
ps -> {
  Array batchIdsArray = ps.getConnection().createArrayOf("bigint", batchIds.toArray());
  ps.setArray(1, batchIdsArray); // WHERE batch_id = ANY(?)
  ps.setString(2, mapId);
}
```

**Output Directory Strategy**:

- Batch mode (MERGED): `{output-base-dir}/{inference-id}/merge/` → single merged shapefile + GeoJSON
- Batch mode (map_id partitioning): `{output-base-dir}/{inference-id}/{map-id}/` → per-map_id files
- Legacy mode: `{output-base-dir}/{inference-id}/{map-id}/` (no merge folder)

**GeoServer Registration**:

- Only the shapefile ZIP is uploaded (GeoJSON not registered)
- Requires pre-created workspace 'cd' and environment variables for auth
- Conditional execution via JobParameter `geoserver.enabled`
- Non-blocking: failures are logged but don't stop the batch job

## Configuration

### Profile System

- Default profile: `prod` (set in application.yml)
- Configuration hierarchy: `application.yml` → `application-{profile}.yml`
- Override via: `--spring.profiles.active=dev`

### Key Configuration Properties

**Converter Settings** (`ConverterProperties.java`):

```yaml
converter:
  inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6' # Output folder name
  batch-ids: [252, 253, 257] # PostgreSQL batch_id filter (required)
  map-ids: [] # Legacy mode only (ignored in batch mode)
  mode: 'MERGED' # Legacy mode only: MERGED, MAP_IDS, or RESOLVE
  output-base-dir: '/data/model_output/export/'
  crs: 'EPSG:5186' # Korean 2000 / Central Belt
  batch:
    chunk-size: 1000 # Records per chunk (affects memory usage)
    fetch-size: 1000 # JDBC cursor fetch size
    skip-limit: 100 # Max skippable records per chunk
    enable-partitioning: false # Future: parallel map_id processing
```

**GeoServer Settings** (`GeoServerProperties.java`):

```yaml
geoserver:
  base-url: 'https://kamco.geo-dev.gs.dabeeo.com/geoserver'
  workspace: 'cd' # Must be pre-created in GeoServer
  overwrite-existing: true # Delete existing layer before registration
  connection-timeout: 30000 # 30 seconds
  read-timeout: 60000 # 60 seconds
  # Credentials from environment variables (preferred):
  # GEOSERVER_USERNAME, GEOSERVER_PASSWORD
```

**Spring Batch Metadata**:

```yaml
spring:
  batch:
    job:
      enabled: false # Prevent auto-run on startup
    jdbc:
      initialize-schema: always # Auto-create BATCH_* tables
```

## Database Integration

### Query Strategies

**Spring Batch Mode** (streaming):

```sql
-- InferenceResultItemReaderConfig.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) as geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?)
  AND ST_GeometryType(geometry) IN ('ST_Polygon', 'ST_MultiPolygon')
  AND ST_SRID(geometry) = 5186
  AND ST_X(ST_Centroid(geometry)) BETWEEN 125000 AND 530000
  AND ST_Y(ST_Centroid(geometry)) BETWEEN -600000 AND 988000
  AND ST_IsValid(geometry) = true
ORDER BY map_id, uid
-- Uses server-side cursor with fetch-size=1000
```

**Legacy Mode** (full load):

```sql
-- InferenceResultRepository.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) as geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?) AND map_id = ?
-- Returns full List in memory
```

**Geometry Type Validation**:

```sql
-- GeometryTypeValidationTasklet.java
SELECT DISTINCT ST_GeometryType(geometry)
FROM inference_results_testing
WHERE batch_id = ANY(?) AND geometry IS NOT NULL
-- Pre-validates homogeneous geometry requirement
```

### Field Mapping

Database columns map to shapefile fields (10-character limit):

| Database Column | DB Type | Shapefile Field | Shapefile Type | Notes |
|-----------------|---------|-----------------|----------------|-------|
| uid | uuid | chnDtctId | String | Change detection ID |
| map_id | text | mpqd_no | String | Map quadrant number |
| probability | float8 | chn_dtct_p | Double | Change detection probability |
| before_year | bigint | cprs_yr | Long | Comparison year |
| after_year | bigint | crtr_yr | Long | Criteria year |
| before_c | text | bf_cls_cd | String | Before classification code |
| before_p | float8 | bf_cls_pro | Double | Before classification probability |
| after_c | text | af_cls_cd | String | After classification code |
| after_p | float8 | af_cls_pro | Double | After classification probability |
| geometry | geom | the_geom | Polygon | Geometry in EPSG:5186 |

**Field name source**: See `FeatureTypeFactory.java` (batch/util/FeatureTypeFactory.java:1-104)

### Coordinate Reference System

- **CRS**: EPSG:5186 (Korean 2000 / Central Belt)
- **Valid Coordinate Bounds**: X ∈ [125km, 530km], Y ∈ [-600km, 988km]
- **Encoding**: WKT in SQL → JTS Geometry → GeoTools SimpleFeature → `.prj` file
- **Validation**: Automatic in batch mode via `ST_X(ST_Centroid())` range check

## Dependencies

**Core Framework**:

- Spring Boot 3.5.7
  - `spring-boot-starter`: DI container, logging
  - `spring-boot-starter-jdbc`: JDBC template, HikariCP
  - `spring-boot-starter-batch`: Spring Batch framework, job repository
  - `spring-boot-starter-web`: RestTemplate for GeoServer API calls
  - `spring-boot-starter-validation`: @NotBlank annotations

**Spatial Libraries**:

- GeoTools 30.0 (via OSGeo repository)
  - `gt-shapefile`: Shapefile I/O (DataStore, FeatureStore, Transaction)
  - `gt-geojson`: GeoJSON encoding/decoding
  - `gt-referencing`: CRS transformations
  - `gt-epsg-hsql`: EPSG database for CRS lookups
- JTS 1.19.0: Geometry primitives (Polygon, MultiPolygon, GeometryFactory)
- PostGIS JDBC 2.5.1: PostGIS geometry type support

**Database**:

- PostgreSQL JDBC Driver (latest)
- HikariCP (bundled with Spring Boot)

**Build Configuration**:

```gradle
// build.gradle
configurations.all {
  exclude group: 'javax.media', module: 'jai_core' // Conflicts with GeoTools
}

bootJar {
  archiveFileName = "shp-exporter.jar" // Fixed JAR name
}

spotless {
  java {
    googleJavaFormat('1.19.2') // 2-space indentation
  }
}
```

## Development Patterns

### Adding a New Step to Spring Batch Job

When adding steps to `mergedModeJob`, follow this pattern:

1. **Create Tasklet or ItemWriter** in `batch/tasklet/` or `batch/writer/`
2. **Define Step Bean** in `MergedModeJobConfig.java`:

   ```java
   @Bean
   public Step myNewStep(
       JobRepository jobRepository,
       PlatformTransactionManager transactionManager,
       MyTasklet tasklet,
       BatchExecutionHistoryListener historyListener) {
     return new StepBuilder("myNewStep", jobRepository)
         .tasklet(tasklet, transactionManager)
         .listener(historyListener) // REQUIRED for history tracking
         .build();
   }
   ```

3. **Add to Job Flow** in `mergedModeJob()`:

   ```java
   .next(myNewStep)
   ```

4. **Always include `BatchExecutionHistoryListener`** to track execution metrics

### Modifying ItemReader Configuration

ItemReaders are **not thread-safe**. Each step requires its own instance:

```java
// WRONG: Sharing reader between steps
@Bean
public JdbcCursorItemReader reader() { ... }

// RIGHT: Separate readers with @StepScope
@Bean
@StepScope // Creates new instance per step
public JdbcCursorItemReader shapefileReader() { ... }

@Bean
@StepScope
public JdbcCursorItemReader geoJsonReader() { ... }
```

See `InferenceResultItemReaderConfig.java` for working examples.
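Putting the pieces above together, a step-scoped cursor reader might look like the following. This is a minimal sketch, not the project's actual configuration: `ExampleReaderConfig`, the `Row` record, the trimmed-down SQL, and the hard-coded batch id `252` are all illustrative; the real bean lives in `InferenceResultItemReaderConfig.java` and maps to `InferenceResult`.

```java
import java.sql.ResultSet;
import javax.sql.DataSource;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ExampleReaderConfig {

  /** Minimal row type for the sketch; the real project maps rows to InferenceResult. */
  public record Row(String uid, String geometryWkt) {}

  // One @StepScope reader per step: Spring creates a fresh instance (and a
  // fresh cursor) for each step execution, so steps never share reader state.
  @Bean
  @StepScope
  public JdbcCursorItemReader<Row> exampleShapefileReader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<Row>()
        .name("exampleShapefileReader") // required: keys the reader's ExecutionContext state
        .dataSource(dataSource)
        .sql("SELECT uid, ST_AsText(geometry) AS geometry_wkt "
            + "FROM inference_results_testing WHERE batch_id = ANY(?)")
        .preparedStatementSetter(ps -> {
          // PostgreSQL array binding, as in the batch_id = ANY(?) pattern above
          java.sql.Array ids = ps.getConnection().createArrayOf("bigint", new Object[] {252L});
          ps.setArray(1, ids);
        })
        .fetchSize(1000) // stream via server-side cursor (matches converter.batch.fetch-size)
        .rowMapper((ResultSet rs, int rowNum) ->
            new Row(rs.getString("uid"), rs.getString("geometry_wkt")))
        .build();
  }
}
```

The `.name(...)` call matters in practice: it namespaces the reader's restart state in the `ExecutionContext`, so two step-scoped readers in the same job must use distinct names.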
### Streaming Writers Pattern

When writing custom streaming writers, follow the `StreamingShapefileWriter` pattern:

```java
@Component
@StepScope
public class MyStreamingWriter implements ItemStreamWriter {

  private Transaction transaction;

  @BeforeStep
  public void open(ExecutionContext context) {
    // Open resources, start transaction
    transaction = new DefaultTransaction("create");
  }

  @Override
  public void write(Chunk chunk) {
    // Write chunk incrementally
    // Do NOT accumulate in memory
  }

  @AfterStep
  public ExitStatus afterStep(StepExecution stepExecution) {
    transaction.commit(); // Commit all chunks
    transaction.close();
    return ExitStatus.COMPLETED;
  }
}
```

### JobParameters and StepExecutionContext

**Pass data between steps** using the `ExecutionContext`:

```java
// Step 1: Store data
stepExecution.getExecutionContext().putString("geometryType", "ST_Polygon");

// Step 2: Retrieve data
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
  String geomType =
      stepExecution.getJobExecution().getExecutionContext().getString("geometryType");
}
```

Note: data stored in a step-level `ExecutionContext` only becomes visible at the job level after promotion (e.g., via `ExecutionContextPromotionListener`), or by writing directly to `stepExecution.getJobExecution().getExecutionContext()` in step 1.

**Job-level parameters** from the command line:

```java
// ConverterCommandLineRunner.buildJobParameters()
JobParametersBuilder builder = new JobParametersBuilder();
builder.addString("inferenceId", converterProperties.getInferenceId());
builder.addLong("timestamp", System.currentTimeMillis()); // Ensures uniqueness
```

### Partitioning Pattern (Map ID Processing)

The `generateMapIdFilesStep` uses partitioning but runs **sequentially** to avoid DB connection pool exhaustion:

```java
@Bean
public Step generateMapIdFilesStep(...) {
  return new StepBuilder("generateMapIdFilesStep", jobRepository)
      .partitioner("mapIdWorker", partitioner)
      .step(mapIdWorkerStep)
      .taskExecutor(new SyncTaskExecutor()) // SEQUENTIAL execution
      .build();
}
```

For parallel execution in the future (requires connection pool tuning):

```java
.taskExecutor(new SimpleAsyncTaskExecutor())
.gridSize(4) // 4 concurrent workers
```

### GeoServer REST API Integration

GeoServer operations use `RestTemplate` with custom error handling:

```java
// GeoServerRegistrationService.java
try {
  restTemplate.exchange(url, HttpMethod.PUT, entity, String.class);
} catch (HttpClientErrorException e) {
  if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
    // Handle workspace not found
  }
}
```

Always check workspace existence before layer registration.

### Testing Considerations

- **Unit tests**: Mock `JdbcTemplate`, `DataSource` for repository tests
- **Integration tests**: Use `@SpringBatchTest` with an embedded H2 database
- **GeoTools**: Use `MemoryDataStore` for shapefile writer tests
- **Current state**: Limited test coverage (focus on critical path validation)

Refer to `claudedocs/SPRING_BATCH_MIGRATION.md` for detailed batch architecture documentation.
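The `@SpringBatchTest` approach mentioned under Testing Considerations can be sketched as below. This is a hedged outline, not an existing test in the repository: `GenerateShapefileStepTest` is a hypothetical name, and the test assumes the job configuration plus an embedded database (e.g., H2) are available on the test classpath. Only the step name `generateShapefileStep` comes from the job described above.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.batch.test.context.SpringBatchTest;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBatchTest // registers JobLauncherTestUtils and JobRepositoryTestUtils beans
@SpringBootTest
class GenerateShapefileStepTest {

  @Autowired private JobLauncherTestUtils jobLauncherTestUtils;

  @Test
  void shapefileStepCompletes() throws Exception {
    // Launch a single step in isolation rather than the whole job;
    // requires the job's DataSource and step beans in the test context.
    JobExecution execution = jobLauncherTestUtils.launchStep("generateShapefileStep");
    assertEquals(ExitStatus.COMPLETED, execution.getExitStatus());
  }
}
```

Launching one step at a time keeps batch tests fast and avoids exercising the GeoServer registration step, which is conditional anyway.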