# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Spring Boot 3.5.7 CLI application that converts PostgreSQL PostGIS spatial data to ESRI shapefile and GeoJSON formats. The application uses **Spring Batch** for memory-efficient processing of large datasets (1M+ records) and supports automatic GeoServer layer registration via REST API.

**Key Features**:

- Memory-optimized batch processing (90-95% reduction: 2-13GB → 150-200MB)
- Chunk-based streaming with cursor pagination (fetch-size: 1000)
- Automatic geometry validation and type conversion (MultiPolygon → Polygon)
- Coordinate system validation (EPSG:5186 Korean 2000 / Central Belt)
- Dual execution modes: Spring Batch (recommended) and legacy mode
## Build and Run Commands

### Build

```bash
./gradlew build                 # Full build with tests
./gradlew clean build -x test   # Skip tests
./gradlew spotlessApply         # Apply Google Java Format (2-space indentation)
./gradlew spotlessCheck         # Verify formatting without applying
```

Output: `build/libs/shp-exporter.jar` (fixed name, no version suffix)
### Run Application

#### Spring Batch Mode (Recommended)

```bash
# Generate shapefile + GeoJSON
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252"

# With GeoServer registration
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"

# Using JAR (production)
java -jar build/libs/shp-exporter.jar \
  --batch \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.batch-ids[0]=252 \
  --converter.batch-ids[1]=253
```
#### Legacy Mode (Small Datasets Only)

```bash
./gradlew bootRun   # No --batch flag
# Warning: may OOM on large datasets
```
#### Upload Shapefile to GeoServer

Set environment variables first:

```bash
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
```

Then upload:

```bash
./gradlew bootRun --args="--upload-shp /path/to/file.shp --layer layer_name"
```

Or using the JAR:

```bash
java -jar build/libs/shp-exporter.jar --upload-shp /path/to/file.shp --layer layer_name
```
#### Override Configuration via Command Line

Using Gradle (recommended; avoids shell quoting issues):

```bash
./gradlew bootRun --args="--converter.inference-id=ABC123 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.mode=MERGED"
```

Using the JAR with zsh (quote arguments containing brackets):

```bash
java -jar build/libs/shp-exporter.jar '--converter.inference-id=ABC123' '--converter.map-ids[0]=35813030'
```
### Code Formatting

Apply Google Java Format (2-space indentation) before committing:

```bash
./gradlew spotlessApply
```

Check formatting without applying:

```bash
./gradlew spotlessCheck
```
### Active Profile

By default, the application runs with `spring.profiles.active=prod` (set in `application.yml`). Profile-specific configuration lives in `application-{profile}.yml` files.

## Architecture

### Dual Execution Modes

The application supports two execution modes with distinct processing pipelines:
#### Spring Batch Mode (Recommended)

**Trigger**: `--batch` flag
**Use Case**: Large datasets (100K+ records), production workloads
**Memory**: 150-200MB constant (chunk-based streaming)

**Pipeline Flow**:

```
ConverterCommandLineRunner
  → JobLauncher.run(mergedModeJob)
    → Step 1: GeometryTypeValidationTasklet (validates geometry homogeneity)
    → Step 2: generateShapefileStep (chunk-oriented)
        → JdbcCursorItemReader (fetch-size: 1000)
        → FeatureConversionProcessor (InferenceResult → SimpleFeature)
        → StreamingShapefileWriter (chunk-based append)
    → Step 3: generateGeoJsonStep (chunk-oriented, same pattern)
    → Step 4: CreateZipTasklet (creates .zip for GeoServer)
    → Step 5: GeoServerRegistrationTasklet (conditional, runs if --geoserver.enabled=true)
    → Step 6: generateMapIdFilesStep (partitioned, sequential map_id processing)
```
**Key Components**:

- `JdbcCursorItemReader`: Cursor-based streaming (no full result-set loading)
- `StreamingShapefileWriter`: Opens a GeoTools transaction, writes chunks incrementally, commits at the end
- `GeometryTypeValidationTasklet`: Pre-validates with SQL `DISTINCT ST_GeometryType()`, auto-converts MultiPolygon
- `CompositeItemWriter`: Writes shapefile and GeoJSON simultaneously in the map_id worker step
#### Legacy Mode

**Trigger**: No `--batch` flag (deprecated)
**Use Case**: Small datasets (<10K records)
**Memory**: 1.4-9GB (loads the entire result set)

**Pipeline Flow**:

```
ConverterCommandLineRunner
  → ShapefileConverterService.convertAll()
    → InferenceResultRepository.findByBatchIds() (full List<InferenceResult>)
    → validateGeometries() (in-memory validation)
    → ShapefileWriter.write() (DefaultFeatureCollection accumulation)
    → GeoJsonWriter.write()
```
### Key Design Patterns

**Geometry Type Validation & Auto-Conversion**:

- A pre-validation step runs SQL `SELECT DISTINCT ST_GeometryType(geometry)` to detect mixed types
- Supports automatic conversion: `ST_MultiPolygon` → `ST_Polygon` (extracts the first polygon only)
- Fails fast on unsupported mixed types (e.g., Polygon + LineString)
- Validates EPSG:5186 coordinate bounds (X: 125-530 km, Y: -600-988 km) and `ST_IsValid()`
- See `GeometryTypeValidationTasklet` (batch/tasklet/GeometryTypeValidationTasklet.java:1-290)
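The coordinate-bounds part of this validation amounts to a simple range predicate. A minimal stand-alone sketch (class and method names here are illustrative, not the project's actual code, which performs the check in SQL):

```java
// Illustrative sketch of the EPSG:5186 centroid bounds check that the
// validation step performs in SQL; the class name is hypothetical.
public class CoordinateBoundsCheck {
  // Valid EPSG:5186 ranges used by the validation query (metres)
  static final double MIN_X = 125_000, MAX_X = 530_000;
  static final double MIN_Y = -600_000, MAX_Y = 988_000;

  /** Mirrors the BETWEEN predicates applied to ST_Centroid(geometry). */
  static boolean isWithinBounds(double centroidX, double centroidY) {
    return centroidX >= MIN_X && centroidX <= MAX_X
        && centroidY >= MIN_Y && centroidY <= MAX_Y;
  }
}
```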
**WKT to JTS Conversion Pipeline**:

1. PostGIS query returns `ST_AsText(geometry)` as a WKT string
2. `GeometryConvertingRowMapper` converts each ResultSet row to an `InferenceResult` holding the WKT string (batch/reader/GeometryConvertingRowMapper.java:1-74)
3. `FeatureConversionProcessor` uses `GeometryConverter.parseGeometry()` to convert WKT → JTS Geometry (service/GeometryConverter.java:1-92)
4. `StreamingShapefileWriter` wraps the JTS geometry in a GeoTools `SimpleFeature` and writes it to the shapefile
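To make the hand-off concrete, the strings flowing through steps 1-3 look like `POLYGON((x1 y1, x2 y2, ...))`. A hypothetical pure-Java sketch extracting the outer-ring coordinates of such a string (the real pipeline delegates parsing to JTS via `GeometryConverter.parseGeometry()`, not hand-rolled parsing):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the WKT shape the pipeline consumes; the real
// code parses with JTS, not by hand.
public class WktSketch {
  /** Extracts the (x, y) pairs of a simple POLYGON's outer ring. */
  static List<double[]> outerRing(String wkt) {
    Matcher m = Pattern.compile("POLYGON\\s*\\(\\(([^)]*)\\)").matcher(wkt);
    if (!m.find()) throw new IllegalArgumentException("not a simple POLYGON: " + wkt);
    List<double[]> points = new ArrayList<>();
    for (String pair : m.group(1).split(",")) {
      String[] xy = pair.trim().split("\\s+");
      points.add(new double[] {Double.parseDouble(xy[0]), Double.parseDouble(xy[1])});
    }
    return points;
  }
}
```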
**Chunk-Based Transaction Management** (Spring Batch only):

```java
// StreamingShapefileWriter
@BeforeStep
public void open() {
  transaction = new DefaultTransaction("create");
  featureStore.setTransaction(transaction); // Long-running transaction
}

@Override
public void write(Chunk<SimpleFeature> chunk) {
  ListFeatureCollection collection = new ListFeatureCollection(featureType, chunk.getItems());
  featureStore.addFeatures(collection); // Append chunk to shapefile
  // chunk goes out of scope → GC eligible
}

@AfterStep
public void afterStep() {
  transaction.commit(); // Commit all chunks at once
  transaction.close();
}
```
**PostgreSQL Array Parameter Handling**:

```java
// InferenceResultItemReaderConfig uses a PreparedStatementSetter
ps -> {
  Array batchIdsArray = ps.getConnection().createArrayOf("bigint", batchIds.toArray());
  ps.setArray(1, batchIdsArray); // WHERE batch_id = ANY(?)
  ps.setString(2, mapId);
}
```
**Output Directory Strategy**:

- Batch mode (MERGED): `{output-base-dir}/{inference-id}/merge/` → single merged shapefile + GeoJSON
- Batch mode (map_id partitioning): `{output-base-dir}/{inference-id}/{map-id}/` → per-map_id files
- Legacy mode: `{output-base-dir}/{inference-id}/{map-id}/` (no merge folder)
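The layout can be sketched with `java.nio.file.Path`; the helper and its method names are illustrative, not the project's actual code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper mirroring the output directory strategy.
public class OutputPaths {
  /** Batch MERGED mode: {output-base-dir}/{inference-id}/merge/ */
  static Path mergedDir(String baseDir, String inferenceId) {
    return Paths.get(baseDir, inferenceId, "merge");
  }

  /** Per-map_id output: {output-base-dir}/{inference-id}/{map-id}/ */
  static Path mapIdDir(String baseDir, String inferenceId, String mapId) {
    return Paths.get(baseDir, inferenceId, mapId);
  }
}
```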
**GeoServer Registration**:

- Only the shapefile ZIP is uploaded (GeoJSON is not registered)
- Requires a pre-created workspace `cd` and environment variables for auth
- Conditional execution via the JobParameter `geoserver.enabled`
- Non-blocking: failures are logged but do not stop the batch job
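The non-blocking contract reduces to a tiny wrapper. This is an illustrative sketch of the behavior, not the actual `GeoServerRegistrationTasklet` code:

```java
import java.util.function.Consumer;

// Illustrative: a failure is handed to the logger but never rethrown,
// so the surrounding batch job keeps running.
public class NonBlockingCall {
  /** Runs the action; returns false instead of propagating a failure. */
  static boolean tryRun(Runnable action, Consumer<Exception> log) {
    try {
      action.run();
      return true;
    } catch (RuntimeException e) {
      log.accept(e);
      return false;
    }
  }
}
```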
## Configuration

### Profile System

- Default profile: `prod` (set in `application.yml`)
- Configuration hierarchy: `application.yml` → `application-{profile}.yml`
- Override via: `--spring.profiles.active=dev`
### Key Configuration Properties

**Converter Settings** (`ConverterProperties.java`):

```yaml
converter:
  inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6' # Output folder name
  batch-ids: [252, 253, 257]                     # PostgreSQL batch_id filter (required)
  map-ids: []                                    # Legacy mode only (ignored in batch mode)
  mode: 'MERGED'                                 # Legacy mode only: MERGED, MAP_IDS, or RESOLVE
  output-base-dir: '/data/model_output/export/'
  crs: 'EPSG:5186'                               # Korean 2000 / Central Belt

  batch:
    chunk-size: 1000            # Records per chunk (affects memory usage)
    fetch-size: 1000            # JDBC cursor fetch size
    skip-limit: 100             # Max skippable records per chunk
    enable-partitioning: false  # Future: parallel map_id processing
```
**GeoServer Settings** (`GeoServerProperties.java`):

```yaml
geoserver:
  base-url: 'https://kamco.geo-dev.gs.dabeeo.com/geoserver'
  workspace: 'cd'            # Must be pre-created in GeoServer
  overwrite-existing: true   # Delete existing layer before registration
  connection-timeout: 30000  # 30 seconds
  read-timeout: 60000        # 60 seconds
  # Credentials from environment variables (preferred):
  # GEOSERVER_USERNAME, GEOSERVER_PASSWORD
```
**Spring Batch Metadata**:

```yaml
spring:
  batch:
    job:
      enabled: false # Prevent auto-run on startup
    jdbc:
      initialize-schema: always # Auto-create BATCH_* tables
```
## Database Integration

### Query Strategies

**Spring Batch Mode** (streaming):

```sql
-- InferenceResultItemReaderConfig.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) AS geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?)
  AND ST_GeometryType(geometry) IN ('ST_Polygon', 'ST_MultiPolygon')
  AND ST_SRID(geometry) = 5186
  AND ST_X(ST_Centroid(geometry)) BETWEEN 125000 AND 530000
  AND ST_Y(ST_Centroid(geometry)) BETWEEN -600000 AND 988000
  AND ST_IsValid(geometry) = true
ORDER BY map_id, uid
-- Uses a server-side cursor with fetch-size=1000
```
**Legacy Mode** (full load):

```sql
-- InferenceResultRepository.java
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p,
       ST_AsText(geometry) AS geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?) AND map_id = ?
-- Returns the full List<InferenceResult> in memory
```
**Geometry Type Validation**:

```sql
-- GeometryTypeValidationTasklet.java
SELECT DISTINCT ST_GeometryType(geometry)
FROM inference_results_testing
WHERE batch_id = ANY(?) AND geometry IS NOT NULL
-- Pre-validates the homogeneous-geometry requirement
```
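The decision applied to the DISTINCT result can be sketched as a set-membership check. This is illustrative only; the actual rules live in `GeometryTypeValidationTasklet`:

```java
import java.util.Set;

// Illustrative sketch: a distinct-type set is writable as a Polygon
// shapefile only when it contains nothing but ST_Polygon and/or
// ST_MultiPolygon (the latter is auto-converted to Polygon).
public class GeometryTypeCheck {
  private static final Set<String> SUPPORTED = Set.of("ST_Polygon", "ST_MultiPolygon");

  /** True when every distinct geometry type is supported. */
  static boolean isSupported(String... distinctTypes) {
    if (distinctTypes.length == 0) return false;
    for (String t : distinctTypes) {
      if (!SUPPORTED.contains(t)) return false;
    }
    return true;
  }
}
```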
### Field Mapping

Database columns map to shapefile fields (DBF names are limited to 10 characters):

| Database Column | DB Type | Shapefile Field | Shapefile Type | Notes |
|-----------------|---------|-----------------|----------------|-------|
| uid | uuid | chnDtctId | String | Change detection ID |
| map_id | text | mpqd_no | String | Map quadrant number |
| probability | float8 | chn_dtct_p | Double | Change detection probability |
| before_year | bigint | cprs_yr | Long | Comparison year |
| after_year | bigint | crtr_yr | Long | Criteria year |
| before_c | text | bf_cls_cd | String | Before classification code |
| before_p | float8 | bf_cls_pro | Double | Before classification probability |
| after_c | text | af_cls_cd | String | After classification code |
| after_p | float8 | af_cls_pro | Double | After classification probability |
| geometry | geom | the_geom | Polygon | Geometry in EPSG:5186 |

**Field name source**: See `FeatureTypeFactory.java` (batch/util/FeatureTypeFactory.java:1-104)
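The 10-character limit comes from the DBF format used for shapefile attributes. A hypothetical helper showing the truncation rule (the project's actual field names are declared explicitly in `FeatureTypeFactory`, not derived by truncation):

```java
// Hypothetical illustration of the DBF 10-character field-name limit.
public class DbfFieldName {
  /** Clips a candidate field name to the DBF maximum of 10 characters. */
  static String clip(String name) {
    return name.length() <= 10 ? name : name.substring(0, 10);
  }
}
```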
### Coordinate Reference System

- **CRS**: EPSG:5186 (Korean 2000 / Central Belt)
- **Valid Coordinate Bounds**: X ∈ [125 km, 530 km], Y ∈ [-600 km, 988 km]
- **Encoding**: WKT in SQL → JTS Geometry → GeoTools SimpleFeature → `.prj` file
- **Validation**: Automatic in batch mode via `ST_X(ST_Centroid())` range checks
## Dependencies

**Core Framework**:

- Spring Boot 3.5.7
  - `spring-boot-starter`: DI container, logging
  - `spring-boot-starter-jdbc`: JDBC template, HikariCP
  - `spring-boot-starter-batch`: Spring Batch framework, job repository
  - `spring-boot-starter-web`: RestTemplate for GeoServer API calls
  - `spring-boot-starter-validation`: `@NotBlank` annotations

**Spatial Libraries**:

- GeoTools 30.0 (via the OSGeo repository)
  - `gt-shapefile`: Shapefile I/O (DataStore, FeatureStore, Transaction)
  - `gt-geojson`: GeoJSON encoding/decoding
  - `gt-referencing`: CRS transformations
  - `gt-epsg-hsql`: EPSG database for CRS lookups
- JTS 1.19.0: Geometry primitives (Polygon, MultiPolygon, GeometryFactory)
- PostGIS JDBC 2.5.1: PostGIS geometry type support

**Database**:

- PostgreSQL JDBC Driver (latest)
- HikariCP (bundled with Spring Boot)
**Build Configuration**:

```gradle
// build.gradle
configurations.all {
  exclude group: 'javax.media', module: 'jai_core' // Conflicts with GeoTools
}

bootJar {
  archiveFileName = "shp-exporter.jar" // Fixed JAR name
}

spotless {
  java {
    googleJavaFormat('1.19.2') // 2-space indentation
  }
}
```
## Development Patterns

### Adding a New Step to Spring Batch Job

When adding steps to `mergedModeJob`, follow this pattern:

1. **Create a Tasklet or ItemWriter** in `batch/tasklet/` or `batch/writer/`
2. **Define a Step bean** in `MergedModeJobConfig.java`:

   ```java
   @Bean
   public Step myNewStep(JobRepository jobRepository,
                         PlatformTransactionManager transactionManager,
                         MyTasklet tasklet,
                         BatchExecutionHistoryListener historyListener) {
     return new StepBuilder("myNewStep", jobRepository)
         .tasklet(tasklet, transactionManager)
         .listener(historyListener) // REQUIRED for history tracking
         .build();
   }
   ```

3. **Add it to the job flow** in `mergedModeJob()`:

   ```java
   .next(myNewStep)
   ```

4. **Always include `BatchExecutionHistoryListener`** to track execution metrics
### Modifying ItemReader Configuration

ItemReaders are **not thread-safe**. Each step requires its own instance:

```java
// WRONG: sharing one reader between steps
@Bean
public JdbcCursorItemReader<InferenceResult> reader() { ... }

// RIGHT: separate readers with @StepScope
@Bean
@StepScope // Creates a new instance per step
public JdbcCursorItemReader<InferenceResult> shapefileReader() { ... }

@Bean
@StepScope
public JdbcCursorItemReader<InferenceResult> geoJsonReader() { ... }
```

See `InferenceResultItemReaderConfig.java` for working examples.
### Streaming Writers Pattern

When writing custom streaming writers, follow the `StreamingShapefileWriter` pattern:

```java
@Component
@StepScope
public class MyStreamingWriter implements ItemStreamWriter<MyType> {
  private Transaction transaction;

  @BeforeStep
  public void open(ExecutionContext context) {
    // Open resources, start the transaction
    transaction = new DefaultTransaction("create");
  }

  @Override
  public void write(Chunk<? extends MyType> chunk) {
    // Write the chunk incrementally
    // Do NOT accumulate items in memory
  }

  @AfterStep
  public ExitStatus afterStep(StepExecution stepExecution) {
    transaction.commit(); // Commit all chunks
    transaction.close();
    return ExitStatus.COMPLETED;
  }
}
```
### JobParameters and StepExecutionContext

**Pass data between steps** using the `ExecutionContext`:

```java
// Step 1: store data
stepExecution.getExecutionContext().putString("geometryType", "ST_Polygon");

// Step 2: retrieve data
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
  String geomType = stepExecution.getJobExecution()
      .getExecutionContext()
      .getString("geometryType");
}
```

Note that step 2 reads from the *job-level* `ExecutionContext`, so the key written in step 1's step-level context must be promoted, e.g. with Spring Batch's `ExecutionContextPromotionListener`.

**Job-level parameters** from the command line:

```java
// ConverterCommandLineRunner.buildJobParameters()
JobParametersBuilder builder = new JobParametersBuilder();
builder.addString("inferenceId", converterProperties.getInferenceId());
builder.addLong("timestamp", System.currentTimeMillis()); // Ensures uniqueness
```
### Partitioning Pattern (Map ID Processing)

The `generateMapIdFilesStep` uses partitioning but runs **sequentially** to avoid exhausting the DB connection pool:

```java
@Bean
public Step generateMapIdFilesStep(...) {
  return new StepBuilder("generateMapIdFilesStep", jobRepository)
      .partitioner("mapIdWorker", partitioner)
      .step(mapIdWorkerStep)
      .taskExecutor(new SyncTaskExecutor()) // SEQUENTIAL execution
      .build();
}
```

For future parallel execution (requires connection pool tuning):

```java
.taskExecutor(new SimpleAsyncTaskExecutor())
.gridSize(4) // 4 concurrent workers
```
### GeoServer REST API Integration

GeoServer operations use `RestTemplate` with custom error handling:

```java
// GeoServerRegistrationService.java
try {
  restTemplate.exchange(url, HttpMethod.PUT, entity, String.class);
} catch (HttpClientErrorException e) {
  if (e.getStatusCode() == HttpStatus.NOT_FOUND) {
    // Handle workspace not found
  }
}
```

Always check workspace existence before layer registration.
### Testing Considerations

- **Unit tests**: Mock `JdbcTemplate` and `DataSource` for repository tests
- **Integration tests**: Use `@SpringBatchTest` with an embedded H2 database
- **GeoTools**: Use `MemoryDataStore` for shapefile writer tests
- **Current state**: Limited test coverage (focus on critical-path validation)

Refer to `claudedocs/SPRING_BATCH_MIGRATION.md` for detailed batch architecture documentation.