# PostgreSQL to Shapefile Converter Spring Boot CLI application that queries PostgreSQL spatial data and generates shapefiles. ## Features - Batch processing for multiple map_ids - PostGIS geometry to Shapefile conversion - **GeoServer REST API integration for automatic layer registration** - Configurable via `application.yml` - Generates all required shapefile files (.shp, .shx, .dbf, .prj) - Supports EPSG:5186 (Korean 2000 / Central Belt) coordinate reference system - GeoJSON export support ## Prerequisites - Java 17 or higher - PostgreSQL database with PostGIS extension - Access to the KAMCO database at 192.168.2.127:15432 ## Configuration Edit `src/main/resources/application.yml` to configure: ```yaml converter: spring.profiles.active: 'dev' #profiles.active inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6' # Inference ID (used for output folder structure) map-ids: - '35813030' # Add your map_ids here (text type) batch-ids: - 252 - 253 - 257 output-base-dir: '/kamco-nfs/dataset/export/' crs: 'EPSG:5186' # Korean 2000 / Central Belt geoserver: base-url: 'https://kamco.geo-dev.gs.dabeeo.com' workspace: 'cd' datastore: 'inference_result' overwrite-existing: true # Delete existing layers before re-registering connection-timeout: 30000 # 30 seconds read-timeout: 60000 # 60 seconds ``` ```md ## Converter Mode (`converter.mode`) `converter.mode`는 대량 `map_ids` 처리 시 OS 커맨드라인 길이 제한(`Argument list too long`) 문제를 방지하기 위해 추가 하였습니다. ### Supported Modes #### MERGED - `batch-ids`에 해당하는 **모든 데이터를 하나의 Shapefile로 병합 생성** - `map-ids`가 설정되어 있어도 **무시됨** - 단일 결과 파일이 필요한 경우 적합 #### MAP_IDS - 명시적으로 전달한 `map-ids`만 대상으로 Shapefile 생성 - `converter.map-ids` **필수** - `map-ids` 개수가 많을 경우 OS 커맨드라인 길이 제한에 걸릴 수 있음 #### RESOLVE - `batch-ids` 기준으로 **JAR 내부에서 map_ids를 조회**한 뒤 Shapefile 생성 - `map-ids`를 커맨드라인 인자로 전달하지 않음 - 대량 데이터 처리 시 가장 안전한 방식 ### Default Behavior (mode 미지정 시) - `converter.map-ids`가 비어 있으면 → **MERGED** - `converter.map-ids`가 있으면 → **MAP_IDS** ### Command Line Parameters You can override configuration values using command line arguments: **Using Gradle (recommended - no quoting issues):** ```bash ./gradlew bootRun --args="--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.batch-ids[1]=253 --converter.batch-ids[2]=257 --converter.mode=MERGED" ``` **Using JAR (zsh shell - quote arguments with brackets):** ```bash java -jar build/libs/shp-exporter.jar \ '--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6' \ '--converter.map-ids[0]=35813030' \ '--converter.batch-ids[0]=252' \ '--converter.batch-ids[1]=253' \ '--converter.batch-ids[2]=257' \ '--converter.mode=MERGED' ``` **Using JAR (bash shell - no quotes needed):** ```bash java -jar build/libs/shp-exporter.jar \ --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \ --converter.map-ids[0]=35813030 \ --converter.batch-ids[0]=252 \ --converter.batch-ids[1]=253 \ --converter.batch-ids[2]=257 \ --converter.mode=MERGED ``` ```bash java -jar build/libs/shp-exporter.jar \ --batch \ --converter.inference-id=test009 \ --converter.batch-ids[0]=111 \ --converter.batch-ids[1]=114 \ --converter.batch-ids[2]=162 \ --geoserver.enabled=true ``` **Note for zsh users:** zsh interprets square brackets `[]` as glob patterns. Always quote arguments containing brackets when using zsh. ## Building ```bash ./gradlew build ``` ## Running ### Generate Shapefiles (Spring Batch Mode - Recommended) **New in v1.1.0**: Spring Batch mode provides memory-optimized processing for large datasets. ```bash # MERGED mode (creates single shapefile + GeoJSON for all batch-ids) ./gradlew bootRun --args="--batch --converter.batch-ids[0]=252 --converter.batch-ids[1]=253" # With GeoServer registration ./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252" ``` **Output Files** (in `{output-base-dir}/{inference-id}/merge/`): - `{inference-id}.shp` (+ .shx, .dbf, .prj) - Shapefile - `{inference-id}.geojson` - GeoJSON file - `{inference-id}.zip` - ZIP archive of shapefile **Benefits**: - 90-95% memory reduction (2-13GB → 150-200MB for 1M records) - Chunk-based streaming (1000 records per chunk) - Restart capability after failures - Step-by-step execution support See [claudedocs/SPRING_BATCH_MIGRATION.md](claudedocs/SPRING_BATCH_MIGRATION.md) for detailed documentation. ### Generate Shapefiles (Legacy Mode) ```bash ./gradlew bootRun ``` Or run the JAR directly: ```bash java -jar build/libs/shp-exporter.jar ``` ### Register Shapefile to GeoServer First, set GeoServer credentials as environment variables: ```bash export GEOSERVER_USERNAME=admin export GEOSERVER_PASSWORD=geoserver ``` Then register a shapefile: ```bash ./gradlew bootRun --args="--upload-shp /kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/35813030.shp --layer inference_35813030" ``` Or using the JAR: ```bash java -jar build/libs/shp-exporter.jar \ --upload-shp /path/to/shapefile.shp \ --layer layer_name ``` **GeoServer Registration Process:** 1. Verifies workspace 'cd' exists (must be pre-created in GeoServer) 2. Creates datastore 'inference_result' if it doesn't exist 3. Deletes existing layer if `overwrite-existing: true` 4. Publishes shapefile via REST API 5. Verifies successful registration 6. Automatically enables WMS, WFS, WMTS services **Important Notes:** - Workspace 'cd' must exist in GeoServer before registration - Environment variables `GEOSERVER_USERNAME` and `GEOSERVER_PASSWORD` must be set - Shapefile path must be absolute - GeoServer must have file system access to the shapefile location ## Output ### Legacy Mode Output Shapefiles will be created in directories structured as `output-base-dir/inference-id/map-id/`: ``` /kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/ ├── 35813030.shp # Shapefile geometry ├── 35813030.shx # Shape index ├── 35813030.dbf # Attribute data └── 35813030.prj # Projection information ``` ### Spring Batch Mode Output Output structure for MERGED mode (`output-base-dir/inference-id/merge/`): ``` /kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/merge/ ├── D5E46F60FC40B1A8BE0CD1F3547AA6.shp # Shapefile geometry ├── D5E46F60FC40B1A8BE0CD1F3547AA6.shx # Shape index ├── D5E46F60FC40B1A8BE0CD1F3547AA6.dbf # Attribute data ├── D5E46F60FC40B1A8BE0CD1F3547AA6.prj # Projection information ├── D5E46F60FC40B1A8BE0CD1F3547AA6.geojson # GeoJSON format └── D5E46F60FC40B1A8BE0CD1F3547AA6.zip # ZIP archive (for GeoServer) ``` **Note**: Only the shapefile (.shp and related files) are registered to GeoServer. GeoJSON files are generated for alternative consumption. ## Database Query ### Spring Batch Mode (Recommended) The Spring Batch mode applies comprehensive validation to ensure data quality: ```sql ㄴㅅ ORDER BY map_id, uid ``` **Validation Criteria**: - **Geometry Type**: Only ST_Polygon and ST_MultiPolygon (excludes Point, LineString, etc.) - **Coordinate System**: EPSG:5186 (Korean 2000 / Central Belt) - **Coordinate Range**: Korea territory bounds (X: 125-530km, Y: -600-988km) - **Geometry Validity**: Valid topology (ST_IsValid) Rows failing validation are automatically excluded from processing, ensuring clean shapefile generation. **Performance**: See [PERFORMANCE_OPTIMIZATION.md](claudedocs/PERFORMANCE_OPTIMIZATION.md) for indexing recommendations. ### Legacy Mode Legacy mode uses a simpler query without validation: ```sql SELECT uid, map_id, probability, before_year, after_year, before_c, before_p, after_c, after_p, ST_AsText(geometry) as geometry_wkt FROM inference_results_testing WHERE batch_id = ANY(?) AND map_id = ? ``` ### Database Schema - **geometry**: `geometry(Polygon, 5186)` - EPSG:5186 좌표계 - **map_id**: `text` - 문자열 타입 - **before_year, after_year**: `bigint` - Long 타입 - **batch_id**: `bigint` - Long 타입 - **uid**: `uuid` - UUID 타입 ## Field Mapping Shapefile field names are limited to 10 characters: | Database Column | DB Type | Shapefile Field | Shapefile Type | |-----------------|----------|-----------------|----------------| | uid | uuid | chnDtctId | String | | map_id | text | mpqd_no | String | | probability | float8 | chn_dtct_p | Double | | before_year | bigint | cprs_yr | Long | | after_year | bigint | crtr_yr | Long | | before_c | text | bf_cls_cd | String | | before_p | float8 | bf_cls_pro | Double | | after_c | text | af_cls_cd | String | | after_p | float8 | af_cls_pro | Double | | geometry | geom | the_geom | Polygon | ## Error Handling ### Shapefile Generation - **No results**: Logs warning and continues to next map_id - **Mixed geometry types**: Throws exception (shapefiles require homogeneous geometry) - **Database connection failure**: Application exits with error - **Invalid geometry**: Logs warning and continues processing ### GeoServer Registration - **Workspace not found**: Logs error with remediation steps (must be pre-created) - **Authentication failure**: Logs error prompting to verify environment variables - **Network timeout**: Logs connection error with timeout details - **Layer already exists**: Automatically deletes and re-registers if `overwrite-existing: true` - **Registration failure**: Logs error but does not stop application (non-blocking) ## Validating Output ### Shapefile Validation Open the generated shapefiles in QGIS or ArcGIS to verify: 1. Geometry displays correctly 2. Attribute table contains all expected fields 3. CRS is EPSG:5186 (Korean 2000 / Central Belt) ### GeoServer Layer Validation After registering to GeoServer, verify the layer: 1. **GeoServer Admin Console**: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web 2. Navigate to **Layers** → Find your layer (e.g., `cd:inference_35813030`) 3. Preview the layer using **Layer Preview** 4. Verify services are enabled: - WMS: `https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms` - WFS: `https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wfs` - WMTS: `https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wmts` **Example WMS GetMap Request:** ``` https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms? service=WMS& version=1.1.0& request=GetMap& layers=cd:inference_35813030& bbox=& width=768& height=768& srs=EPSG:5186& format=image/png ``` ## Development ### Code Formatting The project uses Google Java Format with 2-space indentation: ```bash ./gradlew spotlessApply ``` ### Project Structure ``` src/main/java/com/kamco/makesample/ ├── MakeSampleApplication.java # Main application class ├── batch/ # Spring Batch components (v1.1.0+) │ ├── config/ │ │ ├── BatchConfiguration.java # Spring Batch configuration │ │ └── MergedModeJobConfig.java # MERGED mode Job definition │ ├── processor/ │ │ └── FeatureConversionProcessor.java # InferenceResult → SimpleFeature processor │ ├── reader/ │ │ ├── GeometryConvertingRowMapper.java # WKT → JTS converter │ │ └── InferenceResultItemReaderConfig.java # Cursor-based DB reader │ ├── tasklet/ │ │ ├── CreateZipTasklet.java # ZIP creation tasklet │ │ ├── GeoServerRegistrationTasklet.java # GeoServer registration tasklet │ │ └── GeometryTypeValidationTasklet.java # Geometry validation tasklet │ ├── util/ │ │ └── FeatureTypeFactory.java # Shared feature type creation │ └── writer/ │ ├── StreamingGeoJsonWriter.java # Streaming GeoJSON writer │ └── StreamingShapefileWriter.java # Streaming shapefile writer ├── cli/ │ └── ConverterCommandLineRunner.java # CLI entry point (batch + legacy) ├── config/ │ ├── ConverterProperties.java # Shapefile converter configuration │ ├── GeoServerProperties.java # GeoServer configuration │ ├── GeoServerCredentials.java # GeoServer authentication │ └── RestTemplateConfig.java # HTTP client configuration ├── exception/ │ ├── ShapefileConversionException.java │ ├── GeometryConversionException.java │ ├── MixedGeometryException.java │ └── GeoServerRegistrationException.java # GeoServer registration errors ├── model/ │ └── InferenceResult.java # Domain model ├── repository/ │ └── InferenceResultRepository.java # Data access layer (legacy) ├── service/ │ ├── GeometryConverter.java # PostGIS to JTS conversion │ ├── ShapefileConverterService.java # Orchestration service (legacy) │ └── GeoServerRegistrationService.java # GeoServer REST API integration └── writer/ ├── ShapefileWriter.java # GeoTools shapefile writer (legacy) └── GeoJsonWriter.java # GeoJSON export writer (legacy) ``` ## Dependencies - Spring Boot 3.5.7 - spring-boot-starter - spring-boot-starter-jdbc - spring-boot-starter-batch (v1.1.0+) - spring-boot-starter-web (for RestTemplate) - spring-boot-starter-validation (for @NotBlank annotations) - GeoTools 30.0 - gt-shapefile - gt-referencing - gt-epsg-hsql - gt-geojson - PostgreSQL JDBC Driver - PostGIS JDBC 2.5.1 - JTS (Java Topology Suite) 1.19.0 ## Troubleshooting ### GeoServer Registration Issues **Problem: "Workspace not found: cd"** ``` Solution: Create workspace 'cd' in GeoServer admin console before registration Steps: 1. Login to GeoServer admin: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web 2. Go to Workspaces → Add new workspace 3. Name: cd, Namespace URI: http://cd 4. Click Save ``` **Problem: "GeoServer credentials not configured"** ``` Solution: Set environment variables before running export GEOSERVER_USERNAME=admin export GEOSERVER_PASSWORD=geoserver ``` **Problem: "Layer already exists and overwrite is disabled"** ``` Solution: Enable overwrite in application.yml geoserver: overwrite-existing: true ``` **Problem: Connection timeout to GeoServer** ``` Solution: Increase timeout values in application.yml geoserver: connection-timeout: 60000 # 60 seconds read-timeout: 120000 # 120 seconds ``` **Problem: "Registration failed. Layer not found after publication"** ``` Possible causes: 1. GeoServer cannot access shapefile path (check file system permissions) 2. Shapefile is corrupted or invalid 3. Network issue interrupted registration Solution: 1. Verify GeoServer has read access to shapefile directory 2. Validate shapefile using QGIS or ogr2ogr 3. Check GeoServer logs for detailed error messages ``` ### Database Connection Issues **Problem: "Connection refused to 192.168.2.127:15432"** ``` Solution: Verify PostgreSQL is running and accessible psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds ``` **Problem: "No results found for map_id"** ``` Solution: Verify data exists in database SELECT COUNT(*) FROM inference_results_testing WHERE batch_id IN (252, 253, 257) AND map_id = '35813030'; ``` ## Batch Execution History ### Overview Spring Batch mode automatically tracks execution history for each step, recording: - Start time, end time, duration - Success/failure status - Error messages and stack traces (if failed) - Processing statistics (read/write/commit/rollback/skip counts) ### Table Setup Create the `batch_execution_history` table before running batch jobs: ```bash psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds \ -f src/main/resources/db/migration/V1__create_batch_execution_history.sql ``` ### Query Examples **View execution history for a specific job**: ```sql SELECT step_name, start_time, end_time, duration_ms, status, read_count, write_count FROM batch_execution_history WHERE job_execution_id = 123 ORDER BY start_time; ``` **Check failed steps**: ```sql SELECT job_execution_id, step_name, start_time, error_message FROM batch_execution_history WHERE status = 'FAILED' ORDER BY start_time DESC LIMIT 10; ``` **Average step duration**: ```sql SELECT step_name, COUNT(*) as executions, ROUND(AVG(duration_ms) / 1000.0, 2) as avg_duration_sec FROM batch_execution_history WHERE status = 'COMPLETED' GROUP BY step_name ORDER BY avg_duration_sec DESC; ``` For more query examples and detailed documentation, see [BATCH_EXECUTION_HISTORY.md](claudedocs/BATCH_EXECUTION_HISTORY.md). ## License KAMCO Internal Use Only