PostgreSQL to Shapefile Converter

Spring Boot CLI application that queries PostgreSQL spatial data and generates shapefiles.

Features

  • Batch processing for multiple map_ids
  • PostGIS geometry to Shapefile conversion
  • GeoServer REST API integration for automatic layer registration
  • Configurable via application.yml
  • Generates all required shapefile files (.shp, .shx, .dbf, .prj)
  • Supports EPSG:5186 (Korean 2000 / Central Belt) coordinate reference system
  • GeoJSON export support

Prerequisites

  • Java 17 or higher
  • PostgreSQL database with PostGIS extension
  • Access to the KAMCO database at 192.168.2.127:15432

Configuration

Edit src/main/resources/application.yml to configure:

converter:
  spring.profiles.active: 'dev' #profiles.active
  inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6'  # Inference ID (used for output folder structure)
  map-ids:
    - '35813030'    # Add your map_ids here (text type)
  batch-ids:
    - 252
    - 253
    - 257
  output-base-dir: '/kamco-nfs/dataset/export/'
  crs: 'EPSG:5186'  # Korean 2000 / Central Belt

geoserver:
  base-url: 'https://kamco.geo-dev.gs.dabeeo.com'
  workspace: 'cd'
  datastore: 'inference_result'
  overwrite-existing: true  # Delete existing layers before re-registering
  connection-timeout: 30000  # 30 seconds
  read-timeout: 60000        # 60 seconds

## Converter Mode (`converter.mode`)

`converter.mode` was added to avoid the OS command-line length limit
(`Argument list too long`) when processing a large number of `map_ids`.

### Supported Modes

#### MERGED
- **Merges all data** matching `batch-ids` **into a single Shapefile**
- `map-ids` is **ignored**, even if it is set
- Suitable when a single result file is needed

#### MAP_IDS
- Generates Shapefiles only for the explicitly supplied `map-ids`
- `converter.map-ids` is **required**
- A large number of `map-ids` may hit the OS command-line length limit

#### RESOLVE
- **Looks up map_ids inside the JAR** based on `batch-ids`, then generates Shapefiles
- `map-ids` are not passed as command-line arguments
- The safest option for large datasets

### Default Behavior (when mode is not specified)

- If `converter.map-ids` is empty → **MERGED**
- If `converter.map-ids` is set → **MAP_IDS**
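
The default rule above can be sketched in a few lines of Java. This is a minimal illustration, not the project's actual code: the class name `ConverterModeResolver`, the method `resolve`, and the choice to reject `MAP_IDS` without any map IDs are all assumptions made for this example.

```java
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; class and method names are hypothetical. */
final class ConverterModeResolver {

  enum Mode { MERGED, MAP_IDS, RESOLVE }

  /** Uses the explicit mode if one is given, otherwise applies the documented default. */
  static Mode resolve(String configuredMode, List<String> mapIds) {
    boolean hasMapIds = mapIds != null && !mapIds.isEmpty();
    Mode mode;
    if (configuredMode == null || configuredMode.trim().isEmpty()) {
      // No explicit mode: empty map-ids means MERGED, otherwise MAP_IDS.
      mode = hasMapIds ? Mode.MAP_IDS : Mode.MERGED;
    } else {
      mode = Mode.valueOf(configuredMode.trim().toUpperCase(Locale.ROOT));
    }
    // Assumption: MAP_IDS without any map-ids is treated as a configuration error.
    if (mode == Mode.MAP_IDS && !hasMapIds) {
      throw new IllegalArgumentException("converter.map-ids is required in MAP_IDS mode");
    }
    return mode;
  }
}
```

With an explicit `MERGED` or `RESOLVE` mode the supplied `map-ids` are simply not consulted, which matches the mode descriptions above.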

### Command Line Parameters

You can override configuration values using command line arguments:

**Using Gradle (recommended - no quoting issues):**

```bash
./gradlew bootRun --args="--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.batch-ids[1]=253 --converter.batch-ids[2]=257 --converter.mode=MERGED"
```

Using JAR (zsh shell - quote arguments with brackets):

java -jar build/libs/shp-exporter.jar \
  '--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6' \
  '--converter.map-ids[0]=35813030' \
  '--converter.batch-ids[0]=252' \
  '--converter.batch-ids[1]=253' \
  '--converter.batch-ids[2]=257' \
  '--converter.mode=MERGED'

Using JAR (bash shell - no quotes needed):

java -jar build/libs/shp-exporter.jar \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.map-ids[0]=35813030 \
  --converter.batch-ids[0]=252 \
  --converter.batch-ids[1]=253 \
  --converter.batch-ids[2]=257 \
  --converter.mode=MERGED

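Note for zsh users: zsh treats square brackets [] as glob patterns and, by default, aborts with "no matches found" when nothing matches. Always quote arguments containing brackets when using zsh. bash passes an unmatched pattern through unchanged by default, which is why the unquoted form works there.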
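
When the JAR is launched from another Java process rather than from a shell, the indexed arguments can be generated programmatically. The helper below is a hedged sketch: `IndexedArgs` and `indexed` are hypothetical names, and only the `--property[i]=value` format is taken from the commands above.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not part of the project. */
final class IndexedArgs {

  /** Builds arguments such as --converter.batch-ids[0]=252 for each value in order. */
  static List<String> indexed(String property, List<?> values) {
    List<String> args = new ArrayList<>();
    for (int i = 0; i < values.size(); i++) {
      args.add("--" + property + "[" + i + "]=" + values.get(i));
    }
    return args;
  }
}
```

A list built this way can be appended to a `ProcessBuilder` command. `ProcessBuilder` hands arguments to the child process directly, without a shell, so the brackets need no quoting in that case.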

Building

./gradlew build

Running

New in v1.1.0: Spring Batch mode provides memory-optimized processing for large datasets.

# MERGED mode (creates single shapefile + GeoJSON for all batch-ids)
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252 --converter.batch-ids[1]=253"

# With GeoServer registration
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"

Output Files (in {output-base-dir}/{inference-id}/merge/):

  • {inference-id}.shp (+ .shx, .dbf, .prj) - Shapefile
  • {inference-id}.geojson - GeoJSON file
  • {inference-id}.zip - ZIP archive of shapefile

Benefits:

  • 90-95% memory reduction (2-13GB → 150-200MB for 1M records)
  • Chunk-based streaming (1000 records per chunk)
  • Restart capability after failures
  • Step-by-step execution support

See claudedocs/SPRING_BATCH_MIGRATION.md for detailed documentation.

Generate Shapefiles (Legacy Mode)

./gradlew bootRun

Or run the JAR directly:

java -jar build/libs/shp-exporter.jar

Register Shapefile to GeoServer

First, set GeoServer credentials as environment variables:

export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver

Then register a shapefile:

./gradlew bootRun --args="--upload-shp /kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/35813030.shp --layer inference_35813030"

Or using the JAR:

java -jar build/libs/shp-exporter.jar \
  --upload-shp /path/to/shapefile.shp \
  --layer layer_name

GeoServer Registration Process:

  1. Verifies workspace 'cd' exists (must be pre-created in GeoServer)
  2. Creates datastore 'inference_result' if it doesn't exist
  3. Deletes existing layer if overwrite-existing: true
  4. Publishes shapefile via REST API
  5. Verifies successful registration
  6. Automatically enables WMS, WFS, WMTS services
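
For orientation, the REST resources that such a registration flow would typically touch can be sketched as URL builders. This is an assumption-laden illustration, not the project's `GeoServerRegistrationService`: the class `GeoServerRestPaths` is hypothetical, the REST root is assumed to be `/geoserver/rest` (mirroring the `/geoserver/web` console path), and the use of the `external.shp` endpoint is a guess based on the requirement that GeoServer can read the shapefile path directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative sketch only; endpoint choices are assumptions, see the text above. */
final class GeoServerRestPaths {

  private final String restRoot;

  GeoServerRestPaths(String baseUrl) {
    String trimmed = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    this.restRoot = trimmed + "/geoserver/rest"; // assumed REST root
  }

  /** Step 1: GET this URL to check that the workspace exists. */
  String workspace(String workspace) {
    return restRoot + "/workspaces/" + workspace;
  }

  /** Step 2: GET this URL to check whether the datastore exists. */
  String dataStore(String workspace, String dataStore) {
    return workspace(workspace) + "/datastores/" + dataStore;
  }

  /** Step 4 (assumed): PUT a file: URL here to publish a shapefile already on disk. */
  String externalShapefile(String workspace, String dataStore) {
    return dataStore(workspace, dataStore) + "/external.shp";
  }

  /** Steps 3 and 5: DELETE or GET this URL to remove or verify a layer. */
  String layer(String workspace, String layerName) {
    return workspace(workspace) + "/layers/" + layerName;
  }

  /** Builds an HTTP Basic Authorization header value from the configured credentials. */
  static String basicAuth(String username, String password) {
    String raw = username + ":" + password;
    return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
  }
}
```

In practice the credentials would come from `GEOSERVER_USERNAME` and `GEOSERVER_PASSWORD`, and the URLs would be passed to whichever HTTP client the application uses.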

Important Notes:

  • Workspace 'cd' must exist in GeoServer before registration
  • Environment variables GEOSERVER_USERNAME and GEOSERVER_PASSWORD must be set
  • Shapefile path must be absolute
  • GeoServer must have file system access to the shapefile location

Output

Legacy Mode Output

Shapefiles will be created in directories structured as output-base-dir/inference-id/map-id/:

/kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/
├── 35813030.shp    # Shapefile geometry
├── 35813030.shx    # Shape index
├── 35813030.dbf    # Attribute data
└── 35813030.prj    # Projection information

Spring Batch Mode Output

Output structure for MERGED mode (output-base-dir/inference-id/merge/):

/kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/merge/
├── D5E46F60FC40B1A8BE0CD1F3547AA6.shp       # Shapefile geometry
├── D5E46F60FC40B1A8BE0CD1F3547AA6.shx       # Shape index
├── D5E46F60FC40B1A8BE0CD1F3547AA6.dbf       # Attribute data
├── D5E46F60FC40B1A8BE0CD1F3547AA6.prj       # Projection information
├── D5E46F60FC40B1A8BE0CD1F3547AA6.geojson   # GeoJSON format
└── D5E46F60FC40B1A8BE0CD1F3547AA6.zip       # ZIP archive (for GeoServer)

Note: Only the shapefile (.shp and its companion files) is registered to GeoServer. GeoJSON files are generated for alternative consumption.
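
The two directory layouts above can be expressed as small path helpers. This is a sketch for illustration; `OutputPaths` and its method names are hypothetical, and only the directory and file naming scheme comes from the listings above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch only; mirrors the documented output layout. */
final class OutputPaths {

  /** Legacy mode: {output-base-dir}/{inference-id}/{map-id}/{map-id}.{extension} */
  static Path legacyFile(String baseDir, String inferenceId, String mapId, String extension) {
    return Paths.get(baseDir, inferenceId, mapId, mapId + "." + extension);
  }

  /** MERGED mode: {output-base-dir}/{inference-id}/merge/{inference-id}.{extension} */
  static Path mergedFile(String baseDir, String inferenceId, String extension) {
    return Paths.get(baseDir, inferenceId, "merge", inferenceId + "." + extension);
  }
}
```

For example, `OutputPaths.mergedFile("/kamco-nfs/dataset/export/", "D5E46F60FC40B1A8BE0CD1F3547AA6", "zip")` points at the ZIP archive listed in the MERGED mode tree.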

Database Query

The Spring Batch mode applies comprehensive validation to ensure data quality:

ORDER BY map_id, uid

Validation Criteria:

  • Geometry Type: Only ST_Polygon and ST_MultiPolygon (excludes Point, LineString, etc.)
  • Coordinate System: EPSG:5186 (Korean 2000 / Central Belt)
  • Coordinate Range: Korea territory bounds (X: 125 km to 530 km, Y: -600 km to 988 km)
  • Geometry Validity: Valid topology (ST_IsValid)

Rows failing validation are automatically excluded from processing, ensuring clean shapefile generation.
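
The validation criteria can also be read as a single predicate. The sketch below is not the project's query or code: `GeometryRowFilter` is a hypothetical name, the bounds are the kilometre values above converted to metres on the assumption that coordinates are stored in metres, and the range check is applied to a geometry's bounding box, which is also an assumption.

```java
import java.util.Set;

/** Illustrative sketch only; the real checks are applied in the database query. */
final class GeometryRowFilter {

  static final int EXPECTED_SRID = 5186;
  static final Set<String> ALLOWED_TYPES = Set.of("ST_Polygon", "ST_MultiPolygon");

  // Assumption: documented km bounds expressed in metres.
  static final double MIN_X = 125_000;
  static final double MAX_X = 530_000;
  static final double MIN_Y = -600_000;
  static final double MAX_Y = 988_000;

  /** Returns true only when every documented criterion holds for the row. */
  static boolean accept(String geometryType, int srid, boolean isValid,
                        double minX, double minY, double maxX, double maxY) {
    return ALLOWED_TYPES.contains(geometryType)
        && srid == EXPECTED_SRID
        && isValid
        && minX >= MIN_X && maxX <= MAX_X
        && minY >= MIN_Y && maxY <= MAX_Y;
  }
}
```

A row that fails any one of the four checks is rejected, which corresponds to the exclusion behaviour described above.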

Performance: See PERFORMANCE_OPTIMIZATION.md for indexing recommendations.

Legacy Mode

Legacy mode uses a simpler query without validation:

SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p, ST_AsText(geometry) as geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?) AND map_id = ?

Database Schema

  • geometry: geometry(Polygon, 5186) - EPSG:5186 coordinate system
  • map_id: text - string type
  • before_year, after_year: bigint - Long type
  • batch_id: bigint - Long type
  • uid: uuid - UUID type

Field Mapping

Shapefile field names are limited to 10 characters:

| Database Column | DB Type | Shapefile Field | Shapefile Type |
|-----------------|---------|-----------------|----------------|
| uid             | uuid    | chnDtctId       | String         |
| map_id          | text    | mpqd_no         | String         |
| probability     | float8  | chn_dtct_p      | Double         |
| before_year     | bigint  | cprs_yr         | Long           |
| after_year      | bigint  | crtr_yr         | Long           |
| before_c        | text    | bf_cls_cd       | String         |
| before_p        | float8  | bf_cls_pro      | Double         |
| after_c         | text    | af_cls_cd       | String         |
| after_p         | float8  | af_cls_pro      | Double         |
| geometry        | geom    | the_geom        | Polygon        |
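
The mapping and the 10-character limit can be captured together in code, so that a too-long field name is caught before a shapefile is written. This is a sketch; `FieldMapping` and its methods are hypothetical names, while the column and field names are copied from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; names of the class and methods are hypothetical. */
final class FieldMapping {

  static final int MAX_FIELD_NAME_LENGTH = 10;

  /** Database column to shapefile field, in the order shown in the table. */
  static Map<String, String> columnToField() {
    Map<String, String> mapping = new LinkedHashMap<>();
    mapping.put("uid", "chnDtctId");
    mapping.put("map_id", "mpqd_no");
    mapping.put("probability", "chn_dtct_p");
    mapping.put("before_year", "cprs_yr");
    mapping.put("after_year", "crtr_yr");
    mapping.put("before_c", "bf_cls_cd");
    mapping.put("before_p", "bf_cls_pro");
    mapping.put("after_c", "af_cls_cd");
    mapping.put("after_p", "af_cls_pro");
    mapping.put("geometry", "the_geom");
    return mapping;
  }

  /** Throws if any shapefile field name exceeds the 10-character limit. */
  static void requireValidNames(Map<String, String> mapping) {
    for (Map.Entry<String, String> entry : mapping.entrySet()) {
      if (entry.getValue().length() > MAX_FIELD_NAME_LENGTH) {
        throw new IllegalStateException(
            "Field name too long for column " + entry.getKey() + ": " + entry.getValue());
      }
    }
  }
}
```

Calling `FieldMapping.requireValidNames(FieldMapping.columnToField())` passes for the documented mapping; the longest names, such as `chn_dtct_p`, are exactly 10 characters.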

Error Handling

Shapefile Generation

  • No results: Logs warning and continues to next map_id
  • Mixed geometry types: Throws exception (shapefiles require homogeneous geometry)
  • Database connection failure: Application exits with error
  • Invalid geometry: Logs warning and continues processing

GeoServer Registration

  • Workspace not found: Logs error with remediation steps (must be pre-created)
  • Authentication failure: Logs error prompting to verify environment variables
  • Network timeout: Logs connection error with timeout details
  • Layer already exists: Automatically deletes and re-registers if overwrite-existing: true
  • Registration failure: Logs error but does not stop application (non-blocking)

Validating Output

Shapefile Validation

Open the generated shapefiles in QGIS or ArcGIS to verify:

  1. Geometry displays correctly
  2. Attribute table contains all expected fields
  3. CRS is EPSG:5186 (Korean 2000 / Central Belt)

GeoServer Layer Validation

After registering to GeoServer, verify the layer:

  1. GeoServer Admin Console: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web
  2. Navigate to Layers → Find your layer (e.g., cd:inference_35813030)
  3. Preview the layer using Layer Preview
  4. Verify services are enabled:
    • WMS: https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms
    • WFS: https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wfs
    • WMTS: https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wmts

Example WMS GetMap Request:

https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms?
  service=WMS&
  version=1.1.0&
  request=GetMap&
  layers=cd:inference_35813030&
  bbox=<bounds>&
  width=768&
  height=768&
  srs=EPSG:5186&
  format=image/png
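
The request above can be assembled programmatically, which avoids hand-editing the query string. This is a minimal sketch: `WmsGetMapUrl` and `build` are hypothetical names, and any bounding box passed in is the caller's own value, since the real bounds are not given here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative sketch only; builds the same parameters as the example request. */
final class WmsGetMapUrl {

  static String build(String wmsEndpoint, String layer, String bbox,
                      int width, int height, String srs) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("service", "WMS");
    params.put("version", "1.1.0");
    params.put("request", "GetMap");
    params.put("layers", layer);
    params.put("bbox", bbox); // minx,miny,maxx,maxy in the layer's CRS
    params.put("width", Integer.toString(width));
    params.put("height", Integer.toString(height));
    params.put("srs", srs);
    params.put("format", "image/png");

    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> entry : params.entrySet()) {
      // Percent-encode values so ':' , ',' and '/' are safe in the query string.
      query.add(entry.getKey() + "=" + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
    }
    return wmsEndpoint + "?" + query;
  }
}
```

The values are percent-encoded, so `cd:inference_35813030` appears as `cd%3Ainference_35813030` in the resulting URL.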

Development

Code Formatting

The project uses Google Java Format with 2-space indentation:

./gradlew spotlessApply

Project Structure

src/main/java/com/kamco/makesample/
├── MakeSampleApplication.java                    # Main application class
├── batch/                                         # Spring Batch components (v1.1.0+)
│   ├── config/
│   │   ├── BatchConfiguration.java               # Spring Batch configuration
│   │   └── MergedModeJobConfig.java              # MERGED mode Job definition
│   ├── processor/
│   │   └── FeatureConversionProcessor.java       # InferenceResult → SimpleFeature processor
│   ├── reader/
│   │   ├── GeometryConvertingRowMapper.java      # WKT → JTS converter
│   │   └── InferenceResultItemReaderConfig.java  # Cursor-based DB reader
│   ├── tasklet/
│   │   ├── CreateZipTasklet.java                 # ZIP creation tasklet
│   │   ├── GeoServerRegistrationTasklet.java     # GeoServer registration tasklet
│   │   └── GeometryTypeValidationTasklet.java    # Geometry validation tasklet
│   ├── util/
│   │   └── FeatureTypeFactory.java               # Shared feature type creation
│   └── writer/
│       ├── StreamingGeoJsonWriter.java           # Streaming GeoJSON writer
│       └── StreamingShapefileWriter.java         # Streaming shapefile writer
├── cli/
│   └── ConverterCommandLineRunner.java           # CLI entry point (batch + legacy)
├── config/
│   ├── ConverterProperties.java                  # Shapefile converter configuration
│   ├── GeoServerProperties.java                  # GeoServer configuration
│   ├── GeoServerCredentials.java                 # GeoServer authentication
│   └── RestTemplateConfig.java                   # HTTP client configuration
├── exception/
│   ├── ShapefileConversionException.java
│   ├── GeometryConversionException.java
│   ├── MixedGeometryException.java
│   └── GeoServerRegistrationException.java       # GeoServer registration errors
├── model/
│   └── InferenceResult.java                      # Domain model
├── repository/
│   └── InferenceResultRepository.java            # Data access layer (legacy)
├── service/
│   ├── GeometryConverter.java                    # PostGIS to JTS conversion
│   ├── ShapefileConverterService.java            # Orchestration service (legacy)
│   └── GeoServerRegistrationService.java         # GeoServer REST API integration
└── writer/
    ├── ShapefileWriter.java                       # GeoTools shapefile writer (legacy)
    └── GeoJsonWriter.java                         # GeoJSON export writer (legacy)

Dependencies

  • Spring Boot 3.5.7
    • spring-boot-starter
    • spring-boot-starter-jdbc
    • spring-boot-starter-batch (v1.1.0+)
    • spring-boot-starter-web (for RestTemplate)
    • spring-boot-starter-validation (for @NotBlank annotations)
  • GeoTools 30.0
    • gt-shapefile
    • gt-referencing
    • gt-epsg-hsql
    • gt-geojson
  • PostgreSQL JDBC Driver
  • PostGIS JDBC 2.5.1
  • JTS (Java Topology Suite) 1.19.0

Troubleshooting

GeoServer Registration Issues

Problem: "Workspace not found: cd"

Solution: Create workspace 'cd' in GeoServer admin console before registration
Steps:
1. Login to GeoServer admin: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web
2. Go to Workspaces → Add new workspace
3. Name: cd, Namespace URI: http://cd
4. Click Save

Problem: "GeoServer credentials not configured"

Solution: Set environment variables before running
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
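
A fail-fast check for these variables might look like the sketch below. It is not the project's `GeoServerCredentials` class: `GeoServerCredentialCheck` and `require` are hypothetical names, and the environment is passed in as a map so the check can be exercised without touching the real process environment.

```java
import java.util.Map;

/** Illustrative sketch only; the real application may validate credentials differently. */
final class GeoServerCredentialCheck {

  /** Returns {username, password} or throws if either variable is missing or empty. */
  static String[] require(Map<String, String> env) {
    String username = env.get("GEOSERVER_USERNAME");
    String password = env.get("GEOSERVER_PASSWORD");
    if (username == null || username.trim().isEmpty()
        || password == null || password.trim().isEmpty()) {
      throw new IllegalStateException(
          "GeoServer credentials not configured: set GEOSERVER_USERNAME and GEOSERVER_PASSWORD");
    }
    return new String[] {username, password};
  }
}
```

At startup this would be called as `GeoServerCredentialCheck.require(System.getenv())`.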

Problem: "Layer already exists and overwrite is disabled"

Solution: Enable overwrite in application.yml
geoserver:
  overwrite-existing: true

Problem: Connection timeout to GeoServer

Solution: Increase timeout values in application.yml
geoserver:
  connection-timeout: 60000  # 60 seconds
  read-timeout: 120000       # 120 seconds

Problem: "Registration failed. Layer not found after publication"

Possible causes:
1. GeoServer cannot access shapefile path (check file system permissions)
2. Shapefile is corrupted or invalid
3. Network issue interrupted registration

Solution:
1. Verify GeoServer has read access to shapefile directory
2. Validate shapefile using QGIS or ogr2ogr
3. Check GeoServer logs for detailed error messages

Database Connection Issues

Problem: "Connection refused to 192.168.2.127:15432"

Solution: Verify PostgreSQL is running and accessible
psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds

Problem: "No results found for map_id"

Solution: Verify data exists in database
SELECT COUNT(*) FROM inference_results_testing
WHERE batch_id IN (252, 253, 257) AND map_id = '35813030';

Batch Execution History

Overview

Spring Batch mode automatically tracks execution history for each step, recording:

  • Start time, end time, duration
  • Success/failure status
  • Error messages and stack traces (if failed)
  • Processing statistics (read/write/commit/rollback/skip counts)

Table Setup

Create the batch_execution_history table before running batch jobs:

psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds \
  -f src/main/resources/db/migration/V1__create_batch_execution_history.sql

Query Examples

View execution history for a specific job:

SELECT step_name, start_time, end_time, duration_ms, status, read_count, write_count
FROM batch_execution_history
WHERE job_execution_id = 123
ORDER BY start_time;

Check failed steps:

SELECT job_execution_id, step_name, start_time, error_message
FROM batch_execution_history
WHERE status = 'FAILED'
ORDER BY start_time DESC
LIMIT 10;

Average step duration:

SELECT step_name,
       COUNT(*) as executions,
       ROUND(AVG(duration_ms) / 1000.0, 2) as avg_duration_sec
FROM batch_execution_history
WHERE status = 'COMPLETED'
GROUP BY step_name
ORDER BY avg_duration_sec DESC;

For more query examples and detailed documentation, see BATCH_EXECUTION_HISTORY.md.

License

KAMCO Internal Use Only