MVPTeam/kamco-cd-cron

dean[백병남] b156b61caf Changed to scheduler

2026-03-08 21:33:41 +09:00

PostgreSQL to Shapefile Converter

Spring Boot CLI application that queries PostgreSQL spatial data and generates shapefiles.

Features

Batch processing for multiple map_ids
PostGIS geometry to Shapefile conversion
GeoServer REST API integration for automatic layer registration
Configurable via application.yml
Generates all required shapefile files (.shp, .shx, .dbf, .prj)
Supports EPSG:5186 (Korean 2000 / Central Belt) coordinate reference system
GeoJSON export support

Prerequisites

Java 17 or higher
PostgreSQL database with PostGIS extension
Access to the KAMCO database at 192.168.2.127:15432

Configuration

Edit src/main/resources/application.yml to configure:

spring.profiles.active: 'dev'  # Active Spring profile (top-level key, not part of converter)
converter:
  inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6'  # Inference ID (used for output folder structure)
  map-ids:
    - '35813030'    # Add your map_ids here (text type)
  batch-ids:
    - 252
    - 253
    - 257
  output-base-dir: '/kamco-nfs/dataset/export/'
  crs: 'EPSG:5186'  # Korean 2000 / Central Belt

geoserver:
  base-url: 'https://kamco.geo-dev.gs.dabeeo.com'
  workspace: 'cd'
  datastore: 'inference_result'
  overwrite-existing: true  # Delete existing layers before re-registering
  connection-timeout: 30000  # 30 seconds
  read-timeout: 60000        # 60 seconds

## Converter Mode (`converter.mode`)

`converter.mode` was added to avoid hitting the OS command-line length limit
(`Argument list too long`) when processing a large number of `map_ids`.

### Supported Modes

#### MERGED
- Merges **all data matching the `batch-ids` into a single Shapefile**
- `map-ids` is **ignored**, even if it is set
- Suitable when a single result file is needed

#### MAP_IDS
- Generates Shapefiles only for the explicitly passed `map-ids`
- `converter.map-ids` is **required**
- A large number of `map-ids` may hit the OS command-line length limit

#### RESOLVE
- **Looks up the map_ids inside the JAR** from the `batch-ids`, then generates Shapefiles
- `map-ids` are not passed as command-line arguments
- The safest option for large datasets

### Default Behavior (when mode is not specified)

- If `converter.map-ids` is empty → **MERGED**
- If `converter.map-ids` is set → **MAP_IDS**
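
The default rule above can be sketched in plain Java. This is a minimal illustration, not the project's actual code: the `ConverterModeResolver` class, its nested `Mode` enum, and the `resolve` method are hypothetical names, and only the three mode names are taken from this README.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: class and method names are illustrative assumptions.
final class ConverterModeResolver {

  // The three mode names come from this README.
  enum Mode { MERGED, MAP_IDS, RESOLVE }

  /** An explicit mode wins; otherwise the map-id list decides the default. */
  static Mode resolve(String explicitMode, List<String> mapIds) {
    if (explicitMode != null && !explicitMode.isBlank()) {
      return Mode.valueOf(explicitMode.trim().toUpperCase(Locale.ROOT));
    }
    boolean noMapIds = mapIds == null || mapIds.isEmpty();
    return noMapIds ? Mode.MERGED : Mode.MAP_IDS;
  }

  public static void main(String[] args) {
    System.out.println(resolve(null, List.of()));            // no mode, no map-ids
    System.out.println(resolve(null, List.of("35813030")));  // no mode, one map-id
    System.out.println(resolve("RESOLVE", List.of()));       // explicit mode
  }
}
```

Note that RESOLVE is never chosen implicitly under this rule, so it would have to be requested with `--converter.mode=RESOLVE`.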

### Command Line Parameters

You can override configuration values using command line arguments:

**Using Gradle (recommended - no quoting issues):**

```bash
./gradlew bootRun --args="--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.batch-ids[1]=253 --converter.batch-ids[2]=257 --converter.mode=MERGED"
```

Using JAR (zsh shell - quote arguments with brackets):

java -jar build/libs/shp-exporter.jar \
  '--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6' \
  '--converter.map-ids[0]=35813030' \
  '--converter.batch-ids[0]=252' \
  '--converter.batch-ids[1]=253' \
  '--converter.batch-ids[2]=257' \
  '--converter.mode=MERGED'

Using JAR (bash shell - no quotes needed):

java -jar build/libs/shp-exporter.jar \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.map-ids[0]=35813030 \
  --converter.batch-ids[0]=252 \
  --converter.batch-ids[1]=253 \
  --converter.batch-ids[2]=257 \
  --converter.mode=MERGED

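Note for zsh users: zsh interprets square brackets [] as glob patterns and, by default, aborts with `no matches found` when nothing matches. Always quote arguments containing brackets when using zsh. bash passes an unmatched pattern through unchanged by default, which is why the unquoted form works there.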
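
When the JAR is launched from another Java process rather than from a shell, the indexed arguments can be generated from a list. The sketch below is an illustration only: `IndexedArgs` and `indexed` are hypothetical names, and the resulting list is meant to be handed to something like `ProcessBuilder`, which does not go through a shell and therefore performs no glob expansion on the brackets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds Spring-style indexed list arguments.
final class IndexedArgs {

  /** Turns ("converter.batch-ids", [252, 253]) into "--converter.batch-ids[0]=252", ... */
  static List<String> indexed(String property, List<?> values) {
    List<String> args = new ArrayList<>();
    for (int i = 0; i < values.size(); i++) {
      args.add("--" + property + "[" + i + "]=" + values.get(i));
    }
    return args;
  }

  public static void main(String[] args) {
    List<String> command =
        new ArrayList<>(List.of("java", "-jar", "build/libs/shp-exporter.jar"));
    command.addAll(indexed("converter.batch-ids", List.of(252, 253, 257)));
    command.add("--converter.mode=MERGED");
    // Printed for inspection only; the list could be passed to new ProcessBuilder(command).
    System.out.println(String.join(" ", command));
  }
}
```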

Building

./gradlew build

Running

Generate Shapefiles (Spring Batch Mode - Recommended)

New in v1.1.0: Spring Batch mode provides memory-optimized processing for large datasets.

# MERGED mode (creates single shapefile + GeoJSON for all batch-ids)
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252 --converter.batch-ids[1]=253"

# With GeoServer registration
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"

Output Files (in {output-base-dir}/{inference-id}/merge/):

{inference-id}.shp (+ .shx, .dbf, .prj) - Shapefile
{inference-id}.geojson - GeoJSON file
{inference-id}.zip - ZIP archive of shapefile

Benefits:

90-95% memory reduction (2-13GB → 150-200MB for 1M records)
Chunk-based streaming (1000 records per chunk)
Restart capability after failures
Step-by-step execution support
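
The idea behind chunk-based streaming can be shown without Spring Batch. The sketch below is not the project's reader or writer; `ChunkedProcessing` and `processInChunks` are hypothetical names, and the point is only that at most one chunk of records is held in memory at a time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

// Hypothetical sketch of chunk-oriented processing using only the JDK.
final class ChunkedProcessing {

  /** Reads from the source and hands full chunks to the writer; returns the chunk count. */
  static <T> int processInChunks(Iterator<T> source, int chunkSize, Consumer<List<T>> writer) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    int chunks = 0;
    List<T> buffer = new ArrayList<>(chunkSize);
    while (source.hasNext()) {
      buffer.add(source.next());
      if (buffer.size() == chunkSize) {
        writer.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
        chunks++;
      }
    }
    if (!buffer.isEmpty()) { // final, partially filled chunk
      writer.accept(new ArrayList<>(buffer));
      chunks++;
    }
    return chunks;
  }

  public static void main(String[] args) {
    Iterator<Integer> rows = IntStream.range(0, 2500).boxed().iterator();
    int chunks = processInChunks(rows, 1000, chunk ->
        System.out.println("writing " + chunk.size() + " records"));
    System.out.println("chunks written: " + chunks);
  }
}
```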

See claudedocs/SPRING_BATCH_MIGRATION.md for detailed documentation.

Generate Shapefiles (Legacy Mode)

./gradlew bootRun

Or run the JAR directly:

java -jar build/libs/shp-exporter.jar

Register Shapefile to GeoServer

First, set GeoServer credentials as environment variables:

export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver

Then register a shapefile:

./gradlew bootRun --args="--upload-shp /kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/35813030.shp --layer inference_35813030"

Or using the JAR:

java -jar build/libs/shp-exporter.jar \
  --upload-shp /path/to/shapefile.shp \
  --layer layer_name

GeoServer Registration Process:

Verifies workspace 'cd' exists (must be pre-created in GeoServer)
Creates datastore 'inference_result' if it doesn't exist
Deletes existing layer if overwrite-existing: true
Publishes shapefile via REST API
Verifies successful registration
Automatically enables WMS, WFS, WMTS services

Important Notes:

Workspace 'cd' must exist in GeoServer before registration
Environment variables GEOSERVER_USERNAME and GEOSERVER_PASSWORD must be set
Shapefile path must be absolute
GeoServer must have file system access to the shapefile location
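
For orientation, the publish step could look like the request below. This is a sketch under assumptions, not the project's implementation: the README lists `RestTemplate` among the dependencies, whereas this example uses the JDK `HttpClient` request type to stay dependency-free. It also assumes that registration goes through GeoServer's `external.shp` datastore endpoint and that the REST root is `/geoserver/rest`. `GeoServerRequestSketch` and `publishExternalShapefile` are hypothetical names. The request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch: builds, but does not send, a shapefile publish request.
final class GeoServerRequestSketch {

  static HttpRequest publishExternalShapefile(String restBaseUrl, String workspace,
      String datastore, String absoluteShpPath, String username, String password) {
    if (!absoluteShpPath.startsWith("/")) {
      throw new IllegalArgumentException("Shapefile path must be absolute");
    }
    String credentials = Base64.getEncoder()
        .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));
    // Assumed endpoint: PUT .../workspaces/{ws}/datastores/{ds}/external.shp
    URI uri = URI.create(restBaseUrl + "/workspaces/" + workspace
        + "/datastores/" + datastore + "/external.shp");
    return HttpRequest.newBuilder(uri)
        .header("Authorization", "Basic " + credentials)
        .header("Content-Type", "text/plain")
        // The body is a file URL that GeoServer itself must be able to read.
        .PUT(HttpRequest.BodyPublishers.ofString("file://" + absoluteShpPath))
        .build();
  }

  public static void main(String[] args) {
    String user = System.getenv().getOrDefault("GEOSERVER_USERNAME", "");
    String pass = System.getenv().getOrDefault("GEOSERVER_PASSWORD", "");
    HttpRequest request = publishExternalShapefile(
        "https://kamco.geo-dev.gs.dabeeo.com/geoserver/rest", // assumed REST root
        "cd", "inference_result", "/path/to/shapefile.shp", user, pass);
    System.out.println(request.method() + " " + request.uri());
  }
}
```

Because the body carries a server-side file path rather than the file contents, this style of registration only works when GeoServer has file system access to the shapefile location, as noted above.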

Output

Legacy Mode Output

Shapefiles will be created in directories structured as output-base-dir/inference-id/map-id/:

/kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/
├── 35813030.shp    # Shapefile geometry
├── 35813030.shx    # Shape index
├── 35813030.dbf    # Attribute data
└── 35813030.prj    # Projection information

Spring Batch Mode Output

Output structure for MERGED mode (output-base-dir/inference-id/merge/):

/kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/merge/
├── D5E46F60FC40B1A8BE0CD1F3547AA6.shp       # Shapefile geometry
├── D5E46F60FC40B1A8BE0CD1F3547AA6.shx       # Shape index
├── D5E46F60FC40B1A8BE0CD1F3547AA6.dbf       # Attribute data
├── D5E46F60FC40B1A8BE0CD1F3547AA6.prj       # Projection information
├── D5E46F60FC40B1A8BE0CD1F3547AA6.geojson   # GeoJSON format
└── D5E46F60FC40B1A8BE0CD1F3547AA6.zip       # ZIP archive (for GeoServer)

Note: Only the shapefile (.shp and its companion files) is registered to GeoServer. The GeoJSON file is generated for consumers that prefer that format.
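
The two directory layouts above can be expressed with `java.nio.file.Path`. The sketch is illustrative: `OutputPaths` and its methods are hypothetical names, and only the layout itself comes from this README.

```java
import java.nio.file.Path;

// Hypothetical sketch of the output layouts described in this README.
final class OutputPaths {

  /** {output-base-dir}/{inference-id}/merge/{inference-id}.{extension} */
  static Path mergedFile(String outputBaseDir, String inferenceId, String extension) {
    return Path.of(outputBaseDir, inferenceId, "merge")
        .resolve(inferenceId + "." + extension);
  }

  /** {output-base-dir}/{inference-id}/{map-id}/{map-id}.{extension} */
  static Path legacyFile(String outputBaseDir, String inferenceId, String mapId,
      String extension) {
    return Path.of(outputBaseDir, inferenceId, mapId).resolve(mapId + "." + extension);
  }

  public static void main(String[] args) {
    String base = "/kamco-nfs/dataset/export/";
    String inferenceId = "D5E46F60FC40B1A8BE0CD1F3547AA6";
    for (String ext : new String[] {"shp", "shx", "dbf", "prj", "geojson", "zip"}) {
      System.out.println(mergedFile(base, inferenceId, ext));
    }
    System.out.println(legacyFile(base, inferenceId, "35813030", "shp"));
  }
}
```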

Database Query

Spring Batch Mode (Recommended)

The Spring Batch mode applies comprehensive validation to ensure data quality:

ㄴㅅ
ORDER BY map_id, uid

Validation Criteria:

Geometry Type: Only ST_Polygon and ST_MultiPolygon (excludes Point, LineString, etc.)
Coordinate System: EPSG:5186 (Korean 2000 / Central Belt)
Coordinate Range: Korea territory bounds (X: 125 to 530 km, Y: -600 to 988 km)
Geometry Validity: Valid topology (ST_IsValid)

Rows failing validation are automatically excluded from processing, ensuring clean shapefile generation.
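
The four criteria can be read as a single predicate over each row. In the project the checks run in SQL. The Java sketch below only mirrors them for illustration. `RowValidation`, `Row`, and `passes` are hypothetical names, the envelope fields stand in for a real geometry, and the numeric bounds assume that the kilometre figures above translate directly into EPSG:5186 metres.

```java
import java.util.Set;

// Hypothetical sketch: mirrors the listed validation criteria in plain Java.
final class RowValidation {

  /** Minimal stand-in for a query row: geometry type, SRID, envelope, validity flag. */
  record Row(String geometryType, int srid,
             double minX, double minY, double maxX, double maxY, boolean valid) {}

  private static final Set<String> ALLOWED_TYPES = Set.of("ST_Polygon", "ST_MultiPolygon");
  private static final int EXPECTED_SRID = 5186;
  // Assumed conversion of the README bounds from km to metres.
  private static final double MIN_X = 125_000, MAX_X = 530_000;
  private static final double MIN_Y = -600_000, MAX_Y = 988_000;

  static boolean passes(Row row) {
    boolean typeOk = ALLOWED_TYPES.contains(row.geometryType());
    boolean sridOk = row.srid() == EXPECTED_SRID;
    boolean rangeOk = row.minX() >= MIN_X && row.maxX() <= MAX_X
        && row.minY() >= MIN_Y && row.maxY() <= MAX_Y;
    return typeOk && sridOk && rangeOk && row.valid();
  }

  public static void main(String[] args) {
    Row polygon = new Row("ST_Polygon", 5186, 200_000, 500_000, 200_100, 500_100, true);
    Row point = new Row("ST_Point", 5186, 200_000, 500_000, 200_000, 500_000, true);
    System.out.println("polygon passes: " + passes(polygon));
    System.out.println("point passes: " + passes(point));
  }
}
```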

Performance: See PERFORMANCE_OPTIMIZATION.md for indexing recommendations.

Legacy Mode

Legacy mode uses a simpler query without validation:

SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p, ST_AsText(geometry) as geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?) AND map_id = ?

Database Schema

geometry: geometry(Polygon, 5186) - EPSG:5186 coordinate system
map_id: text - string type
before_year, after_year: bigint - Long type
batch_id: bigint - Long type
uid: uuid - UUID type

Field Mapping

Shapefile field names are limited to 10 characters:

| Database Column | DB Type | Shapefile Field | Shapefile Type |
|-----------------|---------|-----------------|----------------|
| uid             | uuid    | chnDtctId       | String         |
| map_id          | text    | mpqd_no         | String         |
| probability     | float8  | chn_dtct_p      | Double         |
| before_year     | bigint  | cprs_yr         | Long           |
| after_year      | bigint  | crtr_yr         | Long           |
| before_c        | text    | bf_cls_cd       | String         |
| before_p        | float8  | bf_cls_pro      | Double         |
| after_c         | text    | af_cls_cd       | String         |
| after_p         | float8  | af_cls_pro      | Double         |
| geometry        | geom    | the_geom        | Polygon        |
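
The mapping can be kept as an ordered lookup and checked against the 10-character limit. The sketch below is illustrative: `FieldMapping` and its members are hypothetical names, while the column and field names are copied from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: database column -> shapefile field, in table order.
final class FieldMapping {

  static final int MAX_FIELD_NAME_LENGTH = 10; // shapefile (DBF) field name limit

  static Map<String, String> columnToField() {
    Map<String, String> mapping = new LinkedHashMap<>();
    mapping.put("uid", "chnDtctId");
    mapping.put("map_id", "mpqd_no");
    mapping.put("probability", "chn_dtct_p");
    mapping.put("before_year", "cprs_yr");
    mapping.put("after_year", "crtr_yr");
    mapping.put("before_c", "bf_cls_cd");
    mapping.put("before_p", "bf_cls_pro");
    mapping.put("after_c", "af_cls_cd");
    mapping.put("after_p", "af_cls_pro");
    mapping.put("geometry", "the_geom");
    return mapping;
  }

  /** True when every shapefile field name fits within the length limit. */
  static boolean allNamesFit(Map<String, String> mapping) {
    return mapping.values().stream()
        .allMatch(name -> name.length() <= MAX_FIELD_NAME_LENGTH);
  }

  public static void main(String[] args) {
    Map<String, String> mapping = columnToField();
    mapping.forEach((column, field) -> System.out.println(column + " -> " + field));
    System.out.println("all names fit: " + allNamesFit(mapping));
  }
}
```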

Error Handling

Shapefile Generation

No results: Logs warning and continues to next map_id
Mixed geometry types: Throws exception (shapefiles require homogeneous geometry)
Database connection failure: Application exits with error
Invalid geometry: Logs warning and continues processing

GeoServer Registration

Workspace not found: Logs error with remediation steps (must be pre-created)
Authentication failure: Logs error prompting to verify environment variables
Network timeout: Logs connection error with timeout details
Layer already exists: Automatically deletes and re-registers if overwrite-existing: true
Registration failure: Logs error but does not stop application (non-blocking)

Validating Output

Shapefile Validation

Open the generated shapefiles in QGIS or ArcGIS to verify:

Geometry displays correctly
Attribute table contains all expected fields
CRS is EPSG:5186 (Korean 2000 / Central Belt)

GeoServer Layer Validation

After registering to GeoServer, verify the layer:

GeoServer Admin Console: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web
Navigate to Layers → Find your layer (e.g., cd:inference_35813030)
Preview the layer using Layer Preview
Verify services are enabled:
- WMS: https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms
- WFS: https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wfs
- WMTS: https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wmts

Example WMS GetMap Request:

https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms?
  service=WMS&
  version=1.1.0&
  request=GetMap&
  layers=cd:inference_35813030&
  bbox=<bounds>&
  width=768&
  height=768&
  srs=EPSG:5186&
  format=image/png
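
The same request can be assembled programmatically. This sketch is illustrative only: `WmsGetMapUrl` and `build` are hypothetical names, and the bounding box passed in `main` is a made-up placeholder, since the README leaves `<bounds>` unspecified. Parameter values are URL-encoded, so the colon and slash appear as `%3A` and `%2F` in the output.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: assembles a WMS 1.1.0 GetMap URL with the README's parameters.
final class WmsGetMapUrl {

  static String build(String wmsEndpoint, String layer,
      double minX, double minY, double maxX, double maxY, int width, int height) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("service", "WMS");
    params.put("version", "1.1.0");
    params.put("request", "GetMap");
    params.put("layers", layer);
    // Locale.ROOT keeps the decimal separator a dot regardless of system locale.
    params.put("bbox", String.format(Locale.ROOT, "%.3f,%.3f,%.3f,%.3f", minX, minY, maxX, maxY));
    params.put("width", Integer.toString(width));
    params.put("height", Integer.toString(height));
    params.put("srs", "EPSG:5186");
    params.put("format", "image/png");
    String query = params.entrySet().stream()
        .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
        .collect(Collectors.joining("&"));
    return wmsEndpoint + "?" + query;
  }

  public static void main(String[] args) {
    // Placeholder bounds; substitute the layer's real extent.
    System.out.println(build("https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms",
        "cd:inference_35813030", 200000, 500000, 201000, 501000, 768, 768));
  }
}
```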

Development

Code Formatting

The project uses Google Java Format with 2-space indentation:

./gradlew spotlessApply

Project Structure

src/main/java/com/kamco/makesample/
├── MakeSampleApplication.java                    # Main application class
├── batch/                                         # Spring Batch components (v1.1.0+)
│   ├── config/
│   │   ├── BatchConfiguration.java               # Spring Batch configuration
│   │   └── MergedModeJobConfig.java              # MERGED mode Job definition
│   ├── processor/
│   │   └── FeatureConversionProcessor.java       # InferenceResult → SimpleFeature processor
│   ├── reader/
│   │   ├── GeometryConvertingRowMapper.java      # WKT → JTS converter
│   │   └── InferenceResultItemReaderConfig.java  # Cursor-based DB reader
│   ├── tasklet/
│   │   ├── CreateZipTasklet.java                 # ZIP creation tasklet
│   │   ├── GeoServerRegistrationTasklet.java     # GeoServer registration tasklet
│   │   └── GeometryTypeValidationTasklet.java    # Geometry validation tasklet
│   ├── util/
│   │   └── FeatureTypeFactory.java               # Shared feature type creation
│   └── writer/
│       ├── StreamingGeoJsonWriter.java           # Streaming GeoJSON writer
│       └── StreamingShapefileWriter.java         # Streaming shapefile writer
├── cli/
│   └── ConverterCommandLineRunner.java           # CLI entry point (batch + legacy)
├── config/
│   ├── ConverterProperties.java                  # Shapefile converter configuration
│   ├── GeoServerProperties.java                  # GeoServer configuration
│   ├── GeoServerCredentials.java                 # GeoServer authentication
│   └── RestTemplateConfig.java                   # HTTP client configuration
├── exception/
│   ├── ShapefileConversionException.java
│   ├── GeometryConversionException.java
│   ├── MixedGeometryException.java
│   └── GeoServerRegistrationException.java       # GeoServer registration errors
├── model/
│   └── InferenceResult.java                      # Domain model
├── repository/
│   └── InferenceResultRepository.java            # Data access layer (legacy)
├── service/
│   ├── GeometryConverter.java                    # PostGIS to JTS conversion
│   ├── ShapefileConverterService.java            # Orchestration service (legacy)
│   └── GeoServerRegistrationService.java         # GeoServer REST API integration
└── writer/
    ├── ShapefileWriter.java                       # GeoTools shapefile writer (legacy)
    └── GeoJsonWriter.java                         # GeoJSON export writer (legacy)

Dependencies

Spring Boot 3.5.7
- spring-boot-starter
- spring-boot-starter-jdbc
- spring-boot-starter-batch (v1.1.0+)
- spring-boot-starter-web (for RestTemplate)
- spring-boot-starter-validation (for @NotBlank annotations)
GeoTools 30.0
- gt-shapefile
- gt-referencing
- gt-epsg-hsql
- gt-geojson
PostgreSQL JDBC Driver
PostGIS JDBC 2.5.1
JTS (Java Topology Suite) 1.19.0

Troubleshooting

GeoServer Registration Issues

Problem: "Workspace not found: cd"

Solution: Create workspace 'cd' in GeoServer admin console before registration
Steps:
1. Login to GeoServer admin: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web
2. Go to Workspaces → Add new workspace
3. Name: cd, Namespace URI: http://cd
4. Click Save

Problem: "GeoServer credentials not configured"

Solution: Set environment variables before running
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver

Problem: "Layer already exists and overwrite is disabled"

Solution: Enable overwrite in application.yml
geoserver:
  overwrite-existing: true

Problem: Connection timeout to GeoServer

Solution: Increase timeout values in application.yml
geoserver:
  connection-timeout: 60000  # 60 seconds
  read-timeout: 120000       # 120 seconds

Problem: "Registration failed. Layer not found after publication"

Possible causes:
1. GeoServer cannot access shapefile path (check file system permissions)
2. Shapefile is corrupted or invalid
3. Network issue interrupted registration

Solution:
1. Verify GeoServer has read access to shapefile directory
2. Validate shapefile using QGIS or ogr2ogr
3. Check GeoServer logs for detailed error messages

Database Connection Issues

Problem: "Connection refused to 192.168.2.127:15432"

Solution: Verify PostgreSQL is running and accessible
psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds

Problem: "No results found for map_id"

Solution: Verify data exists in database
SELECT COUNT(*) FROM inference_results_testing
WHERE batch_id IN (252, 253, 257) AND map_id = '35813030';

Batch Execution History

Overview

Spring Batch mode automatically tracks execution history for each step, recording:

Start time, end time, duration
Success/failure status
Error messages and stack traces (if failed)
Processing statistics (read/write/commit/rollback/skip counts)
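
One way to picture a history entry is as a small value type. The record below is a hypothetical sketch rather than the project's model: its fields follow the column names used in the query examples in this section, and the step name in `main` is invented.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of one row in the execution history.
record StepHistory(String stepName, Instant startTime, Instant endTime,
                   String status, long readCount, long writeCount) {

  /** Elapsed time between start and end, in milliseconds. */
  long durationMs() {
    return Duration.between(startTime, endTime).toMillis();
  }

  boolean failed() {
    return "FAILED".equals(status);
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2026-03-08T12:00:00Z");
    StepHistory step =
        new StepHistory("exampleStep", start, start.plusSeconds(90), "COMPLETED", 1000, 1000);
    System.out.println(step.stepName() + " took " + step.durationMs() + " ms");
  }
}
```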

Table Setup

Create the batch_execution_history table before running batch jobs:

psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds \
  -f src/main/resources/db/migration/V1__create_batch_execution_history.sql

Query Examples

View execution history for a specific job:

SELECT step_name, start_time, end_time, duration_ms, status, read_count, write_count
FROM batch_execution_history
WHERE job_execution_id = 123
ORDER BY start_time;

Check failed steps:

SELECT job_execution_id, step_name, start_time, error_message
FROM batch_execution_history
WHERE status = 'FAILED'
ORDER BY start_time DESC
LIMIT 10;

Average step duration:

SELECT step_name,
       COUNT(*) as executions,
       ROUND(AVG(duration_ms) / 1000.0, 2) as avg_duration_sec
FROM batch_execution_history
WHERE status = 'COMPLETED'
GROUP BY step_name
ORDER BY avg_duration_sec DESC;

For more query examples and detailed documentation, see BATCH_EXECUTION_HISTORY.md.

License

KAMCO Internal Use Only
