534 lines
17 KiB
Markdown
Executable File
534 lines
17 KiB
Markdown
Executable File
# PostgreSQL to Shapefile Converter
|
|
|
|
Spring Boot CLI application that queries PostgreSQL spatial data and generates shapefiles.
|
|
|
|
## Features
|
|
|
|
- Batch processing for multiple map_ids
|
|
- PostGIS geometry to Shapefile conversion
|
|
- **GeoServer REST API integration for automatic layer registration**
|
|
- Configurable via `application.yml`
|
|
- Generates all required shapefile files (.shp, .shx, .dbf, .prj)
|
|
- Supports EPSG:5186 (Korean 2000 / Central Belt) coordinate reference system
|
|
- GeoJSON export support
|
|
|
|
## Prerequisites
|
|
|
|
- Java 17 or higher
|
|
- PostgreSQL database with PostGIS extension
|
|
- Access to the KAMCO database at 192.168.2.127:15432
|
|
|
|
## Configuration
|
|
|
|
Edit `src/main/resources/application.yml` to configure:
|
|
|
|
```yaml
|
|
converter:
|
|
spring.profiles.active: 'dev' #profiles.active
|
|
inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6' # Inference ID (used for output folder structure)
|
|
map-ids:
|
|
- '35813030' # Add your map_ids here (text type)
|
|
batch-ids:
|
|
- 252
|
|
- 253
|
|
- 257
|
|
output-base-dir: '/kamco-nfs/dataset/export/'
|
|
crs: 'EPSG:5186' # Korean 2000 / Central Belt
|
|
|
|
geoserver:
|
|
base-url: 'https://kamco.geo-dev.gs.dabeeo.com'
|
|
workspace: 'cd'
|
|
datastore: 'inference_result'
|
|
overwrite-existing: true # Delete existing layers before re-registering
|
|
connection-timeout: 30000 # 30 seconds
|
|
read-timeout: 60000 # 60 seconds
|
|
```
|
|
|
|
```md
|
|
## Converter Mode (`converter.mode`)
|
|
|
|
`converter.mode`는 대량 `map_ids` 처리 시
|
|
OS 커맨드라인 길이 제한(`Argument list too long`) 문제를 방지하기 위해 추가 하였습니다.
|
|
|
|
### Supported Modes
|
|
|
|
#### MERGED
|
|
- `batch-ids`에 해당하는 **모든 데이터를 하나의 Shapefile로 병합 생성**
|
|
- `map-ids`가 설정되어 있어도 **무시됨**
|
|
- 단일 결과 파일이 필요한 경우 적합
|
|
|
|
#### MAP_IDS
|
|
- 명시적으로 전달한 `map-ids`만 대상으로 Shapefile 생성
|
|
- `converter.map-ids` **필수**
|
|
- `map-ids` 개수가 많을 경우 OS 커맨드라인 길이 제한에 걸릴 수 있음
|
|
|
|
#### RESOLVE
|
|
- `batch-ids` 기준으로 **JAR 내부에서 map_ids를 조회**한 뒤 Shapefile 생성
|
|
- `map-ids`를 커맨드라인 인자로 전달하지 않음
|
|
- 대량 데이터 처리 시 가장 안전한 방식
|
|
|
|
### Default Behavior (mode 미지정 시)
|
|
|
|
- `converter.map-ids`가 비어 있으면 → **MERGED**
|
|
- `converter.map-ids`가 있으면 → **MAP_IDS**
|
|
|
|
### Command Line Parameters
|
|
|
|
You can override configuration values using command line arguments:
|
|
|
|
**Using Gradle (recommended - no quoting issues):**
|
|
|
|
```bash
|
|
./gradlew bootRun --args="--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.batch-ids[1]=253 --converter.batch-ids[2]=257 --converter.mode=MERGED"
|
|
```
|
|
|
|
**Using JAR (zsh shell - quote arguments with brackets):**
|
|
|
|
```bash
|
|
java -jar build/libs/shp-exporter.jar \
|
|
'--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6' \
|
|
'--converter.map-ids[0]=35813030' \
|
|
'--converter.batch-ids[0]=252' \
|
|
'--converter.batch-ids[1]=253' \
|
|
'--converter.batch-ids[2]=257' \
|
|
'--converter.mode=MERGED'
|
|
```
|
|
|
|
**Using JAR (bash shell - no quotes needed):**
|
|
|
|
```bash
|
|
java -jar build/libs/shp-exporter.jar \
|
|
--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
|
|
--converter.map-ids[0]=35813030 \
|
|
--converter.batch-ids[0]=252 \
|
|
--converter.batch-ids[1]=253 \
|
|
--converter.batch-ids[2]=257 \
|
|
--converter.mode=MERGED
|
|
```
|
|
|
|
|
|
```bash
|
|
|
|
java -jar build/libs/shp-exporter.jar \
|
|
--batch \
|
|
--converter.inference-id=test009 \
|
|
--converter.batch-ids[0]=111 \
|
|
--converter.batch-ids[1]=114 \
|
|
--converter.batch-ids[2]=162 \
|
|
--geoserver.enabled=true
|
|
|
|
```
|
|
|
|
**Note for zsh users:** zsh interprets square brackets `[]` as glob patterns. Always quote arguments containing brackets when using zsh.
|
|
|
|
## Building
|
|
|
|
```bash
|
|
./gradlew build
|
|
```
|
|
|
|
## Running
|
|
|
|
### Generate Shapefiles (Spring Batch Mode - Recommended)
|
|
|
|
**New in v1.1.0**: Spring Batch mode provides memory-optimized processing for large datasets.
|
|
|
|
```bash
|
|
# MERGED mode (creates single shapefile + GeoJSON for all batch-ids)
|
|
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252 --converter.batch-ids[1]=253"
|
|
|
|
# With GeoServer registration
|
|
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"
|
|
```
|
|
|
|
**Output Files** (in `{output-base-dir}/{inference-id}/merge/`):
|
|
- `{inference-id}.shp` (+ .shx, .dbf, .prj) - Shapefile
|
|
- `{inference-id}.geojson` - GeoJSON file
|
|
- `{inference-id}.zip` - ZIP archive of shapefile
|
|
|
|
**Benefits**:
|
|
- 90-95% memory reduction (2-13GB → 150-200MB for 1M records)
|
|
- Chunk-based streaming (1000 records per chunk)
|
|
- Restart capability after failures
|
|
- Step-by-step execution support
|
|
|
|
See [claudedocs/SPRING_BATCH_MIGRATION.md](claudedocs/SPRING_BATCH_MIGRATION.md) for detailed documentation.
|
|
|
|
### Generate Shapefiles (Legacy Mode)
|
|
|
|
```bash
|
|
./gradlew bootRun
|
|
```
|
|
|
|
Or run the JAR directly:
|
|
|
|
```bash
|
|
java -jar build/libs/shp-exporter.jar
|
|
```
|
|
|
|
### Register Shapefile to GeoServer
|
|
|
|
First, set GeoServer credentials as environment variables:
|
|
|
|
```bash
|
|
export GEOSERVER_USERNAME=admin
|
|
export GEOSERVER_PASSWORD=geoserver
|
|
```
|
|
|
|
Then register a shapefile:
|
|
|
|
```bash
|
|
./gradlew bootRun --args="--upload-shp /kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/35813030.shp --layer inference_35813030"
|
|
```
|
|
|
|
Or using the JAR:
|
|
|
|
```bash
|
|
java -jar build/libs/shp-exporter.jar \
|
|
--upload-shp /path/to/shapefile.shp \
|
|
--layer layer_name
|
|
```
|
|
|
|
**GeoServer Registration Process:**
|
|
1. Verifies workspace 'cd' exists (must be pre-created in GeoServer)
|
|
2. Creates datastore 'inference_result' if it doesn't exist
|
|
3. Deletes existing layer if `overwrite-existing: true`
|
|
4. Publishes shapefile via REST API
|
|
5. Verifies successful registration
|
|
6. Automatically enables WMS, WFS, WMTS services
|
|
|
|
**Important Notes:**
|
|
- Workspace 'cd' must exist in GeoServer before registration
|
|
- Environment variables `GEOSERVER_USERNAME` and `GEOSERVER_PASSWORD` must be set
|
|
- Shapefile path must be absolute
|
|
- GeoServer must have file system access to the shapefile location
|
|
|
|
## Output
|
|
|
|
### Legacy Mode Output
|
|
Shapefiles will be created in directories structured as `output-base-dir/inference-id/map-id/`:
|
|
|
|
```
|
|
/kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/
|
|
├── 35813030.shp # Shapefile geometry
|
|
├── 35813030.shx # Shape index
|
|
├── 35813030.dbf # Attribute data
|
|
└── 35813030.prj # Projection information
|
|
```
|
|
|
|
### Spring Batch Mode Output
|
|
Output structure for MERGED mode (`output-base-dir/inference-id/merge/`):
|
|
|
|
```
|
|
/kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/merge/
|
|
├── D5E46F60FC40B1A8BE0CD1F3547AA6.shp # Shapefile geometry
|
|
├── D5E46F60FC40B1A8BE0CD1F3547AA6.shx # Shape index
|
|
├── D5E46F60FC40B1A8BE0CD1F3547AA6.dbf # Attribute data
|
|
├── D5E46F60FC40B1A8BE0CD1F3547AA6.prj # Projection information
|
|
├── D5E46F60FC40B1A8BE0CD1F3547AA6.geojson # GeoJSON format
|
|
└── D5E46F60FC40B1A8BE0CD1F3547AA6.zip # ZIP archive (for GeoServer)
|
|
```
|
|
|
|
**Note**: Only the shapefile (.shp and related files) are registered to GeoServer. GeoJSON files are generated for alternative consumption.
|
|
|
|
## Database Query
|
|
|
|
### Spring Batch Mode (Recommended)
|
|
|
|
The Spring Batch mode applies comprehensive validation to ensure data quality:
|
|
|
|
```sql
|
|
ㄴㅅ
|
|
ORDER BY map_id, uid
|
|
```
|
|
|
|
**Validation Criteria**:
|
|
- **Geometry Type**: Only ST_Polygon and ST_MultiPolygon (excludes Point, LineString, etc.)
|
|
- **Coordinate System**: EPSG:5186 (Korean 2000 / Central Belt)
|
|
- **Coordinate Range**: Korea territory bounds (X: 125-530km, Y: -600-988km)
|
|
- **Geometry Validity**: Valid topology (ST_IsValid)
|
|
|
|
Rows failing validation are automatically excluded from processing, ensuring clean shapefile generation.
|
|
|
|
**Performance**: See [PERFORMANCE_OPTIMIZATION.md](claudedocs/PERFORMANCE_OPTIMIZATION.md) for indexing recommendations.
|
|
|
|
### Legacy Mode
|
|
|
|
Legacy mode uses a simpler query without validation:
|
|
|
|
```sql
|
|
SELECT uid, map_id, probability, before_year, after_year,
|
|
before_c, before_p, after_c, after_p, ST_AsText(geometry) as geometry_wkt
|
|
FROM inference_results_testing
|
|
WHERE batch_id = ANY(?) AND map_id = ?
|
|
```
|
|
|
|
### Database Schema
|
|
- **geometry**: `geometry(Polygon, 5186)` - EPSG:5186 좌표계
|
|
- **map_id**: `text` - 문자열 타입
|
|
- **before_year, after_year**: `bigint` - Long 타입
|
|
- **batch_id**: `bigint` - Long 타입
|
|
- **uid**: `uuid` - UUID 타입
|
|
|
|
## Field Mapping
|
|
|
|
Shapefile field names are limited to 10 characters:
|
|
|
|
| Database Column | DB Type | Shapefile Field | Shapefile Type |
|
|
|-----------------|----------|-----------------|----------------|
|
|
| uid | uuid | chnDtctId | String |
|
|
| map_id | text | mpqd_no | String |
|
|
| probability | float8 | chn_dtct_p | Double |
|
|
| before_year | bigint | cprs_yr | Long |
|
|
| after_year | bigint | crtr_yr | Long |
|
|
| before_c | text | bf_cls_cd | String |
|
|
| before_p | float8 | bf_cls_pro | Double |
|
|
| after_c | text | af_cls_cd | String |
|
|
| after_p | float8 | af_cls_pro | Double |
|
|
| geometry | geom | the_geom | Polygon |
|
|
|
|
## Error Handling
|
|
|
|
### Shapefile Generation
|
|
- **No results**: Logs warning and continues to next map_id
|
|
- **Mixed geometry types**: Throws exception (shapefiles require homogeneous geometry)
|
|
- **Database connection failure**: Application exits with error
|
|
- **Invalid geometry**: Logs warning and continues processing
|
|
|
|
### GeoServer Registration
|
|
- **Workspace not found**: Logs error with remediation steps (must be pre-created)
|
|
- **Authentication failure**: Logs error prompting to verify environment variables
|
|
- **Network timeout**: Logs connection error with timeout details
|
|
- **Layer already exists**: Automatically deletes and re-registers if `overwrite-existing: true`
|
|
- **Registration failure**: Logs error but does not stop application (non-blocking)
|
|
|
|
## Validating Output
|
|
|
|
### Shapefile Validation
|
|
|
|
Open the generated shapefiles in QGIS or ArcGIS to verify:
|
|
|
|
1. Geometry displays correctly
|
|
2. Attribute table contains all expected fields
|
|
3. CRS is EPSG:5186 (Korean 2000 / Central Belt)
|
|
|
|
### GeoServer Layer Validation
|
|
|
|
After registering to GeoServer, verify the layer:
|
|
|
|
1. **GeoServer Admin Console**: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web
|
|
2. Navigate to **Layers** → Find your layer (e.g., `cd:inference_35813030`)
|
|
3. Preview the layer using **Layer Preview**
|
|
4. Verify services are enabled:
|
|
- WMS: `https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms`
|
|
- WFS: `https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wfs`
|
|
- WMTS: `https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wmts`
|
|
|
|
**Example WMS GetMap Request:**
|
|
```
|
|
https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms?
|
|
service=WMS&
|
|
version=1.1.0&
|
|
request=GetMap&
|
|
layers=cd:inference_35813030&
|
|
bbox=<bounds>&
|
|
width=768&
|
|
height=768&
|
|
srs=EPSG:5186&
|
|
format=image/png
|
|
```
|
|
|
|
## Development
|
|
|
|
### Code Formatting
|
|
|
|
The project uses Google Java Format with 2-space indentation:
|
|
|
|
```bash
|
|
./gradlew spotlessApply
|
|
```
|
|
|
|
### Project Structure
|
|
|
|
```
|
|
src/main/java/com/kamco/makesample/
|
|
├── MakeSampleApplication.java # Main application class
|
|
├── batch/ # Spring Batch components (v1.1.0+)
|
|
│ ├── config/
|
|
│ │ ├── BatchConfiguration.java # Spring Batch configuration
|
|
│ │ └── MergedModeJobConfig.java # MERGED mode Job definition
|
|
│ ├── processor/
|
|
│ │ └── FeatureConversionProcessor.java # InferenceResult → SimpleFeature processor
|
|
│ ├── reader/
|
|
│ │ ├── GeometryConvertingRowMapper.java # WKT → JTS converter
|
|
│ │ └── InferenceResultItemReaderConfig.java # Cursor-based DB reader
|
|
│ ├── tasklet/
|
|
│ │ ├── CreateZipTasklet.java # ZIP creation tasklet
|
|
│ │ ├── GeoServerRegistrationTasklet.java # GeoServer registration tasklet
|
|
│ │ └── GeometryTypeValidationTasklet.java # Geometry validation tasklet
|
|
│ ├── util/
|
|
│ │ └── FeatureTypeFactory.java # Shared feature type creation
|
|
│ └── writer/
|
|
│ ├── StreamingGeoJsonWriter.java # Streaming GeoJSON writer
|
|
│ └── StreamingShapefileWriter.java # Streaming shapefile writer
|
|
├── cli/
|
|
│ └── ConverterCommandLineRunner.java # CLI entry point (batch + legacy)
|
|
├── config/
|
|
│ ├── ConverterProperties.java # Shapefile converter configuration
|
|
│ ├── GeoServerProperties.java # GeoServer configuration
|
|
│ ├── GeoServerCredentials.java # GeoServer authentication
|
|
│ └── RestTemplateConfig.java # HTTP client configuration
|
|
├── exception/
|
|
│ ├── ShapefileConversionException.java
|
|
│ ├── GeometryConversionException.java
|
|
│ ├── MixedGeometryException.java
|
|
│ └── GeoServerRegistrationException.java # GeoServer registration errors
|
|
├── model/
|
|
│ └── InferenceResult.java # Domain model
|
|
├── repository/
|
|
│ └── InferenceResultRepository.java # Data access layer (legacy)
|
|
├── service/
|
|
│ ├── GeometryConverter.java # PostGIS to JTS conversion
|
|
│ ├── ShapefileConverterService.java # Orchestration service (legacy)
|
|
│ └── GeoServerRegistrationService.java # GeoServer REST API integration
|
|
└── writer/
|
|
├── ShapefileWriter.java # GeoTools shapefile writer (legacy)
|
|
└── GeoJsonWriter.java # GeoJSON export writer (legacy)
|
|
```
|
|
|
|
## Dependencies
|
|
|
|
- Spring Boot 3.5.7
|
|
- spring-boot-starter
|
|
- spring-boot-starter-jdbc
|
|
- spring-boot-starter-batch (v1.1.0+)
|
|
- spring-boot-starter-web (for RestTemplate)
|
|
- spring-boot-starter-validation (for @NotBlank annotations)
|
|
- GeoTools 30.0
|
|
- gt-shapefile
|
|
- gt-referencing
|
|
- gt-epsg-hsql
|
|
- gt-geojson
|
|
- PostgreSQL JDBC Driver
|
|
- PostGIS JDBC 2.5.1
|
|
- JTS (Java Topology Suite) 1.19.0
|
|
|
|
## Troubleshooting
|
|
|
|
### GeoServer Registration Issues
|
|
|
|
**Problem: "Workspace not found: cd"**
|
|
```
|
|
Solution: Create workspace 'cd' in GeoServer admin console before registration
|
|
Steps:
|
|
1. Login to GeoServer admin: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web
|
|
2. Go to Workspaces → Add new workspace
|
|
3. Name: cd, Namespace URI: http://cd
|
|
4. Click Save
|
|
```
|
|
|
|
**Problem: "GeoServer credentials not configured"**
|
|
```
|
|
Solution: Set environment variables before running
|
|
export GEOSERVER_USERNAME=admin
|
|
export GEOSERVER_PASSWORD=geoserver
|
|
```
|
|
|
|
**Problem: "Layer already exists and overwrite is disabled"**
|
|
```
|
|
Solution: Enable overwrite in application.yml
|
|
geoserver:
|
|
overwrite-existing: true
|
|
```
|
|
|
|
**Problem: Connection timeout to GeoServer**
|
|
```
|
|
Solution: Increase timeout values in application.yml
|
|
geoserver:
|
|
connection-timeout: 60000 # 60 seconds
|
|
read-timeout: 120000 # 120 seconds
|
|
```
|
|
|
|
**Problem: "Registration failed. Layer not found after publication"**
|
|
```
|
|
Possible causes:
|
|
1. GeoServer cannot access shapefile path (check file system permissions)
|
|
2. Shapefile is corrupted or invalid
|
|
3. Network issue interrupted registration
|
|
|
|
Solution:
|
|
1. Verify GeoServer has read access to shapefile directory
|
|
2. Validate shapefile using QGIS or ogr2ogr
|
|
3. Check GeoServer logs for detailed error messages
|
|
```
|
|
|
|
### Database Connection Issues
|
|
|
|
**Problem: "Connection refused to 192.168.2.127:15432"**
|
|
```
|
|
Solution: Verify PostgreSQL is running and accessible
|
|
psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds
|
|
```
|
|
|
|
**Problem: "No results found for map_id"**
|
|
```
|
|
Solution: Verify data exists in database
|
|
SELECT COUNT(*) FROM inference_results_testing
|
|
WHERE batch_id IN (252, 253, 257) AND map_id = '35813030';
|
|
```
|
|
|
|
## Batch Execution History
|
|
|
|
### Overview
|
|
|
|
Spring Batch mode automatically tracks execution history for each step, recording:
|
|
- Start time, end time, duration
|
|
- Success/failure status
|
|
- Error messages and stack traces (if failed)
|
|
- Processing statistics (read/write/commit/rollback/skip counts)
|
|
|
|
### Table Setup
|
|
|
|
Create the `batch_execution_history` table before running batch jobs:
|
|
|
|
```bash
|
|
psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds \
|
|
-f src/main/resources/db/migration/V1__create_batch_execution_history.sql
|
|
```
|
|
|
|
### Query Examples
|
|
|
|
**View execution history for a specific job**:
|
|
```sql
|
|
SELECT step_name, start_time, end_time, duration_ms, status, read_count, write_count
|
|
FROM batch_execution_history
|
|
WHERE job_execution_id = 123
|
|
ORDER BY start_time;
|
|
```
|
|
|
|
**Check failed steps**:
|
|
```sql
|
|
SELECT job_execution_id, step_name, start_time, error_message
|
|
FROM batch_execution_history
|
|
WHERE status = 'FAILED'
|
|
ORDER BY start_time DESC
|
|
LIMIT 10;
|
|
```
|
|
|
|
**Average step duration**:
|
|
```sql
|
|
SELECT step_name,
|
|
COUNT(*) as executions,
|
|
ROUND(AVG(duration_ms) / 1000.0, 2) as avg_duration_sec
|
|
FROM batch_execution_history
|
|
WHERE status = 'COMPLETED'
|
|
GROUP BY step_name
|
|
ORDER BY avg_duration_sec DESC;
|
|
```
|
|
|
|
For more query examples and detailed documentation, see [BATCH_EXECUTION_HISTORY.md](claudedocs/BATCH_EXECUTION_HISTORY.md).
|
|
|
|
## License
|
|
|
|
KAMCO Internal Use Only
|