kamco-cd-cron/shp-exporter/README.md

# PostgreSQL to Shapefile Converter

Spring Boot CLI application that queries PostgreSQL spatial data and generates shapefiles.

## Features

- Batch processing for multiple map_ids
- PostGIS geometry to Shapefile conversion
- **GeoServer REST API integration for automatic layer registration**
- Configurable via `application.yml`
- Generates all required shapefile files (.shp, .shx, .dbf, .prj)
- Supports EPSG:5186 (Korean 2000 / Central Belt) coordinate reference system
- GeoJSON export support

## Prerequisites

- Java 17 or higher
- PostgreSQL database with PostGIS extension
- Access to the KAMCO database at 192.168.2.127:15432

## Configuration

Edit `src/main/resources/application.yml` to configure:

```yaml
converter:
  spring.profiles.active: 'dev' #profiles.active
  inference-id: 'D5E46F60FC40B1A8BE0CD1F3547AA6'  # Inference ID (used for output folder structure)
  map-ids:
    - '35813030'    # Add your map_ids here (text type)
  batch-ids:
    - 252
    - 253
    - 257
  output-base-dir: '/kamco-nfs/dataset/export/'
  crs: 'EPSG:5186'  # Korean 2000 / Central Belt

geoserver:
  base-url: 'https://kamco.geo-dev.gs.dabeeo.com'
  workspace: 'cd'
  datastore: 'inference_result'
  overwrite-existing: true  # Delete existing layers before re-registering
  connection-timeout: 30000  # 30 seconds
  read-timeout: 60000        # 60 seconds
```

```md
## Converter Mode (`converter.mode`)

`converter.mode`는 대량 `map_ids` 처리 시
OS 커맨드라인 길이 제한(`Argument list too long`) 문제를 방지하기 위해 추가 하였습니다.

### Supported Modes

#### MERGED
- `batch-ids`에 해당하는 **모든 데이터를 하나의 Shapefile로 병합 생성**
- `map-ids`가 설정되어 있어도 **무시됨**
- 단일 결과 파일이 필요한 경우 적합

#### MAP_IDS
- 명시적으로 전달한 `map-ids`만 대상으로 Shapefile 생성
- `converter.map-ids` **필수**
- `map-ids` 개수가 많을 경우 OS 커맨드라인 길이 제한에 걸릴 수 있음

#### RESOLVE
- `batch-ids` 기준으로 **JAR 내부에서 map_ids를 조회**한 뒤 Shapefile 생성
- `map-ids`를 커맨드라인 인자로 전달하지 않음
- 대량 데이터 처리 시 가장 안전한 방식

### Default Behavior (mode 미지정 시)

- `converter.map-ids`가 비어 있으면 → **MERGED**
- `converter.map-ids`가 있으면 → **MAP_IDS**

### Command Line Parameters

You can override configuration values using command line arguments:

**Using Gradle (recommended - no quoting issues):**

```bash
./gradlew bootRun --args="--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 --converter.map-ids[0]=35813030 --converter.batch-ids[0]=252 --converter.batch-ids[1]=253 --converter.batch-ids[2]=257 --converter.mode=MERGED"
```

**Using JAR (zsh shell - quote arguments with brackets):**

```bash
java -jar build/libs/shp-exporter.jar \
  '--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6' \
  '--converter.map-ids[0]=35813030' \
  '--converter.batch-ids[0]=252' \
  '--converter.batch-ids[1]=253' \
  '--converter.batch-ids[2]=257' \
  '--converter.mode=MERGED'
```

**Using JAR (bash shell - no quotes needed):**

```bash
java -jar build/libs/shp-exporter.jar \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.map-ids[0]=35813030 \
  --converter.batch-ids[0]=252 \
  --converter.batch-ids[1]=253 \
  --converter.batch-ids[2]=257 \
  --converter.mode=MERGED
```


```bash

java -jar build/libs/shp-exporter.jar \
  --batch \
  --converter.inference-id=test009 \
  --converter.batch-ids[0]=111 \
  --converter.batch-ids[1]=114 \
  --converter.batch-ids[2]=162 \
  --geoserver.enabled=true

```

**Note for zsh users:** zsh interprets square brackets `[]` as glob patterns. Always quote arguments containing brackets when using zsh.

## Building

```bash
./gradlew build
```

## Running

### Generate Shapefiles (Spring Batch Mode - Recommended)

**New in v1.1.0**: Spring Batch mode provides memory-optimized processing for large datasets.

```bash
# MERGED mode (creates single shapefile + GeoJSON for all batch-ids)
./gradlew bootRun --args="--batch --converter.batch-ids[0]=252 --converter.batch-ids[1]=253"

# With GeoServer registration
./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"
```

**Output Files** (in `{output-base-dir}/{inference-id}/merge/`):
- `{inference-id}.shp` (+ .shx, .dbf, .prj) - Shapefile
- `{inference-id}.geojson` - GeoJSON file
- `{inference-id}.zip` - ZIP archive of shapefile

**Benefits**:
- 90-95% memory reduction (2-13GB → 150-200MB for 1M records)
- Chunk-based streaming (1000 records per chunk)
- Restart capability after failures
- Step-by-step execution support

See [claudedocs/SPRING_BATCH_MIGRATION.md](claudedocs/SPRING_BATCH_MIGRATION.md) for detailed documentation.

### Generate Shapefiles (Legacy Mode)

```bash
./gradlew bootRun
```

Or run the JAR directly:

```bash
java -jar build/libs/shp-exporter.jar
```

### Register Shapefile to GeoServer

First, set GeoServer credentials as environment variables:

```bash
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
```

Then register a shapefile:

```bash
./gradlew bootRun --args="--upload-shp /kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/35813030.shp --layer inference_35813030"
```

Or using the JAR:

```bash
java -jar build/libs/shp-exporter.jar \
  --upload-shp /path/to/shapefile.shp \
  --layer layer_name
```

**GeoServer Registration Process:**
1. Verifies workspace 'cd' exists (must be pre-created in GeoServer)
2. Creates datastore 'inference_result' if it doesn't exist
3. Deletes existing layer if `overwrite-existing: true`
4. Publishes shapefile via REST API
5. Verifies successful registration
6. Automatically enables WMS, WFS, WMTS services

**Important Notes:**
- Workspace 'cd' must exist in GeoServer before registration
- Environment variables `GEOSERVER_USERNAME` and `GEOSERVER_PASSWORD` must be set
- Shapefile path must be absolute
- GeoServer must have file system access to the shapefile location

## Output

### Legacy Mode Output
Shapefiles will be created in directories structured as `output-base-dir/inference-id/map-id/`:

```
/kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/35813030/
├── 35813030.shp    # Shapefile geometry
├── 35813030.shx    # Shape index
├── 35813030.dbf    # Attribute data
└── 35813030.prj    # Projection information
```

### Spring Batch Mode Output
Output structure for MERGED mode (`output-base-dir/inference-id/merge/`):

```
/kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/merge/
├── D5E46F60FC40B1A8BE0CD1F3547AA6.shp       # Shapefile geometry
├── D5E46F60FC40B1A8BE0CD1F3547AA6.shx       # Shape index
├── D5E46F60FC40B1A8BE0CD1F3547AA6.dbf       # Attribute data
├── D5E46F60FC40B1A8BE0CD1F3547AA6.prj       # Projection information
├── D5E46F60FC40B1A8BE0CD1F3547AA6.geojson   # GeoJSON format
└── D5E46F60FC40B1A8BE0CD1F3547AA6.zip       # ZIP archive (for GeoServer)
```

**Note**: Only the shapefile (.shp and related files) are registered to GeoServer. GeoJSON files are generated for alternative consumption.

## Database Query

### Spring Batch Mode (Recommended)

The Spring Batch mode applies comprehensive validation to ensure data quality:

```sql
ㄴㅅ
ORDER BY map_id, uid
```

**Validation Criteria**:
- **Geometry Type**: Only ST_Polygon and ST_MultiPolygon (excludes Point, LineString, etc.)
- **Coordinate System**: EPSG:5186 (Korean 2000 / Central Belt)
- **Coordinate Range**: Korea territory bounds (X: 125-530km, Y: -600-988km)
- **Geometry Validity**: Valid topology (ST_IsValid)

Rows failing validation are automatically excluded from processing, ensuring clean shapefile generation.

**Performance**: See [PERFORMANCE_OPTIMIZATION.md](claudedocs/PERFORMANCE_OPTIMIZATION.md) for indexing recommendations.

### Legacy Mode

Legacy mode uses a simpler query without validation:

```sql
SELECT uid, map_id, probability, before_year, after_year,
       before_c, before_p, after_c, after_p, ST_AsText(geometry) as geometry_wkt
FROM inference_results_testing
WHERE batch_id = ANY(?) AND map_id = ?
```

### Database Schema
- **geometry**: `geometry(Polygon, 5186)` - EPSG:5186 좌표계
- **map_id**: `text` - 문자열 타입
- **before_year, after_year**: `bigint` - Long 타입
- **batch_id**: `bigint` - Long 타입
- **uid**: `uuid` - UUID 타입

## Field Mapping

Shapefile field names are limited to 10 characters:

| Database Column | DB Type  | Shapefile Field | Shapefile Type |
|-----------------|----------|-----------------|----------------|
| uid             | uuid     | chnDtctId       | String         |
| map_id          | text     | mpqd_no         | String         |
| probability     | float8   | chn_dtct_p      | Double         |
| before_year     | bigint   | cprs_yr         | Long           |
| after_year      | bigint   | crtr_yr         | Long           |
| before_c        | text     | bf_cls_cd       | String         |
| before_p        | float8   | bf_cls_pro      | Double         |
| after_c         | text     | af_cls_cd       | String         |
| after_p         | float8   | af_cls_pro      | Double         |
| geometry        | geom     | the_geom        | Polygon        |

## Error Handling

### Shapefile Generation
- **No results**: Logs warning and continues to next map_id
- **Mixed geometry types**: Throws exception (shapefiles require homogeneous geometry)
- **Database connection failure**: Application exits with error
- **Invalid geometry**: Logs warning and continues processing

### GeoServer Registration
- **Workspace not found**: Logs error with remediation steps (must be pre-created)
- **Authentication failure**: Logs error prompting to verify environment variables
- **Network timeout**: Logs connection error with timeout details
- **Layer already exists**: Automatically deletes and re-registers if `overwrite-existing: true`
- **Registration failure**: Logs error but does not stop application (non-blocking)

## Validating Output

### Shapefile Validation

Open the generated shapefiles in QGIS or ArcGIS to verify:

1. Geometry displays correctly
2. Attribute table contains all expected fields
3. CRS is EPSG:5186 (Korean 2000 / Central Belt)

### GeoServer Layer Validation

After registering to GeoServer, verify the layer:

1. **GeoServer Admin Console**: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web
2. Navigate to **Layers** → Find your layer (e.g., `cd:inference_35813030`)
3. Preview the layer using **Layer Preview**
4. Verify services are enabled:
   - WMS: `https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms`
   - WFS: `https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wfs`
   - WMTS: `https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wmts`

**Example WMS GetMap Request:**
```
https://kamco.geo-dev.gs.dabeeo.com/geoserver/cd/wms?
  service=WMS&
  version=1.1.0&
  request=GetMap&
  layers=cd:inference_35813030&
  bbox=<bounds>&
  width=768&
  height=768&
  srs=EPSG:5186&
  format=image/png
```

## Development

### Code Formatting

The project uses Google Java Format with 2-space indentation:

```bash
./gradlew spotlessApply
```

### Project Structure

```
src/main/java/com/kamco/makesample/
├── MakeSampleApplication.java                    # Main application class
├── batch/                                         # Spring Batch components (v1.1.0+)
│   ├── config/
│   │   ├── BatchConfiguration.java               # Spring Batch configuration
│   │   └── MergedModeJobConfig.java              # MERGED mode Job definition
│   ├── processor/
│   │   └── FeatureConversionProcessor.java       # InferenceResult → SimpleFeature processor
│   ├── reader/
│   │   ├── GeometryConvertingRowMapper.java      # WKT → JTS converter
│   │   └── InferenceResultItemReaderConfig.java  # Cursor-based DB reader
│   ├── tasklet/
│   │   ├── CreateZipTasklet.java                 # ZIP creation tasklet
│   │   ├── GeoServerRegistrationTasklet.java     # GeoServer registration tasklet
│   │   └── GeometryTypeValidationTasklet.java    # Geometry validation tasklet
│   ├── util/
│   │   └── FeatureTypeFactory.java               # Shared feature type creation
│   └── writer/
│       ├── StreamingGeoJsonWriter.java           # Streaming GeoJSON writer
│       └── StreamingShapefileWriter.java         # Streaming shapefile writer
├── cli/
│   └── ConverterCommandLineRunner.java           # CLI entry point (batch + legacy)
├── config/
│   ├── ConverterProperties.java                  # Shapefile converter configuration
│   ├── GeoServerProperties.java                  # GeoServer configuration
│   ├── GeoServerCredentials.java                 # GeoServer authentication
│   └── RestTemplateConfig.java                   # HTTP client configuration
├── exception/
│   ├── ShapefileConversionException.java
│   ├── GeometryConversionException.java
│   ├── MixedGeometryException.java
│   └── GeoServerRegistrationException.java       # GeoServer registration errors
├── model/
│   └── InferenceResult.java                      # Domain model
├── repository/
│   └── InferenceResultRepository.java            # Data access layer (legacy)
├── service/
│   ├── GeometryConverter.java                    # PostGIS to JTS conversion
│   ├── ShapefileConverterService.java            # Orchestration service (legacy)
│   └── GeoServerRegistrationService.java         # GeoServer REST API integration
└── writer/
    ├── ShapefileWriter.java                       # GeoTools shapefile writer (legacy)
    └── GeoJsonWriter.java                         # GeoJSON export writer (legacy)
```

## Dependencies

- Spring Boot 3.5.7
  - spring-boot-starter
  - spring-boot-starter-jdbc
  - spring-boot-starter-batch (v1.1.0+)
  - spring-boot-starter-web (for RestTemplate)
  - spring-boot-starter-validation (for @NotBlank annotations)
- GeoTools 30.0
  - gt-shapefile
  - gt-referencing
  - gt-epsg-hsql
  - gt-geojson
- PostgreSQL JDBC Driver
- PostGIS JDBC 2.5.1
- JTS (Java Topology Suite) 1.19.0

## Troubleshooting

### GeoServer Registration Issues

**Problem: "Workspace not found: cd"**
```
Solution: Create workspace 'cd' in GeoServer admin console before registration
Steps:
1. Login to GeoServer admin: https://kamco.geo-dev.gs.dabeeo.com/geoserver/web
2. Go to Workspaces → Add new workspace
3. Name: cd, Namespace URI: http://cd
4. Click Save
```

**Problem: "GeoServer credentials not configured"**
```
Solution: Set environment variables before running
export GEOSERVER_USERNAME=admin
export GEOSERVER_PASSWORD=geoserver
```

**Problem: "Layer already exists and overwrite is disabled"**
```
Solution: Enable overwrite in application.yml
geoserver:
  overwrite-existing: true
```

**Problem: Connection timeout to GeoServer**
```
Solution: Increase timeout values in application.yml
geoserver:
  connection-timeout: 60000  # 60 seconds
  read-timeout: 120000       # 120 seconds
```

**Problem: "Registration failed. Layer not found after publication"**
```
Possible causes:
1. GeoServer cannot access shapefile path (check file system permissions)
2. Shapefile is corrupted or invalid
3. Network issue interrupted registration

Solution:
1. Verify GeoServer has read access to shapefile directory
2. Validate shapefile using QGIS or ogr2ogr
3. Check GeoServer logs for detailed error messages
```

### Database Connection Issues

**Problem: "Connection refused to 192.168.2.127:15432"**
```
Solution: Verify PostgreSQL is running and accessible
psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds
```

**Problem: "No results found for map_id"**
```
Solution: Verify data exists in database
SELECT COUNT(*) FROM inference_results_testing
WHERE batch_id IN (252, 253, 257) AND map_id = '35813030';
```

## Batch Execution History

### Overview

Spring Batch mode automatically tracks execution history for each step, recording:
- Start time, end time, duration
- Success/failure status
- Error messages and stack traces (if failed)
- Processing statistics (read/write/commit/rollback/skip counts)

### Table Setup

Create the `batch_execution_history` table before running batch jobs:

```bash
psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds \
  -f src/main/resources/db/migration/V1__create_batch_execution_history.sql
```

### Query Examples

**View execution history for a specific job**:
```sql
SELECT step_name, start_time, end_time, duration_ms, status, read_count, write_count
FROM batch_execution_history
WHERE job_execution_id = 123
ORDER BY start_time;
```

**Check failed steps**:
```sql
SELECT job_execution_id, step_name, start_time, error_message
FROM batch_execution_history
WHERE status = 'FAILED'
ORDER BY start_time DESC
LIMIT 10;
```

**Average step duration**:
```sql
SELECT step_name,
       COUNT(*) as executions,
       ROUND(AVG(duration_ms) / 1000.0, 2) as avg_duration_sec
FROM batch_execution_history
WHERE status = 'COMPLETED'
GROUP BY step_name
ORDER BY avg_duration_sec DESC;
```

For more query examples and detailed documentation, see [BATCH_EXECUTION_HISTORY.md](claudedocs/BATCH_EXECUTION_HISTORY.md).

## License

KAMCO Internal Use Only