Switch to scheduler

2026-03-08 21:33:41 +09:00
parent a6bb589189
commit b156b61caf
49 changed files with 3572 additions and 126 deletions
--- a/shp-exporter/README.md
+++ b/shp-exporter/README.md
@@ -85,7 +85,7 @@ You can override configuration values using command line arguments:
 **Using JAR (zsh shell - quote arguments with brackets):**

 ```bash
-java -jar build/libs/makesample-1.0.0.jar \
+java -jar build/libs/shp-exporter.jar \
  '--converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6' \
  '--converter.map-ids[0]=35813030' \
  '--converter.batch-ids[0]=252' \
@@ -97,7 +97,7 @@ java -jar build/libs/makesample-1.0.0.jar \
 **Using JAR (bash shell - no quotes needed):**

 ```bash
-java -jar build/libs/makesample-1.0.0.jar \
+java -jar build/libs/shp-exporter.jar \
  --converter.inference-id=D5E46F60FC40B1A8BE0CD1F3547AA6 \
  --converter.map-ids[0]=35813030 \
  --converter.batch-ids[0]=252 \
@@ -116,7 +116,32 @@ java -jar build/libs/makesample-1.0.0.jar \

 ## Running

-### Generate Shapefiles
+### Generate Shapefiles (Spring Batch Mode - Recommended)
+
+**New in v1.1.0**: Spring Batch mode provides memory-optimized processing for large datasets.
+
+```bash
+# MERGED mode (creates a single shapefile + GeoJSON covering all batch-ids)
+./gradlew bootRun --args="--batch --converter.batch-ids[0]=252 --converter.batch-ids[1]=253"
+
+# With GeoServer registration
+./gradlew bootRun --args="--batch --geoserver.enabled=true --converter.batch-ids[0]=252"
+```
+
+**Output Files** (in `{output-base-dir}/{inference-id}/merge/`):
+- `{inference-id}.shp` (+ .shx, .dbf, .prj) - Shapefile
+- `{inference-id}.geojson` - GeoJSON file
+- `{inference-id}.zip` - ZIP archive of shapefile
+
+**Benefits**:
+- 90-95% memory reduction (2-13GB → 150-200MB for 1M records)
+- Chunk-based streaming (1000 records per chunk)
+- Restart capability after failures
+- Step-by-step execution support
+
+See [claudedocs/SPRING_BATCH_MIGRATION.md](claudedocs/SPRING_BATCH_MIGRATION.md) for detailed documentation.
+
+### Generate Shapefiles (Legacy Mode)

 ```bash
 ./gradlew bootRun
@@ -125,7 +150,7 @@ java -jar build/libs/makesample-1.0.0.jar \
 Or run the JAR directly:

 ```bash
-java -jar build/libs/makesample-1.0.0.jar
+java -jar build/libs/shp-exporter.jar
 ```

 ### Register Shapefile to GeoServer
@@ -146,7 +171,7 @@ Then register a shapefile:
 Or using the JAR:

 ```bash
-java -jar build/libs/makesample-1.0.0.jar \
+java -jar build/libs/shp-exporter.jar \
  --upload-shp /path/to/shapefile.shp \
  --layer layer_name
 ```
@@ -167,6 +192,7 @@ java -jar build/libs/makesample-1.0.0.jar \

 ## Output

+### Legacy Mode Output
 Shapefiles will be created in directories structured as `output-base-dir/inference-id/map-id/`:

 ```
@@ -177,9 +203,45 @@ Shapefiles will be created in directories structured as `output-base-dir/inferen
 └── 35813030.prj    # Projection information
 ```

+### Spring Batch Mode Output
+Output structure for MERGED mode (`output-base-dir/inference-id/merge/`):
+
+```
+/kamco-nfs/dataset/export/D5E46F60FC40B1A8BE0CD1F3547AA6/merge/
+├── D5E46F60FC40B1A8BE0CD1F3547AA6.shp       # Shapefile geometry
+├── D5E46F60FC40B1A8BE0CD1F3547AA6.shx       # Shape index
+├── D5E46F60FC40B1A8BE0CD1F3547AA6.dbf       # Attribute data
+├── D5E46F60FC40B1A8BE0CD1F3547AA6.prj       # Projection information
+├── D5E46F60FC40B1A8BE0CD1F3547AA6.geojson   # GeoJSON format
+└── D5E46F60FC40B1A8BE0CD1F3547AA6.zip       # ZIP archive (for GeoServer)
+```
+
+**Note**: Only the shapefile (.shp and its related files) is registered with GeoServer. The GeoJSON file is generated for consumers that need an alternative format.
+
 ## Database Query

-The application executes the following query for each map_id:
+### Spring Batch Mode (Recommended)
+
+The Spring Batch mode applies comprehensive validation to ensure data quality:
+
+```sql
+-- ...
+ORDER BY map_id, uid
+```
+
+**Validation Criteria**:
+- **Geometry Type**: Only ST_Polygon and ST_MultiPolygon (excludes Point, LineString, etc.)
+- **Coordinate System**: EPSG:5186 (Korea 2000 / Central Belt 2010)
+- **Coordinate Range**: Korea territory bounds (X: 125 to 530 km, Y: -600 to 988 km)
+- **Geometry Validity**: Valid topology (ST_IsValid)
+
+Rows failing validation are automatically excluded from processing, ensuring clean shapefile generation.
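+
+The criteria above could be expressed as a PostGIS filter along the following lines. This is only a sketch, not the query the application actually runs: the table name `inference_results_testing` and the `batch_id` filter are borrowed from the troubleshooting queries in this README, while the `geometry` column name and the conversion of the kilometre bounds to metres are assumptions.
+
+```sql
+-- Sketch only: "geometry" is an assumed column name.
+SELECT uid, map_id
+FROM inference_results_testing
+WHERE batch_id IN (252, 253)
+  -- Geometry type: polygons and multipolygons only
+  AND ST_GeometryType(geometry) IN ('ST_Polygon', 'ST_MultiPolygon')
+  -- Coordinate system: EPSG:5186
+  AND ST_SRID(geometry) = 5186
+  -- Coordinate range: bounding box inside the stated bounds, in metres
+  AND ST_XMin(geometry) >= 125000 AND ST_XMax(geometry) <= 530000
+  AND ST_YMin(geometry) >= -600000 AND ST_YMax(geometry) <= 988000
+  -- Geometry validity: valid topology
+  AND ST_IsValid(geometry)
+ORDER BY map_id, uid;
+```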
+
+**Performance**: See [PERFORMANCE_OPTIMIZATION.md](claudedocs/PERFORMANCE_OPTIMIZATION.md) for indexing recommendations.
+
+### Legacy Mode
+
+Legacy mode uses a simpler query without validation:

 ```sql
 SELECT uid, map_id, probability, before_year, after_year,
@@ -278,8 +340,26 @@ The project uses Google Java Format with 2-space indentation:
 ```
 src/main/java/com/kamco/makesample/
 ├── MakeSampleApplication.java                    # Main application class
+├── batch/                                         # Spring Batch components (v1.1.0+)
+│   ├── config/
+│   │   ├── BatchConfiguration.java               # Spring Batch configuration
+│   │   └── MergedModeJobConfig.java              # MERGED mode Job definition
+│   ├── processor/
+│   │   └── FeatureConversionProcessor.java       # InferenceResult → SimpleFeature processor
+│   ├── reader/
+│   │   ├── GeometryConvertingRowMapper.java      # WKT → JTS converter
+│   │   └── InferenceResultItemReaderConfig.java  # Cursor-based DB reader
+│   ├── tasklet/
+│   │   ├── CreateZipTasklet.java                 # ZIP creation tasklet
+│   │   ├── GeoServerRegistrationTasklet.java     # GeoServer registration tasklet
+│   │   └── GeometryTypeValidationTasklet.java    # Geometry validation tasklet
+│   ├── util/
+│   │   └── FeatureTypeFactory.java               # Shared feature type creation
+│   └── writer/
+│       ├── StreamingGeoJsonWriter.java           # Streaming GeoJSON writer
+│       └── StreamingShapefileWriter.java         # Streaming shapefile writer
 ├── cli/
-│   └── ConverterCommandLineRunner.java           # CLI entry point
+│   └── ConverterCommandLineRunner.java           # CLI entry point (batch + legacy)
 ├── config/
 │   ├── ConverterProperties.java                  # Shapefile converter configuration
 │   ├── GeoServerProperties.java                  # GeoServer configuration
@@ -293,14 +373,14 @@ src/main/java/com/kamco/makesample/
 ├── model/
 │   └── InferenceResult.java                      # Domain model
 ├── repository/
-│   └── InferenceResultRepository.java            # Data access layer
+│   └── InferenceResultRepository.java            # Data access layer (legacy)
 ├── service/
 │   ├── GeometryConverter.java                    # PostGIS to JTS conversion
-│   ├── ShapefileConverterService.java            # Orchestration service
+│   ├── ShapefileConverterService.java            # Orchestration service (legacy)
 │   └── GeoServerRegistrationService.java         # GeoServer REST API integration
 └── writer/
-    ├── ShapefileWriter.java                       # GeoTools shapefile writer
-    └── GeoJsonWriter.java                         # GeoJSON export writer
+    ├── ShapefileWriter.java                       # GeoTools shapefile writer (legacy)
+    └── GeoJsonWriter.java                         # GeoJSON export writer (legacy)
 ```

 ## Dependencies
@@ -308,6 +388,7 @@ src/main/java/com/kamco/makesample/
 - Spring Boot 3.5.7
  - spring-boot-starter
  - spring-boot-starter-jdbc
+  - spring-boot-starter-batch (v1.1.0+)
  - spring-boot-starter-web (for RestTemplate)
  - spring-boot-starter-validation (for @NotBlank annotations)
 - GeoTools 30.0
@@ -383,6 +464,57 @@ SELECT COUNT(*) FROM inference_results_testing
 WHERE batch_id IN (252, 253, 257) AND map_id = '35813030';
 ```

+## Batch Execution History
+
+### Overview
+
+Spring Batch mode automatically tracks execution history for each step, recording:
+- Start time, end time, duration
+- Success/failure status
+- Error messages and stack traces (if failed)
+- Processing statistics (read/write/commit/rollback/skip counts)
+
+### Table Setup
+
+Create the `batch_execution_history` table before running batch jobs:
+
+```bash
+psql -h 192.168.2.127 -p 15432 -U kamco_cds -d kamco_cds \
+  -f src/main/resources/db/migration/V1__create_batch_execution_history.sql
+```
+
+### Query Examples
+
+**View execution history for a specific job**:
+```sql
+SELECT step_name, start_time, end_time, duration_ms, status, read_count, write_count
+FROM batch_execution_history
+WHERE job_execution_id = 123
+ORDER BY start_time;
+```
+
+**Check failed steps**:
+```sql
+SELECT job_execution_id, step_name, start_time, error_message
+FROM batch_execution_history
+WHERE status = 'FAILED'
+ORDER BY start_time DESC
+LIMIT 10;
+```
+
+**Average step duration**:
+```sql
+SELECT step_name,
+       COUNT(*) as executions,
+       ROUND(AVG(duration_ms) / 1000.0, 2) as avg_duration_sec
+FROM batch_execution_history
+WHERE status = 'COMPLETED'
+GROUP BY step_name
+ORDER BY avg_duration_sec DESC;
+```
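+
+**Failure rate per step** (a sketch, assuming PostgreSQL 9.4+ for the `FILTER` clause and that `status` takes only the `COMPLETED` and `FAILED` values used in the queries above):
+```sql
+-- Share of failed executions for each step, worst first
+SELECT step_name,
+       COUNT(*) AS executions,
+       COUNT(*) FILTER (WHERE status = 'FAILED') AS failures,
+       ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'FAILED') / COUNT(*), 1) AS failure_pct
+FROM batch_execution_history
+GROUP BY step_name
+ORDER BY failure_pct DESC;
+```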
+
+For more query examples and detailed documentation, see [BATCH_EXECUTION_HISTORY.md](claudedocs/BATCH_EXECUTION_HISTORY.md).
+
 ## License

 KAMCO Internal Use Only