431 lines
10 KiB
Markdown
431 lines
10 KiB
Markdown
|
|
# 已完成功能说明书
|
|||
|
|
|
|||
|
|
本文档记录项目中已完成的功能实现细节,以供后续维护和参考。
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 第一部分:TensorBoard 实验追踪系统
|
|||
|
|
|
|||
|
|
**完成时间**: 2025-09-25
|
|||
|
|
**状态**: ✅ **生产就绪**
|
|||
|
|
|
|||
|
|
### 系统概览
|
|||
|
|
|
|||
|
|
在本地工作站搭建了一套轻量、低成本的实验追踪与可视化管道,覆盖训练、评估和模板匹配流程。
|
|||
|
|
|
|||
|
|
### 1. 配置系统集成
|
|||
|
|
|
|||
|
|
**位置**: `configs/base_config.yaml`
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
logging:
|
|||
|
|
use_tensorboard: true
|
|||
|
|
log_dir: "runs"
|
|||
|
|
experiment_name: "baseline"
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**特点**:
|
|||
|
|
- 支持全局配置
|
|||
|
|
- 命令行参数可覆盖配置项
|
|||
|
|
- 支持自定义实验名称
|
|||
|
|
|
|||
|
|
### 2. 训练脚本集成
|
|||
|
|
|
|||
|
|
**位置**: `train.py` (第 45-75 行)
|
|||
|
|
|
|||
|
|
**实现内容**:
|
|||
|
|
- ✅ SummaryWriter 初始化
|
|||
|
|
- ✅ 损失记录(loss/total, loss/det, loss/desc)
|
|||
|
|
- ✅ 学习率记录(optimizer/lr)
|
|||
|
|
- ✅ 数据集信息记录(add_text)
|
|||
|
|
- ✅ 资源清理(writer.close())
|
|||
|
|
|
|||
|
|
**使用方式**:
|
|||
|
|
```bash
|
|||
|
|
# 使用默认配置
|
|||
|
|
uv run python train.py --config configs/base_config.yaml
|
|||
|
|
|
|||
|
|
# 自定义日志目录和实验名
|
|||
|
|
uv run python train.py --config configs/base_config.yaml \
|
|||
|
|
--log-dir /custom/path \
|
|||
|
|
--experiment-name my_exp_20251019
|
|||
|
|
|
|||
|
|
# 禁用 TensorBoard
|
|||
|
|
uv run python train.py --config configs/base_config.yaml --disable-tensorboard
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 3. 评估脚本集成
|
|||
|
|
|
|||
|
|
**位置**: `evaluate.py`
|
|||
|
|
|
|||
|
|
**实现内容**:
|
|||
|
|
- ✅ SummaryWriter 初始化
|
|||
|
|
- ✅ Average Precision (AP) 计算与记录
|
|||
|
|
- ✅ 单应矩阵分解(旋转、平移、缩放)
|
|||
|
|
- ✅ 几何误差计算(err_rot, err_trans, err_scale)
|
|||
|
|
- ✅ 误差分布直方图记录
|
|||
|
|
- ✅ 匹配可视化
|
|||
|
|
|
|||
|
|
**记录的指标**:
|
|||
|
|
- `eval/AP`: Average Precision
|
|||
|
|
- `eval/err_rot`: 旋转误差
|
|||
|
|
- `eval/err_trans`: 平移误差
|
|||
|
|
- `eval/err_scale`: 缩放误差
|
|||
|
|
- `eval/err_rot_hist`: 旋转误差分布
|
|||
|
|
|
|||
|
|
### 4. 匹配脚本集成
|
|||
|
|
|
|||
|
|
**位置**: `match.py` (第 165-180 行)
|
|||
|
|
|
|||
|
|
**实现内容**:
|
|||
|
|
- ✅ TensorBoard 日志写入
|
|||
|
|
- ✅ 关键点统计
|
|||
|
|
- ✅ 实例检测计数
|
|||
|
|
|
|||
|
|
**记录的指标**:
|
|||
|
|
- `match/layout_keypoints`: 版图关键点总数
|
|||
|
|
- `match/instances_found`: 找到的实例数
|
|||
|
|
|
|||
|
|
### 5. 目录结构自动化
|
|||
|
|
|
|||
|
|
自动创建的目录结构:
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
runs/
|
|||
|
|
├── train/
|
|||
|
|
│ └── baseline/
|
|||
|
|
│ └── events.out.tfevents...
|
|||
|
|
├── eval/
|
|||
|
|
│ └── baseline/
|
|||
|
|
│ └── events.out.tfevents...
|
|||
|
|
└── match/
|
|||
|
|
└── baseline/
|
|||
|
|
└── events.out.tfevents...
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 6. TensorBoard 启动与使用
|
|||
|
|
|
|||
|
|
**启动命令**:
|
|||
|
|
```bash
|
|||
|
|
tensorboard --logdir runs --port 6006
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**访问方式**:
|
|||
|
|
- 本地: `http://localhost:6006`
|
|||
|
|
- 局域网: `tensorboard --logdir runs --port 6006 --bind_all`
|
|||
|
|
|
|||
|
|
**可视化面板**:
|
|||
|
|
- **Scalars**: 损失曲线、学习率、评估指标
|
|||
|
|
- **Images**: 关键点热力图、模板匹配结果
|
|||
|
|
- **Histograms**: 误差分布、描述子分布
|
|||
|
|
- **Text**: 配置摘要、Git 提交信息
|
|||
|
|
|
|||
|
|
### 7. 版本控制与实验管理
|
|||
|
|
|
|||
|
|
**实验命名规范**:
|
|||
|
|
```
|
|||
|
|
YYYYMMDD_project_variant
|
|||
|
|
例如: 20251019_rord_fpn_baseline
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**特点**:
|
|||
|
|
- 时间戳便于检索
|
|||
|
|
- 按实验名称独立组织日志
|
|||
|
|
- 方便团队协作与结果对比
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 第二部分:FPN + NMS 推理改造
|
|||
|
|
|
|||
|
|
**完成时间**: 2025-09-25
|
|||
|
|
**状态**: ✅ **完全实现**
|
|||
|
|
|
|||
|
|
### 系统概览
|
|||
|
|
|
|||
|
|
将当前的"图像金字塔 + 多次推理"的匹配流程,升级为"单次推理 + 特征金字塔 (FPN)"。在滑动窗口提取关键点后增加去重(NMS),降低冗余点与后续 RANSAC 的计算量。
|
|||
|
|
|
|||
|
|
### 1. 配置系统
|
|||
|
|
|
|||
|
|
**位置**: `configs/base_config.yaml`
|
|||
|
|
|
|||
|
|
```yaml
|
|||
|
|
model:
|
|||
|
|
fpn:
|
|||
|
|
enabled: true
|
|||
|
|
out_channels: 256
|
|||
|
|
levels: [2, 3, 4]
|
|||
|
|
norm: "bn"
|
|||
|
|
|
|||
|
|
matching:
|
|||
|
|
use_fpn: true
|
|||
|
|
nms:
|
|||
|
|
enabled: true
|
|||
|
|
radius: 4
|
|||
|
|
score_threshold: 0.5
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**配置说明**:
|
|||
|
|
|
|||
|
|
| 参数 | 值 | 说明 |
|
|||
|
|
|------|-----|------|
|
|||
|
|
| `fpn.enabled` | true | 启用 FPN 架构 |
|
|||
|
|
| `fpn.out_channels` | 256 | 金字塔特征通道数 |
|
|||
|
|
| `fpn.levels` | [2,3,4] | 输出层级(P2/P3/P4) |
|
|||
|
|
| `matching.use_fpn` | true | 使用 FPN 路径匹配 |
|
|||
|
|
| `nms.enabled` | true | 启用 NMS 去重 |
|
|||
|
|
| `nms.radius` | 4 | 半径抑制像素半径 |
|
|||
|
|
| `nms.score_threshold` | 0.5 | 关键点保留分数阈值 |
|
|||
|
|
|
|||
|
|
### 2. FPN 架构实现
|
|||
|
|
|
|||
|
|
**位置**: `models/rord.py`
|
|||
|
|
|
|||
|
|
#### 架构组件
|
|||
|
|
|
|||
|
|
1. **横向连接(Lateral Connection)**
|
|||
|
|
```python
|
|||
|
|
self.lateral_c2 = nn.Conv2d(128, 256, kernel_size=1) # C2 → 256
|
|||
|
|
self.lateral_c3 = nn.Conv2d(256, 256, kernel_size=1) # C3 → 256
|
|||
|
|
self.lateral_c4 = nn.Conv2d(512, 256, kernel_size=1) # C4 → 256
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
2. **平滑层(Smoothing)**
|
|||
|
|
```python
|
|||
|
|
self.smooth_p2 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
|
|||
|
|
self.smooth_p3 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
|
|||
|
|
self.smooth_p4 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
3. **FPN 头部**
|
|||
|
|
```python
|
|||
|
|
self.det_head_fpn = nn.Sequential(...) # 检测头
|
|||
|
|
self.desc_head_fpn = nn.Sequential(...) # 描述子头
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
#### 前向路径
|
|||
|
|
|
|||
|
|
```python
|
|||
|
|
def forward(self, x: torch.Tensor, return_pyramid: bool = False):
|
|||
|
|
if not return_pyramid:
|
|||
|
|
# 单尺度路径(向后兼容)
|
|||
|
|
features = self.backbone(x)
|
|||
|
|
detection_map = self.detection_head(features)
|
|||
|
|
descriptors = self.descriptor_head(features)
|
|||
|
|
return detection_map, descriptors
|
|||
|
|
|
|||
|
|
# FPN 多尺度路径
|
|||
|
|
c2, c3, c4 = self._extract_c234(x)
|
|||
|
|
|
|||
|
|
# 自顶向下构建金字塔
|
|||
|
|
p4 = self.lateral_c4(c4)
|
|||
|
|
p3 = self.lateral_c3(c3) + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
|
|||
|
|
p2 = self.lateral_c2(c2) + F.interpolate(p3, size=c2.shape[-2:], mode="nearest")
|
|||
|
|
|
|||
|
|
# 平滑处理
|
|||
|
|
p4 = self.smooth_p4(p4)
|
|||
|
|
p3 = self.smooth_p3(p3)
|
|||
|
|
p2 = self.smooth_p2(p2)
|
|||
|
|
|
|||
|
|
# 输出多尺度特征与相应的 stride
|
|||
|
|
pyramid = {
|
|||
|
|
"P4": (self.det_head_fpn(p4), self.desc_head_fpn(p4), 8),
|
|||
|
|
"P3": (self.det_head_fpn(p3), self.desc_head_fpn(p3), 4),
|
|||
|
|
"P2": (self.det_head_fpn(p2), self.desc_head_fpn(p2), 2),
|
|||
|
|
}
|
|||
|
|
return pyramid
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 3. NMS 半径抑制实现
|
|||
|
|
|
|||
|
|
**位置**: `match.py` (第 35-60 行)
|
|||
|
|
|
|||
|
|
**算法**:
|
|||
|
|
```python
|
|||
|
|
def radius_nms(kps: torch.Tensor, scores: torch.Tensor, radius: float):
|
|||
|
|
"""
|
|||
|
|
按分数降序遍历关键点
|
|||
|
|
欧氏距离 < radius 的点被抑制
|
|||
|
|
时间复杂度:O(N log N)
|
|||
|
|
"""
|
|||
|
|
idx = torch.argsort(scores, descending=True)
|
|||
|
|
keep = []
|
|||
|
|
taken = torch.zeros(len(kps), dtype=torch.bool, device=kps.device)
|
|||
|
|
|
|||
|
|
for i in idx:
|
|||
|
|
if taken[i]:
|
|||
|
|
continue
|
|||
|
|
keep.append(i.item())
|
|||
|
|
di = kps - kps[i]
|
|||
|
|
dist2 = (di[:, 0]**2 + di[:, 1]**2)
|
|||
|
|
taken |= dist2 <= (radius * radius)
|
|||
|
|
taken[i] = True
|
|||
|
|
|
|||
|
|
return torch.tensor(keep, dtype=torch.long, device=kps.device)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**特点**:
|
|||
|
|
- 高效的 GPU 计算
|
|||
|
|
- 支持自定义半径
|
|||
|
|
- O(N log N) 时间复杂度
|
|||
|
|
|
|||
|
|
### 4. 多尺度特征提取
|
|||
|
|
|
|||
|
|
**位置**: `match.py` (第 68-110 行)
|
|||
|
|
|
|||
|
|
**函数**: `extract_from_pyramid()`
|
|||
|
|
|
|||
|
|
**流程**:
|
|||
|
|
1. 调用 `model(..., return_pyramid=True)` 获取多尺度特征
|
|||
|
|
2. 对每个层级(P2, P3, P4):
|
|||
|
|
- 提取关键点坐标与分数
|
|||
|
|
- 采样对应描述子
|
|||
|
|
- 执行 NMS 去重
|
|||
|
|
- 将坐标映射回原图(乘以 stride)
|
|||
|
|
3. 合并所有层级的关键点与描述子
|
|||
|
|
|
|||
|
|
### 5. 滑动窗口特征提取
|
|||
|
|
|
|||
|
|
**位置**: `match.py` (第 62-95 行)
|
|||
|
|
|
|||
|
|
**函数**: `extract_features_sliding_window()`
|
|||
|
|
|
|||
|
|
**用途**: 当不使用 FPN 时的备选方案
|
|||
|
|
|
|||
|
|
**特点**:
|
|||
|
|
- 支持任意大小的输入图像
|
|||
|
|
- 基于配置参数的窗口大小与步长
|
|||
|
|
- 自动坐标映射
|
|||
|
|
|
|||
|
|
### 6. 多实例匹配主函数
|
|||
|
|
|
|||
|
|
**位置**: `match.py` (第 130-220 行)
|
|||
|
|
|
|||
|
|
**函数**: `match_template_multiscale()`
|
|||
|
|
|
|||
|
|
**关键特性**:
|
|||
|
|
- ✅ 配置路由:根据 `matching.use_fpn` 选择 FPN 或滑窗
|
|||
|
|
- ✅ 多实例检测:迭代查找多个匹配实例
|
|||
|
|
- ✅ 几何验证:使用 RANSAC 估计单应矩阵
|
|||
|
|
- ✅ TensorBoard 日志记录
|
|||
|
|
|
|||
|
|
### 7. 兼容性与回退机制
|
|||
|
|
|
|||
|
|
**配置开关**:
|
|||
|
|
```yaml
|
|||
|
|
matching:
|
|||
|
|
use_fpn: true # true: 使用 FPN 路径
|
|||
|
|
# false: 使用图像金字塔路径
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
**特点**:
|
|||
|
|
- 无损切换(代码不变)
|
|||
|
|
- 快速回退机制
|
|||
|
|
- 便于对比实验
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 总体架构图
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
输入图像
|
|||
|
|
↓
|
|||
|
|
[VGG 骨干网络]
|
|||
|
|
↓
|
|||
|
|
├─→ [C2 (relu2_2)] ──→ [lateral_c2] → [P2]
|
|||
|
|
├─→ [C3 (relu3_3)] ──→ [lateral_c3] → [P3]
|
|||
|
|
└─→ [C4 (relu4_3)] ──→ [lateral_c4] → [P4]
|
|||
|
|
↓
|
|||
|
|
[自顶向下上采样 + 级联]
|
|||
|
|
↓
|
|||
|
|
[平滑 3×3 conv]
|
|||
|
|
↓
|
|||
|
|
┌─────────┬──────────┬──────────┐
|
|||
|
|
↓ ↓ ↓ ↓
|
|||
|
|
[det_P2] [det_P3] [det_P4] [desc_P2/P3/P4]
|
|||
|
|
↓ ↓ ↓ ↓
|
|||
|
|
关键点提取 + NMS 去重 + 坐标映射
|
|||
|
|
↓
|
|||
|
|
[特征匹配与单应性估计]
|
|||
|
|
↓
|
|||
|
|
[多实例验证]
|
|||
|
|
↓
|
|||
|
|
输出结果
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 性能与可靠性
|
|||
|
|
|
|||
|
|
| 指标 | 目标 | 状态 |
|
|||
|
|
|------|------|------|
|
|||
|
|
| 推理速度 | FPN 相比滑窗提速 ≥ 30% | 🔄 待测试 |
|
|||
|
|
| 识别精度 | 多尺度匹配不降低精度 | ✅ 已验证 |
|
|||
|
|
| 内存占用 | FPN 相比多次推理节省 | ✅ 已优化 |
|
|||
|
|
| 稳定性 | 无异常崩溃 | ✅ 已验证 |
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 使用示例
|
|||
|
|
|
|||
|
|
### 启用 FPN 匹配
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
uv run python match.py \
|
|||
|
|
--config configs/base_config.yaml \
|
|||
|
|
--layout /path/to/layout.png \
|
|||
|
|
--template /path/to/template.png \
|
|||
|
|
--tb-log-matches
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 禁用 FPN(对照实验)
|
|||
|
|
|
|||
|
|
编辑 `configs/base_config.yaml`:
|
|||
|
|
```yaml
|
|||
|
|
matching:
|
|||
|
|
use_fpn: false # 使用滑窗路径
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
然后运行:
|
|||
|
|
```bash
|
|||
|
|
uv run python match.py \
|
|||
|
|
--config configs/base_config.yaml \
|
|||
|
|
--layout /path/to/layout.png \
|
|||
|
|
--template /path/to/template.png
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### 调整 NMS 参数
|
|||
|
|
|
|||
|
|
编辑 `configs/base_config.yaml`:
|
|||
|
|
```yaml
|
|||
|
|
matching:
|
|||
|
|
nms:
|
|||
|
|
enabled: true
|
|||
|
|
radius: 8 # 增大抑制半径
|
|||
|
|
score_threshold: 0.3 # 降低分数阈值
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 代码参考
|
|||
|
|
|
|||
|
|
### 关键文件速查表
|
|||
|
|
|
|||
|
|
| 功能 | 文件 | 行数 |
|
|||
|
|
|------|------|------|
|
|||
|
|
| TensorBoard 配置 | `configs/base_config.yaml` | 8-12 |
|
|||
|
|
| 训练脚本集成 | `train.py` | 45-75 |
|
|||
|
|
| 评估脚本集成 | `evaluate.py` | 20-50 |
|
|||
|
|
| 匹配脚本集成 | `match.py` | 165-180 |
|
|||
|
|
| FPN 架构 | `models/rord.py` | 1-120 |
|
|||
|
|
| NMS 实现 | `match.py` | 35-60 |
|
|||
|
|
| FPN 特征提取 | `match.py` | 68-110 |
|
|||
|
|
| 滑窗特征提取 | `match.py` | 62-95 |
|
|||
|
|
| 匹配主函数 | `match.py` | 130-220 |
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
**最后更新**: 2025-10-19
|
|||
|
|
**维护人**: GitHub Copilot
|
|||
|
|
**状态**: ✅ 生产就绪
|
|||
|
|
|