HF Test

To Do

Design the pipeline so that performance can be verified on the HF benchmark
Train on PIA's in-house dataset so the model can detect Falldown, Fire, and Normal
- Violence is skipped because of benchmark issues

Fuse Adapter
- Input: image features $\in \mathbb{R}^{B \times D}$, averaged text features $\in \mathbb{R}^{C \times D}$
- Feature interaction: build concat | difference | product interaction features → (B·C)×(4D)
- MLP: Linear(4D→H) → ReLU → Dropout → Linear(H→1); logits are reshaped to a B×C output
- Interaction weight: assign a weight to each interaction type
- Class-specific adjustments: adjust the interaction weights per class, starting from the shared interaction weight
```
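# Shared interaction weight plus the class-specific adjustments
class_weights_base = interaction_weight + class_specific_adjustments
# Normalize over the last dimension so the weights sum to 1
class_weights = torch.softmax(class_weights_base, dim=-1)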
```
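The adapter described above can be sketched roughly as follows. This is a minimal sketch, not the actual implementation. It assumes the 4D feature is made of four D-sized chunks (image, text, difference, product), that `interaction_weight` has one entry per chunk, and that the softmaxed per-class weights scale the chunks before the MLP. The class name `FuseAdapter` and these shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class FuseAdapter(nn.Module):
    """Sketch: score every (image, class) pair from weighted interaction features."""

    def __init__(self, dim: int, hidden: int, num_classes: int, dropout: float = 0.4):
        super().__init__()
        # Assumption: four D-sized chunks (image, text, difference, product).
        self.num_interactions = 4
        self.interaction_weight = nn.Parameter(torch.zeros(self.num_interactions))
        self.class_specific_adjustments = nn.Parameter(
            torch.zeros(num_classes, self.num_interactions)
        )
        self.mlp = nn.Sequential(
            nn.Linear(4 * dim, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1),
        )

    def forward(self, image_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        B, D = image_feats.shape
        C = text_feats.shape[0]
        # Pair every image with every class text embedding.
        img = image_feats.unsqueeze(1).expand(B, C, D)
        txt = text_feats.unsqueeze(0).expand(B, C, D)
        # (B, C, 4, D): one D-sized chunk per interaction type.
        chunks = torch.stack([img, txt, img - txt, img * txt], dim=2)
        class_weights_base = self.interaction_weight + self.class_specific_adjustments
        class_weights = torch.softmax(class_weights_base, dim=-1)  # (C, 4)
        # Assumption: the weights scale each chunk before the MLP.
        weighted = chunks * class_weights.view(1, C, 4, 1)
        logits = self.mlp(weighted.reshape(B * C, 4 * D))  # (B*C, 1)
        return logits.view(B, C)
```

With `B = 5`, `C = 3`, `D = 16`, calling `FuseAdapter(dim=16, hidden=8, num_classes=3)` on random features returns a 5×3 logit matrix.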
Data
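- Image (Curation)
  - Uniformly sample 8 frames from each video in the v3_PE_10,000 dataset and save them
  - Select 1,000 images per class that match the benchmark event criteria
  - Label frames just before a fall, squatting, smoke after a fire, and similar cases as normal to build a hard-normal set
  - Problems with the existing dataset
- Prompt
  - Average text embedding (same method as shared by kurnianto)
    - Template
  - Only one prompt per class is used in the final setup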
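The uniform 8-frame sampling step could look like the sketch below. The notes do not say which sampling rule was used, so taking the midpoint of each of 8 equal-length segments is an assumption, and `uniform_frame_indices` is a hypothetical helper name.

```python
def uniform_frame_indices(num_frames: int, num_samples: int = 8) -> list[int]:
    """Return num_samples frame indices spread evenly over a video.

    Assumption: the video is split into equal-length segments and the
    frame nearest the middle of each segment is taken.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    # Clamp to the last valid index; short videos simply repeat frames.
    return [min(num_frames - 1, int(segment * (i + 0.5))) for i in range(num_samples)]
```

For an 80-frame video this picks frames 5, 15, ..., 75. The returned indices can then be passed to whatever video reader is used to decode and save the frames.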

Preprocess
- Common: basic image preprocessing using pe_transforms (Resize, ToTensor, Normalize, etc.)
- Train: apply RandomResizedCrop and RandomHorizontalFlip
Task & Margin Definition
- Event vs Normal structure (2 heads: fire, falldown)
- Margin
  - $m_{\text{fire}} = z_{\text{fire}} - z_{\text{normal}}$
  - $m_{\text{falldown}} = z_{\text{falldown}} - z_{\text{normal}}$
Loss
- Use only the binary samples relevant to each head:
  - Falldown head: uses only samples ∈ {falldown, normal}
  - Fire head: uses only samples ∈ {fire, normal}
- Input: margin → BCEWithLogitsLoss
- Final: loss = loss_fire + loss_falldown
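A minimal sketch of the margin-based two-head loss, assuming class logits of shape (B, 3) and integer labels. The class index order (`NORMAL`, `FIRE`, `FALLDOWN`) and the function name are illustrative assumptions, not taken from the actual code.

```python
import torch
import torch.nn.functional as F

# Hypothetical class index order; the notes do not specify it.
NORMAL, FIRE, FALLDOWN = 0, 1, 2


def two_head_margin_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """logits: (B, 3) class logits z; labels: (B,) class indices."""
    total = logits.new_zeros(())
    for event in (FIRE, FALLDOWN):
        # Each head only sees samples labeled {event, normal}.
        mask = (labels == event) | (labels == NORMAL)
        if not mask.any():
            continue  # no usable sample for this head in the batch
        selected = logits[mask]
        # m_event = z_event - z_normal
        margin = selected[:, event] - selected[:, NORMAL]
        target = (labels[mask] == event).float()
        total = total + F.binary_cross_entropy_with_logits(margin, target)
    # loss = loss_fire + loss_falldown
    return total
```

A fire sample never contributes to the falldown head and vice versa, so each head is trained as a pure event-vs-normal binary classifier.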
Optimizer
- AdamW
  - Group 1: regular parameters
  - Group 2: interaction_weight, class_specific_adjustments → lr = base_lr × multiplier
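The two parameter groups can be set up with the per-group options of `torch.optim.AdamW`, as in the sketch below. Matching parameters by name suffix and applying the same weight decay to both groups are assumptions; the default values are the ones reported as best in these notes.

```python
import torch
import torch.nn as nn


def build_optimizer(model: nn.Module, base_lr: float = 5e-4,
                    interaction_lr_multiplier: float = 8.0,
                    weight_decay: float = 1e-3) -> torch.optim.AdamW:
    """Sketch: AdamW with a higher learning rate for the interaction parameters."""
    special_names = ("interaction_weight", "class_specific_adjustments")
    regular, special = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Assumption: the special parameters are identified by name suffix.
        (special if name.endswith(special_names) else regular).append(param)
    groups = []
    if regular:
        groups.append({"params": regular, "lr": base_lr})
    if special:
        groups.append({"params": special, "lr": base_lr * interaction_lr_multiplier})
    return torch.optim.AdamW(groups, lr=base_lr, weight_decay=weight_decay)
```

With the defaults above, Group 2 trains at 0.0005 × 8.0 = 0.004.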
Calibration & Thresholding
- Warmup phase
  - Calibration (each head)
    - $z' = s \cdot m + b$
    - Objective: BCEWithLogitsLoss(z', y) + $\lambda\big((s-1)^2 + b^2\big)$
    - Intent: correct the margin scale/offset (mitigates cases where margin > 0 even though the sample is normal)
  - Threshold Tuning
    - Based on the calibrated score
    - Search using $F_\beta$ as the criterion (default $\beta = 0.5$, which weights precision more heavily)
    - Run bootstrap → collect the best threshold from each bootstrap sample → select the 0.7-quantile
- After warmup
  - Freeze the calibration/threshold and train only the model parameters for a few more epochs
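The per-head calibration and threshold tuning can be illustrated with the dependency-free sketch below. It is not the actual training code: plain gradient descent, the candidate thresholds (every observed score), the number of bootstrap rounds, and the simple order-statistic quantile are all assumptions, and the function names are hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_calibration(margins, labels, lam=0.1, lr=0.1, steps=500):
    """Fit z' = s*m + b on mean BCE-with-logits + lam*((s-1)^2 + b^2)."""
    s, b = 1.0, 0.0
    n = len(margins)
    for _ in range(steps):
        grad_s = 2.0 * lam * (s - 1.0)
        grad_b = 2.0 * lam * b
        for m, y in zip(margins, labels):
            p = _sigmoid(s * m + b)
            grad_s += (p - y) * m / n
            grad_b += (p - y) / n
        s -= lr * grad_s
        b -= lr * grad_b
    return s, b


def f_beta(scores, labels, threshold, beta=0.5):
    tp = sum(1 for z, y in zip(scores, labels) if z >= threshold and y == 1)
    fp = sum(1 for z, y in zip(scores, labels) if z >= threshold and y == 0)
    fn = sum(1 for z, y in zip(scores, labels) if z < threshold and y == 1)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0


def best_threshold(scores, labels, beta=0.5):
    """Try each observed score as a threshold; keep the best F_beta."""
    return max(sorted(set(scores)), key=lambda t: f_beta(scores, labels, t, beta))


def bootstrap_threshold(scores, labels, beta=0.5, n_boot=200, q=0.7, seed=0):
    """Best threshold per bootstrap resample, then the q-quantile of those."""
    rng = random.Random(seed)
    n = len(scores)
    picks = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        picks.append(best_threshold([scores[i] for i in idx],
                                    [labels[i] for i in idx], beta))
    picks.sort()
    # Simple order-statistic approximation of the q-quantile.
    return picks[min(len(picks) - 1, int(q * len(picks)))]
```

In use, the margins of one head would first go through `fit_calibration`, the calibrated scores `s * m + b` would be passed to `bootstrap_threshold`, and the resulting `(s, b, threshold)` would be frozen once warmup ends. Picking a quantile above the median leans toward a higher, more conservative threshold, which fits the precision-weighted $\beta = 0.5$.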
Selection
- Compute precision, recall, and F1 for each head, then select the best parameters
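The per-head metrics can be computed as in this short sketch. The helper names are hypothetical, and the rule for ranking configurations is not stated in these notes, so only the metric computation is shown.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(preds, labels):
    """preds, labels: sequences of 0/1 for one head (1 = event, 0 = normal)."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)
```

As a consistency check, `f1_from(0.9046, 0.6840)` gives about 0.7790 and `f1_from(0.7954, 0.72914)` gives about 0.7608, matching the fire and falldown F1 scores reported in the benchmark results.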

Training time: 10 min 47 s with a 12-epoch warmup lock (on an A5000)
Best parameters

| lr | interaction_lr_multiplier | weight_decay | dropout | warmup_epochs | beta |
| --- | --- | --- | --- | --- | --- |
| 0.0005 | 8.0 | 0.001 | 0.4 | 12 | 0.5 |
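Hugging Face benchmark results

| fire F1 | fire Accuracy | fire Precision | fire Recall | falldown F1 | falldown Accuracy | falldown Precision | falldown Recall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.7790 | 0.9434 | 0.9046 | 0.6840 | 0.7608 | 0.8707 | 0.7954 | 0.72914 |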
