Fuse Adapter
Input: image features $\in \mathbb{R}^{B \times D}$, averaged text features $\in \mathbb{R}^{C \times D}$
Feature interaction: build concat | difference | product interaction features → (B·C)×(4D)
MLP: Linear(4D→H) → ReLU → Dropout → Linear(H→1); logits reshaped to a B×C output
Interaction weight: a weight assigned to each interaction type
Class-specific adjustments: per-class offsets that adjust the shared interaction weight for each class
class_weights_base = interaction_weight + class_specific_adjustments
class_weights = torch.softmax(class_weights_base, dim=-1)
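The adapter described above can be sketched roughly as follows. This is a minimal illustration, not the original implementation. Several details are assumptions: the class name `FuseAdapterSketch`, the split of the 4D feature into a 2D concat block plus D-wide difference and product blocks, the `(C, 3)` shape of the class weights, and the choice to scale each interaction block by its softmax weight before the MLP.

```python
import torch
import torch.nn as nn


class FuseAdapterSketch(nn.Module):
    """Illustrative fuse adapter; the weighting scheme is an assumption."""

    def __init__(self, dim: int, hidden: int, num_classes: int, dropout: float = 0.4):
        super().__init__()
        # One shared weight per interaction type: concat, difference, product.
        self.interaction_weight = nn.Parameter(torch.zeros(3))
        # Per-class offsets added on top of the shared interaction weight.
        self.class_specific_adjustments = nn.Parameter(torch.zeros(num_classes, 3))
        self.mlp = nn.Sequential(
            nn.Linear(4 * dim, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1),
        )

    def forward(self, image_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        B, D = image_feats.shape
        C = text_feats.shape[0]
        # Pair every image with every class text feature: (B, C, D) each.
        img = image_feats.unsqueeze(1).expand(B, C, D)
        txt = text_feats.unsqueeze(0).expand(B, C, D)

        class_weights_base = self.interaction_weight + self.class_specific_adjustments
        class_weights = torch.softmax(class_weights_base, dim=-1)  # (C, 3)
        w = class_weights.unsqueeze(0)  # (1, C, 3), broadcast over the batch

        # Assumed use of the weights: scale each interaction block.
        concat = torch.cat([img, txt], dim=-1) * w[..., 0:1]  # (B, C, 2D)
        diff = (img - txt) * w[..., 1:2]                      # (B, C, D)
        prod = (img * txt) * w[..., 2:3]                      # (B, C, D)

        feats = torch.cat([concat, diff, prod], dim=-1)       # (B, C, 4D)
        logits = self.mlp(feats.reshape(B * C, 4 * D))        # (B*C, 1)
        return logits.reshape(B, C)
```

With `B = 5` images, `C = 2` classes and `D = 8`, the forward pass returns a `5 × 2` logit matrix, and each row of `class_weights` sums to 1.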
Data
Basic image preprocessing with pe_transforms (Resize, ToTensor, Normalize, etc.)
RandomResizedCrop and RandomHorizontalFlip applied for augmentation
BCEWithLogitsLoss: loss = loss_fire + loss_falldown
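A minimal sketch of the summed two-class loss. The column order (fire in column 0, falldown in column 1) and the helper name `total_loss` are assumptions made for illustration.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()


def total_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Sum of per-class binary cross-entropy losses.

    logits, targets: (B, 2) tensors. Assumed layout: column 0 is fire,
    column 1 is falldown.
    """
    loss_fire = criterion(logits[:, 0], targets[:, 0])
    loss_falldown = criterion(logits[:, 1], targets[:, 1])
    return loss_fire + loss_falldown
```

Summing the two terms keeps each class's loss averaged over its own column, so the total is twice what a single `BCEWithLogitsLoss` over the whole `(B, 2)` tensor would return.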
interaction_weight, class_specific_adjustments → lr = base_lr × multiplier
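One way to give these two parameters a scaled learning rate is with optimizer parameter groups. The sketch below is an assumption-laden illustration: the optimizer type (`AdamW`), the helper `build_optimizer`, and the stand-in `ToyAdapter` module are hypothetical. The default values are taken from the hyperparameter table in these notes.

```python
import torch
import torch.nn as nn


class ToyAdapter(nn.Module):
    """Hypothetical stand-in that only exposes the relevant parameter names."""

    def __init__(self):
        super().__init__()
        self.interaction_weight = nn.Parameter(torch.zeros(3))
        self.class_specific_adjustments = nn.Parameter(torch.zeros(2, 3))
        self.mlp = nn.Linear(8, 1)


def build_optimizer(model: nn.Module, base_lr: float = 5e-4,
                    multiplier: float = 8.0, weight_decay: float = 1e-3):
    # Parameters whose names end with these strings get base_lr * multiplier.
    fast_names = ("interaction_weight", "class_specific_adjustments")
    fast, slow = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (fast if name.endswith(fast_names) else slow).append(param)
    return torch.optim.AdamW(
        [
            {"params": slow, "lr": base_lr},
            {"params": fast, "lr": base_lr * multiplier},
        ],
        weight_decay=weight_decay,
    )


optimizer = build_optimizer(ToyAdapter())
```

With the defaults, the MLP trains at 0.0005 while the interaction parameters train at 0.004.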
BCEWithLogitsLoss(y, z') + $\lambda\big((s-1)^2 + b^2\big)$
Training time: 10 min 47 s with a 12-epoch warmup lock (on an A5000)
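The regularized loss can be sketched as below. The notes do not define z', s, or b. This sketch assumes z' = s·z + b is an affine recalibration of the raw logit z with a learnable scale s and bias b, so the penalty pulls the calibration toward the identity. The class name `AffineCalibrationSketch`, the function `regularized_loss`, and the default `lam` value are hypothetical. PyTorch's BCE-with-logits call takes the logits first and the targets second.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AffineCalibrationSketch(nn.Module):
    """Assumed form of the calibration: z' = s * z + b."""

    def __init__(self):
        super().__init__()
        self.s = nn.Parameter(torch.ones(1))   # scale, identity at 1
        self.b = nn.Parameter(torch.zeros(1))  # bias, identity at 0

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.s * z + self.b


def regularized_loss(z_prime: torch.Tensor, y: torch.Tensor,
                     s: torch.Tensor, b: torch.Tensor,
                     lam: float = 0.01) -> torch.Tensor:
    # BCE on the calibrated logits plus lambda * ((s - 1)^2 + b^2).
    bce = F.binary_cross_entropy_with_logits(z_prime, y)
    penalty = lam * ((s - 1) ** 2 + b ** 2).sum()
    return bce + penalty
```

At initialization (s = 1, b = 0) the penalty is zero and the loss reduces to plain binary cross-entropy on z.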
Best hyperparameters
lr | interaction_lr_multiplier | weight_decay | dropout | warmup_epochs | beta |
---|---|---|---|---|---|
0.0005 | 8.0 | 0.001 | 0.4 | 12 | 0.5 |
Hugging Face benchmark results
fire F1 | fire Acc | fire Pre | fire Rec | fall F1 | fall Acc | fall Pre | fall Rec |
---|---|---|---|---|---|---|---|
0.7790 | 0.9434 | 0.9046 | 0.6840 | 0.7608 | 0.8707 | 0.7954 | 0.72914 |
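As a quick consistency check, the F1 columns can be recomputed from the precision and recall columns using the standard harmonic-mean formula. The helper name `f1` is only for illustration.

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


fire_f1 = f1(0.9046, 0.6840)
fall_f1 = f1(0.7954, 0.72914)
print(f"fire F1 = {fire_f1:.4f}, fall F1 = {fall_f1:.4f}")
```

Both recomputed values agree with the table to four decimals (0.7790 and 0.7608).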