Dataset: ibrahimhamamci/CT-RATE
[MICCAI' 25] From Slices to Volumes: Multi-Scale Fusion of 2D and 3D Features for CT Scan Report Generation
| Model | BLEU-1 | BLEU-4 | ROUGE-L | METEOR | BERTScore F1 | LLaMA Score |
|---|---|---|---|---|---|---|
| CT2Rep | 0.309 | 0.172 | 0.243 | 0.173 | 0.865 | 6.35 |
| CT-Chat | 0.395 | - | 0.321 | 0.219 | - | 5.664 |
| Our Baseline (SAMF) | 0.423 | 0.203 | 0.338 | 0.356 | 0.879 | 6.792 |
| SAMF + Ao2D | 0.440 | 0.261 | 0.417 | 0.417 | 0.889 | 7.165 |
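The BLEU-1 column above measures unigram overlap between generated and reference reports. As a rough illustration of what that score captures (a minimal sketch, not our actual evaluation pipeline, which follows the standard tooling):

```python
from collections import Counter
import math

def bleu1(candidate: str, reference: str) -> float:
    """Unigram precision with brevity penalty (single reference, no smoothing)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clip each candidate unigram count by its count in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = clipped / len(cand)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision
```

An exact match scores 1.0; shorter or less overlapping candidates score proportionally lower.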
We introduce Slice Attentive Multimodal Fusion (SAMF), a framework that combines the rich, high-resolution information of 2D slices with the spatial coherence of 3D volumetric data. Experimental results demonstrate that our method outperforms existing baselines on both report generation and multiple-choice question answering, highlighting the critical role of multidimensional feature integration.
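To make the fusion idea concrete, here is a minimal numpy sketch of attention-weighted slice pooling fused with a volume-level feature. All names (`slice_attentive_fusion`, the learned score vector `w`) and the concatenation-based fusion are illustrative assumptions, not the exact SAMF architecture:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def slice_attentive_fusion(slice_feats, volume_feat, w):
    """Pool per-slice 2D features with attention, then fuse with the 3D volume feature.

    slice_feats: (num_slices, d) 2D slice embeddings
    volume_feat: (d,) 3D volumetric embedding
    w:           (d,) hypothetical learned scoring vector
    """
    scores = slice_feats @ w            # one attention score per slice
    weights = softmax(scores)           # attention distribution over slices
    pooled = weights @ slice_feats      # (d,) attention-weighted 2D summary
    fused = np.concatenate([pooled, volume_feat])  # (2d,) multimodal feature
    return fused, weights

rng = np.random.default_rng(0)
num_slices, d = 5, 8
fused, weights = slice_attentive_fusion(
    rng.normal(size=(num_slices, d)), rng.normal(size=d), rng.normal(size=d)
)
```

The fused vector would then feed the report-generation decoder; in practice the scoring and fusion layers are learned end to end.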
To evaluate this model, please refer to our GitHub repository (serag-ai/SAMF), which provides detailed usage instructions.
Base model
microsoft/Phi-3-mini-4k-instruct