Cross-modality Referring Segmentation for Abdominal MRI
The dataset, code, and trained model weights will be publicly available after anonymous peer review.
Task definition of referring segmentation in multi-modality MR images.
CrossMR Architecture. The scribble prompt and the reference image are used to produce image features corresponding to the prompted region, which form visual tokens. Learnable object queries and visual tokens cross-attend to the target image features and self-attend among themselves. The object queries are then linearly mapped to mask embeddings and class embeddings, which produce the segmentation mask and the text label, respectively.
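To make the information flow described in the caption concrete, below is a minimal PyTorch sketch of the decoding scheme: visual tokens pooled from the prompted region of the reference image are concatenated with learnable object queries, cross-attend to the target image features, self-attend among themselves, and are finally mapped by linear heads to mask and class embeddings. All module names, dimensions, the choice of masked average pooling for the visual tokens, and the single attention round are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the referring-decoder idea, assuming precomputed per-pixel
# features for the reference and target images. Not the authors' code.
import torch
import torch.nn as nn


class ReferringDecoderSketch(nn.Module):
    def __init__(self, dim=256, num_queries=16, num_classes=5, num_heads=8):
        super().__init__()
        # Learnable object queries (count is an assumption).
        self.object_queries = nn.Parameter(torch.randn(num_queries, dim))
        # Cross-attention: queries + visual tokens attend to target features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Self-attention within the concatenated query / visual-token set.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Linear heads mapping object queries to mask and class embeddings.
        self.mask_head = nn.Linear(dim, dim)
        self.class_head = nn.Linear(dim, num_classes)

    def forward(self, ref_feats, scribble_mask, tgt_feats):
        """
        ref_feats:     (B, HW, C) reference-image features
        scribble_mask: (B, HW)    binary scribble prompt on the reference image
        tgt_feats:     (B, HW, C) target-image (per-pixel) features
        """
        B = ref_feats.shape[0]
        # Visual tokens: masked average pooling of reference features over the
        # prompted region (one possible way to extract prompted-region features).
        w = scribble_mask.unsqueeze(-1).float()
        visual_tokens = (ref_feats * w).sum(1, keepdim=True) / \
            w.sum(1, keepdim=True).clamp(min=1e-6)

        # Concatenate learnable object queries with the visual tokens.
        queries = self.object_queries.unsqueeze(0).expand(B, -1, -1)
        tokens = torch.cat([queries, visual_tokens], dim=1)

        # Cross-attend to target image features, then self-attend.
        tokens = self.norm1(tokens + self.cross_attn(tokens, tgt_feats, tgt_feats)[0])
        tokens = self.norm2(tokens + self.self_attn(tokens, tokens, tokens)[0])

        # Object queries -> mask embeddings and class logits.
        obj = tokens[:, : self.object_queries.shape[0]]
        mask_emb = self.mask_head(obj)          # (B, Q, C)
        class_logits = self.class_head(obj)     # (B, Q, num_classes)
        # Segmentation logits: dot product of mask embeddings with pixel features.
        mask_logits = torch.einsum("bqc,bpc->bqp", mask_emb, tgt_feats)
        return mask_logits, class_logits
```

In this sketch the segmentation mask is decoded as a dot product between mask embeddings and per-pixel target features, and the class logits give the text label; both follow the caption's description of the two linear heads, while everything upstream of them is a simplified placeholder.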
Table 1. Quantitative segmentation results. Organ-wise Dice similarity coefficient (DSC) scores of the baseline model and the proposed method across one in-distribution modality (T1w) and four out-of-distribution modalities (T2w, DWI, in-phase, and opposed-phase).