Table 1: An overview of the existing image-report chest X-ray datasets. Values above 10,000 are rounded and abbreviated with K, which stands for thousand (e.g., 10K for 10,000). Abbreviations: RP: Report Parsing; RIR: Radiologist Interpretation of Reports; RI: Radiologist Interpretation of Chest Radiographs; RCI: Radiologist Cohort Agreement for Chest Radiographs; LT: Laboratory Tests; BB: Bounding Box; CL: Classification; R: Report; PA: Posteroanterior; AP: Anteroposterior; L: Lateral; (/): not declared.
Dataset | Number | Number | Number | Type | Labels | PA | AP | L | Format | Method | Language | Year |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Shenzhen Hospital | / | / | 340 | CL | 2 | 340 | / | / | DICOM | RI | English | 2014 |
IU X-ray | 3.9K | 3.9K | 7.4K | R | / | 3.9K | / | 3.9K | JPEG | RI | English | 2016 |
Chest X-ray 14 | 31K | / | 112K | CL/BB | 14 | 67K | 45K | / | PNG | RP/RI | English | 2017 |
CX-CHR | 35K | 35K | 45K | CL | 155 | / | / | / | JPEG | RP | Chinese | 2018 |
CheXpert | 65K | 188K | 224K | CL | 14 | 29K | 162K | 32K | JPEG | RCI/RP | English | 2019 |
MIMIC-CXR | 65K | 224K | 372K | CL/R | 14 | 130K | 162K | 122K | DICOM | RP | English | 2019 |
PadChest | 67K | 110K | 160K | CL/R | 193 | 96K | 20K | 51K | DICOM | RIR/RP | Spanish | 2019 |
VinDr-CXR | / | / | 100K | CL/BB | 28 | 18K | / | / | DICOM | RI | English | 2020 |
BIMCV | 1.3K | 2.3K | 3.2K | CL | 2 | 1.1K | 1.3K | 815 | DICOM | LT | English | 2020 |
CXR14 Rad-Labels | 1.7K | / | 4.3K | CL | 4 | 3.2K | 1.1K | / | JPEG | RCI | English | 2020 |
CASIA-CXR (ours) | 11.1K | 11.1K | 11.1K | CL/R | 24 | 7.7K | 3.3K | / | JPEG | RIR/RP | French | 2024 |
Table 2: Disease Classification Performance.
Labels | Disease | Accuracy | Precision | Recall | F1-Score | Support |
---|---|---|---|---|---|---|
Global Labels (1-5) | Cardiomegaly (1) | 0.830 | 0.937 | 0.825 | 0.877 | 3,756 |
Global Labels (1-5) | Pneumothorax (2) | 0.823 | 0.935 | 0.811 | 0.868 | 2,000 |
Global Labels (1-5) | Pneumonia (3) | 0.712 | 0.817 | 0.779 | 0.797 | 2,000 |
Global Labels (1-5) | Pleural Effusion (4) | 0.660 | 0.841 | 0.598 | 0.698 | 2,000 |
Global Labels (1-5) | Mass (5) | 0.582 | 0.692 | 0.630 | 0.659 | 1,355 |
Local Labels (6-24) | Pulmonary Opacity (6) | 0.580 | 0.373 | 0.460 | 0.412 | 680 |
Local Labels (6-24) | Emphysema (7) | 0.811 | 0.701 | 0.748 | 0.724 | 595 |
Local Labels (6-24) | Edema (8) | 0.602 | 0.655 | 0.482 | 0.555 | 609 |
Local Labels (6-24) | Atelectasis (9) | 0.715 | 0.722 | 0.701 | 0.711 | 554 |
Local Labels (6-24) | Lung Tumor (10) | 0.721 | 0.843 | 0.717 | 0.775 | 584 |
Local Labels (6-24) | Calcification (11) | 0.860 | 0.728 | 0.679 | 0.703 | 583 |
Local Labels (6-24) | Infiltration (12) | 0.821 | 0.823 | 0.691 | 0.751 | 633 |
Local Labels (6-24) | Cardiopathy (13) | 0.810 | 0.701 | 0.735 | 0.717 | 592 |
Local Labels (6-24) | Bilateral Hilar (14) | 0.638 | 0.612 | 0.437 | 0.509 | 607 |
Local Labels (6-24) | Dyspnea (15) | 0.596 | 0.710 | 0.533 | 0.609 | 597 |
Local Labels (6-24) | Apical Hypercarbia (16) | 0.583 | 0.579 | 0.562 | 0.570 | 567 |
Local Labels (6-24) | Hypertrophy (17) | 0.600 | 0.566 | 0.542 | 0.554 | 580 |
Local Labels (6-24) | Enlargement AP (18) | 0.631 | 0.638 | 0.510 | 0.567 | 599 |
Local Labels (6-24) | Enlargement PA (19) | 0.689 | 0.765 | 0.546 | 0.638 | 548 |
Local Labels (6-24) | Oval Opacity (20) | 0.808 | 0.765 | 0.711 | 0.737 | 582 |
Local Labels (6-24) | Pleural Thickening (21) | 0.769 | 0.674 | 0.685 | 0.679 | 556 |
Local Labels (6-24) | Mediastinal (22) | 0.553 | 0.733 | 0.818 | 0.773 | 600 |
Local Labels (6-24) | Pulmonary Cavity (23) | 0.776 | 0.634 | 0.660 | 0.647 | 595 |
Local Labels (6-24) | Tuberculosis (24) | 0.823 | 0.863 | 0.683 | 0.763 | 558 |

Figure 1: Disease classification performance for global and local labels.
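
The F1-Score column in Table 2 is consistent with the harmonic mean of the Precision and Recall columns. The following is a minimal sketch assuming that standard definition; the function name `f1_score` is illustrative and is not taken from the original work. Because the table values are rounded to three decimals, the recomputed scores are only expected to agree to within about 0.001.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from three rows of Table 2.
rows = {
    "Cardiomegaly": (0.937, 0.825, 0.877),
    "Emphysema": (0.701, 0.748, 0.724),
    "Tuberculosis": (0.863, 0.683, 0.763),
}

for disease, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    print(f"{disease}: computed {computed:.3f}, reported {reported:.3f}")
```

The same check can be applied to the Precision, Recall, and F-1 columns of the other tables in this section.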
Table 3: Intra-language contextual analysis (ILCA) results against the ground truth. G is an excerpt of the generated report, and GT is the corresponding excerpt of the ground truth. CFF denotes comprehensiveness, fluency, and faithfulness.
Criteria | Metric | Score | Our result samples |
---|---|---|---|
French-specific Medical Terminology | Precision | 68.5% | G: Associe des rayures opaques rétractiles avec un épaississement bilatéral non cloisonné. GT: Opacités linéaires rétractiles avec épaississement non septaux bilatéraux. |
French-specific Medical Terminology | Recall | 60.4% | G: Tube de drainage positionné dans la partie haute du champ pulmonaire gauche. GT: Drain en place au niveau du tiers supérieur de l’hémichamp pulmonaire gauche. |
French-specific Medical Terminology | F-1 | 64.2% | G: Pneumothorax bilatéral d'intensité modérée observable dans les lobes supérieurs. GT: Pneumothorax de moyenne abondance bilatéral visible au niveau des lobes supérieurs. |
Linguistic Nuances | Grammar Correctness | 40.4% | G: Pneumothorax a gauche moyen abondance. GT: Pneumothorax gauche de moyenne abondance. |
Cultural Context | CFF | 76.6% | G: Présence de lignes et réticulations opaques sur le côté gauche. GT: Opacités linéaires et réticulaires gauche avec quelques opacités alvéolaires. |

Figure 2: Our results within ILCA setting against the ground truth.
Table 4: Intra-language contextual analysis (ILCA) results of various MRG models. The best results are in bold, and the second-best results are underlined.
Dataset | Model | NLG: BLEU-1 | NLG: BLEU-4 | NLG: ROUGE-L | NLG: METEOR | CE: Precision | CE: Recall | CE: F-1 |
---|---|---|---|---|---|---|---|---|
CASIA-CXR (Ours) | CoAtt [1] | 0.300 | 0.103 | 0.249 | 0.121 | - | - | - |
CASIA-CXR (Ours) | Up-Down [2] | 0.309 | 0.106 | 0.253 | 0.126 | - | - | - |
CASIA-CXR (Ours) | WCL [3] | 0.333 | 0.127 | 0.266 | 0.140 | 0.380 | 0.271 | 0.316 |
CASIA-CXR (Ours) | R2Gen [4] | 0.354 | 0.144 | 0.273 | 0.161 | 0.588 | 0.512 | 0.548 |
CASIA-CXR (Ours) | M2TR [5] | 0.389 | 0.151 | 0.289 | 0.177 | 0.522 | 0.481 | 0.500 |
CASIA-CXR (Ours) | Ours | 0.404 | 0.177 | 0.292 | 0.158 | 0.685 | 0.604 | 0.642 |
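
BLEU-1, one of the NLG metrics reported for the generated reports, measures clipped unigram overlap between a generated report and its reference, scaled by a brevity penalty. The sketch below is a sentence-level illustration only. It assumes lowercase whitespace tokenization and a single reference, neither of which is stated in the source, and the function name `bleu1` is hypothetical. The example pair is the G/GT sample from the Grammar Correctness row of Table 3.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    # Assumption: lowercase whitespace tokenization (punctuation stays attached).
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate token count by its count in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = overlap / len(cand)
    # The brevity penalty only applies when the candidate is shorter.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


generated = "Pneumothorax a gauche moyen abondance."
ground_truth = "Pneumothorax gauche de moyenne abondance."
print(f"BLEU-1 = {bleu1(generated, ground_truth):.3f}")
```

Three of the five generated tokens appear in the reference and both sentences have five tokens, so this pair scores 0.6 under these assumptions. Reported scores are typically aggregated over a whole test set, so this single-pair value is not comparable to the table entries.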
Table 5: Samples of the generated reports for three cases. Identical descriptions are in bold, additional identified findings are underlined, and failed findings are in red.
X-ray Image | Baseline | M2TR | R2Gen | Ours |
---|---|---|---|---|
![]() | Pneumothorax gauche de moyenne abondance./Silhouette cardiaque d’aspect normal./Absence d’anomalie médiastinale./Absence d’épanchement pleural./Absence d’anomalie du gril costal./ | Pneumothorax gauche de grande abondance./Contours cardiaques réguliers./Médiastin sans anomalies décelables./ Aucun épanchement pleural détecté./Structure du gril costal normale. | Pneumothorax de petite abondance./Aucune anomalie visible au niveau du médiastin./Pas de présence d'épanchement pleural./ | Pneumothorax bilatéral de moyenne abondance./Configuration normale de la silhouette cardiaque./Médiastin clair, sans irrégularités./Absence d'épanchement pleural./Aucune lésion du gril costal observée./ |
![]() | Régression du pneumothorax droit./Drain en place au niveau apical droit./Opacité en bande linéaire au niveau du tiers inférieure de l’hémichamp pulmonaire gauche./Scissurite droite./Silhouette cardiaque d’aspect normal./Absence d’anomalie du gril costal./ | Stabilité du pneumothorax droit./Drain apical droit en place et fonctionnel./Bande opacifiante au niveau du tiers moyen de l’hémichamp pulmonaire gauche./Légère scissurite droite./ | Diminution du Pneumothorax gauche./Drain thoracique positionné à l'apex gauche./Opacité linéaire sur le tiers supérieur de l'hémichamp pulmonaire droit./ | Régression marquée du pneumothorax droit./Disposition d’un drain au niveau apical gauche./Opacité linéaire dans le tiers inférieur de l’hémichamp pulmonaire gauche./Normalité du gril costal et des parties molles./ |
![]() | Pneumothorax gauche de grande abondance./Opacité réticulo-micronodulaire avec verre dépoli de l’hémichamp pulmonaire gauche avec individualisation d’une volumineuse bulle apical gauche./ | Pneumothorax droit de grande abondance./ Opacité réticulo-micronodulaire dans l’hémichamp pulmonaire droit, avec verre dépoli et présence d'une bulle apicale droite importante./ | Pneumothorax gauche modéré./ Opacités réticulo-micronodulaires avec zones de verre dépoli étendues sur l’hémichamp pulmonaire droit./ | Pneumothorax bilatéral de grande abondance./ Présence d'opacité réticulo-micronodulaire avec aspect de verre dépoli sur les deux hémichamps pulmonaires, plus de grande bulle apicale sur le côté gauche./ Surélevation de la coupole diaphragmatique droite avec trachée tirée vers la droite./ |
[1] On the Automatic Generation of Medical Imaging Reports.
[2] Bottom-up and top-down attention for image captioning and VQA.
[3] Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation.
[4] Generating Radiology Reports via Memory-driven Transformer.
[5] Progressive Transformer-Based Generation of Radiology Reports.
Table 6: Cross-language contextual analysis (CLCA) results against SOTA MRG models on the MIMIC-CXR dataset. We replicate the results from the original papers. The best results are in bold, and the second-best results are underlined.
Dataset | Model | NLG: BLEU-1 | NLG: BLEU-4 | NLG: ROUGE-L | NLG: METEOR | CE: Precision | CE: Recall | CE: F-1 |
---|---|---|---|---|---|---|---|---|
MIMIC-CXR | R2Gen [1] | 0.353 | 0.103 | 0.227 | 0.142 | 0.333 | 0.273 | 0.300 |
MIMIC-CXR | CMCL [2] | 0.334 | 0.097 | 0.281 | 0.133 | - | - | - |
MIMIC-CXR | PPKED [3] | 0.360 | 0.106 | 0.284 | 0.149 | - | - | - |
MIMIC-CXR | CA [4] | 0.350 | 0.109 | 0.283 | 0.151 | 0.352 | 0.298 | 0.322 |
MIMIC-CXR | AlignTR [5] | 0.378 | 0.112 | 0.283 | 0.158 | - | - | - |
MIMIC-CXR | M2TR [6] | 0.378 | 0.107 | 0.272 | 0.145 | 0.240 | 0.428 | 0.308 |
CASIA-CXR (Ours) | Ours | 0.357 | 0.197 | 0.314 | 0.177 | 0.340 | 0.437 | 0.383 |

Figure 3: Cross-language contextual analysis (CLCA) results against SOTA MRG models.

Figure 4: (Left) Number of sentences and words generated by MRG models within the ILCA setting. (Right) Number of sentences and words generated by MRG models within the CLCA setting.

Figure 5: The most frequent radiographic findings. Labels are shown for both expert interpretation (manual labeling in light green), and our generated labels (automatic labeling in dark green).
References
[1] Generating radiology reports via memory-driven transformer.
[2] Competence-based multimodal curriculum learning for medical report generation.
[3] Exploring and distilling posterior and prior knowledge for radiology report generation.
[4] Contrastive attention for automatic chest x-ray report generation.
[5] Aligntransformer: Hierarchical alignment of visual regions and disease tags for medical report generation.
[6] Progressive transformer-based generation of radiology reports.
Table 7: Samples of the generated reports for four cases. Identical descriptions are in bold, additional identified findings are underlined, and failed findings are in red.
X-ray Image | French Ground Truth | Our Generated Report | Translated to English | English Ground Truth |
---|---|---|---|---|
![]() | Augmentation de l'index cardio-thoracique./ Absence d'anomalie parenchymateuse./ Absence d'épanchement pleural./ | Augmentation l'index cardio-thoracique et Absence d'épanchement pleural. Absence du gril costal./ | Increased cardiothoracic index and absence of pleural effusion and absence of the rib grill./ | Increased cardiothoracic index./ Absence of parenchymal abnormality./ Absence of pleural effusion./ |
![]() | Opacités linéaire et stellaire bilatérales./ Rétractiles bilatérales et diffuses./ Surcharge hilaire bilatérale./ Il s'y associe des opacités linéaires./ | Stellaires bilatérales et des opacités rétractiles bilatérales. Surcharge hilaire bilatérale est également présente./ | Bilateral stellate opacities and bilateral retractile opacities. Bilateral hilar overload is also present./ | Bilateral linear and stellar and bilateral and diffuse retractile opacities./ Bilateral hilar overload./ It is associated with linear opacities./ |
![]() | Épanchement pleural gauche de moyenne abondance./ Aspect rétracté du poumon gauche avec attraction de la trachée et hypertrophie du poumon droit./ | Il est observé un épanchement pleural gauche de abondance. Le poumon gauche présente un aspect hypertrophie./ | Abundant left pleural effusion is observed./ The left lung presents an enlarged hypertrophy./ | Left pleural effusion of moderate abundance./ Retracted appearance of the left lung with attraction of the trachea and hypertrophy of the right lung./ |
![]() | Opacité arrondie, de tonalité hydrique, bien limitée au niveau des tiers inférieur./ Moyen de l'hémichamp pulmonaire gauche./ | Cardiomégalie associée à un infiltrat interstitiel origine infectieuse, à confronter au reste du bilan./ Épanchement pleural gauche de abondance./ | Cardiomégalie associated with an interstitial infiltrate of infectious origin, to be compared with the rest of the assessment, abundant left pleural effusion./ | Rounded opacity, watery in tone, well limited, at the level of the lower and middle thirds of the left pulmonary hemifield./ |
Table 8: Ablation studies on the CASIA-CXR dataset.
Our Model | NLG: BLEU-1 | NLG: BLEU-4 | NLG: ROUGE-L | CE: Precision | CE: Recall | CE: F-1 Score |
---|---|---|---|---|---|---|
w/o D_Txt | 0.300 | 0.144 | 0.282 | 0.653 | 0.553 | 0.598 |
w/o D_Fused | 0.293 | 0.131 | 0.263 | 0.627 | 0.531 | 0.575 |
w/o D_Enriched | 0.281 | 0.122 | 0.254 | 0.612 | 0.507 | 0.554 |
Baseline | 0.404 | 0.177 | 0.292 | 0.685 | 0.604 | 0.639 |

Figure 6: Ablation studies on CASIA-CXR dataset.
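
To read the ablation results more easily, the drop of each variant relative to the baseline can be computed per metric. The snippet below is a small sketch over the values in Table 8; the names `METRICS`, `TABLE_8`, and `drops_vs_baseline` are illustrative and not part of the original work.

```python
# Scores copied from Table 8, ordered as in METRICS.
METRICS = ("BLEU-1", "BLEU-4", "ROUGE-L", "Precision", "Recall", "F-1")
TABLE_8 = {
    "Baseline": (0.404, 0.177, 0.292, 0.685, 0.604, 0.639),
    "w/o D_Txt": (0.300, 0.144, 0.282, 0.653, 0.553, 0.598),
    "w/o D_Fused": (0.293, 0.131, 0.263, 0.627, 0.531, 0.575),
    "w/o D_Enriched": (0.281, 0.122, 0.254, 0.612, 0.507, 0.554),
}


def drops_vs_baseline(table):
    """Return {variant: {metric: baseline - variant}} for every ablation."""
    base = table["Baseline"]
    return {
        name: {m: round(b - s, 3) for m, b, s in zip(METRICS, base, scores)}
        for name, scores in table.items()
        if name != "Baseline"
    }


for variant, drop in drops_vs_baseline(TABLE_8).items():
    print(variant, drop)
```

With these numbers, every variant scores below the baseline on every metric, and the w/o D_Enriched variant shows the largest drop on each of them.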
Table 9: Results of expert evaluation on the CASIA-CXR dataset. The baseline was obtained through a blinded evaluation conducted by the radiologists. Values are reported on a 1 to 5 scale.
Criteria | Baseline | Variant: w D_Fused | Variant: w D_Enriched | Variant: w/o D_Fused | Variant: w/o D_Enriched |
---|---|---|---|---|---|
Comprehensiveness | 3.8 | 4.2 | 4.1 | 3.0 | 3.2 |
Fluency | 4.0 | 3.8 | 4.2 | 3.5 | 3.7 |
Faithfulness | 3.7 | 4.0 | 3.9 | 3.2 | 3.5 |

Figure 7: Expert evaluation on CASIA-CXR dataset.
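
As a rough way to compare the columns of Table 9, the three criteria can be averaged per configuration. The sketch below assumes an unweighted mean, which is an illustrative choice and not an aggregation reported in the source; the names `TABLE_9` and `averages` are likewise hypothetical.

```python
from statistics import mean

# Ratings copied from Table 9 (1 to 5 scale), ordered as
# (Comprehensiveness, Fluency, Faithfulness).
TABLE_9 = {
    "Baseline": (3.8, 4.0, 3.7),
    "w D_Fused": (4.2, 3.8, 4.0),
    "w D_Enriched": (4.1, 4.2, 3.9),
    "w/o D_Fused": (3.0, 3.5, 3.2),
    "w/o D_Enriched": (3.2, 3.7, 3.5),
}

# Unweighted mean over the three criteria (an assumption, see above).
averages = {name: round(mean(scores), 2) for name, scores in TABLE_9.items()}

# Print configurations from highest to lowest average rating.
for name, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {avg:.2f}")
```

Under this assumption, both "w" variants average higher than the baseline and both "w/o" variants average lower.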