💡 Benchmarks on Medical Report Generation (MRG) in French

📃 Existing CXR Datasets

▪ Table 1: An overview of existing image-report chest X-ray datasets. Values above 10,000 are rounded and abbreviated with K for thousand (e.g., 10K for 10,000). Abbreviations: RP: Report Parsing; RIR: Radiologist Interpretation of Reports; RI: Radiologist Interpretation of Chest Radiographs; RCI: Radiologist Cohort Agreement for Chest Radiographs; LT: Laboratory Tests; BB: Bounding Box; CL: Classification; R: Report; PA: Posteroanterior; AP: Anteroposterior; L: Lateral; (/): not declared.


| Dataset | Patients | Studies | Images | Annotation Type | Annotated Labels | View Position: PA | View Position: AP | View Position: L | Image Format | Labeling Method | Report Language | Release Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Shenzhen Hospital | / | / | 340 | CL | 2 | 340 | / | / | DICOM | RI | English | 2014 |
| IU X-ray | 3.9K | 3.9K | 7.4K | R | / | 3.9K | / | 3.9K | JPEG | RI | English | 2016 |
| Chest X-ray 14 | 31K | / | 112K | CL/BB | 14 | 67K | 45K | / | PNG | RP/RI | English | 2017 |
| CX-CHR | 35K | 35K | 45K | CL | 155 | / | / | / | JPEG | RP | Chinese | 2018 |
| CheXpert | 65K | 188K | 224K | CL | 14 | 29K | 162K | 32K | JPEG | RCI/RP | English | 2019 |
| MIMIC-CXR | 65K | 224K | 372K | CL/R | 14 | 130K | 162K | 122K | DICOM | RP | English | 2019 |
| PadChest | 67K | 110K | 160K | CL/R | 193 | 96K | 20K | 51K | DICOM | RIR/RP | Spanish | 2019 |
| VinDR-CXR | / | / | 100K | CL/BB | 28 | 18K | / | / | DICOM | RI | English | 2020 |
| BIMCV | 1.3K | 2.3K | 3.2K | CL | 2 | 1.1K | 1.3K | 815 | DICOM | LT | English | 2020 |
| CXR14 Rad-Labels | 1.7K | / | 4.3K | CL | 4 | 3.2K | 1.1K | / | JPEG | RCI | English | 2020 |
| CASIA-CXR (ours) | 11.1K | 11.1K | 11.1K | CL/R | 24 | 7.7K | 3.3K | / | JPEG | RIR/RP | French | 2024 |

📃 Classification Performance

▪ Table 2: Disease classification performance.

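| Label Group | Disease | Label | Accuracy | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|---|---|
| Global (1-5) | Cardiomegaly | 1 | 0.830 | 0.937 | 0.825 | 0.877 | 3,756 |
| Global (1-5) | Pneumothorax | 2 | 0.823 | 0.935 | 0.811 | 0.868 | 2,000 |
| Global (1-5) | Pneumonia | 3 | 0.712 | 0.817 | 0.779 | 0.797 | 2,000 |
| Global (1-5) | Pleural Effusion | 4 | 0.660 | 0.841 | 0.598 | 0.698 | 2,000 |
| Global (1-5) | Mass | 5 | 0.582 | 0.692 | 0.630 | 0.659 | 1,355 |
| Local (6-24) | Pulmonary Opacity | 6 | 0.580 | 0.373 | 0.46 | 0.412 | 680 |
| Local (6-24) | Emphysema | 7 | 0.811 | 0.701 | 0.748 | 0.724 | 595 |
| Local (6-24) | Edema | 8 | 0.602 | 0.655 | 0.482 | 0.555 | 609 |
| Local (6-24) | Atelectasis | 9 | 0.715 | 0.722 | 0.701 | 0.711 | 554 |
| Local (6-24) | Lung Tumor | 10 | 0.721 | 0.843 | 0.717 | 0.775 | 584 |
| Local (6-24) | Calcification | 11 | 0.860 | 0.728 | 0.679 | 0.703 | 583 |
| Local (6-24) | Infiltration | 12 | 0.821 | 0.823 | 0.691 | 0.751 | 633 |
| Local (6-24) | Cardiopathy | 13 | 0.810 | 0.701 | 0.735 | 0.717 | 592 |
| Local (6-24) | Bilateral Hilar | 14 | 0.638 | 0.612 | 0.437 | 0.509 | 607 |
| Local (6-24) | Dyspnea | 15 | 0.596 | 0.710 | 0.533 | 0.609 | 597 |
| Local (6-24) | Apical Hypercarbia | 16 | 0.583 | 0.579 | 0.562 | 0.570 | 567 |
| Local (6-24) | Hypertrophy | 17 | 0.600 | 0.566 | 0.542 | 0.554 | 580 |
| Local (6-24) | Enlargement AP | 18 | 0.631 | 0.638 | 0.510 | 0.567 | 599 |
| Local (6-24) | Enlargement PA | 19 | 0.689 | 0.765 | 0.546 | 0.638 | 548 |
| Local (6-24) | Oval Opacity | 20 | 0.808 | 0.765 | 0.711 | 0.737 | 582 |
| Local (6-24) | Pleural Thickening | 21 | 0.769 | 0.674 | 0.685 | 0.679 | 556 |
| Local (6-24) | Mediastinal | 22 | 0.553 | 0.733 | 0.818 | 0.773 | 600 |
| Local (6-24) | Pulmonary Cavity | 23 | 0.776 | 0.634 | 0.66 | 0.647 | 595 |
| Local (6-24) | Tuberculosis | 24 | 0.823 | 0.863 | 0.683 | 0.763 | 558 |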
CASIA-CXR: An Open Chest X-ray Dataset with Benchmarks for Automatic Radiology Report Generation in French

Figure 1: Disease classification performance for global and local labels.
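
The per-label scores in Table 2 are standard binary classification metrics. As a minimal sketch, assuming each label is scored as an independent binary decision (the page does not state the exact evaluation protocol), they could be computed from ground-truth and predicted labels as shown here. The names `binary_metrics` and `f1_from_pr`, as well as the toy label vectors, are illustrative and not taken from the CASIA-CXR code.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def binary_metrics(y_true, y_pred):
    """Score one label from 0/1 ground-truth and 0/1 predicted values."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1_from_pr(precision, recall))


# Toy data for a single label (made up for illustration only).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

As a sanity check on the table, `f1_from_pr(0.937, 0.825)` rounds to 0.877, which agrees with the Cardiomegaly row.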

📃 MRG Analysis

▪ Table 3: Intra-language contextual analysis (ILCA) results against the ground truth. G is an excerpt from the generated report and GT is an excerpt from the ground truth. CFF denotes comprehensiveness, fluency, and faithfulness.

| Criteria | Metric | Score | Generated Sample (G) | Ground Truth Sample (GT) |
|---|---|---|---|---|
| French-specific Medical Terminology | Precision | 68.5% | Associe des rayures opaques rétractiles avec un épaississement bilatéral non cloisonné. | Opacités linéaires rétractiles avec épaississement non septaux bilatéraux. |
| French-specific Medical Terminology | Recall | 60.4% | Tube de drainage positionné dans la partie haute du champ pulmonaire gauche. | Drain en place au niveau du tiers supérieur de l’hémichamp pulmonaire gauche. |
| French-specific Medical Terminology | F-1 | 64.2% | Pneumothorax bilatéral d'intensité modérée observable dans les lobes supérieurs | Pneumothorax de moyenne abondance bilatéral visible au niveau des lobes supérieurs. |
| Linguistic Nuances | Grammar Correctness | 40.4% | Pneumothorax a gauche moyen abondance. | Pneumothorax gauche de moyenne abondance. |
| Cultural Context | CFF | 76.6% | Présence de lignes et réticulations opaques sur le côté gauche. | Opacités linéaires et réticulaires gauche avec quelques opacités alvéolaires. |

Figure 2: Our results within ILCA setting against the ground truth.


▪ Table 4: Intra-language contextual analysis (ILCA) results of various MRG models. The best results are in bold, and the second-best results are underlined.

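| Dataset | Model | NLG: BLEU-1 | NLG: BLEU-4 | NLG: ROUGE-L | NLG: METEOR | CE: Precision | CE: Recall | CE: F-1 |
|---|---|---|---|---|---|---|---|---|
| CASIA-CXR (Ours) | CoAtt [1] | 0.300 | 0.103 | 0.249 | 0.121 | - | - | - |
| CASIA-CXR (Ours) | Up-Down [2] | 0.309 | 0.106 | 0.253 | 0.126 | - | - | - |
| CASIA-CXR (Ours) | WCL [3] | 0.333 | 0.127 | 0.266 | 0.140 | 0.380 | 0.271 | 0.316 |
| CASIA-CXR (Ours) | R2Gen [4] | 0.354 | 0.144 | 0.273 | 0.161 | 0.588 | 0.512 | 0.548 |
| CASIA-CXR (Ours) | M2TR [5] | 0.389 | 0.151 | 0.289 | 0.177 | 0.522 | 0.481 | 0.500 |
| CASIA-CXR (Ours) | Ours | 0.404 | 0.177 | 0.292 | 0.158 | 0.685 | 0.604 | 0.642 |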
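
BLEU-1, reported in the table above, measures clipped unigram overlap between a generated report and its reference, scaled by a brevity penalty. The sketch below is a simplified sentence-level version under stated assumptions: lowercase whitespace tokenization with surrounding punctuation stripped, and a single reference per report. Published scores are normally computed at corpus level with a standard toolkit, so this will not reproduce the table values. `tokenize` and `bleu1` are hypothetical helper names; the two sentences are the grammar sample from Table 3.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, split on whitespace, strip surrounding punctuation.
    tokens = (tok.strip(".,;:!?()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def bleu1(candidate, reference):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    cand = tokenize(candidate)
    ref = tokenize(reference)
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(cand)
    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r/c), which equals 1 when the lengths are the same.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


generated = "Pneumothorax a gauche moyen abondance."
ground_truth = "Pneumothorax gauche de moyenne abondance."
score = bleu1(generated, ground_truth)
```

Here three of the five generated tokens ("pneumothorax", "gauche", "abondance") appear in the reference and both sentences have five tokens, so `score` is 0.6. Exact matching gives no credit for "moyen" versus "moyenne", which is one reason n-gram metrics are usually read alongside the CE metrics.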

▪ Table 5: Samples of the generated reports for three cases. Identical descriptions are in bold, additionally identified findings are underlined, and failed findings are in red.

| X-ray Image | Baseline | M2TR | R2Gen | Ours |
|---|---|---|---|---|
| Case 1 | Pneumothorax gauche de moyenne abondance./Silhouette cardiaque d’aspect normal./Absence d’anomalie médiastinale./Absence d’épanchement pleural./Absence d’anomalie du gril costal./ | Pneumothorax gauche de grande abondance./Contours cardiaques réguliers./Médiastin sans anomalies décelables./Aucun épanchement pleural détecté./Structure du gril costal normale. | Pneumothorax de petite abondance./Aucune anomalie visible au niveau du médiastin./Pas de présence d'épanchement pleural./ | Pneumothorax bilatéral de moyenne abondance./Configuration normale de la silhouette cardiaque./Médiastin clair, sans irrégularités./Absence d'épanchement pleural./Aucune lésion du gril costal observée./ |
| Case 2 | Régression du pneumothorax droit./Drain en place au niveau apical droit./Opacité en bande linéaire au niveau du tiers inférieure de l’hémichamp pulmonaire gauche./Scissurite droite./Silhouette cardiaque d’aspect normal./Absence d’anomalie du gril costal./ | Stabilité du pneumothorax droit./Drain apical droit en place et fonctionnel./Bande opacifiante au niveau du tiers moyen de l’hémichamp pulmonaire gauche./Légère scissurite droite./ | Diminution du Pneumothorax gauche./Drain thoracique positionné à l'apex gauche./Opacité linéaire sur le tiers supérieur de l'hémichamp pulmonaire droit./ | Régression marquée du pneumothorax droit./Disposition d’un drain au niveau apical gauche./Opacité linéaire dans le tiers inférieur de l’hémichamp pulmonaire gauche./Normalité du gril costal et des parties molles./ |
| Case 3 | Pneumothorax gauche de grande abondance./Opacité réticulo-micronodulaire avec verre dépoli de l’hémichamp pulmonaire gauche avec individualisation d’une volumineuse bulle apical gauche./ | Pneumothorax droit de grande abondance./Opacité réticulo-micronodulaire dans l’hémichamp pulmonaire droit, avec verre dépoli et présence d'une bulle apicale droite importante./ | Pneumothorax gauche modéré./Opacités réticulo-micronodulaires avec zones de verre dépoli étendues sur l’hémichamp pulmonaire droit./ | Pneumothorax bilatéral de grande abondance./Présence d'opacité réticulo-micronodulaire avec aspect de verre dépoli sur les deux hémichamps pulmonaires, plus de grande bulle apicale sur le côté gauche./Surélevation de la coupole diaphragmatique droite avec trachée tirée vers la droite./ |
References
[1] On the Automatic Generation of Medical Imaging Reports.
[2] Bottom-up and top-down attention for image captioning and VQA.
[3] Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation.
[4] Generating Radiology Reports via Memory-driven Transformer.
[5] Progressive Transformer-Based Generation of Radiology Reports.

▪ Table 6: Cross-language contextual analysis (CLCA) results against SOTA MRG models on the MIMIC-CXR dataset. Results for the other models are taken from the original papers. The best results are in bold, and the second-best results are underlined.

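| Dataset | Model | NLG: BLEU-1 | NLG: BLEU-4 | NLG: ROUGE-L | NLG: METEOR | CE: Precision | CE: Recall | CE: F-1 |
|---|---|---|---|---|---|---|---|---|
| MIMIC-CXR | R2Gen [1] | 0.353 | 0.103 | 0.227 | 0.142 | 0.333 | 0.273 | 0.300 |
| MIMIC-CXR | CMCL [2] | 0.334 | 0.097 | 0.281 | 0.133 | - | - | - |
| MIMIC-CXR | PPKED [3] | 0.360 | 0.106 | 0.284 | 0.149 | - | - | - |
| MIMIC-CXR | CA [4] | 0.350 | 0.109 | 0.283 | 0.151 | 0.352 | 0.298 | 0.322 |
| MIMIC-CXR | AlignTR [5] | 0.378 | 0.112 | 0.283 | 0.158 | - | - | - |
| MIMIC-CXR | M2TR [6] | 0.378 | 0.107 | 0.272 | 0.145 | 0.240 | 0.428 | 0.308 |
| CASIA-CXR (Ours) | Ours | 0.357 | 0.197 | 0.314 | 0.177 | 0.340 | 0.437 | 0.383 |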

Figure 3: Cross-language contextual analysis (CLCA) results against SOTA MRG models.
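
ROUGE-L, one of the NLG metrics in the CLCA comparison, scores a generated report by the longest common subsequence (LCS) it shares with the reference. A minimal sentence-level sketch follows, assuming lowercase whitespace tokenization and a balanced F-measure (`beta=1.0`); evaluation toolkits may use a different `beta` or tokenizer, so this is not expected to reproduce the reported values. `lcs_length` and `rouge_l` are hypothetical helper names, and the two English sentences are taken from the translated samples on this page.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Sentence-level ROUGE-L F-measure from LCS precision and recall."""
    # Assumption: lowercase whitespace tokens; punctuation stays attached.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


translated = "Abundant left pleural effusion is observed."
english_gt = "Left pleural effusion of moderate abundance."
score = rouge_l(translated, english_gt)
```

The shared subsequence is "left pleural effusion" (3 of 6 tokens on each side), so `score` is 0.5. Unlike BLEU, the LCS rewards words that appear in the same order even when they are not contiguous.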

Figure 4: (Left) Number of sentences and words generated by MRG models within the ILCA setting. (Right) Number of sentences and words generated by MRG models within the CLCA setting.

Figure 5: The most frequent radiographic findings. Labels are shown for both expert interpretation (manual labeling in light green), and our generated labels (automatic labeling in dark green).

References
[1] Generating Radiology Reports via Memory-driven Transformer.
[2] Competence-based Multimodal Curriculum Learning for Medical Report Generation.
[3] Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation.
[4] Contrastive Attention for Automatic Chest X-ray Report Generation.
[5] AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation.
[6] Progressive Transformer-Based Generation of Radiology Reports.

▪ Table 7: Samples of the generated reports for four cases, with English translations. Identical descriptions are in bold, additionally identified findings are underlined, and failed findings are in red.

| X-ray Image | French Ground Truth | Our Generated Report | Translated to English | English Ground Truth |
|---|---|---|---|---|
| Case 1 | Augmentation de l'index cardio-thoracique./ Absence d'anomalie parenchymateuse./ Absence d'épanchement pleural./ | Augmentation l'index cardio-thoracique et Absence d'épanchement pleural. Absence du gril costal./ | Increased cardiothoracic index and absence of pleural effusion and absence of the rib grill./ | Increased cardiothoracic index./ Absence of parenchymal abnormality./ Absence of pleural effusion./ |
| Case 2 | Opacités linéaire et stellaire bilatérales./ Rétractiles bilatérales et diffuses./ Surcharge hilaire bilatérale./ Il s'y associe des opacités linéaires./ | Stellaires bilatérales et des opacités rétractiles bilatérales. Surcharge hilaire bilatérale est également présente./ | Bilateral stellate opacities and bilateral retractile opacities. Bilateral hilar overload is also present./ | Bilateral linear and stellar and bilateral and diffuse retractile opacities./ Bilateral hilar overload./ It is associated with linear opacities./ |
| Case 3 | Épanchement pleural gauche de moyenne abondance./ Aspect rétracté du poumon gauche avec attraction de la trachée et hypertrophie du poumon droit./ | Il est observé un épanchement pleural gauche de abondance. Le poumon gauche présente un aspect hypertrophie./ | Abundant left pleural effusion is observed./ The left lung presents an enlarged hypertrophy./ | Left pleural effusion of moderate abundance./ Retracted appearance of the left lung with attraction of the trachea hypertrophy of the right lung./ |
| Case 4 | Opacité arrondie, de tonalité hydrique, bien limitée au niveau des tiers inférieur./ Moyen de l'hémichamp pulmonaire gauche./ | Cardiomégalie associée à un infiltrat interstitiel origine infectieuse, à confronter au reste du bilan./ Épanchement pleural gauche de abondance./ | Cardiomégalie associated with an interstitial infiltrate of infectious origin, to be compared with the rest of the assessment, abundant left pleural effusion./ | Rounded opacity, watery in tone, well limited, at the level of the lower and middle thirds of the left pulmonary hemifield./ |

📃 Ablation Studies

▪ Table 8: Ablation studies on the CASIA-CXR dataset.

| Our Model | NLG: BLEU-1 | NLG: BLEU-4 | NLG: ROUGE-L | CE: Precision | CE: Recall | CE: F-1 Score |
|---|---|---|---|---|---|---|
| w/o D_Txt | 0.300 | 0.144 | 0.282 | 0.653 | 0.553 | 0.598 |
| w/o D_Fused | 0.293 | 0.131 | 0.263 | 0.627 | 0.531 | 0.575 |
| w/o D_Enriched | 0.281 | 0.122 | 0.254 | 0.612 | 0.507 | 0.554 |
| Baseline | 0.404 | 0.177 | 0.292 | 0.685 | 0.604 | 0.639 |

Figure 6: Ablation studies on CASIA-CXR dataset.

📃 Expert Evaluation

▪ Table 9: Results of expert evaluation on the CASIA-CXR dataset. The baseline scores were obtained through a blinded evaluation conducted by radiologists. Values are reported on a 1-to-5 scale.

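| Criteria | Baseline | Variant: w/ D_Fused | Variant: w/ D_Enriched | Variant: w/o D_Fused | Variant: w/o D_Enriched |
|---|---|---|---|---|---|
| Comprehensiveness | 3.8 | 4.2 | 4.1 | 3.0 | 3.2 |
| Fluency | 4.0 | 3.8 | 4.2 | 3.5 | 3.7 |
| Faithfulness | 3.7 | 4.0 | 3.9 | 3.2 | 3.5 |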

Figure 7: Expert evaluation on CASIA-CXR dataset.