
Table 1 Summary and comparison of the manual method, Rayyan thresholds A and B and the LLM method

From: Streamlining systematic reviews with large language models using prompt engineering and retrieval augmented generation

  

| | Manual | Rayyan Threshold A | Rayyan Threshold B | LLM |
|---|---|---|---|---|
| Title/Abstract | | | | |
| Articles to screen* | 14,439 | 14,439 | 14,439 | 14,439 |
| Inclusion Threshold | - | “Undecided” | “Likely to Exclude” | - |
| Articles Remaining after automated screening (AER) | - | 3,470 (72.1%) | 6,131 (50.7%) | 3,280 (77.2%) |
| Total Articles to Manually Screen | 14,439 | 5,470 | 8,131 | |
| Time taken for all Manual Screening Articles | 144.4 h | 54.7 h | 81.3 h | - |
| Time for automated screening | - | - | - | 2 h |
| True Positives (FNR) | N/A (Gold Standard) | 19 (5%) | 20 (0%) | 20 (0%) |
| Total time for Step | 144.4 h | 54.7 h | 81.3 h | 2 h |
| Total Time Saved compared to manual method (%) | N/A (Gold Standard) | 89.7 h (62.1%) | 63.1 h (43.7%) | - |
| Full Text** | | | | |
| Articles to screen | 1,680 | - | - | 3,280 |
| Time to run automated screening | - | - | - | 4 h |
| Articles Remaining (AER) | - | - | - | 78 (97.6%) |
| Time to manually screen remaining articles | 420 h | - | - | 19.5 h |
| True Positives (FNR) | N/A (Gold Standard) | - | - | 20 (0%) |
| Total time for step (hours) | 420 h | - | - | 23.5 h |
| Total | | | | |
| Total Time for both steps | 564.4 h | - | - | 25.5 h |
| Total Time Saved compared to manual method (%) | N/A (Gold Standard) | - | - | 538.9 h (95.5%) |

  1. AER: Article Exclusion Rate, FNR: False Negative Rate
  2. *Of the original 17,776 citations, 430 articles were excluded because their results were inadvertently not saved. Of the remaining 17,346 articles, 2,907 were removed as duplicates, leaving 14,439
  3. **Rayyan was excluded from full-text comparison, as its article classification feature is not yet supported in its full text screening platform
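
The LLM column's totals can be re-derived from the other figures in Table 1. The following is a minimal sketch of that arithmetic, not the authors' own calculation. It assumes that AER is the share of screened articles that were excluded, and that the manual full-text rate can be inferred from the Manual column (420 h for 1,680 articles); the function and variable names are illustrative only.

```python
# Figures copied from Table 1; names are illustrative, not from the paper.
TA_ARTICLES = 14_439          # title/abstract articles to screen
TA_MANUAL_HOURS = 144.4       # manual title/abstract screening time
FT_MANUAL_ARTICLES = 1_680    # full texts screened in the manual method
FT_MANUAL_HOURS = 420.0       # manual full-text screening time

LLM_TA_HOURS = 2.0            # LLM title/abstract screening time
LLM_TA_REMAINING = 3_280      # articles remaining after LLM title/abstract step
LLM_FT_AUTO_HOURS = 4.0       # LLM full-text automated screening time
LLM_FT_REMAINING = 78         # articles remaining after LLM full-text step


def exclusion_rate(screened: int, remaining: int) -> float:
    """Assumed AER definition: percentage of screened articles excluded."""
    return 100.0 * (screened - remaining) / screened


def llm_summary() -> dict:
    """Re-derive the LLM column's time totals from the table's inputs."""
    # Assumption: manual full-text rate inferred from the Manual column.
    hours_per_full_text = FT_MANUAL_HOURS / FT_MANUAL_ARTICLES
    ft_manual_hours = LLM_FT_REMAINING * hours_per_full_text
    ft_total_hours = LLM_FT_AUTO_HOURS + ft_manual_hours
    llm_total_hours = LLM_TA_HOURS + ft_total_hours
    manual_total_hours = TA_MANUAL_HOURS + FT_MANUAL_HOURS
    saved_hours = manual_total_hours - llm_total_hours
    return {
        "ft_manual_hours": round(ft_manual_hours, 1),
        "ft_total_hours": round(ft_total_hours, 1),
        "llm_total_hours": round(llm_total_hours, 1),
        "manual_total_hours": round(manual_total_hours, 1),
        "saved_hours": round(saved_hours, 1),
        "saved_pct": round(100.0 * saved_hours / manual_total_hours, 1),
        "ta_aer_pct": exclusion_rate(TA_ARTICLES, LLM_TA_REMAINING),
        "ft_aer_pct": exclusion_rate(LLM_TA_REMAINING, LLM_FT_REMAINING),
    }


if __name__ == "__main__":
    for name, value in llm_summary().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the time figures reproduce the table's LLM column, and the exclusion rates agree with the reported AER values to within rounding.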