
Table 4 Evaluation of the OW and FS methods by simulation with a sex-dependent heterogeneous treatment effect, by exposure prevalence, with the observed outcome risk or a simulated outcome risk of 1%

From: Comparison of two propensity score-based methods for balancing covariates: the overlap weighting and fine stratification methods in real-world claims data

Heterogeneity factor = sex and outcome risk = 27.75% (observed)

| Exposure prevalence | Method | MB | rbias (%) | SE | SD(rbias) | rMSE | Coverage (%) | CoverageT (%) | Significance (%) | N used |
|---|---|---|---|---|---|---|---|---|---|---|
| 2.5% | Crude | 0.6492 | -84.184 | 0.154 | 24.130 | 0.557 | 0 | 0 | 13.6 | 4000 |
| | FSF-equ | 0.1917 | -60.292 | 0.181 | 23.414 | 0.411 | 0 | 0 | 31.0 | 3848 |
| | FSX-equ | 0.1873 | -57.772 | 0.184 | 23.564 | 0.397 | 0 | 0 | 32.6 | 3336 |
| | FSF-unequ | 0.2832 | -60.292 | 0.181 | 23.414 | 0.411 | 0 | 0 | 31.0 | 3848 |
| | FSX-unequ | 0.2916 | -57.772 | 0.184 | 23.564 | 0.397 | 0 | 0 | 32.6 | 3336 |
| | OWF | 0.0418 | -55.897 | 0.156 | 19.064 | 0.376 | 0 | 0 | 45.0 | 4000 |
| | OWX | 0.0293 | -55.757 | 0.157 | 19.331 | 0.375 | 0 | 0 | 45.4 | 3450 |
| 10% | Crude | 0.5543 | -81.638 | 0.080 | 12.170 | 0.525 | 0 | 0 | 33.2 | 4000 |
| | FSF-equ | 0.0826 | -58.987 | 0.084 | 11.299 | 0.382 | 0 | 0 | 87.6 | 3968 |
| | FSX-equ | 0.0776 | -58.745 | 0.085 | 11.255 | 0.380 | 0 | 0 | 89.4 | 3804 |
| | FSF-unequ | 0.1171 | -58.987 | 0.084 | 11.299 | 0.382 | 0 | 0 | 87.6 | 3968 |
| | FSX-unequ | 0.1127 | -58.745 | 0.085 | 11.255 | 0.380 | 0 | 0 | 89.4 | 3804 |
| | OWF | 0.0008 | -54.540 | 0.082 | 10.949 | 0.354 | 0 | 0 | 95.8 | 4000 |
| | OWX | 0.0001 | -54.538 | 0.082 | 10.988 | 0.354 | 0 | 0 | 95.4 | 3829 |
| 30% | Crude | 0.5133 | -82.043 | 0.053 | 8.568 | 0.525 | 0 | 0 | 58.6 | 4000 |
| | FSF-equ | 0.0495 | -58.906 | 0.054 | 7.183 | 0.377 | 0 | 0 | 99.8 | 3985 |
| | FSX-equ | 0.0478 | -58.926 | 0.055 | 7.131 | 0.377 | 0 | 0 | 100.0 | 3926 |
| | FSF-unequ | 0.0697 | -58.906 | 0.054 | 7.183 | 0.377 | 0 | 0 | 99.8 | 3985 |
| | FSX-unequ | 0.0674 | -58.926 | 0.055 | 7.131 | 0.377 | 0 | 0 | 100.0 | 3926 |
| | OWF | 0.0003 | -55.627 | 0.055 | 7.295 | 0.357 | 0 | 0 | 99.8 | 4000 |
| | OWX | 0.0000 | -55.606 | 0.055 | 7.306 | 0.357 | 0 | 0 | 100.0 | 3939 |

Heterogeneity factor = sex and outcome risk = 1% *

| Exposure prevalence | Method | MB | rbias (%) | SE | SD(rbias) | rMSE | Coverage (%) | CoverageT (%) | Significance (%) | N used |
|---|---|---|---|---|---|---|---|---|---|---|
| 10% | Crude | 0.5556 | -121.79 | 0.537 | 334.460 | 2.263 | 60.2 | 6.6 | 7.4 | 4000 |
| | FSF-equ | 0.0832 | -60.404 | 0.567 | 381.654 | 2.457 | 92.8 | 22.6 | 24.0 | 3968 |
| | FSX-equ | 0.0787 | -55.555 | 0.571 | 344.830 | 2.221 | 92.2 | 22.6 | 24.0 | 3805 |
| | FSF-unequ | 0.1177 | -60.404 | 0.567 | 381.654 | 2.457 | 92.8 | 22.6 | 24.0 | 3968 |
| | FSX-unequ | 0.1140 | -55.555 | 0.571 | 344.830 | 2.221 | 92.2 | 22.6 | 24.0 | 3805 |
| | OWF | 0.0008 | -30.804 | 0.548 | 374.815 | 2.391 | 97.6 | 33.6 | 34.8 | 4000 |
| | OWX | 0.0001 | -31.930 | 0.552 | 374.932 | 2.393 | 97.6 | 31.6 | 32.8 | 3830 |
| 30% | Crude | 0.5141 | -85.691 | 0.345 | 59.821 | 0.665 | 23.4 | 5.8 | 6.4 | 4000 |
| | FSF-equ | 0.0495 | -15.029 | 0.356 | 58.794 | 0.386 | 79.2 | 38.8 | 39.0 | 3984 |
| | FSX-equ | 0.0473 | -17.597 | 0.358 | 59.135 | 0.392 | 76.6 | 37.6 | 37.8 | 3925 |
| | FSF-unequ | 0.0697 | -15.029 | 0.356 | 58.794 | 0.386 | 79.2 | 38.8 | 39.0 | 3984 |
| | FSX-unequ | 0.0668 | -17.597 | 0.358 | 59.135 | 0.392 | 76.6 | 37.6 | 37.8 | 3925 |
| | OWF | 0.0003 | 9.529 | 0.358 | 58.189 | 0.375 | 89.4 | 54.0 | 54.2 | 4000 |
| | OWX | 0.0000 | 7.850 | 0.361 | 58.524 | 0.375 | 89.6 | 52.8 | 52.8 | 3939 |

Footnotes:
1. Crude = summarized from the raw data without any balancing method; OW = overlap weighting method; FS = propensity score-based fine stratification method.
2. 'F' = the full set of data; 'X' = a subset of the data after removing unmatched observations.
3. 'equ' = ATE with equal weighting between groups; 'unequ' = ATE with unequal weighting, where the total weight in each group equals the sample size of that group.
4. The best values are bolded and can be used to guide which method performs best per evaluation criterion.
5. MB = Mahalanobis balance; rbias = relative bias, calculated as 100 × (estimated effect − true effect)/true effect; SE = average estimated standard error; SD(rbias) = empirical standard deviation of the relative bias (× 100); rMSE = root mean squared error, combining the squared bias (not the relative bias) and its variance; Coverage = percentage of samples whose 95% CI covers the true effect; CoverageT = percentage of samples whose 95% CI covers the true effect but not zero; Significance = percentage of samples with a significant effect (weighted GLM, two-sided p-value < 0.05); N used = average total sample size used in the GLM.
6. The true sex-dependent treatment effect was 63.59%, calculated as the observed female proportion (63.59%) times the true effect (= 1).
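The evaluation criteria defined in footnotes 5 and 6 can be computed from per-replicate model output roughly as follows. This is a minimal Python sketch, not the authors' code: the `Replicate` container and `summarize` function are hypothetical names, and the sketch assumes each simulated data set has already been analysed (for example by the weighted GLM) to give a point estimate, its standard error, a 95% CI and a two-sided p-value. MB and N used are not computed here, and Coverage, CoverageT and Significance are returned as percentages to match the table.

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev

# Footnote 6: true sex-dependent effect = observed female proportion x true effect (= 1).
FEMALE_PROPORTION = 0.6359
TRUE_EFFECT = FEMALE_PROPORTION * 1.0


@dataclass
class Replicate:
    """Hypothetical container for one simulated data set's model output."""
    estimate: float  # estimated treatment effect
    se: float        # estimated standard error
    ci_low: float    # lower bound of the 95% CI
    ci_high: float   # upper bound of the 95% CI
    p_value: float   # two-sided p-value for the treatment effect


def summarize(reps: list[Replicate], true_effect: float = TRUE_EFFECT,
              alpha: float = 0.05) -> dict[str, float]:
    """Compute the Table 4 criteria (except MB and N used) over replicates."""
    n = len(reps)
    ests = [r.estimate for r in reps]
    # Relative bias of each replicate, already multiplied by 100.
    rel = [100 * (e - true_effect) / true_effect for e in ests]
    covers = [r.ci_low <= true_effect <= r.ci_high for r in reps]
    excludes_zero = [not (r.ci_low <= 0 <= r.ci_high) for r in reps]
    return {
        "rbias": mean(rel),
        "SE": mean(r.se for r in reps),
        # Empirical SD of the relative bias (n - 1 denominator).
        "SD(rbias)": stdev(rel),
        # Root mean squared error on the absolute scale; equals
        # sqrt(bias^2 + variance) with the variance taken over replicates.
        "rMSE": math.sqrt(mean((e - true_effect) ** 2 for e in ests)),
        "Coverage": 100 * sum(covers) / n,
        # CI covers the true effect AND excludes zero.
        "CoverageT": 100 * sum(c and z for c, z in zip(covers, excludes_zero)) / n,
        "Significance": 100 * sum(r.p_value < alpha for r in reps) / n,
    }
```

As a rough consistency check against the table, the crude row at 2.5% exposure (observed risk) has rbias = -84.184 and SD(rbias) = 24.130; with a true effect of 0.6359 these imply an absolute bias of about 0.535 and an SD of about 0.153, and sqrt(0.535² + 0.153²) ≈ 0.557, which matches the reported rMSE up to the small difference between n and n − 1 denominators.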