Table 4 Evaluation of OW and FS methods by simulations with sex-dependent heterogeneous treatment effect by outcome risk along with observed/simulated exposure.*

From: Comparison of two propensity score-based methods for balancing covariates: the overlap weighting and fine stratification methods in real-world claims data

Heterogeneity factor = sex and exposure prevalence = 10.55% (observed)

| Outcome risk | Methods | MB | rbias | SE | SD(rBias) | rMSE | Coverage | CoverageT | Significance | N used |
|---|---|---|---|---|---|---|---|---|---|---|
| 1% | Crude | 0.5499 | -131.30 | 0.531 | 372.130 | 2.509 | 61.0 | 3.2 | 4.2 | 4000 |
| | FSF−equ | 0.0831 | -61.15 | 0.557 | 379.639 | 2.445 | 92.8 | 20.2 | 21.2 | 3970 |
| | FSX−equ | 0.0778 | -64.61 | 0.560 | 382.972 | 2.470 | 92.2 | 18.6 | 20.0 | 3815 |
| | FSF−unequ | 0.1170 | -61.15 | 0.557 | 379.639 | 2.445 | 92.8 | 20.2 | 21.2 | 3970 |
| | FSX−unequ | 0.1121 | -64.61 | 0.560 | 382.972 | 2.470 | 92.2 | 18.6 | 20.0 | 3815 |
| | OWF | 0.0007 | -41.72 | 0.542 | 417.803 | 2.670 | 97.6 | 28.6 | 30.0 | 4000 |
| | OWX | 0.0001 | -43.52 | 0.547 | 417.796 | 2.671 | 97.4 | 28.0 | 29.4 | 3838 |
| 10% | Crude | 0.5502 | -85.10 | 0.149 | 23.024 | 0.561 | 0.0 | 0.0 | 11.8 | 4000 |
| | FSF−equ | 0.0819 | -44.47 | 0.159 | 21.119 | 0.313 | 0.0 | 0.0 | 62.4 | 3968 |
| | FSX−equ | 0.0754 | -44.88 | 0.159 | 21.152 | 0.316 | 0.0 | 0.0 | 63.6 | 3811 |
| | FSF−unequ | 0.1153 | -44.47 | 0.159 | 21.119 | 0.313 | 0.0 | 0.0 | 62.4 | 3968 |
| | FSX−unequ | 0.1087 | -44.88 | 0.159 | 21.152 | 0.316 | 0.0 | 0.0 | 63.6 | 3811 |
| | OWF | 0.0007 | -36.06 | 0.152 | 20.265 | 0.263 | 0.4 | 0.4 | 77.6 | 4000 |
| | OWX | 0.0001 | -36.28 | 0.153 | 20.336 | 0.264 | 0.2 | 0.2 | 77.6 | 3836 |
| 30% | Crude | 0.5552 | -82.47 | 0.074 | 11.697 | 0.530 | 0.0 | 0.0 | 35.2 | 4000 |
| | FSF−equ | 0.0821 | -61.37 | 0.078 | 10.583 | 0.396 | 0.0 | 0.0 | 88.0 | 3971 |
| | FSX−equ | 0.0770 | -61.35 | 0.078 | 10.521 | 0.396 | 0.0 | 0.0 | 88.4 | 3816 |
| | FSF−unequ | 0.1157 | -61.37 | 0.078 | 10.583 | 0.396 | 0.0 | 0.0 | 88.0 | 3971 |
| | FSX−unequ | 0.1114 | -61.35 | 0.078 | 10.521 | 0.396 | 0.0 | 0.0 | 88.4 | 3816 |
| | OWF | 0.0007 | -56.94 | 0.076 | 10.124 | 0.368 | 0.0 | 0.0 | 96.2 | 4000 |
| | OWX | 0.0001 | -56.96 | 0.076 | 10.212 | 0.368 | 0.0 | 0.0 | 96.0 | 3841 |

Heterogeneity factor = sex and exposure prevalence = 2.5%

| Outcome risk | Methods | MB | rbias | SE | SD(rBias) | rMSE | Coverage | CoverageT | Significance | N used |
|---|---|---|---|---|---|---|---|---|---|---|
| 10% | Crude | 0.6445 | -92.149 | 0.301 | 6.468 | 0.665 | 3.4 | 3.4 | 7.8 | 4000 |
| | FSF−equ | 0.1908 | -48.554 | 0.352 | 7.558 | 0.459 | 58.2 | 20.4 | 20.6 | 3860 |
| | FSX−equ | 0.1855 | -47.464 | 0.357 | 7.666 | 0.469 | 60 | 21.0 | 21.0 | 3355 |
| | FSF−unequ | 0.2808 | -48.554 | 0.352 | 7.558 | 0.459 | 58.2 | 20.4 | 20.6 | 3860 |
| | FSX−unequ | 0.2867 | -47.464 | 0.357 | 7.666 | 0.469 | 60 | 21.0 | 21.0 | 3355 |
| | OWF | 0.0295 | -41.272 | 0.304 | 6.527 | 0.378 | 47.6 | 26.6 | 27.8 | 4000 |
| | OWX | 0.0311 | -40.999 | 0.306 | 6.565 | 0.376 | 47.2 | 26.4 | 27.2 | 3454 |
| 30% | Crude | 0.6535 | -83.221 | 0.145 | 22.839 | 0.549 | 0.0 | 0.0 | 13.6 | 4000 |
| | FSF−equ | 0.1962 | -59.869 | 0.170 | 22.499 | 0.407 | 0.0 | 0.0 | 36.2 | 3855 |
| | FSX−equ | 0.1848 | -57.065 | 0.171 | 22.320 | 0.390 | 0.0 | 0.0 | 38.0 | 3347 |
| | FSF−unequ | 0.2904 | -59.869 | 0.170 | 22.499 | 0.407 | 0.0 | 0.0 | 36.2 | 3855 |
| | FSX−unequ | 0.2882 | -57.065 | 0.171 | 22.320 | 0.390 | 0.0 | 0.0 | 38.0 | 3347 |
| | OWF | 0.0065 | -56.260 | 0.147 | 18.623 | 0.377 | 0.0 | 0.0 | 49.6 | 4000 |
| | OWX | 0.1147 | -57.614 | 0.149 | 33.549 | 0.424 | 0.0 | 0.0 | 48.8 | 3449 |

Footnotes:

* The simulation scenario with 1% outcome risk and 2.5% exposure prevalence was not conducted, because the combination of a rare event and a rare exposure resulted in complete or quasi-complete separation of data points (shown in Table 2).

  1. Crude = summarized from the raw data without any balancing method; OW = overlap weighting method; FS = propensity score-based fine stratification method.
  2. ‘F’ = the full set of data; ‘X’ = the subset of data remaining after removing unmatched observations.
  3. ‘equ’ = ATE with equal weighting between groups; ‘unequ’ = ATE with unequal weighting, where the total weight in a group is equivalent to the sample size in that group.
  4. The best values are bolded and can be used to guide which method performs best per evaluation criterion.
  5. MB = Mahalanobis balance; rbias = relative bias, calculated as 100 × (estimated effect − true effect) / true effect; SE = average estimated standard error; SD(rBias) = empirical standard deviation of relative bias × 100; rMSE = square root of the mean squared error, which combines the squared bias (not relative bias) and its variance; Coverage = proportion of samples whose 95% CI covers the true effect; CoverageT = proportion of samples whose 95% CI covers the true effect but not zero; Significance = proportion of samples obtaining a significant effect (by a weighted GLM with a two-sided p-value < 0.05); N used = average total sample size that was used further for the GLM.
  6. The true sex-dependent treatment effect was 63.59%, calculated as the observed female proportion (63.59%) times the true effect (= 1).
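
The definitions in footnotes 5 and 6 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `summarize` and `rmse_from_row` are hypothetical, the sample standard deviation (n − 1 denominator) is an assumption, and a Wald 95% interval that excludes zero stands in for the two-sided GLM p-value < 0.05. The helper `rmse_from_row` further assumes that rbias and SD(rBias) are both expressed as percentages of the true effect (0.6359).

```python
import math
from statistics import mean, stdev

# Footnote 6: observed female proportion (63.59%) times the true effect (1).
TRUE_EFFECT = 0.6359


def summarize(estimates, std_errors, true_effect=TRUE_EFFECT, z=1.96):
    """Summarize simulation replicates following the footnote 5 definitions.

    Assumptions (not from the source): Wald intervals est +/- z*SE, and
    significance approximated by the interval excluding zero.
    """
    rbias = [100 * (est - true_effect) / true_effect for est in estimates]
    bias = mean(estimates) - true_effect
    cover = cover_t = signif = 0
    for est, se in zip(estimates, std_errors):
        lo, hi = est - z * se, est + z * se
        has_true = lo <= true_effect <= hi
        has_zero = lo <= 0.0 <= hi
        cover += has_true                    # Coverage
        cover_t += has_true and not has_zero  # CoverageT
        signif += not has_zero               # Significance (proxy)
    n = len(estimates)
    return {
        "rbias": mean(rbias),
        "SE": mean(std_errors),
        "SD(rBias)": stdev(rbias),
        # rMSE combines the squared (absolute) bias and the variance.
        "rMSE": math.sqrt(bias ** 2 + stdev(estimates) ** 2),
        "Coverage": 100 * cover / n,
        "CoverageT": 100 * cover_t / n,
        "Significance": 100 * signif / n,
    }


def rmse_from_row(rbias_pct, sd_rbias_pct, true_effect=TRUE_EFFECT):
    """Rebuild rMSE from a table row, assuming both inputs are percentages."""
    bias = rbias_pct / 100 * true_effect
    sd = sd_rbias_pct / 100 * true_effect
    return math.sqrt(bias ** 2 + sd ** 2)


# Crude row, 10% outcome risk, observed exposure prevalence (rMSE listed as 0.561).
print(round(rmse_from_row(-85.10, 23.024), 3))
```

Under these assumptions the last line should print 0.561, which agrees with the rMSE reported for that row. The same reading also reproduces the Crude row at 1% outcome risk (2.509), but it is offered as a plausibility check on a few rows of the observed-prevalence block only and is not guaranteed to hold for every row of the table.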