Table 3 Distribution of CVD and non-CVD cases in each cluster with different predetermined number of clusters in the training set

From: Detecting cardiovascular diseases using unsupervised machine learning clustering based on electronic medical records

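| Predetermined No. clusters | Presence of CVD | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Cluster 8 | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| k = 2 | Non-CVD | 8402 | 73,478 | | | | | | | 81,880 |
| | CVD | 46,619 | 11,805 | | | | | | | 58,424 |
| | Total | 55,021 | 85,283 | | | | | | | 140,304 |
| | % of CVD | 0.8473 | 0.1384 | | | | | | | |
| k = 4 | Non-CVD | 3650 | 72,690 | 5538 | 2 | | | | | 81,880 |
| | CVD | 2889 | 10,743 | 44,792 | 0 | | | | | 58,424 |
| | Total | 6539 | 83,433 | 50,330 | 2 | | | | | 140,304 |
| | % of CVD | 0.4418 | 0.1288 | 0.8900 | 0 | | | | | |
| k = 8 | Non-CVD | 53,631 | 4367 | 4550 | 14 | 4 | 2 | 3025 | 16,287 | 81,880 |
| | CVD | 5554 | 7286 | 39,100 | 11 | 2 | 0 | 2255 | 4216 | 58,424 |
| | Total | 59,185 | 11,653 | 43,650 | 25 | 6 | 2 | 5280 | 20,503 | 140,304 |
| | % of CVD | 0.0938 | 0.6252 | 0.8958 | 0.4400 | 0.3333 | 0 | 0.4271 | 0.2056 | |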
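
The "% of CVD" row appears to be reported as a fraction rather than a percentage: each value matches the CVD count divided by the cluster total (CVD plus non-CVD). The sketch below recomputes that row for the k = 4 block as a consistency check. The function name `cvd_share` and the list layout are illustrative assumptions, not anything taken from the article.

```python
# Counts transcribed from the k = 4 block of Table 3 (training set).
non_cvd = [3650, 72690, 5538, 2]
cvd = [2889, 10743, 44792, 0]


def cvd_share(cvd_counts, non_cvd_counts):
    """Return the fraction of CVD cases per cluster: CVD / (CVD + non-CVD).

    Hypothetical helper for illustration; values are rounded to 4 decimals
    to match the precision used in the table.
    """
    shares = []
    for c, n in zip(cvd_counts, non_cvd_counts):
        total = c + n
        # Guard against an empty cluster, which has no defined share.
        shares.append(round(c / total, 4) if total else float("nan"))
    return shares


shares = cvd_share(cvd, non_cvd)
print(shares)  # [0.4418, 0.1288, 0.89, 0.0]

# The row totals should also be recoverable from the per-cluster counts.
print(sum(non_cvd), sum(cvd))  # 81880 58424
```

The same function can be applied to the k = 2 and k = 8 blocks by swapping in their counts.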
