Functions for descriptive statistics

You can easily make tables summarizing descriptive statistics with the webr package.

Installation of packages

Install the latest versions of the “webr” and “moonBook” packages from GitHub. The “rrtable” package is needed only for the reproducible research examples.

if(!require(devtools)) install.packages("devtools")
devtools::install_github("cardiomoon/webr")
devtools::install_github("cardiomoon/moonBook")   # For examples
devtools::install_github("cardiomoon/rrtable")    # For reproducible research

Load packages

require(webr)
require(moonBook) # For data acs

Summarizing Frequencies

You can summarize frequencies with the freqSummary() function. You can also make a table summarizing the frequencies with the freqTable() function.

freqSummary(acs$Dx)
                Count Percent Valid Percent Cum Percent
NSTEMI          "153" "17.9"  "17.9"        "17.9"     
STEMI           "304" "35.5"  "35.5"        "53.3"     
Unstable Angina "400" "46.7"  "46.7"        "100.0"    
Sum             "857" "100.0" "100.0"       ""         
freqTable(acs$Dx)

rowname          Count  Percent  Valid Percent  Cum Percent
NSTEMI             153     17.9           17.9         17.9
STEMI              304     35.5           35.5         53.3
Unstable Angina    400     46.7           46.7        100.0
Sum                857    100.0          100.0
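
As a rough illustration of what these columns contain, the counts and percentages can be reproduced in base R. This is a minimal sketch, not webr's implementation: it uses the built-in mtcars data rather than acs, the helper name freq_sketch is hypothetical, and it assumes that "Percent" is relative to all observations while "Valid Percent" is relative to non-missing observations only.

```r
# Hypothetical helper: a base R sketch of a frequency summary.
# Assumption: "Percent" uses all observations as the denominator,
# "Valid Percent" uses only the non-missing observations.
freq_sketch <- function(x) {
  counts  <- table(x)            # table() drops NA values by default
  n_all   <- length(x)           # all observations, including NA
  n_valid <- sum(counts)         # non-missing observations
  data.frame(
    value         = names(counts),
    Count         = as.vector(counts),
    Percent       = round(100 * as.vector(counts) / n_all, 1),
    Valid.Percent = round(100 * as.vector(counts) / n_valid, 1),
    Cum.Percent   = round(100 * cumsum(as.vector(counts)) / n_valid, 1)
  )
}

res <- freq_sketch(mtcars$cyl)
print(res)
```

With no missing values, as in acs$Dx above, the Percent and Valid Percent columns coincide.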

Ready for reproducible research

The freqTable() function returns an object of class “flextable”. With this object, you can easily produce HTML, PDF, docx, or pptx files.

result=freqTable(acs$Dx)
class(result)
[1] "flextable"
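
Because the result is a flextable, it can presumably be written to a file with the flextable package's own save functions. The following is a minimal sketch, assuming flextable is installed. It builds a stand-in flextable from the built-in mtcars data instead of calling freqTable(), and the file names are placeholders.

```r
# Sketch: export a flextable object to Word and PowerPoint files.
# Assumption: the flextable package is installed; `ft` stands in for
# the object returned by freqTable().
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- flextable::flextable(head(mtcars[, 1:4]))
  out_dir <- tempdir()
  flextable::save_as_docx(ft, path = file.path(out_dir, "table.docx"))
  flextable::save_as_pptx(ft, path = file.path(out_dir, "table.pptx"))
}
```

flextable also provides save_as_html() for HTML output.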

Frequency table for a continuous variable

You can also make a frequency table for a continuous variable. Because every distinct value gets its own row, the table can be long.

freqTable(mtcars$mpg)

rowname  Count  Percent  Valid Percent  Cum Percent
10.4         2      6.2            6.2          6.2
13.3         1      3.1            3.1          9.4
14.3         1      3.1            3.1         12.5
14.7         1      3.1            3.1         15.6
15           1      3.1            3.1         18.8
15.2         2      6.2            6.2         25.0
15.5         1      3.1            3.1         28.1
15.8         1      3.1            3.1         31.2
16.4         1      3.1            3.1         34.4
17.3         1      3.1            3.1         37.5
17.8         1      3.1            3.1         40.6
18.1         1      3.1            3.1         43.8
18.7         1      3.1            3.1         46.9
19.2         2      6.2            6.2         53.1
19.7         1      3.1            3.1         56.2
21           2      6.2            6.2         62.5
21.4         2      6.2            6.2         68.8
21.5         1      3.1            3.1         71.9
22.8         2      6.2            6.2         78.1
24.4         1      3.1            3.1         81.2
26           1      3.1            3.1         84.4
27.3         1      3.1            3.1         87.5
30.4         2      6.2            6.2         93.8
32.4         1      3.1            3.1         96.9
33.9         1      3.1            3.1        100.0
Sum         32    100.0          100.0
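
If one row per distinct value is too detailed, one option is to bin the variable first and tabulate the bins. The following is a minimal base R sketch. The 5-unit bin width is an arbitrary choice for illustration, and whether freqTable() handles the resulting factor in the same way has not been checked here.

```r
# Bin mpg into 5-unit intervals, then count the observations per bin.
# cut() uses right-closed intervals by default: (10,15], (15,20], ...
mpg_bins   <- cut(mtcars$mpg, breaks = seq(10, 35, by = 5))
bin_counts <- table(mpg_bins)
print(bin_counts)

# Percentage of observations in each bin, rounded to one decimal place
print(round(100 * prop.table(bin_counts), 1))
```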

Frequency table for two categorical variables

You can make a cross-tabulation of two categorical variables, together with a chi-squared test of their independence.

x2Table(acs,Dx,sex)

rowname          Female        Male          Total
NSTEMI           50 (32.7%)    103 (67.3%)   153 (100 %)
STEMI            84 (27.6%)    220 (72.4%)   304 (100 %)
Unstable Angina  153 (38.2%)   247 (61.8%)   400 (100 %)
Total            287 (33.5%)   570 (66.5%)   857 (100 %)

Chi-squared=8.798, df=2, Cramer's V=0.101, Chi-squared p=0.0123
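
The statistics in the footer can be checked from the counts alone. The sketch below is base R, not webr's own code. It re-enters the counts from the table above, runs a Pearson chi-squared test, and computes Cramer's V as sqrt(chi2 / (n * (min(rows, cols) - 1))).

```r
# Counts copied from the Dx-by-sex table above
tab <- matrix(c(50, 84, 153,      # Female
                103, 220, 247),   # Male
              nrow = 3,
              dimnames = list(Dx  = c("NSTEMI", "STEMI", "Unstable Angina"),
                              sex = c("Female", "Male")))

test <- chisq.test(tab)           # Pearson chi-squared test of independence
n    <- sum(tab)

# Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))

cat(sprintf("Chi-squared=%.3f, df=%d, Cramer's V=%.3f, p=%.4f\n",
            test$statistic, test$parameter, cramers_v, test$p.value))
```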

By default the percentages are row-wise. You can make a table with column-wise percentages by setting margin=2.

x2Table(acs,Dx,sex,margin=2)

rowname          Female        Male          Total
NSTEMI           50 (17.4%)    103 (18.1%)   153 (17.9%)
STEMI            84 (29.3%)    220 (38.6%)   304 (35.5%)
Unstable Angina  153 (53.3%)   247 (43.3%)   400 (46.7%)
Total            287 (100 %)   570 (100 %)   857 (100 %)

Chi-squared=8.798, df=2, Cramer's V=0.101, Chi-squared p=0.0123
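
The two layouts correspond to the usual row and column proportions. The following base R sketch uses the counts from the tables above. It assumes that the margin argument of x2Table() follows the same convention as prop.table() (1 = rows, 2 = columns), which matches the output shown but has not been confirmed against the package source.

```r
# Counts copied from the Dx-by-sex tables above
tab <- matrix(c(50, 84, 153,      # Female
                103, 220, 247),   # Male
              nrow = 3,
              dimnames = list(Dx  = c("NSTEMI", "STEMI", "Unstable Angina"),
                              sex = c("Female", "Male")))

row_pct <- round(100 * prop.table(tab, margin = 1), 1)  # each row sums to 100
col_pct <- round(100 * prop.table(tab, margin = 2), 1)  # each column sums to 100

print(row_pct)
print(col_pct)
```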

You can hide the percentages.

x2Table(acs,Dx,sex,show.percent=FALSE)

rowname          Female  Male  Total
NSTEMI               50   103    153
STEMI                84   220    304
Unstable Angina     153   247    400
Total               287   570    857

Chi-squared=8.798, df=2, Cramer's V=0.101, Chi-squared p=0.0123

Numerical summary

Numerical summary of a vector

You can make a numerical summary with the numSummary() function. Applied to a continuous vector, it returns the following summary. This function uses psych::describe() internally.

require(dplyr)
numSummary(acs$age)
# A tibble: 1 × 12
      n  mean    sd median trimmed   mad   min   max range   skew kurtosis    se
  <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl>
1   857  63.3  11.7     64    63.6  13.3    28    91    63 -0.175   -0.566 0.400
numSummaryTable(acs$age)

     n    mean     sd  median  trimmed    mad     min     max   range   skew  kurtosis    se
857.00   63.31  11.70   64.00    63.56  13.34   28.00   91.00   63.00  -0.18     -0.57  0.40
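
Most of these columns have direct base R counterparts. The sketch below uses the built-in mtcars data. It is not the psych::describe() implementation. It assumes that "trimmed" is a 10% trimmed mean and that "mad" uses the default scaling of stats::mad(), which is believed to match the psych::describe() defaults. Skew and kurtosis are left out because their exact definitions depend on the estimator type used.

```r
x <- mtcars$mpg
x <- x[!is.na(x)]                    # drop missing values first
n <- length(x)

summary_sketch <- c(
  n       = n,
  mean    = mean(x),
  sd      = sd(x),
  median  = median(x),
  trimmed = mean(x, trim = 0.1),     # assumed 10% trimmed mean
  mad     = mad(x),                  # median absolute deviation, scaled by 1.4826
  min     = min(x),
  max     = max(x),
  range   = max(x) - min(x),
  se      = sd(x) / sqrt(n)          # standard error of the mean
)
print(round(summary_sketch, 2))
```

As a consistency check on the table above, the se column for age agrees with sd / sqrt(n): 11.70 / sqrt(857) is approximately 0.40.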

Numerical summary of a data.frame or a tibble

You can make a numerical summary of a data.frame. The numSummary() function uses is.numeric() to select the numeric columns and summarizes each of them.

numSummary(acs)
# A tibble: 9 × 13
  vars      n  mean    sd median trimmed   mad   min   max range   skew kurtosis
  <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1 age     857  63.3 11.7    64      63.6 13.3   28    91    63   -0.175  -0.566 
2 EF      723  55.8  9.62   58.1    56.8  7.86  18    79    61   -0.978   1.11  
3 heig…   764 163.   9.08  165     164.   7.41 130   185    55   -0.440  -0.0145
4 weig…   766  64.8 11.4    65      64.5 10.4   30   112    82    0.336   0.444 
5 BMI     764  24.3  3.35   24.2    24.2  3.01  15.6  41.4  25.8  0.668   2.12  
6 TC      834 185.  47.8   183     184.  43.0   25   493   468    0.737   3.77  
7 LDLC    833 117.  41.1   114     115.  40.0   15   366   351    0.787   2.33  
8 HDLC    834  38.2 11.1    38      38.0 10.4    4    89    85    0.366   1.46  
9 TG      842 125.  90.9   106.    111.  60.0   11   877   866    3.02   14.9   
# ℹ 1 more variable: se <dbl>
numSummaryTable(acs)

rowname  vars         n    mean     sd  median  trimmed    mad     min     max   range   skew  kurtosis    se
1        age     857.00   63.31  11.70   64.00    63.56  13.34   28.00   91.00   63.00  -0.18     -0.57  0.40
2        EF      723.00   55.83   9.62   58.10    56.77   7.86   18.00   79.00   61.00  -0.98      1.11  0.36
3        height  764.00  163.18   9.08  165.00   163.52   7.41  130.00  185.00   55.00  -0.44     -0.01  0.33
4        weight  766.00   64.84  11.36   65.00    64.55  10.38   30.00  112.00   82.00   0.34      0.44  0.41
5        BMI     764.00   24.28   3.35   24.16    24.16   3.01   15.62   41.42   25.80   0.67      2.12  0.12
6        TC      834.00  185.20  47.77  183.00   183.76  43.00   25.00  493.00  468.00   0.74      3.77  1.65
7        LDLC    833.00  116.58  41.09  114.00   114.62  40.03   15.00  366.00  351.00   0.79      2.33  1.42
8        HDLC    834.00   38.24  11.09   38.00    37.95  10.38    4.00   89.00   85.00   0.37      1.46  0.38
9        TG      842.00  125.24  90.85  105.50   111.29  60.05   11.00  877.00  866.00   3.02     14.91  3.13
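
The column selection step can be pictured in base R as follows. This is a minimal sketch using the built-in iris data, which has four numeric columns and one factor. It only illustrates the is.numeric() filter and a reduced set of statistics, not the full numSummary() output.

```r
# Flag the numeric columns, then keep only those
num_cols     <- vapply(iris, is.numeric, logical(1))
numeric_only <- iris[, num_cols, drop = FALSE]

# One row per numeric column: n (non-missing), mean and sd
col_summary <- t(sapply(numeric_only, function(x) {
  c(n    = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE))
}))
print(round(col_summary, 2))
```

The factor column Species is dropped by the filter, in the same way that non-numeric columns of acs such as sex and Dx do not appear in the summary above.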

Using dplyr::group_by() and dplyr::select() to summarize

You can use the dplyr::select() function to select the variables to summarize.

acs %>% select(age,EF) %>% numSummary
# A tibble: 2 × 13
  vars      n  mean    sd median trimmed   mad   min   max range   skew kurtosis
  <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl>
1 age     857  63.3 11.7    64      63.6 13.3     28    91    63 -0.175   -0.566
2 EF      723  55.8  9.62   58.1    56.8  7.86    18    79    61 -0.978    1.11 
# ℹ 1 more variable: se <dbl>
acs %>% select(age,EF) %>% numSummaryTable

rowname  vars         n    mean     sd  median  trimmed    mad     min     max   range   skew  kurtosis    se
1        age     857.00   63.31  11.70   64.00    63.56  13.34   28.00   91.00   63.00  -0.18     -0.57  0.40
2        EF      723.00   55.83   9.62   58.10    56.77   7.86   18.00   79.00   61.00  -0.98      1.11  0.36

You can combine dplyr::group_by() and dplyr::select() to summarize the selected variables by group.

acs %>% group_by(sex) %>% select(age,EF) %>% numSummary
# A tibble: 4 × 14
  sex    vars      n  mean    sd median trimmed   mad   min   max range    skew
  <chr>  <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1 Male   age     570  60.6 11.2    61      60.6 11.9   28      91  63   -0.0148
2 Male   EF      483  55.6  9.40   57.3    56.4  8.01  18      79  61   -0.789 
3 Female age     287  68.7 10.7    70      69.4 10.4   39      90  51   -0.593 
4 Female EF      240  56.3 10.1    59.2    57.6  7.19  18.4    75  56.6 -1.30  
# ℹ 2 more variables: kurtosis <dbl>, se <dbl>
acs %>% group_by(sex) %>% select(age,EF) %>% numSummaryTable

sex     vars       n    mean     sd  median  trimmed    mad     min     max   range   skew  kurtosis    se
Male    age   570.00   60.61  11.23   61.00    60.65  11.86   28.00   91.00   63.00  -0.01     -0.36  0.47
Male    EF    483.00   55.62   9.40   57.30    56.38   8.01   18.00   79.00   61.00  -0.79      0.76  0.43
Female  age   287.00   68.68  10.73   70.00    69.43  10.38   39.00   90.00   51.00  -0.59     -0.26  0.63
Female  EF    240.00   56.27  10.06   59.25    57.57   7.19   18.40   75.00   56.60  -1.30      1.70  0.65
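
The shape of this result, one row per combination of group and variable, can be sketched in base R. The example below uses the built-in mtcars data with am as the grouping variable. The helper summarise_one is hypothetical and computes only n, mean and sd. It is meant to show the layout, not to reproduce numSummary().

```r
# Hypothetical helper: a reduced summary of one numeric vector
summarise_one <- function(x) {
  c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# Split the selected variables by group, then summarize each variable per group
by_group <- split(mtcars[c("mpg", "hp")], mtcars$am)
rows <- list()
for (g in names(by_group)) {
  for (v in names(by_group[[g]])) {
    rows[[length(rows) + 1]] <- data.frame(
      am   = g,
      vars = v,
      t(summarise_one(by_group[[g]][[v]]))   # 1-row matrix with columns n, mean, sd
    )
  }
}
grouped <- do.call(rbind, rows)
print(grouped)
```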

You can summarize by multiple groups.

acs %>% group_by(sex,Dx) %>% select(age,EF) %>% numSummary
# A tibble: 12 × 15
   sex    Dx      vars      n  mean    sd median trimmed   mad   min   max range
   <chr>  <chr>   <chr> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Male   STEMI   age     220  59.4 11.7    59.5    59.4 11.1   30    86    56  
 2 Male   STEMI   EF      195  52.4  8.90   54      52.9  8.45  18    73.6  55.6
 3 Female STEMI   age      84  69.1 10.4    70      70.0 10.4   42    89    47  
 4 Female STEMI   EF       77  52.3 10.9    55.7    53.7  9.04  18.4  67.1  48.7
 5 Male   NSTEMI  age     103  61.1 11.6    59      61.3 13.3   28    85    57  
 6 Male   NSTEMI  EF       94  55.1  9.42   58      55.9  7.12  21.8  74    52.2
 7 Female Unstab… age     153  67.7 10.7    70      68.3  8.90  39    90    51  
 8 Female Unstab… EF      118  59.4  8.76   61.1    60.8  5.49  22    71.9  49.9
 9 Male   Unstab… age     247  61.4 10.6    61      61.4 10.4   35    91    56  
10 Male   Unstab… EF      194  59.1  8.67   60      60.2  5.93  24.7  79    54.3
11 Female NSTEMI  age      50  70.9 11.4    74.5    71.9  8.90  42    88    46  
12 Female NSTEMI  EF       45  54.8  9.10   57      55.3  9.79  36.8  75    38.2
# ℹ 3 more variables: skew <dbl>, kurtosis <dbl>, se <dbl>
acs %>% group_by(sex,Dx) %>% select(age,EF) %>% numSummaryTable

sex     Dx               vars       n    mean     sd  median  trimmed    mad     min     max   range   skew  kurtosis    se
Male    STEMI            age   220.00   59.43  11.72   59.50    59.43  11.12   30.00   86.00   56.00   0.00     -0.55  0.79
Male    STEMI            EF    195.00   52.37   8.90   54.00    52.88   8.45   18.00   73.60   55.60  -0.62      0.53  0.64
Female  STEMI            age    84.00   69.11  10.36   70.00    70.04  10.38   42.00   89.00   47.00  -0.65     -0.09  1.13
Female  STEMI            EF     77.00   52.32  10.94   55.70    53.72   9.04   18.40   67.10   48.70  -1.17      1.01  1.25
Male    NSTEMI           age   103.00   61.15  11.57   59.00    61.28  13.34   28.00   85.00   57.00  -0.11     -0.53  1.14
Male    NSTEMI           EF     94.00   55.08   9.42   58.00    55.86   7.12   21.80   74.00   52.20  -0.83      0.57  0.97
Female  Unstable Angina  age   153.00   67.72  10.67   70.00    68.33   8.90   39.00   90.00   51.00  -0.54     -0.34  0.86
Female  Unstable Angina  EF    118.00   59.40   8.76   61.10    60.79   5.49   22.00   71.90   49.90  -1.86      4.06  0.81
Male    Unstable Angina  age   247.00   61.44  10.57   61.00    61.41  10.38   35.00   91.00   56.00   0.07     -0.15  0.67
Male    Unstable Angina  EF    194.00   59.14   8.67   60.00    60.15   5.93   24.70   79.00   54.30  -1.25      2.54  0.62
Female  NSTEMI           age    50.00   70.88  11.35   74.50    71.88   8.90   42.00   88.00   46.00  -0.72     -0.34  1.61
Female  NSTEMI           EF     45.00   54.85   9.10   57.00    55.26   9.79   36.80   75.00   38.20  -0.32     -0.83  1.36

For reproducible research

You can use the rrtable package for reproducible research.

require(rrtable)
type=c("table","table")
title=c("Frequency Table","Numerical Summary")
code=c("freqTable(acs$Dx)","acs %>% group_by(sex) %>% select(EF,age) %>% numSummaryTable")
data=data.frame(type,title,code,stringsAsFactors = FALSE)
data2pptx(data)
Making File: 123
[1] "/tmp/RtmpFqbmKo/Rbuild1d6345cd72a5/webr/vignettes/./Report.pptx"
data2docx(data)
Making File: 123
[1] "/tmp/RtmpFqbmKo/Rbuild1d6345cd72a5/webr/vignettes/./Report.docx"