Chapter 10 Crosstabs

The ‘flextable’ package will not aggregate data for you but it will help you to present aggregated data (with package ‘data.table’ or ‘dplyr’).

The package allows to transform crosstabs created by tables::tabular() into ‘flexable’ with method as_flextable().

library(tables)
tab <- tabular(
    (Species + 1) ~ (n = 1) + Format(digits = 2) *
      (Sepal.Length + Sepal.Width) * (mean + sd),
    data = iris
  )
as_flextable(tab)

Species

Sepal.Length

Sepal.Width

n

mean

sd

mean

sd

setosa

50

5.01

0.35

3.43

0.38

versicolor

50

5.94

0.52

2.77

0.31

virginica

50

6.59

0.64

2.97

0.32

All

150

5.84

0.83

3.06

0.44

The package provides also the function tabulator() to prepare for the creation of multidimensional crosstabs, possibly with multiple columns. The object can then be transformed as a flextable with as_flextable().

dat <- group_by(warpbreaks, wool, tension) %>%
  summarise(breaks = mean(breaks), .groups = "drop")

cft_1 <- tabulator(
  x = dat,
  rows = "wool",
  columns = "tension",
  `μ` = as_paragraph(breaks),
  `n` = as_paragraph(length(breaks))
)

ft_1 <- as_flextable(cft_1)
ft_1

wool

L

M

H

μ

n

μ

n

μ

n

A

44.56

2

24.00

2

24.56

2

B

28.22

2

28.78

2

18.78

2

10.1 PROC FREQ

Function proc_freq() compute a contingency table and create a flextable from the result. The aim of the function is to reproduce the results of the ‘SAS’ PROC FREQ.

It can produce a two way contingency table:

proc_freq(mtcars, "gear", "vs")

gear

vs

0

1

Total

3

Count

12 (37.5%)

3 (9.4%)

15 (46.9%)

Mar. pct (1)

66.7% ; 80.0%

21.4% ; 20.0%

4

Count

2 (6.2%)

10 (31.2%)

12 (37.5%)

Mar. pct

11.1% ; 16.7%

71.4% ; 83.3%

5

Count

4 (12.5%)

1 (3.1%)

5 (15.6%)

Mar. pct

22.2% ; 80.0%

7.1% ; 20.0%

Total

Count

18 (56.2%)

14 (43.8%)

32 (100.0%)

(1) Columns and rows percentages

It can also produce a one way contingency table:

proc_freq(mtcars, "gear")

gear

Count

Percent

3

15

46.9%

4

12

37.5%

5

5

15.6%

Total

32

100.0%

10.2 Using ‘tables’

Package ‘tables’ is inspired by ‘SAS PROC TABULATE’ and enable the creation of tabular summaries with a smart and clear syntax. Use of ‘tables’ results is straightforward:

  1. create a tabular object with function tables::tabular(),
  2. transform it to a flextable with as_flextable(). One option is available to let display the first group as a row separator.
library(tables)
set_flextable_defaults(theme_fun = theme_vader)
tab <- tabular( (cut*color + 1) ~ (n=1) + Format(digits=2)*
                  (price + x)*(mean + sd), data=ggplot2::diamonds)
print(tab[1:10,])
#                                                
#                 price           x              
#  cut  color n   mean    sd      mean    sd     
#  Fair D     163 4291.06 3286.11    6.02    0.83
#       E     224 3682.31 2976.65    5.91    0.83
#       F     312 3827.00 3223.30    5.99    0.88
#       G     314 4239.25 3609.64    6.17    0.99
#       H     303 5135.68 3886.48    6.58    0.94
#       I     175 4685.45 3730.27    6.56    0.90
#       J     119 4975.66 4050.46    6.75    1.09
#  Good D     662 3405.38 3175.15    5.62    0.93
#       E     933 3423.64 3330.70    5.62    0.95
#       F     909 3495.75 3202.41    5.69    0.95
as_flextable(tab, spread_first_col = TRUE,
                  row_title = as_paragraph(
                    colorize(as_b(.row_title), color = "#ff0000"))) |> 
  labelizor(part = "header", 
            labels = c("mean" = "µ", sd = "σ"))

color

price

x

n

µ

σ

µ

σ

Fair

D

163

4291.06

3286.11

6.02

0.83

E

224

3682.31

2976.65

5.91

0.83

F

312

3827.00

3223.30

5.99

0.88

G

314

4239.25

3609.64

6.17

0.99

H

303

5135.68

3886.48

6.58

0.94

I

175

4685.45

3730.27

6.56

0.90

J

119

4975.66

4050.46

6.75

1.09

Good

D

662

3405.38

3175.15

5.62

0.93

E

933

3423.64

3330.70

5.62

0.95

F

909

3495.75

3202.41

5.69

0.95

G

871

4123.48

3702.50

5.85

1.02

H

702

4276.25

4020.66

5.97

1.13

I

522

5078.53

4631.70

6.25

1.22

J

307

4574.17

3707.79

6.38

1.12

Very Good

D

1513

3470.47

3523.75

5.50

0.97

E

2400

3214.65

3408.02

5.43

1.00

F

2164

3778.82

3786.12

5.61

1.00

G

2299

3872.75

3861.38

5.66

1.04

H

1824

4535.39

4185.80

5.98

1.16

I

1204

5255.88

4687.10

6.27

1.17

J

678

5103.51

4135.65

6.46

1.14

Premium

D

1603

3631.29

3711.63

5.60

1.02

E

2337

3538.91

3794.99

5.59

1.03

F

2331

4324.89

4012.02

5.88

1.02

G

2924

4500.74

4356.57

5.86

1.15

H

2360

5216.71

4466.19

6.24

1.23

I

1428

5946.18

5053.75

6.49

1.32

J

808

6294.59

4788.94

6.81

1.22

Ideal

D

2834

2629.09

3001.07

5.19

0.82

E

3903

2597.55

2956.01

5.22

0.85

F

3826

3374.94

3766.64

5.41

0.98

G

4884

3720.71

4006.26

5.51

1.05

H

3115

3889.33

4013.38

5.73

1.17

I

2093

4451.97

4505.15

5.98

1.24

J

896

4918.19

4476.21

6.32

1.22

53940

3932.80

3989.44

5.73

1.12

10.3 Using tabulator

10.3.1 Tabulate a dataset with tabulator

Function tabulator() presents the result of an aggregation in a nice crosstable. To illustrate the function, we will use a summary of a subset of the dataset ggplot2::diamonds.

set_flextable_defaults(theme_fun = theme_booktabs)

subdat <- ggplot2::diamonds %>%
  filter(
    cut %in% c("Fair", "Good", "Very Good"),
    clarity %in% c("I1", "SI1", "VS2"),
    color %in% c("F", "G", "H")
  )

dat <- group_by(subdat, cut, color, clarity) %>%
  summarise(
    across(
      .cols = c(z, y),
      .fns = list(
        mean = ~ mean(.x),
        sd = ~ sd(.x)
      )
    ),
    .groups = "drop"
  )
head(dat)
# # A tibble: 6 × 7
#   cut   color clarity z_mean  z_sd y_mean  y_sd
#   <ord> <ord> <ord>    <dbl> <dbl>  <dbl> <dbl>
# 1 Fair  F     I1        4.00 0.704   6.04 1.08 
# 2 Fair  F     SI1       3.73 0.542   5.87 0.844
# 3 Fair  F     VS2       3.59 0.499   5.63 0.697
# 4 Fair  G     I1        4.23 0.722   6.43 1.08 
# 5 Fair  G     SI1       3.85 0.437   5.92 0.636
# 6 Fair  G     VS2       3.96 0.487   6.14 0.787

Now the data is ready to be sent to tabulator() function.

User have to provide parameters that define the layout of the crosstable, rows and columns. These are the columns names of the aggregated dataset containing the groupings dimensions (union should be the columns used in the dplyr::group_by or the by value if ‘datatable’ is used for the aggregation). rows will be used as row dimensions and columns will be used as columns dimensions, the values to display will be located under each corresponding set of columns.

The values to display are defined in the ... parameter of the function. The expected formalism is the following:

  • The name given to the argument is used as the column name. Note that if only one column is defined, its label is not displayed.
  • The expected value for each argument is a call to the as_paragraph() function.
tabulator(
  x = dat, rows = c("cut", "color"),
  columns = "clarity",
  `y stats` = as_paragraph(
    fmt_avg_dev(y_mean, y_sd)
  )
) |>
  as_flextable()

cut

color

I1

SI1

VS2

Fair

F

6.0 (1.1)

5.9 (0.8)

5.6 (0.7)

G

6.4 (1.1)

5.9 (0.6)

6.1 (0.8)

H

6.9 (1.0)

6.4 (0.8)

6.2 (0.9)

Good

F

6.1 (1.0)

5.7 (0.8)

5.7 (0.9)

G

6.6 (1.0)

6.0 (0.9)

5.8 (0.9)

H

6.7 (0.8)

6.0 (1.1)

5.9 (1.1)

Very Good

F

6.7 (1.0)

5.8 (0.8)

5.7 (0.9)

G

6.5 (0.8)

5.8 (0.9)

5.8 (1.0)

H

7.4 (0.9)

6.2 (1.1)

6.0 (1.1)

We can add another column for the z aggregations.

ft <- tabulator(
  x = dat, rows = c("cut", "color"),
  columns = "clarity",
  `y stats` = as_paragraph(fmt_avg_dev(y_mean, y_sd)),
  `z stats` = as_paragraph(fmt_avg_dev(z_mean, z_sd))
) |>
  as_flextable()
ft

cut

color

I1

SI1

VS2

y stats

z stats

y stats

z stats

y stats

z stats

Fair

F

6.0 (1.1)

4.0 (0.7)

5.9 (0.8)

3.7 (0.5)

5.6 (0.7)

3.6 (0.5)

G

6.4 (1.1)

4.2 (0.7)

5.9 (0.6)

3.9 (0.4)

6.1 (0.8)

4.0 (0.5)

H

6.9 (1.0)

4.5 (0.7)

6.4 (0.8)

4.1 (0.5)

6.2 (0.9)

4.0 (0.6)

Good

F

6.1 (1.0)

3.9 (0.6)

5.7 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

G

6.6 (1.0)

3.9 (1.1)

6.0 (0.9)

3.7 (0.6)

5.8 (0.9)

3.6 (0.6)

H

6.7 (0.8)

4.2 (0.6)

6.0 (1.1)

3.7 (0.6)

5.9 (1.1)

3.7 (0.7)

Very Good

F

6.7 (1.0)

4.1 (0.5)

5.8 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

G

6.5 (0.8)

4.0 (0.5)

5.8 (0.9)

3.6 (0.6)

5.8 (1.0)

3.6 (0.6)

H

7.4 (0.9)

4.6 (0.6)

6.2 (1.1)

3.8 (0.7)

6.0 (1.1)

3.7 (0.7)

As it is a flextable, all flextable can be used:

ft <- color(ft, i = ~ cut %in% "Very Good", color = "blue")
ft <- add_header_lines(x = ft, "blah blah blah blah blah blah blah blah blah blah")
ft

blah blah blah blah blah blah blah blah blah blah

cut

color

I1

SI1

VS2

y stats

z stats

y stats

z stats

y stats

z stats

Fair

F

6.0 (1.1)

4.0 (0.7)

5.9 (0.8)

3.7 (0.5)

5.6 (0.7)

3.6 (0.5)

G

6.4 (1.1)

4.2 (0.7)

5.9 (0.6)

3.9 (0.4)

6.1 (0.8)

4.0 (0.5)

H

6.9 (1.0)

4.5 (0.7)

6.4 (0.8)

4.1 (0.5)

6.2 (0.9)

4.0 (0.6)

Good

F

6.1 (1.0)

3.9 (0.6)

5.7 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

G

6.6 (1.0)

3.9 (1.1)

6.0 (0.9)

3.7 (0.6)

5.8 (0.9)

3.6 (0.6)

H

6.7 (0.8)

4.2 (0.6)

6.0 (1.1)

3.7 (0.6)

5.9 (1.1)

3.7 (0.7)

Very Good

F

6.7 (1.0)

4.1 (0.5)

5.8 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

G

6.5 (0.8)

4.0 (0.5)

5.8 (0.9)

3.6 (0.6)

5.8 (1.0)

3.6 (0.6)

H

7.4 (0.9)

4.6 (0.6)

6.2 (1.1)

3.8 (0.7)

6.0 (1.1)

3.7 (0.7)

10.3.2 Display additional data

Users may wish to add labels or calculation results corresponding to the row dimensions, after or before the rotated columns. Three arguments are provided to support this.

10.3.2.1 hidden_data

hidden_data and row_compose together let users add data corresponding to rows dimensions (one row for each combination of rows) and specify how to re-format one or more rows display values:

The following aggregation will be used in the cut column.

nstat <- group_by(subdat, cut) %>% tally()
nstat
# # A tibble: 3 × 2
#   cut           n
#   <ord>     <int>
# 1 Fair        506
# 2 Good       1281
# 3 Very Good  2942

The function fmt_header_n() is made for this special display: a soft return and the (N=xxxx) value. As nstat is used as value for argument hidden_data, variable n is now available when calling as_paragraph() (in argument row_compose).

tab <- tabulator(
  x = dat, rows = c("cut", "color"),
  columns = "clarity",
  hidden_data = nstat,
  row_compose = list(
    cut = as_paragraph(
      as_chunk(cut),
      as_chunk(n, formatter = fmt_header_n)
    )
  ),
  `y stats` = as_paragraph(fmt_avg_dev(y_mean, y_sd)),
  `z stats` = as_paragraph(fmt_avg_dev(z_mean, z_sd))
)
as_flextable(tab)

cut

color

I1

SI1

VS2

y stats

z stats

y stats

z stats

y stats

z stats

Fair
(N=506)

F

6.0 (1.1)

4.0 (0.7)

5.9 (0.8)

3.7 (0.5)

5.6 (0.7)

3.6 (0.5)

G

6.4 (1.1)

4.2 (0.7)

5.9 (0.6)

3.9 (0.4)

6.1 (0.8)

4.0 (0.5)

H

6.9 (1.0)

4.5 (0.7)

6.4 (0.8)

4.1 (0.5)

6.2 (0.9)

4.0 (0.6)

Good
(N=1 281)

F

6.1 (1.0)

3.9 (0.6)

5.7 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

G

6.6 (1.0)

3.9 (1.1)

6.0 (0.9)

3.7 (0.6)

5.8 (0.9)

3.6 (0.6)

H

6.7 (0.8)

4.2 (0.6)

6.0 (1.1)

3.7 (0.6)

5.9 (1.1)

3.7 (0.7)

Very Good
(N=2 942)

F

6.7 (1.0)

4.1 (0.5)

5.8 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

G

6.5 (0.8)

4.0 (0.5)

5.8 (0.9)

3.6 (0.6)

5.8 (1.0)

3.6 (0.6)

H

7.4 (0.9)

4.6 (0.6)

6.2 (1.1)

3.8 (0.7)

6.0 (1.1)

3.7 (0.7)

10.3.2.2 datasup_first and datasup_last

Arguments datasup_first and datasup_last can be used to show data before or after the computer columns. These data.frame can contains more than one column.

tab <- tabulator(
  x = dat, rows = c("cut", "color"),
  columns = "clarity",
  datasup_last = nstat,
  `y stats` = as_paragraph(fmt_avg_dev(y_mean, y_sd)),
  `z stats` = as_paragraph(fmt_avg_dev(z_mean, z_sd))
)
as_flextable(tab)

cut

color

I1

SI1

VS2

n

y stats

z stats

y stats

z stats

y stats

z stats

Fair

F

6.0 (1.1)

4.0 (0.7)

5.9 (0.8)

3.7 (0.5)

5.6 (0.7)

3.6 (0.5)

506

G

6.4 (1.1)

4.2 (0.7)

5.9 (0.6)

3.9 (0.4)

6.1 (0.8)

4.0 (0.5)

506

H

6.9 (1.0)

4.5 (0.7)

6.4 (0.8)

4.1 (0.5)

6.2 (0.9)

4.0 (0.6)

506

Good

F

6.1 (1.0)

3.9 (0.6)

5.7 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

1 281

G

6.6 (1.0)

3.9 (1.1)

6.0 (0.9)

3.7 (0.6)

5.8 (0.9)

3.6 (0.6)

1 281

H

6.7 (0.8)

4.2 (0.6)

6.0 (1.1)

3.7 (0.6)

5.9 (1.1)

3.7 (0.7)

1 281

Very Good

F

6.7 (1.0)

4.1 (0.5)

5.8 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

2 942

G

6.5 (0.8)

4.0 (0.5)

5.8 (0.9)

3.6 (0.6)

5.8 (1.0)

3.6 (0.6)

2 942

H

7.4 (0.9)

4.6 (0.6)

6.2 (1.1)

3.8 (0.7)

6.0 (1.1)

3.7 (0.7)

2 942

10.3.3 Design options

When calling as_flextable(), several options are available. Most important are separate_with, spread_first_col and sep_w.

10.3.3.1 Separate groups with a line

Option separate_with add a line of separation between each specified group.

ft <- as_flextable(tab, separate_with = "cut")
ft

cut

color

I1

SI1

VS2

n

y stats

z stats

y stats

z stats

y stats

z stats

Fair

F

6.0 (1.1)

4.0 (0.7)

5.9 (0.8)

3.7 (0.5)

5.6 (0.7)

3.6 (0.5)

506

G

6.4 (1.1)

4.2 (0.7)

5.9 (0.6)

3.9 (0.4)

6.1 (0.8)

4.0 (0.5)

506

H

6.9 (1.0)

4.5 (0.7)

6.4 (0.8)

4.1 (0.5)

6.2 (0.9)

4.0 (0.6)

506

Good

F

6.1 (1.0)

3.9 (0.6)

5.7 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

1 281

G

6.6 (1.0)

3.9 (1.1)

6.0 (0.9)

3.7 (0.6)

5.8 (0.9)

3.6 (0.6)

1 281

H

6.7 (0.8)

4.2 (0.6)

6.0 (1.1)

3.7 (0.6)

5.9 (1.1)

3.7 (0.7)

1 281

Very Good

F

6.7 (1.0)

4.1 (0.5)

5.8 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

2 942

G

6.5 (0.8)

4.0 (0.5)

5.8 (0.9)

3.6 (0.6)

5.8 (1.0)

3.6 (0.6)

2 942

H

7.4 (0.9)

4.6 (0.6)

6.2 (1.1)

3.8 (0.7)

6.0 (1.1)

3.7 (0.7)

2 942

10.3.3.2 Drop space between columns

Option sep_w can be set to 0 and the white spaces between columns will be removed.

ft <- as_flextable(tab, separate_with = "cut", sep_w = 0)
ft

cut

color

I1

SI1

VS2

n

y stats

z stats

y stats

z stats

y stats

z stats

Fair

F

6.0 (1.1)

4.0 (0.7)

5.9 (0.8)

3.7 (0.5)

5.6 (0.7)

3.6 (0.5)

506

G

6.4 (1.1)

4.2 (0.7)

5.9 (0.6)

3.9 (0.4)

6.1 (0.8)

4.0 (0.5)

506

H

6.9 (1.0)

4.5 (0.7)

6.4 (0.8)

4.1 (0.5)

6.2 (0.9)

4.0 (0.6)

506

Good

F

6.1 (1.0)

3.9 (0.6)

5.7 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

1 281

G

6.6 (1.0)

3.9 (1.1)

6.0 (0.9)

3.7 (0.6)

5.8 (0.9)

3.6 (0.6)

1 281

H

6.7 (0.8)

4.2 (0.6)

6.0 (1.1)

3.7 (0.6)

5.9 (1.1)

3.7 (0.7)

1 281

Very Good

F

6.7 (1.0)

4.1 (0.5)

5.8 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

2 942

G

6.5 (0.8)

4.0 (0.5)

5.8 (0.9)

3.6 (0.6)

5.8 (1.0)

3.6 (0.6)

2 942

H

7.4 (0.9)

4.6 (0.6)

6.2 (1.1)

3.8 (0.7)

6.0 (1.1)

3.7 (0.7)

2 942

10.3.3.3 Spread first column as a group separator

Option spread_first_col can be set to TRUE and first row is spread as a new line separator instead of being a column.

ft <- as_flextable(tab,
  spread_first_col = TRUE,
  rows_alignment = "right"
) |>
  align(i = ~ !is.na(cut), align = "left")
ft

color

I1

SI1

VS2

n

y stats

z stats

y stats

z stats

y stats

z stats

Fair

F

6.0 (1.1)

4.0 (0.7)

5.9 (0.8)

3.7 (0.5)

5.6 (0.7)

3.6 (0.5)

506

G

6.4 (1.1)

4.2 (0.7)

5.9 (0.6)

3.9 (0.4)

6.1 (0.8)

4.0 (0.5)

506

H

6.9 (1.0)

4.5 (0.7)

6.4 (0.8)

4.1 (0.5)

6.2 (0.9)

4.0 (0.6)

506

Good

F

6.1 (1.0)

3.9 (0.6)

5.7 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

1 281

G

6.6 (1.0)

3.9 (1.1)

6.0 (0.9)

3.7 (0.6)

5.8 (0.9)

3.6 (0.6)

1 281

H

6.7 (0.8)

4.2 (0.6)

6.0 (1.1)

3.7 (0.6)

5.9 (1.1)

3.7 (0.7)

1 281

Very Good

F

6.7 (1.0)

4.1 (0.5)

5.8 (0.8)

3.6 (0.5)

5.7 (0.9)

3.5 (0.6)

2 942

G

6.5 (0.8)

4.0 (0.5)

5.8 (0.9)

3.6 (0.6)

5.8 (1.0)

3.6 (0.6)

2 942

H

7.4 (0.9)

4.6 (0.6)

6.2 (1.1)

3.8 (0.7)

6.0 (1.1)

3.7 (0.7)

2 942