Chapter 10 Crosstabs
The ‘flextable’ package will not aggregate data for you but it will help you to present aggregated data (with package ‘data.table’ or ‘dplyr’).
The package allows to transform crosstabs created by tables::tabular()
into ‘flexable’ with method as_flextable()
.
library(tables)
tab <- tabular(
(Species + 1) ~ (n = 1) + Format(digits = 2) *
(Sepal.Length + Sepal.Width) * (mean + sd),
data = iris
)
as_flextable(tab)
Species |
Sepal.Length |
Sepal.Width |
|||
---|---|---|---|---|---|
n |
mean |
sd |
mean |
sd |
|
setosa |
50 |
5.01 |
0.35 |
3.43 |
0.38 |
versicolor |
50 |
5.94 |
0.52 |
2.77 |
0.31 |
virginica |
50 |
6.59 |
0.64 |
2.97 |
0.32 |
All |
150 |
5.84 |
0.83 |
3.06 |
0.44 |
The package provides also the function tabulator()
to prepare for the creation
of multidimensional crosstabs, possibly with multiple columns. The object can
then be transformed as a flextable with as_flextable()
.
dat <- group_by(warpbreaks, wool, tension) %>%
summarise(breaks = mean(breaks), .groups = "drop")
cft_1 <- tabulator(
x = dat,
rows = "wool",
columns = "tension",
`μ` = as_paragraph(breaks),
`n` = as_paragraph(length(breaks))
)
ft_1 <- as_flextable(cft_1)
ft_1
wool |
L |
M |
H |
||||||
---|---|---|---|---|---|---|---|---|---|
μ |
n |
μ |
n |
μ |
n |
||||
A |
44.56 |
2 |
24.00 |
2 |
24.56 |
2 |
|||
B |
28.22 |
2 |
28.78 |
2 |
18.78 |
2 |
10.1 PROC FREQ
Function proc_freq()
compute a contingency table and
create a flextable from the result. The aim of the
function is to reproduce the results of the ‘SAS’ PROC FREQ
.
It can produce a two way contingency table:
proc_freq(mtcars, "gear", "vs")
gear |
vs |
|||
---|---|---|---|---|
0 |
1 |
Total |
||
3 |
Count |
12 (37.5%) |
3 (9.4%) |
15 (46.9%) |
Mar. pct (1) |
66.7% ; 80.0% |
21.4% ; 20.0% |
||
4 |
Count |
2 (6.2%) |
10 (31.2%) |
12 (37.5%) |
Mar. pct |
11.1% ; 16.7% |
71.4% ; 83.3% |
||
5 |
Count |
4 (12.5%) |
1 (3.1%) |
5 (15.6%) |
Mar. pct |
22.2% ; 80.0% |
7.1% ; 20.0% |
||
Total |
Count |
18 (56.2%) |
14 (43.8%) |
32 (100.0%) |
(1) Columns and rows percentages |
It can also produce a one way contingency table:
proc_freq(mtcars, "gear")
gear |
Count |
Percent |
---|---|---|
3 |
15 |
46.9% |
4 |
12 |
37.5% |
5 |
5 |
15.6% |
Total |
32 |
100.0% |
10.2 Using ‘tables’
Package ‘tables’ is inspired by ‘SAS PROC TABULATE’ and enable the creation of tabular summaries with a smart and clear syntax. Use of ‘tables’ results is straightforward:
- create a tabular object with function
tables::tabular()
, - transform it to a flextable with
as_flextable()
. One option is available to let display the first group as a row separator.
library(tables)
set_flextable_defaults(theme_fun = theme_vader)
tab <- tabular( (cut*color + 1) ~ (n=1) + Format(digits=2)*
(price + x)*(mean + sd), data=ggplot2::diamonds)
print(tab[1:10,])
#
# price x
# cut color n mean sd mean sd
# Fair D 163 4291.06 3286.11 6.02 0.83
# E 224 3682.31 2976.65 5.91 0.83
# F 312 3827.00 3223.30 5.99 0.88
# G 314 4239.25 3609.64 6.17 0.99
# H 303 5135.68 3886.48 6.58 0.94
# I 175 4685.45 3730.27 6.56 0.90
# J 119 4975.66 4050.46 6.75 1.09
# Good D 662 3405.38 3175.15 5.62 0.93
# E 933 3423.64 3330.70 5.62 0.95
# F 909 3495.75 3202.41 5.69 0.95
as_flextable(tab, spread_first_col = TRUE,
row_title = as_paragraph(
colorize(as_b(.row_title), color = "#ff0000"))) |>
labelizor(part = "header",
labels = c("mean" = "µ", sd = "σ"))
color |
price |
x |
|||
---|---|---|---|---|---|
n |
µ |
σ |
µ |
σ |
|
Fair | |||||
D |
163 |
4291.06 |
3286.11 |
6.02 |
0.83 |
E |
224 |
3682.31 |
2976.65 |
5.91 |
0.83 |
F |
312 |
3827.00 |
3223.30 |
5.99 |
0.88 |
G |
314 |
4239.25 |
3609.64 |
6.17 |
0.99 |
H |
303 |
5135.68 |
3886.48 |
6.58 |
0.94 |
I |
175 |
4685.45 |
3730.27 |
6.56 |
0.90 |
J |
119 |
4975.66 |
4050.46 |
6.75 |
1.09 |
Good | |||||
D |
662 |
3405.38 |
3175.15 |
5.62 |
0.93 |
E |
933 |
3423.64 |
3330.70 |
5.62 |
0.95 |
F |
909 |
3495.75 |
3202.41 |
5.69 |
0.95 |
G |
871 |
4123.48 |
3702.50 |
5.85 |
1.02 |
H |
702 |
4276.25 |
4020.66 |
5.97 |
1.13 |
I |
522 |
5078.53 |
4631.70 |
6.25 |
1.22 |
J |
307 |
4574.17 |
3707.79 |
6.38 |
1.12 |
Very Good | |||||
D |
1513 |
3470.47 |
3523.75 |
5.50 |
0.97 |
E |
2400 |
3214.65 |
3408.02 |
5.43 |
1.00 |
F |
2164 |
3778.82 |
3786.12 |
5.61 |
1.00 |
G |
2299 |
3872.75 |
3861.38 |
5.66 |
1.04 |
H |
1824 |
4535.39 |
4185.80 |
5.98 |
1.16 |
I |
1204 |
5255.88 |
4687.10 |
6.27 |
1.17 |
J |
678 |
5103.51 |
4135.65 |
6.46 |
1.14 |
Premium | |||||
D |
1603 |
3631.29 |
3711.63 |
5.60 |
1.02 |
E |
2337 |
3538.91 |
3794.99 |
5.59 |
1.03 |
F |
2331 |
4324.89 |
4012.02 |
5.88 |
1.02 |
G |
2924 |
4500.74 |
4356.57 |
5.86 |
1.15 |
H |
2360 |
5216.71 |
4466.19 |
6.24 |
1.23 |
I |
1428 |
5946.18 |
5053.75 |
6.49 |
1.32 |
J |
808 |
6294.59 |
4788.94 |
6.81 |
1.22 |
Ideal | |||||
D |
2834 |
2629.09 |
3001.07 |
5.19 |
0.82 |
E |
3903 |
2597.55 |
2956.01 |
5.22 |
0.85 |
F |
3826 |
3374.94 |
3766.64 |
5.41 |
0.98 |
G |
4884 |
3720.71 |
4006.26 |
5.51 |
1.05 |
H |
3115 |
3889.33 |
4013.38 |
5.73 |
1.17 |
I |
2093 |
4451.97 |
4505.15 |
5.98 |
1.24 |
J |
896 |
4918.19 |
4476.21 |
6.32 |
1.22 |
53940 |
3932.80 |
3989.44 |
5.73 |
1.12 |
10.3 Using tabulator
10.3.1 Tabulate a dataset with tabulator
Function tabulator()
presents the result of an aggregation in a
nice crosstable. To illustrate the function, we will use a summary
of a subset of the dataset ggplot2::diamonds
.
set_flextable_defaults(theme_fun = theme_booktabs)
subdat <- ggplot2::diamonds %>%
filter(
cut %in% c("Fair", "Good", "Very Good"),
clarity %in% c("I1", "SI1", "VS2"),
color %in% c("F", "G", "H")
)
dat <- group_by(subdat, cut, color, clarity) %>%
summarise(
across(
.cols = c(z, y),
.fns = list(
mean = ~ mean(.x),
sd = ~ sd(.x)
)
),
.groups = "drop"
)
head(dat)
# # A tibble: 6 × 7
# cut color clarity z_mean z_sd y_mean y_sd
# <ord> <ord> <ord> <dbl> <dbl> <dbl> <dbl>
# 1 Fair F I1 4.00 0.704 6.04 1.08
# 2 Fair F SI1 3.73 0.542 5.87 0.844
# 3 Fair F VS2 3.59 0.499 5.63 0.697
# 4 Fair G I1 4.23 0.722 6.43 1.08
# 5 Fair G SI1 3.85 0.437 5.92 0.636
# 6 Fair G VS2 3.96 0.487 6.14 0.787
Now the data is ready to be sent to tabulator()
function.
User have to provide parameters that define the layout of the crosstable, rows
and
columns
. These are the columns names of the aggregated dataset containing the
groupings dimensions (union should be the columns used in the dplyr::group_by
or the by
value if ‘datatable’ is used for the aggregation). rows
will be
used as row dimensions and columns
will be used as columns dimensions, the
values to display will be located under each corresponding set of columns.
The values to display are defined in the ...
parameter of the function.
The expected formalism is the following:
- The name given to the argument is used as the column name. Note that if only one column is defined, its label is not displayed.
- The expected value for each argument is a call to the
as_paragraph()
function.
tabulator(
x = dat, rows = c("cut", "color"),
columns = "clarity",
`y stats` = as_paragraph(
fmt_avg_dev(y_mean, y_sd)
)
) |>
as_flextable()
cut |
color |
I1 |
SI1 |
VS2 |
|||
---|---|---|---|---|---|---|---|
Fair |
F |
6.0 (1.1) |
5.9 (0.8) |
5.6 (0.7) |
|||
G |
6.4 (1.1) |
5.9 (0.6) |
6.1 (0.8) |
||||
H |
6.9 (1.0) |
6.4 (0.8) |
6.2 (0.9) |
||||
Good |
F |
6.1 (1.0) |
5.7 (0.8) |
5.7 (0.9) |
|||
G |
6.6 (1.0) |
6.0 (0.9) |
5.8 (0.9) |
||||
H |
6.7 (0.8) |
6.0 (1.1) |
5.9 (1.1) |
||||
Very Good |
F |
6.7 (1.0) |
5.8 (0.8) |
5.7 (0.9) |
|||
G |
6.5 (0.8) |
5.8 (0.9) |
5.8 (1.0) |
||||
H |
7.4 (0.9) |
6.2 (1.1) |
6.0 (1.1) |
We can add another column for the z
aggregations.
ft <- tabulator(
x = dat, rows = c("cut", "color"),
columns = "clarity",
`y stats` = as_paragraph(fmt_avg_dev(y_mean, y_sd)),
`z stats` = as_paragraph(fmt_avg_dev(z_mean, z_sd))
) |>
as_flextable()
ft
cut |
color |
I1 |
SI1 |
VS2 |
||||||
---|---|---|---|---|---|---|---|---|---|---|
y stats |
z stats |
y stats |
z stats |
y stats |
z stats |
|||||
Fair |
F |
6.0 (1.1) |
4.0 (0.7) |
5.9 (0.8) |
3.7 (0.5) |
5.6 (0.7) |
3.6 (0.5) |
|||
G |
6.4 (1.1) |
4.2 (0.7) |
5.9 (0.6) |
3.9 (0.4) |
6.1 (0.8) |
4.0 (0.5) |
||||
H |
6.9 (1.0) |
4.5 (0.7) |
6.4 (0.8) |
4.1 (0.5) |
6.2 (0.9) |
4.0 (0.6) |
||||
Good |
F |
6.1 (1.0) |
3.9 (0.6) |
5.7 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
|||
G |
6.6 (1.0) |
3.9 (1.1) |
6.0 (0.9) |
3.7 (0.6) |
5.8 (0.9) |
3.6 (0.6) |
||||
H |
6.7 (0.8) |
4.2 (0.6) |
6.0 (1.1) |
3.7 (0.6) |
5.9 (1.1) |
3.7 (0.7) |
||||
Very Good |
F |
6.7 (1.0) |
4.1 (0.5) |
5.8 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
|||
G |
6.5 (0.8) |
4.0 (0.5) |
5.8 (0.9) |
3.6 (0.6) |
5.8 (1.0) |
3.6 (0.6) |
||||
H |
7.4 (0.9) |
4.6 (0.6) |
6.2 (1.1) |
3.8 (0.7) |
6.0 (1.1) |
3.7 (0.7) |
As it is a flextable, all flextable can be used:
ft <- color(ft, i = ~ cut %in% "Very Good", color = "blue")
ft <- add_header_lines(x = ft, "blah blah blah blah blah blah blah blah blah blah")
ft
blah blah blah blah blah blah blah blah blah blah | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
cut |
color |
I1 |
SI1 |
VS2 |
||||||
y stats |
z stats |
y stats |
z stats |
y stats |
z stats |
|||||
Fair |
F |
6.0 (1.1) |
4.0 (0.7) |
5.9 (0.8) |
3.7 (0.5) |
5.6 (0.7) |
3.6 (0.5) |
|||
G |
6.4 (1.1) |
4.2 (0.7) |
5.9 (0.6) |
3.9 (0.4) |
6.1 (0.8) |
4.0 (0.5) |
||||
H |
6.9 (1.0) |
4.5 (0.7) |
6.4 (0.8) |
4.1 (0.5) |
6.2 (0.9) |
4.0 (0.6) |
||||
Good |
F |
6.1 (1.0) |
3.9 (0.6) |
5.7 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
|||
G |
6.6 (1.0) |
3.9 (1.1) |
6.0 (0.9) |
3.7 (0.6) |
5.8 (0.9) |
3.6 (0.6) |
||||
H |
6.7 (0.8) |
4.2 (0.6) |
6.0 (1.1) |
3.7 (0.6) |
5.9 (1.1) |
3.7 (0.7) |
||||
Very Good |
F |
6.7 (1.0) |
4.1 (0.5) |
5.8 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
|||
G |
6.5 (0.8) |
4.0 (0.5) |
5.8 (0.9) |
3.6 (0.6) |
5.8 (1.0) |
3.6 (0.6) |
||||
H |
7.4 (0.9) |
4.6 (0.6) |
6.2 (1.1) |
3.8 (0.7) |
6.0 (1.1) |
3.7 (0.7) |
10.3.2 Display additional data
Users may wish to add labels or calculation results corresponding to the row dimensions, after or before the rotated columns. Three arguments are provided to support this.
10.3.2.2 datasup_first and datasup_last
Arguments datasup_first
and datasup_last
can be used to show
data before or after the computer columns. These data.frame can contains
more than one column.
tab <- tabulator(
x = dat, rows = c("cut", "color"),
columns = "clarity",
datasup_last = nstat,
`y stats` = as_paragraph(fmt_avg_dev(y_mean, y_sd)),
`z stats` = as_paragraph(fmt_avg_dev(z_mean, z_sd))
)
as_flextable(tab)
cut |
color |
I1 |
SI1 |
VS2 |
n |
|||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
y stats |
z stats |
y stats |
z stats |
y stats |
z stats |
|||||||
Fair |
F |
6.0 (1.1) |
4.0 (0.7) |
5.9 (0.8) |
3.7 (0.5) |
5.6 (0.7) |
3.6 (0.5) |
506 |
||||
G |
6.4 (1.1) |
4.2 (0.7) |
5.9 (0.6) |
3.9 (0.4) |
6.1 (0.8) |
4.0 (0.5) |
506 |
|||||
H |
6.9 (1.0) |
4.5 (0.7) |
6.4 (0.8) |
4.1 (0.5) |
6.2 (0.9) |
4.0 (0.6) |
506 |
|||||
Good |
F |
6.1 (1.0) |
3.9 (0.6) |
5.7 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
1 281 |
||||
G |
6.6 (1.0) |
3.9 (1.1) |
6.0 (0.9) |
3.7 (0.6) |
5.8 (0.9) |
3.6 (0.6) |
1 281 |
|||||
H |
6.7 (0.8) |
4.2 (0.6) |
6.0 (1.1) |
3.7 (0.6) |
5.9 (1.1) |
3.7 (0.7) |
1 281 |
|||||
Very Good |
F |
6.7 (1.0) |
4.1 (0.5) |
5.8 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
2 942 |
||||
G |
6.5 (0.8) |
4.0 (0.5) |
5.8 (0.9) |
3.6 (0.6) |
5.8 (1.0) |
3.6 (0.6) |
2 942 |
|||||
H |
7.4 (0.9) |
4.6 (0.6) |
6.2 (1.1) |
3.8 (0.7) |
6.0 (1.1) |
3.7 (0.7) |
2 942 |
10.3.3 Design options
When calling as_flextable()
, several options are available.
Most important are separate_with
, spread_first_col
and
sep_w
.
10.3.3.1 Separate groups with a line
Option separate_with
add a line of separation between each specified group.
ft <- as_flextable(tab, separate_with = "cut")
ft
cut |
color |
I1 |
SI1 |
VS2 |
n |
|||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
y stats |
z stats |
y stats |
z stats |
y stats |
z stats |
|||||||
Fair |
F |
6.0 (1.1) |
4.0 (0.7) |
5.9 (0.8) |
3.7 (0.5) |
5.6 (0.7) |
3.6 (0.5) |
506 |
||||
G |
6.4 (1.1) |
4.2 (0.7) |
5.9 (0.6) |
3.9 (0.4) |
6.1 (0.8) |
4.0 (0.5) |
506 |
|||||
H |
6.9 (1.0) |
4.5 (0.7) |
6.4 (0.8) |
4.1 (0.5) |
6.2 (0.9) |
4.0 (0.6) |
506 |
|||||
Good |
F |
6.1 (1.0) |
3.9 (0.6) |
5.7 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
1 281 |
||||
G |
6.6 (1.0) |
3.9 (1.1) |
6.0 (0.9) |
3.7 (0.6) |
5.8 (0.9) |
3.6 (0.6) |
1 281 |
|||||
H |
6.7 (0.8) |
4.2 (0.6) |
6.0 (1.1) |
3.7 (0.6) |
5.9 (1.1) |
3.7 (0.7) |
1 281 |
|||||
Very Good |
F |
6.7 (1.0) |
4.1 (0.5) |
5.8 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
2 942 |
||||
G |
6.5 (0.8) |
4.0 (0.5) |
5.8 (0.9) |
3.6 (0.6) |
5.8 (1.0) |
3.6 (0.6) |
2 942 |
|||||
H |
7.4 (0.9) |
4.6 (0.6) |
6.2 (1.1) |
3.8 (0.7) |
6.0 (1.1) |
3.7 (0.7) |
2 942 |
10.3.3.2 Drop space between columns
Option sep_w
can be set to 0 and the white spaces between columns
will be removed.
ft <- as_flextable(tab, separate_with = "cut", sep_w = 0)
ft
cut |
color |
I1 |
SI1 |
VS2 |
n |
|||
---|---|---|---|---|---|---|---|---|
y stats |
z stats |
y stats |
z stats |
y stats |
z stats |
|||
Fair |
F |
6.0 (1.1) |
4.0 (0.7) |
5.9 (0.8) |
3.7 (0.5) |
5.6 (0.7) |
3.6 (0.5) |
506 |
G |
6.4 (1.1) |
4.2 (0.7) |
5.9 (0.6) |
3.9 (0.4) |
6.1 (0.8) |
4.0 (0.5) |
506 |
|
H |
6.9 (1.0) |
4.5 (0.7) |
6.4 (0.8) |
4.1 (0.5) |
6.2 (0.9) |
4.0 (0.6) |
506 |
|
Good |
F |
6.1 (1.0) |
3.9 (0.6) |
5.7 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
1 281 |
G |
6.6 (1.0) |
3.9 (1.1) |
6.0 (0.9) |
3.7 (0.6) |
5.8 (0.9) |
3.6 (0.6) |
1 281 |
|
H |
6.7 (0.8) |
4.2 (0.6) |
6.0 (1.1) |
3.7 (0.6) |
5.9 (1.1) |
3.7 (0.7) |
1 281 |
|
Very Good |
F |
6.7 (1.0) |
4.1 (0.5) |
5.8 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
2 942 |
G |
6.5 (0.8) |
4.0 (0.5) |
5.8 (0.9) |
3.6 (0.6) |
5.8 (1.0) |
3.6 (0.6) |
2 942 |
|
H |
7.4 (0.9) |
4.6 (0.6) |
6.2 (1.1) |
3.8 (0.7) |
6.0 (1.1) |
3.7 (0.7) |
2 942 |
10.3.3.3 Spread first column as a group separator
Option spread_first_col
can be set to TRUE and first row is spread as a new line separator instead of being a column.
ft <- as_flextable(tab,
spread_first_col = TRUE,
rows_alignment = "right"
) |>
align(i = ~ !is.na(cut), align = "left")
ft
color |
I1 |
SI1 |
VS2 |
n |
|||||||
---|---|---|---|---|---|---|---|---|---|---|---|
y stats |
z stats |
y stats |
z stats |
y stats |
z stats |
||||||
Fair | |||||||||||
F |
6.0 (1.1) |
4.0 (0.7) |
5.9 (0.8) |
3.7 (0.5) |
5.6 (0.7) |
3.6 (0.5) |
506 |
||||
G |
6.4 (1.1) |
4.2 (0.7) |
5.9 (0.6) |
3.9 (0.4) |
6.1 (0.8) |
4.0 (0.5) |
506 |
||||
H |
6.9 (1.0) |
4.5 (0.7) |
6.4 (0.8) |
4.1 (0.5) |
6.2 (0.9) |
4.0 (0.6) |
506 |
||||
Good | |||||||||||
F |
6.1 (1.0) |
3.9 (0.6) |
5.7 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
1 281 |
||||
G |
6.6 (1.0) |
3.9 (1.1) |
6.0 (0.9) |
3.7 (0.6) |
5.8 (0.9) |
3.6 (0.6) |
1 281 |
||||
H |
6.7 (0.8) |
4.2 (0.6) |
6.0 (1.1) |
3.7 (0.6) |
5.9 (1.1) |
3.7 (0.7) |
1 281 |
||||
Very Good | |||||||||||
F |
6.7 (1.0) |
4.1 (0.5) |
5.8 (0.8) |
3.6 (0.5) |
5.7 (0.9) |
3.5 (0.6) |
2 942 |
||||
G |
6.5 (0.8) |
4.0 (0.5) |
5.8 (0.9) |
3.6 (0.6) |
5.8 (1.0) |
3.6 (0.6) |
2 942 |
||||
H |
7.4 (0.9) |
4.6 (0.6) |
6.2 (1.1) |
3.8 (0.7) |
6.0 (1.1) |
3.7 (0.7) |
2 942 |