Chapter 4 officer for Word

4.1 Add contents

To add paragraphs of text, tables, or images to a Word document, use one of the body_add_* functions:

body_add_blocks()
body_add_break()
body_add_caption()
body_add_docx()
body_add_fpar()
body_add_gg()
body_add_img()
body_add_par()
body_add_plot()
body_add_table()
body_add_toc()

They all share the same first argument and the same kind of output: each takes as its first input the R object representing the Word document to be filled, and each returns that document, augmented with the new content.

x <- body_add_par(x, "Level 1 title", style = "heading 1")

These functions all create one or more top-level elements, either paragraphs or tables.
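Because every body_add_* function returns the document, calls can be chained with the pipe. The following is a minimal sketch, assuming the default template returned by read_docx(); the texts are placeholders, and docx_summary() is only used here to inspect what was added.

```r
library(officer)

# Start from the default template shipped with 'officer'
doc <- read_docx()

# Each call takes the document as first argument and returns it augmented
doc <- doc |>
  body_add_par("Level 1 title", style = "heading 1") |>
  body_add_par("Some text under the title.", style = "Normal")

# Inspect the top-level elements of the document
doc_content <- docx_summary(doc)
doc_content[, c("content_type", "style_name", "text")]
```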

4.1.1 Tables

Tabular reporting is handled in ‘officer’ by the body_add_table() function. It renders a data.frame as a Word table with only a few formatting options; for more advanced formatting needs, the ‘flextable’ package is recommended.

The body_add_table() function adds a data.frame as a Word table whose formatting is defined by a table style in the document template. A table style is a group of settings that can be applied to a table; it covers formatting for the overall table, rows, columns, etc.

You can activate the “conditional formatting” instructions of the style, i.e. specific formatting for the first or last row, the first or last column, and the banded rows or columns (strips).

first_row: apply or remove formatting from the first row in the table.
first_column: apply or remove formatting from the first column in the table.
last_row: apply or remove formatting from the last row in the table.
last_column: apply or remove formatting from the last column in the table.
no_hband: don’t display banded rows (alternating formatting for odd and even rows).
no_vband: don’t display banded columns (alternating formatting for odd and even columns).

dat <- cbind(data.frame(cars = row.names(mtcars)),
             mtcars)
head(dat)

cars	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
character	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric
Mazda RX4	21.0	6	160	110	3.9	2.6	16.5	0	1	4	4
Mazda RX4 Wag	21.0	6	160	110	3.9	2.9	17.0	0	1	4	4
Datsun 710	22.8	4	108	93	3.9	2.3	18.6	1	1	4	1
Hornet 4 Drive	21.4	6	258	110	3.1	3.2	19.4	1	0	3	1
Hornet Sportabout	18.7	8	360	175	3.1	3.4	17.0	0	0	3	2
Valiant	18.1	6	225	105	2.8	3.5	20.2	1	0	3	1
n: 6

doc_table <- read_docx(path = "templates/template_demo.docx") |>
  body_add_table(head(dat, n = 20), style = "Table") |> 
  body_add_break() |> 
  body_add_table(head(dat, n = 20), style = "Table", 
                 first_column = TRUE)

print(doc_table, target = "static/reports/example_table.docx")

static/reports/example_table.docx
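The other conditional formatting arguments can be combined in the same way. The sketch below assumes the default template returned by read_docx() and its "table_template" table style; the visible effect depends on what that style defines for the last row and for banded rows.

```r
library(officer)

dat <- head(mtcars)

doc_cond <- read_docx() |>
  # conditional formatting arguments left at their defaults
  body_add_table(dat, style = "table_template") |>
  # an empty paragraph keeps the two tables separate
  body_add_par("", style = "Normal") |>
  # apply the last-row formatting and switch off row banding
  body_add_table(dat, style = "table_template",
                 last_row = TRUE, no_hband = TRUE)
```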

4.1.2 Paragraphs

The package makes it very easy to use paragraph styles. You can incrementally add a text associated with a paragraph style.

To add a text paragraph, use the body_add_par() function. The function requires three arguments: the target document, the text to be used in the new paragraph, and the paragraph style to be used.

The function is not vectorized; vectorization is planned for a future version.

read_docx() |> 
  body_add_par(value = "Hello World!", style = "Normal") |> 
  body_add_par(value = "Salut Bretons!", style = "centered") |> 
  print(target = "static/reports/example_par.docx")

static/reports/example_par.docx
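Since body_add_par() is not vectorized, a character vector has to be added one element at a time. A minimal sketch with a plain loop, where the texts are placeholders:

```r
library(officer)

texts <- c("First paragraph.", "Second paragraph.", "Third paragraph.")

doc_loop <- read_docx()
for (txt in texts) {
  # one call per paragraph; the document is reassigned at each step
  doc_loop <- body_add_par(doc_loop, value = txt, style = "Normal")
}
```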

4.1.3 Titles

A title is a paragraph. To add a title, use body_add_par() with the style argument set to the corresponding title style.

read_docx() |> 
  body_add_par(value = "This is a title 1", style = "heading 1") |> 
  body_add_par(value = "This is a title 2", style = "heading 2") |> 
  body_add_par(value = "This is a title 3", style = "heading 3") |> 
  print(target = "static/reports/example_titles.docx")

static/reports/example_titles.docx

4.1.4 Tables of contents

A TOC (Table of Contents) is a Word computed field: the table of contents is built by Word itself. The TOC field collects its entries from the heading styles or from another specified style.

Note: you have to update the fields in the Word application for the TOC to reflect the correct page numbers. See “update the fields”.

Use function body_add_toc() to insert a TOC inside a Word document.

doc_toc <- read_docx(path = "templates/word_example.docx") |>
  body_add_par("Table of Contents", style = "heading 1") |> 
  body_add_toc(level = 2) |> 
  body_add_par("Table of figures", style = "heading 1") |> 
  body_add_toc(style = "Image Caption") |> 
  body_add_par("Table of tables", style = "heading 1") |> 
  body_add_toc(style = "Table Caption")

print(doc_toc, target = "static/reports/example_toc.docx")

static/reports/example_toc.docx

4.1.5 Images

Images are specific because they are part of a paragraph: an image is always rendered inside a paragraph, which means you can mix text and images in the same paragraph. The function body_add_img() is a sugar function that wraps an image into a paragraph. It accepts various image formats: png, jpeg or emf.

img.file <- file.path( R.home("doc"), "html", "logo.jpg" )

read_docx() |> 
  body_add_img(src = img.file, height = 1.06, width = 1.39, style = "centered") |> 
  print(target = "static/reports/example_image.docx")

static/reports/example_image.docx
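To mix text and an image in the same paragraph, one option is to build the paragraph with fpar() and external_img(), then add it with body_add_fpar(). This is a sketch that reuses the R logo from the example above; the text and the sizes are arbitrary.

```r
library(officer)

img.file <- file.path(R.home("doc"), "html", "logo.jpg")

# A formatted paragraph made of a text chunk followed by an image chunk
mixed_par <- fpar(
  "The R logo: ",
  external_img(src = img.file, width = 1.39, height = 1.06)
)

doc_mixed <- read_docx() |>
  body_add_fpar(value = mixed_par, style = "Normal")
```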

4.1.6 Plots from ‘ggplot2’

Function body_add_gg() is also a sugar function; it wraps an image generated from a ggplot into a paragraph.

library(ggplot2)

gg <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point()

doc_gg <- read_docx()
doc_gg <- body_add_gg(x = doc_gg, value = gg, style = "centered")

The page dimensions of the Word document, returned by docx_dim(), can be used to maximize the size of the graphic to be produced.

word_size <- docx_dim(doc_gg)
word_size

# $page
#     width    height 
#  8.263889 11.694444 
# 
# $landscape
# [1] FALSE
# 
# $margins
#       top    bottom      left     right    header    footer 
# 0.9840278 0.9840278 0.9840278 0.9840278 0.4916667 0.4916667

width <- word_size$page['width'] - word_size$margins['left'] - word_size$margins['right']
height <- word_size$page['height'] - word_size$margins['top'] - word_size$margins['bottom']

doc_gg <- body_add_gg(x = doc_gg, value = gg, 
                      width = width, height = height, 
                      style = "centered")

print(doc_gg, target = "static/reports/example_gg.docx")

static/reports/example_gg.docx

4.1.7 Base plot

To add a standard R graphic, use the body_add_plot() function with plot_instr(), which contains the graphic instructions to be executed to produce a single graphic. The width and height used below are those computed in the previous section.

doc <- read_docx()
doc <- body_add_plot(doc,
  value = plot_instr(code = {
    barplot(1:5, col = 2:6)
  }),
  width = width, height = height,
  style = "centered")
print(doc, target = "static/reports/example_word_plot_instr.docx")

static/reports/example_word_plot_instr.docx

4.1.8 Microsoft charts

The ‘mschart’ package allows you to create native Office charts that can be used with ‘officer’. Use the body_add_chart() function to add an Office chart to a Word document.

library(mschart)

my_barchart <- ms_barchart(data = browser_data,
  x = "browser", y = "value", group = "serie")
my_barchart <- chart_settings( x = my_barchart,
  dir="vertical", grouping="clustered", gap_width = 50 )

read_docx() |> 
  body_add_chart(chart = my_barchart, style = "centered") |> 
  print(target = "static/reports/example_word_chart.docx")

static/reports/example_word_chart.docx

4.1.9 Page breaks

Page breaks are handy for formatting a Word document. They allow you to control where your document should move to the next page, such as at the end of a chapter or section.

Use function body_add_break() to add a page break in the Word document.

library(ggplot2)
library(flextable)

gg <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point()

ft <- flextable(head(iris, n = 10))
ft <- set_table_properties(ft, layout = "autofit")

read_docx() |> 
  body_add_par(value = "dataset iris", style = "heading 2") |> 
  body_add_flextable(value = ft ) |> 
  
  body_add_break() |> 

  body_add_par(value = "plot examples", style = "heading 2") |> 
  body_add_gg(value = gg, style = "centered") |> 
  
  print(target = "static/reports/example_break.docx")

static/reports/example_break.docx

4.1.10 External documents

Inserting an external document lets you integrate a previously created Word document into another document. This can be useful when certain parts of a document need to be written manually but automatically integrated into a final document. The document to be inserted must be in docx format.

This can be done by using function body_add_docx().

read_docx() |> 
  body_add_par(value = "An external document", style = "heading 1") |> 
  body_add_docx(src = "static/reports/example_break.docx") |> 
  print(target = "static/reports/example_add_docx.docx")

static/reports/example_add_docx.docx

This can also be advantageous when you are generating huge documents and the generation is getting slower and slower.

In that case, generate smaller documents and design a main script that inserts them into a main Word document. The following script illustrates the strategy:

library(officer)
library(flextable)
library(ggplot2)
library(uuid)

ft <- flextable(iris)
ft <- set_table_properties(ft, layout = "autofit")

gg_plot <- ggplot(data = iris ) +
  geom_point(mapping = aes(Sepal.Length, Petal.Length))

tmpdir <- tempfile()
dir.create(tmpdir, showWarnings = FALSE, recursive = TRUE)
tempfiles <- file.path(tmpdir, paste0(UUIDgenerate(n = 10), ".docx") )

for(i in seq_along(tempfiles)) {
  doc <- read_docx()
  doc <- body_add_par(doc, value = "", style = "Normal")
  doc <- body_add_gg(doc, value = gg_plot, style = "centered")
  doc <- body_add_par(doc, value = "", style = "Normal")
  doc <- body_add_flextable(doc, value = ft)
  print(doc, target = tempfiles[i])
}

# tempfiles contains all generated docx paths

main_doc <- read_docx()
for(docx_path in tempfiles){
  main_doc <- body_add_docx(main_doc, src = docx_path)
}
print(main_doc, target = "static/reports/example_huge.docx")

4.2 Add Sections

A section affects preceding paragraphs or tables (see Word Sections).

Usually, starting with a continuous section and ending with the section you defined is enough.

To format your content in a section, use the body_end_block_section() function. First define the section with the block_section() function, which takes an object returned by prop_section(); it is prop_section() that lets you define the properties of your section.

Let’s first create a document and add a graphic:

library(ggplot2)

gg <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) + 
  geom_point()

doc_section_1 <- read_docx()
doc_section_1 <- body_add_gg(
  x = doc_section_1, value = gg, 
  width = 9, height = 6,
  style = "centered")

Now, let’s add a section that will display the previously added graphic on a landscape-oriented page.

ps <- prop_section(
  page_size = page_size(orient = "landscape"),
  type = "oddPage")

doc_section_1 <- body_end_block_section(
  x = doc_section_1, 
  value = block_section(property = ps))

That’s it. Let’s add the graphic again to see it displayed at the end of the document, in the default section:

doc_section_1 <- body_add_gg(
  x = doc_section_1, value = gg,
  width = 6.29, height = 9.72,
  style = "centered"
)

print(doc_section_1, target = "static/reports/example_landscape_gg.docx")

static/reports/example_landscape_gg.docx

4.2.1 Supported features

Most of the properties of Word sections are available with the ‘officer’ package: page size, page margins, section type (oddPage, continuous, nextColumn) and columns. The ability to link a header or footer to a section is not (yet) implemented.

Section properties are defined with function prop_section with arguments:

page_size: page dimensions defined with function page_size().
page_margins: page margins defined with function page_mar().
type: section type (“continuous”, “evenPage”, “oddPage”, …).
section_columns: section columns defined with function section_columns().
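The four arguments can be combined in a single call. The sketch below is illustrative only: the margin and column values are arbitrary (in inches), and page_mar() is the ‘officer’ helper used here to build the page_margins value.

```r
library(officer)

ps_custom <- prop_section(
  page_size = page_size(orient = "landscape"),
  page_margins = page_mar(top = 1, bottom = 1, left = 0.75, right = 0.75),
  type = "continuous",
  # two columns of 4 inches separated by a vertical line
  section_columns = section_columns(widths = c(4, 4), space = 0.5, sep = TRUE)
)

doc_custom <- read_docx() |>
  body_add_par(paste(rep("Some text.", 50), collapse = " "), style = "Normal") |>
  # the section properties apply to the content added before this call
  body_end_block_section(value = block_section(ps_custom))
```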

4.2.2 How to manage sections

The body_end_block_section function is usually used twice: a first time to close the previous section (and thus start the new one), and a second time to close the new section. All content between the end of the first section and the end of the second section will be arranged according to the rules defined for the second section.

Let’s illustrate the principle with a graphic that needs to be in a landscape oriented page.

A paragraph is added.
An end of section is added (a continuous section is used for this) so that the first paragraph stays in the default section.
The graphic is added.
An end of section that applies to the graphic is added (reusing the properties that define a landscape oriented section).

doc_section_2 <- read_docx() |> 
  body_add_par("This is a dummy text. It is in a continuous section") |> 
  body_end_block_section(block_section(prop_section(type = "continuous"))) |> 
  body_add_gg(value = gg, width = 7, height = 5, style = "centered") |> 
  body_end_block_section(block_section(property = ps))

print(doc_section_2, target = "static/reports/example_landscape_gg2.docx")

static/reports/example_landscape_gg2.docx

Note that if you add a section break at the end of the document with a different orientation than the default, it generates a last page that is empty. This is a behavior of Word and there is only one solution: using a template where the default orientation is the same as the last section break. For example, a default landscape orientation if you insert a landscape oriented section at the end of the document.

Now, let’s illustrate a complex layout. We are going to add two sections oriented in landscape. The first will contain a table, the second will contain long text separated into two columns. The final result will be a landscape oriented page containing a table and then text spread over two columns (and of course this famous extra blank page).

landscape_one_column <- block_section(
  prop_section(
    page_size = page_size(orient = "landscape"), type = "continuous"
  )
)
landscape_two_columns <- block_section(
  prop_section(
    page_size = page_size(orient = "landscape"), type = "continuous",
    section_columns = section_columns(widths = c(4, 4))
  )
)

doc_section_3 <- read_docx() |>
  body_add_table(value = head(mtcars), style = "table_template") |>
  body_end_block_section(value = landscape_one_column) |> 
  body_add_par(value = paste(rep(letters, 60), collapse = " ")) |>
  body_end_block_section(value = landscape_two_columns)

print(doc_section_3, target = "static/reports/example_complex_section.docx")

static/reports/example_complex_section.docx

4.2.3 Sugar functions

In addition to the generic function body_end_block_section, some utility functions are available to be used as shortcuts:

body_end_section_landscape()
body_end_section_portrait()
body_end_section_columns()
body_end_section_columns_landscape()
body_end_section_continuous()

4.3 Remove content

The function body_remove() lets you remove content from a Word document. It is often used together with a cursor_* function, which selects the element to remove.

For illustration purposes, we will reuse the document example_break.docx produced in the page breaks section as the initial document, and its last three paragraphs will be removed.

my_doc <- read_docx(path = "static/reports/example_break.docx")

my_doc <- body_remove(my_doc) |> cursor_end()
my_doc <- body_remove(my_doc) |> cursor_end()
my_doc <- body_remove(my_doc) |> cursor_end()

print(my_doc, target = "static/reports/example_remove.docx")

static/reports/example_remove.docx
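The cursor does not have to be at the end of the document. As a minimal sketch, cursor_reach() can move it to the first paragraph matching a keyword before calling body_remove(); the paragraphs below are placeholders.

```r
library(officer)

doc_rm <- read_docx() |>
  body_add_par("First paragraph", style = "Normal") |>
  body_add_par("Second paragraph", style = "Normal") |>
  body_add_par("Third paragraph", style = "Normal")

# Move the cursor to the paragraph matching the keyword, then remove it
doc_rm <- cursor_reach(doc_rm, keyword = "Second paragraph")
doc_rm <- body_remove(doc_rm)

remaining <- docx_summary(doc_rm)$text
remaining
```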

4.4 Content Replacement

When it comes to replacing content in an existing Word document, there is no straightforward solution using only the ‘officer’ package. This is due to the limitations of Microsoft Word’s design as a manual editing program with limited automation capabilities. There are a few reasons why content replacement can be challenging:

Word does not provide a consistent built-in mechanism for marking a specific target area in a document, making it difficult to identify and replace specific content.
Word arranges typed words into “run” chunks, each containing identically formatted text. However, Word’s handling of run chunks can be inconsistent, especially when there are pauses in typing. For example, typing “he” and then “llo” may result in two separate runs for “he” and “llo,” making it harder to detect and replace the complete word “hello” programmatically.

While the ‘officer’ package does offer some functions for content replacement, it is important to note their limitations and potential difficulties in usage. The approach chosen by ‘officer’ involves using bookmarks placed inside paragraphs, which presents its own challenges:

Placing bookmarks on successive paragraphs renders them unusable.
Placing bookmarks on single words within a paragraph allows for their use.

The existing ‘officer’ functions for content replacement replace entire paragraphs where the bookmark is located. Unfortunately, they do not allow for partial text replacement within a paragraph, and the replaced paragraph no longer retains the original bookmark.

To preserve bookmarks after replacing the containing paragraph, you need to re-inject the bookmarks using the run_bookmark() function.
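As an illustration of how a bookmark can be written with run_bookmark(), the sketch below wraps one run of a new paragraph in a bookmark. The bookmark name "recipient" and the texts are hypothetical; docx_bookmarks() is only used to check that the bookmark is present.

```r
library(officer)

# A paragraph whose middle run is wrapped in a bookmark
# (the bookmark name "recipient" is hypothetical)
bookmarked_par <- fpar(
  "Dear ",
  run_bookmark("recipient", ftext("David", fp_text(bold = TRUE))),
  ","
)

doc_bkm <- read_docx() |>
  body_add_fpar(bookmarked_par, style = "Normal")

# List the bookmarks found in the document
docx_bookmarks(doc_bkm)
```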

The ‘officer’ package provides the following functions for content replacement, but we do not recommend using them due to their limitations (we will deprecate these functions in the future):

body_replace_all_text()
headers_replace_all_text()
footers_replace_all_text()
body_replace_text_at_bkm()
body_replace_img_at_bkm()
headers_replace_text_at_bkm()
headers_replace_img_at_bkm()
footers_replace_text_at_bkm()
footers_replace_img_at_bkm()

Instead, we recommend using the ‘doconv’ package, which offers more flexibility and advanced features for manipulating Word documents. ‘doconv’ allows you to perform complex tasks such as updating calculated fields, including tables of contents and document property references.

With ‘doconv’, you can replace specific chunks of text rather than whole paragraphs. You can find detailed instructions on how to perform content replacement and utilize other features of ‘doconv’ in the article https://www.ardata.fr/en/post/2022/08/25/doconv-0-1-4-is-out/.

To enable text replacement without using body_replace_* functions, ‘officer’ provides functions for simple text replacement without bookmark management:

the officer::run_word_field() function, which allows calculated field insertions. Note it’s probably easier to set them manually.
the set_doc_properties() function, which adds values to be replaced in the document (thanks to calculated fields), and that can be updated with doconv::docx_update().

The following example shows a template with Word computed fields, how to replace the values and update the document with doconv::docx_update().

Figure: a Word template with computed fields

doc <- read_docx("templates/replace-example.docx")
doc <- set_doc_properties(doc, 
  man = "David", 
  `tab-1-1` = "1-1", `tab-1-2` = "1-2",
  `tab-2-1` = "2-1", `tab-2-2` = "2-2",
  randomtitle = "blah blah")
print(doc, target = "static/reports/example_doc_properties.docx")

# doconv::docx_update("static/reports/example_doc_properties.docx")

static/reports/example_doc_properties.docx
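When no template with computed fields is available, the field can also be inserted from R. The following is a sketch under assumptions: it uses the Word field code DOCPROPERTY to reference the custom property "man" from the example above, and the displayed value only appears once the fields have been updated (in Word or with doconv::docx_update()).

```r
library(officer)

doc_field <- read_docx() |>
  body_add_fpar(
    # the field code "DOCPROPERTY man" is an assumption about the
    # Word syntax for referencing a custom document property
    fpar("Written by ", run_word_field(field = "DOCPROPERTY man")),
    style = "Normal"
  )

# Define the value that the field will display after an update
doc_field <- set_doc_properties(doc_field, man = "David")

# out <- print(doc_field, target = tempfile(fileext = ".docx"))
# doconv::docx_update(out)
```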

For insertion of complex paragraphs at bookmark level, use:

officer::cursor_* functions to position your cursor on a specific block (paragraph or table),
and add a block_list() object at this position (using body_add_blocks(pos = "on")).

x <- read_docx(file) # 'file' is the path to an existing Word document
x <- cursor_reach(x, "BLAH BLAH")
x <- body_add_blocks(
  x = x,
  value = block_list(
    fpar(
      "BLIH BLIH",
      fp_p = fp_par(shading.color = "#EFEFEF", text.align = "center"),
      fp_t = fp_text(font.size = 12, font.family = "Calibri", hansi.family = "Calibri")),
    fpar(
      "Blou Blou",
      fp_p = fp_par(shading.color = "red", text.align = "right"),
      fp_t = fp_text(font.size = 10, font.family = "Calibri", hansi.family = "Calibri"))
  ),
  pos = "on")