Chapter 4 officer for Word
4.1 Add contents
To add paragraphs of text, tables or images to a Word document, use one of the body_add_* functions:
body_add_blocks()
body_add_break()
body_add_caption()
body_add_docx()
body_add_fpar()
body_add_gg()
body_add_img()
body_add_par()
body_add_plot()
body_add_table()
body_add_toc()
They all have the same first argument and the same output: each function takes as first input the R object representing the Word document to be filled, and returns that document augmented with the new content.
x <- body_add_par(x, "Level 1 title", style = "heading 1")
These functions all create one or more top level elements, either paragraphs or tables.
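A minimal sketch of this calling convention, assuming the default template used by read_docx() when no path is given. docx_summary() is used here only to inspect the result, and the variable names are illustrative.

```r
library(officer)

# Start from the default template
doc <- read_docx()

# Each call takes the document as first argument and returns the updated document
doc <- body_add_par(doc, "Level 1 title", style = "heading 1")
doc <- body_add_par(doc, "Some text under the title.", style = "Normal")

# Inspect the top level elements that were created
doc_content <- docx_summary(doc)
doc_content[, c("content_type", "style_name", "text")]
```

Because every function returns the document, the same calls can also be chained with the pipe, as in the examples of this chapter.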
4.1.1 Tables
The tabular reporting topic is handled by ‘officer’ using the body_add_table() function. The function renders a data.frame as a Word table with only a few formatting options; for more advanced formatting needs, the ‘flextable’ package is recommended.
The formatting of the table is defined by a table style in the document template, that is, a group of settings that can be applied to a table. The settings include formatting for the overall table, rows, columns, etc.
You can activate the “conditional formatting” instructions, i.e. a style for the first or last row, the first or last column and a style for the row or column strips.
- first_row: apply or remove formatting from the first row in the table.
- first_column: apply or remove formatting from the first column in the table.
- last_row: apply or remove formatting from the last row in the table.
- last_column: apply or remove formatting from the last column in the table.
- no_hband: don’t display odd and even rows.
- no_vband: don’t display odd and even columns.
dat <- cbind(data.frame(cars = row.names(mtcars)),
  mtcars)
head(dat)
cars | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
character | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric | numeric |
Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.6 | 16.5 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.9 | 17.0 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108 | 93 | 3.9 | 2.3 | 18.6 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.1 | 3.2 | 19.4 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.1 | 3.4 | 17.0 | 0 | 0 | 3 | 2 |
Valiant | 18.1 | 6 | 225 | 105 | 2.8 | 3.5 | 20.2 | 1 | 0 | 3 | 1 |
n: 6 |
doc_table <- read_docx(path = "templates/template_demo.docx") |>
  body_add_table(head(dat, n = 20), style = "Table") |>
  body_add_break() |>
  body_add_table(head(dat, n = 20), style = "Table",
    first_column = TRUE)
print(doc_table, target = "static/reports/example_table.docx")
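The conditional formatting arguments can also be tried without a custom template. The sketch below assumes the default template and its "table_template" table style (the style used with read_docx() later in this chapter); the visible effect of each flag depends on what the table style defines.

```r
library(officer)

dat_small <- head(mtcars[, 1:4])

doc <- read_docx()

# Table written with the function's default conditional formatting settings
doc <- body_add_table(doc, value = dat_small, style = "table_template")

# An empty paragraph keeps the two tables separate
doc <- body_add_par(doc, value = "", style = "Normal")

# Same data, with first and last column formatting activated
# and the row strips switched off
doc <- body_add_table(doc, value = dat_small, style = "table_template",
  first_column = TRUE, last_column = TRUE, no_hband = TRUE)

# Keep only the rows of the summary that describe table cells
table_cells <- subset(docx_summary(doc), content_type == "table cell")
```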
4.1.2 Paragraphs
The package makes it very easy to use paragraph styles. You can incrementally add text associated with a paragraph style.
The function is not vectorized; vectorization is planned for a future version.
To add a text paragraph, use the body_add_par() function. The function requires 3 arguments: the target document, the text to be used in the new paragraph, and the paragraph style to be used.
read_docx() |>
body_add_par(value = "Hello World!", style = "Normal") |>
body_add_par(value = "Salut Bretons!", style = "centered") |>
print(target = "static/reports/example_par.docx")
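Since the function is not vectorized, several paragraphs can be added with a loop. This is only a sketch of one possible approach; the texts vector is made up for the illustration.

```r
library(officer)

# Hypothetical texts, one element per paragraph
texts <- c("First paragraph.", "Second paragraph.", "Third paragraph.")

doc <- read_docx()
for (txt in texts) {
  # One call per paragraph, the document is updated at each iteration
  doc <- body_add_par(doc, value = txt, style = "Normal")
}

added_text <- docx_summary(doc)$text
```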
4.1.3 Titles
A title is a paragraph. To add a title, use body_add_par()
with the style
argument set to the corresponding title style.
read_docx() |>
body_add_par(value = "This is a title 1", style = "heading 1") |>
body_add_par(value = "This is a title 2", style = "heading 2") |>
body_add_par(value = "This is a title 3", style = "heading 3") |>
print(target = "static/reports/example_titles.docx")
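The names of the title styles depend on the template. A way to look them up, sketched here for the default template, is to list the paragraph styles with styles_info(); the filter on names starting with "heading" is an assumption that matches the style names used above.

```r
library(officer)

doc <- read_docx()

# All paragraph styles available in the template
par_styles <- styles_info(doc, type = "paragraph")

# Keep the styles whose name starts with "heading"
heading_styles <- par_styles[
  grepl("^heading", par_styles$style_name),
  c("style_id", "style_name")
]
heading_styles
```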
4.1.4 Tables of contents
A TOC (table of contents) is a Word computed field: the table of contents is built by Word, not by R. The TOC field collects its entries from paragraphs that use heading styles or another specified style.
Note: you have to update the fields in the Word application for the page numbers to be correct (see “update the fields”).
Use function body_add_toc() to insert a TOC inside a Word document.
doc_toc <- read_docx(path = "templates/word_example.docx") |>
  body_add_par("Table of Contents", style = "heading 1") |>
  body_add_toc(level = 2) |>
  body_add_par("Table of figures", style = "heading 1") |>
  body_add_toc(style = "Image Caption") |>
  body_add_par("Table of tables", style = "heading 1") |>
  body_add_toc(style = "Table Caption")
print(doc_toc, target = "static/reports/example_toc.docx")
4.1.5 Images
Images are specific because they are part of a paragraph. This means you can mix text and images in a paragraph. An image is always rendered in a paragraph.
The function body_add_img() is a sugar function that wraps an image into a paragraph. It accepts various image formats: png, jpeg or emf.
img.file <- file.path(R.home("doc"), "html", "logo.jpg")
read_docx() |>
body_add_img(src = img.file, height = 1.06, width = 1.39, style = "centered") |>
print(target = "static/reports/example_image.docx")
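To mix text and an image in the same paragraph, one option is to build the paragraph with fpar() and external_img(), then add it with body_add_fpar(). This is a sketch under the assumption that the R logo file used above is available; the sizes are arbitrary.

```r
library(officer)

img.file <- file.path(R.home("doc"), "html", "logo.jpg")

# A paragraph made of a text chunk followed by an inline image
par_with_img <- fpar(
  ftext("The R logo: ", prop = fp_text(bold = TRUE)),
  external_img(src = img.file, height = 0.53, width = 0.70)
)

doc <- read_docx()
doc <- body_add_fpar(doc, value = par_with_img, style = "Normal")
```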
4.1.6 Plots from ‘ggplot2’
The function body_add_gg() is also a sugar function; it wraps an image generated from a ggplot into a paragraph.
library(ggplot2)
gg <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()
doc_gg <- read_docx()
doc_gg <- body_add_gg(x = doc_gg, value = gg, style = "centered")
The page dimensions of the Word document can be used to maximize the size of the graphic to be produced.
word_size <- docx_dim(doc_gg)
word_size
# $page
# width height
# 8.263889 11.694444
#
# $landscape
# [1] FALSE
#
# $margins
# top bottom left right header footer
# 0.9840278 0.9840278 0.9840278 0.9840278 0.4916667 0.4916667
width <- word_size$page['width'] - word_size$margins['left'] - word_size$margins['right']
height <- word_size$page['height'] - word_size$margins['top'] - word_size$margins['bottom']
doc_gg <- body_add_gg(x = doc_gg, value = gg,
  width = width, height = height,
  style = "centered")
print(doc_gg, target = "static/reports/example_gg.docx")
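The width and height computation can be wrapped in a small helper. In the sketch below, body_area() is a hypothetical name and not a function of ‘officer’. With the dimensions printed above, the arithmetic gives about 6.30 by 9.73 inches (8.263889 - 2 × 0.9840278 and 11.694444 - 2 × 0.9840278).

```r
library(officer)

# Hypothetical helper: usable area of the page, i.e. page size minus margins,
# in the unit returned by docx_dim()
body_area <- function(x) {
  dims <- docx_dim(x)
  c(
    width = unname(dims$page["width"] - dims$margins["left"] - dims$margins["right"]),
    height = unname(dims$page["height"] - dims$margins["top"] - dims$margins["bottom"])
  )
}

doc <- read_docx()
area <- body_area(doc)
area
```

The two values can then be passed to the width and height arguments of body_add_gg() or body_add_plot().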
4.1.7 Base plot
To add a standard R graphic, use the body_add_plot() function with plot_instr(), which contains the graphic instructions to be executed to produce a single graphic.
doc <- read_docx()
doc <- body_add_plot(doc,
  width = width, height = height,
  value = plot_instr(
    code = {barplot(1:5, col = 2:6)}),
  style = "centered")
print(doc, target = "static/reports/example_word_plot_instr.docx")
4.1.8 Microsoft charts
The ‘mschart’ package allows you to create native office graphics
that can be used with ‘officer’. The body_add_chart
function
must be used to generate an office chart in Word.
library(mschart)
my_barchart <- ms_barchart(data = browser_data,
  x = "browser", y = "value", group = "serie")
my_barchart <- chart_settings(x = my_barchart,
  dir = "vertical", grouping = "clustered", gap_width = 50)
read_docx() |>
body_add_chart(chart = my_barchart, style = "centered") |>
print(target = "static/reports/example_word_chart.docx")
4.1.9 Page breaks
Page breaks are handy for formatting a Word document. They allow you to control where your document should move to the next page, such as at the end of a chapter or section.
Use function body_add_break()
to add a page break in the Word document.
library(ggplot2)
library(flextable)
gg <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()
ft <- flextable(head(iris, n = 10))
ft <- set_table_properties(ft, layout = "autofit")
read_docx() |>
body_add_par(value = "dataset iris", style = "heading 2") |>
body_add_flextable(value = ft ) |>
body_add_break() |>
body_add_par(value = "plot examples", style = "heading 2") |>
body_add_gg(value = gg, style = "centered") |>
print(target = "static/reports/example_break.docx")
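Page breaks are also convenient when content is generated in a loop. The sketch below puts each iris species on its own page; it assumes the default template and its "table_template" table style, and adds no break after the last group to avoid a trailing empty page.

```r
library(officer)

species <- levels(iris$Species)

doc <- read_docx()
for (i in seq_along(species)) {
  sub_data <- head(iris[iris$Species == species[i], 1:4], n = 3)
  doc <- body_add_par(doc, value = species[i], style = "heading 2")
  doc <- body_add_table(doc, value = sub_data, style = "table_template")
  # Break to a new page between groups, but not after the last one
  if (i < length(species)) {
    doc <- body_add_break(doc)
  }
}

doc_text <- docx_summary(doc)$text
```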
4.1.10 External documents
Inserting an external document allows you to integrate a previously created Word document into another document. This can be useful when certain parts of a document need to be written manually but automatically integrated into a final document. The document to be inserted must be in docx format.
This can be done with the function body_add_docx().
read_docx() |>
body_add_par(value = "An external document", style = "heading 1") |>
body_add_docx(src = "static/reports/example_break.docx") |>
print(target = "static/reports/example_add_docx.docx")
This can also be advantageous when you are generating huge documents and the generation is getting slower and slower. In that case, generate smaller documents and design a main script that inserts them into a main Word document. The following script illustrates the strategy:
library(uuid)
ft <- flextable(iris)
ft <- set_table_properties(ft, layout = "autofit")
gg_plot <- ggplot(data = iris) +
  geom_point(mapping = aes(Sepal.Length, Petal.Length))
# temporary directory and file names for the 10 small documents
tmpdir <- tempfile()
dir.create(tmpdir, showWarnings = FALSE, recursive = TRUE)
tempfiles <- file.path(tmpdir, paste0(UUIDgenerate(n = 10), ".docx"))
# generate the small documents
for (i in seq_along(tempfiles)) {
  doc <- read_docx()
  doc <- body_add_par(doc, value = "", style = "Normal")
  doc <- body_add_gg(doc, value = gg_plot, style = "centered")
  doc <- body_add_par(doc, value = "", style = "Normal")
  doc <- body_add_flextable(doc, value = ft)
  print(doc, target = tempfiles[i])
}
# tempfiles contains all generated docx paths
main_doc <- read_docx()
for (docx_file in tempfiles) {
  main_doc <- body_add_docx(main_doc, src = docx_file)
}
print(main_doc, target = "static/reports/example_huge.docx")
4.2 Add Sections
A section affects the paragraphs or tables that precede it (see Word Sections).
Usually, starting with a continuous section and ending with the section you defined is enough.
To format your content in a section, you should use the body_end_block_section
function. First you need to define the section with the block_section
function, which takes an object returned by the prop_section
function. It is
prop_section()
that allows you to define the properties of your section.
Let’s first create a document and add a graphic:
library(ggplot2)
gg <- ggplot(data = iris, aes(Sepal.Length, Petal.Length)) +
  geom_point()
doc_section_1 <- read_docx()
doc_section_1 <- body_add_gg(
  x = doc_section_1, value = gg,
  width = 9, height = 6,
  style = "centered")
Now, let’s add a section that will display the previously added graphic on a landscape oriented page.
ps <- prop_section(
  page_size = page_size(orient = "landscape"),
  type = "oddPage")
doc_section_1 <- body_end_block_section(
  x = doc_section_1,
  value = block_section(property = ps))
That’s it. Let’s add the graphic again to see it displayed at the end of the document, in the default section:
doc_section_1 <- body_add_gg(
  x = doc_section_1, value = gg,
  width = 6.29, height = 9.72,
  style = "centered"
)
print(doc_section_1, target = "static/reports/example_landscape_gg.docx")
4.2.1 Supported features
Most of the properties of Word sections are available with the ‘officer’ package: page size, page margins, section type (oddPage, continuous, nextColumn) and columns. The ability to link a header or footer to a section is not (yet) implemented.
Section properties are defined with function prop_section(), with arguments:
- page_size: page dimensions defined with function page_size().
- page_margins: page margins defined with function page_margins().
- type: section type (“continuous”, “evenPage”, “oddPage”, …).
- section_columns: section columns defined with function section_columns().
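The four arguments can be combined in a single call. The following sketch uses arbitrary margin and column values (in inches) chosen for the illustration, and applies the resulting section to a paragraph of dummy text in the default template.

```r
library(officer)

# Landscape page, custom margins, continuous section, two columns
# (all numeric values are illustrative)
ps_custom <- prop_section(
  page_size = page_size(orient = "landscape"),
  page_margins = page_margins(top = 1, bottom = 1, left = 0.75, right = 0.75),
  type = "continuous",
  section_columns = section_columns(widths = c(4, 4))
)

doc <- read_docx()
doc <- body_add_par(doc,
  value = paste(rep("Some dummy text.", 50), collapse = " "),
  style = "Normal")

# The section applies to the content added before this call
doc <- body_end_block_section(doc, value = block_section(property = ps_custom))
```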
4.2.2 How to manage sections
The body_end_block_section() function is usually called twice: a first time to close the previous section (and thus start the new one), and a second time to close the new section. All content between the end of the first section and the end of the second section is arranged according to the rules defined for the second section.
Let’s illustrate the principle with a graphic that needs to be on a landscape oriented page.
- A paragraph is added.
- We add an end of section (we use a continuous section for this) to let the first paragraph fit in the default section.
- Add the graphic.
- We add an end of section that will apply to the graphic (we reuse the property that allows to have a section oriented in landscape).
doc_section_2 <- read_docx() |>
  body_add_par("This is a dummy text. It is in a continuous section") |>
  body_end_block_section(block_section(prop_section(type = "continuous"))) |>
  body_add_gg(value = gg, width = 7, height = 5, style = "centered") |>
  body_end_block_section(block_section(property = ps))
print(doc_section_2, target = "static/reports/example_landscape_gg2.docx")
Note that if you add a section break at the end of the document with a different orientation than the default, it generates a last page that is empty. This is a behavior of Word and there is only one solution: using a template where the default orientation is the same as the last section break. For example, a default landscape orientation if you insert a landscape oriented section at the end of the document.
Now, let’s illustrate a complex layout. We are going to add two sections oriented in landscape. The first will contain a table, the second will contain long text separated into two columns. The final result will be a landscape oriented page containing a table and then text spread over two columns (and of course this famous extra blank page).
landscape_one_column <- block_section(
  prop_section(
    page_size = page_size(orient = "landscape"), type = "continuous"
  )
)
landscape_two_columns <- block_section(
  prop_section(
    page_size = page_size(orient = "landscape"), type = "continuous",
    section_columns = section_columns(widths = c(4, 4))
  )
)
doc_section_3 <- read_docx() |>
  body_add_table(value = head(mtcars), style = "table_template") |>
  body_end_block_section(value = landscape_one_column) |>
  body_add_par(value = paste(rep(letters, 60), collapse = " ")) |>
  body_end_block_section(value = landscape_two_columns)
print(doc_section_3, target = "static/reports/example_complex_section.docx")
4.3 Remove content
The function body_remove() lets you remove content from a Word document. It removes the element at the cursor position, so it is often used together with a cursor_* function.
For illustration purposes, we reuse the document produced in the page breaks example (static/reports/example_break.docx) as the initial document and remove its last three paragraphs.
my_doc <- read_docx(path = "static/reports/example_break.docx")
my_doc <- body_remove(my_doc) |> cursor_end()
my_doc <- body_remove(my_doc) |> cursor_end()
my_doc <- body_remove(my_doc) |> cursor_end()
print(my_doc, target = "static/reports/example_remove.docx")
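Content that is not at the end of the document can be removed by first moving the cursor with cursor_reach(). The sketch below builds a small document for the purpose (the paragraph texts are made up), writes it to a temporary file, reads it back and removes the paragraph matching a keyword.

```r
library(officer)

# Build and save a small document with three paragraphs
doc <- read_docx()
doc <- body_add_par(doc, "Keep this paragraph.", style = "Normal")
doc <- body_add_par(doc, "Delete this paragraph.", style = "Normal")
doc <- body_add_par(doc, "Keep this one too.", style = "Normal")
docx_file <- print(doc, target = tempfile(fileext = ".docx"))

# Read it back, move the cursor to the paragraph to delete, then remove it
my_doc <- read_docx(path = docx_file)
my_doc <- cursor_reach(my_doc, keyword = "Delete this")
my_doc <- body_remove(my_doc)

remaining <- docx_summary(my_doc)$text
```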
4.4 Content Replacement
When it comes to replacing content in an existing Word document, there is no straightforward solution using only the ‘officer’ package. This is due to the limitations of Microsoft Word’s design as a manual editing program with limited automation capabilities. There are a few reasons why content replacement can be challenging:
- Word does not provide a consistent built-in mechanism for marking a specific target area in a document, making it difficult to identify and replace specific content.
- Word arranges typed words into “run” chunks, each containing identically formatted text. However, Word’s handling of run chunks can be inconsistent, especially when there are pauses in typing. For example, typing “he” and then “llo” may result in two separate runs for “he” and “llo,” making it harder to detect and replace the complete word “hello” programmatically.
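The run issue can be illustrated without any Word file. The following base R sketch simply models a paragraph as a character vector of runs (a simplification made for the illustration) and shows why a search performed run by run misses a word that spans two runs.

```r
# Text of one paragraph as Word might store it: three runs
runs <- c("he", "llo", " world")

# Searching run by run misses the word: no single run contains "hello"
found_in_a_run <- any(grepl("hello", runs, fixed = TRUE))

# Searching the concatenated paragraph text finds it
found_in_paragraph <- grepl("hello", paste(runs, collapse = ""), fixed = TRUE)

c(found_in_a_run = found_in_a_run, found_in_paragraph = found_in_paragraph)
```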
While the ‘officer’ package does offer some functions for content replacement, it is important to note their limitations and potential difficulties in usage. The approach chosen by ‘officer’ involves using bookmarks placed inside paragraphs, which presents its own challenges:
- Placing bookmarks on successive paragraphs renders them unusable.
- Placing bookmarks on single words within a paragraph allows for their use.
The existing ‘officer’ functions for content replacement replace entire paragraphs where the bookmark is located. Unfortunately, they do not allow for partial text replacement within a paragraph, and the replaced paragraph no longer retains the original bookmark.
To preserve bookmarks after replacing the containing paragraph, you need to
re-inject the bookmarks using the run_bookmark()
function.
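As a sketch of how a bookmark is attached to a chunk of text, the example below wraps an ftext() run with run_bookmark() inside a paragraph and lists the bookmarks of the document with docx_bookmarks(). The bookmark name and the texts are hypothetical.

```r
library(officer)

# A paragraph whose second chunk carries the bookmark "status_bkm"
bkm_par <- fpar(
  "Status: ",
  run_bookmark("status_bkm", ftext("pending", fp_text(bold = TRUE)))
)

doc <- read_docx()
doc <- body_add_fpar(doc, value = bkm_par, style = "Normal")

# Names of the bookmarks found in the document
bookmarks <- docx_bookmarks(doc)
bookmarks
```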
The ‘officer’ package provides the following functions for content replacement, but we do not recommend using them due to their limitations (we will deprecate these functions in the future):
body_replace_all_text()
headers_replace_all_text()
footers_replace_all_text()
body_replace_text_at_bkm()
body_replace_img_at_bkm()
headers_replace_text_at_bkm()
headers_replace_img_at_bkm()
footers_replace_text_at_bkm()
footers_replace_img_at_bkm()
Instead, we recommend using the ‘doconv’ package, which offers more flexibility and advanced features for manipulating Word documents. ‘doconv’ allows you to perform complex tasks such as updating calculated fields, including tables of contents and document property references.
With ‘doconv’, you can replace specific chunks of text rather than whole paragraphs. You can find detailed instructions on how to perform content replacement and utilize other features of ‘doconv’ in the article https://www.ardata.fr/en/post/2022/08/25/doconv-0-1-4-is-out/.
To enable text replacement without using the body_replace_* functions, ‘officer’ provides functions for simple text replacement without bookmark management:
- the officer::run_word_field() function, which allows calculated field insertions. Note that it is probably easier to set them manually.
- the set_doc_properties() function, which adds values to be replaced in the document (thanks to calculated fields), and that can be updated with doconv::docx_update().
The following example starts from a template containing Word computed fields, sets the values to be displayed, and shows how the document can then be updated with doconv::docx_update().
doc <- read_docx("templates/replace-example.docx")
doc <- set_doc_properties(doc,
  man = "David",
  `tab-1-1` = "1-1", `tab-1-2` = "1-2",
  `tab-2-1` = "2-1", `tab-2-2` = "2-2",
  randomtitle = "blah blah")
print(doc, target = "static/reports/example_doc_properties.docx")
# doconv::docx_update("static/reports/example_doc_properties.docx")
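When no prepared template is at hand, the same idea can be sketched from R alone: set a property, insert a calculated field that refers to it, and read the properties back. The example assumes the default template; the TITLE field code is standard Word field syntax, and the displayed value only appears once the fields have been updated (in Word or with doconv::docx_update()).

```r
library(officer)

doc <- read_docx()

# Set two built-in document properties
doc <- set_doc_properties(doc, title = "Quarterly report", creator = "David")

# A paragraph containing a calculated field that displays the title property
title_par <- fpar("Title: ", run_word_field(field = "TITLE"))
doc <- body_add_fpar(doc, value = title_par, style = "Normal")

# Read the properties back as a data.frame
props <- doc_properties(doc)
props[props$tag %in% c("title", "creator"), ]
```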
For insertion of complex paragraphs at bookmark level:
- use the officer::cursor_* functions to position your cursor on a specific block (paragraph or table),
- and add a block_list() object at this position (using body_add_blocks(pos = "on")).
# `file` is the path to an existing docx file
x <- read_docx(file)
# move the cursor to the paragraph containing "BLAH BLAH"
x <- cursor_reach(x, "BLAH BLAH")
# replace that paragraph with the blocks (passed as second argument)
x <- body_add_blocks(
  x,
  block_list(
    fpar(
      "BLIH BLIH",
      fp_p = fp_par(shading.color = "#EFEFEF", text.align = "center"),
      fp_t = fp_text(font.size = 12, font.family = "Calibri", hansi.family = "Calibri")),
    fpar(
      "Blou Blou",
      fp_p = fp_par(shading.color = "red", text.align = "right"),
      fp_t = fp_text(font.size = 10, font.family = "Calibri", hansi.family = "Calibri"))
  ),
  pos = "on")
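A self-contained variant of this replacement, sketched under the assumption that the target paragraph can be found by its text: the document is built on the fly (its content is made up), saved to a temporary file, read back, and the matching paragraph is replaced by two new ones.

```r
library(officer)

# Build and save a small document containing the paragraph to replace
doc <- read_docx()
doc <- body_add_par(doc, "Introduction", style = "Normal")
doc <- body_add_par(doc, "BLAH BLAH", style = "Normal")
doc <- body_add_par(doc, "Conclusion", style = "Normal")
docx_file <- print(doc, target = tempfile(fileext = ".docx"))

# Read it back, reach the paragraph, and replace it with two blocks
x <- read_docx(docx_file)
x <- cursor_reach(x, "BLAH BLAH")
x <- body_add_blocks(
  x,
  block_list(
    fpar("BLIH BLIH", fp_p = fp_par(text.align = "center")),
    fpar("Blou Blou", fp_p = fp_par(text.align = "right"))
  ),
  pos = "on"
)

# Text of the paragraphs after the replacement
new_text <- docx_summary(x)$text
```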