Remove flextable columns after the flextable has been created - flextable

I create a flextable from a CSV file, apply some styling to it, and change some cells. Then I would like to remove a specific column from this flextable before adding it to a doc.
Is there a way to create a copy of a flextable while specifying col_keys?
library(flextable)
library(officer)
library(magrittr)
mydf <- GetData(....)
cols <- names(mydf)
ft <- flextable(mydf, col_keys = cols)
# Adding style to ft...
# ....
# Here I want to remove one column to the ft (and only here, not when first creating the ft)
# something as:
# ft <- CreateCopyOfFlextable(ft,cols[-which(cols=='COLB')])
#
my_doc <- read_docx()
my_doc <- my_doc %>% body_add_par("") %>%
body_add_flextable(value = ft)
print(my_doc, target = 'c:/temp/doc.docx')

I just had the same problem and had a devil of a time Googling for a solution. @David-Gohel truly has the answer here, but I want to provide a similar solution with additional explanation.
My problem, like the OP's, was that we wanted to use the data in columns that wouldn't be displayed to influence the formatting of columns that would be displayed. The concept that was not initially obvious is that you can pass flextable a data frame with more columns than you intend to display (instead of displaying them all and deleting them once you've used them). Then, with the col_keys argument, you select only the columns you want to display while keeping the remaining ones available for additional processing (e.g., with compose(), paragraph(), or add_chunk()).
If I understand correctly, COLB is supposed to be a flag to indicate that certain rows of COLC should be modified. If so, then my solution looks like this:
library(flextable)
library(magrittr)
library(officer)
df <- data.frame(COLA=c('a', 'b', 'c'),
COLB=c('', 'changevalue', ''),
COLC=c(10, 12, 13))
ft <- flextable(df, col_keys = c("COLA", "COLC")) %>% # Retain but don't display COLB
compose(i = ~ COLB =='changevalue', # Use COLB for conditional modifications
j = "COLC",
value = as_paragraph(as_chunk('100')),
part = 'body') %>%
style(i = ~ COLB =='changevalue', # Use COLB for conditional formatting on COLC
j = "COLC",
pr_t = fp_text(color = "black",
font.size = 11,
bold = TRUE,
italic = FALSE,
underline = FALSE,
font.family = "Times New Roman"),
part = "body")
ft
In the table the above code produces, COLB is not displayed; its "changevalue" flag is the trigger both for inserting 100 into COLC and for changing that cell's formatting.

library(flextable)
library(magrittr)
library(officer)
df <- data.frame(COLA=c('a','b','c'),
COLB=c('','changevalue',''),
COLC=c(10,12,13))
ft<-flextable(df, col_keys = c("COLA", "COLB"))
ft <- ft %>%
style(i= ~ COLB=='changevalue',
pr_t=fp_text(color="black", font.size=11, bold=TRUE, italic=FALSE, underline=FALSE, font.family="Times New Roman"),part="body")
ft<-compose(ft, i=2, j="COLB", value = as_paragraph(as_chunk('100')),part = 'body')
ft

I'm styling another column based on COLB.
Here is an example:
library(flextable)
library(officer)
library(magrittr)
df <- data.frame(COLA=c('a','b','c'),COLB=c('','changevalue',''),COLC=c(10,12,13))
ft<-flextable(df)
ft <- ft %>% style(i=which(ft$body$dataset$COLB=='changevalue'),pr_t=fp_text(color="black", font.size=11, bold=TRUE, italic=FALSE, underline=FALSE, font.family="Times New Roman"),part="body")
ft<-compose(ft, i=2,j=3, value = as_paragraph(as_chunk('100')),part = 'body')
# now I want to remove the COLB column, as I don't need it anymore
# ???????
my_doc <- read_docx()
my_doc <- my_doc %>% body_add_par("") %>% body_add_flextable(value = ft)
print(my_doc, target = 'c:/temp/orliange_p/sample.docx') %>% invisible()
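Applied to this follow-up example, the col_keys approach from the answers above means COLB never has to be removed: it stays in the data frame but is left out of col_keys, and the row selectors can still refer to it. A minimal sketch, assuming fp_text() from officer as in the original code; the check at the end reads the flextable's col_keys and body$dataset fields (the latter is already used in the example above). Writing the file with print() is left out.

```r
library(flextable)
library(officer)
library(magrittr)

df <- data.frame(COLA = c('a', 'b', 'c'),
                 COLB = c('', 'changevalue', ''),
                 COLC = c(10, 12, 13))

# COLB stays in the underlying dataset but is not displayed
ft <- flextable(df, col_keys = c("COLA", "COLC")) %>%
  # the row selector can still refer to the hidden COLB
  style(i = ~ COLB == 'changevalue',
        pr_t = fp_text(color = "black", font.size = 11, bold = TRUE,
                       italic = FALSE, underline = FALSE,
                       font.family = "Times New Roman"),
        part = "body") %>%
  # replace the displayed value of COLC in the flagged row
  compose(i = ~ COLB == 'changevalue', j = "COLC",
          value = as_paragraph(as_chunk('100')), part = 'body')

my_doc <- read_docx() %>%
  body_add_par("") %>%
  body_add_flextable(value = ft)
```

Because COLB is never displayed, there is nothing to delete afterwards; the document only shows COLA and COLC.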

Related

Group by multiple columns in a function in dplyr

I want to create a function that takes an externally defined variable and uses it in a group by using dplyr. Here is what I have so far:
data(mtcars)
my_grp_col <- 'gear'
calculate_mean <- function(data, grouping_column, target){
data %>%
group_by(cyl, am, {{my_grp_col}}, target) %>%
summarize(mean(target, na.rm = T))
}
calculate_mean(data = mtcars, grouping_column = my_grp_col, target = mpg)
Essentially, I want to group by cyl, am, gear (which I have defined externally) and then calculate the mean of target (mpg).
The following would work (note that you also need {{...}} around target in this case, and that target belongs inside mean() rather than in group_by(), otherwise every row becomes its own group):
library(dplyr)
data(mtcars)
my_grp_col <- 'gear'
calculate_mean <- function(data, grouping_column, target){
data %>%
group_by(cyl, am, !!sym(grouping_column)) %>%
summarize(mean({{target}}, na.rm = TRUE))
}
calculate_mean(data = mtcars, grouping_column = my_grp_col, target = mpg)
However, it would look much nicer if you also passed grouping_column directly as a bare column name instead of defining it as a string first:
calculate_mean <- function(data, grouping_column, target){
data %>%
group_by(cyl, am, {{grouping_column}}) %>%
summarize(mean({{target}}, na.rm = TRUE))
}
calculate_mean(data = mtcars, grouping_column = gear, target = mpg)
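If grouping_column has to stay a string (for example, when the name comes from elsewhere in the code, as my_grp_col does), the .data pronoun is another option besides !!sym(). A sketch, assuming dplyr 1.0 or later for the .groups argument; the output name mean_target is illustrative:

```r
library(dplyr)

calculate_mean <- function(data, grouping_column, target) {
  data %>%
    # .data[[...]] looks up a column whose name is stored in a string
    group_by(cyl, am, .data[[grouping_column]]) %>%
    # {{ }} forwards the bare column name passed as `target`
    summarize(mean_target = mean({{ target }}, na.rm = TRUE),
              .groups = "drop")
}

my_grp_col <- "gear"
res <- calculate_mean(data = mtcars, grouping_column = my_grp_col, target = mpg)
res
```

The result has one row per combination of cyl, am, and the chosen grouping column, with the mean of the target column in mean_target.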

Several lines with different styles in a caption in both HTML and docx - flextable

I need to show the data caption, computer name, and period in the header of the table.
I also have other requirements: a zebra theme and merging cells where needed. That's why I chose flextable.
Here is my code:
library(officer) # border settings library
library(flextable) # drawing tables library
library(dplyr)
Caption <- "<b><big>Computer01.domain.com</big></b><br>Network Interface<br>Gbit Total/sec<br><small>2021-05-14 18:04 to 2021-05-25 13:29</small>"
bold_border <- fp_border(color="gray", width =2)
std_border <- fp_border(color="gray")
stub <- "2021-05-14 01:40 to 2021-05-17 08:26"
table_data <- data.frame (
Instance = c("Intel[R] Ethernet 10G",
"Intel[R] Ethernet Converged Network Adapter _1",
"Intel[R] Ethernet Converged Network Adapter _2",
"Intel[R] Ethernet 10G",
"Intel[R] Gigabit"),
Max = c(2.45, 2.41, 2.29, 2.17, 0),
Avg = c(0.15, 0.15, 0.15, 0.17, 0)
)
table <- table_data %>% flextable() %>%
set_caption(caption = Caption , html_escape = F) %>%
bg(bg = "#579FAD", part = "header") %>%
color(color = "white", part = "header") %>%
theme_zebra(
odd_header = "#579FAD",
odd_body = "#E0EFF4",
even_header = "transparent",
even_body = "transparent"
) %>%
set_table_properties(width = 1, layout = "autofit") %>%
hline(part="all", border = std_border ) %>%
vline(part="all", border = std_border ) %>%
border_outer( border = bold_border, part = "all" ) %>%
fix_border_issues() %>%
set_header_labels(
values = list(Instance = "Instance Name") ) %>%
flextable::font (part = "all" , fontname = "Calibri")
save_as_docx( table, path = file.path("c:\\temp", "test01.docx") )
save_as_html (table, path = file.path("c:\\temp", "test01.html"))
The HTML output looks the way I want.
But in the docx output, my header styling is not applied.
How can I create a header like the HTML one that can be saved to both HTML and docx?
If I have to create separate tables, one for HTML and the other for docx, that is not ideal but acceptable. In that case, my question is how to create the header I made in HTML for the docx format.
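One possible workaround, not taken from the original thread and only a sketch, is to move the caption text into the table itself with add_header_lines(), so that each caption line becomes a merged header row styled with flextable's own formatting functions rather than HTML tags. It assumes that add_header_lines() with top = TRUE keeps the lines in the order given; the font sizes are placeholders and the data frame is shortened from the question:

```r
library(flextable)

table_data <- data.frame(
  Instance = c("Intel[R] Ethernet 10G", "Intel[R] Gigabit"),
  Max = c(2.45, 0),
  Avg = c(0.15, 0)
)

# The four caption lines from the question, without the HTML tags
caption_lines <- c("Computer01.domain.com",
                   "Network Interface",
                   "Gbit Total/sec",
                   "2021-05-14 18:04 to 2021-05-25 13:29")

ft <- flextable(table_data)
# Each value becomes one header row spanning all columns
ft <- add_header_lines(ft, values = caption_lines, top = TRUE)
# Line 1: bold and larger (stands in for <b><big>)
ft <- bold(ft, i = 1, part = "header")
ft <- fontsize(ft, i = 1, size = 14, part = "header")
# Line 4: smaller (stands in for <small>)
ft <- fontsize(ft, i = 4, size = 8, part = "header")
ft <- align(ft, i = 1:4, align = "center", part = "header")
```

The result can then be passed to save_as_docx() and save_as_html() as in the question. Theme functions such as theme_zebra() restyle the whole header part, so it is probably safest to call them before the caption-specific formatting.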

R: Converting ggplot objects to interactive graphs

I am using the R programming language. I am trying to take different types of graphs (bar graphs, pie charts) and put them on the same page. I generated some fake data and made several graphs, then put them together (see "Combining Different Types of Graphs Together (R)"):
library(dplyr)
library(ggplot2)
library(cowplot)
library(gridExtra)
library(plotly)
date= seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
var <- rnorm(731,10,10)
group <- sample( LETTERS[1:4], 731, replace=TRUE, prob=c(0.25, 0.22, 0.25, 0.25) )
data = data.frame(date, var, group)
data$year = as.numeric(format(data$date,'%Y'))
data$year = as.factor(data$year)
###Pie
Pie_2014 <- data %>%
filter((data$year == "2014")) %>%
group_by(group) %>%
summarise(n = n())
Pie_2014_graph = ggplot(Pie_2014, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2014")
Pie_2015 <- data %>%
filter((data$year == "2015")) %>%
group_by(group) %>%
summarise(n = n())
Pie_2015_graph = ggplot(Pie_2015, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2015")
Pie_total = data %>%
group_by(group) %>%
summarise(n = n())
Pie_total_graph = ggplot(Pie_total, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart Average")
###bars
Bar_years = data %>%
group_by(year, group) %>%
summarise(mean = mean(var))
Bar_years_plot = ggplot(Bar_years, aes(fill=group, y=mean, x=year)) +
geom_bar(position="dodge", stat="identity") + ggtitle("Bar Plot All Years")
Bar_total = data %>%
group_by(group) %>%
summarise(mean = n())
Bar_total_plot = ggplot(Bar_total, aes(x=group, y=mean, fill=group)) +
geom_bar(stat="identity")+theme_minimal() + ggtitle("Bar Plot Average")
#assembling the graphs can be done two different ways
#first way
g1 <- grid.arrange(Pie_2014_graph, Pie_2015_graph , Pie_total_graph, nrow = 1)
g2 <- grid.arrange(Bar_total_plot, Bar_years_plot, nrow = 1)
g = grid.arrange(g1, g2, ncol = 1)
#second way
# arrange subplots in rows
top_row <- plot_grid(Pie_2014_graph, Pie_2015_graph, Pie_total_graph)
middle_row <- plot_grid(Bar_years_plot, Bar_total_plot)
# arrange our new rows into combined plot
p <- plot_grid(top_row, middle_row, nrow = 2)
p
From here, I am trying to use the plotly::ggplotly() command to make the above output "interactive" (move the mouse over the graphs and see labels). I know that this works for individual plots:
ggplotly(Bar_years_plot)
However, this command does not seem to work with the "cowplot" and the "gridExtra" outputs:
#gridExtra version:
ggplotly(g)
Error in UseMethod("ggplotly", p) :
no applicable method for 'ggplotly' applied to an object of class "c('gtable', 'gTree', 'grob', 'gDesc')"
#cowplot version: (produces empty plot)
ggplotly(p)
Warning messages:
1: In geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) :
geom_GeomDrawGrob() has yet to be implemented in plotly.
If you'd like to see this geom implemented,
Please open an issue with your example code at
https://github.com/ropensci/plotly/issues
2: In geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) :
geom_GeomDrawGrob() has yet to be implemented in plotly.
If you'd like to see this geom implemented,
Please open an issue with your example code at
https://github.com/ropensci/plotly/issues
Does anyone know if there is a quick way to use the ggplotly() function for objects created with "gridExtra" or "cowplot"?
I know that with a bit of work, it might be possible using "htmltools":
library(htmltools)
doc <- htmltools::tagList(
div(Pie_2014_graph, style = "float:left;width:50%;"),
div(Pie_2015_graph,style = "float:left;width:50%;"),
div(Pie_total_graph, style = "float:left;width:50%;"),
div(Bar_years_plot, style = "float:left;width:50%;"),
div(Bar_total_plot, style = "float:left;width:50%;"))
save_html(html = doc, file = "out.html")
But I am not sure how to do this.
Can someone please show me how to make the collections of graphs interactive either using ggplotly() or with htmltools()?
Thanks.
You should apply ggplotly() to the individual graphs, not to the combined graphs.
For example:
Pie_2014_graph = ggplotly(ggplot(Pie_2014, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2014") )
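Building on that answer, the converted plots can then be arranged with plotly::subplot() instead of gridExtra or cowplot. A sketch using only the two bar charts from the question; the pie charts are left out because I am not certain ggplotly() converts coord_polar() faithfully:

```r
library(dplyr)
library(ggplot2)
library(plotly)

# Fake data in the same shape as the question's
date <- seq(as.Date("2014/1/1"), as.Date("2016/1/1"), by = "day")
data <- data.frame(date,
                   var = rnorm(length(date), 10, 10),
                   group = sample(LETTERS[1:4], length(date), replace = TRUE))
data$year <- as.factor(format(data$date, "%Y"))

Bar_years <- data %>%
  group_by(year, group) %>%
  summarise(mean = mean(var), .groups = "drop")
Bar_years_plot <- ggplot(Bar_years, aes(fill = group, y = mean, x = year)) +
  geom_bar(position = "dodge", stat = "identity") + ggtitle("Bar Plot All Years")

Bar_total <- data %>%
  group_by(group) %>%
  summarise(mean = n())
Bar_total_plot <- ggplot(Bar_total, aes(x = group, y = mean, fill = group)) +
  geom_bar(stat = "identity") + theme_minimal() + ggtitle("Bar Plot Average")

# Convert each ggplot to plotly first, then arrange the interactive versions
combined <- subplot(ggplotly(Bar_years_plot), ggplotly(Bar_total_plot), nrows = 1)
combined
```

The same idea should also work with the htmltools layout from the question, as long as the div() elements contain the ggplotly() versions rather than the raw ggplot objects.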

R - Issue with the DOM of the Danish Parliament (webscraping)

I've been working on a webscraping project for the political science department at my university.
The Danish Parliament is very transparent about its democratic process and uploads all legislative documents to its website. I've been crawling all pages starting from 2008. Right now I'm parsing the information into a dataframe, and I'm having an issue that I have not been able to resolve so far.
If we look at the DOM, we can see that most of the objects are named div.tingdok-normal. The number of objects varies between 16 and 19. To parse the information correctly for my dataframe, I tried to grep out the necessary parts according to patterns. However, sometimes my pattern matches more than once, and I don't know how to tell R that I only want the first match.
For the sake of an example, here is some code:
library(RCurl)
library(rvest)
final.url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"
to.save <- getURL(final.url)
p <- read_html(to.save)
normal <- p %>% html_nodes("div.tingdok-normal > span") %>% html_text(trim = TRUE)
tomatch <- c("Forkastet regeringsforslag", "Forkastet privat forslag", "Vedtaget regeringsforslag", "Vedtaget privat forslag")
type <- unique(grep(paste(tomatch, collapse = "|"), normal, value = TRUE))
Maybe you can help me with that.
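Keeping only the first match is a matter of indexing the grep() result with [1] (unique() still returns every distinct match). A sketch with made-up sample strings standing in for the scraped normal vector; they are illustrative and not taken from the site. regexpr() is shown as well, in case only the matched phrase is wanted rather than the whole element:

```r
# Made-up sample strings standing in for the scraped `normal` vector
normal <- c("Lovforslag nr. L 154",
            "Status: Vedtaget regeringsforslag",
            "Vedtaget regeringsforslag, 3. behandling")

tomatch <- c("Forkastet regeringsforslag", "Forkastet privat forslag",
             "Vedtaget regeringsforslag", "Vedtaget privat forslag")
pattern <- paste(tomatch, collapse = "|")

# Whole element: keep only the first element that matches (NA if none)
hits <- grep(pattern, normal, value = TRUE)
type <- if (length(hits) > 0) hits[1] else NA_character_

# Matched phrase only: regexpr() finds the first match within each element,
# and regmatches() drops the elements without a match
matched <- regmatches(normal, regexpr(pattern, normal))
type_phrase <- if (length(matched) > 0) matched[1] else NA_character_
```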
My understanding is that you want to extract the text of the webpage, since the "tingdok-normal" elements hold that text. The following code gets the text of the webpage by printing it to PDF, and then identifies the position of the first regex hit for each of the patterns to match.
library(pagedown)
library(pdftools)
library(stringr)
pagedown::chrome_print("https://www.ft.dk/samling/20161/lovforslag/l154/index.htm",
"C:/.../danish.pdf")
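# pdf_text() returns one string per page; collapse them so [1, ] below refers to the first hit in the whole document
text <- paste(pdftools::pdf_text("C:/.../danish.pdf"), collapse = "\n")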
tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)
list_Position <- list()
list_Text <- list()
for(i in 1 : nb_Tomatch)
{
# Locates the first hit of the regex
# To locate all regex hit, use stringr::str_locate_all
list_Position[[i]] <- stringr::str_locate(text , pattern = tomatch[i])
list_Text[[i]] <- stringr::str_sub(string = text,
start = list_Position[[i]][1, 1],
end = list_Position[[i]][1, 2])
}
Here is another approach (Windows only, since RDCOMClient drives Internet Explorer through COM):
library(RDCOMClient)
library(stringr)
library(rvest)
url <- "https://www.ft.dk/samling/20161/lovforslag/l154/index.htm"
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)
doc <- IEApp$Document()
html_Content <- doc$documentElement()$innerText()
tomatch <- c("(A|a)ftalen", "(O|o)pholdskravet")
nb_Tomatch <- length(tomatch)
list_Position <- list()
list_Text <- list()
for(i in 1 : nb_Tomatch)
{
# Locates the first hit of the regex
# To locate all regex hit, use stringr::str_locate_all
list_Position[[i]] <- stringr::str_locate(html_Content, pattern = tomatch[i])
list_Text[[i]] <- stringr::str_sub(string = html_Content,
start = list_Position[[i]][1, 1],
end = list_Position[[i]][1, 2])
}

Clean HTML table with Reshape2

I'm a new user of R and can't even think how to ask the question. I scraped a webpage for HTML tables. Generally everything went well, except for one table. Instead of there being 7 separate tables, everything got collapsed into 1 table: the column name and the value for the first table became two separate columns, and all the other tables became rows. The result is a table like this:
df <- data.frame(is_employed = c("Hobbies", "Has Previous Experience"), false = c("squash", "false"))
What I need is for the entries in the first column (and the column name itself) to become their own columns, with the items in the second column as their values, preferably with underscores in the column names. I tried:
df <- dcast(df, ~is_employed, value.var = "false")
But I got an error message. Then I thought of adding another column, like this:
df2 <- data.frame(number = c(1, 2), is_employed = c("Hobbies", "Has Previous Experience"), false = c("squash", "false"))
Then I tried:
df3 <- dcast(df2, number ~is_employed, value.var="false")
That placed the values in the first column as their own columns, but produced two rows (instead of 1) with NAs. I'm sure this is really basic, but I can't figure it out.
Edit:
I think this gives me what I want, but I'm away from my computer so I can't confirm:
library("dplyr")
library("tidyr")
mat <- as.matrix(df)
mat <- rbind(colnames(mat), mat)
colnames(mat) <- c("variable", "value")
df2 <- as.data.frame(mat)
df3 <- df2 %>%
mutate(n = 1) %>%
spread(variable, value) %>%
select(-n)
I need to add n or I get NAs, but I don't like it.
Is this what you're after?
mat <- as.matrix(df)
mat <- rbind(colnames(mat), mat)
colnames(mat) <- c("variable", "value")
mat
# variable value
# [1,] "is_employed" "false"
# [2,] "Hobbies" "squash"
# [3,] "Has Previous Experience" "false"
as.data.frame(mat)
# variable value
# 1 is_employed false
# 2 Hobbies squash
# 3 Has Previous Experience false
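A sketch of a variant that avoids the helper column n: tidyr's pivot_wider() (the successor to spread(), available from tidyr 1.0) returns a single row when there are no identifier columns left, and gsub() turns the spaces into underscores as requested. It assumes df is shaped exactly as in the question:

```r
library(tidyr)

df <- data.frame(is_employed = c("Hobbies", "Has Previous Experience"),
                 false = c("squash", "false"))

# Put the scraped column names back in as an ordinary first pair,
# so every label/value pair sits in the same two columns
long <- data.frame(variable = c(names(df)[1], as.character(df[[1]])),
                   value    = c(names(df)[2], as.character(df[[2]])),
                   stringsAsFactors = FALSE)
# Underscores instead of spaces in the future column names
long$variable <- gsub(" ", "_", long$variable)

# With no identifier columns left, pivot_wider() produces a single row
wide <- pivot_wider(long, names_from = variable, values_from = value)
wide
```

The as.character() calls keep the sketch working on R versions before 4.0, where data.frame() turns strings into factors by default.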