I'm trying to import census data from IPUMS into R but am not sure how to account for weights.
I extracted 41 variables spanning 2000 to 2020. The extract is called usa_00001.xml (data dictionary attached).
I took a look at the codebook for the imported data set to narrow down the list of variables for my analysis. Based on my review of the codebook, I decided to focus more on family structure, income, race/ethnicity, and education. Any variables that I determined would not prove useful were dropped from the new data set (data_clean1).
Variables kept: year (year of census), stateicp, hhincome, nmothers, nfathers, nchild, hispan, race, educd, inctot, educd_mom, educd_pop, inctot_mom, and inctot_pop.
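library(ipumsr)   # read_ipums_ddi(), read_ipums_micro()
library(dplyr)    # select(), rename(), filter(), mutate()
library(dataMaid) # makeCodebook()
ddi <- read_ipums_ddi("usa_00001.xml")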
data <- read_ipums_micro(ddi)
makeCodebook(data, replace=TRUE, output = "pdf")
data_clean1 <- data %>%
select(YEAR, STATEICP, HHINCOME, NMOTHERS, NFATHERS, NCHILD, HISPAN, RACE, EDUCD, INCTOT, EDUCD_MOM, EDUCD_POP, INCTOT_MOM, INCTOT_POP) %>%
rename(
'Year'='YEAR',
'State_ID' = 'STATEICP',
'Household_Income' = 'HHINCOME',
'NMothers' = 'NMOTHERS',
'NFathers' = 'NFATHERS',
'NChild' = 'NCHILD',
'Hispanic' = 'HISPAN',
'Race' = 'RACE',
'Education' = 'EDUCD',
'Income_Total' = 'INCTOT',
'Education_M' = 'EDUCD_MOM',
'Education_F' = 'EDUCD_POP',
'Income_Total_M' = 'INCTOT_MOM',
'Income_Total_F' = 'INCTOT_POP') %>%
filter(Race %in% c(1:2)) %>%
filter(Education %in% c(002, 062, 063, 064, 081, 101, 114, 116)) %>%
filter(Income_Total %in% c(1:1184000)) %>%
filter(Household_Income %in% c(1:2260000)) %>%
mutate(Hispanic = factor(Hispanic,
levels = c(0, 1, 2, 3, 4, 9),
labels = c("Not Hispanic", "Mexican", "Puerto Rican", "Cuban", "Other", "Not Reported")
)) %>%
mutate(Race = factor(Race,
levels = c(1, 2),
labels = c("White", "Black/African American")
)) %>%
mutate(Education = factor(Education,
levels = c(002, 062, 063, 064, 081, 101, 114, 116),
labels = c("No Schooling Completed", "High School Graduate or GED", "Regular High School Diploma", "GED or Alternative Credential", "Associate's Degree", "Bachelor's Degree", "Master's Degree", "Doctoral Degree")
))
How do I account for weights? Do I need to keep some of the variables I deleted out? Or should I use tidycensus instead?
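For context on the weights question: IPUMS USA supplies a person weight (PERWT) and a household weight (HHWT), and the select() call above drops both. The following is only a minimal sketch in base R, assuming PERWT is present in the extract and kept in the cleaned data. The toy data frame and its values are made up for illustration. It shows a weighted mean only; proper standard errors would need a survey-design approach.

```r
# Toy stand-in for an IPUMS extract; all values here are invented.
toy <- data.frame(
  YEAR   = c(2000, 2000, 2010, 2010),
  INCTOT = c(20000, 60000, 30000, 90000),
  PERWT  = c(100, 300, 50, 150)   # person weight: people each row represents
)

# Weighted mean income per census year: each row counts PERWT times.
by_year <- sapply(split(toy, toy$YEAR), function(d) {
  weighted.mean(d$INCTOT, w = d$PERWT)
})
```

For person-level variables such as INCTOT the person weight would apply; for household-level variables such as HHINCOME the household weight (HHWT) would be the analogous choice.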
I am trying to make use of tabnet with tidymodels and the Titanic dataset. Here is my code:
pacman::p_load(tidyverse,
tidymodels,
tabnet,
torch,
doParallel,
reprex)
data(Titanic)
Titanic <- as.data.frame(Titanic)
#partition data
set.seed(1711)
titanic_split <- initial_split(Titanic, prop = 0.75, strata = Survived)
titanic_train <- training(titanic_split)
titanic_test <- testing(titanic_split)
#create cross-validation folds of training data
set.seed(1712)
folds <- vfold_cv(titanic_train,
v = 3,
strata = Survived)
# define recipes for different models
titanic_rec <- recipe(formula = Survived ~ ., data = titanic_train) %>%
update_role(Survived, new_role = "outcome") %>%
step_zv() %>%
step_novel(all_nominal_predictors()) %>%
step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
step_YeoJohnson()
juice(prep(titanic_rec))
#define model
tab_spec <- tabnet(
mode = "classification",
epochs = 1, batch_size = 16384, decision_width = tune(), attention_width = tune(),
num_steps = tune(), penalty = 0.000001, virtual_batch_size = 512, momentum = 0.6,
feature_reusage = 1.5, learn_rate = tune()
) %>%
set_engine("torch", verbose= T)
wf <- workflow() %>%
add_model(tab_spec) %>%
add_recipe(titanic_rec)
grid <-
wf %>%
extract_parameter_set_dials() %>%
update(
decision_width = decision_width(range = c(20, 40)),
attention_width = attention_width(range = c(20, 40)),
num_steps = num_steps(range = c(4, 6)),
learn_rate = learn_rate(range = c(-2.5, -1))
) %>%
grid_max_entropy(size = 8)
auc_metric <- metric_set(yardstick::roc_auc)
auc_ctrl <- control_race(
verbose_elim = TRUE)
auc_results <- wf %>%
tune_grid(
resamples = folds,
control = auc_ctrl,
grid = grid)
This works like a charm. However, if I try to use parallel processing, I get an error:
cl7 <- makePSOCKcluster(7)
registerDoParallel(cl7)
auc_results <- wf %>%
tune_grid(
resamples = folds,
control = auc_ctrl,
grid = grid)
Warning message:
All models failed. Run show_notes(.Last.tune.result) for more information.
Upon running show_notes(.Last.tune.result), I get the following unique note:
Error in UseMethod("filter"): no applicable method for 'filter' applied to an object of class "NULL"
Does anyone know how to fix this?
I need to show a data caption, the computer name, and the period in the table header.
I also have other requirements: a zebra theme and merged cells where needed. That's why I chose flextable.
Here is my code:
library(officer) # border settings library
library(flextable) # drawing tables library
library(dplyr)
Caption <- "<b><big>Computer01.domain.com</big></b><br>Network Interface<br>Gbit Total/sec<br><small>2021-05-14 18:04 to 2021-05-25 13:29</small>"
bold_border <- fp_border(color="gray", width =2)
std_border <- fp_border(color="gray")
stub <- "2021-05-14 01:40 to 2021-05-17 08:26"
table_data <- data.frame (
Instance = c("Intel[R] Ethernet 10G",
"Intel[R] Ethernet Converged Network Adapter _1",
"Intel[R] Ethernet Converged Network Adapter _2",
"Intel[R] Ethernet 10G",
"Intel[R] Gigabit"),
Max = c(2.45, 2.41, 2.29, 2.17, 0),
Avg = c(0.15, 0.15, 0.15, 0.17, 0)
)
table <- table_data %>% flextable() %>%
set_caption(caption = Caption , html_escape = F) %>%
bg(bg = "#579FAD", part = "header") %>%
color(color = "white", part = "header") %>%
theme_zebra(
odd_header = "#579FAD",
odd_body = "#E0EFF4",
even_header = "transparent",
even_body = "transparent"
) %>%
set_table_properties(width = 1, layout = "autofit") %>%
hline(part="all", border = std_border ) %>%
vline(part="all", border = std_border ) %>%
border_outer( border = bold_border, part = "all" ) %>%
fix_border_issues() %>%
set_header_labels(
values = list(Instance = "InstanceName" ) ) %>%
flextable::font (part = "all" , fontname = "Calibri")
save_as_docx( table, path = file.path("c:\\temp", "test01.docx") )
save_as_html (table, path = file.path("c:\\temp", "test01.html"))
Here is what I got in html which is okay for me:
But in docx format my header style is not applied:
How can I create a header like the one I made for html, so that it can be saved to both html and docx?
Creating separate tables, one for html and one for docx, is not ideal but would be acceptable. In that case, my question is how to reproduce the html header in docx format.
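One possible direction, offered as a sketch rather than a confirmed fix: the caption above is HTML markup, which the docx output does not appear to interpret. flextable's add_header_lines() instead adds a full-width row to the header part of the table itself, so it should be carried into both output formats. The shortened table_data below is a stand-in for the data above.

```r
library(flextable)

# Shortened stand-in for the table_data defined in the question.
table_data <- data.frame(
  Instance = c("Intel[R] Ethernet 10G", "Intel[R] Gigabit"),
  Max = c(2.45, 0),
  Avg = c(0.15, 0)
)

ft <- flextable(table_data)
# Add one merged row spanning all columns above the column labels.
ft <- add_header_lines(ft, values = "Computer01.domain.com")
# Style only that new first header row.
ft <- bold(ft, i = 1, part = "header")
ft <- align(ft, i = 1, align = "center", part = "header")
```

Further lines (interface, counter name, period) could be added with additional add_header_lines() calls and styled per row in the same way.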
I am using the R programming language. I am trying to take different types of graphs (bar graphs, pie charts) and put them on the same page. I generated some fake data and made several graphs, then put them together (see: Combining Different Types of Graphs Together (R)).
library(dplyr)
library(ggplot2)
library(cowplot)
library(gridExtra)
library(plotly)
date= seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
var <- rnorm(731,10,10)
group <- sample( LETTERS[1:4], 731, replace=TRUE, prob=c(0.25, 0.22, 0.25, 0.25) )
data = data.frame(date, var, group)
data$year = as.numeric(format(data$date,'%Y'))
data$year = as.factor(data$year)
###Pie
Pie_2014 <- data %>%
filter(year == "2014") %>%
group_by(group) %>%
summarise(n = n())
Pie_2014_graph = ggplot(Pie_2014, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2014")
Pie_2015 <- data %>%
filter(year == "2015") %>%
group_by(group) %>%
summarise(n = n())
Pie_2015_graph = ggplot(Pie_2015, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2015")
Pie_total = data %>%
group_by(group) %>%
summarise(n = n())
Pie_total_graph = ggplot(Pie_total, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart Average")
###bars
Bar_years = data %>%
group_by(year, group) %>%
summarise(mean = mean(var))
Bar_years_plot = ggplot(Bar_years, aes(fill=group, y=mean, x=year)) +
geom_bar(position="dodge", stat="identity") + ggtitle("Bar Plot All Years")
Bar_total = data %>%
group_by(group) %>%
summarise(mean = n())
Bar_total_plot = ggplot(Bar_total, aes(x=group, y=mean, fill=group)) +
geom_bar(stat="identity")+theme_minimal() + ggtitle("Bar Plot Average")
#assembling the graphs can be done two different ways
#first way
g1 <- grid.arrange(Pie_2014_graph, Pie_2015_graph , Pie_total_graph, nrow = 1)
g2 <- grid.arrange(Bar_total_plot, Bar_years_plot, nrow = 1)
g = grid.arrange(g1, g2, ncol = 1)
#second way
# arrange subplots in rows
top_row <- plot_grid(Pie_2014_graph, Pie_2015_graph, Pie_total_graph)
middle_row <- plot_grid(Bar_years_plot, Bar_total_plot)
# arrange our new rows into combined plot
p <- plot_grid(top_row, middle_row, nrow = 2)
p
From here, I am trying to use the plotly::ggplotly() command to make the above output "interactive" (move the mouse over the graphs and see labels). I know that this works for individual plots:
ggplotly(Bar_years_plot)
However, this command does not seem to work with the "cowplot" and the "gridExtra" outputs:
#gridExtra version:
ggplotly(g)
Error in UseMethod("ggplotly", p) :
no applicable method for 'ggplotly' applied to an object of class "c('gtable', 'gTree', 'grob', 'gDesc')"
#cowplot version: (produces empty plot)
ggplotly(p)
Warning messages:
1: In geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) :
geom_GeomDrawGrob() has yet to be implemented in plotly.
If you'd like to see this geom implemented,
Please open an issue with your example code at
https://github.com/ropensci/plotly/issues
2: In geom2trace.default(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]]) :
geom_GeomDrawGrob() has yet to be implemented in plotly.
If you'd like to see this geom implemented,
Please open an issue with your example code at
https://github.com/ropensci/plotly/issues
Does anyone know if there is a quick way to use the ggplotly() function for objects created with "gridExtra" or "cowplot"?
I know that with a bit of work, it might be possible using "htmltools":
library(htmltools)
doc <- htmltools::tagList(
div(Pie_2014_graph, style = "float:left;width:50%;"),
div(Pie_2015_graph,style = "float:left;width:50%;"),
div(Pie_total_graph, style = "float:left;width:50%;"),
div(Bar_years_plot, style = "float:left;width:50%;"),
div(Bar_total_plot, style = "float:left;width:50%;"))
save_html(html = doc, file = "out.html")
But I am not sure how to do this.
Can someone please show me how to make the collection of graphs interactive, either using ggplotly() or with htmltools?
Thanks.
You should apply ggplotly() to the individual graphs, not to the combined layout produced by gridExtra or cowplot.
For example:
Pie_2014_graph = ggplotly(ggplot(Pie_2014, aes(x="", y=n, fill=group)) +
geom_bar(stat="identity", width=1) +
coord_polar("y", start=0) +ggtitle( "Pie Chart 2014") )
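Once each graph has been converted individually, the interactive versions can be placed on one page with plotly's own subplot() function. The sketch below uses made-up toy data and two bar charts. The names toy, bars_by_year, and bars_by_group are illustrative only. Pie charts built with coord_polar() may not convert cleanly through ggplotly(), so they are left out here.

```r
library(ggplot2)
library(plotly)

# Made-up data standing in for the summaries computed in the question.
set.seed(1)
toy <- data.frame(
  group = rep(LETTERS[1:4], times = 2),
  year  = factor(rep(c(2014, 2015), each = 4)),
  mean  = runif(8, 5, 15)
)

bars_by_year <- ggplot(toy, aes(x = year, y = mean, fill = group)) +
  geom_bar(position = "dodge", stat = "identity")
bars_by_group <- ggplot(toy, aes(x = group, y = mean, fill = group)) +
  geom_bar(stat = "identity")

# Convert each plot on its own, then lay the interactive versions out in one row.
combined <- subplot(ggplotly(bars_by_year), ggplotly(bars_by_group), nrows = 1)
```

Printing combined shows both charts with hover labels in a single widget.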
I am using the R programming language. I am interested in learning how to save several "html widgets" together. I have been able to manually create different types of html widgets:
#widget 1
library(htmlwidgets)
library(leaflet)
library(RColorBrewer)
# create map data
map_data <- data.frame(
"Lati" = c(43.6426, 43.6424, 43.6544, 43.6452, 43.6629), "Longi" = c(-79.3871, -79.3860, -79.3807, -79.3806, -79.3957),
"Job" = c("Economist", "Economist", "Teacher", "Teacher", "Lawyer"),
"First_Name" = c("John", "James", "Jack", "Jason", "Jim"),
"Last_Name" = c("Smith", "Charles", "Henry", "David", "Robert"),
"vehicle" = c("car", "van", "car", "none", "car")
)
kingdom <- c("Economist", "Lawyer", "Teacher")
my_palette <- brewer.pal(3, "Paired")
factpal <- colorFactor(my_palette, levels = kingdom)
groups <- unique(map_data$Job)
# finalize map
map <- leaflet(map_data) %>%
addTiles(group = "OpenStreetMap") %>%
addCircleMarkers(~Longi, ~Lati, popup = ~Job,
radius = 10, weight = 2, opacity = 1, color = ~factpal(Job),
fill = TRUE, fillOpacity = 1, group = ~Job
)
widget_1 = map %>%
addLayersControl(overlayGroups = groups, options = layersControlOptions(collapsed = FALSE)) %>%
addTiles() %>%
addMarkers(lng = ~Longi,
lat = ~Lati,
popup = ~paste("Job", Job, "<br>",
"First_Name:", First_Name, "<br>",
"Last_Name:", Last_Name, "<br>", "vehicle:", vehicle, "<br>"))
widget 2:
##### widget 2
library(plotly)
library(ggplot2)
p_plot <- data.frame(frequency = c(rnorm(31, 1), rnorm(31)),
is_consumed = factor(round(runif(62))))
p2 <- p_plot %>%
ggplot(aes(frequency, fill = is_consumed)) +
geom_density(alpha = 0.5)
widget_2 = ggplotly(p2)
widget 3:
#####widget_3
today <- Sys.Date()
tm <- seq(0, 600, by = 10)
x <- today - tm
y <- rnorm(length(x))
widget_3 <- plot_ly(x = ~x, y = ~y, mode = 'lines', text = paste(tm, "days from today"))
widget 4:
####widget_4
library(igraph)
library(dplyr)
library(visNetwork)
Data_I_Have <- data.frame(
"Node_A" = c("John", "John", "John", "Peter", "Peter", "Peter", "Tim", "Kevin", "Adam", "Adam", "Xavier"),
"Node_B" = c("Claude", "Peter", "Tim", "Tim", "Claude", "Henry", "Kevin", "Claude", "Tim", "Henry", "Claude")
)
graph_file <- data.frame(Data_I_Have$Node_A, Data_I_Have$Node_B)
colnames(graph_file) <- c("Data_I_Have$Node_A", "Data_I_Have$Node_B")
graph <- graph.data.frame(graph_file, directed=F)
graph <- simplify(graph)
nodes <- data.frame(id = V(graph)$name, title = V(graph)$name)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(graph, what="edges")[1:2]
widget_4 = visNetwork(nodes, edges) %>% visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
From here, I found another Stack Overflow post where a similar question was asked: Using R and plot.ly, how to save multiples htmlwidgets to my html?
That post explains how to save several html widgets together. The person who answered the question wrote a function to do so:
library(htmltools)
save_tags <- function (tags, file, selfcontained = F, libdir = "./lib")
{
if (is.null(libdir)) {
libdir <- paste(tools::file_path_sans_ext(basename(file)),
"_files", sep = "")
}
htmltools::save_html(tags, file = file, libdir = libdir)
if (selfcontained) {
if (!htmlwidgets:::pandoc_available()) {
stop("Saving a widget with selfcontained = TRUE requires pandoc. For details see:\n",
"https://github.com/rstudio/rmarkdown/blob/master/PANDOC.md")
}
htmlwidgets:::pandoc_self_contained_html(file, file)
unlink(libdir, recursive = TRUE)
}
return(htmltools::tags$iframe(src= file, height = "400px", width = "100%", style="border:0;"))
}
I tried using this function to save the 4 widgets together:
save_tags(widget_1, widget_2, widget_3, widget_4)
But doing so, I got the following error:
Error in dirname(file) : a character vector argument expected
Is there a straightforward and simple way for saving multiple html widgets together?
Thanks
NOTE: I know that you can use the combineWidgets() function in R:
library(manipulateWidget)
combineWidgets(widget_1, widget_2, widget_3, widget_4)
However, I am working with a computer that has no internet access or USB ports. This computer has a pre-installed copy of R with limited libraries (it has all the libraries used throughout my question except "manipulateWidget"). I am looking for the simplest way to save multiple html widgets together (e.g. is this possible in base R)?
Thanks
If format doesn't matter too much, you can merge the widgets using tagList and save them directly:
htmltools::save_html(tagList(widget_1, widget_2, widget_3, widget_4), file = "C://Users//Me//Desktop//widgets.html")
(It goes without saying that you will need to edit the filepath!)
If you want to control the layout of the widgets, you can wrap each in a div, and then style those:
doc <- htmltools::tagList(
div(widget_1, style = "float:left;width:50%;"),
div(widget_2,style = "float:left;width:50%;"),
div(widget_3, style = "float:left;width:50%;"),
div(widget_4, style = "float:left;width:50%;")
)
htmltools::save_html(html = doc, file = "C://Users//Me//Desktop//widgets.html")
Say I have two htmlwidgets
# Load energy projection data
library(networkD3)
URL <- paste0(
"https://cdn.rawgit.com/christophergandrud/networkD3/",
"master/JSONdata/energy.json")
Energy <- jsonlite::fromJSON(URL)
# Plot
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source",
Target = "target", Value = "value", NodeID = "name",
units = "TWh", fontSize = 12, nodeWidth = 30)
and
library(leaflet)
data(quakes)
# Show first 20 rows from the `quakes` dataset
leaflet(data = quakes[1:20,]) %>% addTiles() %>%
addMarkers(~long, ~lat, popup = ~as.character(mag))
And I want to put them side by side in an html page. How can I do this? Could I use an iframe? Other?
There are lots of ways to answer this. Often sizing and positioning will vary based on who authored the htmlwidget, so you might need to experiment a little. The easiest way if you don't plan to use a CSS framework with grid helpers will be to wrap each htmlwidget in tags$div() and use CSS. You also might be interested in the very nice new flexbox-based dashboard package from RStudio http://github.com/rstudio/flexdashboard.
# Load energy projection data
library(networkD3)
URL <- paste0(
"https://cdn.rawgit.com/christophergandrud/networkD3/",
"master/JSONdata/energy.json")
Energy <- jsonlite::fromJSON(URL)
# Plot
sn <- sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source",
Target = "target", Value = "value", NodeID = "name",
units = "TWh", fontSize = 12, nodeWidth = 30,
width = "100%")
library(leaflet)
data(quakes)
# Show first 20 rows from the `quakes` dataset
leaf <- leaflet(data = quakes[1:20,]) %>% addTiles() %>%
addMarkers(~long, ~lat, popup = ~as.character(mag))
library(htmltools)
browsable(
tagList(list(
tags$div(
style = 'width:50%;display:block;float:left;',
sn
),
tags$div(
style = 'width:50%;display:block;float:left;',
leaf
)
))
)
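As an alternative to floats, the same side-by-side layout can be sketched with a CSS flex container and written to a file. This is only an illustration: the two placeholder divs stand in for the widgets (sn and leaf above), and the output path in tempdir() is an assumption to be replaced with a real location.

```r
library(htmltools)

# Placeholders standing in for the two htmlwidgets (sn and leaf).
left  <- tags$div(style = "flex:1;min-width:0;", tags$p("left widget"))
right <- tags$div(style = "flex:1;min-width:0;", tags$p("right widget"))

# A flex container puts its children next to each other in one row.
page <- tags$div(style = "display:flex;gap:1em;", left, right)

# Write the page to disk; the path here is a temporary, illustrative one.
out_file <- file.path(tempdir(), "side_by_side.html")
save_html(page, file = out_file)
```

Replacing the placeholder paragraphs with the widget objects should give the two widgets equal shares of the page width, though sizing may still need some experimentation per widget.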