expand and duplicates - duplicates

I have a dataset as follows.
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id str8 drug1 str3(drug2 drug3)
"pat" "thiazide" "BB" "CCB"
"ann" "thiazide" "ace" ""
"mary" "ace" "" ""
"john" "ace" "" ""
end
I want to create a separate row for each person for each drug they have. reshape definitely isn't what I want here. I've been experimenting with expand and think it is the solution, bar a few little things that I can't get right. I'm thinking I need to expand and then remove duplicates.
Step 1:
Here's the code I used to get what I want, and it works fine, except for pat: his third drug isn't copying into his third row.
expand 3
by id, sort: generate drug = cond(_n == 1,drug1, drug2, drug3)
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id str8 drug1 str3(drug2 drug3) str8 drug
"ann" "thiazide" "ace" "" "thiazide"
"ann" "thiazide" "ace" "" "ace"
"ann" "thiazide" "ace" "" "ace"
"john" "ace" "" "" "ace"
"john" "ace" "" "" ""
"john" "ace" "" "" ""
"mary" "ace" "" "" "ace"
"mary" "ace" "" "" ""
"mary" "ace" "" "" ""
"pat" "thiazide" "BB" "CCB" "thiazide"
"pat" "thiazide" "BB" "CCB" "BB"
"pat" "thiazide" "BB" "CCB" "BB"
end
If anyone could instruct me on how to fix this that would be brilliant.
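Why pat's third row goes wrong: Stata's cond(x, a, b, c) returns c only when x is missing, and _n == 1 is always 0 or 1, never missing. So every row after the first gets drug2, and drug3 is never used. The Python sketch below only illustrates that logic (Python is not part of the original thread, and the helper names are hypothetical). It contrasts the rule as written with the intended rule, where row j within each id takes drugj.

```python
# Python illustration only (not Stata): the toy -dataex- data, with ""
# standing in for Stata's empty (missing) strings.
wide = [
    ("pat", ["thiazide", "BB", "CCB"]),
    ("ann", ["thiazide", "ace", ""]),
    ("mary", ["ace", "", ""]),
    ("john", ["ace", "", ""]),
]

def cond_as_written(n, drugs):
    # Mirrors cond(_n == 1, drug1, drug2, drug3): the condition is never
    # missing, so the fourth argument (drug3) is never returned.
    return drugs[0] if n == 1 else drugs[1]

def jth_drug(n, drugs):
    # Intended rule: observation _n == j within id takes drugj.
    return drugs[n - 1]

def expand_and_fill(rule, copies=3):
    rows = []
    for pid, drugs in sorted(wide):        # plays the role of "by id, sort:"
        for n in range(1, copies + 1):     # "expand 3" gives _n = 1, 2, 3
            rows.append((pid, rule(n, drugs)))
    return rows

as_written = expand_and_fill(cond_as_written)
intended = expand_and_fill(jth_drug)
print([d for p, d in as_written if p == "pat"])  # ['thiazide', 'BB', 'BB']
print([d for p, d in intended if p == "pat"])    # ['thiazide', 'BB', 'CCB']
```

The "as written" rule reproduces the listing above, including ann's repeated ace. The intended rule gives pat his CCB row.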
Step 2:
For step two (imagine that pat's rows are correct for this), I want to remove duplicates so that I am left with only the correct number of rows for each person, according to their number of distinct drugs. For example, none of pat's rows should be duplicates, so I want to keep all his rows, but ann has a duplicate row that I need to remove.
This is what I have used:
bys id drug: gen dup2=cond(_N==1,0,_n)
drop if dup2>1
This is ok, but I am left with extra rows for mary and john. I deal with these using:
drop if drug==""
Is this the most efficient/least error prone approach?
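Taken together, the two Stata steps (drop later copies within id drug, then drop blank drug) keep exactly the distinct, non-blank (id, drug) pairs. A minimal Python illustration of that combined effect, assuming the expanded rows listed above with pat's third row corrected (the helper name is hypothetical):

```python
# Python illustration of Step 2 (not Stata): rows after "expand 3" as listed
# above, with pat's third row corrected to CCB as the question asks.
expanded = [
    ("ann", "thiazide"), ("ann", "ace"), ("ann", "ace"),
    ("john", "ace"), ("john", ""), ("john", ""),
    ("mary", "ace"), ("mary", ""), ("mary", ""),
    ("pat", "thiazide"), ("pat", "BB"), ("pat", "CCB"),
]

def distinct_drugs(rows):
    """Keep the first row for each (id, drug) pair and drop blank drugs."""
    seen = set()
    kept = []
    for pid, drug in rows:
        if drug == "" or (pid, drug) in seen:
            continue  # a blank (drop if drug == "") or a later copy (dup2 > 1)
        seen.add((pid, drug))
        kept.append((pid, drug))
    return kept

print(distinct_drugs(expanded))
```

Note that if row j takes drugj in Step 1, ann's third row is blank rather than a second ace, so only blank rows remain to be dropped.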
Amendment
It turns out that my toy dataset was too simplistic to reflect my real data. My actual data are already long, which is why reshape won't work here. I am very happy to be corrected, but I think expand might be the way to go. However, when I try expand on more complex data, I cannot figure out how to write the loop so that it produces the dataset I need (essentially, one observation per person per drug). Here is an example of what I have:
clear
input str4 id int day str8 drug1 str3(drug2 drug3)
"ann" 14 "thiazide" "ace" ""
"ann" 70 "thiazide" "ace" ""
"ann" 1 "CCB" "" ""
"ann" 35 "thiazide" "ace" ""
"ann " 30 "CCB" "" ""
"john" 1 "ace" "" ""
"john" 30 "CCB" "" ""
"john" 150 "ace" "" ""
"john" 60 "ace" "" ""
"john" 60 "CCB" "" ""
"john" 30 "ace" "" ""
"john" 1 "CCB" "" ""
"mary" 30 "ace" "" ""
"mary" 1 "ace" "" ""
"mary" 115 "thiazide" "" ""
"mary" 60 "ace" "" ""
"mary" 90 "ace" "" ""
"mary" 120 "ace" "" ""
"pat" 30 "thiazide" "BB" "CCB"
"pat" 1 "ace" "" ""
"pat" 30 "ace" "" ""
"pat" 1 "thiazide" "BB" "CCB"
end
After using:
expand 3
Here is an example of what I want, but I am unsure how to write the code to get it. I have tried variations of Nick Cox's loop below, but am not getting it right.
clear
input str4 id int day str8 drug1 str3(drug2 drug3) str8 drug
"ann" 1 "CCB" "" "" "CCB"
"ann" 1 "CCB" "" "" ""
"ann" 1 "CCB" "" "" ""
"ann" 14 "thiazide" "ace" "" "thiazide"
"ann" 14 "thiazide" "ace" "" "ace"
"ann" 14 "thiazide" "ace" "" ""
"ann" 35 "thiazide" "ace" "" "thiazide"
"ann" 35 "thiazide" "ace" "" "ace"
"ann" 35 "thiazide" "ace" "" ""
"ann" 70 "thiazide" "ace" "" "thiazide"
"ann" 70 "thiazide" "ace" "" "ace"
"ann" 70 "thiazide" "ace" "" ""
"ann " 30 "CCB" "" "" "CCB"
"ann " 30 "CCB" "" "" ""
"ann " 30 "CCB" "" "" ""
"john" 1 "CCB" "" "" "CCB"
"john" 1 "CCB" "" "" ""
"john" 1 "CCB" "" "" ""
"john" 1 "ace" "" "" "ace"
"john" 1 "ace" "" "" ""
"john" 1 "ace" "" "" ""
"john" 30 "CCB" "" "" "CCB"
"john" 30 "CCB" "" "" ""
"john" 30 "CCB" "" "" ""
"john" 30 "ace" "" "" "ace"
"john" 30 "ace" "" "" ""
"john" 30 "ace" "" "" ""
"john" 60 "CCB" "" "" "CCB"
"john" 60 "CCB" "" "" ""
"john" 60 "CCB" "" "" ""
"john" 60 "ace" "" "" "ace"
"john" 60 "ace" "" "" ""
"john" 60 "ace" "" "" ""
"john" 150 "ace" "" "" "ace"
"john" 150 "ace" "" "" ""
"john" 150 "ace" "" "" ""
"mary" 1 "ace" "" "" "ace"
"mary" 1 "ace" "" "" ""
"mary" 1 "ace" "" "" ""
"mary" 30 "ace" "" "" "ace"
"mary" 30 "ace" "" "" ""
"mary" 30 "ace" "" "" ""
"mary" 60 "ace" "" "" "ace"
"mary" 60 "ace" "" "" ""
"mary" 60 "ace" "" "" ""
"mary" 90 "ace" "" "" "ace"
"mary" 90 "ace" "" "" ""
"mary" 90 "ace" "" "" ""
"mary" 115 "thiazide" "" "" "thiazide"
"mary" 115 "thiazide" "" "" ""
"mary" 115 "thiazide" "" "" ""
"mary" 120 "ace" "" "" "ace"
"mary" 120 "ace" "" "" ""
"mary" 120 "ace" "" "" ""
"pat" 1 "ace" "" "" "ace"
"pat" 1 "ace" "" "" ""
"pat" 1 "ace" "" "" ""
"pat" 1 "thiazide" "BB" "CCB" "thiazide"
"pat" 1 "thiazide" "BB" "CCB" "BB"
"pat" 1 "thiazide" "BB" "CCB" "CCB"
"pat" 30 "ace" "" "" "ace"
"pat" 30 "ace" "" "" ""
"pat" 30 "ace" "" "" ""
"pat" 30 "thiazide" "BB" "CCB" "thiazide"
"pat" 30 "thiazide" "BB" "CCB" "BB"
"pat" 30 "thiazide" "BB" "CCB" "CCB"
end
At this point I can remove the observations with missing values, and clean up the dataset to get the following:
drop if missing(drug)
drop drug?
clear
input str4 id int day str8 drug
"ann" 1 "CCB"
"ann" 14 "thiazide"
"ann" 14 "ace"
"ann" 35 "thiazide"
"ann" 35 "ace"
"ann" 70 "thiazide"
"ann" 70 "ace"
"ann " 30 "CCB"
"john" 1 "CCB"
"john" 1 "ace"
"john" 30 "CCB"
"john" 30 "ace"
"john" 60 "CCB"
"john" 60 "ace"
"john" 150 "ace"
"mary" 1 "ace"
"mary" 30 "ace"
"mary" 60 "ace"
"mary" 90 "ace"
"mary" 115 "thiazide"
"mary" 120 "ace"
"pat" 1 "ace"
"pat" 1 "thiazide"
"pat" 1 "BB"
"pat" 1 "CCB"
"pat" 30 "ace"
"pat" 30 "thiazide"
"pat" 30 "BB"
"pat" 30 "CCB"
end

I am mystified at the dismissal of reshape without argument or evidence. reshape gets you there directly except for one line to clean out missings.
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id str8 drug1 str3(drug2 drug3)
"pat" "thiazide" "BB" "CCB"
"ann" "thiazide" "ace" ""
"mary" "ace" "" ""
"john" "ace" "" ""
end
reshape long drug, i(id) j(seq)
drop if missing(drug)
list, sepby(id)
+-----------------------+
| id seq drug |
|-----------------------|
1. | ann 1 thiazide |
2. | ann 2 ace |
|-----------------------|
3. | john 1 ace |
|-----------------------|
4. | mary 1 ace |
|-----------------------|
5. | pat 1 thiazide |
6. | pat 2 BB |
7. | pat 3 CCB |
+-----------------------+
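To see what reshape long is doing here, the same transformation can be sketched in Python (an illustration only; "" stands for Stata's missing string and the helper name is hypothetical). Each id yields one row per suffix j of drugj, and blank drugs are then dropped, which reproduces the listing above.

```python
# Python illustration (not Stata) of: reshape long drug, i(id) j(seq)
# followed by: drop if missing(drug)
wide = {
    "pat": ["thiazide", "BB", "CCB"],
    "ann": ["thiazide", "ace", ""],
    "mary": ["ace", "", ""],
    "john": ["ace", "", ""],
}

def reshape_long(wide):
    """One row per (id, seq), where seq is the numeric suffix j of drugj."""
    return [
        (pid, seq, drug)
        for pid in sorted(wide)
        for seq, drug in enumerate(wide[pid], start=1)
    ]

long_rows = [row for row in reshape_long(wide) if row[2] != ""]  # drop missing
for row in long_rows:
    print(*row)
```

Before the drop there are 12 rows (4 ids times 3 suffixes); the drop leaves the 7 shown in the listing.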
EDIT:
Your idea of starting with expand can be made to work quite easily. Under the hood, reshape is doing something similar.
clear
input str4 id str8 drug1 str3(drug2 drug3)
"pat" "thiazide" "BB" "CCB"
"ann" "thiazide" "ace" ""
"mary" "ace" "" ""
"john" "ace" "" ""
end
expand 3
sort id
gen drug = ""
quietly forval j = 1/3 {
by id: replace drug = drug`j' if _n == `j'
}
drop if missing(drug)
drop drug?
list, sepby(id)
EDIT 2
The extra complications are just that, complications, and don't imply a different approach. You need greater faith and to understand that reshape is more versatile than you think it is! See e.g. the FAQ here as well as the help and manual entry.
Trivially, I am going to assume that "ann " is just a typo for "ann". Then what we have is not just different days for the same people but also, somehow, duplicates for some people and days. All that means is that the identifiers need to be spelled out more fully; in fact we need one extra variable. The principle that sometimes a new identifier variable is needed to spell out a tacit order, even if arbitrarily, is discussed in the FAQ cited. The idea that "long long" layouts are possible is also a standard notion.
clear
input str4 id int day str8 drug1 str3(drug2 drug3)
"ann" 14 "thiazide" "ace" ""
"ann" 70 "thiazide" "ace" ""
"ann" 1 "CCB" "" ""
"ann" 35 "thiazide" "ace" ""
"ann " 30 "CCB" "" ""
"john" 1 "ace" "" ""
"john" 30 "CCB" "" ""
"john" 150 "ace" "" ""
"john" 60 "ace" "" ""
"john" 60 "CCB" "" ""
"john" 30 "ace" "" ""
"john" 1 "CCB" "" ""
"mary" 30 "ace" "" ""
"mary" 1 "ace" "" ""
"mary" 115 "thiazide" "" ""
"mary" 60 "ace" "" ""
"mary" 90 "ace" "" ""
"mary" 120 "ace" "" ""
"pat" 30 "thiazide" "BB" "CCB"
"pat" 1 "ace" "" ""
"pat" 30 "ace" "" ""
"pat" 1 "thiazide" "BB" "CCB"
end
replace id = trim(id)
bysort id day : gen SEQ = _n
reshape long drug, i(id day SEQ) j(seq)
drop if missing(drug)
list, sepby(id)
+-----------------------------------+
| id day SEQ seq drug |
|-----------------------------------|
1. | ann 1 1 1 CCB |
2. | ann 14 1 1 thiazide |
3. | ann 14 1 2 ace |
4. | ann 30 1 1 CCB |
5. | ann 35 1 1 thiazide |
6. | ann 35 1 2 ace |
7. | ann 70 1 1 thiazide |
8. | ann 70 1 2 ace |
|-----------------------------------|
9. | john 1 1 1 ace |
10. | john 1 2 1 CCB |
11. | john 30 1 1 ace |
12. | john 30 2 1 CCB |
13. | john 60 1 1 ace |
14. | john 60 2 1 CCB |
15. | john 150 1 1 ace |
|-----------------------------------|
16. | mary 1 1 1 ace |
17. | mary 30 1 1 ace |
18. | mary 60 1 1 ace |
19. | mary 90 1 1 ace |
20. | mary 115 1 1 thiazide |
21. | mary 120 1 1 ace |
|-----------------------------------|
22. | pat 1 1 1 ace |
23. | pat 1 2 1 thiazide |
24. | pat 1 2 2 BB |
25. | pat 1 2 3 CCB |
26. | pat 30 1 1 ace |
27. | pat 30 2 1 thiazide |
28. | pat 30 2 2 BB |
29. | pat 30 2 3 CCB |
+-----------------------------------+
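The same "long long" idea, sketched in Python for illustration (not Stata; helper names are hypothetical): trim id, number the records within each (id, day) as SEQ, then emit one row per non-blank drugj. One caveat: without its stable option, Stata's sort may number tied observations in any order, whereas Python's sort is stable, so SEQ here follows input order and can differ from the listing above for john and pat. The set of (id, day, drug) rows, 29 in all, is the same.

```python
from itertools import groupby

# (id, day, [drug1, drug2, drug3]) in the order of the amended -input- block;
# "" stands in for Stata's missing string.
records = [
    ("ann", 14, ["thiazide", "ace", ""]),
    ("ann", 70, ["thiazide", "ace", ""]),
    ("ann", 1, ["CCB", "", ""]),
    ("ann", 35, ["thiazide", "ace", ""]),
    ("ann ", 30, ["CCB", "", ""]),
    ("john", 1, ["ace", "", ""]),
    ("john", 30, ["CCB", "", ""]),
    ("john", 150, ["ace", "", ""]),
    ("john", 60, ["ace", "", ""]),
    ("john", 60, ["CCB", "", ""]),
    ("john", 30, ["ace", "", ""]),
    ("john", 1, ["CCB", "", ""]),
    ("mary", 30, ["ace", "", ""]),
    ("mary", 1, ["ace", "", ""]),
    ("mary", 115, ["thiazide", "", ""]),
    ("mary", 60, ["ace", "", ""]),
    ("mary", 90, ["ace", "", ""]),
    ("mary", 120, ["ace", "", ""]),
    ("pat", 30, ["thiazide", "BB", "CCB"]),
    ("pat", 1, ["ace", "", ""]),
    ("pat", 30, ["ace", "", ""]),
    ("pat", 1, ["thiazide", "BB", "CCB"]),
]

def long_long(records):
    # replace id = trim(id)
    rows = [(pid.strip(), day, drugs) for pid, day, drugs in records]
    # bysort id day : gen SEQ = _n  (stable sort, so ties keep input order)
    rows.sort(key=lambda r: (r[0], r[1]))
    out = []
    for _, group in groupby(rows, key=lambda r: (r[0], r[1])):
        for SEQ, (pid, day, drugs) in enumerate(group, start=1):
            # reshape long drug, i(id day SEQ) j(seq), then drop if missing(drug)
            for seq, drug in enumerate(drugs, start=1):
                if drug:
                    out.append((pid, day, SEQ, seq, drug))
    return out

result = long_long(records)
print(len(result))  # 29
```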

Here's my effort at more complex data - seems to work ok, but happy to be corrected. Or if there is another better way of doing this, please do post!
Toy data here
clear
input str4 id int day str8 drug1 str3(drug2 drug3)
"pat" 1 "thiazide" "BB" "CCB"
"pat" 1 "ace" "" ""
"pat" 30 "ace" "" ""
"pat" 30 "thiazide" "BB" "CCB"
"ann" 1 "CCB" "" ""
"ann" 14 "thiazide" "ace" ""
"ann " 30 "CCB" "" ""
"ann" 35 "thiazide" "ace" ""
"ann" 70 "thiazide" "ace" ""
"mary" 1 "ace" "" ""
"mary" 30 "ace" "" ""
"mary" 60 "ace" "" ""
"mary" 90 "ace" "" ""
"mary" 115 "thiazide" "" ""
"mary" 120 "ace" "" ""
"john" 150 "ace" "" ""
"john" 1 "CCB" "" ""
"john" 1 "ace" "" ""
"john" 30 "CCB" "" ""
"john" 30 "ace" "" ""
"john" 60 "CCB" "" ""
"john" 60 "ace" "" ""
end
code here:
expand 3
gen drug=""
sort id day
egen group=group(id day drug1)
bys id group: gen count=_n
forval j = 1/3 {
bys id group: replace drug = drug`j' if count == `j'
}
drop if missing(drug)
drop drug? count group
NJC simplification:
expand 3
* expand appends the new copies at the end, and -by- needs sorted data
sort id day drug1
gen drug = ""
forval j = 1/3 {
by id day drug1: replace drug = drug`j' if _n == `j'
}
drop if missing(drug)
drop drug?

Related

Scraping Website with Unchanging URL in R

I would like to scrape a series of tables from a website whose URL does not change when I click through the tables in my browser. Each table corresponds to a unique date. The default table is that which corresponds to today's date. I can scroll through past dates in my browser, but can't seem to find a way to do so in R.
Using library(rvest), this bit of code reliably downloads the table that corresponds to today's date (I'm only interested in the first of the three tables).
library(rvest)
webad <- "https://official.nba.com/referee-assignments/"
off <- webad %>%
read_html() %>%
html_table()
off <- off[[1]]
How can I download the table that corresponds to, say "2022-10-04", to "2022-10-06", or to yesterday?
I've tried to work through it by identifying the node under which the table lies, in the hopes that I could manipulate it to reflect a prior date. However, the following reproduces the same table as above:
webad <- "https://official.nba.com/referee-assignments/"
off <- webad %>%
read_html() %>%
html_nodes("#main > div > section:nth-child(1) > article > div > div.dayContent > div > table") %>%
html_table()
off <- off[[1]]
Scrolling through past dates in my browser, I've identified various places in the HTML that reference the prior date, but I can't seem to change it from R, let alone get the table I download to reflect a change:
webad %>%
read_html() %>%
html_nodes("#main > div > section:nth-child(1) > article > header > div")
I've messed around some with html_form(), follow_link(), and set_values() also, but to no avail.
Is there a good way to navigate this particular URL in R?
You can consider the following approach:
library(RSelenium)
library(rvest)
port <- as.integer(4444L + rpois(lambda = 1000, 1))
rd <- rsDriver(chromever = "105.0.5195.52", browser = "chrome", port = port)
remDr <- rd$client
remDr$open()
url <- "https://official.nba.com/referee-assignments/"
remDr$navigate(url)
web_Obj_Date <- remDr$findElement("css selector", "#ref-filters-menu > li > div > button")
web_Obj_Date$clickElement()
web_Obj_Date_Input <- remDr$findElement("id", 'ref-date')
web_Obj_Date_Input$clearElement()
web_Obj_Date_Input$sendKeysToElement(list("2022-10-05"))
web_Obj_Date_Input$doubleclick()
web_Obj_Date <- remDr$findElement("css selector", "#ref-filters-menu > li > div > button")
web_Obj_Date$clickElement()
web_Obj_Go_Button <- remDr$findElement("css selector", "#date-filter")
web_Obj_Go_Button$submitElement()
html_Content <- remDr$getPageSource()[[1]]
read_html(html_Content) %>% html_table()
[[1]]
# A tibble: 5 x 5
Game `Official 1` `Official 2` `Official 3` Alternate
<chr> <chr> <chr> <chr> <lgl>
1 Indiana # Charlotte John Goble (#10) Lauren Holtkamp (#7) Phenizee Ransom (#70) NA
2 Cleveland # Philadelphia Marc Davis (#8) Jacyn Goble (#68) Tyler Mirkovich (#97) NA
3 Toronto # Boston Josh Tiven (#58) Matt Boland (#18) Intae hwang (#96) NA
4 Dallas # Oklahoma City Courtney Kirkland (#61) Mitchell Ervin (#27) Cheryl Flores (#91) NA
5 Phoenix # L.A. Lakers Bill Kennedy (#55) Rodney Mott (#71) Jenna Reneau (#93) NA
[[2]]
# A tibble: 0 x 5
# ... with 5 variables: Game <lgl>, Official 1 <lgl>, Official 2 <lgl>, Official 3 <lgl>, Alternate <lgl>
# i Use `colnames()` to see all variable names
[[3]]
# A tibble: 0 x 5
# ... with 5 variables: Game <lgl>, Official 1 <lgl>, Official 2 <lgl>, Official 3 <lgl>, Alternate <lgl>
# i Use `colnames()` to see all variable names
[[4]]
# A tibble: 6 x 7
S M T W T F S
<int> <int> <int> <int> <int> <int> <int>
1 NA NA NA NA NA NA 1
2 2 3 4 5 6 7 8
3 9 10 11 12 13 14 15
4 16 17 18 19 20 21 22
5 23 24 25 26 27 28 29
6 30 31 NA NA NA NA NA
Here is another approach that can be considered:
library(RDCOMClient)
library(rvest)
url <- "https://official.nba.com/referee-assignments/"
IEApp <- COMCreate("InternetExplorer.Application")
IEApp[['Visible']] <- TRUE
IEApp$Navigate(url)
Sys.sleep(5)
doc <- IEApp$Document()
clickEvent <- doc$createEvent("MouseEvent")
clickEvent$initEvent("click", TRUE, FALSE)
web_Obj_Date <- doc$querySelector("#ref-filters-menu > li > div > button")
web_Obj_Date$dispatchEvent(clickEvent)
web_Obj_Date_Input <- doc$GetElementById('ref-date')
web_Obj_Date_Input[["Value"]] <- "2022-10-05"
web_Obj_Go_Button <- doc$querySelector("#date-filter")
web_Obj_Go_Button$dispatchEvent(clickEvent)
html_Content <- doc$Body()$innerHTML()
read_html(html_Content) %>% html_table()
[[1]]
# A tibble: 5 x 5
Game `Official 1` `Official 2` `Official 3` Alternate
<chr> <chr> <chr> <chr> <lgl>
1 Indiana # Charlotte John Goble (#10) Lauren Holtkamp (#7) Phenizee Ransom (#70) NA
2 Cleveland # Philadelphia Marc Davis (#8) Jacyn Goble (#68) Tyler Mirkovich (#97) NA
3 Toronto # Boston Josh Tiven (#58) Matt Boland (#18) Intae hwang (#96) NA
4 Dallas # Oklahoma City Courtney Kirkland (#61) Mitchell Ervin (#27) Cheryl Flores (#91) NA
5 Phoenix # L.A. Lakers Bill Kennedy (#55) Rodney Mott (#71) Jenna Reneau (#93) NA
[[2]]
# A tibble: 8 x 7
Game `Official 1` `Official 2` `Official 3` Alternate `` ``
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 "Game" "Official 1" "Official 2" "Official 3" "Alternate" NA NA
2 "S" "M" "T" "W" "T" "F" "S"
3 "" "" "" "" "" "" "1"
4 "2" "3" "4" "5" "6" "7" "8"
5 "9" "10" "11" "12" "13" "14" "15"
6 "16" "17" "18" "19" "20" "21" "22"
7 "23" "24" "25" "26" "27" "28" "29"
8 "30" "31" "" "" "" "" ""
[[3]]
# A tibble: 7 x 7
Game `Official 1` `Official 2` `Official 3` Alternate `` ``
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 "S" "M" "T" "W" "T" "F" "S"
2 "" "" "" "" "" "" "1"
3 "2" "3" "4" "5" "6" "7" "8"
4 "9" "10" "11" "12" "13" "14" "15"
5 "16" "17" "18" "19" "20" "21" "22"
6 "23" "24" "25" "26" "27" "28" "29"
7 "30" "31" "" "" "" "" ""
[[4]]
# A tibble: 6 x 7
S M T W T F S
<int> <int> <int> <int> <int> <int> <int>
1 NA NA NA NA NA NA 1
2 2 3 4 5 6 7 8
3 9 10 11 12 13 14 15
4 16 17 18 19 20 21 22
5 23 24 25 26 27 28 29
6 30 31 NA NA NA NA NA
If you install Docker (see https://docs.docker.com/engine/install/), you can consider the following approach with Firefox:
library(RSelenium)
library(rvest)
shell('docker run -d -p 4445:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4445L, browserName = "firefox")
remDr$open()
url <- "https://official.nba.com/referee-assignments/"
remDr$navigate(url)
web_Obj_Date <- remDr$findElement("css selector", "#ref-filters-menu > li > div > button")
web_Obj_Date$clickElement()
web_Obj_Date_Input <- remDr$findElement("id", 'ref-date')
web_Obj_Date_Input$clearElement()
web_Obj_Date_Input$sendKeysToElement(list("2022-10-05"))
web_Obj_Date_Input$doubleclick()
web_Obj_Date <- remDr$findElement("css selector", "#ref-filters-menu > li > div > button")
web_Obj_Date$clickElement()
web_Obj_Go_Button <- remDr$findElement("css selector", "#date-filter")
web_Obj_Go_Button$submitElement()
html_Content <- remDr$getPageSource()[[1]]
read_html(html_Content) %>% html_table()
[[1]]
# A tibble: 5 x 5
Game `Official 1` `Official 2` `Official 3` Alternate
<chr> <chr> <chr> <chr> <lgl>
1 Indiana # Charlotte John Goble (#10) Lauren Holtkamp (#7) Phenizee Ransom (#70) NA
2 Cleveland # Philadelphia Marc Davis (#8) Jacyn Goble (#68) Tyler Mirkovich (#97) NA
3 Toronto # Boston Josh Tiven (#58) Matt Boland (#18) Intae hwang (#96) NA
4 Dallas # Oklahoma City Courtney Kirkland (#61) Mitchell Ervin (#27) Cheryl Flores (#91) NA
5 Phoenix # L.A. Lakers Bill Kennedy (#55) Rodney Mott (#71) Jenna Reneau (#93) NA
[[2]]
# A tibble: 0 x 5
# ... with 5 variables: Game <lgl>, Official 1 <lgl>, Official 2 <lgl>, Official 3 <lgl>, Alternate <lgl>
# i Use `colnames()` to see all variable names
[[3]]
# A tibble: 0 x 5
# ... with 5 variables: Game <lgl>, Official 1 <lgl>, Official 2 <lgl>, Official 3 <lgl>, Alternate <lgl>
# i Use `colnames()` to see all variable names
[[4]]
# A tibble: 6 x 7
S M T W T F S
<int> <int> <int> <int> <int> <int> <int>
1 NA NA NA NA NA NA 1
2 2 3 4 5 6 7 8
3 9 10 11 12 13 14 15
4 16 17 18 19 20 21 22
5 23 24 25 26 27 28 29
6 30 31 NA NA NA NA NA

Powershell - Convert an ordered nested dictionary to html

I need to report the number of VM snapshots based on their age. For this, I constructed an ordered dictionary whose values are themselves dictionaries, like this (JSON output):
{
"Less than 24h": {
"Prod": 15,
"Other": 11
},
"1 day": {
"Prod": 29,
"Other": 12
},
"2 days": {
"Prod": 11,
"Other": 0
},
"3 days and more": {
"Prod": 0,
"Other": 0
}
}
I need to convert it to html to be included in a mail.
I found how to convert a "simple" dictionary:
$Body1 = $dict.GetEnumerator() | Select-Object Key,Value | ConvertTo-Html -Fragment | Out-String
$Body1 = $Body1.Replace('Key','Days').Replace('Value','Number of Snapshot')
That works fine, but not when the values are nested dictionaries.
For nested dictionaries, the output looks like this:
| Days | Number of Snapshot |
|-----------------|------------------------------------------------|
| Less than 24h |`System.Collections.Specialized.OrderedDictionary`|
| 1 day |`System.Collections.Specialized.OrderedDictionary`|
| 2 days |`System.Collections.Specialized.OrderedDictionary`|
| 3 days and more |`System.Collections.Specialized.OrderedDictionary`|
Is there a way to get HTML output like this?
| Days | Prod | Other |
|-----------------|------|-------|
| Less than 24h | 15 | 11 |
| 1 day | 29 | 12 |
| 2 days | 11 | 0 |
| 3 days and more | 0 | 0 |
One option is to pre-process your dictionary and convert it into something else that can be turned into html a bit easier:
$data = $dict.GetEnumerator() | foreach-object {
new-object pscustomobject -property ([ordered] @{
"Days" = $_.Key
"Prod" = $_.Value.Prod
"Other" = $_.Value.Other
})
}
This will flatten your nested structure into this json-equivalent:
[
{
"Days": "2 days",
"Prod": 11,
"Other": 0
},
{
"Days": "1 day",
"Prod": 29,
"Other": 12
},
{
"Days": "Less than 24h",
"Prod": 15,
"Other": 11
},
{
"Days": "3 days and more",
"Prod": 0,
"Other": 0
}
]
And then you can just convert the whole thing in one go without -Fragment:
$Body1 = $data | ConvertTo-Html
which gives:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>HTML TABLE</title>
</head><body>
<table>
<colgroup><col/><col/><col/></colgroup>
<tr><th>Days</th><th>Prod</th><th>Other</th></tr>
<tr><td>2 days</td><td>11</td><td>0</td></tr>
<tr><td>1 day</td><td>29</td><td>12</td></tr>
<tr><td>Less than 24h</td><td>15</td><td>11</td></tr>
<tr><td>3 days and more</td><td>0</td><td>0</td></tr>
</table>
</body></html>
which renders as:
| Days | Prod | Other |
|-----------------|------|-------|
| 2 days | 11 | 0 |
| 1 day | 29 | 12 |
| Less than 24h | 15 | 11 |
| 3 days and more | 0 | 0 |
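The key move in the answer is reshaping the data before converting it: each outer key becomes the first cell of a row and each inner value becomes a further cell. As a language-neutral illustration only (Python, not a PowerShell replacement; `nested_to_html` is a hypothetical helper), the same flattening looks like this:

```python
from html import escape

# Same shape as the JSON in the question; dicts keep insertion order (Python 3.7+).
snapshots = {
    "Less than 24h": {"Prod": 15, "Other": 11},
    "1 day": {"Prod": 29, "Other": 12},
    "2 days": {"Prod": 11, "Other": 0},
    "3 days and more": {"Prod": 0, "Other": 0},
}

def nested_to_html(data, key_header="Days"):
    """Flatten {outer: {inner: value}} into one row per outer key, then render a table."""
    inner_keys = list(next(iter(data.values())))  # column names from the first inner dict
    header = "".join(f"<th>{escape(h)}</th>" for h in [key_header, *inner_keys])
    lines = ["<table>", f"<tr>{header}</tr>"]
    for outer, inner in data.items():
        cells = [outer, *(str(inner.get(k, "")) for k in inner_keys)]
        lines.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)

print(nested_to_html(snapshots))
```

Because the input is ordered, the rows come out in the dictionary's own order, starting with "Less than 24h".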

Problem with retrieving a table from Chinook dataset due to character

I have created a database in MySQL with data from the Chinook dataset, which has fictitious information on customers that buy music.
One of the tables ("Invoice") has the billing addresses, which contain characters from several languages:
InvoiceId CustomerId InvoiceDate BillingAddress
1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34
2 4 2009-01-02 00:00:00 Ullevålsveien 14
3 8 2009-01-03 00:00:00 Grétrystraat 63
4 14 2009-01-06 00:00:00 8210 111 ST NW
I tried to retrieve the data using R, with the following code:
library(DBI)
library(RMySQL)
library(dplyr)
library(magrittr)
library(lubridate)
library(stringi)
# Step 1 - Connect to the database ----------------------------------------
con <- DBI::dbConnect(MySQL(),
dbname = Sys.getenv("DB_CHINOOK"),
host = Sys.getenv("HST_CHINOOK"),
user = Sys.getenv("USR_CHINOOK"),
password = Sys.getenv("PASS_CHINOOK"),
port = XXXX)
invoices_tbl <- tbl(con, "Invoice") %>%
collect()
The connection is ok, but when trying to visualize the data, I can't see the special characters:
> head(invoices_tbl[,1:4])
# A tibble: 6 x 4
InvoiceId CustomerId InvoiceDate BillingAddress
<int> <int> <chr> <chr>
1 1 2 2009-01-01 00:00:00 "Theodor-Heuss-Stra\xdfe 34"
2 2 4 2009-01-02 00:00:00 "Ullev\xe5lsveien 14"
3 3 8 2009-01-03 00:00:00 "Gr\xe9trystraat 63"
4 4 14 2009-01-06 00:00:00 "8210 111 ST NW"
5 5 23 2009-01-11 00:00:00 "69 Salem Street"
6 6 37 2009-01-19 00:00:00 "Berger Stra\xdfe 10"
My question is, should I change something in the configuration inside MySQL? Or is it an issue with R? How can I see the special characters? What is the meaning of \xdfe?
Please, any help will be greatly appreciated.
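The \x.. sequences are single Latin-1 bytes: \xdf is ß, \xe5 is å and \xe9 is é (in "Stra\xdfe" the e is just an ordinary letter). They can be converted to UTF-8 with iconv: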
invoices_tbl$BillingAddress <- iconv(invoices_tbl$BillingAddress,
"latin1", "utf-8")
Output:
invoices_tbl
InvoiceId CustomerId InvoiceDate BillingAddress
1 1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34
2 2 4 2009-01-02 00:00:00 Ullevålsveien 14
3 3 8 2009-01-03 00:00:00 Grétrystraat 63
4 4 14 2009-01-06 00:00:00 8210 111 ST NW
5 5 23 2009-01-11 00:00:00 69 Salem Street
6 6 37 2009-01-19 00:00:00 Berger Straße 10
data
invoices_tbl <- structure(list(InvoiceId = 1:6, CustomerId = c(2L, 4L, 8L, 14L,
23L, 37L), InvoiceDate = c("2009-01-01 00:00:00", "2009-01-02 00:00:00",
"2009-01-03 00:00:00", "2009-01-06 00:00:00", "2009-01-11 00:00:00",
"2009-01-19 00:00:00"), BillingAddress = c("Theodor-Heuss-Stra\xdfe 34",
"Ullev\xe5lsveien 14", "Gr\xe9trystraat 63", "8210 111 ST NW",
"69 Salem Street", "Berger Stra\xdfe 10")), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
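To make the "\xdfe" question concrete: "\xdf" is one byte with value 0xDF followed by a plain "e". In Latin-1 (ISO-8859-1) that byte is ß (U+00DF), while UTF-8 encodes ß as two bytes, so a lone 0xDF is not valid UTF-8. A small Python illustration of the same conversion iconv performs (Python is used here only to show the bytes):

```python
# Python illustration: the bytes R printed as "\xdf", "\xe5", "\xe9" are
# single Latin-1 (ISO-8859-1) bytes, not UTF-8.
raw = b"Theodor-Heuss-Stra\xdfe 34"

text = raw.decode("latin-1")      # byte 0xDF -> character U+00DF
print(text)                       # Theodor-Heuss-Straße 34
print(f"U+{ord('ß'):04X}")        # U+00DF
print("ß".encode("utf-8"))        # b'\xc3\x9f': two bytes in UTF-8

# The same byte on its own is not valid UTF-8.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```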

Bulk import 0 Rows affected

When using the bulk insert command in SQL Server 2008 it returns:
(0 row(s) affected)
I am using this command to carry out the bulk insert:
BULK INSERT Test
FROM 'C:\DataFiles\Tests.dat'
WITH (FORMATFILE = 'C:\DataFiles\FormatFiles\TestFormat.Fmt');
GO
Tests.dat contains:
b00d23fe-580e-42dc-abd4-e8a054395126,48dd5dd6e3a144f7a817f234dd51469c,452eb8ce-6ae2-4e7a-a389-1097882c83ab,,, ,,,,Aria,,,160,,,86400,,2004-04-03 23:23:00.000,,2012-07-06 13:26:31.633,2012-07-06 13:27:44.650,3,,,,51B7A831-4731-4E2E-ACEC-06636ADC7AD3,,0,,0,,Field Name 1,,Field Name 2,,Field Name 3,,Field Name 4,
and the format file TestFormat.fmt contains:
9.0
39
1 SQLCHAR 0 37 "," 1 Key ""
2 SQLCHAR 0 37 "," 2 TestType ""
3 SQLCHAR 0 37 "," 3 CaseKey ""
4 SQLCHAR 0 30 "," 4 Height ""
5 SQLCHAR 0 30 "," 5 Weight ""
6 SQLCHAR 0 128 "," 6 PacemakerType Latin1_General_CI_AI
7 SQLCHAR 0 0 "," 7 Diary Latin1_General_CI_AI
8 SQLCHAR 0 0 "," 8 Indication Latin1_General_CI_AI
9 SQLCHAR 0 0 "," 9 Medication Latin1_General_CI_AI
10 SQLCHAR 0 37 "," 10 RecorderType ""
11 SQLCHAR 0 100 "," 11 RecorderSerial Latin1_General_CI_AI
12 SQLCHAR 0 0 "," 12 Comments Latin1_General_CI_AI
13 SQLCHAR 0 12 "," 13 Status ""
14 SQLCHAR 0 0 "," 14 AdditionalData Latin1_General_CI_AI
15 SQLCHAR 0 37 "," 15 OrderKey ""
16 SQLCHAR 0 12 "," 16 Duration ""
17 SQLCHAR 0 12 "," 17 Age ""
18 SQLCHAR 0 24 "," 18 RecordingStartDateTime ""
19 SQLCHAR 0 128 "," 19 Ward Latin1_General_CI_AI
20 SQLCHAR 0 24 "," 20 CreatedDateTime ""
21 SQLCHAR 0 24 "," 21 UpdatedDateTime ""
22 SQLCHAR 0 21 "," 22 UserGroupBits ""
23 SQLCHAR 0 24 "," 23 LastArchive ""
24 SQLCHAR 0 128 "," 24 PointOfCare Latin1_General_CI_AI
25 SQLCHAR 0 128 "," 25 Bed Latin1_General_CI_AI
26 SQLCHAR 0 37 "," 26 DownloadFacilityKey ""
27 SQLCHAR 0 37 "," 27 AnalysisFacilityKey ""
28 SQLCHAR 0 12 "," 28 Priority ""
29 SQLCHAR 0 37 "," 29 FacilityKey ""
30 SQLCHAR 0 12 "," 30 PacemakerTypeStandard ""
31 SQLCHAR 0 128 "," 31 TestTypeName Latin1_General_CI_AI
32 SQLCHAR 0 128 "," 32 UserDefined1Name Latin1_General_CI_AI
33 SQLCHAR 0 128 "," 33 UserDefined1Value Latin1_General_CI_AI
34 SQLCHAR 0 128 "," 34 UserDefined2Name Latin1_General_CI_AI
35 SQLCHAR 0 128 "," 35 UserDefined2Value Latin1_General_CI_AI
36 SQLCHAR 0 128 "," 36 UserDefined3Name Latin1_General_CI_AI
37 SQLCHAR 0 128 "," 37 UserDefined3Value Latin1_General_CI_AI
38 SQLCHAR 0 128 "," 38 UserDefined4Name Latin1_General_CI_AI
39 SQLCHAR 0 128 "\r\n" 39 UserDefined4Value Latin1_General_CI_AI
I cannot figure out why this isn't working. Other people have had similar problems because they had more fields than actual columns in their database, or because they were using .csv files, which apparently are not supported.
This works fine on every other table in the database I am importing with no errors so I can't understand why it doesn't work here.
Any help would be greatly appreciated!
Thanks
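No answer survives for this question. One cheap first check (an assumption about the cause, not a confirmed diagnosis) is whether Tests.dat really matches the format file: 39 comma-separated fields per record, with the last field ended by "\r\n". A file saved with Unix "\n" line endings, or whose last record has no trailing newline, does not match that final terminator. The Python sketch below uses a hypothetical helper, `check_records`, to report both kinds of mismatch. For the sample row in the question, the comma count gives 39 fields, so if that row is representative, the row terminator is the likelier suspect.

```python
def check_records(data: bytes, expected_fields: int, field_sep=b",", row_term=b"\r\n"):
    """Return a list of problems; an empty list means no obvious mismatch.

    Assumes no quoting, which matches the format file: every field ends at a
    plain "," (or "\\r\\n" for the last one), so a field cannot contain a comma.
    """
    problems = []
    if row_term not in data:
        problems.append("no \\r\\n row terminator found (Unix \\n line endings?)")
    elif not data.endswith(row_term):
        problems.append("last record is not followed by \\r\\n")
    for n, record in enumerate(data.split(row_term), start=1):
        if not record:
            continue  # the empty piece after the final terminator
        fields = record.count(field_sep) + 1
        if fields != expected_fields:
            problems.append(f"record {n}: {fields} fields, expected {expected_fields}")
    return problems

# Example: a one-record file saved with Unix line endings.
print(check_records(b"a,b,c\n", expected_fields=3))
```

To check the real file, read it in binary mode (for example `open(r"C:\DataFiles\Tests.dat", "rb").read()`) and pass 39 as `expected_fields`.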

SSIS Format File

I have inherited an SSIS package which contains a bulk import task.
The bulk import task uses the following format file:
8.0
38
1 SQLCHAR 0 2 "" 1 branch_code Latin1_General_CI_AS
2 SQLCHAR 0 10 "" 2 sfkacct_number Latin1_General_CI_AS
3 SQLCHAR 0 3 "" 3 sfkacct_depot Latin1_General_CI_AS
4 SQLCHAR 0 35 "" 4 sfkacct_nominee_name Latin1_General_CI_AS
5 SQLCHAR 0 2 "" 5 sfkacct_domicile Latin1_General_CI_AS
6 SQLCHAR 0 3 "" 6 secore_transaction_status Latin1_General_CI_AS
7 SQLCHAR 0 11 "" 7 secore_transaction_reference Latin1_General_CI_AS
8 SQLCHAR 0 16 "" 8 customer_reference Latin1_General_CI_AS
9 SQLCHAR 0 35 "" 9 market_reference Latin1_General_CI_AS
10 SQLCHAR 0 35 "" 10 counterparty_reference Latin1_General_CI_AS
11 SQLCHAR 0 2 "" 11 transaction_type Latin1_General_CI_AS
12 SQLCHAR 0 18 "" 12 security_quantity Latin1_General_CI_AS
13 SQLCHAR 0 10 "" 13 security_code Latin1_General_CI_AS
14 SQLCHAR 0 12 "" 14 security_number Latin1_General_CI_AS
15 SQLCHAR 0 3 "" 15 security_group Latin1_General_CI_AS
16 SQLCHAR 0 8 "" 16 trade_date Latin1_General_CI_AS
17 SQLCHAR 0 8 "" 17 contractual_settlement_date Latin1_General_CI_AS
18 SQLCHAR 0 8 "" 18 actua1_settlement_date Latin1_General_CI_AS
19 SQLCHAR 0 8 "" 19 revised_date Latin1_General_CI_AS
20 SQLCHAR 0 3 "" 20 settlement_currency Latin1_General_CI_AS
21 SQLCHAR 0 20 "" 21 settlement_amount Latin1_General_CI_AS
22 SQLCHAR 0 3 "" 22 cash_currency Latin1_General_CI_AS
23 SQLCHAR 0 14 "" 23 cashacct_number Latin1_General_CI_AS
24 SQLCHAR 0 10 "" 24 broker_code Latin1_General_CI_AS
25 SQLCHAR 0 35 "" 25 broker_description Latin1_General_CI_AS
26 SQLCHAR 0 35 "" 26 beneficiary_code Latin1_General_CI_AS
27 SQLCHAR 0 35 "" 27 beneficiary_details1 Latin1_General_CI_AS
28 SQLCHAR 0 35 "" 28 beneficiary_details2 Latin1_General_CI_AS
29 SQLCHAR 0 35 "" 29 beneficiary_details3 Latin1_General_CI_AS
30 SQLCHAR 0 16 "" 30 failcode_org Latin1_General_CI_AS
31 SQLCHAR 0 16 "" 31 failcode_lst Latin1_General_CI_AS
32 SQLCHAR 0 35 "" 32 failcode_description Latin1_General_CI_AS
33 SQLCHAR 0 2 "" 33 status_code Latin1_General_CI_AS
34 SQLCHAR 0 8 "" 34 secore_transaction_inputdate Latin1_General_CI_AS
35 SQLCHAR 0 8 "" 35 secore_transaction_valuedate Latin1_General_CI_AS
36 SQLCHAR 0 6 "" 36 yearmonth Latin1_General_CI_AS
37 SQLCHAR 0 2 "" 37 domicile Latin1_General_CI_AS
38 SQLCHAR 0 1 "\r\n" 38 instruction_mode Latin1_General_CI_AS
Could anyone tell me what the 8.0 at the top of the file represents?
It is the version number of bcp (the bulk copy program).
MSDN Link
It looks like
SQL Server 2000 - 8.0
SQL Server 2005 - 9.0
SQL Server 2008 - 10.0
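A non-XML format file starts with two header lines: the version (8.0 here) and the number of field rows that follow (38 here). As an illustration, a small Python sketch (the helper `read_fmt_header` is hypothetical) reads both header lines, maps the version using the list above, and compares the declared count with the rows actually present:

```python
# Release mapping as listed in the answer above.
VERSION_TO_RELEASE = {"8.0": "SQL Server 2000", "9.0": "SQL Server 2005", "10.0": "SQL Server 2008"}

def read_fmt_header(text):
    """Parse the two header lines of a non-XML format file and count the field rows."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    version = lines[0].strip()
    declared = int(lines[1])
    return {
        "version": version,
        "release": VERSION_TO_RELEASE.get(version, "unknown"),
        "declared_columns": declared,
        "column_rows": len(lines[2:]),
    }

# Shortened two-field example in the same layout as the file in the question.
sample = """8.0
2
1 SQLCHAR 0 2 "" 1 branch_code Latin1_General_CI_AS
2 SQLCHAR 0 1 "\\r\\n" 2 instruction_mode Latin1_General_CI_AS
"""
print(read_fmt_header(sample))
```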