Extracting data from a single variable to multi variable- multi observations by symbols and text. [{..]} - html

I have a CSV, extracted from an HTML site, many columns hold a lot of information in one cell. for example- this text is from one cell. It holds the name of 3 companies:
[{"company":"Orange","location":"","url":"https://www.xyz","positions":[{"title":"CEO","subtitle":"honelulu","description":"","duration":"Dec 2021 - Present 7 months"}] ,"industry":"Non-profit Organizations"},{"company":"Fig","location":"","url":"https://www.xyz2","positions":[{"title":"Business Development Manager","subtitle":"Fig","duration":"Feb 2019 Dec 2021 2 years 11 months",}],},
{"company":"Papaya","location":"","url":"https://www.xyz3","positions":[{"title":"Business Development Manager","subtitle":"Pragaya","description":"","duration":"Jan 2018 Oct 2018 10 months",}],"industry":"High Tech"},}]
I would like to extract each company into a different row, with the user name, position, duration and industry in different columns.
I also have other date in other columns that I wish would stay the same.
Any ideas for a simple way to do this?

This tidyr approach with extract works for a start:
library(dplyr)
library(tidyr)
data.frame(dat) %>%
# simplify:
mutate(dat = gsub('["\\]\\[}{]', '', dat, perl = TRUE)) %>%
# separate:
separate_rows(dat, sep = '(?<!^)(?=company)') %>%
# extract:
extract(dat, "company", "company:([^,]+).*", remove = FALSE) %>%
extract(dat, "user_name", ".*url:([^,]+).*", remove = FALSE) %>%
extract(dat, "position", ".*\\btitle:([^,]+)", remove = FALSE)
# A tibble: 3 × 5
industry duration position user_name company
<chr> <chr> <chr> <chr> <chr>
1 Non-profit Organizations "Dec 2021 - Present 7 months " CEO https://www.xyz Orange
2 NA "Feb 2019 Dec 2021 2 years 11 months" Business Development Manager https://www.xyz2 Fig
3 High Tech "Jan 2018 Oct 2018 10 months" Business Development Manager https://www.xyz3 Papaya
Data:
dat <- '{"company":"Orange","location":"","url":"https://www.xyz","positions":[{"title":"CEO","subtitle":"honelulu","description":"","duration":"Dec 2021 - Present 7 months"}] ,"industry":"Non-profit Organizations"},{"company":"Fig","location":"","url":"https://www.xyz2","positions":[{"title":"Business Development Manager","subtitle":"Fig","duration":"Feb 2019 Dec 2021 2 years 11 months",}],},{"company":"Papaya","location":"","url":"https://www.xyz3","positions":[{"title":"Business Development Manager","subtitle":"Pragaya","description":"","duration":"Jan 2018 Oct 2018 10 months",}],"industry":"High Tech"},}]'
See also Use tidyr's function `extract` with optional capture group for a more elegant solution

Related

Is any better way to handle JSON in R | performance improvement

Data Preparation
comp <-
c('[{"id": 28, "name": "Google"}, {"id": 12, "name": "Microsoft"}]',
'[{"id": 32, "name": "Microsoft"}, {"id": 878, "name": "Facebook"}]')
id = c(1,2)
jsonData = as.data.frame(id,comp)
jsonData
id
[{"id": 28, "name": "Google"}, {"id": 12, "name": "Microsoft"}] 1
[{"id": 32, "name": "Microsoft"}, {"id": 878, "name": "Facebook"}] 2
I am not sure why 'comp' not came as column name and why 'id' came later if it's defined before, Also its giving error if I define 'as.data.frame(comp,id)'
Now I am dealing with JSON data
library(jsonlite)
library(tidyverse)
library(dplyr)
data <- jsonData %>% mutate(x = lapply(comp,fromJSON)) %>% unnest(x)
data
id id1 name
1 1 28 Google
2 1 12 Microsoft
3 2 32 Microsoft
4 2 878 Facebook
Is there any better way to deal with JSON in R, like any library which directly convert JSON to normal column, currently I am taking small data so its look easy but I have multiple columns having JSON input and Its too much performance hit for my report
JSON is text. Text parsing is slow. Also not sure why library(dplyr) is there since it comes with the tidyverse. And, you should consider reading up on how to make data frames.
Regardless. We'll make an representative example: 500,000 rows:
library(tidyverse)
data_frame(
id = rep(c(1L, 2L), 250000),
comp = rep(c(
'[{"id": 28, "name": "Google"}, {"id": 12, "name": "Microsoft"}]',
'[{"id": 32, "name": "Microsoft"}, {"id": 878, "name": "Facebook"}]'
), 250000)
) -> xdf
There are many JSON processing packages in R. Test out a few. This uses ndjson which has a function flatten() which takes a character vector of JSON strings and makes a "completely flat" structure from it.
I'm only using different data frame variables for explanatory clarity and benchmarking later.
pull(xdf, comp) %>%
ndjson::flatten() %>%
bind_cols(select(xdf, id)) -> ydf
That makes:
ydf
## Source: local data table [500,000 x 5]
##
## # A tibble: 500,000 x 5
## `0.id` `0.name` `1.id` `1.name` id
## <dbl> <chr> <dbl> <chr> <int>
## 1 28. Google 12. Microsoft 1
## 2 32. Microsoft 878. Facebook 2
## 3 28. Google 12. Microsoft 1
## 4 32. Microsoft 878. Facebook 2
## 5 28. Google 12. Microsoft 1
## 6 32. Microsoft 878. Facebook 2
## 7 28. Google 12. Microsoft 1
## 8 32. Microsoft 878. Facebook 2
## 9 28. Google 12. Microsoft 1
## 10 32. Microsoft 878. Facebook 2
## # ... with 499,990 more rows
We can turn that back into a more tidy data frame:
bind_rows(
select(ydf, id = id, id1=`0.id`, name=`0.name`),
select(ydf, id = id, id1=`1.id`, name=`1.name`)
) %>%
mutate(id1 = as.integer(id1))
## Source: local data table [1,000,000 x 3]
##
## # A tibble: 1,000,000 x 3
## id id1 name
## <int> <int> <chr>
## 1 1 28 Google
## 2 2 32 Microsoft
## 3 1 28 Google
## 4 2 32 Microsoft
## 5 1 28 Google
## 6 2 32 Microsoft
## 7 1 28 Google
## 8 2 32 Microsoft
## 9 1 28 Google
## 10 2 32 Microsoft
## # ... with 999,990 more rows
Now, we'll benchmark with 1,000 rows since I'm not waiting for the full 500,000 run to microbenchmark:
data_frame(
id = rep(c(1L, 2L), 500),
comp = rep(c(
'[{"id": 28, "name": "Google"}, {"id": 12, "name": "Microsoft"}]',
'[{"id": 32, "name": "Microsoft"}, {"id": 878, "name": "Facebook"}]'
), 500)
) -> xdf
microbenchmark::microbenchmark(
faster = {
pull(xdf, comp) %>%
ndjson::flatten() %>%
bind_cols(select(xdf, id)) -> ydf
bind_rows(
select(ydf, id = id, id1=`0.id`, name=`0.name`),
select(ydf, id = id, id1=`1.id`, name=`1.name`)
) %>%
mutate(id1 = as.integer(id1))
}
)
## Unit: milliseconds
## expr min lq mean median uq max neval
## faster 12.46409 13.71483 14.73997 14.40582 15.47529 21.09543 100
So:
15ms for 1,000 rows
15ms * 500 = 7.5s for 500,000
If you're not pedantic about the id1 column needing to be an integer, you can likely shave off a few ms.
There are other approaches. And, if you regularly work with columns of JSON data, I highly recommend checking out Apache Drill and the sergeant package.

Scraping wikipedia table r

Trying to scrape the first 8 tables (very high, high, medium, low) from the human development index in Wikipedia.
Started with but getting a list of zero. What am I doing wrong? New to R :(
libray(rvest)
url <- "https://en.wikipedia.org/wiki/List_of_countries_by_Human_Development_Index#Complete_list_of_countries"
webpage <- read_html(url)
hdi_tables <- html_nodes(webpage, 'table')
head(hdi_tables, n = 10)
scrape <- url %>%
read_html() %>%
html_nodes(xpath = '//*[#id="mw-content-text"]/div/div[5]/table/tbody/tr/td[1]/table') %>%
html_table()
head(scrape, n=10)
I think it would be easier to work with the original data source:
Select "Human Development Index (HDI)" in both the drop-down select lists, then click the "Download Data" link to get a CSV file named Human Development Index (HDI).csv.
Read it into R:
library(tidyverse)
Human_Development_Index_HDI_ <- read_csv("path/to/Human Development Index (HDI).csv",
skip = 1)
You can reshape the data, get the values for 2015 and classify countries as low, medium, high or very high:
hdi <- Human_Development_Index_HDI_ %>%
gather(Year, HDI, -`HDI Rank (2015)`, -Country) %>%
filter(Year == "2015") %>%
na.omit() %>%
mutate(Year = as.numeric(Year),
classification = cut(HDI,
breaks = c(0, 0.549, 0.699, 0.799, 1),
labels = c("low", "medium", "high", "very_high")))
hdi
# A tibble: 188 x 5
`HDI Rank (2015)` Country Year HDI classification
<int> <chr> <dbl> <dbl> <fctr>
1 169 Afghanistan 2015 0.479 low
2 75 Albania 2015 0.764 high
3 83 Algeria 2015 0.745 high
4 32 Andorra 2015 0.858 very_high
5 150 Angola 2015 0.533 low
6 62 Antigua and Barbuda 2015 0.786 high
7 45 Argentina 2015 0.827 very_high
8 84 Armenia 2015 0.743 high
9 2 Australia 2015 0.939 very_high
10 24 Austria 2015 0.893 very_high
# ... with 178 more rows
You could change the filter to get values for 2014 too, if you want to replicate the "change from previous year" values in the Wikipedia table.
If you're okay with parsing the wikipedia markup language instead, you could try using WikipediR to grab the markup of the page (from skimming the documentation, try page_content with as_wikitext set to true). Then you'll get some lines that all look like this:
| 1 || {{steady}} ||style="text-align:left"| {{flag|Norway}} || 0.949 || {{increase}} 0.001
This should be parseable in R using strsplit or something.

Scraping html text into table with delimiters that do not have a clear pattern using R (rvest)

I'm just learning how to use R to scrape data from webpages, and I'm running into a couple of issues.
For reference, the website that I am practicing on is here: http://www.rsssf.com/tables/34q.html
As far as I know, the website I am scraping data from is not a table so I can't directly scrape the information into a table, so here is the code I wrote to just have all of the text:
wcq_1934_html <- read_html("http://www.rsssf.com/tables/34q.html")
wcq_1934_node <- html_nodes(wcq_1934_html, "pre")
wcq_1934_text <- html_text(wcq_1934_node, trim = TRUE)
This results in a very long text file with all of the information that I need, just not formatted in an ideal way.
So I am next attempting to substring this text in order to get an output that looks something like this.
Country A - Country A Score - Country B - Country B Score
It doesn't have to be exactly like this, I just basically need for each game the country and how many goals they scored and ideally it should be comparable with the other country from the same game so I can know who won or lost! I do not need any of the other information like where the game was played, etc.
So I've tried three different ways to get this:
First test: split text by dashes:
test <- strsplit(wcq_1934_text, "-")
df_test <- data.frame(test)
This gives me the information I need in a table but the rows don't match the exact scores that I need (i.e. Lithuania 0, and Sweden 2 are in separate rows)
Second test: split text by spaces:
test2 <- strsplit(wcq_1934_text, " ")
df_test2 <- data.frame(test2)
This is helpful because it gives me the scores in one row (0-2 for the first game), but the countries are unevenly spaced out across rows.
Third test: split text by "tabs"
test3 <- strsplit(wcq_1934_text, " ")
df_test3 <- data.frame(test3)
This has a similar issue to the first test.
Any suggestions would be much appreciated. This is my first ever Stack Overflow post, although I've lurked around and this website has been helpful to me for a very long time. Thank you in advance!
Here's a solution that provides you most of what you need, though as MrFlick commented, it is a little fragile to this page. I'll stay with rvest, though as biomiha suggested, it isn't really buying you a lot here (though it does cleanly break out the <pre> block).
Starting with your wcq_1934_text, it's a single long string, let's break it up by newlines (CRLF in this case):
wcq_1934_text <- strsplit(wcq_1934_text, "[\r\n]+")[[1]]
str(wcq_1934_text)
# chr [1:51] "Hosts: Italy (not automatically qualified)" "Holders: Uruguay (did not enter)" "Group 1 [Sweden]" ...
I'll the magrittr package merely because it helps break out each step of the process using the %>% non-pipe; you can convert it non-magrittr by changing (say) func1() %>% func2() %>% func3() to func3(func2(func1())) (yuck) or intermediate assignment of return values, ret1 <- func1(); ret2 <- func2(ret1); ....
library(magrittr)
dat <- Filter(function(a) grepl("^[0-9][0-9]", a), wcq_1934_text) %>%
paste(., collapse = "\n") %>%
textConnection() %>%
read.fwf(file = ., widths = c(10, 16, 17, 4, 99), stringsAsFactors = FALSE) %>%
lapply(trimws) %>%
as.data.frame(stringsAsFactors = FALSE)
The widths are fragile and unique to this page. If other reporting pages have slightly different column layouts, you'll need to use a different function, perhaps one that can automatically determine the breaks.
head(dat)
# V1 V2 V3 V4 V5
# 1 11.06.33 Stockholm Sweden 6-2 Estonia
# 2 29.06.33 Kaunas Lithuania 0-2 Sweden
# 3 11.03.34 Madrid Spain 9-0 Portugal
# 4 18.03.34 Lisboa Portugal 1-2 Spain
# 5 25.03.34 Milano Italy 4-0 Greece
# 6 25.03.34 Sofia Bulgaria 1-4 Hungary
From here, it's up to you which columns you want to use.
For instance, handling of the date, you might want:
dat$V1 <- as.POSIXct(gsub("([0-9]+)$", "19\\1", dat$V1), format = "%d.%m.%Y")
dat$V1
# [1] "1933-06-11 PST" "1933-06-29 PST" "1934-03-11 PST" "1934-03-18 PST" "1934-03-25 PST" "1934-03-25 PST" "1934-04-25 PST" "1934-04-29 PST"
# [9] "1933-10-15 PST" "1934-03-15 PST" "1933-09-24 PST" "1933-10-29 PST" "1934-04-29 PST" "1934-02-25 PST" "1934-04-08 PST" "1934-04-29 PST"
# [17] "1934-03-11 PST" "1934-04-15 PST" "1934-01-28 PST" "1934-02-01 PST" "1934-02-04 PST" "1934-03-04 PST" "1934-03-11 PST" "1934-03-18 PST"
# [25] "1934-05-24 PST" "1934-03-16 PST" "1934-04-06 PST"
The gsub stuff is because as.POSIXct assumes 2-digit years less than 69 are in the 20th century, 19th for 69-99.
It's easy enough to use either strsplit on the scores, but you could also do:
library(tidyr)
dat %>%
separate(V4, c("score1", "score2"), sep="-") %>%
head()
# Warning: Too few values at 1 locations: 10
# V1 V2 V3 score1 score2 V5
# 1 1933-06-11 Stockholm Sweden 6 2 Estonia
# 2 1933-06-29 Kaunas Lithuania 0 2 Sweden
# 3 1934-03-11 Madrid Spain 9 0 Portugal
# 4 1934-03-18 Lisboa Portugal 1 2 Spain
# 5 1934-03-25 Milano Italy 4 0 Greece
# 6 1934-03-25 Sofia Bulgaria 1 4 Hungary
(The warning is expected, since one game was not played so has "n/p" for a score. You might want to handle non-score values in V4 before trying the split, perhaps replacing anything not numeric-dash-numeric with NA.)
Equally specific to this particular site but may be easier to generalize:
library(rvest)
library(purrr)
library(dplyr)
library(stringi)
pg <- read_html("http://www.rsssf.com/tables/34q.html")
Target the <pre> and strip out some things that aren't part of "tables":
html_nodes(pg, "pre") %>%
html_text() %>%
stri_split_lines() %>%
flatten_chr() %>%
discard(stri_detect_regex, "^(NB| )") -> lines
Now, we get the start and end lines indexes of each "group":
starts <- which(grepl("^Group", lines))
ends <- c(starts[-1], length(lines))
We iterate over those starts and ends and:
extract the group info
clean up the table
discard any "empty" tables
turn the tabular data into a data frame, doing some munging along the way
I can annotate the following more if needed:
map2_df(starts, ends, ~{
grp_info <- stri_match_all_regex(lines[.x], "Group ([[:digit:]]+) \\[(.*)]")[[1]][,2:3]
lines[(.x+1):.y] %>%
discard(stri_detect_regex, "(^[^[:digit:]]| round)") %>%
discard(`==`, "") -> grp
if (length(grp) == 0) return(NULL)
stri_split_regex(grp, "\ \ +") %>%
map_df(~{
.x[1:4] %>%
as.list() %>%
set_names(c("date", "team_a", "team_b", "score_team")) %>%
flatten_df() %>%
separate(score_team, c("score", "team_c"), sep=" ") %>%
mutate(group_num = grp_info[1], group_info = grp_info[2]) %>%
separate(date, c("d", "m", "y")) %>%
mutate(date = as.Date(sprintf("19%s-%s-%s", y, m, d))) %>%
select(-d, -m, -y)
})
})
## # A tibble: 27 x 7
## team_a team_b score team_c group_num group_info date
## <chr> <chr> <chr> <chr> <chr> <chr> <date>
## 1 Stockholm Sweden 6-2 Estonia 1 Sweden 1933-06-11
## 2 Kaunas Lithuania 0-2 Sweden 1 Sweden 1933-06-29
## 3 Madrid Spain 9-0 Portugal 2 Spain 1934-03-11
## 4 Lisboa Portugal 1-2 Spain 2 Spain 1934-03-18
## 5 Milano Italy 4-0 Greece 3 Italy 1934-03-25
## 6 Sofia Bulgaria 1-4 Hungary 4 Hungary, Austria 1934-03-25
## 7 Wien Austria 6-1 Bulgaria 4 Hungary, Austria 1934-04-25
## 8 Budapest Hungary 4-1 Bulgaria 4 Hungary, Austria 1934-04-29
## 9 Warszawa Poland 1-2 Czechoslovakia 5 Czechoslovakia 1933-10-15
## 10 Praha Czechoslovakia n/p Poland 5 Czechoslovakia 1934-03-15
## 11 Beograd Yugoslavia 2-2 Switzerland 6 Romania, Switzerland 1933-09-24
## 12 Bern Switzerland 2-2 Romania 6 Romania, Switzerland 1933-10-29
## 13 Bucuresti Romania 2-1 Yugoslavia 6 Romania, Switzerland 1934-04-29
## 14 Dublin Ireland 4-4 Belgium 7 Netherlands, Belgium 1934-02-25
## 15 Amsterdam Netherlands 5-2 Ireland 7 Netherlands, Belgium 1934-04-08
## 16 Antwerpen Belgium 2-4 Netherlands 7 Netherlands, Belgium 1934-04-29
## 17 Luxembourg Luxembourg 1-9 Germany 8 Germany, France 1934-03-11
## 18 Luxembourg Luxembourg 1-6 France 8 Germany, France 1934-04-15
## 19 Port-au-Prince Haiti 1-3 Cuba 11 USA 1934-01-28
## 20 Port-au-Prince Haiti 1-1 Cuba 11 USA 1934-02-01
## 21 Port-au-Prince Haiti 0-6 Cuba 11 USA 1934-02-04
## 22 Cd. de Mexico Mexico 3-2 Cuba 11 USA 1934-03-04
## 23 Cd. de Mexico Mexico 5-0 Cuba 11 USA 1934-03-11
## 24 Cd. de Mexico Mexico 4-1 Cuba 11 USA 1934-03-18
## 25 Roma USA 4-2 Mexico 11 USA 1934-05-24
## 26 Cairo Egypt 7-1 Palestina 12 Egypt 1934-03-16
## 27 Tel Aviv Palestina 1-4 Egypt 12 Egypt 1934-04-06

Extract Rows from JSON Google API request in R

I am pulling in data from banks based on a search nearby request from Google's map api. In some instances, It pulls more than one bank (as some banks may be close together). How do I go about extracting each individual JSON object returned (as I need to get the place id) in order that I can do a second api pull (based on the place id) to search more detail about each bank? Here is my code:
require(jsonlite)
require(utils)
plcUrl <- "https://maps.googleapis.com/maps/api/place/nearbysearch/json?"
key <- "myKEY"
location <- paste0("41.0272, -81.51345")
address <- "XXXXXXXXXXXXXXXXX"
type <- "bank"
radius <- "500"
name = "XXXXX"
strurl <- as.character(paste(plcUrl ,
"&location=",location,
"&address=",address,
#"&name=",name,
"&radius=",radius,
"&type=",type,
"&key=",key,
sep=""))
setInternet2(TRUE)
rd <- fromJSON(URLencode(strurl))
rd$results$place_id
As of googleway v2.4 I've added methods that access specific elements of Google API queries.
library(googleway)
key <- "your_api_key"
## search places
res <- google_places(location = c(41.0272, -81.51345),
key = key,
place_type = "bank",
radius = 500)
## get the place_id values using the `place` method to extract the ids
place(res)
# [1] "ChIJDe7R2HQqMYgRvqoszlV6YTA" "ChIJDQwLUXMqMYgR-3Nb2KFhZZ0"
## query the details
details <- google_place_details(place_id = place(res)[1], key = key)
details$result$opening_hours
# $open_now
# [1] FALSE
#
# $periods
# close.day close.time open.day open.time
# 1 1 1600 1 0900
# 2 2 1600 2 0900
# 3 3 1600 3 0900
# 4 4 1600 4 0900
# 5 5 1800 5 0900
# 6 6 1300 6 0900
#
# $weekday_text
# [1] "Monday: 9:00 AM – 4:00 PM" "Tuesday: 9:00 AM – 4:00 PM" "Wednesday: 9:00 AM – 4:00 PM" "Thursday: 9:00 AM – 4:00 PM"
# [5] "Friday: 9:00 AM – 6:00 PM" "Saturday: 9:00 AM – 1:00 PM" "Sunday: Closed"

Storing birthday in database, what type to use if there's missing values

I am collecting sports data for a database I will be creating soon, and one of the columns I have for one of my tables is player birthdays. I am using R to scrape and collect the data, and here is what it currently looks like:
head(my_df)
Name BirthDate Age Place Height Weight
1 Mike NA NA NA
2 Austin August 17, 1994 22.295 Roseville, California 73 210
3 Koby January 19, 1997 20.140 Wolfforth, Texas 70 255
4 Ty 1991 26.000 Amarillo, Texas 70 165
5 Cole NA NA NA
6 Jeff July 24, 1995 21.320 Boulder, Colorado 72 200
dput(head(my_df))
structure(list(Name = c("Mike", "Austin", "Koby", "Ty", "Cole",
"Jeff"), BirthDate = c("", "August 17, 1994", "January 19, 1997",
"1991", "", "July 24, 1995"), Age = c(NA, 22.295, 20.14, 26,
NA, 21.32), Place = c("", "Roseville, California", "Wolfforth, Texas",
"Amarillo, Texas", "", "Boulder, Colorado"), Height = c(NA, 73,
70, 70, NA, 72), Weight = c(NA, 210L, 255L, 165L, NA, 200L)), .Names = c("Name",
"BirthDate", "Age", "Place", "Height", "Weight"), row.names = 25:30, class = "data.frame")
The BirthDate column has two inconsistencies that make appropriate formatting difficult:
there are entirely missing values
for some columns, I have the birthyear but am missing Day and Month.
I'm relatively new with databases and am trying to plan ahead with this. Does anybody have any thoughts on the best way to handle this? To be more specific, how can I write the R code to format the column the right way?
Not a complete answer but a direction to try...
Genealogy databases, which have similar problems, will often have an extra column (a sort of meta-date column) that expresses confidence in, or quality of, the birth-date column. This can then be used at query-time to mediate the results returned. Could be "estimated", "not given", "year only" to suit your app.
If we intend to do calculations on date as date, or on year and month columns, for example: the age of the player, how many of them were born on January, etc. then we can can separate it into 3 numeric columns.
do.call(rbind,
lapply(my_df$BirthDate, function(i){
x <- rev(unlist(strsplit(i, " ")))
res <- data.frame(
BirthDate = i,
YYYY = as.numeric(x[1]),
DD = as.numeric(sub(",", "", x[2], fixed = TRUE)),
MM = match(x[3], month.name))
res$DOB <- as.Date(paste(res$DD, res$MM, res$YYYY, sep = "/"),
format = "%d/%m/%Y")
res
}))
# BirthDate YYYY DD MM DOB
# 1 NA NA NA <NA>
# 2 August 17, 1994 1994 17 8 1994-08-17
# 3 January 19, 1997 1997 19 1 1997-01-19
# 4 1991 1991 NA NA <NA>
# 5 NA NA NA <NA>
# 6 July 24, 1995 1995 24 7 1995-07-24