How should I format a logit regression to test company motivations? (R) - regression

I'm writing a paper on investment firms and their relationship with a sustainable finance initiative. I'm using a panel dataset with 307 investors, 125 of which signed this sustainable initiative.
I would like to add in a section in which I test which variables might be driving them to sign this initiative.
I believe I should use logit regression for this, but having not used these extensively, I'm looking for some guidance.
Currently the data looks like this:
investor  year  activity  country      region  strategy  signatory
123 IM    2002  4.45      France       europe  VC        1
123 IM    2003  3.2       France       europe  VC        1
123 IM    2004  7.8       France       europe  VC        1
Aegon     2005  5.4       Netherlands  europe  BY        0
Aegon     2006  4.2       Netherlands  europe  BY        0
Aegon     2007  1.3       Netherlands  europe  BY        0
As you can see, the signatory variable is binary, and I would like to test variables such as country or region against it.
Any tips would be appreciated!
Rory

You can use the glm function in R. Following is an example with country and activity variables as independent variables:
# Assuming that your dataframe name is df
my_logit <- glm(signatory ~ activity + country, family = 'binomial', data=df)
# Check the output summary
summary(my_logit)
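One thing worth checking before fitting: in the sample rows, signatory does not change within an investor, so the yearly rows of one investor are not independent observations. Below is a minimal sketch of one possible way to reshape the data first, assuming you decide to collapse the panel to one row per investor (mean activity, other columns taken from the investor's first row). It is written in plain Python only to show the reshaping step; the rows are the six from the question, and collapse_by_investor, mean_activity and n_years are names made up for this illustration.

```python
from collections import defaultdict

# The six sample rows from the question.
rows = [
    {"investor": "123 IM", "year": 2002, "activity": 4.45, "country": "France",
     "region": "europe", "strategy": "VC", "signatory": 1},
    {"investor": "123 IM", "year": 2003, "activity": 3.2, "country": "France",
     "region": "europe", "strategy": "VC", "signatory": 1},
    {"investor": "123 IM", "year": 2004, "activity": 7.8, "country": "France",
     "region": "europe", "strategy": "VC", "signatory": 1},
    {"investor": "Aegon", "year": 2005, "activity": 5.4, "country": "Netherlands",
     "region": "europe", "strategy": "BY", "signatory": 0},
    {"investor": "Aegon", "year": 2006, "activity": 4.2, "country": "Netherlands",
     "region": "europe", "strategy": "BY", "signatory": 0},
    {"investor": "Aegon", "year": 2007, "activity": 1.3, "country": "Netherlands",
     "region": "europe", "strategy": "BY", "signatory": 0},
]


def collapse_by_investor(panel_rows):
    """Return one record per investor, averaging `activity` over the years."""
    grouped = defaultdict(list)
    for row in panel_rows:
        grouped[row["investor"]].append(row)

    collapsed = []
    for investor, group in grouped.items():
        first = group[0]  # time-invariant columns are read from the first row
        collapsed.append({
            "investor": investor,
            "mean_activity": sum(r["activity"] for r in group) / len(group),
            "country": first["country"],
            "region": first["region"],
            "strategy": first["strategy"],
            "signatory": first["signatory"],
            "n_years": len(group),
        })
    return collapsed


investor_level = collapse_by_investor(rows)
for record in investor_level:
    print(record)
```

Whether collapsing is appropriate depends on the research question; if signing happens at a specific year and the predictors vary over time, a panel or survival-style model may fit better than a cross-sectional logit.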

Related

Grafana: Combination of variables (Athena Dataset)

Goal: I have an Athena dataset that is visualized with Grafana. I want to create several variables so that I can precisely select individual areas. The test data has a format similar to this one:
Time  SensorID  Location  Measurement
/     1         Berlin    12.1
/     2         London    14.0
/     3         NewYork   23.3
/     3         Sydney    45.1
/     2         London    1.3
/     1         NewYork   17.3
/     2         Berlin    18.9
/     3         Sydney    4.8
I now want two variables with which I can select the SensorID and the Location at the same time. For example, if I select SensorID = 1 and Location = Berlin, the Measurement in my Grafana graph should be 12.1.
Is there a way to do this? The syntax of the Athena plugin is very new to me, even though it is similar to MySQL. I tried to write the queries, but they won't work for me (see the pictures below):
Creation of the first variable
Creation of the panel function for the different variables
I look forward to hearing about possible solutions or help with the Athena syntax :)

Is there a way to combine these variables in a way that makes sense?

Hello stack overflow community!
I am a sociology student working on a thesis project comparing home value appreciation and neighborhood racial composition over time.
I'm currently using two separate data sources and trying to combine them in a way that makes sense without aggregating anything.
The first data source is GIS data which has information on home sales in each year by home. The second is census data which has yearly estimates of racial composition by census tract. Both are in .csv formats.
My goal is to create a set of variables for each home row in the GIS data which represents the racial composition for the tract the home is in at the year it was sold (e.g. home 1 | 2010| $500,000 | Census tract 10 | 10% white).
I began doing this in Stata by filling in the values by hand, one tract and year at a time.
For example, if I'm looking at a home sold in 2010 in Census tract 10 and I find that this tract was 10% white in 2010, I would use something like
replace percentwhite = 10 if censustract == 10 & year == 2010
However, this seemed incredibly time consuming, as I'm using data that go back decades and a couple dozen Census tracts.
Does anyone have any suggestions on how I might do this smarter, not harder? The first thought I had was to aggregate the data by census tract and year, but was hoping to avoid that if possible. Thank you so much in advance for your help and have a terrific day and start to the new year!
It sounds like you can simply merge census data onto your GIS data. That will be much less painful than using -replace-. Here's an example:
*GIS data: information on home sales in each year by home
clear
input censustract house_id year house_value_k
10 100 2010 200
11 101 2020 500
11 102 1980 100
end
tempfile GIS_data
sa `GIS_data'
*census data: yearly estimates of racial composition by census tract
clear
input censustract year percentwhite
10 2010 20
10 2000 10
11 2010 25
11 2000 5
end
tempfile census_data
sa `census_data'
*easy method: merge the census data onto your GIS data
use `GIS_data', clear
mer m:1 censustract year using `census_data'
drop if _merge==2
list
*hard method: use -replace-
use `GIS_data', clear
gen percentwhite=.
replace percentwhite=20 if censustract==10 & year==2010
replace percentwhite=10 if censustract==10 & year==2000
replace percentwhite=25 if censustract==11 & year==2010
replace percentwhite=5 if censustract==11 & year==2000
list
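Both methods "work", but using -merge- is much easier and less prone to errors.
Note: I intentionally created the data sets so that the merge wouldn't be perfect. You will likely want to drop some of the observations in that case. In the code above I dropped the observations with _merge==2, i.e. census rows that have no matching home sale. Homes without a matching census row (_merge==1) are kept, with percentwhite missing.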
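To show what the m:1 merge is doing under the hood, here is a small plain-Python sketch of the same idea: build a lookup keyed by (censustract, year) from the census rows, then attach the matching values to every home sale. The toy values are the ones from the Stata example above; merge_many_to_one and the matched flag are names invented for this sketch, not part of Stata or any library.

```python
# GIS data: one row per home sale (same toy values as the Stata example).
gis_rows = [
    {"censustract": 10, "house_id": 100, "year": 2010, "house_value_k": 200},
    {"censustract": 11, "house_id": 101, "year": 2020, "house_value_k": 500},
    {"censustract": 11, "house_id": 102, "year": 1980, "house_value_k": 100},
]

# Census data: one row per tract and year.
census_rows = [
    {"censustract": 10, "year": 2010, "percentwhite": 20},
    {"censustract": 10, "year": 2000, "percentwhite": 10},
    {"censustract": 11, "year": 2010, "percentwhite": 25},
    {"censustract": 11, "year": 2000, "percentwhite": 5},
]


def merge_many_to_one(left_rows, right_rows, keys):
    """Attach right-hand columns to each left-hand row that shares `keys`.

    Left rows without a match are kept and get None for the right-hand
    columns, which mirrors keeping _merge==1 and dropping _merge==2.
    """
    lookup = {}
    for row in right_rows:
        key = tuple(row[k] for k in keys)
        if key in lookup:
            # "many to one" requires the key to be unique on the right side
            raise ValueError(f"duplicate key in right-hand data: {key}")
        lookup[key] = row

    extra_columns = [c for c in right_rows[0] if c not in keys]

    merged = []
    for row in left_rows:
        match = lookup.get(tuple(row[k] for k in keys))
        combined = dict(row)
        for column in extra_columns:
            combined[column] = match[column] if match is not None else None
        combined["matched"] = match is not None
        merged.append(combined)
    return merged


merged = merge_many_to_one(gis_rows, census_rows, keys=("censustract", "year"))
for row in merged:
    print(row)
```

Only the 2010 sale in tract 10 finds a census row here; the 2020 and 1980 sales stay in the result with percentwhite set to None, just as they keep a missing value after the Stata merge.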

Need help cycling through a webpage's links on selenium

I want to cycle through some links in this webpage, but I'm not sure how to go about it. The section I want to get the links from is this one:
Which is basically the footer. So for a brief description of what I'm trying to do, I want to scrape all of the links for the securities listed in the table and then cycle through the footer so I can change pages and scrape the links off of those as well:
So currently this is page 1 and I can scrape what I want, but I don't know how to proceed to page two and continue the process. I'll show you why in a second:
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import time
from requests import get
driver = webdriver.Firefox()
driver.get("http://www.nse.com.ng/market-data/trading-statistics/equities")
time.sleep(30)
html = driver.page_source
soup = bs(html,"html.parser")
driver.close()
Above is my code. Now if I wanted to find the footer section I'd do this:
foot = soup.find("ul",class_="pagination no-top-pad").find_all("a")
And it truly returns the list of footer/pages:
However, as you can see, it doesn't really use links. There's a simple "#" symbol under the href for each page. So I can't just get the links and cycle through them. This is my problem. How do you suggest I move from page 1 to page 2 and so on? Any help would be greatly appreciated.
The page loads its data from an external URL via JavaScript. You can use this example to load the pages into a dataframe:
import pandas as pd
url = 'http://www.nse.com.ng/REST/api/statistics/equities/?market=&sector=&orderby=&pageSize=10&pageNo={page}'
# request each page of 10 rows and print it
for page in range(0, 10):
    df = pd.read_json(url.format(page=page))
    print(df)
Prints:
...
$id Id Symbol ... Market Sector Company2
0 1 123 JULI ... ASeM SERVICES JULI [MRF]
1 2 96 LASACO ... Main Board FINANCIAL SERVICES LASACO
2 3 150 LAWUNION ... Main Board FINANCIAL SERVICES LAWUNION [DIP]
3 4 116 LEARNAFRCA ... Main Board SERVICES LEARNAFRCA
4 5 92 LINKASSURE ... Main Board FINANCIAL SERVICES LINKASSURE
5 6 48 LIVESTOCK ... Main Board AGRICULTURE LIVESTOCK
6 7 106 MANSARD ... Main Board FINANCIAL SERVICES MANSARD
7 8 141 MAYBAKER ... Main Board HEALTHCARE MAYBAKER
8 9 23 MBENEFIT ... Main Board FINANCIAL SERVICES MBENEFIT
9 10 47 MCNICHOLS ... ASeM CONSUMER GOODS MCNICHOLS
[10 rows x 16 columns]
$id Id Symbol ... Market Sector Company2
0 1 153 MEDVIEWAIR ... Main Board SERVICES MEDVIEWAIR [BMF]
1 2 146 MEYER ... Main Board INDUSTRIAL GOODS MEYER
2 3 89 MOBIL ... Main Board OIL AND GAS MOBIL
3 4 98 MORISON ... Main Board HEALTHCARE MORISON
4 5 18 MRS ... Main Board OIL AND GAS MRS
5 6 162 MTNN ... Premium Board ICT MTNN
6 7 148 MULTITREX ... Main Board CONSUMER GOODS MULTITREX [BMR]
7 8 49 MULTIVERSE ... Main Board NATURAL RESOURCES MULTIVERSE
8 9 143 NAHCO ... Main Board SERVICES NAHCO
9 10 158 NASCON ... Main Board CONSUMER GOODS NASCON
[10 rows x 16 columns]
...
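The loop above stops after a fixed number of pages. If you do not know how many pages there are, one option is to keep requesting pages until one comes back empty. The sketch below shows that pattern with the request hidden behind a function argument; fetch_all_pages, fake_fetch and the stand-in data are hypothetical, and the assumption that the endpoint returns an empty list past the last page has not been verified against the real site.

```python
def fetch_all_pages(fetch_page, first_page=0, max_pages=1000):
    """Collect rows page by page until a page comes back empty.

    `fetch_page(page_number)` must return a list of rows. `max_pages` is a
    safety limit so a misbehaving endpoint cannot cause an endless loop.
    """
    all_rows = []
    for page in range(first_page, first_page + max_pages):
        rows = fetch_page(page)
        if not rows:  # assumed signal that we ran past the last page
            break
        all_rows.extend(rows)
    return all_rows


# Stand-in for the real HTTP request, so the sketch runs offline.
fake_pages = {
    0: [{"Symbol": "JULI"}, {"Symbol": "LASACO"}],
    1: [{"Symbol": "MEDVIEWAIR"}, {"Symbol": "MEYER"}],
    2: [{"Symbol": "MOBIL"}],
}


def fake_fetch(page):
    return fake_pages.get(page, [])


rows = fetch_all_pages(fake_fetch)
print([row["Symbol"] for row in rows])
```

With pandas, fetch_page could be something like lambda page: pd.read_json(url.format(page=page)).to_dict("records"), using the url template from the answer.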

r Google News Results Links

I am new to getting information from the web into R, but I found this nice code ("How to get google search results") that shows how to get links from an ordinary Google search into R.
I need to get this method running for the Google News search.
I know I have to change the URL by adding something like "&source=lnms&tbm=nws".
The URL I construct leads me to the right news result page if I copy and paste it from R to my browser, so far so good.
I looked at the HTML code of the news result page and found that the information sits inside h3[@class='r dO0Ag'], but there is another node below it and I don't know how to code this part.
Would appreciate any help!
library(XML)
library(RCurl)
getGoogleURL <- function(search.term, domain = '.de', quotes=TRUE)
{
search.term <- gsub(' ', '%20', search.term)
if(quotes) search.term <- paste('%22', search.term, '%22', sep='')
#construct google news url
getGoogleURL <- paste('http://www.google', domain, '/search?q=',
search.term, sep='',"&source=lnms&tbm=nws")
return(getGoogleURL)
}
getGoogleLinks <- function(google.url) {
doc <- getURL(google.url, httpheader = c("User-Agent" = "R (2.10.0)"))
html <- htmlTreeParse(doc, useInternalNodes = TRUE, error=function(...){})
#?? Wrong part - gives error evaluating xpath expression ??
nodes <- getNodeSet(html, "//h3[@class='r dO0Ag']//a[@class='l lLrAF'//")
dirt_links=sapply(nodes, function(x) x <- xmlAttrs(x)[["href"]])
links <- gsub('/url\\?q=','',sapply(strsplit(dirt_links[as.vector(grep('url',dirt_links))],split='&'),'[',1))
return(links)
}
search.term <- "China"
quotes <- "TRUE"
search.url <- getGoogleURL(search.term=search.term, quotes=quotes)
links <- getGoogleLinks(search.url)
You have a number of options here.
Either RCurl or RSelenium will work.
The key point is to generate the correct URL:
> library(XML)
> library(RCurl)
> search.term <- "china"
> quotes=FALSE
> start=0
> getGoogleURL <- paste('http://www.google.com',
+ '/search?hl=en&gl=kr&tbm=nws&authuser=0&q=',
+ search.term, "&start=",start,sep='')
> getGoogleURL
[1] "http://www.google.com/search?hl=en&gl=kr&tbm=nws&authuser=0&q=china&start=0"
>
At this point you can fetch the URL, build the HTML parse tree and extract the node data. The start parameter counts from zero and sets the offset of the first result returned, so you can use it to move through the results; in the runs below, start=1 shifts the list by one result.
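As a language-neutral illustration of the URL construction, here is a minimal Python sketch that builds the same query string with the standard library, so the escaping of spaces and quotes is handled by urlencode instead of by hand. The parameter values are copied from the URL above; google_news_url is a made-up helper name, and nothing here fetches the page.

```python
from urllib.parse import quote, urlencode


def google_news_url(search_term, start=0, quotes=False):
    """Build a Google News search URL like the one shown above."""
    if quotes:
        # wrap the term in double quotes to search for the exact phrase
        search_term = '"' + search_term + '"'
    params = {
        "hl": "en",
        "gl": "kr",
        "tbm": "nws",       # restricts the search to news results
        "authuser": 0,
        "q": search_term,
        "start": start,     # offset of the first result, counting from zero
    }
    # quote_via=quote encodes spaces as %20, as the R code does
    return "http://www.google.com/search?" + urlencode(params, quote_via=quote)


first_url = google_news_url("china")
phrase_url = google_news_url("south china sea", start=10, quotes=True)
print(first_url)
print(phrase_url)
```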
Working Code Example:
library(XML)
library(RCurl)
getGoogleURL <- function(search.term, start=0, quotes=FALSE) {
search.term <- gsub(' ', '%20', search.term)
if(quotes) search.term <- paste('%22', search.term, '%22', sep='')
getGoogleURL <- paste('http://www.google.com',
'/search?hl=en&gl=kr&tbm=nws&authuser=0&q=',
search.term, "&start=",start,sep='')
getGoogleURL <- URLencode(getGoogleURL)
}
getGoogleNews <- function(search.term="China",
start=0,
quotes=FALSE ){
google.url <- getGoogleURL(search.term=search.term,
start, quotes=quotes)
print(google.url)
doc <- getURL(google.url,
httpheader = c("User-Agent" = "R(3.0.3)"))
html <- htmlTreeParse(doc, useInternalNodes = TRUE,
error=function(...){}, asText = TRUE)
nodes <- getNodeSet(html, "//*/h3/a[@href]")
title <- sapply(nodes, function(x) x <- xmlValue(x))
url <- unname(sapply(nodes, function(x) x <- xmlAttrs(x)))
url <- gsub("\\/url\\?q=", "", url)
nodes <- getNodeSet(html, "//div[@class='slp']")
source <- sapply(nodes, function(x) x <- xmlValue(x))
nodes <- getNodeSet(html, "//div[@class='st']")
summary <- sapply(nodes, function(x) x <- xmlValue(x))
data.frame(title=title, source=source, url=url, summary=summary)
}
getGoogleNews("China")
getGoogleNews("China", 1)
getGoogleNews("China", 2)
Runtime:
> library(XML)
> library(RCurl)
> getGoogleURL <- function(search.term, start=0, quotes=FALSE) {
+ search.term <- gsub(' ', '%20', search.term)
+ if(quotes) search.term <- paste( .... [TRUNCATED]
> getGoogleNews <- function(search.term="China",
+ start=0,
+ quotes=FALSE ){
+ google.url <- ge .... [TRUNCATED]
> getGoogleNews("China")
[1] "http://www.google.com/search?hl=en&gl=kr&tbm=nws&authuser=0&q=China&start=0"
title
1 Taiwan says China is 'out of control' as it loses El Salvador to Beijing
2 China central bank official rebuts Trump's claim it is manipulating the ...
3 Airbnb Wants to Find a Home in China
4 China's biggest risk may be its property market — not the trade war
5 Malaysia has axed $22 billion of Chinese-backed projects, in a blow ...
6 China reaches 800 million internet users
7 China DEFIES Trump to buy nearly ALL oil imports from Iran despite ...
8 7 Signs that China's Military is Becoming More Dangerous
9 Asia markets trade mostly higher as investors look ahead to US ...
10 Can China, the world's biggest pork producer, contain a fatal pig ...
source
1 CNBC - 17 hours ago
2 CNBC - 10 hours ago
3 WIRED - 13 hours ago
4 CNBC - 23 hours ago
5 Business Insider - 11 hours ago
6 TechCrunch - 10 hours ago
7 Express.co.uk - 12 hours ago
8 The National Interest Online (blog) - 16 hours ago
9 CNBC - 17 hours ago
10 Science Magazine - 5 hours ago
url
1 https://www.cnbc.com/2018/08/21/taiwan-says-china-out-of-control-as-it-loses-el-salvador-to-beijing.html&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIIFCgAMAA&usg=AOvVaw2cSTmS65-6IvKQV9xrl3y3
2 https://www.cnbc.com/2018/08/21/china-official-refutes-trumps-claim-it-is-manipulating-the-yuan.html&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIIHSgAMAE&usg=AOvVaw2q7yr2oBWHib3bRAVmOna-
3 https://www.wired.com/story/airbnb-china-market/&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIIJigAMAI&usg=AOvVaw2a2LSkYlosnwTFRCvjmUhm
4 https://www.cnbc.com/2018/08/21/china-economy-biggest-risk-may-be-property-market-not-trade-war.html&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIIKSgAMAM&usg=AOvVaw1bUY5Ii7AlWURDifpeozJU
5 https://www.businessinsider.com/malaysia-axes-22-billion-of-belt-and-road-projects-blow-to-china-2018-8&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIILCgAMAQ&usg=AOvVaw0yGdVilstHZVBBXEuuAbmu
6 https://techcrunch.com/2018/08/21/china-reaches-800-million-internet-users/&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIINSgAMAU&usg=AOvVaw0VYTngAb-OBUSYkxKs0ZKp
7 https://www.express.co.uk/news/world/1006297/Iran-oil-china-donald-trump-oil-prices-oil-price-us-iran-nuclear-deal-sanctions&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIIOCgAMAY&usg=AOvVaw3W5adCnWdzz71zvpgE1x6D
8 https://nationalinterest.org/blog/buzz/7-signs-chinas-military-becoming-more-dangerous-29352&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIIPigAMAc&usg=AOvVaw1k05lyvFRrx_FImDKIsZ61
9 https://www.cnbc.com/2018/08/21/asia-markets-us-china-trade-talks-in-focus.html&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIIQSgAMAg&usg=AOvVaw0YqzZPNbH9bawkv8qX8Bdm
10 http://www.sciencemag.org/news/2018/08/can-china-world-s-biggest-pork-producer-contain-fatal-pig-virus-scientists-fear-worst&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIIRCgAMAk&usg=AOvVaw1H0c03l4trLI3cbRRlnKJW
summary
1 Taiwan vowed on Tuesday to fight China's "increasingly out of control" behavior after Taipei lost another ally to Beijing when El Salvador ...
2 A senior official of China's central bank told a briefing on Tuesday that the yuan's exchange rate is set by the market, rebutting President Donald ...
3 China is littered with the virtual carcasses of startups that attempted to do business in the country and then gave up or were shut out.
4 China's hot real estate market remains a challenge for authorities trying to maintain stable economic growth in the face of trade tensions with ...
5 The projects were a $20 billion rail link and two gas pipelines worth $2.3 billion. All three were part of China's Belt and Road Initiative (BRI), a massive project ...
6 A new report [in Chinese] issued by the China Internet Network Information Center (CNNIC) put the number of people in China with access to ...
7 China is Iran's biggest oil customer and the shift shows the communist nation wants to keep buying Iranian crude oil despite US sanctions ...
8 Western media seized on a new Pentagon report that Chinese bombers are training to strike deep into the Western Pacific, including Guam, the ...
9 Chinese markets led gains on Tuesday in a mostly positive trading session across Asia, extending their upward climb from the previous day ...
10 As of today, ASF has been reported at sites in four provinces in China's northeast, thousands of kilometers apart. Containing the disease in a ...
> getGoogleNews("China", 1)
[1] "http://www.google.com/search?hl=en&gl=kr&tbm=nws&authuser=0&q=China&start=1"
title
1 China central bank official rebuts Trump's claim it is manipulating the ...
2 Airbnb Wants to Find a Home in China
3 China's biggest risk may be its property market — not the trade war
4 Malaysia has axed $22 billion of Chinese-backed projects, in a blow ...
5 China reaches 800 million internet users
6 China DEFIES Trump to buy nearly ALL oil imports from Iran despite ...
7 7 Signs that China's Military is Becoming More Dangerous
8 Asia markets trade mostly higher as investors look ahead to US ...
9 Can China, the world's biggest pork producer, contain a fatal pig ...
10 How China, India and the US use healthcare aid to win influence in ...
source
1 CNBC - 10 hours ago
2 WIRED - 13 hours ago
3 CNBC - 23 hours ago
4 Business Insider - 11 hours ago
5 TechCrunch - 10 hours ago
6 Express.co.uk - 12 hours ago
7 The National Interest Online (blog) - 16 hours ago
8 CNBC - 17 hours ago
9 Science Magazine - 5 hours ago
10 ABC News - 5 hours ago
url
1 https://www.cnbc.com/2018/08/21/china-official-refutes-trumps-claim-it-is-manipulating-the-yuan.html&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAggUKAAwAA&usg=AOvVaw1Muu65XvSSWVKX06-5syLY
2 https://www.wired.com/story/airbnb-china-market/&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAggdKAAwAQ&usg=AOvVaw0Py7bJDY3tIj4KxgwYot1A
3 https://www.cnbc.com/2018/08/21/china-economy-biggest-risk-may-be-property-market-not-trade-war.html&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAgggKAAwAg&usg=AOvVaw2EHMCQvFQV9ubu17ERCZFO
4 https://www.businessinsider.com/malaysia-axes-22-billion-of-belt-and-road-projects-blow-to-china-2018-8&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAggjKAAwAw&usg=AOvVaw1sMhG0tyUnj8j2W02gD3aW
5 https://techcrunch.com/2018/08/21/china-reaches-800-million-internet-users/&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAggsKAAwBA&usg=AOvVaw1ODs1JY8V_ETi24ugz-yNn
6 https://www.express.co.uk/news/world/1006297/Iran-oil-china-donald-trump-oil-prices-oil-price-us-iran-nuclear-deal-sanctions&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAggvKAAwBQ&usg=AOvVaw0r0HQNfZhEwfbiEocUC74Z
7 https://nationalinterest.org/blog/buzz/7-signs-chinas-military-becoming-more-dangerous-29352&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAgg1KAAwBg&usg=AOvVaw2hpQQXrAm2HW158II7F1kG
8 https://www.cnbc.com/2018/08/21/asia-markets-us-china-trade-talks-in-focus.html&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAgg4KAAwBw&usg=AOvVaw2surM3fW-lLJDd9P-r7xJB
9 http://www.sciencemag.org/news/2018/08/can-china-world-s-biggest-pork-producer-contain-fatal-pig-virus-scientists-fear-worst&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAgg7KAAwCA&usg=AOvVaw3Lzvks6B0Un4IEgoMh86re
10 http://www.abc.net.au/news/2018-08-22/china-india-us-medical-diplomacy-in-the-pacific/10147632&sa=U&ved=0ahUKEwjakZ6At__cAhXjllQKHZEQA9E4ARCpAgg-KAAwCQ&usg=AOvVaw1Ogg8I6mUvDSCc9F90Usg4
summary
1 A senior official of China's central bank told a briefing on Tuesday that the yuan's exchange rate is set by the market, rebutting President Donald ...
2 China is littered with the virtual carcasses of startups that attempted to do business in the country and then gave up or were shut out.
3 China's hot real estate market remains a challenge for authorities trying to maintain stable economic growth in the face of trade tensions with ...
4 The projects were a $20 billion rail link and two gas pipelines worth $2.3 billion. All three were part of China's Belt and Road Initiative (BRI), a massive project ...
5 A new report [in Chinese] issued by the China Internet Network Information Center (CNNIC) put the number of people in China with access to ...
6 China is Iran's biggest oil customer and the shift shows the communist nation wants to keep buying Iranian crude oil despite US sanctions ...
7 Western media seized on a new Pentagon report that Chinese bombers are training to strike deep into the Western Pacific, including Guam, the ...
8 Chinese markets led gains on Tuesday in a mostly positive trading session across Asia, extending their upward climb from the previous day ...
9 As of today, ASF has been reported at sites in four provinces in China's northeast, thousands of kilometers apart. Containing the disease in a ...
10 China's 10,000-ton medical ship, the Peace Ark, has cut a broad arc through the Pacific, stopping off in Papua New Guinea, Vanuatu and Fiji ...
> getGoogleNews("China", 2)
[1] "http://www.google.com/search?hl=en&gl=kr&tbm=nws&authuser=0&q=China&start=2"
title
1 Airbnb Wants to Find a Home in China
2 China's biggest risk may be its property market — not the trade war
3 Malaysia has axed $22 billion of Chinese-backed projects, in a blow ...
4 China reaches 800 million internet users
5 China DEFIES Trump to buy nearly ALL oil imports from Iran despite ...
6 7 Signs that China's Military is Becoming More Dangerous
7 Asia markets trade mostly higher as investors look ahead to US ...
8 Can China, the world's biggest pork producer, contain a fatal pig ...
9 How China, India and the US use healthcare aid to win influence in ...
10 China Is Leading in Artificial Intelligence--and American Businesses ...
source
1 WIRED - 13 hours ago
2 CNBC - 23 hours ago
3 Business Insider - 11 hours ago
4 TechCrunch - 10 hours ago
5 Express.co.uk - 12 hours ago
6 The National Interest Online (blog) - 16 hours ago
7 CNBC - 17 hours ago
8 Science Magazine - 5 hours ago
9 ABC News - 5 hours ago
10 Inc.com - 16 hours ago
url
1 https://www.wired.com/story/airbnb-china-market/&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAggUKAAwAA&usg=AOvVaw3M4FbZ71J-NVKHn3fHvYwZ
2 https://www.cnbc.com/2018/08/21/china-economy-biggest-risk-may-be-property-market-not-trade-war.html&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAggXKAAwAQ&usg=AOvVaw3vieYvDvTlRzYkWncLgQfu
3 https://www.businessinsider.com/malaysia-axes-22-billion-of-belt-and-road-projects-blow-to-china-2018-8&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAggaKAAwAg&usg=AOvVaw3JGNk2Lraivca0P1lS3CoY
4 https://techcrunch.com/2018/08/21/china-reaches-800-million-internet-users/&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAggjKAAwAw&usg=AOvVaw2j4-NkfK_fNl8McD6WJjPa
5 https://www.express.co.uk/news/world/1006297/Iran-oil-china-donald-trump-oil-prices-oil-price-us-iran-nuclear-deal-sanctions&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAggmKAAwBA&usg=AOvVaw0v1Lybg2SxcJoxVkP7sOx_
6 https://nationalinterest.org/blog/buzz/7-signs-chinas-military-becoming-more-dangerous-29352&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAggsKAAwBQ&usg=AOvVaw1B7Krdzgd3LQEJ4bwWSSFW
7 https://www.cnbc.com/2018/08/21/asia-markets-us-china-trade-talks-in-focus.html&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAggvKAAwBg&usg=AOvVaw0v734CDRel2Vpke9XVjLqA
8 http://www.sciencemag.org/news/2018/08/can-china-world-s-biggest-pork-producer-contain-fatal-pig-virus-scientists-fear-worst&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAggyKAAwBw&usg=AOvVaw1j6E7a1jk9JiIahN5pdmi7
9 http://www.abc.net.au/news/2018-08-22/china-india-us-medical-diplomacy-in-the-pacific/10147632&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAgg1KAAwCA&usg=AOvVaw2E0qGfLhOkKZWhh5-_Is54
10 https://www.inc.com/magazine/201809/amy-webb/china-artificial-intelligence.html&sa=U&ved=0ahUKEwi1y7KAt__cAhWpilQKHZQXBi04AhCpAgg4KAAwCQ&usg=AOvVaw1thfiF9hJWhz88BU8znvnD
summary
1 China is littered with the virtual carcasses of startups that attempted to do business in the country and then gave up or were shut out.
2 China's hot real estate market remains a challenge for authorities trying to maintain stable economic growth in the face of trade tensions with ...
3 The projects were a $20 billion rail link and two gas pipelines worth $2.3 billion. All three were part of China's Belt and Road Initiative (BRI), a massive project ...
4 A new report [in Chinese] issued by the China Internet Network Information Center (CNNIC) put the number of people in China with access to ...
5 China is Iran's biggest oil customer and the shift shows the communist nation wants to keep buying Iranian crude oil despite US sanctions ...
6 Western media seized on a new Pentagon report that Chinese bombers are training to strike deep into the Western Pacific, including Guam, the ...
7 Chinese markets led gains on Tuesday in a mostly positive trading session across Asia, extending their upward climb from the previous day ...
8 As of today, ASF has been reported at sites in four provinces in China's northeast, thousands of kilometers apart. Containing the disease in a ...
9 China's 10,000-ton medical ship, the Peace Ark, has cut a broad arc through the Pacific, stopping off in Papua New Guinea, Vanuatu and Fiji ...
10 Living in China in the early 2000s changed my perspective. I saw firsthand that the outside world's view--China was good at copying but bad at ...
>
Web Page Test of URL
Note: the result order may differ from what you see on the web page, for example for a logged-in user.
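The url column in the output above still carries Google's tracking parameters (&sa=U&ved=...&usg=...) after the real address. The code in the question handled this by splitting on '&'. A minimal Python sketch of the same cleanup follows; cutting at "&sa=" is an assumption based only on the output shown here, and clean_result_url is a hypothetical helper name.

```python
def clean_result_url(raw_href):
    """Strip Google's redirect prefix and trailing tracking parameters."""
    url = raw_href
    prefix = "/url?q="
    if url.startswith(prefix):
        url = url[len(prefix):]
    # Assumption: the tracking parameters always begin with "&sa=",
    # as they do in every row of the output above.
    marker = url.find("&sa=")
    if marker != -1:
        url = url[:marker]
    return url


raw = ("https://www.wired.com/story/airbnb-china-market/"
       "&sa=U&ved=0ahUKEwi28IGAt__cAhXCj1QKHb0rDPcQqQIIJigAMAI"
       "&usg=AOvVaw2a2LSkYlosnwTFRCvjmUhm")
cleaned = clean_result_url(raw)
print(cleaned)
```

The function also accepts the unprocessed form that starts with /url?q=, so it can be applied before or after the gsub step in the R code.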
Citation:
Jinseog Kim - Associate professor in the Department of Applied Statistics at Dongguk University. He received Ph.D of Statistics in 2003 in Department of Statistics at Seoul National University. His research interests are data mining related topics including machine learning, big data analytics, networked data analysis.
Presentation Link: http://datamining.dongguk.ac.kr/lectures/2016-2/bigdata/google.pdf

Creating a corpus out of texts stored in JSON files in R

I have several JSON files with texts grouped into date, body and title. As an example consider:
{"date": "December 31, 1990, Monday, Late Edition - Final", "body": "World stock markets begin 1991 facing the threat of a war in the Persian Gulf, recessions or economic slowdowns around the world, and dismal earnings -- the same factors that drove stock markets down sharply in 1990. Finally, there is the problem of the Soviet Union, the wild card in everyone's analysis. It is a country whose problems could send stock markets around the world reeling if something went seriously awry. With Russia about to implode, that just adds to the risk premium, said Mr. Dhar. LOAD-DATE: December 30, 1990 ", "title": "World Markets;"}
{"date": "December 30, 1992, Sunday, Late Edition - Final", "body": "DATELINE: CHICAGO Gleaming new tractors are becoming more familiar sights on America's farms. Sales and profits at the three leading United States tractor makers -- Deere & Company, the J.I. Case division of Tenneco Inc. and the Ford Motor Company's Ford New Holland division -- are all up, reflecting renewed agricultural prosperity after the near-depression of the early and mid-1980's. But the recovery in the tractor business, now in its third year, is fragile. Tractor makers hope to install computers that can digest this information, then automatically concentrate the application of costly fertilizer and chemicals on the most productive land. Within the next 15 years, that capability will be commonplace, predicted Mr. Ball. LOAD-DATE: December 30, 1990 ", "title": "All About/Tractors;"}
I have three different newspapers with separate files containing all the texts produced for the period 1989 - 2016. My ultimate goal is to combine all the texts into a single corpus. I have done it in Python using the pandas library and I am wondering if it could be done in R similarly. Here is my Python code with the loop:
import json
import pandas as pd

appended_data = []
for i in range(1989, 2017):  # 1989 through 2016 inclusive
    # each file holds one JSON object per line
    df0 = pd.DataFrame([json.loads(l) for l in open('NYT_%d.json' % i)])
    df1 = pd.DataFrame([json.loads(l) for l in open('USAT_%d.json' % i)])
    df2 = pd.DataFrame([json.loads(l) for l in open('WP_%d.json' % i)])
    appended_data.append(df0)
    appended_data.append(df1)
    appended_data.append(df2)
Use jsonlite::stream_in to read your files and jsonlite::rbind.pages to combine them.
There are many options in R to read JSON files and convert them to a data.frame/data.table.
Here is one using jsonlite and data.table:
library(data.table)
library(jsonlite)
res <- lapply(1989:2016,function(i){
ff <- c('NYT_%d.json','USAT_%d.json' ,'WP_%d.json')
list_files_paths <- sprintf(ff,i)
# the files hold one JSON object per line, so read them with stream_in rather than fromJSON
rbindlist(lapply(list_files_paths,function(p) stream_in(file(p), verbose=FALSE)))
})
Here res is a list of data.tables. If you want to combine them all into a single data.table:
rbindlist(res)
Use ndjson::stream_in to read them in faster and flatter than jsonlite::stream_in :-)