R Scraping html webpage using XML

I am trying to scrape this webpage using the following code.
library(XML)
url <- html("http://www.gallop.co.za/")
doc <- htmlParse(url)
lat <- xpathSApply(doc, path = "//p[@id=Racecards]", fun = xmlGetAttr, name = 'Racecards')
I looked at the webpage, and the table I want to scrape is the racecard table, primarily to get the links to where the racecard data is.
I used SelectorGadget, which returns the XPath as:
//*[(@id = "Racecards")]
However, when I run the R code, it returns an empty list. It feels like I'm getting the XPath wrong somehow. What is the correct way to return the table and also the links within the table?
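
For reference, an XPath attribute test is written with @id and a quoted value, and links are read from the href attribute of the a nodes. The following is a minimal sketch on an inline HTML string, not on the live site: the markup, the id placement, and the hrefs are assumptions made up for illustration. On a real page the same query can still come back empty if the table is filled in by a script after the page loads.

```r
library(XML)

# Inline stand-in for a page; the structure and hrefs are hypothetical.
page_src <- '<html><body>
  <div id="Racecards">
    <table>
      <tr><td><a href="/racecard/1">Race 1</a></td></tr>
      <tr><td><a href="/racecard/2">Race 2</a></td></tr>
    </table>
  </div>
</body></html>'

# asText = TRUE tells htmlParse the argument is markup, not a file or URL.
doc <- htmlParse(page_src, asText = TRUE)

# Select every <a> below the element whose id is "Racecards"
# and pull out its href attribute.
links <- xpathSApply(doc, "//*[@id='Racecards']//a", xmlGetAttr, "href")
print(links)
```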

It seems that the data is delivered as JSON and inserted into the HTML by JavaScript, so it is not present in the HTML that htmlParse receives. You can read it directly from the JSON source instead.
library(RCurl)
library(jsonlite)
p <- getURL("http://www.gallop.co.za/cache/horses.json")
fromJSON(p)
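
To show what fromJSON hands back, here is a small sketch that parses an inline JSON string rather than the live URL. The field names and values are hypothetical, since the structure of horses.json is not shown in this thread.

```r
library(jsonlite)

# Hypothetical payload; the real file may be shaped differently.
json_text <- '[{"horse": "Alpha", "age": 4}, {"horse": "Bravo", "age": 5}]'

# An array of objects with the same keys is simplified to a data frame.
horses <- fromJSON(json_text)
str(horses)
```

If the real file is nested, the result may be a list instead, and str() is a quick way to see where the records sit.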

Related

How to read in a table from an HTML website using XML [R]

Issue
I am trying to read in the table from a website, specifically this website: https://www.nba.com/stats/teams/traditional/?sort=W_PCT&dir=-1&Season=2004-05&SeasonType=Regular%20Season
Here is how I went about it:
library(XML)
url <- "https://www.nba.com/stats/teams/traditional/?sort=W_PCT&dir=-1&Season=2004-05&SeasonType=Regular%20Season"
nbadata <- readHTMLTable(url,header=T, which = 1, stringAsFactors = F)
I am getting an error message that I am not familiar with:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message:
XML content does not seem to be XML: 'Season'
Questions
Where do I go wrong in my coding? Is XML the correct approach?
What does the error message mean?
I want to be able to extract the table from the 04-05 season all the way through to the 20-21 season. (You can see that the website offers a filter at the top left of the table that allows you to filter through seasons.) Is there an efficient way to extract each table from each season?
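
For the season loop, one possible starting point is to generate the season labels and the matching URLs in base R. This sketch only builds the strings, using the URL pattern from the question. It does not fetch anything, and it assumes nothing about whether the table is present in the page's static HTML.

```r
# Season labels from "2004-05" through "2020-21".
start_years <- 2004:2020
seasons <- sprintf("%d-%02d", start_years, (start_years + 1) %% 100)

# One URL per season, following the pattern in the question.
urls <- paste0(
  "https://www.nba.com/stats/teams/traditional/?sort=W_PCT&dir=-1&Season=",
  seasons,
  "&SeasonType=Regular%20Season"
)

print(head(seasons, 3))
print(urls[1])
```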

How can I get specific data from the table using Rselenium?

I am trying to scrape a table that I believe is rendered using JavaScript. I want to get the data for indices (e.g., TSX). I would like to get the "Previous day data" for all indices. I am scraping the data using RSelenium, but it is unable to locate the element.
The following is my code for scraping the previous day data for the index called TSX:
library(RSelenium)
driver<- rsDriver(browser = "firefox")
remDr <- driver[["client"]]
remDr$navigate("http://bmgfunds.com/interactive-charts/")
elem <- remDr$findElement(using = "xpath", value = "//*[@id='indices-quotes']/table/tbody/tr[1]/td[2]")
To get the XPath, I inspected the element and copied the XPath by right-clicking in the pane.
I also tried using rvest.
library(rvest)
st_table <- read_html("http://bmgfunds.com/interactive-charts/")
table<-html_nodes(st_table, "tbody tr")
Unfortunately, I get zero elements: {xml_nodeset (0)}
Any suggestion or help will be appreciated. Thanks.

The table is within an iframe whose source is http://integration.nfusionsolutions.biz/client/bullionmanagementgroup/module/quotechartfull, so you can grab the table from there:
st_table <- read_html("http://integration.nfusionsolutions.biz/client/bullionmanagementgroup/module/quotechartfull")
(table <- html_table(st_table)[[3]])
This code grabs all the tables from the iframe URL with html_table and selects the one that you want (the third element of the list).
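
The same pattern can be tried without a network connection. This is a minimal sketch on an inline HTML string with three tables. The markup and the numbers are placeholders, not data from the real page.

```r
library(rvest)

# Inline stand-in for a page with several tables; contents are made up.
page_src <- '<html><body>
  <table><tr><th>A</th></tr><tr><td>1</td></tr></table>
  <table><tr><th>B</th></tr><tr><td>2</td></tr></table>
  <table>
    <tr><th>Index</th><th>Previous</th></tr>
    <tr><td>TSX</td><td>100.5</td></tr>
  </table>
</body></html>'

page <- read_html(page_src)

# html_table() on a whole document returns a list, one entry per <table>.
tables <- html_table(page)
print(length(tables))

# Pick the table by its position in the list.
quotes <- tables[[3]]
print(quotes)
```

Selecting by position is fragile if the page layout changes, so it is worth checking length(tables) and the column names before relying on a fixed index.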

Analysis of deviance table model output in HTML

I am trying to export the output of an 'Analysis of deviance table' in HTML format, so that it can be inserted into a word document.
I created a GLM model as follows:
newmod <- glm(cbind(Recaptured, predated) ~ Morph * Plant * Site, data = survival, family = binomial)
Running the following code gives me the output that I would like to export to HTML:
anova(newmod,test="Chisq")
I have tried the following code to create an HTML table using stargazer, but it doesn't seem to work:
anova_mod<-anova(newmod,test="Chisq")
stargazer(newmod, type="html", out = "anova_output.htm")
Is there a simple way of doing this in R? I have managed to export the summary statistics successfully, but what I really need is the Analysis of deviance table.

I believe you are looking for:
print(xtable(anova_mod), type = "html")
as indicated by this answer: Exporting R tables to HTML
Here is my full code for reproducing something similar to your question:
plant.df = PlantGrowth
plant.df$group = factor(plant.df$group,labels = c("Control", "Treatment 1", "Treatment 2"))
newmod = lm(weight ~ group, data = plant.df)
anova_mod=anova(newmod)
anova_mod
# install.packages("xtable")  # run once if xtable is not installed
library(xtable)
print(xtable(anova_mod), type = "html")
You can then paste the output into an HTML viewer such as https://htmledit.squarefree.com/ to see the resulting table.
Instead of printing it, you can write it to a file. I have not personally tested this part, but the second answer in this question should work for you: Save html result to a txt or html file
Note: You can also reference all parts of the anova_mod separately by adding a $ after it like anova_mod$Df.
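
For the file-writing step mentioned above, one option is the file argument of print() for xtable objects. This is a sketch under the same PlantGrowth setup as the reproduction code. The output path in tempdir() is an arbitrary choice for illustration.

```r
library(xtable)

# Same kind of model as in the reproduction code above.
newmod <- lm(weight ~ group, data = PlantGrowth)
anova_mod <- anova(newmod)

# Write the HTML table to a file instead of the console.
out_file <- file.path(tempdir(), "anova_output.html")
print(xtable(anova_mod), type = "html", file = out_file)

# Read it back to confirm that a <table> element was written.
html_lines <- readLines(out_file)
print(head(html_lines))
```

The resulting file can be opened in a browser, and the table can be copied from there into a Word document.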

How can I specify the table I want to extract from a URL into R when there are multiple tables on the webpage?

I'm trying to extract MLB player stats from Baseball Reference. I've navigated to the URL that houses this data and executed the following code in RStudio.
install.packages('rvest')
library(rvest)
url <- 'http://www.baseball-reference.com/leagues/MLB/2017-standard-batting.shtml#player_standard_batting::none'
webpage <- read_html(url)
b_table <- html_nodes(webpage, 'table')
b <- html_table(b_table)[[1]]
head(b)
This snippet of code, however, extracts the first table on the webpage, not the one that I need. I've tried using various pieces of the HTML to specify the correct table, but I can't figure it out.
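
One general way to pick a specific table is to select it by its id attribute with a CSS selector instead of by position. The sketch below runs on an inline HTML string. The ids and contents are hypothetical, and whether the target table on the real page is present in the static HTML has not been checked here.

```r
library(rvest)

# Inline stand-in with two tables; ids and values are made up.
page_src <- '<html><body>
  <table id="teams"><tr><th>Team</th></tr><tr><td>AAA</td></tr></table>
  <table id="batting">
    <tr><th>Name</th><th>HR</th></tr>
    <tr><td>Player One</td><td>10</td></tr>
  </table>
</body></html>'

page <- read_html(page_src)

# "table#batting" matches the <table> whose id is "batting".
b_node <- html_node(page, "table#batting")

# html_table() on a single node returns a single table.
b <- html_table(b_node)
print(b)
```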

How to display values from mysql database table into a html page using cherrypy?

How do I render values from a MySQL database table into my HTML page using CherryPy?
What I am actually trying to do is this: I have an HTML page, and I want to display values from the database in the fields next to each label.
I have searched a lot, and what I found is this:
@cherrypy.expose
def extract(self):
    cur = db.cursor()
    cur.execute('select count(*) from config')
    res = cur.fetchone()
    db.commit()
    cur.close()
    return "<html><body>Hello, you have %d records in your table</body></html>" % res
Instead of creating a new page in the return statement, I want these database values to be displayed in my existing HTML page, next to their labels.
How can I do that in Python using CherryPy?
test.html: this is the link to my HTML page, where I want to display values from the database table in the text boxes next to the labels.
How can I achieve this?
PS: I am a newbie to both Python and CherryPy; any help would be appreciated.

One solution is to use a template engine such as Jinja2 or Mako, as explained by @webKnjaZ in the comments.
A second solution:
Return JSON data from a Python function and make an AJAX call from the HTML page to get the data. Once you have the JSON data in the page, you can parse it and display it there.
