RSelenium: input value in text box - html

I'm looking to use RSelenium to input some gene names into an online repository that creates a functional annotation heatmap for said genes.
However, I'm struggling to work out how to input the gene list into the text box to generate the heatmap.
Here is an image of the text box and the html associated with it.
The code I have so far (see below) navigates to the appropriate page and selects the appropriate text box, but I can't work out how to input the text so that the genes are added to the list at the bottom of the html code, as if I were typing them in manually. Note that the genes you can see in the text box in the image were input manually.
##### Load driver and navigate to site #####
library(RSelenium)
driver <- rsDriver(browser=c("chrome"), chromever="80.0.3987.106")
remDr <- driver[["client"]]
remDr$navigate("http://solo.bmap.ucla.edu/shiny/webapp/")
## Select heatmap option
gene_toggle <- remDr$findElement(using = 'css', '[class="dropdown-toggle"]')
gene_toggle$clickElement()
input <- remDr$findElement(using = 'css', '[data-value="panel-Heatmap"]')
input$clickElement()
## Input gene list to text box - not working yet - can't get text to enter properly
gene_select <- remDr$findElement(using = 'css', '[class="selectize-input items not-full has-options has-items"]')
gene_select$clickElement()
##### NOTE I HAVE TRIED THESE OPTION BELOW ....
gene_select$sendKeysToElement("NEUROD1")
gene_select$sendKeysToElement(list("NEUROD1"))
gene_select$sendKeysToElement(list("NEUROD1", key = "enter"))
gene_select$sendKeysToElement(list("NEUROD6, NEUROD2"))
I feel like I'm almost there, but I'm unsure whether I'm selecting the wrong element or formatting the sendKeysToElement command incorrectly. I'm fairly new to RSelenium.
Any advice would be greatly appreciated.

You're almost there indeed, just need to select the <input> element inside your gene_select:
input <- gene_select$findChildElement(using = 'xpath', value = 'input')
input$sendKeysToElement(list("NEUROD2", key = "enter"))

Selenium, using find_element but end up with half the website

I finished the linked tutorial and tried to modify it to get something else from a different website. I am trying to get the margin table of HHI, but the website is coded in a strange way and I am quite confused.
I can find the anchor next to the text I want with the XPath //a[@name="HHI"]. Its parent is <font size="2"></font>, which contains the text I want, but there are many tags named exactly <font size="2"></font>, so I can't just use //font[@size="2"].
Using the full XPath prints out half of the website's content.
The full XPath:
/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr[3]/td/pre/font/table/tbody/tr/td[2]/pre/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font/font
Is there any way to select that particular font tag and print the text?
website:
https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm
Tutorial
https://www.youtube.com/watch?v=PXMJ6FS7llk&t=8740s&ab_channel=freeCodeCamp.org
from selenium import webdriver  # needed for webdriver.Chrome below
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import pandas as pd
# prepare it to automate
from datetime import datetime
import os
import sys
import csv
application_path = os.path.dirname(sys.executable) # export the result to the same folder as the executable
now = datetime.now() # used to add a date to the export file name
month_day_year = now.strftime("%m%d%Y") # MMDDYYYY
website = "https://www.hkex.com.hk/eng/market/rm/rm_dcrm/riskdata/margin_hkcc/merte_hkcc.htm"
path = "C:/Users/User/PycharmProjects/Automate with Python – Full Course for Beginners/venv/Scripts/chromedriver.exe"
# headless-mode
options = Options()
options.add_argument("--headless")  # the options.headless attribute is deprecated in recent Selenium 4 releases
service = Service(executable_path=path)
driver = webdriver.Chrome(service=service, options=options)
driver.get(website)
containers = driver.find_element(by="xpath", value='') # or find_elements
hhi = containers.text # if using find_elements, = containers[0].text
print(hhi)
Update:
Thanks to Conal Tuohy, I learned a few new tricks in XPath. The website is written in such a strange way that even with an XPath that locates the exact font tag, the result still includes all the text in every following tag.
I split the text into a list of products with .split("Back to Top"), take the first item, and split it into lines with .split("\n"). I will keep splitting the nested lists until the data fits neatly into a dataframe with strike prices as the index and maturity dates as the columns.
It's probably not the most efficient way, but it works for now.
product = "HHI"
containers = driver.find_element(by="xpath", value=f'//font[a/@name="{product}"]')
hhi = containers.text.split("Back to Top")
# print(hhi)
hhi1 = hhi[0].split("\n")
df = pd.DataFrame(hhi1)
# print(df)
df.to_csv(f"{product}_{month_day_year}.csv")
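The splitting step described above can be sketched without a browser. This is only a minimal illustration: SAMPLE_TEXT is made-up stand-in data (not the real HKEX table), text_to_rows is a hypothetical helper name, and the sketch assumes the columns are separated by whitespace.

```python
import csv
import io

# Made-up stand-in for the element text; the real page layout may differ.
SAMPLE_TEXT = """HHI
Strike  Jan  Feb
9000    1.0  2.0
9200    1.5  2.5
Back to Top
HSI
Strike  Jan  Feb
"""


def text_to_rows(text, marker="Back to Top"):
    """Keep only the block before the first marker and split it into fields."""
    first_block = text.split(marker)[0]
    # Skip blank lines; split each remaining line on runs of whitespace.
    return [line.split() for line in first_block.splitlines() if line.strip()]


rows = text_to_rows(SAMPLE_TEXT)
header, body = rows[1], rows[2:]  # rows[0] holds only the product name

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(body)
csv_text = buffer.getvalue()
print(csv_text)
```

Splitting each line into fields first means the rows can be handed to pd.DataFrame(body, columns=header) later, rather than ending up as one long string per row.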
You're right, that HTML is just awful! But if you're after the text of the table, it seems to me you ought to select the text node that follows the b element that follows the a[@name="HHI"]; something like this:
//a[@name="HHI"]/following-sibling::b/following-sibling::text()[1]
EDIT
Of course, that XPath won't work in Selenium because it identifies a text node rather than an element. So your best option is to return the font element that directly contains the //a[@name="HHI"], which will include some cruft (the Back to Top link, etc.) but which will at least contain the tabular data you want:
//a[@name="HHI"]/parent::font
i.e. "the parent font element of the a element whose name attribute equals HHI"
or equivalently:
//font[a/@name="HHI"]
i.e. "the font element which has, among its child a elements, one whose name attribute equals HHI"
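To see the parent-selection idea without a browser, here is a small sketch using Python's standard xml.etree.ElementTree. The markup in SAMPLE is a simplified, well-formed stand-in that I made up (the real page nests its font tags much more deeply), and ElementTree only supports a subset of XPath, so the parent step is written as `/..` rather than `/parent::font`.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the page structure (an assumption, not the real HTML).
SAMPLE = """
<pre>
  <font size="2"><a name="HSI"></a><b>HSI</b>
row for HSI
<a href="#top">Back to Top</a></font>
  <font size="2"><a name="HHI"></a><b>HHI</b>
Strike  Jan  Feb
9000    1.0  2.0
<a href="#top">Back to Top</a></font>
</pre>
"""

root = ET.fromstring(SAMPLE)

# Find the anchor named HHI anywhere in the tree, then step up to its parent.
matches = root.findall(".//a[@name='HHI']/..")
font = matches[0]

# itertext() walks all descendant text, much like Selenium's .text would.
full_text = "".join(font.itertext())

# Drop the trailing cruft by cutting at the "Back to Top" link text.
table_text = full_text.split("Back to Top")[0]
print(table_text)
```

Both fonts have size="2", but only the one containing the HHI anchor is returned, which is the point of anchoring the XPath on the a element.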

Rselenium select dropdown menu

Hi, I am trying to use RSelenium to select an option from a dropdown menu.
The field I want to click to open the dropdown menu is Date Range, so I looked it up in the html code (see picture below) and found
class="select2-choice"
to be the selector, so I ran this command to click on the dropdown menu
webElem <- rd$client$findElement(using = 'xpath',
value = '//*[@class="select2-choice"]')
webElem$clickElement()
Then I want to select "Custom" in the dropdown field so I look up in the html code (see picture below) and found it is under
select id="namedRange-3640"
and the option is
value="custom"
so I tried another RSelenium command to click on this custom option
webElem <- rd$client$findElement(using = 'xpath', "//select[@id='namedRange-3640']/option[@value='custom']")
webElem$clickElement()
However, nothing happens on the webpage, and the code gives no warning either. I tried the same approach on a webpage with a much simpler structure, like the W3C tutorial on dropdown menus, and it works. This case seems to be more complicated, with something called ng-repeat that I have not come across before. Does anyone know how to select the custom option?
Many thanks
This could be the solution.
library(RSelenium)
remDr <- remoteDriver(browser=c("firefox"), port = 4445)
remDr$open()
remDr$navigate("your_web_site.com")
frame_ws<- remDr$findElement(using='id', value="iframeResult")
remDr$switchToFrame(frame_ws)
# You can replace "today" with any other option value in the list
option <- remDr$findElement(using = 'xpath', "//*/option[@value = 'today']")
option$clickElement()
If you want to go deeper into the topic, see the linked page.

Can't access specific content in html page with rvest and selectorGadget

I'm trying to scrape an NCBI page (https://www.ncbi.nlm.nih.gov/protein/29436380) to obtain information about a protein. I need to access the gene_synonyms and GeneID fields. I have tried to find the relevant nodes with the SelectorGadget add-on in Chrome and with the code inspector in Firefox. I have tried this code:
require("dplyr")
require("rvest")
require("stringr")
GIwebPage <- read_html("https://www.ncbi.nlm.nih.gov/protein/29436380")
TestHTML <- GIwebPage %>% html_node("div.grid , div#maincontent.col.nine_col , div.sequence , pre.genebank , .feature") %>% html_text(trim = TRUE)
Then I try to find the relevant text but it is simply not there.
str_extract_all(TestHTML, pattern = "(synonym).{30}")
[[1]]
character(0)
str_extract_all(TestHTML, pattern = "(GeneID:).{30}")
[[1]]
character(0)
All I seem to be accessing is some of the text content of the column on the right.
str_extract_all(TestHTML, pattern = "(protein).{30}")
[[1]]
[1] "protein codes including ambiguities a"
[2] "protein sequence for myosin-9 (NP_00"
[3] "protein should not be confused with t"
[4] "protein, partial [Homo sapiens]gi|294"
[5] "protein codes including ambiguities a"
I have tried so many combinations of node selections with html_node() that I no longer know what to try. Is this content buried in some structure I can't see, or am I just not skilled enough to work out which node to select?
Thanks a lot,
José.
The page loads this information dynamically. The underlying data is stored at another location.
Using the developer tools in your browser, look for the link:
The information you are looking for is served by "viewer.fcgi"; right-click the request to copy the link.
See similar question/answers: R not accepting xpath query
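Once you have the text returned by that second request, pulling the two fields out is a pattern-matching job. Below is a rough sketch of that step only; SAMPLE_RECORD is a made-up fragment in the style of a GenBank flat file (the gene name, synonyms and ID are placeholders, not the real values for this protein), and extract_fields is a hypothetical helper name.

```python
import re

# Made-up fragment in GenBank flat-file style; all values are placeholders.
SAMPLE_RECORD = '''
     CDS             1..120
                     /gene="GENE1"
                     /gene_synonym="ABC1; XYZ2"
                     /db_xref="GeneID:12345"
'''


def extract_fields(record_text):
    """Return (list of gene synonyms, GeneID string or None)."""
    synonyms_match = re.search(r'/gene_synonym="([^"]*)"', record_text)
    gene_id_match = re.search(r"GeneID:(\d+)", record_text)

    synonyms = []
    if synonyms_match:
        # Qualifier values can wrap across lines, so collapse whitespace first.
        flat = " ".join(synonyms_match.group(1).split())
        synonyms = [s.strip() for s in flat.split(";") if s.strip()]

    gene_id = gene_id_match.group(1) if gene_id_match else None
    return synonyms, gene_id


synonyms, gene_id = extract_fields(SAMPLE_RECORD)
print(synonyms, gene_id)
```

The same two patterns translate directly to str_match() in stringr if you prefer to stay in R.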

Flextable : using superscript in the dataframe

This question has been asked a few times, but surprisingly, no answer has been given.
I want some numbers in my dataframe to appear in superscript.
The functions compose and display are not suitable here since I don't know in advance which values in my dataframe will appear in superscript (my tables are generated automatically).
I tried ^8^ (as for kable), $$10^-3$$, paste(expression(10^2)), "H\\textsubscript{123}", etc.
Nothing works! Help! I'm pulling my hair out...
library(flextable)
bab = data.frame(c( "10\\textsubscript{-3}",
paste(as.expression(10^-3)), '10%-3%', '10^-2^' ))
flextable(bab)
I am knitting from R to HTML.
In HTML, you do superscripts using things like <sup>-3</sup>, and subscripts using <sub>-3</sub>. However, if you put these into a cell in your table, you'll see the full text displayed, it won't be interpreted as HTML, because flextable escapes the angle brackets.
The kable() function has an argument escape = FALSE that can turn this off, but flextable doesn't: see https://github.com/davidgohel/flextable/issues/156. However, there's a hackish way to get around this limitation: replace the htmlEscape() function with a function that does nothing.
For example,
```{r}
library(flextable)
env <- parent.env(loadNamespace("flextable")) # The imports
unlockBinding("htmlEscape", env)
assign("htmlEscape", function(text, attribute = FALSE) text, envir=env)
lockBinding("htmlEscape", env)
bab = data.frame(x = "10<sup>-3</sup>")
flextable(bab)
```
This will display the cell as 10 with a superscript -3.
Be careful if you do this: there may be cases in your real tables where you really do want HTML escapes, and this code will disable that for the rest of the document. If you execute this code in an R session, it will disable escaping for the rest of the session.
And if you were thinking of using a document like this in a package you submit to CRAN, forget it. You shouldn't be messing with bindings like this in code that you expect other people to use.
Edited to add:
In fact, there's a way to do this without the hack given above. It's described in this article: https://davidgohel.github.io/flextable/articles/display.html#sugar-functions-for-complex-formatting. The idea is to replace the entries that need superscripts or subscripts with calls to as_paragraph, as_sup, as_sub, etc.:
```{r}
library(flextable)
bab <- data.frame(x = "dummy")
bab <- flextable(bab)
bab <- compose(bab, part = "body", i = 1, j = 1,
value = as_paragraph("10",
as_sup("-3")))
bab
```
This is definitely safer than the method I gave.

Rvest getting an specific text from html_node

I want to extract only "Beech Valley Solutions - "
When I run
html_nodes('li') %>%
html_nodes(".flexbox.empLoc") %>%
html_text()
All the information comes out. "Beech Valley Solutions - Atlanta, GA Today 24hr"
There is another way to scrape with rvest.
Instead of passing a CSS selector to html_nodes(), you can pass an XPath. Just an example below:
page %>% html_nodes(xpath = "//*[@id='series-matches']/div[20]/div[3]/div[1]/a[1]/span")
Reference:
https://blog.rstudio.com/2014/11/24/rvest-easy-web-scraping-with-r/
An XPath is easy to obtain:
1. Right-click the section whose XPath you want.
2. Select "Inspect" from the drop-down menu.
3. The html panel will appear on the right side; right-click the highlighted node there and choose the Copy option.
4. A submenu will appear, from which select "Copy XPath".
5. Paste (Ctrl+V) the XPath into html_nodes(xpath = "xpath here"). I hope this helps.
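If the selector still returns the whole string, the company part can also be cut out afterwards with plain string handling. A tiny sketch, assuming the first " - " in the text always separates the company name from the location (that is a guess based on the single example in the question):

```python
# Example text copied from the question.
text = "Beech Valley Solutions - Atlanta, GA Today 24hr"

# partition() splits at the first occurrence of the separator only,
# so later dashes in the location or date are left alone.
name, separator, rest = text.partition(" - ")

# Re-attach the separator because the question wants the trailing " - " kept.
company = name + separator
print(repr(company))
```

In R the equivalent would be a regular expression such as sub("(.*? - ).*", "\\1", x) applied to the html_text() result.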