I'm trying to grab a table from the following webpage
http://www.bloomberg.com/markets/companies/country/hong-kong/
I have some sample code which was kindly provided by Phil Bozak here:
grabbing table from html using Google script
which grabs the table for this website:
http://www.airchina.com.cn/www/en/html/index/ir/traffic/
As you can see from Phil's code, there are a lot of getElement() calls. Looking at the HTML for the Air China website, the table appears to be nested four levels deep. Is that why there is a string of .getElement() calls?
Now I look at the source code for the Bloomberg page, and it is loaded with "div" elements...
The question is: can someone show me how to grab the table from the Bloomberg page?
A brief explanation of the theory would also be useful. Thanks a bunch.
Let's flip your question upside down, and start with the theory. Methodology might be a better word for it.
You want to get at something specific in a structured page. To do that, you either need a way to zap right to the element (which can be done if it's labeled in a unique way that we can access), OR you need to navigate the structure more-or-less manually. You already know how to look at the source of a page, so you're familiar with this step. Here's a screenshot of Firefox Inspector, highlighting the element we're interested in.
We can see the hierarchy of elements that lead to the table: html, body, div, div, div.ticker, table.ticker_data. We can also see the source:
<table class="ticker_data">
Neat! It's labeled! Unfortunately, that class info gets dropped when we process the HTML in our script. Bummer. If it was id="ticker_data" instead, we could use the getElementByVal() utility from this answer to reach it, and give ourselves some immunity from future restructuring of the page. Put a pin in that - we'll come back to it.
It can help to visualize this in the debugger. Here's a utility script for that - run it in debug mode, and you'll have your HTML document laid out to explore:
/**
 * Debug-run this in the editor to be able to explore the structure of web pages.
 *
 * Set target to the page you're interested in.
 */
function pageExplorer() {
  var target = "http://www.bloomberg.com/markets/companies/country/hong-kong/";
  var pageTxt = UrlFetchApp.fetch(target).getContentText();
  var pageDoc = Xml.parse(pageTxt, true);
  debugger; // Pause in debugger - explore pageDoc
}
This is what our page looks like in the debugger:
You might be wondering what the numbered elements are, since you don't see them in the source. When there are multiples of an element type at the same level in an XML document, the parser presents them as an array, numbered 0..n. Thus, when we see 0 under a div in the debugger, that's telling us that there are multiple <div> tags in the HTML source at that level, and we can access them as an array, for example .div[0].
Ok, theory behind us, let's go ahead and see how we can access the table by brute-force.
Knowing the hierarchy, including the div arrays shown in the debugger, we could do this, à la Phil's previous answer. I'll do some weird indenting to illustrate the document structure:
...
var target = "http://www.bloomberg.com/markets/companies/country/hong-kong/";
var pageTxt = UrlFetchApp.fetch(target).getContentText();
var pageDoc = Xml.parse(pageTxt,true);
var table = pageDoc.getElement()
              .getElement("body")
                .getElements("div")[0]        // 0-th div under body, shown in debugger
                  .getElements("div")[5]      // 5-th div under there
                    .getElement("div")        // another div
                      .getElement("table");   // finally, our table
As a much more compact alternative to all those .getElement() calls, we can navigate using dot notation.
var table = pageDoc.getElement().body.div[0].div[5].div.table;
And that's that.
Let's go back to that pinned idea. In the debugger, we can see that there are various attributes attached to elements. In particular, there's an "id" on that div[5] that contains the div that contains the table. Remember, in the source we saw "class" attributes, but note that they don't make it this far.
Still, the fact that a kindly programmer put this "id" in place means we can do this, with getDivById() from that earlier question:
var contentDiv = getDivById( pageDoc.getElement().body, 'content' );
var table = contentDiv.div.table;
If they move things around, we might still be able to find that table, without changing our code.
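The idea of searching by id instead of by position can be sketched in plain JavaScript. This is only an illustration, not the getDivById() utility itself: the page object below is a hypothetical hand-built stand-in for the parsed document (not the Apps Script Xml classes), and findById is an assumed helper name.

```javascript
// Hypothetical stand-in for a parsed page, built from plain objects.
const page = {
  tag: "body", attrs: {}, children: [
    { tag: "div", attrs: {}, children: [
      { tag: "div", attrs: { id: "header" }, children: [] },
      { tag: "div", attrs: { id: "content" }, children: [
        { tag: "div", attrs: {}, children: [
          { tag: "table", attrs: {}, children: [] }
        ] }
      ] }
    ] }
  ]
};

// Depth-first search for the first element with the given tag and id.
// Returns null when nothing matches.
function findById(node, tag, id) {
  if (node.tag === tag && node.attrs.id === id) return node;
  for (const child of node.children) {
    const hit = findById(child, tag, id);
    if (hit) return hit;
  }
  return null;
}

// Jump straight to the labeled div, then take the short, fixed path to the table.
const contentDiv = findById(page, "div", "content");
const table = contentDiv.children[0].children[0];
```

Because the search does not care how deep the labeled div sits, wrapping it in another container would not break the lookup; only the short path from the div to the table is still positional.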
You already know what to do once you have the table element, so we're done here!
Does anyone know of a way to have a central master template for Google Slides presentations that automatically cascades changes down to the presentations using it?
If not automatically, then maybe something can be done with Google Apps Script to pull changes from the master template down to the associated presentations?
Here is a simple example of what I am trying to do:
Create master template/theme (M1) with layout (L1) with two placeholders and a company logo
Create new presentation (P1) importing theme M1 above using Layout L1
Amend master theme M1 Layout L1 with new company logo or new placeholder
How do I get this change to propagate to P1 without manually importing the template/theme again? It would be ideal if P1 could subscribe to changes in M1, but I can't see any option for this, so I was wondering if I could script something.
Thanks in advance
Greg
This is not possible in Apps Script right now
There is a feature request for this in the Issue Tracker, go give it a ☆!
https://issuetracker.google.com/issues/129457735
Maybe go and explain your use case for it too.
Possible avenue for workaround
The best workaround I can think of is something along the lines of this script:
function copyStyling() {
  // This is a standalone script
  let masterID = "1107dQEIAbZ8ipBi0wvU6cdy4OV7N2hURT5fjgOwm_vY";
  let childID = "1XvGARRBzXofsjrFJkl8SCmt3tQJ2nkw1n9MG3tr9fhU";

  // Master Slide Variables
  let masterPresentation = SlidesApp.openById(masterID);
  let masterSlide = masterPresentation.getSlides()[0];
  let masterElements = masterSlide.getPageElements();

  // Get style elements
  let masterBackground = masterSlide.getBackground();
  let masterSolidFill = masterBackground.getSolidFill().getColor();
  // etc
  // ...

  // Child Slide Variables
  let childPresentation = SlidesApp.openById(childID);
  let childSlide = childPresentation.getSlides()[0];
  let childElements = childSlide.getPageElements();

  // Updating the stylings for the page
  let childBackground = childSlide.getBackground();
  childBackground.setSolidFill(masterSolidFill);
  // etc
  // ...

  // Updating the stylings for each element on the page
  masterElements.forEach((element, i) => {
    childElements[i].setLeft(element.getLeft());
    childElements[i].setTop(element.getTop());
    // etc
    // ...
  });
}
This script works if both the Master and Child presentations use the same theme (i.e. the master style sheets).
It works by having a single slide in a "Master presentation" which you modify; the Child presentations also have only a single slide.
It gets style info: the background of the slide (if it's a solid fill) and the top-left position of each element.
It then updates the child with this information.
How far you take this really depends on how many changes are going to happen to the child presentations. If no elements are going to change, and only limited style characteristics are going to change, then it shouldn't take too long to get a working script together. It would just involve going through the documentation and picking out the attributes you want to update.
If the number of elements is going to change, or their positions are going to be rearranged, with very different content from the placeholders, then it can get considerably more complex. Then it becomes a function of how many hours you can invest in it! Hopefully this serves as a good starting point for that.
Ideally, the script would also copy the width and height of each element along with the top and left position, plus their rotation, transformation, font, font color, font style, direction, and minimal support for shapes. With these things I believe you could have quite a powerful tool.
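As a rough sketch of extending the element loop to size as well as position, the helper below copies left, top, width and height, and stops at the shorter list so a missing child element does not throw. syncGeometry and stubElement are hypothetical names; the stubs only stand in for Slides page elements so the logic can run outside Apps Script. In a real script you would presumably pass masterSlide.getPageElements() and childSlide.getPageElements() instead.

```javascript
// Copies position and size from each master element to the child element
// at the same index. Returns how many elements were updated.
function syncGeometry(masterElements, childElements) {
  const count = Math.min(masterElements.length, childElements.length);
  for (let i = 0; i < count; i++) {
    const master = masterElements[i];
    const child = childElements[i];
    child.setLeft(master.getLeft());
    child.setTop(master.getTop());
    // Width and height may be unavailable for some elements, so skip nulls.
    const width = master.getWidth();
    const height = master.getHeight();
    if (width != null) child.setWidth(width);
    if (height != null) child.setHeight(height);
  }
  return count;
}

// Hypothetical stub exposing the same getter/setter names, for illustration only.
function stubElement(left, top, width, height) {
  const state = { left, top, width, height };
  return {
    getLeft: () => state.left,
    getTop: () => state.top,
    getWidth: () => state.width,
    getHeight: () => state.height,
    setLeft: (v) => { state.left = v; },
    setTop: (v) => { state.top = v; },
    setWidth: (v) => { state.width = v; },
    setHeight: (v) => { state.height = v; },
  };
}

const master = [stubElement(10, 20, 100, 50), stubElement(5, 5, 30, 30)];
const child = [stubElement(0, 0, 1, 1)]; // the child has one element fewer
const updated = syncGeometry(master, child);
```

Matching elements by index is the simplest possible pairing and assumes the child keeps its elements in the same order as the master; if that does not hold, some other key (such as a title or alt text) would be needed.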
Reference
Apps Script Slides Service
As a beginning, self-taught amateur programmer, I’m currently trying to get some things done with Google Fusion Tables.
I made a map with markers and got the HTML of that map. Now I wish to add a tooltip that appears on mouseover of a particular marker. I found a tutorial for this, but I can’t get the tooltips to work.
The following link shows the progress so far: http://jsbin.com/cipejicewo/1/watch?html,js,output
1 I don’t have to change anything in this script to fit the specific Fusion Table it is linked with, do I? If I do have to change the JavaScript, which specific elements do I have to rename?
2 How can I call google.maps.FusionTablesLayer.enableMapTips(options)? And where do I have to put the whole ‘function init’ code in the HTML file? Directly in the script described above? Of course, without losing the functions that the HTML already provides. Besides that, I understand that I have to change the tableid and the select column and geometry column names, but is there anything else I should change in the function I'm going to add?
I've been struggling with this for days now, and I'm out of options, so any help would be welcome. Thanks in advance!
It's not clear what your code currently looks like, but there are two things you'll need to do first:
When you use a FusionTable other than the one in the example, make sure that the table is public and downloadable.
You must use your own key (it's the variable apiConsoleKey in the example). Follow the steps in Acquiring and using an API key to get a valid key.
I need to position multiple (ultimately 4, but I am starting with two here) d3 graphs on one web page. Following this tutorial, I created two divs:
<div id="donut"></div>
<div id="line-graph"></div>
And then I appended the graphs to their respective divs like so:
var svg = d3.select("#line-graph").append("svg")
AND
var svg = d3.select("#donut").append("svg")
Yet, they are still on top of each other on the page. What am I missing?
I know there are other people who have had this problem, but a lot of those questions are either unanswered, or the answer did not solve my problem. You can see what I am talking about here.
Thanks in advance.
Both your scripts declare global variables named 'svg' and then reference this global variable in the callback after the file is loaded. If you inspect your graphs, you'll see that they are actually both on the same SVG element and the SVG element that the line graph should be on is empty.
You need to rename your variables in your second script so that they have different names than the variables in the first script.
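The collision can be reproduced without D3. In this minimal sketch, loadData and finishLoading are hypothetical stand-ins for an asynchronous file load whose callback runs later, and the strings stand in for the two SVG selections. Wrapping each chart in its own function is an alternative to renaming that has the same effect.

```javascript
// Hypothetical stand-in for an async load: callbacks run only once loading finishes.
const pending = [];
function loadData(callback) { pending.push(callback); }
function finishLoading() { pending.splice(0).forEach((cb) => cb()); }

// Problem: two "scripts" share one global named svg.
var svg;
const shared = [];
svg = "line-graph svg";            // first script
loadData(() => shared.push(svg));  // its callback reads svg later
svg = "donut svg";                 // second script overwrites the global
loadData(() => shared.push(svg));
finishLoading();
// Both callbacks saw the last value, so both charts land on the donut's svg.

// Fix: give each chart its own variable, here via function scope.
function drawChart(name, log) {
  const svg = name;                // local to this call, not shared
  loadData(() => log.push(svg));
}
const separate = [];
drawChart("line-graph svg", separate);
drawChart("donut svg", separate);
finishLoading();
// Each callback now draws on its own svg.
```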
Using PhantomJS V 1.8.1
Thanks in advance.
I am trying to run some tests on a website that I am developing which is using backbone.js.
One of my tests involves checking to see if a canvas is present and clicking on it. My problem is that whatever selector I use, I cannot get it to find the canvas element. I use the same CSS selector in Google Chrome when viewing the page and all is OK. At first I thought that the issue might be that the element was not present on the page, but other elements which are inserted with the canvas are present, so I am 99% sure that this is not the problem.
The selectors I have tried to use are:
document.querySelectorAll('#idOfCanvas');
document.querySelectorAll('canvas#idOfCanvas');
Also, if I use .classClassName:nth(1) to select the tyre selector, it still fails to work (it works in Google Chrome, though, as do the other examples provided).
The canvas has a class name which is picked up by the selector, but I would rather not use a class selector.
Any help would be much appreciated.
Cheers :)
Also
As I mentioned, I am almost certain that the canvas exists, as the container div for it exists. Also, I have four elements on the page with the same className (two of which are canvases), and four elements are returned when I run
return document.querySelectorAll('.className').length; // returns 4
Assuming you have something like this:
<canvas id="idOfCanvas"></canvas>
This should work:
canvas = document.getElementById("idOfCanvas");
// or
canvas = document.querySelector("#idOfCanvas"); // Only gets the first match; IDs should be unique anyway.
// or
canvas = document.querySelectorAll("#idOfCanvas")[0];
// or
canvas = document.getElementsByTagName("canvas")[0]; // Get the first <canvas> element.
However, you'll have to make sure your canvas element is actually loaded when the script is executed. Have a look at this onload tutorial, for example.
Try this :
canvas = document.querySelector("#idOfCanvas:nth-child(1)"); // a selector string needs querySelector; getElementById accepts only a bare id
I am developing a Windows Forms application using VB.NET that lets the user look up addresses on Google Maps through a WebBrowser control. I can also successfully show the directions between two points, and allow the user to drag the route as he/she pleases. My question now is: is it possible to get the latitude/longitude information of the route, i.e. the overview_polyline array of encoded latitude/longitude points, and save it to e.g. a text file on my computer? Or is it possible to get a list of all the addresses located on both sides of the route over its entire length, and then save that data to a file on my computer? I'm using HTML files to access and display the Google Maps data in the WebBrowser control.
Thank you
This is actually pretty simple if you're just looking for the screen coordinates.
// this probably should be in your form initialization
this.MouseClick += new MouseEventHandler(MouseClickEvent);
void MouseClickEvent(object sender, MouseEventArgs e)
{
// do whatever you need with e.Location
}
If you're strictly looking for the point in the browser, you need to consider the functions
browser.PointToClient();
browser.PointToScreen();
So, this method is usable if you know exactly where your form is (its coordinates are easy to get) and where your WebBrowser control is (also easy, since it's just a control in your form). Then, as long as you know how many pixels from the left or right and from the top or bottom the image is displayed, you can take the global mouse click coordinates (which are easy to get) and work out where the image was clicked.
Alternatively, there are some scarier or uglier ways to do it here...
You can use the ObjectForScripting property to embed code to do this in the WebBrowser. It's ugly, to say the least. MSDN has some documentation on the process here: http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.objectforscripting.aspx
Because it's really ugly, maybe a better solution is to use AxWebBrowser - it's ugly too, but not so scary.
In addition, I found this post from someone wanting to do it on a PDF document, with a MSFT person saying it's not possible. What he is really trying to say is that it isn't built in; even with a PDF document, it's still possible to predict with high accuracy where it was clicked if you use the first method I described. Here is the post anyway: http://social.msdn.microsoft.com/Forums/en/csharpgeneral/thread/2c41b74a-d140-4533-9009-9fcb382dcb60
However, it is possible, and there are a few ways to do it, so don't get scared by that last link I gave you.
Also, this post may help if you want to do it in javascript:
http://www.devx.com/tips/Tip/29285
Basically, you can add an attribute to the image through methods available in the WebBrowser control, something like onclick="GetCoords();". When the image is clicked, the JavaScript function gets the coordinates, and you can then use JavaScript to place the values in a hidden input field (input type="hidden"). You can add that field through the WebBrowser control, or use one that is already on the page. Once the coordinates are in that input field, you can easily read its value from the WebBrowser control, e.g.:
webBrowser1.Document.GetElementById("myHiddenInputField").GetAttribute("value")
That will get the value in that field, which you've set through JavaScript. Also, the "GetCoords()" function I mentioned is called SetValues() in the JavaScript link I provided above (on the devx.com site). I named it GetCoords because it makes more sense; you can of course change it to any name you want. Here is the JavaScript they were using. It only builds the coordinates into a variable and does not put them into a hidden input field, so we will need to add that step at the end of the SetValues/GetCoords function.
function SetValues()
{
    var s = 'X=' + window.event.clientX + ' Y=' + window.event.clientY;
    document.getElementById('divCoord').innerText = s;
}
These guys are just saving it inside a div element, which is visible to users. You can make the div invisible if you want to use a div; there is no real advantage or disadvantage in doing that, you would just need to hide it using JavaScript or CSS (for example display: none). Still, it is easier to use a hidden input field, so you don't need to mess with any of that.
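A minimal sketch of the hidden-field variant might look like the following. The event and document are passed in as parameters (an assumption made here so the function can be exercised with stubs outside a browser); in the page itself you would call it as getCoords(window.event, document). The field id myHiddenInputField is the illustrative one used above, and hiddenInput and fakeDocument are hypothetical stand-ins.

```javascript
// Writes the click coordinates into the hidden input so the host
// application can read them back from the page.
function getCoords(event, doc) {
  const coords = "X=" + event.clientX + " Y=" + event.clientY;
  doc.getElementById("myHiddenInputField").value = coords;
  return coords;
}

// Stubs standing in for the real DOM, for illustration only.
const hiddenInput = { value: "" };
const fakeDocument = {
  getElementById: (id) => (id === "myHiddenInputField" ? hiddenInput : null),
};

const result = getCoords({ clientX: 120, clientY: 45 }, fakeDocument);
```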
Let me know how you get along.