Google Sheets IMPORTXML failure: can't find the correct path to the table from the link

I'm trying to retrieve a table that is updated twice per day. On other websites I was able to find the element, but the approach I used doesn't work on every website I tried.
In this case the issue is that, in Google Sheets using IMPORTXML, I can't find the correct path to the table from the link or identify the element.
The website for this example is: http://lotopolonia.com/tabel/arhiva/index.php
1. I need to retrieve the dates and numbers.
2. They are updated twice per day, and I want my sheet to update by adding just the latest line on top of the others. But that comes after I solve the first point.
I went through the XPath tutorial from w3c and understood the syntax a bit.
The problem is how to correctly identify the elements and nodes in the inspector to retrieve the data I need.
Also, I've installed a Chrome extension (XPath Helper) which shows the XPath better than what I got from Chrome.
I tried the following:
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[@class='second_row']/td[@class='colon2']")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[@class='second_row']/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[@class='first_row'][1]/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//*[@class='table_01']/table/tbody/tr[@class='first_row'][1]/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[3]/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[*]/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[@class='second_row'][1]/child::td[*]")
The formulas look OK, without errors, but for all of the above requests I get the same result: imported content is empty.
Unfortunately I ran out of ideas on how to interpret those elements...
Any idea how to go on?
Cheers

How about this answer? I used //table[@class='table_01']/tr[position()>2] as the XPath. Cell A1 contains http://lotopolonia.com/tabel/arhiva/index.php.
=IMPORTXML(A1,"//table[@class='table_01']/tr[position()>2]")
table[@class='table_01'] selects the table.
tr[position()>2] selects the rows after the first two, which hold the dates and numbers.
Note:
If you want to retrieve the whole table, please use =IMPORTXML(A1,"//table[@class='table_01']/tr").
If this is not what you wanted, I'm sorry.
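A likely reason the tbody-based paths in the question return nothing is that browsers insert a tbody element when rendering, while the raw HTML fetched by IMPORTXML may not contain one. The sketch below illustrates this with Python's standard xml.etree.ElementTree on a small, hypothetical stand-in for the page (the markup, class names of the header rows, and values are assumptions, not the real site content).

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two header rows, then data rows, and no <tbody>.
SAMPLE = """
<html><body>
<table class="table_01">
  <tr class="head"><td>Archive</td></tr>
  <tr class="head"><td>Date</td><td>Numbers</td></tr>
  <tr class="first_row"><td>2017-06-02</td><td>5 12 33</td></tr>
  <tr class="second_row"><td>2017-06-01</td><td>7 18 40</td></tr>
</table>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# A path that goes through tbody matches nothing, because the source has no tbody.
via_tbody = root.findall(".//table[@class='table_01']/tbody/tr")

# Selecting tr directly under the table matches every row.
all_rows = root.findall(".//table[@class='table_01']/tr")

# ElementTree has no position() function, so tr[position()>2] is emulated by slicing.
data_rows = [[td.text for td in tr.findall("td")] for tr in all_rows[2:]]

print(len(via_tbody), len(all_rows))
print(data_rows)
```

If the real page does contain a tbody in its source, the same idea applies in reverse: the path has to mirror the fetched markup, not the tree shown in the browser inspector.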

Related

Google Spreadsheets ArrayFormula: How to split and transpose a cell-range?

Hello everybody and thanks a lot for your help.
Here's my problem:
What I have:
I have a table with raw data in 53 rows and numerous columns which I would like to reduce and restructure into three columns: City, Date and Value.
https://docs.google.com/spreadsheets/d/1bsdC8lrtSGk957ae8Z0VRGnDqTZfFLPpLkfoid0UbIQ/edit?usp=sharing
What I've done so far:
For a single row, I used the following formula to make everything work as I wanted it to:
ArrayFormula({SPLIT(TRANSPOSE(Base_Data!A2)&"|"&TRANSPOSE(Base_Data!AJ1:1&"|"&Base_Data!AJ2:2),"|")})
What I want:
I'd like to extend the formula to work for the entire area, all 53 rows. Does anyone have a tip for this? The solution doesn't have to be a formula; a script would work, too.
I've set up a new sheet called "New_Data [Erik]" and placed the following formula into A2:
=ArrayFormula(SPLIT(FLATTEN(Base_Data!A2:A&"\"&Base_Data!AJ1:1&"\"&Base_Data!AJ2:54),"\",0,1))
If this is a one-time conversion, I'd recommend copying the results in place. To do that, select A:C, hit Ctrl-C to Copy and then Ctrl-Alt-V to Paste Special. A small clipboard icon will appear. Click it and choose "Paste Values Only."
If you need this functionality on an ongoing basis, just understand that, at the time of writing, FLATTEN is a not-yet-official function of Google Sheets, which means that while Google may very well make it official, they may also decide to do away with it at any time. (This is why I suggest copying and pasting the results in place, if it's just a one-time conversion.)
Not sure what you're trying to get to there. If you are trying to leave out all columns but 3, just use ={Base_Data!A2:A, Base_Data!E2:E} and add as many columns as you require, comma-separated, within the curly brackets.
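For readers who prefer the script route mentioned in the question, the reshaping that the FLATTEN/SPLIT formula performs can be sketched as follows. This is a minimal illustration in Python, assuming the data has already been read into plain lists; the function name unpivot and the sample cities, dates, and values are hypothetical.

```python
def unpivot(cities, dates, values):
    """Turn a wide grid into (city, date, value) rows, walking the grid row by row."""
    rows = []
    for city, row in zip(cities, values):
        for date, value in zip(dates, row):
            rows.append((city, date, value))
    return rows


# Hypothetical sample: one city per row, one date per column.
cities = ["Berlin", "Hamburg"]
dates = ["2020-01", "2020-02", "2020-03"]
values = [
    [10, 11, 12],
    [20, 21, 22],
]

long_rows = unpivot(cities, dates, values)
for row in long_rows:
    print(row)
```

The row-by-row order mirrors how FLATTEN walks the combined range, so each city's dates stay together in the output.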

Trying to crawl some data on Google Sheets but getting beat by XPath

I'm trying to build a sheet for studying the stock market, and I'm using this website as the data source, taking this stock as an example.
My goals here are:
I want to grab some of the indicators from this div area (such as P/L, LPA, M. LÍQUIDA, and others);
And some of the numbers from this table's first column (such as rows 11, 15, and others).
My issues:
I'm not able to fetch the data that I want from the div with the IMPORTXML function, either by copying the XPath or by trying to match a specific class name.
I am able to fetch the specific number that I want, but the formula returns 3 different values from 3 different rows (I want only the first one), due to the XPath that I'm using: //table/tbody/tr[11]/td[2]/span.
There are 2 more tables further down the page that match the same XPath, and the function is returning the values from row 11 of the other tables, as you can see here. The only thing that distinguishes the 3 tables is their parent divs, but I can't figure out how to target these divs. Is there any way to fix this, or any function that automatically drops the other 2 rows?
Can someone shed some light? :(
It's almost always easier to find the values you need by reference to a nearby label. This should work to get the 20,76 from the first table:
(//*[contains(text(), 'P/L')]/following::strong)[1]
As far as the second table goes, this should get 52.562,18 M:
(//span[contains(text(), 'Receita Líquida')]/following::td)[1]
If you need a different column, you can just pass a higher index; this will return -0,07%, for instance:
(//span[contains(text(), 'Receita Líquida')]/following::td)[5]
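To make the label-then-following pattern concrete, here is a rough Python sketch. The standard xml.etree.ElementTree module does not support the following:: axis, so the helper following_text below emulates it by walking elements in document order and skipping the label's own descendants. The helper, the sample markup, and the LPA value are hypothetical, and only the first element matching the label is considered, which is an approximation of the full XPath semantics.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page.
SAMPLE = """
<html><body>
<div><h3>P/L</h3><strong>20,76</strong></div>
<div><h3>LPA</h3><strong>1,23</strong></div>
<table>
<tr><td><span>Receita Líquida</span></td><td>52.562,18 M</td><td>-0,07%</td></tr>
</table>
</body></html>
""".strip()


def following_text(root, label, target_tag, index=1, context_tag=None):
    """Approximate (//context_tag[contains(text(), label)]/following::target_tag)[index]."""
    order = list(root.iter())  # all elements in document order
    for pos, node in enumerate(order):
        if context_tag is not None and node.tag != context_tag:
            continue
        if node.text and label in node.text:
            own = {id(e) for e in node.iter()}  # the label node and its descendants
            hits = [e for e in order[pos + 1:]
                    if e.tag == target_tag and id(e) not in own]
            return hits[index - 1].text if len(hits) >= index else None
    return None


root = ET.fromstring(SAMPLE)
print(following_text(root, "P/L", "strong"))
print(following_text(root, "Receita Líquida", "td", 1, "span"))
print(following_text(root, "Receita Líquida", "td", 2, "span"))
```

In this reduced sample the percentage sits at index 2 rather than 5, because the hypothetical row has fewer cells than the real table.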
I also highly recommend getting some sort of xpath tester addon for your browser to play around with these if you don't already have one. I use ChroPath:
Firefox - https://addons.mozilla.org/en-US/firefox/addon/chropath-for-firefox/
Chrome - https://chrome.google.com/webstore/detail/chropath/ljngjbnaijcbncmcnjfhigebomdlkcjo?hl=en-US

Retrieve Google Spreadsheet Worksheet JSON

I'm trying to retrieve the JSON of a Google Spreadsheet worksheet. It worked until a few days ago. For the default worksheet it still works, but not for any of the other worksheets.
This is the working URL for the default worksheet: https://spreadsheets.google.com/feeds/list/1caRqAA1TyBoZ0eVZvvKheEBh9SGRmQII4qih9urY70k/od6/public/full?alt=json
And this is the URL for the worksheet that stopped working: https://spreadsheets.google.com/feeds/list/1caRqAA1TyBoZ0eVZvvKheEBh9SGRmQII4qih9urY70k/1416241220/public/full?alt=json
The error message is Invalid query parameter value for grid_id.
The only difference is the worksheet parameter (od6 vs 1416241220).
Any ideas on why that error suddenly occurs?
ChrisPeterson's note:
You can use worksheet position number (1 for the first/default worksheet, 2 for the second worksheet).
Original answer
I came across the same issue and I managed to find my way out.
It seems that they recently changed the id for each worksheet.
You can find the new ID at the following URL:
https://spreadsheets.google.com/feeds/worksheets/YOUR_SPREADSHEET_ID/private/full
I got something like o3laxt8 between the <id> tags.
PS: the od6 and default values will always work and redirect to the first worksheet of your document.
Joe Germuska's note:
od6 doesn't work anymore.
Seems to work again.
I'd like to share a concrete example, because many of the instructions out there are confusing, including the accepted answer, and it is not obvious what the worksheet IDs are or where to put them.
Here's a document I published and anyone with the link can view:
https://docs.google.com/spreadsheets/d/1QDWpycJJFA-UAiSPIv-icJ4UZhbEmuN8wxxag83SE1c/edit?usp=sharing
The document has to be published correctly. There are two Publish buttons and the first one doesn't work for this task. Use the second.
The document KEY is important. Obtain the KEY from between the /d/ and the /edit in the url. In my example, the key is 1QDWpycJJFA-UAiSPIv-icJ4UZhbEmuN8wxxag83SE1c.
Second, use the following URL style, replacing KEY with your own:
https://spreadsheets.google.com/feeds/list/KEY/od6/public/values?alt=json
My example url links directly to published json:
https://spreadsheets.google.com/feeds/list/1QDWpycJJFA-UAiSPIv-icJ4UZhbEmuN8wxxag83SE1c/od6/public/values?alt=json
Finally, if the spreadsheet has multiple sheets (or tabs), replace od6 in the URL with a number. My example has two tabs, so there are two URLs, one for each tab. I simply replace od6 with 1 or 2, depending on the order of the sheets:
Tab 1:
https://spreadsheets.google.com/feeds/list/1QDWpycJJFA-UAiSPIv-icJ4UZhbEmuN8wxxag83SE1c/1/public/values?alt=json
Tab 2:
https://spreadsheets.google.com/feeds/list/1QDWpycJJFA-UAiSPIv-icJ4UZhbEmuN8wxxag83SE1c/2/public/values?alt=json
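The URL pattern above is easy to generate in code. The following Python sketch only builds the URL strings, assuming the feed endpoint described in this answer is still served; the function name list_feed_url is hypothetical and not part of any Google library.

```python
def list_feed_url(key, worksheet="od6"):
    """Build the public list-feed JSON URL for a spreadsheet key and worksheet.

    worksheet may be the default "od6", a tab position such as 1 or 2,
    or a worksheet ID, following the URL style shown in the answer.
    """
    return (
        "https://spreadsheets.google.com/feeds/list/"
        f"{key}/{worksheet}/public/values?alt=json"
    )


KEY = "1QDWpycJJFA-UAiSPIv-icJ4UZhbEmuN8wxxag83SE1c"  # key from the example document

tab_urls = [list_feed_url(KEY, position) for position in (1, 2)]
for url in tab_urls:
    print(url)
```

Passing the tab position keeps the call sites short; a worksheet ID can be passed in the same argument when tabs are reordered often.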
If the tabs of a spreadsheet are reordered frequently, it is possible to get the ID of a given sheet and use that instead of position numbers. I first learned of this approach from this post or this post.
In brief, you would form a private URL with your KEY:
https://spreadsheets.google.com/feeds/worksheets/KEY/private/full
This only works in a browser where you are logged into Google Drive with an account that has permission to view the document.
Next, you have to sift through the XML to find your sheet IDs.
Replace the previous 1 and 2 with the IDs, for example:
Tab 1 (the first worksheet ID in a new Google Sheet is always od6 by default, regardless of the order of the tabs):
https://spreadsheets.google.com/feeds/list/1QDWpycJJFA-UAiSPIv-icJ4UZhbEmuN8wxxag83SE1c/od6/public/values?alt=json
Tab 2:
https://spreadsheets.google.com/feeds/list/1QDWpycJJFA-UAiSPIv-icJ4UZhbEmuN8wxxag83SE1c/ope57yg/public/values?alt=json

Referencing cell on other tab in formula

How can I reference the same (row, column) on a different tab in the same Google spreadsheet document?
So, I want to do something like this:
=SOME_FORMULA('First tab'!(ADDRESS(ROW(),COLUMN()))). This doesn't work.
If the formula isn't absolutely referenced, new entries from the Google Forms questionnaire change the reference and mess up the formula (a formula that looked at row 5 looks at row 6 after an insert). I can't use absolute referencing ($A$1) because I would have to enter it manually.
Can I change the reference on multiple cells at once? (For a single cell I can use cmd + F4.)
I had that annoying reference problem too. If I understand correctly, you are trying to get the information in some cells, but every time someone sends information to the spreadsheet by filling out a form, that reference moves down a row.
The best solution I came up with was to create a new spreadsheet and import all the information with this:
=importrange("spreadsheet-key","Form Responses!A1:B2107")
That function updates the info in realtime, so you can do all the processing on the new spreadsheet.
Hope this helps.
I don't quite understand what you need. Do you need to reference the cell in another sheet that has the same coordinates as the current cell?
If so, the following formula can be useful:
=INDIRECT("'First tab'!"&ADDRESS(ROW(), COLUMN()))
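To show what string the formula hands to INDIRECT, here is a small Python sketch that composes the same kind of reference. The function names column_letters and sheet_reference are hypothetical. Wrapping the sheet name in single quotes is an assumption about the safe way to reference a name containing a space, and the absolute $A$1 style mirrors the default output of ADDRESS.

```python
def column_letters(col):
    """Convert a 1-based column number to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def sheet_reference(sheet, row, col):
    """Compose an absolute A1-style reference to a cell on another tab."""
    # Single quotes protect sheet names with spaces; embedded quotes are doubled.
    quoted = "'" + sheet.replace("'", "''") + "'"
    return f"{quoted}!${column_letters(col)}${row}"


print(sheet_reference("First tab", 5, 2))
print(sheet_reference("First tab", 10, 28))
```

Because the reference is rebuilt from the current row and column each time, it does not shift when form responses insert rows on the other tab.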

How do I filter items returned by list to exclude files last modified by me?

Looking at the results of list, there is a lastModifyingUserName, but not a userid or other concrete reference to a user such that I can strongly verify that the file was last modified by me or someone else.
I can approximate this behavior using a string comparison of my user profile information, but this isn't an exact check.
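The string-comparison approximation described above might look like the following Python sketch. It works on already-fetched list results represented as plain dictionaries; the function name not_last_modified_by, the sample items, and the display names are hypothetical, and the match is only as reliable as the display name itself.

```python
def not_last_modified_by(items, my_display_name):
    """Keep items whose lastModifyingUserName differs from my display name.

    This is a name comparison only, so two users with the same display
    name cannot be told apart.
    """
    me = my_display_name.strip().casefold()
    return [
        item for item in items
        if item.get("lastModifyingUserName", "").strip().casefold() != me
    ]


# Hypothetical list results, reduced to the fields that matter here.
items = [
    {"title": "budget", "lastModifyingUserName": "Alex Example"},
    {"title": "notes", "lastModifyingUserName": "Sam Sample"},
    {"title": "draft", "lastModifyingUserName": "alex example "},
]

others = not_last_modified_by(items, "Alex Example")
print([item["title"] for item in others])
```

Normalizing case and whitespace reduces false mismatches, but it does not make the check exact.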
I also looked at the timestamps, and timestamps for a file that was modified by me don't seem to line up, so it doesn't look like I can do this using timestamps either, which looks like a bug in and of itself, e.g.:
"modifiedByMeDate": "2013-01-31T02:25:26.738Z",
"modifiedDate": "2013-01-31T02:29:58.363Z",
Google are working on improving this so that there is consistency between the actor returned in the lastModifyingUserName field and the permission ID.
Right now, I agree with you that it is pretty much impossible, sorry.