How to retrieve data from within a node name? - mysql

I am able to retrieve the data between the node's tags, but not the data inside the node's opening tag itself. I searched far and wide, but can't seem to find a solution for this.
My XML looks like the following:
This XML is saved inside an nvarchar column called fileXML in SQL Server 2008 R2.
I want to retrieve the History Date, which is inside the node name.
My current code which is retrieving the "18" from the node value is the following:
, fileXML.value('(/commands/command/measure/categories/category/components/component/history)[1]', 'varchar(100)') as HisDate
As you can see in the picture above, this is working.
But I can't seem to retrieve the info from within the node.
I searched on the web, and tried several things like:
fileXML.value('(/commands/command/measure/categories/category/components/component/history.name)[1]', 'varchar(100)') as HisDate
fileXML.value('(/commands/command/measure/categories/category/components/component/history/local-name)[1]', 'varchar(100)') as HisDate
fileXML.value('(/commands/command/measure/categories/category/components/component/history/local-name(.))[1]', 'varchar(100)') as HisDate
The first two returned NULL, and the last one gave an error message saying that a function is not supported. I could give many more examples of what I tried, but that would make the post a bit messy.
Any help is greatly appreciated.

date is an attribute of the history element. So your path should be
/commands/command/measure/categories/category/components/component/history/@date
Untested as you supplied the XML as a picture.
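To illustrate the difference between an element's text and an attribute on that element, here is a minimal Python sketch. The XML in the question was only posted as a picture, so the sample document, the date value, and the helper name `first_history` are assumptions made up for illustration; only the element path comes from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real XML was only shown as a picture,
# so the structure below is inferred from the path in the question.
SAMPLE_XML = """
<commands>
  <command>
    <measure>
      <categories>
        <category>
          <components>
            <component>
              <history date="2014-01-15">18</history>
            </component>
          </components>
        </category>
      </categories>
    </measure>
  </command>
</commands>
"""

HISTORY_PATH = "command/measure/categories/category/components/component/history"


def first_history(xml_text):
    """Return (date attribute, text value) of the first history element."""
    root = ET.fromstring(xml_text.strip())  # root is <commands>
    node = root.find(HISTORY_PATH)
    if node is None:
        return None, None
    # .get() reads an attribute; .text reads the content between the tags.
    return node.get("date"), node.text


print(first_history(SAMPLE_XML))
```

In XPath terms, `history` selects the element (whose value is the "18"), while `history/@date` selects the attribute. In SQL Server the same idea would presumably be written as `.value('(.../history/@date)[1]', 'varchar(100)')`, which is equally untested here.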

Related

Google sheets importxml failure - Can't find the correct path to table from the link

I'm trying to retrieve a table which is updated twice per day. On other websites I was able to find the element, but the approach I use doesn't work on all the websites I tried.
In this case the issue is:
In Google Sheets, using IMPORTXML, I can't find the correct path to the table from the link or identify the element.
The website for this example is: http://lotopolonia.com/tabel/arhiva/index.php
1. I need to retrieve the dates and numbers.
2. They are updated twice per day, and I'd like my sheet to update by adding just the latest line on top of the others. But I'll get to this one after I solve the first.
I looked at the XPath tutorial from w3c and understood the syntax a bit.
The problem is how to correctly identify the elements and nodes in the inspector to retrieve the data I need.
Also, I've installed a Chrome extension (XPath Helper) which shows the XPath better than what I got from Chrome.
I tried the following:
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[@class='second_row']/td[@class='colon2']")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[@class='second_row']/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[@class='first_row'][1]/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//*[@class='table_01']/table/tbody/tr[@class='first_row'][1]/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[3]/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[*]/td[*]")
=IMPORTXML("http://lotopolonia.com/tabel/arhiva/index.php","//table[@class='table_01']/tbody/tr[@class='second_row'][1]/child::td[*]")
The formulas look OK and produce no errors, but for all of the requests above I get the same result: imported content is empty.
Unfortunately I've run out of ideas on how to interpret those elements...
Any idea how to go on?
Cheers
How about this answer? I used //table[@class='table_01']/tr[position()>2] as the XPath. Cell "A1" contains http://lotopolonia.com/tabel/arhiva/index.php.
=IMPORTXML(A1,"//table[@class='table_01']/tr[position()>2]")
table[@class='table_01'] retrieves the table.
tr[position()>2] retrieves the rows with the dates and numbers, skipping the first two rows.
Result :
Note :
If you want to retrieve the whole table, please use =IMPORTXML(A1,"//table[@class='table_01']/tr").
If this was not what you want, I'm sorry.
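The same row selection can be tried outside of Sheets. Below is a minimal Python sketch that assumes a small, made-up, well-formed table (the real page is not reproduced here, and the cell values and the helper name `data_rows` are hypothetical). Python's built-in ElementTree supports only a subset of XPath and has no `position()`, so the sketch selects all `tr` children and slices off the first two rows instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page. Note that the <tr>
# elements sit directly under <table>, with no <tbody> in between.
SAMPLE_HTML = """
<html><body>
<table class="table_01">
  <tr><th>Archive</th></tr>
  <tr><td>Date</td><td>Numbers</td></tr>
  <tr class="first_row"><td>2017-01-02</td><td>5</td><td>12</td></tr>
  <tr class="second_row"><td>2017-01-01</td><td>7</td><td>30</td></tr>
</table>
</body></html>
"""


def data_rows(markup):
    """Return the cell texts of every row after the first two."""
    root = ET.fromstring(markup.strip())
    # Equivalent to //table[@class='table_01']/tr
    rows = root.findall(".//table[@class='table_01']/tr")
    # Equivalent to tr[position()>2]: drop the two leading rows.
    return [[(cell.text or "").strip() for cell in row] for row in rows[2:]]


for row in data_rows(SAMPLE_HTML):
    print(row)
```

One plausible reason the attempts containing `/tbody` came back empty is that browsers add a `tbody` element to the DOM shown in the inspector even when the page source has none, while IMPORTXML works on the source. This is an assumption about that page and was not verified.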

Best way to parse a big and intricate JSON file with OpenRefine (or R)

I know how to parse JSON cells in OpenRefine, but this one is too tricky for me.
I've used an API to extract the calendar of 4730 AirBNB's rooms, identified by their IDs.
Here is an example of one Json file : https://fr.airbnb.com/api/v2/calendar_months?key=d306zoyjsyarp7ifhu67rjxn52tv0t20&currency=EUR&locale=fr&listing_id=4212133&month=11&year=2016&count=12&_format=with_conditions
For each ID and each day of the year from now until November 2017, I would like to extract the availability of the room (true or false) and its price on that day.
I can't figure out how to parse out this information. I guess that it involves a series of nested forEach calls, but I can't find the right way to do this with OpenRefine.
I've tried, of course,
forEach(value.parseJson().calendar_months, e, e.days)
The result is an array of arrays of dictionaries, which confuses me.
Any help would be appreciated. If the operation is too difficult in OpenRefine, a solution with R (or Python) would also be fine for me.
Rather than just creating your Project as text, and working with GREL to parse out...
The best way is just select the JSON record part that you want to work with using our visual importer wizard for JSON files and XML files (you can even use a URL pointing to a JSON file as in your example). (A video tutorial shows how here: https://www.youtube.com/watch?v=vUxdB-nl0Bw )
Select the JSON part that contains your records that you want to parse and work with (this can be any repeating part, just select one of them and OpenRefine will extract all the rest)
Limit the amount of data rows that you want to load in during creation, or leave default of all rows.
Click Create Project and now you're in Rows mode. However, if you think that Records mode might be better suited to your data, just import the project again as JSON and select the next outer area of the content, perhaps a larger array that contains a key field. In the example, the key field would probably be the Date, which is why I highlighted the whole record for a given date. This way OpenRefine will have keys for each record, and Records mode lets you work with them better than Rows mode.
Feel free to take this example, make it better and even more helpful for all, and add it to our Wiki section on How to Use.
I think you are on the right track. The output of:
forEach(value.parseJson().calendar_months, e, e.days)
is hard to read because OpenRefine and JSON both use square brackets to indicate arrays. What you are getting from this expression is an OR array containing twelve items (one for each month of the year). The items in the OR array are JSON - each one an array of days in the month.
To keep the steps manageable I'd suggest tackling it like this:
First use
forEach(value.parseJson().calendar_months,m,m.days).join("|")
You have to use 'join' because OR can't store OR arrays directly in a cell - it has to be a string.
Then use "Edit Cells->Split multi-valued cells" - this will get you 12 rows per ID, each containing a JSON expression. Now for each ID you have 12 rows in OR
Then use:
forEach(value.parseJson(),d,d).join("|")
This splits the JSON down into the individual days
Then use "Edit Cells->Split multi-valued cells" again to split the details for each day into its own cell.
Using the JSON from example URL above - this gives me 441 rows for the single ID - each contains the JSON describing the availability & price for a single day. At this point you can use the 'fill down' function on the ID column to fill in the ID for each of the rows.
You've now got some pretty easy JSON in each cell - so you can extract availability using
value.parseJson().available
etc.
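Since the question also allows a Python solution, the same month-then-day flattening can be sketched in a few lines. The keys `calendar_months`, `days` and `available` come from the expressions above; the `date` key, the `price.local_price` location of the price, the sample values, and the helper name `flatten_calendar` are assumptions for illustration and should be checked against the real API response.

```python
import json

# Hypothetical, heavily trimmed response for one listing.
SAMPLE = json.dumps({
    "calendar_months": [
        {"days": [
            {"date": "2016-11-01", "available": True,
             "price": {"local_price": 50}},
            {"date": "2016-11-02", "available": False,
             "price": {"local_price": 55}},
        ]},
        {"days": [
            {"date": "2016-12-01", "available": True,
             "price": {"local_price": 60}},
        ]},
    ]
})


def flatten_calendar(listing_id, text):
    """Return one (id, date, available, price) tuple per day."""
    data = json.loads(text)
    rows = []
    for month in data["calendar_months"]:      # outer forEach: months
        for day in month["days"]:              # inner forEach: days
            price = day.get("price") or {}     # assumed price location
            rows.append((listing_id,
                         day.get("date"),
                         day.get("available"),
                         price.get("local_price")))
    return rows


for row in flatten_calendar(4212133, SAMPLE):
    print(row)
```

Running this once per listing ID and concatenating the results would give one row per room per day, which matches the shape reached above after the two "Split multi-valued cells" steps.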

Finding a specific value out of an Array (RoR / MySQL)

I am trying to find a specific value inside an array. The array is composed from data in MySQL database and looks like:
info = [#<Info1: bla, Info2: blo>,#<Info1: bli, Info2, Ble>]
Now I want to get every Info1's value from it, but I do not know how.
The array was formed by calling
info = Info.find(:all)
Can anyone help me?
I am using Rails 2.2.2 (don't ask, can't do anything about it) and Ruby 1.8.
Edit: More details
Info is a database table, where Info1 and Info2 are the columns. Calling it with info = Info.find(:all) returns the array above.
What I have tried so far involves trying to go through the array with each, but so far no luck.
Most of what I have tried like
a.grep(/^info1/)
and
info.select(|i| i.name == "info1")
all return empty arrays
Edit
Nevermind, I found the answer. I was thinking too weird. The answer is
info.each do |object|
  puts object.info2
end
What's your selection criteria? You can do something like
info.select{|i| i.name == 'hello' }
and you will get all the Info objects with name = 'hello'.
But I would prefer to change the query, if you can, to filter them in the database query directly.

How to find last item in a repeated structure in bigquery

I have a nested repeated structure of variable length. For example, it could be a person object with a repeated field that holds the cities the person has lived in. I'd like to find the last item in that list, say to find the city the person currently lives in. Is there an easy way to do this? I tried looking at the JSONPath functions, but I'm not sure how to use them with "within". Any help please?
1) You can use LAST and WITHIN
SELECT
FIRST(cell.value) within record ,
LAST(cell.value) within record
FROM [publicdata:samples.trigrams]
where ngram = "! ! That"
2) or if you want something more advanced you can use POSITION
POSITION(field) - Returns the one-based, sequential position of field within a set of repeated fields.
You can check the samples from trigrams (click on Details to see the unflattened schema)
https://bigquery.cloud.google.com/table/publicdata:samples.trigrams?pli=1
And when you run POSITION, you get the ordering of that field.
SELECT
ngram,
cell.value,
position(cell.volume_count) as pos,
FROM [publicdata:samples.trigrams]
where ngram = "! ! That"
Now that you have the position, you can query for last one.

How do I filter items returned by list to exclude files last modified by me?

Looking at the results of list, there is a lastModifyingUserName, but not a userid or other concrete reference to a user such that I can strongly verify that the file was last modified by me or someone else.
I can approximate this behavior using a string comparison of my user profile information, but this isn't an exact check.
I also looked at the timestamps, and timestamps for a file that was modified by me don't seem to line up, so it doesn't look like I can do this using timestamps either, which looks like a bug in and of itself, e.g.:
"modifiedByMeDate": "2013-01-31T02:25:26.738Z",
"modifiedDate": "2013-01-31T02:29:58.363Z",
Google are working on improving this so that there is consistency between the actor returned in the lastModifyingUserName field and the permission ID.
Right now, I agree with you that it is practically impossible, sorry.