XPath to JSON - still needs tidying

I am trying to get the table of data from this website and serve it out as JSON.
I am taking my first steps with Yahoo Pipes/XPath, so apologies in advance for the noob Qs.
I have a pipe that just uses an XPath expression:
//table1/tbody/tr
Which gives me the following JSON output (snippet, 1 of about 30 items):
"items": [{
"td": [{
"a": {
"href": "http:\/\/www.hydro.com.au\/system\/files\/water-storage\/Web_Lakes_AUGUSTA.pdf",
"content": "Lake Augusta"
}
},
{
"p": "2.58"
},
{
"p": "Steady"
}],
"description": null,
"title": null
},
What I want to end up with is a flattened out version, something like this:
"items": [{
{
"href": "http:\/\/www.hydro.com.au\/system\/files\/water-storage\/Web_Lakes_AUGUSTA.pdf",
},
{
"content": "Lake Augusta"
},
{
"p": "2.58"
},
{
"p": "Steady"
}],
},
Can someone help me achieve this? Should I be doing more with the original XPath, or is there a particular processor within Pipes that will help me accomplish this?

It's not possible to select the data structure you want directly with XPath.
However, it's not difficult to manipulate the selected data in Yahoo Pipes, for example by combining a "Loop" module with an "Item Builder" module.
The result is a feed that looks like this:
{
"count": 43,
"value": {
"title": "Hydro Tasmania Lake Levels",
"description": "Hydro Tasmania Lake Levels\n\nfrom StackOverflow Question http://stackoverflow.com/q/23248288/18771",
"link": "http://pipes.yahoo.com/pipes/pipe.info?_id=d2fe64963964bf6636b58fb7b9814ef0",
"pubDate": "Fri, 25 Apr 2014 06:12:35 +0000",
"generator": "http://pipes.yahoo.com/pipes/",
"callback": "",
"items": [
{
"title": "Lake Augusta",
"link": "http://www.hydro.com.au/system/files/water-storage/Web_Lakes_AUGUSTA.pdf",
"meters": "2.58",
"comment": "Steady",
"description": null
},
... more ...
]
}
}
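If you ever need to do the same reshaping outside of Pipes, it is a small transformation. Here is a minimal sketch in Python, assuming every row has exactly three td cells in the order shown in the question (link, level, comment). The function name flatten_row is made up for illustration, and the output field names are simply the ones from the feed above.

```python
import json

# One item in the shape produced by the XPath-only pipe (copied from the question).
raw = {
    "items": [
        {
            "td": [
                {
                    "a": {
                        "href": "http://www.hydro.com.au/system/files/water-storage/Web_Lakes_AUGUSTA.pdf",
                        "content": "Lake Augusta",
                    }
                },
                {"p": "2.58"},
                {"p": "Steady"},
            ],
            "description": None,
            "title": None,
        }
    ]
}


def flatten_row(row):
    """Turn one table row (assumed: three td cells) into a flat record."""
    link_cell, level_cell, comment_cell = row["td"]
    return {
        "title": link_cell["a"]["content"],
        "link": link_cell["a"]["href"],
        "meters": level_cell["p"],
        "comment": comment_cell["p"],
    }


flat_items = [flatten_row(row) for row in raw["items"]]
print(json.dumps(flat_items, indent=2))
```

Rows with a different number of cells would make the unpacking fail, which is probably what you want while you are still tidying the data.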

PayPal JSON format updating order

I know I am close on this, because the error messages are getting nicer. I can already make a similar call via Postman to update the seller's email without issue; now I am working on updating the amount and its associated objects. Something in my request format is off.
Is my breakdown section in the correct location? In the amount_breakdown documentation it looks like it is on the same level as value and currency_code, so does it need to move into that section?
Here's my request JSON via Postman:
[
{
"op": "replace",
"path": "/purchase_units/@reference_id=='default'/amount",
"value": {
"currency_code": "CAD",
"value": "2",
"amount": {
"currency_code": "CAD",
"value": "2",
"breakdown": {
"item_total": {
"currency_code": "CAD",
"value": "2"
},
"tax_total": {
"value": "0",
"currency_code": "CAD"
}
}
},
"items": [
{
"name": "First Product Name",
"description": "Optional descriptive text..",
"unit_amount": {
"currency_code": "CAD",
"value": "2"
},
"tax": {
"value": "0",
"currency_code": "CAD"
},
"quantity": "1"
}
]
}
}
]
RESPONSE:
{
"name": "UNPROCESSABLE_ENTITY",
"details": [
{
"field": "/purchase_units/@reference_id=='default'/amount/breakdown/item_total",
"location": "body",
"issue": "ITEM_TOTAL_REQUIRED",
"description": "If item details are specified (items.unit_amount and items.quantity) corresponding amount.breakdown.item_total is required."
}
],
"message": "The requested action could not be performed, semantically incorrect, or failed business validation.",
"debug_id": "acecd3643c994",
"links": [
{
"href": "https://developer.paypal.com/docs/api/orders/v2/#error-ITEM_TOTAL_REQUIRED",
"rel": "information_link",
"method": "GET"
}
]
}
Thanks for any help!
I have tried different variations of the objects.
I can get the other PATCH operation working without issue, but it is much simpler in object structure.
There should be no amount key under the /amount path, and the items array does not belong at that /amount path either.
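To make that concrete, here is a rough sketch in Python of a request body with that shape: the value holds only currency_code, value and breakdown. The helper name build_amount_patch is hypothetical, the two-decimal formatting is my assumption, and the @reference_id selector in the path follows PayPal's Orders v2 patch documentation. I have not run this against the API.

```python
import json
from decimal import Decimal


def build_amount_patch(currency, item_total, tax_total, reference_id="default"):
    """Build a PATCH body that replaces the amount of one purchase unit.

    The value contains no nested "amount" key and no "items" array;
    the breakdown sits next to currency_code and value.
    """
    total = item_total + tax_total
    return [
        {
            "op": "replace",
            "path": f"/purchase_units/@reference_id=='{reference_id}'/amount",
            "value": {
                "currency_code": currency,
                "value": f"{total:.2f}",
                "breakdown": {
                    "item_total": {"currency_code": currency, "value": f"{item_total:.2f}"},
                    "tax_total": {"currency_code": currency, "value": f"{tax_total:.2f}"},
                },
            },
        }
    ]


patch = build_amount_patch("CAD", Decimal("2"), Decimal("0"))
print(json.dumps(patch, indent=2))
```

Computing value from the breakdown keeps the two in sync, which is the kind of consistency the validation error is complaining about.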

Compare 2 cucumber JSON reports with ruby

The problem is: I have 2 cucumber test reports in JSON format.
I need to remove redundant key-value pairs from those reports and compare them, but I can't work out how to remove the unnecessary data from those 2 JSON documents because of their structure after JSON.parse (an array or hash with many nested arrays/hashes). Please advise if there are gems or known solutions for this.
The JSON structure is, for example:
[
{
"uri": "features/home_screen.feature",
"id": "as-a-user-i-want-to-explore-home-screen",
"keyword": "Feature",
"name": "As a user I want to explore home screen",
"description": "",
"line": 2,
"tags": [
{
"name": "@home_screen",
"line": 1
}
],
"elements": [
{
"keyword": "Background",
"name": "",
"description": "",
"line": 3,
"type": "background",
"before": [
{
"match": {
"location": "features/step_definitions/support/hooks.rb:1"
},
"result": {
"status": "passed",
"duration": 505329000
}
}
],
"steps": [
{
"keyword": "Given ",
"name": "I click OK button in popup",
"line": 4,
"match": {
"location": "features/step_definitions/registration_steps.rb:91"
},
"result": {
"status": "passed",
"duration": 2329140000
}
},
{
"keyword": "And ",
"name": "I click Allow button in popup",
"line": 5,
"match": {
"location": "features/step_definitions/registration_steps.rb:96"
},
"result": {
"status": "passed",
"duration": 1861776000
}
}
]
},
Since you are asking for a gem, you might try iteraptor, which I created for exactly this kind of task.
It allows iterating, mapping and reducing deeply nested structures. For instance, to filter out all the keys called "name" on all levels, you might do:
input.iteraptor.reject(/name/)
A more detailed description can be found on the GitHub page linked above.
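If you would rather not add a dependency, the underlying idea is just a short recursive walk over hashes and arrays. Below is a minimal sketch, written in Python rather than Ruby only to keep it compact; the same shape carries over to Ruby with Hash and Array checks. The name strip_keys is made up, and treating "duration" and "line" as the redundant keys is my assumption about what you want to ignore.

```python
def strip_keys(node, unwanted):
    """Return a copy of a parsed JSON structure without the unwanted keys."""
    if isinstance(node, dict):
        return {
            key: strip_keys(value, unwanted)
            for key, value in node.items()
            if key not in unwanted
        }
    if isinstance(node, list):
        return [strip_keys(item, unwanted) for item in node]
    return node  # scalars are returned unchanged


# Two tiny reports that differ only in line numbers and durations.
report_a = [{"name": "Feature", "line": 2, "elements": [
    {"steps": [{"name": "I click OK button in popup", "line": 4,
                "result": {"status": "passed", "duration": 2329140000}}]}]}]
report_b = [{"name": "Feature", "line": 3, "elements": [
    {"steps": [{"name": "I click OK button in popup", "line": 5,
                "result": {"status": "passed", "duration": 1861776000}}]}]}]

NOISE = {"duration", "line"}  # assumed set of keys to ignore
same = strip_keys(report_a, NOISE) == strip_keys(report_b, NOISE)
print(same)  # True
```

Once the noisy keys are gone, plain equality on the parsed structures is enough for the comparison.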

D3js pack-layout visualization of generated json file does not work

A friend wrote a program in VBA which generates JSON data. I am trying to visualize that data via the pack layout. We derived the rules for how the JSON data has to be structured from the JSON data here: http://bl.ocks.org/mbostock/7607535
I went through the data many times myself, but I just can't find the reason why it is not being visualized. The browser console reports a problem in line 33 with the token "]", but in my eyes the brackets are right and I can't find another mistake.
The visualization works properly with the data from where we extracted the rules.
The question now is, which mistake in the json file prevents the code from being visualized?
Would be amazing if somebody can see this, since we cannot see it. Thanks in advance!
The generated json data looks like this:
{
"name": "While",
"children": [
{"name": "While", "size": 27},
{
"name": "If",
"children": [
{"name": "If", "size": 22},
{
"name": "If",
"children": [
{"name": "If", "size": 3}
]
},
{
"name": "If",
"children": [
{"name": "If", "size": 3}
]
},
{
"name": "If",
"children": [
{"name": "If", "size": 3}
]
},
{
"name": "If",
"children": [
{"name": "If", "size": 3}
]
},
]
},
]
}
You have trailing commas (,) after the last element of two arrays in that JSON, which makes it invalid.
Just remove them and it will work. Use https://jsonformatter.curiousconcept.com/ to check.
The error lies with the script that generates it :)
Here's the fixed version of your JSON:
{
"name": "While",
"children": [{
"name": "While",
"size": 27
}, {
"name": "If",
"children": [{
"name": "If",
"size": 22
}, {
"name": "If",
"children": [{
"name": "If",
"size": 3
}]
}, {
"name": "If",
"children": [{
"name": "If",
"size": 3
}]
}, {
"name": "If",
"children": [{
"name": "If",
"size": 3
}]
}, {
"name": "If",
"children": [{
"name": "If",
"size": 3
}]
}]
}]
}
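Counting from the opening brace of the JSON as posted, line 33 is the first "]" that follows a trailing comma, which matches what the browser console reported. You can reproduce the check locally with any strict JSON parser. Here is a small sketch in Python using a shortened version of the data; the exact error message and position differ between parser versions, so treat the printed text as illustrative. The string replacement is only there for the demonstration, since the real fix belongs in the generator.

```python
import json

# Shortened version of the generated file, with the same kind of trailing comma.
broken = """{
  "name": "While",
  "children": [
    {"name": "While", "size": 27},
  ]
}"""

try:
    json.loads(broken)
    error = None
except json.JSONDecodeError as exc:
    error = exc
    # Message wording and reported position vary between Python versions.
    print(f"line {exc.lineno}, column {exc.colno}: {exc.msg}")

# Demo only: drop the comma before the closing bracket and parse again.
fixed = broken.replace("},\n  ]", "}\n  ]")
data = json.loads(fixed)
print(data["children"][0]["size"])  # 27
```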

How to Index & Search Nested Json in Solr 4.9.0

I want to index and search nested JSON in Solr. Here is my JSON:
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
When I try to index it, I get the error "Error parsing JSON field value. Unexpected OBJECT_START".
When we tried to use a multivalued field and index it, we were not able to search using the multivalued field. It returns "Undefined Field".
Also, please advise if I need to make any changes in the schema.xml file.
You are nesting child documents within your document. You need to use the proper syntax for nested child documents in JSON:
[
{
"id": "1",
"title": "Solr adds block join support",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "2",
"comments": "SolrCloud supports it too!"
}
]
},
{
"id": "3",
"title": "Lucene and Solr 4.5 is out",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "4",
"comments": "Lots of new features"
}
]
}
]
Have a look at this article which describes JSON child documents and block joins.
Using the format mentioned by @qux, you will get the error "Expected: OBJECT_START but got ARRAY_START at [16]" with "code": 400, because JSON starting with [....] is parsed as a JSON array.
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
The above format is correct.
Regarding searching: use the index to search for the elements of the JSON array.
A workaround is to keep the whole JSON object inside another JSON object and then index it. You can try it the following way:
{
"data": [
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
]
}
See the syntax at http://yonik.com/solr-nested-objects/
$ curl http://localhost:8983/solr/demo/update?commitWithin=3000 -d '
[
{id : book1, type_s:book, title_t : "The Way of Kings", author_s : "Brandon Sanderson",
cat_s:fantasy, pubyear_i:2010, publisher_s:Tor,
_childDocuments_ : [
{ id: book1_c1, type_s:review, review_dt:"2015-01-03T14:30:00Z",
stars_i:5, author_s:yonik,
comment_t:"A great start to what looks like an epic series!"
}
,
{ id: book1_c2, type_s:review, review_dt:"2014-03-15T12:00:00Z",
stars_i:3, author_s:dan,
comment_t:"This book was too long."
}
]
}
]'
This is supported from Solr 5.3.
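To get from the document in the question to the _childDocuments_ layout, the nested objects have to be pulled out and given ids of their own. Here is a rough sketch in Python of that reshaping. The function name to_block_join, the generated child ids and the parent_field_s marker field are all my own inventions for illustration, and every field name still has to exist in (or be matched by a dynamic field of) your schema.

```python
import json

doc = {
    "id": "44444",
    "headline": "testing US",
    "generaltags": [
        {"type": "person", "name": "Jayalalitha", "relevance": "0.334", "count": 1},
        {"type": "person", "name": "Kumar", "relevance": "0.234", "count": 1},
    ],
    "socialtags": {"type": "SocialTag", "name": "US", "importance": 2},
    "topic": {"type": "Topic", "name": "US", "score": "0.936"},
}


def to_block_join(doc, child_fields):
    """Move the nested objects of one document into _childDocuments_."""
    parent = {key: value for key, value in doc.items() if key not in child_fields}
    children = []
    for field in child_fields:
        value = doc.get(field)
        if value is None:
            continue
        entries = value if isinstance(value, list) else [value]
        for index, entry in enumerate(entries):
            child = dict(entry)
            child["id"] = f"{doc['id']}_{field}_{index}"  # hypothetical id scheme
            child["parent_field_s"] = field  # hypothetical marker field
            children.append(child)
    parent["_childDocuments_"] = children
    return parent


result = to_block_join(doc, ["generaltags", "socialtags", "topic"])
print(json.dumps([result], indent=2))  # wrapped in a list, as in the curl example
```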

Schema to load json data to google big query

I have a question for the project that we are doing...
I tried to load this JSON into Google BigQuery, but I am not able to get the fields of the votes object from the JSON input. I tried both the "record" and the "string" types in the schema.
{
"votes": {
"funny": 10,
"useful": 10,
"cool": 10
},
"user_id": "OlMjqqzWZUv2-62CSqKq_A",
"review_id": "LMy8UOKOeh0b9qrz-s1fQA",
"stars": 4,
"date": "2008-07-02",
"text": "This is what this 4-star bar is all about.",
"type": "review",
"business_id": "81IjU5L-t-QQwsE38C63hQ"
}
Also, I am not able to get the tables populated from the JSON below for the categories and neighborhoods JSON arrays. What should my schema be for these inputs? The docs unfortunately didn't help much in this case, or maybe I am not looking in the right place.
{
"business_id": "Iu-oeVzv8ZgP18NIB0UMqg",
"full_address": "3320 S Hill St\nSouth East LA\nLos Angeles, CA 90007",
"schools": [
"University of Southern California"
],
"open": true,
"categories": [
"Medical Centers",
"Health and Medical"
],
"neighborhoods": [
"South East LA"
]
}
I am able to get the regular fields, but that's about it... Any help is appreciated!
For business, it seems you want schools to be a repeated field. Your schema should be:
"schema": {
"fields": [
{
"name": "business_id",
"type": "string"
},
{
"name": "full_address",
"type": "string"
},
{
"name": "schools",
"type": "string",
"mode": "repeated"
},
{
"name": "open",
"type": "boolean"
}
]
}
For votes, it seems you want a record. Your schema should be:
"schema": {
"fields": [
{
"name": "name",
"type": "string"
},
{
"name": "votes",
"type": "record",
"fields": [
{
"name": "funny",
"type": "integer"
},
{
"name": "useful",
"type": "integer"
},
{
"name": "cool",
"type": "integer"
}
]
}
]
}
I was also stuck on this problem, but the issue I faced was that one has to remember to flag the mode as repeated for the records.
Also, please note that these cannot have a null value.
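The pattern in the schemas above (a nested object becomes a record, an array becomes a repeated field) can be written down as a small helper that derives the fields list from one sample row. This is only a sketch in Python under a few assumptions: the name infer_fields is made up, the type of a repeated field is guessed from its first element, and anything that is not a number, boolean or object (dates, nulls, empty arrays) falls back to string.

```python
import json


def infer_fields(record):
    """Derive a BigQuery-style fields list from one sample JSON object."""
    fields = []
    for name, value in record.items():
        field = {"name": name}
        sample = value
        if isinstance(value, list):
            field["mode"] = "repeated"
            sample = value[0] if value else ""  # empty array: fall back to string
        if isinstance(sample, dict):
            field["type"] = "record"
            field["fields"] = infer_fields(sample)
        elif isinstance(sample, bool):  # checked before int, since bool is an int subclass
            field["type"] = "boolean"
        elif isinstance(sample, int):
            field["type"] = "integer"
        elif isinstance(sample, float):
            field["type"] = "float"
        else:
            field["type"] = "string"
        fields.append(field)
    return fields


review = {"votes": {"funny": 10, "useful": 10, "cool": 10}, "stars": 4, "date": "2008-07-02"}
business = {"business_id": "Iu-oeVzv8ZgP18NIB0UMqg", "open": True,
            "categories": ["Medical Centers", "Health and Medical"],
            "neighborhoods": ["South East LA"]}

review_schema = {"fields": infer_fields(review)}
business_schema = {"fields": infer_fields(business)}
print(json.dumps(business_schema, indent=2))
```

For the business row this marks categories and neighborhoods as repeated strings, the same treatment the answer gives to schools.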