JQ: Set toplevel path - json

I use jq 1.5 under a Windows enviroment to modify a given large json file to extract a single Array ("Offers") from that large file:
'.Offers[] | ({Price: .AdultPriceEUR, Currency: .Currency, Link: .Deeplink, Tickettyp: .TicketClassIndex, Flightindex: .FlightIndex })'
After that i got an "unnamed" Array. But for the later processing it is necessary that the Array keeps his old "Name". I checked the documentation and found the setpath function but it is not possible to keep the Name "easy" on extraction?
shorten example of the json file:
{"Airports": [
{
"Aliases": null,
"ContinentCode": "EU",
"ContinentGroup": 1,
"CountryCode": "DE",
"CountryName": "Germany",
"DST": "",
"DisplayName": "Hamburg (HAM) Germany",
"Iata": "HAM",
"IataLink": false,
"Icao": "EDDH",
"Latitude": 53.63215,
"Longitude": 10.0041609,
"MainCityCode": "HAM",
"MainCityDisplayName": "Hamburg (HAM) Germany",
"MainCityName": "Hamburg",
"Name": "Hamburg",
"Priority": 142,
"StateCode": null,
"StateName": null,
"TimeZone": -798214753
},
{
"Aliases": null,
"ContinentCode": "AS",
"ContinentGroup": 4,
"CountryCode": "TH",
"CountryName": "Thailand",
"DST": "",
"DisplayName": "Suvarnabhumi, Bangkok (BKK) Thailand",
"Iata": "BKK",
"IataLink": false,
"Icao": "VTBS",
"Latitude": 13.6922979,
"Longitude": 100.750694,
"MainCityCode": "BKK",
"MainCityDisplayName": "Bangkok (BKK) Thailand",
"MainCityName": "Bangkok",
"Name": "Suvarnabhumi",
"Priority": 1462,
"StateCode": null,
"StateName": null,
"TimeZone": -640089798
}], "Offers": [
{
"AdultPrice": 2977.6,
"AdultPriceEUR": 2977.6,
"AdultPriceExclTax": 0.0,
"Currency": "EUR",
"FeeIndexes": [
0,
1,
2,
3,
4,
5,
6
],
"FlightIndex": 0,
"IsPaymentIncluded": true,
"MobileDeepLink": null,
"PaymentMethods": [
"American Express",
"Diners Club",
"MasterCard Credit",
"MasterCard Debit",
"Paypal",
"Visa Credit",
"Visa Debit"
],
"Score": 2501.3,
"SegmentFares": null,
"SegmentKey": -1,
"TicketClassIndex": 1,
"TotalIsCalculated": false,
"TotalPrice": 2977.6,
"TotalPriceEUR": 2977.6,
"TotalPriceExclTax": 0.0
},
{
"AdultPrice": 4697.27,
"AdultPriceEUR": 4697.27,
"AdultPriceExclTax": 0.0,
"Currency": "EUR",
"FeeIndexes": [
0,
1,
2,
3,
4,
7,
8,
5,
6
],
"FlightIndex": 1,
"IsPaymentIncluded": true,
"MobileDeepLink": null,
"PaymentMethods": [
"American Express",
"Diners Club",
"MasterCard Credit",
"MasterCard Debit",
"Paypal",
"Sofortüberweisung",
"Überweisung",
"Visa Credit",
"Visa Debit"
],
"Score": 3438.64,
"SegmentFares": null,
"SegmentKey": -1,
"TicketClassIndex": 1,
"TotalIsCalculated": false,
"TotalPrice": 4697.27,
"TotalPriceEUR": 4697.27,
"TotalPriceExclTax": 0.0
}]
}
thanks
BR
Timo

Looks like you're looking at this:
jq '{Offers:[.Offers[] | {Price: .AdultPriceEUR, Currency: .Currency, Link: .Deeplink, Tickettyp: .TicketClassIndex, Flightindex: .FlightIndex }]}' file
It just creates a new object containing an Offers table with the content you want to put it it.

Related

Parsing JSON with VBA-Web and VBA JSON Parser

Need help on parsing JSON data using VBA-Web tool (https://github.com/VBA-tools/VBA-Web) & VBA-JSON Converter
Here is my sample code that I am able to pull the order using the get method but I stumped on parsing the JSON to get the key, value pair that I want
Sub TestWC()
Dim WoocommerceClient As New WebClient
Dim Client As String
Dim Secret As String
WoocommerceClient.BaseUrl = "https://www.example.com"
Client = "ck"
Secret = "cs"
Dim Request As New WebRequest
Request.Method = Httpget
Request.AddQuerystringParam "consumer_key", Client
Request.AddQuerystringParam "consumer_secret", Secret
Dim Response As WebResponse
Request.Resource = "/wp-json/wc/v3/orders"
Set Response = WoocommerceClient.Execute(Request)
//Response.Data -> Should be equal to the JSON data
Dim test As Object
Set test = JsonConverter.ParseJson(Response.Data)
End Sub
When I run this code it will have a error popup saying object required 424 . I also did another work around where I put into the first cell the whole JSON Data using Sheet1.Range("A1").Value = Response.Content then by pulling the data from the cell into the function
Set test = JsonConverter.ParseJson(Sheet1.Range("A1").Value)
I still get the object required error, here is a Sample JSON format from Woocommerce REST API
"id": 727,
"parent_id": 0,
"number": "727",
"order_key": "wc_order_58d2d042d1d",
"created_via": "rest-api",
"version": "3.0.0",
"status": "processing",
"currency": "USD",
"date_created": "2017-03-22T16:28:02",
"date_created_gmt": "2017-03-22T19:28:02",
"date_modified": "2017-03-22T16:28:08",
"date_modified_gmt": "2017-03-22T19:28:08",
"discount_total": "0.00",
"discount_tax": "0.00",
"shipping_total": "10.00",
"shipping_tax": "0.00",
"cart_tax": "1.35",
"total": "29.35",
"total_tax": "1.35",
"prices_include_tax": false,
"customer_id": 0,
"customer_ip_address": "",
"customer_user_agent": "",
"customer_note": "",
"billing": {
"first_name": "John",
"last_name": "Doe",
"company": "",
"address_1": "969 Market",
"address_2": "",
"city": "San Francisco",
"state": "CA",
"postcode": "94103",
"country": "US",
"email": "john.doe#example.com",
"phone": "(555) 555-5555"
},
"shipping": {
"first_name": "John",
"last_name": "Doe",
"company": "",
"address_1": "969 Market",
"address_2": "",
"city": "San Francisco",
"state": "CA",
"postcode": "94103",
"country": "US"
},
"payment_method": "bacs",
"payment_method_title": "Direct Bank Transfer",
"transaction_id": "",
"date_paid": "2017-03-22T16:28:08",
"date_paid_gmt": "2017-03-22T19:28:08",
"date_completed": null,
"date_completed_gmt": null,
"cart_hash": "",
"meta_data": [
{
"id": 13106,
"key": "_download_permissions_granted",
"value": "yes"
}
],
"line_items": [
{
"id": 315,
"name": "Woo Single #1",
"product_id": 93,
"variation_id": 0,
"quantity": 2,
"tax_class": "",
"subtotal": "6.00",
"subtotal_tax": "0.45",
"total": "6.00",
"total_tax": "0.45",
"taxes": [
{
"id": 75,
"total": "0.45",
"subtotal": "0.45"
}
],
"meta_data": [],
"sku": "",
"price": 3
},
{
"id": 316,
"name": "Ship Your Idea – Color: Black, Size: M Test",
"product_id": 22,
"variation_id": 23,
"quantity": 1,
"tax_class": "",
"subtotal": "12.00",
"subtotal_tax": "0.90",
"total": "12.00",
"total_tax": "0.90",
"taxes": [
{
"id": 75,
"total": "0.9",
"subtotal": "0.9"
}
],
"meta_data": [
{
"id": 2095,
"key": "pa_color",
"value": "black"
},
{
"id": 2096,
"key": "size",
"value": "M Test"
}
],
"sku": "Bar3",
"price": 12
}
],
"tax_lines": [
{
"id": 318,
"rate_code": "US-CA-STATE TAX",
"rate_id": 75,
"label": "State Tax",
"compound": false,
"tax_total": "1.35",
"shipping_tax_total": "0.00",
"meta_data": []
}
],
"shipping_lines": [
{
"id": 317,
"method_title": "Flat Rate",
"method_id": "flat_rate",
"total": "10.00",
"total_tax": "0.00",
"taxes": [],
"meta_data": []
}
],
"fee_lines": [],
"coupon_lines": [],
"refunds": [],
"_links": {
"self": [
{
"href": "https://example.com/wp-json/wc/v3/orders/727"
}
],
"collection": [
{
"href": "https://example.com/wp-json/wc/v3/orders"
}
]
}
}```

Error parsing a specific JSON file in Snowflake with File Format

I have created a Stage and File Format in Snowflake which works with all my other JSON files except this, which throws an error:
Error parsing JSON: misplaced { File 'rooms.json.gz', line 1,
character 2 Row 0, column $1
I am using the same query that I am using for other files.
SELECT $1
FROM #MySchema.MY_STAGE/rooms.json.gz
;
What is wrong with the structure of this specific JSON file?
{
"rooms": [
{
"area": 131.49,
"longDescription": "",
"dateCreated": 1589908063390,
"reservable": false,
"name": "E249",
"remoteInfo": "",
"description": "",
"id": 2,
"type": {
"hexColor": "c16058",
"contentFlag": 1,
"cost": 0.0,
"dateCreated": 1308610520717,
"color": {},
"name": "BREAK ROOM",
"occupiable": false,
"id": 120,
"parkingSpace": false,
"dateUpdated": 1591818585913,
"typeCode": ""
},
"floor": {
"area": 25312.9878,
"dateCreated": 1589907703870,
"drawingAvailable": true,
"interiorGross": 0.0,
"name": "2",
"leaseArea": 0.0,
"id": 12,
"building": {
"address": {
"country": {
"defaultSelected": true,
"subdivisionCategoryName": "state",
"alpha2Code": "US",
"isoCode": "US",
"name": "United States of America (the)",
"id": 223
},
"city": "Some City",
"street": "Some Drive",
"postalCode": "00000",
"state": {
"country": {
"defaultSelected": true,
"subdivisionCategoryName": "state",
"alpha2Code": "US",
"isoCode": "US",
"name": "United States of America (the)",
"id": 223
},
"defaultSelected": false,
"code": "XX",
"name": "Some State",
"id": 66,
"categoryName": "state"
}
},
"code": "B2",
"dateCreated": 1589907508020,
"metric": false,
"name": "Some name",
"location": {},
"revitLink": "",
"id": 45,
"dateUpdated": 1601315841453,
"costCenters": []
},
"dateUpdated": 1600441936663
},
"capacity": 0,
"dateUpdated": 1600441936960
}
]
}
Edit: Screenshot from Notepad++ with all characters enabled

Python 3 - Extracting value from key in nested dictionary

Hoping for some pointers here as I'm striking out with my attempts. Am working with python 3.8.5.
I'm querying a Car Park booking system that returns a json list of availability. The result has lots of nested dictionaries and I'm struggling to extract just the values i want.
This is what I've been doing:
import requests
import json
enquiry = requests.post(url.....) #queries api, this works fine
results = enquiry.text #extracts response data
dictionary = json.loads(results) #convert response to python dict
If there is one slot available, I get this output (apologies for length). If there are multiple slots available, i get the same output repeated:
{
"data": {
"services": [{
"id": null,
"name": null,
"services": [{
"id": null,
"name": "Car Park",
"met": true,
"filterCount": 1,
"primary": true,
"options": [{
"id": "9",
"images": [],
"available": true,
"calendarId": "AAAA",
"templateId": "BBBB",
"capacity": 0,
"name": "Car Park Slot 3",
"sessionId": "CCCC",
"functions": null,
"startDate": "2020-08-18T13:30:00Z", <--this is what I want to extract
"endDate": "2020-08-18T14:30:00Z",
"geo": {
"lat": 0.0,
"lng": 0.0
},
"selected": false,
"linkedServices": [],
"tiers": null
}]
}],
"currentBookingId": null,
"startDate": {
"ms": 1597757400000,
"year": 2020,
"month": 8,
"day": 18,
"dayOfWeek": 2,
"time": {
"seconds": 0,
"minutes": 30,
"hours": 14,
"days": 0
}
},
"endDate": {
"ms": 1597761000000,
"year": 2020,
"month": 8,
"day": 18,
"dayOfWeek": 2,
"time": {
"seconds": 0,
"minutes": 30,
"hours": 15,
"days": 0
}
},
"sessionId": "2222222",
"chargeType": 1,
"hasPrimaryBookable": false,
"hasBookable": false,
"hasDiscounts": false,
"hasMultipleTiers": false,
"isPreferred": false,
"primaryServiceAvailable": true,
"primaryServiceId": null,
"primaryServiceType": "undefined",
"unavailableAttendees": []
}],
"bookingLimit": null
},
"success": true,
"suppress": false,
"version": "2.3.293",
"message": null,
"result": null,
"errors": null,
"code": null,
"flags": 0,
"redirect": null
}
I want to extract:
"startDate": "2020-08-18T13:30:00Z"
from each key, value pair in any of the returned slots. However, I cant work it out.
Extracting the whole of this nested dictionary would also contain the same data, but would then involve more work to tidy it up after.
"startDate": {
"ms": 1597757400000,
"year": 2020,
"month": 8,
"day": 18,
"dayOfWeek": 2,
"time": {
"seconds": 0,
"minutes": 30,
"hours": 14,
"days": 0
}
I've tried loads of dictionary.get and dictionary.items variations, but cant seem to get anywhere.
I tried something like
key = ('startDate')
availability = dictionary.get(key)
print(availability)
this just returns 'none', so think im way off
Any pointers?
Thanks in advance!
Thanks for the full data. It makes testing easier :)
I had to replace null -> None, true -> True, false -> False
slot = {
"data": {
"services": [{
"id": None,
"name": None,
"services": [{
"id": None,
"name": "Car Park",
"met": True,
"filterCount": 1,
"primary": True,
"options": [{
"id": "9",
"images": [],
"available": True,
"calendarId": "AAAA",
"templateId": "BBBB",
"capacity": 0,
"name": "Car Park Slot 3",
"sessionId": "CCCC",
"functions": None,
"startDate": "2020-08-18T13:30:00Z", # <--this is what I want to extract
................
print("Start Date:", slot['data']['services'][0]['services'][0]['options'][0]['startDate'])
Output
Start Date: 2020-08-18T13:30:00Z

JOIN in Apache Pig

I have two files consisting of json objects in two different locations on my hdfs and I need to join those two depending on a common field.
First file consists of tweet data and has 34 fields (I literally counted). It looks like:
{"contributors": null, "truncated": false, "text": "US Bank Loans And credit card capitol one business", "avl_brand_all": ["US Bank"], "is_quote_status": false , "in_reply_to_status_id": null, "id": 770150015968825344, "favorite_count": 0, "avl_num_sentences": 1, "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</ a>", "retweeted": false, "coordinates": null, "entities": {"symbols": [], "user_mentions": [], "hashtags": [], "urls": [{"url": "<link>": [51, 74], "expand ed_url": "http://usbanklogins.com/bank/", "display_url": "usbanklogins.com/bank/"}]}, "in_reply_to_screen_name": null, "in_reply_to_user_id": null, "avl_word_tags": [{"distance": 1, " word": "u", "pos": "OTHER"}, {"distance": 1, "word": "bank", "pos": "NOUN"}, {"distance": 1, "word": "loan", "pos": "NOUN"}, {"distance": 1, "word": "credit", "pos": "NOUN"}, {"distan ce": 1, "word": "card", "pos": "NOUN"}, {"distance": 1, "word": "capitol", "pos": "VERB"}, {"distance": 1, "word": "one", "pos": "OTHER"}, {"distance": 1, "word": "business", "pos": " NOUN"}], "avl_brand_1": "US Bank", "retweet_count": 0, "avl_lexicon_text": "us bank loans and credit card capitol one business", "id_str": "770150015968825344", "favorited": false, "a vl_sentences": ["us bank loans and credit card capitol one business"], "user": {"follow_request_sent": false, "has_extended_profile": false, "profile_use_background_image": true, "id" : 485610502, "verified": false, "profile_text_color": "0C3E53", "profile_image_url_https": "<link>", "profile _sidebar_fill_color": "FFF7CC", "geo_enabled": false, "entities": {"url": {"urls": [{"url": "link", "indices": [0, 22], "expanded_url": "http://www.seowithme.com", " display_url": "seowithme.com"}]}, "description": {"urls": []}}, "followers_count": 347, "profile_sidebar_border_color": "F2E195", "location": "", "default_profile_image": false, "id_s tr": "485610502", "is_translation_enabled": false, "utc_offset": null, "statuses_count": 117, "description": "seowithme", "friends_count": 959, "profile_link_color": "FF0000", "profil e_image_url": "http://pbs.twimg.com/profile_images/2334489262/qyznw08zjrgv3vlxtdvt_normal.jpeg", "notifications": false, "profile_background_image_url_https": "https://abs.twimg.com/i mages/themes/theme12/bg.gif", "profile_background_color": "BADFCD", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme12/bg.gif", "screen_name": "sajanshrestha2 2", "lang": "en", "following": false, "profile_background_tile": false, "favourites_count": 2, "name": "sajan shrestha", "url": "<link>", "created_at": "Tue Feb 07 11: 40:39 +0000 2012", "contributors_enabled": false, "time_zone": null, "protected": false, "default_profile": false, "is_translator": false, "listed_count": 0}, "avl_num_paragraphs": 1, "geo": null, "in_reply_to_user_id_str": null, "possibly_sensitive": false, "lang": "en", "created_at": "Mon Aug 29 06:44:07 +0000 2016", "avl_source": "individual", "in_reply_to_stat us_id_str": null, "place": null, "metadata": {"iso_language_code": "en", "result_type": "recent"}, "avl_num_words": 8}
The second file has json objects each having only two fields. Looks like:
{"avl_syntaxnet_tags": [{"pos_tag": "PRP", "position": "1", "dep_rel": "dep", "parent": "3", "word": "us"}, {"pos_tag": "NN", "position": "2", "dep_rel": "nn", "parent": "3", "word": "bank"}, {"pos_tag": "NNS", "position": "3", "dep_rel": "nsubj", "parent": "7", "word": "loans"}, {"pos_tag": "CC", "position": "4", "dep_rel": "cc", "parent": "3", "word": "and"}, {" pos_tag": "NN", "position": "5", "dep_rel": "nn", "parent": "6", "word": "credit"}, {"pos_tag": "NN", "position": "6", "dep_rel": "conj", "parent": "3", "word": "card"}, {"pos_tag": " VBP", "position": "7", "dep_rel": "ROOT", "parent": "0", "word": "capitol"}, {"pos_tag": "CD", "position": "8", "dep_rel": "num", "parent": "9", "word": "one"}, {"pos_tag": "NN", "pos ition": "9", "dep_rel": "dobj", "parent": "7", "word": "business"}], "avl_lexicon_text": "us bank loans and credit card capitol one business"}
Now, there is a common fiels in both the json_objects named avl_lexicon_text and I want to join these two objects using the common field.
I wrote the following Pig script for the join:
a = LOAD file1 as (a1, a2);
b = LOAD file2 as (b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15, b16, b17, b18, b19, b20, b21, b22, b23, b24, b25, b26, b27, b28, b29, b30, b31, b32, b33, b34);
x = JOIN b BY b19 FULL, a BY a2;
STORE x INTO '$SYNTAXNET_OUTPUT';
I checked b19 is the avl_lexicon_text field in b and a2 is the same in a. The results I get are really weird. When I dump x, I am not getting a new json_object that contains all the fields in a and b. I get all the objects in b followed by all the objects in a.
Can someone suggest me the right way to do this?
EDIT: Also, is there a way I can do this without loading the schema? Because sometime in future, if the format of any of the files changes (a new field gets added or an existing field gets deleted), I do not want to change the pig script. Is there a way I can do the JOIN without referencing the field position but by accessing the field name? Thanks! )
The behavior is expected since you have specified a FULL outer join.
Remove FULL to only get matching records.See here for FULL outer join.
x = JOIN b BY b19, a BY a2;

How to form JSON path

I have employers array as below; how to get employers:id and featuredReview:id using JSON expression.
"employers": [
{
"id": 194,
"name": "Target",
"website": "www.target.com",
"isEEP": false,
"exactMatch": false,
"industry": "Department, Clothing, & Shoe Stores",
"numberOfRatings": 11531,
"squareLogo": "http://media.glassdoor.com/sqll/194/target-squarelogo.png",
"overallRating": 3.2,
"ratingDescription": "OK",
"cultureAndValuesRating": "3.3",
"seniorLeadershipRating": "2.8",
"compensationAndBenefitsRating": "3.0",
"careerOpportunitiesRating": "3.0",
"workLifeBalanceRating": "3.0",
"recommendToFriendRating": "0.6",
"featuredReview": {
"id": 6613365,
"currentJob": false,
"reviewDateTime": "2015-05-15 16:32:06.997",
"jobTitle": "Executive Team Leader",
"location": "Buena Park, CA",
"jobTitleFromDb": "Executive Team Leader",
"headline": "Unrealistic expectations for leadership",
"overall": 4,
"overallNumeric": 4
},
"ceo": {
"name": "Brian Cornell",
"title": "CEO",
"numberOfRatings": 1127,
"pctApprove": 66,
"pctDisapprove": 34
}
}]
employers[0].id
employers[0].featuredReview.id