How to look up values in a large JSON file in Dart

I have a large JSON file with many objects, each with many properties. A simplified structure looks like this:
"allGadgets": [
{
"Model Code": "nokia1",
"Top Category": "Mobile Phones",
"Category": "non-iPhone",
"Brand": "Nokia",
"Device": "1",
"Price": "£ 11"
},
{
"Model Code": "nokia2",
"Top Category": "Mobile Phones",
"Category": "non-iPhone",
"Brand": "Nokia",
"Device": "2",
"Price": "£ 17",
},
{
"Model Code": "nokia3",
"Top Category": "Mobile Phones",
"Category": "non-iPhone",
"Brand": "Nokia",
"Device": "3",
"Price": "£ 10",
}] ... plus a few hundreds more of different brands and models
From this JSON list of maps I'm extracting a list of Strings for a search panel, so the user can look up their device. Each String is built from two of the JSON values, i.e.: "${item['Brand']} - ${item['Device']}"
Once the user has selected the relevant model from the dropdown search panel, I need to use this String value to give them the price from the JSON file. The question is: how do I achieve that in Dart/Flutter? If it were HTML/CSS, I would have added an extra hidden field with the model code and/or the price itself and then just made it visible.
In Flutter/Dart, however, the search panel plugin I found only accepts Strings, which the user selects and which then have to be used to look up the corresponding price value in the JSON file.
Complicating the lookup is the fact that my Strings are composed of two field values with spaces and a hyphen in between, so I would probably need to split them back into the original values and then use both for the lookup... which sounds quite convoluted...
Any thoughts on how to solve the above task would be welcome!
What I guess would help a lot is an example: looking up an object using a String (formed from two of the object's values) within a JSON with many objects. The user is presented with a subset of those objects, but just sees a couple of fields from them. The user then effectively selects a query using the String shown to them, based on the two fields. That String then allows me to look up the object and find another value (the price) in that corresponding object...

Having decoded your JSON, you have a List of Maps. Make a new data structure which is a Map of Maps (i.e. Map<String, Map<String, dynamic>>). Populate the new Map by adding each member of the List, keyed by the composite brand/device name. Now you can directly look up the device details by that name.
// Assumes the raw JSON text is in jsonString and dart:convert is imported.
final decoded = jsonDecode(jsonString) as Map<String, dynamic>;
final original = (decoded['allGadgets'] as List).cast<Map<String, dynamic>>();

final Map<String, Map<String, dynamic>> data = {};
for (final item in original) {
  final brandDeviceName = '${item['Brand']} - ${item['Device']}';
  data[brandDeviceName] = item;
}
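Once the user picks a String from the search panel, the price becomes a single map lookup. A minimal sketch (the selected value here is just an example):
final selected = 'Nokia - 1'; // whatever String the user chose in the dropdown
final price = data[selected]?['Price']; // "£ 11", or null if no such key exists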

Related

AWS Glue Crawler - DynamoDB Export - Get attribute names in schema instead of struct

I've defined a default crawler on the data directory of an export from DynamoDB. I'm trying to get it to give me a structured table instead of a table with a single column of type struct. What do I have to do to get the actual column names in there? I've tried adding custom classifiers and different path expressions but nothing seems to work, and I feel like I'm missing something really obvious.
I'm using the crawler builder inside of glue, which doesn't seem to offer much customization.
Here's the schema from the table generated by the default crawler (a single item column of type struct):
And here's one of the items that I've exported from dynamo:
{
  "Item": {
    "the_url": {
      "S": "/2021/07/06/****redacted****.html"
    },
    "as_of_when": {
      "S": "2021-09-01"
    },
    "user_hashes": {
      "SS": [
        "****redacted*****"
      ]
    },
    "user_id_hashes": {
      "SS": [
        "u3MeXDcpQm0ACYuUv6TMrg=="
      ]
    },
    "accumulated_count": {
      "N": "1"
    },
    "today_count": {
      "N": "1"
    }
  }
}
The way Athena interprets JSON data means that your data has only a single column, Item. Athena doesn't have any mechanism to map arbitrary parts of a JSON object to columns; it can only map top-level attributes to columns.
If you want other parts of the objects as columns you will have to either create a new table with transformed data, or create a view with the attributes as columns, e.g.
CREATE OR REPLACE VIEW attributes_as_top_level_columns AS
SELECT
  item.the_url.S AS the_url,
  CAST(item.as_of_when.S AS DATE) AS as_of_when,
  item.user_hashes.SS AS user_hashes,
  item.user_id_hashes.SS AS user_id_hashes,
  item.accumulated_count.N AS accumulated_count,
  item.today_count.N AS today_count
FROM items
In the example above I've also flattened away the data type keys (S, SS, N) and converted the date string to a date.
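For the first alternative, Athena's CTAS syntax can materialize the same projection as a new table. A minimal sketch, assuming Parquet output and a hypothetical S3 location:
CREATE TABLE items_flattened
WITH (format = 'PARQUET', external_location = 's3://my-bucket/items_flattened/') AS
SELECT
  item.the_url.S AS the_url,
  CAST(item.as_of_when.S AS DATE) AS as_of_when,
  item.user_hashes.SS AS user_hashes,
  item.user_id_hashes.SS AS user_id_hashes,
  item.accumulated_count.N AS accumulated_count,
  item.today_count.N AS today_count
FROM items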

Process events from Event hub using pyspark - Databricks

I have a Mongo change stream (a pymongo application) that is continuously getting the changes in collections. These change documents as received by the program are sent to Azure Event Hubs. A Spark notebook has to read the documents as they get into Event Hub and do Schema matching (match the fields in the document with spark table columns) with the spark table for that collection. If there are fewer fields in the document than in the table, columns have to be added with Null.
I am reading the events from Event Hub like below.
spark.readStream.format("eventhubs").options(**config).load()
As said in the documentation, the original message is in the 'body' column of the dataframe, which I am casting to string. Now I have the Mongo document as a JSON string in a streaming dataframe. I am facing the issues below.
I need to extract the individual fields in the Mongo document. This is needed to compare which fields are present in the spark table and which are not in the Mongo document. I saw a function called get_json_object(col, path). This essentially returns a string again, and I cannot individually select all the columns.
If from_json can be used to convert the JSON string to a Struct type, I cannot specify the schema, because we have close to 70 collections (and a corresponding number of spark tables), each sending Mongo docs with anywhere from 10 to 450 fields.
If I could convert the JSON string in the streaming dataframe to a JSON object whose schema can be inferred by the dataframe (something like how read.json can do), I could use the SQL * representation to extract the individual columns, do a few manipulations, and then save the final dataframe to the spark table. Is it possible to do that? What is the mistake I am making?
Note: a streaming DF doesn't support the collect() method to individually extract the JSON string from the underlying RDD and do the necessary column comparisons. Using Spark 2.4 & Python in the Azure Databricks environment 4.3.
Below is the sample data I get in my notebook after reading the events from event hub and casting it to string.
{
  "documentKey": "5ab2cbd747f8b2e33e1f5527",
  "collection": "configurations",
  "operationType": "replace",
  "fullDocument": {
    "_id": "5ab2cbd747f8b2e33e1f5527",
    "app": "7NOW",
    "type": "global",
    "version": "1.0",
    "country": "US",
    "created_date": "2018-02-14T18:34:13.376Z",
    "created_by": "Vikram SSS",
    "last_modified_date": "2018-07-01T04:00:00.000Z",
    "last_modified_by": "Vikram Ganta",
    "last_modified_comments": "Added new property in show_banners feature",
    "is_active": true,
    "configurations": [
      {
        "feature": "tip",
        "properties": [
          {
            "id": "tip_mode",
            "name": "Delivery Tip Mode",
            "description": "Tip mode switches the display of tip options between percentage and amount in the customer app",
            "options": [
              "amount",
              "percentage"
            ],
            "default_value": "tip_percentage",
            "current_value": "tip_percentage",
            "mode": "multiple or single"
          },
          {
            "id": "tip_amount",
            "name": "Tip Amounts",
            "description": "List of possible tip amount values",
            "default_value": 0,
            "options": [
              {
                "display": "No Tip",
                "value": 0
              }
            ]
          }
        ]
      }
    ]
  }
}
I would like to separate out and take the fullDocument from the sample above. When I use get_json_object, I get the fullDocument in another streaming dataframe as a JSON string, not as an object. As you can see, there are some array types in fullDocument which I can explode (the documentation says explode is supported in streaming DFs, but I haven't tried it), but there are also some objects (struct types) from which I would like to extract the individual fields. I cannot use the SQL '*' notation because what get_json_object returns is a string and not the object itself.
It's convincing that JSON with this much schema variety is better handled with the schema specified explicitly. My takeaway is that in a streaming environment, with very different schemas in the incoming stream, it's always better to specify the schema. So I am proceeding with get_json_object and from_json, reading the schema from a file.
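A minimal sketch of that approach, assuming each collection's schema was previously saved to a file as a StructType JSON dump (the file path, config dict, and column names here are assumptions):
import json

from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType

# Load the schema saved earlier, e.g. via json.dump(batch_df.schema.jsonValue(), f)
with open("/dbfs/schemas/configurations.json") as f:
    schema = StructType.fromJson(json.load(f))

stream = (spark.readStream.format("eventhubs").options(**config).load()
          .withColumn("json", col("body").cast("string"))
          .withColumn("doc", from_json(col("json"), schema))
          .select("doc.*"))  # individual fields are now real columns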

Cannot convert list in Excel Power Query Editor

I am trying to convert a JSON API response into a table in Excel using the Power Query functionality. I currently get an error when I try to put customerOrderHistory, which is a list, into a delimited list, since I don't want to create extra rows for a list if I can avoid it.
If possible I would like to either:
Print the customerOrderHistory list into the cell in the JSON format that it is already in
OR
Create a delimited list of the values, as the lists only contain one entry at the moment
The JSON test file looks like this:
{
  "computerid": "1",
  "total": 1,
  "results": [
    {
      "computerid": "1",
      "customerOrderHistory": [
        {
          "orderId": "1",
          "channelId": null,
          "agentId": null,
          "orderItems": 1
        }
      ]
    }
  ]
}
Thanks
You can use Json.FromValue to convert the list into a Binary value, and then use Text.FromBinary to get back the string representation. If you can dig down deep enough to get to the customerOrderHistory field it would look like Text.FromBinary(Json.FromValue(result[customerOrderHistory])).
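A minimal end-to-end sketch in the Power Query editor, assuming the test file's path and the step of expanding results into rows (the step names here are illustrative):
let
    Source = Json.Document(File.Contents("C:\test.json")),
    Results = Source[results],
    AsTable = Table.FromList(Results, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    Expanded = Table.ExpandRecordColumn(AsTable, "Column1", {"computerid", "customerOrderHistory"}),
    HistoryAsText = Table.TransformColumns(Expanded, {{"customerOrderHistory", each Text.FromBinary(Json.FromValue(_)), type text}})
in
    HistoryAsText
This keeps one row per result and leaves customerOrderHistory in the cell as its original JSON text.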

Order of performing sorting on a big JSON object

I have a big JSON object with a list of "tickets". The schema looks like below:
{
  "Artist": "Artist1",
  "Tickets": [
    {
      "Id": 1,
      "Attr2Array": [
        {
          "Att41": 1,
          "Att42": "A",
          "Att43": null
        },
        {
          "Att41": 1,
          "Att42": "A",
          "Att43": null
        }
      ],
      ... (more properties)
      "Price": "20",
      "Description": "I m a ticket"
    },
    {
      "Id": 4,
      "Attr2Array": [
        {
          "Att41": 1,
          "Att42": "A",
          "Att43": null
        },
        {
          "Att41": 1,
          "Att42": "A",
          "Att43": null
        }
      ],
      ... (more properties)
      "Price": "30",
      "Description": "I m a ticket"
    }
  ]
}
Each item in the list has around 25-30 properties (some simple types, others complex arrays of nested objects).
I have to read the object from an API endpoint and extract only "Id" and "Description", but they need to be sorted by "Price", which is an int for example.
In what order should I proceed with this data manipulation?
Should I take the JSON object, deserialise it into another object with just those 2 properties (which I need) and THEN perform an ascending sort on "Price"?
Please note that after I have the sorted list I will have to convert it back to a JSON list, because the front end consumes JSON after all.
What I don't like about this approach is the cycle of serialisation and deserialisation that happens.
or
I perform a sort on the JSON object first (using, for example, a binary/bubble sort) and then use the object to create a strongly typed (deserialised) object with just those 2 properties, and then serialise it back to pass to the front end.
I don't know how performant the bubble sort will be and whether I will get any gain in performance for large chunks of data processing.
I also need to keep in mind that this implementation should be able to take other properties into account, like "availabilitydate", because at a later date this front end could add one more filter like "availabilitydate" asc.
Any help is much appreciated, thanks!
You can deserialize your JSON string (or file) using the Microsoft System.Web.Extensions assembly and its JavaScriptSerializer class.
First, you must have classes associated with your JSON. To create the classes, copy your JSON sample data and, in Visual Studio, go to Edit / Paste Special / Paste JSON As Classes.
Next, use this sample to deserialize the JSON string to typed objects and sort all Tickets by the Price property using LINQ.
String json = System.IO.File.ReadAllText(@"C:\Data.json");
var root = new System.Web.Script.Serialization.JavaScriptSerializer().Deserialize<Rootobject>(json);
// Price is a string in the JSON, so parse it to get a numeric rather than lexicographic sort
var sortedTickets = root.Tickets.OrderBy(t => int.Parse(t.Price));
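To finish the round trip the question describes (only Id and Description going back to the front end), a minimal sketch could project into an anonymous type and re-serialize:
// Keep only the two fields the front end needs, already sorted by price
var slim = sortedTickets.Select(t => new { t.Id, t.Description }).ToList();
var outputJson = new System.Web.Script.Serialization.JavaScriptSerializer().Serialize(slim);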

How do I write a query using the Elasticsearch Java API where the elements in a JSON list contain some of the elements in a List<>?

I have this JSON-object:
{
  "title": "Food",
  "Dishes": [
    "Pancakes",
    "Tacos"
  ],
  "rating": "5"
}
And I need to write a query using Elasticsearch's Java API that will match JSON-objects where the "Dishes" field contains either the string "Pancakes" or "Soup" (hence the JSON above should match). The elements that I search for are stored in a list like this:
List<String> findElems = Arrays.asList("Pancakes", "Soup");
I have tried to use QueryBuilders, but I can't figure out how to write a query that matches JSON objects where the "Dishes" list contains one or more of the elements in findElems.
Use a terms query to do that, passing your list of search values directly:
// Matches documents whose Dishes field contains at least one of these values
List<String> findElems = Arrays.asList("Pancakes", "Soup");
QueryBuilders.termsQuery("Dishes", findElems);
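Note that a terms query matches exact indexed terms, so if Dishes is an analyzed text field you may need to target its keyword sub-field (e.g. "Dishes.keyword"). A minimal sketch of executing the query with the high-level REST client (the index name and client setup are assumptions):
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.builder.SearchSourceBuilder;

SearchRequest request = new SearchRequest("food"); // hypothetical index name
request.source(new SearchSourceBuilder().query(QueryBuilders.termsQuery("Dishes", findElems)));
SearchResponse response = client.search(request, RequestOptions.DEFAULT); // client: a RestHighLevelClient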
Hope it helps!