Convert record with Power Query and JSON

I'm using Power Query in Excel 2013 to convert a huge JSON file (more than 100 MB) to a plain Excel sheet.
All the fields convert correctly except one, which is recognized as a record. The other fields hold either a fixed text value or comma-separated values, so their conversion is easy, but this field contains a nested JSON record structure, i.e. "Field": "Value".
This is an extract of the file:
{
  "idTrad": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "images": {
    "1": "SE1.JPG",
    "2": "SE2.JPG"
  },
  "date": "2018-09-22",
  "category": "MD",
  "value": "Original text",
  "language": "IT",
  "contexts": [
    ""
  ],
  "label": "Translated text",
  "variantes": "1,23,45,23,32,232,2315,23131",
  "theme": [
    "XX_XXX"
  ]
}
The problematic field is "images" because it's recognized as a record; in the resulting table I end up with this situation:
(screenshot: https://i.stack.imgur.com/EnHow.png)
My query so far is:
let
    Source = Json.Document(File.Contents("filename.json")),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Column1 développé" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"value", "contexts", "theme", "variantes", "category", "label", "language", "idTrad", "images", "date"}, {"Column1.value", "Column1.contexts", "Column1.theme", "Column1.variantes", "Column1.category", "Column1.label", "Column1.language", "Column1.idTrad", "Column1.images", "Column1.date"}),
    #"Valeurs extraites" = Table.TransformColumns(#"Column1 développé", {"Column1.contexts", each Text.Combine(List.Transform(_, Text.From), ","), type text}),
    #"Valeurs extraites1" = Table.TransformColumns(#"Valeurs extraites", {"Column1.theme", each Text.Combine(List.Transform(_, Text.From), ","), type text})
in
    #"Valeurs extraites1"
I would like the images field to contain a text representation of the record, something like "1: SE1.JPG, 2: SE2.JPG". Any ideas?

Sure, you can even do it in one step! If you convert a record to a table (Record.ToTable), it creates a table where the names of the record's fields are in a column called "Name" and the values are in a column called "Value". This way you get your "1", "2", etc. from the JSON file. From there you can combine the columns into the text you want, then convert and combine the list as you did for the rest of your columns.
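For the sample record above, Record.ToTable should give an intermediate table along these lines:
Name  Value
1     SE1.JPG
2     SE2.JPG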
= Table.TransformColumns(#"Valeurs extraites1", {"Column1.images",
    each
        Text.Combine(
            Table.ToList(
                Table.CombineColumns(
                    Record.ToTable(_),
                    {"Name", "Value"},
                    Combiner.CombineTextByDelimiter(": ", QuoteStyle.None),
                    "Merged"
                )
            ),
            ", "
        )
})
I wouldn't expect Record.ToTable to localize its column naming, but you could test it by just converting the record to a table first to see what it does:
Table.TransformColumns(#"Valeurs extraites1", {"Column1.images", each Record.ToTable(_)})

Related

Appending JSON array to existing JSON column

I have some data to be inserted into a MySQL column with the JSON datatype (blob_forms).
The value of the fields key is populated asynchronously, and if the document has multiple pages I need to append the data onto the existing row.
The table looks like this:
document
    document_id INT
    text_data JSON
    blob_forms JSON
    blob_data JSON
The first chunk of data is inserted correctly; here is a sample:
{"fields": [
{"key": "Date", "value": "01/01/2020"},
{"key": "Number", "value": "xxx 2416 xx"},
{"key": "Invoice Date", "value": "xx/xx/2020"},
{"key": "Reg. No.", "value": "7575855"},
{"key": "VAT", "value": "1,000.00"}
]}
I am using a Lambda function (Python) to handle the database insert, with this query:
insertString = json.dumps(newObj)
sql = "INSERT INTO `document` (`document_id`, `blob_forms`) VALUES (%s, %s) ON DUPLICATE KEY UPDATE `blob_forms` = %s"
cursor.execute(sql, (self.documentId, insertString, insertString))
conn.commit()
The problem is, I also want to do an UPDATE, so that if blob_forms already has a value, the new items in the fields array get added to the existing object's fields array.
So basically, reuse the original data input a second time, so that if it is sent again with the same document_id it appends to any existing data in blob_forms but preserves the JSON structure.
(Please note other processes write to this table, and possibly this row, due to the async nature; the data for the columns can be written in any order, but the document_id ties them all together.)
My failed attempt was something like this:
SET #j = {"fields": [{"key": "Date", "value": "01/01/2020"},{"key": "Number", "value": "xxx 2416 xx"},{"key": "Invoice Date", "value": "xx/xx/2020"},{"key": "Reg. No.", "value": "7575855"},{"key": "VAT", "value": "1,000.00"}]}
INSERT INTO `document` (`document_id`, `blob_forms`) VALUES ('DFGHJKfghj45678', #j) ON DUPLICATE KEY UPDATE blob_forms = JSON_INSERT(blob_forms, '$', #j)
I'm not sure you can get the results you want with one clean query in MySQL. My advice would be to make the changes to the array on the client side (or wherever) and update the entire field, without delving into whether there is an existing value or not. I architect all of my APIs in this way to keep the database interactions clean and fast.
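A minimal sketch of that client-side merge, assuming a PyMySQL-style connection and the names from the question (append_blob_forms is an illustrative helper, not an existing API):
import json

def append_blob_forms(conn, document_id, new_obj):
    with conn.cursor() as cursor:
        # Read the current blob_forms value, if any, for this document.
        cursor.execute(
            "SELECT `blob_forms` FROM `document` WHERE `document_id` = %s",
            (document_id,),
        )
        row = cursor.fetchone()
        if row and row[0]:
            merged = json.loads(row[0])
            # Append the new entries to the existing fields array.
            merged["fields"].extend(new_obj["fields"])
        else:
            merged = new_obj
        payload = json.dumps(merged)
        # Upsert the merged document in one statement.
        cursor.execute(
            "INSERT INTO `document` (`document_id`, `blob_forms`) VALUES (%s, %s) "
            "ON DUPLICATE KEY UPDATE `blob_forms` = %s",
            (document_id, payload, payload),
        )
    conn.commit()
Given the async writers mentioned above, the SELECT and the upsert should run inside one transaction (e.g. SELECT ... FOR UPDATE) so concurrent appends don't overwrite each other.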
So far this looks closest:
SET @j = '{"fields": [{"key": "Date", "value": "01/01/2020"},{"key": "Number", "value": "xxx 2416 xx"},{"key": "Invoice Date", "value": "xx/xx/2020"},{"key": "Reg. No.", "value": "7575855"},{"key": "VAT", "value": "1,000.00"}]}';
INSERT INTO `document` (`document_id`, `blob_forms`) VALUES ('DFGHJKfghj45678', @j) ON DUPLICATE KEY UPDATE blob_forms = JSON_MERGE_PRESERVE(blob_forms, @j)
(JSON_MERGE_PRESERVE concatenates arrays stored under the same key, which gives the append behaviour wanted here.)

Extract data from a JSON file using Python

Say I have a JSON entry as follows (the JSON file was generated by fetching data from a Firebase DB):
[{"goal_savings": 0.0, "social_id": "", "score": 0, "country": "BR", "photo": "http://graph.facebook", "id": "", "plates": 3, "rcu": null, "name": "", "email": ".", "provider": "facebook", "phone": "", "savings": [], "privacyPolicyAccepted": true, "currentRole": "RoleType.PERSONAL", "empty_lives_date": null, "userId": "", "authentication_token": "-------", "onboard_status": "ONBOARDING_WIZARD", "fcmToken": ----------", "level": 1, "dni": "", "social_token": "", "lives": 10, "bills": [{"date": "2020-12-10", "role": "RoleType.PERSONAL", "name": "Supermercado", "category": "feeding", "periodicity": "PeriodicityType.NONE", "value": 100.0"}], "payments": [], "goals": [], "goalTransactions": [], "incomes": [], "achievements": [{"created_at":", "name": ""}]}]
How do I extract the content corresponding to 'value', which is present inside the 'bills' column? Is there any way to do this?
My Python code is as follows. With this I was only able to get the data within the bills column, but I need only the entry corresponding to 'value' inside bills.
import json

filedata = open('firebase-dataset.json', 'r')
data = json.load(filedata)

listoffields = []  # to produce a list of the fields
for dic in data:
    try:
        listoffields.append(dic['bills'])  # only non-essential bill categories.
    except KeyError:
        pass
print(listoffields)
The JSON you posted contains misplaced quotes.
I think you are trying to extract the value of the 'value' key within bills.
Try this:
print(listoffields[0][0]['value'])
which will print 100.0 as a str; use float() to use it in calculations.
---edit---
Say the JSON you have contains many JSON objects separated by commas, as in:
[{ first-entry },{ second-entry },{ third.. }, ....and so on]
and you want to find the value of each bill in each JSON object. Maybe the code below will work:
bill_value_list = []  # to store the 'value' of each bill
for bill_list in listoffields:
    bill_value_list.append(float(bill_list[0]['value']))  # bill_list[0] contains the complete bill dictionary
print(bill_value_list)
print(sum(bill_value_list))  # do something useful
Paste it after the code you posted (no changes to your code, since it always works :-)).
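If a record can contain more than one bill, a hedged variation of the same idea (reusing listoffields from above) collects every bill's value instead of only the first per record:
bill_values = [
    float(bill['value'])        # each value is a string like "100.0"
    for bills in listoffields   # one list of bills per record
    for bill in bills
    if 'value' in bill          # skip bills without a value
]
print(bill_values)
print(sum(bill_values))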

Filter valid and invalid records in Spark

I have a PySpark DataFrame with 'n' rows, each row having one column, result.
The content of the result column is JSON:
{"crawlDate": "2019-07-03 20:03:44", "Code": "200", "c1": "XYZ", "desc": "desc", "attributes": {"abc":123, "def":456}}
{"crawlDate": "2019-07-04 20:03:44", "Code": "200", "c1": "ABC", "desc": "desc1"}
{"crawlDate": "2019-07-04 20:03:44", "Code": "200", "c1": "ABC", "desc": "desc1", "attributes": {"abc":456, "def":123}}
Now I want to check how many records (rows) have an attributes element and how many don't.
I tried the array_contains, filter, and explode functions in Spark, but didn't get the results.
Any suggestions, please?
import org.apache.spark.sql.functions._
import spark.implicits._  // for the $"column" syntax

df.select(get_json_object($"result", "$.attributes").alias("attributes"))
  .filter(col("attributes").isNotNull)
  .count()
With this logic we can get the count of records where attributes exists.
For your reference, please read this:
https://docs.databricks.com/spark/latest/dataframes-datasets/complex-nested-data.html
Another solution, if your input is in JSON format, is:
val df = spark.read.json("path of json file")
df.filter(col("attributes").isNotNull).count()
A similar API is available in Python.
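For instance, a hedged PySpark sketch of the same check, assuming the JSON strings live in a column named result as in the question:
from pyspark.sql.functions import col, get_json_object

# Rows whose JSON contains an attributes element.
with_attrs = df.select(
    get_json_object(col("result"), "$.attributes").alias("attributes")
)
success_count = with_attrs.filter(col("attributes").isNotNull).count()
failure_count = df.count() - success_count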
The simple logic below worked after a lot of struggle:
total_count = old_df.count()
# Plain substring match on the raw JSON text of the result column.
new_df = old_df.filter(old_df.result.contains("attributes"))
success_count = new_df.count()
failure_count = total_count - success_count

How to query a MySQL JSON array

A MySQL table has a JSON column containing a large amount of JSON data.
For example:
SELECT nodes FROM Table LIMIT 1;
results in:
'{"data": {"id": "Node A", "state": true, "details": [{"value": "Value","description": "Test"}, {"value": "Value2", "description": "Test2"}, {"value": "Value 7", "description": "Test 7"}, {"value": "Value 9", "description": "Test 9"}]}}'
How can I write queries that return rows in accordance with the following examples:
Where "Node A" state is true. In this case "Node A" is the value of the key "id", and "state" contains true or false.
Where "value" is "Value2" or "description" is "Test2". Note that these values are in a list of key-value pairs.
I doubt you can achieve the above with a direct MySQL query. You will have to load the string data from the MySQL DB, then parse that string into a JSON object on which you can perform your custom query operations to get your output.
But in this case I would suggest using MongoDB, which would be an ideal database storage solution here and lets you make direct queries.
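A minimal sketch of that load-then-filter approach in Python, assuming a PyMySQL-style connection (the table and column names, and both filter examples, come from the question):
import json

def matching_rows(conn):
    matches = []
    with conn.cursor() as cursor:
        cursor.execute("SELECT nodes FROM `Table`")
        for (nodes,) in cursor.fetchall():
            data = json.loads(nodes)["data"]
            # Example 1: "Node A" rows whose state is true.
            if data.get("id") == "Node A" and data.get("state") is True:
                matches.append(data)
            # Example 2: rows where any details entry has value "Value2"
            # or description "Test2".
            elif any(d.get("value") == "Value2" or d.get("description") == "Test2"
                     for d in data.get("details", [])):
                matches.append(data)
    return matches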

Postgres replace an array inside a JSONB field

I have a table where the data field has the JSONB type. Among many other things, the data JSON value contains a notes key under which I store an array of notes.
Each note has (at least) two fields: title and content.
Sometimes I have to replace the whole list of notes with a different list, but not affecting any other fields inside my json record.
I tried something like this:
UPDATE mytable
SET data = jsonb_set("data", '{notes}', '[{ "title": "foo1" "content": "bar"'}, { "title": "foo2" "content": "bar2"}]', true)
WHERE id = ?
And I get an exception (through a JS wrapper):
error: invalid input syntax for type json
How should I correctly use the jsonb_set function?
You have a stray single quote and missing commas in your JSON payload.
Instead of
[{ "title": "foo1" "content": "bar"'}, { "title": "foo2" "content": "bar2"}]
                  ^                ^                    ^
it should rather look like
[{ "title": "foo1", "content": "bar"}, { "title": "foo2", "content": "bar2"}]