Parsing JSON in Power BI with a Custom Structure

I am attempting to parse JSON data in Power BI that has a structure like the following:
{"name":{"0":"Jerry","1":"Ron","2":"Sally","3":"Sue"},"grade":{"0":78,"1":99,"2":88,"3":97}}
So far I have tried Transform -> To Table -> Parsed JSON, but that returns Table 1 below. Expanding its rows gives Table 2. If I expand again, every element in my data ends up in its own column (Table 3), which is not the layout I want. I need the data in the format shown in Table 4. Is there a different way to parse data with this custom structure?
Table 1:
| Column1 |
| -------- |
| *Record* |
Table 2:
| Name | Score |
| -------- | -------------- |
| *Record* | *Record* |
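Table 3:
| Name.0 | Name.1 | Name.2 | Name.3 | Grade.0 | Grade.1 | Grade.2 | Grade.3 |
| ------ | ------ | ------ | ------ | ------- | ------- | ------- | ------- |
| Jerry | Ron | Sally | Sue | 78 | 99 | 88 | 97 |
Table 4:
| Name | Score |
| ----- | ----- |
| Jerry | 78 |
| Ron | 99 |
| Sally | 88 |
| Sue | 97 |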

let
    Source = Json.Document("{""name"":{""0"":""Jerry"",""1"":""Ron"",""2"":""Sally"",""3"":""Sue""},""grade"":{""0"":78,""1"":99,""2"":88,""3"":97}}"),
    #"Converted to Table" = Record.ToTable(Source),
    #"Expanded Value" = Table.ExpandRecordColumn(#"Converted to Table", "Value", {"0", "1", "2", "3"}, {"0", "1", "2", "3"}),
    #"Transposed Table" = Table.Transpose(#"Expanded Value"),
    #"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"name", type text}, {"grade", Int64.Type}})
in
    #"Changed Type"
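This results in a table with the columns name and grade and one row per person, which is the layout of Table 4.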
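The query above lists the field names "0" to "3" explicitly. To make the reshape itself easier to follow, here is a minimal sketch of the same pivot in Python (not M). It assumes every inner key is an integer written as a string; `columns_to_rows` is a hypothetical helper name, not part of Power BI.

```python
import json

def columns_to_rows(doc):
    """Turn {"col": {"0": v0, "1": v1, ...}, ...} into a list of row dicts."""
    # Collect every index key that appears in any column, sorted numerically.
    keys = sorted({k for col in doc.values() for k in col}, key=int)
    # Build one row per index; a column that lacks an index yields None.
    return [{name: col.get(k) for name, col in doc.items()} for k in keys]

raw = ('{"name":{"0":"Jerry","1":"Ron","2":"Sally","3":"Sue"},'
       '"grade":{"0":78,"1":99,"2":88,"3":97}}')
rows = columns_to_rows(json.loads(raw))
for row in rows:
    print(row)
```

Because the index keys are discovered from the data, the sketch does not need to know in advance how many entries each column holds.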

Related

JOINS in Debezium : MySQL to Elasticsearch

I've been trying to set up a MySQL to Elasticsearch data pipeline for real-time data replication.
The MySQL database has around 10 highly normalized tables. In Elasticsearch, however, I need all of the data from these tables in a single index, similar to the output of a big compound JOIN query. I have searched a lot without finding a way to do this, please help 🙂
(Changing the DB schema isn't feasible, as a lot of other services depend on it.)
For example :
Input from MySQL:
Table: main_profile
+--------+------+
| name | city |
+--------+------+
| Edward | 1 |
| Jake | 9 |
+--------+------+
Table: city_master
+---------+----------+
| city_id | name |
+---------+----------+
| 1 | New York |
| 9 | Tampa |
+---------+----------+
Document stored in Elasticsearch:
{
  "0": {
    "name": "Edward",
    "city": "New York"
  },
  "1": {
    "name": "Jake",
    "city": "Tampa"
  }
}

You can use Kafka Streams to aggregate data from two different topics into a unified message. See this example with a Debezium source: https://github.com/debezium/debezium-examples/tree/master/kstreams
The target in the example is MongoDB, but the principle is the same.
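To illustrate the principle only, here is a rough sketch in plain Python. It is not the Kafka Streams API and not the code from the linked example. It assumes each change event carries the full row as a dict, and it keys main_profile by name because the sample table shows no other key; `join_profiles` is a hypothetical name.

```python
def join_profiles(profile_events, city_events):
    """Join the latest state of two change streams on profile.city == city_id."""
    cities = {}    # city_id -> city name (latest state of city_master)
    profiles = {}  # profile name -> city_id (latest state of main_profile)
    for event in city_events:
        cities[event["city_id"]] = event["name"]
    for event in profile_events:
        profiles[event["name"]] = event["city"]
    # Emit one denormalized document per profile whose city is known.
    return [
        {"name": name, "city": cities[city_id]}
        for name, city_id in profiles.items()
        if city_id in cities
    ]

docs = join_profiles(
    [{"name": "Edward", "city": 1}, {"name": "Jake", "city": 9}],
    [{"city_id": 1, "name": "New York"}, {"city_id": 9, "name": "Tampa"}],
)
print(docs)
```

A streaming join keeps the two lookup tables up to date as events arrive and re-emits the affected documents, rather than rebuilding everything in one pass as this sketch does.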

MySQL nested JSON column search and extract sub JSON

I have a MySQL table authors with columns id, name and published_books, where published_books is a JSON column. Sample data:
id | name | published_books
-----------------------------------------------------------------------
1 | Tina | {
| | "17e9bf8f": {
| | "name": "Book 1",
| | "tags": [
| | "self Help",
| | "Social"
| | ],
| | "language": "English",
| | "release_date": "2017-05-01"
| | },
| | "8e8b2470": {
| | "name": "Book 2",
| | "tags": [
| | "Inspirational"
| | ],
| | "language": "English",
| | "release_date": "2017-05-01"
| | }
| | }
-----------------------------------------------------------------------
2 | John | {
| | "8e8b2470": {
| | "name": "Book 4",
| | "tags": [
| | "Social"
| | ],
| | "language": "Tamil",
| | "release_date": "2017-05-01"
| | }
| | }
-----------------------------------------------------------------------
3 | Keith | {
| | "17e9bf8f": {
| | "name": "Book 5",
| | "tags": [
| | "Comedy"
| | ],
| | "language": "French",
| | "release_date": "2017-05-01"
| | },
| | "8e8b2470": {
| | "name": "Book 6",
| | "tags": [
| | "Social",
| | "Life"
| | ],
| | "language": "English",
| | "release_date": "2017-05-01"
| | }
| | }
-----------------------------------------------------------------------
As you can see, the published_books column holds nested JSON data (one level deep). The keys are dynamic UUIDs and each value is a JSON object with the book's details.
I want to search for books that match certain conditions and return only the JSON for those books.
The query I've written:
select JSON_EXTRACT(published_books, '$.*') from authors
where JSON_CONTAINS(published_books->'$.*.language', '"English"')
and JSON_CONTAINS(published_books->'$.*.tags', '["Social"]');
This query performs the search but returns the entire published_books JSON. I want only the JSON of the matching books.
The expected result:
result
--------
"17e9bf8f": {
"name": "Book 1",
"tags": [
"self Help",
"Social"
],
"language": "English",
"release_date": "2017-05-01"
}
-----------
"8e8b2470": {
"name": "Book 6",
"tags": [
"Social",
"Life"
],
"language": "English",
"release_date": "2017-05-01"
}

There is no JSON function yet that filters elements of a document or array with "WHERE"-like logic.
But this is a task that some people using JSON data may want to do, so the solution MySQL provides is the JSON_TABLE() function, which transforms the JSON document into a format as if you had stored your data in a normal table. You can then apply a standard SQL WHERE clause to the fields it returns.
This function is not available in MySQL 5.7, but it is if you upgrade to MySQL 8.0.
select authors.id, authors.name, books.* from authors,
  json_table(published_books, '$.*'
    columns(
      bookid for ordinality,
      name text path '$.name',
      tags json path '$.tags',
      language text path '$.language',
      release_date date path '$.release_date')
  ) as books
where books.language = 'English'
  and json_search(tags, 'one', 'Social') is not null;
+----+-------+--------+--------+-------------------------+----------+--------------+
| id | name | bookid | name | tags | language | release_date |
+----+-------+--------+--------+-------------------------+----------+--------------+
| 1 | Tina | 1 | Book 1 | ["self Help", "Social"] | English | 2017-05-01 |
| 3 | Keith | 2 | Book 6 | ["Social", "Life"] | English | 2017-05-01 |
+----+-------+--------+--------+-------------------------+----------+--------------+
Note that nested JSON arrays are still difficult to work with, even with JSON_TABLE(). In this example, I exposed the tags as a JSON array and then used JSON_SEARCH() to find the tag you wanted.
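For comparison, the same unravel-then-filter idea can be sketched in application code, which may be an option where JSON_TABLE() is not available. This is only an illustration in Python; `matching_books` is a hypothetical helper, and it assumes the published_books value has already been fetched as a JSON string.

```python
import json

def matching_books(published_books, language, tag):
    """Return {book_id: book} for books with the given language and tag."""
    books = json.loads(published_books)
    return {
        book_id: book
        for book_id, book in books.items()
        if book.get("language") == language and tag in book.get("tags", [])
    }

# Tina's published_books value from the sample data, as one JSON string.
tina = json.dumps({
    "17e9bf8f": {"name": "Book 1", "tags": ["self Help", "Social"],
                 "language": "English", "release_date": "2017-05-01"},
    "8e8b2470": {"name": "Book 2", "tags": ["Inspirational"],
                 "language": "English", "release_date": "2017-05-01"},
})
result = matching_books(tina, "English", "Social")
print(json.dumps(result, indent=2))
```

Unlike the SQL version, this keeps the UUID key of each matching book, which is what the expected result in the question shows.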
I agree with Rick James: you might as well store the data in normalized tables and columns. You may think that using JSON will save you some work, but it won't. It might be more convenient to store the data as a single JSON document instead of multiple rows across several tables, but you just have to unravel the JSON again before you can query it the way you want.
Furthermore, if you store data in JSON, you will have to solve this sort of JSON_TABLE() expression every time you want to query the data. That's going to make a lot more work for you on an ongoing basis than if you had stored the data normally.
Frankly, I have yet to see a question on Stack Overflow about using JSON with MySQL that wouldn't lead to the conclusion that storing data in relational tables is a better idea than using JSON, if the structure of the data doesn't need to vary.

You are approaching the task backwards.
Do the extraction as you insert the data. Insert into a small number of tables (Authors, Books, Tags, and maybe a couple more) and build relations between them. No JSON is needed in this database.
The result is an easy-to-query and fast database. However, it requires learning about RDBMS and SQL.
JSON is useful when the data is a collection of random stuff. Your JSON is very regular, hence the data fits very nicely into RDBMS technology. In that case, JSON is merely a standard way to serialize the data. But it should not be used for querying.
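As a rough sketch of what such a layout could look like, the following uses Python's built-in sqlite3 module rather than MySQL so that it runs on its own. The table and column names are assumptions, tags are stored as plain text in a link table instead of a separate Tags table, and books get a surrogate integer id because the sample UUIDs repeat across authors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    name TEXT NOT NULL,
    language TEXT NOT NULL
);
CREATE TABLE book_tags (
    book_id INTEGER NOT NULL REFERENCES books(id),
    tag TEXT NOT NULL,
    PRIMARY KEY (book_id, tag)
);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(1, "Tina"), (2, "John"), (3, "Keith")])
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", [
    (1, 1, "Book 1", "English"), (2, 1, "Book 2", "English"),
    (3, 2, "Book 4", "Tamil"), (4, 3, "Book 5", "French"),
    (5, 3, "Book 6", "English"),
])
conn.executemany("INSERT INTO book_tags VALUES (?, ?)", [
    (1, "self Help"), (1, "Social"), (2, "Inspirational"), (3, "Social"),
    (4, "Comedy"), (5, "Social"), (5, "Life"),
])

# English books tagged "Social": plain joins and a WHERE clause, no JSON functions.
rows = conn.execute("""
    SELECT a.name, b.name
    FROM books AS b
    JOIN authors AS a ON a.id = b.author_id
    JOIN book_tags AS t ON t.book_id = b.id
    WHERE b.language = ? AND t.tag = ?
    ORDER BY b.id
""", ("English", "Social")).fetchall()
print(rows)
```

With the data split this way, the search from the question becomes an ordinary join and filter.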

Prevent duplicate values in mysql nested tables using sequelize

I cannot work out how to tell whether a duplicate row already exists. I have 3 tables with the following structure.
domaintable
--------------------
| id name |
--------------------
| 1 car |
| 2 student |
|__________________|
domaintableschema
-------------------------------------------
| id domaintableId columnName type |
-------------------------------------------
| 1 1 model string |
| 2 1 year string |
| 3 2 name string |
| 4 2 gender string |
-------------------------------------------
domaintabledata
----------------------------------------------
| id domaintableschemaId row value |
----------------------------------------------
| 1 1 1 Jaguar |
| 2 2 1 2016 |
| 3 3 1 Joe |
| 4 4 1 male |
| 5 3 2 Jake |
| 6 4 2 female |
|_____________________________________________|
In the user interface, the tables car and student are shown as domain tables. Users are allowed to create more domain tables with any domain schema, but they are not allowed to insert duplicate values into the domain table data. The API payload for inserting data looks like this; suppose the user wants to insert the values male and Joe into the student table:
{
  "0": {
    "domaintableschemaId": 4,
    "value": "male",
    "row": 3
  },
  "1": {
    "domaintableschemaId": 3,
    "value": "Joe",
    "row": 3
  }
}
What I have tried: I wrote the following query to avoid duplicates, but it does not work in all cases. From the payload I build schemaId = [4, 3] and value = ["Joe", "male"] and run
SELECT * from domaintabledata where values in $value and schemaId in $schemaId
which returns an array of rows. If any row number has as many matches as the payload has entries, I treat the insert as a duplicate. But this fails when I have to insert the following row (note: the values are the same, only the schema ids are reversed):
{
  "0": {
    "domaintableschemaId": 3,
    "value": "male",
    "row": 3
  },
  "1": {
    "domaintableschemaId": 4,
    "value": "Joe",
    "row": 3
  }
}
Even though the above row is not a duplicate, my solution will not allow it to be inserted, because the values are the same and one of the returned rows has the same length as the row to be inserted. Please help me with an elegant approach to prevent inserting duplicate values. I use sequelize and sequelize-hierarchy as the ORM.
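The failure described above comes from matching schema ids and values independently. One possible way to see the difference is to compare (schema id, value) pairs per row instead. The sketch below is plain Python, not Sequelize code; `is_duplicate` is a hypothetical helper, and it assumes the payload supplies a value for every column of the domain table and that the relevant domaintabledata cells have already been loaded.

```python
from collections import defaultdict

def is_duplicate(existing, payload):
    """True if some stored row holds exactly the payload's (schemaId, value) pairs."""
    new_pairs = {(c["domaintableschemaId"], c["value"]) for c in payload.values()}
    schema_ids = {sid for sid, _ in new_pairs}
    # Group stored cells by row, looking only at the columns present in the payload.
    rows = defaultdict(set)
    for cell in existing:
        if cell["domaintableschemaId"] in schema_ids:
            rows[cell["row"]].add((cell["domaintableschemaId"], cell["value"]))
    return any(pairs == new_pairs for pairs in rows.values())

# The domaintabledata rows from the question.
existing = [
    {"domaintableschemaId": 1, "row": 1, "value": "Jaguar"},
    {"domaintableschemaId": 2, "row": 1, "value": "2016"},
    {"domaintableschemaId": 3, "row": 1, "value": "Joe"},
    {"domaintableschemaId": 4, "row": 1, "value": "male"},
    {"domaintableschemaId": 3, "row": 2, "value": "Jake"},
    {"domaintableschemaId": 4, "row": 2, "value": "female"},
]
first = {"0": {"domaintableschemaId": 4, "value": "male", "row": 3},
         "1": {"domaintableschemaId": 3, "value": "Joe", "row": 3}}
swapped = {"0": {"domaintableschemaId": 3, "value": "male", "row": 3},
           "1": {"domaintableschemaId": 4, "value": "Joe", "row": 3}}
print(is_duplicate(existing, first), is_duplicate(existing, swapped))
```

Because each value stays attached to its schema id, the first payload is reported as a duplicate of row 1 while the payload with the reversed schema ids is not.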

Postgres update JSON field

I've got several Postgres 9.4 tables that contain data like this:
| id | data |
|----|-------------------------------------------|
| 1 | {"user": "joe", "updated-time": 123} |
| 2 | {"message": "hi", "updated-time": 321} |
I need to transform the JSON column into something like this
| id | data |
|----|--------------------------------------------------------------|
| 1 | {"user": "joe", "updated-time": {123, "unit":"millis"}} |
| 2 | {"message": "hi", "updated-time": {321, "unit":"millis"}} |
Ideally it would be easy to apply the transformation to multiple tables. Tables that contain the JSON key data->'updated-time' should be updated, and ones that do not should be skipped. Thanks!

You can use the || operator to merge two jsonb objects together.
select '{"foo":"bar"}'::jsonb || '{"baz":"bar"}'::jsonb;
= {"baz": "bar", "foo": "bar"}
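To show how such a merge could apply to the rows in the question, here is a small Python sketch of the same idea, where merging dicts plays the role of ||. It is not SQL. The question's target JSON does not name the key that holds the original number, so "value" below is a hypothetical choice, and `wrap_updated_time` is a hypothetical helper.

```python
def wrap_updated_time(data, unit="millis"):
    """Return a copy of data with "updated-time" wrapped in an object."""
    if "updated-time" not in data:
        return data  # rows without the key are left as they are
    # "value" is a hypothetical key name; the question does not name it.
    wrapped = {"updated-time": {"value": data["updated-time"], "unit": unit}}
    # As with jsonb ||, keys from the right-hand object replace those on the left.
    return {**data, **wrapped}

print(wrap_updated_time({"user": "joe", "updated-time": 123}))
print(wrap_updated_time({"message": "no timestamp here"}))
```

The merge is shallow: the whole "updated-time" entry is replaced, and all other top-level keys are carried over unchanged.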

Write JSON into a field using Talend Open Studio

I am trying to migrate data from an old database into our new application.
In the process, I need to read data from the old db and build a JSON document that must be stored in a field in the new MySQL db.
To do so, I use the components tWriteJSONField and tExtractJSONFields.
In tWriteJSONField, my XML tree looks like this :
path
|-- id [loop element]
|-- name
|-- description
N.B.: I can't find out how to use the loop element and group element properties. I don't understand how they work, and the documentation doesn't cover them.
The tWriteJSONField component is linked to a tExtractJSONFields in order to extract id from the JSON. I need this to know which record each JSON must be linked to.
tExtractJSONFields configuration : XPath request
"/path"
tExtractJSONFields configuration : Mapping
-----------------------------------------------
| column | XPath request | get nodes ? |
-----------------------------------------------
| idForm | "id" | false |
-----------------------------------------------
| jsonStructure | "*" | yes |
-----------------------------------------------
My problem is that in the jsonStructure output by tExtractJSONFields, I only get the first child of my root tag. In my case jsonStructure looks like this:
{
"id": "123"
}
Expected result is :
{
"id": "123",
"name": "Test",
"description": "Test"
}
If I declare the child name before id for example, I will get :
{
"name": "Test"
}
I have tried to change the XPath query for jsonStructure but I never get all the fields.
Why ?
It's my first question about Talend, so if it lacks information, let me know in the comments.
Thanks for help.
EDIT :
Data from tMysqlInput to tWriteJSONField :
N.B.: My flow contains more columns, but I only show the ones used to create the JSON.
---------------------------------------------------------------------------------------
| IdForm | NomForm | DescrForm |
---------------------------------------------------------------------------------------
| 1 | English training | <p>This is a description of the training</p> |
---------------------------------------------------------------------------------------
| 2 | French training | <p>This contains HTML tags from a WYSIWYG</p> |
---------------------------------------------------------------------------------------
| 3 | How to use the application | <p>Description</p> |
---------------------------------------------------------------------------------------
In tWriteJSONField, columns are mapped to the JSON like this :
path
|-- id [loop element] --> IdForm
|-- name --> NomForm
|-- description --> DescrForm
tWriteJSONField outputs a new flow with the same columns as the input (although these columns are all empty in the output, even if they were populated in the input) and adds a new one, jsonStructure, which contains the generated JSON.
This new flow is caught by a tExtractJSONFields (its configuration is given in my original post).
tExtractJSONFields outputs this flow:
--------------------------
| IdForm | jsonStructure |
--------------------------
| 1 | { "id": "1" } |
--------------------------
| 2 | { "id": "2" } |
--------------------------
| 3 | { "id": "3" } |
--------------------------
And I expect it to return this one:
--------------------------------------------------------------------------------------------
| IdForm | jsonStructure |
--------------------------------------------------------------------------------------------
| 1 | { "id": "1", "name": "English training", "description": "<p>This is[...]</p>" } |
--------------------------------------------------------------------------------------------
| 2 | { "id": "2", "name": "French training", "description": "<p>[...]</p>" } |
--------------------------------------------------------------------------------------------
| 3 | { "id": "3", "name": "How to use the [...]", "description": "<p>[...]</p>" } |
--------------------------------------------------------------------------------------------
EDIT 2
I use TOS 5.4.0.r110020 if it can help.

Your XPath request for the jsonStructure column is not correct. Just remove "*" and you will get the expected result.
Also, if you don't need the root node in the JSON entry, check "Remove root node" on tWriteJSONField and change the Loop XPath query to "/" in tExtractJSONFields.

We Keep Coding
