MySQL query to retrieve tabular data in json format - mysql

I have a table like below in MySQl database
user-name mail
ganesh g#g.com
gani gani#gani.com
gan gan#gan.com
I need query to retrieve above table in JSON format
Example:
[{
user-name:"ganesh",
mail:"g#g.com"
},
{
user-name:"gani",
mail:"gani#gani.com"
},
{
user-name:"gan",
mail:"gan#gan.com"
}
]
I need help, to do above

It's not recommended to do such things in the DBMS, do it in the script that is loading the data instead, if you're wrapping some legacy code you can't edit then wrap it with more code to format the data.
If all that fails do something like this: http://www.thomasfrank.se/mysql_to_json.html
SELECT
CONCAT("[",
GROUP_CONCAT(
CONCAT("{username:'",username,"'"),
CONCAT(",email:'",email),"'}")
)
,"]")
AS json FROM users;

Related

I have a MySQL database whereI am trying to retrieve JSON data that stores urls but the results keep coming back empty

The data in the column looks like the below:
{
"activity": {
"token": "e7b64be4-74d4-7a6d-a74b-xxxxxxx",
"route": "http://example.com/enroll/confirmation",
"url_parameters": {
"Success": "True",
"ContractNumber": "003992314W",
"Barcode": "1908Y10Z",
"price": "8.99"
},
"server_info": {
"cookie": [
"_ga=xxxx; _fbp=xxx; _hjid=xxx; XDEBUG_SESSION=XDEBUG_ECLIPSE;"
],
"upgrade-insecure-requests": [
"1"
],
},
"campaign": "Unknown/None",
"ip": "192.168.10.1",
"entity": "App\\Models\\User",
"entity_id": "1d9f3066-13ce-4659-b10d-xxxxx",
},
"time": "2021-05-21 20:15:02"
}
My code that I am using is below:
SELECT *
FROM websote.stored_events
WHERE JSON_EXTRACT(event_properties, '$.route') = 'http://example.com/enroll/confirmation'
ORDER BY created_at DESC LIMIT 500;
The code works on the other the json values just not the url ones. I've tried escaping the values in MySQL like the below:
SELECT *
FROM websote.stored_events
WHERE JSON_EXTRACT(event_properties, '$.route') = 'http:///example.com//enroll//confirmation'
ORDER BY created_at DESC LIMIT 500;
But still no luck. Any help on this would be appreciated!
Route is a nested property; I would have expected the path to be
JSON_EXTRACT(event_properties, '$.activity.route')
Your example data isn't valid JSON. You can't have a comma after the last element in an object or array:
"entity_id": "1d9f3066-13ce-4659-b10d-xxxxx",
},
^ here
If I remove that and other similar cases, I can test your data inserts into a JSON column and I can extract the object element you described:
mysql> select json_extract(event_properties, '$.activity.route') as route from stored_events;
+------------------------------------------+
| route |
+------------------------------------------+
| "http://example.com/enroll/confirmation" |
+------------------------------------------+
Note the value is returned with double-quotes. This is because it's returned as a JSON document, a scalar string. If you want the raw value, you have to unquote it:
mysql> select json_unquote(json_extract(event_properties, '$.activity.route')) as route from stored_events;
+----------------------------------------+
| route |
+----------------------------------------+
| http://example.com/enroll/confirmation |
+----------------------------------------+
If you want to search for that value, you would have to do a similar expression:
select * from stored_events
where json_unquote(json_extract(event_properties, '$.activity.route'))
= 'http://example.com/enroll/confirmation'
Searching based on object properties stored in JSON has disadvantages.
It requires complex expressions that force you (and anyone else you needs to maintain your code) to learn a lot of details about how JSON works.
It cannot be optimized with an index. This query will run a table-scan. You can add virtual columns with indexes, but that adds to complexity and if you need to ALTER TABLE to add virtual columns, it misses the point of JSON to store semi-structured data.
The bottom line is that if you find yourself using JSON functions in the WHERE clause of a query, it's a sign that you should be storing the column you want to search as a normal column, not as part of a JSON document.
Then you can write code that is easy to read, easy for your colleagues to maintain, and can be optimized easily with indexes:
SELECT * FROM stored_events
WHERE route = 'http://example.com/enroll/confirmation';
You can still store other properties in the JSON document, but the ones you want to be searchable should be stored in normal columns.
You might like to view my presentation How to Use JSON in MySQL Wrong.

AWS Athena and handling json

I have millions of files with the following (poor) JSON format:
{
"3000105002":[
{
"pool_id": "97808",
"pool_name": "WILDCAT (DO NOT USE)",
"status": "Zone Permanently Plugged",
"bhl": "D-12-10N-05E 902 FWL 902 FWL",
"acreage": ""
},
{
"pool_id": "96838",
"pool_name": "DRY & ABANDONED",
"status": "Zone Permanently Plugged",
"bhl": "D-12-10N-05E 902 FWL 902 FWL",
"acreage": ""
}]
}
I've tried to generate an Athena DDL that would accommodate this type (especially the api field) of structure with this:
CREATE EXTERNAL TABLE wp_info (
api:array < struct < pool_id:string,
pool_name:string,
status:string,
bhl:string,
acreage:string>>)
LOCATION 's3://foo/'
After trying to generate a table with this, the following error is thrown:
Your query has the following error(s):
FAILED: ParseException line 2:12 cannot recognize input near ':' 'array' '<' in column type
What is a workable solution to this issue? Note that the api string is different for every one of the million files. The api key is not actually within any of the files, so I hope there is a way that Athena can accommodate just the string-type value for these data.
If you don't have control over the JSON format that you are receiving, and you don't have a streaming service in the middle to transform the JSON format to something simpler, you can use regex functions to retrieve the relevant data that you need.
A simple way to do it is to use Create-Table-As-Select (CTAS) query that will convert the data from its complex JSON format to a simpler table format.
CREATE TABLE new_table
WITH (
external_location = 's3://path/to/ctas_partitioned/',
format = 'Parquet',
parquet_compression = 'SNAPPY')
AS SELECT
regexp_extract(line, '"pool_id": "(\d+)"', 1) as pool_id,
regexp_extract(line, ' "pool_name": "([^"])",', 1) as pool_name,
...
FROM json_lines_table;
You will improve the performance of the queries to the new table, as you are using Parquet format.
Note that you can also update the table when you can new data, by running the CTAS query again with external_location as 's3://path/to/ctas_partitioned/part=01' or any other partition scheme

Insert JSON to Hadoop using Spark (Java)

I'm very new in Hadoop,
I'm using Spark with Java.
I have dynamic JSON, exmaple:
{
"sourceCode":"1234",
"uuid":"df123-....",
"title":"my title"
}{
"myMetaDataEvent": {
"date":"10/10/2010",
},
"myDataEvent": {
"field1": {
"field1Format":"fieldFormat",
"type":"Text",
"value":"field text"
}
}
}
Sometimes I can see only field1 and sometimes I can see field1...field50
And maybe the user can add fields/remove fields from this JSON.
I want to insert this dynamic JSON to hadoop (to hive table) from Spark Java code,
How can I do it?
I want that the user can after make HIVE query, i.e: select * from MyTable where type="Text
I have around 100B JSON records per day that I need to insert to Hadoop,
So what is the recommanded way to do that?
*I'm looked on the following: SO Question but this is known JSON scheme where it isnt my case.
Thanks
I had encountered kind of similar problem, I was able to resolve my problem using this. ( So this might help if you create the schema before you parse the json ).
For a field having a string data type you could create the schema :-
StructField field = DataTypes.createStructField(<name of the field>, DataTypes.StringType, true);
For a field having a int data type you could create the schema :-
StructField field = DataTypes.createStructField(<name of the field>, DataTypes.IntegerType, true);
After you have added all the fields in a List<StructField>,
Eg:-
List<StructField> innerField = new ArrayList<StructField>();
.... Field adding logic ....
Eg:-
innerField.add(field1);
innerField.add(field2);
// One instance can come, or multiple instance of value comes in an array, then it needs to be put in Array Type.
ArrayType getArrayInnerType = DataTypes.createArrayType(DataTypes.createStructType(innerField));
StructField getArrayField = DataTypes.createStructField(<name of field>, getArrayInnerType,true);
You can then create the schema :-
StructType structuredSchema = DataTypes.createStructType(getArrayField);
Then I read the json using the schema generated using the Dataset API.
Dataset<Row> dataRead = sqlContext.read().schema(structuredSchema).json(fileName);

How do I query a complex JSONB field in Django 1.9

I have a table item with a field called data of type JSONB. I would like to query all items that have text that equals 'Super'. I am trying to do this currently by doing this:
Item.objects.filter(Q(data__areas__texts__text='Super'))
Django debug toolbar is reporting the query used for this is:
WHERE "item"."data" #> ARRAY['areas', 'texts', 'text'] = '"Super"'
But I'm not getting back any matching results. How can I query this using Django? If it's not possible in Django, then how can I query this in Postgresql?
Here's an example of the contents of the data field:
{
"areas": [
{
"texts": [
{
"text": "Super"
}
]
},
{
"texts": [
{
"text": "Duper"
}
]
}
]
}
try Item.objects.filter(data__areas__0__texts__0__text='Super')
it is not exact answer, but it can clarify some jsonb filter features, also read django docs
I am not sure what you want to achieve with this structure, but I was able to get the desired result only with strange raw query, it can look like this:
Item.objects.raw("SELECT id, data FROM (SELECT id, data, jsonb_array_elements(\"table_name\".\"data\" #> '{areas}') as areas_data from \"table_name\") foo WHERE areas_data #> '{texts}' #> '[{\"text\": \"Super\"}]'")
Dont forget to change table_name in query (in your case it should be yourappname_item).
Not sure you can use this query in real programs, but it probably can help you to find a way for a better solution.
Also, there is very good intro to jsonb query syntax
Hope it will help you

Select JSON array's fields from SQL view to create a column

I have an SQL Table which one of the columns contain a JSON array in the following format:
[
{
"id":"1",
"translation":"something here",
"value":"value of something here"
},
{
"id":"2",
"translation":"something else here",
"value":"value of something else here"
},
..
..
..
]
Is there any way to use an SQL Query and retrieve columns with the ID as header and the "value" as the value of the column? Instead of return only one column with the JSON array.
For example, if I run:
SELECT column_with_json FROM myTable
It will return the above array. Where I want to return
1,2
value of something here, value of something else here
You can't use SQL to retrieve columns from the JSON stored inside the table: to the database engine the JSON is just unstructured text saved in a text field.
Some relational databases, like PostgreSQL, have a JSON type and functions to support JSON query. If this is your case, you should be able to perform the query you want.
Check this for an example on how it work with PostgreSQL:
http://clarkdave.net/2013/06/what-can-you-do-with-postgresql-and-json/