Count number of objects in JSON array stored as Hive string column

I have a Hive table with a JSON array stored as a string in a column, something like this:
Id | Column1 (String)
1 | [{k1:v1,k2:v2},{k3:v3,k4:v4}]
2 | [{k1:v1,k2:v2}]
I want to count the number of JSON objects in the column.
Id | Count
1 | 2
2 | 1
What would be the query to achieve this?

If the JSON objects are such simple structs without nested structs then you can split by '}' and use size()-1:
size(split(column,'[}]'))-1
This works correctly with empty strings. NULLs require special handling if you need them to be converted to 0:
case when column is null then 0 else size(split(column,'[}]'))-1 end
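To make the arithmetic behind the expression above concrete, here is a minimal Python sketch that mimics it. It is not Hive itself: `count_objects` is a hypothetical helper, and Python's `re.split` only stands in for Hive's `split()` for these inputs. It also shows why the answer restricts itself to flat objects, since every closing brace is counted.

```python
import re


def count_objects(column):
    """Mimic: case when column is null then 0
    else size(split(column,'[}]'))-1 end (hypothetical helper)."""
    if column is None:
        return 0
    # Each '}' produces one extra piece, so pieces - 1 == number of '}'.
    return len(re.split(r"[}]", column)) - 1


print(count_objects("[{k1:v1,k2:v2},{k3:v3,k4:v4}]"))  # 2
print(count_objects("[{k1:v1,k2:v2}]"))                # 1
print(count_objects(""))                               # 0
print(count_objects(None))                             # 0
# Nested object: one top-level object, but two closing braces are counted.
print(count_objects("[{k1:{a:b}}]"))                   # 2
```

The last call is the failure case: with nested structs the count reflects the number of closing braces, not the number of top-level objects.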

Related

how to create Json object of each column in a row

I have a table itemmaster in PostgreSQL.
id| attribute1 | attribute2 | attribute3
1 | Good | Average | Best
I want the output as JSON, like this:
[{"attribute1":"Good"},{"attribute2":"Average"},{"attribute3":"Best"}]
I want to use this JSON as a nested value inside another JSON object. I have tried row_to_json and the JSON object builder functions, but I am not getting the exact result.
select json_build_array(json_build_object('attribute1', itemmaster.attribute1),
json_build_object('attribute2', itemmaster.attribute2),
json_build_object('attribute3', itemmaster.attribute3))
from itemmaster;

Create hive external table with complex data type and load from CSV or TSV having few columns with serialized JSON object

I have CSV (or TSV) with a column ('nw_day' in example below) having serialized array object and another column ('res_m' in example below) having serialized JSON object. It also has columns with STRING, TIMESTAMP, and FLOAT data type.
For the TSV that looks somewhat like (showing first row)
+----------+---------------------+-------+-----------------------------------------------+------------------------------------------------------------------------+
| com_id | w_start_time | cap | nw_day | res_m |
+----------+---------------------+-------+-----------------------------------------------+------------------------------------------------------------------------+
| dtf_id | 2019-04-24 06:00:03 | 444.3 | {'Fri','Mon','Sat','Sun','Thurs','Tue','Wed'} | {"some_str":"str_one","some_n":1,"some_t":2019-04-24 06:00:03.700+0000}|
+----------+---------------------+-------+-----------------------------------------------+------------------------------------------------------------------------+
I have tried the following statement, but it does not give me the expected results.
CREATE EXTERNAL TABLE IF NOT EXISTS table_name(
com_id STRING,
w_start_time TIMESTAMP,
cap FLOAT,
nw_day array <STRING>,
res_m STRUCT <
some_str: STRING,
some_n: BIGINT,
some_t: TIMESTAMP
>)
COMMENT 's_e_s'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/location/to/folder/containing/csv'
TBLPROPERTIES ("skip.header.line.count"="1");
My expectation was that Hive would deserialize those objects into its complex datatypes, ARRAY and STRUCT. But that is not what I get when I run
select * from table_name limit 1;
which gives me
+----------+---------------------+-------+----------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------+
| com_id | w_start_time | cap | nw_day | res_m |
+----------+---------------------+-------+----------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------+
| dtf_id | 2019-04-24 06:00:03 | 444.3 | ["{'Fri'"," 'Mon'"," 'Sat'"," 'Sun'"," 'Thurs'"," 'Tue'"," 'Wed'}"] | {"some_str":"{\"some_str\":\"str_one\",\"some_n\":1,\"some_t\":2019-04-24 06:00:03.700+0000}\","some_n":null,"some_t":null}|
+----------+---------------------+-------+----------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------+
So Hive is treating the whole object as a string and splitting that string by the delimiter.
I need some help understanding how to load data from CSV/TSV into complex data types in Hive.
I found a similar question, but the requirement there is a little different and no complex datatype is involved.
Any help would be much appreciated. If this cannot be done and a preprocessing step has to be included prior to loading, some example of input data to complex datatype loads in hive would help me. Thanks in advance!
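As one possible shape for the preprocessing step mentioned above, here is a minimal sketch for the array column only. It assumes that Hive's delimited text format expects array items as bare values separated by the COLLECTION ITEMS delimiter (',' in the table definition above), with no surrounding braces or quotes. `normalize_array_field` is a hypothetical helper name, and the res_m column is left out because the sample value is not valid JSON (the timestamp is unquoted).

```python
def normalize_array_field(raw: str) -> str:
    """Turn "{'Fri','Mon'}" into "Fri,Mon" (hypothetical helper).

    Assumes items contain no embedded commas or quotes.
    """
    # Drop the surrounding braces, then the quotes and spaces around each item.
    inner = raw.strip().strip("{}")
    items = [item.strip().strip("'\"") for item in inner.split(",") if item.strip()]
    return ",".join(items)


print(normalize_array_field("{'Fri','Mon','Sat','Sun','Thurs','Tue','Wed'}"))
# Fri,Mon,Sat,Sun,Thurs,Tue,Wed
```

The cleaned value would replace the nw_day field in each row before the file is placed in the table's LOCATION.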

Using MySQL JSON field to join on a table with custom fields

So I made this system to store custom objects with custom fields for an app that I'm developing. First I have object_def where I save the object definitions:
id | name | fields
------------------------------------------------------------
101 | Group 1 | [{"name": "Title", "id": "AbCdE123"}, ...]
102 | Group 2 | [{"name": "Name", "id": "FgHiJ456"}, ...]
So we have ID (INT), name (VARCHAR) and fields (LONGTEXT). In fields are the object fields like this: {id: string, type: string, name: string}[].
Now In the object table, I have this:
id | object_def_id | object_values
------------------------------------------------------------
235 | 101 | {"AbCdE123": "The Object", ... }
236 | 102 | {"FgHiJ456": "John Perez", ... }
Where object_values is a LONGTEXT also. With that system, I'm able to show the objects on a table in my app using JSON.parse().
Now I've learned that there is a JSON type in MySQL and I want to use it to do queries and stuff (I'm really new to this).
I've changed the LONGTEXT to JSON and now I want to do a SELECT that shows the results like this:
#Select objects in group 1:
id | group | Title | ... | other_custom_field
-------------------------------------------------------
235 | Group 1 | The Object | ... | other_custom_value
#Select objects in group 2:
id | group | Name | ... | other_custom_field
-------------------------------------------------------
236 | Group 2 | John Perez | ... | other_custom_value
Id, then group name (I can do this with INNER JOIN) and then all the custom fields with the respective values.
Is this possible? How can I achieve this (hopefully without changing my database structure)? I'm learning MySQL, SQL and databases as I go so I really appreciate your help. Thanks!
Problems I see with your design:
Incorrect JSON format.
[{name: 'Title', id: 'AbCdE123'}, ...]
Should be:
[{"name": "Title", "id": "AbCdE123"}, ...]
You should use the JSON data type instead of LONGTEXT, because JSON will at least reject invalid JSON syntax.
Setting column headings based on data. You can't do this in SQL. Columns and headings must be fixed at the time you prepare the query. You can't do an SQL query that changes its own column headings.
Your object definition has an array of attributes, but MySQL 5.7 has no way to loop over the "rows" of a JSON array. You'll need to use JSON_TABLE() in MySQL 8.0.
That will get you closer to being able to look up object values, but then you'll still have to pivot the data into the result set you describe, with one attribute in each column, as if the data had been stored in a traditional way. But SQL doesn't allow you to do dynamic pivoting in a single query. You can't make an SQL query that dynamically grows its own select-list based on the data it finds.
This all makes me wonder...
Why don't you just store the data in the traditional way?
Create a table per object type. Add one column to that table per attribute. That way you get column names. You get column types. You get column constraints — for example, how would you simulate NOT NULL or UNIQUE in your current system?
If you don't want to use SQL, then don't. There are alternatives, like document databases or key/value databases. But don't torture poor SQL by using it to implement an Inner-Platform.
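If the schema has to stay as it is, the pivot described above can be done in application code after fetching the two JSON columns, since the column headings cannot come out of a single SQL query. The following is a minimal sketch under that assumption; `pivot_object` is a hypothetical helper, and the inputs are the `fields` and `object_values` strings as stored in the question's tables.

```python
import json


def pivot_object(fields_json: str, values_json: str) -> dict:
    """Map each field's display name to the object's value (hypothetical helper)."""
    fields = json.loads(fields_json)   # [{"name": ..., "id": ...}, ...]
    values = json.loads(values_json)   # {"<field id>": <value>, ...}
    # Missing values become None, like a NULL column.
    return {field["name"]: values.get(field["id"]) for field in fields}


row = pivot_object(
    '[{"name": "Title", "id": "AbCdE123"}]',
    '{"AbCdE123": "The Object"}',
)
print(row)  # {'Title': 'The Object'}
```

The keys of the returned dict play the role of the dynamic column headings, built per object definition rather than per query.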

Laravel - Retrieve row from DB as JSON when one column is in JSON format

I store my data in a DB where one of the columns holds data in JSON format.
When I retrieve a row and return it as a JSON response, that column comes back as a string instead of a JSON object.
DB:
id | name | map_id | map_settings | created_at
1 | Europe | 2 | {"zoom":7,"minZoom":5,"maxZoom":9,"zoomControl":true,"disableDefaultUI":true,"center":"new google.maps.LatLng(51.954422144707960, 19.140930175781250)"} | 2018-08-19 05:19:50
PHP
$mapConfig = MapConfig::with(['places'])->where(['id'=>$id])->get();
return response()->json($mapConfig);
Result
id: 1,
name: "Europe",
map_id: 2,
map_settings: "{\"zoom\":7,\"minZoom\":5,\"maxZoom\":9,\"zoomControl\":true,\"disableDefaultUI\":true,\"center\":\"new google.maps.LatLng(51.954422144707960, 19.140930175781250)\"}"
Why is map_settings not returned as a proper JSON object, and how can I make it one?
Thank you.
You have to parse map_settings with json_decode, because it is a string.
Alternatively, you can use Laravel attribute casting to convert the JSON column to an array; it will then work fine with the response()->json() function.
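The escaped quotes in the result come from serializing a value that is already a JSON string. The sketch below reproduces the effect and the fix in Python purely as an illustration; it is not Laravel code, `decode_column` is a hypothetical helper, and the decoding step corresponds to what json_decode or an attribute cast would do on the PHP side.

```python
import json

# A row as it comes from the database: map_settings is still a plain string.
row = {"id": 1, "name": "Europe", "map_settings": '{"zoom":7,"minZoom":5}'}


def decode_column(row: dict, column: str) -> dict:
    """Return a copy of the row with one JSON string column decoded (hypothetical helper)."""
    decoded = dict(row)
    decoded[column] = json.loads(row[column])
    return decoded


as_string = json.dumps(row)                                 # quotes get escaped
as_object = json.dumps(decode_column(row, "map_settings"))  # nested object

print(as_string)
print(as_object)
```

In the first output map_settings is a quoted, escaped string; in the second it is a nested object.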

Removing double-encoded JSON values in PSQL

Given a table in PostgreSQL, defined approximately as follows:
Column | Type | Modifiers | Storage | Stats target | Description
-------------+-----------------------------+-----------+----------+--------------+-------------
id | character varying | not null | extended | |
answers | json | | extended | |
we accidentally did a number of inserts of doubly-encoded JSON objects into this database, i.e. the JSON value is a string that is itself a JSON-encoded object. For example:
"{\"a\": 1}"
We'd like to find a query that would convert these values to the JSON objects they represent, for example:
{"a": 1}
We can easily select the bad values by doing:
SELECT * FROM table WHERE json_typeof(answers) = 'string'
but we are having trouble coming up with a way to parse the JSON in PSQL.
Unfortunately, there is no direct string-extraction function for the json[b] type(s), but you can work around this by embedding the value inside a JSON array and using the ->> operator to extract the string at array index 0:
UPDATE table
SET answers = (CONCAT('[', answers::text, ']')::json ->> 0)::json
WHERE json_typeof(answers) = 'string'
The array-wrapping trick should work with older PostgreSQL versions too (9.3). On newer versions (9.4+), you could use the json_build_array() function instead of CONCAT.
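The same unwrapping can be sketched outside the database, which may help when checking values before running the UPDATE. This is a minimal Python illustration of the idea, not the PostgreSQL mechanism; `unwrap_double_encoded` is a hypothetical helper.

```python
import json


def unwrap_double_encoded(text: str):
    """Decode once; if the result is still a string, decode it again (hypothetical helper)."""
    value = json.loads(text)
    if isinstance(value, str):
        # The outer JSON value was a string holding JSON text.
        # Raises ValueError if that inner string is not valid JSON.
        return json.loads(value)
    return value


print(unwrap_double_encoded('"{\\"a\\": 1}"'))  # {'a': 1}
print(unwrap_double_encoded('{"a": 1}'))        # {'a': 1}
```

Values that are already proper objects pass through unchanged, which mirrors the role of the WHERE clause in the UPDATE.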