Hive: parse JSON elements from a long concatenated JSON string

I have a server log that continuously records JSON values without any delimiter between them, such as:
{"a":1}{"b":2}{"a":2}{"c":{\"qwe\":\"asd\"},"d":"ert"}{"e":12}....
I want to extract each element and put them into rows like:
{"a":1}
{"b",2}
{"a":2}
{"c":{\"qwe\":\"asd\"},"d":"ert"}
{"e":12}..
The log has no delimiters and contains nested JSON, so I cannot simply use the
split function... How can I achieve this?

One option would be to split on the }{ boundary and get the elements using posexplode. Positions are only needed to concatenate the braces back correctly for the first and last elements.
select case when pos = 0 then concat(split_str, '}')
            when pos = max(pos) over (partition by str) then concat('{', split_str)
            else concat('{', split_str, '}')
       end as res
from tbl
lateral view posexplode(split(str, '\\}\\{')) t as pos, split_str
Note the result will be a string.
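If it helps to see why the CASE branches differ, here is a hedged illustration (using the same tbl and str names as the query above) that runs only the raw split:
select pos, split_str
from tbl
lateral view posexplode(split(str, '\\}\\{')) t as pos, split_str
-- For str = '{"a":1}{"b":2}{"e":12}' this yields:
--   pos 0 -> '{"a":1'   (the split removed its trailing '}')
--   pos 1 -> '"b":2'    (both braces removed)
--   pos 2 -> '"e":12}'  (leading '{' removed)
-- which is exactly what the CASE expression above puts back.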

Related

How to replace "-" only in a *text* value of any generic jsonb in PostgreSQL?

I need to clean JSON data that could look like:
{
  "reference": "0000010-CAJ",
  "product_code": "00000-10",
  "var_name": "CAJ-1",
  "doc_date": "2020-02-09T21:01:01-05:00",
  "due_date": "2020-03-10T21:01:01-05:00"
}
However, this is just one of many possibilities (it is for a log aggregation that gets data from many sources).
I need to replace "-" with "_", but without breaking dates like "2020-03-10T21:01:01-05:00", so I can't simply cast to string and do a replace. I wonder if there is an equivalent of:
for (k, v) in json:
    if is_text(v):
        v = replace(...)
You can check with a regex if the value looks like a timestamp:
update the_table
set the_column = (
  select jsonb_object_agg(
           key,
           case
             when value ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}.*' then value
             else replace(value, '-', '_')
           end)
  from jsonb_each_text(the_column) as t(key, value))
This iterates over all key/value pairs in the JSON column (using jsonb_each_text()) and assembles all of them back into a JSON again (using jsonb_object_agg()). Values that look like a timestamp are left unchanged, for all others, the - is replaced with a _.
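As a minimal standalone sketch, the same aggregate can be tried against a literal jsonb value before touching any table (the sample document below is trimmed from the question):
select jsonb_object_agg(
         key,
         case
           when value ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}.*' then value
           else replace(value, '-', '_')
         end) as cleaned
from jsonb_each_text('{"reference":"0000010-CAJ","doc_date":"2020-02-09T21:01:01-05:00"}'::jsonb) as t(key, value)
-- "reference" becomes "0000010_CAJ" while "doc_date" is left untouched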

MySQL JSON string field returns encoded

It's my first week dealing with a MySQL database and JSON field types, and I cannot figure out why values are automatically encoded and then returned in encoded form.
Given the following SQL
-- create a multiline string with a tab example
SET @str = "Line One
Line 2 Tabbed out
Line 3";
-- encode it
SET @j = JSON_OBJECT("str", @str);
-- extract the value by name
SET @strOut = JSON_EXTRACT(@j, "$.str");
-- show the object and attribute value.
SELECT @j, @strOut;
You end up with what appears to be a fully formed JSON object with a single attribute encoded:
@j = {"str": "Line One\n\tLine 2\tTabbed out\n\tLine 3"}
but using JSON_EXTRACT to get the attribute value, I get the encoded version including the outer quotes:
@strOut = "Line One\n\tLine 2\tTabbed out\n\tLine 3"
I would expect to get my original string with the \n and \t unescaped back to the original values and no outer quotes, as such:
Line One
Line 2 Tabbed out
Line 3
I can't seem to find any JSON_DECODE or JSON_UNESCAPE or similar functions.
I did find a JSON_ESCAPE() function but that appears to be used to manually build a JSON object structure in a string.
What am I missing to extract the values to the original format?
I like to use the handy ->> operator for this.
It was introduced in MySQL 5.7.13, and basically combines JSON_EXTRACT() and JSON_UNQUOTE():
SET @strOut = @j ->> '$.str';
You are looking for the JSON_UNQUOTE function:
SET @strOut = JSON_UNQUOTE( JSON_EXTRACT(@j, "$.str") );
The result of JSON_EXTRACT() is intentionally a JSON document, not a string.
A JSON document may be:
An object enclosed in { }
An array enclosed in [ ]
A scalar string value enclosed in " "
A scalar number or boolean value
A null — but this is not an SQL NULL, it's a JSON null. This leads to confusing cases because you can extract a JSON field whose JSON value is null, and yet in an SQL expression, this fails IS NULL tests, and it also fails to be equal to an SQL string 'null'. Because it's a JSON type, not a scalar type.
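The JSON null pitfall is easy to reproduce; here is a small hedged illustration (any MySQL 5.7+ session, the variable name is arbitrary):
SET @doc = JSON_OBJECT('x', NULL);              -- @doc holds '{"x": null}'
SELECT JSON_EXTRACT(@doc, '$.x') IS NULL;       -- 0: the extracted value is a JSON null, not an SQL NULL
SELECT JSON_EXTRACT(@doc, '$.missing') IS NULL; -- 1: a path that matches nothing does yield an SQL NULL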

In Python, how to concisely get nested values in json data?

I have data loaded from JSON and am trying to extract arbitrary nested values using a list as input, where the list corresponds to the names of successive children. I want a function get_value(data,lookup) that returns the value from data by treating each entry in lookup as a nested child.
In the example below, when lookup=['alldata','TimeSeries','rates'], the return value should be [1.3241,1.3233].
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def get_value(data, lookup):
    res = data
    for item in lookup:
        res = res[item]
    return res
lookup = ['alldata','TimeSeries','rates']
get_value(json_data,lookup)
My example works, but there are two problems:
It's inefficient - In my for loop, I copy the whole TimeSeries object to res, only to then replace it with the rates list. As @Andrej Kesely explained, res is a reference at each iteration, so data isn't being copied.
It's not concise - I was hoping to be able to find a concise (eg one or two line) way of extracting the data using something like list comprehension syntax
If you want a one-liner and you are using Python 3.8+, you can use an assignment expression (the "walrus operator"):
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def get_value(data, lookup):
    return [data := data[item] for item in lookup][-1]
lookup = ['alldata','TimeSeries','rates']
print( get_value(json_data,lookup) )
Prints:
[1.3241, 1.3233]
I don't think you can do it without a loop, but you could use a reducer here to increase readability.
import functools
functools.reduce(dict.get, lookup, json_data)

How to get a non-escaped value from JSON

I have inserted some test data like below:
INSERT INTO tblDataInfo (lookupkey, lookupvalue, scope)
VALUES ('diskname', '/dev/sdb', 'common')
I want to query this data and obtain the query output in JSON format.
I have used the query
select lookupvalue as 'disk.name'
from tblDataInfo
where lookupkey = 'diskname' FOR JSON PATH;
This query returns
[{"disk":{"name":"\/dev\/sdb"}}]
which escapes all my forward slashes (/) with the escape character (\). How can I keep the output from adding the escape character (\)?
JSON_QUERY returns its argument as already-formed JSON, so FOR JSON PATH does not escape it again. This query returns the result you need:
select json_query ('{"name":"' + lookupvalue + '"}') as 'disk'
from tblDataInfo
where lookupkey = 'diskname'
for json path;
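For a quick check without the real table, here is a hedged, self-contained version of the same idea (@demo is a hypothetical table variable standing in for tblDataInfo):
declare @demo table (lookupkey varchar(50), lookupvalue varchar(200), scope varchar(50));
insert into @demo values ('diskname', '/dev/sdb', 'common');
select json_query('{"name":"' + lookupvalue + '"}') as 'disk'
from @demo
where lookupkey = 'diskname'
for json path;
-- expected: [{"disk":{"name":"/dev/sdb"}}], with the forward slashes no longer escaped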

How can I loop over a map of String List (with an iterator) and load another String List with the values of InputArray?

How can I iterate over an InputArray and load another input array with the same values, except in lowercase (I know that there is a string-to-lower function)?
Question: How to iterate over a String List with a LOOP structure?
InputArray: A, B, C
OutputArray should be: a, b, c
If you want to retain the inputArray as such and save the lowercase values in an outputArray, configure the LOOP step so that Input Array is /inputArray and Output Array is /outputArray.
Your InputArray field looks like a string field. It's not a string list.
You need to use pub.string:tokenize from the WmPublic package to split your strings into a string list and then loop through the string list.
In the pipeline view, a string field and a string list are shown with subtly different icons; that little icon is how you tell them apart.
I can see two cases here.
If your input is a string:
1. Convert the string to a string list with the pub.string:tokenize service.
2. Loop over the string list by putting the name of the string list in the Input Array property of the LOOP step.
3. Within the loop, use the pub.string:toLower service as a transformer and map its output to an output string.
4. Put that output string's name in the Output Array property of the LOOP step.
5. Once you come out of the loop, you will have two string lists: one with the uppercase values and one with the lowercase values.
If your input is already a string list:
Follow steps 2 to 5 above.