String conversion in hive - json

I have a JSON string in which there is a field called version.
Version can either be absent or, if it is present, it will be of the form x.y.
I want to convert this to x.0. I am currently doing
CONCAT(split(get_json_object(json, '$.version'),'[.]')[0],".","0")
but this does not handle cases where version is not there.
I want "bad_version" to be returned if version is not there. Can I somehow use COALESCE with some tweaks?

Yes, you can use either COALESCE or CASE; the syntax is the same as in a standard SQL database.
select coalesce(myField, 'bad_version') ....
or
select case when myField is null then 'bad_version' else myField end as x ....

You can conditionally test the result of get_json_object to see if it's NULL and return bad_version accordingly. When the version is valid, you can use a regular expression to replace the minor version with 0.
SELECT
  IF(get_json_object(json, "$.version") IS NULL,
     "bad_version",
     regexp_replace(get_json_object(json, "$.version"), "\\..*$", ".0")
  )
FROM json_table; -- The table I loaded with test data
Some simple example data:
hive> SELECT json FROM json_table;
OK
{"id":"001","version":"3.9"}
{"id":"002","notversion":"3.9"}
Time taken: 0.225 seconds, Fetched: 2 row(s)
And then the results of this query against this data:
hive> SELECT
> IF(get_json_object(json, "$.version") IS NULL,
> "bad_version",
> regexp_replace(get_json_object(json, "$.version") , "\\..+$", ".0")
> )
> FROM json_table;
OK
3.0
bad_version
Time taken: 0.225 seconds, Fetched: 2 row(s)
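
For readers who want to check the logic outside Hive, here is a minimal Python sketch of the same idea: return "bad_version" when the field is missing, otherwise replace everything from the first dot onward with ".0". The function name normalize_version is made up for illustration, and treating unparseable JSON as a missing field is an assumption of this sketch.

```python
import json
import re


def normalize_version(json_text):
    """Return 'x.0' for a version 'x.y', or 'bad_version' when it is absent."""
    try:
        version = json.loads(json_text).get("version")
    except (json.JSONDecodeError, AttributeError):
        # Assumption: unparseable input or a non-object document
        # is handled like a missing field.
        version = None
    if version is None:
        return "bad_version"
    # Same idea as regexp_replace(..., "\\..*$", ".0"):
    # swap everything from the first dot onward for ".0".
    return re.sub(r"\..*$", ".0", str(version))


rows = ['{"id":"001","version":"3.9"}', '{"id":"002","notversion":"3.9"}']
print([normalize_version(r) for r in rows])  # ['3.0', 'bad_version']
```

A version with no dot at all (for example "3") passes through unchanged here, because the pattern only matches when a dot is present.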

JSON parsing issue in hive

I am running into issues while querying JSON data.
My sample data looks like this:
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"ABC":{"XYZ":"123.dfer","founder":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"GAP":{"GGG":"123.dfer","FFF":"3.0","DDD":"Florida","GOP":"fg45","cdc":"QQQ","ZZZ":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"BOX":{"FRG":"123.dfer","CXD":"3.0","FAX":"Florida","SXD":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
I have done the following:
create table src (myjson string);
insert into src values
('{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"ABC":{"XYZ":"123.dfer","founder":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}')
,('{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"GAP":{"XVY":"123.dfer","FAH":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}')
,('{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"BOX":{"VOG":"123.dfer","FAH":"3.0","FAX":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}')
;
The issue appears when I run select get_json_object(myjson,'$.Rtype.MOD.Version[0].ABC.fashion') from src where get_json_object(myjson,'$.Rtype.MOD.Version[0].ABC') is not null
I am getting NULLs for some of the fields.
The count for this query is, say, 2345.
Without the where condition the count is also 2345; this is the issue.
My observation is that this happens because it is trying to fetch data under $.Rtype.MOD.Version[0].GAP
hive> load data local inpath '/home/satish/s.json' into table sjson;
Loading data to table hivelearning.sjson
Table hivelearning.sjson stats: [numFiles=1, totalSize=216]
hive> select * from sjson;
{"Rtype":{"ver":"1","os":"ms","type":"ns","vehicle":"Mh-3412","MOD":{"Version":[{"ABC":{"XYZ":"123.dfer","founder":"3.0","GHT":"Florida","fashion":"fg45","cdc":"new","dof":"yes","ts":"2000-04-01T00:00:00.171Z"}}]}}}
Time taken: 1.297 seconds, Fetched: 1 row(s)
hive> select get_json_object(data,'$.Rtype.MOD.Version[0].ABC.fashion') from sjson;
OK
fg45
Time taken: 0.084 seconds, Fetched: 1 row(s)
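
The NULLs described in the question can be reproduced outside Hive. This minimal Python sketch uses trimmed-down copies of the three sample rows (only one field each, an assumption made for brevity) and a hypothetical helper get_path. It shows that the ABC path resolves only for the row whose key under Version[0] is actually ABC.

```python
import json


def get_path(doc, path):
    """Follow a list of keys/indexes; return None as soon as a step is missing."""
    current = doc
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current


# Trimmed-down versions of the sample rows (only the fields used here).
rows = [
    '{"Rtype":{"MOD":{"Version":[{"ABC":{"fashion":"fg45"}}]}}}',
    '{"Rtype":{"MOD":{"Version":[{"GAP":{"GOP":"fg45"}}]}}}',
    '{"Rtype":{"MOD":{"Version":[{"BOX":{"SXD":"fg45"}}]}}}',
]
abc_path = ["Rtype", "MOD", "Version", 0, "ABC", "fashion"]
values = [get_path(json.loads(r), abc_path) for r in rows]
print(values)  # ['fg45', None, None]
```

Rows keyed GAP or BOX produce no value for the ABC path, which corresponds to the NULLs seen in the query result.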

MySQL VARCHAR Type won't CONVERT to Integer

I have a column of data of type VARCHAR, that I want to CONVERT or CAST to an integer (my end goal is for all of my data points to be integers). However, all the queries I attempt return values of 0.
My data looks like this:
1
2
3
4
5
If I run either of the following queries:
SELECT CONVERT(data, BINARY) FROM table
SELECT CONVERT(data, CHAR) FROM table
My result is:
1
2
3
4
5
No surprises there. However, if I run either of these queries:
SELECT CONVERT(data, UNSIGNED) FROM table
SELECT CONVERT(data, SIGNED) FROM table
My result is:
0
0
0
0
0
I've searched SO and Google all over for an answer to this problem, with no luck, so I thought I would try the pros here.
EDIT/UPDATE
I ran some additional queries on the suggestions from the comments, and here are the results:
data | LENGTH(data) | LENGTH(TRIM(data)) | ASCII(data)
1    | 3            | 3                  | 0
2    | 3            | 3                  | 0
3    | 3            | 3                  | 0
4    | 3            | 3                  | 0
5    | 3            | 3                  | 0
It appears that I have an issue with the data itself. For anyone coming across this post: my solution at this point is to TRIM the excess from the data points and then CONVERT to UNSIGNED. Thanks for all of the help!
FURTHER EDIT/UPDATE
After a little research, it turns out there were hidden NULL bytes in my data. The answer to this question helped: How can I remove padded NULL bytes using SELECT in MySQL
What does SELECT data, LENGTH(data), LENGTH(TRIM(data)), ASCII(data) FROM table return? It's possible your numeric strings aren't just numeric strings.
Alternately, are you using multi-byte character encoding?
I believe the query you have is fine, as it worked for me: sqlfiddle.com/#!2/a15ec4/1/3.
That makes me think you have a data problem. Are you sure there's not a return or space in the data somewhere?
You can check the data by running ASCII or LENGTH on it to see whether it contains more than you expect:
select ascii(data) from foo where ascii(data) not between 48 and 57
or, for length:
select length(data) as mLEN from table having mLEN > 1
I believe this is the correct form:
SELECT CAST(data AS UNSIGNED) FROM test;
SELECT CAST(data AS SIGNED) FROM test;
Tested here: http://sqlfiddle.com/#!8/8c481/1
Try this syntax:
SELECT CONVERT(data, UNSIGNED INTEGER) FROM table
or
SELECT CAST(data AS UNSIGNED) FROM table
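
The diagnostic output in the question's update (length 3, first character code 0, unchanged by a plain TRIM) is consistent with NUL bytes around each digit. The following minimal Python sketch shows the clean-then-convert step. The sample value and the helper name to_int are assumptions for illustration, not the asker's actual data.

```python
def to_int(raw):
    """Remove NUL bytes and surrounding whitespace, then parse as an integer."""
    cleaned = raw.replace("\x00", "").strip()
    return int(cleaned)


# Hypothetical value: the digit "1" wrapped in NUL bytes, giving length 3.
sample = "\x001\x00"
print(len(sample), ord(sample[0]), to_int(sample))  # 3 0 1
```

Stripping only spaces would leave the value at length 3, which matches the LENGTH(TRIM(data)) column in the question.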

selecting BIT field in sub query gives '49' instead of 0 or 1

select *, (select logactivity from users where regtoken= '12345') as log from server
The logactivity field in 'users' is a BIT field which under "SELECT * FROM USERS" comes up as 0 or 1. But when I use the subquery above, I get 49 instead of 1 and 48 instead of 0. Why?
48 and 49 are the ASCII codes of the characters '0' and '1'.
Perhaps the issue is not in your query but somewhere in the code that handles this result.
PHP example: http://codepad.org/5i65IWTG
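
The relationship between 48/49 and the characters '0'/'1' is easy to confirm. This is a small Python sketch of that mapping only; decode_flag is a made-up helper name, and whether a given client library hands back the code or the character is not something the thread establishes.

```python
def decode_flag(code):
    """Turn the character code 48 or 49 back into the integer 0 or 1."""
    return int(chr(code))


# ord() gives the character code; chr() goes the other way.
print(ord("0"), ord("1"))                # 48 49
print(decode_flag(48), decode_flag(49))  # 0 1
```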

SELECT statement with where clause IN is not reading variable in MySQL

I'm running a SELECT statement (MySQL version 5.5.27 under Windows 7) with a variable in the WHERE clause. It is supposed to return 6 records, but it doesn't. Below is some simple test code.
-- Test-I
SET @group_saids := (SELECT REPLACE(
'''ClicPlan - España|ClicPlan - Francia|ClicPlan - UK|ClicPlan - Belgique|ClicPlan - Argentina|Clicplan - Turkey'''
,'|',"','") as aids_list from dual);
select @group_saids from dual;
select sd.aid
FROM said_aid sd
where sd.said in (@group_saids);
-- No records selected;
-- Test-II
select sd.aid
FROM said_aid sd
where sd.said in ('ClicPlan - España','ClicPlan - Francia','ClicPlan - UK',
'ClicPlan - Belgique','ClicPlan - Argentina',
'Clicplan - Turkey');
aid
----
3045
3253
3254
3260
3268
3270
In Test-I above, the select from table said_aid returns no records, but it should return 6.
In Test-II, the same query with hard-coded IN values returns 6 records.
There are no errors during execution.
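You have to use FIND_IN_SET(), because the IN clause expects a list of individual values; a single string variable is treated as one value, no matter how many quoted items it contains. Replace the following line:
where sd.said in (@group_saids);
with this one:
where FIND_IN_SET(sd.said, @group_saids);
Note that FIND_IN_SET() expects a plain comma-separated list, so the variable should be built without the embedded quotes (for example, by replacing '|' with ',' only).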
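
The difference between the two lookups can be sketched in Python. The function find_in_set below is only a rough stand-in written for this illustration (it ignores collation and other MySQL details), and the lists are shortened copies of the question's values.

```python
def find_in_set(needle, csv_list):
    """Rough stand-in for FIND_IN_SET: 1-based position in a comma-separated list, else 0."""
    items = csv_list.split(",")
    return items.index(needle) + 1 if needle in items else 0


plain_list = "ClicPlan - España,ClicPlan - Francia,ClicPlan - UK"
quoted_list = "'ClicPlan - España','ClicPlan - Francia','ClicPlan - UK'"

# IN (variable) compares against the whole string as one single value.
print("ClicPlan - UK" in [plain_list])            # False
# Splitting on commas finds the item at position 3.
print(find_in_set("ClicPlan - UK", plain_list))   # 3
# With embedded quotes, each item includes its quotes and nothing matches.
print(find_in_set("ClicPlan - UK", quoted_list))  # 0
```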

PHP MYSQL Compare rows when defining WHERE statement

Let's say I have these DB rows:
id | storage | used | status
1  | 100     | 0    | 1
2  | 1000    | 5000 | 1
I need to compare the columns "storage" and "used".
I want to select rows WHERE status = 1 and column "storage" > column "used".
I tried WHERE status = '1' AND storage > used
It should return the row with id 1, but it doesn't.
Well, WHERE status=1 AND storage > used is correct. If you tried it and didn't get back the row with id=1 there's something wrong with your data.
Are storage and used numeric columns? Or are they stored as a VARCHAR (or, gasp, TEXT)? If so, you won't be able to compare them quite the way you want, and will first have to convert or cast them to numeric types. It would be better to change the type to actually be numeric (i.e., INT or DECIMAL or whichever other type is appropriate).
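
To see why the column type matters, here is a small Python sketch comparing the same values as text and as numbers. The values are illustrative only (they are not the rows from the question), and compare_both_ways is a made-up helper name.

```python
def compare_both_ways(storage, used):
    """Return (text comparison, numeric comparison) for storage > used."""
    as_text = storage > used              # character-by-character comparison
    as_number = int(storage) > int(used)  # numeric comparison
    return as_text, as_number


# Illustrative values only, not the rows from the question.
for storage, used in [("100", "20"), ("9", "10"), ("1000", "5000")]:
    print(storage, used, compare_both_ways(storage, used))
```

Text comparison goes character by character, so "100" sorts before "20" even though 100 is the larger number.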
SELECT * FROM `table` WHERE status = '1' AND storage > used
should give you the right result. As VoteyDisciple mentioned, make sure storage and used are both of a numeric type.
You can use SELECT * FROM `table` WHERE status = '1' AND storage > used, but the data type of storage and used must be numeric, not VARCHAR.
If you still don't get the correct answer, the problem is likely in how your data is stored.
Thanks!