Issue in JOIN query in apache drill - apache-drill

File stored in Hive:
[
{
"occupation": "guitarist",
"fav_game": "football",
"name": "d1"
},
{
"occupation": "dancer",
"fav_game": "chess",
"name": "k1"
},
{
"occupation": "traveller",
"fav_game": "cricket",
"name": "p1"
},
{
"occupation": "drummer",
"fav_game": "archery",
"name": "d2"
},
{
"occupation": "farmer",
"fav_game": "cricket",
"name": "k2"
},
{
"occupation": "singer",
"fav_game": "football",
"name": "s1"
}
]
CSV file in hadoop:
name,age,city
d1,23,delhi
k1,23,indore
p1,23,blore
d2,25,delhi
k2,30,delhi
s1,25,delhi
I queried them individually, it's working fine. Then, I tried join query:
select * from hdfs.`/demo/distribution.csv` d join hive.demo.`user_details` u on d.name = u.name
I got the following issue:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error Fragment 0:0 [Error Id: b01db9c8-fb35-4ef8-a1c0-31b68ff7ae8d on IMPETUS-DSRV03.IMPETUS.CO.IN:31010]

Please refer this https://drill.apache.org/docs/data-type-conversion/
We need to do explicit typecasting to deal with such scenario.
Consider we have a JSON file employee.json and a csv file sample.csv. In order to query on both at the same time , in one query we need to do type casting.
0: jdbc:drill:zk=local> select emp.employee_id, dept.department_description, phy.columns[2], phy.columns[3] FROM cp.`employee.json` emp , cp.`department.json` dept, dfs.`/tmp/sample.csv` phy where CAST(emp.employee_id AS INT) = CAST(phy.columns[0] AS INT) and emp.department_id = dept.department_id;
Here we are typecasting CAST(emp.employee_id AS INT) = CAST(phy.columns[0] AS INT) so that equality does not fail.
Refer this for more detail:- http://www.devinline.com/2015/11/apache-drill-setup-and-SQL-query-execution.html#multiple_src

You need to cast even though by default it has taken varchar. Try this:
select * from hdfs.`/demo/distribution.csv` d join hive.demo.`user_details` u on cast(d.name as VARCHAR) = cast(u.name as VARCHAR)
But you cannot refer to column name directly from csv. you need to consider columns[0] for name.

Related

How to get a specific object in an JSON array in MySQL?

I have a JSON column "jobs" that looks like this:
[
{
"id": "1",
"done": "100",
"target": "100",
"startDate": "123123132",
"lastAction": "123123132",
"status": "0"
},
{
"id": "2",
"done": "10",
"target": "20",
"startDate": "2312321",
"lastAction": "2312321",
"status": "1"
}
]
I want to filter the array by object key values. For example: To find all items that have target > done, status != 0 and lastAction is yesterday to get response like this:
[
{
"id": "1",
"done": "19",
"target": "100",
"startDate": "123123132",
"lastAction": "123123132",
"status": "0"
}
]
I know I can extract the data to a JSON_TABLE() to do the filtering but I don't get the original object back(unless I recreate it back) and the solution is not dynamic.
Can this kind of array filtering can really be done in MySQL?
SELECT JSON_PRETTY(JSON_EXTRACT(jobs.jobs, CONCAT('$[', j.rownum-1, ']'))) AS object
FROM jobs
CROSS JOIN JSON_TABLE(
jobs.jobs, '$[*]' COLUMNS(
rownum for ordinality,
done int path '$.done',
target int path '$.target',
status int path '$.status'
)
) as j
WHERE j.target > j.done AND j.status != 0;
You also mentioned a condition on lastAction, but the example values you gave are not valid dates, so I'll leave that enhancement to you. The example above demonstrates the technique.
Yes it is possible to do it using the JSON_EXTRACT and JSON_SEARCH functions.
Let's say your table is named tbl_Jobs and the jobs column is of type JSON.
SELECT * FROM tbl_Jobs
WHERE JSON_EXTRACT(jobs, "$[*].target") = JSON_EXTRACT(jobs, "$[*].done")
AND JSON_EXTRACT(jobs, "$[*].status") != 0
AND JSON_SEARCH(jobs, 'one', DATE_SUB(CURDATE(), INTERVAL 1 DAY), NULL, "$[*].lastAction") IS NOT NULL

Using SQLite Json Functions I want to retrieve a value base on the value near it

Lets say I have this Json and I would like to retrieve all the age values where the name equals Chris in the Array key.
{
"Array": [
{
"age": "65",
"name": "Chris"
},
{
"age": "20",
"name": "Mark"
},
{
"age": "23",
"name": "Chris"
}
]
}
That Json is present in the Json column inside my database.
by that I would like to retrieve one age column the has the age 65 and 23 because they both named Chris.
Use json_each() table-valued function to extract all the names and ages from the json array of each row of the table and json_extract() function to filter the rows for 'Chris' and get his age:
SELECT json_extract(j.value, '$.name') name,
json_extract(j.value, '$.age') age
FROM tablename t JOIN json_each(t.col, "$.Array") j
WHERE json_extract(j.value, '$.name') = 'Chris';
Change col to the name of the json column.
See the demo.

Postgres query on json with empty value

I have a query in JSON to filter out the data based on data present inside JSON field .
Table name: audit_rules
Column name: rule_config (json)
rule_config contains JSON which contain 'applicable_category' as an attribute in it.
Example
{
"applicable_category":[
{
"status":"active",
"supported":"yes",
"expense_type":"Meal",
"acceptable_variation":0.18,
"minimum_value":25.0
},
{
"status":"active",
"supported":"yes",
"expense_type":"Car Rental",
"acceptable_variation":0.0,
"minimum_value":25.0
},
{
"status":"active",
"supported":"yes",
"expense_type":"Airfare",
"acceptable_variation":0.0,
"minimum_value":75
},
{
"status":"active",
"supported":"yes",
"expense_type":"Hotel",
"acceptable_variation":0.0,
"minimum_value":75
}
],
"minimum_required_keys":[
"amount",
"date",
"merchant",
"location"
],
"value":[
0,
0.5
]
}
But some of the rows doesn't have any data or doesn't have the 'applicable_category' attribute in it.
So while running following query i am getting error:
select s.*,j from
audit_rules s
cross join lateral json_array_elements ( s.rule_config#>'{applicable_category}' ) as j
WHERE j->>'expense_type' in ('Direct Bill');
Error: SQL Error [22023]: ERROR: cannot call json_array_elements on a scalar
You can restrict the result to only rows that contain an array:
select j.*
from audit_rules s
cross join lateral json_array_elements(s.rule_config#>'{applicable_category}') as j
WHERE json_typeof(s.rule_config -> 'applicable_category') = 'array'
and j ->> 'expense_type' in ('Meal')

How to parse JSON into relational format in SQL Server 2016?

I have some Json stored in SQL Server 2016 table as under (partitial)
{
"AFP": [
{
"AGREEMENTID": "29040400001330",
"LoanAccounts": {
"Product": "OD003",
"BUCKET": 0,
"ZONE": "MUMBAI ZONE",
"Region": "MUMBAI METRO-CENTRAL REGION",
"STATE": "GOA",
"Year": 2017,
"Month": 10,
"Day": 13
},
"FeedbackInfo": {
"FeedbackDate": "2017-10-13T12:07:44.2317198",
"DispositionDate": "2017-10-13T12:07:44.2317198",
"DispositionCode": "PR"
},
"PaymentInfo": {
"ReceiptNo": "2000000170",
"ReceiptDate": "2017-10-13T12:07:42.1218299",
"PaymentMode": "Cheque",
"Amount": 200,
"PaymentStatus": "CollectionBatchCreated"
}
}
]
}
table schema as under
create table tblHistoricalDataDemo(
AGREEMENTID nvarchar(40)
,Year_Json nvarchar(4000)
)
I would like to fetch the records from JSON into relational format as
AgreementID Product Bucket .... PaymentStatus
I tried with below but something wrong i am doing for which I am not able to get the result
SELECT AGREEMENTID,
JSON_VALUE(Year_Json, '$.LoanAccounts') AS records
FROM tblHistoricalDataDemo
Use the OPENJSON built in table value function:
SELECT *
FROM tblHistoricalDataDemo
CROSS APPLY
OPENJSON(Year_Json, '$.AFP') WITH
(
-- You don't have to specify the json path
-- if the column name is the same as the json name
AGREEMENTID bigint
)
As afp
CROSS APPLY
OPENJSON(Year_Json, '$.AFP') WITH
(
Product varchar(10) '$.LoanAccounts.Product',
bucket int '$.LoanAccounts.BUCKET'
)
As LoanAccounts
In case the array in JSON has a fixed number of element, use
$.P1[x]
If AFP has only 1 element,
SELECT t.AGREEMENTID,
JSON_Value(Year_Json, '$.AFP[0].LoanAccounts.Product') Product,
JSON_Value(Year_Json, '$.AFP[0].LoanAccounts.BUCKET') Bucket,
JSON_Value(Year_Json, '$.AFP[0].PaymentInfo.PaymentStatus') PaymentStatus
FROM tblHistoricalDataDemo t
Run it in SQLFiddle, thx Jacob H.

JSON Oracle SQL parsing / unnest embedded JSON data in escaped form

Here is my JSON stored in a CLOB column:
select upJSON from myLocations;
{"values":[
{"nameValuePairs":{"upJSON":"{\"mResults\":[0.0,0.0],\"mProvider\":\"fused\",\"mDistance\":0.0,\"mAltitude\":0.0}","id":"1","updated":"2015-03-30 20:28:51"}},
{"nameValuePairs":{"upJSON":"{\"mResults\":[0.0,0.0],\"mProvider\":\"FINDME\",\"mDistance\":0.0,\"mAltitude\":22.2}","id":"2","updated":"2015-03-30 20:28:53"}},
{"nameValuePairs":{"upJSON":"{\"mResults\":[0.0,0.0],\"mProvider\":\"fused\",\"mDistance\":0.0,\"mAltitude\":0.0}","id":"3","updated":"2015-03-30 20:28:55"}},
{"nameValuePairs":{"upJSON":"{\"mResults\":[0.0,0.0],\"mProvider\":\"fused\",\"mDistance\":0.0,\"mAltitude\":0.0}","id":"4","updated":"2015-03-30 20:28:57"}}
]}
(I have inserted newlines for clarity)
Please: What is the SQL (or PL/SQL) needed to select just the value of mProvider, mAltitude, and the id from the 2nd "nameValuePairs"
(= "FINDME" and 22.2 and "2") in the example above)
??
Since you're using 12c you have access to the native JSON parsing (as long as your CLOB column has an is json check constraint).
Some good background is available at:
https://docs.oracle.com/database/121/ADXDB/json.htm#ADXDB6371
If you're JSON looks something like this:
{
"values": [
{
"nameValuePairs": {
"upJSON": {
"mResults": [
"0.0",
"0.0"
],
"mProvider": "fused",
"mDistance": "0.0",
"mAltitude": "22.2"
},
"id": "1",
"updated": "2015-03-30 20:28:51"
}
},
...
...
Although, when I put your snippet from the question into JSONLint it returns:
{
"values": [
{
"nameValuePairs": {
"upJSON": "{\"mResults\":[0.0,0.0],\"mProvider\":\"fused\",\"mDistance\":0.0,\"mAltitude\":0.0}",
"id": "1",
"updated": "2015-03-30 20:28:51"
}
},
{
"nameValuePairs": {
"upJSON": "{\"mResults\":[0.0,0.0],\"mProvider\":\"FINDME\",\"mDistance\":0.0,\"mAltitude\":22.2}",
"id": "2",
"updated": "2015-03-30 20:28:53"
}
},
Something like the following might get you started:
select
upJSON.values
from
myLocations
where
json_value(upJSON, '$nameValuePairs.id' returning varchar2 error on error) = '2';
If you want to limit the query to a single ID, you'll need to add a full-text or function-based index to the JSON column.
https://odieweblog.wordpress.com/2015/04/12/json_table-chaining/#comment-1025
with tmp as (
SELECT /*+ no_merge */ d.*
FROM ulocations ul,
json_table(ul.upjson, '$'
columns(
NESTED PATH '$.values[*].nameValuePairs'
COLUMNS (
updated VARCHAR2(19 CHAR) PATH '$.updated'
, id varchar2(9 char) path '$._id'
, upJSON VARCHAR2(2000 CHAR) PATH '$.upJSON'
)) ) d
--where d.id = '0'
)
select t.updated
, t.id
, jt2.*
from tmp t
, json_table(t.upJSON, '$'
columns mProvider varchar2(5) path '$.mProvider'
, mLongitude number path '$.mLongitude'
) jt2
;