Unnest and Average operation in Couchbase

I have the following structure saved in a bucket:
[
{
"Grouplens_1M": {
"genres": [
"Thriller",
"Drama"
],
"movieId": 3952,
"ratings": [
{
"rating": 4,
"userId": 23
},
{
"rating": 5,
"userId": 36
},
{
"rating": 4,
"userId": 52
}
],
"title": "Contender, The (2000)"
}
}
]
Now I need to get all titles whose average rating is above 3. I found out that I need to unnest ratings and then use AVG to get the average, but it was not working. After trying to figure out how to solve this problem, I came up with this:
SELECT g.title, AVG(r_item.rating) AS avg_r
FROM Grouplens_1M AS g
UNNEST ratings r_item
WHERE r_item > 4.0
GROUP BY g.title
The query executes and returns a result, but the WHERE clause does not do what I want. It seems to be ignored: the result shows every movie with its average rating.

Since you want to filter results based on the value of the derived average, use HAVING. WHERE is applied before grouping and only limits which documents are considered for the query.
Try:
SELECT g.title, avg_r
FROM Grouplens_1M AS g
UNNEST ratings r_item
GROUP BY g.title
LETTING avg_r = AVG(r_item.rating)
HAVING avg_r > 4.0;
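To make the order of operations concrete, here is a minimal Python sketch of what the query above does to the sample document: unnest, group, aggregate, and only then filter. The function name `titles_with_avg_above` is made up for illustration, and the sketch models the logic only, not how Couchbase executes it. Note also that the WHERE in the question compares the whole `r_item` object with a number rather than `r_item.rating`.

```python
from collections import defaultdict

# Sample document from the question (one movie with three ratings).
docs = [
    {
        "title": "Contender, The (2000)",
        "movieId": 3952,
        "ratings": [
            {"rating": 4, "userId": 23},
            {"rating": 5, "userId": 36},
            {"rating": 4, "userId": 52},
        ],
    },
]


def titles_with_avg_above(docs, threshold):
    """Hypothetical helper mirroring UNNEST + GROUP BY + LETTING + HAVING."""
    # UNNEST: one (title, rating) row per element of the ratings array.
    rows = [(d["title"], r["rating"]) for d in docs for r in d["ratings"]]

    # GROUP BY g.title: collect the ratings that belong to each title.
    groups = defaultdict(list)
    for title, rating in rows:
        groups[title].append(rating)

    # LETTING avg_r = AVG(...), then HAVING avg_r > threshold:
    # the filter runs on the aggregate, after grouping.
    result = []
    for title, ratings in groups.items():
        avg_r = sum(ratings) / len(ratings)
        if avg_r > threshold:
            result.append({"title": title, "avg_r": avg_r})
    return result


above_4 = titles_with_avg_above(docs, 4.0)
above_4_5 = titles_with_avg_above(docs, 4.5)
```

A WHERE condition would sit at the `rows` step instead and remove individual ratings before the average is computed, which is a different question from "which movies have a high average".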

How to map nested array items with N1QL?

I have documents in a bucket called blocks in the following format:
{
"random_field": 1,
"transactions": [{
"id": "CCCCC",
"inputs": [{
"tx_id": "AAAAA",
"index": 0
},{
"tx_id": "BBBBB",
"index": 1
}]
}]
}
{
"transactions": [{
"id": "AAAAA",
"outputs": [{
"field1": "value123",
"field2": "value456"
},{
"field1": "ignore",
"field2": "ignore"
}]
}]
}
{
"transactions": [{
"id": "BBBBB",
"outputs": [{
"field1": "ignored",
"field2": "ignored"
},{
"field1": "value999",
"field2": "value888"
}]
}]
}
and I need to map the inputs from the first document to the corresponding outputs of the second and third documents. The way to do it manually is to, for each input, find a transaction with id equal to the input's tx_id, and then get the item from the outputs array based on the index of the input. To exemplify, this is the object I would like to return in this scenario:
{
"random_field": 1,
"transactions": [{
"id": "CCCCC",
"inputs": [{
"tx_id": "AAAAA",
"index": 0,
"output": {
"field1": "value123",
"field2": "value456"
}
},{
"tx_id": "BBBBB",
"index": 1,
"output": {
"field1": "value999",
"field2": "value888"
}
}]
}]
}
I managed to come up with the following query:
SELECT b.random_field,
b.transactions -- how to map this?
FROM blocks b
UNNEST b.transactions t
UNNEST t.inputs input
JOIN blocks `source` ON (ANY tx IN `source`.transactions SATISFIES tx.`id` = input.tx_id END)
UNNEST `source`.transactions source_tx
UNNEST source_tx.outputs o
WHERE (ANY tx IN b.transactions SATISFIES tx.`id` = 'AAAAA' END) LIMIT 1;
I suppose there should be a way to map b.transactions.inputs by using source_tx.outputs, but I couldn't find how.
I came across this other answer, but I don't really understand how it applies to my scenario. Maybe it does, but I am very new to Couchbase, so I am very much lost: How to map array values in one document to another and display in result
Basically, you want to inline part of another document into the current document based on a condition.
Instead of JOINs plus GROUP BY, use subquery expressions with correlated subqueries. (Selecting b.*, "abc" AS transactions returns all fields of b and adds transactions: if the field already exists it is overwritten, otherwise it is added.)
CREATE INDEX ix1 ON blocks (ALL ARRAY ot.id FOR ot IN transactions END);
SELECT b.*,
(SELECT t.*,
(SELECT i.*,
(SELECT RAW oto
FROM blocks AS o
UNNEST o.transactions AS ot
UNNEST ot.outputs AS oto
WHERE i.tx_id = ot.id AND i.`index` = UNNEST_POS(oto))[0] AS output
FROM t.`inputs` AS i) AS inputs
FROM b.transactions AS t) AS transactions
FROM blocks AS b
WHERE ANY tx IN b.transactions SATISFIES tx.`inputs` IS NOT NULL END ;
OR
SELECT b.*,
(SELECT t.*,
(SELECT i.*,
(SELECT RAW ot.outputs[i.`index`]
FROM blocks AS o
UNNEST o.transactions AS ot
WHERE i.tx_id = ot.id
LIMIT 1)[0] AS output
FROM t.`inputs` AS i) AS inputs
FROM b.transactions AS t) AS transactions
FROM blocks AS b
WHERE ANY tx IN b.transactions SATISFIES tx.`inputs` IS NOT NULL END ;
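The nesting of the correlated subqueries can be hard to follow, so here is a rough Python sketch of the same mapping applied to the three sample documents. The helper names `resolve_output` and `enrich` are invented for this illustration. The sketch assumes transaction ids are unique across blocks and does not reproduce N1QL's handling of MISSING values.

```python
# The three sample documents from the question, as plain dicts.
blocks = [
    {"random_field": 1,
     "transactions": [{"id": "CCCCC",
                       "inputs": [{"tx_id": "AAAAA", "index": 0},
                                  {"tx_id": "BBBBB", "index": 1}]}]},
    {"transactions": [{"id": "AAAAA",
                       "outputs": [{"field1": "value123", "field2": "value456"},
                                   {"field1": "ignore", "field2": "ignore"}]}]},
    {"transactions": [{"id": "BBBBB",
                       "outputs": [{"field1": "ignored", "field2": "ignored"},
                                   {"field1": "value999", "field2": "value888"}]}]},
]


def resolve_output(blocks, tx_id, index):
    """Innermost subquery: find the transaction by id, then take outputs[index].

    Assumes ids are unique, so the first match wins (like LIMIT 1).
    """
    for block in blocks:
        for tx in block.get("transactions", []):
            if tx.get("id") == tx_id:
                outputs = tx.get("outputs", [])
                return outputs[index] if 0 <= index < len(outputs) else None
    return None


def enrich(block, blocks):
    """Outer subqueries: copy the block and rebuild transactions and inputs."""
    transactions = []
    for tx in block.get("transactions", []):
        new_tx = dict(tx)  # roughly SELECT t.*
        if "inputs" in tx:
            new_tx["inputs"] = [
                # roughly SELECT i.*, (...) AS output
                {**inp, "output": resolve_output(blocks, inp["tx_id"], inp["index"])}
                for inp in tx["inputs"]
            ]
        transactions.append(new_tx)
    # roughly SELECT b.*, (...) AS transactions: the new field replaces the old one
    return {**block, "transactions": transactions}


# WHERE ANY tx IN b.transactions SATISFIES tx.inputs IS NOT NULL END
enriched = [
    enrich(b, blocks)
    for b in blocks
    if any(tx.get("inputs") is not None for tx in b.get("transactions", []))
]
```

Only the first block has inputs, so `enriched` holds a single document whose inputs each carry the matching `output` object.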

How to deal with not existing values using JSON_EXTRACT?

I have a list of objects. Each object contains several properties. Now I want to write a SELECT statement that gives me a list of the values of a single property. The simplified list looks like this:
[
[
{
"day": "2021-10-01",
"entries": [
{
"name": "Start of competition",
"startTimeDelta": "08:30:00"
}
]
},
{
"day": "2021-10-02",
"entries": [
{
"name": "Start of competition",
"startTimeDelta": "03:30:00"
}
]
},
{
"day": "2021-10-03",
"entries": [
{
"name": "Start of competition"
}
]
}
]
]
The working SELECT is now
SELECT
JSON_EXTRACT(column, '$.days[*].entries[0].startTimeDelta') AS list
FROM table
The returned result is
[
"08:30:00",
"03:30:00"
]
But what I want to get (and also have expected) is
[
"08:30:00",
"03:30:00",
null
]
What can I do or how can I change the SELECT statement so that I also get NULL values in the list?
SELECT startTimeDelta
FROM test
CROSS JOIN JSON_TABLE(val,
'$[*][*].entries[*]' COLUMNS (startTimeDelta TIME PATH '$.startTimeDelta')) jsontable
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=491f0f978d200a8a8522e3200509460e
Do you also have a working idea for MySQL < 8? – Lars
What is max amount of objects in the array on the 2nd level? – Akina
Well it's usually less than 10 – Lars
SELECT JSON_EXTRACT(val, CONCAT('$[0][', num, '].entries[0].startTimeDelta')) startTimeDelta
FROM test
-- up to 4 - increase if needed
CROSS JOIN (SELECT 0 num UNION SELECT 1 UNION SELECT 2 UNION SELECT 3) nums
WHERE JSON_EXTRACT(val, CONCAT('$[0][', num, '].entries[0]')) IS NOT NULL;
https://www.db-fiddle.com/f/xnCCSTGQXevcpfPH1GAbUo/0
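The difference between the two behaviours can be sketched in Python. A wildcard path collects only the matches that exist, while extracting once per array element keeps a NULL for every element that lacks the key. The function names below are made up for illustration, and the sketch models the logic rather than MySQL's actual JSON path evaluation.

```python
import json

# The simplified column value from the question: an array wrapping an array of days.
val = json.loads("""
[[
  {"day": "2021-10-01", "entries": [{"name": "Start of competition", "startTimeDelta": "08:30:00"}]},
  {"day": "2021-10-02", "entries": [{"name": "Start of competition", "startTimeDelta": "03:30:00"}]},
  {"day": "2021-10-03", "entries": [{"name": "Start of competition"}]}
]]
""")


def wildcard_extract(days):
    """Mimics a wildcard path: only existing matches are collected,
    so a missing key silently drops out of the list."""
    return [
        d["entries"][0]["startTimeDelta"]
        for d in days
        if d.get("entries") and "startTimeDelta" in d["entries"][0]
    ]


def per_row_extract(days):
    """Mimics one row per day: every day with a first entry yields a value,
    which is None (NULL) when the key is absent."""
    return [
        d["entries"][0].get("startTimeDelta")
        for d in days
        if d.get("entries")
    ]


dropped = wildcard_extract(val[0])
kept = per_row_extract(val[0])
```

Here `dropped` has two items and `kept` has three, the last one being `None`, which corresponds to the NULL the question asks for.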

Aggregate JSON in PostgreSQL

I have a json column with entries that look like this:
{
"pages": "64",
"stats": {
"1": { "200": "55", "400": "4" },
"2": { "200": "1" },
"3": { "200": "1", "404": "13" }
}
}
The 'stats' are collections (of various sizes) containing http status codes versus counts.
I would like to aggregate the stats into two calculated columns - one for the total number of 200 responses and the other for the total number of responses (including 200s).
You can use two lateral joins to unnest the inner objects, then do conditional aggregation:
select
sum(z.cnt::int) no_responses,
sum(z.cnt::int) filter(where z.code::int = 200) no_200_responses
from mytable t
cross join lateral jsonb_each(t.data -> 'stats') as x(kx, obj)
cross join lateral jsonb_each_text(x.obj) as z(code, cnt)
Demo on DB Fiddle:
no_responses | no_200_responses
-----------: | ---------------:
74 | 57
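As a cross-check of the numbers in the demo, the same two-level unnesting and conditional sum can be sketched in Python over the sample entry. The `rows` list stands in for the table and is an assumption of this sketch, as is the function name `aggregate`.

```python
# Sample entry from the question; counts are stored as strings, as in the JSON.
data = {
    "pages": "64",
    "stats": {
        "1": {"200": "55", "400": "4"},
        "2": {"200": "1"},
        "3": {"200": "1", "404": "13"},
    },
}
rows = [data]  # hypothetical table contents: one row holding the sample entry


def aggregate(rows):
    """Mirror of the two lateral joins plus SUM ... FILTER."""
    no_responses = 0
    no_200_responses = 0
    for row in rows:
        # First lateral join: one (key, object) pair per entry in "stats".
        for _kx, obj in row["stats"].items():
            # Second lateral join: one (code, count) pair per status code.
            for code, cnt in obj.items():
                no_responses += int(cnt)
                # FILTER (WHERE code = 200): conditional part of the aggregate.
                if int(code) == 200:
                    no_200_responses += int(cnt)
    return no_responses, no_200_responses


totals = aggregate(rows)
```

For the sample entry this gives 74 responses in total, 57 of them with status 200, matching the demo output.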

Count occurrences along with result using DISTINCT ON on PostgreSQL

I have data like this:
[
{"name": "pratha", "email": "p#g.com", "sub": { "id": 1 } },
{"name": "john", "email": "x#x.com", "sub": { "id": 5 } },
{"name": "pratha", "email": "c#d.com", "sub": { "id": 2 } }
]
This is my query to get unique and latest emails:
SELECT DISTINCT ON (jae.e->>'name')
jae.e->>'name' as name,
jae.e->>'email' as email
FROM survey_results sr
CROSS JOIN LATERAL jsonb_array_elements(sr.data_field) jae (e)
ORDER BY jae.e->>'name', jae.e->'sub'->>'id' desc
The problem is that when I add count(*) to the select, all counts are equal.
I want to get unique results with DISTINCT ON and also count their occurrences, together with their data (not just the counts). So in this case, pratha should be 2 and john should be 1.
How can I achieve this with PostgreSQL?
See here: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=f5c640958c3e4d594287632d0f4a835f
Do you need this?
SELECT DISTINCT ON (jj->>'name') jj->>'name', jj->>'email' , count(*) over(partition by jj->>'name' )
from survey_results
join lateral jsonb_array_elements(data_field) j(jj) on true
ORDER BY jj->>'name', jj->'sub'->>'id' desc
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=5f07b7bcb0001ebe32aa2f1338d9d0f0
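The reason the window function works is that count(*) OVER (PARTITION BY ...) is evaluated over all rows before DISTINCT ON discards the duplicates. A small Python sketch of that order of evaluation on the sample data follows. The function name `latest_with_counts` is hypothetical, and the sketch sorts by the numeric id, whereas the SQL compares the text returned by ->>.

```python
from collections import Counter

# Sample array from the question.
data_field = [
    {"name": "pratha", "email": "p#g.com", "sub": {"id": 1}},
    {"name": "john", "email": "x#x.com", "sub": {"id": 5}},
    {"name": "pratha", "email": "c#d.com", "sub": {"id": 2}},
]


def latest_with_counts(elements):
    """Hypothetical helper mirroring DISTINCT ON plus a window count."""
    # count(*) OVER (PARTITION BY name): computed over every row first.
    counts = Counter(e["name"] for e in elements)

    # ORDER BY name, sub.id DESC (assumption: ids compared as numbers here).
    ordered = sorted(elements, key=lambda e: (e["name"], -e["sub"]["id"]))

    # DISTINCT ON (name): keep only the first row of each name in that order.
    seen = set()
    result = []
    for e in ordered:
        if e["name"] not in seen:
            seen.add(e["name"])
            result.append(
                {"name": e["name"], "email": e["email"], "count": counts[e["name"]]}
            )
    return result


latest = latest_with_counts(data_field)
```

Each name appears once in `latest`, carrying the email of its highest `sub.id` and the number of rows that shared the name.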

N1QL join with same Bucket with some logic

I have the JSON documents below.
First, a Campaign:
"campaign|1000":{
"_id": 1000,
"_type": "Campaign",
"country": 14,
"created": "2016-03-08T18:30:00.000Z",
"user": 45,
"bids": [{
"click": 123
},
{
"click": 50
}
]
}
Second, a User:
"User|257": {
"IMId": "",
"IMType": 0,
"_id": 257,
"_type": "User",
"children": [1,4,45,67,106],
"roles": [4],
"email": "krishn#inheritx.com"
}
Now I need to select the users who have roles = [4] and join their children with the campaigns. Each child is itself a user: for example, child 45 refers to "User|45".
I am trying below query
select Users._id,count(Campaign._id) total from Inheritx Campaign
join Inheritx Users on keys("User|"|| TOSTRING(Campaign.`user`))
join reachEffect realted_users on keys ARRAY "User|" || TOSTRING FOR
c IN Users.children END where Campaign._type="Campaign" and
Users.roles=[4] group by Users._id
But I need to join the campaigns with the children of the users whose role is 4.
I need output like below
{
"user": 257,
"clicks": 157
}
Two options.
(1) You can start with campaign, join to users, and then filter on roles afterwards.
(2) You can start with users, filter on roles, and then use an INDEX JOIN to join users to campaigns.
See https://dzone.com/articles/join-faster-with-couchbase-index-joins
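As a rough illustration of option (2), the following Python sketch starts from the users, filters on roles, and then looks up the campaigns owned by any of their children. It rests on several assumptions that the question does not confirm: that "has roles=[4]" means role 4 is contained in the roles array, that clicks are summed over all bids of the matching campaigns, and that the function name `clicks_per_parent` is only a placeholder. It models the logic only, not how an index join is executed.

```python
# Sample documents keyed as in the question (only the fields used here).
bucket = {
    "campaign|1000": {
        "_id": 1000, "_type": "Campaign", "user": 45,
        "bids": [{"click": 123}, {"click": 50}],
    },
    "User|257": {
        "_id": 257, "_type": "User",
        "children": [1, 4, 45, 67, 106], "roles": [4],
    },
}


def clicks_per_parent(bucket, role):
    """Hypothetical helper: total clicks of campaigns owned by a user's children."""
    docs = list(bucket.values())
    result = []
    for user in docs:
        # Start with users and filter on roles first
        # (assumption: membership test rather than roles == [4]).
        if user.get("_type") != "User" or role not in user.get("roles", []):
            continue
        children = set(user.get("children", []))
        # Then join to campaigns whose owner is one of the children.
        # A real index join would use an index on the campaign's user field.
        clicks = sum(
            bid["click"]
            for c in docs
            if c.get("_type") == "Campaign" and c.get("user") in children
            for bid in c.get("bids", [])
        )
        result.append({"user": user["_id"], "clicks": clicks})
    return result


totals = clicks_per_parent(bucket, 4)
```

Filtering on roles first keeps the set of users small before any campaign is looked at, which is the motivation for starting from the user side.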