How to select max value from rows and join to another table - mysql

I am trying to join two tables and, for each ID, keep the row that has the larger value in the value column. The expected result is shown below.
select * from order
-------------------------
| ID | value | Name |
-------------------------
| 1 | 23 | REM |
| 2 | 0 | SER |
| 3 | 13 | MH |
| 4 | 3 | MH |
| 5 | 1 | MP |
-------------------------
select * from product
-------------------------
| ID | value | Name |
-------------------------
| 1 | 2 | ABC |
| 2 | 2 | DEG |
| 3 | 17 | XYZ |
-------------------------
Desired result:
-------------------------
| ID | Value | Name |
-------------------------
| 1 | 23 | REM |
| 2 | 2 | DEG |
| 3 | 17 | XYZ |
| 4 | 3 | MH |
| 5 | 1 | MP |
-------------------------
I tried the query below, but it does not return the Name from the other table:
SELECT
MAX(IF(a.value >b.value , a.value ,b.value )) AS Value
from order a left join product b on a.ID= b.ID
Please suggest how to get the expected result from these two tables.

Below is for BigQuery Standard SQL
#standardsql
select as value array_agg(struct(id, value, name) order by value desc limit 1)[offset(0)]
from
(
select * from `project.dataset.order`
union all
select * from `project.dataset.product`
)
group by id

You can do this using a full join:
-- "order" is a reserved word, so quote it the way your database requires
select id,
(case when p.value is null or p.value < o.value then o.value else p.value end) as value,
(case when p.value is null or p.value < o.value then o.name else p.name end) as name
from product p full join
order o
using (id);
I just find this the simplest way to think about the problem.
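Since the question is tagged mysql, and MySQL (as far as I know) does not accept FULL JOIN, here is a minimal sketch of the same idea without it: stack both tables with UNION ALL, then join back to the per-id maximum. It runs the SQL through Python's built-in sqlite3 module purely so the sample data can be checked; the names stacked, m and max_value are my own, and in MySQL the table would be quoted as `order` with backticks instead of double quotes. If both tables hold the same value for an id, this version returns both rows.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "order" (id INTEGER, value INTEGER, name TEXT);
    CREATE TABLE product (id INTEGER, value INTEGER, name TEXT);
    INSERT INTO "order" VALUES
        (1, 23, 'REM'), (2, 0, 'SER'), (3, 13, 'MH'), (4, 3, 'MH'), (5, 1, 'MP');
    INSERT INTO product VALUES
        (1, 2, 'ABC'), (2, 2, 'DEG'), (3, 17, 'XYZ');
""")

query = """
WITH stacked AS (              -- all rows from both tables
    SELECT id, value, name FROM "order"
    UNION ALL
    SELECT id, value, name FROM product
)
SELECT s.id, s.value, s.name
FROM stacked AS s
JOIN (SELECT id, MAX(value) AS max_value   -- largest value per id
      FROM stacked
      GROUP BY id) AS m
  ON m.id = s.id AND m.max_value = s.value
ORDER BY s.id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this gives the five rows of the desired result (REM, DEG, XYZ, MH, MP).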

Related

How Affect Group By to Other Second Join Table

I have some table like this
table request_buys
| id | invoice | user_id |
| -- | ----------------- | ------- |
| 3 | 20220405/01104298 | 1 |
table traces
| id | request_buy_id | status_id | created_at |
| -- | -------------- | --------- | ------------------- |
| 37 | 3 | 1 | 2022-03-27 14:12:25 |
| 38 | 3 | 2 | 2022-03-28 14:12:25 |
| 39 | 3 | 3 | 2022-03-29 14:12:25 |
| 40 | 3 | 4 | 2022-03-30 14:12:25 |
| 41 | 3 | 5 | 2022-03-31 14:12:25 |
| 42 | 3 | 6 | 2022-04-01 14:12:25 |
table statuses
| id | nama |
| -- | ----------------- |
| 1 | Order Placed |
| 2 | Order Paid |
| 3 | Accepted |
| 4 | Picked by Courier |
| 5 | In Transit |
| 6 | Delivered |
| 7 | Rated |
| 8 | Rejected |
| 9 | Canceled |
Then I tried a query like the one below:
select
request_buys.invoice,
MAX(traces.id) as traces_id,
MAX(statuses.nama) as statuses_nama
from
`request_buys`
inner join `traces` on `request_buys`.`id` = `traces`.`request_buy_id`
inner join `statuses` on `traces`.`status_id` = `statuses`.`id`
where
`user_id` = 1
group by
request_buys.id
This produces the following output:
output
| invoice | traces_id | statuses_nama |
| ----------------- | --------- | ----------------- |
| 20220405/01104298 | 42 | Picked by Courier |
The output I expect is shown in the table below:
expect
| invoice | traces_id | statuses_nama |
| ----------------- | --------- | ----------------- |
| 20220405/01104298 | 42 | Delivered |
I understand that my error is MAX(statuses.nama), and that I should remove MAX() from statuses.nama.
But then I get an error like this: "SELECT list is not in GROUP BY clause and contains nonaggregated ... this is incompatible with sql_mode=only_full_group_by"
Then I tried removing "ONLY_FULL_GROUP_BY" from the SQL mode with the following query:
SET sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''))
and the result is this:
output
| invoice | traces_id | statuses_nama |
| ----------------- | --------- | ----------------- |
| 20220405/01104298 | 42 | Order Placed |
I'm really stuck on this.
How can I make traces.status_id from the GROUP BY result (grouped on request_buys.id) keep its relationship with statuses.id?
Your problem is the MAX(statuses.nama) expression. Based on your expected output, you want the statuses.nama that matches MAX(traces.id), not MAX(statuses.nama), which returns the highest value in alphabetical order. In this case 'Picked by Courier' sorts after 'Delivered' because 'P' > 'D'. I have tweaked your code a bit and tried it in Workbench, supposing there is more than one invoice for a particular user (e.g. insert into request_buys values (4,'20230405/01104298',1); insert into traces values (43,4,7,'2022-04-01 14:12:25');). It works as intended.
select invoice, t.id as traces_id, s.nama as statuses_nama from request_buys r
join traces t on r.id=t.request_buy_id
join statuses s on t.status_id=s.id
join
(select traces.request_buy_id, MAX(traces.id) as traces_id
from `request_buys`
inner join `traces` on `request_buys`.`id` = `traces`.`request_buy_id`
where
`user_id` = 1
group by
traces.request_buy_id ) join_t
on t.request_buy_id=join_t.request_buy_id and t.id=join_t.traces_id
;
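To make the difference concrete, here is a small sketch that loads the question's sample rows and runs both variants. It uses Python's built-in sqlite3 module only as a convenient scratch database, so treat it as an illustration rather than MySQL output; the variable names naive and latest, and the correlated-subquery form of the second query, are my own choices.

```python
import sqlite3

# Scratch database with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE request_buys (id INTEGER, invoice TEXT, user_id INTEGER);
    CREATE TABLE traces (id INTEGER, request_buy_id INTEGER,
                         status_id INTEGER, created_at TEXT);
    CREATE TABLE statuses (id INTEGER, nama TEXT);
    INSERT INTO request_buys VALUES (3, '20220405/01104298', 1);
    INSERT INTO traces VALUES
        (37, 3, 1, '2022-03-27 14:12:25'), (38, 3, 2, '2022-03-28 14:12:25'),
        (39, 3, 3, '2022-03-29 14:12:25'), (40, 3, 4, '2022-03-30 14:12:25'),
        (41, 3, 5, '2022-03-31 14:12:25'), (42, 3, 6, '2022-04-01 14:12:25');
    INSERT INTO statuses VALUES
        (1, 'Order Placed'), (2, 'Order Paid'), (3, 'Accepted'),
        (4, 'Picked by Courier'), (5, 'In Transit'), (6, 'Delivered'),
        (7, 'Rated'), (8, 'Rejected'), (9, 'Canceled');
""")

# Two independent aggregates: MAX(nama) is the alphabetically last name,
# unrelated to the row that holds MAX(traces.id).
naive = conn.execute("""
    SELECT rb.invoice, MAX(t.id), MAX(s.nama)
    FROM request_buys rb
    JOIN traces t ON t.request_buy_id = rb.id
    JOIN statuses s ON s.id = t.status_id
    WHERE rb.user_id = 1
    GROUP BY rb.id, rb.invoice
""").fetchall()

# Keep only the trace row whose id is the maximum for its request,
# then read the status name from that same row.
latest = conn.execute("""
    SELECT rb.invoice, t.id, s.nama
    FROM request_buys rb
    JOIN traces t ON t.request_buy_id = rb.id
    JOIN statuses s ON s.id = t.status_id
    WHERE rb.user_id = 1
      AND t.id = (SELECT MAX(t2.id) FROM traces t2
                  WHERE t2.request_buy_id = rb.id)
""").fetchall()

print(naive)   # status is 'Picked by Courier'
print(latest)  # status is 'Delivered'
```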
If I'm understanding correctly, you're trying to retrieve the most recent status for each invoice. Using MAX(nama) won't return that result, because it just picks the maximum status name alphabetically.
Assuming you're using MySQL 8.x, use ROW_NUMBER() to rank the statuses for each invoice, most recent date first, then keep the latest one with WHERE RowNum = 1:
WITH cte AS (
SELECT rb.id AS request_buy_id
, rb.invoice
, t.id AS traces_id
, s.nama AS statuses_nama
, ROW_NUMBER() OVER(PARTITION BY rb.id ORDER BY t.created_at DESC) AS RowNum
FROM request_buys rb
INNER JOIN traces t ON rb.id = t.request_buy_id
INNER JOIN statuses s ON t.status_id = s.id
WHERE user_id = 1
)
SELECT *
FROM cte
WHERE RowNum = 1
;
Result:
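| request_buy_id | invoice | traces_id | statuses_nama | RowNum |
| -------------- | ----------------- | --------- | ------------- | ------ |
| 3 | 20220405/01104298 | 42 | Delivered | 1 |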

Selecting COUNT and MAX columns with 2 tables and a bridge table

I have 3 tables (pictures, collections, and bridge) with the following columns:
Collections Table:
| id | name |
------------------
| 1 | coll1 |
| 2 | coll2 |
------------------
Pictures Table: (timestamps are unix timestamps)
| id | name | timestamp |
-------------------------
| 5 | Pic5 | 1 |
| 6 | Pic6 | 19 |
| 7 | Pic7 | 3 |
| 8 | Pic8 | 892 |
| 9 | Pic9 | 4 |
-------------------------
Bridge Table:
| id | collection | picture |
-----------------------------
| 1 | 1 | 5 |
| 2 | 1 | 6 |
| 3 | 1 | 7 |
| 4 | 1 | 8 |
| 5 | 2 | 5 |
| 6 | 2 | 9 |
| 7 | 2 | 7 |
-----------------------------
And the result should look like this:
| collection_name | picture_count | newest_picture |
----------------------------------------------------
| coll1 | 4 | 8 |
| coll2 | 3 | 9 |
----------------------------------------------------
newest_picture should always be the picture with the highest timestamp in that collection, and I also want to sort the result by it. picture_count is the number of pictures in that collection.
Can this be done in a single statement with table joins, and if so,
what is the best way to do it?
A simple method uses correlated subqueries:
select c.*,
(select count(*)
from bridge b
where b.collection = c.id
) as pic_count,
(select p.id
from bridge b join
pictures p
on b.picture = p.id
where b.collection = c.id
order by p.timestamp desc
limit 1
) as most_recent_picture
from collections c;
A more common approach would use window functions:
select c.id, c.name, count(bp.collection), bp.most_recent_picture
from collections c left join
(select b.*,
first_value(p.id) over (partition by b.collection order by p.timestamp desc) as most_recent_picture
from bridge b join
pictures p
on b.picture = p.id
) bp
on bp.collection = c.id
group by c.id, c.name, bp.most_recent_picture;
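For anyone who wants to check the numbers, here is a rough sketch that loads the question's sample rows and also applies the requested sort by newest picture. It mixes a plain GROUP BY for the count with a correlated subquery for the newest picture id, which is my own combination rather than either query above, and it runs through Python's sqlite3 module just to have something executable.

```python
import sqlite3

# Scratch database with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collections (id INTEGER, name TEXT);
    CREATE TABLE pictures (id INTEGER, name TEXT, timestamp INTEGER);
    CREATE TABLE bridge (id INTEGER, collection INTEGER, picture INTEGER);
    INSERT INTO collections VALUES (1, 'coll1'), (2, 'coll2');
    INSERT INTO pictures VALUES
        (5, 'Pic5', 1), (6, 'Pic6', 19), (7, 'Pic7', 3),
        (8, 'Pic8', 892), (9, 'Pic9', 4);
    INSERT INTO bridge VALUES
        (1, 1, 5), (2, 1, 6), (3, 1, 7), (4, 1, 8),
        (5, 2, 5), (6, 2, 9), (7, 2, 7);
""")

query = """
SELECT c.name AS collection_name,
       COUNT(*) AS picture_count,
       -- id of the picture with the highest timestamp in this collection
       (SELECT p2.id
        FROM bridge b2
        JOIN pictures p2 ON p2.id = b2.picture
        WHERE b2.collection = c.id
        ORDER BY p2.timestamp DESC
        LIMIT 1) AS newest_picture
FROM collections c
JOIN bridge b ON b.collection = c.id
JOIN pictures p ON p.id = b.picture
GROUP BY c.id, c.name
ORDER BY MAX(p.timestamp) DESC   -- collections with the newest picture first
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that the inner join drops collections without any pictures; a LEFT JOIN with COUNT(b.id) would keep them.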

MySQL Grouping Query

I have a number of tables in my database.
Table: ObjectToPerson
For example if I had a number of entries below in the database:
+----+------------+------------+----------+----------+--------------+
| Id | WeekNumber | Date | PersonId | ObjectId | ObjectTypeId |
+----+------------+------------+----------+----------+--------------+
| 1 | 1 | 2015-11-04 | 1 | 1 | 1 |
| 2 | 1 | 2015-11-04 | 1 | 3 | 2 |
| 3 | 1 | 2015-11-04 | 2 | 2 | 1 |
| 4 | 1 | 2015-11-04 | 2 | 4 | 2 |
+----+------------+------------+----------+----------+--------------+
I want the results returned as two rows, as follows:
+------+------------+----------+----------------------------+----------------------------+
| Week | Date | PersonId | ObjectId(ObjectTypeId = 1) | ObjectId(ObjectTypeId = 2) |
+------+------------+----------+----------------------------+----------------------------+
| 1 | 2015-11-04 | 1 | 1 | 3 |
| 1 | 2015-11-04 | 2 | 2 | 4 |
+------+------------+----------+----------------------------+----------------------------+
I am thinking of some sort of Group By query but I just can't seem to get it right.
Select * From ObjectToPerson
Left Join Objects O On O.Id = ObjectToPerson.ObjectId And ObjectToPerson.ObjectTypeId = 1
Left Join Objects O On O.Id = ObjectToPerson.ObjectId And ObjectToPerson.ObjectTypeId = 2
Can someone explain how I would get to this please?
You could use CASE to only select the ObjectId if the type is correct for the column, then use MAX/GROUP BY to group the result into a single row per person/week/date.
SELECT WeekNumber week, date, personid,
MAX(CASE WHEN ObjectTypeId=1 THEN ObjectId END) Type1,
MAX(CASE WHEN ObjectTypeId=2 THEN ObjectId END) Type2
FROM ObjectToPerson
GROUP BY week, date, personid
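A quick way to see the CASE plus MAX pivot in action is to run it against the four sample rows. The sketch below does that with Python's sqlite3 module as a stand-in database; the ORDER BY is my addition so the row order is predictable.

```python
import sqlite3

# Scratch database with the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ObjectToPerson (
        Id INTEGER, WeekNumber INTEGER, Date TEXT,
        PersonId INTEGER, ObjectId INTEGER, ObjectTypeId INTEGER);
    INSERT INTO ObjectToPerson VALUES
        (1, 1, '2015-11-04', 1, 1, 1),
        (2, 1, '2015-11-04', 1, 3, 2),
        (3, 1, '2015-11-04', 2, 2, 1),
        (4, 1, '2015-11-04', 2, 4, 2);
""")

query = """
SELECT WeekNumber, Date, PersonId,
       -- CASE yields NULL for the other type; MAX ignores NULLs,
       -- so each column keeps only the ObjectId of its own type
       MAX(CASE WHEN ObjectTypeId = 1 THEN ObjectId END) AS Type1,
       MAX(CASE WHEN ObjectTypeId = 2 THEN ObjectId END) AS Type2
FROM ObjectToPerson
GROUP BY WeekNumber, Date, PersonId
ORDER BY PersonId
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each person/week/date collapses to one row, with the type 1 object in the first pivot column and the type 2 object in the second.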
You don't want two joins, you want a WHERE clause:
SELECT * FROM ObjectToPerson
LEFT JOIN Objects O ON O.Id = ObjectToPerson.ObjectId
WHERE ObjectToPerson.ObjectTypeId IN(1,2)

MySQL SELECT value before MAX

How to select 1st, 2nd or 3rd value before MAX ?
usually we do it with order by and limit
SELECT * FROM table1
ORDER BY field1 DESC
LIMIT 2,1
but with my current query I don't know how to make it...
Sample table
+----+------+------+-------+
| id | name | type | count |
+----+------+------+-------+
| 1 | a | 1 | 2 |
| 2 | ab | 1 | 3 |
| 3 | abc | 1 | 1 |
| 4 | b | 2 | 7 |
| 5 | ba | 2 | 1 |
| 6 | cab | 3 | 9 |
+----+------+------+-------+
I'm taking name for each type with max count with this query
SELECT
`table1b`.`name`
FROM
(SELECT
`table1a`.`type`, MAX(`table1a`.`count`) AS `Count`
FROM
`table1` AS `table1a`
GROUP BY `table1a`.`type`) AS `table1a`
INNER JOIN
`table1` AS `table1b` ON (`table1b`.`type` = `table1a`.`type` AND `table1b`.`count` = `table1a`.`Count`)
and I want one additional column next to name, holding the value just before max(count),
so the result should be
+------+------------+
| name | before_max |
+------+------------+
| ab | 2 |
| b | 1 |
| cab | NULL |
+------+------------+
Please ask if something isn't clear ;)
As per your given table (test) structure, the query would be as follows:
select max_name.name, before_max.count
from
(SELECT type, max(count) as max
FROM `test`
group by type) as type_max
join
(select type, name, count
from test
) as max_name on (type_max.type = max_name.type and max_name.count = type_max.max)
left join
(select t1.type, max(t1.count) as count
from test as t1
where t1.count <> (select max(t2.count) from test as t2 where t1.type = t2.type)
group by t1.type) as before_max on (type_max.type = before_max.type)
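As a cross-check, here is a shorter variant of the same idea written with correlated subqueries: keep the row holding the maximum count per type, then look up the largest count strictly below it. This is my own formulation, not the answer's query, and it runs through Python's sqlite3 module on the question's sample table (named table1 as in the question) just to confirm the expected output.

```python
import sqlite3

# Scratch database with the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT, type INTEGER, count INTEGER);
    INSERT INTO table1 VALUES
        (1, 'a', 1, 2), (2, 'ab', 1, 3), (3, 'abc', 1, 1),
        (4, 'b', 2, 7), (5, 'ba', 2, 1), (6, 'cab', 3, 9);
""")

query = """
SELECT m.name,
       -- largest count in the same type that is strictly below the max;
       -- NULL when the type has only one distinct count
       (SELECT MAX(t2.count)
        FROM table1 t2
        WHERE t2.type = m.type AND t2.count < m.count) AS before_max
FROM table1 m
WHERE m.count = (SELECT MAX(t3.count)
                 FROM table1 t3
                 WHERE t3.type = m.type)
ORDER BY m.type
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For the 2nd or 3rd value before the max, the inner lookup could be swapped for an ORDER BY ... DESC LIMIT 1 OFFSET n subquery, though I have not tried that here.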

Advanced MySQL: Find correlations between poll responses

I've got four MySQL tables:
users (id, name)
polls (id, text)
options (id, poll_id, text)
responses (id, poll_id, option_id, user_id)
Given a particular poll and a particular option, I'd like to generate a table that shows which options from other polls are most strongly correlated.
Suppose this is our data set:
TABLE users:
+------+-------+
| id | name |
+------+-------+
| 1 | Abe |
| 2 | Bob |
| 3 | Che |
| 4 | Den |
+------+-------+
TABLE polls:
+------+-----------------------+
| id | text |
+------+-----------------------+
| 1 | Do you like apples? |
| 2 | What is your gender? |
| 3 | What is your height? |
| 4 | Do you like polls? |
+------+-----------------------+
TABLE options:
+------+----------+---------+
| id | poll_id | text |
+------+----------+---------+
| 1 | 1 | Yes |
| 2 | 1 | No |
| 3 | 2 | Male |
| 4 | 2 | Female |
| 5 | 3 | Short |
| 6 | 3 | Tall |
| 7 | 4 | Yes |
| 8 | 4 | No |
+------+----------+---------+
TABLE responses:
+------+----------+------------+----------+
| id | poll_id | option_id | user_id |
+------+----------+------------+----------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 2 | 3 |
| 4 | 1 | 2 | 4 |
| 5 | 2 | 3 | 1 |
| 6 | 2 | 3 | 2 |
| 7 | 2 | 3 | 3 |
| 8 | 2 | 4 | 4 |
| 9 | 3 | 5 | 1 |
| 10 | 3 | 6 | 2 |
| 10 | 3 | 5 | 3 |
| 10 | 3 | 6 | 4 |
| 10 | 4 | 7 | 1 |
| 10 | 4 | 7 | 2 |
| 10 | 4 | 7 | 3 |
| 10 | 4 | 7 | 4 |
+------+----------+------------+----------+
Given the poll ID 1 and the option ID 2, the generated table should be something like this:
+----------+------------+-----------------------+
| poll_id | option_id | percent_correlated |
+----------+------------+-----------------------+
| 4 | 7 | 100 |
| 2 | 3 | 66.66 |
| 3 | 6 | 66.66 |
| 2 | 4 | 33.33 |
| 3 | 5 | 33.33 |
| 4 | 8 | 0 |
+----------+------------+-----------------------+
So basically, we're identifying all of the users who responded to poll ID 1 and selected option ID 2, and we're looking through all the other polls to see what percentage of them also selected each other option.
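To pin down what is being asked, here is a minimal sketch of that calculation on the sample data, run through Python's sqlite3 module. It is not one of the posted answers. Assumptions on my part: the responses id column is left out (the sample repeats id 10), the CTE name picked is made up, and the denominator is simply the number of users who picked the given option, which only matches the description if each of them answered every other poll. Values are rounded to two decimals, so 66.66 in the table above shows up as 66.67.

```python
import sqlite3

# Scratch database with the sample polls, options and responses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE options (id INTEGER, poll_id INTEGER, text TEXT);
    CREATE TABLE responses (poll_id INTEGER, option_id INTEGER, user_id INTEGER);
    INSERT INTO options VALUES
        (1, 1, 'Yes'), (2, 1, 'No'), (3, 2, 'Male'), (4, 2, 'Female'),
        (5, 3, 'Short'), (6, 3, 'Tall'), (7, 4, 'Yes'), (8, 4, 'No');
    INSERT INTO responses VALUES
        (1, 1, 1), (1, 2, 2), (1, 2, 3), (1, 2, 4),
        (2, 3, 1), (2, 3, 2), (2, 3, 3), (2, 4, 4),
        (3, 5, 1), (3, 6, 2), (3, 5, 3), (3, 6, 4),
        (4, 7, 1), (4, 7, 2), (4, 7, 3), (4, 7, 4);
""")

query = """
WITH picked AS (   -- users who chose the given option in the given poll
    SELECT user_id FROM responses
    WHERE poll_id = :poll AND option_id = :opt
)
SELECT o.poll_id,
       o.id AS option_id,
       ROUND(100.0 * COUNT(r.user_id) / (SELECT COUNT(*) FROM picked), 2)
           AS percent_correlated
FROM options o
LEFT JOIN responses r          -- LEFT JOIN keeps options nobody picked (0%)
       ON r.poll_id = o.poll_id
      AND r.option_id = o.id
      AND r.user_id IN (SELECT user_id FROM picked)
WHERE o.poll_id <> :poll
GROUP BY o.poll_id, o.id
ORDER BY percent_correlated DESC, o.poll_id, o.id
"""

rows = conn.execute(query, {"poll": 1, "opt": 2}).fetchall()
for row in rows:
    print(row)
```

With poll 1 and option 2, three users (2, 3 and 4) are picked, and the rows come back in the same order as the table in the question.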
I don't have an instance handy to test; can you check whether this gives proper results:
select
poll_id,
option_id,
((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
n
from
(
select
poll_id,
option_id,
SUM(score) AS sum1,
SUM(score_rev) AS sum2,
SUM(score * score) AS sum1sq,
SUM(score_rev * score_rev) AS sum2sq,
SUM(score * score_rev) AS psum,
COUNT(*) AS n
from
(
select
responses.poll_id,
responses.option_id,
CASE
WHEN user_resp.user_id IS NULL THEN 0
ELSE 1
END as score,
CASE
WHEN user_resp.user_id IS NULL THEN 1
ELSE 0
END as score_rev
from responses left outer join
(
select
user_id
from
responses
where
poll_id = 1 and
option_id = 2
)user_resp
ON (user_resp.user_id = responses.user_id)
) temp1
group by
poll_id,
option_id
)components
After a few hours of trial and error, I managed to put together a query that works correctly:
SELECT poll_id AS p_id,
option_id AS o_id,
COUNT(*) AS optCount,
(SELECT COUNT(*) FROM response WHERE option_id = o_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2')) /
(SELECT COUNT(*) FROM response WHERE poll_id = p_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2'))
AS percentage
FROM response
INNER JOIN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2') AS user_ids
ON response.user_id = user_ids.user_id
WHERE poll_id != '1'
GROUP BY option_id DESC
ORDER BY percentage DESC, optCount DESC
Based on tests with a small data set, this query looks to be reasonably fast, but I'd like to modify it so the "IN" subquery is not repeated three times. Any suggestions?
This seems to give the right results for me:
select poll_stats.poll_id,
option_stats.option_id,
(100 * option_responses / poll_responses) as percent_correlated
from (select response.poll_id,
count(*) as poll_responses
from response selecting_response
join response on response.user_id = selecting_response.user_id
where selecting_response.poll_id = 1 and selecting_response.option_id = 2
group by response.poll_id) poll_stats
join (select options.poll_id,
options.id as option_id,
count(response.id) as option_responses
from options
left join response on response.poll_id = options.poll_id
and response.option_id = options.id
and exists (
select 1 from response selecting_response
where selecting_response.user_id = response.user_id
and selecting_response.poll_id = 1
and selecting_response.option_id = 2)
group by options.poll_id, options.id
) as option_stats
on option_stats.poll_id = poll_stats.poll_id
where poll_stats.poll_id <> 1
order by 3 desc, option_responses desc