MySQL: find most recent value for list of subdocuments - mysql

I have a table, content, with four columns: id, timestamp, locationID, and authorID. Here is a sample of my data; in production the table has tens of millions of rows.
id | timestamp           | locationID | authorID
---+---------------------+------------+---------
1  | 2012-03-01 11:52:00 | 1          | 1
2  | 2012-03-16 19:56:00 | 1          | 2
3  | 2012-04-02 11:26:00 | 2          | 1
4  | 2012-04-22 11:52:00 | 2          | 3
5  | 2012-05-19 09:48:00 | 2          | 2
6  | 2012-05-30 07:12:00 | 2          | 1
7  | 2012-06-04 19:17:00 | 1          | 2
I'd like to collect the list of authorIDs whose most recent content (ordered by timestamp) matched a specific locationID.
The correct values for a query of locationID = 2 would be: [ 1, 3 ], as authorID 1 and 3 were most recently 'seen' at locationID = 2, while authorID 2's most recent content was at locationID 1.
I can certainly execute one query per authorID, but in production the authorID array has more than 100,000 entries. This seems terribly inefficient (especially when each 'subquery' would hit the multi-million-row content table), and I'm looking for a better way to extract this data from my dataset, ideally fast enough to run on a page render.

Something like this? This is from SQL Server, but I think it should work in mySQL as well.
DECLARE @locationId INT
SET @locationId = 2;
SELECT *
FROM (SELECT AuthorId, Max(TimeStamp) as MaxTimeStamp
FROM Content C
WHERE LocationId = @locationId
GROUP BY AuthorId) AS CBL
LEFT JOIN Content AS C ON CBL.AuthorId = C.AuthorId
AND C.TimeStamp > CBL.MaxTimeStamp
WHERE C.AuthorId IS NULL
For locationId = 2, it returns 1 and 3; and for locationId = 1, it returns 2
Per JW (thanks!), the correct mySql approach:
SET @locationId := 2;
SELECT *
FROM (SELECT AuthorId, Max(TimeStamp) as MaxTimeStamp
FROM Content C
WHERE LocationId = @locationId
GROUP BY AuthorId) AS CBL
LEFT JOIN Content AS C ON CBL.AuthorId = C.AuthorId
AND C.TimeStamp > CBL.MaxTimeStamp
WHERE C.AuthorId IS NULL
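To check the anti-join above against the sample data, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the function name authors_last_seen_at is hypothetical. The SQL itself is the same shape as the query above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the query uses only portable SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id INTEGER PRIMARY KEY, timestamp TEXT, "
    "locationID INTEGER, authorID INTEGER)"
)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?, ?)",
    [
        (1, "2012-03-01 11:52:00", 1, 1),
        (2, "2012-03-16 19:56:00", 1, 2),
        (3, "2012-04-02 11:26:00", 2, 1),
        (4, "2012-04-22 11:52:00", 2, 3),
        (5, "2012-05-19 09:48:00", 2, 2),
        (6, "2012-05-30 07:12:00", 2, 1),
        (7, "2012-06-04 19:17:00", 1, 2),
    ],
)


def authors_last_seen_at(location_id):
    """Return authorIDs whose most recent content row is at location_id."""
    rows = conn.execute(
        """
        SELECT cbl.authorID
        FROM (SELECT authorID, MAX(timestamp) AS max_ts
              FROM content
              WHERE locationID = ?
              GROUP BY authorID) AS cbl
        LEFT JOIN content AS c
               ON c.authorID = cbl.authorID
              AND c.timestamp > cbl.max_ts   -- any later row for this author
        WHERE c.authorID IS NULL             -- keep authors with no later row
        ORDER BY cbl.authorID
        """,
        (location_id,),
    ).fetchall()
    return [r[0] for r in rows]


print(authors_last_seen_at(2))
print(authors_last_seen_at(1))
```

On a table of the size described in the question, a composite index on (authorID, timestamp) would likely matter for the join, but that is a guess to be confirmed with EXPLAIN.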

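Try a derived subquery. The locationID filter has to sit outside the subquery: filtering inside it would return each author's latest row at that location, not test each author's overall latest row. This also assumes that id increases with timestamp.
SELECT
c.*
FROM content AS c
INNER JOIN (
-- latest row per author, regardless of location
SELECT
MAX(id) AS ID
FROM content
GROUP BY authorID
) AS t ON t.ID = c.id
WHERE c.locationID = 2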

Related

How to reorganise the values of a groupBy from the same table (MYSQL)?

I have a Chats table setup like this:
ID | CHAT_GROUP_ID | MESSAGE | READ | CREATED_AT
---+---------------+---------+------+--------------------
1  | uu-34uu5-6662 | hi1     | 1    | 2022-06-02 13:16:42
2  | uu-34uu5-6662 | hi2     | 1    | 2022-06-02 13:16:45
3  | uu-34uu5-6663 | hi3     | 0    | 2022-06-02 13:16:46
4  | uu-34uu5-6663 | hi4     | 0    | 2022-06-02 13:16:47
5  | uu-34uu5-6664 | hi5     | 0    | 2022-06-02 13:16:49
ID = int
CHAT_GROUP_ID = Varchar (some kind of UUID)
MESSAGE = String
What I am trying to achieve is:
Group the rows by CHAT_GROUP_ID.
For each group, count the unread messages as SUM(IF(read = 0, 1, 0)).
Finally (and this is where I am struggling), show only one MESSAGE per group, always the latest one.
I have been struggling with this for a while. How can I do this?
If you need it to be done in MySQL v5.*, you can use the following query:
SELECT tab.ID,
tab.CHAT_GROUP_ID,
tab.MESSAGE,
aggregated.SUM_READ_,
aggregated.MAX_TIME
FROM tab
INNER JOIN (SELECT CHAT_GROUP_ID,
SUM(IF(READ_=0,1,0)) AS SUM_READ_,
MAX(CREATED_AT) AS MAX_TIME
FROM tab
GROUP BY CHAT_GROUP_ID ) aggregated
ON tab.CHAT_GROUP_ID = aggregated.CHAT_GROUP_ID
AND tab.CREATED_AT = aggregated.MAX_TIME
First, a derived table aggregates READ and CREATED_AT per CHAT_GROUP_ID; it is then joined back to the main table to retrieve the remaining columns of each group's latest row.
Assuming at least MySql 8:
SELECT ct.Chat_Group_ID, ct.Message,
( SELECT SUM(Case when `read` = 1 then 0 else 1 end)
FROM Chats ct2
WHERE ct2.Chat_Group_ID = ct.Chat_Group_ID
) as Unread
FROM (
SELECT Chat_Group_ID, Message
, row_number() over (partition by Chat_Group_ID order by created_at desc) rn
FROM Chats
) ct
WHERE ct.rn = 1
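As a quick check of the "aggregate, then join back on the latest timestamp" idea against the sample rows, here is a small sketch in Python. It uses sqlite3 in place of MySQL, and the lower-case column names (with read_ for the READ column) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chats (id INTEGER PRIMARY KEY, chat_group_id TEXT, "
    "message TEXT, read_ INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO chats VALUES (?, ?, ?, ?, ?)",
    [
        (1, "uu-34uu5-6662", "hi1", 1, "2022-06-02 13:16:42"),
        (2, "uu-34uu5-6662", "hi2", 1, "2022-06-02 13:16:45"),
        (3, "uu-34uu5-6663", "hi3", 0, "2022-06-02 13:16:46"),
        (4, "uu-34uu5-6663", "hi4", 0, "2022-06-02 13:16:47"),
        (5, "uu-34uu5-6664", "hi5", 0, "2022-06-02 13:16:49"),
    ],
)

# One row per chat group: its latest message plus the number of unread rows.
rows = conn.execute(
    """
    SELECT c.chat_group_id, c.message, agg.unread
    FROM chats AS c
    JOIN (SELECT chat_group_id,
                 SUM(CASE WHEN read_ = 0 THEN 1 ELSE 0 END) AS unread,
                 MAX(created_at) AS max_time
          FROM chats
          GROUP BY chat_group_id) AS agg
      ON agg.chat_group_id = c.chat_group_id
     AND agg.max_time = c.created_at
    ORDER BY c.chat_group_id
    """
).fetchall()

for row in rows:
    print(row)
```

If two messages in a group share the same CREATED_AT, the join returns both; the row_number() variant avoids that by picking exactly one row per group.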

SQL count asset inventory

I have three tables. The first is the asset table:
id | asset_code | asset_name | asset_group | asset_quantity
---+------------+------------+-------------+---------------
1  | A001       | computer   | 4           | 7
2  | A002       | keyboard   | 6           | 4
The second is asset_allocation:
id | asset_id | allocated_quantity | allocated_location | returned
---+----------+--------------------+--------------------+---------
1  | 1        | 2                  | IT office          | no
2  | 2        | 1                  | main hall          | yes
The last is asset_liquidated, which records assets that will no longer be used:
id | asset_id | liquidated_quantity
---+----------+--------------------
1  | 1        | 1
So I have 7 computers, of which 2 are allocated and not returned, and 4 keyboards, of which 1 was allocated and has been returned. 1 computer is liquidated, meaning it will never be used again.
I want to join these three tables and find my current stock in hand.
This is my query so far. I need to add a condition to it so that only allocations where asset_allocation.returned (an enum) is 'no' are counted:
SELECT id,asset_code, asset_name, asset_group, asset_quantity,allocated_quantity,liquidated_quantity,
asset_quantity - COALESCE(AA.allocated_quantity, 0) - COALESCE(AL.liquidated_quantity, 0) available_quantity
FROM asset A
LEFT JOIN (SELECT asset_id, SUM(allocated_quantity) allocated_quantity
FROM asset_allocation
GROUP BY asset_id) AA ON A.id = AA.asset_id
LEFT JOIN (SELECT asset_id, SUM(liquidated_quantity) liquidated_quantity
FROM asset_liquidated
GROUP BY asset_id) AL ON A.id = AL.asset_id;
I believe what you are looking for is adding WHERE returned = 'no' in your first JOIN like so:
SELECT id,asset_code, asset_name, asset_group, asset_quantity,allocated_quantity,liquidated_quantity,
asset_quantity - COALESCE(AA.allocated_quantity, 0) - COALESCE(AL.liquidated_quantity, 0) available_quantity
FROM asset A
LEFT JOIN (SELECT asset_id, SUM(allocated_quantity) allocated_quantity
FROM asset_allocation
WHERE returned = 'no'
GROUP BY asset_id) AA ON A.id = AA.asset_id
LEFT JOIN (SELECT asset_id, SUM(liquidated_quantity) liquidated_quantity
FROM asset_liquidated
GROUP BY asset_id) AL ON A.id = AL.asset_id;
That changes the available quantity for keyboard from 3 to 4 for me
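The effect of the extra WHERE clause can be reproduced on the sample rows with a short Python sketch. It uses sqlite3 instead of MySQL so it runs without a server; the helper name available_quantities and its only_outstanding flag are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE asset (id INTEGER PRIMARY KEY, asset_code TEXT,
                        asset_name TEXT, asset_group INTEGER,
                        asset_quantity INTEGER);
    CREATE TABLE asset_allocation (id INTEGER PRIMARY KEY, asset_id INTEGER,
                                   allocated_quantity INTEGER,
                                   allocated_location TEXT, returned TEXT);
    CREATE TABLE asset_liquidated (id INTEGER PRIMARY KEY, asset_id INTEGER,
                                   liquidated_quantity INTEGER);
    INSERT INTO asset VALUES (1, 'A001', 'computer', 4, 7);
    INSERT INTO asset VALUES (2, 'A002', 'keyboard', 6, 4);
    INSERT INTO asset_allocation VALUES (1, 1, 2, 'IT office', 'no');
    INSERT INTO asset_allocation VALUES (2, 2, 1, 'main hall', 'yes');
    INSERT INTO asset_liquidated VALUES (1, 1, 1);
    """
)


def available_quantities(only_outstanding):
    """Map asset_name to available quantity.

    When only_outstanding is true, allocations that were returned are ignored.
    """
    where = "WHERE returned = 'no'" if only_outstanding else ""
    sql = f"""
        SELECT A.asset_name,
               A.asset_quantity
                 - COALESCE(AA.allocated_quantity, 0)
                 - COALESCE(AL.liquidated_quantity, 0) AS available_quantity
        FROM asset A
        LEFT JOIN (SELECT asset_id, SUM(allocated_quantity) AS allocated_quantity
                   FROM asset_allocation
                   {where}
                   GROUP BY asset_id) AA ON A.id = AA.asset_id
        LEFT JOIN (SELECT asset_id, SUM(liquidated_quantity) AS liquidated_quantity
                   FROM asset_liquidated
                   GROUP BY asset_id) AL ON A.id = AL.asset_id
    """
    return dict(conn.execute(sql).fetchall())


print(available_quantities(only_outstanding=False))
print(available_quantities(only_outstanding=True))
```

Computers stay at 4 in both cases (7 minus 2 allocated minus 1 liquidated); only the keyboard count changes, because its single allocation was returned.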

Determine ranking with single mysql query

I am selecting a set of items from my table and determining their ranking to display on my page. My code for selecting the items:
<?php
$attra_query = mysqli_query($link, "select * from table WHERE category ='4'");
if (mysqli_num_rows($attra_query) > 0) {
while ($attra_data = mysqli_fetch_array($attra_query, MYSQLI_ASSOC)) {
?>
In the while loop I determine the ranking for each of those items like so:
SELECT COUNT(mi.location) + 1 rank
FROM table m
LEFT JOIN (
SELECT id,location,country, ROUND(COALESCE(total_rating/total_rating_amount,0),10) rating_per_vote
FROM table WHERE category = '4'
) mi
ON mi.location = m.location
AND mi.country = m.country
AND mi.rating_per_vote > ROUND(COALESCE(m.total_rating/m.total_rating_amount,0),10)
WHERE m.id = '$attra_id';
I figure this is highly inefficient. Is there a way to combine the two queries into a single one, so I don't have to run the ranking query for each item separately?
//EDIT
Sample data:
id | location | country | category | total_rating | total_rating_amount
---+----------+---------+----------+--------------+--------------------
1  | berlin   | DE      | 4        | 12           | 2
2  | munich   | DE      | 4        | 9            | 1
The vote system is 1-10 points. In the sample data, berlin has a total rating of 12 from 2 votes and munich has a total rating of 9 from 1 vote, so berlin's rating is 6/10 and munich's is 9/10; munich should therefore be ranked #1.
SELECT COUNT(m.id) rank, m.id
FROM
(SELECT * FROM table WHERE category = '4') m
LEFT JOIN (
SELECT id,location,country, ROUND(COALESCE(total_rating/total_rating_amount,0),10) rating_per_vote
FROM table WHERE category = '4'
) mi
ON (mi.location = m.location
AND mi.country = m.country
AND mi.rating_per_vote > ROUND(COALESCE(m.total_rating/m.total_rating_amount,0),10))
OR mi.id=m.id
GROUP BY m.id
This should do I suppose. I don't know if this is the best possible solution.
In MySQL, you can do the ranking using variables. It is a bit hard to tell what you want to rank by from your query, but it would be something like this:
select t.*, (@rn := @rn + 1) as ranking
from table t cross join
(select @rn := 0) vars
where category = '4'
order by rating_per_vote;
If you provide sample data and desired results, it would be possible to refine this solution.
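For the sample data in the question, a single-query ranking can be sketched as follows. This is Python with sqlite3 standing in for MySQL, the table name items is hypothetical (the question only calls it "table"), and it assumes the ranking is across all items in the category, which is what the berlin/munich example suggests; the original per-item query compared only rows with the same location and country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, location TEXT, country TEXT, "
    "category INTEGER, total_rating INTEGER, total_rating_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "berlin", "DE", 4, 12, 2),
        (2, "munich", "DE", 4, 9, 1),
    ],
)

# An item's rank is 1 plus the number of items in the same category
# with a strictly higher rating per vote (ties share a rank).
ranking = conn.execute(
    """
    SELECT m.id, m.location, COUNT(mi.id) + 1 AS ranking
    FROM items AS m
    LEFT JOIN items AS mi
           ON mi.category = m.category
          AND COALESCE(1.0 * mi.total_rating / mi.total_rating_amount, 0)
            > COALESCE(1.0 * m.total_rating / m.total_rating_amount, 0)
    WHERE m.category = 4
    GROUP BY m.id, m.location
    ORDER BY ranking, m.id
    """
).fetchall()

for row in ranking:
    print(row)
```

The self-join is quadratic in the number of items per category, so on MySQL 8 a window function such as RANK() would probably be the more natural choice.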

I need help regarding JOIN query in mysql

I have started learning MySQL and I'm having a problem with JOIN.
I have two tables: purchase and sales
purchase
--------------
p_id date p_cost p_quantity
---------------------------------------
1 2014-03-21 100 5
2 2014-03-21 20 2
sales
--------------
s_id date s_cost s_quantity
---------------------------------------
1 2014-03-21 90 9
2 2014-03-22 20 2
I want these two tables to be joined where purchase.date=sales.date to get one of the following results:
Option 1:
p_id date p_cost p_quantity s_id date s_cost s_quantity
------------------------------------------------------------------------------
1 2014-03-21 100 5 1 2014-03-21 90 9
2 2014-03-21 20 2 NULL NULL NULL NULL
NULL NULL NULL NULL 2 2014-03-22 20 2
Option 2:
p_id date p_cost p_quantity s_id date s_cost s_quantity
------------------------------------------------------------------------------
1 2014-03-21 100 5 NULL NULL NULL NULL
2 2014-03-21 20 2 1 2014-03-21 90 9
NULL NULL NULL NULL 2 2014-03-22 20 2
the main problem lies in the 2nd row of the first result. I don't want the values
2014-03-21, 90, 9 again in row 2... I want NULL instead.
I don't know whether this is possible. I would appreciate any help.
I tried using left join
SELECT *
FROM sales
LEFT JOIN purchase ON sales.date = purchase.date
output:
s_id date s_cost s_quantity p_id date p_cost p_quantity
1 2014-03-21 90 9 1 2014-03-21 100 5
1 2014-03-21 90 9 2 2014-03-21 20 2
2 2014-03-22 20 2 NULL NULL NULL NULL
but I want 1st 4 values of 2nd row to be NULL
Since MySQL has no full outer join (and, before 8.0, no common table expressions), the query needs some duplication: a left join unioned with a right join.
SELECT p_id, p.date p_date, p_cost, p_quantity,
s_id, s.date s_date, s_cost, s_quantity
FROM (
SELECT *,(SELECT COUNT(*) FROM purchase p1
WHERE p1.date=p.date AND p1.p_id<p.p_id) rn FROM purchase p
) p LEFT JOIN (
SELECT *,(SELECT COUNT(*) FROM sales s1
WHERE s1.date=s.date AND s1.s_id<s.s_id) rn FROM sales s
) s
ON s.date=p.date AND s.rn=p.rn
UNION
SELECT p_id, p.date p_date, p_cost, p_quantity,
s_id, s.date s_date, s_cost, s_quantity
FROM (
SELECT *,(SELECT COUNT(*) FROM purchase p1
WHERE p1.date=p.date AND p1.p_id<p.p_id) rn FROM purchase p
) p RIGHT JOIN (
SELECT *,(SELECT COUNT(*) FROM sales s1
WHERE s1.date=s.date AND s1.s_id<s.s_id) rn FROM sales s
) s
ON s.date=p.date AND s.rn=p.rn
In a general sense, what you're looking for is called a FULL OUTER JOIN, which is not directly available in MySQL. Instead you only get LEFT JOIN and RIGHT JOIN, which you can UNION together to get essentially the same result. For a very thorough discussion on this subject, see Full Outer Join in MySQL.
If you need help understanding the different ways to JOIN a table, I recommend A Visual Explanation of SQL Joins.
The way this is different from a regular FULL OUTER JOIN is that you're only including any particular row from either table at most once in the JOIN result. The problem being, if you have one purchase record and two sales records on a particular day, which sales record is the purchase record associated with? What is the relationship you're trying to represent between these two tables?
It doesn't sound like there's any particular relationship between purchase and sales records, except that some of them happened to take place on the same day. In which case, you're using the wrong tool for the job. If all you want to do is display these tables side by side and line the rows up by date, you don't need a JOIN at all. Instead, you should SELECT each table separately and do your formatting with some other tool (or manually).
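As an illustration of doing the lining-up outside SQL, here is a minimal Python sketch. It assumes both tables have already been fetched as lists of tuples in the column order shown in the question; the function name side_by_side is hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest

# Rows as they would come back from "SELECT * FROM purchase" / "SELECT * FROM sales".
purchases = [(1, "2014-03-21", 100, 5), (2, "2014-03-21", 20, 2)]
sales = [(1, "2014-03-21", 90, 9), (2, "2014-03-22", 20, 2)]


def side_by_side(purchases, sales):
    """Pair purchase and sales rows by date; each row is used at most once."""
    purchases_by_date = defaultdict(list)
    sales_by_date = defaultdict(list)
    for row in purchases:
        purchases_by_date[row[1]].append(row)
    for row in sales:
        sales_by_date[row[1]].append(row)

    empty = (None, None, None, None)  # stands in for the NULL columns
    result = []
    for date in sorted(set(purchases_by_date) | set(sales_by_date)):
        # zip_longest pads the shorter side, which gives the NULL-filled rows.
        for p, s in zip_longest(purchases_by_date[date], sales_by_date[date],
                                fillvalue=empty):
            result.append(p + s)
    return result


for row in side_by_side(purchases, sales):
    print(row)
```

This reproduces "Option 1" from the question: the n-th purchase of a day is shown next to the n-th sale of that day, and whichever side runs out first is padded with None.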
Here's another way to get the same result, but the EXPLAIN for this is horrendous; and performance with large sets is going to be atrocious.
This is essentially two queries UNIONed together. The first query is essentially "purchase LEFT JOIN sales", the second query is essentially "sales ANTI JOIN purchase".
Because there is no foreign key relationship between the two tables, other than rows matching on date, we have to "invent" a key we can join on; we use user variables to assign ascending integer values to each row within a given date, so we can match row 1 from purchase to row 1 from sales, etc.
I wouldn't normally generate this type of result using SQL; it's not a typical JOIN operation, in the sense of how we traditionally join tables.
But, if I had to produce the specified resultset using MySQL, I would do it like this:
SELECT p.p_id
, p.p_date
, p.p_cost
, p.p_quantity
, s.s_id
, s.s_date
, s.s_cost
, s.s_quantity
FROM ( SELECT @pl_i := IF(pl.date = @pl_prev_date,@pl_i+1,1) AS i
, @pl_prev_date := pl.date AS p_date
, pl.p_id
, pl.p_cost
, pl.p_quantity
FROM purchase pl
JOIN ( SELECT @pl_i := 0, @pl_prev_date := NULL ) pld
ORDER BY pl.date, pl.p_id
) p
LEFT
JOIN ( SELECT @sr_i := IF(sr.date = @sr_prev_date,@sr_i+1,1) AS i
, @sr_prev_date := sr.date AS s_date
, sr.s_id
, sr.s_cost
, sr.s_quantity
FROM sales sr
JOIN ( SELECT @sr_i := 0, @sr_prev_date := NULL ) srd
ORDER BY sr.date, sr.s_id
) s
ON s.s_date = p.p_date
AND s.i = p.i
UNION ALL
SELECT p.p_id
, p.p_date
, p.p_cost
, p.p_quantity
, s.s_id
, s.s_date
, s.s_cost
, s.s_quantity
FROM ( SELECT @sl_i := IF(sl.date = @sl_prev_date,@sl_i+1,1) AS i
, @sl_prev_date := sl.date AS s_date
, sl.s_id
, sl.s_cost
, sl.s_quantity
FROM sales sl
JOIN ( SELECT @sl_i := 0, @sl_prev_date := NULL ) sld
ORDER BY sl.date, sl.s_id
) s
LEFT
JOIN ( SELECT @pr_i := IF(pr.date = @pr_prev_date,@pr_i+1,1) AS i
, @pr_prev_date := pr.date AS p_date
, pr.p_id
, pr.p_cost
, pr.p_quantity
FROM purchase pr
JOIN ( SELECT @pr_i := 0, @pr_prev_date := NULL ) prd
ORDER BY pr.date, pr.p_id
) p
ORDER BY pr.date, pr.p_id
) p
ON p.p_date = s.s_date
AND p.i = s.i
WHERE p.p_date IS NULL
ORDER BY COALESCE(p_date,s_date),COALESCE(p_id,s_id)

Identifying groups in Group By

I am running a complicated group by statement and I get all my results in their respective groups. But I want to create a custom column with their "group id". Essentially all the items that are grouped together would share an ID.
This is what I get:
partID | Description
-------+------------
11000 | "Oven"
12000 | "Oven"
13000 | "Stove"
13020 | "Stove"
12012 | "Grill"
This is what I want:
partID | Description | GroupID
-------+-------------+----------
11000 | "Oven" | 1
12000 | "Oven" | 1
13000 | "Stove" | 2
13020 | "Stove" | 2
12012 | "Grill" | 3
"GroupID" does not exist as data in any of the tables, it would be a custom generated column (alias) that would be associated to that group's key,id,index, whatever it would be called.
How would I go about doing this?
I think this is the query that returns the five rows:
select partId, Description
from part p;
Here is one way (using standard SQL) to get the groups:
select partId, Description,
(select count(distinct Description)
from part p2
where p2.Description <= p.Description
) as GroupId
from part p;
This uses a correlated subquery. For each row, the subquery finds all the description values less than or equal to the current one and counts the distinct values. Note that this gives a different set of values from the ones in the question: the IDs are assigned alphabetically rather than by first encounter in the data. If that matters, it should be added to the question; as asked, the particular ordering did not seem important.
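To make the alphabetical assignment concrete, here is a small Python sketch that runs the correlated subquery on the five sample rows. It uses sqlite3 rather than MySQL, and the table layout is assumed from the question's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (partId INTEGER PRIMARY KEY, Description TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?)",
    [
        (11000, "Oven"),
        (12000, "Oven"),
        (13000, "Stove"),
        (13020, "Stove"),
        (12012, "Grill"),
    ],
)

# GroupId = number of distinct descriptions that sort at or before this one.
groups = conn.execute(
    """
    SELECT p.partId, p.Description,
           (SELECT COUNT(DISTINCT p2.Description)
            FROM part AS p2
            WHERE p2.Description <= p.Description) AS GroupId
    FROM part AS p
    ORDER BY p.partId
    """
).fetchall()

for row in groups:
    print(row)
```

Here "Grill" gets 1, "Oven" gets 2 and "Stove" gets 3, so every group has a stable ID, just not the first-seen numbering shown in the question.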
Here's one way to get it:
SELECT p.partID,p.Description,b.groupID
FROM (
SELECT Description,@rn := @rn + 1 AS groupID
FROM (
SELECT distinct description
FROM part,(SELECT @rn:= 0) c
) a
) b
INNER JOIN part p ON p.description = b.description;
This assigns a different groupID to each description, and then joins the original table on that description.
Based on your comments in response to Gordon's answer, I think what you need is a derived table to generate your groupids, like so:
select
t1.description,
@cntr := @cntr + 1 as GroupID
FROM
(select distinct table1.description from table1) t1
cross join
(select @cntr:=0) t2
which will give you:
DESCRIPTION GROUPID
Oven 1
Stove 2
Grill 3
Then you can use that in your original query, joining on description:
select
t1.partid,
t1.description,
t2.GroupID
from
table1 t1
inner join
(
select
t1.description,
@cntr := @cntr + 1 as GroupID
FROM
(select distinct table1.description from table1) t1
cross join
(select @cntr:=0) t2
) t2
on t1.description = t2.description
SELECT partID , Description, @s:=@s+1 GroupID
FROM part, (SELECT @s:= 0) AS s
GROUP BY Description