Pagination in MySQL with JOINed ORDER clause - mysql

Having some trouble with paginating a dataset which is built by querying a joined table.
My product table looks like this:
id | sort_order
-----------------
1 | 5
2 | 4
3 | 0
4 | 0
5 | 4
...
and my joined stock table looks like this:
id | product_id | start_date
----------------------------
1 | 1 | 2018-12-14
2 | 1 | 2019-01-28
3 | 2 | 2018-12-26
4 | 3 | 2018-12-28
5 | 4 | 2019-01-12
6 | 4 | 2019-01-14
7 | 5 | 2020-01-10
...
I would like to paginate my list of products, however, I would like to sort it as follows:
Firstly, by sort_order
Secondly, by the earliest start_date associated to it.
I initially started with cursor pagination, but this was resulting in duplicate results, although the total number of pages (after cursoring to the end) was correct - this meant that there must have been missing rows which were never fetched.
I then resorted to page based pagination (which would also be fine for now), but this is resulting in duplication also, and a huge number of pages.
I'm quite stuck as to how to continue with this.
My offset-based pagination SQL (generated by Sequelize) is below:
SELECT
`Product`.`id` AS `id`,
`Product`.`sort_order` AS `sortOrder`,
`availability`.`id` AS `availability.id`,
`availability`.`product_id` AS `availability.productId`,
`availability`.`start_date` AS `availability.startDate`
FROM
`product` AS `Product`
LEFT OUTER JOIN `stock` AS `availability`
ON `Product`.`id` = `availability`.`productId`
ORDER BY
sort_order = 0,
sort_order
LIMIT 0, 10
With the dataset above, I would hope for the following:
id | sortOrder | `availability.id` | `startDate`
--------------------------------------------------
1 | 5 | 1 | 2018-12-14
2 | 4 | 3 | 2018-12-26
5 | 4 | 7 | 2020-01-10
3 | 0 | 4 | 2018-12-28
4 | 0 | 5 | 2019-01-12

You're getting duplication because you're JOINing to a table which has multiple values per product_id. You need to restrict that to one value, and based on your sort criteria that should be the values associated with the minimum start_date. You can do that with a subquery for the JOIN table:
SELECT
`Product`.`id` AS `id`,
`Product`.`sort_order` AS `sortOrder`,
`availability`.`id` AS `availability.id`,
`availability`.`product_id` AS `availability.productId`,
`availability`.`start_date` AS `availability.start_date`
FROM
`product` AS `Product`
LEFT JOIN (SELECT id, product_id, start_date
FROM `stock` s
WHERE start_date = (SELECT MIN(start_date)
FROM stock s1
WHERE s1.product_id = s.product_id)
) AS `availability`
ON `Product`.`id` = `availability`.`product_id`
ORDER BY
sort_order = 0,
sort_order,
availability.start_date
LIMIT 0, 10
Output for your sample data:
id sortOrder availability.id availability.productId availability.start_date
1 5 1 1 2018-12-14
2 4 3 2 2018-12-26
5 4 7 5 2020-01-10
3 0 4 3 2018-12-28
4 0 5 4 2019-01-12
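A minimal sketch of the same idea, assuming Python's built-in sqlite3 module stands in for MySQL and using the sample rows from the question. Instead of the correlated subquery it swaps in a GROUP BY with MIN(start_date), which yields one row per product even when two stock rows share the earliest date. It also appends the unique p.id as a final tie-breaker so that consecutive pages cannot overlap. The fetch_page helper and the page size of 2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, sort_order INTEGER);
CREATE TABLE stock (id INTEGER PRIMARY KEY, product_id INTEGER, start_date TEXT);
INSERT INTO product VALUES (1,5),(2,4),(3,0),(4,0),(5,4);
INSERT INTO stock VALUES
  (1,1,'2018-12-14'),(2,1,'2019-01-28'),(3,2,'2018-12-26'),
  (4,3,'2018-12-28'),(5,4,'2019-01-12'),(6,4,'2019-01-14'),
  (7,5,'2020-01-10');
""")

# One row per product: aggregate the joined stock rows down to the earliest date.
# The trailing p.id makes the ordering total, so LIMIT/OFFSET pages are stable.
PAGE_SQL = """
SELECT p.id, p.sort_order, MIN(s.start_date) AS first_start
FROM product p
LEFT JOIN stock s ON s.product_id = p.id
GROUP BY p.id, p.sort_order
ORDER BY p.sort_order = 0, p.sort_order, first_start, p.id
LIMIT ? OFFSET ?
"""

def fetch_page(page, page_size):
    """Hypothetical helper: return one zero-based page of products."""
    return conn.execute(PAGE_SQL, (page_size, page * page_size)).fetchall()

pages = [fetch_page(n, 2) for n in range(3)]
seen_ids = [row[0] for page in pages for row in page]
print(seen_ids)
```

Note that sort_order is sorted ascending here, as in the query above, so product 1 (sort_order 5) comes after the products with sort_order 4; sort_order DESC would put it first.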

Related

Latest datetime from unique mysql index

I have a table. It has a pk of id and an index of [service, check, datetime].
id service check datetime score
---|-------|-------|----------|-----
1 | 1 | 4 |4/03/2009 | 399
2 | 2 | 4 |4/03/2009 | 522
3 | 1 | 5 |4/03/2009 | 244
4 | 2 | 5 |4/03/2009 | 555
5 | 1 | 4 |4/04/2009 | 111
6 | 2 | 4 |4/04/2009 | 322
7 | 1 | 5 |4/05/2009 | 455
8 | 2 | 5 |4/05/2009 | 675
Given a service 2 I need to select the rows for each unique check where it has the max date. So my result would look like this table.
id service check datetime score
---|-------|-------|----------|-----
6 | 2 | 4 |4/04/2009 | 322
8 | 2 | 5 |4/05/2009 | 675
Is there a short query for this? The best I have is this, but it returns too many checks. I just need each unique check at its latest datetime.
SELECT * FROM table where service=?;
First you need to find the latest datetime for each check:
SELECT `check`, MAX(`datetime`)
FROM YourTable
WHERE `service` = 2
GROUP BY `check`
Then join back to get the rest of the data.
SELECT Y.*
FROM YourTable Y
JOIN ( SELECT `check`, MAX(`datetime`) AS m_date
FROM YourTable
WHERE `service` = 2
GROUP BY `check`) AS `filter`
ON Y.`check` = `filter`.`check`
AND Y.`datetime` = `filter`.m_date
WHERE Y.`service` = 2
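The two steps can be tried out with a small sketch, assuming Python's built-in sqlite3 module in place of MySQL. The table name scores and the ISO date format are assumptions made for the example (ISO strings compare correctly as text); the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (id INTEGER PRIMARY KEY, service INTEGER, "check" INTEGER,
                     "datetime" TEXT, score INTEGER);
INSERT INTO scores VALUES
  (1,1,4,'2009-04-03',399),(2,2,4,'2009-04-03',522),
  (3,1,5,'2009-04-03',244),(4,2,5,'2009-04-03',555),
  (5,1,4,'2009-04-04',111),(6,2,4,'2009-04-04',322),
  (7,1,5,'2009-04-05',455),(8,2,5,'2009-04-05',675);
""")

# Inner query: latest datetime per check for one service.
# Outer join: fetch the full rows that carry exactly that datetime.
LATEST_SQL = """
SELECT y.id, y.service, y."check", y."datetime", y.score
FROM scores y
JOIN (SELECT "check", MAX("datetime") AS m_date
      FROM scores
      WHERE service = ?
      GROUP BY "check") AS latest
  ON y."check" = latest."check" AND y."datetime" = latest.m_date
WHERE y.service = ?
ORDER BY y.id
"""

rows = conn.execute(LATEST_SQL, (2, 2)).fetchall()
for row in rows:
    print(row)
```

The service filter appears twice on purpose: once so the maximum is computed only over that service, and once so rows of another service with the same check and datetime are not joined in.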

a query that returns a single row for each foreign key

I have a table of routines. In this table, I have the column "grade" (which is not mandatory) and the column "date". I also have a number of days and an array of user ids. For each id in the array, I need a query that returns the last routine that has a non-null "grade" and for which datediff(current_date, date) >= number_of_days, and then averages the grades of those routines.
e.g.
today = 2014/10/15
number_of_days = 10
ids(1,3)
routines
id | type | date | grade | user_id
1 | 1 | 2014-10-10 | 3 | 1
2 | 1 | 2014-10-04 | 3 | 1
3 | 1 | 2014-10-01 | 3 | 1
4 | 1 | 2014-09-24 | 2 | 1
5 | 1 | 2014-10-10 | 2 | 2
6 | 1 | 2014-10-04 | 3 | 2
7 | 1 | 2014-10-01 | 3 | 2
8 | 1 | 2014-09-24 | 1 | 2
9 | 1 | 2014-10-10 | 1 | 3
10 | 1 | 2014-10-04 | 1 | 3
11 | 1 | 2014-10-01 | 1 | 3
12 | 1 | 2014-09-24 | 1 | 3
In this case, my query would return the average of the "grade" values of rows #2 and #10.
I think you're saying that you want to consider rows that have a non-null value in the grade column, a date at least a given number of days before the current date, and one of a given set of user_ids. Among those rows, for each user_id you want to choose the row with the latest date, and then compute the average of the grade column over the chosen rows.
I will assume that you cannot have any two rows with the same user_id and date, both with non-null grades, else the question you want to ask does not have a well-defined answer.
A query along these lines should do the trick:
SELECT AVG(r.grade) AS average_grade
FROM
(SELECT user_id, MAX(date) AS date
FROM routines
WHERE grade IS NOT NULL
AND DATEDIFF(CURDATE(), date) >= 10
AND user_id IN (1,3)
GROUP BY user_id) AS md
JOIN routines r
ON r.user_id = md.user_id AND r.date = md.date
Note that in principle you need a grade IS NOT NULL condition on both the inner and the outer query to select the correct rows to average, but in practice AVG() ignores nulls, so you don't actually have to filter out the extra rows in the outer query.
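A small sketch to check this against the example data, assuming Python's built-in sqlite3 module in place of MySQL. SQLite has no DATEDIFF(), so the age in days is computed with julianday(), and "today" is passed in as 2014-10-15 to match the question. Row 13 is a hypothetical extra row with a NULL grade, added only to show that AVG() skips it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routines (id INTEGER PRIMARY KEY, type INTEGER, date TEXT,
                       grade INTEGER, user_id INTEGER);
INSERT INTO routines VALUES
  (1,1,'2014-10-10',3,1),(2,1,'2014-10-04',3,1),(3,1,'2014-10-01',3,1),
  (4,1,'2014-09-24',2,1),(5,1,'2014-10-10',2,2),(6,1,'2014-10-04',3,2),
  (7,1,'2014-10-01',3,2),(8,1,'2014-09-24',1,2),(9,1,'2014-10-10',1,3),
  (10,1,'2014-10-04',1,3),(11,1,'2014-10-01',1,3),(12,1,'2014-09-24',1,3),
  (13,1,'2014-10-04',NULL,1);  -- hypothetical row without a grade
""")

# Inner query: latest qualifying date per user (graded, at least N days old).
# Outer query: join back to the rows on that date and average their grades.
AVG_SQL = """
SELECT AVG(r.grade) AS average_grade
FROM (SELECT user_id, MAX(date) AS date
      FROM routines
      WHERE grade IS NOT NULL
        AND julianday(?) - julianday(date) >= ?
        AND user_id IN (1, 3)
      GROUP BY user_id) AS md
JOIN routines r
  ON r.user_id = md.user_id AND r.date = md.date
"""

average = conn.execute(AVG_SQL, ("2014-10-15", 10)).fetchone()[0]
print(average)
```

The join picks up rows #2, #10 and the ungraded row #13; since AVG() ignores the NULL, the result is the average of grades 3 and 1.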

MySQL - Get row with the maximum HISTORY ID for COMPONENT IDs in non-existing months

I have a table INVENTORY which consists of inventory items. I have the following table structure:
INSTALLATION_ID
COMPONENT_ID
HISTORY_ID
ON_STOCK
LAST_CHANGE
I need to obtain the row with the max HISTORY_ID for records for which the specified LAST_CHANGE month doesn't exist.
Each COMPONENT_ID and INSTALLATION_ID can occur multiple times, they are distinguished by their respective HISTORY_ID
Example:
I have the following records
COMPONENT_ID | INSTALLATION_ID | HISTORY_ID | LAST_CHANGE
1 | 100 | 1 | 2013-01-02
1 | 100 | 2 | 2013-02-01
1 | 100 | 3 | 2013-04-09
2 | 100 | 1 | 2013-02-22
2 | 100 | 2 | 2013-03-12
2 | 100 | 3 | 2013-07-07
2 | 100 | 4 | 2013-08-11
2 | 100 | 5 | 2013-09-15
2 | 100 | 6 | 2013-09-29
3 | 100 | 1 | 2013-02-14
3 | 100 | 2 | 2013-09-23
4 | 100 | 1 | 2013-04-17
I am now trying to retrieve the rows with the max HISTORY_ID for each component, but ONLY for COMPONENT_IDs in which the specified month does not exist.
I have tried the following:
SELECT
INVENTORY.COMPONENT_ID,
INVENTORY.HISTORY_ID
FROM INVENTORY
WHERE INVENTORY.HISTORY_ID = (SELECT
MAX(t2.HISTORY_ID)
FROM INVENTORY t2
WHERE NOT EXISTS
(
SELECT *
FROM INVENTORY t3
WHERE MONTH(t3.LAST_CHANGE) = 9
AND YEAR(t3.LAST_CHANGE)= 2013
AND t3.HISTORY_ID = t2.HISTORY_ID
)
)
AND INVENTORY.INSTALLATION_ID = 200
AND YEAR(INVENTORY.LAST_CHANGE) = 2013
The query seems to have correct syntax but it times out.
In this particular case, I would like to retrieve the maximum HISTORY_ID for all components except for those that have records in September.
Because I need to completely exclude components by month, I cannot use NOT IN, since that would just suppress the records for September while the same component could still show up with another month.
Could anybody give some pointers? Thanks a lot.
If I understand correctly what you want, you can do it like this:
SELECT component_id, MAX(history_id) history_id
FROM inventory
WHERE last_change BETWEEN '2013-01-01' AND '2013-12-31'
AND installation_id = 100
GROUP BY component_id
HAVING MAX(MONTH(last_change) = 9) = 0
Output:
| COMPONENT_ID | HISTORY_ID |
|--------------|------------|
| 1 | 3 |
| 4 | 1 |
If you always filter by installation_id and a year of last_change, make sure that you have a compound index on (installation_id, last_change):
ALTER TABLE inventory ADD INDEX (installation_id, last_change);
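The HAVING clause works because the comparison MONTH(last_change) = 9 evaluates to 1 or 0 for each row, so its MAX() over a group is 1 as soon as any row of that component falls in September. A minimal sketch of that trick, assuming Python's built-in sqlite3 module in place of MySQL; SQLite has no MONTH(), so strftime('%m', ...) is used instead, and the on_stock column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (component_id INTEGER, installation_id INTEGER,
                        history_id INTEGER, last_change TEXT);
INSERT INTO inventory VALUES
  (1,100,1,'2013-01-02'),(1,100,2,'2013-02-01'),(1,100,3,'2013-04-09'),
  (2,100,1,'2013-02-22'),(2,100,2,'2013-03-12'),(2,100,3,'2013-07-07'),
  (2,100,4,'2013-08-11'),(2,100,5,'2013-09-15'),(2,100,6,'2013-09-29'),
  (3,100,1,'2013-02-14'),(3,100,2,'2013-09-23'),(4,100,1,'2013-04-17');
""")

# The comparison yields 1 for September rows and 0 otherwise; a group whose
# maximum is 0 therefore contains no September row at all.
NO_SEPTEMBER_SQL = """
SELECT component_id, MAX(history_id) AS history_id
FROM inventory
WHERE last_change BETWEEN '2013-01-01' AND '2013-12-31'
  AND installation_id = 100
GROUP BY component_id
HAVING MAX(strftime('%m', last_change) = '09') = 0
ORDER BY component_id
"""

rows = conn.execute(NO_SEPTEMBER_SQL).fetchall()
print(rows)
```

Components 2 and 3 each have at least one September row, so they are dropped as whole groups rather than row by row.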

Remove duplicates from one column keeping whole rows

id | userid | total_points_spent
1 | 1 | 10
2 | 2 | 15
3 | 2 | 50
4 | 3 | 5
5 | 1 | 15
With the above table, I would first like to remove duplicates of userid keeping the rows with the largest total_points_spent, like so:
id | userid | total_points_spent
3 | 2 | 50
4 | 3 | 5
5 | 1 | 15
And then I would like to sum the values of total_points_spent, which would be the easy part, resulting in 70.
I am not sure whether by "remove" you mean deleting the rows or just selecting them. Here is a query that only selects, for each userid, the record with the largest total_points_spent (tblA stands for your table):
SELECT tblA.*
FROM ( SELECT userid, MAX(total_points_spent) AS maxtotal
FROM tblA
GROUP BY userid ) AS dt
INNER JOIN tblA
ON tblA.userid = dt.userid
AND tblA.total_points_spent = dt.maxtotal
ORDER BY tblA.userid
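A short sketch covering both parts of the question (keeping the best row per userid, then summing), assuming Python's built-in sqlite3 module in place of MySQL. The table name points is hypothetical. If only the sum is needed, the join back to the table can be skipped and the per-user maxima summed directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (id INTEGER PRIMARY KEY, userid INTEGER,
                     total_points_spent INTEGER);
INSERT INTO points VALUES (1,1,10),(2,2,15),(3,2,50),(4,3,5),(5,1,15);
""")

# Step 1: whole rows that hold the per-user maximum (join back to the table).
KEEP_SQL = """
SELECT p.id, p.userid, p.total_points_spent
FROM (SELECT userid, MAX(total_points_spent) AS maxtotal
      FROM points
      GROUP BY userid) AS dt
JOIN points p
  ON p.userid = dt.userid AND p.total_points_spent = dt.maxtotal
ORDER BY p.id
"""

# Step 2: the sum only needs the per-user maxima, not the full rows.
SUM_SQL = """
SELECT SUM(maxtotal)
FROM (SELECT MAX(total_points_spent) AS maxtotal
      FROM points
      GROUP BY userid) AS dt
"""

kept_rows = conn.execute(KEEP_SQL).fetchall()
total = conn.execute(SUM_SQL).fetchone()[0]
print(kept_rows, total)
```

If two rows of the same userid tie for the maximum, the join in step 1 returns both of them, while the sum in step 2 still counts that user once.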

SQL with specific LIMIT

I have the next example table:
id | user_id | data
-------------------
1 | 1 | 10
2 | 2 | 10
3 | 2 | 10
4 | 1 | 10
5 | 3 | 10
6 | 4 | 10
7 | 4 | 10
8 | 5 | 10
9 | 5 | 10
10 | 2 | 10
11 | 6 | 10
12 | 3 | 10
13 | 1 | 10
I need to create a SELECT query that limits my data. For example, I have a limit range (1, 3) (page number = 1, row count = 3). It should select the rows belonging to the first 3 unique user_ids. If rows further down the table have one of these user_ids, they should be included in the result too. A plain LIMIT clause does not work for this query, because the result can contain more than 3 rows. The output for my limit should be:
id | user_id | data
-------------------
1 | 1 | 10
2 | 2 | 10
3 | 2 | 10
4 | 1 | 10
5 | 3 | 10
10 | 2 | 10
12 | 3 | 10
13 | 1 | 10
Can you help me to generate this query?
How about:
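SELECT *
FROM `table`
WHERE user_id IN
-- MySQL rejects LIMIT directly inside an IN subquery, so wrap it in a derived table
(SELECT user_id
FROM (SELECT DISTINCT user_id FROM `table` ORDER BY user_id LIMIT 3) AS first_users);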
What about something like this?
SELECT * FROM table WHERE user_id BETWEEN (number) AND (number+row count)
I know it isn't working but you should be able to make it work ^^
The sample code below works in MySQL; other databases need different syntax for the row limit (for example, TOP in SQL Server and Sybase).
You get all the rows from your table (t1) that match the first 3 user_ids (t2). See the MySQL manual for the LIMIT clause.
SELECT *
FROM exampletable t1
INNER JOIN (
SELECT DISTINCT user_id
FROM exampletable
ORDER BY user_id
LIMIT 0,3 -- this is the important part
) AS t2 ON t1.user_id = t2.user_id
ORDER BY id
For the next 3 user_ids, change LIMIT 0,3 to LIMIT 3,3 (the first number is the offset, the second is the row count).
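A minimal sketch of paging by distinct user_id, assuming Python's built-in sqlite3 module in place of MySQL. It uses the LIMIT count OFFSET offset form, which MySQL also accepts and which is equivalent to LIMIT offset, count. The rows_for_user_page helper is hypothetical; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exampletable (id INTEGER PRIMARY KEY, user_id INTEGER, data INTEGER);
INSERT INTO exampletable VALUES
  (1,1,10),(2,2,10),(3,2,10),(4,1,10),(5,3,10),(6,4,10),(7,4,10),
  (8,5,10),(9,5,10),(10,2,10),(11,6,10),(12,3,10),(13,1,10);
""")

# The derived table pages over distinct user_ids; the join then pulls in
# every row of those users, however many rows that turns out to be.
PAGE_SQL = """
SELECT t1.id, t1.user_id, t1.data
FROM exampletable t1
JOIN (SELECT DISTINCT user_id
      FROM exampletable
      ORDER BY user_id
      LIMIT ? OFFSET ?) AS t2
  ON t1.user_id = t2.user_id
ORDER BY t1.id
"""

def rows_for_user_page(page, users_per_page):
    """Hypothetical helper: all rows for one zero-based page of user_ids."""
    offset = page * users_per_page
    return conn.execute(PAGE_SQL, (users_per_page, offset)).fetchall()

first_page_ids = [row[0] for row in rows_for_user_page(0, 3)]
second_page_ids = [row[0] for row in rows_for_user_page(1, 3)]
print(first_page_ids, second_page_ids)
```

The offset always advances by the number of users per page, not by the number of rows returned, which is why the row count of a page can vary.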