"Seek" paging with jOOQ skipping rows - mysql

I'm trying to implement "seek" paging using jOOQ (3.11.12) + MySQL (5.7.24).
I have a table of products, that contains the following rows:
ID | Name | Created At
---------------------- --------- -------------------
XjpPXlZxT5i3tTjO7lZQ6Q Product A 2019-10-25 03:23:05
SmytEB9lTW-UiVFhg_gViQ Product B 2019-10-09 05:43:44
glpNYcsBTJqAzQERbgGh5g Product C 2019-10-02 14:53:48
HDZ1K7g_Rj-2vdQaEj79Ow Product D 2019-09-07 14:52:56
aTcWWxdJSReZBGzkLXuNIQ Product E 2019-09-06 08:21:24
HPOD380mTR-g2Ut4Da0k4Q Product F 2019-09-06 08:19:57
jXzfHBDAQ6We4CjXLem_WA Product G 2019-09-06 08:16:06
duxiQ3InRXaFy_JVDkkewQ Product H 2019-09-06 08:15:02
QF-3ECfLQD2vdVGE_5X-rQ Product I 2019-09-04 12:35:00
zRnp0tLZRjSsQHN0wV7N_w Product J 2019-09-04 12:34:28
6Y3E3KkITYWbOs5aOQCHOw Product K 2019-09-04 10:33:38
ZOoG06ThRTiDDhteIW_6tA Product L 2019-09-04 10:19:14
6UW4MUClSLSuQI3pkA0qJA Product M 2019-09-04 10:18:40
Assume my application shows pages of 5 products at a time, ordered from newest to oldest.
I'm therefore ordering by creation date descending, and also ordering by ID so as to disambiguate between products that may have been created at the same moment.
I'm trying to fetch the results for what would be the second page. The code (with relevant runtime values substituted in) looks like this:
selectFromWhere // <-- assume this to be a SelectConditionStep built with various filter criteria
.orderBy(TBL_PRODUCT.CREATED_AT.desc(), TBL_PRODUCT.ID.asc())
.seek(2019-09-06T08:21:24Z, "aTcWWxdJSReZBGzkLXuNIQ") // <-- runtime values
.limit(limit)
.fetchInto(Product::class.java)
This generates the following SQL (fully-qualified references and filter criteria omitted for brevity):
select distinct
id, created_at
from tbl_product
where (
(
created_at < {ts '2019-09-06 08:21:24.0'}
or (
created_at = {ts '2019-09-06 08:21:24.0'}
and id > 'aTcWWxdJSReZBGzkLXuNIQ'
)
)
)
order by
created_at desc,
id asc
limit 5
If I copy/paste and run the generated query manually from a SQL session, I get the results I expect:
Product F
Product G
Product H
Product I
Product J
...however, the results of the execution are saved into a local variable, and when I debug my program to examine its contents, I see it contains:
Product I
Product J
Product K
Product L
Product M
Two questions:
Why would jOOQ return different results for the same query I run manually against the same database?
Is there something wrong with my approach?
Any suggestions would be greatly appreciated!

Assuming the database column type for TBL_PRODUCT.CREATED_AT is DATETIME and the corresponding Java type is java.sql.Timestamp (which would be the default in jOOQ 3.11), this situation could arise when the time zone of the MySQL server differs from that of the Java client, since the JDBC driver will convert the timestamp for you (see https://stackoverflow.com/a/14070771/1732086 for details).
This behavior can also be controlled using various JDBC connection URL parameters (see https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-configuration-properties.html). One option is to use the serverTimezone JDBC URL property to specify the client's time zone as the session time zone to be used (e.g. serverTimezone=Europe/Zurich).
Time zones can always cause nasty surprises, especially in the context of JDBC :-(
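To see how a time zone shift alone could produce exactly the rows observed in the question, here is a minimal Python sketch that replays the seek predicate over the question's data. It does not use jOOQ or JDBC; the `second_page` helper and the UTC-7 session offset are assumptions chosen for illustration, since the question does not say which zones were involved.

```python
from datetime import datetime, timedelta, timezone

# Rows from the question: (id, name, created_at as a zone-less DATETIME string).
ROWS = [
    ("XjpPXlZxT5i3tTjO7lZQ6Q", "Product A", "2019-10-25 03:23:05"),
    ("SmytEB9lTW-UiVFhg_gViQ", "Product B", "2019-10-09 05:43:44"),
    ("glpNYcsBTJqAzQERbgGh5g", "Product C", "2019-10-02 14:53:48"),
    ("HDZ1K7g_Rj-2vdQaEj79Ow", "Product D", "2019-09-07 14:52:56"),
    ("aTcWWxdJSReZBGzkLXuNIQ", "Product E", "2019-09-06 08:21:24"),
    ("HPOD380mTR-g2Ut4Da0k4Q", "Product F", "2019-09-06 08:19:57"),
    ("jXzfHBDAQ6We4CjXLem_WA", "Product G", "2019-09-06 08:16:06"),
    ("duxiQ3InRXaFy_JVDkkewQ", "Product H", "2019-09-06 08:15:02"),
    ("QF-3ECfLQD2vdVGE_5X-rQ", "Product I", "2019-09-04 12:35:00"),
    ("zRnp0tLZRjSsQHN0wV7N_w", "Product J", "2019-09-04 12:34:28"),
    ("6Y3E3KkITYWbOs5aOQCHOw", "Product K", "2019-09-04 10:33:38"),
    ("ZOoG06ThRTiDDhteIW_6tA", "Product L", "2019-09-04 10:19:14"),
    ("6UW4MUClSLSuQI3pkA0qJA", "Product M", "2019-09-04 10:18:40"),
]

# The seek values from the question; the instant is expressed in UTC.
SEEK_INSTANT = datetime(2019, 9, 6, 8, 21, 24, tzinfo=timezone.utc)
SEEK_ID = "aTcWWxdJSReZBGzkLXuNIQ"


def second_page(session_offset_hours, limit=5):
    """Apply the seek predicate after rendering the bind value in a session zone.

    session_offset_hours is a hypothetical stand-in for whatever conversion
    the driver applies between the client and the server.
    """
    zone = timezone(timedelta(hours=session_offset_hours))
    literal = SEEK_INSTANT.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S")
    matches = [
        row for row in ROWS
        if row[2] < literal or (row[2] == literal and row[0] > SEEK_ID)
    ]
    # ORDER BY created_at DESC, id ASC (two stable sorts, secondary key first).
    matches.sort(key=lambda row: row[0])
    matches.sort(key=lambda row: row[2], reverse=True)
    return [name for _, name, _ in matches[:limit]]


print(second_page(0))   # bind value compared as written
print(second_page(-7))  # bind value shifted by the assumed session offset
```

With no shift the sketch returns Products F to J, and with the assumed UTC-7 shift it returns Products I to M, which is the same discrepancy described in the question.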


Ordering MySQL 8 results by count existence in a crosswalk table

I have the following MySQL 8 tables:
[submissions]
===
id
submission_type
name
[reject_reasons]
===
id
name
[submission_reject_reasons] -- crosswalk joining the first 2 tables
===
id
submission_id
reject_reason_id
In my application, users can submit submissions, and other users can request changes to those submissions. When they request these rejections, 1+ entries get saved to the submission_reject_reasons table (which stores the ID of the submission for which rejections are requested, as well as the ID of the reason for why the rejection is being made). So a typical entry in the table might look like:
id submission_id reject_reason_id
==============================================
45 384 294
Where submission_id = 384 is the "Fizz Buzz" submission and reject_reason_id = 294 is the "Missing Required Field" reason.
I currently have a query that fetches all the reject_reasons out of the DB:
SELECT * FROM reject_reasons
I now want to modify this query to sort the results based on their usage frequency. Meaning the query might currently return:
294 | Missing Required Field
14 | Malformed Entry
1885 | Makes No Sense
etc. But let's say there are 5 entries in the submission_reject_reasons table where 294 (Missing Required Field) is the reject_reason_id, 15 entries where 1885 (Makes No Sense) is present, and 120 entries where 14 (Malformed Entry) is present. I need a query that returns all reject_reasons sorted by their count in the submission_reject_reasons (SRR) table, descending, so that the most frequently used appear earlier in the sort. Hence the result set would be:
14 | Malformed Entry --> because there are 120 instances of this in the SRR table
1885 | Makes No Sense --> because there are 15 instances in the SRR
294 | Missing Required Field --> because there are only 5 instances in the SRR
Furthermore, I need a ranking from most-used to least-used. If a reason doesn't exist in the SRR table it should have a default "count" of zero (0) but should still come back in the query. If 2+ reason counts are tied, then I don't care how they are sorted. Any ideas here? I need the final result set to only contain the rr.id and rr.name field/values.
My best attempt is not getting me anywhere:
SELECT rr.id, rr.name
FROM reject_reasons AS rr
LEFT JOIN submission_reject_reasons AS srr on rr.id = srr.reject_reason_id
GROUP BY rr.id
ORDER BY COUNT(*) DESC
Can anyone help me over the finish line here? Can anyone spot where I'm going awry? Thanks in advance!
You should be grouping by the reject reason ID. COUNT(*) is what you want to count in each group.
SELECT rr.id, rr.name
FROM reject_reasons AS rr
JOIN submission_reject_reasons AS srr on rr.id = srr.reject_reason_id
GROUP BY rr.id
ORDER BY COUNT(*) DESC
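There's no need for any EXISTS check, since the INNER JOIN won't return any reject reasons that don't exist in submission_reject_reasons. If unused reasons must still come back with a count of zero, as the question asks, keep the LEFT JOIN and order by COUNT(srr.id) instead of COUNT(*): COUNT(*) counts the NULL-extended row of an unused reason as 1, while COUNT(srr.id) counts it as 0.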
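A minimal sketch of a LEFT JOIN variant that keeps unused reasons in the result, run against SQLite through Python's sqlite3 module as a stand-in for MySQL 8. The "Never Used" reason with id 7 is hypothetical and exists only to show the zero-count case; the usage counts of 5, 15 and 120 come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reason 7 ('Never Used') is a made-up row with no usages at all.
conn.executescript("""
    CREATE TABLE reject_reasons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE submission_reject_reasons (
        id INTEGER PRIMARY KEY,
        submission_id INTEGER,
        reject_reason_id INTEGER
    );
    INSERT INTO reject_reasons (id, name) VALUES
        (294, 'Missing Required Field'),
        (14, 'Malformed Entry'),
        (1885, 'Makes No Sense'),
        (7, 'Never Used');
""")

# Usage counts taken from the question: reason id -> number of SRR rows.
usage = {294: 5, 1885: 15, 14: 120}
rows = [(384, reason_id) for reason_id, n in usage.items() for _ in range(n)]
conn.executemany(
    "INSERT INTO submission_reject_reasons (submission_id, reject_reason_id) "
    "VALUES (?, ?)",
    rows,
)

# COUNT(srr.id) ignores the NULLs produced by the LEFT JOIN, so an unused
# reason gets a count of 0 instead of the 1 that COUNT(*) would report.
ranked = conn.execute("""
    SELECT rr.id, rr.name
    FROM reject_reasons AS rr
    LEFT JOIN submission_reject_reasons AS srr
        ON rr.id = srr.reject_reason_id
    GROUP BY rr.id, rr.name
    ORDER BY COUNT(srr.id) DESC
""").fetchall()

for reason_id, name in ranked:
    print(reason_id, name)
```

The SELECT itself uses only standard SQL, so it should carry over to MySQL unchanged, though that has not been verified here.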

MySQL - SQL select query with two tables using where, count and having

There are two tables: client and contract.
client table:
client_code INT pk
status VARCHAR
A client can have 1 or more contracts. The client has a status column which specifies if it has valid contracts - the values are 'active' or 'inactive'. The contract is specified for a client with active status.
contract table:
contract_code INT pk
client_code INT pk
end_date DATE
A contract has an end date. A contract end date before today is an expired contract.
REQUIREMENT: A report requires all active clients with contracts, but with all (not some) contracts having expired date. Some example data is shown below:
Client data:
client_code status
----------------------------------
1 active
2 inactive
3 active
4 active
Contract data:
contract_code client_code end_date
-------------------------------------------------------------
11 1 08-12-2018
12 1 09-12-2018
13 1 10-12-2018
31 3 11-31-2018
32 3 10-30-2018
41 4 01-31-2019
42 4 12-31-2018
Expected result:
client_code
-------------
1
RESULT: This client (client_code = 1) has all contracts with expired dates: 08-12-2018, 09-12-2018 and 10-12-2018.
I need some help to write a SQL query to get this result. I am not sure what constructs I have to use - one can point out what I can try. The database is MySQL 5.5.
One approach uses aggregation. We can join together the client and contract tables, then aggregate by client, checking that, for an active client, there exist no contract end dates which occur in the future.
SELECT
c.client_code
FROM client c
INNER JOIN contract co
ON c.client_code = co.client_code
WHERE
c.status = 'active'
GROUP BY
c.client_code
HAVING
SUM(CASE WHEN co.end_date > CURDATE() THEN 1 ELSE 0 END) = 0;
Note: I am assuming that your dates are appearing in M-D-Y format simply due to the particular formatting, and that end_date is actually a proper date column. If instead you are storing your dates as text, then we might have to make a call to STR_TO_DATE to convert them to dates first.
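The conditional-aggregation approach can be tried out with the sample data using Python's sqlite3 module as a stand-in for MySQL. Several things in this sketch are assumptions: the dates are rewritten in ISO format, the report date is fixed at 2018-11-01 in place of CURDATE() so the result is reproducible, and the sample's 11-31-2018 (not a real calendar date) is entered as 2018-11-30.

```python
import sqlite3

TODAY = "2018-11-01"  # assumed report date; MySQL would use CURDATE()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (client_code INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE contract (
        contract_code INTEGER,
        client_code INTEGER,
        end_date TEXT,
        PRIMARY KEY (contract_code, client_code)
    );
    INSERT INTO client VALUES
        (1, 'active'), (2, 'inactive'), (3, 'active'), (4, 'active');
    INSERT INTO contract VALUES
        (11, 1, '2018-08-12'), (12, 1, '2018-09-12'), (13, 1, '2018-10-12'),
        (31, 3, '2018-11-30'), (32, 3, '2018-10-30'),
        (41, 4, '2019-01-31'), (42, 4, '2018-12-31');
""")

# A client qualifies only if none of its contracts ends after TODAY.
# ISO date strings compare correctly as plain text.
expired_only = conn.execute("""
    SELECT c.client_code
    FROM client c
    INNER JOIN contract co ON c.client_code = co.client_code
    WHERE c.status = 'active'
    GROUP BY c.client_code
    HAVING SUM(CASE WHEN co.end_date > ? THEN 1 ELSE 0 END) = 0
""", (TODAY,)).fetchall()

print(expired_only)
```

Under those assumptions only client 1 is returned, matching the expected result in the question.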
Is that what you're looking for?
select client.client_code
from client
join contract
on contract.client_code=client.client_code
where client.status='active'
group by client.client_code
having max(contract.end_date)<curdate()

SQL - Add To Existing Average

I'm trying to build a reporting table to track server traffic and popularity overall. Each SID is a unique game server hosting a particular game, and each UCID is a unique player key connecting to that server.
Say I have a table like so:
SID UCID AvgTime NumConnects
-----------------------------------------
1 AIE9348ietjg 300.55 5
1 Po328gieijge 500.66 7
2 AIE9348ietjg 234.55 3
3 Po328gieijge 1049.88 18
We can see that there are 2 unique players, and 3 unique servers, with SID 1 having 2 players that have connected to it at some point in the past. The AvgTime is the average amount of time those players spent on that server (in seconds), and the NumConnects is the size of the average (ie. 300.55 is averaged out of 5 elements).
Now I run a job in the background where I process a raw connection table and pull out player connections like so:
SID UCID ConnectTime DisconnectTime
-----------------------------------------
1 AIE9348ietjg 90.35 458.32
2 Po328gieijge 30.12 87.15
2 AIE9348ietjg 173.12 345.35
This table has no ID or other fluff to help condense my example. There may be multiple connect/disconnect records for multiple players in this table. What I want to do is add to my existing AvgTime for each SID these new values.
There is a formula from here I am trying to use (taken from this math stackexchange: https://math.stackexchange.com/questions/1153794/adding-to-an-average-without-unknown-total-sum/1153800#1153800)
Average = (Average * Size + NewValue) / (Size + 1)
How can I write an update query to update each ServerIDs traffic table above, and add to the average using the above formula for each pair of records. I tried something like the following but it didn't work (returned back null):
UPDATE server_traffic st
LEFT JOIN connect_log l
ON st.SID = l.SID AND st.UCID = l.UCID
SET AvgTime = (AvgTime * NumConnects + SUM(l.DisconnectTime - l.ConnectTime) / NumConnects + COUNT(l.UCID)
I would prefer an answer in MySql, but I'll accept MS SQL as well.
EDIT
I understand that statistics and calculations are generally not to be stored in tables and that you can run reports that would crunch the numbers for you. My requirement is that users can go to a website and view the popularity of various servers. This needs to be done in a way that
A: running a complex query per user doesn't crash or slow down the system
B: the page returns the data within a few seconds at most
See this example here: https://bf4stats.com/pc/shinku555555
This is a web page for battlefield 4 stats - notice that the load is almost near instant for this player, and I get back a load of statistics without waiting for some complex report query to return the data. I'm assuming they store these calculations in preprocessed tables where the webpage just needs to do a simple select to return back the values. That's the same approach I want to take with my Database and Web Application design.
Sorry if this is off topic to the original question - but hopefully this adds additional context that helps people understand my needs.
Aggregate functions like SUM and COUNT cannot be used directly at the row level in an UPDATE; they have to be contained in an aggregate query. Consider joining to an aggregate subquery in the UPDATE...LEFT JOIN. Also, adjust the parentheses in SET to match the formula above.
Also, note that since you use LEFT JOIN, rows with no matching IDs will have NULL for the aggregate fields, and any arithmetic on NULL returns NULL. You can convert NULL to zero with IFNULL(), but the formula's division may still fail.
UPDATE server_traffic s
LEFT JOIN
(SELECT SID, UCID, COUNT(UCID) As GrpCount,
SUM(DisconnectTime - ConnectTime) AS SumTimeDiff
FROM connect_log
GROUP BY SID, UCID) l
ON s.SID = l.SID AND s.UCID = l.UCID
SET s.AvgTime = (s.AvgTime * s.NumConnects + l.SumTimeDiff) / (s.NumConnects + l.GrpCount)
Aside - reconsider saving calculations/statistics within tables as they can always be run by queries even by timestamps. Ideally, database tables should store raw values.
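The arithmetic behind the update can be checked outside the database. The sketch below is plain Python rather than SQL, and `merge_average` is a hypothetical helper name; it folds a batch of new session durations into a stored average using only the stored average and its size, which is the same calculation the SET clause is meant to perform.

```python
def merge_average(avg, size, new_values):
    """Fold new_values into an existing mean without the original samples.

    Returns (new_average, new_size). Assumes size + len(new_values) > 0.
    """
    new_values = list(new_values)
    new_size = size + len(new_values)
    new_avg = (avg * size + sum(new_values)) / new_size
    return new_avg, new_size


# Row from the question: SID 1, UCID AIE9348ietjg, AvgTime 300.55 over 5 connects.
# The new log entry for that pair runs from 90.35 to 458.32 seconds.
new_avg, new_size = merge_average(300.55, 5, [458.32 - 90.35])
print(new_avg, new_size)
```

Note that the size has to grow along with the average; if NumConnects is left unchanged, the next merge would weight the old average too lightly.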

Joining and selecting multiple tables and creating new column names

I have very limited experience with MySQL past standard queries, but when it comes to joins and relations between multiple tables I have a bit of an issue.
I've been tasked with creating a job that will pull a few values from a mysql database every 15 minutes but the info it needs to display is pulled from multiple tables.
I have worked with it for a while to figure out the relationships between everything for the phone system and I have discovered how I need to pull everything out but I'm trying to find the right way to create the job to do the joins.
I'm thinking of creating a new table for the info I need, with columns named as:
Extension | Total Talk Time | Total Calls | Outbound Calls | Inbound Calls | Missed Calls
I know that I need to start with the extension ID from my 'user' table and match it with 'extensionID' in my 'callSession'. There may be multiple instances of each extensionID but each instance creates a new 'UniqueCallID'.
The 'UniqueCallID' field then matches to 'UniqueCallID' in my 'CallSum' table. At that point, I just need to be able to say "For each 'uniqueCallID' that is associated with the same 'extensionID', get the sum of all instances in each column or a count of those instances".
Here is an example of what I need it to do:
callSession Table
UniqueCallID | extensionID |
----------------------------
A 123
B 123
C 123
callSum table
UniqueCallID | Duration | Answered |
------------------------------------
A 10 1
B 5 1
C 15 0
newReport table
Extension | Total Talk Time | Total Calls | Missed Calls
--------------------------------------------------------
123 30 3 1
Hopefully that conveys my idea properly.
If I create a table to hold these values, I need to know how I would select, join and insert those things based on that diagram but I'm unable to construct the right query/statement.
You simply JOIN the two tables, and do a group by on the extensionID. Also, add formulas to summarize and gather the info.
SELECT
a.`extensionID` AS `Extension`,
SUM(b.`Duration`) AS `Total Talk Time`,
COUNT(DISTINCT a.`UniqueCallID`) as `Total Calls`,
SUM(IF(b.`Answered` = 1,0,1)) AS `Missed Calls`
FROM `callSession` a
JOIN `callSum` b
ON a.`UniqueCallID` = b.`UniqueCallID`
GROUP BY a.`extensionID`
ORDER BY a.`extensionID`
You can use a join and group by
select
a.extensionID
, sum(b.Duration) as Total_Talk_Time
, count(b.Answered) as Total_Calls
, count(b.Answered) -sum(b.Answered) as Missed_calls
from callSession as a
inner join callSum as b on a.UniqueCallID = b.UniqueCallID
group by a.extensionID
This should do the trick. What you are being asked to do is to aggregate the number of and duration of calls. Unless explicitly requested, you do not need to create a new table to do this. The right combination of JOINs and AGGREGATEs will get the information you need. This should be pretty straightforward... the only semi-interesting part is calculating the number of missed calls, which is accomplished here using a "CASE" statement as a conditional check on whether each call was answered or not.
Pardon my syntax... My experience is with SQL Server.
SELECT CS.extensionID AS [Extension], SUM(CA.Duration) AS [Total Talk Time], COUNT(CS.UniqueCallID) AS [Total Calls], SUM(CASE WHEN CA.Answered = 0 THEN 1 ELSE 0 END) AS [Missed Calls]
FROM callSession CS
INNER JOIN callSum CA ON CA.UniqueCallID = CS.UniqueCallID
GROUP BY CS.extensionID
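If a report table is still wanted, the aggregate query can feed it directly through INSERT ... SELECT. The following is a minimal sketch using the question's sample data, run on SQLite through Python's sqlite3 module as a stand-in for MySQL. The column names of newReport (without spaces) and the column types are assumptions, since the question only gives display headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE callSession (UniqueCallID TEXT PRIMARY KEY, extensionID INTEGER);
    CREATE TABLE callSum (
        UniqueCallID TEXT PRIMARY KEY,
        Duration INTEGER,
        Answered INTEGER
    );
    CREATE TABLE newReport (
        Extension INTEGER PRIMARY KEY,
        TotalTalkTime INTEGER,
        TotalCalls INTEGER,
        MissedCalls INTEGER
    );
    INSERT INTO callSession VALUES ('A', 123), ('B', 123), ('C', 123);
    INSERT INTO callSum VALUES ('A', 10, 1), ('B', 5, 1), ('C', 15, 0);
""")

# One row per extension: total duration, number of calls, and the number
# of calls whose Answered flag is 0.
conn.execute("""
    INSERT INTO newReport (Extension, TotalTalkTime, TotalCalls, MissedCalls)
    SELECT s.extensionID,
           SUM(c.Duration),
           COUNT(*),
           SUM(CASE WHEN c.Answered = 0 THEN 1 ELSE 0 END)
    FROM callSession AS s
    JOIN callSum AS c ON c.UniqueCallID = s.UniqueCallID
    GROUP BY s.extensionID
""")

report = conn.execute("SELECT * FROM newReport").fetchall()
print(report)
```

For a job that runs every 15 minutes, the table would need to be emptied or upserted before each load; that part is left out of the sketch.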

How to get the sum of a column from combined tables in mySQL?

I've been trying to write a mySQL-statement for the scenario below, but I just can't get it to work as intended. I would be very grateful if you guys could help me get it right!
I have two tables in a mySQL-database, event and route:
event:
id | date | destination | drivers | passengers | description | executed
route:
name | distance
drivers contains a string with the usernames of the registered drivers in an event on the form "jack:jill:john".
destination contains the event destination (oh, really?) and its value is always the same as one of the values in the field name in the table route (i.e. the destination must already exist in route).
executed tells if the event is upcoming (0) or already executed (1).
distance is the distance to the destination in km from the home location.
What I want is to get the total distance covered for one specific user, only counting already executed events.
E.g., if Jill has been registered as a driver in two executed events where the distances to the destinations are 50km and 100km respectively, I would like the query to return the value 150.
I know I can use something like ...WHERE drivers LIKE '%jill%' AND executed = 1 to get the executed events where Jill was driving, and SUM() to get the total distance, but how do I combine the two tables and get it all to work?
Your help is very much appreciated!
/Linus
I haven't used MySQL for years, so sorry if I've got the syntax wrong, but something like this should do it:
In generic SQL:
select sum(distance) from route
join event on route.name = event.destination
where drivers like '%jill%' AND executed = 1
Or not using JOIN:
select sum(distance) from route, event
where drivers like '%jill%' AND executed = 1
and route.name = event.destination
Stuart's answer shows you how to get the sum of the column, but I just want to note that:
...WHERE drivers LIKE '%jill%'...
will return any event with a driver whose name contains the letters 'jill'.
Secondly, this database design doesn't seem to be normalized. You have driver names and route names repeated. If you normalize the database and have something like:
participant
id | name | role
event
id | date | route_id | description | executed
route
id | name | distance
participant_event
id | participant_id | event_id
then it would be a lot easier to work with the data.
Then if you wanted to implement a user search, you could make the query:
SELECT id FROM participant WHERE
name LIKE '%jill%' AND
role='driver';
Then if the query returns more than one result, let the user/application choose the correct driver and then run a SELECT SUM like Stuart's query:
SELECT SUM(r.distance) FROM route r
JOIN event e ON e.route_id=r.id
JOIN participant_event pe ON e.id=pe.event_id
JOIN participant p ON pe.participant_id=p.id
WHERE p.id=?;
Otherwise, the only way to ensure that you're only getting the total distance driven by one driver is to do something like this (assuming drivers is comma-delimited):
...WHERE LCASE(drivers)='jill' OR
drivers LIKE 'jill, %' OR
drivers LIKE '%, jill' OR
drivers LIKE '%, jill,%';
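Since the question describes drivers as colon-delimited ("jack:jill:john"), another way to match a whole name without changing the schema is to wrap the column in delimiters before applying LIKE. The sketch below shows the idea on SQLite through Python's sqlite3 module as a stand-in for MySQL. The route names, the distance of 70 and the "jillian:bob" event are made-up test data, and `total_distance` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE route (name TEXT PRIMARY KEY, distance INTEGER);
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        destination TEXT,
        drivers TEXT,
        executed INTEGER
    );
    INSERT INTO route VALUES ('Lake', 50), ('Mountain', 100), ('Coast', 70);
    INSERT INTO event VALUES
        (1, 'Lake', 'jack:jill:john', 1),
        (2, 'Mountain', 'jill', 1),
        (3, 'Coast', 'jillian:bob', 1),
        (4, 'Coast', 'jill:jack', 0);
""")


def total_distance(driver_condition, pattern):
    """Sum route distances over executed events matching driver_condition."""
    sql = (
        "SELECT COALESCE(SUM(r.distance), 0) "
        "FROM route r JOIN event e ON r.name = e.destination "
        "WHERE e.executed = 1 AND " + driver_condition
    )
    return conn.execute(sql, (pattern,)).fetchone()[0]


# Substring match: also picks up 'jillian'.
naive = total_distance("e.drivers LIKE ?", "%jill%")
# Delimiter-wrapped match: ':jill:' must appear as a whole token.
exact = total_distance("':' || e.drivers || ':' LIKE ?", "%:jill:%")

print(naive, exact)
```

SQLite concatenates with `||`; in MySQL the same condition would normally be written as `CONCAT(':', drivers, ':') LIKE '%:jill:%'`, because `||` means logical OR there by default.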