Mysql Query where max(time) less than today - mysql

I have two tables, the first table ( job ) stores the data and the second table ( job_locations ) stores the locations for each job, I'm trying to show the number of jobs that job locations are less than today
I use the DateTime for the Date Column
unfortunately, the numbers that appear after test the next code are wrong
My code
SELECT *
FROM `job`
left join job_location
on job_location.job_id = job.id
where job_location.cutoff_time < CURDATE()
group by job.id
Please help me to write the working Query.

I think you need to rephrase your query slightly. Select a count of jobs where the cutoff time is earlier than the start of today.
SELECT
j.id,
COUNT(CASE WHEN jl.cutoff_time < CURDATE() THEN 1 END) AS cnt
FROM job j
LEFT JOIN job_location jl;
ON j.id = jl.job_id
GROUP BY
j.id;
Note that the left join is important here because it means that we won't drop any jobs having no matching criteria. Instead, those jobs would still appear in the result set, just with a zero count.

As a note, you can simplify the count (in MySQL). And, assuming that all jobs have at least one location, you don't need a JOIN at all. So:
SELECT jl.job_id, sum( jl.cutoff_time < CURDATE() )
FROM job_location jl
GROUP BY jl.job_id;
If this is not correct (and you need the JOIN), then the condition on the date should go in the ON clause:
SELECT jl.job_id, COUNT(jo.job_id)
FROM job LEFT JOIN
job_location jl
ON jl.job_id = j.id AND jl.cutoff_time < CURDATE()
GROUP BY jl.job_id;

Related

MySQL INNER JOIN Select only one row from other table

I have been trying to figure out a MySQL statement to perform the following.
I have a table with that stores jobs (tbl_jobs).
Another table that stores work scheduling (tbl_schedule) in the form of fixed time slots.
I want the resulting query to show all jobs scheduled today, check if the jobs are already scheduled (timeslot field) and return the earliest time slot.
My timeslots are stored as numbers from 1-8 so I used MIN to get the smallest number.
There can be the same job spanning multiple timeslots.
I tried a code from MySQL INNER JOIN select only one row from second table but I believe I don't understand the query in depth enough to make my own statement for my purposes
SELECT a.*, c.*
FROM tbl_jobs a
INNER JOIN tbl_schedule c
ON a.job_id = c.job_id
INNER JOIN (
SELECT job_id, MIN(timeslot) ts
FROM tbl_schedule
GROUP BY job_id
) b ON c.job_id = b.job_id
WHERE date = '2018-01-05'
This query on returns jobs that are scheduled and the ones that are not scheduled do not show up at all.
Would appreciate if anyone can assist me in where I should go from here? I am at a roadblock so, I decided to post here for help! Thanks in advance!
To get unscheduled job, use the left join
SELECT a.*, c.*
FROM tbl_jobs a
LEFT JOIN tbl_schedule c
ON a.job_id = c.job_id
LEFT JOIN (
SELECT job_id, MIN(timeslot) ts
FROM tbl_schedule
GROUP BY job_id AND
) b ON c.job_id = b.job_id
WHERE date = '2018-01-05'

SQL Join with data associated to dates

Currently I have a simple SQL request to get aall group departure date and the associated group size (teamLength) between 2 dates but it doesn't work properly.
SELECT `groups`.`departure`, COUNT(`group_users`.`group_id`) as 'teamLength'
FROM `groups`
INNER JOIN `group_users`
ON `groups`.`id` = `group_users`.`group_id`
WHERE departure BETWEEN '2017-03-01' AND '2017-03-31'
In fact, if I have more than 1 group between the 2 dates, only 1 date will be recovered in association with the total number of teamLength.
For exemple, if I have 2 groups in the same interval with, for group 1, 2 people and for group 2, 1 people, the result will be:
Here are 2 screenshots of the current state of my groups and group_users tables:
Is it even possible to do what I want in only 1 SQL request ? Thanks
In addition to what jarlh commented (JOIN with ON). Don't ever group data without an explicit GROUP BY. I don't know why MYSQL still allows this...
Change your query to something like this and you should get the result you are looking for. Currently, the other departure dates get lost in the aggregation.
SELECT
groups.departure,
COUNT(1) as team_length
FROM
groups
INNER JOIN group_users
ON groups.id = group_users.group_id
WHERE
groups.departure BETWEEN '2017-03-01' AND '2017-03-31'
GROUP BY
groups.departure
I think that you have a syntax issue in your query. You are missing the ON statement so your database could be trying to get a cartesian product since there is no join clause.
SELECT `groups`.`departure`, COUNT(`group_users`.`id`) as 'teamLength'
FROM `groups`
INNER JOIN `group_users` ON `groups`.`id` = `group_users`.`group_id`
WHERE departure BETWEEN '2017-03-01' AND '2017-03-31'
GROUP BY `groups`.`departure`
You also are missing the GROUP BYclause which is not mandatory in all RDBS but it is a good practice to set it.

How can I speed up a multiple inner join query?

I have two tables. The first table (users) is a simple "id, username" with 100,00 rows and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username went up by the most in stat and here's the query I have. On a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
the other way i tried but it doesn't seem optimal is
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b ON (b.id=a.id) AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c ON (c.id=a.id) AND c.date = '2016-01-14') AS end,
((SELECT b.stat FROM stats AS b ON (b.id=a.id) AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c ON (c.id=a.id) AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Introduction
Let's suppose we rewrite sentence like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure than:
users table has index on field id:
stats has index on composite field date, id: create index stats_idx_d_i on stats ( date, id );
Then
Database optimizer may use indexes to selected a Restricted Set of Date ('RSD'), that means, rows that match filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces to calculate it
They are no possible optimization on this sort because you should to calculate one by one all results on your 'RSD' (restricted set of data).
Conclusion
The question is, how may rows they are on your 'RSD'? If only they are few hundreds rows you query may run fast, else, your query will be slow.
Any case, you should to be sure the first step of query ( without sorting ) is made by index and no fullscanning. Use Explain command to be sure.
All you need to do is to help optimizer.At a bare minimum.have a check list which looks like below
1.Are my join columns indexed ?
2.Are the where clauses Sargable
3.are there any implicit,explicit conversions
4.Am i seeing any statistics issues
one more interesting aspect to look at is how is your data distributed,once you understand the data,you will be able to intrepret the execution plan and alter it as per your need
EX:
Think like i have any customers table with 100rows,Each one has a minimum of 10 orders(total upto 10000 orders).Now if you need to find out only top 3 orders by date,you dont want a scan happening of orders table
Now in your case ,i may not go with second option,even though the optimizer may choose a good plan for this one as well,I will go first approach and try to see if the execution time is acceptable.if not then i will go through my check list and try to tune it further
The Query Seems OK, Verify your Indexes ..
Or
Try this Query
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100

WHERE clause behaving in the wrong way

i want to create a weekly report using a php script. in the mysql query of this, i want to take out tables that functioned during the last week by using its timestamp in the meantime get the other values(especially the counts of unique users and overall users) since the creation date of those tables. but at the moment its returning incorrect data in terms of counts where the counts are given only within the last week not since the creation of project.
this happens inside a loop so for the example query im using table table_1
QUERY
SELECT DISTINCT b.ID, name, accountname, c.accountID, status,
total_impr, min(a.timestamp), max(a.timestamp),COUNT(DISTINCT userid)
AS unique_users,COUNT(userid) AS overall_users
FROM table_1 a INNER JOIN logs b on a.ID = b.ID INNER JOIN
accounts c on b.accountID = c.accountID
WHERE a.timestamp > DATE_ADD(NOW(), INTERVAL -1 WEEK

MySQL aggregate functions with LEFT JOIN

It's Friday afternoon* and my brain has stopped working. Normally it is me who's answering dumb SQL questions like this, sorry!
I am trying to get one table, along with the highest value of a column of another table by LEFT JOINing the latter to the former.
SELECT
jobs.*,
MAX(notes.`timestamp`) AS complete_date
FROM jobs
LEFT JOIN notes ON (jobs.id=notes.job_id)
WHERE (jobs.status="complete" OR jobs.status="closed")
AND (notes.type="complete" OR notes.type IS NULL)
GROUP BY jobs.id
ORDER BY complete_date ASC
I am trying to get all jobs that meet the WHERE jobs.… criteria, and if they have one, the timestamp of the latest type=complete note associated with that job:
Job ID Complete Date
1 today
2 NULL
4 yesterday
Job 3 don't appear because it don't meet the jobs.status criteria. But what I actually get is:
Job ID Complete Date
1 today
4 yesterday
Job 2 is missing, i.e. the JOIN is acting like an INNER JOIN.
I am sure it's just me having a brain-dead moment, but I can't see why my LEFT (OUTER) JOIN is not giving me all jobs regardless of the value of the note.
Specifically, users can delete notes, so potentially a complete/closed job may not have a type=complete note on it (the notes are entered when the status is changed), I am trying to catch the case when a user closes a job, adds a note, then deletes the note.
* somewhere in the east
Since you have the filter for the notes table in the WHERE clause the JOIN is acting like an INNER JOIN, move it to the JOIN condition:
SELECT
jobs.*,
MAX(notes.`timestamp`) AS complete_date
FROM jobs
LEFT JOIN notes
ON (jobs.id=notes.job_id)
AND (notes.type="complete" OR notes.type IS NULL)
WHERE (jobs.status="complete" OR jobs.status="closed")
GROUP BY jobs.id
ORDER BY complete_date ASC;
This could also be done using a subquery, so you apply the notes filter inside the subquery:
SELECT
jobs.*,
n.complete_date
FROM jobs
LEFT JOIN
(
select job_id, MAX(`timestamp`) AS complete_date
from notes
where (type="complete" OR type IS NULL)
group by job_id
) n
ON (jobs.id=n.job_id)
WHERE (jobs.status="complete" OR jobs.status="closed")
ORDER BY complete_date ASC