MySQL: updating a table containing a join and subquery

I am relatively new to SQL. I am trying to update the monthly salary of employees who have worked for a certain duration. The query displays the data using info from the person and employee tables, but it won't update; I keep getting an 'operand should contain 1 column' error. How would I go about displaying all the data and also updating the monthly_salary column? Thanks.
UPDATE employee ep set monthly_salary = monthly_salary*1.15 = all(
SELECT p.person_id, p.name_first, p.name_last, ep.monthly_salary, ep.start_date, curdate() as today_date,
TIMESTAMPDIFF(month,ep.start_date,curdate()) as duration_months
FROM employee ep
INNER JOIN person p ON ep.person_id = p.person_id having duration_months > 24);
query result
I want this expected result but the monthly salary hasn't been updated yet, is it possible to display this and update the monthly_salary?

You are not able to do both in a single query. Typically one would run a "select query" to inspect if the desired logic appears correct, e.g.
SELECT
p.person_id
, p.name_first
, p.name_last
, ep.start_date
, curdate() as today_date
, TIMESTAMPDIFF(month,ep.start_date,curdate()) as duration_months
FROM employee ep
INNER JOIN person p ON ep.person_id = p.person_id
WHERE ep.start_date < curdate() - INTERVAL 24 MONTH
;
In that query the important piece of logic is the where clause which seeks out any employees with a start date earlier than today - 24 months.
If that logic is correct, then apply the same logic in an "update query":
UPDATE employee ep
SET monthly_salary = monthly_salary*1.15
WHERE ep.start_date < curdate() - INTERVAL 24 MONTH
;
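The inspect-then-update workflow can be tried out end to end. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a fixed "today" instead of curdate(), and two made-up employee rows, so the date arithmetic syntax differs from the MySQL queries above.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (person_id INTEGER PRIMARY KEY,
                           start_date TEXT, monthly_salary REAL);
    INSERT INTO employee VALUES
        (1, '2020-01-15', 1000.0),   -- started long before the cutoff
        (2, '2023-06-01', 2000.0);   -- started recently
""")

# Assumed fixed "today" so the example is repeatable; MySQL would use CURDATE().
today = "2024-01-01"
condition = "start_date < date(?, '-24 months')"

# Step 1: inspect which rows the condition selects.
preview = conn.execute(
    f"SELECT person_id FROM employee WHERE {condition}", (today,)
).fetchall()

# Step 2: reuse exactly the same condition in the UPDATE.
updated = conn.execute(
    f"UPDATE employee SET monthly_salary = monthly_salary * 1.15 WHERE {condition}",
    (today,),
).rowcount

salaries = dict(conn.execute("SELECT person_id, monthly_salary FROM employee"))
print(preview, updated, salaries)
```

Keeping the condition in one place makes it harder for the SELECT you checked and the UPDATE you run to drift apart.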
Syntax notes:
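you cannot chain an assignment and a comparison with multiple = signs: monthly_salary = monthly_salary*1.15 = all(...) contains two = signs, so the second one is treated as a comparison rather than as part of the assignment
x = all(...) requires that every value returned by the subquery equals x, and the subquery must return exactly one column; selecting several columns there is what triggers the 'operand should contain 1 column' error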
the having clause is NOT just a substitute for a where clause. A having clause is designed for evaluating aggregated data e.g. having count(*) > 2
Finally, while it was inventive to use the having clause, what you were doing was gaining access to the alias 'duration_months', so you could simply have done this instead:
where TIMESTAMPDIFF(month,ep.start_date,curdate()) > 24
BUT this is not a good way to filter information because it requires running a function on every row of data before a decision can be reached. This has the effect of making queries slower. Compare that to the following:
WHERE ep.start_date < curdate() - INTERVAL 24 MONTH
ep.start_date is not affected by any function, and curdate() - INTERVAL 24 MONTH is just one calculation (not done every row). So this is much more efficient (also known as "sargable").
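One way to see the difference is to ask the database for its query plan. The sketch below is an assumption-laden illustration: it uses SQLite (via Python's sqlite3) rather than MySQL, a hypothetical index named idx_employee_start_date, and a day-based julianday() expression in place of TIMESTAMPDIFF. In MySQL you would compare the output of EXPLAIN for the two WHERE clauses instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (person_id INTEGER PRIMARY KEY,
                           start_date TEXT, monthly_salary REAL);
    -- Hypothetical index, not mentioned in the original question.
    CREATE INDEX idx_employee_start_date ON employee (start_date);
""")

def plan(where_clause):
    """Return SQLite's query-plan description for the given WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT person_id, monthly_salary "
        f"FROM employee WHERE {where_clause}"
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)

# Bare column compared with a cutoff worked out once up front: index usable.
sargable_plan = plan("start_date < '2022-01-01'")

# Column wrapped in a function: the expression must be evaluated per row.
wrapped_plan = plan("julianday('2024-01-01') - julianday(start_date) > 730")

print(sargable_plan)
print(wrapped_plan)
```

The first plan should mention the index, while the second should fall back to scanning the table; the exact wording of the plan text varies between SQLite versions.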

Related

COUNT number distinct when a row hasn't existed before the time period

I have kind of an interesting situation that I will try my best to explain.
I have a table called appointments that holds the appointments a salesperson can have with a potential customer. The relationship between appointments and salespeople is many to one, and it is the same for potential customers.
I need to count how many appointments a salesperson has set with a lead when that salesperson has never set an appointment with that lead before.
Here is how far I have gotten in the code (I'm trying to see how many appointments a salesperson set yesterday, hence the date scrub):
SELECT COUNT(DISTINCT lead)
FROM appointments
WHERE status = 3
and DATE(appointment_created_at) = CURDATE() - interval 1 day
AND creator = 'xxx';
(the column creator represents the individual sales person and the column lead represents the individual potential customer)
The problem with this SQL query is that if a salesperson is resetting an appointment with a lead they have already set an appointment with, it still counts it as a "set appointment".
How can I count the number of rows in my appointments table without counting leads who have already been set before?
You can utilize NOT EXISTS() to check if an appointment already exists earlier or not.
SELECT COUNT(DISTINCT a1.lead)
FROM appointments a1
WHERE a1.status = 3
and a1.appointment_created_at >= CURRENT_DATE() - INTERVAL 1 DAY
AND a1.appointment_created_at < CURRENT_DATE()
AND a1.creator = 'xxx'
AND NOT EXISTS (SELECT 1
FROM appointments a2
WHERE a2.creator = 'xxx'
AND a2.lead = a1.lead
AND a2.appointment_created_at < a1.appointment_created_at)
For good performance, for the Correlated subquery in the NOT EXISTS() portion, you can use the following composite index: (creator, lead, appointment_created_at)
And, for the main select query, you can add the following composite index: (creator, status, appointment_created_at)
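To check the NOT EXISTS logic against a few rows, here is a small sketch. It assumes SQLite (Python's sqlite3) instead of MySQL, passes the day boundaries in as parameters rather than using CURRENT_DATE(), renames the lead column to lead_id for the illustration, and uses invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appointments (creator TEXT, lead_id INTEGER, status INTEGER,
                               appointment_created_at TEXT);
    INSERT INTO appointments VALUES
        ('xxx', 1, 3, '2024-03-01 10:00:00'),  -- lead 1 first booked earlier
        ('xxx', 1, 3, '2024-03-09 09:00:00'),  -- lead 1 rebooked "yesterday"
        ('xxx', 2, 3, '2024-03-09 11:00:00'),  -- lead 2 first booked "yesterday"
        ('yyy', 3, 3, '2024-03-09 12:00:00');  -- a different salesperson
""")

def first_time_leads(creator, day_start, day_end):
    """Count leads booked in [day_start, day_end) with no earlier booking by creator."""
    row = conn.execute("""
        SELECT COUNT(DISTINCT a1.lead_id)
        FROM appointments a1
        WHERE a1.status = 3
          AND a1.appointment_created_at >= :day_start
          AND a1.appointment_created_at <  :day_end
          AND a1.creator = :creator
          AND NOT EXISTS (SELECT 1
                          FROM appointments a2
                          WHERE a2.creator = a1.creator
                            AND a2.lead_id = a1.lead_id
                            AND a2.appointment_created_at < a1.appointment_created_at)
    """, {"creator": creator, "day_start": day_start, "day_end": day_end}).fetchone()
    return row[0]

print(first_time_leads("xxx", "2024-03-09", "2024-03-10"))
```

With this data only lead 2 counts for salesperson 'xxx', because lead 1 already had an appointment on an earlier day.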
If you want the number of "first-time" appointments, you can use row_number() or a correlated subquery:
SELECT COUNT(*)
FROM appointments a
WHERE a.status = 3 AND
a.appointment_created_at >= CURDATE() - interval 1 day AND
a.appointment_created_at < CURDATE() AND
a.creator = 'xxx' AND
a.appointment_created_at = (SELECT MIN(a2.appointment_created_at)
FROM appointments a2
WHERE a2.creator = a.creator AND
a2.lead = a.lead
);
Notice that I changed the date comparisons so an index can be used for the WHERE clause. If you care about performance, you want indexes on:
appointments(creator, status, appointment_created_at, lead)
appointments(creator, lead, appointment_created_at).
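The row_number() alternative is mentioned but not shown, so here is one possible shape for it. Treat it as a sketch only: it runs on SQLite 3.25+ through Python's sqlite3 (in MySQL, window functions need version 8.0 or later), the lead column is renamed lead_id, and the alias seqnum and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE appointments (creator TEXT, lead_id INTEGER, status INTEGER,
                               appointment_created_at TEXT);
    INSERT INTO appointments VALUES
        ('xxx', 1, 3, '2024-03-01 10:00:00'),  -- lead 1 first booked earlier
        ('xxx', 1, 3, '2024-03-09 09:00:00'),  -- lead 1 rebooked "yesterday"
        ('xxx', 2, 3, '2024-03-09 11:00:00');  -- lead 2 first booked "yesterday"
""")

# Number each salesperson/lead pair's appointments by creation time, then keep
# only appointments that are the first of their pair and fall in the window.
query = """
    SELECT COUNT(*)
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (PARTITION BY a.creator, a.lead_id
                                    ORDER BY a.appointment_created_at) AS seqnum
          FROM appointments a) ranked
    WHERE ranked.seqnum = 1
      AND ranked.status = 3
      AND ranked.creator = ?
      AND ranked.appointment_created_at >= ?
      AND ranked.appointment_created_at <  ?
"""
first_time_count = conn.execute(
    query, ("xxx", "2024-03-09", "2024-03-10")
).fetchone()[0]
print(first_time_count)
```

The numbering has to happen in the inner query over all of the salesperson's appointments; filtering by date first would make every appointment in the window look like a first one.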
If the sales people can reschedule appointments then you are going to need an additional field to store original appointment date, at least. There are other more complex solutions, but this is probably the easiest approach.

Generating complex sql tables

I currently have an employee logging sql table that has 3 columns
fromState: String,
toState: String,
timestamp: DateTime
fromState is either In or Out. In means employee came in and Out means employee went out. Each row can only transition from In to Out or Out to In.
I'd like to generate a temporary table in sql to keep track during a given hour (hour by hour), how many employees are there in the company. Aka, resulting table has columns HourBucket, NumEmployees.
In non-SQL code I can do this by initializing the numEmployees as 0 and go through the table row by row (sorted by timestamp) and add (employee came in) or subtract (went out) to numEmployees (bucketed by timestamp hour).
I'm clueless as how to do this in SQL. Any clues?
Use a COUNT ... GROUP BY query. I can't see from your description what you're using toState for, though! I'm also assuming you have an employeeID field.
E.g.
SELECT fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable
INNER JOIN (SELECT employeeID AS 'empID', MAX(timestamp) AS 'latest' FROM StaffinBuildingTable GROUP BY employeeID) AS LastEntry ON StaffinBuildingTable.employeeID = LastEntry.empID
GROUP BY fromState
The LastEntry subquery will produce a list of employeeIDs limited to the last timestamp for each employee.
The INNER JOIN will limit the main table to just the employeeIDs that match both sides.
The outer GROUP BY produces the count.
SELECT HOUR(SBT.timestamp) AS 'Hour', SBT.fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable AS SBT
INNER JOIN (
SELECT SBIJ.employeeID AS 'empID', MAX(timestamp) AS 'latest'
FROM StaffinBuildingTable AS SBIJ
WHERE DATE(SBIJ.timestamp) = CURDATE()
GROUP BY SBIJ.employeeID) AS LastEntry ON SBT.employeeID = LastEntry.empID
GROUP BY SBT.fromState, HOUR(SBT.timestamp)
Replace CURDATE() with whatever date you are interested in.
Note this is non-optimal as it calculates the HOUR twice - once for the data and once for the group.
Again you are using the INNER JOIN to limit the number of returned rows, this time to the last timestamp on a given day.
To me your description of the FromState and ToState seems the wrong way round; I'd expect to do this based on the ToState. But assuming I'm wrong on that, the following should point you in the right direction:
First, I create a "Numbers" table containing 24 rows one for each hour of the day:
create table tblHours
(Number int);
insert into tblHours values
(0),(1),(2),(3),(4),(5),(6),(7),
(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23);
Then for each date in your employee logging table, I create a row in another new table to contain your counts:
create table tblDailyHours
(
HourBucket datetime,
NumEmployees int
);
insert into tblDailyHours (HourBucket, NumEmployees)
select distinct
date_add(date(t.timeStamp), interval h.Number HOUR) as HourBucket,
0 as NumEmployees
from
tblEmployeeLogging t
CROSS JOIN tblHours h;
Then I update this table to contain all the relevant counts:
update tblDailyHours h
join
(select
h2.HourBucket,
sum(case when el.fromState = 'In' then 1 else -1 end) as cnt
from
tblDailyHours h2
join tblEmployeeLogging el on
h2.HourBucket >= el.timeStamp
group by h2.HourBucket
) cnt ON
h.HourBucket = cnt.HourBucket
set NumEmployees = cnt.cnt;
You can now retrieve the counts with
select *
from tblDailyHours
order by HourBucket;
The counts give the number on site at each of the times displayed, if you want during the hour in question, we'd need to tweak this a little.
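The running-count idea behind the update can be checked on a handful of rows. The following is a rough sketch rather than the exact code above: it uses SQLite through Python's sqlite3, skips the toState column, computes the counts with a single SELECT instead of an UPDATE, and uses a LEFT JOIN so that buckets with no earlier events show 0. The sample events are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblEmployeeLogging (fromState TEXT, timeStamp TEXT);
    INSERT INTO tblEmployeeLogging VALUES
        ('In',  '2024-03-09 08:10:00'),
        ('In',  '2024-03-09 08:40:00'),
        ('Out', '2024-03-09 09:20:00'),
        ('In',  '2024-03-09 10:05:00');
    CREATE TABLE tblDailyHours (HourBucket TEXT);
    INSERT INTO tblDailyHours VALUES
        ('2024-03-09 08:00:00'), ('2024-03-09 09:00:00'),
        ('2024-03-09 10:00:00'), ('2024-03-09 11:00:00');
""")

# For each bucket, add 1 for every 'In' and subtract 1 for every 'Out'
# logged at or before the start of that hour.
headcounts = conn.execute("""
    SELECT h.HourBucket,
           COALESCE(SUM(CASE el.fromState WHEN 'In' THEN 1
                                          WHEN 'Out' THEN -1 END), 0) AS NumEmployees
    FROM tblDailyHours h
    LEFT JOIN tblEmployeeLogging el ON el.timeStamp <= h.HourBucket
    GROUP BY h.HourBucket
    ORDER BY h.HourBucket
""").fetchall()

for bucket, count in headcounts:
    print(bucket, count)
```

Each count is the headcount at the start of the hour, which matches the "at each of the times displayed" behaviour described above.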
There is a working version of this code (using not very realistic data in the logging table) here: rextester.com/DYOR23344
Original Answer (Based on a single over all count)
If you're happy to search over all rows, and want the current "head count" you can use this:
select
sum(case when t.FromState = 'In' then 1 else -1 end) as Heads
from
MyTable t
But if you know that there will always be no-one there at midnight, you can add a where clause to prevent it looking at more rows than it needs to:
where
date(t.timestamp) = curdate()
Again, on the assumption that the head count reaches zero at midnight, you can generalise that method to get a headcount at any time as follows:
where
date(t.timestamp) = "CENSUS DATE" AND
t.timestamp <= "CENSUS DATETIME"
Obviously you'd need to replace my quoted strings with code which returned the date and datetime of interest. If the headcount doesn't return to zero at midnight, you can achieve the same by removing the first line of the where clause.

How to get the record of employees who were joined in first quarter or first month

I want to retrieve the records of employees who joined in the first quarter or in the first month. I have tried this but am not getting the right answer...
SELECT * FROM table
WHERE DOJ(date_created) = DOJ(CURRENT_DATE - INTERVAL 1 MONTH)
Please help me with this!
Answering the question as clarified in a comment...
SELECT * FROM table
WHERE YEAR(table.doj) = 2015 AND QUARTER(table.doj) = 1
If instead you want "first quarter of prior year"...
SELECT * FROM table
WHERE YEAR(table.doj) = YEAR(CURRENT_DATE) - 1 AND QUARTER(table.doj) = 1
In either case, note that there's no code to include the first month, because that's part of the first quarter. However, if you wanted to make that explicit (at a slight performance hit), you could code it as follows...
SELECT * FROM table
WHERE YEAR(table.doj) = 2015 AND (QUARTER(table.doj) = 1
OR MONTH(table.doj) = 1)
If you run into performance problems because you have a lot of records but only an index on table.doj, you could also write the query over an explicit date range...
SELECT * FROM table
WHERE table.doj >= '2015-01-01' AND table.doj < '2015-04-01'
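If you build the range in application code, a half-open interval (from the first day of the quarter up to, but not including, the first day of the next quarter) is a safe choice when doj might carry a time part. A minimal sketch, assuming Python with sqlite3 as a stand-in database, a hypothetical helper quarter_bounds, and a made-up staff table:

```python
import sqlite3
from datetime import date

def quarter_bounds(year, quarter):
    """Return (first day of the quarter, first day of the next quarter) as ISO strings."""
    start_month = 3 * (quarter - 1) + 1
    start = date(year, start_month, 1)
    if quarter == 4:
        end = date(year + 1, 1, 1)
    else:
        end = date(year, start_month + 3, 1)
    return start.isoformat(), end.isoformat()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (name TEXT, doj TEXT);
    INSERT INTO staff VALUES
        ('Ann', '2015-01-05 09:00:00'),
        ('Bob', '2015-03-31 16:30:00'),  -- last day of Q1, with a time part
        ('Cy',  '2015-04-01 00:00:00');  -- first moment of Q2
""")

start, end = quarter_bounds(2015, 1)
joined_q1 = [row[0] for row in conn.execute(
    "SELECT name FROM staff WHERE doj >= ? AND doj < ? ORDER BY name",
    (start, end),
)]
print(start, end, joined_q1)
```

Because doj is compared directly with two constants, an index on doj can still be used.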

MYSQL - find and show all duplicates within date difference criteria

This query below selects all rows that have a row with the same father registering 335 days or less since earlier registration. Is there a way to edit this query so that it does not eliminate the duplicate row in the output? I need to see all instances of the registration for that father within 335 days of each other.
SELECT * FROM ymca_reg a later
WHERE NOT EXISTS (
SELECT 1 FROM ymca_reg a earlier
WHERE
earlier.Father_First_Name = later.Father_First_Name
AND earlier.Father_Last_Name = later.Father_Last_Name
AND (later.Date - earlier.Date < 335) AND (later.Date > earlier.Date)
My current query is:
SELECT ymca_reg.* FROM ymca_reg WHERE (((ymca_reg.Year) In (SELECT Year FROM ymca_reg As Tmp
GROUP BY Year, Father_Last_Name, Father_First_Name
HAVING Count(*)>1
And Father_Last_Name = ymca_reg.Father_Last_Name
And Father_First_Name = ymca_reg.Father_First_Name)))
ORDER BY ymca_reg.Year, ymca_reg.Father_Last_Name, ymca_reg.Father_First_Name
This query does return all the duplicates for review correctly, but it's terribly slow because it doesn't use a join and as soon as I add the date criteria it only returns the later row. Thanks.
I think you want something like this:
SELECT *
FROM ymca_reg later
WHERE EXISTS (SELECT 1
FROM ymca_reg earlier
WHERE earlier.Father_First_Name = later.Father_First_Name AND
earlier.Father_Last_Name = later.Father_Last_Name AND
ABS(DATEDIFF(later.Date, earlier.Date)) < 335 and
later.Date <> earlier.Date
);
This should return all records that have such duplicates. Note that "later" and "earlier" are no longer really apt descriptions, but I left the names so you can see the similarity to your query.
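Here is a small check of the EXISTS approach on sample rows. It is a sketch under several assumptions: SQLite (Python's sqlite3) replaces MySQL, the day difference is computed with julianday() instead of a MySQL date function, an id column is added for readability, and the registrations are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ymca_reg (id INTEGER PRIMARY KEY, Father_First_Name TEXT,
                           Father_Last_Name TEXT, Date TEXT);
    INSERT INTO ymca_reg VALUES
        (1, 'John', 'Smith', '2012-01-10'),
        (2, 'John', 'Smith', '2012-06-15'),  -- 157 days after row 1
        (3, 'Mark', 'Jones', '2011-01-01'),
        (4, 'Mark', 'Jones', '2012-06-01');  -- 517 days after row 3
""")

# Keep every row that has a partner row for the same father within 335 days,
# in either direction, so both the earlier and the later row are returned.
close_duplicates = [row[0] for row in conn.execute("""
    SELECT later.id
    FROM ymca_reg later
    WHERE EXISTS (SELECT 1
                  FROM ymca_reg earlier
                  WHERE earlier.Father_First_Name = later.Father_First_Name
                    AND earlier.Father_Last_Name = later.Father_Last_Name
                    AND later.Date <> earlier.Date
                    AND ABS(julianday(later.Date) - julianday(earlier.Date)) < 335)
    ORDER BY later.id
""")]
print(close_duplicates)
```

Both Smith rows come back, while the Jones rows are left out because their registrations are more than 335 days apart.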

Mysql summary query with date range, multiple tables

I'm running a SQL query that returns results between the dates I have selected (2012-07-01 to 2012-08-01). I can tell from the values that they are wrong, though.
I'm confused because it's not telling me I have a syntax error, but the values returned are wrong.
The dates in my database are stored in the date column in the format YYYY-mm-dd.
SELECT `jockeys`.`JockeyInitials` AS `Initials`, `jockeys`.`JockeySurName` AS `Lastname`,
COUNT(`runs`.`JockeysID`) AS 'Rides',
COUNT(CASE
WHEN `runs`.`Finish` = 1 THEN 1
ELSE NULL
END
) AS `Wins`,
SUM(`runs`.`StakeWon`) AS 'Winnings'
FROM runs
INNER JOIN jockeys ON runs.JockeysID = jockeys.JockeysID
INNER JOIN races ON runs.RacesID = races.RacesID
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` <= STR_TO_DATE('2012,08,01', '%Y,%m,%d')
GROUP BY `jockeys`.`JockeySurName`
ORDER BY `Wins` DESC
It's hard to guess what the problem is from your question.
Are you looking to summarize all the races in July and the races on the first of August? That's a slightly strange date range.
You should try the following kind of date-range selection if you want to be more precise. You MUST use it if your races.RaceDate column is a DATETIME expression.
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,08,01', '%Y,%m,%d') + INTERVAL 1 DAY
This will pick up the July races and the races at any time on the first of August.
But, it's possible you're looking for just the July races. In that case you might try:
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,07,01', '%Y,%m,%d') + INTERVAL 1 MONTH
That will pick up everything from midnight July 1, inclusive, to midnight August 1 exclusive.
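Also, you're not using GROUP BY correctly. When you summarize, every column in your result set must either be a summary (SUM() or COUNT() or some other aggregate function) or mentioned in your GROUP BY clause. Some DBMSs enforce this. MySQL accepts such a query when the ONLY_FULL_GROUP_BY SQL mode is disabled (it has been enabled by default since MySQL 5.7.5) and then picks an arbitrary value for each ungrouped column, which can give strange results. Try this expression.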
GROUP BY `jockeys`.`JockeyInitials`,`jockeys`.`JockeySurName`
My best guess is that the jockey surnames are not unique. Try changing the group by expression to:
group by `jockeys`.`JockeyInitials`, `jockeys`.`JockeySurName`
In general, it is bad practice to include columns in the SELECT clause of an aggregation query that are not included in the GROUP BY line. You can do this in MySQL (but not in other databases), because of a (mis)feature called Hidden Columns.
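The effect of non-unique surnames is easy to reproduce. The sketch below is illustrative only: it uses SQLite through Python's sqlite3 (which, like MySQL without ONLY_FULL_GROUP_BY, tolerates loose grouping), drops the races join and the date filter, and uses two invented jockeys who share a surname.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jockeys (JockeysID INTEGER PRIMARY KEY,
                          JockeyInitials TEXT, JockeySurName TEXT);
    CREATE TABLE runs (JockeysID INTEGER, Finish INTEGER, StakeWon REAL);
    INSERT INTO jockeys VALUES (1, 'A', 'Smith'), (2, 'B', 'Smith');
    INSERT INTO runs VALUES (1, 1, 100.0), (1, 3, 0.0), (2, 1, 50.0);
""")

def summarize(group_by):
    """Return (surname, rides, wins) rows for the given GROUP BY column list."""
    return conn.execute(f"""
        SELECT j.JockeySurName,
               COUNT(r.JockeysID) AS Rides,
               COUNT(CASE WHEN r.Finish = 1 THEN 1 END) AS Wins
        FROM runs r
        INNER JOIN jockeys j ON r.JockeysID = j.JockeysID
        GROUP BY {group_by}
        ORDER BY Rides DESC
    """).fetchall()

# Grouping by surname alone merges the two Smiths into one row.
by_surname = summarize("j.JockeySurName")

# Grouping by initials and surname keeps them apart.
by_initials_and_surname = summarize("j.JockeyInitials, j.JockeySurName")

print(by_surname)
print(by_initials_and_surname)
```

If initials plus surname could also collide in your data, grouping by JockeysID would be the more robust key; that goes slightly beyond what the answers above suggest.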