I was working on a SQL database question using MySQL. The goal is to find all IDs where today's temperature is warmer than yesterday's. I'll show you my original code, which passed 2 out of 3 test cases, and then a revised version which passes all 3.
What is the functional difference between these two? Is it a MySQL thing, a LeetCode thing, or something else?
Original
SELECT DISTINCT w2.id
FROM weather w1, weather w2
WHERE w2.RecordDate = w1.RecordDate +1 AND w2.temperature > w1.temperature
Revised
SELECT DISTINCT w2.id
FROM weather w1, weather w2
WHERE DATEDIFF(w2.RecordDate,w1.RecordDate) =1 AND w2.temperature > w1.temperature
The only difference is the use of DATEDIFF versus w2.RecordDate = w1.RecordDate + 1.
I'd like to know: what is the difference between these two?
Edit: here's the LC problem https://leetcode.com/problems/rising-temperature/
This does not do what you want:
w2.RecordDate = w1.RecordDate + 1
Because you are using numeric arithmetic on dates, this expression implicitly converts the dates to numbers, adds 1 to one of them, and then compares the results. Depending on the exact dates it might work sometimes, but it is simply the wrong approach. As an example, say your date is '2020-01-31'; adding 1 to it produces the integer 20200132, which is not a valid date.
MySQL understands date arithmetic, so I would use:
w2.RecordDate = w1.RecordDate + interval 1 day
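The two behaviors can be sketched outside SQL. Assuming MySQL's numeric coercion turns a DATE into a YYYYMMDD integer, plain + 1 produces a number that matches no calendar date, while true date arithmetic (what + INTERVAL 1 DAY does) rolls the month over:

```python
from datetime import date, timedelta

# Numeric coercion: '2020-01-31' becomes the integer 20200131,
# and + 1 yields 20200132, which matches no real calendar date.
as_number = 20200131 + 1
print(as_number)  # 20200132

# Date arithmetic rolls over correctly at month boundaries.
next_day = date(2020, 1, 31) + timedelta(days=1)
print(next_day)  # 2020-02-01
```

So the numeric form only compares equal for consecutive days that happen to differ by 1 in the YYYYMMDD encoding, i.e. days within the same month.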
Related
I would like to calculate the entropy of a list in MySQL.
Right now I run this and move to Python:
select group_concat(first_name), last_name
from table
group by last_name
What I am looking for would be the equivalent of
entropy(first_name)
Returning a single number for each.
Similar to the below usage for numericals:
std(age)/avg(age)
EDIT - Partially answered: Thank you to commenter @IVO GELOV for a very efficient approximation:
SELECT LOG2(COUNT(DISTINCT column)) FROM Table
Based on the solution above and an approximation of the t-test, we reach a comparative weighted entropy. Hacky, but works like a charm:
CASE
WHEN count(*)-1 < 6 THEN (1 + LOG2(COUNT(distinct first_name)))*5.61*power(count(*)-1,-0.71)
WHEN count(*)-1 >= 6 and count(*)-1 < 27 THEN (1 + LOG2(COUNT(distinct first_name)))*2.2*power(count(*)-1,-0.081)
ELSE (1 + LOG2(COUNT(distinct first_name)))*1.815*power(count(*)-1,-0.02)
END as entropy
Defined for rows with count(*) > 1
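Since the post-processing already happens in Python, exact Shannon entropy per group can be computed there. A minimal sketch (the entropy helper and the sample names are illustrative):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a list of categorical values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

names = ["anna", "anna", "bob", "carol"]
print(entropy(names))               # 1.5 bits for this sample
print(math.log2(len(set(names))))   # the LOG2(COUNT(DISTINCT ...)) value, ~1.585
```

LOG2(COUNT(DISTINCT ...)) equals the true entropy only when the values are uniformly distributed; otherwise it overestimates it, which is why it serves as a fast upper-bound approximation.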
I want to populate a table (cust_id, date_id) with randomly generated content.
for the cust_id I am using
rand()*1000
for the datetime I am generating random dates over the past three years as follows
CONCAT(ROUND(RAND()*-3) + YEAR(NOW()),"-",ROUND(RAND()*11) + 1,"-",ROUND(RAND()*27) + 1)
Then I am generating many rows by joining a table of 10 numbers with itself:
FROM numbers n1 JOIN numbers n2 JOIN numbers n3
Putting it all together I run
INSERT INTO orders (cust_id, date_id)
SELECT ROUND(RAND()*1000) AS cust_id,
       CONVERT(DATETIME, CONCAT(ROUND(RAND()*-3) + YEAR(NOW()),"-",ROUND(RAND()*11) + 1,"-",ROUND(RAND()*27) + 1)) AS date_id
FROM numbers n1 JOIN numbers n2 JOIN numbers n3;
I have played around with different conversion formats, and tried setting the result to a variable and casting that to DATETIME, but everything throws errors. I suspect the problem is that MySQL is reading it as a function rather than a string. I have found a workaround where I keep the original datetime and use intervals, but I would like to know what the issue with my initial approach is. Any insights would be appreciated.
I have your basic formula working using the DATE() function. (The error comes from CONVERT: in MySQL the value comes first, CONVERT(expr, DATETIME); the CONVERT(DATETIME, expr) form you used is SQL Server syntax.)
SELECT DATE(CONCAT(ROUND(RAND()*-3) + YEAR(NOW()),"-",
ROUND(RAND()*11) + 1,"-",
ROUND(RAND()*27) + 1))
Still, you're much better off using
SELECT CURDATE() - INTERVAL ROUND(RAND()*3*365.25) DAY
Why? If you leave leap-year February 29 out of your test data, you leave out something critical to test. And if you leave out days 29, 30, and 31 from all your test months, you may not get test coverage for end-of-month date arithmetic.
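The CURDATE() - INTERVAL ... DAY approach corresponds to ordinary date arithmetic, which by construction can only produce valid dates, including Feb 29 and days 29-31. A sketch of the same idea (the function name is illustrative):

```python
import random
from datetime import date, timedelta

def random_past_date(days_back, today=None):
    """Uniformly random, always-valid date within the past `days_back` days."""
    today = today or date.today()
    return today - timedelta(days=random.randrange(days_back))

random.seed(42)
span = int(3 * 365.25)          # roughly three years, like RAND()*3*365.25
d = random_past_date(span)
print(d)                        # a real calendar date, never a "Feb 30"
```

Subtracting a day count can never build an impossible date, whereas concatenating independently random year, month, and day parts both skips days 28-31 unevenly and can never produce Feb 29.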
I have a Data as follows:
Order_id Created_on Comment
1 20-07-2015 18:35 Order Placed by User
1 20-07-2015 18:45 Order Reviewed by the Agent
1 20-07-2015 18:50 Order Dispatched
2 20-07-2015 18:36 Order Placed by User
And I am trying to find the difference between the first and second date, and between the second and third date, for each order. How do I obtain this through a SQL query?
SQL is about horizontal relations; vertical relations do not exist. To a relational database they're just two rows stored somewhere on disk, and until you apply ordering to a result set, the 'first and second' are just two randomly picked rows.
In specific cases it's possible to calculate the time difference within SQL, but it's rarely a good idea for performance reasons, as it requires costly self-joins or subqueries. Just selecting the right data in the right order and then calculating the differences during postprocessing in C#/PHP/whatever is far more practical and faster.
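That post-processing might look like this in Python, using the sample rows from the question (already ordered by order and timestamp):

```python
from datetime import datetime
from itertools import groupby

rows = [  # (order_id, created_on) pairs, already ordered
    (1, datetime(2015, 7, 20, 18, 35)),
    (1, datetime(2015, 7, 20, 18, 45)),
    (1, datetime(2015, 7, 20, 18, 50)),
    (2, datetime(2015, 7, 20, 18, 36)),
]

diffs = {}
for order_id, group in groupby(rows, key=lambda r: r[0]):
    times = [created_on for _, created_on in group]
    # Minutes between each consecutive pair of events for this order.
    diffs[order_id] = [int((b - a).total_seconds() // 60)
                       for a, b in zip(times, times[1:])]

print(diffs)  # {1: [10, 5], 2: []}
```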
I think you can use a query like this:
SELECT t1.Order_id, t1.Created_on, TIMESTAMPDIFF(MINUTE, t1.Created_on, COALESCE(MIN(t2.Created_on), t1.Created_on)) AS toNextTime
FROM yourTable t1
LEFT JOIN yourTable t2 ON t1.Order_id = t2.Order_id AND t1.Created_on < t2.Created_on
GROUP BY t1.Order_id, t1.Created_on
Posting this even though another answer has already been accepted - and I don't disagree with it - but there is in fact a fairly neat way to do this with MySQL user variables.
This query will give you the time between stages in minutes - it can't be expressed as a datetime as it's an interval between two dates:
SELECT
Order_id,
Created_on,
Comment,
if (@last_id = Order_id, TIMESTAMPDIFF(MINUTE, @last_date, Created_on), 0) as StageMins,
@last_id := Order_id,
@last_date := Created_on
FROM tblData
ORDER BY Order_id, Created_on;
SQL Fiddle here: http://sqlfiddle.com/#!9/6ffdd/10
Info on the MySQL TIMESTAMPDIFF function here: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_timestampdiff
I have an existing SQL query that gets call stats from a Zultys MX250 phone system: -
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS `#Calls`
FROM
session s
JOIN mxuser u ON
s.ExtensionID1 = u.ExtensionId
OR s.ExtensionID2 = u.ExtensionId
WHERE
s.ServiceExtension1 IS NULL
AND s.connecttimestamp >= CURRENT_DATE
AND BINARY u.userprofilename = BINARY 'DBAM'
GROUP BY
u.firstname,
u.lastname
ORDER BY
`#Calls` DESC,
Duration DESC;
Output is as follows: -
Name Duration #Calls
TH 01:19:10 30
AS 00:44:59 28
EW 00:51:13 22
SH 00:21:20 13
MG 00:12:04 8
TS 00:42:02 5
DS 00:00:12 1
I am trying to generate a 4th column that shows the average call time for each user, but am struggling to figure out how.
Mathematically it's just Duration / #Calls, but after looking at some similar questions on Stack Overflow, the example queries are too simple to help me relate them to mine above.
Right now, I'm not even sure that it's going to be possible to divide the time column by the number of calls.
UPDATE: I was so close in my testing but got all confused and overcomplicated things. Here's the latest SQL (thanks to @McAdam331 & my buddy Jim from work):
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS `#Calls`,
sec_to_time(SUM(time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)) / COUNT(*)) AS Average
FROM
session s
JOIN mxuser u ON
s.ExtensionID1 = u.ExtensionId
OR s.ExtensionID2 = u.ExtensionId
WHERE
s.ServiceExtension1 IS NULL
AND s.connecttimestamp >= CURRENT_DATE
AND BINARY u.userprofilename = BINARY 'DBAM'
GROUP BY
u.firstname,
u.lastname
ORDER BY
Average DESC;
Output is as follows: -
Name Duration #Calls Average
DS 00:14:25 4 00:03:36
MG 00:17:23 11 00:01:34
TS 00:33:38 22 00:01:31
EW 01:04:31 43 00:01:30
AS 00:49:23 33 00:01:29
TH 00:43:57 35 00:01:15
SH 00:13:51 12 00:01:09
Well, you already compute the total number of seconds before converting it to a time. Why not take that total, divide it by the number of calls, and then convert the result back to a time?
SELECT sec_to_time(
SUM(time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)) / COUNT(*))
AS averageDuration
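The arithmetic is just total seconds over call count; sketched in Python with SEC_TO_TIME-style formatting (the sample durations are made up):

```python
durations_sec = [4750, 2699, 3073]  # per-call durations in seconds (illustrative)

avg_sec = sum(durations_sec) / len(durations_sec)  # SUM(...) / COUNT(*)

# SEC_TO_TIME-style hh:mm:ss formatting of the average.
h, rem = divmod(int(avg_sec), 3600)
m, s = divmod(rem, 60)
print(f"{h:02d}:{m:02d}:{s:02d}")  # 00:58:27
```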
If I understand correctly, you can just replace sum() with avg():
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS `#Calls`,
sec_to_time(AVG(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS AvgDuration
Seems like all you need is another expression in the SELECT list. The SUM() aggregate (from the second expression) divided by COUNT aggregate (the third expr). Then wrap that in a sec_to_time function. (Unless I'm totally missing the question.)
Personally, I'd use the TIMESTAMPDIFF function to get a difference in times.
SEC_TO_TIME(
SUM(TIMESTAMPDIFF(SECOND,s.connecttimestamp,s.disconnecttimestamp))
/ COUNT(*)
) AS avg_duration
If what you are asking is whether there's a way to reference other expressions in the SELECT list by their alias... the answer, unfortunately, is that there's not.
With a performance penalty, you could use your existing query as an inline view; in the outer query, the alias names assigned to the expressions are available:
SELECT t.Name
, SEC_TO_TIME(s.TotalDur) AS Duration
, s.`#Calls`
, SEC_TO_TIME(s.TotalDur/s.`#Calls`) AS avgDuration
FROM (
SELECT CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name
, SUM(TIMESTAMPDIFF(SECOND,s.connecttimestamp,s.disconnecttimestamp)) AS TotalDur
, COUNT(1) AS `#Calls`
FROM session s
-- the rest of your query
) t
I'm running a SQL query that returns results between dates I have selected (2012-07-01 - 2012-08-01). I can tell from the values that they are wrong, though.
I'm confused because it's not telling me I have a syntax error, but the values returned are wrong.
The dates in my database are stored in the date column in the format YYYY-mm-dd.
SELECT `jockeys`.`JockeyInitials` AS `Initials`, `jockeys`.`JockeySurName` AS `Lastname`,
COUNT(`runs`.`JockeysID`) AS 'Rides',
COUNT(CASE
WHEN `runs`.`Finish` = 1 THEN 1
ELSE NULL
END
) AS `Wins`,
SUM(`runs`.`StakeWon`) AS 'Winnings'
FROM runs
INNER JOIN jockeys ON runs.JockeysID = jockeys.JockeysID
INNER JOIN races ON runs.RacesID = races.RacesID
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` <= STR_TO_DATE('2012,08,01', '%Y,%m,%d')
GROUP BY `jockeys`.`JockeySurName`
ORDER BY `Wins` DESC
It's hard to guess what the problem is from your question.
Are you looking to summarize all the races in July and the races on the first of August? That's a slightly strange date range.
You should try the following kind of date-range selection if you want to be more precise. You MUST use it if your races.RaceDate column is a DATETIME expression.
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,08,01', '%Y,%m,%d') + INTERVAL 1 DAY
This will pick up the July races and the races at any time on the first of August.
But, it's possible you're looking for just the July races. In that case you might try:
WHERE `races`.`RaceDate` >= STR_TO_DATE('2012,07,01', '%Y,%m,%d')
AND `races`.`RaceDate` < STR_TO_DATE('2012,07,01', '%Y,%m,%d') + INTERVAL 1 MONTH
That will pick up everything from midnight July 1, inclusive, to midnight August 1 exclusive.
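The boundary behavior of the half-open pattern can be checked directly; a small sketch using Python datetimes:

```python
from datetime import datetime

start = datetime(2012, 7, 1)   # >= start, inclusive
end = datetime(2012, 8, 1)     # <  end, exclusive

def in_range(ts):
    """Half-open range test: start <= ts < end."""
    return start <= ts < end

print(in_range(datetime(2012, 7, 1)))               # True: first instant of July
print(in_range(datetime(2012, 7, 31, 23, 59, 59)))  # True: last second of July
print(in_range(datetime(2012, 8, 1)))               # False: midnight Aug 1 excluded
```

This is why `<` on the upper bound is safer than `<=` for DATETIME columns: `<= '2012-08-01'` would also match exactly midnight on August 1, but nothing later that day.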
Also, you're not using GROUP BY correctly. When you summarize, every column in your result set must either be a summary (SUM(), COUNT(), or some other aggregate function) or be mentioned in your GROUP BY clause. Some DBMSs enforce this; MySQL just rolls with it and gives strange results. Try this expression:
GROUP BY `jockeys`.`JockeyInitials`,`jockeys`.`JockeySurName`
My best guess is that the jocky surnames are not unique. Try changing the group by expression to:
group by `jockeys`.`JockeyInitials`, `jockeys`.`JockeySurName`
In general, it is bad practice to include columns in the SELECT clause of an aggregation query that are not included in the GROUP BY clause. You can do this in MySQL (but not in most other databases) because of a (mis)feature called hidden columns. (Since MySQL 5.7, the ONLY_FULL_GROUP_BY SQL mode is enabled by default and rejects such queries.)