MySQL: Undesired result with max function on a timestamp - mysql

I use a Mantis bug database (which runs on MySQL) and I want to query which bugs had a change in their severity within the last 2 weeks. Only the last severity change of each bug should be returned.
The problem is that I get multiple entries per bug_id (which is the primary key), which is not what I want: I need only the latest change per bug. So I must be using the max function and the group by clause incorrectly.
Here is my query:
SELECT `bug_id`,
max(date_format(from_unixtime(`mantis_bug_history_table`.`date_modified`),'%Y-%m-%d %h:%i:%s')) AS `Severity_changed`,
`mantis_bug_history_table`.`old_value`,
`mantis_bug_history_table`.`new_value`
from `prepared_bug_list`
join `mantis_bug_history_table` on `prepared_bug_list`.`bug_id` = `mantis_bug_history_table`.`bug_id`
where (`mantis_bug_history_table`.`field_name` like 'severity')
group by `bug_id`,`old_value`,`new_value`
having (`Severity_changed` >= (now() - interval 2 week))
order by `bug_id` ASC
For the bug with the id 8 for example I get three entries with this query. The bug with the id 8 had indeed three severity changes within the last 2 weeks but I only want to get the latest severity change.
What could be the problem with my query?

max() is an aggregate function, and on its own it is not suitable for what you are trying to do: because old_value and new_value are part of the group by, every distinct pair of values forms its own group, so you get one row per change rather than one row per bug.
I have a feeling that you want the latest applicable row per bug_id in mantis_bug_history_table. If that is true, I would rewrite the query as follows: write a sub-query getLatest over prepared_bug_list and join it with mantis_bug_history_table.
Updated answer
Caution: I don't have access to the actual DB tables so this query may have bugs
select
    `getLatest`.`last_bug_id`
  , `mantis_bug_history_table`.`date_modified`
  , `mantis_bug_history_table`.`old_value`
  , `mantis_bug_history_table`.`new_value`
from
  (
    select
      (
        select
          `bug_id`
        from
          `mantis_bug_history_table`
        where
          `date_modified` > unix_timestamp() - 14*24*3600 -- two weeks
          and `field_name` like 'severity'
          and `bug_id` = `prepared_bug_list`.`bug_id`
        order by
          `date_modified` desc
        limit 1
      ) as `last_bug_id`
    from
      `prepared_bug_list`
  ) as `getLatest`
  inner join `mantis_bug_history_table`
    on `mantis_bug_history_table`.`bug_id` = `getLatest`.`last_bug_id`
order by `getLatest`.`last_bug_id` ASC

I finally have a solution! A friend of mine helped me. One part of the solution was to use the primary key of the mantis bug history table, which is not bug_id but the column id, a consecutive number.
The other part of the solution was the subquery in the where clause:
select `prepared_bug_list`.`bug_id` AS `bug_id`,
`mantis_bug_history_table`.`old_value` AS `old_value`,
`mantis_bug_history_table`.`new_value` AS `new_value`,
`mantis_bug_history_table`.`type` AS `type`,
date_format(from_unixtime(`mantis_bug_history_table`.`date_modified`),'%Y-%m-%d %H:%i:%s') AS `date_modified`
FROM `prepared_bug_list`
JOIN mantis_import.mantis_bug_history_table
ON `prepared_bug_list`.`bug_id` = mantis_bug_history_table.bug_id
where (mantis_bug_history_table.id = -- id is the primary key of each history entry, not to be confused with bug_id
(select `mantis_bug_history_table`.`id` from `mantis_bug_history_table`
where ((`mantis_bug_history_table`.`field_name` = 'severity')
and (`mantis_bug_history_table`.`bug_id` = `prepared_bug_list`.`bug_id`))
order by `mantis_bug_history_table`.`date_modified` desc limit 1)
and `date_modified` > unix_timestamp() - 14*24*3600 )
order by `prepared_bug_list`.`bug_id`,`mantis_bug_history_table`.`date_modified` desc
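The idea behind that query (pick the primary key of the newest matching history row per bug, then apply the date filter) can be tried out in isolation. Below is a minimal sketch using Python's built-in sqlite3 module instead of MySQL; the table name history, the sample rows, and the plain integer timestamps are all made up for illustration and are not taken from the Mantis schema.

```python
import sqlite3


def latest_severity_changes(conn, since_ts):
    """Return each bug's most recent severity change, if made at or after since_ts."""
    sql = """
        SELECT h.bug_id, h.old_value, h.new_value, h.date_modified
        FROM history AS h
        WHERE h.id = (
                -- primary key of the newest severity row for this bug
                SELECT h2.id
                FROM history AS h2
                WHERE h2.field_name = 'severity'
                  AND h2.bug_id = h.bug_id
                ORDER BY h2.date_modified DESC, h2.id DESC
                LIMIT 1
              )
          AND h.date_modified >= ?
        ORDER BY h.bug_id
    """
    return conn.execute(sql, (since_ts,)).fetchall()


def demo():
    # Hypothetical stand-in for mantis_bug_history_table (assumption, not the real schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (id INTEGER PRIMARY KEY, bug_id INTEGER, "
        "field_name TEXT, old_value TEXT, new_value TEXT, date_modified INTEGER)"
    )
    rows = [
        (8, "severity", "10", "20", 100),
        (8, "severity", "20", "30", 200),
        (8, "severity", "30", "50", 300),  # newest severity change of bug 8
        (8, "status", "10", "80", 400),    # newer, but not a severity change
        (9, "severity", "10", "20", 50),   # too old for the cutoff used below
    ]
    conn.executemany(
        "INSERT INTO history (bug_id, field_name, old_value, new_value, date_modified) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return latest_severity_changes(conn, 150)
```

Calling demo() returns a single row for bug 8 even though it has three severity changes, because the subquery pins the outer row to one id per bug before the date filter is applied.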

Related

MySQL Matching date-based First Instance of value

I have a table containing stock market data (open, hi, lo, close prices) but in a random order of date:
Date        Open    Hi      Lo      Close
12/10/2019  313.82  314.54  312.81  313.58
11/22/2019  311.09  311.24  309.85  310.96
11/25/2019  311.98  313.37  311.98  313.37
11/26/2019  313.41  314.28  313.06  314.08
11/27/2019  314.61  315.48  314.37  315.48
11/29/2019  314.86  315.13  314.06  314.31
12/2/2019   314.59  314.66  311.17  311.64
12/3/2019   308.65  309.64  307.13  309.55
I have another value in a PHP variable (say $BaseValue), and a start date and end date ($startdt and $enddt).
1) My requirement is to pick up the value from the Hi column on the very FIRST date, in chronological order between the given start and end dates, on which it exceeds $BaseValue.
For example, if $BaseValue=314, startdt=11/22, enddt=12/2, then I want to retrieve the date 11/26/19, as it is the earliest date on which the Hi value (314.28) exceeded $BaseValue within the given date range. The select statement should return both the Hi value (314.28) and the date (11/26/19).
2) Additionally, I need to retrieve the HIGHEST value in the Hi column during the given date range, together with its date. In the above scenario, it should return 315.48 and the corresponding date 11/27.
The table is NOT in chronological order; it is filled in randomly.
I cannot get the first query to work at all with the MAX function and its various combinations, which makes me wonder whether it is possible in SQL at all.
While the second is straightforward, I was wondering whether it would be more efficient and less complex to combine the two queries and get all four values in a single shot.
Any ideas on how I can approach this requirement, please?
Thanks
You could use two subqueries for filtering, one per criterion, like:
select t.*
from mytable t
where
    t.date = (
        select min(t1.date)
        from mytable t1
        where t1.date between :startdt and :enddt and t1.hi >= :basevalue
    )
    or t.hi = (
        select max(t1.hi)
        from mytable t1
        where t1.date between :startdt and :enddt and t1.hi >= :basevalue
    )
Another option is to union two queries with order by and limit:
(
    select t.*
    from mytable t
    where t.date between :startdt and :enddt and t.hi >= :basevalue
    order by t.date
    limit 1
)
union
(
    select t.*
    from mytable t
    where t.date between :startdt and :enddt and t.hi >= :basevalue
    order by t.hi desc, t.date
    limit 1
)
Please note that the two queries do not do exactly the same thing. If there are ties for the highest hi in the period, the first query will return all ties, while the second will pick the earliest one. It's up to you to decide which solution better fits your use case.
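To check the order by / limit approach against the sample data from the question, here is a small sketch using Python's sqlite3 module rather than MySQL and PHP. The table name prices, the ISO date strings, and the extra kind label column are assumptions made for this example. SQLite only accepts order by and limit on a union member when the member is wrapped in a subquery, so the syntax differs slightly from the MySQL version, and the comparison uses > because the question asks for values that exceed the base value.

```python
import sqlite3

# (date, hi) pairs from the question, with dates rewritten as ISO strings
# so that text comparison matches chronological order.
ROWS = [
    ("2019-12-10", 314.54), ("2019-11-22", 311.24), ("2019-11-25", 313.37),
    ("2019-11-26", 314.28), ("2019-11-27", 315.48), ("2019-11-29", 315.13),
    ("2019-12-02", 314.66), ("2019-12-03", 309.64),
]


def first_above_and_highest(conn, startdt, enddt, basevalue):
    """Return {'first_above': (date, hi), 'highest': (date, hi)} for the range."""
    sql = """
        SELECT kind, date, hi FROM (
            SELECT 'first_above' AS kind, date, hi
            FROM prices
            WHERE date BETWEEN :startdt AND :enddt AND hi > :basevalue
            ORDER BY date
            LIMIT 1
        )
        UNION ALL
        SELECT kind, date, hi FROM (
            SELECT 'highest' AS kind, date, hi
            FROM prices
            WHERE date BETWEEN :startdt AND :enddt
            ORDER BY hi DESC, date
            LIMIT 1
        )
    """
    params = {"startdt": startdt, "enddt": enddt, "basevalue": basevalue}
    return {kind: (date, hi) for kind, date, hi in conn.execute(sql, params)}


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (date TEXT PRIMARY KEY, hi REAL)")
    conn.executemany("INSERT INTO prices (date, hi) VALUES (?, ?)", ROWS)
    return first_above_and_highest(conn, "2019-11-22", "2019-12-02", 314)
```

The label column makes it possible to tell the two rows apart even when the first date above the base value is also the date of the highest value.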

How do I SELECT a MySQL Table value that has not been updated on a given date?

I have a MySQL database named mydb in which I store daily share prices for
423 companies in a table named data. Table data has the following columns:
`epic`, `date`, `open`, `high`, `low`, `close`, `volume`
epic and date together form the primary key.
I update the data table each day using a csv file which would normally have 423 rows
of data all having the same date. However, on some days prices may not be available
for all 423 companies and data for a particular epic and date pair will
not be updated. In order to determine the missing pair I have resorted
to comparing a full list of epics against the incomplete list of epics using
two simple SELECT queries with different dates and then using a file comparator, thus
revealing the missing epic(s). This is not a very satisfactory solution and so far
I have not been able to construct a query that would identify any epics that
have not been updated for any particular day.
SELECT `epic`, `date` FROM `data`
WHERE `date` IN ('2019-05-07', '2019-05-08')
ORDER BY `epic`, `date`;
Produces pairs of values:
`epic` `date`
"3IN" "2019-05-07"
"3IN" "2019-05-08"
"888" "2019-05-07"
"888" "2019-05-08"
"AA." "2019-05-07"
"AAL" "2019-05-07"
"AAL" "2019-05-08"
Where in this case AA. has not been updated on 2019-05-08. The problem with this is that it is not easy to spot a value that is not a pair.
Any help with this problem would be greatly appreciated.
You could do a COUNT on epic with a GROUP BY epic for the rows on those two dates, and then select from that result the epics whose UpdateCount is less than 2. Forgive me if the syntax of the column names is not quite right for MySQL; I work in SQL Server, but the logic of the query should still work for you.
SELECT x.epic
FROM
(
SELECT COUNT(*) AS UpdateCount, epic
FROM data
WHERE date IN ('2019-05-07', '2019-05-08')
GROUP BY epic
) AS x
WHERE x.UpdateCount < 2
Assuming you only want to check the last date uploaded, the following will return every item not updated on 2019-05-08:
SELECT last_updated.`epic`, last_updated.`date`
FROM (
    SELECT `epic`, max(`date`) AS `date` FROM `data`
    GROUP BY `epic`
) AS last_updated
WHERE last_updated.`date` <> '2019-05-08'
ORDER BY last_updated.`epic`
;
Or, for any upload date, the following compares against the entire table, so you don't rely on '2019-05-07' having a row for every epic. That is, any epic that has ever been in the table is reported if it was not updated on the given date:
SELECT d.epic, max(d.date)
FROM data as d
WHERE d.epic NOT IN (
SELECT d2.epic
FROM data as d2
WHERE d2.date = '2019-05-08'
)
GROUP BY d.epic
ORDER BY d.epic
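As a quick check of this anti-join idea on the sample rows from the question, here is a sketch using Python's sqlite3 module. It uses NOT EXISTS instead of NOT IN, which is an equivalent formulation here; the function name and the reduced two-column table are assumptions for the example.

```python
import sqlite3

# (epic, date) pairs copied from the question's sample output.
SAMPLE = [
    ("3IN", "2019-05-07"), ("3IN", "2019-05-08"),
    ("888", "2019-05-07"), ("888", "2019-05-08"),
    ("AA.", "2019-05-07"),
    ("AAL", "2019-05-07"), ("AAL", "2019-05-08"),
]


def epics_not_updated_on(conn, day):
    """Return (epic, last date seen) for every epic with no row on the given day."""
    sql = """
        SELECT d.epic, MAX(d.date) AS last_seen
        FROM data AS d
        WHERE NOT EXISTS (
            SELECT 1
            FROM data AS d2
            WHERE d2.epic = d.epic AND d2.date = ?
        )
        GROUP BY d.epic
        ORDER BY d.epic
    """
    return conn.execute(sql, (day,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (epic TEXT, date TEXT, PRIMARY KEY (epic, date))")
    conn.executemany("INSERT INTO data (epic, date) VALUES (?, ?)", SAMPLE)
    return epics_not_updated_on(conn, "2019-05-08")
```

With the sample rows, only AA. is reported, together with the last date on which it was present.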

Generating complex sql tables

I currently have an employee logging sql table that has 3 columns
fromState: String,
toState: String,
timestamp: DateTime
fromState is either In or Out. In means employee came in and Out means employee went out. Each row can only transition from In to Out or Out to In.
I'd like to generate a temporary table in SQL that tracks, hour by hour, how many employees are in the company. That is, the resulting table has columns HourBucket and NumEmployees.
In non-SQL code I can do this by initializing numEmployees to 0, going through the table row by row (sorted by timestamp), and adding (employee came in) or subtracting (employee went out) from numEmployees (bucketed by timestamp hour).
I have no idea how to do this in SQL. Any pointers?
Use a COUNT ... GROUP BY query. I can't see from your description what you're using toState for, though. I'm also assuming you have an employeeID field.
E.g.
SELECT fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable
INNER JOIN (
    SELECT employeeID AS 'empID', MAX(timestamp) AS 'latest'
    FROM StaffinBuildingTable
    GROUP BY employeeID
) AS LastEntry
    ON StaffinBuildingTable.employeeID = LastEntry.empID
    AND StaffinBuildingTable.timestamp = LastEntry.latest
GROUP BY fromState
The LastEntry subquery produces one row per employeeID with that employee's last timestamp.
The INNER JOIN limits the main table to the rows that match on both employeeID and that last timestamp.
The outer GROUP BY produces the count.
SELECT HOUR(SBT.timestamp) AS 'Hour', SBT.fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable AS SBT
INNER JOIN (
    SELECT SBIJ.employeeID AS 'empID', MAX(SBIJ.timestamp) AS 'latest'
    FROM StaffinBuildingTable AS SBIJ
    WHERE DATE(SBIJ.timestamp) = CURDATE()
    GROUP BY SBIJ.employeeID
) AS LastEntry
    ON SBT.employeeID = LastEntry.empID
    AND SBT.timestamp = LastEntry.latest
GROUP BY SBT.fromState, HOUR(SBT.timestamp)
Replace CURDATE() with whatever date you are interested in.
Note this is non-optimal as it calculates the HOUR twice - once for the data and once for the group.
Again you are using the INNER JOIN to limit the number of returned rows, this time to the last timestamp of each employee on a given day.
To me your description of FromState and ToState seems the wrong way round; I'd expect to do this based on ToState. But assuming I'm wrong on that, the following should point you in the right direction:
First, I create a "Numbers" table containing 24 rows one for each hour of the day:
create table tblHours
(Number int);
insert into tblHours values
(0),(1),(2),(3),(4),(5),(6),(7),
(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23);
Then for each date in your employee logging table, I create a row in another new table to contain your counts:
create table tblDailyHours
(
HourBucket datetime,
NumEmployees int
);
insert into tblDailyHours (HourBucket, NumEmployees)
select distinct
date_add(date(t.timeStamp), interval h.Number HOUR) as HourBucket,
0 as NumEmployees
from
tblEmployeeLogging t
CROSS JOIN tblHours h;
Then I update this table to contain all the relevant counts:
update tblDailyHours h
join
(select
h2.HourBucket,
sum(case when el.fromState = 'In' then 1 else -1 end) as cnt
from
tblDailyHours h2
join tblEmployeeLogging el on
h2.HourBucket >= el.timeStamp
group by h2.HourBucket
) cnt ON
h.HourBucket = cnt.HourBucket
set NumEmployees = cnt.cnt;
You can now retrieve the counts with
select *
from tblDailyHours
order by HourBucket;
The counts give the number on site at each of the times displayed, if you want during the hour in question, we'd need to tweak this a little.
There is a working version of this code (using not very realistic data in the logging table) here: rextester.com/DYOR23344
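The hour-bucket approach above boils down to a running sum: for each bucket, add 1 for every In row and subtract 1 for every Out row whose timestamp is not later than the bucket. Here is a rough, self-contained sketch of that logic using Python's sqlite3 module. The table names hours and log, the column ts, the sample rows, and the assumption that nobody is on site at midnight are all made up for the example.

```python
import sqlite3


def hourly_headcount(conn, day):
    """Return {hour: head count at the start of that hour} for one day."""
    sql = """
        SELECT h.hour,
               SUM(CASE e.fromState WHEN 'In' THEN 1
                                    WHEN 'Out' THEN -1
                                    ELSE 0 END) AS num_employees
        FROM hours AS h
        LEFT JOIN log AS e
               ON date(e.ts) = :day
              AND e.ts <= :day || ' ' || printf('%02d', h.hour) || ':00:00'
        GROUP BY h.hour
        ORDER BY h.hour
    """
    return dict(conn.execute(sql, {"day": day}).fetchall())


def demo():
    conn = sqlite3.connect(":memory:")
    # Helper table with one row per hour of the day, like tblHours above.
    conn.execute("CREATE TABLE hours (hour INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO hours (hour) VALUES (?)", [(h,) for h in range(24)])
    # Hypothetical log rows; 'In' means an employee came in (as in the question).
    conn.execute("CREATE TABLE log (fromState TEXT, ts TEXT)")
    conn.executemany(
        "INSERT INTO log (fromState, ts) VALUES (?, ?)",
        [
            ("In", "2019-05-08 08:30:00"),
            ("In", "2019-05-08 08:45:00"),
            ("Out", "2019-05-08 09:10:00"),
            ("In", "2019-05-08 12:05:00"),
        ],
    )
    return hourly_headcount(conn, "2019-05-08")
```

The LEFT JOIN keeps buckets that have no earlier log rows, and the ELSE 0 branch makes those buckets count as zero instead of being treated as an Out.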
Original Answer (Based on a single over all count)
If you're happy to search over all rows, and want the current "head count" you can use this:
select
sum(case when t.FromState = 'In' then 1 else -1 end) as Heads
from
MyTable t
But if you know that there will always be no-one there at midnight, you can add a where clause to prevent it looking at more rows than it needs to:
where
date(t.timestamp) = curdate()
Again, on the assumption that the head count reaches zero at midnight, you can generalise that method to get a headcount at any time as follows:
where
date(t.timestamp) = "CENSUS DATE" AND
t.timestamp <= "CENSUS DATETIME"
Obviously you'd need to replace my quoted strings with code which returned the date and datetime of interest. If the headcount doesn't return to zero at midnight, you can achieve the same by removing the first line of the where clause.

MYSQL - find and show all duplicates within date difference criteria

This query below selects all rows that have a row with the same father registering 335 days or less since earlier registration. Is there a way to edit this query so that it does not eliminate the duplicate row in the output? I need to see all instances of the registration for that father within 335 days of each other.
SELECT * FROM ymca_reg AS later
WHERE NOT EXISTS (
    SELECT 1 FROM ymca_reg AS earlier
    WHERE
        earlier.Father_First_Name = later.Father_First_Name
        AND earlier.Father_Last_Name = later.Father_Last_Name
        AND (later.Date - earlier.Date < 335) AND (later.Date > earlier.Date)
)
My current query is:
SELECT ymca_reg.* FROM ymca_reg WHERE (((ymca_reg.Year) In (SELECT Year FROM ymca_reg As Tmp
GROUP BY Year, Father_Last_Name, Father_First_Name
HAVING Count(*)>1
And Father_Last_Name = ymca_reg.Father_Last_Name
And Father_First_Name = ymca_reg.Father_First_Name)))
ORDER BY ymca_reg.Year, ymca_reg.Father_Last_Name, ymca_reg.Father_First_Name
This query does return all the duplicates for review correctly, but it's terribly slow because it doesn't use a join and as soon as I add the date criteria it only returns the later row. Thanks.
I think you want something like this:
SELECT *
FROM ymca_reg later
WHERE EXISTS (SELECT 1
FROM ymca_reg earlier
WHERE earlier.Father_First_Name = later.Father_First_Name AND
earlier.Father_Last_Name = later.Father_Last_Name AND
abs(later.Date - earlier.Date) < 335 and
later.Date <> earlier.Date
);
This should return all records that have such duplicates. Note that "later" and "earlier" are no longer really apt descriptions, but I left the names so you can see the similarity to your query.
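To see the symmetric EXISTS test return every member of a close pair, here is a small sketch using Python's sqlite3 module. It is not the original MySQL query: the table reg, its id column, and the sample rows are assumptions, and the gap is computed in whole days with SQLite's julianday() (in MySQL, DATEDIFF() would be the corresponding way to get a difference in days from DATE values).

```python
import sqlite3


def registrations_within(conn, days):
    """Return ids of registrations that have another registration by the
    same father fewer than `days` days away (earlier or later)."""
    sql = """
        SELECT r.id
        FROM reg AS r
        WHERE EXISTS (
            SELECT 1
            FROM reg AS o
            WHERE o.first_name = r.first_name
              AND o.last_name = r.last_name
              AND o.id <> r.id
              AND ABS(julianday(r.date) - julianday(o.date)) < :days
        )
        ORDER BY r.id
    """
    return [row[0] for row in conn.execute(sql, {"days": days})]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reg (id INTEGER PRIMARY KEY, first_name TEXT, "
        "last_name TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO reg (id, first_name, last_name, date) VALUES (?, ?, ?, ?)",
        [
            (1, "John", "Smith", "2018-01-10"),
            (2, "John", "Smith", "2018-06-01"),  # 142 days after row 1
            (3, "John", "Smith", "2019-12-01"),  # well over 335 days after row 2
            (4, "Ann", "Lee", "2018-03-01"),     # no other registration
        ],
    )
    return registrations_within(conn, 335)
```

Both rows 1 and 2 come back, which is the behaviour the question asks for: the earlier and the later registration of each close pair are shown.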

MySQL: combining 2 queries (one empty) using UNION ALL results in error "1048 - Column cannot be NULL"

I need to return two different results in a single query. When I run them independently, the first returns no rows (that's fine) and the second returns some rows (also fine). When I UNION ALL them, I get 1048 - Column "Date" cannot be null.
I need resulting rows of Date, PW, errors which I will feed a graph to show me what's going on in the system at the points in time specified by Date. In both tables, Date is of the format DateTime and must never be NULL.
SELECT `Date`, COUNT(`ID`) AS `PW`, 0 AS `errors`
FROM `systemlogins`
WHERE `Result` = 'PasswordFailure' AND `Date` >= DATE_SUB(NOW(), INTERVAL 1 DAY)
UNION ALL
SELECT `Date`, 0 AS `PW`, COUNT(`ID`) AS `errors`
FROM `systemerrors`
WHERE `Date` >= DATE_SUB(NOW(), INTERVAL 1 DAY)
GROUP BY ( 4 * HOUR( `Date` ) + FLOOR( MINUTE( `Date` )/15)) -- i.e. full quarter-hours
ORDER BY ( 4 * HOUR( `Date` ) + FLOOR( MINUTE( `Date` )/15))
I have read that MySQL might ignore tables' NOT NULL conditions in UNIONs, causing that error. I have indeed removed the "NOT NULL" restriction on the tables and, tada, it works. Now, those restrictions have been put there for a reason and I would like to keep them while running the aforementioned query - is there any way?
Edit:
Order is the villain - removing it returns a correct result, albeit with one empty row where Date is NULL. For my purposes, I need to order the results by Date somehow.
Why are you selecting the Date column? The first SELECT uses the aggregate function COUNT but has no GROUP BY clause, so it always returns exactly one row; when no rows match, Date in that row is NULL. It seems to me that you do not care about the Date column there.
Try adding a GROUP BY clause to the first SELECT, or removing the Date column from it.
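The difference between an aggregate with and without GROUP BY on an empty input can be reproduced in a few lines. This is a sketch using Python's sqlite3 module, not MySQL; the table is a cut-down, hypothetical version of systemlogins. SQLite, like MySQL with ONLY_FULL_GROUP_BY disabled, accepts the bare Date column next to COUNT.

```python
import sqlite3


def empty_aggregate_demo():
    """Run COUNT over zero matching rows, once without and once with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE systemlogins ("
        "ID INTEGER PRIMARY KEY, Date TEXT NOT NULL, Result TEXT)"
    )
    # The table is left empty on purpose, so the WHERE clause matches nothing.
    ungrouped = conn.execute(
        "SELECT Date, COUNT(ID) FROM systemlogins "
        "WHERE Result = 'PasswordFailure'"
    ).fetchall()
    grouped = conn.execute(
        "SELECT Date, COUNT(ID) FROM systemlogins "
        "WHERE Result = 'PasswordFailure' GROUP BY Date"
    ).fetchall()
    return ungrouped, grouped
```

The ungrouped query yields one row whose Date is NULL even though the column is declared NOT NULL, while the grouped query yields no rows at all, which matches the single empty row described in the question's edit.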