I use a Mantis bug database (backed by MySQL) and I want to query which bugs had a change in their severity within the last 2 weeks; however, only the last severity change of each bug should be returned.
The problem is that I get multiple entries per bug_id (which is the primary key), which is not my desired result, since I want only the latest change per bug. This means that somehow I am using the max() function and the GROUP BY clause wrongly.
Here you can see my query:
SELECT `bug_id`,
max(date_format(from_unixtime(`mantis_bug_history_table`.`date_modified`),'%Y-%m-%d %H:%i:%s')) AS `Severity_changed`,
`mantis_bug_history_table`.`old_value`,
`mantis_bug_history_table`.`new_value`
from `prepared_bug_list`
join `mantis_bug_history_table` on `prepared_bug_list`.`bug_id` = `mantis_bug_history_table`.`bug_id`
where (`mantis_bug_history_table`.`field_name` like 'severity')
group by `bug_id`,`old_value`,`new_value`
having (`Severity_changed` >= (now() - interval 2 week))
order by `bug_id` ASC
For the bug with id 8, for example, I get three entries with this query. Bug 8 did indeed have three severity changes within the last 2 weeks, but I only want to get the latest one.
What could be the problem with my query?
max() is an aggregate function, and it is not suitable for what you are trying to do here: because `old_value` and `new_value` are part of your GROUP BY, every distinct severity change of a bug forms its own group, so you still get one row per change.
I have a feeling that what you are trying to do is to get the latest entry out of all the applicable bug_id rows in mantis_bug_history_table. If that is true, then I would rewrite the query as the following -- I would write a sub-query getLatest and join it with prepared_bug_list.
Updated answer
Caution: I don't have access to the actual DB tables, so this query may have bugs.
select
    `getLatest`.`bug_id`
  , `mantis_bug_history_table`.`date_modified`
  , `mantis_bug_history_table`.`old_value`
  , `mantis_bug_history_table`.`new_value`
from
(
    select
        `prepared_bug_list`.`bug_id`
      , (
            select
                `id` -- primary key of the history row, so the join below matches exactly one row per bug
            from
                `mantis_bug_history_table`
            where
                `date_modified` > unix_timestamp() - 14*24*3600 -- two weeks
                and `field_name` like 'severity'
                and `bug_id` = `prepared_bug_list`.`bug_id`
            order by
                `date_modified` desc
            limit 1
        ) as `last_history_id`
    from
        `prepared_bug_list`
) as `getLatest`
inner join `mantis_bug_history_table`
    on `mantis_bug_history_table`.`id` = `getLatest`.`last_history_id`
order by `getLatest`.`bug_id` ASC
I finally have a solution! A friend of mine helped me, and one part of the solution was to include the primary key of the mantis bug history table, which is not bug_id but the column id, a consecutive number.
The other part of the solution was the subquery in the WHERE clause:
select `prepared_bug_list`.`bug_id` AS `bug_id`,
`mantis_bug_history_table`.`old_value` AS `old_value`,
`mantis_bug_history_table`.`new_value` AS `new_value`,
`mantis_bug_history_table`.`type` AS `type`,
date_format(from_unixtime(`mantis_bug_history_table`.`date_modified`),'%Y-%m-%d %H:%i:%s') AS `date_modified`
FROM `prepared_bug_list`
JOIN mantis_import.mantis_bug_history_table
ON `prepared_bug_list`.`bug_id` = mantis_bug_history_table.bug_id
where (mantis_bug_history_table.id = -- id is the id of every history entry, not to be confused with bug_id
(select `mantis_bug_history_table`.`id` from `mantis_bug_history_table`
where ((`mantis_bug_history_table`.`field_name` = 'severity')
and (`mantis_bug_history_table`.`bug_id` = `prepared_bug_list`.`bug_id`))
order by `mantis_bug_history_table`.`date_modified` desc limit 1)
and `date_modified` > unix_timestamp() - 14*24*3600 )
order by `prepared_bug_list`.`bug_id`,`mantis_bug_history_table`.`date_modified` desc
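As a side note: on MySQL 8.0 or later the same "latest severity change per bug" filter can be written with a window function instead of a correlated subquery. A minimal sketch, assuming MySQL 8.0+ and the same tables and columns as above:
-- assumes MySQL 8.0+ (window functions)
SELECT bug_id, old_value, new_value, date_modified
FROM (
    SELECT h.bug_id,
           h.old_value,
           h.new_value,
           FROM_UNIXTIME(h.date_modified) AS date_modified,
           -- number each bug's severity changes, newest first
           ROW_NUMBER() OVER (PARTITION BY h.bug_id
                              ORDER BY h.date_modified DESC) AS rn
    FROM prepared_bug_list p
    JOIN mantis_bug_history_table h ON h.bug_id = p.bug_id
    WHERE h.field_name = 'severity'
) ranked
WHERE rn = 1 -- keep only the newest change per bug
  AND date_modified >= NOW() - INTERVAL 2 WEEK
ORDER BY bug_id;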
I had a SQL query I would run to build a rolling-sum (moving window) data set. I would run this query once for every 7-day window, increasing the interval number by 7 each time (28 in the example below) until I reached the start of the data. It gave me the data split by week so I could loop through it in the view to create a weekly graph.
SELECT *
FROM `table`
WHERE `row_date` >= DATE_SUB(NOW(), INTERVAL 28 DAY)
AND `row_date` <= DATE_SUB(NOW(), INTERVAL 21 DAY)
This is of course very slow once you have several weeks' worth of data. I wanted to replace it with a single query. I came up with this.
SELECT *,
CONCAT(YEAR(row_date), '/', WEEK(row_date)) as week_date
FROM `table`
GROUP BY week_date
ORDER BY row_date DESC
It appeared mostly accurate, except I noticed the current week and the last week of the year were much lower than usual. That's because WEEK() buckets rows by calendar week (starting on Sunday, or Monday depending on the mode), so the count resets at each calendar-week boundary instead of covering a full trailing 7 days.
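You can see the boundary directly (assuming the default default_week_format of 0, where weeks start on Sunday):
SELECT WEEK('2016-12-31'), WEEK('2017-01-01');
-- should return 52 and 1: two adjacent days land in different buckets,
-- and YEAR() splits them as well, so '2016/52' and '2017/1' are both partial weeks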
Here's a data set of employees that you can use to demonstrate the behavior.
CREATE TABLE employees (
id INT NOT NULL,
first_name VARCHAR(14) NOT NULL,
last_name VARCHAR(16) NOT NULL,
row_date DATE NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO `employees` VALUES
(1,'Bezalel','Simmel','2016-12-25'),
(2,'Bezalel','Simmel','2016-12-31'),
(3,'Bezalel','Simmel','2017-01-01'),
(4,'Bezalel','Simmel','2017-01-05');
With this data, the old query (last 7 days) groups the last 3 rows into the same data point, assuming you run it today, 2017-01-06, but the new query (Sunday to Saturday) groups only the last 2 rows together.
For more information on what I mean by rolling or moving window, see this English stack exchange link.
https://english.stackexchange.com/questions/362791/word-for-graph-that-counts-backwards-vs-graph-that-counts-forwards
How can I write a query in MySQL that will bring me rolling data, where the last data point is the last 7 days of data, the previous point is the previous 7 days, and so on?
I've had to interpret your question quite a bit, so this answer might be unsuitable. It sounds like you are trying to get a graph showing data historically grouped into 7-day periods. Your current attempt does this by grouping on calendar week instead of by 7-day period, leading to inconsistently sized periods.
So, using a modification of your dataset on SQL Fiddle ( http://sqlfiddle.com/#!9/90f1f2 ), I have come up with this:
SELECT
-- Figure out how many periods of 7 days ago this record applies to
FLOOR( DATEDIFF( CURRENT_DATE , row_date ) / 7 ) AS weeks_ago,
-- Count the number of ids in this group
COUNT( DISTINCT id ) AS number_in_week,
-- Because this is grouped, make sure to have some consistency on what we select instead of leaving it to chance
MIN( row_date ) AS min_date_in_week_in_dataset
FROM `sample_data`
-- Groups by weeks ago because that's what you are interested in
GROUP BY weeks_ago
ORDER BY
min_date_in_week_in_dataset DESC;
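For the employees data above (assuming the fiddle table is named sample_data and the query runs on 2017-01-06), the output should group the last three rows into the current 7-day bucket:
-- weeks_ago | number_in_week | min_date_in_week_in_dataset
--     0     |       3        | 2016-12-31   (ids 2, 3, 4: the trailing 7 days)
--     1     |       1        | 2016-12-25   (id 1)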
With respect to the data set below, I'm trying to get the top 5 records per day from a MySQL database. It's a table of web page visits, and my aim is to find the 5 most visited pages per day.
I'm comfortable getting just the top 10 in a given date range, but I have not been able to put together a query for the problem in question.
I did try the below
select
VISIT_DATE,
group_concat(PAGE_ID order by NUM_VISITS desc separator ',') as pagehits
from
PAGEVISITS
where
VISIT_DATE >= '2015-07-01' and VISIT_DATE <= '2015-07-15'
group by
VISIT_DATE
but I can't get SUM(NUM_VISITS) in there, and I couldn't GROUP BY `VISIT_DATE`, which makes it pretty useless. That apart, this is how far I've got:
select
VISIT_DATE,
PAGE_ID,
SUM(NUM_VISITS) as pagehits
from
PAGEVISITS
where
VISIT_DATE >= '2015-01-01' and VISIT_DATE <= '2015-03-15'
group by
VISIT_DATE,
PAGE_ID
order by
pagehits desc
limit 5;
which obviously is not the top 5 per day. Also, more than one page can end up with the same number of page hits and so tie its way into the top 5, which is why I tried using GROUP_CONCAT to display all those PAGE_IDs whose page-hit count is among the top 5 for that day.
I'm not a seasoned SQL coder, so could I please request assistance in getting this working? If I've not been clear anywhere, please do let me know.
CREATE TABLE PAGEVISITS
(`VISIT_DATE` date, `PAGE_ID` varchar(20), `SERVER_NAME` varchar(50), `NUM_VISITS` int)
;
INSERT INTO PAGEVISITS
(`VISIT_DATE`, `PAGE_ID`, `SERVER_NAME`, `NUM_VISITS`)
VALUES
('2015-01-01','2015A12123','A',10),
('2015-01-01','2015A12123','B',10),
('2015-01-01','2015A12124','A',30),
('2015-01-01','2015A12124','B',30),
('2015-01-01','2015A12125','A',40),
('2015-01-01','2015A12125','B',40),
('2015-01-01','2015A12126','A',1),
('2015-01-01','2015A12126','B',1),
('2015-01-01','2015A12127','A',0),
('2015-01-01','2015A12127','B',1),
('2015-01-01','2015A12128','A',40),
('2015-01-01','2015A12129','A',30),
('2015-01-01','2015A12134','A',45),
('2015-01-01','2015A12126','A',56),
('2015-01-01','2015A12167','A',23),
('2015-01-01','2015A12145','A',17),
('2015-01-01','2015A121289','A',12),
('2015-01-01','2015A121289','B',5),
('2015-01-02','2015A12123','A',3),
('2015-01-02','2015A12124','A',10),
('2015-01-02','2015A12125','A',70),
('2015-01-02','2015A12126','A',10),
('2015-01-02','2015A12127','A',100),
('2015-01-02','2015A12128','A',3),
('2015-01-02','2015A12128','B',2),
('2015-01-02','2015A12129','A',10),
('2015-01-02','2015A12134','A',5),
('2015-01-02','2015A12126','A',6),
('2015-01-02','2015A12167','A',3),
('2015-01-02','2015A12145','A',170),
('2015-01-02','2015A121289','A',34),
('2015-01-03','2015A12123','A',34),
('2015-01-03','2015A12124','A',14),
('2015-01-03','2015A12125','A',37),
('2015-01-03','2015A12126','A',23),
('2015-01-03','2015A12127','A',234),
('2015-01-03','2015A12128','A',47),
('2015-01-03','2015A12129','A',67),
('2015-01-03','2015A12134','A',89),
('2015-01-03','2015A12134','B',1),
('2015-01-03','2015A12126','A',97),
('2015-01-03','2015A12167','A',35),
('2015-01-03','2015A12145','A',0),
('2015-01-03','2015A121289','A',19),
('2015-01-04','2015A12123','A',115),
('2015-01-04','2015A12124','A',149),
('2015-01-04','2015A12125','A',370),
('2015-01-04','2015A12126','A',34),
('2015-01-04','2015A12127','A',4),
('2015-01-04','2015A12128','A',70),
('2015-01-04','2015A12129','B',70),
('2015-01-04','2015A12134','A',70),
('2015-01-04','2015A12126','B',64),
('2015-01-04','2015A12167','A',33),
('2015-01-04','2015A12145','A',10);
ANTICIPATED OUTPUT
Fiddle here
If this is going to be used daily, then you should consider creating a separate table and filling it with a procedure. There are still better ways to do this (using a merge, for example); this is just for your reference.
create table daily_results
(`VISIT_DATE` date, `PAGE_ID` varchar(20), `SERVER_NAME` varchar(50), `NUM_VISITS` int);
-- change the statement delimiter so the procedure body can contain semicolons
DELIMITER //
CREATE PROCEDURE proc_loop_test( IN startdate date, IN enddate date)
BEGIN
  WHILE (startdate <= enddate) DO
    -- top 5 raw rows for that day (note: visits are not summed across servers here)
    insert into daily_results (select * from PAGEVISITS where VISIT_DATE = startdate order by NUM_VISITS desc limit 5);
    SET startdate = date_add(startdate, INTERVAL 1 DAY);
  END WHILE;
END//
DELIMITER ;
Call it using:
call proc_loop_test('2015-01-01','2015-03-15');
select * from daily_results;
The query should be
select sub.*,
       CASE WHEN @vd != VISIT_DATE THEN @rn:=0 ELSE @rn:=@rn+1 END as row_num,
       @vd:=VISIT_DATE
from (
    select
        VISIT_DATE,
        PAGE_ID,
        SUM(NUM_VISITS) as pagehits
    from
        PAGEVISITS
    where
        VISIT_DATE >= '2015-01-01' and VISIT_DATE <= '2015-03-15'
    group by
        VISIT_DATE,
        PAGE_ID
    order by
        VISIT_DATE, pagehits desc) sub
cross join (select @vd:='', @rn:=0) init -- initialise the variables so the first date is ranked correctly
having row_num < 5
Somehow SQL Fiddle shows some kind of internal error when the query is executed.
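On MySQL 8.0+ you can skip the user variables entirely. A minimal sketch using RANK(), assuming the same table and columns as above; RANK() also keeps ties within the top 5, as the question asks:
-- assumes MySQL 8.0+ (window functions)
SELECT VISIT_DATE, PAGE_ID, pagehits
FROM (
    SELECT VISIT_DATE,
           PAGE_ID,
           SUM(NUM_VISITS) AS pagehits,
           -- RANK() gives tied page-hit counts the same rank
           RANK() OVER (PARTITION BY VISIT_DATE
                        ORDER BY SUM(NUM_VISITS) DESC) AS rnk
    FROM PAGEVISITS
    WHERE VISIT_DATE >= '2015-01-01' AND VISIT_DATE <= '2015-03-15'
    GROUP BY VISIT_DATE, PAGE_ID
) ranked
WHERE rnk <= 5
ORDER BY VISIT_DATE, pagehits DESC;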
Goal: Write the correct SQL to solve the problems below.
Part 1:
Having trouble figuring out the SQL statement to get the timestamp (date plus hour) with the maximum "in_bytes" for each day. See the "video_hourly" table DDL code below. If two hours in a given day tie for the maximum value, just pick the first one. This data is being graphed in Highcharts, so there can only be one data point for each given day. You can fill the table with some sample data.
Part 2:
The other part of this problem: once you have the unique maximum "in_bytes" row for each day, you need to sum the "in_bytes" and "out_bytes" over those rows to get one record.
To convert the UTC time from the database to local time, we use this in the queries:
SELECT time_stamp,CONVERT_TZ(time_stamp, '+00:00', '-07:00' ) as localtime
Here is the DDL SQL for the table:
CREATE TABLE video_hourly (
id bigint(20) NOT NULL AUTO_INCREMENT,
time_stamp datetime NOT NULL,
in_bytes bigint(20) UNSIGNED NOT NULL DEFAULT 0,
out_bytes bigint(20) UNSIGNED NOT NULL DEFAULT 0,
opt_pct decimal(11, 2) NOT NULL DEFAULT 0.00,
PRIMARY KEY (id)
)
ENGINE = INNODB;
Any help or advice on this would greatly be appreciated. Thank you!
See this list of datetime functions that you can use. Specifically, you can use HOUR() to get the hour value.
You can also use DATE() to get the date part of a datetime column. Once you have those, you can group them together. I will try and break it down for you.
This will return the date, hour, and the in_bytes for that hour, by grouping by day and hour.
SELECT DATE(time_stamp) AS date, HOUR(time_stamp) AS hour, SUM(in_bytes) AS totalInBytes
FROM video_hourly
GROUP BY date, hour
ORDER BY date, totalInBytes DESC;
This also puts the max totalInBytes at the top of each date's group, because it orders by that in descending order within each date.
Also, please see this question for how to get the max value in a group, which in this case means getting the max totalInBytes for each date.
Then, you can change your query to this:
SELECT CONCAT(v.date, ' ', v.hour) AS dateAndHour, v.totalInBytes
FROM(SELECT DATE(time_stamp) AS date, HOUR(time_stamp) AS hour, SUM(in_bytes) AS totalInBytes
     FROM video_hourly
     GROUP BY date, hour
) v
WHERE(
    SELECT COUNT(*)
    FROM(SELECT DATE(time_stamp) AS date, HOUR(time_stamp) AS hour, SUM(in_bytes) AS totalInBytes
         FROM video_hourly
         GROUP BY date, hour
    ) vh
    -- count the rows in the same day that beat this one; ties are broken by the earlier hour
    WHERE vh.date = v.date
      AND (vh.totalInBytes > v.totalInBytes
           OR (vh.totalInBytes = v.totalInBytes AND vh.hour <= v.hour))
) <= 1;
I can't try it without any sample data, but here is an SQL Fiddle link, if you want to try it out. I used this to make sure it would not produce any errors.
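Part 2 of the question (summing the daily maxima into one record) isn't covered above. A minimal sketch under the same assumptions, carrying out_bytes through the derived table and summing over the filtered per-day maximum rows:
SELECT SUM(m.totalInBytes)  AS sum_in_bytes,
       SUM(m.totalOutBytes) AS sum_out_bytes
FROM(SELECT DATE(time_stamp) AS date, HOUR(time_stamp) AS hour,
            SUM(in_bytes) AS totalInBytes, SUM(out_bytes) AS totalOutBytes
     FROM video_hourly
     GROUP BY date, hour
) m
WHERE(
    SELECT COUNT(*)
    FROM(SELECT DATE(time_stamp) AS date, HOUR(time_stamp) AS hour, SUM(in_bytes) AS totalInBytes
         FROM video_hourly
         GROUP BY date, hour
    ) vh
    -- same per-day maximum filter as in the query above
    WHERE vh.date = m.date
      AND (vh.totalInBytes > m.totalInBytes
           OR (vh.totalInBytes = m.totalInBytes AND vh.hour <= m.hour))
) <= 1;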