phpMyAdmin incorrect results count - mysql

Stumbled across potentially a bug(?) within phpMyAdmin, although it's more likely to maybe be my misunderstanding of MySQL, so was hoping someone could shed some light on this behaviour.
Using the following schema
CREATE TABLE IF NOT EXISTS `mlfsql_test` (
`id` int(11) NOT NULL,
`frequency_length` smallint(3) DEFAULT NULL,
`frequency_units` varchar(10) DEFAULT NULL,
`next_delivery_date` date DEFAULT NULL,
`last_created_delivery_date` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;
INSERT INTO `mlfsql_test`
(`id`, `frequency_length`, `frequency_units`, `next_delivery_date`, `last_created_delivery_date`) VALUES
(1, 2, 'week', '2014-06-25', NULL),
(2, 3, 'day', '2014-06-27', NULL),
(3, 1, 'week', '2014-08-08', NULL),
(4, 2, 'day', NULL, '2014-06-26');
I want to determine rows with an upcoming delivery, based on their currently set delivery date, or their last delivery date with the frequency taken into consideration.
Came up with the following query which works fine:
SELECT *, IF (next_delivery_date IS NOT NULL, next_delivery_date,
CASE frequency_units
WHEN 'day' THEN DATE_ADD(last_created_delivery_date, INTERVAL frequency_length DAY)
WHEN 'week' THEN DATE_ADD(last_created_delivery_date, INTERVAL frequency_length WEEK)
WHEN 'month' THEN DATE_ADD(last_created_delivery_date, INTERVAL frequency_length MONTH)
END)
AS next_order_due_date
FROM mlfsql_test
HAVING next_order_due_date IS NULL OR (next_order_due_date BETWEEN CURDATE() AND CURDATE() + INTERVAL 10 DAY)
With the data currently in the table, I am expecting it to return 3 rows, but phpMyAdmin states there are a total of 4 rows of results, although it only displays 3...
I've found that if I add a WHERE clause to my query such as WHERE 1, it'll return the 3 rows and also state that there is a total of 3.
Why does it give an incorrect number of returned rows without the WHERE clause? I'm assuming without one phpMyAdmin assumes that all rows will match, however only returns those that actually did, so the count is wrong? Any help would be appreciated.
Edit: phpMyAdmin Version 4.2.0

This seems to be a bug in phpMyAdmin v4.2.x. I have opened a bug ticket (see Bug #4473). I have also proposed a fix for this bug to them (see PR #1253). You can also apply this patch to fix it in v4.2.4. This is most likely to be fixed in upcoming bugfix release i.e. v4.2.5.
Edit 1: My patch was accepted and this issue is fixed in v4.2.5 (upcoming minor release).

Related

Select one piece of data from every day at a specific hour MySQL

My database has data imputed every 1 minute and is stored in the format 2020-04-05 16:20:04 under a column called timestamp.
I need a MySQL query to select data from every day at a specific hour (the second does not matter), for for example I want to get the data from 16:00 of every day from the past 30 days.
It currently, just grabs the data from the past 30 days and then the PHP application sorts it, however, this is causing very slow loading time, hence wanting to only select the wanted data from the database.
Example of data
Please try the following sql:
select
d.timestamp, hour(d.timestamp)
from
demo1 d
where
DATEDIFF(NOW(), d.timestamp) < 30 and hour(d.timestamp) = 16;
The create sql is as following:
CREATE TABLE `demo1` (
`id` int(11) not null auto_increment primary key,
`serverid` int(11) not null,
`timestamp` datetime not null,
KEY `idx_timestamp` (`timestamp`)
) engine = InnoDB;
insert into `demo1` (serverid, timestamp)
VALUES (1, "2020-07-05 16:20:04"),
(2, "2020-07-06 17:20:04"),
(3, "2020-07-07 16:40:04"),
(4, "2020-07-08 08:20:04"),
(5, "2020-07-05 15:20:04"),
(5, "2020-07-05 16:59:04"),
(5, "2020-06-04 16:59:04");
Zhiyong's response will work, but wont perform well. You need to figure out a way to get the query to use indexes.
You can add a simple index on timestamp and run the query this way:
SELECT
d.timestamp, d.*
FROM demo1 d
WHERE 1
AND d.timestamp > CURDATE() - INTERVAL 30 DAY
AND hour(d.timestamp) = 16;
In MySQL 5.7 and up, you can created a generated column (also called calculated column) top store the hour of the timestamp in a separate column. You can then index this column, perhaps as a composite index of hour + timestamp, so that the query above will perform really quickly.
ALTER TABLE demo1
ADD COLUMN hour1 tinyint GENERATED ALWAYS AS (HOUR(timestamp)) STORED,
ADD KEY (hour1, timestamp);
The result query would be:
SELECT
d.timestamp, d.*
FROM demo1 d
WHERE 1
AND d.timestamp > CURDATE() - INTERVAL 30 DAY
AND hour1 = 16;
More info on that here:
https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html
https://dev.mysql.com/doc/refman/5.7/en/generated-column-index-optimizations.html

Why Index (on primary key column) is not used?

I have a date table, which has a column date (PK). The CREATE script is here:
CREATE TABLE date_table (
date DATE
,year INT(4)
,month INT(2)
,day INT(2)
,month_pad VARCHAR(2)
,day_pad VARCHAR(2)
,month_name VARCHAR(10)
,year_month_index INT(6)
,year_month_hypname VARCHAR(7)
,year_month_name VARCHAR(15)
,week_day_index INT(1)
,day_name VARCHAR(9)
,week INT(2)
,week_interval VARCHAR(13)
,weekend_fl INT(1)
,quarter_num INT(1)
,quarter_num_pad VARCHAR(2)
,quarter_name VARCHAR(2)
,year_quarter_index INT(6)
,year_quarter_name VARCHAR(7)
,PRIMARY KEY (date)
);
Now I would like select rows from this table with dynamic values, using such as LAST_DAY() or DATE_SUB(DATE_FORMAT(SYSDATE(),'%Y-01-01'), INTERVAL X YEAR), etc.
When one of my queries failed and didn't execute in 30 secs, I knew something was fishy, and it looks like the reason is that the index on the primary key column is not used. Here are my results (sorry for using an image instead of copying the queries, but I thought it's concise enough for this purpose, and the queries are short/simple enough):
First of all, it's strange that the BETWEEN works differently than using >= and <=. Secondly, it looks like the index is only used for constant values. If you look closely, you can see that on the right side (where >= and <= is used), it shows ~9K rows, which is half of the rows in the table (the table has about ~18k rows, dates from 2000-01-01 to `2050-12-31).
SYSDATE() returns the time at which it executes. This differs from the behavior for NOW(), which returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.)
-- https://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html#function_sysdate
That is, the Optimizer does not see this as a "constant". Otherwise, the Optimizer eagerly evaluates any "constant expressions", then tries to take advantage of knowing the value.
See also the sysdate_is_now option.
Bottom line: Don't use SYSDATE() for normal datetime usage; use NOW() or CURDATE().
Looks like if I use CURRENT_DATE() (or NOW()) instead of SYSDATE(), it's working. Both of these queries:
SELECT *
FROM date_table t
WHERE 1 = 1
AND t.ddate >= LAST_DAY(CURRENT_DATE()) AND t.ddate <= LAST_DAY(CURRENT_DATE());
SELECT *
FROM date_table t
WHERE 1 = 1
AND t.ddate >= LAST_DAY(NOW()) AND t.ddate <= LAST_DAY(NOW());
Give the same result, which is this:
I will accept my answer as a solution, but I'm still looking for an explanation. I thought it might has to do something with SYSDATE() not being a DATE, but NOW() is also not a DATE...
EDIT: Forgot to add, BETWEEN is also working as I see.

Counting rows in event table, grouped by time range, a lot

Imagine I have a table like this:
CREATE TABLE `Alarms` (
`AlarmId` INT UNSIGNED NOT NULL AUTO_INCREMENT
COMMENT "32-bit ID",
`Ended` BOOLEAN NOT NULL DEFAULT FALSE
COMMENT "Whether the alarm has ended",
`StartedAt` TIMESTAMP NOT NULL DEFAULT 0
COMMENT "Time at which the alarm was raised",
`EndedAt` TIMESTAMP NULL
COMMENT "Time at which the alarm ended (NULL iff Ended=false)",
PRIMARY KEY (`AlarmId`),
KEY `Key4` (`StartedAt`),
KEY `Key5` (`Ended`, `EndedAt`)
) ENGINE=InnoDB;
Now, for a GUI, I want to produce:
a list of days during which at least one alarm were "active"
for each day, how many alarms started
for each day, how many alarms ended
The intent is to present users with a dropdown box from which they can choose a date to see any alarms active (started before or during, and ended during or after) on that day. So something like this:
+-----------------------------------+
| Choose day ▼ |
+-----------------------------------+
| 2017-12-03 (3 started) |
| 2017-12-04 (1 started, 2 ended) |
| 2017-12-05 (2 ended) |
| 2017-12-16 (1 started, 1 ended) |
| 2017-12-17 (1 started) |
| 2017-12-18 |
| 2017-12-19 |
| 2017-12-20 |
| 2017-12-21 (1 ended) |
+-----------------------------------+
I will probably force an age limit on alarms so that they are archived/removed after, say, a year. So that's the scale we're working with.
I expect anywhere from zero to tens of thousands of alarms per day.
My first thought was a reasonably simple:
(
SELECT
COUNT(`AlarmId`) AS `NumStarted`,
NULL AS `NumEnded`,
DATE(`StartedAt`) AS `Date`
FROM `Alarms`
GROUP BY `Date`
)
UNION
(
SELECT
NULL AS `NumStarted`,
COUNT(`AlarmId`) AS `NumEnded`,
DATE(`EndedAt`) AS `Date`
FROM `Alarms`
WHERE `Ended` = TRUE
GROUP BY `Date`
);
This uses both of my indexes, with join type ref and ref type const, which I'm happy with. I can iterate over the resultset, dumping the non-NULL values found into a C++ std::map<boost::gregorian::date, std::pair<size_t, size_t>> (then "filling the gaps" for days on which no alarms started or ended, but were active from previous days).
The spanner I'm throwing in the works is that the list should take into account location-based timezones, but only my application knows about timezones. For logistical reasons, the MySQL session is deliberately SET time_zone = '+00:00' so that timestamps are all kicked out in UTC. (Various other tools are then used to perform any necessary location-specific corrections for historical timezones, taking into account DST and whatnot.) For the rest of the application this is great, but for this particular query it breaks the date GROUPing.
Maybe I could pre-calculate (in my application) a list of time ranges, and generate a huge query of 2n UNIONed queries (where n = number of "days" to check) and get the NumStarted and NumEnded counts that way:
-- Example assuming desired timezone is -05:00
--
-- 3rd December
(
SELECT
COUNT(`AlarmId`) AS `NumStarted`,
NULL AS `NumEnded`,
'2017-12-03' AS `Date`
FROM `Alarms`
-- Alarm started during 3rd December UTC-5
WHERE `StartedAt` >= '2017-12-02 19:00:00'
AND `StartedAt` < '2017-12-03 19:00:00'
GROUP BY `Date`
)
UNION
(
SELECT
NULL AS `NumStarted`,
COUNT(`AlarmId`) AS `NumEnded`,
'2017-12-03' AS `Date`
FROM `Alarms`
-- Alarm ended during 3rd December UTC-5
WHERE `EndedAt` >= '2017-12-02 19:00:00'
AND `EndedAt` < '2017-12-03 19:00:00'
GROUP BY `Date`
)
UNION
-- 4th December
(
SELECT
COUNT(`AlarmId`) AS `NumStarted`,
NULL AS `NumEnded`,
'2017-12-04' AS `Date`
FROM `Alarms`
-- Alarm started during 4th December UTC-5
WHERE `StartedAt` >= '2017-12-03 19:00:00'
AND `StartedAt` < '2017-12-04 19:00:00'
GROUP BY `Date`
)
UNION
(
SELECT
NULL AS `NumStarted`,
COUNT(`AlarmId`) AS `NumEnded`,
'2017-12-04' AS `Date`
FROM `Alarms`
-- Alarm ended during 4th December UTC-5
WHERE `EndedAt` >= '2017-12-03 19:00:00'
AND `EndedAt` < '2017-12-04 19:00:00'
GROUP BY `Date`
)
UNION
-- 5th December
-- [..]
But, of course, even if I'm restricting the database to a year's worth of historical alarms, that's up to like 730 UNIONd SELECTs. My spidey senses tell me that this is a very bad idea.
How else can I generate these sort of time-grouped statistics? Or is this really silly and I should look at resolving the problems preventing me from using tzinfo with MySQL?
Must work on MySQL 5.1.73 (CentOS 6) and MariaDB 5.5.50 (CentOS 7).
The UNION approach is actually not far off a viable solution; you can achieve the same thing, without a catastrophically large query, by recruiting a temporary table:
CREATE TEMPORARY TABLE `_ranges` (
`Start` TIMESTAMP NOT NULL DEFAULT 0,
`End` TIMESTAMP NOT NULL DEFAULT 0,
PRIMARY KEY (`Start`, `End`)
);
INSERT INTO `_ranges` VALUES
-- 3rd December UTC-5
('2017-12-02 19:00:00', '2017-12-03 19:00:00'),
-- 4th December UTC-5
('2017-12-03 19:00:00', '2017-12-04 19:00:00'),
-- 5th December UTC-5
('2017-12-04 19:00:00', '2017-12-05 19:00:00'),
-- etc.
;
-- Now the queries needed are simple and also quick:
SELECT
`_ranges`.`Start`,
COUNT(`AlarmId`) AS `NumStarted`
FROM `_ranges` LEFT JOIN `Alarms`
ON `Alarms`.`StartedAt` >= `_ranges`.`Start`
ON `Alarms`.`StartedAt` < `_ranges`.`End`
GROUP BY `_ranges`.`Start`;
SELECT
`_ranges`.`Start`,
COUNT(`AlarmId`) AS `NumEnded`
FROM `_ranges` LEFT JOIN `Alarms`
ON `Alarms`.`EndedAt` >= `_ranges`.`Start`
ON `Alarms`.`EndedAt` < `_ranges`.`End`
GROUP BY `_ranges`.`Start`;
DROP TABLE `_ranges`;
(This approach was inspired by a DBA.SE post.)
Notice that there are two SELECTs — the original UNION is no longer possible, because temporary tables cannot be accessed twice in the same query. However, since we've already introduced additional statements anyway (the CREATE, INSERT and DROP), this seems to be a moot problem in the circumstances.
In both cases, each row represents one of our requested periods, and the first column equals the "start" part of the period (so that we can identify it in the resultset).
Be sure to use exception handling in your code as needed to ensure that _ranges is DROPped before your routine returns; although the temporary table is local to the MySQL session, if you're continuing to use that session afterwards then you probably want a clean state, particularly if this function is going to be used again.
If this is still too heavy, for example because you have many time periods and the CREATE TEMPORARY TABLE itself will therefore become too large, or because multiple statements doesn't fit in your calling code, or because your user doesn't have permission to create and drop temporary tables, you'll have to fall back on a simple GROUP BY over DAY(Date), and ensure that your users run mysql_tzinfo_to_sql whenever the system's tzdata is updated.

MySQL JSON_EXTRACT performance

We have a logging table which is growing as new events happening. At the moment we have around 120.000 rows of log events stored.
The events table looks like this:
'CREATE TABLE `EVENTS` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`EVENT` varchar(255) NOT NULL,
`ORIGIN` varchar(255) NOT NULL,
`TIME_STAMP` TIMESTAMP NOT NULL,
`ADDITIONAL_REMARKS` json DEFAULT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=137007 DEFAULT CHARSET=utf8'
Additional_Remarks is a JSON field because different applications log into this table and can add more information to the event which happened. I did not want to put any data structure here, because this information can be different. For example one project management application can log:
ID, "new task created", "app", NOW(), {"project": {"id": 1}, "creator": {"id": 1}}
While other applications do not have projects or creator, but maybe cats and owners they want to store in the Additional_Remarks field.
Queries can use the Additional_Remarks field to filter information for one specific application like:
SELECT
DISTINCT(ADDITIONAL_REMARKS->"$.project.id") as 'project',
COUNT(CASE WHEN EVENT = 'new task created' THEN 1 END) AS 'new_task'
FROM EVENTS
WHERE DATE(TIMESTAMP) >= DATE(NOW()) - INTERVAL 30 DAY
AND ORIGIN = "app"
GROUP BY project
ORDER BY new_task DESC
LIMIT 10;
Output EXPLAIN query:
'1', 'SIMPLE', 'EVENTS', NULL, 'ALL', NULL, NULL, NULL, NULL, '136459', '100.00', 'Using where; Using temporary; Using filesort'
With this query I get the top 10 projects with the most created tasks for the last 30 days. Works fine, but this queries get slower and slower as our table grows. With 120.000 rows this query needs over 30 seconds.
Do you know any way to improve the speed? The newest information in the table with the highest id is more important then older entries. Often I look only for entries which happened in the last X days. It would be useful to stop the query after the first entry is older as X days from the where clause, as all further entries are even older.
if TIME_STAMP is indexed, the DATE function will not allow the index to be used because it is non-deterministic.
WHERE DATE(TIMESTAMP) >= DATE(NOW()) - INTERVAL 30 DAY
Can be rewritten as.
WHERE TIMESTAMP >= UNIX_TIMESTAMP(DATE(NOW()) - INTERVAL 30 DAY)
Do you know any way to improve the speed?
The only way i can see to speed up the query is to multicolumn index TIMESTAMP and ORIGIN like so ALTER TABLE EVENTS ADD KEY timestamp_origin (TIME_STAMP, ORIGIN); and mine query adjustment above
EDIT
And a delivered table may improve query speed because it will use the new index.
SELECT
ADDITIONAL_REMARKS->"$.project.id" AS 'project',
COUNT(CASE WHEN EVENT = 'new task created' THEN 1 END) AS 'new_task'
FROM (
SELECT
*
FROM EVENTS
WHERE
TIME_STAMP >= UNIX_TIMESTAMP(DATE(NOW()) - INTERVAL 30 DAY)
AND
ORIGIN = "app"
)
AS events_within_30_days
GROUP BY project
ORDER BY new_task DESC
LIMIT 10;
A inner select where I already reduce the amount of rows could reduce the query time from 30 sec to 0.05 sec.
It looks like:
SELECT
ADDITIONAL_REMARKS->"$.project.id" AS 'project',
COUNT(CASE WHEN EVENT = 'new task created' THEN 1 END) AS 'new_task'
FROM (
SELECT *
FROM EVENTS WHERE
EVENT = 'new task created'
AND TIME_STAMP >= UNIX_TIMESTAMP(DATE(NOW()) - INTERVAL 30 DAY)
AND ORIGIN = "app" ) AS events_within_30_days
GROUP BY project
ORDER BY new_task DESC
LIMIT 10;

How to avoid duplicate registrations in MySQL

I wonder if it is possible to restrain users to insert duplicate registration records.
For example some team is registered from 5.1.2009 - 31.12.2009. Then someone registers the same team for 5.2.2009 - 31.12.2009.
Usually the end_date is not an issue, but start_date should not be between existing records start and end date
CREATE TABLE IF NOT EXISTS `ejl_team_registration` (
`id` int(11) NOT NULL auto_increment,
`team_id` int(11) NOT NULL,
`league_id` smallint(6) NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
PRIMARY KEY (`team_id`,`league_id`,`start_date`),
UNIQUE KEY `id` (`id`)
);
I would check it in the code df the program, not the database.
If you want to do this in database, you can probably use pre-insert trigger that will fail if there are any conflicting records.
This is a classic problem of time overlapping. Say you want to register a certain team for the period of A (start_date) until B (end_date).
This should NOT be allowed in next cases:
the same team is already registered, so that the registered period is completely inside the A-B period (start_date >= A and end_date <= B)
the same team is already registered at point A (start_date <= A and end_date >= A)
the same team is already registered at point B (start_date <= B and end_date >= B)
In those cases, registering would cause time overlap. In any other it would not, so you're free to register.
In sql, the check would be:
select count(*) from ejl_team_registration
where (team_id=123 and league_id=45)
and ((start_date>=A and end_date<=B)
or (start_date<=A and end_date>=A)
or (start_date<=B and end_date>=B)
);
... with of course real values for the team_id, league_id, A and B.
If the query returns anything else than 0, the team is already registered and registering again would cause time overlap.
To demonstrate this, let's populate the table:
insert into ejl_team_registration (id, team_id, league_id, start_date, end_date)
values (1, 123, 45, '2007-01-01', '2007-12-31')
, (2, 123, 45, '2008-01-01', '2008-12-31')
, (3, 123, 45, '20010-01-01', '2010-12-31');
Let's check if we could register team 123 in leage 45 between '2009-02-03' and '2009-12-31':
select count(*) from ejl_team_registration
where (team_id=123 and league_id=45)
and ((start_date<='2009-02-03' and end_date>='2009-12-31')
or (start_date<='2009-03-31' and end_date>='2009-03-02')
or (start_date<='2009-12-31' and end_date>='2009-12-31')
);
The result is 0, so we can register freely.
Registering between e.g. '2009-02-03' and '2011-12-31' would not be possible.
I'll leave checking other values for you as a practice.
PS: You mentioned the end date is usually not an issue. As a matter of fact it is, since inserting an entry with invalid end date would cause overlapping as well.
Before doing your INSERT, do a SELECT to check.
SELECT COUNT(*) FROM `ejl_team_registration`
WHERE `team_id` = [[myTeamId]] AND `league_id` = [[myLeagueId]]
AND `start_date` <= NOW()
AND `end_date` >= NOW()
If that returns more than 0, then don't insert.