Query optimization to compare dates in MySQL - mysql

I was in a MySQL competition and faced this problem. I've written an ordinary MySQL query that selects and compares dates, but it exceeded the time limit.
Does anyone have an observation on how to write a better one here?
BTW, I don't have extra info about the database itself, and we can't use indexing here (we can only read from the DB).
Here's the problem statement:
Given a table called "bugs" with the following columns
(id, token, title, category, device, reported_at, created_at, updated_at).
Find how many bugs were created at "2019-03-01" or later.
It is worth noting that created_at represents the time at which the bug was persisted to the database.
Your query should produce a table with one column called "count".

"Find how many bugs were"
SELECT COUNT(*)
"created at "2019-03-01" or later."
WHERE created_at >= "2019-03-01"
(That works regardless of whether the datatype is DATE, DATETIME, or TIMESTAMP.)
"It is worth noting that created_at represents the time at which the bug was persisted to the database."
That just clarifies that `created_at` is the desired column for the problem.
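Assembling the fragments so far gives the full query (a sketch; the 'count' alias required by the output format is discussed next):
SELECT COUNT(*)
FROM bugs
WHERE created_at >= '2019-03-01';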
"Your query should produce a table with one column called "count"."
Interpretation 1: "table" = tabular output, a la
mysql> SELECT COUNT(*) AS 'count' FROM t_county;
+-------+
| count |
+-------+
|  3482 |
+-------+
Interpretation 2: An actual table was desired:
CREATE TABLE foo (
    `count` INT NOT NULL
)
SELECT COUNT(*) AS 'count'
FROM bugs
WHERE ... ;
This creates a "table" and immediately populates it.
"but it exceeded time limit"
Interpretation 1 -- the query ran too slowly. -- Well, I can't help you there without seeing your query.
Interpretation 2 -- the test was timed. -- Can't help you there. Practice, practice.

Related

Order by not sorting as expected

I'm using the following query:
SELECT plan.datum, plan.anketa, plan.ai AS autoinc, plan.objekt AS sifra, objekt.sifra, objekt.temp4_da
FROM plan
LEFT JOIN objekt ON plan.objekt = objekt.sifra
WHERE objekt.temp4_da = '1'
AND objekt.sifra >= 30 AND plan.datum > '2019-01-15' AND plan.datum < '2019-01-30'
GROUP BY objekt.sifra
ORDER BY plan.datum ASC, plan.objekt ASC
I get results sorted by the last records, even though I asked for them to be sorted by date.
Results should start from 2019-01-15, but as you can see they are sorted toward the last date, plan.datum < '2019-01-30'...
How can I achieve this?
EDIT:
When I select from 2019-01-15 to 2019-01-20 I achieve this:
Your result comes from MySQL's ability to process incorrect SQL queries with GROUP BY. For example, most DBMSs are not capable of processing a query like this:
SELECT col1, col2
FROM tab
GROUP BY col1
Some DBMSs process this query if col1 is the primary key of tab; MySQL, however, always processes it, and it returns a RANDOM col2 value if there is more than one col2 value corresponding to col1! For example, given the table
col1 | col2
-----------
a | 1
a | 2
then MySQL may return the result (a, 1) on Monday, and (a, 2) on Tuesday, using the SQL query shown above (I'm being a little sarcastic).
I believe that is also your case. MySQL picks a random plan.datum for each objekt.sifra (the GROUP BY attribute in your query), and you subsequently miss some plan.datum values in the result. Fix your query to obtain deterministic values and you will get rid of your problems.
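For illustration, one deterministic rewrite is to aggregate the date explicitly per group (a sketch, assuming you want the earliest datum per objekt; table and column names are taken from the question):
SELECT MIN(plan.datum) AS datum, objekt.sifra, objekt.temp4_da
FROM plan
LEFT JOIN objekt ON plan.objekt = objekt.sifra
WHERE objekt.temp4_da = '1'
AND objekt.sifra >= 30
AND plan.datum > '2019-01-15' AND plan.datum < '2019-01-30'
GROUP BY objekt.sifra, objekt.temp4_da
ORDER BY datum ASC, objekt.sifra ASC;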
Given that it does seem to have sorted how you wanted (the first 4 rows show that), those dates are all in the range specified.
You need to just go through basic diagnosis:
Does 'plan' table actually contain data with those dates?
If it does, then the data is being removed by your query.
So the next easiest thing to check is the WHERE: remove the other clauses (i.e. leave only the 'datum' restrictions, as in the sketch after these steps) - does that data now appear?
If it still doesn't, then the LEFT JOIN is the issue, as joins are filters too.
If you do those and the data appears, then the data and your understanding of the data don't match, and you need to check/confirm any assumptions about the data you may have made.
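For reference, the stripped-down check from the step above might look like this (a sketch built from the query in the question, keeping only the datum restrictions):
SELECT plan.datum, plan.anketa, plan.ai AS autoinc, plan.objekt
FROM plan
WHERE plan.datum > '2019-01-15' AND plan.datum < '2019-01-30'
ORDER BY plan.datum ASC, plan.objekt ASC;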
I'm not 100% familiar with MySQL, but the GROUP BY looks really odd; you're not doing any sums, mins, or other operations on the group. Do you need that line?

Optimizing summing/grouping query with millions of rows on MYSQL

I have a MySQL table with nearly 4,000,000 rows containing income transactions of more than 100,000 employees.
There are three columns relevant in it, which are:
Employee ID [VARCHAR and INDEX] (not unique, since one employee gets more than one income);
Type of Income [also VARCHAR and INDEX];
Value of the Income [DECIMAL(10,2)].
What I am looking to do seems very simple to me. I want to sum all the income occurrences, grouping by each employee and filtering by one type.
For that, I was using the following code:
SELECT
SUM(`value`) AS `SumofValue`,
`type`,
`EmployeeID`
FROM
`Revenue`
GROUP BY `EmployeeID`
HAVING `type` = 'X'
And the result was supposed to be something like this:
SUM          TYPE  EMPLOYEE ID
R$ 250,00    X     250000008377
R$ 5.000,00  X     250000004321
R$ 3.200,00  X     250000005432
R$ 1.600,00  X     250000008765
....
However, this is taking a long time. I decided to use the LIMIT command to limit the results to just 1,000 rows and it works, but if I want to do it for the whole table, it would take approximately 1 hour according to my projections. This seems to be way too much time for something that does not look so demanding to me (but I'm assuming I'm probably wrong). Not only that, this is just the first step of an even more complex query that I intend to run in the future, in which I will also group by Employer ID alongside Employee ID (one person can get income from more than one employer).
Is there any way to optimize this? Is there anything wrong with my code? Is there any secret path to increase the speed of this operation? Should I index the column of the value of the income as well? If this is a MySQL limitation, is there any option that could handle this better?
I would really appreciate any help.
Thanks in advance.
DISCLOSURE: This is an open government database. All this data is lawfully open to the public.
First, phrase the query using WHERE, rather than HAVING -- filter before doing the aggregation:
SELECT SUM(`value`) AS `SumofValue`,
MAX(type) as type,
EmployeeID
FROM Revenue r
WHERE `type` = 'X'
GROUP BY EmployeeID;
Next, try using this index: (type, EmployeeId, value). At the very least, this is a covering index for the query. MySQL (depending on the version) might be smart enough to use it for the aggregation as well.
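For reference, that index could be created like this (a sketch; the index name is just illustrative):
ALTER TABLE Revenue ADD INDEX idx_type_emp_value (`type`, EmployeeID, `value`);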
As per your defined schema, why are you using the VARCHAR datatype for Employee ID and Type?
You can create a reference table for Type with 1 --> X, 2 --> Y, ..., so the transaction table stores an integer reference for the type.
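Such a reference table might look like this (a sketch; the table name, column names, and types are illustrative):
CREATE TABLE type_ref
(
    Type_ID SMALLINT PRIMARY KEY,
    Type_Code VARCHAR(10) NOT NULL -- e.g. 1 --> 'X', 2 --> 'Y'
);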
Just create a dummy table like the one below and execute the same query that was taking an hour. You will see a major change in the execution plan as well.
CREATE TABLE test_transaction
(
    Employee_ID BIGINT,
    Type SMALLINT,
    Income DECIMAL(10,2)
);
Create separate indexes on the Employee_ID and Type columns.
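Those indexes could be added like this (a sketch; index names are illustrative):
ALTER TABLE test_transaction ADD INDEX idx_employee (Employee_ID);
ALTER TABLE test_transaction ADD INDEX idx_type (Type);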

MySQL Select & Limit computational complexity

Let's say I have a mysql table defined like this:
create table test_table(
id int(10) unsigned auto_increment primary key
/*, other attributes...*/
);
And given that table, I want to fetch the last record from it like this:
select * from test_table order by id desc limit 1;
It works, but it feels a bit sketchy. What is its complexity?
Is it O(log(n)) since "limit" and "order by" are executed after the select?
Is there a better way to select the last record from an auto incremented table?
I think I figured it out.
My initial question was linked to "Select & Limit", but really this applies to all queries.
MySQL provides the "analyze" keyword.
You can invoke it in your terminal and then execute your query; it will output some metadata regarding the details of the execution.
Here's an example using the table in my question (I changed its name to "comment" and its PK to "commentid" to give it some context):
analyze
select * from comment order by commentid desc limit 1;
And the following is the output:
"rows" tells you how many rows the query iterated and "r_rows" are the result rows.
This is what I was looking for.
I was living under the impression that somehow the "limit" keyword would optimize the query. It doesn't.
On the other hand, you can also use MAX() to get the last row
analyze
select * from comment where commentid=(select max(commentid) from comment);
The primary query obviously iterates over just 1 row, but the subquery should be the more complex select of the two, so I analyzed it:
analyze
select max(commentid) from comment;
gave me:
This doesn't tell me much, except for the "extra" description, which says: "Select tables optimized away".
I looked that up and it's an already-answered question on Stack Overflow.
From what I've gathered so far, that description means MAX() doesn't actually scan the rows of your table; instead it uses a stored value managed by the SQL engine.
It only works if the column has "auto_increment".
The accepted answer also says it only works on MyISAM tables, but I'm running these tests on an InnoDB table, and the optimization seems to be working.
Here are the details:
SELECT PLUGIN_NAME, PLUGIN_VERSION, PLUGIN_TYPE_VERSION, PLUGIN_LIBRARY, PLUGIN_LIBRARY_VERSION, PLUGIN_AUTHOR
FROM information_schema.PLUGINS
WHERE PLUGIN_NAME = 'innodb';
PS: You might be wondering if doing this:
ALTER TABLE comment AUTO_INCREMENT = 999;
messes up the optimization.
The answer is no, it doesn't: setting the AUTO_INCREMENT to a certain value only affects the next entry. Try it yourself: modify the AUTO_INCREMENT value and then run
select max(commentid) from comment;
you will still get the correct value.
You can get the desired output with this approach as well:
SELECT * FROM test_table WHERE id = (SELECT MAX(id) FROM test_table);
Hope this helps.

Two queries (with subquery) or one to select max(date) in the WHERE clause - MySQL

I need to create a table and store there the cached status of some events. So I will have to do only two operations:
1) Insert the id of an event, its status, and the time when this record was stored in the DB;
2) Get the last record with a certain event id.
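For context, the table might be defined like this (a sketch; the status_log name and columns match the queries below, the status type is an assumption, and the type of update_date is exactly what the second question below is about):
CREATE TABLE status_log
(
    event_id INT NOT NULL,
    status VARCHAR(32) NOT NULL, -- type is an assumption
    update_date TIMESTAMP NOT NULL -- INT vs. timestamp is discussed below
);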
There are several methods to get the result (status):
Method 1:
SELECT status FROM status_log a
WHERE a.event_id = 1
ORDER BY a.update_date DESC
LIMIT 1
Method 2:
SELECT status FROM status_log a
WHERE a.update_date = (
SELECT max(b.update_date) FROM status_log b
WHERE b.event_id = 1
) AND a.event_id = 1
So I have two questions:
Which query to use
Which field type to set to update_date field (int or timestamp)
Actually, your second query did not resolve the question 'find the record with the greatest date of update for event #1', because there could be many different events with the same latest update_date. So, in terms of semantics, you should use the first query. (After your edit this is fixed.)
The first query will be effective if you create an index on event_id and this column has good cardinality (i.e. the WHERE clause filters out enough rows by using that index). However, this can be improved by adding the update_date column to the index - but that makes sense only if there will be many rows with the same event_id (many enough for MySQL to use the second index part), and again with good cardinality inside the first index part.
But in practice, my advice is just theory; you'll have to figure it out with the EXPLAIN syntax and your own measurements on real data.
As for the data type, common practice is to use the proper data type (i.e. DATETIME/TIMESTAMP for something that represents a point in time).
Which query to use
I believe the first one should be faster. Anyway just run an EXPLAIN on them and you'll find out yourself.
The index you should be using will be:
ALTER TABLE status_log ADD INDEX (event_id, update_date);
Now... did you notice that those queries are NOT equivalent? The second one will return all statuses from all event_ids that have the maximum date.
Which field type to set to update_date field (int or timestamp)
If you have a field named update_date, I just can't imagine why an int would serve the same purpose. Rephrasing the question to choose between DATETIME and TIMESTAMP, the answer is up to the requirements. If you just want to know when a record in the DB was updated, use a TIMESTAMP. If update_date refers to an entity in your domain model, go for a DATETIME. You will most likely need to perform calculations on the date (add time, remove time, extract a month, etc.), so using a unix timestamp (which I'd say should be almost write-only) will result in extra calculation time, because you'll have to convert the timestamp to a datetime and then apply the function to that result.
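To illustrate the extra conversion step (a sketch; FROM_UNIXTIME and MONTH are standard MySQL functions, while the update_date_int column is hypothetical):
-- With an INT unix timestamp: convert first, then extract the month
SELECT MONTH(FROM_UNIXTIME(update_date_int)) FROM status_log;
-- With a DATETIME/TIMESTAMP column: extract directly
SELECT MONTH(update_date) FROM status_log;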

MySQL sum query taking huge time to complete. Looking for a bottleneck

I am running a simple MySQL query to find total time user spent playing the game:
SELECT userId, SUM(time) AS totalGameTime
FROM game_attempts
WHERE userId = 19599
EXPLAIN shows the following:
id  select_type  table          type  possible_keys  key            key_len  ref    rows   Extra
1   SIMPLE       game_attempts  ref   userId_gameId  userId_gameId  4        const  26880
The PROFILER shows that most of the time is spent on "Sending data":
Sending data 1.786524
Why such a simple query takes so much time to complete? Where to look for a bottleneck?
UPDATE. Time is an INT(11) field, so no conversions are involved.
UPDATE. A possible solution is to introduce a (userId, time) index, which solves the problem by moving part of the data into the index tree. But it doesn't solve the bigger problem of why summing up 30,000 integers takes so long.
This question doesn't have a simple answer. The indexes are right, and no time-consuming conversions are involved. It's just about DB engine tuning - why does locating those 30,000 records and retrieving the data take so much time?
It's worth noting that the table uses the InnoDB engine and contains about 2 million records.
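For reference, the (userId, time) index mentioned in the update above could be created like this (a sketch; the index name is illustrative):
ALTER TABLE game_attempts ADD INDEX idx_user_time (userId, `time`);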
Try making an index on userId like this; it will solve your problem:
ALTER TABLE game_attempts ADD INDEX (userId);
That suggests you are returning a large number of rows to the client.
Can you add
GROUP BY userId
to make sure you return just a single row?
Make an index on userId. That restricts access to records with the matching userId.
In just about any other DBMS, your statement would be considered invalid SQL, because the select part of your query contains an aggregate function as well as a field that is not part of the GROUP BY clause - in fact, you have no GROUP BY clause.
Oracle e.g. will tell you:
ORA-00937: not a single-group group function
You'll get something similar in MSSQL.
I'd guess that what MySQL does here is to compute the SUM way more often than needed.
The following query is more conformant to the SQL standard and will be way speedier:
SELECT userId, SUM(time) AS totalGameTime
FROM game_attempts
WHERE userId = 19599
GROUP BY userId;
Too much precision on your 'time' column?
What if you sum
SEC_TO_TIME(SUM(TIME_TO_SEC(time)))
instead?
OK, I'm noting this as an answer so you can't make this error again.
As of MySQL 5.0.23 you can set ONLY_FULL_GROUP_BY with
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';
or configure the server accordingly.
mysql> SELECT name, MAX(age) FROM t;
ERROR 1140 (42000): Mixing of GROUP columns (MIN(),MAX(),COUNT(),...)
with no GROUP columns is illegal if there is no GROUP BY clause
Source: http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html#sqlmode_only_full_group_by