I am working on a data analytics Dashboard for a media content broadcasting company. Even if a user clicks a certain channel, logs/records are stored into MySQL DB. Following is the table that stores data regarding channel play times.
Here is the table structure:
_____________________________________
| ID INT(11) |
_____________________________________
| Channel_ID INT(11) |
_____________________________________
| playing_date (DATE) |
_____________________________________
| country_code VARCHAR(50) |
_____________________________________
| playtime_in_sec INT(11) |
_____________________________________
| count_more_then_30_min_play INT(11) |
_____________________________________
| count_15_30_min_play INT(11) |
_____________________________________
| count_0_15_min_play |
_____________________________________
| channel_report_tag VARCHAR(50) |
_____________________________________
| device_report_tag VARCHAR(50) |
_____________________________________
| genre_report_tag VARCHAR(50) |
_____________________________________
The Query that I run behind one of the dashboard graphs construction is :
SELECT
channel_report_tag,
SUM(count_more_then_30_min_play) AS '>30 minutes',
SUM(count_15_30_min_play) AS '15-30 Minutes',
SUM(count_0_15_min_play) AS '0-15 Minutes'
FROM
channel_play_times_cleaned
WHERE
playing_date BETWEEN '' AND ''
AND country_code LIKE ''
AND device_report_tag LIKE ''
AND channel_report_tag LIKE ''
GROUP BY
channel_report_tag
LIMIT 10
This query basically is taking a lot of time to return the result set (given the table data exceeds a million records per day and increasing every second ). I came across this stack-overflow Question : What generic techniques can be applied to optimize SQL queries? which basically mentions employing indices as one the techniques to optimize SQL queries. At the moment I am confused how to apply indices (i.e on what columns) in order to optimize the above mentioned query. I would be very grateful if some one could offer help in creating indices according to my specific scenario. Any other expert opinion for a beginner like me are surely welcomed.
EDIT :
As suggested by #Thomas G ,
I have tried to improve my query and make it more specific :
SELECT
channel_report_tag,
SUM(count_more_then_30_min_play) AS '>30 minutes',
SUM(count_15_30_min_play) AS '15-30 Minutes',
SUM(count_0_15_min_play) AS '0-15 Minutes'
FROM
channel_play_times_cleaned
WHERE
playing_date BETWEEN '' AND ''
AND country_code = 'US'
AND device_report_tag = 'j8'
AND channel_report_tag = 'NAT GEO'
GROUP BY
channel_report_tag
LIMIT 10
I started to write this in a comment because these are hints and not a clear answer. But that's way too long
First of all, it is common sense (but not always a rule of thumb) to index the columns appearing in a WHERE clause :
playing_date BETWEEN '' AND ''
AND country_code LIKE ''
AND device_report_tag LIKE ''
AND channel_report_tag LIKE ''
If your columns have a very high cardinality (your tag columns???), it's probably not a good idea to index them. Country_code and playing_date should be indexed.
The issue here is that there are so many LIKE in your query. This operator is perf a killer and you are using it on 3 columns. That's awfull for the database. So the question is: Is that really needed?
For instance I see no obvious reason to make a LIKE on a country code. Will you really query like this :
AND country_code LIKE 'U%'
To retrieve UK and US ??
You probably won't. Chances are high that you will know the countries for which you are searching for, so you should do this instead :
AND country_code IN ('UK','US')
Which will be a lot faster if the country column is indexed
Next, If you really want to make LIKE on your 2 tag columns, instead of doing a LIKE you can try this
AND MATCH(device_report_tag) AGAINST ('anything*' IN BOOLEAN MODE)
It is also possible to index your tag columns as FULLTEXT, especially if you search with LIKE ='anything%'. I you search with LIKE='%anything%', the index won't probably help much.
I could also state that with millions rows a day, you might have to PARTITION your tables (on the date for instance). And following your data, a composite index on the date and something else might help.
Really, there's no simple and straight answer to your complex question, especially with what you shown (not a lot).
Separate indexes are not as useful as composite indexes. Unfortunately, you have many possible combinations, and you are (apparently) allowing wildcards, which may destroy the utility of indexes.
Suggest you use client code to build the WHERE clause rather than populating it with ''
In composite indexes, put one range last. date BETWEEN ... AND ... is a "range".
LIKE 'abc' -- same as = 'abc', so why not change to that.
LIKE 'abc%' -- is a "range"
LIKE '%abc' -- can't use an index.
IN ('CA', 'TX') -- sometimes optimizes like '=', sometimes like 'range'.
So... Watch what queries the users ask for, then build composite indexes to satisfy them. Some rules:
At most one range, and put it last.
Put '=' column(s) first.
INDEX(a,b) is handled by INDEX(a,b,c), so include only the latter.
Don't have more than, say, a dozen indexes.
Index Cookbook
Related
I have a table with huge number of records. When I query from that specific table specially when using ORDER BY in query, it takes too much execution time.
How can I optimize this table for Sorting & Searching?
Here is an example scheme of my table (jobs):
+---+-----------------+---------------------+--+
| id| title | created_at |
+---+-----------------+---------------------+--+
| 1 | Web Developer | 2018-04-12 10:38:00 | |
| 2 | QA Engineer | 2018-04-15 11:10:00 | |
| 3 | Network Admin | 2018-04-17 11:15:00 | |
| 4 | System Analyst | 2018-04-19 11:19:00 | |
| 5 | UI/UX Developer | 2018-04-20 12:54:00 | |
+---+-----------------+---------------------+--+
I have been searching for a while, I learned that creating INDEX can help improving the performance, can someone please elaborate how the performance can be increased?
Add "explain" word before ur query, and check result
explain select ....
There u can see what u need to improve, then add index on ur search and/or sorting field and run explain query again
If you want to earn performance on your request, a way is paginating it. So, you can put a limit (as you want) and specify the page you want to display.
For example SELECT * FROM your_table LIMIT 50 OFFSET 0.
I don't know if this answer will help you in your problem but you can try it ;)
Indexes are the databases way to create lookup trees (B-Trees in most cases) to more efficiently sort, filter, and find rows.
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data.
This is much faster than reading every row sequentially.
https://dev.mysql.com/doc/refman/5.5/en/mysql-indexes.html
You can use EXPLAIN to help identify how the query is currently running, and identify areas of improvement. It's important to not over-index a table, for reasons probably beyond the scope of this question, so it'd be good to do some research on efficient uses of indexes.
ALTER TABLE jobs
ADD INDEX(created_at);
(Yes, there is a CREATE INDEX syntax that does the equivalent.)
Then, in the query, do
ORDER BY created_at DESC
However, with 15M rows, it may still take a long time. Will you be filtering (WHERE)? LIMITing?
If you really want to return to the user 15M rows -- well, that is a lot of network traffic, and that will take a long time.
MySQL details
Regardless of the index declaration or version, the ASC/DESC in ORDER BY will be honored. However it may require a "filesort" instead of taking advantage of the ordering built into the index's BTree.
In some cases, the WHERE or GROUP BY is to messy for the Optimizer to make use of any index. But if it can, then...
(Before MySQL 8.0) While it is possible to declare an index DESC, the attribute is ignored. However, ORDER BY .. DESC is honored; it scans the data backwards. This also works for ORDER BY a DESC, b DESC, but not if you have a mixture of ASC and DESC in the ORDER BY.
MySQL 8.0 does create DESC indexes; that is, the BTree containing the index is stored 'backwards'.
I have such a table:
id | content | date
45 | "Lorem" | "2014-09-06"
56 | "Ipsum" | "2013-05-01"
The table has a lot of rows. Now I need to get different year values.
Statement:
SELECT YEAR(`date`) AS `year` FROM `news` GROUP BY `year` ORDER BY `date`
Unfortunately, this solution doesn't use date index.
My question is if it's a good practice to have a separate year column and set it before every insert/update and have an index on it?
Or is there a better solution?
A time-space tradeoff if you ask me. In any case, to maintain normalization, you need to exclude the year from the date or never use it for the purposes the year column exists for.
A possible implementation is to use insert/update triggers as per Is it possible to have function-based index in MySQL? . This shouldn't hurt performance much since news are likely to be added/updated much less frequently than read.
I'm new to mysql. Right now, I have this kind of structure in mysql database:
| keyID | Param | Value
| 123 | Location | Canada
| 123 | Cost | 34
| 123 | TransportMethod | Boat
...
...
I have probably like 20 params with unique values for each Key ID. I want to be able to search in mysql given the 20 params with each of the values and figure out which keyID.
Firstly, how should I even restructure mysql database? Should I have 20 param columns + keyID?
Secondly, (relates to first question), how would I do the query to find the keyID?
If your params are identical across different keys (or all params are a subset of some set of params that the objects may have), you should structure the database so that each column is a param, and the row corresponds to one KeyID and the values of its params.
|keyID|Location|Cost|TransportMethod|...|...
|123 |Canada |34 |Boat ...
|124 | ...
...
Then to query for the keyID you would use a SELECT, FROM, and WHERE statement, such as,
SELECT keyID
FROM key_table
WHERE Location='Canada'
AND Cost=34
AND TransportMethod='Boat'
...
for more info see http://www.w3schools.com/php/php_mysql_where.asp
edit: if your params change across different objects (keyIDs) this will require a different approach I think
The design you show is called Entity-Attribute-Value. It breaks many rules of relational database design, and it's very hard to use with SQL.
In a relational database, you should have a separate column for each attribute type.
CREATE TABLE MyTable (
keyID SERIAL PRIMARY KEY,
Location VARCHAR(20),
Cost NUMERIC(9,2),
TransportMethod VARCHAR(10)
);
I agree that Nick's answer is probably best, but if you really want to keep your key/value format, you could accomplish what you want with a view (this is in PostgreSQL syntax, because that's what I'm familiar with, but the concept is the same for MySQL):
CREATE OR REPLACE VIEW myview AS
SELECT keyID,
MAX(CASE WHEN Param = 'Location' THEN Value END) AS Location,
MAX(CASE WHEN Param = 'Cost' THEN Value END) AS Cost,
....
FROM mytable;
Performance here is likely to be dismal, but if your queries are not frequent, it could get the job done.
I have a table with columns like this:
| Country.Number | CountryName |
| US.01 | USA |
| US.02 | USA |
I'd like to modify this to:
| Country | Number | CountryName |
| US | 01 | USA |
| US | 02 | USA |
Regarding optimization, is there a difference in performance if I use:
select * from mytable where country.number like "US.%"
or
select * from mytable where country = "US"
The performance difference will most likely be miniscule in this particular case, as mysql uses an index on "US.%". The performance degradation is mostly felt when searching for something like "%.US" (the wildcard is in front). As it then does a tablescan without using indices.
EDIT: you can look at it like this:
MySql internally stores varchar indices like trees with first symbol being the root and branching to each next letter.
So when searching for = "US" it looks for U, then goes one step down for S and then another to make sure that is the end of the value. That's three steps.
Searching for LIKE "US.%" it looks again for U, then S, then . and then stops searching and displays the results - that's also three steps only as it cares not whether the value terminated there.
EDIT2: I'm in no way promoting such database denormalization, I just wanted to attract your attention that this matter may not be as straightforward as it seems at first glance.
The later query:
select * from mytable where country = "US"
should be much faster because mySQL does not have to look for wildcards patterns unlike LIKE query. It just looks for the value that has been equalized.
If you need to optimize, a simple = is way better than a like.
Why ?
With an = either the string is exactly the same and it's true or it doesn't match and it's false.
With a like, MySQL must compare the string and test if the other string match the mask, and that takes more time and needs more operations.
So for the sake of your database, use SELECT * FROM 'mytable' WHERE country = "US".
The second is faster if there is an index on column country. MySQL has to scan less index entries to produce the result.
Not technically an answer to the question.. but...I would understand them to be close enough in speed for it not to (usually) matter - thus using "=" would be better as it displays the intent in a more obvious way.
why dont you just make country_id a tinyint unsigned and have an iso_code varchar(3) column which is unique ? (saves you from all the BS)
I am working on an e-shop which sells products only via loans. I display 10 products per page in any category, each product has 3 different price tags - 3 different loan types. Everything went pretty well during testing time, query execution time was perfect, but today when transfered the changes to the production server, the site "collapsed" in about 2 minutes. The query that is used to select loan types sometimes hangs for ~10 seconds and it happens frequently and thus it cant keep up and its hella slow. The table that is used to store the data has approximately 2 milion records and each select looks like this:
SELECT *
FROM products_loans
WHERE KOD IN("X17/Q30-10", "X17/12", "X17/5-24")
AND 369.27 BETWEEN CENA_OD AND CENA_DO;
3 loan types and the price that needs to be in range between CENA_OD and CENA_DO, thus 3 rows are returned.
But since I need to display 10 products per page, I need to run it trough a modified select using OR, since I didnt find any other solution to this. I have asked about it here, but got no answer. As mentioned in the referencing post, this has to be done separately since there is no column that could be used in a join (except of course price and code, but that ended very, very badly). Here is the show create table, kod and CENA_OD/CENA_DO very indexed via INDEX.
CREATE TABLE `products_loans` (
`KOEF_ID` bigint(20) NOT NULL,
`KOD` varchar(30) NOT NULL,
`AKONTACIA` int(11) NOT NULL,
`POCET_SPLATOK` int(11) NOT NULL,
`koeficient` decimal(10,2) NOT NULL default '0.00',
`CENA_OD` decimal(10,2) default NULL,
`CENA_DO` decimal(10,2) default NULL,
`PREDAJNA_CENA` decimal(10,2) default NULL,
`AKONTACIA_SUMA` decimal(10,2) default NULL,
`TYP_VYHODY` varchar(4) default NULL,
`stage` smallint(6) NOT NULL default '1',
PRIMARY KEY (`KOEF_ID`),
KEY `CENA_OD` (`CENA_OD`),
KEY `CENA_DO` (`CENA_DO`),
KEY `KOD` (`KOD`),
KEY `stage` (`stage`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
And also selecting all loan types and later filtering them trough php doesnt work good, since each type has over 50k records and the select takes too much time as well...
Any ides about improving the speed are appreciated.
Edit:
Here is the explain
+----+-------------+----------------+-------+---------------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+---------------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | products_loans | range | CENA_OD,CENA_DO,KOD | KOD | 92 | NULL | 190158 | Using where |
+----+-------------+----------------+-------+---------------------+------+---------+------+--------+-------------+
I have tried the combined index and it improved the performance on the test server from 0.44 sec to 0.06 sec, I cant access the production server from home though, so I will have to try it tomorrow.
Your issue is that you are searching for intervals which contain a point (rather than the more normal query of all points in an interval). These queries do not work well with the standard B-tree index, so instead you need to use an R-Tree index. Unfortunately MySQL doesn't allow you to select an R-Tree index on a column, but you can get the desired index by changing your column type to GEOMETRY and using the geometric functions to check if the interval contains the point.
See Quassnoi's article Adjacency list vs. nested sets: MySQL where he explains this in more detail. The use case is different, but the techniques involved are the same. Here's an extract from the relevant part of the article:
There is also a certain class of tasks that require searching for all ranges containing a known value:
Searching for an IP address in the IP range ban list
Searching for a given date within a date range
and several others. These tasks can be improved by using R-Tree capabilities of MySQL.
Try to refactor your query like:
SELECT * FROM products_loans
WHERE KOD IN("X17/Q30-10", "X17/12", "X17/5-24")
AND CENA_OD >= 369.27
AND CENA_DO <= 369.27;
(mysql is not very smart when choosing indexes) and check the performance.
The next try is to add a combined key - (KOD,CENA_OD,CENA_DO)
And the next major try is to refactor your base to have products separated from prices. This should really help.
PS: you can also migrate to postgresql, it's smarter than mysql when choosing right indexes.
MySQL can only use 1 key. If you always get the entry by the 3 columns, depending on the actual data (range) in the columns one of the following could very well add a serious amount of performance:
ALTER TABLE products_loans ADD INDEX(KOD, CENA_OD, CENA_DO);
ALTER TABLE products_loans ADD INDEX(CENA_OD, CENA_DO, KOD);
Notice that the order of the columns matter! If that doesn't improve performance, give us the EXPLAIN output of the query.