MySQL Query Caching (2)

This is not a problem so much as a site-optimization question. I have 110K records of hotels. When I run a SELECT query, it pulls data from all 110K records.
Say I search for hotels with more than a 3-star rating, a price between $100 and $300, and within Mexico City, and I get 45 matching results.
Is there any way that, when I add more refinements, the search pulls data from just those 45 matches instead of going through all 110K records again?

The key is indexes, my friend... make sure you have indexes on all columns used in the WHERE clause; this will reduce the number of rows examined when selecting.
On a side note... 110K rows is still an extremely small data set for MySQL, so it shouldn't pose much of a performance issue even if you don't have correct indexing on the table.
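As a minimal sketch of that advice (table and column names are assumptions based on the question):

-- Composite index for the hotel search; the equality column (city) goes
-- first so MySQL can then range-scan on star_rating:
ALTER TABLE hotels ADD INDEX idx_city_stars_price (city, star_rating, price);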

It depends more on how often your data updates.
See:
The MySQL Query Cache
Query Caching in MySQL
Caching question MySQL or Filesystem
I am asking whether there is any other way that, when I add more
refinement, it pulls data from just the 45 matches instead of going
through the 110K records.

Then make a view of those 45 rows and apply your query to it.

Create a view using a query:
CREATE VIEW refined AS SELECT * FROM ....
And after that, run further SELECT queries against that view, like:
SELECT * FROM refined WHERE ...
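A concrete sketch of that idea, using the hotel search from the question (all table and column names are hypothetical):

-- Name the first refinement as a view:
CREATE VIEW refined AS
SELECT *
FROM hotels
WHERE star_rating > 3
  AND price BETWEEN 100 AND 300
  AND city = 'Mexico City';

-- Further refinements then query the view:
SELECT * FROM refined WHERE hotel_name LIKE 'Grand%';

Note that a plain MySQL view is not materialized: each query against it is re-expanded against the base table, so indexes on the base table still do the real work.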

First of all, I tend to agree with Brian: indexes matter.
Check which kinds of queries are most frequent, and construct multi-column indexes on the table accordingly. Note that the order of columns in an index does matter (the index is a tree: the first column appears at the tree root, so if your query does not use that column, the whole tree is useless).
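For example (a sketch; hotel column names are assumptions):

-- An index on (city, price) is sorted by city first:
ALTER TABLE hotels ADD INDEX idx_city_price (city, price);
-- Can use the index (the leftmost column is constrained):
SELECT * FROM hotels WHERE city = 'Mexico City' AND price < 300;
-- Cannot use it (the leftmost column, city, is absent):
SELECT * FROM hotels WHERE price < 300;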
Enable the slow query log to see which queries actually take long (if any) or do not use indexes, so you can improve your indexes over time.
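For instance, the log can be switched on at runtime (MySQL 5.1+; the threshold is illustrative):

-- Log queries slower than 1 second, and queries that use no index:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL log_queries_not_using_indexes = 'ON';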
Having said this, the query cache is a real performance boost if your table data is mostly read. Here is a useful article on the MySQL query cache.

Related

Optimised way to store large key value kind of data

I am working on a database that has a table user with columns user_id and user_service_id. My application needs to fetch all the users whose user_service_id is a particular value. Normally I would add an index to the user_service_id column and run a query like this:
select user_id from user where user_service_id = 2;
Since the cardinality of the column user_service_id is very low (around 3-4 distinct values) and the table has around 10M entries, the query will end up scanning almost the entire table.
I was wondering what the recommendation is for such use cases. Also, would it make more sense to move the data to a NoSQL datastore, since this doesn't seem to be an efficient use case for MySQL or any SQL datastore? I tried to search for this but couldn't find any recommendations. Can someone please help or provide the necessary references?
Thanks in advance.
That query needs this index, which is both "composite" and "covering":
INDEX(user_service_id, user_id) -- in this order
But what will you do with the millions of rows that you get? Sounds like it will choke the client, whether it comes fast or slow.
See my Index Cookbook
"very dynamic" -- Not a problem.
"cache" -- the dynamic nature defeats caching.
"cardinality" -- not important, except to point out that there will be millions of rows.
"millions of rows" -- that takes time to deliver to the client. The number of rows delivered is the biggest factor in cost.
"select entire table, then filter in client" -- That will be even slower! (See "millions of rows".)

MySQL temporary indexes for user-defined queries

I am building an analytics platform where users can create reports and such against a MySQL database. Some of the tables in this database are pretty huge (billions of rows), so for all of the features so far I have indexes built to speed up each query.
However, the next feature is to add the ability for a user to define their own query so that they can analyze data in ways that we haven't pre-defined. They have full read permission to the relevant database, so basically any SELECT query is a valid query for them to enter. This creates problems, however, if a query is defined that filters or joins on a column we haven't currently indexed - sometimes to the point of taking over a minute for a simple query to execute - something as basic as:
SELECT tbl1.a, tbl2.b, SUM(tbl3.c)
FROM
tbl1
JOIN tbl2 ON tbl1.id = tbl2.id
JOIN tbl3 ON tbl1.id = tbl3.id
WHERE
tbl1.d > 0
GROUP BY
tbl1.a, tbl2.b
Now, assume that we've only created indexes on columns not appearing in this query so far. Also, we don't want too many indexes slowing down inserts, updates, and deletes (otherwise the simple solution would be to build an index on every column accessible by the users).
My question is, what is the best way to handle this? Currently, I'm thinking that we should scan the query, build indexes on anything appearing in a WHERE or JOIN that isn't already indexed, execute the query, and then drop the indexes that were built afterwards. However, the main things I'm unsure about are a) is there already some best practice for this sort of use case that I don't know about? and b) would the overhead of building these indexes be enough that it would negate any performance gains the indexes provide?
If this strategy doesn't work, the next option I can see working is to collect statistics on what types of queries the users run, and have some regular job periodically check what commonly used columns are missing indexes and create them.
If you're using MyISAM, then performing an ALTER statement to add an index on tables with billions of rows will take a considerable amount of time, probably far longer than the one minute you quoted for the statement above (and you'll need another ALTER to drop the index afterwards). During that time, the table will be locked, meaning other users can't execute their own queries.
If your tables use the InnoDB engine and you're running MySQL 5.1+, then CREATE / DROP INDEX statements shouldn't lock the table, but they may still take some time to execute.
There's a good rundown of the history of ALTER TABLE here.
I'd also suggest that automated query analysis to identify and build indexes would be quite difficult to get right. For example, what about cases such as selecting by foo.a but ordering by foo.b? This kind of query often needs a covering index over multiple columns; otherwise you may find your server attempts a filesort on a huge result set, which can cause big problems.
Giving your users an "explain query" option would be a good first step. If they know enough SQL to perform custom queries, then they should be able to analyse the EXPLAIN output in order to best execute their query (or at least realise that a given query will take ages).
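A sketch of what such an option would run behind the scenes (same hypothetical query as above):

EXPLAIN SELECT tbl1.a, tbl2.b, SUM(tbl3.c)
FROM tbl1
JOIN tbl2 ON tbl1.id = tbl2.id
JOIN tbl3 ON tbl1.id = tbl3.id
WHERE tbl1.d > 0
GROUP BY tbl1.a, tbl2.b;
-- Rows with type = ALL indicate full table scans; "Using filesort" or
-- "Using temporary" in the Extra column flags the expensive steps.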
So, going further with my idea, I propose you segment your data into well-identified views. You used abstract names, so I can't reuse your business model; instead I'll take a fictional example.
Say you have 3 tables:
customer (gender, social category, date of birth, ...)
invoice (date, amount, ...)
product (price, date of creation, ...)
You would create a sort of materialized view for each specific segment. It's like adding a business layer on top of the raw data representation layer.
For example, we could identify the following segments:
seniors having at least 2 invoices
invoices of 2013 with more than 1 product
How do you do that, and how do you do it efficiently? Regular views won't help your problem, because they will have poor explain plans on arbitrary queries. What we need is a real physical representation of these segments. We could do something like this:
CREATE TABLE MV_SENIORS_WITH_2_INVOICES AS
SELECT ... /* select from the existing tables */
;
/* add indexes: */
ALTER TABLE MV_SENIORS_WITH_2_INVOICES ADD CONSTRAINT...
... etc.
So now, your users just have to query MV_SENIORS_WITH_2_INVOICES instead of the original tables. Since there are fewer records, and probably more indexes, performance will be better.
We're done! Oh wait, no :-)
We need to refresh this data, a bit like a FAST REFRESH in Oracle. MySQL does not have a similar system (not that I know of... correct me, someone?), so we have to create some triggers for that.
CREATE TRIGGER ... AFTER INSERT ON `invoice`
FOR EACH ROW
BEGIN
... /* insert the data into MV_SENIORS_WITH_2_INVOICES if it matches the segment */
END;
Now we're done!
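A fuller sketch of such a trigger, under the fictional schema above (all table and column names are assumptions, seniors are taken as 65+, and MV_SENIORS_WITH_2_INVOICES is assumed to mirror customer's columns with a primary key on customer_id):

DELIMITER //
CREATE TRIGGER trg_invoice_ai AFTER INSERT ON invoice
FOR EACH ROW
BEGIN
    -- Add the customer to the segment once they are a senior with at
    -- least 2 invoices; REPLACE keeps the segment row unique.
    REPLACE INTO MV_SENIORS_WITH_2_INVOICES
    SELECT c.*
    FROM customer c
    WHERE c.customer_id = NEW.customer_id
      AND c.date_of_birth <= CURDATE() - INTERVAL 65 YEAR
      AND 2 <= (SELECT COUNT(*) FROM invoice i
                WHERE i.customer_id = c.customer_id);
END//
DELIMITER ;

You would need similar triggers for UPDATE and DELETE on invoice (and on customer) to keep the segment accurate.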

Optimizing the Joining of Multiple MySQL Views

I have multiple views in my database that I am trying to perform a JOIN on when certain queries get very complex. As a worst case I would have to join 3 views with the following stats:
View 1 has 60K+ rows with 26 fields.
View 2 has 60K+ rows with 15 fields.
View 3 has 80K+ rows with 8 fields.
Joining views 1 and 2 seems to be no problem, but any time I try to join the third view, the query hangs. I'm wondering if there are any best practices I should be following to keep these queries from hanging. I've tried to use the smallest field types possible (medium/small ints where possible, etc.).
We are using MySQL 5.0.92 community edition with MyISAM tables. Not sure if InnoDB would be more efficient.
As a last resort I'm thinking of splitting the one query into two: hitting views 1 and 2 with the first query, and then view 3 separately with the second. Is there any downside to this other than making two queries?
Thanks.
You need to use EXPLAIN to understand why the performance is poor.
I wouldn't think you need to worry about MyISAM vs. InnoDB for this particular read performance just yet. MyISAM versus InnoDB
I am going to post my comments as an answer:
1) Take a look at the EXPLAIN command and see what it says.
2) Check the performance of the individual views. Are they as fast as you think on their own?
3) For the columns you are using in your WHERE or JOIN clauses, do the underlying tables have indexes that apply to them? Something to keep in mind:
A composite index (an index with more than one column) with columns
(a, b) would not help when you query only for b. It helps with a, and
a + b, but not with only b. That's why the single index you added
improved the situation
4) Are you using all the columns and all the views? If you aren't, wouldn't it be simpler to look at the view definitions and write a single query instead?
If it's possible to find out how the original VIEWs are defined, then using that as a basis to create your own single query might be a better approach... Way back, another person had similar issues with their query. He needed to get back to the raw table behind one such view to ensure it had the proper indexes to support the optimization of the query he was trying to perform. Remember, a view is a subset of something else and has no index of its own to work with. So, if you can't take advantage of an index on the root table of a view, you can see exactly this kind of performance hit.
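For example, to recover each view's definition and rebuild the query against the base tables (view names assumed):

SHOW CREATE VIEW view1;
SHOW CREATE VIEW view2;
SHOW CREATE VIEW view3;
-- Then combine the underlying SELECTs into one query and EXPLAIN it
-- to confirm that the base-table indexes are actually used.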

How big is too big for a view in MySQL InnoDB?

BACKGROUND
I'm working with a MySQL InnoDB database with 60+ tables, and I'm creating different views in order to make dynamic queries faster and easier in the code. I have a couple of views with INNER JOINs (without many-to-many relationships) of 20 to 28 tables, SELECTing 100 to 120 columns, with a row count below 5,000, and they work lightning fast.
ACTUAL PROBLEM
I'm creating a master view with INNER JOINs (without many-to-many relationships) of 34 tables, SELECTing about 150 columns, with a row count below 5,000, and it seems to be too much. It takes forever to run a single SELECT. I'm wondering whether I've hit some kind of view-size limit, and if there is any way of increasing it, or any tricks that would help me get past this apparent limit.
It's important to note that I'm NOT USING Aggregate functions because I know about their negative impact on performance, which, by the way I'm very concerned about.
MySQL does not use the "System R algorithm" (used by PostgreSQL, Oracle, and SQL Server, I think), which considers not only different join algorithms (MySQL only has nested-loop, although you can fake a hash join by using a hash index), but also the possible orders of joining the tables and possible index combinations. The result seems to be that parsing and execution of queries can be very quick up to a point, but performance can drop off dramatically once the optimizer chooses the wrong path through the data.
Take a look at your explain plans and try to see whether a) the drop in performance is due to the number of columns you are returning (just do SELECT 1 or something), or b) it is due to the optimizer choosing a table scan instead of index usage.
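A sketch of those two checks (the view name is hypothetical):

-- a) Same joins, minimal column transfer; if this is fast, the wide
--    column list is the main cost:
SELECT COUNT(1) FROM master_view;
-- b) Inspect the plan; type = ALL on a large table signals a scan:
EXPLAIN SELECT * FROM master_view;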
A view is just a named query. When you refer to a view in MySQL, it simply replaces the name with the underlying query and runs that.
It seems that you are confusing views with materialized views, which are tables created from a query. Afterwards you can query that table and do not have to run the original query again.
Materialized views are not implemented in MySQL.
To improve performance, try using the EXPLAIN keyword to see where you can optimize your query/view.

MySQL: Does a query search the whole table?

1. So if I search for the word "ball" inside the toys table where I have 5,000,000 entries, does it search through all 5 million?
I think the answer is yes, because how else would it know, but please let me know.
2. If yes: if I need more information from that table, isn't it more logical to query just once and work with the results?
An example
I have this table structure for example:
id | toy_name | state
Now I would query like this:
mysql_query("SELECT * FROM toys WHERE STATE = 1");
But isn't it more logical to query the whole table
mysql_query("SELECT * FROM toys"); and then do this: if($query['state'] == 1)?
3. And something else: if I put ORDER BY id LIMIT 5 in the mysql_query, will it search through the 5 million entries or just the last 5?
Thanks for the answers.
Yes, unless you have a LIMIT clause it will look through all the rows. It will do a table scan unless it can use an index.
You should use a query with a WHERE clause here, not filter the results in PHP. Your RDBMS is designed to be able to do this kind of thing efficiently. Only when you need to do complex processing of the data is it more appropriate to load a resultset into PHP and do it there.
With LIMIT 5, the RDBMS will look through the table until it has found your five rows, and then it will stop looking. So, all I can say for sure is that it will look at somewhere between 5 and 5 million rows!
Read this about indexes :-)
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
It makes it uber-fast :-)
A full table scan happens only if there are no matching indexes, and it is indeed a very slow operation.
Sorting is also accelerated by indexes.
And as for #2: this is slow because the transfer from MySQL to PHP is slow, and MySQL is MUCH faster at filtering than PHP is.
For your #1 question: it depends on how you're searching for 'ball'. If there's no index on the column(s) you're searching, then the entire table has to be read. If there is an index, then...
WHERE field LIKE 'ball%' will use an index
WHERE field LIKE '%ball%' will NOT use an index
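A quick sketch of checking this on the question's table (the index name is illustrative):

ALTER TABLE toys ADD INDEX idx_toy_name (toy_name);
-- Range scan on the index (prefix match):
EXPLAIN SELECT * FROM toys WHERE toy_name LIKE 'ball%';
-- Full table scan (the leading wildcard defeats the B-tree index):
EXPLAIN SELECT * FROM toys WHERE toy_name LIKE '%ball%';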
For your #2, think of it this way: Doing SELECT * FROM table and then perusing the results in your application is exactly the same as going to the local super walmart, loading the store's complete inventory into your car, driving it home, picking through every box/package, and throwing out everything except the pack of gum from the impulse buy rack by the front till that you'd wanted in the first place. The whole point of a database is to make it easy to search for data and filter by any kind of clause you could think of. By slurping everything across to your application and doing the filtering there, you've reduced that shiny database to a very expensive disk interface, and would probably be better off storing things in flat files. That's why there's WHERE clauses. "SELECT something FROM store WHERE type=pack_of_gum" gets you just the gum, and doesn't force you to truck home a few thousand bottles of shampoo and bags of kitty litter.
For your #3, yes. If you have an ORDER BY clause in a LIMIT query, the result set has to be sorted before the database can figure out which 5 records to return (unless an index already delivers the rows in that order). While it's not quite as bad as actually transferring the entire record set to your app and only picking out the first five records, it still involves a bit more work than just retrieving the first 5 records that match your WHERE clause.
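For instance (a sketch, assuming id is the primary key), an index whose leading columns match the WHERE and ORDER BY lets MySQL read just five index entries and stop:

ALTER TABLE toys ADD INDEX idx_state_id (state, id);
-- Reads the first five matching index entries; no filesort needed:
SELECT * FROM toys WHERE state = 1 ORDER BY id LIMIT 5;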