Is it worth caching simple queries with Redis? - mysql

I was wondering if it is worth caching queries like:
SELECT * FROM users WHERE id = 1
If not, then the same should also be the case for complex queries, since they will be cached by the DB cache anyways.
Would it ever make sense to cache a single DB query with Redis? Or would I only gain benefits from caching the results of multiple queries (e.g. an entire route)?
Is Redis (in memory) faster than the DB cache (also in memory)? In that case it would also make sense to cache single queries in Redis, but I assume the DB cache and the Redis cache should perform similarly.

Query results are worth caching if your app is likely to read them from the cache instead of running the SQL query again, or if you need the result more quickly than any SQL query can run.
Also, the cost of an SQL query is not necessarily detrimental to your app performance. An SQL query like SELECT * FROM users WHERE id = 1 is simple and efficient, being a primary key lookup of at most one row (assuming id is the primary key).
A complex query against large ranges of data can take a lot longer if you do it in SQL, so the relative benefit of reading the cached result will be greater.
But even a simple query run a million times per hour can be costly. If you run the query so frequently that it's holding back your app performance, a cache is a good strategy.
There are many variables, and they depend on your specific app behavior and constraints. There's no way someone can answer this for you.
How often does the data in the database change, making the cached copy out of date?
How efficient is the SQL query? Is it a simple query that will be pretty quick anyway, or is it a complex query that may take full seconds when you run the SQL version?
Is your app able to tolerate the time it takes to run the SQL query? Of course all apps say they want the results "as fast as possible" but that's not a measurable requirement. Is the SQL query fast enough?
How frequently do you run the query? At a certain scale, you need to use minor optimizations that would not be worth the time to code if the query is used infrequently.
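As a rough illustration of the trade-offs above, here is a minimal cache-aside sketch in Python. It is not a Redis client: a plain dict with expiry times stands in for Redis and sqlite3 stands in for MySQL, and the names get_user, update_user_name and TTL_SECONDS are made up for the example. The TTL is the knob for "how stale may the cached copy get", and the write path shows the invalidation you need when the data changes.

```python
import sqlite3
import time

# In-process stand-ins (assumptions): sqlite3 plays the role of MySQL and a
# dict with expiry times plays the role of Redis.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

cache = {}          # key -> (expires_at, value)
TTL_SECONDS = 60    # how stale a cached row is allowed to get
db_reads = 0        # counts how often we actually hit the database


def get_user(user_id):
    """Cache-aside read: try the cache, fall back to SQL, then populate."""
    global db_reads
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[0] > time.monotonic():
        return entry[1]                      # cache hit
    db_reads += 1                            # cache miss: run the query
    row = db.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    cache[key] = (time.monotonic() + TTL_SECONDS, row)
    return row


def update_user_name(user_id, name):
    """Write path: change the row, then drop the now-stale cache entry."""
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    cache.pop(f"user:{user_id}", None)


first = get_user(1)
second = get_user(1)   # served from the cache, no SQL executed
update_user_name(1, "bob")
third = get_user(1)    # cache was invalidated, so this reads the new row
```

Whether this pays off for a primary key lookup depends on the questions listed above: the db_reads counter only shows how many queries were avoided, not whether avoiding them matters for your app.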

Related

Reporting with MySQL - Simplest Query taking too long

I have a MySQL Table on an Amazon RDS Instance with 250 000 Rows. When I try to
SELECT * FROM tableName
without any conditions (just for testing; the normal query specifies the columns I need, but I need most of them), the query takes between 20 and 60 seconds to execute. This will be the base query for my report, and the report should run in under 60 seconds, so I think this will not work out (it times out the moment I add the joins). The report runs without any problems in our smaller test environments.
Could it be that the query is taking so long because MySQL is trying to lock the table and waiting for all writes to finish? There might be quite a lot of writes on this table. I am doing the query on a MySQL slave, since I do not want to lock up the production system with my queries.
I have no experience with how many rows count as a lot for a relational DB. Is 250 000 rows with ~30 columns (varchar, date and integer types) a lot?
How can I speed up this query (hardware, software, query optimization ...)?
Can I tell MySQL that I do not care that the data might be inconsistent (it is a snapshot from a reporting database)?
Is there a chance that this query will run under 60 seconds, or do I have to adjust my goals?
Remember that MySQL has to prepare your result set and transport it to your client. In your case, this could be 200MB of data it has to shuttle across the connection, so 20 seconds is not bad at all. Most libraries, by default, wait for the entire result to be received before forwarding it to the application.
To speed it up, fetch only the columns you need, or do it in chunks with LIMIT. SELECT * is usually a sign that someone's being super lazy and not optimizing at all.
If your library supports streaming resultsets, use that, as then you can start getting data almost immediately. It'll allow you to iterate on rows as they come in without buffering the entire result.
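To make the streaming idea concrete, here is a small Python sketch that processes rows in batches instead of loading the whole result first. It uses sqlite3 as a stand-in for MySQL, and the table name report_rows and the helper stream_rows are invented for the example. With a real MySQL client you would additionally need an unbuffered or server-side cursor, so check what your library offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report_rows (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO report_rows (id, amount) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 1001)],
)


def stream_rows(connection, sql, batch_size=100):
    """Yield rows batch by batch instead of materializing the whole result."""
    cursor = connection.execute(sql)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


total = 0
row_count = 0
for _id, amount in stream_rows(conn, "SELECT id, amount FROM report_rows"):
    total += amount     # process each row as it arrives
    row_count += 1
```

The application only ever holds one batch in memory, and it can start working as soon as the first batch arrives.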
A table with 250,000 rows is not too big for MySQL at all.
However, waiting for those rows to be returned to the application does take time. That is network time, and there are probably a lot of hops between you and Amazon.
Unless your report is really going to process all the data, check the performance of the database with a simpler query, such as:
select count(*) from table;
EDIT:
Your problem is unlikely to be due to the database. It is probably due to network traffic. As mentioned in another answer, streaming might solve the problem. You might also be able to play with the data formats to get the total size down to something more reasonable.
A last-resort step would be to save the data in a text file, compress the file, move it over, and uncompress it. Although this sounds like a lot of work, you might get 5x - 10x compression on the data, saving oodles of time on the transmission and still have a large improvement in performance with the rest of the processing.
I got updated specs from my client and was able to reduce the number of users returned to 250, which goes through (with a lot of JOINs) in 60 seconds.
So maybe the answer is really: try not to dump a whole table with a query; fetch only the exact data you need. The client has SQL access, and he will have to update his queries so that only relevant users are returned.
You should never really use * as a wildcard. Choose the fields that you actually want, and then create an index on those fields combined.
If you have thousands of rows, another option is to implement pagination.
If the result data is used directly in a report, nobody can look at more than 100 rows in a single shot anyway.
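Pagination with LIMIT could look roughly like the Python sketch below. Note that it uses the "keyset" variant (WHERE id > last seen id ... LIMIT n) rather than LIMIT with OFFSET, which is a choice made for this example and not something the answers above prescribe. sqlite3 stands in for MySQL, and the users table, the PAGE_SIZE of 100 and the helper fetch_page are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(1, 251)],
)

PAGE_SIZE = 100


def fetch_page(connection, after_id=0, page_size=PAGE_SIZE):
    """Return one page of rows whose id is greater than after_id."""
    return connection.execute(
        "SELECT id, email FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


pages = []
last_id = 0
while True:
    page = fetch_page(conn, last_id)
    if not page:
        break
    pages.append(page)
    last_id = page[-1][0]   # resume after the last row already seen

page_sizes = [len(p) for p in pages]
```

Each page is a small primary key range scan, so the cost per page stays roughly constant no matter how deep into the table you are.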

MySQL vs SQL Server 2008 R2 simple select query performance

Can anyone explain to me why there is a dramatic difference in performance between MySQL and SQL Server for this simple select statement?
SELECT email from Users WHERE id=1
Currently the database has just one table with 3 users. MySQL time is on average 0.0003 while SQL Server is 0.05. Is this normal, or is the SQL Server instance not configured properly?
EDIT:
Both tables have the same structure, primary key is set to id, MySQL engine type is InnoDB.
I tried the query with WITH(NOLOCK) but the result is the same.
Are the servers of the same level of power? Hardware makes a difference, too. And are there roughly the same number of people accessing the db at the same time? Are any other applications using the same hardware? (Databases in general should not share servers with other applications.)
Personally I wouldn't worry about this type of difference. If you want to see which is performing better, then add millions of records to the database and then test queries. Databases in general all perform well with simple queries on tiny tables, even badly designed or incorrectly set up ones. To know if you will have a performance problem you need to test with large amounts of data and many simultaneous users on hardware similar to what you will have in prod.
The issue with diagnosing low cost queries is that the fixed cost may swamp the variable costs. Not that I'm a MS-Fanboy, but I'm more familiar with MS-SQL, so I'll address that, primarily.
MS-SQL probably has more overhead for optimization and query parsing, which adds a fixed cost to the query when deciding whether to use the index, looking at statistics, etc. MS-SQL also logs a lot of stuff about the query plan when it executes, and stores a lot of data for future optimization, which adds overhead.
This would all be helpful when the query takes a long time, but when benchmarking a single query, it shows up as a slower result.
There are several factors that might affect that benchmark but the most significant is probably the way MySQL caches queries.
When you run a query with the query cache enabled, MySQL caches the text of the query and its result. When the identical query is issued again, it simply returns the result from the cache and does not actually run the query. (The query cache was removed in MySQL 8.0, so this applies only to older versions.)
Another important factor is that the SQL Server metric is the total elapsed time, not just the time it takes to seek to that record or pull it from cache. In SQL Server, running SET STATISTICS TIME ON will break it down a little more, but you're still not really comparing like for like.
Finally, I'm not sure what the goal of this benchmarking is since that is an overly simplistic query. Are you comparing the platforms for a new project? What are your criteria for selection?
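Given the points above about fixed costs and caching, a single timed run says very little. A hedged sketch of a slightly fairer measurement in Python: run the same statement many times and look at the first run and the median separately. sqlite3 is used here only so the example is self-contained; the helper time_query and the run count of 200 are arbitrary choices, and the numbers it prints say nothing about MySQL or SQL Server.

```python
import sqlite3
import statistics
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO Users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
)


def time_query(connection, sql, params=(), runs=200):
    """Run the query `runs` times and return each elapsed time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        connection.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return timings


timings = time_query(conn, "SELECT email FROM Users WHERE id = ?", (1,))
summary = {
    "first_run": timings[0],               # includes one-off parse/plan cost
    "median": statistics.median(timings),  # steadier view of per-call cost
    "max": max(timings),
}
print(summary)
```

The same harness pointed at both servers, from the same client machine, would at least compare total elapsed time with total elapsed time.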

How to find slower MySQL queries from many small queries

I'm wondering if anyone has a suggestion for my situation:
I have a process that runs many tens of thousands of queries. The whole process takes between 5 and 10 minutes. I want to know which queries are running slower than the rest, but I know that none of them are running for more than say, 5 seconds (with this many queries, that would be very noticeable in my logs). How should I find out which ones are taking the most time, and are the ones that, if optimized, would provide the best results?
MORE DETAILS:
My queries run single-threaded and synchronous, and I'd say 70% SELECT and 30% INSERT/UPDATE. I'd have to get some heads together and determine if the work can be split up into different units that can be run simultaneously - I'm not sure...
All the queries are either simple INSERT statements, single-property UPDATE statements, or SELECT statements on either a primary or foreign key or a two-field ANDed restriction.
DESCRIPTION OF THE ISSUE:
What I'm doing is basically copying a complex directed graph structure, in its entirety. Nodes are database entries, and adjacencies represent essentially foreign keys, but not strictly-speaking (they could be a two-field combination, where the first says what table the second is the id for).
Take a look at MySQL's slow query log. You can configure the threshold of what is regarded as "slow".
Basically, you track time from query start to query end, e.g.:
// Current wall-clock time in seconds, as a float.
function getmicrotime() {
    return microtime(true);
}

$query_start = getmicrotime();
$query = 'SELECT ...';
// Note: the mysql_* extension was removed in PHP 7;
// with mysqli the equivalent call is mysqli_query($connect, $query).
mysql_query($query, $connect);
$query_end = getmicrotime();

// Log any query that took longer than 2 seconds.
if ($query_end - $query_start > 2) {
    your_slow_queries_logger($query);
}
I'd suggest making some kind of wrapper for the mysql_query function that would take care of logging slow queries.
I, personally, log my slow queries using syslog.
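Since the question is about many fast queries rather than a few slow ones, a wrapper that adds up time per statement may be more useful than a threshold. Here is a sketch of that idea in Python, with sqlite3 standing in for MySQL. The names timed_execute, slowest_by_total_time and the nodes table are invented for the example, and it assumes parameterized statements so that identical SQL text groups together.

```python
import sqlite3
import time
from collections import defaultdict

# Per-statement totals: SQL text -> [number of calls, total seconds]
stats = defaultdict(lambda: [0, 0.0])


def timed_execute(connection, sql, params=()):
    """Run a statement and record how long it took under its SQL text."""
    start = time.perf_counter()
    try:
        return connection.execute(sql, params).fetchall()
    finally:
        entry = stats[sql]
        entry[0] += 1
        entry[1] += time.perf_counter() - start


def slowest_by_total_time(limit=5):
    """Statements ranked by cumulative time, the likely optimization targets."""
    ranked = sorted(stats.items(), key=lambda item: item[1][1], reverse=True)
    return [(sql, calls, total) for sql, (calls, total) in ranked[:limit]]


conn = sqlite3.connect(":memory:")
timed_execute(conn, "CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER)")
for i in range(1, 501):
    timed_execute(conn, "INSERT INTO nodes (id, parent_id) VALUES (?, ?)", (i, i // 2))
for i in range(1, 501):
    timed_execute(conn, "SELECT id FROM nodes WHERE parent_id = ?", (i,))

report = slowest_by_total_time()
```

A statement that takes 5 ms but runs 20,000 times will rank above one that takes 2 seconds once, which is exactly what a fixed threshold would miss.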
If you are serious about optimizing MySQL, have a read of this book:
High Performance MySQL: Optimization, Backups, Replication, and More
Some of the contents may be outdated for the newest MySQL versions, but it should be sufficient as it covers MySQL 5.1. It also has an extensive chapter on benchmarking, with techniques and methods, that guides you in creating a benchmark plan that suits your needs.
It also has chapters on Index and Query optimizations which would be very helpful to you if you need to optimize the slow queries identified with the benchmarks.

Optimizing MySQL queries/database

I have two tables TABLE A and TABLE B.
TABLE A contains 1 million (1,000,000) records and 4 fields, while TABLE B contains 60,000 records and 3 fields.
I am running a query which joins these two tables and uses a WHERE clause to find specific products, like WHERE product like '%Bags%' and product like 'Bags%' etc.
When I run the query directly in phpMyAdmin, it returns records in around 1 or 2 seconds. But when it is used on the website, it sometimes takes 9 or 10 seconds according to the MySQL 'slow query' log. My website response was very slow at times, so upon investigation I found out it is due to MySQL, which is how I came to know about the 'slow query log'.
The slow query log consists of all SQL statements that took more than long_query_time seconds to execute and required at least min_examined_row_limit rows to be examined.
So according to that log "query_time" for above query was 13 seconds while in some cases they even had "query_time" exceeding 50 seconds.
Both my tables are using PRIMARY keys as well as INDEXES. So I want to know how I can optimize them more, or whether there is any way I can optimize MySQL settings in general.
This slowness of the website doesn't happen all the time but only sometimes (maybe once a week), and it lasts for around 1 or 2 minutes. The site gets a decent amount of traffic and there are many other queries too; the one I posted above was just one example.
Thanks
For all things MySQL and performance related, check out http://www.mysqlperformanceblog.com/
Check your queries with EXPLAIN and learn how to use it as a query diagnostic tool.
It's not enough to just have indexes. Are you indexing the fields searched in the WHERE clause, as well as the fields used in ORDER BY, GROUP BY, and HAVING clauses and in JOINs? If you have grouped several fields in a single index, that index can only be used when the query filters on its leading (leftmost) fields, so a query that searches only a later field in the group will not hit it. If you group fields in an index, make sure the index will actually be used by your query (EXPLAIN is your friend).
That said, it could be many other things as well: poorly configured MySQL server, poorly tuned server, bad schema. But your queries and your indexes are good place to start your investigation.
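To show what checking a plan looks like in practice, here is a small Python sketch. It uses SQLite's EXPLAIN QUERY PLAN only because that runs without a server; MySQL's EXPLAIN output is laid out differently, but the question it answers (scan or index lookup?) is the same. The products table, the index name idx_category_product and the helper plan are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, "
    "product TEXT, price INTEGER)"
)


def plan(connection, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


query_both = "SELECT * FROM products WHERE category = ? AND product = ?"
query_second_only = "SELECT * FROM products WHERE product = ?"

# No index yet: the only option is to scan the whole table.
before = plan(conn, query_both, ("bags", "tote"))

# Composite index: usable when the leading column (category) is filtered.
conn.execute("CREATE INDEX idx_category_product ON products (category, product)")

after = plan(conn, query_both, ("bags", "tote"))
second_only = plan(conn, query_second_only, ("tote",))

print(before)
print(after)
print(second_only)
```

The query that filters only on the second column of the composite index still reports a scan, which is the "index exists but is not used" situation worth looking for in your own EXPLAIN output.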
Here is a nice summary of performance best practices from Jay Pipes of MySQL.
A query with LIKE '%Bags%' cannot be optimized using indexes, because the pattern starts with a wildcard.
The only way to improve performance here is to use fulltext indexes or to use Sphinx for search.
It's because other queries are running at the same time when you refresh a page of your website. If, for example, your website runs 8-10 queries on each page refresh, it will take longer than running a single query in phpMyAdmin. And if it takes 1-1.5 minutes to execute, the problem may not be the query; it may be the server speed.
You can also use a MATCH() AGAINST() statement to optimize this type of search query.
Otherwise, you are already using PRIMARY KEY, INDEXES and JOINs, so there is no need to worry about other things.
Just check it out.
Thanks.
There are many ways to optimize Databases and queries. My method is the following.
Look at the DB Schema and see if it makes sense
Most often, Databases have bad designs and are not normalized. This can greatly affect the speed of your Database. As a general case, learn the 3 Normal Forms and apply them at all times. The normal forms above 3rd Normal Form are often called de-normalization forms but what this really means is that they break some rules to make the Database faster.
What I suggest is to stick to the 3rd normal form except if you are a DBA (which means you know subsequent forms and know what you're doing). Normalization after the 3rd NF is often done at a later time, not during design.
Only query what you really need
Filter as much as possible
Your Where Clause is the most important part for optimization.
Select only the fields you need
Never use "Select *" -- Specify only the fields you need; it will be faster and will use less bandwidth.
Be careful with joins
Joins are expensive in terms of time. Make sure that you use all the keys that relate the two tables together and don't join to unused tables -- always try to join on indexed fields. The join type is important as well (INNER, OUTER,... ).
Optimize queries and stored procedures (Most Run First)
Queries are very fast. Generally, you can retrieve many records in less than a second, even with joins, sorting and calculations. As a rule of thumb, if your query is longer than a second, you can probably optimize it.
Start with the Queries that are most often used as well as the Queries that take the most time to execute.
Add, remove or modify indexes
If your query does Full Table Scans, indexes and proper filtering can solve what is normally a very time-consuming process. All primary keys need indexes because they make joins faster. This also means that all tables need a primary key. You can also add indexes on fields you often use for filtering in the Where Clauses.
You especially want to use Indexes on Integers, Booleans, and Numbers. On the other hand, you probably don't want to use indexes on Blobs, VarChars and Long Strings.
Be careful with adding indexes because they need to be maintained by the database. If you do many updates on that field, maintaining indexes might take more time than it saves.
In the Internet world, read-only tables are very common. When a table is read-only, you can add indexes with less negative impact because indexes don't need to be maintained (or only rarely need maintenance).
Move Queries to Stored Procedures (SP)
Stored Procedures are usually better and faster than queries for the following reasons:
Stored Procedures are compiled (SQL Code is not), making them faster than SQL code.
SPs don't use as much bandwidth because you can do many queries in one SP. SPs also stay on the server until the final results are returned.
Stored Procedures are run on the server, which is typically faster.
Calculations in code (VB, Java, C++, ...) are not as fast as SP in most cases.
It keeps your DB access code separate from your presentation layer, which makes it easier to maintain (3 tiers model).
Remove unneeded Views
Views are a special type of Query -- they are not tables. They are logical and not physical so every time you run select * from MyView, you run the query that makes the view and your query on the view.
If you always need the same information, views could be good.
If you have to filter the View, it's like running a query on a query -- it's slower.
Tune DB settings
You can tune the DB in many ways. Update statistics used by the optimizer, run optimization options, make the DB read-only, etc... That takes a broader knowledge of the DB you work with and is mostly done by the DBA.
Using Query Analysers
In many Databases, there is a tool for running and optimizing queries. SQL Server has a tool called the Query Analyser, which is very useful for optimizing. You can write queries, execute them and, more importantly, see the execution plan. You use the execution plan to understand what SQL Server does with your query.

mySQL Inconsistent Performance

I'm running a MySQL query that joins various tables of 500,000+ rows. Sometimes it takes a second, other times around 15 seconds! This is on my local machine. I have experienced similarly varied times before on other intensive queries. Does anyone know why this is?
Thanks
Thanks for the replies - I am using appropriate indexes, inner and left joins and have a WHERE clause range of one week out of possible 2 year period of invoices. If I keep varying it (so presumably query results are not cached) and re-running, time varies a lot, even if no. of rows retrieved is similar. The server is not busy. A few scheduled queries every minute but not intensive, take around 200ms.
The explain plan shows that a table of around 2000 rows is always fully scanned. So maybe these rows are sometimes cached, or maybe indexes are cached. I didn't know indexes could be cached. I will try again with caching turned off.
Editing again - the query cache is in fact off. I'm using InnoDB, so it looks like increasing innodb_buffer_pool_size is the way to go.
Same query each time?
It's hard to tell, based on what you've posted. If we assume that the schema and data aren't changing, I'd guess that there's something else running on your machine when the queries are long that would explain the difference. It could be that the state of memory is different, so paging is going on; an anti-virus program is running; some other service has started. It's impossible to answer.
Try running
OPTIMIZE TABLE
That should help to refresh some data useful for the query planner.
You have not given us much information. If you're using MyISAM tables, it may be a matter of locks.
Are you using ANSI INNER JOINs? It's a little basic, but don't use comma joins. They are written like a cross join, with the join condition moved into the WHERE clause, like
SELECT * FROM t1, t2 WHERE t1.id_t1=t2.id_t1
One last thing you may want to try: increase your buffers (InnoDB), your key buffers (MyISAM), and some query cache buffers.
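For reference, the comma form and the explicit INNER JOIN form ask for the same rows, so rewriting one into the other is mostly about readability and about not losing the join condition by accident. A small Python check of that, using sqlite3 as a stand-in for MySQL with made-up sample rows in t1 and t2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (id_t1 INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE t2 (id_t2 INTEGER PRIMARY KEY, id_t1 INTEGER, note TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 3, 'z');
    """
)

# Comma join with the condition in the WHERE clause.
comma_join = conn.execute(
    "SELECT t1.id_t1, t2.id_t2 FROM t1, t2 "
    "WHERE t1.id_t1 = t2.id_t1 ORDER BY t2.id_t2"
).fetchall()

# The same join written with explicit INNER JOIN ... ON.
explicit_join = conn.execute(
    "SELECT t1.id_t1, t2.id_t2 FROM t1 "
    "INNER JOIN t2 ON t1.id_t1 = t2.id_t1 ORDER BY t2.id_t2"
).fetchall()

# Dropping the WHERE clause from the comma form yields a real cross join.
accidental_cross = conn.execute("SELECT COUNT(*) FROM t1, t2").fetchone()[0]
```

The last query is the real risk with the comma style: forget the condition and you silently get every combination of rows (3 x 3 here, but 500,000 x 500,000 on the tables in the question).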
Here are some common reasons (barring your server simply being too busy):
The slow query is hitting the hard drive. In the fast case the indexes and data are already cached in MySQL or the OS file cache.
Retrieving the data gets blocked by updates/inserts. For MyISAM tables, the whole table gets locked in some cases whenever someone inserts or updates data in it.
Table statistics are out of date and/or the wrong index gets selected. Running ANALYZE or OPTIMIZE on the table can help.
You have the query cache enabled. Fetching the result of a cached query is fast; fetching it when it's not in the cache might be slow. Try turning off the query cache to check whether the query is always slow when it's not served from the cache.
In any case, you should show the output of EXPLAIN on your queries to verify indexes are getting used properly. Even if they're not, queries can be fast if everything is in RAM but grind to a halt if they need to hit the hard drive.