So I have a real estate website. I have a page that runs about 5 queries to build a statistics page. I am wondering if there is a way to speed this up, optimize, or combine the queries so that it runs faster. Right now it takes up to 5 seconds to run the page.
Query:
SELECT COUNT(`listing_num`) as `count`,
AVG(`price`),
AVG(`square_feet`),
AVG(`bedroom_total`),
AVG(`bathroom_total`),
MIN(`price`),
MAX(`price`),
MIN(`square_feet`),
MAX(`square_feet`),
MIN(`bathroom_total`),
MAX(`bathroom_total`),
MIN(`bedroom_total`),
MAX(`bedroom_total`),
MIN(`psf`),
MAX(`psf`),
AVG(`psf`)
FROM `Res_Active2`
WHERE `status` != 'S'
So I run this query about 6 different times on the page, with the WHERE clause changed in each, so that I can display stats for sold properties, active properties, under-contract properties, etc.
What is the right and fast way to do this? Can I use a cache, combine the SQL, anything? I need to speed this page up. Thanks.
Try just setting up the MySQL query cache. The query will run only once and the cached result will be reused for the repeated queries.
To enable the MySQL query cache, see the MySQL query cache documentation.
I am pretty sure it will be enough for you to just add the following to your /etc/my.cnf:
query_cache_size=30M
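To check whether the cache is actually being hit, something like this should do (this assumes a MySQL version that still has the query cache - it was removed in MySQL 8.0):
SHOW VARIABLES LIKE 'query_cache%';   -- confirm query_cache_size (and query_cache_type) took effect
SHOW STATUS LIKE 'Qcache%';           -- Qcache_hits should climb when the same statement is repeated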
If that does not help, you can create a special table which holds the result of that query and update it every X minutes with an external script.
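As a rough sketch of that idea (the summary table name is made up, and the bedroom/bathroom columns are left out for brevity) - grouping by `status` also computes the stats for every status in a single pass, so the six per-status queries collapse into one:
CREATE TABLE stats_cache AS
SELECT `status`,
       COUNT(`listing_num`) AS `count`,
       AVG(`price`) AS avg_price, MIN(`price`) AS min_price, MAX(`price`) AS max_price,
       AVG(`square_feet`) AS avg_sqft, MIN(`square_feet`) AS min_sqft, MAX(`square_feet`) AS max_sqft,
       AVG(`psf`) AS avg_psf, MIN(`psf`) AS min_psf, MAX(`psf`) AS max_psf
FROM `Res_Active2`
GROUP BY `status`;
-- the external script refreshes it every X minutes, e.g. TRUNCATE stats_cache; followed by
-- INSERT INTO stats_cache SELECT ... (the same SELECT as above)
The page then reads a handful of pre-computed rows instead of scanning `Res_Active2` six times.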
Related
This is a general question that applies to MySQL, Oracle DB or whatever else might be out there.
I know for MySQL there is LIMIT offset,size; and for Oracle there is 'ROW_NUMBER' or something like that.
But when such 'paginated' queries are called back to back, does the database engine actually do the entire 'select' all over again and then retrieve a different subset of results each time? Or does it do the overall fetching of results only once, keeps the results in memory or something, and then serves subsets of results from it for subsequent queries based on offset and size?
If it does the full fetch every time, then it seems quite inefficient.
If it does full fetch only once, it must be 'storing' the query somewhere somehow, so that the next time that query comes in, it knows that it has already fetched all the data and just needs to extract next page from it.
In that case, how will the database engine handle multiple threads? Two threads executing the same query?
I am very confused :(
I disagree with @Bill Karwin. First of all, do not assume in advance whether something will be quick or slow without taking measurements, and do not complicate the code up front by downloading 12 pages at once and caching them because "it seems like it will be faster".
YAGNI principle - the programmer should not add functionality until deemed necessary.
Do it in the simplest way (ordinary pagination, one page at a time), measure how it works in production; if it is slow, then try a different method, and if the speed is satisfactory, leave it as it is.
From my own practice: an application that retrieves data from a table containing about 80,000 records; the main table is joined with 4-5 additional lookup tables; the whole query is paginated at about 25-30 records per page, about 2500-3000 pages in total. The database is Oracle 12c, there are indexes on a few columns, and the queries are generated by Hibernate.
Measurements on the production system at the server side show that the average time (median - 50th percentile) of retrieving one page is about 300 ms. The 95th percentile is less than 800 ms - this means that 95% of requests for a single page take less than 800 ms; when we add the transfer time from the server to the user and a rendering time of about 0.5-1 seconds, the total time is less than 2 seconds. That's enough - users are happy.
And some theory - see this answer to learn what the purpose of the Pagination pattern is.
Yes, the query is executed over again when you run it with a different OFFSET.
Yes, this is inefficient. Don't do that if you have a need to paginate through a large result set.
I'd suggest doing the query once, with a large LIMIT — enough for 10 or 12 pages. Then save the result in a cache. When the user wants to advance through several pages, then your application can fetch the 10-12 pages you saved in the cache and display the page the user wants to see. That is usually much faster than running the SQL query for each page.
This works well if, like most users, your user reads only a few pages and then changes their query.
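In SQL terms the idea is roughly this (table and column names here are only placeholders):
-- one query per page: the search is re-run for every click
SELECT id, title FROM listings WHERE city = 'Springfield' ORDER BY price LIMIT 10 OFFSET 30;
-- instead, fetch a larger window once, cache it in the application, and page through the cached rows
SELECT id, title FROM listings WHERE city = 'Springfield' ORDER BY price LIMIT 120;  -- 12 pages of 10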
Re your comment:
By cache I mean something like Memcached or Redis. A high-speed, in-memory key/value store.
MySQL views don't store anything, they're more like a macro that runs a predefined query for you.
Oracle supports materialized views, so that might work better, but querying the view would have the overhead of interpreting an SQL query.
A simpler in-memory cache should be much faster.
There is this rather simple query that I have to run on a live system in order to get a count. The problem is that the table and database are rather inefficiently designed, and since it is a live system, altering it is not an option at this point.
So I have to figure out a query that runs fast and won't slow down the system too much, because for the duration of the query execution the system basically stops, which is not really what I would like a live system to do. So I need to streamline my query to make it perform in an acceptable time.
SELECT id1, COUNT(id2) AS count
FROM table
GROUP BY id1
ORDER BY count DESC;
So here is the query, unfortunately it is so simple that I am out of ideas on how to further improve it, maybe someone else has an idea ... ?
Application Get "good enough" results via application changes:
If you have access to the application, but not the database, then there are possibilities:
Periodically run that slow query and capture the results. Then use the cached results.
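A minimal sketch of that, assuming a helper table named id1_counts (the name and the INT column type are assumptions):
CREATE TABLE id1_counts (id1 INT PRIMARY KEY, cnt BIGINT);
-- run every X minutes from cron (or a MySQL EVENT):
TRUNCATE id1_counts;
INSERT INTO id1_counts SELECT id1, COUNT(*) FROM `table` GROUP BY id1;
-- the application then reads the small id1_counts table instead of running the live GROUP BY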
Do you need all
What is the goal? Find a few of the most common id1's? Rank all of them?
Back to the query
COUNT(id2) checks for id2 being not null; this is usually unnecessary, so COUNT(*) is better. However, the speedup is insignificant.
ORDER BY NULL is irrelevant if you are picking off the rows with the highest COUNT -- the sort needs to be done somewhere. Moving it to the application does not help; at least not much.
Adding LIMIT 10 would only help because of cutting down on the time to send the data back to the client.
INDEX(id1) is the best index for the query (after changing to COUNT(*)). But the operation still requires:
a full index scan to do the COUNT and GROUP BY
a sort of the grouped results for the ORDER BY
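Putting those pieces together, the reworked version would look roughly like this (`table` is still the placeholder name from the question):
ALTER TABLE `table` ADD INDEX idx_id1 (id1);   -- lets the GROUP BY scan the index rather than the whole rows
SELECT id1, COUNT(*) AS count
FROM `table`
GROUP BY id1
ORDER BY count DESC
LIMIT 10;   -- only if a top-N list is all that is needed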
Zero or near-zero downtime
Do you have replication established? Galera Clustering?
Look into pt-online-schema-change and gh-ost.
What is the real goal?
We cannot fix the query as written. What things can we change? Better yet, what is the ultimate goal -- perhaps there is an approach that does not involve any query that looks anything like the one you are trying to speed up.
Now I have just dumped the table and imported it into a MySQL Docker container and ran the query there. It took ages, and I actually had to move my entire Docker setup because the dump was so huge, but in the end I got my results and now I know how many id2s are associated with specific id1s.
As it was already pointed out, there wasn't much room for improvement on the query anymore.
FYI, suddenly the concern about stopping the system was gone and now we are indexing the table; so far it has taken 6 hours, with no end in sight :D
Anyways, thanks for the help everyone.
I have created a view on a simple table. My problem is that my average execution time of a select on that view is about 29 seconds.
However, if I run the select statement which describes the view directly, the query executes in about 0.015 seconds.
Now, I have looked up some info, and here and here, people basically say that it should be roughly the same since a view is just a stored query.
Is it possible that I have this much of a difference in time? I have tried using SQL_NO_CACHE to make sure no cache is used so I get representative data when testing both options.
I would prefer to keep my view unless I have no option in reducing costs.
After a lot of research and trial and error I have concluded that, even with simple queries and views, there can be a huge performance difference between selecting * from a view and running the SELECT query that the view is defined with.
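For anyone who runs into the same thing, comparing what MySQL actually does for each form is the quickest way to see where they diverge (my_view is just a placeholder name here):
SHOW CREATE VIEW my_view;        -- shows the declared ALGORITHM (MERGE, TEMPTABLE, or UNDEFINED)
EXPLAIN SELECT * FROM my_view;   -- plan when going through the view
-- compare against: EXPLAIN <the SELECT the view is defined with>
A TEMPTABLE view is materialized before the outer query runs, which is one common reason a view can end up much slower than the bare SELECT.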
This ought to be a fairly common problem. I could not find a solution in the stackoverflow questions database. I am not sure I did the right search.
I run a MySQL, CGI/Perl site. Maybe 1000 hits a day. Users can search the database through the website. The WHERE clause can become quite lengthy. I display 10 items per page and give the user links to go to the next and previous pages. Currently, I do a new search every time the user clicks the 'prev' or 'next' page link. I use
LIMIT num-rows-to-fetch OFFSET num-rows-to-skip
along with the query statement. However, the response time is far too long for subsequent searches. This can only get worse as I add more users. I am trying to see if I can implement this in a better way.
If you can give me some pointers, I would appreciate it very much.
If you don't mind using JavaScript, you should check out DataTables. That way you send all rows to the client, and pagination is done on the client side.
If that is not an option, then you could try using mod_perl or CGI::Session to save the query result between page requests, so you will not need to query MySQL again and again.
You could try analyzing the query to find out which part causes the most trouble for the database. If there are joins, you could create indexes that include both the key fields and the possible filter fields used in the query. Depending on the database layout and size, that could mean creating a lot of indexes, and if the query can change significantly you may still be at a loss if you did not create an index for a particular case. In any case, I would first analyze the query and try to find out what makes it so slow. If there is one major cause of the bad performance, you could try throwing indexes at it.
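For example (purely illustrative - the table and column names are invented, since the real query isn't shown), a composite index covering both a join/filter field and a sort field might look like:
CREATE INDEX idx_listing_city_price ON listings (city, price);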
Another way would be to cache the search results in memory and do pagination from there, avoiding the database round trip altogether. Depending on your database size you might want to limit the search results to a reasonable number.
I have never used this module, so I cannot vouch for its performance, but have a look at DBI::ResultPager.
I have a group-by query that is very fast when it comes to indexing, joining, sending the results, etc. Unfortunately, MySQL spends 99.6% of its time "copying to tmp table" when I profile it. I am at a loss as to how I can get MySQL to perform better on its own.
This group by query essentially finds the number of entities for the top 20 tags in the system. I build on this query to filter out tags that the user selects, essentially narrowing down the field. As they select a new tag, the query performance improves drastically. If they select 2 or 3 tags, the query is extremely fast in finding the number of entities with those 2 or 3 tags, plus the next most popular tags after that.
Essentially, the performance problem is when no tags are selected, or when 1 tag is selected. What is the best way to solve this problem?
1) A memory cache, like Ehcache? Get Hibernate involved?
2) The MySQL query cache?
3) Store the 0-tag and 1-tag results for every tag in the system in a cache table, and use those for a 24-hour period? Use a job to populate the results every day?
Any other ideas? I'm using Spring, Java, Hibernate, MySQL.
We definitely need the query and its EXPLAIN output to answer this. Possibly also the schema.
What are the column types involved? If you look at the MySQL reference you can see some of the factors that might be triggering on-disk storage for your query during processing. Is your disk system slow enough to cause this kind of issue?
Is it possible for you to just create a view rather than copying files?
Taking a stab in the dark here: MySQL may be writing the temp table to disk if, with one tag, the results are bigger than the setting for in-memory temp tables.
If you have the resources, you may want to consider bumping up that setting. Or you can find a way to reduce the size of the results returned for zero or one tag.
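If it is the temp-table size, the relevant counters and knobs look roughly like this (the values are only an illustration):
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';   -- climbing quickly means temp tables are spilling to disk
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
-- in-memory temp tables are capped by the smaller of the two, so raise both together:
SET GLOBAL tmp_table_size      = 256*1024*1024;
SET GLOBAL max_heap_table_size = 256*1024*1024;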