MySQL SQL_CALC_FOUND_ROWS and pagination

So I have a table that has a little over 5 million rows. When I use SQL_CALC_FOUND_ROWS the query just hangs forever. When I take it out, the query executes within a second with LIMIT 25. My question: for pagination purposes, is there an alternative way to get the total number of rows?

SQL_CALC_FOUND_ROWS forces MySQL to scan for ALL matching rows, even if they'd never get fetched. Internally it amounts to the same query being executed without the LIMIT clause.
If the filtering you're doing via WHERE isn't too crazy, you could precalculate and cache the counts for the common filter combinations to avoid the full-scan load imposed by SQL_CALC_FOUND_ROWS. Basically, run a "SELECT COUNT(*) FROM ... WHERE ..." ahead of time for each of the most likely WHERE clauses.
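For example, a minimal sketch of such a count cache (the table and filter names here are made up for illustration):

CREATE TABLE filter_count_cache (
  filter_key VARCHAR(191) PRIMARY KEY,  -- serialized representation of the WHERE clause
  row_count  INT UNSIGNED NOT NULL,
  updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- Refresh periodically (e.g. from a cron job), not on every page view:
REPLACE INTO filter_count_cache (filter_key, row_count)
SELECT 'status=active', COUNT(*) FROM big_table WHERE status = 'active';

The pagination code then reads the (possibly slightly stale) count from the cache instead of recomputing it on every request.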
Otherwise, you could go Google-style and just spit out some page numbers that occasionally have no relation whatsoever to reality (you know, you see "Goooooooooooogle", get to page 3, and suddenly run out of results).

Detailed talk about implementing Google-style pagination using MySQL
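A related technique worth knowing here is keyset ("seek") pagination, which sidesteps counting and deep OFFSETs entirely; a sketch, assuming an indexed id column on a hypothetical big_table:

-- Instead of LIMIT 50000, 25, remember the last id of the previous page:
SELECT * FROM big_table
WHERE id > 12345        -- last id seen on the previous page
ORDER BY id
LIMIT 25;

This stays fast no matter how deep the user pages, at the cost of only offering next/previous navigation rather than arbitrary page jumps.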

You should choose between COUNT(*) and SQL_CALC_FOUND_ROWS depending on the situation. If your query's search criteria use only columns that are in an index, use COUNT(*): in that case MySQL will "read" from the indexes only, without touching the actual data in the table, while the SQL_CALC_FOUND_ROWS method loads rows from disk, which can be expensive and time-consuming on massive tables.
More information on this topic can be found in this article on mysqlperformanceblog.
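To make the trade-off concrete, here is a sketch of the two approaches, assuming a hypothetical posts table with an index on created_at:

-- Two-query approach: this count can be served from the created_at index alone
SELECT COUNT(*) FROM posts WHERE created_at >= '2018-01-01';
SELECT * FROM posts WHERE created_at >= '2018-01-01' LIMIT 0, 25;

-- Single-query approach: forces a scan of every matching row
SELECT SQL_CALC_FOUND_ROWS * FROM posts WHERE created_at >= '2018-01-01' LIMIT 0, 25;
SELECT FOUND_ROWS();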

Related

Does using LIMIT on the main query affect the sub-queries?

I have this query :
SELECT *,
       (SELECT COUNT(id) FROM riverLikes
        WHERE riverLikes.river_id = River.id) AS likeCounts
FROM River
WHERE user_id IN (1, 2, 3)
LIMIT 10
My question is: does my sub-query run only 10 times (once for each row that is fetched), or does it run for every row in the "River" table?
My "River" table has lots of records and I'd like to have the best performance when fetching the rivers.
Thanks.
In general, calculated data (either subqueries or functions) is calculated for the rows that matter: rows that are returned, or rows for which the outcome of the calculation is relevant to further filtering or grouping.
In addition, the query optimizer may do all kinds of magic, and it is unlikely that it will run the subquery many times as such. It can be transformed in such a way that all relevant information is fetched at once.
And even if it didn't do that, it all takes place within the same operation in the database SQL engine, so executing this subselect 10 times is way, way faster than executing that subselect as a separate select 10 times, because the SQL engine only has to parse and prepare it once, and doesn't suffer from roundtrip times.
A simple select like that could easily take 30 milliseconds or so when executed from PHP, so quick math would suggest that it'd take 300ms extra to have this subselect in a 10-row query, but that's not the case, because the lion's share of those 30ms is overhead of communication between PHP and the database.
Because of the reasons mentioned above, this subselect is possibly way faster than a join, and it's a common misconception that a join is (almost) always faster.
So, to get back to your example, the subquery won't be executed for all rows in River, but will only be executed, probably in optimized form, for the 10 returned records of users 1, 2 and 3.
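For comparison, a roughly equivalent formulation with a JOIN (which, per the misconception mentioned above, is not necessarily faster) might look like this, assuming River.id is the primary key:

SELECT River.*, COUNT(riverLikes.id) AS likeCounts
FROM River
LEFT JOIN riverLikes ON riverLikes.river_id = River.id
WHERE River.user_id IN (1, 2, 3)
GROUP BY River.id
LIMIT 10;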
In most production-ready RDBMSs, the subquery will be run only for the rows included in the result set, i.e. only 10 times in your case. I think this is true for MySQL too.
EDIT:
To be sure, run
EXPLAIN <your query>
and view the execution plan of your query.
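For example, applied to the query from the question:

EXPLAIN SELECT *,
       (SELECT COUNT(id) FROM riverLikes
        WHERE riverLikes.river_id = River.id) AS likeCounts
FROM River
WHERE user_id IN (1, 2, 3)
LIMIT 10;

A select_type of DEPENDENT SUBQUERY in the output indicates a correlated subquery that is re-evaluated per outer row.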
The subquery in the SELECT clause runs once per row returned; in your sample, 10 times.

SQL query count for sum of two "unknown" columns

I need to query for the COUNT of rows that fulfill multiple filter criteria. However, I do not know which filters will be combined, so I cannot create appropriate indexes.
SELECT COUNT(id) FROM tbl WHERE filterA > 1000 AND filterD < 500
This is very slow since it has to do a full table scan. Is there any way to have a performant query in my situation?
id, filterA, filterB, filterC, filterD, filterE
1, 2394, 23240, 8543, 3241, 234, 23
The issue here is that there are fundamental limitations in how you can index data on multiple criteria. These are standard problems, and to the extent that Elasticsearch gets away from them, it does so with brute-force parallelism and indexes on everything you may want to filter by.
Some filters will usually be more commonly used and more selective than others, so one would normally start by looking at actual examples of queries and build indexes around the ones which have performed slowly in the past.
This means you start with slow query logging and then focus on the most important queries first, until everything is at a tolerable level.
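As a rough sketch of that workflow (the threshold and index choice are illustrative):

-- Enable the slow query log at runtime and log anything over 1 second:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- Then index the most common, most selective filters first. Note that with
-- two range predicates, MySQL can only use the range on the leading column:
CREATE INDEX idx_filterA_filterD ON tbl (filterA, filterD);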

Should I use the SQL COUNT(*) or use SELECT to fetch rows and count the rows later

I am writing a Node.js application which should be very lightweight with respect to the MySQL DB (engine: InnoDB).
I am trying to count the number of records of a table in the MySQL DB,
so I was wondering whether I should use the COUNT(*) function or get all the rows with a SELECT query and then count them in JavaScript.
Which way is better with respect to:
DB Operation cost
Overall performance
Definitely use the COUNT() function, unless you need the data within the records as well for some other purpose.
If you query all rows, then on MySQL side the server has to prepare a resultset (memory consumption, time to fetch data into resultset), then push it down through the connection to your application (more data takes more time), your application has to receive the data (again, memory consumption and time to create the resultset), and finally your application has to count the number of records in the resultset.
If you use count(), MySQL counts records and returns just a single number.
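In SQL terms, the difference is between these two (using a hypothetical users table):

-- Server aggregates and returns a single number:
SELECT COUNT(*) FROM users;

-- Server ships every row to the client just so it can count them:
SELECT * FROM users;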
COUNT() is obviously better than fetching and counting separately.
COUNT() can be satisfied from an index (typically the primary key, if there is one),
whereas fetching all the data takes much more time (disk I/O and network transfer).
Thanks
When getting information from a database, the usual best approach is to get what you need and nothing more. This includes things like selecting specific columns rather than select *, and aggregating at the DBMS rather than in your client code. In this case, since all you apparently need is a count, use count().
It's a good bet that will outperform any other attempted solution since:
you'll be sending only what's absolutely necessary over the network (this may be less important for local databases but, once you have your data elsewhere, it can have a real impact); and
the DBMS will almost certainly be optimised for that use case.
Do a COUNT(field_name), as it will be much faster than fetching all the rows. It only retrieves the count, which can typically be served from an index on the table.

SQL group by and limit [duplicate]

When I add LIMIT 1 to a MySQL query, does it stop the search after it finds 1 result (thus making it faster) or does it still fetch all of the results and truncate at the end?
Depending on the query, adding a limit clause can have a huge effect on performance. If you want only one row (or know for a fact that only one row can satisfy the query), and are not sure about how the internal optimizer will execute it (for example, WHERE clause not hitting an index and so forth), then you should definitely add a LIMIT clause.
As for optimized queries (using indexes on small tables), it probably won't matter much for performance, but again: if you are only interested in one row, then add a LIMIT clause regardless.
LIMIT can affect the performance of the query (see the comments and the link below), and it also reduces the size of the result set that MySQL sends back. For a query in which you expect a single result, there are benefits.
Moreover, limiting the result set can in fact speed up the total query time, as transferring large result sets uses memory and potentially creates temporary tables on disk. I mention this as I recently saw an application that did not use LIMIT kill a server due to huge result sets; with LIMIT in place, resource utilization dropped tremendously.
Check this page for more specifics: MySQL Documentation: LIMIT Optimization
The answer, in short, is yes. If you limit your result to 1, then even if you are "expecting" one result, the query will be faster because your database won't look through all your records. It will simply stop once it finds a record that matches your query.
If there is only 1 result coming back, then no, LIMIT will not make it any faster. If there are a lot of results, you only need the first one, and there are no GROUP BY or ORDER BY clauses, then LIMIT will make it faster.
If you really only expect one single result, it makes sense to append LIMIT to your query. I don't know the inner workings of MySQL, but I'm sure it won't gather a result set of 100,000+ records just to truncate it back to 1 at the end.
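A small illustration of the caveats above, using a hypothetical orders table:

-- Without ORDER BY, MySQL can stop as soon as it finds one matching row:
SELECT * FROM orders WHERE status = 'pending' LIMIT 1;

-- With ORDER BY on an unindexed column, all matching rows must still be
-- found and sorted before the single top row can be returned:
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at LIMIT 1;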

Improve performance on MySQL fulltext search query

I have the following MySQL query:
SELECT p.*, MATCH (p.description) AGAINST ('random text that you can use in sample web pages or typography samples') AS score
FROM posts p
WHERE p.post_id <> 23
AND MATCH (p.description) AGAINST ('random text that you can use in sample web pages or typography samples') > 0
ORDER BY score DESC LIMIT 1
With 108,000 rows, it takes ~200ms. With 265,000 rows, it takes ~500ms.
Under performance testing (~80 concurrent users) it shows ~18 sec average latency.
Is there any way to improve the performance of this query?
EXPLAIN OUTPUT:
UPDATED
We have added one new mirror MyISAM table with post_id and description, and synchronized it with the posts table via triggers. Now, fulltext search on this new MyISAM table takes ~400ms under the same performance load where InnoDB shows ~18sec; this is a huge performance boost. It looks like MyISAM is much quicker for fulltext search in MySQL than InnoDB. Could you please explain it?
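For reference, a minimal sketch of the mirror-table setup described above (exact column types and trigger names are assumptions):

CREATE TABLE posts_search (
  post_id INT NOT NULL PRIMARY KEY,
  description TEXT,
  FULLTEXT KEY ft_description (description)
) ENGINE=MyISAM;

-- Keep the mirror in sync with the InnoDB posts table:
CREATE TRIGGER posts_search_ins AFTER INSERT ON posts FOR EACH ROW
  INSERT INTO posts_search (post_id, description) VALUES (NEW.post_id, NEW.description);

CREATE TRIGGER posts_search_upd AFTER UPDATE ON posts FOR EACH ROW
  UPDATE posts_search SET description = NEW.description WHERE post_id = NEW.post_id;

CREATE TRIGGER posts_search_del AFTER DELETE ON posts FOR EACH ROW
  DELETE FROM posts_search WHERE post_id = OLD.post_id;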
MySQL profiler results:
Tested on AWS RDS db.t2.small instance
Original InnoDB posts table:
MyISAM mirror table with post_id, description only:
Here are a few tips on what to look for in order to maximise the speed of such queries with InnoDB:
Avoid redundant sorting. Since InnoDB has already sorted the result according to ranking, the MySQL query processing layer does not need to sort again to get the top matching results.
Avoid row-by-row fetching to get the matching count. InnoDB provides all the matching records; all those not in the result list have a ranking of 0 and need not be retrieved. InnoDB also has a count of the total matching records on hand, so there is no need to recount.
Use a covered index scan. InnoDB results always contain the matching records' document IDs and their ranking, so if only the document ID and ranking are needed, there is no need to go to the user table to fetch the record itself.
Narrow the search result early to reduce user-table access. If the user wants the top N matching records, we do not need to fetch all matching records from the user table; we can first select the top N matching doc IDs and then fetch only the corresponding records (see the sketch after this list).
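A sketch of that last idea applied to the query from the question (a "deferred join" that fetches the full row only for the top-ranked id):

SELECT p.*, best.score
FROM (
    SELECT post_id,
           MATCH (description) AGAINST ('random text that you can use in sample web pages or typography samples') AS score
    FROM posts
    WHERE post_id <> 23
      AND MATCH (description) AGAINST ('random text that you can use in sample web pages or typography samples') > 0
    ORDER BY score DESC
    LIMIT 1
) AS best
JOIN posts p ON p.post_id = best.post_id;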
I don't think you can get much faster looking only at the query itself; maybe try removing the ORDER BY part to avoid an unnecessary sort. To dig deeper into this, try profiling the query using MySQL's built-in profiler, as sketched below.
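An example of such a profiling session (SHOW PROFILE is deprecated in recent MySQL versions but still available):

SET profiling = 1;
-- ... run the fulltext query from the question here ...
SHOW PROFILES;               -- lists recent statements with durations
SHOW PROFILE FOR QUERY 1;    -- stage-by-stage time breakdown for statement 1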
Other than that, you might look into the configuration of your MySQL server. Have a look at this chapter of the MySQL manual; it contains some good information on how to tune the fulltext index to your needs.
If you've already maximized the capabilities of your MySQL server configuration, then consider looking at the hardware itself; sometimes even a low-cost solution like moving the tables to another, faster hard drive can work wonders.
My best guess for the performance hit is the number of rows being returned by the query. To test this, simply remove the ORDER BY score and see if that improves performance.
If it does not, then the issue is the fulltext index. If it does, then the issue is the ORDER BY, and the problem becomes a bit more difficult. Some ideas:
Determine a hardware solution to speed up the sorts (getting the intermediate files to be in memory).
Modifying the query so it returns fewer values. This might involve changing the stop-word list, changing the query to boolean mode, or other ideas.
Finding another way of pre-filtering the results.
The issue here is WHERE p.post_id <> 23.
Design your system so that non-indexed columns, like post_id here, need not be added to the WHERE clause.
Basically, MySQL will search on the full-text indexed column first and then filter on post_id. Hence, if the full-text search returns a lot of matches, the response time will not be as expected.