count(*) using left join taking long time to respond - mysql

I have a MySQL query for showing the count of data in the Application listing page.
query
SELECT COUNT(*) FROM cnitranr left join cnitrand on cnitrand.tx_no=cnitranr.tx_no
Explain screen shot
Index on cnitranr:
tx_no (primary), approx. 1 crore (10 million) rows [ENGINE MyISAM]
Index on cnitrand:
tx_no (secondary), approx. 2 crore (20 million) rows [ENGINE MyISAM]
Profiler output is like this
Can anyone suggest ways to optimize this query, or should I run a cron job to maintain the count? Please help.

You would need to implement a materialized view.
Since MySQL does not support them directly, you would need to create a table like that:
CREATE TABLE totals (cnt INT)
and write triggers on both tables that increment and decrement cnt on every INSERT, UPDATE and DELETE.
Note that if you have a record with many linked records in either table, the DML affecting such a record would be slow.
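A minimal sketch of the trigger-maintained counter idea, using SQLite through Python's sqlite3 module as a stand-in for MySQL so that it runs anywhere. It tracks the row count of a single hypothetical table, items, rather than the joined count from the question; counting the LEFT JOIN result needs extra logic in each trigger, and MySQL's trigger syntax differs slightly (FOR EACH ROW is required).

```python
import sqlite3

# SQLite stands in for MySQL; `items` and the trigger names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE totals (cnt INTEGER NOT NULL);
    INSERT INTO totals (cnt) VALUES (0);

    -- Keep the stored count in step with every insert and delete.
    CREATE TRIGGER items_after_insert AFTER INSERT ON items
    BEGIN
        UPDATE totals SET cnt = cnt + 1;
    END;

    CREATE TRIGGER items_after_delete AFTER DELETE ON items
    BEGIN
        UPDATE totals SET cnt = cnt - 1;
    END;
""")

conn.executemany("INSERT INTO items (payload) VALUES (?)", [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM items WHERE payload = 'b'")

# Reading the counter is a single-row lookup instead of a full scan.
stored_count = conn.execute("SELECT cnt FROM totals").fetchone()[0]
actual_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(stored_count, actual_count)
```

The listing page would then read totals.cnt instead of running COUNT(*) on every request.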
On large data volumes, you very rarely need exact counts, especially for pagination. As I said in a comment above, Google, Facebook etc. only show approximate numbers on paginated results.
It's very unlikely that a person would want to browse through 20M+ records on a page that can only show 100 or so.

Related

mysql> SELECT COUNT(*) vs SHOW TABLE STATUS for row count

We have a table in our database that has tens of millions of entries (10.1.21-MariaDB; InnoDB table engine; Windows OS). We are able to get the number of rows in the table instantaneously using the command SHOW TABLE STATUS LIKE 'my_table_name'. However, SELECT COUNT(*) FROM my_table_name takes a few minutes to complete.
Q) Why is SHOW TABLE STATUS LIKE 'my_table_name' so much quicker than SELECT COUNT(*) FROM my_table_name?
Because one is a query that counts all the rows and the other is a command that retrieves stats the DB engine maintains about the table. There isn't any firm guarantee that the table statistic will be up to date so the only way to get an accurate count is to count the rows, but it might be that you don't need it to be perfectly accurate all the time. You can thus choose either, depending on your desire for accuracy vs speed etc.
See this screenshot from https://pingcap.com/docs/stable/sql-statements/sql-statement-show-table-status/
You can see the example inserts 5 rows but the table stats are out of date and the table still reports 0 rows. Running ANALYZE TABLE will (probably) take longer than counting the rows, but the stats will be up to date (for a while at least) afterwards.
A suggested approach to get a reasonably accurate count of the table size, when SELECT COUNT(*) is taking a long time to complete, could be:
ANALYZE TABLE my_table_name; SHOW TABLE STATUS LIKE 'my_table_name';
This comes in especially handy when importing a large amount of data into a table, and you want to track the progress of the import process.

How to make a faster query when joining multiple huge tables?

I have 3 tables. All 3 tables have approximately 2 million rows. Every day 10,000-100,000 new entries are added. It takes approximately 10 seconds to finish the SQL statement below. Is there a way to make this SQL statement faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works fine, but it's slow. Is there a better way to write this code?
All database servers have some form of optimization engine that determines how best to fetch the data you want. With a simple query such as the SELECT you showed, there isn't any way to greatly improve performance within the SQL itself. As others have said, subqueries won't help, as they will be optimized into the same plan as the joins.
Reduce the number of columns, add indexes, beef up the server if that's an option.
Consider caching. I'm not a MySQL expert, but I found this article interesting and worth a skim: https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider if that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute or would inserting this into a summary table once an hour be fine?
A subquery doesn't help, but a proper index can improve performance, so be sure you have the proper indexes:
create index idx1 on customers(gender, cus_id, book_id, name)
create index idx2 on hotels(cus_id)
create index idx3 on bookings(book_id)
I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really weird that bookings are not tied to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the where clause and use those columns first. Then add additional columns from the on and select clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.

Why is this simple SQL query causing major lag on a simple 6k record table?

So I have 2 tables, one called user, and one called user_favorite. user_favorite stores an itemId and userId, for storing the items that the user has favorited. I'm simply trying to locate the users who don't have a record in user_favorite, so I can find those users who haven't favorited anything yet.
For testing purposes, I have 6001 records in user and 6001 in user_favorite, so there's just one record who doesn't have any favorites.
Here's my query:
SELECT u.* FROM user u
JOIN user_favorite fav ON u.id != fav.userId
ORDER BY id DESC
Here the id in the last statement is not ambiguous; it refers to the id from the user table. I have a PK index on u.id and an index on fav.userId.
When I run this query, my computer just becomes unresponsive and completely freezes, with no output ever being given. I have 2 GB of RAM, not a great computer, but I think it should be able to handle a query like this with 6k records easily.
Both tables are in MyISAM, could that be the issue? Would switching to InnoDB fix it?
Let's first discuss what your query (as written) is doing. Because of the != in the ON clause, you are joining every user record with every one of the other users' favorites. So your query is going to produce something like 36M rows. This is not going to give you the answer that you want. And it explains why your computer is unhappy.
How should you write the query? There are three main patterns you can use: NOT IN, NOT EXISTS, and LEFT JOIN ... IS NULL. I think this is a pretty good explanation: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/ It discusses performance specifically in the context of MySQL, and it shows you how to look at and read an execution plan, which is critical to optimizing queries.
Change your query to something like this:
select * from user
where not exists (select * from user_favorite where user.id = user_favorite.userId)
Let me know how it goes.
A join on A != B means that every record of A is joined with every record of B in which the ids aren't equal.
In other words, instead of producing 6000 rows, you're producing approximately 36 million (6000 * 6001) rows of output, which all have to be collected, then sorted...
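To make the difference concrete, here is a small sketch using SQLite through Python's sqlite3 module as a stand-in for MySQL. The three users and their favorites are made-up sample data, and the index name is hypothetical. It runs the NOT EXISTS and LEFT JOIN ... IS NULL patterns and also counts the rows the != join produces.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_favorite (userId INTEGER, itemId INTEGER);
    CREATE INDEX idx_fav_user ON user_favorite (userId);
    INSERT INTO user (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO user_favorite (userId, itemId) VALUES (1, 10), (1, 11), (3, 12);
""")

# Pattern 1: NOT EXISTS keeps users with no matching favorite row.
not_exists = conn.execute("""
    SELECT u.id FROM user u
    WHERE NOT EXISTS (SELECT 1 FROM user_favorite fav WHERE fav.userId = u.id)
    ORDER BY u.id DESC
""").fetchall()

# Pattern 2: LEFT JOIN, then keep only the rows where no favorite matched.
left_join = conn.execute("""
    SELECT u.id FROM user u
    LEFT JOIN user_favorite fav ON fav.userId = u.id
    WHERE fav.userId IS NULL
    ORDER BY u.id DESC
""").fetchall()

# The original != join pairs each user with every favorite of other users.
inequality_rows = conn.execute("""
    SELECT COUNT(*) FROM user u
    JOIN user_favorite fav ON u.id != fav.userId
""").fetchone()[0]

print(not_exists, left_join, inequality_rows)
```

Both anti-join patterns return only user 2, while the != join already yields 6 rows from 3 users and 3 favorites; that count grows roughly with the product of the two table sizes.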

Running count and count distinct on many rows (tens of thousands)

I'm trying to run this query:
SELECT
COUNT(events.event_id) AS total_events,
COUNT(matches.fight_id) AS total_matches,
COUNT(players.fighter_id) AS total_players,
COUNT(DISTINCT events.organization) AS total_organizations,
COUNT(DISTINCT players.country) AS total_countries
FROM
events, matches, players
These are table details:
Events = 21k
Players = 90k
Matches = 155k
All of those IDs are unique, so the first three values will be those numbers. The other two are total_organizations, which counts the distinct values of the organization column in events (should return a couple hundred), and total_countries, which counts the distinct values of the country column in players (also a couple hundred).
All three of those ID columns are unique and indexed.
This query as it stands now takes forever. I never even have patience to see it complete. Is there a faster way of doing this?
Also, I need this to load these results on every page load, so should I just put this query in some hidden file, and set a cron job to run every midnight or something and populate a "totals" table or something so I can retrieve it from that table quickly?
Thanks!
First, remove the unnecessary join here; it's preventing most (if not all) of your indexes from being used. You want three different queries:
SELECT
COUNT(events.event_id) AS total_events,
COUNT(DISTINCT events.organization) AS total_organizations
FROM
events;
SELECT
COUNT(matches.fight_id) AS total_matches
FROM
matches;
SELECT
COUNT(players.fighter_id) AS total_players,
COUNT(DISTINCT players.country) AS total_countries
FROM
players;
This should go a long way to improving the performance of these queries.
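A small sketch of why the split matters, using SQLite through Python's sqlite3 module as a stand-in for MySQL. The handful of rows is made-up sample data, and one_row is a hypothetical helper. Listing the three tables in one FROM clause builds their cross product, while the per-table queries each touch only their own rows.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, organization TEXT);
    CREATE TABLE matches (fight_id INTEGER PRIMARY KEY);
    CREATE TABLE players (fighter_id INTEGER PRIMARY KEY, country TEXT);
    INSERT INTO events VALUES (1, 'A'), (2, 'A'), (3, 'B');
    INSERT INTO matches VALUES (1), (2), (3), (4);
    INSERT INTO players VALUES (1, 'US'), (2, 'BR');
""")

def one_row(sql):
    """Hypothetical helper: run a query and return its single result row."""
    return conn.execute(sql).fetchone()

# One query per table, as suggested above.
total_events, total_organizations = one_row(
    "SELECT COUNT(event_id), COUNT(DISTINCT organization) FROM events")
(total_matches,) = one_row("SELECT COUNT(fight_id) FROM matches")
total_players, total_countries = one_row(
    "SELECT COUNT(fighter_id), COUNT(DISTINCT country) FROM players")

# The comma join materialises every combination of rows: 3 * 4 * 2.
(cross_join_rows,) = one_row("SELECT COUNT(*) FROM events, matches, players")

print(total_events, total_organizations, total_matches,
      total_players, total_countries, cross_join_rows)
```

With the table sizes from the question (21k, 155k and 90k rows), the same cross product would have to enumerate hundreds of trillions of combinations, which is why the original query never finishes.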
Now, consider adding these indexes:
CREATE INDEX events_organization ON events (organization);
CREATE INDEX players_country ON players (country);
Compare the EXPLAIN SELECT ... results before and after adding these indexes. They might help and they might not.
Note that InnoDB does not store an exact row count, because under MVCC different transactions may see different numbers of rows. A COUNT over the entire table therefore has to scan the table or one of its indexes, so the cost grows with the table size no matter which indexes exist.
If you are using MyISAM, which does not support transactions or MVCC, the engine keeps the exact row count in the table metadata, so a COUNT(*) with no WHERE clause returns almost instantly. The COUNT(DISTINCT ...) values still have to be computed from the data or an index.
So if you are using InnoDB, then you may wind up having to use a cronjob to create a cache of this data anyway.

SQL ORDER BY performance

I have a table with more than 1 million records. The problem is that the query takes too much time, about 5 minutes. The "ORDER BY" is my problem, but I need the expression in the ORDER BY to get the most popular videos. And because of the expression I can't create an index on it.
How can i resolve this problem?
Thx.
SELECT DISTINCT
`v`.`id`,`v`.`url`, `v`.`title`, `v`.`hits`, `v`.`created`, ROUND((r.likes*100)/(r.likes+r.dislikes),0) AS `vote`
FROM
`videos` AS `v`
INNER JOIN
`votes` AS `r` ON v.id = r.id_video
ORDER BY
(v.hits+((r.likes-r.dislikes)*(r.likes-r.dislikes))/2*v.hits)/DATEDIFF(NOW(),v.created) DESC
Does the most popular have to be calculated every time? I doubt the answer is yes. Some operations will take a long time to run no matter how efficient your query is.
Also bear in mind you have 1 million now, you might have 10 million in the next few months. So the query might work now but not in a month, the solution needs to be scalable.
I would make a job to run every couple of hours to calculate and store this information on a different table. This might not be the answer you are looking for but I just had to say it.
What I have done in the past is to create a voting system based on Integers.
Nothing will outperform integers.
The voting system table has 2 Columns:
ProductID
VoteCount (INT)
The votecount stores all the votes that are submitted.
Like = +1
Unlike = -1
Create an Index in the vote table based on ID.
You have two alternatives to improve this:
1) create a new column with the needed value pre-calculated
2) create a second table that holds the videos primary key and the result of the calculation.
This could be a calculated column (in the first case), or you could modify your app or add triggers to keep it in sync (you'd need to load it manually the first time, and afterwards let your program keep it updated).
If you use the second option your key could be composed of the finalRating plus the primary key of the videos table. This way your searches would be hugely improved.
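A rough sketch of the second option, using SQLite through Python's sqlite3 module as a stand-in for MySQL. The video_scores table, its index, the refresh_scores function and the sample rows are all hypothetical. SQLite has no DATEDIFF, so the age in days is taken from julianday() against an explicit as_of date, and 2.0 forces floating-point division.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (id INTEGER PRIMARY KEY, hits INTEGER, created TEXT);
    CREATE TABLE votes (id_video INTEGER PRIMARY KEY, likes INTEGER, dislikes INTEGER);
    -- Hypothetical second table holding the precomputed ranking value.
    CREATE TABLE video_scores (id_video INTEGER PRIMARY KEY, score REAL);
    CREATE INDEX idx_video_scores_score ON video_scores (score);
    INSERT INTO videos VALUES (1, 100, '2024-01-01'), (2, 1000, '2024-01-01'),
                              (3, 50, '2024-01-06');
    INSERT INTO votes VALUES (1, 10, 0), (2, 1, 1), (3, 4, 2);
""")

def refresh_scores(conn, as_of):
    """Recompute every score; meant to be run by a periodic job."""
    conn.execute("DELETE FROM video_scores")
    conn.execute("""
        INSERT INTO video_scores (id_video, score)
        SELECT v.id,
               (v.hits + ((r.likes - r.dislikes) * (r.likes - r.dislikes)) / 2.0 * v.hits)
               / (julianday(?) - julianday(v.created))
        FROM videos v
        JOIN votes r ON r.id_video = v.id
    """, (as_of,))

refresh_scores(conn, "2024-01-11")

# The page query now sorts on a stored, indexed column, not an expression.
ranked_ids = [row[0] for row in conn.execute("""
    SELECT v.id
    FROM video_scores s
    JOIN videos v ON v.id = s.id_video
    ORDER BY s.score DESC
    LIMIT 25
""")]
print(ranked_ids)
```

The trade-off is staleness: the ranking is only as fresh as the last refresh, which is usually acceptable for a "most popular" list.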
Have you tried moving the arithmetic from the ORDER BY into your SELECT, and then ordering by the virtual column, such as:
SELECT (col1+col2) AS a
FROM TABLE
ORDER BY a
Arithmetic on sort is expensive.