My query has a small duration but the fetch time is quite large. What exactly is the fetch time? Can it be reduced? Is it dependent on the network, since the server is a remote one?
My query is a simple one: SELECT * FROM Table WHERE id = <primary_key>; Usually, the query returns between 5 and 50k rows.
But the table has 9 million rows. A count(*) takes 82 seconds (duration) and 0 for fetch time.
Fetch time does depend on network speed. Your SELECT COUNT(*) ... query only returns a single number, so the network overhead is minimal.
To improve the speed, I suggest you only fetch the columns and rows you need (i.e. replace SELECT * ... with the exact columns you need).
Also, enabling compression between client and server will reduce transfer time (but will slightly increase CPU usage for compressing/decompressing).
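For example, the stock mysql command-line client can enable compression with a single flag (the host, user, and database names below are placeholders; in MySQL 8.0 --compression-algorithms supersedes it, but --compress is still recognized):
mysql --compress -h remote-host -u myuser -p mydb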
Have you looked into using MySQL indexes?
Actually, this is an interview question from a company that builds a high-load service.
For example, we have a table with 1 TB of records and a primary B-tree index.
We need to select all records in a range from 5000 to 5000000.
We cannot block the whole database; the database is under high load.
Does it make sense to split a huge select query into parts like
select * from a where id >= 5000 and id < 10000;
select * from a where id >= 10000 and id < 15000;
...
Please help me compare the behaviour of Postgres and MySQL in this case.
Are there any other optimal techniques to select all required records?
Thanks.
There are many unknowns in your question. First of all, what is the table structure? Will this query use any indexes?
The best way to find out is to run an execution plan and analyze performance.
But trying to retrieve so many rows in one pass does not seem very reasonable. The query will very likely cause heavy load on the server, high RAM consumption, and probably the use of a temp file. It could fail or time out.
And then the result set has to travel across the network, and it could be huge. You have to evaluate the size of the dataset; we cannot guess without insight into the table structure.
The big question is: why retrieve so many rows, and what is the ultimate goal? Say you have a GUI application with a datagridview or something like that. You are not going to display 500 million rows at once; that would crash the application. What the user probably wants is to paginate or search records using some filter. Maybe you'll show a few hundred records at a time, max.
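To sketch that pagination idea (reusing table a and its id column from the question; the page size of 500 is arbitrary), keyset pagination avoids scanning past ever-growing offsets:
select * from a
where id > 5000      -- last id seen on the previous page
order by id
limit 500;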
What are you going to do with all those records?
I have this query:
SELECT *,(SELECT count(id) FROM riverLikes
WHERE riverLikes.river_id = River.id) as likeCounts
FROM River
WHERE user_id IN (1,2,3)
LIMIT 10
My question is: does my sub-query run only 10 times (once for each row that is fetched), or does it run for every row in the "River" table?
My "River" table has lots of records and I'd like the best performance when getting the rivers.
Thanks.
In general, calculated data (either subqueries or functions) is calculated for the rows that matter: rows that are returned, or rows for which the outcome of the calculation is relevant to further filtering or grouping.
In addition, the query optimizer may do all kinds of magic, and it is unlikely that it will run the subquery many times as such. It can be transformed in such a way that all relevant information is fetched at once.
And even if it didn't do that, it all takes place within the same operation in the database SQL engine, so executing this subselect 10 times is way, way faster than executing that subselect as a separate select 10 times, because the SQL engine only has to parse and prepare it once, and doesn't suffer from roundtrip times.
A simple select like that could easily take 30 milliseconds or so when executed from PHP, so quick math would suggest that it'd take 300ms extra to have this subselect in a 10-row query, but that's not the case, because the lion's share of those 30ms is overhead of communication between PHP and the database.
Because of the reasons mentioned above, this subselect is possibly way faster than a join, and it's a common misconception that a join is (almost) always faster.
So, to get back to your example: the subquery won't be executed for all rows in River, but will only be executed, probably in optimized form, for the 10 returned rows of rivers belonging to users 1, 2 and 3.
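For comparison, here is a join-based version of the same query - just a sketch using the names from the question, and it assumes River.id is the primary key (which MySQL 5.7+ needs in order to allow River.* alongside GROUP BY River.id):
SELECT River.*, COUNT(riverLikes.id) AS likeCounts
FROM River
LEFT JOIN riverLikes ON riverLikes.river_id = River.id
WHERE River.user_id IN (1, 2, 3)
GROUP BY River.id
LIMIT 10;
Whether this beats the correlated subquery depends on the optimizer and on an index on riverLikes.river_id, so measure both with EXPLAIN.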
In most production-ready RDBMSs, the subquery will be run only for the rows included in the result set, i.e. only 10 times in your case. I think this is true for MySQL too.
EDIT:
To be sure, run
EXPLAIN <your query>
and view the execution plan of your query.
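For example, with the query from the question:
EXPLAIN SELECT *, (SELECT count(id) FROM riverLikes
        WHERE riverLikes.river_id = River.id) as likeCounts
FROM River
WHERE user_id IN (1,2,3)
LIMIT 10;
If MySQL labels the inner query DEPENDENT SUBQUERY in the select_type column, it is correlated and may be re-evaluated per outer row.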
The subquery in the SELECT statement runs once per row returned - in your sample, 10 times.
I am writing a NodeJs application which should be very lightweight on the MySQL db (engine: InnoDB).
I am trying to count the number of records of a table in the MySQL db.
So I was wondering whether I should use the COUNT(*) function or get all the rows with a SELECT query and then count them in JavaScript.
Which way is better with respect to,
DB Operation cost
Overall performance
Definitely use the count() function - unless you also need the data within the records for some other purpose.
If you query all rows, then on MySQL side the server has to prepare a resultset (memory consumption, time to fetch data into resultset), then push it down through the connection to your application (more data takes more time), your application has to receive the data (again, memory consumption and time to create the resultset), and finally your application has to count the number of records in the resultset.
If you use count(), MySQL counts records and returns just a single number.
count() is obviously better than fetching and counting separately.
count() can get the total from an index (if there is a primary key or other index).
Also, fetching all the data takes far more time (disk I/O and network operations).
Thanks
When getting information from a database, the usual best approach is to get what you need and nothing more. This includes things like selecting specific columns rather than select *, and aggregating at the DBMS rather than in your client code. In this case, since all you apparently need is a count, use count().
It's a good bet that will outperform any other attempted solution since:
you'll be sending only what's absolutely necessary over the network (this may be less important for local databases but, once you have your data elsewhere, it can have a real impact); and
the DBMS will almost certainly be optimised for that use case.
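To make the contrast concrete (the table name is a placeholder):
-- aggregate at the DBMS: a single number travels over the network
SELECT COUNT(*) FROM my_table;
-- anti-pattern for this use case: every row travels to the client just to be counted
SELECT * FROM my_table;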
Do a count(FIELD_NAME), as it will be much faster than fetching all the rows; the count can be answered from an index. (Note that count(column) counts only non-NULL values, while count(*) counts all rows.)
I have large sets of data - over 40GB that I loaded into a MySQL table. I am trying to perform simple queries like select * from tablename, but it takes a gazillion minutes to run and eventually times out. If I set a limit, the execution is fairly fast, e.g.: select * from tablename limit 1000.
The table has over 200 million records.
I tried creating indexes on some columns, but that failed too, after 3 hours of execution.
Any tips on working with these types of datasets?
First thing you need to do is completely ignore all answers and comments advising some other, awesome, mumbo-jumbo technology. It's absolute bullshit. Those things can't work in a fundamentally different way because they're all constrained by the same problem - hardware.
Now, let's get back to MySQL. The problem with LIMIT is really a problem with OFFSET: MySQL generates rows and discards everything before the offset. That means if you do SELECT * FROM my_table LIMIT 1000 OFFSET 1000000, it will produce 1,001,000 rows, throw away the first million, and hand you the last thousand.
Yes, it takes time. Yes, it appears dumb. However, MySQL doesn't know what "start" or "end" mean for your data, so it can't skip ahead until an index tells it where to start.
To improve your search, you can use something like this (assuming you have numeric primary key):
SELECT * FROM tablename WHERE id < 10000 LIMIT 1000;
In this case, instead of working through all 200 million rows, MySQL will only work with the rows whose PK is below 10,000. Much easier, much quicker, and more readable. The numbers can be tweaked at any point, and if you perform pagination of some sort in a scripting language, you can always pass along the last numeric id that was seen, so MySQL can start from that id onwards in its search.
Also, you should be using the InnoDB engine, and tweak it using innodb_buffer_pool_size, which is the magic sauce that makes MySQL fly.
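A minimal my.cnf sketch of that knob (the 8G value is purely illustrative - a common rule of thumb is around 70-80% of RAM on a dedicated database server):
[mysqld]
innodb_buffer_pool_size = 8G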
For large databases, one should consider alternative solutions such as Apache Spark. MySQL reads the data from disk, which is a slow operation. Nothing can work as fast as a technology that is based on MapReduce. It is true that with large databases, queries get very challenging.
Anyway, assuming you want to stick with MySQL: first of all, if you are using MyISAM, make sure to convert your database storage to InnoDB. This is especially important if you have lots of read/write operations.
It is also important to partition, which reduces the table into smaller, more manageable chunks; it will also enhance index performance.
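As a hedged illustration of range partitioning (the table name, the id column, and the boundaries are all hypothetical; note that MySQL requires the partitioning column to be part of every unique key, including the primary key):
ALTER TABLE tablename
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (50000000),
    PARTITION p1 VALUES LESS THAN (100000000),
    PARTITION p2 VALUES LESS THAN (150000000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);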
Do not be too generous with adding indexes. Define indexes wisely. If an index does not need to be UNIQUE do not define it as one. If an index does not need to include multiple fields do not include multiple fields.
Most importantly, start monitoring your MySQL instance. Use SHOW ENGINE INNODB STATUS to investigate its performance.
Do you think it's a good idea to count entries from a really big table (like 50K rows) on each page load?
SELECT COUNT(*) FROM table
Right now I have like 2000 rows and it seems pretty fast; I don't see any delays in page load :)
But the table should reach up to 50K entries... and I'm curious how it will load then.
(ps: the page which shows the row count is private, in an Admin interface, not public)
From the MySQL manual: COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization applies only to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly.
As you said, you use MyISAM and your query is for the whole table, so it doesn't matter if it's 1 or 100,000 rows.
As you have said, this page is private and not public, so I don't see any problem with that query and 50k records; it shouldn't have any real impact on page load times or server load.
The MyISAM engine stores the row count internally, so a query like SELECT COUNT(*) FROM table will be fast. With InnoDB, on the other hand, it takes some time because it counts the actual rows, which means: the more rows, the slower it gets. But there's a trick: use a small covering index to count all the rows in the table, and it's fast again. Another trick is to simply store the row count in a corresponding summary table.
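A rough sketch of the summary-table trick (all names are hypothetical; the counter has to be maintained by triggers or application code on every insert and delete):
CREATE TABLE row_counts (
    table_name VARCHAR(64) PRIMARY KEY,
    row_count  BIGINT NOT NULL
);
-- on each insert into the big table:
UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'big_table';
-- the per-page-load check is then a cheap primary-key lookup:
SELECT row_count FROM row_counts WHERE table_name = 'big_table';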
COUNT(*) isn't an expensive operation; it doesn't actually return the data, it just looks at the indexes. You should be fine even on a 50k table.
If you experience issues in loading, it would be simple to refactor and optimise at that point.
In MyISAM, the count(*) is optimized away WHEN THERE ISN'T ANY 'WHERE' CONDITION, so the query is very fast even with billions of rows.
In the case of partitioned tables, we might expect it to behave the same way when there is a simple condition on the column that defines the partition (e.g. counting all the rows in a few physical tables of the logical table). But this is not the case: it loops over all the rows of the physical tables considered, even if we want to count them all. For instance, here, on a 98-million-row table partitioned into 40 tables, it takes over 5 minutes to count the number of rows in the last 32 physical tables.
It can be. According to forum reports, PostgreSQL will do an entire scan of the table to figure out the count.
count(*) is O(n), so its performance is related to the number of records in the table. 50k is not a lot at all, so I think it is fine on an admin page. When you get into the millions, count(*) certainly does become expensive.