Suppose I use MySQLdb or JDBC to issue the SQL select * from users against MySQL, and the table has 1 billion records. How many rows would MySQL return in one chunk/packet? I assume MySQL neither transfers the rows one by one nor transfers all of the data at once, right? So what is the default chunk/packet size for one network transfer to the client?
If I use a server-side cursor, should I set the fetch size larger than the default chunk size for better performance?
The implementation notes of MySQL's JDBC driver point out that, by default, the whole result set is retrieved and stored in memory. So if there are 1 billion records, all of them will be retrieved, and the limiting factor will probably be the memory of your machine.
To sum up, how much of the ResultSet is retrieved at once depends on the JDBC implementation. For example, Oracle's JDBC driver retrieves only 10 rows at a time and stores them in memory.
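With MySQL Connector/J the behaviour is configurable. A minimal sketch of the two non-default fetch modes, assuming the users table from the question and placeholder connection details:

import java.sql.*;

public class StreamingFetchDemo {
    public static void main(String[] args) throws SQLException {
        // useCursorFetch=true asks Connector/J to use a server-side cursor,
        // so setFetchSize(n) really does fetch n rows per roundtrip.
        String url = "jdbc:mysql://localhost:3306/test?useCursorFetch=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                   ResultSet.CONCUR_READ_ONLY)) {
            // Without one of these settings, Connector/J reads the whole result set into client memory.
            stmt.setFetchSize(1000);   // rows per roundtrip when useCursorFetch=true
            // Alternative: stmt.setFetchSize(Integer.MIN_VALUE) makes Connector/J stream
            // the result row by row over the connection instead of buffering it.
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM users")) {
                while (rs.next()) {
                    // process one row at a time; only a small window is held on the client
                }
            }
        }
    }
}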
Related
I'm loading an RDB with dummy data to practice query optimization. MySQL Workbench executed 10,000 INSERTs into my customers table without returning an error. Yet when I SELECT * from that table, I get back exactly 1000 records in the result set. I am using InnoDB as my table engine.
According to this link I should have unlimited records available and a 64 TB overall size limit. I'm inserting 10,000 records with 4 VARCHAR(255) columns and 2 BOOLEAN columns each, and I don't think that tops 1 TB. Am I wrong in this assumption?
Is the result grid limited to 1000 records? Is there an alternative to InnoDB that supports foreign keys? Is the problem that VARCHAR(255) is way too large and I need to reduce it to something like VARCHAR(50)? What am I not understanding?
In the query editor toolbar there's a drop-down where you can limit the number of records returned. The default is 1000, but you can change it over a wide range, including no limit at all.
No, it is not limited to 1000 records. I have complex InnoDB tables of more than 50 million records with BLOBs and multiple indexes. InnoDB is perfectly fine; you don't have to look for another engine. Could you be more precise about the context in which you executed the query? Was it from a programming language, the command-line mysql client, or another MySQL client?
Many database query tools limit the number of rows returned. Try selecting some data from a high row number to see if your data is there (it should be).
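One quick way to verify this from any client is to count the rows on the server and read a slice past the 1000-row mark. A sketch over JDBC, assuming the customers table from the question and placeholder connection details:

import java.sql.*;

public class RowCountCheck {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/test";   // placeholder connection details
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            // The server-side count is unaffected by any client display limit.
            try (ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM customers")) {
                rs.next();
                System.out.println("rows in table: " + rs.getLong(1));
            }
            // Read a slice well beyond the first 1000 rows to confirm the data is really there.
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM customers LIMIT 5 OFFSET 9990")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}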
I thought this would be useful for future reference:
In Microsoft SQL Server Management Studio, under Tools -> Options -> SQL Server Object Explorer -> Value for Select Top <n> Rows Command, change the number of rows returned.
I have a node API that accepts arrays of up to 1000 email addresses:
These are then sent to the server in a single INSERT INTO statement.
i.e. 1000 rows with about 8 columns each.
The API can currently only handle 1 of those requests per second.
I have tried to scale my node server (up and out).
I have also tried to scale up my DB.
Nothing seems to push the API beyond 1 request per second.
The strange thing is that the API server and DB server usage stay well below 50%.
Is there some other possible bottleneck that might impact the performance?
The part of the code that takes most of the time is definitely the INSERT INTO statement, but why does it remain this slow when I scale the DB? And why is the usage percentage so low?
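For reference, the single multi-row INSERT per request described above looks roughly like the following in JDBC terms. This is only a sketch of the pattern, not the original Node code; the emails table and address column are made up:

import java.sql.*;
import java.util.List;

public class BulkInsert {
    // Hypothetical helper: writes one request's 1000 addresses in a single batch and a single commit.
    static void insertEmails(Connection conn, List<String> emails) throws SQLException {
        String sql = "INSERT INTO emails (address) VALUES (?)";
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);   // one commit (one durable flush) per request, not per row
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String email : emails) {
                ps.setString(1, email);
                ps.addBatch();
            }
            // With rewriteBatchedStatements=true in the JDBC URL, Connector/J sends
            // this batch as one multi-row INSERT instead of 1000 separate statements.
            ps.executeBatch();
            conn.commit();
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}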
Does my query get sent to the database once and I get a list of all the results in one shot which I then loop through, or do I have to request the next row from the DB each time?
Essentially, does reducing the number of rows I expect to return mean less connections/calls to the DB meaning my DB will be able to handle more connections at once, or is the number of database connections not dependent on the number of returned rows?
Your question is very vague, and seems to have the terminology jumbled up.
The number of rows retrieved from a resultset has no bearing on the number of connections. Nor does the number of statements executed have any bearing on connections.
(Unless it's a very badly written application that churns connections, connecting and disconnecting from the database for each statement execution.)
I think what you're asking is whether there's a "roundtrip" made to the database server for each "fetch", to retrieve a row from a resultset returned by a SELECT query.
The answer to that question is no, most database libraries fetch a "batch" of rows. When the client requests the next row, it's returned from the set already returned by the library. Once the batch is exhausted, it's another roundtrip to get the next set. That's all "under the covers" and your application doesn't need to worry about it.
It doesn't matter whether you fetch just one row and then discard the resultset, or whether you loop through and fetch every row. It's the same pattern.
In terms of performance and scalability, if you only need to retrieve four rows, then there's no sense in preparing a resultset containing more than that. When your query runs against the database, the database server generates the resultset, and holds that, until the client requests a row from the resultset. Larger resultsets require more resources on the database server.
A huge resultset is going to need more roundtrips to the database server, to retrieve all of the rows, than a smaller resultset.
It's not just the number of rows, it's also the size of the row being returned. (Which is why DBA types gripe about developer queries that do a SELECT * FROM query to retrieve every flipping column, when the client is actually using only a small subset of the columns.)
Reducing roundtrips to the database generally improves performance, especially if the client is connecting to the database server over a network connection.
But we don't really need to be concerned with how many roundtrips it takes to fetch all the rows from a resultset... it takes what it takes; the client needs what it needs.
What we should be concerned with is the number of queries we run.
A query execution involves WAY more overhead than the simple roundtrip for a fetch. The SQL statement has to be parsed for syntax (the order and pattern of keywords and identifiers is correct) and for semantics (the identifiers reference valid tables, columns, and functions, and the user has the appropriate permissions on the database objects), and an execution plan has to be generated (evaluating which predicates and operations can be satisfied by which indexes, and the permutations of the order in which the operations are performed). Only then can the statement be executed; if a resultset is being returned, the server prepares it, notifies the client of the query completion status, and waits for the client to request rows from the resultset. When the client closes the resultset, the database server can clean up, releasing memory and so on.
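One practical consequence: if the same statement runs many times, reusing a prepared statement lets that parse work be paid once. A hedged JDBC sketch, with placeholder connection details and illustrative column names (useServerPrepStmts=true asks Connector/J for true server-side prepared statements):

import java.sql.*;

public class PreparedReuse {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/test?useServerPrepStmts=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, email FROM users WHERE id = ?")) {
            // The statement text is sent and parsed once; each iteration only binds
            // a new parameter value and executes.
            for (long id = 1; id <= 100; id++) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process the row
                    }
                }
            }
        }
    }
}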
These aren't linked. The number of connections you are able to make depends on the quality of the thread library and the amount of RAM available to (and used by) each thread. So essentially it is limited by the quality of the system, not the complexity of the database. Because each thread works with a buffer of fixed size, a larger number of rows only makes the processing slower; it does not increase the RAM used per connection. See here for more:
https://dev.mysql.com/doc/refman/5.0/en/memory-use.html
Q: Does my query get sent to the database once and I get a list of all the results in one shot which I then loop through, or do I have to request the next row from the DB each time?
A: You get back a batch of rows and iterate through it until you need the next batch (another trip to the DB over the same connection). The batch size depends on multiple conditions; if your query's result set is small, you may get all of the results in one shot.
Q: Essentially, does reducing the number of rows I expect to return mean less connections/calls to the DB meaning my DB will be able to handle more connections at once, or is the number of database connections not dependent on the number of returned rows?
A: The larger the result set, the more trips to the DB (to grab the next set of rows) there may be. But the number of connections opened to the DB does not depend on the result size of a single query.
While working with MySQL and some really "performance-greedy" queries, I noticed that running such a greedy query can take 2 or 3 minutes to compute. But if I retry the query immediately after it finishes the first time, it takes only a few seconds. Does MySQL store something like "the last x queries"?
The short answer is yes: there is a query cache.
The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.
from here
The execution plan for the query will be calculated and re-used. The data can be cached, so subsequent executions will be faster.
Yes, depending on how the MySQL Server is configured, it may be using the query cache. This stores the results of identical queries until a certain limit (which you can set if you control the server) has been reached. Read http://dev.mysql.com/doc/refman/5.1/en/query-cache.html to find out more about how to tune your query cache to speed up your application if it issues many identical queries.
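When timing a query, the cache can hide its real cost. On MySQL 5.x (where the query cache still exists) the SQL_NO_CACHE hint tells the server not to answer from, or store into, the cache. A small sketch, with placeholder connection details and a made-up table name:

import java.sql.*;

public class NoCacheBenchmark {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/test";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            long start = System.nanoTime();
            // SQL_NO_CACHE asks a 5.x server to bypass the query cache for this statement.
            try (ResultSet rs = stmt.executeQuery("SELECT SQL_NO_CACHE COUNT(*) FROM big_table")) {
                rs.next();
            }
            System.out.printf("uncached run took %d ms%n", (System.nanoTime() - start) / 1_000_000);
        }
    }
}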
I was told by a colleague that executing an SQL statement always makes the database server load the data into RAM/swap, and that it is therefore not practical to select large result sets.
I thought that such code
my $sth = $dbh->prepare('SELECT million_rows FROM table');
$sth->execute;
while (my @data = $sth->fetchrow_array) {
    # process the row
}
retrieves the result set row by row, without it being loaded to RAM.
But I can't find any reference to this in DBI or MySQL docs. How is the result set really created and retrieved? Does it work the same for simple selects and joins?
Your colleague is right.
By default, the perl module DBD::mysql uses mysql_store_result which does indeed read in all SELECT data and cache it in RAM. Unless you change that default, when you fetch row-by-row in DBI, it's just reading them out of that memory buffer.
This is usually what you want unless you have very very large result sets. Otherwise, until you get the last data back from mysqld, it has to hold that data ready and my understanding is that it causes blocks on writes to the same rows (blocks? tables?).
Keep in mind, modern machines have a lot of RAM. A million-row result set is usually not a big deal. Even if each row is quite large at 1 KB, that's only 1 GB RAM plus overhead.
If you're going to process millions of rows of BLOBs, maybe you do want mysql_use_result -- or you want to SELECT those rows in chunks with progressive uses of LIMIT x,y.
See mysql_use_result and mysql_store_result in perldoc DBD::mysql for details.
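The LIMIT-chunking idea mentioned above, sketched in JDBC terms rather than Perl (table and column names are made up, connection details are placeholders):

import java.sql.*;

public class ChunkedSelect {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/test";
        int chunk = 10_000;
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, payload FROM big_table LIMIT ?, ?")) {
            for (long offset = 0; ; offset += chunk) {
                ps.setLong(1, offset);
                ps.setInt(2, chunk);
                int seen = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        seen++;
                        // process the row
                    }
                }
                if (seen < chunk) break;   // a short chunk means we have reached the end
            }
        }
    }
}

Note that on very large tables, paging by an indexed column (WHERE id > ? LIMIT n) scales better than a growing OFFSET, which the server has to skip over on every chunk.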
This is not true (if we are talking about the database server itself, not client layers).
MySQL can buffer the whole resultset, but this is not necessarily done, and if done, not necessarily in RAM.
The resultset is buffered if you are using inline views (SELECT ... FROM (SELECT …)), if the query needs to sort (shown as "using filesort"), or if the plan requires creating a temporary table (shown as "using temporary" in the query plan).
Even when "using temporary", MySQL keeps the table in memory only while its size does not exceed the limit set by tmp_table_size. When the table grows over this limit, it is converted into an on-disk MyISAM table.
You may, though, explicitly instruct MySQL to buffer the resultset by adding the SQL_BUFFER_RESULT hint to the outermost SELECT.
See the docs for more detail.
No, that is not how it works.
The database will not hold all the rows in RAM/swap.
However, it will try (and MySQL tries hard here) to cache as much as possible: indexes, results, and so on. Your MySQL configuration gives sizes for the memory buffers used by the different kinds of caches (for the different storage engines); you should not allow these caches to swap.
Test it
Bottom line: it should be very easy to test this from the client alone (I don't know Perl's DBI; it might be doing something that forces MySQL to load everything on prepare, but I doubt it). Anyway, test it:
Actually issue a prepare on SELECT SQL_NO_CACHE million_rows FROM table and then fetch only a few rows out of the millions.
Then compare the performance with SELECT SQL_NO_CACHE only_fetched_rows FROM table and see how that fares.
If the performance is comparable (and fast), then I believe you can call your colleague's bluff.
Also, if you enable logging of the statements actually issued to MySQL and give us a transcript, then we (non-Perl folks) can give a more definitive answer on what MySQL would do.
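For what it's worth, the general query log can be switched on at runtime on MySQL 5.1 and later (it needs a privileged account and logs every statement, so use it only for a short debugging session). A sketch with placeholder connection details and an illustrative log path:

import java.sql.*;

public class EnableGeneralLog {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/test";
        try (Connection conn = DriverManager.getConnection(url, "root", "password");
             Statement stmt = conn.createStatement()) {
            stmt.execute("SET GLOBAL general_log_file = '/tmp/mysql-general.log'");
            stmt.execute("SET GLOBAL general_log = 'ON'");
            // ... run the client code under test, then inspect the log file ...
            stmt.execute("SET GLOBAL general_log = 'OFF'");
        }
    }
}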
I am not super familiar with this, but it looks to me like DBD::mysql can either fetch everything up front or only as needed, based on the mysql_use_result attribute. Consult the DBD::mysql and MySQL documentation.