EclipseLink: Connection pools and native queries - MySQL

We are using Spring (EclipseLink) on MariaDB. The SQL generated by the ORM results in a long-running DB query, so I need to rewrite it as a native query - which is no big deal by itself. However, the result set is limited by a LIMIT, and I also need the total count of all matching records. For querying the total count I found the following solution for MariaDB.
My question:
Is it safe to run the two SQL commands separately, or should I send them combined into one statement with a UNION?
The question arises because, between my query and the SELECT FOUND_ROWS(), another query might interfere (from another request to the same microservice) and dilute the result.

If both queries are executed in the same transaction, InnoDB's MVCC should guarantee that the results are not influenced by other transactions.
see: https://dev.mysql.com/doc/refman/8.0/en/innodb-multi-versioning.html
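A minimal SQL sketch of the two-statement approach (table and column names are placeholders). Note that FOUND_ROWS() is scoped to the session, so both statements must also run on the same pooled connection - which executing them inside a single transaction normally ensures:

START TRANSACTION;
-- page query; SQL_CALC_FOUND_ROWS asks the server to also count all matching rows
SELECT SQL_CALC_FOUND_ROWS id, name FROM users ORDER BY id LIMIT 10;
-- the number of rows the previous SELECT matched before the LIMIT was applied
SELECT FOUND_ROWS();
COMMIT;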

Related

Qt SQL `nextResult` function for MySQL Server 8.0: delayed execution per result set?

We are currently doing a lot of small queries. We execute a query, read the results, and then execute the next one. Since network requests cost a lot of time, this ping-ponging gets slow very fast.
This is why we want to do multiple queries at once, sending all the data the SQL server needs in one go, and retrieving only one result (consisting of multiple result sets).
We found that Qt 5.14.1's QSqlQuery has the nextResult() function, but in the documentation (link) it says:
Some databases may execute all statements at once while others may delay the execution until the result set is actually accessed, [...].
MY QUESTION:
So, does MySQL Server 8.0 delay execution until the result set is actually accessed? If that is the case, then we still have one ping-pong per query, right? Which would still be very slow.
P.S. Our current solution to have just one ping-pong is to UNION the different result sets (resulting in a kind of block-diagonal matrix with lots and lots of NULL values), and this question is meant to find a better way to do this.

MySQL JOIN Keeps Timing Out

I am currently trying to run a JOIN between two tables in a local MySQL database, and it's not working. Below is the query; I am even limiting it to 10 rows just to run a test. After running this query for 15-20 minutes, it tells me "Error Code: 2013. Lost connection to MySQL server during query". My computer is not going to sleep, and I'm not doing anything to interrupt the connection.
SELECT rd_allid.CreateDate, rd_allid.SrceId, adobe.Date, adobe.Id
FROM rd_allid JOIN adobe
ON rd_allid.SrceId = adobe.Id
LIMIT 10
The rd_allid table has 17 million rows of data and the adobe table has 10 million. I know this is a lot, but I have a strong computer. My processor is an i7 6700 3.4GHz and I have 32 GB of RAM. I'm also running this on a solid-state drive.
Any ideas why I cannot run this query?
"Why I cannot run this query?"
There's not enough information to determine definitively what is happening. We can only make guesses and speculations, and offer some suggestions.
I suspect MySQL is attempting to materialize the entire resultset before the LIMIT 10 clause is applied. For this query, there's no optimization for the LIMIT clause.
And we might guess that there is not a suitable index for the JOIN operation, which is causing MySQL to perform a nested loops join.
We also suspect that MySQL is encountering some resource limitation which is causing the session to be terminated. Possibly it is filling up all space in /tmp (that usually throws an error, something like "invalid/corrupted MyISAM table '#tmpNNN'", something of that ilk). Or it could be some other resource constraint. Without doing an analysis, we're just guessing.
It's possible MySQL wrote something to the error log (hostname.err). I'd check there.
But whatever condition MySQL is running into (the answer to the question "Why I cannot run this query"), I'm seriously questioning the purpose of the query. Why is that query being run? Why is returning that particular resultset important?
There are several possible queries we could execute. Some of those will run a long time, and some will be much more performant.
One of the best ways to investigate query performance is to use MySQL EXPLAIN. That will show us the query execution plan, revealing the operations MySQL will perform, in what order, and which indexes will be used.
We can make some suggestions as to possible indexes to add, based on the query shown, e.g. on adobe (Id, Date).
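For example (the index name here is our own invention), one could add the suggested index and re-check the plan:

ALTER TABLE adobe ADD INDEX idx_adobe_id_date (Id, Date);

EXPLAIN
SELECT rd_allid.CreateDate, rd_allid.SrceId, adobe.Date, adobe.Id
FROM rd_allid JOIN adobe
ON rd_allid.SrceId = adobe.Id
LIMIT 10;

If the index is picked up, it will appear in the key column of the EXPLAIN output for the adobe table.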
And we can make some suggestions about modifications to the query (e.g. adding a WHERE clause, using a LEFT JOIN, incorporating inline views, etc.). But we don't have enough of a specification to recommend a suitable alternative.
You can try something like:
SELECT rd_allidT.CreateDate, rd_allidT.SrceId, adobeT.Date, adobeT.Id
FROM
(SELECT CreateDate, SrceId FROM rd_allid ORDER BY SrceId LIMIT 1000) rd_allidT
INNER JOIN
(SELECT Id, Date FROM adobe ORDER BY Id LIMIT 1000) adobeT
ON adobeT.Id = rd_allidT.SrceId;
This may help you get faster response times.
Also, if you are not interested in the whole relation, you can add WHERE clauses inside the derived tables; they are applied before the INNER JOIN, which speeds the query up further, as sketched below.
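For instance, with a hypothetical date filter pushed into the first derived table:

SELECT rd_allidT.CreateDate, rd_allidT.SrceId, adobeT.Date, adobeT.Id
FROM
(SELECT CreateDate, SrceId FROM rd_allid
 WHERE CreateDate >= '2016-01-01' -- hypothetical filter
 ORDER BY SrceId LIMIT 1000) rd_allidT
INNER JOIN
(SELECT Id, Date FROM adobe ORDER BY Id LIMIT 1000) adobeT
ON adobeT.Id = rd_allidT.SrceId;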

Fetching large number of records from MySQL through Java

There is a MySQL table, Users, on a server. It has 28 columns and 1 million records (which may increase as well). I want to fetch all rows from this table, do some manipulation on them, and then add them to MongoDB. I know that it will take a lot of time to retrieve these records through a simple 'Select * from Users' operation. I have been doing this in Java with JDBC.
So, the options I got from my research is:
Option 1. Do batch processing: my plan was to get the total number of rows from the table, i.e. SELECT COUNT(*) FROM users. Then set a fetch size of, say, 1000 (setFetchSize(1000)). After that I was stuck. I did not know if I could write something like this:
Connection conn = DriverManager.getConnection(connectionUrl, userName, passWord);
Statement stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
        java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(1000); // the fetch-size hint mentioned above; note that Connector/J
                         // only streams with Integer.MIN_VALUE or useCursorFetch=true
String query = "select * from users";
ResultSet resultSet = stmt.executeQuery(query);
My doubt was whether resultSet would hold only 1000 entries once I executed the query, and whether I should repeat the operation until all records were retrieved.
I dropped the plan because I understand that, for MySQL, the ResultSet is fully populated at once by default, so batching might not work. This Stack Overflow discussion and the MySQL documentation helped out.
Option 2. Do pagination: my idea is to use LIMIT with an offset, telling the server the starting index and the number of rows to fetch; for example, fetch 1000 rows at a time and advance the offset on each iteration.
I read a suggested article (link), but did not find any loopholes in approaching this problem using LIMIT.
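A sketch of what such a pagination loop issues, assuming an indexed id column on the Users table:

SELECT * FROM users ORDER BY id LIMIT 1000 OFFSET 0;
SELECT * FROM users ORDER BY id LIMIT 1000 OFFSET 1000;
SELECT * FROM users ORDER BY id LIMIT 1000 OFFSET 2000;
-- ... and so on. Large offsets get progressively slower, because the server
-- still has to walk past all skipped rows; keyset pagination
-- (WHERE id > last_seen_id ORDER BY id LIMIT 1000) avoids that.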
To anybody kind and patient enough to read this long post: could you please share your valuable opinions on my thought process and correct me if something is wrong or missing?
Answering my own question based on the research I did:
Batching is not really effective for select queries, especially if you want to use the resultset of each query operation.
Pagination - good if you want to improve memory efficiency, not for improving execution speed. Speed goes down as you fire multiple queries with LIMIT, since JDBC has to round-trip to MySQL every time.

Does MySQL cache SQL execution plans?

If the same SQL runs many times from different sessions, will MySQL parse it each time? In Oracle/SQL Server, the plan for a statement is cached and can be reused. Since parsing and building a plan are said to be costly, if MySQL doesn't cache plans, won't parsing the same statement many times become a problem and potentially cost a lot?
For execution plan caching: I don't believe MySQL currently offers this feature.
MySQL does have a query cache: http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.
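On versions that still ship the query cache (it was removed in MySQL 8.0), its configuration and effectiveness can be inspected like this:

SHOW VARIABLES LIKE 'query_cache%'; -- size, type, per-result limits
SHOW STATUS LIKE 'Qcache%';         -- hits, inserts, low-memory prunes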
I'm not sure how up to date this article is (2006), but it talks about these issues in detail:
http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/
To the best of my knowledge, not much has changed since then in this regard.
This is an existing MySQL Feature Request.
However, the last comments (in 2009) were along the lines that it's not clear it would offer any significant performance improvement and that it could lead to deadlock conditions.
If you are concerned about this, you might want to look into using prepared statements.
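As a minimal sketch in plain SQL (the users table is a placeholder): a server-side prepared statement is parsed once per session and can then be executed repeatedly with different parameters:

PREPARE find_user FROM 'SELECT * FROM users WHERE id = ?';
SET @id = 42;
EXECUTE find_user USING @id;
DEALLOCATE PREPARE find_user;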

Is there an effect on the speed of a query when using SQL_CALC_FOUND_ROWS in MySQL?

The other day I found the FOUND_ROWS() function (here) in MySQL and its corresponding SQL_CALC_FOUND_ROWS option. The latter looks especially useful (instead of running a second query to get the row count).
I'm wondering what speed impact adding SQL_CALC_FOUND_ROWS to a query has.
I'm guessing it will be faster than running a second query to count the rows, but will the difference be large? Also, I have found that limiting a query makes it much faster (for example, when you get the first 10 rows of 1000). Will adding SQL_CALC_FOUND_ROWS to a query with a small limit cause the query to run much slower?
I know I can test this, but I'm wondering about general practices here.
When I was at the MySQL Conference in 2008, part of one session was dedicated to exactly this - benchmarks between SQL_CALC_FOUND_ROWS and doing a separate SELECT.
I believe the result was that there was no benefit to SQL_CALC_FOUND_ROWS - it wasn't faster, in fact it may have been slower. There was also a 3rd way.
Additionally, you don't always need this information, so I would go the extra query route.
I'll try to find the slides...
Edit: Hrm, google tells me that I actually liveblogged from that session: http://beerpla.net/2008/04/16/mysql-conference-liveblogging-mysql-performance-under-a-microscope-the-tobias-and-jay-show-wednesday-200pm/. Google wins when memory fails.
To calculate SQL_CALC_FOUND_ROWS, the query will be executed as if no LIMIT was set, but the result set sent to the client will obey the LIMIT.
Update: for COUNT(*) operations that can be satisfied from the index alone, SQL_CALC_FOUND_ROWS is slower (reference).
I assume it would be slightly faster for queries where you need to know the number of rows, but would incur an overhead for queries where you don't.
The best advice I could give is to try it out on your development server and benchmark the difference. Every setup is different.
I would advise using as few proprietary SQL extensions as possible when developing an application (or even not hand-writing SQL queries at all). Doing a separate query is portable, and I actually don't think MySQL could do better at getting that information than by re-querying. Btw, as the page mentions, the command also has some drawbacks when used in replicated environments.
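The portable two-query alternative looks like this (an items table with a price filter is assumed for illustration):

-- page of results
SELECT id, name FROM items WHERE price > 100 ORDER BY id LIMIT 10;
-- total count: same WHERE clause, no LIMIT
SELECT COUNT(*) FROM items WHERE price > 100;

If the WHERE clause is covered by an index, the COUNT(*) can often be served from the index alone, which is exactly the case the reference above found to be slower with SQL_CALC_FOUND_ROWS.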