I am working on a high scale application of the order of 35000 Qps, using Hibernate and MySQL.
A large table has AutoIncrement Primary key, and generation defined is IDENTITY at Hibernate. Show Sql is true as well.
Whenever an Insert happens I see only one query being fired in DB, which is an
Insert statement.
Few Questions Follow:
1) I was wondering how does Hibernate get the AutoIncrement Value after insert?
2) If the answer is "SELECT LAST_INSERT_ID()", why does it not show up at VividCortex or in Show Sql Logs...?
3) How does "SELECT LAST_INSERT_ID()" account for multiple autoincrements in different tables?
4) If MySql returns a value on Insert, why aren't the MySql clients built so that we can see what is being returned?
Thanks in Advance for all the help.
You should call SELECT LAST_INSERT_ID().
Practically, you can't do the same thing as the MySQL JDBC driver using another MySQL client. You'd have to write your own client that reads and writes the MySQL protocol.
The MySQL JDBC driver gets the last insert id by parsing packets of the MySQL protocol. The last insert id is returned in this protocol by a MySQL result set.
This is why SELECT LAST_INSERT_ID() doesn't show up in query metrics. It's not calling that SQL statement, it's picking the integer out of the result set at the protocol level.
You asked how it's done internally. A relevant line of code is https://github.com/mysql/mysql-connector-j/blob/release/8.0/src/main/protocol-impl/java/com/mysql/cj/protocol/a/result/OkPacket.java#L55
Basically, it parses an integer from a known position in a packet as it receives a result set.
I'm not going to go into any more detail about parsing the protocol. I don't have experience coding a MySQL protocol client, and it's not something I wish to do.
I think it would not be a good use of your time to implement your own MySQL client.
It probably uses the standard JDBC mechanism to get generated values.
It's not
You execute it imediately after inserting in one table, and you thus get the values that have been generated by that insert. But that's not what is being used, so it's irrelevant
Not sure what you mean by that: the MySQL JDBC driver allows doing that, using the standard JDBC API
(Too long for a comment.)
SELECT LAST_INSERT_ID() uses the value already available in the connection. (This may explain its absence from any log.)
Each table has its own auto_inc value.
(I don't know any details about Hibernate.)
35K qps is possible, but it won't be easy.
Please give us more details on the queries -- SELECTs? writes? 35K INSERTs?
Are you batching the inserts in any way? You will need to do such.
What do you then use the auto_inc value in?
Do you use BEGIN..COMMIT? What value of autocommit?
Related
I have a Spring JPA application which uses hibernate and mysql. I would like to log slow running database queries on this application. Hibernate provides an option to configure 'hibernate.session.events.log.LOG_QUERIES_SLOWER_THAN_MS' (with log level org.hibernate.SQL_SHOW: INFO). However using this option will log SQL with parameter values set.
Is there an option in hibernate to replace the parameter values with '?'.
ex: insert into customer (name) values (?)
instead of
insert into customer (name) values ('John')
FYI, Setting the log level on org.hibernate.SQL to DEBUG will log the SQL with ? instead of parameter values. This will print all the SQL in the logs, In my case I only want to log slow running queries
Appreciate any help.
(Maybe Hibernate is doing more than necessary...)
MySQL optionally creates "slowlog" that contains details (in including 'John') on the slow query. However, the next thing that can be done is to "digest" the log. This will replace strings and numbers with ? or S (for string). So, if you can bypass Hibernate and get the log, use pt-query-digest or mysqldumpslow -s t to do the digesting. This gives a much shorter output file (by combining queries with the same 'digest' or 'signature').
The underlying setting is long_query_time; it is a number of "seconds" and can be a fractional value such as "0.5" for half a second. Setting it to 0 would capture all queries, which it sounds like you don't want. (I almost never go much below 1.)
When using mysql.h, there's a property called insert_id on the MYSQL connection object. Is there a similar feature when using a TADOConnection?
I am well ware of SELECT LAST_INSERT_ID();. What I'm asking is: If the C API has that feature, why is that feature missing in the ADO connector, if missing?
And if the only solution is to INSERT and SELECT the last insert id, are these two queries atomic, can some other INSERT queries be executed in-between, from a different connection, and make LAST_INSERT_ID return an unexpected value?
Since SQL-Server has OUTPUT Inserted.ID and Postgres has RETURNING id, I find MySQL to have made a dumb design choice with LAST_INSERT_ID().
In my code I use database->last_insert_id(undef,undef,undef,"id"); to get the autoincremented primary key. This works 99.99% of the time. But once in a while it returns a value of 0.
In such situations, Running a select with a WHERE clause similar to the value of the INSERT statement shows that the insert was successful. Indicating that the last_insert_id method failed to get the proper data.
Is this a known problem with a known fix? or should I be following up each call to last_insert_id with a check to see if it is zero and if yes a select statement to retrieve the correct ID value?
My version of mysql is
mysql Ver 14.14 Distrib 5.7.19, for Linux (x86_64)
Edit1: Adding the actual failing code.
use Dancer2::Plugin::Database;
<Rest of the code to create the insert parameter>
eval{
database->quick_insert("build",$job);
$job->{idbuild}=database->last_insert_id(undef,undef,undef,"idbuild");
if ($job->{idbuild}==0){
my $build=database->quick_select("build",$job);
$job->{idbuild}=$build->{idbuild};
}
};
debug ("=================Scheduler build Insert=======================*** ERROR :Got Error",$#) if $#;
Note: I am using Dancer's Database plugin. Whose description says,
Provides an easy way to obtain a connected DBI database handle by
simply calling the database keyword within your Dancer2 application
Returns a Dancer::Plugin::Database::Core::Handle object, which is a
subclass of DBI's DBI::db connection handle object, so it does
everything you'd expect to do with DBI, but also adds a few
convenience methods. See the documentation for
Dancer::Plugin::Database::Core::Handle for full details of those.
I've never heard of this type of problem before, but I suspect your closing note may be the key. Dancer::Plugin::Database transparently manages database handles for you behind the scenes. This can be awfully convenient... but it also means that you could change from using one dbh to using a different dbh at any time. From the docs:
Calling database will return a connected database handle; the first time it is called, the plugin will establish a connection to the database, and return a reference to the DBI object. On subsequent calls, the same DBI connection object will be returned, unless it has been found to be no longer usable (the connection has gone away), in which case a fresh connection will be obtained.
(emphasis mine)
And, as ysth has pointed out in comments on your question, last_insert_id is handle-specific, which suggests that, when you get 0, that's likely to be due to the handle changing on you.
But there is hope! Continuing on in the D::P::DB docs, there is a database_connection_lost hook available which is called when the database connection goes away and receives the defunct handle as a parameter, which would allow you to check and record last_insert_id within the hook's callback sub. This could provide a way for you to get the id without the additional query, although you'd first have to work out a means of getting that information from the callback to your main processing code.
The other potential solution, of course, would be to not use D::P::DB and manage your database connections yourself so that you have direct control over when new connections are created.
I'm fixing a bug in a proprietary piece of software, where I have some kind of JDBC Connection (pooled or not, wrapped or not,...). I need to detect if it is a MySQL connection or not. All I can use is an SQL query.
What would be an SQL query that succeeds on MySQL each and every time (MySQL 5 and higher is enough) and fails (Syntax error) on every other database?
The preferred way, using JDBC Metadata...
If you have access to a JDBC Connection, you can retrieve the vendor of database server fairly easily without going through an SQL query.
Simply check the connection metadata:
string dbType = connection.getMetaData().getDatabaseProductName();
This will should give you a string that beings with "MySQL" if the database is in fact MySQL (the string can differ between the community and enterprise edition).
If your bug is caused by the lack of support for one particular type of statement which so happens that MySQL doesn't support, you really should in fact rely on the appropriate metadata method to verify support for that particular feature instead of hard coding a workaround specifically for MySQL. There are other MySQL-like databases out there (MariaDB for example).
If you really must pass through an SQL query, you can retrieve the same string using this query:
SELECT ##version_comment as 'DatabaseProductName';
However, the preferred way is by reading the DatabaseMetaData object JDBC provides you with.
Assuming your interesting preconditions (which other answers try to work around):
Do something like this:
SELECT SQL_NO_CACHE 1;
This gives you a single value in MySQL, and fails in other platforms because SQL_NO_CACHE is a MySQL instruction, not a column.
Alternatively, if your connection has the appropriate privileges:
SELECT * FROM mysql.db;
This is an information table in a database specific to MySQL, so will fail on other platforms.
The other ways are better, but if you really are constrained as you say in your question, this is the way to do it.
MySql may be the only db engine that uses backticks. That means something like this should work.
SELECT count(*)
FROM `INFORMATION_SCHEMA.CHARACTER_SETS`
where 1=3
I might not have the backticks in the right spot. Maybe they go like this:
FROM `INFORMATION_SCHEMA`.`CHARACTER_SETS`
Someone who works with MySql would know.
I'm getting data from an MSSQL DB ("A") and inserting into a MySQL DB ("B") using the date created in the MSSQL DB. I'm doing it with simple logics, but there's got to be a faster and more efficient way of doing this. Below is the sequence of logics involved:
Create one connection for MSSQL DB and one connection for MySQL DB.
Grab all of data from A that meet the date range criterion provided.
Check to see which of the data obtained are not present in B.
Insert these new data into B.
As you can imagine, step 2 is basically a loop, which can easily max out the time limit on the server, and I feel like there must be a way of doing this must faster and during when the first query is made. Can anyone point me to right direction to achieve this? Can you make "one" connection to both of the DBs and do something like below?
SELECT * FROM A.some_table_in_A.some_column WHERE
"it doesn't exist in" B.some_table_in_B.some_column
A linked server might suit this
A linked server allows for access to distributed, heterogeneous
queries against OLE DB data sources. After a linked server is created,
distributed queries can be run against this server, and queries can
join tables from more than one data source. If the linked server is
defined as an instance of SQL Server, remote stored procedures can be
executed.
Check out this HOWTO as well
If I understand your question right, you're just trying to move things in the MSSQL DB into the MySQL DB. I'm also assuming there is some sort of filter criteria you're using to do the migration. If this is correct, you might try using a stored procedure in MSSQL that can do the querying of the MySQL database with a distributed query. You can then use that stored procedure to do the loops or checks on the database side and the front end server will only need to make one connection.
If the MySQL database has a primary key defined, you can at least skip step 3 ("Check to see which of the data obtained are not present in B"). Use INSERT IGNORE INTO... and it will attempt to insert all the records, silently skipping over ones where a record with the primary key already exists.