Spring JPA and Hibernate slow query log - mysql

I have a Spring JPA application that uses Hibernate and MySQL. I would like to log slow-running database queries in this application. Hibernate provides the option 'hibernate.session.events.log.LOG_QUERIES_SLOWER_THAN_MS' (with log level org.hibernate.SQL_SLOW: INFO). However, this option logs the SQL with the parameter values filled in.
Is there an option in Hibernate to replace the parameter values with '?'?
ex: insert into customer (name) values (?)
instead of
insert into customer (name) values ('John')
FYI, setting the log level on org.hibernate.SQL to DEBUG will log the SQL with ? instead of parameter values, but it prints every SQL statement to the logs. In my case I only want to log slow-running queries.
Appreciate any help.

(Maybe Hibernate is doing more than necessary...)
MySQL optionally creates a "slowlog" that contains details (including 'John') on each slow query. However, the next thing that can be done is to "digest" the log. This will replace strings and numbers with ? or S (for string). So, if you can bypass Hibernate and get the log, use pt-query-digest or mysqldumpslow -s t to do the digesting. This gives a much shorter output file (by combining queries with the same 'digest' or 'signature').
The underlying setting is long_query_time; it is a number of "seconds" and can be a fractional value such as "0.5" for half a second. Setting it to 0 would capture all queries, which it sounds like you don't want. (I almost never go much below 1.)
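To make the "digest" idea concrete, here is a rough sketch of how literals could be replaced with ? and identical signatures counted. It is a simplified illustration, not the algorithm that pt-query-digest or mysqldumpslow actually uses; the function name digest and the sample queries are made up for the example.

```python
import re
from collections import Counter

# Simplified, hypothetical fingerprinting. Real digest tools handle many more
# cases (comments, IN-lists, doubled quotes inside strings, and so on).
_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")   # single-quoted string literals
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")   # standalone numeric literals


def digest(sql):
    """Replace string and numeric literals with '?' and normalize spacing/case."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    return " ".join(sql.split()).lower()


slow_queries = [
    "insert into customer (name) values ('John')",
    "INSERT INTO customer (name)  VALUES ('Mary')",
    "select * from customer where id = 42",
]

# Queries that differ only in their literals collapse into one signature.
counts = Counter(digest(q) for q in slow_queries)
for signature, n in counts.most_common():
    print(n, signature)
```

Grouping by signature is what makes the digested output so much shorter than the raw slow log.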

Related

How does Hibernate get the AutoIncrement Value on Identity Insert

I am working on a high scale application of the order of 35000 Qps, using Hibernate and MySQL.
A large table has an auto-increment primary key, and the generation strategy defined in Hibernate is IDENTITY. Show SQL is set to true as well.
Whenever an insert happens, I see only one query being fired in the DB, which is an INSERT statement.
A few questions follow:
1) How does Hibernate get the auto-increment value after an insert?
2) If the answer is "SELECT LAST_INSERT_ID()", why does it not show up in VividCortex or in the Show SQL logs?
3) How does "SELECT LAST_INSERT_ID()" account for multiple auto-increments in different tables?
4) If MySQL returns a value on insert, why aren't the MySQL clients built so that we can see what is being returned?
Thanks in Advance for all the help.
You should call SELECT LAST_INSERT_ID().
Practically, you can't do the same thing as the MySQL JDBC driver using another MySQL client. You'd have to write your own client that reads and writes the MySQL protocol.
The MySQL JDBC driver gets the last insert id by parsing packets of the MySQL protocol. The last insert id is returned in the packet the server sends back in response to the INSERT.
This is why SELECT LAST_INSERT_ID() doesn't show up in query metrics. The driver is not calling that SQL statement; it's picking the integer out of the server's response at the protocol level.
You asked how it's done internally. A relevant line of code is https://github.com/mysql/mysql-connector-j/blob/release/8.0/src/main/protocol-impl/java/com/mysql/cj/protocol/a/result/OkPacket.java#L55
Basically, it parses an integer from a known position in a packet as it receives the server's response.
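As a rough illustration of what "parsing an integer from a known position" means, the sketch below decodes the payload of a MySQL OK packet: a 0x00 header byte, followed by affected_rows and last_insert_id as length-encoded integers. This is a hand-written sketch and not the driver's code; the function names and the example payload are made up, and the 4-byte packet header (length plus sequence id) is assumed to have been stripped already.

```python
from typing import Tuple


def read_lenenc_int(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a MySQL length-encoded integer; return (value, next_position)."""
    first = buf[pos]
    if first < 0xFB:                      # value stored in the byte itself
        return first, pos + 1
    if first == 0xFC:                     # 2-byte little-endian value follows
        return int.from_bytes(buf[pos + 1:pos + 3], "little"), pos + 3
    if first == 0xFD:                     # 3-byte little-endian value follows
        return int.from_bytes(buf[pos + 1:pos + 4], "little"), pos + 4
    if first == 0xFE:                     # 8-byte little-endian value follows
        return int.from_bytes(buf[pos + 1:pos + 9], "little"), pos + 9
    raise ValueError("not a length-encoded integer")


def parse_ok_payload(payload: bytes) -> dict:
    """Extract affected_rows and last_insert_id from an OK packet payload."""
    if payload[0] != 0x00:
        raise ValueError("not an OK packet (this sketch only handles header 0x00)")
    affected_rows, pos = read_lenenc_int(payload, 1)
    last_insert_id, pos = read_lenenc_int(payload, pos)
    return {"affected_rows": affected_rows, "last_insert_id": last_insert_id}


# Hand-built example payload: header, affected_rows=1, last_insert_id=1000
# (0xFC + 2 bytes), then status flags and warning count.
example = bytes([0x00, 0x01, 0xFC, 0xE8, 0x03, 0x02, 0x00, 0x00, 0x00])
print(parse_ok_payload(example))
```

Because the id arrives with the response to the INSERT itself, no second statement is needed and nothing extra shows up in query logs.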
I'm not going to go into any more detail about parsing the protocol. I don't have experience coding a MySQL protocol client, and it's not something I wish to do.
I think it would not be a good use of your time to implement your own MySQL client.
1) It probably uses the standard JDBC mechanism to get generated values.
2) It's not.
3) You execute it immediately after inserting into one table, and you thus get the values that have been generated by that insert. But that's not what is being used, so it's irrelevant.
4) Not sure what you mean by that: the MySQL JDBC driver allows doing that, using the standard JDBC API.
(Too long for a comment.)
SELECT LAST_INSERT_ID() uses the value already available in the connection. (This may explain its absence from any log.)
Each table has its own auto_inc value.
(I don't know any details about Hibernate.)
35K qps is possible, but it won't be easy.
Please give us more details on the queries -- SELECTs? writes? 35K INSERTs?
Are you batching the inserts in any way? You will need to do such.
What do you then use the auto_inc value in?
Do you use BEGIN..COMMIT? What value of autocommit?

Pentaho PDI (Spoon): MySQL table output very slow (~2000 rows/s)

My table output step is terribly slow (~2,000 rows/second) compared to the input (100,000-200,000 rows/second). The MySQL server is not the problem: using native MySQL, e.g. with the "Execute SQL script" step, I get hundreds of thousands of rows per second. I already tried (without success) the common solution of extending the SQL options by:
useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true
I also varied the commit size parameter (100, 1000, 10000), and "Use batch updates for inserts" is enabled, also without success. What else can I do? I have tables with ~10,000,000 rows and Pentaho runs on a very powerful machine, so this is not acceptable.
For this, I think the ideal step is the MySQL Bulk Loader step, which is listed under the Bulk loading section. Along with that, use the options you mentioned
useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true
in the JDBC options of the connection.
useCompression compresses the traffic between the client and the MySQL server,
whereas the other two make the driver form INSERT INTO tbl (a,b) VALUES (1,'x'),(2,'y'),(3,'z'); instead of sending a separate INSERT statement for each row.
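The effect of batch rewriting can be sketched as follows: rows that would each have been their own INSERT are grouped into multi-row statements. This only illustrates the idea and is not the JDBC driver's implementation; rewrite_batch and its batch_size parameter are hypothetical names.

```python
from typing import Iterable, List, Sequence, Tuple


def rewrite_batch(table: str, columns: Sequence[str],
                  rows: Iterable[Sequence[object]],
                  batch_size: int = 1000) -> List[Tuple[str, list]]:
    """Group rows into multi-row INSERT statements with '?' placeholders."""
    rows = list(rows)
    col_list = ", ".join(columns)
    row_marks = "(" + ", ".join("?" for _ in columns) + ")"
    statements = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        sql = (f"INSERT INTO {table} ({col_list}) VALUES "
               + ", ".join([row_marks] * len(chunk)))
        # Flatten the chunk so the parameters line up with the placeholders.
        params = [value for row in chunk for value in row]
        statements.append((sql, params))
    return statements


stmts = rewrite_batch("tbl", ["a", "b"], [(1, "x"), (2, "y"), (3, "z")], batch_size=2)
for sql, params in stmts:
    print(sql, params)
```

Fewer, larger statements mean fewer round trips to the server, which is where most of the time goes when rows are inserted one by one.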
Follow these steps:
1) Increase the RAM size for PDI (a.k.a. Spoon).
2) Run your job or transformation using a command-line utility such as Kitchen or Pan.
3) Now compare the speed.
Cheers!

Why does enabling server side prepared statements result in CONCUR_UPDATABLE error?

When I enable server-side prepared statements via the useServerPrepStmts JDBC flag, result set update operations fail after the first request for a given query with:
Result Set not updatable.This result set must come from a statement
that was created with a result set type of ResultSet.CONCUR_UPDATABLE,
the query must select only one table, can not use functions and must
select all primary keys from that table
When I disable server-side prepared statements, result set update operations work flawlessly.
Since the query involves only 1 table, has a primary key, returns a single row, and no functions are involved, what must be happening is that the prepared statement is created with ResultSet.CONCUR_READ_ONLY and then cached server-side. Subsequent requests for the same query will draw the prepared statement from the cache and then, even though the client sends ResultSet.CONCUR_UPDATABLE for rs.updateRow(), concurrency is still set to ResultSet.CONCUR_READ_ONLY on the server.
If I am correct in above assumption, how does one override the server-side cache in this case? (everything else is fine with prepared statement caching, just result set row operations are affected).
Linux (CentOS 5.7) with:
mysql-connector-java 5.1.33
mysql 5.6.20
EDIT
(Not relevant.) I notice that the first query, which always succeeds, has this in the query log: SET SQL_SELECT_LIMIT=1, and all subsequent queries fail with this: SET SQL_SELECT_LIMIT=DEFAULT. Not sure if this is the cause of the problem, or just a side effect. Guess I'll try to manually setFetchSize on the client and see if that makes a difference...
The workaround is to append FOR UPDATE to the select statement that was previously prepared as ResultSet.CONCUR_READ_ONLY, and to create the resulting new prepared statement on the client with ResultSet.CONCUR_UPDATABLE concurrency. This allows for server statement caching while still being able to modify a JDBC ResultSet.
Side note:
the select ... for update statement itself does not appear to be eligible for caching; i.e. query log shows Prepare and Execute lines on every request.

How to insert data to mysql directly (not using sql queries)

I have a MySQL database that I use only for logging. It consists of several simple look-alike MyISAM tables. There is always one local (i.e. located on the same machine) client that only writes data to db and several remote clients that only read data.
What I need is to insert bulks of data from local client as fast as possible.
I have already tried many approaches to make this faster such as reducing amount of inserts by increasing the length of values list, or using LOAD DATA .. INFILE and some others.
Now it seems to me that I've come up against the cost of parsing values from strings into their target data types (it doesn't matter whether this happens when parsing queries or a text file).
So the question is:
does MySQL provide some means of manipulating data directly for local clients (i.e. not using SQL)? Maybe there is some API that allow inserting data by simply passing a pointer.
Once again: I don't want to optimize SQL code or invoke the same queries in a script as hd1 advised. What I want is to pass a buffer of data directly to the database engine. This means I don't want to invoke SQL at all. Is it possible?
Use mysql's LOAD DATA command:
Write the data to a file in CSV format, then execute this statement:
LOAD DATA INFILE 'somefile.csv' INTO TABLE mytable
For more info, see the documentation
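A minimal sketch of the file-based approach, assuming the rows are already in memory: write them out with Python's csv module and build the matching statement. Note that LOAD DATA defaults to tab-separated fields, so a comma-separated file needs an explicit FIELDS clause. The file location and sample rows are made up, mytable comes from the answer above, and the server may restrict which directories it will read from.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the second one contains a comma and will be quoted.
rows = [(1, "started"), (2, "a, b")]

path = os.path.join(tempfile.mkdtemp(), "somefile.csv")
with open(path, "w", newline="") as f:
    csv.writer(f, lineterminator="\n").writerows(rows)

# Statement text only; it still has to be sent through a MySQL client.
load_sql = (
    f"LOAD DATA INFILE '{path}' INTO TABLE mytable "
    "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n'"
)
print(load_sql)
```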
Other than LOAD DATA INFILE, I'm not sure there is any other way to get data into MySQL without using SQL. If you want to avoid parsing multiple times, you should use a client library that supports parameter binding, so that the query can be parsed and prepared once and executed multiple times with different data.
However, I highly doubt that parsing the query is your bottleneck. Is this a dedicated database server? What kind of hard disks are being used? Are they fast? Does your RAID controller have battery backed RAM? If so, you can optimize disk writes. Why aren't you using InnoDB instead of MyISAM?
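The parameter-binding pattern mentioned above looks roughly like this: one statement text, many rows of values. The sketch uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; with a MySQL client library the shape is the same, though the placeholder style may differ and whether the statement is really prepared on the server depends on the driver. The table and column names are made up.

```python
import sqlite3

# sqlite3 is a stand-in here, not MySQL; it just keeps the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ts INTEGER, msg TEXT)")

rows = [(i, f"event {i}") for i in range(1000)]

# One statement text, bound against every row; the values are passed
# separately instead of being formatted into the SQL string.
with conn:  # commits once at the end of the block
    conn.executemany("INSERT INTO log (ts, msg) VALUES (?, ?)", rows)

inserted = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print(inserted)
```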
With MySQL you can insert multiple tuples with one insert statement. I don't have an example, because I did this several years ago and don't have the source anymore.
As mentioned, consider using one INSERT with multiple values:
INSERT INTO table_name (col1, col2) VALUES (1, 'A'), (2, 'B'), (3, 'C'), ( ... )
This leads to you only having to connect to your database with one bigger query instead of several smaller. It's easier to take in the entire couch through the door once than running back and forth with all disassembled pieces of the couch, opening the door every time. :)
Apart from that, you can also run LOCK TABLES table_name WRITE before the INSERT and UNLOCK TABLES afterwards. That ensures nothing else is inserted in the meantime. (See LOCK TABLES in the MySQL documentation.)
INSERT into foo (foocol1, foocol2) VALUES ('foocol1val1', 'foocol2val1'),('foocol1val2','foocol2val2') and so on should sort you. More information and sample code will be found here. If you have further problems, do leave a comment.
UPDATE
If you don't want to use SQL, then try this shell script to do as many inserts as you want. Put it in a file, say insertToDb.sh, and get on with your day/evening:
#!/bin/sh
# The two arguments are pasted into the SQL as-is, so string values must arrive already quoted.
mysql --user=me --password=foo dbname -h foo.example.com -e "insert into tablename (col1, col2) values ($1, $2);"
Invoke as sh insertToDb.sh col1value col2value. If I've still misunderstood your question, leave another comment.
After some investigation, I found no way of passing data directly to the MySQL database engine (without parsing it).
My aim was to speed up communication between the local client and the db server as much as possible. The idea was that a local client could use some API functions to pass data to the db engine, thus not using (i.e. parsing) SQL and the values in it. The closest solution was proposed by bobwienholt (using a prepared statement and binding parameters). But LOAD DATA .. INFILE appeared to be a bit faster in my case.
The best way to insert data in MS SQL without using insert into or update queries is just to use the MS SQL interface. Right-click on the table name and select "Edit top 200 rows". Then you will be able to add data to the database directly by typing into each cell. To enable searching or to use select or other SQL commands, right-click on any of the 200 rows you have selected, go to Pane, then select SQL, and you can add an SQL command. Check it out. :D
Without using an insert statement, use "SQLite Studio" for inserting data in MySQL. It's free and open source, so you can download it and check.

Is there a way to filter a SQL Profiler trace?

I'm trying to troubleshoot this problem using SQL Profiler (SQL 2008).
After a few days running the trace in production, the error finally happened again, and now I'm trying to diagnose the cause. The problem is that the trace has 400k rows, 99.9% of which are coming from "Report Server", which I don't even know why it's on, but it seems to be pinging SQL Server every second...
Is there any way to filter out some records from the trace, to be able to look at the rest?
Can I do this with the current .trc file, or will I have to run the trace again?
Are there other applications to look at the .trc file that can give me this functionality?
You can load a captured trace into SQL Server Profiler: Viewing and Analyzing Traces with SQL Server Profiler.
Or you can load into a tool like ClearTrace (free version) to perform workload analysis.
You can load into a SQL Server table, like so:
SELECT * INTO TraceTable
FROM ::fn_trace_gettable('C:\location of your trace output.trc', default)
Then you can run a query to aggregate the data such as this one:
SELECT
    COUNT(*) AS TotalExecutions,
    EventClass,
    CAST(TextData AS nvarchar(2000)),
    SUM(Duration) AS DurationTotal,
    SUM(CPU) AS CPUTotal,
    SUM(Reads) AS ReadsTotal,
    SUM(Writes) AS WritesTotal
FROM
    TraceTable
GROUP BY
    EventClass,
    CAST(TextData AS nvarchar(2000))
ORDER BY
    ReadsTotal DESC
Also see: MS SQL Server 2008 - How Can I Log and Find the Most Expensive Queries?
It is also common to set up filters for the captured trace before starting it. For example, a commonly used filter is to limit to only events which require more than a certain number of reads, say 5000.
Load the .trc locally, then use the save-to-database option to save it to a local db, and then query to your heart's content.
These suggestions are great for an existing trace - if you want to filter the trace as it occurs, you can set up event filters on the trace before you start it.
The most useful filter in my experience is application name. To do this you have to ensure that every connection string used to connect to your database has an appropriate Application Name value in it, i.e.:
"...Server=MYDB1;Integrated Security=SSPI;Application Name=MyPortal;..."
Then in the trace properties for a new trace, select the Events Selection tab, then click Column Filters...
Select the ApplicationName filter, and add values to LIKE to include only the connections you have indicated, i.e. using MyPortal in the LIKE field will only include events for connections that have that application name.
This will stop you from collecting all the crud that Reporting Services generates, for example, and make subsequent analysis a lot faster.
There are a lot of other filters available as well, so if you know what you are looking for, such as long execution (Duration) or large IO (Reads, Writes) then you can filter on that as well.
Since SQL Server 2005, you can filter a .trc file content, directly from SQL Profiler; without importing it to a SQL table. Just follow the procedure suggested here:
http://msdn.microsoft.com/en-us/library/ms189247(v=sql.90).aspx
An additional hint: you can use '%' as a filter wildcard. For instance, if you want to filter by HOSTNAME like SRV, then you can use SRV%.
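To show how the '%' wildcard behaves, here is a small sketch that mimics the matching on a list of host names. It only models '%' and assumes case-insensitive matching; like_to_regex and the sample host names are made up for the example and are not part of SQL Profiler.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern where '%' matches any run of characters."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    # Anchor both ends so 'SRV%' means "starts with SRV", not "contains SRV".
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE | re.DOTALL)


hosts = ["SRV01", "SRV-DB", "WEB01", "MYSRV"]
matched = [h for h in hosts if like_to_regex("SRV%").match(h)]
print(matched)
```

A pattern such as %SRV% would also pick up MYSRV, which is why the position of the wildcard matters.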
Here you can find a complete script to query the default trace with the complete list of events you can filter:
http://zaboilab.com/sql-server-toolbox/anayze-sql-default-trace-to-investigate-instance-events
You have to query sys.fn_trace_gettable(@TraceFileName, default), joining sys.trace_events to decode the event numbers.