I am creating a web site using PHP, MySQL and Zend Framework.
When I try to run any SQL query, page generation jumps to around 0.5 seconds. That's too high. If I turn off SQL, page generation drops to 0.001 seconds.
The number of queries I run doesn't really affect the page generation time (tested with 1-10 queries); it stays at around 0.5 seconds.
I can't figure out what I am doing wrong.
I connect to the database in my bootstrap:
protected function _initDatabase()
{
    try
    {
        $config = new Zend_Config_Ini( APPLICATION_PATH . '/configs/application.ini', APPLICATION_ENV );
        $db = Zend_Db::factory( $config->database );
        Zend_Db_Table_Abstract::setDefaultAdapter( $db );
    }
    catch ( Zend_Db_Exception $e )
    {
    }
}
Then I have a simple model
class StandardAccessory extends Zend_Db_Table_Abstract
{
    /**
     * The default table name
     */
    protected $_name = 'standard_accessory';
    protected $_primary = 'model';
    protected $_sequence = false;
}
And finally, inside my index controller, I just run the find method.
require_once APPLICATION_PATH . '/models/StandardAccessory.php';
$sa = new StandardAccessory( );
$stndacc = $sa->find( 'abc' );
All this takes ~0.5 seconds, which is way too long. Any suggestions?
Thanks!
Tips:
Cache the table metadata. By default, Zend_Db_Table tries to discover metadata about the table each time your table object is instantiated. Use a cache to reduce the number of times it has to do this. Or else hard-code it in your Table class (note: db tables are not models).
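For example, a minimal sketch of a file-based metadata cache set up in the bootstrap (class and method names follow the ZF1 manual; the cache directory is an assumption, use any directory writable by the web server):
$cache = Zend_Cache::factory('Core', 'File',
    array('automatic_serialization' => true, 'lifetime' => 3600),
    array('cache_dir' => APPLICATION_PATH . '/../data/cache'));  // assumed location
Zend_Db_Table_Abstract::setDefaultMetadataCache($cache);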
Use EXPLAIN to analyze MySQL's optimization plan. Is it using an index effectively?
mysql> EXPLAIN SELECT * FROM standard_accessory WHERE model = 'abc';
Use BENCHMARK() to measure the speed of the query, not using PHP. The subquery must return a single column, so be sure to return a non-indexed column so the query has to touch the data instead of just returning an index entry.
mysql> SELECT BENCHMARK(1000,
(SELECT nonindexed_column FROM standard_accessory WHERE model = 'abc'));
Note that Zend_Db_Adapter lazy-loads its db connection when you make the first query. So if there's any slowness in connecting to the MySQL server, it'll happen as you instantiate the Table object (when it queries metadata). Any reason this could take a long time? DNS lookups, perhaps?
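One quick way to isolate connection cost (a diagnostic sketch, not a fix) is to force the lazy connection in the bootstrap and time it separately from the queries:
$start = microtime(true);
$db->getConnection();   // forces the adapter to actually connect now
echo 'connect time: ' . (microtime(true) - $start) . " s\n";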
The easiest way to debug this is to profile your SQL queries. You can use FirePHP (a plugin for Firebug); see http://framework.zend.com/manual/en/zend.db.profiler.html#zend.db.profiler.profilers.firebug
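Following that manual page, a minimal sketch of attaching the Firebug profiler to the adapter created in the bootstrap:
$profiler = new Zend_Db_Profiler_Firebug('All DB Queries');
$profiler->setEnabled(true);
$db->setProfiler($profiler);   // every query now shows up in Firebug's console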
Another way to speed things up a little is to cache the metadata of your tables.
See: http://framework.zend.com/manual/en/zend.db.table.html#zend.db.table.metadata.caching
Along with the above suggestions, I did a very unscientific test and found that the PDO adapter was faster for me in my application (I know mysqli is supposed to be faster, but maybe it's the ZF abstraction). I show the results here (the times shown are only good for comparison).
Related
I have a really big table that gains some millions of records every day, and at the end of every day I extract all of the previous day's records. I am doing this like:
String SQL = "select col1, col2, coln from mytable where timecol = yesterday";
statement.executeQuery(SQL);
The problem is that this program takes about 2GB of memory, because it loads all the results into memory and then processes them.
I tried setting Statement.setFetchSize(10), but it takes exactly the same amount of memory from the OS; it does not make any difference. I am using the Microsoft SQL Server 2005 JDBC driver for this.
Is there any way to read the results in small chunks, like the Oracle database driver does, where the query initially shows only a few rows and more results are fetched as you scroll down?
In JDBC, the setFetchSize(int) method is very important to performance and memory-management within the JVM as it controls the number of network calls from the JVM to the database and correspondingly the amount of RAM used for ResultSet processing.
If setFetchSize(10) is being called and the driver is ignoring it, there are probably only two options:
Try a different JDBC driver that will honor the fetch-size hint.
Look at driver-specific properties on the Connection (URL and/or property map when creating the Connection instance).
The RESULT-SET is the number of rows marshalled on the DB in response to the query.
The ROW-SET is the chunk of rows that are fetched out of the RESULT-SET per call from the JVM to the DB.
The number of these calls and resulting RAM required for processing is dependent on the fetch-size setting.
So if the RESULT-SET has 100 rows and the fetch-size is 10,
there will be 10 network calls to retrieve all of the data, using roughly 10*{row-content-size} RAM at any given time.
The default fetch-size varies by driver (Oracle's, for example, defaults to 10) and is usually rather small.
In the case posted, it would appear the driver is ignoring the fetch-size setting and retrieving all data in one call (large RAM requirement, minimal network calls).
What happens underneath ResultSet.next() is that it doesn't actually fetch one row at a time from the RESULT-SET. It fetches rows from the (local) ROW-SET and fetches the next ROW-SET (invisibly) from the server once the local one is exhausted.
All of this depends on the driver as the setting is just a 'hint' but in practice I have found this is how it works for many drivers and databases (verified in many versions of Oracle, DB2 and MySQL).
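To make that concrete, here is a generic JDBC sketch (assuming an open java.sql.Connection named conn; whether the fetch-size hint is honored is still up to the driver):
conn.setAutoCommit(false);                  // some drivers (e.g. PostgreSQL) require this
try (Statement st = conn.createStatement()) {
    st.setFetchSize(1000);                  // rows per round trip, a hint only
    try (ResultSet rs = st.executeQuery(
            "select col1, col2, coln from mytable where timecol = yesterday")) {
        while (rs.next()) {
            // process one row at a time; only about one ROW-SET is held locally
        }
    }
}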
The fetchSize parameter is a hint to the JDBC driver as to how many rows to fetch in one go from the database. But the driver is free to ignore this and do what it sees fit. Some drivers, like the Oracle one, fetch rows in chunks, so you can read very large result sets without needing lots of memory. Other drivers just read in the whole result set in one go, and I'm guessing that's what your driver is doing.
You can try upgrading your driver to the SQL Server 2008 version (which might be better), or the open-source jTDS driver.
You need to ensure that auto-commit on the Connection is turned off, or setFetchSize will have no effect.
dbConnection.setAutoCommit(false);
Edit: Remembered that when I used this fix it was Postgres-specific, but hopefully it will still work for SQL Server.
Statement interface Doc
SUMMARY: void setFetchSize(int rows)
Gives the JDBC driver a hint as to the
number of rows that should be fetched
from the database when more rows are
needed.
Read this ebook J2EE and beyond By Art Taylor
Sounds like the MSSQL JDBC driver is buffering the entire result set for you. You can add a connection string parameter saying selectMethod=cursor or responseBuffering=adaptive. If you are on version 2.0+ of the 2005 MSSQL JDBC driver, then response buffering should default to adaptive.
http://msdn.microsoft.com/en-us/library/bb879937.aspx
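For illustration, a connection URL carrying that parameter might look like this (host, port, database name and credentials are placeholders):
// responseBuffering=adaptive streams rows instead of buffering the whole result set
String url = "jdbc:sqlserver://dbhost:1433;databaseName=mydb;responseBuffering=adaptive";
Connection conn = DriverManager.getConnection(url, "user", "password");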
It sounds to me that you really want to limit the rows being returned in your query and page through the results. If so, you can do something like:
select * from (select rownum myrow, a.* from TEST1 a )
where myrow between 5 and 10 ;
You just have to determine your boundaries.
Try this:
String SQL = "select col1, col2, coln from mytable where timecol = yesterday";
connection.setAutoCommit(false);
PreparedStatement stmt = connection.prepareStatement(SQL,
        SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY,
        SQLServerResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(2000);
stmt.set....
stmt.execute();
ResultSet rset = stmt.getResultSet();
while (rset.next()) {
    // ......
}
I had the exact same problem in a project. The issue is that even though the fetch size might be small enough, the JdbcTemplate reads all the results of your query and maps them into a huge list, which might blow your memory. I ended up extending NamedParameterJdbcTemplate to create a function that returns a Stream of objects. That Stream is based on the ResultSet normally returned by JDBC, but will pull data from the ResultSet only as the Stream requires it. This works as long as you don't keep references to all the objects the Stream emits. I drew a lot of inspiration from the implementation of org.springframework.jdbc.core.JdbcTemplate#execute(org.springframework.jdbc.core.ConnectionCallback). The only real difference has to do with what to do with the ResultSet. I ended up writing this function to wrap up the ResultSet:
private <T> Stream<T> wrapIntoStream(ResultSet rs, RowMapper<T> mapper) {
    CustomSpliterator<T> spliterator = new CustomSpliterator<T>(rs, mapper, Long.MAX_VALUE,
            Spliterator.NONNULL | Spliterator.IMMUTABLE | Spliterator.ORDERED);
    Stream<T> stream = StreamSupport.stream(spliterator, false);
    return stream;
}

private static class CustomSpliterator<T> extends Spliterators.AbstractSpliterator<T> {
    // won't put code for the constructor or properties here
    // the idea is to pull from the ResultSet and push into the Stream
    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        try {
            // you can add some logic to close the stream/ResultSet automatically
            if (rs.next()) {
                T mapped = mapper.mapRow(rs, rowNumber++);
                action.accept(mapped);
                return true;
            } else {
                return false;
            }
        } catch (SQLException e) {
            // do something with this Exception
            throw new RuntimeException(e);
        }
    }
}
You can add some logic to make that Stream "auto-closable"; otherwise, don't forget to close it when you are done.
While running some tests, a little question popped up. When I code database updates, I usually do it via callbacks written in PHP, to which I simply pass a given mysqli connection object as a function argument. Executing all queries of, for example, a three-query sequence across the same single connection proved to be much faster than closing and reopening a DB connection for each query. This also works easily with SQL transactions; the connection can be passed along to callbacks without any issues.
My question is: can you also do this with prepared statement objects? What I mean is, assuming we have successfully established a $conn object representing the mysqli connection, is stuff like this legit?
function select_users( $users_id, $stmt ) {
    $sql = "SELECT username FROM users WHERE ID = ?";
    mysqli_stmt_prepare( $stmt, $sql );
    mysqli_stmt_bind_param( $stmt, "i", $users_id );
    mysqli_stmt_execute( $stmt );
    return mysqli_stmt_get_result( $stmt );
}

function select_labels( $artist, $stmt ) {
    $sql = "SELECT label FROM labels WHERE artist = ?";
    mysqli_stmt_prepare( $stmt, $sql );
    mysqli_stmt_bind_param( $stmt, "s", $artist );
    mysqli_stmt_execute( $stmt );
    return mysqli_stmt_get_result( $stmt );
}

$stmt = mysqli_stmt_init( $conn );
$users = select_users( 1, $stmt );
$rappers = select_labels( "rapperxyz", $stmt );
Or is it bad practice, and should you rather use:
$stmt_users = mysqli_stmt_init( $conn );
$stmt_rappers = mysqli_stmt_init( $conn );
$users = select_users( 1, $stmt_users );
$rappers = select_labels( "rapperxyz", $stmt_rappers );
During testing, I noticed that the method of passing a single statement object along to callbacks works for server calls where I run around 4 not-too-complicated DB queries via the 4 corresponding callbacks in a row.
When I do a server call with around 10 different queries, however, I sometimes (yes, only sometimes, for pretty much the same data across the different executions, which seems like weird behavior to me) get the error "Commands out of sync; you can't run this command now" and some other errors I've never seen before, like the number of variables not matching the number of parameters, although they perfectly do after checking them all. The only way I found to fix this after some research was indeed to use a different statement object for each callback. So I just wondered: should you actually ALWAYS use ONE prepared statement object for ONE query, which you may then execute N times in a row?
Yes.
The "commands out of sync" error is because MySQL protocol is not like http. You can't send requests any time you want. There is state on the server-side (i.e. mysqld) that is expecting a certain sequence of requests. This is what's known as a stateful protocol.
Compare with a protocol like ftp. You can do an ls in an ftp client, but the list of files you get back depends on the current working directory. If you were sharing that ftp client connection among multiple functions in your app, you don't know that another function hasn't changed the working directory. So you can't be sure the file list you get from ls represents the directory you thought you were in.
In MySQL too, there's state on the server side. You can only have one transaction open at a time. You can only have one query executing at a time. The MySQL client does not allow you to execute a new query while there are still rows to be fetched from an in-progress query. See Commands out of sync in the MySQL doc on common errors.
So if you pass your statement handle around to some callback functions, how can that function know it's safe to execute the statement?
IMO, the only safe way to use a statement is to use it immediately.
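As a minimal sketch of that pattern, reusing the table and column names from the question (mysqli_stmt_get_result()/mysqli_fetch_all() require the mysqlnd driver), each call creates, uses and closes its own statement:
function select_users( $conn, $users_id ) {
    $stmt = mysqli_prepare( $conn, "SELECT username FROM users WHERE ID = ?" );
    mysqli_stmt_bind_param( $stmt, "i", $users_id );
    mysqli_stmt_execute( $stmt );
    $rows = mysqli_fetch_all( mysqli_stmt_get_result( $stmt ), MYSQLI_ASSOC );
    mysqli_stmt_close( $stmt );   // release the server-side statement immediately
    return $rows;
}
$users = select_users( $conn, 1 );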
Here's the story. I'm doing some security testing (using zaproxy) of a Laravel (PHP framework) application running with a MySQL database as the primary data store.
Zaproxy is reporting a possible SQL injection for a POST request URL with the following payload:
id[]=3-2&enabled[]=on
Basically, it's an AJAX request to turn on/turn off a particular feature in a list. Zaproxy is fuzzing the request: where the id value is 3-2, there should be an integer - the id of the item to update.
The problem is that this request is working. It should fail, but the code is actually updating the item where id = 3.
I'm doing things the way I'm supposed to: the model is retrieved using Eloquent's Model::find($id) method, passing in the id value from the request (which, after a bit of investigation, was determined to be the string "3-2"). AFAIK, the Eloquent library should be executing the query by binding the ID value to a parameter.
I tried executing the query using Laravel's DB class with the following code:
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3-2"));
and got the row for id = 3.
Then I tried executing the following query against my MySQL database:
SELECT * FROM table WHERE id='3-2';
and it did retrieve the row where id = 3. I also tried it with another value: "3abc". It looks like any value prefixed with a number will retrieve a row.
So ultimately, this appears to be a problem with MySQL. As far as I'm concerned, if I ask for a row where id = '3-2' and there is no row with that exact ID value, then I want it to return an empty set of results.
I have two questions:
Is there a way to change this behaviour? It appears to be at the level of the database server, so is there anything in the database server configuration to prevent this kind of thing?
This looks like a serious security issue to me. Zaproxy is able to inject some arbitrary value and make changes to my database. Admittedly, this is a fairly minor issue for my application, and the (probably) only values that would work will be values prefixed with a number, but still...
SELECT * FROM table WHERE id = ? AND ? REGEXP '^[0-9]+$';
This will be faster than what I suggested in the comments above.
Edit: Ah, I see you can't change the query. Then it is confirmed: you must sanitize the inputs in code. Another very poor and dirty option, if you are in an odd situation where you can't change the query but can change the database, is to change the id field to [VAR]CHAR.
I believe this is due to MySQL automatically converting your strings into numbers when comparing against a numeric data type.
https://dev.mysql.com/doc/refman/5.1/en/type-conversion.html
mysql> SELECT 1 > '6x';
-> 0
mysql> SELECT 7 > '6x';
-> 1
mysql> SELECT 0 > 'x6';
-> 0
mysql> SELECT 0 = 'x6';
-> 1
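Applied to the value from the question, the same implicit cast is what makes the lookup match; a quick check in the mysql client (assuming id is an integer column):
mysql> SELECT '3-2' = 3;
-> 1
mysql> SELECT '3abc' + 0;
-> 3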
You really just want to put armor around MySQL to prevent such a string from being compared. Maybe switch to a different SQL server.
Without rewriting a bunch of code, in all honesty the correct answer is:
This is a non-issue.
Zaproxy even states that it's possibly a SQL injection attack, meaning that it does not know! It never said "umm, yeah, we deleted tables by passing x-y-and-z to your query".
// if this is legal and returns results
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3"));
// then why is it an issue for this
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3-2"));
// to be interpreted as
$result = DB::select("SELECT * FROM table WHERE id=?;", array("3"));
You are parameterizing your queries, so Zaproxy is off its rocker.
Here's what I wound up doing:
First, I suspect that my expectations were a little unreasonable. I was expecting that if I used parameterized queries, I wouldn't need to sanitize my inputs. This is clearly not the case. While parameterized queries eliminate some of the most pernicious SQL injection attacks, this example shows that there is still a need to examine your inputs and make sure you're getting the right stuff from the user.
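For illustration only, an inline version of such a check might look like this (the Item model name is a placeholder; rejecting with ModelNotFoundException is just one option):
// Reject anything that is not a purely numeric ID before it reaches find().
if (!ctype_digit((string) $id)) {
    throw new Illuminate\Database\Eloquent\ModelNotFoundException();
}
$item = Item::find($id);   // now guaranteed to receive a clean integer string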
So, with that said... I decided to write some code to make checking ID values easier. I added the following trait to my application:
trait IDValidationTrait
{
    /**
     * Check the ID value to see if it's valid
     *
     * This is an abstract function because it will be defined differently
     * for different models. Some models have IDs which are strings,
     * others have integer IDs.
     */
    abstract public static function isValidID($id);

    /**
     * Check the ID value & fail (throw an exception) if it is not valid
     */
    public static function validIDOrFail($id)
    {
        ...
    }

    /**
     * Find a model only if the ID matches EXACTLY
     */
    public static function findExactID($id)
    {
        ...
    }

    /**
     * Find a model only if the ID matches EXACTLY or throw an exception
     */
    public static function findExactIDOrFail($id)
    {
        ...
    }
}
Thus, whenever I would normally use the find() method on my model class to retrieve a model, instead I use either findExactID() or findExactIDOrFail(), depending on how I want to handle the error.
Thank you to everyone who commented - you helped me to focus my thinking and to understand better what was going on.
I am using a MySQL database with phpMyAdmin as the frontend (I am not sure I have remote/client access yet). I have a script that queries the database, and I would like to see how long each query takes. What is the easiest way to do this? Could I install another PHP app on the server?
If you have access to MySQL config files, you can enable general query log and slow query log.
See here for details.
Since 5.1.6, I think, you can also do it at runtime:
SET GLOBAL long_query_time = 10; /* which queries are considered long */
SET GLOBAL slow_query_log = 1;
But try it anyway; I don't remember the exact version in which it appeared.
For most applications that I work on, I include query profiling output that can easily be turned on in the development environment. This outputs the SQL, execution time, stack trace and a link to display explain output. It also highlights queries running longer than 1 second.
Although you probably don't need something as sophisticated, you can get a fairly good sense of run time of queries by writing a function in PHP that wraps the query execution, and store debug information on the session (or simply output it). For example:
function run_query($sql, $debug = false, $output = false) {
    $start = microtime(true);
    $q = mysql_query($sql);
    $time = microtime(true) - $start;
    if ($debug) {
        $debug = "$sql<br/>$time<br/><br/>";
        if ($output) {
            print $debug;
        } else {
            $_SESSION['sql_debug'] .= $debug;
        }
    }
    return $q;
}
That's just kind of a rough idea. You can tweak it however you want.
Hope that helps -
You should set long_query_time to 1 since setting it to 10 will exclude most if not all of your queries.
At the mysql prompt, type:
SET GLOBAL long_query_time = 1;
We have a lot of queries
select * from tbl_message
that get stuck on the state "Writing to net". The table has 98k rows.
The thing is... we aren't even executing any query like that from our application, so I guess the question is:
What might be generating the query?
...and why does it get stuck in the state "Writing to net"?
I feel stupid asking this question, but I'm 99.99% sure that our application is not executing a query like that against our database... We are, however, executing a couple of queries against that table using a WHERE clause:
SELECT Count(*) as StrCount FROM tbl_message WHERE m_to=1960412 AND m_restid=948
SELECT Count(m_id) AS NrUnreadMail FROM tbl_message WHERE m_to=2019422 AND m_restid=440 AND m_read=1
SELECT * FROM tbl_message WHERE m_to=2036390 AND m_restid=994 ORDER BY m_id DESC
I have searched our application several times for select * from tbl_message but haven't found anything... But still, the query log on our MySQL server is full of SELECT * FROM tbl_message queries.
Since applications don't magically generate queries as they like, I think it's rather likely that there's a mistake somewhere in your application that's causing this. Here are a few suggestions that you can use to track it down. I'm guessing that you're using PHP, since you're using MySQL, so I'll use that for my examples.
Try adding comments in front of all your queries in the application, like this:
$sqlSelect = "/* file.php, class::method() */";
$sqlSelect .= "SELECT * FROM foo ";
$sqlSelect .= "WHERE criteria";
The comment will show up in your query log. If you're using some kind of database API wrapper, you could potentially add these messages automatically:
function query($sql)
{
    $backtrace = debug_backtrace();
    // The function that executed the query
    $prev = $backtrace[1];
    $newSql = sprintf("/* %s */ ", $prev["function"]);
    $newSql .= $sql;
    mysql_query($newSql) or handle_error();
}
In case you're not using a wrapper, but rather executing the queries directly, you could use the runkit extension and the function runkit_function_rename to rename mysql_query (or whatever you're using) and intercept the queries.
There are (at least) two data retrieval modes for MySQL. With the C API you either call mysql_store_result() or mysql_use_result().
mysql_store_result() returns when all result data has been transferred from the MySQL server to your process' memory, i.e. no data has to be transferred for further calls to mysql_fetch_row().
However, with mysql_use_result() each record has to be fetched individually if and when mysql_fetch_row() is called. If your application does some computing that takes longer than the time period specified in net_write_timeout between two calls to mysql_fetch_row(), the MySQL server considers your connection to be timed out.
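For reference, the PHP-side equivalent of mysql_use_result() is an unbuffered query; a minimal mysqli sketch (assuming a mysqli connection in $conn), where slow per-row processing is exactly what can trip net_write_timeout:
$result = mysqli_query($conn, "SELECT * FROM tbl_message", MYSQLI_USE_RESULT);
while ($row = mysqli_fetch_assoc($result)) {
    // process $row quickly; long pauses here can hit net_write_timeout on the server
}
mysqli_free_result($result);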
Temporarily enable the query log by putting
log=
into your my.cnf file, restart MySQL, and watch the query log for those mystery queries (you don't have to give the log file a name; it'll derive one from the host name).