Symfony3: Doctrine batch processing with exception handling

I need to insert multiple users from an Excel file using a Symfony3 command. I have read the following article about batch processing: http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/batch-processing.html
I have been wondering whether there is a way to not stop the flushing process when a query fails (because of a NOT NULL column, for instance). I would actually like to be able to skip checking all my data before doing the persist and let Doctrine continue the inserts, even if one query out of a flush of, let's say, 20 queries fails.
Thank you for your help.
Kind regards,

This template may help you go further...
$batchSize = 20;
$currentSize = 0;
$data = [ .... ];
foreach ($data as $item) {
    $entity = new Entity();
    $entity->setProperty($item['property']);
    try {
        $currentSize++;
        $em->persist($entity);
        // Flush and detach once per full batch, not per entity
        if (($currentSize % $batchSize) === 0) {
            $em->flush();
            $em->clear();
        }
    } catch (\Doctrine\ORM\ORMException $e) {
        // Skip the failing entity and keep the batch count consistent
        $currentSize--;
    }
}
// Flush whatever is left over from the last, partial batch
$em->flush();
$em->clear();
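One caveat the template glosses over: when flush() itself throws, Doctrine closes the EntityManager, and every later persist()/flush() on it fails with "The EntityManager is closed". Below is a minimal sketch of how you might recover inside a Symfony3 command, assuming you can inject a Doctrine\Common\Persistence\ManagerRegistry. It catches \Exception broadly (a NOT NULL violation surfaces as a DBAL exception, not an ORMException) and flushes per entity, trading throughput for the ability to skip bad rows:

use Doctrine\Common\Persistence\ManagerRegistry;

// $registry is an injected ManagerRegistry (assumed available in the command)
$em = $registry->getManager();
foreach ($data as $item) {
    $entity = new Entity();
    $entity->setProperty($item['property']);
    try {
        $em->persist($entity);
        $em->flush(); // one flush per entity, so one bad row fails alone
    } catch (\Exception $e) {
        // The failed flush closed the EntityManager; swap in a fresh one
        $registry->resetManager();
        $em = $registry->getManager();
    }
}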

Related

Improving performance on a large Doctrine query

In Symfony3, I'm using Doctrine's QueryBuilder to iterate up to 500k rows from my 35 million row table:
$query = $this->createQueryBuilder('l')
    ->where('l.foo = :foo')
    ->setParameter('foo', $foo)
    ->getQuery();
$results = $query->iterate();
foreach ($results as $result) {
    $em->clear();
    // My logic using $result[0]
}
The memory usage of this often approaches 512 MB before I even begin to iterate. Is there any further way I can optimise this? Am I correct in reading that hydration is turned off when iterating a query?
I had great results with generators. Perhaps processing results in a separate method helps PHP clean up unused objects. I'm not sure what you're doing to process your records, and I cannot guarantee you'll get the same results, but in my case memory consumption remained constant through the whole script execution:
public function getMyResults($foo)
{
    $query = $this->createQueryBuilder('l')
        ->where('l.foo = :foo')
        ->setParameter('foo', $foo)
        ->getQuery();

    foreach ($query->iterate() as $result) {
        yield $result[0];
        // Detach everything read so far to keep memory flat
        $this->getEntityManager()->clear();
    }
}

public function processMyResults($foo)
{
    foreach ($this->getMyResults($foo) as $result) {
        // process each entity here
    }
}
If this doesn't help, consider querying with DBAL or PDO directly (both offer a fetch() method to avoid loading all records at once). Doctrine's iterator might leak memory (PDO's result set shouldn't).
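For instance, a rough sketch of the DBAL route, assuming a Doctrine\DBAL\Connection in $conn and that the underlying table is called log (with the MySQL driver you may additionally need to disable buffered queries via PDO::MYSQL_ATTR_USE_BUFFERED_QUERY for true streaming):

$stmt = $conn->executeQuery('SELECT * FROM log WHERE foo = ?', array($foo));
while ($row = $stmt->fetch(\PDO::FETCH_ASSOC)) {
    // $row is a plain array: no entity hydration, no UnitOfWork bookkeeping
}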
Doctrine will solve 80% of your problems. The remaining 20% is better approached without it.
Am I correct in reading that hydration is turned off when iterating a query?
No, unless you change the hydration mode. You can do it by passing a second argument to the iterate() method.
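For example (a sketch; Query::HYDRATE_ARRAY yields plain arrays instead of managed entities, which is considerably cheaper):

use Doctrine\ORM\Query;

// second argument to iterate() selects the hydration mode
foreach ($query->iterate(null, Query::HYDRATE_ARRAY) as $row) {
    // $row[0] is now a plain associative array, not a managed entity
}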
Example from the Doctrine docs:
$batchSize = 20;
$i = 1;
$q = $em->createQuery('select u from MyProject\Model\User u');
foreach ($q->toIterable() as $user) {
    $user->increaseCredit();
    $user->calculateNewBonuses();
    ++$i;
    if (($i % $batchSize) === 0) {
        $em->flush(); // Executes all updates.
        $em->clear(); // Detaches all objects from Doctrine!
    }
}
$em->flush();

Stop MySQL query when the Stop button is pressed

I am stuck on this problem; I found many posts about it, but none seemed useful. So I am posting again here, hoping someone can help me.
Let's say I have two buttons, a Start button and a Stop button. Pressing Start calls an AJAX function that runs a very long query. When I press Stop, I need the query to stop immediately and not execute any further.
This is the function used to run the query and fetch rows (a customized Mysqli.php):
public function fetchMultiRowset($params = array()) {
    $data = array();
    $mysqli = $this->_adapter->getConnection();
    $mysqli->multi_query($this->bindParams($this->_sql, $params));
    $thread_id = mysqli_thread_id($mysqli);
    ignore_user_abort(true);
    ob_start();
    $index = 0;
    do {
        if ($result = $mysqli->store_result()) {
            while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
                $data[$index] = $row;
                $index++;
                echo " ";
                ob_flush();
                flush();
            }
            $result->free();
        }
    } while ($mysqli->more_results() && $mysqli->next_result());
    ob_end_flush();
    return $data;
}
Function in Model:
public function select_entries() {
    $data = null;
    try {
        $db = Zend_Db_Adapter_Mysqlicustom::singleton();
        $sql = "SELECT * FROM report LIMIT 2000000";
        $data = $db->fetchMultiRowset($sql);
        $db->closeConnection();
    } catch (Exception $exc) {
        // exceptions are silently ignored here
    }
    return $data;
}
Controller:
public function testAction(){
    $op = $this->report_test->select_entries();
}
In the AJAX code I used xhr.abort() to stop the AJAX call, but the query keeps running after the call is aborted.
How do I stop the query? I am using Zend Framework.
EDIT: I did not look at your program in detail at first; now I see that it is not the query itself that takes so long, but reading all the data. So just check, every 1000 rows, whether the AJAX call is still active (see Ajax Abort).
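A sketch of that check, dropped into the fetch loop of fetchMultiRowset() above (this relies on the ignore_user_abort(true) and echo/flush already present, so PHP can notice the client disconnecting):

if ($index % 1000 === 0) {
    echo ' ';
    ob_flush();
    flush();
    if (connection_aborted()) {
        // The client aborted the AJAX call: stop reading
        $result->free();
        break;
    }
}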
Solution in case of a long-running SQL query:
You would have to allow the application to kill database queries, and you need to implement a more complex interaction between client and server, which could lead to security holes if done wrong.
The Start request should contain a session and a page id (a secure id, so not 3, 4, 5 but a non-guessable, unique hash of some kind). The backend then links this id to the query. This could be done in an extra table in the database, but also via a comment in the SQL query itself, e.g. "Session fid98a08u4j, Page 940jfmkvlz" => s:<session>p:<page>:
/* s:fid98a08u4jp:940jfmkvlz */ select * from ...
If the user presses Stop, you send the session and page id to the server. The PHP code then fetches the list of running SQL queries and searches it for the session and page tag to extract the query id.
Then PHP sends a
kill query <id>
to the MySQL server.
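A rough sketch of such a kill endpoint, assuming a separate PDO connection in $pdo whose user may read the process list and kill queries (KILL QUERY stops the statement but keeps the victim's connection alive):

$tag = sprintf('s:%sp:%s', $sessionId, $pageId); // the comment tag from above
$stmt = $pdo->prepare('SELECT id FROM information_schema.processlist WHERE info LIKE ?');
$stmt->execute(array('%' . $tag . '%'));
if ($threadId = $stmt->fetchColumn()) {
    $pdo->exec('KILL QUERY ' . (int) $threadId);
}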
This might lead to trouble when you are not using transactions, and it might damage replication. Even a kill query may take some time while in the 'killing' state.
So before going this route, be sure that you cannot split the long-running query into sub-queries that check every few seconds whether the request is still valid, and that you are not killing the query for purely cosmetic reasons.

PHP PDO - Testing connection before doing query?

public function smart_query($query, $options = null, $bindoptions = null)
{
    // Code to reconnect in case of a timeout
    try {
        $this->db->query('SELECT * FROM templates');
    } catch (PDOException $e) {
        echo $e;
        $pdooptions = array(
            PDO::ATTR_PERSISTENT => true,
            PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
        );
        $this->db = new PDO("mysql:host=localhost;dbname=$this->database", "$this->username", "$this->password", $pdooptions);
    }

    $this->statement = $this->db->prepare($query);
    if ($bindoptions != null) {
        $this->bind($bindoptions);
    }
    $this->execute();

    if ($options != null) {
        // Return single row
        if ($options['rows'] == 1) {
            return $this->statement->fetch(PDO::FETCH_ASSOC);
        }
        // Return multiple rows
        elseif ($options['rows'] != 1) {
            return $this->statement->fetchAll(PDO::FETCH_ASSOC);
        }
    }
}
I saw this code today and got really confused.
It looks like the author is trying to run a simple query before doing the actual query.
Why is he checking whether the connection is still open?
I thought PDO only destroys its connection automatically when the script finishes executing.
Is it correct to check whether it's open or closed?
This implements a form of lazy loading.
The first time a query is executed through this class/function, the database connection may not be established yet. That is the purpose of this check, so that the consumer (you) does not have to worry about it.
The connection is then stored in the $this->db class member for future reuse when you call this method again in the course of your script (and yes, this connection will stay open until the script ends, unless it is closed explicitly beforehand, of course).
For information, this check is slightly inefficient: a simple $this->db->query('SELECT 1') would suffice, without the need to read a table at all.
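A leaner variant of the same check might look like this (a sketch, reusing the class members assumed by the original code):

try {
    $this->db->query('SELECT 1'); // cheap probe, touches no table
} catch (PDOException $e) {
    // Reconnect with the same options as before
    $this->db = new PDO(
        "mysql:host=localhost;dbname=$this->database",
        $this->username,
        $this->password,
        array(PDO::ATTR_PERSISTENT => true, PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
    );
}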

Solving "MySQL server has gone away" errors

I have written some PHP code that returns the HTML content from .edu domains. A brief introduction is given here: Errors regarding Web Crawler in PHP
The crawler works fine when the number of links to crawl is small (around 40 URLs), but I am getting a "MySQL server has gone away" error beyond that number.
I am storing the HTML content as longtext in MySQL tables, and I do not understand why the error appears after 40-50 insertions.
Any help in this regard is highly appreciated.
Please note that I have already altered wait_timeout and max_allowed_packet to accommodate my queries and the PHP code, and now I don't know what to do. Please help me in this regard.
You might be inclined to handle this problem by "pinging" the MySQL server before a query. This is a bad idea. For more on why, check this SO post: Should I ping mysql server before each query?
The best way to handle the issue is to wrap your queries inside try/catch blocks and catch any database exceptions so that you can handle them appropriately. This is especially important in long-running and/or daemon-type scripts. So, here's a very basic example using a "connection manager" to control access to DB connections:
class DbPool {

    private $connections = array();

    function addConnection($id, $dsn) {
        $this->connections[$id] = array(
            'dsn'  => $dsn,
            'conn' => null
        );
    }

    function getConnection($id) {
        if (!isset($this->connections[$id])) {
            throw new Exception('Invalid DB connection requested');
        } elseif (isset($this->connections[$id]['conn'])) {
            return $this->connections[$id]['conn'];
        } else {
            try {
                // for mysql you need to supply user/pass as well
                $conn = new PDO($this->connections[$id]['dsn']);

                // Tell PDO to throw an exception on error
                // (like "MySQL server has gone away")
                $conn->setAttribute(
                    PDO::ATTR_ERRMODE,
                    PDO::ERRMODE_EXCEPTION
                );
                $this->connections[$id]['conn'] = $conn;

                return $conn;
            } catch (PDOException $e) {
                return false;
            }
        }
    }

    function close($id) {
        if (!isset($this->connections[$id])) {
            throw new Exception('Invalid DB connection requested');
        }
        $this->connections[$id]['conn'] = null;
    }
}

class Crawler {

    private $dbPool;

    function __construct(DbPool $dbPool) {
        $this->dbPool = $dbPool;
    }

    function crawl() {
        // crawl and store data in the $crawledData variable
        $this->saveData($crawledData);
    }

    function saveData($crawledData) {
        if (!$conn = $this->dbPool->getConnection('write_conn')) {
            // doh! couldn't retrieve DB connection ... handle it
        } else {
            try {
                // perform query on the $conn database connection
            } catch (Exception $e) {
                $msg = $e->getMessage();
                if (strstr($msg, 'MySQL server has gone away')) {
                    // drop the dead connection and retry once with a fresh one
                    $this->dbPool->close('write_conn');
                    $this->saveData($crawledData);
                } else {
                    // some other error occurred
                }
            }
        }
    }
}
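Hypothetical wiring of the two classes above (the DSN is an assumption; as the comment in getConnection() notes, a mysql DSN also needs user/pass passed to the PDO constructor):

$pool = new DbPool();
$pool->addConnection('write_conn', 'mysql:host=localhost;dbname=crawler');
$crawler = new Crawler($pool);
$crawler->crawl();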
I have another answer that deals with what I think is a similar problem, and it would require a similar answer. Basically, you can use the mysql_ping() function to test the connection before your insert. Before MySQL 5.0.14, mysql_ping() would automatically reconnect to the server, but now you have to build your own reconnect logic. Something similar to this should work for you:
function check_dbconn($connection) {
    if (!mysql_ping($connection)) {
        mysql_close($connection);
        $connection = mysql_connect('server', 'username', 'password');
        mysql_select_db('db', $connection);
    }
    return $connection;
}

foreach ($array as $value) {
    $dbconn = check_dbconn($dbconn);
    $sql = "insert into collected values('".$value."')";
    $res = mysql_query($sql, $dbconn);
    // then some extra code.
}
I was facing the "MySQL server has gone away" error while using MySQL Connector 5.X; replacing the DLL with the latest version solved the problem.
Are you opening a single DB connection and reusing it? Is it possible that it's a simple timeout? You might be better served by opening a new DB connection for each of your read/write operations (i.e. contact .edu, get text, open DB, write text, close DB, repeat).
Also, how are you using the handle? Is it possible that it has hit an error and has 'gone away' for that reason?
Well, this is what I am doing now, based on rdlowrey's suggestion, and I guess this is also right.
public function url_db_html($sourceLink = NULL, $source) {
    $source = mysql_real_escape_string($source);

    $query = "INSERT INTO html (id, sourceLink, sourceCode)
              VALUES (NULL, ('$sourceLink'), ('$source'))";

    try {
        if (mysql_query($query, $this->connection) == FALSE) {
            $msg = mysql_errno($this->connection) . ": " . mysql_error($this->connection);
            throw new DbException($msg);
        }
    } catch (DbException $e) {
        echo "<br><br>Caught!!!<br><br>";

        // Re-establish the connection if the server has gone away
        if (strstr($e->getMessage(), 'MySQL server has gone away')) {
            $this->connection = mysql_connect("localhost", "root", "");
            mysql_select_db("crawler1", $this->connection);
        }
    }
}
So once a query has failed to execute, the script will skip it but will make sure the connection is re-established.
However, my web crawler crashes when files such as .jpg, .bmp, .pdf, etc. are encountered. Is there a way to skip URLs containing these extensions? I am using preg_match and have given pdf and doc to match, yet I want the function to skip all links containing extensions such as mp3, pdf, etc. Is this possible?
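For that follow-up question, one possible sketch (the extension list is illustrative; extend it as needed):

// Skip URLs that point to binary files before fetching them
if (preg_match('/\.(jpe?g|bmp|png|gif|pdf|docx?|mp3)(\?.*)?$/i', $url)) {
    continue; // move on to the next URL in the crawl loop
}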

session_regenerate_id and database handler

I am using a database handler for my sessions, which is working fine, but now I have run into a problem with authentication.
When a user logs in with username/password, I call session_regenerate_id() and after that I try to select the current session_id.
Here is my code:
session_regenerate_id();
echo $checkQ=" SELECT * FROM my_sessions WHERE id='".session_id()."' ";
......
but I don't get any results, even though the session_id is the correct one.
After the page finishes loading, if I copy and paste the SQL command into phpMyAdmin, I do get results.
I know it sounds stupid, but the only explanation I can think of is that session_regenerate_id() "is too slow", so when I try to read the session_id on the next line, the session has not been created in the database yet.
Can anyone help me?
I know it has been a while, I hope you have found an answer since this was posted, but I'll add my solution for posterity's sake.
The call to session_regenerate_id() will cause the value of session_id() to change:
<?php
$before = session_id();
session_regenerate_id();
$after = session_id();
var_dump($before == $after); // outputs false
This problem manifested for me because in the session write handler I was doing this (without such bogus method names, of course):
<?php
class MySQLHandler
{
    function read($id)
    {
        $row = $this->doSelectSql($id);
        if ($row) {
            $this->foundSessionDuringRead = true;
        }
        // snip
    }

    function write($id, $data)
    {
        if ($this->foundSessionDuringRead) {
            $this->doUpdateSql($id, $data);
        } else {
            $this->doInsertSql($id, $data);
        }
    }
}
The write() method worked fine if session_regenerate_id() was never called. If it was called, however, the $id argument to write() is different from the $id passed to read(), so the update won't find any records with the new $id, because they've never been inserted.
Some people suggest using MySQL's "REPLACE INTO" syntax, but that deletes and replaces the row, which plays merry havoc if you want to have a creation-date column. What I did to fix the problem was to hold on to the session ID that was passed to read(), then update the session ID in the database during write(), using the id passed to read() as the key:
<?php
class MySQLHandler
{
    function read($id)
    {
        $row = $this->doSelectSql($id);
        if ($row) {
            $this->rowSessionId = $id;
        }
        // snip
    }

    function write($id, $data)
    {
        if ($this->rowSessionId) {
            $stmt = $this->pdo->prepare("UPDATE session SET session_id=:id, data=:data WHERE session_id=:rowSessionId AND session_name=:sessionName");
            $stmt->bindValue(':id', $id);
            $stmt->bindValue(':rowSessionId', $this->rowSessionId);
            $stmt->bindValue(':data', $data);
            $stmt->bindValue(':sessionName', $this->sessionName);
            $stmt->execute();
        } else {
            $this->doInsertSql($id, $data);
        }
    }
}
I think I'm having the same problem you are having. It's unclear to me whether this is a PHP (cache) feature or a bug.
The issue is that, when using a custom SessionHandler and calling session_regenerate_id(true), the new session is not created until the script terminates. I have confirmed that by doing the same thing you did: SELECTing the new session id from the database. And the new session is not there. However, after the script finishes, it is.
This is how I fixed it:
$old_id = session_id();
// If you SELECT your DB and search for $old_id, it will be there.
session_regenerate_id(TRUE);
$new_id = session_id();
// If you SELECT your DB for either $old_id or $new_id, none will be there.
session_write_close();
session_start();
// If you SELECT your DB for $new_id, it will be there.
Therefore, the solution (workaround) I came up with was to force PHP to write the session. I hope this helps.