Perl script to create XML from MySQL query - out of memory

I need to generate an XML file from database records, but I get the error "out of memory". Here's the script I'm using; I found it via Google, but it isn't suitable for my case, and it also exhausts the server's allocated memory. It's a start, though.
#!/usr/bin/perl
use warnings;
use strict;
use XML::Simple;
use DBI;

my $dbh = DBI->connect('DBI:mysql:db_name;host=host_address', 'db_user', 'db_pass')
    or die DBI->errstr;

# Get an array of hashes (this loads the entire table into memory)
my $recs = $dbh->selectall_arrayref('SELECT * FROM my_table', { Columns => {} });

# Convert to XML, where each hash element becomes an XML element
my $xml = XMLout( { record => $recs }, NoAttr => 1 );
print $xml;

$dbh->disconnect;
This script only prints the records; I only tested it with a WHERE clause limiting the query to a single row id.
First, I couldn't manage to make it save the output to a file.xml.
Second, I need to somehow split the job into multiple smaller jobs and then put the XML file together in one piece.
I have no idea how to achieve either.
Constraint: No access to change server settings.

These are the problem lines:
my $recs = $dbh->selectall_arrayref('SELECT * FROM my_table',{ Columns => {} });
This reads the whole table into memory, representing every single row as a hash of column values.
my $xml = XMLout( {record => $recs}, NoAttr => 1 );
This builds a probably even larger structure: the whole XML document as a single string.
The lowest-memory solution needs to load the table one row at a time and print each row immediately. In DBI, you can prepare and execute a query so that you fetch one row at a time in a loop.
You will need to play with this before the result looks like your intended output (I haven't tried to match your XML::Simple output; I'm leaving that to you):
print "<records>\n";
my $sth = $dbh->prepare('SELECT * FROM my_table');
$sth->execute;
while ( my $row = $sth->fetchrow_arrayref ) {
# Convert db row to XML row
print XMLout( {row => $row}, NoAttr => 1 ),"\n";
}
print "</records>\n";
Perl can use open( my $fh, '>', $filename ) to open a file for writing and print $fh $string to write to it, or you can simply call your script and redirect its output to a file, e.g. perl myscript.pl > table.xml.
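Putting those pieces together, here is a minimal sketch that streams rows straight into table.xml. The connection details are the same placeholders as in the question, and I've used fetchrow_hashref plus XML::Simple's RootName option so each row keeps its column names; adapt as needed:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use DBI;

my $dbh = DBI->connect('DBI:mysql:db_name;host=host_address', 'db_user', 'db_pass')
    or die DBI->errstr;

# Stream straight to the output file instead of building the XML in memory
open my $out, '>', 'table.xml' or die "Cannot open table.xml: $!";
print {$out} "<records>\n";

my $sth = $dbh->prepare('SELECT * FROM my_table');
$sth->execute;

# fetchrow_hashref keeps the column names, like { Columns => {} } did
while ( my $row = $sth->fetchrow_hashref ) {
    print {$out} XMLout( $row, NoAttr => 1, RootName => 'record' );
}

print {$out} "</records>\n";
close $out;
$dbh->disconnect;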

It's the SELECT * with no constraints that will be killing your memory. Add some constraint to your query (e.g. a date or id range) and use a loop to execute the query and produce your output in chunks. That way you won't need to load the whole table into memory before you get started on the output.
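For instance, a rough sketch of that idea, assuming the table has an auto-increment id column (the column name and chunk size are placeholders):
my $chunk_size = 1000;
my $last_id    = 0;

while (1) {
    # Fetch the next chunk of rows after the last id we have seen
    my $rows = $dbh->selectall_arrayref(
        'SELECT * FROM my_table WHERE id > ? ORDER BY id LIMIT ?',
        { Slice => {} },
        $last_id, $chunk_size,
    );
    last unless @$rows;

    for my $row (@$rows) {
        print XMLout( $row, NoAttr => 1, RootName => 'record' );
    }

    # Remember where this chunk ended so the next query resumes there
    $last_id = $rows->[-1]{id};
}
Keying on the id column avoids the steadily growing offset cost that LIMIT with OFFSET would incur on a large table.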

Related

How to write value stored in a variable to CSV in MySQL

MySQL: I want to write a timestamp stored in a variable to a CSV file and I only know how to store a query.
Here's how I do it (using my own SQL() function to retrieve the result set via fetch_all(MYSQLI_ASSOC), but you're clearly ok on that bit):
$out = fopen($file, 'w');
$rows = SQL($sqlstatement);

fputcsv($out, array_keys($rows[0]));
foreach ($rows as $row) {
    fputcsv($out, $row);
}
fclose($out);
The key is the built-in PHP fputcsv function. The first use of fputcsv writes the column headings (what a fantastic language PHP is...), the others (in the loop) write the rows.

How can I grab multiple records from a MySQL query in Perl using array pointers?

I can do this all as one function, but in trying to port it over to my packages of functions (library) I am missing something.
Here's what I want to do from my main Perl script
my @rows;
$result = Funx::dbcdata($myConnection,
    "SELECT * FROM Inv where name like \"%DOG%\";", \@rows);
Then in my library package I am attempting this
sub dbcdata
{
    my ($connection, $command, $array) = @_;
    my $query  = $connection->prepare($command);
    my $result = $query->execute();
    my $i = 0;
    while ( my $row = $query->fetchrow_arrayref() )
    {
        @{$array}[$i] = $row;
        $i++;
    }
    $query->finish;
    return $result;
}
I was hoping to get back pointers or references to each row (there were 4 in this case), but I am not. Every element in @rows is the same:
ARRAY(0x5577a0f77ec0) ARRAY(0x5577a0f77ec0) ARRAY(0x5577a0f77ec0)
ARRAY(0x5577a0f77ec0)
Nor do I know how to turn each one into the original separate row. Any help would be appreciated, thanks.
From the documentation for fetchrow_arrayref:
Note that the same array reference is returned for each fetch, so don't store the reference and then use it after a later fetch. Also, the elements of the array are also reused for each row, so take care if you want to take a reference to an element.
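So if you do keep the loop, one minimal fix is to copy each row's values as you store them, e.g. @{$array}[$i] = [ @$row ];, rather than storing the reused reference.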
Sounds like you want fetchall_arrayref:
The fetchall_arrayref method can be used to fetch all the data to be returned from a prepared and executed statement handle. It returns a reference to an array that contains one reference per row.
After executing the statement, you can do something like
@{$array} = $query->fetchall_arrayref->@*;
instead of that ugly loop.
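With that change in place, the whole library sub shrinks to something like this (a sketch, keeping the question's names):
sub dbcdata
{
    my ($connection, $command, $array) = @_;
    my $query  = $connection->prepare($command);
    my $result = $query->execute();

    # fetchall_arrayref returns a fresh arrayref per row, so the
    # stored references no longer all alias one reused buffer
    @{$array} = @{ $query->fetchall_arrayref };

    return $result;
}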
But selectall_array might be even better. Your whole function can be replaced by a call to it:
my @rows =
    $myConnection->selectall_array(q/SELECT * FROM Inv WHERE name LIKE '%DOG%'/);

Perl / DBI query doesn't preserve integer values for JSON output

I can't get this Perl code to return true integer values for integers in the table. The MySQL table columns are correctly specified as integers, yet the JSON output here wraps all query values in quotes. How can I correctly preserve data-types (esp. integers and boolean values) as specified?
use strict;
use warnings;
use DBI;
use JSON;
my $sth = "SELECT id, name, age FROM table";
my $data = $dbh->selectall_arrayref($sth, {Slice => {}});
my $response = encode_json($data);
print $response;
## outputs: {"id":"1","name":"Joe Blodge","age":"42"}
What am I doing wrong here? How can I get this to output the correctly formatted JSON:
{"id":1,"name":"Joe Blodge","age":42}
DBD::mysql returns all results as strings (see https://github.com/perl5-dbi/DBD-mysql/issues/253). Normally Perl doesn't care; encoding to JSON is one of the few times when it matters. You can either use Cpanel::JSON::XS::Type to provide type declarations for your JSON structure:
use Cpanel::JSON::XS;
use Cpanel::JSON::XS::Type;

# $data is an arrayref of row hashes, so wrap the per-row spec in json_type_arrayof
my $response = encode_json($data, json_type_arrayof({
    id   => JSON_TYPE_INT,
    name => JSON_TYPE_STRING,
    age  => JSON_TYPE_INT,
}));
or you can go through and numify the appropriate elements of each row before JSON encoding:
for my $row (@$data) {
    $row->{$_} += 0 for qw(id age);
}
It is also possible to check the type (as reported by MySQL) of each returned column: if you construct and execute your query via a statement handle, the column types are available as an array in $sth->{TYPE}. But this is fairly complex and may not be reliable.
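If you do want to try it, a rough sketch might look like this (the SQL_* constants come from DBI's :sql_types export; exactly which type codes DBD::mysql reports is the unreliable part):
use DBI qw(:sql_types);

my $sth = $dbh->prepare('SELECT id, name, age FROM table');
$sth->execute;

# Read the column names and reported types before fetching the rows
my @names = @{ $sth->{NAME} };
my @types = @{ $sth->{TYPE} };

my %numeric = map  { $names[$_] => 1 }
              grep { $types[$_] == SQL_INTEGER
                  || $types[$_] == SQL_BIGINT
                  || $types[$_] == SQL_DOUBLE } 0 .. $#types;

my $rows = $sth->fetchall_arrayref({});

# Numify every column the driver reported as numeric
for my $row (@$rows) {
    $row->{$_} += 0 for grep { $numeric{$_} } keys %$row;
}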

Loop through query results without loading them all in array in Codeigniter [duplicate]

The normal result() method described in the documentation appears to load all records immediately. My application needs to load about 30,000 rows, and one at a time, submit them to a third-party search index API. Obviously loading everything into memory at once doesn't work well (errors out because of too much memory).
So my question is, how can I achieve the effect of the conventional MySQLi API method, in which you load one row at a time in a loop?
Here is something you can do.
while ($row = $result->_fetch_object()) {
    $data = array(
        'id'         => $row->id,
        'some_value' => $row->some_field_name,
    );
    // send row data to whatever api
    $this->send_data_to_api($data);
}
This will get one row at a time. Check the CodeIgniter source code and you will see that this is what it does internally when you execute the result() method.
For those who want to save memory on large result sets:
Since CodeIgniter 3.0.0 there is an unbuffered_row() method. From the documentation:
All the methods above will load the whole result into memory (prefetching). Use unbuffered_row() for processing large result sets.
This method returns a single result row without prefetching the whole result in memory as row() does. If your query has more than one row, it returns the current row and moves the internal data pointer ahead.
$query = $this->db->query("YOUR QUERY");
while ($row = $query->unbuffered_row())
{
echo $row->title;
echo $row->name;
echo $row->body;
}
You can optionally pass 'object' (default) or 'array' in order to specify the returned value's type:
$query->unbuffered_row(); // object
$query->unbuffered_row('object'); // object
$query->unbuffered_row('array'); // associative array
Official documentation: https://www.codeigniter.com/userguide3/database/results.html#id2
Well, the thing is that result() returns the entire reply of the query, while row() simply fetches the first row and discards the rest. However, the query still fetched 30,000 rows regardless of which method you use.
One design that would fit your cause would be:
$offset = (int)@$_GET['offset'];
$query  = $this->db->query("SELECT * FROM table LIMIT ?, 1", array($offset));
$row    = $query->row();

if ($row) {
    /* Run api with values */
    redirect(current_url().'?offset='.($offset + 1));
}
This takes one row, sends it to the API, reloads the page, and moves on to the next row. It also prevents the page from hitting a timeout. However, it would most likely take a while with 30,000 records and refreshes, so you may want to raise the LIMIT ?, 1 to a number higher than 1 and use result() with a foreach to make multiple API calls per page load.
Well, there's the row() method, which returns just one row as an object, or the row_array() method, which does the same but returns an array (of course).
So you could do something like
$sql = "SELECT * FROM yourtable";
$resultSet = $this->db->query($sql);
$total = $resultSet->num_rows();
for($i=0;$i<$total;$i++) {
$row = $resultSet->row_array($i);
}
This fetches each row of the whole result set in a loop, which is about the same as fetching everything and looping over the result() method calls, I believe.
If you want one row at a time, either you make 30,000 calls, or you select all the results and fetch them one at a time, or you fetch everything and walk over the array. I can't see any other way out right now.

Problems parsing Reddit's JSON

I'm working on a Perl script that parses Reddit's JSON using the JSON module.
However, I have the problem of being very new to both Perl and JSON.
I managed to parse the front page and subreddits successfully, but the comments have a different structure and I can't figure out how to access the data I need.
Here's the code that successfully finds the "data" hash for the front page and subreddits:
foreach my $children ( @{ $json_text->{"data"}->{"children"} } ) # for each element of children
{
    my $data = $children->{"data"};    # accessing each data hash
    my %phsh = ();                     # my hash to collect and print
    $phsh{author} = $data->{"author"}; # here I get the "author" value from "data"
    # etc....
This successfully gets what I need from http://www.reddit.com/.json
But when I go to the JSON of a comment (this one, for example), it has a different format, and I can't figure out how to parse it. If I try the same thing as before, my parser crashes, saying it is not a HASH reference.
So my question is: How do access the "children" in the second JSON? I need to get both the data for the Post and the data for the comments. Can anybody help?
Thanks in advance!
(I know it may be obvious, but I'm running on very little sleep XD)
You need to either look at the JSON data or dump the decoded data to see what form it takes. The comment data, for example, is an array at the top level.
Here is some code that prints the body field of all top-level comments. Note that a comment may have an array of replies in its replies field, and each reply may also have replies in turn.
Depending on what you want to do you may need to check whether a reference is to an array or a hash by checking the value returned by the ref operator.
use strict;
use warnings;

binmode STDOUT, ':utf8';

use JSON;
use LWP;
use Data::Dump;

my $ua = LWP::UserAgent->new;
my $resp = $ua->get('http://www.reddit.com/r/funny/comments/wx3n5/caption_win.json');
die $resp->status_line unless $resp->is_success;

my $json = $resp->decoded_content;
my $data = decode_json($json);
die "Error: $data->{error}" if ref $data eq 'HASH' and exists $data->{error};

# Dump the first top-level comment to see its structure
dd $data->[1]{data}{children}[0];
print "\n\n";

my $children = $data->[1]{data}{children};
print scalar @$children, " comments:\n\n";

for my $child (@$children) {
    print $child->{data}{body}, "\n";
}