Importing data from VERY large text file into MySQL [duplicate] - mysql

I have a very large CSV file (150 MB). What is the best way to import it to MySQL?
I have to do some manipulation in PHP before inserting it into the MySQL table.

You could take a look at LOAD DATA INFILE in MySQL.
You might be able to do the manipulations once the data is loaded into MySQL, rather than first reading it into PHP. First store the raw data in a temporary table using LOAD DATA INFILE, then transform the data to the target table using a statement like the following:
INSERT INTO targettable (x, y, z)
SELECT foo(x), bar(y), z
FROM temptable
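For the LOAD DATA INFILE step, a minimal sketch might look like the following; the file path, delimiters, and the temptable column list are assumptions you would adapt to your file:
-- Load the raw CSV into the staging table first; LOCAL reads the file
-- from the client machine rather than the server.
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE temptable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(x, y, z);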

I would just open it with fopen and use fgetcsv to read each line into an array.
Pseudo-PHP follows (placeholder credentials; do your own parsing and insert inside the loop):
// Connect to the database first (placeholder credentials).
$db = mysqli_connect('localhost', 'user', 'password', 'database');
$filehandle = fopen("/path/to/file.csv", "r");
while (($data = fgetcsv($filehandle, 1000, ",")) !== FALSE) {
    // $data is an array of the fields on one CSV line;
    // do your parsing here and insert into the table.
}
fclose($filehandle);

Related

Upload contents of CSV as new maximum stock position in Exact Online

I want to upload the contents of a CSV file as new values in the Exact Online data set, using for instance the following SQL statement:
update exactonlinerest..ItemWarehouses
set maximumstock=0
where id='06071a98-7c74-4c26-9dbe-1d422f533246'
and maximumstock != 0
I can retrieve the contents of the file using:
select *
from files('C:\path\SQLScripts', '*.csv', true)#os fle
join read_file_text(fle.file_path)#os
But I seem unable to change the multi-line text in the file_contents field into separate lines or records.
How can I split the file_contents field into multiple lines (for instance using 'update ...' || VALUE and then running it through ##mydump.sql, or directly using an insert into / update statement)?
For now I've been able to solve it using regular expressions and then loading the generated SQL statements into the SQL engine as follows:
select regexp_replace(rft.file_contents, '^([^,]*),([^,]*)(|,.*)$', 'update exactonlinerest..ItemWarehouses set maximumstock = $1 where code = $2 and maximumstock != $1;' || chr(13), 1, 0, 'm') stmts
, 'dump2.sql' filename
from files('C:\path\SQLScripts', '*.csv', true)#os fle
join read_file_text(fle.file_path)#os rft
local export documents in stmts to "c:\path\sqlscripts" filename column filename
#c:\hantex\path\dump2.sql
However, it is error prone when I have a single quote in the article code.

Pig decimal value not working

I am studying the Pig language in Cloudera, and I have a problem with decimal values.
I have a CSV file containing a lot of data of different types.
I have a data column named "petrol_average" with values like "5,78524512".
I want to load this data from my CSV file.
My script is:
a = LOAD 'myfile.csv' USING PigStorage(';') AS (country: chararray, petrol_average: double);
b = FOREACH a GENERATE country, petrol_average;
DUMP b;
The result dumped is like:
(Canada, )
(Brazil, 5.0)
(France, )
(United States 8.0)
...
In my CSV file I do have values for petrol_average for Canada and France.
My Pig script is not showing those values, and the Brazil value is 5,78524512 in the file, so it is being automatically rounded.
Do you have an answer for my problem?
Sorry for my English.
sample of myfile.csv
a,578524512
b,8596243
c,15424685
d,14253685
code
A = LOAD 'data/MyFile.txt' USING PigStorage(',') AS (country:chararray, petrol_average:long);
NOTE:
You created the schema with double, but your data here is a plain integer, which dropped the data after the first digit, so I have used long instead.
grunt> dump A;
grunt> B = FOREACH A generate country, petrol_average;
grunt> dump B;
result
(a,578524512)
(b,8596243)
(c,15424685)
(d,14253685)
Works fine, happy Hadoop :)
#MaheshGupta
Thank you for your answer. When I use float or long I get a result like this:
()
(8.0)
()
()
()
()
()
()
()
()
()
When I declare it in my schema as chararray I get this result:
(9,100000381)
(8,199999809)
(8,399999619)
(8,100000381)
(8,399999619)
(8,399999619)
(8,399999619)
(8,100000381)
(8,5)
(8,199999809)
(9)
My script is this one:
a = LOAD 'myfile.csv' USING PigStorage(';') AS
    (country: chararray,
     petrol_average: chararray);
b = FOREACH a GENERATE petrol_average;
DUMP b;
My big problem is with division and addition, because I can't do them while the type is a chararray.

SSIS write DT_NTEXT into an UTF-8 csv file

I need to write the result of a SQL query into a CSV file encoded as UTF-8 (I need this encoding because there are French letters). One of the columns is too large (more than 20000 characters), so I can't use DT_WSTR for it. The input type is DT_TEXT, so I use a Data Conversion component to change it to DT_NTEXT. But then, when I want to write it to the file, I get this error message:
Error 2 Validation error. The data type for "input column" is
DT_NTEXT, which is not supported with ANSI files. Use DT_TEXT instead
and convert the data to DT_NTEXT using the data conversion component
Is there a way I can write the data to my file?
Thank you
I have had this kind of issue sometimes too. When working with data larger than 255 characters, SSIS sees it as blob data and will always handle it as such.
I then converted this blob stream data to readable text with a Script Component; after that, other transformations should be possible.
This was the case in the SSIS that came with SQL Server 2008, and I believe it hasn't changed since.
I ended up doing just like Samyne says and used a script.
First I modified my SQL stored procedure: instead of having several columns, I put all the info into one single column, as follows:
Select Column1 + '^' + Column2 + '^' + Column3 ...
Then I used this code in a script
string fileName = Dts.Variables["SLTemplateFilePath"].Value.ToString();
using (var stream = new FileStream(fileName, FileMode.Truncate))
{
    // StreamWriter with Encoding.UTF8 takes care of the UTF-8 output.
    using (var sw = new StreamWriter(stream, Encoding.UTF8))
    {
        // Fill a DataTable from the recordset held in the FileData variable.
        OleDbDataAdapter oleDA = new OleDbDataAdapter();
        DataTable dt = new DataTable();
        oleDA.Fill(dt, Dts.Variables["FileData"].Value);
        // Each row carries the single concatenated column, so this writes one line per row.
        foreach (DataRow row in dt.Rows)
        {
            foreach (DataColumn column in dt.Columns)
            {
                sw.WriteLine(row[column]);
            }
        }
        sw.WriteLine();
    }
}
Putting all the info in one column is optional; I just wanted to avoid handling it in the script. This way, if my stored procedure changes, I don't need to modify the SSIS package.

perl script to create xml from mysql query - out of memory

I need to generate an XML file from database records, but I get an "out of memory" error. Here's the script I am using; I found it through Google, but it's not suitable for me and it also exhausts the server's allocated memory. It's a start, though.
#!/usr/bin/perl
use warnings;
use strict;
use XML::Simple;
use DBI;
my $dbh = DBI->connect('DBI:mysql:db_name;host=host_address','db_user','db_pass')
or die DBI->errstr;
# Get an array of hashes
my $recs = $dbh->selectall_arrayref('SELECT * FROM my_table',{ Columns => {} });
# Convert to XML where each hash element becomes an XML element
my $xml = XMLout( {record => $recs}, NoAttr => 1 );
print $xml;
$dbh->disconnect;
This script only prints the records, because I tested it with a WHERE clause for a single row id.
First of all, I couldn't manage to make it save the output to a file.xml.
Second, I need to somehow split the job into multiple jobs and then put the XML file together in one piece.
I have no idea how to achieve either.
Constraint: No access to change server settings.
These are the problem lines:
my $recs = $dbh->selectall_arrayref('SELECT * FROM my_table',{ Columns => {} });
This reads the whole table into memory, representing every single row as an array of values.
my $xml = XMLout( {record => $recs}, NoAttr => 1 );
This builds a probably even larger structure: the whole XML string in one go.
The lowest memory-use solution needs to involve loading the table one item at a time, and printing that item out immediately. In DBI, it is possible to make a query so that you fetch one row at a time in a loop.
You will need to play with this until the result looks like your intended output (I haven't tried to match your XML::Simple output; I'm leaving that to you):
print "<records>\n";
my $sth = $dbh->prepare('SELECT * FROM my_table');
$sth->execute;
while ( my $row = $sth->fetchrow_arrayref ) {
    # Convert one db row to an XML fragment and print it immediately
    print XMLout( {row => $row}, NoAttr => 1 ), "\n";
}
print "</records>\n";
Perl can use e.g. open( my $fh, '>', $filename ) to open a file for writing and print $fh $string to print to it, or you could just run your script and redirect its output to a file, e.g. perl myscript.pl > table.xml.
It's the SELECT * with no constraints that is killing your memory. Add some constraint to your query, e.g. a date or id range, and use a loop to execute the query and produce your output in chunks, as sketched below. That way you won't need to load the whole table into memory before you get started on the output.
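For example, the chunked query in plain SQL could look like the sketch below, assuming the table has an indexed numeric id column (the column name and chunk size are placeholders):
-- Fetch one chunk at a time; repeat with the last id seen in the
-- previous chunk until no more rows come back.
SELECT *
FROM my_table
WHERE id > ?
ORDER BY id
LIMIT 1000;
Each chunk can then be converted to XML and printed before the next one is fetched, so only a bounded number of rows is ever held in memory.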

import multiple table data into mysql using single csv file

I have this table in a CSV file.
Now I have two MySQL tables dealing with this CSV file:
data A, B, C has to be stored in Table1, whereas data D, E, F, G, H has to be stored in Table2.
Given the CSV file formatted as above, how can I upload its data to the MySQL database so that the same file can feed the different tables?
Try this, it works well; you can add as many values as needed, depending on the number of columns you have in the CSV file.
<?php
// Placeholder credentials, table and column names: adjust them to your schema.
$db = mysqli_connect('localhost', 'user', 'password', 'database');
if (!$db) {
    die('Cannot connect to database');
}
$fname = $_FILES['csv_file']['name'];
$chk_ext = explode(".", $fname); // could be used to validate the file extension
$filename = $_FILES['csv_file']['tmp_name'];
$handle = fopen($filename, "r");
if (!$handle) {
    die('Cannot open file for reading');
}
// Columns A, B, C go to Table1; columns D, E, F, G, H go to Table2.
$stmt1 = mysqli_prepare($db, "INSERT INTO Table1 (colA, colB, colC) VALUES (?, ?, ?)");
$stmt2 = mysqli_prepare($db, "INSERT INTO Table2 (colD, colE, colF, colG, colH) VALUES (?, ?, ?, ?, ?)");
while (($data = fgetcsv($handle, 10000, ",")) !== FALSE) {
    mysqli_stmt_bind_param($stmt1, 'sss', $data[0], $data[1], $data[2]);
    mysqli_stmt_execute($stmt1) or die(mysqli_error($db));
    mysqli_stmt_bind_param($stmt2, 'sssss', $data[3], $data[4], $data[5], $data[6], $data[7]);
    mysqli_stmt_execute($stmt2) or die(mysqli_error($db));
}
fclose($handle);
?>
Probably only via a custom script (PHP or so)...
But it's usually not hard to split the CSV into two files and import both into separate tables using MySQL's LOAD DATA INFILE directly. It will be MUCH faster than using phpMyAdmin or similar scripts.
Load the data into a temporary table, then split it up into the two tables, and drop the temporary table afterwards; a plain-SQL sketch of this follows below.
I read somewhere that PHP can do this, but I cannot help with PHP code.
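A minimal sketch of that temporary-table approach in plain SQL, where the staging table and the colA ... colH column names are placeholders for your actual schema:
-- Stage the whole CSV in one table, then split it across the targets.
CREATE TEMPORARY TABLE csv_staging (
    colA VARCHAR(255), colB VARCHAR(255), colC VARCHAR(255),
    colD VARCHAR(255), colE VARCHAR(255), colF VARCHAR(255),
    colG VARCHAR(255), colH VARCHAR(255)
);

LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE csv_staging
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

-- Copy the relevant columns into each target table.
INSERT INTO Table1 (colA, colB, colC)
SELECT colA, colB, colC FROM csv_staging;

INSERT INTO Table2 (colD, colE, colF, colG, colH)
SELECT colD, colE, colF, colG, colH FROM csv_staging;

DROP TEMPORARY TABLE csv_staging;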