NBSP creeping inside MySQL data [duplicate] - mysql

I have a spreadsheet which really has only one complicated table. I basically convert the spreadsheet to a CSV and use a Groovy script to generate the INSERT scripts.
However, I cannot do this with a table that has 28 fields, because data within some of the fields on the spreadsheet makes the CSV export even more complicated. As a result, the fields in the new CSV are not delimited properly, or my script has not accounted for it.
Does anyone have any suggestions on a better approach to do this? Thanks.

Have a look at the LOAD DATA INFILE statement. It will help you import data from the CSV file into a table.

This is a recurring question on Stack Overflow. Here is an updated answer.
There are actually several ways to import an Excel file into a MySQL database, with varying degrees of complexity and success.
Excel2MySQL or Navicat utilities. Full disclosure, I am the author of Excel2MySQL. These two utilities aren't free, but they are the easiest option and have the fewest limitations. They also include additional features to help with importing Excel data into MySQL. For example, Excel2MySQL automatically creates your table and automatically optimizes field data types like dates, times, floats, etc. If you're in a hurry or can't get the other options to work with your data, then these utilities may suit your needs.
LOAD DATA INFILE: This popular option is perhaps the most technical and requires some understanding of MySQL command execution. You must manually create your table before loading and use appropriately sized VARCHAR field types. Therefore, your field data types are not optimized. LOAD DATA INFILE has trouble importing large files that exceed 'max_allowed_packet' size. Special attention is required to avoid problems importing special characters and foreign Unicode characters. Here is an example of importing a CSV file named test.csv.
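This is a minimal sketch of that kind of statement; it assumes the target table test already exists with columns matching the file, that the first line of test.csv is a header row, and that the server permits LOCAL loads:
LOAD DATA LOCAL INFILE 'test.csv'
INTO TABLE test
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;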
phpMyAdmin: Select your database first, then select the Import tab. phpMyAdmin will automatically create your table and size your VARCHAR fields, but it won't optimize the field types. phpMyAdmin has trouble importing large files that exceed 'max_allowed_packet' size.
MySQL for Excel: This is a free Excel Add-in from Oracle. This option is a bit tedious because it uses a wizard and the import is slow and buggy with large files, but this may be a good option for small files with VARCHAR data. Fields are not optimized.

For comma-separated values (CSV) files, the results view panel in Workbench has an "Import records from external file" option that imports CSV data directly into the result set. Execute that and click "Apply" to commit the changes.
For Excel files, consider using the official MySQL for Excel plugin.

A while back I answered a very similar question on the EE site and offered the following block of Perl as a quick-and-dirty example of how you could directly load an Excel sheet into MySQL, bypassing the need to export/import via CSV, hopefully preserving more of those special characters, and eliminating the need to worry about escaping the content.
#!/usr/bin/perl -w
# Purpose: Insert each Worksheet, in an Excel Workbook, into an existing MySQL DB, of the same name as the Excel(.xls).
# The worksheet names are mapped to the table names, and the column names to column names.
# Assumes each sheet is named and that the first ROW on each sheet contains the column(field) names.
#
use strict;
use Spreadsheet::ParseExcel;
use DBI;
use Tie::IxHash;

die "You must provide a filename to $0 to be parsed as an Excel file" unless @ARGV;

my $sDbName = $ARGV[0];
$sDbName =~ s/\.xls//i;

my $oExcel = new Spreadsheet::ParseExcel;
my $oBook = $oExcel->Parse($ARGV[0]);
my $dbh = DBI->connect("DBI:mysql:database=$sDbName;host=192.168.123.123", "root", "xxxxxx", {'RaiseError' => 1, AutoCommit => 1});

my ($sTableName, %hNewDoc, $sFieldName, $iR, $iC, $oWkS, $oWkC, $sSql);

print "FILE: ", $oBook->{File}, "\n";
print "DB: $sDbName\n";
print "Collection Count: ", $oBook->{SheetCount}, "\n";

for (my $iSheet = 0; $iSheet < $oBook->{SheetCount}; $iSheet++)
{
    $oWkS = $oBook->{Worksheet}[$iSheet];
    $sTableName = $oWkS->{Name};
    print "Table(WorkSheet name):", $sTableName, "\n";
    for (my $iR = $oWkS->{MinRow}; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow}; $iR++)
    {
        tie(%hNewDoc, "Tie::IxHash");
        for (my $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol}; $iC++)
        {
            $sFieldName = $oWkS->{Cells}[$oWkS->{MinRow}][$iC]->Value;
            $sFieldName =~ s/[^A-Z0-9]//gi; # Strip non-alphanumerics from the column name
            $oWkC = $oWkS->{Cells}[$iR][$iC];
            $hNewDoc{$sFieldName} = $dbh->quote($oWkC->Value) if ($oWkC && $sFieldName);
        }
        if ($iR == $oWkS->{MinRow}) {
            # Header row: create the table from the column names
            #eval { $dbh->do("DROP TABLE $sTableName") };
            $sSql = "CREATE TABLE IF NOT EXISTS $sTableName (" . (join " VARCHAR(512), ", keys(%hNewDoc)) . " VARCHAR(255))";
            #print "$sSql \n\n";
            $dbh->do("$sSql");
        } else {
            # Data row: insert the quoted cell values
            $sSql = "INSERT INTO $sTableName (" . (join ", ", keys(%hNewDoc)) . ") VALUES (" . (join ", ", values(%hNewDoc)) . ")\n";
            #print "$sSql \n\n";
            eval { $dbh->do("$sSql") };
        }
    }
    print "Rows inserted(Rows):", ($oWkS->{MaxRow} - $oWkS->{MinRow}), "\n";
}

# Disconnect from the database.
$dbh->disconnect();
Note:
Change the connection string ($dbh) to suit, and if needed add a user-id and password to the arguments.
If you need XLSX support, a quick switch to Spreadsheet::XLSX is all that's needed. Alternatively, it only takes a few lines of code to detect the file type and call the appropriate library.
The above is a simple hack and assumes everything in a cell is a string/scalar. If preserving type is important, a little function with a few regexps can be used in conjunction with a few if statements to ensure numbers/dates remain in the applicable format when written to the DB.
The above code depends on a number of CPAN modules that you can install (assuming outbound FTP access is permitted) via:
cpan YAML Data::Dumper Spreadsheet::ParseExcel Tie::IxHash Encode Scalar::Util File::Basename DBD::mysql
Should return something along the following lines ('tis rather slow, due to the auto-commit):
# ./Excel2mysql.pl test.xls
FILE: test.xls
DB: test
Collection Count: 1
Table(WorkSheet name):Sheet1
Rows inserted(Rows):9892

Related

Mysql update command from csv data

I am going round in circles; please can someone help with what I guess is a relatively easy problem.
I have a table with 1200 users.
One of the fields in table db_user_accounts is 'status'.
I have a sub-list of those users in a CSV, and I want to set their 'status' to '5'.
The CSV is ordered: user, status.
I found this -
<?php
if (($handle = fopen("input.csv", "r")) !== FALSE)
{
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE)
    {
        mysql_query("UPDATE db_user_accounts SET status = '{$data[1]}' WHERE user = '{$data[0]}'");
    }
    fclose($handle);
}
?>
I'm not sure what the 1000 is for or whether this will actually work.
Any advice gratefully received.
Thanks
This code looks good and should work. 1000 is the maximum line length, as described in the PHP manual (quoted below):
Ref: http://php.net/manual/en/function.fgetcsv.php
Must be greater than the longest line (in characters) to be found in
the CSV file (allowing for trailing line-end characters). Otherwise,
the line is split into chunks of length characters unless the split
would occur inside an enclosure.
Omitting this parameter (or setting it to 0 in PHP 5.1.0 and later)
the maximum line length is not limited, which is slightly slower.
What your code does is:
Open the CSV file in read mode; if the file is opened successfully, it enters the loop.
It then reads line by line (considering line lengths up to 1000) until the end of the file. The third parameter, ",", is the delimiter.
The output variable $data contains the values read from the current line, i.e. it holds the user account id and status.
Then you run the MySQL query to update the database.
Finally, it closes the opened CSV file.
Now, some advice:
You are passing the values read from the CSV file directly into a plain SQL query; doing so may cause unwanted errors or, worse, SQL injection.
What I suggest is to perform some kind of input validation on the data read from the CSV file and then use a parameterized SQL query.
Also, the mysql_* functions you are using are now deprecated. Use either MySQLi or PDO.
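For illustration, here is a parameterized form of that UPDATE written with MySQL's own prepared-statement syntax; the example values stand in for what PDO or MySQLi would bind from $data[1] and $data[0] on each CSV row:
PREPARE upd FROM 'UPDATE db_user_accounts SET status = ? WHERE user = ?';
SET @status = 5, @user = 'example_user'; -- example values only
EXECUTE upd USING @status, @user;
DEALLOCATE PREPARE upd;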

Move Data from Blob to SQL Database

I am currently in the process of moving my sensor reading data from my Azure Blob storage into a SQL database. I have multiple .csv files, and in those files I have various columns that hold the date (in the format 25/4/2017), time, sensor_location and sensor_readings.
My question: if I want to store the data in their respective columns using Logic App, what steps should I take? And how do I push the second file's data into the rows after the first file's data? Thanks
You will need to either write a script to import your data (any high-level language with MySQL support or extensions will do: Python, PHP, Node.js, etc.) or use a MySQL client like Sequel Pro (https://www.sequelpro.com/), which can import CSV files.
Here is a link on how to insert data into MySQL with PHP:
http://php.net/manual/en/pdo.prepared-statements.php
You can read the CSV file with:
$contents = file_get_contents('filename.csv');
$lines = explode("\n", $contents);
foreach ($lines as $line) {
    $row = str_getcsv($line);
    // insert each row into MySQL here (ideally with a prepared statement)
}
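As a sketch of the MySQL side, here is an illustrative table and insert; the table and column names are assumptions based on the columns described in the question, and STR_TO_DATE handles the 25/4/2017 date format. One such INSERT would run per CSV line, ideally with the values bound through a prepared statement:
CREATE TABLE IF NOT EXISTS sensor_readings (
    reading_date    DATE,
    reading_time    TIME,
    sensor_location VARCHAR(255),
    sensor_reading  DECIMAL(10,2)
);
-- The literal values below are examples only.
INSERT INTO sensor_readings (reading_date, reading_time, sensor_location, sensor_reading)
VALUES (STR_TO_DATE('25/4/2017', '%d/%m/%Y'), '14:30:00', 'sensor_location_1', 21.5);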

Creating Hive table - how to derive column names from CSV source?

...I really thought this would be a well-traveled path.
I want to create the DDL statement in Hive (or SQL for that matter) by inspecting the first record in a CSV file that exposes (as is often the case) the column names.
I've seen a variety of near-answers to this issue, but not too many that can be automated or replicated at scale.
I created the following code to handle the task, but I fear that it has some issues:
#!/usr/bin/python
import sys
import csv

# get file name (and hence table name) from command line
# exit with usage if no suitable argument
if len(sys.argv) < 2:
    sys.exit('Usage: ' + sys.argv[0] + ': input CSV filename')
ifile = sys.argv[1]

# emit the standard invocation
print 'CREATE EXTERNAL TABLE ' + ifile + ' ('

with open(ifile + '.csv') as inputfile:
    reader = csv.DictReader(inputfile)
    for row in reader:
        k = row.keys()
        sprung = len(k)
        latch = 0
        for item in k:
            latch += 1
            dtype = '` STRING' if latch == sprung else '` STRING,'
            print '`' + item.strip() + dtype
        break

print ')\n'
print "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
print "LOCATION 'replacethisstringwith HDFS or S3 location'"
The first is that it simply datatypes everything as a STRING. (I suppose that coming from CSV, that's a forgivable sin. And of course one could doctor the resulting output to set the datatypes more accurately.)
The second is that it does not sanitize the potential column names for characters not allowed in Hive table column names. (I easily broke it immediately by reading in a data set where the column names routinely had an apostrophe as data. This caused a mess.)
The third is that the data location is tokenized. I suppose with just a little more coding time, it could be passed on the command line as an argument.
My question is -- why would we need to do this? What easy approach to doing this am I missing?
(BTW: no bonus points for referencing the CSV Serde - I think that's only available in Hive 14. A lot of us are not that far along yet with our production systems.)
Regarding the first issue (all columns are typed as strings), this is actually the current behavior even if the table were being processed by something like the CSVSerde or RegexSerDe. Depending on whether the particulars of your use case can tolerate the additional runtime latency, one possible approach is to define a view based upon your external table that dynamically recasts the columns at query time, and direct queries against the view instead of the external table. Something like:
CREATE VIEW my_view AS
SELECT
    CAST(col1 AS INT) AS col1,
    CAST(col2 AS STRING) AS col2,
    CAST(col3 AS INT) AS col3,
    ...
FROM my_external_table;
For the second issue (sanitizing column names), I'm inferring your Hive installation is 0.12 or earlier (0.13 supports any unicode character in a column name). If you import the re regex module, you can perform that scrubbing in your Python with something like the following:
for item in k:
    ...
    print '`' + re.sub(r'\W', '', item.strip()) + dtype
That should get rid of any non-alphanumeric/underscore characters, which was the pre-0.13 expectation for Hive column names. By the way, I don't think you need the surrounding backticks anymore if you sanitize the column name this way.
As for the third issue (external table location), I think specifying the location as a command line parameter is a reasonable approach. One alternative may be to add another "metarow" to your data file that specifies the location somehow, but that would be a pain if you are already sitting on a ton of data files - personally I prefer the command line approach.
The Kite SDK has functionality to infer a CSV schema with the names from the header record and the types from the first few data records, and then create a Hive table from that schema. You can also use it to import CSV data into that table.

How to Map a CSV or Tab Delimited File to MySQL Multi-Table Database

I've got a pretty substantial XLS file a client provided: 830 total tabs/sheets.
I've designed a multi-table database with phpMyAdmin (MySQL, obviously) to house the information that's in there, and have populated about 5 of those sheets by hand to ensure the data will fit into the designed database.
Is there a piece of software or some sort of tool that will help me format this XLS document and map it to the right places in the database?
According to this thread you can import a CSV file exported from Excel with PHP.
Quoting @noelthefish:
As you seem to be using PHP, here is a function that is built into PHP:
array fgetcsv ( resource $handle [, int $length [, string $delimiter [, string $enclosure [, string $escape ]]]] )
<?php
$row = 1;
$handle = fopen("test.csv", "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $num = count($data);
    echo "<p> $num fields in line $row: </p>\n";
    $row++;
    for ($c = 0; $c < $num; $c++) {
        echo $data[$c] . "\n";
    }
}
fclose($handle);
?>
This is the very basics of what you need. You would, of course, put the database update part within the while loop. Take out the echo statements as well, unless you are debugging, but basically, with a little alteration, this small piece of code will do what you need. If you need more info, check out uk.php.net.
It's possible to import data from a spreadsheet into Access and map the fields to whatever table/column you want. It's also possible to use an ODBC connector to connect Access to a MySQL DB, essentially using Access as a front-end to MySQL.
Alternatively, you can do as toomanyairmiles suggests and simply write the PHP code to massage the CSV data into the format your MySQL DB needs. This is what I've done in the past when we needed to import sales data from disparate sources into an in-house sales/royalties-tracking system. If you need to do frequent imports, I would suggest automating a system (e.g. via an Excel macro) to export the individual sheets into CSVs in a single directory or, if it's more convenient, you can zip them together and upload the zip file to the PHP app.
If you're doing bulk imports into MySQL, you may want to consider using LOAD DATA INFILE, which is generally the fastest way to import data into MySQL—to the point where it's actually faster to write a new CSV file to disk after you've massaged the data, and then use LOAD DATA INFILE rather than doing a bulk INSERT directly. Doing this on my intranet actually cut the insertion time from over 3 seconds (already an improvement over the ~3min. it took to do thousands of individual INSERTs) down to 240ms.
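A sketch of what that final load can look like; the file path, table name, and delimiters here are illustrative and need to match whatever your massaging step writes out:
LOAD DATA LOCAL INFILE '/tmp/massaged_sales.csv'
INTO TABLE sales
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';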

How do I dump contents of a MySQL table to file using Perl?

What's the best way to dump a MySQL table to a file in Perl?
I've been using:
open( FILE, ">$filename" );
my $sth = $dbh->prepare("select * from table");
$sth->execute();
while ( my $row = $sth->fetchrow_arrayref ) {
    print FILE join( "|", @$row ), "\n";
}
Can you shell out to mysqldump? That's what it's there for...
It depends on what you really want. Do you want to preserve schema information and database metadata? What about column names?
On the other hand, your current method should work fine for data storage as long as the schema and column order don't change, but you should consider the case of a record with the "|" character in it and escape that value appropriately, and apply the corresponding logic when you read and parse the file back. You might want to look into Text::CSV for a fast, reliable and flexible implementation that does most of the work for you in both directions, both writing and reading the file.
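As an alternative to escaping in Perl, MySQL itself can write a properly quoted delimited file with SELECT ... INTO OUTFILE. A minimal sketch (the output path is illustrative, the file must not already exist on the server, and the MySQL account needs the FILE privilege):
SELECT *
FROM `table`
INTO OUTFILE '/tmp/table_dump.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';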
As already said, it depends on what you want to do. If the purpose is to back up the data, you should consider mysqlhotcopy (if you are using MyISAM tables), which copies the data/index files. It is much faster than manually dumping data (e.g. I get a 2.5 GB backup in approx 3 minutes).
It depends on why you need it dumped and what the size and content are. Assuming what you want is not a backup (which really calls for a different application than Perl anyway), I would go with something like the following, which will preserve your columns and make the data easier, in some respects, to slurp into other programs or hand off than a CSV.
use XML::Simple;
...
my @rows = ();
while ( my $h = $sth->fetchrow_hashref() )
{
    $h->{_ROWNUM} = $#rows;
    push(@rows, $h);
}
print XMLout(\@rows);