I would like to import data from an XML file containing German umlauts (ä, ö, ü). To analyze the problem I imported the same data as an SQL INSERT, a CSV file, and an XML file. To identify the source of each import, the entries are marked with the prefixes insert, csv, and xml. Here is the result:
The umlauts from the xml source are not correctly imported.
Here is the code to reproduce this:
Table definition
DROP TABLE IF EXISTS test01.animal;
CREATE TABLE test01.animal (
name VARCHAR(50) DEFAULT NULL
, category VARCHAR(50) DEFAULT NULL
) ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8 COLLATE = utf8_unicode_ci
;
INSERT
INSERT INTO test01.animal (name, category)
VALUES
('insert_snäke','reptile')
,('insert_frög','amphibian')
,('insert_tüna','fish')
,('insert_racoon','mammal')
;
CSV
data
csv_snäke,reptile
csv_frög,amphibian
csv_tüna,fish
csv_racoon,mammal
sql
LOAD DATA INFILE 'C:/animals.csv'
INTO TABLE test01.animal
FIELDS TERMINATED BY ','
;
XML
data
<?xml version="1.0" encoding="UTF-8"?>
<database>
<select query="SELECT name, category FROM animal">
<row>
<name>xml_snäke</name>
<category>reptile</category>
</row>
<row>
<name>xml_frög</name>
<category>amphibian</category>
</row>
<row>
<name>xml_tüna</name>
<category>fish</category>
</row>
<row>
<name>xml_racoon</name>
<category>mammal</category>
</row>
</select>
</database>
Perl/XPath
use strict;
use DBI;
use XML::XPath;
use XML::XPath::XMLParser;
my $dir0 = "C:";
my $dbh = DBI->connect ("DBI:mysql:test01",
"root", "fire2013",
{ RaiseError => 1, PrintError => 0});
my $xp = XML::XPath->new (filename => "$dir0/animals.xml");
my $nodelist = $xp->find ("//row");
foreach my $row ($nodelist->get_nodelist ())
{
$dbh->do (
"INSERT INTO animal (name, category) VALUES (?,?)",
undef,
$row->find ("name")->string_value (),
$row->find ("category")->string_value ()
);
}
$dbh->disconnect ();
Any idea why I don't get umlauts when importing XML? Any help appreciated. Thanks.
PS: Windows 7 / MariaDB 5.5.31 / Strawberry Perl 5.16
Try this:
$dbh->do(qq{SET NAMES 'utf8'}) or die $dbh->errstr;
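To see why a wrong client character set mangles the umlauts, here is a small Python sketch (illustrative only, not part of the original Perl code): it shows what happens when UTF-8 bytes are reinterpreted as Latin-1, which is what a connection without SET NAMES utf8 effectively does.

```python
# UTF-8 bytes of a string with an umlaut, misread as Latin-1 (mojibake):
name = "xml_snäke"
mangled = name.encode("utf-8").decode("latin-1")
print(mangled)  # xml_snÃ¤ke
# Reading the bytes back with the right charset restores the original:
print(mangled.encode("latin-1").decode("utf-8"))  # xml_snäke
```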
Related
Using PowerShell 4.0 and SQL Server, I want to merge records from one database to another using Export-Csv and Import-Csv. I export from one database to a CSV file, and import from the CSV file into a temporary table in another database for a subsequent MERGE.
TableA is
([ID] int not null,
[Name] varchar(25) not null,
[StartDate] datetime null,
[Department] varchar(25) not null)
Values are ID=1, Name=Joe, StartDate=NULL, Department=Sales
exportTable.ps1 (Ignoring database config)
Invoke-Sqlcmd ("SELECT * FROM TableA WHERE ID=1") | Export-Csv -path a.csv -NoTypeInformation -Append
This results in a.csv
"ID","Name","StartDate","Department"
"1","Joe","","Sales"
importTable.ps1
CREATE TABLE TableATemporary
([ID] int not null,
[Name] varchar(25) not null,
[StartDate] datetime null,
[Department] varchar(25) not null)
Import-Csv a.csv | ForEach-Object {
$allValues = "'" + ($_.Psobject.Properties.Value -join "','") + "'"
Invoke-Sqlcmd ("INSERT INTO TableATemporary VALUES ($allValues)")
}
This gives a table of
Values are ID=1, Name=Joe, StartDate=1900-01-01 00:00:00:000, Department=Sales
Rather than a null entry, the datetime field is a default value because the field in the csv file is ""
Is there any way for the Export-Csv cmdlet to write nothing to the csv file for the empty fields in the database, instead of "" ?
Import-Csv always returns blank strings, but there are plenty of ways to set those values to $null if they're empty. For example, here I check for blank values before joining them:
Import-Csv a.csv | ForEach-Object {
# convert empty strings to null
$allValues = '"' + (($_.Psobject.Properties.Value | ForEach-Object {
if($_){"'$_'"} else{''} }) -join ',') + '"'
Invoke-Sqlcmd ("INSERT INTO TableATemporary VALUES $allValues")
}
Now $allValues no longer contains quoted empty strings:
"'1','Joe',,'Sales'"
I recommend using Write-SqlTableData for importing rather than running sqlcmd for each row, but it's just an efficiency thing.
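The same empty-to-NULL logic can be sketched in Python (an illustration of the technique, not the PowerShell answer itself): quote non-empty fields and emit an unquoted NULL for empty ones.

```python
def to_sql_values(row):
    """Build a SQL VALUES tuple, mapping empty strings to unquoted NULL.
    (Hypothetical helper for illustration; real code should prefer
    parameterized queries over string building.)"""
    parts = []
    for v in row:
        if v == "":
            parts.append("NULL")
        else:
            parts.append("'{}'".format(v.replace("'", "''")))  # escape quotes
    return "(" + ",".join(parts) + ")"

print(to_sql_values(["1", "Joe", "", "Sales"]))  # ('1','Joe',NULL,'Sales')
```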
Cpt.Whale's helpful answer shows how to unconditionally represent empty-string field values without embedded quoting ('') [update: should be NULL] in the argument list constructed for the SQL statement.
If you want explicit control over which fields should act this way, you can try the following (simplified example):
[pscustomobject] @{ ID = '1'; Name = 'Joe'; StartDate = ''; Department = '' } |
ForEach-Object {
$_.psobject.Properties.ForEach({
$quoteChar = "'"
$name, $value = $_.Name, $_.Value
# Determine whether to use (unquoted) NULL if the value is the empty string.
switch ($name) {
'StartDate' { if ($value -eq '') { $quoteChar = ''; $value = 'NULL' } }
# Handle other fields here, as needed.
}
'{0}{1}{0}' -f $quoteChar, $value, $quoteChar
}) -join ','
}
Output (note how the empty-string for StartDate resulted in unquoted NULL, whereas the one for Department is the quoted empty string):
'1','Joe',NULL,''
Note:
Ideally, whether to quote or not should be guided by the data types of the values, but CSV data is by definition all string-typed.
You can implement a data-type mapping of sorts by extending the switch statement above, so that, say, numeric fields are always used unquoted (e.g., as a switch branch for a numeric Age field: 'Age' { $quoteChar = '' }).
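The per-field switch above can be modelled in Python as a small rule function (the field names StartDate and Age are just the ones used in this example):

```python
def format_field(name, value):
    # Per-field rules, analogous to the switch statement above.
    if name == "StartDate" and value == "":
        return "NULL"            # unquoted NULL for an empty date
    if name == "Age":
        return value             # numeric field: never quoted
    return "'{}'".format(value)  # default: quoted string

row = {"ID": "1", "Name": "Joe", "StartDate": "", "Department": ""}
print(",".join(format_field(k, v) for k, v in row.items()))
# '1','Joe',NULL,''
```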
I am trying to generate JSON output from a SQL table and need help with the SQL statement, please. The "schemas" output is not coming out as I expected: my SQL query is returning extra escaped quotes. In the screenshot I indicated how my query should return the output as an array. Need help with fixing my SELECT statement.
Thanks in advance.
DROP TABLE #tmp
CREATE TABLE #tmp (
[EmployeeEmailAccount] [nvarchar](50) NULL,
[displayName] [nvarchar](50) NULL
) ON [PRIMARY]
GO
INSERT #tmp ([EmployeeEmailAccount], [displayName]) VALUES (N'test1@gmail.com', N'testusr1')
GO
SELECT TOP 1
[schemas] = '["urn:scim:schemas:core:2.0:User" , "urn:scim:schemas:extension:fa:2.0:faUser"]',
EmployeeEmailAccount as 'userName'
FROM #tmp
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
To get an array you can use the JSON_QUERY function:
SELECT TOP 1
[schemas] =
JSON_QUERY('["urn:scim:schemas:core:2.0:User" , "urn:scim:schemas:extension:fa:2.0:faUser"]'),
EmployeeEmailAccount as 'userName'
FROM #tmp
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
GO
This will return:
{
"schemas": [
"urn:scim:schemas:core:2.0:User",
"urn:scim:schemas:extension:fa:2.0:faUser"
],
"userName": "test1@gmail.com"
}
Note: to format the JSON like this I am using an SSMS extension, but you can use an external third-party app such as Notepad++ or VS Code.
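What JSON_QUERY changes can be mimicked in Python (a sketch of the concept, not of SQL Server internals): embedding the array as a plain string gets it re-escaped, while embedding it as real JSON keeps the array structure.

```python
import json

schemas = ["urn:scim:schemas:core:2.0:User",
           "urn:scim:schemas:extension:fa:2.0:faUser"]

# Without JSON_QUERY: the array is treated as an opaque string and escaped again.
as_string = json.dumps({"schemas": json.dumps(schemas)})
# With JSON_QUERY: the array is embedded as actual JSON.
as_array = json.dumps({"schemas": schemas})

print(as_string)  # the value of "schemas" is a string with escaped inner quotes
print(as_array)   # the value of "schemas" is a real JSON array
```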
user@host:~# mysql -V - mysql Ver 14.14 Distrib 5.7.25-28, for debian-linux-gnu (x86_64) using 7.0, running under debian-9.9
user@host:~# uname -a - Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
user@host:~# perl -MDBI -e 'print $DBI::VERSION ."\n";' - 1.636
user@host:~# perl -v - This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-gnu-thread-multi
mysql> SHOW CREATE TABLE tbl1;
tbl1 | CREATE TABLE `tbl1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`main_id` bigint(20) NOT NULL DEFAULT '0',
`debet` varchar(255) NOT NULL DEFAULT '',
`kurs` double(20,4) NOT NULL DEFAULT '0.0000',
`summ` double(20,2) NOT NULL DEFAULT '0.00',
`is_sync` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `main_id` (`main_id`)
) ENGINE=InnoDB AUTO_INCREMENT=70013000018275 DEFAULT CHARSET=utf8
mysql> SELECT * FROM tbl1 WHERE id=70003020040132;
+----------------+----------------+-------+--------+---------+---------+
| id             | main_id        | debet | kurs   | summ    | is_sync |
+----------------+----------------+-------+--------+---------+---------+
| 70003020040132 | 70003020038511 |       | 0.0000 | 1798.00 |       0 |
+----------------+----------------+-------+--------+---------+---------+
But when I fetch this data via the Perl DBI module I lose precision, and the values 0.0000 and 1798.00 become 0 and 1798.
The code is as follows:
####
# These 3 subs connect to the DB, execute a query, fetch the data with fetchall_arrayref, and convert undef to NULL.
####
sub DB_connect {
# DataBase Handler
my $dbh = DBI->connect("DBI:mysql:$DBNAME", $DBUSER, $DBPWD,{RaiseError => 0, PrintError => 0, mysql_enable_utf8 => 1}) or die "Error connecting to database: $DBI::errstr";
return $dbh;
}
sub DB_executeQuery {
# Executes SQL query. Return reference to array, or array, according to argv[0]
# argv[0] - "A" returns array, "R" - reference to array
# argv[1] - DB handler from DB_connect
# argv[2] - query to execute
my $choice=shift @_;
my $dbh=shift @_;
my $query=shift @_;
print "$query\n" if $DEBUG>2;
my $sth=$dbh->prepare($query) or die "Error preparing $query for execution: $DBI::errstr";
$sth->execute;
my $retval = $sth->fetchall_arrayref;
if ($choice eq "A" ) {
my @ret_arr=();
foreach my $value (@{ $retval }) {
push @ret_arr,@{ $value };
}
return @ret_arr;
}
elsif ($choice eq "R") {
return $retval;
}
}
sub undef2null {
# argv[0] - reference to array of values where undef
# values have to be changed to NULL
# Returns array of prepared values: (...) (...) ...
my $ref=shift @_;
my @array=();
foreach my $row (@{ $ref }) {
my $str="";
foreach my $val ( @{ $row } ) {
if (! defined ( $val )) {
$str="$str, NULL";
}
else {
# Escape quotes and other symbols listed in square brackets
$val =~ s/([\"\'])/\\$1/g;
$str="$str, \'$val\'";
}
}
# Remove ', ' at the beginning of each VALUES substring
$str=substr($str,2);
push @array,"($str)";
} # End foreach my $row (@{ $ref_values })
return @array;
} # End undef2null
#### Main call
#...
# Somewhere in code I get data from DB and print it to out file
my @arr_values=();
my @arr_col_names=DB_executeQuery("A",$dbh,qq(SELECT column_name FROM `information_schema`.`columns` WHERE `table_schema` = '$DBNAME' AND `table_name` = '$table'));
my @arr_ids=DB_executeQuery("A",$dbh,qq(SELECT `id` FROM `$table` WHERE `is_sync`=0));
my $ref_values=DB_executeQuery("R",$dbh,"SELECT * FROM `$table` WHERE `id` IN (".join(",",@arr_ids).")");
@arr_values=undef2null($ref_values);
print FOUT "REPLACE INTO `$table` (`".join("`, `",@arr_col_names)."`) VALUES ".(join ", ",@arr_values).";\n";
and as a result I get the following string:
REPLACE INTO `pko_plat` (`id`, `main_id`, `debet`, `kurs`, `summ`, `is_sync`) VALUES ('70003020040132', '70003020038511', '', '0', '1798', '0')
In the DB it was 0.0000 and became 0; it was 1798.00 and became 1798.
Perl's DBI documentation says it fetches data 'as is' into strings, with no translations made. But then who rounded the values?
The rounding you see is happening because of the way you create the columns.
`kurs` double(20,4) NOT NULL DEFAULT '0.0000'
`summ` double(20,2) NOT NULL DEFAULT '0.00'
If you look at the MySQL floating-point types documentation you will see that you are using the non-standard syntax double(m,d), where the two parameters define how the value is displayed.
So in your case the values stored in summ will be displayed with 2 digits after the decimal point. This means that when Perl fetches a value that is 1.0001 in the database, the value delivered by the database is already rounded to the set number of digits (in this case .00).
Perl in turn interprets this value ("1.00") as a number and, when printing it, will not show any trailing zeros. If you want these you should account for this in your output.
For example: printf("%.2f\n", $summ);
The way I see it you have two options (if you want to avoid this loss of precision):
Only insert numbers with the correct precision into the database (so for summ only two decimal digits, and four for kurs).
Alter your table definition to the standard syntax for floats and handle the output formatting in Perl (which you will be doing either way):
`kurs` double NOT NULL DEFAULT '0'
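The visible effect is easy to reproduce in Python (illustrative; the rounding itself happens on the MySQL side, while Perl or Python merely drop trailing zeros when printing a float):

```python
# The server delivers the string "1798.00"; the client parses it as a float,
# and default printing drops the trailing zeros.
val = float("1798.00")
print(val)                       # 1798.0
# Fixed-point formatting on the client side restores the display precision:
print("%.2f" % val)              # 1798.00
print("%.4f" % float("0.0000"))  # 0.0000
```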
Consider the following table:
CREATE TABLE t1 (f1 VARCHAR(255));
Then, be it ruby:
#!/usr/bin/env ruby
require 'json'
require 'sequel'
require 'mysql2'
DB = Sequel.connect(
:adapter => 'mysql2',
:database => 'd1',
:user => '<user>',
:password => '<password>',
:encoding => 'utf8')
v1 = '{"a":"b\ud83c\udf4ec"}'
v2 = JSON.parse(v1)
p v2['a']
DB[:t1].truncate
DB[:t1].insert(f1: v2['a']);
p DB[:t1].first[:f1]
or php:
#!/usr/bin/env php
<?php
$dbh = new PDO('mysql:dbname=d1', '<user>', '<password>', [
PDO::MYSQL_ATTR_INIT_COMMAND => 'SET NAMES utf8',
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$dbh->exec('TRUNCATE TABLE t1');
$v1 = '{"a":"b\ud83c\udf4ec"}';
$v2 = json_decode($v1);
var_dump($v2->a);
$sth = $dbh->prepare("INSERT INTO t1 VALUES (?)");
$sth->execute([$v2->a]);
$sth = $dbh->query("SELECT * FROM t1");
var_dump($sth->fetch()['f1']);
what gets stored in the database is just b; everything from the emoji onward is truncated. I'm running mysql-5.1 and the documentation says:
MySQL 5.1 supports two character sets for storing Unicode data:
ucs2, the UCS-2 encoding of the Unicode character set using 16 bits per character.
utf8, a UTF-8 encoding of the Unicode character set using one to three bytes per character.
These two character sets support the characters from the Basic Multilingual Plane (BMP) of Unicode Version 3.0. BMP characters have these characteristics:
Their code values are between 0 and 65535 (or U+0000 .. U+FFFF).
What am I doing wrong?
UPD
$ mysql -BNe 'SHOW CREATE TABLE t1' d1
t1 CREATE TABLE `t1` (\n `f1` varchar(255) DEFAULT NULL\n) ENGINE=InnoDB DEFAULT CHARSET=utf8
It appears those two escape sequences represent only one character: RED APPLE (U+1F34E), the first one being a high surrogate. And surrogates are:
The UCS uses surrogates to address characters outside the initial Basic Multilingual Plane without resorting to more than 16 bit byte representations.
So that must be it, the resulting character is outside the BMP. And is not supported by mysql's utf8 character set as such.
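This is easy to verify in Python (an illustration of the surrogate-pair arithmetic, not of MySQL itself): decoding the JSON combines the pair into one code point above U+FFFF, whose UTF-8 form is 4 bytes long, more than MySQL's 3-byte utf8 charset can store.

```python
import json

# The two escapes form a UTF-16 surrogate pair; JSON decoding merges them.
s = json.loads('"b\\ud83c\\udf4ec"')
print(s)                          # b🍎c
print(hex(ord(s[1])))             # 0x1f34e -> outside the BMP (> U+FFFF)
print(len(s[1].encode("utf-8")))  # 4 bytes; MySQL's utf8 stores at most 3
```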
In my MySQL 5.1 (from Debian), doing
CREATE TABLE t1 (f1 VARCHAR(255));
effectively creates a latin1 table:
mysql> show CREATE TABLE t1 ;
+-------+---------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+---------------------------------------------------------------------------------------------+
| t1 | CREATE TABLE `t1` (
`f1` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+---------------------------------------------------------------------------------------------+
So please check first that your MySQL really defaults to UTF-8.
Then, MySQL's utf8 character set is known to NOT be able to store every Unicode character; it only covers the BMP. I don't find references about that, but saw it earlier.
So much so that MySQL 5.5.3 introduced a new utf8mb4 character set with full Unicode support, as stated here: https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html
Finally, even if the BMP spans code points 0 to 0xFFFF, that doesn't mean all of that space is assigned, as stated here: https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane
I have PHP code which parses XML files and stores the parsed information in my MySQL database on a Linux server.
All my tables' collation is 'utf8_unicode_ci'; even my database collation is 'utf8_unicode_ci'.
I have a lot of languages in my XML files (Turkish, Swedish, French, ...) and I want the special characters in these languages to be stored in their original form, but my problem is that with an XML pattern like this:
<match date="24.08.2012" time="17:00" status="FT" venue="Stadiumi Loro Boriçi (Shkodër)" venue_id="1557" static_id="1305963" id="1254357">
the venue value will be stored in my database like this:
Stadiumi Loro BoriÃ§i (ShkodÃ«r)
Can anybody help me store the value in my database as it is?
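The corruption pattern is the classic UTF-8-read-as-Latin-1 mojibake; a quick Python check (illustrative, independent of the PHP code) shows both how it arises and that it is reversible:

```python
original = "Stadiumi Loro Boriçi (Shkodër)"
# Sending UTF-8 bytes over a Latin-1 connection produces the corrupted form:
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)  # Stadiumi Loro BoriÃ§i (ShkodÃ«r)
# As long as no bytes were dropped, the damage can be undone:
print(mojibake.encode("latin-1").decode("utf-8") == original)  # True
```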
Try to use the "SET NAMES utf8" command before your queries. As the manual says:
SET NAMES indicates what character set the client will use to send SQL statements to the server.
I tried to imitate your case. Created a table test1:
DROP TABLE IF EXISTS `test1`;
CREATE TABLE `test1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` varchar(20) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
And tried to insert a string with French special characters and Chinese. For me it worked ok even without SET NAMES.
<?php
$str = "çëÙùÛûÜüÔô汉语漢語";
try {
$conn = new PDO('mysql:host=localhost;dbname=test', 'user', '');
$conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$conn->exec("SET NAMES utf8");
$stmt = $conn->prepare('INSERT INTO test1 SET value = :value');
$stmt->execute(array('value' => $str));
//select the inserted row
$stmt = $conn->prepare('SELECT * FROM test1 WHERE id = :id');
$stmt->execute(array('id' => 1));
while($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
print_r($row);
}
}
catch(PDOException $e) {
echo 'ERROR: ' . $e->getMessage();
}
It worked correctly, printing this:
Array
(
[id] => 1
[value] => çëÙùÛûÜüÔô汉语漢語
)
Also, when you are testing, don't read directly from the console; redirect the output into a file and open it with a text editor:
php test.php > 1.txt