I am migrating a database from PostgreSQL to MySQL.
We were saving files in the database as PostgreSQL bytea columns. I wrote a script to export the bytea data and insert it into a new MySQL database as a BLOB. The data inserts into MySQL fine, but it does not work at the application level. However, the application should not care, since the data is exactly the same. I am not sure what is wrong, but I suspect it is some difference between MySQL and PostgreSQL. Any help would be greatly appreciated.
This could really be a number of issues, but I can offer some tips on converting binary data between SQL vendors.
The first thing to be aware of is that each SQL database vendor uses different escape characters. I suspect that your binary data export is in hex, and that unwanted escape characters came along when you imported into the new database.
I recently had to do this. The exported binary data was in hex, and vendor-specific escape characters were included.
In your new database, check whether the text value of the binary data starts with an 'x' or other unusual encoding. If it does, you need to get rid of it. Since you already have the data inserting properly, you can test by writing a SQL script that removes any unwanted vendor-specific escape characters from each imported binary record in your new database. Finally, you may need to UNHEX each new record.
So, something like this worked for me:
UPDATE my_example_table
SET my_blob_column = UNHEX(SUBSTRING(my_blob_column, 2, CHAR_LENGTH(my_blob_column)))
Note: The 2 in the SUBSTRING call is there because the export script was using hex and prepending a vendor-specific escape prefix; only a single leading character survived in my case. PostgreSQL's hex format actually prepends the two-character prefix '\x', so if the full prefix made it into your data, start the SUBSTRING at 3 instead.
I am not sure that will work for you, but it may be worth a try.
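In case it helps, the same cleanup can also be done at export time, before the data ever reaches MySQL. Here is a minimal sketch in Python; the function name and sample value are hypothetical, and it assumes the export is in PostgreSQL's hex bytea format, which prefixes values with '\x':

```python
def bytea_hex_to_bytes(exported: str) -> bytes:
    """Convert a bytea value exported in PostgreSQL hex format
    (e.g. '\\x45fe1200...') into the raw bytes to insert as a BLOB."""
    if exported.startswith("\\x"):
        exported = exported[2:]  # strip the vendor-specific '\x' prefix
    return bytes.fromhex(exported)

raw = bytea_hex_to_bytes("\\x48656c6c6f")
print(raw)  # b'Hello'
```

Inserting `raw` through a parameterized query then stores the bytes untouched, with no escape-character cleanup needed afterwards.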
I have a simple yet (for me) difficult problem. I am encoding some integer data into a binary format in a Python script, and now I would like to load this binary data into a MySQL table. I'm using pymysql to connect to the MySQL database.
I've tried dropping the encoding in Python and leaving it to MySQL to load the data into a BLOB column. Here MySQL applies a utf-8 encoding, which I'm not interested in. Maybe I can change the way MySQL encodes BLOB columns?
Or better yet, can I just take the encoded data that I already have and load it into the MySQL table directly?
Thanks!
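For what it's worth, you can do exactly that: pack the integers yourself and pass the resulting bytes as a query parameter, so the driver sends them as binary and no character-set conversion touches a BLOB column. A sketch (the table, column, and connection details below are placeholders):

```python
import struct

# Pack some integers into raw binary. Big-endian 32-bit unsigned here;
# pick whatever format your consumer expects.
values = [1, 2, 65535]
payload = struct.pack(f">{len(values)}I", *values)
print(len(payload))  # 12: three 4-byte integers

# With pymysql, pass the bytes as a parameter; no UTF-8 encoding is
# applied to data bound to a BLOB column this way.
#
# import pymysql
# conn = pymysql.connect(host="localhost", user="user",
#                        password="...", database="mydb")
# with conn.cursor() as cur:
#     cur.execute("INSERT INTO my_table (data) VALUES (%s)", (payload,))
# conn.commit()
```

The key point is to use a parameterized query rather than building the SQL string yourself, so the raw bytes never pass through a text encoding step.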
I have converted from a MySQL database to Postgres. During the conversion, the picture column in Postgres was created as bytea.
This Xojo code works in MySQL but not Postgres.
Dim mImage As Picture
mImage = rs.Field("Picture").PictureValue
Any ideas?
I don't know about this particular issue, but here's what you can do to find out yourself, perhaps:
Pictures are stored as BLOBs in the database. This means the column must also be declared as BLOB (or a similar binary type). If it was accidentally declared as TEXT, things would still work as long as the database never gets exported by other means; that is, as long as only your Xojo code reads and writes the record, using the PictureValue functions, the data stays in BLOB form. But if you then convert to another database, the BLOB data would be read as text, and in that process it might get mangled.
So it may be relevant to let us know how you converted the DB. Did you perform an export as SQL commands and then import it into Postgres by running those commands? Do you still have the export file? If so, find a record with picture data in it and see whether that data starts with x' followed by hex byte code, e.g. x'45FE1200... and so on. If it doesn't, that's another indicator for my suspicion.
So, check the type of the Picture column in your old DB first. If that specifies a binary data type, then the above probably does not apply.
Next, you can look at the actual binary data that Xojo reads. To do that, get the BlobValue instead of the PictureValue, and store it in a MemoryBlock. Do this for a single picture in both the old and the new database. The MemoryBlocks should contain the same bytes. If they don't, the data was not transferred correctly. Why? Well, that depends on how you converted it.
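That byte-for-byte comparison doesn't have to happen inside Xojo; any language works once you've exported the two values. A sketch in Python, where the two blob values stand in for what you'd pull from the old and new databases:

```python
import hashlib

# Stand-ins for the same record's binary data read from each database.
old_blob = b"\x45\xfe\x12\x00"
new_blob = b"\x45\xfe\x12\x00"

def digest(b: bytes) -> str:
    """Hash the bytes so large blobs can be compared at a glance."""
    return hashlib.sha256(b).hexdigest()

print(digest(old_blob) == digest(new_blob))  # True when the transfer was lossless
```

If the digests differ, the data was altered somewhere during the conversion, and you can start narrowing down where.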
I've been given some data which I am trying to import into MySQL. The data was provided in a text file format, which is usually fine by me. I know MSSQL uses different data types, so a SQL dump was a non-starter...
For some reason MSSQL seems to store LINESTRINGs in reverse order, which seemed very odd to me. As a result, when I try to upload the file with Navicat the import fails. Below is an example of the LINESTRING; as you can see, the longitude is first, then the latitude, which is what I believe to be the issue.
LINESTRING (-1.61674 54.9828,-1.61625 54.9828)
Does anybody know how I can get this data into my database?
I'm quite new to spatial/geometry extensions.
Thanks,
Paul
You must remember that columns with spatial data have their own data type. What Navicat does is call something like toString() or AsText() to display the data, but underneath they are BLOBs. The advantage is that both vendors are based on the WKT standard, so I recommend exporting the spatial data from the source DB as text (WKT), then in the destination DB taking that text and using GeomFromText() to convert it back into spatial data. (Obviously you have to write a script in some programming language for this; you can't do it with Navicat alone.)
Further info: WKT, MySQL spatial, SQL Server spatial
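As a side note, the WKT standard itself puts X before Y, i.e. longitude then latitude, so check whether a swap is really needed before applying one. If the destination genuinely expects latitude first, a small script can rewrite the WKT on the way in; a sketch in Python (the function name is hypothetical):

```python
def swap_linestring_axes(wkt: str) -> str:
    """Swap the coordinate order in a WKT LINESTRING,
    e.g. 'lon lat' pairs become 'lat lon' pairs."""
    prefix = "LINESTRING ("
    assert wkt.startswith(prefix) and wkt.endswith(")")
    body = wkt[len(prefix):-1]
    pairs = [p.split() for p in body.split(",")]
    swapped = ",".join(f"{b} {a}" for a, b in pairs)
    return f"{prefix}{swapped})"

print(swap_linestring_axes("LINESTRING (-1.61674 54.9828,-1.61625 54.9828)"))
# LINESTRING (54.9828 -1.61674,54.9828 -1.61625)
```

The rewritten WKT string can then be fed to MySQL's GeomFromText() (ST_GeomFromText() in newer versions) in the INSERT.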
I haven't seen any data type that can store a file in SQL. Is there something like that? In particular, I want to insert source code into my table. What is the best way to do it? It can either be stored in my database as nicely formatted text, or better (what I actually want), stored as a single file. Please note that I'm using MySQL.
It is best not to store a file in your SQL database, but to store a path to the file on the server (or any other UNC path) that your application can retrieve by itself and do with whatever is necessary.
see this: https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database
and this:
Better way to store large files in a MySQL database?
and if you still want to store the file on the DB.. here is an example:
http://mirificampress.com/permalink/saving_a_file_into_mysql
If you can serialize the file, you can store it as binary and then deserialize it when needed:
http://dev.mysql.com/doc/refman/5.0/en/binary-varbinary.html
You can also use a BLOB (http://dev.mysql.com/doc/refman/5.0/en/blob.html), which has some differences. Normally I just store the file in the filesystem and a pointer in the DB, which makes serving it back via something like HTTP a bit easier and doesn't bloat the database.
Storing the file in a table only makes sense if you need to do searches in that code. In other cases, you should only store a file's URL.
If you want to store a text file, use the TEXT datatype. Since it is source code, you may consider using the ASCII character set to save space; be aware, though, that this will cause character-set conversions during your queries, which affects performance. Also, if it is ASCII you can use REGEXP for searches (that operator doesn't work with multi-byte charsets).
To load the file, if the file is on the same server as MySQL, you can use the LOAD_FILE() function within an INSERT.
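Note that LOAD_FILE() requires the file to live on the database server and to be readable under the server's secure_file_priv setting. A simpler alternative is to read the file client-side and insert the bytes as a parameter; a sketch (file name, table, and connection details are placeholders):

```python
import tempfile
from pathlib import Path

# Stand-in for the real source file you want to store.
tmp = Path(tempfile.mkdtemp()) / "example.py"
tmp.write_text("print('hello')\n")

blob = tmp.read_bytes()  # raw file contents, ready to bind to a BLOB column
print(blob)  # b"print('hello')\n"

# import pymysql
# conn = pymysql.connect(host="localhost", user="user",
#                        password="...", database="mydb")
# with conn.cursor() as cur:
#     cur.execute("INSERT INTO sources (name, code) VALUES (%s, %s)",
#                 (tmp.name, blob))
# conn.commit()
```

This way the file never has to exist on the database server at all.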
We are importing data from .sql script containing UTF-8 encoded data to MySQL database:
mysql ... database_name < script.sql
Later this data is displayed on a page in our web application (connected to that database), again in UTF-8. But somewhere in the process something went wrong, because non-ASCII characters were displayed incorrectly.
Our first attempt to solve it was to change mysql columns encoding to UTF-8 (as described for example here):
alter table wp_posts change post_content post_content LONGBLOB;
alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET utf8;
But it didn't help.
Finally we solved the problem by importing the data from the .sql script with an additional command-line flag which, I believe, forced the mysql client to treat the data in the .sql script as UTF-8:
mysql ... --default-character-set=utf8 database_name < script.sql
It helped, but then we realized that this time we had forgotten to change the column encoding to utf8; it was still set to latin1, even though UTF-8 encoded data was flowing through the database (from the SQL script to the application).
So if data obtained from the database is displayed correctly even when the database character set is set incorrectly, why the heck should I bother setting the correct database encoding?
Especially I would like to know:
What parts of database rely on column encoding setting? When this setting has any real meaning?
On what occasions implicit conversion of column encoding is done?
How does the trick with converting a column to binary format and then to the destination encoding work (see the SQL snippet above)? I still don't get it.
Hope someone can help me clear things up...
The biggest reason, in my view, is that it breaks your DB consistency.
it happens way too often that you need to check data in the database, and if you cannot properly input UTF-8 strings coming from the web page into your MySQL CLI client, it's a pity;
if you need to use phpMyAdmin to administer your database alongside the "correct" web app, then you're limiting yourself (might not be an issue, though);
if you need to build a report on your data, then you're greatly limited in the number of possible choices, given that only the web app produces the correct output;
if you need to deliver a partial database extract to a partner or external company for analysis, and the extract is messed up, it's a pity.
Now to your questions:
When you ask the database to ORDER BY a column of a string data type, the sorting rules take the column's encoding into account, since internal transformations may be applied when different columns have different encodings. The same applies when you compare strings: encoding information is essential there. Encoding comes together with collation, although most people don't use that feature very often.
As mentioned, if you have columns in different encodings, the database will implicitly convert values to a common encoding, which nowadays is UTF-8. Implicit re-encoding of strings can also happen in the client frameworks/libraries, depending on the client environment's encoding. Typically data is recoded into the database's encoding when sent to the server, and back into the client's encoding when results are delivered.
Binary data has no notion of encoding; it's just a set of bytes. So when you convert to binary, you're telling the database to "forget" the encoding, while keeping the data unchanged. Later, you convert to a string, enforcing the right encoding. This trick helps if you're sure the data is physically UTF-8 while, by some accident, a different encoding was declared.
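The same round trip can be demonstrated outside the database. A sketch in Python, where latin1 plays the role of the wrongly declared column encoding (latin1 maps every byte, so decoding never fails or changes the underlying bytes):

```python
original = "żółć"                      # non-ASCII text
stored = original.encode("utf-8")      # the bytes physically in the column
mislabeled = stored.decode("latin1")   # how a latin1-declared column presents them

# The SQL trick: convert "to binary" (forget the wrong label),
# then re-label the same bytes as UTF-8.
recovered = mislabeled.encode("latin1").decode("utf-8")
print(recovered == original)  # True: the bytes were never changed
```

This is exactly what the LONGBLOB-then-LONGTEXT ALTER TABLE pair does: the BLOB step drops the wrong charset label without touching the bytes, and the LONGTEXT step attaches the correct one.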
Given that you managed to load the data into the database only by using --default-character-set=utf8, there was something off with your environment; I suspect it was not set up for UTF-8.
I think the best practice today would be to:
have all your environments being UTF8 ready, including shells;
have all your databases defaulting to UTF8 encoding.
This way you'll have less field for errors.