LONGTEXT valid in migration for PGSQL and MySQL

I am developing a Ruby on Rails application that stores a lot of text in a LONGTEXT column. I noticed that when deployed to Heroku (which uses PostgreSQL) I am getting insert exceptions because the values are too large for two of the columns. Is there something special that must be done in order to get a large text column type in PostgreSQL?
These were defined as "string" datatype in the Rails migration.

If you want the longtext datatype in PostgreSQL as well, just create it. A domain will do:
CREATE DOMAIN longtext AS text;
CREATE TABLE foo(bar longtext);

In PostgreSQL the required type is text. See the Character Types section of the docs.

A new migration that changes the column's datatype to 'text' should do the trick. Don't forget to restart the database. If you still have problems, take a look at your model with 'heroku console' and just enter the model name.
If the db restart doesn't fix the problem, the only way I found was to reset the database with 'heroku pg:reset'. Not a fun option if you already have important data in your database.
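For reference, a minimal migration along those lines could look like this (just a sketch; "articles" and "body" are placeholder names, adjust to your own schema):
# Sketch only: table and column names are placeholders.
class ChangeBodyToText < ActiveRecord::Migration
  def self.up
    change_column :articles, :body, :text
  end

  def self.down
    change_column :articles, :body, :string
  end
end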

Related

Postgresql Encoding Issue Using tsearch With Thai Text After Conversion from MariaDB using Pgloader

I am trying to convert a MySQL UTF8mb4 database which contains both Thai and English to Postgresql. This appears to go well until I try to add tsearch. Let me outline the steps taken.
- Install this Thai parser https://github.com/zdk/pg-search-thai
- Restore a copy of production locally from a dump file into MariaDB
- Fix some enum values that trip up Postgresql due to them being missing. MariaDB is happy with them :(
- Convert some polygons to text format as pgloader does not deal with them gracefully.
- Run pgloader against a fresh Postgresql database, testdb:
pgloader mysql://$MYSQL_USER:$MYSQL_PASSWORD@localhost/$MYSQL_DB postgresql://$PG_USER:$PG_PASSWORD@localhost/testdb
This appears to work: the site (a Laravel one) appears to function, although with some bugs to fix due to differences in constraint behavior between MariaDB and Postgresql. However, when I try to create text vectors for tsearch, I run into
encoding issues. This is where I need advice.
-- trying to create minimal case, dumping Thai names into a temporary table
CREATE EXTENSION thai_parser;
CREATE TEXT SEARCH CONFIGURATION thai_unstemmed (PARSER = thai_parser);
ALTER TEXT SEARCH CONFIGURATION thai_unstemmed ADD MAPPING FOR a WITH simple;
-- to test the parser is working, which it is
SELECT to_tsvector('thai_unstemmed', 'ข้าวเหนียวส้มตำไก่ย่าง ต้มยำกุ้ง in thailand');
-- to recreate my error I did this
CREATE TABLE vendor_names AS SELECT id,name from vendors_i18n;
ALTER TABLE vendor_names ADD COLUMN tsv_name_th tsvector;
-- this fails
UPDATE vendor_names SET tsv_name_th=to_tsvector('thai_unstemmed', coalesce(name, ''));
The error I get is ERROR: invalid byte sequence for encoding "UTF8": 0x80
If I dump that table and restore into a new Postgresql database I do not get the encoding error.
Questions:
What is the correct encoding to use for UTF8mb4 to Postgresql for pgloader?
Is there any way, other than the above, of checking the data being correct UTF8 or not?
Is the problem in the Thai parser tool?
Any suggestions as to how to solve this would be appreciated.
Cheers,
Gordon
PS I'm an experienced developer but not an experienced DBA.
Have you tried manually importing the dataset row-by-row to see which rows are successfully imported and which ones fail? If some imports succeed but others fail it would seem to be a data integrity problem.
If none of the records are successfully imported it's obviously an encoding problem.
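If only some rows are bad, one way to find them from inside PostgreSQL (just a sketch, using the vendor_names table from the question) is to loop over the rows and trap the error per row, so the offending ids are printed instead of the whole UPDATE aborting:
-- Sketch: report which rows break to_tsvector, one NOTICE per failing row.
DO $$
DECLARE
  r RECORD;
BEGIN
  FOR r IN SELECT id, name FROM vendor_names LOOP
    BEGIN
      PERFORM to_tsvector('thai_unstemmed', coalesce(r.name, ''));
    EXCEPTION WHEN OTHERS THEN
      RAISE NOTICE 'row % failed: %', r.id, SQLERRM;
    END;
  END LOOP;
END
$$;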

Sqoop compatibility with TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT

For a project of mine, I would like to transfer multiple tables from a MySQL database into Hive using Sqoop. Because I have a few columns that use the MEDIUMTEXT datatype, I'd like to check the compatibility with someone who has experience, to prevent sudden surprises down the road.
According to the latest Sqoop user guide (1.4.6), there is no support for BLOB, CLOB, or LONGVARBINARY columns in direct mode.
Given that there is no mention of incompatibilities with "TEXT" datatypes, will I be able to import them from MySQL without problems?
In MySQL, TEXT is the same as CLOB. Whatever limitations the user guide mentions for CLOB apply to TEXT types as well.
Unlike typical datatypes, CLOB and TEXT do not have to store their data inline with the record; instead, the contents can be stored in a separate location with a pointer in the record. That is why the direct path does not work for special types like CLOB/TEXT and BLOB in most databases.
I finally got around to setting up my hadoop cluster for my project. I am using hadoop 2.6.3 with hive 1.2.1 and sqoop 1.4.6.
It turns out that there is no problem with importing TEXT datatypes from MySQL into Hive using Sqoop. You can even supply the '--direct' parameter that makes use of the mysqldump tool for quicker transfers. In my project I had to import multiple tables containing 2 MEDIUMTEXT columns each. The tables were only about 2 GB each, so not that massive.
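For what it's worth, a typical invocation looks roughly like this (host, database, credentials and table name are placeholders; --direct needs the mysqldump binary available on the cluster nodes):
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table articles \
  --hive-import \
  --direct \
  -m 1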
I hope this helps someone who is in the same situation I was in.

Adding a TEXT field to entity framework model for MySql

I'm trying to add a text field to my EDMX file, which I have set to generate DDL for MySQL, but the only option I have is to add a string with the maximum length set to Max. This produces an error when executing the SQL statements against the database: a max length of 4000 is not supported.
I also tried it the other way around, updating the field in the database and then updating the EDMX file based on the database, but that sets the field back to a string field with maximum length set to None.
Am I overlooking something? Has anyone used this field?
Right now I have a kind of workaround to have my text field in the database mapped to a string property in the EDMX model. I generate the database script from the EDMX file, manually change the type of the TEXT columns from nvarchar(1000) to TEXT, execute it against the database, and after that validate the mappings in the EDMX file.
Hopefully someone will come up with a better solution, because this is definitely not a cool workaround.
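Concretely, the manual edit is just a change to the generated column definition before running the script, along these lines (table and column names here are made up for illustration):
-- as generated from the EDMX model:
CREATE TABLE `Articles` (`Id` int NOT NULL, `Body` nvarchar(1000) NOT NULL);
-- changed by hand before executing it:
CREATE TABLE `Articles` (`Id` int NOT NULL, `Body` TEXT NOT NULL);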
Update
This bug is fixed in MySQL Connector/NET version 6.4.4.
I'm not familiar with the Entity Framework, however a TEXT field in MySQL does not have a length as part of its definition. Only CHAR/VARCHAR does.
There is a maximum length of data that can be stored in TEXT, which is 64 KB.

mysql to oracle

I've googled this but can't get a straight answer. I have a mysql database that I want to import in to oracle. Can I just use the mysql dump?
Nope. You need to use some ETL (Extract, Transform, Load) tool.
Oracle SQL Developer has a built-in feature for migrating a MySQL DB to Oracle.
Try this link on migrating MySQL to Oracle - http://forums.oracle.com/forums/thread.jspa?threadID=875987&tstart=0
If the dump is a SQL script, you will need to do a lot of copy & replace to make that script work on Oracle.
Things that come to my mind (a small before/after example follows this list):
remove the dreaded backticks
remove all ENGINE=.... options
remove all DEFAULT CHARSET=xxx options
remove all UNSIGNED options
convert all DATETIME types to DATE
replace BOOLEAN columns with e.g. integer or a CHAR(1) (Oracle does not support boolean)
convert all int(x), smallint, tinyint data types to simply integer
convert all mediumtext, longtext data types to CLOB
convert all VARCHAR columns that are defined with more than 4000 bytes to CLOB
remove all SET ... commands
remove all USE commands
remove all ON UPDATE options for columns
rewrite all triggers
rewrite all procedures
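To illustrate several of these points together, a MySQL table definition like the first statement below would have to be rewritten roughly as the second (table and column names are made up):
-- MySQL original
CREATE TABLE `posts` (
  `id` int(11) UNSIGNED NOT NULL,
  `created_at` datetime DEFAULT NULL,
  `body` longtext,
  `published` boolean DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- rough Oracle equivalent (backticks, ENGINE/CHARSET and UNSIGNED removed;
-- types mapped to INTEGER, DATE, CLOB, CHAR(1))
CREATE TABLE posts (
  id INTEGER NOT NULL,
  created_at DATE DEFAULT NULL,
  body CLOB,
  published CHAR(1) DEFAULT '0'
);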
The answer depends on which MySQL features you use. If you don't use stored procedures, triggers, views etc, chances are you will be able to use the MySQL export without major problems.
Take a look at:
mysqldump --compatible=oracle
If you do use these features, you might want to try an automatic converter (Google offers some).
In every case, some knowledge of both syntaxes is required to be able to debug problems (there almost certainly will be some). Also remember to test everything thoroughly.

Adding a custom column data type in Active Record

On my local machine I develop my Rails application using MySQL, but on deployment I am using Heroku, which uses PostgreSQL. I need to create a new data type, specifically one I wish to call longtext, and it is going to need to map to a different column type in each database.
I have been searching for this. My basic idea is that I am going to need to override some hash inside of the ActiveRecord::ConnectionAdapters::*SQL adapter(s) but I figured I would consult the wealth of knowledge here to see if this is a good approach (and, if possible, pointers on how to do it) or if there is a quick win another way.
Right now the data type is "string" and I am getting failed inserts because the data is too long for the column. I want the same functionality on both MySQL and PgSQL, but it looks like there is no common data type that gives me an unlimited text blob column type?
The idea is that I want to have this application working correctly (with migrations) for both database technologies.
Much appreciated.
Why don't you install PostgreSQL on your dev machine? Download it, click "ok" a few times and you're up and running. It isn't rocket science :-)
http://www.postgresql.org/download/
PostgreSQL doesn't limit you to its built-in datatypes; you can create anything you want, it's up to your imagination (a short example follows this list):
CREATE DOMAIN (simple stuff only)
CREATE TYPE (unlimited)
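For example (just an illustrative sketch of the difference):
-- a domain is an alias for an existing type, optionally with a constraint
CREATE DOMAIN us_postal_code AS text CHECK (VALUE ~ '^\d{5}$');
-- a composite type can bundle several fields into one
CREATE TYPE address AS (street text, city text, zip us_postal_code);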
The SQL that Frank mentioned is actually the answer, but I really was looking for a more specific way to do RDBMS-specific Rails migrations. The reason is that I want my application to keep running on both PostgreSQL and MySQL.
class AddLongtextToPostgresql < ActiveRecord::Migration
  def self.up
    case ActiveRecord::Base.connection.adapter_name
    when 'PostgreSQL'
      execute "CREATE DOMAIN longtext as text"
      execute "ALTER TABLE chapters ALTER COLUMN html TYPE longtext"
      execute "ALTER TABLE chapters ALTER COLUMN body TYPE longtext"
    else
      puts "This migration is not supported on this platform."
    end
  end

  def self.down
  end
end
That is effectively what I was looking for.