What Happened?
We migrated from MySQL to Postgres and had Laravel re-run its migrations with the pgsql driver instead of mysql. That worked great; it set up the database just fine. Since we are moving from Laravel to Rails, we created Rails migrations to rename fields to be more 'rails-like' as well as to import the db data (not the structure) into Postgres. We have run into a lot of issues from this, but most importantly the sequences behind the auto-increment columns are out of sync now. They are trying to start from 1.
What Can I Not Figure Out?
I can manually reset all of them using something like SELECT pg_catalog.setval('availabilities_id_seq', 700, false); but that isn't very helpful as we do this on multiple environments with so many tables. Is there a way you can think of to find the last autoincrement id and then set the sequence from that?
In case you're curious, here is the Rails migration that moves the data, in case anyone has tips on what we did wrong (my_models being the array of tables):
my_models.each do |m|
  puts m
  "Mysql#{m}".constantize.find_each(batch_size: 100) do |old_model|
    new_model = m.constantize.new
    new_model.attributes = old_model.attributes
    new_model.save
  end
end
You can check the following discussions:
Postgresql - Using subqueries with alter sequence expressions
How to reset postgres' primary key sequence when it falls out of sync?
The PostgreSQL wiki also has a possible solution for your case: Fixing Sequences.
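The approach in those links boils down to setting each sequence to the current maximum id plus one. A minimal Ruby sketch that only builds the SQL statements, assuming every table has a serial primary key column named id whose sequence is owned by that column (the table list and the helper name reset_sequence_sql are hypothetical):

```ruby
# Builds one setval statement per table. Assumes each table has a serial
# primary key (default name "id") whose sequence is owned by that column.
def reset_sequence_sql(table, pk = "id")
  # pg_get_serial_sequence looks up the sequence attached to table.pk.
  # Passing false as the third argument makes the next nextval() return
  # exactly MAX(pk) + 1 (or 1 when the table is empty).
  "SELECT setval(pg_get_serial_sequence('#{table}', '#{pk}'), " \
    "COALESCE((SELECT MAX(#{pk}) FROM #{table}), 0) + 1, false);"
end

tables = %w[availabilities users] # hypothetical table list
statements = tables.map { |t| reset_sequence_sql(t) }
puts statements

# Inside a Rails app with the PostgreSQL adapter, the built-in helper
# should do the same job without hand-written SQL:
#   conn = ActiveRecord::Base.connection
#   conn.tables.each { |t| conn.reset_pk_sequence!(t) }
```

The generated statements can be pasted into psql or run from a migration on each environment, so no maximum id has to be looked up by hand.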
I am an amateur with PostgreSQL and a newbie with Liquibase.
I am using Puppet and Liquibase to create a PostgreSQL database on a RHEL server.
I drop the database (puppet resource postgresql_database ensure=absent), then run Puppet to re-create the database.
I log into psql and run \dt, \di and \ds. There are no duplicate tables or indexes, but there are duplicate sequences, e.g.
activity_log_activities_id_seq
activity_log_activities_id_seq1
The baseline.xml lists each sequence only once, e.g.
<createSequence sequenceName="activity_log_activities_id_seq"/>
<createSequence sequenceName="activity_log_activity_products_id_seq"/>
I've googled "liquibase duplicate sequences id1" etc. but got no good hits.
Please advise.
We didn't figure out what was creating the sequences the first time, but we did resolve it by removing the creation of the sequences from the included baseline.xml.
Now I'm attempting to figure out why, when I add includeAll with path="sqls" and run java -jar ../liquibase.jar --changeLogFile=master.xml update,
it comes back with "Liquibase Update Successful"
but makes no change to the database or the databasechangelog table. I'll do a search and see if it's already been answered elsewhere... thanks!
I have a text field in my database and an index on its first 10 characters. How do I specify that in my Doctrine entity?
I can't find any information about database specific options for indexes anywhere :/
This is my "partial" MySQL create statement:
KEY `sourceaddr_index` (`sourceaddr`(10)),
And this is my @Index in Doctrine:
@ORM\Index(name="sourceaddr_index", columns={"sourceaddr"}, options={}),
This doesn't interfere with regular use, but I noticed the problem when deploying the development setup to a new laptop and creating the database from my entities...
Any help would be appreciated :)
Possible since Doctrine 2.9, see: https://github.com/doctrine/dbal/pull/2412
@Index(name="slug", columns={"slug"}, options={"lengths": {191}})
Unfortunately, Doctrine seems to be very picky about whitespace placement, so e.g. update --dump-sql yields:
DROP INDEX slug ON wp_terms;
CREATE INDEX slug ON wp_terms (slug(191));
and even if you execute those, the messages will stay there (tested with MariaDB 10.3.14).
I've had very good luck naming the index in Doctrine after manually creating it in MySQL. It's not pretty or elegant, and it's prone to cause errors when moving from dev to production if you forget to recreate the index. But Doctrine seems to understand and respect it.
In my entity, I have the following definition. Doctrine ignores the length option - it's wishful thinking on my part.
/**
 * Field
 *
 * @ORM\Table(name="field", indexes={
 *     @ORM\Index(name="field_value_bt", columns={"value"}, options={"length": 100})
 * })
And in MySQL, I execute
CREATE INDEX field_value_bt ON field (value(100))
As far as I've seen, Doctrine just leaves the index alone so long as it's named the same.
In short: you can't set this within Doctrine. Doctrine's ORM is specifically focused on cross-vendor compatibility, and the type of index you're describing, though supported in many modern RDBMSs, is somewhat outside the scope of what Doctrine handles.
Unfortunately there isn't an easy way around this if you use Doctrine's schema updater (in Symfony that would be php app/console doctrine:schema:update --force): if you manually update the database, Doctrine will sometimes revert that change to keep things in sync.
In instances where I've needed something like this I've just set up a fixture that sends the relevant ALTER TABLE statement via SQL. If you're going to be distributing your code (i.e. it may run on other/older databases) you can wrap the statement with a platform check to make sure.
It's not ideal but once your app/software stabilises, issues like this shouldn't happen all that often.
I use Kettle for some transformations and ran into a problem:
For one specific row, my DatabaseLookup step hangs. It just doesn't give a result. Trying to stop the transformation results in a never ending "Halting" for the lookup step.
The value given is nothing complicated at all, nor is it different from all the other rows/values. It just won't continue.
Doing the same query in the database directly or in a different database tool (e.g. SQuirreL), it works.
I use Kettle/Spoon 4.1, the database is MySQL 5.5.10. It happens with Connector/J 5.1.14 and the one bundled with spoon.
The step initializes flawlessly (it even works for other rows) and I have no idea why it fails. No error message in the Spoon logs, nothing on the console/shell.
Weird. What's the table type? Is it MyISAM? Does your transform also perform updates to the same table? Maybe you are inadvertently locking the table at the same time somehow?
Or maybe it's a MySQL 5.5 thing... But I've used this step extensively with MySQL 5.0 and every PDI 4.x release and it's always been fine... Maybe post the transform?
I just found the culprit:
The lookup returned the id field and gave it a new name, PERSON_ID. This FAILS in some cases! The resulting lookup/prepared statement was something like
select id as PERSON_ID FROM table WHERE ...
SOLUTION:
Don't use underscore in the "New name" for the field! With a new name of PERSONID everything works flawlessly for ALL rows!
Stupid error ...
I have reached the limit of RAM in analyzing large datasets in R. I think my next step is to import these data into a MySQL database and use the RMySQL package. Largely because I don't know database lingo, I haven't been able to figure out how to get beyond installing MySQL with hours of Googling and RSeeking (I am running MySQL and MySQL Workbench on Mac OSX 10.6, but can also run Ubuntu 10.04).
Is there a good reference on how to get started with this usage? At this point I don't want to do any sort of relational databasing. I just want to import .csv files into a local MySQL database and do the subsetting with RMySQL.
I appreciate any pointers (including "You're way off base!" as I'm new to R and newer to large datasets... this one's around 80 MB)
The documentation for RMySQL is pretty good - but it does assume that you know the basics of SQL. These are:
1. creating a database
2. creating a table
3. getting data into the table
4. getting data out of the table
Step 1 is easy: in the MySQL console, simply "create database DBNAME". Or from the command line, use mysqladmin, or there are often MySQL admin GUIs.
Step 2 is a little more difficult, since you have to specify the table fields and their type. This will depend on the contents of your CSV (or other delimited) file. A simple example would look something like:
use DBNAME;
create table mydata(
  id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  height FLOAT(5,2)
);
Which says: create a table with 2 fields: id, which will be the primary key (so has to be unique) and will autoincrement as new records are added; and height, which here is specified as a float (a numeric type) with 5 digits in total and 2 after the decimal point (e.g. 100.27). It's important that you understand data types.
Step 3 - there are various ways to import data into a table. One of the easiest is to use the mysqlimport utility. In the example above, assuming that your data are in a file with the same name as the table (mydata), with each line consisting of an empty first column (for id), a tab character, and then the height value (and no header row), this would work:
mysqlimport -u DBUSERNAME -pDBPASSWORD DBNAME mydata
Step 4 - requires that you know how to run MySQL queries. Again, a simple example:
select * from mydata where height > 50;
Means "fetch all rows (id + height) from the table mydata where height is more than 50".
Once you have mastered those basics, you can move to more complex examples such as creating 2 or more tables and running queries that join data from each.
Then - you can turn to the RMySQL manual. In RMySQL, you set up the database connection, then use SQL query syntax to return rows from the table as a data frame. So it really is important that you get the SQL part - the RMySQL part is easy.
There are heaps of MySQL and SQL tutorials on the web, including the "official" tutorial at the MySQL website. Just Google search "mysql tutorial".
Personally, I don't consider 80 MB to be a large dataset at all; I'm surprised that this is causing a RAM issue and I'm sure that native R functions can handle it quite easily. But it's good to learn new skills such as SQL, even if you don't need them for this problem.
I have a pretty good suggestion. For 80 MB, use SQLite. SQLite is a public-domain, lightweight, very fast file-based database that works (almost) just like a server-based SQL database.
http://www.sqlite.org/index.html
You don't have to worry about running any kind of server or permissions, your database handle is just a file.
Also, it uses dynamic typing, and declared column types are mostly advisory, so you don't have to worry much about getting the types right (since all you need to do is emulate a single text table anyway).
Someone else mentioned sqldf:
http://code.google.com/p/sqldf/
which does interact with SQLite:
http://code.google.com/p/sqldf/#9._How_do_I_examine_the_layout_that_SQLite_uses_for_a_table?_whi
So your SQL create statement would be like this
create table tablename (
  id INTEGER PRIMARY KEY,
  first_column_name TEXT,
  second_column_name TEXT,
  third_column_name TEXT
);
Otherwise, neilfws' explanation is a pretty good one.
P.S. I'm also a little surprised that your script is choking on 80 MB. Isn't it possible in R to just read through the file in chunks without loading it all into memory?
The sqldf package might give you an easier way to do what you need: http://code.google.com/p/sqldf/. Especially if you are the only person using the database.
Edit: Here is why I think it would be useful in this case (from the website):
With sqldf the user is freed from having to do the following, all of which are automatically done:
database setup
writing the create table statement which defines each table
importing and exporting to and from the database
coercing of the returned columns to the appropriate class in common cases
See also here: Quickly reading very large tables as dataframes in R
I agree with what's been said so far. Though I guess getting started with MySQL (and databases in general) is not a bad idea for the long run if you are going to deal with data. I checked your profile, which says finance PhD student. I don't know if that means quantitative finance, but it is likely that you will come across really large datasets in your career. If you can afford some time, I would recommend learning something about databases. It just helps.
The documentation of MySQL itself is pretty solid and you can get a lot of additional (specific) help here at SO.
I run MySQL with MySQL Workbench on Mac OS X Snow Leopard too. So here's what helped me get it done comparatively easily.
I installed MAMP, which gives me a local Apache web server with PHP, MySQL and the MySQL tool phpMyAdmin, which can be used as a nice web-based alternative to MySQL Workbench (which is not always super stable on a Mac :) ). You will have a little widget to start and stop the servers and can access some basic configuration settings (such as ports) through your browser. It's really a one-click install.
Install the R package RMySQL. I will put my connection string here, maybe that helps:
Create your databases with MySQL workbench. INT and VARCHAR (for categorical variables that contain characters) should be the field types you basically need at the beginning.
Try to find the import routine that works best for you. I don't know if you are a shell/terminal guy; if so, you'll like what was suggested by neilfws. You could also use LOAD DATA INFILE, which I prefer since it's only one query as opposed to INSERT INTO (line by line).
If you specify the problems that you have more accurately, you'll get some more specific help – so feel free to ask ;)
I assume you have to work a lot with time series data. There is a project (TSMySQL) that uses R and relational databases (such as MySQL, but it is also available for other DBMSs) to store time series data. Besides, you can even connect R to FAME (which is popular among finance people, but expensive). This last paragraph is certainly nothing basic, but I thought it might help you decide whether it's worth the hassle to dive into it a little deeper.
Practical Computing for Biologists has a nice (though subject-specific) introduction to SQLite:
Chapter 15. Data Organization and Databases
I'm trying to load some Rails fixtures (rake db:fixtures:load) into a MySQL database and I'm seeing some weird behaviour with auto-increment values. Normally the id goes up by 1 for each insert, which allows me to define/create tests. (BTW, a normal create/insert from a script works correctly.)
However, when I load from fixtures the id field is assigned a large random number, and the auto-increment value on the table is also a large number (1054583385) after the load. Has anyone else seen this?
FWIW this is on Windows XP with MySQL 5.0 (I also tested with MySQL 5.1, found the same problem and rolled back to 5.0).
Anybody else seen this - Is this a known bug/issue?
TIA,
This is not abnormal behavior for Rails fixtures. The ID is, by design, a hash of your fixture's label, so it looks random but is derived from the label. See the documentation.
You can explicitly specify an ID in your fixtures if needed.
id: 1
But does it really matter? Fixtures are meant to be used for tests. The ID of your objects is irrelevant as long as the relations are there.
Here is the relevant function from the Fixtures class:
# Returns a consistent identifier for +label+. This will always
# be a positive integer, and will always be the same for a given
# label, assuming the same OS, platform, and version of Ruby.
def self.identify(label)
  label.to_s.hash.abs
end
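One caveat about the function above: in current Ruby versions String#hash is seeded per process, so its value can change between runs. A minimal sketch of a checksum-based alternative follows. I believe newer Rails versions do something along these lines, but treat the names stable_fixture_id and MAX_ID, and the bound itself, as assumptions for illustration:

```ruby
require "zlib"

# Upper bound chosen so the ID fits a signed 32-bit integer column.
# The exact value is an assumption made for this sketch.
MAX_ID = 2**30 - 1

# Derives the ID from a CRC32 checksum of the label, so the same label
# gives the same ID in every process (unlike String#hash).
def stable_fixture_id(label)
  Zlib.crc32(label.to_s) % MAX_ID
end

puts stable_fixture_id(:david)
puts stable_fixture_id("david") # same value as the line above
```

Either way, the resulting IDs are large and look random, which matches what the question describes after loading fixtures.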