Is it possible "Database Synchronization" - mysql

I have a problem and not sure if this is possible. My web application has a database and i'm using a mysql workbench and using wamp server.
My web app has a database name healthcare, and if I import again another database with the same tables, etc but addition data. I want the first database to be updated only with new values but not replaced.
Is it possible?
Edit: I searched in the net and other related sources and I manage to set my phpmyadmin "Ignore multiple statement errors". When I import the second database (.sql with same tables but with new data) it does not update the first database but the message is successful. Please help, I'll appreciate any help...

in the past ive searched for tools to do some similar database sync tasks - in my experience ive found that none are free & reliable.
have you tried writing some queries to do this manually?
first thing that comes to mind would be figuring out a key you can use to evaluate each row and determine if you should copy said record from database A to database B.
afterwards you could simply do an INSERT(SELECT)
INSERT INTO healthcare_DESTINATION.table (SELECT * FROM healthcare_SOURCE WHERE some_condition = 1);
obviously this is the simplified version - but i've done something very similar utilizing timestamps (eg only copy rows newer than the newest row in the destination table)
hope this helps

Related

How to properly wipe a database, and re-import?

I am unsure about the best way to do this. As I'm getting ready to put a new database into production, I need to import data from the old database that has been formed in the meantime of me working on it. The new database now also contains a lot of fake data that was used for testing, which I have to get rid of, so a fresh complete re-import seems reasonable.
Now, truncating all the tables in the new database cannot go through, because the foreign keys prevent it. Simply deleting the data instead would solve that problem, but it leaves the AUTO_INCREMENT indexes to the values where they were, so it's not a "proper" wipe. Now, there could be more properties such as that one, that would be left over (so to say), but this is the only one that I'm aware of.
So my question now is, how much of a problem could these "leftover" pieces of data pose to performance, if I were to go with the simple DELETE solution?
And also; is there a way that would be more thorough in cleaning it out, and also allow me to, of course, keep the defined constraints?
First i would use some gui tool to create the dump for the old DB ( like mySql workbench, or what ever you prefer ). Check options "Export to self-contained file", and check "Dump stored procedures and functions","Dump events" and "Dump triggers".
Then get create scripts for all tables not included in the old DB.
You can do this via "reverse engineer" option.
If you have trouble with this part this post will help.
How to get a table creation script in MySQL Workbench?
When you have old DB dump and create scripts for new sql tables, combine them to a single sql file.
On the first row add:
SET FOREIGN_KEY_CHECKS = 0;
On the last row add:
SET FOREIGN_KEY_CHECKS = 1;
Run the script. As a result you should have all tables ( new without data and old with data ), with all relations set properly. Hope it will work for you.

MySQL server lost power, partial data loss

I had a server running mysql. The power was cut to the machine, and mysql (I assume was forced to terminate)
Now, when I try to connect to the database again, the tables exist, but there doesn't appear to be anything in the tables, would there be any chance of a way to fix this.
When I use the SHOW TABLES command, it lists the corrupted table. When I use the SELECT * FROM [corrupted table] it says table does not exist.
I understand if it seems impossible (please let me know if so)
THANKS!
in such case we retrive a data from log file:
1) [This is first link][1],
2) [This is main URL][2]

Import some database entries through PHPMyAdmin with overwrite

I exported a couple of entries from a database I have stored locally on my MySQL dbase through PhpMyAdmin and I'd like to replace only those entries on my destination database hosted online. Unfortunately when I try to do so PHPMyAdmin says that those posts already exist and therefore he can't erase them.
It'll take me a lot of time to search for those entries manually within the rest of the posts and delete them one at a time so I was wondering if there's any workaround in order to overwite those entries on import.
Thanks in advance!
A great option is to handle this on your initial export from phpMyAdmin locally. When exporting from phpMyAdmin:
Export method: Custom
Format: SQL
Format-specific options - choose "data" (instead of "structure" or "structure and data")
In Data creation options - Function to use when dumping data: Switch "Insert" to "Update" <-- This is the ticket!
Click Go!
Import into your production database. (always backup your production database before hand just in case)
I know this is an old post, but it actually helped me find a solution built into phpMyAdmin. Hope it helps someone else!
This is a quick and dirty way to do it. Others may have a better solution:
It sounds like you're trying to run INSERT queries, and phpMyAdmin is telling you they already exist. If you use UPDATE queries, you could update the info.
I would copy the queries you have there, into a text editor, preferably one that can handle find and replace, like Notepad++ or Gedit, and then replace some code to change the queries around from INSERT to UPDATE.
See: http://dev.mysql.com/doc/refman/5.0/en/update.html
OR, you could just delete them, then run your INSERT queries.
You might be able to use some logic with find and replace to make a DELETE query that gets rid of them first.
http://dev.mysql.com/doc/refman/5.0/en/delete.html
Check out insert on duplicate. You can either add the syntax to your entries stored locally, or import into a temporary database, then run an INSERT ... SELECT ... ON DUPLICATE KEY UPDATE. If you could post a schema, it would help us guide you better.

Perl: How to copy/mirror remote MYSQL table(s) to another database? Possibly different structure too?

I am very new to this and a good friend is in a bind. I am at my wits end. I have used gui's like navicat and sqlyog to do this but, only manually.
His band info data (schedules and whatnot) is in a MYSQL database on a server (admin server).
I am putting together a basic site for him written in Perl that grabs data from a database that resides on my server (public server) and displays schedule info, previous gig newsletters and some fan interaction.
He uses an administrative interface, which he likes and desires to keep, to manage the data on the admin server.
The admin server db has a bunch of tables and even table data the public db does not need.
So, I created tables on the public side that only contain relevant data.
I basically used a gui to export the data, then insert to the public side whenever he made updates to the admin db (copy and paste).
(FYI I am using DBI module to access the data in/via my public db perl script.)
I could access the admin server directly to grab only the data I need but, the whole purpose of this is to "mirror" the data not access the admin server on every query. Also, some tables are THOUSANDS of rows and parsing every row in a loop seemed too "bulky" to me. There is however a "time" column which could be utilized to compare to.
I cannot "sync" due to the fact that the structures are different, I only need the relevant table data from only three tables.
SO...... I desire to automate!
I read "copy" was a fast way but, my findings in how to implement were too advanced for my level.
I do not have the luxury of placing a script on the admin server to notify when there was an update.
1- I would like to set up a script to check a table to see if a row was updated or added on the admin servers db.
I would then desire to update or insert the new or changed data to the public servers db.
This "check" could be set up in a cron job I guess or triggered when a specific page loads on the public side. (the same sub routine called by the cron I would assume).
This data does not need to be "real time" but, if he updates something it would be nice to have it appear as quickly as possible.
I have done much reading, module research and experimenting but, here I am again at stackoverflow where I always get great advice and examples.
Much of the terminology is still quite over my head so verbose examples with explanations really help me learn quicker.
Thanks in advance.
The two terms you are looking for are either "replication" or "ETL".
First, replication approach.
Let's assume your admin server has tables T1, T2, T3 and your public server has tables TP1, TP2.
So, what you want to do (since you have different table structres as you said) is:
Take the tables from public server, and create exact copies of those tables on the admin server (TP1 and TP2).
Create a trigger on the admin server's original tables to populate the data from T1/T2/T3 into admin server's copy of TP1/TP2.
You will also need to do initial data population from T1/T2/T3 into admin server's copy of TP1/TP2. Duh.
Set up the "replication" from admin server's TP1/TP2 to public server's TP1/TP2
A different approach is to write a program (such programs are called ETL - Extract-Transform-Load) which will extract the data from T1/T2/T3 on admin server (the "E" part of "ETL"), massage the data into format suitable for loading into TP1/TP2 tables (the "T" part of "ETL"), transfer (via ftp/scp/whatnot) those files to public server, and the second half of the program (the "L") part will load the files into the tables TP1/TP2 on public server. Both halfs of the program would be launched by cron or your scheduler of choice.
There's an article with a very good example of how to start building Perl/MySQL ETL: http://oreilly.com/pub/a/databases/2007/04/12/building-a-data-warehouse-with-mysql-and-perl.html?page=2
If you prefer not to build your own, here's a list of open source ETL systems, never used any of them so no opinions on their usability/quality: http://www.manageability.org/blog/stuff/open-source-etl
I think you've misunderstood ETL as a problem domain, which is complicated, versus ETL as a one-off solution, which is often not much harder than writing a report. Unless I've totally misunderstood your problem, you don't need a general ETL solution, you need a one-off solution that works on a handful of tables and a few thousand rows. ETL and Schema mapping sound scarier than they are for a single job. (The generalization, scaling, change-management, and OLTP-to-OLAP support of ETL are where it gets especially difficult.) If you can use Perl to write a report out of a SQL database, you probably know enough to handle the ETL involved here.
1- I would like to set up a script to check a table to see if a row was updated or added on the admin servers db. I would then desire to update or insert the new or changed data to the public servers db.
If every table you need to pull from has an update timestamp column, then your cron job includes some SELECT statements with WHERE clauses based on the last time the cron job ran to get only the updates. Tables without an update timestamp will probably need a full dump.
I'd use a one-to-one table mapping unless normalization was required... just simpler to my opinion. Why complicate it with "big" schema changes if you don't have to?
some tables are THOUSANDS of rows and parsing every row in a loop seemed too "bulky" to me.
Limit your queries to only the columns you need (and if there are no BLOBs or exceptionally big columns in what you need) a few thousand rows should not be a problem via DBI with a FETCHALL method. Loop all you want locally, just make as few trips to the remote database as possible.
If a row is has a newer date, update it. I will also have to check for new rows for insertion.
Each table needs one SELECT ... WHERE updated_timestamp_columnname > last_cron_run_timestamp. That result set will contain all rows with newer timestamps, which contains newly inserted rows (if the timestamp column behaves like I'd expect). For updating your local database, check out MySQL's ON DUPLICATE KEY UPDATE syntax... this will let you do it in one step.
... how to implement were too advanced for my level ...
Yes, I have actually done this already but, I have to manually update...
Some questions to help us understand your level... Are you hitting the database from the mysql client command-line or from a GUI? Have you gotten to the point where you've wrapped your SQL queries in Perl and DBI, yet?
If the two databases have different, you'll need an ETL solution to map from one schema to another.
If the schemas are the same, all you have to do is replicate the data from one to the other.
Why not just create identical structure on the 'slave' server to the master server. Then create a small table that keeps track of the last timestamp or id for the updated tables.
Then select from the master all rows changed since the last timestamp or greater than the id. Insert them into the matching table on the slave server.
You will need to be careful of updated rows. If a row on the master is updated but the timestamp doesn't change then how will you tell which rows to fetch? If that's not an issue the process is quite simple.
If it is an issue then you need to be more sophisticated, but without knowing the data structure and update mechanism its a goose chase to give pointers on it.
The script could be called by cron every so often to update the changes.
if the database structures must be different on the two servers then a simple translation step may need to be added, but most of the time that can be done within the sql select statement and maybe a join or two.

Set up large database in MySQL for analysis in R

I have reached the limit of RAM in analyzing large datasets in R. I think my next step is to import these data into a MySQL database and use the RMySQL package. Largely because I don't know database lingo, I haven't been able to figure out how to get beyond installing MySQL with hours of Googling and RSeeking (I am running MySQL and MySQL Workbench on Mac OSX 10.6, but can also run Ubuntu 10.04).
Is there a good reference on how to get started with this usage? At this point I don't want to do any sort of relational databasing. I just want to import .csv files into a local MySQL database and do the subsetting in with RMySQL.
I appreciate any pointers (including "You're way off base!" as I'm new to R and newer to large datasets... this one's around 80 mb)
The documentation for RMySQL is pretty good - but it does assume that you know the basics of SQL. These are:
creating a database
creating a table
getting data into the table
getting data out of the table
Step 1 is easy: in the MySQL console, simply "create database DBNAME". Or from the command line, use mysqladmin, or there are often MySQL admin GUIs.
Step 2 is a little more difficult, since you have to specify the table fields and their type. This will depend on the contents of your CSV (or other delimited) file. A simple example would look something like:
use DBNAME;
create table mydata(
id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
height FLOAT(3,2)
);
Which says create a table with 2 fields: id, which will be the primary key (so has to be unique) and will autoincrement as new records are added; and height, which here is specified as a float (a numeric type), with 3 digits total and 2 after the decimal point (e.g. 100.27). It's important that you understand data types.
Step 3 - there are various ways to import data to a table. One of the easiest is to use the mysqlimport utility. In the example above, assuming that your data are in a file with the same name as the table (mydata), the first column a tab character and the second the height variable (with no header row), this would work:
mysqlimport -u DBUSERNAME -pDBPASSWORD DBNAME mydata
Step 4 - requires that you know how to run MySQL queries. Again, a simple example:
select * from mydata where height > 50;
Means "fetch all rows (id + height) from the table mydata where height is more than 50".
Once you have mastered those basics, you can move to more complex examples such as creating 2 or more tables and running queries that join data from each.
Then - you can turn to the RMySQL manual. In RMySQL, you set up the database connection, then use SQL query syntax to return rows from the table as a data frame. So it really is important that you get the SQL part - the RMySQL part is easy.
There are heaps of MySQL and SQL tutorials on the web, including the "official" tutorial at the MySQL website. Just Google search "mysql tutorial".
Personally, I don't consider 80 Mb to be a large dataset at all; I'm surprised that this is causing a RAM issue and I'm sure that native R functions can handle it quite easily. But it's good to learn new skill such as SQL, even if you don't need them for this problem.
I have a pretty good suggestion. For 80MB use SQLite. SQLite is a super public domain, lightweight, super fast file-based database that works (almost) just like a SQL database.
http://www.sqlite.org/index.html
You don't have to worry about running any kind of server or permissions, your database handle is just a file.
Also, it stores all data as a string, so you don't even have to worry about storing the data as types (since all you need to do is emulate a single text table anyway).
Someone else mentioned sqldf:
http://code.google.com/p/sqldf/
which does interact with SQLite:
http://code.google.com/p/sqldf/#9._How_do_I_examine_the_layout_that_SQLite_uses_for_a_table?_whi
So your SQL create statement would be like this
create table tablename (
id INT(11) INTEGER PRIMARY KEY,
first_column_name TEXT,
second_column_name TEXT,
third_column_name TEXT
);
Otherwise, neilfws' explanation is a pretty good one.
P.S. I'm also a little surprised that your script is choking on 80mb. It's not possible in R to just seek through the file in chunks without opening it all up in memory?
The sqldf package might give you an easier way to do what you need: http://code.google.com/p/sqldf/. Especially if you are the only person using the database.
Edit: Here is why I think it would be useful in this case (from the website):
With sqldf the user is freed from having to do the following, all of which are automatically done:
database setup
writing the create table statement which defines each table
importing and exporting to and from the database
coercing of the returned columns to the appropriate class in common cases
See also here: Quickly reading very large tables as dataframes in R
I agree with what's been said so far. Though I guess getting started with MySQL (databases) in general is not a bad idea for the long if you are going to deal with data. I mean I checked your profile which says finance PhD student. I don't know if that means quant. finance, but it is likely that you will come across really large datasets in your career. I you can afford some time, I would recommend to learn something about databases. It just helps.
The documentation of MySQL itself is pretty solid and you can a lot of additional (specific) help here at SO.
I run MySQL with MySQL workbench on Mac OS X Snow Leopard too. So here´s what helped me to get it done comparatively easy.
I installed MAMP , which gives my an local Apache webserver with PHP, MySQL and the MySQL tool PHPmyadmin, which can be used as a nice webbased alternative for MySQL workbench (which is not always super stable on a Mac :) . You will have a little widget to start and stop servers and can access some basic configuration settings (such as ports through your browser) . It´s really one-click install here.
Install the Rpackage RMySQL . I will put my connection string here, maybe that helps:
Create your databases with MySQL workbench. INT and VARCHAR (for categorical variables that contain characters) should be the field types you basically need at the beginning.
Try to find the import routine that works best for you. I don't know if you are a shell / terminal guy – if so you'll like what was suggested by neilfws. You could also use LOAD DATA INFILE which is I prefer since it's only one query as opposed to INSERT INTO (line by line)
If you specify the problems that you have more accurately, you'll get some more specific help – so feel free to ask ;)
I assume you have to work a lot with time series data – there is a project (TSMySQL) around that use R and relational databases (such as MySQL, but also available for other DBMS) to store time series data. Besides you can even connect R to FAME (which is popular among financers, but expensive). The last paragraph is certainly nothing basic, but I thought it might help you to consider if it´s worth the hustle to dive into it a little deeper.
Practical Computing for Biologists as a nice (though subject-specific) introduction to SQLite
Chapter 15. Data Organization and Databases