dbConnect: does cleaning data in R change data values in the real MySQL database?

I am doing research on MySQL data. I used the dbConnect function to connect to the database and used dbReadTable to read a table.
My question is: if I start cleaning the data to make it tidy using tidyr, dplyr, etc., will this change the data in the database (the data stored in MySQL that was collected by the researcher)?
Or does cleaning data in R only change the copy called up in R, with NO EFFECT ON THE database?
I need a definitive, well-backed, and professional answer, as the data I'm dealing with is pretty important and valuable.

dbReadTable pulls the table into an R data frame, so whatever tidyr and dplyr do to it happens on that in-memory copy; nothing is written back unless you explicitly do so (for example with dbWriteTable). That said, given a database connection you can definitely modify data in the database by issuing statements such as INSERT, UPDATE, and DELETE, depending on the role of the database user.
One safe way to avoid any modification of the database is to ask the database administrator (I assume that's not you) to create a user that has only read access, and then connect to the database as that specific user. You can then do your analysis safely, without unintentionally injecting anything into the database, because the database won't allow you to do so.
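For illustration, such a read-only account might be created like this; the account and database names here are hypothetical:

```sql
-- Run by the DBA: a user that can SELECT but never INSERT/UPDATE/DELETE.
CREATE USER 'analyst_ro'@'%' IDENTIFIED BY 'choose-a-strong-password';
GRANT SELECT ON research_db.* TO 'analyst_ro'@'%';
```

Connecting from R as this account, any accidental write would simply fail with a permissions error.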
But most importantly, consult the database administrator before taking the next step; this answer is just meant to give you a clue about how to do this safely, from my personal perspective. No responsibility taken for your next move.

Related

A Web-based Database Simulator for students

I've been tasked with building a web-based SQL simulator, much like SQL Fiddle or W3Schools' SQL Tryit editor. Here are the requirements for the simulator:
Multiple Students will be using the simulator at the same time.
The teacher should be able to see or track their changes and queries.
DDL should be supported (e.g., CREATE, ALTER, and DROP for databases and tables); of course, certain privileges must be enforced so students don't ruin the database.
Also, a simulation where students use MySQL directly obviously won't work. So, to anyone who has a suggestion on how to go about doing this: that would be awesome.
Hmmm, quite a difficult task you've got there...
I would try an approach like this:
Create a sample database, with some tables and entries based on the task for the students.
Whenever a student logs in and starts the task, copy this sample database for that specific user (see the sketch after this list).
Create an input field for MySQL commands in your view, plus an execute button.
Whenever the student clicks execute: log the input (and maybe the current database state), run the command against the database created for that user, and return whatever MySQL gives back (errors, messages, or result sets) to the user.
This approach is simple and reliable, because a student can't harm any database besides his own (provided the database privileges are configured correctly), and he can always restore his database from the sample if he messed things up.
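A minimal sketch of the per-student copy, assuming a template database sample_db and one table; every name here is hypothetical:

```sql
-- Create the student's private copy of the template database.
CREATE DATABASE student_42;
CREATE TABLE student_42.customers LIKE sample_db.customers;
INSERT INTO student_42.customers SELECT * FROM sample_db.customers;

-- Repeat for each table in sample_db, then confine the account to its copy.
CREATE USER 'student_42'@'%' IDENTIFIED BY 'per-student-password';
GRANT ALL PRIVILEGES ON student_42.* TO 'student_42'@'%';
```

Because the grant is scoped to student_42.* only, even a DROP TABLE stays inside that student's sandbox.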
It should also meet your requirements:
Multiple students can work on it simultaneously: Check
The teacher can track their changes: Check
DDL is possible and the student can't harm anything besides his own DB: Check

Any reason not to put stored procs that are independent of a specific database in MySQL's mysql database?

I've never wanted to touch the database actually named "mysql" (the one with tables proc, slow_log, user, etc) unless doing something officially supported with it. By this, I mean I wouldn't create new tables in it, etc.
But if I'm creating a few stored procedures that operate on any database and aren't specific to a single one, is it appropriate to store them in the "mysql" database, or would it be better to create a "genericStoredProcs" database, containing no tables, to put them in?
I would not recommend adding any objects to the mysql database.
It might be possible to do that, and it might be supported. But I wouldn't take that risk. That's just asking for trouble. Just let the mysql database be what it's supposed to be, let it do what it's supposed to do, and don't muck with it.
If you need a database to store "shared" objects, then create a new database, and grant appropriate privileges.
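As a sketch, such a dedicated routines-only database might look like this; the database name, routine, and grantee are all hypothetical:

```sql
-- A tableless database that exists only to hold shared routines.
CREATE DATABASE shared_procs;

DELIMITER //
CREATE PROCEDURE shared_procs.approx_row_count(IN db VARCHAR(64), IN tbl VARCHAR(64))
BEGIN
  -- TABLE_ROWS is an estimate for InnoDB, which is fine for a demo routine.
  SELECT TABLE_ROWS
    FROM information_schema.TABLES
   WHERE TABLE_SCHEMA = db AND TABLE_NAME = tbl;
END//
DELIMITER ;

-- Database-level EXECUTE lets application users call anything in it.
GRANT EXECUTE ON shared_procs.* TO 'app_user'@'%';
```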

MySQL: backup a multi-client DB for a single client

I am facing a problem with a task I have to do at work.
I have a MySQL database which holds the information of several clients of my company, and I have to create a backup/restore procedure to back up and restore that information for any single client. To clarify: if my client A loses his data, I have to be able to recover it while being sure I am not modifying the data of clients B, C, ...
I am not a DB administrator, so I don't know if I can do this using standard mysql tools (such as mysqldump) or any other backup tools (such as Percona Xtrabackup).
For the backup, my research (and my intuition) led me to this possible solution:
create the restore INSERT statements using the INSERT ... SELECT syntax (http://dev.mysql.com/doc/refman/5.1/en/insert-select.html);
save these INSERTs into an SQL file, either in the proper order or letting the script temporarily disable foreign key checks so the foreign key constraints are satisfied;
of course, do this for all clients on a daily basis, using one file per client (and day).
Then, when I have to restore the data for a specific client:
I delete whatever data of his is left;
I restore the correct data using the SQL file I created for him during the backup.
This way I believe I can recover the right data for client A without touching the data of client B (see the sketch below). Would my solution actually work? Is there a better way to achieve the same result? Or do you need more information about my problem?
Please forgive me if this question is not well formed, but I am new here and this is my first question, so I may be imprecise... thanks anyway for the help.
Note: we will also back up the entire database with mysqldump.
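For illustration, the per-client restore script described above might look roughly like this; client_id, the table names, and the sample rows are all hypothetical:

```sql
-- Remove whatever is left of client 42's data, ignoring FK ordering.
SET FOREIGN_KEY_CHECKS = 0;
DELETE FROM invoices WHERE client_id = 42;
DELETE FROM orders   WHERE client_id = 42;

-- Replay the INSERTs captured by the nightly export for this client.
INSERT INTO orders   (id, client_id, total)    VALUES (1, 42, 99.90);
INSERT INTO invoices (id, client_id, order_id) VALUES (5, 42, 1);
SET FOREIGN_KEY_CHECKS = 1;
```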
You can use mysqldump's --where parameter to provide a condition like client_id=N, for example: mysqldump --where="client_id=42" mydb orders > client42_orders.sql. Of course, I am making an assumption here, since you don't provide any information on your schema.
If you have a star schema, then you could probably write a small script that backs up all the lookup tables (assuming they are reasonably small) using the --tables parameter, and applies the --where condition to your client data table. For additional performance, perhaps you could partition that table by client_id.

Default database for MySQL

Is there a way to allocate a default database to a specific user in MySQL so they don't need to specify the database name while making a query?
I think you need to revisit some concepts: as Lmwangi points out, if you are connecting with the mysql client, then my.cnf can set it.
However, your use of the word query suggests that you are talking about connecting from some programming environment; in that case you will always need a connection object, and having a default database to connect to will bring no improvement (in terms of speed or simplicity). Efficiently managing your connection(s) might be interesting for you, but for that you should tell us exactly what your environment is.
If you select a database (schema) for the session, you don't need to specify the database name in every query; you only need to select the database once.
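A minimal illustration, with hypothetical database and table names:

```sql
USE myapp_db;                 -- selected once per session
SELECT * FROM customers;      -- no myapp_db. prefix needed from here on
SELECT * FROM other_db.logs;  -- other databases still need the prefix
```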
The best thing would be a MySQL trigger on the connection; however, MySQL only accepts triggers for updates, deletes, and inserts. A quick Google search yielded an interesting stored-procedure alternative; see MySQL Logon trigger.
When you assign permissions to each user group, you can also specify, in the same file, several things for that group, for example the database that group of users should use.
You can also do this with a specification file, depending on the language you are working with: store the database name as a simple variable, then look up that variable later to know which database you need to work with. But, I repeat, it depends on the language. The specification file can be XML, a phpspecs file, or anything like that.

Perl: How to copy/mirror remote MySQL table(s) to another database? Possibly different structure too?

I am very new to this, and a good friend is in a bind; I am at my wits' end. I have used GUIs like Navicat and SQLyog to do this, but only manually.
His band info data (schedules and whatnot) is in a MySQL database on a server (the admin server).
I am putting together a basic site for him written in Perl that grabs data from a database that resides on my server (public server) and displays schedule info, previous gig newsletters and some fan interaction.
He uses an administrative interface, which he likes and desires to keep, to manage the data on the admin server.
The admin server db has a bunch of tables and even table data the public db does not need.
So, I created tables on the public side that only contain relevant data.
I basically used a GUI to export the data and then inserted it into the public side whenever he made updates to the admin DB (copy and paste).
(FYI, I am using the DBI module to access the data in my public DB via my Perl script.)
I could access the admin server directly to grab only the data I need, but the whole purpose of this is to "mirror" the data, not to hit the admin server on every query. Also, some tables are THOUSANDS of rows, and parsing every row in a loop seemed too "bulky" to me. There is, however, a "time" column which could be used for comparison.
I cannot simply "sync", because the structures are different; I only need the relevant data from three tables.
SO...... I desire to automate!
I read that "copy" was a fast way, but everything I found on how to implement it was too advanced for my level.
I do not have the luxury of placing a script on the admin server to notify me when there is an update.
1- I would like to set up a script that checks a table to see if a row was updated or added in the admin server's DB.
I would then desire to update or insert the new or changed data into the public server's DB.
This "check" could be set up as a cron job, I guess, or triggered when a specific page loads on the public side (the same subroutine called by the cron, I would assume).
This data does not need to be "real time", but if he updates something it would be nice to have it appear as quickly as possible.
I have done much reading, module research, and experimenting, but here I am again at Stack Overflow, where I always get great advice and examples.
Much of the terminology is still quite over my head so verbose examples with explanations really help me learn quicker.
Thanks in advance.
The two terms you are looking for are either "replication" or "ETL".
First, the replication approach.
Let's assume your admin server has tables T1, T2, T3 and your public server has tables TP1, TP2.
So, what you want to do (since, as you said, you have different table structures) is:
Take the tables from the public server and create exact copies of those tables (TP1 and TP2) on the admin server.
Create triggers on the admin server's original tables to populate the data from T1/T2/T3 into the admin server's copies of TP1/TP2 (a sketch follows this list).
You will also need to do an initial data population from T1/T2/T3 into the admin server's copies of TP1/TP2. Duh.
Set up the "replication" from the admin server's TP1/TP2 to the public server's TP1/TP2.
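A minimal sketch of step 2, assuming T1 feeds TP1 and both expose (id, name); every table and column name here is hypothetical, and UPDATE and DELETE would each need their own trigger:

```sql
DELIMITER //
CREATE TRIGGER t1_to_tp1 AFTER INSERT ON T1
FOR EACH ROW
BEGIN
  -- Push only the columns the public-side schema cares about.
  INSERT INTO TP1 (id, name) VALUES (NEW.id, NEW.name);
END//
DELIMITER ;
```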
A different approach is to write a program (such programs are called ETL, for Extract-Transform-Load) which will extract the data from T1/T2/T3 on the admin server (the "E" part of ETL), massage the data into a format suitable for loading into the TP1/TP2 tables (the "T" part), and transfer those files (via ftp/scp/whatnot) to the public server, where the second half of the program (the "L" part) loads the files into the TP1/TP2 tables. Both halves of the program would be launched by cron or your scheduler of choice.
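As one possible sketch, the "E" and "L" steps can lean on MySQL's own file export/import; the paths, columns, and table names below are assumptions, and the FILE privilege is required:

```sql
-- "E" on the admin server: dump the relevant columns to a CSV file.
SELECT id, venue, show_date
  INTO OUTFILE '/tmp/tp1.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM T1;

-- "L" on the public server, after the file has been transferred:
LOAD DATA INFILE '/tmp/tp1.csv'
  INTO TABLE TP1
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';
```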
There's an article with a very good example of how to start building Perl/MySQL ETL: http://oreilly.com/pub/a/databases/2007/04/12/building-a-data-warehouse-with-mysql-and-perl.html?page=2
If you prefer not to build your own, here's a list of open-source ETL systems; I've never used any of them, so I have no opinion on their usability/quality: http://www.manageability.org/blog/stuff/open-source-etl
I think you've misunderstood ETL as a problem domain, which is complicated, versus ETL as a one-off solution, which is often not much harder than writing a report. Unless I've totally misunderstood your problem, you don't need a general ETL solution, you need a one-off solution that works on a handful of tables and a few thousand rows. ETL and Schema mapping sound scarier than they are for a single job. (The generalization, scaling, change-management, and OLTP-to-OLAP support of ETL are where it gets especially difficult.) If you can use Perl to write a report out of a SQL database, you probably know enough to handle the ETL involved here.
1- I would like to set up a script that checks a table to see if a row was updated or added in the admin server's DB. I would then desire to update or insert the new or changed data into the public server's DB.
If every table you need to pull from has an update-timestamp column, then your cron job comes down to some SELECT statements with WHERE clauses based on the last time the cron job ran, so that you get only the updates. Tables without an update timestamp will probably need a full dump.
I'd use a one-to-one table mapping unless normalization is required... just simpler, in my opinion. Why complicate it with "big" schema changes if you don't have to?
some tables are THOUSANDS of rows and parsing every row in a loop seemed too "bulky" to me.
Limit your queries to only the columns you need, and (if there are no BLOBs or exceptionally big columns among them) a few thousand rows should not be a problem via DBI with a fetchall-style method. Loop all you want locally; just make as few trips to the remote database as possible.
If a row has a newer date, update it. I will also have to check for new rows to insert.
Each table needs one SELECT ... WHERE updated_timestamp_columnname > last_cron_run_timestamp. That result set will contain all rows with newer timestamps, which includes the newly inserted rows (provided the timestamp column behaves like I'd expect). For updating your local database, check out MySQL's ON DUPLICATE KEY UPDATE syntax... it lets you do the insert-or-update in one step.
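A sketch of both halves, assuming a shows table with a primary key id and an updated_at timestamp; all names and values are hypothetical:

```sql
-- On the admin server: fetch everything touched since the last cron run.
SELECT id, venue, show_date, updated_at
  FROM shows
 WHERE updated_at > '2012-01-01 00:00:00';

-- On the public server: apply each fetched row in a single statement.
INSERT INTO shows (id, venue, show_date, updated_at)
VALUES (7, 'The Troubadour', '2012-02-14', '2012-01-02 10:00:00')
ON DUPLICATE KEY UPDATE
  venue      = VALUES(venue),
  show_date  = VALUES(show_date),
  updated_at = VALUES(updated_at);
```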
... how to implement were too advanced for my level ...
Yes, I have actually done this already, but I have to update manually...
Some questions to help us understand your level... Are you hitting the database from the mysql command-line client or from a GUI? Have you gotten to the point where you've wrapped your SQL queries in Perl and DBI yet?
If the two databases have different schemas, you'll need an ETL solution to map from one schema to the other.
If the schemas are the same, all you have to do is replicate the data from one to the other.
Why not just create a structure on the 'slave' server identical to the master's? Then create a small table that keeps track of the last timestamp or ID for each of the updated tables (sketched below).
Then select from the master all rows changed since that last timestamp (or with an ID greater than the stored one), and insert them into the matching table on the slave server.
You will need to be careful with updated rows: if a row on the master is updated but the timestamp doesn't change, how will you tell which rows to fetch? If that's not an issue, the process is quite simple.
If it is an issue, then you need to be more sophisticated; but without knowing the data structure and update mechanism, it's a wild goose chase to give pointers on it.
The script could be called by cron every so often to apply the changes.
If the database structures must be different on the two servers, then a simple translation step may need to be added, but most of the time that can be done within the SQL SELECT statement, maybe with a join or two.
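A rough sketch of the tracking-table idea (every name here is hypothetical): the script reads the watermark from the slave, queries the master with it, applies the rows, then advances the watermark.

```sql
-- On the slave: one watermark row per mirrored table.
CREATE TABLE sync_state (
  table_name  VARCHAR(64) PRIMARY KEY,
  last_synced DATETIME NOT NULL
);

-- On the master, with the watermark value substituted in by the script:
SELECT * FROM schedule WHERE updated_at > '2012-01-01 00:00:00';

-- Back on the slave, after the fetched rows have been applied:
UPDATE sync_state SET last_synced = NOW() WHERE table_name = 'schedule';
```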