Is there a way to have a .csv file imported into a MySQL table automatically? I know how to do it manually, but we have a .csv that is exported nightly from PeopleSoft, and we want it imported automatically into a SQL table in a Linux environment. Please give me a sample script to do that. If there's a way, can anyone point me in that direction? (I'm not a SQL expert.)
You can try creating a stored procedure:
Write the CSV load query into the stored procedure.
Create an event to call the stored procedure.
I hope this helps.
CREATE EVENT IF NOT EXISTS `load_csv_event`
ON SCHEDULE EVERY 1 DAY
DO CALL my_sp_load_csv();
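Also, you can create an event directly and write the load query into it. (Be aware that MySQL does not permit LOAD DATA inside stored programs such as procedures and events, so the load statement itself may have to run from an external job instead.)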
You could create a crontab job, for example:
* * * * * /path/to/load_script.sh
Where load_script.sh may look like this (do not forget to make it executable):
#!/bin/bash
IMPORTED_FILE_PATH=/path/to/your/imported/file.csv
TABLENAME=target_table_name
DATABASE=db_name
TMP_FILENAME=/tmp/${TABLENAME}.csv
# do nothing if imported file does not exist
[ -f "$IMPORTED_FILE_PATH" ] || exit 0
# if the temporary file exists, a previous import job is still running, so do nothing
[ -f "$TMP_FILENAME" ] && exit 0
# Move it to tmp and rename to target table name
mv "$IMPORTED_FILE_PATH" "$TMP_FILENAME"
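# mysqlimport expects tab-separated fields by default, so state the comma explicitly
mysqlimport --user=mysqlusername --password=mysqlpassword --host=mysqlhost --local --fields-terminated-by=, "$DATABASE" "$TMP_FILENAME"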
rm -f "$TMP_FILENAME"
This is just an example (not tested). You should add error handling, logging, etc.
Also see the mysqlimport manual.
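For anyone who prefers Python to bash, the same steps (skip if there is no file, skip if a previous run is still going, move the file so its name matches the table, then call mysqlimport) can be sketched as below. It is only a sketch under assumptions: the function names, the /tmp staging directory, and the idea of keeping credentials in an option file passed with --defaults-extra-file are mine, not from the answer above, and --ignore-lines=1 assumes the export starts with a header row.

```python
import shutil
import subprocess
from pathlib import Path


def build_mysqlimport_cmd(csv_path, database, defaults_file):
    """Return the mysqlimport argument list for one CSV file.

    mysqlimport derives the table name from the file name, so
    /tmp/target_table_name.csv is loaded into target_table_name.
    """
    return [
        "mysqlimport",
        f"--defaults-extra-file={defaults_file}",  # must be the first option
        "--local",
        "--fields-terminated-by=,",
        "--ignore-lines=1",  # assumption: the export has a header row
        database,
        str(csv_path),
    ]


def import_nightly_csv(source, table, database, defaults_file, tmp_dir="/tmp"):
    """Stage the export under the table's name and load it.

    Returns False when there is nothing to do, True after a successful load.
    """
    source = Path(source)
    staged = Path(tmp_dir) / f"{table}.csv"
    if not source.is_file() or staged.exists():
        return False  # no new export, or a previous run is still active
    shutil.move(str(source), str(staged))
    try:
        # check=True raises CalledProcessError if mysqlimport fails
        subprocess.run(
            build_mysqlimport_cmd(staged, database, defaults_file), check=True
        )
    finally:
        # like the shell script, remove the staged file whether or not
        # the import succeeded
        staged.unlink(missing_ok=True)
    return True
```

Run from cron, a call such as import_nightly_csv("/path/to/your/imported/file.csv", "target_table_name", "db_name", "/path/to/import.cnf") simply returns False on the runs where no new export is waiting.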
I have created a script of Hive queries, mainly for feature creation and scoring for a cross-sell project. Most of the queries are simple ones that do data cleaning, transformation, etc. I want to automate this process so that I can start with a Hive table as input and write the final result to an HBase file. My questions are:
What is the best way to do it ?
Can I simply create filename.sql or filename.hql and run it from the shell using hive -f filename.sql?
Is there something in hive like PL for SQL?
You can do it in multiple ways.
For example, you can use the Hive CLI, which makes such jobs very easy.
You can write a shell script on Linux or a .bat file on Windows.
In the script you can simply use entries like the ones below.
$HIVE_HOME/bin/hive -e 'select a.col from tab1 a';
or, if you have a script file:
$HIVE_HOME/bin/hive -f /home/my/hive-script.sql
Make sure you have set $HIVE_HOME in your env.
Once you have tested it and it works fine, you can put it in a cron job for scheduling.
It is important to note that with either technique, your queries must be separated by semicolons, i.e.
hive -e 'select * from tableA limit 10;select * from tableB limit 10'
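If the automation ends up being driven from Python rather than a shell script, the two invocations above might be wrapped roughly like this. It is a sketch: the helper names are made up for illustration, and it assumes the hive executable is on the PATH (otherwise pass the full path, e.g. the one under $HIVE_HOME/bin).

```python
import subprocess


def build_hive_command(queries, hive_bin="hive"):
    """Join several HiveQL statements with semicolons for one `hive -e` call."""
    statements = [q.strip().rstrip(";").strip() for q in queries if q.strip()]
    return [hive_bin, "-e", "; ".join(statements)]


def run_hive_script(script_path, hive_bin="hive"):
    """Run a .sql/.hql file with `hive -f`; raises if Hive exits with an error."""
    subprocess.run([hive_bin, "-f", script_path], check=True)
```

build_hive_command only builds the argument list, so it can be inspected or logged before being handed to subprocess.run. Because check=True raises on a non-zero exit code, a failed query stops the job instead of letting later steps run on incomplete data.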
I have to move data between two SQL Server DBs. My task is to export the data as text (.dat) files, move the files and import into the destination. I have to migrate over 200 tables.
This is what I tried
1) I used an Execute SQL task to fetch my tables.
2) Used a For each loop to loop through the table names from the collection.
3) Used a script task inside the for each loop to build the text file destination path.
4) Called a DFT with the table name in a variable for the source ole db and the path name in a variable for the destination flat file.
The first table extracts fine, but the second table bombs with a synchronization error. I see this in numerous posts but could not find one that matches my scenario, hence posting here.
Even if I get the package to work with multiple DFTs, the second table from the second DFT does not export columns because the flat file connection manager still remembers the first table columns. Is there a way to get it to forget the columns?
Any thoughts on how I can export multiple tables to multiple text files using one DFT using dynamic source and destination variable?
Thanks and appreciate your help.
Unfortunately, only the Bulk Insert Task lets us use format files effectively to map the columns between source and destination. The Bulk Insert Task uses the BULK INSERT T-SQL command to import the data, and to execute it the user must have the BULKADMIN server privilege.
Most companies do not allow the BULKADMIN server privilege to be granted, for security reasons.
Hence, using a Script Task to construct BCP statements is a good, simple option for the export.
You do not need to construct a .bat file, as the script itself can execute DOS commands, which run under the .NET security account.
I figured out a way to do this. I thought I would share it in case anybody is stuck in the same situation.
So, in summary, I needed to export and import data via files. I also wanted to use a format file if at all possible for various reasons.
What I did was
1) Construct a DFT which gets me a list of table names from the DB that I need to export. I used 'oledb' as a source and 'recordset destination' as target and stored the table names inside an object variable.
A DFT is not really necessary. You can do it any other way. Also, in our application, we store the table names in a table.
2) Add a 'For each loop container' with a 'For Each ADO Enumerator' which takes my object variable from the previous step into the collection.
3) Parse the variable one by one and construct BCP statements like below inside a Script task. Create variables as necessary. The BCP statement will be stored in a variable.
I loop through the tables and construct multiple BCP statements like this.
BCP "DBNAME.DBO.TABLENAME1" out "PATH\FILENAME1.dat" -S SERVERNAME -T -t"|" -r$\n -f "PATH\filename.fmt"
BCP "DBNAME.DBO.TABLENAME2" out "PATH\FILENAME2.dat" -S SERVERNAME -T -t"|" -r$\n -f "PATH\filename.fmt"
The statements are put inside a .bat file. This is also done inside the script task.
4) An Execute Process task will next execute the .bat file. I had to do this because I do not have the option to use the 'master..xp_cmdShell' command or the 'BULK INSERT' command in my company. If I had the option to execute xp_cmdshell, I could have run the command directly from the package.
5) Again add a 'For each loop container' with a 'For Each ADO Enumerator' which takes my object variable from the previous step into the collection.
6) Parse the variable one by one and construct BCP statements like this inside a Script task. Create variables as necessary. The BCP statement will be stored in a variable.
I loop through the tables and construct multiple BCP statements like this.
BCP "DBNAME.DBO.TABLENAME1" in "PATH\FILENAME1.dat" -S SERVERNAME -T -t"|" -r$\n -b10000 -f "PATH\filename.fmt"
BCP "DBNAME.DBO.TABLENAME2" in "PATH\FILENAME2.dat" -S SERVERNAME -T -t"|" -r$\n -b10000 -f "PATH\filename.fmt"
The statements are put inside a .bat file. This is also done inside the script task.
The -b10000 option was added so I can import in batches. Without it, many of my large tables could not be copied because tempdb ran out of space.
7) Run the .bat file to import the file again.
I am not sure if this is the best solution, but I thought I would share what satisfied my requirement. If my answer is not clear, I would be happy to explain. This solution can also be optimized. The same can be done purely via VB scripts, but you have to write some code to do that.
I also created a package configuration file where I can change the DB name, server name, the data and format file locations dynamically.
Thanks.
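To make steps 3 and 6 more concrete, here is a rough Python sketch of generating one BCP line per table. It is not the Script Task code from the package (that would be C# or VB inside SSIS). The function names, the example server and folder, and the convention of one TABLENAME.dat and one TABLENAME.fmt per table are assumptions for illustration, and the -r row terminator option from the statements above is left out.

```python
from pathlib import PureWindowsPath


def bcp_command(direction, database, table, data_dir, server, batch_size=None):
    """Build one bcp command line; direction is "out" (export) or "in" (import)."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    folder = PureWindowsPath(data_dir)
    data_file = folder / f"{table}.dat"
    format_file = folder / f"{table}.fmt"  # assumption: one format file per table
    parts = [
        "bcp", f'"{database}.dbo.{table}"', direction, f'"{data_file}"',
        "-S", server, "-T", '-t"|"', "-f", f'"{format_file}"',
    ]
    if batch_size is not None:
        parts.append(f"-b{batch_size}")  # commit the import in batches of this size
    return " ".join(parts)


def bat_lines(direction, database, tables, data_dir, server, batch_size=None):
    """Return the lines of a .bat file, one bcp command per table."""
    return [
        bcp_command(direction, database, t, data_dir, server, batch_size)
        for t in tables
    ]
```

Writing "\n".join(bat_lines("out", ...)) to export.bat and the matching "in" lines (with batch_size=10000) to import.bat gives the two files that the Execute Process tasks run.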
This has been asked a few times but I cannot find a resolution to my problem. Basically when using mysqldump, which is the built in tool for the MySQL Workbench administration tool, when I dump a database using extended inserts, I get massive long lines of data. I understand why it does this, as it speeds inserts by inserting the data as one command (especially on InnoDB), but the formatting makes it REALLY difficult to actually look at the data in a dump file, or compare two files with a diff tool if you are storing them in version control etc. In my case I am storing them in version control as we use the dump files to keep track of our integration test database.
Now I know I can turn off extended inserts, so I will get one insert per line, which works, but any time you do a restore with the dump file it will be slower.
My core problem is that in the OLD tool we used to use (MySQL Administrator) when I dump a file, it does basically the same thing but it FORMATS that INSERT statement to put one insert per line, while still doing bulk inserts. So instead of this:
INSERT INTO `coupon_gv_customer` (`customer_id`,`amount`) VALUES (887,'0.0000'),(191607,'1.0300');
you get this:
INSERT INTO `coupon_gv_customer` (`customer_id`,`amount`) VALUES
(887,'0.0000'),
(191607,'1.0300');
No matter what options I try, there does not seem to be any way to get a dump like this, which is really the best of both worlds. Yes, it takes a little more space, but in situations where you need a human to read the files, it makes them MUCH more useful.
Am I missing something and there is a way to do this with MySQLDump, or have we all gone backwards and this feature in the old (now deprecated) MySQL Administrator tool is no longer available?
Try using the following option:
--skip-extended-insert
It worked for me.
With extended inserts turned off (--skip-extended-insert; note that mysqldump's default --opt setting turns them on), each record dumped will generate an individual INSERT command in the dump file (i.e., the sql file), each on its own line. This is perfect for source control (e.g., svn, git, etc.) as it makes the diff and delta resolution much finer, and ultimately results in a more efficient source control process. However, for significantly sized tables, executing all those INSERT queries can potentially make restoration from the sql file prohibitively slow.
Using the --extended-insert option fixes the multiple INSERT problem by wrapping all the records into a single INSERT command on a single line in the dumped sql file. However, the source control process becomes very inefficient. The entire table contents is represented on a single line in the sql file, and if a single character changes anywhere in that table, source control will flag the entire line (i.e., the entire table) as the delta between versions. And, for large tables, this negates many of the benefits of using a formal source control system.
So ideally, for efficient database restoration, in the sql file, we want each table to be represented by a single INSERT. For an efficient source control process, in the sql file, we want each record in that INSERT command to reside on its own line.
My solution to this is the following back-up script:
#!/bin/bash
cd my_git_directory/
ARGS="--host=myhostname --user=myusername --password=mypassword --opt --skip-dump-date"
/usr/bin/mysqldump $ARGS --databases mydatabase | sed 's$VALUES ($VALUES\n($g' | sed 's$),($),\n($g' > mydatabase.sql
git fetch origin master
git merge origin/master
git add mydatabase.sql
git commit -m "Daily backup."
git push origin master
The result is a sql file INSERT command format that looks like:
INSERT INTO `mytable` VALUES
(r1c1value, r1c2value, r1c3value),
(r2c1value, r2c2value, r2c3value),
(r3c1value, r3c2value, r3c3value);
Some notes:
password on the command line ... I know, not secure, different discussion.
--opt: Among other things, turns on the --extended-insert option (i.e., one INSERT per table).
--skip-dump-date: mysqldump normally puts a date/time stamp in the sql file when created. This can become annoying in source control when the only delta between versions is that date/time stamp. The OS and source control system will date/time stamp the file and version. It's not really needed in the sql file.
The git commands are not central to the fundamental question (formatting the sql file), but shows how I get my sql file back into source control, something similar can be done with svn. When combining this sql file format with your source control of choice, you will find that when your users update their working copies, they only need to move the deltas (i.e., changed records) across the internet, and they can take advantage of diff utilities to easily see what records in the database have changed.
If you're dumping a database that resides on a remote server, if possible, run this script on that server to avoid pushing the entire contents of the database across the network with each dump.
If possible, establish a working source control repository for your sql files on the same server you are running this script from; check them into the repository from there. This will also help prevent having to push the entire database across the network with every dump.
As others have said using sed to replace "),(" is not safe as this can appear as content in the database.
There is a way to do this however:
if your database name is my_database then run the following:
$ mysqldump -u my_db_user -p -h 127.0.0.1 --skip-extended-insert my_database > my_database.sql
$ sed ':a;N;$!ba;s/)\;\nINSERT INTO `[A-Za-z0-9$_]*` VALUES /),\n/g' my_database.sql > my_database2.sql
you can also use "sed -i" to replace in-line.
Here is what this code is doing:
--skip-extended-insert will create one INSERT INTO for every row you have.
Now we use sed to clean up the data. Note that sed normally works one line at a time, so a regular search/replace cannot match the "\n" character. That is why we put ":a;N;$!ba;" first: it loops, appending each following line to the pattern space until the last line is reached, so the substitution then runs over the whole file at once.
Hope this helps
What about storing the dump into a CSV file with mysqldump, using the --tab option like this?
mysqldump --tab=/path/to/serverlocaldir --single-transaction <database> table_a
This produces two files:
table_a.sql that contains only the table create statement; and
table_a.txt that contains tab-separated data.
RESTORING
You can restore your table via LOAD DATA:
LOAD DATA INFILE '/path/to/serverlocaldir/table_a.txt'
INTO TABLE table_a FIELDS TERMINATED BY '\t' ...
LOAD DATA is usually 20 times faster than using INSERT statements.
If you have to restore your data into another table (e.g. for review or testing purposes) you can create a "mirror" table:
CREATE TABLE table_for_test LIKE table_a;
Then load the CSV into the new table:
LOAD DATA INFILE '/path/to/serverlocaldir/table_a.txt'
INTO TABLE table_for_test FIELDS TERMINATED BY '\t' ...
COMPARE
A CSV file is simplest for diffs or for looking inside, or for non-SQL technical users who can use common tools like Excel, Access or command line (diff, comm, etc...)
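As a small illustration of the diff point, two versions of the tab-separated table_a.txt can be compared with nothing more than Python's standard difflib. This is a sketch: the function name and the old/ and new/ labels are made up, and it assumes both dumps list rows in the same order (for example, dumped in primary key order) so that unchanged rows line up.

```python
import difflib


def diff_table_dumps(old_text, new_text, name="table_a.txt"):
    """Return unified-diff lines between two tab-separated dumps of one table."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"old/{name}",
            tofile=f"new/{name}",
            lineterm="",  # input lines have no trailing newline after splitlines()
        )
    )
```

Each changed record shows up as one "-" line and one "+" line, which is the row-level view that a one-line extended INSERT cannot give.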
I'm afraid this won't be possible. In the old MySQL Administrator I wrote the code for dumping db objects which was completely independent of the mysqldump tool and hence offered a number of additional options (like this formatting or progress feedback). In MySQL Workbench it was decided to use the mysqldump tool instead which, besides being a step backwards in some regards and producing version problems, has the advantage to stay always up-to-date with the server.
So the short answer is: formatting is currently not possible with mysqldump.
Try this:
mysqldump -c -t --add-drop-table=FALSE --skip-extended-insert -uroot -p<Password> databaseName tableName >c:\path\nameDumpFile.sql
I found this tool very helpful for dealing with extended inserts: http://blog.lavoie.sl/2014/06/split-mysqldump-extended-inserts.html
It parses the mysqldump output and inserts linebreaks after each record, but still using the faster extended inserts. Unlike a sed script, there shouldn't be any risk of breaking lines in the wrong place if the regex happens to match inside a string.
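To show what parsing rather than pattern matching means here, below is a minimal Python sketch of a quote-aware splitter. It is not the code of the linked tool, just an illustration of the idea: walk the line character by character, remember whether we are inside a single-quoted string, and only break after a "),", that sits outside any string. It assumes mysqldump's usual backslash escaping inside strings, and the function name is hypothetical.

```python
def split_extended_insert(line):
    """Put each row of an extended INSERT on its own line, ignoring string contents."""
    out = []
    in_string = False  # inside a single-quoted SQL string
    escaped = False    # previous character was a backslash inside a string
    depth = 0          # parenthesis nesting outside of strings
    for ch in line:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == "'":
                in_string = False
            continue
        if ch == "'":
            in_string = True
        elif ch == "(":
            if depth == 0 and "".join(out[-7:]) == "VALUES ":
                out[-1] = "\n"  # break between VALUES and the first row
            depth += 1
        elif ch == ")":
            depth -= 1
        out.append(ch)
        if ch == "," and depth == 0 and len(out) >= 2 and out[-2] == ")":
            out.append("\n")  # break between two rows
    return "".join(out)
```

A value such as 'a),(b' passes through untouched because the splitter is inside a string when it sees it, which is exactly the case where the sed approach breaks the line in the wrong place.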
I liked Ace.Di's solution with sed, until I got this error:
sed: Couldn't re-allocate memory
Thus I had to write a small PHP script
mysqldump -u my_db_user -p -h 127.0.0.1 --skip-extended-insert my_database | php mysqlconcatinserts.php > db.sql
The PHP script also generates a new INSERT for each 10,000 rows, again to avoid memory problems.
mysqlconcatinserts.php:
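#!/usr/bin/php
<?php
/* Assumes a mysqldump made with --skip-extended-insert. */
$last = '';          // INSERT prefix of the statement currently being built
$count = 0;          // rows written to the current statement
$maxinserts = 10000; // start a new INSERT after this many rows
while (($l = fgets(STDIN)) !== false) {
    if (preg_match('/^(INSERT INTO .* VALUES) (.*);/', $l, $s)) {
        if ($last != $s[1] || $count >= $maxinserts) {
            if ($last != '') { // close the statement in progress
                echo ";\n";
            }
            echo "$s[1] ";
            $comma = '';
            $last = $s[1];
            $count = 0;
        }
        echo "$comma$s[2]";
        $comma = ",\n";
        $count++;
    } else {
        if ($last != '') { // a non-INSERT line ends the current statement
            echo ";\n";
            $last = '';
        }
        echo $l; // pass every other line (DDL, comments) through unchanged
    }
}
if ($last != '') { // the input ended in the middle of an INSERT
    echo ";\n";
}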
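Add
set autocommit=0;
as the first line of your SQL script file (and COMMIT; as the last line, so the inserted rows are not rolled back when the client disconnects), then import with:
mysql -u<user> -p<password> --default-character-set=utf8 db_name < <path>\xxx.sql
It will be about 10 times faster.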
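If you would rather not edit a large dump by hand, the extra lines can be added while copying the file. The sketch below does that in Python. The function names are made up, and the closing COMMIT; is my addition on the assumption that the dump contains plain INSERTs into transactional (e.g. InnoDB) tables, where uncommitted work is rolled back when the session ends.

```python
import io


def wrap_in_transaction(src, dst):
    """Copy a SQL script from src to dst with autocommit off and a final COMMIT."""
    dst.write("SET autocommit=0;\n")
    last = "\n"
    for line in src:  # stream line by line so large dumps need little memory
        dst.write(line)
        last = line
    if not last.endswith("\n"):
        dst.write("\n")
    dst.write("COMMIT;\n")


def wrap_text(sql_text):
    """Convenience wrapper for small scripts held in a string."""
    out = io.StringIO()
    wrap_in_transaction(io.StringIO(sql_text), out)
    return out.getvalue()
```

For a real dump, open the input and output files and pass the two file objects to wrap_in_transaction, then feed the new file to the mysql client as shown above.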
I've been trying to get a shell(bash) script to insert a row into a REMOTE database, but I've been having some trouble :(
The script is meant to upload a file to a server, get a URL, HASH, and a file size, connect to a remote mysql database, and insert the data into an existing table. I've gotten it working until the remote MYSQL database bit.
It looks like this:
#!/bin/bash
zxw=randomtext
description=randomtext2
for file in "$@"
do
echo -n *****
ident= *****
data= ****
size=` ****
hash=`****
mysql --host=randomhost --user=randomuser --password=randompass randomdb
insert into table (field1,field2,field3) values('http://www.example.com/$hash','$file','$size');
echo "done"
done
I'm a total noob at programming so yeah :P
Anyway, I added the \ to escape the brackets as I was getting errors. As it is right now, the script works fine until it connects to the mysql database. It just connects to the mysql database and doesn't run the insert command (and I don't even know if the insert command would work in bash).
PS: I've tried both the mysql commands from the command line one by one, and they worked, though I defined the hash/file/size and didn't have the escaping "".
Anyway, what do you guys think? Is what I'm trying to do even possible? If so how?
Any help would be appreciated :)
The insert statement has to be sent to mysql, not another line in the shell script, so you need to make it a "here document".
mysql --host=randomhost --user=randomuser --password=randompass randomdb << EOF
insert into table (field1,field2,field3) values('http://www.site.com/$hash','$file','$size');
EOF
The << EOF means take everything before the next line that contains nothing but EOF (no whitespace at the beginning) as standard input to the program.
This might not be exactly what you are looking for but it is an option.
If you want to bypass the annoyance of actually including your query in the sh script, you can save the query as .sql file (useful sometimes when the query is REALLY big and complicated). This can be done with simple file IO in whatever language you are using.
Then you can simply include in your sh script something like:
mysql -u youruser -pyourpass -h remoteHost < query.sql &
This is called batch mode execution. Optionally, you can include the ampersand at the end to ensure that that line of the sh script does not block.
Also, if you are concerned about the same data being entered multiple times and your RDBMS becoming inconsistent, you should explore MySQL transactions (commit, rollback, etc.).
Don't use raw SQL from bash; bash has no sane facility for sanitizing the data beforehand. Generate a CSV file and upload that instead.
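A rough sketch of the "generate a CSV" half of that advice, using Python's standard csv module so that commas and quotes in file names are escaped by the library instead of by hand. The function name and the column order (URL, file name, size) are assumptions based on the insert statement in the question.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text; fields containing commas or quotes get quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```

The resulting file can then be loaded with mysqlimport or LOAD DATA, telling MySQL that fields are terminated by "," and optionally enclosed by a double quote. Backslashes in the data are still treated as escape characters by MySQL's defaults, so check the load options if file names may contain them.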