Is there a way to add each line from a text file to a MySQL database, giving each line a new ID?
The text file is just a list of numbers, one entry per line, like this:
45
33
55
67
41
I want to do it in bash, but I'm not sure how to go about it; my bash skills are limited.
If you want to generate simple MySQL INSERT statements (see http://dev.mysql.com/doc/refman/5.6/en/insert.html) that you can review before importing them into the database, the following approach might be helpful:
awk '{print "INSERT INTO mytable (myfield) VALUES ("$1");"}' ids.txt > myinserts.sql
If the generated statements are what you expected, you can execute them (see http://dev.mysql.com/doc/refman/5.6/en/mysql-batch-commands.html):
mysql mydb < myinserts.sql
Of course, you can also do this in one step, without creating a temporary file:
awk '{print "INSERT INTO mytable (myfield) VALUES ("$1");"}' ids.txt | mysql mydb
This is not as efficient as the approach in juergen d's comment, but it is more flexible if you want more control over the generated INSERT statements.
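For instance, if the file held strings rather than plain numbers, the values would need quoting and escaping. A minimal sketch, keeping the same placeholder table and column names (the awk variable q holds a single quote; embedded quotes are doubled, which is the standard SQL escape):
awk -v q="'" '{
    gsub(q, q q)   # double any embedded single quotes
    print "INSERT INTO mytable (myfield) VALUES (" q $0 q ");"
}' ids.txt | mysql mydb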
Related
In the following script, I try to get all the table names from a MySQL database, and I expect all the table names to be printed out, but no matter what I do or which method I use, it just doesn't work. The printed string appears to be the table names overlapped on each other:
watchdoglescabularyrchygsey
What's wrong with this script?
mysql -Nse 'show tables' DATABASE |
{
    while read table
    do
        alltables="$alltables $table"
    done
    echo $alltables;
}
Could it be that mysql terminates the table names with \r\n instead of \n? read would then see First Table\r, Second Table\r, and so on. In most Linux terminals, \r causes the cursor to jump back to the start of the current line, so ABC\r_ is printed as _BC.
Checking for \r
Execute mysql -Nse 'show tables' DATABASE | sed 's:\r:\\r:' and look at the output. The control character \r will be printed as the literal string \r.
Deleting the \r
Insert a ... | tr -d '\r' | ... between the commands.
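Putting the fix into the script from the question:
mysql -Nse 'show tables' DATABASE | tr -d '\r' |
{
    while read table
    do
        alltables="$alltables $table"
    done
    echo $alltables;
}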
First off, because my MySQL user does not have FILE rights on the server, I have to use the line below to pipe my SELECT output to a file from the shell, instead of doing it directly in MySQL with INTO OUTFILE and FIELDS TERMINATED BY '|', which I'm guessing would solve all my problems.
So I have the following line to grab my fields:
echo "select id, UNIX_TIMESTAMP(time), company from database.table_name" | mysql -h database.mysql.host.com -u username -ppassword user > /root/sql/output.txt
This outputs the following 3 columns:
63 1414574321 person one
50 1225271921 Another person
8 1225271921 Company with many names
10 1414574567 Person with Company
I then use that data in other scripts to do some tasks.
My issue is that some columns, such as the third one here, 'company', contain spaces in their data, which throws off my while loops later on.
I would like to add a delimiter to my output so it looks like this instead:
63|1414574321|person one
50|1225271921|Another person
8|1225271921|Company with many names
10|1414574567|Person with Company
and that way I could hopefully manipulate the data in blocks later using awk -F'|' and IFS='|'.
There are many, many more columns, with variable lengths and numbers of words per column, to be added once I get this working, so I cannot use a method that relies on position to add the delimiter.
I feel the delimiter needs to be set when the data is dumped in the first place.
I've tried things like:
echo "select (id, + '|' + UNIX_TIMESTAMP(time), + '|' + company) from database.table_name" | mysql -h database.mysql.host.com -u username -ppassword user > /root/sql/output.txt
without any luck; it just adds the characters to the header of the output file.
Does anyone out there see a solution?
In case anyone wonders, I'm dumping data from 2 databases, comparing timestamps and writing back the latest data to both databases.
You could use the CONCAT_WS() function to produce one concatenated string per row:
select concat_ws( '|', id, UNIX_TIMESTAMP(time) , company ) from database.table_name
Edit: Missing comma added, sorry!
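For completeness, a sketch of the whole round trip, reusing the host, credentials, and paths from the question (the -N flag suppresses the column-name header that was getting in the way):
echo "SELECT CONCAT_WS('|', id, UNIX_TIMESTAMP(time), company) FROM database.table_name" |
    mysql -N -h database.mysql.host.com -u username -ppassword user > /root/sql/output.txt

# later, split each line on the delimiter
while IFS='|' read -r id ts company
do
    echo "id=$id time=$ts company=$company"
done < /root/sql/output.txt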
This code searches through a website's HTML files and extracts a list of domain names...
httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" | grep -iEo '[[:alnum:]-]+\.(com|net|org)'
The result looks like this.
domain1.com
domain2.com
domain3.com
I plan to use this code on very large websites, so it will generate a very large list of domain names. In addition, the code above generates a lot of duplicate domain names, so I set up a MySQL database with a unique field so that duplicates will not be inserted.
Using my limited knowledge of programming, I hacked together the line below, but it is not working. When I execute the command, I get no error, just a new command prompt of > and a blinking cursor. I assume I'm not using the correct syntax or methodology, and/or maybe what I want to do is not possible from the command line. Any help is much appreciated.
httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" | domain=“$(grep -iEo '[[:alnum:]-]+\.(com|net|org)’)” | mysql -pPASSWORD -e "INSERT INTO domains.domains (domains) VALUES ($domain)”
And yes, my database name is domains, and my table name is domains, and my field name is domains.
Judging from the MySQL syntax for INSERT:
INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE]
[INTO] tbl_name
[PARTITION (partition_name,…)]
[(col_name,…)]
{VALUES | VALUE} ({expr | DEFAULT},…),(…),…
…
you need to convert the domain names into parenthesized, quoted, comma-separated items:
('domain1.com'),('domain2.com'),…
and then attach this list to the end of the INSERT statement you generated.
httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" |
grep -iEo '[[:alnum:]-]+\.(com|net|org)' |
sort -u |
sed -e "s/.*/,('&')/" -e '1s/,/INSERT IGNORE INTO domains.domains(domain) VALUES /' |
mysql -pPASSWORD
The sort -u ensures that the names are unique. The first -e expression to sed converts the contents of a line (e.g. domain1.com) into ,('domain1.com'); the second -e removes the comma on the first line (added by the first expression) and replaces it with the INSERT prefix. The IGNORE in the INSERT statement means that if a domain is already in the table, the new entry is ignored.
Clearly, if the number of domains generated is too large for a valid SQL statement in MySQL, you'll have to do some splitting of the data, but you're likely to be able to process a few thousand domains at a time.
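If you do hit that limit, one way to batch the work is to split the unique names into chunks first. A sketch using split from coreutils (the 5000-line batch size and the domains.batch. prefix are arbitrary choices):
httrack --skeleton http://www.ilovefreestuff.com -V "cat \$0" |
grep -iEo '[[:alnum:]-]+\.(com|net|org)' |
sort -u |
split -l 5000 - domains.batch.

for f in domains.batch.*
do
    sed -e "s/.*/,('&')/" -e '1s/,/INSERT IGNORE INTO domains.domains(domains) VALUES /' "$f" |
        mysql -pPASSWORD
done
rm -f domains.batch.*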
I am very new to PostgreSQL. I was using MySQL before, but for project-specific reasons I need to use PostgreSQL and do a POC.
Now the problem is:
I was using MySQL's LOAD DATA INFILE command to load column content from a file into my database table.
My table structure is:
Table name: MSISDN
Table columns: ID (primary key, auto-generated), JOB_ID, MSISDN, REGION, STATUS
But my input text file (rawBase.txt) has only the columns below:
MSISDN, REGION
So I was using the command below to load those two columns, setting initial values for JOB_ID and STATUS:
LOAD DATA INFILE 'D:\\project\\rawBase.txt' INTO TABLE MSISDN
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(MSISDN,REGION)
SET JOB_ID = 'XYZ1374147779999', STATUS = 0;
As you can see, LOAD DATA INFILE has an option to SET an initial value for the columns (JOB_ID and STATUS) that are not present in the input text file.
Now, in PostgreSQL, I want the same thing to happen. There is a similar kind of command available, COPY FROM, used like below:
COPY MSISDN FROM 'D:\\project\\rawBase.txt' WITH DELIMITER AS ','
but I am not able to SET an initial value for the remaining columns (JOB_ID and STATUS) that are not present in my input text file, and I haven't found any useful example of doing this.
Please give some suggestions if possible.
You may do it the "Unix way" using pipes:
cat rawbase.txt | awk '{print $0",XYZ1374147779999,0"}' | psql -d dbname -c "copy MSISDN FROM stdin with delimiter AS ','"
Now from the file paths in the question it appears you're using MS-Windows, but a Unix shell and command-line tools like awk are available for Windows through MSYS or Cygwin.
COPY with a column-list, and set a DEFAULT on the table columns you don't specify.
regress=> CREATE TABLE copydemo(a text not null, b text not null default 'blah');
regress=> \COPY copydemo(a) FROM stdin
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> blah
>> otherblah
>> \.
regress=> SELECT * FROM copydemo;
a | b
-----------+------
blah | blah
otherblah | blah
(2 rows)
You're probably COPYing from a file rather than stdin; I just did it on stdin for a quick demo of what I mean. The key things are that columns which require values not in the CSV have DEFAULTs set, and that you specify a column-list in COPY, e.g. COPY tablename(col1, col2).
There is unfortunately no equivalent to the COPY-specific SET that you want there. You can stage via a temporary table and do an INSERT INTO ... SELECT, as Igor suggested, if you can't or don't want to ALTER your table to set column DEFAULTs.
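A sketch of that staging route, assuming a database named mydb and the MSISDN table from the question (lower-cased identifiers; forward slashes work for Windows paths in psql):
psql -d mydb <<'SQL'
CREATE TEMP TABLE msisdn_stage (msisdn text, region text);
\copy msisdn_stage FROM 'D:/project/rawBase.txt' WITH DELIMITER AS ','
INSERT INTO msisdn (job_id, msisdn, region, status)
SELECT 'XYZ1374147779999', msisdn, region, 0
FROM msisdn_stage;
SQL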
UPDATE: added an example to clarify the format of the data.
Considering a CSV with each line formatted like this:
tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5,[tbl2.col1:tbl2.col2]+
where [tbl2.col1:tbl2.col2]+ means that there could be any number of these pairs repeated
For example:
tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2
The tables would relate to each other using the line number as a key, which would have to be created in addition to any columns mentioned above.
Is there a way to use MySQL LOAD DATA INFILE to load the data into two separate tables?
If not, what Unix command-line tools would be best suited for this?
No, not directly. LOAD DATA can only insert into one table (or partitioned table).
What you can do is load the data into a staging table, then use INSERT INTO ... SELECT to move the individual columns into the two final tables. You may also need SUBSTRING_INDEX() if you're using different delimiters for tbl2's values. The line number is handled by an auto-incrementing column in the staging table (the easiest way is to make the auto column last in the staging table definition).
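A hedged sketch of that approach (staging, raw_line, table1, and the column names are invented for illustration; each whole line is loaded into a single column, since LOAD DATA cannot capture a variable number of trailing fields):
mysql mydb <<'SQL'
CREATE TABLE staging (
    raw_line TEXT,
    line_no  INT AUTO_INCREMENT PRIMARY KEY  -- auto column last, as described above
);

-- a field delimiter that never occurs in the data keeps each whole line in raw_line
LOAD DATA INFILE '/tmp/file.csv' INTO TABLE staging
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(raw_line);

-- peel off the five fixed fields with SUBSTRING_INDEX for the first final table;
-- the variable tbl2 pairs are easier to expand in the shell, as shown below
INSERT INTO table1 (line_no, col1, col2, col3, col4, col5)
SELECT line_no,
       SUBSTRING_INDEX(raw_line, ',', 1),
       SUBSTRING_INDEX(SUBSTRING_INDEX(raw_line, ',', 2), ',', -1),
       SUBSTRING_INDEX(SUBSTRING_INDEX(raw_line, ',', 3), ',', -1),
       SUBSTRING_INDEX(SUBSTRING_INDEX(raw_line, ',', 4), ',', -1),
       SUBSTRING_INDEX(SUBSTRING_INDEX(raw_line, ',', 5), ',', -1)
FROM staging;
SQL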
The format is not exactly clear, and this is best done with Perl/PHP/Python, but if you really want to use shell tools:
cut -d , -f 1-5 file | awk -F, '{print NR "," $0}' > table1
cut -d , -f 6- file | sed 's/:/,/g' | \
awk -F, '{i=1; while (i<=NF) {print NR "," $(i) "," $(i+1); i+=2;}}' > table2
This creates the table1 and table2 files with these contents:
1,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
2,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
3,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
and
1,tbl2.col1,tbl2.col2
1,tbl2.col1,tbl2.col2
2,tbl2.col1,tbl2.col2
2,tbl2.col1,tbl2.col2
3,tbl2.col1,tbl2.col2
3,tbl2.col1,tbl2.col2
As you say, the problematic part is the unknown number of [tbl2.col1:tbl2.col2] pairs declared on each line. I would be tempted to solve this with sed: split the one file into two files, one for each table. Then you can use LOAD DATA INFILE to load each file into its corresponding table.
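A sketch of that split (assuming the input is file.csv; the pairs in table2.csv would still need expanding to one row per pair, e.g. with the awk loop from the previous answer):
# first five fields, with the line number prepended -> table1.csv
sed 's/^\([^,]*,[^,]*,[^,]*,[^,]*,[^,]*\),.*/\1/' file.csv |
    awk '{print NR "," $0}' > table1.csv
# everything after the fifth field -> table2.csv
sed 's/^[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,//' file.csv |
    awk '{print NR "," $0}' > table2.csv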