I have a SQL table similar to the following:
id | text | other_columns...
----------------------------
0 | a | ...
1 | b | ...
2 | c | ...
I need to apply some complex operation to the values in the text column, and then update the fields with the new values.
// Get all the current values.
entries = SELECT id, text FROM foo_table;

// Apply some complex operation to the text values (this part is Python, not SQL).
foreach entry in entries:
    entry.text = f(entry.text)

// Update the text fields (1 UPDATE per entry).
foreach entry in entries:
    UPDATE foo_table SET text = entry.text WHERE id = entry.id;
This results in a table like this, with updated text values:
id | text | other_columns...
----------------------------
0 | x | ...
1 | y | ...
2 | z | ...
It takes ~1 ms per UPDATE, and I have ~0.5 million entries, which results in ~8 minutes of execution. I am batching the SQL commands (1,000 at a time), but this still seems very slow/inefficient.
Is there a better (faster) way to do this? Thanks.
Export to a text file with 2 columns via SELECT ... INTO OUTFILE:
SELECT id, theText
INTO OUTFILE '/path/to/file.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM myTable
Have Python read that file, apply the complex operation to the text values, and write the results back out. The output file can have 2 columns or 3; let's say 3 for debugging purposes.
Now you have the output. Bring it back into MySQL with LOAD DATA INFILE into a worktable with columns id, newText:
LOAD DATA INFILE 'data.txt' INTO TABLE worktable
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
https://dev.mysql.com/doc/refman/5.1/en/load-data.html
Note that the data to import can have column names in row 1. Row 1 can be skipped during the import, of course, but by naming columns you can bring in only certain ones; in your case, 2 out of 3 columns. For example:
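A minimal sketch of that variant, assuming the Python output kept a third debugging column (the @debug variable is hypothetical; reading a file column into an unused variable is how you skip it):
LOAD DATA INFILE 'data.txt' INTO TABLE worktable
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(id, newText, @debug);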
Add an index on worktable.id AFTER the import.
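For instance (a sketch, assuming no index was created up front):
ALTER TABLE worktable ADD INDEX idx_id (id);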
The update will then be fast:
UPDATE myTable
JOIN worktable
ON worktable.id=myTable.id
SET myTable.text=worktable.newText
This entire thing can occur in an enclosed bash script. If you are not sure how, please ask.
I might be missing something big here, but why can't you just do
UPDATE foo_table
SET foo_table.text = f(foo_table.text);
You could use a UDF, but you would have to rewrite your function in C.
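For what it's worth, a sketch of that direct approach when f() happens to be expressible in SQL (UPPER is just a hypothetical stand-in here; the asker's f() is Python code, which is exactly why this shortcut may not apply):
UPDATE foo_table
SET text = UPPER(text);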
I have a hashes table with 2 columns, hash | plain
And a text file looking like that:
acbd18db4cc2f85cedef654fccc4a4d8:foo
37b51d194a7513e45b56f6524f2d51f2:bar
4e99e8c12de7e01535248d2bac85e732:foo:bar
I'm trying to execute this query:
LOAD DATA LOCAL INFILE 'file.txt' INTO TABLE hashes COLUMNS TERMINATED BY ':' LINES TERMINATED BY '\n'
The issue is, for the hash 4e99e8c12de7e01535248d2bac85e732, it will only insert foo, not foo:bar, because of COLUMNS TERMINATED BY ':'.
How can I make it "only split once" to fix this issue?
You could load into a user variable and use a bit of string manipulation.
drop table if exists t;
create table t
(hash varchar(100),plain varchar(100));
LOAD DATA INFILE 'file.txt'
INTO TABLE t
LINES TERMINATED BY '\r\n'
(@var)
set
hash = substring_index(@var,':',1),
plain = substring(@var, locate(':',@var) + 1) -- keep everything after the first ':', so embedded colons survive
;
select *
from t;
+------+---------+
| hash | plain   |
+------+---------+
| abc  | def     |
| abc  | ghi     |
| abc  | def:ghi |
+------+---------+
3 rows in set (0.001 sec)
Note I have used \r\n to load this properly - you should test for your environment
I need to load file content into a table. The file contains text separated by commas. It is a very large file. I cannot change the file; it was already given to me like this.
12.com,128.15.8.6,TEXT1,no1,['128.15.8.6']
23com,122.14.10.7,TEXT2,no2,['122.14.10.7']
45.com,91.33.10.4,TEXT3,no3,['91.33.10.4']
67.com,88.22.88.8,TEXT4,no4,['88.22.88.8', '5.112.1.10']
I need to load the file into a table of five columns. So for example, the last row above should be in the table as follows:
table.col1: 67.com
table.col2: 88.22.88.8
table.col3: TEXT4
table.col4: no4
table.col5: ['88.22.88.8', '5.112.1.10']
Using MySQL workbench, I created a table with five columns all are of type varchar. Then I run the following SQL command:
LOAD DATA INFILE '/var/lib/mysql-files/myfile.txt'
INTO TABLE `myscheme`.`mytable`
fields terminated BY ','
The last column string (which contains commas that I do not want to separate) causes an issue.
Error:
Error Code: 1262. Row 4 was truncated; it contained more data than there were input columns
How can I overcome this problem?
Not that difficult simply using LOAD DATA INFILE; note the use of a user variable.
drop table if exists t;
create table t(col1 varchar(20),col2 varchar(20), col3 varchar(20), col4 varchar(20),col5 varchar(100));
truncate table t;
load data infile 'test.csv' into table t LINES TERMINATED BY '\r\n' (@var1)
set col1 = substring_index(@var1,',',1),
col2 = substring_index(substring_index(@var1,',',2),',',-1),
col3 = substring_index(substring_index(@var1,',',3),',',-1),
col4 = substring_index(substring_index(@var1,',',4),',',-1),
col5 = concat('[',substring_index(@var1,'[',-1))
;
select * from t;
+--------+-------------+-------+------+------------------------------+
| col1 | col2 | col3 | col4 | col5 |
+--------+-------------+-------+------+------------------------------+
| 12.com | 128.15.8.6 | TEXT1 | no1 | ['128.15.8.6'] |
| 23com | 122.14.10.7 | TEXT2 | no2 | ['122.14.10.7'] |
| 45.com | 91.33.10.4 | TEXT3 | no3 | ['91.33.10.4'] |
| 67.com | 88.22.88.8 | TEXT4 | no4 | ['88.22.88.8', '5.112.1.10'] |
+--------+-------------+-------+------+------------------------------+
4 rows in set (0.00 sec)
In this case, to avoid the problem caused by the improper presence of commas, you could import the rows into a single-column table (of type TEXT or MEDIUMTEXT, as you need).
Then, using LOCATE (one call for the 1st comma, one for the 2nd, one for the 3rd, and so on) and SUBSTRING, you can extract from each row the columns you need.
And last, with an INSERT ... SELECT you can populate the destination table, separating the columns as you need; a sketch follows.
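Assuming the table and file names from the question (the staging table itself and SUBSTRING_INDEX, which wraps the same LOCATE/SUBSTRING logic more compactly, are my own choices), the approach might look like this:
create table staging (line mediumtext);
load data infile '/var/lib/mysql-files/myfile.txt'
into table staging
lines terminated by '\n'; -- the data contains no tabs, so each whole line lands in `line`
insert into `myscheme`.`mytable` (col1, col2, col3, col4, col5)
select substring_index(line, ',', 1),
substring_index(substring_index(line, ',', 2), ',', -1),
substring_index(substring_index(line, ',', 3), ',', -1),
substring_index(substring_index(line, ',', 4), ',', -1),
substring(line, locate('[', line)) -- everything from the first '[' onwards
from staging;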
This is too long for a comment.
You have a horrible data format in your CSV file. I think you should regenerate the file.
MySQL has facilities to help you handle this data, particularly the OPTIONALLY ENCLOSED BY option in LOAD DATA INFILE. The only caveat is that this allows one escape character rather than two.
My first suggestion would be to replace the field separators with another character; tab or | come to mind, or any character that is not used for values within a field.
The second is to use a double quote for OPTIONALLY ENCLOSED BY. Then replace [ with "[ and ] with ]" in the data file. Even if you cannot regenerate the file, you can pre-process it using something like grep, Perl, or Python to make this simple substitution.
Then you can use the import facilities of MySQL to load the file.
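For illustration, the final load might then look like this (a sketch; the pre-processed file name is hypothetical):
LOAD DATA INFILE '/var/lib/mysql-files/myfile_quoted.txt'
INTO TABLE `myscheme`.`mytable`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
The commas inside the now-quoted [...] field are no longer treated as field separators.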
//I have updated the question to hopefully clarify things better as I'm still at a loss. I got some very much appreciated help but I can't get it to work yet//
I'm still new at this, new here as well, so I hope I'll explain it clearly enough. If not, please let me know.
I have 2 tables: Table 1 "students" and Table 2 "Mixed data".
Table 1 "students" has the columns "ID", "Name" and "Experience".
In the column "Experience", the fields refer to data from Table 2 "Mixed data".
I am able to export the data of the table "students" to CSV
For this export, I use the following query:
SELECT *
INTO OUTFILE '/tmp/results.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
FROM students;
In the file "results.csv" the data of the ID's will be the ID's and not the real value/name
I like to be able to export the data into CSV and have the values and not the ID reference in the exported file.
Current export result
ID | Name | Experience
1 | Gary | 5
2 | Mary | 4
the number "4" and "5" in Experience refers to the id in table 2 "Mixed data"
Result I would like to have in the exported .csv:
ID | Name | Experience
1 | Gary | Super experienced
2 | Mary | Lots of experience
In this export of the CSV, the Experience field now contains the real values, being "Super experienced" and "Lots of experience".
Is it possible to export the data of a table including the values of referenced IDs from another table, and if yes, how should I adjust my query?
Thanks so much for any help.
It is not a problem.
Just add the desired table to the SELECT statement and join on the referencing column, for example (assuming Table 2 is a table named experience with columns ID and Experience):
SELECT s.ID, s.Name, e.Experience FROM students s
JOIN experience e ON s.Experience = e.ID
INTO OUTFILE '/tmp/results.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
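One caveat worth adding: an inner JOIN drops students whose Experience value has no matching row in the other table. If that matters, a LEFT JOIN keeps them; a sketch with the same hypothetical table names (the output file name is changed because INTO OUTFILE refuses to overwrite an existing file):
SELECT s.ID, s.Name, IFNULL(e.Experience, '') FROM students s
LEFT JOIN experience e ON s.Experience = e.ID
INTO OUTFILE '/tmp/results2.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'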
I have a table_name like this:
No | Name | Inserted_Date | Inserted_By
=====================================
and then I have a file name.csv like this:
no,name
1,jhon
2,alex
3,steve
I want to load this file into table_name using syntax like this:
LOAD DATA INFILE 'name.csv' INTO TABLE table1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
???
The question is: what should I put at ??? so I can store the data like this:
No | Name | Inserted_Date | Inserted_By
=====================================
1 | jhon | sysdate() | me
2 | alex | sysdate() | me
3 | steve | sysdate() | me
It is not clear whether the columns inserted_date and inserted_by already exist in your table. If not, you can add them before running LOAD DATA INFILE:
LOAD DATA INFILE 'name.csv' INTO TABLE table1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@no, @name)
set
no = @no,
name = @name,
inserted_date = now(),
inserted_by = 'me'
something like this will do it:
LOAD DATA INFILE 'name.csv' INTO TABLE table1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
SET inserted_date=CURRENT_DATE(), inserted_by='me'
Take a look at the manual:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Bulk insert of more than 7,000,000 records in 1 minute into the database (a superfast query with calculation):
mysqli_query($cons, '
LOAD DATA LOCAL INFILE "'.$file.'"
INTO TABLE tablename
FIELDS TERMINATED by \',\'
LINES TERMINATED BY \'\n\'
IGNORE 1 LINES
(isbn10,isbn13,price,discount,free_stock,report,report_date)
SET RRP = IF(discount = 0.00,price-price * 45/100,IF(discount = 0.01,price,IF(discount != 0.00,price-price * discount/100,@RRP))),
RRP_nl = RRP * 1.44 + 8,
RRP_bl = RRP * 1.44 + 8,
ID = NULL
')or die(mysqli_error($cons));
$affected = (int) (mysqli_affected_rows($cons))-1;
$log->lwrite('Inventory.CSV to database:'. $affected.' record inserted successfully.');
RRP, RRP_nl, and RRP_bl are not in the CSV, but we calculate them during the load and insert them.
There is no way to do that using the load data command in mysql. The data has to be in your input file.
If you are on Linux, it would be easy enough to add the additional field data to your input file using sed or similar tools.
Another alternative would be to upload your file and then run a query to update the missing fields with the data you desire.
You could also set up a trigger on the table to populate those fields when an insert happens, assuming you want to use the MySQL username value. A sketch of such a trigger follows.
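A minimal sketch of that trigger approach, using the table and column names from the question (note that LOAD DATA does fire INSERT triggers):
CREATE TRIGGER table1_bi BEFORE INSERT ON table1
FOR EACH ROW
SET NEW.inserted_date = CURDATE(), -- load date
    NEW.inserted_by = USER();      -- the MySQL username mentioned above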
LOAD DATA Local INFILE 'name.csv' INTO TABLE table1
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
SET inserted_date=CURRENT_DATE(), inserted_by='me'
I am facing an issue with GROUP_CONCAT and exporting the results to csv.
Consider the following table
Search Results
with columns search_id | description | votes | search_category
and consider the following data in the table
1 | java, beans   | 2 | java
2 | serialization | 3 | java
3 | jquery        | 1 | javascript
4 | mysql joins   | 5 | database
I need the output in the following format
search_category | description1  | description2  | votes1 | votes2
java            | java, beans   | serialization | 2      | 3
javascript      | jquery        |               | 1      |
database        | mysql joins   |               | 5      |
I need to output this data into a csv file.
I have written the following query:
select search_category, GROUP_CONCAT(description), GROUP_CONCAT(votes)
from search_results
group by search_category
into outfile '/tmp/out.csv'
fields terminated by ',' enclosed by '"' lines terminated by '\n';
However, the following are the issues:
- The above query returns one column each for description and votes, which displays the comma-separated values. I need separate columns for each of the values (as shown in the desired output).
- For the category javascript, the output is returned in the format
javascript|jquery|1
I need the output in the format
javascript|jquery| |1| |
There should be a placeholder for the empty values.
SELECT
s1.search_category
, s1.description AS description1
, IFNULL(s2.description, '') AS description2
, s1.votes AS votes1
, IFNULL(s2.votes, '') AS votes2
FROM search_results AS s1
LEFT OUTER JOIN search_results AS s2 ON s2.search_category = s1.search_category
AND s2.search_id > s1.search_id
WHERE s1.search_id = (SELECT MIN(s3.search_id) FROM search_results AS s3
WHERE s3.search_category = s1.search_category)
INTO OUTFILE ...
LEFT JOINing the table to itself will give you the second column if there is one. If there isn't one, the values will be NULL and the IFNULL will return empty strings (or whatever else you want). Note that the second-row condition lives in the ON clause and s1 is pinned to the first row per category; a WHERE filter on s2 would silently turn the LEFT JOIN into an inner join and drop single-row categories.
If there are more than 2 rows with the same search_category, you will get more than one row in the output, which might not be what you're after.
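For completeness, a hedged alternative (my own, not part of the answer above) that guarantees exactly one output row per category by numbering the rows first; it assumes MySQL 8+ for ROW_NUMBER():
SELECT search_category,
MAX(CASE WHEN rn = 1 THEN description END) AS description1,
IFNULL(MAX(CASE WHEN rn = 2 THEN description END), '') AS description2,
MAX(CASE WHEN rn = 1 THEN votes END) AS votes1,
IFNULL(MAX(CASE WHEN rn = 2 THEN votes END), '') AS votes2
FROM (SELECT search_category, description, votes,
      ROW_NUMBER() OVER (PARTITION BY search_category ORDER BY search_id) AS rn
      FROM search_results) AS ranked
GROUP BY search_category
INTO OUTFILE '/tmp/out.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Rows beyond the second per category are simply ignored rather than multiplying the output.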