I have multiple stored procedures that do ETL work in MySQL. Normally they run on the server overnight.
Inside the stored procedures there are multiple UPDATE statements like
update table1 set column1=3 where column2 = 4
Is there any way I can keep the MySQL Workbench result, like
Rows matched: 100 Changed: 50 Warnings: 0
for each statement I run, either in a MySQL table or in an external file?
I'd prefer a native MySQL method. If that isn't possible, is there any Python I could use?
"Rows changed" can be retrieved with the ROW_COUNT() function.
"Rows matched" needs a trick using a user-defined variable.
CREATE TABLE test (id INT, val INT);
INSERT INTO test VALUES
(1,1), (1,2), (1,3), (2,4);
Now we want to perform UPDATE test SET val = 3 WHERE id = 1; and count both the matched and the changed rows.
UPDATE test
-- add user-defined variable for matched rows counting
CROSS JOIN ( SELECT @matched := 0 ) init_variable
-- increment matched rows counter (this expression is always TRUE)
SET val = CASE WHEN @matched := @matched + 1
-- update the column
THEN 3
END
WHERE id = 1;
SELECT @matched matched, ROW_COUNT() changed;
matched | changed
------: | ------:
3 | 2
If more than one column is updated in a query, only one of the SET expressions should carry the counter increment.
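Since the question also asks about Python: a minimal sketch, assuming the client hands you the server's info message for the UPDATE as a plain string in the same form that Workbench shows. How to obtain that string depends on the driver and is not shown here; `parse_update_info` is a hypothetical helper name, not part of any library.

```python
import re

# Pattern for the info message reported after an UPDATE,
# e.g. "Rows matched: 100 Changed: 50 Warnings: 0".
_INFO_RE = re.compile(
    r"Rows matched:\s*(\d+)\s+Changed:\s*(\d+)\s+Warnings:\s*(\d+)"
)


def parse_update_info(info):
    """Return the three counters as a dict, or None if the text does not match."""
    m = _INFO_RE.search(info)
    if m is None:
        return None
    matched, changed, warnings = (int(g) for g in m.groups())
    return {"matched": matched, "changed": changed, "warnings": warnings}
```

The resulting dict could then be written to a log table or appended to a file, one entry per statement.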
In MySQL I need to find multiple IDs within a table, but in some cases the search ID is missing. There is no way around this, and I cannot put this into application logic because it is a Grafana dashboard filter. When no filter is set, the variable has no value at all.
Minimal example:
SELECT *
FROM tbl
where
-- try to catch empty value
case
when "$filter_ids" then
id in ($filter_ids)
-- id in ('1','2')
-- find_in_set(id, 1,2)
end
AND other_id = 4
-- Possible values for $filter_ids:
-- ''
-- '1'
-- '1','18'
-- Alternative cases, also possible:
-- empty for no value
-- 1
-- 1,18
I tried both find_in_set and IN, but both result in a MySQL error when there is no value (no filter set).
How could I catch this in MySQL?
You can pass an empty string to IN or FIND_IN_SET if you have no ids; that would not produce an error.
CREATE TABLE TEST(id int);
SELECT * FROM TEST WHERE id in ("") OR FIND_IN_SET(id,'');
| id |
| -: |
I have a table someTable with a column bin of type VARCHAR(4). Whenever I insert to this table, bin should be a unique combination of characters and numbers. Unique in this sense meaning has not appeared before in the table in another row.
bin is in the form of AA00, where A is a character A-F and 0 is a number 0-9.
Say I insert to this table once: it should come up with a bin value which doesn't appear before. Assuming the table was empty, the first bin could be AA11. On second insertion, it should be AA12, and then AA13, etc.
AA00, AA01, ... AA09, AA10, AA11, ... AA99, AB00, AB01, ... AF99, BA00, BA01, ... FF99
It doesn't matter that this table can contain only 3,600 possible rows. How do I create this code, specifically finding a bin that doesn't already exist in someTable? It can be in order as I've described or a random bin, as long as it doesn't appear twice.
CREATE TABLE someTable (
bin VARCHAR(4),
someText VARCHAR(32),
PRIMARY KEY(bin)
);
INSERT INTO someTable
VALUES('?', 'a');
INSERT INTO someTable
VALUES('?', 'b');
INSERT INTO someTable
VALUES('?', 'c');
INSERT INTO someTable
VALUES('?', 'd');
Alternatively, I can use the below procedure to insert instead:
CREATE PROCEDURE insert_someTable(tsomeText VARCHAR(32))
BEGIN
DECLARE var VARCHAR(4) DEFAULT (
-- some code to find unique bin
);
INSERT INTO someTable
VALUES(var, tsomeText);
END
A possible outcome is:
+------+----------+
| bin | someText |
+------+----------+
| AB31 | a |
| FC10 | b |
| BB22 | c |
| AF92 | d |
+------+----------+
As Gordon said, you will have to use a trigger because it is too complex to do as a simple formula in a default. Should be fairly simple, you just get the last value (order by descending, limit 1) and increment it. Writing the incrementor will be somewhat complicated because of the alpha characters. It would be much easier in an application language, but then you run into issues of table locking and the possibility of two users creating the same value.
A better method would be to use a normal auto-increment primary key and translate it to your bin value. Consider your bin value as two base 6 characters followed by two base 10 digits. You then take the id generated by MySQL, which is guaranteed to be unique, and convert it to your special number system. Calculate the bin and store it in the bin column.
To calculate the bin:
Step one would be to get the lower 100 value of the decimal number (mod 100) - that gives you the last two digits. Convert to varchar with a leading zero.
Subtract that from the id, and divide by 100 to get the value for the first two digits.
Get the mod 6 value to determine the 3rd (from the right) digit. Convert to A-F by index.
Subtract this from what's left of the ID, and divide by 6 to get the 4th (from the right) digit. Convert to A-F by index.
Concat the three results together to form the value for the bin.
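The steps above can be sketched in Python to check the arithmetic before porting it to a trigger. This is an illustration only: `id_to_bin` is a hypothetical name, and it assumes the input runs from 0 to 3599 so that 0 maps to AA00.

```python
LETTERS = "ABCDEF"


def id_to_bin(n):
    """Map an integer 0..3599 to a bin of the form AA00..FF99."""
    if not 0 <= n < 3600:
        raise ValueError("only 3600 distinct bins exist")
    digits = n % 100            # last two characters, base 10
    rest = n // 100             # what is left after removing the digits (0..35)
    second = LETTERS[rest % 6]  # second letter (3rd character from the right)
    first = LETTERS[rest // 6]  # first letter (4th character from the right)
    return f"{first}{second}{digits:02d}"
```

With an auto-increment id that starts at 1, passing `id - 1` would make the sequence begin at AA00.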
You may need to edit the following to match your table name and column names, but it should do what you are asking. One possible improvement would be to have it cancel any inserts past the 3600 limit: once the id goes past 3600, the bin values wrap around and repeat earlier ones. Also, the sequence starts at AA01 rather than AA00 (id=1 gives 'AA01'), so it's not perfect. Lastly, you could put a unique index on bin, and that would prevent duplicates.
DELIMITER $$
CREATE TRIGGER `fix_bin`
BEFORE INSERT ON `so_temp`
FOR EACH ROW
BEGIN
DECLARE next_id INT;
SET next_id = (SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA=DATABASE() AND TABLE_NAME='so_temp');
SET @id = next_id;
SET @Part1 = MOD(@id,100);
SET @Temp1 = FLOOR((@id - @Part1) / 100);
SET @Part2 = MOD(@Temp1,6);
SET @Temp2 = FLOOR((@Temp1 - @Part2) / 6);
SET @Part3 = MOD(@Temp2,6);
SET @DIGIT12 = RIGHT(CONCAT("00",@Part1),2);
SET @DIGIT3 = SUBSTR("ABCDEF",@Part2 + 1,1);
SET @DIGIT4 = SUBSTR("ABCDEF",@Part3 + 1,1);
SET NEW.`bin` = CONCAT(@DIGIT4,@DIGIT3,@DIGIT12);
END;
$$
DELIMITER ;
I have access to a reporting dataset (that I don't control) that we retrieve daily from a cloud service and store in a mysql db to run advanced reporting and report combining locally with 3rd party data visualization software.
The data often has duplicate values on an id field that create problems when joining with other tables for data analysis.
For example:
+-------------+----------+------------+----------+
| workfile_id | zip_code | date | total |
+-------------+----------+------------+----------+
| 78002 | 90210 | 2016-11-11 | 2010.023 |
| 78002 | 90210 | 2016-12-22 | 427.132 |
+-------------+----------+------------+----------+
Workfile_id is duplicated because this is the same job, but additional work on the job was performed in a different month than the original work. Instead of the software creating another workfile id for the job, the same is used.
Doing joins with other tables on workfile_id is problematic when more than one of the same id is present, so I was wondering if it is possible to do one of two things:
Make duplicate workfile_id's unique. Have sql append a number to the workfile id when a duplicate is found. The first duplicate (or second occurrence of the same workfile id) would need to get a .01 appended to the end of the workfile id. Then later, if another duplicate is inserted, it would need to auto increment the appended number, say .02, and so on with any subsequent duplicate workfile_id. This method would work best with our data but I'm curious how difficult this would be for the server from a performance perspective. If I could schedule the alteration to take place after the data is inserted to speed up the initial data insert, that would be ideal.
Sum total columns and remove duplicate workfile_id row. Have a task that identifies duplicate workfile_ids and sums the financial columns of the duplicates, replacing the original total with new sum and deleting the 'new row' after the columns have been added together.
This is more messy from a data preservation perspective, but is acceptable if the first solution isn't possible.
My assumption is that there will be significant overhead in having the server compare new workfile_id values to all existing workfile_id values each time data is inserted, but our dataset is small and new data is only inserted once daily, at 1:30am. It should also be feasible to limit the duplicate workfile_id search to rows inserted within the last 6 months.
Is finding duplicates in a column (workfile_id) and appending an auto-incrementing value onto the workfile_id possible?
EDIT:
I'm having trouble getting my trigger to work based on sdsc81's answer below.
Any ideas?
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT
ON salesjournal FOR EACH ROW
BEGIN
SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
IF @COUNTER > 1 THEN
UPDATE salesjournal SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE id = NEW.id;
END IF;
END;//
DELIMITER ;
It's hard to know if the trigger isn't working at all, or if just the code in the trigger isn't working. I get no errors on insert. Is there any way to debug trigger errors?
Well, everything is possible ;)
You don't control the dataset, but you can modify the database, right?
Then you could use a trigger after every insert of a new value, and update the row if it's a duplicate. Something like:
SET @COUNTER = ( SELECT (COUNT(*)-1) FROM *your_table* WHERE workfile_id = NEW.workfile_id );
IF @COUNTER > 1 THEN
UPDATE *your_table* SET workfile_id = CONCAT(workfile_id, @COUNTER) WHERE some_unique_id = NEW.some_unique_id;
END IF;
If there is only one insert a day, and there is an index defined on the workfile_id column, then it shouldn't be any problem for your server at all.
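If the renaming is done as a scheduled step after the nightly load rather than in a trigger, the intended numbering could be sketched in Python as below. This only illustrates the logic from the question (.01 for the first duplicate, .02 for the next). `suffix_duplicates` is a hypothetical helper, the ids are treated as strings, and the rows are assumed to arrive in insertion order.

```python
from collections import defaultdict


def suffix_duplicates(workfile_ids):
    """Keep the first occurrence as-is; append .01, .02, ... to later repeats."""
    seen = defaultdict(int)  # workfile_id -> number of occurrences so far
    result = []
    for wid in workfile_ids:
        count = seen[wid]
        result.append(wid if count == 0 else f"{wid}.{count:02d}")
        seen[wid] += 1
    return result
```

For example, three rows with id 78002 would come out as 78002, 78002.01 and 78002.02.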
Also, you could implement the second solution, doing:
DELIMITER //
CREATE TRIGGER append_subID_to_workfile_ID_salesjournal
AFTER INSERT ON salesjournal FOR EACH ROW
BEGIN
SET @COUNTER = ( SELECT (COUNT(*)-1) FROM salesjournal WHERE workfile_id = NEW.workfile_id );
IF @COUNTER > 1 THEN
UPDATE salesjournal SET total = total + NEW.total WHERE workfile_id = NEW.workfile_id AND id <> NEW.id;
DELETE FROM salesjournal WHERE id = NEW.id;
END IF;
END;//
DELIMITER ;
Hope this helps.
I have a table and want to add another field identity
id | name | identity
1 | sam |
2 | joe |
3 | jen |
Right now there is no data for identity. I want a string of 5 random characters (e.g. kdU3k) to populate each row.
What is the best way to alter/update the table in this manner?
Since I have a PHP backend, I could technically loop through a SQL statement where identity = null, but I want to know how to do this with just SQL.
While I do not recommend doing this, primarily because MySQL makes certain aspects less fun, this can be done entirely in MySQL DML without even the use of user-defined procedures. Procedures would allow the use of procedural while loops, etc.
I've created an sqlfiddle. The first step is to create the random values; in this case they are also ensured to be distinct in the table afterwards, which ensures there is one less thing to worry about.
-- Create lots of random values without using a procedure and loop.
-- There may be duplicates created. Could be a temporary table.
-- Would be much simplified if there was already a numbers table.
create table idents (value char(5));
insert into idents (value) values (left(md5(rand()), 5)); -- 1
insert into idents (value) select (left(md5(rand()), 5)) from idents; -- 2
insert into idents (value) select (left(md5(rand()), 5)) from idents; -- 4
insert into idents (value) select (left(md5(rand()), 5)) from idents;
insert into idents (value) select (left(md5(rand()), 5)) from idents;
insert into idents (value) select (left(md5(rand()), 5)) from idents;
insert into idents (value) select (left(md5(rand()), 5)) from idents; -- 64
-- Delete duplicate values. While there may be a rare duplicate we will
-- still be left with a good many random values. A similar process
-- could also be used to weed out existing used values.
delete from idents
where value in (
-- The select * is for another MySQL quirk
select value from (select * from idents) i
group by value
having count(value) > 1);
Then the random values have to be associated with each person. This is done with a horrid simulation of a "ROW_NUMBER" on derived relations and a join.
set @a = 0;
set @b = 0;
-- Now here is UGLY MYSQL MAGIC, where variables are used to simulate
-- ROW_NUMBER. YMMV, it "Works Here, Now". Note the very suspicious
-- hack to assign @b back to 0 "for each" joined item.
update people p2
join (select p.id, i.value
-- Give each person record a row number
from (select @a := @a + 1 as rn1, id, @b := 0 as hack from people) p
-- Give each random number a row number
join (select @b := @b + 1 as rn2, value from idents) i
-- And join on row number
on p.rn1 = i.rn2) pv
on p2.id = pv.id
set p2.identity = pv.value;
Again, YMMV.
The MySQL "show warnings" output identifies problematic rows by number. What's the best way to quickly see all the data for such a row?
For example, after running an update statement the result indicates "1 warning" and running show warnings gives a message like this: "Data truncated for column 'person' at row 65278". How can I select exactly that row?
Here is a concrete example exploring the limit solution:
create table test1 (
id mediumint,
value varchar(2)
);
insert into test1 (id, value) values
(11, "a"),
(12, "b"),
(13, "c"),
(14, "d"),
(15, "ee"),
(16, "ff");
update test1 set value = concat(value, "X") where id % 2 = 1;
show warnings;
That results in this warning output:
+---------+------+--------------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------------+
| Warning | 1265 | Data truncated for column 'value' at row 5 |
+---------+------+--------------------------------------------+
To get just that row 5 I can do this:
select * from test1 limit 4,1;
resulting in this:
+------+-------+
| id | value |
+------+-------+
| 15 | ee |
+------+-------+
So it seems that the limit offset (4) must be one less than the row number, and the row number given in the warning is for the source table of the update without regard to the where clause.
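The off-by-one between the warning's row number and the LIMIT offset can be captured in a small helper. A sketch in Python, assuming the warning text has the "at row N" form shown above; `warning_row_offset` is a hypothetical name.

```python
import re


def warning_row_offset(message):
    """Return the zero-based LIMIT offset for the 'at row N' in a warning, or None."""
    m = re.search(r"at row (\d+)", message)
    return int(m.group(1)) - 1 if m else None
```

The offset is only meaningful if the SELECT returns rows in the same order the UPDATE processed them, which is not guaranteed without an ORDER BY.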
As far as I'm aware, the only way to select those rows is to just SELECT them using the criteria from your original UPDATE query:
mysql> UPDATE foo SET bar = "bar" WHERE baz = "baz";
mysql> SHOW WARNINGS;
...
Message: Data truncated for column 'X' at row 420
...
mysql> SELECT * FROM foo WHERE baz = "baz" LIMIT 419,1;
Obviously, this doesn't work if you've modified one or more of the columns that were part of your original query.
LIMIT x,y returns y number of rows after row x, based on the order of the resultset from your select query. However, if you look closely at what I just said, you'll notice that without an ORDER BY clause, you've got no way to guarantee the position of the row(s) you're trying to get.
You might want to add an autoincrement field to your insert or perhaps a trigger that fires before each insert, then use that index to ensure the order of the results to limit by.
Not to raise this question from the dead, but I'll add one more method of finding the source of warning data that can be helpful in certain cases.
If you are importing a complete dataset from one table into another and receive a truncation warning on a specific field, you can run a query joining the two tables on an ID value and then filter for records where the field in question doesn't match. Obviously this will only work if you are importing from a separate table and still have access to the unmodified source table.
So if the field in question is testfield and your import query looks like this:
INSERT INTO newtable (
id,
field1,
field2,
testfield
)
SELECT
id,
field1,
field2,
testfield
FROM oldtable;
The diagnostic query could look something like this:
SELECT newtable.testfield, oldtable.testfield
FROM newtable
INNER JOIN oldtable ON newtable.id = oldtable.id
WHERE newtable.testfield != oldtable.testfield;
This has the added advantage that the order of records in either table doesn't matter.