Remove duplicates within a set of fields in any row - KNIME

Using KNIME, I am trying to remove duplicates within a set of columns in every row using the GroupBy node. Can you tell me how to implement this, or whether I can use another node to get it done?
First, I divided my table into sets of columns, such as:
set 1 is -->Col1,Col2,Col3,Col4
set 2 is-->Col5,col6,Col7,col8
and so on; I have 10 sets (with 4 columns each). Now I want to check whether the same value appears more than once within any particular set. Let's say the values below are in set 1:
Col1 has 4
Col2 has 4
Col3 has 4
Col4 has 4
then I will keep Col1 as 4, and the values in Col2, Col3 and Col4 will be 'null'.
Can you please tell me how to do this with the GroupBy node in KNIME?
I have tried this using other nodes such as Constant Value Column Filter, Math Formula and Rule Engine, but nothing seems to work.

You can't do it in a GroupBy node. GroupBy can return unique values, but here you need logic that decides whether a value is a duplicate and, if so, replaces it with null or some other marker. I advise you to use a Rule Engine node with the following rules for the last column:
$column4$ MATCHES $column1$ OR $column4$ MATCHES $column2$ OR $column4$ MATCHES $column3$ => "null"
TRUE => $column4$
After that, add two more Rule Engine nodes with the corresponding rules for column3 and column2. You don't need to do anything for column1, obviously.
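For reference, the logic those chained rules implement can be sketched in plain Python. This is only an illustration of the rule, not KNIME code; the function name blank_repeats and the assumption that each set is 4 consecutive columns are mine.

```python
def blank_repeats(row, group_size=4):
    """Within each consecutive group of `group_size` values, keep the first
    occurrence of a value and replace every later repeat with None."""
    result = []
    for start in range(0, len(row), group_size):
        seen = set()  # values already kept in the current set of columns
        for value in row[start:start + group_size]:
            if value in seen:
                result.append(None)   # duplicate of an earlier column in this set
            else:
                seen.add(value)
                result.append(value)  # first occurrence is kept
    return result


# Set 1 is all 4s; set 2 repeats the value 1 once.
example = blank_repeats([4, 4, 4, 4, 1, 2, 1, 3])
print(example)
```

Each column is compared only against the columns to its left in the same set, which is what the column4, column3 and column2 rules do.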

Related

Update 1 row in a table that has 5 columns, and make other columns have the old value if they're not changed

Let's say I have a table with 10 columns, and I have 1 row too.
I want to update only one column and not the others.
If I update just one, the other ones will have the default value, which is the expected behavior.
Is there a way to change only one column and still have all the other columns preserve their old values?
I don't want to use a lot of resources for this, so I hope there's something built in.
TL;DR: How can I update one column and have the other columns stay the same?
I'm using MySQL
An update statement lets you control both the rows being updated and the columns. Usually the rows are defined by the where clause and the columns by the set:
update t
set col4 = <some value>
where col1 = <some value>;
If the where clause is using the table's primary key with an equality condition, then only one row is updated, and only the specified columns in that row change.
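A small sketch of that behaviour, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL (the UPDATE statement has the same shape in both). The table layout and values are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, "a", "b", "c"), (2, "d", "e", "f")],
)

# Only col2 is named in SET and only id = 1 matches WHERE,
# so every other column and every other row keeps its old value.
conn.execute("UPDATE t SET col2 = ? WHERE id = ?", ("changed", 1))

rows = conn.execute("SELECT id, col1, col2, col3 FROM t ORDER BY id").fetchall()
print(rows)
```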
IF EXISTS (SELECT * FROM X )
UPDATE [Table] SET...
ELSE
INSERT INTO [Table]

A way to keep and update only one row with TYPE=3

The table:
ID TYPE USER_ID
======================
1 1 15
2 1 15
. 3 15
. 1 15
.
should keep multiple USER_ID's with TYPE=1 but only 0 or 1 row where TYPE=3.
In the case that TYPE=3, upon insert I need to either update or create (much like insert on duplicate key update) that row.
Is there a good way to accomplish this without first SELECTing, and updating or inserting depending on the SELECT results in the program?
Preferably doing this in a single command, and without triggers?
One way might be to add the new row, hold the ID you just added in a variable, then
delete where type = 3 and id != {Added id}
It would work, but I want to add the disclaimer that it seems dodgy somehow.
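A rough sketch of that insert-then-delete idea, using SQLite through Python's sqlite3 module as a stand-in for MySQL. The table name tbl, the helper add_row, and the choice to limit the delete to the same user_id are assumptions; drop the user_id condition if only one TYPE=3 row may exist in the whole table.

```python
import sqlite3


def add_row(conn, row_type, user_id):
    """Insert a row; for type 3, remove any older type-3 row of that user."""
    with conn:  # both statements are committed together or rolled back together
        cur = conn.execute(
            "INSERT INTO tbl (type, user_id) VALUES (?, ?)", (row_type, user_id)
        )
        new_id = cur.lastrowid  # the "added id" held in a variable
        if row_type == 3:
            conn.execute(
                "DELETE FROM tbl WHERE type = 3 AND user_id = ? AND id <> ?",
                (user_id, new_id),
            )
    return new_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY AUTOINCREMENT, type INTEGER, user_id INTEGER)"
)
for row_type in (1, 1, 3, 1, 3):
    add_row(conn, row_type, 15)

rows = conn.execute("SELECT id, type, user_id FROM tbl ORDER BY id").fetchall()
print(rows)
```

Note that the surviving TYPE=3 row gets a new ID each time, which is one reason the approach feels less clean than a true update.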
You can do the update with subqueries. Because you read from and write to the same table here, you need to wrap the subquery in a derived table with its own alias (A below); MySQL otherwise refuses to select from the table that is being updated.
Say you want to update the user_id of the first row of data with type=3 to 20:
UPDATE tbl SET user_id=20 WHERE id=
(SELECT A.id FROM (SELECT MIN(id) id
FROM tbl
WHERE type=3) A);
See DEMO ON SQL Fiddle.

Increment an SQL table key by one for stored data

I'm trying to build a script for a database migration and I have a question. I have a table with lots of rows and an integer key. This key is auto-incrementing and currently starts at index '1'.
The problem is that this index has to be occupied by a default value in the new database. So I want to loop over the rows and increment each index by one, leaving the first place blank so I can insert my value into it. I have tried this statement:
UPDATE `tapp`, (
SELECT @loop := id_App
FROM
tapp
) o
SET id_App = id_App + 1;
However, it tries to update each index starting from the beginning, so when it tries to change the first one to '2' it finds that the second one already has that value and fails.
It's important to increase by exactly one, because it's a MyISAM database and I also have to update each foreign key one by one. I'm using MySQL.
Please give me a hand!
The easiest way is twofold: After inserting everything into the target table, you make a 2-pass update.
First, you check what the highest number is in the table, then increment it by one and remember it as topnumber.
Then, you update everything to incremental numbers, starting with topnumber. Finally, you update everything, starting with whatever your initial seed is and increment by one each record.
number data
1 "foo"
3 "bar"
10 "snafu"
topnumber becomes 11
After the first pass, the data looks like this:
number data
11 "foo"
12 "bar"
13 "snafu"
After the second pass (and assuming your initial number is 7), the data looks like this:
number data
7 "foo"
8 "bar"
9 "snafu"
UPDATE
Alternatively, instead of updating the numbers to incremental values, you could add the highest existing number (10 in this example) to every initial value in the first pass, so the above sample table would look like this after the first update:
number data
11 "foo"
13 "bar"
20 "snafu"
In the second pass, you would decrement all the numbers by that same value and increment them by 1, which, for our example, would result in the following:
number data
2 "foo"
4 "bar"
11 "snafu"
Using the names in your code snippet, the entire script might look something like this:
/* remember the top ID */
SET @max_id = (SELECT MAX(id_App) FROM tapp);
/* increment by the top ID */
UPDATE tapp SET id_App = id_App + @max_id;
/* decrement by the top ID and increment by 1;
   ascending order so no row lands on a key that has not moved yet */
UPDATE tapp SET id_App = id_App - @max_id + 1 ORDER BY id_App;
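The two-pass shift can be tried out on the sample numbers with a short sketch. It uses SQLite through Python's sqlite3 module instead of MySQL, and it takes MAX + 1 as the offset (the "topnumber" of the first description) rather than MAX, which is my choice here so that no intermediate value can collide with an existing key whatever order the rows are visited in. The data column and the 'default' row are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tapp (id_App INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO tapp VALUES (?, ?)", [(1, "foo"), (3, "bar"), (10, "snafu")]
)

# Remember an offset larger than every existing key (11 for this data).
offset = conn.execute("SELECT MAX(id_App) + 1 FROM tapp").fetchone()[0]

# Pass 1: move every key out of the occupied range.
conn.execute("UPDATE tapp SET id_App = id_App + ?", (offset,))
# Pass 2: move the keys back, shifted up by one.
conn.execute("UPDATE tapp SET id_App = id_App - ? + 1", (offset,))

# Key 1 is now free for the default value.
conn.execute("INSERT INTO tapp VALUES (1, 'default')")

rows = conn.execute("SELECT id_App, data FROM tapp ORDER BY id_App").fetchall()
print(rows)
```

Foreign keys in other tables would need the same two passes, since MyISAM does not cascade them.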

Remove duplicates from TWO columns

Good Morning stackoverflownians,
I have a very big table with rows duplicated on two columns. That is, if row b has the same values in col1 and col2 as row a, I should keep only row a:
## table_1
col1 col2
1 10
1 10
1 10
1 11
1 11
1 12
2 20
2 20
2 21
2 21
# should return this tbl without duplication
col1 col2
1 10
1 11
1 12
2 20
2 21
My previous code accounted only for col1, and I don't know how to do this on two columns:
CREATE TABLE temp LIKE db.table_1;
INSERT INTO temp SELECT * FROM table_1 WHERE 1 GROUP BY col1;
DROP TABLE table_1;
ALTER TABLE temp RENAME table_1;
So I thought about this:
CREATE TABLE temp LIKE db.table_1;
INSERT INTO temp(col1,col2)
SELECT DISTINCT col1,col2 FROM table_1;
then drop and rename.
But I'm not sure it's going to work, and MySQL tends to be unstable: if it takes too long I will have to stop the query, and that may crash the server again .. T.T
We have 200,000,000 rows and all of them have at least one duplicate.
Any suggestions for the code? :)
Also, how long would it take? Minutes or hours?
You're already most of the way there :)
You can also try this:
Use INSERT IGNORE rather than INSERT. If a record doesn't duplicate an existing record, MySQL inserts it as usual. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it silently without generating an error.
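Create the new table with a UNIQUE key on (col1, col2), since IGNORE only discards rows that violate a key. Then read from the existing table and write to the new table using INSERT IGNORE. This way you can control the insert process depending on your resource usage.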
When using INSERT IGNORE and you do have key violations, MySQL does NOT raise a warning!!!
The DISTINCT clause is the way to go, but it will take a while to run on that many records. I'd add an auto-increment ID column and make it your primary key. Then you can run the deduplication in stages that won't time out.
Good luck and HTH
-- Joe
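Putting the two suggestions together (a unique key plus staged copying by ID range), a sketch might look like the following. It runs on SQLite through Python's sqlite3 module, where the equivalent of MySQL's INSERT IGNORE is spelled INSERT OR IGNORE; the id column on table_1, the batch size and the loop are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Source table with an auto-assigned id, as suggested above.
conn.execute("CREATE TABLE table_1 (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)")
data = [(1, 10), (1, 10), (1, 10), (1, 11), (1, 11), (1, 12),
        (2, 20), (2, 20), (2, 21), (2, 21)]
conn.executemany("INSERT INTO table_1 (col1, col2) VALUES (?, ?)", data)

# Target table: the UNIQUE key is what makes duplicates get skipped.
conn.execute("CREATE TABLE temp (col1 INTEGER, col2 INTEGER, UNIQUE (col1, col2))")

BATCH = 4  # tiny here; far larger for a real table
max_id = conn.execute("SELECT MAX(id) FROM table_1").fetchone()[0]
for start in range(1, max_id + 1, BATCH):
    with conn:  # one short transaction per ID range
        conn.execute(
            "INSERT OR IGNORE INTO temp (col1, col2) "
            "SELECT col1, col2 FROM table_1 WHERE id >= ? AND id < ?",
            (start, start + BATCH),
        )

deduped = conn.execute("SELECT col1, col2 FROM temp ORDER BY col1, col2").fetchall()
print(deduped)
```

Because each batch is independent, the job can be stopped and resumed from the last completed ID range instead of being killed mid-query.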

Order by max value in three different columns

I'm not even sure it's possible to do this but I want to order a query based on the maximum value of one of three columns.
Example table structure:
guid, column1, column2, column3
Columns 1-3 have numerical values and I want to order the select statement based on the maximum value of 1, 2 OR 3.
For example:
record column1 column2 column3
---------------------------------
1 5 0 2
2 2 0 6
3 0 1 2
Would be ordered record 2, 1, 3 because 6 is the maximum value of the three fields across the three records, record 1 is the second and record 3 is the third.
Does this make any sense?
Thanks.
It may be possible to do in a select query (possibly using something like case when though I'm not sure that's allowed in the order by clause itself, YMMV depending on the DBMS) but it's rarely a good idea to use per-row calculations if you want your database to scale well as tables get bigger ("not have the performance of a one-legged pig in a horse race", as one of our DBAs eloquently puts it).
In situations like this, I set up an additional (indexed) column to hold the maximum and ensure that the data integrity is maintained by using an insert/update trigger to force that new column to the maximum of the other three.
Because most database tables are read far more often than written, this amortises the cost of the calculation across all the reads. The cost is borne only when the data is updated and the queries become blindingly fast since you're ordering on a single, indexed, column:
select f1, f2, f3 from t order by fmax desc;
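A minimal sketch of the trigger-maintained column, using SQLite through Python's sqlite3 module as a stand-in for whatever DBMS is in use. The names t, f1, f2, f3 and fmax follow the query above; the guid key, the trigger names and the index name are assumptions. In MySQL one would more likely write BEFORE INSERT and BEFORE UPDATE triggers that set NEW.fmax = GREATEST(NEW.f1, NEW.f2, NEW.f3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (guid INTEGER PRIMARY KEY, f1 INTEGER, f2 INTEGER, f3 INTEGER, fmax INTEGER);
CREATE INDEX t_fmax_idx ON t (fmax);

-- Keep fmax equal to the largest of f1, f2, f3 on every insert ...
CREATE TRIGGER t_fmax_ins AFTER INSERT ON t BEGIN
    UPDATE t SET fmax = max(NEW.f1, NEW.f2, NEW.f3) WHERE guid = NEW.guid;
END;
-- ... and whenever one of the three source columns changes.
CREATE TRIGGER t_fmax_upd AFTER UPDATE OF f1, f2, f3 ON t BEGIN
    UPDATE t SET fmax = max(NEW.f1, NEW.f2, NEW.f3) WHERE guid = NEW.guid;
END;
""")

conn.executemany(
    "INSERT INTO t (guid, f1, f2, f3) VALUES (?, ?, ?, ?)",
    [(1, 5, 0, 2), (2, 2, 0, 6), (3, 0, 1, 2)],
)
order_before = [r[0] for r in conn.execute("SELECT guid FROM t ORDER BY fmax DESC")]

conn.execute("UPDATE t SET f2 = 9 WHERE guid = 3")  # trigger refreshes fmax
order_after = [r[0] for r in conn.execute("SELECT guid FROM t ORDER BY fmax DESC")]
print(order_before, order_after)
```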
As mentioned here, what you want is an equivalent of the GREATEST function.
In the absence of that, and assuming you've defined a UDF LargerOf to return the largest of two numbers, use
SELECT *
FROM Table
ORDER BY LargerOf(LargerOf(column1, column2), column3) DESC
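To make the LargerOf idea concrete, here is a sketch that registers such a function in SQLite through Python's sqlite3 module and orders the sample data with it, using DESC so the largest maximum comes first. The table name records and the Python helper larger_of are assumptions, and NULL values are not handled. MySQL users would not need the UDF, since GREATEST(column1, column2, column3) is built in.

```python
import sqlite3


def larger_of(a, b):
    """Return the larger of two numbers (assumes neither is NULL/None)."""
    return a if a >= b else b


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name used in the answer.
conn.create_function("LargerOf", 2, larger_of)

conn.execute(
    "CREATE TABLE records (record INTEGER PRIMARY KEY, "
    "column1 INTEGER, column2 INTEGER, column3 INTEGER)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [(1, 5, 0, 2), (2, 2, 0, 6), (3, 0, 1, 2)],
)

ordered = [
    row[0]
    for row in conn.execute(
        "SELECT record FROM records "
        "ORDER BY LargerOf(LargerOf(column1, column2), column3) DESC"
    )
]
print(ordered)
```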
create table myTable
(column1 int, column2 int, column3 int)
go
insert into myTable
values (5, 0 , 2)
go
insert into myTable
values (2, 0 , 6)
go
insert into myTable
values (0, 1 , 2)
go
select *
from mytable
order by case when column1 > column2 and column1 > column3 then column1
when column2 > column3 then column2
else column3 end desc