Remove duplicates in an SSIS package, preferring rows by a column's value

I have duplicate rows in data coming from an Excel sheet. In the SSIS package, I am using a Sort transformation that sorts in ascending order by the primary key column ID. But before removing the duplicates, I want to check whether the email column contains an address on my company's domain. If so, I want to remove the other rows and keep the one with that kind of email address. What should I do? Please refer to the image attached below.
In the data above, I want to remove the two John rows whose email address is john@gmail.com. In Maria's case, I want to remove the two rows with maria@gmail.com, keeping the rows whose email addresses are on the mycompany.com domain. If a user has multiple rows with mycompany.com addresses, I want to keep any one of those rows.
Any suggestions?

You can do this in SQL as Kobi showed, which may be easier. But if you prefer to do it in SSIS:
My test data:
Some points:
Conditional split: first, separate the rows with mycompany from those without.
Sort and non_mycompany sort: sort both outputs on id and remove duplicates.
mycompany_multicast: create two copies of the rows with mycompany.
Merge join: left join the rows without mycompany to the rows with mycompany. Note the join order: the purpose is to find rows without mycompany that have no matching id among the rows with mycompany.
Conditional split1: keep the rows without mycompany that have no matching id in the rows with mycompany. Check the id coming from the mycompany side; if it is null, the row has no match in the rows with mycompany.
Union all: union the mycompany rows (the second multicast copy) with the output of Conditional split1 to get the final result.
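To make the data flow easier to reason about, here is a plain-Python sketch that mimics what each component above does to the rows. It is not SSIS code and uses no SSIS API; the dict-based rows, the @mycompany.com domain and the helper names dedupe_by_id and ssis_style_dedupe are assumptions for illustration, and the Merge join plus null check is modeled as a set lookup.

```python
# Plain-Python sketch of the SSIS data flow above (no SSIS API involved).
# Assumptions: rows are dicts with "id", "name", "email"; the company
# domain is "@mycompany.com".

DOMAIN = "@mycompany.com"


def dedupe_by_id(rows):
    """Like a Sort on id that removes duplicate sort values: first row per id wins."""
    first = {}
    for row in sorted(rows, key=lambda r: r["id"]):
        first.setdefault(row["id"], row)
    return list(first.values())


def ssis_style_dedupe(rows):
    # Conditional split: mycompany rows vs. all other rows.
    mycompany = [r for r in rows if r["email"].endswith(DOMAIN)]
    others = [r for r in rows if not r["email"].endswith(DOMAIN)]

    # Sort / non_mycompany sort: one row per id in each stream.
    mycompany = dedupe_by_id(mycompany)
    others = dedupe_by_id(others)

    # mycompany_multicast: one copy goes to the join, one to the union.
    mycompany_ids = {r["id"] for r in mycompany}

    # Merge join (left) + Conditional split1: keep the non-mycompany rows
    # whose joined mycompany id would be NULL, i.e. ids with no match.
    unmatched = [r for r in others if r["id"] not in mycompany_ids]

    # Union all: combine both streams (sorted by id for readability).
    return sorted(mycompany + unmatched, key=lambda r: r["id"])


rows = [
    {"id": 100, "name": "john", "email": "john@mycompany.com"},
    {"id": 100, "name": "john", "email": "john@gmail.com"},
    {"id": 100, "name": "john", "email": "john@gmail.com"},
    {"id": 400, "name": "tom", "email": "tom@gmail.com"},
    {"id": 400, "name": "tom", "email": "tom@gmail.com"},
]
for r in ssis_style_dedupe(rows):
    print(r["id"], r["email"])
# 100 john@mycompany.com
# 400 tom@gmail.com
```

John keeps only his company address, while Tom, who has no company address, keeps one of his gmail rows, which is exactly what the anti-join in Conditional split1 is for.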

You can use a statement like this:
WITH T AS
(
-- rn = 1 is the preferred row in each id group (a @mycompany.com address if one exists)
SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY CASE WHEN email LIKE '%@mycompany.com' THEN 0 ELSE 1 END) AS rn FROM persons
)
DELETE FROM T
WHERE rn > 1
It groups the rows by ID and sorts each group so that the preferred @mycompany.com address comes first, then numbers the rows within each group, and finally deletes every row whose row number is greater than 1 (these are the duplicates).
Here is the data to test:
CREATE TABLE Persons (
id NUMERIC(5),
NAME VARCHAR(200),
email VARCHAR(400) );
INSERT INTO persons VALUES
(100, 'john', 'john@mycompany.com'),
(100, 'john', 'john@gmail.com'),
(100, 'john', 'john@gmail.com');
INSERT INTO persons VALUES
(200, 'maria', 'maria@mycompany.com'),
(200, 'maria', 'maria@gmail.com'),
(200, 'maria', 'maria@gmail.com');
INSERT INTO persons VALUES
(300, 'jean', 'jean@mycompany.com'),
(300, 'jean', 'jean@gmail.com'),
(300, 'jean', 'jean@mycompany.com'),
(300, 'jean', 'jean@mycompany.com');
INSERT INTO persons VALUES
(400, 'tom', 'tom@gmail.com'),
(400, 'tom', 'tom@gmail.com');
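The ROW_NUMBER approach can be tried end to end with Python's built-in sqlite3 module. This is a sketch under two assumptions: the bundled SQLite is 3.25 or newer (needed for window functions), and, because SQLite cannot DELETE through a CTE the way SQL Server can, the duplicates are targeted through SQLite's implicit rowid instead. The ranking expression is the same as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [
        (100, "john", "john@mycompany.com"),
        (100, "john", "john@gmail.com"),
        (100, "john", "john@gmail.com"),
        (200, "maria", "maria@mycompany.com"),
        (200, "maria", "maria@gmail.com"),
        (200, "maria", "maria@gmail.com"),
        (300, "jean", "jean@mycompany.com"),
        (300, "jean", "jean@gmail.com"),
        (300, "jean", "jean@mycompany.com"),
        (300, "jean", "jean@mycompany.com"),
        (400, "tom", "tom@gmail.com"),
        (400, "tom", "tom@gmail.com"),
    ],
)

# Rank rows per id (company address first) and delete everything after rank 1.
# SQLite cannot delete through a CTE, so the victims are selected by rowid.
conn.execute("""
    DELETE FROM persons
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       ORDER BY CASE WHEN email LIKE '%@mycompany.com' THEN 0 ELSE 1 END
                   ) AS rn
            FROM persons
        )
        WHERE rn > 1
    )
""")

remaining = conn.execute("SELECT id, email FROM persons ORDER BY id").fetchall()
print(remaining)
# [(100, 'john@mycompany.com'), (200, 'maria@mycompany.com'), (300, 'jean@mycompany.com'), (400, 'tom@gmail.com')]
```

Jean has three company rows; which one survives is arbitrary, but since they are identical the result is the same, and Tom keeps a gmail row because he has no company address.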

Related

How to populate a mysql table with data from two other tables

I have two tables with a many-to-many relation, and a joint table between them, for example:
client (id, name)
address (id, address)
client_address (client_id, address_id)
I need to populate the client_address table with a line for every client, using a specific address, like:
client_id, address_id
1, 1
2, 1
3, 1
4, 1
etc...
I tried something like this (which obviously does not work):
INSERT INTO
client_address (`client_id`, `address_id`)
SELECT id from client,
SELECT id from address where address = 'My Address';
can I do this with a single query?
If you need to populate it as in your example, you can use a CROSS JOIN:
INSERT INTO
client_address (`client_id`, `address_id`)
SELECT c.id, a.id
FROM client c
CROSS JOIN address a
WHERE a.address = 'My Address';
This will create one row for every client in the client table, paired with the address you chose in the WHERE clause.
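A minimal sketch of the CROSS JOIN insert against throwaway tables in Python's sqlite3 module; the client names and the second address are hypothetical sample data added so the WHERE filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE client_address (client_id INTEGER, address_id INTEGER);
    INSERT INTO client (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO address (id, address) VALUES (1, 'My Address'), (2, 'Other Address');
""")

# One INSERT ... SELECT: every client paired with the single chosen address.
conn.execute("""
    INSERT INTO client_address (client_id, address_id)
    SELECT c.id, a.id
    FROM client c
    CROSS JOIN address a
    WHERE a.address = 'My Address'
""")

links = conn.execute(
    "SELECT client_id, address_id FROM client_address ORDER BY client_id"
).fetchall()
print(links)  # [(1, 1), (2, 1), (3, 1), (4, 1)]
```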

filtering from sql

I use MySQL for my app.
Say I have a table of all registered users for my app.
Say I also have a set of users at hand, and I want to select from my database only the ones that are registered.
For example, my database has user1, user2, ..., user100,
and the input set is user3, user5, user10, user999, user2000, so the output of the query should be user3, user5 and user10 only.
Thank you in advance
You seem to want IN:
select t.*
from t
where user_id in ('user3', 'user5', 'user10', 'user999', 'user2000')
This will return only the matching users.
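A minimal sketch of the IN filter run through Python's sqlite3 module, building one ? placeholder per candidate so the list is passed as parameters rather than pasted into the SQL string. The table name users and the generated user1 to user100 rows are hypothetical stand-ins for the question's table t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [(f"user{i}",) for i in range(1, 101)])

candidates = ["user3", "user5", "user10", "user999", "user2000"]
placeholders = ", ".join("?" for _ in candidates)  # "?, ?, ?, ?, ?"
rows = conn.execute(
    f"SELECT user_id FROM users WHERE user_id IN ({placeholders}) ORDER BY user_id",
    candidates,
).fetchall()
print([r[0] for r in rows])  # ['user10', 'user3', 'user5'] (text ordering)
```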
The format in which the user passes these values is very important here. I am assuming that the values arrive as separate rows. In that case, you could use the code below.
DECLARE @MyTableVar TABLE
(User_ID VARCHAR(32) PRIMARY KEY)
INSERT @MyTableVar VALUES ('user3')
INSERT @MyTableVar VALUES ('user5')
INSERT @MyTableVar VALUES ('user10')
INSERT @MyTableVar VALUES ('user999')
INSERT @MyTableVar VALUES ('user2000')
-- IN keeps the registered users; NOT IN would return the unregistered ones instead
SELECT *
FROM @MyTableVar
WHERE User_ID IN (SELECT USER_ID FROM database.schema.table_name)
If the user passes all the values in a single row, you can split them into multiple rows using CROSS APPLY. An example can be seen here.
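The staging-table idea can be sketched in Python's sqlite3 module too. SQLite has no table variables, so a TEMP table stands in for @MyTableVar; the table names registered and input_users are hypothetical. The sketch shows both directions: IN returns the registered candidates the question asks for, while NOT IN returns the ones missing from the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registered (user_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO registered VALUES (?)", [(f"user{i}",) for i in range(1, 101)])

# A TEMP table plays the role of the T-SQL table variable @MyTableVar.
conn.execute("CREATE TEMP TABLE input_users (user_id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO input_users VALUES (?)",
    [("user3",), ("user5",), ("user10",), ("user999",), ("user2000",)],
)

registered = conn.execute(
    "SELECT user_id FROM input_users"
    " WHERE user_id IN (SELECT user_id FROM registered) ORDER BY user_id"
).fetchall()
unregistered = conn.execute(
    "SELECT user_id FROM input_users"
    " WHERE user_id NOT IN (SELECT user_id FROM registered) ORDER BY user_id"
).fetchall()
print([r[0] for r in registered])    # ['user10', 'user3', 'user5']
print([r[0] for r in unregistered])  # ['user2000', 'user999']
```

One caveat: if the subquery can return a NULL, NOT IN yields no rows at all, so NOT EXISTS is the safer choice when the looked-up column is nullable.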
Kartheek

Adding multiple rows mysql in phpmyadmin

I'm trying to execute a rather complicated task in phpmyadmin.
In short, I need to add multiple rows to a table, using the values from a column in another table.
I've sort of figured out how to add one row manually, but I need 11,000 of them:
INSERT INTO `DATABASE`.`ratings` (
`id` ,
`total_votes` ,
`total_value` ,
`used_ips`
)
VALUES (
'5000', '10', '1000', NULL
);
There are 2 tables (CONTENT and RATINGS) in the same DB.
In CONTENT there is a column named CONTENT_RECORD (11,000 entries), which consists only of numbers.
In RATINGS we have 4 columns (ID, TOTAL_VOTES, TOTAL_VALUE, USED_IPS).
I want to add multiple rows to RATINGS with the following values:
ID = should copy the value from table CONTENT, column CONTENT_RECORD
Total_votes = fixed number 10
Total_value = fixed number 1000
used_ips = leave empty
Help would be much appreciated.
You can use the INSERT INTO ... SELECT syntax, something like this:
INSERT INTO `DATABASE`.`ratings` (`id`, `total_votes`, `total_value`, `used_ips`)
SELECT CONTENT_RECORD, 10, 1000, NULL FROM CONTENT;
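A small sketch of the same INSERT ... SELECT run with Python's sqlite3 module. The five content_record values stand in for the 11,000 real ones and are hypothetical, and the DATABASE qualifier and MySQL backticks are dropped because SQLite does not need them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (content_record INTEGER);
    CREATE TABLE ratings (
        id INTEGER PRIMARY KEY,
        total_votes INTEGER,
        total_value INTEGER,
        used_ips TEXT
    );
""")
# Hypothetical stand-in for the 11,000 CONTENT rows.
conn.executemany("INSERT INTO content VALUES (?)", [(n,) for n in (5000, 5001, 5002, 5003, 5004)])

# One statement: a ratings row per content row, with fixed votes/value and NULL ips.
conn.execute("""
    INSERT INTO ratings (id, total_votes, total_value, used_ips)
    SELECT content_record, 10, 1000, NULL FROM content
""")

count = conn.execute("SELECT COUNT(*) FROM ratings").fetchone()[0]
first_two = conn.execute("SELECT * FROM ratings ORDER BY id LIMIT 2").fetchall()
print(count)      # 5
print(first_two)  # [(5000, 10, 1000, None), (5001, 10, 1000, None)]
```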
Alternatively, to insert several literal rows in one statement, list them in a single VALUES clause:
INSERT INTO table_name
VALUES ('values for row 1'), ('values for row 2'), ..., ('values for row n');

Merge parts of two different strings into new string

I have three columns PRODUCTID, PRODUCTNAME, PRODUCTCODE within a table.
I wish to insert a new product, creating the PRODUCTID from the other columns.
How do I take the first two letters of PRODUCTNAME and the last four numbers of PRODUCTCODE and populate them into PRODUCTID?
I do not want a product to be duplicated within the table and if attempted I want the table to remain unchanged. The default PRODUCTID is DEFAULT.
So far I have:
INSERT INTO `v1_products` (`PRODUCTNAME`, `PRODUCTNUMBER`) VALUES ("EXAMPLE", "EXAMPLE4444");
UPDATE `v1_products`
SET `PRODUCTID`=(SELECT CONCAT(
(SELECT LEFT(`PRODUCTNAME`,2)) , (SELECT RIGHT(`PRODUCTNUMBER`,4))))
WHERE `PRODUCTID`= "DEFALT" AND `PRODUCTID`!=`PRODUCTID`
LIMIT 1
First off, I would recommend making the PRODUCTID column unique. This will prevent any two products from having the same value. If your default PRODUCTID is "default", you may need to do this after updating the ids.
Your UPDATE statement is fine except for the WHERE clause:
WHERE `PRODUCTID`= "DEFALT" AND `PRODUCTID`!=`PRODUCTID`
There's a typo in "DEFAULT" (you wrote "DEFALT"), and
`PRODUCTID`!=`PRODUCTID`
will never be true, so the UPDATE matches no rows.
WHERE `PRODUCTID`= "DEFAULT"
Should be enough.
The above will fix existing products with incorrect ids. Now you should change your insert so that PRODUCTID is set correctly for each new product. You can do this by simply adding the CONCAT you used in the update statement:
INSERT INTO `v1_products` (`PRODUCTID`, `PRODUCTNAME`, `PRODUCTNUMBER`)
VALUES (
CONCAT(
LEFT("EXAMPLE",2),
RIGHT("EXAMPLE4444",4)),
"EXAMPLE",
"EXAMPLE4444");
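From the discussion in the comments, here is an example that sets each column value only once. It works because MySQL lets an expression in a VALUES list refer to a column assigned earlier in the same row, so PRODUCTID must be listed after PRODUCTNAME and PRODUCTNUMBER: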
INSERT INTO v1_products (PRODUCTNAME, PRODUCTNUMBER,PRODUCTID)
VALUES (
"EXAMPLE",
"EXAMPLE4444",
CONCAT( LEFT(PRODUCTNAME,2), RIGHT(PRODUCTNUMBER,4)));
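A sketch of the two ideas in this answer, a UNIQUE PRODUCTID and an id derived from the other columns, using Python's sqlite3 module. SQLite has no LEFT or RIGHT functions, so substr is swapped in, and SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE so that a duplicate leaves the table unchanged. The helper add_product and the second product are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE on PRODUCTID makes the database reject a second product with the same id.
conn.execute("""
    CREATE TABLE v1_products (
        PRODUCTID TEXT UNIQUE,
        PRODUCTNAME TEXT,
        PRODUCTNUMBER TEXT
    )
""")


def add_product(name, number):
    # substr(x, 1, 2) ~ MySQL LEFT(x, 2); substr(x, -4) ~ MySQL RIGHT(x, 4).
    # INSERT OR IGNORE skips the row silently if PRODUCTID already exists.
    conn.execute(
        """
        INSERT OR IGNORE INTO v1_products (PRODUCTID, PRODUCTNAME, PRODUCTNUMBER)
        VALUES (substr(:name, 1, 2) || substr(:number, -4), :name, :number)
        """,
        {"name": name, "number": number},
    )


add_product("EXAMPLE", "EXAMPLE4444")
add_product("EXAMPLE", "EXAMPLE4444")  # same PRODUCTID "EX4444": ignored
add_product("SAMPLE", "SAMPLE1234")

products = conn.execute("SELECT * FROM v1_products ORDER BY PRODUCTID").fetchall()
print(products)
# [('EX4444', 'EXAMPLE', 'EXAMPLE4444'), ('SA1234', 'SAMPLE', 'SAMPLE1234')]
```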

SUM data based on a Group By statement for another table

I am trying to create a query that allows me to get the sum of a total stored in one table based on values in another table.
Specifically, I have one table called 'winning_bids', that I want to join with another table, called 'objects'. 'winning_bids' contains a User ID, and an Object ID (primary key of 'objects' table). The 'objects' table contains an Object ID, and the value of the object. I want to sum the value from the 'objects' table for each user, grouped by the User ID from the 'winning_bids' table.
I tried something like this, but it does not work:
SELECT SUM(o.value) AS total, w.uid
FROM winning_bids w
LEFT JOIN objects o ON (o.id = w.oid)
GROUP BY w.uid
This statement merely returns all of the User IDs, but with the total for only the first User ID in each row.
Any help would be appreciated, thanks.
It works fine for me.
Here is what I did to test your query:
CREATE TABLE winning_bids (uid INT NOT NULL, oid INT NOT NULL);
INSERT INTO winning_bids (uid, oid) VALUES
(1, 1),
(1, 2),
(2, 3);
CREATE TABLE objects (id INT NOT NULL, value INT NOT NULL);
INSERT INTO objects (id, value) VALUES
(1, 1),
(2, 20),
(3, 300);
SELECT SUM(o.value) AS total, w.uid
FROM winning_bids w
LEFT JOIN objects o ON (o.id = w.oid)
GROUP BY w.uid;
Result:
total uid
21 1
300 2
If you still think it doesn't work, can you please post example input data that gives the wrong result when you run your query, and also specify what you believe the correct result should be.
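The same test can be reproduced with Python's sqlite3 module. SQLite is only a convenient stand-in for the asker's database here, and the ORDER BY is added so the printed order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE winning_bids (uid INT NOT NULL, oid INT NOT NULL);
    INSERT INTO winning_bids (uid, oid) VALUES (1, 1), (1, 2), (2, 3);
    CREATE TABLE objects (id INT NOT NULL, value INT NOT NULL);
    INSERT INTO objects (id, value) VALUES (1, 1), (2, 20), (3, 300);
""")

# Sum each user's object values via the join, one row per uid.
totals = conn.execute("""
    SELECT SUM(o.value) AS total, w.uid
    FROM winning_bids w
    LEFT JOIN objects o ON (o.id = w.oid)
    GROUP BY w.uid
    ORDER BY w.uid
""").fetchall()
print(totals)  # [(21, 1), (300, 2)]
```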