Select without duplicate from a specific string/key - mysql

Thanks to @Ed Gibbs, I managed to solve my first problem in this case (Select duplicate and keep the oldest (not based on ID)).
I am now facing a new problem I cannot solve.
I have two tables: "domain", which is free of duplicates, and "email", which contains duplicates. In the first table I had a column called "creationdate" which I used as a filter. In the second table I don't have such a column, but I think some of the other information could act as a filter.
Table domain :
| domain | value 1 | foreign_key |
|------------|---------|-------------|
| google.com | patrick | X |
| yahoo.com | britney | Y |
| ebay.com | harry | Z |
Table email :
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john#google.com | patrick | X |
| john#google.com | britney | Y |
| harry#google.com | mary | X |
| mickael#google.com | jack | X |
| david#ebay.com | walter | Z |
| alice#yahoo.com | brian | Y |
As you can see in the first table, the domain google.com is handled by foreign_key X. In the email table, the records "john#google.com,patrick,X" and "harry#google.com,mary,X" are fine because they match the right foreign_key. The problem is records like "john#google.com,britney,Y": Y isn't the foreign_key associated with the domain google.com, so I want to remove it.
Here is the desired table :
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john#google.com | patrick | X |
| harry#google.com | mary | X |
| mickael#google.com | jack | X |
| david#ebay.com | walter | Z |
| alice#yahoo.com | brian | Y |
How can I select this data without the wrong records? I think the key to the problem is a concat/substring, but I can't figure out how to do it.
Thanks for your help.

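To get the domain out of a well-formed email you can use the substring_index() function (with a count of -1 it returns everything after the last delimiter), then use a simple join that requires both the foreign key and the domain to match.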
SELECT email.* FROM email
JOIN domain ON email.foreign_key = domain.foreign_key
AND substring_index( email.email, '#', -1 ) = domain.domain
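The join above can be tried on the sample data with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the column name value1 is an assumption (the question writes "value 1"), and because SQLite has no substring_index() the domain is extracted with substr() and instr() instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE domain (domain TEXT, value1 TEXT, foreign_key TEXT);
    CREATE TABLE email  (email  TEXT, value1 TEXT, foreign_key TEXT);
    INSERT INTO domain VALUES
        ('google.com', 'patrick', 'X'),
        ('yahoo.com',  'britney', 'Y'),
        ('ebay.com',   'harry',   'Z');
    INSERT INTO email VALUES
        ('john#google.com',    'patrick', 'X'),
        ('john#google.com',    'britney', 'Y'),
        ('harry#google.com',   'mary',    'X'),
        ('mickael#google.com', 'jack',    'X'),
        ('david#ebay.com',     'walter',  'Z'),
        ('alice#yahoo.com',    'brian',   'Y');
""")

# SQLite stand-in for substring_index(email, '#', -1):
# keep everything after the first '#' (sample emails contain exactly one).
kept = conn.execute("""
    SELECT email.email, email.value1, email.foreign_key
    FROM email
    JOIN domain
      ON email.foreign_key = domain.foreign_key
     AND substr(email.email, instr(email.email, '#') + 1) = domain.domain
    ORDER BY email.rowid
""").fetchall()

for row in kept:
    print(row)
```

The row ('john#google.com', 'britney', 'Y') drops out because foreign_key Y belongs to yahoo.com, not google.com.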

Related

How to make a pivot table by multiple unique ID numbers?

I'm trying to break up a SQL table so that, for each user's name, I get the unique user IDs from up to 4 systems.
The data is currently like this:
| Name | User_ID |
-----------------
| A | 10 |
| A | 110 |
| A | 1500 |
| A | 4 |
| B | 20 |
| B | 100 |
| B | 2 |
| C | 10 |
I need to pivot it around the user's name to look like this (the id's don't need to be in numerical order as the SYS#_ID for each doesn't matter):
| Name | SYS1_ID | SYS2_ID | SYS3_ID | SYS4_ID |
------------------------------------------------
| A | 4 | 10 | 110 | 1500 |
| B | 2 | 20 | 100 | NULL |
| C | 10 | NULL | NULL | NULL |
This is the code I have tried on MySQL:
PIVOT(
COUNT(User_ID)
FOR Name
IN (SYS1_ID, SYS2_ID, SYS3_ID, SYS4_ID)
)
AS PivotedUsers
ORDER BY PivotedUsers.User_Name;
I'm unsure if PIVOT works on MySQL as I keep getting an error "PIVOT unknown". Is there a way to find the values that each user has and if they do not appear in the table already add them to the next column with a max of 4 values?
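MySQL has no PIVOT operator, so the usual workaround is to number each user's IDs and then spread them into columns with conditional aggregation. Below is a rough sketch of that idea, run through Python's sqlite3 module as a stand-in for MySQL; the table name users and the column aliases are assumptions, and the ranking subquery assumes a user never has the same User_ID twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, user_id INTEGER)")  # hypothetical table name
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("A", 10), ("A", 110), ("A", 1500), ("A", 4),
     ("B", 20), ("B", 100), ("B", 2), ("C", 10)],
)

# rn = position of this user_id among the same user's ids (1 = smallest).
# MAX(CASE ...) then picks the id at each position, or NULL if there is none.
pivoted = conn.execute("""
    SELECT name,
           MAX(CASE WHEN rn = 1 THEN user_id END) AS sys1_id,
           MAX(CASE WHEN rn = 2 THEN user_id END) AS sys2_id,
           MAX(CASE WHEN rn = 3 THEN user_id END) AS sys3_id,
           MAX(CASE WHEN rn = 4 THEN user_id END) AS sys4_id
    FROM (
        SELECT u.name, u.user_id,
               (SELECT COUNT(*) FROM users u2
                 WHERE u2.name = u.name
                   AND u2.user_id <= u.user_id) AS rn
        FROM users u
    ) AS ranked
    GROUP BY name
    ORDER BY name
""").fetchall()

for row in pivoted:
    print(row)
```

Any IDs beyond the fourth for a given name are simply ignored by this query, which matches the "max of 4 values" requirement.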

Selecting multiple columns from previous row in MySQL

Suppose I have a table like this:
| id | dt | name | value |
|----|------------|------|-------|
| 0 | 2017-01-14 | foo | one |
| 1 | 2017-01-17 | bar | two |
| 2 | 2017-01-18 | john | five |
| 3 | 2017-01-19 | doe | ten |
(where date need not necessarily be ordered)
I want to be able to select some values of the previous row (based on date). Such a functionality can be achieved by the following query:
SELECT
    *,
    (SELECT name
     FROM example e2
     WHERE e2.dt < e1.dt
     ORDER BY dt DESC
     LIMIT 1
    ) AS prev_name
FROM example e1
with resulting table:
| id | dt | name | value | prev_name |
|----|------------|------|-------|-----------|
| 0 | 2017-01-14 | foo | one | (null) |
| 1 | 2017-01-17 | bar | two | foo |
| 2 | 2017-01-18 | john | five | bar |
| 3 | 2017-01-19 | doe | ten | john |
Now, this works just fine. However, it would be preferable if I could easily select multiple columns from the previous row, resulting in a result like:
| id | dt | name | value | prev_name | prev_value | prev_dt |
|----|------------|------|-------|-----------|------------|------------|
| 0 | 2017-01-14 | foo | one | (null) | (null) | (null) |
| 1 | 2017-01-17 | bar | two | foo | one | 2017-01-14 |
| 2 | 2017-01-18 | john | five | bar | two | 2017-01-17 |
| 3 | 2017-01-19 | doe | ten | john | five | 2017-01-18 |
This can of course be accomplished by simply copying the subquery (SELECT [..] FROM example e2 ...) into the query multiple times, but I guess this is not the preferable way to go. I have found several questions on SO addressing either the "how to select records from a previous row" or the "how to select multiple columns using subqueries" problem, but not both. The latter problem is mostly solved by using a JOIN statement, but I don't think this can be combined with the "previous row" case. So my question is: what would be a better way to produce the last result, rather than copying a subquery for every column we need?
EDIT. As an extra constraint, that I did not include in the original question, "previous" could actually be something different from the previous row, but rather "the previous row that satisfies a condition". So suppose my table contains an extra boolean column b
| id | dt | name | value | b |
|----|------------|------|-------|---|
| 0 | 2017-01-14 | foo | one | 1 |
| 1 | 2017-01-17 | bar | two | 0 |
| 2 | 2017-01-18 | john | five | 1 |
| 3 | 2017-01-19 | doe | ten | 0 |
I would want the "previous row" to be the previous row with b = 1, so the desired result would be:
| id | dt | name | value | b | prev_name | prev_value | prev_dt |
|----|------------|------|-------|---|-----------|------------|------------|
| 0 | 2017-01-14 | foo | one | 1 | (null) | (null) | (null) |
| 1 | 2017-01-17 | bar | two | 0 | foo | one | 2017-01-14 |
| 2 | 2017-01-18 | john | five | 1 | foo | one | 2017-01-14 |
| 3 | 2017-01-19 | doe | ten | 0 | john | five | 2017-01-18 |
I think this can still be accomplished by James Scott's answer, by simply only updating the variables when b = 1, using an IF-statement, but maybe there is another solution possible in this case.
EDIT. SQLfiddle
Something like this will return the id of the 'previous' row.
SELECT x.*
, MAX(y.id) prev_id
FROM example x
LEFT
JOIN example y
ON y.id < x.id
AND y.b = 1
GROUP
BY x.id;
I'll leave the business of returning the rest of the data associated with this row as an exercise for the reader.
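One way to finish that exercise is to wrap the prev_id query in a derived table and join it back to the table twice: once for the current row and once for the previous row. The sketch below runs this on the question's sample data with Python's sqlite3 module standing in for MySQL; like the query above it defines "previous" by id rather than by dt, which happens to agree for this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (id INTEGER PRIMARY KEY, dt TEXT, name TEXT, value TEXT, b INTEGER)"
)
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?, ?)",
    [(0, "2017-01-14", "foo", "one", 1),
     (1, "2017-01-17", "bar", "two", 0),
     (2, "2017-01-18", "john", "five", 1),
     (3, "2017-01-19", "doe", "ten", 0)],
)

# m maps each row id to the id of the latest earlier row with b = 1;
# p then supplies all columns of that previous row in one join.
rows = conn.execute("""
    SELECT x.id, x.dt, x.name, x.value, x.b,
           p.name AS prev_name, p.value AS prev_value, p.dt AS prev_dt
    FROM example x
    LEFT JOIN (
        SELECT cur.id, MAX(y.id) AS prev_id
        FROM example cur
        LEFT JOIN example y ON y.id < cur.id AND y.b = 1
        GROUP BY cur.id
    ) AS m ON m.id = x.id
    LEFT JOIN example p ON p.id = m.prev_id
    ORDER BY x.id
""").fetchall()

for row in rows:
    print(row)
```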
This looks like a good use case for session variables if you only want the previous row. You can add an ORDER BY to change which row counts as the previous one.
SET @VDt := NULL, @VName := NULL, @VValue := NULL;
SELECT id,
       @VName prev_name,
       @VValue prev_value,
       @VDt prev_dt,
       @VDt := dt dt,
       @VName := `name` `name`,
       @VValue := `value` `value`
FROM example;
I messed this up when I first posted. Note that each variable must be assigned after its old value (from the previous row) has been returned, which is why the assignments come last in the select list. To reorder the columns (if desired) you can wrap this query in another one that reorders the result columns.
Let me know if you need anything else,
Regards,
James
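The variable-carrying idea, including the "only update when b = 1" variation mentioned in the question's edit, can be sketched outside SQL to make the order of operations explicit. This is plain Python rather than MySQL, and the function name with_previous is made up for illustration: the remembered values are read first and only then overwritten.

```python
rows = [
    (0, "2017-01-14", "foo", "one", 1),
    (1, "2017-01-17", "bar", "two", 0),
    (2, "2017-01-18", "john", "five", 1),
    (3, "2017-01-19", "doe", "ten", 0),
]

def with_previous(rows):
    """Append (prev_name, prev_value, prev_dt) of the last earlier row with b = 1."""
    prev = (None, None, None)  # plays the role of the session variables
    out = []
    # ISO-formatted date strings sort chronologically, so sort on dt directly.
    for id_, dt, name, value, b in sorted(rows, key=lambda r: r[1]):
        out.append((id_, dt, name, value, b) + prev)  # read the old values first
        if b == 1:                                    # then update, only for b = 1
            prev = (name, value, dt)
    return out

result = with_previous(rows)
for row in result:
    print(row)
```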

Compare different rows and bring out result

I have a table which requires me to pair certain rows together using a unique value that both the rows share.
For instance, in the table below:
+--------+----------+-----------+-----------+----------------+-------------+
| id | type | member | code | description | matching |
+--------+----------+-----------+-----------+----------------+-------------+
| 1000 |transfer | 552123 | SC120314 | From Gold | |
| 1001 |transfer | 552123 | SC120314 | To Platinum | |
| 1002 |transfer | 833612 | SC120314 | From silver | |
| 1003 |transfer | 833612 | SC120314 | To basic | |
| 1004 |transfer | 457114 | SC150314 | From Platinum | |
| 1005 |transfer | 457114 | SC150314 | To silver | |
| 1006 |transfer | 933276 | SC180314 | From Gold | |
| 1007 |transfer | 933276 | SC180314 | From To basic | |
+--------+----------+-----------+-----------+----------------+-------------+
Basically, what I need the query / routine to do is find the rows whose values in the 'member' column match, then check whether the values in the 'code' column of those rows also match.
If both columns for both rows match, then assign a value to the 'matching' column for both rows. This value should be the same for both rows and unique to only them.
The unique code can be absolutely anything, so long as it's exclusive to matching rows. Is there any query / routine capable of carrying this out?
I'm not sure I understand the question correctly, but if you want to pick out the rows whose code and member columns match and set matching to a value shared only by each pair of related rows, I believe this would work:
UPDATE <table> A
INNER JOIN (SELECT * FROM <table>) B
    ON B.member = A.member AND B.code = A.code AND A.id <> B.id
SET A.matching = (A.id + B.id);
The matching value will be set to the sum of the id columns for both rows. Notice that updating the matching field this way will not work if there are more than two rows that can match.
Running the above query against your example table would yield:
+------+----------+--------+----------+---------------+----------+
| id | type | member | code | description | matching |
+------+----------+--------+----------+---------------+----------+
| 1000 | transfer | 552123 | SC120314 | From Gold | 2001 |
| 1001 | transfer | 552123 | SC120314 | To Platinum | 2001 |
| 1002 | transfer | 833612 | SC120314 | From silver | 2005 |
| 1003 | transfer | 833612 | SC120314 | To basic | 2005 |
| 1004 | transfer | 457114 | SC150314 | From Platinum | 2009 |
| 1005 | transfer | 457114 | SC150314 | To silver | 2009 |
| 1006 | transfer | 933276 | SC180314 | From Gold | 2013 |
| 1007 | transfer | 933276 | SC180314 | From To basic | 2013 |
+------+----------+--------+----------+---------------+----------+
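If groups can contain more than two rows, one possible alternative is to use the smallest id in each (member, code) group as the matching value. It is shared by every row of the group and cannot collide with another group's value, whereas sums of two ids can (for example 1000 + 1003 equals 1001 + 1002). The sketch below uses Python's sqlite3 module as a stand-in; the table name transfers and the rows 1008 and 1009 are hypothetical additions. MySQL rejects an UPDATE whose subquery reads the target table directly, so there the derived-table join from the answer above would still be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transfers (id INTEGER PRIMARY KEY, member INTEGER, code TEXT, matching INTEGER)"
)
conn.executemany(
    "INSERT INTO transfers (id, member, code) VALUES (?, ?, ?)",
    [(1000, 552123, "SC120314"), (1001, 552123, "SC120314"),
     (1002, 833612, "SC120314"), (1003, 833612, "SC120314"),
     (1004, 457114, "SC150314"), (1005, 457114, "SC150314"),
     (1006, 933276, "SC180314"), (1007, 933276, "SC180314"),
     (1008, 933276, "SC180314"),   # hypothetical third row in a group
     (1009, 111111, "SC190314")],  # hypothetical row with no partner
)

# Tag every row that has at least one partner with the group's smallest id.
conn.execute("""
    UPDATE transfers
    SET matching = (
        SELECT MIN(t.id) FROM transfers t
        WHERE t.member = transfers.member AND t.code = transfers.code
    )
    WHERE (
        SELECT COUNT(*) FROM transfers t
        WHERE t.member = transfers.member AND t.code = transfers.code
    ) > 1
""")

matching = dict(conn.execute("SELECT id, matching FROM transfers"))
print(matching)
```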
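Here is a simple query that may do what you need.
tst is the name of the table.
SELECT t.*, COUNT( t2.id ) AS matching FROM tst t LEFT JOIN tst t2 ON t2.member = t.member GROUP BY t.id
Note that matching here is the number of rows sharing the same member, not an identifier unique to each pair.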

Add column to table form to show value from a query

I have table x, where x.b is the primary key:
+-----+-----+
| a | b |
+-----+-----+
| xyz | 123 |
| abc | 456 |
| abc | 999 |
+-----+-----+
Table y, where y.b is the foreign key for x.b:
+----+-----+-------+
| ID | b | c |
+----+-----+-------+
| 1 | 123 | x105 |
| 2 | 123 | a309 |
| 3 | 456 | b123 |
| 4 | 999 | q234 |
| 5 | 999 | z525 |
+----+-----+-------+
A query yQuery based on y to find the value of c for the highest ID for each b, which results in:
+----+-----+-------+
| ID | b | c |
+----+-----+-------+
| 2 | 123 | a309 |
| 3 | 456 | b123 |
| 5 | 999 | z525 |
+----+-----+-------+
I have a form xForm that's currently displaying table x. I want to add a column that shows the c result from yQuery, so that xForm would look like this:
+----+-----+-------+
| ID | b | c |
+----+-----+-------+
| 2 | 123 | a309 |
| 3 | 456 | b123 |
| 5 | 999 | z525 |
+----+-----+-------+
I tried adding a textbox to xForm where the control source of the textbox is:
=[yQuery]![c]
But that just gave me a column of #Name? errors. I'm not sure how to set up the textbox so that its source is the xForm!ID field.
One option would be to have the form bound to a query that pulls the information from table [x] joined with query [yQuery] on [ID]. However, if [yQuery] has a GROUP BY clause then any query that incorporates [yQuery] might produce a recordset that is not updateable.
Another option would be to use a DLookup() as the Control Source for the textbox in question, something like
=DLookup("c","yQuery","ID=" & [ID])
You could use this DLookup instead:
=DLookup("[c]","yQuery","[b] = " & [Control Name for B in your form])
or, if b is not numeric:
=DLookup("[c]","yQuery","[b] = """ & [Control Name for B in your form] & """")

Select duplicate and keep the oldest (not based on ID)

Thanks in advance for your help; I'm stuck on this problem.
Let me explain. I have this kind of table:
| domain | creationdate | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc | 2013-05-28 15:35:01 | value 1 | value 2 |
| abc | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb | 2012-02-12 10:48:10 | value 1 | value 2 |
| bbb | 2013-04-15 07:15:23 | value 1 | value 2 |
And I want to select (with subqueries) this:
| domain | creationdate | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb | 2012-02-12 10:48:10 | value 1 | value 2 |
I tried a combination of subqueries with IN/NOT IN in the WHERE clause and GROUP BY/HAVING, but I'm not able to obtain a proper result.
I also have another question: if someone has already faced this kind of problem, I would be glad to hear how they solved it.
The records in the first table above are deleted and reinserted frequently (every ten minutes). My aim is to make a copy (or maybe a view) of the result, without the duplicate entries, which will be used 24/7 by a Postfix mail server. I have heard that big views with many subqueries hurt performance, which would make a table the preferable option. The problem is that if I have to rebuild the table every ten minutes, there will be a short downtime during which Postfix cannot read it.
Thanks in advance for your advice.
EDIT :
Based on @Ed Gibbs' answer, here is a better sample:
Source table :
| domain | creationdate | value 1 | value 2 |
|------------|---------------------|---------|---------|
| google.com | 2013-05-28 15:35:01 | john | mary |
| google.com | 2013-04-30 12:10:10 | patrick | edward |
| yahoo.fr | 2011-04-02 13:10:10 | britney | garry |
| ebay.com | 2012-02-12 10:48:10 | harry | mickael |
| ebay.com | 2013-04-15 07:15:23 | bill | alice |
With your query, the result is identical to the source table.
Desired result :
| domain | value 1 | value 2 |
|------------|---------|---------|
| google.com | patrick | edward |
| yahoo.fr | britney | garry |
| ebay.com | harry | mickael |
I want to keep the oldest row for each domain (the one with the minimum creation date), with its own value 1 and value 2.
New question!
I made a view of the desired result based on your answer.
The result looks like this:
| domain | value 1 | foreign_key |
|------------|---------|-------------|
| google.com | patrick | X |
| yahoo.fr | britney | Y |
| ebay.com | harry | Z |
I also have a table with this kind of entry:
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john#google.com | patrick | X |
| john#google.com | britney | Y |
| harry#google.com | mary | X |
| mickael#google.com | jack | X |
| david#ebay.com | walter | Z |
| alice#yahoo.com | brian | Y |
Assume that, in this sample, emails matching %#google.com with foreign_key Y are bad records: only %#google.com with foreign_key X is valid, because that domain/foreign_key pair is the one I selected by creation date. How can I select only the emails whose domain and foreign_key are referenced in my new view?
Desired result :
| email | value 1 | foreign_key |
|--------------------|---------|-------------|
| john#google.com | patrick | X |
| harry#google.com | mary | X |
| mickael#google.com | jack | X |
| david#ebay.com | walter | Z |
| alice#yahoo.com | brian | Y |
I tried a CONCAT('%','#',domain) together with a foreign_key=foreign_key join, but it doesn't give me what I want.
Based on your sample data and results, a GROUP BY will give you the results you're after:
SELECT
domain,
MIN(creationdate) AS creationdate,
value1,
value2
FROM mytable
GROUP BY domain, value1, value2
Addendum: @Arka provided updated sample data where the value 1 and value 2 columns differ from row to row (in the original they were the same). That changes the query to this:
SELECT domain, creationdate, value1, value2
FROM mytable
WHERE (domain, creationdate) IN (
SELECT domain, MIN(creationdate)
FROM mytable
GROUP BY domain)
The subquery gets a list of the earliest creationdate for each domain, and the outer query only selects rows where the domain and creationdate match the subquery values.
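To check the "earliest row per domain" logic against the updated sample data, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. It expresses the same filter as a join against the grouped subquery rather than a tuple IN, which is an equivalent formulation chosen here only for portability; the alias first_created is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [("google.com", "2013-05-28 15:35:01", "john", "mary"),
     ("google.com", "2013-04-30 12:10:10", "patrick", "edward"),
     ("yahoo.fr", "2011-04-02 13:10:10", "britney", "garry"),
     ("ebay.com", "2012-02-12 10:48:10", "harry", "mickael"),
     ("ebay.com", "2013-04-15 07:15:23", "bill", "alice")],
)

# f holds the earliest creationdate per domain; the join keeps only
# the rows of mytable that carry exactly that (domain, date) pair.
oldest = conn.execute("""
    SELECT m.domain, m.value1, m.value2
    FROM mytable m
    JOIN (
        SELECT domain, MIN(creationdate) AS first_created
        FROM mytable
        GROUP BY domain
    ) AS f
      ON f.domain = m.domain
     AND f.first_created = m.creationdate
    ORDER BY m.domain
""").fetchall()

for row in oldest:
    print(row)
```

If two rows of the same domain share the same earliest creationdate, both forms of the query return both rows.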