Mysql find duplicated on the first matched characters - mysql

I want to pull out double records on the first matched four characters at the left or at first lines in MySQL.
how to make a SELECT?
id name
1 1111q
2 1111
3 1111asdfgg
4 2222
5 2222ag
6 1111au
7 3333
8 5555
Something like:
id name
2 1111
1 1111q
6 1111au
3 1111asdfgg
4 2222
5 2222ag

select distinct t.*
from the_table t
inner join the_table t2 ON left(t.name,4)=left(t2.name,4) and t.id<>t2.id
Join to the same table by LEFT 4 chars and exclude the same ID

Remove my original answer as misread the question.
What you could do is add one more indexed column with the first 4 characters from string. You'll probably pre-process the strings with some other language. This way you can have working index also. By joining the same table in to calculation with subquery and join. You can then compare the counts. Even better would be if you can add some kind of comparison operator to subquery for searching only certain prefixes.
The solution using left is not bad at all, but it loses power of index when using function around field when comparing.
Here's working example from my solution. You can view the execution plan from the bottom.
http://sqlfiddle.com/#!9/1d9cc/19

Related

Select all rows contains same value in a column

I want to select all package_id that contain product_id 2.
In this case, package_id 1,3,5 has product_id 2
Table: product_package
package_id package_name product_id
---------------------------------------------
1 Gold 1,2,3
2 Platinum 4,5,12
3 Diamond 2,11,5
4 Titanium 3,5
5 Basic 2
I tried:
SELECT
*
FROM
product_package
WHERE product_id IN(2)
It is outputting package_id 3 and 5 only. How do I output this properly?
product_id structure is varchar(256). Should I change the structure or add Foreign keys?
We always recommend not to stored delimited columns see Is storing a delimited list in a database column really that bad?
But you can use FIND_IN_SET but this is always slow
SELECT
*
FROM
product_package
WHERE FIND_IN_SET(2,product_id)
package_id
package_name
product_id
1
Gold
1,2,3
3
Diamond
2,11,5
5
Basic
2
fiddle
First, let me explain what is happening in your query.
You have WHERE product_id IN(2), but product_id is a misnomer and should rather be product_ids, because it is multiple IDs unfortunately stored in a string. IN is made to look up a value in a list. Your list, however, only consists of one element, so you can just as well use the equality operator: WHERE product_id = 2.
What you have is WHERE string = number, so the DBMS has to convert one of the values in order to compare the two. It converts the string to a number (so '2' matches 2 and '002' matches 2, too, as it should). But your strings are not numbers. The DBMS should raise an error on '1,2,3' for instance, because '1,2,3' is not a number. MySQL, however, has a design flaw here and still converts the string, regardless. It just takes as many characters from the left as they still represent a number. '1' does, but then the comma is not considered numerical (yes, MySQL cannot deal with a thousand separator when convertings strings to numbers implicitly). So converting '1,2,3' to a number results in 1. Equally, '2,11,5' results in 2, so rather surprisingly '2,11,5' = 2 in MySQL. This is why you are getting that row.
You ask "Should I change the structure", and the answer to this is yes. So far your table doesn't comply with the first normal form and should thus not exist in a relational database. You'll want two tables instead forming the 1:n relation:
Table: package
package_id
package_name
1
Gold
2
Platinum
3
Diamond
4
Titanium
5
Basic
Table: product_package
package_id
product_id
1
1
1
2
1
3
2
4
2
5
2
12
3
2
3
11
3
5
4
3
4
5
5
2
You ask "or add Foreign keys?", and the answer is and add foreign keys. So with the changed structure you want product_package(product_id) to reference product(product_id) and product_package(package_id) to reference package(package_id).
Disregarding that you should not be storing multiple values in a single field, you can use LIKE operator to achieve what you are looking for. I'm going with assumptions:
all values are delimited with commas
all values are integers
there are no whitespaces (or any other characters besides integers and commas)
select * from product_package
where product_id like '2,%'
or product_id like '%,2,%'
or product_id like '%,2'
or product_id like '2'
Alternatively, you can use REGEXP operator:
select * from product_package
where product_id regexp '^2$|^2,.+|.+,2,.+|.+,2'
References:
MySQL LIKE
MySQL REGEXP

what does this sql query do? SELECT column_1 FROM table_1,table_2;

SELECT column_1 FROM table_1,table_2;
When I ran this on my database it returned huge number of rows with duplicate column_1 values. I could not understand why I got these results. Please explain what this query does.
it gives you a cross product from table 1 and table 2
In more layman's terms, it means that for each record in Table A, you get every record from Table B (all possible combinations).
TableA with 3 records and Table B with 3 records gives 9 total records in the result:
TableA-1/B-1
TableA-1/B-2
TableA-1/B-3
TableA-2/B-1
TableA-2/B-2
TableA-2/B-3
TableA-3/B-1
TableA-3/B-2
TableA-3/B-3
Often used as a basis for Cartesian Queries (which themselves are the means to generate, say, a list of future dates based on a recurrence schedule: give me all possible results for the next 6 months, then restrict that set to those whose factor matches my day of the week)
This is 'valid' way of cross joining two tables; it is not the preferred way though. Cross Join would be much clearer. An on condition would then be helpful to limit results,
Imagine that i have 3 friends named Jhon, Ana, Nick; then i have in the other table 2 are T-shirts a red and a yellow and i wanna know witch is from.
So in the query being tableA:Friends and tableB:Tshirts returns:
1|JHON | t-shirt_YELLOW
2|JHON | t-shirt_RED
3|ANA | t-shirt_YELLOW
4|ANA | t-shirt_RED
5|NICK | t-shirt_YELLOW
6|NICK | t-shirt_RED
As you see this join has no relational logic between friends and Tshirts so by evaluating all the posible combination generates what you call duplicates.

MySQL Subquery in the same table

This is my first question. I've trying to deal with this all week long with no success.
I have the following table
ID F1 F2
1 1 10
2 3 5
3 2 8
4 7 10
5 11 20
6 12 18
7 15 20
Please note that the values of the rows 2,3,4 are between the values of the first row.
Then, rows 6 and 7 are between the values of the row 5
I need to create a query that should bring me only rows 1 and 5.
I have tried many kind of queries with no success.
I was expecting the following query to work (among many others) but it did not.
select OL.F1,OL.F2
from borrar OL,
(select F1,F2
from borrar
) IL
where
OL.F1 >= IL.F1
and OL.F2 <= IL.F2
Any ideas?.
Thanks,
You can do this with a self-join. The thing that makes rows 1 and 5 different from the other rows is that there are no rows in the table where F1 is less than Row 1&5's value for F1 and F2 is greater than Row 1&5's value for F2.
SELECT t1.*
FROM datatable t1
LEFT OUTER JOIN datatable t2
on t2.F1<=t1.F1 and t2.F2>=t1.F2 and t1.id<>t2.id
WHERE t2.ID is NULL
Self joins are always a bit confusing. Each row is combined with all other rows to find if there is some other row that "spans" it (i.e. F1 and F2 are equal to or outside it) but need to exclude the row spanning itself. Use an outer join, and search for NULL's to find the rows that have no matches.
This seems like a homework problem...
But, one thing I noticed is that F2-F1 = 9 in both cases. Perhaps using that in the Where clause will do the trick ;)

Is there a possibility to change the order of a string with numeric value

I have some strings in my database. Some of them have numeric values (but in string format of course). I am displaying those values ordered ascending.
So we know, for string values, 10 is greater than 2 for example, which is normal. I am asking if there is any solution to display 10 after 2, without changing the code or the database structure, only the data.
If for example I have to display values from 1 to 10, I will have:
1
10
2
3
4
5
6
7
8
9
What I would like to have is
1
2
3
4
5
6
7
8
9
10
Is there a possibility to ad an "invisible character or string which will be interpreted as greater than 9". If i put a10 instead of 10, the a10 will be at the end but is there any invisible or less visible character for that.
So, I repeat, I am not looking for a programming or database structure solution, but for a simple workaround.
You could try to cast the value as an number to then order by it:
select col
from yourtable
order by cast(col AS UNSIGNED)
See SQL Fiddle with demo
You could try appending the correct number of zeroes to the front of the data:
01
02
03
..
10
11
..
99
Since you have a mixture of numbers and letters in this column - even if not in a single row - what you're really trying to do is a Natural Sort. This is not something MySQL can do natively. There are some work arounds, however. The best I've come across are:
Sort by length then value.
SELECT
mixedColumn
FROM
tableName
ORDER BY
LENGTH(mixedColumn), mixedColumn;
For more examples see: http://www.copterlabs.com/blog/natural-sorting-in-mysql/
Use a secondary column to use as a sort key that would contain some sort of normalized data (i.e. only numbers or only letters).
CREATE TABLE tableName (mixedColumn varchar, sortColumn int);
INSERT INTO tableName VALUES ('1',1), ('2',2), ('10',3),
('a',4),('a1',5),('a2',6),('b1',7);
SELECT
mixedColumn
FROM
tableName
ORDER BY
sortColumn;
This could get difficult to maintain unless you can figure out a good way to handle the ordering.
Of course if you were able to go outside of the database you'd be able to use natural sort functions from various programming languages.

GROUP BY does not remove duplicates

I have a watchlist system that I've coded, in the overview of the users' watchlist, they would see a list of records, however the list shows duplicates when in the database it only shows the exact, correct number.
I've tried GROUP BY watch.watch_id, GROUP BY rec.record_id, none of any types of group I've tried seems to remove duplicates. I'm not sure what I'm doing wrong.
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN members usr ON rec.user_id = usr.user_id
)
WHERE watch.user_id = 1
GROUP BY watch.watch_id
LIMIT 0, 25
The watchlist table looks like this:
+----------+---------+-----------+------------+
| watch_id | user_id | record_id | watch_date |
+----------+---------+-----------+------------+
| 13 | 1 | 22 | 1314038274 |
| 14 | 1 | 25 | 1314038995 |
+----------+---------+-----------+------------+
GROUP BY does not "remove duplicates". GROUP BY allows for aggregation. If all you want is to combine duplicated rows, use SELECT DISTINCT.
If you need to combine rows that are duplicate in some columns, use GROUP BY but you need to to specify what to do with the other columns. You can either omit them (by not listing them in the SELECT clause) or aggregate them (using functions like SUM, MIN, and AVG). For example:
SELECT watch.watch_id, COUNT(rec.street_number), MAX(watch.watch_date)
... GROUP by watch.watch_id
EDIT
The OP asked for some clarification.
Consider the "view" -- all the data put together by the FROMs and JOINs and the WHEREs -- call that V. There are two things you might want to do.
First, you might have completely duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 3
3 4 5
Then simply use DISTINCT
SELECT DISTINCT * FROM V;
a b c
- - -
1 2 3
3 4 5
Or, you might have partially duplicate rows that you wish to combine:
a b c
- - -
1 2 3
1 2 6
3 4 5
Those first two rows are "the same" in some sense, but clearly different in another sense (in particular, they would not be combined by SELECT DISTINCT). You have to decide how to combine them. You could discard column c as unimportant:
SELECT DISTINCT a,b FROM V;
a b
- -
1 2
3 4
Or you could perform some kind of aggregation on them. You could add them up:
SELECT a,b, SUM(c) "tot" FROM V GROUP BY a,b;
a b tot
- - ---
1 2 9
3 4 5
You could add pick the smallest value:
SELECT a,b, MIN(c) "first" FROM V GROUP BY a,b;
a b first
- - -----
1 2 3
3 4 5
Or you could take the mean (AVG), the standard deviation (STD), and any of a bunch of other functions that take a bunch of values for c and combine them into one.
What isn't really an option is just doing nothing. If you just list the ungrouped columns, the DBMS will either throw an error (Oracle does that -- the right choice, imo) or pick one value more or less at random (MySQL). But as Dr. Peart said, "When you choose not to decide, you still have made a choice."
While SELECT DISTINCT may indeed work in your case, it's important to note why what you have is not working.
You're selecting fields that are outside of the GROUP BY. Although MySQL allows this, the exact rows it returns for the non-GROUP BY fields is undefined.
If you wanted to do this with a GROUP BY try something more like the following:
SELECT watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
FROM
(
watchlist watch
LEFT OUTER JOIN est8_records rec ON rec.record_id = watch.record_id
LEFT OUTER JOIN est8_members usr ON rec.user_id = usr.user_id
)
WHERE watch.watch_id IN (
SELECT watch_id FROM watch WHERE user_id = 1
GROUP BY watch.watch_id)
LIMIT 0, 25
I Would never recommend using SELECT DISTINCT, it's really slow on big datasets.
Try using things like EXISTS.
You are grouping by watch.watch_id and you have two results, which have different watch IDs, so naturally they would not be grouped.
Also, from the results displayed they have different records. That looks like a perfectly valid expected results. If you are trying to only select distinct values, then you don't want ot GROUP, but you want to select by distinct values.
SELECT DISTINCT()...
If you say your watchlist table is unique, then one (or both) of the other tables either (a) has duplicates, or (b) is not unique by the key you are using.
To suppress duplicates in your results, either use DISTINCT as #Laykes says, or try
GROUP BY watch.watch_date,
rec.street_number,
rec.street_name,
rec.city,
rec.state,
rec.country,
usr.username
It sort of sounds like you expect all 3 tables to be unique by their keys, though. If that is the case, you are simply masking some other problem with your SQL by trying to retrieve distinct values.