Find elements with the same characteristics - MySQL

Suppose I have a table in which a material is assigned different characteristics. A material can have one or more characteristics. For a given material, I would like to find similar materials, meaning that at least 2 characteristics should match. In this example, comparing with A should find material C, and comparing with D should find B. Is there any solution in SQL?
material | character
----------------------
A | 2
A | 5
B | 1
B | 3
B | 4
C | 2
C | 5
D | 3
D | 1

This is an Entity-Attribute-Value table, and it is notoriously painful to search. (In this case, the value is implied as being TRUE for "has this attribute".)
It involves comparing everything against everything, grouping the results, and checking whether the groups match, with virtually no use of indexes or intelligence of any kind.
SELECT
material_a.material AS material_a,
material_b.material AS material_b
FROM
material AS material_a
LEFT JOIN
material AS material_b
ON material_a.character = material_b.character
AND material_a.material <> material_b.material
GROUP BY
material_a.material,
material_b.material
HAVING
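COUNT(material_b.character) = (SELECT COUNT(*) FROM material AS m WHERE m.material = material_a.material)
This gives every material_b that has all of the characteristics that material_a has.
- The HAVING clause checks that the number of characteristics matched in material_b equals the total number of characteristics material_a has, so that 0 of material_a's characteristics are missing from material_b. This assumes each (material, character) pair appears only once in the table.
Changing to an INNER JOIN and changing the HAVING clause gets the materials that share at least two characteristics.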
SELECT
material_a.material AS material_a,
material_b.material AS material_b
FROM
material AS material_a
INNER JOIN
material AS material_b
ON material_a.character = material_b.character
AND material_a.material <> material_b.material
GROUP BY
material_a.material,
material_b.material
HAVING
COUNT(*) >= 2
Either way, you are still joining the whole table against the whole table, then filtering out the failures. With 100 materials, that's 9,900 material-material comparisons. Imagine when you have 1,000 materials and 999,000 comparisons. Or 1 million materials...
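To try the "share at least two characteristics" query against the sample data from the question, here is a minimal sketch. It is an assumption-laden stand-in rather than the MySQL setup itself: it uses Python's built-in sqlite3 module in place of MySQL, the function name similar_pairs and the min_shared parameter are made up for illustration, and the character column is double-quoted only to keep the identifier unambiguous.

```python
import sqlite3

# Sample rows from the question: (material, character).
ROWS = [("A", 2), ("A", 5), ("B", 1), ("B", 3), ("B", 4),
        ("C", 2), ("C", 5), ("D", 3), ("D", 1)]


def similar_pairs(rows, min_shared=2):
    """Return (material_a, material_b) pairs sharing at least min_shared characteristics.

    Hypothetical helper: SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE material (material TEXT, "character" INTEGER)')
    conn.executemany("INSERT INTO material VALUES (?, ?)", rows)
    query = """
        SELECT a.material, b.material
        FROM material AS a
        INNER JOIN material AS b
            ON a."character" = b."character"
           AND a.material <> b.material
        GROUP BY a.material, b.material
        HAVING COUNT(*) >= ?
        ORDER BY a.material, b.material
    """
    result = conn.execute(query, (min_shared,)).fetchall()
    conn.close()
    return result


print(similar_pairs(ROWS))
```

Each pair shows up twice (A/C and C/A, B/D and D/B) because the join compares the table with itself in both directions.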

You could use something like the following grouped self-join to determine all items with at least 2 shared characteristics:
SELECT
t1.material AS material
, t2.material AS similarMaterial
FROM
tableName t1
INNER JOIN tableName t2 ON t1.character = t2.character AND NOT(t1.material = t2.material)
GROUP BY t1.material, t2.material
HAVING
COUNT(*) >= 2

Yes, you can find all pairs of similar materials with SQL similar to this:
SELECT c1.material, c2.material, COUNT(*) as characterCount
FROM charateristics c1
CROSS JOIN charateristics c2
WHERE c1.material > c2.material AND c1.character = c2.character
GROUP BY c1.material, c2.material
HAVING characterCount >= 2;

This would give you the results based on a material input:
SELECT b.material
FROM table1 a
INNER JOIN table1 b
ON a.character = b.character AND a.material <> b.material
WHERE a.material = 'A' -- Your input
GROUP BY b.material
HAVING COUNT(*) > 1;
Or do this to give you the pairs:
SELECT a.material as LEFT_MATERIAL ,b.material AS RIGHT_MATERIAL
FROM table1 a
INNER JOIN table1 b ON a.character = b.character AND a.material <> b.material
GROUP BY a.material,b.material
HAVING COUNT(*) > 1;
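As a rough illustration of the "material input" variant, the sketch below passes the input material as a bound parameter instead of writing 'A' into the SQL text. It rests on assumptions: sqlite3 from the Python standard library stands in for MySQL, and the helper name similar_to is hypothetical.

```python
import sqlite3


def similar_to(conn, material, min_shared=2):
    """Return materials sharing at least min_shared characteristics with `material`.

    Hypothetical helper; the input is bound as a parameter, not concatenated.
    """
    query = """
        SELECT b.material
        FROM table1 AS a
        INNER JOIN table1 AS b
            ON a."character" = b."character"
           AND a.material <> b.material
        WHERE a.material = ?
        GROUP BY b.material
        HAVING COUNT(*) >= ?
        ORDER BY b.material
    """
    return [row[0] for row in conn.execute(query, (material, min_shared))]


# In-memory stand-in for the question's table, loaded with its sample rows.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 (material TEXT, "character" INTEGER)')
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [("A", 2), ("A", 5), ("B", 1), ("B", 3), ("B", 4),
     ("C", 2), ("C", 5), ("D", 3), ("D", 1)],
)

print(similar_to(conn, "A"))
print(similar_to(conn, "D"))
```

With the sample data, the lookup for A finds C and the lookup for D finds B, which matches what the question asks for.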

Related

SQL DISTINCT based on a different column

I have a problem getting distinct values of one column based on another column. The case study is:
Table: List
well | wbore | op|
------------------
wella|wbore_a|op_a|
wella|wbore_a|op_b|
wella|wbore_a|op_b|
wella|wbore_b|op_c|
wella|wbore_b|op_c|
wellb|wbore_g|op_t|
wellb|wbore_g|op_t|
wellb|wbore_h|op_k|
So, I want the output to appear in separate columns, like this:
well | total_wbore | total_op
----------------------------
wella | 2 | 3
wellb | 2 | 2
The real case involves several different tables, but to simplify it I am assuming it all happens in one table.
The sql query that I tried:
SELECT well.well_name, wellbore.wellbore_name, operation.operation_name, COUNT(*)
FROM well
INNER JOIN wellbore ON wellbore.well_uid = well.well_uid
INNER JOIN operation ON wellbore.well_uid = operation.well_uid
GROUP BY well.well_name,wellbore.wellbore_name
HAVING COUNT(*) > 1
But this query counts the duplicate rows, which does not meet the requirement. Can anyone help?

You need to use COUNT(DISTINCT ...):
SELECT
count(distinct wellbore.wellbore_name) as total_wbore,
count(distinct operation.operation_name) as total_op
FROM well
INNER JOIN wellbore ON wellbore.well_uid = well.well_uid
INNER JOIN operation ON wellbore.well_uid = operation.well_uid

Final query:
SELECT
well.well_name,
COUNT(DISTINCT wellbore.wellbore_name) AS total_wbore,
COUNT(DISTINCT operation.operation_name) AS total_op
FROM well
INNER JOIN wellbore ON wellbore.well_uid = well.well_uid
INNER JOIN operation ON wellbore.well_uid = operation.well_uid
GROUP BY well.well_name
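To check the COUNT(DISTINCT ...) idea against the simplified single-table example from the question, here is a small sketch. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, and the table name well_list is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table version of the question's data.
conn.execute("CREATE TABLE well_list (well TEXT, wbore TEXT, op TEXT)")
conn.executemany(
    "INSERT INTO well_list VALUES (?, ?, ?)",
    [
        ("wella", "wbore_a", "op_a"),
        ("wella", "wbore_a", "op_b"),
        ("wella", "wbore_a", "op_b"),
        ("wella", "wbore_b", "op_c"),
        ("wella", "wbore_b", "op_c"),
        ("wellb", "wbore_g", "op_t"),
        ("wellb", "wbore_g", "op_t"),
        ("wellb", "wbore_h", "op_k"),
    ],
)

# One output row per well; duplicates inside each group are counted once.
totals = conn.execute(
    """
    SELECT well,
           COUNT(DISTINCT wbore) AS total_wbore,
           COUNT(DISTINCT op)    AS total_op
    FROM well_list
    GROUP BY well
    ORDER BY well
    """
).fetchall()

print(totals)
```

This reproduces the requested output: wella has 2 wellbores and 3 operations, wellb has 2 and 2.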

MySQL condition statement across multiple rows

I have a Profile table like this
|--------|-----------|
| People | Favorite |
|--------|-----------|
| A | Movie |
| B | Movie |
| B | Jogging |
|--------|-----------|
Q: How to retrieve the people whose favorite is movie but not jogging?
In this table, the result is only People A.
I came up with this:
select People from Profile
where
People
in
(select People from Profile
where favorite='Movie')
and
People
not in
(select People from Profile
where favorite='Jogging')
But it seems like it could be better. Any suggestions or answers (without using a JOIN or UNION clause)?

https://www.db-fiddle.com/f/rboiDpxxbABCpjtduEz7uY/1
SELECT People
FROM `profile`
GROUP BY people
HAVING SUM('Movie' = favorite) > 0
AND SUM('Jogging' = favorite) = 0

There are lots of ways. While you can use a UNION, it's rather messy and inefficient. MySQL doesn't have a MINUS clause, which would give a fairly easy-to-understand query.
You could aggregate the data:
SELECT people
, MAX(IF(favorite='jogging', 1, 0)) as jogging
, MAX(IF(favorite='movie', 1, 0)) as movie
FROM profile
GROUP BY people
HAVING movie=1 AND jogging=0
Or use an outer join:
SELECT m.people
FROM profile m
LEFT JOIN
( SELECT j.people
FROM profile j
WHERE j.favorite='jogging' ) joggers
ON m.people=joggers.people
WHERE joggers.people IS NULL
AND m.favorite='movie'
Using a NOT IN/NOT EXISTS gives clearer syntax but again would be very inefficient.

There are several query patterns that will return a result that satisfies the specification.
We can use NOT EXISTS with a correlated subquery:
SELECT p.people
FROM profile p
WHERE p.favorite = 'Movie'
AND NOT EXISTS ( SELECT 1
FROM profile q
WHERE q.favorite = 'Jogging'
AND q.people = p.people /* related to row in outer query */
)
ORDER
BY p.people
An equivalent result can also be done with an anti-join pattern:
SELECT p.people
FROM profile p
LEFT
JOIN profile q
ON q.people = p.people
AND q.favorite = 'Jogging'
WHERE q.people IS NULL
AND p.favorite = 'Movie'
ORDER BY p.people
Another option is conditional aggregation. This works without any guarantee about uniqueness, and uses some MySQL shorthand:
SELECT p.people
FROM profile p
GROUP
BY p.people
HAVING 1 = MAX(p.favorite='Movie')
AND 0 = MAX(p.favorite='Jogging')
A more portable, more ANSI-standard-compliant syntax for the conditional aggregation:
SELECT p.people
FROM profile p
GROUP
BY p.people
HAVING 1 = MAX(CASE p.favorite WHEN 'Movie' THEN 1 ELSE 0 END)
AND 0 = MAX(CASE p.favorite WHEN 'Jogging' THEN 1 ELSE 0 END)
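The portable CASE form can be tried out with a short sketch like the one below. It is only an illustration under assumptions: sqlite3 from the Python standard library replaces MySQL, and the extra rows (a duplicate for A and a jogging-only person C) are invented to show that the pattern does not rely on uniqueness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (people TEXT, favorite TEXT)")
conn.executemany(
    "INSERT INTO profile VALUES (?, ?)",
    [
        ("A", "Movie"),
        ("A", "Movie"),    # invented duplicate row
        ("B", "Movie"),
        ("B", "Jogging"),
        ("C", "Jogging"),  # invented person who only likes jogging
    ],
)

# Each CASE yields 1 or 0 per row; MAX over the group tells whether
# any row of that person matched the given favorite.
movie_not_jogging = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.people
        FROM profile p
        GROUP BY p.people
        HAVING 1 = MAX(CASE p.favorite WHEN 'Movie' THEN 1 ELSE 0 END)
           AND 0 = MAX(CASE p.favorite WHEN 'Jogging' THEN 1 ELSE 0 END)
        ORDER BY p.people
        """
    )
]

print(movie_not_jogging)
```

Only A satisfies both conditions, so the duplicate row for A makes no difference to the result.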

This is a common problem when you want to have multiple conditions with the same column. I have answered this here and there are other methods like intersect and subqueries.
SELECT people, GROUP_CONCAT(favorite) as fav
FROM profile
GROUP BY people
HAVING fav REGEXP 'Movie'
AND NOT fav REGEXP 'Jogging';

With group by people and checking the minimum and maximum values of favorite to be 'Movie':
select people from tablename
where favorite in ('Movie', 'Jogging')
group by people
having min(favorite) = 'Movie' and max(favorite) = 'Movie'

Unique rows in join result

I have tables of deals and currencies that look like this:
currencies
id,code
pairs (the available pairs of currencies)
id to_sell to_buy
deals
id
user_id
pair_id
amount_to_sell
amount_to_buy
So I need to get all matching deals that can be executed, but I cannot get the unique matches.
Here is my SQL query:
select *
from deals as d1
join deals d2
on d1.sell_amount = d2.buy_amount and d1.buy_amount = d2.sell_amount
I am getting a result that looks like this:
id | user_id | pair_id | amount_to_buy | amount_to_sell | id | user_id | pair_id | amount_to_buy | amount_to_sell
1|2|1|1000|3000|2|1|2|3000|1000
2|1|2|3000|1000|1|2|1|1000|3000

You may try using a least/greatest trick here:
SELECT t1.*, t2.*
FROM
(
SELECT DISTINCT
LEAST(d1.id, d2.id) AS d1_id,
GREATEST(d1.id, d2.id) AS d2_id
FROM deals AS d1
INNER JOIN deals d2
ON d1.sell_amount = d2.buy_amount AND
d1.buy_amount = d2.sell_amount
) d
INNER JOIN deals t1
ON d.d1_id = t1.id
INNER JOIN deals t2
ON d.d2_id = t2.id;
The basic idea here is that the subquery labelled d finds a single pair of matched deal IDs, using a least/greatest trick. Then, we join twice to the deals table again to bring in the full information for each member of that deal pair.
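The pair-normalising step of the subquery can be sketched as follows. Several things here are assumptions: SQLite (through Python's sqlite3) stands in for MySQL, and since SQLite has no LEAST/GREATEST, its two-argument scalar MIN(x, y) and MAX(x, y) are swapped in to play the same role. The column names follow the question's schema (amount_to_sell, amount_to_buy), and deal 3 is an invented unmatched row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deals (id INTEGER, user_id INTEGER, pair_id INTEGER, "
    "amount_to_sell INTEGER, amount_to_buy INTEGER)"
)
conn.executemany(
    "INSERT INTO deals VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 1, 3000, 1000),
        (2, 1, 2, 1000, 3000),
        (3, 3, 1, 500, 700),  # invented deal with no counterpart
    ],
)

# Ordering the two ids inside each row makes (1, 2) and (2, 1) identical,
# so DISTINCT collapses them into a single pair.
matched_pairs = conn.execute(
    """
    SELECT DISTINCT
           MIN(d1.id, d2.id) AS first_id,
           MAX(d1.id, d2.id) AS second_id
    FROM deals AS d1
    INNER JOIN deals AS d2
        ON d1.amount_to_sell = d2.amount_to_buy
       AND d1.amount_to_buy = d2.amount_to_sell
    ORDER BY first_id, second_id
    """
).fetchall()

print(matched_pairs)
```

Adding d1.id < d2.id to the join condition would be another way to keep each pair once without DISTINCT.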

Using DISTINCT inside JOIN is creating trouble [duplicate]

I'm having trouble getting my query to work.
SELECT itpitems.identifier, itpitems.name, itpitems.subtitle, itpitems.description, itpitems.itemimg, itpitems.mainprice, itpitems.upc, itpitems.isbn, itpitems.weight, itpitems.pages, itpitems.publisher, itpitems.medium_abbr, itpitems.medium_desc, itpitems.series_abbr, itpitems.series_desc, itpitems.voicing_desc, itpitems.pianolevel_desc, itpitems.bandgrade_desc, itpitems.category_code, itprank.overall_ranking, itpitnam.name AS artist, itpitnam.type_code FROM itpitems
INNER JOIN itprank ON (itprank.item_number = itpitems.identifier)
INNER JOIN (SELECT DISTINCT type_code FROM itpitnam) itpitnam ON (itprank.item_number = itpitnam.item_number)
WHERE mainprice > 1
LIMIT 3
I keep getting Unknown column 'itpitnam.name' in 'field list'.
However, if I change DISTINCT type_code to *, I do not get that error, but I do not get the results I want either.
This is a big result table so I am making a dummy example...
With *, I get something like:
+-----------+---------+----------+
| identifier| name | type_code|
+-----------+---------+----------+
| 2 | Joe | A |
| 2 | Amy | R |
| 7 | Mike | B |
+-----------+---------+----------+
The problem here is that I have two instances of identifier = 2 because the type_code is different. I have tried GROUP BY at the outside end of the query, but it is sifting through so many records it creates too much strain on the server, so I'm trying to find an alternative way of getting the results I need.
What I want to achieve (using the same dummy output) would look something like this:
+-----------+---------+----------+
| identifier| name | type_code|
+-----------+---------+----------+
| 2 | Joe | A |
| 7 | Mike | B |
| 8 | Sam | R |
+-----------+---------+----------+
It should skip over the duplicate identifier regardless of whether type_code is different.
Can someone help me modify this query to get the results as simulated in the above chart?

One approach is to use an inline view, like the query you already have. But instead of using DISTINCT, you would use a GROUP BY to eliminate duplicates. The simplest inline view to satisfy your requirements would be:
( SELECT n.item_number, n.name, n.type_code
FROM itpitnam n
GROUP BY n.item_number
) itpitnam
Note, though, that it's not deterministic which row from itpitnam the values for name and type_code are retrieved from, and MySQL rejects this query when the ONLY_FULL_GROUP_BY SQL mode is enabled. A more elaborate inline view can make this more specific.
Another common approach to this type of problem is to use a correlated subquery in the SELECT list. For returning a small set of rows, this can perform reasonably well. But for returning large sets, there are more efficient approaches.
SELECT i.identifier
, i.name
, i.subtitle
, i.description
, i.itemimg
, i.mainprice
, i.upc
, i.isbn
, i.weight
, i.pages
, i.publisher
, i.medium_abbr
, i.medium_desc
, i.series_abbr
, i.series_desc
, i.voicing_desc
, i.pianolevel_desc
, i.bandgrade_desc
, i.category_code
, r.overall_ranking
, ( SELECT n1.name
FROM itpitnam n1
WHERE n1.item_number = r.item_number
ORDER BY n1.type_code, n1.name
LIMIT 1
) AS artist
, ( SELECT n2.type_code
FROM itpitnam n2
WHERE n2.item_number = r.item_number
ORDER BY n2.type_code, n2.name
LIMIT 1
) AS type_code
FROM itpitems i
JOIN itprank r
ON r.item_number = i.identifier
WHERE mainprice > 1
LIMIT 3
That query will return the specified resultset, with one significant difference. The original query shows an INNER JOIN to the itpitnam table. That means that a row will be returned only if there is a matching row in the itpitnam table. The query above, however, emulates an OUTER JOIN: it will return a row even when there is no matching row found in itpitnam.
UPDATE
For best performance of those correlated subqueries, you'll want an appropriate index available:
... ON itpitnam (item_number, type_code, name)
That index is most appropriate because it's a "covering index": the query can be satisfied entirely from the index without referencing data pages in the underlying table. There's also an equality predicate on the leading column and an ORDER BY on the next two columns, so it avoids a "sort" operation.
--
If you have a guarantee that either the type_code or name column in the itpitnam table is NOT NULL, you can add a predicate to eliminate the rows that are "missing" a matching row, e.g.
HAVING artist IS NOT NULL
(Adding that will likely have an impact on performance.) Absent that kind of guarantee, you'd need to add an INNER JOIN or a predicate that tests for the existence of a matching row, to get an INNER JOIN behavior.
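The behavior of the correlated subqueries, including the OUTER JOIN-like result for items without a match, can be seen in a cut-down sketch. Assumptions: SQLite via Python's sqlite3 replaces MySQL, the tables are reduced to the few columns needed, the index name itpitnam_ix is made up, and the rows are the dummy values from the question plus an item 8 that has no itpitnam row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE itpitems (identifier INTEGER PRIMARY KEY);
    CREATE TABLE itpitnam (item_number INTEGER, name TEXT, type_code TEXT);
    -- Hypothetical index name; column order follows the suggestion above.
    CREATE INDEX itpitnam_ix ON itpitnam (item_number, type_code, name);
    INSERT INTO itpitems VALUES (2), (7), (8);
    INSERT INTO itpitnam VALUES (2, 'Joe', 'A'), (2, 'Amy', 'R'), (7, 'Mike', 'B');
    """
)

# Both subqueries use the same ORDER BY, so they pick name and type_code
# from the same itpitnam row for each item.
one_row_per_item = conn.execute(
    """
    SELECT i.identifier,
           (SELECT n1.name
              FROM itpitnam n1
             WHERE n1.item_number = i.identifier
             ORDER BY n1.type_code, n1.name
             LIMIT 1) AS artist,
           (SELECT n2.type_code
              FROM itpitnam n2
             WHERE n2.item_number = i.identifier
             ORDER BY n2.type_code, n2.name
             LIMIT 1) AS type_code
    FROM itpitems i
    ORDER BY i.identifier
    """
).fetchall()

print(one_row_per_item)
```

Item 2 appears once (with Joe, whose type_code sorts first), and item 8 is still returned with NULL (None in Python) for both artist and type_code, which is the outer-join-like difference described above.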

SELECT a.*,
b.overall_ranking,
c.name AS artist,
c.type_code
FROM itpitems a
INNER JOIN itprank b
ON b.item_number = a.identifier
INNER JOIN itpitnam c
ON b.item_number = c.item_number
INNER JOIN
(
SELECT item_number, MAX(type_code) code
FROM itpitnam
GROUP BY item_number
) d ON c.item_number = d.item_number AND
c.type_code = d.code
WHERE mainprice > 1
LIMIT 3
Follow-up question: can you please post the table schema and explain how the tables are related to each other? Then I will know which columns to link.

How to left join or inner join a table itself

I have this data in a table, for instance,
id name parent parent_id
1 add self 100
2 manage null 100
3 add 10 200
4 manage null 200
5 add 20 300
6 manage null 300
How can I left join or inner join this table itself so I get this result below?
id name parent
2 manage self
4 manage 10
6 manage 20
As you can see, I just want to query the rows with the keyword 'manage', but I want the parent column's data from the add row to appear in the manage row in the result.
Is it possible?
EDIT:
the simplified version of my actual table - system,
system_id parent_id type function_name name main_parent make_accessible sort
31 30 left main Main NULL 0 1
32 31 left page_main_add Add self 0 1
33 31 left page_main_manage Manage NULL 0 2
my actual query and it is quite messy already...
SELECT
a.system_id,
a.main_parent,
b.name,
b.make_accessible,
b.sort
FROM system AS a
INNER JOIN -- self --
(
SELECT system_id, name, make_accessible, sort
FROM system AS s2
LEFT JOIN -- search --
(
SELECT system_id AS parent_id
FROM system AS s1
WHERE s1.function_name = 'page'
) AS s1
ON s1.parent_id = s2.parent_id
WHERE s2.parent_id = s1.parent_id
AND s2.system_id != s1.parent_id
ORDER BY s2.sort ASC
) b
ON b.system_id = a.parent_id
WHERE a.function_name LIKE '%manage%'
ORDER BY b.sort ASC
result I get currently,
system_id main_parent name make_accessible sort
33 NULL Main 0 1
but I am after this,
system_id main_parent name make_accessible sort
33 self Main 0 1

You just need to reference the table twice:
select t1.id, t1.name, t2.id, t2.name
from TableA t1
inner join TableA t2
on t1.parent_id = t2.Id
Replace inner with left join if you want to see roots in the list.
UPDATE:
I misread your question. It seems to me that you always have two rows, manage one and add one. To get to "Add" from manage:
select system.*, (select parent
from system s2
where s2.parent_id = system.parent_id
and s2.name = 'add')
AS parent
from system
where name = 'manage'
Or, you might split the table into two derived tables and join them by parent_id:
select *
from system
inner join
(
select * from system where name = 'add'
) s2
on system.parent_id = s2.parent_id
where system.name = 'manage'
This will allow you to use all the columns from s2.

Your data does not abide by a child-parent hierarchical structure. For example, your column parent holds the value 10, which is not the value of any id, so a child-parent association is not possible.
In other words, there's nothing that relates the record 2,manage,null to the record 1,add,self, or the record 4,manage,null to 3,add,10, as you intend to do in your query.
To represent hierarchical data, you usually need a table that has a foreign key referencing its own primary key. So your column parent must reference the column id; then you can express a child-parent relationship between manage and add. Currently, that's not possible.

UPDATED: Joining by parent_id, try:
select m.id, m.name, a.parent
from myTable m
join myTable a on m.parent_id = a.parent_id and a.name = 'add'
where m.name = 'manage'
Change the inner join to a left join if there may not be a corresponding add row.
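To see the parent_id self-join against the six sample rows from the question, here is a minimal sketch. It assumes SQLite (Python's sqlite3) in place of MySQL and stores the parent column as text, since the sample mixes 'self' with numbers; both choices are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER, name TEXT, parent TEXT, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        (1, "add", "self", 100),
        (2, "manage", None, 100),
        (3, "add", "10", 200),
        (4, "manage", None, 200),
        (5, "add", "20", 300),
        (6, "manage", None, 300),
    ],
)

# m is the 'manage' row; a is the 'add' row that shares its parent_id.
manage_rows = conn.execute(
    """
    SELECT m.id, m.name, a.parent
    FROM myTable m
    JOIN myTable a
        ON m.parent_id = a.parent_id
       AND a.name = 'add'
    WHERE m.name = 'manage'
    ORDER BY m.id
    """
).fetchall()

print(manage_rows)
```

The three manage rows come back with the parent value borrowed from the add row in the same parent_id group, matching the result the question asks for.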
