Count the number of times words appear in a text - mysql

I am trying to find the words that appear most often in different articles.
For example :
Table : Articles
Id Article
1 <b>Une santé digitale au plus près des besoins des patients et des soignants ? Direction Medidays </b> <u> <br> </u><br/>Paris, le mercredi 29 mai 2019 – Si l’on en croit l’ensemble des programmes de santé publique et tous les projets publics et privés dédiés à l’organisation des soins, les outils digitaux seront demain incontournables pour faciliter la pratique des professionnels de santé et améliorer le quotidien des patients. Pourtant, aujourd’hui, un nombre non négligeable des outils qui ont déjà été développés ne se différencient guère de gadgets au pire ou ne présentent pas de valeur ajoutée fondamentale par rapport aux systèmes classiques au mieux. <br/><b>Quarante-huit heures d’effervescence</b><br/>Inclure les professionnels de santé et les représentants de patients dans la conception des projets digitaux est sans doute la voie à suivre pour corriger cet écueil. Aussi, étaient-ils des participants de premier plan lors des Medidays, premier hackaton e-santé organisé par l’Assistance publique – hôpitaux de Paris (AP)-(HP) et Doctolib le week-end dernier. Pendant quarante-huit heures, dans une belle effervescence, vingt-deux équipes comptant des professionnels de santé, des cadres de santé, des patients, des développeurs, des designers ou encore des graphistes ont travaillé sans relâche pour présenter à un jury de spécialistes des projets innovants mais également adaptés à la pratique quotidienne. <br/><b>De la dépression du post partum au coaching des infirmières hospitalières</b><br/>Cinq programmes sur les trente-cinq présentés ont retenu l’attention. Ils ont tous en commun de promouvoir une amélioration directe de la prise en charge des patients ou de la vie pratique des professionnels de santé. Ainsi, « <i>Docteur Simone</i> » est une application proposée par Anne-Charlotte Dimmy pour améliorer la prévention de la dépression post-partum. 
« Chat marche » imaginée par Flavien Quijoux promet grâce à un système de reconnaissance d’image de lutter contre la chute des personnes âgées. Quant à « <i>Post hop</i> », coup de cœur de l’AP-HP présentée par Romain Laurent, elle est dédiée à la rééducation améliorée après chirurgie. <br/>Du côté de l’amélioration de la vie pratique des professionnels de santé, deux applications ont été saluées : Supply Med, dessinée par Rubin Soudry, une marketplace digitale dédiée aux fournitures médicales dentaires et Coach My Nurse, programme de coaching destiné aux infirmières hospitalières produite par Martin Louvel. L’ensemble de ces applications bénéficieront de soutiens technologiques afin d’assurer leur développement. « <i>Nous sommes heureux de pouvoir faire émerger et d’accompagner des projets qui permettront, demain, de contribuer à la transformation du système de santé. En 48 heures, des premières solutions extrêmement prometteuses ont émergé. C’est la preuve que lorsque plusieurs acteurs de la santé se mettent en commun pour réfléchir au futur de la santé en France, des projets utiles et innovants peuvent voir le jour. Chez Doctolib, nous sommes très fiers d’avoir rendu cela possible</i> », a observé Stanislas Niox-Chateau, co-fondateur et président de Doctolib et membre du jury de la 1ère édition de Medidays. <br/> <b>Léa Crébat </b> </p>
I use this SQL query:
select DISTINCT val, cnt as result from(
select (substring_index(substring_index(t.article, ' ', n.n), ' ', -1)) val,count(*) as cnt
from articles t cross join(
select a.n + b.n * 10 + 1 n
from
(select 0 as n union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9) a,
(select 0 as n union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9) b
order by n
) n
where n.n <= 1 + (length(t.article) - length(replace(t.article, ' ', '')))
AND (substring_index(substring_index(t.article, ' ', n.n), ' ', -1)) NOT REGEXP '^[0-9]+$'
AND (substring_index(substring_index(t.article, ' ', n.n), ' ', -1)) > ''
group by val
order by cnt desc
) as x
ORDER BY `result` DESC LIMIT 5
At the moment I get:
val result
des 8
de 4
et 4
santé 3
au 2
But I think there is a problem because if I search by hand in the article, I see that "des" appears 26 times, "de" appears 34 times, "et" appears 11 times, "santé" 12 times and "au" appears 5 times.
How can I get the exact number of times each word appears in the text?

You are only counting among the first 100 words. You can extend this to 1000:
(select a.n + b.n * 10 + c.n * 100 + 1 as n
from (select 0 as n union all select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
) a cross join
(select 0 as n union all select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
) b cross join
(select 0 as n union all select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
) c
) n
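To see why the size of the numbers table matters, here is a minimal Python sketch of the same idea. It is not the MySQL query itself: `count_words` and the sample text are hypothetical, and `max_words` plays the role of the largest `n` the numbers table can produce.

```python
import re
from collections import Counter

def count_words(text, max_words):
    """Count space-separated tokens among the first max_words positions only.

    max_words stands in for the size of the numbers table in the SQL above:
    positions beyond it are never inspected.
    """
    tokens = text.split(" ")[:max_words]
    # Skip empty tokens (double spaces) and purely numeric ones,
    # roughly like the two extra WHERE conditions in the question.
    return Counter(t for t in tokens if t and not re.fullmatch(r"[0-9]+", t))

# Hypothetical text: 300 words, with "des" at every third position.
text = " ".join(["des", "soins", "patients"] * 100)

print(count_words(text, 100)["des"])   # only the first 100 positions are seen
print(count_words(text, 1000)["des"])  # every position is seen
```

With a cap of 100 the count stops growing once the text is longer than 100 words, which matches the undercounting described in the question.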

I had to use REPLACE() to strip the HTML tags and some punctuation before splitting:
SELECT val, count(*)
FROM (
SELECT
DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(message, ' ', n.digit + m.digit*10 + o.digit*100 + p.digit*1000 +1), ' ', -1) val,
n.digit + m.digit*10 + o.digit*100 + p.digit*1000 as word
FROM
(SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(message, '<b>', ' '), '</p>', ' '), '</b>', ' '), '<br/>', ''), '<br>', ''), ',', ' '), '<i>', ' '), '</i>', ' '), '.', ' '), '<u>', ' '), '</u>', ' ') as message
FROM Table1
) as Table1
CROSS JOIN (SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) n
CROSS JOIN (SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) m
CROSS JOIN (SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) o
CROSS JOIN (SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) p
ON n.digit + m.digit*10 + o.digit*100 + p.digit*1000 <= LENGTH(message) - LENGTH(REPLACE(message, ' ', ''))
) as T
WHERE val <> ' '
GROUP BY val
ORDER BY COUNT(*) DESC

Related

LEFT JOIN to return only first row

I am making a join with two tables, tab_usuarios (users) and tab_enderecos (address).
tab_usuarios structure:
id_usuario  nome           usuario
1           Administrador  admin
2           Novo Usuário   teste
3           Joao Silva     jao
tab_enderecos structure:
id_endereco  id_usuario  cidade    uf
2            1           cidade    SP
20           2           Lorena    SP
22           2           Lorena    SP
24           3           Campinas  SP
28           4           Lorena    SP
I have this simple query which brings me the following result:
Select
u.id_usuario,
u.usuario,
u.nome,
e.id_endereco,
e.cidade,
e.uf
From
tab_usuarios u Left Join
tab_enderecos e On u.id_usuario = e.id_usuario
id_usuario  usuario  nome           id_endereco  cidade    uf
1           admin    Administrador  2            cidade    SP
2           user 2   Novo Usuário   22           Lorena    SP
2           user 2   Novo Usuário   20           Lorena    SP
3           jao      Joao Silva     24           Campinas  SP
4           teste    fabio          28           Lorena    SP
What I want is, for example, for id_usuario = 2, to return only id_endereco = 20, which is the first address that was inserted into the database.
I tried with MIN and a couple of other approaches.
This should do it, assuming you have MySQL 8.0 and not some ancient 5.x version:
SELECT *
FROM (
SELECT u.id_usuario, u.usuario, u.nome, e.id_endereco, e.cidade, e.uf,
row_number() over (partition by u.id_usuario order by e.id_endereco) rn
FROM tab_usuarios u
LEFT JOIN tab_enderecos e On u.id_usuario = e.id_usuario
) t
WHERE rn = 1
See it work here:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c506baf8157f82390bb335d074e7614c
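If window functions are not available (MySQL 5.x), one alternative is to join against a derived table holding the smallest id_endereco per user. The sketch below runs that query against an in-memory SQLite database from Python, purely as a stand-in for MySQL, using the sample rows from the question. It assumes, like the answer above, that the lowest id_endereco is the first address inserted; the alias `first_id` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab_usuarios (id_usuario INTEGER PRIMARY KEY, nome TEXT, usuario TEXT);
CREATE TABLE tab_enderecos (id_endereco INTEGER PRIMARY KEY, id_usuario INTEGER,
                            cidade TEXT, uf TEXT);
INSERT INTO tab_usuarios VALUES
  (1, 'Administrador', 'admin'), (2, 'Novo Usuário', 'teste'), (3, 'Joao Silva', 'jao');
INSERT INTO tab_enderecos VALUES
  (2, 1, 'cidade', 'SP'), (20, 2, 'Lorena', 'SP'), (22, 2, 'Lorena', 'SP'),
  (24, 3, 'Campinas', 'SP'), (28, 4, 'Lorena', 'SP');
""")

# Derived table f holds one row per user: the smallest id_endereco.
# Assumption: the smallest id_endereco is the first address inserted.
FIRST_ADDRESS_SQL = """
SELECT u.id_usuario, u.usuario, u.nome, e.id_endereco, e.cidade, e.uf
FROM tab_usuarios u
LEFT JOIN (
    SELECT id_usuario, MIN(id_endereco) AS first_id
    FROM tab_enderecos
    GROUP BY id_usuario
) f ON f.id_usuario = u.id_usuario
LEFT JOIN tab_enderecos e ON e.id_endereco = f.first_id
ORDER BY u.id_usuario
"""

rows = conn.execute(FIRST_ADDRESS_SQL).fetchall()
for row in rows:
    print(row)
```

Both joins are LEFT JOINs, so a user without any address would still appear, with NULLs in the address columns.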

Mysql - choose main item in the Group By

I have the following query:
SELECT * FROM (
SELECT codigo, protocolo, status, nome
FROM protocolo
GROUP BY protocolo.protocolo
UNION ALL
SELECT codigo, protocolo, status, nome
FROM simulador
) tabela
It returns:
codigo protocolo status nome
559 2016000026 1 ALESSANDRO CAMPOS BONIFACIO
0 2016000026 0 ALESSANDRO CAMPOS BONIFACIO
0 2016000008 0 MARIA DE JESUS F. DA SILVA ***
0 2016000007 0 MARGARIDA BORGES DA SILVA
558 2016000008 1 MARIA DE JESUS F. DA SILVA ***
556 2015014035 1 MARIA DALVA DA SILVA
There are rows with the same protocolo (for example 2016000008) but different status (0 and 1). I want to display only one row per repeated protocolo, the one that has status = 1.
Is this what you want?
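SELECT MAX(codigo) AS codigo, protocolo, MAX(status) AS stat, nome
FROM (
SELECT codigo, protocolo, status, nome
FROM protocolo
GROUP BY protocolo.protocolo
UNION ALL
SELECT codigo, protocolo, status, nome
FROM simulador
) tabela
-- group by protocolo (not codigo), so rows that differ only in codigo/status collapse into one;
-- MAX(codigo) assumes the status = 1 row carries the non-zero codigo, as in the sample data
GROUP BY protocolo, nome;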
Note: In a GROUP BY query, all columns in the SELECT should be either in the GROUP BY or in aggregation functions, unless you really, really know what you are doing.
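A small sketch of collapsing the duplicates, run against SQLite from Python rather than MySQL. The table `tabela` here is a hypothetical stand-in for the result of the UNION ALL, filled with the rows shown in the question. It groups by protocolo and nome only, and assumes (as in the sample data) that the status = 1 row is also the one with the non-zero codigo, so MAX picks both from the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tabela (codigo INTEGER, protocolo TEXT, status INTEGER, nome TEXT)"
)
conn.executemany(
    "INSERT INTO tabela VALUES (?, ?, ?, ?)",
    [
        (559, "2016000026", 1, "ALESSANDRO CAMPOS BONIFACIO"),
        (0, "2016000026", 0, "ALESSANDRO CAMPOS BONIFACIO"),
        (0, "2016000008", 0, "MARIA DE JESUS F. DA SILVA ***"),
        (0, "2016000007", 0, "MARGARIDA BORGES DA SILVA"),
        (558, "2016000008", 1, "MARIA DE JESUS F. DA SILVA ***"),
        (556, "2015014035", 1, "MARIA DALVA DA SILVA"),
    ],
)

# codigo and status are aggregated; only protocolo and nome are grouped on,
# so the two rows of a repeated protocolo end up in the same group.
rows = conn.execute("""
    SELECT MAX(codigo) AS codigo, protocolo, MAX(status) AS status, nome
    FROM tabela
    GROUP BY protocolo, nome
    ORDER BY protocolo
""").fetchall()

for row in rows:
    print(row)
```

A protocolo that only exists with status = 0 (2016000007 here) is still returned, just with status 0.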

mySql query giving result in wrong order

I am using this mysql query to fetch data from DB
SELECT DISTINCT CONCAT( streetObj.street_type, ' ',streetObj.street_name, ', ', neighborhoodObj.name , ', ', cityObj.name, ', ', stateObj.abbreviation ) namet
FROM street streetObj
LEFT JOIN cep cepObj1
ON cepObj1.street_id = streetObj.street_id
LEFT JOIN neighborhood neighborhoodObj
ON neighborhoodObj.neighborhood_id = cepObj1.start_neighborhood_id
LEFT JOIN city cityObj
ON streetObj.city_id = cityObj.city_id
LEFT JOIN state stateObj
ON stateObj.state_id = cityObj.state_id
WHERE CONCAT(streetObj.street_type,streetObj.street_name) LIKE '%rua%'
AND CONCAT(streetObj.street_type,streetObj.street_name) LIKE '%Gomes%'
AND CONCAT(streetObj.street_type,streetObj.street_name) LIKE '%de%'
AND CONCAT(streetObj.street_type,streetObj.street_name) LIKE '%Ca%'
AND cityObj.city_id = '9668'
ORDER BY namet ASC
LIMIT 10;
This query is executed when I type
rua Gomes de Ca
And this query result is this
Rua Baltazar Gomes de Alarcão, Jardim Miriam, São ...
Rua Cabo José Gomes de Barros, Conjunto Habitacion...
Rua Cabo Luís Gomes de Quevedo, Parque Novo Mundo,...
Rua Gomes de Carvalho, Vila Olímpia, São Paulo, SP
Rua João Gomes de Mendonça, Jaraguá, São Paulo, SP
Rua João Gomes de Mendonça, Jardim Taipas, São Pau...
Rua Pedro Gomes de Camargo, Vila Rio Branco, São P...
As you can see, I want the results that match the search phrase exactly to appear at the top, but that is not happening.
In this query i want
Rua Gomes de Carvalho, Vila Olímpia, São Paulo, SP
on top position.
You need to rank the results by the strength of the match, and sort by that. You will have to define the sort yourself. For example:
select ..
from...
ORDER BY
case
when text like "%all my search phrase%" then 1
when text like "%all my%" then 2
when text like "%search phrase%" then 2
when text like "%phrase%" then 3
else 1000 end
ASC
or
ORDER BY
case when text like "%word%" then 1 else 0 end
+
case when text like "%second_word%" then 1 else 0 end
+
.....
DESC
Specifically for your example
select namet from
(select 'Rua Baltazar Gomes de Alarcão, Jardim Miriam, São ...' as namet
union all select 'Rua Cabo José Gomes de Barros, Conjunto Habitacion...'
union all select 'Rua Cabo Luís Gomes de Quevedo, Parque Novo Mundo,...'
union all select 'Rua Gomes de Carvalho, Vila Olímpia, São Paulo, SP'
union all select 'Rua João Gomes de Mendonça, Jaraguá, São Paulo, SP'
union all select 'Rua João Gomes de Mendonça, Jardim Taipas, São Pau...'
union all select 'Rua Pedro Gomes de Camargo, Vila Rio Branco, São P...')tbl
order by
case when namet like "%rua gomes de ca%" then 100 else 0 end+ #high score for full match
case when namet like "%rua%" then 1 else 0 end+ #lower score for partial matches
case when namet like "%Gomes%" then 1 else 0 end+
case when namet like "%de%" then 1 else 0 end+
case when namet like "%ca%" then 1 else 0 end desc LIMIT 10
You probably want to write something that splits your search phrase into words, searches for every word, and ranks on the number of words matched. You could also look into SOUNDEX or Levenshtein distance for ranking similarity. Doing this in SQL, though, is harder than doing it programmatically.
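As a rough illustration of doing the ranking programmatically, here is a minimal Python sketch. The function names `score` and `rank` and the weights (100 for the whole phrase, 1 per word) are assumptions chosen to mirror the ORDER BY above, not a fixed recipe; the candidate strings are the truncated results from the question.

```python
def score(candidate, phrase):
    """Score a candidate: +100 if it contains the whole phrase, +1 per matched word."""
    text = candidate.lower()
    phrase = phrase.lower()
    points = 100 if phrase in text else 0
    points += sum(1 for word in phrase.split() if word in text)
    return points

def rank(candidates, phrase):
    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(candidates, key=lambda c: score(c, phrase), reverse=True)

candidates = [
    "Rua Baltazar Gomes de Alarcão, Jardim Miriam, São ...",
    "Rua Cabo José Gomes de Barros, Conjunto Habitacion...",
    "Rua Cabo Luís Gomes de Quevedo, Parque Novo Mundo,...",
    "Rua Gomes de Carvalho, Vila Olímpia, São Paulo, SP",
    "Rua João Gomes de Mendonça, Jaraguá, São Paulo, SP",
    "Rua João Gomes de Mendonça, Jardim Taipas, São Pau...",
    "Rua Pedro Gomes de Camargo, Vila Rio Branco, São P...",
]

ranked = rank(candidates, "rua Gomes de Ca")
print(ranked[0])
```

Only the Gomes de Carvalho entry contains the phrase as one contiguous string, so it is the only one that receives the 100-point bonus.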

MySql Combinate queries

I have a table with the records of different defects in a company. The table is something like this
ITMNBR Defect Reference_Designator RepairCenter
8800RTO001700 Componente / Placa abierto U1U2 FG
8800HIB001075V Componente Equivocado (NumeroParte) R53 SB
8800HIB001075V Ensamble Incorrecto (produccion) R19 SB
8800RTO000400 Componente / Placa abierto U1 SB
8800RTO003200 Componente Polaridad Inversa ZD2 SB
8800HIB001048 NO SOLDADURA T1 SB
8800HIB001048 Componente / Placa abierto U2 SB
8800HIB001048 Componente / Placa abierto U2 SB
Etc.
I want to query only the three most frequent manufacturing defects. I wrote this:
SELECT defect, COUNT(*) FROM reportefallas WHERE RepairCenter ='SB'
AND (CREADT BETWEEN NOW() - INTERVAL 7 DAY AND NOW()) #Select the Dates
AND (Defect IN ('Componente / Placa dañada X alto voltaje','Pin / Patita Quebrado','Componente / Placa Quemada','Componente Defecto Cosmetico','Falla no Duplicada','Soldadura Crackeada','Soldadura Fria','Parametros Incorrectos en la torre','Parametros Incorrectos en el dibujo','Componente dañado fisicamente','Conector mal colocado (inclinado)','Tornillo / Rondana Suelto','Pista Levantada (dañada)','Componente Ausente','Soldadura Derretida','Componente Equivocado (NumeroParte)','NO SOLDADURA','Componente/Placa no programada','Conector mal ensamblado','No se encontro problema','Tornillo / Rondana Flojo','Componente / Placa abierto','Pin Hole','Pin / Pata levantada (no Soldadura)','Componente Polaridad Inversa','Puente de Soldadura','Componente Desfasado Pad','Componente / Placa en corto','Splash de Soldadura','LEDs con VF diferente / equivocado','LEDs con VF alto','LEDs con VF bajo','Ensamble Incorrecto (produccion)','Componente posicion Equivocada (referencia)','Cable ensamblado posicion incorrecta'))
GROUP BY defect
ORDER BY COUNT(*) DESC
LIMIT 3;
And I get the following result:
Defect COUNT(*)
Componente/ Placa abierto 5
Componente / Placa dañada X alto voltaje 4
Componente dañado fisicamente 3
Now I need a query on the same table that returns every row whose defect is one of the three most frequent defects I just obtained. This is the result that I want:
ITMNBR Defect Reference_Designator
8800ITH001700 Componente / Placa abierto F2-U1(SHORT)-U2(SHORT)
8800ITH001700 Componente / Placa abierto F2-U1(SHORT)-U2(SHORT)
8800ITH001700 Componente / Placa abierto F2-R29-R22-R19-R32-R13-U1(SHORT)-U2(SHORT)
8800ITH001700 Componente / Placa abierto F2-R29-R22-R19-R32-R13-U1(SHORT)-U2(SHORT)
8800ITH001700 Componente / Placa abierto F2
8850HZL0015EX Componente / Placa dañada X alto voltaje C6-C7
8800HIB001084 Componente / Placa dañada X alto voltaje R7-C20-MOV1
8850HIB004205 Componente / Placa dañada X alto voltaje C21-C42
8800HIB004220 Componente / Placa dañada X alto voltaje R22 SWITH-R44 SWITH
8850HIB004206 Componente dañado fisicamente C42
8850HIB004202 Componente dañado fisicamente F1
8800HIB0131EX Componente dañado fisicamente R37
I tried the code below, but it doesn’t accept the LIMIT.
SELECT ITMNBR, Defect, Reference_Designator FROM reportefallas
WHERE Defect IN (SELECT defect FROM reportefallas WHERE RepairCenter='SB'
AND(CREADT BETWEEN NOW() - INTERVAL 7 DAY AND NOW()) AND (Defect IN ('Componente / Placa dañada X alto voltaje','Pin / Patita Quebrado','Componente / Placa Quemada','Componente Defecto Cosmetico','Falla no Duplicada','Soldadura Crackeada','Soldadura Fria','Parametros Incorrectos en la torre','Parametros Incorrectos en el dibujo','Componente dañado fisicamente','Conector mal colocado (inclinado)','Tornillo / Rondana Suelto','Pista Levantada (dañada)','Componente Ausente','Soldadura Derretida','Componente Equivocado (NumeroParte)','NO SOLDADURA','Componente/Placa no programada','Conector mal ensamblado','No se encontro problema','Tornillo / Rondana Flojo','Componente / Placa abierto','Pin Hole','Pin / Pata levantada (no Soldadura)','Componente Polaridad Inversa','Puente de Soldadura','Componente Desfasado Pad','Componente / Placa en corto','Splash de Soldadura','LEDs con VF diferente / equivocado','LEDs con VF alto','LEDs con VF bajo','Ensamble Incorrecto (produccion)','Componente posicion Equivocada (referencia)','Cable ensamblado posicion incorrecta'))
GROUP BY defect
ORDER BY COUNT(*) DESC
LIMIT 3)
Does anyone have any ideas?
Sorry for the Spanglish and the bad English, I hope you can understand.
There are several options. Previous questions on this topic have suggested using JOIN to trim down your result set instead of IN, which would look something like this:
SELECT rf.ITMNBR, rf.Defect, rf.Reference_Designator
FROM (SELECT defect FROM reportefallas WHERE RepairCenter='SB'
AND(CREADT BETWEEN NOW() - INTERVAL 7 DAY AND NOW()) AND (Defect IN ('Componente / Placa dañada X alto voltaje','Pin / Patita Quebrado','Componente / Placa Quemada','Componente Defecto Cosmetico','Falla no Duplicada','Soldadura Crackeada','Soldadura Fria','Parametros Incorrectos en la torre','Parametros Incorrectos en el dibujo','Componente dañado fisicamente','Conector mal colocado (inclinado)','Tornillo / Rondana Suelto','Pista Levantada (dañada)','Componente Ausente','Soldadura Derretida','Componente Equivocado (NumeroParte)','NO SOLDADURA','Componente/Placa no programada','Conector mal ensamblado','No se encontro problema','Tornillo / Rondana Flojo','Componente / Placa abierto','Pin Hole','Pin / Pata levantada (no Soldadura)','Componente Polaridad Inversa','Puente de Soldadura','Componente Desfasado Pad','Componente / Placa en corto','Splash de Soldadura','LEDs con VF diferente / equivocado','LEDs con VF alto','LEDs con VF bajo','Ensamble Incorrecto (produccion)','Componente posicion Equivocada (referencia)','Cable ensamblado posicion incorrecta'))
GROUP BY defect
ORDER BY COUNT(*) DESC # sort in descending order
LIMIT 3) AS subquery
JOIN
reportefallas AS rf USING (Defect)
Alternatively, you could create a separate table to track the three most common defects, and periodically update that table (e.g. via a cron job). Then you would SELECT ... WHERE Defect IN this other table.
Either of these methods could provide better performance, depending on the situation. If you try one and have poor performance, try the other and see if it's an improvement.
(For that matter, you could also store that enormous list of defects in another table, to make your query cleaner.)
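The JOIN variant can be tried on a small scale as follows. This sketch uses Python's sqlite3 as a stand-in for MySQL, with made-up rows and without the date filter or the long IN list. Note that SQLite happens to accept LIMIT inside an IN subquery, so it only illustrates the shape of the join, not the MySQL restriction. The alias `top_defects` and the ITEM/R values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reportefallas "
    "(ITMNBR TEXT, Defect TEXT, Reference_Designator TEXT, RepairCenter TEXT)"
)

# Hypothetical data: four defects occurring 4, 3, 2 and 1 times.
counts = {
    "Componente / Placa abierto": 4,
    "NO SOLDADURA": 3,
    "Pin Hole": 2,
    "Soldadura Fria": 1,
}
data = [
    (f"ITEM{i}", defect, f"R{i}", "SB")
    for defect, n in counts.items()
    for i in range(n)
]
conn.executemany("INSERT INTO reportefallas VALUES (?, ?, ?, ?)", data)

# The derived table picks the three most frequent defects;
# the join then pulls every row that has one of them.
result = conn.execute("""
    SELECT rf.ITMNBR, rf.Defect, rf.Reference_Designator
    FROM (
        SELECT Defect
        FROM reportefallas
        WHERE RepairCenter = 'SB'
        GROUP BY Defect
        ORDER BY COUNT(*) DESC
        LIMIT 3
    ) AS top_defects
    JOIN reportefallas AS rf USING (Defect)
""").fetchall()

print(len(result))
```

If two defects tie for third place, which one LIMIT 3 keeps is not defined unless the ORDER BY gets a tie-breaker.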
Just like AirThomas said, you can use a subquery. You should also be able to do a simple SELECT inside your IN() instead of listing out each defect individually. This is another way to write the subquery, though:
SELECT rf.ITMNBR, rf.Defect, rf.Reference_Designator
FROM(
SELECT defect, COUNT(*) as top_three FROM reportefallas WHERE RepairCenter ='SB'
AND (CREADT BETWEEN NOW() - INTERVAL 7 DAY AND NOW()) -- Select the Dates
AND (Defect IN (SELECT defect from reportefallas))
GROUP BY defect
ORDER BY top_three DESC
LIMIT 3
)as t
JOIN reportefallas rf ON rf.Defect = t.defect

SQL extract data after character change MySQL

I have the following data in a table, the column name is title:
Acqua Di Parma Blu Mediterraneo Arancia Di Capri Scented Water EDT
Acqua Di Parma Blu Mediterraneo Arancia
Acqua Di Parma Blu Mediterraneo Bergamotto Di Calabria
Acqua Di Parma Blu Mediterraneo Cipresso Di Toscana Scented Water EDT
Acqua di Parma Blu Mediterraneo fico di amalfi
Acqua Di Parma Blu Mediterraneo Fico di Amalfi Scented Water EDT
Acqua Di Parma Blu Mediterraneo Mirto di Panarea
Acqua Di Parma Blu Mediterraneo Mirto di Panarea Scented Water EDT
Acqua Di Parma Blu Meditteraneo Cipresso
Acqua Di Parma Colonia Assoluta Bath
Acqua Di Parma Colonia Assoluta
Acqua Di Parma Colonia Body Cream
Acqua Di Parma Colonia Body Cream Tube
Adidas Deep Energy
Adidas Dynamic Pulse
Adidas Fair Play
As you can see, these are all variations of Acqua Di Parma Blu Mediterraneo and Adidas products.
Is there a way to read the data letter by letter and, when the next letter does not appear more than, say, 3 times, return what comes before the letter change?
Basically, I want to read this list and return only:
Acqua Di Parma Blu Meditteraneo
Acqua Di Parma Colonia
Adidas Deep Energy
Adidas Dynamic Pulse
Adidas Fair Play
The whole table is about 70,000 rows all of similar data.
The table consists of row_id, title, category
Possible?
Many thanks
Darren
OK - this isn't pretty, and I'm not sure it's completely right, but it's the closest I could get.
I created a separate table containing each group of substrings, like this:
create table subs as
select title,
substring_index(title, ' ',1) one,
substring_index(title, ' ',2) two,
substring_index(title, ' ',3) three,
substring_index(title, ' ',4) four,
substring_index(title, ' ',5) five,
substring_index(title, ' ',6) six,
substring_index(title, ' ',7) seven
from title;
Then I created a query that checks, for each pair of adjacent columns, that the GROUP BY count of one column is greater than 1 (i.e. not unique), that the GROUP BY count of the next column is 1 (i.e. unique), and that the first column's value is a substring of the next. I unioned together the result of each pair of columns and finally did a SELECT DISTINCT across the whole lot:
select distinct brand from (
select * from
(select one brand, count(*) bcount
from subs
group by one) one,
(select two prod, count(*) pcount
from subs
group by two) two
where bcount > 1
and pcount=1
and locate(one.brand, two.prod)>0
union all
select * from
(select two brand, count(*) bcount
from subs
group by two) two,
(select three prod, count(*) pcount
from subs
group by three) three
where two.bcount > 1
and three.pcount=1
and locate(two.brand, three.prod)>0
union all
select * from
(select three brand, count(*) bcount
from subs
group by three) three,
(select four prod, count(*) pcount
from subs
group by four) four
where three.bcount > 1
and four.pcount=1
and locate(three.brand, four.prod)>0
union all
select * from
(select four brand, count(*) bcount
from subs
group by four) four,
(select five prod, count(*) pcount
from subs
group by five) five
where four.bcount > 1
and five.pcount=1
and locate(four.brand, five.prod)>0
union all
select * from
(select five brand, count(*) bcount
from subs
group by five) five,
(select six prod, count(*) pcount
from subs
group by six) six
where five.bcount > 1
and six.pcount=1
and locate(five.brand, six.prod)>0
union all
select * from
(select six brand, count(*) bcount
from subs
group by six) six,
(select seven prod, count(*) pcount
from subs
group by seven) seven
where six.bcount > 1
and seven.pcount=1
and locate(six.brand, seven.prod)>0) x
which results in the following
But it still has some problems: it shows both Acqua Di Parma Blu and Acqua Di Parma Medit.. on two lines instead of just once, so it's not correct.
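The same rule is easier to experiment with outside SQL. Below is a minimal Python sketch of the rule used in the query (a k-word prefix shared by more than one title, contained in a (k+1)-word prefix that is unique). The function names are hypothetical, and like the SQL it is only an approximation of what the question asks for.

```python
from collections import Counter

def prefix(title, k):
    # Like SUBSTRING_INDEX(title, ' ', k): the first k space-separated words,
    # or the whole title when it has fewer than k words.
    return " ".join(title.split(" ")[:k])

def shared_prefixes(titles, max_words=7):
    """Return k-word prefixes that repeat while some (k+1)-word prefix containing them is unique."""
    found = set()
    for k in range(1, max_words):
        shorter = Counter(prefix(t, k) for t in titles)
        longer = Counter(prefix(t, k + 1) for t in titles)
        unique_longer = [p for p, n in longer.items() if n == 1]
        for brand, n in shorter.items():
            # Mirrors: bcount > 1 AND pcount = 1 AND LOCATE(brand, prod) > 0
            if n > 1 and any(brand in p for p in unique_longer):
                found.add(brand)
    return sorted(found)

titles = ["Adidas Deep Energy", "Adidas Dynamic Pulse", "Adidas Fair Play"]
print(shared_prefixes(titles))
```

For the three Adidas titles this yields only the shared first word, which differs from the output the question asks for; the rule would need a minimum-count threshold or a different stopping condition to get closer.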