How to make MYSQL search for minimum 3 keyword matches? - mysql

Hi I have one big problem with one MYSQL search.
My database TABLE looks like this :
+------+----------+------+
| id | keywords | file |
+------+----------+------+
At keywords there are many keywords for each entry seperated with comas. (keyword1,keyword2...).
At PHP array there are listed some keywords (5-10).
And my search must get all DB entries which got atleast 3 from those keywords.
Its not required to got all of those words! But it can't work and with just one.
Can somebody help me with that query? I don't got good idea how to make it.

That's a challenge. The brute force method would be to use a UNION in a subquery with a count.
For example,
select id, file, count(*) from
(select distinct id, file
from file_table
where FIND_IN_SET(keyword1, keywords)
UNION ALL
select distinct id, file
from file_table
where FIND_IN_SET(keyword2, keywords)
UNION ALL
select distinct id, file
from file_table
where FIND_IN_SET(keyword3, keywords)
UNION ALL
select distinct id, file
from file_table
where FIND_IN_SET(keyword4, keywords)
.... MORE UNION ALL ....) as files
group by id, file
having count(*) >= 3
More efficiently, you could have a separate table with keywords and ID, one keyword/ID combo per row. This would eliminate the wildcard search and make the query more efficient.
The next step would be to go to something like ElasticSearch and filter on the score of the result.

If you had this setup:
table files:
+------+-------+
| id | file |
+------+-------+
| 1000 | foo |
| 1001 | bar |
+------+-------+
table keywords:
+----+-------+
| id | word |
+----+-------+
| 9 | lorem |
| 10 | ipsum |
+----+-------+
table filekeywords:
+----+--------+--------+
| id | fileid | wordid |
+----+--------+--------+
| 1 | 1000 | 9 |
| 2 | 1000 | 10 |
| 3 | 1001 | 10 |
+----+--------+--------+
You could find files with keywords lorem, ipsum, dolor like this:
SELECT COUNT(DISTINCT(k.word)), f.*
FROM files f
INNER JOIN filekeywords fk
ON fk.fileid = f.id
INNER JOIN keywords k
ON k.id = fk.wordid
WHERE k.word in ('lorem', 'ipsum', 'dolor')
GROUP BY f.id
HAVING COUNT(DISTINCT(k.word)) >= 3

Related

MySQL: export multiple tables to CSV file

I'm trying to export multiple MySQL tables into a single CSV file, these tables have different number of columns and have nothing in common. An example is below:
table1:
ID| Name
1 | Ted
2 | Marry
null| null
table2:
Married| Age | Phone
Y | 35 | No
N | 25 | Yes
Y | 45 | No
The result that I want to get is:
ID| Name | Married | Age | Phone
1 | Ted | Y | 35 | No
2 | Marry | N | 25 | Yes
null| null | Y | 45 | No
Is it possible to do using MySQL commands? I have tried all types of join but it doesn't give me the result that I need.
Try this query:
SELECT * FROM
(SELECT #n1:=#n1+1 as 'index', Table1.* FROM Table1, (select #n1:=0)t)t1
natural join (SELECT #n2:=#n2+1 as 'index', Table2.* FROM Table2, (select #n2:=0)t1)t2 ;
You can see in this fiddle a working example.
In the example we generate an index column for Table1 and Table2 and natural join the 2 tables.
This way the join is done using the row position as returned by the SELECT of tables without any ORDER operator and usually this is not a good idea!
I'm not quite sure you understand what you are asking but sure you can join two tables without really caring on which rows will be matched:
SET #i1=0;
SET #i2=0;
SELECT * INTO OUTFILE 'xyz' FIELDS TERMINATED BY ','
FROM (SELECT #i1:=#i1+1 as i1, table1.* FROM table1) AS a
JOIN (SELECT #i2:=#i2+1 as i2, table2.* FROM table2) b ON (a.i1=b.i2);

What is SQL to select a property and the max number of occurrences of a related property?

I have a table like this:
Table: p
+----------------+
| id | w_id |
+---------+------+
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 5 | 10 |
| 5 | 8 |
| 6 | 5 |
| 6 | 8 |
| 6 | 10 |
| 6 | 10 |
| 7 | 8 |
| 7 | 10 |
+----------------+
What is the best SQL to get the following result? :
+-----------------------------+
| id | most_used_w_id |
+---------+-------------------+
| 5 | 8 |
| 6 | 10 |
| 7 | 8 |
+-----------------------------+
In other words, to get, per id, the most frequent related w_id.
Note that on the example above, id 7 is related to 8 once and to 10 once.
So, either (7, 8) or (7, 10) will do as result. If it is not possible to
pick up one, then both (7, 8) and (7, 10) on result set will be ok.
I have come up with something like:
select counters2.p_id as id, counters2.w_id as most_used_w_id
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters2
join (
select p_id, max(count_of_w_ids) as max_counter_for_w_ids
from (
select p.id as p_id,
w_id,
count(w_id) as count_of_w_ids
from p
group by id, w_id
) as counters
group by p_id
) as p_max
on p_max.p_id = counters2.p_id
and p_max.max_counter_for_w_ids = counters2.count_of_w_ids
;
but I am not sure at all whether this is the best way to do it. And I had to repeat the same sub-query two times.
Any better solution?
Try to use User defined variables
select id,w_id
FROM
( select T.*,
if(#id<>id,1,0) as row,
#id:=id FROM
(
select id,W_id, Count(*) as cnt FROM p Group by ID,W_id
) as T,(SELECT #id:=0) as T1
ORDER BY id,cnt DESC
) as T2
WHERE Row=1
SQLFiddle demo
Formal SQL
In fact - your solution is correct in terms of normal SQL. Why? Because you have to stick with joining values from original data to grouped data. Thus, your query can not be simplified. MySQL allows to mix non-group columns and group function, but that's totally unreliable, so I will not recommend you to rely on that effect.
MySQL
Since you're using MySQL, you can use variables. I'm not a big fan of them, but for your case they may be used to simplify things:
SELECT
c.*,
IF(#id!=id, #i:=1, #i:=#i+1) AS num,
#id:=id AS gid
FROM
(SELECT id, w_id, COUNT(w_id) AS w_count
FROM t
GROUP BY id, w_id
ORDER BY id DESC, w_count DESC) AS c
CROSS JOIN (SELECT #i:=-1, #id:=-1) AS init
HAVING
num=1;
So for your data result will look like:
+------+------+---------+------+------+
| id | w_id | w_count | num | gid |
+------+------+---------+------+------+
| 7 | 8 | 1 | 1 | 7 |
| 6 | 10 | 2 | 1 | 6 |
| 5 | 8 | 3 | 1 | 5 |
+------+------+---------+------+------+
Thus, you've found your id and corresponding w_id. The idea is - to count rows and enumerate them, paying attention to the fact, that we're ordering them in subquery. So we need only first row (because it will represent data with highest count).
This may be replaced with single GROUP BY id - but, again, server is free to choose any row in that case (it will work because it will take first row, but documentation says nothing about that for common case).
One little nice thing about this is - you can select, for example, 2-nd by frequency or 3-rd, it's very flexible.
Performance
To increase performance, you can create index on (id, w_id) - obviously, it will be used for ordering and grouping records. But variables and HAVING, however, will produce line-by-line scan for set, derived by internal GROUP BY. It isn't such bad as it was with full scan of original data, but still it isn't good thing about doing this with variables. On the other hand, doing that with JOIN & subquery like in your query won't be much different, because of creating temporery table for subquery result set too.
But to be certain, you'll have to test. And keep in mind - you already have valid solution, which, by the way, isn't bound to DBMS-specific stuff and is good in terms of common SQL.
Try this query
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having max(ccc)
here is the sqlfidddle link
You can also use this code if you do not want to rely on the first record of non-grouping columns
select p_id, ccc , w_id from
(
select p.id as p_id,
w_id, count(w_id) ccc
from p
group by id,w_id order by id,ccc desc) xxx
group by p_id having ccc=max(ccc);

Join two tables using multiple rows in the join

I have two tables
Table: color_document
+----------+---------------------+
| color_id | document_id |
+----------+---------------------+
| 180907 | 4270851 |
| 180954 | 4270851 |
+----------+---------------------+
Table: color_group
+----------------+-----------+
| color_group_id | color_id |
+----------------+-----------+
| 3 | 180954 |
| 4 | 180907 |
| 11 | 180907 |
| 11 | 180984 |
| 12 | 180907 |
| 12 | 180954 |
+----------------+-----------+
Is it possible for a query to get a result that looks something like this using multiple color id's to join the two tables?
Result
+----------------+--------------+
| color_group_id | document_id |
+----------------+--------------+
| 12 | 4270851 |
+----------------+--------------+
Since Color Group 12 is the only group that has the exact same set of Colors that Document 4270851 has.
I've got some bad data that i'm being forced to work with so I've had to manufacture the color groups by finding each unique set of color_id's associated with document_id's. I'm trying to then create a new relationship directly between my manufactured color groups and documents.
I know I could probably do something with a GROUP_CONCAT to make a pseudo key of concatenated color ids, but I'm trying to find a solution that would also work in, say, Oracle. Am I barking up the completely wrong tree with this logic?
My ultimate goal is to be able to have a single row in a table that would represent any number of Colors that are associated with a Document to be exported to a completely different system than the one I'm working with.
Any thoughts/comments/suggestions are greatly appreciated.
Thank you in advance for looking at my question.
Do a normal join of the two tables, and count the number of rows in each pairing. Then test whether this is the same as the number of times each of the items appears in the original tables. If all are the same, then all color IDs must match.
SELECT a.color_group_id, a.document_id
FROM (
SELECT color_group_id, document_id, COUNT(*) ct
FROM color_document d
JOIN color_group g ON d.color_id = g.color_id
GROUP BY color_group_id, document_id) a
JOIN (
SELECT color_group_id, COUNT(*) ct
FROM color_group
GROUP BY color_group_id) b
ON a.color_group_id = b.color_group_id and a.ct = b.ct
JOIN (
SELECT document_id, COUNT(*) ct
FROM color_document
GROUP BY document_id) c
ON a.document_id = c.document_id and a.ct = c.ct
SQLFIDDLE
If i understand your question correct you just have to join the two tables and then group the results by color_group_id an document_id.
SQL Fiddle
select color_group_id, document_id
from
color_document cd join
color_group cg
on cd.color_id = cg.color_id
group by color_group_id, document_id
That query will give you this result set:
COLOR_GROUP_ID DOCUMENT_ID
3 4270851
4 4270851
11 4270851
12 4270851
Is that what you want?

Normalizing table LIMIT issue

I have 3 tables song, author, song_author.
Connection is song 1--* song_author 1--* author
I use a query like
SELECT *
FROM song
LEFT JOIN song_author ON s_id = sa_song
LEFT JOIN author ON sa_author = a_id
For the given example that would result in 3 rows:
John - How beautiful it is
John - Awesome
George - Awesome
and I am populating my objects like that so it's fine.
However, I want to add a LIMIT clause, but because it is returning several rows for just one song a LIMIT 10 doesn't always show 10 songs.
The other possibility I know is print all songs and then inside take a second query, but that would result in O(n) which I'd like to avoid.
author:
+------+--------+
| a_id | a_name |
+------+--------+
| 1 | John |
| 2 | George |
+------+--------+
song:
+------+---------------------+
| s_id | s_name |
+------+---------------------+
| 1 | How beautiful it is |
| 2 | Awesome |
+------+---------------------+
song_author:
+-----------+---------+
| sa_author | sa_song |
+-----------+---------+
| 1 | 1 |
| 1 | 2 |
| 2 | 2 |
+-----------+---------+
You can try something like that:
SELECT *
FROM
(SELECT *
FROM song
LIMIT 10) r
LEFT JOIN song_author ON r.s_id = sa_song
LEFT JOIN author ON sa_author = a_id
To be clear, your goal is what, exactly? It seems like you want to grab a set amount of songs at one time (10 in the example), and all related authors for said songs. However, if multiple of those [song, author] record tuples represented the same song, you'd end up with 10 records, but less than 10 actual songs.
I like Marcin's approach, but if you want to avoid sub-queries and temporary tables, you can instead use a feature built into MySQL: GROUP_CONCAT()
SELECT
s_name,
GROUP_CONCAT(DISTINCT a_name SEPARATOR '|')
FROM
song
LEFT JOIN
song_author ON sa_song = s_id
LEFT JOIN
author ON a_id = sa_author
GROUP BY
s_name
LIMIT 10

Distinct Help

Alright so I have a table, in this table are two columns with ID's. I want to make one of the columns distinct, and once it is distinct to select all of those from the second column of a certain ID.
Originally I tried:
select distinct inp_kll_id from kb3_inv_plt where inp_plt_id = 581;
However this does the where clause first, and then returns distinct values.
Alternatively:
select * from (select distinct(inp_kll_id) from kb3_inv_plt) as inp_kll_id where inp_plt_id = 581;
However this cannot find the column inp_plt_id because distinct only returns the column, not the whole table.
Any suggestions?
Edit:
Each kll_id may have one or more plt_id. I would like unique kll_id's for a certain kb3_inv_plt id.
| inp_kll_id | inp_plt_id |
| 1941 | 41383 |
| 1942 | 41276 |
| 1942 | 38005 |
| 1942 | 39052 |
| 1942 | 40611 |
| 1943 | 5868 |
| 1943 | 4914 |
| 1943 | 39511 |
| 1944 | 39511 |
| 1944 | 41276 |
| 1944 | 40593 |
| 1944 | 26555 |
If you do mean, by "make distinct", "pick only inp_kll_ids that happen just once" (not the SQL semantics for Distinct), this should work:
select inp_kll_id
from kb3_inv_plt
group by inp_kll_id
having count(*)=1 and inp_plt_id = 581;
Get all the distinct first (alias 'a' in my following example) and then join it back to the table with the specified criteria (alias 'b' in my following example).
SELECT *
FROM (
SELECT
DISTINCT inp_kll_id
FROM kb3_inv_plt
) a
LEFT JOIN kb3_inv_plt b
ON a.inp_kll_id = b.inp_kll_id
WHERE b.inp_plt_id = 581
in this table are two columns with
ID's. I want to make one of the
columns distinct, and once it is
distinct to select all of those from
the second column of a certain ID.
SELECT distinct tableX.ID2
FROM tableX
WHERE tableX.ID1 = 581
I think your understanding of distinct may be different from how it works. This will indeed apply the where clause first, and then get a distinct list of unique entries of tableX.ID2, which is exactly what you ask for in the first part of your question.
By making a row distinct, you're ensuring no other rows are exactly the same. You aren't making a column distinct. Let's say your table has this data:
ID1 ID2
10 4
10 3
10 7
4 6
When you select distinct ID1,ID2 - you get the same as select * because the rows are already distinct.
Can you add information to clear up what you are trying to do?