SQL query to find set of doc_ids where there is maximum intersection of ent_ids - mysql

I have a table with O(1M) rows with columns doc_id and ent_id where (doc_id, ent_id) is the primary key.
+--------+--------+
| doc_id | ent_id |
+--------+--------+
| 1 | a |
| 1 | b |
| 1 | x |
| 1 | y |
| 2 | a |
| 3 | a |
| 3 | x |
| 3 | y |
| 4 | x |
| 4 | y |
+--------+--------+
My question is: how do I efficiently find a set of doc_ids (say I need the top 1000 or 5000 doc_ids) such that the intersection of ent_ids across the selected doc_ids is as large as possible?
For example, in the above table:
Say I need the top 2 doc_ids with the maximum intersection among their ent_ids. The result would be doc_ids = {1,3} with [ common ent_ids = {a,x,y}, common ent_ids count = 3 ].
Say I need the top 3 doc_ids with the maximum intersection among their ent_ids. The result would be doc_ids = {1,3,4} with [ common ent_ids = {x,y}, common ent_ids count = 2 ].
Footnote: if it is not possible to do this efficiently with SQL, any direction towards an alternative method in application code would also be helpful, e.g. convert to CSV -> some data structure (inverted index?) or library + Python code -> result set.
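For the application-code route mentioned in the footnote, here is a minimal Python sketch of one possible approach. It is a greedy heuristic, not an exact solver: it seeds with the document that has the most entities and then repeatedly adds the document that keeps the common set largest, so it is not guaranteed to find the true maximum. The function name greedy_max_intersection and the in-memory row list are assumptions made for illustration.

```python
from collections import defaultdict


def greedy_max_intersection(rows, k):
    """Greedily pick k doc_ids whose ent_id sets share many elements.

    rows: iterable of (doc_id, ent_id) pairs, as stored in the table.
    Returns (sorted chosen doc_ids, set of common ent_ids).
    Heuristic only: the result is not guaranteed to be optimal.
    """
    ents_by_doc = defaultdict(set)
    for doc_id, ent_id in rows:
        ents_by_doc[doc_id].add(ent_id)
    if k < 1 or k > len(ents_by_doc):
        raise ValueError("k must be between 1 and the number of documents")

    # Sorted so that ties are broken in favour of the smallest doc_id.
    remaining = sorted(ents_by_doc)

    # Seed with the document that has the most entities.
    first = max(remaining, key=lambda d: len(ents_by_doc[d]))
    chosen = [first]
    common = set(ents_by_doc[first])
    remaining.remove(first)

    # Repeatedly add the document that shrinks the common set the least.
    while len(chosen) < k:
        best = max(remaining, key=lambda d: len(common & ents_by_doc[d]))
        chosen.append(best)
        common &= ents_by_doc[best]
        remaining.remove(best)
    return sorted(chosen), common


# The sample table from the question.
rows = [
    (1, "a"), (1, "b"), (1, "x"), (1, "y"),
    (2, "a"),
    (3, "a"), (3, "x"), (3, "y"),
    (4, "x"), (4, "y"),
]
top2 = greedy_max_intersection(rows, 2)
top3 = greedy_max_intersection(rows, 3)
print(top2[0], sorted(top2[1]))
print(top3[0], sorted(top3[1]))
```

On the sample table this happens to reproduce both expected results, but on real data the greedy choice can miss a better set, so treat it as a starting point rather than an answer to the exact problem.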

Related

How can I merge two strings of comma-separated numbers in MySQL?

For example, there are three rooms.
1|gold_room|1,2,3
2|silver_room|1,2,3
3|brown_room|2,4,6
4|brown_room|3
5|gold_room|4,5,6
Then, I'd like to get
gold_room|1,2,3,4,5,6
brown_room|2,3,4,6
silver_room|1,2,3
How can I achieve this?
I've tried select * from room group by name; but it only prints the first row. I also know that CONCAT() can combine two string values.
Please use the query below:
select col2, GROUP_CONCAT(col3) from data group by col2;
Here is a test case:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ab35e8d66ffe3ac6436c17faf97ee9af
I'm not making an assumption that the lists don't have elements in common on separate rows.
First create a table of integers.
mysql> create table n (n int primary key);
mysql> insert into n values (1),(2),(3),(4),(5),(6);
You can join this to your rooms table using the FIND_IN_SET() function. Note that this cannot be optimized. It will execute N full table scans. But it does create an interim set of rows.
mysql> select * from n inner join rooms on find_in_set(n.n, rooms.csv) order by rooms.room, n.n;
+---+----+-------------+-------+
| n | id | room | csv |
+---+----+-------------+-------+
| 2 | 3 | brown_room | 2,4,6 |
| 3 | 4 | brown_room | 3 |
| 4 | 3 | brown_room | 2,4,6 |
| 6 | 3 | brown_room | 2,4,6 |
| 1 | 1 | gold_room | 1,2,3 |
| 2 | 1 | gold_room | 1,2,3 |
| 3 | 1 | gold_room | 1,2,3 |
| 4 | 5 | gold_room | 4,5,6 |
| 5 | 5 | gold_room | 4,5,6 |
| 6 | 5 | gold_room | 4,5,6 |
| 1 | 2 | silver_room | 1,2,3 |
| 2 | 2 | silver_room | 1,2,3 |
| 3 | 2 | silver_room | 1,2,3 |
+---+----+-------------+-------+
Use GROUP BY to reduce these rows to one row per room. Use GROUP_CONCAT() to put the integers together into a comma-separated list.
mysql> select room, group_concat(distinct n.n order by n.n) as csv
from n inner join rooms on find_in_set(n.n, rooms.csv) group by rooms.room;
+-------------+-------------+
| room | csv |
+-------------+-------------+
| brown_room | 2,3,4,6 |
| gold_room | 1,2,3,4,5,6 |
| silver_room | 1,2,3 |
+-------------+-------------+
I think this is a lot of work, and impossible to optimize. I don't recommend it.
The problem is that you are storing comma-separated lists of numbers, and then you want to query it as if the elements in the list are discrete values. This is a problem for SQL.
It would be much better if you did not store your numbers in a comma-separated list. Store multiple rows per room, with one number per row. You can run a wider variety of queries if you do this, and it will be more flexible.
For example, the query you asked about, which produces a result with the numbers in a comma-separated list, becomes simpler, and you don't need the extra n table:
select room, group_concat(n order by n) as csv from rooms group by room;
See also my answer to "Is storing a delimited list in a database column really that bad?"
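If the comma-separated column cannot be changed right away, the merge can also be done in application code. The following is a minimal Python sketch under that assumption; merge_room_lists and the hard-coded rows are hypothetical names and data taken from the question's example, not part of any answer above.

```python
from collections import defaultdict


def merge_room_lists(rows):
    """Merge comma-separated number lists per room.

    rows: iterable of (id, room, csv) tuples.
    Returns {room: "n1,n2,..."} with duplicates removed and numbers sorted.
    """
    numbers_by_room = defaultdict(set)
    for _id, room, csv in rows:
        # Split the stored list into discrete integers; skip empty parts.
        numbers_by_room[room].update(
            int(part) for part in csv.split(",") if part.strip()
        )
    return {
        room: ",".join(str(n) for n in sorted(numbers))
        for room, numbers in numbers_by_room.items()
    }


rooms = [
    (1, "gold_room", "1,2,3"),
    (2, "silver_room", "1,2,3"),
    (3, "brown_room", "2,4,6"),
    (4, "brown_room", "3"),
    (5, "gold_room", "4,5,6"),
]
merged = merge_room_lists(rooms)
for room, csv in sorted(merged.items()):
    print(room, csv)
```

Using a set per room handles the case where the same number appears on several rows, which is the same concern the DISTINCT in the GROUP_CONCAT() query addresses.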

MySQL - Interchange column value from one row onto another

SELECT OlineDate, OlineOrder, OlineDesc, OlineGroup, OlinePrice
FROM tblorderlines
WHERE DATE(OlineDate) = '2019-10-19' AND OlineOrder = 170
AND OlineGroup IN ('spec')
+------------+------------+-----------+------------+------------+
| OlineOrder | OlineDate  | OlineDesc | OlineGroup | OlinePrice |
+------------+------------+-----------+------------+------------+
| 10         | 2019-10-19 | Coupon    | spec       | -2.42      |
| 10         | 2019-10-19 | 10% OFF   | spec       | 0.00       |
+------------+------------+-----------+------------+------------+
I am looking for a query that would interchange the '10% off' value over the 'Coupon' value. The only results I've found that may produce the result I want are pivot tables but those don't exist in MySQL. Is there another route I can take?

MySQL Substring between two DIFFERENT strings where the second needle comes AFTER the first

I have to extract certain data from a MySQL column. The table looks like so:
+----+---------------------+------------------------+
| id | time | data |
+----+---------------------+------------------------+
| 1 | 2016-10-28 00:12:01 | a Q1!! AF3 !! ext!! z |
| 2 | 2016-10-28 02:19:02 | z !!3F2 !AF66-2!! !!a |
| 3 | 2016-10-28 11:35:03 | AF!a !!! pl6 f !!! dd |
+----+---------------------+------------------------+
I want to grab the string from column data between the characters AF and the NEXT occurrence of !!. So ideally the query SELECT id, [something] AS x FROM tbl would result in:
+----+------+
| id | x |
+----+------+
| 1 | 3 |
| 2 | 66-2 |
| 3 | !a |
+----+------+
Thoughts on how to do this? All the other questions I see don't quite relate, as they don't deal with finding the first occurrence of the second needle (!!) AFTER the first needle (AF).
There may be faster ways to do this but this is a good start:
select id, substring_index(substring_index(data, 'AF', -1), '!!', 1) as x from tbl;
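To make the intended "first AF, then the next !!" behaviour explicit, here is a small Python sketch of the same extraction done in application code. The helper name between is hypothetical, and stripping surrounding whitespace is an assumption based on the expected output in the question. Unlike the nested substring_index call with -1, which takes the text after the last AF, this sketch starts from the first AF.

```python
def between(data, start="AF", end="!!"):
    """Return the text after the first `start` up to the next `end`.

    Returns None if either needle is missing. Surrounding whitespace is
    stripped (an assumption, to match the expected output in the question).
    """
    begin = data.find(start)
    if begin == -1:
        return None
    begin += len(start)
    # Search for the second needle only AFTER the first one.
    stop = data.find(end, begin)
    if stop == -1:
        return None
    return data[begin:stop].strip()


table = {
    1: "a Q1!! AF3 !! ext!! z",
    2: "z !!3F2 !AF66-2!! !!a",
    3: "AF!a !!! pl6 f !!! dd",
}
results = {row_id: between(data) for row_id, data in table.items()}
print(results)
```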

mysql table having a->b and b->a values, select only a->b set of values

I have one table with 5 columns:
linkid, orinodeno, oriifindex, ternodeno, terifindex
linkid is auto-incremented. (orinodeno, oriifindex) is one combined value, the originating end, and (ternodeno, terifindex) is the other, the terminating end; between them there is a link, just like two points on a map with a connecting link in between. So my table contains a->b values (where a is the combination of orinodeno, oriifindex and b is the combination of ternodeno, terifindex) as well as b->a values. I have to select only the a->b set of values, not b->a. I am also attaching an image of my table: My Table
There is no map definition in SQL databases; forget it. Check any database normalization tutorial. Then you shouldn't have any problems with select statements.
Please be clear about what you are asking. If you cannot explain it in words, please give example input and your expected output.
From the table image you linked and your description, it looks like you expect the following:
Data in current table:
------------------------------------------------------------------
|linkid | orinodenumber | oriifindex | ternodenumber | terifindex|
------------------------------------------------------------------
|305 | 261 | 2 | 309 | 2 |
|306 | 309 | 2 | 261 | 2 |
|307 | 257 | 10 | 310 | 10 |
|308 | 310 | 10 | 257 | 10 |
|309 | 257 | 11 | 310 | 11 |
------------------------------------------------------------------
Expected Output:
------------------------------------------------------------------
|linkid | orinodenumber | oriifindex | ternodenumber | terifindex|
------------------------------------------------------------------
|305 | 261 | 2 | 309 | 2 |
|307 | 257 | 10 | 310 | 10 |
------------------------------------------------------------------
If that is your case, the following query might help (assuming the table is named link_table):
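SELECT *
FROM link_table o
WHERE EXISTS (SELECT linkid
              FROM link_table i
              -- i is the reverse (b->a) row of o: both endpoints swapped
              WHERE o.orinodenumber = i.ternodenumber
                AND o.oriifindex = i.terifindex
                AND o.ternodenumber = i.orinodenumber
                AND o.terifindex = i.oriifindex
                -- keep only the earlier row of each pair
                AND o.linkid < i.linkid);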
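The same selection can be sketched in Python to check the logic against the sample rows. This is an illustration only: forward_links is a hypothetical helper, and it assumes that a b->a row is one whose two endpoints are exactly the swapped endpoints of the a->b row.

```python
def forward_links(rows):
    """Keep each row that has a reverse row with a larger linkid.

    rows: list of (linkid, orinodenumber, oriifindex, ternodenumber, terifindex).
    """
    # Index linkids by their (origin, terminus) endpoints.
    ids_by_endpoints = {}
    for linkid, ori_node, ori_if, ter_node, ter_if in rows:
        key = (ori_node, ori_if, ter_node, ter_if)
        ids_by_endpoints.setdefault(key, []).append(linkid)

    kept = []
    for row in rows:
        linkid, ori_node, ori_if, ter_node, ter_if = row
        # The reverse row has origin and terminus swapped.
        reverse_ids = ids_by_endpoints.get((ter_node, ter_if, ori_node, ori_if), [])
        if any(other > linkid for other in reverse_ids):
            kept.append(row)
    return kept


links = [
    (305, 261, 2, 309, 2),
    (306, 309, 2, 261, 2),
    (307, 257, 10, 310, 10),
    (308, 310, 10, 257, 10),
    (309, 257, 11, 310, 11),
]
kept_ids = [row[0] for row in forward_links(links)]
print(kept_ids)
```

Note that, like the query, this drops row 309, which has no reverse row at all; that matches the expected output shown above.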

Order by in mysql using second table

I have two tables: one is a list of stores and attributes, the second is a list of allocations based on these attributes.
The attribute table (stores_metadata)
| key | store_key | field | value
| 1 | 1 | size | Large
| 2 | 1 | dist | Midlands
| 3 | 2 | size | Medium
| 4 | 3 | dist | South
The allocation table (allocation)
| key | ticket_key | field | value | count
| 1 | 1 | size | Large | 10
| 2 | 1 | size | Medium| 5
I've managed to get the allocations working using the code:
SELECT store_key, quantity FROM
allocation
INNER JOIN store_metadata
ON allocation.`field` = store_metadata.`field`
AND allocation.`value` = store_metadata.`value`
This returns a list of the stores and how many items they should receive. What I now need to do is order the stores by the distribution attribute.
Any help would be greatly appreciated.
The question isn't asked very well.
To order by any column in your result set, add ORDER BY [column] to the end of the query, e.g.:
SELECT store_key, quantity FROM
allocation
INNER JOIN store_metadata
ON allocation.`field` = store_metadata.`field`
AND allocation.`value` = store_metadata.`value`
ORDER BY allocation.`field`;
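If the goal is to order the matched stores by their own dist attribute rather than by a column of the allocation row, one way to picture it is a second lookup into the attribute table. The Python sketch below works on the sample rows from the question; the function name allocations_by_distribution is hypothetical, and placing stores without a dist value last is an assumption.

```python
stores_metadata = [
    # (key, store_key, field, value)
    (1, 1, "size", "Large"),
    (2, 1, "dist", "Midlands"),
    (3, 2, "size", "Medium"),
    (4, 3, "dist", "South"),
]
allocation = [
    # (key, ticket_key, field, value, count)
    (1, 1, "size", "Large", 10),
    (2, 1, "size", "Medium", 5),
]


def allocations_by_distribution(metadata, allocations):
    """Return (store_key, count) pairs ordered by each store's dist value."""
    # Second lookup: the distribution attribute of every store that has one.
    dist_by_store = {
        store: value for _, store, field, value in metadata if field == "dist"
    }
    # Equivalent of the inner join on (field, value).
    matched = [
        (store, count)
        for _, store, field, value in metadata
        for _, _, a_field, a_value, count in allocations
        if (field, value) == (a_field, a_value)
    ]
    # Stores without a dist value sort last (an assumption).
    return sorted(
        matched,
        key=lambda r: (dist_by_store.get(r[0]) is None,
                       dist_by_store.get(r[0]) or "",
                       r[0]),
    )


ordered = allocations_by_distribution(stores_metadata, allocation)
print(ordered)
```

In SQL the same idea would correspond to joining the attribute table a second time, restricted to the dist rows, and ordering by that value.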