MySQL Reciprocal Search - mysql

I have a table which stores the edges of a directed graph like so:
Table EDGES
FROM_NODE | TO_NODE | STRENGTH
1 | 1 | 8
1 | 2 | 5
2 | 1 | 4
1 | 3 | 2
3 | 4 | 1
I'm trying to search for edges which are supported in both directions with strength > 3. In the example above, 1 -> 2 and 2 -> 1 both exist; however, 1 -> 3 has no matching 3 -> 1, so it does not exist in both directions. 1 -> 1 doesn't count, for obvious reasons.
The major complication is that there are over 1,000,000 edges to search, and all the queries I have tried so far fail before I can check if they've worked.
Any suggestions would be greatly appreciated!

To me the most straightforward solution is something like:
select one.from_node, one.to_node
from edges one
join edges other on (one.to_node = other.from_node AND one.from_node = other.to_node)
where one.strength > 3 AND other.strength > 3
AND one.from_node <> one.to_node
If you have a lot of data, then it might be a good idea to reconsider the indexes on the table and raise the execution limit.
Here is an sql fiddle to check the query.
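To make the index advice concrete, here is a minimal sketch that runs the same self-join through Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names come from the question; the index name idx_edges_pair and the helper reciprocal_edges are made up for illustration. The assumption is that a composite index on (from_node, to_node) lets the join look up each reverse edge directly instead of scanning the table; on a real MySQL server that should be confirmed with EXPLAIN.

```python
import sqlite3


def reciprocal_edges(rows, min_strength=3):
    """Return (from_node, to_node) pairs whose reverse edge also exists,
    where both directions have strength above min_strength.
    Self-loops such as 1 -> 1 are excluded."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edges (from_node INTEGER, to_node INTEGER, strength INTEGER)"
    )
    # Hypothetical index name; it covers the columns used by the join condition.
    conn.execute("CREATE INDEX idx_edges_pair ON edges (from_node, to_node)")
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT one.from_node, one.to_node
        FROM edges one
        JOIN edges other
          ON one.to_node = other.from_node
         AND one.from_node = other.to_node
        WHERE one.strength > ? AND other.strength > ?
          AND one.from_node <> one.to_node
        ORDER BY one.from_node, one.to_node
        """,
        (min_strength, min_strength),
    ).fetchall()
    conn.close()
    return result


# Sample data from the question.
sample = [(1, 1, 8), (1, 2, 5), (2, 1, 4), (1, 3, 2), (3, 4, 1)]
print(reciprocal_edges(sample))
```

Note that this query reports each reciprocal pair twice, once per direction.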

I think you can use something like this:
select
least(FROM_NODE, TO_NODE) as n1,
greatest(FROM_NODE, TO_NODE) as n2
from
edges
where FROM_NODE<>TO_NODE and strength>3
group by n1, n2
having count(*)=2
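A rough sketch of the grouping idea, again using sqlite3 as a stand-in for MySQL. SQLite's two-argument MIN and MAX play the role of MySQL's LEAST and GREATEST here, and the helper name reciprocal_pairs is hypothetical. The sketch assumes each (from_node, to_node) combination appears at most once in the table; if duplicates were possible, a count of 2 could come from two copies of the same direction.

```python
import sqlite3


def reciprocal_pairs(rows, min_strength=3):
    """Return each reciprocal pair once, as (smaller_node, larger_node)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edges (from_node INTEGER, to_node INTEGER, strength INTEGER)"
    )
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT n1, n2
        FROM (
            -- Normalise the direction so 1 -> 2 and 2 -> 1 land in one group.
            SELECT MIN(from_node, to_node) AS n1,
                   MAX(from_node, to_node) AS n2
            FROM edges
            WHERE from_node <> to_node AND strength > ?
        ) AS normalized
        GROUP BY n1, n2
        HAVING COUNT(*) = 2
        ORDER BY n1, n2
        """,
        (min_strength,),
    ).fetchall()
    conn.close()
    return result


sample = [(1, 1, 8), (1, 2, 5), (2, 1, 4), (1, 3, 2), (3, 4, 1)]
print(reciprocal_pairs(sample))
```

Unlike the self-join, this form scans the table once and returns each pair a single time.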

Related

SQL - Join Three Tables with common link in third table

Before I begin, Yes, I thoroughly tried searching for many tutorials on JOINS/INNER JOINS/OUTER JOINS/FULL JOINS but I'm not exactly sure what I'm looking for so a little guidance or simply a finger pointing me in the correct direction would be very helpful. I'll try to be as clear as possible.
So basically, I have Three tables
Foo
| FooID | name | data |
1 Name1 Data1
2 Name2 Data2
3 Name3 Data3
4 Name4 Data4
Bar
| BarID |
1
2
Matrix
| BarID | FooID|
1 2
1 3
1 4
2 1
2 3
What I'm looking for is this: I have a BarID (let's pretend it's 1 for clarity). I want to get all the rows from table Matrix that correspond to that BarID, so that I can retrieve the rows they relate to in Foo. For example, if BarID = 1, I should get rows 2, 3 and 4 in Foo; if BarID is 2, I should get 1 and 3, and so on.
I was trying something similar to:
SELECT Foo.FooID, Foo.name, Foo.data
FROM Bar
JOIN Matrix ON Matrix.BarID = 1 -- The 1 is passed in, in this example
JOIN Foo ... -- And this is where I'm stuck
Does what I'm trying to accomplish make sense? I know it's weird. I would appreciate any pointers. Thank you in advance!
You seem to want a simple JOIN and WHERE:
select f.*
from foo f join
matrix m
on f.fooid = m.fooid
where m.barid = 1;
You do not need the bar table, because you are passing in the id. I think you might have been overthinking the problem.
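As a quick check of that join against the sample data, here is a small sketch using Python's sqlite3 module in place of MySQL. The helper name foo_rows_for_bar is invented for the example, and the BarID is passed as a bound parameter rather than pasted into the SQL text.

```python
import sqlite3


def foo_rows_for_bar(conn, bar_id):
    """Return the Foo rows linked to bar_id through the Matrix table."""
    return conn.execute(
        """
        SELECT f.FooID, f.name, f.data
        FROM Foo f
        JOIN Matrix m ON f.FooID = m.FooID
        WHERE m.BarID = ?
        ORDER BY f.FooID
        """,
        (bar_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (FooID INTEGER PRIMARY KEY, name TEXT, data TEXT)")
conn.execute("CREATE TABLE Matrix (BarID INTEGER, FooID INTEGER)")
conn.executemany(
    "INSERT INTO Foo VALUES (?, ?, ?)",
    [(i, f"Name{i}", f"Data{i}") for i in range(1, 5)],
)
conn.executemany(
    "INSERT INTO Matrix VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3)],
)

print(foo_rows_for_bar(conn, 1))
print(foo_rows_for_bar(conn, 2))
```

The Bar table never appears, which matches the point that it is not needed when the id is already known.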

Mysql: Find most similar numerical rows based on multiple columns

This is my first question here, so I'll try my best to be clear and factual. I've googled for quite a long time but never got the result I wanted. My MySQL knowledge isn't the best, and maybe that's why I can't get this to work the way I want.
First, here's my MySQL data:
user | speed | strength | stamina | precision
---------------------------------------------
1 | 4 | 3 | 5 | 2
2 | 2 | 5 | 3 | 4
3 | 3 | 4 | 6 | 3
Question
I want a MySQL query that finds the most similar row to a specific user. For example, if I want to see who's most similar to user 1, I want it to find user 3. Users 1 and 2 have the same total value (14), but 1 and 3 are more similar; see the picture for a better view.
I'd be so glad and grateful if someone knew what MySQL function I should look at, or if you have any ideas.
I think your requirement translated into functions would be "the minimum value of the average of the differences between users' scores at ability level".
If that's the case, it can be translated in SQL like this
select t2.user,
(
abs(t1.speed - t2.speed) +
abs(t1.strength - t2.strength) +
abs(t1.stamina - t2.stamina) +
abs(t1.precision - t2.precision)
) / 4 as diff_avg
from users t1
cross join
users t2
where t2.user <> t1.user and
t1.user = 1 /* the starting user id goes here */
order by 2 asc
limit 1
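To see what that query computes on the sample data, here is a small sketch that runs it through Python's sqlite3 module instead of MySQL. It drops the LIMIT so every other user is ranked, divides by 4.0 because SQLite would otherwise do integer division, and quotes "precision" defensively. The helper name rank_by_similarity is made up for the example.

```python
import sqlite3


def rank_by_similarity(conn, user_id):
    """Rank all other users by mean absolute difference to user_id
    (smaller means more similar)."""
    return conn.execute(
        """
        SELECT t2.user,
               (ABS(t1.speed - t2.speed)
              + ABS(t1.strength - t2.strength)
              + ABS(t1.stamina - t2.stamina)
              + ABS(t1."precision" - t2."precision")) / 4.0 AS diff_avg
        FROM users t1
        CROSS JOIN users t2
        WHERE t2.user <> t1.user
          AND t1.user = ?
        ORDER BY diff_avg ASC, t2.user ASC
        """,
        (user_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE users (user INTEGER PRIMARY KEY, speed INTEGER, '
    'strength INTEGER, stamina INTEGER, "precision" INTEGER)'
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
    [(1, 4, 3, 5, 2), (2, 2, 5, 3, 4), (3, 3, 4, 6, 3)],
)

print(rank_by_similarity(conn, 1))
```

For user 1, every ability differs from user 3 by 1 and from user 2 by 2, so user 3 comes first with an average difference of 1.0 against 2.0 for user 2.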
The most accurate way to do this numerically is profile similarity: find the rows with the highest correlation coefficient to user 1.
I have been looking for a way to do this in MySQL but can't seem to find one. I hope someone knows enough about this to help us.
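Outside MySQL, the correlation idea can be sketched in a few lines of plain Python. This assumes "correlation coefficient" means the Pearson coefficient computed across each user's four ability scores; the function names and the profiles dictionary are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score profiles.
    Returns None when either profile is flat (zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def most_correlated(profiles, user):
    """Return the other user whose profile correlates most with `user`."""
    target = profiles[user]
    scores = {}
    for other, profile in profiles.items():
        if other == user:
            continue
        r = pearson(target, profile)
        if r is not None:
            scores[other] = r
    return max(scores, key=scores.get)


# user -> (speed, strength, stamina, precision), taken from the question.
profiles = {1: (4, 3, 5, 2), 2: (2, 5, 3, 4), 3: (3, 4, 6, 3)}
print(most_correlated(profiles, 1))
```

On the sample data user 3 correlates positively with user 1 and user 2 negatively, so both approaches agree here. Correlation compares the shape of a profile rather than its level, so it can disagree with the absolute-difference ranking on other data.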

postgresql, select multiple json_array_elements works so weird

I want to use json_array_elements to expand a JSON array, but it behaves strangely. Please see below.
select json_array_elements('[1, 2]') as a, json_array_elements('[2, 3, 4]') as b;
a | b
---+---
1 | 2
2 | 3
1 | 4
2 | 2
1 | 3
2 | 4
(6 rows)
select json_array_elements('[1, 2]') as a, json_array_elements('[2, 3]') as b;
a | b
---+---
1 | 2
2 | 3
(2 rows)
It seems that when the lengths of the arrays are equal, something goes wrong.
Can anyone tell me why it behaves like this?
PostgreSQL repeats each list until both happen to be at the end simultaneously.
In other words, the length of the result list is the least common multiple of the length of the input lists.
This behaviour is indeed weird, and will be changed in PostgreSQL v10:
select json_array_elements('[1, 2]') as a, json_array_elements('[2, 3, 4]') as b;
a | b
---+---
1 | 2
2 | 3
| 4
(3 rows)
From the commit message:
While moving SRF evaluation to ProjectSet would allow to retain the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e. continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones. We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.
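The two behaviours can be imitated in a few lines of Python. This is only a model of the row pairing described above, not PostgreSQL code, and the function names pre_v10_rows and v10_rows are invented for the illustration.

```python
from itertools import cycle, islice, zip_longest
from math import gcd


def pre_v10_rows(a, b):
    """Old behaviour: repeat both lists until they end at the same time,
    which yields lcm(len(a), len(b)) rows."""
    if not a or not b:
        return []
    row_count = len(a) * len(b) // gcd(len(a), len(b))
    return list(islice(zip(cycle(a), cycle(b)), row_count))


def v10_rows(a, b):
    """v10 behaviour: stop when the longest list is exhausted and
    pad the shorter one with NULL (None here)."""
    return list(zip_longest(a, b, fillvalue=None))


print(pre_v10_rows([1, 2], [2, 3, 4]))  # 6 rows, as in the question
print(pre_v10_rows([1, 2], [2, 3]))     # 2 rows, lengths are equal
print(v10_rows([1, 2], [2, 3, 4]))      # 3 rows, last one padded
```

With lengths 2 and 3 the least common multiple is 6, which is why the first query in the question returns six rows, while equal lengths give the row-by-row pairing the asker expected.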

1M rows, 1 table, few columns vs 300 tables, 3000 rows, few columns vs 300 columns, 3000 rows, 1 table?

I have looked around for the best way to approach this problem, but I could not find any previous examples of it.
I am building a hyperlocal internet shopping mall, and the service area is divided into about 3000 zones. Each zone holds about 300 items. The items are similar across zones but can vary slightly from zone to zone. I need to get the list of "available items" for each zone.
Insertion speed does not matter; the main workload will be fetching the items for a given "zone" value. What would be the most efficient way to set up the DB for this case?
1 table with 1M rows such as
id | zone | item | avail
1 | 1 | 1 | Y
2 | 1 | 2 | N
...
1262| 4 | 35 | Y
300 tables with 3000 rows such as
table: zone1
id | item | avail
1 | 1 | Y
2 | 2 | N
table: zone4
id | item | avail
...
35 | 35 | Y
1 table with 300 columns (each per item), 3000 rows
id | zone | item1 | item2 ...
1 | 1 | Y | N ...
...
4 | 4 | Y | Y ...
Thanks in advance for any help or any leads I could use so that I could make a decision!
This is borderline opinion-based, but here we go:
Option 1 is most likely what you want.
Option 2 would give you 300 tables to maintain, so if you need to add a field later you have 300 tables to alter which sounds like a maintainability nightmare. Also, 300 indexes will most likely cache worse than a single bigger one and searching for a specific item in all zones is basically out of the question.
Option 3 would require you to alter the table structure and queries to add more than 300 items. Also, to be able to find an item by id you'd need SQL looking like SELECT xx FROM yy WHERE item1=57 OR item2=57 OR ... OR item300=57 which MySQL's optimizer will most likely just give up on.
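A minimal sketch of option 1, using Python's sqlite3 module as a stand-in for MySQL. The table name zone_items, the index name idx_item_zone and the two helper functions are assumptions made for the example. The sketch also drops the surrogate id column from the question and uses (zone, item) as the primary key, which is a design choice rather than something the question requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE zone_items (
        zone  INTEGER NOT NULL,
        item  INTEGER NOT NULL,
        avail TEXT    NOT NULL CHECK (avail IN ('Y', 'N')),
        PRIMARY KEY (zone, item)  -- serves the main "items for a zone" lookup
    )
    """
)
# Hypothetical secondary index for the reverse lookup "which zones carry item X".
conn.execute("CREATE INDEX idx_item_zone ON zone_items (item, zone)")
conn.executemany(
    "INSERT INTO zone_items VALUES (?, ?, ?)",
    [(1, 1, "Y"), (1, 2, "N"), (4, 1, "Y"), (4, 35, "Y")],
)


def available_items(conn, zone):
    """Items marked available in one zone."""
    rows = conn.execute(
        "SELECT item FROM zone_items WHERE zone = ? AND avail = 'Y' ORDER BY item",
        (zone,),
    )
    return [item for (item,) in rows]


def zones_with_item(conn, item):
    """Zones in which a given item is available."""
    rows = conn.execute(
        "SELECT zone FROM zone_items WHERE item = ? AND avail = 'Y' ORDER BY zone",
        (item,),
    )
    return [zone for (zone,) in rows]


print(available_items(conn, 1))
print(zones_with_item(conn, 1))
```

Both lookups are a single indexed query on one table, whereas the cross-zone lookup would need a query per table under option 2 or a long OR chain under option 3.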
From a relational database point of view, you should choose the first option.
- If one day you have to add a new item or a new zone, you will not have to create a new column or a new table, and the same applies if you need to delete an item or zone.
From a NoSQL point of view, however, you would choose a layout like option 2.
Simply use the first option: 1M rows, 1 table, few columns.
First option is the best one. DBMSs incur big overhead per table and per row. Plus they were not designed for the case of many tables and many rows.

Select from a table that uses materialized path to encode a tree, ordered by depth-first (no recursive/ltree)

I have a table in a relational database, in which I encode a tree using the technique known as Materialized path (also known as Lineage column). That is, for each node in my tree I have a row in the table, and for each row I have a string column named ancestry where I store the path from the root node to the node represented by this row.
Is it possible, and if so how, to select the rows in the table ordered by preorder? That is, they should appear in the result set in the order that would result from visiting the tree depth-first. I use MySQL, so no recursive queries and no ltree extension.
For example, here is a tree, its table, and the rows selected in preorder:
Tree:
  1
  | \
  2  3
  |  | \
  4  5  6
     |
     7
SELECT * FROM nodes
id | ancestry
-------------
1 | NULL
2 | 1
3 | 1
4 | 1/2
5 | 1/3
6 | 1/3
7 | 1/3/5
SELECT * FROM nodes ORDER BY ?depth_first_visit_order?
id | ancestry
-------------
1 | NULL
2 | 1
4 | 1/2
3 | 1
5 | 1/3
7 | 1/3/5
6 | 1/3
NOTE: I don't care about the order of siblings!
Note: I am interested explicitly in doing this over a materialized path encoding!
Related: What are the options for storing hierarchical data in a relational database?
I believe what you want is an alphabetic sort.
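SELECT id, ancestry, CONCAT_WS('/', ancestry, id) AS PathEnumeration
FROM nodes
ORDER BY PathEnumeration ASC;
In MySQL, CONCAT_WS joins its arguments with the separator and skips NULL values, so the root (whose ancestry is NULL) gets the path 1 instead of NULL. The resulting paths sort like this: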
1
1/2
1/2/4
1/3
1/3/5
1/3/5/7
1/3/6
Note that it is an alphabetic sort, so 11 will show up before 2. But, you said you didn't care about sibling ordering. I, of course, would rewrite it as a nested set ;)
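The sort can be tried out on the question's sample rows in plain Python. This is a sketch with invented helper names; it builds each node's full path (ancestry plus its own id) and compares a plain string sort with a numeric-aware one.

```python
def full_path(node_id, ancestry):
    """Path from the root down to and including the node itself."""
    return str(node_id) if ancestry is None else f"{ancestry}/{node_id}"


def preorder_by_string(nodes):
    """Depth-first order from a plain string sort of the full paths."""
    return [nid for nid, anc in sorted(nodes, key=lambda n: full_path(*n))]


def preorder_numeric(nodes):
    """Same idea, but each path segment is compared as an integer,
    so sibling 2 sorts before sibling 11."""
    def key(node):
        return tuple(int(part) for part in full_path(*node).split("/"))
    return [nid for nid, anc in sorted(nodes, key=key)]


# (id, ancestry) rows from the question.
nodes = [(1, None), (2, "1"), (3, "1"), (4, "1/2"),
         (5, "1/3"), (6, "1/3"), (7, "1/3/5")]
print(preorder_by_string(nodes))

# Multi-digit ids only change the order of siblings, not the nesting.
wide = [(1, None), (2, "1"), (11, "1")]
print(preorder_by_string(wide))
print(preorder_numeric(wide))
```

The string sort still keeps every subtree contiguous with multi-digit ids because '/' compares lower than any digit in ASCII, so a node's descendants come before any sibling whose id merely starts with the same digits. Whether that holds under a given MySQL collation is worth checking.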
This will order by the last number of your "ancestry":
select *,
substring_index(ancestry, '/', -1) as END_CHAR
from nodes
order by END_CHAR desc
I didn't try with numbers bigger than 9; you may have to cast to int.