How to get the row with the max dot product using MemSQL

First of all, I'll preface this by saying I have minimal experience with SQL.
Anyway, I have a table in MemSQL with the following format:
+-----+--------+----------+----------+
| id | uuid | identity | template |
+-----+--------+----------+----------+
| int | string | string | blob |
+-----+--------+----------+----------+
I am trying to use the MemSQL DOT_PRODUCT feature to obtain the identity of the template that produces the maximum dot product against a probe vector I provide. Note that template is a normalized, fixed-length array of floats.
My SQL statement is as follows:
SELECT id, identity, MAX(DOT_PRODUCT(template, JSON_ARRAY_PACK('[<probe template here>]')))
AS score FROM collection;
However, I am seeing strange behavior: the results are inconsistent. About 1 out of 10 times I execute the query, I get a different identity, but always the same max score. Additionally, the identity is incorrect (see further below).
The result from the query is the following (9 out of 10 times):
+----+-------------+------------------+
| id | identity | score |
+----+-------------+------------------+
| 7 | armstrong_2 | 0.56488848850131 |
+----+-------------+------------------+
As a sanity check, I wrote the following SQL statement, expecting its top row to match the first query's result. Note that I am using exactly the same probe vector as before:
SELECT id, identity, DOT_PRODUCT(template, JSON_ARRAY_PACK('[<same probe template from before>]'))
AS score FROM collection ORDER BY score DESC;
The results are as follows:
+----+--------------+--------------------+
| id | identity | score |
+----+--------------+--------------------+
| 1 | armstrong_1 | 0.56488848850131 |
| 21 | armstrong_1 | 0.56488848850131 |
| 6 | armstrong_1 | 0.56488848850131 |
| 11 | armstrong_1 | 0.56488848850131 |
| 16 | armstrong_1 | 0.56488848850131 |
| 17 | armstrong_2 | 0.534708674997091 |
| 7 | armstrong_2 | 0.534708674997091 |
| 22 | armstrong_2 | 0.534708674997091 |
| 2 | armstrong_2 | 0.534708674997091 |
| 12 | armstrong_2 | 0.534708674997091 |
| 10 | mr_bean_2 | 0.072085081599653 |
| 15 | mr_bean_2 | 0.072085081599653 |
| 5 | mr_bean_2 | 0.072085081599653 |
| 20 | mr_bean_2 | 0.072085081599653 |
| 25 | mr_bean_2 | 0.072085081599653 |
| 14 | mr_bean | 0.037121964152902 |
| 9 | mr_bean | 0.037121964152902 |
| 4 | mr_bean | 0.037121964152902 |
| 19 | mr_bean | 0.037121964152902 |
| 24 | mr_bean | 0.037121964152902 |
| 13 | jimmy_carter | -0.011749440804124 |
| 23 | jimmy_carter | -0.011749440804124 |
| 18 | jimmy_carter | -0.011749440804124 |
| 8 | jimmy_carter | -0.011749440804124 |
| 3 | jimmy_carter | -0.011749440804124 |
+----+--------------+--------------------+
What is going on? Why is the identity from the first query not the same as the identity in the top row of the second query? Is one or both of my queries incorrect?
Additionally, when I compute the dot product by hand (without any SQL or MemSQL), I find that armstrong_1 does indeed produce the highest score of 0.56488848850131. So why is my first SQL query (with the MAX operator) not working?

This is simply not valid SQL:
SELECT id, identity, MAX(DOT_PRODUCT(template, JSON_ARRAY_PACK('[<probe template here>]'))) AS score
FROM collection;
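There is no GROUP BY, but MAX() makes this an aggregation query, and id and identity are neither aggregated nor grouped. This is not valid standard SQL, and it is unfortunate that some databases accept it. MySQL (with ONLY_FULL_GROUP_BY disabled) and, judging by your results, MemSQL fill those columns from an arbitrary row rather than from the row that produced the maximum. That is why the identity changes between runs while the score does not.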
The best approach is ORDER BY:
SELECT id, identity, DOT_PRODUCT(template, JSON_ARRAY_PACK('[<probe template here>]')) AS score
FROM collection
ORDER BY score DESC
LIMIT 1; -- or whatever your database uses to limit to one row
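To see the ORDER BY ... LIMIT 1 pattern pick the right row, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MemSQL. SQLite has no DOT_PRODUCT or JSON_ARRAY_PACK, so this assumes templates are stored as JSON text and registers a hypothetical dot_product function in Python. The three rows and their vectors are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")


def dot_product(a_json, b_json):
    """Hypothetical stand-in for MemSQL's DOT_PRODUCT; vectors are JSON text here."""
    a = json.loads(a_json)
    b = json.loads(b_json)
    return sum(x * y for x, y in zip(a, b))


# Expose the Python function to SQL as dot_product(template, probe).
conn.create_function("dot_product", 2, dot_product)

conn.execute("CREATE TABLE collection (id INTEGER, identity TEXT, template TEXT)")
conn.executemany(
    "INSERT INTO collection VALUES (?, ?, ?)",
    [
        (1, "armstrong_1", json.dumps([0.6, 0.8])),
        (7, "armstrong_2", json.dumps([0.8, 0.6])),
        (3, "jimmy_carter", json.dumps([-0.6, 0.8])),
    ],
)
probe = json.dumps([0.6, 0.8])

# Score every row, sort by score, keep the top one: id, identity and score
# all come from the same row, unlike the ungrouped MAX() query.
best = conn.execute(
    """
    SELECT id, identity, dot_product(template, ?) AS score
    FROM collection
    ORDER BY score DESC
    LIMIT 1
    """,
    (probe,),
).fetchone()
print(best)
```

If several rows tie for the top score, as in the question's data, LIMIT 1 returns just one of them. Drop the LIMIT and filter on the maximum score if you need all of the tied rows.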

Related

Recursive SELECT in MySQL

I have one table with example data:
+----+---------+
| id | rede_id |
+----+---------+
| 1 | 0 |
| 2 | 38 |
| 3 | 1 |
| 38 | 1 |
| 40 | 1 |
| 41 | 38 |
| 42 | 38 |
| 43 | 40 |
+----+---------+
rede_id is the id of the person someone belongs to. It's a network system.
For example, if I need to check the network of id=1, the results need to look like this:
+----+---------+
| id | rede_id |
+----+---------+
| 3 | 1 |
| 38 | 1 |
| 40 | 1 |
| 41 | 38 |
| 42 | 38 |
| 40 | 1 |
+----+---------+
And if someone's rede_id is 41 or 42, that person needs to be in the results too, and so on with no depth limit.
Any number of people can have my id as their rede_id, any number can belong to each of those people, and so on indefinitely. I need to get all of them.
I honestly have no idea how to do that.
MySQL 8.0 now supports recursive queries, documented here: https://dev.mysql.com/doc/refman/8.0/en/with.html#common-table-expressions-recursive
Before MySQL 8.0, there's no easy solution for querying this type of data.
There are alternative ways of storing the data, to make it easier to query.
See also:
What is the most efficient/elegant way to parse a flat table into a tree?
https://www.slideshare.net/billkarwin/models-for-hierarchical-data
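As a rough sketch of the MySQL 8.0 recursive CTE approach, the query below is run through Python's sqlite3 module. SQLite accepts the same WITH RECURSIVE form for this case. The table name rede is an assumption, since the question never names the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns and rows are taken from the question.
conn.execute("CREATE TABLE rede (id INTEGER PRIMARY KEY, rede_id INTEGER)")
conn.executemany(
    "INSERT INTO rede VALUES (?, ?)",
    [(1, 0), (2, 38), (3, 1), (38, 1), (40, 1), (41, 38), (42, 38), (43, 40)],
)

query = """
WITH RECURSIVE network (id, rede_id) AS (
    -- Anchor member: people who belong directly to the root id.
    SELECT id, rede_id FROM rede WHERE rede_id = ?
    UNION ALL
    -- Recursive member: people who belong to someone already in the network.
    SELECT r.id, r.rede_id
    FROM rede AS r
    JOIN network AS n ON r.rede_id = n.id
)
SELECT id, rede_id FROM network ORDER BY id
"""
members = conn.execute(query, (1,)).fetchall()
print(members)
```

The recursion stops once a level adds no new rows. Id 2 appears in the result because its rede_id is 38, and 38 belongs to 1. If the data could ever contain a cycle, UNION ALL would keep recursing. MySQL then aborts with an error once cte_max_recursion_depth (1000 by default) is exceeded, so a depth column or UNION may be needed in that case.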

How to make a pivot table by multiple unique ID numbers?

I'm trying to reshape a SQL table: for each user's name, I need to find that user's unique IDs from up to 4 systems.
The data is currently like this:
| Name | User_ID |
-----------------
| A | 10 |
| A | 110 |
| A | 1500 |
| A | 4 |
| B | 20 |
| B | 100 |
| B | 2 |
| C | 10 |
I need to pivot it around the user's name so it looks like this (the IDs don't need to be in numerical order, since it doesn't matter which SYS#_ID column each one lands in):
| Name | SYS1_ID | SYS2_ID | SYS3_ID | SYS4_ID |
------------------------------------------------
| A | 4 | 10 | 110 | 1500 |
| B | 2 | 20 | 100 | NULL |
| C | 10 | NULL | NULL | NULL |
This is the code I have tried on MySQL:
PIVOT(
COUNT(User_ID)
FOR Name
IN (SYS1_ID, SYS2_ID, SYS3_ID, SYS4_ID)
)
AS PivotedUsers
ORDER BY PivotedUsers.User_Name;
I'm not sure whether PIVOT works in MySQL, since I keep getting the error "PIVOT unknown". Is there a way to find the values each user has and place each one in the next free column, up to a maximum of 4 values?
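MySQL has no PIVOT operator, so the usual workaround is conditional aggregation. Here is a minimal sketch, run through Python's sqlite3 for convenience. It numbers each user's IDs with ROW_NUMBER() (available in MySQL 8.0 and later) and spreads the first four into SYS1_ID to SYS4_ID with MAX(CASE ...). The table name users is a placeholder, since the question doesn't give one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Placeholder table name; rows are the sample data from the question.
conn.execute("CREATE TABLE users (Name TEXT, User_ID INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("A", 10), ("A", 110), ("A", 1500), ("A", 4),
     ("B", 20), ("B", 100), ("B", 2), ("C", 10)],
)

query = """
WITH numbered AS (
    -- Number each user's IDs 1, 2, 3, ... (ordered by ID for stable output).
    SELECT Name, User_ID,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY User_ID) AS rn
    FROM users
)
SELECT Name,
       -- Each CASE keeps only the ID in that slot; MAX collapses the group to it.
       MAX(CASE WHEN rn = 1 THEN User_ID END) AS SYS1_ID,
       MAX(CASE WHEN rn = 2 THEN User_ID END) AS SYS2_ID,
       MAX(CASE WHEN rn = 3 THEN User_ID END) AS SYS3_ID,
       MAX(CASE WHEN rn = 4 THEN User_ID END) AS SYS4_ID
FROM numbered
GROUP BY Name
ORDER BY Name
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Slots with no matching ID come out as NULL (None in Python), which matches the desired output. Any IDs beyond the fourth per user are silently dropped.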

How to get a sum for each distinct entry in MySQL

I have a table in MySQL like this:
+--------------+---------+---------------+----------+---------+
| service_code | charges | caller_number | duration | minutes |
+--------------+---------+---------------+----------+---------+
| 10 | 15 | 8281490235 | 00:00:00 | 1.0000 |
| 11 | 12 | 9961621709 | 00:00:00 | 0.0000 |
| 10 | 15 | 8281490235 | 01:00:44 | 60.7333 |
| 11 | 2 | 9744944316 | 01:00:44 | 60.7333 |
+--------------+---------+---------------+----------+---------+
From this table I want to get the sum of charges*minutes for each separate caller_number.
I tried this:
SELECT sum(charges*minutes) as cost from t8_m4_bill groupby caller_number
but I am not getting the expected output. Please help.
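GROUP BY is two words (groupby is not a keyword). You should also select caller_number so each sum can be matched to its caller:
SELECT caller_number, sum(charges*minutes) as cost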
from t8_m4_bill
group by caller_number
order by caller_number
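Here is a sketch for checking the corrected query against the sample rows, using Python's sqlite3 as a stand-in for MySQL. Storing caller_number as text and minutes as REAL are assumptions, since the question doesn't show the column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t8_m4_bill (service_code INTEGER, charges INTEGER, "
    "caller_number TEXT, duration TEXT, minutes REAL)"
)
conn.executemany(
    "INSERT INTO t8_m4_bill VALUES (?, ?, ?, ?, ?)",
    [
        (10, 15, "8281490235", "00:00:00", 1.0),
        (11, 12, "9961621709", "00:00:00", 0.0),
        (10, 15, "8281490235", "01:00:44", 60.7333),
        (11, 2, "9744944316", "01:00:44", 60.7333),
    ],
)

# One output row per caller; SUM adds charges*minutes across that caller's rows.
totals = conn.execute(
    """
    SELECT caller_number, SUM(charges * minutes) AS cost
    FROM t8_m4_bill
    GROUP BY caller_number
    ORDER BY caller_number
    """
).fetchall()
for caller, cost in totals:
    print(caller, round(cost, 4))
```

Caller 8281490235 gets 15*1 + 15*60.7333, i.e. both of its rows are combined into a single total.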

Finding MAX Date of Two Fields in an Access Query

In my Access database, we keep track of two sets of dates: one for membership dues payments and one for other contributions (non-membership donations). Each person can have multiple dates of each type, depending on how many payments they have made.
Example:
+----+---------------+---------------+
| ID | Dues_Date | Cont_Date |
+----+---------------+---------------+
| 1 | 01/01/15 | 09/12/11 |
| | 01/01/14 | |
| | 01/01/13 | |
| 2 | 07/30/14 | 06/20/13 |
| | | 11/12/11 |
+----+---------------+---------------+
First, I needed to know the most recent payment for each of the two fields, so I ran a query that returns the MAX (most recent) date for each field.
Example Query:
+----+---------------+---------------+
| ID | Max Dues_Date | Max Cont_Date |
+----+---------------+---------------+
| 1 | 01/01/15 | 09/12/11 |
| 2 | 07/30/14 | 06/20/13 |
| 3 | 02/11/13 | 09/16/14 |
| 4 | 07/30/12 | 06/20/11 |
| 5 | 12/13/13 | 11/12/14 |
+----+---------------+---------------+
Now I need a third field in the same query to compare the results of the first two fields and show which is the MAX of those two.
I have columns 2 and 3 in the query; how can I use them to create column 4 in the same query?
Example Query:
+----+---------------+---------------+-----------------+
| ID | Max Dues_Date | Max Cont_Date | Max Date(DD&CD) |
+----+---------------+---------------+-----------------+
| 1 | 01/01/15 | 09/12/11 | 01/01/15 |
| 2 | 07/30/14 | 06/20/13 | 07/30/14 |
| 3 | 02/11/13 | 09/16/14 | 09/16/14 |
| 4 | 07/30/12 | 06/20/11 | 07/30/12 |
| 5 | 12/13/13 | 11/12/14 | 11/12/14 |
+----+---------------+---------------+-----------------+
Try adapting this to your own scenario:
SELECT tblTest.DueDate, tblTest.ContDate, [DueDate]-[ContDate] AS Test, IIf([Test]<0,[ContDate],[DueDate]) AS MaxRes
FROM tblTest;
Test is DueDate minus ContDate, so it is negative when ContDate is the later date. The IIf expression then returns whichever of the two dates is later.
Does this help?
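The same idea can be applied to the per-person maxima in one query. Below is a portable sketch run in SQLite through Python's sqlite3. It uses CASE, which Access SQL lacks (there you would write IIf). The payments table name is hypothetical, the rows are adapted from the question's examples, and dates are rewritten as ISO-8601 text so that string comparison orders them chronologically. Access would use real Date/Time fields instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; NULL means no payment of that type on that row.
conn.execute("CREATE TABLE payments (ID INTEGER, Dues_Date TEXT, Cont_Date TEXT)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1, "2015-01-01", "2011-09-12"),
        (1, "2014-01-01", None),
        (1, "2013-01-01", None),
        (2, "2014-07-30", "2013-06-20"),
        (2, None, "2011-11-12"),
        (3, "2013-02-11", "2014-09-16"),
    ],
)

rows = conn.execute(
    """
    SELECT ID, MaxDues, MaxCont,
           -- Pick the later of the two maxima; fall back to whichever is not NULL.
           CASE
               WHEN MaxCont IS NULL OR MaxDues >= MaxCont THEN MaxDues
               ELSE MaxCont
           END AS MaxBoth
    FROM (
        SELECT ID, MAX(Dues_Date) AS MaxDues, MAX(Cont_Date) AS MaxCont
        FROM payments
        GROUP BY ID
    ) AS m
    ORDER BY ID
    """
).fetchall()
for row in rows:
    print(row)
```

The explicit NULL check matters: a person with no contributions would otherwise make the comparison NULL, and the subtraction-based version in the answer above would pick DueDate only by accident.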

How to insert the average from a query into a MySQL table

I am trying to get the average latency for each item, with the data held in two separate MySQL tables. To clarify, I have the following two tables:
table: monitor_servers
+-----------+-----------------+
| server_id | label |
+-----------+-----------------+
| 1 | a.com |
| 2 | b.com |
+-----------+-----------------+
table: monitor_servers_uptime
+-------------------+-----------+-----------+
| servers_uptime_id | server_id | latency |
+-------------------+-----------+-----------+
| 1 | 1 | 0.4132809 |
| 3 | 1 | 0.4157769 |
| 6 | 1 | 0.4194210 |
| 9 | 1 | 0.4140880 |
| 12 | 2 | 0.4779439 |
| 15 | 2 | 0.4751789 |
| 18 | 2 | 0.4762829 |
| 22 | 2 | 0.4706681 |
+-------------------+-----------+-----------+
Basically, each domain is linked by the same server_id in both tables. When I run the query below, I get the average for each item:
select monitor_servers.label, avg(monitor_servers_uptime.latency)
from monitor_servers,monitor_servers_uptime
where monitor_servers.server_id = monitor_servers_uptime.server_id
group by monitor_servers.server_id;
The query returns:
+---------------------+-------------------------------------+
| label | avg(monitor_servers_uptime.latency) |
+---------------------+-------------------------------------+
| a.com | 0.41393792995 |
| b.com | 0.47551423171 |
+---------------------+-------------------------------------+
My questions: am I computing the average for each item the right way, and how can I insert each item's new average into a new column on the monitor_servers table? Also, what happens if some of the latency values are NULL?
**Edit:** What I am trying to achieve as a single query result is:
+-----------+----------+------------------+
| server_id | label | avg. |
+-----------+----------+------------------+
| 1 | a.com | 0.41393792995 |
| 2 | b.com | 0.47551423171 |
+-----------+----------+------------------+
Thanks in advance,
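Your calculation seems to be correct. AVG() ignores NULL values, so rows whose latency is NULL are left out of the average (a server whose latencies are all NULL gets NULL).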
You could add another column to monitor_servers with SQL (the column definition needs a data type):
ALTER TABLE monitor_servers ADD COLUMN avg_latency DOUBLE NOT NULL DEFAULT 0.0;
For doing the AVG calculation check this answer.
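Here is a sketch of the whole workflow, again with Python's sqlite3 standing in for MySQL. It adds the column, fills it with a correlated subquery (a form MySQL also accepts), then reads back the server_id, label, average shape from the edit. The extra row with servers_uptime_id 25 and a NULL latency is made up to show that AVG() skips NULLs. COALESCE keeps the NOT NULL column valid for a server with no uptime rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE monitor_servers (server_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE monitor_servers_uptime (
        servers_uptime_id INTEGER PRIMARY KEY, server_id INTEGER, latency REAL);
    INSERT INTO monitor_servers VALUES (1, 'a.com'), (2, 'b.com');
    INSERT INTO monitor_servers_uptime VALUES
        (1, 1, 0.4132809), (3, 1, 0.4157769), (6, 1, 0.4194210), (9, 1, 0.4140880),
        (12, 2, 0.4779439), (15, 2, 0.4751789), (18, 2, 0.4762829), (22, 2, 0.4706681),
        (25, 2, NULL);  -- made-up row: AVG() ignores it
    """
)

# REAL is SQLite's float type; in MySQL this would be e.g. DOUBLE.
conn.execute("ALTER TABLE monitor_servers ADD COLUMN avg_latency REAL NOT NULL DEFAULT 0.0")

# Store each server's average; COALESCE covers servers with no uptime rows.
conn.execute(
    """
    UPDATE monitor_servers
    SET avg_latency = COALESCE((
        SELECT AVG(u.latency)
        FROM monitor_servers_uptime AS u
        WHERE u.server_id = monitor_servers.server_id
    ), 0.0)
    """
)

rows = conn.execute(
    "SELECT server_id, label, avg_latency FROM monitor_servers ORDER BY server_id"
).fetchall()
for row in rows:
    print(row)
```

The stored column goes stale as new uptime rows arrive, so the UPDATE has to be re-run (or the average computed at query time with the GROUP BY join) whenever current values are needed.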