Full disclosure, I'm a noob at SQL
Given two sparse matrices A and B, defined as:
A(row_number, column_number, value) and B(row_number, column_number, value)
I don't understand how this query represents the multiplication of the two matrices:
SELECT A.row_number, B.column_number, SUM(A.value * B.value)
FROM A, B
WHERE A.column_number = B.row_number
GROUP BY A.row_number, B.column_number
My confusion lies in the SUM syntax and the GROUP BY / SELECT syntax
So for my GROUP BY / SELECT confusion, I don't understand why the expressions
A.row_number and B.column_number are necessary after the SELECT statement
Why do we have to specify that when we're already using SELECT and WHERE ? To me that seems like we're saying we want to SELECT using those expressions (A.row_number and B.column_number) even though we're given back a table from WHERE already. Would it not make more sense to just say SELECT * ? I'm assuming that GROUP BY just requires you to type out the expressions it uses in the SELECT statement, but I don't know for sure.
For the SUM, I just want to clarify, the SUM is only using the A.value and the B.value from whatever is returned by the WHERE correct? Otherwise, you would be multiplying all A.value with all B.value.
Clarifying either of these would be immensely helpful. Thank you!
create table A
( column_number int,
row_number int,
value int
);
create table B
( column_number int,
row_number int,
value int
);
insert A (column_number,row_number,value) values (1,1,1),(1,2,2),(2,1,3),(2,2,4);
insert B (column_number,row_number,value) values (1,1,10),(1,2,20),(2,1,30),(2,2,40);
Here is the data from your old-style (implicit) join, without any aggregate or GROUP BY:
SELECT A.row_number as Ar, B.column_number as Bc,
A.value as Av,B.value as Bv,A.value*B.value as product
FROM A, B
WHERE A.column_number = B.row_number
+------+------+------+------+---------+
| Ar | Bc | Av | Bv | product |
+------+------+------+------+---------+
| 1 | 1 | 1 | 10 | 10 |
| 2 | 1 | 2 | 10 | 20 |
| 1 | 1 | 3 | 20 | 60 |
| 2 | 1 | 4 | 20 | 80 |
| 1 | 2 | 1 | 30 | 30 |
| 2 | 2 | 2 | 30 | 60 |
| 1 | 2 | 3 | 40 | 120 |
| 2 | 2 | 4 | 40 | 160 |
+------+------+------+------+---------+
With the rows above in view, the grouped query below becomes clearer:
SELECT A.row_number, B.column_number,sum(A.value * B.value) as theSum
FROM A, B
WHERE A.column_number = B.row_number
GROUP BY A.row_number, B.column_number
+------------+---------------+--------+
| row_number | column_number | theSum |
+------------+---------------+--------+
| 1 | 1 | 70 |
| 1 | 2 | 150 |
| 2 | 1 | 100 |
| 2 | 2 | 220 |
+------------+---------------+--------+
Prefixing a column with its table name in the SELECT list identifies which table the column comes from. This is mainly needed when both tables have columns with the same name, as A and B do here.
GROUP BY will aggregate the data and display one record per grouped-by value. That is, in your case, you'll end up with only one record per row-column combination.
By definition multiplication of two matrices A(n,m) and B(m,p) produces a matrix C(n,p).
So the SQL for the multiplication should return the same data structure that is used to store A and B, which is three columns:
row_number
column_number
value
with one value per (row, column) combination.
This is why you need the first two in the GROUP BY clause: SUM then collapses each (row, column) group into a single value.
The WHERE clause is independent of SELECT. The former is responsible for getting the right records, the latter for getting the right columns.
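To check the reasoning above end to end, here is a minimal sketch that runs the query from the question on the sample data used earlier in this thread. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and it writes the join with an explicit JOIN ... ON, which is equivalent to the comma join plus WHERE. The function name multiply_sparse is made up for this illustration.

```python
import sqlite3


def multiply_sparse(a_cells, b_cells):
    """Multiply two sparse matrices given as (row_number, column_number, value) tuples."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
    conn.execute("CREATE TABLE A (row_number INT, column_number INT, value INT)")
    conn.execute("CREATE TABLE B (row_number INT, column_number INT, value INT)")
    conn.executemany("INSERT INTO A VALUES (?, ?, ?)", a_cells)
    conn.executemany("INSERT INTO B VALUES (?, ?, ?)", b_cells)
    rows = conn.execute(
        """
        SELECT A.row_number, B.column_number, SUM(A.value * B.value)
        FROM A
        JOIN B ON A.column_number = B.row_number   -- pair A[i,k] with B[k,j]
        GROUP BY A.row_number, B.column_number     -- one output cell per (i,j)
        ORDER BY A.row_number, B.column_number
        """
    ).fetchall()
    conn.close()
    return rows


# Same cells as the sample INSERT statements, written as (row, column, value).
A = [(1, 1, 1), (2, 1, 2), (1, 2, 3), (2, 2, 4)]
B = [(1, 1, 10), (2, 1, 20), (1, 2, 30), (2, 2, 40)]
print(multiply_sparse(A, B))
# [(1, 1, 70), (1, 2, 150), (2, 1, 100), (2, 2, 220)]
```

The output has the same three-column shape as the inputs, so the result could be stored and multiplied again.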
I need to get the AVG for every course in SQL. For example, this is the first table:
+-----------+-------------+
| course_id | course_name |
+-----------+-------------+
| 1         | a           |
| 2         | b           |
| 3         | c           |
| 4         | g           |
+-----------+-------------+
This is the second table:
+--------------------+------+-----------+
| course_feedback_id | rate | course_id |
+--------------------+------+-----------+
| 1                  | 4    | 1         |
| 2                  | 3    | 1         |
| 3                  | 2    | 2         |
+--------------------+------+-----------+
I need to get the AVG for both course_id 1 and 2. This is the final result that I need:
+-----------+-----------+
| course_id | AVG(rate) |
+-----------+-----------+
| 1         | 3.5       |
| 2         | 2         |
+-----------+-----------+
I tried this solution, but it gives me only the first row, not all records.
SELECT *, AVG(`rate`) from secondTable
please help
SELECT `id`, AVG(`rate`) FROM `your_table` GROUP BY `id`
Try this:
SELECT c.course_id, AVG(fb.rate)
FROM course AS c
INNER JOIN course_feedback AS fb ON fb.course_id = c.course_id
GROUP BY c.course_id
SELECT t1.course_id, t2.rate FROM table1 AS t1 JOIN (SELECT course_id, AVG(rate) AS rate FROM table2 GROUP BY course_id) AS t2 ON t2.course_id = t1.course_id
When a column contains repeated values (here, several feedback rows share the same course id) and you want an aggregate per value, use GROUP BY. As the name says, it groups the rows by the column it is applied to, and an aggregate such as AVG is then computed per group rather than over the whole table. For id 1 there are two rows, so AVG is taken over those two rates, and likewise for every other group.
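A small runnable sketch of the per-group average, using the feedback rows from the question. It assumes Python's built-in sqlite3 module in place of MySQL and the table name course_feedback from the JOIN answer above; the function name is invented for the example.

```python
import sqlite3


def average_rate_per_course(feedback_rows):
    """Return (course_id, average rate) pairs, one per course_id."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
    conn.execute(
        "CREATE TABLE course_feedback (course_feedback_id INT, rate INT, course_id INT)"
    )
    conn.executemany("INSERT INTO course_feedback VALUES (?, ?, ?)", feedback_rows)
    rows = conn.execute(
        """
        SELECT course_id, AVG(rate)
        FROM course_feedback
        GROUP BY course_id      -- without this, AVG runs over the whole table
        ORDER BY course_id
        """
    ).fetchall()
    conn.close()
    return rows


# Rows from the question: (course_feedback_id, rate, course_id).
feedback = [(1, 4, 1), (2, 3, 1), (3, 2, 2)]
print(average_rate_per_course(feedback))
# [(1, 3.5), (2, 2.0)]
```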
I need to perform a SELECT SUM() over a formula that is stored in a field selected by another query.
Example:
table_A (the "formula" field contains, in each cell, an arithmetic expression involving columns from table B):
+------------+--------------+------------+
| Product_id | related_prod | formula |
+------------+--------------+------------+
| U1 | C2 | col2-col1 |
| U2 | C3 | col3-col2 |
| U3 | C4 | col3-col1 |
+------------+--------------+------------+
table_B:
+------------+---------+------------+----------+------+------+------+
| Product_id | year_id | company_id | month_id | col1 | col2 | col3 |
+------------+---------+------------+----------+------+------+------+
| C2 | 2017 | 1 | 2 | 100 | 200 | 300 |
| C3 | 2017 | 1 | 2 | 400 | 500 | 600 |
| C4 | 2017 | 1 | 2 | 700 | 800 | 900 |
+------------+---------+------------+----------+------+------+------+
I then run the following query:
SELECT
SUM(totals.relaz) as final_sum,
totals.relaz as 'col',
totals.prod as 'prod',
totals.cons as 'cons',
m.company_id, m.month_id, m.year_id
FROM `table_B` m,
( SELECT formula as relaz,
related_prod as prod,
p.product_id as cons FROM table_A p )
AS totals
WHERE m.product_id=totals.prod
GROUP BY m.company_id, m.year_id, m.month_id, m.product_id, totals.cons
After the select, I would expect that, taking only product 'U1' as an example, the corresponding row would be
+-----------+-----------+------+------+------------+----------+---------+
| final_sum | col | prod | cons | company_id | month_id | year_id |
+-----------+-----------+------+------+------------+----------+---------+
| 100 | col2-col1 | C2 | U1 | 1 | 2 | 2017 |
+-----------+-----------+------+------+------------+----------+---------+
Instead, what I get is
+-----------+-----------+------+------+------------+----------+---------+
| final_sum | col | prod | cons | company_id | month_id | year_id |
+-----------+-----------+------+------+------------+----------+---------+
| 0 | col2-col1 | C2 | U1 | 1 | 2 | 2017 |
+-----------+-----------+------+------+------------+----------+---------+
i.e. the final_sum field is always 0, even though the 'col' field contains the correct expression.
What am I doing wrong?
Thank you in advance
Alex
You are trying to sum a string column (table_A.formula). This results in 0. MySQL/MariaDB will not convert the strings to column references and evaluate the formula in the string.
Another thing is that you should list in GROUP BY every selected column that is not inside an aggregate function.
To get the result you want, use:
SELECT
SUM(CASE
WHEN a.formula = 'col2-col1' THEN b.col2-b.col1
WHEN a.formula = 'col3-col1' THEN b.col3-b.col1
WHEN a.formula = 'col3-col2' THEN b.col3-b.col2
END
) AS final_sum,
a.formula as 'col',
a.related_prod as 'prod',
a.Product_id as 'cons',
b.company_id,
b.month_id,
b.year_id
FROM table_B b
JOIN table_A a on a.related_prod=b.Product_id
GROUP BY a.formula, a.related_prod, a.Product_id, b.company_id, b.month_id, b.year_id
It may be possible to build a stored routine that fetches the string col2-col1 and inserts it (using CONCAT) into a string, then PREPAREs and EXECUTEs the SQL string.
That is, dynamically build the SQL, perhaps like in @slaakso's answer.
It would be messy.
I have needed something like this; I chose to do eval() in PHP, which was the client language. I use it for evaluating VARIABLES and GLOBAL STATUS. Example: Table_open_cache_misses / Uptime gives the "misses per second", which, if high, indicates the need for increasing the setting table_open_cache.
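For what it's worth, here is a rough sketch of the "build the SQL in the client" idea in Python rather than PHP, with no eval(). It assumes Python's built-in sqlite3 module in place of MySQL, and it assumes formulas are just column names joined by + or -; the whitelist, the regular expression and all function names are made up for this illustration. The formula text is validated before being spliced into the statement, since it ends up as SQL.

```python
import re
import sqlite3

# Assumption: only these table_B columns may appear in a formula.
ALLOWED_COLUMNS = {"col1", "col2", "col3"}
# Assumption: a formula is column names joined by + or -, e.g. "col2-col1".
FORMULA_RE = re.compile(r"\w+(?:\s*[-+]\s*\w+)*")


def checked_formula(formula):
    """Return the formula unchanged if it only uses whitelisted columns."""
    if not FORMULA_RE.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    for name in re.findall(r"\w+", formula):
        if name not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column in formula: {name!r}")
    return formula


def sums_by_formula(conn):
    """Evaluate each table_A formula against its related table_B rows."""
    totals = {}
    formulas = conn.execute(
        "SELECT Product_id, related_prod, formula FROM table_A"
    ).fetchall()
    for product_id, related_prod, formula in formulas:
        expr = checked_formula(formula)
        # The validated expression becomes SQL text; the product id stays a bound value.
        (total,) = conn.execute(
            f"SELECT SUM({expr}) FROM table_B WHERE Product_id = ?",
            (related_prod,),
        ).fetchone()
        totals[product_id] = total
    return totals


def demo_connection():
    """Build the two sample tables from the question in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_A (Product_id TEXT, related_prod TEXT, formula TEXT)")
    conn.execute(
        "CREATE TABLE table_B (Product_id TEXT, year_id INT, company_id INT,"
        " month_id INT, col1 INT, col2 INT, col3 INT)"
    )
    conn.executemany(
        "INSERT INTO table_A VALUES (?, ?, ?)",
        [("U1", "C2", "col2-col1"), ("U2", "C3", "col3-col2"), ("U3", "C4", "col3-col1")],
    )
    conn.executemany(
        "INSERT INTO table_B VALUES (?, ?, ?, ?, ?, ?, ?)",
        [("C2", 2017, 1, 2, 100, 200, 300),
         ("C3", 2017, 1, 2, 400, 500, 600),
         ("C4", 2017, 1, 2, 700, 800, 900)],
    )
    return conn


print(sums_by_formula(demo_connection()))
# {'U1': 100, 'U2': 100, 'U3': 200}
```

This issues one query per formula row, which is fine for a handful of formulas; the CASE approach above stays in a single statement.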
I have some data (~70,000 rows) that is in a similar format to the below.
+-----------+-----+-----+----+-----------+
| ID | A | B | C | Whatever |
+-----------+-----+-----+----+-----------+
| 1banana | 42 | 0 | 2 | Um |
| fhqwhgads | 514 | 6 | 9 | Nevermind |
| 2banana | 69 | 42 | 0 | NULL |
| pears | 18 | 96 | 2 | 8.8 |
| zubat2 | 96 | 2 | 14 | "NULL" |
+-----------+-----+-----+----+-----------+
I want to make an output table that counts how many times each number occurs in any of the three columns, such as:
+--------+---------+---------+---------+-----+
| Number | A count | B count | C count | sum |
+--------+---------+---------+---------+-----+
| 0 | 0 | 1 | 1 | 2 |
| 2 | 0 | 1 | 2 | 3 |
| 6 | 0 | 1 | 0 | 1 |
| 9 | 0 | 0 | 1 | 1 |
| 14 | 0 | 0 | 1 | 1 |
| 18 | 1 | 0 | 0 | 1 |
| 42 | 1 | 1 | 0 | 2 |
| 69 | 1 | 0 | 0 | 1 |
| 96 | 1 | 1 | 0 | 2 |
| 514 | 1 | 0 | 0 | 1 |
+--------+---------+---------+---------+-----+
(In my real-world use, there would be at least 10 times as many rows in the input table as in the query result.)
It isn't that important whether the query returns a row of zeros for numbers that aren't anywhere in those 3 columns, nor whether there is a distinct sum column (though my preference is that it has the sum column and that numbers not in any column are excluded).
Currently, I am using the following query to get ungrouped data:
SELECT * #Number, COUNT(DISTINCT A), COUNT(DISTINCT B), COUNT(DISTINCT C)
FROM
( # Generate a list of numbers to try
SELECT @ROW := @ROW + 1 AS `Number`
FROM DataTable t
join (SELECT @ROW := -9) t2
LIMIT 777 # None of the numbers I am interested in should be greater than this
) AS NumberList
INNER JOIN DataTable ON
Number = A
OR Number = B
OR Number = C
#WHERE <filters on DataTable columns to speed things up>
#WHERE NUMBER = 10 # speed things up
#GROUP BY Number
The above query, with the commented-out parts of the code left as they are, returns a table similar to the data table, but sorted by which Number each entry matches. I would like to group together all rows starting with the same Number, and have the values in the "data" columns of the query result be the count of how many times the Number occurred in the corresponding column of DataTable.
When I uncomment the grouping statements (and delete the * from the SELECT statement), I can get the count of how many rows each Number appeared in (useful for the sum column of the desired output). However, it does not give me the actual totals of how many times the Number matched each data column: I just get three copies of the number of rows where Number was found. How do I get the groupings to be by each actual column instead of the total number of matching rows?
Additionally, you may have noticed that I have some lines with comments regarding speeding things up. This query is slow, so I added a couple filters so testing it runs faster. I would very much like some way to make it run fast so that sending the results of the query from the complete set to a new table is not the only reasonable way to re-use this data, since I would like to have the ability to play around with the filters on DataTable for non-performance reasons. Is there a better way to structure the overall query so that it runs faster?
I think you want to unpivot using union all and then an aggregation:
select number, sum(a) as a, sum(b) as b, sum(c) as c, count(*) as `sum`
from ((select a as number, 1 as a, 0 as b, 0 as c from t
) union all
(select b, 0 as a, 1 as b, 0 as c from t
) union all
(select c, 0 as a, 0 as b, 1 as c from t
)
) abc
group by number
order by number;
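To see the unpivot at work on the five sample rows from the question, here is a sketch that assumes Python's built-in sqlite3 module in place of MySQL. The table name t follows the query above; the flag column names in_a, in_b, in_c and the function name are chosen only for this example.

```python
import sqlite3


def count_numbers(rows):
    """Count, per number, how often it occurs in columns a, b and c of table t."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
    conn.execute("CREATE TABLE t (id TEXT, a INT, b INT, c INT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT number, SUM(in_a), SUM(in_b), SUM(in_c), COUNT(*)
        FROM (
            -- one output row per cell, tagged with the column it came from
            SELECT a AS number, 1 AS in_a, 0 AS in_b, 0 AS in_c FROM t
            UNION ALL
            SELECT b, 0, 1, 0 FROM t
            UNION ALL
            SELECT c, 0, 0, 1 FROM t
        ) AS abc
        GROUP BY number
        ORDER BY number
        """
    ).fetchall()
    conn.close()
    return result


# Sample rows from the question: (ID, A, B, C); the Whatever column is left out.
data = [
    ("1banana", 42, 0, 2),
    ("fhqwhgads", 514, 6, 9),
    ("2banana", 69, 42, 0),
    ("pears", 18, 96, 2),
    ("zubat2", 96, 2, 14),
]
for row in count_numbers(data):
    print(row)
```

Each source table is scanned three times, but there is no join against a generated number list, and numbers that never occur simply produce no row.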
I have read answers to similar questions, and solved my problem based on these answers. However, my solution is rather complex: it includes two-step views and an additional table with a left join. I wonder if there is a simpler, more elegant solution to the problem illustrated below. The table Test has a column B that needs to be summed per value of A, counting only rows where column Marker is 4 or over. The table TestResult is the desired output of the query. As I say, I do have an awkward, complex solution, but I'm curious whether there is a simple, elegant method.
mysql> select * FROM Test;
+----+------+------+--------+
| ID | A | B | Marker |
+----+------+------+--------+
| 1 | 11 | 2 | 4 |
| 2 | 11 | 1 | 5 |
| 3 | 14 | 4 | 4 |
| 4 | 11 | 2 | 1 |
| 5 | 12 | 2 | 2 |
| 6 | 13 | 2 | 3 |
| 7 | 14 | 2 | 2 |
+----+------+------+--------+
7 rows in set (0.00 sec)
mysql> select * FROM TestResult;
+----+------+
| A | SumB |
+----+------+
| 11 | 3 |
| 12 | 0 |
| 13 | 0 |
| 14 | 4 |
+----+------+
4 rows in set (0.00 sec)
This can be achieved with GROUP BY and CASE:
select A, sum(case when marker >= 4 then B else 0 end)
from Test
group by A;
Check out this SQL Fiddle
Edit: I originally overlooked the "or over" part of the question; leading me to think a different (more difficult to reach) answer was desired. I've corrected the condition, but left the solution for the more difficult answer since someone has apparently found it helpful.
Using the LEFT JOIN assures you get all A values from Test, the subselect provides you an indicator of the A values of rows satisfying the condition.
SELECT t0.A, IF(t1.A IS NULL, 0, SUM(t0.B)) AS sumB
FROM Test AS t0
LEFT JOIN (SELECT DISTINCT A FROM Test WHERE Marker >= 4) AS t1
ON t0.A = t1.A
GROUP BY t0.A, t1.A;
Alternatively, this is the most succinct; but might be less reliable or even rejected through some types of connections.
SELECT A, IF(SUM(IF(Marker>=4,1,0))>0, SUM(B),0) AS sumB
FROM Test
GROUP BY A;
I'd do it like this. Start with this:
SELECT t.A
FROM Test t
GROUP BY t.A
That's easy enough.
For the second column, we want a SUM of an expression...
SELECT t.A
, SUM(expr) AS SumB
FROM Test t
GROUP BY t.A
The trick is to use an expression that performs a conditional test to determine whether a value of column B should be returned or not. For example, in MySQL-specific syntax, something like:
IF(t.Marker >= 4, t.B, 0)
Or, a more ANSI standards compliant equivalent:
CASE WHEN t.Marker >= 4 THEN t.B ELSE 0 END
Putting that all together...
SELECT t.A
, SUM(IF(t.Marker >= 4, t.B, 0)) AS SumB
FROM Test t
GROUP BY t.A
EDIT
What if there are rows like this in Test
+----+------+------+--------+
| ID | A | B | Marker |
+----+------+------+--------+
| 1 | 11 | 2 | 4 |
| 2 | 11 | 1 | 5 |
| 17 | 11 | 4 | 1 |
The question is whether, for the row with a Marker value of 1 (which is not greater than or equal to 4), the value of B should be included in the SUM.
With the query above, you'd get a SumB value of 3: only the values from the first two rows are included in the SUM.
If you need to return a value of 7 for SumB (including that third row), then you'll need a different query pattern.
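The two readings can be compared side by side on the Test table from the question. This is only a sketch: it assumes Python's built-in sqlite3 module in place of MySQL, uses CASE instead of IF() so it runs there, and picks MAX(Marker) >= 4 as one possible way to express "some row in the group qualifies". The function names are invented for the example.

```python
import sqlite3

# Rows from the question: (ID, A, B, Marker).
TEST_ROWS = [
    (1, 11, 2, 4), (2, 11, 1, 5), (3, 14, 4, 4), (4, 11, 2, 1),
    (5, 12, 2, 2), (6, 13, 2, 3), (7, 14, 2, 2),
]


def _connect(rows):
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
    conn.execute("CREATE TABLE Test (ID INT, A INT, B INT, Marker INT)")
    conn.executemany("INSERT INTO Test VALUES (?, ?, ?, ?)", rows)
    return conn


def sum_qualifying_rows(rows):
    """Per A, sum B over only the rows whose Marker is 4 or over."""
    conn = _connect(rows)
    result = conn.execute(
        """
        SELECT A, SUM(CASE WHEN Marker >= 4 THEN B ELSE 0 END) AS SumB
        FROM Test GROUP BY A ORDER BY A
        """
    ).fetchall()
    conn.close()
    return result


def sum_whole_group_if_any_qualifies(rows):
    """Per A, sum every B in the group, but only if some row has Marker >= 4."""
    conn = _connect(rows)
    result = conn.execute(
        """
        SELECT A, CASE WHEN MAX(Marker) >= 4 THEN SUM(B) ELSE 0 END AS SumB
        FROM Test GROUP BY A ORDER BY A
        """
    ).fetchall()
    conn.close()
    return result


print(sum_qualifying_rows(TEST_ROWS))
# [(11, 3), (12, 0), (13, 0), (14, 4)]
print(sum_whole_group_if_any_qualifies(TEST_ROWS))
# [(11, 5), (12, 0), (13, 0), (14, 6)]
```

The first result matches the TestResult table in the question; the second is the pattern to reach for if low-Marker rows of a qualifying group should be counted too.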
This seems to be a convoluted problem, but I'll try my best to articulate the idea and illustrate a scenario. Essentially I have two tables that need to be combined and returned as the result set for a single query. One table needs to be merged into the other in a specific order.
Say table one is called Articles and table two is called Features. Both tables have an ID field with unique numbers. Articles has a date field which will be used to initially sort its records in descending order. The Features table has a Delta field which will be used initially to sort its records. Some of the records in the Features table are placeholders and are not meant to be included in the final set. Their only purpose is to affect the sort order. Each record has a unique value in the Delta field, from 1 to X, which will be used to sort these records. Another field called Skip has a value of 1 if the record should be eliminated when merging the two tables together. Again, the only purpose of the skipped records is to take up space during the initial sort on the Features table. Even though they are unnecessary, they exist and can't be deleted.
The tricky part is that when the results from both tables are merged, any non-skipped records from the Features table need to be inserted into the results from the Articles table in the exact order they appear in the Features table.
So let's say I have 6 records in the Features table, A - F, and the Delta (order) field ranges from 1 - 6. Records A, B, D and E all have a value of 1 in the Skip field. That means I'm only interested in records C and F, both of which need to be inserted into the final record set, in positions 3 and 6 respectively.
The records may look something like this for the Articles table:
+----+------------+
| id | date |
+----+------------+
| 1 | 9999999999 |
+----+------------+
| 2 | 9999999998 |
+----+------------+
| 3 | 9999999997 |
+----+------------+
| 4 | 9999999996 |
+----+------------+
| 5 | 9999999995 |
+----+------------+
| 6 | 9999999994 |
+----+------------+
| 7 | 9999999993 |
+----+------------+
| 8 | 9999999992 |
+----+------------+
| 9 | 9999999991 |
+----+------------+
| 10 | 9999999990 |
+----+------------+
The Features table may look something like this:
+----+------+-------+------+
| id | name | delta | skip |
+----+------+-------+------+
| 11 | A | 1 | 1 |
+----+------+-------+------+
| 12 | B | 2 | 1 |
+----+------+-------+------+
| 13 | C | 3 | 0 |
+----+------+-------+------+
| 14 | D | 4 | 1 |
+----+------+-------+------+
| 15 | E | 5 | 1 |
+----+------+-------+------+
| 16 | F | 6 | 0 |
+----+------+-------+------+
The results would look something like this (not including any additional fields that might be needed to achieve my goal):
+----+
| id |
+----+
| 1 |
+----+
| 2 |
+----+
| 13 | (record from the Features table in the third position)
+----+
| 3 |
+----+
| 4 |
+----+
| 16 | (record from the Features table in the sixth position)
+----+
| 5 |
+----+
| 6 |
+----+
| 7 |
+----+
| 8 |
+----+
| 9 |
+----+
| 10 |
+----+
Hope my explanation makes sense. Any ideas?
Thanks,
Howie
I assume that there is a mistake in your example: record id=16 is the sixth row in the Features table, so it should come after id=5 in the results, not before.
Try the query below. Here is SQLFiddle.
select id from (
select `date`, null delta, id
from Articles
union all
select a.`date`, f.delta, f.id
from (
select (@x:=@x+1) rn, a.*
from Articles a, (select @x:=0) z
order by a.`date` desc
) a
join (
select (@y:=@y+1) rn, f.id, f.delta, f.skip
from Features f, (select @y:=0) z
order by f.delta
) f
on a.rn = f.rn
where f.skip <> 1
order by `date` desc, isnull( delta ), delta
) merge
Looks like this example in SQL Fiddle did it for me.
SELECT id, sort_order FROM (
SELECT `date`, NULL delta, id, (@a_count:=@a_count+1) sort_order
FROM Articles a_main, (SELECT @a_count:=-1) z
UNION ALL
SELECT a.`date`, f.delta, f.id, f.weighted_rn
FROM (
SELECT (@x:=@x+1) rn, a.*
FROM Articles a, (SELECT @x:=-1) z
ORDER BY a.`date` DESC
) a
JOIN (
SELECT (@y:=@y+1) rn, TRUNCATE((f.delta - @y - (1/@y)),2) AS weighted_rn, f.id, f.delta, f.skip
FROM Features f, (SELECT @y:=-1) z
WHERE f.skip <> 1
ORDER BY f.delta
) f
ON a.rn = f.rn
ORDER BY sort_order
) merge
Thanks to Kordirko for the framework.
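If doing the merge in the client is an option, the positional rule from the question can be sketched in a few lines of Python. This is not what either query above does internally; it is a plain-Python illustration that follows the question's expected output, where a non-skipped feature lands at the position of the merged list given by its delta. The function name and the tuple layouts are assumptions made for the example.

```python
from collections import deque


def merge_features(articles, features):
    """Interleave non-skipped features into the date-sorted article ids.

    articles: iterable of (id, date); features: iterable of (id, name, delta, skip).
    A feature with delta n takes the n-th position of the merged list.
    """
    # Articles newest first, as in the question.
    queue = deque(a_id for a_id, _date in sorted(articles, key=lambda a: a[1], reverse=True))
    # 1-based target position -> feature id; skipped placeholders are dropped.
    slots = {delta: f_id for f_id, _name, delta, skip in features if skip != 1}

    merged = []
    position = 1
    while queue or any(delta >= position for delta in slots):
        if position in slots:
            merged.append(slots[position])
        elif queue:
            merged.append(queue.popleft())
        # If articles run out, leftover features are appended in delta order.
        position += 1
    return merged


# Sample data from the question.
articles = [(i, 10_000_000_000 - i) for i in range(1, 11)]
features = [
    (11, "A", 1, 1), (12, "B", 2, 1), (13, "C", 3, 0),
    (14, "D", 4, 1), (15, "E", 5, 1), (16, "F", 6, 0),
]
print(merge_features(articles, features))
# [1, 2, 13, 3, 4, 16, 5, 6, 7, 8, 9, 10]
```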