Summarise multiple columns at once in MySQL - mysql

I have some data (~70,000 rows) that is in a similar format to the below.
+-----------+-----+-----+----+-----------+
| ID | A | B | C | Whatever |
+-----------+-----+-----+----+-----------+
| 1banana | 42 | 0 | 2 | Um |
| fhqwhgads | 514 | 6 | 9 | Nevermind |
| 2banana | 69 | 42 | 0 | NULL |
| pears | 18 | 96 | 2 | 8.8 |
| zubat2 | 96 | 2 | 14 | "NULL" |
+-----------+-----+-----+----+-----------+
I want to make an output table that counts how many times each number occurs in any of the three columns, such as:
+--------+---------+---------+---------+-----+
| Number | A count | B count | C count | sum |
+--------+---------+---------+---------+-----+
| 0 | 0 | 1 | 1 | 2 |
| 2 | 0 | 1 | 2 | 3 |
| 6 | 0 | 1 | 0 | 1 |
| 9 | 0 | 0 | 1 | 1 |
| 14 | 0 | 0 | 1 | 1 |
| 18 | 1 | 0 | 0 | 1 |
| 42 | 1 | 1 | 0 | 2 |
| 69 | 1 | 0 | 0 | 1 |
| 96 | 1 | 1 | 0 | 2 |
| 514 | 1 | 0 | 0 | 1 |
+--------+---------+---------+---------+-----+
(In my real-world use, the input table would have at least 10 times as many rows as the query result.)
Whether the query returns a row of zeros for numbers that appear in none of those 3 columns isn't that important, nor is the lack of a distinct sum column (though I would prefer to have the sum column and to exclude numbers that appear in no column).
Currently, I am using the following query to get ungrouped data:
SELECT * #Number, COUNT(DISTINCT A), COUNT(DISTINCT B), COUNT(DISTINCT C)
FROM
( # Generate a list of numbers to try
SELECT @ROW := @ROW + 1 AS `Number`
FROM DataTable t
join (SELECT @ROW := -9) t2
LIMIT 777 # None of the numbers I am interested in should be greater than this
) AS NumberList
INNER JOIN DataTable ON
Number = A
OR Number = B
OR Number = C
#WHERE <filters on DataTable columns to speed things up>
#WHERE NUMBER = 10 # speed things up
#GROUP BY Number
The above query, with the commented-out parts left as they are, returns a table similar to the data table, but sorted by the Number that each entry matches. I would like to group together all rows starting with the same Number, and have the values in the "data" columns of the query result be the count of how many times the Number occurred in the corresponding column of DataTable.
When I uncomment the grouping statements (and delete the * from the SELECT statement), I can get the count of how many rows each Number appeared in (useful for the sum column of the desired output). However, it does not give me the actual totals of how many times the Number matched each data column: I just get three copies of the number of rows where Number was found. How do I get the groupings to be by each actual column instead of the total number of matching rows?
Additionally, you may have noticed that I have some lines with comments regarding speeding things up. This query is slow, so I added a couple filters so testing it runs faster. I would very much like some way to make it run fast so that sending the results of the query from the complete set to a new table is not the only reasonable way to re-use this data, since I would like to have the ability to play around with the filters on DataTable for non-performance reasons. Is there a better way to structure the overall query so that it runs faster?

I think you want to unpivot using union all and then an aggregation:
select number, sum(a) as a, sum(b) as b, sum(c) as c, count(*) as `sum`
from ((select a as number, 1 as a, 0 as b, 0 as c from t
) union all
(select b, 0 as a, 1 as b, 0 as c from t
) union all
(select c, 0 as a, 0 as b, 1 as c from t
)
) abc
group by number
order by number;
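To sanity-check the unpivot-and-aggregate idea against the sample data from the question, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here, and the table name t, the lowercase column names and the result aliases (a_count, b_count, c_count, total) are assumptions made for the illustration, not the asker's real schema.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("1banana", 42, 0, 2),
        ("fhqwhgads", 514, 6, 9),
        ("2banana", 69, 42, 0),
        ("pears", 18, 96, 2),
        ("zubat2", 96, 2, 14),
    ],
)

# Each branch of the UNION ALL emits one row per source value, with a 1 in
# the indicator column for the source column it came from. Summing the
# indicators per number gives the per-column counts.
query = """
SELECT number,
       SUM(a) AS a_count,
       SUM(b) AS b_count,
       SUM(c) AS c_count,
       COUNT(*) AS total
FROM (
    SELECT a AS number, 1 AS a, 0 AS b, 0 AS c FROM t
    UNION ALL
    SELECT b, 0, 1, 0 FROM t
    UNION ALL
    SELECT c, 0, 0, 1 FROM t
) AS abc
GROUP BY number
ORDER BY number
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

Because every source row is read once per column rather than joined against a generated number list with OR conditions, this shape also avoids the slow join in the original query.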

Related

COUNT() with a nested subquery

Can I count a column with the number of instances of a character in a particular column?
+---+---+
| i | p |
+---+---+
| A | 3 |
| B | 3 |
| C | 0 |
| A | 1 |
| B | 1 |
| C | 3 |
| A | 1 |
| B | 0 |
| C | 0 |
+---+---+
Query:
SELECT i, SUM(p) AS Sp, COUNT(p) AS Cp FROM table
GROUP BY i
I'd like to get this:
+---+----+----+-----+-----+-----+
| i | Sp | Cp | x3x | x1x | x0x |
+---+----+----+-----+-----+-----+
| A | 5 | 3 | 1 | 2 | 0 |
| B | 4 | 3 | 1 | 1 | 1 |
| C | 3 | 3 | 1 | 0 | 2 |
+---+----+----+-----+-----+-----+
Essentially I want to COUNT the instances of 3, 0 or 1 in a column where the column is grouped by the id 'i'
I tried this as well as a number of variations, but I can't seem to get it going.
COUNT(P WHERE p='3'), COUNT(P WHERE p='1'), COUNT(P WHERE p='0'),
Is there a means by which I can place a subquery within a COUNT() that I've missed in my research?
I also tried
COUNT(Points='3'), COUNT(='1'), COUNT(Points='0'),
You are close:
select i, sum(points), count(*),
sum(Points = 3), sum(points = 1), sum(Points = 0)
from t
group by i;
One minor difference in this case is that I removed the single quotes around the values. When comparing to a number, don't use single quotes. Only use single quotes for string and date constants.
The more important change is from count() to sum(). count() counts the number of non-NULL values. Well, the boolean expression is true or false -- but not really NULL (unless points is NULL, which is not the case with your data).
MySQL treats boolean values as integers in a numeric context, with 0 for false and 1 for true. So, adding them up counts the number of times that something is true.
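The difference between count() and sum() over a boolean expression can be seen in a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL (SQLite also evaluates a comparison to 0 or 1), with the question's data loaded into a table named t with columns i and p; the result aliases are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (i TEXT, p INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", 3), ("B", 3), ("C", 0),
     ("A", 1), ("B", 1), ("C", 3),
     ("A", 1), ("B", 0), ("C", 0)],
)

# SUM over a comparison adds up the 1s, so it counts the matching rows.
summed = conn.execute("""
    SELECT i, SUM(p) AS sp, COUNT(*) AS cp,
           SUM(p = 3) AS x3x, SUM(p = 1) AS x1x, SUM(p = 0) AS x0x
    FROM t
    GROUP BY i
    ORDER BY i
""").fetchall()

# COUNT over the same comparison counts every non-NULL result, true or
# false, so it just returns the group size when p is never NULL.
counted = conn.execute("""
    SELECT i, COUNT(p = 3) AS c3
    FROM t
    GROUP BY i
    ORDER BY i
""").fetchall()

print(summed)
print(counted)
```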

How does this matrix multiply work in SQL?

Full disclosure, I'm a noob at SQL
Given two sparse matrices A and B, defined as:
A(row_number, column_number, value) and B(row_number, column_number, value)
I don't understand how this query represents the multiplication of the two matrices:
SELECT A.row_number, B.column_number, SUM(A.value * B.value)
FROM A, B
WHERE A.column_number = B.row_number
GROUP BY A.row_number, B.column_number
My confusion lies in the SUM syntax and the GROUP BY / SELECT syntax
So for my GROUP BY / SELECT confusion, I don't understand why the expressions
A.row_number and B.column_number are necessary after the SELECT statement
Why do we have to specify that when we're already using SELECT and WHERE ? To me that seems like we're saying we want to SELECT using those expressions (A.row_number and B.column_number) even though we're given back a table from WHERE already. Would it not make more sense to just say SELECT * ? I'm assuming that GROUP BY just requires you to type out the expressions it uses in the SELECT statement, but I don't know for sure.
For the SUM, I just want to clarify, the SUM is only using the A.value and the B.value from whatever is returned by the WHERE correct? Otherwise, you would be multiplying all A.value with all B.value.
Clarifying either of these would be immensely helpful. Thank you!
create table A
( column_number int,
row_number int,
value int
);
create table B
( column_number int,
row_number int,
value int
);
insert A (column_number,row_number,value) values (1,1,1),(1,2,2),(2,1,3),(2,2,4);
insert B (column_number,row_number,value) values (1,1,10),(1,2,20),(2,1,30),(2,2,40);
Data from your old-style (non-explicit) join, without aggregate or group by:
SELECT A.row_number as Ar, B.column_number as Bc,
A.value as Av,B.value as Bv,A.value*B.value as product
FROM A, B
WHERE A.column_number = B.row_number
+------+------+------+------+---------+
| Ar | Bc | Av | Bv | product |
+------+------+------+------+---------+
| 1 | 1 | 1 | 10 | 10 |
| 2 | 1 | 2 | 10 | 20 |
| 1 | 1 | 3 | 20 | 60 |
| 2 | 1 | 4 | 20 | 80 |
| 1 | 2 | 1 | 30 | 30 |
| 2 | 2 | 2 | 30 | 60 |
| 1 | 2 | 3 | 40 | 120 |
| 2 | 2 | 4 | 40 | 160 |
+------+------+------+------+---------+
Having seen the above, the query below becomes a little clearer:
SELECT A.row_number, B.column_number,sum(A.value * B.value) as theSum
FROM A, B
WHERE A.column_number = B.row_number
GROUP BY A.row_number, B.column_number
+------------+---------------+--------+
| row_number | column_number | theSum |
+------------+---------------+--------+
| 1 | 1 | 70 |
| 1 | 2 | 150 |
| 2 | 1 | 100 |
| 2 | 2 | 220 |
+------------+---------------+--------+
Prefixing a column with its table name in the SELECT list identifies which table the column comes from. This is mainly useful when both tables have the same column names.
GROUP BY will aggregate the data and display one record per grouped-by value. That is, in your case, you'll end up with only one record per row-column combination.
By definition, multiplication of two matrices A(n,m) and B(m,p) produces a matrix C(n,p).
So the SQL for multiplication should return the same data structure as was used to store A and B, namely three columns (row_number, column_number, value), with one value per (row, column) combination.
This is why you need the first two in the GROUP BY clause.
The WHERE clause is independent of SELECT. The first is responsible for getting the right records, the second for getting the right columns.
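As a cross-check of the explanation, the following sketch runs the join-and-group query on the sample data and compares it with a textbook dense multiplication done in plain Python. It uses sqlite3 as a stand-in for MySQL; the column name row_number is double-quoted as a precaution, since it coincides with a window function name. The helper names (load, dense_product) are made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
for name in ("A", "B"):
    conn.execute(
        f'CREATE TABLE {name} '
        '(column_number INTEGER, "row_number" INTEGER, value INTEGER)'
    )
conn.executemany("INSERT INTO A VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 4)])
conn.executemany("INSERT INTO B VALUES (?, ?, ?)",
                 [(1, 1, 10), (1, 2, 20), (2, 1, 30), (2, 2, 40)])

# The join pairs A[r][k] with B[k][c]; grouping by (r, c) and summing
# the products yields one cell of the result matrix per group.
sql_product = conn.execute("""
    SELECT A."row_number", B.column_number, SUM(A.value * B.value)
    FROM A
    JOIN B ON A.column_number = B."row_number"
    GROUP BY A."row_number", B.column_number
    ORDER BY A."row_number", B.column_number
""").fetchall()


def load(table):
    """Read a sparse table into a {(row, column): value} dict."""
    cur = conn.execute(
        f'SELECT "row_number", column_number, value FROM {table}'
    )
    return {(r, c): v for r, c, v in cur}


a, b = load("A"), load("B")

# Plain definition of the product: C[r][c] = sum over k of A[r][k] * B[k][c].
dense_product = [
    (r, c, sum(a[(r, k)] * b[(k, c)] for k in (1, 2)))
    for r in (1, 2)
    for c in (1, 2)
]

print(sql_product)
print(sql_product == dense_product)
```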

MySql - Increment according to another column

I have a table (InnoDB) that has 3 columns: ID, ID_FATHER, ROWPOS. ID is auto_increment and ROWPOS has values from another table. ID_FATHER should increase by 1 whenever ROWPOS does not continue the previous row's sequence; while ROWPOS stays consecutive, ID_FATHER should not change.
Like this:
ID | ID_FATHER | ROWPOS
1 | 1 | 250
2 | 2 | 253
3 | 2 | 254
4 | 3 | 260
5 | 4 | 263
6 | 5 | 268
7 | 6 | 270
8 | 6 | 271
9 | 6 | 272
10 | 7 | 276
Is there a way to do that?
With this query:
INSERT INTO mytable (i, rowpos)
SELECT @i := IF(t.rowpos = @prev_rowpos + 1, @i, @i + 1) AS i
, @prev_rowpos := t.rowpos AS rowpos
FROM temp t
JOIN (SELECT @prev_rowpos := NULL, @i := 0) v
ORDER BY t.rowpos
I am able to import into the tables I want. The problem is in TABLE.Service: as you can see, with this solution the ID_FATHER is wrong because it only increments by 1, but in this case it should actually be 2, because invoice 1 doesn't have a service.
How can I solve this problem without changing my whole schema?
TABLE.temp
ROW|TYPE |INVOICE_temp
1 |xxx |10
2 |xxP |led tv
3 |xxP |mp3 Player
4 |xxx |11
5 |xxP |tv cable
6 |xxS |install
xxx = Invoice number
xxP = Product
xxS = service
TABLE.Invoice_Number TABLE.Product
ID|ID_FATHER|ROWPOS|NUM ID|ID_FATHER|ROWPOS|PROD
1 | 1 | 1 | 10 1 | 1 | 2 | led tv
2 | 2 | 4 | 11 2 | 1 | 3 | mp3 player
3 | 2 | 5 | tv cable
TABLE.Service
ID|ID_FATHER|ROWPOS|SERV
1 | 1 | 6 | install
I made some changes in the query to work as I needed.
You could do something like this:
INSERT INTO mytable (i, rowpos)
SELECT @i := IF(t.rowpos = @prev_rowpos + 1, @i, @i + 1) AS i
, @prev_rowpos := t.rowpos AS rowpos
FROM another_table t
JOIN (SELECT @prev_rowpos := NULL, @i := 0) v
ORDER BY t.rowpos
(Test just the SELECT query, get that working returning the resultset you want, before you preface it with the INSERT.)
For completeness, I will add that this technique is dependent on UNDOCUMENTED and non-guaranteed behavior in MySQL, using "user variables". I've successfully used this approach many times, but for "one off" type admin functions, not ever embedded as SQL in an application.
Note that the ORDER of the expressions in the SELECT list is important: they are evaluated in the order they appear in the SELECT list. (MySQL doesn't guarantee this behavior, but we do observe it.) It's important that the check of the user variables containing values from the previous row precedes the assignment of the current row's values to the user variables. That's why i is returned first, followed by rowpos. If you reversed the order of those in the SELECT list, the query would operate differently, and we wouldn't get the same results.
The purpose of the inline view (aliased as v) is to initialize the user variables. Since MySQL materializes that view query into a "derived table" before the outer query runs, those variables get initialized before they are referenced in the outer query. We don't really care what the inline view query actually returns, except that we need it to return exactly one row (because we reference it in a JOIN operation to the table we really want to query).
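The row-by-row logic that the user variables carry out can be mirrored in a few lines of plain Python. This is only an illustration of the evaluation order described above, not a replacement for the query; the function name number_runs and the local variable names are made up for the sketch.

```python
def number_runs(rowpos_values):
    """Assign a group number that stays the same while rowpos is consecutive."""
    prev_rowpos = None  # plays the role of @prev_rowpos (initialised to NULL)
    i = 0               # plays the role of @i (initialised to 0)
    result = []
    for rowpos in sorted(rowpos_values):  # corresponds to ORDER BY t.rowpos
        # First compare against the value remembered from the previous row.
        if prev_rowpos is None or rowpos != prev_rowpos + 1:
            i += 1
        result.append((i, rowpos))
        # Only afterwards overwrite the remembered value with the current one.
        # Swapping these two steps would compare each row with itself.
        prev_rowpos = rowpos
    return result


groups = number_runs([250, 253, 254, 260, 263, 268, 270, 271, 272, 276])
for i, rowpos in groups:
    print(i, rowpos)
```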
E.g.:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,rowpos INT NOT NULL
);
INSERT INTO my_table (rowpos) VALUES
(250),
(253),
(254),
(260),
(263),
(268),
(270),
(271),
(272),
(276);
SELECT x.*
, @i:=@i+ISNULL(y.id) i
FROM my_table x
LEFT
JOIN my_table y
ON y.id < x.id
AND y.rowpos = x.rowpos - 1
, (SELECT @i:=0) vals
ORDER
BY x.id;
+----+--------+------+
| id | rowpos | i |
+----+--------+------+
| 1 | 250 | 1 |
| 2 | 253 | 2 |
| 3 | 254 | 2 |
| 4 | 260 | 3 |
| 5 | 263 | 4 |
| 6 | 268 | 5 |
| 7 | 270 | 6 |
| 8 | 271 | 6 |
| 9 | 272 | 6 |
| 10 | 276 | 7 |
+----+--------+------+

Count rows with specific value over multiple rows

It's very hard for me to choose a proper title, because I don't know how to describe my problem.
I have a table like this:
dlID | dl_seID | dlEpisode | dlFlag
___________________________________
1 | 1 | 1 | 0
2 | 1 | 2 | 1
3 | 1 | 3 | 1
4 | 2 | 1 | 1
5 | 2 | 2 | 0
6 | 3 | 1 | 0
What I want is a select query where I get something like this:
dlID | dl_seID | dlEpisode | dlFlag | dlFlagCount
_________________________________________________
1 | 1 | 1 | 0 | 2
2 | 1 | 2 | 1 | 2
3 | 1 | 3 | 1 | 2
4 | 2 | 1 | 1 | 1
5 | 2 | 2 | 0 | 1
6 | 3 | 1 | 0 | 0
dlFlagCount should be a count of the rows with dlFlag = 1 that share the same dl_seID.
Second try:
I need a value where I see how many Flags have the value 1 with the same dl_seID.
Is that possible?
I hope you guys know what I want^^
Regards
Try this:
select
a.*,
ifnull(b.ctflags,0)
from
tablea a left join
( select dl_seID, count(dlFlag) ctflags
from tablea
where dlFlag=1
group by dl_seID ) b on (a.dl_seID = b.dl_seID)
The left join is just there to keep the rows with 0 flags.
See the fiddle: http://sqlfiddle.com/#!2/ef9b0/5
EDIT:
As the OP requested some explanation, here it goes:
What you asked for is a count of flags per dl_seID. To get it, split the problem in two. First, count the flagged rows for each dl_seID, which is what this subquery does:
select dl_seID, count(dlFlag) ctflags
from tablea
where dlFlag=1
group by dl_seID
This becomes a separate table, or a new set of data, whatever you want to call it. Then you join it with your original data (from your table), as in the query in the answer.
The left join is needed because some rows may belong to a dl_seID that has no row matching where dlFlag=1; if you want those to show 0, you have to bring in every row from the table, whether or not its dl_seID exists in the subquery result. The ifnull(b.ctflags,0) handles these rows, which exist in your table but have no flags (for your problem). If you used just b.ctflags, it would return null for them.
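To see the effect of the left join and ifnull together, here is a small sketch that runs the query on the question's sample rows using Python's sqlite3 module as a stand-in for MySQL (SQLite also provides IFNULL). The result alias dlFlagCount and the inner-join variant are additions made for the comparison.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablea "
    "(dlID INTEGER, dl_seID INTEGER, dlEpisode INTEGER, dlFlag INTEGER)"
)
conn.executemany(
    "INSERT INTO tablea VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 0), (2, 1, 2, 1), (3, 1, 3, 1),
     (4, 2, 1, 1), (5, 2, 2, 0), (6, 3, 1, 0)],
)

# The subquery counts flagged rows per dl_seID; {join} is filled in below.
template = """
    SELECT a.dlID, a.dl_seID, a.dlEpisode, a.dlFlag,
           IFNULL(b.ctflags, 0) AS dlFlagCount
    FROM tablea a
    {join} (
        SELECT dl_seID, COUNT(dlFlag) AS ctflags
        FROM tablea
        WHERE dlFlag = 1
        GROUP BY dl_seID
    ) b ON a.dl_seID = b.dl_seID
    ORDER BY a.dlID
"""

# LEFT JOIN keeps dl_seID 3, which has no flagged rows, and IFNULL turns
# its missing count into 0.
left_rows = conn.execute(template.format(join="LEFT JOIN")).fetchall()

# A plain inner join silently drops that row instead.
inner_rows = conn.execute(template.format(join="JOIN")).fetchall()

print(left_rows)
print(inner_rows)
```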
SELECT x.*
, COALESCE(y.flagcount,0) flagcount
FROM my_table x
LEFT
JOIN
( SELECT seID
, COUNT(*) flagcount
FROM my_table
WHERE flag = 1
GROUP
BY seid
) y
ON y.seid = x.seid;

MySQL query to sort by one column and generate its ranking (equivalent to RANK OVER PARTITION oracle)

I have a MySQL table like this:
id | points | rank | league_id
______________________________
1 | 84 | 0 | 1
2 | 55 | 0 | 1
3 | 104 | 0 | 1
4 | 123 | 0 | 2
What I want to accomplish is the following:
id | points | rank | league_id
______________________________
1 | 84 | 2 | 1
2 | 55 | 3 | 1
3 | 104 | 1 | 1
4 | 123 | 1 | 2
So - use the rank column to store ranks based on number of points, grouped by league_id. This may seem redundant but I need it for my purpose (it's for a fantasy sports website and having a rank column greatly simplifies a lot of PHP code and reduces the number of needed queries throughout the script).
I'm aware of the obvious solution - iterate through the rows, grouping by league_id and updating the ranks one by one. What I'm wondering is, is there a more efficient solution?
Off the top of my head, so test first on a throwaway database and be prepared to restore from a backup:
SET @rank = 1;
SET @league = 0;
UPDATE tablename
SET
`rank` = @rank := IF(@league = league_id,@rank+1,1),
league_id = @league := league_id
ORDER BY league_id,points DESC;
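The bookkeeping that the two user variables perform can be sketched in plain Python: walk the rows ordered by league and descending points, restart the counter whenever the league changes, and otherwise add one. The function name rank_within_leagues and the dict-based rows are assumptions made for the illustration.

```python
def rank_within_leagues(rows):
    """Return {id: rank}, ranking by points (highest first) inside each league."""
    # Corresponds to ORDER BY league_id, points DESC.
    ordered = sorted(rows, key=lambda r: (r["league_id"], -r["points"]))
    rank = 0        # plays the role of @rank
    league = None   # plays the role of @league
    ranks = {}
    for row in ordered:
        # Same league as the previous row: next rank. New league: restart at 1.
        rank = rank + 1 if row["league_id"] == league else 1
        league = row["league_id"]
        ranks[row["id"]] = rank
    return ranks


table = [
    {"id": 1, "points": 84, "league_id": 1},
    {"id": 2, "points": 55, "league_id": 1},
    {"id": 3, "points": 104, "league_id": 1},
    {"id": 4, "points": 123, "league_id": 2},
]
ranks = rank_within_leagues(table)
print(ranks)
```

Note that, like the UPDATE above, this gives rows with equal points different consecutive ranks rather than a shared rank.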