SUMing Dynamically Created Columns in MySQL

UPDATE: Not sure if what I'm trying to achieve is possible but thanks for all the help - is it appropriate to request this be deleted? I don't want the contributors to lose the upvotes I've given them for their help.
UPDATE: Just to be clear, when I say columns are created 'dynamically' I mean without developer input so they are an unknown. They are still properly defined columns in a standard database table - I just don't know the names of all of them. :-D
I have a table with columns created dynamically (very rarely, but I'm trying to make this as robust as possible). I need to output the SUM of these columns, ordered highest first, but obviously I also need the column names in the first row (as otherwise the data is useless). I've retrieved the column names from information_schema.columns into PHP and thought I'd iterate through the columns performing a SUM, but if I do that, the results are not ordered numerically.
This can be built into an SP (I'm assuming it will have to be done in a stored procedure due to the complexity). I believe I probably need to involve 'PIVOT' somewhere, but that is the limit of my knowledge!
So to SUMmarise (see what I did there :-D )
I have a table definition with columns like this:
volunteerID INT
yearAdded DATETIME
willySize111to120 INT
willySize121to130 INT
willySize131to140 INT
willySize141to150 INT
I'd like to return a dataset like this in a query where I can specify the year:
sizeBracket count
willySize111to120 98
willySize121to130 76
willySize131to140 54
willySize141to150 23
Every time I think I've figured out a way to do it, I hit another wall.
Thanks for any help or pointers!
Bob

Assuming that your original table has a 1 in the correct bracket for each volunteer and a 0 in all other brackets:
SELECT bracket.sizeBracket, COUNT(*) count
FROM (
    SELECT CASE
               WHEN willySize111to120 THEN 'willySize111to120'
               WHEN willySize121to130 THEN 'willySize121to130'
               WHEN willySize131to140 THEN 'willySize131to140'
               WHEN willySize141to150 THEN 'willySize141to150'
           END AS sizeBracket
    FROM ... -- < Table Name
    WHERE ... -- < Date Selection Logic
) bracket
GROUP BY sizeBracket
ORDER BY count DESC
UPDATE
Based on a raw data table willySize with columns
volunteerID INT
yearAdded DATETIME
willySize INT
You could run the following query
SELECT
    CONCAT(
        'willySize',
        ROUND(willySize - 6, -1) + 1, -- e.g. 111..120 minus 6 is 105..114, which rounds to 110; +1 gives 111
        'to',
        ROUND(willySize + 4, -1)      -- e.g. 111..120 plus 4 is 115..124, which rounds to 120
    ) sizeBracket,
    COUNT(*) count
FROM willySize
GROUP BY sizeBracket
ORDER BY count DESC
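If the bracket columns genuinely aren't known in advance (the situation in the original question), one option is to build the query string from information_schema.columns and run it with a prepared statement. This is only a sketch: it assumes the table is called volunteers (the question doesn't give a name), that each willySize column holds 0/1 indicators as in the first answer, and that any yearAdded filter would be concatenated into each generated SELECT.

SELECT GROUP_CONCAT(
           CONCAT('SELECT ''', column_name, ''' AS sizeBracket, ',
                  'SUM(`', column_name, '`) AS `count` FROM volunteers')
           SEPARATOR ' UNION ALL ')
INTO @sql
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name = 'volunteers'          -- assumed table name
  AND column_name LIKE 'willySize%';

-- note: GROUP_CONCAT output is capped by group_concat_max_len (1024 bytes by default)
SET @sql = CONCAT(@sql, ' ORDER BY `count` DESC');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Each generated SELECT sums one bracket column on its own, so the prepared statement returns (sizeBracket, count) rows ordered by count, which is the shape asked for in the question.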

Related

Is it possible to "group by" without losing the original rows?

I have a table and a query like this:
ID | name | commentsCount
1 | mysql for dummies | 33
2 | mysql beginners guide | 22
SELECT
...,
commentsCount -- will return 33 for the first row, 22 for the second one
FROM
mycontents
WHERE
name LIKE "%mysql%"
I also want to know the total of comments across all rows:
SELECT
...,
SUM(commentsCount) AS commentsCountAggregate -- should return 55
FROM
mycontents
WHERE
name LIKE "%mysql%"
But this one obviously returns a single row with the total.
Now I want to merge these two queries into a single one, because my actual query is very heavy to execute (it uses boolean full-text search, substring offset search, and sadly a lot more), so I don't want to execute it twice.
Is there a way to get the total of comments without making the SELECT twice?
Custom functions are welcome!
Variable usage is also welcome; I've never used them...
You can cache the intermediate result in a temporary table, and then do the sum over that table.
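A minimal sketch of that approach, using the table and column names from the question and a user variable for the total (MySQL won't let a TEMPORARY table be referenced twice in one query, so the total is fetched in a separate statement):

CREATE TEMPORARY TABLE tmp_contents AS
    SELECT ID, name, commentsCount    -- plus whatever other columns the heavy query returns
    FROM mycontents
    WHERE name LIKE '%mysql%';        -- the expensive conditions run only once

SELECT SUM(commentsCount) INTO @total FROM tmp_contents;

SELECT ID, name, commentsCount, @total AS commentsCountAggregate
FROM tmp_contents;

DROP TEMPORARY TABLE tmp_contents;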
One obvious solution is storing the intermediate results in another 'temporary' table, and then performing the aggregation in a second step.
Another solution is preparing a lookup table containing the sums you need (but there obviously needs to be some grouping ID; I'll call it MASTER_ID), like this:
CREATE TABLE comm_lkp AS
SELECT MASTER_ID, SUM(commentsCount) as cnt
FROM mycontents
GROUP BY MASTER_ID
Also create an index on that table on the MASTER_ID column. Later, you can modify your query like this:
SELECT
...,
commentsCount,
cnt as commentsSum
FROM
mycontents as a
JOIN comm_lkp as b ON (a.MASTER_ID=b.MASTER_ID)
WHERE
name LIKE "%mysql%"
It also shouldn't hurt your performance, as long as the lookup table stays relatively small.
A GROUP BY on one of the ID fields might do the trick. This will then give you the SUM(commentsCount) for each ID.
The query in your question is not detailed enough to know which of your fields/tables the ID field should come from.

How to retrieve a count of specified values in MySQL (even if there are no records)?

Suppose I have a table with a field named 'rating'; it may take different values, but I want to receive a count of specific values only.
Example:
Create table mytable(
    rating int(1)
);
First and the obvious way I could think of was the following:
select rating,count(rating) from mytable group by rating order by rating
The problem, though, is that it is not clear how many rows it would return, and it may also not be easy to process them that way.
What I would really like to do is to select two fields in one row showing the number of records that have some specific values.
Example...
//something like this (some pseudocode):
select count(rating=-1) as rating1, count (rating=1) as rating2 from mytable
Could you advise on some neat way I could select in the above format?
select SUM(IF(rating=-1,1,0)) AS rating1,
SUM(IF(rating=1,1,0)) AS rating2 from mytable
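As a side note, MySQL treats a boolean comparison as 1 or 0, so the same result can be written a bit more compactly; wrapping the sums in COALESCE also covers the "even if no records" case by turning a NULL sum into 0:

select COALESCE(SUM(rating = -1), 0) AS rating1,
       COALESCE(SUM(rating = 1), 0)  AS rating2
from mytable;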

sql group by value within substring of char(255) field

I have a column called image_small in a table images which will have information like this:
image_small
/images/menu_items/148/5a13140f8ef4503f024d06231037c69e_medium.jpg
/images/menu_items/152/5a14444444444444424d06231037c69e_medium.jpg
...
I'd like to select the count of rows for each unique menu item value (like 148 above; assume values in the range 148-400) on MySQL 5.1 using InnoDB, so that the output would look like:
val count
148 5
101 4
152 1
The integer values (148, 101, 152) don't exist in their own column, and I would rather not loop through the rows in code.
Any ideas on how to do this? It seems like I would need to create some notion of a variable for the '148' and then do a GROUP BY, but I'm a little clueless.
thx
I'm supposing that all your paths have the same format.
select
    substring_index(substring_index(image_small, '/', 4), '/', -1) as val,
    count(*) as `count`
from images
group by val
It seems to work but I'm afraid about performance on large tables. Do some tests.
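For example, you can check the extraction against one of the sample paths from the question:

select substring_index(
           substring_index('/images/menu_items/148/5a13140f8ef4503f024d06231037c69e_medium.jpg', '/', 4),
           '/', -1) as val;
-- returns '148'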
Could you add a field with this value? I think it would be better.

SQL Server 2008

I have a table like so
ID NAME
----------- -----------
1 JON
2 JIM
3 BOB
(3 row(s) affected)
What I need is code to select a number that does not exist in the ID column and output it to a file, so in this instance it would be "4".
What I need it to do is start at 1, then check 2, 3, and so on until it finds a number that does not exist in the table.
This code will have to run on SQL Server 2008.
What you need is a numbers table or list:
Declare @MaxValue int;
Set @MaxValue = 100;
With Numbers As
(
Select 1 As Value
Union All
Select Value + 1
From Numbers
Where Value <= @MaxValue
)
Select Min(N.Value)
From Numbers As N
Left Join MyTable As T
On T.Id = N.Value
Where T.Id Is Null
OPTION (MAXRECURSION 0)
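If you'd rather avoid the recursive CTE, a self-join sketch finds the first gap directly (it assumes the sequence starts at 1; MyTable is the same placeholder name used above):

SELECT MIN(T1.Id) + 1 AS FirstMissing
FROM MyTable AS T1
LEFT JOIN MyTable AS T2
    ON T2.Id = T1.Id + 1
WHERE T2.Id IS NULL;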
Can you specify why you need this? It sounds like there may be a better way to satisfy the overall need.
However, if all you need is the next number in the sequence, then this should work:
SELECT MAX(ID) + 1 FROM Table
Edit: I just noticed from Thomas' answer (and re-inspecting the question) that it looks like you're looking for the first gap, which may or may not be the next number. But I guess the overall point still remains... why?
Edit: I'm glad you accepted an answer, but I still think there's more to this. For example, if you just want to be able to "reserve" an ID then there are a couple ways to accomplish this.
GUIDs are good for application-generated IDs, but shouldn't be used as primary keys for performance reasons. You can have a second column as a GUID and use that within your application, allowing a simple auto-increment column to be the primary key. There are further performance considerations to be made, and you should research it.
Conversely, there's something called the Hi/Lo Algorithm for reserving ranges of database IDs. It uses integers, which are great for indexing and make great primary keys. It leaves gaps in the sequence, but that's to be expected anyway even with a regular auto-generated column (such as when a record is deleted).
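A rough sketch of the Hi/Lo idea in T-SQL (the table and column names, the block size of 100, and the OUTPUT-based reservation are all illustrative assumptions, not something from the question):

-- hypothetical key-reservation table
CREATE TABLE HiLoKeys
(
    EntityName varchar(50) NOT NULL PRIMARY KEY,
    NextHi     int         NOT NULL
);

-- atomically claim one "Hi" value; OUTPUT returns the value that was reserved
UPDATE HiLoKeys
SET NextHi = NextHi + 1
OUTPUT deleted.NextHi
WHERE EntityName = 'MyTable';

-- the application then hands out IDs Hi*100 .. Hi*100+99 (where Hi is the value returned above)
-- without touching the database again until the block is exhausted.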
If there is a requirement that there shouldn't be gaps in the identifiers, that sounds like an odd business requirement and should be analyzed for its true needs. Something like that shouldn't spill over into the primary key in your data persistence.

randomizing large dataset

I am trying to find a way to get a random selection from a large dataset.
We expect the set to grow to ~500K records, so it is important to find a way that keeps performing well while the set grows.
I tried a technique from http://forums.mysql.com/read.php?24,163940,262235#msg-262235 but it's not exactly random, and it doesn't play well with a LIMIT clause: you don't always get the number of records that you want.
So I thought: since the PK is auto_increment, I could just generate a list of random IDs and use an IN clause to select the rows I want. The problem with that approach is that sometimes I need a random set of records having a specific status, a status that is found in at most 5% of the total set. To make that work I would first need to find out which IDs have that specific status, so that's not going to work either.
I am using MySQL 5.1.46 with the MyISAM storage engine.
It might be important to know that the query to select the random rows is going to be run very often and the table it is selecting from is appended to frequently.
Any help would be greatly appreciated!
You could solve this with some denormalization:
Build a secondary table that contains the same pkeys and statuses as your data table
Add and populate a status group column which will be a kind of sub-pkey that you auto number yourself (1-based autoincrement relative to a single status)
Pkey Status StatusPkey
1 A 1
2 A 2
3 B 1
4 B 2
5 C 1
... C ...
n C m (where m = # of C statuses)
When you don't need to filter, you can generate random numbers against the pkey as you mentioned above. When you do need to filter, generate them against the StatusPkeys of the particular status you're interested in.
There are several ways to build this table. You could have a procedure that you run on an interval, or you could do it live. The latter would be a performance hit, though, since calculating the StatusPkey could get expensive.
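A sketch of what the rebuild and a filtered random pick could look like (table and column names here are illustrative since the question doesn't give them, and the per-status numbering uses the classic user-variable trick):

-- secondary table: same pkeys and statuses, plus a per-status counter
CREATE TABLE random_lookup (
    Pkey       INT     NOT NULL PRIMARY KEY,
    Status     CHAR(1) NOT NULL,
    StatusPkey INT     NOT NULL,
    KEY idx_status (Status, StatusPkey)
) ENGINE = MyISAM;

-- rebuild on an interval: empty the table, then number rows 1..m within each status
TRUNCATE TABLE random_lookup;
INSERT INTO random_lookup (Pkey, Status, StatusPkey)
SELECT Pkey, Status, StatusPkey
FROM (
    SELECT d.Pkey, d.Status,
           @n := IF(d.Status = @prev, @n + 1, 1) AS StatusPkey,
           @prev := d.Status AS prev_status
    FROM data_table AS d,
         (SELECT @n := 0, @prev := '') AS init
    ORDER BY d.Status, d.Pkey
) AS numbered;

-- filtered random pick: choose a StatusPkey first, then join back to the data
SELECT FLOOR(1 + RAND() * MAX(StatusPkey)) INTO @pick
FROM random_lookup
WHERE Status = 'C';

SELECT d.*
FROM random_lookup AS r
JOIN data_table   AS d ON d.Pkey = r.Pkey
WHERE r.Status = 'C'
  AND r.StatusPkey = @pick;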
Check out this article by Jan Kneschke... It does a great job at explaining the pros and cons of different approaches to this problem...
You can do this efficiently, but you have to do it in two queries.
First get a random offset scaled by the number of rows that match your 5% conditions:
SELECT FLOOR(RAND() * (SELECT COUNT(*) FROM MyTable WHERE ...conditions...))
This returns an integer. Next, use the integer as an offset in a LIMIT expression:
SELECT * FROM MyTable WHERE ...conditions... LIMIT 1 OFFSET ?
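Putting the two steps together with user variables (a sketch; status = 'X' stands in for the elided ...conditions... above, and the offset goes through PREPARE because LIMIT/OFFSET won't accept a bare variable):

-- run the heavy conditions once to get a row count, then pick an offset
SELECT COUNT(*) INTO @cnt FROM MyTable WHERE status = 'X';   -- 'X' is a placeholder condition
SET @skip = FLOOR(RAND() * @cnt);

-- build and run the second query with the chosen offset
SET @sql = CONCAT('SELECT * FROM MyTable WHERE status = ''X'' LIMIT 1 OFFSET ', @skip);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;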
Not every problem must be solved in a single SQL query.