Totaling a column in SQL - mysql

I am trying to run a SQL Query in phpmyadmin that will total multiple rows and insert that total into the cell.
The file is sorted by date and then a few other specifics.
I am just having a hard time finding the syntax for it.
Basically I have one column called 'Points' and another called 'Total_Points'
It needs to look like this:
+--------+--------------+
| Points | Total Points |
+--------+--------------+
| 10 | 10 |
| 10 | 20 |
| 10 | 30 |
| 10 | 40 |
+--------+--------------+
And so on and so on.
It seems like there has to be something out there that would do this and I am just missing it.

For a running sum you can use "window functions" like this:
create table tbl(ord int, points int);
insert into tbl values(1, 10),(2, 10), (3, 10), (4, 10);
select
*,
sum(points) over w total_points
from tbl
window w as (order by ord);
ord | points | total_points
-----+--------+--------------
1 | 10 | 10
2 | 10 | 20
3 | 10 | 30
4 | 10 | 40
I reduced your ORDER BY criteria to a single column, ord. You may of course replace it with your own sort columns.
Besides that, you did not specify a concrete database vendor. My example runs on PostgreSQL. Other SQL dialects may be a little different (MySQL supports window functions as of version 8.0).

There is a lot of discussion about cumulative sums in SQL.
You may want to look at this article. If speed isn't a big issue, you can also simply use:
select t1.id, t1.SomeNumt, SUM(t2.SomeNumt) as sum
from #t t1
inner join #t t2
on t1.id >= t2.id
group by t1.id, t1.SomeNumt
order by t1.id
The key issue is that SQL doesn't actually store rows in any order. The idea of a 'running total' assumes that the rows have an order, and by default, this isn't true. In SQL Server 2012, there is a function for this, but until people are using it, this is the best we can do.
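To try the self-join approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here, and the table name tbl, the ordering column ord and the total_points column are assumptions borrowed from the examples above, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (ord INTEGER PRIMARY KEY, points INTEGER, total_points INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl (ord, points) VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (4, 10)],
)

# Each row t1 is joined to every row t2 at or before it in the ordering,
# so SUM(t2.points) is the running total up to and including t1.
running = conn.execute(
    """
    SELECT t1.ord, t1.points, SUM(t2.points) AS total_points
    FROM tbl t1
    JOIN tbl t2 ON t2.ord <= t1.ord
    GROUP BY t1.ord, t1.points
    ORDER BY t1.ord
    """
).fetchall()

# Write the totals back into the table, since the question wants them stored.
conn.executemany(
    "UPDATE tbl SET total_points = ? WHERE ord = ?",
    [(total, ord_) for ord_, _points, total in running],
)
stored = conn.execute(
    "SELECT points, total_points FROM tbl ORDER BY ord"
).fetchall()
print(stored)
```

The running total is only well defined because ord gives the rows an explicit order; with your own table you would join on whatever columns you sort by.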

Related

INSERT data from one table INTO another with the copies (as many as `quantity` field in first table says)

I have an MySQL table creatures:
id | name | base_hp | quantity
--------------------------------
1 | goblin | 5 | 2
2 | elf | 10 | 1
And I want to create creature_instances based on it:
id | name | actual_hp
------------------------
1 | goblin | 5
2 | goblin | 5
3 | elf | 10
The ids of creatures_instances are not important and not relevant to creatures.ids.
How can I make it with just the MySQL in the most optimal (in terms of execution time) way? The single query would be best, but procedure is ok too. I use InnoDB.
I know that with a help of e.g. php I could:
select each row separately,
make a for($i=0; $i<$line->quantity; $i++) loop in which I insert one row into creatures_instances for each iteration.
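The most efficient way is to do everything in SQL. It helps if you have a numbers table. Without one, you can generate the numbers in a subquery. The following works up to 4 copies. Note that id is left out of the insert so that creatures_instances assigns its own ids (this assumes the column is AUTO_INCREMENT); copying creatures.id would produce duplicate ids whenever quantity is greater than 1:
insert into creatures_instances(name, actual_hp)
select c.name, c.base_hp
from creatures c join
     (select 1 as n union all select 2 union all select 3 union all select 4
     ) n
     on n.n <= c.quantity;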
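The numbers-table idea can be checked quickly with Python's sqlite3 module. This is only a sketch: SQLite stands in for MySQL/InnoDB, the numbers table is a hypothetical helper that holds 1 to 4, and creatures_instances.id is assumed to be auto-generated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE creatures (
        id INTEGER PRIMARY KEY, name TEXT, base_hp INTEGER, quantity INTEGER
    );
    CREATE TABLE creatures_instances (
        id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, actual_hp INTEGER
    );
    INSERT INTO creatures VALUES (1, 'goblin', 5, 2), (2, 'elf', 10, 1);

    -- Hypothetical helper table; it must reach at least MAX(quantity).
    CREATE TABLE numbers (n INTEGER PRIMARY KEY);
    INSERT INTO numbers VALUES (1), (2), (3), (4);
    """
)

# Joining on n <= quantity yields exactly `quantity` copies of each creature.
conn.execute(
    """
    INSERT INTO creatures_instances (name, actual_hp)
    SELECT c.name, c.base_hp
    FROM creatures c
    JOIN numbers n ON n.n <= c.quantity
    ORDER BY c.id, n.n
    """
)

instances = conn.execute(
    "SELECT id, name, actual_hp FROM creatures_instances ORDER BY id"
).fetchall()
print(instances)
```

If quantity can exceed the largest number in the helper table, the extra copies are silently dropped, so size the table generously.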

Filter by one value in unstructured table but sorting by another

This is perhaps a fairly straight-forward SQL query, but I've not done much SQL/database querying before and have inherited an issue that I'm struggling to understand and describe properly (thus the vague title....)
USER META
======================================
id | user_id | field | value
======================================
1 | 1 | color | red
2 | 1 | year | 1923
3 | 1 | ... | ...
4 | 3 | color | purple
5 | 3 | year | 2013
6 | 3 | ... | ...
7 | 7 | color | red
8 | 7 | year | 1982
9 | 7 | ... | ...
Given that I have a table structured like the above example, how would I query for a list of user_id's filtered by a specific 'color', but sorted by 'year'?
NOTE: I'm dealing with a legacy project, so I'm not in position to make schema changes.
You can do a self-join:
SELECT DISTINCT t1.user_id
FROM TableName t1
JOIN TableName t2 ON t1.user_id = t2.user_id
WHERE t1.field = 'color' AND t1.value = 'red' AND t2.field = 'year'
ORDER BY t2.value
One way is with aggregation:
select user_id
from usermeta
group by user_id
having sum((case when field = 'color' then value end) = 'purple') > 0
order by max(case when field = 'year' then value+0 end);
The having clause is counting the number of rows that meet the particular condition, and making sure there is at least one for a given user_id.
The order by is returning the year. The +0 just converts it to a numeric value, so it sorts correctly. The year is being stored as a string. (For year, this may not be important because presumably all are four digits, but for other numerics it could be important.)
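Here is a rough, runnable sketch of the aggregation approach using Python's sqlite3 module as a stand-in for MySQL. The table name usermeta follows the query above and the rows are the ones from the question. Because SQLite does not treat comparisons the same way as MySQL in every context, the condition is spelled out with CASE ... THEN 1 ELSE 0 and the year is converted with CAST instead of +0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usermeta ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, field TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO usermeta (user_id, field, value) VALUES (?, ?, ?)",
    [
        (1, "color", "red"), (1, "year", "1923"),
        (3, "color", "purple"), (3, "year", "2013"),
        (7, "color", "red"), (7, "year", "1982"),
    ],
)


def users_by_color(color):
    """Return user_ids whose 'color' equals `color`, sorted by their 'year'."""
    rows = conn.execute(
        """
        SELECT user_id
        FROM usermeta
        GROUP BY user_id
        HAVING SUM(CASE WHEN field = 'color' AND value = ? THEN 1 ELSE 0 END) > 0
        ORDER BY MAX(CASE WHEN field = 'year' THEN CAST(value AS INTEGER) END)
        """,
        (color,),
    ).fetchall()
    return [user_id for (user_id,) in rows]


print(users_by_color("red"))
```

Passing the color as a bound parameter rather than pasting it into the SQL string avoids quoting problems.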
I think that one solution would be to create temp tables dynamically, one with columns user_id and color and another with columns user_id and year based upon this table and then do a join on them. Even if you find a syntactically composite single statement solution, it will internally have to do something similar.
It is possible to insert into a table based upon the output of a select. Sorry, I am not giving you the exact syntax here but the direction hopefully helps.

How to group values from a table if they're close?

Let's say I define 10 as being a close enough difference between two values, what I want is the average of all the values that are close enough to each other (or in other words, grouped by their closeness). So, if I have a table with the following values:
+-------+
| value |
+-------+
| 1 |
| 1 |
| 2 |
| 4 |
| 2 |
| 1 |
| 4 |
| 3 |
| 22 |
| 23 |
| 24 |
| 22 |
| 20 |
| 19 |
| 89 |
| 88 |
| 86 |
+-------+
I want a query that would output the following result:
+---------+
| 2.2500 |
| 21.6667 |
| 87.6667 |
+---------+
Where 2.2500 would be produced as the average of all the values ranging from 1 to 4 since they're 10 or less away from each other. In the same way, 21.6667 would be the average of all the values ranging from 19 to 24, and 87.6667 would be the average of all the values ranging from 86 to 89.
The difference, currently 10, would have to be variable.
This isn't so bad. You want to implement the lag() function in MySQL to determine if a value is the start of a new set of rows. Then you want a cumulative sum of this value to identify a group.
The code looks painful, because in MySQL (before 8.0, which added window functions) you need to do this with correlated subqueries rather than with ANSI standard functions, but this is what it looks like (tbl stands in for your table name):
select min(value) as value_min, max(value) as value_max, avg(value) as value_avg
from (select t.value,
             (select count(distinct s.value)
              from (select value,
                           (select max(value)
                            from tbl t2
                            where t2.value < t1.value
                           ) as prevValue
                    from tbl t1
                   ) s
              where (s.prevValue is null or s.value - s.prevValue >= 10)
                and s.value <= t.value
             ) as GroupId
      from tbl t
     ) g
group by GroupId;
The subquery s finds, for each row, the previous (next smaller) value. A row is a break point when it has no previous value or differs from it by 10 or more. The correlated count then counts the distinct break points at or below any given value, so every row in the same cluster gets the same GroupId, duplicates included. The outermost query then aggregates using this GroupId. Replace the literal 10 with a user variable or parameter to make the threshold variable.
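The break-point idea can be tried with the question's data using Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for MySQL, the table is called vals (a made-up name), and a new group starts whenever a value is max_gap or more above the previous distinct value, which is the convention used in this answer.

```python
import sqlite3

VALUES = [1, 1, 2, 4, 2, 1, 4, 3, 22, 23, 24, 22, 20, 19, 89, 88, 86]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (value INTEGER)")
conn.executemany("INSERT INTO vals VALUES (?)", [(v,) for v in VALUES])


def group_averages(max_gap):
    """Return (min, max, avg) per cluster; a gap >= max_gap starts a new cluster."""
    rows = conn.execute(
        """
        SELECT MIN(value), MAX(value), AVG(value)
        FROM (
            SELECT t.value,
                   -- Group id = number of distinct break points at or below this value.
                   (SELECT COUNT(DISTINCT s.value)
                    FROM (
                        SELECT t1.value,
                               (SELECT MAX(t2.value)
                                FROM vals t2
                                WHERE t2.value < t1.value) AS prev_value
                        FROM vals t1
                    ) s
                    WHERE (s.prev_value IS NULL OR s.value - s.prev_value >= ?)
                      AND s.value <= t.value
                   ) AS group_id
            FROM vals t
        ) g
        GROUP BY group_id
        ORDER BY MIN(value)
        """,
        (max_gap,),
    ).fetchall()
    return [(lo, hi, round(avg, 4)) for lo, hi, avg in rows]


print(group_averages(10))
```

Because the threshold is a bound parameter, calling group_averages with a different value regroups the same data without touching the SQL.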
Create another column with a hash value for the field. This field will be used to test for equality. For example with strings you may store a soundex. For numbers you may store the closest multiple of ten
Otherwise doing a calculation will be much slower. You could also cross join the table to itself and group where the difference of the two fields < 10
I like the other user's suggestion to create a hash column. Joining a table to itself makes the work grow quadratically with the number of rows, and should be avoided.
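One other possibility is to use integer division, for example select avg(val), val/10 from myTable group by val/10 would have a value of group that is 0 for 0-9, 1 for 10-19, etc.
At least, it works in SQL Server that way, where / on integers truncates. In MySQL / returns a decimal, so use val DIV 10 or FLOOR(val/10) instead. Note that these are fixed buckets, so close values such as 19 and 20 land in different groups.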
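First, I would export the whole result to an array.
Afterwards, use a function like this one (JavaScript), which averages every run of elements_to_agroup consecutive elements:
// Returns the average of each consecutive group of `elements_to_agroup` values.
function show(values, elements_to_agroup = 4)
{
  const averages = [];
  let sum = 0;
  for (let i = 0; i < values.length; i++)
  {
    sum += values[i];
    // Close the group on its last element, or at the end of the array.
    if ((i + 1) % elements_to_agroup === 0 || i === values.length - 1)
    {
      const size = (i % elements_to_agroup) + 1;
      averages.push(sum / size);
      sum = 0;
    }
  }
  return averages;
}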

Replace nested select query by self join

I recently asked a question here concerning an SQL query: Trouble wrapping head around complex SQL delete query
I now understand that what I'm trying to do is too complex to pull off with a single query or even multiple queries without some way to keep results in between. Therefore I decided to create a bash script (the end result will be something to do with a cronjob so bash is the most straightforward choice).
Consider the following table:
AssociatedClient:
+-----------+-----------------+
| Client_id | Registration_id |
+-----------+-----------------+
| 2 | 2 |
| 3 | 2 |
| 3 | 4 |
| 4 | 5 |
| 3 | 6 |
| 5 | 6 |
| 3 | 8 |
| 8 | 9 |
| 7 | 10 |
+-----------+-----------------+
What I want to do is select all Registration_ids where the Client_id is in the list of Client_ids associated with a specific Registration_id.
Although I'm pretty noob with SQL, I found this query relatively easy:
SELECT `Registration_id` FROM `AssociatedClient` ac1
WHERE ac1.`Client_id` IN
(SELECT `Client_id` FROM `AssociatedClient` ac2
WHERE ac2.`Registration_id` = $reg_id);
where $reg_id is just a bash variable.
This works but I would like to see it done with a self join, because it looks nicer, especially within a bash script where a lot of character clutter occurs. I'm afraid my SQL skills just don't reach that far.
If I've understood correctly, you should just be able to do a simple self join like so:
SELECT ac1.registration_id
FROM associatedclient ac1
JOIN associatedclient ac2 ON ac2.client_id = ac1.client_id
WHERE ac2.registration_id = $reg_id
So what you are doing is scanning the table once, joining it to itself where the client_id matches. Then you are restricting the joined rows to ones where the 2nd version of the table has a specific id, leaving you with the different permutations of the join on the 1st table, and then just picking the registration_id from those rows.
So, given the example of a variable value of 6, try running the following statement:
SELECT
ac1.client_id AS client_id_1
, ac1.registration_id AS reg_id_1
, ac2.client_id AS client_id_2
, ac2.registration_id AS reg_id_2
FROM associatedclient ac1
JOIN associatedclient ac2 ON ac1.client_id = ac2.client_id
and you'll notice the full set of joins. Then try adding the WHERE restriction and notice which rows come back. Then finally just pick the column you want.
You can check out a SQLFiddle I set up which tests it with a value of 6
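If the fiddle is unavailable, the comparison can be reproduced locally with Python's sqlite3 module. This is a sketch using the rows from the question, with SQLite standing in for MySQL. The two helper functions are hypothetical names, and the registration id is passed as a bound parameter instead of being interpolated like $reg_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AssociatedClient (Client_id INTEGER, Registration_id INTEGER)"
)
conn.executemany(
    "INSERT INTO AssociatedClient VALUES (?, ?)",
    [(2, 2), (3, 2), (3, 4), (4, 5), (3, 6), (5, 6), (3, 8), (8, 9), (7, 10)],
)


def via_self_join(reg_id):
    """Registrations sharing at least one client with `reg_id` (self-join form)."""
    rows = conn.execute(
        """
        SELECT ac1.Registration_id
        FROM AssociatedClient ac1
        JOIN AssociatedClient ac2 ON ac2.Client_id = ac1.Client_id
        WHERE ac2.Registration_id = ?
        ORDER BY ac1.Registration_id
        """,
        (reg_id,),
    ).fetchall()
    return [r for (r,) in rows]


def via_nested_select(reg_id):
    """Same question, written with the original IN (subquery) form."""
    rows = conn.execute(
        """
        SELECT Registration_id
        FROM AssociatedClient
        WHERE Client_id IN (
            SELECT Client_id FROM AssociatedClient WHERE Registration_id = ?
        )
        ORDER BY Registration_id
        """,
        (reg_id,),
    ).fetchall()
    return [r for (r,) in rows]


print(via_self_join(6))
```

Registration 6 shows up twice because both of its clients (3 and 5) match; add DISTINCT if each id should appear once. The two forms agree here because every (Client_id, Registration_id) pair is unique; with duplicated pairs the join would multiply rows where IN would not.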

How to find the best score for each event in MySQL? [closed]

I have a MySQL table containing data for series of tests taken by athletes. I want to get the best results for each event.
Here is the table containing the data for all of the tests taken by the athletes:
+---------+-----------+-------+
| eventId | athleteId | score |
+---------+-----------+-------+
| 1 | 129907 | 900 |
| 2 | 129907 | 940 |
| 3 | 129907 | 927 |
| 4 | 129907 | 856 |
| 1 | 328992 | 780 |
| 2 | 328992 | 890 |
| 3 | 328992 | 936 |
| 4 | 328992 | 864 |
| 1 | 492561 | 899 |
| 2 | 492561 | 960 |
| 3 | 492561 | 840 |
| 4 | 492561 | 920 |
| 5 | 487422 | 900 |
| 6 | 487422 | 940 |
| 7 | 487422 | 927 |
| 5 | 629876 | 780 |
| 6 | 629876 | 890 |
| 7 | 629876 | 940 |
| 5 | 138688 | 899 |
| 6 | 138688 | 950 |
| 7 | 138688 | 840 |
+---------+-----------+-------+
I need to select the best standard lineup, taking the best tests. The result I am looking for should be:
+---------+-----------+-------+
| eventId | athleteId | score |
+---------+-----------+-------+
| 1 | 129907 | 900 |
| 2 | 492561 | 960 |
| 3 | 328992 | 936 |
| 4 | 492561 | 920 |
| 5 | 487422 | 900 |
| 6 | 138688 | 950 |
| 7 | 629876 | 940 |
+---------+-----------+-------+
If you want to reliably get the winner (and joint winners), the following SQL statement should do it...
SELECT athleteId, a.eventId, a.score
FROM tests AS a
JOIN (
-- This select finds the top score for each event
SELECT eventId, MAX(score) AS score
FROM tests
GROUP BY eventId
) AS b
-- Join on the top scores
ON a.eventId = b.eventId
AND a.score = b.score
I'm performing a sub-select to get the highest scores for each event and then performing an inner join to get the individual records that achieved the highest score in the event.
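To check this join against the expected result, here is a small sketch using Python's sqlite3 module with the rows from the question. SQLite is only a stand-in for MySQL; the ORDER BY is added just to make the output order predictable.

```python
import sqlite3

# Rows copied from the question: (eventId, athleteId, score).
TESTS = [
    (1, 129907, 900), (2, 129907, 940), (3, 129907, 927), (4, 129907, 856),
    (1, 328992, 780), (2, 328992, 890), (3, 328992, 936), (4, 328992, 864),
    (1, 492561, 899), (2, 492561, 960), (3, 492561, 840), (4, 492561, 920),
    (5, 487422, 900), (6, 487422, 940), (7, 487422, 927),
    (5, 629876, 780), (6, 629876, 890), (7, 629876, 940),
    (5, 138688, 899), (6, 138688, 950), (7, 138688, 840),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (eventId INTEGER, athleteId INTEGER, score INTEGER)")
conn.executemany("INSERT INTO tests VALUES (?, ?, ?)", TESTS)

best = conn.execute(
    """
    SELECT a.eventId, a.athleteId, a.score
    FROM tests AS a
    JOIN (
        -- Top score for each event
        SELECT eventId, MAX(score) AS score
        FROM tests
        GROUP BY eventId
    ) AS b
      ON a.eventId = b.eventId
     AND a.score = b.score
    ORDER BY a.eventId, a.athleteId
    """
).fetchall()

for row in best:
    print(row)
```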
Additional information
I have compiled the following information from conversations in the comments.
Why is the basic group by solution not reliable?
SELECT athleteId, eventId, score
FROM (
SELECT athleteId, eventId, score
FROM tests
ORDER BY eventId, score DESC
) AS a
GROUP BY eventId
We are creating a group from a recordset we have ordered on event and score. We are then selecting the value from the columns using grouping to select one record per event.
The first thing to note
If you are using a GROUP BY clause you are no longer talking about individual records but an unordered set of records!
You can use aggregate functions to do some pretty powerful and useful cross-record calculations in MySQL http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html but in order to relate the groups back to individual records you will likely need to perform a JOIN.
In the second example we are returning groups as if they were individual records.
Why does the second example appear to work?
In standard SQL, selecting non-aggregated columns that are not part of the GROUP BY is illegal. MySQL has allowed it, although I can't say why; it could be for performance reasons in denormalized columns, or for cases where you are certain that the value for the column does not change within a group.
MySQL selects the easiest value to return for the non-aggregated column in a group. It happens to select the first value it encounters as a result of the ordering of the recordset before it was grouped, however, it will not necessarily do this all of the time!
MySQL documentation states that the values for non-aggregated columns in a select containing a GROUP BY are indeterminate. This means that the resulting values for non-aggregated columns should not be assumed to be a result of events prior to grouping (i.e. any ordering in the recordset), although practically in this current implementation it appears that way.
In a future version it may not be the case; the result may not even be the same if you run it twice. (As of MySQL 5.7.5 the default ONLY_FULL_GROUP_BY mode rejects such queries outright.) The fact it is documented explicitly is reason enough for me to avoid it!
Why are non-aggregated columns indeterminate?
I would deduce that they intend to leave the implementation of algos for grouping open for future optimization which may ignore or break the original ordering of the records prior to grouping.
Conceptually it makes sense if you imagine a group of records as a single unit rather than a collection of individual records. For a non-aggregate column there are a number of possible values that can be returned and no implied condition for choosing one over the others at the point of selection; you would have to remember the way the records were before grouping.
The risk
All of my queries using this approach may start acting up at some point. They might return values for a record that did not obtain the highest score for the event.
Also, this bug won't be immediately apparent, so tracking the cause to the recent upgrade of MySQL will take a while. I can also guarantee that when it does happen I will have forgotten about this potential pitfall and all of the places where it was a problem, so I will likely end up stuck on an older, less secure version of MySQL until I get the chance to debug it properly... etc... Painful...
Why does the join solution differ?
The sub select in the JOIN statement does not use non-aggregated columns, aggregations are determinate as they relate to the group as a whole rather than individual records. Regardless of the order of the records before they were grouped the answer will always be the same.
I have used a JOIN statement to relate the groups back to the individual records we are interested in. In some cases it may mean that I have more than one individual record for each group. For example, when it comes to a draw, where two athletes have the same highest score, I will either have to return both records or arbitrarily pick one. I'm fairly confident that we will want all of the highest scorers, so I haven't provided any rules to select between two athletes that draw.
Picking one record as the winner
In order to pick one record as the clear winner we need a way to tell the winner apart from the runners-up. We might pick the ultimate winner as the first athlete to get the highest score; for another athlete to jump into the lead they must better the previous score set.
To do this we must have a way of determining the sequence of the tests so we introduce a testId column which will be incremented with each new result we get. When we have this we can then perform the following query...
SELECT a.eventId, athleteId, a.score
FROM tests AS a
JOIN (
-- This select finds the first testId for each score + event combination
SELECT MIN(testId) AS testId, c.eventId, c.score
FROM tests AS c
JOIN (
-- This select finds the top score for each event
SELECT eventId, MAX(score) AS score
FROM tests
GROUP BY eventId
) AS d
ON c.eventId = d.eventId
AND c.score = d.score
GROUP BY c.eventId, c.score
) AS b
ON a.testId = b.testId
What happens here is that we create groups representing the highest score for each event, then inner join that with groups that represent the lowest testId for each score and event combination, and finally inner join that with the records in the tests table to get the individual records.
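The tie-breaking behaviour is easier to see on data that actually contains a draw. The sketch below uses Python's sqlite3 module as a stand-in for MySQL. The rows and athlete ids are made up for illustration (athletes 101 and 102 tie on event 1, and 101 recorded the score first), and the GROUP BY columns are qualified with c. because both joined tables expose eventId and score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tests ("
    "testId INTEGER PRIMARY KEY, eventId INTEGER, athleteId INTEGER, score INTEGER)"
)
# Hypothetical rows: (testId, eventId, athleteId, score).
conn.executemany(
    "INSERT INTO tests VALUES (?, ?, ?, ?)",
    [
        (1, 1, 101, 900),
        (2, 1, 102, 900),  # same top score as testId 1, but recorded later
        (3, 1, 103, 850),
        (4, 2, 101, 700),
        (5, 2, 103, 720),
    ],
)

winners = conn.execute(
    """
    SELECT a.eventId, a.athleteId, a.score
    FROM tests AS a
    JOIN (
        -- First testId among the rows holding each event's top score
        SELECT MIN(c.testId) AS testId
        FROM tests AS c
        JOIN (
            -- Top score for each event
            SELECT eventId, MAX(score) AS score
            FROM tests
            GROUP BY eventId
        ) AS d
          ON c.eventId = d.eventId
         AND c.score = d.score
        GROUP BY c.eventId, c.score
    ) AS b
      ON a.testId = b.testId
    ORDER BY a.eventId
    """
).fetchall()
print(winners)
```

Only athlete 101 is returned for event 1, because testId 1 is the earliest row holding the top score.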
This can also be written (with a slightly different execution plan) as follows.
SELECT a.eventId, athleteId, a.score
FROM tests AS a
JOIN (
-- This select finds the top score for each event
SELECT eventId, MAX(score) AS score
FROM tests
GROUP BY eventId
) AS b
ON a.eventId = b.eventId
AND a.score = b.score
JOIN (
-- This select finds the first testId for each score + event combination
SELECT MIN(testId) AS testId, eventId, score
FROM tests
GROUP BY eventId, score
) AS c
ON a.testId = c.testId
The basic group by solution achieves the same result in less SQL but it optimizes very poorly in comparison. If we add indexes to our tables the basic group by solution doesn't utilize the indexes and requires two filesorts (additional runs through the table to put it into order) on all of the records in the tests table. However, the original nested sub-select query above optimizes very well.
Try this one:
SELECT t1.eventId, t1.athleteId, t1.score
FROM tests t1
LEFT JOIN tests t2 ON t2.eventId = t1.eventId AND t2.score > t1.score
WHERE t2.athleteId IS NULL
ORDER BY t1.eventId
http://sqlfiddle.com/#!2/80e34/3/0
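In case the fiddle link stops working, the anti-join can be reproduced with Python's sqlite3 module. This is a sketch with made-up rows (athletes 101 and 102 tie on event 1), and SQLite stands in for MySQL. A row survives only if no other row in the same event has a strictly higher score, so joint winners are all returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (eventId INTEGER, athleteId INTEGER, score INTEGER)")
# Hypothetical rows: (eventId, athleteId, score).
conn.executemany(
    "INSERT INTO tests VALUES (?, ?, ?)",
    [(1, 101, 900), (1, 102, 900), (1, 103, 850), (2, 101, 700), (2, 103, 720)],
)

top = conn.execute(
    """
    SELECT t1.eventId, t1.athleteId, t1.score
    FROM tests t1
    LEFT JOIN tests t2
           ON t2.eventId = t1.eventId
          AND t2.score > t1.score
    -- No better score found in the same event, so t1 is a top scorer.
    WHERE t2.athleteId IS NULL
    ORDER BY t1.eventId, t1.athleteId
    """
).fetchall()
print(top)
```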