Adding or subtracting values based on another field's contents - mysql

I have a table with transactions. All transactions are stored as positive numbers, if its a deposit or withdrawl only the action changes. How do i write a query that can sum up the numbers based on the action
-actions-
1 Buy 2 Sell 5 Dividend
ID ACTION SYMBOL PRICE SHARES
1 1 AGNC 27.50 150
2 2 AGNC 30.00 50
3 5 AGNC 1.25 100
So the query should show AGNC has a total of 100 shares.
SELECT
symbol,sum(shares) AS shares,
ROUND(abs(sum((price * shares))),2) AS cost,
FROM bf_transactions
WHERE (action_id <> 5)
GROUP BY symbol
HAVING sum(shares) > 0
I was originally using that query when i had positive/negative numbers and that worked great.. but i dont know how to do it now with just positive numbers.

This ought to do it:
SELECT symbol, sum(case action
when 1 then shares
when 2 then -shares
end) as shares
FROM bf_transactions
GROUP BY symbol
SQL Fiddle here
It is however good practice to denormalize this kind of data - what you appear to have now is a correctly normalized database with no duplicate data, but it's rather impractical to use as you can see in cases like this. You should keep a separate table with current stock portfolio that you update when a transaction is executed.
Also, including a HAVING-clause to 'hide' corrupted data (someone has sold more than they have purchased) seems rather bad practice to me - when a situation like that is detected you should definitely throw some kind of error, or an internal alert.

Related

Update a value and a table with a foreign key

ticketId(**)
timeExpected
timeElapsed
187
5
5
225
4
8
856
8
15
782
10
8
**primary key
*foreign key
id(**)
(*)ticketId
beyondTime
1
187
0
2
225
1
3
856
1
4
782
0
I have to know which ticket his out of time and I have this in mind and in my database but I can't figure it out with SQL. I want to know when a ticket is out of time like the ticket number 225 is, I would like to update the other table with a binary 1 for "out of time" and 0 "good".
I don't know if I can update the "beyondTime" table when I do "timeExpected - timeElapsed" in the first table when a ticket exceed the time expected.
First, let's simplify the problem you're looking a solution for:
Comparing two data field values for the rows in a table
I believe that's basically what you're trying to do. And of course, you can make a decision based on the comparison and act accordingly like updating some values in another table if you want to.
But if all you're looking for is kind of a report, then all you need would be just a select statement equipped with the logic doing the comparison for you.
You can use SQL Case Statement for this. Here is an easy guide if you want to take a look.
The select statement you need would be like this:
select ticketId, (case when timeElapsed > timeExpected then 1 else 0 end) as beyondTime from tickets
The result would be like this:
Two things to remember:
Don't get disappointed by the negative points and keep asking questions :)
Identify the issue you need help with and be specific with your question. I'd suggest you to update this question.
Good luck!

SQL - Add To Existing Average

I'm trying to build a reporting table to track server traffic and popularity overall. Each SID is a unique game server hosting a particular game, and each UCID is a unique player key connecting to that server.
Say I have a table like so:
SID UCID AvgTime NumConnects
-----------------------------------------
1 AIE9348ietjg 300.55 5
1 Po328gieijge 500.66 7
2 AIE9348ietjg 234.55 3
3 Po328gieijge 1049.88 18
We can see that there are 2 unique players, and 3 unique servers, with SID 1 having 2 players that have connected to it at some point in the past. The AvgTime is the average amount of time those players spent on that server (in seconds), and the NumConnects is the size of the average (ie. 300.55 is averaged out of 5 elements).
Now I run a job in the background where I process a raw connection table and pull out player connections like so:
SID UCID ConnectTime DisconnectTime
-----------------------------------------
1 AIE9348ietjg 90.35 458.32
2 Po328gieijge 30.12 87.15
2 AIE9348ietjg 173.12 345.35
This table has no ID or other fluff to help condense my example. There may be multiple connect/disconnect records for multiple players in this table. What I want to do is add to my existing AvgTime for each SID these new values.
There is a formula from here I am trying to use (taken from this math stackexchange: https://math.stackexchange.com/questions/1153794/adding-to-an-average-without-unknown-total-sum/1153800#1153800)
Average = (Average * Size + NewValue) / Size + 1
How can I write an update query to update each ServerIDs traffic table above, and add to the average using the above formula for each pair of records. I tried something like the following but it didn't work (returned back null):
UPDATE server_traffic st
LEFT JOIN connect_log l
ON st.SID = l.SID AND st.UCID = l.UCID
SET AvgTime = (AvgTime * NumConnects + SUM(l.DisconnectTime - l.ConnectTime) / NumConnects + COUNT(l.UCID)
I would prefer an answer in MySql, but I'll accept MS SQL as well.
EDIT
I understand that statistics and calculations are generally not to be stored in tables and that you can run reports that would crunch the numbers for you. My requirement is that users can go to a website and view the popularity of various servers. This needs to be done in a way that
A: running a complex query per user doesn't crash or slow down the system
B: the page returns the data within a few seconds at most
See this example here: https://bf4stats.com/pc/shinku555555
This is a web page for battlefield 4 stats - notice that the load is almost near instant for this player, and I get back a load of statistics without waiting for some complex report query to return the data. I'm assuming they store these calculations in preprocessed tables where the webpage just needs to do a simple select to return back the values. That's the same approach I want to take with my Database and Web Application design.
Sorry if this is off topic to the original question - but hopefully this adds additional context that helps people understand my needs.
Since you cannot run aggregate functions like SUM and COUNT by themselves at the unit level in SQL but contained in an aggregate query, consider joining to an aggregate subquery for the UPDATE...LEFT JOIN. Also, adjust parentheses in SET to match above formula.
Also, note that since you use LEFT JOIN, rows with non-match IDs will render NULL for aggregate fields and this entity cannot be used in arithmetic operations and will return NULL. You can convert to zero with IFNULL() but may fail with formula's division.
UPDATE server_traffic s
LEFT JOIN
(SELECT SID, UCID, COUNT(UCID) As GrpCount,
SUM(DisconnectTime - ConnectTime) AS SumTimeDiff
FROM connect_log
GROUP BY SID, UCID) l
ON s.SID = l.SID AND s.UCID = l.UCID
SET s.AvgTime = (s.AvgTime * s.NumConnects + l.SumTimeDiff) / s.NumConnects + l.GrpCount
Aside - reconsider saving calculations/statistics within tables as they can always be run by queries even by timestamps. Ideally, database tables should store raw values.

MySQL- Counting rows VS Setting up a counter

I have 2 tables posts<id, user_id, text, votes_counter, created> and votes<id, post_id, user_id, vote>. Here the table vote can be either 1 (upvote) or -1(downvote). Now if I need to fetch the total votes(upvotes - downvotes) on a post, I can do it in 2 ways.
Use count(*) to count the number of upvotes and downvotes on that post from votes table and then do the maths.
Set up a counter column votes_counter and increment or decrement it everytime a user upvotes or downvotes. Then simply extract that votes_counter.
My question is which one is better and under what condition. By saying condition, I mean factors like scalability, peaktime et cetera.
To what I know, if I use method 1, for a table with millions of rows, count(*) could be a heavy operation. To avoid that situation, if I use a counter then during peak time, the votes_counter column might get deadlocked, too many users trying to update the counter!
Is there a third way better than both and as simple to implement?
The two approaches represent a common tradeoff between complexity of implementation and speed.
The first approach is very simple to implement, because it does not require you to do any additional coding.
The second approach is potentially a lot faster, especially when you need to count a small percentage of items in a large table
The first approach can be sped up by well designed indexes. Rather than searching through the whole table, your RDBMS could retrieve a few records from the index, and do the counts using them
The second approach can become very complex very quickly:
You need to consider what happens to the counts when a user gets deleted
You should consider what happens when the table of votes is manipulated by tools outside your program. For example, merging records from two databases may prove a lot more complex when the current counts are stored along with the individual ones.
I would start with the first approach, and see how it performs. Then I would try optimizing it with indexing. Finally, I would consider going with the second approach, possibly writing triggers to update counts automatically.
As this sounds a lot like StackExchange, I'll refer you to this answer on the meta about the database schema used on the site. The votes table looks like this:
Votes table:
Id
PostId
VoteTypeId, one of the following values:
1 - AcceptedByOriginator
2 - UpMod
3 - DownMod
4 - Offensive
5 - Favorite (if VoteTypeId = 5, UserId will be populated)
6 - Close
7 - Reopen
8 - BountyStart (if VoteTypeId = 8, UserId will be populated)
9 - BountyClose
10 - Deletion
11 - Undeletion
12 - Spam
15 - ModeratorReview
16 - ApproveEditSuggestion
UserId (only present if VoteTypeId is 5 or 8)
CreationDate
BountyAmount (only present if VoteTypeId is 8 or 9)
And so based on that it sounds like the way it would be run is:
SELECT VoteTypeId FROM Votes WHERE VoteTypeId = 2 OR VoteTypeId = 3
And then based on the value, do the maths:
int score = 0;
for each vote in voteQueryResults
if(vote == 2) score++;
if(vote == 3) score--;
Even with millions of results, this is probably going to be a very fast operation as it's so simple.

MySQL query for items where average price is less than X?

I'm stumped with how to do the following purely in MySQL, and I've resorted to taking my result set and manipulating it in ruby afterwards, which doesn't seem ideal.
Here's the question. With a dataset of 'items' like:
id state_id price issue_date listed
1 5 450 2011 1
1 5 455 2011 1
1 5 490 2011 1
1 5 510 2012 0
1 5 525 2012 1
...
I'm trying to get something like:
SELECT * FROM items
WHERE ([some conditions], e.g. issue_date >= 2011 and listed=1)
AND state_id = 5
GROUP BY id
HAVING AVG(price) <= 500
ORDER BY price DESC
LIMIT 25
Essentially I want to grab a "group" of items whose average price fall under a certain threshold. I know that my above example "group by" and "having" are not correct since it's just going to give the AVG(price) of that one item, which doesn't really make sense. I'm just trying to illustrate my desired result.
The important thing here is I want all of the individual items in my result set, I don't just want to see one row with the average price, total, etc.
Currently I'm just doing the above query without the HAVING AVG(price) and adding up the individual items one-by-one (in ruby) until I reach the desired average. It would be really great if I could figure out how to do this in SQL. Using subqueries or something clever like joining the table onto itself are certainly acceptable solutions if they work well! Thanks!
UPDATE: In response to Tudor's answer below, here are some clarifications. There is always going to be a target quantity in addition to the target average. And we would always sort the results by price low to high, and by date.
So if we did have 10 items that were all priced at $5 and we wanted to find 5 items with an average < $6, we'd simply return the first 5 items. We wouldn't return the first one only, and we wouldn't return the first 3 grouped with the last 2. That's essentially how my code in ruby is working right now.
I would do almost an inverse of what Jasper provided... Start your query with your criteria to explicitly limit the few items that MAY qualify instead of getting all items and running a sub-select on each entry. Could pose as a larger performance hit... could be wrong, but here's my offering..
select
i2.*
from
( SELECT i.id
FROM items i
WHERE
i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500 ) PreQualify
JOIN items i2
on PreQualify.id = i2.id
AND i2.issue_date > 2011
AND i2.listed = 1
AND i2.state_id = 5
order by
i2.price desc
limit
25
Not sure of the order by, especially if you wanted grouping by item... In addition, I would ensure an index on (state_id, Listed, id, issue_date)
CLARIFICATION per comments
I think I AM correct on it. Don't confuse "HAVING" clause with "WHERE". WHERE says DO or DONT include based on certain conditions. HAVING means after all the where clauses and grouping is done, the result set will "POTENTIALLY" accept the answer. THEN the HAVING is checked, and if IT STILL qualifies, includes in the result set, otherwise throws it out. Try the following from the INNER query alone... Do once WITHOUT the HAVING clause, then again WITH the HAVING clause...
SELECT i.id, avg( i.price )
FROM items i
WHERE i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500
As you get more into writing queries, try the parts individually to see what you are getting vs what you are thinking... You'll find how / why certain things work. In addition, you are now talking in your updated question about getting multiple IDs and prices at apparent low and high range... yet you are also applying a limit. If you had 20 items, and each had 10 qualifying records, your limit of 25 would show all of the first item and 5 into the second... which is NOT what I think you want... you may want 25 of each qualified "id". That would wrap this query into yet another level...
What MySQL does makes perfectly sense. What you want to do does not make sense:
if you have let's say 4 items, each with price of 5 and you put HAVING AVERAGE <= 7 what you say is that the query should return ALL the permutations, like:
{1} - since item with id 1, can be a group by itself
{1,2}
{1,3}
{1,4}
{1,2,3}
{1,2,4}
...
and so on?
Your algorithm of computing the average in ruby is also not valid, if you have items with values 5, 1, 7, 10 - and seek for an average value of less than 7, element with value 10 can be returned just in a group with element of value 1. But, by your algorithm (if I understood correctly), element with value 1 is returned in the first group.
Update
What you want is something like the Knapsack problem and your approach is using some kind of Greedy Algorithm to solve it. I don't think there are straight, easy and correct ways to implement that in SQL.
After a google search, I found this article which tries to solve the knapsack problem with AI written in SQL.
By considering your item price as a weight, having the number of items and the desired average, you could compute the maximum value that can be entered in the 'knapsack' by multiplying desired_cost with number_of_items
I'm not entirely sure from your question, but I think this is a solution to your problem:
SELECT * FROM items
WHERE (some "conditions", e.g. issue_date > 2011 and listed=1)
AND state_id = 5
AND id IN (SELECT id
FROM items
GROUP BY id
HAVING AVG(price) <= 500)
ORDER BY price DESC
LIMIT 25
note: This is off the top of my head and I haven't done complex SQL in a while, so it might be wrong. I think this or something like it should work, though.

Count a specific value from multiple columns and group by values in another column... in mysql

Hey. I have 160 columns that are filled with data when a user fills a report form out and submit it. A few of these sets of columns contain similar data, but there needs to be multiple instance of this data per record set as it may be different per instance in the report.
For example, an employee opens a case by a certain type at one point in the day, then at another point in the day they open another case of a different type. I want to create totals per user based on the values in these columns. There is one column set that I want to target right now, case type. I would like to be able to see all instances of the value "TSTO" in columns CT1, CT2, CT3... through CT20. Then have that sorted by the employee ID number, which is just one column in the table.
Any ideas? I am struggling with this one.
So far I have SELECT CT1, CT2, CT3, CT4, CT5, CT6, CT7, CT8, CT9, CT10, CT11, CT12, CT13, CT14, CT15, CT16, CT17, CT18, CT19, CT20 FROM REPORTS GROUP BY OFFICER
This will display the values of all the case type entries in a record set but I need to count them, I tried to use,
SELECT CT1, CT2, CT3, CT4, CT5, CT6, CT7, CT8, CT9, CT10, CT11, CT12, CT13, CT14, CT15, CT16, CT17, CT18, CT19, CT20 FROM REPORTS COUNT(TSTO) GROUP BY OFFICER
but it just spits an error. I am fairly new to mysql databasing and php, I feel I have a good grasp but query'ing the database and the syntax involved is a tad bit confused and/or overwhelming right now. Just gotta learn the language. I will keep looking and I have found some similar things on here but I don't understand what I am looking at (completely) and I would like to shy away from using code that "works" but I don't understand fully.
Thank you very much :)
Edit -
So this database is an activity report server for the days work for the employees. The person will often open cases during the day. These cases vary in type, and their different types are designated by a four letter convention. So your different case types could be TSTO, DOME, ASBA, etc etc. So the user will fill out their form throughout the day then submit it down to the database. That's all fine :) Now I am trying to build a page which will query the database by user request for statistics of a user's activities. So right now I am trying to generate statistics. Specifically, I want to be able to generate the statistic of, and in human terms, "HOW MANY OCCURENCES OF "USER INPUTTED CASE TYPE" ARE THERE FOR EMPLOYEEIDXXX"
So when a user submits a form they will type in this four letter case type up to 20 times in one form, there is 20 fields for this case type entry, thus there is 20 columns. So these 20 columns for case type will be in one record set, one record set is generated per report. Another column that is generated is the employeeid column, which basically identifies who generated the record set through their form.
So I would like to be able to query all 20 columns of case type, across all record sets, for a defined type of case (TSTO, DOME, ASBA, etc etc) and then group that to corresponding user(s).
So the output would look something like,
316 TSTO's for employeeid108
I hope this helps to clear it up a bit. Again I am fairly fresh to all of this so I am not the best with the vernacular and best practices etc etc...
Thanks so much :)
Edit 2 -
So to further elaborate on what I have going on, I have an HTML form that has 164 fields. Each of these fields ultimately puts a value into a column in a single record set in my DB, each submission. I couldn't post images or more than two URLs so I will try to explain it the best I can without screenshots.
So what happens is this information gets in the DB. Then there is the query'ing. I have a search page which uses an HTML form to select the type of information to be searched for. It then displays a synopsis of each report that matches the query. The user than enters the REPORT ID # for the report they want to view in full into another small form (an input field with a submit button) which brings them to a page with the full report displayed when they click submit.
So right now I am trying to do totals and realizing my DB will be needing some work and tweaking to make it easier to create querys for it for different information needed. I've gleaned some good information so far and will continue to try and provide concise information about my setup as best I can.
Thanks.
Edit 3 -
Maybe you can go to my photobucket and check them out, should let me do one link, there is five screenshots, you can kind of see better what I have happening there.
http://s1082.photobucket.com/albums/j376/hughessa
:)
The query you are looking for would be very long and complicated for your current db schema.
Every table like (some_id, column1, column2, column3, column4... ) where columns store the same type of data can be also represented by a table (some_id, column_number, column_value ) where instead of 1 row with values for 20 columns you have 20 rows.
So your table should rather look like:
officer ct_number ct_value
1 CT1 TSTO
1 CT2 DOME
1 CT3 TSTO
1 CT4 ASBA
(...)
2 CT1 DOME
2 CT2 TSTO
For a table like this if you wanted to find how many occurences of different ct_values are there for officer 1 you would use a simple query:
SELECT officer, ct_value, count(ct_value) AS ct_count
FROM reports WHERE officer=1 GROUP BY ct_value
giving results
officer ct_value ct_count
1 TSTO 2
1 DOME 1
1 ASBA 1
If you wanted to find out how many TSTO's are there for different officers you would use:
SELECT officer, ct_value, count( officer ) as ct_count FROM reports
WHERE ct_value='TSTO' GROUP BY officer
giving results
officer ct_value ct_count
1 TSTO 2
2 TSTO 1
Also any type of query for your old schema can be easily converted to new schema.
However if you need store additional information about every particular report I suggest having two tables:
Submissions
submission_id report_id ct_number ct_value
primary key
auto-increment
------------------------------------------------
1 1 CT1 TSTO
2 1 CT2 DOME
3 1 CT3 TSTO
4 1 CT4 ASBA
5 2 CT1 DOME
6 2 CT2 TSTO
with report_id pointing to a record in another table with as many columns as you need for additional data:
Reports
report_id officer date some_other_data
primary key
auto-increment
--------------------------------------------------------------------
1 1 2011-04-29 11:28:15 Everything went ok
2 2 2011-04-29 14:01:00 There were troubles
Example:
How many TSTO's are there for different officers:
SELECT r.officer, s.ct_value, count( officer ) as ct_count
FROM submissions s JOIN reports r ON s.report_id = r.report_id
WHERE s.ct_value='TSTO'
GROUP BY r.officer