MySQL alternative to UNION for the same table, reusing the same columns selected under new names - mysql

I'm trying to generate a result set from a table whose effective unique/primary key is (billyear, billmonth, type), along with cost and consumption columns. So there can be three rows with the same bill year and bill month, and the type is one of three values: E, W or NG.
I need to create a result set that has just one row per billyear and billmonth entry.
(
select month as billmonth, year as billyear, cost_estimate as eleccost, consumption_estimate as eleccons from tblbillforecast where buildingid=19 and type='E'
)
UNION (
select month as billmonth, year as billyear, cost_estimate as gascost, consumption_estimate as gascons from tblbillforecast where buildingid=19 and type='NG'
)
UNION (
select month as billmonth, year as billyear, cost_estimate as watercost, consumption_estimate as watercons from tblbillforecast where buildingid=19 and type='W'
)
This generates a result set with only billmonth, billyear, eleccost and eleccons columns. I've tried all kinds of solutions, but the above example is the simplest way to show where it's going wrong.
Additionally it still has 3 rows per billmonth/billyear unique combination instead of merging to one.
UPDATE:
Sample data
SELECT month AS billmonth,
year AS billyear,
SUM(CASE type WHEN 'E' THEN cost_estimate END) AS eleccost,
SUM(CASE type WHEN 'NG' THEN cost_estimate END) AS gascost,
SUM(CASE type WHEN 'W' THEN cost_estimate END) AS watercost
FROM tblbillforecast
WHERE buildingid=19
GROUP BY billmonth, billyear;
Result:
Expected result, eg:
year | month | eleccost | gascost | watercost
2018 | 1 | 32800 | 4460 | 4750

This is behaving correctly. An SQL query result set has one name per column, and this name applies to all the rows. So if you try to rename the column in the second or subsequent queries of the UNION, those new names are ignored. The name of the column is determined only by the first query of the UNION.
Additionally it still has 3 rows per billmonth/billyear unique combination instead of merging to one.
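That's also correct behavior, given the query you tried. UNION does not merge multiple rows into one; it only appends one set of rows to another (plain UNION also removes rows that are exact duplicates, but rows with different values stay separate).
As Akina hinted in the comments above, you can use conditional aggregation to produce multiple columns: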
SELECT month AS billmonth,
year AS billyear,
SUM(CASE type WHEN 'E' THEN cost_estimate END) AS eleccost,
SUM(CASE type WHEN 'NG' THEN cost_estimate END) AS gascost,
SUM(CASE type WHEN 'W' THEN cost_estimate END) AS watercost
FROM tblbillforecast
WHERE buildingid=19
GROUP BY billmonth, billyear;
This uses GROUP BY to "merge" rows together, so you get one row in the result per month/year.
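The sketch below shows the GROUP BY "merge" in action. It is minimal and self-contained, and it uses Python's built-in sqlite3 module as a stand-in for MySQL; the CASE/SUM/GROUP BY syntax used here behaves the same way in both. The table mirrors tblbillforecast but keeps only the columns this query needs. The rows are hypothetical, except that the January 2018 costs match the expected result in the question. It groups by the underlying year and month columns, which is equivalent to grouping by the aliases in this case.

```python
import sqlite3

# In-memory stand-in for tblbillforecast (hypothetical rows, cost columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblbillforecast ("
    " buildingid INTEGER, year INTEGER, month INTEGER,"
    " type TEXT, cost_estimate INTEGER)"
)
conn.executemany(
    "INSERT INTO tblbillforecast VALUES (?, ?, ?, ?, ?)",
    [
        (19, 2018, 1, "E", 32800),
        (19, 2018, 1, "NG", 4460),
        (19, 2018, 1, "W", 4750),
        (19, 2018, 2, "E", 30000),  # February has no water row
        (19, 2018, 2, "NG", 4000),
        (20, 2018, 1, "E", 999),    # another building, removed by WHERE
    ],
)

# Each CASE routes one utility's cost into its own column; GROUP BY then
# collapses the rows for a given month into a single row.
rows = conn.execute(
    """
    SELECT month AS billmonth,
           year  AS billyear,
           SUM(CASE type WHEN 'E'  THEN cost_estimate END) AS eleccost,
           SUM(CASE type WHEN 'NG' THEN cost_estimate END) AS gascost,
           SUM(CASE type WHEN 'W'  THEN cost_estimate END) AS watercost
    FROM tblbillforecast
    WHERE buildingid = ?
    GROUP BY year, month
    ORDER BY year, month
    """,
    (19,),
).fetchall()

for row in rows:
    print(row)
```

February's watercost comes back as NULL (None in Python) because no W row matched, and SUM over no non-NULL values is NULL.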

A quick bit of guidance on various data shaping operations in SQL:
JOIN - makes resultsets wider (more columns) by bringing together tables/resultsets side by side, generating output rows that have all the columns of both inputs
SELECT - typically makes resultsets narrower by letting you specify which columns you're interested in; any available column you don't mention disappears, so you output fewer columns
UNION - makes resultsets taller (more rows) by bringing together resultsets and outputting one on top of the other. Because each column has one fixed data type and one name, every query in the UNION must produce the same number of columns, in the same order, with compatible types
WHERE - makes resultsets shorter (fewer rows) by allowing you to specify truth based filters that exclude rows
It's not hard and fast; you can use SELECT to create more columns too, but in a very rudimentary sense these concepts hold true: JOIN to widen, UNION for taller, SELECT for narrower and WHERE for shorter. All the work you do with SQL is a data-shaping exercise; you're either paring a rectangular block of data down or extending it, in a vertical or horizontal direction (or a mix).
I'm not going to get into grouping, because that mixes rows up and isn't something you tried in the question. I wrote this out because you attempted a UNION (a height-increasing operation) when you actually wanted to widen. However the widening is done (a JOIN, or, as in Bill's answer, SELECT+GROUP BY, which is valid but relies on the "mixes rows up" aspect of grouping), it specifically isn't done with a UNION. UNION only makes stuff taller.
To give an example of an alternative to Bill's approach: this task of yours has one big table that is "too tall". It uses 3 rows where 1 would do, if only it were a bit wider. That is to say, if there were 3 columns for electricity/gas/water, we wouldn't need 3 rows with 1 utility in each.
Of course, we have this "one utility per row" design because it is very flexible. Database tables don't have varying numbers of columns, but they DO have varying numbers of rows. If a new bill type came along tomorrow (internet), no table changes would be needed to accommodate it; add a new type I and away you go, adding another row. We now store 4 rows of 1 utility where 1 row with 4 columns would do, but crucially we didn't have to change the table structure. We could have infinitely many kinds of bills and not need infinite columns, because we can already have infinite rows.
So you want to reshape your data from 4-rows-by-1-column to 1-row-by-4-columns. It could be solved as follows:
narrow the table to just year, month, building, cost AND shorten it to just electricity
separately narrow the table to just year, month, building, cost AND shorten it to just gas
separately narrow the table to just year, month, building, cost AND shorten it to just water
join (widening) all these newly created result sets, then narrow to remove the repeated year, month, building columns
That would look like:
SELECT e.year, e.month, e.building, e.cost AS eleccost, g.cost AS gascost, w.cost AS watercost
FROM
(SELECT year,month,building,cost FROM t WHERE type = 'E') e
JOIN
(SELECT year,month,building,cost FROM t WHERE type = 'NG') g
ON
e.year = g.year AND e.month = g.month AND e.building = g.building
JOIN
(SELECT year,month,building,cost FROM t WHERE type = 'W') w
ON
e.year = w.year AND e.month = w.month AND e.building = w.building
WHERE
e.building = 19
You can clearly see the 3 narrowing-and-shortening operations that pick out "just the gas", "just the electric" and "just the water", for example (SELECT year,month,building,cost FROM t WHERE type = 'NG'). That's what reduces the height of the original table, making it three times shorter in each case. If we had 999 rows x 5 columns (year, month, building, type, cost) in the big table, it becomes 3 sets of 333 rows x 4 columns each (type is dropped).
We then JOIN these together to widen the results: our 3 sets of 333 x 4 widen to 333 x 12 when JOINed.
Then the outer SELECT goes from 333 x 12 down to 333 x 6 by ditching the repeated columns.
It's likely not perfect (I'd perhaps LEFT JOIN all 3 onto a 4th set of rows holding just the common columns, in case some utilities aren't present for a particular month), and perhaps some people will complain that it's less performant because it hits the table 3 times. All that is incidental to the point I'm making about SQL being an exercise in reshaping data: tables are the starting blocks of data, you cut them up narrower and shorter, then stick them together side by side or on top of each other, and that becomes your new data block, which is maybe wider, taller, or both. In any case it's definitely a different shape from what you started with. And then you can cut and shape again, and again.
Go with Bill's conditional aggregation (though this way would be fine if there is exactly one row per building/year/month for each utility), but take away a stronger sense of the direction in which these common operations (SELECT/JOIN/WHERE/UNION) reshape your data.
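Below is a minimal sketch of the LEFT JOIN idea from the parenthetical above. It again uses Python's sqlite3 as a stand-in for MySQL, together with the generic table t from the example and hypothetical rows. With plain JOINs, a month that is missing one utility drops out entirely. LEFT JOINing each utility onto the distinct (year, month, building) combinations keeps that month and shows NULL for the missing cost. Building that "4th set" with SELECT DISTINCT is an assumption about how it might be done.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (year INTEGER, month INTEGER, building INTEGER, type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (2018, 1, 19, "E", 32800), (2018, 1, 19, "NG", 4460), (2018, 1, 19, "W", 4750),
        (2018, 2, 19, "E", 30000), (2018, 2, 19, "NG", 4000),  # no water row for February
    ],
)

# Three-way inner JOIN: February has no 'W' row, so it vanishes from the result.
inner = conn.execute(
    """
    SELECT e.year, e.month, e.building, e.cost AS eleccost, g.cost AS gascost, w.cost AS watercost
    FROM (SELECT year, month, building, cost FROM t WHERE type = 'E') e
    JOIN (SELECT year, month, building, cost FROM t WHERE type = 'NG') g
      ON e.year = g.year AND e.month = g.month AND e.building = g.building
    JOIN (SELECT year, month, building, cost FROM t WHERE type = 'W') w
      ON e.year = w.year AND e.month = w.month AND e.building = w.building
    WHERE e.building = 19
    ORDER BY e.year, e.month
    """
).fetchall()

# LEFT JOIN each utility onto the set of common key columns instead.
outer = conn.execute(
    """
    SELECT k.year, k.month, k.building, e.cost AS eleccost, g.cost AS gascost, w.cost AS watercost
    FROM (SELECT DISTINCT year, month, building FROM t) k
    LEFT JOIN (SELECT year, month, building, cost FROM t WHERE type = 'E') e
      ON k.year = e.year AND k.month = e.month AND k.building = e.building
    LEFT JOIN (SELECT year, month, building, cost FROM t WHERE type = 'NG') g
      ON k.year = g.year AND k.month = g.month AND k.building = g.building
    LEFT JOIN (SELECT year, month, building, cost FROM t WHERE type = 'W') w
      ON k.year = w.year AND k.month = w.month AND k.building = w.building
    WHERE k.building = 19
    ORDER BY k.year, k.month
    """
).fetchall()

print(inner)
print(outer)
```

Both results have 6 columns. Only the LEFT JOIN version keeps February, with watercost as None.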
Footnote about Bill's conditional aggregation (I know I said I wouldn't talk about it but it might make more sense to now). If you have:
Type, Cost
E, 123
NG, 456
W, 789
And you do a
SELECT
CASE WHEN Type = 'E' THEN Cost END as CostE,
CASE WHEN Type = 'NG' THEN Cost END as CostNG,
CASE WHEN Type = 'W' THEN Cost END as CostW
...
It spreads the data out over more columns - the data has "gone from vertical to diagonal"
CostE, CostNG, CostW
123, NULL, NULL
NULL, 456, NULL
NULL, NULL, 789
But it's still too tall. If you then run a GROUP BY, which mixes rows up, and ask for e.g. just the MAX from each column, then all the NULLs disappear and the rows collapse, mixing together, into one. The NULLs go because aggregate functions such as MAX ignore NULLs, so as long as a column has a non-NULL value somewhere in the group, that value is what you get:
CostE, CostNG, CostW
123, 456, 789
The data has pivoted round from being vertical to being horizontal, which is another kind of data shaping. It was pulled wider and squashed flatter.
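The two steps of the footnote can be reproduced with Python's sqlite3 as a stand-in for MySQL. The table name bills is hypothetical; the rows are the footnote's own. The first query produces the "diagonal"; the second wraps the same CASE expressions in MAX. This table has no other columns, so the whole table is one group and no GROUP BY is needed. With real data you would GROUP BY year and month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (Type TEXT, Cost INTEGER)")
conn.executemany("INSERT INTO bills VALUES (?, ?)", [("E", 123), ("NG", 456), ("W", 789)])

# Step 1: CASE spreads each row's Cost into its own column (the "diagonal").
diagonal = conn.execute(
    """
    SELECT CASE WHEN Type = 'E'  THEN Cost END AS CostE,
           CASE WHEN Type = 'NG' THEN Cost END AS CostNG,
           CASE WHEN Type = 'W'  THEN Cost END AS CostW
    FROM bills
    ORDER BY rowid
    """
).fetchall()

# Step 2: aggregating ignores the NULLs, so MAX keeps the single non-NULL
# value in each column and the three rows collapse into one.
collapsed = conn.execute(
    """
    SELECT MAX(CASE WHEN Type = 'E'  THEN Cost END) AS CostE,
           MAX(CASE WHEN Type = 'NG' THEN Cost END) AS CostNG,
           MAX(CASE WHEN Type = 'W'  THEN Cost END) AS CostW
    FROM bills
    """
).fetchall()

print(diagonal)
print(collapsed)
```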

Related

How to select two MySQL rows and then compare a column and return an output

I have a table with a structure something like this:
Device | paid | time
abc | 1 | 2 days ago
abc | 0 | 1 day ago
abc | 0 | 5 mins ago
Is it possible to write a query that checks the paid column on all the rows where Device = abc and then outputs the most recent two rows that differ? Basically, something like an if statement saying: if row 1 = 1 and row 2 = 0, output that, but only if they're the most recent two rows that are different. In this case, for example, that's the first and second rows. The table is updated whenever a user changes from a free to a paid account, etc. It is also updated in other columns for other reasons, hence the duplicate 0s.
I know this would probably be done better by having another table altogether and updating that every time the user switches account type, but is there any way to make this work?
Thanks
Example: http://rextester.com/MABU7860 (this needs further testing on edge cases, but it seems to work).
SELECT A.*, B.*
FROM SQLFoo A
INNER JOIN SQLFoo B
ON A.Device = B.Device
AND A.mTime < B.mTime
WHERE A.Paid <> B.Paid
AND A.Device = 'abc'
ORDER BY A.mTime DESC, B.mTime ASC
LIMIT 1
We self-join on the device, requiring the time in one copy to be less than the time in the other, so a record is never paired with itself and each pair appears only one way round. We keep only pairs whose paid values differ, so every pair marks a change of status. Ordering by A's time descending puts first the most recent row that is followed by a different paid value. Ordering by B's time ascending then picks the row immediately after it, so LIMIT 1 returns the most recent change (the first and second rows in the example). Since we filter on a single device, we don't need to concern ourselves with other devices.
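Below is a minimal sketch of the self-join idea, using Python's sqlite3 as a stand-in for MySQL. The mTime values are hypothetical integers (larger means more recent), ordered like the question's "2 days ago / 1 day ago / 5 mins ago" rows. The ORDER BY used here, A.mTime DESC then B.mTime ASC, is what makes the pair returned the most recent change rather than just any two differing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# mTime as an integer timestamp (larger = more recent); values are hypothetical.
conn.execute("CREATE TABLE SQLFoo (Device TEXT, Paid INTEGER, mTime INTEGER)")
conn.executemany(
    "INSERT INTO SQLFoo VALUES (?, ?, ?)",
    [("abc", 1, 100), ("abc", 0, 200), ("abc", 0, 300)],
)

row = conn.execute(
    """
    SELECT A.Device, A.Paid, A.mTime, B.Paid, B.mTime
    FROM SQLFoo A
    INNER JOIN SQLFoo B
      ON A.Device = B.Device
     AND A.mTime < B.mTime              -- each pair once, earlier row on the A side
    WHERE A.Paid <> B.Paid              -- keep only pairs where the status changed
      AND A.Device = 'abc'
    ORDER BY A.mTime DESC, B.mTime ASC  -- latest change, then the row right after it
    LIMIT 1
    """
).fetchone()
print(row)
```

The result pairs the paid = 1 row at time 100 with the paid = 0 row at time 200, i.e. the question's first and second rows.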
Or using user variables:
http://rextester.com/TWVEVX7830
In other engines one might accomplish this task by performing the join as above, assigning a row number partitioned by the device, and then simply returning the rows whose row number is 1, i.e. the first discrepancy in that ordering for each device.
Use LIMIT to limit the number of records in MySQL:
http://www.mysqltutorial.org/mysql-limit.aspx
In your case, use LIMIT 2,
then put the 2 records you just selected into an array and compare their values. If they are different, print them.

Storing csv in MySQL field – bad idea?

I have two tables, one user table and an items table. In the user table, there is the field "items". The "items" table only consists of a unique id and an item_name.
Now each user can have multiple items. I wanted to avoid creating a third table that would connect the items with the user but rather have a field in the user_table that stores the item ids connected to the user in a "csv" field.
So any given user would have a field "items" that could have a value like "32,3,98,56".
It may be worth mentioning that the maximum number of items per user is rather limited (<5).
The question: Is this approach generally a bad idea compared to having a third table that contains user->item pairs?
Wouldn't a third table create quite an overhead when you want to find all items of a user (I would have to iterate through all elements returned by MySQL individually).
You don't want to store the values in comma-separated form.
Consider the case when you decide to join this column with some other table.
Suppose you have:
x | items
1 | 1, 2, 3
1 | 1, 4
2 | 1
and you want to find the distinct values for each x, i.e.:
x | items
1 | 1, 2, 3, 4
2 | 1
or maybe you want to check whether it contains 3,
or maybe you want to convert them into separate rows:
x | items
1 | 1
1 | 2
1 | 3
1 | 1
1 | 4
2 | 1
It will be a HUGE PAIN.
At least follow first normal form: have a separate row for each value.
Now, say you originally had this as your table:
x | item
1 | 1
1 | 2
1 | 3
1 | 1
1 | 4
2 | 1
You can easily convert it into CSV values:
select x, group_concat(item order by item) items
from t
group by x
Checking whether x = 1 has item 3 is easy:
select * from t where x = 1 and item = 3
whereas in the earlier (CSV) case you would need the horrible FIND_IN_SET:
select * from t where x = 1 and find_in_set(3, items);
If you think you can search CSV values with LIKE: first, a pattern like '%x%' can't use an index. Second, it will produce wrong results.
Say you want to check whether item ab is present and you use '%ab%': it will also return rows containing abc, abcd, abcde, and so on.
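Below is a minimal sketch of the LIKE pitfall, using Python's sqlite3 as a stand-in for MySQL with hypothetical data. FIND_IN_SET is MySQL-specific and isn't available in SQLite, so the sketch only contrasts a LIKE search on a CSV column with an exact lookup on the normalized table. The table names t_csv and t are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CSV design: one comma-separated string per x (hypothetical data).
conn.execute("CREATE TABLE t_csv (x INTEGER, items TEXT)")
conn.executemany("INSERT INTO t_csv VALUES (?, ?)", [(1, "abc,xy"), (2, "ab,cd")])

# Normalized design: one row per (x, item).
conn.execute("CREATE TABLE t (x INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "abc"), (1, "xy"), (2, "ab"), (2, "cd")],
)

# Substring search also matches 'abc', so x = 1 is a false positive.
like_hits = [r[0] for r in conn.execute(
    "SELECT x FROM t_csv WHERE items LIKE '%ab%' ORDER BY x")]

# Exact equality on the normalized table finds only x = 2; an index on item,
# if one exists, can serve this lookup.
exact_hits = [r[0] for r in conn.execute(
    "SELECT x FROM t WHERE item = 'ab' ORDER BY x")]

print(like_hits, exact_hits)
```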
If you have many users and items, I'd suggest creating a users table with primary key userid, an items table with primary key itemid, and lastly a mapping table user_item with userid and itemid columns.
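The three-table design also answers the worry about overhead: all of a user's items come back from one JOIN query, and reading those result rows is no more work than splitting a CSV string. Below is a minimal sketch using Python's sqlite3 as a stand-in for MySQL. The item ids 32, 3, 98 and 56 come from the question; the user and item names are hypothetical, and so is the helper items_for_user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (itemid INTEGER PRIMARY KEY, item_name TEXT);
    -- Mapping table: one row per (user, item) pair; the composite key prevents duplicates.
    CREATE TABLE user_item (
        userid INTEGER REFERENCES users(userid),
        itemid INTEGER REFERENCES items(itemid),
        PRIMARY KEY (userid, itemid)
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO items VALUES (3, 'lamp'), (32, 'desk'), (56, 'chair'), (98, 'mug');
    INSERT INTO user_item VALUES (1, 32), (1, 3), (1, 98), (1, 56), (2, 3);
    """
)

def items_for_user(userid):
    """Hypothetical helper: fetch all item names for one user with a single JOIN."""
    return [
        name
        for (name,) in conn.execute(
            """
            SELECT i.item_name
            FROM user_item ui
            JOIN items i ON i.itemid = ui.itemid
            WHERE ui.userid = ?
            ORDER BY i.item_name
            """,
            (userid,),
        )
    ]

print(items_for_user(1))
```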
If you know you'll only need to store and retrieve these values, and not do any operations on them such as joins, searches, DISTINCT, conversion to separate rows, etc., then maybe, just maybe, you can (I still wouldn't).
Storing complex data directly in a relational database is a nonstandard use of a relational database. Normally they are designed for normalized data.
There are extensions which vary according to the brand of software which may help. Or you can normalize your CSV file into properly designed table(s). It depends on lots of things. Talk to your enterprise data architect in this case.
Whether it's a bad idea depends on your business needs. I can't assess your business needs from way out here on the internet. Talk to your product manager in this case.

how to use group by on this table

Here is a screenshot of my table.
I am trying to remove all the rows whose PostAmt sums to 0 when grouped by sales_contract_nbr and name.
For example:
sales_contract_nbr 51101008103 will be removed when grouped by name and sales_contract_nbr, because -96.83 and 96.83 sum to 0.
Quite simple, right?
But beyond that, I want to remove a contract's rows in groups. I mean, contract 51101008195 sums to 533.87 when grouped, so it won't be removed (highlighted).
But I want to remove its rows in groups.
For example:
The two rows of contract number 51101008195 should be summed first (see the image below): the amounts -533.87 and 533.87 should be summed to get a total of 0. Only one record for the contract should be left.
Update
More description:
What I want to do is first group rows 1 and 2 (matching amounts, one positive and the other negative) and then group the others. If there were 4 rows with the same contract number, then rows 1 and 2 should be grouped, and rows 3 and 4 should be grouped if their absolute amounts are the same; if not, rows 3 and 4 don't get deleted.
I want to use GROUP BY to eliminate the rows whose total ends up being 0 and which have the same name or contract number.
I hope I have made the question clear. If not, please ask.
How can it be done?
What I am doing so far is:
SELECT sales_contract_nbr
,name
,SUM(PostAmt) PostAmt
FROM tblMasData
GROUP BY sales_contract_nbr, name
Thanks.
Here is what I came up with for now:
SELECT location
,sales_contract_nbr
,name
,SUM(absPostAmt * nbPostAmt) / ABS(SUM(nbPostAmt)) PostAmt
,ABS(SUM(nbPostAmt)) nbPostAmt
,SUM(absPostAmt * nbPostAmt) PostAmtTotal
FROM (
SELECT location
,sales_contract_nbr
,name
,PostAmt
,ABS(PostAmt) absPostAmt
,SUM(CASE WHEN PostAmt >= 0 THEN 1 ELSE -1 END) nbPostAmt
FROM tblMasData
GROUP BY location
,sales_contract_nbr
,name
,PostAmt
,ABS(PostAmt)
) t
GROUP BY location
,sales_contract_nbr
,name
,absPostAmt
HAVING SUM(absPostAmt * nbPostAmt) != 0
This doesn't totally answer your question: if you have 100 + 100 - 200, for instance, it won't hide all three rows. But it can be pretty messy to find combinations that sum to 0 among a bunch of rows.
Also, if some rows have the same amount, they will be grouped together. That's why I added a column counting those equal rows, and a column summing them up at the end.
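Below is a minimal sketch that runs the query above unchanged, using Python's sqlite3 as a stand-in for MySQL. The rows are hypothetical: the contract numbers and amounts follow the question's description, and the location "L1" and name "ACME" are made up. Contract 51101008103 cancels out completely. For 51101008195, one -533.87 cancels one 533.87, leaving a single 533.87.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblMasData (location TEXT, sales_contract_nbr TEXT, name TEXT, PostAmt REAL)"
)
conn.executemany(
    "INSERT INTO tblMasData VALUES (?, ?, ?, ?)",
    [
        ("L1", "51101008103", "ACME", -96.83),
        ("L1", "51101008103", "ACME", 96.83),
        ("L1", "51101008195", "ACME", -533.87),
        ("L1", "51101008195", "ACME", 533.87),
        ("L1", "51101008195", "ACME", 533.87),
    ],
)

rows = conn.execute(
    """
    SELECT location, sales_contract_nbr, name,
           SUM(absPostAmt * nbPostAmt) / ABS(SUM(nbPostAmt)) AS PostAmt,
           ABS(SUM(nbPostAmt)) AS nbPostAmt,
           SUM(absPostAmt * nbPostAmt) AS PostAmtTotal
    FROM (
        -- one row per distinct signed amount; nbPostAmt = +count or -count
        SELECT location, sales_contract_nbr, name, PostAmt,
               ABS(PostAmt) AS absPostAmt,
               SUM(CASE WHEN PostAmt >= 0 THEN 1 ELSE -1 END) AS nbPostAmt
        FROM tblMasData
        GROUP BY location, sales_contract_nbr, name, PostAmt, ABS(PostAmt)
    ) t
    GROUP BY location, sales_contract_nbr, name, absPostAmt
    HAVING SUM(absPostAmt * nbPostAmt) != 0
    """
).fetchall()

print(rows)
```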
This should at least allow you to deal with the data programmatically.
Let me know if this fills your needs, or if you need some improvement (which could involve some not so pretty SQL).

How can I return a row for each date, even when there is no data for that date (in which case the row should be filled with zeros)?

I hope I will be able to make my problem clear.
I have a table called tweets from which I want to extract information for each date in the daterange table. This table holds 142 dates, of which 102 have the property trading (a day on which the market was open) set to 1 (trading=1).
The query below extracts information from the tweets table for 20 companies (identified by sp100_id). The expected result set therefore contains 20 x 102 = 2,040 rows. However, I only get 1,987 rows back, because for some date-company combinations the tweets table holds no data. I need these "empty days" to be included in the result set. I thought I could accomplish this with COALESCE(X, 0), returning 0 when there is no data, but the result is the same: 1,987 rows.
Based on this information and the query below, does anybody know how I can get it to return 102 rows (1 row for each daterange._date with trading=1) for each sp100_id in the tweets table?
SELECT
sp100.sp100_id,
daterange._date,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`retweet_count`, 0)),0) AS `pos-retweet`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`user-quality`, 0)),0) AS `pos-quality`,
COALESCE(SUM(IF(tweets.classify1=2, tweets.`follow`, 0)),0) AS `pos-follow`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`retweet_count`, 0)),0) AS `neg-retweet`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`user-quality`, 0)),0) AS `neg-quality`,
COALESCE(SUM(IF(tweets.classify1=3, tweets.`follow`, 0)),0) AS `neg-follow`
FROM
sp100
CROSS JOIN
daterange
LEFT JOIN
tweets
ON tweets.nyse_date = daterange._date
AND tweets.sp100_id = sp100.sp100_id
WHERE sp100.sp100_id BETWEEN 1 AND 20 AND tweets.type != 1 AND daterange.trading = 1
GROUP BY
sp100.sp100_id, daterange._date
In any other case, I would provide you with a SQLFiddle, but it would be a lot of work to export a proper portion of the tables used to SQLFiddle while the solution might be clear to some real SQL guru anyway :-)
The problem comes from requiring that tweets.type != 1 in your WHERE clause.
For the dates that have no associated tweets, the outer join will result in all tweets columns, including tweets.type, being NULL. As documented under Working with NULL Values:
Because the result of any arithmetic comparison with NULL is also NULL, you cannot obtain any meaningful results from such comparisons.
In MySQL, 0 or NULL means false and anything else means true. The default truth value from a boolean operation is 1.
Therefore such records are filtered by your WHERE clause.
As Martin Smith commented, you can move this filter criterion into the ON clause of your outer join (so that the test is performed only against actual tweets records rather than simulated NULL ones).
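Alternatively, you could rewrite the filter to handle NULL, for example using the NULL-safe equality operator:
NOT (tweets.type <=> 1)
This is not quite the same as moving the test into the ON clause. A date whose only tweets have type 1 still produces a non-NULL row, which this WHERE clause removes. With the ON version, that date appears with zeros.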
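Below is a minimal sketch comparing the three filters, using Python's sqlite3 as a stand-in for MySQL. The schema is trimmed to a few hypothetical columns for one company. SQLite has no <=> operator; its IS NOT operator plays the same NULL-safe role here. The sample day 2012-01-04 has no tweets, and 2012-01-05 has only a type-1 tweet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE daterange (_date TEXT, trading INTEGER);
    CREATE TABLE tweets (nyse_date TEXT, type INTEGER, retweet_count INTEGER);
    INSERT INTO daterange VALUES ('2012-01-03', 1), ('2012-01-04', 1), ('2012-01-05', 1);
    INSERT INTO tweets VALUES ('2012-01-03', 0, 5), ('2012-01-05', 1, 7);
    """
)

BASE = """
    SELECT daterange._date, COALESCE(SUM(tweets.retweet_count), 0) AS retweets
    FROM daterange
    LEFT JOIN tweets
      ON tweets.nyse_date = daterange._date {on_filter}
    WHERE daterange.trading = 1 {where_filter}
    GROUP BY daterange._date
    ORDER BY daterange._date
"""

def run(on_filter="", where_filter=""):
    """Run BASE with an extra condition spliced into the ON or WHERE clause."""
    return conn.execute(BASE.format(on_filter=on_filter, where_filter=where_filter)).fetchall()

# 1. Filter in WHERE (as in the question): NULL != 1 is NULL, so the empty day is dropped.
in_where = run(where_filter="AND tweets.type != 1")
# 2. Filter in ON: only real tweets are tested, so every trading day survives.
in_on = run(on_filter="AND tweets.type != 1")
# 3. NULL-safe filter in WHERE: keeps the empty day, but still drops the type-1-only day.
null_safe = run(where_filter="AND tweets.type IS NOT 1")

print(in_where, in_on, null_safe, sep="\n")
```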
As an aside, I usually don't bother with a daterange table; instead I omit dates with no data from the result set and handle missing dates in my application code.
You need a calendar table filled with every day. I know it might sound silly, but this solves a lot of problems. The same approach works with integers (a numbers table).
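Below is a minimal sketch of a calendar table, using Python's sqlite3 as a stand-in for MySQL. The table and column names (calendar, events, day, amount) are hypothetical. Here the calendar is generated with a recursive CTE and SQLite's date() function. MySQL 8.0 also supports WITH RECURSIVE, but would use DATE_ADD instead of date(). Days without data survive because the LEFT JOIN starts from the calendar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Calendar table: one row per day, generated with a recursive CTE.
    CREATE TABLE calendar (day TEXT PRIMARY KEY);
    WITH RECURSIVE d(day) AS (
        SELECT '2012-01-01'
        UNION ALL
        SELECT date(day, '+1 day') FROM d WHERE day < '2012-01-05'
    )
    INSERT INTO calendar (day) SELECT day FROM d;

    CREATE TABLE events (day TEXT, amount INTEGER);
    INSERT INTO events VALUES ('2012-01-02', 3), ('2012-01-02', 4), ('2012-01-04', 1);
    """
)

# LEFT JOIN from the calendar so days without events still appear, with 0.
rows = conn.execute(
    """
    SELECT c.day, COALESCE(SUM(e.amount), 0) AS total
    FROM calendar c
    LEFT JOIN events e ON e.day = c.day
    GROUP BY c.day
    ORDER BY c.day
    """
).fetchall()

for row in rows:
    print(row)
```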

count rows where date is equal but separated by name

I think it will be easiest to start with the table I have and the result I am aiming for.
Name | Date
A | 03/01/2012
A | 03/01/2012
B | 02/01/2012
A | 02/01/2012
B | 02/01/2012
A | 02/01/2012
B | 01/01/2012
B | 01/01/2012
A | 01/01/2012
I want the result of my query to be:
Name | 01/01/2012 | 02/01/2012 | 03/01/2012
A | 1 | 2 | 2
B | 2 | 2 | 0
So basically I want to count the number of rows that have the same date, but for each individual name. A simple GROUP BY on the date won't do, because it would merge the names together. Then I want to use PHP to output a table that shows the counts for each individual date.
I've seen answers suggest something like this:
SELECT
NAME,
SUM(CASE WHEN GRADE = 1 THEN 1 ELSE 0 END) AS GRADE1,
SUM(CASE WHEN GRADE = 2 THEN 1 ELSE 0 END) AS GRADE2,
SUM(CASE WHEN GRADE = 3 THEN 1 ELSE 0 END) AS GRADE3
FROM Rodzaj
GROUP BY NAME
so I imagine there's a way to tweak that, but I was wondering whether there is another way, or whether that's the most efficient.
I was thinking that perhaps the while loop could output just one specific name and date each time, along with the count. The first result would be A,01/01/2012,1, then A,02/01/2012,2, then A,03/01/2012,2, then B,01/01/2012,2, etc. That might be doable through a different technique, but I'm not sure whether something like that is possible or whether it would be efficient.
So I'm basically looking to see if anyone has any ideas that are a bit outside the box for this and how they would compare.
I hope I explained everything well enough and thanks in advance for any help.
You have to include both columns in your GROUP BY (and select the date too, so you can tell the rows apart):
SELECT name, date, COUNT(*) AS count
FROM your_table
GROUP BY name, date
This gets the counts for each name/date combination in row format. Since you also wanted a 0 count when a name has no rows on a certain date, you can use:
SELECT a.name,
b.date,
COUNT(c.name) AS date_count
FROM (SELECT DISTINCT name FROM your_table) a
CROSS JOIN (SELECT DISTINCT date FROM your_table) b
LEFT JOIN your_table c ON a.name = c.name AND
b.date = c.date
GROUP BY a.name,
b.date
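Both queries can be checked against the question's own data with a minimal sketch, using Python's sqlite3 as a stand-in for MySQL. The table name your_table is kept from the answer. The first query only returns name/date pairs that occur. The CROSS JOIN version fills in B on 03/01/2012 with 0, because COUNT(c.name) skips the NULLs produced by the LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("A", "03/01/2012"), ("A", "03/01/2012"), ("B", "02/01/2012"),
        ("A", "02/01/2012"), ("B", "02/01/2012"), ("A", "02/01/2012"),
        ("B", "01/01/2012"), ("B", "01/01/2012"), ("A", "01/01/2012"),
    ],
)

# Row format: one row per (name, date) that actually occurs.
counts = conn.execute(
    "SELECT name, date, COUNT(*) AS count FROM your_table "
    "GROUP BY name, date ORDER BY name, date"
).fetchall()

# Every (name, date) pair, with 0 where a name has no rows on a date.
filled = conn.execute(
    """
    SELECT a.name, b.date, COUNT(c.name) AS date_count
    FROM (SELECT DISTINCT name FROM your_table) a
    CROSS JOIN (SELECT DISTINCT date FROM your_table) b
    LEFT JOIN your_table c ON a.name = c.name AND b.date = c.date
    GROUP BY a.name, b.date
    ORDER BY a.name, b.date
    """
).fetchall()

print(counts)
print(filled)
```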
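You're asking for a "pivot". Basically, it is what it is. The real problem with a pivot is that the column names must adapt to the data, which a single static SQL query can't do; the query text itself has to be generated (in your application code, or with dynamic SQL via PREPARE/EXECUTE in MySQL).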
Here's how you do it:
SELECT
Name,
SUM(`Date` = '01/01/2012') AS `01/01/2012`,
SUM(`Date` = '02/01/2012') AS `02/01/2012`,
SUM(`Date` = '03/01/2012') AS `03/01/2012`
FROM mytable
GROUP BY Name
Note the neat way you can SUM() a condition in MySQL: because true is 1 and false is 0 in MySQL, summing a condition is equivalent to counting the number of times it's true.
It is not more efficient to use an inner group by first.
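Because the pivot's column list depends on the data, the query has to be built in code, which is what the asker later did with a loop in PHP. Below is a minimal sketch of that idea, using Python's sqlite3 as a stand-in for MySQL and the question's data. The helper quote_ident is hypothetical. It quotes identifiers with double quotes, which is SQLite's convention; in MySQL you would use backticks. The date values themselves are passed as bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (Name TEXT, "Date" TEXT)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("A", "03/01/2012"), ("A", "03/01/2012"), ("B", "02/01/2012"),
        ("A", "02/01/2012"), ("B", "02/01/2012"), ("A", "02/01/2012"),
        ("B", "01/01/2012"), ("B", "01/01/2012"), ("A", "01/01/2012"),
    ],
)

def quote_ident(s):
    """Hypothetical helper: quote a value for use as an SQL identifier (SQLite style)."""
    return '"' + s.replace('"', '""') + '"'

# One SUM(condition) column per distinct date; a comparison yields 0 or 1.
dates = [d for (d,) in conn.execute('SELECT DISTINCT "Date" FROM mytable ORDER BY "Date"')]
columns = ",\n       ".join(f'SUM("Date" = ?) AS {quote_ident(d)}' for d in dates)
sql = f"SELECT Name,\n       {columns}\nFROM mytable\nGROUP BY Name\nORDER BY Name"

cur = conn.execute(sql, dates)  # bind one date per placeholder, in column order
header = [col[0] for col in cur.description]
rows = cur.fetchall()

print(header)
for row in rows:
    print(row)
```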
Just in case anyone is interested in what was the best method:
Zane's second suggestion was the slowest: I loaded a third of the data I used for the other two, and it still took quite a while. Perhaps it would be more efficient on smaller tables. Although I'm not working with a huge table, roughly 28,000 rows was enough to cause significant lag, with the BETWEEN clause reducing the result to about 4,000 rows.
Bohemian's answer required the least code: I added a loop to generate all the CASE statements and it worked with relative ease. The benefit of this method is its simplicity. Apart from the loop that builds the cases, the results come back without any PHP tricks, just a simple foreach to get all the columns. Recommended for those not confident with PHP.
However, I found Zane's first suggestion the fastest, and despite needing extra PHP code, I will be sticking with this method. Its disadvantage is that it only returns dates that actually have data, so building a table with all the dates is a bit more complicated. I created a variable, reset on each table row, that tracks which date the current table column should hold. When the query result matches that date, it echoes the value; otherwise it loops, echoing table cells containing 0 until the dates match. It also has to check whether the 'Name' value is still the same; if not, it fills any remaining cells of that row with 0 and switches to the next row. If anyone is interested in seeing the code, you can message me.
Results of the two methods over 3 months of data (a column for each day, so roughly 90 CASE statements), ~12,000 rows out of 28,000:
Bohemian's pivot: ~0.158s (highest seen ~0.36s)
Zane's double GROUP BY: ~0.086s (highest seen ~0.15s)