Count changes of a value over time? - mysql

Again I am stuck with counting something in MySQL. The database structure is far from what SOers would call optimal, but I have no influence over it and have to live with it. That's probably one of the reasons why I need help again to get some information out of it :)
Assume I have:
some_id (not the PK of the table, not unique),
year, month (no date fields, just two integer fields),
some_flag (a character that is either A or B).
Now I'd like to know how often some_flag has changed (in a given time span). The time span is not important for a first approach; I just need to know how many changes happened. Note that changes can only happen monthly. My query:
SELECT some_id,year,some_flag FROM mytable
WHERE some_flag = "A" OR some_flag = "B"
AND year > 2005
GROUP BY some_id,some_flag
HAVING COUNT(DISTINCT some_flag) > 1
returns an empty result set. What's wrong with it? I am sure there are years in which the flag changes over months...
Isn't something like
select .... , sum(case when month=month-1 and some_flag != some_flag then 1 else 0 end) as changecount
possible ?

Try this:
SELECT some_flag, COUNT(some_id) FROM mytable
WHERE (some_flag = "A" OR some_flag = "B")
AND year > 2005
GROUP BY some_flag
HAVING COUNT(some_id) > 1
-Edit-
If you want to see a month over month count, try this:
(Note: it will only show months where it has changed)
SELECT some_flag, year, month, COUNT(some_id) FROM mytable
WHERE (some_flag = "A" OR some_flag = "B")
AND year > 2005
GROUP BY some_flag, month, year
HAVING COUNT(some_id) > 1

It looks to me like you need to do this in two parts.
First, execute this SQL query to get all of the values for some_id, some_flag:
SELECT some_id, some_flag, year, month
FROM ...
WHERE year > 2005
ORDER BY some_id, year, month
Then, run the output through a match / merge process to detect when some_flag changes for a given some_id. Save the year and month that some_flag changes for reporting in the match / merge process.
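The match / merge step could be sketched in Python roughly as follows. This is only a minimal sketch: it assumes the query output is available as (some_id, some_flag, year, month) tuples, and the function name count_flag_changes and the sample rows are hypothetical. It sorts the rows chronologically within each some_id itself, so it does not rely on the ORDER BY of the query.

```python
from itertools import groupby
from operator import itemgetter


def count_flag_changes(rows):
    """Count how often some_flag changes per some_id.

    rows: iterable of (some_id, some_flag, year, month) tuples.
    Returns a dict mapping some_id to (change_count, change_months), where
    change_months lists the (year, month) in which a new flag value appears.
    """
    # Sort chronologically within each some_id so that consecutive rows
    # are consecutive observations of the same id.
    ordered = sorted(rows, key=lambda r: (r[0], r[2], r[3]))
    result = {}
    for some_id, group in groupby(ordered, key=itemgetter(0)):
        previous_flag = None
        changed_in = []
        for _, flag, year, month in group:
            if previous_flag is not None and flag != previous_flag:
                changed_in.append((year, month))
            previous_flag = flag
        result[some_id] = (len(changed_in), changed_in)
    return result


# Hypothetical sample data: id 1 flips A -> B -> A, id 2 never changes.
sample_rows = [
    (1, "A", 2006, 1),
    (1, "B", 2006, 2),
    (1, "B", 2006, 3),
    (1, "A", 2007, 1),
    (2, "A", 2006, 1),
    (2, "A", 2006, 2),
]
changes = count_flag_changes(sample_rows)
```

Here changes[1] holds the number of changes for some_id 1 together with the months in which they occurred, which is the information the reporting step needs.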

Grouping by some_flag makes COUNT(DISTINCT some_flag) always equal to 1, so the HAVING clause never matches.
Try grouping only by some_id. I hope this helps.
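A small sketch of that suggestion, run against an in-memory SQLite database as a stand-in for MySQL (the sample rows are made up). Note that this only tells you which some_id values had both flags in the period, not how many times the flag changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (some_id INTEGER, year INTEGER, month INTEGER, some_flag TEXT)"
)
# Hypothetical rows: id 1 changes after 2005, id 2 never changes,
# id 3 changes only before the year filter applies.
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    (1, 2006, 1, "A"), (1, 2006, 2, "B"),
    (2, 2006, 1, "A"), (2, 2006, 2, "A"),
    (3, 2004, 1, "A"), (3, 2004, 2, "B"),
])

# Group by some_id only, so COUNT(DISTINCT some_flag) can exceed 1.
ids_with_both_flags = [row[0] for row in conn.execute("""
    SELECT some_id
    FROM mytable
    WHERE some_flag IN ('A', 'B')
      AND year > 2005
    GROUP BY some_id
    HAVING COUNT(DISTINCT some_flag) > 1
    ORDER BY some_id
""")]
```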

Related

Group By or Case Logic Issue (Similar to SUMIFS in Excel)

I have a temp table and I'm trying to sum data but can't seem to get the logic right for it. The table contains customer level data and now I'm trying to aggregate it by fiscal year, quarter, and product description. I'm trying to sum by going back 1 year and using the same quarter to sum the # of units sold.
I can do this in Excel, but the table is too large for that. This is what the formula in Excel looks like:
=SUMIFS(Units,FiscalYearQuarter >= Concat(FiscalYear -1 & FiscalQuarter, FiscalYearQuarter <= Concat(FiscalYear, FiscalQuarter)
Here's an example of the table:
Here's what the results should look like (this does not include productdescription, but I will want to add that in):
Every time I try to group by or do a Sum(Case When...) I keep getting the results only by the fiscal year/quarter instead of the sum of historical for 1 year.
A simple GROUP BY will work (although I don't quite understand your Excel logic with concatenation):
SELECT t1.FiscalYear, t1.FiscalQuater, sum(t2.UnitsPurchased)
FROM `table` t1
LEFT JOIN `table` t2
ON ( t1.FiscalYear = t2.FiscalYear + 1
AND t1.FiscalQuater < t2.FiscalQuater)
OR ( t1.FiscalYear = t2.FiscalYear
AND t1.FiscalQuater >= t2.FiscalQuater)
GROUP BY t1.FiscalYear, t1.FiscalQuater
EDIT 1
modified query based on author's feedback
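To try the join condition on a small data set, here is a sketch using an in-memory SQLite database in place of MySQL. The table name sales and the sample rows are hypothetical; the column names follow the query above. One deliberate difference: the left side takes the distinct (year, quarter) pairs, so that several customer-level rows in the same quarter do not multiply the sum. That is a choice of this sketch, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (FiscalYear INTEGER, FiscalQuater INTEGER, UnitsPurchased INTEGER)"
)
# Hypothetical customer-level rows; 2014 Q1 has two customers.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (2013, 1, 10), (2013, 2, 20), (2013, 3, 30), (2013, 4, 40),
    (2014, 1, 5), (2014, 1, 5), (2014, 2, 7),
])

# For each quarter, sum the units of that quarter and the three before it.
trailing_units = {
    (year, quarter): units
    for year, quarter, units in conn.execute("""
        SELECT t1.FiscalYear, t1.FiscalQuater, SUM(t2.UnitsPurchased)
        FROM (SELECT DISTINCT FiscalYear, FiscalQuater FROM sales) AS t1
        LEFT JOIN sales AS t2
          ON (t1.FiscalYear = t2.FiscalYear + 1
              AND t1.FiscalQuater < t2.FiscalQuater)
          OR (t1.FiscalYear = t2.FiscalYear
              AND t1.FiscalQuater >= t2.FiscalQuater)
        GROUP BY t1.FiscalYear, t1.FiscalQuater
        ORDER BY t1.FiscalYear, t1.FiscalQuater
    """)
}
```

For 2014 Q2, for example, the join picks up 2013 Q3 and Q4 plus 2014 Q1 and Q2.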

SQL query to select values grouped by hour(col) and weekday(row) based on the timestamp

I have searched SO for this question and found slightly similar posts but was unable to adapt to my needs.
I have a database with server requests going back a long time, each one with a timestamp, and I'm trying to come up with a query that allows me to create a heat matrix chart (CCC HeatGrid).
The sql query result must represent the server load grouped by each hour of each weekday.
Like this: Example table
I just need the SQL query; I know how to create the chart.
Thank you,
Those look like "counts" of rows.
One of the issues is "sparse" data, we can address that later.
To get the day of the week ('Sunday','Monday',etc.) returned, you can use the DATE_FORMAT function. To get those ordered, we need to include an integer value 0 through 6, or 1 through 7. We can use an ORDER BY clause on that expression to get the rows returned in the order we want.
To get the "hour" across the top, we can use expressions in the SELECT list that conditionally increment the count.
Assuming your timestamp column is named ts, and assuming you want to pull all rows from the year 2014, we start with something like this:
SELECT DAYOFWEEK(t.ts)
, DATE_FORMAT(t.ts,'%W')
FROM mytable t
WHERE t.ts >= '2014-01-01'
AND t.ts < '2015-01-01'
GROUP BY DAYOFWEEK(t.ts)
ORDER BY DAYOFWEEK(t.ts)
(WEEKDAY and DAYOFWEEK are very similar. In MySQL, DAYOFWEEK returns 1 for Sunday through 7 for Saturday, while WEEKDAY returns 0 for Monday through 6 for Sunday. We want the one that returns the lowest value for Sunday and the highest value for Saturday, so DAYOFWEEK is the one to use.)
The "trick" now is the columns across the top.
We can extract the "hour" from timestamp using the DATE_FORMAT() function, the HOUR() function, or an EXTRACT() function... take your pick.
The expressions we want are going to return a 1 if the timestamp is in the specified hour, and a zero otherwise. Then, we can use a SUM() aggregate to count up the 1s. In MySQL, a boolean expression returns a value of 1 for TRUE and 0 for FALSE.
, SUM( HOUR(t.ts)=0 ) AS `h0`
, SUM( HOUR(t.ts)=1 ) AS `h1`
, SUM( HOUR(t.ts)=2 ) AS `h2`
, '...'
, SUM( HOUR(t.ts)=22 ) AS `h22`
, SUM( HOUR(t.ts)=23 ) AS `h23`
A boolean expression can also evaluate to NULL, but since we have a predicate (i.e. a condition in the WHERE clause) that guarantees ts can't be NULL, that won't be an issue.
The other issue we can encounter (as I mentioned earlier) is "sparse" data. To illustrate that, consider what happens (with our query) if there are no rows that have a ts value for a Monday. What happens is that we don't get a row in the resultset for Monday. If it does happen that a row is "missing" for Monday (or any day of the week), we do know that all of the hourly counts across the "missing" Monday row would all be zero.
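Putting the pieces together, here is a sketch that builds the 24 hour columns and then fills in any "missing" weekday with zeros on the client side. It runs against an in-memory SQLite database as a stand-in for MySQL, so it uses strftime('%w', ...) (0 for Sunday through 6 for Saturday) and strftime('%H', ...) where the MySQL query would use DAYOFWEEK() and HOUR(). The sample timestamps and the name heat_grid are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ts TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)", [
    ("2014-01-05 00:15:00",),  # Sunday, hour 0
    ("2014-01-05 00:45:00",),  # Sunday, hour 0
    ("2014-01-05 13:00:00",),  # Sunday, hour 13
    ("2014-01-07 23:59:59",),  # Tuesday, hour 23
])

# One conditional SUM per hour; each comparison yields 1 or 0.
hour_columns = ",\n       ".join(
    f"SUM(CAST(strftime('%H', ts) AS INTEGER) = {h}) AS h{h}" for h in range(24)
)
query = f"""
SELECT CAST(strftime('%w', ts) AS INTEGER) AS dow,
       {hour_columns}
FROM mytable
WHERE ts >= '2014-01-01' AND ts < '2015-01-01'
GROUP BY dow
ORDER BY dow
"""
counts_by_day = {row[0]: list(row[1:]) for row in conn.execute(query)}

# Handle sparse data: a weekday without any rows gets 24 zero counts.
day_names = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]
heat_grid = [(day_names[d], counts_by_day.get(d, [0] * 24)) for d in range(7)]
```

Filling the gaps in application code is just one option; the same effect could be had in SQL by outer-joining from a seven-row list of weekdays.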

How to select based on different column data

I want to perform a different SELECT depending on the column data. For example, I have a table (http://sqlfiddle.com/#!2/093a2) where I want to compare start_date and end_date only if use_schedule = 1; otherwise I want to select all the data (a different select). In other words, the start and end dates should be compared only when use_schedule is 1, and when use_schedule is 0 the rest of the data should simply be selected.
An example may be something like
select id, name from table
where use_schedule = 0
else
select id, name, start_date from table
where use_schedule = 1 and current_date >= start_date
Basically I have the data where schedule is enabled only then look into start and end date. Because if schedule is not enabled there is no point of looking into the dates. Just select the data. With schedule enabled, I want to be more selective in selecting the scheduled data.
I am trying to figure out if MySQL CASE or IF statements would work but not able to do so. How can I run this select?
Thanks.
You can use UNION to mix and match the results of 2 different SQL queries into one result set:
select id, name, null from table
where use_schedule = 0
union
select id, name, start_date from table
where use_schedule = 1 and current_date >= start_date
Note that both queries have to have compatible output fields (same number and type) for this to work. UNION automatically removes duplicate rows; if you want to keep duplicates, use UNION ALL instead.
In this specific case a more extensive WHERE-clause would also work obviously:
where use_schedule = 0 or (use_schedule = 1 and current_date >= start_date)
But given the question I'm assuming your real case is a bit more complex.
Documentation over at MySQL site.
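As a quick check that the UNION form and the single WHERE clause pick the same rows, here is a sketch against an in-memory SQLite database standing in for MySQL. The table name campaigns, the sample rows, and the :today parameter (used instead of current_date so the result is reproducible) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaigns (id INTEGER, name TEXT, use_schedule INTEGER, start_date TEXT)"
)
conn.executemany("INSERT INTO campaigns VALUES (?, ?, ?, ?)", [
    (1, "always on", 0, None),
    (2, "started", 1, "2020-01-01"),
    (3, "future", 1, "2999-01-01"),
])
params = {"today": "2024-06-01"}

# Two selects with compatible columns, combined into one result set.
union_rows = conn.execute("""
    SELECT id, name, NULL AS start_date FROM campaigns
    WHERE use_schedule = 0
    UNION ALL
    SELECT id, name, start_date FROM campaigns
    WHERE use_schedule = 1 AND :today >= start_date
    ORDER BY id
""", params).fetchall()

# The same row selection expressed as a single WHERE clause.
where_rows = conn.execute("""
    SELECT id, name FROM campaigns
    WHERE use_schedule = 0
       OR (use_schedule = 1 AND :today >= start_date)
    ORDER BY id
""", params).fetchall()
```

The two branches of the UNION cannot overlap here (use_schedule is either 0 or 1), so UNION ALL is safe and avoids the duplicate-removal step.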
Use CASE in this case:
SELECT id, name,
(CASE
WHEN start_date >= DATE(NOW()) AND use_schedule = 1
THEN start_date
ELSE NULL
END) AS cols FROM campaigns
This returns every row; the cols column holds start_date only for rows where use_schedule = 1 and start_date is today or later, and NULL otherwise.
I used DATE(NOW()) to strip the time portion, which you are not interested in.

How to use where clause in separate datetime(year,month,day)

http://upic.me/i/hq/capture.png
http://upic.me/i/3g/capture.png
I have a table that splits the datetime into separate fields (year, month, day), and these fields are indexed.
I would like to use a WHERE clause on a date range, e.g. between 2010/06/21 and 2011/05/15.
I tried:
where concat_ws('-',year,month,day) between '2010/06/21' and '2011/05/15'
It works, because the concat function combines the fields like an ordinary datetime,
but it does not use the index and the query is slow. The table has 3 million records.
To use the index, I tried this query:
where
year = '2011'
and month between 05 and 06
and day between 21 and 15
It almost works, except for the last line
day between 21 and 15
I can't use this condition.
I have tried to solve this problem but can't find a solution, and I can't change the table structure.
I'm looking for an answer.
Thank you.
Now I can use an OR operation in the query; thanks for your answers.
In another case, to find 2009/08/20 to 2011/04/15, the query gets longer and confusing. Does anyone have an idea?
If it's a date type (such as DATE), you can just use the where/between clause directly. I would consider switching to that; it's considerably faster than a varchar with a custom date format.
WHERE yourdate BETWEEN "2011-05-01" AND "2011-06-15"
Although checking ranges on the separate fields may work within a single month, you will find that queries spanning several months have some margin of error because, if you think about it, you may be selecting more rows than you want. Using a DATE column will fix the performance and usability issues that arise from storing the date in a custom format.
Here are the two queries to convert your times around if you're interested:
ALTER TABLE `yourtable` ADD `newdate` DATE NOT NULL;
UPDATE `yourtable` SET `newdate` = STR_TO_DATE(`olddate`, '%Y/%m/%d');
Just change "yourtable", "newdate", and "olddate" to your table's name, the new date column name, and the old datestamp column names respectively.
If you can't change the table structure, you could use something like the following:
WHERE year = '2011'
AND ((month = '05' AND day >= 21) OR (month = '06' AND day <= '15'))
(At least, I think that query does what you want in your specific case. But for e.g. a longer span of time, you'd have to think about the query again, and I suspect queries like this could become a pain to maintain)
UPDATE for the updated requirement
The principle remains the same, only the query becomes more complex. For the range of 2009/08/20 to 2011/04/15 it might look like this:
WHERE year = '2009' AND (month = '08' AND day >= '20' OR month BETWEEN '09' AND '12')
OR year = '2010'
OR year = '2011' AND (month BETWEEN '01' AND '03' OR month = '04' AND day <= '15')
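Since expanded predicates like this are easy to get wrong, it can help to check them against a simpler formulation on test data. The sketch below does that in an in-memory SQLite database standing in for MySQL: it compares the expanded WHERE clause (written with integer literals) against a single numeric key, year * 10000 + month * 100 + day. The table name events and its contents are hypothetical. The numeric key is only used as a cross-check here; because it wraps the columns in an expression, it would typically not benefit from the indexes on the separate columns.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (year INTEGER, month INTEGER, day INTEGER)")

# One hypothetical row per calendar day from 2009-01-01 through 2011-12-31.
current = date(2009, 1, 1)
while current <= date(2011, 12, 31):
    conn.execute("INSERT INTO events VALUES (?, ?, ?)",
                 (current.year, current.month, current.day))
    current += timedelta(days=1)

# Expanded predicate for 2009/08/20 through 2011/04/15.
expanded_rows = conn.execute("""
    SELECT year, month, day FROM events
    WHERE (year = 2009 AND ((month = 8 AND day >= 20) OR month BETWEEN 9 AND 12))
       OR year = 2010
       OR (year = 2011 AND (month BETWEEN 1 AND 3 OR (month = 4 AND day <= 15)))
    ORDER BY year, month, day
""").fetchall()

# Cross-check: the same range expressed as one sortable integer key.
keyed_rows = conn.execute("""
    SELECT year, month, day FROM events
    WHERE year * 10000 + month * 100 + day BETWEEN 20090820 AND 20110415
    ORDER BY year, month, day
""").fetchall()

expected_days = (date(2011, 4, 15) - date(2009, 8, 20)).days + 1
```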
where year = 2011
and ((month = 5 and day > 20) or (month = 6 and day < 16))
You were separating days and months, whereas you must keep them together;
the parentheses must be set ...
Mike
It is important that you use OR, otherwise it is nonsense

MySQL Query to perform calculation and display data based on 2 different date criteria

Good morning,
I am trying to combine two queries into one so that the result array can be populated into a single table. Data is pulled from a single table, and math calculations must take place for one of the columns. Here is what I have currently:
SELECT
laboratory,
SUM(total_produced_week) AS total_produced_sum,
SUM(total_produced_over14) AS total_over14_sum,
100*(SUM(total_produced_over14)/sum(total_produced_week)) as divided_sum,
max(case when metrics_date =maxdate then total_backlog else null end) as total_backlog,
max(case when metrics_date =maxdate then days_workable else null end) as days_workable,
max(case when metrics_date =maxdate then workable_backlog else null end) as workable_backlog,
max(case when metrics_date =maxdate then deferred_over_30_days else null end) as deferred_over_30_days
FROM
test,
(
select max(metrics_date) as maxdate
from metrics
) as x
WHERE
YEAR(metrics_date) = YEAR(CURDATE())
AND MONTH(metrics_date) = MONTH(CURDATE())
GROUP BY
laboratory
ORDER BY 1 ASC
Here's the breakdown:
For each laboratory site, I need:
1) Perform a MONTH TO DATE (current month only) sum, division and multiply by 100 for each site to obtain percentage.
2) Display other columns (total_backlog, days_workable, workable_backlog, deferred_over_30_days) for the most recent update date (metrics_date) only.
The above query performs #1 just fine - I get a total_produced_sum, total_over14_sum and divided_sum column with correct math.
The other columns mentioned in #2, however, return NULL. Data is available in the table for the most recently updated date, so the columns should be reporting that data. It seems like I have a problem with the CASE, but I'm not very familiar with the function so it could be incorrect.
I am running MySQL 5.0.45
Thanks in advance for any suggestions!
Chris
P.S. Here are the two original queries that work correctly. These need to be combined so that the full resultset can be output to a table, organized by laboratory.
Query 1:
SELECT SUM(total_produced_week) AS total_produced_sum,
SUM(total_produced_over14) AS total_over14_sum
FROM test
WHERE laboratory = 'Site1'
AND YEAR(metrics_date) = YEAR(CURDATE()) AND MONTH(metrics_date) = MONTH(CURDATE())
Query 2:
SELECT laboratory, total_backlog, days_workable, workable_backlog, deferred_over_30_days,
items_over_10_days, open_ncs, total_produced_week, total_produced_over14
FROM metrics
WHERE metrics_date = (select MAX(metrics_date) FROM metrics)
ORDER BY laboratory ASC
Operator Error.
I created a copy of the original table (named "metrics") to a table named "test". I then modified the metrics_date in the new "test" table to include data from January 2011 (for the month-to-date). While the first part of the query that performs the math was using the "test" table (and working properly), the second half that pulls the most-recently-updated data was using the original "metrics" table, which did not have any rows with a metrics_date this month.
When I changed the query to use "test" for both parts of the query, everything works as expected. And now I feel really dumb.
Thanks anyway, guys!
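For anyone landing here with the same symptom, below is a sketch of the combined query with both parts reading from the same table. It runs against an in-memory SQLite database as a stand-in for MySQL, so a few things are assumptions of the sketch: strftime('%Y-%m', ...) replaces the YEAR()/MONTH() comparison, a :today parameter replaces CURDATE() to keep the result reproducible, 100.0 forces non-integer division, only total_backlog is shown for the "latest date" columns (the others follow the same pattern), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test (
        laboratory TEXT, metrics_date TEXT,
        total_produced_week INTEGER, total_produced_over14 INTEGER,
        total_backlog INTEGER
    )
""")
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", [
    ("Site1", "2011-01-07", 100, 10, 50),
    ("Site1", "2011-01-14", 100, 30, 40),
    ("Site1", "2010-12-31", 999, 999, 999),  # previous month, filtered out
    ("Site2", "2011-01-14", 200, 20, 70),
])

# Month-to-date sums plus the backlog from the most recent metrics_date,
# with the MAX(metrics_date) subquery reading the SAME table as the sums.
report = conn.execute("""
    SELECT laboratory,
           SUM(total_produced_week) AS total_produced_sum,
           SUM(total_produced_over14) AS total_over14_sum,
           100.0 * SUM(total_produced_over14) / SUM(total_produced_week) AS divided_sum,
           MAX(CASE WHEN metrics_date = x.maxdate THEN total_backlog END) AS total_backlog
    FROM test
    CROSS JOIN (SELECT MAX(metrics_date) AS maxdate FROM test) AS x
    WHERE strftime('%Y-%m', metrics_date) = strftime('%Y-%m', :today)
    GROUP BY laboratory
    ORDER BY laboratory
""", {"today": "2011-01-20"}).fetchall()
```

If the subquery pointed at a different table whose latest metrics_date does not occur in the filtered rows, every CASE would fall through to NULL, which is the behaviour described in the question.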