I need to query a table and gather counts of rows with and without a value in a column.
What is the count of records that have a value in 'src', and the count of records that don't?
Problem
The results are not broken down by day: every row has the same values instead of the counts for that row's day.
Expected Results
DAY, CONTAINS VALUE, DOESN'T CONTAIN VALUE
Query
SELECT
DATE_FORMAT(edate,'%Y-%m-%d') as day,
(SELECT
COUNT(id) FROM entries WHERE src='a string' and color = 'red') with_value,
(SELECT
COUNT(id) FROM entries WHERE src='' and color = 'red') without_value
FROM entries
GROUP BY day
ORDER BY day DESC
You can do it without subqueries using this technique:
SELECT
DATE_FORMAT(edate,'%Y-%m-%d') as day,
SUM(src = 'a string') as with_value,
SUM(src = '') as without_value
FROM entries
WHERE color = 'red'
GROUP BY day
ORDER BY day DESC
What I did there was take advantage of the fact that MySQL has no separate Boolean data type: TRUE is identical to 1 and FALSE to 0, so the SUM in effect acts as a COUNT of the rows that satisfy the condition.
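To try the SUM-over-a-comparison technique without a MySQL server, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in. It assumes SQLite behaves like MySQL in this respect (a comparison yields 1 or 0), uses SQLite's date() in place of DATE_FORMAT, and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here: a comparison evaluates to 1 (true) or 0 (false),
# so SUM(condition) counts the rows that match the condition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, edate TEXT, src TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO entries (edate, src, color) VALUES (?, ?, ?)",
    [
        ("2017-11-01 09:00:00", "a string", "red"),
        ("2017-11-01 10:30:00", "", "red"),
        ("2017-11-01 11:00:00", "a string", "red"),
        ("2017-11-02 08:15:00", "", "red"),
        ("2017-11-02 12:00:00", "a string", "blue"),  # excluded by the color filter
    ],
)

# date(edate) plays the role of MySQL's DATE_FORMAT(edate, '%Y-%m-%d').
rows = conn.execute(
    """
    SELECT date(edate) AS day,
           SUM(src = 'a string') AS with_value,
           SUM(src = '') AS without_value
    FROM entries
    WHERE color = 'red'
    GROUP BY day
    ORDER BY day DESC
    """
).fetchall()

for row in rows:
    print(row)  # (day, with_value, without_value)
```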
I would do this using conditional aggregation:
SELECT DATE_FORMAT(edate, '%Y-%m-%d') as day,
SUM(src = 'a string' and color = 'red') as with_value,
SUM(src = '' and color = 'red') as without_value
FROM entries
GROUP BY day
ORDER BY day DESC;
In MySQL, boolean expressions are treated as 0 (for false) and 1 (for true) in an integer context. This makes them convenient for aggregation.
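If you want per-day aggregate results, then you need to perform the grouping by day in the query (or queries) where you compute the aggregates. You are grouping in the outer query instead, and because your subqueries are not correlated with the outer row's day, each one counts across the whole table and returns the same value on every row.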
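A small sketch of why the original query repeats the same numbers. It uses Python's sqlite3 as a stand-in for MySQL, with date() replacing DATE_FORMAT and invented rows: the uncorrelated subquery counts the whole table, while the aggregate computed in the grouped query itself is per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, edate TEXT, src TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO entries (edate, src, color) VALUES (?, ?, ?)",
    [
        ("2017-11-01", "a string", "red"),
        ("2017-11-01", "", "red"),
        ("2017-11-02", "a string", "red"),
        ("2017-11-02", "a string", "red"),
    ],
)

# The subquery has no link to the outer row's day, so it counts the whole
# table and every day gets the same number.
uncorrelated = conn.execute(
    """
    SELECT date(edate) AS day,
           (SELECT COUNT(id) FROM entries WHERE src = 'a string' AND color = 'red') AS with_value
    FROM entries
    GROUP BY day
    ORDER BY day DESC
    """
).fetchall()

# Computing the aggregate in the grouped query itself gives per-day counts.
grouped = conn.execute(
    """
    SELECT date(edate) AS day,
           SUM(src = 'a string' AND color = 'red') AS with_value
    FROM entries
    GROUP BY day
    ORDER BY day DESC
    """
).fetchall()

print(uncorrelated)  # same count on every row
print(grouped)       # one count per day
```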
In any event, you don't need subqueries for this, and it would be better to avoid them:
SELECT
DATE_FORMAT(edate,'%Y-%m-%d') as day,
SUM(CASE WHEN src='a string' THEN 1 ELSE 0 END)
AS with_value,
SUM(CASE WHEN src='' THEN 1 ELSE 0 END)
AS without_value
FROM entries
WHERE color = 'red'
GROUP BY day
ORDER BY day DESC
I have a VARCHAR column with date values, and I need to find the dates that expire in the next 30 days.
ExpiringDate
===================
20171208,
20171215,samples
20171130,tested
N/A
No
(empty row)
So first I need to get the value before the comma. From that result set, I need to keep only the rows that contain a number (no 'N/A', 'No', or empty rows), and then filter the dates that expire in the next 30 days.
Edit
I have tried the following, but the result set is not what I expect:
SELECT
DocName,
CategoryName,
AttributeName,
CAST(SUBSTRING_INDEX(AttributeValue, ',', 1) AS DATE) AS ExpiredDate
FROM myDB
WHERE (AttributeName = 'Date of last vessel OVID' OR AttributeName = 'Next Statutory docking' OR
AttributeName = 'Last statutory docking') AND AttributeValue LIKE '%[^0-9]%' AND
DATEDIFF(now(), AttributeValue) <= 30;
Because you are not only storing dates as text but also mixing those dates with entirely non-date information, things get complicated. In this case we can do two checks: one to ensure that the record starts with an actual date in the expected format, and a second to make sure that the date falls between today and 30 days from now.
SELECT ExpiringDate
FROM
(
SELECT ExpiringDate
FROM yourTable
WHERE ExpiringDate REGEXP '^[0-9]{8}'
) t
WHERE
DATEDIFF(LEFT(ExpiringDate, 8), NOW()) BETWEEN 0 AND 30;
Note that I use a subquery to first remove rows that do not even have a parseable date. The reason for this is that DATEDIFF cannot compute a meaningful difference unless both of its arguments are valid dates.
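The two checks can be mirrored outside the database to see which sample values survive. This Python sketch only illustrates the logic and does not replace the query: the helper name expiring_within and the fixed "today" date are hypothetical, and the empty string stands for the empty row.

```python
import re
from datetime import date, datetime

# Sample values from the question; the empty string stands for the empty row.
values = ["20171208,", "20171215,samples", "20171130,tested", "N/A", "No", ""]


def expiring_within(values, today, days=30):
    """Return the values whose leading YYYYMMDD date is 0..days days after today."""
    result = []
    for value in values:
        # Check 1: keep only values starting with eight digits (REGEXP '^[0-9]{8}').
        if not re.match(r"[0-9]{8}", value):
            continue
        # Parse the first eight characters (LEFT(ExpiringDate, 8)).
        try:
            expires = datetime.strptime(value[:8], "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a real date, e.g. 20171399
        # Check 2: DATEDIFF(expires, today) BETWEEN 0 AND 30.
        if 0 <= (expires - today).days <= days:
            result.append(value)
    return result


print(expiring_within(values, today=date(2017, 12, 1)))
```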
I have epoch timestamps in the "PART_EPOCH" column of the table "crud_mysqli".
I would like to select the associated "PART_ID" value for the next future timestamp (ignoring past timestamps).
The following MySQLi query should select the MIN (next) value in the future, i.e. > now.
But it does not return anything.
Each clause returns the expected result when I state it on its own, but combining the clauses as below returns no result.
Could you please tell me what is wrong here?
// Find next event PART_ID name :
// SELECT the PART_ID with the lowest (next) PART_EPOCH value in the future (ignore past PART_EPOCH values)
$query = "SELECT
PART_ID
FROM crud_mysqli
WHERE (PART_EPOCH = (SELECT MIN(PART_EPOCH) FROM crud_mysqli))
AND (PART_EPOCH > UNIX_TIMESTAMP(NOW()))
";
WHERE (PART_EPOCH = (SELECT MIN(PART_EPOCH) FROM crud_mysqli))
Here you say to only take the entry with the lowest timestamp, which is probably something in the past.
AND (PART_EPOCH > UNIX_TIMESTAMP(NOW()))
And here you say that it should be in the future. The two conditions exclude each other whenever the lowest timestamp in the table is in the past, i.e. as soon as any entry has a past timestamp.
So you need to put the second condition into the subquery:
SELECT
PART_ID
FROM crud_mysqli
WHERE PART_EPOCH = (
SELECT MIN(PART_EPOCH)
FROM crud_mysqli
WHERE PART_EPOCH > UNIX_TIMESTAMP(NOW())
)
That means: "take the entry with the lowest timestamp in the future".
However, you can just as well do the following:
SELECT PART_ID
FROM crud_mysqli
WHERE PART_EPOCH > UNIX_TIMESTAMP(NOW())
ORDER BY PART_EPOCH ASC
LIMIT 1
The results would only differ if you have two entries with the same timestamp. In that case the first query would return both of them, while the second query returns only one.
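To see both behaviours side by side, here is a Python sketch using sqlite3 as a stand-in for MySQL. The table contents are invented, and a fixed now value replaces UNIX_TIMESTAMP(NOW()) so the result is repeatable. The ORDER BY PART_ID in the fixed query is only there to return the tied rows in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crud_mysqli (PART_ID TEXT, PART_EPOCH INTEGER)")
conn.executemany(
    "INSERT INTO crud_mysqli VALUES (?, ?)",
    [("past", 1000), ("next_a", 2000), ("next_b", 2000), ("later", 3000)],
)

now = 1500  # stands in for UNIX_TIMESTAMP(NOW())

# Original query: the global MIN is in the past, so it can never also be > now.
broken = conn.execute(
    """
    SELECT PART_ID FROM crud_mysqli
    WHERE PART_EPOCH = (SELECT MIN(PART_EPOCH) FROM crud_mysqli)
      AND PART_EPOCH > ?
    """,
    (now,),
).fetchall()

# Fixed query: the future filter moves into the subquery; ties return every match.
fixed = conn.execute(
    """
    SELECT PART_ID FROM crud_mysqli
    WHERE PART_EPOCH = (SELECT MIN(PART_EPOCH) FROM crud_mysqli WHERE PART_EPOCH > ?)
    ORDER BY PART_ID
    """,
    (now,),
).fetchall()

# ORDER BY ... LIMIT 1: always a single row, even when timestamps tie.
limited = conn.execute(
    "SELECT PART_ID FROM crud_mysqli WHERE PART_EPOCH > ? ORDER BY PART_EPOCH ASC LIMIT 1",
    (now,),
).fetchall()

print(broken, fixed, limited)
```

Which of the tied rows LIMIT 1 returns is not specified by the query.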
I currently have an employee logging SQL table that has 3 columns:
fromState: String,
toState: String,
timestamp: DateTime
fromState is either In or Out. In means the employee came in and Out means the employee went out. Each row can only transition from In to Out or from Out to In.
I'd like to generate a temporary table in SQL that keeps track, hour by hour, of how many employees are in the company. That is, the resulting table has the columns HourBucket and NumEmployees.
In non-SQL code I can do this by initializing numEmployees to 0, going through the table row by row (sorted by timestamp), and adding 1 (employee came in) or subtracting 1 (employee went out), bucketed by the hour of the timestamp.
I'm clueless as to how to do this in SQL. Any clues?
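For reference, the row-by-row approach described in the question could look like this in Python. This is a sketch under assumptions: the sample rows and the function name headcount_by_hour are hypothetical, fromState 'In' is read as "came in" as the question states, and only hours that contain at least one log row appear in the output (a fuller version would carry the running count forward into empty hours).

```python
from datetime import datetime

# Hypothetical log rows: (fromState, toState, timestamp).
# Following the question's wording, fromState 'In' means "employee came in".
log = [
    ("In", "Out", datetime(2017, 11, 20, 8, 5)),
    ("In", "Out", datetime(2017, 11, 20, 8, 40)),
    ("Out", "In", datetime(2017, 11, 20, 9, 15)),
    ("In", "Out", datetime(2017, 11, 20, 10, 0)),
]


def headcount_by_hour(log):
    """Running headcount at the end of each hour bucket that has activity."""
    counts = {}
    num_employees = 0
    for from_state, _to_state, ts in sorted(log, key=lambda row: row[2]):
        num_employees += 1 if from_state == "In" else -1
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        counts[bucket] = num_employees  # the last value seen in this hour wins
    return counts


for bucket, n in headcount_by_hour(log).items():
    print(bucket, n)
```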
Use a COUNT ... GROUP BY query. I can't see what you're using toState for from your description, though! This also assumes you have an employeeID field.
E.g.
SELECT fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable
INNER JOIN (SELECT employeeID AS 'empID', MAX(timestamp) AS 'latest' FROM StaffinBuildingTable GROUP BY employeeID) AS LastEntry ON StaffinBuildingTable.employeeID = LastEntry.empID AND StaffinBuildingTable.timestamp = LastEntry.latest
GROUP BY fromState
The LastEntry subquery produces each employeeID together with that employee's latest timestamp.
The INNER JOIN, matching on both the employeeID and that latest timestamp, limits the main table to each employee's most recent row.
The outer GROUP BY produces the count.
SELECT HOUR(SBT.timestamp) AS 'Hour', SBT.fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable AS SBT
INNER JOIN (
SELECT SBIJ.employeeID AS 'empID', MAX(timestamp) AS 'latest'
FROM StaffinBuildingTable AS SBIJ
WHERE DATE(SBIJ.timestamp) = CURDATE()
GROUP BY SBIJ.employeeID) AS LastEntry ON SBT.employeeID = LastEntry.empID AND SBT.timestamp = LastEntry.latest
GROUP BY SBT.fromState, HOUR(SBT.timestamp)
Replace CURDATE() with whatever date you are interested in.
Note this is non-optimal, as it calculates HOUR twice: once for the data and once for the group.
Again, the INNER JOIN limits the number of returned rows, this time to each employee's last timestamp on the given day.
To me, your description of fromState and toState seems the wrong way round; I'd expect to do this based on toState. But assuming I'm wrong about that, the following should point you in the right direction:
First, I create a "Numbers" table containing 24 rows, one for each hour of the day:
create table tblHours
(Number int);
insert into tblHours values
(0),(1),(2),(3),(4),(5),(6),(7),
(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23);
Then, for each date in your employee logging table, I create 24 rows (one per hour) in another new table that will hold your counts:
create table tblDailyHours
(
HourBucket datetime,
NumEmployees int
);
insert into tblDailyHours (HourBucket, NumEmployees)
select distinct
date_add(date(t.timeStamp), interval h.Number HOUR) as HourBucket,
0 as NumEmployees
from
tblEmployeeLogging t
CROSS JOIN tblHours h;
Then I update this table to contain all the relevant counts:
update tblDailyHours h
join
(select
h2.HourBucket,
sum(case when el.fromState = 'In' then 1 else -1 end) as cnt
from
tblDailyHours h2
join tblEmployeeLogging el on
h2.HourBucket >= el.timeStamp
group by h2.HourBucket
) cnt ON
h.HourBucket = cnt.HourBucket
set NumEmployees = cnt.cnt;
You can now retrieve the counts with
select *
from tblDailyHours
order by HourBucket;
The counts give the number of people on site at each of the times displayed. If you want the number during the hour in question, we'd need to tweak this a little.
There is a working version of this code (using not very realistic data in the logging table) here: rextester.com/DYOR23344
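The same cumulative idea can be checked as a single SELECT instead of CREATE/INSERT/UPDATE. This sketch assumes SQLite (via Python's sqlite3) rather than MySQL: a CTE replaces tblDailyHours, datetime(..., '+N hours') replaces date_add, the sample rows are invented, and a LEFT JOIN with an explicit ELSE 0 keeps hours before the first event at 0. As in the answer, fromState 'In' counts as +1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblHours (Number INTEGER);
    CREATE TABLE tblEmployeeLogging (fromState TEXT, toState TEXT, timeStamp TEXT);
    """
)
conn.executemany("INSERT INTO tblHours VALUES (?)", [(n,) for n in range(24)])
conn.executemany(
    "INSERT INTO tblEmployeeLogging VALUES (?, ?, ?)",
    [
        ("In", "Out", "2017-11-20 08:05:00"),
        ("In", "Out", "2017-11-20 08:40:00"),
        ("Out", "In", "2017-11-20 09:15:00"),
    ],
)

rows = conn.execute(
    """
    WITH buckets AS (
        -- one row per hour for every date that appears in the log
        SELECT DISTINCT datetime(date(t.timeStamp), '+' || h.Number || ' hours') AS HourBucket
        FROM tblEmployeeLogging t CROSS JOIN tblHours h
    )
    SELECT b.HourBucket,
           -- running total of all events at or before the bucket's start time
           SUM(CASE WHEN el.fromState = 'In' THEN 1
                    WHEN el.fromState = 'Out' THEN -1
                    ELSE 0 END) AS NumEmployees
    FROM buckets b
    LEFT JOIN tblEmployeeLogging el ON b.HourBucket >= el.timeStamp
    GROUP BY b.HourBucket
    ORDER BY b.HourBucket
    """
).fetchall()

for bucket, n in rows:
    print(bucket, n)
```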
Original Answer (Based on a single overall count)
If you're happy to search over all rows, and want the current "head count" you can use this:
select
sum(case when t.FromState = 'In' then 1 else -1 end) as Heads
from
MyTable t
But if you know that there will always be no one there at midnight, you can add a where clause to prevent it from looking at more rows than it needs to:
where
date(t.timestamp) = curdate()
Again, on the assumption that the head count reaches zero at midnight, you can generalise that method to get a headcount at any time as follows:
where
date(t.timestamp) = "CENSUS DATE" AND
t.timestamp <= "CENSUS DATETIME"
Obviously you'd need to replace my quoted strings with code which returned the date and datetime of interest. If the headcount doesn't return to zero at midnight, you can achieve the same by removing the first line of the where clause.
I'm trying to get a list of 20 events grouped by their IDs and sorted by whether they are in progress, pending, or already finished. The problem is that there are events with the same ID that include finished, pending, and in-progress events, and I want to have 20 distinct IDs in the end. What I want to do is group these events together, but if one of them is in progress, sort that group by that event. So basically I want to sort by the latest end time that is also before now().
What I have so far is something like this, where end and start are the end and start times. I'm not sure whether what is inside MAX() behaves the way I expect.
select * from event_schedule as t1
JOIN (
SELECT DISTINCT(event_id) as e
from event_schedule
GROUP BY event_id
order by MAX(end < unix_timestamp(now())) asc,
MIN(start >= unix_timestamp(now())) asc,
MAX(start) desc
limit 0, 20
)
as t2 on (t1.event_id = t2.e)
This results in some running and pending events being mixed up in the order, when I want them to be in the order running -> pending -> ended.
I would suggest first creating a view so that the SELECT statement does not become overcomplicated:
CREATE VIEW v_event_schedule AS
SELECT *,
CASE
WHEN end < unix_timestamp(now())
THEN 3 -- past
WHEN start > unix_timestamp(now())
THEN 2 -- pending
ELSE 1 -- running
END AS category
FROM event_schedule;
This view v_event_schedule returns an extra column, in addition to the columns of event_schedule, which represents the priority of the category (running, pending, past):
1: running (in progress)
2: pending (future)
3: past
Then the following will do what you want:
SELECT a.*
FROM v_event_schedule a
INNER JOIN (
SELECT event_id,
MIN(category) category
FROM v_event_schedule b
GROUP BY event_id
) b
ON a.event_id = b.event_id
AND a.category = b.category
ORDER BY category,
start DESC
LIMIT 20;
The ORDER BY can be further adapted to how you want to sort within the same category. I added start DESC, as that seemed to be what you were doing in your attempt.
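Here is a self-contained sketch of the category approach, using Python's sqlite3 as a stand-in for MySQL. A CTE with a :now parameter replaces the view and unix_timestamp(now()), the start and end columns are double-quoted because END is an SQL keyword, and the rows are invented. Categories follow the priority list: 1 running, 2 pending, 3 past.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE event_schedule (event_id INTEGER, "start" INTEGER, "end" INTEGER)')
conn.executemany(
    "INSERT INTO event_schedule VALUES (?, ?, ?)",
    [
        (1, 100, 200),    # event 1: a past occurrence ...
        (1, 900, 1100),   # ... and a running one
        (2, 1200, 1300),  # event 2: pending
        (3, 300, 400),    # event 3: past only
    ],
)
now = 1000  # stands in for unix_timestamp(now())

rows = conn.execute(
    """
    WITH v AS (
        SELECT *,
               CASE WHEN "end" < :now THEN 3     -- past
                    WHEN "start" > :now THEN 2   -- pending
                    ELSE 1                       -- running
               END AS category
        FROM event_schedule
    )
    SELECT a.event_id, a.category
    FROM v a
    JOIN (SELECT event_id, MIN(category) AS category FROM v GROUP BY event_id) b
      ON a.event_id = b.event_id AND a.category = b.category
    ORDER BY a.category, a."start" DESC
    LIMIT 20
    """,
    {"now": now},
).fetchall()
print(rows)  # (event_id, category) pairs, running first
```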
About the original ORDER BY
You had this:
order by MAX(end < unix_timestamp(now())) asc,
MIN(start >= unix_timestamp(now())) asc,
The expressions you have there evaluate to boolean values, and each of the two elements in the ORDER BY divides the groups into two sections, one for false and one for true, so there are 4 groups in total.
The first of the two orders first the IDs that have no record with an end value in the past, because only then is the boolean expression false for every record, which is the only way for their MAX to be false as well.
Now let's say that for the same ID you have both records with an end date in the future and records with an end date in the past. In that case the MAX aggregates to true, so the ID will be sorted after those whose MAX is false. This is not intended, as this ID might have a "running" record.
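A tiny reproduction of the pitfall, again with Python's sqlite3 standing in for MySQL and invented rows: event 1 has an ended record plus a running one, event 2 is only pending, and ordering by MAX(end < now) pushes the running event behind the pending one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE event_schedule (event_id INTEGER, "start" INTEGER, "end" INTEGER)')
conn.executemany(
    "INSERT INTO event_schedule VALUES (?, ?, ?)",
    [
        (1, 100, 200),    # event 1: an ended record ...
        (1, 900, 1100),   # ... plus a running record
        (2, 1200, 1300),  # event 2: pending only
    ],
)
now = 1000  # stands in for unix_timestamp(now())

rows = conn.execute(
    """
    SELECT event_id, MAX("end" < ?) AS has_ended_record
    FROM event_schedule
    GROUP BY event_id
    ORDER BY has_ended_record ASC, event_id
    """,
    (now,),
).fetchall()
print(rows)  # the running event 1 comes after the pending event 2
```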
I did not look into making your query work based on such aggregates on boolean expressions. It requires some time to understand what they are doing. A CASE WHEN to determine the category with a number really makes the SQL a lot easier to understand, at least to me.
I want to perform a different SELECT based on the column data. For example, I have a table http://sqlfiddle.com/#!2/093a2 where I want to compare start_date and end_date only if use_schedule = 1. Otherwise, select all the data (a different select). Basically, I only want to compare the start and end dates if use_schedule is 1, and if use_schedule is 0, then select the rest of the data.
An example may be something like:
select id, name from table
where use_schedule = 0
else
select id, name, start_date from table
where use_schedule = 1 and current_date >= start_date
Basically, only when the schedule is enabled should the start and end dates be checked, because if the schedule is not enabled there is no point in looking at the dates; just select the data. With the schedule enabled, I want to be more selective in selecting the scheduled data.
I am trying to figure out whether MySQL CASE or IF statements would work, but I have not been able to make them work. How can I run this select?
Thanks.
You can use UNION to mix and match the results of 2 different SQL queries into one result set:
select id, name, null as start_date from table
where use_schedule = 0
union
select id, name, start_date from table
where use_schedule = 1 and current_date >= start_date
Note that both queries have to have compatible output fields (the same number of columns, with compatible types) for this to work. UNION automatically keeps only distinct records; if you want to keep duplicate results, use UNION ALL instead.
In this specific case a more extensive WHERE-clause would also work obviously:
where use_schedule = 0 or (use_schedule = 1 and current_date >= start_date)
But given the question I'm assuming your real case is a bit more complex.
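For this simple case, the UNION and the single WHERE clause select the same rows. Below is a sketch with Python's sqlite3 standing in for MySQL; the table name campaigns, the rows, and the fixed today string (replacing current_date) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaigns (id INTEGER, name TEXT, use_schedule INTEGER, start_date TEXT)")
conn.executemany(
    "INSERT INTO campaigns VALUES (?, ?, ?, ?)",
    [
        (1, "always on", 0, None),
        (2, "started", 1, "2017-11-01"),
        (3, "not yet", 1, "2017-12-31"),
    ],
)
today = "2017-11-20"  # stands in for current_date

# Two SELECTs combined with UNION; the NULL column is aliased so both halves line up.
union_rows = conn.execute(
    """
    SELECT id, name, NULL AS start_date FROM campaigns WHERE use_schedule = 0
    UNION
    SELECT id, name, start_date FROM campaigns WHERE use_schedule = 1 AND ? >= start_date
    ORDER BY id
    """,
    (today,),
).fetchall()

# The same row selection expressed as one WHERE clause.
where_rows = conn.execute(
    """
    SELECT id, name FROM campaigns
    WHERE use_schedule = 0 OR (use_schedule = 1 AND ? >= start_date)
    ORDER BY id
    """,
    (today,),
).fetchall()

print(union_rows)
print(where_rows)
```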
See the UNION documentation on the MySQL site.
Use CASE in this case:
SELECT id, name,
(CASE
WHEN start_date >= DATE(NOW()) AND use_schedule = 1
THEN start_date
ELSE NULL
END) AS cols FROM campaigns
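This way every row is still returned: cols contains start_date for rows with use_schedule = 1 and a start date on or after today, and NULL for all other rows (including every use_schedule = 0 row). To drop rows instead, the condition has to go into a WHERE clause.
I used DATE(NOW()) so that the time part, which you are not interested in, is removed.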
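A quick way to see what the CASE does: in this sketch with Python's sqlite3 (a stand-in for MySQL, with invented rows and a fixed today string replacing DATE(NOW())), every row comes back, and the CASE only decides what goes into the cols column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaigns (id INTEGER, name TEXT, use_schedule INTEGER, start_date TEXT)")
conn.executemany(
    "INSERT INTO campaigns VALUES (?, ?, ?, ?)",
    [
        (1, "always on", 0, None),
        (2, "upcoming", 1, "2017-12-31"),
        (3, "started", 1, "2017-11-01"),
    ],
)
today = "2017-11-20"  # stands in for DATE(NOW())

# CASE in the select list shapes one column; it does not filter rows.
rows = conn.execute(
    """
    SELECT id, name,
           CASE WHEN start_date >= ? AND use_schedule = 1 THEN start_date
                ELSE NULL
           END AS cols
    FROM campaigns
    ORDER BY id
    """,
    (today,),
).fetchall()
print(rows)
```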