I am trying to write a report on some data I collected using a Google Form. Each person was asked how many of an item they had in their closet. I want to present the data as a count of how many out of the total fell into each range. So I used this MySQL query to count the instances of each answer:
SELECT `Closet`, COUNT(*) FROM `TABLE 1` GROUP BY `Closet`
And here is the resulting data:
Closet  | COUNT(*)
--------+---------
0       | 8
1-5     | 124
101-200 | 7
11-20   | 181
201-300 | 3
21-50   | 171
51-100  | 48
6-10    | 156
The problem is that, alphabetically, "101-200" sorts before "6-10". I basically want to sort this in some way that puts the number ranges in logical order (1-5, 6-10, 11-20, etc.).
How can I accomplish this?
You'll have to use CONVERT and SUBSTRING_INDEX:

SELECT `Closet`, COUNT(*)
FROM `TABLE 1`
GROUP BY `Closet`
ORDER BY CONVERT(SUBSTRING_INDEX(`Closet`, '-', 1), UNSIGNED INTEGER)

This sorts the rows by the first number of each range, which should essentially do the job.
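For example, here is what the expression yields for two of the range labels above (both are standard MySQL functions):

SELECT SUBSTRING_INDEX('101-200', '-', 1);
-- '101' (the text before the first '-')
SELECT CONVERT(SUBSTRING_INDEX('101-200', '-', 1), UNSIGNED INTEGER);
-- 101
SELECT CONVERT(SUBSTRING_INDEX('6-10', '-', 1), UNSIGNED INTEGER);
-- 6, so '6-10' now sorts before '101-200'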
For example, I have this table (the values are completely random):
long  | lat   | fare_amount | total_people
------+-------+-------------+-------------
23,32 | 64,67 | 450         | 4
64,67 | 78,27 | 543         | 2
25,32 | 98,07 | 458         | 1
12,32 | 44,65 | 323         | 7
93,42 | 24,19 | 398         | 9
...
So basically, total_people is the number of times the same coordinates appear in the table (I got it with a simple COUNT), and fare_amount is the average of the fares that share those coordinates.
I want to order the table so that the coordinates with the highest combination of fare_amount and total_people come first. Any suggestions?
Is this what you want?
order by total_people desc, fare_amount desc
It orders by total_people first. When there are ties, then by fare_amount.
If by "higher combination" you mean the total amount paid, then you can instead use:
order by total_people * fare_amount desc
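In context, a minimal sketch of the first variant (the table name fares is an assumption; the question doesn't name the table):

SELECT `long`, lat, fare_amount, total_people
FROM fares  -- hypothetical table name
ORDER BY total_people DESC, fare_amount DESC;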
I have been at this for a few days without much luck, and I am looking for some guidance on how to get the lowest estimate from a particular group of suppliers and then place it into another table.
I have four supplier estimates on every piece of work, and all new estimates go into a single table. I am trying to find the lowest 'mid' price from the four newest entries in the RECENT QUOTE TABLE with a group id of '1' and then place that into the LOWEST QUOTE TABLE, as seen below.
RECENT QUOTE TABLE:
suppid | group | min  | mid   | high
-------+-------+------+-------+-----
1      | 1     | 200  | 400   | 600
2      | 1     | 300  | 500   | 700
3      | 1     | 100  | 300   | 500
[4]    | [1]   | 50   | [150] | 300
5      | 2     | 1000 | 3000  | 5000
6      | 2     | 3000 | 5000  | 8000
7      | 2     | 2000 | 4000  | 6000
8      | 2     | 1250 | 3125  | 5578
LOWEST QUOTE TABLE:
suppid | group | min | mid | high
-------+-------+-----+-----+-----
4      | 1     | 50  | 150 | 300
Any help on how to structure this would be great, as I have been looking for a few days and have not been able to find anything to get me moving again. I'm using MySQL and the app is written in Python; I'm open to all suggestions.
Thanks in advance.
If you really want to select only the row with group 1, you can do something like

INSERT INTO lowest_quote_table
SELECT * FROM recent_quote_table
WHERE `group` = 1
ORDER BY `mid` ASC
LIMIT 1
If you want a row with the lowest mid from every group, you can do something like

INSERT INTO lowest_quote_table
SELECT rq.*
FROM recent_quote_table AS rq
JOIN (
    SELECT `group`, MIN(`mid`) AS min_mid
    FROM recent_quote_table
    GROUP BY `group`
) AS mq
    ON rq.`group` = mq.`group` AND rq.`mid` = mq.min_mid
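Note that if two suppliers in the same group tie on the lowest mid, the join version returns both rows; add a tie-breaker (for example, the lowest suppid) if you need exactly one row per group.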
I have a table with the following information:
Table: bar
minute | beer
-------+-----
1      | 48
2      | 24
3      | 92
4      | 17
5      | 38
6      | 64
I want to know where the biggest difference in the beer column is. By eyeballing it, it's between minutes 3 and 4, but how can I do this in SQL?
I had something in mind:
Select minute, count(beer) as spike
from bar
where ???
You need nested aggregation:
select max(spike) - min(spike)
from
( -- count per minute
Select minute, count(beer) as spike
from bar
group by minute
) as dt
The simplest method would be:
select max(beer) - min(beer)
from bar;
You can use MySQL's MAX() and MIN() functions to get the highest and lowest values.
SELECT MIN(beer) AS lowestBeer, MAX(beer) as highestBeer
FROM bar;
Since the order does not matter, you can do it with a self-join:
SELECT a.minute AS from_minute, b.minute AS to_minute, a.beer, b.beer
FROM bar a
CROSS JOIN bar b
ORDER BY a.beer-b.beer DESC
LIMIT 1
This would yield a row describing from what minute to what minute you have the biggest difference, along with the corresponding values of beer.
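If you specifically want the largest jump between consecutive minutes (minutes 3 and 4 in the sample data), here is a minimal sketch of a self-join on adjacent rows, assuming minute always increments by 1:

SELECT a.minute AS from_minute, b.minute AS to_minute,
       ABS(a.beer - b.beer) AS diff
FROM bar a
JOIN bar b ON b.minute = a.minute + 1  -- pair each minute with the next one
ORDER BY diff DESC
LIMIT 1;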
We have a database for patients that shows the details of their various visits to our office, such as their weight during each visit. I want to generate a report that returns the visit (a row from the table) whose date differs from the patient's first visit by as much as possible without exceeding X days.
That's confusing, so let me try an example. Let's say I have the following table called patient_visits:
visit_id | created | patient_id | weight
---------+---------------------+------------+-------
1 | 2006-08-08 09:00:05 | 10 | 180
2 | 2006-08-15 09:01:03 | 10 | 178
3 | 2006-08-22 09:05:43 | 10 | 177
4 | 2006-08-29 08:54:38 | 10 | 176
5 | 2006-09-05 08:57:41 | 10 | 174
6 | 2006-09-12 09:02:15 | 10 | 173
In my query, if I wanted to run this report for "30 days", I would want to return the row where visit_id = 5, because it is 28 days after the first visit, and the next row is 35 days after, which is too much.
I've tried a variety of things, such as joining the table to itself, or creating a subquery in the WHERE clause to try to return the max value of created WHERE it is equal to or less than created + 30 days, but I seem to be at a loss at this point. As a last resort, I can just pull all of the data into a PHP array and build some logic there, but I'd really rather not.
The bigger picture is this: the database has about 5,000 patients, each with any number of office visits. I want to build the report to tell me the average weight loss for all patients combined when going from their first visit to X days out (that is, X days from each individual patient's first visit, not an arbitrary X-day period). I'm hoping that if I can get the above resolved, I'll be able to work out the rest.
You can get the date of the first visit and of the latest visit within the 30-day window using a query like this:

select
    first_visits.patient_id,
    first_visits.first_date,
    max(next_visit.created) as next_date
from (
    select patient_id, min(created) as first_date
    from patient_visits
    group by patient_id
) as first_visits
inner join patient_visits as next_visit
    on next_visit.patient_id = first_visits.patient_id
    and next_visit.created between first_visits.first_date
        and first_visits.first_date + interval 30 day
group by first_visits.patient_id, first_visits.first_date
So basically you need to find the start date by grouping by patient_id, and then join patient_visits and find the max date that is within the 30-day window.
Then you can join the result to patient_visits to get start and end weights and calculate the loss.
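For illustration, a minimal sketch of that final step (the aliases vw, first_pv, and next_pv are names introduced here; it assumes one row per patient per timestamp):

select
    vw.patient_id,
    first_pv.weight as start_weight,
    next_pv.weight as end_weight,
    first_pv.weight - next_pv.weight as weight_loss
from (
    -- the window query from above, as a derived table
    select
        first_visits.patient_id,
        first_visits.first_date,
        max(next_visit.created) as next_date
    from (
        select patient_id, min(created) as first_date
        from patient_visits
        group by patient_id
    ) as first_visits
    inner join patient_visits as next_visit
        on next_visit.patient_id = first_visits.patient_id
        and next_visit.created between first_visits.first_date
            and first_visits.first_date + interval 30 day
    group by first_visits.patient_id, first_visits.first_date
) as vw
join patient_visits as first_pv
    on first_pv.patient_id = vw.patient_id
    and first_pv.created = vw.first_date
join patient_visits as next_pv
    on next_pv.patient_id = vw.patient_id
    and next_pv.created = vw.next_date

Wrapping weight_loss in AVG() (and dropping the per-patient columns) then gives the combined average the question asks for.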
I have a table that looks something like the following - essentially containing a timestamp as well as some other columns:
WeatherTable
+---------------------+---------+----------------+------+
| TS                  | MonthET | InsideHumidity | .... |
+---------------------+---------+----------------+------+
| 2014-10-27 14:24:22 | 0       | 54             |      |
| 2014-10-27 14:24:24 | 0       | 54             |      |
| 2014-10-27 14:24:26 | 0       | 52             |      |
| 2014-10-27 14:24:28 | 0       | 54             |      |
| 2014-10-27 14:24:30 | 0       | 53             |      |
| 2014-10-27 14:24:32 | 0       | 55             |      |
| 2014-10-27 14:24:34 | 9       | 54             |      |
.......
I'm trying to formulate a SQL query that returns all rows within a certain timeframe (no problem here) with a certain arbitrary granularity, for instance, every 15 seconds. The number is always specified in seconds but is not limited to values less than 60. To complicate things further, the timestamps don't necessarily fall on the granularity required, so it's not a case of simply selecting the timestamp of 14:24:00, 14:24:15, 14:24:30, etc. - the row with the closest timestamp to each value needs to be included in the result.
For example, if the starting time was given as 14:24:30, the end time as 14:32:00, and the granularity was 130, the ideal times would be:
14:24:30
14:26:40
14:28:50
14:31:00
However, timestamps may not exist for each of those times, in which case the row with the closest timestamp to each of those ideal timestamps should be selected. In the case of two timestamps which are equally far away from the ideal timestamp, the earlier one should be selected.
The database is part of a web service, so presently I'm just ignoring the granularity in the SQL query and filtering the unwanted results out in (Java) code later. However, this seems far from ideal in terms of memory consumption and performance.
Any ideas?
You could try to do it like this:
Create a list of time intervals first. Using the stored procedure make_intervals from "Get a list of dates between two dates", create a temporary table by calling it something like this:

call make_intervals(@startdate, @enddate, 15, 'SECOND');

You will then have a table time_intervals with a column named interval_start. Use this to find the closest timestamp to each interval, something like this:
CREATE TEMPORARY TABLE IF NOT EXISTS time_intervals_copy
    AS (SELECT * FROM time_intervals);

SELECT
    time_intervals.interval_start,
    WeatherTable.*
FROM time_intervals
JOIN WeatherTable
    ON WeatherTable.TS BETWEEN @startdate AND @enddate
JOIN (
    SELECT
        time_intervals.interval_start AS interval_start,
        MIN(ABS(time_intervals.interval_start - WeatherTable.TS)) AS ts_diff
    FROM time_intervals_copy AS time_intervals
    JOIN WeatherTable
    WHERE WeatherTable.TS BETWEEN @startdate AND @enddate
    GROUP BY time_intervals.interval_start
) AS min
    ON min.interval_start = time_intervals.interval_start
    AND ABS(time_intervals.interval_start - WeatherTable.TS) = min.ts_diff
GROUP BY time_intervals.interval_start;
This will find the closest timestamp to every time interval. Note: each row in WeatherTable could be listed more than once if the interval used is less than half the interval of the stored data (or something like that, you get the point ;)).
Note: I did not test the queries; they are written from my head. Please adjust them to your use case and correct any minor mistakes that might be in there...
For testing purposes, I extended your dataset to the following timestamps. The column in my database is called time_stamp.
2014-10-27 14:24:24
2014-10-27 14:24:26
2014-10-27 14:24:28
2014-10-27 14:24:32
2014-10-27 14:24:34
2014-10-27 14:24:25
2014-10-27 14:24:32
2014-10-27 14:24:34
2014-10-27 14:24:36
2014-10-27 14:24:37
2014-10-27 14:24:39
2014-10-27 14:24:44
2014-10-27 14:24:47
2014-10-27 14:24:53
I've summarized the idea, but let me explain in more detail before providing the solution I was able to work out.
The requirements are to address timestamps +/- a given time. Since we must go in either direction, we'll want to take the timeframe and split it in half. Then, -1/2 of the timeframe to +1/2 of the timeframe defines a "bin" to consider.
The bin for a given time, measured from a given start time with an interval of @seconds, is then given by this MySQL expression:

((floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds)
NOTE: The whole + 1 trick is there so that we do not end up with a bin index of -1 (bins start at zero). All times are calculated from the start time to ensure timeframes of >= 60 seconds work.
Within each bin, we need to know how far each timestamp is from the center of the bin. That's done by taking the number of seconds from the start and subtracting the bin value from it (then taking the absolute value).
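As a worked arithmetic example (treating timestamps as plain second offsets): with @seconds = 7, a row 10 seconds after @time_start gets bin (floor((10 - 3.5)/7) + 1) * 7 = (0 + 1) * 7 = 7, i.e. the bin centered at 7 seconds, and its distance from that center is abs(10 - 7) = 3.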
At this stage we then have all times "binned up" and ordered within the bin.
To filter out these results, we LEFT JOIN to the same table and setup the conditions to remove the undesirable rows. When LEFT JOINed, the desirable rows will have a NULL match in the LEFT JOINed table.
Somewhat hackishly, I have replaced the start, end, and seconds with variables, but only for readability. MySQL-style comments are included in the LEFT JOIN ON clause identifying the conditions.
SET @seconds = 7;
SET @time_start = TIMESTAMP('2014-10-27 14:24:24');
SET @time_end = TIMESTAMP('2014-10-27 14:24:52');
SELECT t1.*
FROM temp t1
LEFT JOIN temp t2 ON
    #Condition 1: Only consider rows in the same "bin"
    ((floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds)
    = ((floor(((t2.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds)
    AND
    (
        #Condition 2 (Part A): "Filter" by removing rows which are farther from the center of the bin than others
        abs(
            (t1.time_stamp - @time_start)
            - (floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
        )
        >
        abs(
            (t2.time_stamp - @time_start)
            - (floor(((t2.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
        )
        OR
        #Condition 2 (Part B1): "Filter" by removing rows which are the same distance from the center of the bin
        (
            abs(
                (t1.time_stamp - @time_start)
                - (floor(((t1.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
            )
            =
            abs(
                (t2.time_stamp - @time_start)
                - (floor(((t2.time_stamp - @time_start) - (@seconds/2))/@seconds) + 1) * @seconds
            )
            #Condition 2 (Part B2): And are in the future from the other match
            AND
            (t1.time_stamp - @time_start)
            >
            (t2.time_stamp - @time_start)
        )
    )
WHERE t1.time_stamp - @time_start >= 0
  AND @time_end - t1.time_stamp >= 0
  #Condition 3: All rows which have a match are undesirable, so those
  #with a NULL for the primary key (in this case temp_id) are selected
  AND t2.temp_id IS NULL
There may be a more succinct way to write the query, but it did filter the results down to what was needed with one notable exception -- I purposefully put in a duplicate entry. This query will return both such entries as they do meet the criteria as stated.
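If the duplicates should be collapsed as well, one option is to extend Condition 2 (Part B2) with a tie-break on the primary key, so that of two rows with identical timestamps only the one with the lowest temp_id survives. A sketch (untested, same variables as above):

#Condition 2 (Part B2, extended): later in time, or equal in time with a higher primary key
AND (
    (t1.time_stamp - @time_start) > (t2.time_stamp - @time_start)
    OR (t1.time_stamp = t2.time_stamp AND t1.temp_id > t2.temp_id)
)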