Very complex Group By / Unique / Limit by SQL-command - mysql

I actually don't even know how to call this :P, but...
I have one table, let's call it "uploads"
id  owner  date
-----------------------------
0   foo    20100101120000
1   bar    20100101120300
2   foo    20100101120400
3   bar    20100101120600
..  ..     ..
6   foo    20100101120800
Now, when I do something like:
SELECT id, owner, date FROM uploads ORDER BY date DESC
This would result in:
id  owner  date
-----------------------------
6   foo    20100101120800
..  ..     ..
3   bar    20100101120600
2   foo    20100101120400
1   bar    20100101120300
0   foo    20100101120000
Question: Nice, but I want to go even further. When you build a timeline from this (and I did :P), you are 'spammed' by messages saying foo and bar uploaded something. I'd like to group them and return only the first result per owner within a time limit of '500' on the date field.
What kind of SQL-command do I need that would result in:
id  owner  date
-----------------------------
6   foo    20100101120800
3   bar    20100101120600
0   foo    20100101120000
Then, after that, I can perform a call for each record to get the associated records in a timeframe of 5 minutes (this is an example for id=6):
SELECT id FROM uploads WHERE date>=20100101120800-500 ORDER BY date DESC
Does anyone know how I should do the first step? (so limiting/grouping the results)
(btw. I know that when I want to use this, I should convert every date (YmdHis=60) to Unix-time (=100), but I don't need the 5 minutes to be exactly 5 minutes, they may be a minute less sometimes...)

I'm not quite clear on the result you are trying to get, even with your examples. Perhaps something with rounding and group by.
SELECT max(id) max_id,owner, (ROUND(date/500)*500) date_interval, max(date) date
FROM uploads GROUP BY date_interval,owner
You may want to use FLOOR or CEILING instead of ROUND, depending on what you want.
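As a rough illustration of the fixed-interval idea, here is a Python sketch (not MySQL) that applies the FLOOR variant to the five rows visible in the question. The rows hidden behind ".." are left out, and `bucket_start` is a name made up for this example. It keeps the newest row per (interval, owner), which matches the GROUP BY above as long as ids and dates increase together.

```python
# Rows visible in the question's sample table (the elided ".." rows are omitted).
uploads = [
    (0, "foo", 20100101120000),
    (1, "bar", 20100101120300),
    (2, "foo", 20100101120400),
    (3, "bar", 20100101120600),
    (6, "foo", 20100101120800),
]


def bucket_start(date, width=500):
    """Start of the fixed interval containing `date` (FLOOR(date/500)*500)."""
    return date // width * width


# Keep the newest row for every (interval, owner) pair.
latest = {}
for row_id, owner, date in uploads:
    key = (bucket_start(date), owner)
    if key not in latest or date > latest[key][2]:
        latest[key] = (row_id, owner, date)

result = sorted(latest.values(), key=lambda row: row[2], reverse=True)
for row in result:
    print(row)
```

On this data the sketch returns four rows (ids 6, 3, 2, 1) rather than the three the question asks for, because fixed intervals are not anchored at each owner's newest upload.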

Standard SQL doesn't deal with intervals very well.
You are going to need to do a self-join of the table to compare dates of different tuples.
That way, you can easily find all pairs of tuples of which the dates are no more than 500 apart.
However, you really want to cluster the dates in sets no more than 500 apart - and that can't be expressed in SQL at all, as far as I know.
What you can do is something quite similar: split the total time interval into fixed 500-unit ranges, and then cluster all tuples in the table based on the interval they're in. For that, you first need a table or query result with the start times of the intervals; this can be created using a SQL query on your table and a function that either "rounds off" a timestamp to the starting time in its interval, or computes its interval sequence number. Then as a second step you can join the table with that result to group its timestamps according to their corresponding start time. I can't give the SQL because it's DBMS-dependent, and I certainly can't tell you if this is the best way of accomplishing what you want in your situation.
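If the clustering is done in application code instead of SQL, one possible reading of the question is a greedy pass from newest to oldest: a row starts a new group unless the same owner already has a group whose first (newest) row is at most 500 units later. The Python sketch below assumes that reading; `collapse` and `window` are names invented for the example, and only the rows visible in the question are used.

```python
# Rows visible in the question's sample table (the elided ".." rows are omitted).
uploads = [
    (0, "foo", 20100101120000),
    (1, "bar", 20100101120300),
    (2, "foo", 20100101120400),
    (3, "bar", 20100101120600),
    (6, "foo", 20100101120800),
]


def collapse(rows, window=500):
    """Return one row per (owner, burst), walking from newest to oldest."""
    anchors = {}  # owner -> date of the newest row in that owner's current group
    kept = []
    for row_id, owner, date in sorted(rows, key=lambda r: r[2], reverse=True):
        anchor = anchors.get(owner)
        if anchor is None or anchor - date > window:
            # Too far from the current group's anchor: start a new group.
            anchors[owner] = date
            kept.append((row_id, owner, date))
    return kept


timeline = collapse(uploads)
for row in timeline:
    print(row)
```

With the sample rows this keeps ids 6, 3 and 0, the result the question describes. Note that the date values are YmdHis numbers, so 500 is only approximately five minutes, as the question already points out.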

Use an inline view? e.g. something like
SELECT u1.*
FROM uploads u1,
(SELECT date
FROM uploads u2
WHERE u2.owner='foo') datum_points
WHERE u1.date BETWEEN datum_points.date
AND DATE_ADD(datum_points.date, INTERVAL 5 MINUTE)
should return all the posts made within 5 minutes of 'foo' making a post.

Related

Mysql query to only return rows based on column of first row

Sorry for the title, hard to explain in one line, but I am after a query that can get me all the rows based on the result of the first row - an example will explain:
Table looks like this
id | name | week
1  | x    | 2
2  | y    | 2
3  | z    | 3
So basically without knowing the first week value, I want only those rows where week = 2. In other words, if the next run, the first result is week: 3, then I only want rows where week = 3.
This is all contingent on the correct ordering of the rows - but that is not the purpose of this.
I've thought about doing it in two queries, given this is inside a PHP app, where the first query is
select week from table limit 1
And then, now that we know the week value, we can simply run
select id, name, week from table where week = '2'
But I figured there was a smarter way to do it in one query, just not sure what that sql function might look like.
Hope that makes sense.
Expanding from your steps and thinking, you could do
SELECT * FROM table WHERE week = (SELECT week FROM table LIMIT 1)
The above should work fine (unless I misunderstood the question).
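To try the single-query shape without a MySQL server, here is a small Python sketch that runs it against an in-memory SQLite database. SQLite only stands in for MySQL here, the table name `weeks` is invented (`table` itself is a reserved word), and the `ORDER BY id` inside the subquery is an added assumption so that "first row" is well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weeks (id INTEGER, name TEXT, week INTEGER)")
conn.executemany(
    "INSERT INTO weeks VALUES (?, ?, ?)",
    [(1, "x", 2), (2, "y", 2), (3, "z", 3)],
)

# The scalar subquery picks the week of the first row; the outer query
# then returns every row that shares that week.
rows = conn.execute(
    "SELECT id, name, week FROM weeks "
    "WHERE week = (SELECT week FROM weeks ORDER BY id LIMIT 1) "
    "ORDER BY id"
).fetchall()
print(rows)
conn.close()
```

With the sample data this returns the two rows where week = 2.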

How to select two MySQL rows and then compare a column and return an output

I've a table with a structure something like this,
Device | paid | time
abc    | 1    | 2 days ago
abc    | 0    | 1 day ago
abc    | 0    | 5 mins ago
Is it possible to write a query that checks the paid column on all the rows where Device = abc and then outputs the most recent two rows that differ? Basically, something like an if statement saying if row 1 = 1 and row 2 = 0, output that, but only if they are the most recent two rows that are different. For example, in this case, the first and second row. The table is updated whenever a user changes from a free to a paid account etc. It is also updated in different columns for different reasons, hence the duplicate 0s.
I know this would probably be done better by having another table altogether and updating that every time the user switches account type, but is there any way to make this work?
Thanks
Example:
http://rextester.com/MABU7860 (needs further testing on edge cases, but this seems to work)
SELECT A.*, B.*
FROM SQLfoo A
INNER JOIN SQLFoo B
on A.Device = B.Device
and A.mTime < B.mTime
WHERE A.Paid <> B.Paid
and A.device = 'abc'
ORDER BY B.mTime Desc, A.MTime Desc
LIMIT 1
The self-join pairs each row with every later row for the same device (A.mTime < B.mTime), so a row never matches itself and each pair appears only once. The WHERE clause keeps only pairs whose paid values differ, and because the query filters on a single device we don't need to worry about other devices. Ordering by both times descending puts the most recent pair first, and LIMIT 1 returns just that pair. Note that with this ordering B is always the latest row for the device, so on the sample data the query pairs "2 days ago" with "5 mins ago"; if you want the adjacent pair ("2 days ago" and "1 day ago"), order by A.mTime DESC, B.mTime ASC instead.
Or using user variables
http://rextester.com/TWVEVX7830
In other engines one might accomplish this task by performing the join as above, assigning a row number partitioned by the device, and then simply returning the rows with a row number of 1, which would be the earliest date discrepancy.
Use LIMIT to limit the number of records in MySQL:
http://www.mysqltutorial.org/mysql-limit.aspx
In your case, use LIMIT 2,
then put the 2 records you just selected into an array and compare their values. If they are different, print them.
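The compare-in-application idea can be sketched in Python as below. This is only an illustration under assumptions: it reads all rows for the device rather than just two (the two newest rows in the sample both have paid = 0), `minutes_ago` stands in for the real time column, and `latest_change` is a made-up helper name.

```python
# (device, paid, minutes_ago) for the three sample rows.
rows = [
    ("abc", 1, 2 * 24 * 60),  # 2 days ago
    ("abc", 0, 24 * 60),      # 1 day ago
    ("abc", 0, 5),            # 5 mins ago
]


def latest_change(rows, device):
    """Return (older, newer) for the most recent adjacent rows whose paid differs."""
    # Smallest minutes_ago first, i.e. most recent row first.
    history = sorted((r for r in rows if r[0] == device), key=lambda r: r[2])
    for newer, older in zip(history, history[1:]):
        if newer[1] != older[1]:
            return older, newer
    return None  # paid never changed for this device


change = latest_change(rows, "abc")
print(change)
```

On the sample data this picks the "2 days ago" and "1 day ago" rows, the pair named in the question.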

Can SQL query do this?

I have a table "audit" with a "description" column, a "record_id" column and a "record_date" column. I want to select only those records where the description matches one of two possible strings (say, LIKE "NEW%" OR LIKE "ARCH%") where the record_id in each of those two matches each other. I then need to calculate the difference in days between the record_date of each other.
For instance, my table may contain:
id  description  record_id  record_date
1   New Sub      1000       04/14/13
2   Mod          1000       04/14/13
3   Archived     1000       04/15/13
4   New Sub      1001       04/13/13
I would want to select only rows 1 and 3 and then calculate the number of days between 4/15 and 4/14 to determine how long it took to go from New to Archived for that record (1000). Both a New and an Archived entry must be present for any record for it to be counted (I don't care about ones that haven't been archived). Does this make sense and is it possible to calculate this in a SQL query? I don't know much beyond basic SQL.
I am using MySQL Workbench to do this.
The following is untested, but it should work assuming that any given record_id can only show up once with "New Sub" and once with "Archived":
select n.id as new_id
,a.id as archive_id
,record_id
,n.record_date as new_date
,a.record_date as archive_date
,DateDiff(a.record_date, n.record_date) as days_between
from audit n
join audit a using(record_id)
where n.description = 'New Sub'
and a.description = 'Archived';
I changed from OR to AND because I thought you wanted only the number of days for records that were actually archived.
My test was in SQL Server, so the syntax might need to be tweaked slightly for your database (especially the DATEDIFF function; MySQL's DATEDIFF takes just two date arguments and returns the first minus the second in days), but you can select from the same table twice, one side grabbing the 'new' rows and one grabbing the 'archived' rows, then link them by record_id:
SELECT
newsub.id,
newsub.description,
newsub.record_date,
arc.id,
arc.description,
arc.record_date,
DATEDIFF(day, newsub.record_date, arc.record_date) AS DaysBetween
FROM
foo1 arc
, foo1 newsub
WHERE
(newsub.description LIKE 'NEW%')
AND
(arc.description LIKE 'ARC%')
AND
(newsub.record_id = arc.record_id)
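For checking the expected numbers outside the database, the same pairing can be sketched in Python. This is an illustration only: it assumes the dates are stored as mm/dd/yy strings exactly as shown in the question, and that each record_id has at most one "New" and one "Archived" row.

```python
from datetime import datetime

# The four sample rows from the question: (id, description, record_id, record_date).
audit = [
    (1, "New Sub", 1000, "04/14/13"),
    (2, "Mod", 1000, "04/14/13"),
    (3, "Archived", 1000, "04/15/13"),
    (4, "New Sub", 1001, "04/13/13"),
]


def parse(text):
    """Parse an mm/dd/yy string into a date."""
    return datetime.strptime(text, "%m/%d/%y").date()


new_dates, archived_dates = {}, {}
for _id, description, record_id, record_date in audit:
    label = description.upper()
    if label.startswith("NEW"):
        new_dates[record_id] = parse(record_date)
    elif label.startswith("ARCH"):
        archived_dates[record_id] = parse(record_date)

# Only records that have both a New and an Archived entry are counted.
days_between = {
    record_id: (archived_dates[record_id] - new_dates[record_id]).days
    for record_id in new_dates
    if record_id in archived_dates
}
print(days_between)
```

Record 1000 took one day to go from New to Archived; record 1001 is skipped because it has no Archived row.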

MySQL query for items where average price is less than X?

I'm stumped with how to do the following purely in MySQL, and I've resorted to taking my result set and manipulating it in ruby afterwards, which doesn't seem ideal.
Here's the question. With a dataset of 'items' like:
id  state_id  price  issue_date  listed
1   5         450    2011        1
1   5         455    2011        1
1   5         490    2011        1
1   5         510    2012        0
1   5         525    2012        1
...
I'm trying to get something like:
SELECT * FROM items
WHERE ([some conditions], e.g. issue_date >= 2011 and listed=1)
AND state_id = 5
GROUP BY id
HAVING AVG(price) <= 500
ORDER BY price DESC
LIMIT 25
Essentially I want to grab a "group" of items whose average price falls under a certain threshold. I know that the "group by" and "having" in my example above are not correct, since they are just going to give the AVG(price) of that one item, which doesn't really make sense. I'm just trying to illustrate my desired result.
The important thing here is I want all of the individual items in my result set, I don't just want to see one row with the average price, total, etc.
Currently I'm just doing the above query without the HAVING AVG(price) and adding up the individual items one-by-one (in ruby) until I reach the desired average. It would be really great if I could figure out how to do this in SQL. Using subqueries or something clever like joining the table onto itself are certainly acceptable solutions if they work well! Thanks!
UPDATE: In response to Tudor's answer below, here are some clarifications. There is always going to be a target quantity in addition to the target average. And we would always sort the results by price low to high, and by date.
So if we did have 10 items that were all priced at $5 and we wanted to find 5 items with an average < $6, we'd simply return the first 5 items. We wouldn't return the first one only, and we wouldn't return the first 3 grouped with the last 2. That's essentially how my code in ruby is working right now.
I would do almost an inverse of what Jasper provided... Start your query with your criteria to explicitly limit the few items that MAY qualify instead of getting all items and running a sub-select on each entry. Could pose as a larger performance hit... could be wrong, but here's my offering..
select
i2.*
from
( SELECT i.id
FROM items i
WHERE
i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500 ) PreQualify
JOIN items i2
on PreQualify.id = i2.id
AND i2.issue_date > 2011
AND i2.listed = 1
AND i2.state_id = 5
order by
i2.price desc
limit
25
Not sure of the order by, especially if you wanted grouping by item... In addition, I would ensure an index on (state_id, Listed, id, issue_date)
CLARIFICATION per comments
I think I am correct on it. Don't confuse the HAVING clause with WHERE. WHERE decides whether a row is included before any grouping happens. HAVING is checked after all the WHERE conditions and the grouping have been applied: each group that has survived so far is tested against the HAVING condition and is kept in the result set only if it still qualifies. Try running the inner query alone, once without the HAVING clause and then again with it:
SELECT i.id, avg( i.price )
FROM items i
WHERE i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500
As you get more into writing queries, try the parts individually to see what you are getting vs what you are thinking... You'll find how / why certain things work. In addition, you are now talking in your updated question about getting multiple IDs and prices at apparent low and high range... yet you are also applying a limit. If you had 20 items, and each had 10 qualifying records, your limit of 25 would show all of the first item and 5 into the second... which is NOT what I think you want... you may want 25 of each qualified "id". That would wrap this query into yet another level...
What MySQL does makes perfect sense. What you want to do does not:
if you have, say, 4 items, each with a price of 5, and you put HAVING AVG(price) <= 7, what you are saying is that the query should return ALL the combinations, like:
{1} - since item with id 1, can be a group by itself
{1,2}
{1,3}
{1,4}
{1,2,3}
{1,2,4}
...
and so on?
Your algorithm for computing the average in Ruby is also not valid. If you have items with values 5, 1, 7, 10 and seek an average value of less than 7, the element with value 10 can only be returned in a group that includes the element with value 1. But by your algorithm (if I understood correctly), the element with value 1 is returned in the first group.
Update
What you want is something like the Knapsack problem and your approach is using some kind of Greedy Algorithm to solve it. I don't think there are straight, easy and correct ways to implement that in SQL.
After a google search, I found this article which tries to solve the knapsack problem with AI written in SQL.
By considering your item price as a weight, having the number of items and the desired average, you could compute the maximum value that can be entered in the 'knapsack' by multiplying desired_cost with number_of_items
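For comparison, the greedy approach described in the question's update (sort by price from low to high, then take items until the target quantity or the average limit is hit) might look like the Python sketch below. It is only a guess at what the Ruby code does; `cheapest_prefix`, `max_avg` and `quantity` are invented names, and the prices are the listed rows from the sample table.

```python
def cheapest_prefix(prices, max_avg, quantity):
    """Take the cheapest items one by one while the running average stays <= max_avg."""
    picked, total = [], 0
    for price in sorted(prices):
        if len(picked) == quantity:
            break
        if (total + price) / (len(picked) + 1) > max_avg:
            # Prices are ascending, so the average can only grow from here on.
            break
        picked.append(price)
        total += price
    return picked


# Sample rows with listed = 1 (the 510 row has listed = 0 and is excluded).
listed_prices = [450, 455, 490, 525]

all_under_500 = cheapest_prefix(listed_prices, max_avg=500, quantity=25)
under_460 = cheapest_prefix(listed_prices, max_avg=460, quantity=25)
print(all_under_500, under_460)
```

This always returns a cheapest-first prefix, so it finds one valid group rather than every group whose average is under the limit, which is the distinction the answer above is drawing.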
I'm not entirely sure from your question, but I think this is a solution to your problem:
SELECT * FROM items
WHERE (some "conditions", e.g. issue_date > 2011 and listed=1)
AND state_id = 5
AND id IN (SELECT id
FROM items
GROUP BY id
HAVING AVG(price) <= 500)
ORDER BY price DESC
LIMIT 25
note: This is off the top of my head and I haven't done complex SQL in a while, so it might be wrong. I think this or something like it should work, though.

Grouping timestamps in MySQL with PHP

I want to log certain activities in MySQL with a timecode using time(). Now that I'm accumulating thousands of records, I want to output the data by sets of hours/days/months etc.
What would be the suggested method for grouping time codes in MySQL?
Example data:
1248651289
1248651299
1248651386
1248651588
1248651647
1248651700
1248651707
1248651737
1248651808
1248652269
Example code:
$sql = "SELECT COUNT(timecode) FROM timecodeTable";
//GROUP BY round(timecode/3600, 1) //group by hour??
Edit:
There are two groupings that can be made, so I should make that clearer: the 24 hours in the day can be grouped, but I'm more interested in grouping over time, so returning 365 results for each year the tracking is in place (totals for each day passed), then being able to select a range of dates and see more details on hours/minutes accessed over those times selected.
This is why I've titled it as using PHP, as I'd expect this might be easier with a PHP loop to generate the hours/days etc?
Peter
SELECT COUNT(*), HOUR(timecode)
FROM timecodeTable
GROUP BY HOUR(timecode);
Your result set, given the above data, would look as such:
+----------+----------------+
| COUNT(*) | HOUR(timecode) |
+----------+----------------+
| 10 | 18 |
+----------+----------------+
Many more related functions can be found here.
Edit
After doing some tests of my own based on the output of your comment I determined that your database is in a state of epic fail. :) You're using INT's as TIMESTAMPs. This is never a good idea. There's no justifiable reason to use an INT in place of TIMESTAMP/DATETIME.
That said, you'd have to modify my above example as follows:
SELECT COUNT(*), HOUR(FROM_UNIXTIME(timecode))
FROM timecodeTable
GROUP BY HOUR(FROM_UNIXTIME(timecode));
Edit 2
You can use additional GROUP BY clauses to achieve this:
SELECT
COUNT(*),
YEAR(timecode),
DAYOFYEAR(timecode),
HOUR(timecode)
FROM timecodeTable
GROUP BY YEAR(timecode), DAYOFYEAR(timecode), HOUR(timecode);
Note, I omitted the FROM_UNIXTIME() for brevity.
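The same day-and-hour grouping can be checked outside the database with a short Python sketch over the sample timecodes from the question. It assumes UTC; MySQL's FROM_UNIXTIME() uses the session time zone, so the hour reported by the queries above can differ from the one printed here.

```python
from collections import Counter
from datetime import datetime, timezone

# Sample Unix timestamps from the question.
timecodes = [
    1248651289, 1248651299, 1248651386, 1248651588, 1248651647,
    1248651700, 1248651707, 1248651737, 1248651808, 1248652269,
]

# Count records per (calendar day, hour of day), interpreted in UTC.
counts = Counter()
for ts in timecodes:
    moment = datetime.fromtimestamp(ts, tz=timezone.utc)
    counts[(moment.date().isoformat(), moment.hour)] += 1

for (day, hour), total in sorted(counts.items()):
    print(day, hour, total)
```

All ten sample records land in a single (day, hour) group, which mirrors the one-row result shown earlier in the answer.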