MySQL query with ORDER BY and GROUP BY

I have a table with the following key fields: commentid, comment, time, parentid
The comments are of two types, regular and replies. Regular have parentid 0; replies have parentid equal to the commentid they reply to.
I thought I had this figured out with group by and orderby but my idea didn't work and I'm now struggling as my knowledge of sql is rudimentary.
I want it to display by time DESC, except that I would like those with the same parentid grouped together and, within that parentid, also sorted by time. I am allowing only one level of reply.
As an example, I would like following order:
time3 commentid3 parentid0
time2 commentid2 parentid0
parentid2 commentid4 time4 (this one is a reply)
parentid2 commentid5 time5 (this one also reply)
time1 commentid1 parentid0
I tried SELECT * from comments GROUP BY parentid ORDER BY TIME DESC but this did not work. If needed, I can add another column. Any suggestions would be appreciated! Thx.

I'm making a few assumptions here. I'm assuming your commentid is an auto-incrementing id, so that would mean that the insert order would be from oldest to newest. This will not work if you are not using auto-incrementing ids or if you have some kind of partial-save functionality with these tables. So it's kind of fragile.
I'm also assuming parentid is 0 for a top-level comment, as described in the question.
SELECT commentid, comment, time, parentid, if(parentid = 0, commentid, parentid) thread
FROM comments
ORDER BY thread desc, time asc
Anyway, to add some more information:
Group By is not what you want to use because it will group all rows by the grouping column into one row. Group By is usually used for aggregate calculations such as counting or summing the values in rows, etc.
EDIT:
I updated the query to sort by time asc, which will put the regular comment first and then the replies below the parent comment from oldest to newest.
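To illustrate the point about GROUP BY being meant for aggregates, here is a minimal sketch that counts the replies under each top-level comment. It assumes the comments table and the parentid convention from the question (0 for top-level comments); the reply_count alias is only there for illustration.

```sql
-- One output row per parent comment that has at least one reply.
-- Assumption: parentid = 0 marks a top-level comment, as in the question.
SELECT parentid,
       COUNT(*) AS reply_count   -- hypothetical alias
FROM comments
WHERE parentid <> 0              -- look at replies only
GROUP BY parentid;
```

This is the kind of question GROUP BY answers well: the individual reply rows collapse into a single row per parentid, which is exactly why it cannot be used to list every comment.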

You can't retrieve a "tree-like" result from a sql query (well there are ways, but they don't seem practically usable here)
What you can do is retrieve all the data for regular comments and replies (which means each regular comment's data is repeated once per reply, and you'll have to post-process the rows to build a "tree" result).
The result will look like this:
regular.time, regular.commentid, replies.commentid, replies.time
If a regular comment has no replies, replies.commentid and replies.time will be null.
The query (with a "self left join") would look like this (I removed some fields that seem unnecessary here):
select
regular.time as regulartime,
regular.commentid as regularid,
replies.commentid as repliesid,
replies.time as repliestime
from comments regular
left join comments replies on replies.parentid = regular.commentid
where regular.parentid = 0
order by regular.time desc, replies.time asc
Following your example, you should get
time3 commentid3 null null
time2 commentid2 commentid4 time4 (this one is a reply)
time2 commentid2 commentid5 time5 (this one also reply)
time1 commentid1 null null

To get both layers of the data, you will need a UNION of just the top-most layer of the original comment, and another for any POSSIBLE replies. The first column of the first part of the query will hold a 1 or 2 for sorting purposes. This will be used to float the original post to the top of the group for a given question... then, all replies will show in natural order after that.
Also, to retain the proper grouping by original date/time, I am preserving the original post comment time with the '2' CommentType records so they do stay grouped with exact same original time start basis, but grab the actual comment and time of the RESPONSE (alias "r") for their respective sorting.
select
PreQuery.*
from
( select
'1' as CommentType,
c.Time as OriginalTime,
c.CommentID StartingCommentID,
c.Comment,
c.Time as LastTime,
c.CommentID as EndCommentID
from
comments c
where
c.ParentID = 0
UNION ALL
select
'2' as CommentType,
c.Time as OriginalTime,
c.CommentID StartingCommentID,
r.Comment,
r.Time as LastTime,
r.CommentID as EndCommentID
from
comments c
join comments r
on c.CommentID = r.ParentID
where
c.ParentID = 0 ) PreQuery
order by
PreQuery.OriginalTime DESC,
PreQuery.StartingCommentID,
PreQuery.CommentType,
PreQuery.LastTime
This should give you the results I think you are looking for (slightly modified)
CommentType OriginalTime StartingCommentID Comment LastTime EndCommentID
1 Time3 ID3 Comm3 Time3 ID3 <-- ID 3 IS the start
1 Time2 ID2 Comm2 Time2 ID2 <-- ID 2 is the start of next
2 Time2 ID2 Comm4 Time4 ID4 <- ID4 is reply to orig ID2
2 Time2 ID2 Comm5 Time5 ID5 <- another reply to ID2
1 Time1 ID1 Comm1 Time1 ID1 <-- start of new comment ID1
So, for all rows, the 2nd and 3rd columns will always represent the parent ID that is starting the first comment... and for those with comment type = 1, the comment, last time and end comment ID is the actual content from the starting comment. For comment type = 2, the final comment, last time and end comment will be the ID of the RESPONSE record.
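The same ordering idea can also be sketched without a UNION, by joining each row to its parent and sorting on the parent's time. This is only a sketch under assumptions: it uses the column names from the question (commentid, comment, time, parentid), and it assumes no row has commentid = 0, so top-level comments find no parent in the join.

```sql
SELECT c.commentid, c.comment, c.time, c.parentid
FROM comments c
LEFT JOIN comments p
       ON p.commentid = c.parentid        -- p is NULL for top-level comments
ORDER BY COALESCE(p.time, c.time) DESC,   -- newest thread first
         COALESCE(p.commentid, c.commentid), -- keep each thread together on time ties
         (c.parentid <> 0),               -- 0 for the parent, 1 for replies
         c.time;                          -- replies oldest to newest
```

With the sample data from the question this should list comment 3, then comment 2 followed by replies 4 and 5, then comment 1.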

Related

Select max date by grouping?

PLEASE will someone help? I've put HOURS into this silly, stupid problem. This Stack Overflow post is EXACTLY my question, and I have tried BOTH suggested solutions to no avail.
Here are MY specifics. I have extracted 4 records from my actual database, and excluded no fields:
master_id date_sent type mailing response
00001 2015-02-28 00:00:00 PHONE NULL NULL
00001 2015-03-13 14:45:20 EMAIL ThankYou.html NULL
00001 2015-03-13 14:34:43 EMAIL ThankYou.html NULL
00001 2015-01-11 00:00:00 EMAIL KS_PREVIEW TRUE
00001 2015-03-23 21:42:03 EMAIL MailChimp Update #2 NULL
(sorry about the alignment of the columns.)
I want to get the most recent mailing and date_sent for each master_id. (My extract is of only one master_id to make this post simple.)
So I run this query:
SELECT master_id,date_sent,mailing
FROM contact_copy
WHERE type="EMAIL"
and get the expected result:
master_id date_sent mailing
1 3/13/2015 14:45:20 ThankYou.html
1 3/13/2015 14:34:43 ThankYou.html
1 1/11/2015 0:00:00 KS_PREVIEW
1 3/23/2015 21:42:03 MailChimp Update #2
BUT, when I add this simple aggregation to get the most recent date:
SELECT master_id,max(date_sent),mailing
FROM contact_copy
WHERE type="EMAIL"
group BY master_id
;
I get an UNEXPECTED result:
master_id max(date_sent) mailing
00001 2015-03-23 21:42:03 ThankYou.html
So my question: why is it returning the WRONG MAILING?
It's making me nuts! Thanks.
By the way, I'm not a developer, so sorry if I'm breaking some etiquette rule of asking. :)
That's because when you use GROUP BY, every column in the SELECT list has to be either listed in the GROUP BY or wrapped in an aggregate function, and mailing is neither.
You should use a subquery or a join to make it work:
SELECT cc.master_id, cc.date_sent, cc.mailing
FROM contact_copy cc
JOIN
( SELECT master_id, max(date_sent) AS date_sent
FROM contact_copy
WHERE type="EMAIL"
group BY master_id
) result
ON cc.master_id = result.master_id AND cc.date_sent = result.date_sent
WHERE cc.type="EMAIL"
You're getting an "unexpected" result because of a MySQL specific extension to the GROUP BY functionality. The result you're getting is actually expected, according to the MySQL Reference Manual.
Ref: https://dev.mysql.com/doc/refman/5.5/en/group-by-handling.html
Other database engines would reject your query as invalid, with an error along the lines of "non-aggregate expressions included in the SELECT list not included in the GROUP BY".
We can get MySQL to behave like other databases (and return an error for that query) if we include ONLY_FULL_GROUP_BY in the SQL mode.
Ref: https://dev.mysql.com/doc/refman/5.5/en/sql-mode.html#sqlmode_only_full_group_by
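As a rough sketch of how that could be tried out, the mode can be set for the current session only and the original query re-run. Note the assumption: this statement replaces the session's whole sql_mode rather than appending to it, which is fine for a quick test but drops any other modes that were active.

```sql
-- Assumption: overwriting the session sql_mode is acceptable for this test.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- With the mode on, MySQL should reject this query, because mailing is
-- neither in the GROUP BY clause nor inside an aggregate function.
SELECT master_id, MAX(date_sent), mailing
FROM contact_copy
WHERE type = 'EMAIL'
GROUP BY master_id;
```

Seeing the error instead of a silently mismatched mailing value makes this class of mistake much easier to catch.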
To get the result you are looking for...
If the (master_id,type,date_sent) tuple is UNIQUE in contact_copy (that is, if for given values of master_id and type, there will be no "duplicate" values of date_sent), we could use a JOIN operation to retrieve the specified result.
First, we write a query to get the "maximum" date_sent for a given master_id and type. For example:
SELECT mc.master_id
, mc.type
, MAX(mc.date_sent) AS max_date_sent
FROM contact_copy mc
WHERE mc.master_id = '00001'
AND mc.type = 'EMAIL'
-- GROUP BY keeps the query valid under ONLY_FULL_GROUP_BY
GROUP BY mc.master_id, mc.type
To retrieve the entire row associated with that "maximum" date_sent, we can use that query as an inline view. That is, wrap the query text in parens, assign an alias, and then reference that as if it were a table, for example:
SELECT c.master_id
, c.date_sent
, c.mailing
FROM ( SELECT mc.master_id
, mc.type
, MAX(mc.date_sent) AS max_date_sent
FROM contact_copy mc
WHERE mc.master_id = '00001'
AND mc.type = 'EMAIL'
-- GROUP BY keeps the inline view valid under ONLY_FULL_GROUP_BY
GROUP BY mc.master_id, mc.type
) m
JOIN contact_copy c
ON c.master_id = m.master_id
AND c.type = m.type
AND c.date_sent = m.max_date_sent
Note that if there are multiple rows that have the same values of master_id,type and date_sent, there is potential to return more than one row. You could add a LIMIT 1 clause to guarantee that you return only one row; which of those rows is returned is indeterminate, without an ORDER BY clause before the LIMIT clause.
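For a single master_id, the ORDER BY plus LIMIT 1 idea mentioned above could look something like the sketch below. The '00001' literal is taken from the sample data in the question, and using mailing as a tie-breaker is only an assumption; the result is deterministic only if the ORDER BY columns together identify a single row.

```sql
SELECT c.master_id, c.date_sent, c.mailing
FROM contact_copy c
WHERE c.master_id = '00001'      -- value from the question's sample data
  AND c.type = 'EMAIL'
ORDER BY c.date_sent DESC,       -- most recent first
         c.mailing               -- assumed tie-breaker for equal date_sent
LIMIT 1;
```

This form does not extend to "one row per master_id" across the whole table; for that, the join against the inline view is still needed.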

MySQL ORDER BY Column = value AND distinct?

I'm getting grey hair by now...
I have a table like this.
ID - Place - Person
1 - London - Anna
2 - Stockholm - Johan
3 - Gothenburg - Anna
4 - London - Nils
And I want to get the result where all the different persons are included, but I want to choose which Place to order by.
For example. I want to get a list where they are ordered by LONDON and the rest will follow, but distinct on PERSON.
Output like this:
ID - Place - Person
1 - London - Anna
4 - London - Nils
2 - Stockholm - Johan
Tried this:
SELECT ID, Person
FROM users
ORDER BY FIELD(Place,'London'), Person ASC
But it gives me:
ID - Place - Person
1 - London - Anna
4 - London - Nils
3 - Gothenburg - Anna
2 - Stockholm - Johan
And I really don't want Anna, or any person, to be in the result more than once.
This is one way to get the specified output, but this uses MySQL specific behavior which is not guaranteed:
SELECT q.ID
, q.Place
, q.Person
FROM ( SELECT IF(p.Person<=>@prev_person,0,1) AS r
, @prev_person := p.Person AS person
, p.Place
, p.ID
FROM users p
CROSS
JOIN (SELECT @prev_person := NULL) i
ORDER BY p.Person, !(p.Place<=>'London'), p.ID
) q
WHERE q.r = 1
ORDER BY !(q.Place<=>'London'), q.Person
This query uses an inline view to return all the rows in a particular order, by Person, so that all of the 'Anna' rows are together, followed by all the 'Johan' rows, etc. The set of rows for each person is ordered by, Place='London' first, then by ID.
The "trick" is to use a MySQL user variable to compare the values from the current row with values from the previous row. In this example, we're checking if the 'Person' on the current row is the same as the 'Person' on the previous row. Based on that check, we return a 1 if this is the "first" row we're processing for a person, otherwise we return a 0.
The outermost query processes the rows from the inline view, and excludes all but the "first" row for each Person (the 0 or 1 we returned from the inline view.)
(This isn't the only way to get the resultset. But this is one way of emulating analytic functions which are available in other RDBMS.)
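One such alternative, sketched here with a correlated subquery instead of user variables, picks a single ID per person and keeps only that row. It assumes ID is unique and Place is never NULL; the aliases u and u2 are just for illustration.

```sql
SELECT u.ID, u.Place, u.Person
FROM users u
WHERE u.ID = ( SELECT u2.ID
               FROM users u2
               WHERE u2.Person = u.Person
               ORDER BY (u2.Place <> 'London'),  -- 0 for London rows, so they come first
                        u2.ID
               LIMIT 1 )                         -- one chosen row per person
ORDER BY (u.Place <> 'London'), u.Person;
```

For the sample table this should keep rows 1 (Anna), 4 (Nils) and 2 (Johan), with the London rows listed first. It runs the subquery once per row, so it may be slow on large tables.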
For comparison, in databases that support analytic (window) functions, which includes MySQL 8.0 and later, we could use SQL something like this:
SELECT s.ID
, s.Place
, s.Person
FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY t.Person ORDER BY
CASE WHEN t.Place='London' THEN 0 ELSE 1 END, t.ID) AS rn
, t.ID
, t.Place
, t.Person
FROM users t
) s
WHERE s.rn=1
ORDER BY CASE WHEN s.Place='London' THEN 0 ELSE 1 END, s.Person
Followup
At the beginning of the answer, I referred to MySQL behavior that was not guaranteed. I was referring to the usage of MySQL User-Defined variables within a SQL statement.
Excerpts from MySQL 5.5 Reference Manual http://dev.mysql.com/doc/refman/5.5/en/user-variables.html
"As a general rule, other than in SET statements, you should never assign a value to a user variable and read the value within the same statement."
"For other statements, such as SELECT, you might get the results you expect, but this is not guaranteed."
"the order of evaluation for expressions involving user variables is undefined."
Try this:
SELECT ID, Place, Person
FROM users
GROUP BY Person
ORDER BY FIELD(Place,'London') DESC, Person ASC;
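You want to use group by instead of distinct:
SELECT MIN(ID) AS ID, Person
FROM users
GROUP BY Person
ORDER BY MAX(FIELD(Place, 'London')) DESC, Person ASC;
GROUP BY Person returns one row per person, much like SELECT DISTINCT Person. But you are allowed to mention other fields, through aggregates, in clauses such as HAVING and ORDER BY. Here MIN(ID) picks one ID per person, and MAX(FIELD(Place, 'London')) is 1 for anyone who has a London row, so DESC puts those people first.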

Can SQL query do this?

I have a table "audit" with a "description" column, a "record_id" column and a "record_date" column. I want to select only those records where the description matches one of two possible strings (say, LIKE "NEW%" OR LIKE "ARCH%") where the record_id in each of those two matches each other. I then need to calculate the difference in days between the record_date of each other.
For instance, my table may contain:
id description record_id record_date
1 New Sub 1000 04/14/13
2 Mod 1000 04/14/13
3 Archived 1000 04/15/13
4 New Sub 1001 04/13/13
I would want to select only rows 1 and 3 and then calculate the number of days between 4/15 and 4/14 to determine how long it took to go from New to Archived for that record (1000). Both a New and an Archived entry must be present for any record for it to be counted (I don't care about ones that haven't been archived). Does this make sense and is it possible to calculate this in a SQL query? I don't know much beyond basic SQL.
I am using MySQL Workbench to do this.
The following is untested, but it should work assuming that any given record_id can only show up once with "New Sub" and once with "Archived":
select n.id as new_id
,a.id as archive_id
,record_id
,n.record_date as new_date
,a.record_date as archive_date
,DateDiff(a.record_date, n.record_date) as days_between
from audit n
join audit a using(record_id)
where n.description = 'New Sub'
and a.description = 'Archived';
I changed from OR to AND, because I thought you wanted only the number of days between records that were actually archived.
My test was in SQL Server, so the syntax might need to be tweaked slightly for your database (especially the DATEDIFF function), but you can select from the same table twice, one side grabbing the 'new' rows and one grabbing the 'archived' rows, then link them by record_id:
SELECT
newsub.id,
newsub.description,
newsub.record_date,
arc.id,
arc.description,
arc.record_date,
DATEDIFF(day, newsub.record_date, arc.record_date) AS DaysBetween
FROM
foo1 arc
, foo1 newsub
WHERE
(newsub.description LIKE 'NEW%')
AND
(arc.description LIKE 'ARC%')
AND
(newsub.record_id = arc.record_id)
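A MySQL flavoured version of the same self-join might look like the sketch below. It assumes the table is called audit, as in the question, and that record_date is stored as a DATE or DATETIME column; if it is stored as text such as 04/14/13, it would need converting first. In MySQL, DATEDIFF takes two arguments and returns the first date minus the second, in days.

```sql
SELECT n.record_id,
       n.record_date AS new_date,
       a.record_date AS archive_date,
       DATEDIFF(a.record_date, n.record_date) AS days_between
FROM audit n
JOIN audit a
  ON a.record_id = n.record_id
WHERE n.description LIKE 'New%'     -- the "new" side of the pair
  AND a.description LIKE 'Arch%';   -- the "archived" side of the pair
```

Because it is an inner join, records that have a New row but no Archived row (such as 1001 in the sample) drop out on their own. Whether LIKE matches case-insensitively depends on the column's collation.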

mysql first record retrieval

While very easy to do in Perl or PHP, I cannot figure out how to use MySQL alone to extract the first unique occurrence of a record.
For example, given the following table:
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-08-12 09:22:37 2000
John 2010-09-13 10:22:22 500
Sue 2010-09-01 09:07:21 1000
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
I would like to extract the first record for each individual in asc time order.
After sorting the table in ascending time order, I need to extract the first unique record by name.
So the output would be :
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
Is this doable in an easy fashion with mySQL?
I think that something along the lines of
select name, date, time, sale from mytable order by date, time group by name;
will get you what you're looking for
you need to perform a groupwise max or groupwise min
see below or http://pastie.org/973117 for an example
select
u.user_id,
u.username,
latest.comment_id
from
users u
left outer join
(
select
max(comment_id) as comment_id,
user_id
from
user_comment
group by
user_id
) latest on u.user_id = latest.user_id;
In databases, there really is no "first" or "last" record; think of each record as its own, non-positional entity in the table. The only positions they have are when you give them one, say, using ORDER BY.
This will give you what you want. It might not be efficient, but it works.
select Name, Date, Time, Sale from
(select Name, Date, Time, Sale from MyTable
order by Date asc, Time asc) MyTable_subquery_name
group by Name
Note: MyTable_subquery_name is just a dummy name for the subquery. MySQL will give the error ERROR 1248 (42000): Every derived table must have its own alias without it.
If only GROUP BY and ORDER BY were commutative operations, then this wouldn't have to be a subquery.
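For the table in the question, a groupwise-minimum join that does not depend on the row order inside a subquery could be sketched as follows. It assumes the table is named MyTable (as in the answer above), that Date is a DATE column and Time is a TIME column, and it uses MySQL's two-argument TIMESTAMP() to combine them; first_ts is a hypothetical alias.

```sql
SELECT t.Name, t.Date, t.Time, t.Sale
FROM MyTable t
JOIN ( SELECT Name,
              MIN(TIMESTAMP(Date, Time)) AS first_ts  -- earliest moment per name
       FROM MyTable
       GROUP BY Name ) f
  ON f.Name = t.Name
 AND TIMESTAMP(t.Date, t.Time) = f.first_ts           -- keep only that earliest row
ORDER BY f.first_ts;
```

If a person has two rows with exactly the same earliest date and time, both rows come back.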

Help diagnose bizarre MySQL query behavior

I have a very specific query that is acting up and I could use any help at all with debugging it.
There are 4 tables involved in this query.
Transaction_Type
Transaction_ID (primary)
Transaction_amount
Transaction_Type
Transaction
Transaction_ID (primary)
Timestamp
Purchase
Transaction_ID
Item_ID
Item
Item_ID
Client_ID
Let's say there is a transaction in which someone pays $20 in cash and $0 in credit; it inserts two rows into the table.
//row 1
Transaction_ID: 1
Transaction_amount: 20.00
Transaction_type: cash
//row 2
Transaction_ID: 1
Transaction_amount: 0.00
Transaction_type: credit
here is the specific query:
SELECT
tt.Transaction_Amount, tt.Transaction_ID
FROM
ItemTracker_dbo.Transaction_Type tt
JOIN
ItemTracker_dbo.Transaction t
ON
tt.Transaction_ID = t.Transaction_ID
JOIN
ItemTracker_dbo.Purchase p
ON
p.Transaction_ID = tt.Transaction_ID
JOIN
ItemTracker_dbo.Item i
ON
i.Item_ID = p.Item_ID
WHERE
t.TimeStamp >= "2010-01-06 00:00:00" AND t.TimeStamp <= "2010-01-06 23:59:59"
AND
tt.Transaction_Format IN ('cash', 'credit')
AND
i.Client_ID = 3
when I execute this query, it returns 4 rows for a specific transaction. (it should be 2)
When I remove ALL where clauses and insert WHERE tt.Transaction_ID = problematicID it only returns two.
EDIT:::::
still repeats upon changing date range
The kicker:
When I change the initial daterange it only returns two rows for that specific transaction_id.
::::
Is it the way I use join? that's all I can think of...
EDIT: This is the problem
In purchase, two separate purchase_IDs can have the same transaction_ID (purchase_ID breaks down specific item sales).
There are duplicate Transaction_ID rows in purchase_ID
We need to see all the data in all the tables to be able to know where the problem is. However, if the joins are the problem, it is because one of your tables has two rows where you think it has only one.
There's a problem with your schema. You have rows with the same transaction_id, which is the primary key. I would think they couldn't be marked primary in that database. With two rows with the same id, that could cause unexpected extra rows to come back from the join(s).
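Given the asker's edit (Purchase can hold several rows for one Transaction_ID), one way to stop those rows from multiplying the result is to test for a matching purchase with EXISTS instead of joining to it. This is a sketch that reuses the table and column names from the query in the question, including tt.Transaction_Format; it has not been run against the real schema.

```sql
SELECT tt.Transaction_Amount, tt.Transaction_ID
FROM ItemTracker_dbo.Transaction_Type tt
JOIN ItemTracker_dbo.Transaction t
  ON tt.Transaction_ID = t.Transaction_ID
WHERE t.TimeStamp >= '2010-01-06 00:00:00'
  AND t.TimeStamp <= '2010-01-06 23:59:59'
  AND tt.Transaction_Format IN ('cash', 'credit')
  -- EXISTS only checks that at least one qualifying purchase exists,
  -- so several Purchase rows no longer duplicate the payment rows.
  AND EXISTS ( SELECT 1
               FROM ItemTracker_dbo.Purchase p
               JOIN ItemTracker_dbo.Item i
                 ON i.Item_ID = p.Item_ID
               WHERE p.Transaction_ID = tt.Transaction_ID
                 AND i.Client_ID = 3 );
```

With the cash and credit example from the question, this should return the two payment rows once each, however many items the transaction contains.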