generate index columns for "ORDER BY x, y" - mysql

I use this query to summarize the contents of the table export_blocks, aggregated by user and date, and save it as a new table:
CREATE TABLE export_days
SELECT user_id, DATE(submitted) AS date_str
FROM export_blocks
GROUP BY user_id, DATE(submitted)
ORDER BY user_id, submitted
How can I, for each user_id, get an incremental index for the dates of that user's records? The indices should start at 1 for each user, following the ORDER BY. I.e. I'd like to generate the date_index column of the output below using SQL:
user_id date_str date_index
brian 2014-06-10 1
brian 2014-06-12 2
brian 2014-06-15 3
louis 2014-06-08 1
louis 2014-06-16 2
lucy 2013-11-15 1
(etc...)
I've been trying https://stackoverflow.com/a/5493480/1297830 but I cannot get it to work. It stops the counters prematurely, giving numbers that are too low for id_no and date_no.

Basing it on your sample query, you can use simple (dependent) subqueries to get the result:
SELECT id, date_str,
(SELECT COUNT(DISTINCT id)+1 FROM mytable WHERE id < a.id) id_no,
(SELECT COUNT(id)+1 FROM mytable WHERE id = a.id AND date_str < a.date_str) date_no
FROM mytable a
ORDER BY id;
...or you could do a couple of self joins:
SELECT a.id, a.date_str,
COUNT(DISTINCT b.id)+1 id_no,
COUNT(DISTINCT c.date_str)+1 date_no
FROM mytable a
LEFT JOIN mytable b ON a.id > b.id
LEFT JOIN mytable c ON a.id = c.id AND a.date_str > c.date_str
GROUP BY a.id, a.date_str
ORDER BY a.id, a.date_str;
An SQLfiddle showing both in action.
Sadly, neither is a very performant solution, but since MySQL versions before 8.0 lack analytical (i.e. ranking) functions, the options are limited. Using user variables to do the ranking is also an option; however, they're notoriously tricky to use and aren't portable, so I'd go there only if performance demands it.
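For readers on MySQL 8.0 or later, which added window functions, the per-user index can be written with ROW_NUMBER(). The following is only a sketch: it runs the SQL through Python's sqlite3 module (SQLite 3.25+ is assumed for window functions) as a stand-in for MySQL, and the sample rows are made up to match the output in the question.

```python
import sqlite3

# Hypothetical sample data shaped like export_blocks (user_id, submitted).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE export_blocks (user_id TEXT, submitted TEXT)")
conn.executemany(
    "INSERT INTO export_blocks VALUES (?, ?)",
    [
        ("brian", "2014-06-10 09:00:00"),
        ("brian", "2014-06-10 17:30:00"),  # second block on the same day
        ("brian", "2014-06-12 08:15:00"),
        ("brian", "2014-06-15 11:00:00"),
        ("louis", "2014-06-08 10:00:00"),
        ("louis", "2014-06-16 10:00:00"),
        ("lucy", "2013-11-15 12:00:00"),
    ],
)

# Reduce to one row per (user, day) first, then number the days.
# ROW_NUMBER() restarts at 1 for every user_id partition.
query = """
SELECT user_id,
       date_str,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date_str) AS date_index
FROM (SELECT DISTINCT user_id, DATE(submitted) AS date_str
      FROM export_blocks) AS days
ORDER BY user_id, date_str
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The DISTINCT subquery plays the role of the GROUP BY in the original query, so multiple rows on the same day still produce a single index value.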

Based on Joachim's excellent answer, I worked out the solution. It also works when there are multiple rows per day for each user.
CREATE TABLE export_days
SELECT
user_id, DATE(submitted) AS date_str,
-- compare dates rather than timestamps, so earlier rows on the same day are never counted
(SELECT COUNT(DISTINCT DATE(submitted))+1 FROM export_blocks WHERE user_id = a.user_id AND DATE(submitted) < DATE(a.submitted)) date_no
FROM export_blocks a
GROUP BY user_id, DATE(submitted)
ORDER BY user_id, date_str

Related

When will recursive query stop in this case?

Given this table description.
I have written a query to find Users who logged in for 5 or more consecutive days.
WITH RECURSIVE
rec_t AS
(SELECT id, login_date, 1 AS days FROM Logins
UNION ALL
SELECT l.id, l.login_date, rec_t.days+1 FROM rec_t
INNER JOIN Logins l
ON rec_t.id = l.id AND DATE_ADD(rec_t.login_date, INTERVAL 1 DAY) = l.login_date
)
SELECT * FROM Accounts
WHERE id IN
(SELECT DISTINCT id FROM rec_t WHERE days = 5)
ORDER BY id
Code explanation:
For every id and login date, the recursive step joins the CTE to Logins on the same id and login_date + 1 day.
The "days" column is incremented by 1 every time the same id is matched on the next day.
The problem:
Although the query works fine, I don't know where I am telling the query to stop the recursion. There is no WHERE clause in the recursive CTE definition. The inner join might be what ends it, once there are no more login_date values to match on, but I am not certain that is the case.
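A recursive CTE stops when one pass of the recursive member produces no new rows, so the inner join is indeed the stop condition here: once no login exists on the following day, nothing more is added. A small sketch of this behaviour, run through Python's sqlite3 module with made-up rows; SQLite has no DATE_ADD, so date(..., '+1 day') is substituted for it as an assumption of this demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logins (id INTEGER, login_date TEXT)")
conn.executemany(
    "INSERT INTO Logins VALUES (?, ?)",
    [
        (1, "2020-05-01"), (1, "2020-05-02"), (1, "2020-05-03"),  # 3 consecutive days
        (2, "2020-05-01"), (2, "2020-05-03"),                      # gap on 05-02
    ],
)

query = """
WITH RECURSIVE rec_t AS (
    SELECT id, login_date, 1 AS days FROM Logins
    UNION ALL
    SELECT l.id, l.login_date, rec_t.days + 1
    FROM rec_t
    INNER JOIN Logins l
        ON rec_t.id = l.id
       AND date(rec_t.login_date, '+1 day') = l.login_date
)
SELECT id, login_date, days FROM rec_t ORDER BY id, login_date, days
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# The longest chain has 3 links because user 1 has no login on 2020-05-04;
# the join finds no match for that row and the recursion ends on its own.
longest = max(days for _, _, days in rows)
print(longest)
```

User 2 never gets past days = 1, because the join finds no row for 2020-05-02.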

Find users with most number of common pages they liked

I am trying to find pairs of users who liked the same pages and list the ones who have the most common page likes at the top.
For simplicity I am considering the following table schema
Likes (LikeID, UserID)
LikeDetail (LikeDetailID, LikeID, PageID)
I am trying to find pairs of users with most number of common page likes ordered descending. E.g User1 and User2 have liked 3 pages in common.
I would like the result set of the query to be
UserID1 UserID2 NoOfCommonLikes
2 3 10
4 3 8
1 5 4
I am guessing it would need aggregation, a join and aliases; however, I needed to alias the same table twice using AS, which did not work for me.
Any tip would be appreciated in MySQL, or SQL Server.
In SQL Server and MySQL 8+, you can use a CTE which JOINs the Likes and LikeDetail table, and then self-JOIN that where PageID is the same but UserID is not, and then grouping on the two userID values:
WITH CTE AS
(SELECT l.UserId, d.PageID
FROM Likes l
JOIN LikeDetail d ON d.LikeID = l.likeID)
SELECT l1.UserId AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes
FROM CTE l1
JOIN CTE l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID
GROUP BY l1.UserID, l2.UserID
ORDER BY COUNT(*) DESC
In versions of MySQL prior to 8.0, you need to repeat the CTE definition as a derived table on both sides of the JOIN to achieve the same result:
SELECT l1.UserId AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes
FROM (SELECT l.UserId, d.PageID
FROM Likes l
JOIN LikeDetail d ON d.LikeID = l.likeID) l1
JOIN (SELECT l.UserId, d.PageID
FROM Likes l
JOIN LikeDetail d ON d.LikeID = l.likeID) l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID
GROUP BY l1.UserID, l2.UserID
ORDER BY COUNT(*) DESC
Note that we use < in the UserID comparison rather than != to avoid getting duplicate rows (e.g. for (UserID1, UserID2) = (1, 2) and (UserID1, UserID2) = (2, 1)).
I've made a small demo on dbfiddle that demonstrates the queries.
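As a rough illustration of how the self-join counts shared pages, here is the CTE version run through Python's sqlite3 module. The rows are invented for the demo, and the extra ORDER BY columns are only added to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Likes (LikeID INTEGER PRIMARY KEY, UserID INTEGER);
CREATE TABLE LikeDetail (LikeDetailID INTEGER PRIMARY KEY, LikeID INTEGER, PageID INTEGER);
INSERT INTO Likes VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO LikeDetail (LikeID, PageID) VALUES
    (1, 10), (1, 20), (1, 30),   -- user 1 likes pages 10, 20, 30
    (2, 10), (2, 20),            -- user 2 likes pages 10, 20
    (3, 20);                     -- user 3 likes page 20
""")

query = """
WITH CTE AS (
    SELECT l.UserID, d.PageID
    FROM Likes l
    JOIN LikeDetail d ON d.LikeID = l.LikeID
)
SELECT l1.UserID AS UserID1, l2.UserID AS UserID2, COUNT(*) AS NoOfCommonLikes
FROM CTE l1
JOIN CTE l2 ON l2.PageID = l1.PageID AND l2.UserID < l1.UserID
GROUP BY l1.UserID, l2.UserID
ORDER BY COUNT(*) DESC, UserID1, UserID2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each pair appears once, with the larger UserID in the first column, because of the < comparison in the join condition.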

How to Optimize mysql query the brings huge quantity of rows?

I really need your help. I'm doing an assignment for my university, and before coming here I read a lot of the MySQL documentation and searched extensively, but none of it helped me with my SQL query. I have this query:
SELECT a.nome, COUNT(*)
FROM publ p JOIN auth a on p.pubid = a.pubid
WHERE p.pubid IN (SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3) -- besides 3, I also have to run this with the values 2, 4 and 5
GROUP BY a.nome -- in separate queries.
ORDER BY COUNT(*) DESC, a.nome ASC
I tried adding an index on the column used in the WHERE clause, but the query takes too long and I never get the results. What can I do to make the query return results faster? Thank you for the help.
I would create these indexes and reorder the query:
CREATE INDEX publ_pubid ON publ(pubid);
CREATE INDEX auth_pubid ON auth(pubid, nome);
SELECT a.nome, COUNT(*)
FROM (
SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3
) L
LEFT JOIN publ p on L.pubid=p.pubid
JOIN auth a on p.pubid = a.pubid
GROUP BY a.nome
ORDER BY COUNT(*) DESC, a.nome ASC;
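To check that the reordered query is a drop-in replacement, the sketch below runs both forms through Python's sqlite3 module on a few invented rows and compares the results. It says nothing about MySQL's execution plan or timing; it only confirms that the derived-table join returns the same rows as the IN version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publ (pubid INTEGER);
CREATE TABLE auth (pubid INTEGER, nome TEXT);
CREATE INDEX publ_pubid ON publ (pubid);
CREATE INDEX auth_pubid ON auth (pubid, nome);
INSERT INTO publ VALUES (1), (2), (3);
INSERT INTO auth VALUES
    (1, 'Ana'), (1, 'Bruno'),               -- 2 authors
    (2, 'Ana'),                             -- 1 author
    (3, 'Ana'), (3, 'Bruno'), (3, 'Carla'); -- 3 authors, excluded by HAVING
""")

in_query = """
SELECT a.nome, COUNT(*)
FROM publ p JOIN auth a ON p.pubid = a.pubid
WHERE p.pubid IN (SELECT pubid FROM auth GROUP BY pubid HAVING COUNT(*) < 3)
GROUP BY a.nome
ORDER BY COUNT(*) DESC, a.nome ASC
"""

join_query = """
SELECT a.nome, COUNT(*)
FROM (SELECT pubid FROM auth GROUP BY pubid HAVING COUNT(*) < 3) L
JOIN publ p ON L.pubid = p.pubid
JOIN auth a ON p.pubid = a.pubid
GROUP BY a.nome
ORDER BY COUNT(*) DESC, a.nome ASC
"""

in_rows = conn.execute(in_query).fetchall()
join_rows = conn.execute(join_query).fetchall()
print(in_rows)
print(join_rows)
```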

MySQL group and sum with joined tables

I've got a pretty tricky problem with MySQL.
I have two tables with a one-to-many relation (only the relevant columns are shown below):
Table A (campaigns):
id | channel_type | date
Table B (budgets):
id | campaign_id | budget
I need a single query to fetch the following result:
Campaign count by channel_type
Sum of all budgets that are related to found campaigns.
I need to filter the results by columns in the campaigns table (e.g. WHERE campaigns.date > '2014-05-01').
I have tried the following approach:
SELECT channel_type, COUNT(*) cnt,
(SELECT SUM(budget) FROM budgets WHERE budgets.campaign_id = campaigns.id)
as budget
FROM campaigns
WHERE campaigns.date >= 'some-value'
AND [more conditions]
GROUP BY campaigns.channel_type
But this of course fails miserably: because of the GROUP BY, I am only getting the budget for the first campaigns.id in each channel_type.
Any tips (and solution) would be really appreciated!
TIA
1. Get the total budget per campaign from the budgets table using GROUP BY campaign_id. This will be a subquery; give it a name, for example A.
2. Get the campaign counts from campaigns using GROUP BY channel_type and WHERE date >= 'some-value'.
3. Use steps 2 and 1 (the subquery will act as a table) in the final query and you will get the results.
You can post the schema and then I can check.
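One way the steps above could be combined is sketched below, run through Python's sqlite3 module. The rows, the date cutoff and the alias b for the budget subquery are all made up for the demo. Summing budgets per campaign first keeps the join at one row per campaign, so COUNT(*) stays a campaign count even when a campaign has several budget rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaigns (id INTEGER PRIMARY KEY, channel_type TEXT, date TEXT);
CREATE TABLE budgets (id INTEGER PRIMARY KEY, campaign_id INTEGER, budget INTEGER);
INSERT INTO campaigns VALUES
    (1, 'email',  '2014-05-10'),
    (2, 'email',  '2014-05-20'),
    (3, 'social', '2014-06-01'),
    (4, 'email',  '2014-04-01');   -- filtered out by the date condition
INSERT INTO budgets (campaign_id, budget) VALUES
    (1, 100), (1, 50),             -- campaign 1 has two budget rows
    (2, 30), (3, 70), (4, 999);
""")

query = """
SELECT c.channel_type,
       COUNT(*) AS cnt,
       COALESCE(SUM(b.total_budget), 0) AS budget
FROM campaigns c
LEFT JOIN (SELECT campaign_id, SUM(budget) AS total_budget
           FROM budgets
           GROUP BY campaign_id) b ON b.campaign_id = c.id
WHERE c.date > '2014-05-01'
GROUP BY c.channel_type
ORDER BY c.channel_type
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps campaigns that have no budget rows at all; COALESCE turns their missing sum into 0.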
I think this should work:
SELECT channel_type, COUNT(*) cnt,
(SELECT SUM(t2.budget) FROM budgets t2 WHERE t2.campaign_id IN (
SELECT t3.id FROM campaigns t3 WHERE t3.channel_type = t1.channel_type))
AS budget
FROM campaigns t1
WHERE t1.date >= 'some-value'
AND [more conditions]
GROUP BY t1.channel_type
see this fiddle
I've found a working solution.
Here's the working query:
SELECT channel_type, SUM(budget) AS budget, COUNT(DISTINCT found_campaigns.id) AS count FROM
(SELECT * FROM campaigns WHERE [conditions]) AS found_campaigns
LEFT JOIN budgets ON budgets.campaign_id = found_campaigns.id
GROUP BY channel_type

grouping records conditionally

My Google-fu is coming up short on this one. I've got a table of transactions, like this:
id  email               source   amount  timestamp
1   daniel#example.com  vendor   10      2014-03-10 23:34:40
2   john#example.com    website  15      2014-03-11 13:30:00
3   mary#example.com    website  50      2014-03-11 17:30:00
4   daniel#example.com  website  65      2014-03-13 20:06:30
5   mary#example.com    vendor   10      2014-03-14 16:20:30
I want to be able to group these by email, but only for users who:
A) came in through the 'vendor' source initially, and
B) also made a transaction through the 'website' source.
So for the above sample data, I would want this:
email total_amount transactions
daniel#example.com 75 2
Mary would not be included because her first transaction was through 'website', and not 'vendor'. John would not be included because he did not have a transaction through the vendor at all.
EDIT:
Less ideal, but still useful, would be this result set:
email total_amount transactions
daniel#example.com 75 2
mary#example.com 60 2
Where Mary and Daniel are both included because they both came in through the 'vendor' source in at least one transaction.
SELECT A.Email, sum(B.Amount) as Total_Amount, count(B.time) as Transactions
FROM tableName A
INNER join tableName B
on A.Email=B.Email
AND A.source='vendor'
Group By A.Email
The requirements are a bit unclear: you initially indicate that users must come in through vendor first, but then you relax that later by adding Mary.
http://sqlfiddle.com/#!2/bb4f9/1/0
If date/timestamps are important, add an AND clause for A.Time <= B.Time, aggregate A.Amount and A.Time, and add those in, like this:
SELECT A.Email, sum(B.Amount)+ sum(A.Amount) as Total_Amount, count(B.time)+count(A.Time) as Transactions
FROM tableName A
INNER join tableName B
on A.Email=B.Email
AND A.source='vendor'
and A.Time<=B.Time
Group By A.Email
But this assumes a vendor entry will only occur once for each email.
So this solution first finds a vendor entry (if there is more than one for an email address, this will not return accurate counts), then finds any entries for the same email address with a source of website occurring after that vendor entry, and aggregates the totals for that email, adding in the vendor entry totals. While it works for the sample data provided, it may not work as desired if multiple vendor entries exist for the same email. Without understanding how the totals should be calculated, whether multiple vendor entries can exist, or why you need this information, I can't think of a better option without making lots of assumptions.
SELECT A.Email, sum(B.Amount)+sum(A.Amount) as Total_Amount,
count(B.time)+count(A.Time) as Transactions
FROM tableName A
INNER join tableName B
on A.Email=B.Email
AND A.source='vendor'
AND A.Time < B.Time and B.Source='website'
Group By A.Email
This query should give you the desired result by using a subquery to find the persons that have an initial 'vendor' record followed by a 'website' record, before collecting the summary information from the records for these persons.
If you remove the lines marked with -- *, persons whose 'vendor' record is not their first one are also included.
SELECT email, SUM(amount) AS total_amount, COUNT(*) AS transactions
FROM transactions
WHERE email IN
(SELECT t1.email FROM transactions t1
LEFT JOIN transactions t0 -- *
ON t0.email = t1.email AND t0.timestamp < t1.timestamp -- *
LEFT JOIN transactions t2
ON t2.email = t1.email
WHERE t1.source = 'vendor' AND t2.source = 'website'
AND t0.email IS NULL -- *
)
GROUP BY email;
See http://www.sqlfiddle.com/#!2/864898/8/0
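To see both variants on the sample data from the question, here is a sketch that runs them through Python's sqlite3 module. The table and column names follow the query above; the ORDER BY on the relaxed variant is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY, email TEXT, source TEXT, amount INTEGER, timestamp TEXT);
INSERT INTO transactions VALUES
    (1, 'daniel#example.com', 'vendor',  10, '2014-03-10 23:34:40'),
    (2, 'john#example.com',   'website', 15, '2014-03-11 13:30:00'),
    (3, 'mary#example.com',   'website', 50, '2014-03-11 17:30:00'),
    (4, 'daniel#example.com', 'website', 65, '2014-03-13 20:06:30'),
    (5, 'mary#example.com',   'vendor',  10, '2014-03-14 16:20:30');
""")

# Strict variant: the 'vendor' row must be the person's first transaction.
# t0 looks for any earlier row; "t0.email IS NULL" keeps only rows with none.
strict_query = """
SELECT email, SUM(amount) AS total_amount, COUNT(*) AS transactions
FROM transactions
WHERE email IN
    (SELECT t1.email FROM transactions t1
     LEFT JOIN transactions t0
         ON t0.email = t1.email AND t0.timestamp < t1.timestamp
     LEFT JOIN transactions t2
         ON t2.email = t1.email
     WHERE t1.source = 'vendor' AND t2.source = 'website'
       AND t0.email IS NULL)
GROUP BY email
"""

# Relaxed variant: the t0 lines are removed, so any 'vendor' row qualifies.
relaxed_query = """
SELECT email, SUM(amount) AS total_amount, COUNT(*) AS transactions
FROM transactions
WHERE email IN
    (SELECT t1.email FROM transactions t1
     LEFT JOIN transactions t2
         ON t2.email = t1.email
     WHERE t1.source = 'vendor' AND t2.source = 'website')
GROUP BY email
ORDER BY email
"""

strict_rows = conn.execute(strict_query).fetchall()
relaxed_rows = conn.execute(relaxed_query).fetchall()
print(strict_rows)
print(relaxed_rows)
```

The strict variant returns only Daniel, while the relaxed one also returns Mary, matching the two result sets in the question.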
Your query should look like this :
select email, sum(amount) ,count(*)
from tbl
where email='daniel#example.com'
group by email;
Or, to count all emails:
select email, sum(amount) ,count(*)
from tbl
group by email;
All by vendor:
select email, sum(amount) ,count(*)
from tbl
where source ='vendor'
group by email;
Also demo here:
http://sqlfiddle.com/#!2/de36ed/2
Try this:
select x1.email_id,
       (x1.tot + x2.tot) as total_amount,
       (x1.cnt + x2.cnt) as transactions
from (select t1.email_id, count(t1.email_id) as cnt, sum(t1.totalamt) as tot
      from testdata t1
      where t1.sourcee = 'web'
      group by t1.email_id) x1
inner join (select t2.email_id, count(t2.email_id) as cnt, sum(t2.totalamt) as tot
            from testdata t2
            where t2.sourcee = 'vendor'
            group by t2.email_id) x2
  on x1.email_id = x2.email_id
group by x1.email_id;
Output :-
It's working fine. If required, please change the field names to match your table structure.
Hope it will help you.