MySQL: Get latest message grouped by recipient's mobile number

I have a series of SMS messages from a dump, and I want to arrange them so that each row shows only the latest message in each thread. I'm having trouble writing this query. Since the sender is always the same number (the gateway number), it seems best to group by the recipient (the groupby number).
I imagine this working like email, where the latest message displayed can be from either the sender or the recipient (whichever is latest) but is still grouped by the recipient. Honestly, I don't know how to go about this.
Messages table. Type out means the gateway sent the message; in means the groupby number sent it.
| id | groupby | gateway | message | type | created |
-------------------------------------------------------------------------------
| 1 | +111 | +789 | Hello | out | 2015-01-01 00:00:00 |
| 2 | +222 | +789 | World | out | 2015-01-02 00:00:00 |
| 3 | +111 | +789 | What's | in | 2015-01-03 00:00:00 |
| 4 | +222 | +789 | New | in | 2015-01-04 00:00:00 |
| 5 | +111 | +789 | With You? | out | 2015-01-05 00:00:00 |
-------------------------------------------------------------------------------
So the result should be:
Result in html.
| id | groupby | message | sent from |
------------------------------------------------
| 5 | +111 | With You? | +789 |
| 4 | +222 | New | +222 |
------------------------------------------------

You can do this in many ways; one is to join the table to a derived table that holds each thread's latest created value:
select m.*
from messages m
join (
  select groupby, max(created) as created
  from messages
  group by groupby
) m1
  on m1.groupby = m.groupby and m1.created = m.created

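SELECT m.id, m.`groupby`, m.message,
  (CASE WHEN m.type = 'out' THEN m.gateway ELSE m.`groupby` END) AS sentfrom
FROM messages m
JOIN (SELECT `groupby`, MAX(created) AS created
      FROM messages GROUP BY `groupby`) latest
  ON latest.`groupby` = m.`groupby` AND latest.created = m.created
ORDER BY m.created DESC;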
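The join-on-MAX idea can be tried end to end on the sample rows from the question. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the table name messages and the sent_from alias are illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, groupby TEXT, gateway TEXT,"
    " message TEXT, type TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "+111", "+789", "Hello", "out", "2015-01-01 00:00:00"),
        (2, "+222", "+789", "World", "out", "2015-01-02 00:00:00"),
        (3, "+111", "+789", "What's", "in", "2015-01-03 00:00:00"),
        (4, "+222", "+789", "New", "in", "2015-01-04 00:00:00"),
        (5, "+111", "+789", "With You?", "out", "2015-01-05 00:00:00"),
    ],
)

# Find the newest timestamp per recipient, then join back to get the full row.
# 'out' rows were sent by the gateway, 'in' rows by the recipient.
LATEST_PER_THREAD = """
SELECT m.id, m.groupby, m.message,
       CASE WHEN m.type = 'out' THEN m.gateway ELSE m.groupby END AS sent_from
FROM messages m
JOIN (SELECT groupby, MAX(created) AS created
      FROM messages
      GROUP BY groupby) latest
  ON latest.groupby = m.groupby AND latest.created = m.created
ORDER BY m.created DESC
"""

rows = conn.execute(LATEST_PER_THREAD).fetchall()
for row in rows:
    print(row)
```

If two messages in the same thread share the exact same created value, this approach returns both of them, so a tie-breaker such as the highest id may be needed on real data.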

Related

How to join 4 mysql tables WHERE the retrieved records are BETWEEN two dates

I have 4 mysql tables (Projects, Rates, Times, Users) structured as follows (* = primary key):
Projects:
+------------+------+-----+
| projectID* | type | ... |
+------------+------+-----+
| ABCDEF | MGT | ... |
| GHIASD | GRT | ... |
+------------+------+-----+
Rates:
+---------+---------+---------+-----+
| type* | rateID* | rate | ... |
+---------+---------+---------+-----+
| user | A003 | 60.4697 | ... |
| user | A004 | 69.2197 | ... |
| account | A003 | 12.321 | ... |
| account | A004 | 12.345 | ... |
+---------+---------+---------+-----+
--> The table does not hold rates for all users.
Times:
+---------+-----------+--------+---------------------+----------+-----+
| timeID* | projectID | userID | dateTime | duration | ... |
+---------+-----------+--------+---------------------+----------+-----+
| 1 | ABCDEF | A003 | 2022-01-14 00:00:00 | 34200 | ... |
| 2 | GHIASD | A004 | 2022-03-12 07:30:00 | 34200 | ... |
+---------+-----------+--------+---------------------+----------+-----+
--> datetime + duration (in seconds) = time worked
Users:
+---------+-----------+----------+--------+-----+
| userID* | firstName | lastName | status | ... |
+---------+-----------+----------+--------+-----+
| A003 | John | Doe | Active | ... |
| A004 | Jane | Doe | Active | ... |
+---------+-----------+----------+--------+-----+
I wrote the following query, hoping it would get me
total hours worked per project and user
total cost per project and user
project id, project type, user id, first name, last name, rate
between two specified dates (usually within a 14-day period):
SELECT
ROUND(SUM(Times.duration) / 3600, 2) as totalHours,
ROUND(SUM(Times.duration) / 3600 * Rates.rate, 2) as totalCost,
Times.projectID AS projectID,
Projects.type AS type,
Times.userID AS userID,
Users.firstname as firstname,
Users.lastname as lastname,
Rates.rate as rate
FROM Times
INNER JOIN Users ON Times.userID=Users.userID
INNER JOIN Projects ON Times.projectID=Projects.projectID
LEFT JOIN Rates ON Times.userID=Rates.rateID and Rates.type='user'
WHERE Times.dateTime BETWEEN '{yyyy-dd-MM}' AND '{yyyy-dd-MM}'
GROUP BY Times.userID, Times.projectID
ORDER BY Users.lastname, Projects.type ASC;
However, some of the returned values are way off. For example, I get a total of 400 hours worked within a two-week period where I should have gotten a total of 80 hours.
I figured that I must be doing something wrong and would appreciate any help to get the query right.

Is it possible to use two COUNT and two JOIN in a SQL query from 3 tables?

So what I'm trying to do here is get a report on how many emails (with a MailChimp like app) were sent by different users, but I want two different metrics in one query. I want to know how many individual emails were sent by each user. Meaning if they sent 3 emails to 100 contacts each, that would display 300. But I also want to know how many unique emails were sent, meaning that would display 3.
I'd like to get something that looks like:
-------------------------------------------------------------
| Full Name | Username | Total Sent | Unique Mails |
|-------------|-----------------|------------|--------------|
| John Doe | jdoe#mail.com | 12000 | 4 |
| James Smith | jsmith#mail.com | 6000 | 12 |
| Jane Jones | jjones#mail.com | 4000 | 2 |
| ... | ... | ... | ... |
-------------------------------------------------------------
So I could know that John sends a few emails to a lot of contacts while James sends more emails to fewer contacts.
Here's what my query looks like. I've changed the table and column names, but this is otherwise an exact representation of what it is.
SELECT
CONCAT(Usernames.FirstName, ' ', Usernames.LastName) AS 'Full Name',
Usernames.Username,
COUNT(Sent_Mail_Contacts.IDContact) AS `Total Sent`,
COUNT(Mass_Mail.IDMass_Mail) AS `Individual E-Mails`
FROM Usernames
LEFT JOIN Sent_Mail_Contacts ON Usernames.Username = Sent_Mail_Contacts.Username
LEFT JOIN Mass_Mail ON Usernames.Username = Mass_Mail.Username
GROUP BY Usernames.Username
ORDER BY `Total Sent`
I have a table with Usernames, a table with individual contacts reached by which emails and a table with unique emails.
So does my query make sense or not? Is this even possible? Because right now when I run it, it gives me something like this:
-------------------------------------------------------------
| Full Name | Username | Total Sent | Unique Mails |
|-------------|-----------------|------------|--------------|
| John Doe | jdoe#mail.com | 12000 | 12000 |
| James Smith | jsmith#mail.com | 6000 | 6000 |
| Jane Jones | jjones#mail.com | 4000 | 4000 |
| ... | ... | ... | ... |
-------------------------------------------------------------
It just gives me the same number in both columns and takes 7 minutes to run.
Here is an example of what the 3 tables would look like separately if that can help:
Usernames
------------------------------------------------
| Username | FirstName | LastName | ... |
|-----------------|-----------|----------|-----|
| jdoe#mail.com | John | Doe | ... |
| jsmith#mail.com | James | Smith | ... |
| jjones#mail.com | Jane | Jones | ... |
| ... | ... | ... | ... |
------------------------------------------------
Mass_Mail
----------------------------------------------------
| ID_Mass_Mail | Username | Date | ... |
|--------------|----------------|------------|-----|
| 1 | jdoe#mail.com | 2019-01-16 | ... |
| 2 | jdoe#mail.com | 2019-01-29 | ... |
| 3 | jjones#mail.com| 2019-02-14 | ... |
| ... | ... | ... | ... |
----------------------------------------------------
Sent_Mail_Contacts
---------------------------------------------------------------------
| ID_Mass_Mail | Username | Contact_ID | Contact_Email | ... |
|--------------|----------------|------------|----------------|------
| 1 | jdoe#mail.com | 1 | bob#mail.com | ... |
| 1 | jdoe#mail.com | 2 | jim#mail.com | ... |
| 1 | jdoe#mail.com | 3 | cindy#mail.com | ... |
| ... | ... | ... | ... | ... |
| 2 | jdoe#mail.com | 4 | mike#mail.com | ... |
| 2 | jdoe#mail.com | 2 | jim#mail.com | ... |
| 2 | jdoe#mail.com | 3 | cindy#mail.com | ... |
| ... | ... | ... | ... | ... |
---------------------------------------------------------------------
Use COUNT(DISTINCT ...) :
SELECT
CONCAT(Usernames.FirstName, ' ', Usernames.LastName) AS 'Full Name',
Usernames.Username,
COUNT(Sent_Mail_Contacts.IDContact) AS `Total Sent`,
COUNT(DISTINCT Mass_Mail.IDMass_Mail) AS `Individual E-Mails`
FROM Usernames
LEFT JOIN Sent_Mail_Contacts ON Usernames.Username = Sent_Mail_Contacts.Username
LEFT JOIN Mass_Mail ON Usernames.Username = Mass_Mail.Username
GROUP BY Usernames.Username
ORDER BY `Total Sent`
Note: this will not make the query any faster, though. To start with, you should at least make sure that the columns used in the JOINs are backed by primary/foreign key relations: Usernames(Username), Sent_Mail_Contacts(Username), Mass_Mail(Username).
Assuming the values in IDMass_Mail indicate a unique email, then you just need to edit the last COUNT to use the DISTINCT keyword.
COUNT(DISTINCT Mass_Mail.IDMass_Mail) AS `Individual E-Mails`
That will return the number of unique values in the grouping by Username.
You should also get a performance boost if you're able to add indexes to the Username columns in the Sent_Mail_Contacts and Mass_Mail tables.
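To see why the two plain COUNTs matched and what DISTINCT changes, here is a small sketch on made-up rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the lowercase table and column names are simplified assumptions rather than the asker's real schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usernames (username TEXT PRIMARY KEY);
CREATE TABLE mass_mail (id_mass_mail INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE sent_mail_contacts (id_mass_mail INTEGER, username TEXT, contact_id INTEGER);
INSERT INTO usernames VALUES ('jdoe'), ('jjones');
INSERT INTO mass_mail VALUES (1, 'jdoe'), (2, 'jdoe'), (3, 'jjones');
INSERT INTO sent_mail_contacts VALUES
    (1, 'jdoe', 1), (1, 'jdoe', 2), (1, 'jdoe', 3),
    (2, 'jdoe', 4), (2, 'jdoe', 2), (2, 'jdoe', 3),
    (3, 'jjones', 1);
""")

# Both joins use only the username, so every sent row for a user is paired
# with every mass mail of that user (6 x 2 = 12 joined rows for jdoe).
FAN_OUT = """
SELECT u.username,
       COUNT(smc.contact_id)           AS total_sent,
       COUNT(mm.id_mass_mail)          AS plain_count,
       COUNT(DISTINCT mm.id_mass_mail) AS distinct_count
FROM usernames u
LEFT JOIN sent_mail_contacts smc ON smc.username = u.username
LEFT JOIN mass_mail mm ON mm.username = u.username
GROUP BY u.username
ORDER BY u.username
"""

counts = conn.execute(FAN_OUT).fetchall()
for row in counts:
    print(row)
```

In this sketch the plain counts for jdoe both come out as 12, because they count joined rows, while COUNT(DISTINCT ...) reports the 2 mass mails. Note that total_sent is inflated by the same fan-out (12 instead of the 6 rows actually stored), so DISTINCT on one column does not by itself correct the other.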
I managed to do it using a query that (apart from the table and column names, which I changed for privacy reasons) looked exactly like this:
SELECT
Account.Account_Name AS `account`,
Usernames.Username AS `username`,
COUNT(Mass_Mail_Reached_Contacts.ID_Contact) AS `total_emails`,
COUNT(Mass_Mail_Reached_Contacts.ID_Mass_Mail) /
(
SELECT COUNT(*)
FROM
Mass_Mail_Reached_Contacts
WHERE
Mass_Mail_Reached_Contacts.DATE >= '2019-02-01'
AND
Mass_Mail_Reached_Contacts.DATE <= '2019-02-28'
)
* 100 AS `%`,
COUNT(DISTINCT Mass_Mail.ID_Mass_Mail) AS `unique_emails`,
COUNT(Mass_Mail_Reached_Contacts.ID_Mass_Mail) /
COUNT(DISTINCT Mass_Mail.ID_Mass_Mail)
AS `avg_contacts_per_email`
FROM
Usernames
LEFT JOIN Mass_Mail_Reached_Contacts ON Mass_Mail_Reached_Contacts.Username = Usernames.Username
LEFT JOIN Account ON Account.ID_Account = Usernames.ID_Account
LEFT JOIN Mass_Mail ON Mass_Mail.ID_Mass_Mail = Mass_Mail_Reached_Contacts.ID_mass_mail
WHERE
Mass_Mail_Reached_Contacts.DATE >= '2019-02-01'
AND
Mass_Mail_Reached_Contacts.DATE <= '2019-02-28'
GROUP BY
Usernames.Username
HAVING COUNT(DISTINCT Mass_Mail.ID_Mass_Mail) > 0
ORDER BY
`total_emails` DESC
I'm now able to get a table that looks like this
Emails Stats
--------------------------------------------------------------------------------------
| account | username | total_emails | % | unique_emails | avg_contacts_per_email |
|----------|--------------|--------------|-------|---------------|------------------------|
| Bob inc. | bob#mail.com | 28,550 | 14.52 | 12 | 2379.17 |
| ... | ... | ... | ... | ... | ... |
--------------------------------------------------------------------------------------
To start with: Why do Mass_Mail and Sent_Mail_Contacts both contain a Username? This looks redundant. Or is Sent_Mail_Contacts.ID_Mass_Mail nullable?
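For this query at least, I suppose we can ignore the Username in Sent_Mail_Contacts completely. What really links the two tables is ID_Mass_Mail, and you have forgotten this join criterion in your query. Once the tables are joined this way, each mass mail appears once per contact, so the number of individual emails has to be counted with DISTINCT.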
select
  concat_ws(' ', u.firstname, u.lastname) as full_name,
  u.username,
  count(smc.id_mass_mail) as total_sent,
  count(distinct mm.id_mass_mail) as individual_e_mails
from usernames u
left join mass_mail mm on mm.username = u.username
left join sent_mail_contacts smc on smc.id_mass_mail = mm.id_mass_mail
group by u.username
order by total_sent;
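Joining Sent_Mail_Contacts through the mass mail ID rather than through the username can be checked on a few made-up rows. This is a sketch only: it uses Python's built-in sqlite3 module in place of MySQL, and the lowercase table and column names are simplified assumptions.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usernames (username TEXT PRIMARY KEY);
CREATE TABLE mass_mail (id_mass_mail INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE sent_mail_contacts (id_mass_mail INTEGER, contact_id INTEGER);
INSERT INTO usernames VALUES ('jdoe'), ('jjones'), ('jsmith');
INSERT INTO mass_mail VALUES (1, 'jdoe'), (2, 'jdoe'), (3, 'jjones');
INSERT INTO sent_mail_contacts VALUES
    (1, 1), (1, 2), (1, 3),
    (2, 4), (2, 2), (2, 3),
    (3, 1);
""")

# users -> their mass mails -> the contacts reached by each mass mail.
# Each sent row now appears exactly once, so COUNT gives the real total,
# and DISTINCT collapses the repeated mass mail ids.
REPORT = """
SELECT u.username,
       COUNT(smc.contact_id)           AS total_sent,
       COUNT(DISTINCT mm.id_mass_mail) AS unique_mails
FROM usernames u
LEFT JOIN mass_mail mm ON mm.username = u.username
LEFT JOIN sent_mail_contacts smc ON smc.id_mass_mail = mm.id_mass_mail
GROUP BY u.username
ORDER BY total_sent DESC, u.username
"""

report = conn.execute(REPORT).fetchall()
for row in report:
    print(row)
```

The LEFT JOINs keep users who never sent anything (jsmith here) with zero in both columns; switching to inner joins would drop them from the report.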

MySQL - Show latest successful test

How can I get all the latest successful tests by program? The latest one has the highest Build number, and the successful results are PASSED and OF CONCERN.
My table looks like this (I excluded some columns from the original):
+----+---------+----------------+-------+-----------+---------+
| ID | Test | Program | Build | Result | Tester |
+----+---------+----------------+-------+-----------+---------+
| 1 | 1 | Mag. & Speech | 1825 | PASSED | Dale |
| 2 | 2 | Scr. Reader | 1820 | PASSED | Aadarsh |
| 3 | 2 | Scr. Reader | 1821 | PASSED | Tony |
| 4 | 2 | Scr. Reader | 1824 | PASSED | Tony |
| 5 | 2 | Mag. & Speech | 1820 | PASSED | Colin |
| 6 | 2 | Mag. & Speech | 1821 | FAILED | Dale |
| 7 | 2 | Mag. & Speech | 1822 | OF CONCERN| Tony |
| 8 | 2 | Mag. | 1820 | PASSED | Steven |
| 9 | 3 | Scr. Reader | 1820 | NOT TESTED| Aadarsh |
+----+---------+----------------+-------+-----------+---------+
As a result, I want to get the rows with ID 1, 4, 7 and 8. As you can see, no program has more than one of the same test.
Edit:
Added some missing information to the table.
Sadly, I don't have the queries I tried anymore, but I didn't get very far with just WHERE and ORDER BY.
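This query should do the trick:
SELECT t3.*
FROM (
  SELECT t1.Test,
         t1.Program,
         MAX(t1.Build) AS Build
  FROM table_name t1
  WHERE LOWER(t1.Result) IN ('passed', 'of concern')
  GROUP BY t1.Test, t1.Program
) t2
INNER JOIN table_name t3
  ON t3.Test = t2.Test
  AND t3.Program = t2.Program
  AND t3.Build = t2.Build
WHERE LOWER(t3.Result) IN ('passed', 'of concern');
Unfortunately it is a bit complicated, because a GROUP BY query can only reliably return the grouped columns and the aggregates, not the rest of the row that holds the maximum. That is why the grouped result is joined back to the table.
Please replace table_name (in 2 places) with the proper name.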
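The expected rows (1, 4, 7, 8) can be reproduced from the sample table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the table name tests is an assumption, and "successful" is taken to mean PASSED or OF CONCERN, as stated in the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tests (ID INTEGER PRIMARY KEY, Test INTEGER, Program TEXT,"
    " Build INTEGER, Result TEXT, Tester TEXT)"
)
conn.executemany(
    "INSERT INTO tests VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "Mag. & Speech", 1825, "PASSED", "Dale"),
        (2, 2, "Scr. Reader", 1820, "PASSED", "Aadarsh"),
        (3, 2, "Scr. Reader", 1821, "PASSED", "Tony"),
        (4, 2, "Scr. Reader", 1824, "PASSED", "Tony"),
        (5, 2, "Mag. & Speech", 1820, "PASSED", "Colin"),
        (6, 2, "Mag. & Speech", 1821, "FAILED", "Dale"),
        (7, 2, "Mag. & Speech", 1822, "OF CONCERN", "Tony"),
        (8, 2, "Mag.", 1820, "PASSED", "Steven"),
        (9, 3, "Scr. Reader", 1820, "NOT TESTED", "Aadarsh"),
    ],
)

# Highest successful Build per (Test, Program), joined back to fetch the row.
LATEST_SUCCESSFUL = """
SELECT t.ID
FROM tests t
JOIN (SELECT Test, Program, MAX(Build) AS Build
      FROM tests
      WHERE Result IN ('PASSED', 'OF CONCERN')
      GROUP BY Test, Program) latest
  ON latest.Test = t.Test
 AND latest.Program = t.Program
 AND latest.Build = t.Build
WHERE t.Result IN ('PASSED', 'OF CONCERN')
ORDER BY t.ID
"""

latest_ids = [row[0] for row in conn.execute(LATEST_SUCCESSFUL)]
print(latest_ids)  # [1, 4, 7, 8]
```

The join goes back on the grouping columns plus Build instead of on ID, because an ID selected next to MAX(Build) in a grouped query is not guaranteed to come from the row that holds the maximum.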

MySQL Query Two Tables and Max Timestamp

I have two tables that look something like this:
TABLE_conversations:
+-----------------+----------+----------------+------------+---------------------+--------+
| CONVERSATION_ID | QUEUE_ID | CONTACT_NUMBER | CONTACT_ID | DATE_CREATED | STATUS |
+-----------------+----------+----------------+------------+---------------------+--------+
| 1 | 1 | 15551112222 | 9000001 | 2014-09-12 00:28:24 | ACTIVE |
| 2 | 1 | 15553334444 | 9000002 | 2014-09-12 00:32:08 | ACTIVE |
+-----------------+----------+----------------+------------+---------------------+--------+
TABLE_messages:
+------------+-----------------+-------------+-------------+-----------+---------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------+--------+
| MESSAGE_ID | CONVERSATION_ID | FROM_NUMBER | TO_NUMBER | DIRECTION | SENDER | TIMESTAMP | VIEWED | MESSAGE | STATUS |
+------------+-----------------+-------------+-------------+-----------+---------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------+--------+
| 1 | 1 | 15551112222 | 17021112222 | IN | 9000001 | 2014-09-12 00:30:11 | 1 | Hello! Is this working? | ACTIVE |
| 2 | 1 | 17021112222 | 15551112222 | OUT | 8000001 | 2014-09-12 00:31:05 | 1 | Good evening! Of course! | ACTIVE |
| 3 | 1 | 15551112222 | 17021112222 | IN | 9000001 | 2014-09-12 00:31:27 | 1 | Perfect. Thank you! | ACTIVE |
| 4 | 2 | 17021112222 | 15553334444 | OUT | 8000002 | 2014-09-12 00:32:52 | 1 | Ticket 11251 is ready for pickup. | ACTIVE |
+------------+-----------------+-------------+-------------+-----------+---------+---------------------+--------+-----------------------------------------------------------------------------------------------------------------+--------+
I'm trying to run a query to select the CONVERSATION_ID, CONTACT_NUMBER, CONTACT_ID and most recent TIMESTAMP and grouping by phone number:
SELECT TABLE_conversations.CONVERSATION_ID, TABLE_conversations.CONTACT_NUMBER,
TABLE_conversations.CONTACT_ID, MAX(TABLE_messages.TIMESTAMP)
FROM TABLE_conversations, TABLE_messages
WHERE TABLE_conversations.STATUS='ACTIVE'
AND TABLE_messages.STATUS='ACTIVE'
GROUP BY CONTACT_NUMBER
ORDER BY TABLE_messages.TIMESTAMP;
The output I'm getting is below:
+-----------------+----------------+------------+-------------------------------+
| CONVERSATION_ID | CONTACT_NUMBER | CONTACT_ID | MAX(TABLE_messages.TIMESTAMP) |
+-----------------+----------------+------------+-------------------------------+
| 1 | 15551112222 | 9000001 | 2014-09-12 00:32:52 |
| 2 | 15553334444 | 9000002 | 2014-09-12 00:32:52 |
+-----------------+----------------+------------+-------------------------------+
I'm getting the same TIMESTAMP for both. The result I want is 2014-09-12 00:31:27 for 15551112222 and 2014-09-12 00:32:52 for 15553334444.
Really appreciate any help!
You're missing join conditions between the tables, so you're getting a full cross-product. So every conversation is being joined with every message, not just the messages from that conversation.
SELECT TABLE_conversations.CONVERSATION_ID, TABLE_conversations.CONTACT_NUMBER,
TABLE_conversations.CONTACT_ID, MAX(TABLE_messages.TIMESTAMP)
FROM TABLE_conversations
JOIN TABLE_messages ON TABLE_conversations.conversation_id = TABLE_messages.conversation_id
WHERE TABLE_conversations.STATUS='ACTIVE'
AND TABLE_messages.STATUS='ACTIVE'
GROUP BY CONTACT_NUMBER
ORDER BY TABLE_messages.TIMESTAMP;
Your SQL cross joins all rows from the two tables, so the max timestamp for every group is the same.
SELECT TABLE_conversations.CONVERSATION_ID,
TABLE_conversations.CONTACT_NUMBER,
TABLE_conversations.CONTACT_ID,
MAX(TABLE_messages.TIMESTAMP)
FROM TABLE_conversations
JOIN TABLE_messages
ON TABLE_conversations.CONVERSATION_ID = TABLE_messages.CONVERSATION_ID
WHERE TABLE_conversations.STATUS='ACTIVE'
AND TABLE_messages.STATUS='ACTIVE'
GROUP BY CONTACT_NUMBER
I suggest removing the GROUP BY and the aggregate function to see the difference between a full cross join and an inner join. For example:
SELECT TABLE_conversations.CONVERSATION_ID,
TABLE_conversations.CONTACT_NUMBER,
TABLE_conversations.CONTACT_ID,
TABLE_messages.TIMESTAMP
FROM TABLE_conversations
JOIN TABLE_messages
ON TABLE_conversations.CONVERSATION_ID = TABLE_messages.CONVERSATION_ID
-- without the ON clause above, this becomes a full cross join
WHERE TABLE_conversations.STATUS='ACTIVE'
AND TABLE_messages.STATUS='ACTIVE'
ORDER BY TABLE_messages.TIMESTAMP;
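The effect of the missing join condition is easy to reproduce on a cut-down copy of the data. The following sketch uses Python's built-in sqlite3 module in place of MySQL; the shortened table and column names (conversations, messages, ts) are assumptions, and message 4 is placed in conversation 2, as the desired output implies.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversations (conversation_id INTEGER PRIMARY KEY, contact_number TEXT);
CREATE TABLE messages (message_id INTEGER PRIMARY KEY, conversation_id INTEGER, ts TEXT);
INSERT INTO conversations VALUES (1, '15551112222'), (2, '15553334444');
INSERT INTO messages VALUES
    (1, 1, '2014-09-12 00:30:11'),
    (2, 1, '2014-09-12 00:31:05'),
    (3, 1, '2014-09-12 00:31:27'),
    (4, 2, '2014-09-12 00:32:52');
""")

# No join condition: every conversation is paired with every message,
# so each group sees the overall maximum timestamp.
CROSS = """
SELECT c.contact_number, MAX(m.ts)
FROM conversations c, messages m
GROUP BY c.contact_number
ORDER BY c.contact_number
"""

# With the join condition, each group only sees its own messages.
JOINED = """
SELECT c.contact_number, MAX(m.ts)
FROM conversations c
JOIN messages m ON m.conversation_id = c.conversation_id
GROUP BY c.contact_number
ORDER BY c.contact_number
"""

cross = conn.execute(CROSS).fetchall()
joined = conn.execute(JOINED).fetchall()
print(cross)
print(joined)
```

In the first result both numbers report 00:32:52; in the second, 15551112222 reports 00:31:27 and 15553334444 reports 00:32:52, which is the output the question asks for.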

MySQL JOIN 3 tables using multiple columns/keys

Complete newbie to mySQL. So any help will be appreciated.
I have 3 tables -- carts, users, actions.
carts:
+------------+-------------+-------+
| cartId | session_id | userId|
+------------+-------------+-------+
users:
+----------+-------------+
| userId | email |
+----------+-------------+
actions:
+-------------+------------------+--------------+
| session_id | impressionAction | impressionId |
+-------------+------------------+--------------+
In carts, there is one session_id per line.
In users, there is one userId per line.
In actions, there are multiple lines per session_id counting for all the actions for that session.
I would like to JOIN the three tables, getting the output to be something like
+------+-------------+--------+------------------+--------------+-------+
| userId | session_id | cartId | impressionAction | impressionId | email |
+------+-------------+--------+------------------+--------------+-------+
Where there will be multiple lines per userId and session_id; essentially a flattened file. I think if we JOIN carts and users on userId, resulting in say A, and then JOIN A and actions on session_id, we are home.
A sample expected output is:
+------------+-------------+--------+------------------+--------------+---------+
| userId | session_id | cartId | impressionAction | impressionId | email |
+------------+-------------+--------+------------------+--------------+---------+
| 1234 | abc3f45 | 0001 | LOGIN | 2032 |ab#yc.com|
| 1234 | abc3f45 | 0001 | ADD | 4372 |ab#yc.com|
| 1234 | abc3f45 | 0001 | ADD | 4372 |ab#yc.com|
| 1234 | abc3f45 | 0001 | SENDMAIL | ab#yc.com |ab#yw.com|
| 4567 | def4rg4 | 0002 | LOGIN | 2032 |db#yw.com|
| 4567 | def4rg4 | 0002 | ADD | 4372 |db#yw.com|
| 4567 | def4rg4 | 0002 | REMOVE | 3210 |db#yw.com|
+------------+-------------+--------+------------------+--------------+---------+
I don't know how to JOIN 3 tables without one common key. I don't even know what type of join it is called.
Essentially, we are trying to join 3 tables with non-overlapping keys, gathering one common key through the first JOIN and then joining the intermediate with the third one. Is this called a CROSS JOIN? If no, is there a name?
Taken from your comment above
A USER may select many products, add them to their CART; a single
USER may have multiple CARTS and at the end of the event, they can
EMAIL the cart to themselves; the ACTIONS of the user are stored in
the actions table
This is how I see the structure (having in mind your data)
+---------------------+ +---------------------+ +---------------------+
| users | | carts | | actions |
+---------------------+ +---------------------+ +---------------------+
| user_id [PK] |--| | cart_id [PK] | | impression_id [PK] |
| email | |--| user_id [FK] | | action_name |
| | | product_id [FK] | |--| session_id [FK]* |
+---------------------+ | session_id [FK]* |--| | |
| | +---------------------+
+---------------------+
As you can see above, I'm joining first with carts and then with actions, because only the carts table has both user and session data.
The [FK]* next to session_id in carts and actions may look like a foreign key, but in this case it is not, because there is no separate sessions table in which session_id would be the PK (primary key).
You asked about JOIN: it is the same as INNER JOIN. INNER JOIN creates a new result table by combining column values of two tables (A and B) based upon the join predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join predicate.
This is a possible content of the tables
+------------------------+
| users |
+------------------------+
| id | email |
+------+-----------------+
| 1 | first#mail.org |
| 2 | second#mail.org |
| 3 | third#mail.org |
+------+-----------------+
+------------------------------------------+
| carts |
+------------------------------------------+
| id | user_id | product_id | session_id |
+------+---------+------------+------------+
| 1 | 1 | 5 | 1aaaa |
| 2 | 2 | 5 | 2ffff |
| 3 | 3 | 8 | 3dddd |
| 4 | 1 | 5 | 1aaaa |
| 5 | 3 | 9 | 3bbbb |
| 6 | 1 | 6 | 1cccc |
+------+---------+------------+------------+
+-------------------------------+
| actions |
+-------------------------------+
| id | name | session_id |
+------+-----------+------------+
| 1 | ADD | 1aaaa |
| 2 | ADD | 2ffff |
| 3 | SENDMAIL | 3dddd |
| 4 | ADD | 3dddd |
| 5 | SENDMAIL | 2ffff |
| 6 | ADD | 1aaaa |
| 7 | REMOVE | 3dddd |
| 8 | ADD | 1cccc |
| 9 | ADD | 3bbbb |
| 10 | SENDMAIL | 3bbbb |
+------+-----------+------------+
As you can see, there are six products in the carts table and exactly six ADD actions in the actions table. Furthermore, the user with id=1 bought three products, but not all at the same time, since there are two sessions; likewise, the user with id=3 bought two products at different times.
The SQL statement
SELECT u.user_id, c.session_id, c.cart_id, a.impression_id, a.action_name, u.email
FROM users AS u
INNER JOIN carts AS c ON c.user_id = u.user_id
INNER JOIN actions AS a ON a.session_id = c.session_id
ORDER BY u.user_id, c.session_id, c.cart_id
Results:
+---------+------------+---------+---------------+-------------+-----------------+
| user_id | session_id | cart_id | impression_id | action_name | email |
+---------+------------+---------+---------------+-------------+-----------------+
| 1 | 1aaaa | 1 | 1 | ADD | first#mail.org |
| 1 | 1aaaa | 1 | 6 | ADD | first#mail.org |
| 1 | 1aaaa | 4 | 1 | ADD | first#mail.org |
| 1 | 1aaaa | 4 | 6 | ADD | first#mail.org |
| 1 | 1cccc | 6 | 8 | ADD | first#mail.org |
| 2 | 2ffff | 2 | 5 | SENDMAIL | second#mail.org |
| 2 | 2ffff | 2 | 2 | ADD | second#mail.org |
| 3 | 3bbbb | 5 | 9 | ADD | third#mail.org |
| 3 | 3bbbb | 5 | 10 | SENDMAIL | third#mail.org |
| 3 | 3dddd | 3 | 3 | SENDMAIL | third#mail.org |
| 3 | 3dddd | 3 | 4 | ADD | third#mail.org |
| 3 | 3dddd | 3 | 7 | REMOVE | third#mail.org |
+---------+------------+---------+---------------+-------------+-----------------+
Note: There's no guarantee for session uniqueness.
(Updated) Working SQL Fiddle
UPDATE: (Finding and deleting duplicates)
I've updated the SQL Fiddle to simulate duplicate records (where a user added the same product within the same session). The following statement retrieves those duplicated rows.
SELECT c.cart_id, c.user_id, c.product_id, c.session_id, a.action_name, a.impression_id
FROM carts AS c
INNER JOIN actions AS a ON a.session_id = c.session_id
GROUP BY c.user_id, c.product_id, c.session_id, a.action_name
HAVING count(*) > 1
Results:
+---------+------------+------------+------------+-------------+-----------------+
| cart_id | user_id | product_id | session_id | action_name | impression_id |
+---------+------------+------------+------------+-------------+-----------------+
| 1 | 1 | 5 | 1aaaa | ADD | 1 |
| 6 | 1 | 6 | 1cccc | ADD | 8 |
+---------+------------+------------+------------+-------------+-----------------+
In the SELECT part of the statement above you may omit everything except cart_id and impression_id. Deleting these two duplicates in one statement is a bit tricky, since you can't modify a table that is also selected from in a subquery of the same statement. I would avoid the tricky part in this case (which involves another inner subquery) and delete the duplicates using separate statements, as follows:
-- delete duplicates from carts
--
DELETE FROM carts WHERE cart_id IN (1,6);
-- delete duplicates from actions
--
DELETE FROM actions WHERE impression_id IN (1,8);
Even better, you could check whether the user has already added the selected product and not add it twice.
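One way to avoid adding the same product twice is to let the database enforce it with a unique key. The sketch below is an illustration under assumptions, not part of the original answer: it uses Python's built-in sqlite3 module, the helper add_to_cart is hypothetical, and SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE carts (
    cart_id    INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    session_id TEXT NOT NULL,
    UNIQUE (user_id, product_id, session_id)
)""")


def add_to_cart(user_id, product_id, session_id):
    """Insert the product once per user and session; return True if a row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO carts (user_id, product_id, session_id)"
        " VALUES (?, ?, ?)",
        (user_id, product_id, session_id),
    )
    # rowcount is 0 when the unique key caused the insert to be skipped.
    return cur.rowcount == 1


first = add_to_cart(1, 5, "1aaaa")   # new row
second = add_to_cart(1, 5, "1aaaa")  # same product, same session: skipped
other = add_to_cart(1, 6, "1aaaa")   # different product: new row
cart_count = conn.execute("SELECT COUNT(*) FROM carts").fetchone()[0]
print(first, second, other, cart_count)
```

With the unique key in place, the duplicate rows never reach the table, so no cleanup query is needed afterwards.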
Excuse my MySql syntax, as I don't know it :-p But this is the idea
SELECT u.userId, a.session_id, c.cartId, a.impressionAction, a.impressionId, u.email
FROM Carts c
JOIN Users u on u.userId = c.UserId
JOIN Actions a on a.session_id = c.session_id
This will just merge everything together, and you'll have duplicate cart records wherever there are many-to-one relationships.
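The chained join can be tried with the column names from the question. This is a sketch on made-up rows, using Python's built-in sqlite3 module as a stand-in for MySQL; the email addresses are placeholders.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userId INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE carts (cartId INTEGER PRIMARY KEY, session_id TEXT, userId INTEGER);
CREATE TABLE actions (session_id TEXT, impressionAction TEXT, impressionId INTEGER);
INSERT INTO users VALUES (1234, 'ab@example.com'), (4567, 'db@example.com');
INSERT INTO carts VALUES (1, 'abc3f45', 1234), (2, 'def4rg4', 4567);
INSERT INTO actions VALUES
    ('abc3f45', 'LOGIN', 2032),
    ('abc3f45', 'ADD', 4372),
    ('def4rg4', 'LOGIN', 2032);
""")

# carts links to users through userId and to actions through session_id,
# so two inner joins chained through carts flatten all three tables.
FLATTEN = """
SELECT u.userId, c.session_id, c.cartId,
       a.impressionAction, a.impressionId, u.email
FROM carts c
JOIN users u ON u.userId = c.userId
JOIN actions a ON a.session_id = c.session_id
ORDER BY u.userId, a.impressionId
"""

flat = conn.execute(FLATTEN).fetchall()
for row in flat:
    print(row)
```

Each cart row is repeated once per action in its session, which is the flattened shape the question asks for. No single key has to be shared by all three tables; each join only needs a key shared by the two tables it connects.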