table creation logic - mysql

I am looking to take two tables I have and perform a data transformation to create a single table. I have an events table and a users table:
Events: {id, user_id, start_date, end_date, cost...}
Users: {id, name, ...}
I am trying to create a table that shows user spend at a daily level, assuming each user starts with a cost of zero and it goes up after every event.
The intended output would be:
{date, userid, beginning_balance, sum(cost), num_of_events}
I need some direction on how to tackle this one, as I am not very familiar with data transformation within SQL.

Your requirement is a bit unclear, but you may be after something like this:
drop table if exists event;
create table event(id int auto_increment primary key, user_id int,start_date date, end_date date, cost int);
insert into event (user_id,start_date , end_date, cost) values
(1,'2017-01-01','2017-01-01',10),(1,'2017-01-01','2017-01-01',10),
(1,'2017-02-01','2017-01-01',10),
(2,'2017-01-01','2017-01-01',10);
select e.user_id, e.start_date,
       ifnull(
         (select sum(e1.cost)
          from event e1
          where e1.user_id = e.user_id and e1.start_date < e.start_date
         ), 0) as beginning_balance,
       sum(cost), count(*) as num_of_events
from users u
join event e on e.user_id = u.id
group by e.user_id, e.start_date;
+---------+------------+-------------------+-----------+---------------+
| user_id | start_date | beginning_balance | sum(cost) | num_of_events |
+---------+------------+-------------------+-----------+---------------+
| 1 | 2017-01-01 | 0 | 20 | 2 |
| 1 | 2017-02-01 | 20 | 10 | 1 |
| 2 | 2017-01-01 | 0 | 10 | 1 |
+---------+------------+-------------------+-----------+---------------+
3 rows in set (0.03 sec)
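The beginning-balance idea above can be checked end to end with a small script. This is only a sketch under stated assumptions: it runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL, it reuses the sample rows from the answer, and the helper name daily_spend is made up for illustration. It aggregates per user and day first, then looks up the sum of all earlier costs for the same user.

```python
import sqlite3

def daily_spend(rows):
    """Return (user_id, day, beginning_balance, cost, num_of_events) per user per day.

    rows: iterable of (user_id, start_date, cost); dates are ISO 'YYYY-MM-DD' strings.
    """
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here (assumption)
    conn.execute(
        "CREATE TABLE event (id INTEGER PRIMARY KEY, user_id INT, start_date TEXT, cost INT)"
    )
    conn.executemany(
        "INSERT INTO event (user_id, start_date, cost) VALUES (?, ?, ?)", rows
    )
    query = """
        SELECT d.user_id,
               d.day,
               -- everything the user spent before this day
               COALESCE((SELECT SUM(e1.cost)
                         FROM event e1
                         WHERE e1.user_id = d.user_id
                           AND e1.start_date < d.day), 0) AS beginning_balance,
               d.cost,
               d.num_of_events
        FROM (SELECT user_id,
                     start_date AS day,
                     SUM(cost)  AS cost,
                     COUNT(*)   AS num_of_events
              FROM event
              GROUP BY user_id, start_date) AS d
        ORDER BY d.user_id, d.day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample_events = [
    (1, "2017-01-01", 10),
    (1, "2017-01-01", 10),
    (1, "2017-02-01", 10),
    (2, "2017-01-01", 10),
]
for row in daily_spend(sample_events):
    print(row)
```

ISO-formatted date strings compare correctly as text, which is why the sketch stores dates as TEXT; in MySQL a DATE column would be used instead.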

Can you try this query:
SELECT
    start_date AS date,
    Users.id AS userid,
    0 AS beginning_balance,
    SUM(cost) AS cost,
    COUNT(Events.id) AS num_of_events
FROM
    Users
    LEFT JOIN Events ON (Events.user_id = Users.id)
GROUP BY
    Users.id,
    start_date

Related

mysql counts the number of new visitors by day

MySQL version: 5.7
Here is users table:
+---------------------+-----+
| date                | uid |
+---------------------+-----+
| 2020-06-29 05:00:00 | 352 |
| 2020-06-29 08:00:00 | 354 |
| 2020-06-29 09:25:53 | 354 |
| 2020-06-30 08:00:00 | 863 |
| 2020-06-30 09:00:01 | 352 |
| 2020-06-30 09:59:59 | 352 |
| 2020-07-01 07:00:00 | 358 |
| 2020-07-01 09:00:00 | 358 |
+---------------------+-----+
I want to count the number of new visitors per day, with one important condition: a visitor counts as new on a given day only if they have never visited before.
I want the result:
Result:
+------------+------------------+
| date | new_user_count |
+------------+------------------+
| 2020-06-29 | 2 |
| 2020-06-30 | 1 |
| 2020-07-01 | 1 |
+------------+------------------+
The above result is equivalent to these three SQL statements:
2020-06-29 (352,354) : select count( distinct uid ) as new_user_count from users where DATE(date) = '2020-06-29' and uid not in ( select distinct uid from users where date < '2020-06-29 05:00:00'); #2
2020-06-30 (863): select count( distinct uid ) as new_user_count from users where DATE(date)= '2020-06-30' and uid not in ( select distinct uid from users where date < '2020-06-30 08:00:00'); # 1
2020-07-01 (358): select count( distinct uid ) as new_user_count from users where DATE(date)= '2020-07-01' and uid not in ( select distinct uid from users where date < '2020-07-01 07:00:00'); # 1
I haven't thought of it until now, thanks
You could try using a correlated subquery to check whether each user visit is the first or not:
SELECT
    visit_date AS date,
    SUM(CASE WHEN NOT EXISTS (SELECT 1 FROM users u2
                              WHERE u2.date < u1.visit_date AND u2.uid = u1.uid)
             THEN 1 ELSE 0 END) AS new_user_count
FROM
    (SELECT DISTINCT DATE(date) AS visit_date, uid FROM users) u1
GROUP BY
    visit_date;
The logic reads straightforwardly: count a user record only if we cannot find that same user appearing in the table at some earlier date. Note that I use a DISTINCT select because, in your data, a given user might appear more than once on the same date. Such duplicates would throw off the correlated subquery, so we ensure that a given user appears only once on a given date (and besides, one user can only be counted once per day anyway).
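An equivalent way to think about "no earlier visit" is that a user is new only on the day of their first visit. The sketch below swaps in that technique (a per-user MIN instead of a correlated subquery) and runs it through Python's built-in sqlite3 module as a stand-in for MySQL 5.7; the helper name new_visitors_per_day is hypothetical, and the rows are the sample data from the question.

```python
import sqlite3

def new_visitors_per_day(visits):
    """Return (day, new_user_count) for every day on which at least one user first appears.

    visits: iterable of (timestamp, uid) with 'YYYY-MM-DD HH:MM:SS' timestamps.
    """
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here (assumption)
    conn.execute("CREATE TABLE users (date TEXT, uid INT)")
    conn.executemany("INSERT INTO users (date, uid) VALUES (?, ?)", visits)
    query = """
        SELECT first_day, COUNT(*) AS new_user_count
        FROM (SELECT uid, MIN(DATE(date)) AS first_day  -- each user's first visit day
              FROM users
              GROUP BY uid) AS first_visits
        GROUP BY first_day
        ORDER BY first_day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample_visits = [
    ("2020-06-29 05:00:00", 352),
    ("2020-06-29 08:00:00", 354),
    ("2020-06-29 09:25:53", 354),
    ("2020-06-30 08:00:00", 863),
    ("2020-06-30 09:00:01", 352),
    ("2020-06-30 09:59:59", 352),
    ("2020-07-01 07:00:00", 358),
    ("2020-07-01 09:00:00", 358),
]
for day, count in new_visitors_per_day(sample_visits):
    print(day, count)
```

One design consequence: a day on which nobody is new produces no row at all, so a calendar table would be needed if zero-count days must be listed.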
SELECT
date,
(
SELECT COUNT(DISTINCT u1.uid)
FROM users u1
WHERE NOT EXISTS(
SELECT * FROM users u2
WHERE u2.uid = u1.uid AND u2.date < u0.date
) AND u1.date = u0.date
)
FROM
users u0
GROUP BY
date
;
-- get date and the amount of distinct users
SELECT date, COUNT(DISTINCT uid)
-- from users table
FROM users
-- only when there not exists a row
WHERE NOT EXISTS ( SELECT NULL -- may use any literal value instead of NULL
-- in the table
FROM users u
-- with this user id
WHERE users.uid = u.uid
-- but earlier (less) date
AND users.date > u.date )
GROUP BY date;

Query membership of group on date with open-ended memberships

Given the following table structure for tracking membership of given groups:
+----+----------+----------------+--------------+
| id | group_id | in_group_begin | in_group_end |
+----+----------+----------------+--------------+
| 1 | 10 | 2019-01-01 | 2019-02-01 |
| 1 | 11 | 2019-02-02 | 2019-03-01 |
| 1 | 12 | 2019-03-01 | NULL |
| 2 | 10 | 2019-01-01 | NULL |
+----+----------+----------------+--------------+
(Where in_group_end being NULL signifies this is their current group)
How would I form a query that would tell me, for example, what group_id each member was associated with on a given date?
... in_group_end IS NULL will give me their current group, not necessarily the group they were in
... in_group_end IS NULL OR in_group_end >= '{$date_str}' could give me multiple options
Ideally I would like something I can use in a joined query, e.g. with a table storing a person's name, address, etc., from which I expect only one row back.
Would some kind of IF statement in the JOIN do it? Or a GROUP BY in a sub-query?
Consider the following logic, which would find all matches for 2019-01-15:
SELECT group_id
FROM yourTable
WHERE
'2019-01-15' >= in_group_begin AND
('2019-01-15' <= in_group_end OR in_group_end IS NULL);
The WHERE clause treats an input date as a match if it lies between the start and end dates (inclusive), or if it is on or after the start date and there is no end date. Also, the WHERE clause as written can make use of an index.
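To see how that range check behaves on the sample rows, here is a minimal sketch that runs the same WHERE clause through Python's built-in sqlite3 module (used as a stand-in for MySQL). The table name membership and the helper name groups_on are assumptions for illustration.

```python
import sqlite3

def groups_on(memberships, day):
    """Return (id, group_id) pairs whose membership range covers the given day.

    memberships: iterable of (id, group_id, in_group_begin, in_group_end);
    in_group_end may be None for an open-ended (current) membership.
    """
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here (assumption)
    conn.execute(
        "CREATE TABLE membership (id INT, group_id INT, in_group_begin TEXT, in_group_end TEXT)"
    )
    conn.executemany("INSERT INTO membership VALUES (?, ?, ?, ?)", memberships)
    query = """
        SELECT id, group_id
        FROM membership
        WHERE ? >= in_group_begin
          AND (? <= in_group_end OR in_group_end IS NULL)  -- NULL end means still a member
        ORDER BY id, group_id
    """
    result = conn.execute(query, (day, day)).fetchall()
    conn.close()
    return result

sample_memberships = [
    (1, 10, "2019-01-01", "2019-02-01"),
    (1, 11, "2019-02-02", "2019-03-01"),
    (1, 12, "2019-03-01", None),
    (2, 10, "2019-01-01", None),
]
print(groups_on(sample_memberships, "2019-01-15"))
print(groups_on(sample_memberships, "2019-03-01"))
```

The second call returns two rows for member 1, because in the sample data group 11 ends on 2019-03-01 and group 12 begins on the same day. If exactly one row per member is required, the ranges must not share a boundary day, or the end comparison would have to be made exclusive.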
Let's say you want to search for a date 2019-01-03.
SELECT
id,
group_id
FROM membership
WHERE '2019-01-03' BETWEEN in_group_begin AND IFNULL(in_group_end, CURRENT_DATE);
If you have a separate users table that stores user details, and the membership table references it through its id field, you can use the following query:
SELECT
u.id,
u.name,
u.address,
m.group_id
FROM users u
INNER JOIN membership m ON u.id = m.id
WHERE '2019-01-03' BETWEEN in_group_begin AND IFNULL(in_group_end, CURRENT_DATE);
Assuming there is a table users from which you want the user's details returned, join it to your table tablename like this:
select u.*, t.group_id
from users u inner join (
select
id, group_id, in_group_begin,
coalesce(in_group_end, current_date) in_group_end
from tablename
) t on t.id = u.id and @date between t.in_group_begin and t.in_group_end
Replace @date with the date you search for.

Inserting rows into a MySQL DB based on what is missing

I have a table called Runks (a Runk is basically a challenge in the game that I'm making).
Every game can hold 4 users, so 4 Runks are created per round. One round lasts 24 hours.
At the end of the round the status of these Runks changes.
However, I am running into a problem. If one or more of the users neglected to upload a Runk in the meantime, I need to create an empty Runk for them in the database.
This query:
SELECT runk_group_id, COUNT(runk_id)
FROM runks
WHERE runk_status = 'ACTIVE'
GROUP BY runk_group_id
Would output this:
This should then result in a next query creating 5 Runks.
1 Runk needs to be created for group_id 32
1 Runk needs to be created for group_id 35
3 Runks need to be created for group_id 44
One more thing that needs to be taken into account: the new Runks must be created with the player ids that have not yet uploaded a Runk.
So if for group 32 player 1, 2 & 3 have already uploaded a Runk... the Runk that will need to be created needs to belong to player 4.
This is what my table looks like:
For the sake of an answer, here is a simplified example (apologies for the terrible naming...):
CREATE TABLE users (
user_id int
);
INSERT INTO users VALUES (1), (2), (3);
CREATE TABLE users_list (
user_id int
);
INSERT INTO users_list values (1), (1), (1), (3);
-- SELECT as shown
SELECT user_id, count(user_id)
FROM users_list
GROUP BY user_id;
+---------+----------------+
| user_id | count(user_id) |
+---------+----------------+
| 1 | 3 |
| 3 | 1 |
+---------+----------------+
-- Incorrect: count(u.user_id) counts a row for every user, even when the left join found no match
SELECT u.user_id, count(u.user_id)
FROM users u
LEFT JOIN users_list ul ON u.user_id = ul.user_id
GROUP BY u.user_id;
# Gives - WRONG
+---------+------------------+
| user_id | count(u.user_id) |
+---------+------------------+
| 1 | 3 |
| 2 | 1 |
| 3 | 1 |
+---------+------------------+
-- count(ul.user_id) skips the NULLs from unmatched rows, so we get the correct answer
SELECT u.user_id, count(ul.user_id)
FROM users u
LEFT JOIN users_list ul ON u.user_id = ul.user_id
GROUP BY u.user_id;
+---------+-------------------+
| user_id | count(ul.user_id) |
+---------+-------------------+
| 2 | 0 |
| 1 | 3 |
| 3 | 1 |
+---------+-------------------+
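The same LEFT JOIN can also drive the insert itself: rows where the joined side is NULL are exactly the players who still need an empty Runk. The sketch below is an illustration only. The table group_players, the column names, and the 'EMPTY' status value are all hypothetical, since the real schema is not shown in the question, and the SQL is run through Python's built-in sqlite3 module as a stand-in for MySQL.

```python
import sqlite3

def fill_missing_runks(members, runks):
    """Insert an 'EMPTY' runk for every (group, player) without an ACTIVE runk.

    members: iterable of (group_id, player_id)           -- hypothetical roster table
    runks:   iterable of (group_id, player_id, status)   -- hypothetical runks table
    Returns the (group_id, player_id) pairs that were created.
    """
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here (assumption)
    conn.execute("CREATE TABLE group_players (group_id INT, player_id INT)")
    conn.execute("CREATE TABLE runks (group_id INT, player_id INT, status TEXT)")
    conn.executemany("INSERT INTO group_players VALUES (?, ?)", members)
    conn.executemany("INSERT INTO runks VALUES (?, ?, ?)", runks)
    # Anti-join: keep only roster rows with no matching ACTIVE runk, then insert them.
    conn.execute("""
        INSERT INTO runks (group_id, player_id, status)
        SELECT gp.group_id, gp.player_id, 'EMPTY'
        FROM group_players gp
        LEFT JOIN runks r
               ON r.group_id = gp.group_id
              AND r.player_id = gp.player_id
              AND r.status = 'ACTIVE'
        WHERE r.player_id IS NULL
    """)
    created = conn.execute(
        "SELECT group_id, player_id FROM runks "
        "WHERE status = 'EMPTY' ORDER BY group_id, player_id"
    ).fetchall()
    conn.close()
    return created

sample_members = [(g, p) for g in (32, 44) for p in (1, 2, 3, 4)]
sample_runks = [
    (32, 1, "ACTIVE"), (32, 2, "ACTIVE"), (32, 3, "ACTIVE"),
    (44, 1, "ACTIVE"),
]
print(fill_missing_runks(sample_members, sample_runks))
```

With this sample, group 32 gets one new Runk (for player 4) and group 44 gets three, which mirrors the counts described in the question.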

How to calculate running total grouped by Order No

I am trying to create a running total for orders in SQL Server 2008, similar to the table below (the Order No and Order Total columns exist in my SQL Server table). I tried using a recursive CTE, but my results were a running total across all orders, not grouped by order no. Any suggestions on how to get the running total grouped by order no? Thanks
---------------------------------------------------------
| Order No. | Order Total | Running Total for Order No |
---------------------------------------------------------
| 1 | $10,000 | $10,000 |
---------------------------------------------------------
| 1 | -$5,000 | $5,000 |
---------------------------------------------------------
| 1 | $3,000 | $8,000 |
---------------------------------------------------------
| 2 | $2,500 | $2,500 |
---------------------------------------------------------
| 2 | $5,000 | $7,500 |
---------------------------------------------------------
| 2 | $4,000 | $11,500 |
---------------------------------------------------------
I would do this with an INSTEAD OF INSERT trigger. The trigger adds each new order total to the group's most recent running total. Ideally this would have been set up when the table was created, but you can add it after you update the table.
Keep in mind that for the code below to work, you need a primary key on the Orders table.
CREATE TABLE Orders
(
id INT IDENTITY(0, 1) PRIMARY KEY
, orderNo INT
, orderTotal MONEY
, runningTotal MONEY
);
INSERT INTO Orders
VALUES
(1,10000,10000),
(1,-5000,5000),
(1,3000,8000),
(2,2500,2500),
(2,5000,7500),
(2,4000,11500);
GO
--CREATE TRIGGER
CREATE TRIGGER trg_RunningTotal ON Orders
INSTEAD OF INSERT
AS
BEGIN
    -- Latest running total for this order (NULL if the order has no rows yet).
    -- Assumes one row is inserted at a time.
    DECLARE @PreviousTotal MONEY =
    (
        SELECT TOP 1
            a.runningTotal
        FROM Orders AS a
        INNER JOIN INSERTED AS b ON a.orderNo = b.orderNo
        ORDER BY a.id DESC
    );
    INSERT INTO Orders
    SELECT
        orderNo,
        orderTotal,
        (ISNULL(@PreviousTotal, 0) + orderTotal) AS runningTotal
    FROM INSERTED;
END;
--Insert new record
INSERT INTO orders
VALUES
(1,1000,NULL);
--View newly added record
SELECT
*
FROM orders
WHERE orderno = 1;
You need the following query:
SELECT orderno,
SUM((CASE WHEN ISNUMERIC(ordertotal)=1
THEN CONVERT(MONEY,ordertotal) ELSE 0 END)
)
AS [Converted to Numeric]
FROM price group by orderno
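For a query-only take on the per-order running total, a correlated subquery that sums all rows of the same order up to the current row is one option that does not rely on an ordered windowed SUM. The sketch below runs it through Python's built-in sqlite3 module rather than SQL Server, so it only illustrates the logic. The id column used for ordering and the helper name running_totals are assumptions.

```python
import sqlite3

def running_totals(orders):
    """Return (order_no, order_total, running_total) in insertion order.

    orders: iterable of (order_no, order_total); an auto-assigned id column
    defines the order of rows within each order number (assumption).
    """
    conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here (assumption)
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_no INT, order_total INT)"
    )
    conn.executemany(
        "INSERT INTO orders (order_no, order_total) VALUES (?, ?)", orders
    )
    query = """
        SELECT o.order_no,
               o.order_total,
               -- sum of this order's rows up to and including the current row
               (SELECT SUM(p.order_total)
                FROM orders p
                WHERE p.order_no = o.order_no
                  AND p.id <= o.id) AS running_total
        FROM orders o
        ORDER BY o.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample_orders = [
    (1, 10000), (1, -5000), (1, 3000),
    (2, 2500), (2, 5000), (2, 4000),
]
for row in running_totals(sample_orders):
    print(row)
```

The subquery restarts the sum for each order number because of the p.order_no = o.order_no condition; without it, the total would run across all orders, which is the behaviour described in the question.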

MySQL count daily new users VS returned users (cohort analysis)

The table structure is: user_id, Date (I'm used to working with timestamps)
for example
user id | Date (TS)
A | '2014-08-10 14:02:53'
A | '2014-08-12 14:03:25'
A | '2014-08-13 14:04:47'
B | '2014-08-13 04:04:47'
...
and for the next week I have
user id | Date (TS)
A | '2014-08-17 09:02:53'
B | '2014-08-17 10:04:47'
B | '2014-08-18 10:04:47'
A | '2014-08-19 10:04:22'
C | '2014-08-19 11:04:47'
...
and for today I have
user id | Date (TS)
A | '2015-05-27 09:02:53'
B | '2015-05-27 10:04:47'
C | '2015-05-27 10:04:22'
D | '2015-05-27 17:04:47'
I need to know how to write a single query that finds, for each day, the number of "returned" users, counted from the very beginning of their activity.
Expected results :
date | New user | returned User
2014-08-10 | 1 | 0
2014-08-11 | 0 | 0
2014-08-12 | 0 | 1 (A was active on 08/10)
2014-08-13 | 1 | 1 (A was active on 08/12 & 08/10)
...
2014-08-17 | 0 | 2 (A & B were already active )
2014-08-18 | 0 | 1
2014-08-19 | 1 | 1
...
2015-05-27 | 1 | 3 (D is a new user)
After a long search on Stack Overflow I found some material provided by https://meta.stackoverflow.com/users/107744/spencer7593 here: Weekly Active Users for each day from log, but I could not adapt his query to output my expected results.
Thanks for your help
Assuming you have a date table somewhere (and using T-SQL syntax because I know it better...), the key is to calculate the minimum date for each user separately, calculate the total number of distinct users on each day, and then declare a returning user to be any user who wasn't new that day:
SELECT DateTable.Date, COALESCE(NewUsers, 0) AS NewUsers, COALESCE(NumUsers, 0) - COALESCE(NewUsers, 0) AS ReturningUsers
FROM
DateTable
LEFT JOIN
(
SELECT MinDate, COUNT(user_id) AS NewUsers
FROM (
SELECT user_id, min(CAST(date AS Date)) as MinDate
FROM Table
GROUP BY user_id
) A
GROUP BY MinDate
) B ON DateTable.Date = B.MinDate
LEFT JOIN
(
SELECT CAST(date AS Date) AS Date, COUNT(DISTINCT user_id) AS NumUsers
FROM Table
GROUP BY CAST(date AS Date)
) C ON DateTable.Date = C.Date
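The new-versus-returning split can be tried on the first week of sample rows with the sketch below. It is a variation on the idea above rather than the same query: instead of joining two aggregates to a date table, it joins each user's distinct active days to that user's first day and counts both cases in one pass. It runs through Python's built-in sqlite3 module as a stand-in for MySQL; the table name stats, the column name created, and the helper name new_vs_returning are assumptions.

```python
import sqlite3

def new_vs_returning(events):
    """Return (day, new_users, returning_users) for every day with activity.

    events: iterable of (user_id, timestamp) with 'YYYY-MM-DD HH:MM:SS' timestamps.
    """
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here (assumption)
    conn.execute("CREATE TABLE stats (user_id TEXT, created TEXT)")
    conn.executemany("INSERT INTO stats VALUES (?, ?)", events)
    query = """
        SELECT d.day,
               SUM(CASE WHEN d.day = f.first_day THEN 1 ELSE 0 END) AS new_users,
               SUM(CASE WHEN d.day > f.first_day THEN 1 ELSE 0 END) AS returning_users
        FROM (SELECT DISTINCT user_id, DATE(created) AS day
              FROM stats) AS d                          -- one row per user per active day
        JOIN (SELECT user_id, MIN(DATE(created)) AS first_day
              FROM stats
              GROUP BY user_id) AS f                    -- each user's first active day
          ON f.user_id = d.user_id
        GROUP BY d.day
        ORDER BY d.day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample_stats = [
    ("A", "2014-08-10 14:02:53"),
    ("A", "2014-08-12 14:03:25"),
    ("A", "2014-08-13 14:04:47"),
    ("B", "2014-08-13 04:04:47"),
]
for row in new_vs_returning(sample_stats):
    print(row)
```

Because no date table is involved here, days without any activity (2014-08-11 in the sample) are simply absent from the result; joining against a calendar table would be needed to list them with zeros.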
Thanks to Stephen, I made a small fix to his query, which works well even if it is a bit time-consuming on a large database:
SELECT
DATE(Stats.Created),
NewUsers,
NumUsers - NewUsers AS ReturningUsers
FROM
Stats
LEFT JOIN
(
SELECT
MinDate,
COUNT(user_id) AS NewUsers
FROM (
SELECT
user_id,
MIN(DATE(Created)) as MinDate
FROM Stats
GROUP BY user_id
) A
GROUP BY MinDate
) B
ON DATE(Stats.Created) = B.MinDate
LEFT JOIN
(
SELECT
DATE(Created) AS Date,
COUNT(DISTINCT user_id) AS NumUsers
FROM Stats
GROUP BY DATE(Created)
) C
ON DATE(Stats.Created) = C.Date
GROUP BY DATE(Stats.Created)