ClickHouse Aggregates - GROUP BY DAY/MONTH/YEAR(timestamp)? - mysql

Is there a way in ClickHouse to do a GROUP BY DAY/MONTH/YEAR() on a timestamp value? I'm having a hard time figuring it out while rewriting MySQL queries for ClickHouse. My MySQL queries look like this:
SELECT COUNT(this), COUNT(that) FROM table WHERE something = x AND stamp BETWEEN startdate AND enddate
SELECT COUNT(this), COUNT(that) FROM table WHERE something = x AND stamp BETWEEN startdate AND enddate GROUP BY DAY(stamp)
SELECT COUNT(this), COUNT(that) FROM table WHERE something = x AND stamp BETWEEN startdate AND enddate GROUP BY MONTH(stamp)
SELECT COUNT(this), COUNT(that) FROM table WHERE something = x AND stamp BETWEEN startdate AND enddate GROUP BY YEAR(stamp)
Quite simple AND SLOW in MySQL, but I do not know how to do the aggregates in ClickHouse.
Thanks!

To extract part of a date, use the functions toYear, toMonth, and toDayOfMonth, as follows:
SELECT
toMonth(time) AS month,
count()
FROM
(
SELECT
number,
addDays(now(), number) AS time
FROM numbers(8)
)
GROUP BY month
/*
┌─month─┬─count()─┐
│ 1 │ 7 │
│ 2 │ 1 │
└───────┴─────────┘
*/
To get multiple grouping sets in one query, use the WITH ROLLUP modifier:
SELECT
toYear(time) AS year,
toMonth(time) AS month,
toDayOfMonth(time) AS day,
count()
FROM
(
SELECT
number,
addDays(now(), number) AS time
FROM numbers(8)
)
GROUP BY
year,
month,
day
WITH ROLLUP
/*
┌─year─┬─month─┬─day─┬─count()─┐
│ 2021 │ 2 │ 1 │ 1 │ // day
│ 2021 │ 1 │ 29 │ 1 │ // day
│ 2021 │ 1 │ 31 │ 1 │ // day
│ 2021 │ 1 │ 26 │ 1 │ // day
│ 2021 │ 1 │ 25 │ 1 │ // day
│ 2021 │ 1 │ 28 │ 1 │ // day
│ 2021 │ 1 │ 30 │ 1 │ // day
│ 2021 │ 1 │ 27 │ 1 │ // day
│ 2021 │ 1 │ 0 │ 7 │ // month
│ 2021 │ 2 │ 0 │ 1 │ // month
│ 2021 │ 0 │ 0 │ 8 │ // year
│ 0 │ 0 │ 0 │ 8 │
└──────┴───────┴─────┴─────────┘
*/
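One thing worth checking when grouping by a date part: the month number alone does not distinguish years, so a date range longer than a year needs the year in the GROUP BY as well. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in, since it runs without a server. The `events` table and its rows are hypothetical, and SQLite's `strftime` plays the role that toYear and toMonth play in ClickHouse.

```python
import sqlite3

# Hypothetical data: three rows in early 2021 and one in January 2022.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (stamp TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [
        ("2021-01-25 10:00:00",),
        ("2021-01-31 12:00:00",),
        ("2021-02-01 09:00:00",),
        ("2022-01-03 08:00:00",),
    ],
)

# Grouping by month number alone merges January 2021 with January 2022.
by_month = conn.execute(
    "SELECT CAST(strftime('%m', stamp) AS INTEGER) AS month, COUNT(*) "
    "FROM events GROUP BY month ORDER BY month"
).fetchall()

# Grouping by year and month keeps the two Januaries apart.
by_year_month = conn.execute(
    "SELECT CAST(strftime('%Y', stamp) AS INTEGER) AS year, "
    "       CAST(strftime('%m', stamp) AS INTEGER) AS month, COUNT(*) "
    "FROM events GROUP BY year, month ORDER BY year, month"
).fetchall()

print(by_month)       # [(1, 3), (2, 1)]
print(by_year_month)  # [(2021, 1, 2), (2021, 2, 1), (2022, 1, 1)]
```

In ClickHouse itself the same effect should be reachable with GROUP BY toYear(stamp), toMonth(stamp), or with a single truncating function such as toStartOfMonth(stamp).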

Related

MySQL recursion hierarchy [duplicate]

I don't quite get how recursive queries work and how to solve this problem. We were given the table on the left and the structure looks like the tree on the right:
ID | Parent          1
1  | null           / \
2  | 1             2   3
3  | 1                  \
4  | 3                   4
5  | 4                  / \
6  | 4                 5   6
7  | 6                      \
                             7
I know how to get all the parent nodes of every node, but I don't get how to find the max depth of the tree, meaning how many levels the tree has. We aren't given any more information.
I would be super grateful if you could give me a solution for MySQL, but any SQL statement will help me figure this out.
Thanks in advance!
You could use a recursive CTE (requires MySQL 8.0 or later):
WITH RECURSIVE cte AS (
SELECT 1 AS lvl, Parent, id
FROM tab
WHERE Parent IS NULL
UNION ALL
SELECT lvl + 1, tab.Parent, tab.id
FROM tab
JOIN cte
ON tab.Parent = cte.Id
)
SELECT * -- MAX(lvl) AS max_depth_of_tree
FROM cte;
Output:
┌──────┬─────────┬────┐
│ lvl │ Parent │ id │
├──────┼─────────┼────┤
│ 1 │ │ 1 │
│ 2 │ 1 │ 2 │
│ 2 │ 1 │ 3 │
│ 3 │ 3 │ 4 │
│ 4 │ 4 │ 5 │
│ 4 │ 4 │ 6 │
│ 5 │ 6 │ 7 │
└──────┴─────────┴────┘
DBFiddle Demo
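To confirm that swapping SELECT * for MAX(lvl) gives the depth, here is a small self-contained check. It is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL (SQLite accepts the same WITH RECURSIVE form), and loads the seven rows from the question into a table named `tab` as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, Parent INTEGER)")
# The tree from the question: (id, Parent), with the root having no parent.
conn.executemany(
    "INSERT INTO tab (id, Parent) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 3), (5, 4), (6, 4), (7, 6)],
)

max_depth = conn.execute(
    """
    WITH RECURSIVE cte AS (
        -- Anchor member: the root sits at level 1.
        SELECT 1 AS lvl, Parent, id
        FROM tab
        WHERE Parent IS NULL
        UNION ALL
        -- Recursive member: each child is one level below its parent.
        SELECT cte.lvl + 1, tab.Parent, tab.id
        FROM tab
        JOIN cte ON tab.Parent = cte.id
    )
    SELECT MAX(lvl) FROM cte
    """
).fetchone()[0]

print(max_depth)  # 5
```

The deepest path is 1, 3, 4, 6, 7, which matches the lvl value of 5 shown for id 7 in the output table.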

SQL GROUP BY and COUNT and conditional SUM with column value

Here is an example of my table.
┌────────┬────────┬───────┐
│ UserId │ Status │ Value │
├────────┼────────┼───────┤
│ 1 │ 1 │ 10 │
│ 1 │ 0 │ 5 │
│ 2 │ 0 │ 8 │
│ 2 │ 1 │ 15 │
│ 1 │ 1 │ 10 │
│ 1 │ 0 │ 5 │
└────────┴────────┴───────┘
I need to GROUP BY UserId and COUNT the total rows in each group. Up to this point I have no problem, but I also want to SUM(Value) separately for each value of the Status column. As with COUNT, my SQL gives me the total over all rows in the group, but I need a result like the one below :-)
┌────────┬──────────────────────┬─────────────────────┐
│ UserId │ SUM(Value) Status=1 │ SUM(Value) Status=0 │
├────────┼──────────────────────┼─────────────────────┤
│ 1 │ 20 │ 10 │
│ 2 │ 15 │ 8 │
└────────┴──────────────────────┴─────────────────────┘
NOTE: This type of query is called conditional aggregation; you can search for that term to learn more.
Use this:
SELECT USERID,
SUM(CASE WHEN STATUS = 1 THEN VALUE ELSE 0 END) AS ST1,
SUM(CASE WHEN STATUS = 0 THEN VALUE ELSE 0 END) AS ST2
FROM DBO.TABLENAME
GROUP BY USERID
Assuming that the data type of Status is BOOLEAN or INTEGER with only 0 and 1 as possible values:
SELECT UserId,
SUM(Status * Value) Status_1,
SUM((NOT Status) * Value) Status_0
FROM tablename
GROUP BY UserId;
See the demo.
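For anyone who wants to verify the numbers, the CASE-based variant can be run against the sample rows from the question. This is a minimal sketch using Python's built-in sqlite3 module, not the asker's actual database; the table name `tablename` is taken from the second answer, and the COUNT(*) column is added to cover the row count the question also asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (UserId INTEGER, Status INTEGER, Value INTEGER)"
)
# The six sample rows from the question: (UserId, Status, Value).
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 0, 5), (2, 0, 8), (2, 1, 15), (1, 1, 10), (1, 0, 5)],
)

result = conn.execute(
    """
    SELECT UserId,
           COUNT(*) AS total_rows,
           -- Each SUM only adds Value when the row has the wanted Status.
           SUM(CASE WHEN Status = 1 THEN Value ELSE 0 END) AS status_1,
           SUM(CASE WHEN Status = 0 THEN Value ELSE 0 END) AS status_0
    FROM tablename
    GROUP BY UserId
    ORDER BY UserId
    """
).fetchall()

print(result)  # [(1, 4, 20, 10), (2, 2, 15, 8)]
```

The CASE form makes no assumption about the type of Status, so it is probably the safer of the two variants if Status can ever hold values other than 0 and 1.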

Add identifier of first created record to select statement with group_by

I have the following payments table
┌─name───────────────────────────┬─type────────────────────────────┐
│ payment_id │ UInt64 │
│ factory │ String │
│ user_id │ UInt64 │
│ amount_cents │ Int64 │
│ action │ String │
│ success │ UInt8 │
│ country │ FixedString(2) │
│ created_at │ DateTime │
│ finished_at │ Nullable(DateTime) │
└────────────────────────────────┴─────────────────────────────────┘
With sample data
┌─factory───┬─────────finished_at─┬─payment_id─┬─country─┬─action──┬─amount_cents─┬─user_id───┬
│ 0_factory │ 2021-01-18 00:00:01 │ 1 │ BY │ payment │ 1 │ 1 │
│ 0_factory │ 2021-01-18 00:00:02 │ 2 │ BY │ payment │ 1 │ 1 │
│ 1_factory │ 2021-01-18 00:00:02 │ 2 │ PL │ win │ 4 │ 1 │
│ 1_factory │ 2021-01-18 00:00:03 │ 3 │ PL │ win │ 7 │ 1 │
│ 2_factory │ 2021-01-18 00:00:01 │ 4 │ PL │ win │ 7 │ 1 │
│ 2_factory │ 2021-01-18 00:00:02 │ 1 │ PL │ payment │ 7 │ 1 │
│ 2_factory │ 2021-01-18 00:00:03 │ 2 │ PL │ win │ 7 │ 1 │
│ 2_factory │ 2021-01-18 00:00:04 │ 3 │ GR │ win │ 2 │ 1 │
└───────────┴─────────────────────┴────────────┴─────────┴─────────┴─────────┴────────────────┘
This is the query I have right now, followed by its result:
SELECT
factory,
user_id,
payment_id,
action,
created_at
FROM payments_all
WHERE (payments_all.action = 'payment') AND (payments_all.factory IN ('0_factory', '1_factory', '2_factory')) AND isNotNull(payments_all.created_at)
GROUP BY
factory,
user_id,
payment_id,
action
HAVING (min(created_at) >= toDate('2019-01-01 00:00:00')) AND (min(created_at) < toDate('2021-10-01 00:00:00'))
ORDER BY user_id
┌─factory───┬─user_id─┬─payment_id─┬─action──┬──────────created_at─┐
│ 1_factory │ 1 │ 1 │ payment │ 2021-02-04 09:00:00 │
│ 0_factory │ 1 │ 1 │ payment │ 2021-01-17 00:00:01 │
│ 0_factory │ 1 │ 2 │ payment │ 2021-01-17 00:00:06 │
└───────────┴─────────┴────────────┴─────────┴─────────────────────┘
I need to add a new column, first_payment.
first_payment takes the value 1 if the action is payment and it is the first payment for a user. Otherwise it takes the value 0.
first_payment should be checked across the whole period.
So the expected result is:
┌─factory───┬─────────finished_at─┬─payment_id─┬─country─┬─action──┬─amount_cents─┬─user_id───┬first_payment─┐
│ 0_factory │ 2021-01-18 00:00:01 │ 1 │ BY │ deposit │ 1 │ 1 │ 1 │
│ 0_factory │ 2021-01-18 00:00:02 │ 2 │ BY │ deposit │ 1 │ 1 │ 0 │
│ 1_factory │ 2021-01-18 00:00:02 │ 2 │ PL │ win │ 4 │ 1 │ 0 │
│ 1_factory │ 2021-01-18 00:00:03 │ 3 │ PL │ win │ 7 │ 1 │ 0 │
│ 2_factory │ 2021-01-18 00:00:01 │ 4 │ PL │ win │ 7 │ 1 │ 0 │
│ 2_factory │ 2021-01-18 00:00:02 │ 1 │ PL │ deposit │ 7 │ 1 │ 1 │
│ 2_factory │ 2021-01-18 00:00:03 │ 2 │ PL │ win │ 7 │ 1 │ 0 │
│ 2_factory │ 2021-01-18 00:00:04 │ 3 │ GR │ win │ 2 │ 1 │ 0 │
└───────────┴─────────────────────┴────────────┴─────────┴─────────┴─────────┴────────────────┘
I couldn't find much about ClickHouse, but it doesn't appear to support window functions.
Your example output also seems to be exactly the same as your sample table, plus one additional column, so I'm not sure what your GROUP BY was meant to achieve.
So, I'd use a LEFT JOIN onto a sub-query.
SELECT
payments_all.*,
CASE WHEN user_summary.user_id IS NOT NULL THEN 1 ELSE 0 END AS first_payment
FROM
payments_all
LEFT JOIN
(
SELECT
user_id,
factory,
MIN(created_at) AS first_created_at
FROM
payments_all
WHERE
action = 'payment'
GROUP BY
user_id,
factory
)
AS user_summary
ON payments_all.user_id = user_summary.user_id
AND payments_all.factory = user_summary.factory
AND payments_all.created_at = user_summary.first_created_at
WHERE
(payments_all.factory IN ('0_factory', '1_factory', '2_factory'))
AND (payments_all.created_at >= toDate('2019-01-01 00:00:00'))
AND (payments_all.created_at < toDate('2021-10-01 00:00:00'))
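The LEFT JOIN idea can be tried on a tiny data set before running it against the real table. The sketch below uses Python's built-in sqlite3 module as a stand-in for ClickHouse; the five rows and their created_at values are hypothetical, and only the columns the join needs are included. It also adds one condition that is not in the query above, `p.action = 'payment'`, as an assumption about the intent: without it, a non-payment row that shares its timestamp with the first payment would be flagged too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments_all ("
    "payment_id INTEGER, factory TEXT, user_id INTEGER, "
    "action TEXT, created_at TEXT)"
)
# Hypothetical rows; payment_id 4 (a win) shares its timestamp with payment_id 5.
conn.executemany(
    "INSERT INTO payments_all VALUES (?, ?, ?, ?, ?)",
    [
        (1, "0_factory", 1, "payment", "2021-01-18 00:00:01"),
        (2, "0_factory", 1, "payment", "2021-01-18 00:00:02"),
        (3, "1_factory", 1, "win", "2021-01-18 00:00:02"),
        (4, "2_factory", 1, "win", "2021-01-18 00:00:02"),
        (5, "2_factory", 1, "payment", "2021-01-18 00:00:02"),
    ],
)

flags = conn.execute(
    """
    SELECT p.payment_id,
           CASE WHEN s.user_id IS NOT NULL THEN 1 ELSE 0 END AS first_payment
    FROM payments_all AS p
    LEFT JOIN (
        -- Earliest payment per user and factory.
        SELECT user_id, factory, MIN(created_at) AS first_created_at
        FROM payments_all
        WHERE action = 'payment'
        GROUP BY user_id, factory
    ) AS s
      ON p.user_id = s.user_id
     AND p.factory = s.factory
     AND p.created_at = s.first_created_at
     AND p.action = 'payment'  -- assumption: only payment rows may be flagged
    ORDER BY p.payment_id
    """
).fetchall()

print(flags)  # [(1, 1), (2, 0), (3, 0), (4, 0), (5, 1)]
```

Note that two payments by the same user in the same factory with an identical created_at would both be flagged; breaking such ties would need an extra column such as payment_id.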
As far as I can see, the payment_id of the first payment is always 1. So I think you can use CASE WHEN payment_id = 1 THEN 1 ELSE 0 END AS first_payment. Please check the query below:
WITH CTE AS
(SELECT
factory,
user_id,
payment_id,
action,
created_at
FROM payments_all
WHERE (payments_all.action = 'payment') AND (payments_all.factory IN ('0_factory', '1_factory', '2_factory')) AND isNotNull(payments_all.created_at)
GROUP BY
factory,
user_id,
payment_id,
action
HAVING (min(created_at) >= toDate('2019-01-01 00:00:00')) AND (min(created_at) < toDate('2021-10-01 00:00:00'))
)
SELECT *, CASE WHEN payment_id = 1 THEN 1
ELSE 0 END AS first_payment
FROM CTE
ORDER BY user_id
NOTE: The query is written for SQL Server. Please check it and let me know.

SQL count occurrences of entries

I'm having some problems counting some entries in my database.
I have the following query:
SELECT ref_training_date, training_date FROM fynslund.users_training
INNER JOIN training
ON training.id = users_training.ref_training_date
WHERE attendance = 1
Which gives me the following results:
┌──────┬───────────────┐
│ ID │ DATE │
├──────┼───────────────┤
│ '55' │ '2018-01-09' │
│ '55' │ '2018-01-09' │
│ '54' │ '2018-02-03' │
│ '54' │ '2018-02-03' │
│ '54' │ '2018-02-03' │
│ '54' │ '2018-02-03' │
└──────┴───────────────┘
How do I count how many times the date with ID '55' appears?
You need to GROUP BY ref_training_date and then COUNT(training_date):
SELECT ref_training_date, COUNT(training_date) AS date_count
FROM fynslund.users_training
INNER JOIN training ON training.id = users_training.ref_training_date
WHERE attendance = 1
GROUP BY ref_training_date
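As a quick check of the grouped count, the query can be run against rows shaped like the result in the question. This is a sketch using Python's built-in sqlite3 module; the table contents are reconstructed from the six result rows, the one non-attending row is a hypothetical addition to show the WHERE filter at work, and the `fynslund.` schema prefix is dropped because SQLite has no such schema here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training (id INTEGER PRIMARY KEY, training_date TEXT)")
conn.execute(
    "CREATE TABLE users_training (ref_training_date INTEGER, attendance INTEGER)"
)
conn.executemany(
    "INSERT INTO training VALUES (?, ?)",
    [(54, "2018-02-03"), (55, "2018-01-09")],
)
# Two attendances for training 55, four for training 54,
# plus one hypothetical non-attending row that must not be counted.
conn.executemany(
    "INSERT INTO users_training VALUES (?, ?)",
    [(55, 1), (55, 1), (54, 1), (54, 1), (54, 1), (54, 1), (55, 0)],
)

counts = conn.execute(
    """
    SELECT ref_training_date, COUNT(training_date) AS date_count
    FROM users_training
    INNER JOIN training ON training.id = users_training.ref_training_date
    WHERE attendance = 1
    GROUP BY ref_training_date
    ORDER BY ref_training_date
    """
).fetchall()

print(counts)  # [(54, 4), (55, 2)]
```

To get only the count for ID 55, adding AND ref_training_date = 55 to the WHERE clause should be enough.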