I am learning dbt, specifically dbt-mysql. I am having trouble combining several tables into one table.
What I want to do:
Group several columns by the year and month of the table's last_updated (timestamp) field, and then combine those grouped columns into a single table keyed by that year and month. Here is how I want my data to end up:
Here is my staging model (which I think should be straight selects from the database):
staging/clients/stg_clients_fields.sql
SELECT id, created, last_updated, service, order_count, spent_count, deleted, country
FROM client_database.clients
Then I have intermediate models (which I think should reconstruct data for my needs):
intermediate/clients/clients_last_updated_grouped.sql
SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as client_count
FROM {{ ref('stg_clients_fields') }}
GROUP BY YEAR(last_updated), MONTH(last_updated)
intermediate/clients/clients_deleted_grouped.sql
SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as deleted
FROM {{ ref('stg_clients_fields') }}
WHERE deleted = 1
GROUP BY YEAR(last_updated), MONTH(last_updated)
intermediate/clients/clients_service_grouped.sql
SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as service
FROM {{ ref('stg_clients_fields') }}
WHERE service IS NOT NULL
GROUP BY YEAR(last_updated), MONTH(last_updated)
And other columns follow the same pattern based on their WHERE clauses.
Now I need to create a marts model that would use all previously created data and put it in one single table.
At this point, I end up with several tables that have the last_updated field separated and the specific column value next to the date.
How can I now combine all these tables so that they join on the year and month columns split out of last_updated?
Or perhaps there is a better solution to group data by year and month and get individual column values based on conditions?
I am new to dbt, so all help and advice is welcome!
Since clients_last_updated_grouped doesn't have a where condition, it's guaranteed to contain every year/month combination found in the other models. This makes things much easier: you can select from that model and left join the other models on year and month:
with
updated as (select * from {{ ref('clients_last_updated_grouped') }} ),
deleted as (select * from {{ ref('clients_deleted_grouped') }} ),
service as (select * from {{ ref('clients_service_grouped') }} ),
joined as (
    select
        updated.year_updated,
        updated.month_updated,
        updated.client_count,
        coalesce(deleted.deleted, 0) as deleted_count,
        coalesce(service.service, 0) as service_count
    from
        updated
        left join deleted on updated.year_updated = deleted.year_updated and updated.month_updated = deleted.month_updated
        left join service on updated.year_updated = service.year_updated and updated.month_updated = service.month_updated
)
select *
from joined
If your database doesn't support CTEs (with ...), as is the case for MySQL before 8.0, this becomes:
select
    updated.year_updated,
    updated.month_updated,
    updated.client_count,
    coalesce(deleted.deleted, 0) as deleted_count,
    coalesce(service.service, 0) as service_count
from
    {{ ref('clients_last_updated_grouped') }} as updated
    left join {{ ref('clients_deleted_grouped') }} as deleted on updated.year_updated = deleted.year_updated and updated.month_updated = deleted.month_updated
    left join {{ ref('clients_service_grouped') }} as service on updated.year_updated = service.year_updated and updated.month_updated = service.month_updated
If it's not the case that clients_last_updated_grouped has every year/month combination found in the other tables, you would first need to construct a "date spine" (a table with one row per year/month), and then left join all 3 tables to that date spine.
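On the question of whether there is a simpler way to get per-condition counts by year and month: one common option is conditional aggregation, where a single GROUP BY query counts each condition with SUM(CASE ...), so no joins or date spine are needed. The sketch below is only an illustration under assumptions: it runs against an in-memory SQLite database from Python so that it is self-contained, the sample rows are made up, and strftime stands in for MySQL's YEAR() and MONTH(). In a dbt model the FROM clause would be {{ ref('stg_clients_fields') }} instead of the toy clients table.

```python
import sqlite3

# Toy stand-in for the staging model; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (id INTEGER, last_updated TEXT, service TEXT, deleted INTEGER)"
)
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?, ?)",
    [
        (1, "2023-01-05 10:00:00", "web", 0),
        (2, "2023-01-20 11:30:00", None, 1),
        (3, "2023-02-02 09:15:00", "api", 1),
        (4, "2023-02-14 16:45:00", None, 0),
        (5, "2023-02-28 08:00:00", "web", 0),
    ],
)

# One pass over the table: each SUM(CASE ...) counts only the rows that
# match its condition. In MySQL, YEAR(last_updated) and MONTH(last_updated)
# would replace the strftime calls.
query = """
SELECT
    CAST(strftime('%Y', last_updated) AS INTEGER) AS year_updated,
    CAST(strftime('%m', last_updated) AS INTEGER) AS month_updated,
    COUNT(id) AS client_count,
    SUM(CASE WHEN deleted = 1 THEN 1 ELSE 0 END) AS deleted_count,
    SUM(CASE WHEN service IS NOT NULL THEN 1 ELSE 0 END) AS service_count
FROM clients
GROUP BY year_updated, month_updated
ORDER BY year_updated, month_updated
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because every count comes from the same grouped rows, a month with no deleted clients simply gets 0 rather than a missing row, which is what the coalesce calls achieve in the join-based version.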
I have two tables, Project and Projectnote
There is a one to many relationship between project and projectnote.
I want to be able to list my projects and select the most recent projectnotes based on the id.
Is this possible in a MySQL query? I can't figure it out.
Thanks for any help!
Edit: so far I have a basic query (below) that joins the two tables. However, this only selects projects where a note exists and I get multiple rows where there are several notes per project.
SELECT `driver_checkins`.*,
    `driver_trips`.`id` AS `trip_id`,
    `driver_trips`.`trip_num` AS `trip_num`,
    `driver_trips`.`status` AS `trip_status`,
    `driver_trips`.`ride_date` AS `ride_date`,
    `driver_trips`.`today_date` AS `trip_today_date`,
    `driver_trips`.`pick_up_time` AS `pick_up_time`,
    `driver_trips`.`d_time` AS `d_time`,
    `driver_trips`.`trip_type` AS `trip_type`
FROM `driver_checkins`
LEFT JOIN `driver_trips` ON `driver_trips`.`driver_id` = `driver_checkins`.`driver_id`
WHERE `checkin_status` = 1 AND `booking_status` = 0;
Using a GROUP BY clause and the GROUP_CONCAT function should do the trick.
I am assuming that "driver_checkins" is your "Project" table and "driver_trips" is your "Projectnote" table.
SELECT `driver_checkins`.*, GROUP_CONCAT(`driver_trips`.`id`, `driver_trips`.`status` ORDER BY `driver_trips`.id DESC SEPARATOR " --- " LIMIT 3)
FROM `driver_checkins`
LEFT JOIN `driver_trips` ON `driver_trips`.`driver_id` = `driver_checkins`.`driver_id`
WHERE `checkin_status` = 1 AND `booking_status` = 0
GROUP BY `driver_checkins`.id;
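This should display id and status for the last 3 driver_trips per driver_checkin, separated by " --- ". Note that, as far as I know, a LIMIT inside GROUP_CONCAT is accepted by MariaDB (10.3.3 and later) but not by MySQL itself; on MySQL, drop the LIMIT 3 and all trips will be listed.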
Something to consider: while ordering by id will be chronological in many cases, it's better to add a timestamp column (e.g. called created) and order by that instead.
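If the goal is one row per project with only its single most recent note (and projects without notes still listed), another common pattern is to left join a derived table holding the highest note id per project. The sketch below is a hedged illustration, not the asker's schema: the project and projectnote tables, their columns, and the rows are all assumed, and it runs on in-memory SQLite from Python purely so it can be executed as-is. The query itself uses only standard SQL that MySQL also accepts.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projectnote (id INTEGER PRIMARY KEY, project_id INTEGER, note TEXT);
INSERT INTO project VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO projectnote VALUES
    (1, 1, 'kickoff'), (2, 1, 'design done'), (3, 2, 'on hold'), (4, 1, 'shipped');
""")

# Step 1 (derived table): the highest note id per project.
# Step 2: join that id back to projectnote to fetch the note text.
# LEFT JOINs keep projects that have no notes at all.
query = """
SELECT p.id, p.name, n.id AS note_id, n.note
FROM project AS p
LEFT JOIN (
    SELECT project_id, MAX(id) AS max_id
    FROM projectnote
    GROUP BY project_id
) AS newest ON newest.project_id = p.id
LEFT JOIN projectnote AS n ON n.id = newest.max_id
ORDER BY p.id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Unlike GROUP_CONCAT, this keeps the note's columns as separate values, at the cost of returning only one note per project.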
There is a script that returns two columns: reason names of type String and counts of type Long.
How can I combine the identical “Out of stock” values into one row, and merge the separate “marriage” and “re-sorting” values into a single “marriage / re-sorting” row, all in one table? The result should keep the types of the two columns and show the combined counts in the new rows. Also, how can I avoid displaying extra rows, for example, a form for employees, etc.?
I know that you can use the CASE, WHEN, THEN structure. But I don’t understand how to use it correctly in my script.
SELECT rl.reason AS reject_reason, COUNT(*)
FROM
mp.reservation_log AS rl
JOIN
mp.store AS st ON rl.store_id = st.md_id
JOIN mp.order_item oi ON oi.reserve_id=rl.reservation_id
JOIN mp.sku s ON s.id=oi.item_id
JOIN mp.product p ON p.id=s.product_id
WHERE rl.created_at > DATE(NOW()) - INTERVAL 1 MONTH AND rl.is_successful=0
GROUP BY rl.reason;
Table example:
It seems like you want to combine several reasons in the same group.
You can use a case expression as a non-aggregated column for this, like so:
select
    case
        when rl.reason in ('marriage', 're-sorting') then 'marriage/re-sorting'
        else rl.reason
    end real_reason,
    count(*) cnt
from ...
where rl.created_at > current_date - interval 1 month and rl.is_successful = 0
group by real_reason
Side note: DATE(NOW()) is better written as CURRENT_DATE.
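To make the grouping concrete, here is a small runnable sketch of the same case expression. It rests on assumptions: the reservation_log table is reduced to two columns, the rows (including the 'employee form' reason) are invented, the date filter from the original query is left out, and SQLite is used from Python only so the example is self-contained. The extra IN filter is one possible way to hide reasons that should not appear in the report.

```python
import sqlite3

# Reduced, hypothetical version of mp.reservation_log with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservation_log (reason TEXT, is_successful INTEGER)")
conn.executemany(
    "INSERT INTO reservation_log VALUES (?, ?)",
    [
        ("Out of stock", 0),
        ("Out of stock", 0),
        ("marriage", 0),
        ("marriage", 0),
        ("re-sorting", 0),
        ("employee form", 0),  # a reason we do not want in the report
        ("Out of stock", 1),   # successful reservation, filtered out
    ],
)

# The case expression maps two reasons onto one label before grouping;
# the IN list drops every reason that is not explicitly wanted.
query = """
SELECT
    CASE
        WHEN rl.reason IN ('marriage', 're-sorting') THEN 'marriage/re-sorting'
        ELSE rl.reason
    END AS real_reason,
    COUNT(*) AS cnt
FROM reservation_log AS rl
WHERE rl.is_successful = 0
  AND rl.reason IN ('Out of stock', 'marriage', 're-sorting')
GROUP BY real_reason
ORDER BY cnt DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```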
I want to create a report with the top 20 customers (based on revenue).
I am using the query:
SELECT dbo.CustTable.AccountNum
,dbo.dirpartytable.NAME
,dbo.hcmworker.PERSONNELNUMBER
,dbo.CustInvoiceJour.SALESBALANCE
,dbo.custinvoicejour.QTY
FROM dbo.CustTable
inner JOIN dbo.HCMWORKER ON dbo.HCMWORKER.RECID = dbo.CustTable.KEV_Worker
inner join dbo.custInvoiceJour on CustInvoiceJour.OrderAccount = CustTable.AccountNum
inner join dbo.dirpartytable on dirpartytable.recid = custtable.PARTY
where CustTable.KEV_Worker = '5633561745'
ORDER BY SalesBalanceMst DESC
I can't find the relation for the customer revenue, which is what I want to sort the report by. I am sorting on SalesBalanceMST right now while building the report. I am also getting multiple records when executing this query.
What am I doing wrong?
EDIT: I now realize I am showing each Invoice Journal, how can I display the Total Revenue of the customer?
A similar query from AX 2012:
CustInvoiceJour CustInvoiceJour;
CustTable CustTable;
DirPartyTable DirPartyTable;
select forceLiterals generateonly sum(SalesBalanceMST), sum(Qty) from CustInvoiceJour
where CustInvoiceJour.OrderAccount == '102372200'
&& CustInvoiceJour.InvoiceDate > today()-365
join TableId from CustTable
group AccountNum
where CustTable.AccountNum == CustInvoiceJour.OrderAccount
join TableId from DirPartyTable
group Name
where DirPartyTable.RecId == CustTable.Party;
info(CustInvoiceJour.getSQLStatement());
This shows the following SQL:
SELECT SUM(T1.SALESBALANCEMST),SUM(T1.QTY),T2.ACCOUNTNUM,T3.NAME
FROM CUSTINVOICEJOUR T1
CROSS JOIN CUSTTABLE T2
CROSS JOIN DIRPARTYTABLE T3
WHERE (((T1.PARTITION=5637144576) AND (T1.DATAAREAID=N'xxx'))
AND ((T1.ORDERACCOUNT=N'102372200')
AND (T1.INVOICEDATE>{ts '2015-11-06 00:00:00.000'})))
AND (((T2.PARTITION=5637144576) AND (T2.DATAAREAID=N'xxx'))
AND (T2.ACCOUNTNUM=T1.ORDERACCOUNT))
AND ((T3.PARTITION=5637144576)
AND (T3.RECID=T2.PARTY))
GROUP BY T2.ACCOUNTNUM,T3.NAME
ORDER BY T2.ACCOUNTNUM,T3.NAME
What is different from your query:
no join on HcmWorker, as I do not have your custom field.
Using sum() to aggregate
selecting on InvoiceDate
selection on OrderAccount
selection on DataAreaId, really important for performance, implicit in AX
selection on Partition, really important for performance, implicit in AX
You cannot sort directly on a sum, but you can sort on it from a nested SQL query.
I do not know exactly what is wrong in your query but perhaps this information can help you.
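The nested-query idea can be sketched as follows: aggregate per customer in an inner query, then sort and limit in the outer one. This is a minimal illustration under assumptions, not the AX schema: the cust_invoice_jour table, its three columns, and the rows are invented, and it runs on in-memory SQLite from Python. SQLite and MySQL use LIMIT here, whereas SQL Server would use SELECT TOP (20) instead.

```python
import sqlite3

# Hypothetical, simplified invoice journal with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust_invoice_jour (order_account TEXT, sales_balance_mst REAL, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO cust_invoice_jour VALUES (?, ?, ?)",
    [
        ("C1", 100.0, 1),
        ("C1", 250.0, 2),
        ("C2", 500.0, 5),
        ("C3", 50.0, 1),
        ("C3", 20.0, 1),
    ],
)

# Inner query: one row per customer with summed revenue and quantity.
# Outer query: sort on the summed revenue and keep the top N (2 here).
query = """
SELECT t.order_account, t.revenue, t.total_qty
FROM (
    SELECT order_account,
           SUM(sales_balance_mst) AS revenue,
           SUM(qty) AS total_qty
    FROM cust_invoice_jour
    GROUP BY order_account
) AS t
ORDER BY t.revenue DESC
LIMIT 2
"""
top_customers = conn.execute(query).fetchall()
for row in top_customers:
    print(row)
```

Grouping by customer is also what removes the duplicate rows: each invoice journal line is folded into its customer's total.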
Check the standard report CustTopCustomersbyYTDSales. It has some good queries for this.
https://technet.microsoft.com/en-us/library/hh389751.aspx
I have a MySQL table called EssayStats with three columns, EssayDate, WordCount and EssayId.
Each row is a record of when the bot recorded how many words were in an essay at a particular point in time.
I'm trying to write a query that will group by EssayId and sort by the largest increase in WordCount from a particular EssayDate to an ending EssayDate.
I'm not really sure where to start. I've tried a handful of things, but they obviously don't accomplish what I need. My most recent query attempt was
SELECT *
FROM EssayStats
WHERE EssayDate >= "2014-01-01" AND EssayDate <= "2014-05-31"
GROUP BY EssayId
ORDER BY (WordCount)
Start by getting the first and last dates for each essay. Then join back to the original table twice to get the word counts at those dates and do some arithmetic:
select es.EssayId, (esmax.WordCount - esmin.WordCount) as WordCountIncrease
from (select es.EssayId, min(es.EssayDate) as mined, max(es.EssayDate) as maxed
      from EssayStats es
      group by es.EssayId
     ) es join
     EssayStats esmin
     on es.EssayId = esmin.EssayId and es.mined = esmin.EssayDate join
     EssayStats esmax
     on es.EssayId = esmax.EssayId and es.maxed = esmax.EssayDate
order by WordCountIncrease desc;
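To restrict the comparison to a date window, as the question describes, the date filter can go inside the subquery that finds each essay's first and last date. The following is a hedged sketch on in-memory SQLite from Python: the snake_case table and column names and the rows are made up for illustration, dates are stored as date-only strings, and it assumes at most one row per essay per date.

```python
import sqlite3

# Hypothetical stand-in for EssayStats with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE essay_stats (essay_id INTEGER, essay_date TEXT, word_count INTEGER)"
)
conn.executemany(
    "INSERT INTO essay_stats VALUES (?, ?, ?)",
    [
        (1, "2014-01-10", 100),
        (1, "2014-03-01", 400),
        (1, "2014-05-20", 900),
        (2, "2014-02-01", 500),
        (2, "2014-04-15", 650),
        (3, "2013-12-01", 50),    # before the window, ignored
        (3, "2014-02-10", 300),   # only row in the window: increase of 0
        (3, "2014-06-15", 5000),  # after the window, ignored
    ],
)

# The subquery finds each essay's first and last date inside the window;
# the two joins fetch the word counts recorded on those dates.
query = """
SELECT r.essay_id, esmax.word_count - esmin.word_count AS increase
FROM (
    SELECT essay_id, MIN(essay_date) AS first_date, MAX(essay_date) AS last_date
    FROM essay_stats
    WHERE essay_date BETWEEN '2014-01-01' AND '2014-05-31'
    GROUP BY essay_id
) AS r
JOIN essay_stats AS esmin
    ON esmin.essay_id = r.essay_id AND esmin.essay_date = r.first_date
JOIN essay_stats AS esmax
    ON esmax.essay_id = r.essay_id AND esmax.essay_date = r.last_date
ORDER BY increase DESC
"""
increases = conn.execute(query).fetchall()
for row in increases:
    print(row)
```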
I'm building a report for a database where I need to determine the number of "first scans" grouping by company, job, and date.
The scan table can contain multiple scans for the same item; however, I only want to include the original scan in my COUNT, which can only be identified as the scan with the earliest date for a particular item.
My first attempt at this was:
SELECT
_item_detail.job_id,
_item_group.group_id,
_scan.company_id,
DATE(scan_date_time) as scan_date,
COUNT(1)
FROM _scan
INNER JOIN _item_detail ON _item_detail.company_id = _scan.company_id
AND
_item_detail.serial_number = _scan.serial_number
INNER JOIN _item_group ON _item_group.group_id = _item_detail.group_id
WHERE _item_detail.job_id = '0326FCM' AND _scan.company_id = '152345' AND _item_group.group_id = 13
GROUP BY
_item_detail.job_id,
_item_group.group_id,
_scan.company_id, scan_date -- first_scan_count
HAVING min(scan_date_time);
This is giving me incorrect results, though (about 3x too many). I am assuming it's because the MIN record is being recalculated for each date, so if the min was found on day 1, it may also be found on day 3 and counted again.
How can I modify my query to achieve the desired results?
Something similar to this should work... I'm not completely sure of how your tables are laid out or how the data relates them together, but this is the general idea:
SELECT
    _item_detail.job_id,
    _item_group.group_id,
    s1.company_id,
    DATE(s1.scan_date_time) AS scan_date,
    COUNT(1) AS first_scan_count
FROM
    _scan s1
    INNER JOIN _item_detail
        ON _item_detail.company_id = s1.company_id
        AND _item_detail.serial_number = s1.serial_number
        AND _item_detail.job_id = '0326FCM'
    INNER JOIN _item_group
        ON _item_group.group_id = _item_detail.group_id
        AND _item_group.group_id = 13
WHERE
    s1.company_id = '152345'
    -- keep only the earliest scan of each item
    AND s1.scan_date_time = (
        SELECT MIN(s2.scan_date_time)
        FROM _scan s2
        WHERE
            s2.company_id = s1.company_id
            AND s2.serial_number = s1.serial_number
    )
GROUP BY
    _item_detail.job_id,
    _item_group.group_id,
    s1.company_id,
    scan_date
I don't quite follow your query, but based on the description of the problem, I'd say create a subquery that gives the min scan date for each item (grouping by item), then perform your outer select on that.
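That suggestion could look roughly like the sketch below: the inner query reduces the scan table to one row per item with its earliest scan time, and the outer query counts those rows per job, company, and date. Everything here is assumed for illustration: the scan and item_detail tables are simplified versions of the asker's tables without the leading underscore or the group table, the rows are invented, and SQLite is used from Python only to keep the example self-contained.

```python
import sqlite3

# Simplified, hypothetical versions of the scan and item tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_detail (company_id TEXT, serial_number TEXT, job_id TEXT)")
conn.execute("CREATE TABLE scan (company_id TEXT, serial_number TEXT, scan_date_time TEXT)")
conn.executemany(
    "INSERT INTO item_detail VALUES (?, ?, ?)",
    [("152345", "A1", "0326FCM"), ("152345", "A2", "0326FCM"), ("152345", "A3", "0326FCM")],
)
conn.executemany(
    "INSERT INTO scan VALUES (?, ?, ?)",
    [
        ("152345", "A1", "2024-03-01 08:00:00"),
        ("152345", "A1", "2024-03-03 09:00:00"),  # rescan, must not be counted
        ("152345", "A2", "2024-03-01 10:00:00"),
        ("152345", "A2", "2024-03-02 11:00:00"),  # rescan
        ("152345", "A3", "2024-03-03 12:00:00"),
        ("152345", "A3", "2024-03-03 15:00:00"),  # rescan
    ],
)

# Inner query: exactly one row per item, carrying its earliest scan time.
# Outer query: count those first scans per job, company, and calendar date.
query = """
SELECT d.job_id, f.company_id, DATE(f.first_scan) AS scan_date, COUNT(*) AS first_scan_count
FROM (
    SELECT company_id, serial_number, MIN(scan_date_time) AS first_scan
    FROM scan
    GROUP BY company_id, serial_number
) AS f
JOIN item_detail AS d
    ON d.company_id = f.company_id AND d.serial_number = f.serial_number
GROUP BY d.job_id, f.company_id, scan_date
ORDER BY scan_date
"""
first_scans = conn.execute(query).fetchall()
for row in first_scans:
    print(row)
```

Since each item contributes a single row before the date grouping happens, a later rescan can no longer be counted as a "first scan" on another day.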