How to count the number of rows with a join in Talend on Oracle - mysql

I have 3 tables:
supplier(id_supp, name, adress, ...)
Customer(id_cust, name, adress, ...)
Order(id_order, ref_cust, ref_supp, date_order...)
I want to make a job that counts the number of orders by Supplier, for last_week, last_two_weeks with Talend
select
supp.name,
(
select
count(*)
from
order
where
date_order between sysdate-7 and sysdate
and ref_supp=id_supp
) as week_1,
(
select
count(*)
from
order
where
date_order between sysdate-14 and sysdate-7
and ref_supp=id_supp
) as week_2
from supplier supp
The reason I'm asking is that my query takes too much time.

You need a join between supplier and order to get supplier names. I show an inner join, but if you need ALL suppliers (even those with no orders in the order table) you may change it to a left outer join.
Other than that, you should only have to read the order table once and get all the info you need. Your query does more than one pass (read EXPLAIN PLAN for your query), which may be why it is taking too long.
NOTE: sysdate has a time-of-day component (and perhaps the date_order value does too); the way you wrote the query may or may not do exactly what you want it to do. You may have to surround sysdate by trunc().
select s.name,
count(case when o.date_order between sysdate - 7 and sysdate then 1 end)
as week_1,
count(case when o.date_order between sysdate - 14 and sysdate - 7 then 1 end)
as week_2
from supplier s inner join order o
on s.id_supp = o.ref_supp
group by s.name
;
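To make the single-pass idea concrete, here is a minimal sketch that runs on SQLite through Python's sqlite3 module rather than on Oracle. The sample rows, the table name orders (chosen to avoid the reserved word order), and the fixed REF_DATE standing in for trunc(sysdate) are all assumptions for illustration. It also uses a left join and half-open date ranges, so suppliers without orders still appear and an order dated exactly 7 days back is not counted in both weeks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (id_supp INTEGER PRIMARY KEY, name TEXT);
    -- hypothetical table name: orders instead of the reserved word order
    CREATE TABLE orders (id_order INTEGER PRIMARY KEY, ref_supp INTEGER, date_order TEXT);
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO orders VALUES
        (1, 1, '2024-03-14'),
        (2, 1, '2024-03-12'),
        (3, 1, '2024-03-05'),
        (4, 2, '2024-03-03'),
        (5, 2, '2024-02-01');
""")

# Assumed reference date; plays the role of trunc(sysdate) in the Oracle query.
REF_DATE = "2024-03-15"

# One pass over orders: each CASE yields 1 only for rows in its window,
# and COUNT ignores the NULLs produced for every other row.
rows = conn.execute("""
    SELECT s.name,
           COUNT(CASE WHEN o.date_order >  date(:ref, '-7 days')
                       AND o.date_order <= :ref THEN 1 END) AS week_1,
           COUNT(CASE WHEN o.date_order >  date(:ref, '-14 days')
                       AND o.date_order <= date(:ref, '-7 days') THEN 1 END) AS week_2
    FROM supplier s
    LEFT JOIN orders o ON o.ref_supp = s.id_supp
    GROUP BY s.id_supp, s.name
    ORDER BY s.name
""", {"ref": REF_DATE}).fetchall()

for row in rows:
    print(row)
```

The supplier with no orders comes back with 0 in both columns, because the left join produces a single row with a NULL date_order that neither CASE matches.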

Related

SQL get one time customers by email field

I have a database with over 100,000 records. I'm trying to get all customers who ordered only once searching by customer's email field (OrderEmail).
The SQL query is running for 10 minutes and then times out.
If I use short date ranges, I can get results but it still takes over 3 minutes.
How can I optimize the syntax to get it work?
SELECT
tblOrders.OrderID,
tblOrders.OrderName,
tblOrders.OrderEmail,
tblOrders.OrderPhone,
tblOrders.OrderCountry,
tblOrders.OrderDate
FROM
tblOrders
LEFT JOIN tblOrders AS orders_join ON orders_join.OrderEmail = tblOrders.OrderEmail
AND NOT orders_join.OrderID = tblOrders.OrderID
WHERE
orders_join.OrderID IS NULL
AND (tblOrders.OrderDate BETWEEN '2015-01-01' AND '2017-03-01')
AND tblOrders.OrderDelivered = - 1
ORDER BY
tblOrders.OrderID ASC;
I would expect the query below to work, but I can't test it against your data as you don't provide any sample data. I added a temporary table definition that can be used for the query.
But if you could actually change the data model to use an INTEGER id for the entity who placed the order (instead of a VARCHAR() email address), the query would get considerably faster.
CREATE TEMPORARY TABLE IF NOT EXISTS
tblorders(orderid,ordername,orderemail,orderphone,ordercountry,orderdate) AS (
SELECT 1,'ORD01','adent#hog.com' ,'9-991' ,'UK', DATE '2017-01-01'
UNION ALL SELECT 2,'ORD02','tricia#hog.com','9-992' ,'UK', DATE '2017-01-02'
UNION ALL SELECT 3,'ORD03','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-03'
UNION ALL SELECT 4,'ORD04','zaphod#hog.com','9-9943','UK', DATE '2017-01-04'
UNION ALL SELECT 5,'ORD05','marvin#hog.com','9-9942','UK', DATE '2017-01-05'
UNION ALL SELECT 6,'ORD06','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-06'
UNION ALL SELECT 7,'ORD07','tricia#hog.com','9-992' ,'UK', DATE '2017-01-07'
UNION ALL SELECT 8,'ORD08','benji#hog.com' ,'9-995' ,'UK', DATE '2017-01-08'
UNION ALL SELECT 9,'ORD09','benji#hog.com' ,'9-995' ,'UK', DATE '2017-01-09'
UNION ALL SELECT 10,'ORD10','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-10'
)
;
SELECT
tblOrders.OrderID
, tblOrders.OrderName
, tblOrders.OrderEmail
, tblOrders.OrderPhone
, tblOrders.OrderCountry
, tblOrders.OrderDate
FROM tblOrders
JOIN (
SELECT
OrderEmail
FROM tblOrders
GROUP BY
OrderEmail
HAVING COUNT(*) = 1
) singleOrders
ON singleOrders.OrderEmail = tblOrders.OrderEmail
ORDER BY OrderID
;
OrderID|OrderName|OrderEmail |OrderPhone|OrderCountry|OrderDate
1|ORD01 |adent#hog.com |9-991 |UK |2017-01-01
4|ORD04 |zaphod#hog.com|9-9943 |UK |2017-01-04
5|ORD05 |marvin#hog.com|9-9942 |UK |2017-01-05
As you can see, it returns Mr. Dent, Zaphod and Marvin, who all occur only once in the example data.
Another approach that might work is that you group by email address and get only those with one entry. It may behave unpredictably if you want to get customers with multiple orders but it should be fine for this particular case:
SELECT
tblOrders.OrderID,
tblOrders.OrderName,
tblOrders.OrderEmail,
tblOrders.OrderPhone,
tblOrders.OrderCountry,
tblOrders.OrderDate,
count(tblOrders.OrderID) as OrderCount
FROM
tblOrders
WHERE
tblOrders.OrderDate BETWEEN '2015-01-01' AND '2017-03-01'
AND tblOrders.OrderDelivered = - 1
GROUP BY
tblOrders.OrderEmail
HAVING
OrderCount = 1
ORDER BY
tblOrders.OrderID ASC;
Also, I suspect that if you're seeing so long query times with just 100k records, you probably don't have an index on the OrderEmail column - I suggest setting that up and that might help with your original queries as well.
This does not work in Oracle or SQL Server, but it does work in MySQL and SQLite. So, while the code is not portable between different RDBMS, it works for this particular case.
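As a rough illustration of the group-by-email idea together with the suggested index, here is a small sketch on SQLite via Python's sqlite3 module. The sample rows and the index name idx_orders_email are made up. Wrapping the non-grouped columns in MIN() is a swapped-in variation, not what the answer above does: since every surviving group holds exactly one row, MIN() simply returns that row's value, and the statement no longer depends on MySQL's permissive GROUP BY handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblOrders (OrderID INTEGER PRIMARY KEY, OrderName TEXT, OrderEmail TEXT)"
)
conn.executemany("INSERT INTO tblOrders VALUES (?, ?, ?)", [
    (1, "ORD01", "adent#hog.com"),
    (2, "ORD02", "tricia#hog.com"),
    (3, "ORD03", "ford#hog.com"),
    (4, "ORD04", "zaphod#hog.com"),
    (5, "ORD05", "ford#hog.com"),
    (6, "ORD06", "tricia#hog.com"),
])

# Hypothetical index name; the point is only that OrderEmail is indexed.
conn.execute("CREATE INDEX idx_orders_email ON tblOrders (OrderEmail)")

# Keep only e-mail addresses that occur once. Each such group has a single
# row, so MIN() just passes that row's values through.
single_orders = conn.execute("""
    SELECT MIN(OrderID) AS OrderID, MIN(OrderName) AS OrderName, OrderEmail
    FROM tblOrders
    GROUP BY OrderEmail
    HAVING COUNT(*) = 1
    ORDER BY OrderID
""").fetchall()

for row in single_orders:
    print(row)
```

Whether the index actually speeds up a given query depends on the engine and the data, so it is worth checking the execution plan on the real table.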

Displaying data with respect to specific date?

I am trying to make a reporting system where I need to display a report for each date.
This is my table schema for selected_items
This is stock_list
I am using PHP in the back end and Java in the front end to display the data. I tried a couple of queries to get the desired output, but so far I have not been able to get it. These are some of the queries I used.
SELECT
COALESCE(stock_list.date, selected_items.date) AS date,
SUM( stock_list.qty ) AS StockSum,
SUM( stock_list.weight ) AS Stockweight,
COUNT( selected_items.barcode ) AS BilledItems,
SUM( selected_items.weight ) AS Billedweight
FROM stock_list join selected_items
ON stock_list.date = selected_items.date
GROUP BY COALESCE(stock_list.date, selected_items.date)
ORDER BY COALESCE(stock_list.date, selected_items.date);
This gives me the first five columns but the output gives me wrong values.
Then I also tried Union.
SELECT SUM( qty ) AS StockSum, SUM( weight ) AS Stockweight
FROM `stock_list`
WHERE DATE LIKE '08-Jan-2016'
UNION SELECT COUNT( barcode ) AS BilledItems, SUM( weight ) AS Billedweight
FROM `selected_items`
WHERE DATE LIKE '08-Jan-2016'
UNION SELECT SUM( qty ) AS TotalStock, SUM( weight ) AS TotalWeight
FROM `stock_list`;
Here I get the correct values for four columns, but the problem is that the result is displayed in two columns when I would like it to be in 4 columns.
Can anyone guide me, please? I have figured out the Java part of it, but I am not good at PHP and MySQL.
Thank you
Unfortunately, SQL Fiddle crashed while I was trying to execute this query
SELECT DISTINCT sl.date AS date, B.qtySum AS StockSum, B.weightSum AS Stockweight,
C.barcodeCount AS BilledItems, C.weightSum AS Billedweight
FROM stock_list sl
JOIN (SELECT date, SUM(qty) AS qtySum, SUM(weight) AS weightSum
FROM stock_list GROUP BY date) AS B
ON B.date = sl.date
JOIN (SELECT date, SUM(weight) AS weightSum, COUNT(barcode) AS barcodeCount
FROM selected_items GROUP BY date) AS C
ON C.date = sl.date;
The problem with plain joins is that each row gets joined multiple times, so the sums go awry. For example, if four rows from the second table match, the sum is four times higher than it should be. With subqueries you can avoid this problem, because you count and sum before joining, so the numbers should be right. Alas, I couldn't run the query, so I'm not 100% sure it works, but it should be the right approach.
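The row multiplication is easy to reproduce. The sketch below uses SQLite through Python's sqlite3 module with invented sample rows (two stock rows and three billed items on the same date) and compares a plain join with the aggregate-then-join form. The lowercase aliases b and c and the direct join between the two subqueries are choices made for this sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock_list (date TEXT, qty INTEGER, weight REAL);
    CREATE TABLE selected_items (date TEXT, barcode TEXT, weight REAL);
    INSERT INTO stock_list VALUES ('2016-01-08', 10, 1.0), ('2016-01-08', 5, 2.0);
    INSERT INTO selected_items VALUES
        ('2016-01-08', 'A1', 0.5), ('2016-01-08', 'A2', 0.5), ('2016-01-08', 'A3', 1.0);
""")

# Plain join: 2 stock rows x 3 billed rows = 6 joined rows, so every stock
# row is summed three times and every billed item is counted twice.
naive = conn.execute("""
    SELECT sl.date, SUM(sl.qty), COUNT(si.barcode)
    FROM stock_list sl
    JOIN selected_items si ON si.date = sl.date
    GROUP BY sl.date
""").fetchall()

# Aggregate each table down to one row per date first, then join.
pre_aggregated = conn.execute("""
    SELECT b.date, b.qtySum, b.weightSum, c.barcodeCount, c.weightSum
    FROM (SELECT date, SUM(qty) AS qtySum, SUM(weight) AS weightSum
          FROM stock_list GROUP BY date) AS b
    JOIN (SELECT date, COUNT(barcode) AS barcodeCount, SUM(weight) AS weightSum
          FROM selected_items GROUP BY date) AS c
      ON c.date = b.date
""").fetchall()

print(naive)           # inflated totals
print(pre_aggregated)  # one correct row per date
```

With this data the plain join reports a stock quantity of 45 and 6 billed items, while the pre-aggregated version reports 15 and 3.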

MySQL right outer join query

I have a query regarding a query in MySQL.
I have 2 tables one containing SalesRep details like name, email, etc. I have another table with the sales data which has reportDate, customers served and link to the salesrep via a foreign key. One thing to note is that the reportDate is always a friday.
So the requirement is this: I need to find sales data for a 13 week period for a given list of sales reps - with 0 as customers served if on a particular friday there is no data. The query result is consumed by a Java application which relies on the 13 rows of data per sales rep.
I have created a table with all the Friday dates populated and wrote an outer join like below:
select * from (
select name, customersServed, reportDate
from Sales_Data salesData
join `SALES_REPRESENTATIVE` salesRep on salesRep.`employeeId` = salesData.`employeeId`
where employeeId = 1
) as result
right outer join fridays on fridays.datefield = reportDate
where fridays.datefield between '2014-10-01' and '2014-12-31'
order by datefield
Now my doubts:
Is there any way where i can get the name to be populated for all 13 rows in the above query?
If there are 2 sales reps, I'd like to use a IN clause and expect 26 rows in total - 13 rows per sales person (even if there is no record for that person, I'd still like to see 13 rows of nulls), and 39 for 3 sales reps
Can these be done in MySql and if so, can anyone point me in the right direction?
You must first select your rows (without customersServed) and then do an outer join to get customersServed, something like this:
select records.name, records.datefield, IFNULL(salesData.customersServed,0)
from (
select employeeId, name, datefield
from `SALES_REPRESENTATIVE`, fridays
where fridays.datefield between '2014-10-01' and '2014-12-31'
and employeeId in (...)
) as records
left outer join `Sales_Data` salesData on (salesData.employeeId = records.employeeId and salesData.reportDate = records.datefield)
order by records.name, records.datefield
You'll need two levels of nesting. In your nested query, change to an outer join on the sales rep so that you have at least 1 record for each rep; then join with fridays without any condition to get at least 13 records for each rep; then do a final right outer join with the condition (fridays.datefield = innerfriday.datefield and (reportDate is null or reportDate=innerfriday.datefield)).
This is very inefficient; except for very small data, try to do it in code instead.
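The first suggestion (build every rep and Friday combination, then left join the sales data onto it) can be sketched as follows. This runs on SQLite through Python's sqlite3 module, not MySQL. The rep names, the three Fridays and the sales rows are invented sample data, and COALESCE is used as the portable equivalent of IFNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SALES_REPRESENTATIVE (employeeId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Sales_Data (employeeId INTEGER, reportDate TEXT, customersServed INTEGER);
    CREATE TABLE fridays (datefield TEXT PRIMARY KEY);
    INSERT INTO SALES_REPRESENTATIVE VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO fridays VALUES ('2014-10-03'), ('2014-10-10'), ('2014-10-17');
    -- Ann reported on two of the three Fridays, Bob never reported
    INSERT INTO Sales_Data VALUES (1, '2014-10-03', 12), (1, '2014-10-17', 7);
""")

# CROSS JOIN yields one row per (rep, Friday); the LEFT JOIN then attaches
# sales data where it exists and leaves NULL (turned into 0) where it does not.
rows = conn.execute("""
    SELECT r.name, f.datefield, COALESCE(d.customersServed, 0) AS customersServed
    FROM SALES_REPRESENTATIVE r
    CROSS JOIN fridays f
    LEFT JOIN Sales_Data d
           ON d.employeeId = r.employeeId
          AND d.reportDate = f.datefield
    WHERE r.employeeId IN (1, 2)
      AND f.datefield BETWEEN '2014-10-01' AND '2014-12-31'
    ORDER BY r.name, f.datefield
""").fetchall()

for row in rows:
    print(row)
```

With three Fridays and two reps this returns six rows, so with 13 Fridays in the table the same shape should give 13 rows per rep, each carrying the rep's name.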

How can I optimize the query below which uses three levels of select statements?

How to optimize the below query:
I have two tables, 'calendar_table' and 'consumption', Here I use this query to calculate monthly consumption for each year.
The calendar table has day, month and year for the years 2005 - 2009, and the consumption table has billed consumption data for a monthly bill cycle. This query will count the number of days for each bill and use that to find the consumption for each month.
SELECT id,
date_from as bill_start_date,
theYear as Year,
MONTHNAME(STR_TO_DATE(theMonth, '%m')) as month,
sum(DaysOnBill),
TotalDaysInTheMonth,
sum(perDayConsumption * DaysOnBill) as EstimatedConsumption
FROM
(
SELECT
id,
date_from,
theYear,
theMonth, # use theMonth for displaying the month as a number
COUNT(*) AS DaysOnBill,
TotalDaysInTheMonth,
perDayConsumption
FROM
(
SELECT
c.id,
c.date_from as date_from,
ct.dt,
y AS theYear,
month AS theMonth,
DAY(LAST_DAY(ct.dt)) as TotalDaysInTheMonth,
perDayConsumption
FROM
consumption AS c
INNER JOIN
calendar_table AS ct
ON ct.dt >= c.date_from
AND ct.dt<= c.date_to
) AS allDates
GROUP BY
id,
date_from,
theYear,
theMonth ) AS estimates
GROUP BY
id,
theYear,
theMonth;
It is taking around 1000 seconds to go through around 1 million records. Can something be done to make it faster?
The query is a bit dubious pretending to do one grouping first and then building on that with another, which actually isn't the case.
First the bill gets joined with all its days. Then we group by bill plus month and year thus getting a monthly view on the data. This could be done in one pass, but the query is joining first and then using the result as a derived table which gets aggregated. At last the results are taken again and "another" group is built, which is actually the same as before (bill plus month and year) and some pseudo aggregations are done (e.g. sum(perDayConsumption * DaysOnBill) which is the same as perDayConsumption * DaysOnBill, as SUM sums one record only here).
This can simply be written as:
SELECT
c.id,
c.date_from as bill_start_date,
ct.y AS Year,
MONTHNAME(STR_TO_DATE(ct.month, '%m')) as month,
COUNT(*) AS DaysOnBill,
DAY(LAST_DAY(ct.dt)) as TotalDaysInTheMonth,
SUM(c.perDayConsumption) as EstimatedConsumption
FROM consumption AS c
INNER JOIN calendar_table AS ct ON ct.dt BETWEEN c.date_from AND c.date_to
GROUP BY
c.id,
ct.y,
ct.month;
I don't know whether this will be faster, or whether MySQL's optimizer already sees through your query and boils it down to this anyhow.
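For anyone who wants to check the single-pass form on a small case, here is a sketch using SQLite through Python's sqlite3 module. It assumes one invented bill running from 2005-01-20 to 2005-02-10 at 2.0 units per day and a 90-day calendar generated in Python. It says nothing about MySQL performance and only shows that one join plus one GROUP BY yields the per-month split.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar_table (dt TEXT PRIMARY KEY, y INTEGER, month INTEGER);
    CREATE TABLE consumption (id INTEGER, date_from TEXT, date_to TEXT,
                              perDayConsumption REAL);
    INSERT INTO consumption VALUES (1, '2005-01-20', '2005-02-10', 2.0);
""")

# Fill the calendar with 90 consecutive days starting on 2005-01-01.
start = date(2005, 1, 1)
days = [start + timedelta(days=i) for i in range(90)]
conn.executemany(
    "INSERT INTO calendar_table VALUES (?, ?, ?)",
    [(d.isoformat(), d.year, d.month) for d in days],
)

# One join, one GROUP BY: each joined row is one billed day, so COUNT(*)
# gives the days on the bill and SUM(perDayConsumption) the monthly estimate.
rows = conn.execute("""
    SELECT c.id, ct.y, ct.month,
           COUNT(*) AS DaysOnBill,
           SUM(c.perDayConsumption) AS EstimatedConsumption
    FROM consumption AS c
    JOIN calendar_table AS ct ON ct.dt BETWEEN c.date_from AND c.date_to
    GROUP BY c.id, ct.y, ct.month
    ORDER BY c.id, ct.y, ct.month
""").fetchall()

for row in rows:
    print(row)
```

The bill covers 12 days in January and 10 in February, so the estimate splits into 24.0 and 20.0.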

MySQL Query with Multiple Dates

I'm trying to pull some activity reports for an application based on date ranges (number of "Stars" for each post)
It has a post table that includes an account id, and an "affiliate" table that ties that id to an account.
I know that I can do:
SELECT
posts.affid,
affiliates.name,
sum(posts.stars) AS SEPT_2012
from posts
JOIN affiliates on posts.affid = affiliates.id
WHERE posts.timestamp BETWEEN '2012-09-01' AND '2012-10-01'
group by affid
That will pull a result that has the affiliate ID, Name and total "stars" from September. A single month
However, I'd like to do a pull that goes back and gets numbers for August, July, June and May that would display in a single query result (so the result would include affid, name, SEPT_2012, AUG_2012, JUL_2012...etc). Essentially, doing subqueries for those other date ranges, I assume.
Any suggestions?
Thanks for your help!
You probably want to GROUP BY EXTRACT(YEAR_MONTH FROM timestamp) (in addition to whatever else you want to do). Of course it will not get you the SEPT_2012, AUG_2012, etc. columns, but the data will be there.
While you won't be able to dynamically create the columns, you can "fake" them and use a UNION for each date range. Inside each UNION, you select 0 for the other date columns and the SUM() for the correct column.
Something similar to this should work:
SELECT
posts.affid,
affiliates.name,
sum(posts.stars) AS SEPT_2012,
0 AS AUG_2012,
0 AS JUL_2012
from
posts
JOIN affiliates on posts.affid = affiliates.id
WHERE
posts.timestamp BETWEEN '2012-09-01' AND '2012-10-01'
group by affid
UNION (
SELECT
posts.affid,
affiliates.name,
0 AS SEPT_2012,
sum(posts.stars) AS AUG_2012,
0 AS JUL_2012
from
posts
JOIN affiliates on posts.affid = affiliates.id
WHERE
posts.timestamp BETWEEN '2012-08-01' AND '2012-09-01'
group by affid
)
UNION (
SELECT
posts.affid,
affiliates.name,
0 AS SEPT_2012,
0 AS AUG_2012,
sum(posts.stars) AS JUL_2012
from
posts
JOIN affiliates on posts.affid = affiliates.id
WHERE
posts.timestamp BETWEEN '2012-07-01' AND '2012-08-01'
group by affid
)
UPDATE (to combine all results for each affid on a single row)
Per a comment, you would like to combine the results for each posts.affid on a single row with all of the data in each column. You can achieve this by putting an outer-query around the full query above and then using GROUP BY affid again. With this, you should have a single row for each affid and all of the columns as requested. I've updated the above query to select 0 for each empty column instead of null for "nicer" output too:
SELECT affid, name, SEPT_2012, AUG_2012, JUL_2012 FROM (
... full query above ...
) AS q
GROUP BY affid
UPDATE
To get the sum of all "stars" from all subqueries, the outer select statement works with:
SELECT affid, name, sum(SEPT_2012), sum(AUG_2012), sum(JUL_2012) FROM (
... full query above ...
) AS q
GROUP BY affid
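Put together, the whole thing might look like the sketch below, run here on SQLite through Python's sqlite3 module with made-up affiliates and posts. Two details differ from the queries above and are assumptions of this sketch: it uses UNION ALL so that no branch row is ever merged away, and half-open ranges (>= start, < next month) so that a post stamped exactly on the first of a month is not counted in two columns, which an inclusive BETWEEN would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE affiliates (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (affid INTEGER, stars INTEGER, timestamp TEXT);
    INSERT INTO affiliates VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO posts VALUES
        (1, 3, '2012-09-10'), (1, 2, '2012-09-20'), (1, 4, '2012-08-05'),
        (2, 5, '2012-08-15'), (2, 1, '2012-07-30');
""")

# Each branch fills exactly one month column and puts 0 in the others;
# the outer query then folds the branches into one row per affiliate.
rows = conn.execute("""
    SELECT affid, name,
           SUM(SEPT_2012) AS SEPT_2012,
           SUM(AUG_2012)  AS AUG_2012,
           SUM(JUL_2012)  AS JUL_2012
    FROM (
        SELECT p.affid, a.name,
               SUM(p.stars) AS SEPT_2012, 0 AS AUG_2012, 0 AS JUL_2012
        FROM posts p JOIN affiliates a ON p.affid = a.id
        WHERE p.timestamp >= '2012-09-01' AND p.timestamp < '2012-10-01'
        GROUP BY p.affid, a.name
        UNION ALL
        SELECT p.affid, a.name, 0, SUM(p.stars), 0
        FROM posts p JOIN affiliates a ON p.affid = a.id
        WHERE p.timestamp >= '2012-08-01' AND p.timestamp < '2012-09-01'
        GROUP BY p.affid, a.name
        UNION ALL
        SELECT p.affid, a.name, 0, 0, SUM(p.stars)
        FROM posts p JOIN affiliates a ON p.affid = a.id
        WHERE p.timestamp >= '2012-07-01' AND p.timestamp < '2012-08-01'
        GROUP BY p.affid, a.name
    ) AS q
    GROUP BY affid, name
    ORDER BY affid
""").fetchall()

for row in rows:
    print(row)
```

Months in which an affiliate has no posts come out as 0, because the other branches contribute their placeholder zeros to the sum.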