I have a database with over 100,000 records. I'm trying to get all customers who ordered only once, identified by the customer's email field (OrderEmail).
The SQL query runs for 10 minutes and then times out.
If I use short date ranges I can get results, but it still takes over 3 minutes.
How can I optimize the query so that it works?
SELECT
tblOrders.OrderID,
tblOrders.OrderName,
tblOrders.OrderEmail,
tblOrders.OrderPhone,
tblOrders.OrderCountry,
tblOrders.OrderDate
FROM
tblOrders
LEFT JOIN tblOrders AS orders_join ON orders_join.OrderEmail = tblOrders.OrderEmail
AND NOT orders_join.OrderID = tblOrders.OrderID
WHERE
orders_join.OrderID IS NULL
AND (tblOrders.OrderDate BETWEEN '2015-01-01' AND '2017-03-01')
AND tblOrders.OrderDelivered = - 1
ORDER BY
tblOrders.OrderID ASC;
I would expect the query below to work, but I can't test it against your data because you didn't provide a sample. I added a temporary table definition that can be used to try the query.
However, if you could change the data model to use an INTEGER id for the entity that placed the order (instead of a VARCHAR() email address), the query would be considerably faster.
CREATE TEMPORARY TABLE IF NOT EXISTS
tblorders(orderid,ordername,orderemail,orderphone,ordercountry,orderdate) AS (
SELECT 1,'ORD01','adent#hog.com' ,'9-991' ,'UK', DATE '2017-01-01'
UNION ALL SELECT 2,'ORD02','tricia#hog.com','9-992' ,'UK', DATE '2017-01-02'
UNION ALL SELECT 3,'ORD03','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-03'
UNION ALL SELECT 4,'ORD04','zaphod#hog.com','9-9943','UK', DATE '2017-01-04'
UNION ALL SELECT 5,'ORD05','marvin#hog.com','9-9942','UK', DATE '2017-01-05'
UNION ALL SELECT 6,'ORD06','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-06'
UNION ALL SELECT 7,'ORD07','tricia#hog.com','9-992' ,'UK', DATE '2017-01-07'
UNION ALL SELECT 8,'ORD08','benji#hog.com' ,'9-995' ,'UK', DATE '2017-01-08'
UNION ALL SELECT 9,'ORD09','benji#hog.com' ,'9-995' ,'UK', DATE '2017-01-09'
UNION ALL SELECT 10,'ORD10','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-10'
)
;
SELECT
tblOrders.OrderID
, tblOrders.OrderName
, tblOrders.OrderEmail
, tblOrders.OrderPhone
, tblOrders.OrderCountry
, tblOrders.OrderDate
FROM tblOrders
JOIN (
SELECT
OrderEmail
FROM tblOrders
GROUP BY
OrderEmail
HAVING COUNT(*) = 1
) singleOrders
ON singleOrders.OrderEmail = tblOrders.OrderEmail
ORDER BY OrderID
;
OrderID|OrderName|OrderEmail |OrderPhone|OrderCountry|OrderDate
1|ORD01 |adent#hog.com |9-991 |UK |2017-01-01
4|ORD04 |zaphod#hog.com|9-9943 |UK |2017-01-04
5|ORD05 |marvin#hog.com|9-9942 |UK |2017-01-05
As you can see, it returns Mr. Dent, Zaphod and Marvin, who all occur only once in the example data.
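As a rough cross-check, the sketch below loads the same sample rows into an in-memory SQLite database and runs both the anti-join from the question and the GROUP BY / HAVING join from this answer. SQLite stands in for the real database here, and the date and OrderDelivered filters are left out because the sample table has no OrderDelivered column; both are assumptions of this sketch, not part of the original queries.

```python
import sqlite3

# Sample rows copied from the temporary table in the answer above.
ROWS = [
    (1, "ORD01", "adent#hog.com", "9-991", "UK", "2017-01-01"),
    (2, "ORD02", "tricia#hog.com", "9-992", "UK", "2017-01-02"),
    (3, "ORD03", "ford#hog.com", "9-993", "UK", "2017-01-03"),
    (4, "ORD04", "zaphod#hog.com", "9-9943", "UK", "2017-01-04"),
    (5, "ORD05", "marvin#hog.com", "9-9942", "UK", "2017-01-05"),
    (6, "ORD06", "ford#hog.com", "9-993", "UK", "2017-01-06"),
    (7, "ORD07", "tricia#hog.com", "9-992", "UK", "2017-01-07"),
    (8, "ORD08", "benji#hog.com", "9-995", "UK", "2017-01-08"),
    (9, "ORD09", "benji#hog.com", "9-995", "UK", "2017-01-09"),
    (10, "ORD10", "ford#hog.com", "9-993", "UK", "2017-01-10"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblOrders (OrderID INTEGER PRIMARY KEY, OrderName TEXT, "
    "OrderEmail TEXT, OrderPhone TEXT, OrderCountry TEXT, OrderDate TEXT)"
)
conn.executemany("INSERT INTO tblOrders VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Anti-join in the style of the question: keep orders with no sibling order.
ANTI_JOIN = """
SELECT o.OrderID
FROM tblOrders AS o
LEFT JOIN tblOrders AS other
       ON other.OrderEmail = o.OrderEmail AND other.OrderID <> o.OrderID
WHERE other.OrderID IS NULL
ORDER BY o.OrderID
"""

# Join against the emails that occur exactly once, as in the answer.
GROUPED = """
SELECT o.OrderID
FROM tblOrders AS o
JOIN (SELECT OrderEmail FROM tblOrders
      GROUP BY OrderEmail HAVING COUNT(*) = 1) AS singleOrders
  ON singleOrders.OrderEmail = o.OrderEmail
ORDER BY o.OrderID
"""

def order_ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

print(order_ids(ANTI_JOIN), order_ids(GROUPED))
```

On this sample both formulations select the same orders, so the rewrite changes how the work is done rather than what is returned.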
Another approach that might work is to group by email address and keep only the groups with one entry. It may behave unpredictably if you want customers with multiple orders, because the non-grouped columns are then taken from an arbitrary row of each group, but it should be fine for this particular case:
SELECT
tblOrders.OrderID,
tblOrders.OrderName,
tblOrders.OrderEmail,
tblOrders.OrderPhone,
tblOrders.OrderCountry,
tblOrders.OrderDate,
count(tblOrders.OrderID) as OrderCount
FROM
tblOrders
WHERE
tblOrders.OrderDate BETWEEN '2015-01-01' AND '2017-03-01'
AND tblOrders.OrderDelivered = - 1
GROUP BY
tblOrders.OrderEmail
HAVING
OrderCount = 1
ORDER BY
tblOrders.OrderID ASC;
Also, I suspect that if you're seeing such long query times with just 100k records, you probably don't have an index on the OrderEmail column. I suggest adding one; it might help with your original query as well.
This does not work in Oracle or SQL Server, but it does work in MySQL and SQLite (in MySQL 5.7.5 and later, only when the ONLY_FULL_GROUP_BY SQL mode is disabled). So, while the code is not portable between different RDBMSs, it works for this particular case.
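The index suggestion can be tried out on a small scale. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for the real database's EXPLAIN; the index name idx_orders_email and the trimmed-down table definition are assumptions made for illustration, and the planner of another RDBMS may decide differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblOrders (OrderID INTEGER PRIMARY KEY, OrderEmail TEXT, "
    "OrderDate TEXT, OrderDelivered INTEGER)"
)

# Self-join on OrderEmail, as in the original question (filters omitted).
QUERY = """
SELECT o.OrderID
FROM tblOrders AS o
LEFT JOIN tblOrders AS other
       ON other.OrderEmail = o.OrderEmail AND other.OrderID <> o.OrderID
WHERE other.OrderID IS NULL
"""

def query_plan(sql):
    """Return the textual plan steps SQLite reports for a statement."""
    # The human-readable description is the fourth column of each plan row.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = query_plan(QUERY)

# Hypothetical index name; any name works.
conn.execute("CREATE INDEX idx_orders_email ON tblOrders (OrderEmail)")
plan_after = query_plan(QUERY)

print(plan_before)
print(plan_after)
```

Once the index exists, the plan for the inner side of the join should mention it, which means the matching rows are looked up through the index instead of being found by rescanning the table for every order.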
It has been some years since I studied SQL, so I am having trouble getting a distinct, sorted list of formatted dates using MySQL. I don't need to show my table because I only use one datetime column:
data_vencimento datetime
If I have 2018-10-29, 2018-10-29 and 2018-09-29, the result should be sorted as
10/2018
09/2018
Notice that the repeated date is "removed" and a sorted list of formatted dates is generated.
Here is my attempt. It generates repeated results.
select distinct(data_vencimento), date_format( data_vencimento,'%m/%Y' ) as data from (
select data_vencimento from custo_extra_movimento where id_admin
union
select data_vencimento as data from custo_fixo_movimento where id_admin
union
select data_vencimento as data from custo_variavel_movimento where id_admin) as tbl order by data_vencimento desc ;
DISTINCT is not a function, so you do not need parentheses with it.
What you need is a distinct combination of month and year, so you can use GROUP BY instead, along with date functions such as MONTH() and YEAR().
Also, in your UNION queries, defining the data alias in the second and third SELECT serves no purpose. MySQL takes the column names from the first SELECT only.
Do the following instead:
SELECT
YEAR(tbl.data_vencimento) AS year_data,
MONTH(tbl.data_vencimento) AS month_data,
DATE_FORMAT( MAX(tbl.data_vencimento),'%m/%Y' ) AS data
FROM (
select data_vencimento from custo_extra_movimento where id_admin
union
select data_vencimento from custo_fixo_movimento where id_admin
union
select data_vencimento from custo_variavel_movimento where id_admin
) AS tbl
GROUP BY year_data, month_data
ORDER BY year_data DESC, month_data DESC
I think this is sufficient:
select date_format(data_vencimento, '%m/%Y') as data from custo_extra_movimento where id_admin
union -- on purpose to remove duplicates
select date_format(data_vencimento, '%m/%Y') as data from custo_fixo_movimento where id_admin
union -- on purpose to remove duplicates
select date_format(data_vencimento, '%m/%Y') as data from custo_variavel_movimento where id_admin
order by data desc;
To be honest, I am a little unclear on the logic for the ordering, so that might be off.
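The ordering concern can be made concrete. The sketch below uses SQLite's strftime in place of MySQL's DATE_FORMAT and a single hypothetical table named movimento in place of the three unioned tables; it adds one 2019 date to the sample values so the difference between sorting the formatted string and sorting by year and month becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single table standing in for the three *_movimento tables.
conn.execute("CREATE TABLE movimento (data_vencimento TEXT)")
conn.executemany(
    "INSERT INTO movimento VALUES (?)",
    [
        ("2018-10-29 00:00:00",),
        ("2018-10-29 00:00:00",),
        ("2018-09-29 00:00:00",),
        ("2019-01-15 00:00:00",),  # extra row, not in the question
    ],
)

# Sorts the formatted text, so the month digits decide the order.
LEXICAL = """
SELECT DISTINCT strftime('%m/%Y', data_vencimento) AS data
FROM movimento
ORDER BY data DESC
"""

# Groups and sorts by year-month, then formats only for display.
CHRONOLOGICAL = """
SELECT strftime('%m/%Y', MAX(data_vencimento)) AS data
FROM movimento
GROUP BY strftime('%Y-%m', data_vencimento)
ORDER BY strftime('%Y-%m', data_vencimento) DESC
"""

def labels(sql):
    """Return the formatted month labels produced by a query."""
    return [row[0] for row in conn.execute(sql)]

print(labels(LEXICAL))
print(labels(CHRONOLOGICAL))
```

With dates from a single year the two orderings agree, which is why the shorter query looks right on the question's sample; once the data spans more than one year, only the grouped version stays chronological.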
I have 3 tables:
supplier(id_supp, name, adress, ...)
Customer(id_cust, name, adress, ...)
Order(id_order, ref_cust, ref_supp, date_order...)
I want to build a job in Talend that counts the number of orders per supplier for last_week and last_two_weeks:
select
supp.name,
(
select
count(*)
from
order
where
date_order between sysdate-7 and sysdate
and ref_supp=id_supp
) as week_1,
(
select
count(*)
from
order
where
date_order between sysdate-14 and sysdate-7
and ref_supp=id_supp
) as week_2
from supplier supp
The reason I'm asking is that my query takes too much time.
You need a join between supplier and order to get supplier names. I show an inner join, but if you need ALL suppliers (even those with no orders in the order table) you may change it to a left outer join.
Other than that, you should only have to read the order table once and get all the info you need. Your query does more than one pass (read EXPLAIN PLAN for your query), which may be why it is taking too long.
NOTE: sysdate has a time-of-day component (and perhaps the date_order value does too); the way you wrote the query may or may not do exactly what you want it to do. You may have to surround sysdate by trunc().
select s.name,
count(case when o.date_order between sysdate - 7 and sysdate then 1 end)
as week_1,
count(case when o.date_order between sysdate - 14 and sysdate - 7 then 1 end)
as week_2
from supplier s inner join order o
on s.id_supp = o.ref_supp
group by s.name
;
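A small, testable version of the single-pass idea is sketched below. It runs on SQLite rather than Oracle, so several things are assumptions of the sketch: the order table is renamed to orders (ORDER is a reserved word), a fixed reference date is passed in instead of sysdate, dates are stored as ISO text, and the two windows are written as half-open ranges so an order exactly seven days old is counted in only one of them. It also uses a LEFT JOIN so suppliers without orders still appear with zero counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (id_supp INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id_order INTEGER PRIMARY KEY, ref_supp INTEGER, date_order TEXT);
INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cog');
INSERT INTO orders (ref_supp, date_order) VALUES
    (1, '2024-03-10'), (1, '2024-03-03'), (2, '2024-02-27'), (2, '2024-02-20');
""")

# One pass over orders; each CASE yields 1 (counted) or NULL (ignored).
WEEKLY_COUNTS = """
SELECT s.name,
       COUNT(CASE WHEN o.date_order >  date(:today, '-7 days')
                   AND o.date_order <= :today
                  THEN 1 END) AS week_1,
       COUNT(CASE WHEN o.date_order >  date(:today, '-14 days')
                   AND o.date_order <= date(:today, '-7 days')
                  THEN 1 END) AS week_2
FROM supplier AS s
LEFT JOIN orders AS o ON o.ref_supp = s.id_supp
GROUP BY s.id_supp, s.name
ORDER BY s.name
"""

def weekly_counts(today):
    """Return (name, week_1, week_2) per supplier for an ISO reference date."""
    return conn.execute(WEEKLY_COUNTS, {"today": today}).fetchall()

print(weekly_counts("2024-03-10"))
```

With BETWEEN on both windows, the order dated exactly seven days before the reference date would be counted in week_1 and week_2; the half-open ranges avoid that, at the cost of deviating slightly from the query as posted.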
I am trying to query a table. There are 3 important fields: attendant_id, client_id, and date.
Each time an attendant works with a client, they add an entry which includes their id, the client's id, and the date. Occasionally, an attendant will work with more than one client on the same day. I would like to capture when this happens. Here is what I have so far:
SELECT *
FROM timesheet_lines tsl1
WHERE EXISTS
(
SELECT *
FROM timesheet_lines tsl2
WHERE tsl1.date = tsl2.date
AND tsl1.attendant_id = tsl2.attendant_id
AND tsl1.client_id <> tsl2.client_id
AND tsl1.date between '2014-04-01' AND '2014-06-30'
LIMIT 2,5
)
I only want to display results where an attendant worked with at least 2 different clients. I don't expect it to be possible to have more than 5 on a single day. This is why I am using LIMIT 2,5.
I am also only interested in April through June of this year.
I think I may have the right syntax, but the query seems to be taking forever to run. Is there a faster query? There should be only about 42000+ entries all together for this particular date range. I am not expecting to get more than about 500-600 results that meet the criteria.
I ended up using the following:
create TEMPORARY table tempTSL1
(date1 date, start1 time, end1 time, attend1 varchar(50), client1 varchar(50), type1 tinyint);
insert into tempTSL1(date1, start1, end1, attend1, client1, type1)
select date, start_time, end_time, attendant_id, client_id, type
from timesheet_lines
WHERE
timesheet_lines.date BETWEEN '2014-04-01' AND '2014-06-30'
and timesheet_lines.type IN (1,2,5,6);
create TEMPORARY table tempTSL2
(date2 date, start2 time, end2 time, attend2 varchar(50), client2 varchar(50), type2 tinyint);
insert into tempTSL2(date2, start2, end2, attend2, client2, type2)
select date, start_time, end_time, attendant_id, client_id, type
from timesheet_lines
WHERE
timesheet_lines.date BETWEEN '2014-04-01' AND '2014-06-30'
and timesheet_lines.type IN (1,2,5,6);
SELECT *
FROM tempTSL1
WHERE (attend1,date1) IN (
SELECT attend2
,date2
FROM tempTSL2 tsl2
GROUP BY attend2
,date2
HAVING COUNT(date2) > 1
)
GROUP BY attend1
,client1
,date1
HAVING COUNT(client1) = 1
ORDER BY date1,attend1,start1
You are likely making it much more complex than it needs to be. Try something like this:
SELECT attendant_id
,client_id
,date
FROM timesheet_lines
WHERE (attendant_id,date) IN (
SELECT attendant_id
,date
FROM timesheet_lines tsl1
GROUP BY attendant_id
,date
HAVING COUNT(date) > 1
)
GROUP BY attendant_id
,client_id
,date
HAVING COUNT(client_id) = 1
The subquery returns results only of attendants performing multiple activities on the same date. The top query will pull from the same table, matching the attendant and dates of activity, and filter the result set to items where there is only 1 client in the grouping. Example:
attendant_id  client_id  date
1             A          2014-01-01
1             B          2014-01-01
2             C          2014-01-01
2             D          2014-01-02
Will return:
attendant_id  client_id  date
1             A          2014-01-01
1             B          2014-01-01
Untested, but I think it should be in line with what you are looking for, assuming the following two statements are true:
You are not trying to capture two different attendants working the same client on the same day
An attendant can only perform one activity per client per day
If the second point is not true, then you will need to incorporate additional fields into the subquery (such as an activity_id or something).
Hope this helps.
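If the second assumption does not hold, one possible variation (not the query above, but a swapped-in alternative) is to count distinct clients per attendant and day. The sketch below runs it on SQLite with the example rows plus a few repeated entries; the join form is used instead of a row-value IN only to stay compatible with older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timesheet_lines (attendant_id INTEGER, client_id TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO timesheet_lines VALUES (?, ?, ?)",
    [
        (1, "A", "2014-01-01"),
        (1, "A", "2014-01-01"),  # second activity with the same client
        (1, "B", "2014-01-01"),
        (2, "C", "2014-01-01"),
        (2, "D", "2014-01-02"),
        (3, "E", "2014-01-03"),
        (3, "E", "2014-01-03"),  # same client twice: not a multi-client day
    ],
)

# Find (attendant, day) pairs with more than one distinct client, then list
# each client once for those pairs.
MULTI_CLIENT_DAYS = """
SELECT DISTINCT t.attendant_id, t.client_id, t.date
FROM timesheet_lines AS t
JOIN (SELECT attendant_id, date
      FROM timesheet_lines
      WHERE date BETWEEN :start AND :end
      GROUP BY attendant_id, date
      HAVING COUNT(DISTINCT client_id) > 1) AS busy
  ON busy.attendant_id = t.attendant_id AND busy.date = t.date
ORDER BY t.date, t.attendant_id, t.client_id
"""

def multi_client_days(start, end):
    """Return (attendant_id, client_id, date) rows for multi-client days."""
    return conn.execute(MULTI_CLIENT_DAYS, {"start": start, "end": end}).fetchall()

print(multi_client_days("2014-01-01", "2014-01-31"))
```

Because the subquery counts distinct clients rather than rows, an attendant who logs two activities with the same client on one day is not reported.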
I have "users" table with fields
user_name, user_id
I have data tables like
data_table_2012_10
data_table_2012_11
data_table_2012_12
data_table_2013_01
data_table_2013_02
each table contains the following fields
user_id, type ('ALARM', 'EMERGENCY', 'ALIVE', 'DEAD'), date_time
There will be millions of records in each table.
I have to select the count of each type from the data tables within a time frame given by the user, and also get the corresponding user name via user_id.
Can someone help me with the best solution?
Try this query, where DATE1 and DATE2 are the bounds of your date range. You should UNION ALL the tables in the inner query. You can also build the query dynamically so that the inner query includes only the tables that fall within the date range you use:
select t.user_id,t.type, MAX(users.user_name), SUM(t.cnt)
from
(
select user_id,type,count(*) cnt
from data_table_2012_10 where date_time between DATE1 and DATE2
group by user_id,type
union all
select user_id,type,count(*) cnt
from data_table_2012_11 where date_time between DATE1 and DATE2
group by user_id,type
union all
.........................................
union all
select user_id,type,count(*) cnt
from data_table_2013_02 where date_time between DATE1 and DATE2
group by user_id,type
) t
left join users on (t.user_id=users.user_id)
group by t.user_id,t.type
Remember to use UNION ALL rather than UNION: UNION merges identical rows into one, so two tables that produce the same (user_id, type, cnt) row would be counted only once and the totals would be wrong.
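Both points, building the inner query only from the tables in range and preferring UNION ALL, can be sketched as follows. The helper names monthly_tables and build_query are hypothetical, SQLite stands in for the real database, and the table names are generated by the code itself (never taken from user input), which is the only reason string formatting is acceptable here.

```python
import sqlite3
from datetime import date

def monthly_tables(start, end):
    """List data_table_YYYY_MM names for every month from start to end."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"data_table_{year:04d}_{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names

def build_query(tables, combine="UNION ALL"):
    """Build the per-user, per-type count query over the given tables."""
    part = ("SELECT user_id, type, COUNT(*) AS cnt FROM {t} "
            "WHERE date_time BETWEEN :d1 AND :d2 GROUP BY user_id, type")
    inner = f" {combine} ".join(part.format(t=t) for t in tables)
    return ("SELECT t.user_id, t.type, MAX(u.user_name), SUM(t.cnt) "
            f"FROM ({inner}) AS t "
            "LEFT JOIN users AS u ON u.user_id = t.user_id "
            "GROUP BY t.user_id, t.type ORDER BY t.user_id, t.type")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ann')")
tables = monthly_tables(date(2012, 10, 1), date(2012, 11, 30))
for name, day in zip(tables, ("2012-10-05", "2012-11-05")):
    conn.execute(f"CREATE TABLE {name} (user_id INTEGER, type TEXT, date_time TEXT)")
    # Two ALARM rows per month, so both tables yield the same (1, 'ALARM', 2).
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?, ?)",
        [(1, "ALARM", f"{day} 08:00:00"), (1, "ALARM", f"{day} 09:00:00")],
    )

params = {"d1": "2012-10-01 00:00:00", "d2": "2012-11-30 23:59:59"}
with_union_all = conn.execute(build_query(tables), params).fetchall()
with_union = conn.execute(build_query(tables, "UNION"), params).fetchall()
print(with_union_all, with_union)
```

In this sample both monthly tables produce the identical row (1, 'ALARM', 2); UNION ALL keeps both and sums to 4, while plain UNION collapses them and reports 2.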
I am trying to write a query which will give me the last entry of each month in a table called transactions. I believe I am halfway there as I have the following query which groups all the entries by month then selects the highest id in each group which is the last entry for each month.
SELECT max(id),
EXTRACT(YEAR_MONTH FROM date) as yyyymm
FROM transactions
GROUP BY yyyymm
Gives the correct results
id yyyymm
100 201006
105 201007
111 201008
118 201009
120 201010
I don’t know how to then run a query on the same table but select the balance column where it matches the id from the first query to give results
id balance date
120 10000 2010-10-08
118 11000 2010-09-29
I've tried subqueries and looked at joins, but I'm not sure how to go about using them.
You can make your first select an inline view, and then join to it. Something like this (not tested, but should give you the idea):
SELECT x.id
, t.balance
, t.date
FROM transactions t
/* here, we make your select an inline view, then we can join to it */
, (SELECT max(id) id,
EXTRACT(YEAR_MONTH FROM date) as yyyymm
FROM transactions
GROUP BY yyyymm) x
WHERE t.id = x.id
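The inline-view join can be tried on a few made-up rows. The sketch below runs on SQLite, which has no EXTRACT(YEAR_MONTH FROM ...), so strftime('%Y%m', date) is used as a stand-in; the sample balances other than the two shown in the question are invented for illustration, and the ORDER BY is added only to match the order of the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, balance INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (117, 500, "2010-09-10"),    # invented earlier September row
        (118, 11000, "2010-09-29"),
        (119, 9000, "2010-10-01"),   # invented earlier October row
        (120, 10000, "2010-10-08"),
    ],
)

# The inner query picks the highest id per month; the join fetches its row.
LAST_PER_MONTH = """
SELECT t.id, t.balance, t.date
FROM transactions AS t
JOIN (SELECT MAX(id) AS id
      FROM transactions
      GROUP BY strftime('%Y%m', date)) AS x
  ON x.id = t.id
ORDER BY t.id DESC
"""

last_rows = conn.execute(LAST_PER_MONTH).fetchall()
print(last_rows)
```

This relies on the same premise as the question, namely that the highest id within a month belongs to the last entry of that month.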