I have a database with over 100,000 records. I'm trying to get all customers who ordered only once, identifying customers by their email field (OrderEmail).
The SQL query runs for 10 minutes and then times out.
If I use short date ranges, I can get results, but it still takes over 3 minutes.
How can I optimize the query to get it to work?
SELECT
tblOrders.OrderID,
tblOrders.OrderName,
tblOrders.OrderEmail,
tblOrders.OrderPhone,
tblOrders.OrderCountry,
tblOrders.OrderDate
FROM
tblOrders
LEFT JOIN tblOrders AS orders_join ON orders_join.OrderEmail = tblOrders.OrderEmail
AND NOT orders_join.OrderID = tblOrders.OrderID
WHERE
orders_join.OrderID IS NULL
AND (tblOrders.OrderDate BETWEEN '2015-01-01' AND '2017-03-01')
AND tblOrders.OrderDelivered = -1
ORDER BY
tblOrders.OrderID ASC;
I would expect the query below to work, but I can't test it against your data as you don't provide a sample. Instead, I added a temporary table definition that the query can run against.
If you could change the data model to use an INTEGER id for the entity that placed the order (instead of a VARCHAR() email address), the query would get considerably faster.
CREATE TEMPORARY TABLE IF NOT EXISTS
tblorders(orderid,ordername,orderemail,orderphone,ordercountry,orderdate) AS (
SELECT 1,'ORD01','adent#hog.com' ,'9-991' ,'UK', DATE '2017-01-01'
UNION ALL SELECT 2,'ORD02','tricia#hog.com','9-992' ,'UK', DATE '2017-01-02'
UNION ALL SELECT 3,'ORD03','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-03'
UNION ALL SELECT 4,'ORD04','zaphod#hog.com','9-9943','UK', DATE '2017-01-04'
UNION ALL SELECT 5,'ORD05','marvin#hog.com','9-9942','UK', DATE '2017-01-05'
UNION ALL SELECT 6,'ORD06','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-06'
UNION ALL SELECT 7,'ORD07','tricia#hog.com','9-992' ,'UK', DATE '2017-01-07'
UNION ALL SELECT 8,'ORD08','benji#hog.com' ,'9-995' ,'UK', DATE '2017-01-08'
UNION ALL SELECT 9,'ORD09','benji#hog.com' ,'9-995' ,'UK', DATE '2017-01-09'
UNION ALL SELECT 10,'ORD10','ford#hog.com' ,'9-993' ,'UK', DATE '2017-01-10'
)
;
SELECT
tblOrders.OrderID
, tblOrders.OrderName
, tblOrders.OrderEmail
, tblOrders.OrderPhone
, tblOrders.OrderCountry
, tblOrders.OrderDate
FROM tblOrders
JOIN (
SELECT
OrderEmail
FROM tblOrders
GROUP BY
OrderEmail
HAVING COUNT(*) = 1
) singleOrders
ON singleOrders.OrderEmail = tblOrders.OrderEmail
ORDER BY OrderID
;
OrderID|OrderName|OrderEmail |OrderPhone|OrderCountry|OrderDate
1|ORD01 |adent#hog.com |9-991 |UK |2017-01-01
4|ORD04 |zaphod#hog.com|9-9943 |UK |2017-01-04
5|ORD05 |marvin#hog.com|9-9942 |UK |2017-01-05
As you can see, it returns Mr. Dent, Zaphod and Marvin, who all occur only once in the example data.
Another approach that might work is to group by email address and keep only the groups with a single entry. It may behave unpredictably if you want customers with multiple orders, because the non-aggregated columns would then come from an arbitrary row of each group, but it should be fine for this particular case:
SELECT
tblOrders.OrderID,
tblOrders.OrderName,
tblOrders.OrderEmail,
tblOrders.OrderPhone,
tblOrders.OrderCountry,
tblOrders.OrderDate,
count(tblOrders.OrderID) as OrderCount
FROM
tblOrders
WHERE
tblOrders.OrderDate BETWEEN '2015-01-01' AND '2017-03-01'
AND tblOrders.OrderDelivered = -1
GROUP BY
tblOrders.OrderEmail
HAVING
OrderCount = 1
ORDER BY
tblOrders.OrderID ASC;
Also, I suspect that if you're seeing such long query times with just 100k records, you probably don't have an index on the OrderEmail column. I suggest setting one up; that might help with your original query as well.
Selecting non-aggregated columns that are not in the GROUP BY clause does not work in Oracle or SQL Server, but it does work in MySQL (as long as the ONLY_FULL_GROUP_BY SQL mode is off) and in SQLite. So, while the code is not portable between different RDBMSs, it works for this particular case.
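The index suggestion is easy to try out. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for the real database, a trimmed-down tblOrders holding the sample rows from the earlier answer, and a hypothetical index name idx_orders_email. The CREATE INDEX statement has the same form in MySQL; the plan output is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblOrders ("
    "OrderID INTEGER PRIMARY KEY, OrderEmail TEXT, OrderDate TEXT)"
)
sample = [
    (1, "adent#hog.com", "2017-01-01"),
    (2, "tricia#hog.com", "2017-01-02"),
    (3, "ford#hog.com", "2017-01-03"),
    (4, "zaphod#hog.com", "2017-01-04"),
    (5, "marvin#hog.com", "2017-01-05"),
    (6, "ford#hog.com", "2017-01-06"),
    (7, "tricia#hog.com", "2017-01-07"),
    (8, "benji#hog.com", "2017-01-08"),
    (9, "benji#hog.com", "2017-01-09"),
    (10, "ford#hog.com", "2017-01-10"),
]
conn.executemany("INSERT INTO tblOrders VALUES (?, ?, ?)", sample)

# Hypothetical index name; the indexed column is the one used for matching.
conn.execute("CREATE INDEX idx_orders_email ON tblOrders (OrderEmail)")

# Ask SQLite how it would look up one email; the last column is the plan text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT OrderID FROM tblOrders WHERE OrderEmail = ?",
    ("ford#hog.com",),
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)

# The "ordered only once" check itself, grouped on the indexed column.
single_order_emails = [
    row[0]
    for row in conn.execute(
        "SELECT OrderEmail FROM tblOrders "
        "GROUP BY OrderEmail HAVING COUNT(*) = 1 ORDER BY OrderEmail"
    )
]
print(single_order_emails)
```

On the real table, comparing the query plan before and after creating the index is the quickest way to see whether the index is actually picked up.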
I have a very simple table which consists of the following columns:
id | customer_id | total | created_at
I was running this query to get the results per day for the last ten days:
SELECT SUM(total) AS total, DATE_FORMAT(created_at, "%d/%m/%Y") AS date
FROM table
WHERE created_at BETWEEN "2017-02-20" AND "2017-03-01"
GROUP BY created_at
ORDER BY created_at DESC
This works fine, but I've just noticed that imported rows are sometimes duplicated for some reason, so I'd like to update the query to handle that if it ever happens again. In other words, when the date and customer id are the same (the total is also identical), it should count one row instead of all of them.
Adding customer_id to the GROUP BY seems to work, but then the query returns a result per day for each customer when I only want the overall total.
I've tried a couple of things but haven't cracked it yet. I think it should be achievable using a subquery and/or an inner join. I have tried this so far, but the figures are very wrong:
SELECT
created_at,
(
SELECT SUM(total)
FROM table test
WHERE test.created_at = table.created_at
AND test.customer_id = table.customer_id
GROUP BY customer_id, created_at
LIMIT 1
) AS total
FROM table
WHERE created_at BETWEEN "2017-02-20" AND "2017-03-01"
GROUP BY created_at
ORDER BY created_at DESC
It's also a large table so finding a performant way to do this is also important.
First, are you sure that created_at is a date and not a datetime? This makes a big difference.
You can do what you want using two levels of aggregation:
SELECT SUM(max_total) AS total, DATE_FORMAT(created_at, '%d/%m/%Y') AS date
FROM (SELECT t.customer_id, t.created_at, MAX(total) as max_total
FROM table t
WHERE t.created_at BETWEEN '2017-02-20' AND '2017-03-01'
GROUP BY t.customer_id, t.created_at
) t
GROUP BY created_at
ORDER BY created_at DESC;
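To see what the two levels of aggregation do, here is a small sketch, assuming SQLite via Python's sqlite3 module, a hypothetical table name sales (the question only calls it "table"), and invented sample rows in which one import was duplicated. The inner query collapses each (customer_id, created_at) pair to a single total and the outer query sums those per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "sales" is a hypothetical name; the question's table is only called "table".
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, total, created_at) VALUES (?, ?, ?)",
    [
        (1, 10, "2017-02-20"),
        (1, 10, "2017-02-20"),  # duplicated import of the row above
        (2, 5, "2017-02-20"),
        (1, 7, "2017-02-21"),
    ],
)

# Plain per-day sum: the duplicated row is counted twice.
naive = conn.execute(
    "SELECT created_at, SUM(total) FROM sales "
    "GROUP BY created_at ORDER BY created_at DESC"
).fetchall()

# Two-level aggregation: one total per customer and day, then summed per day.
deduped = conn.execute(
    """
    SELECT created_at, SUM(max_total) AS total
    FROM (SELECT customer_id, created_at, MAX(total) AS max_total
          FROM sales
          WHERE created_at BETWEEN '2017-02-20' AND '2017-03-01'
          GROUP BY customer_id, created_at) AS per_customer
    GROUP BY created_at
    ORDER BY created_at DESC
    """
).fetchall()

print(naive)    # [('2017-02-21', 7), ('2017-02-20', 25)]
print(deduped)  # [('2017-02-21', 7), ('2017-02-20', 15)]
```

Note the trade-off: any two rows for the same customer and the same created_at value are treated as duplicates, so two genuine orders on the same day would be collapsed into the larger one. If created_at is a datetime, only rows with an identical timestamp are merged.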
I have a column of type date. I want to group rows per month for a particular year.
I did the following:
SELECT SUM(price), DATE_FORMAT(production_date, '%Y%m')
FROM TABLE
WHERE YEAR(production_date) = ?
GROUP BY 2
Is there a better/more efficient way to do this?
Instead of filtering on the year with the YEAR function:
SELECT SUM(price), DATE_FORMAT(production_date, '%Y%m')
FROM TABLE
WHERE YEAR(production_date) = 2014
GROUP BY 2
specify the year as a date range:
SELECT SUM(price), DATE_FORMAT(production_date, '%Y%m')
FROM TABLE
WHERE production_date BETWEEN '2014-01-01' AND '2014-12-31'
GROUP BY 2
The second query enables the database to use an index on production_date, because the column is no longer wrapped in a function call.
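The two filters can be compared side by side. This is a rough sketch, assuming SQLite via Python's sqlite3 module instead of MySQL, so strftime stands in for YEAR and DATE_FORMAT; the table name products, the index name idx_production_date and the rows are all invented for illustration. Both queries return the same monthly sums, and printing the plans shows how SQLite intends to read the table in each case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and index names; dates are stored as ISO-8601 text.
conn.execute("CREATE TABLE products (price INTEGER, production_date TEXT)")
conn.execute("CREATE INDEX idx_production_date ON products (production_date)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        (100, "2013-12-31"),
        (10, "2014-01-15"),
        (5, "2014-01-20"),
        (7, "2014-03-02"),
        (3, "2014-12-31"),
        (50, "2015-01-01"),
    ],
)

# Filter that wraps the column in a function (the YEAR(...) = 2014 style).
function_sql = (
    "SELECT SUM(price), strftime('%Y%m', production_date) AS ym "
    "FROM products WHERE strftime('%Y', production_date) = '2014' "
    "GROUP BY ym ORDER BY ym"
)
# Filter that compares the bare column against a date range.
range_sql = (
    "SELECT SUM(price), strftime('%Y%m', production_date) AS ym "
    "FROM products "
    "WHERE production_date BETWEEN '2014-01-01' AND '2014-12-31' "
    "GROUP BY ym ORDER BY ym"
)

by_function = conn.execute(function_sql).fetchall()
by_range = conn.execute(range_sql).fetchall()
print(by_function)
print(by_range)

# Print each plan; the last column of every plan row is the readable text.
for label, sql in (("function", function_sql), ("range", range_sql)):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(label, "->", row[3])
```

BETWEEN with the last day of the year is fine for a DATE column, as in the question. For a datetime column, a half-open range (>= '2014-01-01' AND < '2015-01-01') would be the safer choice.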
You are almost there :)
SELECT SUM(price), MONTH(production_date)
FROM TABLE
WHERE YEAR(production_date) = ?
GROUP BY 2
I have "users" table with fields
user_name, user_id
I have data tables like
data_table_2012_10
data_table_2012_11
data_table_2012_12
data_table_2013_01
data_table_2013_02
each table contains the following fields
user_id, type ('ALARM', 'EMERGENCY', 'ALIVE', 'DEAD'), date_time
There will be millions of records in each table.
I have to select the count of each type from the data tables within a time frame given by the user, and also get the corresponding user name with the help of user_id.
Can someone help me out with the best solution?
Try this query, where DATE1 and DATE2 are the bounds of your date range. You should union all tables in the inner query. You can also build the query dynamically so that the inner query includes only those tables that fall within the date range you use:
select t.user_id,t.type, MAX(users.user_name), SUM(t.cnt)
from
(
select user_id,type,count(*) cnt
from data_table_2012_10 where date_time between DATE1 and DATE2
group by user_id,type
union all
select user_id,type,count(*) cnt
from data_table_2012_11 where date_time between DATE1 and DATE2
group by user_id,type
union all
.........................................
union all
select user_id,type,count(*) cnt
from data_table_2013_02 where date_time between DATE1 and DATE2
group by user_id,type
) t
left join users on (t.user_id=users.user_id)
group by t.user_id,t.type
Remember to use UNION ALL rather than UNION: UNION removes duplicate rows, so identical (user_id, type, cnt) rows coming from different tables would be merged into one and the totals would come out too low.
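The idea of building the query dynamically could look roughly like the sketch below. It assumes the data_table_YYYY_MM naming shown in the question; the helper names month_tables and build_query, the column aliases and the :date1 / :date2 parameter names are made up for illustration. Table names are derived only from the dates, while the date bounds are left as bound parameters.

```python
from datetime import date


def month_tables(start, end, prefix="data_table"):
    """Return one table name per calendar month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def build_query(tables):
    """Build the UNION ALL query over exactly the given monthly tables."""
    inner = "\nUNION ALL\n".join(
        f"SELECT user_id, type, COUNT(*) AS cnt FROM {name} "
        "WHERE date_time BETWEEN :date1 AND :date2 GROUP BY user_id, type"
        for name in tables
    )
    return (
        "SELECT t.user_id, t.type, MAX(users.user_name) AS user_name, "
        "SUM(t.cnt) AS total\n"
        f"FROM (\n{inner}\n) t\n"
        "LEFT JOIN users ON t.user_id = users.user_id\n"
        "GROUP BY t.user_id, t.type"
    )


tables = month_tables(date(2012, 11, 15), date(2013, 2, 3))
query = build_query(tables)
print(tables)
print(query)
```

The generated SQL would then be executed with the two dates passed as parameters, so only the tables that can contain matching rows are scanned.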
I am trying to write a query which will give me the last entry of each month in a table called transactions. I believe I am halfway there as I have the following query which groups all the entries by month then selects the highest id in each group which is the last entry for each month.
SELECT max(id),
EXTRACT(YEAR_MONTH FROM date) as yyyymm
FROM transactions
GROUP BY yyyymm
This gives the correct results:
id yyyymm
100 201006
105 201007
111 201008
118 201009
120 201010
I don’t know how to then run a query on the same table that selects the balance column for the ids returned by the first query, to give results like this:
id balance date
120 10000 2010-10-08
118 11000 2010-09-29
I've tried subqueries and looked at joins, but I'm not sure how to go about using them.
You can make your first select an inline view, and then join to it. Something like this (not tested, but should give you the idea):
SELECT x.id
, t.balance
, t.date
FROM transactions t
/* here, we make your select an inline view, then we can join to it */
JOIN (SELECT MAX(id) AS id,
EXTRACT(YEAR_MONTH FROM date) AS yyyymm
FROM transactions
GROUP BY yyyymm) x
ON t.id = x.id
ORDER BY x.id DESC;
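The inline-view join can be checked on a few rows. This is a minimal sketch, assuming SQLite via Python's sqlite3 module, so strftime('%Y%m', date) stands in for MySQL's EXTRACT(YEAR_MONTH FROM date). The rows for ids 118 and 120 follow the question's expected output; all other rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, balance INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (99, 4000, "2010-06-10"),    # invented
        (100, 5000, "2010-06-28"),   # invented balance and date
        (117, 12000, "2010-09-01"),  # invented
        (118, 11000, "2010-09-29"),
        (119, 10500, "2010-10-02"),  # invented
        (120, 10000, "2010-10-08"),
    ],
)

# Inner query: highest id per month. Outer query: fetch that row's details.
last_per_month = conn.execute(
    """
    SELECT t.id, t.balance, t.date
    FROM transactions AS t
    JOIN (SELECT MAX(id) AS id
          FROM transactions
          GROUP BY strftime('%Y%m', date)) AS x
      ON t.id = x.id
    ORDER BY t.id DESC
    """
).fetchall()

for row in last_per_month:
    print(row)
```

This relies on the same assumption as the question's first query: ids increase over time, so the highest id in a month is that month's last entry.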