Conditioning a column that's being selected by subquery - mysql

I have the following query
select distinct `orders`.`id`, `order_status`,
(SELECT `setting_id` FROM `settings` WHERE `settings`.`order_id` = `orders`.`id`) AS `setting_id`,
from `orders` order by orders.id desc limit 100 OFFSET 0
which works just fine; the column "setting_id" is fetched as it should be. But when I add another WHERE condition,
and `setting_id` = 2
it cannot find the column and outputs Unknown column 'setting_id' in 'where clause'. Why?

I would change your query to use a JOIN instead of a correlated subquery in the SELECT list. The subquery version is slower because it has to run the column select for every row. Now, if you only want rows with setting_id = 2, just add that to the JOIN portion of the condition:
select distinct
o.id,
o.order_status,
s.setting_id
from
orders o
join settings s
on o.id = s.order_id
AND s.setting_id = 2
order by
o.id desc
limit
100 OFFSET 0
(Your original query was also failing because of the extra trailing comma after the setting_id subquery in the SELECT list.)
You could also have reversed the join by starting with your settings table, so you only consider rows with setting_id = 2 and then find the matching orders. I would ensure that your settings table has an index on ( setting_id, order_id ) to optimize the query.
select distinct
s.order_id id,
o.order_status,
s.setting_id
from
settings s
JOIN orders o
on s.order_id = o.id
where
s.setting_id = 2
order by
s.order_id desc
limit
100 OFFSET 0
With the proper index as suggested, the above should be lightning fast, served directly from the index.
Another consideration for an apparently large table is to limit how many days back you query orders before applying your LIMIT 100. Go back 1 month? 2? 15 days? How large is your orders table that it drags for 10 seconds? That may be a better choice for you.
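If you'd rather keep the correlated subquery, the alias becomes filterable once the whole query is wrapped in a derived table, since by then the alias is a real column. A minimal sketch with a made-up schema (SQLite here; MySQL treats derived tables the same way):

```python
# Wrap the aliased subquery in a derived table so WHERE can see `setting_id`.
# Schema and data are invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders   (id INTEGER PRIMARY KEY, order_status TEXT);
CREATE TABLE settings (setting_id INTEGER, order_id INTEGER);
INSERT INTO orders   VALUES (1, 'open'), (2, 'shipped'), (3, 'open');
INSERT INTO settings VALUES (2, 1), (5, 2), (2, 3);
""")

rows = db.execute("""
SELECT id, order_status, setting_id
FROM (
    SELECT o.id, o.order_status,
           (SELECT s.setting_id FROM settings s
             WHERE s.order_id = o.id) AS setting_id
    FROM orders o
) t
WHERE setting_id = 2
ORDER BY id DESC
""").fetchall()
print(rows)  # only orders 3 and 1 carry setting_id = 2
```

The join form in the answer above is still preferable for performance; this just shows why the alias works at the outer level but not inside the original WHERE.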

Related

How to get data based on row count in MySQL?

I am working on a project where I have to fetch data from the database and show a row only if its count is greater than zero; otherwise don't. But my query is returning all rows.
This is My Query
SELECT
d.id,
d.district,
(
SELECT
COUNT(a.district_id)
FROM
ambulance AS a
WHERE
a.district_id = d.id
) AS total
FROM
district d
ORDER BY
total
DESC
That is okay, but I added a WHERE clause to my query, which is,
WHERE total > 0
But I am having a sql error Unknown column 'total' in 'where clause'
Now my question is: how can I achieve a result with WHERE total > 0? Do I have to type something else in place of total? What is the proper way to add this WHERE clause to my query?
MySQL extends the use of the HAVING clause, so it can be used in non-aggregation queries. This allows you to use an alias:
SELECT d.id, d.district,
(SELECT COUNT(*)
FROM ambulance a
WHERE a.district_id = d.id
) AS total
FROM district d
HAVING total > 0
ORDER BY total DESC;
This logic would more colloquially be written using an inner join:
select d.id, d.district, count(*) as total
from district d join
ambulance a
on a.district_id = d.id
group by d.id, d.district
order by total desc;
The join requires that there be at least one match.
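That last point is easy to verify: a district with zero ambulances simply produces no join rows, so the `total > 0` filter comes for free. A small sketch with invented data (SQLite; the INNER JOIN semantics match MySQL):

```python
# Demonstrate that the INNER JOIN form drops zero-count districts.
# Table names follow the question; the rows are made up.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE district  (id INTEGER PRIMARY KEY, district TEXT);
CREATE TABLE ambulance (district_id INTEGER);
INSERT INTO district  VALUES (1, 'North'), (2, 'South'), (3, 'East');
INSERT INTO ambulance VALUES (1), (1), (3);   -- district 2 has no ambulances
""")

rows = db.execute("""
SELECT d.id, d.district, COUNT(*) AS total
FROM district d JOIN ambulance a ON a.district_id = d.id
GROUP BY d.id, d.district
ORDER BY total DESC
""").fetchall()
print(rows)  # [(1, 'North', 2), (3, 'East', 1)] -- 'South' is filtered out
```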

How to Optimize mysql query the brings huge quantity of rows?

I really need your help. I'm doing a project for my university, and before coming here I read a lot of the MySQL documentation and searched and searched, but none of it helped with my SQL query. I have this query:
SELECT a.nome, COUNT(*)
FROM publ p JOIN auth a on p.pubid = a.pubid
WHERE p.pubid IN (SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3) -- this value 3 I also have to run with the values 2, 4 and 5
GROUP BY a.nome      -- in separate queries.
ORDER BY COUNT(*) DESC, a.nome ASC
I tried to put an index on the columns in the WHERE clause, but I never get the results and it takes too long. What can I do to make my query return results faster? Thank you for the help.
I would create these indexes and reorder the query
CREATE INDEX publ_pubid ON publ(pubid);
CREATE INDEX auth_pubid ON auth(pubid, nome);
SELECT a.nome, COUNT(*)
FROM (
SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3
) L
LEFT JOIN publ p on L.pubid = p.pubid
JOIN auth a on p.pubid = a.pubid
GROUP BY a.nome
ORDER BY COUNT(*) DESC, a.nome ASC;
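The derived table `L` at the heart of this rewrite just keeps the pubids with fewer than 3 rows in auth. A quick check of that step with toy data (SQLite here; the GROUP BY/HAVING behavior is the same in MySQL):

```python
# Verify the "pubids with fewer than 3 authors" derived table on toy data.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE auth (pubid INTEGER, nome TEXT);
INSERT INTO auth VALUES (1,'x'), (1,'y'), (2,'x'), (2,'y'), (2,'z'), (3,'w');
""")

rows = db.execute("""
SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3
ORDER BY pubid
""").fetchall()
print(rows)  # pubid 2 has 3 authors, so only 1 and 3 survive
```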

Sql efficient query from multiple tables

I've two tables tbl_data and tbl_user_data
Structure of tbl_data
id (int) (primary)
names (varchar)
dept_id (int)
Structure of tbl_user_data:
id (int) (primary)
user_id (int)
names_id (int)
tbl_data.id and tbl_user_data.names_id are foreign key
I have a situation where I have to pick 25 random entries from tbl_data which have not been served earlier to a particular user. So I've created tbl_user_data, which stores the user_id and names_id (from tbl_data) of entries already served.
I'm a bit confused about how to write a query for this, or whether there is another way to do it efficiently.
Note: tbl_data have more than 5 million entries.
So far I've written this, but it seems it's not right.
SELECT td.names, td.dept_id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud ON td.id = tud.names_id
WHERE tud.user_id !=2
ORDER BY RAND( ) LIMIT 25
Two things:
First, you need the LEFT JOIN ... IS NULL pattern to pick out your not-yet-served items. You'll need to mention the user id in the ON clause to get this to work correctly.
SELECT td.names, td.dept_id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud ON td.id = tud.names_id
AND tud.user_id = 2
WHERE tud.id IS NULL
ORDER BY RAND( ) LIMIT 25
Second, ORDER BY RAND() LIMIT ... is a notoriously poor performer on a large table. It has to select the entire table, then sort it, then discard all except 25 items from it. That's massively wasteful and will never perform decently.
You can make it a little less wasteful by sorting just the id values, then using them to get the other information.
This gets your 25 random ID values.
SELECT td.id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud ON td.id = tud.names_id
AND tud.user_id = 2
WHERE tud.id IS NULL
ORDER BY RAND( )
LIMIT 25
This gets your names and dept_id values.
SELECT a.names, a.dept_id
FROM tbl_data AS a
JOIN (
SELECT td.id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud ON td.id = tud.names_id
AND tud.user_id = 2
WHERE tud.id IS NULL
ORDER BY RAND( )
LIMIT 25
) b ON a.id = b.id
But, it's still wasteful. You may want to build a randomized version of this tbl_data table, and then use it sequentially. You could re-randomize it once a day, with something like this.
DROP TABLE IF EXISTS tbl_data_random;
CREATE TABLE tbl_data_random AS
SELECT *
FROM tbl_data
ORDER BY RAND();
That way you don't do the sort over and over again, just to discard the results. Instead, you randomize once in a while.
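The LEFT JOIN ... IS NULL anti-join above can be sanity-checked with a small script. This uses SQLite and invented rows, but the pattern behaves the same way in MySQL:

```python
# Anti-join check: only rows never served to user 2 survive,
# including rows that were served to some *other* user.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tbl_data      (id INTEGER PRIMARY KEY, names TEXT, dept_id INTEGER);
CREATE TABLE tbl_user_data (id INTEGER PRIMARY KEY, user_id INTEGER, names_id INTEGER);
INSERT INTO tbl_data VALUES (1,'a',10), (2,'b',10), (3,'c',20);
-- row 1 was served to user 2; row 2 was served only to user 7
INSERT INTO tbl_user_data VALUES (1, 2, 1), (2, 7, 2);
""")

rows = db.execute("""
SELECT td.id, td.names, td.dept_id
FROM tbl_data AS td
LEFT JOIN tbl_user_data AS tud
       ON td.id = tud.names_id AND tud.user_id = 2
WHERE tud.id IS NULL
ORDER BY td.id
""").fetchall()
print(rows)  # rows 2 and 3: never served to user 2
```

Note the `tud.user_id = 2` filter lives in the ON clause, not the WHERE clause; moving it to WHERE would turn the outer join back into an inner join and break the pattern.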
As you're not selecting anything from tbl_user_data, and you want the rows with no match for that user, you can use NOT EXISTS instead:
SELECT td.names, td.dept_id
FROM tbl_data AS td
where not exists (
select 1
from tbl_user_data AS tud
where td.id = tud.names_id
and tud.user_id = 2
)
ORDER BY RAND( ) LIMIT 25
Index on tbl_data(id) and tbl_user_data(names_id, user_id) will help.
Create an index on names_id and user_id. Why is user_id varchar?
If it needs to be varchar and the values are very long, create a prefix index on user_id.
You can use EXPLAIN to see which indexes your query uses.

How can I speed up a multiple inner join query?

I have two tables. The first table (users) is a simple "id, username" table with 100,000 rows, and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username's stat went up the most, and here's the query I have. Even on a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
The other way I tried, but it doesn't seem optimal, is
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14') AS `end`,
((SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Introduction
Let's suppose we rewrite the query like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure that:
the users table has an index on the id field;
the stats table has an index on the composite (date, id): create index stats_idx_d_i on stats ( date, id );
Then
The database optimizer may use the indexes to select a Restricted Set of Data ('RSD'), that is, the rows that match the filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces to calculate it
There is no possible optimization for this sort, because the value has to be calculated, one row at a time, for everything in your 'RSD' (restricted set of data).
Conclusion
The question is: how many rows are in your 'RSD'? If there are only a few hundred rows, your query may run fast; otherwise, it will be slow.
In any case, you should make sure the first step of the query (without the sort) is done via an index and not a full scan. Use the EXPLAIN command to be sure.
All you need to do is help the optimizer. At a bare minimum, have a checklist like the one below:
1. Are my join columns indexed?
2. Are the WHERE clauses sargable?
3. Are there any implicit or explicit conversions?
4. Am I seeing any statistics issues?
One more interesting aspect to look at is how your data is distributed; once you understand the data, you will be able to interpret the execution plan and alter it as per your need.
EX:
Say I have a customers table with 100 rows, and each customer has a minimum of 10 orders (10,000 orders in total). Now if you need to find only the top 3 orders by date, you don't want a scan of the orders table to happen.
Now in your case, I may not go with the second option, even though the optimizer may choose a good plan for it as well. I would go with the first approach and see whether the execution time is acceptable; if not, I would go through my checklist and try to tune it further.
The query seems OK. Verify your indexes.
Or
Try this Query
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
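The idea behind that last query, filtering each date into its own derived table before joining, can be checked with toy data. A sketch in SQLite (note that here I compute later minus earlier, so a positive diff means the stat went up; the original query computes b.stat - c.stat):

```python
# Pre-filter each date into a derived table, then join once per side.
# Numbers are invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE stats (id INTEGER, date TEXT, stat INTEGER);
INSERT INTO users VALUES (1,'ann'), (2,'bob');
INSERT INTO stats VALUES
  (1,'2016-01-10',100), (1,'2016-01-13',150),
  (2,'2016-01-10',200), (2,'2016-01-13',330);
""")

rows = db.execute("""
SELECT a.id, a.username, b.stat, c.stat, (c.stat - b.stat) AS stat_diff
FROM users AS a
JOIN (SELECT id, stat FROM stats WHERE date = '2016-01-10') AS b ON b.id = a.id
JOIN (SELECT id, stat FROM stats WHERE date = '2016-01-13') AS c ON c.id = a.id
ORDER BY stat_diff DESC
LIMIT 100
""").fetchall()
print(rows)  # bob went up 130, ann went up 50
```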

Slow aggregate query with join on same table

I have a query to show customers and the total dollar value of all their orders. The query takes about 100 seconds to execute.
I'm querying on an ExpressionEngine CMS database. ExpressionEngine uses one table exp_channel_data, for all content. Therefore, I have to join on that table for both customer and order data. I have about 14,000 customers, 30,000 orders and 160,000 total records in that table.
Can I change this query to speed it up?
SELECT link.author_id AS customer_id,
customers.field_id_122 AS company,
Sum(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
ON link.author_id = customers.field_id_117
AND customers.channel_id = 7
JOIN exp_channel_data orders
ON orders.entry_id = link.entry_id
AND orders.channel_id = 3
GROUP BY customer_id
Thanks, and please let me know if I should include other information.
UPDATE SOLUTION
My apologies. I noticed that entry_id in the exp_channel_data table for customers corresponds to author_id in the exp_channel_titles table, so I don't have to use field_id_117 in the join. field_id_117 duplicates entry_id, but in a TEXT field, and joining on that text field slowed things down. The query now takes 3 seconds.
However, the inner join solution posted by @DRapp runs in 1.5 seconds. Here is his SQL with a minor edit:
SELECT
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
FROM
( SELECT
t.author_id,
SUM( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
JOIN
exp_channel_titles t ON t.author_id = o.entry_id AND o.channel_id = 3
GROUP BY
t.author_id ) PQ
JOIN
exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY CustomerID
Since customers and orders live in the same table, the same columns apply across all alias instances.
I would ensure an index on (channel_id, entry_id, field_id_117) if possible, and another index on (author_id) for the pre-query of order totals.
Then, start with what will become an inner query that does nothing but sum the order amounts per customer. Since the join uses author_id as the customer ID, just query/sum that first. Without completely understanding the (what I would consider) poor design of the structure, or what channel_id really indicates, you don't want to duplicate summed values because of the other things in the mix.
select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id
If that is correct on the per customer (via author_id column), then that can be wrapped as follows
select
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
from
( select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
where
o.channel_id = 3
group by
o.author_id ) PQ
JOIN exp_channel_data c
on PQ.author_id = c.field_id_117
AND c.channel_id = 7
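The aggregate-then-join shape can be sketched end to end with invented ExpressionEngine-style rows (SQLite here; linking order rows to titles by entry_id is my assumption about the schema):

```python
# Aggregate order totals per author first, then join once to pick up
# the company name from the customer rows. All data is invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE exp_channel_titles (entry_id INTEGER, author_id INTEGER);
CREATE TABLE exp_channel_data   (entry_id INTEGER, channel_id INTEGER,
                                 field_id_22 REAL, field_id_122 TEXT);
-- customer rows live on channel 7, order rows on channel 3
INSERT INTO exp_channel_data VALUES (10, 7, NULL, 'Acme'), (11, 7, NULL, 'Globex');
INSERT INTO exp_channel_titles VALUES (100, 10), (101, 10), (102, 11);
INSERT INTO exp_channel_data VALUES
  (100, 3, 50, NULL), (101, 3, 25, NULL), (102, 3, 40, NULL);
""")

rows = db.execute("""
SELECT PQ.author_id, c.field_id_122 AS company, PQ.totalOrders
FROM (
  SELECT t.author_id, SUM(o.field_id_22) AS totalOrders
  FROM exp_channel_data o
  JOIN exp_channel_titles t ON t.entry_id = o.entry_id AND o.channel_id = 3
  GROUP BY t.author_id
) PQ
JOIN exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY PQ.author_id
""").fetchall()
print(rows)  # [(10, 'Acme', 75.0), (11, 'Globex', 40.0)]
```

The key point is that the GROUP BY happens before the customer join, so the customer row can never multiply the summed order amounts.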
Can you post the results of an EXPLAIN query?
I'm guessing that your tables are not indexed well for this operation. All of the columns that you join on should probably be indexed. As a first guess I'd look at indexing exp_channel_data.field_id_117
Try something like this. Possibly you have an error in your joins; also check that the join columns are correct in your database. If the join conditions are wrong, an accidental cross join can take a long time on a large data set.
select
link.author_id as customer_id,
customers.field_id_122 as company,
sum(orders.field_id_22) as total_or_orders
from exp_channel_data customers
join exp_channel_titles link on (link.author_id = customers.field_id_117 and
customers.channel_id = 7)
join exp_channel_data orders on (orders.entry_id = link.entry_id and orders.channel_id = 3)
group by customer_id