I am having a problem trying to JOIN across a total of three tables:
Table users: userid, cap (ADSL bandwidth)
Table accounting: userid, sessiondate, used bandwidth
Table adhoc: userid, date, amount purchased
I want to have 1 query that returns a set of all users, their cap, their used bandwidth for this month and their adhoc purchases for this month:
User   | Cap | Adhoc | Used
marius |  3  |   1   | 3.34
bob    |  1  |   2   | 1.15
(simplified)
Here is the query I am working on:
SELECT
`msi_adsl`.`id`,
`msi_adsl`.`username`,
`msi_adsl`.`realm`,
`msi_adsl`.`cap_size` AS cap,
SUM(`adsl_adhoc`.`value`) AS adhoc,
SUM(`radacct`.`AcctInputOctets` + `radacct`.`AcctOutputOctets`) AS used
FROM
`msi_adsl`
INNER JOIN
(`radacct`, `adsl_adhoc`)
ON
(CONCAT(`msi_adsl`.`username`,'#',`msi_adsl`.`realm`)
= `radacct`.`UserName` AND `msi_adsl`.`id`=`adsl_adhoc`.`id`)
WHERE
`canceled` = '0000-00-00'
AND
`radacct`.`AcctStartTime`
BETWEEN
'2010-11-01'
AND
'2010-11-31'
AND
`adsl_adhoc`.`time`
BETWEEN
'2010-11-01 00:00:00'
AND
'2010-11-31 00:00:00'
GROUP BY
`radacct`.`UserName`, `adsl_adhoc`.`id` LIMIT 10
The query works, but it returns wrong values for both adhoc and used; my guess would be a logical error in my joins, but I can't see it. Any help is very much appreciated.
Your query layout is too spread out for my taste. In particular, the BETWEEN/AND conditions should be on 1 line each, not 5 lines each. I've also removed the backticks, though you might need them for the 'time' column.
Since your table layouts don't match your sample query, it makes life very difficult. However, the table layouts all include a UserID (which is sensible), so I've written the query to do the relevant joins using the UserID. As I noted in a comment, if your design makes it necessary to use a CONCAT operation to join two tables, then you have a recipe for a performance disaster. Update your actual schema so that the tables can be joined by UserID, as your table layouts suggest should be possible. Obviously, you can use function results in joins, but (unless your DBMS supports 'functional indexes' and you create appropriate indexes) the DBMS won't be able to use indexes on the table where the function is evaluated to speed up the queries. For a one-off query, that may not matter; for production queries, it often matters a lot.
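If you genuinely cannot change the schema, note that MySQL 8.0.13 and later can index the expression itself through a functional key part. This is only a sketch, reusing the column names from your query, and the index name is made up:

CREATE INDEX idx_msi_adsl_fullname
    ON msi_adsl ((CONCAT(username, '#', realm)));   -- functional key part, MySQL 8.0.13+

With something like that in place, the join on CONCAT(username, '#', realm) = radacct.UserName can at least use an index on msi_adsl, though joining by UserID directly remains the better fix.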
There's a chance this will do the job you want. Since you are aggregating over two tables, you need the two sub-queries in the FROM clause.
SELECT u.UserID,
u.username,
u.realm,
u.cap_size AS cap,
h.AdHoc,
a.OctetsUsed
FROM msi_adsl AS u
JOIN (SELECT UserID, SUM(AcctInputOctets + AcctOutputOctets) AS OctetsUsed
FROM radacct
WHERE AcctStartTime BETWEEN '2010-11-01' AND '2010-11-30 23:59:59'
GROUP BY UserID
) AS a ON a.UserID = u.UserID
JOIN (SELECT UserID, SUM(Value) AS AdHoc
FROM adsl_adhoc
WHERE time BETWEEN '2010-11-01 00:00:00' AND '2010-11-30 23:59:59'
GROUP BY UserID
) AS h ON h.UserID = u.UserID
WHERE u.canceled = '0000-00-00'
LIMIT 10
Each sub-query computes the value of the aggregate for each user over the specified period, generating the UserID and the aggregate value as output columns; the main query then simply pulls the correct user data from the main user table and joins with the aggregate sub-queries.
I think that the problem is here
FROM `msi_adsl`
INNER JOIN
(`radacct`, `adsl_adhoc`)
ON
(CONCAT(`msi_adsl`.`username`,'#',`msi_adsl`.`realm`)
= `radacct`.`UserName` AND `msi_adsl`.`id`=`adsl_adhoc`.`id`)
You are mixing joins with a Cartesian product, and that is not a good idea, because it is a lot harder to debug. Try this:
FROM `msi_adsl`
INNER JOIN
`radacct`
ON
CONCAT(`msi_adsl`.`username`,'#',`msi_adsl`.`realm`) = `radacct`.`UserName`
JOIN `adsl_adhoc` ON `msi_adsl`.`id`=`adsl_adhoc`.`id`
It's been quite a while since I last had to write SQL. I'm struggling with a SELECT and I was hoping you could point out what I am missing.
I have a rather simple table structure with 2 tables:
Users
user_id
name
Estimates
user_id
estimate
estimation_date
The goal is to use a start and end date to show all estimations of all users. For users who don't have an estimate for that particular time range, I want the SQL to return me a null value.
In reality there are a few more tables and columns involved as well as some user role stuff but that is not necessary for this example.
I tried to setup my select like this:
SELECT a.id, a.name, b.estimate, b.estimation_date
FROM users a
LEFT OUTER JOIN estimates b on a.id = b.id
WHERE b.estimation_date BETWEEN $startdate AND $enddate OR b.estimation_date IS NULL
As long as a user has not made any estimate at all, the result is what I want. Each row is a user, the same user can occur multiple times (once for each estimate in the date range), and a user without an estimate has NULL in b.estimate and b.estimation_date.
However, as soon as a user has made an estimation, even if it is not inside the range, the user does not show up in the result anymore.
What would be the best solution to input a start and end date and get all users and for each user who has no estimation inside this range get a NULL in those columns?
Put the date range condition in the on clause:
SELECT u.id, u.name, e.estimate, e.estimation_date
FROM users u LEFT OUTER JOIN
estimates e
ON u.id = e.id AND
e.estimation_date >= $startdate AND
(e.estimation_date <= $enddate OR e.estimation_date IS NULL)
Note that I also fixed the table aliases so they are not meaningless letters, but are abbreviations for the table name.
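If you prefer, you can get much the same effect by filtering the estimates in a derived table before the join; here is a sketch using the question's column names and $-placeholders:

SELECT u.id, u.name, e.estimate, e.estimation_date
FROM users u LEFT OUTER JOIN
     (SELECT id, estimate, estimation_date
      FROM estimates
      WHERE estimation_date BETWEEN $startdate AND $enddate
     ) e
     ON u.id = e.id;

Because the range filter is applied before the outer join, users with no estimate in the range are still returned and simply get NULLs.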
I have the following query:
SELECT avg(h.price)
FROM `car_history_optimized` h
LEFT JOIN vin_data vd ON (concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
WHERE h.date >='2015-01-01'
AND h.date <='2015-04-01'
AND h.dealer_id <> 2389
AND vd.prefix IN
(SELECT concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))
FROM `car_history_optimized` h
LEFT JOIN vin_data vd ON (concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
WHERE h.date >='2015-03-01'
AND h.date <='2015-04-01'
AND h.dealer_id =2389)
It finds the average market value of cars sold within the last 3 months by everyone other than dealer 2389, but only for cars that have the same make and model as those sold by 2389.
Can the above query be optimized? It's taking 2 minutes to run over 11 million records.
Thanks
How often will you use that particular "prefix"? If often, then I will direct you toward indexing a 'virtual' column.
Otherwise, you need
INDEX(date) -- for the outer query
INDEX(dealer_id, date) -- for what is now the subquery
Then do the EXISTS as suggested, or use a LEFT JOIN ... WHERE ... IS NULL.
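Spelled out as DDL, those two indexes might look like this (the index names are arbitrary):

ALTER TABLE car_history_optimized
    ADD INDEX idx_date (`date`),                    -- for the outer query
    ADD INDEX idx_dealer_date (dealer_id, `date`);  -- for the subquery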
Is date a DATE or a DATETIME? You may be including an extra day. I suggest this pattern:
WHERE date >= '2015-01-01'
AND date < '2015-01-01' + INTERVAL 3 MONTH
If you want a simple solution, my initial thought is to figure out a way to not have function calls in your joins.
You negatively affect the chance that an index will be helpful.
(concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
Maybe a LIKE expression would be a better idea; however, either approach in a join clause is best avoided.
The bottom line is that your table structure and relationships leave room for improvement... If you need the CONCAT because you are avoiding joining intermediate tables, don't; allow the indexes to be used and it should improve your query performance.
Also, make sure you have indexes.
I suggest 3 things
add a column and index it (avoid the functions in the join)
use an inner join
use EXISTS (...) instead of IN (...)
To "optimize" that query you need to add a column to the table car_history_optimized which contains the result of concat(substr(vin,1,8),'_',substr(vin,10,3)) and this column should be indexed.
Also, use INNER JOIN. In the current query the left outer join is wasted because you require every row of that table to be IN (the subquery) so NULL from that table isn't permitted hence you have the same effect as an inner join.
Use EXISTS instead of IN
SELECT
AVG(h.price)
FROM car_history_optimized h
INNER JOIN vin_data vd ON h.new_column = vd.prefix
WHERE h.`date` >= '2015-01-01'
AND h.`date` <= '2015-04-01'
AND h.dealer_id <> 2389
AND EXISTS (
SELECT
NULL
FROM car_history_optimized cho
WHERE cho.`date` >= '2015-03-01'
AND cho.`date` <= '2015-04-01'
AND cho.dealer_id = 2389
AND vd.prefix = cho.new_column
)
;
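The new_column used above could, for instance, be a stored generated column so it stays in sync with vin automatically. This is just a sketch: the column name mirrors the placeholder in the query, and generated columns require MySQL 5.7+ (on older versions, maintain an ordinary column from the application or a trigger):

ALTER TABLE car_history_optimized
    ADD COLUMN new_column VARCHAR(12)
        GENERATED ALWAYS AS (CONCAT(SUBSTR(vin, 1, 8), '_', SUBSTR(vin, 10, 3))) STORED,
    ADD INDEX idx_new_column (new_column);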
By the way:
I assume you already have some indexes, and that they include date and dealer_id
in future, avoid using "date" as a column name (it's a keyword, hence the backticks above)
A customer has asked me to pull some extra accounting data for their statements. I have been having trouble writing a query that can accommodate this.
What they initially wanted was to get all "payments" from a certain date range that belongs to a certain account, ordered as oldest first, which was a simple statement as follows
SELECT * FROM `payments`
WHERE `sales_invoice_id` = ?
ORDER BY `created_at` ASC;
However, now they want to have newly raised "invoices" as part of the statement. I cannot seem to write the correct query for this. UNION does not seem to work, and JOINs behave like, well, joins, so I cannot get a separate row for each payment/invoice; it instead merges them together. Ideally, I would retrieve a set of results as follows:
payment.amount | invoice.amount | created_at
-----------------------------------------------------------
NULL | 250.00 | 2014-02-28 8:00:00
120.00 | NULL | 2014-02-28 8:00:00
This way I can loop through these to generate a full statement. The latest query I tried was the following:
SELECT * FROM `payments`
LEFT OUTER JOIN `sales_invoices`
ON `payments`.`account_id` = `sales_invoices`.`account_id`
ORDER BY `created_at` ASC;
The first problem would be that the "created_at" column is ambiguous, so I am not sure how to merge these. The second problem is that it merges rows and brings back duplicate rows which is incorrect.
Am I thinking about this the wrong way or is there a feature of MySQL I am not aware of that can do this for me? Thanks.
You can use union all for this. You just need to define the columns carefully:
select ip.*
from ((select p.invoice_id, p.amount as payment, NULL as invoice, p.created_at
from payments p
) union all
(select i.invoice_id, NULL, i.amount, i.created_at
from sales_invoices i
)
) ip
order by created_at asc;
Your question is specifically about MySQL. Other databases support a type of join called the full outer join, which can also be used for this type of query (MySQL does not support full outer join).
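For reference only, a full outer join is usually emulated in MySQL by combining a left join with the unmatched rows of the reverse left join. The sketch below assumes both tables carry an invoice_id, as in the query above; it is not needed for the union all approach:

select p.invoice_id, p.amount as payment, i.amount as invoice
from payments p left join
     sales_invoices i
     on i.invoice_id = p.invoice_id
union all
select i.invoice_id, null as payment, i.amount as invoice
from sales_invoices i left join
     payments p
     on p.invoice_id = i.invoice_id
where p.invoice_id is null;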
I have following tables:
products - 4500 records
Fields: id, sku, name, alias, price, special_price, quantity, desc, photo, manufacturer_id, model_id, hits, publishing
products_attribute_rel - 35000 records
Fields: id, product_id, attribute_id, attribute_val_id
attribute_values - 243 records
Fields: id, attr_id, value, ordering
manufacturers - 29 records
Fields: id, title,publishing
models - 946 records
Fields: id, manufacturer_id, title, publishing
So I get data from these tables by one query:
SELECT jp.*,
jm.id AS jm_id,
jm.title AS jm_title,
jmo.id AS jmo_id,
jmo.title AS jmo_title
FROM `products` AS jp
LEFT JOIN `products_attribute_rel` AS jpar ON jpar.product_id = jp.id
LEFT JOIN `attribute_values` AS jav ON jav.attr_id = jpar.attribute_val_id
LEFT JOIN `manufacturers` AS jm ON jm.id = jp.manufacturer_id
LEFT JOIN `models` AS jmo ON jmo.id = jp.model_id
GROUP BY jp.id HAVING COUNT(DISTINCT jpar.attribute_val_id) >= 0
This query is slow as hell; it takes MySQL hundreds of seconds to handle it.
So how would it be possible to improve this query? With small data chunks it works perfectly well, but I guess the products_attribute_rel table, which has 35000 records, ruins everything.
Your help would be appreciated.
EDIT: EXPLAIN results of the SELECT query:
The problem is that MySQL uses the join type ALL for 3 tables. That means MySQL performs 3 full table scans, putting every possibility together before discarding the rows that don't match the ON condition. To get a much faster join type (for instance eq_ref), you must put an index on the columns that are used in the ON clauses.
Be aware, though, that putting an index on every possible column is not recommended. Lots of indexes do speed up SELECT statements, but they also create overhead, since each index must be stored and maintained. This means that manipulation queries like UPDATE and DELETE become slower. I've seen queries take half an hour to delete only 1000 records. It's a trade-off where you have to decide what happens more often and what is more important.
To get more information on MySQL join types, take a look at this.
More on indexes here.
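As a concrete sketch of the indexes those ON clauses need, using the column names from the question (whether the id columns are already primary keys is an assumption):

ALTER TABLE products_attribute_rel ADD INDEX idx_par_product (product_id);
ALTER TABLE attribute_values       ADD INDEX idx_av_attr (attr_id);
-- manufacturers.id and models.id are presumably primary keys already, which covers
-- the lookups from products.manufacturer_id and products.model_id.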
The tables' data is not so huge that it should take hundreds of seconds. Something is wrong with the table schema. Please do proper indexing; that will surely speed it up.
select distinct
jm.id AS jm_id,
jm.title AS jm_title,
jmo.id AS jmo_id,
jmo.title AS jmo_title
from products jp,
products_attribute_rel jpar,
attribute_values jav,
manufacturers jm,
models jmo
where jpar.product_id = jp.id
and jav.attr_id = jpar.attribute_val_id
and jm.id = jp.manufacturer_id
and jmo.id = jp.model_id
You can do that if you want to select all the data. Hope it works.
I need to gather posts from two MySQL tables that have different columns and provide a WHERE clause for each table. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but I'll admit, I am no MySQL veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
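Applied to the tables in the question, it might look something like the sketch below. Note that filtering each table separately is not quite the same as the single OR across both tables in your original WHERE, so check it against what you actually want; the ? placeholders stand in for the PHP variables, which should be bound rather than interpolated:

select
    b.*, tb.team_id
from
    (select * from blabbing
     where mem_id = ? or fromid = ?) as b        -- $id, $logOptions_id
left outer join
    (select id, team_id from team_blabbing
     where team_id in (?, ?, ?)) as tb           -- $team_array, expanded
on
    tb.id = b.id
order by
    b.blab_date desc
limit 20;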
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you, since you're a relative newbie in SQL. Use "aliases" for your tables to cut down on SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It makes your life easier, and helps anyone AFTER you to know which columns come from which table, especially when the same column name exists in different tables; it prevents ambiguity in the query. Your left join, I think, from the sample, may be questionable; can you confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry would carry the "Team_ID" it was posted from, in addition to its OWN "ID" as the blabbing table's unique key.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, use SQL's "OR", not the logical "||" operator.
EDIT -- per your comment
This could be done in a variety of ways, such as building and executing dynamic SQL, calling the query multiple times (once for each ID) and merging the results, or joining to yet another temp table that gets cleaned out, say, daily.
Suppose you have another table such as "TeamJoins" with, say, 3 columns: a date, a session ID and a team_id. You could purge anything more than a day old, and/or clear the rows each time the same session ID (as it appears to come from PHP) runs a new query. Have two indexes: one on the date (to simplify any daily purging), and a second on (sessionID, team_id) for the join.
Then, loop through and insert the team IDs you have identified into the "TeamJoins" table.
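A sketch of such a table, with assumed column names and types, might be:

CREATE TABLE TeamJoins (
    created_at  DATETIME    NOT NULL,
    session_id  VARCHAR(64) NOT NULL,
    team_id     INT         NOT NULL,
    INDEX idx_created (created_at),                -- keeps the daily purge cheap
    INDEX idx_session_team (session_id, team_id)   -- supports the join
);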
THEN, instead of a hard-coded IN list, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TJ.Team_ID IS NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is:
I added an extra column to my blabbing table called team_id (set to NULL), as well as another field in my team_blabbing table called mem_id.
Then I changed the insert script to also insert a value into mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thought on what I did. Try not to be too harsh though:) Thanks again for all the info.