I have two tables, tabA and tabB, with a one-to-many relationship from tabA to tabB. I have this query:
SELECT * FROM `tabA` LEFT JOIN `tabB` ON `tabA`.`aID` = `tabB`.`aID`
The result is a large set in which each tabA row is repeated once for every tabB row that references it.
I am aware that I can use GROUP BY to reduce the result to one row per tabA record. But then I lose all but one of the matching tabB rows, unless I add custom fields built with GROUP_CONCAT, each combined with two REPLACE calls for escaping (which seriously impacts performance). An example query looks like this:
SELECT `tabA`.*,
GROUP_CONCAT(REPLACE(REPLACE(`tabB`.`tabBCol1`, '/', '//'), ',', '/,')) AS `tabBCol1`,
GROUP_CONCAT(REPLACE(REPLACE(`tabB`.`tabBCol2`, '/', '//'), ',', '/,')) AS `tabBCol2`
FROM `tabA`
LEFT JOIN `tabB` ON `tabA`.`aID` = `tabB`.`aID`
GROUP BY `tabA`.`aID`
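The escaping only pays off if the client can split the concatenated string back into the original values. Here is a minimal sketch of that round trip, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also provides GROUP_CONCAT with a default ',' separator, and REPLACE). The sample data and the split_escaped helper are hypothetical illustrations, not part of the original setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabA (aID INTEGER PRIMARY KEY);
    CREATE TABLE tabB (bID INTEGER PRIMARY KEY, aID INTEGER, tabBCol1 TEXT);
    INSERT INTO tabA VALUES (1);
    INSERT INTO tabB VALUES (1, 1, 'plain'), (2, 1, 'a,b'), (3, 1, 'x/y');
""")

def split_escaped(s):
    """Split on unescaped commas; '/x' stands for a literal x."""
    parts, current, i = [], [], 0
    while i < len(s):
        ch = s[i]
        if ch == "/" and i + 1 < len(s):
            current.append(s[i + 1])  # escaped character, keep it literally
            i += 2
            continue
        if ch == ",":
            parts.append("".join(current))  # unescaped comma: value boundary
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts

# Inner REPLACE doubles '/', outer REPLACE escapes ',' with '/'.
packed = conn.execute("""
    SELECT GROUP_CONCAT(REPLACE(REPLACE(tabB.tabBCol1, '/', '//'), ',', '/,'))
    FROM tabA LEFT JOIN tabB ON tabA.aID = tabB.aID
    GROUP BY tabA.aID
""").fetchone()[0]

values = split_escaped(packed)
```

The order of '/' first, ',' second matters: escaping ',' first would introduce a '/' that the second REPLACE would then double. GROUP_CONCAT does not guarantee the order of the concatenated values unless you specify one.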
The GROUP_CONCAT query lets me use LIMIT so that I can, for example, show only 5 entries starting after the fifth (LIMIT 5,5). If I apply the same LIMIT to the first query, I don't get the next 5 tabA records but an arbitrary slice of joined rows whose size depends on how many associations each record has.
So, apart from the second query, is there any way to fetch the rows with their associations while still using LIMIT, and without the performance hit of all those REPLACE calls?
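To make the LIMIT problem concrete, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL, with hypothetical sample data. On the plain join, LIMIT counts joined rows, so the "first 3" can all be copies of one tabA record; after GROUP BY there is one row per record, so LIMIT counts records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabA (aID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tabB (bID INTEGER PRIMARY KEY, aID INTEGER);
    INSERT INTO tabA VALUES (1, 'one'), (2, 'two'), (3, 'three');
    -- record 1 has three children, record 2 has two, record 3 has none
    INSERT INTO tabB VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
""")

# LIMIT applies to joined rows: all three rows belong to aID 1.
joined = conn.execute("""
    SELECT tabA.aID, tabB.bID
    FROM tabA LEFT JOIN tabB ON tabA.aID = tabB.aID
    ORDER BY tabA.aID, tabB.bID
    LIMIT 3
""").fetchall()

# One row per tabA record, so LIMIT 3 returns three distinct records.
grouped = conn.execute("""
    SELECT tabA.aID, COUNT(tabB.bID)
    FROM tabA LEFT JOIN tabB ON tabA.aID = tabB.aID
    GROUP BY tabA.aID
    ORDER BY tabA.aID
    LIMIT 3
""").fetchall()
```

The ORDER BY clauses are there because without one, which rows a LIMIT returns is not defined.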
ADDITIONAL
Although I could run a separate subquery for each row returned by the first query with GROUP BY (which would let me apply any WHERE conditions to the associations), I am trying to avoid the N+1 selects problem. In this example my LIMIT is LIMIT 5,5, but I will be applying this to much larger pages (up to 1000 rows at a time).
Try two queries:
-- get those 5 records
SELECT * FROM Cars WHERE some_condition = blabla LIMIT 5;
-- get all associated records for those ids from the related table
SELECT * FROM Wheels WHERE car_id IN (1, 3, 5, 123, 16);
This avoids the N+1 problem, because you always run exactly two queries no matter how many records are on the page. Even with 1000 records in the first query, this simple method will generally beat joins, GROUP BY, GROUP_CONCAT and the like.
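A minimal sketch of the two-query pattern, using Python's sqlite3 module as a stand-in for MySQL. The columns model and position, the sample rows and the fetch_page helper are hypothetical. The first query pages through Cars only; the second fetches all wheels for that page with a single IN (...) list, and the children are grouped in Python.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cars (id INTEGER PRIMARY KEY, model TEXT);
    CREATE TABLE Wheels (id INTEGER PRIMARY KEY, car_id INTEGER, position TEXT);
    INSERT INTO Cars VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO Wheels VALUES (1, 1, 'FL'), (2, 1, 'FR'), (3, 3, 'FL'), (4, 4, 'RR');
""")

def fetch_page(limit, offset):
    # Query 1: LIMIT/OFFSET count parent rows only.
    cars = conn.execute(
        "SELECT id, model FROM Cars ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    if not cars:
        return []
    ids = [car_id for car_id, _ in cars]
    # Query 2: one lookup for the children of every car on this page.
    placeholders = ", ".join("?" for _ in ids)
    wheels = conn.execute(
        f"SELECT car_id, position FROM Wheels WHERE car_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    by_car = defaultdict(list)
    for car_id, position in wheels:
        by_car[car_id].append(position)
    # Keep the parent order from query 1; cars without wheels get [].
    return [(car_id, model, by_car[car_id]) for car_id, model in cars]

page = fetch_page(limit=2, offset=1)
```

One design caveat: some databases and drivers cap the number of bound parameters per statement, so a very large page may need its IN list split into chunks.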
Related
I have a query that returns over 180k rows. After a slight change, it returns about 10 fewer.
How do I show only those 10 rows as a result?
I've tried EXCEPT, but it seems to return a lot more than just the 10.
You can use LIMIT, which returns at most n rows (add an ORDER BY if you need a particular n rows). Example:
SELECT * FROM Orders LIMIT 10
If you are implementing pagination, add OFFSET. This example skips the first 20 rows and returns the next 10 (rows 21 to 30):
SELECT * FROM Orders LIMIT 10 OFFSET 20
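A small pagination sketch using Python's sqlite3 module as a stand-in for MySQL (both accept LIMIT ... OFFSET ...; MySQL also accepts the LIMIT offset, count form). The page_of_orders helper and the 50 sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Orders (id) VALUES (?)", [(i,) for i in range(1, 51)])

def page_of_orders(page, page_size=10):
    # ORDER BY keeps pages stable; OFFSET skips all earlier pages.
    rows = conn.execute(
        "SELECT id FROM Orders ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    return [r[0] for r in rows]

first = page_of_orders(0)
third = page_of_orders(2)  # same as LIMIT 10 OFFSET 20
```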
MySQL didn't support EXCEPT until version 8.0.31, so on older versions you need a workaround.
Probably the most efficient route would be to incorporate the two WHERE clauses into one. I say efficient in the sense of "Do it that way if you're going to run this query in a regular report or production application."
For example:
-- Query 1
SELECT * FROM table WHERE `someDate`>'2016-01-01'
-- Query 2
SELECT * FROM table WHERE `someDate`>'2016-01-10'
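-- Becomes (rows Query 1 returns that Query 2 doesn't; not BETWEEN, which would also include '2016-01-01' itself)
SELECT * FROM table WHERE `someDate` > '2016-01-01' AND `someDate` <= '2016-01-10'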
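To check that folding the two conditions into one WHERE clause really yields "rows of Query 1 that Query 2 lacks", here is a sketch using Python's sqlite3 module as a stand-in for MySQL. The table name t and the sample dates are hypothetical; dates are stored as ISO strings, so text comparison matches date order. It also shows that BETWEEN, being inclusive at both ends, picks up the 2016-01-01 boundary row that Query 1 excludes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, someDate TEXT)")
dates = ["2015-12-31", "2016-01-01", "2016-01-05", "2016-01-10", "2016-01-11", "2016-02-01"]
conn.executemany("INSERT INTO t (someDate) VALUES (?)", [(d,) for d in dates])

def ids(sql):
    return {row[0] for row in conn.execute(sql)}

q1 = ids("SELECT id FROM t WHERE someDate > '2016-01-01'")
q2 = ids("SELECT id FROM t WHERE someDate > '2016-01-10'")
# The difference expressed as a single WHERE clause.
combined = ids("SELECT id FROM t WHERE someDate > '2016-01-01' AND someDate <= '2016-01-10'")
# BETWEEN is inclusive, so it also returns the 2016-01-01 row.
between = ids("SELECT id FROM t WHERE someDate BETWEEN '2016-01-01' AND '2016-01-10'")
```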
It's possible you're implying that the queries are quite complicated, and you're after a quick (read: not necessarily efficient) way of getting the difference for a one-off investigation.
That being the case, you could abuse UNION and a sub-query:
(Untested, treat as pseudo-SQL...)
SELECT
*
FROM (
SELECT * FROM table WHERE `someDate`>'2016-01-01'
UNION ALL
SELECT * FROM table WHERE `someDate`>'2016-01-10'
) AS sub
GROUP BY
`primaryKey`
HAVING
COUNT(1) = 1;
It's ugly though. And not efficient.
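A runnable version of the UNION ALL trick, using Python's sqlite3 module as a stand-in for MySQL, with a hypothetical table t and selecting only the key column to keep it short. Each id appears twice if both queries return it and once if only one does. That equals "Query 1 minus Query 2" only when Query 2's rows are a subset of Query 1's and neither query returns duplicates, as in this date example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, someDate TEXT)")
conn.executemany(
    "INSERT INTO t (someDate) VALUES (?)",
    [("2016-01-05",), ("2016-01-10",), ("2016-01-20",), ("2016-02-01",)],
)

only_in_one = [row[0] for row in conn.execute("""
    SELECT id
    FROM (
        SELECT id FROM t WHERE someDate > '2016-01-01'
        UNION ALL
        SELECT id FROM t WHERE someDate > '2016-01-10'
    ) AS sub
    GROUP BY id
    HAVING COUNT(*) = 1
    ORDER BY id
""")]
```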
Assuming the only difference is that one side (I'll call it the "right-hand side") is missing records that the left includes, you could LEFT JOIN the two queries (as subqueries) on their key and keep only the rows where the right side is NULL. But that depends on all those caveats holding.
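A sketch of that LEFT JOIN anti-join, using Python's sqlite3 module as a stand-in for MySQL. It assumes both queries expose a non-NULL key column (here a hypothetical id on a hypothetical table t); the right side is the query that is missing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, someDate TEXT)")
conn.executemany(
    "INSERT INTO t (someDate) VALUES (?)",
    [("2016-01-05",), ("2016-01-10",), ("2016-01-20",), ("2016-02-01",)],
)

# Keep left-side rows that found no partner on the right.
missing = [row[0] for row in conn.execute("""
    SELECT l.id
    FROM (SELECT id FROM t WHERE someDate > '2016-01-01') AS l
    LEFT JOIN (SELECT id FROM t WHERE someDate > '2016-01-10') AS r
        ON l.id = r.id
    WHERE r.id IS NULL
    ORDER BY l.id
""")]
```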
Temporary tables can be your friend - especially given they're so easily created (and can be indexed):
CREATE TEMPORARY TABLE tmp_xyz AS SELECT ... FROM ... WHERE ...;
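A sketch of the temporary-table route, using Python's sqlite3 module as a stand-in for MySQL (SQLite also supports CREATE TEMPORARY TABLE ... AS SELECT). The table names tmp_q1 and tmp_q2 and the sample data are hypothetical. Each expensive query is materialized once, the lookup side is indexed, and the comparison then runs against the small temp tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, someDate TEXT)")
conn.executemany(
    "INSERT INTO t (someDate) VALUES (?)",
    [("2016-01-05",), ("2016-01-10",), ("2016-01-20",), ("2016-02-01",)],
)

# Materialize each query; temp tables disappear when the connection closes.
conn.execute("CREATE TEMPORARY TABLE tmp_q1 AS SELECT id FROM t WHERE someDate > '2016-01-01'")
conn.execute("CREATE TEMPORARY TABLE tmp_q2 AS SELECT id FROM t WHERE someDate > '2016-01-10'")
# Index the side that is looked up during the join.
conn.execute("CREATE INDEX idx_tmp_q2_id ON tmp_q2 (id)")

difference = [row[0] for row in conn.execute("""
    SELECT tmp_q1.id
    FROM tmp_q1 LEFT JOIN tmp_q2 ON tmp_q1.id = tmp_q2.id
    WHERE tmp_q2.id IS NULL
    ORDER BY tmp_q1.id
""")]
```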
Forgive me if this seems like common sense as I am still learning how to split my data between multiple tables.
Basically, I have two:
general with the fields userID,owner,server,name
count with the fields userID,posts,topics
I want to fetch the data from both tables and can't decide how to do it. With a UNION:
SELECT `userID`, `owner`, `server`, `name`
FROM `english`.`general`
WHERE `userID` = 54 LIMIT 1
UNION
SELECT `posts`, `topics`
FROM `english`.`count`
WHERE `userID` = 54 LIMIT 1
Or a JOIN:
SELECT `general`.`userID`, `general`.`owner`, `general`.`server`,
`general`.`name`, `count`.`posts`, `count`.`topics`
FROM `english`.`general`
JOIN `english`.`count` ON
`general`.`userID`=`count`.`userID` AND `general`.`userID`=54
LIMIT 1
Which do you think would be the more efficient way and why? Or perhaps both are too messy to begin with?
It's not about efficiency, but about how they work.
UNION just stacks the results of two independent queries: you get the rows of one result set followed by the rows of the other, and both must have the same number of columns.
JOIN pairs each row of one table with the matching rows of the other (with every row, if there is no join condition). So the combined result set has "long" rows, in terms of the number of columns.
Just for completeness as I don't think it's mentioned elsewhere: often UNION ALL is what's intended when people use UNION.
UNION removes duplicates, which makes it relatively expensive because finding them typically requires a sort. It removes duplicates from the final result, so it doesn't matter whether a duplicate occurs within a single query or comes from the same data in different SELECTs. UNION is a set operation.
UNION ALL just sticks the results together: no sorting, no duplicate removal. This is going to be quicker (or at least no worse) than UNION.
If you know the individual queries won't return duplicate results, use UNION ALL. (In fact, it's often best to start with UNION ALL and only consider UNION if you need its behaviour; using SELECT DISTINCT with UNION is redundant.)
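A quick illustration of the difference, using Python's sqlite3 module as a stand-in for MySQL (the tables a and b are hypothetical). UNION removes every duplicate in the final result, including the one that already existed inside table a; UNION ALL simply concatenates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (v INTEGER);
    CREATE TABLE b (v INTEGER);
    INSERT INTO a VALUES (1), (2), (2);
    INSERT INTO b VALUES (2), (3);
""")

# Duplicates removed across the whole result.
union_rows = sorted(r[0] for r in conn.execute("SELECT v FROM a UNION SELECT v FROM b"))
# Plain concatenation, no duplicate removal.
union_all_rows = sorted(r[0] for r in conn.execute("SELECT v FROM a UNION ALL SELECT v FROM b"))
```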
You want to use a JOIN. Joining is used to create a single set that combines related data. Your UNION example doesn't make sense (and probably won't run). UNION is for combining two result sets with identical columns into one set containing the rows of both (it does not 'union' the columns).
If you want to fetch users together with their posts and topics, write the query using a JOIN like this:
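To see both points at once, here is a sketch using Python's sqlite3 module as a stand-in for MySQL (SQLite accepts MySQL-style backtick quoting). The sample row is hypothetical. The JOIN yields one long row with all six columns; the UNION from the question is rejected because its two SELECTs return different numbers of columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE general (userID INTEGER PRIMARY KEY, owner TEXT, server TEXT, name TEXT);
    CREATE TABLE `count` (userID INTEGER PRIMARY KEY, posts INTEGER, topics INTEGER);
    INSERT INTO general VALUES (54, 'o', 's', 'alice');
    INSERT INTO `count` VALUES (54, 10, 3);
""")

# JOIN: columns from both tables side by side in a single row.
joined = conn.execute("""
    SELECT general.userID, general.owner, general.server, general.name,
           `count`.posts, `count`.topics
    FROM general JOIN `count` ON general.userID = `count`.userID
    WHERE general.userID = 54
""").fetchall()

# UNION: 4 columns vs 2 columns, so the statement fails to compile.
try:
    conn.execute("""
        SELECT userID, owner, server, name FROM general WHERE userID = 54
        UNION
        SELECT posts, topics FROM `count` WHERE userID = 54
    """).fetchall()
    union_error = None
except sqlite3.Error as exc:  # base class for sqlite3 errors
    union_error = str(exc)
```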
SELECT general.*,count.posts,count.topics FROM general LEFT JOIN count ON general.userID=count.userID
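Why LEFT JOIN rather than a plain JOIN here: a user without a row in count is still returned, with NULL posts and topics, whereas an inner JOIN drops that user. A sketch using Python's sqlite3 module as a stand-in for MySQL, with hypothetical sample users:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE general (userID INTEGER PRIMARY KEY, owner TEXT, server TEXT, name TEXT);
    CREATE TABLE `count` (userID INTEGER PRIMARY KEY, posts INTEGER, topics INTEGER);
    INSERT INTO general VALUES (54, 'o', 's', 'alice'), (55, 'o', 's', 'bob');
    INSERT INTO `count` VALUES (54, 10, 3);  -- bob has no count row yet
""")

# LEFT JOIN keeps every user; missing counts come back as NULL (None).
rows = conn.execute("""
    SELECT general.userID, general.name, `count`.posts, `count`.topics
    FROM general LEFT JOIN `count` ON general.userID = `count`.userID
    ORDER BY general.userID
""").fetchall()

# An inner JOIN only returns users that have a count row.
inner_count = conn.execute("""
    SELECT COUNT(*) FROM general JOIN `count` ON general.userID = `count`.userID
""").fetchone()[0]
```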
My problem is this:
select * from
(
select * from barcodesA
UNION ALL
select * from barcodesB
)
as barcodesTotal, boxes
where barcodesTotal.code=boxes.code;
Table barcodesA has 4,000 entries
Table barcodesB has 4,000 entries
Table boxes has about 180,000 entries
It takes 30 seconds to process the query.
Another problematic query:
select * from
viewBarcodesTotal, boxes
where viewBarcodesTotal.code=boxes.code;
viewBarcodesTotal contains the UNION ALL from both barcodes tables. It also takes forever.
Meanwhile,
select * from barcodesA , boxes where barcodesA.code=boxes.code
UNION ALL
select * from barcodesB , boxes where barcodesB.code=boxes.code
This one takes <1second.
The question is obviously: WHY? Is my code buggy, or is MySQL?
I have to migrate from Access to MySQL, and I would have to rewrite all my code if the first form is the problem.
Add an index on boxes.code if you don't already have one. Joining 8000 records (4K+4K) to the 180,000 will benefit from an index on the 180K side of the equation.
Also, be explicit and specify the fields you need in your SELECT statements. Using * in a production query is bad form: it encourages you not to think about which fields come back (and how big they might be), not to mention that you are UNIONing two different tables, barcodesA and barcodesB, which may have different data types and column orders....
The REASON for the performance difference...
The first query says: first, do a complete union of EVERY record in A with EVERY record in B, THEN join the result to boxes on the code. The result of the union has no index the optimizer can use for that join.
In your SECOND query, each table is joined to boxes individually, so each join IS optimized (judging by the second query's speed, there apparently IS a usable index, but I would make sure both tables have an index on the "code" column).
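Since the worry is having to rewrite code, it may help to confirm that the two forms return the same rows, so switching to "join each table, then UNION ALL" changes only the plan, not the result. A sketch using Python's sqlite3 module as a stand-in for MySQL, with hypothetical sample codes and explicit columns instead of *. It does not reproduce MySQL's timings, which depend on the optimizer and version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE barcodesA (code TEXT);
    CREATE TABLE barcodesB (code TEXT);
    CREATE TABLE boxes (id INTEGER PRIMARY KEY, code TEXT);
    -- Index the large side of the join, plus both barcode tables.
    CREATE INDEX idx_boxes_code ON boxes (code);
    CREATE INDEX idx_barcodesA_code ON barcodesA (code);
    CREATE INDEX idx_barcodesB_code ON barcodesB (code);
""")
conn.executemany("INSERT INTO barcodesA VALUES (?)", [(f"A{i}",) for i in range(100)])
conn.executemany("INSERT INTO barcodesB VALUES (?)", [(f"B{i}",) for i in range(100)])
conn.executemany(
    "INSERT INTO boxes (code) VALUES (?)",
    [(f"A{i}",) for i in range(0, 100, 2)] + [(f"B{i}",) for i in range(0, 100, 5)],
)

# Form 1: union first, then join the derived table (no declared index) to boxes.
union_then_join = conn.execute("""
    SELECT t.code, boxes.id
    FROM (SELECT code FROM barcodesA UNION ALL SELECT code FROM barcodesB) AS t
    JOIN boxes ON t.code = boxes.code
""").fetchall()

# Form 2: join each table to boxes separately, then concatenate.
join_then_union = conn.execute("""
    SELECT barcodesA.code, boxes.id FROM barcodesA JOIN boxes ON barcodesA.code = boxes.code
    UNION ALL
    SELECT barcodesB.code, boxes.id FROM barcodesB JOIN boxes ON barcodesB.code = boxes.code
""").fetchall()
```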
I have multiple SELECT statements on different tables in the same database. I was running them as separate queries, loading the results into my array and then sorting it (again, after already ordering in each query).
I would like to combine them into one statement to speed up results and make it easier to "load more" (see bottom).
Each query uses SELECT, LEFT JOIN, WHERE and ORDER BY clauses, which are not the same for each table.
I may not need order by in each statement, but I want the end result, ultimately, to be ordered by a field representing a time (not necessarily the same field name across all tables).
I would want to limit total query results to a number, in my case 100.
I then loop through the results and, for each row, test whether OBJECTNAME_ID (i.e. comment_id, event_id, upload_id) isset; if so, I call LOAD_WHATEVER_OBJECT, which takes the row and pushes its data into an array.
I won't have to sort the array afterwards because it was loaded in order via mysql.
Later in the app, I will "load more" by skipping the first 100, 200 or whatever page*100 is and limit by 100 again with the same query.
The end result from the database would preferably look like this:
RESULT - selected fields from a table - field to sort on is greatest
RESULT - selected fields from a possibly different table - field to sort on is next greatest
RESULT - selected fields from a possibly different table - field to sort on is third greatest
etc, etc
I see a lot of simpler combined statements, but nothing quite like this.
Any help would be GREATLY appreciated.
The easiest way might be a UNION ( http://dev.mysql.com/doc/refman/5.0/en/union.html ):
(SELECT a,b,c FROM t1)
UNION
(SELECT d AS a, e AS b, f AS c FROM t2)
ORDER BY a DESC
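Applied to the "load more" feed, a sketch might look like this, using Python's sqlite3 module as a stand-in for MySQL. The tables, columns, the kind discriminator column and the load_feed helper are all hypothetical. Each branch aliases its own time column to a common name, the ORDER BY and LIMIT/OFFSET after the last SELECT apply to the combined result, and kind tells the loop which loader to call instead of testing each *_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, body TEXT, created_at TEXT);
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, title TEXT, starts_at TEXT);
    INSERT INTO comments VALUES (1, 'hi', '2024-01-03'), (2, 'yo', '2024-01-01');
    INSERT INTO events VALUES (1, 'party', '2024-01-02'), (2, 'meetup', '2024-01-04');
""")

# Column names come from the first SELECT; later branches line up by position.
FEED_SQL = """
    SELECT 'comment' AS kind, comment_id AS item_id, body AS label, created_at AS sort_time
    FROM comments
    UNION ALL
    SELECT 'event', event_id, title, starts_at
    FROM events
    ORDER BY sort_time DESC
    LIMIT ? OFFSET ?
"""

def load_feed(page, page_size=100):
    # "Load more" = same query, skipping page * page_size rows.
    return conn.execute(FEED_SQL, (page_size, page * page_size)).fetchall()

first_page = load_feed(0, page_size=3)
second_page = load_feed(1, page_size=3)
```

UNION ALL is used because rows from different branches can never be duplicates of each other (their kind differs), so UNION's duplicate check would be wasted work.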
I am trying to order a query by two keys. The query is built from several subqueries. Besides columns with other data, the table contains two columns, Key and Key_Father. I need SQL to return the results in order so that I can print them in a report. This is an example:
Key Key_Father
4 NULL
1 4
2 4
7 NULL
1 7
2 7
As you can see, this is a parent-child structure: a row is a parent if its Key_Father is NULL, and the Key column starts again from 1 for the children of each parent.
The first subquery returns the data in order, because that is the order in which it is stored in the table, but the second subquery, which uses a GROUP BY, does not. So I tried adding an extra column with ROW_NUMBER to the first subquery to preserve that order, but the second subquery still loses it.
This is the query:
SELECT Orden,INV_Key,Key_Padre,INV.INV_ID,INV.BOD_Bodega_ID,
CASE WHEN MAX(HIS_Ventas) > 0 OR max(HIS_Disponible) > 0 THEN 1 ELSE 0 END AS Participacion,MAX(ISNULL(HIS_Ventas,0)) AS Ventas
FROM(SELECT ROW_NUMBER() OVER (ORDER BY C.INV_Compra_ID) Orden,C.BOD_Bodega_ID,INV_Key,Key_Padre,CD.INV_ID
FROM dbo.INV_COMPRAS_USADOS C
INNER JOIN dbo.INV_COMPRAS_USADOS_DET CD ON C.INV_Compra_ID = CD.INV_Compra_ID
WHERE C.INV_Compra_ID = @Compra_ID
AND ((Key_Padre IS NULL AND CD.INV_Catalogo_Codigo = ISNULL(@Cod_Catalogo,CD.INV_Catalogo_Codigo)
AND INV_Key IN (SELECT DISTINCT Key_Padre
FROM dbo.INV_COMPRAS_USADOS_DET
WHERE INV_Compra_ID = @Compra_ID AND Key_Padre IS NOT NULL))
OR Key_Padre IN (SELECT DISTINCT INV_Key
FROM dbo.INV_COMPRAS_USADOS_DET
WHERE INV_Compra_ID = @Compra_ID AND (Key_Padre IS NULL AND CD.INV_Catalogo_Codigo = ISNULL(@Cod_Catalogo,CD.INV_Catalogo_Codigo))))) INV
LEFT JOIN DBO.HIS_HISTORICO_DETALLE HD ON INV.INV_ID = HD.INV_ID AND HD.BOD_Bodega_ID = INV.BOD_Bodega_ID
LEFT JOIN DBO.HIS_HISTORICO_INVENTARIO H on H.HIS_Historico_ID= HD.HIS_Historico_ID AND (CONVERT(datetime,(convert(varchar(20),HIS_Historico_Ano) + '/' + convert(varchar(20),HIS_Historico_Mes) + '/01')) BETWEEN @FechaDesde AND @FechaHasta)
WHERE H.HIS_Historico_Mes IS NOT NULL OR INV.INV_ID IS NULL
GROUP BY Orden,INV_Key,Key_Padre,INV.INV_ID,INV.BOD_Bodega_ID,HIS_Historico_Ano,HIS_Historico_Mes
Another interesting thing (well, for me) is that when I replace the variables with constant values, the second subquery keeps the correct order, even when the constants are the same values the variables hold. This is only part of the full query: it is a subquery that depends on two other SELECTs, and I need to keep the order from those SELECTs too.
So I hope that someone could help me with this. Thanks!
To order the results you need to place an ORDER BY clause on the outermost SELECT statement. Using ORDER BY in a nested SELECT is generally not permitted but even if you work around it (e.g. by using TOP), you can't rely on the results being ordered in any particular way.
Without an ORDER BY the results may appear to be coming out in the order you want but this cannot be relied upon. Running the same query on a different server or at some point in the future may produce a different order where differences in statistics, server load, etc can affect how the query optimizer actually executes the statement.
The portion of the query you've provided is outputting the following columns. Which are the ones you want to order by?
Orden (although this is just an alias for INV_Compra_ID as far as ordering is concerned)
INV_Key
Key_Padre
INV_ID
BOD_Bodega_ID
Participacion
Ventas
Let's say you want to order by just three of them; then you need to append the following clause to the outermost SELECT:
ORDER BY
Orden,
INV_Key,
Key_Padre
This should do it. I'm not sure if I'm missing an obvious simplification though.
ORDER BY ISNULL(Key_Father,[Key]), ISNULL(Key_Father,-1),[Key]
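A runnable check of that ORDER BY, using Python's sqlite3 module as a stand-in for SQL Server. COALESCE replaces T-SQL's ISNULL (same result for two arguments), "Key" is double-quoted instead of bracketed, and the table name tree is hypothetical. The rows are inserted scrambled so the sort has to do the work. The -1 sentinel assumes no real key is negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tree ("Key" INTEGER, Key_Father INTEGER)')
conn.executemany(
    'INSERT INTO tree ("Key", Key_Father) VALUES (?, ?)',
    [(2, 7), (1, 4), (7, None), (2, 4), (4, None), (1, 7)],
)

ordered = conn.execute("""
    SELECT "Key", Key_Father
    FROM tree
    ORDER BY COALESCE(Key_Father, "Key"),  -- group each parent with its children
             COALESCE(Key_Father, -1),     -- the parent (NULL father) sorts first
             "Key"                          -- children in key order
""").fetchall()
```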