How can I improve this multiple join monster query? - mysql

For your sakes I'm not gonna paste it all here; just know that it's hella long and involves a lot of joins.
The table structure I'm working on, at its most complicated, looks like this:
tables = Co, Cn, O, A, N;
scheme (simplified):
N (n) -> (1) every other table
A (n) -> (1) every other table except N
O (n) -> (1) Co, Cn
Cn (n) -> (1) Co
Co (1) -> (1) Cn
The most laborious part of the query goes through the structure like this (I'm trying to get information from table N, but also from the other ones):
N join A join O join ((Co join Cn) union (Cn join Co))
I need the contents of N, but I also need data from Co and Cn, and to get them I must go through A and O. This may not seem like a huge query, but N can also relate directly to every one of those other tables and I need to get them ALL; let's just say I can't see the query's content when I echo it in the application. I join each case separately (N->A, N->O, N->Cn, etc.) and then combine them all with a UNION.
At the moment, with almost 200,000 rows in N and a couple hundred to almost a thousand in each of the other tables, the database server and the website just hang when I try to run the query.
One solution I thought of was to ditch the foreign keys and add a field for links that goes, for example, from N through all the others ("A_IdA,O_IdO,Co_IdCo,Cn_IdCn"), and process it in PHP; but I'd rather try something else before such a drastic change in table design.

I'm having a hard time grasping your exact table structure and query.
However, I feel like you're trying to combine too much into one query, which makes the joins extremely stressful for the DB server. You'd most likely be better off performing multiple simpler queries than trying to do it all in one huge query.
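A minimal sketch of the "multiple simpler queries" idea, written in Python with the standard-library sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere). The tables n, a and o and their columns are hypothetical, cut down from the question's schema: one plain query fetches the N rows, then one query per referenced table fetches only the rows N points at, and the results are stitched together in application code instead of in one giant UNION.

```python
import sqlite3

# Hypothetical, heavily simplified version of the question's schema:
# every N row may point at one A row and one O row (the real schema has more).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE o (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE n (id INTEGER PRIMARY KEY, a_id INTEGER, o_id INTEGER, body TEXT);
    INSERT INTO a VALUES (1, 'a-one'), (2, 'a-two');
    INSERT INTO o VALUES (1, 'o-one');
    INSERT INTO n VALUES (10, 1, 1, 'first'), (11, 2, NULL, 'second'), (12, 1, NULL, 'third');
""")

# Query 1: the N rows themselves, with no joins at all.
n_rows = conn.execute("SELECT id, a_id, o_id, body FROM n ORDER BY id").fetchall()

def fetch_lookup(table, ids):
    """One query per referenced table: fetch only the rows N actually points at."""
    ids = sorted({i for i in ids if i is not None})
    if not ids:
        return {}
    placeholders = ",".join("?" * len(ids))
    # The table name comes from the fixed calls below, never from user input.
    sql = f"SELECT id, label FROM {table} WHERE id IN ({placeholders})"
    return dict(conn.execute(sql, ids).fetchall())

a_by_id = fetch_lookup("a", (row[1] for row in n_rows))
o_by_id = fetch_lookup("o", (row[2] for row in n_rows))

# Stitch the pieces together in application code instead of in one giant UNION.
combined = [
    {"id": nid, "body": body, "a": a_by_id.get(a_id), "o": o_by_id.get(o_id)}
    for nid, a_id, o_id, body in n_rows
]
```

With only a few hundred to a thousand rows in each of the other tables, loading those tables whole into dictionaries may be simpler still; very long IN lists can also run into driver or server limits.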

Related

MySQL 1:N Data Mapping

Something really bugs me and I'm not sure what the "correct" approach is.
If I make a select to get contacts from my database, there are a decent number of joins involved.
It will look something like this (around 60-70 columns):
SELECT *
FROM contacts
LEFT JOIN company
LEFT JOIN person
LEFT JOIN address
LEFT JOIN person_communication
LEFT JOIN company_communication
LEFT JOIN categories
LEFT JOIN notes
company and person are 1:1 cardinality, so it's straightforward.
But "address", "communication" and "categories" are 1:n cardinality.
So depending on the number of rows in the 1:n tables, I get a lot of "double" rows (I don't know what the real term for that is; I know the rows aren't really duplicates, since the address or phone number etc. differs). For myself as a contact, a fairly filled-in contact, I get 85 rows back.
How do you guys work with that?
In my PHP application I always wrote a "Data-Mapper" where the array key was the contact.ID (the primary key); it checked whether that key already existed and then pushed the additional data into it. PHP is also not strictly typed, which makes that easy.
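The PHP "Data-Mapper" approach described above can be sketched in Python (chosen only so the example is compact and runnable; the thread itself uses PHP and Go). The flat rows and their column layout are made up for illustration: each joined row is folded into one record per contact.ID, and values repeated by the join are added only once.

```python
# Flat rows as a LEFT JOIN of contacts, address and communication might return them
# (hypothetical column layout: contact_id, name, address, phone).
rows = [
    (1, "Alice", "Main St 1", "555-0100"),
    (1, "Alice", "Main St 1", "555-0101"),
    (1, "Alice", "Side St 2", "555-0100"),
    (1, "Alice", "Side St 2", "555-0101"),
    (2, "Bob", None, "555-0200"),
]

def map_contacts(rows):
    """Collapse joined rows into one record per contact, keyed by primary key."""
    contacts = {}  # dicts keep insertion order, so contacts come out in row order
    for contact_id, name, address, phone in rows:
        contact = contacts.setdefault(
            contact_id, {"id": contact_id, "name": name, "addresses": [], "phones": []}
        )
        # The join repeats each value many times, so only add values not seen yet.
        if address is not None and address not in contact["addresses"]:
            contact["addresses"].append(address)
        if phone is not None and phone not in contact["phones"]:
            contact["phones"].append(phone)
    return list(contacts.values())

contacts = map_contacts(rows)
```

Deduplicating by value also drops genuine duplicates (two identical phone entries, say); selecting each child row's own primary key and deduplicating on that avoids the problem.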
Now I'm learning Go (golang) and I thought: screw that LOOOONG select and data mapping, just write selects for all the 1:n relations... yeah, no: not enough connections to load a table full of contacts. I know that I can increase the number of connections, but the error seems to imply that this would be the wrong way.
I use the following driver: https://github.com/go-sql-driver/mysql
I also tried GROUP_CONCAT, but then I ran into trouble parsing it back.
Do I have to do my mapping approach again, or is there a nicer solution out there? I found it quite dirty at points.
The solution is simple: you need to execute more than one query!
The cause of all the "duplicate" rows is that you're generating a result called a Cartesian product. You are trying to join to several tables with 1:n relationships, but each of these has no relationship to the other, so there's no join condition restricting them with respect to each other.
Therefore you get a result with every combination of all the 1:n relationships. If you have 3 matches in address, 5 matches in communication, and 5 matches in categories, you'd get 3x5x5 = 75 rows.
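The arithmetic can be checked with a small script using sqlite3 as a stand-in for MySQL; the table and column names are simplified, hypothetical versions of the question's. One contact gets 3 addresses, 5 communication entries and 5 categories, and each LEFT JOIN is tied only to contacts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY);
    CREATE TABLE address (contact_id INTEGER, line TEXT);
    CREATE TABLE communication (contact_id INTEGER, value TEXT);
    CREATE TABLE categories (contact_id INTEGER, name TEXT);
    INSERT INTO contacts VALUES (1);
""")
conn.executemany("INSERT INTO address VALUES (1, ?)", [(f"addr{i}",) for i in range(3)])
conn.executemany("INSERT INTO communication VALUES (1, ?)", [(f"comm{i}",) for i in range(5)])
conn.executemany("INSERT INTO categories VALUES (1, ?)", [(f"cat{i}",) for i in range(5)])

# Each 1:n table is joined only to contacts, never to the others, so every
# address is paired with every communication entry and every category.
joined = conn.execute("""
    SELECT COUNT(*) FROM contacts c
    LEFT JOIN address a       ON a.contact_id = c.id
    LEFT JOIN communication m ON m.contact_id = c.id
    LEFT JOIN categories g    ON g.contact_id = c.id
    WHERE c.id = 1
""").fetchone()[0]
print(joined)  # 75 = 3 x 5 x 5
```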
So you need to run a separate SQL query for each of your 1:n relationships. Don't be afraid—MySQL can handle a few queries. You need them.
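A minimal sketch of one query per 1:n relationship, again with sqlite3 standing in for MySQL and hypothetical table and column names. The extra queries are per relationship, not per contact: each one fetches the children of every loaded contact at once with an IN list, and the rows are attached to their parent by contact_id in code, so the number of queries stays fixed however many contacts are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (contact_id INTEGER, line TEXT);
    CREATE TABLE categories (contact_id INTEGER, name TEXT);
    INSERT INTO contacts VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO address VALUES (1, 'Main St 1'), (1, 'Side St 2'), (2, 'High St 3');
    INSERT INTO categories VALUES (1, 'customer'), (1, 'supplier');
""")

# Query 1: the parent rows (1:1 tables such as company/person could be joined here).
contacts = {
    cid: {"id": cid, "name": name, "addresses": [], "categories": []}
    for cid, name in conn.execute("SELECT id, name FROM contacts ORDER BY id")
}

ids = list(contacts)
placeholders = ",".join("?" * len(ids))

# One more query per 1:n table, covering all contacts at once (3 queries in total).
# Table and column names come from this fixed list; the ids are bound parameters.
for table, column, key in [("address", "line", "addresses"),
                           ("categories", "name", "categories")]:
    sql = f"SELECT contact_id, {column} FROM {table} WHERE contact_id IN ({placeholders})"
    for cid, value in conn.execute(sql, ids):
        contacts[cid][key].append(value)
```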

Inner joining within an inner join

I tried to find out whether this had already been answered but couldn't find anything. I'm trying to join together four tables, but one of the joins is not on the table that the other two joins are from. I've successfully joined three of the tables; I'm just not sure of the syntax for the third join.
SELECT * FROM
nc_booking
INNER JOIN
nc_customer ON nc_booking.c_id = nc_customer.id
INNER JOIN
nc_propertys ON nc_booking.p_id = nc_propertys.id
How would I now join nc_propertys to another table, nc_owner?
Building on the code from @GordonLinoff, to add your extra table you need to do something like:
SELECT *
FROM nc_booking b INNER JOIN
nc_customer c
ON b.c_id = c.id INNER JOIN
nc_propertys p
ON b.p_id = p.id INNER JOIN
nc_owner o
ON o.id = p.o_id;
You haven't shared the column names we need to use to connect the extra table, so the last line might not be right. A few things to note ...
(1) The SELECT * is not ideal. If you only need particular columns here, list them. I've stuck with your * because I don't know what you want from the tables. Where a column with the same name exists in more than one table, you'll have to "fully qualify" the field name as follows ...
SELECT c.id as customer_id,
-- more fields can go here, with a comma after each
...
Several of the joined tables have an id field, so the c. is necessary to tell the database which one we want. Notice that as with the tables, we can also give the fields an 'alias', which in this case is 'customer_id'. This can be very helpful for presentation, and is often essential when using the output from a query as part of a larger piece of code.
(2) Since all the joins are INNER JOINs, it makes little (if any) difference in what order the tables are listed, as long as the connections between them remain the same.
(3) For MySQL, it technically doesn't matter whether you have lots of newlines or none at all: SQL is designed to ignore "white space" (except within data). What matters is laying out your code so it is easy to read, especially for other people who might later need to figure out what you were doing (although in my experience, also for you, when you return to a piece of code several years later and can't remember it at all).
(4) In each ON clause it doesn't matter whether you write a = b or b = a. You aren't setting one to equal the other; you are requiring that they already be equal, so it amounts to the same thing either way.
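Points (2) and (4) can be checked with a throwaway sqlite3 database standing in for MySQL. As in the answer above, nc_propertys.o_id is an assumed column name, and the name/title columns and all the data are invented. The two queries list the joins in different orders and swap the ON operands, yet return the same rows; the column aliases from point (1) keep the several id columns apart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nc_customer  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE nc_owner     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE nc_propertys (id INTEGER PRIMARY KEY, o_id INTEGER, title TEXT);
    CREATE TABLE nc_booking   (id INTEGER PRIMARY KEY, c_id INTEGER, p_id INTEGER);
    INSERT INTO nc_customer  VALUES (1, 'Carol'), (2, 'Dan');
    INSERT INTO nc_owner     VALUES (7, 'Olga');
    INSERT INTO nc_propertys VALUES (3, 7, 'Cottage'), (4, NULL, 'Flat');
    INSERT INTO nc_booking   VALUES (100, 1, 3), (101, 2, 4);
""")

# The answer's join order, with an explicit column list and aliases.
q1 = """
    SELECT b.id AS booking_id, c.id AS customer_id, o.name AS owner_name
    FROM nc_booking b
    INNER JOIN nc_customer c  ON b.c_id = c.id
    INNER JOIN nc_propertys p ON b.p_id = p.id
    INNER JOIN nc_owner o     ON o.id = p.o_id
    ORDER BY booking_id
"""
# Same connections, tables listed in another order, ON operands swapped.
q2 = """
    SELECT b.id AS booking_id, c.id AS customer_id, o.name AS owner_name
    FROM nc_owner o
    INNER JOIN nc_propertys p ON p.o_id = o.id
    INNER JOIN nc_booking b   ON p.id = b.p_id
    INNER JOIN nc_customer c  ON c.id = b.c_id
    ORDER BY booking_id
"""
rows1 = conn.execute(q1).fetchall()
rows2 = conn.execute(q2).fetchall()
print(rows1)  # [(100, 1, 'Olga')]
```

Booking 101 is missing from both results because its property has no owner: with INNER JOINs, a row without a match in any joined table drops out.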
My advice to a SQL beginner would be when you are writing a SELECT query (which only reads and doesn't change any data): if you aren't too sure then write some code and set it to run. If it's completely invalid, your software should give you some idea of what is wrong and no harm will be done. If it's valid but wrong, the very worst that can happen is that you put some unnecessary load on your database server ... if it takes a long time to run and you weren't expecting it to, then you should be able to cancel the query. As long as you have some idea of what you expect the results to look like, and roughly how many rows to expect, you won't go too far wrong. If you get completely stuck come back here to Stack Overflow.
Things get a bit different if you are writing code which DELETEs or UPDATEs data. Then you want to know exactly what you're up to. Normally you can write a closely related SELECT statement first to make sure you're going to be making all and only the changes you were expecting. It's also best to make sure you've got a way to undo your changes should the worst happen. Backups are obviously good, and you can often create your own backup copy of a table before you make any alterations. You don't necessarily need to rely on backup software or your in-house IT guys for that ... in my experience they don't like databases anyway.
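A minimal sketch of that routine, using sqlite3 as a stand-in for MySQL and a hypothetical condition (delete every booking of customer 2): preview the rows with a SELECT that uses the same WHERE clause, make your own copy of the table, then run the DELETE inside a transaction that is only committed if it touched the expected number of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nc_booking (id INTEGER PRIMARY KEY, c_id INTEGER, p_id INTEGER);
    INSERT INTO nc_booking VALUES (100, 1, 3), (101, 2, 4), (102, 2, 3);
""")

condition = "c_id = ?"  # hypothetical change: remove every booking by customer 2
params = (2,)

# 1. Preview: the closely related SELECT shows exactly which rows would change.
preview = conn.execute(
    f"SELECT id FROM nc_booking WHERE {condition} ORDER BY id", params
).fetchall()

# 2. Your own backup copy of the table, taken before any alteration.
conn.execute("CREATE TABLE nc_booking_backup AS SELECT * FROM nc_booking")

# 3. DELETE inside a transaction: the with-block commits on success and rolls
#    back if an exception escapes, e.g. when the row count doesn't match the preview.
with conn:
    deleted = conn.execute(f"DELETE FROM nc_booking WHERE {condition}", params).rowcount
    if deleted != len(preview):
        raise RuntimeError("DELETE affected an unexpected number of rows")
```

In MySQL, CREATE TABLE ... SELECT also works for a quick copy (it does not create the indexes), but it causes an implicit commit, so take the copy before starting the transaction. Rolling back only works on a transactional engine such as InnoDB.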
Also, there are some great books out there. For a beginner, I'd recommend anything by Ben Forta, including his SQL in 10 Minutes (that's a per-chapter figure) or his MySQL Crash Course (the latter is a little old, though, so it won't cover the more recently added features of MySQL).
Your syntax looks okay. I am providing an answer because you really should learn to use table aliases. They make a query easier to write and to read:
SELECT *
FROM nc_booking b INNER JOIN
nc_customer c
ON b.c_id = c.id INNER JOIN
nc_propertys p
ON b.p_id = p.id;

Right way to phrase MySQL query across many (possibly empty) tables

I'm trying to do what I think is a set of simple set operations on a database table: several intersections and one union. But I don't seem to be able to express that in a simple way.
I have a MySQL table called Moment, which has many millions of rows. (It happens to be a time-series table but that doesn't impact on my problem here; however, these data have a column 'source' and a column 'time', both indexed.) Queries to pull data out of this table are created dynamically (coming in from an API), and ultimately boil down to a small pile of temporary tables indicating which 'source's we care about, and maybe the 'time' ranges we care about.
Let's say we're looking for
(source in Temp1) AND (
((source in Temp2) AND (time > '2017-01-01')) OR
((source in Temp3) AND (time > '2016-11-15'))
)
Just for excitement, let's say Temp2 is empty --- that part of the API request was valid but happened to include 'no actual sources'.
If I then do
SELECT m.* from Moment as m,Temp1,Temp2,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... I get a heaping mound of nothing, because the empty Temp2 gives an empty Cartesian product before we get to the WHERE clause.
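The empty-product effect is easy to reproduce with sqlite3 standing in for MySQL (the table contents are invented). For contrast, the block also runs the question's own "(source in Temp1) AND (...)" notation literally as IN subqueries. That rewrite is a swapped-in technique, not something proposed in this thread, and whether MySQL plans it faster on real data would need checking with EXPLAIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Moment (source INTEGER, time TEXT);
    CREATE TABLE Temp1 (source INTEGER);
    CREATE TABLE Temp2 (source INTEGER);  -- deliberately left empty
    CREATE TABLE Temp3 (source INTEGER);
    INSERT INTO Moment VALUES (1, '2016-12-01'), (1, '2016-10-01'), (2, '2017-02-01');
    INSERT INTO Temp1 VALUES (1), (2);
    INSERT INTO Temp3 VALUES (1);
""")

where = """
    (m.source = Temp1.source) AND (
      ((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
      ((m.source = Temp3.source) AND (m.time > '2016-11-15')))
"""
# Comma join: FROM builds a Cartesian product, and any product with an empty
# table is empty, so the WHERE clause never sees a single row.
comma = conn.execute(
    f"SELECT m.* FROM Moment AS m, Temp1, Temp2, Temp3 WHERE {where}"
).fetchall()

# IN subqueries: each membership test is evaluated per Moment row, so an empty
# Temp2 only makes that branch false instead of wiping out the whole result.
in_form = conn.execute("""
    SELECT m.* FROM Moment AS m
    WHERE m.source IN (SELECT source FROM Temp1) AND (
      (m.source IN (SELECT source FROM Temp2) AND m.time > '2017-01-01') OR
      (m.source IN (SELECT source FROM Temp3) AND m.time > '2016-11-15'))
""").fetchall()
print(comma, in_form)  # [] [(1, '2016-12-01')]
```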
Okay, I can do
SELECT m.* from Moment as m
LEFT JOIN Temp1 on m.source=Temp1.source
LEFT JOIN Temp2 on m.source=Temp2.source
LEFT JOIN Temp3 on m.source=Temp3.source
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... but this takes >70ms even on my relatively small development database.
If I manually eliminate the empty table,
SELECT m.* from Moment as m,Temp1,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... it finishes in 10ms. That's the kind of time I'd expect.
I've also tried putting a single unmatchable row in the empty table and doing SELECT DISTINCT, and it splits the difference at ~40ms. Seems an odd solution though.
This really feels like I'm just conceptualizing the query wrong, that I'm asking the database to do more work than it needs to. What is the Right Way to ask the database this question?
Thanks!
--UPDATE--
I did some actual benchmarks on my actual database, and came up with some really unexpected results.
For the scenario above, all tables indexed on the columns being compared, with an empty table,
doing it with left joins took 3.5 minutes (!!!)
doing it without joins (just 'FROM...WHERE') and adding a null row to the empty table, took 3.5 seconds
even more striking, when there wasn't an empty table, but rather ~1000 rows in each of the temporary tables,
doing the whole thing in one query took 28 minutes (!!!!!), but,
doing each of the three AND clauses separately and then doing the final combination in the code took less than a second.
I still feel I'm expressing the query in some foolish way, since again, all I'm trying to do is one set union (OR) and a few set intersections. It really seems like the DB is making this gigantic Cartesian product when it seriously doesn't need to. All in all, as pointed out in the answer below, keeping some of the intelligence up in the code seems to be the better approach here.
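The approach that worked in the update (each clause as its own query, combined in code) can be sketched as follows, using sqlite3 as a stand-in for MySQL. The id column on Moment is an assumption (the question doesn't name a key), and the data are invented; each query returns a set of ids, and the AND/OR logic becomes set intersection and union in Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Moment (id INTEGER PRIMARY KEY, source INTEGER, time TEXT);
    CREATE TABLE Temp1 (source INTEGER);
    CREATE TABLE Temp2 (source INTEGER);
    CREATE TABLE Temp3 (source INTEGER);
    INSERT INTO Moment VALUES
        (1, 1, '2016-12-01'), (2, 1, '2017-03-01'), (3, 2, '2017-02-01'), (4, 3, '2016-12-01');
    INSERT INTO Temp1 VALUES (1), (3);
    INSERT INTO Temp2 VALUES (1), (2);
    INSERT INTO Temp3 VALUES (3);
""")

def ids(sql):
    """Run one simple query and return the matching Moment ids as a set."""
    return {row[0] for row in conn.execute(sql)}

# Each clause is a small query with a single join, so no large intermediate product.
in_temp1 = ids("SELECT m.id FROM Moment m JOIN Temp1 t ON t.source = m.source")
branch2 = ids("SELECT m.id FROM Moment m JOIN Temp2 t ON t.source = m.source "
              "WHERE m.time > '2017-01-01'")
branch3 = ids("SELECT m.id FROM Moment m JOIN Temp3 t ON t.source = m.source "
              "WHERE m.time > '2016-11-15'")

# (source in Temp1) AND (branch2 OR branch3), expressed as set operations.
result = in_temp1 & (branch2 | branch3)
print(sorted(result))  # [2, 4]
```

This trades database work for application memory: on a table with millions of rows, the Temp1 query in particular can return a large id set.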
There are various ways to tackle the problem. Needless to say, it depends on
how many queries are sent to the database,
the amount of data you are processing in a time interval,
how the database backend is configured to manage it.
For your use case, a little more information would be helpful. One option would be to optimize the query with CASE/COUNT(*) or CASE/LIMIT combinations that sort out empty tables. However, such if-like queries cost extra time.
You could split the SQL code to reduce the scaling of the problem from 1*N^x to y*N^z, where z should be smaller than x.
You said that an API is involved; maybe you are able to handle the temporary "no data" tables differently, or not store them at all?
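One way to handle the "no data" tables differently is to check them before the SQL is built and simply leave out the branches that cannot match, which is what eliminating the empty table by hand in the question did. A minimal sketch with sqlite3 standing in for MySQL; the branch list and the data are hypothetical, and the membership tests are written as IN subqueries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Moment (source INTEGER, time TEXT);
    CREATE TABLE Temp1 (source INTEGER);
    CREATE TABLE Temp2 (source INTEGER);  -- this API request yielded no sources
    CREATE TABLE Temp3 (source INTEGER);
    INSERT INTO Moment VALUES (1, '2016-12-01'), (2, '2017-02-01');
    INSERT INTO Temp1 VALUES (1), (2);
    INSERT INTO Temp3 VALUES (1);
""")

def is_empty(table):
    """Cheap emptiness check; table names come from a fixed list, never from the API."""
    return conn.execute(f"SELECT 1 FROM {table} LIMIT 1").fetchone() is None

# Hypothetical description of the OR branches: (temp table, lower time bound).
branches = [("Temp2", "2017-01-01"), ("Temp3", "2016-11-15")]

# Drop branches whose temp table is empty before any SQL is built.
live = [(table, since) for table, since in branches if not is_empty(table)]

sql = None
rows = []  # stays empty if Temp1 or every OR branch is empty: nothing can match
if live and not is_empty("Temp1"):
    ors = " OR ".join(
        f"(m.source IN (SELECT source FROM {table}) AND m.time > ?)" for table, _ in live
    )
    sql = ("SELECT m.* FROM Moment AS m "
           f"WHERE m.source IN (SELECT source FROM Temp1) AND ({ors})")
    rows = conn.execute(sql, [since for _, since in live]).fetchall()
```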
Another option would be to enable query caching (note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0):
https://dev.mysql.com/doc/refman/5.5/en/query-cache-configuration.html

MySQL CONCAT '*' symbol toasts the database

I have a table used for lookups which stores the human-readable value in one column and the same text, stripped of special characters and spaces, in another. E.g., the value "Children's Shows" would appear in the lookup column as "childrens-shows".
Unfortunately the corresponding main table isn't quite that simple: for historical reasons (not of my making, and now difficult to undo), the lookup value is actually stored with surrounding asterisks, e.g. '*childrens-shows*'.
So, while trying to join the lookup table sans asterisks with the main table that has asterisks, I figured CONCAT would help me add them on the fly, e.g.:
SELECT *
FROM main_table m
INNER JOIN lookup_table l
ON l.value = CONCAT('*',m.value,'*')
... and then the table was toast. Not sure if I created an infinite loop or really screwed the data, but it required an ISP backup to get the table responding again. I suspect it's because the '*' symbol is probably reserved, like a wildcard, and I've asked the database to do the equivalent of licking its own elbow. Either way, I'm hesitant to 'experiment' to find the answer given the spectacular way it managed to kill the database.
Thanks in advance to anyone who can (a) tell me what the above actually did to the database, and (b) how I should actually join the tables?
When an indexed column is wrapped in a function such as CONCAT, MySQL can't use the index on that column. Use EXPLAIN to check this; a recent problem I had was that on a large table the indexed column was there, but the key was not used. This should not bork the whole table, however, just make it slow. Possibly it ran out of memory, started to swap and then crashed halfway, but you'd need to check the logs to find out.
However, the root cause is clearly bad table design and that's where the solution lies. Any answer you get that allows you to work around this can only be temporary at best.
Best solution is to move this data into a separate table. 'Childrens shows' sounds like a category and therefore repeated data in many rows. This should really be an id for a 'categories' table, which would prevent the DB from having to run CONCAT on every single row in the table, as you could do this:
SELECT *
FROM main_table m
INNER JOIN lookup_table l
ON l.value = m.value
/* and optionally */
INNER JOIN categories cat
ON l.value = cat.id
WHERE cat.name = 'whatever'
I know this is not something you may be able to do given the information you supplied in your question, but really the reason for not being able to make such a change to a badly normalised DB is more important than the code here. Without either the resources or political backing to do things the right way, you will end up with even more headaches like this, which will end up costing more in the long term. Time for a word with the boss perhaps :)
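As written, the question's ON clause appears to put the asterisks on the wrong side: per the description, main_table already stores '*childrens-shows*', so CONCAT('*', m.value, '*') yields '**childrens-shows**', which can never equal the bare lookup value. A minimal sketch with sqlite3 standing in for MySQL (sqlite3 concatenates with ||; in MySQL, || is a logical OR unless PIPES_AS_CONCAT is enabled, so use CONCAT there). The label column and the data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup_table (label TEXT, value TEXT);
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, value TEXT);
    CREATE INDEX main_value_idx ON main_table (value);
    INSERT INTO lookup_table VALUES ('Children''s Shows', 'childrens-shows');
    INSERT INTO main_table VALUES (1, '*childrens-shows*'), (2, '*news*');
""")

# As in the question: asterisks added to m.value, which already has them,
# so '**childrens-shows**' is compared with 'childrens-shows' and nothing matches.
as_asked = conn.execute("""
    SELECT m.id FROM main_table m
    INNER JOIN lookup_table l ON l.value = '*' || m.value || '*'
""").fetchall()

# Asterisks added to the lookup side instead (MySQL: CONCAT('*', l.value, '*')),
# leaving main_table.value as a bare column that its index can match against.
fixed = conn.execute("""
    SELECT m.id, l.label FROM lookup_table l
    INNER JOIN main_table m ON m.value = '*' || l.value || '*'
""").fetchall()
print(as_asked, fixed)  # [] [(1, "Children's Shows")]
```

This is only a stopgap; it does not change the answer's point about the table design.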

MySQL How to efficiently compare multiple fields between tables?

My expertise is not in MySQL, so I wrote this query and it is starting to run increasingly slowly, taking 5 minutes or so with 100k rows in EquipmentData and 30k or so in EquipmentDataStaging (which to me is very little data):
CREATE TEMPORARY TABLE dataCompareTemp
SELECT eds.eds_id FROM equipmentdatastaging eds
INNER JOIN equipment e ON e.e_id_string = eds.eds_e_id_string
INNER JOIN equipmentdata ed ON e.e_id = ed.ed_e_id
AND eds.eds_ed_log_time=ed.ed_log_time
AND eds.eds_ed_unit_type=ed.ed_unit_type
AND eds.eds_ed_value = ed.ed_value
I am using this query to compare data rows pulled from a client's device to the current data sitting in their database. From there I take the temp table and use the IDs in it to make conditional decisions. I have e_id_string indexed and e_id indexed; nothing else is. I know it looks silly to compare all this information, but the client's system is spitting out redundant data and I am using this query to find it. Any kind of help would be greatly appreciated, whether it's a different approach in SQL or in MySQL management. I feel like MSSQL handles requests like this much better, but that is probably because I have something set up incorrectly.
TIPS
index all the columns used in ON or WHERE conditions
here you need to index eds_ed_log_time, eds_e_id_string, eds_ed_unit_type, eds_ed_value, ed_e_id, ed_log_time, ed_unit_type, ed_value
change the syntax to SELECT STRAIGHT_JOIN ... to force MySQL to join the tables in the order they are listed
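A sketch of the indexing tip, with sqlite3 standing in for MySQL and invented column types and data. It is one reading of the tip: instead of separate single-column indexes, a single composite index on equipmentdata covers all four compared columns (an assumption, not something the answer specifies), and the question's query runs unchanged apart from SQLite's CREATE TEMP TABLE ... AS syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE equipment (e_id INTEGER PRIMARY KEY, e_id_string TEXT);
    CREATE TABLE equipmentdata (ed_id INTEGER PRIMARY KEY, ed_e_id INTEGER,
        ed_log_time TEXT, ed_unit_type TEXT, ed_value REAL);
    CREATE TABLE equipmentdatastaging (eds_id INTEGER PRIMARY KEY, eds_e_id_string TEXT,
        eds_ed_log_time TEXT, eds_ed_unit_type TEXT, eds_ed_value REAL);

    -- Composite index over every column the join compares, so each staging row
    -- can be matched with an index lookup rather than a table scan.
    CREATE INDEX equipment_idstring_idx ON equipment (e_id_string);
    CREATE INDEX ed_match_idx ON equipmentdata (ed_e_id, ed_log_time, ed_unit_type, ed_value);

    INSERT INTO equipment VALUES (1, 'PUMP-1');
    INSERT INTO equipmentdata VALUES (10, 1, '2020-01-01 00:00', 'psi', 12.5);
    INSERT INTO equipmentdatastaging VALUES
        (500, 'PUMP-1', '2020-01-01 00:00', 'psi', 12.5),  -- duplicate of row 10
        (501, 'PUMP-1', '2020-01-01 00:05', 'psi', 12.7);  -- new reading
""")

# The question's query (MySQL: CREATE TEMPORARY TABLE dataCompareTemp SELECT ...).
conn.execute("""
    CREATE TEMP TABLE dataCompareTemp AS
    SELECT eds.eds_id FROM equipmentdatastaging eds
    INNER JOIN equipment e ON e.e_id_string = eds.eds_e_id_string
    INNER JOIN equipmentdata ed ON e.e_id = ed.ed_e_id
        AND eds.eds_ed_log_time = ed.ed_log_time
        AND eds.eds_ed_unit_type = ed.ed_unit_type
        AND eds.eds_ed_value = ed.ed_value
""")
duplicates = [row[0] for row in conn.execute("SELECT eds_id FROM dataCompareTemp")]
print(duplicates)  # [500]
```

Running EXPLAIN on the real query before and after adding the index is the way to confirm that MySQL actually uses it.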