Is there something more efficient than joining tables in MySQL? - mysql

I have a table with an entity-attribute-value structure. For example, the entities could be different countries, with the attributes "located in", "has border with" and "capital".
I want to find all countries that are "located in Asia" and "has border with Russia". The straightforward way to do that is to join the table with itself, using the entity column as the join column, and then to filter with WHERE.
However, if I have 20 rows where Russia is in the entity column, then the joined table will have 20*20=400 rows with Russia as the entity. The same holds for every country, so the joined table is going to be huge.
Wouldn't it be more efficient to use the original table to extract all countries that are located in Asia, then to extract all countries that have a border with Russia, and then to keep the elements that are in both sets of countries?

You shouldn't end up with a huge number of records, so this should work:
SELECT a.entity,
a.located_in,
a.border
FROM my_table a
WHERE a.border in (SELECT b.entity FROM my_table b WHERE b.entity = 'RUSSIA' )
AND a.located_in = 'ASIA'

You are confusing a join with a Cartesian product. A join only keeps the pairs of rows that satisfy the join condition and the WHERE clause, so what changes is which rows are taken, not how many combinations exist on paper.
So if you have 20 Russian rows and each side of the self-join is filtered on one attribute and value, the result contains only the few matching pairs, not 400 Russian entries.
The operation you suggest is exactly what a join does. Just make sure you have the appropriate indexes and let MySQL do the rest.
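To make the comparison concrete, here is a minimal sketch that runs both approaches against a tiny EAV table. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name eav, its column names, the index and the sample rows are all assumptions made for illustration, not taken from the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical EAV table; names are illustrative only.
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        ("China", "located in", "Asia"),
        ("China", "has border with", "Russia"),
        ("China", "capital", "Beijing"),
        ("Japan", "located in", "Asia"),
        ("Japan", "capital", "Tokyo"),
        ("Finland", "located in", "Europe"),
        ("Finland", "has border with", "Russia"),
    ],
)
# An index that lets each side of the join be found by attribute and value.
conn.execute("CREATE INDEX eav_attr_val ON eav (attribute, value, entity)")

# Approach 1: self-join on entity, each side filtered on one attribute.
self_join = conn.execute(
    """
    SELECT DISTINCT a.entity
    FROM eav a
    JOIN eav b ON b.entity = a.entity
    WHERE a.attribute = 'located in' AND a.value = 'Asia'
      AND b.attribute = 'has border with' AND b.value = 'Russia'
    """
).fetchall()

# Approach 2: two separate lookups, intersected in application code.
in_asia = {
    row[0]
    for row in conn.execute(
        "SELECT entity FROM eav WHERE attribute = 'located in' AND value = 'Asia'"
    )
}
borders_russia = {
    row[0]
    for row in conn.execute(
        "SELECT entity FROM eav "
        "WHERE attribute = 'has border with' AND value = 'Russia'"
    )
}
both = in_asia & borders_russia

print(self_join)  # [('China',)]
print(both)       # {'China'}
```

Both approaches return the same country here. Which one is faster on real data depends on the indexes and the optimizer, so it is worth checking with EXPLAIN on the actual table.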

Related

MySQL GROUP BY slow across three tables with spatial search

I am adding some further First World War records to my astreetnearyou.org site
I have three tables:
people - contains full details of over 1 million people who died
addresses - contains about 700,000 different addresses for about 600,000 of these people
cemeteries - a new table which has records of about 15,000 cemeteries;
In terms of relationships, every address has the ID of the person it relates to; every person in the people table has the name of the cemetery they are buried in (as an aside, these can be long varchar values, would it be better to give them unique integer IDs for the join? Answer: I tried it and it shaved about 0.5 secs off the query time)
I want to run a query that essentially says "give me a unique list of all the people who lived or are buried in this map area (bounding box)"
An example query is:
SELECT people.id, people.rank, people.forename, `people`.surname, people.regiment, people.date_of_death, people.cemeteryname, cemeteries.country, cemeteries.link
FROM people
JOIN cemeteries ON people.cemeteryId=cemeteries.id
LEFT JOIN addresses ON addresses.personId=people.id
WHERE MBRContains( GeomFromText( 'LINESTRING(-0.35 51.50,-0.32 51.51)' ), cemeteries.point) OR MBRContains( GeomFromText( 'LINESTRING(-0.35 51.50,-0.32 51.51)' ), addresses.point)
GROUP BY people.id
This returns 276 results but takes about 6 seconds. Without the GROUP BY it's 296 results including the duplicate IDs but takes well under a second. If I remove the LEFT JOIN table and associated WHERE clause (so I only get matches by cemetery, not address) it is also very quick.
I have spatial indexes on both point fields and all the fields that are in the JOIN conditions, plus based on another post on here I've added indexes across the id and point fields in the addresses table, and the cemetery and point fields in the cemeteries table.
I'm no sql expert so any advice on making this more efficient and thereby quicker would be much appreciated. Also I guess some more table info would probably be of use, but can you tell me what would be helpful and how to produce it?!
Try adding a composite index:
ALTER TABLE people ADD INDEX IdCemIdIdx (id, cemeteryId);
If possible, apply the change with pt-online-schema-change:
https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html

Set child grandparent relationship

I've built a web app (PHP/MySQL) for people to predict soccer games.
For each entry in the leagues table, there are many entries in the matches table.
For each entry in the matches table, there are many entries in the predictions table.
Should I explicitly set a relationship from the predictions table to the leagues table? In other words, should I add a league_id column to the predictions table?
PRO
Easier queries, and fewer tables to read in some cases. Example query to look up someone's predictions for a certain league with the relationship:
SELECT * FROM predictions p
WHERE p.league_id=:league_id AND p.user_id=:user_id
Without the relationship:
SELECT * FROM predictions p
INNER JOIN matches m ON m.match_id=p.match_id AND m.league_id=:league_id
WHERE p.user_id=:user_id
CON
It's data that's already there, so it's duplicate data (makes the database bigger).
Correct normalization of the database means that duplicates are avoided, and above all that redundant data is not used just to avoid relations. You should then without doubt prefer the second query you proposed and the related schema:
SELECT * FROM predictions p
INNER JOIN matches m ON m.match_id=p.match_id AND m.league_id=:league_id
WHERE p.user_id=:user_id
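As a rough illustration of the normalized variant, the sketch below builds the three tables without a league_id column on predictions and runs the join query with named parameters. It uses Python's sqlite3 as a stand-in for MySQL; the columns prediction_id, predicted_score and name, as well as the sample rows, are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE leagues (league_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE matches (
        match_id INTEGER PRIMARY KEY,
        league_id INTEGER REFERENCES leagues(league_id)
    );
    -- No league_id here: the league is reached through matches.
    CREATE TABLE predictions (
        prediction_id INTEGER PRIMARY KEY,
        match_id INTEGER REFERENCES matches(match_id),
        user_id INTEGER,
        predicted_score TEXT
    );
    INSERT INTO leagues VALUES (1, 'League A'), (2, 'League B');
    INSERT INTO matches VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO predictions VALUES
        (100, 10, 7, '2-1'),
        (101, 11, 7, '0-0'),
        (102, 20, 7, '1-3'),
        (103, 10, 8, '1-1');
    """
)

# Same shape as the join query above, with :league_id and :user_id bound by name.
rows = conn.execute(
    """
    SELECT p.prediction_id, p.match_id, p.predicted_score
    FROM predictions p
    INNER JOIN matches m
        ON m.match_id = p.match_id AND m.league_id = :league_id
    WHERE p.user_id = :user_id
    ORDER BY p.prediction_id
    """,
    {"league_id": 1, "user_id": 7},
).fetchall()

print(rows)  # [(100, 10, '2-1'), (101, 11, '0-0')]
```

With an index on matches.match_id (the primary key here) the extra join is a cheap lookup, which is the usual argument for not duplicating league_id.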

SQL Inner Join With Multiple Columns

I've got 2 tables - dishes and ingredients:
in Dishes, I've got a list of pizza dishes, ordered as such:
In Ingredients, I've got a list of all the different ingredients for all the dishes, ordered as such:
I want to be able to list all the names of all the ingredients of each dish alongside each dish's name.
I've written this query, but instead of replacing the ingredient ids with names as it should, it returns an empty set. Please explain what it is that I'm doing wrong:
SELECT dishes.name, ingredients.name, ingredients.id
FROM dishes
INNER JOIN ingredients
ON dishes.ingredient_1=ingredients.id,dishes.ingredient_2=ingredients.id,dishes.ingredient_3=ingredients.id,dishes.ingredient_4=ingredients.id,dishes.ingredient_5=ingredients.id,dishes.ingredient_6=ingredients.id, dishes.ingredient_7=ingredients.id,dishes.ingredient_8=ingredients.id;
It would be great if you could refer to:
The logic of the DB structuring - am I doing it correctly?
The logic behind the SQL query - if the DB is built in the right fashion, then why upon executing the query I get the empty set?
If you've encountered such a problem before - one that requires a single-to-many relationship - how did you solve it in a way different than this, using PHP & MySQL?
Disregard The Text In Hebrew - Treat It As Your Own Language.
It seems to me that a better Database Structure would have a Dishes_Ingredients_Rel table, rather than having a bunch of columns for Ingredients.
DISHES_INGREDIENTS_REL
DishesID
IngredientID
Then, you could just do a much simpler JOIN.
SELECT Ingredients.Name
FROM Dishes_Ingredients_Rel
INNER JOIN Ingredients
ON Dishes_Ingredients_Rel.IngredientID = Ingredients.IngredientID
WHERE Dishes_Ingredients_Rel.DishesID = @DishesID
1. The logic of the DB structuring - am I doing it correctly?
This is denormalized data. To normalize it, you would restructure your database into three tables:
Pizza
PizzaIngredients
Ingredients
Pizza would have ID, name, and type where ID is the primary key.
PizzaIngredients would have PizzaId and IngredientId (this is a many-many table where the primary key is a composite key of PizzaId and IngredientID)
Ingredients has ID and name where ID is the primary key.
2. List all the names of all the ingredients of each dish alongside each dish's name. Something like this in MySQL (untested):
SELECT p.ID, p.name, GROUP_CONCAT(i.name) AS ingredients
FROM pizza p
INNER JOIN pizzaingredients pi ON p.ID = pi.PizzaID
INNER JOIN ingredients i ON pi.IngredientID = i.ID
GROUP BY p.id
3. If you've encountered such a problem before - one that requires a single-to-many relationship - how did you solve it in a way different than this, using PHP & MySQL?
Using a many-many relationship, since that is what your example truly is. You have many pizzas which can have many ingredients, and many ingredients belong to many different pizzas.
The reason you are getting an empty result is that you are setting a join condition that is never satisfied. During the INNER JOIN the database engine compares each record of the first table with each record of the second one, trying to find a match where the id of the ingredient record being evaluated is equal to ingredient_1 AND ingredient_2 AND so on. It would return a result only if you created a record in the first table with the same ingredient in all 8 columns (for testing purposes only).
Regarding the database structure, you chose a denormalized one, with 8 ingredient columns per dish. There are a lot of considerations with this data structure (performance, maintainability, or just think about being asked to insert a dish with 9 ingredients), and I would personally go for a normalized data structure instead.
But if you want to keep this, you should write something like:
SELECT dishes.name, ingredients1.name, ingredients1.id, ingredients2.name, ingredients2.id, ...
FROM dishes
LEFT JOIN ingredients AS ingredients1 ON dishes.ingredient_1=ingredients1.id
LEFT JOIN ingredients AS ingredients2 ON dishes.ingredient_2=ingredients2.id
LEFT JOIN ingredients AS ingredients3 ON dishes.ingredient_3=ingredients3.id
...
The LEFT JOIN is required to still get a row when an ingredient column has no match (judging by your example, a value of 0 means no ingredient is set).
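For comparison, the normalized three-table layout suggested above can be sketched end to end as follows. This uses Python's sqlite3 as a stand-in for MySQL (both provide GROUP_CONCAT); the sample pizzas and ingredients and the alias link are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE pizza (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ingredients (ID INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per (pizza, ingredient) pair.
    CREATE TABLE pizzaingredients (
        PizzaID INTEGER REFERENCES pizza(ID),
        IngredientID INTEGER REFERENCES ingredients(ID),
        PRIMARY KEY (PizzaID, IngredientID)
    );
    INSERT INTO pizza VALUES (1, 'Margherita'), (2, 'Funghi');
    INSERT INTO ingredients VALUES
        (1, 'tomato'), (2, 'mozzarella'), (3, 'mushrooms');
    INSERT INTO pizzaingredients VALUES (1, 1), (1, 2), (2, 1), (2, 2), (2, 3);
    """
)

rows = conn.execute(
    """
    SELECT p.ID, p.name, GROUP_CONCAT(i.name) AS ingredients
    FROM pizza p
    INNER JOIN pizzaingredients link ON p.ID = link.PizzaID
    INNER JOIN ingredients i ON link.IngredientID = i.ID
    GROUP BY p.ID, p.name
    ORDER BY p.ID
    """
).fetchall()

# GROUP_CONCAT does not promise an order, so sort before comparing or displaying.
by_pizza = {name: sorted(names.split(",")) for _id, name, names in rows}
print(by_pizza)
```

A dish with a ninth ingredient is then just one more row in pizzaingredients, with no schema change.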

MySQL Query - Lowest Values with Multiple Tables / Joins

I am requesting some help with a query to be used on a custom golf website.
What I need is to find the lowest score per player per course. My club has 3 nine-hole loops, 27 holes in total, but I want to find the lowest score per 9 holes (i.e. per course, as I am describing it).
I have the following database structure (note, I haven't put in all rows, only those that are pertinent to the query I am struggling with).
Golf DB ERP Diagram
A query to get the full set of data would be (note some field names are different - the diagram was trying to be more descriptive):
select * from round r, round_hole rh, player p, course_nine c, course_hole ch
where r.r_id = rh.rh_rid
and p.id = r.r_pid
and c.cn_nine = r.r_nine
and ch.ch_nine = c.cn_nine
and rh.rh_hid = ch.ch_no
A snapshot of the results is:
Full query output
However, I then need to filter it as above, into "per player, per course".
I presume this needs some subquery, join, temp table or "in" type statement, but I am struggling, particularly as it spans multiple tables.
Any help is appreciated.
This can be accomplished using some simple aggregation. As long as you are able to properly join all of your tables, you can do this:
SELECT player, course, MIN(score) AS lowestScore
FROM myTables
GROUP BY player, course;
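A minimal sketch of that aggregation against tables shaped like the ones in the question, run with Python's sqlite3 as a stand-in for MySQL. The score column rh_score is hypothetical (the question does not name it), and the sketch assumes a round's score is the sum of its hole scores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE round (r_id INTEGER PRIMARY KEY, r_pid INTEGER, r_nine TEXT);
    -- rh_score is a hypothetical column name for the strokes on one hole.
    CREATE TABLE round_hole (rh_rid INTEGER, rh_hid INTEGER, rh_score INTEGER);
    INSERT INTO round VALUES (1, 1, 'A'), (2, 1, 'A'), (3, 1, 'B'), (4, 2, 'A');
    INSERT INTO round_hole VALUES
        (1, 1, 4), (1, 2, 5),
        (2, 1, 3), (2, 2, 5),
        (3, 1, 4), (3, 2, 4),
        (4, 1, 5), (4, 2, 5);
    """
)

lowest = conn.execute(
    """
    SELECT t.r_pid AS player, t.r_nine AS course, MIN(t.total) AS lowestScore
    FROM (
        -- Inner query: total score of each round.
        SELECT r.r_id, r.r_pid, r.r_nine, SUM(rh.rh_score) AS total
        FROM round r
        JOIN round_hole rh ON rh.rh_rid = r.r_id
        GROUP BY r.r_id, r.r_pid, r.r_nine
    ) AS t
    -- Outer query: best round per player per nine.
    GROUP BY t.r_pid, t.r_nine
    ORDER BY t.r_pid, t.r_nine
    """
).fetchall()

print(lowest)  # [(1, 'A', 8), (1, 'B', 8), (2, 'A', 10)]
```

The derived table plays the role of myTables in the answer: once each round is reduced to one total, MIN with GROUP BY does the rest.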

How do I combine several MySQL tables of varying rows?

I need to compile quantities from several warehouses that update their inventories periodically. The files are automatically loaded into tables in my MySQL database, but sometimes one warehouse might have a few more or less files than the others. I need to add them all to determine the total quantity available from the entire network.
My idea was to do:
SELECT
IFNULL(branch_01.qty,0) +
IFNULL(branch_02.qty,0) +
IFNULL(branch_03.qty,0) +
etc. through all warehouses joined as:
FROM branch_01
JOIN branch_02
USING (oespaced)
JOIN branch_03
USING (oespaced)
etc. through all warehouses
I can't use LEFT JOIN or RIGHT JOIN because sometimes one warehouse might have missing entries and sometimes another might. If a sku is missing from one branch's file, I'd prefer to still have the other branches added together, and just get a NULL, which would be converted to a 0 by the functions in the SELECT. When I've tested different joining solutions, I also seem to be getting Cartesian numbers of rows, which confuses me further.
Any guidance is greatly appreciated.
Just a little clarification:
We need to join 17 tables. We're not really concerned with the sum of a column, but more the individual values. For instance, a table might represent a list of items a,b,c,d, and list quantities of 1,2,3,4. We would have that table from several different warehouses and we would need to find a total for the entire network. If four warehouses had those values, we would want to see a,b,c,d with 4,8,12,16 as values for the total available.
I don't understand your question fully but hope my answer helps you a bit.
Are you joining many tables? So let's say 2 tables and you want to sum up the quantity column?
First of all, a JOIN without a join condition performs the Cartesian product of 2 (or more) tables, so you get many row combinations that you don't want; the solution is to restrict them with a WHERE (or ON) condition.
Maybe this is what you are looking for:
SELECT sum(a.qty) + sum(b.qty)
FROM table1 a, table2 b
WHERE a.pk = b.fk -- this one resolves the unwanted instances
fk denotes foreign key and pk denotes primary key.
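Since a sku may be missing from any branch, one alternative worth considering is to stack the branch tables with UNION ALL and sum per sku, instead of joining them. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 as a stand-in for MySQL, three branch tables instead of 17, and made-up rows; the table and column names (branch_01, oespaced, qty) follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE branch_01 (oespaced TEXT PRIMARY KEY, qty INTEGER);
    CREATE TABLE branch_02 (oespaced TEXT PRIMARY KEY, qty INTEGER);
    CREATE TABLE branch_03 (oespaced TEXT PRIMARY KEY, qty INTEGER);
    -- Each branch is missing a different sku.
    INSERT INTO branch_01 VALUES ('a', 1), ('b', 2);
    INSERT INTO branch_02 VALUES ('a', 1), ('c', 3);
    INSERT INTO branch_03 VALUES ('b', 2), ('c', 3), ('d', 4);
    """
)

totals = conn.execute(
    """
    SELECT oespaced, SUM(qty) AS total_qty
    FROM (
        SELECT oespaced, qty FROM branch_01
        UNION ALL
        SELECT oespaced, qty FROM branch_02
        UNION ALL
        SELECT oespaced, qty FROM branch_03
    ) AS all_branches
    GROUP BY oespaced
    ORDER BY oespaced
    """
).fetchall()

print(totals)  # [('a', 2), ('b', 4), ('c', 6), ('d', 4)]
```

A sku that is absent from a branch simply contributes no row, so no IFNULL is needed and no Cartesian blow-up can occur.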