What type of Join to use? - mysql

I've got a core table and and 3 tables that extend the 'core' table in different ways.
I'm working with MLS data and I have a 'common' table that contains information common to all mls listings and then a table that has specifically "residential" information, one for "commercial",etc... I have been using mls number to join a single table when I know a listing when the property type is known, but for searching I want to join all of them and have the special fields available for search criteria (not simply searching the common table).
What type of join will give me a dataset that will contain all listings (including the extended fields in the idx tables) ?
For each Common table record there is a single corresponding record in ONLY ONE of the idx tables.
___________
| |
| COMMON |
| |
|___________|
_|_
|
___________________|_____________________
_|_ _|_ _|_
_____|_____ _____|______ ____|______
| | | | | |
| IDX1 | | IDX2 | | IDX3 |
| | | | | |
|___________| |____________| |___________|

If you want everything in one row, you can use something like this format. Basically it gives you all the "Common" fields, then the other fields if there is a match otherwise NULL:
SELECT Common.*,
Idx1.*,
Idx2.*,
Idx3.*
FROM Common
LEFT JOIN Idx1
ON Idx1.MLSKey = Common.MLSKey
LEFT JOIN Idx2
ON Idx2.MLSKey = Common.MLSKey
LEFT JOIN Idx3
ON Idx3.MLSKey = Common.MLSKey
Bear in mind it's better to list out fields than to use the SELECT * whenever possible...
Also I'm assuming MySQL syntax is the same as SQL Server, which is what I use.

I have a similar set up of tables where the table 'jobs' is the core table.
I have this query that selects certain elements from each of the other 2 tables:
SELECT jobs.frequency, twitterdetails.accountname, feeds.feed
FROM jobs
JOIN twitterdetails ON twitterdetails.ID = jobs.accountID
JOIN feeds ON jobs.FeedID = feeds.FeedID
WHERE jobs.username ='".$currentuser."';");
So, as you can see, no specific JOIN, but the linking fields defined. You'd probably just need an extra JOIN line for your set up.

Ugly solution / poor attempt / may have misunderstood question:
SELECT common.*,IDX1.field,NULL,NULL FROM COMMON
LEFT JOIN IDX1 ON COMMON.ID = IDX1.ID
WHERE TYPE="RESIDENTIAL"
UNION ALL
SELECT common.*,NULL,IDX2.field,NULL FROM COMMON
LEFT JOIN IDX2 ON COMMON.ID = IDX2.ID
WHERE TYPE="RESIDENTIAL"
UNION ALL
SELECT common.*,NULL,NULL,IDX3.field FROM COMMON
LEFT JOIN IDX3 ON COMMON.ID = IDX3.ID
WHERE TYPE="INDUSTRIAL"

Orbit is close. Use inner join, not left join. You don't want common to show up in the join if it does not have a row in idx.
You MUST union 3 queries to get the proper results assuming each record in common can only have 1 idx table. Plug in "NULL" to fill in the columns that each idx table is missing so they can be unioned.
BTW your table design is good.

Related

2 inner joins between same 2 tables

I am trying to select columns from 2 tables,
The INNER JOIN conditions are $table1.idaction_url=$table2.idaction AND $table1.idaction_name=$table2.idaction.
However, From the query below, there is no output. It seems like the INNER JOIN can only take 1 condition. If I put AND to include both conditions as shown in the query below, there wont be any output. Please look at the picture below. Please advice.
$mysql=("SELECT conv(hex($table1.idvisitor), 16, 10) as visitorId,
$table1.server_time, $table1.idaction_url,
$table1.time_spent_ref_action,$table2.name,
$table2.type, $table1.idaction_name, $table2.idaction
FROM $table1
INNER JOIN $table2
ON $table1.idaction_url=$table2.idaction
AND $table1.idaction_name=$table2.idaction
WHERE conv(hex(idvisitor), 16, 10)='".$id."'
ORDER BY server_time DESC");
Short answer:
You need to use two separate inner joins, not only a single join.
E.g.
SELECT `actionurls`.`name` AS `actionUrl`, `actionnames`.`name` AS `actionName`
FROM `table1`
INNER JOIN `table2` AS `actionurls` ON `table1`.`idaction_url` = `actionurls`.`idaction`
INNER JOIN `table2` AS `actionnames` ON `table1`.`idaction_name` = `actionurls`.`idaction`
(Modify this query with any additional fields you want to select).
In depth: INNER JOIN, when done on a value unique to the second table (the table joined to the first in this operation) will only ever fetch one row. What you want to do is fetch data from the other table twice, into the same row, reading the select part of the statement.
INNER JOIN table2 ON [comparison] will, for each row selected from table1, grab any rows from table2 for which [comparison] is TRUE, then copy the row from table1 N times, where N is the amount of rows found in table2. If N = 0, then the row is skipped. In our case N=1 so INNER JOIN of idaction_name in table1 to idaction in table2 for example will allow you to select all the action names.
In order to get the action urls as well we have to INNER JOIN a second time. Now you can't join the same table twice normally, as SQL won't know which of the two joined tables is meant when you type table2.name in the first part of your query. This would be ambiguous if both had the same name. There's a solution for this, table aliases.
The output (of my answer above) is going to be something like:
+-----+------------------------+-------------------------+
| Row | actionUrl | actionName |
+-----+------------------------+-------------------------+
| 1 | unx.co.jp/ | UNIX | Kumamoto Home |
| 2 | unx.co.jp/profile.html | UNIX | Kumamoto Profile |
| ... | ... | ... |
+-----+------------------------+-------------------------+
While if you used only a single join, you would get this kind of output (using OR):
+-----+-------------------------+
| Row | actionUrl |
+-----+-------------------------+
| 1 | unx.co.jp/ |
| 2 | UNIX | Kumamoto Home |
| 3 | unx.co.jp/profile.html |
| 4 | UNIX | Kumamoto Profile |
| ... | ... |
+-----+-------------------------+
Using AND and a single join, you only get output if idaction_name == idaction_url is TRUE. This is not the case, so there's no output.
If you want to know more about how to use JOINS, consult the manual about them.
Sidenote
Also, I can't help but notice you're using variables (e.g. $table1) that store the names of the tables. Do you make sure that those values do not contain user input? And, if they do, do you at least whitelist a list of tables that users can access? You may have some security issues with this.
INNER JOIN does not put any restriction on number of conditions it can have.
The zero resultant rows means, there is no rows satisfying the two conditions simultaneously.
Make sure you are joining using correct columns. Try going step by step to identify from where the data is lost

select multiple columns from multiple tables except certain columns

I have three tables.
The first table is like:
+----+----+----+
| id | x | y |
+----+----+----+
The second and third tables are like:
+----+----+----+----+----+----+----+----+----+----+----+----+
| id | Z1 | Z2 | Z3 | .. | .. | .. | .. | .. | .. | .. | Zn |
+----+----+----+----+----+----+----+----+----+----+----+----+
n is quite large, about 800-900.
I know it is quite ugly tables and database. But it is a raw data set and a learning set of a certain experiment. Please, just ignore it.
And a skeleton of a query is like:
'SELECT a.*, b.*, c.* \
FROM `test_xy` a, `test_1` b, `test_2` c \
WHERE a.id = b.id AND b.id = c.id'
What I concern is, the result with the query includes id field three times. I want id field to appear just one time at the front of the result.
I can do it by slicing the result table (by Python, MATLAB, etc.)
But, is there a better way to do this with a large number of columns? I mean, can id field of the second and third tables be excluded at the query stage?
The answer is the USING syntax: MySQL specific by the way. http://dev.mysql.com/doc/refman/5.5/en/join.html. Learn to use JOINs before you do anything else; putting the jon condition into the where clause is just plan wrong.
SELECT a.*, b.*, c.*
FROM `test_xy` a JOIN `test_1` b USING(`id)
JOIN `test_2` c USING(`id)

MySQL queries, selecting field from one of many databases

I have a remarks table which can be linked to any number of other items in a system, in the case of this example we'll use bookings, enquiries and referrals.
Thus in the remarks table we have columns
remark_id | datetime | text | booking_id | enquiry_id | referral_id
1 | 2014-06-28 | abc | 0 | 8 | 0
2 | 2014-06-27 | def | 3 | 0 | 0
2 | 2014-05-31 | ghi | 0 | 0 | 10
Etc...
Each of the item tables will have a field called name. Thus when I want to select a remark the likelihood is I'll need this name.
I'd like to achieve this with a single query, getting a 2d array as follows:
['remark_id'=>1, 'datetime'=>'2014-06-28', 'text'=>'abc', 'name'=>'Harold']
However the query I'd expect to use would be
SELECT r.remark_id,r.datetime,r.text
,b.name AS book,rr.name AS referral,e.name AS enquiry
FROM remarks AS r
LEFT JOIN bookings AS b ON b.book_id=r.book_id
LEFT JOIN referrals AS rr ON rr.referral_id=r.referral_id
LEFT JOIN enquiries AS e ON e.enquiry_id=r.enquiry_id
Leaving me with the output
['remark_id'=>1, 'datetime'=>'2014-06-28', 'text'=>'abc', 'book'=>'Harold', 'referral'=>'', 'enquiry'=>'']
And more processing to do before or during rendering it to a view.
Is there a way to write a query such that it would fill a field from the first NOT NULL string it encountered in one of the joined tables?
Please only suggest using a different database system if you know that MySQL doesn't provide any way to do what I'm asking. If it's the case it can't be done there's no business sense in rewriting the system anyway, but I'd like to ask!
Two ways I can think of:
use UNION:
SELECT remark_id, datetime, text, name
FROM remarks
JOIN bookings ON (remarks.book_id = bookings.book_id)
UNION
SELECT remark_id, datetime, text, name
FROM remarks
JOIN referrals ON (remarks.referral_id = referrals.referral_id)
UNION
SELECT remark_id, datetime, text, name
FROM remarks
JOIN enquiries ON (remarks.enquiry_id = enquiries.enquiry_id)</code>
use IFNULL (probably much slower):
SELECT r.remark_id,r.datetime,r.text,
IFNULL(b.name,IFNULL(rr.name,e.name)) AS name
FROM remarks AS r
LEFT JOIN bookings AS b ON b.book_id=r.book_id
LEFT JOIN referrals AS rr ON rr.referral_id=r.referral_id
LEFT JOIN enquiries AS e ON e.enquiry_id=r.enquiry_id</code>
Variant 2 is really much slower because of the LEFT JOINs.
Also, generally I would not recommend using 0 as value for non-existent links, rather use NULL. This will allow MySQL to speed up the join.
one way to achieve this is with nested if statements:
if(b.name is not null, b.name, if(rr.name is not null, rr.name, e.name)) as name
one drawback is that this gives an implicit priority to books? not sure if that would be an issue.
perhaps the main drawback, though, is that this is kind of "magical" and has goofy syntax so it might be more clear to just handle those cases in the controller after all.
Seems quite messy that you have multiple unused columns for each entry, unless I'm not understanding correctly. If you add more tables, you'd have to adjust each of the views so that it would filter out the new table.
I'd be tempted to redesign your structure so that each of the tables has a remarkgroup_id column, then add the following remark table
remark_id, remarkgroup_id, date, message
This would clean up the extra unused columns and allow you to use simple joining logic.

Optimize 5 table SQL query (stores => items => words)

Tables
stores (100,000 rows): id (pk), name, lat, lng, ...
store_items (9,000,000 rows): store_id (fk), item_id (fk)
items (200,000 rows): id(pk), name, ...
item_words (1,000,000 rows): item_id(fk), word_id(fk)
words (50,000 rows): id(pk), word VARCHAR(255)
Note: all ids are integers.
========
Indexes
CREATE UNIQUE INDEX storeitems_storeid_itemid_i ON store_items(store_id,item_id);
CREATE UNIQUE INDEX itemwords_wordid_itemid_i ON item_words(word_id,item_id);
CREATE UNIQUE INDEX words_word_i ON words(word);
Note: I prefer multi column indexes (storeitems_storeid_itemid_i and itemwords_wordid_itemid_i) because: http://www.mysqlperformanceblog.com/2008/08/22/multiple-column-index-vs-multiple-indexes/
QUERY
select s.name, s.lat, s.lng, i.name
from words w, item_words iw, items i, store_items si, stores s
where iw.word_id=w.id
and i.id=iw.item_id
and si.item_id=i.id
and s.id=si.store_id
and w.word='MILK';
Problem: elapsed time is 20-120 sec (depending on the word)!!!
explain $QUERY$
+----+-------------+-------+--------+-------------------------------------------------------+-----------------------------+---------+-----------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------------------------+-----------------------------+---------+-----------------------------+------+-------------+
| 1 | SIMPLE | w | const | PRIMARY,words_word_i | words_word_i | 257 | const | 1 | Using index |
| 1 | SIMPLE | iw | ref | itemwords_wordid_itemid_i,itemwords_itemid_fk | itemwords_wordid_itemid_i | 4 | const | 1 | Using index |
| 1 | SIMPLE | i | eq_ref | PRIMARY | PRIMARY | 4 | iw.item_id | 1 | |
| 1 | SIMPLE | si | ref | storeitems_storeid_itemid_i,storeitems_itemid_fk | storeitems_itemid_fk | 4 | iw.item_id | 16 | Using index |
| 1 | SIMPLE | s | eq_ref | PRIMARY | PRIMARY | 4 | si.store_id | 1 | |
I want elapsed time to be less than 5 secs!!! Any ideas???
==============
What I tried
I tried to see when increase in the execution time happens by adding tables to the query.
1 table
select * from words where word='MILK';
Elapsed time: 0.4 sec
2 tables
select count(*)
from words w, item_words iw
where iw.word_id=w.id
and w.word='MILK';
Elapsed time: 0.5-2 sec (depending on word)
3 tables
select count(*)
from words w, item_words iw, items i
where iw.word_id=w.id
and i.id=iw.item_id
and w.word='MILK';
Elapsed time: 0.5-2 sec (depending on word)
4 tables
select count(*)
from words w, item_words iw, items i, store_items si
where iw.word_id=w.id
and i.id=iw.item_id
and si.item_id=i.id
and w.word='MILK';
Elapsed time: 20-120 sec (depending on word)
I guess the problem with the indexes or with the design of query/database. But there must be a way to make it work fast. Google does it somehow and their tables are much bigger!
a) You're actually writing queries to do FTS inside mysql -> use real FTS like lucene instead.
b) Clearly, adding the 9M row join is the performance issue
c) How about limiting that join (maybe it's being done in full with the current query plan) like this :
SELECT
s.name, s.lat, s.lng, i.name
FROM
(SELECT * FROM words WHERE word='MILK') w
INNER JOIN
item_words iw
ON
iw.word_id=w.id
INNER JOIN
items i
ON
i.id=iw.item_id
INNER JOIN
store_items si
ON
si.item_id=i.id
INNER JOIN
stores s
ON
s.id=si.store_id;
The logic behind this is that instead of joining full tables and then limiting the results, you start by limiting the tables on which you will join, this (if the join order happens to be the one I wrote) will greatly reduce your working set and inner queries running time.
d) Google does NOT use mysql for FTS
Consider de-normalising the structure - the first candidate is the 1 million record item_words table - bring the words directly into the table. Creating a list of unique words might be more easily achieved through a view (depends on how often you need this data compared to, for example, your need to extract a list of shops with products associated with a keyword).
Secondly - create indexed views (not an option in MySQL, but certainly an option on other commercial databases).
You don't have an index that it can use to find the store_id if given the item_id. If the cardinality of store_id is low enough it might gain some benefit from storeitems_storeid_itemid_i, but since you have 100,000 stores this might not be so useful. You might try creating an index on store_items that lists the item_id first:
CREATE UNIQUE INDEX storeitems_item_store ON store_items(item_id, store_id);
Also, I'm not sure if putting join conditions in the where clause will affect performance adversely in the way you're seeing but you might try changing the query to something like this:
select s.name, s.lat, s.lng, i.name
from words w LEFT JOIN item_words iw ON w.id=iw.word_id
LEFT JOIN items i ON i.id=iw.item_id
LEFT JOIN store_items si ON si.item_id=i.id
LEFT JOIN stores s ON s.id=si.store_id
where w.word='MILK';
Without knowing the exact layout of your tables it's hard to give a good answer. But these types of multi table joins has a tendency to get really bogged down. Especially if one of the factors making up the expression of selection is a dynamic string.
You could try to return multiple resultsets of the tables in one go, from a stored procedure or something and then joining the data outside of SQL. This way I got the query time of a massive join down from 2 minutes to 4 seconds. Or do it using a temporary table and return the resultset from that when you are done.
Start with selecting from the words table since that's where you have the dynamic string. Then you can select from the other tables based on the data returned from that query.
Try this one.
Rewrite the query in such way
select s.name, s.lat, s.lng, i.name
from words w LEFT JOIN item_words iw ON w.id=iw.word_id AND w.word='MILK'
LEFT JOIN items i ON i.id=iw.item_id
LEFT JOIN store_items si ON si.item_id=i.id
LEFT JOIN stores s ON s.id=si.store_id
And create index on (w.id, w.word)
Have you tried analyzing the tables ?
this will help the optimiser select the best possible execution plan.
e.g:
ANALYZE TABLE words
ANALYZE TABLE item_words
ANALYZE TABLE items
ANALYZE TABLE store_items
ANALYZE TABLE stores
see: http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html

complex sql query issue

I have a little SQL but I can't find the way to get back text just numbers. - revised!
SELECT if( `linktype` = "group",
(SELECT contactgroups.grname
FROM contactgroups, groupmembers
WHERE contactgroups.id = groupmembers.id ???
AND contactgroups.id = groupmembers.link_id),
(SELECT contactmain.contact_sur
FROM contactmain, groupmembers
WHERE contactmain.id = groupmembers.id ???
AND contactmain.id = groupmembers.link_id) ) AS adat
FROM groupmembers;
As now I have improved a bit gives back some info but ??? (thanks to minitech) indicate my problem. I can't see how could I fix... Any advice welcomed! Thansk
Contactmain (id, contact_sur, email2)
data:
1 | Peter | email#email.com
2 | Andrew| email2#email.com
Contactgroups (id, grname)
data:
1 | All
2 | Trustee
3 | Comitee
Groupmembers (id, group_id, linktype, link_id)
data:
1 | 1 | contact | 1
2 | 1 | contact | 2
3 | 2 | contact | 1
4 | 3 | group | 2
And I would like to list out who is in the 'Comitee' the result should be Andrew and Trustee if I am right:)
It does look a bit redundant on the join since you are implying both the ID and Link_ID columns are the same value. Since BOTH select values are derived from a qualification to the group members table, I have restructured the query to use THAT as the primary table and do a LEFT JOIN to each of the other tables, anticipating from your query that the link should be found from ONE or the OTHER tables. So, with each respective LEFT JOIN, you will go through the GroupMembers table only ONCE. Now, your IF(). Since the group members is the basis, and we have BOTH tables available and linked, we just grab the column from one table vs the other respectively. I've included the "linktype" too just for reference purposes. By using the STRAIGHT_JOIN will help the engine from trying to change the interpretation of how to join the tables.
SELECT STRAIGHT_JOIN
gm.linktype,
if( gm.linktype = "group", cg.grname, cm.contact_sur ) ADat
from
groupmembers gm
left join contactgroups cg
ON gm.link_id = cg.id
left join contactmain cm
ON gm.link_id = cm.id
If contactgroups.id must equal groupmembers.id but must also equal 2, that's redundant and also probably where your problem is. It works fine as you've written it: http://ideone.com/7EGLZ so without knowing what it's actually supposed to do I can't help more.
EDIT: I'm unfamiliar with the comma-separated FROM, but it gives the same result since you don't select anything from the other table so it doesn't really matter.