BACKGROUND
I was given a bunch of data that looks like this, a little over 200 columns wide:
Name|Address|etc...|Value 1|Value 2|Code 1|Code 2|repeat 7 times for codes|Value 3|repeat 200 times for values....|Value 200
They included definition lists that are used to decipher the coded values; for example, U6 = Local Limit and U7 = More than 100 Times.
So I loaded it into MySQL, because they wanted reports that swap in the value from the definition lists for the coded values. However, not all cells in the main table have data; some are blank.
PROBLEM
So when I build my SELECT statement, I would usually use a LEFT JOIN and be fine, but I need multiple LEFT JOINs to swap in the values from the definition lists when needed, and the multiple LEFT JOINs give me a lot of extra rows. I'm having trouble with this.
The main table is called RAW_DATA, and the tables that hold the definition lists are named:
COUNTRY
ORIGIN
LANGUAGE
PREFERENCE
HAS_VEHICLE
EDUCATION
MARITAL_STATUS
OCCUPATION
TECHCODE
TYPE
INCOME
These are just the tables that have definitions. All of the other fields in the 225-column table are static and often unique. I am sure it could be normalized, but that would be a lot of effort for converting one report, one time. That is why I only created tables for the columns whose codes are not human-readable without the definition lists.
MY QUERY
-- One LEFT JOIN per definition list: a blank or unmatched code yields NULL in that column.
SELECT `raw_data`.`id_raw_data`,
`raw_data`.`id`,
`raw_data`.`first_name`,
`raw_data`.`last_name`,
`raw_data`.`OTHER_COLUMNS_AS_NEEDED`,
`country`.`longname` as `country`,
`origin`.`longname` as `origin`,
`language`.`longname` as `language`,
`preference`.`longname` as `preference`,
`has_vehicle`.`longname` as `vehicle_type`,
`education`.`longname` as `education`,
`marital_status`.`longname` as `marital_status`,
`occupation`.`longname` as `occupation`,
`techcode`.`longname` as `tech_group`,
`typestat`.`longname` as `typecode`,
`income`.`longname` as `income`
FROM `raw_data`
left join `country`
on `raw_data`.`countrycode` = `country`.`shortname`
left join `origin`
on `raw_data`.`origincode` = `origin`.`shortname`
left join `language`
on `raw_data`.`languagecode` = `language`.`shortname`
left join `preference`
on `raw_data`.`preferencecode` = `preference`.`shortname`
left join `has_vehicle`
on `raw_data`.`has_vehiclecode` = `has_vehicle`.`shortname`
left join `education`
on `raw_data`.`educationcode` = `education`.`shortname`
left join `marital_status`
on `raw_data`.`marital_statuscode` = `marital_status`.`shortname`
left join `occupation`
on `raw_data`.`occupationcode` = `occupation`.`shortname`
left join `techcode`
on `raw_data`.`techcodecode` = `techcode`.`shortname`
left join `typestat`
on `raw_data`.`typestatcode` = `typestat`.`shortname`
left join `income`
on `raw_data`.`incomecode` = `income`.`shortname`;
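A minimal sketch of what chained LEFT JOINs against lookup tables should do, using Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption; LEFT JOIN semantics are standard SQL). Only two of the lookup tables are modeled, and the country codes are made-up sample data. As long as each `shortname` is unique, every `raw_data` row comes back exactly once, and a blank code (NULL or empty string) simply produces NULL in that column rather than an extra row.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption); sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_data (id_raw_data INTEGER PRIMARY KEY,
                           countrycode TEXT, incomecode TEXT);
    CREATE TABLE country (shortname TEXT PRIMARY KEY, longname TEXT);
    CREATE TABLE income  (shortname TEXT PRIMARY KEY, longname TEXT);

    INSERT INTO country VALUES ('US', 'United States'), ('CA', 'Canada');
    INSERT INTO income  VALUES ('U6', 'Local Limit'), ('U7', 'More than 100 Times');

    -- Row 2 has a NULL country code; row 3 has an empty-string income code.
    INSERT INTO raw_data VALUES (1, 'US', 'U6'), (2, NULL, 'U7'), (3, 'CA', '');
""")

rows = conn.execute("""
    SELECT r.id_raw_data, c.longname AS country, i.longname AS income
    FROM raw_data r
    LEFT JOIN country c ON r.countrycode = c.shortname
    LEFT JOIN income  i ON r.incomecode  = i.shortname
    ORDER BY r.id_raw_data
""").fetchall()

for row in rows:
    print(row)
# (1, 'United States', 'Local Limit')
# (2, None, 'More than 100 Times')
# (3, 'Canada', None)
```

Three input rows give three output rows; the unmatched codes show up only as NULLs.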
I have done some searching, but the answers all seem to use some form of subquery or involve joining a table back to itself. I am pretty sure the problem has something to do with the columns in the massive raw_data table that have no values, so there is no match, but I need help.
This seemed close, but my query already times out when I add too many joins, and that approach looks like even more work for all my lookups: Removing duplicates from result of multiple join on tables with different columns in MySQL
Thanks for the help,
David
In case anyone else wants to know, the issue was not with the SQL at all, which worked fine for my purpose.
Rather, one of the definition tables had codes that were not unique, so the query returned an extra row wherever a code had a duplicate definition.
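A small sketch of that failure mode and one way to find the culprits, again using sqlite3 as a stand-in for MySQL (an assumption) and a hypothetical duplicate entry for code U7. A LEFT JOIN returns one output row per matching definition, so a code defined twice doubles every row that uses it; a GROUP BY ... HAVING COUNT(*) > 1 query on the definition table lists the duplicated codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_data (id_raw_data INTEGER PRIMARY KEY, incomecode TEXT);
    -- No uniqueness constraint on shortname, so a code can be defined twice.
    CREATE TABLE income (shortname TEXT, longname TEXT);

    INSERT INTO raw_data VALUES (1, 'U6'), (2, 'U7');
    INSERT INTO income VALUES ('U6', 'Local Limit'),
                              ('U7', 'More than 100 Times'),
                              ('U7', 'More than 100 times');  -- duplicate definition
""")

joined = conn.execute("""
    SELECT r.id_raw_data, i.longname
    FROM raw_data r
    LEFT JOIN income i ON r.incomecode = i.shortname
""").fetchall()
print(len(joined))  # 3: raw_data row 2 matches two definitions

# List every code that is defined more than once.
dupes = conn.execute("""
    SELECT shortname, COUNT(*) AS n
    FROM income
    GROUP BY shortname
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('U7', 2)]
```

Once the duplicates are cleaned up, declaring `shortname` as the PRIMARY KEY (or a UNIQUE key) of each definition table makes the database reject duplicate codes in the future.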
Related
I have two tables:
Invariant (UniqueID, characteristic1, characteristic2)
Variant (VariantID, UniqueID, specification1, specification2)
Each project has characteristics that do not change between implementations, and each implementation also has its own individual properties.
So, I use queries like this to find projects with the given characteristics and specifications:
SELECT *
FROM `Invariants`
LEFT JOIN `Variants` ON (`Variants`.`UniqueID`=`Invariants`.`UniqueID`)
WHERE char2='y' and spec1='x'
GROUP BY `Invariants`.`UniqueID`;
I'm looking for a query that will return all projects that have never satisfied a given specification. So, if one of project 100's variants had spec1='bad', then I don't want project 100 to be included, regardless of whether it had other variants where spec1='good'.
select *
from Invariants iv
where not exists (
select 1
from Variants v
where v.UniqueId = iv.UniqueId and v.spec1 = 'bad'
)
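A quick check of the NOT EXISTS anti-join above, run against SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption), with hypothetical projects 100, 200 and 300. Project 100 is excluded because one of its variants is 'bad', even though another is 'good'; note that a project with no variants at all (300) also counts as never having satisfied the specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Invariants (UniqueID INTEGER PRIMARY KEY, char2 TEXT);
    CREATE TABLE Variants (VariantID INTEGER PRIMARY KEY,
                           UniqueID INTEGER, spec1 TEXT);

    INSERT INTO Invariants VALUES (100, 'y'), (200, 'y'), (300, 'y');
    -- 100 had one bad variant, 200 only good ones, 300 has no variants.
    INSERT INTO Variants VALUES (1, 100, 'good'), (2, 100, 'bad'), (3, 200, 'good');
""")

never_bad = conn.execute("""
    SELECT iv.UniqueID
    FROM Invariants iv
    WHERE NOT EXISTS (
        SELECT 1
        FROM Variants v
        WHERE v.UniqueID = iv.UniqueID AND v.spec1 = 'bad'
    )
    ORDER BY iv.UniqueID
""").fetchall()
print(never_bad)  # [(200,), (300,)]
```

If projects without any variants should be left out, an additional EXISTS check for "has at least one variant" can be added to the outer WHERE clause.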
The queries below do not address your question; I probably read too fast and thought you wanted to pick up only the invariant properties of a particular type. But I will note that you shouldn't use a left join and then filter, in the WHERE clause, on columns from the right-hand table (except when checking for NULLs). People make that mistake all the time, and that's what jumped out at me at first glance.
The whole purpose of a left join is that some rows will not match and will therefore have filler NULL values in the columns from the right-hand table. The join logic happens first, and only after that is the WHERE clause applied. A condition like where spec1 = 'x' evaluates to NULL (unknown), never true, against a NULL value, so WHERE discards those rows. You end up eliminating exactly the rows you wanted to keep, and the left join behaves like an inner join.
This happens a lot with these invariant/custom-value tables. You're only interested in one of the properties, but if you don't filter before joining or inside the join condition, you end up dropping rows: where the property didn't exist, there is no value left to compare once the WHERE clause applies its condition on the property name.
Hope that made sense. See below for examples:
select iv.UniqueId, ...
from
Invariants iv left outer join
Variants v
on v.UniqueId = iv.UniqueId and v.spec1 = 'x'
or
select iv.UniqueId, ...
from
Invariants iv left outer join
(
select *
from Variants
where spec1 = 'x'
) v
on v.UniqueId = iv.UniqueId
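To make the WHERE-versus-ON difference concrete, here is a sketch using SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption), with hypothetical projects 100, 200 and 300. Putting spec1 = 'x' in WHERE drops every project without an 'x' variant; putting it in the ON clause keeps all projects and only restricts which variant rows are attached.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Invariants (UniqueID INTEGER PRIMARY KEY);
    CREATE TABLE Variants (VariantID INTEGER PRIMARY KEY,
                           UniqueID INTEGER, spec1 TEXT);
    INSERT INTO Invariants VALUES (100), (200), (300);
    INSERT INTO Variants VALUES (1, 100, 'x'), (2, 200, 'y');  -- 300 has none
""")

# Filter in WHERE: unmatched rows carry NULL spec1, and NULL = 'x' is never
# true, so those rows are dropped and the LEFT JOIN acts like an INNER JOIN.
in_where = conn.execute("""
    SELECT iv.UniqueID, v.spec1
    FROM Invariants iv
    LEFT JOIN Variants v ON v.UniqueID = iv.UniqueID
    WHERE v.spec1 = 'x'
    ORDER BY iv.UniqueID
""").fetchall()

# Filter in ON: every Invariants row survives; only the join partner is restricted.
in_on = conn.execute("""
    SELECT iv.UniqueID, v.spec1
    FROM Invariants iv
    LEFT JOIN Variants v ON v.UniqueID = iv.UniqueID AND v.spec1 = 'x'
    ORDER BY iv.UniqueID
""").fetchall()

print(in_where)  # [(100, 'x')]
print(in_on)     # [(100, 'x'), (200, None), (300, None)]
```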
I'm not sure if this is possible exactly as stated in the title, but what I'm trying to accomplish, by whatever means, amounts to left joining two columns from one table, each on a separate table.
Here's the statement I'm working with. Unfortunately, MySQL won't run it and gives me an error message. Is there another way to accomplish what I want, or is my syntax just off?
select h.HITTER_ID, u.UMPIRE_ID, u.HAND_B
from
hitters h,
schedules sched
LEFT JOIN
umpires u
ON sched.home_base_umpire_id = u.UMPIRE_ID and h.HAND_B = u.HAND_B
In the umpires table, each UMPIRE_ID appears twice, once with HAND_B = "R" and once with HAND_B = "L"
Essentially, I want to:
1) Pull the UMPIRE_ID from umpires when that UMPIRE_ID appears in schedules
2) Of the two UMPIRE_ID records, select the one with the HAND_B field that corresponds to the HAND_B field in hitters
I could put h.HAND_B = u.HAND_B in the WHERE clause, but that would require UMPIRE_ID to be non-NULL, and I need to leave open the possibility that it is NULL.
How can I accomplish this?
You need to modify your query to look like this:
select h.HITTER_ID,
u.UMPIRE_ID,
u.HAND_B
from hitters h
left join umpires u on h.HAND_B = u.HAND_B
left join schedules sched ON sched.home_base_umpire_id = u.UMPIRE_ID;
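An alternative that keeps the question's original requirement (only umpires listed in schedules, with NULL allowed) is to replace the comma with an explicit CROSS JOIN. In MySQL the comma operator has lower precedence than JOIN, so in the original query the ON clause can only see schedules and umpires, which is why MySQL reports h.HAND_B as an unknown column in the ON clause. The sketch below runs the CROSS JOIN form in SQLite via Python's sqlite3 module as a stand-in (an assumption); the hitters, schedule entries and umpire IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hitters (HITTER_ID INTEGER PRIMARY KEY, HAND_B TEXT);
    CREATE TABLE schedules (home_base_umpire_id INTEGER);
    CREATE TABLE umpires (UMPIRE_ID INTEGER, HAND_B TEXT);

    INSERT INTO hitters VALUES (1, 'R'), (2, 'L');
    -- Umpire 99 is scheduled but missing from umpires, so it should yield NULLs.
    INSERT INTO schedules VALUES (10), (99);
    -- Each UMPIRE_ID appears twice, once per HAND_B value.
    INSERT INTO umpires VALUES (10, 'R'), (10, 'L');
""")

rows = conn.execute("""
    SELECT h.HITTER_ID, sched.home_base_umpire_id, u.UMPIRE_ID, u.HAND_B
    FROM hitters h
    CROSS JOIN schedules sched
    LEFT JOIN umpires u
      ON sched.home_base_umpire_id = u.UMPIRE_ID
     AND h.HAND_B = u.HAND_B
    ORDER BY h.HITTER_ID, sched.home_base_umpire_id
""").fetchall()

for row in rows:
    print(row)
# (1, 10, 10, 'R')
# (1, 99, None, None)
# (2, 10, 10, 'L')
# (2, 99, None, None)
```

Each hitter gets the umpire record whose HAND_B matches, and a scheduled umpire with no umpires row still appears with NULLs, which a WHERE-clause filter would have removed.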
I have two tables:
Table 1 "customer" with fields "Cust_id", "first_name", "last_name" (10 customers)
Table 2 "cust_order" with fields "order_id", "cust_id", (26 orders)
I need to display Cust_id, first_name, last_name, and the count of order_id grouped by cust_id, i.e., the total number of orders placed by each customer.
I am running the query below; however, it counts all 26 orders and assigns that total to every customer.
SELECT COUNT(order_id), cus.cust_id, cus.first_name, cus.last_name
FROM cust_order, customer cus
GROUP BY cust_id;
Could you please advise what is wrong with the query?
Your issue here is that you have not told the database how these two tables are 'connected', i.e., which columns they should be connected by.
A join condition effectively allows you to 'join' two tables together and query across them.
So you might want to use something like:
SELECT COUNT(B.order_id), A.cust_id, A.first_name, A.last_name
FROM customer A
LEFT JOIN cust_order B -- this is using a left join, but an inner join may be appropriate also
ON (A.cust_id = B.cust_id) -- what links them together
GROUP BY A.cust_id; -- the group by clause
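A sketch comparing the two queries on a tiny hypothetical data set (three customers, three orders), using SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption). Without a join condition every customer is paired with every order, so each customer is credited with the grand total; with the condition, COUNT(B.order_id) counts only that customer's orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE cust_order (order_id INTEGER PRIMARY KEY, cust_id INTEGER);

    INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Day');
    -- Three orders in total; customer 3 has none.
    INSERT INTO cust_order VALUES (101, 1), (102, 1), (103, 2);
""")

# No join condition: a cross join, so every customer "has" all 3 orders.
cross = conn.execute("""
    SELECT cus.cust_id, COUNT(order_id)
    FROM cust_order, customer cus
    GROUP BY cus.cust_id
    ORDER BY cus.cust_id
""").fetchall()

# With the join condition, COUNT(B.order_id) skips the NULLs the LEFT JOIN
# produces for customers without orders, so they get 0.
per_customer = conn.execute("""
    SELECT A.cust_id, A.first_name, A.last_name, COUNT(B.order_id)
    FROM customer A
    LEFT JOIN cust_order B ON A.cust_id = B.cust_id
    GROUP BY A.cust_id, A.first_name, A.last_name
    ORDER BY A.cust_id
""").fetchall()

print(cross)         # [(1, 3), (2, 3), (3, 3)]
print(per_customer)  # [(1, 'Ann', 'Lee', 2), (2, 'Bob', 'Ray', 1), (3, 'Cy', 'Day', 0)]
```

With an INNER JOIN instead, customer 3 would disappear from the result rather than show 0; COUNT(*) with the LEFT JOIN would report 1 for that customer, which is why the order_id column is counted.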
As per your comment requesting some further info:
Left Join (right joins are almost identical, only the other way around):
The SQL LEFT JOIN returns all rows from the left table, even if there are no matches in the right table. This means that if the ON clause matches 0 (zero) records in right table, the join will still return a row in the result, but with NULL in each column from right table. ~Tutorials Point.
This means that a left join returns all the values from the left table, plus matched values from the right table or NULL in case of no matching join predicate.
Use a LEFT JOIN when you want to retrieve all the rows from the left-hand table and only the matching rows from the right-hand table.
Execution Time
While the accepted answer in this case may work well on small datasets, it may become 'heavy' on larger databases, because it was not actually designed for this type of operation.
Joins were introduced for exactly this purpose.
Much work in database-systems has aimed at efficient implementation of joins, because relational systems commonly call for joins, yet face difficulties in optimising their efficient execution. The problem arises because inner joins operate both commutatively and associatively. ~Wikipedia
In practice, this means that the user merely supplies the list of tables for joining and the join conditions to use, and the database system has the task of determining the most efficient way to perform the operation. A query optimizer determines how to execute a query containing joins. So, by letting the DBMS choose how your data is queried, you can save a lot of time.
Other Joins/Summary
An INNER JOIN returns data from both tables only where the keys in each table match.
A LEFT JOIN or RIGHT JOIN will return all the rows from one table and matching data from the other table.
Use a join when you want to query multiple tables.
Joins are much faster than other ways of querying >=2 tables (speed can be seen much better on larger datasets).
You could try this one:
SELECT COUNT(cus_order.order_id), cus.cust_id, cus.first_name, cus.last_name
FROM cust_order cus_order, customer cus
WHERE cus_order.cust_id = cus.cust_id
GROUP BY cus.cust_id;
Maybe a left join will help you:
SELECT COUNT(co.order_id), cus.cust_id, cus.first_name, cus.last_name
FROM customer cus
LEFT JOIN cust_order co
ON (co.cust_id= cus.Cust_id )
GROUP BY cus.cust_id;
I've done quite a bit of reading and testing, and I can get the information I want out of individual queries, but I can't seem to join them to get everything in one table.
Goal - List all containers with the related blocks and the name of the AdminRole.
Since I can't attach pictures yet, here are the table descriptions, showing the relationships between the tables and the fields I need:
AdminACL
adminID (=adminrole_admin.adminID) (EDIT: =adminrole.ID)
objectID (=Container.ID)
ObjectType (Condition "Container")
adminrole_admin (EDIT: table not needed)
adminroleID (=adminrole.ID)
adminID (=AdminACL.adminID)
adminrole
ID (=adminrole_admin.adminroleID) (EDIT: =AdminACL.adminid)
name <- Desired field in results table [bonus points for condition not like 'hidden_%']
Container
ID (=AdminACL.objectID)
name <- Desired field in results table
container_block
containerID (=Container.ID)
blockID (=Block.ID)
Block
ID (=container_block.blockID)
name <- Desired field in results table
I have SELECT statements for each piece, but combining them makes for exceptionally messy queries, so I tried joins instead, and I end up with empty sets. I'm not sure which table to start with, or whether I need different types of joins and/or a different order.
Here's the last query I tried before giving up:
SELECT C.name, B.name, A.name
FROM container C
JOIN (SELECT adminid,objectid from adminacl where objecttype='Container') ACL ON ACL.objectid=C.id
JOIN adminrole_admin AA ON ACL.adminid=AA.adminid
JOIN (select id,name from adminrole where name not like 'hidden_%') A ON AA.adminid=A.id
JOIN container_block CB ON C.id=CB.containerid
JOIN block B ON CB.blockid=B.id;
As much as I'd love the answer, I'd also like to understand how to structure this type of query in the future. I'm pretty sure I'm missing something obvious - just having a hard time relating all the other examples I've looked at with this. TIA!
EDIT: It turns out that adminacl.adminid=adminrole.id, which made things a lot simpler. I also ended up needing the 'hidden' admin roles, which were user-specific permissions added on top of the roles. Based on Russ's answer, this is what my final query ended up being:
SELECT container.name, block.name, adminrole.name
FROM block
LEFT JOIN container_block ON container_block.blockid=block.id
LEFT JOIN container ON container.id=container_block.containerid
LEFT JOIN adminacl ON adminacl.objectid=container.id
JOIN adminrole ON adminrole.id=adminacl.adminid
WHERE
adminacl.objecttype='Container'
AND block.blockstatus !=1 #to exclude unassigned blocks
ORDER BY container.name, block.name;
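A sketch of the final query run against SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption; the MySQL-only `#` comment is dropped). All rows are hypothetical. It shows one result row per block/container/role combination, and that the objecttype filter is what removes an ACL row that shares an objectid with a container but is not a container ACL. Because the last join is an inner JOIN and the WHERE clause tests adminacl, blocks with no container or no ACL are dropped, so the LEFT JOINs effectively behave like inner joins here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE container (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE block (id INTEGER PRIMARY KEY, name TEXT, blockstatus INTEGER);
    CREATE TABLE container_block (containerid INTEGER, blockid INTEGER);
    CREATE TABLE adminrole (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE adminacl (adminid INTEGER, objectid INTEGER, objecttype TEXT);

    INSERT INTO container VALUES (1, 'Sales'), (2, 'Support');
    -- blockstatus = 1 marks an unassigned block.
    INSERT INTO block VALUES (10, 'Header', 0), (11, 'Footer', 0), (12, 'Orphan', 1);
    INSERT INTO container_block VALUES (1, 10), (1, 11), (2, 10);
    INSERT INTO adminrole VALUES (100, 'Editors'), (101, 'hidden_alice');
    -- The last ACL row has objectid 1 but is not a container ACL.
    INSERT INTO adminacl VALUES (100, 1, 'Container'), (101, 2, 'Container'),
                                (101, 1, 'Block');
""")

rows = conn.execute("""
    SELECT container.name, block.name, adminrole.name
    FROM block
    LEFT JOIN container_block ON container_block.blockid = block.id
    LEFT JOIN container ON container.id = container_block.containerid
    LEFT JOIN adminacl ON adminacl.objectid = container.id
    JOIN adminrole ON adminrole.id = adminacl.adminid
    WHERE adminacl.objecttype = 'Container'
      AND block.blockstatus != 1
    ORDER BY container.name, block.name
""").fetchall()

for row in rows:
    print(row)
# ('Sales', 'Footer', 'Editors')
# ('Sales', 'Header', 'Editors')
# ('Support', 'Header', 'hidden_alice')
```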
I like to start from what I actually want the largest quantity of. From your question, I think you want a list of blocks with containers and admin roles on there, so I started there and left-joined on the additional information to create the list.
SELECT Block.name, Container.name, adminrole.name from
Block
LEFT JOIN container_block ON container_block.blockid = Block.id
LEFT JOIN Container ON Container.id = container_block.containerid
LEFT JOIN AdminACL ON AdminACL.objectid = Container.id
RIGHT JOIN adminrole_admin ON adminrole_admin.adminid = AdminACL.adminid #there could be multiple roles
LEFT JOIN adminrole ON adminrole.id = adminrole_admin.adminroleID
WHERE
AdminACL.ObjectType = 'Container'
AND adminrole.name not like 'hidden_%';
# shouldn't you have a `hidden` column on the table, instead of prefixing the name?
Disclaimer: I wrote this from memory and didn't even test it on anything, but give it a try, it might set you on the right path.
LEFT JOIN keeps every row from the left-hand table even when the join condition fails, filling the right-hand columns with NULL; the WHERE conditions on AdminACL.ObjectType and adminrole.name then discard those rows, so you shouldn't get lots of irrelevant blanks. I'm assuming that there can be many adminroles, so the result will contain a row for each combination of adminrole and Block. This means that the same block appears with different adminroles and the same adminrole will appear on multiple blocks. Your application will need to handle this duplication appropriately.
MySQL setup: step by step.
programs -> linked to --> speakers (by program_id)
At this point, it's easy for me to query all the data:
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
Nice and easy.
The trick for me is this. My speakers table is also linked to a third table, "books." So in the "speakers" table, I have "book_id" and in the "books" table, the book_id is linked to a name.
I've tried this (including a WHERE you'll notice):
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
No results.
My questions:
What am I doing wrong?
What's the most efficient way to make this query?
Basically, I want to get back all the programs data and the books data, but instead of the book_id, I need it to come back as the book name (from the 3rd table).
Thanks in advance for your help.
UPDATE:
(rather than opening a brand new question)
The left join worked for me. However, I have a new problem. Multiple books can be assigned to a single speaker.
Using the left join, the query returns two rows! What do I need to add to return only a single row, but still list the two books separately?
Is there any chance that the books table doesn't have any rows matching speakers.book_id?
Try using a left join which will still return the program/speaker combinations, even if there are no matches in books.
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
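A sketch of why the inner JOIN can return nothing while the LEFT JOIN still returns the program/speaker rows, using SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption) and hypothetical speakers: one with a valid book, one with no book_id, and one whose book_id is missing from books.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE programs (program_id INTEGER PRIMARY KEY, program_name TEXT,
                           category_id INTEGER);
    CREATE TABLE speakers (speaker_id INTEGER PRIMARY KEY, speaker_name TEXT,
                           program_id INTEGER, book_id INTEGER);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_name TEXT);

    INSERT INTO programs VALUES (1, 'Intro', 1);
    -- Ben has no book; Cal points at a book_id that is missing from books.
    INSERT INTO speakers VALUES (1, 'Ada', 1, 50), (2, 'Ben', 1, NULL), (3, 'Cal', 1, 99);
    INSERT INTO books VALUES (50, 'SQL Basics');
""")

def speakers_with_books(join_kind):
    # join_kind is a fixed string, "JOIN" or "LEFT JOIN", never user input.
    return conn.execute(f"""
        SELECT speakers.speaker_name, books.book_name
        FROM programs
        JOIN speakers ON programs.program_id = speakers.program_id
        {join_kind} books ON speakers.book_id = books.book_id
        WHERE programs.category_id = 1
        ORDER BY speakers.speaker_id
    """).fetchall()

inner = speakers_with_books("JOIN")
left = speakers_with_books("LEFT JOIN")
print(inner)  # [('Ada', 'SQL Basics')]
print(left)   # [('Ada', 'SQL Basics'), ('Ben', None), ('Cal', None)]
```

If no speaker in the category had a matching books row, the inner JOIN would return an empty result, which matches the "No results" symptom.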
Btw, could you post the table schemas for all tables involved, and exactly what output (or reasonable representation) you'd expect to get?
Edit: Response to the OP's comment
You can use GROUP BY and GROUP_CONCAT to put all the books on one row.
e.g.
SELECT speakers.speaker_id,
speakers.speaker_name,
programs.program_id,
programs.program_name,
group_concat(books.book_name) AS book_names
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
GROUP BY speakers.speaker_id
LIMIT 5
Note: since I don't know the exact column names, these may be off
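A sketch of the GROUP_CONCAT approach, run in SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption). Because a single speakers.book_id column can only hold one book, this sketch assumes a hypothetical speaker_books link table with one row per speaker/book pair; all names and data are illustrative. SQLite takes the separator as a second argument, while MySQL writes it as GROUP_CONCAT(b.book_name SEPARATOR ', ') and also accepts an ORDER BY inside the call. In both, the order of concatenated values is not guaranteed unless specified, and a speaker with no books gets NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE programs (program_id INTEGER PRIMARY KEY, program_name TEXT,
                           category_id INTEGER);
    CREATE TABLE speakers (speaker_id INTEGER PRIMARY KEY, speaker_name TEXT,
                           program_id INTEGER);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_name TEXT);
    -- Hypothetical link table: one row per (speaker, book) pair.
    CREATE TABLE speaker_books (speaker_id INTEGER, book_id INTEGER);

    INSERT INTO programs VALUES (1, 'Intro', 1);
    INSERT INTO speakers VALUES (1, 'Ada', 1), (2, 'Ben', 1);
    INSERT INTO books VALUES (50, 'SQL Basics'), (51, 'Joins in Depth');
    INSERT INTO speaker_books VALUES (1, 50), (1, 51);  -- Ben has no books
""")

rows = conn.execute("""
    SELECT s.speaker_id, s.speaker_name, p.program_id, p.program_name,
           group_concat(b.book_name, ', ') AS book_names
    FROM programs p
    JOIN speakers s ON p.program_id = s.program_id
    LEFT JOIN speaker_books sb ON sb.speaker_id = s.speaker_id
    LEFT JOIN books b ON b.book_id = sb.book_id
    WHERE p.category_id = 1
    GROUP BY s.speaker_id, s.speaker_name, p.program_id, p.program_name
    ORDER BY s.speaker_id
""").fetchall()

for row in rows:
    print(row)  # one row per speaker; Ada's book_names lists both books
```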
That's typically efficient. There is some kind of assumption you are making that isn't true. Do your speakers have books assigned? If they don't that last JOIN should be a LEFT JOIN.
This kind of query is typically pretty efficient, since you almost certainly have primary keys as indexes. The main issue would be whether your indexes are covering (which is more likely to occur if you don't use SELECT *, but instead select only the columns you need).
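One way to see the covering-index effect is to compare query plans. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module (an assumption; in MySQL the equivalent hint is "Using index" in the Extra column of EXPLAIN). The composite index name is hypothetical. Selecting only indexed columns lets SQLite answer from the index alone (SQLite indexes always carry the rowid, which an INTEGER PRIMARY KEY aliases), while SELECT * needs speaker_name and must go back to the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE speakers (speaker_id INTEGER PRIMARY KEY, speaker_name TEXT,
                           program_id INTEGER, book_id INTEGER);
    -- Hypothetical composite index: program_id for the lookup, book_id carried along.
    CREATE INDEX idx_speakers_program_book ON speakers (program_id, book_id);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

narrow = plan("SELECT speaker_id, book_id FROM speakers WHERE program_id = 1")
wide = plan("SELECT * FROM speakers WHERE program_id = 1")
print(narrow)  # should mention a COVERING INDEX
print(wide)    # uses the index, but not as a covering index
```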