I am relatively new to the SQL programming, so please go easy on me.
I am currently writing a query, which would output the result based on the value from one of the outer parameters. The structure is currently looking like following:
#ShowEntireCategory bit = 0
select distinct
p.pk
p.name
--other columns
from dbo.Project P
--bunch of left joins
where p.Status = 'Open'
--other conditions
What I am trying to implement is: when the value of ShowEntireCategory is 1 (changed programmatically through radiobutton selection) it will show records of all subcategories, which are inside of the the category. When it is 0, it will only show records from the selected subcategory, while other subcategories in that category remains untouched.
I have been performing a research on the best approach, and it narrowed down to either WHERE statements or JOINs.
What I want to know is: which of these approaches I should use for my scenario? In my case the priority is optimization (minimum time to execute) and ease of implementation.
NOTE: My main goal here is not to receive a ready to use code here (though an example code snippets would be welcome), I just want to know a better approach, so I can continue researching in that direction.
Thank you in advance!
UPDATE
I have performed additional research on the database structure, and managed to narrow down to parameters relevant to the question
One is dbo.Project table, which contains: PK, CategoryKey (FK) (connected to the one in second table), Name, Description, and all other parameters which are irrelevant.
Second one is dbo.Area table, which contains: PK, AreaNumber, Name, CategoryKey (FK), IsCategory (1 = is category, 0 = not category).
Sorry, but I work in fast-paced environment, this is as much as I was able to squeeze. Please let me know if it is not enough.
With the information you provided the best solution would be to use a combination of WHERE clauses and JOINS. You would likely need to use a WHERE clause on the second table (described in the update) to select all rows which are categories. Then, you would JOIN this result with your other tables/data. Finally, you can use a CASE clause (details found here) to check your variable and determine if you need all categories or just some (which can be dealt with through an additional WHERE clause).
Not sure this entirely answers your question, but for a more detailed answer we would need a more detailed description of the database schema.
Related
I'm preparing for an exam in databases and SQL and I'm solving an exercise:
We have a database of 4 tables that represent a human resources company. The tables are:
applicant(a-id,a-name,a-city,years-of-study),
job(job-name,job-id),
qualified(a-id,job-id)
wish(a-id,job-id).
the table applicant represents the table of applicants obviously. And jobs is the table of available jobs. the table qualified shows what jobs a person is qualified for, and the table wish shows what jobs a person is interested in.
The question was to write a query that displays for each job-id, the number of applicants that are both qualified and interested to work in.
Here is the solution the teacher wrote:
Select q1.job_id
, count(q1.a_id)
from qualified as q1
, wish as w1
Where q1.a_id = w1.a_id
and q1.job_id = w1.job_id
Group by job_id;
That's all well and good, I'm not sure why we needed that "as q1" and "as w1", but i can see why it works.
And here is the solution I wrote:
SELECT job-id,COUNT(a-id) FROM job,qualified,wish WHERE (qualified.a-id=wish.a-id)
GROUP BY job-id
Why is my solution wrong? And also - From which table will it select the information? Suppose I write SELECT job-id FROM job,qualified,wish. From which table will it take the information? because job-id exists in all 3 of these tables.
You can only refer to tables mentioned in the FROM clause. If it's ambiguous (because more than one has a column of the same name) then you need to be explicit by qualifying the name. Usually the qualifier is an alias but it could also be the table name itself if an alias wasn't specified.
There's a concept of a "natural join" which joins tables on common column(s) between two tables. Not all systems support that notation but I think MySQL does. I believe these systems usually collapse the joined pairs into a single column.
select q1.job_id, count(q1.a_id) from qualified as q1, wish as w1
where q1.a_id = w1.a_id and q1.job_id = w1.job_id
group by job_id;
I don't think I've worked on any systems that would have accepted the query above because the grouping column would have been strictly unclear even though the intention really is not. So if it truly does work correctly on MySQL then my guess is that it recognizes the equivalence of the columns and cuts you some slack on the syntax.
By the way, your query appears to be incorrect because you only included a single column in a join that requires two columns. You also included a third table which means that your result will effectively do a cross join of every row in that table. The grouping is going to still going to reduce it to one row per job_id but the count is going to be multiplied by the number of rows in the job table. Perhaps you added that table thinking it would hurt to add it just in case you need it but that is not what it means at all.
Your query will list non-existing jobs in case the database has orphan records in applicant and qualified, and might also omit jobs that have no qualified and willing candidates.
I'm not exactly sure, because I have no idea if there's any database that will accept COUNT(a-id) when there's no information about the table from which to take this value.
edit: Interestingly it looks like both of these problems are shared by both of the solutions, but shawnt00 has a point: your solution makes a huge pointless cartesian of three tables: see it without the group by.
My current best guess for a working answer would therefore be http://sqlfiddle.com/#!9/09d0c/6
What methods are recommended for Selecting multiple columns within a nested subquery? It's been a while since I've coded any queries and I'm having some difficulty wrapping my head around this. The specific challenge is on Line 2 of the code below. The IN operand doesn't quite work here (see error message below), and I'm not sure if it's simply a matter of the syntax I'm using, and/or there is a much better way to go about this (i.e. using the HAVING operand or a JOIN statement)
SELECT * FROM Rules WHERE Rules.LNRule_id
IN(SELECT LNRule_id1,LNRule_id2,LNRule_id3,LNRule_id4 FROM Silhouette
WHERE Silhouette.Silhouette_Skirt=(SELECT Silhouette_Skirt FROM Style
WHERE Style.Style_Skirt='$Style_Skirt')
)
The purpose of this query is to SELECT all the relevant rows in table Rules for a particular value in table Style (i.e. $Style_Skirt), which it does by matching it to one of several factors - in this case the garment's Silhouette. What I am therefore trying to do in this portion of the query is SELECT all rows in table Rules who's ID (LNRule_id) matches values in any of the specified columns in table Silhouette
(SELECT LNRule_id1,LNRule_id2,LNRule_id3,LNRule_id4 FROM Silhouette WHERE Silhouette.Silhouette_Skirt=(...))
Edit
There is a many-to-many relationship (each Silhouette has several applicable Rules, and each Rule can apply to several Silhouettes). All the rules reside in the table 'Rules' (one per row), and each rule has an id ('LNRule_id'). The table 'Silhouette' has columns which tell it which rows need to be called from 'Rules' by 'LNRule_id' (LNRule_id1,2,3,4 indicate which Rules should be called, and store the values of the id's for the relevant rows in table 'Rules')
The error message currently being generated by the IN Operand is:
SQLSTATE[21000]: Cardinality violation: 1241 Operand should contain 1
column(s)
I think you want this query
SELECT * FROM Rules r
JOIN Silhouette s
(ON r.LNRule_id=s.LNRule_id1
OR r.LNRule_id=s.LNRule_id2
OR r.LNRule_id=s.LNRule_id3
OR r.LNRule_id=s.LNRule_id4)
JOIN Style st
ON s.Silhouette_Skirt=st.Silhouette_Skirt
WHERE st.Silhouette_Skirt = '$Style_Skirt'
mysql is complaining that on one side of IN you have a single column, and on the other side you have a multi-column rowset. In order for the IN operator to work, the rowset on the right side of IN must have the exact same number of columns as the left side; in this case, one column.
What you are trying to accomplish could perhaps be achieved if you did something like WHERE LNRule_id IN( SELECT LNRule_Id1 ...) OR LNRule_id IN( SELECT LNRule_Id2 ...) OR ... OR ... but the resulting query would be a monstrosity, and its performance would be horrendous. There may be other ways to go about it too, but anything you try will probably be similarly atrocious.
I do not have enough information to be absolutely sure about what I am saying, but it seems to me that the reason why you have this problem is that your database schema is not normalized. Generally, whenever you see a table with a group of columns having names that all begin with the same prefix and end with a number, it means that someone, somewhere, did not normalize their data.
To address the edit in your question, what you have implemented might conceptually be a many to many relationship, but as far as relational databases are concerned, (you know, the science, the theory, the established practices, the approaches necessary to get things to actually work,) it is definitely not a many to many relationship. Many to many relationships are most certainly not implemented with column1, column2, column3, ... columnN. To be sure that I am not making this stuff up, you can read what others say about many to many relationships here:
Many-to-many relations in RDBMS databases
So, my suggestion, if I correctly understand what is going on, would be to introduce a new table, called SilhouetteRules, which contains two columns, silhouette_id and rule_id. This table will implement a many-to-many relationship between silhouettes and your rules. Then of course you get rid of all the rule1, rule2, rule3, etc. columns from Silhouette.
Once you have done that, you can obtain all silhouettes and all rules associated with them using a query like this:
SELECT * FROM Silhouette
LEFT JOIN SilhouetteRules ON
Silhouette.id = SilhouetteRules.silhouette_id
LEFT JOIN Rules ON
SilhouetteRules.rule_id = Rules.id
The above query will yield multiple rows for each silhouette, where the silhouette fields will be identical from row to row, and only the rule fields will differ. Do not be surprised by this, that's how relational databases work.
Given a given_silhouette_id, you can retrieve all rules associated with it using a query like this:
SELECT * FROM Rules
LEFT JOIN SilhouetteRules ON
Rules.id = SilhouetteRules.rule_id
WHERE
SilhouetteRules.silhouette_id = given_silhouette_id
So, you are going to be using this query as a subquery in queries like the one in the question.
Now, regarding the query in the question, I am unable to tell you exactly how you would need to modify it to get it to work with the normalization that I proposed, because I cannot make sense of it. You see, even if you fix the problem that you currently have with SELECT * FROM table WHERE single-column IN multi-column-rowset, there is another problem further down: the WHERE Silhouette.Silhouette_Skirt=(SELECT ... part would not work either, because you cannot compare the value of a column against the result of a select statement. So, I do not know what you are trying to do there. Hopefully, once you normalize your schema and fix the first problem with your query, then the solution to the second problem will become obvious, or you can ask another question on stackoverflow.
P.S. did Mihai's answer work?
On the project I'm working on we have an activity table and each activity can be linked to one of about 20 different "activity details" tables...
e.g. If the activity was of type "work", then it would have a corresponding activity_details_work record, if it was of type "sick leave" then it would have a corresponding activity_details_sickleave record and so on.
Currently we are loading the activities and then for each activity we have a separate query to go fetch the activity details from the relevant table. This obviously doesn't scale well if you have thousands of activities.
So my initial thought was to have a single query which fetches the activities and joins the details in one go e.g.
SELECT * FROM activity
LEFT JOIN activity_details_1_work ON ...
LEFT JOIN activity_details_2_sickleave ON ...
LEFT JOIN activity_details_3_travelwork ON ...
...etc...
LEFT JOIN activity_details_20_yearleave ON ...
But this will result in each record having 100's of fields, most of which are empty and that feels nasty.
Lazy-loading the details isn't really an option either as the details are almost always requested in the core logic, at least for the main types anyway.
Is there a super clever way of doing this that I'm not thinking of?
Thanks in advance
My suggestion is to define a view for each ActivityType, that is tailored specifically to that activity.
Then add an index on the Activity table lead by the ActivityType field. Cluster said index unless there is an overwhelming need for some other to be clustered (or performance benchmarking shows some other clustering selection to be more performant).
Is there a particular reason why this degree of denormalization was designed in? Is that reason well known?
Chances are your activity tables are like (date_from, date_to, with_who, descr) or something to that effect. As Pieter suggested, consider tossing in a type varchar or enum field in there, so as to deal with a single details table.
If there are rational reasons to keep the tables apart, consider adding triggers that maintain boolean/tinyint fields (has_work, has_sickleave, etc), or a bit string (has_activites_of_type where the first position amounts to has_work, the next to has_sickleave, etc.).
Either way, you'll probably be better off by fetching the activity's details in one or more separate queries -- if only to avoid field name collisions.
I don't think enum is the way to go, because as you say there might be 1000's of activities, then altering your activity table would become an issue.
There is no point doing a left join on a large number of tables either.
So the options that you have are :
See this The first comment might be useful.
I am guessing that your activity table has a field called activity_type_id.
Build a table called activity_types containing fields activity_type_id, activity_name, activity_details_table_name. First query in the following way
activity
inner join
activity_types
using( activity_type_id )
This query gives you the table name on which to query for the details.
This way you can add any new activity type just by adding a row in the activity_types table.
I am developing a forum in PHP MySQL. I want to make my forum as efficient as I can.
I have made these two tables
tbl_threads
tbl_comments
Now, the problems is that there is a like and dislike button under the each comment. I have to store the user_name which has clicked the Like or Dislike Button with the comment_id. I have made a column user_likes and a column user_dislikes in tbl_comments to store the comma separated user_names. But on this forum, I have read that this is not an efficient way. I have been advised to create a third table to store the Likes and Dislikes and to comply my database design with 1NF.
But the problem is, If I make a third table tbl_user_opinion and make two fields like this
1. comment_id
2. type (like or dislike)
So, will I have to run as many sql queries as there are comments on my page to get the like and dislike data for each comment. Will it not inefficient. I think there is some confusion on my part here. Can some one clarify this.
You have a Relational Scheme like this:
There are two ways to solve this. The first one, the "clean" one is to build your "like" table, and do "count(*)'s" on the appropriate column.
The second one would be to store in each comment a counter, indicating how many up's and down's have been there.
If you want to check, if a specific user has voted on the comment, you only have to check one entry, wich you can easily handle as own query and merge them two outside of your database (for this use a query resulting in comment_id and kind of the vote the user has done in a specific thread.)
Your approach with a comma-seperated-list is not quite performant, due you cannot parse it without higher intelligence, or a huge amount of parsing strings. If you have a database - use it!
("One Information - One Dataset"!)
The comma-separate list violates the principle of atomicity, and therefore the 1NF. You'll have hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
COMMENT.COMMENT_ID,
<other COMMENT fields>,
COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) SCORE
FROM COMMENT
LEFT JOIN UP_VOTE
ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE
ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
COMMENT.COMMENT_ID = <whatever>
GROUP BY
COMMENT.COMMENT_ID,
<other COMMENT fields>;
[SQL Fiddle]
Please measure on realistic amounts of data if that works fast enough for you. If not, then denormalize the model and cache the total score in the COMMENT table, and keep it current it through triggers every time a new row is inserted to or deleted from *_VOTE tables.
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above.1
1 This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.
Is there a way that I can do a select as such
select * from attributes where product_id = 500
would return
id name description
1 wheel round and black
2 horn makes loud noise
3 window solid object you can see through
and the query
select * from attributes where product_id = 234
would return the same results as would any query to this table.
Now obviously I could just remove the where clause and go about my day. But this involves editing code that I don't really want to modify so i'm trying to fix this at the database level.
So is there a "magical" way to ignore what is in the where clause and return whatever I want using a view or something ?
Even if it was possible, I doubt it would work. Both of those WHERE clauses expect one thing to be returned, therefore the code would probably just use the first row returned, not all of them.
It would also give the database a behaviour that would make future developers pull their hair out trying to understand.
Do it properly and fix the code.
or you could pass "product_id" instead of an integer, if there's no code checking for that...so the query would become:
select * from attributes where product_id = product_id;
this would give you every row in the table.
If you can't edit the query, maybe you can append to it? You could stick
OR 1=1
on the end.
You may be able to use result set metadata to get what you want, but a result set won't have descriptions of fields. The specific API to get result set metadata from a prepared query varies by programming language, and you haven't said what language you're using.
You can query the INFORMATION_SCHEMA for the products table.
SELECT ordinal_position, column_name, column_comment
FROM INFORMATION_SCHEMA.columns
WHERE table_name = 'products' AND schema_name = 'mydatabase';
You can restructure the database into an Entity-Attribute-Value design, but that's a much more ambitious change than fixing your code.
Or you can abandon SQL databases altogether, and use a semantic data store like RDF, which allows you to query metadata of an entity in the same way you query data.
As far out as this idea seems I'm always interested in crazy ways to do things.
I think the best solution I could come up with is to use a view that uses the products table to get all the products then the attributes table to get the attributes, so every possible product is accounted for and all will get the same result