Conditionally select column values sql - mysql

I've got an SQL table, homes, like this:
+------------------------+-------------------+-------------------+---------------------+
| id | home_one | home_two | test_val |
+------------------------+-------------------+-------------------+---------------------+
| q6KPfv2bsnZTEdiK6McPn4 | 4214 | 1234 (*) | 8 |
| kTEHH6QA9wGGSnFDENeWHk | 6431 | 0251 | 5 |
| fjLrUzp16vKaDWYMHoyvKQ | 1234 (*) | 5381 | 8 |
| hn89YvsayDWEYziv4jZBnR | 8241 | 1682 | 4 |
| wK5QdX54A2z6uH7SKkHiao | 1234 (*) | 9375 | 8 |
+------------------------+-------------------+-------------------+---------------------+
I'd like to filter on a condition such as
SELECT home_one, home_two FROM HOMES
WHERE test_val = 8
although I don't really want to get both home_one and home_two, but I only want to get either home_one, or home_two, depending on which one does not equal '1234'.
So my ideal output would be something like:
+---------+
| results |
+---------+
| 4214 |
| 5381 |
| 9375 |
+---------+
I realize this could be done in the server logic instead, but I figure if there's a way to do this in SQL, that'd be nice, since the less server logic necessary, the less strain there will be on the server.
If there's some sort of command like this that can conditionally choose which column value to take for each row that would be what I'm looking for!
For reference, too, something like this is what would be used to make the homes table:
CREATE TABLE homes(
id character varying(24) NOT NULL,
home_one character varying(24) NOT NULL,
home_two character varying(24) NOT NULL,
PRIMARY KEY (id)
);

Use a CASE expression in the SELECT list to pick the column, and add the logic you mentioned to the WHERE clause:
SELECT CASE WHEN home_one <> '1234' THEN home_one ELSE home_two END AS home
FROM HOMES
WHERE test_val = 8 AND
(home_one = '1234' AND home_two <> '1234' OR
home_one <> '1234' AND home_two = '1234');
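To see the CASE expression at work, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database (used only as a stand-in for MySQL; the column types are assumptions) and runs the same query, with explicit parentheses around the two alternatives and an ORDER BY added so the output order is stable.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; CASE ... END is standard SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE homes (id TEXT PRIMARY KEY, home_one TEXT NOT NULL, "
    "home_two TEXT NOT NULL, test_val INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO homes VALUES (?, ?, ?, ?)",
    [
        ("q6KPfv2bsnZTEdiK6McPn4", "4214", "1234", 8),
        ("kTEHH6QA9wGGSnFDENeWHk", "6431", "0251", 5),
        ("fjLrUzp16vKaDWYMHoyvKQ", "1234", "5381", 8),
        ("hn89YvsayDWEYziv4jZBnR", "8241", "1682", 4),
        ("wK5QdX54A2z6uH7SKkHiao", "1234", "9375", 8),
    ],
)

# Pick whichever of the two columns is not '1234', but only for rows
# where exactly one of them is '1234'.
rows = conn.execute(
    """
    SELECT CASE WHEN home_one <> '1234' THEN home_one ELSE home_two END AS home
    FROM homes
    WHERE test_val = 8
      AND ((home_one = '1234' AND home_two <> '1234')
        OR (home_one <> '1234' AND home_two = '1234'))
    ORDER BY home
    """
).fetchall()
results = [row[0] for row in rows]
print(results)
```

The WHERE clause keeps only rows where exactly one of the two columns equals '1234', so the CASE never has to choose between two non-matching values.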

Related

How can I refine this query?

You might want to have a look at my previous question.
My database schema looks like this
--------------- ---------------
| candidate 1 | | candidate 2 |
--------------- \ --------------
/ \ |
------- -------- etc
|job 1| | job 2 |
------- ---------
/ \ / \
--------- --------- --------- --------
|company | | skills | |company | | skills |
--------- --------- ---------- ----------
Here's my database:
mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+----------------+
| job_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_id | int(11) | NO | MUL | NULL | |
| company_id | int(11) | NO | MUL | NULL | |
| start_date | date | NO | MUL | NULL | |
| end_date | date | NO | MUL | NULL | |
+--------------+---------+------+-----+---------+----------------+
.
mysql> describe candidates;
+----------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------+------+-----+---------+----------------+
| candidate_id | int(11) | NO | PRI | NULL | auto_increment |
| candidate_name | char(50) | NO | MUL | NULL | |
| home_city | char(50) | NO | MUL | NULL | |
+----------------+----------+------+-----+---------+----------------+
.
mysql> describe companies;
+-------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------+------+-----+---------+----------------+
| company_id | int(11) | NO | PRI | NULL | auto_increment |
| company_name | char(50) | NO | MUL | NULL | |
| company_city | char(50) | NO | MUL | NULL | |
| company_post_code | char(50) | NO | | NULL | |
| latitude | decimal(11,8) | NO | | NULL | |
| longitude | decimal(11,8) | NO | | NULL | |
+-------------------+---------------+------+-----+---------+----------------+
.
Note that I should probably call this skill_usage, as it indicates when a skill was used on a job.
mysql> describe skills;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO | MUL | NULL | |
| job_id | int(11) | NO | MUL | NULL | |
+----------+---------+------+-----+---------+-------+
.
mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| skill_id | int(11) | NO | PRI | NULL | auto_increment |
| skill_name | char(32) | NO | MUL | NULL | |
+------------+----------+------+-----+---------+----------------+
So far, my MySQL query looks like this:
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id
AND sn.skill_id = s.skill_id
ORDER by can.candidate_id, j.job_id
I am getting output like this, but am not satisfied with it
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| candidate_id | candidate_name | candidate_city | job_id | company_id | start_date | end_date | skill_id |
+--------------+----------------+---------------------+--------+------------+------------+------------+----------+
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 2 |
| 1 | Pamela Brown | Cardiff | 1 | 3 | 2019-01-01 | 2019-08-31 | 1 |
| 1 | Pamela Brown | Cardiff | 2 | 2 | 2018-06-01 | 2019-01-31 | 3 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 4 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 5 |
| 1 | Pamela Brown | Cardiff | 3 | 1 | 2017-11-01 | 2018-06-30 | 6 |
| 1 | Pamela Brown | Cardiff | 4 | 3 | 2016-08-01 | 2017-11-30 | 1 |
| 2 | Christine Hill | Salisbury | 5 | 2 | 2018-02-01 | 2019-05-31 | 3 |
Now I would like to restrict the search by specifying skills (such as Python, C, C++, UML, etc.) and company names.
The user will enter something like Python AND C++ into a skill search box (and/or Microsoft OR Google into a company name search box).
How do I feed that into my query? Please bear in mind that each skill ID has a job Id associated with it. Maybe I first need to convert the skill names from the search (in this case Python C++) into skill Ids? Even so, how do I include that in my query?
To make a few things clearer:
both the skills & company search box can be empty, which I will interpret as "return everything"
search terms can include the keywords AND and OR, with grouping brackets (NOT is not required). I am happy enough to parse that in PHP & turn it into a MySQL query term (my difficulty is only with SQL, not PHP)
It looks like I made a start, with that INNER JOIN skills AS s ON s.job_id = j.job_id, which I think will handle a search for a single skill, given its ... name ? ... Id?
I suppose my question would be how would that query look if, for example, I wanted to restrict the results to anyone who had worked at Microsoft OR Google and has the skills Python AND C++?
If I get an example for that, I can extrapolate, but, at this point, I am unsure whether I want more INNER JOINs or WHERE clauses.
I think that I want to extend that second-to-last line, AND sn.skill_id = s.skill_id, by parsing the skills search string (in my example, Python AND C++) and generating some SQL along the lines of AND (s.skill_id = X), where X is the skill Id for Python. But I don't know how to handle Python AND C++, or something more complex, like Python AND (C OR C++) ...
Update
Just to be clear, the users are technical and expect to be able to enter complex searches. E.g for skills: (C AND kernel)OR (C++ AND realtime) OR (Doors AND (UML OR QT)).
Final update
The requirements just changed. The person that I am coding this for just told me that if a candidate matches the skill search on any job that he ever worked, then I ought to return ALL jobs for that candidate.
That sounds counter-intuitive to me, but he swears that that is what he wants. I am not sure it can even be done in a single query (I am considering multiple queries: a first to get the candidates with matching skills, then a second to get all of their jobs).
The first thing I'd say is that your original query probably needs an outer join on the skills table - as it stands, it only retrieves people whose job has a skill (which may not be all jobs). You say that "both the skills & company search box can be empty, which I will interpret as return everything" - this version of the query will not return everything.
Secondly, I'd rename your "skills" table to "job_skills", and your "skill_names" to "skills" - it's more consistent (your companies table is not called company_names).
The query you show has a duplication - AND sn.skill_id = s.skill_id duplicates the terms of your join. Is that intentional?
To answer your question: I would present the skills to your users in some kind of pre-defined list in your PHP, associated with a skill_id. You could have all skills listed with check boxes, or allow the user to start typing and use AJAX to search for skills matching the text. This solves a UI problem (what if the user tries to search for a skill that doesn't exist?), and makes the SQL slightly easier.
Your query then becomes:
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON sn.skill_id = s.skill_id
AND (s.skill_id in (?, ?, ?)
OR s.skill_id in (?))
ORDER by can.candidate_id, j.job_id
You need to replace the question marks with the skill IDs your users have selected, ideally by binding them as query parameters.
EDIT
The problem with allowing users to enter the skills as free text is that you then have to deal with case conversion, white space and typos. For instance, is "python " the same as "Python"? Your user probably intends it to be, but you can't do a simple comparison with skill_name. If you want to allow free text, one solution might be to add a "normalized" skill_name column in which you store the name in a consistent format (e.g. "all upper case, stripped of whitespace"), and you normalize your input values in the same way, then compare to that normalized column. In that case, the "in clause" becomes something like:
AND s.skill_id in (select skill_id from skill_names where skill_name_normalized in (?, ?, ?))
The boolean logic you mention - (C OR C++) AND (Agile) - gets pretty tricky. You end up writing a "visual query builder". You may want to Google this term - there are some good examples.
You've narrowed down your requirements somewhat (I may misunderstand). I believe your requirements are
I want to be able to specify zero or more filters.
A filter consists of one or more ANDed skill groups.
A skill group consists of one or more skills.
Filters are ORed together to create a query.
To make this concrete, let's use your example - (A and (B OR C)) OR (D AND (E OR F)). There are two filters: (A and (B OR C)) and (D AND (E OR F)). The first filter has two skill groups: A and (B OR C).
It's hard to explain the suggestion in text, but you could create a UI that allows users to specify individual "filters". Each "filter" would allow the user to specify one or more "in clauses", joined with an "and". You could then convert this into SQL - again, using your example, the SQL query becomes
SELECT DISTINCT can.candidate_id,
can.candidate_name,
can.candidate_city,
j.job_id,
j.company_id,
DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
s.skill_id
FROM candidates AS can
INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
INNER JOIN companies AS co ON j.company_id = co.company_id
INNER JOIN skills AS s ON s.job_id = j.job_id
INNER JOIN skill_names AS sn ON sn.skill_id = s.skill_id
AND (
(s.skill_id in (A) and s.skill_id in (B, C))
OR
(s.skill_id in (D) and s.skill_id in (E, F))
)
ORDER by can.candidate_id, j.job_id
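One caveat with ANDed skill groups: each joined row carries a single skill_id, so a test such as "skill_id in (A) and skill_id in (B, C)" applied to one row cannot be true when A is neither B nor C. A common alternative is one EXISTS subquery per skill group, each checked against the job as a whole. The sketch below is only an illustration under assumptions: it uses in-memory SQLite as a stand-in for MySQL, trimmed-down versions of the jobs and skills tables, and made-up data where skill IDs 1, 2, 3 play the roles of A, B, C. The second query covers the "final update" requirement by returning all jobs of any candidate who matches on at least one job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, candidate_id INTEGER NOT NULL);
CREATE TABLE skills (skill_id INTEGER NOT NULL, job_id INTEGER NOT NULL);
-- Hypothetical data: candidate 1 has jobs 1 and 2, candidate 2 has job 3.
INSERT INTO jobs VALUES (1, 1), (2, 1), (3, 2);
-- Job 1 uses skills 1 and 2, job 2 uses skill 1, job 3 uses skills 2 and 3.
INSERT INTO skills VALUES (1, 1), (2, 1), (1, 2), (2, 3), (3, 3);
""")

# A AND (B OR C): one EXISTS per skill group, evaluated per job.
matching_jobs_sql = """
SELECT j.job_id
FROM jobs AS j
WHERE EXISTS (SELECT 1 FROM skills AS s
              WHERE s.job_id = j.job_id AND s.skill_id IN (?))
  AND EXISTS (SELECT 1 FROM skills AS s
              WHERE s.job_id = j.job_id AND s.skill_id IN (?, ?))
ORDER BY j.job_id
"""
matching_jobs = [r[0] for r in conn.execute(matching_jobs_sql, (1, 2, 3))]

# All jobs of every candidate who matches the filter on at least one job.
all_jobs_sql = """
SELECT j.job_id
FROM jobs AS j
WHERE j.candidate_id IN (
    SELECT m.candidate_id
    FROM jobs AS m
    WHERE EXISTS (SELECT 1 FROM skills AS s
                  WHERE s.job_id = m.job_id AND s.skill_id IN (?))
      AND EXISTS (SELECT 1 FROM skills AS s
                  WHERE s.job_id = m.job_id AND s.skill_id IN (?, ?))
)
ORDER BY j.job_id
"""
all_jobs_of_matching_candidates = [r[0] for r in conn.execute(all_jobs_sql, (1, 2, 3))]

print(matching_jobs, all_jobs_of_matching_candidates)
```

Only job 1 has skill 1 together with skill 2 or 3, so the first query returns that job alone, while the second returns both of candidate 1's jobs.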
Building a bit off previous comments and answers... if handling input like
(A and (B OR C)) OR (D AND (E OR F))
is the blocker you could try moving some of the conditional logic out of the joins and filter instead.
WHERE (
((sn.skill_id LIKE 'A') AND ((sn.skill_id LIKE ('B')) OR (sn.skill_id LIKE('C'))))
AND ((co.company_id IN (1,2,3)) AND ((can.city = 'Springfield') OR (j.city LIKE('Mordor'))))
)
You can build your query string based on user input: look up the IDs for the selected values, put them into the string, and conditionally build as many filters as you like. Think about setting up add_and_filter and add_or_filter functions to construct the <db>.<field> <CONDITION> <VALUE> statements.
$qs = "";
$qs .= "select val from table";
...
$qs .= " WHERE ";
if ($userinput) { $qs .= add_and_filter($userinput); }
Alternatively, look at a map/reduce pattern rather than trying to do it all in SQL.
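The idea of building the filter from parsed input can be sketched as a small helper. This is written in Python rather than the PHP used above, and everything in it is an assumption for illustration: the function name build_skill_filter, the nested-list input shape (a list of filters, each a list of non-empty skill groups of skill IDs), and the aliases j and s for the jobs and skills tables. It emits placeholders and returns the values separately, so user input never ends up inside the SQL string.

```python
def build_skill_filter(filters):
    """Build a WHERE fragment plus bound parameters.

    filters: list of filters (ORed together); each filter is a list of
    skill groups (ANDed together); each group is a non-empty list of
    skill IDs (any one of which satisfies the group).
    """
    params = []
    or_terms = []
    for groups in filters:
        and_terms = []
        for group in groups:
            placeholders = ", ".join("?" for _ in group)
            and_terms.append(
                "EXISTS (SELECT 1 FROM skills AS s "
                "WHERE s.job_id = j.job_id "
                "AND s.skill_id IN (" + placeholders + "))"
            )
            params.extend(group)
        or_terms.append("(" + " AND ".join(and_terms) + ")")
    if not or_terms:
        # An empty search box means "return everything".
        return "1 = 1", []
    return "(" + " OR ".join(or_terms) + ")", params


# (A AND (B OR C)) OR (D AND (E OR F)) with hypothetical IDs 1..6.
where_sql, where_params = build_skill_filter([[[1], [2, 3]], [[4], [5, 6]]])
print(where_sql)
print(where_params)
```

The returned fragment can be appended after WHERE in a query that aliases the jobs table as j, with where_params passed to the database driver as bound values.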

Mysql: Order By with numerical value shows wrong order

I'm using Spring Boot 2.2.6.RELEASE. I have a repository method that looks like this:
@Query(value = "SELECT m FROM Media m ORDER BY m.viewCount DESC")
Page<Media> findMedias(Pageable pageable);
I get unordered result list with this. I tried to run the next query in the cli:
SELECT media.view_count FROM mydb.media ORDER BY media.view_count DESC;
The result looks like this:
--------------
| 9 |
| 8 |
| 7 |
| 6 |
| 5 |
| 4 |
| 3 |
| 3 |
| 20 |
| 19 |
| 18 |
| 17 |
| 16 |
| 15 |
| 13 |
| 12 |
| 12 |
| 11 |
| 10 |
| 1 |
| 1 |
--------------
I want the value 20 to come first, not 9. Why does MySQL order the rows this way, showing the single-digit values before the highest number?
EDIT:
I use sql file to create my tables. view_count column has LONG as type, not String. The query looks like this:
CREATE TABLE IF NOT EXISTS media(m_id INTEGER PRIMARY KEY AUTO_INCREMENT, title VARCHAR(255) NOT NULL, category VARCHAR(10) NOT NULL, file_name VARCHAR(255) NOT NULL, view_count LONG NOT NULL, download_count LONG NOT NULL);
The view_count column is stored as a string in your table, which is why the order looks wrong: in MySQL, LONG is a synonym for MEDIUMTEXT, not an integer type. If you can, change the column to an integer type. If you cannot, use the query below to get the desired output.
SELECT m.view_count
FROM media m
ORDER BY CAST(m.view_count AS UNSIGNED) DESC;
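The difference between string and numeric ordering is easy to reproduce. The sketch below uses an in-memory SQLite database as a stand-in for MySQL, with view_count declared as TEXT to mimic the string column and a handful of made-up values. SQLite has no UNSIGNED cast target, so CAST(... AS INTEGER) is used here in place of MySQL's CAST(... AS UNSIGNED).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# view_count is deliberately TEXT to mimic the string-typed column.
conn.execute(
    "CREATE TABLE media (m_id INTEGER PRIMARY KEY, view_count TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO media (view_count) VALUES (?)",
    [("9",), ("20",), ("3",), ("10",), ("1",)],
)

# String comparison: '9' sorts above '20' because '9' > '2' character-wise.
as_text = [
    r[0]
    for r in conn.execute("SELECT view_count FROM media ORDER BY view_count DESC")
]

# Numeric comparison after casting each value to an integer.
as_number = [
    r[0]
    for r in conn.execute(
        "SELECT CAST(view_count AS INTEGER) FROM media "
        "ORDER BY CAST(view_count AS INTEGER) DESC"
    )
]

print(as_text)
print(as_number)
```

Casting in ORDER BY is a workaround; with an integer column the database can compare the values numerically without the cast.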

Get distinct results from several tables

I need to implement mysql query to calculate space used by user's mailbox.
A message thread may have multiple messages (reply, follow up) by 2 parties
(sender/recipient) and is tagged with one or more tags (Inbox, Sent etc.).
The following conditions have to be met:
a) user is either recipient OR author of the message;
b) message IS TAGGED by any of the tags: 1,2,3,4;
c) distinct records only, ie if the thread, containing messages is tagged with
more than one of the 4 tags (for example 1 and 4: Inbox and Sent) the calculation
is done on one tag only
I have tried the following query but I am not able to get distinct values - the
subject/body values are duplicated:
SELECT SUM(LENGTH(subject)+LENGTH(body)) AS sum
FROM om_msg_message omm
JOIN om_msg_index omi ON omm.mid = omi.mid
JOIN om_msg_tags_index omti ON omi.thread_id = omti.thread_id AND omti.uid = user_id
WHERE (omi.recipient = user_id OR omi.author = user_id) AND omti.tag_id IN (1,2,3,4)
GROUP BY omi.mid;
Structure of the tables:
om_msg_message - fields subject and body are the ones to be calculated
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| mid | int(10) unsigned | NO | PRI | NULL | auto_increment |
| subject | varchar(255) | NO | | NULL | |
| body | longtext | NO | | NULL | |
| timestamp | int(10) unsigned | NO | | NULL | |
| reply_to_mid | int(10) unsigned | NO | | 0 | |
+--------------+------------------+------+-----+---------+----------------+
om_msg_index
+-----+-----------+-----------+--------+--------+---------+
| mid | thread_id | recipient | author | is_new | deleted |
+-----+-----------+-----------+--------+--------+---------+
| 1 | 1 | 1392 | 1211 | 0 | 0 |
| 2 | 1 | 1211 | 1392 | 1 | 0 |
+-----+-----------+-----------+--------+--------+---------+
om_msg_tags_index
+--------+------+-----------+
| tag_id | uid | thread_id |
+--------+------+-----------+
| 1 | 1211 | 1 |
| 4 | 1211 | 1 |
| 1 | 1392 | 1 |
| 4 | 1392 | 1 |
+--------+------+-----------+
Here's another solution:
SELECT SUM(LENGTH(omm.subject) + LENGTH(omm.body)) as totalLength
FROM om_msg_message omm
JOIN om_msg_index omi
ON omi.mid = omm.mid
AND (omi.recipient = user_id OR omi.author = user_id)
JOIN (SELECT DISTINCT thread_id
FROM om_msg_tags_index
WHERE uid = user_id
AND tag_id IN (1, 2, 3, 4)) omti
ON omti.thread_id = omi.thread_id
I'm assuming that:
user_id is a parameter marker/host variable, being queried for an individual user.
You want the total of all messages per user, not the total length of each message (which is what the GROUP BY clause in your version was getting you).
That mid in both om_msg_message and om_msg_index is unique.
So, your problem is the IN clause. I'm not a MySQL guru, but in T-SQL you could change it to a WHERE clause with an EXISTS subquery, so that your join doesn't produce two rows. You need to compensate for the fact that two rows with different tag IDs are associated with each row of your primary join data.
The way I could do it cross-platform would be with four left-joins that linked tables then demanded a non-null value for 1, 2, 3, or 4. Fairly inefficient; I'm sure there's a better way to do it in MySQL, but now that you know what the problem is you might know a better solution.
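The double counting and the EXISTS fix can be reproduced with the sample rows from the question. This is a sketch under assumptions: in-memory SQLite stands in for MySQL, the subject and body texts are made up (7 characters per message in total), and user_id is bound as the named parameter :uid.

```python
import sqlite3

USER_ID = 1211

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE om_msg_message (mid INTEGER PRIMARY KEY, subject TEXT NOT NULL, body TEXT NOT NULL);
CREATE TABLE om_msg_index (mid INTEGER, thread_id INTEGER, recipient INTEGER, author INTEGER);
CREATE TABLE om_msg_tags_index (tag_id INTEGER, uid INTEGER, thread_id INTEGER);
-- Hypothetical message texts: 2 + 5 = 7 characters each.
INSERT INTO om_msg_message VALUES (1, 'Hi', 'Hello'), (2, 'Re', 'Howdy');
INSERT INTO om_msg_index VALUES (1, 1, 1392, 1211), (2, 1, 1211, 1392);
INSERT INTO om_msg_tags_index VALUES (1, 1211, 1), (4, 1211, 1), (1, 1392, 1), (4, 1392, 1);
""")

# Joining the tag table directly repeats each message once per matching tag.
joined_sql = """
SELECT SUM(LENGTH(omm.subject) + LENGTH(omm.body))
FROM om_msg_message omm
JOIN om_msg_index omi ON omi.mid = omm.mid
JOIN om_msg_tags_index omti
  ON omti.thread_id = omi.thread_id AND omti.uid = :uid
WHERE (omi.recipient = :uid OR omi.author = :uid)
  AND omti.tag_id IN (1, 2, 3, 4)
"""

# EXISTS only asks whether at least one matching tag row is present.
exists_sql = """
SELECT SUM(LENGTH(omm.subject) + LENGTH(omm.body))
FROM om_msg_message omm
JOIN om_msg_index omi ON omi.mid = omm.mid
WHERE (omi.recipient = :uid OR omi.author = :uid)
  AND EXISTS (SELECT 1
              FROM om_msg_tags_index omti
              WHERE omti.thread_id = omi.thread_id
                AND omti.uid = :uid
                AND omti.tag_id IN (1, 2, 3, 4))
"""

double_counted = conn.execute(joined_sql, {"uid": USER_ID}).fetchone()[0]
counted_once = conn.execute(exists_sql, {"uid": USER_ID}).fetchone()[0]
print(double_counted, counted_once)
```

Because the thread carries two tags for this user, the plain join counts every message twice, while the EXISTS version counts each message once.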

SQL algorithm to as near to linear time as possible and tweaking of select statement

I am using MySQL version 5.5 on Ubuntu.
My database tables are setup as follows:
DDLs:
CREATE TABLE `asx` (
`code` char(3) NOT NULL,
`high` decimal(9,3),
`low` decimal(9,3),
`close` decimal(9,3),
`histID` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`histID`),
UNIQUE KEY `code` (`code`)
);
CREATE TABLE `asxhist` (
`date` date NOT NULL,
`average` decimal(9,3),
`histID` int(11) NOT NULL,
PRIMARY KEY (`date`,`histID`),
KEY `histID` (`histID`),
CONSTRAINT `asxhist_ibfk_1` FOREIGN KEY (`histID`) REFERENCES `asx` (`histID`)
ON UPDATE CASCADE
);
t1:
| code | high | low | close | histID (primary key)|
| asx | 10.000 | 9.500 | 9.800 | 1
| nab | 42.000 | 41.250 | 41.350 | 2
t2:
| date | average | histID (foreign key) |
| 2013-01-01| 10.000 | 1 |
| 2013-01-01| 39.000 | 2 |
| 2013-01-02| 9.000 | 1 |
| 2013-01-02| 38.000 | 2 |
| 2013-01-03| 9.500 | 1 |
| 2013-01-03| 39.500 | 2 |
| 2013-01-04| 11.000 | 1 |
| 2013-01-04| 38.500 | 2 |
I am attempting to complete a select query that produces this as a result:
| code | high | low | close | asxhist.average |
| asx | 10.000 | 9.500 | 9.800 | 11.000, 9.5000 |
| nab | 42.000 | 41.250 | 41.350 | 38.500,39.500 |
Where the most recent information in table 2 is returned with table 1 in a csv format.
I have managed to get this far:
SELECT code, high, low, close,
(SELECT GROUP_CONCAT(DISTINCT t2.average ORDER BY date DESC SEPARATOR ',') FROM t2
WHERE t2.histID = t1.histID)
FROM t1;
Unfortunately this returns all values associated with each histID. I'm taking a look at xaprb.com's firstleastmax-row-per-group-in-sql solution, but I have been banging my head all day and the slight wooziness seems to be dimming my ability to comprehend how I should use it to my benefit. How can I limit the results to the 5 most recent values and, considering the tables will eventually be megabytes in size, stay at O(n^2) or better? (Or can I?)
Temporary workaround using SUBSTRING_INDEX, which is not a feasible solution for huge data:
SELECT code, high, low, close,
(SELECT SUBSTRING_INDEX(GROUP_CONCAT(asxhist.average ORDER BY asxhist.date DESC), ',', 3)
FROM asxhist
WHERE asxhist.histID = asx.histID)
FROM asx;
From what I gather Limit option in GROUP_CONCAT is still under feature-request.
Also on stackoverflow hack MySQL GROUP_CONCAT
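One way to keep only the N most recent rows per histID without building the full concatenation first is to count, for every history row, how many newer rows exist for the same histID and keep the row when that count is below N. The sketch below is an illustration, not a benchmark: it runs against in-memory SQLite as a stand-in for MySQL, uses the sample data from the question with N = 2, and joins the values in Python. In MySQL the final step could instead stay in SQL with GROUP_CONCAT(h.average ORDER BY h.date DESC) and GROUP BY a.code. The cost of the correlated count depends heavily on an index covering (histID, date).

```python
import sqlite3

N = 2  # number of most recent averages to keep per code

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asx (code TEXT NOT NULL UNIQUE, histID INTEGER PRIMARY KEY);
CREATE TABLE asxhist (
    date TEXT NOT NULL,
    average REAL,
    histID INTEGER NOT NULL,
    PRIMARY KEY (date, histID)
);
INSERT INTO asx VALUES ('asx', 1), ('nab', 2);
INSERT INTO asxhist VALUES
    ('2013-01-01', 10.0, 1), ('2013-01-01', 39.0, 2),
    ('2013-01-02', 9.0, 1),  ('2013-01-02', 38.0, 2),
    ('2013-01-03', 9.5, 1),  ('2013-01-03', 39.5, 2),
    ('2013-01-04', 11.0, 1), ('2013-01-04', 38.5, 2);
""")

# Keep a history row only if fewer than N newer rows exist for its histID.
rows = conn.execute(
    """
    SELECT a.code, h.average
    FROM asx AS a
    JOIN asxhist AS h ON h.histID = a.histID
    WHERE (SELECT COUNT(*)
           FROM asxhist AS newer
           WHERE newer.histID = h.histID
             AND newer.date > h.date) < ?
    ORDER BY a.code, h.date DESC
    """,
    (N,),
).fetchall()

# Join the surviving values per code, newest first.
recent = {}
for code, average in rows:
    recent.setdefault(code, []).append(f"{average:.3f}")
result = {code: ",".join(values) for code, values in recent.items()}
print(result)
```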

MySQL query - only exact result or every choice

I've a query that I need some help with -
As part of a form I've got a serial number field that is populated if there is a serial number, blank if it's not, or no result if it's an invalid serial number.
select *
from cust_site_contract as cs
where cs.serial_no = 'C20050' or (cs.serial_no <> 'C20050' and if(cs.serial_no = 'C20050',1,0)=0)
limit 10;
Here's a sample of the regular data:
+----------------------+-----------+-----------+-----------
| idcust_site_contract | system_id | serial_no | end_date
+----------------------+-----------+-----------+-----------
| 561315 | SH001626 | C19244 | 2009-12-21
| 561316 | SH001626 | C19244 | 2010-06-30
| 561317 | SH002125 | C19671 | 2010-05-31
| 561318 | SH001766 | C14781 | 2010-09-25
| 561319 | SH001766 | C14781 | 2011-02-15
| 561320 | SH002059 | C19020 | 2008-07-09
| 561321 | SH002639 | C18889 | 2008-03-31
| 561322 | SH002639 | C18889 | 2008-06-30
| 561323 | SH002715 | C20051 | 2010-04-30
| 561324 | SH002719 | C20057 | 2010-04-30
And an exact result would look something like this:
| 561487 | SH002837 | C20050 | 2012-07-04
I was writing this as a subquery so I could match the system_ids to customer and contract names, but realised I was getting garbage pretty early on.
I'm tempted to try and simplify it by saying the third case might not hold true (i.e. if it's an invalid serial number, allow the choice of any customer name and simply flag it in the data)
Has anyone got any ideas of where I'm going wrong? The combination of conditions is clearly wrong, and I can't work out how to make each side of the or statement mutually exclusive
Even if I try to evaluate only the if(sn = 'blah') I get the wrong result for obvious reasons, but can't think of a sane way to express it.
Many thanks
Scott
If there is no contract with a serial number of C20050, this query will return all rows; otherwise, it will return only one row where serial_no is C20050:
SELECT a.*
FROM cust_site_contract a
INNER JOIN
(
SELECT COUNT(*) AS rowexists
FROM cust_site_contract
WHERE serial_no = 'C20050'
) b ON b.rowexists = 0
UNION ALL
(
SELECT *
FROM cust_site_contract
WHERE serial_no = 'C20050'
LIMIT 1
)
If you just write the query as below, you will get an empty result if the serial number doesn't exist or is invalid.
select cs.serial_no from cust_site_contract as cs where cs.serial_no = 'C20050'
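The "exact match, otherwise everything" behaviour can also be written as a single WHERE clause with NOT EXISTS. The sketch below is an alternative formulation, not the query from the answer above: it uses in-memory SQLite as a stand-in for MySQL, three rows taken from the sample data, a hypothetical helper named find_contracts, and it returns every matching row rather than applying LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust_site_contract ("
    "idcust_site_contract INTEGER PRIMARY KEY, system_id TEXT, serial_no TEXT)"
)
conn.executemany(
    "INSERT INTO cust_site_contract VALUES (?, ?, ?)",
    [
        (561323, "SH002715", "C20051"),
        (561324, "SH002719", "C20057"),
        (561487, "SH002837", "C20050"),
    ],
)


def find_contracts(connection, serial_no):
    """Return the rows for serial_no, or all rows if it matches nothing."""
    return connection.execute(
        """
        SELECT idcust_site_contract, system_id, serial_no
        FROM cust_site_contract
        WHERE serial_no = :sn
           OR NOT EXISTS (SELECT 1
                          FROM cust_site_contract
                          WHERE serial_no = :sn)
        ORDER BY idcust_site_contract
        """,
        {"sn": serial_no},
    ).fetchall()


exact = find_contracts(conn, "C20050")
fallback = find_contracts(conn, "NOT-A-SERIAL")
print(exact)
print(len(fallback))
```

The two branches are mutually exclusive by construction: when a matching row exists the NOT EXISTS term is false for every row, and when none exists the equality term is false for every row.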