I've asked a question here. Now I need the reverse output of it. I have the following table, My_Table, with the following structure:
Name | Address
----------------
Test1 | abc_123
Test1 | abc_456
Test2 | xyz_123
Test1 | xyz_123
Test2 | xyz_456
Test3 | abc_123
Test4 | xyz_123
Test1 | xyz_456
Test3 | xyz_123
Test4 | xyz_456
I need the following output:
Name
-------
Test2
Test4
I need to group by Name and select each Name for which none of the addresses has the prefix abc.
Output explanation in detail:
Test1 has the addresses abc_123, abc_456, xyz_123 and xyz_456, and at least one of them has the prefix abc. Hence it is not in the output.
Test2 has the addresses xyz_123 and xyz_456, and none of them has the prefix abc. Hence it is in the output.
x = select distinct(Name) from My_Table
y = select distinct(Name) from My_Table where Address like 'abc%'
Result = x - y
I was able to get the result using 2 queries (as shown above) and then subtracting one result set from the other, but can this be achieved in a single query?
Note: I'm using Java to query a MySQL DB, so I can subtract the result sets. My table is huge, so I want to minimize the number of queries!
One method uses aggregation and is similar to the other answer:
select name
from my_table
group by name
having sum(address like 'abc%') = 0;
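In MySQL a comparison such as address like 'abc%' evaluates to 1 (true) or 0 (false), so the sum() in the having clause counts the number of matching rows for each name. The = 0 says there are none for a given name.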
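A minimal sketch to check the HAVING query on the sample rows, assuming Python's built-in sqlite3 module as a stand-in for MySQL. SQLite also evaluates LIKE to 1 or 0, so SUM() counts the matches the same way; the lowercase table name is just a convention for the demo.

```python
import sqlite3

# Sample rows from My_Table in the question.
ROWS = [
    ("Test1", "abc_123"), ("Test1", "abc_456"), ("Test2", "xyz_123"),
    ("Test1", "xyz_123"), ("Test2", "xyz_456"), ("Test3", "abc_123"),
    ("Test4", "xyz_123"), ("Test1", "xyz_456"), ("Test3", "xyz_123"),
    ("Test4", "xyz_456"),
]

def names_without_abc(conn):
    """Return names for which no address starts with 'abc', in one query."""
    sql = """
        SELECT name
        FROM my_table
        GROUP BY name
        HAVING SUM(address LIKE 'abc%') = 0  -- LIKE yields 1 or 0 per row
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (name TEXT, address TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
result = names_without_abc(conn)
print(result)  # ['Test2', 'Test4']
```

Only one round trip is needed, which is the point of the question: the subtraction happens inside the GROUP BY / HAVING step instead of in Java.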
Table structure and data (I know data in IP/domain fields might not make much sense, but this is for illustration purposes):
rec_id | account_id | product_id | ip | domain | some_data
----------------------------------------------------------------------------
1 | 1 | 1 | 192.168.1.1 | 127.0.0.1/test | abc
2 | 1 | 1 | 192.168.1.1 | 127.0.0.1/other | xyz
3 | 1 | 1 | 192.168.1.2 | 127.0.0.1/test | ooo
The table has a unique index, ip_domain, on the combination of the ip and domain fields (so records with identical values in both fields can't exist).
In each case I know the values of the account_id, product_id, ip and domain fields, and I need to get the other rows that have the SAME account_id and product_id values but in which one (or both) of the ip and domain values are DIFFERENT.
Example: I know that account_id=1, product_id=1, ip=192.168.1.1, domain=127.0.0.1/test (so it matches rec_id 1), I need to select records with IDs 2 and 3 (because record 2 has different domain and record 3 has different ip).
So, I used query:
SELECT * FROM table WHERE
account_id='1' AND product_id='1' AND ip!='192.168.1.1' AND domain!='127.0.0.1/test'
Of course, it returned 0 rows. I looked at "mysql multiple where and where not in" and wrote:
SELECT * FROM table WHERE
account_id='1' AND product_id='1' AND ip NOT IN ('192.168.1.1') AND domain NOT IN ('127.0.0.1/test')
My guess is that this query is logically identical (just formatted a different way), so it returned 0 rows again. I found some more examples too, but none worked in my case.
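The syntax is correct, but you're using the wrong logical operator. You want every row except the known (ip, domain) pair, and NOT (ip = X AND domain = Y) is equivalent to ip != X OR domain != Y, so the two conditions must be joined with OR: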
SELECT *
FROM table
WHERE account_id='1' AND product_id='1' AND
(ip != '192.168.1.1' OR domain != '127.0.0.1/test')
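A quick sketch comparing the AND and OR versions on the three sample rows, assuming SQLite (through Python's sqlite3) as a stand-in for MySQL. The table is called t here because table is a reserved word; that name is an assumption for the demo.

```python
import sqlite3

ROWS = [
    (1, 1, 1, "192.168.1.1", "127.0.0.1/test", "abc"),
    (2, 1, 1, "192.168.1.1", "127.0.0.1/other", "xyz"),
    (3, 1, 1, "192.168.1.2", "127.0.0.1/test", "ooo"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (rec_id INTEGER PRIMARY KEY, account_id INTEGER,
    product_id INTEGER, ip TEXT, domain TEXT, some_data TEXT,
    UNIQUE (ip, domain))""")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", ROWS)

params = (1, 1, "192.168.1.1", "127.0.0.1/test")

# Original attempt: BOTH values must differ, so rows 2 and 3 are excluded too.
with_and = [r[0] for r in conn.execute(
    "SELECT rec_id FROM t WHERE account_id = ? AND product_id = ? "
    "AND ip != ? AND domain != ? ORDER BY rec_id", params)]

# Fixed query: at least ONE of the two values differs.
with_or = [r[0] for r in conn.execute(
    "SELECT rec_id FROM t WHERE account_id = ? AND product_id = ? "
    "AND (ip != ? OR domain != ?) ORDER BY rec_id", params)]

print(with_and, with_or)  # [] [2, 3]
```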
Select * from table
Where rec_id <> myRecId
And account_id = '1'
And product_id = '1';
myRecId is the rec_id of the record you already know (1 in the example). Some DBMSs give each record a unique ROWID pseudo-column, but MySQL has no general equivalent, so use the table's own unique key instead. You need to retrieve it with your first select statement and then pass it back when using this select. This will return all the rows with account_id = 1 and product_id = 1 except the one you have selected.
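If your inputs are not known in advance, or if you want a list, then you may want to look at the GROUP BY clause. You may also look at GROUP_CONCAT.
The query would be something like:
SELECT ACCOUNT_ID, PRODUCT_ID, GROUP_CONCAT(DISTINCT CONCAT(IP, '|', DOMAIN) SEPARATOR ','), COUNT(1)
FROM TABLE
GROUP BY ACCOUNT_ID, PRODUCT_ID
(In MySQL, || means logical OR unless the PIPES_AS_CONCAT SQL mode is enabled, hence CONCAT(); GROUP_CONCAT sets its separator with the SEPARATOR keyword.)
P.S.: I don't have MySQL installed, hence the query syntax is not verified.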
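A sketch of the GROUP BY / GROUP_CONCAT idea, assuming SQLite through Python's sqlite3 as a stand-in. SQLite uses || for string concatenation and GROUP_CONCAT(DISTINCT ...) with a default ',' separator, so that is the spelling used here instead of MySQL's CONCAT(...) SEPARATOR ','. The order of items inside a GROUP_CONCAT result is not guaranteed, so the demo sorts them before printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (rec_id INTEGER PRIMARY KEY, account_id INTEGER, "
             "product_id INTEGER, ip TEXT, domain TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, "192.168.1.1", "127.0.0.1/test"),
    (2, 1, 1, "192.168.1.1", "127.0.0.1/other"),
    (3, 1, 1, "192.168.1.2", "127.0.0.1/test"),
])

# One row per (account_id, product_id): all distinct ip|domain pairs plus a count.
rows = conn.execute("""
    SELECT account_id, product_id,
           GROUP_CONCAT(DISTINCT ip || '|' || domain), COUNT(1)
    FROM t
    GROUP BY account_id, product_id
""").fetchall()

for account_id, product_id, pairs, n in rows:
    print(account_id, product_id, sorted(pairs.split(",")), n)
```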
I'd appreciate your help with an SQL problem.
I have some student quiz score data in an SQL table and I wish to write a query to extract the information that I want. Candidates can attempt the tests as many times as they wish. Ideally, for each candidate, I wish to find out their highest percentage score on each of the tests, and then the average of those highest percentage scores. Many of the candidates will not have done all of the tests. For example, candidate 1's highest scores on tests 1, 2 and 3 are 50%, 100% and 0%, giving an overall average of 50%.
The table is named resultsets. The relevant column names are: Candidate (the student ID number), QuizName (the title of each quiz), and PercentageScore. It looks like this:
Candidate | QuizName | PercentageScore
---------------------------------------
1 | Test1 | 25
1 | Test1 | 50
1 | Test2 | 100
1 | Test3 | 0
2 | Test1 | 50
2 | Test1 | 100
3 | Test3 | 75
I'm hoping to get a table that looks something like this:
Candidate | Test1 | Test2 | Test3 | AveragePercentageScore
---------------------------------------
1 | 50 | 100 | 0 | 50
2 | 100 | 0 | 0 | 33.33
3 | 0 | 0 | 75 | 25
(Thanks Jain) I'd like to know the SQL command that I should enter.
Thank you!
Since you are a beginner, it would be good to get a handle on basic table/database structures, relationships, the use of primary/foreign keys and especially data normalization.
As for learning queries, I have seen other people use SQL Zoo, as it has sample data and covers examples of queries that need different techniques: joins, left joins, aggregates, etc.
All that said, it is sometimes easier to understand queries based on YOUR data rather than on a generic sample database whose relevance to your data you have no context for.
With that, I will help you get started. You need aggregates (MIN, MAX, AVG, COUNT), which are typically applied per GROUP BY column(s). In this first case, you want, "for each candidate" (the group by) and for each quiz of that candidate (also part of the group by), the highest score.
SELECT
Q.candidate,
Q.quizname,
MAX( Q.PercentageScore ) as HighestScore
from
YourQuizTable Q
group by
Q.candidate,
Q.quizname
This will produce the following:
Candidate QuizName HighestScore
1 Test1 50
1 Test2 100
1 Test3 0 (a legit score on file)
2 Test1 100
3 Test3 75
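The first query can be checked on the question's sample rows with a small sketch, assuming SQLite through Python's sqlite3 in place of MySQL and using the question's table name, resultsets, rather than YourQuizTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resultsets (Candidate INTEGER, QuizName TEXT, "
             "PercentageScore INTEGER)")
conn.executemany("INSERT INTO resultsets VALUES (?, ?, ?)", [
    (1, "Test1", 25), (1, "Test1", 50), (1, "Test2", 100), (1, "Test3", 0),
    (2, "Test1", 50), (2, "Test1", 100), (3, "Test3", 75),
])

# Highest score per candidate per quiz: group by both columns, take MAX.
summary = conn.execute("""
    SELECT Candidate, QuizName, MAX(PercentageScore) AS HighestScore
    FROM resultsets
    GROUP BY Candidate, QuizName
    ORDER BY Candidate, QuizName
""").fetchall()

for row in summary:
    print(row)
```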
From that, you can create a pivot. Different SQL engines have different pivot syntax, but to make it easier to follow with the specific quizzes posted, I will do a hard-coded pivot. The pivot is built on a derived table: the first query IS the basis of the pivot.
SELECT
smry.Candidate,
MAX( if( smry.quizname = 'Test1', smry.HighestScore, 0 ) ) as HiTest1,
MAX( if( smry.quizname = 'Test2', smry.HighestScore, 0 ) ) as HiTest2,
MAX( if( smry.quizname = 'Test3', smry.HighestScore, 0 ) ) as HiTest3,
AVG( smry.HighestScore ) as AvgTest
from
( SELECT
Q.candidate,
Q.quizname,
MAX( Q.PercentageScore ) as HighestScore
from
YourQuizTable Q
group by
Q.candidate,
Q.quizname ) smry
group by
smry.Candidate
The IF() is evaluated for each summary row, and each row only ever holds one quizname: "Test1", "Test2" or "Test3". IF it IS the matching test, the highest score is used for that column; otherwise 0. Wrapping each IF() in MAX() collapses a candidate's rows into one value per column; without an aggregate, MySQL would either reject the query (ONLY_FULL_GROUP_BY) or take the value from an arbitrary row. The last column is a simple average over the quizzes the candidate actually took (75 for candidate 3); to count untaken tests as 0, as in the requested output, divide SUM( smry.HighestScore ) by 3 instead.
The final GROUP BY keeps the results per candidate, but this time WITHOUT grouping by quiz as the inner query did.
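A sketch of the pivot, assuming SQLite through Python's sqlite3. The standard CASE WHEN ... THEN ... ELSE 0 END is swapped in for MySQL's IF() because it works in both engines. It computes both averages, AvgTaken (AVG over quizzes taken) and AvgAllThree (sum divided by 3, treating untaken tests as 0); those two column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resultsets (Candidate INTEGER, QuizName TEXT, "
             "PercentageScore INTEGER)")
conn.executemany("INSERT INTO resultsets VALUES (?, ?, ?)", [
    (1, "Test1", 25), (1, "Test1", 50), (1, "Test2", 100), (1, "Test3", 0),
    (2, "Test1", 50), (2, "Test1", 100), (3, "Test3", 75),
])

# Hard-coded pivot over the per-quiz summary. Each CASE yields the score for
# its own quiz and 0 otherwise; MAX() collapses those rows to one per candidate.
pivot = conn.execute("""
    SELECT smry.Candidate,
           MAX(CASE WHEN smry.QuizName = 'Test1' THEN smry.HighestScore ELSE 0 END) AS HiTest1,
           MAX(CASE WHEN smry.QuizName = 'Test2' THEN smry.HighestScore ELSE 0 END) AS HiTest2,
           MAX(CASE WHEN smry.QuizName = 'Test3' THEN smry.HighestScore ELSE 0 END) AS HiTest3,
           AVG(smry.HighestScore) AS AvgTaken,          -- only quizzes taken
           SUM(smry.HighestScore) / 3.0 AS AvgAllThree  -- untaken quizzes count as 0
    FROM (SELECT Candidate, QuizName, MAX(PercentageScore) AS HighestScore
          FROM resultsets
          GROUP BY Candidate, QuizName) smry
    GROUP BY smry.Candidate
    ORDER BY smry.Candidate
""").fetchall()

for row in pivot:
    print(row)
```

AvgAllThree reproduces the 50 and 25 from the question's desired output; AvgTaken is what a plain AVG() gives.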
I have a column called "Permissions" in my table. The permissions are strings which can be:
"r","w","x","rw","wx","rwx","xwr"
etc. Please note that the order of characters in the string is not fixed. I want to GROUP_CONCAT() the "Permissions" column of my table. However, this produces very large strings.
Example: "r","wr","wx" group-concatenated is "r,wr,wx" but should be "r,w,x" or "rwx". Using DISTINCT doesn't seem to help much. I am thinking that if I could check whether a permission value is a substring of another value, I should not concatenate it, but I can't find a way to accomplish that.
Any column-based approach using solely string functions would also be appreciated.
EDIT:
Here is some sample data:
+---------+
| perm |
+---------+
| r,x,x,r |
| x |
| w,rw |
| rw |
| rw |
| x |
| w |
| x,x,r |
| r,x |
+---------+
The concatenated result should be:
+---------+
| perm |
+---------+
| r,w,x |
+---------+
I don't have control over the source of the data and would prefer not to create new tables (because of restricted privileges and memory constraints). I am looking for a post-processing step that converts each column value to the desired format.
A good idea would be to first normalize your data.
You could, for example, try it this way (I assume your source table is named Files):
Create a simple table called PermissionCodes with a single string column named Code.
Put r, w, and x as values into PermissionCodes (three rows total).
In a subquery, join Files to PermissionCodes on the condition that Code occurs as a substring of Permissions.
Perform your GROUP_CONCAT aggregation on the result of the subquery.
If, for the same logical entries in Files, there are multiple overlapping permission sets (i.e. for some file there is a row with rw and another row with w), then limit your subquery to distinct combinations of Files' keys and Code.
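A sketch of these steps, assuming SQLite through Python's sqlite3 and a one-column Files table holding the question's sample values. INSTR(Permissions, Code) > 0 is used as the substring test (both SQLite and MySQL provide INSTR); the inner SELECT DISTINCT keeps each code once before GROUP_CONCAT joins them.

```python
import sqlite3

# Sample values from the question; each row is one Permissions string.
SAMPLE = ["r,x,x,r", "x", "w,rw", "rw", "rw", "x", "w", "x,x,r", "r,x"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Files (Permissions TEXT)")
conn.executemany("INSERT INTO Files VALUES (?)", [(p,) for p in SAMPLE])

# Lookup table with one row per permission code.
conn.execute("CREATE TABLE PermissionCodes (Code TEXT)")
conn.executemany("INSERT INTO PermissionCodes VALUES (?)", [("r",), ("w",), ("x",)])

# Join on "Code occurs in Permissions", keep distinct codes, then concatenate.
(merged,) = conn.execute("""
    SELECT GROUP_CONCAT(sub.Code)
    FROM (SELECT DISTINCT pc.Code
          FROM Files f
          JOIN PermissionCodes pc ON INSTR(f.Permissions, pc.Code) > 0) sub
""").fetchone()

print(sorted(merged.split(",")))
```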
Here's a fiddle to demonstrate the idea:
http://sqlfiddle.com/#!9/6685d6/4
You can try something like:
SELECT user_id, GROUP_CONCAT(DISTINCT perm)
FROM Permissions AS p
INNER JOIN (SELECT 'r' AS perm UNION ALL
SELECT 'w' UNION ALL
SELECT 'x') AS x
ON p.permission LIKE CONCAT('%', x.perm, '%')
GROUP BY user_id
You can include any additional permission code in the UNION ALL of the derived table used to JOIN with the Permissions table.
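The same derived-table approach, which avoids creating a lookup table, sketched with SQLite through Python's sqlite3. The user_id values are hypothetical; the permission strings come from the question's sample. SQLite's || stands in for MySQL's CONCAT('%', x.perm, '%').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Permissions (user_id INTEGER, permission TEXT)")
# Hypothetical users; the permission strings come from the question's sample.
conn.executemany("INSERT INTO Permissions VALUES (?, ?)", [
    (1, "r,x,x,r"), (1, "x"), (2, "w,rw"), (2, "rw"), (3, "x,x,r"),
])

# The derived table x supplies the codes inline, so no lookup table is created.
rows = conn.execute("""
    SELECT user_id, GROUP_CONCAT(DISTINCT perm)
    FROM Permissions AS p
    INNER JOIN (SELECT 'r' AS perm UNION ALL
                SELECT 'w' UNION ALL
                SELECT 'x') AS x
      ON p.permission LIKE '%' || x.perm || '%'
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

# GROUP_CONCAT order is unspecified, so sort each user's codes.
merged = {user_id: sorted(codes.split(",")) for user_id, codes in rows}
print(merged)
```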
Imagine the following table:
id | variant | name
-----------------------
1 | 1 | a
1 | 2 | b
1 | 3 | c
2 | 1 | d
2 | 2 | e
2 | 3 | NULL
3 | 1 | g
Which SQL statement do I need to run to get this:
For a given id and a given variant get the name of this combination.
But if the name is NULL get the name of the row with the given id but with variant 1 (the name of variant 1 is never NULL).
If there is no row with the given variant, again, use the row where variant is 1 with the same id.
Variant 1 is never requested directly.
This is like a fallback mechanism. Or you could consider it as overriding the values of rows with variant = 1.
Examples:
id = 1, variant = 2: b
id = 1, variant = 3: c
id = 2, variant = 3: d
id = 3, variant = 5: g
Is this possible with SQL? And does it perform well if the mechanism is applied to many more fields?
Thanks!
Update:
Please note that I would like to have this mechanism not only for the name field. There should be further columns with the same behaviour, but each column should be treated on its own.
This should do what you need using a LEFT JOIN to get the optional value.
SELECT COALESCE(b.Name,a.Name)
FROM Table1 a
LEFT JOIN Table1 b
ON a.id=b.id AND b.variant=#y
WHERE a.id=#x AND a.variant=1
Performance-wise, it depends on how you need to apply the query to get multiple fields. If you can handle each column with its own COALESCE over the existing join, I don't see a big problem, but if you end up needing multiple self-joins to solve it, you may have a problem.
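A sketch of the LEFT JOIN / COALESCE answer, assuming SQLite through Python's sqlite3. The label column and its values are hypothetical, added to show that each column falls back independently with its own COALESCE over the single join, which is what the update asks for; the name values and the four example lookups are from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'label' is a hypothetical second column, to show per-column fallback.
conn.execute("CREATE TABLE Table1 (id INTEGER, variant INTEGER, name TEXT, label TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?)", [
    (1, 1, "a", "L1"), (1, 2, "b", None), (1, 3, "c", "L3"),
    (2, 1, "d", "L4"), (2, 2, "e", "L5"), (2, 3, None, "L6"),
    (3, 1, "g", "L7"),
])

def lookup(conn, id_, variant):
    """Return (name, label) for id_/variant, falling back per column to variant 1."""
    # a = the variant-1 row (always present); b = the requested variant, if any.
    # Each COALESCE picks b's value unless it is NULL or b did not match at all.
    return conn.execute("""
        SELECT COALESCE(b.name, a.name), COALESCE(b.label, a.label)
        FROM Table1 a
        LEFT JOIN Table1 b ON a.id = b.id AND b.variant = ?
        WHERE a.id = ? AND a.variant = 1
    """, (variant, id_)).fetchone()

for args in [(1, 2), (1, 3), (2, 3), (3, 5)]:
    print(args, lookup(conn, *args))
```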
As always, performance will depend on your data structure, how many rows you have, and what indexes you have created. I would recommend a query like this:
SELECT name FROM table WHERE id=#1 AND ((variant=#2 AND name IS NOT NULL) OR variant=1) ORDER BY variant DESC LIMIT 1
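A sketch of this ORDER BY ... LIMIT 1 variant on the same sample, assuming SQLite through Python's sqlite3 and a table named t (an assumption, since table is a reserved word). It assumes requested variants are greater than 1 (the question says variant 1 is never requested directly). Because it picks one whole row, it cannot fall back column by column the way a per-column COALESCE can.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, variant INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 1, "a"), (1, 2, "b"), (1, 3, "c"),
    (2, 1, "d"), (2, 2, "e"), (2, 3, None),
    (3, 1, "g"),
])

def lookup_name(conn, id_, variant):
    # Candidate rows: the requested variant (if its name is set) and variant 1.
    # Requested variants are > 1, so DESC puts the requested row first.
    row = conn.execute("""
        SELECT name FROM t
        WHERE id = ? AND ((variant = ? AND name IS NOT NULL) OR variant = 1)
        ORDER BY variant DESC
        LIMIT 1
    """, (id_, variant)).fetchone()
    return row[0] if row else None

answers = {args: lookup_name(conn, *args) for args in [(1, 2), (1, 3), (2, 3), (3, 5)]}
print(answers)
```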
I have a table with multiple rows that contain the same data. I used SELECT DISTINCT to get unique rows and it works fine. But when I use ORDER BY with SELECT DISTINCT, it gives me unsorted data.
Can anyone tell me how DISTINCT works?
On what criteria does it select the row?
From your comment earlier, the query you are trying to run is
Select distinct id from table where id2 =12312 order by time desc.
As I expected, here is your problem: the column you select and the column you order by are different. Your output rows are ordered by time, but that order is not necessarily reflected in the id column. Here is an example.
id | id2 | time
-------------------
1 | 12312 | 34
2 | 12312 | 12
3 | 12312 | 48
If you run
SELECT * FROM table WHERE id2=12312 ORDER BY time DESC
you will get the following result
id | id2 | time
-------------------
3 | 12312 | 48
1 | 12312 | 34
2 | 12312 | 12
Now if you select only the id column from this, you will get
id
--
3
1
2
This is why your results don't look sorted: they are sorted by time, not by id.
When you specify SELECT DISTINCT, it will give you all the rows, eliminating duplicates from the result set. By "duplicates" I mean rows where all selected fields have the same values. For example, say you have a table that looks like:
id | num
--------------
1 | 1
2 | 3
3 | 3
SELECT DISTINCT * would return all rows above, whereas SELECT DISTINCT num would return two rows:
num
-----
1
3
Note that which actual row it picks (e.g. row 2 or row 3) is irrelevant, as the results would be indistinguishable.
Finally, DISTINCT should not affect how ORDER BY works, as long as you order by columns that appear in the SELECT list.
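The DISTINCT examples above can be checked with a short sketch, assuming SQLite through Python's sqlite3 and a table named t (a hypothetical name).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, num INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (2, 3), (3, 3)])

# DISTINCT compares only the selected columns.
all_cols = conn.execute("SELECT DISTINCT * FROM t ORDER BY id").fetchall()
num_only = conn.execute("SELECT DISTINCT num FROM t ORDER BY num").fetchall()

print(all_cols)   # [(1, 1), (2, 3), (3, 3)]
print(num_only)   # [(1,), (3,)]
```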
Reference: MySQL SELECT statement
The behaviour you describe happens when you ORDER BY an expression that is not present in the SELECT clause. The SQL standard does not allow such a query but MySQL is less strict and allows it.
Let's try an example:
SELECT DISTINCT column1, column2
FROM table1
WHERE ...
ORDER BY column3
Let's say the content of the table table1 is:
id | column1 | column2 | column3
----+---------+---------+---------
1 | A | B | 1
2 | A | B | 5
3 | X | Y | 3
Without the ORDER BY clause, the above query returns the following two records (without ORDER BY the order is not guaranteed):
column1 | column2
---------+---------
A | B
X | Y
But with ORDER BY column3 the order is also not guaranteed.
The DISTINCT clause operates on the values of the expressions present in the SELECT clause. If row #1 is processed first then (A, B) is placed in the result set and it is associated with row #1. Then, when row #2 is processed, the values of the SELECT expressions produce the record (A, B) that is already in the result set. Because of DISTINCT it is dropped. Row #3 produces (X, Y) that is also put in the result set. Then, the ORDER BY column3 clause makes the records be sorted in the result set as (A, B), (X, Y).
But if row #2 is processed before row #1 then, following the same logic exposed in the previous paragraph, the records in the result set are sorted as (X, Y), (A, B).
There is no rule imposed on the database engine about the order in which it processes the rows when it runs a query. The database is free to process the rows in whatever order it considers best for performance.
Your query is not valid standard SQL, and the fact that it can return different results from the same input data proves it.
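If a deterministic result is needed, one option (a swapped-in technique, not part of the answer above) is to replace DISTINCT with GROUP BY and state which column3 value represents each group, e.g. ORDER BY MAX(column3); for the original question that would be GROUP BY id ORDER BY MAX(time) DESC. A sketch assuming SQLite through Python's sqlite3; choosing MAX is an assumption, and MIN would put (A, B) first instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, column1 TEXT, column2 TEXT, column3 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    (1, "A", "B", 1), (2, "A", "B", 5), (3, "X", "Y", 3),
])

# Group instead of DISTINCT and say explicitly which column3 value represents
# each group. MAX(column3) is an assumed choice: (A, B) -> 5, (X, Y) -> 3.
rows = conn.execute("""
    SELECT column1, column2
    FROM table1
    GROUP BY column1, column2
    ORDER BY MAX(column3)
""").fetchall()

print(rows)  # [('X', 'Y'), ('A', 'B')]
```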