Check membership of elements of one column in another, MySQL

How would I go about counting values that appear in column 1 but not in column 2? Both columns are in the same table, and I want to do it without subqueries or anything fancy. The rows may or may not share other common column values (like col 3 = col 4), but this doesn't matter.
I have it almost working with subqueries, but cannot figure out how to do it without them. The only problem (I think) is that it will count a value twice if the primary keys (composed of col1, col3, col4) are different but col1 is the same.
SELECT DISTINCT COUNT(*)
FROM mytable t1
WHERE NOT EXISTS (
SELECT DISTINCT *
FROM mytable
WHERE t1.column1 = mytable.column2
);
But like I said, I'm trying to figure this out without subqueries anyway.

How about:
SELECT COUNT(*)
FROM mytable mt1
LEFT JOIN mytable mt2 ON mt1.column1 = mt2.column2
WHERE mt2.column2 IS NULL
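
To see how the LEFT JOIN anti-join behaves, here is a minimal sketch that runs the pattern against an in-memory SQLite database through Python's sqlite3 module. SQLite only stands in for MySQL here, and the sample rows and the column3 column are made up for illustration. It also runs a COUNT(DISTINCT ...) variant, on the assumption that a column1 value repeated across several rows should be counted once.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (column1 TEXT, column2 TEXT, column3 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    # 'a' never appears in column2 and occurs in two rows of column1.
    [("a", "b", 1), ("a", "c", 2), ("b", "x", 3), ("c", "b", 4)],
)

# Anti-join: keep only the rows of mt1 that found no partner in column2.
anti_join = """
SELECT {agg}
FROM mytable mt1
LEFT JOIN mytable mt2 ON mt1.column1 = mt2.column2
WHERE mt2.column2 IS NULL
"""

# Counts unmatched rows, so a repeated column1 value is counted repeatedly.
row_count = conn.execute(anti_join.format(agg="COUNT(*)")).fetchone()[0]
# Counts each unmatched column1 value once.
value_count = conn.execute(
    anti_join.format(agg="COUNT(DISTINCT mt1.column1)")
).fetchone()[0]

print(row_count, value_count)
```

With this sample data the two counts differ, which is the double-counting effect described in the question. Note that the IS NULL test assumes column2 itself never holds NULL.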

Please see this:
SELECT
SUM(IF(column1 = column2, 0, 1)) as c
FROM
mytable

Related

SQL Join 2 tables with almost same field

I need to join two tables in SQL. There are no common fields, but one table has a field with the value krin1001, and I need it to be joined with the row in the other table where the value is 1001.
The idea behind the join is that I have multiple customers. In one table their customer id is 'krin1001', 'krin1002' and so on; this table holds how much they have sold. In the other table their customer id is '1001', '1002' and so on; this table holds their name, address and so on. So it will always be the first 4 characters I need to strip from the field before matching and joining. It might not always be 'krin': I need it to work with 'khjo1001' as well, and it still needs to join on the '1001' value from the other table.
Is that possible?
Hope you can help me.
You need to use substring:
ON SUBSTRING(TableA.Field, 5, 4) = TableB.Field
Or Right:
ON RIGHT(TableA.Field, 4) = TableB.Field
You can also try the CHARINDEX function in the join condition. If a value from table1 contains a value from table2, the row will be included in the result set.
;WITH table1 AS(
SELECT 'krin1001' AS val
UNION ALL
SELECT 'xxx'
UNION ALL
SELECT 'xyz123'
),
table2 AS(
SELECT '1001' AS val
UNION ALL
SELECT '12345'
UNION ALL
SELECT '123'
)
SELECT * FROM table1 AS t
JOIN table2 AS T2 ON CHARINDEX(T2.val, T.val) > 0
Use it as:
SELECT
*
FROM table t1
INNER JOIN table t2 ON RIGHT(t1.col1, 4) = t2.col1;
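
As a quick check of the prefix-stripping join, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The table names sales and customers, their columns, and the rows are hypothetical. SQLite has no RIGHT() function, so the sketch uses substr(customer_id, 5), which drops the first four characters; in MySQL the equivalent would be SUBSTRING(customer_id, 5).

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (customer_id TEXT, amount INTEGER);
CREATE TABLE customers (id TEXT, name TEXT);
INSERT INTO sales VALUES ('krin1001', 50), ('khjo1002', 75), ('krin1001', 25);
INSERT INTO customers VALUES ('1001', 'Alice'), ('1002', 'Bob');
""")

# substr(x, 5) returns everything from the 5th character on,
# i.e. the id with its 4-character prefix removed.
rows = conn.execute("""
SELECT c.name, SUM(s.amount)
FROM sales AS s
JOIN customers AS c ON substr(s.customer_id, 5) = c.id
GROUP BY c.name
ORDER BY c.name
""").fetchall()

print(rows)
```

Stripping a fixed-length prefix keeps working if the numeric part ever grows beyond four digits, whereas taking the last four characters would not. Either way, a join on an expression generally cannot use a plain index on the prefixed column.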

percentile by COUNT(DISTINCT) with correlated WHERE only works with a view (or without DISTINCT)

I've got a weird one, and I don't know if it's my syntax (which seems straightforward) or a bug (or just unsupported).
Here's my query that works but is needlessly slow:
UPDATE table1
SET table1column1 =
(SELECT COUNT(DISTINCT table2column1) FROM table2view WHERE table2column1 <= (SELECT table2column1 FROM table2 WHERE table2.id = table1.id) )
/
(SELECT COUNT(DISTINCT table2column1) FROM table2)
+ (SELECT COUNT(DISTINCT table2column2) FROM table2view WHERE table2column2 <= (SELECT table2column2 FROM table2 WHERE table2.id = table1.id) )
/
(SELECT COUNT(DISTINCT table2column2) FROM table2)
+ (SELECT COUNT(DISTINCT table2column3) FROM table2view WHERE table2column3 <= (SELECT table2column3 FROM table2 WHERE table2.id = table1.id) )
/ (SELECT COUNT(DISTINCT table2column3) FROM table2);
It's just the sum of three percentiles (of table2column1, table2column2, and table2column3) with duplicates removed.
Here's where it gets weird. I have to use a view for this to work on the subquery with the WHERE or it will only UPDATE the first row of table1, and set the rest of the rows' table1column1 to 0. That table2view is an exact duplicate of table2. Yeah, weird.
If I don't use DISTINCT, I can do it without the view. Does that make sense? Note: I have to have DISTINCT because I have lots of duplicates.
I tried making it SELECT only from the view, but that slowed it down worse.
Does anyone know what the problem is and the best way to rework this query so it doesn't take so long? It's in a TRIGGER, and the updated data is pretty on demand.
Many thanks in advance!
Details
I'm testing the speed in phpMyAdmin's command line.
I'm pretty sure the degradation is coming from the view since the more of the view and the less of the actual table I use, the slower it gets.
When I do the one without DISTINCT, it's lightning fast.
Only works on views?
OK, so I just set up a copy of table2. I tried first to do the original query substituting the view with the copy. No go.
I tried to do the query below with the copy instead of the view. No go.
Hopefully the introduction of these constants will better show what I'm trying to do.
SET @table2column1_distinct_count = (SELECT COUNT(DISTINCT table2column1) FROM table2);
SET @table2column2_distinct_count = (SELECT COUNT(DISTINCT table2column2) FROM table2);
SET @table2column3_distinct_count = (SELECT COUNT(DISTINCT table2column3) FROM table2);
UPDATE table1, table2
SET table1.table1column1 = (SELECT COUNT(DISTINCT table2column1) FROM table2view WHERE table2column1 <= table2.table2column1) / @table2column1_distinct_count
+ (SELECT COUNT(DISTINCT table2column2) FROM table2view WHERE table2column2 <= table2.table2column2) / @table2column2_distinct_count
+ (SELECT COUNT(DISTINCT table2column3) FROM table2view WHERE table2column3 <= table2.table2column3) / @table2column3_distinct_count
WHERE table1.id = table2.id;
Again, when I use table2 instead of the table2view, it only updates the first row properly and sets all other rows' table1.table1column1 = 0.
Math
I'm trying to set table1.table1column1 to the sum of the percentiles of table2column1, table2column2, and table2column3 by id.
I compute a percentile as (the count of distinct table2columnX values <= the current table2columnX) / (the total count of distinct table2columnX values).
I use DISTINCT to get rid of the excessive duplicates.
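
The percentile definition above can be sketched on a tiny data set. This is only an illustration using Python's sqlite3 module with an in-memory database standing in for MySQL; the rows are made up, and only one of the three columns is shown.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, table2column1 INTEGER)")
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 20), (4, 30)],  # 20 is duplicated on purpose
)

# Percentile = distinct values <= current value / total distinct values.
# The * 1.0 forces floating-point division in SQLite.
rows = conn.execute("""
SELECT t.id,
       (SELECT COUNT(DISTINCT x.table2column1)
          FROM table2 AS x
         WHERE x.table2column1 <= t.table2column1) * 1.0
       / (SELECT COUNT(DISTINCT table2column1) FROM table2) AS pct
FROM table2 AS t
ORDER BY t.id
""").fetchall()

percentiles = [round(pct, 3) for _, pct in rows]
print(percentiles)
```

Both rows holding the value 20 get the same percentile, because duplicates are collapsed before counting.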
View
Here's the SELECT for the view. Does this help?
CREATE VIEW myTable.table2view AS SELECT
table2.table2column1 AS table2column1,
table2.table2column2 AS table2column2,
table2.table2column2 AS table2column3
FROM table2
GROUP BY table2.id;
Is there something special about the GROUP BY in the view's SELECT that makes this work (that I'm not seeing)?
I would probably say that the query is slow because it is repeatedly accessing the table when the trigger fires.
I am no SQL expert but I have tried to put together a query using temporary tables. You can see if it helps speed up the query. I have used different but similar sounding column names in my code sample below.
EDIT : There was a calculation error in my earlier code. Updated now.
SELECT COUNT(id) INTO @no_of_attempts from tb2;
-- DROP TABLE IF EXISTS S1Percentiles;
-- DROP TABLE IF EXISTS S2Percentiles;
-- DROP TABLE IF EXISTS S3Percentiles;
CREATE TEMPORARY TABLE S1Percentiles (
s1 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
CREATE TEMPORARY TABLE S2Percentiles (
s2 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
CREATE TEMPORARY TABLE S3Percentiles (
s3 FLOAT NOT NULL,
percentile FLOAT NOT NULL DEFAULT 0.00
);
INSERT INTO S1Percentiles (s1, percentile)
SELECT A.s1, ((COUNT(B.s1)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s1 from tb2) A
INNER JOIN tb2 B
ON B.s1 <= A.s1
GROUP BY A.s1;
INSERT INTO S2Percentiles (s2, percentile)
SELECT A.s2, ((COUNT(B.s2)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s2 from tb2) A
INNER JOIN tb2 B
ON B.s2 <= A.s2
GROUP BY A.s2;
INSERT INTO S3Percentiles (s3, percentile)
SELECT A.s3, ((COUNT(B.s3)/@no_of_attempts)*100)
FROM (SELECT DISTINCT s3 from tb2) A
INNER JOIN tb2 B
ON B.s3 <= A.s3
GROUP BY A.s3;
-- select * from S1Percentiles;
-- select * from S2Percentiles;
-- select * from S3Percentiles;
UPDATE tb1 A
INNER JOIN
(
SELECT B.tb1_id AS id, (C.percentile + D.percentile + E.percentile) AS sum FROM tb2 B
INNER JOIN S1Percentiles C
ON B.s1 = C.s1
INNER JOIN S2Percentiles D
ON B.s2 = D.s2
INNER JOIN S3Percentiles E
ON B.s3 = E.s3
) F
ON A.id = F.id
SET A.sum = F.sum;
-- SELECT * FROM tb1;
DROP TABLE S1Percentiles;
DROP TABLE S2Percentiles;
DROP TABLE S3Percentiles;
This records the percentile once for each distinct score and then updates the tb1 column from those lookup tables, instead of recalculating the percentile for each student row.
You should also index columns s1, s2 and s3 for optimizing the queries on these columns.
Note: Please update the column names according to your db schema. Also note that each percentile calculation has been multiplied by 100 as I believe that percentile is usually calculated that way.

Get number of values that only appear once in a column

Firstly, if it is relevant, I'm using MySQL, though I assume a solution would work across DB products. My problem is thus:
I have a simple table with a single column. There are no constraints on the column. Within this column there is some simple data, e.g.
a
a
b
c
d
d
I need to get the number/count of values that only appear once. From the example above that would be 2 (since only b and c occur once in the column).
Hopefully it's clear I don't want DISTINCT values, but UNIQUE values. I have actually done this before, by creating an additional table with a UNIQUE constraint on the column and simply INSERTing to the new table from the old one, handling the duplicates accordingly.
I was hoping to find a solution that did not require the temporary table, and could somehow just be accomplished with a nifty SELECT.
Assuming your table is called T and your field is called F:
SELECT COUNT(F)
FROM (
SELECT F
FROM T
GROUP BY F
HAVING COUNT(*) = 1
) AS ONLY_ONCE
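
A minimal sketch of this GROUP BY / HAVING approach on the sample data from the question, run through Python's sqlite3 module against an in-memory database (SQLite is used here only as a stand-in for MySQL):

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (F TEXT)")
# The six sample values from the question: a, a, b, c, d, d.
conn.executemany("INSERT INTO T VALUES (?)", [(v,) for v in "aabcdd"])

# Inner query keeps the values whose group has exactly one row;
# outer query counts how many such values there are.
only_once = conn.execute("""
SELECT COUNT(F)
FROM (
    SELECT F
    FROM T
    GROUP BY F
    HAVING COUNT(*) = 1
) AS ONLY_ONCE
""").fetchone()[0]

print(only_once)
```

Only b and c form single-row groups, so the result matches the count expected in the question.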
select count(*) from
(
select
col1, count(*)
from
Table
group by
Col1
Having
Count(Col1) = 1
) AS only_once
just nest it a little...
select count( cnt ) from
( select count(mycol) cnt from mytab group by mycol ) t
where cnt = 1
select field1, count(field1) from my_table group by field1 having count(field1) = 1
select count(*) from (select field1, count(field1) from my_table group by field1 having count(field1) = 1) t
The first one will return the values that are unique, and the second one will return the number of unique values.
Could it be as simple as this:
Select count(*) From MyTable Group By MyColumn Where Count(MyColumn) = 1
This is what I did and it worked:
SELECT name
FROM people JOIN stars ON stars.person_id = people.id
JOIN movies ON movies.id = stars.movie_id
WHERE year = 2004
GROUP BY name, person_id ORDER BY birth;
note: I was working with several tables here.
CS50 Problem Set 7 (pset7) 9.sql fix!!

DISTINCT on multiple columns

I have SELECT DISTINCT (first),second,third FROM table
and I want not only first but also second to be DISTINCT, while third stays without DISTINCT. I tried it like this:
SELECT DISTINCT (first,second),third FROM table
and a couple more things, but they didn't work.
SELECT m.first, m.second, m.third -- and possibly other columns
FROM (
SELECT DISTINCT first, second
FROM mytable
) md
JOIN mytable m
ON m.id =
(
SELECT id
FROM mytable mi
WHERE mi.first = md.first
AND mi.second = md.second
ORDER BY
mi.first, mi.second, mi.third
LIMIT 1
)
Create an index on (first, second, third) for this to work fast.
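
For comparison, here is a sketch of a simpler variant of the same idea: one row per distinct (first, second) pair, taking the smallest third within each pair. This is a different technique from the query above (GROUP BY with MIN instead of a correlated LIMIT 1 subquery), and it only returns the grouped columns plus third. It runs on an in-memory SQLite database through Python's sqlite3 module; the rows are made up, and the column names are double-quoted to avoid any clash with keywords.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE mytable (id INTEGER PRIMARY KEY, '
    '"first" INTEGER, "second" TEXT, "third" TEXT)'
)
conn.executemany(
    'INSERT INTO mytable ("first", "second", "third") VALUES (?, ?, ?)',
    [(1, "x", "q"), (1, "x", "p"), (2, "y", "r")],
)

# One output row per (first, second) pair; MIN picks a single third value.
rows = conn.execute("""
SELECT "first", "second", MIN("third") AS "third"
FROM mytable
GROUP BY "first", "second"
ORDER BY "first", "second"
""").fetchall()

print(rows)
```

If other columns from the chosen row are needed as well, the join back to the table by id shown in the answer above is still required.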
Have you seen this post?
Select distinct from multiple fields using sql
They seem very similar, maybe you could try something like that?
Hope this helps!

INSERT SELECT in MySQL with superfluous aggregate column

I'd like to do an insert select where the select statement has aggregate columns for use by a "HAVING" clause, but where I do not actually want those columns to be inserted. A simple example:
INSERT INTO table1 ( a )
SELECT a, MAX(b) AS maxb FROM table2
GROUP BY a
HAVING maxb = 1
Of course, this won't work because there are a different number of columns in the INSERT and the SELECT. Is there a simple way to make this work? I was hoping I could define some sort of null column in the INSERT field list, or something. I was hoping to avoid a subquery in my SELECT statement, although I could probably do it that way if necessary.
INSERT INTO table1 ( a )
SELECT a FROM (SELECT a, MAX(b) AS maxb FROM table2
GROUP BY a
HAVING maxb = 1) t
You can rewrite the query like this
INSERT INTO table1 ( a )
SELECT a FROM table2
GROUP BY a
HAVING MAX(b) = 1
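
A minimal sketch of this rewrite, run through Python's sqlite3 module against an in-memory database (SQLite as a stand-in for MySQL; the rows in table2 are made up for illustration). The aggregate appears only in HAVING, so the SELECT list matches the single column of the INSERT.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (a INTEGER);
CREATE TABLE table2 (a INTEGER, b INTEGER);
-- MAX(b) per group: a=1 gives 1, a=2 gives 3, a=3 gives 1
INSERT INTO table2 VALUES (1, 1), (1, 0), (2, 1), (2, 3), (3, 1);

INSERT INTO table1 (a)
SELECT a FROM table2
GROUP BY a
HAVING MAX(b) = 1;
""")

inserted = [row[0] for row in conn.execute("SELECT a FROM table1 ORDER BY a")]
print(inserted)
```

Only the groups whose maximum b equals 1 end up in table1.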