Comparing field value with count of occurrences - mysql

I don't know exactly how to write this query, so I'm asking for your help.
I have a table that contains something like:
COLUMNS
id,keyword,pages
What I basically need is to get all the rows where pages != count(keyword).
This is basically how I tried to do it anyway.
It should be really simple: return all rows where the keyword count does not equal the pages column value.
For example, if the data is like this:
ROW A: 1, aaa, 3
ROW B: 4, aaa, 3
ROW C: 5, aaa, 3
ROW D: 5, aac, 100
With the example above, only ROW D will be returned, since for rows A, B and C the pages value (3) matches the keyword count.
Any help is welcome.
Thanks!

A solution without a subquery. This will work fine in MySQL, but stricter SQL dialects require aggregate functions around the selected columns that are not in the GROUP BY.
SELECT a.*
FROM mytable AS a
LEFT JOIN mytable AS b
ON b.keyword = a.keyword
GROUP BY a.id
HAVING COUNT(b.id) != a.pages
Also use indexes like these:
CREATE INDEX myindex ON mytable (keyword);
CREATE INDEX myindex2 ON mytable (pages);

select *
from table t1
where t1.pages <> (select count(*)
from table t2
where t1.keyword = t2.keyword)
But that's a pretty slow query. Just to give you an idea...
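A possible middle ground between the two answers above is to count each keyword once in a derived table and join the counts back to the rows. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module as a stand-in for MySQL, and it gives row D the id 6 because the question lists id 5 twice.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, keyword TEXT, pages INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable (id, keyword, pages) VALUES (?, ?, ?)",
    # Sample rows from the question; row D gets id 6 here (assumption),
    # because the question repeats id 5 and this sketch uses id as the key.
    [(1, "aaa", 3), (4, "aaa", 3), (5, "aaa", 3), (6, "aac", 100)],
)

# Count every keyword once in a derived table, then join the counts back
# to the rows and keep the rows whose pages value differs from the count.
query = """
SELECT m.id, m.keyword, m.pages
FROM mytable AS m
JOIN (
    SELECT keyword, COUNT(*) AS cnt
    FROM mytable
    GROUP BY keyword
) AS c ON c.keyword = m.keyword
WHERE m.pages <> c.cnt
ORDER BY m.id
"""
mismatched = conn.execute(query).fetchall()
print(mismatched)
```

The SELECT itself uses no SQLite-specific syntax, so it should carry over to MySQL as written, but that has not been checked here.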

Related

Query to Check for Values Not Matching List in Where Clause

I am writing a SQL query that uses the IN operator in the WHERE clause on a table that contains millions of records.
I would like to know for the specific values I am checking in my WHERE clause, which ones are missing from the table.
How do I write a query to check for the values that do not result in a match according to the values in my WHERE clause for those values specifically?
For example -
SELECT *
FROM employees
WHERE emp_id IN (123, 456, 789)
Let's say that this query returns 2 rows for emp_id = 123 and emp_id = 456.
I want to write a query that shows me what was missing, illustrating that there was no row in the table specifically for emp_id = 789.
To find the missing rows, you must have a canonical set of rows to compare against. Here's an example using the VALUES statement (new in MySQL 8.0.19).
Suppose we have a list 1, 2, 19, 64 and we want to find which ones don't have matching rows in the table mytable.
SELECT t.column_0
FROM (VALUES ROW(1), ROW(2), ROW(19), ROW(64)) AS t
LEFT OUTER JOIN mytable ON t.column_0 = mytable.id
WHERE mytable.id IS NULL;
If you use a version older than MySQL 8.0.19, you can replace the derived table with this more verbose syntax:
...
FROM (SELECT 1 AS column_0 UNION SELECT 2 UNION SELECT 19 UNION SELECT 64) AS t
...
Or you could also create a temporary table and fill it with the values you want to search for, one value per row.
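When the list of values comes from application code, the same anti-join idea can be driven with bound parameters. This is a rough sketch only: it uses Python's sqlite3 module as a stand-in for MySQL, the helper name find_missing is made up for the illustration, and the employees table holds just the two ids the question says exist. SQLite spells the inline table as VALUES (?), (?) inside a CTE, where MySQL 8.0.19+ would use VALUES ROW(?), ROW(?).

```python
import sqlite3


def find_missing(conn, wanted_ids):
    """Return the ids from wanted_ids that have no row in employees.

    Hypothetical helper for this sketch, not part of any library.
    """
    wanted_ids = list(wanted_ids)
    if not wanted_ids:
        return []
    # One "(?)" per wanted id builds an inline table of candidate values.
    placeholders = ", ".join("(?)" for _ in wanted_ids)
    query = f"""
        WITH wanted(emp_id) AS (VALUES {placeholders})
        SELECT w.emp_id
        FROM wanted AS w
        LEFT JOIN employees AS e ON e.emp_id = w.emp_id
        WHERE e.emp_id IS NULL
        ORDER BY w.emp_id
    """
    return [row[0] for row in conn.execute(query, wanted_ids)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO employees (emp_id) VALUES (?)", [(123,), (456,)])

missing = find_missing(conn, [123, 456, 789])
print(missing)
```

Only the placeholder string is interpolated into the SQL text; the values themselves are always passed as parameters.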
You can use the opposite of IN which is NOT IN(). Keep the bit inside the brackets the same and you'll get every record that doesn't have that

SQL: FOR Loop in SELECT query

Is there a way to go through a FOR LOOP in a SELECT query? (1)
I am asking because I do not know how to collect, in a single SELECT query, some data from table t_2 for each row of table t_1 (please see UPDATE for an example). Yes, it's true that we can GROUP BY a UNIQUE INDEX, but what if one is not present? Or how do we request all rows from t_1, each concatenated with a specific related row from t_2? So it seems that in a perfect world we would be able to loop through a table with a proper SQL command (R). Maybe ANY(...) will help?
Here I've tried to find the maximal count of repetitions in column prop among all values of the column in table t.
I.e. I've tried to carry out something like Pandas' t.groupby(prop).max() in an SQL query (Q1):
SELECT Max(C) FROM (SELECT Count(t_1.prop) AS C
FROM t AS t_1
WHERE t_1.prop = ANY (SELECT prop
FROM t AS t_2));
But it only throws the error:
Every derived table must have its own alias.
I don't understand this error. Why does it happen? (2)
Yes, we can implement Pandas' value_counts(...) much more easily by using SELECT prop, COUNT(*) FROM t GROUP BY prop. But I wanted to do it in a "looping" way, staying in a "single non-grouping SELECT-query mode", for reason (R).
This sub-query, which attempts to imitate Pandas' t.value_counts(...) (Q2):
SELECT Count(t_1.prop) AS C FROM t AS t_1 WHERE t_1.prop = ANY(SELECT prop FROM t AS t_2)
results in 6, which is simply a number of rows in t. The result is logical. The ANY-clause simply returned TRUE for every row and once all rows had been gathered COUNT(...) returned simply the number of the gathered (i.e. all) rows.
By the way, it seems to me that the "full" previous SELECT query (Q1) should return that very 6.
So, the main question is how to loop in such a query? Is there such
an opportunity?
UPDATE
The answer to the question (2) is found here, thanks to
Luuk. I just assigned an alias to the (...) subquery in SELECT Max(C) FROM (...) AS sq and it worked out. And of course, I got 6. So, the question (1) is still unclear.
I've also tried to do an iteration this way (Q3):
SELECT (SELECT prop_2 FROM t_2 WHERE t_2.prop_1 = t_1.prop) AS isq FROM t_1;
Here in t_2 prop_2 is connected to prop_1 (a.k.a. prop in t_1) as many to one. So, along the course, our isq (inner select query) returns several (rows of) prop_2 values per each prop value in t_1.
And that is why (Q3) throws the error:
Subquery returns more than 1 row.
Again, logical. So, I couldn't create a loop in a single non-grouping SELECT-query.
This query will return the value for b with the highest count:
SELECT b, count(*)
FROM table1
GROUP BY b
ORDER BY count(*) DESC
LIMIT 1;
see: DBFIDDLE
EDIT: Without GROUP BY
SELECT b,C1
FROM (
SELECT
b,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY A) C1,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY A DESC) C2
FROM table1
) x
WHERE x.C2=1
see: DBFIDDLE
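The closest thing to a "loop" inside a single SELECT is probably a correlated scalar subquery: it is evaluated once per outer row and may refer to that row, which the ANY(...) attempt in (Q1) and (Q2) does not do. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module as a stand-in for MySQL, and the six rows in t are invented to match the row count mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, prop TEXT)")
# Six made-up rows (assumption): 'a' three times, 'b' twice, 'c' once.
conn.executemany(
    "INSERT INTO t (prop) VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("b",), ("c",)],
)

# value_counts-like result without GROUP BY: the inner SELECT runs for
# every outer row and counts the rows that share that row's prop.
per_value = conn.execute("""
    SELECT DISTINCT t_1.prop,
           (SELECT COUNT(*) FROM t AS t_2 WHERE t_2.prop = t_1.prop) AS c
    FROM t AS t_1
    ORDER BY t_1.prop
""").fetchall()

# Maximal repetition count, again without GROUP BY. Note the alias "sq"
# on the derived table, which MySQL requires.
max_count = conn.execute("""
    SELECT MAX(c)
    FROM (
        SELECT (SELECT COUNT(*) FROM t AS t_2 WHERE t_2.prop = t_1.prop) AS c
        FROM t AS t_1
    ) AS sq
""").fetchone()[0]

print(per_value, max_count)
```

The difference from (Q2) is the condition t_2.prop = t_1.prop inside the subquery, which ties each inner count to the current outer row instead of matching every row against the whole column.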

Select least 6 values across 100 columns in a MySQL row

I have a table with 103 columns. First column (rowID) is the row index, the next one contains a date, and the third one contains a string (a name), then there are 100 columns (named A1 to A100) that each contain an integer. I am trying to write a query to fetch the lowest 6 values among those 100 columns, for each row.
Here is what I tried. I had to write out all 100 columns (is there a better way?), and this only gets me the smallest 1, and NOT the smallest 6:
SELECT LEAST(A1,A2,A3,A4,...A100) FROM myTable WHERE rowID=1
I am thinking maybe I can use 5 queries to run the least command each time, returning the result to the backend which will then exclude the column that contained the least value in the previous query. However I am not sure this is the best way because I am trying to keep it all within MySQL. Is there a way to use sub-queries to do this? Or another effective method. Any help would be appreciated!
Edit: I also need to know the columns from which those minimum 6 values were obtained.
You seem to be storing a multi-valued attribute in a denormalized way.
If you need to do set-oriented comparisons on these values, they should be stored in rows, not columns.
You can "unpivot" them, so each value is on its own row, like this:
SELECT 1 AS ValNo, A1 AS Val FROM MyTable WHERE rowID=1
UNION ALL
SELECT 2, A2 FROM MyTable WHERE rowID=1
UNION ALL
SELECT 3, A3 FROM MyTable WHERE rowID=1
UNION ALL
SELECT 4, A4 FROM MyTable WHERE rowID=1
UNION ALL
...
UNION ALL
SELECT 100, A100 FROM MyTable WHERE rowID=1
Then by putting that into a subquery, get the lowest 6 values.
SELECT ValNo, Val
FROM ( ... subquery above ... ) AS t
ORDER BY Val
LIMIT 6
You would be better off to store a table with one column for the value, and up to 100 rows for each rowId:
CREATE TABLE MyNewTable (
RowId INT,
OrdinalId TINYINT, -- 1 to 100
Aval INT,
PRIMARY KEY (RowId, OrdinalId)
);
Then you can query it more simply:
SELECT OrdinalId, Aval
FROM MyNewTable
WHERE RowId = 1
ORDER BY Aval
LIMIT 6;
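If the wide table has to stay as it is, the long UNION ALL does not need to be typed by hand; it can be generated. The following is a sketch under assumptions: it uses Python's sqlite3 module as a stand-in for MySQL, only 8 columns instead of 100 to keep it short, made-up values, and a hypothetical helper named smallest_values. It also returns the column name next to each value, as the edit in the question asks.

```python
import sqlite3

NUM_COLS = 8  # the question has 100 columns; 8 keeps this sketch short

conn = sqlite3.connect(":memory:")
col_defs = ", ".join(f"A{i} INTEGER" for i in range(1, NUM_COLS + 1))
conn.execute(f"CREATE TABLE myTable (rowID INTEGER PRIMARY KEY, {col_defs})")
# Made-up values for A1..A8 in row 1 (assumption).
conn.execute("INSERT INTO myTable VALUES (1, 40, 7, 93, 12, 5, 61, 28, 3)")


def smallest_values(conn, row_id, how_many=6):
    """Unpivot one row and return its smallest values with column names."""
    # One SELECT per column; the column names come from a fixed range,
    # never from user input, so building the SQL text this way is safe.
    branches = " UNION ALL ".join(
        f"SELECT 'A{i}' AS col, A{i} AS val FROM myTable WHERE rowID = :row_id"
        for i in range(1, NUM_COLS + 1)
    )
    query = (
        f"SELECT col, val FROM ({branches}) AS t "
        "ORDER BY val LIMIT :how_many"
    )
    params = {"row_id": row_id, "how_many": how_many}
    return conn.execute(query, params).fetchall()


lowest = smallest_values(conn, 1)
print(lowest)
```

With the normalized MyNewTable layout suggested above, none of this generation would be needed.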

In MySQL, how to retrieve records after/before a record with a given ID?

As part of implementing a GraphQL API I'm attempting to provide before/after pagination. For example, to return LIMIT records that come after a record with a given ID. This has to work on an arbitrary SELECT statement with arbitrary JOIN, WHERE and ORDER BY clauses already in place.
The benefit of this kind of pagination over just using page numbers is that it can more closely return an expected page of results when the underlying data is changing.
If I can get this working for after I can also make it work for before by inverting the ORDER BY clause, so here I'll focus just on the after condition.
What's the easiest or most efficient way to modify a given SELECT statement to accomplish this?
My first thought was to add an AND condition to the WHERE clause restricting results to those with column values from the ORDER BY clause that are greater than or equal to their values in the record with the given ID. But this doesn't seem to work, because there is no expectation of uniqueness in the ORDER BY clause columns, so there's no way to know where the target record will fall in the results, and therefore no way to know how to set the LIMIT to return the correct number of records.
Another approach is to first discover the offset of the target record within the initial SELECT statement, and then to add a LIMIT offset+1, limit clause to the initial SELECT statement with the discovered offset.
MySQL has no row_count() function or similar, but row numbers can be added like this:
SELECT @rownum:=@rownum+1 AS `rank`, t.*
FROM my_table t, (SELECT @rownum:=0) r ORDER BY field2;
Then the above can be used as a subquery to fetch the rank of the target record, e.g.
SELECT `rank` FROM (SELECT @rownum...) AS ranked WHERE id = 42
And then using that rank as the offset for the final query:
SELECT ... LIMIT (rank + 1), 100
Possibly this can be done as a single query with multiple subqueries, e.g.
SELECT ... LIMIT (SELECT `rank` from (SELECT @rownum...) ...) + 1, 100
But this three query approach seems like an elaborate and not very rapid way to perform a very frequently used operation, putting a higher load on our database servers than we would prefer.
Is there a better way to do this?
Edit: A specific example was requested. Say I want to get a page of 2 articles from a table of 10 articles. We'll paginate this query:
select id, title from articles order by title desc
The table:
id, title
1, "a"
3, "b"
4, "c"
6, "d"
7, "e"
8, "f"
9, "g"
10, "h"
11, "i"
12, "k"
So when requesting the page after id 6 the correct records would be 4, "c" and 3, "b". This needs to work for arbitrary WHERE and ORDER BY clauses.
Without any actual example tables, the following query is a blueprint of sorts to use.
It selects the rows from a table where a column value is greater than the row in the sub-query.
The sub-query selects the target row in the table that is the "starting point" for the rows desired.
In the example below, col1 is the ranking column. Change the WHERE clause in the sub-query to whatever would be needed to select the starting point row.
For pagination, alter the LIMIT clause to represent the previous pagination query.
SELECT
col1,
col2,
etc
FROM table a
JOIN (
SELECT col1
FROM table c
WHERE conditions
LIMIT 20,1
) b
ON a.col1 > b.col1
ORDER BY col1
LIMIT 20;
You can simply select the rows before or after your target row based on whatever you are ordering by. Assuming you have an index on id this should only require a single relatively inexpensive subquery:
SELECT id, title
FROM articles
WHERE title < (SELECT title FROM articles WHERE id = '6')
ORDER BY title DESC
LIMIT 0,2;
Next set would be
LIMIT 2,2
Previous set would require the > operator and the opposite sort order (reverse the returned rows to display them in descending order):
SELECT id, title
FROM articles
WHERE title > (SELECT title FROM articles WHERE id = '6')
ORDER BY title ASC
LIMIT 0,2;
etc...
It would be easier to give a more succinct answer with more specific data and table structures but hope this helps.
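The question points out that the ORDER BY columns are not guaranteed to be unique. One common way around that is to append the primary key to the sort as a tie-breaker and compare on both columns. This is a minimal sketch, not a general solution for arbitrary ORDER BY clauses: it uses Python's sqlite3 module as a stand-in for MySQL, the articles data from the question, and a hypothetical helper named page_after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(1, "a"), (3, "b"), (4, "c"), (6, "d"), (7, "e"),
     (8, "f"), (9, "g"), (10, "h"), (11, "i"), (12, "k")],
)


def page_after(conn, cursor_id, page_size):
    """Rows that follow cursor_id under ORDER BY title DESC, id DESC."""
    query = """
        SELECT a.id, a.title
        FROM articles AS a
        JOIN articles AS cur ON cur.id = :cursor_id
        WHERE a.title < cur.title
           OR (a.title = cur.title AND a.id < cur.id)  -- tie-breaker on id
        ORDER BY a.title DESC, a.id DESC
        LIMIT :page_size
    """
    params = {"cursor_id": cursor_id, "page_size": page_size}
    return conn.execute(query, params).fetchall()


page = page_after(conn, 6, 2)
print(page)
```

Because the comparison includes id, rows with equal titles keep a stable position relative to the cursor row, so no offset has to be computed first.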

SQL duplicated rows selection without using "having"

I have this type of table:
A.code A.name
1. X
2. Y
3. X
4. Z
5. Y
And I need to write a query that gives me all duplicated names like this:
A.name
X
Y
Z
Without using "group by".
The correlated subquery is your friend here. The subquery is evaluated for every row in the table referenced in the outer query due to the table alias used in both the outer query and the subquery.
In the subquery, the outer table is queried again without the alias to determine the row's compliance with the condition.
SELECT DISTINCT name FROM Names AS CorrelatedNamesTable
WHERE
(
SELECT COUNT(Name) FROM Names WHERE Name = CorrelatedNamesTable.Name
) > 1
Try using DISTINCT for the column. Please note in tables with a large number of rows, this is not the best performance option.
SELECT DISTINCT A.Name FROM A
SELECT DISTINCT a1.name FROM A a1, A a2 WHERE a1.name=a2.name AND a1.code<>a2.code
This assumes code is unique ;).
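The self-join above can also be written with EXISTS, which uses neither GROUP BY nor HAVING. The sketch below is an illustration only: it uses Python's sqlite3 module as a stand-in and the sample rows from the question, and like the self-join it assumes code is unique. Note that it returns only names that really occur more than once, so Z is not in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (code INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO A (code, name) VALUES (?, ?)",
    [(1, "X"), (2, "Y"), (3, "X"), (4, "Z"), (5, "Y")],
)

# A name is duplicated if another row with a different code shares it.
duplicated = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT a1.name
        FROM A AS a1
        WHERE EXISTS (
            SELECT 1
            FROM A AS a2
            WHERE a2.name = a1.name
              AND a2.code <> a1.code
        )
        ORDER BY a1.name
    """)
]
print(duplicated)
```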