Count matches in the previous 100 rows - mysql

As the title states, I'm trying to count the number of matches within the last 100 records in a certain table.
This query works, but the data is very dynamic: there are lots of inserts on that particular table, and similar queries run at the same time. They all end up being extremely slow (20s), probably because they block each other.
Because caching the result is not acceptable (the data has to be live), I'm thinking of moving the outer query into PHP. I know that would normally be slower, but it would still be faster than 20s.
Here's the query
SELECT count(*) as matches
FROM (
SELECT first_name FROM customers
WHERE division = 'some_division'
AND type = 'employee'
ORDER BY request_time DESC
LIMIT 0, 100
) as entries
WHERE first_name = 'some_first_name_here'
What I'm looking for is a more optimized way of performing the same task, without having to implement it in PHP, since that's the naive/obviously wrong approach.
the table looks something like this:
id first_name last_name type division request_time
Just to set things straight, this is obviously not the actual table / data due to NDA reasons, but, the table looks exactly the same with different column names.
So again, what I'm trying to achieve is to pull a count of matches found WITHIN the last 100 records that satisfy some constraints.
for example,
how many times does the name 'John' appear within the last 100 employees added in the HR division?

I see.
How about something like this...
SELECT i
FROM
( SELECT CASE WHEN first_name = 'some_first_name_here' THEN @i:=@i+1 END i
FROM customers
, (SELECT @i:=0) val
WHERE division = 'some_division'
AND type = 'employee'
ORDER
BY request_time
DESC
LIMIT 0,100
) n
ORDER
BY i DESC
LIMIT 1;

Try this:
SELECT SUM(matches)
FROM
(
SELECT IF(first_name = 'some_first_name_here', 1, 0) AS matches
FROM customers
WHERE division = 'some_division' AND type = 'employee'
ORDER BY request_time DESC
LIMIT 0,100
) AS entries
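As a quick sanity check that the COUNT(*) version from the question and the SUM version above agree, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows, the function names, and the use of CASE instead of MySQL's IF() are assumptions made for illustration only; it says nothing about how either query performs on the real table.

```python
import sqlite3

def make_db():
    # Hypothetical sample data: 300 employees, alternating HR / IT,
    # every 7th one named John. request_time is a simple counter.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, "
        "last_name TEXT, type TEXT, division TEXT, request_time INTEGER)"
    )
    rows = []
    for i in range(1, 301):
        name = "John" if i % 7 == 0 else "Jane"
        division = "HR" if i % 2 == 0 else "IT"
        rows.append((name, "Doe", "employee", division, i))
    conn.executemany(
        "INSERT INTO customers (first_name, last_name, type, division, request_time) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn

# The inner query shared by both versions: the last 100 matching records.
LAST_100 = """
    SELECT first_name FROM customers
    WHERE division = ? AND type = 'employee'
    ORDER BY request_time DESC
    LIMIT 100
"""

def count_with_outer_where(conn, division, name):
    # Same shape as the query in the question.
    sql = f"SELECT COUNT(*) FROM ({LAST_100}) AS entries WHERE first_name = ?"
    return conn.execute(sql, (division, name)).fetchone()[0]

def count_with_sum(conn, division, name):
    # Same shape as the SUM answer; CASE replaces MySQL's IF() for portability.
    sql = (
        "SELECT SUM(CASE WHEN first_name = ? THEN 1 ELSE 0 END) "
        f"FROM ({LAST_100}) AS entries"
    )
    return conn.execute(sql, (name, division)).fetchone()[0]

if __name__ == "__main__":
    conn = make_db()
    print(count_with_outer_where(conn, "HR", "John"))
    print(count_with_sum(conn, "HR", "John"))
```

Both functions scan the same 100-row subquery, so any speed difference on a real server would have to come from somewhere else (indexing, locking), which this sketch does not model.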

Related

Why does the SQL LIMIT clause return random rows for every query?

It is a very simple query. For every query, I get a different result. Similar things happen when I used TOP 1. I would like a random sub-sample and it works. But am I missing something? Why does it return a different value every time?
SELECT DISTINCT user_id FROM table1
where day_id>="2009-01-09" and day_id<"2011-02-16"
LIMIT 1;
There's no guarantee that you will get a random result with your query. It's quite likely you'll get the same result each time (although the actual result returned will be indeterminate). To guarantee that you get a random, unique user_id, you should SELECT a random value from the list of DISTINCT values:
SELECT user_id
FROM (SELECT DISTINCT user_id
FROM table1
WHERE day_id >= "2009-01-09" AND day_id < "2011-02-16"
) u
ORDER BY RAND()
LIMIT 1
SQL result sets are unordered unless you ask for an order. Add an ORDER BY clause such as:
...
ORDER BY user_id
LIMIT 1
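The two suggestions above (a fixed ORDER BY for a repeatable row, ORDER BY RAND() over the distinct values for a random one) can be tried out with a small sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, so RANDOM() takes the place of RAND(); the sample rows and function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_id INTEGER, day_id TEXT)")
# Hypothetical rows; user 99 falls outside the date range.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (42, "2010-05-01"),
        (7, "2009-03-15"),
        (7, "2010-01-01"),
        (19, "2010-12-31"),
        (99, "2012-01-01"),
    ],
)

IN_RANGE = "day_id >= '2009-01-09' AND day_id < '2011-02-16'"

def first_user_id(conn):
    # An explicit ORDER BY makes LIMIT 1 repeatable.
    sql = (
        f"SELECT DISTINCT user_id FROM table1 WHERE {IN_RANGE} "
        "ORDER BY user_id LIMIT 1"
    )
    return conn.execute(sql).fetchone()[0]

def random_user_id(conn):
    # Pick one of the distinct ids at random.
    # RANDOM() is SQLite's function; in MySQL this would be RAND().
    sql = (
        f"SELECT user_id FROM (SELECT DISTINCT user_id FROM table1 WHERE {IN_RANGE}) AS u "
        "ORDER BY RANDOM() LIMIT 1"
    )
    return conn.execute(sql).fetchone()[0]

if __name__ == "__main__":
    print(first_user_id(conn))   # same id on every run
    print(random_user_id(conn))  # one of the in-range ids, varies between runs
```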

Should ORDER BY change the result set?

I was under the impression that using an ORDER BY in an SQL query would not affect which records were selected for the result set. I thought that ORDER BY would only affect the presentation of the result set.
Recently, however, I was getting unexpected results from a query until I used an ORDER BY clause. This suggests that either a) ORDER BY can affect which records are included in the result set, or b) I have some other bug which I need to work on.
Which is it?
Here's the query: SELECT node_id FROM users ORDER BY node_id LIMIT 100
(node_id is both a primary key and foreign key).
As you can see, the query includes a LIMIT clause. It seems that if I use the ORDER BY, the records are ordered before the top 100 are selected. I had expected it to select 100 records based on natural order, then order them according to node_id.
I've looked for info on ORDER BY but as yet, the only info I can find suggests that it affects presentation only... I am using MySQL.
ORDER BY sorts all of the matching records before the LIMIT clause is applied. To get the result you want, you will need this:
select u.node_id
from users u
join
(
SELECT node_id
FROM users
LIMIT 100
) us ON u.node_id = us.node_id
ORDER BY u.node_id
This way the LIMIT clause is applied first, picking 100 records (which ones is not guaranteed without an ORDER BY), and the result of that is sorted afterwards. The join can be faster than a nested SELECT statement, especially if you are working with many records.
You can use a nested query:
SELECT node_id FROM
(
SELECT node_id FROM users LIMIT 100
) u
ORDER BY node_id
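To see the difference between "sort, then limit" and "limit, then sort", here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL. The five sample ids and the function names are assumptions for illustration. Which rows the inner LIMIT picks is up to the database, so the sketch only relies on what is guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (node_id INTEGER)")
# Hypothetical ids, deliberately inserted out of order.
conn.executemany("INSERT INTO users VALUES (?)", [(n,) for n in (50, 10, 40, 20, 30)])

def sort_then_limit(conn, n):
    # ORDER BY runs over the whole table, then LIMIT keeps the n smallest ids.
    sql = "SELECT node_id FROM users ORDER BY node_id LIMIT ?"
    return [row[0] for row in conn.execute(sql, (n,))]

def limit_then_sort(conn, n):
    # The inner query keeps n rows chosen by the database; only those are sorted.
    sql = (
        "SELECT node_id FROM (SELECT node_id FROM users LIMIT ?) AS u "
        "ORDER BY node_id"
    )
    return [row[0] for row in conn.execute(sql, (n,))]

if __name__ == "__main__":
    print(sort_then_limit(conn, 3))  # the three smallest ids
    print(limit_then_sort(conn, 3))  # some three ids, sorted
```

The first function always returns the smallest ids in the table. The second returns a sorted list of whichever rows the database happened to hand back first, so the two can disagree.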

SELECT other if value is no

I have a column called is_thumbnail with a default value of no. I select a row based off of is_thumbnail = 'yes'
$query = $conn->prepare("SELECT * FROM project_data
WHERE project_id = :projectId
AND is_thumbnail = 'yes'
LIMIT 1");
There is a chance that no rows will have a value of yes. In that case, I want to select the first row with that same projectId regardless of the value of is_thumbnail
Now I know I can check what the query returns and then run another query. I was wondering whether it is possible to do this in a single query, or whether there is some way I can take advantage of PDO. I just started using PDO. Thanks!
Example data:
id project_id image is_thumbnail
20 2 50f5c7b5b8566_20120803_185833.jpg no
19 2 50f5c7b2767d1_4link 048.jpg no
18 2 50f5c7af2fb22_4link 047.jpg no
$query = $conn->prepare("SELECT * FROM project_data
WHERE project_id = :projectId
ORDER BY is_thumbnail = 'yes' DESC LIMIT 1");
Given that the schema in the question has multiple rows per project_id, solutions that rely only on ORDER BY is_thumbnail ... may not perform well, for instance if a single project has many related rows. The cost of sorting those rows can be fairly high, and the sort won't be able to use an index. An alternative solution, which may be necessary, is:
SELECT * FROM (
(SELECT *
FROM project_data
WHERE project_id = :projectId AND is_thumbnail = "yes"
ORDER BY id DESC
LIMIT 1)
UNION
(SELECT *
FROM project_data
WHERE project_id = :projectId AND is_thumbnail = "no"
ORDER BY id DESC
LIMIT 1)
) AS t
ORDER BY t.is_thumbnail = "yes" DESC
LIMIT 1
While this solution is a bit more complex to understand, it is able to use a compound index on (project_id, is_thumbnail, id) to quickly find exactly one row matching the requested conditions. The outer select ensures a stable ordering of the yes/no rows if both are found.
Note that you could also just issue two queries, and probably get similar or better performance. In order to use the above UNION and sub-select, MySQL will require temporary tables, which aren't great in busy environments.
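The two-query alternative mentioned above might look roughly like the following. This is a sketch in Python with the built-in sqlite3 module rather than PHP/PDO and MySQL; the function name pick_thumbnail, the "newest id first" ordering, and the extra project 3 rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project_data (id INTEGER PRIMARY KEY, project_id INTEGER, "
    "image TEXT, is_thumbnail TEXT DEFAULT 'no')"
)
conn.executemany(
    "INSERT INTO project_data VALUES (?, ?, ?, ?)",
    [
        # Rows from the question's example data.
        (20, 2, "50f5c7b5b8566_20120803_185833.jpg", "no"),
        (19, 2, "50f5c7b2767d1_4link 048.jpg", "no"),
        (18, 2, "50f5c7af2fb22_4link 047.jpg", "no"),
        # Hypothetical project that does have a thumbnail.
        (30, 3, "a.jpg", "no"),
        (31, 3, "b.jpg", "yes"),
        (32, 3, "c.jpg", "no"),
    ],
)

def pick_thumbnail(conn, project_id):
    # First query: look for a row explicitly flagged as the thumbnail.
    row = conn.execute(
        "SELECT id, image, is_thumbnail FROM project_data "
        "WHERE project_id = ? AND is_thumbnail = 'yes' "
        "ORDER BY id DESC LIMIT 1",
        (project_id,),
    ).fetchone()
    if row is None:
        # Second query, only when needed: fall back to any row of the project.
        row = conn.execute(
            "SELECT id, image, is_thumbnail FROM project_data "
            "WHERE project_id = ? ORDER BY id DESC LIMIT 1",
            (project_id,),
        ).fetchone()
    return row

if __name__ == "__main__":
    print(pick_thumbnail(conn, 2))  # falls back to a 'no' row
    print(pick_thumbnail(conn, 3))  # finds the 'yes' row
```

The second query only runs when the first finds nothing, so projects that have a thumbnail cost a single lookup.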

MYSQL find related query

I'm writing a query where I'm looking to pull related fields from a database with a limit of 10 rows.
The query is easy to write, however I was wondering if there is a way to write the query so it searches for related items and pulls those first and if those are < 10 it will just pull random fields for the remaining ones.
Here is the query I use to pull the related rows
SELECT * FROM table WHERE term LIKE '%term1%' or term LIKE '%term2%' LIMIT 0,10
You just need to order the table so that rows matching the terms you are looking for come first. One way of doing this is as follows:
SELECT * FROM table
ORDER BY (
(
CASE WHEN term LIKE '%term1%'
THEN 1
ELSE 0
END
) + (
CASE WHEN term LIKE '%term2%'
THEN 1
ELSE 0
END
)
) DESC
LIMIT 0,10
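Here is a small sketch of that ordering in action, using Python's built-in sqlite3 as a stand-in for MySQL. The table name items (the question's placeholder, table, is a reserved word), the sample rows, and the id tie-breaker are assumptions added so the example runs and gives a repeatable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, term TEXT)")
# Hypothetical rows.
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        (1, "blue car"),
        (2, "red bike"),
        (3, "red car"),
        (4, "green boat"),
        (5, "yellow car"),
    ],
)

def related_first(conn, term1, term2, limit=10):
    # Each CASE contributes 1 when its term matches, so the sum is a
    # relevance score of 0, 1 or 2. Rows with no match still appear,
    # but only after every matching row.
    sql = """
        SELECT id, term FROM items
        ORDER BY (CASE WHEN term LIKE ? THEN 1 ELSE 0 END)
               + (CASE WHEN term LIKE ? THEN 1 ELSE 0 END) DESC,
                 id
        LIMIT ?
    """
    return conn.execute(sql, (f"%{term1}%", f"%{term2}%", limit)).fetchall()

if __name__ == "__main__":
    for row in related_first(conn, "red", "car"):
        print(row)
```

Without the id tie-breaker, rows with the same score come back in no particular order, which is also where the "random fields for the remaining ones" from the question would come from.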

Does this MySQL query always return the expected result?

I wrote a query as follows:
SELECT COUNT(*) AS count, email
FROM sometable
GROUP BY email
ORDER BY count DESC
LIMIT 4
I am interested in seeing the four most duplicated email entries in the table. So far, it seems to return exactly what I want:
count email
12 very-duplicated@email.com
2 duped-twice@email.com
2 also-twice@email.com
1 single@email.com
When I don't use LIMIT, I get the same result (albeit with many more rows having a count = 1). What I'm wondering about is the LIMIT. In the future, when the numbers change, will my query above still return the four most used emails? or does the query need to scan the entire database to remain accurate?
(note: I am not trying to prevent duplicates, I'm trying to see the most frequently used email.)
I'm not sure. But if you're concerned, you could apply a limit to a subquery:
select *
from
(
SELECT COUNT(*) AS count, email
FROM sometable
GROUP BY email
ORDER BY count DESC
) AS t
limit 4
Alternatively, you could do something like this to see all duplicated email addresses (it may return more or fewer than 4):
SELECT COUNT(*) AS count, email
FROM sometable
GROUP BY email
having COUNT(email) > 1
ORDER BY count DESC
First of all, the query does not return only the duplicated entries. Look at the 4th row: count = 1 means that email occurs only once in the table. To list only duplicated records, modify your query as follows:
SELECT COUNT(*) AS count, email
FROM sometable
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY count DESC
LIMIT 4
This will then always return the 4 most duplicated entries in your table, in the order specified.
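A minimal sketch of the HAVING version, run against made-up data with Python's built-in sqlite3 as a stand-in for MySQL. The sample emails, the alias cnt (used instead of count), and the extra email tie-breaker in the ORDER BY are assumptions for illustration; the tie-breaker is there because rows with equal counts otherwise come back in no particular order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (email TEXT)")
# Hypothetical data: one email three times, two emails twice, one email once.
emails = (
    ["very-duplicated@email.com"] * 3
    + ["duped-twice@email.com"] * 2
    + ["also-twice@email.com"] * 2
    + ["single@email.com"]
)
conn.executemany("INSERT INTO sometable VALUES (?)", [(e,) for e in emails])

def top_duplicates(conn, limit=4):
    # GROUP BY counts every email in the table, HAVING drops the ones that
    # occur only once, and LIMIT is applied last, after sorting.
    sql = """
        SELECT COUNT(*) AS cnt, email
        FROM sometable
        GROUP BY email
        HAVING COUNT(*) > 1
        ORDER BY cnt DESC, email
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()

if __name__ == "__main__":
    for row in top_duplicates(conn):
        print(row)
```

Because the grouping and sorting happen before LIMIT, the whole table is still counted on every run; LIMIT only trims the output.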