mysql: inner join with in and not in

mysql: inner join with in and not in - mysql

I'm trying to pull rows from one table "articles" based on specific category tags from table "article_category_reference", to exclude articles that have a specific tag. I have this query right now:
SELECT DISTINCT
a.article_id,
a.`title`,
a.`text`,
a.`date`
FROM
`articles` a
INNER JOIN `article_category_reference` c ON
a.article_id = c.article_id AND c.`category_id` NOT IN (54)
WHERE
a.`active` = 1
ORDER BY
a.`date`
DESC
LIMIT 15
The problem is, it seems to grab rows even if they do have a row in the "article_category_reference" table where "category_id" matches "54". I've also tried it in the "where" clause and it makes no difference.
Keep in mind I'm using "NOT IN" as it may be excluding multiple tags.
SQL fiddle to show it: http://sqlfiddle.com/#!9/b2172/1
Tables:
CREATE TABLE `article_category_reference` (
`ref_id` int(11) NOT NULL,
`article_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `articles` (
`article_id` int(11) UNSIGNED NOT NULL,
`author_id` int(11) UNSIGNED NOT NULL,
`date` int(11) NOT NULL,
`title` varchar(120) NOT NULL,
`text` text CHARACTER SET utf8mb4 NOT NULL,
`active` int(1) NOT NULL DEFAULT '1'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

One option is to use an EXISTS clause:
SELECT DISTINCT
a.article_id,
a.title,
a.text,
a.date
FROM articles a
WHERE
a.active = 1 AND
NOT EXISTS (SELECT 1 FROM article_category_reference c
WHERE a.article_id = c.article_id AND c.category_id = 54)
ORDER BY
a.date DESC
LIMIT 15;
The logical problem with your current approach of checking the category in the WHERE clause is that it is checking individual records. You need to assert that all category records for a given article, in aggregate, do not match the category you wish to exclude. An EXISTS clause, as I have written above, is one way to do it. Using GROUP BY in a subquery is another way.

The NOT IN condition is evaluated for each joined rows. Since you have same article_id with multiple category_id-values, the ones that do match the NOT IN condition will get picked.
See SQLFiddle.
To select articles that do not have any rows with category_id 54 use a subquery:
SELECT
a.article_id,
a.`title`,
a.`text`,
a.`date`
FROM `articles` a
WHERE a.`active` = 1 AND
a.`article_id` not in (
SELECT c.article_id
FROM `article_category_reference` c
WHERE c.`category_id` = 54
)
ORDER BY a.`date`
DESC
LIMIT 15

Related

MysqL big table query optimization

I have a chatting application. I have an api which returns list of users who the user talked. But it takes a long to mysql return a list messages when it reachs 100000 rows of data.
This is my messages table
CREATE TABLE IF NOT EXISTS `messages` (
`_id` int(11) NOT NULL AUTO_INCREMENT,
`fromid` int(11) NOT NULL,
`toid` int(11) NOT NULL,
`message` text NOT NULL,
`attachments` text NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`date` datetime NOT NULL,
`delete` varchar(50) NOT NULL,
`uuid_read` varchar(250) NOT NULL,
PRIMARY KEY (`_id`),
KEY `fromid` (`fromid`,`toid`,`status`,`delete`,`uuid_read`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=118561 ;
and this is my users table (simplified)
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`login` varchar(50) DEFAULT '',
`sex` tinyint(1) DEFAULT '0',
`status` varchar(255) DEFAULT '',
`avatar` varchar(30) DEFAULT '0',
`last_active` datetime DEFAULT NULL,
`active` tinyint(1) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=15523 ;
And here is my query (for user with id 1930)
select SQL_CALC_FOUND_ROWS `u_id`, `id`, `login`, `sex`, `birthdate`, `avatar`, `online_status`, SUM(`count`) as `count`, SUM(`nr_count`) as `nr_count`, `date`, `last_mesg` from
(
(select `m`.`fromid` as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, (COUNT(`m`.`_id`)-SUM(`m`.`status`)) as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`fromid`=`m`.`fromid`) left join `users` as u on `u`.`id`=`m`.`fromid` where `m`.`toid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
UNION
(select `m`.toid as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, 0 as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`toid`=`m`.`toid`) left join `users` as u on `u`.`id`=`m`.`toid` where `m`.`fromid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
order by `date` desc ) as `f` group by `u_id` order by `date` desc limit 0,10
Please help to optimize this query
What I need,
Who user talked to (name, sex, and etc)
What was the last message (from me or to me)
Count of messages (all)
Count of unread messages (only to me)
The query works well, but takes too long.
The output must be like this

You have some design problems on your query and database.
You should avoid keywords as column names, as that delete column or the count column;
You should avoid selecting columns not declared in the group by without an aggregation function... although MySQL allows this, it's not a standard and you don't have any control on what data will be selected;
Your not like construction may cause a bad behavior on your query because '%1930;%' may match 11930; and 11930 is not equal to 1930;
You should avoid like constructions starting and ending with % wildcard, which will cause the text processing to take longer;
You should design a better way to represent a message deletion, probably a better flag and/or another table to save any important data related with the action;
Try to limit your result before the join conditions (with a derived table) to perform less processing;
I tried to rewrite your query the best way I understood it. I've executed my query in a messages table with ~200.000 rows and no indexes and it performed in 0,15 seconds. But, for sure you should create the right indexes to help it perform better when the amount of data increase.
SELECT SQL_CALC_FOUND_ROWS
u.id,
u.login,
u.sex,
u.birthdate,
u.avatar,
u.last_active AS online_status,
g._count,
CASE WHEN m.toid = 1930
THEN g.nr_count
ELSE 0
END AS nr_count,
m.`date`,
m.message AS last_mesg
FROM
(
SELECT
MAX(_id) AS _id,
COUNT(*) AS _count,
COUNT(*) - SUM(m.status) AS nr_count
FROM messages m
WHERE 1=1
AND m.`delete` NOT LIKE '%1930;%'
AND
(0=1
OR m.fromid = 1930
OR m.toid = 1930
)
GROUP BY
CASE WHEN m.fromid = 1930
THEN m.toid
ELSE m.fromid
END
ORDER BY MAX(`date`) DESC
LIMIT 0, 10
) g
INNER JOIN messages AS m ON 1=1
AND m._id = g._id
LEFT JOIN users AS u ON 0=1
OR (m.fromid <> 1930 AND u.id = m.fromid)
OR (m.toid <> 1930 AND u.id = m.toid)
ORDER BY m.`date` DESC
;

How would I avoid using two separate queries for this purpose?

I am using MySQL to create a database of articles and categories. Each article has a category. I would like to make a feature for the admin panel that lists all the categories, but also includes the latest article for each category. The method I usually use is to fetch rows from the category table, loop through the results, and then create another query using something like FROM articlesWHERE category_id = {CATEGORY_ID} ORDER BY article_id DESC LIMIT 1. That method just seems like overkill to me and I am wondering if it can be done in one query(Maybe with joins and subqueries?).
This is the current query I have that fetches categories:
SELECT * FROM article_categories ORDER BY category_title ASC
These are my tables:
CREATE TABLE IF NOT EXISTS `articles` (
`article_id` int(15) NOT NULL AUTO_INCREMENT,
`author_id` int(15) NOT NULL,
`category_id` int(15) NOT NULL,
`modification_id` int(15) NOT NULL,
`title` varchar(125) NOT NULL,
`content` text NOT NULL,
`type` tinyint(1) NOT NULL,
`date_posted` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`status` tinyint(1) NOT NULL,
`attachment_id` int(15) NOT NULL,
`youtube_id` varchar(32) DEFAULT NULL,
`refs` text NOT NULL,
`platforms` varchar(6) NOT NULL,
PRIMARY KEY (`article_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `article_categories` (
`category_id` int(15) NOT NULL AUTO_INCREMENT,
`parent_id` int(15) NOT NULL,
`title` varchar(50) NOT NULL,
`description` text NOT NULL,
`attachment_id` text NOT NULL,
`enable_comments` tinyint(1) NOT NULL,
`enable_ratings` tinyint(1) NOT NULL,
`guest_reading` tinyint(1) NOT NULL,
`platform_assoc` tinyint(1) NOT NULL,
`allowed_types` varchar(6) NOT NULL,
PRIMARY KEY (`category_id`,`title`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
This is the query I have come up with so far:
SELECT
c.category_id, c.title, c.description,
a.article_id, a.category_id, a.title, COUNT(a.article_id) AS total_articles
FROM article_categories AS c
LEFT JOIN articles AS l ON (
SELECT
article_id AS article_id, category_id, title AS article_title
FROM articles AS l
WHERE l.category_id = c.category_id
ORDER BY l.article_id
DESC LIMIT 1)
LEFT JOIN articles AS a ON (c.category_id = a.category_id)
GROUP BY c.category_id
ORDER BY c.title ASC
The above query gives me the following SQL error:
Operand should contain 1 column(s)
Why is this happening?

You can return list of all the categories and recent article in each category using one query, Try this
SELECT C.*, A.*
FROM article_categories C
LEFT OUTER JOIN articles A ON c.category_id = A.category_id
WHERE
(
A.category_id IS NULL OR
A.article_id = (SELECT MAX(X.article_id)
FROM articles X WHERE X.category_id = C.category_id)
)

This will restrict the articles to just the highest article_id per category and make use of the indexes on those tables:
select
ac.category_id, ac.title, newa.article_id, newa.title article_title
from article_categories ac
left join articles newa on ac.category_id = newa.category_id
left join articles olda on newa.category_id = olda.category_id
and olda.article_id > newa.article_id
where olda.article_id is null
;
See this Demonstrated at SQLFiddle

Shoelace, I was browsing your other questions and saw that this was unresolved so I've decided to take a crack at it.
This is a little tricky, but I don't think it's too bad, assuming I understand your question correctly. First, get the latest article date for each category:
SELECT a.category_id, MAX(a.date_posted)
FROM articles a
JOIN article_categories c ON c.category_id = a.category_id
GROUP BY a.category_id;
Then, join that with your articles table on the condition that the category_id and date are equal and you have what you need:
SELECT ar.*
FROM articles ar
JOIN(SELECT a.category_id, MAX(a.date_posted) AS latestDateForCategory
FROM articles a
JOIN article_categories c ON c.category_id = a.category_id
GROUP BY a.category_id) t
ON t.category_id = ar.category_id AND t.latestDateForCategory = ar.date_posted;
SQL Fiddle.

complicated sql query returns a result with empty tables

I have three empty tables
--
-- Tabellenstruktur für Tabelle `projects`
--
CREATE TABLE IF NOT EXISTS `projects` (
`id_project` int(11) NOT NULL AUTO_INCREMENT,
`id_plan` int(11) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`description` longtext NOT NULL,
PRIMARY KEY (`id_project`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
-- --------------------------------------------------------
--
-- Tabellenstruktur für Tabelle `project_plans`
--
CREATE TABLE IF NOT EXISTS `project_plans` (
`id_plan` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` longtext NOT NULL,
`max_projects` int(11) DEFAULT NULL,
`max_member` int(11) DEFAULT NULL,
`max_filestorage` bigint(20) NOT NULL DEFAULT '3221225472' COMMENT '3GB Speicherplatz',
PRIMARY KEY (`id_plan`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
-- --------------------------------------------------------
--
-- Tabellenstruktur für Tabelle `project_users`
--
CREATE TABLE IF NOT EXISTS `project_users` (
`id_user` int(11) NOT NULL,
`id_project` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
All these tables are empty but i get a result with my query?
my query:
SELECT
A.id_plan,
A.name AS plan_name,
A.description AS plan_description,
A.max_projects,
A.max_member,
A.max_filestorage,
B.id_plan,
B.name AS project_name,
B.description AS project_description,
C.id_user,
C.id_project,
COUNT(*) AS max_project_member
FROM
".$this->config_vars["projects_plans_table"]." AS A
LEFT JOIN
".$this->config_vars["projects_table"]." AS B
ON
B.id_plan = A.id_plan
LEFT JOIN
".$this->config_vars["projects_user_table"]." AS C
ON
C.id_project = B.id_project
WHERE
C.id_project = '".$id."'
&& B.deleted = '0'
i think the problem is the COUNT (*) AS ...
how i can solve the problem?

For one, you are getting a record explicitly due to the COUNT(). Even though you have no records, you are asking the engine how many records which at worst case will return zero. Count(), like other aggregates are anticipated to have a group by, so even though you don't have one, you are still asking.
So the engine is basically stating hey... there are no records, but I have to send you a record so you can get the count() column to look at and do with what you will. So, it is doing what you asked.
Now, for the comment to the other question where you asked...
Yes but i want to count the project member from a project, how i can count the users from project_users where all users have the id_project 1.
Since you only care about a count, and not the specific WHO involved, you can get this result directly from the project_users table (which should have an index on both the ID_User and another on the ID_Project. Then
select count(*)
from project_users
where id_project = 1
To expand from basis of your original question to get the extra details, I would do...
select
p.id_project,
p.id_plan,
p.name as projectName,
p.description as projectDescription,
pp.name as planName,
pp.description as planDescription,
pp.max_projects,
pp.max_member,
pp.max_filestorage,
PJCnt.ProjectMemberCount
from
( select id_project,
count(*) as ProjectMemberCount
from
project_users
where
id_project = 1 ) PJCnt
JOIN Projects p
on PJCnt.id_project = p.id_project
JOIN Project_Plans PP
on p.id_plan = pp.id_plan
Now, based on this layout of tables, a plan can have a max member count, but there is nothing indicating max members for the plan based on all projects, or max per SINGLE project. So, if a plan allows for 20 people, can there be 20 people for 10 different projects under the same plan? That's something only you would know the impact of... just something to consider what you are asking for.

Your cleaned-up query should look like :
See sqlfidle demo as well : http://sqlfiddle.com/#!2/e693f5/9
SELECT
A.id_plan,
A.name AS plan_name,
A.description AS plan_description,
A.max_projects,
A.max_member,
A.max_filestorage,
B.id_plan,
B.name AS project_name,
B.description AS project_description,
C.id_user,
C.id_project,
COUNT(*) AS max_project_member
FROM
project_plans AS A
LEFT JOIN
projects AS B
ON
B.id_plan = A.id_plan
LEFT JOIN
project_users AS C
ON
C.id_project = B.id_project
WHERE
C.id_project = '".$id."';
This will return you null values for all the cols from the select because you have one legit return form the result set and that is the count(*) output 0.
To fix this just add a group by at the end (see group by example http://sqlfiddle.com/#!2/14d46/2) or
Remove the count(*) and the null values will be gone as well as the count(*) values 0
See simple sql example here : http://sqlfiddle.com/#!2/ab7dd/5
Just comment the count() and you fixed you null problem!

Mysql select if only all sizes equal 0

I have some mysql tables:
`items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`cat_id_p` int(11) NOT NULL,
`cat_id` int(11) DEFAULT NULL,
`brand_id` int(11) DEFAULT NULL,
...
)
`items_sizes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`item_id` int(11) NOT NULL,
`size_id` int(11) NOT NULL,
`count` int(11) DEFAULT '1',
...
)
And i need to select items which only have items_sizes.count < 1 and not items which have at least count > 1
Here is sql query:
SELECT
DISTINCT `items`.*
FROM
(`items2`)
LEFT JOIN `items_sizes` ON items_sizes`.`item_id` = `items`.`id`
WHERE ...
AND `items_sizes`.`item_id` = items.id
AND `items_sizes`.`count` < 1
GROUP BY `items`.`id`
ORDER BY `items`.`id` desc
LIMIT 30
But it does not work... May be i need If statement ?
SOLVED! JUST with SUM and HAVING
SELECT DISTINCT `items`.*, sum(items_sizes.count)
FROM (`items`)
LEFT JOIN `items_sizes` ON `items_sizes`.`item_id` = `items`.`id`
WHERE ...
GROUP BY `items`.`id`
having sum(items_sizes.count)=0
ORDER BY `items`.`id` desc LIMIT 30

SELECT
DISTINCT *
FROM
`items`
WHERE
NOT EXISTS (
SELECT * FROM `items_sizes`
WHERE
`items_sizes`.`item_id` = `items`.`id`
AND `items_sizes`.`count` > 0
)
-- ...
ORDER BY `id` desc
LIMIT 30

Assuming the table name items2 in your FROM clause is a typo or the items.* is a typo and should be items2.*...
You have no aggregate functions (SUM(), COUNT(), AVG()), so there is no need for the GROUP BY. It also appears you have mixed the WHERE clause up with the ON clause used in your JOIN. The first WHERE condition should not be there:
SELECT
DISTINCT items2.*
FROM
items2
LEFT JOIN items_sizes ON items2.id = items_sizes.item_id
WHERE ...
AND items_sizes.count < 1
ORDER BY items2.id desc
LIMIT 30
Note that the part of your WHERE clause that we don't see (WHERE ...) might be significant here as well...
The LEFT JOIN is probably unnecessary, and can just be a JOIN because the items_sizes.count < 1 will eliminate the NULL values the LEFT JOIN would have returned anyway.

mysql query with multiple groups?

I have a query that gets product IDs based on keywords.
SELECT indx_search.pid
FROM indx_search
LEFT JOIN word_index_mem ON (word_index_mem.word = indx_search.word)
WHERE indx_search.word = "phone"
GROUP BY indx_search.pid
ORDER BY indx_search.pid ASC
LIMIT 0,20
This works well but now I'm trying to go a step further and implement "price range" into this query.
CREATE TABLE `price_range` (
`pid` int(11) NOT NULL,
`range_id` tinyint(3) NOT NULL,
PRIMARY KEY (`pid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
This table simply contains product IDs and a range_id. The price range values are stored here:
CREATE TABLE `price_range_values` (
`ID` tinyint(3) NOT NULL AUTO_INCREMENT,
`rangeFrom` float(10,2) NOT NULL,
`rangeTo` float(10,2) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM AUTO_INCREMENT=32 DEFAULT CHARSET=latin1
I want to GROUP price_range.range_id with COUNT() of how many products match the certain price range within the current query. I still want to receive my 20 results of product IDs.
So something along the lines of:
SELECT indx_search.pid, price_range.range_id as PriceRangeID, COUNT(price_range.range_id) as PriceGroupTotal
FROM indx_search
LEFT JOIN windex_mem ON ( windex_mem.word = indx_search.word )
LEFT JOIN price_range ON ( price_range.pid = indx_search.pid )
WHERE indx_search.word = "memory"
GROUP BY indx_search.pid, PriceRangeID
ORDER BY indx_search.pid ASC
LIMIT 0 , 20
Is this possible to accomplish without busting an additional query?

Try to use a subquery with a LIMIT clause, e.g. -
SELECT
i_s.pid, p_r.range_id as PriceRangeID, COUNT(p_r.range_id) as PriceGroupTotal
FROM
(SELECT * FROM indx_search WHERE i_s.word = 'memory' ORDER BY i_s.pid ASC LIMIT 0 , 20) i_s
LEFT JOIN
windex_mem w_m ON w_m.word = i_s.word
LEFT JOIN
price_range p_r ON p_r.pid = i_s.pid
GROUP BY
i_s.pid, PriceRangeID

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008