Tagging query with group_concat - mysql

Using the database schema for tagging from this question's accepted answer, is it possible to have a query using group_concat that works with a large amount of data? I need to get items with their tags for all items tagged with tag x. With about 0.5 million tags, a query using group_concat is very slow (more than 15 seconds). Without group_concat (items without tags) it takes about 0.05 seconds.
As a side question, how does SO solve this problem?

This is probably a case of a poor indexing strategy. Adapting the schema shown in the accepted answer of the question to which you linked:
CREATE Table Items (
Item_ID SERIAL,
Item_Title VARCHAR(255),
Content TEXT
) ENGINE=InnoDB;
CREATE TABLE Tags (
Tag_ID SERIAL,
Tag_Title VARCHAR(255)
) ENGINE=InnoDB;
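CREATE TABLE Items_Tags (
Item_ID BIGINT UNSIGNED,
Tag_ID BIGINT UNSIGNED,
PRIMARY KEY (Item_ID, Tag_ID),
-- declared at table level: MySQL has historically parsed but ignored inline REFERENCES clauses
FOREIGN KEY (Item_ID) REFERENCES Items (Item_ID),
FOREIGN KEY (Tag_ID) REFERENCES Tags (Tag_ID)
) ENGINE=InnoDB;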
Note that:
MySQL's SERIAL data type is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE and, as such, is indexed;
defining the foreign key constraints in Items_Tags creates indexes on the foreign key columns.
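With those indexes in place, the query from the question could be written roughly as follows. This is only a sketch against the schema above: the Tag_ID value 42 and the aliases x, it, i and t are placeholders, not something taken from the original question.

```sql
-- Items tagged with tag 42 (hypothetical Tag_ID), each with all of its tags.
-- x  : the Items_Tags rows that select the items carrying tag 42
-- it : the Items_Tags rows that list every tag of those items
SELECT   i.Item_ID,
         i.Item_Title,
         GROUP_CONCAT(t.Tag_Title ORDER BY t.Tag_Title SEPARATOR ', ') AS Tags
FROM     Items_Tags AS x
JOIN     Items      AS i  ON i.Item_ID  = x.Item_ID
JOIN     Items_Tags AS it ON it.Item_ID = i.Item_ID
JOIN     Tags       AS t  ON t.Tag_ID   = it.Tag_ID
WHERE    x.Tag_ID = 42
GROUP BY i.Item_ID, i.Item_Title;
```

Prefixing the statement with EXPLAIN should show whether the indexes on Tag_ID and Item_ID are actually being used.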

I would propose a hybrid between normalized and denormalized data.
Starting from the normalized structure provided by eggyal, I would add the following denormalized table:
CREATE TABLE Items_Tags_Denormalized (
Item_ID BIGINT UNSIGNED,
Tags BLOB,
PRIMARY KEY (Item_ID),
FOREIGN KEY (Item_ID) REFERENCES Items (Item_ID)
) ENGINE=InnoDB;
The Tags column would hold all the tags (Tag_Title) for the corresponding Item_ID.
There are two ways to maintain it:
create a cron job that runs periodically and rebuilds the Items_Tags_Denormalized table using GROUP_CONCAT or whatever suits you (advantage: no additional load when you insert into or delete from the Items_Tags table; disadvantage: the denormalized table will not always be up to date, depending on how often the cron job runs)
create triggers on the Items_Tags table for insert and delete to keep the Items_Tags_Denormalized table up to date (advantage: the denormalized table is always up to date; disadvantage: additional load when you insert into or delete from the Items_Tags table)
Choose whichever solution suits your needs best, considering the advantages and disadvantages.
In the end you will have the Items_Tags_Denormalized table, which you only read from, without doing any additional work at query time.
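A minimal sketch of the trigger variant, assuming the tables above. The trigger names Items_Tags_ai and Items_Tags_ad are made up for illustration, and DELIMITER is a command of the mysql command-line client rather than SQL itself.

```sql
DELIMITER //

-- After a tag is attached to an item, rebuild that item's tag list.
CREATE TRIGGER Items_Tags_ai AFTER INSERT ON Items_Tags
FOR EACH ROW
BEGIN
  REPLACE INTO Items_Tags_Denormalized (Item_ID, Tags)
  SELECT it.Item_ID,
         GROUP_CONCAT(t.Tag_Title ORDER BY t.Tag_Title SEPARATOR ',')
  FROM   Items_Tags AS it
  JOIN   Tags AS t ON t.Tag_ID = it.Tag_ID
  WHERE  it.Item_ID = NEW.Item_ID
  GROUP  BY it.Item_ID;
END//

-- After a tag is removed, drop the old list and rebuild it
-- only if the item still has tags left.
CREATE TRIGGER Items_Tags_ad AFTER DELETE ON Items_Tags
FOR EACH ROW
BEGIN
  DELETE FROM Items_Tags_Denormalized WHERE Item_ID = OLD.Item_ID;
  INSERT INTO Items_Tags_Denormalized (Item_ID, Tags)
  SELECT it.Item_ID,
         GROUP_CONCAT(t.Tag_Title ORDER BY t.Tag_Title SEPARATOR ',')
  FROM   Items_Tags AS it
  JOIN   Tags AS t ON t.Tag_ID = it.Tag_ID
  WHERE  it.Item_ID = OLD.Item_ID
  GROUP  BY it.Item_ID;
END//

DELIMITER ;
```

Keep in mind that GROUP_CONCAT truncates its result at group_concat_max_len (1024 bytes by default), so items with many tags may need a larger setting.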

Why would you use group_concat for that? For a given tag x, you said that selecting the list of items is fast. For a given list of items, getting all the tags should be fast, too. And isn't there normally some kind of limit? Normal websites don't show 100000 entries on one page.
I would suggest:
drop temporary table if exists lookup_item;
create temporary table lookup_item (item_id bigint unsigned not null, primary key(item_id));
insert into lookup_item select i.id as item_id
from items i
where exists (select * from items_tags where item_id = i.id and tag_id = <tag_id>)
and <other conditions or limits>;
select i.*, t.* from lookup_item li
inner join items i on i.id = li.item_id
inner join items_tags it on it.item_id = i.id
inner join tags t on t.id = it.tag_id
order by i.<priority>, t.<priority>;
Here <priority> could be last-modified for items and some kind of importance for tags.
Then you get every item with its tags. The only work left in the application code is to detect when a result row starts the next item.

If I understand correctly, GROUP_CONCAT isn't the only thing you are removing that makes the query faster without tags. Inside the GROUP_CONCAT you're selecting Tags.Tag_Title and forcing the Tags table to be accessed.
You could try running GROUP_CONCAT with Items_Tags.Tag_ID to test my theory.

Related

Mysql database empty column values vs additional identifying table

Sorry, I'm not sure the question title reflects the real question, but here goes:
I am designing a system which has a standard orders table, but with additional previous and next columns.
The question is which approach to foreign keys is better.
Here I have a basic table with the columns previous and next, which are self-referencing foreign keys. The problem with this table is that the first order placed doesn't have previous and next values, so those columns are left empty. If I have, say, 10 000 records and 30% of them have those columns empty, that's 3000 rows, which is quite a lot I think, and I expect the numbers to grow. In, say, a year's time it could come to 30000 rows with empty columns, and I am not sure if that's OK.
The solution I've come up with is to have the main table plus 2 other tables which have foreign keys to it. In this case those 2 additional tables are identifying tables and nothing more, and there are no longer any rows with empty columns.
So the question is which solution is better in terms of query speed, table optimization, and common good practice, or is there an even better one that I don't know of? (P.S. I am using MySQL with the InnoDB engine.)
If your aim is to do order sets, you could simply add a new table for that, and just have a single column as a foreign key to that table in the order table.
The orders could also include a rank column to indicate in which order orders belonging to the same set come.
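create table order_sets (
id int not null auto_increment,
-- customer related data, etc...
primary key(id)
);
create table orders (
id int not null auto_increment,
name varchar(255), -- the length is an arbitrary choice
quantity int,
set_id int, -- the order set this order belongs to
set_rank int, -- position of the order within its set
primary key(id),
foreign key (set_id) references order_sets (id)
);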
Inserting a new order then means updating the rank of all orders that come after it in the same set, if any.
Likewise, grouping queries become much easier than having to follow prev and next links. I'm pretty sure you will need such queries, and performance will be much better that way.
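As a rough sketch of that rank bookkeeping, assuming the orders table from this answer and made-up values (set 7, position 2, a 'widget' order), inserting into the middle of a set and reading the set back in order could look like this:

```sql
START TRANSACTION;

-- Make room: shift every order at or after position 2 in set 7 down by one.
UPDATE orders
SET    set_rank = set_rank + 1
WHERE  set_id = 7
AND    set_rank >= 2;

-- Insert the new order into the freed position.
INSERT INTO orders (name, quantity, set_id, set_rank)
VALUES ('widget', 3, 7, 2);

COMMIT;

-- Read the whole set back in order, with no prev/next links to follow.
SELECT id, name, quantity, set_rank
FROM   orders
WHERE  set_id = 7
ORDER  BY set_rank;
```

Doing the shift and the insert in one transaction keeps other sessions from seeing two orders with the same rank.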

Resort MySQL Table by Column Alphabetically

I have a table containing settings for an application with the columns: id, key, and value.
The id column is auto-incrementing, but currently I do not use it, nor does it have any foreign key constraints. I'm populating the settings and would like to reorder the table alphabetically. I haven't been entering the settings in that order, and sorting them alphabetically would help group related settings together.
For example, if I have the following settings:
ID  KEY          VALUE
======================================
1   App.Name     MyApplication
2   Text.Title   Title of My App
3   App.Version  0.1
I would want all the App.* settings to be grouped together sequentially without having to do an ORDER BY every time. Anyway, that's the explanation. I have tried the following, and it didn't seem to change the order:
CREATE TABLE mydb.Settings2 LIKE mydb.Settings;
INSERT INTO mydb.Settings2 SELECT `key`,`value` FROM mydb.Settings ORDER BY `key` ASC;
DROP TABLE mydb.Settings;
RENAME TABLE mydb.Settings2 TO mydb.Settings;
That will make a duplicate of the table as suggested, but won't restructure the data. What am I missing here?
The easy way to reorder a table is with ALTER TABLE table ORDER BY column ASC. The query you tried looks like it should have worked, but I know the ALTER TABLE query works; I use it fairly often.
Note: Reordering the data in a table only works and makes sense in MyISAM tables. InnoDB always stores data in PRIMARY KEY order, so it can't be rearranged.
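Applied to the table from the question, the statement would look roughly like this (a sketch that assumes mydb.Settings is a MyISAM table):

```sql
-- One-off physical reorder of the rows by `key`.
-- `key` is a reserved word in MySQL, hence the backticks.
ALTER TABLE mydb.Settings ORDER BY `key` ASC;
```

The table does not stay in this order after later inserts and deletes, and a SELECT without ORDER BY never guarantees any particular row order.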
Decided to make that an answer.
As I said in a comment on the initial answer, to achieve a long-term effect you need to recreate the settings table with the key column as the PRIMARY KEY, because, as G-Nugget correctly said, 'InnoDB always stores data in PRIMARY KEY order'.
You can do that like this:
CREATE TABLE settings2
(`id` int NULL, `key` varchar(64), `value` varchar(64), PRIMARY KEY(`key`));
INSERT INTO settings2
SELECT id, `key`, `value`
FROM settings;
DROP TABLE settings;
RENAME TABLE settings2 TO settings;
That way you get your order intact after inserting new records.
And if you don't need the initial id column in settings table it's a good time to ditch it.
Here is working sqlfiddle
Disclaimer: Personally I would use ORDER BY anyway

Sphinx Search, compound key

After my previous question (http://stackoverflow.com/questions/8217522/best-way-to-search-for-partial-words-in-large-mysql-dataset), I've chosen Sphinx as the search engine above my MySQL database.
I've done some small tests with it, and it looks great. However, I'm at a point right now where I need some help and opinions.
I have a table articles (structure isn't important), a table properties (structure isn't important either), and a table with values of each property per article (this is what it's all about).
The table where these values are stored, has the following structure:
articleID UNSIGNED INT
propertyID UNSIGNED INT
value VARCHAR(255)
The primary key is a compound key of articleID and propertyID.
I want Sphinx to search through the value column. However, to create an index in Sphinx, I need a unique id, which I don't have here.
Also when searching, I want to be able to filter on the propertyID column (only search values for propertyID 2 for example, which I can do by defining it as attribute).
On the Sphinx forum, I found I could create a multi-value attribute, and set this as query for my Sphinx index:
SELECT articleID, value, GROUP_CONCAT(propertyID) FROM t1 GROUP BY articleID
articleID will be unique now, however, now I'm missing values. So I'm pretty sure this isn't the solution, right?
There are a few other options, like:
Add an extra column to the table, which is unique
Create a calculated unique value in the query (like articleID*100000+propertyID)
Are there any other options I could use, and what would you do?
Regarding your suggestions:
Add an extra column to the table, which is unique
This is hard to do on an existing table with a large number of records, because adding a new column to a large table takes some time, and during that time the database will not be responsive.
Create a calculated unique value in the query (like articleID*100000+propertyID)
If you do this, you have to find a way to recover the articleID and propertyID from the calculated unique id.
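For the multiplier used in the question, the round trip can be sketched with integer division and modulo. This assumes every propertyID is smaller than 100000 (otherwise two different pairs could produce the same id), and the id 12300042 is just a sample value.

```sql
-- Indexing query: build the document id from both key columns.
-- Assumes every propertyID is below 100000; otherwise ids would collide.
SELECT articleID * 100000 + propertyID AS id,
       propertyID,
       value
FROM   t1;

-- Decoding an id returned by the search, for example 12300042:
SELECT 12300042 DIV 100000 AS articleID,   -- 123
       12300042 MOD 100000 AS propertyID;  -- 42
```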
Another alternative is to create a new table with a key field for Sphinx and two more fields to hold articleID and propertyID.
new_sphinx_table with the following fields:
id - UNSIGNED INT/ BIGINT
articleID - UNSIGNED INT
propertyID - UNSIGNED INT
Then you can write an indexing query like the one below:
SELECT id, t1.articleID, t1.propertyID, value FROM t1 INNER JOIN new_sphinx_table nt ON t1.articleID = nt.articleID AND t1.propertyID = nt.propertyID;
This is a sample so you can modify it to fit to your requirements.
Sphinx returns the matching new_sphinx_table.id values along with the other attribute columns. You can then fetch the results by joining t1 with new_sphinx_table on those id values.
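One possible definition of that mapping table is sketched below. The unique key name uq_article_property and the initial INSERT ... SELECT are assumptions for illustration, and new rows added to t1 later would also need a row here.

```sql
-- Mapping table: one surrogate id per (articleID, propertyID) pair.
CREATE TABLE new_sphinx_table (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  articleID  INT UNSIGNED NOT NULL,
  propertyID INT UNSIGNED NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_article_property (articleID, propertyID)
);

-- Fill it once from the existing rows.
INSERT INTO new_sphinx_table (articleID, propertyID)
SELECT articleID, propertyID
FROM   t1;
```

The unique key both prevents duplicate pairs and gives the indexing join an index to use.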

Delayed insert due to foreign key constraints

I am trying to run a query:
INSERT
INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, 'Activated'
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL
ON DUPLICATE KEY UPDATE
`ChangedOn` = VALUES(`ChangedOn`)
(I am not quite sure the query is correct, but it appears to be working.) However, I am running into the following issue: I am running this query before creating the entry in the Product table, and I am getting a foreign key constraint error because the entry is not in the Product table yet.
My question is: is there a way to run this query, but wait until the next query (which updates the Product table) before performing the insert portion of the query above? Note also that if the query is run after the Product entry is created, it will no longer see p.Id as null and will therefore fail, so it has to be performed before the Product entry is created.
---> Edit <---
The concept I am trying to achieve is as follows:
For starters I am importing a set of data into a temp table, the Product table is a list of all products that are (or have been in the past) added through the set of data from the temp table. What I need is a separate table that provides a state change to the product as sometimes the product will become unavailable (no longer in the data set provided by the vendor).
The ProductState table is as follows:
CREATE TABLE IF NOT EXISTS `ProductState` (
`ProductId` VARCHAR(32) NOT NULL ,
`ChangedOn` DATE NOT NULL ,
`State` ENUM('Activated','Deactivated') NULL ,
PRIMARY KEY (`ProductId`, `ChangedOn`) ,
INDEX `fk_ProductState_Product` (`ProductId` ASC) ,
CONSTRAINT `fk_ProductState_Product`
FOREIGN KEY (`ProductId` )
REFERENCES `Product` (`Id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci;
The foreign key is an identifying relationship with the Product table (Product.Id)
Essentially what I am trying to accomplish is this:
1. Anytime a new product (or previously deactivated product) shows up in the vendor data set, the record is created in the ProductState table as 'Activated'.
2. Anytime a product (that is activated), does not show up in the vendor data set, the record is created as 'Deactivated' in the ProductState table.
The purpose of the ProductState table is to track activation and deactivation states of a product. The ProductState table has a many-to-one relationship with the Product table, and the state of a product will only change once daily, so my primary key is ProductId and ChangedOn.
With foreign keys, you definitely need to have the data in the Product table first, before entering the state. Think about it with this logic: "How can something that doesn't exist have a state?"
So, pseudocode of what you should do:
1. Read in the vendor's product list
2. Compare it to the existing list in your Product table
3. If new ones are found: 3.1 Insert them into the Product table, 3.2 Insert them into the ProductState table
4. If some are missing from the vendor's list: 4.1 Insert them into the ProductState table
All these should be done in 1 transaction. Note that you should NOT delete things from Product table, unless you really want to delete every information associated with it, ie. also delete all the "states" that you have stored.
Rather than trying to do this all in 1 query - best bet is to create a stored procedure that does the work as step-by-step above. I think it gets overly complicated (or in this case, probably impossible) to do all in 1 query.
Edit: Something like this:
DELIMITER //
CREATE PROCEDURE `some_procedure_name` ()
BEGIN
-- Break the tmpImport table down into 2 temporary tables: new and removed
CREATE TEMPORARY TABLE _temp_new_products AS
SELECT t.`ProductId`, t.`ProcessedOn`
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL;
CREATE TEMPORARY TABLE _temp_removed_products AS
SELECT p.`Id` AS `ProductId`
FROM `Product` p
LEFT JOIN `tmpImport` t
ON t.`ProductId` = p.`Id`
WHERE t.`ProductId` IS NULL;
-- For each entry in _temp_new_products:
-- 1. Insert into Product table
-- 2. Insert into ProductState table 'Activated'
-- For each entry in _temp_removed_products:
-- 1. Insert into ProductState table 'Deactivated'
-- Drop the temporary tables
DROP TEMPORARY TABLE _temp_new_products;
DROP TEMPORARY TABLE _temp_removed_products;
END//
DELIMITER ;
I think you should:
start a transaction
do your insert into the Products table
do your insert into the ProductState table
commit the transaction
This will avoid any foreign key errors, but will also make sure your data is always accurate. You do not want to 'avoid' the foreign key constraint in any way, and InnoDB (which I'm sure you are using) never defers these constraints unless you turn them off completely.
Also, no, you cannot insert into multiple tables in one INSERT ... SELECT statement.
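The steps above might be sketched as follows. The sketch rests on two assumptions: that a Product row can be created from its Id alone (the real table probably has more columns), and that ProductId is unique within tmpImport. The temporary table name new_products is made up. The new ids are captured first, because once the rows exist in Product the LEFT JOIN ... IS NULL test no longer finds them.

```sql
START TRANSACTION;

-- Remember which imported products are new before Product is changed.
-- Creating a TEMPORARY table does not implicitly commit the transaction.
CREATE TEMPORARY TABLE new_products AS
SELECT t.ProductId, t.ProcessedOn
FROM   tmpImport t
LEFT JOIN Product p ON p.Id = t.ProductId
WHERE  p.Id IS NULL;

-- 1. Parent rows first, so the foreign key can be satisfied.
--    Assumption: Id is the only column Product requires.
INSERT INTO Product (Id)
SELECT ProductId FROM new_products;

-- 2. Then the state rows that reference them.
INSERT INTO ProductState (ProductId, ChangedOn, State)
SELECT ProductId, ProcessedOn, 'Activated'
FROM   new_products;

DROP TEMPORARY TABLE new_products;

COMMIT;
```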

How to fill in the "holes" in auto-increment fields?

I've read some posts about this, but none cover this issue.
I guess it's not possible, but I'll ask anyway.
I have a table with more than 50,000 records. It's an old table where various insert/delete operations have taken place.
As a result, there are various 'holes', some of about 300 records. E.g.: ..., 1340, 1341, 1660, 1661, 1662,...
The question is: is there a simple/easy way to make new inserts fill these 'holes'?
I agree with @Aaron Digulla and @Shane N. The gaps are meaningless. If they DO mean something, that is a flawed database design. Period.
That being said, if you absolutely NEED to fill these holes, AND you are running at least MySQL 3.23, you can utilize a TEMPORARY TABLE to create a new set of IDs. The idea here being that you are going to select all of your current IDs, in order, into a temporary table as such:
CREATE TEMPORARY TABLE NewIDs
(
NewID INT UNSIGNED AUTO_INCREMENT,
OldID INT UNSIGNED,
PRIMARY KEY (NewID)
);
INSERT INTO NewIDs (OldId)
SELECT
Id
FROM
OldTable
ORDER BY
Id ASC;
This will give you a table mapping your old Id to a brand new Id that is going to be sequential in nature, due to the AUTO_INCREMENT property of the NewId column.
Once this is done, you need to update any other reference to the Id in "OldTable" and any foreign key it utilizes. To do this, you will probably need to DROP any foreign key constraints you have, update any reference in tables from the OldId to the NewId, and then re-institute your foreign key constraints.
However, I would argue that you should not do ANY of this, and just understand that your Id field exists for the sole purpose of referencing a record, and should NOT have any specific relevance.
UPDATE: Adding an example of updating the Ids
For example:
Let's say you have the following 2 table schemas:
CREATE TABLE Parent
(
ParentId INT UNSIGNED AUTO_INCREMENT,
Value INT UNSIGNED,
PRIMARY KEY (ParentId)
);
CREATE TABLE Child
(
ChildId INT UNSIGNED AUTO_INCREMENT,
ParentId INT UNSIGNED,
PRIMARY KEY(ChildId),
FOREIGN KEY(ParentId) REFERENCES Parent(ParentId)
);
Now, the gaps are appearing in your Parent table.
In order to update your values in Parent and Child, you first create a temporary table with the mappings:
CREATE TEMPORARY TABLE NewIDs
(
Id INT UNSIGNED AUTO_INCREMENT,
ParentID INT UNSIGNED,
PRIMARY KEY (Id),
KEY (ParentID)
);
INSERT INTO NewIDs (ParentId)
SELECT
ParentId
FROM
Parent
ORDER BY
ParentId ASC;
Next, we need to tell MySQL to ignore the foreign key constraint so we can correctly UPDATE our values. We will use this syntax:
SET foreign_key_checks = 0;
This causes MySQL to ignore foreign key checks when updating the values, but it will still enforce that the correct value type is used (see the MySQL reference for details).
Next, we need to update our Parent and Child tables with the new values. We will use one UPDATE statement per table, so that parents without children are renumbered too. The Parent update runs in ascending ParentId order so that a row never moves onto an Id that is still in use:
UPDATE
Child
JOIN NewIDs ON Child.ParentId = NewIDs.ParentID
SET
Child.ParentId = NewIDs.Id;
UPDATE
Parent
SET
ParentId = (SELECT Id FROM NewIDs WHERE NewIDs.ParentID = Parent.ParentId)
ORDER BY
ParentId ASC;
We now have updated all of our ParentId values correctly to the new, ordered Ids from our temporary table. Once this is complete, we can re-institute our foreign key checks to maintain referential integrity:
SET foreign_key_checks = 1;
Finally, we will drop our temporary table to clean up resources:
DROP TEMPORARY TABLE NewIDs;
And that is that.
What is the reason you need this functionality? Your db should be fine with the gaps, and if you're approaching the max size of your key, just make it unsigned or change the field type.
You generally don't need to care about gaps. If you're getting to the end of the datatype for the ID it should be relatively easy to ALTER the table to upgrade to the next biggest int type.
If you absolutely must start filling gaps, here's a query to return the lowest available ID (hopefully not too slowly):
SELECT MIN(table0.id)+1 AS newid
FROM `table` AS table0
LEFT JOIN `table` AS table1 ON table1.id=table0.id+1
WHERE table1.id IS NULL;
(remember to use a transaction and/or catch duplicate key inserts if you need concurrent inserts to work.)
INSERT INTO prueba(id)
VALUES (
(SELECT IFNULL( MAX( id ) , 0 )+1 FROM prueba target))
IFNULL covers the case where the table has no rows and MAX(id) is NULL.
The alias target is added to avoid the MySQL error about the target table appearing in the FROM clause.
There is a simple way but it doesn't perform well: Just try to insert with an id and when that fails, try the next one.
Alternatively, select an ID and when you don't get a result, use it.
If you're looking for a way to tell the DB to automatically fill the gaps, then that's not possible. Moreover, it should never be necessary. If you feel you need it, then you're abusing an internal technical key for something other than the single purpose it has: to allow you to join tables.
[EDIT] If this is not a primary key, then you can use this update statement:
update (
select *
from table
order by reg_id -- this makes sure that the order stays the same
)
set reg_id = x.nextval
where x is a new sequence which you must create. This will renumber all existing elements preserving the order. This will fail if you have foreign key constraints. And it will corrupt your database if you reference these IDs anywhere without foreign key constraints.
Note that during the next insert, the database will create a huge gap unless you reset the identity column.
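In MySQL, resetting the counter after such a renumbering could look like the sketch below; your_table is a placeholder name.

```sql
-- Ask for a counter value that is certainly too low. MySQL does not go
-- below the ids already in use: it resets the counter to the current
-- maximum id plus one, so the next insert continues without a gap.
ALTER TABLE your_table AUTO_INCREMENT = 1;
```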
As others have said, it doesn't matter, and if it does then something is wrong in your database design. But personally I just like them to be in order anyway!
Here is some SQL that will recreate your IDs in the same order, but without the gaps.
It is done first in a temp_id field (which you will need to create), so you can see that it is all good before overwriting your old IDs. Replace Tbl and id as appropriate.
SELECT @i:=0;
UPDATE Tbl
JOIN
(
SELECT id
FROM Tbl
ORDER BY id
) t2
ON Tbl.id = t2.id
SET temp_id = @i:=@i+1;
You will now have a temp_id field with all of your shiny new IDs. You can make them live by simply:
UPDATE Tbl SET id = temp_id;
And then dropping your temp_id column.
I must admit I'm not quite sure why it works, since I would have expected the engine to complain about duplicate IDs, but it didn't when I ran it.
You might want to clean up gaps in a priority column.
The approach below gives the priority column consecutive, auto-increment-style values.
The extra left join on the same table makes sure they are assigned in the same order as (in this case) the priority.
SET @a:=0;
REPLACE INTO footable
(id,priority)
(
SELECT tbl2.id, @a
FROM footable as tbl
LEFT JOIN footable as tbl2 ON tbl2.id = tbl.id
WHERE (select @a:=@a+1)
ORDER BY tbl.priority
);