I have two tables that look something like this:
Table #1:
CREATE TABLE iteminfo
(Code CHAR(1) PRIMARY KEY,
Tags TEXT NOT NULL);
Table #2:
CREATE TABLE items
(ID INT UNSIGNED PRIMARY KEY,
Name VARCHAR(255) NOT NULL,
Code CHAR(1),
FOREIGN KEY (Code) REFERENCES iteminfo(Code));
I want to create a FULLTEXT index using the fields Name and Tags from the two tables. I assume I need an equijoin or something similar, but this doesn't work:
ALTER TABLE items JOIN iteminfo WHERE items.Code = iteminfo.Code
ADD FULLTEXT (Name, Tags);
I want to know:
Is this even possible to do?
If yes, then how do I do it?
If no, then what other ways are there to index two columns present in different tables?
Thanks for answering in advance! I apologise if this question already exists but I couldn't find the answer online.
No.
But... It would make sense to collect the various columns from the various tables together in a single column and apply a FULLTEXT index to it.
Assuming there are many tags for each item, you could initialize such a table via:
CREATE TABLE search_info ( PRIMARY KEY(name) )
SELECT name,
CONCAT(name, ' ',
( SELECT GROUP_CONCAT(tags) FROM iteminfo
WHERE code = items.code ) ) AS search
FROM items;
Then
ALTER TABLE search_info ADD FULLTEXT(search);
(After that, changes to items or iteminfo would need to also modify search_info.)
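A rough sketch of that upkeep, assuming the search_info definition above ('the changed item' is a placeholder for the row being refreshed):

REPLACE INTO search_info (name, search)
SELECT name,
       CONCAT(name, ' ',
              ( SELECT GROUP_CONCAT(tags) FROM iteminfo
                WHERE code = items.code ) )
FROM items
WHERE name = 'the changed item';  -- placeholder value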
I tried what was suggested in @RickJames's answer and it seems to be the solution to my issue. For anyone else who might want to know, here's what I did, in the context of the information I gave in my original question:
ALTER TABLE items ADD COLUMN Tags TEXT;
UPDATE items
SET Tags = CONCAT(Name, ' ', (SELECT GROUP_CONCAT(iteminfo.Tags) FROM iteminfo WHERE iteminfo.Code = items.Code))
WHERE Code IS NOT NULL;
ALTER TABLE items ADD FULLTEXT(Tags);
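A usage sketch against the resulting index ('keyword' is a placeholder search term):

SELECT ID, Name
FROM items
WHERE MATCH(Tags) AGAINST('keyword' IN NATURAL LANGUAGE MODE);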
I'm a beginner to SQL so this is probably a pretty newbie question, but I can't seem to get my head straight on it. I have a pair of tables called MATCH and SEGMENT.
MATCH.id int(11) ai pk
MATCH.name varchar(45)
etc.
SEGMENT.id int(11) ai pk
SEGMENT.name varchar(45)
etc.
Each row in MATCH can have one or more SEGMENT rows associated with it. The name in MATCH is unique on each row. Right now I do an inner join on the name fields to figure out which segments go with which match. I want to copy the tables to a new set of tables and set up a foreign key in SEGMENT that contains the unique ID from the MATCH row, both to improve performance and to fix some problems where the names aren't always precisely the same (and they should be).
Is there a way to do a single INSERT or UPDATE statement that will do the name comparisons and add the foreign key to each row in the SEGMENT table - at least for the rows where the names are precisely the same? (For the ones that don't match, I may have to write a SQL function to "clean" the name by removing extra blanks and special characters before comparing)
Thanks for any help anyone can give me!
Here's one way I would consider doing it: add the FK column, add the constraint definition, then populate the column with an UPDATE statement using a correlated subquery:
ALTER TABLE `SEGMENT` ADD COLUMN match_id INT(11) COMMENT 'FK ref MATCH.id' ;
ALTER TABLE `SEGMENT` ADD CONSTRAINT fk_SEGMENT_MATCH
FOREIGN KEY (match_id) REFERENCES `MATCH`(id) ;
UPDATE `SEGMENT` s
SET s.match_id = (SELECT m.id
FROM `MATCH` m
WHERE m.name = s.name) ;
A correlated subquery (like in the example UPDATE statement above) usually isn't the most efficient approach to getting a column populated. But it seems a lot of people think it's easier to understand than the (usually) more efficient alternative, an UPDATE using a JOIN operation like this:
UPDATE `SEGMENT` s
JOIN `MATCH` m
ON m.name = s.name
SET s.match_id = m.id ;
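For the rows where the names differ only by stray whitespace, a cleaned comparison along these lines might be a starting point (the cleaning rules here are an assumption; TRIM and REPLACE are standard MySQL functions, but real special-character handling would need more REPLACE calls or a stored function):

UPDATE `SEGMENT` s
JOIN `MATCH` m
  ON TRIM(REPLACE(m.name, '  ', ' ')) = TRIM(REPLACE(s.name, '  ', ' '))
SET s.match_id = m.id
WHERE s.match_id IS NULL ;  -- only touch rows the exact-match pass didn't fill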
Add an ID field to your MATCH table and populate it.
Then add a column MATCHID (which will be your foreign key) to your SEGMENT table. Note: you won't be able to declare it as a foreign key until you have mapped the records correctly.
Use the following query to update the foreign keys:
UPDATE `SEGMENT` A
INNER JOIN `MATCH` B
        ON A.NAME = B.NAME
SET A.MATCHID = B.ID ;
Using the database schema for tagging from this question's accepted answer, is it possible to have a query using GROUP_CONCAT that works with a large amount of data? I need to get items with their tags for all items tagged with tag x. Using a query with GROUP_CONCAT over ~0.5 million tags is very slow at > 15 seconds. Without GROUP_CONCAT (items without tags) it is ~0.05 seconds.
As a side question, how does SO solve this problem?
This is probably a case of a poor indexing strategy. Adapting the schema shown in the accepted answer of the question to which you linked:
CREATE Table Items (
Item_ID SERIAL,
Item_Title VARCHAR(255),
Content TEXT
) ENGINE=InnoDB;
CREATE TABLE Tags (
Tag_ID SERIAL,
Tag_Title VARCHAR(255)
) ENGINE=InnoDB;
CREATE TABLE Items_Tags (
Item_ID BIGINT UNSIGNED,
Tag_ID BIGINT UNSIGNED,
PRIMARY KEY (Item_ID, Tag_ID),
FOREIGN KEY (Item_ID) REFERENCES Items (Item_ID),
FOREIGN KEY (Tag_ID) REFERENCES Tags (Tag_ID)
) ENGINE=InnoDB;
Note that:
MySQL's SERIAL data type is an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE and, as such, is indexed;
defining the foreign key constraints at table level in Items_Tags ensures the foreign key columns are indexed (MySQL parses but silently ignores inline REFERENCES clauses, which is why the constraints above are spelled out as FOREIGN KEY clauses).
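With those indexes in place, the query pattern from the question might look like the following sketch (123 stands in for the wanted Tag_ID; the aliases are mine):

SELECT i.Item_ID, i.Item_Title,
       GROUP_CONCAT(t.Tag_Title) AS Tags
FROM Items i
JOIN Items_Tags itf ON itf.Item_ID = i.Item_ID AND itf.Tag_ID = 123  -- filter: items tagged x
JOIN Items_Tags it  ON it.Item_ID = i.Item_ID                        -- all tags of those items
JOIN Tags t         ON t.Tag_ID = it.Tag_ID
GROUP BY i.Item_ID, i.Item_Title;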
I would propose a hybrid between normalized and denormalized data.
Using the normalized structure provided by eggyal, I would add the following denormalized table:
CREATE TABLE Items_Tags_Denormalized (
Item_ID BIGINT UNSIGNED,
Tags BLOB,
PRIMARY KEY (Item_ID),
FOREIGN KEY (Item_ID) REFERENCES Items (Item_ID)
) ENGINE=InnoDB;
In column Tags you would have all the tags (Tag_Title) for the corresponding Item_ID.
Now you have two ways to achieve this:
create a cron job that runs periodically and rebuilds Items_Tags_Denormalized using GROUP_CONCAT or whatever suits you (advantage: no additional load when you insert or delete in Items_Tags; disadvantage: the denormalized table will not always be up to date, depending on how often the cron runs)
create triggers on Items_Tags for insert and delete that keep Items_Tags_Denormalized up to date (advantage: the denormalized table is always up to date; disadvantage: additional load on every insert or delete in Items_Tags)
Choose whatever solution suits your needs best considering the advantages and disadvantages.
So in the end you will have the Items_Tags_Denormalized table, which you only ever read, with no additional work at query time.
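A minimal sketch of the cron-based rebuild, assuming the table definitions above (a truncate-and-refill keeps it simple; a production job might rebuild into a staging table and swap):

TRUNCATE TABLE Items_Tags_Denormalized;
INSERT INTO Items_Tags_Denormalized (Item_ID, Tags)
SELECT it.Item_ID, GROUP_CONCAT(t.Tag_Title SEPARATOR ' ')
FROM Items_Tags it
JOIN Tags t ON t.Tag_ID = it.Tag_ID
GROUP BY it.Item_ID;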
Why would you use GROUP_CONCAT for that? For a given tag x you said that selecting the list of items is fast. For a given list of items, getting all the tags should be fast too. And isn't there normally some kind of limit anyway? Normal websites don't show 100,000 entries on one page.
I would suggest:
drop temporary table if exists lookup_item;
create temporary table lookup_item (item_id serial, primary key(item_id));
insert into lookup_item select i.id as item_id
from items i
where exists (select * from items_tags where item_id = i.id and tag_id = <tag_id>)
and <other conditions or limits>;
select i.*, t.*
from lookup_item li
inner join items i on i.id = li.item_id
inner join items_tags it on it.item_id = li.item_id
inner join tags t on t.id = it.tag_id
order by i.<priority>, t.<priority>;
priority could be last-modified for items and some kind of importance for tags.
Then you get every item with its tags. The only work left in application code is spotting when a result row starts the next item.
If I understand correctly, GROUP_CONCAT isn't the only thing you removed to make the query faster without tags. Inside the GROUP_CONCAT you're selecting Tags.Tag_Title, forcing the Tags table to be accessed.
You could try running GROUP_CONCAT with Items_Tags.Tag_ID to test my theory.
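A quick way to run that test, assuming eggyal's schema above (if this is fast, accessing Tags.Tag_Title is the bottleneck):

SELECT it.Item_ID, GROUP_CONCAT(it.Tag_ID) AS Tag_IDs
FROM Items_Tags it
GROUP BY it.Item_ID;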
I have a table containing settings for an application, with the columns id, key, and value.
The id column is auto-incrementing, but as of now I don't use it, nor does it have any foreign key constraints. I'm populating the settings and would like to restructure the table so the rows are alphabetical, as I haven't been inserting them in that order; reordering alphabetically would group related settings together.
For example, if I have the following settings:
ID  KEY          VALUE
======================================
1   App.Name     MyApplication
2   Text.Title   Title of My App
3   App.Version  0.1
I would want all the App.* settings to be grouped together sequentially without having to do an ORDER BY every time. Anyway, that's the explanation. I have tried the following and it didn't seem to change the order:
CREATE TABLE mydb.Settings2 LIKE mydb.Settings;
INSERT INTO mydb.Settings2 (`key`,`value`) SELECT `key`,`value` FROM mydb.Settings ORDER BY `key` ASC;
DROP TABLE mydb.Settings;
RENAME TABLE mydb.Settings2 TO mydb.Settings;
That will make a duplicate of the table as suggested, but won't restructure the data. What am I missing here?
The easy way to reorder a table is with ALTER TABLE table ORDER BY column ASC. The query you tried looks like it should have worked, but I know the ALTER TABLE query works; I use it fairly often.
Note: Reordering the data in a table only works and makes sense in MyISAM tables. InnoDB always stores data in PRIMARY KEY order, so it can't be rearranged.
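Concretely, for the table in the question that would be something like the following (only meaningful on MyISAM, per the note above):

ALTER TABLE mydb.Settings ORDER BY `key` ASC;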
Decided to make that an answer.
As I said in a comment to the initial answer, to achieve a long-term effect you need to recreate the settings table with the key column as the PRIMARY KEY, because, as G-Nugget correctly said, 'InnoDB always stores data in PRIMARY KEY order'.
You can do that like this:
CREATE TABLE settings2
(`id` int NULL, `key` varchar(64), `value` varchar(64), PRIMARY KEY(`key`));
INSERT INTO settings2
SELECT id, `key`, `value`
FROM settings;
DROP TABLE settings;
RENAME TABLE settings2 TO settings;
That way you get your order intact after inserting new records.
And if you don't need the initial id column in settings table it's a good time to ditch it.
Here is a working sqlfiddle.
Disclaimer: Personally I would use ORDER BY anyway
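That is, simply reading the rows in the wanted order at query time:

SELECT `key`, `value` FROM settings ORDER BY `key`;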
I'm trying to remove doublettes (sometimes triplettes, unfortunately!) from a MySQL table. My issue is that the only unique data available is the primary key, so in order to identify doublettes you have to take all the other columns into account.
I've managed to identify all records that have doublettes and copied them along with their doublettes (including their primary keys) to the table temp. The source table is called translation and it has an integer primary key with the name TranslationID. How do I move on from here? Thanks!
Edit: Available columns are:
TranslationID
LanguageID
Translation
Etymology
Type
Source
Comments
WordID
Latest
DateCreated
AuthorID
Gender
Phonetic
NamespaceID
Index
EnforcedOwner
The duplicate issue concerns only the rows with the Latest column set to 1.
Edit #2: Thank you, everyone, for your time! I've solved the problem by using WouterH's answer, resulting in the following query:
DELETE from translation USING translation, translation as translationTemp
WHERE translation.Latest = 1
AND (NOT translation.TranslationID = translationTemp.TranslationID)
AND (translation.LanguageID = translationTemp.LanguageID)
AND (translation.Translation = translationTemp.Translation)
AND (translation.Etymology = translationTemp.Etymology)
AND (translation.Type = translationTemp.Type)
AND (translation.Source = translationTemp.Source)
AND (translation.Comments = translationTemp.Comments)
AND (translation.WordID = translationTemp.WordID)
AND (translation.Latest = translationTemp.Latest)
AND (translation.AuthorID = translationTemp.AuthorID)
AND (translation.NamespaceID = translationTemp.NamespaceID)
You can remove duplicates without a temporary table or subquery. Delete all rows that have the same data but a different TranslationID:
DELETE from translation USING translation, translation as translationTemp
WHERE (NOT translation.TranslationID = translationTemp.TranslationID)
AND (translation.LanguageID = translationTemp.LanguageID)
AND (translation.Translation = translationTemp.Translation)
AND (translation.Etymology = translationTemp.Etymology)
AND ... -- compare the remaining fields here
Create a SELECT statement with your current SELECT as a sub-select, so that you can return a column of IDs that should be removed. Then apply that SELECT in a DELETE FROM statement.
Example (pseudo code):
SELECT1 = SELECT ... AS temp; # the table you have right now
SELECT2 = SELECT TranslationID FROM (SELECT1)
Final query will look like this:
DELETE FROM table_name WHERE TranslationID IN (SELECT2);
You just need to insert the SELECT with sub-select in the final query.
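A concrete sketch of that, assuming temp holds the doublettes with their keys and you keep the smallest TranslationID per group; the derived-table wrapper is only required if the subquery reads from translation itself, but it does no harm here (the compared column list is abbreviated):

DELETE FROM translation
WHERE TranslationID IN (
  SELECT TranslationID FROM (
    SELECT t1.TranslationID
    FROM temp t1
    JOIN temp t2
      ON  t1.TranslationID > t2.TranslationID
      AND t1.LanguageID = t2.LanguageID
      AND t1.Translation = t2.Translation
      AND t1.WordID = t2.WordID
  ) AS ids_to_remove
);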
To stop duplicates in future you can change your engine to the InnoDB engine like this:
ALTER TABLE table_name ENGINE=InnoDB;
Then add a UNIQUE constraint across the columns that define a duplicate (TranslationID is already the primary key, so a unique constraint there would change nothing).
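A hypothetical example; which columns define a "duplicate" depends on your data, and if Translation is a TEXT column MySQL requires a prefix length, as shown:

ALTER TABLE translation
ADD UNIQUE KEY uq_translation (LanguageID, WordID, Translation(100), Latest);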
If the doublettes/triplettes are identical except for the primary key, then you can select all records from temp which are identical to another record except for having a larger primary key; that gives you everything in temp except the record with the minimum key for each doublette/triplette. You can then delete those records from translation.
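A sketch of that idea run against translation directly (the column list is abbreviated; you'd compare every column that defines a duplicate):

DELETE t1
FROM translation t1
JOIN translation t2
  ON  t1.TranslationID > t2.TranslationID   -- keeps the row with the smallest key
  AND t1.LanguageID = t2.LanguageID
  AND t1.Translation = t2.Translation
  AND t1.WordID = t2.WordID;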
Instead of identifying the lines that aren't unique, I would try to copy the valid data to a new table, and then remove the old one and replace it with this new, cleaned table.
I can see two ways (the second is sketched below):
Using the DISTINCT keyword in your SQL query (source);
Using a GROUP BY statement on all columns (source).
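A sketch of the GROUP BY variant, copying one representative row per duplicate group into a clean table (the column list is abbreviated; include every column that defines a duplicate):

CREATE TABLE translation_clean AS
SELECT MIN(TranslationID) AS TranslationID,
       LanguageID, Translation, Etymology, Type, Source, WordID
FROM translation
GROUP BY LanguageID, Translation, Etymology, Type, Source, WordID;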
I have two SQL tables called scan_sc and rescan_rsc. The scan table looks like this:
CREATE TABLE scan_sc
(
id_sc int(4),
Type_sc varchar(255),
ReScan_sc varchar(255),
PRIMARY KEY (id_sc)
)
When I scan a document I insert a row into the scan table. If the result of this scanning is poor I have to do a rescan, and therefore I have a rescan table.
CREATE TABLE rescan_rsc
(
id_rsc int(4),
Scan_rsc varchar(255),
PRIMARY KEY (id_rsc)
)
The problem is, I want a trigger that will fill in the ReScan_sc column in the scan_sc table with an "x", so I can see that there has been a problem there.
The trigger has to do it where the ID from the rescan table is the same as in the scan table.
Hope you all understand my question.
Thanks in advance.
Do you really need the ReScan_sc column and the trigger?
With a simple JOIN, you can find out the records in your scan_sc table that have been re-scanned, without using the ReScan_sc column at all.
There are several possibilities:
Show all scans, with an additional column with the Rescan ID, if any:
SELECT scan_sc.*, rescan_rsc.id_rsc
FROM scan_sc
LEFT JOIN rescan_rsc ON scan_sc.id_sc = rescan_rsc.id_rsc
Show only the scans which have been re-scanned:
SELECT scan_sc.*
FROM scan_sc
INNER JOIN rescan_rsc ON scan_sc.id_sc = rescan_rsc.id_rsc
(I assume that id_sc and id_rsc are the primary keys and that PRIMARY KEY (id_sd) is a typo, like marc_s pointed out in his comment)
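That said, if you do want the trigger the question asks for, a minimal sketch might look like this (the trigger name is mine, and it assumes id_rsc identifies the scanned document, matching the joins above):

DELIMITER //
CREATE TRIGGER trg_mark_rescanned
AFTER INSERT ON rescan_rsc
FOR EACH ROW
BEGIN
  -- flag the matching scan row as having needed a rescan
  UPDATE scan_sc
     SET ReScan_sc = 'x'
   WHERE id_sc = NEW.id_rsc;
END//
DELIMITER ;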