So I have a SELECT statement that compares the content of table_1's column "table_1_content" with the content of another column (table_2_content) in table_2, where the content of "table_2_content" can appear anywhere inside "table_1_content":
$select = "SELECT * FROM table_1, table_2 WHERE `table_1_content` LIKE CONCAT('%', table_2_content, '%')";
$result = mysqli_query($con, $select);
My problem is that LIKE CONCAT is pretty performance heavy.
Is there another way to search through two columns from different tables, so that no full table scan is performed every time the query is executed?
The LIKE in fully free-text form (% at both the start and the end of the search string) is the performance-heavy part. Is the wildcard at the start of the string necessary? If so, you might have to consider pre-processing the data differently so that the search can use a single trailing wildcard or no wildcard at all. Depending on the data, that last part can be done by splitting the string on a delimiter and storing the pieces in separate rows, after which a much faster comparison is possible and indexes can be used.
To put the data in multiple rows, we assume a usable separator (there can be more than one; the code just gets longer):
CREATE TABLE baseinfo (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
    -- , some other columns
);
CREATE TABLE explodedstring (
    id INT NOT NULL,
    str VARCHAR(200),
    FOREIGN KEY (id) REFERENCES baseinfo(id)
);
CREATE PROCEDURE explodestring(id INT, fullstr VARCHAR(4000))
BEGIN
    -- many examples already exist on SO for how to do this
END;
The procedure takes as input your key from the original data (id in this case) and the original string.
The output of the procedure ends up in the secondary table explodedstring, against which you can now run a normal SELECT (add an index for performance). The resulting ids tell you which records match.
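A minimal sketch of such a procedure, assuming a comma as the separator (the p_-prefixed parameter names are illustrative; the existing SO examples handle edge cases more thoroughly):
DELIMITER //
CREATE PROCEDURE explodestring(p_id INT, p_fullstr VARCHAR(4000))
BEGIN
    DECLARE pos INT;
    WHILE LENGTH(p_fullstr) > 0 DO
        SET pos = LOCATE(',', p_fullstr);
        IF pos = 0 THEN
            -- no separator left: store the last (or only) piece
            INSERT INTO explodedstring (id, str) VALUES (p_id, p_fullstr);
            SET p_fullstr = '';
        ELSE
            -- store the piece before the separator, then drop it from the string
            INSERT INTO explodedstring (id, str)
            VALUES (p_id, SUBSTRING(p_fullstr, 1, pos - 1));
            SET p_fullstr = SUBSTRING(p_fullstr, pos + 1);
        END IF;
    END WHILE;
END//
DELIMITER ;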
This may seem like a dumb question. I want to set up an SQL database with records containing numbers. I would like to run a query to select a group of records, take the values in that group, do some basic arithmetic on the numbers, and then save the results to a different table while still linking them with a foreign key to the original records. Is that possible to do in SQL without taking the data to another application and importing it back? If so, what is the basic function/procedure to complete this action?
I'm coming from an Excel/macro/basic Python background and want to investigate whether it's worth the switch to SQL.
PS. I want to stay open source.
A tiny example using PostgreSQL (9.6):
-- Create tables
CREATE TABLE initialValues(
id serial PRIMARY KEY,
value int
);
CREATE TABLE addOne(
id serial,
id_init_val int REFERENCES initialValues(id),
value int
);
-- Init values
INSERT INTO initialValues(value)
SELECT a.n
FROM generate_series(1, 100) as a(n);
-- Insert values in the second table by selecting them from the first one.
WITH init_val as (SELECT i.id,i.value FROM initialValues i)
INSERT INTO addOne(id_init_val,value)
(SELECT id,value+1 FROM init_val);
In MySQL you can use CREATE TABLE ... SELECT (https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html)
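For instance, a rough MySQL equivalent of the PostgreSQL example above might look like this (a sketch; it assumes initialValues was created with plain MySQL types):
-- create and populate the derived table in one statement
CREATE TABLE addOne
    SELECT id AS id_init_val, value + 1 AS value
    FROM initialValues;

-- add the foreign key link separately
ALTER TABLE addOne
    ADD FOREIGN KEY (id_init_val) REFERENCES initialValues(id);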
I need to search a medium-sized MySQL table (about 15 million records).
My query searches for a value ending with another value, for example:
SELECT * FROM {tableName} WHERE {column} LIKE '%{value}'
{value} is always 7 characters long.
{column} is sometimes 8 characters long (otherwise it is 7).
Is there a way to improve performance on my search?
Clearly a regular index is not an option.
I could save the {column} values in reverse order in another column and index that column, but I'm looking to avoid this solution.
{value} is always 7 characters long
Your data is not normalized. Fixing that is the way to fix the problem; anything else is a hack. Having said that, I accept it is not always practical to repair damage done in the past by dummies.
However, the most appropriate hack depends on a whole lot of information you've not told us about:
how frequently you will run the query
what the format of the composite data is
but I'm looking to avoid this solution.
Why? It's a reasonable way to address the problem. The only downside is that you need to maintain the new attribute. Given that this data domain appears in different attributes in multiple tables (another normalization violation), it would make more sense to implement the index in a separate EAV relation; you just need to add triggers on the original table to keep it in sync with your existing code base. Every solution I can think of will likely require a similar fix.
Here's a simplified example (no multiple attributes) to get you started:
CREATE TABLE lookup (
table_name VARCHAR(18) NOT NULL,
record_id INT NOT NULL, /* or whatever */
suffix VARCHAR(7),
PRIMARY KEY (table_name, record_id),
INDEX (suffix, table_name, record_id)
);
CREATE TRIGGER insert_suffix AFTER INSERT ON yourtable
FOR EACH ROW
    REPLACE INTO lookup (table_name, record_id, suffix)
    VALUES ('yourtable', NEW.id, RIGHT(NEW.attribute, 7));

CREATE TRIGGER update_suffix AFTER UPDATE ON yourtable
FOR EACH ROW
    REPLACE INTO lookup (table_name, record_id, suffix)
    VALUES ('yourtable', NEW.id, RIGHT(NEW.attribute, 7));

CREATE TRIGGER delete_suffix AFTER DELETE ON yourtable
FOR EACH ROW
    DELETE FROM lookup WHERE table_name = 'yourtable' AND record_id = OLD.id;
If you have a set number of options for the first character, then you can use IN. For instance:
where column in ('{value}', '0{value}', '1{value}', . . . )
This allows MySQL to use an index on the column.
Unfortunately, with a wildcard at the beginning of the pattern, it is hard to use an index. Is it possible to store the first character in another column?
Currently, I have a MySQL table with columns that look something like this:
run_date DATE
name VARCHAR(10)
load INTEGER
sys_time TIME
rec_time TIME
valid TINYINT
The column valid is essentially a valid bit, 1 if this row is the latest value for this (run_date,name) pair, and 0 if not. To make insertions simpler, I wrote a stored procedure that first runs an UPDATE table_name SET valid = 0 WHERE run_date = X AND name = Y command, then inserts the new row.
Reads on the table usually touch only the valid = 1 rows, but I can't discard the invalid rows. Obviously, this schema also has no primary key.
Is there a better way to structure this data or the valid bit, so that I can speed up both inserts and searches? A bunch of indexes on different orders of columns gets large.
In all of the suggestions below, get rid of valid and the UPDATE that maintains it; that approach is not scalable.
Plan A: At SELECT time, use 'groupwise max' code to locate the latest run_date, hence the "valid" entry.
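A sketch of Plan A, using the question's table_name and assuming a hypothetical AUTO_INCREMENT id column that marks insertion order:
SELECT t.*
FROM table_name AS t
JOIN (
    -- groupwise max: the newest row per (run_date, name) pair
    SELECT run_date, name, MAX(id) AS max_id
    FROM table_name
    GROUP BY run_date, name
) AS latest ON t.id = latest.max_id;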
Plan B: Have two tables and change both when inserting: history, with PRIMARY KEY(name, run_date) and a simple INSERT statement; current, with PRIMARY KEY(name) and INSERT ... ON DUPLICATE KEY UPDATE. The "usual" SELECTs need only touch current.
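The two inserts in Plan B could look roughly like this (column names from the question; the ? placeholders stand for the new values, and `load` is quoted because it is a reserved word):
-- every reading is kept in history as-is
INSERT INTO history (name, run_date, `load`, sys_time, rec_time)
VALUES (?, ?, ?, ?, ?);

-- current keeps only the latest row per name
INSERT INTO current (name, run_date, `load`, sys_time, rec_time)
VALUES (?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
    run_date = VALUES(run_date),
    `load`   = VALUES(`load`),
    sys_time = VALUES(sys_time),
    rec_time = VALUES(rec_time);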
Another issue: TIME is limited to 838:59:59 and is intended to mean 'time of day', not 'elapsed time'. For the latter, use INT UNSIGNED (or some variant of INT). For formatting, you can use SEC_TO_TIME(); for example, SEC_TO_TIME(3601) -> '01:00:01'.
So, we know this one works when I want to select all IDs that are present in the inner SQL statement:
Select *
FROM TableA
WHERE Column1 IN (SELECT column2 FROM tableB WHERE <condition>)
What kind of syntax do I need if Column1 is a long string and I have to check whether a certain substring exists?
Ex: Column1 = "text text text text 12345", where 12345 is an ID that is present in the list of IDs given by the inner SQL statement.
Basically, I'm trying to detect whether an ID is present in one of the strings of one table, based on my list of IDs from another table.
Should I do this in SQL or let server-side code do it?
This is usually done using the LIKE operator:
SELECT ... FROM ... WHERE Column1 LIKE "%12345%";
However, this is extremely slow, since it is based on substring matching. To improve performance you have to create a search index table storing single words. Such an index is typically maintained by trigger definitions: whenever an entry is changed, the trigger also updates the set of words extracted into the search index table. Searching in such an index table is fast, and it can be combined with the original table by means of a JOIN, based on the n:1 relationship between words in the index and the original entries in your table.
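A sketch of that idea, with illustrative names (the word-extraction trigger is omitted; it would split Column1 into words much like the explode procedure shown earlier, and a.id is an assumed primary key on TableA):
CREATE TABLE word_index (
    word      VARCHAR(64) NOT NULL,
    tablea_id INT NOT NULL,          -- n:1 back to TableA
    PRIMARY KEY (word, tablea_id)
);

-- exact match on the indexed word, then JOIN back to the full rows
SELECT a.*
FROM TableA a
JOIN word_index w ON w.tablea_id = a.id
WHERE w.word IN (SELECT column2 FROM tableB);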
Instead of using a fieldname LIKE '%needle%' search, which is extremely slow because it cannot utilise indexes, create a FULLTEXT index on the given column and use full-text search to find the matching entries.
The code excerpt below is quoted from the MySQL documentation:
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
) ENGINE=InnoDB;
SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database' IN NATURAL LANGUAGE MODE);
The catch with this syntax is that the list of words being searched for ('database' in the code example above) must be a string literal; it cannot be a subquery. You need to assemble the list of keywords in the application that issues the SQL statement.
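For example, if the application has collected the IDs 12345 and 67890 from the other table, it can build a literal for BOOLEAN MODE, which matches rows containing any of the listed words (this assumes a FULLTEXT index on Column1):
SELECT * FROM TableA
WHERE MATCH (Column1)
AGAINST ('12345 67890' IN BOOLEAN MODE);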
I have a MEDIUMTEXT column that contains values that come from a GROUP_CONCAT, in the form INT,INT,INT. We can call it Concatenated_IDs.
The string can contain one int or more.
I need to break it down into the original values somehow, to be able to do something such as:
SELECT
table_country.name
FROM
table_country
WHERE
table_country.country_id IN (
SELECT
Concatenated_IDs
FROM
table_targeted_countries
WHERE
table_targeted_countries.email LIKE "%gmail.com")
and get the country names targeted by users who registered with a gmail address.
I have considered exploding the MEDIUMTEXT into INTs, creating one row for each int, sort of like a reverse concat, but I am guessing it would take a large procedure.
You should probably normalize that table, so those concatenated ids are stored in a separate table, one id per record. But in the meantime, you can use MySQL's FIND_IN_SET() function:
SELECT ...
WHERE FIND_IN_SET(table_country.country_id, Concatenated_IDs) > 0
Relevant docs: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_find-in-set
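Applied to the question's tables, the full query might look like this sketch:
SELECT table_country.name
FROM table_country
JOIN table_targeted_countries
    ON FIND_IN_SET(table_country.country_id,
                   table_targeted_countries.Concatenated_IDs) > 0
WHERE table_targeted_countries.email LIKE '%gmail.com';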