What is better: subquery vs literal in an IN clause in MySQL?

Subquery in the IN clause:
SELECT * FROM TABLE1 WHERE Field1 IN (SELECT Field1 FROM TABLE2)
Literals in the IN clause:
SELECT * FROM TABLE1 WHERE Field1 IN (1,2,3,4)
Which query is better?
Addendum
OK, let me elaborate on my database.
-- `BOARD` is main board table
CREATE TABLE BOARD (
BoardKey INT UNSIGNED,
Content TEXT,
PRIMARY KEY (BoardKey)
)
-- `VALUE` is extra value table
CREATE TABLE VALUE (
BoardKey INT UNSIGNED,
Value TEXT
)
This example searches board records using EAV fields.
The first step extracts the needed board keys from the VALUE table.
The next step fetches boards from the BOARD table using the extracted board keys.
This is just an example,
so I don't need to restructure the table design.
Subquery in the IN clause:
SELECT * FROM BOARD WHERE BoardKey IN (SELECT BoardKey FROM VALUE WHERE Value='SOME')
Literals in the IN clause:
SELECT BoardKey FROM VALUE WHERE Value='SOME'
Get the list of BoardKey values, put it in a variable, then:
SELECT * FROM BOARD WHERE BoardKey IN (1,2,3,4)

It all depends on your initial requirements. If you know the values (here 1,2,3,4) are static, you may hard-code them. But if they will change in the future, it is better to use the subquery. Normally the subquery is more durable but more resource-consuming.
Please elaborate on your requirements, so that we can understand the problem and answer you better.
EDIT 1:
OK, first of all, I have never seen an EAV model split across two tables; it is usually done with one table. In your case you will have difficulty searching for the key across two tables when you could combine them into one. Ideally, your table should look like this:
CREATE TABLE BOARD
(
BoardKey INT UNSIGNED,
Content TEXT,
Value TEXT,
PRIMARY KEY (BoardKey)
)
Finally, you can do
SELECT * FROM BOARD WHERE Value='SOME'
If the value 'SOME' will change in the future, better to stick with the subquery. Hope it helped; mark the answer as accepted if so.
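Since Value is a TEXT column, a search on it benefits from an index, and MySQL requires a prefix length when indexing TEXT. A minimal sketch, assuming the combined table above (the 191-character prefix is an arbitrary illustrative choice):
ALTER TABLE BOARD ADD INDEX idx_value (Value(191));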

Related

SLOW QUERY / IN HAVING Clause

I have a many-to-many relationship table in a MySQL database,
and this query:
SELECT main_id FROM posts_tag
WHERE post_id IN ('134','140','187')
GROUP BY main_id
HAVING COUNT(DISTINCT post_id) = 3
There are ~5,300,000 rows in this table, and the query is slow: about 5 seconds (and slower if I add more ids to the search).
I want to ask if there is any way to make it faster?
By the way, I eventually want to add more conditions, like NOT IN, and possibly JOIN new tables that have the same structure but different data. But first I want to know if there is any way to make this simple query faster.
Any advice would be helpful, even another method, or structure etc.
PS: Hardware is an Intel Core i9 3.6GHz, 64GB RAM, 480GB SSD, so I don't think the server specs are the problem.
Use a "composite" and "covering" index:
INDEX(post_id, main_id)
And get rid of INDEX(post_id) since it will then be redundant.
"Covering" helps speed up a query.
Assuming this is a normal "many-to-many" table, then:
CREATE TABLE post_main (
post_id INT UNSIGNED NOT NULL, -- similar to `id` in table `posts`
main_id INT UNSIGNED NOT NULL, -- similar to `id` in table `main`
PRIMARY KEY(post_id, main_id),
INDEX(main_id, post_id)
) ENGINE=InnoDB;
There is no need for AUTO_INCREMENT anywhere in a many-to-many table.
(You could add FK constraints, but I say 'why bother'.)
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
And NOT IN
This gets a bit tricky. I think this is one way; there may be others.
SELECT main_id
FROM post_main AS x
WHERE post_id IN (244,229,193,93,61)
GROUP BY main_id
HAVING COUNT(*) = 5
AND NOT EXISTS ( SELECT 1
FROM post_main
WHERE main_id = x.main_id
AND post_id IN (92,10,234) );
Alexfsk, the IN list in your query has its values surrounded by single quotes. When the column is defined as INT or MEDIUMINT (or any INT-family) datatype, adding single quotes around the values causes a datatype conversion on every row considered and delays completion of your query.
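For example, the query from the question with numeric literals instead of quoted strings:
SELECT main_id FROM posts_tag
WHERE post_id IN (134,140,187)
GROUP BY main_id
HAVING COUNT(DISTINCT post_id) = 3;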

MySQL query with a union over 100 tables vs 100 separate queries: which is more performant?

I have a legacy project using a MySQL DB with MyISAM tables, and the DB design is far from perfect. I ran into an N+1 problem because there is an entity table plus a number of entity_SOME_ID_subentity tables that share a similar base structure and have some random additional columns, where SOME_ID is the primary key value of a record in the entity table.
For sure this is absolutely not great, but let's assume this is our initial condition and cannot be changed in the short term. So I need to optimize a query that selects some records from the entity table and some aggregated data from the related entity_SOME_ID_subentity tables. The aggregation will use only columns that are common to all subentity tables. Initially, this was implemented as a single query to the entity table and then a lot of queries in a loop to the corresponding entity_SOME_ID_subentity tables.
I cannot use joins, since each entity has a separate subentity table, so maybe a union can help reduce the number of queries to 2, where the second one uses a union over subqueries against each required subentity table.
An additional note: I need to sort everything before pagination is applied.
Can you advise whether it is worth trying the union approach in this situation, or will performance be bad in both cases? Or maybe you have better ideas about how this can be handled?
Update:
The query to the entity table is trivial and looks like:
SELECT col1, col2, col3 FROM entity WHERE ... LIMIT 10 OFFSET 0;
And the query to entity_SOME_ID_subentity looks like:
SELECT count(id) total, min(start_date) started, max(completion_date) completed
FROM entity_1234_subentity
ORDER BY started;
Here entity_1234_subentity is an example of what the table names look like.
And using unions can look like:
SELECT count(id) total, min(start_date) started, max(completion_date) completed
FROM entity_1111_subentity
UNION
(SELECT count(id) total, min(start_date) started, max(completion_date) completed
FROM entity_2222_subentity)
UNION
(SELECT count(id) total, min(start_date) started, max(completion_date) completed
FROM entity_3333_subentity)
...
ORDER BY started
That's a typical design that seemed smart at the time it was created but turns out not to scale at all... I have seen a lot of projects like this. If I were you, I would create an index to support the search.
You could
a) use an external indexing/search engine such as Solr or Elasticsearch, or
b) in your RDBMS, create an index table containing the recurring information from all the sub-tables (like id, start_date, completion_date in your case), updated either on every sub-table update or, if there are too many places in the code you would have to change, every hour/day/whatever by a cron job (see the sketch below).
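A minimal sketch of such an index table for option b); every name and type here is an assumption for illustration:
CREATE TABLE subentity_index (
entity_id INT UNSIGNED NOT NULL, -- the SOME_ID taken from the sub-table name
id INT UNSIGNED NOT NULL, -- id of the row inside that sub-table
start_date DATETIME,
completion_date DATETIME,
PRIMARY KEY (entity_id, id),
INDEX (start_date)
);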
It smells like someone tried to implement table inheritance and left a mess.
You can fix this using JSON and views, possibly faster than you can write out 100 unions.
In a transaction (or at least test it on a copy) modify entity so it can hold all the information in the subtables.
Add all the common columns from the subtables into entity.
Add a JSON column to hold the grab bag of data.
alter table entity add column start_date datetime;
alter table entity add column completion_date datetime;
alter table entity add column data json;
If you're not into JSON, you can use a traditional key/value table to store the extra columns, though this loses some flexibility because the value must be a string.
create table entity_data (
entity_id bigint not null,
`key` varchar(255) not null, -- KEY is a reserved word, so it must be quoted
value varchar(255) not null
);
Then, for each subtable...
Update each entity's info with its subentity info. The common columns are updated directly. The rest turn into JSON.
update entity e
inner join entity_123_subentity s on e.id = 123
set
e.start_date = s.start_date,
e.completion_date = s.completion_date,
e.data = json_object('extra1', s.extra1, 'extra2', s.extra2);
Once that's done and verified, drop the subtable and replace it with a view.
drop table entity_123_subentity;
create view entity_123_subentity
(id, start_date, completion_date, extra1, extra2)
as
select
id, start_date, completion_date, data->>'$.extra1', data->>'$.extra2'
from entity
where subid = 123;
Repeat until there are no more subtables.
New queries can be written efficiently, and old queries will still work until they can be rewritten.
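For instance, once all rows live in entity, the 100-branch union from the question collapses into one grouped query; a sketch, assuming the subid column used by the views above:
SELECT subid,
COUNT(*) AS total,
MIN(start_date) AS started,
MAX(completion_date) AS completed
FROM entity
GROUP BY subid
ORDER BY started;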

How should I batch query an append-only table in mysql?

Suppose I have an append-only table:
CREATE TABLE IF NOT EXISTS `states` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`start_date` date DEFAULT NULL,
`end_date` date DEFAULT NULL,
`person_id` int(10) unsigned default NULL,
PRIMARY KEY (`id`)
);
There is an index on name and another on person_id (person_id is a foreign key reference to another table).
For each name, we store a mapping to person_id for a given date range. The mapping from name -> person_id is many-to-one (this is a contrived example, but think of it as storing how a person could change their name). We never want to delete history, so when altering the mapping we insert a new entry; the last entry for a given name is the source of truth. We end up wanting to ask two different types of questions of this dataset, about which I have some general questions.
What is the current mapping for a given name/list of names?
If there is only one name, the most straightforward query is:
select * from states where name = 'name' ORDER BY `id` DESC LIMIT 1;
If there is more than one name, the best way I could figure out is to do:
select * from states as a
left join states as b on a.name = b.name and a.id < b.id
where isnull(b.id);
Is this actually the best way to batch query? For a batch of 1, how much worse would the second query be than the first? Using EXPLAIN, I can tell we end up doing two index lookups instead of one. Given we care a lot about the performance of this individual lookup, my gut is to run different queries depending on the number of names we are querying for. I'd prefer to defer to MySQL's optimizer, though. Is there a way to write this query so MySQL figures out what to do for me?
What are the current mappings that map to person_id / a list of person_ids?
The way I would query for that is:
select * from states as a
left join states as b on a.name = b.name and a.id < b.id
where isnull(b.id) and a.person_id in (person_id_list)
I am slightly concerned about the performance for small lists, though, because my understanding of how MySQL works is limited. Using EXPLAIN, I know that MySQL filters by person_id via the index on a before filtering by isnull(b.id). But does it do this before the join or after the join? Could we end up wasting a lot of time joining these two tables? How can I figure this out in general?
The code in (1) is "groupwise-max", but done in a very inefficient way. (Follow the tag I added for more discussion.)
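For reference, one common, more efficient groupwise-max formulation; a sketch against the states table from the question, with 'name1' and 'name2' as placeholder names:
SELECT s.*
FROM states AS s
JOIN ( SELECT name, MAX(id) AS max_id
FROM states
WHERE name IN ('name1','name2')
GROUP BY name ) AS latest ON s.id = latest.max_id;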
May I suggest you have two tables; one that is append-only, like you have. Let's call this table History. Then have another table called Current. When you add a new entry, INSERT into History, but replace into Current.
If you do take this approach, consider what differences you might have in the two tables. The PRIMARY KEY will certainly be different; other indexes may be different, and even some columns may be different.
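A minimal sketch of that write path, assuming History has the same columns as states and Current is keyed by name (all names here are assumptions):
CREATE TABLE Current (
name VARCHAR(255) NOT NULL,
person_id INT UNSIGNED,
start_date DATE,
end_date DATE,
PRIMARY KEY (name) -- one row per name: always the latest mapping
);
INSERT INTO History (name, start_date, end_date, person_id)
VALUES ('name1', '2020-01-01', NULL, 42);
REPLACE INTO Current (name, person_id, start_date, end_date)
VALUES ('name1', 42, '2020-01-01', NULL);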

MySQL - Copying partial data from one table to another

This may be a silly question, and I understand why I'm getting the result that I am; however, I thought MySQL acted differently, and I can't find documentation that tells me otherwise.
I have 2 basic tables as follows:
CREATE TABLE test ( num INT, time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP );
CREATE TABLE test_to_copy ( num INT );
I then create a single entry into the test_to_copy table:
INSERT INTO test_to_copy VALUES ( 12 );
Now I try to copy the table test_to_copy to test like so:
INSERT INTO test SELECT * FROM test_to_copy;
The error that keeps getting thrown is
"Column count doesn't match value count at row 1".
I know it is complaining that the number of columns in the two tables does not match, meaning it does not know which column each value should go to. However, shouldn't the time column be filled in automatically, i.e. defaulted, if nothing is inserted for it during the copy, rather than throwing an error?
Due to constraints, I can no longer have the time column in both tables, and I must do a SELECT * on the test_to_copy table as there are over 50 columns. I'm wondering: is there an easy way around this?
This is another variation of a frequent question: "can I query *-except-for-one-column?"
No, there is no wildcard-with-exceptions syntax in SQL. The * wildcard means all columns. If you don't want all columns, you must name the columns explicitly.
If you have a variety of columns because this method may be used for more than one table, you can get the list of columns for any given table from INFORMATION_SCHEMA.COLUMNS and use that information to build a dynamic SQL query.
Here's a way you can produce the list of columns:
SELECT
GROUP_CONCAT(
CONCAT('`', column_name, '`')
) AS _cols
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA='mydatabase' AND TABLE_NAME='mytable'
AND COLUMN_NAME NOT IN ('time'); -- or other columns to exclude
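A sketch of feeding that list into a dynamic INSERT ... SELECT via a prepared statement, assuming the test/test_to_copy tables from the question and that the statement runs in the current database:
SELECT GROUP_CONCAT(CONCAT('`', column_name, '`')) INTO @cols
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'test_to_copy';
SET @sql = CONCAT('INSERT INTO test (', @cols, ') SELECT ', @cols, ' FROM test_to_copy');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;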
See also:
Select all columns except one in MySQL?
SQL exclude a column using SELECT * [except columnA] FROM tableA?
INSERT INTO test (num)
SELECT num
FROM test_to_copy

How to have a simple array as column data?

I want one of the columns of a table to contain numbers separated either by space or comma or some other character. e.g. 113, 34, 56
I want to be able to query this table with a single number. e.g. 34 or 67 or 345
If the number I query the database with exists in the column, I want to have that (or those) records delivered.
Questions:
What should be my column's type? Is it varchar?
How should I query the database for this?
I must strongly advise against this. The proper way to store them is in a separate table with a foreign key to the table holding the rest of the row, one row per value you would have stored in the space-separated list.
It will cause you headaches down the road.
Table maintbl
id PRIMARY KEY
column1 VARCHAR
column2 VARCHAR
column3 VARCHAR
Table arraydata
main_id FOREIGN KEY to maintbl.id
value
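A minimal SQL sketch of that outline; the types, lengths, and the example query are assumptions:
CREATE TABLE maintbl (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
column1 VARCHAR(255),
column2 VARCHAR(255),
column3 VARCHAR(255),
PRIMARY KEY (id)
);
CREATE TABLE arraydata (
main_id INT UNSIGNED NOT NULL,
value INT NOT NULL,
FOREIGN KEY (main_id) REFERENCES maintbl(id)
);
-- find rows whose number list contains 34
SELECT m.* FROM maintbl m JOIN arraydata a ON a.main_id = m.id WHERE a.value = 34;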
This does not seem like a great idea, as you won't be able to take advantage of the database indexes. Why do you need to have the numbers all in one column? A more efficient way would be to keep them in a separate table with multiple rows (one per number). Michael's answer explains what this would look like (a pretty simple relation).
If you insist on having it all in one column, then VARCHAR would do.
You would then have to query with:
SELECT * FROM TABLE where column = '34' OR column LIKE '34,%' OR column LIKE '%,34' OR column LIKE '%,34,%'
That covers the cases where 34 is the only number, the first number, the last number, or anywhere in the middle.
Question #1: Yes, it is OK for the column to be VARCHAR. You only need to take care of the size; it should be sufficient for your needs.
Question #2: SELECT * FROM table WHERE FIND_IN_SET('34', col) > 0
To use FIND_IN_SET, you must separate the values with comma, and no spaces, like this: 113,54,36
Don't use the LIKE approach, as it will find "340" if you do LIKE "%34%"
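If the stored string might contain spaces after the commas (e.g. '113, 34, 56'), one workaround is stripping them before the lookup; a sketch:
SELECT * FROM table WHERE FIND_IN_SET('34', REPLACE(col, ' ', '')) > 0;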
This would be a VARCHAR column you're looking to use. You'll want to make it long enough that you have room for all the numbers you're storing.
The query would look something like this:
Select * from table where numbers like '%34%' or numbers like '%67%' or numbers like '%345%';
Normally you'd store each of those numbers in a separate row, so something like this is not the most efficient. For a larger database I wouldn't recommend it, as indexes won't be used and the query won't perform well with a large number of rows (>10,000).
I would recommend doing it this way. Assume you have one table now, called "table1".
table1 has an integer unique ID set to auto-increment, called "id".
CREATE TABLE `table1` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(60) NOT NULL,
UNIQUE KEY `id` (`id`)
);
You would create a second table called "table2"
CREATE TABLE `table2` (
`id` int(11) NOT NULL,
`number` int(11) NOT NULL,
UNIQUE KEY `idnumber` (`id`, `number`)
);
now you can properly query this like so:
SELECT t1.name, t1.id, t2.number FROM table1 t1
JOIN table2 t2 on t1.id = t2.id
WHERE t2.number = 34 OR t2.number = 67;
This would allow you to know for sure you're selecting the numbers you are looking for.
With spaces or commas you're going to need VARCHAR or one of the other text-based types.
You should be able to select what you want like this:
SELECT * FROM table WHERE col LIKE '%34%'