How to have a simple array as column data? - mysql

I want one of the columns of a table to contain numbers separated by a space, a comma, or some other character, e.g. 113, 34, 56.
I want to be able to query this table with a single number, e.g. 34 or 67 or 345.
If the number I query with exists in that column, I want that record (or those records) returned.
Questions:
What should be my column's type? Is it varchar?
How should I query the database for this?

I must strongly advise against this. The proper way to store these values is in a separate table with a foreign key to the table holding the rest of the row, one row per value you would otherwise have put in the space-separated list.
It will cause you headaches down the road.
Table maintbl
id PRIMARY KEY
column1 VARCHAR
column2 VARCHAR
column3 VARCHAR
Table arraydata
main_id FOREIGN KEY to maintbl.id
value
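In SQL that sketch might look something like this (column names, types and lengths are just placeholders taken from the outline above):
CREATE TABLE maintbl (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
column1 VARCHAR(255),
column2 VARCHAR(255),
column3 VARCHAR(255),
PRIMARY KEY (id)
);
CREATE TABLE arraydata (
main_id INT UNSIGNED NOT NULL,
value INT NOT NULL,
INDEX idx_value (value),
FOREIGN KEY (main_id) REFERENCES maintbl (id)
);
Finding every main row that contains the value 34 is then a plain indexed lookup:
SELECT m.* FROM maintbl m
JOIN arraydata a ON a.main_id = m.id
WHERE a.value = 34;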

This does not seem like a great idea, as you won't be able to take advantage of the database's indexes. Why do you need to have the numbers all in one column? A more efficient way would be to have them in a separate table with multiple rows (one per number). Michael's answer shows what this would look like (a pretty simple relation).
If you insist on having it all in one column, then VARCHAR would do.
You would then have to query with:
SELECT * FROM TABLE where column = '34' OR column LIKE '34,%' OR column LIKE '%,34' OR column LIKE '%,34,%'
That would cover the cases where 34 is the only number, the first number, the last number, or anywhere in the middle (assuming the values are comma-separated with no spaces).

Question #1: Yes, VARCHAR is OK. You only need to take care of the size; make it large enough for your needs.
Question #2: SELECT * FROM table WHERE FIND_IN_SET('34', col) > 0
To use FIND_IN_SET, you must separate the values with commas and no spaces, like this: 113,54,36
Don't use the LIKE approach, as LIKE '%34%' will also match values such as 340.
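A quick way to see the difference (standalone checks, not tied to any table):
SELECT FIND_IN_SET('34', '113,34,56');   -- returns 2: exact element found
SELECT FIND_IN_SET('34', '113,340,56');  -- returns 0: 340 is not a match
SELECT '113,340,56' LIKE '%34%';         -- returns 1: the false positive LIKE gives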

A VARCHAR column is what you're looking for here. Make sure you give it a length long enough to hold all the numbers you're storing.
The query would look something like this:
Select * from table where numbers like '%34%' or numbers like '%67%' or numbers like '%345%';
Normally you'd be storing each of those numbers in a separate column, so something like this is not the most efficient. For a larger database I wouldn't recommend it, as indexes won't be used and the query won't perform well with a large number of rows (>10,000).
I would recommend doing it this way instead. Assume you have one table now called "table1".
table1 would have an integer column called "ID" that is a unique ID set to auto-increment.
CREATE TABLE `table1` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(60) NOT NULL,
UNIQUE KEY `id` (`id`)
);
You would create a second table called "table2"
CREATE TABLE `table2` (
`id` int(11) NOT NULL,
`number` int(11) NOT NULL,
UNIQUE KEY `idnumber` (`id`, `number`)
);
now you can properly query this like so:
SELECT t1.name, t1.id, t2.number FROM table1 t1
JOIN table2 t2 on t1.id = t2.id
WHERE t2.number = 34 OR t2.number = 67;
This would allow you to know for sure you're selecting the numbers you are looking for.
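For example, the list "113, 34, 56" from the original question could be stored as sample data like this (assuming the two tables above, starting empty so the generated id is 1):
INSERT INTO `table1` (`name`) VALUES ('first record');
INSERT INTO `table2` (`id`, `number`) VALUES (1, 113), (1, 34), (1, 56);
The join above would then return 'first record' once, for the matching number 34.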

With spaces or commas you're going to need VARCHAR, or one of the other text-based types.
You should be able to select what you want like this:
SELECT * FROM table WHERE col LIKE '%34%'

Related

MySQL performance issue on ~3million rows containing MEDIUMTEXT?

I had a table with 3 columns and 3600K rows, using MySQL as a key-value store.
The first column, id, was VARCHAR(8) and set as the primary key. The 2nd and 3rd columns were MEDIUMTEXT. Calling SELECT * FROM table WHERE id=00000 took anywhere from 54 seconds to 3 minutes.
For testing I created a table with columns VARCHAR(8)-VARCHAR(5)-VARCHAR(5), with data randomly generated from numpy.random.randint. SELECT took 3 seconds without a primary key. With the same random data in a VARCHAR(8)-MEDIUMTEXT-MEDIUMTEXT table, SELECT took 15 seconds without a primary key. (Note: in the second test, the 2nd and 3rd columns actually contained very short text like '65535', but were created as MEDIUMTEXT.)
My question is: how can I achieve similar performance on my real data? (Or is it impossible?)
If you use
SELECT * FROM `table` WHERE id=00000
instead of
SELECT * FROM `table` WHERE id='00000'
you are looking for all strings that are equal to the integer 0, so MySQL has to check every row, because '0', '0000' and even ' 0' will all be cast to the integer 0. Your primary key on id will not help and you will end up with a slow full table scan. Even if you don't store values that way, MySQL doesn't know that.
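You can see this implicit casting directly (quick standalone checks, not tied to any table):
SELECT '0' = 0;     -- 1 (true)
SELECT '0000' = 0;  -- 1 (true)
SELECT ' 0' = 0;    -- 1 (true)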
The best option is, as all comments and answers pointed out, to change the datatype to int:
alter table `table` modify id int;
This will only work if your ids, cast as integers, are unique (so you don't have e.g. '0' and '00' in your table).
If you have any foreign keys that reference id, you have to drop them first and, before recreating them, change the datatype of the referencing columns too.
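A quick way to check this in advance (a sketch using the table and column names from the question):
SELECT CAST(id AS UNSIGNED) AS int_id, COUNT(*) AS cnt
FROM `table`
GROUP BY CAST(id AS UNSIGNED)
HAVING COUNT(*) > 1;
-- any rows returned here would collide after the ALTER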
If you store your values in a known format (e.g. no leading zeros, or zero-filled up to a length of 8), the second-best option is to use that exact format in your query and quote the value so it is not cast to an integer. If you e.g. always zero-fill to 8 digits, use
SELECT * FROM `table` WHERE id='00000000';
If you never add any leading zeros, still add the quotes:
SELECT * FROM `table` WHERE id='0';
With both options, MySQL can use your primary key and you will get your result in milliseconds.
If your id column contains only numbers, define it as INT; it will give you better performance (it is faster).
Make the key column in your table an integer and retry. Check the performance first by running a test within your DB (Workbench or the command line); you should get a better result.
Then, and only if needed (I doubt it, though), modify your Python to convert between integer and string when referencing the key column.

Comparing strings up to column length (using index)

Basically what I want to do is to reverse the column LIKE 'string%' behavior. Consider following table:
CREATE TABLE texts (
id int not null,
txt varchar(30) not null,
primary key(id),
key `txt_idx` (txt)
) engine=InnoDB;
INSERT INTO texts VALUES(1, 'abcd');
According to B-Tree Index Characteristics following query will utilize txt_idx index:
SELECT txt FROM texts WHERE txt LIKE 'abc%';
Now I want somewhat different behavior. I want the 'abcd' row to be returned when querying for 'abcde'. At the moment I'm stuck with this query:
SELECT txt FROM texts WHERE 'abcde' LIKE CONCAT(txt, '%');
Obviously (confirmed by EXPLAIN) it does not utilize any index, but my intuition tells me it should be possible to compare a particular value against the index up to the indexed value's length (just like strncmp does).
The main reason for this is my huge table of domain entries. I want to select both "example.org" and "something.example.org" (but not "else.example.org") when querying for "www.something.example.org". Splitting and performing multiple queries or applying OR clauses is too slow for me, unfortunately.
The only thing I can think of is to convert it to the equivalent IN test:
WHERE txt IN ('a', 'ab', 'abc', 'abcd', 'abcde')
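Applied to the domain example from the question, the IN list would hold the dot-separated candidates of the queried name, built in application code before running the query (a sketch, reusing the texts table above):
SELECT txt FROM texts
WHERE txt IN ('org', 'example.org', 'something.example.org', 'www.something.example.org');
Each element is an exact equality match, so the txt_idx index can be used.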

What is better subquery vs literal on IN clause in mysql

Subquery on in clause:
SELECT * FROM TABLE1 WHERE Field1 IN (SELECT Field1 FROM TABLE2)
Literal on in clause:
SELECT * FROM TABLE1 WHERE Field1 IN (1,2,3,4)
Which query is better?
Appends
OK, let me elaborate on my database
-- `BOARD` is main board table
CREATE TABLE BOARD (
BoardKey INT UNSIGNED,
Content TEXT,
PRIMARY KEY (BoardKey)
)
-- `VALUE` is extra value table
CREATE TABLE VALUE (
BoardKey INT UNSIGNED,
Value TEXT
)
This example searches for board records using EAV fields.
The first step is to extract the needed board keys from the VALUE table.
The next step is to search BOARD using the extracted board keys.
This is just an example,
so I don't need to restructure the table design.
Subquery on in clause:
SELECT * FROM BOARD WHERE BoardKey IN (SELECT BoardKey FROM VALUE WHERE Value='SOME')
Literal on in clause:
SELECT BoardKey FROM VALUE WHERE Value='SOME'
Get the list of BoardKey values, put it into a variable, then:
SELECT * FROM BOARD WHERE BoardKey IN (1,2,3,4)
It all depends on your initial requirements. If you know the values (here 1,2,3,4) are static, you may hard-code them. But if they will change in the future, it is better to use the subquery. Normally the subquery is more maintainable but more resource-consuming.
Please elaborate on your requirements, so that we can understand the problem and answer you better.
EDIT 1:
OK, first of all, I have never seen an EAV model split across two tables; it is basically done with one table. In your case you will have difficulty searching for the key across the two tables when you could combine them into one. Ideally, your table should look like this:
CREATE TABLE BOARD
(
BoardKey INT UNSIGNED,
Content TEXT,
Value TEXT,
PRIMARY KEY (BoardKey)
)
Finally, you can do
SELECT * FROM BOARD WHERE Value='SOME'
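If BOARD grows large, that WHERE clause will want an index on Value; since Value is TEXT, a prefix index is one option (the index name and 32-character prefix length here are only assumptions):
ALTER TABLE BOARD ADD INDEX idx_value (Value(32));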
If the value 'SOME' will change in the future, better stick with the subquery. Hope it helped; accept the answer if it did.

Create a computed column based on another column in MySQL

I have a 2 columns in my table: a varchar(8) and an int.
I want to auto-increment the int column and when I do, I want to copy the value into the varchar(8) column, but pad it with 0's until it is 8 characters long, so for example, if the int column was incremented to 3, the varchar(8) column would contain '00000003'.
My two questions are, what happens when the varchar(8) column gets to '99999999' because I don't want to have duplicates?
How would I do this in MySQL?
If my values can be between 00000000 and 99999999, how many values can I have before I run out?
This is my alternative approach to just creating a random 8 character string and checking MySQL for duplicates. I thought this was a better approach and would allow for a greater number of values.
Because your formatted column depends upon, and is derivable from, the id column, your table design violates 3NF.
Either create a view that has your derived column in it (see this in sqlfiddle):
CREATE VIEW myview AS
SELECT *, substring(cast(100000000 + id AS CHAR(9)), 2) AS formatted_id
FROM mytable
or just start your auto-increment at 10000000, then it will always be 8 digits long:
ALTER TABLE mytable AUTO_INCREMENT = 10000000;
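Querying the view then gives the padded value; for example (assuming the view above and a row whose id is 3):
SELECT id, formatted_id FROM myview WHERE id = 3;
-- id = 3, formatted_id = '00000003'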
Simple: if the column is unique, it will throw an error telling you that the value already exists. If it is not unique, after 99999999 you'll get an error message that the value is truncated.
As alternatives, why not use INT AUTO_INCREMENT, or a custom ID built from the date, e.g.
YYMMDD-00000
This allows a maximum of 99999 records per day; the counter resets the next day.
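Building such an ID in MySQL could look like this (the counter value 42 is just an illustration; in practice it would come from a per-day sequence you maintain):
SELECT CONCAT(DATE_FORMAT(CURDATE(), '%y%m%d'), '-', LPAD(42, 5, '0')) AS custom_id;
-- e.g. '240115-00042' for 15 January 2024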

Need best practice advice on table creation and use of a text primary key

I'm creating a MySQL MyISAM table (full-text searches are needed).
column_1 - contains a TEXT primary key (Data will be a 64 bit encoded string)
column_2 - references another table and is used for joins
column_3 - another TEXT column indexed for searches using MATCH
...
The table is likely to hold billions of records over time.
column_1 main search would be performed on the primary key column as follows, e.g.
SELECT * FROM table WHERE column_1 = 123;
column_2 main search would be performed as follows:
SELECT * FROM table_1
JOIN table_2 ON ( table_1.column_2 = table_2.id );
column_3 main search would be performed as follows:
SELECT column_3, MATCH ( column_3 )
AGAINST ( 'TOKEN' ) AS score
FROM table_1;
I would like to take advice on the sort of indexes I would need to setup and any other advice that sounds relevant.
Thanks in advance.
P.S
Am I right in thinking that if you do a search, e.g.
SELECT * FROM table WHERE id = 1; (where the id column is not indexed)
the search on a substantial db would be slower than if the column were indexed?
You don't need any more indices for the first query since the PRIMARY KEY is indexed already.
table_2.id should be indexed (if it is a text field, index only the first few bytes of the field, i.e. a prefix index). table_1.column_2 does not need to be indexed since you do no selection on that field.
column_3 needs a FULLTEXT index.
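As DDL, that could look something like this (the index names and the 16-byte prefix length are assumptions; the prefix only applies if table_2.id is a string type, so drop the (16) for an integer column):
ALTER TABLE table_2 ADD INDEX idx_id (id(16));
ALTER TABLE table_1 ADD FULLTEXT INDEX ft_column_3 (column_3);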
You are right in your final assumption. The index is a data structure specifically tailored for searching, with the column as key and a pointer to the correct row as the value. A search on a non-indexed field requires a full table scan (the db examines every row of the table).