MySQL index with two same columns - mysql

I have a table with several hundred million rows. One of the columns is `status` varchar(10).
Most values in the status are 1 character, some varying up to 10. However a subset of the values has a pattern of its own. A whole group of status values begin with a single character c followed by a number ranging from 0 to 10,000.
I would like to index this column with the following:
ALTER TABLE tbl ADD KEY (status(1), status);
This would be better than having two individual keys, one on status(1) (first character of the whole column) and second status. Together they would always be faster.
However MySQL prohibits me from creating such:
ERROR 1060 (42S21): Duplicate column name 'status'
How can I solve this?

There's really no reason to index status(1) independently of status. One index created on status should handle both situations equally well.

You could create a second column in your table and populate it with the first character of the other column and then create an index on each. However, this might have poor selectivity and not be all that useful.

Related

Is it necessary to create index for column which only contains 7 or 8 possible strings?

And what about more or less possible strings that may store in that column?
The key point is not the number of distinct values, it is the number of rows that the query engine has to take into account (worst case, using a table scan) when optimising your query. So yes, if your table has a non-negligible number of rows, you should create your index.

SQL Left Join. Taking too long.

Okay so here are my table schemas.
I have 2 tables. Say Table A and Table B. The primary key of Table A is PriKeyA bigint(50) and primary key of Table B is PriKeyB varchar(255). Both PriKeyA and PriKeyB contain the same type of data.
The relevant fields of Table A required for this problem are Last_login_date_in_A (date) and Table B is the primary key itself.
What I need to do is, get those PriKeyA's in A which are not there in Table B's PriKeyB column and the Last_login_date_in_A column should be greater than 30 days from the current date. Basically I need the difference of Table A and Table B along with a certain condition(which is the date in this problem)
Here is my SQL command
: SELECT A.PriKeyA from A
LEFT JOIN B ON A.PriKeyA = B.PriKeyB
WHERE B.PriKeyB IS NULL and DATEDIFF(CURRENTDATE,Last_login_date_in_A)>30;
However when I run this MySQL command, it takes about ridiculously long amount of time (About 3 hours). The size of Table A is 2,50,000 and Table B is 42,000 records respectively. I thought that this problem could arise due to the fact that PriKeyA and PriKeyB are different datatypes. So i also used the CAST(PriKeyB as unsigned) in the query. But that too didn't work. There was a marginal performance improvement.
What could be the possible problems? I've used Left Joins before and they never have taken this long.
The expense of the query appears to be for these reasons:
The SQL datatype for A's PK and B's PK aren't the same.
Table A probably doesn't have an index on Last_login_date_in_A
What this means is that ALL rows in table A MUST be examined one row at a time in order to determine if the > 30 days ago criteria is true. This is especially true if A has 2,500,000 rows (as evidenced by how you placed your commas in A's row count) instead of 250,000.
Adding an index on Last_login_date_in_A might help you out here, but will also slightly slow down insert/update/delete statement times for the table due to needing to update the additional index.
Additionally, you should utilize the documentation for explaining MySQL's actual chosen query plan for your query at: MySQL query plan documentation

Best solution for saving boolean values and saving cpu and memory on searches

What is the best solution for inserting boolean values on database if you want more query performance and minimum losing of memory on select statement.
For example:
I have a table with 36 fields that 30 of them has boolean values (zero or one) and i need to search records using the boolean fields that just have true values.
SELECT * FROM `myTable`
WHERE
`field_5th` = 1
AND `field_12th` = 1
AND `field_20` = 1
AND `field_8` = 1
Is there any solution?
If you want to store boolean values or flags there are basically three options:
Individual columns
This is reflected in your example above. The advantage is that you will be able to put indexes on the flags you intend to use most often for lookups. The disadvantage is that this will take up more space (since the minimum column size that can be allocated is 1 byte.)
However, if you're column names are really going to be field_20, field_21, etc. Then this is absolutely NOT the way to go. Numbered columns are a sign you should use either of the other two methods.
Bitmasks
As was suggested above you can store multiple values in a single integer column. A BIGINT column would give you up to 64 possible flags.
Values would be something like:
UPDATE table SET flags=b'100';
UPDATE table SET flags=b'10000';
Then the field would look something like: 10100
That would represent having two flag values set. To query for any particular flag value set, you would do
SELECT flags FROM table WHERE flags & b'100';
The advantage of this is that your flags are very compact space-wise. The disadvantage is that you can't place indexes on the field which would help improve the performance of searching for specific flags.
One-to-many relationship
This is where you create another table, and each row there would have the id of the row it's linked to, and the flag:
CREATE TABLE main (
main_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
);
CREATE TABLE flag (
main_id INT UNSIGNED NOT NULL,
name VARCHAR(16)
);
Then you would insert multiple rows into the flag table.
The advantage is that you can use indexes for lookups, and you can have any number of flags per row without changing your schema. This works best for sparse values, where most rows do not have a value set. If every row needs all flags defined, then this isn't very efficient.
For performance comparisson you can read a blog post I wrote on the topic:
Set Performance Compare
Also when you ask which is "Best" that's a very subjective question. Best at what? It all really depends on what your data looks like and what your requirements are and how you want to query it.
Keep in mind that if you want to do a query like:
SELECT * FROM table WHERE some_flag=true
Indexes will only help you if few rows have that value set. If most of the rows in the table have some_flag=true, then mysql will ignore indexes and do a full table scan instead.
How many rows of data are you querying over? You can store the boolean values in an integer value and use bit operations to test for them them. It's not indexable, but storage is very well packed. Using TINYINT fields with indexes would pick one index to use and scan from there.

MySQL indexes: how do they work?

I'm a complete newbie with MySQL indexes. I have several MyISAM tables on MySQL 5.0x having utf8 charsets and collations with 100k+ records each. The primary keys are generally integer. Many columns on each table may have duplicate values.
I need to quickly count, sum, average, or otherwise perform custom calculations on any number of fields in each table or joined on any number of others.
I found this page giving an overview of MySQL index usage: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html, but I'm still not sure I'm using indexes right. Just when I think I've made the perfect index out of a collection of fields I want to calculate against, I get the "index must be under 1000 bytes" error.
Can anyone explain how to most efficiently create and use indexes to speed up queries?
Caveat: upgrading Mysql is not possible in this case. Using Navicat Light for db administration, but this app isn't required.
When you create an index on a column or columns in MySQL table, the database is creating a data structure called a B-tree (assuming you use the default index setting), for which the key of each record is a concatenation of the values in the indexed columns.
For example, let's say you have a table that is defined like:
CREATE TABLE mytable (
id int unsigned auto_increment,
column_a char(32) not null default '',
column_b int unsigned not null default 0,
column_c varchar(512),
column_d varchar(512),
PRIMARY KEY (id)
) ENGINE=MyISAM;
Then let's give it some data:
INSERT INTO mytable VALUES (1, 'hello', 2, null, null);
INSERT INTO mytable VALUES (2, 'hello', 3, 'hi', 'there');
INSERT INTO mytable VALUES (3, 'how', 4, 'are', 'you?');
INSERT INTO mytable VALUES (4, 'foo', 5, '', 'bar');
Now suppose you decide to add a key to column_a and column_b like:
ALTER TABLE mytable ADD KEY (column_a, column_b);
The database is going to create the aforementioned B-tree, which will have four keys in it, one for each row:
hello-2
hello-3
how-4
foo-5
When you perform a search that references the column_a column, or that references the column_a AND column_b columns, the database will be able to use this index to narrow the record set it has to examine. Let's say you have a query like:
SELECT ... FROM mytable WHERE column_a = 'hello';
Even though the above query does not specify a value for the column_b column, it can still take advantage of our index by looking for all keys that begin with "hello". For the same reason, if you had a query like:
SELECT ... FROM mytable WHERE column_b = '2';
This query would NOT be able to use our index, because it would have to parse the index keys themselves to try to determine which keys' second value matches '2', which is terribly inefficient.
Now, let's address your original question of the maximum length. Suppose we try to create an index spanning all four non-PK columns in this table:
ALTER TABLE mytable ADD KEY (column_a, column_b, column_c, column_d);
You will get an error:
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
In this case our column lengths are 32, 10, 512, and 512, which in a single-byte-per-character situation is 1066, which is above the limit of 1000. Suppose that it DID work; you would be creating the following keys:
hello-2-
hello-3-hi-there
how-4-are-you?
foo-5--bar
Now, suppose that you had values in column_c and column_d that were very long -- 512 characters each. Even in a basic single-byte character set, your keys would now be over 1000 bytes in length, which is what MySQL is complaining about. It gets even worse with multibyte character sets, where seemingly "small" columns can still push the keys over the limit.
If you MUST use a large compound key, one solution is to use InnoDB tables rather than the default MyISAM tables, which support a larger key length (3500 bytes) -- you can do this by swapping ENGINE=InnoDB instead of ENGINE=MyISAM in the declaration above. However, generally speaking, if you are using long keys there is probably something wrong with your table design.
Remember that single-column indexes often provide more utility than multi-column indexes. You want to use a multi-column index when you are going to often/always take advantage of it by specifying all of the necessary criteria in your queries. Also, as others have mentioned, do NOT index every column of a table, since each index is adding storage overhead to your database. You want to limit your indexes to the columns that are frequently used by queries, and if it seems like you need too many, you should probably think about breaking up your tables up into more logical components.
Indexes generally aren't well suited for custom calculations where the user is able to construct their own queries. Typically you choose the indexes to match the specific queries you intend to run, using EXPLAIN to see if the index is being used.
In the case that you have absolutely no idea what queries might be performed it is generally best to create one index per column - and not one index covering all columns.
If you have a good idea of what queries might be run often you could create an extra index for those specific queries. You can also add indexes later if your users complain that certain types of queries run too slow.
Also, indexes generally aren't that useful for calculating counts, sums and averages since these types of calculations require looking at every row.
It sounds like you are trying to put too many fields into your index. The limit is the probably the number of bytes it takes to encode all the fields.
The index is used in looking up the records, so you want to choose the fields which you are "WHERE"ing on. In choosing between those fields, you want to choose the ones that will narrow the results the quickest.
As an example, a filter on Male/Female will usually not help much because you are only going to save about 50% of the time. However, a filter on State may be useful because you'll break down into many more categories. However, if almost everybody in the database is in a single state then that won't work.
Remember that indexes are for sorting and finding rows.
The error message you got sounds like it is talking about the 1000 byte Prefix Limit for MyISAM table indexes. From http://dev.mysql.com/doc/refman/5.0/en/create-index.html:
The statement shown here creates an
index using the first 10 characters of
the name column:
CREATE INDEX part_of_name ON customer
(name(10)); If names in the column
usually differ in the first 10
characters, this index should not be
much slower than an index created from
the entire name column. Also, using
column prefixes for indexes can make
the index file much smaller, which
could save a lot of disk space and
might also speed up INSERT operations.
Prefix support and lengths of prefixes
(where supported) are storage engine
dependent. For example, a prefix can
be up to 1000 bytes long for MyISAM
tables, and 767 bytes for InnoDB
tables.
Maybe you can try a FULLTEXT index for problematic columns.

Index counter shared by multiple tables in mysql

I have two tables, each one has a primary ID column as key. I want the two tables to share one increasing key counter.
For example, when the two tables are empty, and counter = 1. When record A is about to be inserted to table 1, its ID will be 1 and the counter will be increased to 2. When record B is about to be inserted to table 2, its ID will be 2 and the counter will be increased to 3. When record C is about to be inserted to table 1 again, its ID will be 3 and so on.
I am using PHP as the outside language. Now I have two options:
Keep the counter in the database as a single-row-single-column table. But every time I add things to table A or B, I need to update this counter table.
I can keep the counter as a global variable in PHP. But then I need to initialize the counter from the maximum key of the two tables at the start of apache, which I have no idea how to do.
Any suggestion for this?
The background is, I want to display a mix of records from the two tables in either ASC or DESC order of the creation time of the records. Furthermore, the records will be displayed in page-style, say, 50 records per page. Records are only added to the database rather than being removed. Following my above implementation, I can just perform a "select ... where key between 1 and 50" from two tables and merge the select datasets together, sort the 50 records according to IDs and display them.
Is there any other idea of implementing this requirement?
Thank you very much
Well, you will gain next to nothing with this setup; if you just keep the datetime of the insert you can easily do
SELECT * FROM
(
SELECT columnA, columnB, inserttime
FROM table1
UNION ALL
SELECT columnA, columnB, inserttime
FROM table2
)
ORDER BY inserttime
LIMIT 1, 50
And it will perform decently.
Alternatively (if chasing last drop of preformance), if you are merging the results it can be an indicator to merge the tables (why have two tables anyway if you are merging the results).
Or do it as SQL subclass (then you can have one table maintain IDs and other common attributes, and the other two reference the common ID sequence as foreign key).
if you need creatin time wont it be easier to add a timestamp field to your db and sort them according to that field?
i believe using ids as a refrence of creation is bad practice.
If you really must do this, there is a way. Create a one-row, one-column table to hold the last-used row number, and set it to zero. On each of your two data tables, create an AFTER INSERT trigger to read that table, increment it, and set the newly-inserted row number to that value. I can't remember the exact syntax because I haven't created a trigger for years; see here http://dev.mysql.com/doc/refman/5.0/en/triggers.html