If we have a table such as:
create table x(
id int primary key,
something_else_1 int,
something_else_2 int,
something_else_3 int,
char_data text -- or varchar(1000)
);
This table will be queried on all fields except char_data.
Most queries will be similar to:
select id, something_else_1
from x
where something_else_2 = 2 and something_else_3 = 5;
The question is: if we have indexes etc., which configuration will be better, text or varchar?
Just one final note:
I know I could separate this into two tables, but separation in this case would not be the best idea, since all fields except the blob will be part of something like a unique index or similar.
"this table will be queried on all fields except char_data."
Then the data type of char_data has no influence on performance. Only if you select char_data will it consume more bandwidth. Nothing else.
It's not a problem, because you are not using char_data in your SQL. SELECT * will become slow, but SELECT id, something_else_1 will not. WHERE id=2 AND something_else_2=1 has no effect, but WHERE char_data LIKE '%charsequence%' does. As long as you are not searching your table by char_data, you are safe.
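For those ordinary filters, what actually determines speed is whether the filter columns are indexed, not the type of char_data. A minimal sketch against the example table (the index names are my own):

create index idx_se2_se3 on x (something_else_2, something_else_3);

On InnoDB the primary key (id) is stored in every secondary index, so adding something_else_1 as well turns it into a covering index and the common query never has to touch the row itself:

create index idx_se2_se3_se1 on x (something_else_2, something_else_3, something_else_1);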
Besides, if you still want to search by char_data, you should enable full-text search:
ALTER TABLE `x` ADD FULLTEXT(`char_data`);
Note: before MySQL 5.6, full-text indexes were only supported by the MyISAM storage engine; InnoDB added support in 5.6.
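Once the full-text index exists, the search is done with MATCH ... AGAINST rather than LIKE; a quick sketch against the example table:

SELECT id, something_else_1
FROM x
WHERE MATCH(char_data) AGAINST('charsequence');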
Related
So I have a SELECT statement that compares the content of the table_1 column "table_1_content" with the content of another column (table_2_content) in table_2, where the content of "table_2_content" can be found anywhere in "table_1_content":
$select = "SELECT * FROM table_1, table_2 WHERE `table_1_content` LIKE CONCAT('%', table_2_content, '%')";
$result = mysqli_query($con, $select);
My problem is that LIKE CONCAT is pretty performance heavy.
Is there another way to search through two columns from different tables, so that no full table scan is performed every time the query is executed?
The LIKE in fully free-text format (% at the start and at the end of the search string) is the performance-heavy part. Is the wildcard at the start of the string necessary? If so, you might have to consider pre-processing the data in a different way so that the search can use a single wildcard or no wildcard at all. This last part is (depending on the data) done, for example, by splitting the string on a delimiter and storing the pieces in separate rows, after which a much faster comparison is possible and indexes can be used.
To put the data in multiple rows, we assume a usable separator (there can be more than one; the code just gets longer):
CREATE TABLE baseinfo (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
                       -- ... some other columns ...
                       );
CREATE TABLE explodedstring(id INT NOT NULL, str VARCHAR(200),
FOREIGN KEY (id) REFERENCES baseinfo(id));
CREATE PROCEDURE explodestring(id INT, fullstr VARCHAR(4000))
BEGIN
  -- many examples of how to do this already exist on SO
END;
The procedure would take as input your key from the original data (id in this case) and the original string.
The output of the procedure ends up in the secondary table explodedstring, against which you can now run a normal SELECT (add an index on str for performance). The resulting ids tell you which records match.
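As a rough sketch of what the body of explodestring could look like, assuming a comma as the separator (SUBSTRING_INDEX loops like this are only one of many possible variants; the id parameter is renamed p_id here so it does not shadow the column name):

DELIMITER //
CREATE PROCEDURE explodestring(p_id INT, fullstr VARCHAR(4000))
BEGIN
  DECLARE part VARCHAR(200);
  WHILE LENGTH(fullstr) > 0 DO
    SET part = SUBSTRING_INDEX(fullstr, ',', 1);                    -- token before the first comma
    INSERT INTO explodedstring (id, str) VALUES (p_id, TRIM(part));
    IF LOCATE(',', fullstr) > 0 THEN
      SET fullstr = SUBSTRING(fullstr, LOCATE(',', fullstr) + 1);   -- drop the processed token
    ELSE
      SET fullstr = '';                                             -- that was the last token
    END IF;
  END WHILE;
END//
DELIMITER ;

After the procedure has been run for each original row, the lookup that previously needed LIKE CONCAT becomes an ordinary indexed comparison, e.g. SELECT id FROM explodedstring WHERE str = 'searchterm';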
I was reading the Django Book and came across an interesting statement.
Notice that Django doesn’t use SELECT * when looking up data and instead lists
all fields explicitly. This is by design:
in certain circumstances SELECT * can be slower,
I got this from http://www.djangobook.com/en/1.0/chapter05/
So my question is: can someone explain to me why SELECT * can be slower than listing every single column explicitly? It would be good if you could give me some examples.
Or, if you think the opposite (that it doesn't matter), can you explain why?
Update:
That's the table:
BEGIN;
CREATE TABLE "books_publisher" (
"id" serial NOT NULL PRIMARY KEY,
"name" varchar(30) NOT NULL,
"address" varchar(50) NOT NULL,
"city" varchar(60) NOT NULL,
"state_province" varchar(30) NOT NULL,
"country" varchar(50) NOT NULL,
"website" varchar(200) NOT NULL
);
And this is how Django will run the equivalent of SELECT * FROM books_publisher:
SELECT
id, name, address, city, state_province, country, website
FROM books_publisher;
Performance (it will matter only if you are selecting fewer columns than there are in the table).
I am not sure how Django works, but in some languages/DB drivers "select *" will cause an error if you change the table schema (say, add a new column). This is because the DB driver caches the table schema, and its internal schema then no longer matches the actual table schema.
If you have 100 columns, SELECT * will return the data for all columns. Listing the columns explicitly will reduce the columns returned, therefore reducing the amount of data transmitted between the server and application.
This is clearly not faster in many cases, and when the explicit column list is faster, it is by a slight margin: check for yourself by benchmarking a lot of queries :)
It might be faster to select only some columns in some cases, including when you select only columns that are part of a composite index (avoiding the need to read the whole row), and when you avoid reading BLOB or TEXT columns in MySQL.
And naturally, if you select fewer columns you will transfer less data between MySQL and your application.
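As an illustration of the composite-index point, using the books_publisher table from the question (the index name and the 'USA' value are made up for the example):

CREATE INDEX idx_country_name ON books_publisher (country, name);

-- can be answered from the index alone (a covering read):
SELECT name FROM books_publisher WHERE country = 'USA';

-- must fetch the full row, including the wider varchar columns, for every match:
SELECT * FROM books_publisher WHERE country = 'USA';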
I think in this exact case there will be no performance difference; this is exactly what "in certain circumstances SELECT * can be slower" is all about.
So I am creating a table from a view right now like this:
CREATE TABLE tableName SELECT * FROM viewName
Since there are some text fields in the view resultset, the query is quite slow.
I would like to create a MEMORY table instead. But since MEMORY tables do not support text fields in MySQL, I would like to convert all text fields to varchar fields when creating the table. How should I edit this SQL to do that? Is that even possible?
I'm very unfamiliar with MySQL at this point (just starting to learn it). You might, however, reference the MySQL syntax page for CREATE TABLE: http://dev.mysql.com/doc/refman/5.1/en/create-table.html. You'll see that when creating a table you can specify column names and types, i.e. CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATETIME). I'm guessing that if you declare these up front, the SELECTed data will be typecast to the new data types. Good luck!
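Following that suggestion, one possible sketch (the column names are placeholders, since the view's definition is not shown) is to declare the MEMORY table with VARCHAR columns up front and then copy the rows in:

CREATE TABLE tableName (
  id INT,
  short_text VARCHAR(1000)          -- was a TEXT column in the view
) ENGINE=MEMORY;

INSERT INTO tableName
SELECT id, LEFT(long_text, 1000)    -- truncate so the value fits the VARCHAR
FROM viewName;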
I have a fairly large table with about 250k rows. It has an auto incremented ID column that is really sort of useless. I can't just get rid of the column without rewriting too much of the app, but the ID is never used as a foreign key or anything else (except simply as an identifier when you want to delete a row, I guess).
The majority of the data gets deleted and rewritten at least a few times a day (don't ask! it's not important, though I realize it's poor design!), though the total row count stays fairly uniform. What this means is that each day the auto-increment value increases by a quarter million or so.
My question is this: in several years' time, the ID column will get too large for the INT type. Is there a way to "reset" the ID, like an OPTIMIZE or something, or should I just plan on doing a SELECT INTO a temp table and truncating the original table, resetting the ID to 0?
Thanks
If you have the id as a signed INT you can have up to 2,147,483,647 values (2^31 - 1); if it is an unsigned INT that doubles to 4,294,967,295. No worries, 250,000 a day is nothing, and if you want more, use an unsigned BIGINT (18,446,744,073,709,551,615) :P
To reset the auto-increment position:
ALTER TABLE your_table AUTO_INCREMENT = 1;
(MySQL will not set the counter below the current maximum id, so this only rewinds the sequence once the table has been emptied.)
Either change the data type of ID to BIGINT and adjust your program accordingly, or, if you're clearing everything out when you delete the data, you can use TRUNCATE TABLE TABLENAME, which will reset the sequence.
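As a sketch of both options (the table name your_table is assumed):

ALTER TABLE your_table MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;

-- or, if you really do clear the table out completely each time:
TRUNCATE TABLE your_table;   -- also resets the AUTO_INCREMENT counter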
Easiest and fastest :) Just drop the id column, set AUTO_INCREMENT=1, and add it back :)
ALTER TABLE yourtable DROP id_field;
ALTER TABLE yourtable AUTO_INCREMENT=1;
ALTER TABLE yourtable ADD id_field INT NOT NULL AUTO_INCREMENT FIRST, ADD PRIMARY KEY (id_field);
Assume that I have one big table with three columns: "user_name", "user_property", "value_of_property". Let's also assume that I have a lot of users (say 100,000) and a lot of properties (say 10,000). Then the table is going to be huge (1 billion rows).
When I extract information from the table I always need information about a particular user, so I use, for example, where user_name='Albert Gates'. So every time, the MySQL server needs to scan 1 billion rows to find the ones which contain "Albert Gates" as user_name.
Would it not be wise to split the big table into many small ones corresponding to fixed users?
No, I don't think that is a good idea. A better approach is to add an index on the user_name column, and perhaps another index on (user_name, user_property) for looking up a single property. Then the database does not need to scan all the rows; it just needs to find the appropriate entry in the index, which is stored in a B-tree, making it possible to find a record in a very small amount of time.
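Concretely, assuming the big table is called user_properties (the table and index names are my own), the index suggested above would be:

ALTER TABLE user_properties
  ADD INDEX idx_user_prop (user_name, user_property);

-- Lookups by user_name alone can use the leftmost prefix of this composite
-- index, so a separate single-column index on user_name is usually unnecessary.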
If your application is still slow even after correct indexing, it can sometimes be a good idea to partition your largest tables.
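If it does come to partitioning, a minimal sketch over the same assumed table name is hash-style partitioning on the lookup column (this requires that user_name be part of every unique key on the table, if there are any):

ALTER TABLE user_properties
  PARTITION BY KEY (user_name)
  PARTITIONS 16;    -- spreads rows across 16 partitions by a hash of user_name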
One other thing you could consider is normalizing your database so that the user_name is stored in a separate table and an integer foreign key is used in its place. This can reduce storage requirements and can increase performance. The same may apply to user_property.
You should normalise your design as follows:
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;
drop table if exists properties;
create table properties
(
property_id smallint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;
drop table if exists user_property_values;
create table user_property_values
(
user_id int unsigned not null,
property_id smallint unsigned not null,
value varchar(255) not null,
primary key (user_id, property_id),
key (property_id)
)
engine=innodb;
insert into users (username) values ('f00'),('bar'),('alpha'),('beta');
insert into properties (name) values ('age'),('gender');
insert into user_property_values values
(1,1,'30'),(1,2,'Male'),
(2,1,'24'),(2,2,'Female'),
(3,1,'18'),
(4,1,'26'),(4,2,'Male');
From a performance perspective, the InnoDB clustered index works wonders in this similar example (cold run):
select count(*) from product
count(*)
========
1,000,000 (1M)
select count(*) from category
count(*)
========
250,000 (250K)
select count(*) from product_category
count(*)
========
125,431,192 (125M)
select
c.*,
p.*
from
product_category pc
inner join category c on pc.cat_id = c.cat_id
inner join product p on pc.prod_id = p.prod_id
where
pc.cat_id = 1001;
0:00:00.030: Query OK (0.03 secs)
Properly indexing your database will be the number one way of improving performance. I once had a query take half an hour (on a large dataset, but nonetheless). Then we came to find out that the tables had no indexes. Once indexed, the query took less than 10 seconds.
Why do you need to have this table structure? My fundamental problem is that you are going to have to cast the data in value_of_property every time you want to use it. That is bad in my opinion; also, storing numbers as text is crazy given that it's all binary anyway. For instance, how are you going to have required fields? Or fields that need to have constraints based on other fields, e.g. a start and end date?
Why not simply have the properties as fields rather than some many-to-many relationship?
Have one flat table. When your business rules begin to show that properties should be grouped, you can consider moving them out into other tables and having several 1:0-1 relationships with the users table. But this is not normalization, and it will degrade performance slightly due to the extra join (however, the self-documenting nature of the table names will greatly aid any developers).
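A minimal sketch of that kind of 1:0-1 split, with all names invented for the example:

CREATE TABLE users (
  user_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(32) NOT NULL UNIQUE,
  start_date DATE NOT NULL,
  end_date DATE NULL
) ENGINE=InnoDB;

-- optional group of columns: zero or one row per user, sharing the same key
CREATE TABLE user_billing_details (
  user_id INT UNSIGNED NOT NULL PRIMARY KEY,
  billing_address VARCHAR(255) NOT NULL,
  vat_number VARCHAR(32) NULL,
  FOREIGN KEY (user_id) REFERENCES users (user_id)
) ENGINE=InnoDB;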
One way I regularly see database performance get totally castrated is by having a generic
Id, Property Type, Property Name, Property Value table.
This is really lazy but exceptionally flexible, and it totally kills performance. In fact, on a new job where performance is bad, I actually ask if they have a table with this structure; it invariably becomes the centre point of the database and is slow. The whole point of relational database design is that the relations are determined ahead of time. This is simply a technique that aims to speed up development at a huge cost to application speed. It also puts a huge reliance on the business logic in the application layer behaving itself, which is not defensive at all. Eventually you find that you want to use the properties in a key relationship, which leads to all kinds of casting on the join and further degrades performance.
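The kind of join that paragraph warns about looks roughly like this (table and column names are purely illustrative); because the property value is stored as text, it has to be cast in the join condition, so no index on property_value can be used for it:

SELECT o.*
FROM orders o
JOIN entity_properties p
  ON  p.property_name = 'customer_id'
  AND CAST(p.property_value AS UNSIGNED) = o.customer_id;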
If data has a 1:1 relationship with an entity then it should be a field on the same table. If your table gets to more than 30 fields wide then consider moving them into another table, but don't call it normalisation, because it isn't. It is a technique to help developers group fields together, at the cost of performance, in an attempt to aid understanding.
I don't know if MySQL has an equivalent, but SQL Server 2008 has sparse columns: null values take no space.
Sparse column data types
I'm not saying an EAV approach is always wrong, but I think using a relational database for this approach is probably not the best choice.