mysql keyword search across multiple columns - mysql

Various incarnations of this question have been asked here before, but I thought I'd give it another shot.
I had a terrible database layout. A single entity (widget) was split into two tables:
CREATE TABLE widgets (widget_id int(10) NOT NULL auto_increment PRIMARY KEY);
CREATE TABLE widget_data (
widget_id int(10),
field ENUM('name','size','color','brand'),
data TEXT);
This was less than ideal. If I wanted to find widgets of a specific name, color and brand, I had to do a three-way join on the widget_data table. So I converted to a more reasonable table layout:
CREATE TABLE widgets (widget_id int(10) NOT NULL auto_increment PRIMARY KEY,
name VARCHAR(32), size INT(3), color VARCHAR(16), brand VARCHAR(32));
This makes most queries much better, but it makes searching harder. It used to be that if I wanted to search widgets for, say, '%black%', I would just SELECT * FROM widget_data WHERE data LIKE '%black%'. This would give me all instances of widgets that are black in color, or are made by Blackwell Industries, or whatever. I would even know exactly which field matched, and could show that to my user.
How do I execute a similar search using the new table layout? I could of course do WHERE name LIKE '%black%' OR size LIKE '%black%'... but that seems clunky, and I still don't know which fields matched. I could run a separate query for each column I want to match on, which would give me all matches and how they matched, but that would be a performance hit. Any ideas?

You can repeat each condition from the WHERE clause as a column in the SELECT list. For example:
SELECT
*,
(name LIKE '%black%') AS name_matched,
(size LIKE '%black%') AS size_matched
FROM widgets
WHERE name LIKE '%black%' OR size LIKE '%black%'...
Then check the value of name_matched (and the other flags) in your script.
I'm not sure how it will affect performance. Feel free to test it before going to production.
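A minimal sketch of the "matched flag" idea, run from Python. Assumptions: SQLite (via the standard sqlite3 module) stands in for MySQL so the example is self-contained, the sample rows are made up, and search_widgets and SEARCH_COLUMNS are hypothetical names. The column names come from a fixed tuple, never from user input; only the search term is bound as a parameter.

```python
import sqlite3

# Columns to search; a fixed whitelist, so it is safe to splice into the SQL text.
SEARCH_COLUMNS = ("name", "size", "color", "brand")

def search_widgets(conn, term):
    """Return (widget_id, [columns that matched]) for every widget matching term."""
    pattern = f"%{term}%"
    # One boolean flag per column in the SELECT list, same conditions in WHERE.
    flags = ", ".join(f"({c} LIKE ?) AS {c}_matched" for c in SEARCH_COLUMNS)
    where = " OR ".join(f"{c} LIKE ?" for c in SEARCH_COLUMNS)
    sql = f"SELECT widget_id, {flags} FROM widgets WHERE {where} ORDER BY widget_id"
    params = [pattern] * (2 * len(SEARCH_COLUMNS))
    results = []
    for widget_id, *matched in conn.execute(sql, params):
        results.append((widget_id, [c for c, m in zip(SEARCH_COLUMNS, matched) if m]))
    return results

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widgets (widget_id INTEGER PRIMARY KEY, name VARCHAR(32), "
    "size INT, color VARCHAR(16), brand VARCHAR(32))"
)
conn.executemany(
    "INSERT INTO widgets VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Black Box", 3, "red", "Acme"),
        (2, "Gizmo", 5, "black", "Acme"),
        (3, "Sprocket", 7, "blue", "Blackwell Industries"),
        (4, "Gadget", 1, "white", "Acme"),
        (5, "Blackbird", 2, "black", "Acme"),
    ],
)

for widget_id, columns in search_widgets(conn, "black"):
    print(widget_id, columns)
```

The script learns which fields matched from a single query, at the cost of evaluating each LIKE twice in the statement.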

You have two conflicting requirements. You want to search as if all your data is in a single field, but you also want to identify which specific field was matched.
There's nothing wrong with your WHERE name LIKE '%black%' OR size LIKE '%black%'... expression. It's a perfectly valid search on the table as you have defined it. Why not just check the results in code to see which one matched? It's a minimal overhead.
If you want a cleaner syntax for your SQL then you could create a view on the table, adding an extra field which consists of concatenating the other fields:
CREATE VIEW extra_widget_data AS
SELECT name, size, color, brand,
CONCAT_WS(' ', name, size, color, brand) AS all_fields
FROM widgets;
Then you'd have to add an index on this field, which requires more space, CPU time to maintain etc. I don't think it's worth it.

You probably want to look into MySQL's full-text search capability, which lets you match against multiple columns of varchar type.
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html

Related

Generating a big range of numbers in MySQL

How do I generate a range of numbers in one column in MySQL? I'm looking for any solution that produces a range of numbers starting at 500000000 and ending at 889999999.
It seems that you may want to use AUTO_INCREMENT on the column in question. You can set the starting value to the one you desire in this way.
Also, you can only have one AUTO_INCREMENT column in a given table.
CREATE TABLE your_table (
column_1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY
-- Add other columns
) AUTO_INCREMENT = 500000000;
If you already have a table with the AUTO_INCREMENT column, just set the value to the one you want.
ALTER TABLE your_table AUTO_INCREMENT = 500000000;
If what you want is to insert rows with those numbers, use a loop.
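A sketch of the "use a loop" suggestion, driven from a client script. Assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, the table name numbers and the function insert_range are hypothetical, and only a small slice of the range is inserted so the example runs quickly. Batching the inserts keeps each transaction small.

```python
import sqlite3

def insert_range(conn, start, end, batch_size=10_000):
    """Insert every integer in [start, end] into numbers(n), one batch per transaction."""
    total = 0
    for lo in range(start, end + 1, batch_size):
        hi = min(lo + batch_size - 1, end)
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(
                "INSERT INTO numbers (n) VALUES (?)",
                ((i,) for i in range(lo, hi + 1)),
            )
        total += hi - lo + 1
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER PRIMARY KEY)")

# Demo on the first 25,000 values only; the full range would end at 889_999_999.
inserted = insert_range(conn, 500_000_000, 500_024_999)
print(inserted)
```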
Just for fun, generate the range in a text file, by any means available.
Unload to a text file.
Load that text file into your table. You do not say whether you are constrained by how long this takes. It sounds like you just want a table with a single INT column with lower and upper limits.
MySQL should just handle these numbers; this is not really a "big" range, seriously.
Do you want to constrain the values in the column to
{500000000..889999999}?
Or do you want to know how to define a column
to hold these values?
Do you want a written procedure to generate
these numbers for you?
Do you want us to size this for you?
Do you want us to write a script or program to load these?
All of these answers are available with minimal sweat. Keywords are MySQL, Integer, Types.
We cannot see your problem because your question does not describe a problem.
Tell us what you tried, and tell us what happened...
Otherwise just add them, you are still in INT territory (-2Gi..2Gi), not BIGINT yet.
Switch to MariaDB, then JOIN to a pseudo-table called seq_500000000_to_889999999.

How to search either on id or name for certain purchase orders

We would like to filter purchase orders either based on purchase order id (primary key) or name of the purchase order using a single search box.
We used the LIKE operator to search on the name field, but it doesn't seem to work on the primary key. It works only when we use the equality operator for id(s). But it would be preferable if we could filter purchase orders using LIKE for id(s). How can we do this?
create table purchase_orders (
id int(11) primary key,
name varchar(255),
...
)
Option 1
SELECT *
FROM purchase_orders
WHERE id LIKE '%123%'; -- tribute to TemporaryNickName
This is horrible, performance-wise :)
Option 2a
Add a text column which receives a string version of id. Maybe add some triggers to populate it automatically.
Option 2b
Change the type of id column to CHAR or VARCHAR (I believe CHAR should be preferred for a primary key).
In both 2a. and 2b. cases, add an index (maybe a FULLTEXT one) to this column.
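Option 1 extended to a single search box might look like the sketch below. Assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, the sample orders are invented, and search_orders is a hypothetical helper. The explicit CAST(id AS CHAR) makes the integer-to-string comparison visible; like Option 1 it cannot use the primary key index, so it scans the table.

```python
import sqlite3

def search_orders(conn, term):
    """Return orders whose id (as text) or name contains term."""
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT id, name FROM purchase_orders "
        "WHERE CAST(id AS CHAR) LIKE ? OR name LIKE ? "
        "ORDER BY id",
        (pattern, pattern),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase_orders (id INTEGER PRIMARY KEY, name VARCHAR(255))")
conn.executemany(
    "INSERT INTO purchase_orders VALUES (?, ?)",
    [
        (9, "Toner"),
        (77, "Order 123 refill"),
        (123, "Office chairs"),
        (4123, "Paper"),
    ],
)

print(search_orders(conn, "123"))  # matches on id and on name
```

Note that % and _ typed by the user act as wildcards here; escape them if that is not wanted.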
I think LIKE should work. I assume that your SQL wasn't correctly written.
Let's assume that you have order name "ABCDEF" then you can find this using the following query structure.
SELECT id FROM purchase_orders WHERE name LIKE '%CD%';
To explain: the % sign is a wildcard, so this query selects any string that contains "CD".
According to the table structure, the name column is a varchar that can hold up to 255 characters. That is a fairly large string, so searching it with LIKE will probably consume a lot of resources and take more time. You can always search by id with WHERE id = something, which is much faster, but I don't think an order id is user-friendly data; I would instead let users search by product name. My recommendation is to use Apache Lucene or MySQL's full-text search feature (which can improve search performance).
Apache Lucene
MySQL full-text search functions
These are tools built to search for a pattern or word across a large list of strings much faster. Many websites use them to build their own mini search engines. I found that MySQL's full-text search has practically no learning curve and is straightforward to use =D

How to store these field descriptions in mysql?

Apologies for the long post. I didn't intend for it to be this long, but it's a pretty simple issue I've been having. :)
Let's say you have a simple table called tags that has columns tag_id and tag. The tag_id is simply an auto increment column and the tag is the title of the tag. If I need to add a description field, that would be around 1-2 paragraphs on average (max around 3-4 paragraphs probably), should I simply add a description field to the table or should I create a new table called tag_descriptions and store the descriptions with the tag_id?
I remember reading that it is better to do this because if you do a query that doesn't select the description, that description field will still slow down mysql. Is this true? I don't even remember where I read that from, but I've been kind of following it for a couple years now... Finally I question if I need to do this, I have a feeling I don't. You'd also need to inner join whenever you need the description field.
Another question I have is, is it generally bad to create new tables that will only hold very few rows at the max? What if this data doesn't fit anywhere else?
I have a simple case below which relates to these two questions.
I have three tables content, tags, and content_tags that make up a many to many relationship:
content
  content_id
  region (enum column with about 6-7 different values; most likely won't grow later on)
tags
  tag_id
  tag
content_tags
  content_id
  tag_id
I want to store a description around 1-2 paragraphs for each tag, but also for each region. I'm wondering what would be the best way to do this?
Option A:
  Just add a description column to the tags table
  Create a new table for region_descriptions
Option B:
  Create a new table called descriptions with fields: id, description, and type
  The id would be the id of the content or the id of the enum field
  The type would be whether it is a tag description or a region description (would use the enum column for this)
  Maybe have a primary key on the id and type?
Option C:
  Create a new table for tag_descriptions
  Create a new table for region_descriptions
Option A seems to be a good choice if adding the description column doesn't slow down mysql select queries that don't need the description.
Assuming the description column would slow down mysql, option B might be a good choice. It also removes the need for a small table with just 6-7 rows that would hold the region descriptions. Although now that I think of it, would it be slower to look up a region description in this combined table, when originally you'd only need to go through very few rows?
Option C would be ideal if the description columns would slow down mysql and if a small table like region descriptions would not matter.
Maybe none of these options are the best, feel free to offer another option. Thanks.
P.S. What would be an ideal column type to hold data that is usually 1-2 paragraphs, but might be a little more sometimes?
I don't think it really matters if you don't handle thousands of queries per minute. If you are going to have a zillion queries per minute, then I would implement the various options and perform benchmarks for all these options. Based on the results, you can make a decision.
In my (admittedly somewhat uninformed) opinion, it really depends on how much you'll be using both of them.
If properly indexed, that JOIN should not be very expensive. Also, a larger table will be slower. It inhibits caching, and takes longer to access stuff, although indexing seriously mitigates this problem.
If you'll be joining tag names to tag IDs a LOT, and only rarely will be using the descriptions, I'd say go with separate tables. If you'll be using the descriptions more often, go with one table.
For the first part of your question: if you have a tag with an id, a name and a description, you should save it in 1 table.
Now, this query
SELECT name FROM tags WHERE id = 1;
will NOT slow down if you have 1, 2 or 20 extra fields in there.

What is the best method to store default values in database?

I have several tables like Buyers, Shops, Brands, Money_Collectors, etc.
Each one of those has a default value, e.g. the default Buyer is David, the default Shop is Ebay, and so on.
I would like to save those default values in a database (so that user could change them).
I thought of adding an is_default column to each one of the tables, but it seems inefficient because only one row in each table may be the default.
Then I thought that the best would be to have Defaults table that will contain all the default values. This table will have 1 row and N columns, where N is the number of the default values:
Defaults table:
buyer  shop  brand  money_collector
-----  ----  -----  ---------------
David  Ebay  Dell   NULL (no default value)
But, this seems to be not the best approach because the table structure changes when a new default value is added.
What would be the best approach to store default values ?
Just to be clear.
The best way is a column on each table that the dropdowns are sourced from.
And here's why...
"Shouldn't I worry about space when saving data in a database?"
The short answer is no. The longer answer is what you should worry about is performance. Focusing on space will lead you to do very bad things.
Bad things that you'll do if space is a concern:
You'll bury meaning in primary keys, i.e. smart keys.
You'll try to store multiple values in one column.
You'll index too little.
(No doubt we could create a list of 50 bad practices which save space.)
"Suppose there are 50 shops (select box with 50 possible values). In this case, to store the default shop you need 50 boolean fields,"
Well, it's ONE Boolean column. It exists on each row.
Let me ask you this. If you created a table with 1 date column and inserted 1 row, how much space would you use on disk?
If you said 7 or 8 bytes then you're off by about 1000 times.
The smallest unit of disk space is a block. Blocks are typically 8 KB (they can be as small as 2 KB or as large as 32 KB, in general; no nitpicking here, the actual limits are unimportant).
Let's say you have 8 KB blocks; then your 1-column, 1-row table takes 8 KB. If you insert another 999 rows it will still take up 8 KB. (Again, no nitpicking: there is overhead per block and per row. It's an example.)
So in your look up table with 50 store names, the likelihood that adding 50 bytes to the size of the table forces you to expand from 1 block to 2 is slim to none and completely irrelevant.
On the other hand, your default table will certainly take up at least one additional block.
But the worst hit to PERFORMANCE is that your call to fill a drop down will need two round trips to the database, one to get the list, one to get the default. (yes, you may be able to do this in one but go with it)
So you've saved exactly zero space and doubled your network traffic.
You see what I'm saying.
Another crucial reason to stop worrying about space is that you're giving up clarity. Think of the developer you're going to hire to run this app. When he joins the team and looks at the database, imagine the two scenarios.
There's a Boolean column named Default_value
There's a table with no relationships to anything that's named Default_Values
You ask him to build a new form with a dropdown for 'store'.
In scenario 1 he finds the store table, wires up the dropdown to a simple query of the table and uses the default_value field to select the initial value.
In scenario 2, without some training, how would he know to look for a separate table? Maybe he'd see the table but by the time you're hiring, your datamodel now has hundreds of tables.
Again, a little contrived but the point is salient. Clarity in the database is well, well worth a byte per row.
Technical stuff
I'm not a MySQL guy, but in Oracle a null column at the end of a row takes no additional space. In Oracle I would use a Varchar2(1), let 'T' = Default and leave the others null. That would have the effect of using only 1 additional byte in total, not per row. YMMV with MySQL; you can pose that question separately if you can't Google the answer.
But the time to worry about that is on millions of rows, not hundreds. Any table which feeds a dropdown will never be big enough to start worrying about extra bytes.
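A sketch of the one-column approach argued for above. Assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, the shops table, the is_default column name and both helper functions are hypothetical, and the sample rows are invented. The list and its default arrive in one query, and changing the default is a single UPDATE that sets the flag on exactly one row.

```python
import sqlite3

def set_default_shop(conn, shop_id):
    """Mark shop_id as the default and clear the flag on every other row."""
    with conn:
        # The comparison evaluates to 1 for the chosen row and 0 for the rest.
        conn.execute("UPDATE shops SET is_default = (shop_id = ?)", (shop_id,))

def dropdown_options(conn):
    """One round trip: every option plus the flag that marks the initial selection."""
    return conn.execute("SELECT name, is_default FROM shops ORDER BY name").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shops (shop_id INTEGER PRIMARY KEY, name VARCHAR(32), "
    "is_default INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO shops (shop_id, name) VALUES (?, ?)",
    [(1, "Amazon"), (2, "Ebay"), (3, "Etsy")],
)
set_default_shop(conn, 2)

print(dropdown_options(conn))
```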
What if you create an XML document and store it in the table in an XML column? The XML could have a tag for each table and a sub node for its default value.
You should rather create a table with two columns and n rows
Defaults table:
buyer, David
shop, Ebay
brand, Dell
This way you can add new values without having to change table structure
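A sketch of the two-column defaults table. Assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, the column names name and value and the helpers get_default and set_default are hypothetical. REPLACE INTO is used because both MySQL and SQLite accept it; it inserts the row or overwrites the one with the same key.

```python
import sqlite3

def set_default(conn, name, value):
    """Create or overwrite the default stored under name."""
    with conn:
        conn.execute("REPLACE INTO defaults (name, value) VALUES (?, ?)", (name, value))

def get_default(conn, name):
    """Return the stored default, or None when no default is defined."""
    row = conn.execute("SELECT value FROM defaults WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE defaults (name VARCHAR(32) PRIMARY KEY, value VARCHAR(64))")
for key, val in [("buyer", "David"), ("shop", "Ebay"), ("brand", "Dell")]:
    set_default(conn, key, val)

print(get_default(conn, "shop"))
print(get_default(conn, "money_collector"))  # no row, so no default
```

Adding a default for a new table is just another row, so the schema never changes.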
You can create a catalog table (some kind of metadata table) containing the default values as strings for the desired table columns. Then you can use the convert function for getting the appropriate value. Below is a sample table definition (Transact-SQL was used):
create table dbo.cat_default_values
(
id_column varchar(30) not null,
id_table varchar(30) not null,
datatype varchar(30) not null,
value varchar(100) not null,
f_creation datetime not null,
usr_creation char(8) null,
primary key clustered (id_column, id_table)
)
declare @defaultValueInt int,
@defaultValueVarchar varchar(30)
select @defaultValueInt = convert(int, value)
from cat_default_values where id_column = 'defColumInteger' and id_table = 'table1'
select @defaultValueVarchar = value
from cat_default_values where id_column = 'defColumVarchar' and id_table = 'table1'
First of all, what you are trying to store is not metadata, so I would not invent an external data store (coupled with extra code) to hold it.
I assume you have PK sequence generation logic under your control. I would pick a magic number x and insert a record in each table with _id = x as the default value. If you want to show the user the default value, you can handle it in your query in a uniform way, or you can handle it in application logic on insert. The good thing about this is that you have access to the default value all the time without writing any extra logic, and the logic for maintaining the default value of a table can be maintained using the same code (templating ;)
(From the lessons W3C learned from modeling schema information of XML using DTD.)
The only catch is that this logic should be made explicit, either through extensive documentation or by enforcing it with a trigger.

How to best get 3 prior image and 3 later image records in MySQL query?

I'll explain briefly what I want to accomplish from a functional perspective. I'm working on an image gallery. On the page that shows a single image, I want the sidebar to show thumbnails of images uploaded by the same user. At a maximum, there should be 6, 3 that were posted before the current main image, and 3 that were posted after the main image. You could see it as a stream of images by the same user through which you can navigate. I believe Flickr has a similar thing.
Technically, my image table has an autoincremented id, a user id and a date_uploaded field, amongst many other columns.
What would your advice be on how to implement such a query? Can I combine this in a single query? Are there any handy MySQL utilities that can deal with offsets and such?
PS: I prefer not to create an extra "rank" column, since that would make managing deletions difficult. Also, using the autoincrement id seems risky, I might change it for a GUID later on. Finally, I'm of course looking for a query that performs and scales.
I know I ask for a lot, but it seems simpler than it is?
The query could look like the following.
With a UserID+image_id index (and possibly additional fields for covering purposes), this should perform relatively well.
SELECT field1, field2, whatever
FROM myTable
WHERE UserID = some_id
-- AND image_id > id_of_the_previously_first_image
ORDER BY image_id
LIMIT 7;
To help with scaling, you should consider using a bigger LIMIT value and cache accordingly.
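For comparison, a different way to fetch exactly three earlier and three later images in one statement is to union two limited subqueries. This is not the query from the answer above, only a sketch. Assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, the images table with image_id and user_id columns is hypothetical, and rows are ordered by image_id (a date_uploaded column could be substituted).

```python
import sqlite3

NEIGHBOURS_SQL = """
SELECT image_id FROM (
    SELECT image_id FROM images
    WHERE user_id = :user AND image_id < :current
    ORDER BY image_id DESC LIMIT 3
) AS earlier
UNION ALL
SELECT image_id FROM (
    SELECT image_id FROM images
    WHERE user_id = :user AND image_id > :current
    ORDER BY image_id ASC LIMIT 3
) AS later
ORDER BY image_id
"""

def neighbours(conn, user_id, current_id):
    """Up to 3 earlier and 3 later image ids by the same user, oldest first."""
    rows = conn.execute(NEIGHBOURS_SQL, {"user": user_id, "current": current_id})
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (image_id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.execute("CREATE INDEX user_image_idx ON images (user_id, image_id)")
# Odd ids belong to user 1, even ids to user 2.
conn.executemany(
    "INSERT INTO images (image_id, user_id) VALUES (?, ?)",
    [(i, 1 if i % 2 else 2) for i in range(1, 21)],
)

print(neighbours(conn, 1, 11))
print(neighbours(conn, 1, 3))  # fewer than 3 earlier images exist
```

Each subquery is a short range scan on the (user_id, image_id) index, so deletions need no rank column to be maintained.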
Edit (answering remarks/questions):
The combined index...
is made of several fields, specifically
CREATE [UNIQUE] INDEX UserId_Image_id_idx
ON myTable (UserId, image_id [, field1 ...] )
Note that optional elements of this statement are in brackets ([]). I would assume the UNIQUE constraint would be a good thing. The additional "covering" fields (field1,...) may be beneficial, but that would depend on the "width" of such additional fields as well as on the overall setup and usage patterns (since [large] indexes slow down INSERTs/UPDATEs/DELETEs, one may wish to limit the number and size of such indexes etc.)
Such an index data "type" is neither numeric nor string etc. It is simply made of the individual data types. For example if UserId is VARCHAR(10) and Image_id is INT, the resulting index would use these two types for the underlying search criteria, i.e.
... WHERE UserId = 'JohnDoe' AND image_id > 12389
in other words one needn't combine these criteria into a single key.
On image_id
when you say image_id, you mean the combined user/image id, right?
No, I mean only image_id. I'm assuming this field is a separate field in the table. The UserID is taken care of in the other predicate of the WHERE clause.
The original question write up indicates that this field is auto-generated, and I'm assuming we can rely on this field for sorting purposes. Alternatively we could rely on other fields such as the timestamp when the image was uploaded and such.
Also, an afterthought, whether ordered by a [monotonically increasing] Image_id or by the Timestamp_of_upload, we may want to use a DESC order, to show the latest "stuff" first.