After my previous question (http://stackoverflow.com/questions/8217522/best-way-to-search-for-partial-words-in-large-mysql-dataset), I've chosen Sphinx as the search engine on top of my MySQL database.
I've done some small tests with it, and it looks great. However, I'm at a point right now where I need some help / opinions.
I have a table articles (structure isn't important), a table properties (structure isn't important either), and a table with values of each property per article (this is what it's all about).
The table where these values are stored has the following structure:
articleID UNSIGNED INT
propertyID UNSIGNED INT
value VARCHAR(255)
The primary key is a compound key of articleID and propertyID.
I want Sphinx to search through the value column. However, to create an index in Sphinx, I need a unique id, which I don't have here.
Also, when searching, I want to be able to filter on the propertyID column (only search values for propertyID 2, for example, which I can do by defining it as an attribute).
On the Sphinx forum, I found I could create a multi-value attribute, and set this as the query for my Sphinx index:
SELECT articleID, value, GROUP_CONCAT(propertyID) FROM t1 GROUP BY articleID
articleID will be unique now; however, now I'm missing values (the GROUP BY collapses the rows, so only one value per article survives). So I'm pretty sure this isn't the solution, right?
There are a few other options, like:
Add an extra column to the table, which is unique
Create a calculated unique value in the query (like articleID*100000+propertyID)
Are there any other options I could use, and what would you do?
Regarding your suggestions:
Add an extra column to the table, which is unique
This cannot be done for an existing table with a large number of records, as adding a new field to a large table takes some time, and during that time the database will not be responsive.
Create a calculated unique value in the query (like articleID*100000+propertyID)
If you do this, you have to find a way to get the articleID and propertyID back from the calculated unique id.
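For what it's worth, if propertyID is guaranteed to stay below 100000, the id can be built and taken apart mechanically; a sketch (123400002 is a hypothetical matched id):

-- Build the synthetic unique id in the indexing query
SELECT articleID * 100000 + propertyID AS id, value FROM t1;

-- Recover the original keys from an id returned by Sphinx
SELECT 123400002 DIV 100000 AS articleID, 123400002 MOD 100000 AS propertyID;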
Another alternative is to create a new table with a key field for Sphinx and two more fields to hold articleID and propertyID:
new_sphinx_table with the following fields:
id - UNSIGNED INT/ BIGINT
articleID - UNSIGNED INT
propertyID - UNSIGNED INT
Then you can write an indexing query like the one below:
SELECT id, t1.articleID, t1.propertyID, value FROM t1 INNER JOIN new_sphinx_table nt ON t1.articleID = nt.articleID AND t1.propertyID = nt.propertyID;
This is a sample, so you can modify it to fit your requirements.
What Sphinx returns are the matched new_sphinx_table.id values, together with the other attribute columns. You can then get the result by taking those new_sphinx_table.id values and joining your t1 table with new_sphinx_table.
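For example, if Sphinx returned the matched ids 3, 17 and 42 (hypothetical values), the full rows could be fetched with something like:

SELECT nt.id, t1.articleID, t1.propertyID, t1.value
FROM new_sphinx_table nt
INNER JOIN t1 ON t1.articleID = nt.articleID AND t1.propertyID = nt.propertyID
WHERE nt.id IN (3, 17, 42);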
I have this design.
Table models:
id - primary key
title - varchar(256)
Table model_instances:
id - primary key
model_id - foreign key to models.id
title - varchar(256)
Table model_fields:
id - pk
model_id - foreign key to models.id
instance_id - foreign key to model_instances.id
title - name of the field
type - enum [text, checkbox, radio, select, 'etc']
Table model_field_values:
instance_id - foreign key to model_instances.id
field_id - foreign key to model_fields.id
value - text
Also, there can be many values for some fields (like a multiple-select dropdown).
The problem is: value is always a text field, because I want to store different types of data (text, datetime, integer), and this table contains all values for all instances of all models.
For example, if I have 10 models and every model has 1000 instances with 10 fields, then model_field_values would contain (at minimum) 100,000 rows; if some fields are multi-valued, it would contain 120,000-150,000 rows.
A SQL SELECT filtering on the value field would be slow.
Solution 1:
For every model create new model_field_values like:
model.id = 1, model_field_values_1
...
model.id = 10, model_field_values_10
Solution 2:
Because model_fields contains all fields for a model, we can create model_field_values like this:
model_fields for model.id=1 (by primary key): 1 - text, 2 - integer, 3 - datetime, 4 - smalltext
Fields for model_field_values_1: field_1 text, field_2 integer, field_3 datetime, field_4 varchar(256)
This solution is not good for fields with multiple values, because every multiple value would need another table with a link to the row in model_field_values_1, but it is good for searching through the database because MySQL would use native datatypes in WHERE clauses (not text fields).
Maybe I'm missing something? Maybe there is a better design?
This database would be used in a CRM system, where a user can create different models with many instances of each model, so I cannot preconfigure all tables with all columns.
Note: 200,000 rows (two tenths of a megarow) is, in the usual operation of MySQL, a medium-sized table. It's generally possible to index such a table fairly efficiently: http://use-the-index-luke.com/
That being said, I think I understand your problem. It is, in the jargon of object-oriented design, polymorphism.
You have this model_field_value table, containing
instance_id
field_id
value
Your problem is, the value's native data type is sometimes VARCHAR(255), sometimes DATETIME or maybe TIMESTAMP, and sometimes INT.
And you'll sometimes need to do queries like this one
SELECT fv.instance_id
FROM model_field_value fv
WHERE fv.field_id = something
AND fv.value >= '2017-01-01'
AND fv.value < '2018-01-01'
to find DATETIME values that happened in calendar year 2017, for example.
This is generally a pain in the neck with key/value storage like what you need. For a query like my example to be sargable, you need to be able to put an index on a DATETIME column. But if you don't have such a column, you can't index it. Duh.
Here's a suggestion. Give your table these columns.
instance_id INT pk fk
field_id INT pk fk
value VARCHAR(255) - a text representation of every value
value_double DOUBLE - a numeric representation of every numeric value, or NULL
value_ts TIMESTAMP - a timestamp value if possible, or NULL
This table will contain redundant data, and you'll have to be very careful when you're writing it to make sure it's correct. But you will be able to put indexes on the value_ts and value_double columns, so you can make those kinds of queries sargable.
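A sketch of the resulting DDL (table, column, and index names are placeholders to adapt):

CREATE TABLE model_field_values (
    instance_id  INT NOT NULL,            -- FK to model_instances.id
    field_id     INT NOT NULL,            -- FK to model_fields.id
    value        VARCHAR(255),            -- text representation of every value
    value_double DOUBLE NULL,             -- numeric values, or NULL
    value_ts     TIMESTAMP NULL,          -- timestamp values, or NULL
    PRIMARY KEY (instance_id, field_id),
    INDEX idx_value_double (field_id, value_double),
    INDEX idx_value_ts (field_id, value_ts)
);

Putting field_id first in each index means a query like the DATETIME example above can seek directly on (field_id, value_ts).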
Just an idea.
I have a situation where the column names "field1" and "field3" are not given to me, but all the other data is. The request is coming in via a URL like /table1/1 or /table2/3, and it is assumed that 1 or 3 represent the primary key. However, the column name may be different. Consider the following 2 queries:
SELECT * FROM table1 WHERE field1 = 1 AND field2 = 2;
SELECT * FROM table2 WHERE field3 = 3 AND field4 = 4;
Ideally, I'd like to perform a search like the following:
SELECT * FROM table1 WHERE MYSQL_PRIMARY_COLUMN = 1 AND field2 = 2;
SELECT * FROM table2 WHERE MYSQL_PRIMARY_COLUMN = 3 AND field4 = 4;
Is there a keyword to identify MYSQL_PRIMARY_COLUMN in a MySQL WHERE clause?
No, there's no pseudocolumn you can use to map to the primary key column. One reason this is complicated is that a given table may have a multi-column primary key. This is a totally ordinary way to design a table:
CREATE TABLE BooksAuthors (
book_id INT NOT NULL,
author_id INT NOT NULL,
PRIMARY KEY (book_id, author_id)
);
When I implemented the table data gateway class in Zend Framework 1.0, I had to write code to "discover" the table's primary key column (or columns) as #doublesharp describes. Then the table object instance retained this information so that searches and updates knew which column (or columns) to use when generating queries.
I understand you're looking for a solution that doesn't require this "2 pass process" but no such solution exists for the general case.
Some application framework environments attempt to simplify the problem by encouraging developers to give every table a single column primary key, and name the column "id" by convention. But this convention is an oversimplification that fails to cover many legitimate table designs.
You can use DESCRIBE (which is a synonym for EXPLAIN) to get information about the table, which will include all of the column information.
DESCRIBE `table`;
You can also use SHOW INDEX to just get information about the PRIMARY key.
SHOW INDEX FROM `table` WHERE Key_name = 'PRIMARY'
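If you need just the primary key column names (the "discovery" step described above), you can also query information_schema directly; a sketch (the schema and table names are examples):

SELECT COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'BooksAuthors'
  AND CONSTRAINT_NAME = 'PRIMARY'
ORDER BY ORDINAL_POSITION;

For a multi-column primary key like BooksAuthors above, this returns one row per key column, in key order.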
I have a table containing settings for an application, with the columns id, key, and value.
The id column is auto-incrementing, but as of now I do not use it, nor does it have any foreign key constraints. I'm populating the settings and would like to restructure the table so they are alphabetical, as I've not been inserting the settings in that order; reordering alphabetically would help group related settings together.
For example, if I have the following settings:
ID KEY VALUE
======================================
1 App.Name MyApplication
2 Text.Title Title of My App
3 App.Version 0.1
I would want all the App.* settings to be grouped together sequentially, without having to do an ORDER BY every time. Anyway, that's the explanation. I have tried the following, and it didn't seem to change the order:
CREATE TABLE mydb.Settings2 LIKE mydb.Settings;
INSERT INTO mydb.Settings2 (`key`,`value`) SELECT `key`,`value` FROM mydb.Settings ORDER BY `key` ASC;
DROP TABLE mydb.Settings;
RENAME TABLE mydb.Settings2 TO mydb.Settings;
That will make a duplicate of the table as suggested, but won't restructure the data. What am I missing here?
The easy way to reorder a table is with ALTER TABLE table ORDER BY column ASC. The query you tried looks like it should have worked, but I know the ALTER TABLE query works; I use it fairly often.
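For the table from the question, that would be, for example:

ALTER TABLE mydb.Settings ORDER BY `key` ASC;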
Note: Reordering the data in a table only works and makes sense in MyISAM tables. InnoDB always stores data in PRIMARY KEY order, so it can't be rearranged.
Decided to make that an answer.
As I said in a comment on the initial answer, for you to achieve a long-term effect you need to recreate the settings table with the key column as the PRIMARY KEY, because, as G-Nugget correctly said, 'InnoDB always stores data in PRIMARY KEY order'.
You can do that like this
CREATE TABLE settings2
(`id` int NULL, `key` varchar(64), `value` varchar(64), PRIMARY KEY(`key`));
INSERT INTO settings2
SELECT id, `key`, `value`
FROM settings;
DROP TABLE settings;
RENAME TABLE settings2 TO settings;
That way you get your order intact after inserting new records.
And if you don't need the initial id column in the settings table, it's a good time to ditch it.
Here is a working sqlfiddle.
Disclaimer: Personally I would use ORDER BY anyway
I'm developing an online reservation system where people can reserve items based on availability for a particular hour of the day. For that I'm using two tables: 1. Item, 2. Reservation.
Item:(InnoDB)
-------------------------
id INT (PRIMARY)
category_id MEDIUMINT
name VARCHAR(20)
make VARCHAR(20)
availability ENUM('1','0')
Reservation:(InnoDB)
-------------------------
id INT (PRIMARY)
date DATE
Item_id INT
slot VARCHAR(50)
SELECT Item.id, Item.category_id, Item.make, Item.name, reservation.slot
FROM Item
INNER JOIN reservation ON Item.id=reservation.Item_id AND Item.category_id=2
AND Item.availability=1 AND reservation.date = DATE(NOW());
I'm using the above query to display all the items under a particular category with free timeslots which a user can reserve on a particular date.
The slot field in the reservation table contains a string (e.g. 0:1:1:1:0:0:0:0:0:0:0:0:1:1:1:0:0:0:0:0:0:0:0:0) where 1 means that hour is reserved and 0 means available.
The availability field in the Item table shows whether that item is available for reservation at all (it may be down for servicing).
First of all, is my table structure fine? Secondly, what is the best way to optimize my query (multi-column indexing etc.)?
thanks,
ravi.
Put a foreign key constraint and an index on your FK; this should speed things up a little. You appear to mix INT for the item ID and MEDIUMINT for the FK; not sure this is what nature intended.
Points to remember while choosing an Attribute on which an Index will be created
A column that is frequently used in a SELECT list and a WHERE clause.
A column in which data will be accessed in sequence by a range of values.
A column that will be used with the GROUP BY or ORDER BY clause to sort data.
A column used in a join, such as the FOREIGN KEY column.
A column that is used as a PRIMARY KEY.
Try to create indexes on numeric values; you can introduce a surrogate key if there is no numeric PK.
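Applied to the tables and query in the question, that could translate into something like this (index names are arbitrary):

-- Covers the join column plus the date filter
CREATE INDEX idx_reservation_item_date ON Reservation (Item_id, date);

-- Covers the category and availability filters on Item
CREATE INDEX idx_item_cat_avail ON Item (category_id, availability);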
Why are you putting all of those conditions in the join clause? Why not:
SELECT Item.id, Item.category_id, Item.make, Item.name, reservation.slot
FROM Item INNER JOIN reservation ON Item.id=reservation.Item_id
WHERE Item.category_id=2 AND Item.availability=1 AND reservation.date = DATE(NOW());
I'm not enough of a SQL expert to say whether this will make it faster, but it looks more obvious to me.
I would suggest creating an index on Reservation.item_id. That should help you improve query performance.
I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
And I have a list of categories to break these rows up into, for example:
Original Cat1 Cat2 Cat3 Cat4 Cat5
---------------------------------------
a:b:c:d:e a b c d e
As of right this second, the category names are known, as well as the number of categories to break the data down by. But this might change over time (for instance, categories added/removed... total number of categories changed).
Okay, so I'm not really looking for help on how to parse the rows or get data into a db or anything... I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating them into a variable number of categories.
Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:
Table: Generated
generated_id int - unique id for each row generated
generated_timestamp datetime - timestamp of when row was generated
last_updated datetime - timestamp of when row last updated
generated_method varchar(6) - method in which row was generated (manual or auto)
original_string varchar (255) - the original string
Table: Categories
category_id int - unique id for category
category_name varchar(20) - name of category
Table: Category_Values
category_map_id int - unique id for each value (not sure if I actually need this)
category_id int - id value to link to table Categories
generated_id int - id value to link to table Generated
category_value varchar (255) - value for the category
Basically the idea is when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. And the category names are stored in another table Categories.
What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.
Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure... well... I'm not a SQL expert, but I think I should be able to do something like
select * from Generated where original_string = '$string'
// id is put into $id
and then
select * from Category_Values where generated_id = '$id'
...and then I'll have my data to work with for search results or a form to alter data... well, I'm fairly certain I can even combine this into one query with a join or something, but I'm not that great with SQL, so I don't know how to actually do that. But the point is, I know I can do what I need with this db structure... but am I making this harder than it needs to be? Making some obvious noob mistake?
My suggestion:
Table: Generated
id unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method ENUM('manual','auto')
original_string varchar (255)
Table: Categories
id unsigned int autoincrement primary key
category_name varchar(20)
Table: Category_Values
id unsigned int autoincrement primary key
category_id int
generated_id int
category_value varchar (255) - value for the category
FOREIGN KEY `fk_cat` (category_id) REFERENCES Categories (id)
FOREIGN KEY `fk_gen` (generated_id) REFERENCES Generated (id)
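Spelled out as runnable DDL, the suggestion might look like this (a sketch; assumes MySQL 5.6+, where more than one TIMESTAMP column may use CURRENT_TIMESTAMP):

CREATE TABLE Generated (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    generated_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    generated_method ENUM('manual','auto'),
    original_string VARCHAR(255)
) ENGINE=InnoDB;

CREATE TABLE Categories (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    category_name VARCHAR(20)
) ENGINE=InnoDB;

CREATE TABLE Category_Values (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    category_id INT UNSIGNED NOT NULL,
    generated_id INT UNSIGNED NOT NULL,
    category_value VARCHAR(255),
    FOREIGN KEY fk_cat (category_id) REFERENCES Categories (id),
    FOREIGN KEY fk_gen (generated_id) REFERENCES Generated (id)
) ENGINE=InnoDB;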
Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html
I think this solution is perfect for what you want to do. The Categories list is now flexible, so you can add new categories or retire old ones (I would recommend thinking long and hard before deleting a category - would you orphan records or remove them too, etc.).
Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).
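As for the join you mentioned in the question: the two lookups can indeed be combined into one query, along these lines (a sketch using the question's column names; '$string' stands for your search value):

SELECT g.generated_id, g.original_string, c.category_name, cv.category_value
FROM Generated g
INNER JOIN Category_Values cv ON cv.generated_id = g.generated_id
INNER JOIN Categories c ON c.category_id = cv.category_id
WHERE g.original_string = '$string';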