Converting MySQL to a Sphinx Search Platform

I'm currently working on an in-house search engine for over 12 GB / month of MySQL data.
We currently have two tables, practice prescribing and practice information.
Both tables contain a practice number column, which links each practice's information to its prescribing data.
I'm trying to migrate the system from MySQL searching to Sphinx Search.
The issue I'm having is that the format of the practice number is STR:NUM:NUM.
Sphinx Search says that this is an invalid or NULL ID format and that an ID needs to be just NUM.
An example of our current IDs is YV0091, which will have corresponding data in both tables.
The IDs cannot be changed or manipulated because they are a standardised ID in our industry.
What should I do to get around this?

Well, the document ID itself, in effect Sphinx's 'primary key', does need to be a simple integer. But it doesn't need to match an actual column in your database. (A bit like InnoDB: if a table doesn't have an integer primary key, it will create a 'rowid' internally.)
Alas, Sphinx doesn't have an 'autoincrement'-style way of allocating the ID, so you need to contrive it yourself. For example, using a MySQL user variable...
sql_query_pre = SET @rowid:=1
sql_query = SELECT @rowid:=@rowid+1 as id, practice_id, name, ...
sql_attr_string = practice_id
... this also includes your practice ID as a string attribute. This means you can still get it in queries, e.g. rather than using SELECT id FROM ... in SphinxQL, you can just do SELECT practice_id FROM ... instead.
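To make the effect of the counter concrete, here is a minimal Python sketch of the rows an indexer would receive. It uses the standard-library sqlite3 module as a stand-in for MySQL, and the table and column names (practice_information, practice_id, name) plus the sample rows are assumptions for illustration, not the real schema.

```python
import sqlite3

def rows_for_indexer(conn):
    """Yield (doc_id, practice_id, name) with a synthetic integer doc_id.

    The counter plays the role of the MySQL user variable: it only exists
    to give the search engine an integer key. The real identifier travels
    alongside it as an ordinary attribute.
    """
    cur = conn.execute(
        "SELECT practice_id, name FROM practice_information ORDER BY practice_id"
    )
    for doc_id, (practice_id, name) in enumerate(cur, start=1):
        yield doc_id, practice_id, name

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE practice_information (practice_id TEXT PRIMARY KEY, name TEXT)"
)
conn.executemany(
    "INSERT INTO practice_information VALUES (?, ?)",
    [("YV0091", "Example Surgery"), ("AB1234", "Another Practice")],
)

rows = list(rows_for_indexer(conn))
for row in rows:
    print(row)
```

Because the counter depends on row order, the integer can change between reindexes, so it is safest to treat it as throwaway and always read practice_id back from the results.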

Related

Auto increment MySQL decimal number problems

I am building an application that will have one table of clients with an auto-increment id INT field. Then I have an HTML "case" form where the user has to choose a client from a dropdown, then add some info about the "case" that will go into another table.
That means that the clients will have ids of 1, 2, 3 and so on. I would like each case to add one decimal number to the id of the client chosen from the dropdown. So for client number 2: 2.1, 2.2 and so on. For client number 3: 3.1, 3.2 etc.
What is the best way to add that case field to SQL? I see that if I choose DECIMAL for the case id field, I get the number 3.4 as 3.400, because I have chosen DECIMAL(4,3) (MySQL) for testing. I need to have that many decimals because the number of cases can go into the hundreds, so I cannot trim that. I'm struggling with the MySQL field types and how to approach this problem.
I'd appreciate some guidance.
The only thing I can think of is to pass the value of a client, do id + "." + 1, and store it as DECIMAL(1,1) (MySQL). Will that auto-increment to 1.2 and so on?
The MySQL auto-increment mechanism only increments by whole integers. Sorry, that's the way it is implemented.
The best way to design your Case table in MySQL is this:
CREATE TABLE Cases (
case_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
client_id INT NOT NULL,
...other attributes of the case...
FOREIGN KEY (client_id) REFERENCES Client (client_id)
);
It will have one auto-increment counter for the table, and all clients will need to share this number. This means the case numbers won't always be consecutive for a given client, and they won't start at 1 for each client. Sorry, that's the way auto-increment works in MySQL.
The question has been asked many times with some variation of, "how can I make an auto-increment that renumbers for each group?" You could read the MAX(case_id) for the client you are inserting a case for, and then use that max case_id + 1 in your INSERT. In other words, forget about using the auto-increment feature, and calculate the id yourself.
You have to lock the table while doing this to avoid race conditions; two concurrent users could be inserting at the same time, and read the same value for MAX(case_id) and try to insert the same value.
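As a rough illustration of the "calculate it yourself under a lock" idea, here is a minimal Python sketch using sqlite3 as a stand-in for MySQL. The case_no column, the unique constraint, and the add_case helper are assumptions added for the example. SQLite's BEGIN IMMEDIATE takes the write lock here; in MySQL you would lock the table as described above.

```python
import sqlite3

def add_case(conn, client_id):
    """Insert a case numbered MAX(case_no) + 1 for this client.

    The read and the insert happen inside one write-locked transaction,
    so two writers cannot both read the same MAX value.
    """
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock up front
    try:
        (next_no,) = conn.execute(
            "SELECT COALESCE(MAX(case_no), 0) + 1 FROM cases WHERE client_id = ?",
            (client_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO cases (client_id, case_no) VALUES (?, ?)",
            (client_id, next_no),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    # The "2.1" style label is built for display only, never stored.
    return f"{client_id}.{next_no}"

# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    """CREATE TABLE cases (
           case_id   INTEGER PRIMARY KEY AUTOINCREMENT,
           client_id INTEGER NOT NULL,
           case_no   INTEGER NOT NULL,
           UNIQUE (client_id, case_no)  -- safety net against duplicates
       )"""
)

labels = [add_case(conn, 2), add_case(conn, 2), add_case(conn, 3)]
print(labels)
```

Keeping client_id and case_no in separate integer columns avoids the fixed-width problem: the display label is just the two values joined with a dot.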
Your plan of using decimal numbers will lead to problems.
What if one day you have a client with more than 999 cases? You'd have to reformat all your case id's, not only for the client with 1000 cases, but for all clients. Any references to the case id's that you had sent out in paper statements and reports would become invalid.
How would you do an SQL query to search for all cases for a given client? If you had client_id in its own column, it would be a query like SELECT ... FROM Cases WHERE client_id = 3, but if you have to do a query like ... WHERE case_id BETWEEN 3.000 AND 3.999, it's less clear and harder to optimize. It's also harder to explain to a new programmer you hire for the project. If you end up extending the id format to 4 digits past the decimal, you'd have to rewrite all these SQL queries.
Don't do it. This is the best piece of advice I can give you.
You are trying to use what was called "Intelligent Codes" back in the 80s.
They went out of fashion for a good reason: very expensive to develop, non-maintainable, limited ranges, you name it. Stay away from them and use normal foreign keys instead. They give you all the flexibility you'll need when the application grows.

Storing multiple values for a single variable (PDO)

I have a MySQL table that stores user phone numbers:
user_id | user_phonenumber
----------------------------
id1 | 555-123456789
I want to allow the user to store multiple phone numbers and I don't want to limit how many numbers a user can be associated with.
What's the best way of structuring my data, and how would a query work in PDO?
For example, should I store them all in the same field with comma separators and then parse the output when the query is returned, or should I use another table and have each row as a separate number with common user_ids? How would a lookup work then (please provide example code if possible)?
Thanks
Generally, an RDBMS is designed to access fields and rows. Everything gets much harder once you break the link between one field and one piece of data,
I mean when you start to store several values in a single field.
But you know your system's future. It may be that you will never have to access, for example, the first phone number on its own, and if you can handle the value everywhere as a blob then it can be fine to store several values in a single field.
Anyway, if this is not homework or a similarly short-lived task, then you should choose the one phone number per record approach.
I mean something like this can be future proof:
create table user_phonenumbers(
id integer auto_increment primary key,
user_id integer references user(id),
phonenumber varchar(32)
);
Yes, use another table to store user phone numbers.
Use an INNER JOIN on the user id to look them up.
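Since the question asked for example lookup code, here is a minimal sketch of the separate-table approach with an INNER JOIN. It is written in Python with sqlite3 as a stand-in for MySQL; the user table's name column, the sample data, and the phone_numbers_for helper are assumptions for illustration. The same parameterized SELECT could be passed to PDO's prepare() and execute() in PHP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_phonenumbers (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id     INTEGER NOT NULL REFERENCES user(id),
        phonenumber VARCHAR(32) NOT NULL
    );
    """
)
# Hypothetical sample data: one user with two numbers.
conn.execute("INSERT INTO user (id, name) VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO user_phonenumbers (user_id, phonenumber) VALUES (?, ?)",
    [(1, "555-123456789"), (1, "555-987654321")],
)

def phone_numbers_for(conn, user_id):
    """Return every phone number stored for one user, oldest first."""
    rows = conn.execute(
        """SELECT p.phonenumber
           FROM user u
           INNER JOIN user_phonenumbers p ON p.user_id = u.id
           WHERE u.id = ?
           ORDER BY p.id""",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]

numbers = phone_numbers_for(conn, 1)
print(numbers)
```

Each number is its own row, so adding or removing one is a plain INSERT or DELETE and there is no limit on how many a user can have.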

How to search either on id or name for certain purchase orders

We would like to filter purchase orders either based on purchase order id (primary key) or name of the purchase order using a single search box.
We used the LIKE operator to search on the name field, but it doesn't seem to work on the primary key. It only works when we use the equals operator for the id. But it would be preferable if we could filter purchase orders using LIKE on the id as well. How can we do this?
create table purchase_orders (
id int(11) primary key,
name varchar(255),
...
)
Option 1
SELECT *
FROM purchase_orders
WHERE id LIKE '%123%'; -- tribute to TemporaryNickName
This is horrible, performance-wise :)
Option 2a
Add a text column which receives a string version of id. Maybe add some triggers to populate it automatically.
Option 2b
Change the type of id column to CHAR or VARCHAR (I believe CHAR should be preferred for a primary key).
In both 2a. and 2b. cases, add an index (maybe a FULLTEXT one) to this column.
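For the single search box, Option 1 can be combined with the name search in one query. The sketch below is a minimal Python example using sqlite3 as a stand-in for MySQL; the sample rows and the search_orders helper are assumptions for illustration. It casts the id to text explicitly (CAST(id AS TEXT) in SQLite, where MySQL would use CAST(id AS CHAR)), and it has the same full-scan cost noted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase_orders (id INTEGER PRIMARY KEY, name VARCHAR(255))"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO purchase_orders VALUES (?, ?)",
    [(1123, "Office chairs"), (2045, "Order 123 reprint"), (3999, "Laptops")],
)

def search_orders(conn, term):
    """Match the term against the id (as text) or the name.

    Note: % and _ inside term are not escaped here, so they would act
    as wildcards.
    """
    pattern = f"%{term}%"
    return conn.execute(
        """SELECT id, name
           FROM purchase_orders
           WHERE CAST(id AS TEXT) LIKE ? OR name LIKE ?
           ORDER BY id""",
        (pattern, pattern),
    ).fetchall()

print(search_orders(conn, "123"))
print(search_orders(conn, "lap"))
```

The first search finds order 1123 through its id and order 2045 through its name, which is the behaviour a combined search box needs.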
I think LIKE should work. I assume that your SQL wasn't written correctly.
Let's assume that you have an order named "ABCDEF"; then you can find it using the following query structure.
SELECT id FROM purchase_orders WHERE name LIKE '%CD%';
To explain: the % sign is a wildcard, so this query selects any string that contains "CD".
According to the table structure, the name column can hold up to 255 characters. That is quite a large string, and a LIKE pattern with a leading wildcard cannot use an ordinary index, so it has to scan every row and gets slower as the table grows. You can always search by id with WHERE id = something, which is much faster, but I don't think an order id is user-friendly data; I would rather let users search by product name. My recommendation is to use Apache Lucene or MySQL's full-text search feature (which can improve search performance).
Apache lucene
MySQL Full text search function
These are tools built to search for a pattern or word across a large list of strings much faster. Many websites use them to build their own mini search engines. I found that MySQL's full-text search has pretty much no learning curve and is straightforward to use =D

Using Sphinx for the first time - configuring the sql_query key

I'm currently practicing with Sphinx. I haven't done much so far beyond the configuration for what I'm trying to do. The sql_query key leaves me somewhat confused about what to put there. I read the Sphinx documentation on sql_query, but it didn't clear things up, since I have many SELECTs in my web application, I want to use Sphinx for my search, and the SQL often changes (depending on the user's search filters).
My search currently uses MySQL and I want to integrate Sphinx into my web application. If sql_query is not optional, am I expected to put the whole search SQL query into that field, or do I pick out the necessary fields from the tables to build the index?
Can someone point me to the right direction so I can get things going well with Sphinx and my web application.
sql_query is mandatory. It is run by Sphinx to fetch the data you want indexed from MySQL. It can have joins, conditions etc., but it must be a valid SQL query. You should have something like "SELECT id, field1, field2, fieldx FROM table". The id must be a unique integer id. Each row returned by this query is considered a document (which is what Sphinx returns when you search).
If you have multiple tables that are very different in meaning (users, articles etc.), you need to create an index for each.
Read the tutorials here: http://sphinxsearch.com/info/articles/ to understand how Sphinx works.
You can create a sql query to get union set of records from the Database. If you do multiple table joining and query to select the best result set, you can do it with Sphinx too.
You may run into a few problems with your existing table structure in the database.
For example:
Base table does not have an integer primary key field
Create a new table which has two fields: one for the integer id and the other to hold the primary key of the base table. Do an inner join with that table and select the id field from it.
Eg. SELECT t1.id, t2.name, t2.description, t2.content FROM table_new t1 INNER JOIN table_2 t2 ON t1.document_id = t2.thread_id INNER JOIN REST_OF_YOUR_SELECT_QUERY
The t1.id is for the Sphinx search engine to do its internal indexing.
You filter data by placing a WHERE clause and filtering
You can do that in Sphinx by setting filters dynamically based on the conditions.
You select and join different tables to get results
This can also be done by setting up different sources and indexes based on your requirements.
Hope this helps you understand what you need to add and modify to start thinking about how the Sphinx search engine can be configured to your requirements. Just come here again if you need more help.
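The mapping-table idea from the first point can be sketched as follows. This is a minimal Python example using sqlite3 as a stand-in for MySQL; the table names (threads, sphinx_ids), the thread_key column, and both helper functions are hypothetical and only illustrate the shape of the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Base table keyed by a non-integer identifier (hypothetical).
    CREATE TABLE threads (thread_key TEXT PRIMARY KEY, name TEXT, content TEXT);
    -- Mapping table: one stable integer id per base-table key.
    CREATE TABLE sphinx_ids (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        thread_key TEXT NOT NULL UNIQUE REFERENCES threads(thread_key)
    );
    INSERT INTO threads VALUES ('T-alpha', 'First', 'hello');
    INSERT INTO threads VALUES ('T-beta', 'Second', 'world');
    """
)

def sync_ids(conn):
    """Give an integer id to every base row that does not have one yet."""
    conn.execute(
        """INSERT INTO sphinx_ids (thread_key)
           SELECT thread_key FROM threads
           WHERE thread_key NOT IN (SELECT thread_key FROM sphinx_ids)"""
    )

def indexer_rows(conn):
    """The kind of result set sql_query would hand to the indexer."""
    return conn.execute(
        """SELECT m.id, t.thread_key, t.name, t.content
           FROM sphinx_ids m
           INNER JOIN threads t ON t.thread_key = m.thread_key
           ORDER BY m.id"""
    ).fetchall()

sync_ids(conn)
first = indexer_rows(conn)

# A row added later gets a new id; existing ids are left untouched.
conn.execute("INSERT INTO threads VALUES ('T-aaa', 'Third', 'later')")
sync_ids(conn)
second = indexer_rows(conn)
print(second)
```

Unlike a counter computed at index time, ids stored in a mapping table stay the same from one reindex to the next.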

MySQL 5.5 Database design. Problem with friendly URLs approach

I have a maybe stupid question but I need to ask it :-)
My Friendly URL (furl) database design approach is fairly summarized in the following diagram (InnoDB at MySQL 5.5 used)
Each item will generate as many furls as languages available on the website. The furl_redirect table represents the controller path for each item. I show you an example:
item.id = 1000
item.title = 'Example title'
furl_redirect = 'item/1000'
furl.url = 'en/example-title-1000'
furl.url = 'es/example-title-1000'
furl.url = 'it/example-title-1000'
When you insert a new item, its furl_redirect and furls must also be inserted. The problem appears because of the (necessary) unique constraint in the furl table. As you see above, in order to get unique urls, I use the title of the item (which is not necessarily unique) + the id to create the unique url. That means the order of inserting rows should be as follows:
1. Insert item -- (and get the id of the new item inserted) ERROR!! furl_redirect_id must not be null!!
2. Insert furl_redirect -- (need the item id to create the path)
3. Insert furl -- (need the item id to create the url)
I would like an elegant solution to this problem, but I can not get it!
Is there a way of getting the next auto-increment value on an InnoDB table, and is it recommended to use it?
Can you think of another way to ensure the uniqueness of the friendly urls that is independent of the items' id? Am I missing something crucial?
Any solution is welcome!
Thanks!
You can get an auto-increment in InnoDB, see here. Whether you should use it or not depends on what kind of throughput you need and can achieve. Any auto-increment/identity type column, when used as a primary key, can create a "hot spot" which can limit performance.
Another option would be to use an alphanumeric ID, like bit.ly or other URL shorteners. The advantage of these is that you can have short IDs that use base 36 (a-z+0-9) instead of base 10. Why is this important? Because you can use a random number generator to pick a number out of a fairly big domain - 6 characters gets you 2 billion combinations. You convert the number to base 36, and then check to see if you already have this number assigned. If not, you have your new ID and off you go, otherwise generate a new random number. This helps to avoid hotspots if that turns out to be necessary for your system. Auto-increment is easier and I'd try that first to see if it works under the loads that you're anticipating.
You could also use the base 36 ID and the auto-increment together so that your friendly URLs are shorter, which is often the point.
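The random base 36 scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names are made up, and the in-memory set stands in for the "is this id already assigned?" check that would really be a database lookup backed by a unique index.

```python
import random

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # base 36: 0-9 then a-z

def to_base36(n):
    """Convert a non-negative integer to its base 36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def new_short_id(taken, length=6, rng=random):
    """Pick random ids until one is not already in `taken`, then claim it."""
    while True:
        # 36 ** 6 = 2,176,782,336 possible values for 6 characters.
        candidate = to_base36(rng.randrange(36 ** length)).rjust(length, "0")
        if candidate not in taken:
            taken.add(candidate)
            return candidate

taken = set()
ids = [new_short_id(taken) for _ in range(100)]
print(ids[:3])
```

With a real database, the check and the insert should be one step (insert and retry on a unique-key violation), otherwise two requests could claim the same id between the check and the write.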
You might consider other ways to approach your project.
First, you are using "en/", "de/" etc. for changing the language. May I ask how that works in the script? If you have different folders for different languages, your script and your users must suffer a lot. Try to use gettext or another localisation method (depending on the size of your project).
About the friendly URLs: my favorite method is to have only one extra column in the item's table. For example:
Table picture
id, path, title, alias, created
Values:
1, uploads/pics/mypicture.jpg, Great holidays, great-holidays, 2011-11-11 11:11:11
2, uploads/pics/anotherpic.jpg, Great holidays, great-holidays-1, 2011-12-12 12:12:12
Now in the script, while inserting the item, create the alias from the title and check whether the alias already exists. If it does, you can append the id, a random number, or a count (depending on how many identical titles you already have).
Once you store the alias like this, it's very simple. The user tries to access
http://www.mywebsite.com/picture/great-holidays
So in your script you just see that the user wants a picture, and specifically the picture with alias great-holidays. Find it in the DB and show it.
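The alias step can be sketched like this in Python. It is a minimal example under stated assumptions: slugify and unique_alias are made-up helper names, the slug rule (lowercase, non-alphanumeric runs become a hyphen) is one possible choice, and the set stands in for a lookup against the alias column.

```python
import re

def slugify(title):
    """Lowercase the title and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def unique_alias(title, existing):
    """Return an alias not yet in `existing`, adding -1, -2, ... if needed."""
    base = slugify(title)
    alias = base
    suffix = 0
    while alias in existing:
        suffix += 1
        alias = f"{base}-{suffix}"
    existing.add(alias)
    return alias

existing = set()
print(unique_alias("Great holidays", existing))
print(unique_alias("Great holidays", existing))
```

In a real table, a unique index on the alias column would still be needed so that two concurrent inserts cannot end up with the same alias.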