I have my database segmented into 8 parts, and each part contains a database with a table user_data. For better search performance I'm using Sphinx to index all of that data, but I've come across a problem. The table user_data has no unique field to identify each row (it is a one-to-many table), so I can't run my Sphinx index correctly: Sphinx requires a unique id, and this setup results in duplicate ids. Any idea how I can work around this, or how to generate an id that is unique across all sub-indexes from the different segments?
Example:
SELECT user_id, item_id, info
FROM user_data
Which returns something like:
+---------+---------+-------+
| user_id | item_id | info  |
+---------+---------+-------+
| 10      | 151     | asdf  |
| 10      | 152     | test  |
| 11      | 151     | 545   |
| 12      | 151     | sdfsd |
| 12      | 152     | eewwe |
| 12      | 153     | dfsd  |
+---------+---------+-------+
But I need to get:
+---------+---------+-------+----+
| user_id | item_id | info  | id |
+---------+---------+-------+----+
| 10      | 151     | asdf  | 1  |
| 10      | 152     | test  | 2  |
| 11      | 151     | 545   | 3  |
| 12      | 151     | sdfsd | 4  |
| 12      | 152     | eewwe | 5  |
| 12      | 153     | dfsd  | 6  |
+---------+---------+-------+----+
Of course, the id must be unique across all segments.
First, use a pre-query to initialize a user variable:
sql_query_pre = SET @a := 1;
Then use this variable to generate a fictional auto-increment id:
sql_query = SELECT @a := @a + 1 AS id, user_id, item_id, info FROM user_data
I'm unfamiliar with Sphinx, but if you're looking to create ids that are unique across tables, in your case:
One option is to use a UUID as a unique index on all the tables -- the chances of them colliding are minute.
Another option is, if you know the max size of a table, to only use numbers in that range plus an offset. E.g., Table 1's ids: 1 - 10000, Table 2's ids: 10001 - 20000, etc. You can even set the id fields to be AUTO_INCREMENT and set their start numbers at the beginning of the particular range.
You could do something like this while indexing:
SELECT (user_id + 10) * 1 AS id, 1 AS segment_id, item_id, info FROM user_data_1
... adding a segment_id. You would have eight of these, so the indexing query would look something like:
SELECT (user_id + 10) * 1 AS id, 1 AS segment_id, item_id, info FROM user_data_1
UNION
SELECT (user_id + 10) * 2 AS id, 2 AS segment_id, item_id, info FROM user_data_2
UNION
SELECT (user_id + 10) * 3 AS id, 3 AS segment_id, item_id, info FROM user_data_3
UNION
SELECT (user_id + 10) * 4 AS id, 4 AS segment_id, item_id, info FROM user_data_4
UNION
SELECT (user_id + 10) * 5 AS id, 5 AS segment_id, item_id, info FROM user_data_5
UNION
SELECT (user_id + 10) * 6 AS id, 6 AS segment_id, item_id, info FROM user_data_6
UNION
SELECT (user_id + 10) * 7 AS id, 7 AS segment_id, item_id, info FROM user_data_7
UNION
SELECT (user_id + 10) * 8 AS id, 8 AS segment_id, item_id, info FROM user_data_8
Then when you query sphinx and get back the IDs, just undo the arithmetic by dividing the id by segment_id and subtracting 10. This way all the ids will be unique within sphinx. Just make sure the attribute type can handle the size of the ids you'll be indexing.
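One caveat worth checking before relying on this arithmetic: multiplying by the segment number is not guaranteed to be collision-free, since (10 + 10) * 2 and (30 + 10) * 1 both give 40. A minimal sketch of an alternative packing follows, assuming each row has some per-segment integer (called local_id here, a hypothetical name) that is unique inside its own segment. The equivalent SQL expression would be local_id * 8 + (segment_id - 1).

```python
SEGMENTS = 8  # number of segments, taken from the question


def encode(local_id: int, segment_id: int) -> int:
    """Pack a per-segment id and a 1-based segment number into one integer."""
    if not 1 <= segment_id <= SEGMENTS:
        raise ValueError("segment_id out of range")
    return local_id * SEGMENTS + (segment_id - 1)


def decode(doc_id: int) -> tuple[int, int]:
    """Recover (local_id, segment_id) from a packed document id."""
    local_id, offset = divmod(doc_id, SEGMENTS)
    return local_id, offset + 1


# The multiplicative form can map two different rows to the same id.
assert (10 + 10) * 2 == (30 + 10) * 1

# The packed form round-trips for every segment.
for seg in range(1, SEGMENTS + 1):
    assert decode(encode(12, seg)) == (12, seg)
```

Because the segment number lives in the remainder, no two (local_id, segment_id) pairs share an id, and decoding needs only the id itself.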
Another answer proposes using a UUID, but Sphinx cannot use a UUID as the document id; you need an INT. Therefore use UUID_SHORT(), which gives you a unique integer (in MySQL). If this does not work out of the box (e.g. if you are using Ubuntu 11.04), you will get an error like this:
WARNING: DOCID_MAX document_id, skipping
You will need to compile the Sphinx source with --enable-id64, or just go to the Sphinx website and get an up-to-date package (which is compiled with --enable-id64). A more complete example of this indexing method is given in this blog entry.
We are using crc32(uuid_short()) for 32-bit implementations of Sphinx. This works most of the time! Of course, one cannot rely upon a 32 bit digest of a
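To put a rough number on "most of the time", here is a small sketch using the standard birthday-bound approximation. It assumes the 32-bit digests behave like uniformly random values, which is an assumption made for illustration and not a measured property of crc32(uuid_short()).

```python
import math


def collision_probability(n_rows: int, bits: int = 32) -> float:
    """Birthday-bound approximation: 1 - exp(-n^2 / (2 * 2^bits))."""
    space = 2 ** bits
    return 1.0 - math.exp(-(n_rows ** 2) / (2 * space))


# Approximate chance that at least two rows share a 32-bit id.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} rows -> {collision_probability(n):.4f}")
```

Under that assumption the chance of at least one duplicate id reaches about one half at roughly 77,000 rows, so the 32-bit shortcut looks reasonable only for small indexes.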
Related
I have a MySQL table creatures:
id | name | base_hp | quantity
--------------------------------
1 | goblin | 5 | 2
2 | elf | 10 | 1
And I want to create creatures_instances based on it:
id | name | actual_hp
------------------------
1 | goblin | 5
2 | goblin | 5
3 | elf | 10
The ids of creatures_instances are not important and are not related to creatures.id.
How can I do this with just MySQL in the most efficient way (in terms of execution time)? A single query would be best, but a procedure is OK too. I use InnoDB.
I know that with the help of e.g. PHP I could:
select each row separately,
make a for($i=0; $i<$line->quantity; $i++) loop in which I insert one row into creatures_instances on each iteration.
The most efficient way is to do everything in SQL. It helps if you have a numbers table. Without one, you can generate the numbers in a subquery. The following works up to 4 copies:
insert into creatures_instances(name, actual_hp)
select c.name, c.base_hp  -- id is omitted; this assumes creatures_instances.id is AUTO_INCREMENT
from creatures c join
(select 1 as n union all select 2 union all select 3 union all select 4
) n
on n.n <= c.quantity;
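The numbers-table join can be tried out quickly with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for MySQL, and the instance id is left to an auto-generated key, which is an assumption about how creatures_instances is defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE creatures (
        id INTEGER PRIMARY KEY, name TEXT, base_hp INTEGER, quantity INTEGER);
    CREATE TABLE creatures_instances (
        id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, actual_hp INTEGER);
    INSERT INTO creatures VALUES (1, 'goblin', 5, 2), (2, 'elf', 10, 1);
""")

# Each creature row matches as many rows of the inline numbers table
# as its quantity, so the join emits one output row per instance.
conn.execute("""
    INSERT INTO creatures_instances (name, actual_hp)
    SELECT c.name, c.base_hp
    FROM creatures c
    JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) n
      ON n.n <= c.quantity
    ORDER BY c.id
""")

rows = conn.execute(
    "SELECT id, name, actual_hp FROM creatures_instances ORDER BY id"
).fetchall()
print(rows)
```

The inline numbers table only goes up to 4, so a larger quantity would be silently capped; a real numbers table avoids that limit.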
Background
I have a web application which must remove entries from other tables, filtered through a selection of 'tielists': table 1 -> item_table 1, table 2, table 3, and so on. My result set is going to be huge unless I filter it by a user_id from another table. Can someone please help me structure my statement as needed? TY!
Tables
cars_belonging_to_user
-----------------------------
ID | user_id | make | model
----------------------------
1 | 1 | Toyota | Camry
2 | 1 |Infinity| Q55
3 | 1 | DMC | DeLorean
4 | 2 | Acura | RSX
Okay, Now the three 'tielists'
name:tielist_one
----------------------------
id | id_of_car | id_x | id_y|
1 | 1 | 12 | 22 |
2 | 2 | 23 | 32 |
-----------------------------
name:tielist_two
-------------------------------
id | id_of_car | id_x | id_z|
1 | 3 | 32 | 22 |
-----------------------------
name: tielist_three
id | id_of_car | id_x | id_a|
1 | 4 | 45 | 2 |
------------------------------
Result Set and Code
echo name_of_tielist_table
// I can structure if statements to echo result sets based upon the name
// Future Methodology: if car_id is in tielist_one, delete id_x from x_table, delete id_y from y_table...
// My output should be a double select base:
--SELECT * tielists from WHERE car_id is 1... output name of tielist... then
--SELECT * from specific_tielist where car_id is 1.....delete x_table, delete y_table...
Considering the list will be massive, and the tielist equally long, I must filter the results where car_id(id) = $variable && user_id = $id....
Side Notes
A given car id appears only once, in exactly one tielist.
This select statement MUST be filtered with user_id = $variable (and remember, I'm looking for a specific car id too).
I MUST be able to echo the NAME of the tielist the row comes from into a variable.
I will only be looking for one single id_of_car at any given time, because this select will be contained in a foreach loop.
I was thinking a UNION ALL would do the trick to select the row, but how can I get the name of the tielist the row is in, and how can I filter by user_id?
If you want performance, I would suggest left outer join instead of union all. This will allow the query to make efficient use of indexes for your purpose.
Based on what you say, a car is in exactly one of the lists. This is important for this method to work. Here is the SQL:
select cu.*,
coalesce(tl1.id_x, tl2.id_x, tl3.id_x) as id_x,
tl1.id_y, tl2.id_z, tl3.id_a,
(case when tl1.id is not null then 'One'
when tl2.id is not null then 'Two'
when tl3.id is not null then 'Three'
end) as TieList
from Cars_Belonging_To_User cu left outer join
TieList_One tl1
on cu.id = tl1.id_of_car left outer join
TieList_Two tl2
on cu.id = tl2.id_of_car left outer join
TieList_Three tl3
on cu.id = tl3.id_of_car;
You can then add a where clause to filter as you need.
If you have an index on id_of_car for each tielist table, then the performance should be quite good. If the where clause uses an index on the first table, then the joins and where should all be using indexes, and the query will be quite fast.
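The left-join approach, with the user and car filter added, can be sketched against the sample rows from the question. SQLite (through Python's sqlite3) stands in for MySQL here, and find_tielist is a hypothetical helper name; the user id and car id are bound as parameters instead of being interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars_belonging_to_user (id INTEGER PRIMARY KEY, user_id INTEGER, make TEXT, model TEXT);
    CREATE TABLE tielist_one   (id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_y INTEGER);
    CREATE TABLE tielist_two   (id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_z INTEGER);
    CREATE TABLE tielist_three (id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_a INTEGER);
    INSERT INTO cars_belonging_to_user VALUES
        (1, 1, 'Toyota', 'Camry'), (2, 1, 'Infinity', 'Q55'),
        (3, 1, 'DMC', 'DeLorean'), (4, 2, 'Acura', 'RSX');
    INSERT INTO tielist_one   VALUES (1, 1, 12, 22), (2, 2, 23, 32);
    INSERT INTO tielist_two   VALUES (1, 3, 32, 22);
    INSERT INTO tielist_three VALUES (1, 4, 45, 2);
""")

QUERY = """
    SELECT cu.id,
           COALESCE(tl1.id_x, tl2.id_x, tl3.id_x) AS id_x,
           CASE WHEN tl1.id IS NOT NULL THEN 'tielist_one'
                WHEN tl2.id IS NOT NULL THEN 'tielist_two'
                WHEN tl3.id IS NOT NULL THEN 'tielist_three'
           END AS tielist
    FROM cars_belonging_to_user cu
    LEFT OUTER JOIN tielist_one   tl1 ON cu.id = tl1.id_of_car
    LEFT OUTER JOIN tielist_two   tl2 ON cu.id = tl2.id_of_car
    LEFT OUTER JOIN tielist_three tl3 ON cu.id = tl3.id_of_car
    WHERE cu.user_id = ? AND cu.id = ?
"""


def find_tielist(user_id: int, car_id: int):
    """Return (car id, id_x, tielist name), or None if the car is not the user's."""
    return conn.execute(QUERY, (user_id, car_id)).fetchone()


print(find_tielist(1, 3))
```

The CASE expression returns the table name directly, so the calling code can branch on it to decide which dependent tables to delete from.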
In my MySQL db I have a table with 3 columns (name, views, id). I need to get rows ordered by views. I'm getting something like this.
Query:
select *
from table
order by views desc
Result:
id | name | views
------------------------
7 | xxxx | 9000
2 | yyyy | 8000
1 | aaaa | 7000
4 | bbbb | 6000
8 | dddd | 5000
6 | cccc | 4000
5 | oooo | 3000
3 | tttt | 2000
What I need to do is get rows ordered by views, but starting with a specific ID. Is that possible? The only input I have is the ID. Let's say the ID is 6; this should be the output:
id | name | views
------------------------
6 | cccc | 4000
5 | oooo | 3000
3 | tttt | 2000
I can't use LIMIT as I don't know the current position. I just need the rows that remain, starting with that ID.
What I'm trying to build is infinite scroll: I request the next elements based on the last element that was displayed. The only tricky part is that I'm ordering by the views column.
select * from table
where (views = 4000 and id>6) or (views < 4000)
order by views desc, id asc;
The tricky part is that you have to know (select) the views of the element with ID 6; also you need to use the ID as secondary sort criteria in order to get consistent results.
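Here is a minimal sketch of that idea that looks up the views of the last displayed row inside the same query by joining the table to itself. It runs on SQLite via Python's sqlite3 as a stand-in for MySQL; the table name items and the helper next_page are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(7, "xxxx", 9000), (2, "yyyy", 8000), (1, "aaaa", 7000), (4, "bbbb", 6000),
     (8, "dddd", 5000), (6, "cccc", 4000), (5, "oooo", 3000), (3, "tttt", 2000)],
)


def next_page(last_id: int, page_size: int = 2):
    """Rows that sort after row `last_id` under ORDER BY views DESC, id ASC."""
    return conn.execute(
        """
        SELECT i.id, i.name, i.views
        FROM items i
        JOIN items anchor ON anchor.id = ?
        WHERE i.views < anchor.views
           OR (i.views = anchor.views AND i.id > anchor.id)
        ORDER BY i.views DESC, i.id ASC
        LIMIT ?
        """,
        (last_id, page_size),
    ).fetchall()


print(next_page(6))
```

The secondary comparison on id is what keeps paging stable when several rows have the same views value.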
Actually this is a common case of since/until-style paging:
SELECT * FROM table
WHERE views <= (SELECT views FROM table WHERE id = 6)
ORDER BY views DESC
I have a table which will hold about 2 to 5 million rows on average. It has a primary key/index called 'instruction_id' and another indexed field called 'mode'. 'instruction_id' is of course unique since it is the primary key, but 'mode' will only be one of 3 different values. The query I run all the time is
SELECT * FROM tablename WHERE mode = 'value1' ORDER BY instruction_id LIMIT 50
This currently takes about 25 sec ( > 1 sec is unacceptably long) but there are only 600K rows right now so it will get worse as the table grows. Would indexing in a different way help? If I index instruction_id and mode together will that make a difference? If I somehow am able to naturally order the table by instruction_id so I don't have to ask for the order by would be another way around this but I don't know how to do that... Any help would be great.
You should try index on (mode, instruction_id), in that order.
The reasoning behind that index is that it creates an index like this
mode instruction_id
A 1
A 3
A 4
A 5
A 10
A 11
B 2
B 8
B 12
B 13
B 14
C 6
C 7
C 9
C 15
C 16
C 17
If you search for mode B the sql server can search the index with a binary search on mode until it finds the first B, then it can simply output the next n rows. This would be really fast, about 22 compares for 4M rows.
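The seek-then-scan behaviour can be imitated with a sorted Python list and the bisect module. This is only an illustration of the idea: a real InnoDB index is a B-tree and not a flat array, and first_n is a hypothetical helper name.

```python
import bisect
import math

# Index entries sorted by (mode, instruction_id), as in the listing above.
index = [("A", 1), ("A", 3), ("A", 4), ("A", 5), ("A", 10), ("A", 11),
         ("B", 2), ("B", 8), ("B", 12), ("B", 13), ("B", 14),
         ("C", 6), ("C", 7), ("C", 9), ("C", 15), ("C", 16), ("C", 17)]


def first_n(mode: str, n: int) -> list[int]:
    """Binary-search to the first entry for `mode`, then read entries in order."""
    # (mode,) sorts before every (mode, instruction_id) pair with the same mode.
    start = bisect.bisect_left(index, (mode,))
    out = []
    for m, instruction_id in index[start:start + n]:
        if m != mode:
            break
        out.append(instruction_id)
    return out


print(first_n("B", 3))
# Number of halving steps a binary search needs over 4 million entries.
print(math.ceil(math.log2(4_000_000)))
```

Because the entries for one mode are already stored in instruction_id order, no separate sort step is needed to satisfy ORDER BY ... LIMIT.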
Always use ORDER BY if you expect the result to be ordered, regardless of how the data is stored. The query engine might choose a query plan that outputs the rows in a different order than the order of the PK (maybe not in such simple cases as this, but in general).
You should check out the following links relating to innodb clustered indexes
http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
MySQL and NoSQL: Help me to choose the right one
Then build your schema something along the lines of:
drop table if exists instruction_modes;
create table instruction_modes
(
mode_id smallint unsigned not null,
instruction_id int unsigned not null,
primary key (mode_id, instruction_id), -- note the clustered composite PK order !
unique key (instruction_id)
)
engine = innodb;
Cold (mysql restarted) runtime performance as follows:
select count(*) from instruction_modes;
+----------+
| count(*) |
+----------+
| 6000000 |
+----------+
1 row in set (2.54 sec)
select distinct mode_id from instruction_modes;
+---------+
| mode_id |
+---------+
| 1 |
| 2 |
| 3 |
+---------+
3 rows in set (0.06 sec)
select * from instruction_modes where mode_id = 2 order by instruction_id limit 10;
+---------+----------------+
| mode_id | instruction_id |
+---------+----------------+
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 9 |
| 2 | 14 |
| 2 | 25 |
| 2 | 28 |
| 2 | 32 |
+---------+----------------+
10 rows in set (0.04 sec)
0.04 seconds cold seems pretty performant.
Hope this helps :)
Here is one possible solution:
ALTER TABLE `tablename` ADD UNIQUE (`mode`, instruction_id);
Then:
SELECT A.* FROM tablename A JOIN (
SELECT instruction_id FROM tablename
WHERE mode = 'value1'
ORDER BY instruction_id LIMIT 50
) B
ON (A.instruction_id = B.instruction_id);
I have found that for large tables this approach works well for speed, as the subquery should only be using the index.
I use a similar query on a table with >100mil records and it returns results in 1-2 seconds.
Is 'mode' a character field? If it's only ever going to hold 3 possible values, it sounds like you should make it an enum field, which will still return you the text string but is stored internally as a number.
You should also follow Albin's advice on indexing, which will benefit you further.
I have a "changesets" table which has a comments column where people enter references to bug issues in the format "Fixed issue #2345 - ......", but can also be "Fixed issues #456, #2956, #12345 ...."
What's the best way to select these reference numbers so I can access the issues via a join?
Given this changesets table:
id comments
===========================
1 fixed issue #234 ....
2 DES - #789, #7895, #123
3 closed ticket #129
I'd like results like this:
changeset_id issue_id
=====================
1 234
2 789
2 7895
2 123
3 129
I've used a substring_index(substring_index(comments,'#',-1),' ',1) type construct, but that only returns a single reference per row.
I'm also looking for the most efficient way to do this text lookup.
Any help appreciated
Thanks
Here's one (bloated/messy) approach on how to get the desired dataset...
Step 1 - figure out what the maximum # of issue ids is
SELECT MAX(LENGTH(comments)- LENGTH(REPLACE(comments,'#',''))) AS max_issues
FROM change_sets
Step 2 - recursively create a UNION'd query with a number of "levels" equal to the maximum number of issue ids. For your example,
SELECT changeset_id, issue_id FROM
(
SELECT id AS changeset_id, CAST(SUBSTRING_INDEX(comments,'#',-1) AS UNSIGNED) AS issue_id FROM change_sets
UNION
SELECT id AS changeset_id, CAST(SUBSTRING_INDEX(comments,'#',-2) AS UNSIGNED) AS issue_id FROM change_sets
UNION
SELECT id AS changeset_id, CAST(SUBSTRING_INDEX(comments,'#',-3) AS UNSIGNED) AS issue_id FROM change_sets
) a
HAVING issue_id!=0
ORDER BY changeset_id, issue_id
I'm taking advantage of UNION's ability to remove duplicate rows, and CAST's ability to use the leading numeric characters when determining the integer.
The result using your toy dataset:
+--------------+----------+
| changeset_id | issue_id |
+--------------+----------+
| 1 | 234 |
| 2 | 123 |
| 2 | 789 |
| 2 | 7895 |
| 3 | 129 |
+--------------+----------+
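If doing the extraction in application code is an option, the same pairs can be pulled out with a regular expression. This is a different technique from the UNION query above, sketched in Python on the toy dataset; extract_issues is a hypothetical helper, and the resulting pairs could be written to a link table that is then joined to the issues.

```python
import re

# One or more digits directly after a '#'.
ISSUE_RE = re.compile(r"#(\d+)")


def extract_issues(changesets):
    """Yield (changeset_id, issue_id) for every '#<digits>' found in a comment."""
    for changeset_id, comment in changesets:
        for match in ISSUE_RE.finditer(comment):
            yield changeset_id, int(match.group(1))


rows = [
    (1, "fixed issue #234 ...."),
    (2, "DES - #789, #7895, #123"),
    (3, "closed ticket #129"),
]
pairs = list(extract_issues(rows))
print(pairs)
```

Unlike the fixed number of UNION levels, this handles any number of references per comment without first computing the maximum.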