MySQL string manipulation, selecting items from text

I have a "changesets" table with a comments column where people enter references to bug issues in the format "Fixed issue #2345 - ......", but it can also be "Fixed issues #456, #2956, #12345 ...."
What's the best way to select these reference numbers so I can access the issues via a join?
Given this change sets table:
id comments
===========================
1 fixed issue #234 ....
2 DES - #789, #7895, #123
3 closed ticket #129
I'd like results like this:
changeset_id issue_id
=====================
1 234
2 789
2 7895
2 123
3 129
I've used a substring_index(substring_index(comments,'#',-1),' ',1) type of construct, but that only returns a single reference per row.
I'm also looking for the most efficient way to do this text lookup.
Any help appreciated.
Thanks

Here's one (bloated/messy) approach to getting the desired dataset...
Step 1 - figure out the maximum number of issue ids in any one comment:
SELECT MAX(LENGTH(comments)- LENGTH(REPLACE(comments,'#',''))) AS max_issues
FROM change_sets
Step 2 - build a UNION'd query with a number of "levels" equal to the maximum number of issue ids. For your example, that is three:
SELECT changeset_id, issue_id FROM
(
SELECT id AS changeset_id, CAST(SUBSTRING_INDEX(comments,'#',-1) AS UNSIGNED) AS issue_id FROM change_sets
UNION
SELECT id AS changeset_id, CAST(SUBSTRING_INDEX(comments,'#',-2) AS UNSIGNED) AS issue_id FROM change_sets
UNION
SELECT id AS changeset_id, CAST(SUBSTRING_INDEX(comments,'#',-3) AS UNSIGNED) AS issue_id FROM change_sets
) a
HAVING issue_id!=0
ORDER BY changeset_id, issue_id
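I'm taking advantage of UNION's ability to remove duplicate rows, and of CAST's behaviour of using only the leading numeric characters when converting a string to an integer (a string with no leading digits becomes 0, which the HAVING clause filters out).
The result using your toy dataset: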
+--------------+----------+
| changeset_id | issue_id |
+--------------+----------+
| 1 | 234 |
| 2 | 123 |
| 2 | 789 |
| 2 | 7895 |
| 3 | 129 |
+--------------+----------+
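To see why the UNION of SUBSTRING_INDEX levels picks up every reference, the logic can be mimicked outside the database. The following is a minimal Python sketch, not MySQL itself: substring_index and cast_unsigned are hypothetical helpers that only approximate SUBSTRING_INDEX and CAST(... AS UNSIGNED) for this kind of input, and a set stands in for UNION's duplicate removal.

```python
import re

def substring_index(text, delim, count):
    """Approximate MySQL SUBSTRING_INDEX: keep `count` delimited parts,
    counted from the left (count > 0) or from the right (count < 0)."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""

def cast_unsigned(text):
    """Approximate CAST(text AS UNSIGNED): use leading digits only, else 0."""
    match = re.match(r"\s*(\d+)", text)
    return int(match.group(1)) if match else 0

# The toy dataset from the question.
change_sets = [
    (1, "fixed issue #234 ...."),
    (2, "DES - #789, #7895, #123"),
    (3, "closed ticket #129"),
]

# Step 1: the largest number of '#' characters in any comment.
max_issues = max(comment.count("#") for _, comment in change_sets)

# Step 2: one "level" per possible issue; the set plays the role of UNION.
pairs = set()
for level in range(1, max_issues + 1):
    for changeset_id, comment in change_sets:
        issue_id = cast_unsigned(substring_index(comment, "#", -level))
        if issue_id != 0:  # same filter as HAVING issue_id != 0
            pairs.add((changeset_id, issue_id))

result = sorted(pairs)
print(result)
```

For changeset 2, level -1 yields "123", level -2 yields "7895, #123" and level -3 yields "789, #7895, #123", so each level contributes one new leading number. Rows with fewer references return the whole comment at deeper levels, which casts to 0 and is dropped.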

Related

Skipping row for each unique column value

I have a table from which I would like to extract all of the column values for all rows. However, the query needs to be able to skip the first entry for each unique value of id_customer. It can be assumed that there will always be at least two rows containing the same id_customer.
I've compiled some sample data which can be found here: http://sqlfiddle.com/#!9/c85b73/1
The results I would like to achieve are something like this:
id_customer | id_cart | date
----------- | ------- | -------------------
1 | 102 | 2017-11-12 12:41:16
2 | 104 | 2015-09-04 17:23:54
2 | 105 | 2014-06-05 02:43:42
3 | 107 | 2011-12-01 11:32:21
Please let me know if any more information or a better explanation is required; I expect it's quite a niche solution.
One method is:
select c.*
from carts c
where c.date > (select min(c2.date) from carts c2 where c2.id_customer = c.id_customer);
If your data is large, you want an index on carts(id_customer, date).
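The query above can be tried out with a small script. This is a sketch only: it uses Python's sqlite3 module as a stand-in for MySQL, and because the linked sample data is not reproduced here, the earliest row for each customer (carts 101, 103 and 106) is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carts (id_customer INTEGER, id_cart INTEGER, date TEXT)")
# The suggested index, so the correlated subquery can find MIN(date) cheaply.
conn.execute("CREATE INDEX idx_carts_customer_date ON carts (id_customer, date)")

conn.executemany(
    "INSERT INTO carts VALUES (?, ?, ?)",
    [
        (1, 101, "2016-01-01 00:00:00"),  # invented first entry
        (1, 102, "2017-11-12 12:41:16"),
        (2, 103, "2013-01-01 00:00:00"),  # invented first entry
        (2, 104, "2015-09-04 17:23:54"),
        (2, 105, "2014-06-05 02:43:42"),
        (3, 106, "2010-01-01 00:00:00"),  # invented first entry
        (3, 107, "2011-12-01 11:32:21"),
    ],
)

rows = conn.execute(
    """
    SELECT c.id_customer, c.id_cart, c.date
    FROM carts c
    WHERE c.date > (SELECT MIN(c2.date)
                    FROM carts c2
                    WHERE c2.id_customer = c.id_customer)
    ORDER BY c.id_cart
    """
).fetchall()

kept_carts = [id_cart for _, id_cart, _ in rows]
print(kept_carts)
```

Note that if a customer has two rows sharing the same earliest date, both are skipped, because the comparison is a strict greater-than.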

INSERT data from one table INTO another with the copies (as many as `quantity` field in first table says)

I have a MySQL table creatures:
id | name | base_hp | quantity
--------------------------------
1 | goblin | 5 | 2
2 | elf | 10 | 1
And I want to create creature_instances based on it:
id | name | actual_hp
------------------------
1 | goblin | 5
2 | goblin | 5
3 | elf | 10
The ids of creatures_instances are not important and do not need to relate to the ids in creatures.
How can I do this with just MySQL in the most optimal way (in terms of execution time)? A single query would be best, but a procedure is OK too. I use InnoDB.
I know that with the help of e.g. PHP I could:
select each row separately,
make a for($i=0; $i<$line->quantity; $i++) loop in which I insert one row into creatures_instances on each iteration.
The most efficient way is to do everything in SQL. It helps if you have a numbers table. Without one, you can generate the numbers in a subquery. The following works up to 4 copies; id is left out of the insert so that each copy gets its own id, assuming creatures_instances.id is AUTO_INCREMENT:
insert into creatures_instances(name, actual_hp)
select c.name, c.base_hp
from creatures c join
(select 1 as n union all select 2 union all select 3 union all select 4
) n
on n.n <= c.quantity;
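A runnable sketch of the numbers-subquery join, using Python's sqlite3 module as a stand-in for MySQL. The table definitions are assumptions made for illustration, including an auto-generated id on creatures_instances, which the question does not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE creatures (
        id INTEGER PRIMARY KEY, name TEXT, base_hp INTEGER, quantity INTEGER
    );
    -- Assumed schema: the id is generated automatically for each instance.
    CREATE TABLE creatures_instances (
        id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, actual_hp INTEGER
    );
    INSERT INTO creatures VALUES (1, 'goblin', 5, 2), (2, 'elf', 10, 1);
    """
)

# Each creature row joins to as many number rows as its quantity,
# so the SELECT emits one row per copy (up to 4 copies here).
conn.execute(
    """
    INSERT INTO creatures_instances (name, actual_hp)
    SELECT c.name, c.base_hp
    FROM creatures c
    JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) AS nums
      ON nums.n <= c.quantity
    """
)
conn.commit()

instances = conn.execute(
    "SELECT name, actual_hp FROM creatures_instances ORDER BY name DESC, id"
).fetchall()
print(instances)
```

The goblin row matches n = 1 and n = 2, giving two copies, while the elf matches only n = 1. A quantity above 4 would be silently capped at 4, so the numbers subquery has to cover the largest quantity in the table.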

SELECT from Union x 3 using filter of another table

Background
I have a web application which must remove entries from other tables, found through a set of 'tielists' (table 1 -> item_table 1, table 2, table 3, ...). My result set is going to be huge unless I filter it using a user_id from another table. Can someone please help me structure my statement as needed? Thanks!
Tables
cars_belonging_to_user
-----------------------------
ID | user_id | make | model
----------------------------
1 | 1 | Toyota | Camry
2 | 1 |Infinity| Q55
3 | 1 | DMC | DeLorean
4 | 2 | Acura | RSX
Okay, Now the three 'tielists'
name:tielist_one
----------------------------
id | id_of_car | id_x | id_y|
1 | 1 | 12 | 22 |
2 | 2 | 23 | 32 |
-----------------------------
name:tielist_two
-------------------------------
id | id_of_car | id_x | id_z|
1 | 3 | 32 | 22 |
-----------------------------
name: tielist_three
id | id_of_car | id_x | id_a|
1 | 4 | 45 | 2 |
------------------------------
Result Set and Code
echo name_of_tielist_table
// I can structure if statements to echo result sets based upon the name
// Future Methodology: if car_id is in tielist_one, delete id_x from x_table, delete id_y from y_table...
// My output should be a double select base:
--SELECT * from tielists WHERE car_id is 1... output name of tielist... then
--SELECT * from specific_tielist where car_id is 1.....delete x_table, delete y_table...
Considering the list will be massive, and the tielist equally long, I must filter the results where car_id(id) = $variable && user_id = $id....
Side Notes
Only one car id will appear once in any single tielist..
This select statement MUST be filtered with user_id = $variable... (and remember, i'm looking for which car id too)
I MUST HAVE THE NAME of the tielist it comes from able to be echo'd into a variable...
I will only be looking for one single id_of_car at any given time, because this select will be contained in a foreach loop.
I was thinking a UNION ALL of the tielists would do the trick to select the row, but how can I get the name of the tielist the row is in, and how can I apply the filter on the user_id column?
If you want performance, I would suggest left outer join instead of union all. This will allow the query to make efficient use of indexes for your purpose.
Based on what you say, a car is in exactly one of the lists. This is important for this method to work. Here is the SQL:
select cu.*,
coalesce(tl1.id_x, tl2.id_x, tl3.id_x) as id_x,
tl1.id_y, tl2.id_z, tl3.id_a,
(case when tl1.id is not null then 'One'
when tl2.id is not null then 'Two'
when tl3.id is not null then 'Three'
end) as TieList
from Cars_Belonging_To_User cu left outer join
TieList_One tl1
on cu.id = tl1.id_of_car left outer join
TieList_Two tl2
on cu.id = tl2.id_of_car left outer join
TieList_Three tl3
on cu.id = tl3.id_of_car;
You can then add a where clause to filter as you need.
If you have an index on id_of_car for each tielist table, then the performance should be quite good. If the where clause uses an index on the first table, then the joins and where should all be using indexes, and the query will be quite fast.
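The left-join approach, with the user and car filter added, could look roughly like this. It is a sketch under assumptions: Python's sqlite3 module stands in for MySQL, the schema is reconstructed from the tables in the question, and find_tielist is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cars_belonging_to_user (id INTEGER PRIMARY KEY, user_id INTEGER, make TEXT, model TEXT);
    CREATE TABLE tielist_one   (id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_y INTEGER);
    CREATE TABLE tielist_two   (id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_z INTEGER);
    CREATE TABLE tielist_three (id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_a INTEGER);
    -- One index per tielist on the join column, as suggested in the answer.
    CREATE INDEX idx_tl1_car ON tielist_one (id_of_car);
    CREATE INDEX idx_tl2_car ON tielist_two (id_of_car);
    CREATE INDEX idx_tl3_car ON tielist_three (id_of_car);
    INSERT INTO cars_belonging_to_user VALUES
        (1, 1, 'Toyota', 'Camry'), (2, 1, 'Infinity', 'Q55'),
        (3, 1, 'DMC', 'DeLorean'), (4, 2, 'Acura', 'RSX');
    INSERT INTO tielist_one   VALUES (1, 1, 12, 22), (2, 2, 23, 32);
    INSERT INTO tielist_two   VALUES (1, 3, 32, 22);
    INSERT INTO tielist_three VALUES (1, 4, 45, 2);
    """
)

QUERY = """
    SELECT cu.id, cu.make, cu.model,
           COALESCE(tl1.id_x, tl2.id_x, tl3.id_x) AS id_x,
           tl1.id_y, tl2.id_z, tl3.id_a,
           CASE WHEN tl1.id IS NOT NULL THEN 'One'
                WHEN tl2.id IS NOT NULL THEN 'Two'
                WHEN tl3.id IS NOT NULL THEN 'Three'
           END AS tielist
    FROM cars_belonging_to_user cu
    LEFT OUTER JOIN tielist_one   tl1 ON cu.id = tl1.id_of_car
    LEFT OUTER JOIN tielist_two   tl2 ON cu.id = tl2.id_of_car
    LEFT OUTER JOIN tielist_three tl3 ON cu.id = tl3.id_of_car
    WHERE cu.user_id = ? AND cu.id = ?
"""

def find_tielist(conn, user_id, car_id):
    """Return the car row with its tielist name, or None if the car
    does not belong to the given user."""
    return conn.execute(QUERY, (user_id, car_id)).fetchone()

row = find_tielist(conn, 1, 3)
print(row)
```

The last column of the returned row carries the tielist name, which can be read into a variable and used to decide which tables to delete from. Passing the ids as bound parameters also avoids building the SQL string from user input.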

mysql sphinx generate unique id

I have my database segmented into 8 parts, where each part contains a database with a table user_data. For better search performance I'm using Sphinx to index all of this data, but I've come across one problem: user_data is a one-to-many table with no unique field to represent each row, so I can't build my Sphinx index correctly, since Sphinx requires a unique id and this setup results in duplicate ids. Any idea how I can work around this, or generate an id that is unique across all sub-indexes from the different segments?
example:
SELECT user_id, item_id, info
FROM user_data
Which returns something like:
+----------+-----------------------+
| user_id | item_id | info |
+----------+-----------------------+
| 10 | 151 | asdf |
| 10 | 152 | test |
| 11 | 151 | 545 |
| 12 | 151 | sdfsd |
| 12 | 152 | eewwe |
| 12 | 153 | dfsd |
but i have to get
+----------+-----------------------------+
| user_id | item_id | info | id |
+----------+-----------------------------+
| 10 | 151 | asdf | 1 |
| 10 | 152 | test | 2 |
| 11 | 151 | 545 | 3 |
| 12 | 151 | sdfsd | 4 |
| 12 | 152 | eewwe | 5 |
| 12 | 153 | dfsd | 6 |
Of course, id must be unique across all segments.
First of all, set up a pre-query that initialises a user variable:
sql_query_pre = SET @a := 1;
Then use this variable to produce a fictional auto-increment id:
sql_query = SELECT @a := @a + 1 AS id, user_id, item_id, info FROM user_data
I'm unfamiliar with Sphinx, but if you're looking to create ids that are unique across tables, in your case:
One option is to use a UUID as a unique index on all the tables -- the chances of them colliding are minute.
Another option is, if you know the max size of a table, to only use numbers in that range plus an offset. E.g., Table 1's ids: 1 - 10000, Table 2's ids: 10001 - 20000, etc. You can even set the id fields to be AUTO_INCREMENT and set their start numbers at the beginning of the particular range.
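The range-plus-offset option can be illustrated with a little arithmetic. This is a sketch assuming each segment holds at most 10000 rows and that each row has a per-segment id starting at 1; SEGMENT_RANGE, to_global_id and from_global_id are hypothetical names.

```python
SEGMENT_RANGE = 10_000  # assumed upper bound on rows per segment

def to_global_id(segment, local_id):
    """Map a 1-based segment number and a per-segment id (1..SEGMENT_RANGE)
    to an id that is unique across all segments."""
    if not 1 <= local_id <= SEGMENT_RANGE:
        raise ValueError("local_id is outside the range reserved per segment")
    return (segment - 1) * SEGMENT_RANGE + local_id

def from_global_id(global_id):
    """Recover (segment, local_id) from a global id."""
    segment_index, offset = divmod(global_id - 1, SEGMENT_RANGE)
    return segment_index + 1, offset + 1

print(to_global_id(1, 10_000))  # last id of segment 1
print(to_global_id(2, 1))       # first id of segment 2
print(from_global_id(10_001))
```

Because every segment owns a disjoint block of numbers, two rows can only collide if a segment outgrows its block, which is why the range check raises instead of silently overlapping the next segment.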
You could do something like this while indexing:
SELECT (user_id + 10) * 1 AS id, 1 AS segment_id, item_id, info FROM user_data_1
... adding a segment_id. You would have eight of these, so the indexing query would look something like:
SELECT (user_id + 10) * 1 AS id, 1 AS segment_id, item_id, info FROM user_data_1
UNION
SELECT (user_id + 10) * 2 AS id, 2 AS segment_id, item_id, info FROM user_data_2
UNION
SELECT (user_id + 10) * 3 AS id, 3 AS segment_id, item_id, info FROM user_data_3
UNION
SELECT (user_id + 10) * 4 AS id, 4 AS segment_id, item_id, info FROM user_data_4
UNION
SELECT (user_id + 10) * 5 AS id, 5 AS segment_id, item_id, info FROM user_data_5
UNION
SELECT (user_id + 10) * 6 AS id, 6 AS segment_id, item_id, info FROM user_data_6
UNION
SELECT (user_id + 10) * 7 AS id, 7 AS segment_id, item_id, info FROM user_data_7
UNION
SELECT (user_id + 10) * 8 AS id, 8 AS segment_id, item_id, info FROM user_data_8
Then when you query sphinx and get back the IDs, just undo the arithmetic by dividing the id by segment_id and subtracting 10. This way all the ids will be unique within sphinx. Just make sure the attribute type can handle the size of the ids you'll be indexing.
Another answer proposes using a UUID, but Sphinx cannot use a UUID as an id; you will need an INT. Therefore use UUID_SHORT, which gives you a unique integer (for MySQL). If this does not work out of the box (e.g. if you are using Ubuntu 11.04), you will get an error like this:
WARNING: DOCID_MAX document_id, skipping
You will need to compile the Sphinx source with --enable-id64, or just go to the Sphinx website and get an up-to-date package (which is compiled with --enable-id64). A more complete example of this indexing method is given in this blog entry
We are using crc32(uuid_short()) for 32-bit implementations of Sphinx. This works most of the time! Of course, one cannot rely upon a 32 bit digest of a

MySQL - COUNT before INSERT in one query

Hey all, I am looking for a way to query my database table only once in order to add an item and also check what the last item count was, so that I can use the next number.
strSQL = "SELECT * FROM productr"
After that code above, i add a few product values to a record like so:
ID | Product | Price | Description | Qty | DateSold | gcCode
--------------------------------------------------------------------------
5 | The Name 1 | 5.22 | Description 1 | 2 | 09/15/10 | na
6 | The Name 2 | 15.55 | Description 2 | 1 | 09/15/10 | 05648755
7 | The Name 3 | 1.10 | Description 3 | 1 | 09/15/10 | na
8 | The Name 4 | 0.24 | Description 4 | 21 | 09/15/10 | 658140
I need to count how many rows have gcCode <> 'na' so that I can add 1 to get a unique number. Currently I do not know how to do this without opening another query inside this one, something like this:
strSQL2 = "SELECT COUNT(gcCode) as gcCount FROM productr WHERE gcCode <> 'na'"
But like I said above, I do not want to have to open another database query just to get a count.
Any help would be great! Thanks! :o)
There's no need to do everything in one query. If you're using InnoDB as a storage engine, you could wrap your COUNT query and your INSERT command in a single transaction to guarantee atomicity.
In addition, you should probably use NULL instead of na for fields with unknown or missing values.
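The two-query transaction described above might be sketched as follows. The assumptions are loud ones: Python's sqlite3 module stands in for MySQL/InnoDB, the column names are adapted from the sample table, the new product row is invented, and storing the next number as the new row's gcCode is a guess at how the count would be used.

```python
import sqlite3

# isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE productr (id INTEGER PRIMARY KEY, product TEXT, price REAL, "
    "description TEXT, qty INTEGER, date_sold TEXT, gcCode TEXT)"
)
conn.executemany(
    "INSERT INTO productr VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (5, "The Name 1", 5.22, "Description 1", 2, "09/15/10", "na"),
        (6, "The Name 2", 15.55, "Description 2", 1, "09/15/10", "05648755"),
        (7, "The Name 3", 1.10, "Description 3", 1, "09/15/10", "na"),
        (8, "The Name 4", 0.24, "Description 4", 21, "09/15/10", "658140"),
    ],
)

def insert_with_next_gc_number(conn, product, price, description, qty, date_sold):
    """Count the rows with a real gcCode and insert a new row using count + 1,
    with both statements inside one transaction."""
    conn.execute("BEGIN")
    try:
        (gc_count,) = conn.execute(
            "SELECT COUNT(*) FROM productr WHERE gcCode <> 'na'"
        ).fetchone()
        next_number = gc_count + 1
        conn.execute(
            "INSERT INTO productr (product, price, description, qty, date_sold, gcCode) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (product, price, description, qty, date_sold, str(next_number)),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return next_number

# Hypothetical new product, used only to exercise the function.
next_number = insert_with_next_gc_number(
    conn, "The Name 5", 2.50, "Description 5", 1, "09/16/10"
)
print(next_number)
```

If either statement fails, the rollback leaves the table unchanged, so the count and the insert succeed or fail together.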
They're two queries; one is a subset of the other which means getting what you want in a single query will be a hack I don't recommend:
SELECT p.*,
(SELECT COUNT(*)
FROM PRODUCTR
WHERE gccode != 'na') AS gcCount
FROM PRODUCTR p
This will return all the rows, as it did previously. But it will include an additional column, repeating the gcCount value for every row returned. It works, but it's redundant data...