Update a row if a field is a subsequence of a string - mysql

I have a string S = "1-2-3-4-5-6-7-8"
This is how my database table rows look like:
id
SubSequence
1
1-2-4-5
2
1-3-4-5
3
2-5-7-8
4
5-8-9-10
5
6-7-10-11
and so on ...
I want to write a query that would update (in this example) only the first 3 rows because they're a subsequence of string S.
The current solution I have is to programmatically go thru each row, check if it's a subsequence, and update. But I'm wondering if there's a way to do it at the MySQL level for performance.
Update: I don't mind changing the way data is stored. For example, String S could be an array holding those numbers, and the "SubSequence" column can hold those numbers as an array.

No, there is not a way to do the query you describe with good performance in SQL when you store the subsequences as strings like you have done. The reason is that doing substring comparisons cannot be optimized with indexes, so your query will be forced to do the comparisons row by row.
In general, when you try to store sets of values as a string, but you want to use SQL to treat them as discrete values, it's bound to be awkward, difficult to code, and ultimately have bad performance.
In this case, what I would do is make a two tables, one that numbers your entities, and a second table in which each value in your subsequence is stored on a row by itself.
SubSequences:
id
1
2
SubSequenceElements:
id
SubSequenceElement
1
1
1
2
1
4
1
5
2
1
2
3
2
4
2
5
And so on.
Then you can use relational-division techniques to find cases where every element of this set exists in the set you want to compare it to.
Here's an example:
SELECT s.id
FROM SubSequences AS s
LEFT OUTER JOIN (
SELECT id
FROM SubSequenceElements
WHERE SubSequenceElement NOT IN (1,2,3,4,5,6,7,8)
) AS invalid USING (id)
WHERE invalid.id IS NULL;
In other words, you want to return rows from SubSequences such that no match is found in SubSequenceElements with an element value that is not in the set you're trying to match.
It's a bit confusing, because you have to think about the problem is a double-don't-match-this-set problem. But once you get relational division, it can be very powerful.

If the set can be represented by the numbers 0 through 63 (or some subset of that), then...
Using a column like this
elements BIGINT UNSIGNED NOT NULL DEFAULT '0'
Then "2-5-7-8" could be put into it thus:
UPDATE ...
SET elements = (1<<2) | (1<<5) | (1<<7) | (1<<8);
Then various operations can be done in a single expression:
WHERE elements = (1<<2) | (1<<5) | (1<<7) | (1<<8) -- Test for exactly that set
WHERE (elements ^ ~ ( (1<<2) | (1<<5) | (1<<7) | (1<<8) )) != 0
-- checks to see if any other bits are turned on
This last example is close to what you need. One side of the "and not" would have the 1..8 of your example, the other would have
Your example has S represented as 0x1FE;
WHERE subsequence & ~0x1FE
will be 0 (false) for ids 1,2,3; non-zero (true) for ids 4 and 5.

Related

Storing csv in MySQL field – bad idea?

I have two tables, one user table and an items table. In the user table, there is the field "items". The "items" table only consists of a unique id and an item_name.
Now each user can have multiple items. I wanted to avoid creating a third table that would connect the items with the user but rather have a field in the user_table that stores the item ids connected to the user in a "csv" field.
So any given user would have a field "items" that could have a value like "32,3,98,56".
It maybe is worth mentioning that the maximum number of items per user is rather limited (<5).
The question: Is this approach generally a bad idea compared to having a third table that contains user->item pairs?
Wouldn't a third table create quite an overhead when you want to find all items of a user (I would have to iterate through all elements returned by MySQL individually).
You don't want to store the value in the comma separated form.
Consider the case when you decide to join this column with some other table.
Consider you have,
x items
1 1, 2, 3
1 1, 4
2 1
and you want to find distinct values for each x i.e.:
x items
1 1, 2, 3, 4
2 1
or may be want to check if it has 3 in it
or may be want to convert them into separate rows:
x items
1 1
1 2
1 3
1 1
1 4
2 1
It will be a HUGE PAIN.
Use atleast normalization 1st principle - have separate row for each value.
Now, say originally you had this as you table:
x item
1 1
1 2
1 3
1 1
1 4
2 1
You can easily convert it into csv values:
select x, group_concat(item order by item) items
from t
group by x
If you want to search if x = 1 has item 3. Easy.
select * from t where x = 1 and item = 3
which in earlier case would use horrible find_in_set:
select * from t where x = 1 and find_in_set(3, items);
If you think you can use like with CSV values to search, then first like %x% can't use indexes. Second, it will produce wrong results.
Say you want check if item ab is present and you do %ab% it will return rows with abc abcd abcde .... .
If you have many users and items, then I'd suggest create separate table users with an PK userid, another items with PK itemid and lastly a mapping table user_item having userid, itemid columns.
If you know you'll just need to store and retrieve these values and not do any operation on it such as join, search, distinct, conversion to separate rows etc. etc. - may be just may be, you can (I still wouldn't).
Storing complex data directly in a relational database is a nonstandard use of a relational database. Normally they are designed for normalized data.
There are extensions which vary according to the brand of software which may help. Or you can normalize your CSV file into properly designed table(s). It depends on lots of things. Talk to your enterprise data architect in this case.
Whether it's a bad idea depends on your business needs. I can't assess your business needs from way out here on the internet. Talk to your product manager in this case.

When comparing current row with previous row the query is too slow

When subtracting the previous row from the current row the query is too slow, is there a more efficient way to do this?
I am trying to create a data filter which has the capacity to highlight events which occur sequentially to those that do not. I have a table of machine operational data 'source' which is ordered chronologically. Using a WHERE clause I filter out the data which is of less relevance to this particular analysis. The remaining data is inserted into a new table 'filtered'. Using the inserted ID numbers from 'source' I compare each row with its proceeding row to find the difference in value – if the difference is 1 then then the events have occurred in sequence and if the difference is null then they have not. My problem is with the length of time it takes to compare a row with the previous row. I have reduced my data volume to just 2.5% (275000 rows) of what it full volume will be and the query takes 3012 seconds according to the MySQL Workbench action output. I have experimented with structuring the query differently but ultimately have reached dead ends. So my question is – Is there a more efficient way to compare a row with its previous row ?
OK – here are some more details.
/*First I create the table for the filtered data */
drop table if exists filtered_dta;
create table filtered_dta
(
ID int (11) not null auto_increment,
IDx1 int (11),
primary key (ID)
);
/Then I insert the filtered data/
insert into filtered_dta (IDx1)
select seq from source
WHERE range_value < -1.75
and range_value > -5 ;
/* Then I compare each row with its previous */
select t1.ID, t1.IDx1,(t1.IDx1-t2.IDx1)
as seq_value
from filtered_dta t1
left outer join filtered_dta t2
on t1.IDx1 = t2.IDx1+1
order by IDx1
;
Here are sample tables.
Table - filtered_dta Results
| ID | IDx1 | | ID | IDx1 | seq_value |
1 3 1 3 null
2 4 2 4 1
3 7 3 7 null
4 12 4 12 null
5 13 5 13 1
6 14 6 14 1
A full data set from the source table is expected to be between 3 and 10 million rows. The database will create and use about 50 tables. This database is being used as a back end engine for simulation software which does not have the capacity to process this amount of data and give an appropriate analysis of the system which the data represents.
I have spent some time on the issue and have come across the following;
It may be possible that the find_seq table is creates with myISAM and requires converting to an innoDB table. I tried to set the default engine to innoDB but seen no noticeable differences.
This question was similar in its problem of a slow query MySQL query painfully slow on large data - but its issue lay in having a function in a where clause – from my action output I can see the where clause is not too slow.
I would appreciate any input anyone may have on this. Also I am not a proficient user of MySQL so if possible give details.
Kind regards.
You can use something like this template to identify sequential "islands" without a self-join:
SELECT #island := #island + IF(seqId <> #lastSeqId + 1, 1, 0) AS island
, orderQ.[fieldsYouWant]
, #lastSeqId := seqId
FROM (
SELECT [fieldsYouWant], [sequentialIdentifier] AS seqId
FROM [theTable] AS t
, (SELECT #island := 0, #lastSeqId := [somethingItCannotBe]) AS init_dnr -- Initializes variables, do not reference
WHERE [filteringConditionsMet]
ORDER BY [orderingCriteria]
) AS orderingQ
;
I tried keeping it as generic as possible, but you'll note I had to revert to the assumption that seqId was numeric and expected to increment by one. Conditions in the island calculation can be much more complicated if needed (for cases such as where (A, 1), (A, 2), (B, 3) should be two islands based on the sequence not being defined by a single value).
You can take this template further, to identify "island" boundaries and sizes by simple making the above query as subquery for something like:
SELECT island, MIN(seqId), MAX(seqId), COUNT(seqId)
FROM ([above query]) AS islandQ
GROUP BY island
;

database schema one column entry references many rows from another table

Let's say we have a table called Workorders and another table called Parts. I would like to have a column in Workorders called parts_required. This column would contain a single item that tells me what parts were required for that workorder. Ideally, this would contain the quantities as well, but a second column could contain the quantity information if needed.
Workorders looks like
WorkorderID date parts_required
1 2/24 ?
2 2/25 ?
3 3/16 ?
4 4/20 ?
5 5/13 ?
6 5/14 ?
7 7/8 ?
Parts looks like
PartID name cost
1 engine 100
2 belt 5
3 big bolt 1
4 little bolt 0.5
5 quart oil 8
6 Band-aid 0.1
Idea 1: create a string like '1-1:2-3:4-5:5-4'. My application would parse this string and show that I need --> 1 engine, 3 belts, 5 little bolts, and 4 quarts of oil.
Pros - simple enough to create and understand.
Cons - will make deep introspection into our data much more difficult. (costs over time, etc)
Idea 2: use a binary number. For example, to reference the above list (engine, belt, little bolts, oil) using an 8-bit integer would be 54, because 54 in binary representation is 110110.
Pros - datatype is optimal concerning size. Also, I am guessing there are tricky math tricks I could use in my queries to search for parts used (don't know what those are, correct me if I'm in the clouds here).
Cons - I do not know how to handle quantity using this method. Also, Even with a 64-bit BIGINT still only gives me 64 parts that can be in my table. I expect many hundreds.
Any ideas? I am using MySQL. I may be able to use PostgreSQL, and I understand that they have more flexible datatypes like JSON and arrays, but I am not familiar with how querying those would perform. Also it would be much easier to stay with MySQL
Why not create a Relationship table?
You can create a table named Workorders_Parts with the following content:
|workorderId, partId|
So when you want to get all parts from a specific workorder you just type:
select p.name
from parts p inner join workorders_parts wp on wp.partId = p.partId
where wp.workorderId = x;
what the query says is:
Give me the name of parts that belongs to workorderId=x and are listed in table workorders_parts
Remembering that INNER JOIN means "INTERSECTION" in other words: data i'm looking for should exist (generally the id) in both tables
IT will give you all part names that are used to build workorder x.
Lets say we have workorderId = 1 with partID = 1,2,3, it will be represented in our relationship table as:
workorderId | partId
1 | 1
1 | 2
1 | 3

Is there a possibility to change the order of a string with numeric value

I have some strings in my database. Some of them have numeric values (but in string format of course). I am displaying those values ordered ascending.
So we know, for string values, 10 is greater than 2 for example, which is normal. I am asking if there is any solution to display 10 after 2, without changing the code or the database structure, only the data.
If for example I have to display values from 1 to 10, I will have:
1
10
2
3
4
5
6
7
8
9
What I would like to have is
1
2
3
4
5
6
7
8
9
10
Is there a possibility to ad an "invisible character or string which will be interpreted as greater than 9". If i put a10 instead of 10, the a10 will be at the end but is there any invisible or less visible character for that.
So, I repeat, I am not looking for a programming or database structure solution, but for a simple workaround.
You could try to cast the value as an number to then order by it:
select col
from yourtable
order by cast(col AS UNSIGNED)
See SQL Fiddle with demo
You could try appending the correct number of zeroes to the front of the data:
01
02
03
..
10
11
..
99
Since you have a mixture of numbers and letters in this column - even if not in a single row - what you're really trying to do is a Natural Sort. This is not something MySQL can do natively. There are some work arounds, however. The best I've come across are:
Sort by length then value.
SELECT
mixedColumn
FROM
tableName
ORDER BY
LENGTH(mixedColumn), mixedColumn;
For more examples see: http://www.copterlabs.com/blog/natural-sorting-in-mysql/
Use a secondary column to use as a sort key that would contain some sort of normalized data (i.e. only numbers or only letters).
CREATE TABLE tableName (mixedColumn varchar, sortColumn int);
INSERT INTO tableName VALUES ('1',1), ('2',2), ('10',3),
('a',4),('a1',5),('a2',6),('b1',7);
SELECT
mixedColumn
FROM
tableName
ORDER BY
sortColumn;
This could get difficult to maintain unless you can figure out a good way to handle the ordering.
Of course if you were able to go outside of the database you'd be able to use natural sort functions from various programming languages.

MySQL, how to repeat same line x times

I have a query that outputs address order data:
SELECT ordernumber
, article_description
, article_size_description
, concat(NumberPerBox,' pieces') as contents
, NumberOrdered
FROM customerorder
WHERE customerorder.id = 1;
I would like the above line to be outputted NumberOrders (e.g. 50,000) divided by NumberPerBox e.g. 2,000 = 25 times.
Is there a SQL query that can do this, I'm not against using temporary tables to join against if that's what it takes.
I checked out the previous questions, however the nearest one:
is to be posible in mysql repeat the same result
Only gave answers that give a fixed number of rows, and I need it to be dynamic depending on the value of (NumberOrdered div NumberPerBox).
The result I want is:
Boxnr Ordernr as_description contents NumberOrdered
------+--------------+----------------+-----------+---------------
1 | CORDO1245 | Carrying bags | 2,000 pcs | 50,000
2 | CORDO1245 | Carrying bags | 2,000 pcs | 50,000
....
25 | CORDO1245 | Carrying bags | 2,000 pcs | 50,000
First, let me say that I am more familiar with SQL Server so my answer has a bit of a bias.
Second, I did not test my code sample and it should probably be used as a reference point to start from.
It would appear to me that this situation is a prime candidate for a numbers table. Simply put, it is a table (usually called "Numbers") that is nothing more than a single PK column of integers from 1 to n. Once you've used a Numbers table and aware of how it's used, you'll start finding many uses for it - such as querying for time intervals, string splitting, etc.
That said, here is my untested response to your question:
SELECT
IV.number as Boxnr
,ordernumber
,article_description
,article_size_description
,concat(NumberPerBox,' pieces') as contents
,NumberOrdered
FROM
customerorder
INNER JOIN (
SELECT
Numbers.number
,customerorder.ordernumber
,customerorder.NumberPerBox
FROM
Numbers
INNER JOIN customerorder
ON Numbers.number BETWEEN 1 AND customerorder.NumberOrdered / customerorder.NumberPerBox
WHERE
customerorder.id = 1
) AS IV
ON customerorder.ordernumber = IV.ordernumber
As I said, most of my experience is in SQL Server. I reference http://www.sqlservercentral.com/articles/Advanced+Querying/2547/ (registration required). However, there appears to be quite a few resources available when I search for "SQL numbers table".