PostgreSQL 9.2 trigger to separate subfields in a stored string - function

I have a PostgreSQL 9.2 DB which automatically collects data from various machines.
The DB stores all the data, including the machine id, the firmware, the manufacturer id, etc., as well as the actual result data. One stored field (varchar) contains 5 sub-fields separated by the ^ character:
ACT18!!!8246-EN-2.00013151!1^7.00^F5260046959^H1P1O1R1C1Q1L1^1 (Machine 1)
The order of this data seems to vary from one machine to another, e.g. machines 1, 2 and 3. The string above shows the firmware version, in this case "7.00", and it appears in sub-field 2. However, another machine sends the data in a different sub-field - in this case sub-field 3, where the value is "1":
BACT/ALERT^A.00^1^^ (Machine 2)
I want to store the values "7.00" and "1" in a field in a separate table, populated by an AFTER INSERT trigger (CREATE TRIGGER t_machine_id AFTER INSERT ...), where I can choose which sub-field is used depending on which machine the data came from.
Is split_part the best function to do this? Can anyone supply example code that will do this? I can't find anything in the documentation.

You need to (a) split the data using something like regexp_split_to_table, then (b) work out which parts are which using some criteria, since you don't have a fixed field order to rely on. Right now I don't see any reliable rule for deciding what's the firmware version and what's the machine number; you can't simply filter with something like WHERE field <> machine_number, because if machine 1 had firmware version 1 you'd get no results.
Given dummy data:
CREATE TABLE machine_info(data text, machine_no integer);
INSERT INTO machine_info(data,machine_no) (VALUES
('ACT18!!!8246-EN-2.00013151!1^7.00^F5260046959^H1P1O1R1C1Q1L1^1',1),
('BACT/ALERT^A.00^1^^',2)
);
Something like:
SELECT machine_no, regexp_split_to_table(data,'\^')
FROM machine_info;
will give you a table of split data elements with machine number, but then you need to decide which fields are which:
machine_no | regexp_split_to_table
------------+------------------------------
1 | ACT18!!!8246-EN-2.00013151!1
1 | 7.00
1 | F5260046959
1 | H1P1O1R1C1Q1L1
1 | 1
2 | BACT/ALERT
2 | A.00
2 | 1
2 |
2 |
(10 rows)
You may find the output of regexp_split_to_array more useful instead, depending on whether you can get any useful information from field order and how you intend to process the data.
regress=# SELECT machine_no, regexp_split_to_array(data,'\^')
FROM machine_info;
machine_no | regexp_split_to_array
------------+------------------------------------------------------------------
1 | {ACT18!!!8246-EN-2.00013151!1,7.00,F5260046959,H1P1O1R1C1Q1L1,1}
2 | {BACT/ALERT,A.00,1,"",""}
(2 rows)
Say there are two firmware versions; version 1 sends code^blah^fwvers^^ and version 2 and higher sends code^fwvers^blah^blah2^machineno. You can then differentiate between the two because you know that version 1 leaves the last two fields blank:
SELECT
machine_no,
CASE WHEN info_arr[4:5] = ARRAY['',''] THEN info_arr[3] ELSE info_arr[2] END AS fw_vers
FROM (
SELECT machine_no, regexp_split_to_array(data,'\^')
FROM machine_info
) string_parts(machine_no, info_arr);
results:
machine_no | fw_vers
------------+---------
1 | 7.00
2 | 1
(2 rows)
Of course, you've only provided two sample strings, so the real matching rules are likely to be more complex. Consider writing an SQL function that takes the split array and returns the desired field(s) from it.
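Since the question explicitly asks for trigger code, here is a minimal sketch of how such an AFTER INSERT trigger might look. The target table machine_fw and the function name extract_fw are assumptions, and the per-machine rules are just the illustrative ones from above (machine 1: sub-field 2, machine 2: sub-field 3); adapt both to your real schema and matching rules.

```sql
-- Hypothetical target table for the extracted values
CREATE TABLE machine_fw (
    machine_no integer,
    fw_vers    text
);

CREATE OR REPLACE FUNCTION extract_fw() RETURNS trigger AS
$$
DECLARE
    parts text[];
BEGIN
    parts := regexp_split_to_array(NEW.data, '\^');
    -- Pick the sub-field per machine; extend the CASE as needed
    INSERT INTO machine_fw (machine_no, fw_vers)
    VALUES (NEW.machine_no,
            CASE NEW.machine_no
                WHEN 1 THEN parts[2]
                WHEN 2 THEN parts[3]
                ELSE NULL
            END);
    RETURN NULL;  -- the return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_machine_id
AFTER INSERT ON machine_info
FOR EACH ROW EXECUTE PROCEDURE extract_fw();
```

To answer the split_part part of the question: split_part(NEW.data, '^', 2) would work equally well for grabbing a single known sub-field; the array form is handier when you need several sub-fields or rules like the blank-field test above.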

Related

MySQL - Reiteratively count rows that have particular CSV string

2-column MySQL Table:
| id| class |
|---|---------|
| 1 | A,B |
| 2 | B,C,D |
| 3 | C,D,A,G |
| 4 | E,F,G |
| 5 | A,F,G |
| 6 | E,F,G,B |
Requirement is to generate a report/output which tells which individual CSV value of class column is in how many rows.
For example, A is present in 3 rows (with id 1,3,5), and C is present in 2 rows (with id 2,3), and G is in 4 rows (3,4,5,6) so the output report should be
A - 3
B - 3
C - 2
...
...
G - 4
Essentially, column id can be ignored.
The draft that I can think of - first all the values of class column need to picked, split on comma, then create a distinct list of each unique value (A,B,C...), and then count how many rows contain the unique value from that distinct list.
While I know basic SQL queries, this is way too complex for me, and I am unable to find a matching CSV split function in MySQL. (I'm new to SQL so don't know much.)
An alternative approach I made work: download the class column values to a file and feed it to a Perl script, which creates a distinct array of A, B, C, then reads the downloaded CSV file again for each element in the distinct array and increments the count, and finally publishes the report. But this is in Perl and runs as a separate execution, while the client needs it as an SQL report.
Help will be appreciated.
Thanks
You may try a split-string-into-rows function to get the individual values, then use the COUNT function to find the number of occurrences of each.
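As a sketch of that idea, assuming MySQL 8.0+ (for JSON_TABLE) and a table named t(id, class): each CSV value is expanded into one row per code, then the codes are counted. On older versions, a numbers table combined with SUBSTRING_INDEX achieves the same split.

```sql
-- Turn each comma-separated class value into one row per code,
-- then count how many distinct source rows contain each code.
SELECT j.val, COUNT(DISTINCT t.id) AS cnt
FROM t,
     JSON_TABLE(
       CONCAT('["', REPLACE(t.class, ',', '","'), '"]'),
       '$[*]' COLUMNS (val VARCHAR(10) PATH '$')
     ) AS j
GROUP BY j.val
ORDER BY j.val;
```

COUNT(DISTINCT t.id) guards against double counting in case a code ever appears twice within one row's CSV value.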

Is there a way in MySQL to use aggregate functions in a sub section of binary column?

Suppose we have 2 numbers of 3 bits each attached together like '101100', which basically represents 5 and 4 combined. I want to be able to perform aggregation functions like SUM() or AVG() on this column separately for each individual 3-bit column.
For instance:
'101100'
'001001'
sum(first three columns) = 6
sum(last three columns) = 5
I have already tried the SUBSTRING() function, however, speed is the issue in that case as this query will run on millions of rows regularly. And string matching will slow the query.
I am also open for any new databases or technologies that may support this functionality.
You can use the function conv() to convert any part of the string to a decimal number:
select
sum(conv(left(number, 3), 2, 10)) firstpart,
sum(conv(right(number, 3), 2, 10)) secondpart
from tablename
Results:
| firstpart | secondpart |
| --------- | ---------- |
| 6 | 5 |
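Since speed is the stated concern, an alternative worth considering (assuming you control how the value is stored) is to keep the 6 bits in an integer column rather than a string. The two 3-bit parts can then be extracted with shifts and masks, avoiding string handling entirely:

```sql
-- number_int stores the 6-bit value as an integer,
-- e.g. 44 for '101100' (binary 101100 = 44)
SELECT
    SUM(number_int >> 3) AS firstpart,   -- top 3 bits:    44 >> 3 = 5
    SUM(number_int & 7)  AS secondpart   -- bottom 3 bits: 44 & 7  = 4
FROM tablename;
```

An index-friendly integer column also stays compact at millions of rows, which matters for the aggregate scans described in the question.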
With the current understanding I have of your schema (which is next to none), the best solution would be to restructure your schema so that each data point is its own record instead of all the data points being in the same record. Doing this allows you to have a dynamic number of data points per entry. Your resulting table would look something like this:
id | data_type | value
ID is used to tie all of your data points together. If you look at your current table, this would be whatever you are using for the primary key. For this answer, I am assuming id INT NOT NULL but yours may have additional columns.
Data Type indicates what type of data is stored in that record. This would be the current table's column name. I will be using data_type_N as my values, but yours should be a more easily understood value (e.g. sensor_5).
Value is exactly what it says it is, the value of the data type for the given id. Your values appear to be all numbers under 8, so you could use a TINYINT type. If you have different storage types (VARCHAR, INT, FLOAT), I would create a separate column per type (val_varchar, val_int, val_float).
The primary key for this table now becomes a composite: PRIMARY KEY (id, data_type). Since your previously single record will become N records, the primary key will need to adjust to accommodate that.
You will also want to ensure that you have indexes that are usable by your queries.
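As a concrete sketch of that layout (the table and column names are placeholders, and TINYINT assumes your values stay small):

```sql
CREATE TABLE my_table (
    id        INT NOT NULL,
    data_type VARCHAR(32) NOT NULL,  -- e.g. 'sensor_5'
    value     TINYINT NOT NULL,
    PRIMARY KEY (id, data_type)      -- one row per data point per entry
);
```
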
Some sample values (using what you placed in your question) would look like:
1 | data_type_1 | 5
1 | data_type_2 | 4
2 | data_type_1 | 1
2 | data_type_2 | 1
Doing this, summing the values now becomes trivial. You would only need to ensure that data_type_N is summed with data_type_N. As an example, this would be used to sum your example values:
SELECT data_type,
SUM(value)
FROM my_table
WHERE id IN (1,2)
GROUP BY data_type

UNION unexpected results?

Let's say we have:
table 1
a (int) | b (int)
--------|--------
1 | 4
2 | 4
table 2
c (text) | d (text)
---------|---------
hoi | hi
Query:
SELECT * FROM table1
UNION
SELECT * FROM table2
yields
a | b
------|--------
1 | 4
2 | 4
hoi | hi
At least, that's what the query I just ran on MySQL returned.
I'd expect (1, 4, NULL, NULL). Why doesn't this give an error?
UNION just appends the rows of one query to the rows of the other. As long as the two queries return the same number of columns, there's no error. The column names always come from the first query. If the datatypes differ, it finds a common type that they can all be converted to; in your example, it converts the int columns to text (MySQL is loose about this; some other databases require explicit CAST() calls to get everything to the same type).
Since your queries each return two columns, the result contains two columns, using the column names from table1.
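If you actually want the four-column result with NULL padding, you have to ask for it explicitly by listing the columns on both sides; a sketch, using the table and column names from the question:

```sql
SELECT a, b, NULL AS c, NULL AS d FROM table1
UNION ALL
SELECT NULL, NULL, c, d FROM table2;
```

In stricter databases you may need explicit casts such as CAST(NULL AS text) so both branches agree on the column types.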
This is a bit long for a comment.
I just tested this on MySQL 8.0 and SQLite and it returns:
a b
1 4
2 4
hoi hi
I find this surprising. I would expect the columns to be given an integer type and for there to be either a type conversion error or 0 for the third row. Well, actually the SQLite result isn't that strange, because types are much more fungible in SQLite.
SQL Server and Postgres give errors that I would expect -- type conversion errors that cause the query to fail.

Compare two DNA-like strings with MySQL

I'm trying to find a way to compare two DNA-like strings with MySQL; stored functions are no problem. The string may also be changed, but it needs to keep the following format: [code][id]-[value], like C1-4 (the - may be changed as well).
Example of the string:
C1-4,C2-5,C3-9,S5-2,S8-3,L2-4
If a value does not exist in the other string, for example S3-1, it will score 10 (the max value). If the asked string has C1-4 and the given string has C1-5, the score has to be 4 - 5 = -1; and if the asked string has C1-4 and the given string has C1-2, the score has to be 4 - 2 = 2.
The reason for this is that my realtime algorithm is getting slow with 10,000 results (already optimized with stored functions, indexes and query optimizations), because 10,000 small, quick queries add up.
And the score has to be calculated before I can order my query and get the right limit.
Thanks and if you have any questions let me know by comment.
** EDIT **
I'm thinking it's also possible to not use a string, but a table where the DNA bits are stored as a 1-n relation:
ID | CODE | ID | VALUE
----------------------
1. | C... | 2. | 4....
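Building on that EDIT, here is a hedged sketch of how the scoring could look once the bits live in a table dna(subject_id, code, value) (names assumed): codes missing from the compared string score 10, matching codes score the value difference. Note this only scores codes present in the asked string; add a symmetric pass if the reverse direction should also count.

```sql
-- dna(subject_id, code, value): one row per DNA bit, e.g. ('C1', 4).
-- Score subject :asked against every other subject, best scores first.
SELECT subj.subject_id,
       SUM(COALESCE(a.value - s.value, 10)) AS score
FROM dna a
CROSS JOIN (SELECT DISTINCT subject_id
            FROM dna
            WHERE subject_id <> :asked) subj
LEFT JOIN dna s
       ON s.subject_id = subj.subject_id
      AND s.code       = a.code          -- NULL when the code is absent
WHERE a.subject_id = :asked
GROUP BY subj.subject_id
ORDER BY score
LIMIT 10;
```

Because the score is computed inside one set-based query, the ORDER BY and LIMIT requirements from the question come for free, instead of running thousands of small per-pair queries.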

How to get the right "version" of a database entry?

Update: Question refined, I still need help!
I have the following table structure:
table reports:
ID | time | title | (extra columns)
1 | 1364762762 | xxx | ...
Multiple object tables that have the following structure
ID | objectID | time | title | (extra columns)
1 | 1 | 1222222222 | ... | ...
2 | 2 | 1333333333 | ... | ...
3 | 3 | 1444444444 | ... | ...
4 | 1 | 1555555555 | ... | ...
In the object tables, on an object update a new version with the same objectID is inserted, so that the old versions are still available. For example see the entries with objectID = 1
In the reports table, a report is inserted but never updated/edited.
What I want to be able to do is the following:
For each entry in my reports table, I want to be able to query the state of all objects, like they were, when the report was created.
For example lets look at the sample report above with ID 1. At the time it was created (see the time column), the current version of objectID 1 was the entry with ID 1 (entry ID 4 did not exist at that point).
ObjectID 2 also existed, with its current version being the entry with ID 2.
I am not sure how to achieve this.
I could use a query that selects the object versions by the time column:
SELECT *
FROM (
SELECT *
FROM objects
WHERE time < [reportTime]
ORDER BY time DESC
)
GROUP BY objectID
Let's not talk about the performance of this query; it is just to make clear what I want to do. My problem is the comparison of the time columns. I don't think this is a good way to make sure I get the right object versions, because the system time may change "for any reason", and the time column would then contain wrong data, which would lead to wrong results.
What would be another way to do so?
I thought about not using a time column for this, but instead a GLOBAL incremental value, so that I know the insertion order across the database tables.
If you are inserting new versions of the object, and your problem is the time column (I assume you are using this column to sort out which version is newer), I suggest you use an auto-incremental ID column for the versions. Even if the time value is not reliable for you, the ID will be, since it is always increasing: higher ID means newer version.
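A sketch of that idea (the column names version_id and last_version are assumptions): give every object version a value from one global auto-increment sequence, store on each report the highest version_id that existed when the report was created, and join on that instead of comparing times.

```sql
-- objects.version_id comes from a single auto-increment sequence;
-- reports.last_version stores MAX(objects.version_id) at report time.
-- Fetch each object's state as it was when report 1 was created:
SELECT o.*
FROM objects o
JOIN (SELECT objectID, MAX(version_id) AS version_id
      FROM objects
      WHERE version_id <= (SELECT last_version
                           FROM reports
                           WHERE ID = 1)
      GROUP BY objectID) latest
  ON latest.objectID   = o.objectID
 AND latest.version_id = o.version_id;
```

This reproduces exactly the "latest version per objectID at report time" logic of the query in the question, but keyed on insertion order rather than on a clock that might move.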