MUMPS can't format Number to String

I am trying to convert a large number to a string in MUMPS, but I can't.
Let me explain what I would like to do:
s A="TEST_STRING#12168013110012340000000001"
s B=$P(A,"#",2)
s TAB(B)=1
s TAB(B)=1
I would like to create an array TAB where variable B is the key (subscript) for TAB.
When I run ZWR, I get:
A="TEST_STRING#12168013110012340000000001"
B="12168013110012340000000001"
TAB(12168013110012340000000000)=1
TAB("12168013110012340000000001")=1
As you can see, the first SET treats variable B as a number (wrongly converted) and the second SET treats variable B as a string (as I would like to see).
My question is: how do I write the SET command so that it treats B as a string instead of a number (which is wrong in my opinion)?
Any advice/explanation will be helpful.

This may be a limitation of the sorting/storage mechanism built into MUMPS, and it differs between MUMPS implementations. The cause is that while variable values in MUMPS are untyped, subscripts are not: a subscript that looks like a canonical number is stored as a number, and numeric subscripts sort before string ones. When a long digit string is converted to a number, it can be rounded to the implementation's numeric precision, which is what happened to the last digit here. To prevent this, prefix the subscript with a space so that it is no longer a canonical number and is explicitly treated as a string:
s TAB(" "_B)=1
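A minimal Python model of the mechanism, to show why the first SET loses the last digit while the space-prefixed subscript survives. It is a sketch under stated assumptions, not how any MUMPS implementation is written: only canonical integers (no leading zeros, no "+") are treated as numeric, the precision is assumed to be 18 significant digits (the figure usually quoted for GT.M), and `subscript_key` and `collation_key` are hypothetical names.

```python
import re
from decimal import Context, Decimal

# Assumption: the implementation keeps 18 significant digits for numbers.
PRECISION = 18

# Simplified canonical-number test: integers only, no leading zeros, no "+".
# Real MUMPS canonic numbers also include decimals such as ".5" or "1.25".
_CANONICAL_INT = re.compile(r"-?[1-9][0-9]*|0")


def subscript_key(subscript: str):
    """Return the key a MUMPS-like store would file a subscript under.

    Canonical numeric strings become numbers rounded to PRECISION digits;
    everything else (including " " followed by digits) stays a string.
    """
    if _CANONICAL_INT.fullmatch(subscript):
        return Context(prec=PRECISION).create_decimal(subscript)
    return subscript


def collation_key(key):
    """Sort key that puts numeric subscripts before string subscripts."""
    if isinstance(key, Decimal):
        return (0, key, "")
    return (1, Decimal(0), key)


B = "12168013110012340000000001"
print(int(subscript_key(B)))          # 12168013110012340000000000 (rounded)
print(repr(subscript_key(" " + B)))   # ' 12168013110012340000000001' (string)
```

Both effects come from the same rule: once a subscript is classified as numeric, only its (possibly rounded) numeric value is kept.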
As far as I know, InterSystems Caché doesn't have this limitation; at least your code works fine in Caché, and the documentation claims support for up to 309 digits:
http://docs.intersystems.com/cache20141/csp/docbook/DocBook.UI.Page.cls?KEY=GGBL_structure#GGBL_C12648

I've tried to recreate your scenario, but I am not seeing the issue you're experiencing.
It is actually not possible (in my opinion) for the same command, executed twice in a row, to produce two different results:
s TAB(B)=1
s TAB(B)=1
As long as the value of B does not change between the executions, the result should be:
TAB("12168013110012340000000001")=1
Example of what the GT.M implementation of MUMPS returns in your case.

Related

How to remove a string that is no longer needed in a MySQL database?

This seemed like it should be very simple to do, yet I haven't been able to find an answer after weeks of looking.
I'm trying to remove strings that are no longer needed. Regex_replace sounds perfect but is not available in MySQL.
In MySQL how would I accomplish changing this:
[quote=ABC;xxxxxx]
to this:
[quote=ABC]
The issues are:
- this can appear anywhere in a text blob
- the xxxxxx can only be numeric but may be 6, 7 or 8 characters long
- not adding/removing any rows, just rewriting the contents of one column on one row at a time.
Thanks.
I don't think you really need REGEX_Replace (though it would make things easier of course).
Assuming that the example you presented is a real reflection of what you have:
Your starting point is the string [quote=<something>;, meaning that you can start by searching for [quote=,
Once you have found it, you need to search for ; and after that for ],
Once you have found both, you know what to extract and where to start the next search (in case the pattern can appear more than once within a single blob).
Did I get you correctly?
EDIT
This approach converts all instances of [quote=ABC;xxxxxx] to [quote=ABC] under the following assumptions:
The pattern can appear any number of times within the input string,
The length of xxxxxx is not fixed,
The resulting string (after removing all the appearances of ;xxxxxx) should replace the value in the table,
Performance is not an issue since either this is going to be a one-time job (through the whole table) or it will run every time on a single string (e.g. before INSERTing a new record).
Some MySQL functions that will be used:
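LOCATE: Searches within a string, starting at a given position, for the first appearance of a substring and returns its position (offset), or 0 if it is not found (INSTR is similar but always searches from the beginning of the string, so it cannot continue a search),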
SUBSTR: Returns a substring from a string (several ways to use it),
CONCAT: Concatenates two or more strings.
The guidelines presented here apply for the manipulation of a single INPUT string. If this needs to be used over, say, a whole table, simply get the strings into a CURSOR and loop.
Here are the steps:
Declare five INT local variables to serve as indices and the total input string length, say l_Start, l_UpTo, l_Total_Length, l_temp_1 and l_temp_2, setting the initial values l_Start = 1 and l_Total_Length = LENGTH(INPUT_String),
Declare a string variable into which you will copy the "cleaned" result and initialize it to '', say l_Output_str; also declare a temporary string to hold the value of 'ABC', say l_Quote,
Start an infinite loop (you will set the exit conditions within it; see below),
Exit the loop if l_Start > l_Total_Length (the first of the two exit points from the loop),
Find the first location of '[quote=' in the input string starting from l_Start and save it in l_UpTo (e.g. SET l_UpTo = LOCATE('[quote=', INPUT_String, l_Start) ;),
If l_UpTo is 0 (i.e. substring not found), concatenate the current contents of l_Output_str with whatever remains of the input string from position l_Start (e.g. SET l_Output_str = CONCAT(l_Output_str, SUBSTR(INPUT_String, l_Start)) ;) and exit the loop (the second exit point),
Otherwise, copy the text that precedes the tag: SET l_Output_str = CONCAT(l_Output_str, SUBSTR(INPUT_String, l_Start, l_UpTo - l_Start)) ;,
Search the input string for the ; symbol starting from l_UpTo + 7 (7 is the length of [quote=) and save the position in l_temp_1,
Search the input string for the ] symbol starting from l_temp_1 + 1 and save the position in l_temp_2,
Extract the quoted name, SET l_Quote = SUBSTR(INPUT_String, l_UpTo + 7, l_temp_1 - l_UpTo - 7) ;, and add it to the output string: SET l_Output_str = CONCAT(l_Output_str, '[quote=', l_Quote, ']') ;,
Set l_Start = l_temp_2 + 1 ;
End of loop.
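To check the expected output before writing the stored procedure, here is a Python sketch of the same search-and-concatenate loop. It is a translation of the steps, not MySQL code: Python's str.find is 0-based and returns -1 where MySQL's LOCATE/INSTR return 0, and `strip_quote_ids` is a hypothetical name. As an extra safeguard not in the steps, it leaves a tag without a ;xxxxxx part (or without a closing ]) untouched.

```python
def strip_quote_ids(text: str) -> str:
    """Rewrite every [quote=NAME;digits] as [quote=NAME]."""
    tag = "[quote="
    out = []      # plays the role of l_Output_str
    start = 0     # plays the role of l_Start (0-based here)
    while start < len(text):
        up_to = text.find(tag, start)          # position of '[quote='
        if up_to == -1:                        # no more tags: copy the rest
            out.append(text[start:])
            break
        name_start = up_to + len(tag)
        semi = text.find(";", name_start)      # l_temp_1
        close = text.find("]", name_start)     # l_temp_2
        if close == -1:                        # unterminated tag: copy rest as-is
            out.append(text[start:])
            break
        out.append(text[start:up_to])          # text before the tag
        if semi != -1 and semi < close:
            out.append(tag + text[name_start:semi] + "]")
        else:                                  # no ;id part: keep tag unchanged
            out.append(text[up_to:close + 1])
        start = close + 1
    return "".join(out)


print(strip_quote_ids("a [quote=ABC;123456] b [quote=XY;12345678]c"))
# a [quote=ABC] b [quote=XY]c
```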
Notes:
As I have neither written nor tested the code, it is possible that I'm not setting the indices correctly; you will need to test it thoroughly to get it working as needed;
The above IS the method I suggested;
If the input string is very long (many MBs), you might observe poor performance (i.e. it might take a few seconds to complete) because of the concatenations. There are some steps that can be taken to improve performance, but let's get this working first and then, if needed, tackle the performance issues.
Hope that the above is clear and comprehensive.

SAS: Most efficient way to eliminate duplicates

For starters, I know my problem is similar to this one (which is the closest to my question I have found), but with some differences, hence my new post.
I have a database with an identifier and declarations. Declarations are constructed as identifier + a letter.
If the identifier is 123456, declarations would then be "123456A", "123456B" and so on.
I would like to select one observation for each identifier: the one whose declaration has the last letter, which is, of course, not always the same letter.
I assume I can do that with a PROC SORT followed by another one with NODUPKEY:
proc sort data=have out=have2;
by identifier descending declaration;
run;
proc sort data=have2 out=want nodupkey;
by identifier;
run;
However, as I have a fairly large dataset (tens of millions of observations), I would like to know the best method, in the sense of both best suited and fastest, if it is a different one.
Ideally, in a single step if possible.
Thanks
This looks like a quick solution. It outputs only the first observation of each identifier (in your case the last declaration, as you have already sorted by descending declaration). The remaining records of each group are still read into the program data vector, but they are not written to the output. If possible, please let me know how it went; I am curious whether this would be optimal. I know this to be true only in theory; I have never tested it myself on a large dataset. Thanks.
data want;
/* have2 is sorted by identifier and descending declaration */
do until ( first.identifier ) ;
set have2;
by identifier ;
end ;
run;
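The sort-then-keep-first logic that both the NODUPKEY step and the FIRST.identifier data step rely on can be sketched in Python. Rows are assumed to be (identifier, declaration) tuples, and the function name is hypothetical. The full sort is O(n log n), which is what makes this route costly on tens of millions of observations.

```python
from itertools import groupby
from operator import itemgetter


def last_per_identifier_sorted(rows):
    """Sort by identifier, then declaration descending, and keep the first
    row of each identifier group (the FIRST.identifier row in SAS terms)."""
    # Two stable sorts: declaration descending, then identifier ascending.
    ordered = sorted(rows, key=itemgetter(1), reverse=True)
    ordered = sorted(ordered, key=itemgetter(0))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]
```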
This should work:
proc sql;
create table want as
select
identifier,
max(declaration) as last_declaration
from have
group by identifier;
quit;
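The same result as the MAX/GROUP BY query can be obtained without sorting at all, in one pass over the data with a hash table keyed by identifier. A minimal Python sketch, with rows assumed to be (identifier, declaration) tuples and a hypothetical function name:

```python
def last_per_identifier_single_pass(rows):
    """Keep the largest declaration per identifier in one unsorted pass,
    equivalent to SELECT identifier, MAX(declaration) ... GROUP BY identifier."""
    best = {}
    for identifier, declaration in rows:
        current = best.get(identifier)
        if current is None or declaration > current:
            best[identifier] = declaration
    return best
```

Like MAX(declaration), this relies on string comparison, so it assumes single-letter suffixes; a scheme that continues past Z with AA would not compare as intended ("123456Z" > "123456AA").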

MySQL integer comparison ignores trailing alpha characters

So let's just say I have a table whose only column, ID, is an INT. I have discovered that running:
SELECT *
FROM table
WHERE ID = '32anystring';
Returns the row where id = 32. Clearly 32 != '32anystring'.
This seems very strange. Why does it do this? Can it be turned off? What is the best workaround?
It is common behavior in many programming languages to interpret leading digits as a number when converting a string to a number.
There are a couple of ways to handle this:
Use prepared statements, and define the placeholder where you are putting the value to be of a numeric type. This will prevent strings from being put in there at all.
Check at a higher layer of the application to validate input and make sure it is numeric.
Use the BINARY keyword in MySQL (I'm only guessing that this would work; I have never actually tried it, as I've always implemented a proper validation system before running a query):
SELECT *
FROM table
WHERE BINARY ID = '32anystring';
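For the validation option, a minimal Python sketch of a strict check before the value reaches the query. The regular expression accepts only ASCII digits (int() alone would also accept forms such as " 32" or "+32"); `parse_id` is a hypothetical name, and the resulting int would then be passed as a query parameter as in the prepared-statement option.

```python
import re

_DIGITS = re.compile(r"[0-9]+")


def parse_id(raw: str) -> int:
    """Accept only a plain run of ASCII digits; reject '32anystring'."""
    if not _DIGITS.fullmatch(raw):
        raise ValueError(f"not a numeric ID: {raw!r}")
    return int(raw)
```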
You need to read this
http://dev.mysql.com/doc/refman/5.1/en/type-conversion.html
When you operate on or compare two different types, one of them is converted to the other.
MySQL's string-to-number conversion (you can do '1.23def' * 2 => 2.46) parses as much of the string as possible, as long as it is still a valid number. If the first character cannot be part of a number, the result is 0.
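A rough Python approximation of the prefix conversion described here, useful for predicting what a string will compare equal to. It models only the behavior stated above (longest valid numeric prefix, else 0); leading whitespace, MySQL's exact parsing rules and the warnings it emits are not modeled, and `mysql_like_to_number` is a hypothetical name.

```python
import re

# Longest leading prefix that still forms a decimal number:
# optional sign, digits with optional fraction (or .digits), optional exponent.
_NUMERIC_PREFIX = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def mysql_like_to_number(text: str) -> float:
    """Approximate MySQL's implicit string-to-number conversion."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(0)) if match else 0.0


print(mysql_like_to_number("32anystring"))  # 32.0, hence ID = '32anystring' matches 32
```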

mumps query related to %%

What is the meaning of I $E(R%%,I%%)>1, and why use %%?
Actually, if you are talking about Standard MUMPS (not any particular implementation), R%% is illegal syntax. I have seen the non-standard use of % in extensions to MUMPS, such as EsiObjects or InterSystems Caché ObjectScript, but the usage in the question above is nonsense in standard MUMPS.
There is no particular significance to %%. It's just part of the variable name, and I still don't understand the MUMPS community's obsession with using % in variable names and making them more obscure.
So the statement means IF $EXTRACT(R%%,I%%)>1, i.e. if the character extracted from the string R%% at position I%%, interpreted as a number, is greater than 1, do some more obscure stuff.
$EXTRACT(string,from) extracts a single character in the position specified by from. The from value can be an integer count from the beginning of the string, an asterisk specifying the last character of the string, or an asterisk with a negative integer specifying a count backwards from the end of the string.
Link to documentation: http://docs.intersystems.com/cache20102/csp/docbook/DocBook.UI.Page.cls?KEY=RCOS_fextract
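To illustrate what the condition evaluates, here is a Python sketch. It assumes plain 1-based positions (the asterisk forms quoted above are left out) and relies on M's rule that a string used in a numeric comparison takes the value of its leading numeric part, so a non-digit character counts as 0. The function names are hypothetical, and `r`/`i` stand in for R%% and I%%, which are not valid Python names.

```python
import re


def m_extract(string: str, position: int) -> str:
    """$EXTRACT(string,from): the character at 1-based `position`,
    or "" when the position falls outside the string."""
    if 1 <= position <= len(string):
        return string[position - 1]
    return ""


def m_char_value(char: str) -> int:
    """Numeric value M gives a string of at most one character:
    the digit itself, or 0 when it is not a digit (including "")."""
    return int(char) if re.fullmatch(r"[0-9]", char) else 0


def extracted_gt_1(r: str, i: int) -> bool:
    """Python equivalent of the M condition $E(R%%,I%%)>1."""
    return m_char_value(m_extract(r, i)) > 1
```

So the IF succeeds only when the character at position I%% is one of the digits 2 to 9.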

How can I store an array of boolean values in a MySql database?

In my case, every "item" either has a property or not. There can be some hundreds of properties, so I will need, say, at most 1000 true/false bits per item.
Is there a way to store those bits in one field of the item?
If you're looking for a way to do this in a way that's searchable, then no.
A couple of searchable methods (involving more than one column and/or table):
Use a bunch of SET columns. You're limited to 64 members (on/off flags) per SET, but you can probably figure out a way to group them.
Use 3 tables: Items (id, ...), FlagNames(id, name), and a pivot table ItemFlags(item_id, flag_id). You can then query for items with joins.
If you don't need it to be searchable, then all you need is a method to serialize your data before you put it in the database and to unserialize it when you pull it out; then use a CHAR or VARCHAR column.
Use facilities built in to your language (PHP's serialize/unserialize).
Concatenate a series of "y" and "n" characters together.
Bit-pack your values into a string (8 bits per character) in the client before making a call to the MySQL database, and unpack them when retrieving data from the database. This is the most efficient storage mechanism (if all rows have the same length, use CHAR(x), not VARCHAR(x)), at the expense of the data not being searchable and slightly more complicated code.
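A minimal Python sketch of the bit-packing option, showing that 1000 flags fit in 125 bytes. The bit order (flag i stored in byte i // 8, least significant bit first) is an arbitrary assumption, and the function names are hypothetical; the client would write the resulting bytes to a fixed-length column and read them back before unpacking.

```python
def pack_bits(flags):
    """Pack a sequence of booleans into bytes, 8 flags per byte.
    Flag i goes to byte i // 8, bit i % 8 (least significant bit first)."""
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_bits(packed, count):
    """Inverse of pack_bits; `count` is needed because the last byte may be padded."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(count)]
```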
I would rather go with something like:
Properties
ID, Property
1, FirstProperty
2, SecondProperty
ItemProperties
ID, Property, Item
1021, 1, 10
1022, 2, 10
Then it would be easy to retrieve which properties are set or not with a query for any particular item.
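A runnable sketch of this two-table layout, using Python's built-in sqlite3 module as a stand-in for MySQL (the SQL is plain enough to carry over, but that is an assumption worth checking). Table names, column names and sample rows follow the answer; `properties_of` is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite stand-in for the Properties / ItemProperties tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Properties (ID INTEGER PRIMARY KEY, Property TEXT NOT NULL);
    CREATE TABLE ItemProperties (
        ID INTEGER PRIMARY KEY,
        Property INTEGER NOT NULL REFERENCES Properties(ID),
        Item INTEGER NOT NULL
    );
    INSERT INTO Properties VALUES (1, 'FirstProperty'), (2, 'SecondProperty');
    INSERT INTO ItemProperties VALUES (1021, 1, 10), (1022, 2, 10);
    """
)


def properties_of(item):
    """Names of the properties that are set for one item."""
    rows = conn.execute(
        """
        SELECT p.Property
        FROM ItemProperties ip
        JOIN Properties p ON p.ID = ip.Property
        WHERE ip.Item = ?
        ORDER BY p.ID
        """,
        (item,),
    )
    return [name for (name,) in rows]


print(properties_of(10))  # ['FirstProperty', 'SecondProperty']
print(properties_of(11))  # []
```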
At worst you would have to use a CHAR(1000) [ynnynynnynynnynny...] or the like. If you're willing to pack it (hex, for example, isn't too bad), you could do it with a CHAR(250) of hexadecimal characters, since each hex character holds 4 bits.
If you need 64 or fewer flags, the SET type will work, but it seems like that's not enough.
You could use a binary type, but that's designed more for things like movies, so I wouldn't.
So yeah, it seems like your best bet is to pack it into a string, and then store that.
It should be noted that a VARCHAR would be wasting space, since you do know precisely how much space your data will take, and can allocate it exactly. (Having fixed-width rows is a good thing)
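A Python sketch of the hex packing, showing the storage cost: each hex digit carries 4 flags, so 1000 flags need 1000 / 4 = 250 characters, a fixed width. Putting the first flag in the high bit of the first character is an arbitrary choice, and the function names are hypothetical.

```python
def to_hex_flags(flags):
    """Encode booleans as hex, 4 flags per character (padded to a multiple of 4)."""
    bits = "".join("1" if f else "0" for f in flags)
    bits += "0" * (-len(bits) % 4)          # pad the last hex digit with zeros
    return "".join(format(int(bits[i:i + 4], 2), "x") for i in range(0, len(bits), 4))


def from_hex_flags(text, count):
    """Decode to_hex_flags output back to `count` booleans."""
    bits = "".join(format(int(ch, 16), "04b") for ch in text)
    return [b == "1" for b in bits[:count]]
```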
Strictly speaking, you can accomplish this using the following:
$bools = array(0,1,1,0,1,0,0,1);
$for_db = serialize($bools);
// Insert the serialized $for_db string into the database. You could use a TEXT column;
// make certain it can hold the entire string.
// To get it back out ($from_db is the string read back from the database):
$bools = unserialize($from_db);
That said, I would strongly recommend looking at alternative solutions.
Depending on the use case you might try creating an "item" table that has a many-to-many relationship with values from an "attributes" table. This would be a standard implementation of the common Entity Attribute Value database design pattern for storing variable points of data about a common set of objects.