F# Read File, Split string list, summarize data, Nonfloat decimal numbers - function

I'm new to F# and got this assignment to create a very simple bank representation.
I do not want any code answers directly related to the problem, but preferably links or tips on where to find solutions or how to work them out.
The issues are the following:
Reading lines of a file (a line looks like this: "126,145001,1500.00" and it's sequence_number, account_number, amount)
Split the line to use the data from the line
summarize the data (to return the bank account balance)
Not using floating point numbers representing the amount, due to rounding errors(?)
Doing all of these in one function.
I know how to read a file, in a function.
I also know how to split a string.
I know how to recursively add values from a list.
I do not know how to add values that are decimal without floating-point variables.
I do not know how to retrieve the string from a list in a function and split it.
I do not know how to do all of these things in one function taking in file name, account number, and account currency.
The function should return the balance after the transactions in the file have been processed.
My idea to solve this is to create a datatype that has the three fields sequence_number, account_number and amount, and then do the following:
Read the file,
Split the data and create an object of my custom type for each line in the file
Add and remove the values from the types and return the final balance.
If anyone could point me in the right direction for each or any problem I would be really thankful!

.NET contains a type called System.Decimal that is indeed more appropriate for storing financial figures than the typical floating point types. In F#, you can use the decimal function to convert a value of a different type (say a string) to a System.Decimal (which F# abbreviates as a type also named decimal): let d = decimal "1.23" You can also create these values directly by using the M suffix: let d' = 1.23M, but in your case that doesn't seem relevant.
Regarding your other questions, if you use System.IO.File.ReadLines, then you can get the individual lines of your file as a sequence. Then you can string together a bunch of operations on that sequence to achieve your desired result. For instance, you can take the sequence and use Seq.map <your splitting code here> to split each line (and convert to instances of your specific data type, if desired), and then use Seq.groupBy to group the transactions by account number, and then Seq.map again to apply your summarization logic to each group. Ask follow-up questions if any of this is unclear.
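To make the shape of that pipeline concrete without giving away the F# version, here is a rough sketch of the same idea in Python. The function name `balances` and the sample lines are made up for illustration, and `decimal.Decimal` stands in for System.Decimal; it assumes every non-empty line has exactly three comma-separated fields.

```python
from collections import defaultdict
from decimal import Decimal


def balances(lines):
    """Sum the amounts per account number using exact decimal arithmetic.

    `lines` is any iterable of strings shaped like
    "sequence_number,account_number,amount" (an open file works too).
    """
    totals = defaultdict(Decimal)  # Decimal() is an exact zero
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        _sequence, account, amount = line.split(",")
        totals[account] += Decimal(amount)  # parse the text directly, never via float
    return dict(totals)


sample = ["126,145001,1500.00", "127,145001,-200.10", "128,145002,0.10"]
result = balances(sample)
```

With a real file you would pass the open file object instead of `sample`. Parsing the amount straight from its text form is the point: going through a binary float first is where the rounding errors come from.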

SSIS Derived Column - Parse Text between break returns

I have a text field from a SQL Server Source. It is a phone number field that typically has this format:
Home: 555-555-1212
Work: 555-555-1212
Cell: 555-555-1212
Emergency: 555-555-1212
I'm trying to split among fields so that only 555-555-1212 is displayed
I am then taking this field and converting to a string. There are literally break returns (\r\n) between the labels here. The goal here is to have this data split among multiple fields (home,work,cell,emergency,etc.) I was researching how to split text among fields and I made some progress. In the case of home numbers, I used this logic:
SUBSTRING(Phone_converted,FINDSTRING(Phone_converted,"Home:",1) + 5,FINDSTRING(Phone_converted,"\n",1) - FINDSTRING(Phone_converted,"Home:",1) - 5)
This works great as it parses up to the text return and I get 555-555-1212.
Now I experience an issue when searching for a text between break returns. I tried the same logic for Work numbers:
SUBSTRING(Phone_converted,FINDSTRING(Phone_converted,"Work:",1) + 5,FINDSTRING(Phone_converted,"\n",1) - FINDSTRING(Phone_converted,"Work:",1) - 5)
But that won't work and results in writing to my error redirection file. I then tried to insert a break return to find the text at the beginning
SUBSTRING(Phone_converted,FINDSTRING(Phone_converted,"\nWork:",1) + 5,FINDSTRING(Phone_converted,"\n",1) - FINDSTRING(Phone_converted,"\nWork:",1) - 5)
No luck there either. Any ideas on how I can address this? Also, I would appreciate an idea of how I can handle the Emergency label at the end. There won't be a break return in that situation, but I still want to parse the text.
I look at your data and I see
Home:|555-555-1212|Work:|555-555-1212|Cell:|555-555-1212|Emergency:|555-555-1212
I'm using the pipe character, |, as a placeholder for where I would segment that string, which is basically wherever you have whitespace (space, tab, newline, etc).
There are two approaches to this. I'll start with the easy one.
Script Component
String.Split is your friend here. Look at what it did with that source data
I added a new Script Component, acting as a Transformation and created 4 output columns, all string of length 12 codepage 1252: Home, Work, Cell, and Emergency. I populate them like so
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
string[] split = Row.PhoneData.Split();
Row.Home = split[1];
Row.Work = split[4];
Row.Cell = split[7];
Row.Emergency = split[10];
}
Derived Column
I'm not going to build out a full-blown implementation of this. The above is much simpler, but I run into situations where ETL devs say they aren't allowed to use Script tasks/components, and that's usually because people reached for them first instead of last.
The approach here is to have lots of Derived Columns Components on your Data Flow. It won't hurt performance and in fact can make it easier. It definitely will make your debugging easier as you'll have lots of it to do.
DER Find Colons
This would add 4 columns into the dataflow - HomeColonPosition, WorkColonPosition etc. You've already started down this path but just build it out into the actual data flow as you'll need to reference these positions and again, it's easier to fix the calculation that populates a column versus a calculation that's wrong and used everywhere. Keep in mind that the third argument to FINDSTRING is the occurrence to look for, not a starting position, which is why searching for "\n" with an occurrence of 1 always lands on the first line break.
Thus, Home would be
FINDSTRING(PhoneData, ":", 1)
and Work would just be
FINDSTRING(PhoneData, ":", 2)
Just knowing the position of the 4 colons in that string, I can figure out where the phone numbers are (maybe). The position of the colon + 2 (colon and the space) is the starting point and then go out 12 characters.
Where this approach gets ugly, much as it did with the script approach is when that data isn't consistent.
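The colon-position idea is easy to try outside SSIS first. Below is a rough Python sketch of it; the function name `numbers_after_colons` and the sample numbers are invented for illustration. Note that Python positions are 0-based and `str.find` takes a start offset, so this mirrors the logic rather than the SSIS expression syntax.

```python
def numbers_after_colons(phone_data, width=12):
    """Return the `width` characters that follow each "colon + space" pair."""
    numbers = []
    colon = phone_data.find(":")
    while colon != -1:
        start = colon + 2  # skip the colon and the space after it
        numbers.append(phone_data[start:start + width])
        colon = phone_data.find(":", colon + 1)  # continue after this colon
    return numbers


sample = (
    "Home: 555-555-1212\r\n"
    "Work: 555-555-3434\r\n"
    "Cell: 555-555-5656\r\n"
    "Emergency: 555-555-7878"
)
home, work, cell, emergency = numbers_after_colons(sample)
```

Because the slice is a fixed 12 characters from each colon, the last entry needs no trailing line break. It also shows the weakness mentioned above: a number that isn't exactly 12 characters, or a missing space after the colon, silently yields the wrong slice.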

MUMPS can't format Number to String

I am trying to convert a large number to a string in MUMPS but I can't.
Let me explain what I would like to do :
s A="TEST_STRING#12168013110012340000000001"
s B=$P(A,"#",2)
s TAB(B)=1
s TAB(B)=1
I would like to create an array TAB where variable B will be the primary key for array TAB.
When I do ZWR I will get
A="TEST_STRING#12168013110012340000000001"
B="12168013110012340000000001"
TAB(12168013110012340000000000)=1
TAB("12168013110012340000000001")=1
As you can see, the first SET treats variable B as a number (wrongly converted) and the second SET treats variable B as a string (as I would like to see).
My question is how to write the SET command so that variable B is treated as a string instead of a number (which is wrong in my opinion).
Any advice/explanation will be helpful.
This may be a limitation of the sorting/storage mechanism built into MUMPS, and it differs between MUMPS implementations. The cause is that while variable values in MUMPS are untyped, index values are typed, and numeric indices are sorted before string ones. When a long string of digits is converted to a number, rounding errors may occur. To prevent this from happening, you need to add a space before the number in your index to explicitly treat it as a string:
s TAB(" "_B)=1
As far as I know, InterSystems Caché doesn't have this limitation; at least your code works fine in Caché, and the documentation claims support for up to 309 digits:
http://docs.intersystems.com/cache20141/csp/docbook/DocBook.UI.Page.cls?KEY=GGBL_structure#GGBL_C12648
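The rounding effect described in this answer is not specific to MUMPS. As a loose analogy only, here is what happens in Python when the same digits pass through a binary floating-point number. MUMPS implementations use their own numeric formats with different precision limits, so the exact digits that get lost will differ; the helper name below is made up.

```python
def survives_numeric_round_trip(digits):
    """True if the digit string keeps its exact value after going through a float."""
    return int(float(digits)) == int(digits)


key = "12168013110012340000000001"
short_ok = survives_numeric_round_trip("12345")  # small numbers are safe
long_ok = survives_numeric_round_trip(key)       # 26 digits exceed float precision

# Keeping the key as text (here with a leading space, as in the answer) avoids
# any numeric interpretation, so nothing is rounded.
table = {" " + key: 1}
```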
I've tried to recreate your scenario, but I am not seeing the issue you're experiencing.
It actually is not possible (in my opinion) for the same command, executed twice in a row, to produce two different results.
s TAB(B)=1
s TAB(B)=1
As long as the value of B did not change between the executions, the result should be:
TAB("12168013110012340000000001")=1
Example of what GT.M implementation of MUMPS returns in your case

how do i decode/encode the url parameters for the new google maps?

I'm trying to figure out how to extract the lat/long of the start/end in a Google Maps directions link that looks like this:
https://www.google.com/maps/preview#!data=!1m4!1m3!1d189334!2d-96.03687!3d36.1250439!4m21!3m20!1m4!3m2!3d36.0748342!4d-95.8040972!6e2!1m5!1s1331-1399+E+14th+St%2C+Tulsa%2C+OK+74120!2s0x87b6ec9a1679f9e5%3A0x6e70df70feebbb5e!3m2!3d36.1424613!4d-95.9736986!3m8!1m3!1d189334!2d-96.03687!3d36.1250439!3m2!1i1366!2i705!4f13.1&fid=0
I'm guessing the "!" is a separator between variables, followed by XY where X is a number and Y is a lower case letter, but I can't quite figure out how to reliably extract the coordinates, as the number/order of variables changes as well as their XY prefixes.
ideas?
thanks
Well, this is old, but hey. I've been working on this a bit myself, so here's what I've figured out:
The data is an encoded javascript array, so the trick when trying to generate your own data string is to ensure that your formatting keeps the structure of the array intact. To do this, let's look at what each step represents.
As you've correctly figured out, each exclamation point defines the start of a value definition. The first character, an int value, is an inner count, and (I believe) acts as an identifier, although I'm not 100% certain on this. It seems to be pretty flexible in terms of what you can have here, as long as it's an int. The second character, however, is much more important. It defines the data type of the value. I don't know if I've found all the data types yet, but the ones I have figured out are:
m: matrix
f: float
d: double
i: integer
b: boolean
e: enum (as integer)
s: string
u: unsigned int
x: hexadecimal value?
the remaining characters actually hold the value itself, so a string will just hold the string, a boolean will be '1' or '0', and so on. However, there's an important gotcha: the matrix data type.
The value of the matrix will be an integer. This is the length of the matrix, measured in the number of values. That is, for a matrix !1mx, the next x value definitions will belong to the matrix. This includes nested matrix definitions, so a matrix of form [[1,2]] would look like !1m3!1m2!1i1!2i2 (outer matrix has three children, inner matrix has 2). This also means that, in order to remove a value from the list, you must also check it for matrix ancestors and, if they exist, update their values to reflect the now missing member.
The x data type is another anomaly. I'm going to guess it's hexadecimal encoded for most purposes, but in my particular situation (making a call for attribution info), they appear to also use the x data type to store lat/long information, and this is NOT encoded in hex, but is an unsigned long with the value set as
value = coordinate<0 ? (430+coordinate)*1e7 : coordinate*1e7
An example (pulled directly from google maps) of the x data type being used in this way:
https://www.google.com/maps/vt?pb=!1m8!4m7!2u7!5m2!1x405712614!2x3250870890!6m2!1x485303036!2x3461808386!2m1!1e0!2m20!1e2!2spsm!4m2!1sgid!2sznfCVopRY49wPV6IT72Cvw!4m2!1ssp!2s1!8m11!13m9!2sa!15b1!18m5!2b1!3b0!4b1!5b0!6b0!19b1!19u12!3m1!5e1105!4e5!18m1!1b1
For the context of the question asked, it's important to note that there are no reliable identifiers in the structure. Google reads the values in a specific order, so always keep in mind when building your own encoded data that order matters; you'll need to do some research/testing to determine that order. As for reading, your best hope is to rebuild the matrix structure, then scan it for something that looks like lat/long values (i.e. a matrix containing exactly two children of type double (or x?))
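Here's a rough Python sketch of that "rebuild the structure, then scan it" idea, based only on the format as described above. The function names are mine, and it assumes that string values never contain a literal "!" and that the data part has already been cut out of the URL (everything after "data=" and before any "&").

```python
import re

# index digits, one type letter, then the raw value
TOKEN = re.compile(r"(\d+)([a-z])(.*)", re.S)


def parse_values(data):
    """Rebuild the nested structure from a '!'-separated data string."""
    tokens = [t for t in data.split("!") if t]
    return _parse_span(tokens, 0, len(tokens))


def _parse_span(tokens, start, end):
    nodes = []
    i = start
    while i < end:
        match = TOKEN.fullmatch(tokens[i])
        if match is None:
            raise ValueError(f"unrecognised token: {tokens[i]!r}")
        index, kind, value = int(match.group(1)), match.group(2), match.group(3)
        i += 1
        if kind == "m":
            count = int(value)  # number of following values, nested ones included
            if i + count > end:
                raise ValueError("matrix claims more values than remain")
            nodes.append((index, kind, _parse_span(tokens, i, i + count)))
            i += count
        else:
            nodes.append((index, kind, value))
    return nodes


def find_double_pairs(nodes):
    """Collect every matrix whose only children are exactly two doubles."""
    pairs = []
    for _index, kind, value in nodes:
        if kind != "m":
            continue
        if len(value) == 2 and all(child[1] == "d" for child in value):
            pairs.append((float(value[0][2]), float(value[1][2])))
        pairs.extend(find_double_pairs(value))
    return pairs


tree = parse_values("!1m3!1m2!1i1!2i2")
```

Run against the data part of the URL in the question, `find_double_pairs` picks out two candidate pairs, which look like the start and end points of the route. Treat that as a heuristic: nothing in the format says a two-double matrix must be a coordinate.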
Looks like the developer tools from current browsers (I am using Chrome for that) can give you a lot of info.
Try the following:
Go to Google Maps with Chrome (or adapt the instructions for other browser);
Open Developer Tools (Ctrl + Shift + I);
Go to Network tab. Clear the current displayed values;
Drag the map until some url with encoded data appears;
Click on that url, and then go to the Preview sub-tab;
Try this.
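// Collect every "!3d<lat>!4d<lng>" pair, in the order it appears in the URL.
// (Two separate greedy replace() calls would pair a lat and lng from different points.)
function urlToLatLngs(url) {
  var pattern = /!3d(-?\d+(?:\.\d+)?)!4d(-?\d+(?:\.\d+)?)/g;
  var points = [];
  var match;
  while ((match = pattern.exec(url)) !== null) {
    points.push({ lat: parseFloat(match[1]), lng: parseFloat(match[2]) });
  }
  return points;
}
var points = urlToLatLngs('https://www.google.com/maps/preview#!data=!1m4!1m3!1d189334!2d-96.03687!3d36.1250439!4m21!3m20!1m4!3m2!3d36.0748342!4d-95.8040972!6e2!1m5!1s1331-1399+E+14th+St%2C+Tulsa%2C+OK+74120!2s0x87b6ec9a1679f9e5%3A0x6e70df70feebbb5e!3m2!3d36.1424613!4d-95.9736986!3m8!1m3!1d189334!2d-96.03687!3d36.1250439!3m2!1i1366!2i705!4f13.1&fid=0');
points.forEach(function (p) { console.log(p.lat + ' ' + p.lng); });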

Map and Filter in Haskell

I have two lists of tuples which are as follows: [(String,Integer)] and [(Float,Integer)]. Each list has several tuples.
For every Integer that is paired with a Float in the second list, I need to check whether it matches an Integer in the first list and, if it does, return the corresponding String. The function needs to return a list of Strings, i.e. [String], with all the results.
I have already defined a function which returns a list of Integers from the second list (for the comparison on the integers in the first list).
This should be solvable using "higher-order functions". I've spent a considerable amount of time playing with map and filter but haven't found a solution!
You have a list of Integers from the second list. Let's call this ints.
Now you need to do two things--first, filter the (String, Integer) list so that it only contains pairs with corresponding integers in the ints list and secondly, turn this list into just a list of String.
These two steps correspond to the filter and map respectively.
First, you need a function to filter by. This function should take a (String, Integer) pair and return if the integer is in the ints list. So it should have a type of:
check :: (String, Integer) -> Bool
Writing this should not be too difficult. Once you have it, you can just filter the first list by it.
Next, you need a function to transform a (String, Integer) pair into a String. This will have type:
extract :: (String, Integer) -> String
This should also be easy to write. (A standard function like this actually exists, but if you're just learning it's healthy to figure it out yourself.) You then need to map this function over the result of your previous filter.
I hope this gives you enough hints to get the solution yourself.
One can see in this example how important it is to describe the problem accurately, not only to others but foremost to oneself.
You want the Strings from the first list, whose associated Integer does occur in the second list.
With such problems it is important to do the solutions in small steps. Most often one cannot write down a function that does it right away, yet this is what many beginners think they must do.
Start out by writing the type signature you need for your function:
findFirsts :: [(String, Integer)] -> [(Float, Integer)] -> [String]
Now, from the problem description, we can deduce, that we essentially have two things to do:
Transform a list of (String, Integer) to a list of String
Select the entries we want.
Hence, the basic skeleton of our function looks like:
findFirsts sis fis = map ... selected
  where
    selected = filter isWanted sis
    isWanted :: (String, Integer) -> Bool
    isWanted (_,i) = ....
You'll need the functions fst, elem and snd to fill out the empty spaces.
Side note: I personally would prefer to solve this with a list comprehension, which often results in more readable code (for me, anyway) than a combination of map and filter with nontrivial filter criteria.
Half of the problem is to get the string list if you have a single integer. There are various possibilities to do this, e.g. using filter and map. However you can combine both operations using a "fold":
findAll x axs = foldr extract [] axs
  where
    extract (a,y) runningList
      | x==y      = a:runningList
      | otherwise = runningList
--usage:
findAll 2 [("a",2),("b",3),("c",2)]
--["a","c"]
For a fold you have a start value (here []) and an operation that combines the running values successively with all list elements, either starting from the left (foldl) or from the right (foldr). Here this operation is extract, and you use it to decide whether to add the string from the current element to the running list or not.
Having this part done, the other half is trivial: You need to get the integers from the (Float,Integer) list, call findAll for all of them, and combine the results.
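If the fold itself is the unfamiliar part, it may help to see the same pattern in another language. This is a rough Python equivalent using `functools.reduce`, which is a left fold, so the sketch appends to the running list instead of prepending in order to keep the original order. The name `find_all` simply mirrors `findAll` above.

```python
from functools import reduce


def find_all(x, pairs):
    """Fold over (label, number) pairs, keeping the labels whose number equals x."""
    def extract(running_list, pair):
        label, number = pair
        # Decide per element whether to extend the running result or pass it on.
        return running_list + [label] if number == x else running_list

    return reduce(extract, pairs, [])  # [] is the start value of the fold


found = find_all(2, [("a", 2), ("b", 3), ("c", 2)])
```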

How can I store an array of boolean values in a MySql database?

In my case, every "item" either has a property , or not. The properties can be some hundreds, so I will need , say, max 1000 true/false bits per item.
Is there a way to store those bits in one field of the item ?
If you're looking for a way to do this in a way that's searchable, then no.
A couple searchable methods (involving more than 1 column and/or table):
Use a bunch of SET columns. You're limited to 64 items (on/offs) in a set, but you can probably figure out a way to group them.
Use 3 tables: Items (id, ...), FlagNames(id, name), and a pivot table ItemFlags(item_id, flag_id). You can then query for items with joins.
If you don't need it to be searchable, then all you need is a method to serialize your data before you put it in the database and to unserialize it when you pull it out; then use a char or varchar column.
Use facilities built in to your language (PHP's serialize/unserialize).
Concatenate a series of "y" and "n" characters together.
Bit-pack your values into a string (8 bits per character) in the client before making a call to the MySQL database, and unpack them when retrieving data out of the database. This is the most efficient storage mechanism (if all rows are the same, use char[x], not varchar[x]) at the expense of the data not being searchable and slightly more complicated code.
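For the bit-packing option, the client-side code is short. Here's a minimal Python sketch; the function names and the bit order (lowest bit first within each byte) are arbitrary choices of mine, and whatever you pick just has to be used consistently on both the write and the read side.

```python
def pack_bits(flags):
    """Pack a sequence of true/false values into bytes, 8 flags per byte."""
    packed = bytearray((len(flags) + 7) // 8)  # round up to whole bytes
    for position, flag in enumerate(flags):
        if flag:
            packed[position // 8] |= 1 << (position % 8)
    return bytes(packed)


def unpack_bits(packed, count):
    """Recover the first `count` flags from bytes produced by pack_bits."""
    return [bool(packed[i // 8] & (1 << (i % 8))) for i in range(count)]


flags = [False, True, True, False, True, False, False, True]
stored = pack_bits(flags)
restored = unpack_bits(stored, len(flags))
```

With this layout 1000 flags take 125 bytes, so every row is the same fixed width. The number of flags is not stored in the packed value, so it has to be known from elsewhere when unpacking.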
I would rather go with something like:
Properties
ID, Property
1, FirstProperty
2, SecondProperty
ItemProperties
ID, Property, Item
1021, 1, 10
1022, 2, 10
Then it would be easy to retrieve which properties are set or not with a query for any particular item.
At worst you would have to use a char(1000) [ynnynynnynynnynny...] or the like. If you're willing to pack it (for example, into hex isn't too bad) you could do it with a char(250) [hexadecimal chars], since each hex character holds 4 bits.
If it is less than 64, then the SET type will work, but it seems like that's not enough.
You could use a binary type, but that's designed more for stuff like movies, etc.. so I'd not.
So yeah, it seems like your best bet is to pack it into a string, and then store that.
It should be noted that a VARCHAR would be wasting space, since you do know precisely how much space your data will take, and can allocate it exactly. (Having fixed-width rows is a good thing)
Strictly speaking you can accomplish this using the following:
$bools = array(0,1,1,0,1,0,0,1);
$for_db = serialize($bools);
// Insert the serialized $for_db string into the database. You could use a text type
// to make certain it can hold the entire string.
// To get it back out, read the stored string into $from_db, then:
$bools = unserialize($from_db);
That said, I would strongly recommend looking at alternative solutions.
Depending on the use case you might try creating an "item" table that has a many-to-many relationship with values from an "attributes" table. This would be a standard implementation of the common Entity Attribute Value database design pattern for storing variable points of data about a common set of objects.