F# CsvProvider with different column order

Suppose I define a type:
type MyType = CsvProvider<"schema.csv",
Schema="A->MyA=int, B->MyB=int">
and then load a CSV file like this:
let csv1 = MyType.Load("file1.csv")
Suppose "file1.csv" contains all the columns that "schema.csv" has, but in a different order, plus extra columns that do not appear in "schema.csv". Can I still load it, given that I am only interested in the columns specified in "schema.csv"?

Either you have a locked schema for your CSV files and use CsvProvider, or you don't.
You always have the option of "reverting" to CsvFile (CsvParser): http://fsharp.github.io/FSharp.Data/library/CsvFile.html
With the latter you can easily parse any CSV file, confirm that it has the columns you want, and then read them as needed.
I usually revert to CsvFile, since CSV files are often created in a somewhat unstructured and apparently ad hoc way (at least in the cases I have encountered). CsvFile is a good solution then, with somewhat more flexibility than CsvProvider. Yes, it takes somewhat more code too, but still...
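The "parse, confirm the columns, then read by name" idea can be sketched as follows. This is an illustration in Python using the standard csv module, not the FSharp.Data CsvFile API; the helper name read_columns and the sample data are made up for the example.

```python
import csv
import io

def read_columns(text, wanted):
    """Return rows restricted to `wanted`, whatever the column order."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    # Confirm that every column we care about is present before reading.
    missing = [name for name in wanted if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Look values up by name, so order and extra columns do not matter.
    return [{name: int(row[name]) for name in wanted} for row in reader]

# Stand-in for file1.csv: columns reordered, plus an extra column "C".
sample = "C,B,A\nx,2,1\ny,4,3\n"
rows = read_columns(sample, ["A", "B"])
```

Because the values are looked up by header name, reordering the columns or adding new ones does not change the result; only a missing column is treated as an error.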

That use case is not supported. If the column order is different, things won't work. The whole CsvProvider is built on the assumption that the data you give it has the same structure as the sample you provided. You can always submit an issue here: https://github.com/fsharp/FSharp.Data/issues/

Related

Rest API design with multiple unique ids

Currently, we are developing an API for our system and there are some resources that may have different kinds of identifiers.
For example, there is a resource called orders, which may have a unique order number and also a unique id. At the moment, we only have URLs for the id:
GET /api/orders/{id}
PUT /api/orders/{id}
DELETE /api/orders/{id}
But now we also need the possibility to use order numbers, which would normally result in:
GET /api/orders/{orderNumber}
PUT /api/orders/{orderNumber}
DELETE /api/orders/{orderNumber}
Obviously that won't work, since id and orderNumber are both numbers.
I know that there are some similar questions, but they don't help me, because the answers don't really fit or their approaches are not really RESTful or comprehensible (for us and for possible developers using the API). Additionally, some of the questions and answers are more than 7 years old.
To name a few:
1. Using a query param
One suggestion is to use a query param, e.g.
GET /api/orders/?orderNumber={orderNumber}
I think there are a lot of problems with this. First, this is a filter on the orders collection, so the result should be a list as well. However, there is only one order for a unique order number, which is a little confusing. Secondly, we already use such a filter to search for a subset of orders. Additionally, a query param is a kind of second-class parameter, but the identifier should be first-class in this case. This is even a problem if the object does not exist: normally a GET would return a 404 (not found), but GET /api/orders/?orderNumber=1234 would return an empty array if order 1234 does not exist.
2. Using a prefix
Some public APIs use some kind of a discriminator to distinguish between different types, e.g. like:
GET /api/orders/id_1234
GET /api/orders/ordernumber_367652
This works for their approach, because id_1234 and ordernumber_367652 are their real unique identifiers that are also returned by other resources. However, that would result in a response object like this:
{
"id": "id_1234",
"ordernumber": "ordernumber_367652"
//...
}
This is not very clean, because the type (id or order number) is modelled twice. Apart from the problem of changing all identifiers and response objects, this would be confusing if you want to, e.g., search for all order numbers greater than 67363 (so there is also a string/number clash). If the response does not include the type as a prefix, a user has to add it for some requests, which would also be very confusing (sometimes you have to add it and sometimes not...).
3. Using a verb
This is what e.g. Twitter does: their URL ends with show.json, so you can use it like:
GET /api/orders/show.json?id=1234
GET /api/orders/show.json?number=367652
I think this is the worst solution, since it is not RESTful. Furthermore, it has some of the problems that I mentioned for the query param approach.
4. Using a subresource
Some people suggest modelling this as a subresource, e.g.:
GET /api/orders/1234
GET /api/orders/id/1234 //optional
GET /api/orders/ordernumber/367652
I like the readability of this approach, but I think the meaning of /api/orders/ordernumber/367652 would be "get (just) the order number 367652" and not the order. Finally, this breaks some best practices like using plurals and only real resources.
So finally, my questions are: Did we miss something? And are there other approaches? I don't think this is an unusual problem.
To me, the most RESTful way of solving your problem is approach number 2 with a slight modification.
From a theoretical point of view, you just have a valid identification code to identify your order. At this point of the design process, it isn't important whether your identification code is an id or an order number. It's something that uniquely identifies your order, and that's enough.
The fact that you have an ambiguity between ids and numbers format is an issue belonging to the implementation phase, not the design phase.
So for now, what we have is:
GET /api/orders/{some_identification_code}
and this is very RESTful.
Of course you still have the problem of solving your ambiguity, so we can proceed with the implementation phase. Unfortunately, your set of order identification codes is made of two distinct entities that share the same format. Clearly, that can't work. But now the problem lies in the definition of these entity formats.
My suggestion is very simple: ids will be integers, while numbers will be codes such as N1234567. This approach will make your resource representation acceptable:
{
"id": "1234",
"ordernumber": "N367652"
//...
}
Additionally, it is common in many scenarios such as courier shipments.
Here is an alternate option that I came up with that I found slightly more palatable.
GET /api/orders/1234
GET /api/orders/1234?idType=id //optional
GET /api/orders/367652?idType=ordernumber
The reason is that it keeps the path consistent with REST conventions, and the service can then pick up on idType=ordernumber if it is passed (an idType of id is the default).
I'm struggling with the same issue and haven't found a perfect solution. I ended up using this format:
GET /api/orders/{orderid}
GET /api/orders/bynumber/{orderNumber}
Not perfect, but it is readable.
I'm also struggling with this! In my case, I only really need to be able to GET using the secondary ID, which makes this a little easier.
I am leaning towards using an optional prefix to the ID:
GET /api/orders/{id}
GET /api/orders/id:{id}
GET /api/orders/number:{orderNumber}
Or this could be a chance to use an obscure feature of the URI specification, path parameters, which let you attach parameters to particular path elements:
GET /api/orders/{id}
GET /api/orders/{id};id_type=id
GET /api/orders/{orderNumber};id_type=number
The URL using an unqualified ID is the canonical one. There are two options for the behaviour of non-canonical URLs: either return the entity, or redirect to the canonical URL. The latter is more theoretically pure, but it may be inconvenient for users. Or it may be more useful for users, who knows!
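For the optional-prefix variant (id:{id} and number:{orderNumber}), the handler needs to split the last path segment into an identifier type and a value. A minimal sketch in Python, assuming that a bare value means an id; the function name parse_order_ref is hypothetical and not part of any framework:

```python
def parse_order_ref(segment):
    """Split a path segment into (kind, value); bare values count as ids."""
    kind, sep, value = segment.partition(":")
    if not sep:
        # No prefix given: treat the whole segment as the canonical id.
        kind, value = "id", segment
    if kind not in ("id", "number"):
        raise ValueError(f"unknown identifier type: {kind!r}")
    if not value.isdecimal():
        raise ValueError(f"not a numeric identifier: {value!r}")
    return kind, int(value)
```

With this, "1234" and "id:1234" both resolve to ("id", 1234), so the handler can either serve the order directly or redirect the prefixed form to the canonical URL.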
Another way to approach this is to model an order number as its own thing:
GET /api/ordernumbers/{orderNumber}
This could return a small object with just the ID, which users could then use to retrieve the entity. Or even just redirect to the order.
If you also want a general search resource, then that can also be used here:
GET /api/orders?number={orderNumber}
In my case, I don't want such a resource (yet), and I would be uncomfortable adding what appears to be a general search resource that only supports one field.
So basically, you want to treat all ids and order numbers as unique identifiers for the order records. The thing about unique identifiers is, of course, they have to be unique! But your ids and order numbers are all numeric; do their ranges overlap? If, say, "1234" could be either an id or an order number, then obviously /api/orders/1234 is not going to reference a unique order.
If the ranges do not overlap, then you just need discriminator logic in the handler code for /api/orders/{id} that can tell an id from an order number. This could actually work if, say, your order numbers have more digits than your ids ever will. But I expect you would have done this already if you could.
If the ranges might overlap, then you must at least force the references to them to have unique ranges. The simplest way would be to add a prefix when referring to an order number, e.g. the prefix "N". So that if the order with id 1234 has order number 367652, it could be retrieved with either of these calls:
/api/orders/1234
/api/orders/N367652
But then, either the database must change to include the "N" prefix in all order numbers (you say this is not possible) or else the handler code would have to strip off the "N" prefix before converting to int. In that case, the "N" prefix should only be used in the API calls - user facing data-entry forms should not expose it! You can't have a "lookup by any identifier" field where users can enter either id or order number (this would have a non-uniqueness problem anyway.) Instead, you must have separate "lookup by id" and "lookup by order number" options. Then, you should be able to have the order number input handler automatically add the "N" prefix before submitting to the API.
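The second option (the database keeps plain integers, and only the API uses the "N" prefix) could look roughly like this. It is a sketch with in-memory dictionaries standing in for the database; find_order, to_api_number and the sample order are assumptions made for illustration.

```python
# In-memory stand-ins for the database: order id 1234 has order number 367652.
ORDERS_BY_ID = {1234: {"id": 1234, "ordernumber": 367652}}
ORDERS_BY_NUMBER = {o["ordernumber"]: o for o in ORDERS_BY_ID.values()}

def find_order(segment):
    """Resolve /api/orders/{segment}; None means the handler should return 404."""
    if segment.startswith("N") and segment[1:].isdecimal():
        # Strip the API-only prefix before looking up the integer order number.
        return ORDERS_BY_NUMBER.get(int(segment[1:]))
    if segment.isdecimal():
        return ORDERS_BY_ID.get(int(segment))
    return None

def to_api_number(user_input):
    """Add the prefix to an order number typed into a 'lookup by order number' form."""
    return "N" + user_input.strip()
```

Here /api/orders/1234 and /api/orders/N367652 return the same order, while /api/orders/367652 is treated as an id lookup and finds nothing.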
Fundamentally, this is a problem with the database design - if this (using values from both fields as "unique identifiers") was a requirement, then the database fields should have been designed with this in mind (i.e. with non-overlapping ranges) - if you can't change the order number format, then the id format should have been different.

Drools: Merged cells in CSV

I have a set of Drools rules stored in an Excel document that for various reasons needs to be replaced with a .csv file. The problem is that .csv files don't support merged cells, making it difficult if not impossible to properly convert the rules.
After a lot of googling, I found references to using "..." to indicate merged cells, but no explicit examples on how to use it. Documentation found in the source code gives a few more hints, but is still too ambiguous; I've tried countless different interpretations of it without any success.
Any help would be appreciated.
We had the same issue as you. After reviewing their source code (CsvParser + DefaultRuleSheetListener), I found the solution. Hopefully this post saves you some time.
Only specify ... in the ObjectType matching row, i.e. the one below the CONDITION/ACTION row, from the beginning of the merged cell to its end. Please note that for the continuation cells you cannot use just "...": after normalizing and trimming, the code treats it as an empty cell and silently ignores it. Put some text in front, such as a..., b..., etc.
Please also note that Drools uses a buffered reader, not a CSV reader, so it cannot handle a cell value spanning multiple lines, unless you supply your own CSVParser that uses a CSVReader.
Here is a simplified example.
CONDITION,CONDITION,CONDITION,ACTION,ACTION
$Client:Client(),$Product:Product()...,anythingButNotJust3Dots...,,
"clientType == ""$param""","planType == ""$param""","accountType == ""$param""","documents.add(""$param"");","documents.add(""$param"");"
INDIVIDUAL,RRSP,CASH,document1,document2
INDIVIDUAL,RESP,CASH,document2,
INDIVIDUAL,RIF,CASH,document3,
INDIVIDUAL,,MARGIN,document4,document6
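If the table is generated by a script rather than typed by hand, a CSV writer takes care of the doubled quotes shown above. The following Python sketch rebuilds the first rows of the example with the standard csv module; it only produces the text and makes no claim about how Drools reads it beyond what is described above.

```python
import csv
import io

rows = [
    ["CONDITION", "CONDITION", "CONDITION", "ACTION", "ACTION"],
    # The "merged" cell starts at $Product and continues in the next column.
    ["$Client:Client()", "$Product:Product()...", "anythingButNotJust3Dots...", "", ""],
    ['clientType == "$param"', 'planType == "$param"', 'accountType == "$param"',
     'documents.add("$param");', 'documents.add("$param");'],
    ["INDIVIDUAL", "RRSP", "CASH", "document1", "document2"],
]

# A cell spanning several lines would break a line-by-line reader, so reject it.
assert all("\n" not in cell for row in rows for cell in row)

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
table_csv = buffer.getvalue()
```

The writer quotes only the cells that contain a double quote and escapes each inner quote by doubling it, which matches the third row of the example.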

When could CSV records *not* have the same number of fields?

I am storing a series of events in a CSV file; each event type comes with a different set of data.
To illustrate, say I have two events (there will be many more):
Running, which has a data set containing speed and incline.
Sleeping, which has a data set containing snores.
There are two options to store this data in CSV records:
Option A
Storing each possible item of data in its own field...
speed, incline, snores
therefore...
15mph, 20%, ,
, , 12
16mph, 20%, ,
14mph, 20%, ,
Option B
Storing each event in its own record...
event, value1...
therefore...
running, 15mph, 20%
sleeping, 12
running, 16mph, 20%
running, 14mph, 20%
Without a specific CSV specification, the consensus seems to be:
Each record "should" contain the same number of comma-separated fields.
Context
There are a number of events which each have a large & different set of data values.
CSV data is to be of use to other developers (I will/could/should/won't use either structure).
The 'other developers' are likely to be toward the novice end of the spectrum and/or using resource-limited systems. CSV is accessible.
The CSV format is being provided non-exclusively, as a feature and not a requirement. However, if the application provides a CSV file, it should be provided in the correct manner from now on.
Question
Would it be valid – in this case - to go with Option B?
Thoughts
Option B maintains a level of human readability, which is an advantage if the CSV is read by a human rather than a program. Neither method is more complex to parse using a custom parser, but will Option B void the usefulness of the CSV format with other libraries, frameworks, applications et al.? With Option A, future changes/versions to the data set of an individual event may break the CSV structure (zombie , , to maintain forwards compatibility), whereas Option B will fail gracefully.
edit
This may be aimed at students and frameworks like OpenFrameworks, Plask, Processing et al. where CSV is easier to implement.
Any "other frameworks, libraries and applications" I've ever used all handle CSV parsing differently, so trying to conform to one or many of these standards might over-complicate your end result. My recommendation would be to keep it simple and use what works for your specific task. If human readability is a requirement, then CSV in the form of Option B would work fine. Otherwise, you may want to consider JSON or XML.
As you say, there is no "CSV Standard" with regard to contents. The real answer depends on what you are doing and why. You mention "other frameworks, libraries and applications". The one thing I've learnt is "Don't over-engineer", i.e. don't write reams of code today on the assumption that you will plug it into some other framework tomorrow.
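To give an idea of what reading Option B involves, here is a small Python sketch using the standard csv module. The FIELDS table is an assumption based on the two events in the question; a real application would list its own events there.

```python
import csv
import io

# Hypothetical field names per event type, taken from the question's example.
FIELDS = {"running": ["speed", "incline"], "sleeping": ["snores"]}

def parse_events(text):
    """Parse Option B records: an event name followed by that event's values."""
    events = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        name, values = record[0], record[1:]
        names = FIELDS.get(name)
        # Fail loudly on unknown events or a wrong number of values.
        if names is None or len(values) != len(names):
            raise ValueError(f"unexpected record: {record}")
        events.append({"event": name, **dict(zip(names, values))})
    return events

sample = "running, 15mph, 20%\nsleeping, 12\nrunning, 16mph, 20%\n"
events = parse_events(sample)
```

A generic CSV reader copes with records of different lengths; what the consumer needs in addition is the mapping from event name to field names, which Option A would otherwise carry in its header row.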
I'd say option B is fine, unless you have specific requirements to use other apps etc.
< edit >
Having re-read your context, I'd probably pick one output format and use it, and forget about having multiple formats:
Having multiple output formats is a source of inconsistency (e.g. bug in one format but not another).
Having multiple formats means more code that needs to be tested, documented and supported.
< /edit >
Is there any reason you can't use XML? Yes, it's slightly more difficult to parse, at least for novices, but if so they probably need the practice. File size would be much greater, of course, but it's compressible.

Parse and change elements from database to the XML file

I have an XML file, items.xml, with (almost) the same values as my items table. For example, the items table has a level field that is set to 144 for some id, but in the XML file the level= attribute is set to "1" for the same id. What is the best way to correct values like this?
It should go like this:
Check the level value in the database table for each id.
If the level value from the database differs from the level="" attribute for this id, set the attribute to the value from the database.
It can be kind of hard, since there are about 40000 records to check.
I would appreciate some examples as well!
Depending on what programming-language you are using, find the corresponding StAX-implementation. For Java I would go with XMLStreamReader (JavaDocs) and XMLStreamWriter (JavaDocs). You should find some tutorials on the internet.
When you encounter the START_ELEMENT event while reading the XML, check the tag's name (getLocalName()). If you are on the correct tag, check for the attributes, i.e. using the getAttribute...()-methods and handle the writing differently.
Alongside all of this, use an XMLStreamWriter to write your new XML to some OutputStream. Finally, just write the OutputStream to wherever you wish (File, etc.).
Don't forget to read your Input-XML using a BufferedInputStream (or some other buffered way).
Good luck!
P.S.: You can also use XMLEventReader or XMLEventWriter, but personally I prefer XMLStreamReader / XMLStreamWriter. Also, you could use different StAX-Implementations like Woodstox.
P.P.S.: For PHP use XMLReader and XMLWriter.
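As a rough illustration of the read, compare and rewrite loop, here is a Python sketch. It uses xml.etree.ElementTree and loads the whole document into memory instead of streaming it as StAX does, which should still be fine for about 40000 records. The element name item, the attribute names id and level, and the db_levels dictionary (standing in for the result of a database query) are assumptions, since the question does not show the file.

```python
import xml.etree.ElementTree as ET

def sync_levels(xml_text, db_levels):
    """Set each item's level attribute to the database value; return (xml, changed)."""
    root = ET.fromstring(xml_text)
    changed = 0
    for item in root.iter("item"):
        wanted = db_levels.get(item.get("id"))
        # Only touch items that exist in the database and differ from it.
        if wanted is not None and item.get("level") != str(wanted):
            item.set("level", str(wanted))
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed

xml_in = '<items><item id="1" level="1" /><item id="2" level="7" /></items>'
xml_out, changed = sync_levels(xml_in, {"1": 144, "2": 7})
```

Loading all levels from the database into one dictionary up front avoids issuing a separate query for each of the records.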

Listing of All MySQL Data Types and Syntax For All Settings

I'm looking for a listing of all MySQL data types and the available settings for each option for each data type.
After a bit of googling I couldn't find anything quite like that.
Here you can find a quick summary of MySQL data types, with range, attributes and default value.
For completeness' sake, don't forget the MySQL documentation.
Although the list is broken across multiple pages, often with a lot of commentary in between, it's a useful resource when you need to check some aspect of a particular type. There are also overviews of the basic types, but again, there's a lot of cruft mixed in with it.
If anyone ever needs them as a JSON array:
"[\"TINYINT[(M)]\", \"SMALLINT[(M)]\", \"MEDIUMINT[(M)]\", \"INT[(M)]\", \"BIGINT[(M)]\", \"FLOAT(p)\", \"FLOAT[(M,D)]\", \"DOUBLE[(M,D)]\", \"DECIMAL[(M,[D])]\", \"BIT[(M)]\", \"CHAR[(M)]\", \"VARCHAR(M)\", \"TINYTEXT\", \"TEXT\", \"MEDIUMTEXT\", \"LONGTEXT\", \"BINARY[(M)]\", \"VARBINARY(M)\", \"TINYBLOB\", \"BLOB\", \"MEDIUMBLOB\", \"LONGBLOB\", \"ENUM(\\\"A1\\\",\\\"A2\\\",...)\", \"SET(\\\"A1\\\",\\\"A2\\\",...)\", \"DATE\", \"DATETIME\", \"TIME\", \"TIMESTAMP\", \"YEAR\"]"