Merge two collections excluding default values in immutable js - immutable.js

I want to merge two records created with the same constructor.
Record A gets initialized with values for the fields a,b,c while record B gets initialized with a value only for foo.
The constructor has default values for all fields, so both records have a,b,c,foo as fields.
Now I want to merge Record B "on top of" A, so that the new record contains a,b,c from A and foo from B.
What actually happens is that B completely overrides the values in A (admittedly, this sounds logical).
Is there a known / easy way to merge the records, excluding default values? I am thinking of writing a function that recognizes the constructor, finds the default values from a config file, and has some logic to exclude default values, but that sounds error prone (how do I differentiate between a default value, and a value that is legitimate, but is exactly like the default?).
Also, I am working in an existing codebase and would like to make changes as small as possible.

I think you want mergeWith (see the Immutable.js docs).
It might even make sense to hang a method off of either / both type A and type B to expose your custom merge logic. This would allow you to more easily identify default values (since presumably they'll be in scope) as well as provide convenient access.
Usage would look something like:
a instanceof A; //=> true
b instanceof B; //=> true
a.mergeB(b); //=> a w/ some or all of b's data
b.mergeA(a); //=> b w/ some or all of a's data
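The custom merge logic could look roughly like the sketch below. It uses plain objects rather than Immutable.js records so that it runs on its own; DEFAULTS, mergeSkippingDefaults and the field values are made up for illustration. The same comparison could presumably live in the merger function you pass to mergeWith. Note that it cannot tell a default from a legitimate value that happens to equal the default, which is exactly the ambiguity raised in the question.

```javascript
// Hypothetical defaults shared by both records (not taken from the question).
const DEFAULTS = { a: 0, b: 0, c: 0, foo: null };

// Start from `base`, then copy over only those fields of `overlay`
// whose value differs from the default for that field.
function mergeSkippingDefaults(defaults, base, overlay) {
  const result = { ...base };
  for (const key of Object.keys(defaults)) {
    if (overlay[key] !== defaults[key]) {
      result[key] = overlay[key];
    }
  }
  return result;
}

const recordA = { ...DEFAULTS, a: 1, b: 2, c: 3 }; // only a, b, c were set
const recordB = { ...DEFAULTS, foo: "bar" };       // only foo was set
const merged = mergeSkippingDefaults(DEFAULTS, recordA, recordB);
console.log(merged);
```

Neither input is modified, which keeps the helper in line with the immutable style of the surrounding code.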

Related

SSIS consolidate and concatenate multiple rows into single rows without using SQL

I am trying to accomplish something that is pretty easy to do in SQL, but seemingly very challenging to do in SSIS without using SQL. Basically, I need to consolidate and concatenate a field of a many-to-one relationship.
Given entities: [Contract Item] (many) to (one) [Account]
There is a field [ari_productsummary] that contains the product listed on the Contract Item entity. We want to write that value to the Account as [ari_activecontractitems]. However, an Account may have more than one Contract Item record associated to it, in which case, we want to concatenate those values. We also only want the distinct values to be concatenated (distinct rows already solved within my data flow).
This can be accomplished by writing to a temporary table, and then using a query or view to obtain the summarized results as follows. I created a SQL table called TESTTABLE that contains the [ari_productsummary] from the Contract Item entity along with the referring [accountid] to map it back to Account. I then wrote the following query as a view:
SELECT distinct accountid,
(SELECT TT2.ari_productsummary + '; '
FROM TESTTABLE TT2
WHERE TT2.accountid = TT.accountid
FOR XML PATH ('')
) AS 'ari_activecontractitems'
FROM TESTTABLE TT
Executing that query gives me the results that I want, which I can then use for importing into the Account entity.
But how do I do this in an SSIS data flow without writing to a SQL table as a temporary placeholder for the data? I want to do the entire process inside one data flow container, without using a temporary SQL table/view. The whole summarization process needs to be done on the fly.
Does anyone have a solution that doesn't require a temporary SQL table/view/query, but is contained entirely within the data flow?
I am using VS 2017 and the KingswaySoft Dynamic CRM 365 ETL toolset to develop my solution/package.
Spitballing here, as I don't have Dynamics nor do I have the custom components.
Data Flow 1 - Contract aggregation
The purpose of this data flow is to replicate your logic in the elegant query you provided and shove that into a Cache Connection Manager (see Notes for 2008+ at the end)
KingswaySoft Dynamics Source -> Script Component -> Cache Transform
If you want to keep the sort in there, do it before the Script Component (inside a data flow, the script object is a Script Component rather than a Script Task). The implementation I'll take with the Script Component is that it's fully blocking - that is, all the rows must arrive before it can send any on. Components like the Merge Join are only partially blocking because the requirement of sorted data means that once you no longer have a match for the current item, you can send it on down the pipeline.
The Script Component is going to be an asynchronous transformation. You'll have two output columns, your key accountid and your new derived column of ari_activecontractitems. That column might need to be big - you'll know your data best, but if it's a blob type in Dynamics (> 4k unicode or > 8k ascii characters) then you'll have to define the data type as DT_TEXT/DT_NTEXT.
As inputs, you'll select accountid and ari_productsummary from your source.
The code should be pretty easy. We're going to accumulate the inbound data into a Dictionary.
// member variable
Dictionary<string, List<string>> accumulator;
In the PreExecute method, we'll tack this in there to initialize our variable
// initialize in PreExecute method
accumulator = new Dictionary<string, List<string>>();
In the per-row method (Input0_ProcessInputRow in the generated code, name approx)
// simulate the inbound queue
// row_id would be something like Row.accountid,
// invoice would be something like Row.ari_productsummary
if (!accumulator.ContainsKey(row_id))
{
// Create an empty list for this key
accumulator.Add(row_id, new List<string>());
}
// add the value if we don't have it yet
if (!accumulator[row_id].Contains(invoice))
{
accumulator[row_id].Add(invoice);
}
Once you get the signal that no more data is available, that's when you start buffering output data. The auto generated code will have placeholders for all this.
// This is how we shove data out the pipe
foreach(var kvp in accumulator)
{
// approximately thus; the output columns are the two defined above
OutputBuffer1.AddRow();
OutputBuffer1.accountid = kvp.Key;
OutputBuffer1.ari_activecontractitems = string.Join("; ", kvp.Value);
}
We have an upcoming release that comes with a component that does exactly what you are trying to achieve without the need of writing custom code. The feature is currently under preview, please reach out to us for private access to the feature. You can find our contact information on our website.
UPDATE - June 5, 2020, we have made the components available for public access at https://www.kingswaysoft.com/products/ssis-productivity-pack/ as a result of our 2020 Release Wave 1. We have two components available that serve this kind of purpose. The Composition component will take input values and transform them into a composite value in an SSIS column. The Decomposition component does the opposite: it takes an input value and splits it into multiple rows using either delimiter-based text splitting or XML/JSON array splitting.

Indexeddb sorting with multiple indexes

I have a file object store indexed by name and library_id, like below:
let objectStore = db.createObjectStore('file', { keyPath: 'id' });
objectStore.createIndex('nameLibId', ['attributes.name', 'attributes.library_id'], { unique: false });
The object store contains files from multiple library ids. I'd like to apply the name sort to a particular library id's files. I tried querying the index in the format below, but it returns empty data.
let self = this,
db = get(self, 'db'),
transaction = db.transaction(["file"], "readonly"),
objectStore = transaction.objectStore("file"),
index = objectStore.index('nameLibId'),
keyRange = IDBKeyRange.only('library_id'),
req = index.getAll(keyRange);
req.onsuccess = ((e)=>{
console.log(e.target.result); // returns empty array
});
Attached the screenshot of db model for reference.
The files named 24536475, abc, created, jhgf and lastmodified belong to a library id called 123.
The files named Screen Shot..* belong to another library id called 234.
I need the files sorted by name for only the given library id. Any help would be highly appreciated.
If your index is based on a properties array and you want to match something using IDBKeyRange.only, then your parameter to IDBKeyRange.only should also be an array. Right now you are comparing a basic string value against a properties array value, where of course nothing matches. In other words, you cannot query against a two-part array using only one part of it.
Furthermore, the parameter to IDBKeyRange.only isn't a property name, it is a value. You want to specify a value to match in the index's set of keypath values. For example, if your index was based exclusively on attributes.name, then you would want to specify a particular value within that index, such as "abc".
And so, taking into account the above two points, and given that your index is not a single value but is instead an array of two properties, you need to revise your parameter to IDBKeyRange.only to look for an array. Something like IDBKeyRange.only(['abc', 'yoktc....']);.
Now, this is further complicated by the fact that what you are doing in your code does not actually accomplish what you want. Ignoring the sort concern for a moment, you only want to use the id condition, and not the name, when matching rows of this index. So you might be tempted to try IDBKeyRange.only([undefined, 'asdf']). Unfortunately this will not work at all because you cannot specify undefined (you will get a javascript error).
So, you must always query by both values, even though you only want to apply criteria to one of the values. The trick here is that you switch to using a different method than only. You use IDBKeyRange.bound(), and furthermore, you do a trick where you specify a criteria such as "smallest possible number is less than my number and my number is less than largest possible number", e.g. a condition that always is true. You use "smallest possible value" as your lower boundary, and "largest possible value" as your upper boundary.
Here is an example in your case. The smallest possible value of name I think is empty string. The largest possible value of name is probably any non-alphanumeric character, so let's use tilde "~". So, now we would rewrite the range parameter. Instead of using IDBKeyRange.only, we use IDBKeyRange.bound. It looks like the following (roughly):
var libId = ???;
var smallestNameValue = '';
var largestNameValue = '~';
var lowerBound = [smallestNameValue, libId];
var upperBound = [largestNameValue, libId];
var range = IDBKeyRange.bound(lowerBound, upperBound);
Now, the second part, regarding sorting, and a major caveat of using indices that have multiple parts (not to be confused with the multiEntry index property, ugh). And I myself get this backwards all the time, so I might even be wrong here and the above will work. The problem with the above is that once the first criterion is met the second is ignored, because of how the short-circuited array sorting algorithm works in indexedDB's comparison function. Your query is going to match everything, because every index row meets the criteria. So the trick to this is to always query first by the important condition, to basically pay attention to the order in which you specify your conditions. So what that means is that you need to switch the order of the properties you specified when creating the index, so that you can query first by libId and then by name.
Instead of createIndex('nameLibId',['attributes.name','attributes.library_id']); you want to do createIndex('nameLibId',['attributes.library_id', 'attributes.name']);. And this also means you need to swap your lower and upper bound queries, e.g. var lowerBound = [libId, smallestNameValue]; (and don't forget to switch the upper).
As I mentioned in my answer on using compound indices, you can always use indexedDB.cmp to experiment. Right now, open up the console on this web page. In the console, type something like this:
indexedDB.cmp(['', '5'], ['~', '5']);
Take a look at the results.
Some final notes:
Tilde might be the wrong thing to use, sorry but I am not bothering to remember, you could also just try any valid sentinel value, where by sentinel I mean any value you know will always come after all your other valid values
As I point out in my other answer, if either prop is missing in the data the actual object won't match
for cmp, -1 means left is less than right, 0 means left equals right, and 1 means left greater than right
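To see why the order of the index parts matters without opening a database, here is a rough sketch that imitates the comparison for string keys and arrays of strings only. cmpKeys and inRange are hypothetical helpers, not part of the IndexedDB API, and the rows are invented; the real indexedDB.cmp also handles numbers, dates and binary keys.

```javascript
// Simplified stand-in for indexedDB.cmp: strings and arrays of strings only.
function cmpKeys(left, right) {
  if (Array.isArray(left) && Array.isArray(right)) {
    const shared = Math.min(left.length, right.length);
    for (let i = 0; i < shared; i++) {
      const result = cmpKeys(left[i], right[i]);
      if (result !== 0) return result; // the first differing element decides
    }
    return Math.sign(left.length - right.length);
  }
  if (left < right) return -1;
  if (left > right) return 1;
  return 0;
}

// Mimics an inclusive IDBKeyRange.bound(lower, upper) test.
function inRange(key, lower, upper) {
  return cmpKeys(lower, key) <= 0 && cmpKeys(key, upper) <= 0;
}

// Made-up rows in the spirit of the question.
const rows = [
  { name: "abc", libraryId: "123" },
  { name: "jhgf", libraryId: "123" },
  { name: "Screen Shot 1", libraryId: "234" },
];
const libId = "123";

// Index as [name, library_id]: the name decides first, so every row matches.
const nameFirst = rows.filter((r) =>
  inRange([r.name, r.libraryId], ["", libId], ["~", libId])
);

// Index as [library_id, name]: only library 123 falls inside the bounds.
const libFirst = rows.filter((r) =>
  inRange([r.libraryId, r.name], [libId, ""], [libId, "~"])
);

console.log(nameFirst.length, libFirst.map((r) => r.name));
```

With the name in first position the bounds span every possible name, so the library id is never consulted; with the library id first, the range is confined to one library and the rows within it come back ordered by name.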

IndexedDB - boolean index

Is it possible to create an index on a Boolean type field?
Lets say the schema of the records I want to store is:
{
id:1,
name:"Kris",
_dirty:true
}
I created a normal non-unique index (in onupgradeneeded):
...
store.createIndex("dirty","_dirty",{ unique: false })
...
The index is created, but it is empty! - In the index IndexedDB browser there are no records with Boolean values - only Strings, Numbers and Dates or even Arrays.
I am using Chrome 25 canary
I would like to find all records that have _dirty attribute set to true - do I have to modify _dirty to string or int then?
Yes, boolean is not a valid key.
If you must, of course you can resort to 1 and 0.
But it is for good reason. Indexing a boolean value is not informative. In your case above, you can do a table scan and filter on the fly, rather than an index query.
The answer marked as checked is not entirely correct.
You cannot usefully index a property that contains values of the Boolean JavaScript type. That part of the other answer is correct. If you have an object like var obj = {isActive: true};, you can still create an index on isActive, but Boolean values are not valid keys, so the object is silently left out of that index (which is why the index in the question looks empty).
However, you can easily simulate the desired result. indexedDB does not insert properties that are not present in an object into an index. Therefore, you can define a property to represent true, and not define the property to represent false. When the property exists, the object will appear in the index. When the property does not exist, the object will not appear in the index.
Example
For example, suppose you have an object store of 'obj' objects. Suppose you want to create a boolean-like index on the isActive property of these objects.
Start by creating an index on the isActive property. In the onupgradeneeded callback function, use store.createIndex('isActive','isActive');
To represent 'true' for an object, simply use obj.isActive = 1;. Then add or put the object into the object store. When you want to query for all objects where isActive is set, you simply use db.transaction('store').objectStore('store').index('isActive').openCursor();.
To represent false, simply use delete obj.isActive; and then add or put the object into the object store.
When you query for all objects where isActive is set, these objects that are missing the isActive property (because it was deleted or never set) will not appear when iterating with the cursor.
Voila, a boolean index.
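A small sketch of that convention, runnable without a database: withActiveFlag and simulateIsActiveIndex are hypothetical helpers, and the filter only imitates the way an index skips objects that have no valid key at the key path.

```javascript
// Prepare an object for storage: 1 stands for true, a missing property for false.
function withActiveFlag(obj, isActive) {
  const copy = { ...obj };
  if (isActive) {
    copy.isActive = 1;
  } else {
    delete copy.isActive;
  }
  return copy;
}

// Stand-in for the index: only objects with a usable (here: numeric) key appear.
function simulateIsActiveIndex(objects) {
  return objects.filter((o) => typeof o.isActive === "number");
}

const stored = [
  withActiveFlag({ id: 1, name: "Kris" }, true),
  withActiveFlag({ id: 2, name: "Sam" }, false),
  { id: 3, name: "Alex", isActive: true }, // a raw Boolean is not a valid key
];

const activeIds = simulateIsActiveIndex(stored).map((o) => o.id);
console.log(activeIds);
```

The third object shows the original problem: a raw Boolean never makes it into the index, while the 1-or-absent convention does.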
Performance notes
Opening a cursor on an index, as was done in the example here, will provide good performance. The difference in performance is not noticeable with small data, but it is extremely noticeable when storing a larger number of objects. There is no need to adopt some third party library to accomplish 'boolean indices'. This is a mundane and simple feature you can do on your own. You should try to use the native functionality as much as possible.
Boolean properties describe an exclusive state ('Active/Inactive', 'On/Off', 'Enabled/Disabled', 'Yes/No'). You can use these value pairs instead of Boolean in the JS data model for readability. This tactic also allows you to add other states later ('NotSet', for the situation where something was not configured in the object, etc.).
I've used 0 and 1 instead of boolean type.

Define custom POST method for MyDAC

I have three tables: objects (primary key object_ID), flags (primary key flag_ID) and object_flags (cross table between objects and flags with some extra info).
I have a query returning all flags, and a one or zero if a given object has a certain flag:
SELECT
f.*,
of.*,
of.object_ID IS NOT NULL AS object_has_flag
FROM
flags f
LEFT JOIN object_flags of
ON (f.flag_ID = of.flag_ID) AND (of.object_ID = :objectID);
In the application (which is written in Delphi), all rows are loaded in a component. The user can assign flags by clicking check boxes in a table, modifying the data.
Suppose one line is edited. Depending on the value of object_has_flag, the following things have to be done:
If object_has_flag was true and still is true, an UPDATE should be done on the relevant row in object_flags.
If object_has_flag was false but is now true, an INSERT should be done.
If object_has_flag was true, but is now false, the row should be deleted.
It seems that this cannot be done in one query https://stackoverflow.com/questions/7927114/conditional-replace-or-delete-in-one-query.
I'm using MyDAC's TMyQuery as a dataset. I have written separate code that executes the necessary queries to save changes to a row, but how do I couple this to the dataset? What event handler should I use, and how do I tell the TMyQuery that it should refresh instead of post?
EDIT: apparently, it is not completely clear what the problem is. The standard UpdateSQL, DeleteSQL and InsertSQL cannot be used because sometimes after editing a line (not deleting it or inserting a line), an INSERT or DELETE has to be done.
The short answer is, to paraphrase your answer here:
Look up the documentation for "Updating Data with MyDAC Dataset Components" (as of MyDAC 5.80).
Every TCustomDADataSet (such as TMyQuery) descendant has the capability to set update SQL statements using SQLInsert, SQLUpdate and SQLDelete properties.
TMyUpdateSQL is also a promising component for custom update operations.
It seems that the easiest way is to use the BeforePost event, and determine what has to be done using the OldValue and NewValue properties of several fields.
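The decision such a BeforePost handler has to make can be sketched independently of Delphi. The snippet below is JavaScript purely for illustration; decideStatement is a hypothetical helper, and its two arguments stand for the OldValue and NewValue of the object_has_flag field.

```javascript
// Map the old and new value of object_has_flag to the statement to run.
function decideStatement(hadFlag, hasFlag) {
  if (hadFlag && hasFlag) return "UPDATE";  // row exists and is kept: save its extra info
  if (!hadFlag && hasFlag) return "INSERT"; // flag was just checked
  if (hadFlag && !hasFlag) return "DELETE"; // flag was just unchecked
  return "NONE";                            // never had the flag, still does not
}

console.log(decideStatement(true, true));
console.log(decideStatement(false, true));
console.log(decideStatement(true, false));
console.log(decideStatement(false, false));
```

The fourth case (false before and after) is not listed in the question; treating it as "nothing to do" is an assumption.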

"Diffing" objects from a relational database

Our win32 application assembles objects from the data in a number of tables in a MySQL relational database. Of such an object, multiple revisions are stored in the database.
When storing multiple revisions of something, sooner or later you'll ask yourself the question if you can visualize the differences between two revisions :) So my question is: what would be a good way to "diff" two such database objects?
Would you do the comparison at the database level? (Doesn't sound like a good idea: too low-level, and too sensitive to the schema).
Would you compare the objects?
Would you write a function that "manually" compares the properties and fields of two objects?
How would you store the diff? In a separate, generic "TDiff" object?
Any general recommendations on how to visualize such things in a user interface?
Advice, or stories about your own experiences with this, are very welcome; thanks a bunch!
Extra info on use case (20090515)
In reply to Antony's comment: this specific application is used to schedule training courses, run by teams of teachers. The schedule of a teacher is stored in various tables in the database, and contains info such as "where does she have to go on which day", "who are her colleagues in the team", etc. This information is spread out over multiple tables.
Once in a while, we "publish" the schedule, so the teachers can see it on a webpage. Each "publication" is a revision, and we'd like to be able to show the users (and later also the teachers) what's changed between two publications --- if anything.
Hope that makes the scenario a bit more tangible :)
Some final remarks
Well, the bounty has come to an end, so I've accepted an answer. If it'd somehow be possible to slice a couple of extra 100's off of my rep and give it to some of the other answers, I would do so without hesitation. All your guys' help has been great, and I am very grateful! ~ Onno 20090519
Just an idea, but would it be worthwhile for you to convert the two object versions being compared to some text format and then compare these text objects using an existing diff program - like diff for example? There are lots of nice diff programs out there that can offer nice visual representations, etc.
So for example
Text version of Object 1:
first_name: Harry
last_name: Lime
address: Wien
version: 0.1
Text version of Object 2:
first_name: Harry
last_name: Lime
address: Vienna
version: 0.2
The diff would be something like:
3,4c3,4
< address: Wien
< version: 0.1
---
> address: Vienna
> version: 0.2
Assume that a class has 5 known properties - date, time, subject, outline, location. When I look at my schedule, I'm most interested in the most recent (ie current/accurate) version of these properties. It would also be useful for me to know what, if anything, has changed. (As a side note, if the date, time or location changed, I'd also expect to get an email/sms advising me in case I don't check for an updated schedule :-))
I would suggest that the 'diff' is performed at the time the schedule is amended. So, when version 2 of the class is created, record which values have changed, and store this in two 'changelog' fields on the version 2 object (there must already be one parent table that sits atop all your tables - use that one!). One changelog field is 'human readable text' eg 'Date changed from Mon 1 May to Tues 2 May, Time changed from 10:00am to 10:30am'. The second changelog field is a delimited list of changed fields eg 'date,time'. To do this, before saving you would loop over the values submitted by the user, compare to current database values, and concatenate 2 strings, one human readable, one a list of field names. Then, update the data and set your concatenated strings as the 'changelog' values.
When displaying the schedule load the current version by default. Loop through the fields in the changelog field list, and annotate the display to show that the value has changed (a * or a highlight, etc). Then, in a separate panel display the human readable change log.
If a schedule is amended more than once, you would probably want to combine the changelogs between version 1 & 2, and 2 & 3. Say in version 3 only the course outline changed - if that was the only changelog you had when displaying the schedule, the change to date and time wouldn't be displayed.
Note that this denormalised approach won't be great for analysis - eg working out which specific location always has classes changed out of it - but you could extend it using an E-A-V model to store the change log.
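The two changelog strings described above could be built along these lines. This is only a sketch: buildChangelog is a hypothetical helper, and the subject, outline and location values are invented to fill out the five properties.

```javascript
// Compare submitted values with the current ones and build both changelog fields.
function buildChangelog(current, submitted) {
  const changedFields = [];
  const messages = [];
  for (const field of Object.keys(submitted)) {
    if (submitted[field] !== current[field]) {
      changedFields.push(field);
      messages.push(`${field} changed from ${current[field]} to ${submitted[field]}`);
    }
  }
  return {
    fields: changedFields.join(","), // delimited list of changed field names
    text: messages.join(", "),       // human readable summary
  };
}

const version1 = {
  date: "Mon 1 May",
  time: "10:00am",
  subject: "Maths",
  outline: "Intro",
  location: "Room 1",
};
const version2 = { ...version1, date: "Tues 2 May", time: "10:30am" };

const changelog = buildChangelog(version1, version2);
console.log(changelog.fields);
console.log(changelog.text);
```

To cover a schedule that was amended more than once, the same function could be run against the last version the viewer actually saw rather than the immediately preceding one.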
Doing a comparison at the database level would be good if what you cared about was changes to the database. That makes the most sense if you're trying to design a layer of generic functionality on top of the database itself.
Doing a comparison at the object level would be good if you care about changes to the data. For example, if the data was the input to a program and you were interested in looking at changes in the input to verify that changes to the output were correct.
Your use case doesn't appear to be either of these. You appear to care about the output and want differences from that perspective. If that's the case, I would do differences on the output report (or a pure-text version of it) instead of on the underlying data. You can do that with any off-the-shelf diff tool. To make things easier for your end-users you could parse the diff results and render them as HTML. There are lots of options here: side-by-side with color coding to indicate changes, one document with markup for changes (e.g. red strikethrough for deletions and green for additions), maybe just highlight areas that have changed and use balloons to show the previous/current values on demand.
I've thought about doing database comparisons but never tried to implement it. As you noted, any such attempts are intimately intertwined with the schema.
I have done object-level comparisons. The general algorithm was this:
Do a set comparison on the lists of object IDs. This creates three result groupings: added objects, deleted objects, and objects that live in both sets.
Report the deletions.
Report the additions.
For the things in both sets, do an attribute-by-attribute comparison.
If any differences are found, report the object ID, the attributes that differ, and the respective values. If appropriate, highlight the portion of the attribute value that has changed.
In my case, the comparison algorithms were hand-written to match the object attributes. This gave me control over which attributes were compared and how. A generic comparator might be possible for some cases but would depend on the situation and at least partially on the implementation language.
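The steps above can be sketched roughly as follows. diffRevisions is a hypothetical, deliberately generic comparator that treats every attribute the same way, unlike the hand-written comparison described above; the sample objects are invented.

```javascript
// Compare two revisions, each given as a list of objects with an `id`.
function diffRevisions(oldObjects, newObjects) {
  const oldById = new Map(oldObjects.map((o) => [o.id, o]));
  const newById = new Map(newObjects.map((o) => [o.id, o]));

  // Set comparison on the ids: deleted, added, and present in both.
  const deleted = [...oldById.keys()].filter((id) => !newById.has(id));
  const added = [...newById.keys()].filter((id) => !oldById.has(id));

  // Attribute-by-attribute comparison for objects that live in both sets.
  const changed = [];
  for (const [id, before] of oldById) {
    if (!newById.has(id)) continue;
    const after = newById.get(id);
    const attributes = new Set([...Object.keys(before), ...Object.keys(after)]);
    for (const attribute of attributes) {
      if (before[attribute] !== after[attribute]) {
        changed.push({ id, attribute, from: before[attribute], to: after[attribute] });
      }
    }
  }
  return { added, deleted, changed };
}

const revision1 = [
  { id: 1, address: "Wien" },
  { id: 2, address: "Graz" },
];
const revision2 = [
  { id: 1, address: "Vienna" },
  { id: 3, address: "Linz" },
];

const report = diffRevisions(revision1, revision2);
console.log(JSON.stringify(report));
```

The strict !== comparison only suits primitive attribute values; nested objects or dates would need their own comparison rule, which is where hand-written comparators earn their keep.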
I've looked into MySQL diffing a number of times. Unfortunately, there aren't any really good solutions available.
One tool I've tried was mysqldiff (www.mysqldiff.org). mysqldiff is a tool written in PHP which is capable of diffing MySQL schemas. Unfortunately, it doesn't do a great job a lot of the time.
MySQL Workbench, MySQL's own SQL IDE, provides the option to generate an alter script, and I would imagine it does this by performing some kind of diff operation internally.
Aqua Data Studio is another tool that is capable of comparing schemas and outputting a diff of the two. While the ADS diff is quite nice, it does not provide a tool to create an alter script.
If I were writing my own, I guess I would write code capable of comparing the structure of two tables. Such code could be tuned to be highly sensitive (e.g. if column order differs from one version to the next, it's a difference) or more moderately sensitive (e.g. column order is not a major issue, but datatypes and lengths are important, as are indices and constraints).
Storage, I'm not too sure. I would look into how a version control system such as Mercurial stores its diff information for revisions and use that to elaborate a method appropriate for the DB.
Finally, for visual output I recommend you take a look at the Aqua Data Studio compare feature (you can use the trial version to test this...). Its diff output is pretty good.
My application dbscript compares hierarchical data (database schemas) in a stored procedure, which of course has to compare each field/property of every object with its counterpart. I guess you won't get around that step (unless you have a generic object description model)
As for the UI part of your question, have a look at screenshots to view and select differences.
I would think about some sort of common text representation of the objects and compare the texts with an existing diffing tool like WinMerge.
I see no need to invent diffing by myself since there are already plenty of nice tools I can use.
In your situation, in PostgreSQL, I used difference tables with this schema:
create table history_columns (
column_id smallint primary key,
column_name text not null,
table_name text not null,
unique (table_name, column_name)
);
create temporary sequence column_id_seq;
insert into history_columns
select nextval('column_id_seq'), column_name, table_name
from information_schema.columns
where
table_name in ('table1','table2','table3')
and table_schema=current_schema() and table_catalog=current_database();
create table history (
column_id smallint not null references history_columns,
id int not null,
change_time timestamp with time zone not null
constraint change_time_full_second -- only one change allowed per second
check (date_trunc('second',change_time)=change_time),
primary key (column_id,id,change_time),
value text
);
And on the tables I used a trigger like this:
create or replace function save_history() returns trigger as
$$
begin
-- "[for each column_name] { ... }" is pseudocode: repeat the block once per tracked column
if (tg_op = 'DELETE') then
insert into history values (
find_column_id('id',tg_relname), OLD.id,
date_trunc('second',current_timestamp),
OLD.id );
[for each column_name] {
if (char_length(OLD.column_name)>0) then
insert into history values (
find_column_id(column_name,tg_relname), OLD.id,
OLD.change_time, OLD.column_name
);
end if;
}
return OLD;
elsif (tg_op = 'UPDATE') then
[for each column_name] {
if (OLD.column_name is distinct from NEW.column_name) then
insert into history values (
find_column_id(column_name,tg_relname), OLD.id,
OLD.change_time, OLD.column_name
);
end if;
}
return NEW;
end if;
end;
$$ language plpgsql volatile;
create trigger save_history_table1
before update or delete on table1
for each row execute procedure save_history();
This isn't really an answer to the question you asked, but rather an attempt to re-imagine the problem. Would you consider altering your database and object model to store the aggregate root and a series of deltas? That is, model and store RevisionSets that are collections of Revisions; a Revision is an entity property paired with a value. In a sense this internalizes into your architecture the revision structure that the other posters are suggesting you bolt on to what you already have via "logs".
It's trivial to display the aggregate from the deltas, and even easier to display the deltas as a change history. The fact that you are using a rich client with state and local memory makes this even more compelling. You could very easily display "all the changes since date xxxx" without revisiting the database.
Credit for the basic idea goes to Greg Young and his work with financial data streams, but it is eminently applicable to your problem.
I'm riffing off of what Harry Lime suggested: Output your properties to text format, then hash the results. That way you can compare the hash values and easily flag the data that has been altered. This way you get the best of both worlds, as you can visually see differences but programmatically identify differences. With the hash you'll have a good source for an index should you want to store and retrieve the deltas.
Given you want to create a UI for this and need to indicate where the differences are, it seems to me you can either go custom or create a generic object comparer - the latter being dependent on the language you are using.
For the custom method, you need to create a class that takes two instances of the class to be compared. It then returns the differences:
public class Person
{
public string Name;
}
public class PersonComparer
{
private readonly Person oldPerson;
private readonly Person newPerson;
// "new" is a reserved word in C#, so the parameters are named oldPerson/newPerson
public PersonComparer(Person oldPerson, Person newPerson)
{
this.oldPerson = oldPerson;
this.newPerson = newPerson;
}
public bool NameIsDifferent() { return oldPerson.Name != newPerson.Name; }
public string NameDifferentText() { return NameIsDifferent() ? "Name changed from " + oldPerson.Name + " to " + newPerson.Name : ""; }
}
This way you can use the PersonComparer object to create your GUI.
The generic approach would be much the same, just that you generalize the calls, and use object inspection (getObjectProperty call below) to find differences:
public class ObjectComparer
{
private readonly object oldObject;
private readonly object newObject;
public ObjectComparer(object oldObject, object newObject)
{
this.oldObject = oldObject;
this.newObject = newObject;
}
// getObjectProperty is a placeholder for whatever inspection/reflection call your language offers
public bool PropertyIsDifferent(string propertyName) { return !Equals(getObjectProperty(oldObject, propertyName), getObjectProperty(newObject, propertyName)); }
public string PropertyDifferentText(string propertyName) { return PropertyIsDifferent(propertyName) ? propertyName + " changed from " + getObjectProperty(oldObject, propertyName) + " to " + getObjectProperty(newObject, propertyName) : ""; }
}
I would go for the second, as it makes it really easy to change the GUI as needs change. For the GUI, I would try 'yellowing' the differences to make them easy to see - but that depends on how you want to show the differences.
Getting the object to compare would be loading your object with the initial revision and latest revision.
My 2 cents... Not as techy as the database compare stuff already here.
Have you looked at Open Source DiffKit?
www.diffkit.org
I think it does what you want.
Example with Oracle.
Export ordered objects to text with dbms_metadata
Export ordered tables data into CSV or query format
Make big text file
Diff