How do I change the default behaviour to store the information for each entity in a different table - fiware

I want to configure Orion and Cygnus to store all data in a single table.
I know that I should configure the names of the database and table based on HTTP headers, like so:
dbName=<fiware-service-header>
tableName=<fiware-servicePath-header>_<entityId>_<entityType>
I was told in this post to ask another question.

Cygnus uses the notified fiware-service and fiware-servicePath headers to compose the names of the different backend elements. Specifically:
MySQL
databases are named <fiware-service>
tables are named <fiware-servicePath>_<destination>
HDFS
HDFS paths are created as /user/<your_user>/<fiware-service>/<fiware-servicePath>/<destination>/<destination>.txt
CKAN
organizations are named <fiware-service>
packages/datasets are named <fiware-servicePath>
resources are named <destination>
By default, <destination> is equal to <entityId>_<entityType>. As described in the question, this may lead to the creation of one MySQL table/HDFS folder/CKAN resource for each notified entity.
This default destination generation can be changed by using an advanced feature of Cygnus, pattern-based grouping. As the name suggests, the feature finds (configured) patterns in the data and groups the context data that shows each pattern. For instance, it allows all the entities of a certain type to be stored in a single MySQL table, or all the entities whose id starts with a certain prefix to be stored together in one HDFS file.
To activate this feature, edit the /usr/cygnus/conf/matching_table.conf file and add as many matching rules as you need; the matching rule syntax is described here. Basically, each rule says "once the pattern-based matching is confirmed, use this new <destination> and this new <fiware-servicePath>":
<rule_id>|<list_of_fields_to_be_compared>|<regular_expression>|<new_destination>|<new_fiware-servicePath>
Thus, a "store all the data in a MySQL table called 'my_unique_table'" rule would look like:
<any_unique_number>|<entityId>|.*|unique_table|my_
or:
<any_unique_number>|<entityId>|.*|_table|my_unique
Both rules are valid since MySQL table names are created, as already said, by concatenating <fiware-servicePath> and <destination>; in this case the table name equals "my_"+_+"unique_table" or "my_unique"+"_table".
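To make the rule format concrete, here is a minimal Python sketch of how such a rule line could be parsed and applied. It is not Cygnus's own code: the function names are hypothetical, and it assumes that the list of fields is comma-separated, that the field values are concatenated before matching, and that the regular expression itself contains no "|" character.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    rule_id: str
    fields: list          # names of the entity fields to compare
    pattern: "re.Pattern"  # compiled regular expression
    destination: str      # new <destination>
    service_path: str     # new <fiware-servicePath>


def parse_rule(line: str) -> Rule:
    # Assumption: exactly five "|"-separated parts, no "|" inside the regex.
    rule_id, fields, regex, destination, service_path = line.strip().split("|")
    return Rule(rule_id, fields.split(","), re.compile(regex),
                destination, service_path)


def apply_rules(rules, entity: dict) -> Optional[Tuple[str, str]]:
    """Return (destination, service_path) of the first matching rule, else None."""
    for rule in rules:
        # Assumption: the compared fields are simply concatenated.
        value = "".join(entity[name] for name in rule.fields)
        if rule.pattern.search(value):
            return rule.destination, rule.service_path
    return None  # no rule matched: the default destination would apply


rules = [parse_rule("1|entityId|.*|unique_table|my_")]
result = apply_rules(rules, {"entityId": "Room1", "entityType": "Room"})
print(result)
```

Because the pattern ".*" matches any entityId, every notified entity would be sent to the same destination and service path.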

Related

Querying a database record from flowfile content to retrieve data using apache-nifi

My scenario is as follows.
From one process I retrieve data from a table.
id,user_name
1,sachith
2,nalaka
I need to retrieve account details from account_details table for these ids.
I have tried to use various database related processors. But none of them support flowfile content.
How can I retrieve records only for these ids?
Use the following flow:
ExecuteSQL (account_details)
-> ConvertAvroToJSON
-> EvaluateJsonPath
-> AttributesToJSON
(here you take only the id and ignore the rest)
Take a look at the LookupRecord using a DatabaseRecordLookupService controller service. That should allow you to use the id field to look up additional fields from a database and add them to the outgoing records. This is a common "enrichment" pattern, where the lookups can be done against databases, CSV files, etc.
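The enrichment pattern itself can be illustrated outside NiFi with a short Python sketch. This is not NiFi code and does not show LookupRecord's configuration; it only mimics what the lookup does to each record. The balance column and the in-memory SQLite database are assumptions made for the example.

```python
import csv
import io
import sqlite3


def enrich(flowfile_content: str, conn: sqlite3.Connection) -> list:
    """For each incoming record, look up extra fields by id and merge them in."""
    enriched = []
    for record in csv.DictReader(io.StringIO(flowfile_content)):
        row = conn.execute(
            "SELECT balance FROM account_details WHERE id = ?",
            (int(record["id"]),),
        ).fetchone()
        # Records without a match keep a None value instead of being dropped.
        record["balance"] = row[0] if row else None
        enriched.append(dict(record))
    return enriched


conn = sqlite3.connect(":memory:")
# Hypothetical account_details table; only "id" comes from the question.
conn.execute("CREATE TABLE account_details (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO account_details VALUES (?, ?)",
    [(1, 10.5), (2, 20.0), (3, 99.0)],
)

records = enrich("id,user_name\n1,sachith\n2,nalaka\n", conn)
print(records)
```

Only the rows for ids 1 and 2 are read from account_details, which is the behaviour asked for in the question.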
You can use the QueryRecord processor to query data from flowfiles. You need to set a record reader and a record writer on the processor so that it can parse the incoming content and write the result. To create a query, add a property whose name is the name of the query and whose value is the query itself. Each such property becomes an outgoing relationship named after it, which you can connect to the next processor.
The query syntax is SQL as implemented by Apache Calcite.
You can find further explanation here

Managing Names in MySQL when they can be slightly different (e.g. "Matt" vs "Matthew")

I have a program that pulls sports data from multiple sources and aggregates it in one central location.
I'm using MySQL to manage the data, and I enter a lot of the data using load commands with .csv files.
I'm running into problems however when different sources switch between common spellings of the same name, like "Matthew" and "Matt", or "Michael" and "Mike".
I was thinking of perhaps having my program go through the database after it's loaded and change every name to some standard form, such as firstInitial.LastName.Team, so "Matt Johnson" on the team XYZ would become "M.Johnson.XYZ". This should work, but it seems hacky and leaves me open to cases where there are two players with the same firstInitial.LastName combo on the same team (unlikely, but still).
Just figured I'd see if any of you have other ideas.
You need to add some comparison logic for names that allows for aliases. The precise organization would depend on what operations you wish to implement.
Here are two possible examples:
Check a name against known aliases for a specific person
Here, you narrow down the search to one or more persons by search terms (surname, team, years etc), then check which one's name your version matches.
A possible alias-handling interface here is:
check_name(test_name,suspected_person_id) -> boolean
Possible implementation:
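create table aliases (
    alias varchar(255) not null,
    person_id int not null,
    -- assumes the persons table has a primary key column named id
    foreign key (person_id) references persons (id)
);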
The function would be checking test_name against rows with the appropriate person_id. Do include handling for a case where multiple persons match (i.e. an ambiguity) - at least, make the code throw an exception until you actually encounter such a case in practice and can decide what to do with it.
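A minimal sketch of this first approach, using Python with an in-memory SQLite database in place of MySQL. The find_person helper, the AmbiguousName exception and the sample rows are hypothetical additions for illustration; only check_name and the aliases table come from the description above.

```python
import sqlite3


class AmbiguousName(Exception):
    """Raised when a name matches more than one candidate person."""


def check_name(conn, test_name, suspected_person_id) -> bool:
    # True if test_name is a known alias of the suspected person.
    row = conn.execute(
        "SELECT 1 FROM aliases WHERE alias = ? AND person_id = ?",
        (test_name, suspected_person_id),
    ).fetchone()
    return row is not None


def find_person(conn, test_name, candidate_ids):
    # Narrow the candidates down to the one whose aliases include test_name.
    matches = [pid for pid in candidate_ids if check_name(conn, test_name, pid)]
    if len(matches) > 1:
        raise AmbiguousName(f"{test_name!r} matches persons {matches}")
    return matches[0] if matches else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE aliases (alias TEXT NOT NULL, "
    "person_id INTEGER NOT NULL REFERENCES persons (id))"
)
conn.executemany("INSERT INTO persons VALUES (?, ?)",
                 [(1, "Matthew Johnson"), (2, "Michael Smith")])
conn.executemany("INSERT INTO aliases VALUES (?, ?)",
                 [("Matthew Johnson", 1), ("Matt Johnson", 1),
                  ("Michael Smith", 2), ("Mike Smith", 2)])

print(check_name(conn, "Matt Johnson", 1))
print(find_person(conn, "Mike Smith", [1, 2]))
```

The candidate ids passed to find_person would come from the narrowing search (surname, team, years) described above.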
Guess "reference" name by a possible alias
Here, you need some reference database of alternate forms for given names, with whatever structure it requires.
Your function queries that database, then checks whether any name stored in your DB is in the result. The interface is almost the same; the difference is that you don't need to store aliases yourself:
guess_name(test_name,possible_candidates) -> person_id or multiple ids/error if ambiguous
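A rough sketch of the second approach in Python. The NICKNAMES dictionary stands in for the reference database of alternate forms and contains only hypothetical sample entries; the sketch also assumes names have the shape "Given Surname" and that possible_candidates maps person ids to stored names.

```python
# Hypothetical stand-in for a reference database of alternate given names.
NICKNAMES = {"matt": "matthew", "mike": "michael"}


def name_key(full_name: str):
    """Reduce 'Given Surname' to a comparable (canonical given name, surname) pair."""
    given, _, surname = full_name.strip().partition(" ")
    given = given.lower()
    return NICKNAMES.get(given, given), surname.strip().lower()


def guess_name(test_name: str, possible_candidates: dict):
    """Return the id of the single candidate matching test_name, or None.

    Raises ValueError if more than one candidate matches.
    """
    key = name_key(test_name)
    matches = [pid for pid, stored in possible_candidates.items()
               if name_key(stored) == key]
    if len(matches) > 1:
        raise ValueError(f"{test_name!r} is ambiguous: {matches}")
    return matches[0] if matches else None


candidates = {1: "Matthew Johnson", 2: "Michael Smith"}
print(guess_name("Matt Johnson", candidates))
print(guess_name("Mike Smith", candidates))
```

Raising on ambiguity follows the earlier advice to fail loudly until a real ambiguous case shows what the right handling should be.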

Storing unconfirmed and confirmed data to a database

I am creating a web application using Strongloop using a MySQL database connector.
I want it to be possible, that a user can modify data in the application - but that this data will not be 'saved' until a user expressly chooses to save the data.
On the other hand, this is a web application and I don't want to keep the data in the user's session or local storage - I want this data to be immediately persisted so it can be recovered easily if the user loses their session.
To implement it I am thinking of doing the following, but I'm not sure if this is a good idea, or if there is a better way to be doing this.
This is one way I can implement it without doing too much customization on an existing relation:
add a new generated index as the primary key for the table
add a new generated index that represents the item in the row
this would be generated for new items, or set to an old item's value for edits
add a boolean attribute 'saved'
Data will be written as 'saved=false'. To 'save' the data, the row is marked saved and the old row is deleted. The old row can be looked up by its key, the second attribute in the row.
The way I was thinking of implementing it is to create a base entity called Saveable. Then every database entity that extends Saveable will also have the Saveable properties.
Saveable has:
A generated id number
A generated non id number - the key for the real object
A 'saved' attribute
I would then put a method in Saveable.js to perform the save operation and expose it via the API, and a method to intercept new writes and store them as unsaved.
My question is - is this a reasonable way to achieve what I want?
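For reference, the scheme described above can be sketched in a few lines of Python against SQLite (standing in for MySQL and the Strongloop model layer). The table, column and function names are hypothetical, and the choice to discard older unsaved drafts on save is an assumption of this sketch rather than part of the description.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated row id
        item_key TEXT NOT NULL,                      -- key of the real object
        payload  TEXT,
        saved    INTEGER NOT NULL DEFAULT 0          -- 0 = draft, 1 = saved
    )
""")


def write_draft(conn, payload, item_key=None):
    """Persist a change immediately, but as an unsaved row."""
    item_key = item_key or uuid.uuid4().hex  # new item: generate its key
    with conn:
        conn.execute(
            "INSERT INTO items (item_key, payload, saved) VALUES (?, ?, 0)",
            (item_key, payload),
        )
    return item_key


def save(conn, item_key):
    """Promote the latest draft and delete the rows it replaces."""
    with conn:  # one transaction: both statements apply or neither does
        row = conn.execute(
            "SELECT id FROM items WHERE item_key = ? AND saved = 0 "
            "ORDER BY id DESC LIMIT 1",
            (item_key,),
        ).fetchone()
        if row is None:
            return False  # nothing to save
        # Assumption: the old saved row and any older drafts are dropped.
        conn.execute("DELETE FROM items WHERE item_key = ? AND id <> ?",
                     (item_key, row[0]))
        conn.execute("UPDATE items SET saved = 1 WHERE id = ?", (row[0],))
    return True


def saved_payload(conn, item_key):
    row = conn.execute(
        "SELECT payload FROM items WHERE item_key = ? AND saved = 1",
        (item_key,),
    ).fetchone()
    return row[0] if row else None


key = write_draft(conn, "v1")
save(conn, key)
write_draft(conn, "v2", key)        # edit: stored, but not yet visible as saved
before = saved_payload(conn, key)
save(conn, key)
after = saved_payload(conn, key)
print(before, after)
```

Doing the delete and the update in one transaction means a reader never sees two saved rows, or none, for the same item.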

Transactional Replication to different schemas?

I have database A and database B. I would like to do one way replication from A to B.
The only hitch is that [A].[dbo].[table] needs to replicate to [B].[someschema].[table]. Is this easy (or even possible) to do? The key requirement is that I have real-time sync. I do not need to transform the table definition at all in db B.
Short answer: yes, you can do this, but not without some effort.
FROM BOOKS ONLINE:
Schemas and Object Ownership
Replication has the following default behavior in the New Publication Wizard with respect to schemas and object ownership:
For articles in merge publications with a compatibility level of 90 or higher, snapshot publications, and transactional publications: by default, the object owner at the Subscriber is the same as the owner of the corresponding object at the Publisher. If the schemas that own objects do not exist at the Subscriber, they are created automatically.
For articles in merge publications with a compatibility level lower than 90: by default, the owner is left blank and is specified as dbo during the creation of the object on the Subscriber.
The object owner can be changed through the Article Properties - <Article> dialog box and through the following stored procedures: sp_addarticle, sp_addmergearticle, sp_changearticle, and sp_changemergearticle. For more information, see
http://msdn.microsoft.com/en-us/library/ms151197.aspx
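As a rough illustration of the stored-procedure route, the Python helper below only builds the T-SQL text for an sp_addarticle call that sets a different owner at the Subscriber through its @destination_owner parameter. The helper itself and the publication name MyPublication are hypothetical; the statement would have to be run on the publication database at the Publisher, and the remaining article options are left at their defaults.

```python
def add_article_sql(publication: str, table: str,
                    source_owner: str, destination_owner: str) -> str:
    """Build an sp_addarticle call that maps a table to another schema."""

    def q(value: str) -> str:
        # Quote a value as a T-SQL Unicode string literal.
        return "N'" + value.replace("'", "''") + "'"

    return (
        "EXEC sp_addarticle\n"
        f"    @publication = {q(publication)},\n"
        f"    @article = {q(table)},\n"
        f"    @source_owner = {q(source_owner)},\n"
        f"    @source_object = {q(table)},\n"
        f"    @destination_table = {q(table)},\n"
        f"    @destination_owner = {q(destination_owner)};"
    )


# 'MyPublication' is a placeholder name, not taken from the question.
sql = add_article_sql("MyPublication", "table", "dbo", "someschema")
print(sql)
```

With these arguments, [dbo].[table] at the Publisher is published as an article whose owner at the Subscriber is someschema.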

What's the best way to save trivial user states (e.g. dismissed welcome msg) in database?

Should I use (create) a column for every new state? Or one field with a bunch of comma separated states (alternatively a json obj)? Any suggestions welcome.
UPDATE
First let me say thanks for the answers. I just want to clear up what options I see:
Put a column for every state in the user row (initial plan) / Can get messy with lots of states (in the future)
Put one column with json/xml data in the user row / Easy to maintain (no db change required), but doesn't feel right
Have a dedicated states table (thx lhiles) / Sounds cool, what would this table look like?
I'm looking for pros/cons of the different implementations. Again: Thanks!
Create a column for each state. This is proper data normalization.
With a column for each state you can retrieve as few or as many states as needed for the current operation.
All of the states returned will be contained in a single row with each column named. This makes referencing each state value very easy.
It allows you to easily add constraints to each state as needed. (State X can only contain '1' or '2'.)
It allows you to easily query states across users. (How many users have set a state value to 'X'?)
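A small sketch of the column-per-state layout, written in Python against an in-memory SQLite database. The table and column names are made up for the example, and CHECK constraints are used to express the "can only contain" rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id                INTEGER PRIMARY KEY,
        name              TEXT NOT NULL,
        -- one column per state, each with its own constraint
        welcome_dismissed INTEGER NOT NULL DEFAULT 0
                          CHECK (welcome_dismissed IN (0, 1)),
        theme             TEXT NOT NULL DEFAULT 'light'
                          CHECK (theme IN ('light', 'dark'))
    )
""")
conn.executemany(
    "INSERT INTO users (name, welcome_dismissed, theme) VALUES (?, ?, ?)",
    [("ann", 1, "dark"), ("bob", 0, "light"), ("cy", 1, "light")],
)

# Retrieve only the state needed for the current operation.
dismissed = conn.execute(
    "SELECT welcome_dismissed FROM users WHERE name = ?", ("ann",)
).fetchone()[0]

# Query a state across users.
dismissed_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE welcome_dismissed = 1"
).fetchone()[0]

# The constraint rejects values outside the allowed set.
try:
    conn.execute("INSERT INTO users (name, theme) VALUES ('dee', 'neon')")
    constraint_rejected = False
except sqlite3.IntegrityError:
    constraint_rejected = True

print(dismissed, dismissed_count, constraint_rejected)
```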
My preferred method is to create a dedicated table for user settings. Each state/setting corresponds to a column within that table. As your project grows, additional columns can be added without cluttering your app's core data.
Another route, if you feel that there will be too many settings to devote 1 setting per column, would be to store the settings as XML (or json as you mentioned) data within SQL. This would allow you to derive any type of state format you wanted, however, it puts more work on the programmer to parse, validate, and persist those settings.
You can save state using an ENUM if the states are mutually exclusive; e.g. person is male or female.
Or using a SET if states can co-exist; e.g. person is a member of (AA and CA and SOsA*)
A sample table using both:
CREATE TABLE test.table1(
test_enum ENUM('male', 'female') DEFAULT 'male',
test_set SET('AA', 'CA', 'SOsA') DEFAULT NULL
)
ENGINE = INNODB;
If you're using an ENUM I personally would recommend you set an explicit default value other than null, because most of the time a choice must be made.
Link: http://dev.mysql.com/doc/refman/5.1/en/constraint-enum.html
* (stackoverflow sufferers anonymous)
I really wouldn't do this with a column per setting, as most of the other people are suggesting. I'd do a setting per row because this doesn't require schema changes (including upgrade scripts) every time you add a setting.
Better yet, write some reflection code to run on app startup that'll look at the entries on an enum and automatically create a record in the database (with some default value that you specify in a custom attribute on each enum value).
I recently did something like I'm indicating above. Now to add a new setting, I add an entry to an enum and that's it. New settings take about 10 seconds.
I may put my code up on CodeProject; it has made development easy.
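This is not the code mentioned above, only a minimal Python sketch of the same idea: one row per setting, with the rows created from an enum at startup. Python enums have no custom attributes, so the defaults live in a separate dictionary here; all table, enum and function names are hypothetical.

```python
import enum
import sqlite3


class Setting(enum.Enum):
    WELCOME_DISMISSED = "welcome_dismissed"
    ITEMS_PER_PAGE = "items_per_page"


# Default value per setting (stands in for a custom attribute on each entry).
DEFAULTS = {Setting.WELCOME_DISMISSED: "false", Setting.ITEMS_PER_PAGE: "20"}


def sync_settings(conn):
    """Run at startup: create a row for every enum entry that has none yet."""
    with conn:
        for setting in Setting:
            conn.execute(
                "INSERT OR IGNORE INTO settings (name, default_value) VALUES (?, ?)",
                (setting.value, DEFAULTS[setting]),
            )


def set_setting(conn, user_id, setting, value):
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO user_settings (user_id, name, value) "
            "VALUES (?, ?, ?)",
            (user_id, setting.value, value),
        )


def get_setting(conn, user_id, setting):
    # Fall back to the default when the user has no row for this setting.
    row = conn.execute(
        "SELECT COALESCE(us.value, s.default_value) FROM settings s "
        "LEFT JOIN user_settings us ON us.name = s.name AND us.user_id = ? "
        "WHERE s.name = ?",
        (user_id, setting.value),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, default_value TEXT)")
conn.execute(
    "CREATE TABLE user_settings (user_id INTEGER, name TEXT, value TEXT, "
    "PRIMARY KEY (user_id, name))"
)
sync_settings(conn)

default_value = get_setting(conn, 1, Setting.WELCOME_DISMISSED)
set_setting(conn, 1, Setting.WELCOME_DISMISSED, "true")
user_value = get_setting(conn, 1, Setting.WELCOME_DISMISSED)
print(default_value, user_value)
```

Adding a setting then means adding one enum entry and one default; no schema change or upgrade script is involved.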