Add row into many to many relationship table after row creation with Talend - many-to-many

I am using Talend Open Studio for Data Integration to create a job that's supposed to do the following:
1. Get a row from table1.
2. If a row with a certain value doesn't exist in table2, add a row into table2.
3. Add a row into a many-to-many relationship table using the id from table1's row and the id from table2's (existing or newly created) row.
For example, let's say I have the tables PROFILE, USER and REL_PROFILES_USERS. I need to populate USER starting from PROFILE, and every PROFILE row with the same name will be associated with the same USER.
So I start with the PROFILE table:
id | name | address
-----------------------------------
1 | jsmith | 1234 Main Street
2 | jasonp | 321 Secondary Street
3 | jsmith | NULL
and I want to end up with USER populated this way:
id | username
----------------
1 | jsmith
2 | jasonp
and REL_PROFILES_USERS
user | profile
----------------
1 | 1
1 | 3
2 | 2
I have managed to do points 1 and 2 with a simple tMap between PROFILE and USER, and I am populating REL_PROFILES_USERS after the first tMap using a separate subjob with a tMap and an inner join on all matches between USER and PROFILE on PROFILE.name = USER.username.
What I'd like to do is populate REL_PROFILES_USERS in the same subjob where I populate USER, right after the new row has been inserted or when an already existing row is found, without relying on the relationship between USER and PROFILE (that username = name) but only on the fact that I'm working on those rows.

Hi, you didn't state which database you're using, so I'm going to assume MS SQL for the purposes of discussion. Talend has equivalent database components for the major databases, plus generic JDBC for anything else.
I suggest trying this flow
tMSSqlInput (from profile) -> tMSSqlOutput (on user) -> tMSSqlLastInsertId (to get the inserted id) -> tMSSqlOutput (on rel_profiles_users)
Alternatively, you could turn on identity insert in tMSSqlOutput on the first insert. That way you know the PK of the row inserted into the user table. Then you can proceed to insert into the rel_profiles_users table. I recommend against generating your own PK for the first table unless this is the only process which inserts records into this table. Calling a sequence in the database to get the next sequence number is a bit safer than generating your own sequence.
tMSSqlInput (from profile) -> (generate ID perhaps by calling get next sequence or generate your own in tMap) -> tMSSqlOutput (on user) -> tMSSqlOutput (on rel_profiles_users)
Have fun and good luck.
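For anyone who wants to see the insert-then-link logic outside of Talend, here is a minimal sketch in Python, with an in-memory SQLite database standing in for the real one. The lowercase table names and the helpers build_demo and link_profiles_to_users are made up for illustration, and cursor.lastrowid plays roughly the role that tMSSqlLastInsertId plays in the first flow.

```python
import sqlite3


def build_demo():
    """Create the three tables from the question and load the sample PROFILE rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
        CREATE TABLE "user" (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT UNIQUE);
        CREATE TABLE rel_profiles_users ("user" INTEGER, profile INTEGER);
        INSERT INTO profile VALUES (1, 'jsmith', '1234 Main Street'),
                                   (2, 'jasonp', '321 Secondary Street'),
                                   (3, 'jsmith', NULL);
    """)
    return conn


def link_profiles_to_users(conn):
    """For each profile row, reuse or create the matching user row, then link the two."""
    cur = conn.cursor()
    profiles = cur.execute("SELECT id, name FROM profile ORDER BY id").fetchall()
    for profile_id, name in profiles:
        row = cur.execute('SELECT id FROM "user" WHERE username = ?', (name,)).fetchone()
        if row is None:
            cur.execute('INSERT INTO "user" (username) VALUES (?)', (name,))
            user_id = cur.lastrowid  # id generated by the insert just above
        else:
            user_id = row[0]  # the user already exists, reuse its id
        cur.execute(
            'INSERT INTO rel_profiles_users ("user", profile) VALUES (?, ?)',
            (user_id, profile_id),
        )
    conn.commit()


if __name__ == "__main__":
    conn = build_demo()
    link_profiles_to_users(conn)
    print(conn.execute('SELECT * FROM "user"').fetchall())
    print(conn.execute("SELECT * FROM rel_profiles_users").fetchall())
```

The point is that the relationship row is written while the user id is still in hand, so no second pass joining USER and PROFILE is needed.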

Related

Keep certain rows as constant in a MySQL table

I have a situation where I have a table, for example:
| id | type |
------------------
| 0 | Complete |
| 1 | Zone |
Now, I always want my database to be populated with these values, but additionally users should be able to CRUD their own custom types beyond these. For example, a user might decide they want a "Partial Zone" type:
| id | type |
---------------------
| 0 | Complete |
| 1 | Zone |
| 2 | Partial Zone |
This is all fine. But I don't want anyone to be able to delete/modify the first and second rows.
This seems like it should be so simple, but is there a common strategy for handling this case that ensures that these rows go unaffected? Should I put a lock column on the table and only lock these two values when I initially populate the database on application setup? Is there something much more obvious and elegant that I am missing?
Unless I'm missing something, you should be able to just add a third column to your table for the user ID/owner of the record. For the Complete and Zone records, the owner could be e.g. user 0, which would correspond to an admin. In your deletion logic, just check the ID column and do not allow admin records to be deleted by anyone from the application.
If this won't work, you could also consider having two tables, one for system records which cannot be deleted, and another one for user created records. You would have to possibly always take a union of the two tables when you query.
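A minimal sketch of the owner-column idea, written in Python with SQLite standing in for MySQL. The table name types, the column owner_id, the constant SYSTEM_OWNER and the function names are all assumptions for illustration; the check lives in the application's delete logic, as described above.

```python
import sqlite3

SYSTEM_OWNER = 0  # hypothetical owner id reserved for the built-in rows


def create_schema(conn):
    """Create the table with an owner column and seed the two protected rows."""
    conn.executescript("""
        CREATE TABLE types (
            id INTEGER PRIMARY KEY,
            type TEXT NOT NULL,
            owner_id INTEGER NOT NULL
        );
        INSERT INTO types VALUES (0, 'Complete', 0), (1, 'Zone', 0);
    """)


def delete_type(conn, type_id, user_id):
    """Delete a row only if the requesting user owns it and it is not a system row.

    Returns True when a row was actually deleted.
    """
    cur = conn.execute(
        "DELETE FROM types WHERE id = ? AND owner_id = ? AND owner_id <> ?",
        (type_id, user_id, SYSTEM_OWNER),
    )
    conn.commit()
    return cur.rowcount == 1
```

An update function would carry the same WHERE clause, so the two seeded rows can only be changed outside the application.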

Better way for choosing next best max id for a shard

I have a use case for range-based sharding in MySQL, i.e.:
---------------------
shard | upperbound
---------------------
1 | 500
2 | 1000
3 | 1500
This means shard 1 holds users 1-500, shard 2 holds users 501-1000, and so on. Later on, as the user base grows, we will keep adding new shards.
The user table looks like below:
-----------------------------
id | name | email | contact
-----------------------------
1 |Test |t@t.com|xxxxx
501 |Test1 |t@t.com|xxxxx
101 |Test2 |t@t.com|xxxxx
1001 |Test3 |t@t.com|xxxxx
So the users Test and Test2 reside in shard 1, Test1 resides in shard 2, Test3 resides in shard 3, and so on.
When a new user registers, the code picks one of the shards open for registration (say shard 4 and shard 5 are open), runs a select on the user table restricted to that shard's range to get the current max id, increments it by 1 and stores the new row. The problem with this approach is that when two different people try to register at the same time, the select can return the same id to both threads, since the two registrations can't be in the same transaction.
Is there a better way of choosing the next id for a shard using a range?
select max(id) from t for update
This should hold the lock on the max id until the transaction completes.
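Here is a rough sketch of the "lock, read max, insert, commit" pattern in Python. SQLite has no SELECT ... FOR UPDATE, so this uses BEGIN IMMEDIATE as a stand-in that takes the write lock before the max is read; with MySQL you would run the FOR UPDATE select inside the transaction instead. The function names, the reduced two-column user table and the "shard is full" check are assumptions for illustration.

```python
import sqlite3


def open_demo():
    """Open an in-memory database holding the sample users from the question."""
    # isolation_level=None stops the sqlite3 module from opening transactions
    # implicitly, so the explicit BEGIN/COMMIT below are the only boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT)')
    conn.executemany(
        'INSERT INTO "user" (id, name) VALUES (?, ?)',
        [(1, "Test"), (501, "Test1"), (101, "Test2"), (1001, "Test3")],
    )
    return conn


def register_user(conn, name, lower, upper):
    """Pick the next id inside [lower, upper] and insert the user in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the max
    try:
        (max_id,) = conn.execute(
            'SELECT MAX(id) FROM "user" WHERE id BETWEEN ? AND ?', (lower, upper)
        ).fetchone()
        next_id = lower if max_id is None else max_id + 1
        if next_id > upper:
            raise ValueError("shard is full")
        conn.execute('INSERT INTO "user" (id, name) VALUES (?, ?)', (next_id, name))
        conn.execute("COMMIT")
        return next_id
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The read and the insert have to sit in the same transaction; releasing the lock between the two brings the duplicate-id race back.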

Database issue - how do I set up user accounts/pswds so they can ONLY add/change THEIR data?

Okay... I am working to create a mobile app that allows two groups of users to do two different things.
Essentially, the goal of the project is this:
Group A users: create account/pswd and can enter THEIR data into the database and/or change THEIR existing data (but ONLY their data)
Group B users: can SEARCH the database for information that is inserted by Group A. Down the track I'd like to set it up so that they can create an user account so they can also SAVE key information to THEIR account for faster recall (so they don't have to look up the info they search for regularly) -- but that is down the track.
I have a relational database set up using the MySQL instance that is available with my web-hosting account (it seemed to be the easiest way to go).
I'm just trying to work out how to handle the user account creation/authentication bit, because each group should ONLY be able to CHANGE/INSERT data to their own account, but can search for information submitted by anyone else.
Thanks in advance.
Use MySQL's facilities to manage permissions: roles, users and privileges.
See the official MySQL documentation (e.g. http://dev.mysql.com/doc/workbench/en/wb-adding-roles.html).
You can create two roles: groupA, which can INSERT/SELECT/UPDATE on one set of tables, and groupB, which can do the same on another set of tables.
You can grant the INSERT privilege on just the tables you want, but SELECT privileges on all the tables.
Hope this sheds some light...
Firstly this sounds like a huge project, I am sure there are frameworks out there that can do this for you. However, if you are trying to do this on your own continue reading.
This can be done several ways. I will try to be as detailed as possible. This requires SQL as well as application development/Software engineering knowledge.
Step 1: Setup your database
You will need the following tables: All ids are primary keys auto incremented, the other fields can be varchar, except fields that have date in their name
sessions [id, uid, random_token, datecreated]
resourcescope [rid, name]
user [uid, first, last, email, username, salted_pwd]
user_type [id, name, description]
user_resourcescope [id, uid, rid] //lookup table between userid and resourcescope
I prefer using Java or python because you can use dependency injection or decorators. As a result, you don't have to write a lot of code when checking if a user has access.
Putting it all into practice.
1. When a user signs up, you save them into a user database. Depending on the user type, you give them different permissions. Next, you save the user permissions inside the user_resourcescope table.
You should now have the following.
User Table
UID | first | last | email | username | salted_pwd | usertype
1 | james | iri | example@isp.com | jiri1928 | klasdjf8$kljs | 1
UserType table
usertype_id | Name
1 | Basic users
2 | Searcher
ResourceScope Table
rid | Name
1 | FindContent
2 | CreateContent
3 | DeleteContent
User_Resourcescope
id | uid | rid
1 | 1 | 1
2 | 1 | 3
Session
id | uid | random_token | datecreated
1 | 1 | ldkjfald882u3u | 1391274870322
Each resource represents a request within the system. For example,
http://api.myapi.com/content/add - This would be associated with the ResourceScope CreateContent
http://api.myapi.com/content/delete - This would be associated with the ResourceScope DeleteContent
http://api.myapi.com/content/search - This would be associated with the ResourceScope FindContent
When someone tries to create content, you check that their credentials are correct by validating their session information, and you check that they have the correct permission by looking in the User_Resourcescope table.
To prevent users from deleting content that is not theirs, you can add a creator field to the content table and store the id of the user who created the content. If someone tries to delete content, you can check their user id against the creator field.
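The two checks described above (session plus resource scope, then ownership) could look roughly like this. It is a sketch in Python with SQLite standing in for MySQL; the content table with its creator column, the helper names and the sample content rows are assumptions, while the other tables and rows follow the example data.

```python
import sqlite3


def build_demo():
    """Create the lookup tables from the example and two hypothetical content rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sessions (id INTEGER PRIMARY KEY, uid INTEGER,
                               random_token TEXT, datecreated INTEGER);
        CREATE TABLE resourcescope (rid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE user_resourcescope (id INTEGER PRIMARY KEY, uid INTEGER, rid INTEGER);
        CREATE TABLE content (id INTEGER PRIMARY KEY, creator INTEGER, body TEXT);
        INSERT INTO sessions VALUES (1, 1, 'ldkjfald882u3u', 1391274870322);
        INSERT INTO resourcescope VALUES (1, 'FindContent'), (2, 'CreateContent'),
                                         (3, 'DeleteContent');
        INSERT INTO user_resourcescope VALUES (1, 1, 1), (2, 1, 3);
        INSERT INTO content VALUES (10, 1, 'mine'), (11, 2, 'theirs');
    """)
    return conn


def user_for_token(conn, token):
    """Return the uid behind a session token, or None if the token is unknown."""
    row = conn.execute(
        "SELECT uid FROM sessions WHERE random_token = ?", (token,)
    ).fetchone()
    return None if row is None else row[0]


def has_scope(conn, uid, scope_name):
    """True if the user has been granted the named resource scope."""
    row = conn.execute(
        """SELECT 1 FROM user_resourcescope ur
           JOIN resourcescope r ON r.rid = ur.rid
           WHERE ur.uid = ? AND r.name = ?""",
        (uid, scope_name),
    ).fetchone()
    return row is not None


def can_delete_content(conn, token, content_id):
    """Allow deletion only for a valid session with DeleteContent that owns the row."""
    uid = user_for_token(conn, token)
    if uid is None or not has_scope(conn, uid, "DeleteContent"):
        return False
    row = conn.execute(
        "SELECT creator FROM content WHERE id = ?", (content_id,)
    ).fetchone()
    return row is not None and row[0] == uid
```

In a real API each endpoint would call a check like can_delete_content before touching the data, which is where a decorator or dependency injection saves repetition.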

How to get the right "version" of a database entry?

Update: Question refined, I still need help!
I have the following table structure:
table reports:
ID | time | title | (extra columns)
1 | 1364762762 | xxx | ...
Multiple object tables that have the following structure
ID | objectID | time | title | (extra columns)
1 | 1 | 1222222222 | ... | ...
2 | 2 | 1333333333 | ... | ...
3 | 3 | 1444444444 | ... | ...
4 | 1 | 1555555555 | ... | ...
In the object tables, on an object update a new version with the same objectID is inserted, so that the old versions are still available. For example see the entries with objectID = 1
In the reports table, a report is inserted but never updated/edited.
What I want to be able to do is the following:
For each entry in my reports table, I want to be able to query the state of all objects, like they were, when the report was created.
For example, let's look at the sample report above with ID 1. At the time it was created (see the time column), the current version of objectID 1 was the entry with ID 1 (entry ID 4 did not exist at that point).
ObjectID 2 also existed, with its current version being entry ID 2.
I am not sure how to achieve this.
I could use a query that selects the object versions by the time column:
SELECT *
FROM (
    SELECT *
    FROM objects
    WHERE time < [reportTime]
    ORDER BY time DESC
) AS o
GROUP BY objectID
Let's not talk about the performance of this query; it is just to make clear what I want to do. My problem is the comparison of the time columns. I don't think this is a good way to make sure that I get the right object versions, because the system time may change "for any reason" and the time column would then have wrong data in it, which would lead to wrong results.
What would be another way to do this?
I thought about not using a time column for this, but instead a GLOBAL incremental value, so that I know the insertion order across the database tables.
If you are inserting new versions of the object, and your problem is the time column (I assume you are using this column to determine which version is newer), I suggest you use an auto-incremented ID column for the versions. Even if the time value is not reliable for you, the ID will be, since it is always increasing. So the higher the ID, the newer the version.
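One way this could be put to work for a single object table is sketched below in Python, with SQLite standing in for MySQL. The assumption, not stated in the question, is that each report stores the highest object version ID that existed when it was created; the helper names snapshot_marker and objects_as_of are made up for illustration.

```python
import sqlite3


def build_demo():
    """Create one object table holding the four sample versions from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE objects (ID INTEGER PRIMARY KEY AUTOINCREMENT,
                              objectID INTEGER, time INTEGER, title TEXT);
        INSERT INTO objects (objectID, time, title) VALUES
            (1, 1222222222, 'a'),
            (2, 1333333333, 'b'),
            (3, 1444444444, 'c'),
            (1, 1555555555, 'a2');
    """)
    return conn


def snapshot_marker(conn):
    """Highest version ID so far; a report would store this value when it is created."""
    return conn.execute("SELECT COALESCE(MAX(ID), 0) FROM objects").fetchone()[0]


def objects_as_of(conn, last_version_id):
    """Newest version of every object among the versions with ID <= last_version_id."""
    return conn.execute(
        """SELECT o.ID, o.objectID, o.title
           FROM objects o
           JOIN (SELECT objectID, MAX(ID) AS ID
                 FROM objects
                 WHERE ID <= ?
                 GROUP BY objectID) latest ON latest.ID = o.ID
           ORDER BY o.objectID""",
        (last_version_id,),
    ).fetchall()
```

For the sample report, the stored marker would be 2, and objects_as_of(conn, 2) returns the entries with ID 1 and ID 2, which matches the state described in the question. With several object tables, each table has its own ID sequence, so one marker per table would be needed.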

temp and main tables - mysql

I have a scenario where I need to insert data into a table temporarily and later, on approval or confirmation, make it permanent. The data will be inserted by a user, and the approval or denial needs to be done by a Super User.
What I have in mind now is to have two different but identical tables (temporary and main), where the user inserts the data into the temp table. After confirmation by the Super User, the data is moved to the main table. But when a database contains a very large number of tables, this process becomes more complex.
EDIT: This applies to CREATE, EDIT and DELETE commands.
Is there any simpler or better approach of doing this?
Please suggest.
Using a version table (related to comment):
The idea here is to have a version table; when your user changes a piece of information the new version is stored in this table along with the related ID.
Then all you need to do is join on the PersonID and select the most recent accepted version.
This means the user can make as many updates as they want but they won't show until the super user accepts them, it also means the data is never destroyed (stored in the version table) and they don't need to implement rollback as it's already there!
See: http://sqlfiddle.com/#!3/cc77f/4
People Table:
ID | Age Etc... (Info That Doesn't Change)
-----------------------
1 | 12
2 | 16
3 | 11
People Version Table:
VersionID | PersonID | Name | Approved
-----------------------
1 | 1 | Stevz | FALSE
2 | 1 | Steve | TRUE
3 | 2 | James | TRUE
4 | 3 | Jghn | FALSE
5 | 3 | John | TRUE
Example table SQL
CREATE TABLE People
(
id int identity primary key,
age int
);
CREATE TABLE PeopleVersion
(
versionId int identity primary key,
peopleId int,
name varchar(30),
approved varchar(30)
);
Example Query
SELECT * FROM People p
INNER JOIN PeopleVersion v ON p.id = v.peopleID
WHERE v.approved = 'TRUE'
ORDER BY versionId DESC
A further insight:
You could even have three states of Approved: null meaning no admin has decided yet, TRUE meaning it was accepted and FALSE meaning it was rejected.
You could show the user the most recent version among null and TRUE, show the admin all three, and show the other users of the site only versions that are TRUE.
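The example query above returns every approved version. If only the most recent approved version per person should be shown, something along these lines could work. This is a sketch in Python with SQLite standing in for the real database; the function names and the correlated subquery are my additions, while the tables and rows follow the example.

```python
import sqlite3


def build_demo():
    """Create the People and PeopleVersion tables with the sample rows from above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE People (id INTEGER PRIMARY KEY, age INTEGER);
        CREATE TABLE PeopleVersion (versionId INTEGER PRIMARY KEY AUTOINCREMENT,
                                    peopleId INTEGER, name TEXT, approved TEXT);
        INSERT INTO People VALUES (1, 12), (2, 16), (3, 11);
        INSERT INTO PeopleVersion (peopleId, name, approved) VALUES
            (1, 'Stevz', 'FALSE'),
            (1, 'Steve', 'TRUE'),
            (2, 'James', 'TRUE'),
            (3, 'Jghn', 'FALSE'),
            (3, 'John', 'TRUE');
    """)
    return conn


def latest_approved(conn):
    """Return (id, age, name) using each person's newest approved version."""
    return conn.execute(
        """SELECT p.id, p.age, v.name
           FROM People p
           JOIN PeopleVersion v ON v.peopleId = p.id
           WHERE v.approved = 'TRUE'
             AND v.versionId = (SELECT MAX(v2.versionId)
                                FROM PeopleVersion v2
                                WHERE v2.peopleId = p.id
                                  AND v2.approved = 'TRUE')
           ORDER BY p.id"""
    ).fetchall()
```

A pending edit inserted with approved = 'FALSE' does not change the result until the super user flips it to 'TRUE'.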
Old Comments
Could you just add a field called approved to the table and then hide anything without the approval flag set to TRUE?
It could default to FALSE and only the super user would be able to see items with the flag set to FALSE
E.g.
Name | Age | Approved
-----------------------
Steve | 12 | FALSE
James | 16 | TRUE
John | 11 | FALSE
The user would only see James, but the SuperUser would see all three listed
Alternatively, using your temporary and main tables is the other way of looking at this problem, though this may lead to problems as everything gets larger.
The easiest approach is a flag within the table marking an entry either approved or not-yet approved.
Then just change the retrieving logic to only show entries where that flag is set to approved.