Can Spark / Samza / Storm un-do past commits and regenerate views?


I just watched "Turning the Database Inside Out" and noticed a similarity between Samza and Redux: all state consists of a stream of immutable objects.
This made me realize that if you edited the stream after the fact, you could in theory regenerate all materialized views from the new list of transactions and, in effect, "undo" a past change to the database.
As an example, suppose I had the following series of diffs:
1. Add user "tom"
2. Add user "bob"
3. Delete user "bob"
4. Change user "tom"'s name to "joe"
5. Add user "fred"
After this series of changes, our database looks like:
+-------+
| users |
+-------+
| joe |
| fred |
+-------+
Now what if I wanted to undo change 3? Our new set of diffs would be:
1. Add user "tom"
2. Add user "bob"
4. Change user "tom"'s name to "joe"
5. Add user "fred"
And our database:
+-------+
| users |
+-------+
| joe |
| bob |
| fred |
+-------+
While this sounds good in theory, can this actually be done using Samza, Storm, or Spark? Can any transaction-stream database do this? I'm interested in such functionality for administrative purposes. I have some sites where clients have accidentally deleted an employee or modified records they didn't mean to. In the past I solved this by creating a separate table which recorded all changes to the database, then when an issue arose I could (manually) look at this table, figure out what they did wrong, and (manually) fix the data.
It would be SO much cooler if I could just look at a transaction stream, remove the bad one, and say "regenerate the database".
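The replay idea in the question can be illustrated without any stream processor. The sketch below is plain Python, not a Samza, Storm or Spark API; the event classes and the `materialize` and `replay_without` helpers are hypothetical names. It rebuilds the users view by folding over the numbered log, then rebuilds it again with change 3 skipped.

```python
from dataclasses import dataclass

# Hypothetical event types mirroring the numbered diffs in the question.
@dataclass(frozen=True)
class AddUser:
    name: str

@dataclass(frozen=True)
class DeleteUser:
    name: str

@dataclass(frozen=True)
class RenameUser:
    old: str
    new: str

def materialize(events):
    """Rebuild the users view from scratch by applying events in order."""
    users = []
    for event in events:
        if isinstance(event, AddUser):
            users.append(event.name)
        elif isinstance(event, DeleteUser):
            users.remove(event.name)  # ValueError if the user is absent
        elif isinstance(event, RenameUser):
            users[users.index(event.old)] = event.new  # ValueError if absent
        else:
            raise TypeError(f"unknown event: {event!r}")
    return users

# The immutable log, keyed by sequence number.
log = {
    1: AddUser("tom"),
    2: AddUser("bob"),
    3: DeleteUser("bob"),
    4: RenameUser("tom", "joe"),
    5: AddUser("fred"),
}

def replay_without(log, skipped):
    """Regenerate the view from the log, leaving out the given sequence numbers."""
    return materialize(e for seq, e in sorted(log.items()) if seq not in skipped)

print(materialize(e for _, e in sorted(log.items())))  # ['joe', 'fred']
print(replay_without(log, {3}))                        # ['joe', 'bob', 'fred']
```

Removing an event only works when later events don't depend on it: dropping change 1 instead makes the rename in change 4 fail with a ValueError, because "tom" never exists. A real system would have to detect or handle that case before regenerating its views.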

Related

Create a new record for every user in a database based on a column

I have a MySQL database with a user table. After a new requirement, I had to create a new table called social_media, and from now on every new user will be created with a social_media_id column that references their social_media record.
+===================+
| user              |
+===================+
|PK id              |
|FK social_media_id |
|   first_name      |
|   last_name       |
|   email           |
+===================+

+===================+
| social_media      |
+===================+
|PK id              |
|   instagram       |
|   facebook        |
|   twitter         |
+===================+
I want to update my database so that every user who didn't have a social media reference before gets one (even if the values inside are null), so they can update it if they wish. Is there something I can do to create a new social_media record for every user that doesn't have one, and set the correct social_media_id foreign key for that user?

OK @Jorche, this is too long to be a comment, but I do want to help.
First off, this is probably what your data structure should look like:
Second, telling you how to enter these records is very difficult for me right now, because I have absolutely ZERO requirements or any other business logic that would help me pinpoint the best approach. Odds are, you would have to work hand in hand with application developers or ETL developers (that might even be you, though) to figure out what that approach is. Maybe it's a stored procedure that gets called, maybe it's a trigger, hard to say for sure without additional context, ya know?
All we know at this point is that users exist and sometimes they have relational data related to social media entities. Your job is literally to understand that process flow and make the appropriate decisions on how to log that data in a way that makes sense from both an operational perspective and a database design perspective.
Hate to say it hombre, but the questions you have now are all entirely dependent on details you haven't provided.
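For the one-off backfill the question asks about, one possible approach is to loop over users whose social_media_id is NULL, insert an empty social_media row for each, and point the user at the new row's ID. This sketch uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption), reuses the table and column names from the question, and seeds three hypothetical users; `backfill_social_media` is a hypothetical helper name.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database (an assumption of this sketch).
# Table and column names follow the question; the three users are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE social_media (
    id INTEGER PRIMARY KEY,
    instagram TEXT,
    facebook TEXT,
    twitter TEXT
);
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    social_media_id INTEGER REFERENCES social_media(id),
    first_name TEXT,
    last_name TEXT,
    email TEXT
);
INSERT INTO social_media (instagram) VALUES ('@ann');
INSERT INTO user (social_media_id, first_name) VALUES (1, 'Ann');
INSERT INTO user (social_media_id, first_name) VALUES (NULL, 'Ben');
INSERT INTO user (social_media_id, first_name) VALUES (NULL, 'Cat');
""")

def backfill_social_media(conn):
    """Create an all-NULL social_media row for each user lacking one and link it."""
    with conn:  # one transaction: every user is linked, or none are
        missing = conn.execute(
            "SELECT id FROM user WHERE social_media_id IS NULL"
        ).fetchall()
        for (user_id,) in missing:
            cur = conn.execute("INSERT INTO social_media DEFAULT VALUES")
            conn.execute(
                "UPDATE user SET social_media_id = ? WHERE id = ?",
                (cur.lastrowid, user_id),
            )
    return len(missing)

print(backfill_social_media(conn))  # 2
```

The whole loop runs in a single transaction, so a failure leaves no half-linked users, and running it again is a no-op. In MySQL itself, the empty insert would be written `INSERT INTO social_media () VALUES ()` and the new key read with `LAST_INSERT_ID()`.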

Database schema many-to-many with defaults

I am hoping someone would be willing to take a look at this many-to-many relationship. This example is for a Laravel project, but the specifics shouldn't matter too much.
action
+----+------+--------+-------------+------+--------+------------+
| id | name | script | description | icon | custom | project_id |
+----+------+--------+-------------+------+--------+------------+
pipeline (action_server, this is the pivot table)
+----+-----------+-----------+-------+
| id | action_id | server_id | order |
+----+-----------+-----------+-------+
server
+----+------+------------+------------+
| id | name | ip_address | project_id |
+----+------+------------+------------+
This many-to-many relationship is used for deployments: each server runs a set of actions, and each action is part of a deployment's pipeline.
An action can be executed on multiple servers.
A user can add an action with a custom script.
All the actions for a deployment pipeline can be fetched through a project_id.
This concept works within Laravel and I could simply fetch my actions based on a given project_id. In turn I could fetch the server actions needed to run the deployment by using action->servers().
I need a way to add default actions though. Instead of actions always having a user supplied script, I want the ability to provide actions with pre-defined scripts for a user to select from and add to a deployment pipeline.
These pre-defined actions can't be stuffed into the action table, because the actions defined there are tied to a project_id. These need to be generic.
I can't simply create another table for these pre-defined actions in my current setup, because pipeline.action_id already has a foreign key to the action table.
So far it feels like I am mixing two concepts: pre-defined actions and user-defined actions that users have created themselves. They still need to be in the same pipeline and eventually run in the right order.
Any thoughts on how this might be achieved? I am open to all suggestions.
Edit
After drawing this out, a possible solution seems to be adding another pivot table, action_project, which would let me decouple (remove) the project_id from the action table. I am wondering how to keep this clean in Laravel, though.
action_project
+----+-----------+------------+
| id | action_id | project_id |
+----+-----------+------------+

Summarizing your problem in a conceptual way:
applications ("projects") have associated custom actions,
standard actions are not defined for a specific application
servers have/host applications
pipelines define which "actions" to perform on which server in which order
I think what you need is simply a generalization of custom actions and standard actions, corresponding to a superclass "action" that subsumes both cases. This leads to the following tables:
actions(id, type, name, description) with type being either custom or standard
custom_actions(id, script, icon, custom, project_id)
Alternatively, you could append the attributes of custom_actions to actions and have them all NULL for standard actions.
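A minimal sketch of that superclass layout, in Python with an in-memory SQLite database standing in for MySQL (an assumption). The pivot table keeps its single foreign key, now pointing at `actions`, so pre-defined and custom actions can share one pipeline and come back in run order. The seed rows, the `./warm-cache.sh` script and the `pipeline_for_server` helper are hypothetical, and the `custom` flag is left off `custom_actions` because `type` already carries that information.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all seed rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supertype: every action, whatever its kind.
CREATE TABLE actions (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL CHECK (type IN ('custom', 'standard')),
    name TEXT NOT NULL,
    description TEXT
);
-- Subtype: extra columns that only custom actions have.
CREATE TABLE custom_actions (
    id INTEGER PRIMARY KEY REFERENCES actions(id),
    script TEXT,
    icon TEXT,
    project_id INTEGER NOT NULL
);
CREATE TABLE server (
    id INTEGER PRIMARY KEY,
    name TEXT,
    ip_address TEXT,
    project_id INTEGER
);
-- The pivot keeps one foreign key, now to the supertype.
CREATE TABLE pipeline (
    id INTEGER PRIMARY KEY,
    action_id INTEGER NOT NULL REFERENCES actions(id),
    server_id INTEGER NOT NULL REFERENCES server(id),
    "order" INTEGER NOT NULL
);
INSERT INTO actions VALUES (1, 'standard', 'composer install', NULL);
INSERT INTO actions VALUES (2, 'custom', 'warm cache', NULL);
INSERT INTO custom_actions VALUES (2, './warm-cache.sh', NULL, 7);
INSERT INTO server VALUES (1, 'web-1', '10.0.0.1', 7);
INSERT INTO pipeline VALUES (1, 2, 1, 2), (2, 1, 1, 1);
""")

def pipeline_for_server(conn, server_id):
    """List (order, name, type, script) for one server, in run order."""
    return conn.execute(
        """
        SELECT p."order", a.name, a.type, c.script
        FROM pipeline AS p
        JOIN actions AS a ON a.id = p.action_id
        LEFT JOIN custom_actions AS c ON c.id = a.id
        WHERE p.server_id = ?
        ORDER BY p."order"
        """,
        (server_id,),
    ).fetchall()

print(pipeline_for_server(conn, 1))
# [(1, 'composer install', 'standard', None), (2, 'warm cache', 'custom', './warm-cache.sh')]
```

Standard actions have no custom_actions row, so the LEFT JOIN yields None for their script column.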

Recommendations for table structure in MySQL

Hi, I've got a small internal project I am working on. Currently it only serves my company, but I'd like to scale it so that it could serve multiple companies. The tables I have at the moment are USERS and PROJECTS. I want to start storing company-specific information and relate it to the USERS table. For each user, I will add a new column for the company they belong to.
Now I also need to store each company's templates in the database. The templates are stored as strings like this:
"divider","events","freeform" etc.
Initially I was thinking each word should go in as a separate row, but as I write this I'm thinking perhaps I should store all templates in one entry separated by commas (as written above).
Bottom line, I'm new to database design and I have no idea how to best set this up. How many tables, what columns etc. For right now, my table structure looks like this:
PROJECTS
Project Number | Title | exacttarget_id | Author | Body | Date
USERS
Name | Email | Date Created | Password
Thanks in advance for any insights you can offer.

What I would do is create 2 tables:
I would create one table for the different companies, let's call it COMPANY:
Company_id | Title | Logo | (Whatever other data you want)
I would also create one table for the settings listed above, let's call it COMPANY_SETTINGS:
Company_id | Key | Value
This gives you the flexibility in the future to add additional settings without compromising your existing code. A simple query gets all the settings, regardless of how many your current version uses.
SELECT `Key`, `Value` FROM COMPANY_SETTINGS WHERE Company_id = :companyId
The results can then be put into an associative array for easy use throughout the project.
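A sketch of the two-table layout and of loading the settings into an associative array (a Python dict here). It uses an in-memory SQLite database as a stand-in for MySQL; the sample company, the `default_template` and `locale` keys, and the `load_settings` helper are hypothetical. The `key` column is quoted because KEY is a keyword in both SQLite and MySQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample company and setting keys are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    logo TEXT
);
-- One row per (company, setting); "key" is quoted because KEY is a keyword.
CREATE TABLE company_settings (
    company_id INTEGER NOT NULL REFERENCES company(company_id),
    "key" TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (company_id, "key")
);
INSERT INTO company VALUES (1, 'Acme', NULL);
INSERT INTO company_settings VALUES (1, 'default_template', 'divider');
INSERT INTO company_settings VALUES (1, 'locale', 'en_US');
""")

def load_settings(conn, company_id):
    """Return a company's settings as a dict (the 'associative array')."""
    rows = conn.execute(
        'SELECT "key", value FROM company_settings WHERE company_id = :companyId',
        {"companyId": company_id},
    )
    return dict(rows)

settings = load_settings(conn, 1)
print(settings["default_template"])  # divider
```

Adding a new setting is just another INSERT; the loading code does not change, which is the flexibility the answer describes.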

Is this a good strategy for implementing access control?

I'd like to implement a database-driven access control system. I've been reading about ACL, roles, RBAC, etc., but it seems like the most common schemes have some major drawbacks. RBAC, for example, seems to be clunky when it comes to implementing fine-grained access control (for example, allowing a certain role to update only particular columns of a particular record).
What if I structured my access control list like this:
| role | table | action | columns | conditions |
| ----- | ----- | ------ | -------- | ----------------- |
| user | user | view | name, id | self.id = user.id |
| user | user | update | password | self.id = user.id |
| admin | user | update | * | |
| admin | user | create | * | |
| admin | user | delete | * | |
The idea is that a user's role(s) would be checked against this table when they try to access the database (so, implemented at the model level). action can be any one of {create, view, update, delete, list}. The self scope would be a reserved keyword that references the current user's properties. This would, for example, let users update only their own passwords (and not someone else's).
Is this robust? Obviously I would still need a separate list to control access to other types of resources like URIs, etc.
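A minimal sketch of the model-level check described above, in Python. One assumption to flag: the `conditions` column is represented as a Python function rather than a string such as `self.id = user.id`, which avoids writing an expression parser. The `Rule` class, `same_user` and `is_allowed` are hypothetical names. A request is denied unless a single rule for one of the user's roles covers the table, the action and every requested column.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ALL = "*"  # wildcard for "every column"

@dataclass(frozen=True)
class Rule:
    role: str
    table: str
    action: str
    columns: frozenset
    # Stand-in for the `conditions` column: (self_user, record) -> bool.
    condition: Optional[Callable[[dict, dict], bool]] = None

def same_user(self_user, record):
    """The `self.id = user.id` condition from the table."""
    return self_user["id"] == record["id"]

RULES = [
    Rule("user", "user", "view", frozenset({"name", "id"}), same_user),
    Rule("user", "user", "update", frozenset({"password"}), same_user),
    Rule("admin", "user", "update", frozenset({ALL})),
    Rule("admin", "user", "create", frozenset({ALL})),
    Rule("admin", "user", "delete", frozenset({ALL})),
]

def is_allowed(self_user, table, action, columns, record):
    """Allow only if one matching rule covers every requested column; deny otherwise."""
    for rule in RULES:
        if rule.role not in self_user["roles"]:
            continue
        if rule.table != table or rule.action != action:
            continue
        if ALL not in rule.columns and not set(columns) <= rule.columns:
            continue
        if rule.condition is not None and not rule.condition(self_user, record):
            continue
        return True
    return False

alice = {"id": 1, "roles": {"user"}}
root = {"id": 9, "roles": {"admin"}}
print(is_allowed(alice, "user", "update", ["password"], {"id": 1}))  # True
print(is_allowed(alice, "user", "update", ["password"], {"id": 2}))  # False
print(is_allowed(root, "user", "update", ["name", "email"], {"id": 2}))  # True
```

Denying by default means a missing rule fails closed, which is usually the safer choice for this kind of table.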

Great question. You are hitting the limitations of ACLs and RBAC. There is another way which is more flexible called attribute-based access control (ABAC).
The following diagram shows how access control has evolved over time to cater to more complex scenarios (more users, more data, more devices, more context).
More specifically, you are struggling with the fact that RBAC doesn't support relationships. ABAC does, however. ABAC is based on attributes. An attribute is just a key-value pair, e.g. role == manager or location == Arizona.
ABAC uses policies with attributes to express authorization scenarios. For instance, in ABAC you can express scenarios such as:
A user with the role == doctor can do the action == view on a resource of type == medical record if the doctor location == the patient location.
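The doctor example can be written as a plain Python predicate over three attribute bags (subject, action, resource). This is only an illustration of the idea, not XACML syntax or any product's API; the `Request` class, the attribute names and the `decide` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    """Attribute bags for the subject, action and resource (hypothetical shape)."""
    subject: dict
    action: dict
    resource: dict

def doctor_view_policy(req: Request) -> bool:
    """A doctor may view a medical record if the doctor's location matches the patient's."""
    location = req.subject.get("location")
    return (
        req.subject.get("role") == "doctor"
        and req.action.get("name") == "view"
        and req.resource.get("type") == "medical record"
        and location is not None
        and location == req.resource.get("patient_location")
    )

POLICIES = [doctor_view_policy]

def decide(req: Request) -> str:
    """Permit if any policy permits; otherwise deny (deny by default)."""
    return "Permit" if any(policy(req) for policy in POLICIES) else "Deny"

req = Request(
    {"role": "doctor", "location": "Arizona"},
    {"name": "view"},
    {"type": "medical record", "patient_location": "Arizona"},
)
print(decide(req))  # Permit
```

Note the explicit check that the doctor's location is present: without it, a request where both locations are missing would compare None == None and be permitted.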
There is a standard called XACML (eXtensible Access Control Markup Language) which you can use to implement ABAC. There are even products that offer XACML specifically for databases and data access control such as the Axiomatics Data Access Filter.
If you want to learn more on ABAC I recommend you turn to 2 great resources:
NIST: Guide to Attribute Based Access Control (ABAC) Definition and Considerations (pdf)
Webinar on the NIST document.

Database Schema allowing for multiple login opportunities (Facebook-Connect, Oauth, OpenID, etc.) for the same account

I want to accomplish nearly the same thing as this question, which is to store authentication data from multiple sources (Facebook, Twitter, LinkedIn, openID, my own site, etc.) so that one person can log in to their account from any/all of the mentioned providers.
The only caveat is that all user data needs to be stored in a single table.
Any suggestions?
If there is no clean way to accomplish this, is it better to have tons of columns for all of the possible login methods, or to create additional tables for the login methods with foreign keys relating back to the user table (as described in this answer)?

Perhaps you want to create a table dedicated to the account types, along with a table for the actual user.
Say you have a users table with an auto_increment unique ID for each user. Then create another table, for example user_accounts, with its own auto_increment ID, a column holding the related user ID (referencing the users table), and a third and/or fourth column for the account type and any extra authentication data, if needed.
You then insert a record for each account type for each user. Basically it may look like this:
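user_accounts
+----+---------+--------------+-----------------------------+
| ID | User_ID | account_type | authentication              |
+----+---------+--------------+-----------------------------+
| 1  | 1       | facebook     | iamthecoolestfacebookerever |
| 2  | 1       | google       | mygoogleaccount             |
+----+---------+--------------+-----------------------------+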
That is its most simplistic form. You will probably be storing quite different data, but hopefully you get the point.
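A sketch of the users plus user_accounts layout, using Python's sqlite3 module with an in-memory database as a stand-in for MySQL. The user_accounts column names follow the example table; the `name` column on `users`, the helper functions, and the UNIQUE constraint on (account_type, authentication), which stops one provider identity from being linked to two users, are additions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; AUTOINCREMENT plays the role of auto_increment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
);
CREATE TABLE user_accounts (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    User_ID INTEGER NOT NULL REFERENCES users(ID),
    account_type TEXT NOT NULL,
    authentication TEXT NOT NULL,
    -- one provider identity can belong to only one user
    UNIQUE (account_type, authentication)
);
""")

def create_user(conn, name):
    """Insert a user and return the new users.ID."""
    with conn:
        return conn.execute("INSERT INTO users (name) VALUES (?)", (name,)).lastrowid

def link_account(conn, user_id, account_type, authentication):
    """Attach one login method (one user_accounts row) to an existing user."""
    with conn:
        conn.execute(
            "INSERT INTO user_accounts (User_ID, account_type, authentication) "
            "VALUES (?, ?, ?)",
            (user_id, account_type, authentication),
        )

def find_user(conn, account_type, authentication):
    """Return the users.ID linked to this provider identity, or None."""
    row = conn.execute(
        "SELECT User_ID FROM user_accounts WHERE account_type = ? AND authentication = ?",
        (account_type, authentication),
    ).fetchone()
    return row[0] if row else None

uid = create_user(conn, "Jane")
link_account(conn, uid, "facebook", "iamthecoolestfacebookerever")
link_account(conn, uid, "google", "mygoogleaccount")
print(find_user(conn, "google", "mygoogleaccount") == uid)  # True
```

Whichever provider the person logs in with, the lookup lands on the same users row, so adding a new provider means inserting rows, not adding columns.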
