How do I ensure Firebase database structure using anonymous auth?

I have a public-input type app using Firebase, with anonymous auth. The user data is used to create points on a map. Each anonymous user can only edit the data inside the node matching their auth id - via security rules.
However, my app depends on a certain database structure. How do I ensure my database structure/integrity using anonymous auth, since the database url is client-side readable?
I think it is possible with security and validation rules, but I'm not sure. Maybe by denying the creation of unexpected child nodes? That seems necessary to ensure the schema is followed.
Each auth node can have many key nodes, but I would want to limit this on the Firebase side, and each key node must follow the schema below (so I can pull out the GeoJSON easily). Below is my current setup - what am I missing?
"features" : {
"5AGxfaK2q8hjJsmsO3PUxUs09Sz1" : {
"-KS3R4sWPdcDkrxyIFX6" : {
"geometry" : {
"coordinates" : [ -81.88247680664062, 38.884619201291905 ],
"type" : "Point"
},
"properties" : {
"color" : "#2be",
"title" : ""
},
"type" : "Feature"
},

Authentication and database schema are completely separate topics. You ensure database schema by using a combination of .write and .validate rules in your security doc, not by anything to do with your authentication provider (i.e. Anonymous authentication).
This is described in detail in our database security guide.
A quick summary:
hasChildren: specify required fields
newData: refer to the data being written
data: refer to data already in the database
.validate: specify data schema using things like newData.isString() or newData.val() == data.val() + 1
Keep in mind that .validate rules are only run for non-null values. Thus, if you want to try something like !data.exists() (i.e. you can only write to this path once and can't modify it later) or newData.exists() (i.e. you can't delete this data) then you need to specify those in a .write rule.
Refer to the guide for more detail.
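As a concrete illustration for the features structure in the question, a minimal sketch of such rules could look like the following (the individual checks are examples to show the pattern, not an exhaustive schema; the $other wildcards reject any children not listed):
{
  "rules": {
    "features": {
      "$uid": {
        ".write": "auth != null && auth.uid == $uid",
        "$featureKey": {
          ".validate": "newData.hasChildren(['geometry', 'properties', 'type'])",
          "type": { ".validate": "newData.val() == 'Feature'" },
          "geometry": {
            ".validate": "newData.hasChildren(['coordinates', 'type'])",
            "type": { ".validate": "newData.val() == 'Point'" },
            "coordinates": {
              "0": { ".validate": "newData.isNumber()" },
              "1": { ".validate": "newData.isNumber()" },
              "$other": { ".validate": false }
            },
            "$other": { ".validate": false }
          },
          "properties": {
            "color": { ".validate": "newData.isString()" },
            "title": { ".validate": "newData.isString()" },
            "$other": { ".validate": false }
          },
          "$other": { ".validate": false }
        }
      }
    }
  }
}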

Related

Can I watch only one field in Couchbase with kafka-connect (CDC)?

We are trying to move our database from MySQL to Couchbase and implement some CDC (change data capture) logic for copying data to our new DB.
All environments are set up and running: MySQL, Debezium, Kafka, Couchbase, Kubernetes, the pipeline, etc. We have also set up our Kafka source connector for Debezium. Here it is:
- name: "our-connector"
config:
connector.class: "io.debezium.connector.mysql.MySqlConnector"
tasks.max: "1"
group.id: "our-connector"
database.server.name: "our-api"
database.hostname: "******"
database.user: "******"
database.password: "******"
database.port: "3306"
database.include.list: "our_db"
column.include.list: "our_db.our_table.our_field"
table.include.list: "our_db.our_table"
database.history.kafka.topic: "inf.our_table.our_db.schema-changes"
database.history.kafka.bootstrap.servers: "kafka-cluster-kafka-bootstrap.kafka:9092"
value.converter: "org.apache.kafka.connect.json.JsonConverter"
value.converter.schemas.enable: "false"
key.converter: "org.apache.kafka.connect.json.JsonConverter"
key.converter.schemas.enable: "false"
snapshot.locking.mode: "none"
tombstones.on.delete: "false"
event.deserialization.failure.handling.mode: "ignore"
database.history.skip.unparseable.ddl: "true"
include.schema.changes: "false"
snapshot.mode: "initial"
transforms: "extract,filter,unwrap"
predicates: "isOurTableChangeOurField"
predicates.isOurTableChangeOurField.type: "org.apache.kafka.connect.transforms.predicates.TopicNameMatches"
predicates.isOurTableChangeOurField.pattern: "our-api.our_db.our_table"
transforms.filter.type: "com.redhat.insights.kafka.connect.transforms.Filter"
transforms.filter.if: "!!record.value() && record.value().get('op') == 'u' && record.value().get('before').get('our_field') != record.value().get('after').get('our_field')"
transforms.filter.predicate: "isOurTableChangeOurField"
transforms.unwrap.type: "io.debezium.transforms.ExtractNewRecordState"
transforms.unwrap.drop.tombstones: "false"
transforms.unwrap.delete.handling.mode: "drop"
transforms.extract.type: "org.apache.kafka.connect.transforms.ExtractField{{.DOLLAR_SIGN}}Key"
transforms.extract.field: "id"
This configuration publishes the change message to Kafka (captured from Kowl).
As you can see, we have the original record's id and the changed field's new value.
No problem so far. Actually, we do have a problem :) Our field is a DATETIME type in MySQL, but Debezium publishes it as Unix time.
First question: how can we publish this as a formatted datetime (YYYY-mm-dd HH:ii:ss, for example)?
Let's move on.
Here is the actual problem. We have searched a lot, but all the examples write the whole record to Couchbase. We have already created this record in Couchbase (and manipulated the data as well); we just want to keep it up to date.
Here is example data from Couchbase. We want to change only the bill.dateAccepted field; we tried some YAML configs but had no success on the sink side.
Here is our sink config:
- name: "our-sink-connector-1"
config:
connector.class: "com.couchbase.connect.kafka.CouchbaseSinkConnector"
tasks.max: "2"
topics: "our-api.our_db.our_table"
couchbase.seed.nodes: "dev-couchbase-couchbase-cluster.couchbase.svc.cluster.local"
couchbase.bootstrap.timeout: "10s"
couchbase.bucket: "our_bucket"
couchbase.topic.to.collection: "our-api.our_db.our_table=our_bucket._default.ourCollection"
couchbase.username: "*******"
couchbase.password: "*******"
key.converter: "org.apache.kafka.connect.storage.StringConverter"
key.converter.schemas.enable: "false"
value.converter: "org.apache.kafka.connect.json.JsonConverter"
value.converter.schemas.enable: "false"
connection.bucket : "our_bucket"
connection.cluster_address: "couchbase://couchbase-srv.couchbase"
couchbase.document.id: "${/id}"
Partial answer to your first question: one approach is to use an SPI custom converter to convert the Unix datetime to a string. If you want to convert all the datetimes and your input message contains many datetime fields, you can just look at the JDBC type and do the conversion:
https://debezium.io/documentation/reference/stable/development/converters.html
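A converter built against that SPI is wired in under the source connector's config: block via the converters property, roughly like this (the class name and the format key are hypothetical placeholders for whatever you implement):
    converters: "datetimeToString"
    datetimeToString.type: "com.example.DateTimeToStringConverter"
    datetimeToString.format: "yyyy-MM-dd HH:mm:ss"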
As for extracting inserts/updates, you can write a custom SMT (single message transform) that receives the before and after records together with the operation type (I/U/D) and extracts the delta by comparing the before and after fields. When I tried something like this in the past, I came across the following project, which was quite handy as a reference. This way you have a delta field and a key, so the sink can update just that field instead of rewriting the full document (though the sink has to support this).
https://github.com/michelin/kafka-connect-transforms-qlik-replicate
The Couchbase source connector does not support watching individual fields. In general, the Couchbase source connector is better suited for replication than for change data capture. See the caveats mentioned in the Delivery Guarantees documentation.
The Couchbase Kafka sink connector supports partial document updates via the built-in SubDocumentSinkHandler or N1qlSinkHandler. You can select the sink handler by configuring the couchbase.sink.handler connector config property, and customize its behavior with the Sub Document Sink Handler config options.
Here's a config snippet that tells the connector to update the bill.dateAccepted property with the entire value of the Kafka record. (You'd also need to use a Single Message Transform to extract just this field from the source record.)
couchbase.sink.handler=com.couchbase.connect.kafka.handler.sink.SubDocumentSinkHandler
couchbase.subdocument.path=/bill/dateAccepted
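For the "extract just this field" part mentioned above, one option is Kafka's built-in ExtractField transform on the sink connector, assuming the unwrapped Debezium record exposes the column as a top-level field (our_field below is a placeholder for your actual column name):
transforms=extractField
transforms.extractField.type=org.apache.kafka.connect.transforms.ExtractField$Value
transforms.extractField.field=our_field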
If the built-in sink handlers are not flexible enough, you can write your own custom sink handler using the CustomSinkHandler.java example as a template.

What is a view in Couchbase?

I am trying to understand what exactly a Couchbase view is used for. I have gone through some material in the docs, but the 'view' concept still doesn't quite make sense to me.
Are views in Couchbase analogous to views in an RDBMS?
https://docs.couchbase.com/server/6.0/learn/views/views-basics.html
A view performs the following on the Couchbase unstructured (or
semi-structured) data:
Extract specific fields and information from the data files.
Produce a view index of the selected information.
How do the view and its index work here? It seems there is a separate index for the view, so if a document is updated, are both indexes updated?
https://docs.couchbase.com/server/6.0/learn/views/views-store-data.html
In addition, the indexing of data is also affected by the view system
and the settings used when the view is accessed.
Helpful post:
Views in Couchbase
You can think of Couchbase Map/Reduce views as similar to materialized views, yes. Except that you create them with JavaScript functions (a map function and optionally a reduce function).
For example:
function(doc, meta)
{
  emit(doc.name, [doc.city]);
}
This will look at every document, and save a view of each document that contains just city, and has a key of name.
For instance, let's suppose you have two documents;
[
  key 1 {
    "name" : "matt",
    "city" : "new york",
    "salary" : "100",
    "bio" : "lorem ipsum dolor ... "
  },
  key 2 {
    "name" : "emma",
    "city" : "columbus",
    "salary" : "120",
    "bio" : "foo bar baz ... "
  }
]
Then, when you 'query' this view, instead of full documents, you'll get:
[
  key "matt" {
    "city" : "new york"
  },
  key "emma" {
    "city" : "columbus"
  }
]
This is a very simple map. You can also use reduce functions like _count, _sum, _stats, or your own custom.
The results of this view are stored alongside the data on each node (and updated whenever the data is updated). However, you should probably stay away from Couchbase views because:
Views are stored alongside the data on each node, so when you query one, results have to be pulled from every node and combined ("scatter/gather").
JavaScript map/reduce doesn't give you all the query capabilities you might want. You can't do things like 'joins', for instance.
Couchbase has SQL++ (aka N1QL), which is more concise, declarative, and uses global indexes (instead of scatter/gather), so it will likely be faster and put less strain on rebalances, etc. (see the short example after this list).
Views are deprecated as of Couchbase Server 7.0 (and not available in Couchbase Capella at all).
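For comparison, a rough SQL++/N1QL equivalent of the map view above (the bucket and index names are placeholders) would be:
CREATE INDEX idx_name_city ON our_bucket(name, city);
SELECT d.name, d.city
FROM our_bucket AS d
WHERE d.name IS NOT MISSING;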

How to work with configuration files in Airflow

In Airflow, we've created several DAGs, some of which share common properties, for example the directory to read files from. Currently, these properties are listed as a property in each separate DAG, which will obviously become problematic in the future: if the directory name were to change, we'd have to go into each DAG and update this piece of code (possibly even missing one).
I was looking into creating some sort of configuration file which can be parsed by Airflow and used by the various DAGs when a certain property is required, but I cannot seem to find any documentation or guide on how to do this. The most I could find was documentation on setting up connection IDs, but that does not meet my use case.
So my question is: is it possible to do the above, and if so, how?
Thanks in advance.
There are a few ways you can accomplish this based on your setup:
You can use a DagFactory type approach where you have a function generate DAGs. You can find an example of what that looks like here
You can store a JSON config as an Airflow Variable, and parse through that to generate a DAG. You can store something like this in Admin -> Variables:
[
  {
    "table": "users",
    "schema": "app_one",
    "s3_bucket": "etl_bucket",
    "s3_key": "app_one_users",
    "redshift_conn_id": "postgres_default"
  },
  {
    "table": "users",
    "schema": "app_two",
    "s3_bucket": "etl_bucket",
    "s3_key": "app_two_users",
    "redshift_conn_id": "postgres_default"
  }
]
Your DAG could get generated as:
import json
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.redshift_to_s3_operator import RedshiftToS3Transfer  # Airflow 1.10-style import path

# Placeholder DAG definition; use your own dag_id, schedule and start_date.
dag = DAG("redshift_to_s3_sync", schedule_interval=None, start_date=datetime(2021, 1, 1))

sync_config = json.loads(Variable.get("sync_config"))

with dag:
    start = DummyOperator(task_id='begin_dag')
    for table in sync_config:
        d1 = RedshiftToS3Transfer(
            task_id='{0}'.format(table['s3_key']),
            table=table['table'],
            schema=table['schema'],
            s3_bucket=table['s3_bucket'],
            s3_key=table['s3_key'],
            redshift_conn_id=table['redshift_conn_id']
        )
        start >> d1
Similarly, you can just store that config as a local file and open it as you would any other file, as sketched below. Keep in mind the best answer to this will depend on your infrastructure and use case.
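A minimal sketch of that local-file variant (the config/sync_config.json path is just an assumed location next to the DAG file):
import json
from pathlib import Path

# Hypothetical location of a shared config file checked in alongside the DAGs.
CONFIG_PATH = Path(__file__).parent / "config" / "sync_config.json"

def load_sync_config():
    # Read the shared configuration once per DAG-file parse.
    with open(CONFIG_PATH) as f:
        return json.load(f)

sync_config = load_sync_config()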

CloudFormation Template - any way to get a Spot-Fleet-Request ID?

I'm attempting to create a single template that creates the following:
AWS::EC2::SpotFleet resource
2 AWS::ApplicationAutoScaling::ScalingPolicy resources (scale up, scale down)
Initially, my template included only the SpotFleet resource, and I confirmed that the stack would create without issue. Once I added the ScalingPolicy resources, the stack would rollback because there was "No scalable target registered for namespace..." So, I added an additional resource.
AWS::ApplicationAutoScaling::ScalableTarget resource.
(From http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-applicationautoscaling-scalabletarget.html#cfn-applicationautoscaling-scalabletarget-resourceid)
{
  "Type" : "AWS::ApplicationAutoScaling::ScalableTarget",
  "Properties" : {
    "MaxCapacity" : Integer,
    "MinCapacity" : Integer,
    "ResourceId" : String,
    "RoleARN" : String,
    "ScalableDimension" : String,
    "ServiceNamespace" : String
  }
}
The ResourceID is a required property. I have the data for all the other properties, but when researching what data is needed for the ResourceID property, I have found that the data I need is the spot-fleet-request ID, (something like this: "SpotFleetRequestId": "sfr-73fbd2ce-aa30-494c-8788-1cee4EXAMPLE").
So here's the problem: Since I am creating the spot fleet request in the same template as the scaling policy, I can't put the SpotFleetRequestId in manually, since to my knowledge this is created when the resource is and there's no way to anticipate what the request ID will be. In other templates, with other kinds of resources, I've simply used "Ref" or "Fn::GetAtt" to pass in the arn of a resource without having to manually input this. However--there seems to be no way to do this with a SpotFleetRequestID. All the research I've done has turned up nothing, not even a single template example that uses a method like I'm describing - the only examples available assume that the scalable target resource already exists and the SpotFleetRequestID is known prior to creating the ScalingPolicy.
Does anyone have any idea if referring to the SpotFleetRequestID of an AWS::EC2::SpotFleet initialized in the same template is even possible? Or am I just missing something REALLY obvious?
Turns out that if you "Ref" the logical name of the AWS::EC2::SpotFleet it will return the request ID. Then, it's a matter of using "Fn::Join" to get the right data for the ResourceID. Should look something like this:
"ResourceId": {
"Fn::Join": [
"/",
[
"spot-fleet-request",
{
"Ref": "SpotFleet"
}
]
]
},
That will output: spot-fleet-request/"SpotFleetRequestID"
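Putting it together, a sketch of the full ScalableTarget resource (the capacity values and the AutoScalingRole resource name are placeholders):
"SpotFleetScalableTarget": {
  "Type": "AWS::ApplicationAutoScaling::ScalableTarget",
  "Properties": {
    "MinCapacity": 1,
    "MaxCapacity": 10,
    "ResourceId": {
      "Fn::Join": [ "/", [ "spot-fleet-request", { "Ref": "SpotFleet" } ] ]
    },
    "RoleARN": { "Fn::GetAtt": [ "AutoScalingRole", "Arn" ] },
    "ScalableDimension": "ec2:spot-fleet-request:TargetCapacity",
    "ServiceNamespace": "ec2"
  }
}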

Firebase Security Rules?

I seem to be having a hard time with Firebase security rules. I've read the guides, but the simulator results aren't descriptive enough (it would be much easier if we could just hover over a node and a button popped up where we could update the rules).
Here's what my structure looks like:
chats
  - randomChatId01
    - name: "awesome chat"
    - members:
      - userId01 : true
      - userId02 : true
  - randomChatId02
    - members:
      - userId02 : true
  - randomChatId03
    - members:
      - userId01 : true
      - userId02 : true
  - ...
I only want a user to be able to read a chat node if that node's members child contains the authenticated user's auth.uid.
So in this case if userId01 were logged in, she would only have read access to randomChatId01 and randomChatId03.
This is the rule I have:
{
  "rules": {
    "chats": {
      "$chat": {
        ".read": "data.child('members').val().contains(auth.uid)"
      }
    }
  }
}
However it's returning the following in the simulator:
Attempt to read /chats with auth={"provider":"anonymous","uid":"eF4ztDEXz7"}
/
/chats
No .read rule allowed the operation.
Read was denied.
This is because Firebase Security Rules are evaluated at the location that you read from.
You're trying to read /chats. The user does not have read permission to /chats, so the operations fails straight away.
If you read /chats/randomChatId01 as userId01 it will succeed.
This is covered in the documentation section rules are not filters. Also see Michael Lehenbauer's answer here: Restricting child/field access with security rules
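As a side note, the membership check itself is usually written with hasChild() rather than contains(); a minimal sketch of the per-chat rule (it grants read access per chat, not on the /chats list itself):
{
  "rules": {
    "chats": {
      "$chat": {
        ".read": "data.child('members').hasChild(auth.uid)"
      }
    }
  }
}
To list only the chats a user belongs to, the usual pattern is to also maintain an index such as /users/$uid/chats and read the chat IDs from there.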