I'm just learning Firebase and I'd like to know why one would need custom reference keys instead of just using childByAutoId. The examples from the docs mostly look like the following:
{
  "users": {
    "alovelace": {
      "name": "Ada Lovelace",
      "contacts": { "ghopper": true }
    },
    "ghopper": { ... },
    "eclarke": { ... }
  }
}
but why not use something like
{
  "users": {
    "gFlmT9skBHfxf7vCBCbhmxg6dll1": {
      "name": "Ada Lovelace",
      "contacts": { "ghopper": true }
    },
    "gFlmT9skBHfxf7vCBCbhmxg6dll2": { ... },
    "gFlmT9skBHfxf7vCBCbhmxg6dll3": { ... }
  }
}
I would prefer the first example for readability. Aside from that, would there be any impact on Firebase features and other development concerns like querying, updating, etc.? Thanks!
Firebase's childByAutoId method is great for generating the keys in a collection (see the sketch after this list):
Where the items need to be ordered by their insertion time
Where the items don't have a natural key
Where it is not a problem if the same item occurs multiple times
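A chat-style messages feed is the classic case where all three conditions hold. In the JavaScript SDK the equivalent of childByAutoId is push(); a minimal sketch (the /messages path is made up):

// push() generates a chronologically ordered, collision-free key
firebase.database().ref('/messages').push({
  text: "Hello world",
  sentAt: firebase.database.ServerValue.TIMESTAMP
});

The generated push IDs sort chronologically, so reading /messages in key order yields insertion order.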
In a collection of users, none of these conditions (usually) apply: the order doesn't matter, users can only appear in the collection once, and the items do have a natural key.
That last one may not be clear from the sample in the documentation. Users stored in the Firebase Database usually come from a different system, often from Firebase Authentication. That system already gives each user a unique ID; in the case of Firebase Authentication it is called the UID. So if you have a collection of users, using their UID as the key makes it easy to find a user based on their ID. In the documentation samples, just read the keys as if they were (friendly, readable versions of) the UID of that user.
In your example, imagine that you've read the node for Ada Lovelace and want to look up her contacts. You'd need to run a query on /users, which gets more and more expensive as you add users. But in the model from the documentation you know precisely which node you need to read: /users/ghopper.
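To make this concrete, here is a minimal sketch using the Firebase Web SDK (the uid child in the second query is a hypothetical field you would have to store yourself if you used push IDs as keys):

// keyed by UID: finding one user is a single direct read
firebase.database().ref('/users/ghopper').once('value')
  .then(function(snapshot) { console.log(snapshot.val()); });

// keyed by push IDs: finding the same user requires a query over all of /users
firebase.database().ref('/users')
  .orderByChild('uid').equalTo('ghopper')
  .once('value')
  .then(function(snapshot) { console.log(snapshot.val()); });

And unless you add an .indexOn rule for uid, that second query is filtered on the client after downloading the whole collection.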
Related
Currently, I have an API endpoint that needs to return a list of items in JSON format. However, each of the items returned can have a different structure.
E.g., think of a feed API where the structure of each item within the response can be very different.
Is it standard to return an API response with multiple items, each with a different structure?
A made-up sample is below to show the different structures.
Store, Candy, and Personnel in the example are logically the same kind of thing in my case (3 different items). However, the structure underneath can be very different, with different key-value pairs, different levels of nesting, etc.
[
  {
    "store": {
      "book": [
        {
          "category": "reference",
          "author": "Nigel Rees",
          "title": "Sayings of the Century",
          "price": 8.95
        },
        {
          "category": "fiction",
          "author": "Evelyn Waugh",
          "title": "Sword of Honour",
          "price": 12.99
        }
      ],
      "bicycle": {
        "color": "red",
        "price": 19.95
      }
    }
  },
  {
    "candy": {
      "type": "chocolate",
      "manufacturer": "Hershey's",
      "cost": 10.00,
      "reduced_cost": 9.00
    }
  },
  {
    "Personnel": {
      "name": "chocolate",
      "profile": {
        "key": "value",
        "another_key": "value",
        "something": {
          "key": "value",
          "another_key": "value"
        }
      }
    }
  }
]
There are no strict rules in REST for how you design your payloads. However, there are still things to consider when doing so. Without knowing the specifics of your needs it's hard to give specific advice, but in general, when designing a JSON REST API, here is what I think about:
On average, how large will my payload be? We don't want to pass large amounts of data on each request; that will make your application extremely slow and perhaps even unusable on mobile devices. For me, the limit in the absolute worst case is 1 MB, and maybe even that is too high. If you find your payload is too large, break it down into separate resources: for example, rather than including the books in the response of your stores resource, just reference the unique IDs of the books, which can be accessed through /stores/books/{id} (see the sketch after this list).
Is it simple enough that a person who stumbles across the resource can understand its general use? The simpler an API is, the more useful it is for users. If the structure is really complex, breaking it into several resources may be the better option.
This point balances #1: try to reduce the number of requests needed to get a certain piece of data as much as possible (while still considering the points above). Excessively breaking payloads down into separate resources also reduces performance.
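To illustrate the first point, a stores payload could reference its books instead of embedding them (a sketch; the field names here are made up):

{
  "id": "store-42",
  "name": "Main Street Books",
  "book_ids": [101, 102, 103]
}

A client that needs a book's details then follows /stores/books/101, while a client that merely lists stores transfers only the small payload above.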
I have a few people saved in my database. Sometimes teams are formed and I need to add a few people to a team. Some people won't have any team.
I will never need to access the teams separately, i.e. I only ever need to look at the teams when I am already looking at the people, so I thought of saving it all together: instead of creating a new collection for teams, it might be better to just nest them:
people: [
  {
    name: "Jack",
    team: team_id_1,
  },
  {
    name: "John",
    team: null,
  },
  {
    name: "Jane",
    team: team_id_1,
  },
  ...
],
teams: [
  {
    team_id: team_id_1,
    color: "red",
  },
  ...
]
However, I don't know how I can create this sort of relationship in MongoDB and assign a team_id to each of the person objects without teams being their own collection.
Someone here suggested to "define user's team as a reference to the teams collection and then populate it", but, as I understand it, this means creating a separate model for my teams and turning them into their own collection. I would prefer not to do this, since I will never need to access this collection directly and don't want to go through one massive collection of teams if I only ever need to access three teams and can easily keep them in the relevant object.
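For reference, that suggestion would look roughly like this in Mongoose (the schema and model names here are my own guesses):

const mongoose = require('mongoose');

const teamSchema = new mongoose.Schema({ color: String });
const personSchema = new mongoose.Schema({
  name: String,
  // stores only the ObjectId of a document in a separate teams collection
  team: { type: mongoose.Schema.Types.ObjectId, ref: 'Team' },
});

const Team = mongoose.model('Team', teamSchema);
const Person = mongoose.model('Person', personSchema);

// populate() swaps the stored id for the referenced team document
Person.find().populate('team').then(people => console.log(people));

This is exactly what I'd like to avoid, since it turns teams into a first-class collection.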
What would you advise me to do? Is this the wrong approach?
I've been looking at JSON API and keep getting hung up on scalability scenarios. Let's say you have a large collection (1000) of models, each with 3 or 4 relationships.
From my understanding, JSON API requires you to specify at least the relationships with their associated ID(s) (and optionally sideload the related resources with include). If that collection of 1000 models has to do a JOIN for every single relationship to populate a valid JSON API payload like the one below:
...
{
  "some_relationship_name": {
    "data": [
      { "id": 1, "type": "derp" }
      ...
    ]
  }
}
I don't see how this can possibly scale in any reasonable way.
You don't have to specify the IDs of the relationships. You can just specify links that let the client fetch the relationship data on demand. Check out the specification.
So you can do something like this:
{
  id: '1',
  type: 'base',
  relationships: {
    relA: {
      links: {
        self: '/base/1/relationships/relA',
        related: '/base/1/relationships/relA/related',
      }
    },
    ...
  },
  attributes: {...}
}
So you don't have to JOIN anything you don't directly need. For example, in a list view you don't join information that you only need in the detail view.
I'm not sure where you see a problem. At 20 bytes per relationship entry × 4 relationships × 1000 records, that is roughly 80-100 kB. Existing adapters should have no issues processing such data fast enough.
If you need to transfer less data, there are several options. You can add compression, but be aware that for such small payloads it is usually faster to transfer the data than to compress it.
The other option is to send only the data you really need. Usually you don't need 1000 records in a web app immediately, so paging and lazy loading should help you send only the data that is actually required.
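For instance, a paged request and response might look like this (the page[number]/page[size] strategy is a common convention; the spec itself only reserves the page query parameter family):

GET /models?page[number]=2&page[size]=100

{
  "data": [ ...100 resource objects... ],
  "links": {
    "prev": "/models?page[number]=1&page[size]=100",
    "next": "/models?page[number]=3&page[size]=100"
  }
}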
About to add keywords/tags to one of the business objects in our database; let's call the table users. I've considered adding a tags table and a usertags join table, but I can't see an easy way to perform queries that combine AND and OR. For example, I'd like to be able to return all the users that have tags A AND B, as well as query for users with tag A OR B. OR queries are easy, but AND queries are harder.
I've even considered putting all the user records into a JSON-backed database so I could store the users with their keywords duplicated like this:
{
  user_id: 1,
  keyword: "A",
  keyword: "B"
}
etc.
but I'm not sure how performant a database like MongoDB is when running queries like this.
Yet another option is to have a tags field on the user table, and use REGEX queries. In some ways I like this the best, since it means it's much easier to have ad hoc queries, but I'm worried about performance.
Note that the tag isn't the only field that we need to search by, so ideally we'd have a solution that supports date range searches as well as searches against other fields.
I can only really speak to MongoDB here, so I'll stick to that.
Let's assume a more accurate model like
{
  _id: "foo@bar.com",
  keywords: [ "A", "B" ],
  joined: ISODate("2014-12-28T12:00:00.123Z"),
  tags: [ "C", "D" ],
  location: { type: "Point", coordinates: [ -86.9141607, 38.1200538 ] },
  notes: "Lorem ipsum dolor sit amet."
}
Performance in MongoDB is determined more or less by two factors: whether a field you query is indexed, and whether the index is in RAM. In general, MongoDB tries to keep at least all indices in RAM, plus as big a subset of the data as possible. Indexing a field is quite easy. To stick with your first requirement, we index the keywords field:
db.yourCollection.ensureIndex({ keywords: 1})
What happens now is that MongoDB will create a list of keywords and a link to the respective documents. So if you do a query for keyword "A"
db.yourCollection.find({keywords: "A"})
only the documents actually containing the keyword "A" will be read and returned. This is called an index scan. If there were no index on "keywords", MongoDB would read each and every document in the collection, checking whether the keywords field contained "A" and adding the matching documents to the result set, which is called a collection scan.
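You can check which of the two happened with explain(); depending on your MongoDB version, the winning plan will report an index scan (e.g. IXSCAN) or a collection scan (e.g. COLLSCAN):

db.yourCollection.find({ keywords: "A" }).explain()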
Now, checking for documents that have both the "A" and the "B" keyword is rather simple with the $all operator:
db.yourCollection.find({ keywords: { $all: [ "A", "B" ] } })
and the OR variant uses $or:
db.yourCollection.find({ $or: [ { keywords: "A" }, { keywords: "B" } ] })
Since we have indexed the "keywords" field, the logical checks are done in RAM and the respective documents are added to the result set.
As for regex searches, they are absolutely possible, and anchored, case-sensitive prefix expressions can use the index:
db.yourCollection.find({ keywords: /^C/ })
will return all documents containing a keyword beginning with "C" using an index scan. (Case-insensitive queries such as /^c.*/i generally cannot use the index efficiently.)
As for your requirement for doing queries on date ranges:
db.yourCollection.find({
  joined: {
    $gte: ISODate("2014-12-28T00:00:00.000Z"),
    $lt: ISODate("2014-12-29T00:00:00.000Z")
  }
})
will return all users who joined on Dec 28, 2014. Since we haven't created an index on that field yet, a collection scan will be used. Of course, you can create an index on the "joined" field.
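Creating that index follows the same pattern as the keywords index above:

db.yourCollection.ensureIndex({ joined: 1 })

After that, the date-range query above will use an index scan as well.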
So, let's assume you want to find all users with a keyword "A" from Santa Claus, IN:
db.yourCollection.find({
  keywords: "A",
  location: {
    $nearSphere: {
      $geometry: {
        type: "Point",
        coordinates: [ -86.9141607, 38.1200538 ]
      },
      $minDistance: 0,
      $maxDistance: 10000
    }
  }
})
This will actually fail, since $nearSphere requires a geospatial index, so we have to create one first:
db.yourCollection.ensureIndex( { location: "2dsphere" } )
Now the mentioned query will work as expected.
Conclusion
Your requirements can be fulfilled by MongoDB with good performance, provided the proper indices are in place. However, you might want to dig into MongoDB's limitations.
You might want to read a bit more. Here are my suggestions:
Introduction to MongoDB
Index documentation
Data modelling introduction
StackOverflow lets you search for posts by tags, and lets you filter by an intersection of tags, e.g. ruby x mysql x tags. But it's typically inefficient to retrieve such lists from MySQL using multiple joins on the taggings table. What's a more performant way to implement filter-by-multiple-tags queries?
Is there a good NoSQL approach to this problem?
In a NoSQL or document-oriented scenario, you'd have the actual tags as part of your document, likely stored as a list. Since you've tagged this question with "couchdb", I'll use that as an example.
A "post" document in CouchDB might look like:
{
  "_id": <generated>,
  "question": "Question?",
  "answers": [ ...list of answers... ],
  "tags": ["mysql", "tagging", "joins", "nosql", "couchdb"]
}
Then, to generate a view keyed by individual tags (one row per tag per post):
{
  "_id": "_design/tags",
  "language": "javascript",
  "views": {
    "all": {
      "map": "function(doc) {
        doc.tags.forEach(function(tag) {
          emit(tag, null);
        });
      }"
    }
  }
}
In CouchDB, you can issue an HTTP POST with multiple keys if you wish; an example is in the documentation. Using that technique, you can fetch the posts for several tags in a single request. Note that a multi-key lookup returns the union of the matching rows, so for an intersection (posts carrying all of the requested tags) you would intersect the returned document IDs on the client, e.g. by keeping only the IDs that appear once per requested key.
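Such a multi-key fetch is a plain HTTP POST against the view with a keys array in the body (the database name here is made up):

POST /yourdb/_design/tags/_view/all
Content-Type: application/json

{ "keys": ["mysql", "couchdb"] }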
Note: Setting the value to null, above, helps keep the views small. Use include_docs=true in your query if you want to see the actual documents as well.