Translate sql database schema to IndexedDB - mysql

I have three tables in my SQL schema: Clients (with address and other details), Orders (with order details), and Files (which stores uploaded files). Both the Files table and the Orders table contain foreign keys referencing the Clients table.
How would I do that in IndexedDB? I'm new to this whole key-index thinking and would just like to understand how the same thing would be done with IndexedDB.
Now I know there is a shim.js file, but I'm trying to understand the concept itself.
Help and tips highly appreciated!
EDIT:
So I would really have to think about which queries I want to allow and then optimize my IndexedDB implementation for those queries, is that the main point here? Basically, I want to store a customer once, then many orders for that customer, and then be able to upload small files (preferably PDFs) for that customer, not even necessarily for each order (although if that's easy to implement, I may do it)... I see every customer as a separate entity; I won't have things like "give me all customers who ordered xy" - I only need to have each customer once and then store all the orders and all the files for that customer. I want to be able to go: search for the customer with the name XY - which then gives me a list of all orders and their dates and a list of the files uploaded for that customer (maybe associated with the order).

This question is a bit too broad to answer correctly. Nevertheless, the major concept to learn when transitioning from SQL to No-SQL (indexedDB) is the concept of object stores. Most SQL databases are relational and perform much of the work of optimizing queries for you. indexedDB does not. So the concepts of normalization and denormalization work a bit differently. The focal point is to explicitly plan your own queries. Unlike the design of an app/system that allows simple ad-hoc SQL queries that are designed at a later point in time, and possibly even easily added/changed at a later time, you really need to do a lot of the planning up front for indexedDB.
So it is not quite safe to say that the transition is simply a matter of creating three object stores to correspond to your three relational tables. For one, there is no concept of joining in indexedDB so you cannot join on foreign keys.
It is not clear from your question but your 3 tables are clients, orders, and files. I will go out on a limb here and make some guesses. I would bet you could use a single object store, clients. Then, for each client object, store the normal client properties, store an orders array property, and store a files array property. In the orders array, store order objects.
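To make that concrete, here is a rough sketch of what such a denormalized client record might look like in a single "clients" object store. All of the property names here are my own invention, not anything prescribed by IndexedDB:

```javascript
// A denormalized client record for a single "clients" object store.
// Orders and files are embedded arrays instead of foreign-keyed tables.
const client = {
  id: 1,                       // the keyPath for the object store
  name: 'ACME Corp',
  address: '123 Example St',
  orders: [
    { orderId: 1, date: '2018-01-05', items: ['widget'] },
    { orderId: 2, date: '2018-02-10', items: ['gadget'] }
  ],
  files: [
    // file metadata; actual binary content would be stored as Blobs
    { fileName: 'invoice-1.pdf', orderId: 1 }
  ]
};

// Listing a client's orders is then just a property read - no join needed.
function listOrderIds(clientRecord) {
  return clientRecord.orders.map(function (o) { return o.orderId; });
}
```

Once you have fetched the client object by its key, everything about that client is already in hand.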
If your files are binary, this won't work as plain values; you will need to use Blobs, and you may even encounter issues with Blob support in various browser indexedDB implementations (Chrome sort of supports it; it is unclear from version to version).
This assumes your typical query plan is that you need to do something like list the orders for a client, and that is the most frequently used type of query.
If you needed to do something across orders, independent of which client an order belongs to, this would not work so well and you would have to iterate over the entire store.
If the clients-orders relation is many to many, then this also would not work so well, because of the need to store the order info redundantly per client. However, one note here: this redundant storage is quite common in NoSQL-style databases like indexedDB. The goal is not to perfectly model the data, but to store the data in such a way that your most frequently occurring queries complete quickly (while still maintaining correctness).
Edit:
Based on your edit, I would suggest a simple prototype that uses three object stores. In your client view page where you display client details, simply run three separate queries.
1. Get the one entity from the client object store based on client id.
2. Open a cursor over the orders store and get all orders for the client. In the orders store, use a client-id property. Create an index on this client-id property. Open the cursor over the index for a specific client id.
3. Open a cursor over the files store using a similar tactic as #2.
In your bizlogic layer, enforce your data constraints. For example, when deleting a client, first delete all the files from the files store, then delete all the orders from the orders store, and then delete the single client entity from the client store.
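A rough sketch of that three-store setup follows. The store, index, and property names are my own choices; in a real app the upgrade function runs inside the onupgradeneeded handler of indexedDB.open:

```javascript
// Schema setup: three stores, with a client-id index on orders and files.
function upgrade(db) {
  db.createObjectStore('clients', { keyPath: 'clientId' });

  const orders = db.createObjectStore('orders',
    { keyPath: 'orderId', autoIncrement: true });
  orders.createIndex('byClientId', 'clientId');   // for query #2

  const files = db.createObjectStore('files',
    { keyPath: 'fileId', autoIncrement: true });
  files.createIndex('byClientId', 'clientId');    // for query #3
}

// Query #2: all orders for one client, via the index.
function getOrdersForClient(db, clientId, callback) {
  const tx = db.transaction('orders');
  const index = tx.objectStore('orders').index('byClientId');
  const request = index.getAll(IDBKeyRange.only(clientId));
  request.onsuccess = function () { callback(request.result); };
}
```

Query #3 over the files store is identical apart from the store name.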
What I am suggesting is to not overthink it. It is not that complicated. So far you have not described something that sounds like it will have performance issues so there is no need for something more elegant.

I will go with Josh's answer, but if you are still finding it hard to use indexedDB and want to continue using SQL, you can use sqlweb - it will let you run operations inside indexedDB using SQL queries.
e.g -
var connection = new JsStore.Instance('jsstore worker path');
connection.runSql("select * from Customers").then(function(result) {
    console.log(result);
});
Here is the link - http://jsstore.net/tutorial/sqlweb/

Related

How do Salesforce query using the relationships table behind the scenes?

I'm trying to figure out how Salesforce's metadata architecture works behind the scenes. There's a video they've released ( https://www.youtube.com/watch?v=jrKA3cJmoms ) where he goes through many of the important tables that drive it along (about 18m in).
I've figured out the structure for the basic representation / storage / retrieval of simple stuff, but where I'm hazy is how the relationship pivot table works. I'll be happy when:
a) I know exactly how the pivot table relates to things (RelationId column he mentions is not clear to me)
b) I can construct a query for it.
Screenshot from the video
I've not had any luck finding any resources describing it at this level in the detail I need, or managed to find any packages that emulate it that I can learn from.
Does anyone have any low-level experience with this part of Salesforce that could help?
EDIT: Thank you, David Reed, for the further details in your edit. So presumably you agree that things aren't exactly as explained?
In the 'value' column, the GUID of the related record is stored
This allows easy fetching of to-one related records and, with a little bit of simple SQL switching, resolving a group of records in the reverse direction.
I believe Salesforce doesn't have native many-to-many relationships (as opposed to using a 'junction' object), so the above is still relevant
I guess now though I wonder what the point of the pivot table is at all, as there's a very simple relationship going on here now. Unless the lack of index on the value columns dictates the need for one...
Or, could it be more likely/useful if:
The record's value column stores a GUID to the relationship record and not directly to the related record?
This relationship record holds all necessary information required to put together a decent query and ALSO includes the GUID of the related record?
Neither option clear up the ambiguity for me, unless I'm missing something.
You cannot see, query, or otherwise access the internal tables that underlie Salesforce's on-platform schema. When you build an application on the platform, you query relationships using SOQL relationship queries; there are no pivot tables involved in the work you can see and do on the platform.
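For reference, this is what the two directions of a SOQL relationship query look like against the standard Account/Contact objects. They are shown here as plain strings; nothing in them touches or reveals the internal tables:

```javascript
// Child-to-parent: dot notation walks up the relationship.
const childToParent =
  'SELECT Id, LastName, Account.Name FROM Contact';

// Parent-to-child: a nested subquery on the child relationship name.
const parentToChild =
  'SELECT Name, (SELECT LastName FROM Contacts) FROM Account';
```

The platform resolves these against its internal storage for you; how it does so is not part of the contract.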
While some presentations and documentation discuss at some level the underlying implementation, the precise details of the SQL tables, schemas, query optimizers, and so on are not public.
As a Salesforce developer or developer who interacts with Salesforce via the API, you do not need to worry about the underlying SQL implementation used on Salesforce's servers at almost any time. The main point at which that knowledge can become helpful is when you are working with massive data volumes (multiple millions of records). The most helpful documentation for that use case is Best Practices for Deployments with Large Data Volumes. The underlying schema is briefly discussed under Underlying Concepts. But bear in mind:
As a customer, you also cannot optimize the SQL underlying many application operations because it is generated by the system, not written by each tenant.
The implementation details are also subject to change.
Metadata Tables and Data Tables
When an organisation declares an object’s field with a relationship type, Force.com maps the field to a Value field in MT_Data, and then uses this field to store the ObjID of a related object.
I believe the documentation you mentioned is using the identifier ObjId ambiguously, and here actually means what it refers to earlier in the document as GUID - the Salesforce Id. Another paragraph states
The MT_Name_Denorm table is a lean data table that stores the ObjID and Name of each record in MT_Data. When an application needs to provide a list of records involved in a parent/child relationship, Force.com uses the MT_Name_Denorm table to execute a relatively simple query that retrieves the Name of each referenced record for display in the app, say, as part of a hyperlink.
This also doesn't make sense unless ObjId is being used to mean what is called GUID in the visual depiction of the table above in the document - the Salesforce Id of the record.

Database for 'who viewed this item also viewed..'

I want to create a 'who viewed this item also viewed' feature like Amazon or eBay have. I'm deciding between MySQL and a non-relational database like MongoDB.
Edit: It seems to be straightforward to implement this feature in MySql. My guess is creating 'viewed' table in which userId, itemId, and time of viewing are saved. So, when trying to recommend off of a current item a user is looking at, I would Sub = (SELECT userId FROM viewed WHERE itemId == currentItemId) Then, SELECT itemId FROM viewed INNER JOIN Sub on viewed.userId = Sub.userId
Wouldn't this be too much for 100,000 users who viewed 100 pages this month?
For non-relational database, I don't feel it is right to have User to embed all users or Item to embed all Users. So, I'm thinking to have each User holds a list of itemIds he looked at and each Item holds a list of userIds seen by. And I'm not sure what to do next. Am I on the right path here?
If not, could you suggest a good way to implement this feature in non-relational database? And, does this suggestion have advantage in speed compared to MySql?
Initial Response
It seems to be straightforward to implement this feature in MySql by just calling JOIN on Item and User table.
Yes.
But, how fast or slow the database call will be to gather entire viewing history of 100,000 users at once?
How long is a piece of string ?
That depends on the standards and quality of your Relational Database implementation. If you have ID fields on all your files, it won't have Relational integrity, power, or speed, it will have 1970's ISAM Record Filing System speeds.
On a Sybase ASE server, on a small Unix box, a SELECT of similar intent on a table (not a file) with 16 billion rows returns 100 rows in 12 milliseconds.
For non-relational database, I don't feel it is right to have User to embed all users or Item to embed all Users. So, I'm thinking to have each User holds a list of item ids he looked at and each Item holds a list of user ids seen by.
I can't answer re MangoDb.
But for a Relational Database, that is how we implement it.
with one great difference: the two lists are implemented in a single table
each row is a single fact viewed [sorry] from two sides (the fact that an User has viewed an Item, is one and the same fact that an Item has been viewed by an User)
So it appears to be Relational thinking ... implemented Mango-style, which requires 100% data and table duplication. I have no idea whether that is good or bad in MongoDb, in the sense that it could well be what is required for the thing to "perform". Ugly as sin.
And I'm not sure what to do next. Am I on the right path here?
Right for Relational (as long as you use one table for the two "lists"). Ask a more specific question if you do not understand this point.
If not, could you suggest a good way to implement this feature in non-relational database? And, does this suggestion have advantage in speed compared to MySql?
Sorry, I can't answer that.
But it would be unlikely that a non-relational DB can store and retrieve info that is classic Relational, faster than a semi-relational Record Filing System such as MySQL. All things being equal, of course. A real SQL platform would be faster still.
Response to Comments
First you had:
So, I'm thinking to have each User holds a list of item ids he looked at and each Item holds a list of user ids seen by.
That is two lists. That is not good, because the second list is a 100% duplication of the first.
Now you have (edited in the Question, and in the new comments):
I didn't fully understand what you meant by 'use one table for the two list'. My interpretation is create 'viewed' table in which userId, itemId, and time of viewing are saved.
That is good, you now have one list.
Just to be clear about the database we are discussing, let me erect a model, and have you confirm it.
User Item Data Model
If you are not used to the standard Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation.
So, when trying to recommend off of a current item a user is looking at, I would Sub = (SELECT userId FROM viewed WHERE itemId == currentItemId). Then, SELECT itemId FROM viewed INNER JOIN Sub on viewed.userId = Sub.userId. Is this what you mean?
I did make a declaration and caution about the table, but I didn't give any directions regarding non-SQL coding, so no.
I would never suggest doing something in two steps, that can be done in a single step. SQL has its problems, but difficulty in obtaining information from a set of Relational tables (ie. a derived relation) using a single SELECT is definitely not one of them.
SUB is not SQL. Although I can guess at what it does, I may well be wrong, therefore I cannot comment on that code.
Against the model I have supplied, on an ISO/IEC/ANSI Standard SQL platform, I would use:
SELECT DISTINCT ItemId        -- Items viewed by ...
    FROM UserItem
    WHERE UserId IN (
        SELECT UserId         -- Users who viewed Item
            FROM UserItem
            WHERE ItemId = #CurrentItemId
        )
You will have to translate that into the non-SQL that your platform requires.
Wouldn't it be too much for 100,000 users who viewed 100 pages this month? Sorry for long question.
I have already answered that question in my initial response. Please read again.
You are trying to solve a performance problem that you do not yet have. That is not possible, given the laws of physics, the dependencies, our inability to reverse the chronology; etc. Therefore I recommend that you cease that activity.
Meanwhile, back at the farm, the cows need to be fed. Design the database first, then code the app, then if, and only if, there are performance problems, you can address them. IT Professionals can make scientific estimates, but I cannot give you a tutorial here in SO.
10,000,000 page views per month. You have not stated the number of Items, so the large figure is scary as hell. If you inform me as to how many Items; Users; average Items viewed per session; and the duration (eg. month) you wish to cover, I can give you more specific advice.
As I understand it, an User views 1 (one) Item. As a selling-up feature, you want the system to identify the list of Items people "who viewed this item also viewed ...". That would appear to be a small fraction of 10,000,000 views. You do have an index on each table, yes ? So the non-SQL program you are using will not read 10,000,000 views to find that fraction, it will navigate the index, and read only the pages that contain that fraction.
Some of the non-SQLs need a second index to perform what real SQL platforms perform with one index. I have given that second index in the model.
While I appreciate that it was alright that a full definition was not provided for the file you described, up to now, since I am providing a model, I have to provide a complete and correct one, not a partial one.
Since Users view Items more than once, I have given a table that allows that, and tracks the Number of Views, and the Date Last Viewed. It is one row per User::Item, ever. If you would like a table that supports one row per User::Item view, please ask, I will provide.
From where I sit, on the basis of facts established thus far, the 10,000,000 figure is not a concern.
This probably depends more on how you implement this feature than on the type of database used.
If you just store a lot of viewing history (like, "user x looked at item y"), you'd have to find the users who viewed an item, and then all the items those users looked at. That can all be done in a single database table. However, you may end up with very large result sets.
It may be easier to use a graph structure of "connected" items that is continually updated during runtime and then easily queried.
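A minimal sketch of that graph idea (the data structure and names are my own): keep a co-view counter per item pair, bump it whenever the same user touches two items, and read recommendations straight off the counters instead of scanning history.

```javascript
// Co-view graph: for each item, a map of other-item -> count of users
// who viewed both. Updated at runtime, queried directly.
const coViews = new Map();

function recordPairView(itemA, itemB) {
  for (const [from, to] of [[itemA, itemB], [itemB, itemA]]) {
    if (!coViews.has(from)) coViews.set(from, new Map());
    const edges = coViews.get(from);
    edges.set(to, (edges.get(to) || 0) + 1);
  }
}

function alsoViewed(item, limit) {
  const edges = coViews.get(item);
  if (!edges) return [];
  return [...edges.entries()]
    .sort(function (a, b) { return b[1] - a[1]; })  // most co-viewed first
    .slice(0, limit)
    .map(function (entry) { return entry[0]; });
}
```

The trade-off is that the graph must be kept up to date as views happen, rather than derived on demand from the raw history table.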

CouchDB: separate collections

A quick thought today, does CouchDB handle multiple collections of databases?
To explain what I mean, our web app has two types of users, free and commercial which differ greatly in their document and view structure. For all intents and purposes, they are completely different products. A database is created per customer, and contains all their particular data and settings.
Without going into too much needless info, we currently have a mix of commercial-based databases and free-based databases mixed together in one instance of CouchDB. From a purely organisational standpoint, it's quite messy to sift through the (currently 50) free-based databases to find the (currently 3) commercial-based databases. Is there a better way to organise or sort these?
Has anyone got any ideas? I know I could simply add prefixes to the databases, but was after a MySQL-type approach where creating a separate database would be possible.
Other than grouping your data by using a property of each document (i.e. you put all of your free customers in the same db and prefix your views with that property) I don't think there's any way of grouping similar databases together aside from your suggestion of a prefix.
You could always consider running two instances, one for the free users and one for the paid ones.
Why don't you add 2 more properties to your CouchDB user documents, like .instance and .type? In the .type field you could store 'FreeUserType' or 'PaidUserType', and in the .instance field just write some 'ID' of your 'group/collection'. This way you can query your user base in very complex ways, and it will give you a lot of flexibility in extending your db with more 'related' data objects that you can 'join' in queries.
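Sketched out, that might look like the following. The field values are illustrative; the map function is a standard CouchDB view, keyed on [type, instance] so one view serves both "all docs of a type" and "docs of a type within one instance" via startkey/endkey:

```javascript
// Example user document with the suggested extra fields.
const userDoc = {
  _id: 'user-42',
  name: 'Jane',
  type: 'PaidUserType',     // 'FreeUserType' or 'PaidUserType'
  instance: 'customer-7'    // the group/collection this doc belongs to
};

// CouchDB view map function: emit a compound key so range queries
// can slice by type alone, or by type + instance.
function map(doc) {
  if (doc.type && doc.instance) {
    emit([doc.type, doc.instance], null);
  }
}
```

With this in place, everything can live in one database while remaining cleanly separable at query time.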

Performance of MySql Xml functions?

I am pretty excited about the new MySQL XML functions.
Now I can finally embed something like "object oriented" documents in my oldschool relational database.
For an example use case, consider a user who signs up at your website using Facebook Connect.
You can fetch an object for the user using the graph api, and get nice information. This information however can vary vastly. Some fields may or may not be set, some may be added over time and so on.
Well, if you are just interested in very specific fields (for example friend relations, gender, movies...), you can project them into your relational database schema.
However, using the XML functions you could store the whole object inside a field and then your different models can access the data using the ExtractValue function. You can store everything right away without needing to worry about what you will need later.
But what will the performance be?
For example, I have a table with 50,000 entries which represent users.
I have an enum field that states "male", "female" (or various other genders to be politically correct).
The performance of for example fetching all males will be very fast.
But what about something like WHERE ExtractValue(userdata, '/gender/') = 'male' ?
How will the performance vary if the object gets bigger?
Can I maybe somehow put an index on specific XPath selections?
How do field types play together with these functions, performance-wise? Varchar/blob?
Do I need fulltext indexes?
To sum up my question:
MySQL XML functions look great. And I am sure they are really great if you just want to store structured data that you fetch and analyze further in your application.
But how will they stand up in procedures where there are internal scans/sorting/comparisons/calculations performed on them?
Can MySQL replace document-oriented databases like CouchDB/Sesame?
What are the gains and trade offs of XML functions?
How and why are they better/worse than a dynamic application that stores various data as attributes?
For example a key/value table with an xpath as key and the value as value connected to the document entity.
Has anyone had any other experiences with it, or noticed anything worth mentioning?
I tend to make comments similar to Pekka's, but I think the reason we cannot laugh this off is your statement "This information however can vary vastly." That means it is not realistic to plan to parse it all and project it into the database.
I cannot answer all of your questions, but I can answer some of them.
Most notably, I cannot tell you about performance on MySQL. I have seen it in SQL Server and tested it, and found that SQL Server performs in-memory XML extractions very slowly; to me it seemed as if it were reading from disk, but that is a bit of an exaggeration. Others may dispute this, but that is what I found.
"Can Mysql replace document oriented databases like CouchDB/Sesame?" This question is a bit over-broad but in your case using MySQL lets you keep ACID compliance for these XML chunks, assuming you are using InnoDB, which cannot be said automatically for some of those document oriented databases.
"How and why are they better/worse than a dynamic application that stores various data as attributes?" I think this is really a matter of style. You are given XML chunks that are (presumably) documented and MySQL can navigate them. If you just keep them as-such you save a step. What would be gained by converting them to something else?
The MySQL docs suggest that the XML file will go into a clob field. Performance may suffer on larger docs. Perhaps then you will identify sub-documents that you want to regularly break out and put into a child table.
Along these same lines, if there are particular sub-docs you know you will want to know about, you can make a child table, "HasDocs", do a little pre-processing, and populate it with names of sub-docs with their counts. This would make for faster statistical analysis and also make it faster to find docs that have certain sub-docs.
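A rough sketch of that pre-processing step follows. The sub-document names and the regex-based counting are my own assumptions for illustration; a real implementation would use a proper XML parser:

```javascript
// Count occurrences of a given sub-document element in an XML chunk.
// Naive tag counting for illustration only - use an XML parser in
// production.
function countSubDocs(xml, tagName) {
  const matches = xml.match(new RegExp('<' + tagName + '[\\s>]', 'g'));
  return matches ? matches.length : 0;
}

// Build the rows you would insert into a "HasDocs"-style child table.
function buildHasDocsRows(docId, xml, tagNames) {
  return tagNames
    .map(function (tag) {
      return { docId: docId, subDoc: tag, count: countSubDocs(xml, tag) };
    })
    .filter(function (row) { return row.count > 0; });
}
```

Querying the child table then answers "which docs contain a friends list?" without touching the XML blobs at all.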
Wish I could say more, hope this helps.

Implementing a database structure for generic objects

I'm building a PHP/MySQL website and I'm currently working on my database design. I do have some database and MySQL experience, but I've never structured a database from scratch for a real-world application which hopefully is going to get some good traffic, so I'd love to hear advice from people who've already done it, in order to avoid common mistakes. I hope my explanations are not too confusing.
What I need
In my application, the user should be able to write a post (title + text), then create an "object" (which can be anything, like a video, or a song, etc.) and attach it to the post. The site has a list of predefined object types the user can create, and I should be able to add new types in the future. The user should also have the ability to see the object's details in a dedicated page and add a comment to it - the same applies to posts.
What I tried
I created an objects table with these fields: oid, type, name and date. This table contains records for anything the user should be able to add comments to (i.e. posts and objects). Then I created a postmeta table which contains additional post data (such as text, author, last edit date, etc.), a videometa table for data about the "video" object (URL, description, etc.), and so on. A postobject table (pid,oid) links objects to posts. Additionally, there's a comments table which contains the comment text, the author and the ID of the object it refers to.
Since the list of object types is predefined and is probably not going to change (though I still need the ability to add a type easily at any time without changing the app's code structure or the database design), and it is relatively small, it's not a problem to create a "meta" table for each type and make a corresponding PHP class in my application to handle it.
Finally, a page on the site needs to show a list of all the posts including the objects attached to it, sorted by date. So I get all the records from the objects table with type "post" and join it with postmeta to get the post metadata. Then I query postobject to get all the objects attached to this post, and comments to get all the comments.
The questions
Does this make any sense? Is it any good to design a database in this way for a real world site? I need to join quite a few tables to get all the data I need, and the objects table is going to become huge since it contains almost every item (only the type, name and creation date, though) - this is to keep the database and the app code flexible, but does it work in the real world, or is it too expensive in the long term? Am I thinking about it in the wrong way with this kind of OOP approach?
More specifically: suppose I need to list all the posts, including their attached objects and metadata. I would need to join these tables, at least: posts, postmeta, postobject and {$objecttype}meta (not to mention a users table to get all posts by a specific user, for example). Would I get poor performance doing this, even if I'm using only numeric indexes?
Also, I considered using a NoSQL database (MongoDB) for this project (thanks to Stuart Ellis's advice). Apparently it seems much more suitable since I need some flexibility here. But my doubt is: metadata for my objects includes a lot of references to other records in the database. So how would I avoid data duplication if I can't use JOIN? Should I use DBRef and the techniques described here? How do they compare to MySQL JOINs used in the structure described above in terms of performance?
I hope these questions make sense. This is my first project of this kind and I just want to avoid making huge mistakes before I launch it and find out I need to rework the design completely.
I'm not a NoSQL person, but I wonder whether this particular case might actually be handled best with a document database (MongoDB or CouchDB). Various type of objects with metadata attached sounds like the kind of scenario that MongoDB is designed for.
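To make the embedding idea concrete, here is a sketch of such a post as a Mongo-style document, with invented field names. The helper shows the manual reference resolution you would do in application code in place of a JOIN (DBRef is essentially a convention over this same idea):

```javascript
// A post document with its attached objects embedded. Field names
// are illustrative.
const post = {
  _id: 'post-1',
  title: 'My favorite song',
  text: 'Check this out...',
  objects: [
    { type: 'video', name: 'Live version', url: 'http://example.com/v' },
    { type: 'song', name: 'Studio version', authorId: 'user-2' }
  ]
};

// Referenced records live in their own collection; here a plain
// lookup table stands in for a second query.
const users = { 'user-2': { _id: 'user-2', name: 'Alice' } };

// Application-side "join": follow the stored ids to the other collection.
function resolveAuthorNames(postDoc, userById) {
  return postDoc.objects
    .filter(function (o) { return o.authorId; })
    .map(function (o) { return userById[o.authorId].name; });
}
```

The metadata that varies per type simply lives inside each embedded object, with no per-type meta table needed.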
FWIW, you've got a couple of issues with your table and field naming that might bite you later. For example, type and date are rather generic, and also reserved words. You've also mixed singular and plural table names, which will throw off any automatic object mapping.
Whichever database you use, it's a good idea to find an existing set of database naming conventions and apply it from the start - this will help you avoid subtle issues and ensure that your naming stays consistent. I tend to use the Rails naming conventions ATM, because they are well-known and fairly sensible.
Or you could store the object contents as a file, outside of the database, if you're concerned about the database space.
If you store anything in the database, you already have the object type in objects; so you could just add an object_contents table with a long binary field to store the object. You don't need to create a new table for each new type.
I've seen a lot of JOINs in real-world web applications (5 to 10). The objects table may get large, but that's what indices are for. So far, I don't see anything wrong with your database. BTW, one thing felt strange to me - one post, one object, and separate comments for each? No ability to mix pictures with text?