Using DataContext cache to preload child collection - linq-to-sql

In an attempt to reduce round-trips to the database, I was hoping to 'preload' child collections for a parent object. My hope was that if I loaded the objects that make up the child collection into the DataContext cache, Linq2SQL would use those objects instead of going to the database.
For example, assume I have a Person object with two child collections: Children and Cars.
I thought this might work:
var children = from p in dbc.Person
               select p.Children;
var cars = from p in dbc.Person
           select p.Cars;
var people = from p in dbc.Person
             select p;

var dummy1 = children.ToList();
var dummy2 = cars.ToList();

foreach (var person in people)
{
    Debug.WriteLine(person.Children.Count);
}
But instead, I'm still getting one trip to the database for every call to person.Children.Count, even though all the children are already loaded in the DataContext.
Ultimately, what I'm looking for is a way to load a full object graph (with multiple child collections) in as few trips to the database as possible.
I'm aware of the DataLoadOptions class (and LoadWith), but it's limited to one child collection, and that's a problem for me.
I wouldn't mind using a stored procedure if I was still able to build up the full object graph (and still have a Linq2SQL object, of course) without a lot of extra manipulation of the objects.

I don't think what you require is directly possible with LINQ to SQL, or even SQL, in the way you're expecting.
When you consider how SQL works, a single one-to-many relationship can easily be flattened by an inner/left join. After this, if you want another set of objects included, you could theoretically write another left join to bring back all the rows in the other table. But imagine the output that SQL would produce: it is not easily workable back into an object graph, and ORMs will typically fall back to querying per row to provide this data. If you write multi-level DataLoadOptions with LINQ to SQL, you will start to see this.
Also, the impact on performance with many LEFT JOINs will easily outweigh the perceived benefits of single querying.
Consider your example. You are fetching ALL of the rows back from these two tables. This poses two problems further down the line:
The tables may end up holding much more data than you expect. Pulling back thousands of rows from SQL all at once may not be good for performance, and you are then expecting LINQ to SQL to search those in-memory lists for matching objects. Depending on usage, I imagine SQL can provide this data faster.
The behavior you expect is hidden, and to me it would be unusual for a SELECT on a large table to potentially bloat the app's memory usage by caching all of the rows. Maybe it could be offered as an option, or you could provide your own extension methods.
Solution
I would personally start to look at caching the data outside of LINQ to SQL, and then retrieving objects from the ASP.NET cache / memory cache / your cache provider, if building the full object graph is deemed too expensive.
One level of DataLoadOptions plus relying on the built-in entity relationships + some manual caching will probably save many a headache.
I have come across this particular problem a few times and haven't thought of anything else yet.

Related

Rails - json column vs separate table

I'm currently working on a Ruby on Rails project in which I have objects with an association to instructions. Each object can have zero or more instruction objects that hold some basic data, like title, data (string), and position (for ordering them in the UI). I tried looking up an answer on Google but found nothing relevant. The instructions are specific to each object and shouldn't be used for lookup or search of any kind, so I figured I should store them as JSON within the object's own table instead of making a join table. My reasoning is that a join table would explode when there are many objects, and because of that, querying for each object's instructions would get slower over time. Is that a reasonable justification for storing this data as JSON instead of using a has_many association?
Think of using JSON in an RDBMS as a form of denormalization. There are legitimate reasons to use denormalization, but you must keep in mind that it always optimizes for one type of query at the expense of other types of queries.
For example, in this case you could query your object and it would include the JSON document containing all instructions. But if you wanted to search for a specific instruction, it would be quite complex to find the row whose JSON document contains that specific instruction. Have you thought about how you would query that?
Using a normalized database design, i.e. the join table you mention, allows for more flexibility in queries. You can query the object table, or you can query the instruction table. Either way, you then simply join to the other table to get the corresponding rows.
The way to make this more optimized is to use indexes on the columns you want to search. See my presentation How to Design Indexes, Really or the video.
Using JSON creates a lot of complexity that you probably haven't considered. See my presentation How to Use JSON in MySQL Wrong.

Translate sql database schema to IndexedDB

I have three tables in my SQL schema: Clients, with address and so on; Orders, with order details; and Files, which stores uploaded files. Both the Files table and the Orders table contain foreign keys referencing the Clients table.
How would I do that in IndexedDB? I'm new to this whole key-index thinking and would just like to understand how the same thing would be done with IndexedDB.
Now I know there is a shim.js file, but I'm trying to understand the concept itself.
Help and tips highly appreciated!
EDIT:
So I would really have to think about which queries I want to allow and then optimize my IndexedDB implementation for those queries; is that the main point here? Basically, I want to store a customer once and then many orders for that customer, and then be able to upload small files (preferably PDFs) for that customer, not even necessarily for each order (although if that's easy to implement, I may do it). I see every customer as a separate entity; I won't have queries like "give me all customers who ordered xy". I only need to have each customer once and then store all the orders and all the files for that customer. I want to be able to search for a customer by name, which then gives me a list of all orders and their dates and a list of the files uploaded for that customer (maybe associated with the order).
This question is a bit too broad to answer correctly. Nevertheless, the major concept to learn when transitioning from SQL to No-SQL (indexedDB) is the concept of object stores. Most SQL databases are relational and perform much of the work of optimizing queries for you. indexedDB does not. So the concepts of normalization and denormalization work a bit differently. The focal point is to explicitly plan your own queries. Unlike the design of an app/system that allows simple ad-hoc SQL queries that are designed at a later point in time, and possibly even easily added/changed at a later time, you really need to do a lot of the planning up front for indexedDB.
So it is not quite safe to say that the transition is simply a matter of creating three object stores to correspond to your three relational tables. For one, there is no concept of joining in indexedDB so you cannot join on foreign keys.
It is not clear from your question but your 3 tables are clients, orders, and files. I will go out on a limb here and make some guesses. I would bet you could use a single object store, clients. Then, for each client object, store the normal client properties, store an orders array property, and store a files array property. In the orders array, store order objects.
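As a rough sketch of that single-store shape (the field names and the clientsStore variable here are made up for illustration, not taken from your schema):

var client = {
    id: 123,
    name: 'Some Client',
    orders: [ { orderId: 1, date: '2014-01-05', total: 99.95 } ],
    files: [ { fileName: 'invoice.pdf', content: null } ] // binary content would need blobs, see below
};
// clientsStore.put(client) then writes the whole graph as a single object.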
If your files are binary, this won't work, you will need to use blobs, and may even encounter issues with blob support in various browser indexedDB implementations (Chrome sort of supports it, it is unclear from version to version).
This assumes your typical query plan is that you need to do something like list the orders for a client, and that is the most frequently used type of query.
If you needed to do something across orders, independent of which client an order belongs to, this would not work so well and you would have to iterate over the entire store.
If the clients-orders relation is many to many, then this also would not work so well, because of the need to store the order info redundantly per client. However, one note here is that this redundant storage is quite common in NoSQL-style databases like indexedDB. The goal is not to perfectly model the data, but to store the data in such a way that your most frequently occurring queries complete quickly (while still maintaining correctness).
Edit:
Based on your edit, I would suggest a simple prototype that uses three object stores. In your client view page, where you display client details, simply run three separate queries (a rough sketch follows the steps below):
1) Get the one entity from the client object store based on client id.
2) Open a cursor over the orders store and get all orders for the client. In the orders store, use a client-id property, create an index on it, and open the cursor over that index for the specific client id.
3) Open a cursor over the files store using a similar tactic as #2.
4) In your bizlogic layer, enforce your data constraints. For example, when deleting a client, first delete all the files from the files store, then delete all the orders from the orders store, and then delete the single client entity from the client store.
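Here is a minimal sketch of what those stores, indexes, and reads could look like in plain IndexedDB; the database, store, and index names and the id value are assumptions for illustration, not anything from your schema:

var openReq = indexedDB.open('shop', 1);

openReq.onupgradeneeded = function (e) {
    var db = e.target.result;
    // One store per former table; orders and files each get an index on the client id.
    db.createObjectStore('clients', { keyPath: 'id' });
    db.createObjectStore('orders', { keyPath: 'id' }).createIndex('clientId', 'clientId');
    db.createObjectStore('files', { keyPath: 'id' }).createIndex('clientId', 'clientId');
};

openReq.onsuccess = function (e) {
    var db = e.target.result;
    var someClientId = 42; // placeholder for the id of the client being viewed
    var tx = db.transaction(['clients', 'orders']);

    // Step 1: fetch the single client entity by its id.
    tx.objectStore('clients').get(someClientId).onsuccess = function (ev) {
        console.log('client', ev.target.result);
    };

    // Step 2: cursor over the clientId index to collect that client's orders.
    var cursorReq = tx.objectStore('orders').index('clientId')
        .openCursor(IDBKeyRange.only(someClientId));
    cursorReq.onsuccess = function (ev) {
        var cursor = ev.target.result;
        if (cursor) { console.log('order', cursor.value); cursor.continue(); }
    };
    // Step 3 would do the same against the 'files' store and its clientId index.
};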
What I am suggesting is to not overthink it. It is not that complicated. So far you have not described something that sounds like it will have performance issues so there is no need for something more elegant.
I will go with Josh's answer, but if you are still finding it hard to use IndexedDB and want to continue using SQL, you can use sqlweb - it will let you run operations inside IndexedDB using SQL queries.
e.g -
var connection = new JsStore.Instance('jsstore worker path');
connection.runSql("select * from Customers").then(function(result) {
    console.log(result);
});
Here is the link - http://jsstore.net/tutorial/sqlweb/

Is MongoDB good for handling SQL-type data?

I have a rather huge application storing data in MongoDB (Mongoose), despite the fact that my data is entirely relational and could be represented very well as tables with schemas. The specific issue is that I have a lot of relations between objects, so I need to perform very deep populations — 25+ for each request in total.
A good way would be to rewrite the app for MySQL; however, there are tonnes of code bound to MongoDB. The question is: if the number of relations between objects by ObjectID keeps growing, will it still be as efficient as MySQL, or should I dive into the code and move the app completely to MySQL?
In both cases I use ORM. Now Mongoose, if I move — Sequelize.
Is Mongo really efficient in working with relations? I mean, SQL was designed to join tables with relations, so I hope it has some optimisations under the cover. Relations seem to be a bit of an unusual use case for Mongo. So I worry whether a logically equivalent query, gathering data from 25 collections in Mongo versus joining 25 tables in MySQL, may be slower in Mongo.
Here's the example of Schema I'm using. Populated fields are marked with *.
Man
-[friends_ids] --> [Man]*
-friends_ids*: ...
-pets_ids*: ...
-...
-[pets_ids] -> [Pet]*
-name
-avatars*: [Avatar]
-path
-size
-...
My thoughts about relations: let's imagine a Man object that should have a [friends] field, and work through how it gets populated.
MySQL ORM:
1) From the MANS table, find the Man where id = :id.
2) From the MAN-TO-MANS table, find all records where friend_id = the :id of the Man from step 1.
3) From the MANS table, find all records where id = the :id of the Men from step 2.
4) Join it into one Man object with the friends field populated.
Mongo:
1) From the MANS collection, find the Man where _id = :_id. Get its friends _ids array at this step (non-populated).
2) From the MANS collection, find all documents where _id = the :_id of the Men from step 1.
3) Join it into one Man object with the friends field populated.
No requests to JOIN tables. Am I right?
So I need to perform very deep populations — 25+ for each request in total.
A common misconception is that MongoDB does not support JOINs. While this is partially true it is also quite untrue. The reality is that MongoDB does not support server-side joins.
The MongoDB motto is client side JOINing.
This motto can work against you; the application does not always understand the best way to JOIN, so you have to pick your schema, queries and JOINs very carefully in MongoDB to ensure that you are not querying inefficiently.
25+ is perfectly possible for MongoDB, that's not the problem. The problem will be what JOINs you are doing.
This leads onto:
Is Mongo really efficient in working with relations?
Let me give you an example of where MongoDB would actually be faster than MySQL.
Imagine you have a group collection with each group document containing a user_ids field which is represented as an array of ObjectIds which directly relate to the _id field in the user collection.
Doing two queries, one for the group and one for the users would likely be faster than MySQL in this specific case since MongoDB, for one, would not need to atomically write out a result set using your IO bandwidth for common tasks.
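As a rough sketch of that client-side JOIN in the mongo shell (the groups/users collection names and the user_ids field come from the description above; groupId is a placeholder for an id you already hold):

// First query: fetch the group document, which carries the array of user ids.
var group = db.groups.findOne({ _id: groupId });

// Second query: fetch every user whose _id appears in that array.
var users = db.users.find({ _id: { $in: group.user_ids } }).toArray();

// The "join" happens in application code: attach the users to the group object.
group.users = users;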
This being said, though, for anything complex you will get hammered by the fact that the application does not truly know how to use index intersection and merging to create a reasonably performant JOIN.
So, for example, say you wish to JOIN between 3 tables in one query, paginating by the third JOINed table. That would probably kill MongoDB's performance, while not being such an inefficient JOIN to perform in SQL.
However, you might also find that those JOINs are not scalable anyway and are in fact killing any performance you get on MySQL.
if the number of relations between objects by ObjectID keeps growing, will it still be as efficient as MySQL, or should I dive into the code and move the app completely to MySQL?
Depends on the queries but I have at least given you some pointers.
Your question is a bit broad, but I interpret it in one of two ways.
One, you are saying that you have references 25 levels deep, and in that case using populate is just not going to work. I dearly hope this is not the pickle you find yourself in. Moving to SQL won't help you either; the fact is you'll be going back to the database too many times no matter what. But if this is how it's got to be, you can tackle it using a variation of the materialized path pattern, which will allow you to select subtrees much more efficiently within your very deep data tree. See here for a discussion: http://docs.mongodb.org/manual/tutorial/model-tree-structures-with-materialized-paths/
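A minimal sketch of the materialized path idea in the mongo shell; the collection name, ids, and path format below are assumptions for illustration, not part of your schema:

// Each document stores the ids of its ancestors in a single "path" string.
db.nodes.insert([
    { _id: "root",  path: null },
    { _id: "child", path: ",root," },
    { _id: "leaf",  path: ",root,child," }
]);

// An index on "path" keeps the anchored prefix match cheap, so a whole
// subtree comes back in one query instead of one query per level.
db.nodes.createIndex({ path: 1 });
db.nodes.find({ path: /^,root,/ });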
The other interpretation is that you have 25 relations between collections. Let's say in this case there is one collection in Mongo for every letter of the English alphabet, and documents in collection A have references to one or more documents in each of collections B-Z. In this case, you might be ok. Mongoose populate lets you populate multiple reference paths, and even if there is a limit, I doubt it is anywhere near as low as 25. So you'd do something like docA.populate("B C ... Z"). In this case also, moving to SQL won't help you per se; you'll still be required to join on multiple tables.
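A rough sketch of populating several reference paths with Mongoose; the schema, the model names, and someId are made up for illustration:

var mongoose = require('mongoose');

// Documents in A hold ObjectId references into collections B and C.
var ASchema = new mongoose.Schema({
    b: [{ type: mongoose.Schema.Types.ObjectId, ref: 'B' }],
    c: [{ type: mongoose.Schema.Types.ObjectId, ref: 'C' }]
});
var A = mongoose.model('A', ASchema);

// Multiple reference paths can be populated in one call (space-separated).
A.findById(someId).populate('b c').exec(function (err, docA) {
    // docA.b and docA.c now hold the referenced documents instead of raw ObjectIds.
});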
Of course, your original statement that this could all be done in SQL is valid; there doesn't seem to have been a specific reason to use (or not use) Mongo here, it just seems to be the way things were done. However, it also seems that whether you use a NoSQL or SQL approach here isn't the determining factor in whether you will see inefficiency. Rather, it's whether you model the data correctly within whatever solution you choose.

Database optimized for searching in large number of objects with different attributes

I am currently searching for an alternative to our aging MySQL database, which uses an EAV approach. Current projects seem to have outgrown traditional table-oriented database structures, and especially searches in such databases.
I have read about and researched various NoSQL database systems, but I can't find anything that seems to be what I'm looking for. Maybe you can help.
I'll show you a generalized example on what kind of data I have and what operations I want to execute on them:
I have an object that has a small number of META attributes - attributes that are common to all instances of my objects. For example, these:
DataObject Common (META) Attributes
Unique ID (Some kind of string containing a unique identifier)
Created Date (A date time showing creation time of the object)
Type (Some kind of type identifier, maybe something like "Article", "News", "Image" or "Video")
... I think you get the idea
Then each of my objects has a variable number of other attributes. Most probably, many objects will share a number of these attributes, but there is no rule. For my example, say each object instance has between 5 and 20 such attributes. Here are some samples:
Data Object variable Attributes
Color (Some CSS like color string)
Name (A string)
Category (The category or Tag of this item) (Maybe we also have more than one of these?)
URL (a url containing some website)
Cost (a number with decimals)
... And a whole lot of other stuff mostly being of the usual column types
References to other data are an idea, but not a MUST at the moment. I could provide those within my application logic if needed.
A small sample:
Image
Unique ID = "0s987tncsgdfb64s5dxnt"
Created Date = "2013-11-21 12:23:11"
Type = "Image"
Title = "A cute cat"
Category = "Animal"
Size = "10234"
Mime = "image/jpeg"
Filename = "cat_123.jpg"
Copyright = "None"
Typical Operations
An average storage would probably have around 1-5 million such objects, each with 5-20 attributes.
Apart from the usual stuff, like writing one object to the database or reading it by its uid, the most problematic operations are these:
Search by several attributes - Select every DataObject that has Type "News", whose Title contains "blue", and whose Created Date is after 2012.
Paged bulk read - Get a large number of objects from a search (see above), starting at element 100 and ending at 250.
Get many objects with all of their attributes - When reading larger numbers of objects, I need to get every object with all of its attributes in one call.
Storage Requirements
Persistence - The storage needs to be persistent, not in-memory only. If the server reboots, the data has to be at the same point in time as when it shut down. No memory-only systems.
Integrity - All data is important; nothing can be ignored. So every single write action has to be securely stored. Systems (Redis?) that tend to lose something now and then aren't usable. Systems with heavy asynchronicity are also problematic: if data changes, every responsible node should see it.
Complexity - The system should be fairly easy to set up and maintain, so systems that force the admin to take week-long courses in their use aren't really a solution here. The same goes for huge data warehouses with loads of nodes. Clustering is nice, but it should also be possible to get a cheap system with one node.
tl;dr
Need a super fast database system with object-oriented data and fast searches, even with hundreds of thousands of items.
A reason as to why I am searching for a better alternative to mysql can be found here: Need MySQL optimization for complex search on EAV structured data
Update
Key-value stores like Redis weren't an option, as we need to do some heavy searching inside our data - something which isn't possible in a typical key-value store.
In the end, we are using MongoDB with a slightly optimized schema to make the best use of MongoDB's indexes (a rough sketch follows the drawbacks listed below).
Some small drawbacks still remain but are acceptable at the moment:
- MongoDB's aggregation cannot work with very large result sets. We have to use find (and refine our data structure to make that sufficient).
- You cannot sort large data sets on specific values, as it would take up too much memory. You also can't create indexes on those values, as they are schema-free.
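For illustration, a minimal sketch of the kind of document layout and query this ends up as in the mongo shell; the collection name, the attrs sub-document, and the index choices are assumptions, not our exact schema:

// Common META attributes live at the top level, variable attributes in a sub-document.
db.objects.insert({
    _id: "0s987tncsgdfb64s5dxnt",
    created: ISODate("2013-11-21T12:23:11Z"),
    type: "Image",
    attrs: { Title: "A cute cat", Category: "Animal", Size: 10234, Mime: "image/jpeg" }
});

// Index the common META fields plus the variable attributes you actually search on.
db.objects.createIndex({ type: 1, created: 1 });
db.objects.createIndex({ "attrs.Title": 1 });

// "Type is News, Title contains 'blue', created after 2012", paged from element 100 to 250.
// (An unanchored regex cannot fully use the index, which is one of the trade-offs mentioned above.)
db.objects.find({
    type: "News",
    "attrs.Title": /blue/i,
    created: { $gt: ISODate("2012-01-01T00:00:00Z") }
}).skip(100).limit(150);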
I don't know if you want a more sophisticated answer than mine, but maybe I can inspire you a little.
MySQL is scalable and can be used for exactly your case; I think it's more of an optimization and server problem if your database is slow. Many systems with massive amounts of data use MySQL and work perfectly, though NoSQL (Not Only SQL) is built for large amounts of data with varying attributes.
There are many different NoSQL providers, and they have different ways of handling your data.
Think about that before you choose a NoSql platform.
The possibilities are
Key–value Stores - ex. Redis, Voldemort, Oracle BDB
Column Store - ex. Cassandra, HBase
Document Store - ex. CouchDB, MongoDb
Graph Database - ex. Neo4J, InfoGrid, Infinite Graph
Most websites use document-based storage, but Facebook, for example, uses column-based storage because of its many dynamic attributes.
You can try the Document based NoSql at http://try.mongodb.org/
In the end, it really depends on how you build and optimize your database, and not on which technology you choose, though choosing the right technology can save a bunch of time.
The system we have developed uses a combination of MySQL and NoSQL, depending on what data we are working with: MySQL for the system itself and NoSQL for all the data we import via APIs.
Hope this inspires a little, and feel free to ask any questions.

Quickest way to represent array in mysql for retrieval

I have an array of PHP objects that I want to store in a MySQL database table. The only way I can think of is to have a table to represent the object, with a unique id, and a separate table to store the array (there could be an array_id column and an object_id column), but retrieving would require a join, I believe, which could get expensive. Is there a better way? I don't care about storage space or insertion time as much as retrieval time.
I don't necessarily need this to work for associative arrays but if the solution could, that would be preferred.
Building a tree structure (read: an array) in MySQL can be tricky, but it is done all of the time. Almost any forum with nested threads has some mechanism to store a tree structure. As another poster said, it does not have to be expensive.
The real question is how you want to use the data. If you need to be able to add/remove data fields from individual nodes in the tree, then you can use one of two models:
1) Adjacency List Model
2) Modified Preorder Tree Traversal Algorithm
(They sound scary, but it's not that bad I promise.)
The first one listed is probably the more common one you will encounter; the second is the one I have begun to use more frequently, and it has some nice benefits once you wrap your head around it. Take a look at this page--it has an EXCELLENT writeup about both.
http://articles.sitepoint.com/article/hierarchical-data-database
As another poster said though, if you don't need to change the data with queries or search inside the text then use a PHP function to store it in a single field.
$array = array('something' => 'fun', 'nothing' => 'to do');
$storage_array = serialize($array);
// INSERT INTO DB
// DRAW OUT OF DB
$array = unserialize($row['stored_array']);
Presto-changeo, that one is easy.
If you are comfortable with not being able to search through the data within the array using SQL, you could add a single column to the table and serialize the array into it. You would have to deserialize it on retrieval.
You could use JSON / PHP serialization or whatever is more appropriate for the language you're developing in.
Joins don't have to be so expensive - you can define an index.