Abstracted JOIN for maintainability? - mysql

Does anyone know an ORM that can abstract JOINs? I'm using PHP, but I would take ideas from anywhere. I've used Doctrine ORM, but I'm not sure if it supports this concept.
I would like to be able to specify a relation that is actually a complicated query, and then use that relation in other queries. Mostly this is for maintainability, so I don't have a lot of replicated code that has to change if my schema changes. Is this even possible in theory (at least for some subset of "complicated query")?
Here's an example of what I'm talking about:
ORM.defineRelationship('Message->Unresponded', '
LEFT JOIN Message_Response
ON Message.id = Message_Response.Message_id
LEFT JOIN Message AS Response
ON Message_Response.Response_id = Response.id
WHERE Response.id IS NULL
');
ORM.query('
SELECT * FROM Message
SUPER_JOIN Unresponded
');
Sorry for the purely invented syntax. I don't know if anything like this exists. It would certainly be complicated if it did.

One possibility would be to write this join as a view in the database. Then you can use any query tools on the view.
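As a rough illustration of the view idea, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names come from the question; the sample rows and the view name Unresponded are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Message (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE Message_Response (Message_id INTEGER, Response_id INTEGER);

    INSERT INTO Message (id, body) VALUES
        (1, 'question A'), (2, 'question B'), (3, 'reply to A');
    INSERT INTO Message_Response (Message_id, Response_id) VALUES (1, 3);

    -- The "relationship" is written once, as a view.
    CREATE VIEW Unresponded AS
        SELECT Message.*
        FROM Message
        LEFT JOIN Message_Response
            ON Message.id = Message_Response.Message_id
        LEFT JOIN Message AS Response
            ON Message_Response.Response_id = Response.id
        WHERE Response.id IS NULL;
""")

# Other queries can now treat the view like a table.
unresponded_ids = [
    row[0] for row in conn.execute("SELECT id FROM Unresponded ORDER BY id")
]
open_questions = [
    row[0]
    for row in conn.execute(
        "SELECT body FROM Unresponded WHERE body LIKE 'question%'"
    )
]
print(unresponded_ids, open_questions)
```

If the schema changes, only the view definition has to be updated; the queries that select from Unresponded stay as they are.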
Microsoft's Entity Framework also supports very complex mappings between code entities and the database tables, even crossing databases. The query you've given as an example would be easily supported in terms of mapping from that join of tables to an entity. You can then execute further queries against the resulting joined data using LINQ. Of course, if you're using PHP this may not be of much use to you.
However I'm not aware of a product that wraps up the join into the syntax of further queries in the way you've shown.

Related

Are views in MySQL quicker than complex queries? [duplicate]

I have a problem with a SELECT with multiple inner joins. My code is as follows:
SELECT `movies02`.`id`, `movies02`.`title`,
`movies03`.`talent`,
`movies07`.`character`,
`movies05`.`genre`
FROM `movies02`
INNER JOIN `movies07` ON `movies07`.`movie` = `movies02`.`id`
INNER JOIN `movies03` ON `movies03`.`id` = `movies07`.`performer`
INNER JOIN `movies08` ON `movies08`.`genre` = `movies05`.`id`
INNER JOIN `movies02` ON `movies08`.`movie` = `movies02`.`id`;
Doing an INNER JOIN to get the actors in the movie, as well as the characters they play, seems to work but the second two, which get the movie genre, don't work so I figure I can just write them as a VIEW and then combine them when I output the results. I would, therefore, end up with three VIEWs. One to get the genres, actors and characters, and then one to put everything together. Question is whether it is better to do that than one massive SELECT with multiple joins?
I tried rewriting the query a bunch of times and in multiple ways.
When you do a query involving views, MySQL / MariaDB's query planner assembles all the views and your main query into a single query before working out how to access your tables. So, performance is roughly the same when using views, Common Table Expressions, and/or subqueries.
That being said, views are a useful way of encapsulating some query complexity.
And, you can grant a partly-trusted user access to a view without granting them access to the underlying tables.
The downside of views is the same as the downside of putting any application logic into your DBMS rather than in your application: it's trickier to update, and easier to forget to update. (This isn't relevant if you have a solid application-update workflow that updates views, stored functions, and stored procedures as it updates your application code.)
That being said, a good way to write queries like this is to start with the table containing the "top-level" entity. In your case I think it's the movie. Then LEFT JOIN the other tables rather than INNER JOINing them. That way you'll still see the movie in your results even when some of its subsidiary entities (performer, genre, I guess) are missing.
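To make the INNER JOIN versus LEFT JOIN difference concrete, here is a small sketch using Python's sqlite3 module in place of MySQL. The table names (movie, genre, movie_genre) and the sample rows are invented for the example and are not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movie_genre (movie_id INTEGER, genre_id INTEGER);

    INSERT INTO movie VALUES (1, 'Movie with genre'), (2, 'Movie without genre');
    INSERT INTO genre VALUES (10, 'Drama');
    INSERT INTO movie_genre VALUES (1, 10);
""")

# Start from the top-level entity (movie) and join outward.
QUERY = """
    SELECT movie.title, genre.name
    FROM movie
    {join} JOIN movie_genre ON movie_genre.movie_id = movie.id
    {join} JOIN genre ON genre.id = movie_genre.genre_id
    ORDER BY movie.id
"""

inner_rows = conn.execute(QUERY.format(join="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT")).fetchall()

print(inner_rows)  # the movie without a genre is dropped
print(left_rows)   # the movie without a genre is kept, with a NULL genre
```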
Pro tip: If you can, name your tables for the entities they contain (movie, genre, actor, etc) rather than using names like whatever01, whatever02 ... It's really important to be able to look at queries and reason about them, and naming the tables makes that easier.
Views are just syntactic sugar for queries. When you include a view in a query, the engine reads its definition and combines it into the query.
They are useful to make queries easier to read and to type.
On the flip side, they can be detrimental to the query performance when naïve developers use them indiscriminately and end up producing queries that become unnecessarily complex behind the scenes. Use them with care.
Now, materialized views are a totally different story, since they are pre-computed and refreshed at specific times or events. They can be quite fast to use since they can be indexed, but on the flip side their refresh interval means they may show data that is not 100% up to date.
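The trade-off between a plain view and a pre-computed one can be sketched as follows. SQLite (used here through Python's sqlite3 module) has no materialized views, so this emulates one with a snapshot table that is rebuilt on demand; the table names and the refresh_snapshot and read_cast_size helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE role (movie TEXT, performer TEXT);
    INSERT INTO role VALUES ('Movie A', 'Performer 1'), ('Movie A', 'Performer 2');

    -- An ordinary view: always computed from the current rows.
    CREATE VIEW cast_size AS
        SELECT movie, COUNT(*) AS performers FROM role GROUP BY movie;
""")

def refresh_snapshot(conn):
    """Rebuild the pre-computed copy (an emulated materialized view)."""
    conn.executescript("""
        DROP TABLE IF EXISTS cast_size_snapshot;
        CREATE TABLE cast_size_snapshot AS SELECT * FROM cast_size;
        CREATE INDEX cast_size_snapshot_movie ON cast_size_snapshot (movie);
    """)

def read_cast_size(conn, source):
    """Read the count for 'Movie A'; source is a fixed name, never user input."""
    sql = f"SELECT performers FROM {source} WHERE movie = 'Movie A'"
    return conn.execute(sql).fetchone()[0]

refresh_snapshot(conn)
conn.execute("INSERT INTO role VALUES ('Movie A', 'Performer 3')")

live = read_cast_size(conn, "cast_size")            # sees the new row
stale = read_cast_size(conn, "cast_size_snapshot")  # still the old count
refresh_snapshot(conn)
fresh = read_cast_size(conn, "cast_size_snapshot")  # up to date again
print(live, stale, fresh)
```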

Is MongoDB good for handling SQL-type data?

I have a rather large application storing data in MongoDB (via Mongoose), despite the fact that my data is thoroughly relational and could be represented very well as tables with schemas. Specifically, I have a lot of relations between objects, so I need to perform very deep populations, 25+ for each request in total.
A good option would be to rewrite the app for MySQL. However, there are tonnes of code bound to MongoDB. The question is: as the number of relations between objects by ObjectID grows, will it still be as efficient as MySQL, or should I dive into the code and move the app completely to MySQL?
In both cases I use an ORM: Mongoose now, Sequelize if I move.
Is Mongo really efficient in working with relations? I mean, SQL was designed to join tables with relations, so I expect it has some optimisations under the hood. Relations seem to be a somewhat unusual use case for Mongo. So I worry that logically the same query, gathering data from 25 collections in Mongo versus joining data from 25 tables in MySQL, may be slower in Mongo.
Here's the example of Schema I'm using. Populated fields are marked with *.
Man
-[friends_ids] --> [Man]*
-friends_ids*: ...
-pets_ids*: ...
-...
-[pets_ids] -> [Pet]*
-name
-avatars*: [Avatar]
-path
-size
-...
My thoughts about relations: let's imagine a Man object that should have a [friends] field, and let's work through how it gets populated.
MySQL ORM:
from MANS table find Man where id=:id.
from MAN-TO-MANS table find all records where friend id = :id of Man from step 1
from MANS table find all records where id = :id of Men from step 2
join it into one Man object with friends field populated
Mongo:
from MANS collection find Man where _id=:_id. Get its friends' _id array at this step (not populated)
from MANS collection find all documents where _id = :_id of Men from step 1
join it into one Man object with friends field populated
No requests to JOIN tables. Am I right?
So I need to perform very deep populations — 25+ for each request in total.
A common misconception is that MongoDB does not support JOINs. While this is partially true it is also quite untrue. The reality is that MongoDB does not support server-side joins.
The MongoDB motto is client side JOINing.
This motto can work against you: the application does not always know the best way to JOIN, so you have to pick your schema, queries and JOINs very carefully in MongoDB to ensure that you are not querying inefficiently.
25+ is perfectly possible for MongoDB, that's not the problem. The problem will be what JOINs you are doing.
This leads onto:
Is Mongo really efficient in working with relations?
Let me give you an example of where MongoDB would actually be faster than MySQL.
Imagine you have a group collection with each group document containing a user_ids field which is represented as an array of ObjectIds which directly relate to the _id field in the user collection.
Doing two queries, one for the group and one for the users would likely be faster than MySQL in this specific case since MongoDB, for one, would not need to atomically write out a result set using your IO bandwidth for common tasks.
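The two-query pattern described here can be sketched without a database at all. In the snippet below, plain dictionaries stand in for the group and user collections, and find_one and find_in are hypothetical stand-ins for a find-by-_id query and an $in query; none of this is the MongoDB driver API.

```python
# In-memory stand-ins for two collections; "queries" counts round trips.
stats = {"queries": 0}

USERS = {
    "u1": {"_id": "u1", "name": "Ann"},
    "u2": {"_id": "u2", "name": "Bob"},
    "u3": {"_id": "u3", "name": "Cy"},
}
GROUPS = {
    "g1": {"_id": "g1", "title": "Admins", "user_ids": ["u1", "u3"]},
}

def find_one(collection, _id):
    """Stand-in for a find-by-_id query."""
    stats["queries"] += 1
    return collection.get(_id)

def find_in(collection, ids):
    """Stand-in for an {_id: {$in: ids}} query: one round trip for all ids."""
    stats["queries"] += 1
    return [collection[i] for i in ids if i in collection]

def load_group_with_users(group_id):
    """Client-side join: fetch the group, then all referenced users at once."""
    group = find_one(GROUPS, group_id)
    users = find_in(USERS, group["user_ids"])
    return {**group, "users": users}

populated = load_group_with_users("g1")
user_names = [u["name"] for u in populated["users"]]
print(user_names, stats["queries"])
```

The point of the sketch is only that the second lookup is a single batched query keyed on _id, rather than one query per referenced user.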
This being said though, for anything complex you will get hammered by the fact that the application does not truly know how to use index intersection and merging to create an even slightly performant JOIN.
So, for example, say you wish to JOIN between 3 tables in one query, paginating by the third JOINed table. That would probably kill MongoDB's performance, while not being such an inefficient JOIN to perform in SQL.
However, you might also find that those JOINs are not scalable anyway and are in fact killing any performance you get on MySQL.
if there will be growing amount of relations between objects by ObjectID, will it be still so efficient as MySQL or should I dive into code and move app complete to MySQL?
Depends on the queries but I have at least given you some pointers.
Your question is a bit broad, but I interpret it in one of two ways.
One, you are saying that you have references 25 levels deep, and in that case using populate is just not going to work. I dearly hope this is not the pickle you find yourself in. Moving to SQL won't help you either, the fact is you'll be going back to the database too many times no matter what. But if this is how it's got to be, you can tackle it using a variation of the materialized path pattern, which will allow you to select subtrees much more efficiently within your very deep data tree. See here for a discussion: http://docs.mongodb.org/manual/tutorial/model-tree-structures-with-materialized-paths/
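For readers who have not seen the materialized path pattern, here is a minimal in-memory sketch of the idea: each node stores the full path of its ancestors as a string, so a whole subtree can be selected with one substring or prefix match instead of one lookup per level. The documents, the path format, and the descendants helper are assumptions for illustration, not Mongoose or MongoDB API calls.

```python
# Each document stores its ancestor chain as a delimited string.
categories = [
    {"_id": "Books", "path": None},
    {"_id": "Programming", "path": ",Books,"},
    {"_id": "Databases", "path": ",Books,Programming,"},
    {"_id": "MongoDB", "path": ",Books,Programming,Databases,"},
    {"_id": "Languages", "path": ",Books,Programming,"},
    {"_id": "Music", "path": None},
]

def descendants(docs, node_id):
    """Return the ids of every node that has node_id among its ancestors."""
    needle = f",{node_id},"
    return sorted(
        doc["_id"] for doc in docs if doc["path"] and needle in doc["path"]
    )

subtree = descendants(categories, "Programming")
print(subtree)
```

In a real database the same match would be a single pattern query on an indexed path field, which is what avoids walking the tree one level at a time.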
The other interpretation is that you have 25 relations between collections. Let's say in this case there is one collection in Mongo for every letter of the English alphabet, and documents in collection A have references to one or more documents in each of collections B-Z. In this case, you might be ok. Mongoose populate lets you populate multiple reference paths, and I doubt that the limit, if there is one, is anywhere near as low as 25. So you'd do something like docA.populate("B C ... Z"). In this case also, moving to SQL won't help you per se; you'll still be required to join on multiple tables.
Of course, your original statement that this could all be done in SQL is valid, there doesn't seem to have been a specific reason to use (or not use) Mongo here, just seems to be the way things were done. However, it also seems that whether you use NoSQL or SQL approaches here isn't the determining factor in whether you will see inefficiency. Rather, it's whether you model the data correctly within whatever solution you choose.

Select query to get database objects

I have a database which contains a huge number of tables and stored procedures.
How can I get specific objects, like tables and stored procedures, in a single query for a specific database?
SELECT
[schema] = s.name,
[object] = o.name,
o.type_desc
FROM sys.objects AS o
INNER JOIN sys.schemas AS s
ON o.[schema_id] = s.[schema_id]
WHERE o.[type] IN ('P','U');
Some other answers you'll find on this or other sites might suggest some or all of the following:
sysobjects - stay away, this is a backward compatibility view that has been deprecated, and shouldn't be used in any version > SQL Server 2000. See a thorough but not exhaustive replacement map here.
built-in functions like OBJECT_NAME(), SCHEMA_NAME() and OBJECT_SCHEMA_NAME() - I've recommended these myself over the years, until I realized they are blocking functions and don't observe the transaction's isolation semantics. So if you want to grab this information under read uncommitted while there are underlying changes happening, you can't, and you'll have to wait. Which may be what you want to do, but not always.
INFORMATION_SCHEMA - these views are there to satisfy the standards, but aren't complete, are warned to be inaccurate, and aren't updated to reflect new features (I blogged about several specific problems here). So for very basic information (or when you need to write cross-platform metadata code), they may be ok, but in almost all cases I suggest just always using a method you can trust instead of picking and choosing.

Rails and queries with complex joins: Can each joined table have an alias?

I'm developing an online application for education research, where I frequently have the need for very complex SQL queries:
queries usually include 5-20 joins, often joined to the same table several times
the SELECT field often ends up being 30-40 lines tall, between derived fields / calculations and CASE statements
extra WHERE conditions are added in the PHP, based on user's permissions & other security settings
the user interface has search & sort controls to add custom clauses to the WHERE / ORDER / HAVING clauses.
Currently this app is built on PHP + MYSQL + Jquery for the moving parts. (This grew out of old Dreamweaver code.) Soon we are going to rebuild the application from scratch, with the intent to consolidate, clean, and be ready for future expansion. While I'm comfortable in PHP, I'm learning bits about Rails and realizing, Maybe it would be better to build version 2.0 on a more modern framework instead. But before I can commit to hours of tutorials, I need to know if the Rails querying system (ActiveRecord?) will meet our query needs.
Here's an example of one query challenge I'm concerned about. A query must select from 3+ "instances" of a table, and get comparable information from each instance:
SELECT p1.name AS my_name, pm.name AS mother_name, pf.name AS father_name
FROM people p1
JOIN mother pm ON p1.mother_id = pm.id
JOIN father pf ON p1.father_id = pf.id
# etc. etc. etc.
WHERE p1.age BETWEEN 10 AND 16
# (selects this info for 10-200 people)
Or, a similar example, more representative of our challenges. A "raw data" table joins multiple times to a "coding choices" table, each instance of which in turn has to look up the text associated with a key it stores:
SELECT d.*, c1.coder_name AS name_c1, c2.coder_name AS name_c2, c3.coder_name AS name_c3,
(c1.result + c2.result + c3.result) AS result_combined,
m_c1.selection AS selected_c1, m_c2.selection AS selected_c2, m_c3.selection AS selected_c3
FROM t_data d
LEFT JOIN t_codes c1 ON d.id = c1.data_id AND c1.category = 1
LEFT JOIN t_menu_choice m_c1 ON c1.menu_choice = m_c1.id
LEFT JOIN t_codes c2 ON d.id = c2.data_id AND c2.category = 2
LEFT JOIN t_menu_choice m_c2 ON c2.menu_choice = m_c2.id
LEFT JOIN t_codes c3 ON d.id = c3.data_id AND c3.category = 3
LEFT JOIN t_menu_choice m_c3 ON c3.menu_choice = m_c3.id
WHERE d.date_completed BETWEEN ? AND ?
AND c1.coder_id = ?
These sorts of joins are straightforward to write in pure SQL, and when search filters and other varying elements are needed, a couple PHP loops can help to cobble strings together into a complete query. But I haven't seen any Rails / ActiveRecord examples that address this sort of structure. If I'll need to run every query as pure SQL using find_by_sql(""), then maybe using Rails won't be much of an improvement over sticking with the PHP I know.
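The "cobble strings together" approach can be kept safe and readable by collecting SQL fragments and their bound parameters side by side. Below is a minimal sketch in Python with sqlite3 rather than PHP or Rails; the build_query helper, the people sample data, and the whitelist of sortable columns are all assumptions made for the example.

```python
import sqlite3

# Sort keys the UI may request, mapped to real column expressions.
ALLOWED_ORDER = {"name": "p1.name", "age": "p1.age"}

def build_query(filters, order_by=None):
    """Assemble SQL from (fragment, params) pairs; values stay parameterized."""
    sql = ["SELECT p1.name, p1.age FROM people p1"]
    params = []
    if filters:
        sql.append("WHERE " + " AND ".join(f"({frag})" for frag, _ in filters))
        for _, frag_params in filters:
            params.extend(frag_params)
    if order_by is not None:
        # Only whitelisted names reach the SQL text.
        sql.append("ORDER BY " + ALLOWED_ORDER[order_by])
    return " ".join(sql), params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO people (name, age) VALUES ('Bob', 15), ('Ann', 12), ('Cy', 20);
""")

# A base filter plus one added "by the search form".
filters = [("p1.age BETWEEN ? AND ?", (10, 16))]
filters.append(("p1.name <> ?", ("Cy",)))

sql, params = build_query(filters, order_by="age")
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```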
My question is: Does ActiveRecord support cases where tables need "nicknames", such as in the queries above? Can the primary table have an alias too? (in my examples, "p1" or "d") How much control do I have over what fields are selected in the SELECT statement? Can I create aliases for selected fields? Can I do calculations & select derived fields in the SELECT clause? How about CASE statements?
How about setting WHERE conditions that specify the joined table's alias? Can my WHERE clause include things like (using the top example) " WHERE pm.age BETWEEN p1.age AND 65 "?
This sort of complexity isn't just an occasional bizarre query, it's a constant and central feature of the application (as it's currently structured). My concern is not just whether writing these queries is "possible" within Rails & ActiveRecord; it's whether this sort of need is supported by "the Rails way", because I'll need to be writing a lot of these. So I'm trying to decide whether switching to Rails will cause more trouble than it's worth.
Thanks in advance! - if you have similar experiences with big scary queries in Rails, I'd love to hear your story & how it worked out.
Short answer is yes. Rails takes care of a large part of these requirements through various types of relations, scopes, etc. The most important thing is to properly model your application to support the types of queries and functionality you are going to need. If something is difficult to explain to a person, it will generally be very hard to do in Rails. It's optimized to handle most "real world" types of relationships and tasks, so "exceptions" become somewhat difficult to fit into this convention, and later become harder to maintain, manage, develop further, decouple, etc. The bottom line is that Rails can handle the SQL query for you, SomeObject.all_active_objects_with_some_quality, give you complete control over the SQL, SomeObject.find_by_sql("select * from ..."), execute("update blah set something=''...), and everything in between.
One of the advantages of Rails is that it allows you to quickly create prototypes. I would create your model concepts, and then test the most complex business requirements that you have. This will give you a quick idea of what is possible and easy to do versus the bottlenecks and potential issues that you might face in development.
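A quick prototype of the aliased self-join from the question can also be run outside any ORM, which helps when deciding what the ORM needs to support. The sketch below uses Python's sqlite3 module; it assumes mother and father are both rows in the same people table (the question's query names separate mother and father tables), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by their SELECT alias
conn.executescript("""
    CREATE TABLE people (
        id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
        mother_id INTEGER, father_id INTEGER
    );
    INSERT INTO people VALUES
        (1, 'Mom',    40, NULL, NULL),
        (2, 'Dad',    45, NULL, NULL),
        (3, 'Granny', 70, NULL, NULL),
        (4, 'Kid',    12, 1, 2),
        (5, 'Teen',   14, 3, 2);
""")

# Three "instances" of one table, each with its own alias; the WHERE clause
# compares columns across aliases.
rows = conn.execute("""
    SELECT p1.name AS my_name, pm.name AS mother_name, pf.name AS father_name
    FROM people p1
    JOIN people pm ON p1.mother_id = pm.id
    JOIN people pf ON p1.father_id = pf.id
    WHERE p1.age BETWEEN 10 AND 16
      AND pm.age BETWEEN p1.age AND 65
    ORDER BY p1.id
""").fetchall()

result = [(r["my_name"], r["mother_name"], r["father_name"]) for r in rows]
print(result)
```

Teen is filtered out because the aliased mother row (Granny, 70) falls outside the pm.age range.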

LINQ To Entities and Lazy Loading

In a controversial blog post today, Hackification pontificates on what appears to be a bug in the new LINQ To Entities framework:
Suppose I search for a customer:
var alice = data.Customers.First( c => c.Name == "Alice" );
Fine, that works nicely. Now let’s see if I can find one of her orders:
var order = ( from o in alice.Orders
where o.Item == "Item_Name"
select o ).FirstOrDefault();
LINQ-to-SQL will find the child row. LINQ-to-Entities will silently return nothing.
Now let’s suppose I iterate through all orders in the database:
foreach( var order in data.Orders ) {
Console.WriteLine( "Order: " + order.Item ); }
And now repeat my search:
var order = ( from o in alice.Orders
where o.Item == "Item_Name"
select o ).FirstOrDefault();
Wow! LINQ-to-Entities is suddenly telling me the child object exists, despite telling me earlier that it didn’t!
My initial reaction was that this had to be a bug, but after further consideration (and backed up by the ADO.NET Team), I realized that this behavior was caused by the Entity Framework not lazy loading the Orders subquery when Alice is pulled from the datacontext.
This is because order is a LINQ-To-Object query:
var order = ( from o in alice.Orders
where o.Item == "Item_Name"
select o ).FirstOrDefault();
And is not accessing the datacontext in any way, while his foreach loop:
foreach( var order in data.Orders )
Is accessing the datacontext.
LINQ-To-SQL actually created lazy-loaded properties for Orders, so that accessing them would perform another query; LINQ to Entities leaves it up to you to manually retrieve related data.
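The difference between the two loading styles can be modelled in a few lines. This is a language-neutral sketch in Python, not either framework's actual implementation: FakeDatabase, LazyCustomer and ExplicitCustomer are hypothetical classes that only mimic the behaviour described above.

```python
class FakeDatabase:
    """Stand-in data source that counts how often it is queried."""

    def __init__(self):
        self.queries = 0
        self._orders = {"Alice": ["Item_Name", "Other_Item"]}

    def fetch_orders(self, customer_name):
        self.queries += 1
        return list(self._orders.get(customer_name, []))


class LazyCustomer:
    """Implicit lazy loading: first access to .orders triggers a query."""

    def __init__(self, db, name):
        self._db, self.name, self._orders = db, name, None

    @property
    def orders(self):
        if self._orders is None:
            self._orders = self._db.fetch_orders(self.name)
        return self._orders


class ExplicitCustomer:
    """Explicit loading: .orders stays empty until load() is called."""

    def __init__(self, db, name):
        self._db, self.name = db, name
        self.orders, self.is_loaded = [], False

    def load(self):
        self.orders = self._db.fetch_orders(self.name)
        self.is_loaded = True


def find_order(customer, item):
    """In-memory search, like a LINQ-to-Objects query over the collection."""
    return next((o for o in customer.orders if o == item), None)


db = FakeDatabase()
lazy_hit = find_order(LazyCustomer(db, "Alice"), "Item_Name")

explicit = ExplicitCustomer(db, "Alice")
before_load = find_order(explicit, "Item_Name")  # silently finds nothing
explicit.load()
after_load = find_order(explicit, "Item_Name")
print(lazy_hit, before_load, after_load)
```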
Now, I'm not a big fan of ORMs, and this is precisely the reason. I've found that in order to have all the data you want ready at your fingertips, they repeatedly execute queries behind your back. For example, that LINQ-to-SQL query above might run an additional query per row of Customers to get Orders.
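The "additional query per row" cost is easy to measure in a toy setting. The sketch below uses Python's sqlite3 module with invented customer and orders tables, and a hypothetical run helper that counts statements; it compares per-customer lookups against a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (1, 1, 'Item_Name'), (2, 1, 'Other'), (3, 2, 'Thing');
""")

stats = {"queries": 0}

def run(sql, params=()):
    """Execute a statement and count it."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

# Lazy style: one query for the customers, then one more per customer.
stats["queries"] = 0
lazy_result = {}
for cust_id, name in run("SELECT id, name FROM customer ORDER BY id"):
    rows = run(
        "SELECT item FROM orders WHERE customer_id = ? ORDER BY id", (cust_id,)
    )
    lazy_result[name] = [item for (item,) in rows]
lazy_queries = stats["queries"]

# Eager style: a single join fetches everything at once.
stats["queries"] = 0
eager_result = {}
for name, item in run("""
    SELECT c.name, o.item
    FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
"""):
    eager_result.setdefault(name, [])
    if item is not None:  # customers without orders yield a NULL item
        eager_result[name].append(item)
eager_queries = stats["queries"]

print(lazy_queries, eager_queries, lazy_result == eager_result)
```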
However, the EF not doing this seems to majorly violate the principle of least surprise. While it is a technically correct way to do things (You should run a second query to retrieve orders, or retrieve everything from a view), it does not behave like you would expect from an ORM.
So, is this good framework design? Or is Microsoft over thinking this for us?
Jon,
I've been playing with linq to entities also. It's got a long way to go before it catches up with linq to SQL. I've had to use linq to entities for the Table per Type Inheritance stuff. I found a good article recently which explains the whole 1 company 2 different ORM technologies thing here.
However you can do lazy loading, in a way, by doing this:
// Lazy Load Orders
var alice2 = data.Customers.First(c => c.Name == "Alice");
// Should Load the Orders
if (!alice2.Orders.IsLoaded)
alice2.Orders.Load();
or you could just include the Orders in the original query:
// Include Orders in original query
var alice = data.Customers.Include("Orders").First(c => c.Name == "Alice");
// Should already be loaded
if (!alice.Orders.IsLoaded)
alice.Orders.Load();
Hope it helps.
Dave
So, is this good framework design? Or is Microsoft over thinking this for us?
Well, let's analyse that - all the thinking that Microsoft does so we don't have to really makes us lazier programmers. But in general, it does make us more productive (for the most part). So are they overthinking or are they just thinking for us?
If LINQ-to-Sql and LINQ-to-Entities came from two different companies, it would be an acceptable difference - there's no law stating that all LINQ-To-Whatevers have to be implemented the same way.
However, they both come from Microsoft - and we shouldn't need intimate knowledge of their internal development teams and processes to know how to use two different things that, on their face, look exactly the same.
ORMs have their place, and do indeed fill a gap for people trying to get things done, but ORM users must know exactly how their ORM gets things done - treating it like an impenetrable black box will only lead you to trouble.
Having lost a few days to this very problem, I sympathize.
The "fault," if there is one, is that there's a reasonable tendency to expect that a layer of abstraction is going to insulate from these kinds of problems. Going from LINQ, to Entities, to the database layer, doubly so.
Having to switch from MS-SQL (using LinqToSQL) to MySQL (using LinqToEntities), for instance, one would figure that the LINQ, at least, would be the same, if only to save the cost of having to rewrite program logic.
Having to litter code with .Load() and/or LINQ with .Include() simply because the persistence mechanism under the hood changed seems slightly disturbing, especially with a silent failure. The LINQ layer ought to at least behave consistently.
A number of ORM frameworks use a proxy object to dynamically load the lazy object transparently, rather than just return null, though I would have been happy with a collection-not-loaded exception.
I tend not to buy into the they-did-it-deliberately-for-your-benefit excuse; other ORM frameworks let you annotate whether you want eager or lazy-loading as needed. The same could be done here.
I don't know much about ORMs, but as a user of LinqToSql and LinqToEntities I would hope that when you try to query Orders for Alice it does the extra query for you when you make the linq query (as opposed to not querying anything or querying everything for every row).
It seems natural to expect
from o in alice.Orders where o.Item == "Item_Name" select o
to work given that's one of the reasons people use ORMs in the first place (to simplify data access).
The more I read about LinqToEntities the more I think LinqToSql fulfills most developers' needs adequately. I usually just need a one-to-one mapping of tables.
Even though you shouldn't have to know about Microsoft's internal development teams and processes, fact of the matter is that these two technologies are two completely different beasts.
The design decision for LINQ to SQL was, for simplicity's sake, to implicitly lazy-load collections. The ADO.NET Entity Framework team didn't want to execute queries without the user knowing so they designed the API to be explicitly-loaded for the first release.
LINQ to SQL has been handed over to ADO.NET team and so you may see a consolidation of APIs in the future, or LINQ to SQL get folded into the Entity Framework, or you may see LINQ to SQL atrophy from neglect and eventually become deprecated.