I am not a MySQL expert, but I want to build something like an object layer on top of relational MySQL tables.
I want to have a large number of "structures" with fields of type "bigint", "longtext", "datetime", and "double", stored in just 7 tables.
entity_types (et_id, et_name) - list of "structures";
entity_types_fields (etf_id, parent_et_id, ....., etf_ident, etf_type) - list of structure properties, stored in one table for ALL structures; etf_type contains an int value (0,1,2,3) that refers to one of the 4 tables described below.
entities (e_id, et_id) - list of all available entities (id and type id of entity)
and 4 data tables (containing all data for entities) -
entities_props_bigint (parent_e_id, parent_etf_id, ep_data) - for BIGINT data properties
entities_props_longtext (parent_e_id, parent_etf_id, ep_data) - for LONGTEXT data properties
entities_props_datetime (parent_e_id, parent_etf_id, ep_data) - for DATETIME data properties
entities_props_double (parent_e_id, parent_etf_id, ep_data) - for DOUBLE data properties
What is the best way to select from such a data layer?
Suppose I have a list of e_id values (entity ids), where each entity can be of any type. I want to get a predefined list of properties. If an entity doesn't have one of those properties, I want it to come back as NULL.
Do you have any information on how to do this? Maybe you have some links, or have already dealt with something like this.
Thanks!
You're reinventing the wheel by implementing a whole metadata system on top of a relational database. Many developers have tried to do what you're doing and then use SQL to query it, as if it is relational data. But implementing a system of non-relational data and metadata in SQL is harder than you expect.
I've changed the relational tag of your question to eav, because your design is a variation of the Entity-Attribute-Value design. There's a limit of five tags in Stack Overflow. But you should be aware that your design is not relational.
A relational design necessarily has a fixed set of attributes for all instances of an entity. The right way to represent this in a relational database is with columns of a table. This allows you to give a name and a data type to each attribute, and to ensure that the same set of names and their data types apply to every row of the table.
What is the best way to select from such a data layer?
The only scalable way to query your design is to fetch the attribute data and metadata as rows, and reconstruct your object in application code.
SELECT e.e_id, f.etf_ident, f.etf_type,
p0.ep_data AS data0,
p1.ep_data AS data1,
p2.ep_data AS data2,
p3.ep_data AS data3
FROM entities AS e
INNER JOIN entity_types_fields AS f ON e.et_id = f.parent_et_id
LEFT OUTER JOIN entities_props_bigint AS p0 ON (p0.parent_e_id,p0.parent_etf_id) = (e.e_id,f.etf_id)
LEFT OUTER JOIN entities_props_longtext AS p1 ON (p1.parent_e_id,p1.parent_etf_id) = (e.e_id,f.etf_id)
LEFT OUTER JOIN entities_props_datetime AS p2 ON (p2.parent_e_id,p2.parent_etf_id) = (e.e_id,f.etf_id)
LEFT OUTER JOIN entities_props_double AS p3 ON (p3.parent_e_id,p3.parent_etf_id) = (e.e_id,f.etf_id)
In the query above, each entity field should match at most one property, and the other data columns will be null. If all four data columns are null, then the entity field is missing.
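To illustrate "reconstruct your object in application code", here is a minimal Python sketch. It rests on several assumptions: SQLite (in memory) stands in for MySQL, only two of the four data tables are created (datetime and double would follow the same pattern), and the sample rows and the load_entities helper are hypothetical. Properties an entity does not have come back as None.

```python
import sqlite3
from collections import defaultdict

# Assumption: in-memory SQLite stands in for MySQL; names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_types_fields (etf_id INTEGER PRIMARY KEY, parent_et_id INTEGER,
                                  etf_ident TEXT, etf_type INTEGER);
CREATE TABLE entities (e_id INTEGER PRIMARY KEY, et_id INTEGER);
CREATE TABLE entities_props_bigint (parent_e_id INTEGER, parent_etf_id INTEGER, ep_data INTEGER);
CREATE TABLE entities_props_longtext (parent_e_id INTEGER, parent_etf_id INTEGER, ep_data TEXT);
-- Hypothetical sample data: one entity type (1) with fields "pages" and "title".
INSERT INTO entity_types_fields VALUES (10, 1, 'pages', 0), (11, 1, 'title', 1);
INSERT INTO entities VALUES (100, 1), (101, 1);
INSERT INTO entities_props_bigint VALUES (100, 10, 320);
INSERT INTO entities_props_longtext VALUES (100, 11, 'SQL Antipatterns');
""")

QUERY = """
SELECT e.e_id, f.etf_ident, f.etf_type, p0.ep_data, p1.ep_data
FROM entities AS e
INNER JOIN entity_types_fields AS f ON e.et_id = f.parent_et_id
LEFT OUTER JOIN entities_props_bigint AS p0
  ON p0.parent_e_id = e.e_id AND p0.parent_etf_id = f.etf_id
LEFT OUTER JOIN entities_props_longtext AS p1
  ON p1.parent_e_id = e.e_id AND p1.parent_etf_id = f.etf_id
WHERE e.e_id IN ({placeholders})
"""

def load_entities(conn, e_ids):
    """Return {e_id: {field_name: value}}, one row per (entity, field) pair."""
    placeholders = ",".join("?" * len(e_ids))
    objects = defaultdict(dict)
    rows = conn.execute(QUERY.format(placeholders=placeholders), list(e_ids))
    for e_id, ident, etf_type, *data in rows:
        # etf_type picks the data column; it is None when the property is missing.
        objects[e_id][ident] = data[etf_type]
    return dict(objects)

result = load_entities(conn, [100, 101])
```

Entity 101 has no stored properties, so both of its fields are filled with None while entity 100 gets its real values.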
Re your comment, okay now I understand better what you are trying to do. You have a collection of entity instances in a tree, but each instance may be a different type.
Here's how I would design it:
Store any attributes that all your entity subtypes have in common in a sort of super-type table.
entities(e_id,entity_type,name,date_created,creator,sku, etc.)
Store any attributes specific to an entity sub-type in their own table, as in Martin Fowler's Class Table Inheritance design.
entity_books(e_id,isbn,pages,publisher,volumes, etc.)
entity_videos(e_id,format,region,discs, etc.)
entity_socks(e_id,fabric,size,color, etc.)
Use the Closure Table design to model the hierarchy of objects.
entity_paths(ancestor_e_id, descendant_e_id, path_length)
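As a rough sketch of how a Closure Table like entity_paths is maintained and queried, the Python below uses in-memory SQLite in place of MySQL. The add_node and descendants helpers and the sample ids are hypothetical; only the table and column names come from the design above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity_paths ("
    "ancestor_e_id INTEGER, descendant_e_id INTEGER, path_length INTEGER)"
)

def add_node(conn, e_id, parent_id=None):
    """Insert a node; hypothetical helper, not part of any library."""
    # Every node is its own ancestor at distance 0.
    conn.execute("INSERT INTO entity_paths VALUES (?, ?, 0)", (e_id, e_id))
    if parent_id is not None:
        # Copy every path that ends at the parent, extended by one step.
        conn.execute(
            "INSERT INTO entity_paths "
            "SELECT ancestor_e_id, ?, path_length + 1 "
            "FROM entity_paths WHERE descendant_e_id = ?",
            (e_id, parent_id),
        )

def descendants(conn, e_id):
    """All descendants of e_id, nearest first, without recursion."""
    rows = conn.execute(
        "SELECT descendant_e_id FROM entity_paths "
        "WHERE ancestor_e_id = ? AND path_length > 0 "
        "ORDER BY path_length, descendant_e_id",
        (e_id,),
    )
    return [r[0] for r in rows]

# Hypothetical tree: 1 is the root, 2 and 4 are its children, 3 is a child of 2.
add_node(conn, 1)
add_node(conn, 2, 1)
add_node(conn, 3, 2)
add_node(conn, 4, 1)
tree = descendants(conn, 1)
```

The point of the design is that fetching a whole subtree is a single indexed lookup on ancestor_e_id, at the cost of storing one row per ancestor/descendant pair.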
For more information on Class Table Inheritance and Closure Table, see my presentations Practical Object-Oriented Models in SQL and Models for Hierarchical Data in SQL, or my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming, or Martin Fowler's book Patterns of Enterprise Application Architecture.
I have Student model and Class model, and they have many-to-many relationship.
(A student can register for many classes, and a class can include many students)
I have Enrollment table as a join table.
(you can get the picture in the following website)
https://fmhelp.filemaker.com/help/18/fmp/en/index.html#page/FMP_Help/many-to-many-relationships.html
■ Student table
attributes:
・name
・age
■ Class table
attributes:
・className
・description
■ Enrollment table
attributes:
・studentId
・classId
I think this is a typical many-to-many relationship, and I'm working on it with MySQL and Rails.
I would like to know if I could implement this relationship in Elasticsearch.
I read some articles which say that Elasticsearch does not support it, but are there any hacks or best practices for this?
Your use case is better suited to a relational database.
Store and query the data separately in Elasticsearch and join it in the API (business side).
Elasticsearch does not have a concept of joins in the relational sense. It is based on denormalization. Denormalization improves the response time of a query at the expense of adding redundant data. Data from different tables can be combined and stored in a single place, avoiding the need for joins, which results in faster retrieval at the cost of storage (duplication of data).
Your document, which is equivalent to a row in a table, can be modeled as below:
{
  "studentName": "",
  "age": "",
  ....
  "classes": [
    {
      "className": "class1",
      ...
    },
    {
      "className": "class2",
      ...
    }
  ]
}
For each student store all the classes associated with him/her. This will cause duplication of class data across students. It will lead to faster search but slower update as any change in class data will need to be updated across students.
You can also model your data other way around with class as parent and array of students under it. Choose based on your use case.
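A minimal Python sketch of the denormalization step, assuming the three relational tables have already been read into plain dictionaries and a list. The sample rows and the build_student_documents helper are hypothetical, and the actual indexing call to Elasticsearch is left out on purpose.

```python
import json

# Hypothetical rows, as they might be read from the Student, Class and Enrollment tables.
students = {1: {"studentName": "Ann", "age": 20}, 2: {"studentName": "Bob", "age": 22}}
classes = {10: {"className": "class1"}, 11: {"className": "class2"}}
enrollments = [(1, 10), (1, 11), (2, 10)]  # (studentId, classId)

def build_student_documents(students, classes, enrollments):
    """Embed a copy of each enrolled class inside its student document."""
    docs = {sid: {**student, "classes": []} for sid, student in students.items()}
    for student_id, class_id in enrollments:
        # The class data is copied, so the same class is duplicated across students.
        docs[student_id]["classes"].append(dict(classes[class_id]))
    return docs

docs = build_student_documents(students, classes, enrollments)
print(json.dumps(docs[1], indent=2))
```

Because "class1" is copied into both student documents, renaming that class means rewriting every student document that embeds it, which is the update cost described above.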
For your sub model you can use different data types:
Object -- the data is flattened.
Nested -- the relation between the properties of each sub model is maintained.
Parent/child -- the sub model becomes a separate document. It is used when the sub model data changes frequently, because it does not cause re-indexing of the parent document.
I am trying to create a database that has keys with multiple values.
For example, the key would be "benchpress" and the values would be "chest", "deltoid", "shoulder".
I would have a bunch of these keys and then want to search the values and return all of the keys that contain what I was searching for. For example, if I searched for chest I would get benchpress and other exercises that had that value.
I was wondering if this is possible and what the best way to do it is. I assume that I will have to use MySQL for this.
Thanks!
You're probably going to want three tables to do this. One to hold all of your exercises, one to hold all of your muscles, and one to relate exercises to their respective muscles. You can then query the third table to figure out what muscles any one exercise targets using a JOIN statement.
The benefit of having three tables versus two is that each exercise definition and muscle definition is independent, so any exercise can target multiple muscles and any muscle can be targeted by multiple exercises.
Schema:
table exercise
id (int255) | exerciseName (varchar255)
table muscles
id (int255) | muscleName (varchar255)
table exerciseTargets
exerciseId (int255) | muscleId (int255)
Keys:
exercise.id: primary
exercise.exerciseName: unique
muscles.id: primary
muscles.muscleName: unique
exerciseTargets.exerciseId: foreign key on exercise.id
exerciseTargets.muscleId: foreign key on muscles.id
Example query:
SELECT muscleName FROM muscles INNER JOIN exerciseTargets ON exerciseTargets.muscleId = muscles.id WHERE exerciseTargets.exerciseId = :exerciseId;
(where :exerciseId is the ID of an exercise.)
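The question asks for the reverse lookup (search for a muscle, get the exercises), which is the same three-table join started from the other side. The sketch below assumes in-memory SQLite in place of MySQL; the sample rows (including "push-up") and the exercises_for helper are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; schema follows the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exercise (id INTEGER PRIMARY KEY, exerciseName TEXT UNIQUE);
CREATE TABLE muscles (id INTEGER PRIMARY KEY, muscleName TEXT UNIQUE);
CREATE TABLE exerciseTargets (
    exerciseId INTEGER REFERENCES exercise(id),
    muscleId INTEGER REFERENCES muscles(id),
    PRIMARY KEY (exerciseId, muscleId)
);
-- Hypothetical sample data.
INSERT INTO exercise VALUES (1, 'benchpress'), (2, 'leg press'), (3, 'push-up');
INSERT INTO muscles VALUES (1, 'chest'), (2, 'deltoid'), (3, 'shoulder'), (4, 'thigh');
INSERT INTO exerciseTargets VALUES (1, 1), (1, 2), (1, 3), (2, 4), (3, 1);
""")

def exercises_for(conn, muscle_name):
    """Return the names of all exercises that target the given muscle."""
    rows = conn.execute(
        """
        SELECT e.exerciseName
        FROM exercise AS e
        INNER JOIN exerciseTargets AS t ON t.exerciseId = e.id
        INNER JOIN muscles AS m ON m.id = t.muscleId
        WHERE m.muscleName = ?
        ORDER BY e.exerciseName
        """,
        (muscle_name,),
    )
    return [r[0] for r in rows]

chest = exercises_for(conn, "chest")
```

With this sample data, searching for "chest" returns benchpress and push-up, while "thigh" returns only leg press.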
One way to do this in MySQL would be to have two separate tables -- one for "exercise" and one for the affected muscle groups. These would be joined by a shared "key" (MuscleID). To add a new exercise to a muscle group, you'd just add it to the Exercise table with the correct "MuscleID".
For example:
create table Exercise (MuscleID int, exercise varchar(50));
insert into Exercise values (1, "benchpress"),(2, "leg press");
create table Muscles (MuscleID int, muscles varchar(50));
insert into Muscles values (1, "chest"),(1, "deltoid"),(1, "shoulder"),(2, "thigh");
select *
from Exercise as E
inner join Muscles as M on M.MuscleID = E.MuscleID
where E.exercise = "benchpress";
Demo: http://sqlfiddle.com/#!2/19c9c/1
However, this might not be a flexible enough system for you. The answer by @matt617 would most likely be a better option for a more comprehensive setup, but if you're doing something simple, this might work for you.
I would suggest looking into something more like an Ontology or a generic Graph Dataset.
An ontology formally represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts. Whereas a Graph is a more free form notion of relationships (like a social network). All ontologies are in essence a graph but not all graphs are ontologies.
A VERY oversimplified example of RDF using the N3 format (which is the most common format for software ontologies) would be:
@prefix : <http://www.example.org/> .
:bicep a :muscle .
:tricep a :muscle .
:quadricep a :muscle .
Then you would have a separate RDF document for exercises:
@prefix : <http://www.example.org/> .
:benchpress a :exercise .
Then you can start mapping them together:
@prefix : <http://www.example.org/> .
:benchpress :uses :bicep, :pectorals .
There is also an XML format. For more information a good sample site is here: http://www.rdfabout.com/quickintro.xpd
In fact, I wouldn't be surprised if RDFs for muscle groups exist in the open domain; I just haven't looked much for them.
If you are more of a programmer at heart than an information scientist type, then Neo4J is a great graph DB that lets you implement graphs without doing full ontologies. It uses a pretty handy query language called Cypher, which describes the relationships: http://www.neo4j.org/learn/cypher
I am sure you could make a MySQL DB that was sufficient for your needs but it will be very hard to maintain depending on the long term of your application.
I have a table called cars, but each car has hundreds of attributes and they keep on increasing over time (horsepower, torque, a/c, electric windows, etc.). My table has each attribute as a column. Is that the right way to do it when I have thousands of rows and hundreds of columns? Also, I made each attribute a column to facilitate advanced searching / filtering.
Using MySQL database.
Thanks
This is an interesting question IMHO, and the answer may depend on your specific data model and implementation. The most important factor in this case is data density.
How much of each row is actually filled up, on average?
If most of your fields are always present, then data scope partition may be the way to go.
If most of your fields are empty, then a metadata-like structure (like @JayC suggested) may be more attractive.
Let's use the case you mentioned, and do some simulations.
In the first case, scope partition, the idea is to implement partitions based on scope or usage. As an example of partitioning by usage, let's say that the most retrieved fields are Model, Year, Maker and Color. These fields may compose your main [CAR] table, the owner of the ID field which will uniquely identify the vehicle.
Now let's say that Engine, Horsepower, Torque and Cylinders are also used for searches from time to time, but not so frequently. These may exist on a secondary table [CAR_INFO_1], which is tied to the first table by the presence of the CAR_ID field, a foreign key. Proceed by creating as many partitions as you need.
Advantage: Simpler queries. You may combine all information about a vehicle with a join query (for example inside a VIEW).
Downside: Maintenance. Each new field must be implemented in the model itself, and you need an updated data model to locate where the field you need is actually stored (or abstract it inside a view.)
Metadata format is much more elegant, but demands more of your database engine. Check @JayC's and @Nitzan Shaked's answers for details.
Advantages: 100% data density. You'll never have empty Data values. Also maintenance - a new attribute is created by adding it as a row to the metadata identifier table. Data structure is less complex as well.
Downside: Complex queries, together with more complex execution plans. Let's say you need all Ford cars made in 2010 that are blue. It would be trivial in the first case:
SELECT * FROM CAR WHERE Model='Ford' AND Year='2010' AND Color='Blue'
Now the same query on a metadata-structured model:
Assume the existence of these two tables,
CAR_METADATA_TYPE
ID DESC
1 'Model'
2 'Year'
3 'Color'
and
CAR_METADATA [CAR_ID], [METADATA_TYPE_ID], [VALUE]
The query itself would look something like this:
SELECT * FROM CAR, CAR_METADATA AS MP1, CAR_METADATA AS MP2, CAR_METADATA AS MP3
WHERE MP1.CAR_ID = CAR.ID AND MP1.METADATA_TYPE_ID = 1 AND MP1.VALUE='Ford'
AND MP2.CAR_ID = CAR.ID AND MP2.METADATA_TYPE_ID = 2 AND MP2.VALUE='2010'
AND MP3.CAR_ID = CAR.ID AND MP3.METADATA_TYPE_ID = 3 AND MP3.VALUE='Blue'
So, it all depends on your needs. But given your case, my suggestion would be the Metadata format.
(But do a model cleanup first - no repeated fields, 1:N data on their own table instead of inline fields like Color1, Color2, Color3, this kind of stuff ;) )
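To make the cost of the metadata format concrete, here is a small Python sketch that builds the self-join query for any number of criteria. It assumes in-memory SQLite in place of MySQL; the find_cars helper and the sample cars are hypothetical, and the type ids follow the CAR_METADATA_TYPE table above (1 = Model, 2 = Year, 3 = Color).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CAR (ID INTEGER PRIMARY KEY);
CREATE TABLE CAR_METADATA (CAR_ID INTEGER, METADATA_TYPE_ID INTEGER, VALUE TEXT);
-- Hypothetical sample data: two Fords from 2010, one blue and one red.
INSERT INTO CAR VALUES (1), (2);
INSERT INTO CAR_METADATA VALUES
    (1, 1, 'Ford'), (1, 2, '2010'), (1, 3, 'Blue'),
    (2, 1, 'Ford'), (2, 2, '2010'), (2, 3, 'Red');
""")

def find_cars(conn, criteria):
    """criteria maps METADATA_TYPE_ID to the required value.

    Each criterion adds one more self-join on CAR_METADATA.
    """
    joins, params = [], []
    for i, (type_id, value) in enumerate(criteria.items()):
        alias = f"MP{i}"
        joins.append(
            f"JOIN CAR_METADATA AS {alias} ON {alias}.CAR_ID = CAR.ID "
            f"AND {alias}.METADATA_TYPE_ID = ? AND {alias}.VALUE = ?"
        )
        params += [type_id, value]
    sql = "SELECT CAR.ID FROM CAR " + " ".join(joins) + " ORDER BY CAR.ID"
    return [r[0] for r in conn.execute(sql, params)]

blue_fords = find_cars(conn, {1: "Ford", 2: "2010", 3: "Blue"})
```

Three search criteria already mean three joins against the same table, so the execution plan grows with every filter the user adds.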
I guess the obvious question is, then: why not have a table car_attrs(car, attr, value)? Each attribute is a row. Most queries can be re-written to use this form.
If it is all about features, create a features table, list all your features as rows and give them some sort of automatic id, and create a car_features table with foreign keys to both your cars table and your features table that associates cars with features, maybe along with any values associated with the relationship (one passenger electric seat, etc.).
If you have ever changing attributes, then consider storing them in an XML blob or text structure in one column. This structure is not relational. The most important attributes will then be duplicated in additional columns so you can craft queries to search on them, as the blob will not be searchable from SQL queries. This will cut down on the number of columns in that table and allow for expansion without changing the database schema.
As others have suggested, if you want all the attributes in a table, then use an attribute table to define them. The right choice will depend on the requirements and needs of your application.
Hello, stackoverflow community!
I am working on a rather large database-driven web application. The underlying database is growing in complexity as more components are being added, but so far I've had absolutely no trouble normalizing the data quite nicely.
However, this final component implies a table that can hold products.
Each product has a category, and depending on the category, has different fields.
Making a table for each product category doesn't seem right, as there are currently five types, and they still have quite a lot of fields in common. (but in weird ways - a few general fields such as description and price are common to all 5 categories, but some attributes are shared between 1 and 2, others 3,4,5 and so on).
I'm trying to steer away from the EAV model for obvious performance reasons.
The thing is that according to what product type the user wants to enter into the database there is a somewhat (but not completely) different field structure - all of them have a name and general description, but other attributes such as "area covered" can be applied only to certain categories such as seeds and pesticides, but not fuel, which would have a diesel/gasoline boolean and a bunch of other fuel-related attributes.
Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.
My current idea would be to have the product table contain all the fields from all the possible categories, and then just have another table to describe which category from the product table has which fields.
product: id | type | name | description | price | composition | area covered | etc.
fields: id | name (contains a list of the fields in the above table)
product-fields: id | product_type | field_id (links a bunch of fields to the product table based on the product type)
I reckon this wouldn't be too slow, it would be easy to search (no need to actually join the other tables, just perform the search on the main product table based on some inputs), and it would facilitate things like form generation and data validation with just one lightweight additional query/join (fetch a product from the db and join a concatenated list of the fields actually used in a string - split that and display the proper form fields based on what it contains, i.e. the fields actually associated with that product).
Thanks for your trouble!
Andrei Bârsan
EAV can actually be quite good at storing data and fetching that data back again when you know the key. It also excels in its ability to add fields without changing the schema. But where it's quite poor is when you need the equivalent of WHERE field1 = x and field2 = y.
So while I agree the data behaviour is important (how many products share the same fields, etc), the use of that data is also important.
Which fields need searching, which fields are always just data storage, etc
In most cases I'd suggest keeping all fields that need searching, in combination with each other, in the same table.
In practice this often leads to a single table solution.
New fields require schema changes, new indexes, etc
Potential for sparsely populated data, using more space than is 'required'
Allows simple queries, simple indexing and often the fastest queries
Often, though not always, the space overhead is marginal
Where the sparse-data overheads reach a critical point, I would then head towards additional tables grouped by what fields they contain. More specifically, I would not create tables by product. This is on the dual assumption that most/all fields will be shared across at least some products, and that those fields will need searching.
This gives a schema more like...
Main_table ( PK, Product_Type, Field1, Field2, Field3 )
Geo_table ( PK, county, longitude, latitude )
Value ( PK, cost, sale_price, tax )
etc
You may also have a meta-data table describing which product types have which fields, etc.
What this schema allows is a more densely populated set of tables, which can be easily indexed and so quickly searched, while minimising table clutter and joins by grouping related fields.
In the end, there isn't a true answer, it's all a balancing act. My general rule of thumb is to stay with a single table until I actually have a real and pressing reason not to, not just a theoretical one.
In my experience, unless you are writing a complete framework that can render fully described fields (we are talking about a lot of metadata describing each field), it is not worth separating field definitions from the main object. Modern frameworks (like Grails) allow virtually zero-pain addition of a new column to a domain/Model class and table.
If your common field overlap is about 80% between all object types, I would put them all in 1 table and use the Table per Hierarchy inheritance model, where a discriminator field helps you tell your object types apart. On the other hand, if you have 20% overlap of common fields, then go with the Table per Class inheritance model, with a base class and table containing the common fields, and other joined tables hanging off the base.
Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.
This is called a SuperType - SubType relationship. It works very well if most of your queries are one of two types:
If you will be querying mostly the SuperType table and only drilling down into the SubType table infrequently.
If you will be querying the database after being filtered to a specific SubType.
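For illustration, a SuperType - SubType layout for the products in the question might look like the Python sketch below, with one query of each kind listed above. The table names product_seed and product_fuel, their columns, and the sample rows are assumptions made up for this example, and in-memory SQLite stands in for MySQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; subtype tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- SuperType: fields common to every product.
CREATE TABLE product (id INTEGER PRIMARY KEY, type TEXT, name TEXT, price REAL);
-- SubTypes: one table per category, sharing the supertype's primary key.
CREATE TABLE product_seed (product_id INTEGER PRIMARY KEY REFERENCES product(id),
                           area_covered REAL);
CREATE TABLE product_fuel (product_id INTEGER PRIMARY KEY REFERENCES product(id),
                           is_diesel INTEGER);
INSERT INTO product VALUES (1, 'seed', 'Wheat seed', 12.5), (2, 'fuel', 'Diesel', 1.4);
INSERT INTO product_seed VALUES (1, 0.5);
INSERT INTO product_fuel VALUES (2, 1);
""")

# Query type 1: only the SuperType table is needed.
names = [r[0] for r in conn.execute("SELECT name FROM product ORDER BY id")]

# Query type 2: filtered to one SubType, so exactly one join is needed.
seeds = conn.execute(
    "SELECT p.name, s.area_covered "
    "FROM product AS p JOIN product_seed AS s ON s.product_id = p.id"
).fetchall()
```

Adding a sixth category under this layout means adding one more subtype table, without touching the rows of the existing ones.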
I have a schema where most of the tables have associated users_*_meta tables which store per-user data like starred/unstarred, rating, and the like.
For example, stories -< users_stories_meta >- users.
However, I'm having trouble figuring out how to perform a joined load of a row and the related metadata row for the current user without either writing my own ORM on top of SQLAlchemy's expression builder or using a closure to generate a new session and schema for each request. (relationship() doesn't seem to support resolving components of primaryjoin lazily.)
What would the simplest, least wheel-reinventing way be to treat the appropriate subset of the many-to-many relationship as a one-to-many relationship (one user, many stories/authors/elements/etc.) specific to each request?
There's a recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/GlobalFilter which illustrates how the criterion represented by a relationship() can be influenced at the per-query level.