Database schema practical approach

Database schema practical approach - mysql

I want to model 2 entities in database: CafeBrand and Cafe. I have pretty much the same properties in both entities:
CafeBrand{
foodDescription,
website,
email,
phone
}
cafe{
foodDescription,
website,
email,
phone
}
So, in the case of McDonald's, all cafes would have the same foodDescription: 'Junk food'. But some other brands might have a separate food description for each cafe ('sandwiches', 'drinks only', ...).
The same goes for the website/email/phone properties: a cafe might have its own website/email/phone, or all cafes of a brand might share the same ones. Quite often a CafeBrand has one website but a different email/phone for each of its cafes.
My question is: is it wise to store these properties as they are and then use if/else (in SQL or code) to get the proper description, website, email and phone (if cafe.website == null then use cafebrand.website), or is it better to use relationships to separate tables such as 'FoodDescription' and 'Website'? The data won't be written to the database very often; most of the time only SELECT statements will be used.
And if a company has a single cafe, how should foodDescription/website be split between the CafeBrand and Cafe tables?

As Bill Gregg has mentioned, you should probably put all similar data in one table. You'll end up with the following structure:
cafe {
foodDescription,
website,
email,
phone,
brand
}
Because the foodDescription, website and similar columns will, I assume, contain mostly unique values, you won't gain anything by separating the data into different tables.
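If some values do end up being shared at the brand level, the fallback described in the question (use the brand's value when the cafe's own value is NULL) can be written as a single query with COALESCE instead of if/else in application code. Below is a minimal sketch in Python against an in-memory SQLite database. The table and column names (cafe_brand, cafe, brand_id, name) and the sample values are assumptions made for illustration; the same COALESCE expression is available in MySQL.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cafe_brand (id INTEGER PRIMARY KEY, website TEXT, phone TEXT);
        CREATE TABLE cafe (
            id INTEGER PRIMARY KEY,
            brand_id INTEGER NOT NULL REFERENCES cafe_brand(id),
            name TEXT NOT NULL,
            website TEXT,  -- NULL means "inherit from the brand"
            phone TEXT     -- NULL means "inherit from the brand"
        );
        INSERT INTO cafe_brand VALUES (1, 'brand.example', '555-0100');
        INSERT INTO cafe VALUES (1, 1, 'Main Street', NULL, '555-0101');
        INSERT INTO cafe VALUES (2, 1, 'Airport', 'airport.example', NULL);
        """
    )
    return conn


def effective_cafes(conn):
    """Return (name, website, phone) per cafe, with brand values filling NULLs."""
    return conn.execute(
        """
        SELECT c.name,
               COALESCE(c.website, b.website) AS website,
               COALESCE(c.phone, b.phone) AS phone
        FROM cafe AS c
        JOIN cafe_brand AS b ON b.id = c.brand_id
        ORDER BY c.id
        """
    ).fetchall()
```

Since the data is read far more often than it is written, the query could also be wrapped in a view so that callers never see the NULLs.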

Related

How would you model a collection of users and friends in Firebase?

I'm trying to create a database (json) with Firebase.
I searched the docs and the net but couldn't find a clear way to start.
I want to have a database of users.
Each user (represented as a UID) should have a nickname and a list of friends.
I tried making a .json file that looks like this:
{
users:{
}
}
and adding it to the Firebase console to get started but it wouldn't work.
How can I do it?
The database should look like this:
{
users:{
UID:{
nickname: hello
friends: UID2
}
UID2:{
nickname: world
friends: UID
}
}
}
I don't know if I got that right, so I would really appreciate any help you could give me on this subject.
Thanks in advance!

Seems like a good place to start. I would make two changes though:
keep the list of friends separate
keep the friends as a set, instead of a single value or array
keep the list of friends separate
A basic recommendation when using the Firebase Database is to keep your data structure shallow/flat. There are many reasons for this, and at least two of them apply to you.
With your current data structure, say that you want to show a list of user names. You can only get that list by listening to /users. And that means you don't just get the user name for each user, but also their list of friends. Chances that you're going to show all that data to the user are minimal, so that means that you've just wasted some of their bandwidth.
Say that you want to allow everyone to read the list of user names. But you only want each user to be able to read their own list of friends. Your current data structure makes that hard, since permission cascades and rules are not filters.
A better structure is to keep the list of user profiles (currently just their name) separate from the list of friends for each user.
keep the friends as a set
You currently have just a single value for the friends property. As you start building the app you will need to store multiple friends. The most common approach is then to store an array or list of UIDs:
[ UID1, UID2, UID3 ]
Or
{
"-K.......1": "UID1",
"-K.......5": "UID2",
"-K.......9": "UID3"
}
These are unfortunately the wrong type for this data structure. Both the array and the second collection are lists: an ordered collection of (potentially) non-unique values. But a collection of friends doesn't have to be ordered, it has to be unique. I'm either in the collection or I'm not in there, I can't be in there multiple times and the order typically doesn't matter. That's why you often end up looking for friends.contains("UID1") or ref.orderByValue().equalTo("UID1") operations with the above models.
A much better model is to store the data as a set. A set is a collection of unordered values, which have to be unique. Perfect for a collection of friends. To store that in Firebase, we use the UID as the key of the collection. And since we can't store a key without a value, we use true as the dummy value.
So this leads to this data model:
{
"users": {
"UID": {
"nickname": "hello"
},
"UID2": {
"nickname": "world"
}
},
"friends": {
"UID": {
"UID2": true
},
"UID2": {
"UID": true
}
}
}
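To see why the set shape is convenient, here is a small sketch in Python that uses a plain dict as a stand-in for the JSON tree. It does not call any Firebase SDK, and the helper names add_friendship and are_friends are made up for illustration.

```python
def add_friendship(db, uid_a, uid_b):
    """Record a mutual friendship: each UID becomes a key with the dummy value True."""
    friends = db.setdefault("friends", {})
    friends.setdefault(uid_a, {})[uid_b] = True
    friends.setdefault(uid_b, {})[uid_a] = True


def are_friends(db, uid_a, uid_b):
    """Membership is a direct key lookup; no scan over a list is needed."""
    return db.get("friends", {}).get(uid_a, {}).get(uid_b, False)


db = {"users": {"UID": {"nickname": "hello"}, "UID2": {"nickname": "world"}}}
add_friendship(db, "UID", "UID2")
add_friendship(db, "UID", "UID2")  # adding the same pair again changes nothing
```

Because the UID is the key, duplicates cannot occur, and checking whether two users are friends is a lookup at a known path rather than a query.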
There is a lot more to say/learn about NoSQL data modeling in general and Firebase specifically. To learn about that, I recommend reading NoSQL data modeling and watching Firebase for SQL developers.

I keep a collection of Friends where the users field is an array of 2 user ids: ['user1', 'user2'].
Getting the friends of a user is easy:
friendsCollection.where("users", "array-contains", "user1").get()
This should get you all documents where user1 appears.
Now the tricky part was how to query a single friendship. Ideally, Firebase would support multiple values in array-contains, but they won't do that: https://github.com/firebase/firebase-js-sdk/issues/1169
So the way I get around this is to normalize the users list before adding the document. Basically I use JS string comparison to check which userId is greater and which is smaller, and then build the list in that order.
When adding a friend:
const user1 = sentBy > sentTo ? sentBy : sentTo
const user2 = sentBy > sentTo ? sentTo : sentBy
const friends = { users: [user1, user2] }
await friendsCollection.add(friends)
This basically ensures that whoever is part of the friendship will always be listed in the same order, so when querying, you can just:
await friendsCollection.where("users", "==", [user1, user2]).get()
This obviously only works because I trust that the list will always have 2 items and that JS string comparison is deterministic, but it's a great solution for this specific problem.
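The normalization step is independent of Firestore, so it can be sketched on its own. The Python function below (friendship_key is a hypothetical name) puts the larger ID first, mirroring the ternaries above. Note that Python compares strings by code point while JavaScript compares UTF-16 code units, so every client writing to the same collection should use the same comparison.

```python
def friendship_key(sent_by, sent_to):
    """Return both user IDs in a fixed order (larger string first)."""
    return sorted([sent_by, sent_to], reverse=True)
```

Whichever user sends the request, the stored list is identical, which is what makes the equality query on the whole array possible.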

Front-end SQL Search - Dynamically check similar values?

I have a front-end DB search on my website: https://ygoprodeck.com
Currently it checks card names. A user can type in "magician" and get loads of results.
But if a user mis-spells it and types "magican" then no results show.
I've been tracking user inputs and it seems a lot of users are mis-spelling card names and getting no results.
A common example: There are cards with D/D/D in the name but users often type DDD which means they get no results.
Is there a pure SQL method to modify this behaviour or would it need to be implemented through JS/PHP?
Thanks!

How about saving all the misspellings in a new table like
CREATE TABLE misSpells(
misSpell VARCHAR(255),
correspondingOne VARCHAR(255)
)
Examples:
magician, magician
magican, magician
DDD, DDD
D/D/D, DDD
and query this table for the corresponding one?
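A minimal sketch of that lookup in Python with an in-memory SQLite database follows. The helper names build_lookup and resolve are assumptions, as is the choice to lowercase the typed term before the lookup; the pairs here map what users type to the spelling used in the card names, as in the D/D/D case from the question.

```python
import sqlite3


def build_lookup(pairs):
    """Create the misSpells table and fill it with (typed, canonical) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE misSpells ("
        "misSpell TEXT PRIMARY KEY, correspondingOne TEXT NOT NULL)"
    )
    # Keys are stored lowercased so the lookup is case-insensitive.
    conn.executemany(
        "INSERT INTO misSpells VALUES (?, ?)",
        [(typed.lower(), canonical) for typed, canonical in pairs],
    )
    return conn


def resolve(conn, term):
    """Return the canonical spelling for a search term, or the term unchanged."""
    row = conn.execute(
        "SELECT correspondingOne FROM misSpells WHERE misSpell = ?",
        (term.lower(),),
    ).fetchone()
    return row[0] if row else term
```

The resolved term would then be passed to the existing card-name search. Since user inputs are already being tracked, the most frequent failed searches are natural candidates for new rows.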

Yii2 is there a way to specify tablename in ActiveQuery conditions (like andWhere) in a nice and short way

I make a query (with \yii\db\ActiveQuery) with joins, and some fields in the "where" clause become ambiguous. Is there a nice and short way to put the table name of the current model (the ActiveRecord from which the ActiveQuery was instantiated) before the column name, so that I can use it consistently in all cases and keep it short?
I don't like doing something like this all the time (especially in places where there are no joins, just so that those methods can still be used with joins if needed):
// in the ActiveQuery method initialized from the model with tableName "company"
$this->andWhere(['{{%company}}.`company_id`' => $id]);
The goal is to make the "named scopes" work in cases with joins.
Also, what does the [[..]] mean in this case, like:
$this->andWhere(['[[company_id]]' => $id]);
It doesn't seem to solve the problem described above.
Thx in advance!
P.S. sorry, don't have enough reputation to create tag yii2-active-query

To get the real table name:
Class :
ModelName::getTableSchema()->fullName
Object :
$model::getTableSchema()->fullName

Your problem is a very common one and happens most often with fields like description, notes and the like.
Solution
Instead of
$this->andWhere(['description'=>$desc]);
you simply write
$this->andWhere(['mytable.description'=>$desc]);
Done! Simply add the table name in front of the field. Both the table name and the field name will be automatically quoted when the raw SQL is created.
Pitfall
The above example solves your problem within query classes. One pitfall that I struggled with, and that took me quite some time to solve, was a model's relations! If you join in other tables during your queries (more than just one), you could also run into this problem because the relation methods within the model are not qualified.
Example: Say you have three tables: student, class, and teacher. Student and teacher are probably both related to class, and both have an FK field class_id. Now if you go from student via class to teacher ($student->class->teacher), you also get the ambiguous-column error. The problem here is that you should also qualify your relation definitions within the models!
public function getTeacher()
{
return $this->hasOne(Teacher::className(), ['teacher.id' => 'class.teacher_id']);
}
Proposal
When developing your models and query-classes always fully qualify the fields. You will never ever run into this problem again...that was my experience at least! I actually created my own model-gii-template. So this gets solved automatically now ;)
Hope it helped!

DynamicQuery: How to select a column with linq query that takes parameters

We want to set up a directory of all the organizations working with us. They are incredibly diverse (government, embassy, private companies, and organizations depending on them ). So, I've resolved to create 2 tables. Table 1 will treat all the organizations equally, i.e. it'll collect all the basic information (name, address, phone number, etc.). Table 2 will establish the hierarchy among all the organizations. For instance, Program for illiterate adults depends on the National Institute for Social Security which depends on the Labor Ministry.
In the Hierarchy table, each column represents a level. So, for the example above, (i)Labor Ministry - Level1(column1), (ii)National Institute for Social Security - Level2(column2), (iii)Program for illiterate adults - Level3(column3).
To attach an organization to a hierarchy, the user needs to go level by level (i.e. column by column). So, there will be at least 3 situations:
If an adequate hierarchy exists for an organization (for instance, level1: US Embassy), that organization can be added (for instance, level2: USAID) --> US Embassy/USAID, and so on.
What if one or more levels are missing? Then they need to be added.
What if the hierarchy needs to be modified? Not everything needs to be modified.
I do not have any choice but to work by level (i.e. column by column). It does not make sense to have all the levels in one form, as the user needs to navigate hierarchies to find the right one to attach an organization to.
Let's say, I have those queries in my repository (just that you get the idea).
Query1
var orgHierarchy = (from orgH in db.Hierarchy
select orgH.Level1).FirstOrDefault();
Query2
var orgHierarchy = (from orgH in db.Hierarchy
select orgH.Level2).FirstOrDefault();
Query3, Query4, etc.
The above queries are the same except for the property queried (level1, level2, level3, etc.)
Question: Is there a general way of writing the above queries as one, so that the user can follow a hierarchy level by level to attach an organization?
In other words, not knowing in advance which column to query, I still need to be able to do so depending on some conditions. For instance, an organization X depends on Y. Knowing that Y is somewhere on the 3rd level, I'll go to the 4th level, linking X to Y.
I need to select (not manually) a column with only one query that takes parameters.
=======================
EDIT
As I just said to @Mark Byers, all I want is to be able to query a column without knowing in advance which one. Check this out:
How about this
public Hierarchy GetHierarchy(string name)
{
var myHierarchy = from hierarc in db.Hierarchy
where (hierarc.Level1 == name)
select hierarc;
return myHierarchy.FirstOrDefault();
}
Above, the query depends on name, which is a variable. It might be Planning Ministry, Embassy, Local Phone, etc.
Can I write the same query, but this time, instead of looking to match a value in the DB, make my query select a particular column?
var myVar = from orgH in db.Hierarchy
where (orgH.Level1 == "Government")
select orgH.where(level == myVariable);
return myVar;
I don't pretend that select orgH.where(level == myVariable) is even close to being valid. But that is what I want: to be able to select a column depending on a variable (i.e. the column is not known in advance, just like the value of name).
Thanks for helping

How about using DynamicQueryable?
http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx

Your database is not normalized, so you should start by changing the hierarchy table to, for example:
OrganizationId   Parent
1                NULL
2                1
3                1
4                3
To query this you might need to use recursive queries. This is difficult (but not impossible) using LINQ, so you might instead prefer to create a parameterized stored procedure using a recursive CTE and put the query there.
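As a rough illustration of the recursive CTE, here is a sketch in Python using SQLite, which supports WITH RECURSIVE. It loads the four sample rows from the table above and walks from an organization up to its top-level parent. The table name hierarchy, the column name organization_id and the function names are assumptions; in SQL Server the same idea would go into the stored procedure with slightly different syntax.

```python
import sqlite3


def demo_connection():
    """Load the four sample rows from the normalized table above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hierarchy ("
        "organization_id INTEGER PRIMARY KEY, "
        "parent INTEGER REFERENCES hierarchy(organization_id))"
    )
    conn.executemany(
        "INSERT INTO hierarchy VALUES (?, ?)",
        [(1, None), (2, 1), (3, 1), (4, 3)],
    )
    return conn


def ancestors(conn, org_id):
    """Return the chain of IDs from org_id up to the root, nearest first."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, parent, depth) AS (
            SELECT organization_id, parent, 0
            FROM hierarchy
            WHERE organization_id = ?
            UNION ALL
            SELECT h.organization_id, h.parent, chain.depth + 1
            FROM hierarchy AS h
            JOIN chain ON h.organization_id = chain.parent
        )
        SELECT id FROM chain ORDER BY depth
        """,
        (org_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

The length of the returned chain is the level of the organization, so no Level1, Level2, ... columns are needed and the depth of the hierarchy is not fixed by the schema.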

"Diffing" objects from a relational database

Our win32 application assembles objects from the data in a number of tables in a MySQL relational database. Of such an object, multiple revisions are stored in the database.
When storing multiple revisions of something, sooner or later you'll ask yourself the question if you can visualize the differences between two revisions :) So my question is: what would be a good way to "diff" two such database objects?
Would you do the comparison at the database level? (Doesn't sound like a good idea: too low-level, and too sensitive to the schema).
Would you compare the objects?
Would you write a function that "manually" compares the properties and fields of two objects?
How would you store the diff? In a separate, generic "TDiff" object?
Any general recommendations on how to visualize such things in a user interface?
Advice, or stories about your own experiences with this, are very welcome; thanks a bunch!
Extra info on use case (20090515)
In reply to Antony's comment: this specific application is used to schedule training courses, run by teams of teachers. The schedule of a teacher is stored in various tables in the database, and contains info such as "where does she have to go on which day", "who are her colleagues in the team", etc. This information is spread out over multiple tables.
Once in a while, we "publish" the schedule, so the teachers can see it on a webpage. Each "publication" is a revision, and we'd like to be able to show the users (and later also the teachers) what's changed between two publications --- if anything.
Hope that makes the scenario a bit more tangible :)
Some final remarks
Well, the bounty has come to an end, so I've accepted an answer. If it'd somehow be possible to slice a couple of extra 100's off of my rep and give it to some of the other answers, I would do so without hesitation. All your guys' help has been great, and I am very grateful! ~ Onno 20090519

Just an idea, but would it be worthwhile for you to convert the two object versions being compared to some text format and then comparing these text objects using an existing diff program - like diff for example? There are lots of nice diff programs out there that can offer nice visual representations, etc.
So for example
Text version of Object 1:
first_name: Harry
last_name: Lime
address: Wien
version: 0.1
Text version of Object 2:
first_name: Harry
last_name: Lime
address: Vienna
version: 0.2
The diff would be something like:
3,4c3,4
< address: Wien
< version: 0.1
---
> address: Vienna
> version: 0.2

Assume that a class has 5 known properties - date, time, subject, outline, location. When I look at my schedule, I'm most interested in the most recent (ie current/accurate) version of these properties. It would also be useful for me to know what, if anything, has changed. (As a side note, if the date, time or location changed, I'd also expect to get an email/sms advising me in case I don't check for an updated schedule :-))
I would suggest that the 'diff' is performed at the time the schedule is amended. So, when version 2 of the class is created, record which values have changed, and store this in two 'changelog' fields on the version 2 object (there must already be one parent table that sits atop all your tables - use that one!). One changelog field is 'human readable text', e.g. 'Date changed from Mon 1 May to Tues 2 May, Time changed from 10:00am to 10:30am'. The second changelog field is a delimited list of changed fields, e.g. 'date,time'. To do this, before saving you would loop over the values submitted by the user, compare them to the current database values, and concatenate 2 strings, one human readable, one a list of field names. Then, update the data and set your concatenated strings as the 'changelog' values.
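The loop described above could look roughly like this in Python. It is only a sketch: build_changelog is a made-up name, and the current and submitted values are assumed to be available as dicts keyed by field name.

```python
def build_changelog(current, submitted):
    """Compare submitted values with the stored ones.

    Returns (human-readable text, comma-delimited list of changed fields).
    """
    changed = [
        field for field in submitted
        if submitted[field] != current.get(field)
    ]
    text = ", ".join(
        f"{field.capitalize()} changed from {current.get(field)} to {submitted[field]}"
        for field in changed
    )
    return text, ",".join(changed)
```

Both strings are then written together with the updated values, so reading the schedule later needs no comparison at all.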
When displaying the schedule load the current version by default. Loop through the fields in the changelog field list, and annotate the display to show that the value has changed (a * or a highlight, etc). Then, in a separate panel display the human readable change log.
If a schedule is amended more than once, you would probably want to combine the changelogs between version 1 & 2, and 2 & 3. Say in version 3 only the course outline changed - if that was the only changelog you had when displaying the schedule, the change to date and time wouldn't be displayed.
Note that this denormalised approach won't be great for analysis - eg working out which specific location always has classes changed out of it - but you could extend it using an E-A-V model to store the change log.

Doing a comparison at the database level would be good if what you cared about was changes to the database. That makes the most sense if you're trying to design a layer of generic functionality on top of the database itself.
Doing a comparison at the object level would be good if you care about changes to the data. For example, if the data was the input to a program and you were interested in looking at changes in the input to verify that changes to the output were correct.
Your use case doesn't appear to be either of these. You appear to care about the output and want differences from that perspective. If that's the case, I would do differences on the output report (or a pure-text version of it) instead of on the underlying data. You can do that with any off-the-shelf diff tool. To make things easier for your end-users you could parse the diff results and render them as HTML. There are lots of options here: side-by-side with color coding to indicate changes, one document with markup for changes (e.g. red strikethrough for deletions and green for additions), maybe just highlight areas that have changed and use balloons to show the previous/current values on demand.
I've thought about doing database comparisons but never tried to implement it. As you noted, any such attempts are intimately intertwined with the schema.
I have done object-level comparisons. The general algorithm was this:
Do a set comparison on the lists of object IDs. This creates three result groupings: added objects, deleted objects, and objects that live in both sets.
Report the deletions.
Report the additions.
For the things in both sets, do an attribute-by-attribute comparison.
If any differences are found, report the object ID, the attributes that differ, and the respective values. If appropriate, highlight the portion of the attribute value that has changed.
In my case, the comparison algorithms were hand-written to match the object attributes. This gave me control over which attributes were compared and how. A generic comparator might be possible for some cases but would depend on the situation and at least partially on the implementation language.
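The steps above can be sketched in a few lines of Python. This is a generic version, not the hand-written comparator described; it assumes each revision is available as a dict mapping object ID to a dict of attributes, and diff_collections is a made-up name.

```python
def diff_collections(old, new):
    """Compare two revisions given as {object_id: {attribute: value}}.

    Returns (added IDs, deleted IDs, {id: {attribute: (old, new)}}).
    """
    old_ids, new_ids = set(old), set(new)
    added = sorted(new_ids - old_ids)
    deleted = sorted(old_ids - new_ids)

    changed = {}
    for obj_id in sorted(old_ids & new_ids):
        before, after = old[obj_id], new[obj_id]
        # Compare attribute by attribute, including ones present on one side only.
        delta = {
            attr: (before.get(attr), after.get(attr))
            for attr in sorted(before.keys() | after.keys())
            if before.get(attr) != after.get(attr)
        }
        if delta:
            changed[obj_id] = delta
    return added, deleted, changed
```

A hand-written comparator would replace the dict comprehension with explicit per-attribute rules, which is where the control over what is compared, and how, comes from.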

I've looked into MySQL diffing a number of times. Unfortunately, there aren't any really good solutions available.
One tool I've tried was mysqldiff (www.mysqldiff.org). mysqldiff is a tool written in PHP which is capable of diffing MySQL schemas. Unfortunately, it doesn't do a great job a lot of the time.
MySQL Workbench, MySQL's own SQL IDE, provides the option to generate an alter script, and I would imagine it does this by performing some kind of diff operation internally.
Aqua Data Studio is another tool that is capable of comparing schemas and outputting a diff of the two. While the ADS diff is quite nice, it does not provide a tool to create an alter script.
If I were writing my own, I guess I would write code capable of comparing the structure of two tables. Such code could be tuned to be highly sensitive (e.g. if column order differs from one version to the next, it's a difference) or more moderately sensitive (e.g. column order is not a major issue, but datatypes and lengths are important, as are indices and constraints).
As for storage, I'm not too sure. I would look into how a version control system such as Mercurial stores its diff information for revisions and use that to elaborate a method appropriate for the DB.
Finally, for visual output I recommend you take a look at the Aqua Data Studio compare feature (you can use the trial version to test this). Its diff output is pretty good.

My application dbscript compares hierarchical data (database schemas) in a stored procedure, which of course has to compare each field/property of every object with its counterpart. I guess you won't get around that step (unless you have a generic object description model)
As for the UI part of your question, have a look at screenshots to view and select differences.

I would think about some sort of common text representation of the objects and compare the texts with an existing diffing tool like WinMerge.
I see no need to invent diffing by myself since there are already plenty of nice tools I can use.
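If the comparison has to happen inside the application rather than in an external tool, the same idea works with a library diff. Here is a sketch in Python using the standard difflib module; the to_text format (sorted "key: value" lines) and the function names are assumptions, and a win32 application would use an equivalent diff library in its own language.

```python
import difflib


def to_text(obj):
    """Render a dict as sorted 'key: value' lines so the output is stable."""
    return [f"{key}: {obj[key]}" for key in sorted(obj)]


def diff_objects(old, new):
    """Return unified-diff lines between the text forms of two revisions."""
    return list(
        difflib.unified_diff(to_text(old), to_text(new), "old", "new", lineterm="")
    )
```

Sorting the keys matters: without a stable order, two identical objects could produce different text and therefore a spurious diff.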

In your situation, in PostgreSQL, I used difference tables with this schema:
create table history_columns (
column_id smallint primary key,
column_name text not null,
table_name text not null,
unique (table_name, column_name)
);
create temporary sequence column_id_seq;
insert into history_columns
select nextval('column_id_seq'), column_name, table_name
from information_schema.columns
where
table_name in ('table1','table2','table3')
and table_schema=current_schema() and table_catalog=current_database();
create table history (
column_id smallint not null references history_columns,
id int not null,
change_time timestamp with time zone not null
constraint change_time_full_second -- only one change allowed per second
check (date_trunc('second',change_time)=change_time),
primary key (column_id,id,change_time),
value text
);
And on the tables I used a trigger like this (pseudocode: each "[for each column_name] { ... }" block stands for the same statements repeated for every tracked column):
create or replace function save_history() returns trigger as
$$
begin
if (tg_op = 'DELETE') then
insert into history values (
find_column_id('id',tg_relname), OLD.id,
date_trunc('second',current_timestamp),
OLD.id );
[for each column_name] {
if (char_length(OLD.column_name)>0) then
insert into history values (
find_column_id(column_name,tg_relname), OLD.id,
OLD.change_time, OLD.column_name
);
end if;
}
return OLD; -- a BEFORE DELETE trigger must return OLD for the delete to proceed
elsif (tg_op = 'UPDATE') then
[for each column_name] {
if (OLD.column_name is distinct from NEW.column_name) then
insert into history values (
find_column_id(column_name,tg_relname), OLD.id,
OLD.change_time, OLD.column_name
);
end if;
}
return NEW; -- a BEFORE UPDATE trigger must return the row to be stored
end if;
end;
$$ language plpgsql volatile;
create trigger save_history_table1
before update or delete on table1
for each row execute procedure save_history();

This isn't really an answer to the question you asked, but rather an attempt to re-imagine the problem. Would you consider altering your database and object model to store the aggregate root and a series of deltas? That is, model and store RevisionSets that are collections of Revisions; a Revision is an entity property paired with a value. In a sense this builds into your architecture the revision structure that the other posters suggest you bolt on to what you already have via "logs".
It's trivial to display the aggregate from the deltas, and even easier to display the deltas as a change history. The fact that you are using a rich client with state and local memory makes this even more compelling. You could very easily display "all the changes since date xxxx" without revisiting the database.
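A rough sketch of that model in Python, assuming a Revision is a (property, value) pair and a RevisionSet is a list of such pairs. The function names and the plain-dict aggregate are illustrative choices, not a prescribed design.

```python
def apply_revisions(root, revision_sets):
    """Rebuild the aggregate by replaying every delta on a copy of the root."""
    state = dict(root)
    for revisions in revision_sets:
        for prop, value in revisions:
            state[prop] = value
    return state


def changes_since(revision_sets, index):
    """Flatten the deltas from revision set `index` onward into a change history."""
    return [rev for revisions in revision_sets[index:] for rev in revisions]
```

Replaying only the first n revision sets gives the object as it looked at publication n, and changes_since gives the list to show to the user, all from data already held in memory.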
Credit for the basic idea goes to Greg Young and his work with financial data streams, but it is eminently applicable to your problem.

I'm riffing off of what Harry Lime suggested: output your properties to a text format, then hash the results. That way you can compare the hash values and easily flag the data that has been altered. You get the best of both worlds, as you can see the differences visually but identify them programmatically. With the hash you'll also have a good source for an index, should you want to store and retrieve the deltas.
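A minimal sketch of the hashing step in Python. SHA-256 and the "key: value" text format are assumptions; any stable text representation and any reasonable hash function would do.

```python
import hashlib


def fingerprint(obj):
    """Hash a stable text rendering of a dict of properties."""
    text = "\n".join(f"{key}: {obj[key]}" for key in sorted(obj))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Two revisions with equal fingerprints can be skipped, and only those that differ need the more expensive text diff.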

Given you want to create a UI for this and need to indicate where the differences are, it seems to me you can either go custom or create a generic object comparer - the latter being dependent on the language you are using.
For the custom method, you need to create a class that takes two instances of the class to be compared. It then returns the differences:
public class Person
{
public string Name;
}
public class PersonComparer
{
private readonly Person oldPerson;
private readonly Person newPerson;
// "new" is a reserved word in C#, so the parameters are named oldPerson/newPerson
public PersonComparer(Person oldPerson, Person newPerson)
{
this.oldPerson = oldPerson;
this.newPerson = newPerson;
}
public bool NameIsDifferent() { return oldPerson.Name != newPerson.Name; }
public string NameDifferentText() { return NameIsDifferent() ? "Name changed from " + oldPerson.Name + " to " + newPerson.Name : ""; }
}
This way you can use the PersonComparer object to create your GUI.
The generic approach would be much the same, except that you generalize the calls and use object inspection (the getObjectProperty call below) to find the differences:
public class ObjectComparer
{
private readonly object oldObject;
private readonly object newObject;
public ObjectComparer(object oldObject, object newObject)
{
this.oldObject = oldObject;
this.newObject = newObject;
}
// getObjectProperty is a placeholder for a reflection-based lookup of the named property
public bool PropertyIsDifferent(string propertyName) { return !Equals(getObjectProperty(oldObject, propertyName), getObjectProperty(newObject, propertyName)); }
public string PropertyDifferentText(string propertyName) { return PropertyIsDifferent(propertyName) ? propertyName + " changed from " + getObjectProperty(oldObject, propertyName) + " to " + getObjectProperty(newObject, propertyName) : ""; }
}
I would go for the second, as it makes it really easy to change the GUI as needs change. In the GUI I would try 'yellowing' the differences to make them easy to see, but that depends on how you want to show the differences.
Getting the object to compare would be loading your object with the initial revision and latest revision.
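For comparison, here is how the generic variant might look in a language with built-in inspection. This Python sketch uses dataclasses.fields to enumerate the properties; the Person class, its fields and the function names are made up for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    name: str
    city: str


def property_differences(old, new):
    """Return {field name: (old value, new value)} for every differing field."""
    return {
        f.name: (getattr(old, f.name), getattr(new, f.name))
        for f in fields(old)
        if getattr(old, f.name) != getattr(new, f.name)
    }


def difference_text(old, new):
    """One human-readable line per changed property, ready for display."""
    return [
        f"{name} changed from {before} to {after}"
        for name, (before, after) in property_differences(old, new).items()
    ]
```

The GUI can then highlight exactly the fields named in the result, without any per-class comparison code.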
My 2 cents... Not as techy as the database compare stuff already here.

Have you looked at Open Source DiffKit?
www.diffkit.org
I think it does what you want.

Example with Oracle.
Export ordered objects to text with dbms_metadata
Export ordered tables data into CSV or query format
Make big text file
Diff
