I have a DB schema where user data is stored using foreign key references; these foreign keys are admin-defined. Some data is also stored without primary keys, but I have other constraints in place to avoid redundancy and other issues.
Due to the requirements of the application, when a user 'updates' their info I have to delete all of that user's records from the 'updated' table and reinsert them. (I have looked into all other options.)
Now, because of my search solution (Solr), I need to track changes to the user data (updates/deletes). I am planning on having a view that compares the last committed data to the real-time data. I am worried about how sustainable it would be to have a stored procedure running every 20 minutes or so. Is there a better way of tracking data changes with SQL?
You could create a change table that contains the same columns as the original table plus another column called "UpdatedOn". Then set up a trigger that writes the original values to this change table whenever the original table is changed.
Example:
Original Table:
Name | Address | City
Jane Doe | 1 Main St. | New York
Change to Original Table:
Name | Address | City
Jane Doe | 2 Smith St. | Dubuque
...which triggers an insert to the Change Table:
Name | Address | City | UpdatedOn
Jane Doe | 1 Main St. | New York | 2012-01-01
There is information about using triggers in MySQL here: http://dev.mysql.com/doc/refman/5.0/en/triggers.html
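As a rough illustration of the change-table idea, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table names person and person_changes and the change_type column are assumptions made for this example, and MySQL's trigger syntax differs slightly (for instance it requires FOR EACH ROW).

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (name TEXT, address TEXT, city TEXT);
CREATE TABLE person_changes (
    name TEXT, address TEXT, city TEXT,
    change_type TEXT,                          -- assumed extra column: 'update' or 'delete'
    updated_on TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Copy the old values into the change table before they are overwritten.
CREATE TRIGGER person_before_update BEFORE UPDATE ON person
BEGIN
    INSERT INTO person_changes (name, address, city, change_type)
    VALUES (OLD.name, OLD.address, OLD.city, 'update');
END;

-- Deletes are recorded the same way.
CREATE TRIGGER person_before_delete BEFORE DELETE ON person
BEGIN
    INSERT INTO person_changes (name, address, city, change_type)
    VALUES (OLD.name, OLD.address, OLD.city, 'delete');
END;
""")

conn.execute("INSERT INTO person VALUES ('Jane Doe', '1 Main St.', 'New York')")
conn.execute(
    "UPDATE person SET address = '2 Smith St.', city = 'Dubuque' WHERE name = 'Jane Doe'"
)

# The change table now holds the values as they were before the update.
changes = conn.execute(
    "SELECT name, address, city, change_type FROM person_changes"
).fetchall()
```

A search indexer could then read person_changes since its last run instead of comparing full snapshots of the data.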
Related
I have a database that has to work with 2 countries, IT and RO.
I have a table called User, that contains also the birthplace.
User
| id | name | surname | birthplace |
| 1 | Test | Test | New York |
I also have 2 tables for the birthplace, one for the IT ones and one for the RO ones. I cannot store all the cities in one table because IT and RO have different hierarchies (region, province, district...). So my first thought was to have a birthplace field for each country, like this:
User
| id | name | surname | birthplaceIT | birthplaceRO |
The problem is that every time a country is added, I'd have to modify the database and the application. On the other hand, I cannot make a single "birthplace" table because the IT and RO addresses are not compatible.
So, I cannot do this:
Birthplace
| idUser | country | city |
Because I cannot make "city" reference both the IT cities table and the RO one.
Suggestions?
EDIT. In my PHP application I'm using Symfony with Doctrine, an ORM, so I NEED the foreign key constraint between the User and the CityID!
Instead of storing the birthplace in the User table, store a birthplace ID there. It can simply be an integer, but you could do something more sophisticated and use unique codes (your own, or perhaps there are "proper" geographical codes).
Then you can have a birthplace table for each country and join the tables on the birthplace ID. This way each country's geographical hierarchy stays in its own table. If you need to add another country, you simply create another table for that country and join it with User.
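One way to picture this is the sketch below, run against SQLite from Python. Everything in it is an assumption for illustration: the table names birthplace_it and birthplace_ro, their columns, and the country-prefixed codes such as 'IT-1'. Note that no foreign key constraint is declared, since a single column cannot reference two tables at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each country keeps its own hierarchy; the columns are hypothetical.
CREATE TABLE birthplace_it (code TEXT PRIMARY KEY, city TEXT, province TEXT, region TEXT);
CREATE TABLE birthplace_ro (code TEXT PRIMARY KEY, city TEXT, county TEXT);

-- The user row only stores a code that is unique across all country tables.
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, surname TEXT, birthplace_id TEXT);

INSERT INTO birthplace_it VALUES ('IT-1', 'Milano', 'Milano', 'Lombardia');
INSERT INTO birthplace_ro VALUES ('RO-1', 'Cluj-Napoca', 'Cluj');
INSERT INTO users VALUES (1, 'Test', 'One', 'IT-1'), (2, 'Test', 'Two', 'RO-1');
""")

# Join against every country table; only one of them matches a given code.
user_cities = conn.execute("""
    SELECT u.id, COALESCE(it.city, ro.city) AS city
    FROM users u
    LEFT JOIN birthplace_it it ON it.code = u.birthplace_id
    LEFT JOIN birthplace_ro ro ON ro.code = u.birthplace_id
    ORDER BY u.id
""").fetchall()
```

Adding a country in this sketch means creating one more table and one more LEFT JOIN, while the users table stays unchanged.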
In C, the compiler assigns "strings" numeric IDs (4-byte pointers) and only keeps one copy of each string: for char *a="Hello", *b="Hello";, only one copy of "Hello" is stored in memory. This is totally automatic and transparent to the user.
My question is whether MySQL can do the same, i.e, de-duplicate strings automatically and transparently to the user.
Ideally, I would expect it to be an internal storage mechanism of the database, so that (as in case of C) for the user the database would look and behave completely as if it contained actual strings, while in implementation it would only contain pointers.
In my database there are many repeating strings, like this:
`unit`, `building`, `office`, `firstName`, `lastName`
Chicago main production unit | headquarters | accounting | Jane | Smith
Chicago main production unit | office | sales | Jane | Dow
Miami administrative department | headquarters | sales | Mary | Smith
Miami administrative department | office | accounting | Mary | Dow
etc. where strings like 'Miami administrative department' or 'accounting' or 'Smith' are repeated many times in different records.
This increases the size of the database, so that I hit hosting limitations.
An obvious solution is data normalization: to maintain a separate table for names
`id`, `string`
1 | Chicago main production unit
2 | Miami administrative department
3 | headquarters
4 | accounting
5 | Jane
6 | Smith
7 | office
8 | sales
9 | Dow
and then have my table as
`unit_id`, `building_id`, `office_id`, `firstName_id`, `lastName_id`
1 | 3 | 4 | 5 | 6
1 | 7 | 8 | 5 | 9
and translate all strings on input and output. But of course this is very cumbersome.
My question is whether MySQL can do it automatically and transparently for the user: whenever I INSERT a row, it would automatically update the table of strings and only store the ids instead of strings in the table, and same for DELETE, WHERE, etc., so that to the user the table would look exactly the same as if it had strings, but occupy less space.
My question is whether MySQL can do the same.
Although you can certainly achieve the desired result (it is called data normalization) MySQL does not do it implicitly.
Can MySQL do it automatically and transparently for the user?
No, MySQL cannot do it automatically for you - you have to do it yourself. You need to be explicit about it in your queries and DDL statements.
Here is a short demo to show how you can create a lookup table, and then use it in your inserts and selects:
create table lookup(id int, name varchar(10));
create table data(id int, id_lookup int);
insert into lookup(id,name) values (1,'quick');
insert into lookup(id,name) values (2,'brown');
insert into lookup(id,name) values (3,'fox');
insert into data (id, id_lookup)
values (110, (select id from lookup where name = 'quick'));
insert into data (id, id_lookup)
values (120, (select id from lookup where name = 'brown'));
insert into data (id, id_lookup)
values (130, (select id from lookup where name = 'quick'));
insert into data (id, id_lookup)
values (140, (select id from lookup where name = 'fox'));
Now data has these rows:
110 1
120 2
130 1
140 3
To select the name, you need to join to your lookup table:
select d.id, t.name
from data d
join lookup t on t.id=d.id_lookup;
Demo on sqlfiddle.
Note: it is uncommon to create a single lookup table for all your strings. More commonly you would create a separate lookup table for each kind of string (i.e. unit_lookup, building_lookup, and so on) or partition your lookup table with a special lookup code column:
id code name
-- ---- ----
1 unit Chicago
2 unit Miami
3 bldg Headquarters
4 bldg Office
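To reduce the bookkeeping on insert, the string-to-id translation can be wrapped in a small helper. The following is only a sketch, using SQLite from Python; the staff table and the lookup_id and add_staff helpers are hypothetical names, and INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup (
    id INTEGER PRIMARY KEY,
    code TEXT NOT NULL,          -- kind of string: 'unit', 'bldg', ...
    name TEXT NOT NULL,
    UNIQUE (code, name)          -- each string is stored once per kind
);
CREATE TABLE staff (
    unit_id INTEGER REFERENCES lookup(id),
    building_id INTEGER REFERENCES lookup(id)
);
""")

def lookup_id(code, name):
    """Return the id for (code, name), inserting the row on first use."""
    conn.execute("INSERT OR IGNORE INTO lookup (code, name) VALUES (?, ?)", (code, name))
    row = conn.execute(
        "SELECT id FROM lookup WHERE code = ? AND name = ?", (code, name)
    ).fetchone()
    return row[0]

def add_staff(unit, building):
    """Store a staff row using ids instead of the repeated strings."""
    conn.execute(
        "INSERT INTO staff VALUES (?, ?)",
        (lookup_id("unit", unit), lookup_id("bldg", building)),
    )

add_staff("Chicago", "Headquarters")
add_staff("Chicago", "Office")
add_staff("Miami", "Headquarters")

# Reading the strings back requires one join per looked-up column.
rows = conn.execute("""
    SELECT u.name, b.name
    FROM staff s
    JOIN lookup u ON u.id = s.unit_id
    JOIN lookup b ON b.id = s.building_id
    ORDER BY s.rowid
""").fetchall()
lookup_count = conn.execute("SELECT COUNT(*) FROM lookup").fetchone()[0]
```

Three staff rows produce only four lookup rows here, because 'Chicago' and 'Headquarters' are each stored once.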
I have following table structure:
Users
| id | uname_UK |
| 1 | foobar |
| 2 | bazqui |
| 3 | foobaz |
Data (user_id_FK references Users.id)
| id | user_id_FK | data |
| 1 | 1 | aa |
| 2 | 3 | bb |
| 3 | 2 | cc |
| 4 | 2 | dd |
The problem is that a typo was made when the data was stored. The user named foobaz should be named foobar. The uname column has a unique constraint.
My question is how to fix this easily. When I update the username, I get a duplicate unique key error, as expected. In the end I would like the foreign keys to be updated too.
My idea was to do some trigger magic, but I was hoping for a more elegant solution. Another constraint is that the update is initiated through the frontend, so I cannot use PHP.
An alternative would be to drop the unique constraint and set up a cron job that periodically updates the database and removes the duplicate entries.
Thanks.
Why not just delete the record? Update all data to the user you want to keep and delete the obsolete user.
In Oracle you can do this using the MERGE INTO statement. I don't know if that is possible in a single statement in MySQL, but you might as well execute a separate delete for it. You could do it with trigger magic, but I doubt it's a good decision to always automagically merge the users. The new username might be a typo too.
So in a normal application, if this happened often, I would build a 'merge users' feature that lets you do just this.
What you should do is figure out what it means for your data that two users are actually one. In this case, since there are two records in Data for user ID 2, it seems to be fine for a user to have several records in Data, and you can just run:
UPDATE data
SET user_id_FK = 1
WHERE user_id_FK = 3;
DELETE FROM users
WHERE id = 3;
In general, you need to figure this out at an application level.
What if there's a foo counter for each user? You should probably add the value from the user you'll be deleting to the value you're keeping.
What if a user has an address?
What if a user can only have one e-mail address and your duplicate user has a different one? Which do you keep?
This is not an easy question with a general answer.
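For the simple case in the question, the UPDATE and DELETE above could be wrapped in one transaction so that a failure leaves both tables untouched. This is a minimal sketch using SQLite from Python; the merge_users helper is a hypothetical name, and the column names are shortened from the question's diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, uname TEXT UNIQUE);
CREATE TABLE data (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    data TEXT
);
INSERT INTO users VALUES (1, 'foobar'), (2, 'bazqui'), (3, 'foobaz');
INSERT INTO data VALUES (1, 1, 'aa'), (2, 3, 'bb'), (3, 2, 'cc'), (4, 2, 'dd');
""")

def merge_users(keep_id, obsolete_id):
    """Re-point the obsolete user's data rows, then delete that user."""
    # "with conn" commits on success and rolls back if either statement fails.
    with conn:
        conn.execute(
            "UPDATE data SET user_id = ? WHERE user_id = ?", (keep_id, obsolete_id)
        )
        conn.execute("DELETE FROM users WHERE id = ?", (obsolete_id,))

merge_users(keep_id=1, obsolete_id=3)

users_after = conn.execute("SELECT id, uname FROM users ORDER BY id").fetchall()
data_after = conn.execute("SELECT id, user_id FROM data ORDER BY id").fetchall()
```

Any per-user values that need combining rather than re-pointing, such as the counter mentioned above, would need their own step inside the same transaction.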
I have a scenario where I need to insert data into a table temporarily and later, on approval or confirmation, make it permanent. The data will be inserted by a user, and the approval or denial is done by a Super User.
What I have in mind now is two different but identical tables (temporary and main), with the user inserting the data into the temp table. After confirmation by the Super User, the data will be moved to the main table. The problem is that when a database contains a very large number of tables, this process becomes more complex.
EDIT: This applies to CREATE, EDIT and DELETE commands.
Is there any simpler or better approach of doing this?
Please suggest.
Using a version table (related to comment):
The idea here is to have a version table; when your user changes a piece of information the new version is stored in this table along with the related ID.
Then all you need to do is join on the PersonID and select the most recent accepted version.
This means the user can make as many updates as they want, but they won't show until the super user accepts them. It also means the data is never destroyed (it is kept in the version table), and there is no need to implement rollback because it is already there!
See: http://sqlfiddle.com/#!3/cc77f/4
People Table:
ID | Age Etc... (Info That Doesn't Change)
-----------------------
1 | 12
2 | 16
3 | 11
People Version Table:
VersionID | PersonID | Name | Approved
-----------------------
1 | 1 | Stevz | FALSE
2 | 1 | Steve | TRUE
3 | 2 | James | TRUE
4 | 3 | Jghn | FALSE
5 | 3 | John | TRUE
Example table SQL
CREATE TABLE People
(
id int identity primary key,
age int
);
CREATE TABLE PeopleVersion
(
versionId int identity primary key,
peopleId int,
name varchar(30),
approved varchar(30)
);
Example Query
SELECT * FROM People p
INNER JOIN PeopleVersion v ON p.id = v.peopleID
WHERE v.approved = 'TRUE'
ORDER BY versionId DESC
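The query above returns every approved version. To get only the most recent approved version per person, one option is a correlated subquery, sketched here with SQLite from Python. It assumes that versionId grows over time, and the row with versionId 6 is an extra, made-up pending edit added to show that unapproved changes stay hidden.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE PeopleVersion (
    versionId INTEGER PRIMARY KEY,
    peopleId INTEGER,
    name TEXT,
    approved TEXT
);
INSERT INTO People VALUES (1, 12), (2, 16), (3, 11);
INSERT INTO PeopleVersion VALUES
    (1, 1, 'Stevz', 'FALSE'),
    (2, 1, 'Steve', 'TRUE'),
    (3, 2, 'James', 'TRUE'),
    (4, 3, 'Jghn',  'FALSE'),
    (5, 3, 'John',  'TRUE'),
    (6, 3, 'Johnny', 'FALSE');   -- hypothetical pending edit, not yet approved
""")

# For each person, keep only the approved version with the highest versionId.
latest_approved = conn.execute("""
    SELECT p.id, p.age, v.name
    FROM People p
    JOIN PeopleVersion v ON v.peopleId = p.id
    WHERE v.versionId = (
        SELECT MAX(v2.versionId)
        FROM PeopleVersion v2
        WHERE v2.peopleId = p.id AND v2.approved = 'TRUE'
    )
    ORDER BY p.id
""").fetchall()
```

Person 3 still appears as 'John' because the later 'Johnny' row has not been approved.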
A further insight:
You could even have three states for Approved: NULL meaning no admin has decided yet, TRUE meaning it was accepted, and FALSE meaning it was rejected.
You could show the user the most recent version that is NULL or TRUE, show the admin all three, and show other users of the site only versions that are TRUE.
Old Comments
Could you just add a field called approved to the table and then hide anything without the approval flag set to TRUE?
It could default to FALSE and only the super user would be able to see items with the flag set to FALSE
E.g.
Name | Age | Approved
-----------------------
Steve | 12 | FALSE
James | 16 | TRUE
John | 11 | FALSE
The user would only see James, but the SuperUser would see all three listed
Alternatively, using your temporary and main tables is the other way of looking at this problem, though this may lead to problems as everything gets larger.
The easiest approach is a flag within the table marking an entry either approved or not-yet approved.
Then just change the retrieving logic to only show entries where that flag is set to approved.
I have two entities: Event and Location. The relations are:
1 Event can have 1 Location.
1 Location can have many Events.
Basically I want to store events. And each event is hosted in a specific location. When I say specific location I mean:
Street, Number, City, Zip Code, State, Country
I basically have a design question, that I would like some help with:
1 - Right now I am thinking on doing the following:
Event table will have a location_id that will point to a specific location row in the locations table. What happens with this is that:
With this design I will have many repeated values across rows. For example, if one event takes place at 356 Matilda Street in San Francisco and another at 890 Matilda Street in San Francisco, the values Matilda Street and San Francisco will be duplicated many times in the locations table. How can I redesign this to normalize it?
So, basically I would love to hear a good approach to solve this question in terms of a relational database, like MySQL.
If you want a strictly normalized database, you could have a table for street names, another for cities, another for states, and so on. You might even have an additional location table that holds unique combinations of street, city, and state; you'd add rows to this table each time an event occurs at a previously unknown location. Then each of your events would reference the appropriate row in the location table.
In practice, though, it's sometimes better simply to store the location data directly within the events table and tolerate the extra memory usage; there's always a trade-off between speed and memory use.
Another consideration: what happens if a street is renamed? Do you want old events to be associated with the old name or the new name?
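The location table of unique combinations mentioned in this answer might look like the sketch below, written with SQLite from Python. It keeps street and city as plain columns rather than splitting them into further tables, and the add_event helper is a hypothetical name. INSERT OR IGNORE is SQLite syntax; MySQL uses INSERT IGNORE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE locations (
    location_id INTEGER PRIMARY KEY,
    street TEXT NOT NULL,
    number TEXT NOT NULL,
    city TEXT NOT NULL,
    UNIQUE (street, number, city)   -- one row per distinct location
);
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL REFERENCES locations(location_id),
    description TEXT
);
""")

def add_event(description, street, number, city):
    """Insert an event, creating its location row only if it is not known yet."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO locations (street, number, city) VALUES (?, ?, ?)",
            (street, number, city),
        )
        (location_id,) = conn.execute(
            "SELECT location_id FROM locations WHERE street = ? AND number = ? AND city = ?",
            (street, number, city),
        ).fetchone()
        conn.execute(
            "INSERT INTO events (location_id, description) VALUES (?, ?)",
            (location_id, description),
        )
    return location_id

first = add_event("Birthday party", "Matilda Street", "356", "San Francisco")
second = add_event("Hangover party", "Matilda Street", "356", "San Francisco")
third = add_event("Funeral", "Matilda Street", "890", "San Francisco")
location_count = conn.execute("SELECT COUNT(*) FROM locations").fetchone()[0]
```

The two parties share one location row, while the event at number 890 gets its own.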
Each location in the locations table should be uniquely identifiable by its PRIMARY KEY. Records in the events table then reference their associated location with column(s) that contain the value of that PRIMARY KEY.
For example, your locations table might contain:
location_id | Street | Number | City | Zip Code | State | Country
------------+----------------+--------+---------------+----------+-------+---------
1 | Matilda Street | 356 | San Francisco | 12345 | CA | USA
2 | Matilda Street | 890 | San Francisco | 12345 | CA | USA
Then your events table might contain:
event_id | location_id | Date | Description
---------+-------------+------------+----------------
1 | 1 | 2012-04-28 | Birthday party
2 | 1 | 2012-04-29 | Hangover party
3 | 2 | 2012-04-29 | Funeral
4 | 1 | 2012-05-01 | May day!
In this example, location_id is the PRIMARY KEY in the locations table and a FOREIGN KEY in the events table.
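Declaring that relationship as a constraint lets the database reject events that point at a location that does not exist. Here is a small sketch using SQLite from Python; the event with location_id 99 is a made-up row for the demonstration, and in MySQL the same check requires a storage engine that enforces foreign keys, such as InnoDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE locations (
    location_id INTEGER PRIMARY KEY,
    street TEXT, number TEXT, city TEXT
);
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL,
    date TEXT,
    description TEXT,
    FOREIGN KEY (location_id) REFERENCES locations(location_id)
);
INSERT INTO locations VALUES
    (1, 'Matilda Street', '356', 'San Francisco'),
    (2, 'Matilda Street', '890', 'San Francisco');
INSERT INTO events VALUES
    (1, 1, '2012-04-28', 'Birthday party'),
    (3, 2, '2012-04-29', 'Funeral');
""")

# An event that references a missing location is refused by the constraint.
try:
    conn.execute("INSERT INTO events VALUES (5, 99, '2012-05-02', 'Nowhere party')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

event_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```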
Your Locations table should have a unique ID for each location, such as 1 = Matilda Street, 2 = Market Street: one record for each possible location, with no duplicates. Your Events table should then have a location ID column that uses one of those IDs, again with one record per event and no duplicates.
You can then join them like this:
SELECT events.event_name, locations.location_name
FROM events
JOIN locations on locations.location_id = events.location_id
The duplication is perfectly normal, because each row is a unique location. Beyond that, the design you have in mind is very practical when you want to filter for places in San Francisco.