Database not having unique IDs - MySQL

I have a database containing attendance data on a monthly basis. Now I want to display that data in a series of text boxes. My problem is that the table does not contain any unique IDs, which makes the task difficult. Have a look at the attachment to get a picture of my problem.
http://s26.postimg.org/p8v0zhemx/image.png
Thank you so much in advance.
EDIT:
For future researchers using a ListView, this is the query I ended up with for MySQL.
If your table does not have a unique ID, you have to build a composite key from several columns. Google it.
The query I managed to come up with:
"SELECT empno, line1, time1, line2, time2, line3, time3, line4, time4, line5, time5, line6, time6 FROM attendancelist WHERE empno = '" & ListPayroll.SelectedItems(0).Text & "' AND line1 = '" & ListPayroll.SelectedItems(0).SubItems(1).Text & "'"

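A note on the query in the EDIT above: building the SQL by concatenating ListView text means a quote character in the data can break or alter the statement. Here is a minimal sketch of the same composite-key lookup with bound parameters. It is written in Python against an in-memory SQLite table purely as a stand-in (the original uses VB.NET and MySQL), and the trimmed column list and the sample values are made up for illustration.

```python
import sqlite3

def fetch_attendance_row(conn, empno, line1):
    """Fetch the row identified by the composite key (empno, line1)."""
    sql = (
        "SELECT empno, line1, time1, line2, time2 "
        "FROM attendancelist WHERE empno = ? AND line1 = ?"
    )
    # The values are bound separately from the SQL text, so quotes in the
    # input cannot change the statement.
    return conn.execute(sql, (empno, line1)).fetchone()

# Hypothetical sample data; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendancelist "
    "(empno TEXT, line1 TEXT, time1 TEXT, line2 TEXT, time2 TEXT)"
)
conn.execute(
    "INSERT INTO attendancelist VALUES ('00117', '1', '08:15', '2', '09:00')"
)
print(fetch_attendance_row(conn, "00117", "1"))
```

The MySQL connector for .NET has its own parameter mechanism; the point of the sketch is only that the key values are passed as parameters rather than spliced into the string.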
It looks to me like your sample data table contains tons of attendance data that basically look like this:
employee workdate starttime endtime
00117 2014-02-03 08:15 17:30
00117 2014-02-04 09:00 17:30
00117 2014-02-05 null null
etc.
If the employee was absent on the given day, that's indicated by null values in starttime and endtime. If the employee was not employed at all on the particular date, you'd simply leave the row out of the table entirely. I think that's what the first five days of employee 00001 in your sample data's first row mean -- not present, not absent.
Your raw data is arranged in a pretty doggone inconvenient report layout that puts a week's worth of days on each row. You can probably write a simple dotnet program to slurp up your six-day-week input table and insert six rows (or fewer) of this data for each row of that table.
Once you've loaded your data from that input table, you can switch over to maintaining it in your new table. That will be much easier for you to handle in a program. You will also be able to write a query program that will recreate your six-day-row report, if that's what your users prefer.
Arranged the way I have shown it, you'll get a nice little attendance table. If you know ahead of time you'll have at most one record per day per employee, you can use the columns I've shown, and use a composite primary key consisting of (employee, workdate).
If you might have more than one row per employee per date, you'll need to add an id column, which can be an autoincrementing surrogate primary key.
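As a rough sketch of the one-row-per-day table with the composite primary key, using the column names from the sample above: the code below uses Python with an in-memory SQLite database as a stand-in for MySQL, and the helper name add_day is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attendance (
        employee  TEXT NOT NULL,
        workdate  TEXT NOT NULL,  -- ISO date, e.g. 2014-02-03
        starttime TEXT,           -- NULL start/end means absent that day
        endtime   TEXT,
        PRIMARY KEY (employee, workdate)
    )
""")

def add_day(conn, employee, workdate, starttime=None, endtime=None):
    """Insert one attendance row; return False if (employee, workdate) exists."""
    try:
        conn.execute(
            "INSERT INTO attendance VALUES (?, ?, ?, ?)",
            (employee, workdate, starttime, endtime),
        )
    except sqlite3.IntegrityError:
        # The composite primary key rejected a second row for the same day.
        return False
    return True

add_day(conn, "00117", "2014-02-03", "08:15", "17:30")
add_day(conn, "00117", "2014-02-04", "09:00", "17:30")
add_day(conn, "00117", "2014-02-05")         # absent
print(add_day(conn, "00117", "2014-02-03"))  # same key again
```

With this key, an update or delete can address exactly one row with WHERE employee = ? AND workdate = ?.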

If all you need is an arbitrary unique identifier for update and delete (as indicated in the comments), then add one:
ALTER TABLE my_table
ADD COLUMN id INT AUTO_INCREMENT PRIMARY KEY;
That is, of course, assuming you have that ability or can convince someone who does. It is a remarkably minor change. If column names are specified in the existing INSERT queries, it won't require a change to them. Someone ought to be willing to do it.
If you have the freedom to modify the schema, please consider revising it. This is very, very poorly designed (having repetitive columns and columns containing multiple pieces of information). If you cannot modify this one, creating a new, better designed schema and importing data from this schema may be another option you want to consider. (Creating a "new schema" could also be accomplished using a set of separate tables.)
Also be aware that with the current structure, you will need extremely heavy validation on the code side to prevent users from saving invalid data when they modify it.

Related

MySQL history table design and query

TL;DR: Is this design correct and how should I query it?
Let's say we have history tables for city and address designed like this:
CREATE TABLE city_history (
id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
name VARCHAR(128) NOT NULL,
history_at DATETIME NOT NULL,
obj_id INT UNSIGNED NOT NULL
);
CREATE TABLE address_history (
id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
city_id INT NULL,
building_no VARCHAR(10) NULL,
history_at DATETIME NOT NULL,
obj_id INT UNSIGNED NOT NULL
);
Original tables are pretty much the same except for history_id and obj_id (city: id, name; address: id, city_id, building_no). There's also a foreign key relation between city and address (city_id).
History tables are populated on every change of the original entry (create, update, delete) with the exact state of the entry at given time.
obj_id holds id of original object - no foreign key, because original entry can be deleted and history entries can't. history_at is the time of creation of history entry.
History entries are created for every table independently - change in city name creates city_history entry but does not create address_history entry.
So to see what was the state of the whole address with city (e.g. on printed documents) at any T1 point in time, we take from both history tables most recent entries for given obj_id created before T1, right?
With this design, in theory we should be able to see the state of a single address with its city at any given point in time. Could anyone help me create such a query for a given address id and time? Please note that there could be multiple records with the exact same timestamp.
There is also a need to create a report for showing every change of state of given address in given time period with entries like "city_name, building_no, changed_at". Is it something that can be created with SQL query? Performance doesn't matter here so much, such reports won't be generated so often.
The above report will probably be needed in an interactive version where user can filter results e.g. by city name or building number. Is it still possible to do in SQL?
In reality address table and address_history table have 4 more foreign keys that should be joined in report (street, zip code, etc.). Wouldn't the query be like ten pages long to provide all the needed functionality?
I've tried to build some queries, play with greatest-n-per-group, but I don't think I'm getting anywhere with this. Is this design really OK for my use cases (if so, can you please provide some queries for me to play with to get where I want?)? Or should I rethink the whole design?
Any help appreciated.
(My answer copied from here, since that question never marked an answer as accepted.)
My normal "pattern" in (very)pseudo code:
Table A: a_id (PK), a_stuff
Table A_history: a_history_id (PK), a_id(FK referencing A.a_id), valid_from, valid_to, a_stuff
Triggers on A:
On insert: insert values into A_history with valid_from = now, and valid_to = null.
On update: set valid_to = now for last history record of a_id; and do the same insert from the "on insert" trigger with the updated values for the row.
On delete: set valid_to = now for last history record of a_id.
In this scenario, you'd query history with "x >= from and x < to" (not BETWEEN, as one record's "to" value should match the next record's "from" value). A null "to" marks the current record and has to be treated as open-ended.
Additionally, this pattern also makes "change log" reports easier.
Without a table dedicated to change logging, the relevant records can be found just by SELECT * FROM A_history WHERE valid_from BETWEEN [reporting interval] OR valid_to BETWEEN [reporting interval].
If there is a central change log table, the triggers can just be modified to include log entry inserts as well. (Unless log entries include "meta" data such as reason for change, who changed, etc... obviously).
Note: This pattern can be implemented without triggers. Using a stored procedure, or even just multiple queries in code, can actually negate the need for the non-history table.
The history table's "a_id" would need to be replaced with whatever uniquely identifies the record normally though; it could still be an id value, but these values would need to be synthesized when inserting, and known when updating/deleting.
Queries:
(if not new) UPDATE the most recent entry's valid_to.
(if not deleting) INSERT new entry
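The trigger-free variant above can be sketched roughly as follows. This is only an illustration under assumptions: Python with in-memory SQLite instead of MySQL, timestamps passed in as ISO strings so the example is repeatable, and the helper names (close_current, save, delete, state_at) invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE a_history (
        a_history_id INTEGER PRIMARY KEY,
        a_id         INTEGER NOT NULL,
        valid_from   TEXT NOT NULL,
        valid_to     TEXT,          -- NULL marks the current version
        a_stuff      TEXT
    )
""")

def close_current(conn, a_id, now):
    """(if not new) set valid_to on the most recent entry."""
    conn.execute(
        "UPDATE a_history SET valid_to = ? WHERE a_id = ? AND valid_to IS NULL",
        (now, a_id),
    )

def save(conn, a_id, a_stuff, now):
    """Insert or update: close the open entry, then insert the new one."""
    close_current(conn, a_id, now)
    conn.execute(
        "INSERT INTO a_history (a_id, valid_from, valid_to, a_stuff) "
        "VALUES (?, ?, NULL, ?)",
        (a_id, now, a_stuff),
    )

def delete(conn, a_id, now):
    """Delete: only close the open entry, insert nothing."""
    close_current(conn, a_id, now)

def state_at(conn, a_id, when):
    """Return a_stuff as of `when`, using valid_from <= when < valid_to."""
    row = conn.execute(
        "SELECT a_stuff FROM a_history "
        "WHERE a_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR ? < valid_to)",
        (a_id, when, when),
    ).fetchone()
    return row[0] if row else None

save(conn, 1, "first", "2020-01-01 00:00:00")
save(conn, 1, "second", "2020-02-01 00:00:00")
delete(conn, 1, "2020-03-01 00:00:00")
print(state_at(conn, 1, "2020-01-15 00:00:00"))
```

Because the interval is half-open, a lookup at exactly the moment of a change returns the new version only, never both.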
This is a very "traditional" problem when it comes to versioning (or monitoring) changes to a certain row.
There are various "solutions", each with its own drawbacks and advantages.
The following "statements" are the result of my experience; they are neither perfect, nor do I claim they are the "only ones"!
1.) Creating a "history table": That's the worst idea of all. You would always need to take into account which table you need to query, depending on the DATA that should be queried. That's a "chicken-and-egg" problem...
2.) Using ONE table with ONE (increasing) "revision" number: That's a better approach, but it will get "hard" to query: determining the "most recent row" per "id" is very costly no matter which approach is used.
My personal experience is that following the pattern of a "doubly linked list" is the best way to solve this when it comes down to millions of records:
3.) Maintain two columns among every entity, let's say prev_version_id and next_version_id. prev_version_id points to NULL, if there is no previous version. next_version_id points to NULL if there is no later version.
This approach would require you to ALWAYS perform two actions upon an update:
Create the new row
Update the old row's reference (next_version_id) to the just inserted row.
However, when your database has grown to something like 100 million rows, you will be very happy that you have chosen this path:
Querying the "Oldest" Version is as simple as querying where ISNULL(prev_version_id) and entity_id = 5
Querying the "Latest" Version is as simple as querying where ISNULL(next_version_id) and entity_id = 5
Getting a full version history will just target the entity_id=5 of the data-table, sortable by either prev_version_id or next_version_id.
The very often neglected fact: The first two queries will also work to get a list of ALL first versions or of ALL recent versions of an entity - in about NO TIME! (Don't underestimate how "costly" it can be to determine the most recent version of an entity otherwise! Believe me, when "testing" everything seems equally fine, but the real struggle starts when live data with millions of records is used.)
cheers,
dognose
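For what it's worth, the doubly linked version scheme from the answer above could look roughly like this. It is a sketch only: Python with in-memory SQLite rather than MySQL, and the table name entity_version, the payload column and the helper functions are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity_version (
        version_id      INTEGER PRIMARY KEY,
        entity_id       INTEGER NOT NULL,
        payload         TEXT,
        prev_version_id INTEGER,  -- NULL: there is no earlier version
        next_version_id INTEGER   -- NULL: this is the latest version
    )
""")

def add_version(conn, entity_id, payload):
    """Create the new row, then point the old latest row at it."""
    with conn:  # both statements commit or roll back together
        latest = conn.execute(
            "SELECT version_id FROM entity_version "
            "WHERE entity_id = ? AND next_version_id IS NULL",
            (entity_id,),
        ).fetchone()
        prev_id = latest[0] if latest else None
        new_id = conn.execute(
            "INSERT INTO entity_version (entity_id, payload, prev_version_id) "
            "VALUES (?, ?, ?)",
            (entity_id, payload, prev_id),
        ).lastrowid
        if prev_id is not None:
            conn.execute(
                "UPDATE entity_version SET next_version_id = ? "
                "WHERE version_id = ?",
                (new_id, prev_id),
            )
    return new_id

def oldest_payload(conn, entity_id):
    row = conn.execute(
        "SELECT payload FROM entity_version "
        "WHERE entity_id = ? AND prev_version_id IS NULL",
        (entity_id,),
    ).fetchone()
    return row[0] if row else None

def latest_payload(conn, entity_id):
    row = conn.execute(
        "SELECT payload FROM entity_version "
        "WHERE entity_id = ? AND next_version_id IS NULL",
        (entity_id,),
    ).fetchone()
    return row[0] if row else None

def all_latest(conn):
    """Latest version of every entity, with no per-group max() needed."""
    return conn.execute(
        "SELECT entity_id, payload FROM entity_version "
        "WHERE next_version_id IS NULL ORDER BY entity_id"
    ).fetchall()

for text in ("v1", "v2", "v3"):
    add_version(conn, 5, text)
print(oldest_payload(conn, 5), latest_payload(conn, 5))
```

The two writes per update belong in one transaction; otherwise a failure between them would leave two rows that both look like the latest version.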

Need Help in Creating History Table in Access

In MS Access, I need to create a history table from a select query that is used for reporting. I don't want an append table, as I need the select's data for reporting.
The answer that works best is you do want to use an append query.
Instead of garbaging up your database with lots of history tables, it's better to have one history table with a unique key to differentiate the multiple history reports.
Usually a "Time Stamp" field is a good primary key, where each record in the report gets the same time stamp.
Also, you can have other key fields depending on the type of report it is. You may want a version field, or a re-try field. You also may want a final copy field. Having these fields will allow you to go back and delete garbage reports or updated reports or bad attempt reports.
Also having solid date fields will allow you to discriminate between daily reports, weekly reports, or monthly reports. (Let alone if you have to worry about Fiscal year, or retail calendars, etc.)
The good thing about having a single table is, you can always go back into your history table and pull out lots of historical data for other types of report comparisons... all in one query instead of trying to tie multiple tables together (mostly with hard to figure out names).
Do yourself and future programmers who will have to deal with your code a favor... and put all the history in one table. Especially since one of those future programmers may be you. You'll be thanking yourself.
Oh... and to get the data for the reporting, you use your primary key to pull out that data. Or... you can have a staging table for your report and then you append the staging table data to the History table (with all the proper key info).
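To make the single-history-table idea a bit more concrete, here is a small sketch. It uses Python with in-memory SQLite rather than Access, and the sales source table, the report_history table and the per-region total are invented purely for illustration; in Access the INSERT ... SELECT would be the append query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 10), ('south', 20);

    -- One history table for all report runs; run_at tells the runs apart.
    CREATE TABLE report_history (
        run_at TEXT NOT NULL,
        region TEXT NOT NULL,
        total  INTEGER,
        PRIMARY KEY (run_at, region)
    );
""")

def snapshot_report(conn, run_at):
    """Append the current report rows, all stamped with the same run_at."""
    conn.execute(
        "INSERT INTO report_history (run_at, region, total) "
        "SELECT ?, region, SUM(amount) FROM sales GROUP BY region",
        (run_at,),
    )

def report_for(conn, run_at):
    """Pull one historical report back out by its time stamp."""
    return conn.execute(
        "SELECT region, total FROM report_history "
        "WHERE run_at = ? ORDER BY region",
        (run_at,),
    ).fetchall()

snapshot_report(conn, "2024-01-01 09:00:00")
print(report_for(conn, "2024-01-01 09:00:00"))
```

In this sketch the time stamp alone is not unique per row, so the key is the time stamp plus whatever identifies a row within one report.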

Database design: Managing old and new data in database table

I have a table Student with fields as follows:
Student table (one record per student)
student_id
Name
Parent_Name
Address_line1, Address_line2, Address_line3
Photo_path
Signature_file_path
Preferred_examcity_choice1, Preferred_examcity_choice2, Preferred_examcity_choice3
Gender
Nationality
.
.
.
I insert into this table when the registration form is completed through the web interface.
There is one more module in the web interface for updating student data. On every update request I update the student table record and insert a new entry in student_data_change_request. A student can change records any number of times.
student_data_change_request
request_id(auto_incr PK)
old_name
new_name
old_photo_path
new_photo_path
old_signature_file_path
new_signature_file_path
Now to the problem: earlier, students were allowed to change very few fields. Now the client wants to allow the candidate to update more fields (around 20), and adding old and new columns for each of them isn't elegant or preferable (I guess); I would end up creating 40 columns to keep track of 20 columns. So how should I redesign my table? Suggestions are welcome.
One approach is to have a shadow table named (table)_xx that has the same columns, the time, date, update/insert/delete flag, user or whatever and no referential integrity. Set a trigger to update that table from the source whenever anything happens.
If you've got genuine business requirements that need history then do those properly but this pattern is great as a general audit, debugging and forensic tool.
It's also really easy to automate/script as you just generate it from the DB metadata.
Usually a historical table looks like:
request_id
column_name
old_value
new_value
dt
request_id and column_name form the primary key. When you update the student table, you insert a new entry in student_data_change_request for each updated column.
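A minimal sketch of how the application side might produce those per-column entries, assuming the old and new student records are available as dictionaries. The function name change_rows and the sample values are made up; each returned tuple maps to one row of (request_id, column_name, old_value, new_value, dt).

```python
def change_rows(request_id, old, new, dt):
    """Build one change-log row per column whose value actually changed."""
    return [
        (request_id, column, old.get(column), value, dt)
        for column, value in new.items()
        if old.get(column) != value
    ]

# Hypothetical before/after state of one student record.
old = {"Name": "A. Kumar", "Photo_path": "/p/1.jpg", "Gender": "M"}
new = {"Name": "Arun Kumar", "Photo_path": "/p/1.jpg", "Gender": "M"}

for row in change_rows(42, old, new, "2024-05-01 10:00:00"):
    print(row)
```

Allowing 20 editable fields instead of 3 then needs no schema change, only more possible column_name values.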
Edited:
Another way:
request_id
value_type
name
photo_path
signature_file_path
...
and insert a first entry with the old values and a second entry with the new values. The column value_type marks the row as old or new.
I would rather have just one table, with an additional column for effective date. Then a view that picks up just the most recent row for each student_id becomes your first "table". If for some reason you must show "current" and "most recently changed" values side-by-side, that is another view.
As usual, it all depends on how you intend to use the data.
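A minimal sketch of the single-table idea above, with an effective date column and a view for the current rows. Assumptions: Python with in-memory SQLite instead of the poster's database, only two of the student columns, and invented names (student_version, student_current, name_on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_version (
        student_id   INTEGER NOT NULL,
        effective_at TEXT NOT NULL,   -- ISO date the values take effect
        name         TEXT,
        photo_path   TEXT,
        PRIMARY KEY (student_id, effective_at)
    );

    -- The "current" table is just the newest row per student.
    CREATE VIEW student_current AS
    SELECT v.student_id, v.effective_at, v.name, v.photo_path
    FROM student_version AS v
    WHERE v.effective_at = (
        SELECT MAX(i.effective_at)
        FROM student_version AS i
        WHERE i.student_id = v.student_id
    );
""")

conn.executemany(
    "INSERT INTO student_version VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01", "Ann", "a1.png"),
        (1, "2024-03-01", "Ann B", "a2.png"),
        (2, "2024-02-01", "Bob", "b1.png"),
    ],
)

def name_on(conn, student_id, day):
    """What was this student's name on the given day?"""
    row = conn.execute(
        "SELECT name FROM student_version "
        "WHERE student_id = ? AND effective_at <= ? "
        "ORDER BY effective_at DESC LIMIT 1",
        (student_id, day),
    ).fetchone()
    return row[0] if row else None

print(conn.execute(
    "SELECT student_id, name FROM student_current ORDER BY student_id"
).fetchall())
print(name_on(conn, 1, "2024-02-15"))
```

Every update becomes an insert of a new row, so no old/new column pairs are needed at all.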
My strong preference in these cases is the solution @mathguy suggests - embedding the concept of time in the main table design. This allows you to ask the question "what was this student's address on 1 Jan?", or "who had signature x on 12 Feb?".
If you have to report or execute business logic that reflects the status at any point in time, this design works really well. For instance, if you have to report on how many students lived in a particular address for a given term, you want to know when the records were valid.
But not all applications care about "time" - sometimes, you just want to have an audit table, so you can trace what happened over time in case of anomalies.
In that case, @loztinspace's solution is useful - but in my experience, this rapidly escalates into more work, because those who want to inspect the audit records cannot or should not get access to a SQL prompt on your production environment.

Late arriving fact - best way to deal with it

I have a star schema that tracks Roles in a company, e.g. what dept the role is under, the employee assigned to the role, when they started, when/if they finished up and left.
I have two time dimensions, StartedDate & EndDate. While a role is active, the end date is null in the source system. In the star schema I set any null end dates to 31/12/2099, which is a dimension member I added manually.
I'm working out the best way to update the EndDate when a role finishes or an employee leaves.
Right now I'm:
Populating the fact table as normal, doing lookups on all dimensions.
I then do a lookup against the fact table to find duplicates, not including the EndDate in this lookup. Non-matched rows are new and so are inserted into the fact table.
Matching rows then go into a conditional split to check whether the current EndDate differs from the new EndDate. If it does, they are inserted into an updateStaging table and a proc is run to update the fact table.
Is there a more efficient or tidier way to do this?
How about putting all that in a foreach container? It would iterate through and be much more efficient.
I think it is a reasonable solution. I personally would use a Stored Proc instead for processing efficiency, but with your dimensional nature of the DWH and implied type 2 nature, this is a valid way to do it.
The other way is to do the "no match" leg of the SSIS package as is, but in the "match" leg insert the row into the actual fact table, then have a post-process T-SQL step which would update the two records needed.
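The match / no-match logic can also be written as two set-based statements, which is roughly what a stored procedure would do. The following is only a sketch under assumptions: Python with in-memory SQLite instead of SSIS and SQL Server, plain ISO date strings instead of date dimension keys, hypothetical table names fact_role and staging_role, and at most one staging row per (role_key, employee_key, start_date).

```python
import sqlite3

OPEN_END = "2099-12-31"  # stand-in for the manually added "no end date" member

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_role (
        role_key INTEGER, employee_key INTEGER,
        start_date TEXT, end_date TEXT,
        PRIMARY KEY (role_key, employee_key, start_date)
    );
    CREATE TABLE staging_role (
        role_key INTEGER, employee_key INTEGER,
        start_date TEXT, end_date TEXT
    );
""")

def apply_staging(conn):
    """Update changed end dates, insert new rows, then empty the staging table."""
    with conn:
        # Match leg: same key, different end date -> take the staged end date.
        conn.execute("""
            UPDATE fact_role
            SET end_date = (
                SELECT s.end_date FROM staging_role s
                WHERE s.role_key = fact_role.role_key
                  AND s.employee_key = fact_role.employee_key
                  AND s.start_date = fact_role.start_date
            )
            WHERE EXISTS (
                SELECT 1 FROM staging_role s
                WHERE s.role_key = fact_role.role_key
                  AND s.employee_key = fact_role.employee_key
                  AND s.start_date = fact_role.start_date
                  AND s.end_date <> fact_role.end_date
            )
        """)
        # No-match leg: keys not yet in the fact table are new rows.
        conn.execute("""
            INSERT INTO fact_role (role_key, employee_key, start_date, end_date)
            SELECT s.role_key, s.employee_key, s.start_date, s.end_date
            FROM staging_role s
            WHERE NOT EXISTS (
                SELECT 1 FROM fact_role f
                WHERE f.role_key = s.role_key
                  AND f.employee_key = s.employee_key
                  AND f.start_date = s.start_date
            )
        """)
        conn.execute("DELETE FROM staging_role")

conn.execute("INSERT INTO fact_role VALUES (1, 10, '2020-01-01', ?)", (OPEN_END,))
conn.executemany(
    "INSERT INTO staging_role VALUES (?, ?, ?, ?)",
    [(1, 10, "2020-01-01", "2021-06-30"), (2, 11, "2021-01-01", OPEN_END)],
)
apply_staging(conn)
print(conn.execute("SELECT * FROM fact_role ORDER BY role_key").fetchall())
```

On SQL Server the same two legs could be expressed in a stored procedure; the sketch only shows the shape of the logic.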

MySQL Database Design Questions

I am currently working on a web service that stores and displays money currency data.
I have two MySQL tables, CurrencyTable and CurrencyValueTable.
The CurrencyTable holds the names of the currencies as well as their description and so forth, like so:
CREATE TABLE CurrencyTable ( name VARCHAR(20), description TEXT, .... );
The CurrencyValueTable holds the values of the currencies during the day - a new value is inserted every 2 minutes when the market is open. The table looks like this:
CREATE TABLE CurrencyValueTable ( currency_name VARCHAR(20), value FLOAT, `datetime` DATETIME, ....);
I have two questions regarding this design:
1) I have more than 200 currencies. Is it better to have a separate CurrencyValueTable for each currency or hold them all in one table?
2) I need to be able to show the current (latest) value of the currency. Is it better to just insert such a field to the CurrencyTable and update it every two minutes or is it better to use a statement like:
SELECT value FROM CurrencyValueTable ORDER BY `datetime` DESC LIMIT 1
The second option seems slower.. I am leaning towards the first one (which is also easier to implement).
Any input would be greatly appreciated!!
p.s. - please ignore SQL syntax / other errors, I typed it off the top of my head..
Thanks!
To your questions:
I would use one table. Especially if you need to report on or compare data from multiple currencies, this is far easier when everything is in one table.
If you don't have a need to track the history of each currency's value, then go ahead and just update a single value -- but in that case, why even have a separate table? You can just add "latest value" as a field in the currency table and update it there. If you do need to track history, then you will need the two tables and the SQL you posted will work.
As an aside, instead of FLOAT I would use DECIMAL(10,2). Since MySQL 5.0, DECIMAL gives better results for currency handling, particularly when it comes to rounding.
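To illustrate why a binary FLOAT is awkward for money, here is a small Python sketch that uses the standard decimal module as a stand-in for a database DECIMAL type. The helper to_two_places and the sample amounts are made up for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.10") + Decimal("0.20")

def to_two_places(amount):
    """Round a decimal string to two places with explicit half-up rounding."""
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(float_sum == 0.3)                # the float sum is not exactly 0.3
print(decimal_sum == Decimal("0.30"))  # the decimal sum is exact
print(round(2.675, 2), to_two_places("2.675"))
```

Whether two decimal places are enough depends on the values being stored, so the precision in DECIMAL(10,2) is worth checking against real data.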
It is better to have one table holding all currencies
If there is need for historical prices, then the table needs to hold them. A reasonable compromise in many situations is to split the price table into a full list of historical prices and another table which only has the current prices.
Using data type float can be troublesome. Please be sure you know what you are doing. If not, use a database currency data type.
As your web service is transactional, it is better to access fewer tables at the same time. Since you will be reading and writing a lot, I would suggest having a single table.
It's better to add a field to CurrencyTable and update it than to hit two tables for a single request.
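For anyone who keeps the history in one table and still wants the latest value per currency, here is a rough sketch of that query. Assumptions: Python with in-memory SQLite standing in for MySQL, values stored as text to keep them exact in SQLite, an index on (currency_name, datetime) added for the lookup, and made-up sample rates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CurrencyValueTable (
        currency_name TEXT NOT NULL,
        value         TEXT NOT NULL,   -- text here only because SQLite lacks DECIMAL
        "datetime"    TEXT NOT NULL
    );
    -- Lets the latest-row lookup for one currency use the index order.
    CREATE INDEX idx_currency_time
        ON CurrencyValueTable (currency_name, "datetime");
""")

conn.executemany(
    "INSERT INTO CurrencyValueTable VALUES (?, ?, ?)",
    [
        ("EUR", "1.0850", "2024-01-02 09:00:00"),
        ("EUR", "1.0875", "2024-01-02 09:02:00"),
        ("GBP", "1.2700", "2024-01-02 09:02:00"),
    ],
)

def latest_value(conn, currency_name):
    """Most recent value for one currency, or None if there is no row."""
    row = conn.execute(
        'SELECT value FROM CurrencyValueTable '
        'WHERE currency_name = ? ORDER BY "datetime" DESC LIMIT 1',
        (currency_name,),
    ).fetchone()
    return row[0] if row else None

print(latest_value(conn, "EUR"))
```

Note the WHERE clause: without it, the query returns the newest row across all currencies rather than for the one being displayed.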