null and not null mysql - mysql

Hi, I have started moving an Access database into MySQL, and I was wondering whether there is a constraint or something similar that would let me make a column NOT NULL and still have empty values in it.
This is not my own database; if it were, I would just fill in the empty fields and then change the column to NOT NULL.

Yes, there are various approaches for modelling missing information without using nulls.
You can choose a value to represent missing. It's quite hard to generalize, so here are a few examples. For the end_date attribute in an open-ended period (i.e. one that has started but is still in progress), use a far-future date such as 9999-12-31. For a person_middle_name attribute, Joe Celko suggests placing metadata values in double curly braces, e.g. {{NK}} for 'not known', {{NA}} for 'not applicable', etc.
Another somewhat intuitive approach for modelling missing information is by the absence of a row in a table. If an employee is unsalaried then do not add a row for them in the Payroll table, thus making them distinct from a salaried employee who is currently receiving no salary represented by a salary_amount of zero in the Payroll table.
A further approach is by the presence of a row in a table. You could have tables for Salaried, Unsalaried and SalaryUnknown and ensure every employee has one row in exactly one of these tables (perhaps enforced in MySQL using triggers and/or procedures).
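The far-future end date idea can be sketched as follows. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) stands in for MySQL, and the assignment table and its columns are hypothetical names, not part of the question.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
FAR_FUTURE = "9999-12-31"  # sentinel meaning "this period has not ended yet"

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE assignment (
        employee_id INTEGER NOT NULL,
        start_date  TEXT    NOT NULL,
        end_date    TEXT    NOT NULL DEFAULT '9999-12-31',
        PRIMARY KEY (employee_id, start_date)
    )
    """
)
conn.execute("INSERT INTO assignment VALUES (1, '2020-01-01', '2021-06-30')")
# No end date supplied: the default sentinel is stored instead of NULL.
conn.execute(
    "INSERT INTO assignment (employee_id, start_date) VALUES (2, '2021-03-15')"
)

# Open-ended periods are found by comparing with the sentinel, not with IS NULL.
open_periods = conn.execute(
    "SELECT employee_id FROM assignment WHERE end_date = ?", (FAR_FUTURE,)
).fetchall()

# A range test needs no special case for missing end dates.
active_on = conn.execute(
    "SELECT employee_id FROM assignment "
    "WHERE start_date <= ? AND ? <= end_date ORDER BY employee_id",
    ("2021-04-01", "2021-04-01"),
).fetchall()

print(open_periods, active_on)
```

The column stays NOT NULL, and "still in progress" becomes an ordinary value that range comparisons handle without extra logic.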

Related

Database design: Managing old and new data in database table

I have a table Student with fields as follows:
Student table (one record per student)
student_id
Name
Parent_Name
Address_line1, Address_line2, Address_line3
Photo_path
Signature_file_path
Preferred_examcity_choice1, Preferred_examcity_choice2, Preferred_examcity_choice3
Gender
Nationality
.
.
.
I am inserting into this table on Registration form completion through the web interface.
Now there is one more module in the web interface for updating student data. On every update request I update the student table record and insert a new entry in student_data_change_request. A student can change records any number of times.
student_data_change_request
request_id(auto_incr PK)
old_name
new_name
old_photo_path
new_photo_path
old_signature_file_path
new_signature_file_path
Now, coming to the problem: earlier, students were allowed to change very few fields. Now the client wants to allow candidates to update more fields (around 20), and adding old and new columns for each of them isn't elegant (I guess); I would end up creating 40 columns to keep track of 20. So how should I redesign my table? Suggestions are welcome.
One approach is to have a shadow table named (table)_xx that has the same columns plus the date and time, an update/insert/delete flag, the user or whatever else you need, and no referential integrity. Set a trigger to update that table from the source whenever anything happens.
If you've got genuine business requirements that need history then do those properly but this pattern is great as a general audit, debugging and forensic tool.
It's also really easy to automate/script as you just generate it from the DB metadata.
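A minimal sketch of the shadow-table pattern, assuming SQLite in place of the real database and a cut-down student table; the student_xx name follows the (table)_xx convention above, while the action and changed_at columns and the sample row are hypothetical. MySQL triggers use the same OLD.column references but need FOR EACH ROW and a changed delimiter when defined from the client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        photo_path TEXT
    );

    -- Shadow table: same columns plus audit metadata, no keys or foreign keys.
    CREATE TABLE student_xx (
        student_id INTEGER,
        name       TEXT,
        photo_path TEXT,
        action     TEXT,
        changed_at TEXT
    );

    -- Copy the pre-update row into the shadow table on every UPDATE.
    CREATE TRIGGER student_audit_update AFTER UPDATE ON student
    BEGIN
        INSERT INTO student_xx
        VALUES (OLD.student_id, OLD.name, OLD.photo_path, 'U', datetime('now'));
    END;
    """
)

conn.execute("INSERT INTO student VALUES (1, 'Asha', '/p/1.jpg')")
conn.execute("UPDATE student SET name = 'Asha K' WHERE student_id = 1")

audit_rows = conn.execute(
    "SELECT student_id, name, action FROM student_xx"
).fetchall()
print(audit_rows)
```

Similar triggers for INSERT and DELETE would complete the audit trail; only the UPDATE case is shown to keep the sketch short.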
Usually a history table looks like this:
request_id
column_name
old_value
new_value
dt
request_id and column_name form the primary key. When you update the student table, you insert a new entry in student_data_change_request for each updated column.
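One way to produce those per-column entries is to diff the old and new records in application code. The sketch below is an assumption about how that could look; the change_rows helper and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def change_rows(request_id, old, new, now=None):
    """Return one (request_id, column_name, old_value, new_value, dt)
    tuple for every column whose value differs between old and new."""
    dt = (now or datetime.now(timezone.utc)).isoformat()
    return [
        (request_id, column, old.get(column), value, dt)
        for column, value in new.items()
        if old.get(column) != value
    ]


old = {"name": "Asha", "photo_path": "/p/1.jpg", "gender": "F"}
new = {"name": "Asha K", "photo_path": "/p/1.jpg", "gender": "F"}

rows = change_rows(7, old, new)
print(rows)
```

Each returned tuple maps directly onto one row of the history table, so the table stays the same shape no matter how many fields become editable.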
Edited:
Another way:
request_id
value_type
name
photo_path
signature_file_path
...
and insert a first entry with the old values and a second entry with the new values. The value_type column marks each row as old or new.
I would rather have just one table, with an additional column for effective date. Then a view that picks up just the most recent row for each student_id becomes your first "table". If for some reason you must show "current" and "most recently changed" values side-by-side, that is another view.
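As a rough sketch of this single-table design (SQLite standing in for the real database; student_history, student_current, effective_date and the sample rows are hypothetical names and data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student_history (
        student_id     INTEGER NOT NULL,
        effective_date TEXT    NOT NULL,
        name           TEXT    NOT NULL,
        address_line1  TEXT,
        PRIMARY KEY (student_id, effective_date)
    );

    -- The "current" view keeps only the most recent row per student.
    CREATE VIEW student_current AS
    SELECT h.*
    FROM student_history AS h
    WHERE h.effective_date = (
        SELECT MAX(h2.effective_date)
        FROM student_history AS h2
        WHERE h2.student_id = h.student_id
    );
    """
)
conn.executemany(
    "INSERT INTO student_history VALUES (?, ?, ?, ?)",
    [
        (1, "2023-09-01", "Asha", "12 Hill Rd"),
        (1, "2024-02-10", "Asha", "4 Lake St"),
        (2, "2023-09-01", "Ravi", "9 Park Ave"),
    ],
)

current = conn.execute(
    "SELECT student_id, address_line1 FROM student_current ORDER BY student_id"
).fetchall()
print(current)
```

Every change becomes an INSERT of a new row, and the number of columns no longer depends on how many fields students are allowed to edit.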
As usual, it all depends on how you intend to use the data.
My strong preference in these cases is the solution @mathguy suggests - embedding the concept of time in the main table design. This allows you to ask questions such as "what was this student's address on 1 Jan?" or "who had signature x on 12 Feb?".
If you have to report or execute business logic that reflects the status at any point in time, this design works really well. For instance, if you have to report on how many students lived in a particular address for a given term, you want to know when the records were valid.
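A point-in-time question such as "what was this student's address on 1 Jan?" could be answered along these lines. The sketch assumes a history table keyed by (student_id, effective_date); the table name, the address_on helper and the sample rows are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3


def address_on(conn, student_id, as_of):
    """Return the address that was in effect for the student on the given date,
    or None if the student had no row yet."""
    row = conn.execute(
        """
        SELECT address_line1
        FROM student_history
        WHERE student_id = ? AND effective_date <= ?
        ORDER BY effective_date DESC
        LIMIT 1
        """,
        (student_id, as_of),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student_history (
        student_id     INTEGER NOT NULL,
        effective_date TEXT    NOT NULL,
        address_line1  TEXT,
        PRIMARY KEY (student_id, effective_date)
    )
    """
)
conn.executemany(
    "INSERT INTO student_history VALUES (?, ?, ?)",
    [(1, "2023-09-01", "12 Hill Rd"), (1, "2024-02-10", "4 Lake St")],
)

print(address_on(conn, 1, "2024-01-01"))
print(address_on(conn, 1, "2024-03-01"))
```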
But not all applications care about "time" - sometimes, you just want to have an audit table, so you can trace what happened over time in case of anomalies.
In that case, @loztinspace's solution is useful - but in my experience, this rapidly escalates into more work, because those who want to inspect the audit records cannot or should not get access to a SQL prompt on your production environment.

Late arriving fact - best way to deal with it

I have a star schema that tracks Roles in a company, e.g. what dept the role is under, the employee assigned to the role, when they started, when/if they finished up and left.
I have two time dimensions, StartedDate & EndDate. While a role is active, the end date is null in the source system. In the star schema I set any null end dates to 31/12/2099, which is a dimension member I added manually.
I'm working out the best way to update the EndDate when a role finishes or an employee leaves.
Right now I'm:
Populating the fact table as normal, doing lookups on all dimensions.
I then do a lookup against the fact table to find duplicates, not including the EndDate in this lookup. Non-matched rows are new and so are inserted into the fact table.
Matching rows then go into a conditional split to check whether the current EndDate is different from the new EndDate. If different, they are inserted into an updateStaging table and a proc is run to update the fact table.
Is there a more efficient or tidier way to do this?
How about putting all that in a foreach container? It would iterate through and be much more efficient.
I think it is a reasonable solution. I personally would use a Stored Proc instead for processing efficiency, but with your dimensional nature of the DWH and implied type 2 nature, this is a valid way to do it.
The other way is to keep the "no match" leg of your SSIS package as is, but in the "match" leg insert the row into the actual fact table, then have a post-process T-SQL step that updates the two records needed.
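The staging-then-update step described in the question can be sketched as one set-based statement. This is an illustration only: SQLite stands in for SQL Server, plain date strings replace the date dimension keys, and FactRole and its column names are hypothetical (updateStaging is the name used in the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FactRole (
        role_id     INTEGER NOT NULL,
        employee_id INTEGER NOT NULL,
        start_date  TEXT    NOT NULL,
        end_date    TEXT    NOT NULL DEFAULT '2099-12-31',
        PRIMARY KEY (role_id, employee_id, start_date)
    );
    CREATE TABLE updateStaging (
        role_id      INTEGER NOT NULL,
        employee_id  INTEGER NOT NULL,
        start_date   TEXT    NOT NULL,
        new_end_date TEXT    NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO FactRole VALUES (?, ?, ?, ?)",
    [
        (10, 1, "2022-01-01", "2099-12-31"),
        (11, 2, "2022-05-01", "2099-12-31"),
    ],
)
# The data flow found that role 10 now has a real end date.
conn.execute("INSERT INTO updateStaging VALUES (10, 1, '2022-01-01', '2023-03-31')")

# One statement updates every staged row whose end date actually changed.
cur = conn.execute(
    """
    UPDATE FactRole
    SET end_date = (
        SELECT s.new_end_date
        FROM updateStaging AS s
        WHERE s.role_id = FactRole.role_id
          AND s.employee_id = FactRole.employee_id
          AND s.start_date = FactRole.start_date
    )
    WHERE EXISTS (
        SELECT 1
        FROM updateStaging AS s
        WHERE s.role_id = FactRole.role_id
          AND s.employee_id = FactRole.employee_id
          AND s.start_date = FactRole.start_date
          AND s.new_end_date <> FactRole.end_date
    )
    """
)
rows_updated = cur.rowcount
end_dates = dict(conn.execute("SELECT role_id, end_date FROM FactRole"))
print(rows_updated, end_dates)
```

On SQL Server the same update would more typically be written with UPDATE ... FROM ... JOIN, but the idea is the same: one statement per batch rather than one per row.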

Database not having unique ID's

I have a database containing attendance on a monthly basis. Now I want to display that data in a series of text boxes. My problem is that it does not contain any unique IDs, which is making my task difficult. Have a look at the attachment to get a picture of my problem.
http://s26.postimg.org/p8v0zhemx/image.png
Thank you so much in advance.
EDIT:
For future researchers using listview, this is the query for my MySQL.
You have to make a composite key if your db does not have a unique id. Google it.
The query I managed to pull out of my head:
"SELECT empno, line1, time1, line2, time2, line3, time3, line4, time4, line5, time5, line6, time6 FROM attendancelist WHERE empno = '" & ListPayroll.SelectedItems(0).Text & "' AND line1 = '" & ListPayroll.SelectedItems(0).SubItems(1).Text & "'"
It looks to me like your sample data table contains tons of attendance data that basically look like this:
employee workdate starttime endtime
00117 2014-02-03 08:15 17:30
00117 2014-02-04 09:00 17:30
00117 2014-02-05 null null
etc.
If the employee was absent on the given day, that's indicated by null values in starttime and endtime. If the employee was not employed at all on the particular date, you'd simply leave the row out of the table entirely. I think that's what the first five days of employee 00001 in your sample data's first row mean -- not present, not absent.
Your raw data is arranged in a pretty doggone inconvenient report layout that puts a week's worth of days on each row. You can probably write a simple dotnet program to slurp up your six-day-week input table and insert six rows (or fewer) of this data from each row of that table.
Once you've loaded your data from that input table, you can switch over to maintaining it in your new table. That will be much easier for you to handle in a program. You will also be able to write a query program that will recreate your six-day-row report, if that's what your users prefer.
Arranged the way I have shown it, you'll get a nice little attendance table. If you know ahead of time you'll have at most one record per day per employee, you can use the columns I've shown, and use a composite primary key consisting of (employee, workdate).
If you might have more than one row per employee per date, you'll need to add an id column, which can be an auto-incrementing surrogate primary key.
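To illustrate the composite-key variant, here is a small sketch that uses the sample rows shown above. SQLite stands in for MySQL, and the attendance table name is an assumption; the columns are the ones listed in the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE attendance (
        employee  TEXT NOT NULL,
        workdate  TEXT NOT NULL,
        starttime TEXT,          -- NULL means absent that day
        endtime   TEXT,
        PRIMARY KEY (employee, workdate)
    )
    """
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?)",
    [
        ("00117", "2014-02-03", "08:15", "17:30"),
        ("00117", "2014-02-04", "09:00", "17:30"),
        ("00117", "2014-02-05", None, None),
    ],
)

# A second row for the same employee and date violates the composite key.
try:
    conn.execute(
        "INSERT INTO attendance VALUES ('00117', '2014-02-03', '10:00', '18:00')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The pair (employee, workdate) pins down exactly one row for updates.
cur = conn.execute(
    "UPDATE attendance SET starttime = ? WHERE employee = ? AND workdate = ?",
    ("08:30", "00117", "2014-02-03"),
)
updated = cur.rowcount
print(duplicate_rejected, updated)
```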
If all you need is an arbitrary unique identifier for update and delete (as indicated in the comments), then add one:
ALTER TABLE my_table
ADD COLUMN id INT AUTO_INCREMENT PRIMARY KEY;
That is, of course, assuming you have that ability or can convince someone who does. It is a remarkably minor change. If column names are specified in the existing INSERT queries, it won't require a change to them. Someone ought to be willing to do it.
If you have the freedom to modify the schema, please consider revising it. This is very, very poorly designed (having repetitive columns and columns containing multiple pieces of information). If you cannot modify this one, creating a new, better designed schema and importing data from this schema may be another option you want to consider. (Creating a "new schema" could also be accomplished using a set of separate tables.)
Also be aware that with the current structure, you will need very heavy validation on the code side to prevent users from saving invalid data when they modify it.

how to build self referencing table

In the source table, there are two columns, as the following snapshot shows:
Then for destination table, it should be something like this:
("DimLocationKey" is auto-generated surrogate key)
How could I achieve a self-referencing effect in SSIS? I tried the following approach, but it's not working because there would be no matches in the lookup.
If the column is nullable, then you could load the unique values for location_ID and then have a secondary process come back through and take care of updating existing and possibly adding new.
Pass 1
1 NULL A NULL
2 NULL B NULL
3 NULL C NULL
4 NULL D NULL
I suppose if it's not nullable, then you could precompute those ids in a data flow and assign current row and parent to themselves. As a developer, I might hate you for that though ;)
Pass 2
At this point, it becomes a question of whether there should be 8 rows in the table or 4 (whatever your source data indicates). This becomes a question for business users, appropriately "dumbed down". I've seen both answers in my hierarchy questions - "Who does the President report to?" At one place, the President reported to no one which meant expense requests were automatically approved. A different place had the CEO report to themselves which meant their expense reports still had to be approved by themselves. I guess it was to ensure they had executive accountability as nothing was automagic.
If the answer is 8 rows, then your data flow would look about right. If it's 4, then you'd use the existing data flow but update the rows instead. If it's a small set of rows, hundreds, then you can use the OLEDB Command and write your update statement. Just realize that it will issue an UPDATE statement for every row that hits the component. That can bring your processing to a standstill as it's terribly inefficient.
The more efficient route for updates is to use the OLE DB Destination and then, after the Data Flow has completed, have an Execute SQL Task issue a set-based UPDATE statement. See Andy Leonard's Stairway to Integration Services series for a well-written example of how to do this.
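The two-pass idea with a set-based update can be sketched as follows. Since the source snapshot is not available, the shape of the source data is an assumption: a staging table src with hypothetical columns location_id and parent_location_id, and a nullable ParentLocationKey column on the dimension. SQLite stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical staging copy of the two source columns.
    CREATE TABLE src (location_id TEXT, parent_location_id TEXT);

    CREATE TABLE DimLocation (
        DimLocationKey    INTEGER PRIMARY KEY AUTOINCREMENT,
        ParentLocationKey INTEGER REFERENCES DimLocation (DimLocationKey),
        location_id       TEXT NOT NULL UNIQUE
    );
    """
)
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [("A", None), ("B", "A"), ("C", "A"), ("D", "B")],
)

# Pass 1: load the unique locations with a NULL parent key.
conn.execute(
    "INSERT INTO DimLocation (location_id) "
    "SELECT DISTINCT location_id FROM src ORDER BY location_id"
)

# Pass 2: one set-based UPDATE resolves every parent to its surrogate key.
conn.execute(
    """
    UPDATE DimLocation
    SET ParentLocationKey = (
        SELECT p.DimLocationKey
        FROM src AS s
        JOIN DimLocation AS p ON p.location_id = s.parent_location_id
        WHERE s.location_id = DimLocation.location_id
    )
    """
)

# Read the hierarchy back as {location_id: parent location_id}.
parents = dict(
    conn.execute(
        """
        SELECT c.location_id, p.location_id
        FROM DimLocation AS c
        LEFT JOIN DimLocation AS p ON p.DimLocationKey = c.ParentLocationKey
        """
    )
)
print(parents)
```

Root locations keep a NULL parent key here; whether a root should instead reference itself is the business question raised above.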
If it's not nullable and nodes referencing themselves is not allowed, then it seems your data model does not accurately describe

MySQL Database Design Questions

I am currently working on a web service that stores and displays money currency data.
I have two MySQL tables, CurrencyTable and CurrencyValueTable.
The CurrencyTable holds the names of the currencies as well as their description and so forth, like so:
CREATE TABLE CurrencyTable ( name VARCHAR(20), description TEXT, .... );
The CurrencyValueTable holds the values of the currencies during the day - a new value is inserted every 2 minutes when the market is open. The table looks like this:
CREATE TABLE CurrencyValueTable ( currency_name VARCHAR(20), value FLOAT, `datetime` DATETIME, ....);
I have two questions regarding this design:
1) I have more than 200 currencies. Is it better to have a separate CurrencyValueTable for each currency or hold them all in one table?
2) I need to be able to show the current (latest) value of the currency. Is it better to just insert such a field to the CurrencyTable and update it every two minutes or is it better to use a statement like:
SELECT value FROM CurrencyValueTable ORDER BY `datetime` DESC LIMIT 1
The second option seems slower.. I am leaning towards the first one (which is also easier to implement).
Any input would be greatly appreciated!!
p.s. - please ignore SQL syntax / other errors, I typed it off the top of my head..
Thanks!
To your questions:
I would use one table. Especially if you need to report on or compare data from multiple currencies, the queries will be far simpler if you stick to one table.
If you don't have a need to track the history of each currency's value, then go ahead and just update a single value -- but in that case, why even have a separate table? You can just add "latest value" as a field in the currency table and update it there. If you do need to track history, then you will need the two tables and the SQL you posted will work.
As an aside, instead of FLOAT I would use DECIMAL(10,2). Since MySQL 5.0, DECIMAL values are handled with exact arithmetic, which gives better rounding behavior for currency handling.
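The difference between binary floating point and exact decimal arithmetic is easy to see outside the database. The sketch below uses Python's decimal module as an analogy for DECIMAL(10,2); the to_money helper is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal fractions exactly.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3

# Decimal arithmetic on the same amounts is exact.
decimal_sum = Decimal("0.10") + Decimal("0.20")
decimal_is_exact = decimal_sum == Decimal("0.30")


def to_money(value):
    """Round a decimal string to two places, as a DECIMAL(10,2) column would store it."""
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(float_is_exact, decimal_is_exact, to_money("2.675"))
```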
It is better to have one table holding all currencies
If there is need for historical prices, then the table needs to hold them. A reasonable compromise in many situations is to split the price table into a full list of historical prices and another table which only has the current prices.
Using data type float can be troublesome. Please be sure you know what you are doing. If not, use a database currency data type.
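The split between a full history table and a current-prices table could look roughly like this. It is a sketch under assumptions: SQLite stands in for MySQL, values are stored as text because SQLite has no DECIMAL type, and CurrencyCurrentValue and record_value are hypothetical names. In MySQL the second statement would typically be INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CurrencyValueTable (
        currency_name TEXT NOT NULL,
        value         TEXT NOT NULL,
        "datetime"    TEXT NOT NULL,
        PRIMARY KEY (currency_name, "datetime")
    );
    CREATE TABLE CurrencyCurrentValue (
        currency_name TEXT PRIMARY KEY,
        value         TEXT NOT NULL,
        "datetime"    TEXT NOT NULL
    );
    """
)


def record_value(conn, currency_name, value, when):
    """Append to the history and overwrite the current value in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO CurrencyValueTable VALUES (?, ?, ?)",
            (currency_name, value, when),
        )
        conn.execute(
            "INSERT OR REPLACE INTO CurrencyCurrentValue VALUES (?, ?, ?)",
            (currency_name, value, when),
        )


record_value(conn, "EUR", "1.08", "2024-01-02 09:00:00")
record_value(conn, "EUR", "1.09", "2024-01-02 09:02:00")
record_value(conn, "GBP", "1.27", "2024-01-02 09:02:00")

current_eur = conn.execute(
    "SELECT value FROM CurrencyCurrentValue WHERE currency_name = ?", ("EUR",)
).fetchone()[0]

# The same answer from the history table alone, filtered by currency.
latest_eur = conn.execute(
    'SELECT value FROM CurrencyValueTable WHERE currency_name = ? '
    'ORDER BY "datetime" DESC LIMIT 1',
    ("EUR",),
).fetchone()[0]
print(current_eur, latest_eur)
```

With a key on (currency_name, datetime), the history-only query is an index lookup, so the current-value table is an optimization rather than a necessity.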
As your web service is transactional, it is better to access fewer tables at the same time. Since you will be reading and writing a lot, I would suggest having a single table.
It's better to add a field to the CurrencyTable and update it rather than hitting two tables for a single request.