Stock portfolio database design to support stock split/merge - mysql

I am currently creating a web application to manage my stock portfolio, and I have a question about the design of the transaction table.
The following is my stock transaction table design:
| column name | datatype | notes |
|--------------|--------------------|------------------------------------------|
| id | int(10) | primary key, auto increment |
| portfolio_id | int(10) | reference to portfolio table primary key |
| symbol | varchar(20) | stock symbol e.g: YHOO, GOOG |
| type | ENUM('buy','sell') | |
| tx_date | DATE | |
| price | DOUBLE(15,2) | |
| volume | int(20) | |
| commission | DOUBLE(15,2) | |
| created_at | TIMESTAMP | |
| updated_at | TIMESTAMP | |
In my current design, I don't have a separate table for storing stock symbols. I generate a list of stock symbols (using a stock API) for the user to pick from when they create a new transaction record. I think this approach may cause problems when a stock splits or merges, because I may no longer be able to retrieve the stock price using the same symbol.
How should I modify my table in order to support the stock split/merge case?

Stock splits
... symbol type shares ...
... AAPL split 100 ...
2 for 1 split; 100 shares became 200 shares.
Dividends
symbol type amount
AAPL div 20.00
Mergers
Workaround: Record a merger as a sale of the old stock and a buy of the new stock. Add appropriate notes in the 'notes' column.
A more accurate (but more complicated) strategy is to redesign the entire database so that each trade is literally a trade of one asset for another. A 'buy' trades cash for stock. A 'sell' trades stock for cash. A merger trades stock A for stock B. A split trades 0 shares for 100 shares, etc. Cash is just another asset class.
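A minimal sketch of that "every trade exchanges one asset for another" idea, using Python's built-in sqlite3 as a stand-in for MySQL. The table names (trade, trade_leg), the signed-quantity convention, and the tickers AAA and BBB are assumptions made for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trade (
    id      INTEGER PRIMARY KEY,
    tx_date TEXT NOT NULL,
    notes   TEXT
);
-- One row per asset movement; quantity is signed (+ received, - given up).
CREATE TABLE trade_leg (
    trade_id INTEGER NOT NULL REFERENCES trade (id),
    asset    TEXT NOT NULL,     -- 'USD' for cash, otherwise a stock symbol
    quantity NUMERIC NOT NULL
);
""")

def record(tx_date, notes, legs):
    """Store one trade together with all of its legs."""
    cur = conn.execute(
        "INSERT INTO trade (tx_date, notes) VALUES (?, ?)", (tx_date, notes)
    )
    conn.executemany(
        "INSERT INTO trade_leg (trade_id, asset, quantity) VALUES (?, ?, ?)",
        [(cur.lastrowid, asset, qty) for asset, qty in legs],
    )

# Hypothetical history: deposit, buy, 2-for-1 split, then a merger.
record("2014-01-02", "cash deposit", [("USD", 10000)])
record("2014-01-03", "buy 100 AAA @ 50", [("USD", -5000), ("AAA", 100)])
record("2014-06-09", "2-for-1 split", [("AAA", 100)])          # 0 shares in, 100 out
record("2014-09-01", "AAA merged into BBB", [("AAA", -200), ("BBB", 50)])

# Current holdings are simply the sum of all legs per asset.
holdings = dict(conn.execute(
    "SELECT asset, SUM(quantity) FROM trade_leg "
    "GROUP BY asset HAVING SUM(quantity) <> 0"
))
print(holdings)
```

With this layout a buy, a sell, a split, and a merger all go through the same code path, and cash is tracked like any other asset.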
Foreign stocks
All the major finance sites have this figured out. symbol.exchange is a unique id. No need to reinvent the wheel and create a new id column.
You will also need to add a currency column for foreign stocks.

There are fewer than 4000 stocks in the USA. Why don't you use the stock symbol as the primary key? And how do you plan to handle dividends?

I like your approach of having your own custom security (stock) ID. You can then map this to the various ticker/CUSIP/ISIN changes over time from the exchange/data provider. So have a security_master table which holds your security_ID, a separate <data_provider>_security table with the one-to-many mappings, and a third security events table (splits, mergers, etc.).
Your transaction, holding, and any other tables which refer to securities will only refer to your internal security ID.
If a stock splits, you still refer to it using the same security_id, but it would map to rows in the security events table, which tracks such events over time, and you would query the appropriate quantity based on the split ratio for that point in time.
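Here is one way that layout could look, sketched with Python's sqlite3 in place of MySQL. All table and column names, the tickers OLDT/NEWT, and the rule that a transaction dated on the split day counts as post-split are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE security_master (
    security_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE provider_security (          -- one-to-many ticker mapping
    security_id INTEGER NOT NULL REFERENCES security_master (security_id),
    ticker      TEXT NOT NULL,
    valid_from  TEXT NOT NULL,
    valid_to    TEXT                      -- NULL means still current
);
CREATE TABLE security_event (
    security_id INTEGER NOT NULL REFERENCES security_master (security_id),
    event_date  TEXT NOT NULL,
    event_type  TEXT NOT NULL,
    ratio       REAL NOT NULL             -- new shares per old share
);
CREATE TABLE transactions (
    security_id INTEGER NOT NULL REFERENCES security_master (security_id),
    tx_date     TEXT NOT NULL,
    type        TEXT NOT NULL CHECK (type IN ('buy', 'sell')),
    volume      INTEGER NOT NULL
);
INSERT INTO security_master VALUES (1, 'Example Corp');
INSERT INTO provider_security VALUES
    (1, 'OLDT', '2019-01-01', '2020-03-01'),
    (1, 'NEWT', '2020-03-01', NULL);
INSERT INTO security_event VALUES (1, '2020-06-01', 'split', 2.0);
INSERT INTO transactions VALUES
    (1, '2020-01-10', 'buy', 100),
    (1, '2020-07-01', 'sell', 50);
""")

def ticker_on(security_id, day):
    """Ticker the data provider used for this security on a given day."""
    row = conn.execute(
        "SELECT ticker FROM provider_security WHERE security_id = ? "
        "AND valid_from <= ? AND (valid_to IS NULL OR ? < valid_to)",
        (security_id, day, day),
    ).fetchone()
    return row[0] if row else None

def position_on(security_id, as_of):
    """Split-adjusted share count as of a date (ISO date strings compare correctly)."""
    txs = conn.execute(
        "SELECT tx_date, type, volume FROM transactions "
        "WHERE security_id = ? AND tx_date <= ?", (security_id, as_of),
    ).fetchall()
    splits = conn.execute(
        "SELECT event_date, ratio FROM security_event WHERE security_id = ? "
        "AND event_type = 'split' AND event_date <= ?", (security_id, as_of),
    ).fetchall()
    total = 0.0
    for tx_date, tx_type, volume in txs:
        factor = 1.0
        for event_date, ratio in splits:
            if event_date > tx_date:      # only splits after the trade rescale it
                factor *= ratio
        total += (volume if tx_type == "buy" else -volume) * factor
    return total

print(ticker_on(1, "2020-02-01"), position_on(1, "2020-05-31"))
print(ticker_on(1, "2020-12-31"), position_on(1, "2020-12-31"))
```

The transaction rows never change when the ticker or share count does; only the mapping and event tables gain rows.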

Related

What is the best way to store multiple user data in mysql database?

I'm trying to make a table to store each user's weight per month (over the course of 12 months). What is the best approach if I want to store data for multiple users, given that the weight is different every month?
First of all you should have some way of identifying users, let's say you have a unique user_id. Then your table could look like this:
user_id | month | weight
--------------------------
1 | 03/2019 | 76.54
1 | 04/2019 | 75.32
2 | 03/2019 | 103.12
2 | 04/2019 | 97.84
In that table you can store any number of records for the same user. If you want to make sure that each user can only have one measurement per month, you can add a unique index on the columns user_id and month combined.
Any other information about the users, like their name and email address, has to be stored in a separate table, because you only want to store it once per user. You should also define a foreign key constraint on the column user_id to tell the database that each weight record references a user record.
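The two constraints described above can be sketched like this, using Python's sqlite3 as a stand-in for MySQL. The table names users and user_weight, the 'YYYY-MM' month format, and the helper try_insert are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT NOT NULL
);
CREATE TABLE user_weight (
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    month   TEXT NOT NULL,                 -- assumed format 'YYYY-MM'
    weight  REAL NOT NULL,
    UNIQUE (user_id, month)                -- one measurement per user per month
);
INSERT INTO users VALUES (1, 'Alice', 'alice@example.com');
""")

def try_insert(user_id, month, weight):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO user_weight VALUES (?, ?, ?)", (user_id, month, weight)
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(1, "2019-03", 76.54),   # accepted
    try_insert(1, "2019-04", 75.32),   # accepted
    try_insert(1, "2019-03", 80.00),   # rejected: duplicate (user_id, month)
    try_insert(99, "2019-03", 70.00),  # rejected: no such user
]
print(results)
```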

database table design for ratings in mysql

I am designing a ratings table in MySQL. The situation is the following:
a) The one who gives a rating can be of different types (Admin, Client, Provider, or anything else in the future).
b) The one who receives a rating can be of different types (Admin, Client, Provider, or anything else in the future).
c) Ratings are given only when there is an order (imagine you go to a website and order food; when they bring you the food, you rate them). But orders can be of different kinds. In my case an order means freight delivery, so I can have RoadOrder and SeaOrder.
After looking at the above, I came to the conclusion to have tables like this:
1) all_ratings table
(This table doesn't calculate the current rating for each kind of user. There are 3 morphs here.)
id | from_userable_type | from_userable_id | to_userable_type | to_userable_id | orderable_type | orderable_id | rating
2) ratings table (which holds the current rating. all_ratings just saves every rating given by each user, whereas the ratings table has the final rating. ratingable_id and ratingable_type could be any type of user.)
id | quantity | current_rating | ratingable_id | ratingable_type
Users table
id | name | email | userable_id | userable_type
Admins, Clients, Providers (they look the same for now)
id
RoadOrder table
id | from_place | to_place | ...etc
The question: What do you think? Is this the right table schema for this type of scenario?

Periodic snapshot fact table - Possibly missing some captures

I am tracking employee changes daily in a DimPerson dimension table, and filling up my fact table each end-of-month and counting Hires, Exits, and Headcount.
For this example, let's say I will be populating the fact table end-of-month April 30th. Now here's the problem I am facing:
I have an employee record on April 17th that's a "Hire" action, so at that point in time my DimPerson table reads like this:
+-------+-----------+----------+--------+--------------------+-------+
| EmpNo | Firstname | LastName | Action | EffectiveStartDate | isCur |
+-------+-----------+----------+--------+--------------------+-------+
| 4590 | John | Smith | Hire | 4/17/2017 | Y |
+-------+-----------+----------+--------+--------------------+-------+
Now 2 days later, I see the same employee but with an action "Manager Change", so now my DimPerson table becomes this:
+-------+-----------+----------+-----------------+--------------------+-------+
| EmpNo | Firstname | LastName | Action | EffectiveStartDate | isCur |
+-------+-----------+----------+-----------------+--------------------+-------+
| 4590 | John | Smith | Hire | 4/17/2017 | N |
| 4590 | John | Smith | Manager Change | 4/19/2017 | Y |
+-------+-----------+----------+-----------------+--------------------+-------+
So at month end, when I select all "Current" employees, I will miss the Hire capture for this person, since his most recent record is just a manager change and the actual hiring happened "in-month".
Is it normal to miss certain changes when doing a periodic snapshot? What do you recommend I do to capture the Hire action in this case?
Sounds like you need to fill up your fact table differently: you need a reliable source for the numbers of hires, exits and headcount. You could pick those events up directly from the source system if available, or pick them up from your dimension table (if it was guaranteed to contain all the history, and not just end-of-day changes).
The source system would be the best solution, but if the dimension table overall shows the history you need, then rather than selecting the isCur people and seeing their most recent action, you need to get all the dimension table records for the period you are snapshotting, and count the actions of each type.
However I would not recommend you use the dimension table at all to capture transactional history. SCDs on a dimension should be used to track changes to the dimension attributes themselves, not to track the history of actions on the person. Ideally, you would create a transactional fact table to record these actions. That way, you have a transactional fact that records all actions, and you can use that fact table to populate your periodic snapshot at the end of each month, and your dimension table doesn't need to worry about it. Think of your dimension table as a record of the person, not of the actions on the person.
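As a rough sketch of the transactional-fact approach, here is a table of actions and a per-month count, using Python's sqlite3 instead of the actual warehouse engine. The table name fact_person_action, the helper monthly_counts, and the rows for employees 4601 and 4612 are assumptions; only employee 4590 comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per action, regardless of which dimension row is current.
CREATE TABLE fact_person_action (
    emp_no         INTEGER NOT NULL,
    action         TEXT NOT NULL,
    effective_date TEXT NOT NULL        -- ISO date, e.g. '2017-04-17'
);
INSERT INTO fact_person_action VALUES
    (4590, 'Hire',           '2017-04-17'),
    (4590, 'Manager Change', '2017-04-19'),
    (4601, 'Exit',           '2017-04-25'),   -- hypothetical row
    (4612, 'Hire',           '2017-05-02');   -- hypothetical row, outside April
""")

def monthly_counts(start, end):
    """Count actions per type with start <= effective_date < end."""
    rows = conn.execute(
        "SELECT action, COUNT(*) FROM fact_person_action "
        "WHERE effective_date >= ? AND effective_date < ? "
        "GROUP BY action",
        (start, end),
    )
    return dict(rows)

april = monthly_counts("2017-04-01", "2017-05-01")
print(april)
```

The April hire is counted even though the employee's current dimension row says "Manager Change", because the count no longer depends on isCur.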
If your fact is intended to show the organizational change at the month end, I would say it is working as designed. The employee has a manager at the end of the month, but did not exist at the end of the previous month. This implies the employee was hired during the month. With a monthly grain, it should not be expected to show the daily activity.
Our employee dimension contains the hire date as a Type 1 attribute. We also include hire date in certain fact tables to allow a role playing relationship with the date dimension.

How to design my database to accommodate this data

I am developing a database for a payroll application, and one of the features I'll need is a table that stores the list of employees that work at each store, each day of the week.
Each employee has an ID, so my table looks like this:
| Mon | Tue | Wed | Thu | Fri | Sat | Sun
Store 1 | 3,4,5 | 3,4,5 | 3,4,5 | 4,5,7 | 4,5,7 | 4,5,6,7 | 4,5,6,7
Store 2 | 1,8,9 | 1,8,9 | 1,8,9 | 1,8,9 | 1,8,9 | 1,8,9 | 1,8,9
Store 3 | 10,12 | 10,12 | 10,12 | 10,12 | 10,12 | 10,12 | 10,12
Store 4 | 15 | 15 | 15 | 16 | 16 | 16 | 16
Store 5 | 6,11,13 | 6,11,13 | 6,11,13 | 14,18,19| 14,18,19| 14,18,19| 14,18,19
My question is, how do I represent that on my database? I came up with the following ideas:
Idea 1: Pretty much replicate the design above, creating a table with the following columns: [Store_id | Mon | Tue ... | Sat | Sun] and then store the list of employee IDs of each day as a string, with IDs separated by commas. I know that comma-separated lists are not good database design, but sometimes they do look tempting, as in this case.
Store_id | Mon | Tue | Wed | Thu | Fri | Sat
---------+---------+---------+---------+---------+---------+---------
1 | '3,4,5' | '3,4,5' | '3,4,5' | '4,5,7' | '4,5,7' | '4,5,6,7'
2 | '1,8,9' | '1,8,9' | '1,8,9' | '1,8,9' | '1,8,9' | '1,8,9'
Idea 2: Create a table with the following columns: [Store_id | Day | Employee_id]. That way each employee working at a specific store at a specific day would be an entry in this table. The problem I see is that this table would grow quite fast, and it would be harder to visualize the data at the database level.
Store_id | Day | Employee_id
---------+-----+-------------
1 | mon | 3
1 | mon | 4
1 | mon | 5
1 | tue | 3
1 | tue | 4
Any of these ideas sound viable? Any better way of storing the data?
If I were you, I would store the employee data and the store data in separate tables, but still keep the design of your main table. So do something like this:
CREATE TABLE stores (
id INT AUTO_INCREMENT PRIMARY KEY,
store_name VARCHAR(255)
-- any other data for your store here.
);
CREATE TABLE employees (
id INT AUTO_INCREMENT PRIMARY KEY,
employee_name VARCHAR(255)
-- whatever other employee data you need to store.
);
-- created last so that both referenced tables already exist
CREATE TABLE schedule (
id INT AUTO_INCREMENT PRIMARY KEY,
store_id INT, -- FK to the stores table id
day VARCHAR(20),
emp_id INT, -- FK to the employees table id
FOREIGN KEY (store_id) REFERENCES stores (id),
FOREIGN KEY (emp_id) REFERENCES employees (id)
);
I would have a table for stores and one for employees, because that way you can keep specific data for each store or employee.
BONUS:
If you wanted a query to show the store name with the employees' names and their schedule, all you have to do is join the three tables:
SELECT s.store_name, sh.day, e.employee_name
FROM schedule sh
JOIN stores s ON s.id = sh.store_id
JOIN employees e ON e.id = sh.emp_id
This query has a limitation, though: you cannot order by day of the week, because sorting the day column as text gives alphabetical order rather than calendar order. So in reality you also need a days table with specific data for each day; that way you can order the data from the beginning or the end of the week.
If you did want to make a days table, it would just be the same thing again:
CREATE TABLE days(
id INT,
day_name VARCHAR(20),
day_type VARCHAR(55)
-- any more data you want here
);
where day_name would be Mon, Tue, ... and day_type would be Weekday or Weekend,
and then all you would have to do for your query is
SELECT s.store_name, d.day_name, e.employee_name
FROM schedule sh
JOIN stores s ON s.id = sh.store_id
JOIN employees e ON e.id = sh.emp_id
JOIN days d ON d.id = sh.day_id
ORDER BY d.id;
Notice that the day column in the schedule table would be replaced with a day_id column linked to the days table.
Hope that's helpful!
The second design is correct for a relational database. One employee_id per row, even if it results in multiple rows per store per day.
The number of rows is not likely to get larger than the RDBMS can handle, if your example is accurate. You have no more than 4 employees per store per day, 5 stores, and up to 366 days per year. So that is no more than 4 x 5 x 366 = 7320 rows per year, and perhaps fewer.
I regularly see databases in MySQL that have hundreds of millions or even billions of rows in a given table. So you can continue to run those stores for many years before running into scalability problems.
I upvoted John Ruddell's answer, which is basically your option #2 with the addition of tables to hold data about the store and the employee. I won't repeat what he said, but let me just add a couple of thoughts that are too long for a comment:
Never ever ever put comma-separated values in a database record. This makes the data way harder to work with.
Sure, either #1 or #2 makes it easy to query to find which employees are working at store 1 on Friday:
Method 1:
select Friday_employees from schedule where store_id='store 1'
Method 2:
select employee_id from schedule where store_id=1 and day='fri'
But suppose you want to know what days employee #7 is working.
With method 2, it's easy:
select day from schedule where employee_id=7
But how would you do that with method 1? You'd have to break the field up into its individual pieces and check each piece. At best that's a pain, and I've seen people screw it up regularly, like writing
where Friday_employees like '%7%'
Umm, except what if there's an employee number 17 or 27? You'll get them too. You could say
where Friday_employees like '%,7,%'
But then if the 7 is the first or the last on the list, it doesn't work.
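The pitfalls above are easy to reproduce. This sketch uses Python's sqlite3 rather than MySQL, with made-up Friday rosters; the table names schedule_csv and schedule and the sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Method 1: comma-separated list per store
CREATE TABLE schedule_csv (store_id INTEGER, friday_employees TEXT);
INSERT INTO schedule_csv VALUES
    (1, '4,5,7'),    -- 7 is last in the list
    (2, '17,27'),    -- no employee 7 at all
    (3, '7,8'),      -- 7 is first in the list
    (4, '1,7,9');    -- 7 is in the middle
-- Method 2: one employee per row
CREATE TABLE schedule (store_id INTEGER, day TEXT, employee_id INTEGER);
""")

# Derive the normalized rows from the same data so both methods are comparable.
for store_id, csv in conn.execute("SELECT * FROM schedule_csv").fetchall():
    conn.executemany(
        "INSERT INTO schedule VALUES (?, 'fri', ?)",
        [(store_id, int(emp)) for emp in csv.split(",")],
    )

def stores(sql):
    """Run a query returning store ids and collect them in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))

loose = stores(
    "SELECT store_id FROM schedule_csv WHERE friday_employees LIKE '%7%'")
strict = stores(
    "SELECT store_id FROM schedule_csv WHERE friday_employees LIKE '%,7,%'")
correct = stores(
    "SELECT store_id FROM schedule WHERE day = 'fri' AND employee_id = 7")

print(loose)    # includes store 2, which only has employees 17 and 27
print(strict)   # misses stores 1 and 3, where 7 is last or first
print(correct)  # the normalized table gives the right answer
```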
What if you want the user to be able to select a day and then give them the list of employees working on that day?
With method 2, easy:
select employee_id from schedule where day=#day
Then you use a parameterized query to fill in the value.
With method 1 ...
select case when #day='mon' then Monday_employees when #day='tue' then Tuesday_employees when #day='wed' then Wednesday_employees when #day='thu' then Thursday_employees when #day='fri' then Friday_employees when #day='sat' then Saturday_employees when #day='sun' then Sunday_employees end as day_employees from schedule
That's a beast, and if you do it a lot, sooner or later you're going to make a mistake and leave a day out or accidentally type "when day='thu' then Friday_employees" or some such. I've seen that happen often enough.
Even if you write those long complex queries, performance will suck. If you have a field for employee_id, you can index on it, so access by employee will be fast. If you have a comma-separated list of employees, then a query of the "like '%,7,%'" variety requires a sequential scan of every record in the table.
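To see the difference in how the two queries are executed, you can ask the database for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN from Python; MySQL's EXPLAIN output looks different, and the exact plan text varies between SQLite versions, so treat this as an illustration of the principle only. The index name idx_schedule_employee and the helper plan are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schedule (store_id INTEGER, day TEXT, employee_id INTEGER);
CREATE INDEX idx_schedule_employee ON schedule (employee_id);
CREATE TABLE schedule_csv (store_id INTEGER, friday_employees TEXT);
""")

def plan(sql):
    """Return the human-readable plan steps (last column of each plan row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Equality on an indexed column: the planner can search the index.
indexed_plan = plan("SELECT day FROM schedule WHERE employee_id = 7")

# Leading-wildcard LIKE on a text column: nothing to search, so a full scan.
csv_plan = plan(
    "SELECT store_id FROM schedule_csv WHERE friday_employees LIKE '%,7,%'"
)

print(indexed_plan)
print(csv_plan)
```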

MySQL database organization for stocks

I am creating a web service to predict future stock prices based on historical data for each stock, and I need to store the following information in a database:
Stock information: company name, ticker symbol, predicted price
For each tracked stock: historical data including daily high, daily low, closing price etc. for every day going back 1-5 years.
User information: username, password, email, phone number (the usual)
User tracked stocks: users can pick and choose stocks and later be alerted to predictions for them via email or phone.
The set of stocks that predictions will be made on will not be predefined, so there should be a quick way to add and remove stocks and consequently add/remove all data (as stated above) connected to them. My approach to the design is the following:
Table: Stocks
+-----+-----------+----------+------------+----------+-------------+
| ID | Company | ticker | industry | Sector | Prediction |
+-----+-----------+----------+------------+----------+-------------+
Table: HistoricalPrices
+-------------------------------------+--------+--------+-------+----------+
| StockID(using stock ID from above) | Date | High | Low | Closing |
+-------------------------------------+--------+--------+-------+----------+
Table: Users
+-----+------------+------------+---------------+
| ID | Username | Password | PhoneNumber |
+-----+------------+------------+---------------+
Table: TrackedStock
+---------+----------+
| UserID | StockID |
+---------+----------+
Is there a better way of optimizing this organization? As far as queries are concerned, the majority will be done on the historical data, for each stock one at a time. (Please excuse any security issues, such as passwords not being salted and hashed, as the purpose of the question is the organization.)
Simply said: No. Though you may want to add the volume to the historical prices.
You may also want a market table and lookup tables for industry and sector. The prediction should probably live in a separate table with a date, so you can look back at past predictions.
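A small sketch of those suggestions, using Python's sqlite3 in place of MySQL: a sector lookup table and a dated prediction table. The table and column names, the ticker EXMP, and the sample figures are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sector (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE stock (
    id        INTEGER PRIMARY KEY,
    company   TEXT NOT NULL,
    ticker    TEXT NOT NULL UNIQUE,
    sector_id INTEGER REFERENCES sector (id)
);
-- One row per stock per prediction date, so past predictions are kept.
CREATE TABLE prediction (
    stock_id        INTEGER NOT NULL REFERENCES stock (id),
    predicted_on    TEXT NOT NULL,       -- ISO date
    predicted_price REAL NOT NULL,
    PRIMARY KEY (stock_id, predicted_on)
);
INSERT INTO sector VALUES (1, 'Technology');
INSERT INTO stock VALUES (1, 'Example Corp', 'EXMP', 1);
INSERT INTO prediction VALUES
    (1, '2024-01-02', 101.5),
    (1, '2024-01-03', 103.0);
""")

# Latest prediction per stock, for display or alerting.
latest = conn.execute("""
    SELECT s.ticker, p.predicted_on, p.predicted_price
    FROM prediction p
    JOIN stock s ON s.id = p.stock_id
    WHERE p.predicted_on = (
        SELECT MAX(predicted_on) FROM prediction WHERE stock_id = p.stock_id
    )
    ORDER BY s.ticker
""").fetchall()

# Full history for one stock, for looking back at past predictions.
history = conn.execute(
    "SELECT predicted_on, predicted_price FROM prediction "
    "WHERE stock_id = 1 ORDER BY predicted_on"
).fetchall()

print(latest)
print(history)
```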