Is there a programmatic way to determine what the most recent context is for an SEC Filing?

If I'm trying to read an SEC XBRL filing, is there a way to programmatically determine what the most recent context is? Or does the naming of a context follow a particular pattern?
For example, if I am trying to read the AAPL Q2 2020 SEC filing, there are many different contexts, e.g. FI2019Q4, FD2020Q2YTD, FI2020Q2, FD2019Q2YTD, FD2020Q2QTD, etc.
I just want the most recent quarterly numbers, i.e. FI2020Q2, but I also want a way to determine this programmatically so that I don't have to manually decide which context I'm interested in for every SEC filing.
Is there a systematic way to do this, or does the naming of the context follow a pattern?

For SEC filings, there should be a single dei:DocumentType fact. The period for this fact will correspond to the current reporting period. You can then find other facts that have the same period, or the same end date (for instant facts).
See section 6.5.19 of the EDGAR Filer Manual, Volume II.
The names of contexts are simply unique identifiers to allow them to be referenced by facts. Whilst some tools may follow a convention in assigning them, you should not attempt to infer any meaning from them.
Also be aware that not all facts for the same period will be in the same context. Some facts may have additional dimensions, and will therefore use a different context.
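The lookup described above can be sketched in Python with the standard library. This is only a minimal illustration, not a full XBRL processor: the sample instance is hypothetical and heavily trimmed (no units, no entity, no dimensions), and the helper names `context_periods` and `current_period_facts` are made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {"xbrli": "http://www.xbrl.org/2003/instance"}

# Hypothetical, heavily trimmed instance; dates and values are invented.
SAMPLE = """
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:dei="http://xbrl.sec.gov/dei/2019-01-31"
            xmlns:us-gaap="http://fasb.org/us-gaap/2019-01-31">
  <xbrli:context id="FD2020Q2YTD">
    <xbrli:period>
      <xbrli:startDate>2019-09-29</xbrli:startDate>
      <xbrli:endDate>2020-03-28</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:context id="FI2020Q2">
    <xbrli:period><xbrli:instant>2020-03-28</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:context id="FI2019Q4">
    <xbrli:period><xbrli:instant>2019-09-28</xbrli:instant></xbrli:period>
  </xbrli:context>
  <dei:DocumentType contextRef="FD2020Q2YTD">10-Q</dei:DocumentType>
  <us-gaap:Assets contextRef="FI2020Q2">100</us-gaap:Assets>
  <us-gaap:Assets contextRef="FI2019Q4">90</us-gaap:Assets>
</xbrli:xbrl>
"""


def context_periods(root):
    """Map each context id to (start, end); instants are stored as (None, date)."""
    periods = {}
    for ctx in root.findall("xbrli:context", NS):
        period = ctx.find("xbrli:period", NS)
        instant = period.findtext("xbrli:instant", namespaces=NS)
        if instant is not None:
            periods[ctx.get("id")] = (None, instant)
        else:
            periods[ctx.get("id")] = (
                period.findtext("xbrli:startDate", namespaces=NS),
                period.findtext("xbrli:endDate", namespaces=NS),
            )
    return periods


def current_period_facts(root):
    """Return the end date of the dei:DocumentType period and all facts ending on it."""
    periods = context_periods(root)
    doc_type = next(el for el in root if el.tag.endswith("}DocumentType"))
    _, current_end = periods[doc_type.get("contextRef")]
    facts = []
    for el in root:
        ref = el.get("contextRef")
        if ref is None:  # contexts and other non-fact elements
            continue
        _, end = periods[ref]
        if end == current_end:
            facts.append((el.tag.split("}")[1], ref, el.text))
    return current_end, facts


current_end, facts = current_period_facts(ET.fromstring(SAMPLE.strip()))
print(current_end, facts)
```

Matching on the end date alone also returns facts in dimensional contexts and durations of different lengths (QTD and YTD), so a real implementation would still need to filter those.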

The fact associated with the concept dei:DocumentPeriodEndDate has a dateTime value that corresponds to the last balance sheet date. There is another fact associated with the concept dei:CurrentFiscalYearEndDate that tells you the end of the fiscal period (day and month).
These two values together allow you to infer the quarterly and yearly periods.
For example, with this filing, we have:
Document period end date: 2020-06-26
Current fiscal year end date: --12-31
You can infer:
that this is quarter Q2
that the facts carrying the balance sheet values should have the instant period (date) 2020-06-26 (and there will be others with the instant periods 2019-12-31, 2019-06-26, ... if the previous periods are reported again)
that the facts carrying the income statement and cash flow statement values for Q2 should have the rough duration 2020-03-31 to 2020-06-26, for YTD 2020-01-01 to 2020-06-26, etc.
Note that you may need to add or remove a few days at the start and end of the periods, which requires a bit of trial-and-error code to find those facts that have "meaningful" periods attributable to Q2, FY, YTD, etc.
The resolution of the instant and duration periods associated with facts is indirect, meaning that the context id carried by the fact (like FI2019Q4) allows you to look up the context and find the period inside. I do not recommend trying to make sense of the context ids, because every filer may use a different convention. Rather, you need to dereference the context id and look at the actual XBRL periods.
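As a rough sketch of that inference in Python: the function names, the target lengths, and the 10-day tolerance are assumptions chosen for illustration, and filers with 52/53-week fiscal years may need a different tolerance or extra handling when a period end slips into the next calendar month.

```python
from datetime import date


def fiscal_quarter(period_end: date, fiscal_year_end: str) -> int:
    """Infer the fiscal quarter from dei:DocumentPeriodEndDate and the
    dei:CurrentFiscalYearEndDate value (a gMonthDay string such as '--12-31')."""
    fye_month = int(fiscal_year_end[2:4])
    # Number of months into the fiscal year (1..12), counting from the month
    # after the fiscal year end.
    months_into_year = (period_end.month - fye_month - 1) % 12 + 1
    return (months_into_year + 2) // 3  # ceiling of months / 3


def classify_duration(start: date, end: date, tolerance_days: int = 10):
    """Label a duration by its approximate length; the tolerance is a guess."""
    days = (end - start).days
    targets = (("quarter", 91), ("half-year", 182), ("nine-months", 273), ("year", 365))
    for label, target in targets:
        if abs(days - target) <= tolerance_days:
            return label
    return None


print(fiscal_quarter(date(2020, 6, 26), "--12-31"))
print(classify_duration(date(2020, 3, 31), date(2020, 6, 26)))
print(classify_duration(date(2020, 1, 1), date(2020, 6, 26)))
```

With the dates from the example above, the first call gives quarter 2, and the two durations are labelled as a quarter and a half-year (the YTD period for Q2).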

Record Master values in MySQL database?

I’m trying to figure out a good way of handling this situation strictly using MySQL. (using generated columns, views, logic statements, procedures, etc.)
I’ve simplified this for the example. Let’s say I have a table storing cost of production information for particular products in particular years in particular factories.
Some of these costs are specific for the product. (plastic, molding cost, packaging, labour, etc.)
And some of these costs are fairly generic; I may want to assign a specific value to them, but for many of them most of the time, I’ll just want them to refer to a particular value for the factory in that year. I’ll refer to these as “Master” values. (Such as overhead costs, so things like interest costs, electricity, heat, property taxes, admin labour, etc.)
Then if I update my Master values, the costs on these will automatically be adjusted; and they could be different for each year and factory. (So I can’t just use default values.)
So my columns might be:
And here’s the logic of how that would be defined:
$MValue(var) = var WHERE product_id = M (Master) AND year = year AND factory_id = factory_id;
Then essentially, if I wanted to use a unique cost for that product, I could put the cost amount in the field, but if I wanted it to use the Master value (shown on row 4, designated by M), then I could insert $MValue(column_id) in the field.
Any thoughts on how this could be accomplished with MySQL?
I should add that I'm already using generated (calculated) columns and views on these fields, which is why I'm looking for a strictly MySQL solution.
I suggest storing the derived costs as NULL in your product rows, and then defining a view that joins to the master row.
CREATE VIEW finalcosts AS
SELECT p.cost_id, p.product_id, p.factory_id, p.plastic_cost, p.molding_cost,
  COALESCE(p.interest_cost, m.interest_cost) AS interest_cost,
  COALESCE(p.tax_cost, m.tax_cost) AS tax_cost
FROM costs AS p
-- the master row is looked up per year and per factory, as the question requires
JOIN costs AS m ON p.year = m.year AND p.factory_id = m.factory_id AND m.product_id = 'M (Master)';
There's no way to use a default or a generated column to retrieve data from a different row. Those expressions must only reference values within the same row.
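To see the fallback behaviour end to end, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), the sample rows are invented, and the view additionally filters out the master row itself and joins on factory_id as well as year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE costs (
    cost_id       INTEGER PRIMARY KEY,
    product_id    TEXT,
    year          INTEGER,
    factory_id    INTEGER,
    plastic_cost  REAL,
    interest_cost REAL,   -- NULL means "use the master value"
    tax_cost      REAL
);
INSERT INTO costs VALUES
    (1, 'M (Master)', 2020, 1, NULL, 5.0,  2.0),
    (2, 'widget',     2020, 1, 1.5,  NULL, NULL),
    (3, 'gadget',     2020, 1, 2.5,  7.0,  NULL);

CREATE VIEW finalcosts AS
SELECT p.cost_id, p.product_id, p.plastic_cost,
       COALESCE(p.interest_cost, m.interest_cost) AS interest_cost,
       COALESCE(p.tax_cost, m.tax_cost)           AS tax_cost
FROM costs AS p
JOIN costs AS m
  ON m.year = p.year
 AND m.factory_id = p.factory_id
 AND m.product_id = 'M (Master)'
WHERE p.product_id <> 'M (Master)';
""")

QUERY = "SELECT product_id, interest_cost, tax_cost FROM finalcosts ORDER BY cost_id"
rows = conn.execute(QUERY).fetchall()

# Changing the master value is reflected in every row that did not override it.
conn.execute("UPDATE costs SET tax_cost = 3.0 WHERE product_id = 'M (Master)'")
updated = conn.execute(QUERY).fetchall()
print(rows)
print(updated)
```

The widget picks up both master values, while the gadget keeps its own interest cost and only inherits the tax cost.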
P.S.: Regarding the terminology of "master" values, I have been accustomed to the terms "direct costs" and "indirect costs." Direct costs are those that are easily attributed to per-unit costs of products, and indirect costs are like your master costs, they're attributed to the business as a whole, and they usually don't scale per unit produced.

How to store recent usage frequency in MySQL

I'm working on the Product Catalog module of an Invoicing application.
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalog.
How can I store this "usage recency/frequency" in the database?
I'm thinking about adding a new field recency which would be increased by 1 every time the product is used, and decreased by 1/(count of all products) when another product is used. I would then use this recency field for ordering, but it doesn't seem to me to be the best solution.
Can you tell me what the best practice is for this kind of problem?
Solution for the recency calculation:
Create a new column in the products table, named last_used_on for example. Its data type should be TIMESTAMP (the MySQL representation for the Unix-time).
Advantages:
Timestamps contain both date and time parts.
They make VERY precise calculations and comparisons on dates and times possible.
They let you format the saved values in the date-time format of your choice.
You can convert from any date-time format into a timestamp.
For your autocomplete field, they let you filter the product list as you wish: for example, display all products used since [date-time], fetch all products used between [date-time-1] and [date-time-2], or get the products used only on Mondays, at 1:37:12 PM, in the last two years, two months and three days (that's how flexible timestamps are).
Resources:
Unix-Time
The DATE, DATETIME, and TIMESTAMP Types
How should unix timestamps be stored in int columns?
How to convert human date to unix timestamp in Mysql?
Solution for the usage rate calculation:
Well, actually, you are not speaking about a frequency calculation, but about a rate - even though one can argue that frequency is a rate, too.
Frequency implies using the time as the reference unit and it's measured in Hertz (Hz = [1/second]). For example, let's say you want to query how many times a product was used in the last year.
A rate, on the other hand, is a comparison, a relation between two related units. Like for example the exchange rate USD/EUR - they are both currencies. If the comparison takes place between two terms of the same type, then the result is a number without measurement units: a percentage. Like: 50 apples / 273 apples = 0.1832 = 18.32%
That said, I suppose you tried to calculate the usage rate: the number of usages of a product in relation with the number of usages of all products. Like, for a product: usage rate of the product = 17 usages of the product / 112 total usages = 0.1517... = 15.17%. And in the autocomplete you'd want to display the products with a usage rate bigger than a given percentage (like 9% for example).
This is easy to implement. In the products table add a column usages of type int or bigint and simply increment its value each time a product is used. And then, when you want to fetch the most used products, just apply a filter like in this sql statement:
SELECT
id,
name,
(usages*100) / (SELECT sum(usages) as total_usages FROM products) as usage_rate
FROM products
GROUP BY id
HAVING usage_rate > 9
ORDER BY usage_rate DESC;
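A runnable sketch of the same idea, using Python's sqlite3 module in place of MySQL (an assumption for the sake of a self-contained example). The product names are invented, the counts echo the 17-out-of-112 example above, and the rate is filtered in a WHERE clause rather than HAVING so the query does not depend on alias handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT, usages INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO products (name, usages) VALUES (?, ?)",
    [("pen", 16), ("paper", 90), ("stapler", 5)],
)


def record_usage(conn, product_id):
    """Increment the counter each time a product is put on an invoice."""
    conn.execute("UPDATE products SET usages = usages + 1 WHERE id = ?", (product_id,))


def popular_products(conn, min_percent):
    """Products whose share of all usages exceeds min_percent, most used first."""
    return conn.execute(
        """
        SELECT name,
               usages * 100.0 / (SELECT SUM(usages) FROM products) AS usage_rate
        FROM products
        WHERE usages * 100.0 / (SELECT SUM(usages) FROM products) > ?
        ORDER BY usage_rate DESC
        """,
        (min_percent,),
    ).fetchall()


record_usage(conn, 1)  # pen: 16 -> 17 usages, 112 in total
result = popular_products(conn, 9)
print(result)
```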
Here's a little study case:
In the end, recency, frequency and rate are three different things.
Good luck.
To allow for future flexibility, I'd suggest the following additional (*) table to store the entire history of product usage by all users:
Name: product_usage
Columns:
id - internal surrogate auto-incrementing primary key
product_id (int) - foreign key to product identifier
user_id (int) - foreign key to user identifier
timestamp (datetime) - date/time the product was used
This would allow the query to be fine tuned as necessary. E.g. you may decide to only order by past usage for the logged in user. Or perhaps total usage within a particular timeframe would be more relevant. Such a table may also have a dual purpose of auditing - e.g. to report on the most popular or unpopular products amongst all users.
(*) assuming something similar doesn't already exist in your database schema
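A minimal sketch of such a history table and one possible suggestion query, again using Python's sqlite3 module in place of MySQL. The column is called `used_at` here instead of `timestamp`, and the sample rows, the `suggestions` helper, and the ordering (usage count first, then last use) are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product_usage (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        product_id INTEGER NOT NULL,
        user_id    INTEGER NOT NULL,
        used_at    TEXT    NOT NULL   -- ISO date/time the product was used
    )
    """
)
conn.executemany(
    "INSERT INTO product_usage (product_id, user_id, used_at) VALUES (?, ?, ?)",
    [
        (1, 7, "2024-01-05 09:00:00"),
        (2, 7, "2024-03-01 10:00:00"),
        (1, 7, "2024-03-02 11:00:00"),
        (3, 8, "2024-03-03 12:00:00"),
        (2, 7, "2023-06-01 08:00:00"),
    ],
)


def suggestions(conn, user_id, since, limit=10):
    """Products one user used since a given time, most used (then most recent) first."""
    return conn.execute(
        """
        SELECT product_id, COUNT(*) AS uses, MAX(used_at) AS last_used
        FROM product_usage
        WHERE user_id = ? AND used_at >= ?
        GROUP BY product_id
        ORDER BY uses DESC, last_used DESC
        LIMIT ?
        """,
        (user_id, since, limit),
    ).fetchall()


top = suggestions(conn, 7, "2024-01-01 00:00:00")
print(top)
```

Dropping the `user_id` condition gives the overall ranking across all users, and changing `since` changes the time window.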
Your problem is related to many other web-scale search applications, such as showing spell corrections, related searches, or "trending" topics. You recognized correctly that both recency and frequency are important criteria in determining "popular" suggestions. In practice, it is desirable to compromise between the two: recency alone will suffer from random fluctuations, but you also don't want to use only frequency, since some products might have been purchased a lot in the past while their popularity is declining (or they might have gone out of stock or been replaced by successor models).
A very simple but effective implementation that is typically used in these scenarios is exponential smoothing. First of all, most of the time it suffices to update popularities at fixed intervals (say, once each day). Set a decay parameter α (say, .95) that tells you how much yesterday's orders count compared to today's. Similarly, orders from two days ago will be worth α*α ≈ .9 times as much as today's, and so on. To estimate this parameter, note that the value decays to one half after log(.5)/log(α) days (about 14 days for α=.95).
The implementation only requires a single additional field per product, orders_decayed. Then, all you have to do is to update this value each night with the total daily orders:
orders_decayed = α * orders_decayed + (1-α) * orders_today.
You can sort your applicable suggestions according to this value.
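The update rule and the half-life estimate can be sketched in a few lines of Python. The function names and the two simulated order histories are made up for illustration.

```python
import math


def half_life_days(alpha):
    """Days until an old order's weight has decayed to one half."""
    return math.log(0.5) / math.log(alpha)


def nightly_update(orders_decayed, orders_today, alpha=0.95):
    """One nightly step of exponential smoothing."""
    return alpha * orders_decayed + (1 - alpha) * orders_today


def simulate(daily_orders, alpha=0.95):
    """Apply the nightly update over a sequence of daily order counts."""
    score = 0.0
    for orders_today in daily_orders:
        score = nightly_update(score, orders_today, alpha)
    return score


# A product with one burst of 100 orders a month ago, versus a product
# that steadily sells 10 per day.
burst = simulate([100] + [0] * 30)
steady = simulate([10] * 31)
print(half_life_days(0.95), burst, steady)
```

In this simulation the steady seller ends up with the higher score, which is the behaviour the decay is meant to produce.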
To have an individual user experience, you should not rely on a field in the product table, but rather on the history of the user.
The occurrences of the product in past invoices created by the user would be a good starting point. The advantage is that you don't need to add fields or tables for this functionality. You simply rely on data that is already present anyway.
Since it is an auto-complete field, maybe past usage is not really relevant. Display n search results as the user types. If you feel that results are better if you include recency in the calculation of the order, go with it.
Now, the implementation may differ depending on how and when the products should be displayed, and on whether the usage frequency has to be user-specific or application-wide (overall). In both cases, I would suggest having a history table, which you can later use for other analysis.
You could design your history table with at least the columns below:
Id | ProductId | LastUsed (timestamp) | UserId
Now you can create a view which queries this table for a specific time range (something like product frequency over the last week, last month or last year) and gives you the most sold products for that time range.
The same can be used for user-specific frequency by adding a condition that filters by UserId.
I'm thinking about adding a new field recency which would be increased
by 1 every time the product was used, and decreased by 1/(count of all
products), when another product is used. Then use this recency field
for ordering, but it doesn't seem to me the best solution.
Yes, adding a column for this and updating it on every use is not good practice. Imagine this product is highly anticipated and people love to buy it. If 1000 or more people request it at the same time, every request updates the same row; to maintain concurrency, the database has to lock that row for each update, which will hurt your database and application performance. Instead, you can simply insert a new row.
The other possible solution is to use your existing invoice table, as it will already have all the product- and user-specific information, and to create a view that returns the frequently used products as I mentioned above.
Please note that this is just another option to achieve what you want. Personally, I would recommend having a history table instead.
The scenario
When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalogue.
your suggested solution
How can I store this "usage recency/frequency" in the database?
If it is a web application, don't store it in a database on your server. Each user has different preferences.
Store it in the user's browser as a cookie or in localStorage, because it will improve the user experience.
If you still want to store it in a MySQL table, do the following:
Create a column recency, as described in the question.
Each time the item is used, increase the count by 1, as described in the question.
Don't decrease it when other items get used.
To get the most-used item, run this query:
SELECT * FROM products WHERE recency = (SELECT MAX(recency) FROM products);
Side note
Use the database only if you want to show the most-used products regardless of the user.
As you aren't certain which measure to choose, and it's rather a user-experience problem, I advise you to keep a number of measures and give the user an option to choose the one he/she prefers. For example, the set of available measures could include most popular product last week, last month, last 3 months, last year, and overall total. For the sake of performance, I'd prefer to store those statistics in a separate table which is refreshed by a scheduled job running, for example, every 3 hours.

time slot database design

I am creating a database which needs to allow booking a resource from a start time to an end time on a particular day. For example, I have 11 badminton courts. These courts can be booked for 1 hour (this can also vary), and in a day each court takes 18 bookings from 6 am until midnight (assuming each booking is for one hour). The price of a booking also varies: for example, morning charges are higher than daytime charges, and weekend charges are higher than weekday charges.
Now my question is: is it advisable to pre-populate slots and then book them for users depending on availability? In that case, for the above example, if I need to store slots for the next month, I will have to store 11*18*30 = 5940 records in advance without any real bookings. Every midnight I will need to run a script to create slots. If the number of clubs increases, this number can become huge. Is this a good design for such systems? If not, what are better designs in these scenarios?
club name || court || date       || start_time || end_time || status    || charge ||
a            c1       20/04/2015    6:00          7:00        available
a            c1       20/04/2015    7:00          8:00        available
.
.
.
a            c1       20/04/2015    23:00         24:00       available
.
.
a            c11      20/04/2015    23:00         24:00       available
Now my question is: is it advisable to pre-populate slots and then book them for users depending on availability? In that case, for the above example, if I need to store slots for the next month, I will have to store 11x18x30 = 5940 records in advance without any real bookings. Every midnight I will need to run a script to create slots. If the number of clubs increases, this number can become huge.
Yes, that is a horrible method, for the reasons you have stated, plus many more.
The storage of non-facts is absurd
The storage of masses of non-facts cannot be justified
If the need to write simple code is an issue, deal with that squarely, and elevate your coding skills, such that it isn't an issue (instead of downgrading the database to a primitive filing system, in order to meet your coding skills).
Notice that what you are suggesting is a calendar for each court (which is not unreasonable as a visualisation, or as a result set), in which most of the slots will be empty (available).
Is this good design for such systems?
No, it is horrible.
It is not a design. It is an implementation without a design.
If not, what are better designs in these scenarios?
We use databases. And given its unequalled position, and your platform, specifically a Relational Database.
We store only those Facts that you need, about the real world that you need to engage with. We need to get away from visualising the thing we need for the job we have to do (thousands of calendars, partially empty) and think of the data, as data, and only as data. Including all the rules and constraints.
Following that, the determination of Facts, or the absence of a Fact, is dead easy. I can give you the Relational Database that you will need, but you have to be able to write SQL code, in order to use the database effectively.
Data Model
Try this:
Resource Reservation Data Model
That is an IDEF1X data model. IDEF1X is the Standard for modelling Relational Databases. Please be advised that every little tick; notch; and mark; the crows foot; the solid vs dashed lines; the square vs round corners; means something very specific and important. Refer to the IDEF1X Notation. If you do not understand the Notation, you will not be able to understand or work the model.
I have included:
Storage of Facts (Reservations) only. The non-fact or absence of a Fact (Availability) is easy enough to determine.
club_resource_slot.duration in the Key to allow any duration, rather than assuming one hour, which may change. It is required in any case, because it delimits the time slot.
resource_code, rather than court number. This allows any club resource (as well as a court number) to be reserved, rather than only a badminton or squash court. You may have meeting rooms in the future.
Joel's reply re the rate table is quite correct in terms of answering that specific question. I have given a simpler form in the context of the rest of the model (less Normalised, easier to code).
If you would like the Predicates, please ask.
Code/General
You seem to have problems with some aspects of coding, which I will address first:
But the problem with this approach is that if I need to find the availability of a court based on game, location, date and time slot, then I will have to load this rate table for all the clubs and then look into the actual booking table to see whether someone has already booked the slots. Wouldn't it be a better approach to keep the slots in advance and, when someone books, just change the status to booked? That way the query would be performed entirely in the DB without doing any computation in memory.
The existence of the rate table, or not, does not create an issue. That can be accomplished via a join. The steps described are not necessary.
Note that you do not need to "load this whole table" as a matter of course, but you may have to load one table or other in order to populate your drop-downs, etc.
When someone books a court, simply INSERT reservation VALUES ()
When someone cancels a reservation, simply DELETE FROM reservation WHERE (the key of that reservation)
Code/Data Model
Printing your matrix of Reserved slots should be obvious, it is simple.
Printing your matrix of Available or Available plus Reserved (your calendar visual) requires Projection. If you do not understand this technique, please read this Answer. Once you understand that, the code is as simple as [1].
You need to be able to code Subqueries and Derived tables.
Determination of whether a slot is Reserved or Available requires a simple query. I will supply some example code to assist you. "Game" isn't specified, and I will assume location means club.
IF (
SELECT COUNT(*) -- resources/courts reserved
FROM reservation
WHERE club_code = $club_code
AND date_time = $date_time
) = 0
THEN PRINT "All courts available"
ELSE IF (
SELECT COUNT(*) -- resources/courts that exist
FROM club_resource_slot
WHERE club_code = $club_code
AND date_time = $date_time
) = (
SELECT COUNT(*) -- resources/courts reserved
FROM reservation
WHERE club_code = $club_code
AND date_time = $date_time
)
THEN PRINT "All courts reserved"
ELSE PRINT "Some courts available"
Please feel free to comment or ask questions.
Assuming that each booking is for one hour (that is, if someone wants two hours on the court, they're taking two bookings of one hour each), it seems to me the most efficient storage mechanism would be a table Booking with columns Court, Date, and Hour (and additional columns for the person who booked, payment status, etc.). You would insert one record each time a court was booked for an hour.
This table would be sparsely populated, in that there would only be records for the booked hourly units, not for the available ones. No data would be pre-generated; you would only create records when a booking occurred.
To produce a daily or weekly calendar, your application would retrieve the booked hours from the database and join them with its knowledge of your hours (6am to midnight) to produce a visualization of court availability.
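A minimal sketch of this sparse approach, using Python's sqlite3 module as a stand-in for the real database. The table layout, the court names, and the helper functions are assumptions for illustration; the opening hours (6 am to midnight, 18 one-hour slots) and the 11 courts come from the question.

```python
import sqlite3

OPEN_HOURS = range(6, 24)                   # 18 one-hour slots, 6:00 to 24:00
COURTS = [f"c{n}" for n in range(1, 12)]    # 11 courts, c1 .. c11

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE booking (
        court    TEXT    NOT NULL,
        date     TEXT    NOT NULL,
        hour     INTEGER NOT NULL,
        customer TEXT,
        PRIMARY KEY (court, date, hour)   -- prevents double booking
    )
    """
)


def book(conn, court, day, hour, customer):
    """Insert a booking; returns False if the slot is already taken."""
    try:
        conn.execute("INSERT INTO booking VALUES (?, ?, ?, ?)", (court, day, hour, customer))
        return True
    except sqlite3.IntegrityError:
        return False


def available_slots(conn, day):
    """Derive availability: every (court, hour) that has no booking row."""
    booked = set(conn.execute("SELECT court, hour FROM booking WHERE date = ?", (day,)))
    return [(c, h) for c in COURTS for h in OPEN_HOURS if (c, h) not in booked]


first = book(conn, "c1", "2015-04-20", 6, "alice")
second = book(conn, "c1", "2015-04-20", 6, "bob")   # same slot again
free = available_slots(conn, "2015-04-20")
print(first, second, len(free))
```

Only one row is stored for the day, and the remaining slots are computed on demand instead of being pre-generated.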
It is probably much more efficient from a data maintenance perspective to have a table with courts (1 record per court) and a table with bookings (1 record per booking).
The BOOKING record should have a foreign key to the COURT, a booking start date/time, and a booking end date/time. It would also have information about who made the booking, which could be a foreign key to a CUSTOMER table or a fill-in name, etc., depending on how your business works.

Designing my first database schema: suggestion needed [closed]

This is my first database schema design. I am trying to develop a small web application for my department which will be used for food cost management. I am doing this for learning purposes.
How the food cost management works in my department:
Total members: 15
One admin keeps a record of all the costs. He will update the database on a daily basis.
Each member can order only once a day. If anyone has guests on a specific day, he can order multiple meals.
Usually members pay their bill for a week or two in advance.
One or two persons are responsible for bringing the food from outside, and they don't need to pay for their lunch. Their transportation cost is also reimbursed. Their food cost + transportation cost is distributed equally among the other 15 members' expenses.
Database queries:
From admin perspective:
He will manage/add the daily orders. (Table: orders)
He will add the payments for all the members, which will be credited against the respective member's "Balance". (Table: payments)
He will be able to see an overview of all members' order/cost history and their current balance in a chart for one month at a time.
If any member has a negative balance, or less than a specific amount of money, a notification will appear on the admin dashboard.
From member perspective:
He will be able to see his current balance and order/cost history for the last month at a time.
He will be able to see the last x payments he made.
Based on the queries I mentioned above I tried to design a database schema which looks like the diagram below:
Elaboration of some attributes:
EPlatenum: Number of extra plates of food brought in addition to the plates ordered.
Eplatecost: Cost of the extra plates of food. This cost is distributed equally among the 15 members' individual costs.
EPersonnum & EPersoncost: Number of extra persons involved in bringing the food and their total cost. This cost is distributed equally among the 15 members' individual costs.
TransCost: Transportation cost. This cost is distributed equally among the 15 members' individual costs.
Questions:
What mistakes have I made, and how can I overcome them?
For my DailyList table I have used "date" as the primary key. Is it OK to use a date as the primary key? If not, what could the primary key be instead?
When I populate a chart overview of the 30 months cost/order history, the database query will be huge, I assume. What approach should I take to optimize the query?
I am looking forward to your suggestions on improving the database schema. Please help me find and correct my design mistakes. Thank you for your patience.
My first impression:
I think payment should be related to order (because the user pays for a specific order).
I don't know what DailyList is, but if there can be more than one row with the same date (and I imagine there can be), you shouldn't use the date as a primary key.
Passwords should be hashed, e.g. with SHA (so VARCHAR(15) is too short).

Retention Tracking

Let’s say I have an Angry Birds game.
I want to know how many players buy the 'mighty eagle' weapon each month, out of the players who bought the mighty eagle weapon in previous months of their LTV in the system.
I have the dates of all items bought by each client.
What I would like to have in practice is a two-dimensional matrix that tells me what percentage of players moved from LTV_month_X to LTV_month_Y, for each combination of X<Y, for a specific current month.
An example:
example_png
(I couldn't put the picture inline, so please follow the link to see it.)
Now, I have found a way to get the number of players who actually moved from LTV_month_X to LTV_month_Y, where LTV_month_Y is their current month of activity within the system, using an SQL query and an Excel pivot table.
What I am mainly trying to find out is how to get the base number of those who could potentially make that transition.
A few definitions:
LTV_month_X = DATEDIFF(MONTH, first_eagle_month, specific_eagle_month)+1
Preferably I would like the solution in ANSI SQL; if not, then MySQL or MSSQL, but no Oracle functions should be used at all.
Since I'm looking for the percentage of the transition, a two-step plan could also work: first find the potential ones, then find the actual ones who moved, to measure the retention from LTV_month_X to LTV_month_Y.
One last issue: I need to be able to drill down and find the actual IDs of the clients who moved from any stage X to any other stage Y (>X).
The use of the term LTV here is not clear. I assume you mean the lifetime of the user.
If I understand the question, you are asking, based on a list of entities each with one or more events, how do I group (e.g. count) the entities by the month of the last event and the month of the one before last event.
In MySQL, you can use a variable to do that. I'm not going to explain the whole concept, but basically, when you write @var:=column within a SELECT statement, the variable is assigned the value of that column, and you can use it to compare values between consecutive rows, e.g.
LEAST(IF(@var=column,@same:=@same+1,@same:=0),@var:=column)
The use of LEAST is a trick to ensure execution order.
The two dimensions you are looking for are
Actual purchase month
Relative purchase month
SELECT
  player_id,
  TRUNCATE(first_purchase_date, 'MM') AS first_month,
  TRUNCATE(current_purchase_date, 'MM') AS purchase_month,
  months_between(current_purchase_date, first_purchase_date)+1 AS relative_month,
  SUM(purchase_amount) AS total_purchase,
  COUNT(DISTINCT player_id) AS player_count
FROM ...
Now you can pivot purchase month to relative month and aggregate.
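To make the two-step idea (potential versus actual movers) concrete, here is a sketch in plain Python rather than SQL. It is one possible reading of the question, not a definitive method: a player counts as a potential mover from X to Y if they purchased in LTV month X and their LTV month Y is not later than the current month. The sample purchases and function names are invented.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Calendar month as a single integer, so month differences are subtractions."""
    return d.year * 12 + d.month


def ltv_months(purchases):
    """Per player: first purchase month, and the set of LTV months with a purchase
    (LTV month = month difference from the first purchase month, plus 1)."""
    first = {}
    for player, d in purchases:
        first[player] = min(first.get(player, month_index(d)), month_index(d))
    active = defaultdict(set)
    for player, d in purchases:
        active[player].add(month_index(d) - first[player] + 1)
    return first, active


def retention(purchases, current):
    """Sets of player ids who could have moved, and who did move, from X to Y."""
    first, active = ltv_months(purchases)
    now = month_index(current)
    potential, actual = defaultdict(set), defaultdict(set)
    for player, months in active.items():
        max_reachable = now - first[player] + 1
        for x in months:
            for y in range(x + 1, max_reachable + 1):
                potential[(x, y)].add(player)
                if y in months:
                    actual[(x, y)].add(player)
    return potential, actual


purchases = [
    ("A", date(2020, 1, 15)), ("A", date(2020, 2, 10)), ("A", date(2020, 3, 5)),
    ("B", date(2020, 1, 20)), ("B", date(2020, 3, 1)),
    ("C", date(2020, 2, 2)),
]
potential, actual = retention(purchases, current=date(2020, 3, 31))
for key in sorted(potential):
    rate = len(actual[key]) / len(potential[key])
    print(key, sorted(actual[key]), sorted(potential[key]), rate)
```

Because the matrix cells hold sets of player ids rather than counts, the drill-down to the actual clients for any (X, Y) pair comes for free.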