Multiple models in Database from a Single DBT model - jinja2

I am thinking of publishing two models to the database from one model in dbt. One model will have a filter applied and the other will not. Have you ever created a model like this with Jinja? It would be very helpful if you could share some examples or resources.
I would rather not create two models, since they are identical apart from the filter. The idea is to keep one SQL file and publish two models with different names to the database.
I look forward to hearing your suggestions.
Many thanks!

Short answer is you cannot do this with a single model, but there are lots of good alternatives. See this question for a discussion about why you can't produce multiple assets from a single dbt model.
In this instance, since you're just applying a filter, I would just create two models, with one selecting from the other. This is a bread-and-butter use case of dbt's DAG.
unfiltered.sql
select
-- your logic goes here
...
filtered.sql
select *
from {{ ref('unfiltered') }}
where
-- your filter goes here
...
If that doesn't work, and the models share a lot of logic, but not all of it, I would wrap the common bits in a macro, and then invoke the macro in both unfiltered.sql and filtered.sql (as in the other answer I linked to).

Thank you very much for giving the idea.
I also have another solution that I would like to share.
1. Create the table without the filter.
2. In a post-hook, clone the created table.
3. Delete the rows that match the filter condition from the cloned table.
The post_hook config makes this possible.
The model config looks like this:
{{ config(
    materialized='table',
    post_hook=[
        "create or replace table table_filtered clone {{this}}",
        "delete from table_filtered where 1=1 and filter=true"
    ]
) }}
By cloning the table, we can keep the descriptions at both column and table level.

Related

Database design for keeping track of experiment data

I am designing a database to record experiment results. An experiment has several input parameters and an output response, so the data table will look like the following:
run_id parameter_1 parameter_2 ... parameter_n response
1 ... ... ... ...
2 ... ... ... ...
.
.
.
However, the structure of this table is not fixed, since different experiments have different numbers of columns. So the question is: when a user instantiates an experiment, is it a good idea to create the data table dynamically on the fly? If not, what is an elegant solution? Thanks.
When I find myself trying to dynamically create tables at runtime, it usually means I need another table to resolve a relationship between entities. In short, I would recommend treating your input parameters as a separate entity and storing them in a separate table.
It sounds like your entities are:
experiment
runs of an experiment, which consist of a response and one or more:
input parameters
The relationships between the entities are:
One experiment to zero or more runs
One run to one or more input parameter values (one to many)
This last relationship will require an additional table to resolve. You can have a separate table that stores your input parameters, and associate the input parameters with a run_id. This table could look like:
run_parameter_id ... run_id_fk ... parameter_keyword ... parameter_value
Where run_id_fk is a foreign key to the appropriate row in the Runs table (described in your question). The parameter_keyword is just used to keep track of the name of the parameter (parameter_1_exp1, parameter_2_exp1, etc).
Your queries to read from and write to the database now become a bit more complicated (they need a join), but they no longer rely on creating tables on the fly.
Let me know if this is unclear and I can provide a potential database diagram.
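The layout described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration under stated assumptions: the table names experiment, run and run_parameter, the REAL type for parameter values, the UNIQUE constraint, and the helper names record_run and load_run are all hypothetical choices, not something given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE run (
    run_id        INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    response      REAL
);
-- One row per (run, parameter); assumes numeric parameter values.
CREATE TABLE run_parameter (
    run_parameter_id  INTEGER PRIMARY KEY,
    run_id_fk         INTEGER NOT NULL REFERENCES run(run_id),
    parameter_keyword TEXT NOT NULL,
    parameter_value   REAL,
    UNIQUE (run_id_fk, parameter_keyword)
);
""")


def record_run(conn, experiment_id, parameters, response):
    """Insert one run plus one run_parameter row per input parameter."""
    cur = conn.execute(
        "INSERT INTO run (experiment_id, response) VALUES (?, ?)",
        (experiment_id, response),
    )
    run_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO run_parameter (run_id_fk, parameter_keyword, parameter_value) "
        "VALUES (?, ?, ?)",
        [(run_id, key, value) for key, value in parameters.items()],
    )
    return run_id


def load_run(conn, run_id):
    """Return (parameters dict, response) for a single run."""
    response = conn.execute(
        "SELECT response FROM run WHERE run_id = ?", (run_id,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT parameter_keyword, parameter_value "
        "FROM run_parameter WHERE run_id_fk = ?",
        (run_id,),
    ).fetchall()
    return dict(rows), response


# Two experiments with different numbers of parameters share the same tables.
exp_a = conn.execute("INSERT INTO experiment (name) VALUES (?)", ("exp_a",)).lastrowid
exp_b = conn.execute("INSERT INTO experiment (name) VALUES (?)", ("exp_b",)).lastrowid
run_a = record_run(conn, exp_a, {"temperature": 300.0, "pressure": 1.5}, 0.82)
run_b = record_run(conn, exp_b, {"dose": 10.0, "duration": 4.0, "ph": 7.2}, 12.4)
```

The trade-off is that all parameter values share one column type, so mixed numeric and text parameters would need a different value column or some validation in application code.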

Separate get request and database hit for each post to get like status

So I am trying to build a social network in Django. As in any other social network, users can like a post, and each like is stored in a model separate from the model used for the posts that show up in the news feed. I have tried two approaches to get the like status on the fly.
1. Fewest database hits:
Make one SQL query and get the like entries for all post IDs, where they exist. I then use a custom Django template tag to check whether a like entry for the current post exists in the QuerySet, by searching a list that contains the like statuses of all posts.
This way I use the database to fetch all the values and search for a particular value in the list using Python.
2. Separate database query for each post:
Here I use the same custom template tag, but rather than searching through a QuerySet I let the MySQL database do most of the heavy lifting.
I use model.objects.get() for each entry.
Which is the more efficient approach? Also, I was planning on getting a separate database server; would that change the choice if network latency is only around 0.1 ms?
Is there any way to get these like statuses as boolean values along with all the posts in a single database query?
An example query for the first method, where post_list is the post QuerySet:
models.likes.objects.filter(user=current_user, post__in=post_list)
This is not a direct answer to your question, but I hope it is useful nonetheless.
and each of these likes are stored in a model that is different from the model used for news feed
I think you have a design issue here. It is better to create a model that describes a post, and then add a field users_that_liked_it as a many-to-many relationship to your user model. Then you can do something like post.users_that_liked_it and get a query set of all users that liked that post.
In my eyes you should also avoid putting logic in templates as much as possible. They are simply not made for it. As a rule of thumb, logic belongs in the model class or, if it depends on the page visited, in the view.
Lastly, if performance is your main worry, you probably shouldn't be using Django anyway. It is just not that fast. What Django gives you is the ability to write clean, concise code. This is much more important for a new project than performance. Ask yourself: How many (personal) projects fail because their performance is bad? And how many fail because the creator gets caught in messy code?
Here is my advice: Favor clarity over performance. Especially in a young project.
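On the question of fetching the like status as a boolean together with the posts in one query: at the SQL level this is commonly done with a correlated EXISTS subquery, which in Django is usually written as an annotation built from Exists and OuterRef. The sketch below uses Python's sqlite3 module instead of Django so that it stays self-contained; the post and post_like tables, their columns, and the function name feed_with_like_status are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
CREATE TABLE post_like (
    user_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL REFERENCES post(id),
    PRIMARY KEY (user_id, post_id)
);
INSERT INTO post (id, body) VALUES (1, 'first'), (2, 'second'), (3, 'third');
INSERT INTO post_like (user_id, post_id) VALUES (7, 1), (7, 3), (8, 2);
""")


def feed_with_like_status(conn, user_id):
    """Return (post_id, body, liked) for every post, using a single query."""
    rows = conn.execute(
        """
        SELECT p.id,
               p.body,
               EXISTS (
                   SELECT 1
                   FROM post_like AS l
                   WHERE l.post_id = p.id AND l.user_id = ?
               ) AS liked
        FROM post AS p
        ORDER BY p.id
        """,
        (user_id,),
    ).fetchall()
    # SQLite returns 0/1 for EXISTS; convert to a Python bool.
    return [(post_id, body, bool(liked)) for post_id, body, liked in rows]


feed = feed_with_like_status(conn, 7)
```

With this shape the template only reads a precomputed boolean per post, so no per-post lookup is needed there.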

Split a table in access

I feel like this should be an easy question for someone to answer, but despite numerous searches I have not been able to find an answer (probably due to my lack of database knowledge).
The problem: I am building a research database related to clinical examinations. I have created a main table and a couple of additional tables that have one-to-many relationships to the main table (think multiple findings documented in one examination, etc.). I have successfully created a main form with two embedded subforms. This functions as expected.
What I would like to do is break the main table up into three individual tables where there are logical differences between groups of fields. This will make the database easier to revise later on and will make it easier to find fields.
I would like a record to be created in the two related tables every time I create a record in the first table, but I cannot work out how to achieve this. When I go into the form, I would like checkboxes from all three tables to be displayed and editable on the same page and kept in sync by the RecordID of the main table.
Any help or direction to an example dB would be greatly appreciated.
Regards,
James

Is it better to store recursive data in one OR two tables?

Goal: For a simple to-do app, tasks and possible subtasks need to be stored.
Is it "better" to have one table that uses a recursive relation, or to use two tables? What are the advantages and disadvantages in your opinion, for example positive or negative effects on performance, usability, etc.? Is it even correct to use the recursive one this way?
Model 1: Tasks and subtasks in two tables. More subtask levels are not necessary.
Model 2: Tasks and subtasks in one table. By the way, is it correct that this design allows unlimited subtask levels (apart from technical boundaries), i.e. task-subtask-subtask-...?
I am not sure why you form your question this way and what confuses you.
A classic example of a database is one that stores employees. In the employees table you also store managers, since managers are also employees. So what you describe as model 2 is not something "weird".
Self join is a common query.
Try to define the tables in a way that will make your queries as simple as possible and your model easy to understand and extend.
In your case you should define a second table only if each subtask has extra information that other tasks do not.
In your model 1 as you describe it you just duplicate the columns of your main table. This is not a good design IMO.
As far as I can see model 2 fits what you are trying to do.
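Model 2 can be sketched with Python's sqlite3 module as a single table whose parent_id column refers back to the same table. The table name task, its columns, the sample rows, and the helper names are hypothetical; the recursive query is just one common way to walk an arbitrary number of subtask levels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task (
    task_id   INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES task(task_id),  -- NULL for top-level tasks
    title     TEXT NOT NULL
);
INSERT INTO task (task_id, parent_id, title) VALUES
    (1, NULL, 'Plan trip'),
    (2, 1,    'Book flights'),
    (3, 1,    'Book hotel'),
    (4, 2,    'Compare prices');
""")


def subtasks(conn, task_id):
    """Titles of the direct subtasks of one task."""
    rows = conn.execute(
        "SELECT title FROM task WHERE parent_id = ? ORDER BY task_id",
        (task_id,),
    )
    return [row[0] for row in rows]


def descendants(conn, task_id):
    """(title, depth) for all subtasks at any level, via a recursive CTE."""
    return conn.execute(
        """
        WITH RECURSIVE tree(task_id, title, depth) AS (
            SELECT task_id, title, 1 FROM task WHERE parent_id = ?
            UNION ALL
            SELECT t.task_id, t.title, tree.depth + 1
            FROM task AS t
            JOIN tree ON t.parent_id = tree.task_id
        )
        SELECT title, depth FROM tree ORDER BY depth, task_id
        """,
        (task_id,),
    ).fetchall()


direct = subtasks(conn, 1)
nested = descendants(conn, 1)
```

If only one level of subtasks is ever needed, the simple parent_id lookup in subtasks is enough and the recursive query can be left out.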

How do I properly structure my relational mySQL database

I am making a database that is for employee scheduling. I am, for the first time ever, making a relational mySQL database so that I can efficiently manage all of the data. I have been using the mySQL Workbench program to help me visualize how this is going to go. Here is what I have so far:
What I have pictured in my head is that, based on the drawing, I would set the schedule in the schedule table, which uses references to the other tables as shown. Then, when I need to display this schedule, I would pull everything from the schedule table. Whenever I've worked with a database in the past, it hasn't been normalized, so I would just enter the data into one table and then pull the data out of that one table. Now that I'm tackling a much larger project, I am sure that having all of the tables split (normalized) like this is the way to go, but I'm having trouble seeing how everything comes together in the end. I have a feeling it doesn't work the way I have it pictured. @grossvogel pointed out what I believe is critical to making this all work: using joins to pull the data.
The reason I started with a relational database was so that if I made a change to (for example) the shift table and instead of record 1 being "AM" I wanted it to be "Morning", it would then automatically change the relevant sections through the cascade option.
The reason I'm posting this here is because I am hoping someone can help fill in the blanks and to point me in the right direction so I don't spend a lot of hours only to find out I made a wrong turn at the beginning.
Maybe the piece you're missing is the idea of using a query with joins to pull in data from multiple tables. For instance (just incorporating a couple of your tables):
SELECT Dept_Name, Emp_Name, Stat_Name ...
FROM schedule
    INNER JOIN departments ON schedule.Dept_ID = departments.Dept_ID
    INNER JOIN employees ON schedule.Emp_ID = employees.Emp_ID
    INNER JOIN status ON schedule.Stat_ID = status.Stat_ID
    ...
WHERE ....
Note also that a schedule table that contains all of the information needed to be displayed on the final page is not in the spirit of relational data modeling. You want each table to model some entity in your application, so it might be more appropriate to rename schedule to something like shifts if each row represents a shift. (I usually use singular names for tables, but there are multiple perspectives there.)
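To make the join idea concrete, here is a small sketch using Python's sqlite3 module rather than MySQL. The shift table's columns (Shift_ID, Shift_Name), the Sched_ID column, and the sample rows are assumptions for illustration. It also shows why no cascade is needed for the "AM" to "Morning" rename: the schedule table stores only IDs, so the new name appears the next time the join is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    Emp_ID   INTEGER PRIMARY KEY,
    Emp_Name TEXT NOT NULL
);
CREATE TABLE shift (
    Shift_ID   INTEGER PRIMARY KEY,
    Shift_Name TEXT NOT NULL
);
-- The schedule stores only references (IDs), never the names themselves.
CREATE TABLE schedule (
    Sched_ID INTEGER PRIMARY KEY,
    Emp_ID   INTEGER NOT NULL REFERENCES employees(Emp_ID),
    Shift_ID INTEGER NOT NULL REFERENCES shift(Shift_ID)
);
INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO shift VALUES (1, 'AM'), (2, 'PM');
INSERT INTO schedule VALUES (1, 1, 1), (2, 2, 2);
""")

QUERY = """
SELECT employees.Emp_Name, shift.Shift_Name
FROM schedule
    INNER JOIN employees ON schedule.Emp_ID = employees.Emp_ID
    INNER JOIN shift ON schedule.Shift_ID = shift.Shift_ID
ORDER BY schedule.Sched_ID
"""


def read_schedule(conn):
    """Resolve the IDs in schedule to display names via joins."""
    return conn.execute(QUERY).fetchall()


before = read_schedule(conn)
# Rename the shift in exactly one place; schedule rows are untouched.
conn.execute("UPDATE shift SET Shift_Name = 'Morning' WHERE Shift_ID = 1")
after = read_schedule(conn)
```

Cascade options on foreign keys matter when a referenced key value changes or a referenced row is deleted, not when a descriptive column such as the shift name is edited.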
This is, frankly, a very difficult question to answer because you could get a million different answers, each with its own merits. I'd suggest you take a look at these (there are probably better links out there too; these just seemed like good starting points):
http://www.devshed.com/c/a/MySQL/Designing-a-MySQL-Database-Tips-and-Techniques/
http://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form
http://www.sitepoint.com/forums/showthread.php?66342-SQL-and-RDBMS-Database-Design-DO-s-and-DON-Ts
I'd also suggest you try explaining what it is you want to achieve in more detail rather than just post the table structure and let us try to figure out what you meant by what you've done.
Often by trying to explain something verbally you may come to the realisations you need without anyone else's input at all!
One thing I will mention is that you don't have to denormalise a table to report certain values together. You should be considering views for that kind of thing...
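As a rough sketch of the view suggestion, again using Python's sqlite3 module instead of MySQL: the view name schedule_report, the Sched_ID column, and the sample rows are hypothetical. The base tables stay normalized, and the view presents the joined values as if they were one reporting table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (
    Dept_ID   INTEGER PRIMARY KEY,
    Dept_Name TEXT NOT NULL
);
CREATE TABLE employees (
    Emp_ID   INTEGER PRIMARY KEY,
    Emp_Name TEXT NOT NULL
);
CREATE TABLE schedule (
    Sched_ID INTEGER PRIMARY KEY,
    Dept_ID  INTEGER NOT NULL REFERENCES departments(Dept_ID),
    Emp_ID   INTEGER NOT NULL REFERENCES employees(Emp_ID)
);
-- The view stores no data of its own; it re-runs the join on each read.
CREATE VIEW schedule_report AS
SELECT schedule.Sched_ID, departments.Dept_Name, employees.Emp_Name
FROM schedule
    INNER JOIN departments ON schedule.Dept_ID = departments.Dept_ID
    INNER JOIN employees ON schedule.Emp_ID = employees.Emp_ID;

INSERT INTO departments VALUES (1, 'Front desk');
INSERT INTO employees VALUES (1, 'Ana');
INSERT INTO schedule VALUES (1, 1, 1);
""")

# Reporting code can query the view like a single flat table.
report = conn.execute(
    "SELECT Dept_Name, Emp_Name FROM schedule_report ORDER BY Sched_ID"
).fetchall()
```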