Question about SQL performance when selecting a 'blog post' based on user views by date.
I want to record the user views of each post, and I'll select them using 'daily' and 'monthly' as parameters.
For example:
Most viewed posts of the day, or month.
To record the views, I created a table into which, after every page load, I insert the date of each view.
I then select (count) them by DAY() and MONTH() when needed.
The problem is that as the table grows, or as more users request this information, the select gets slower, because of the number of rows (views) multiplied by the number of posts.
One alternative I thought of is to create one table for daily records and another for monthly records. On every page load the code checks whether there is a row for the selected date; if the row exists, the script increments its view count, and if it doesn't, the script inserts the row with a view count of 1.
For example:
Daily Views
Post ID | Views | Date
1 | 898 | 2014-07-11
2 | 676 | 2014-07-11
1 | 333 | 2014-07-10
This way every post can have only one row per day.
Is there a better option? What do you think about my alternative? Or is there no need for it?
I think the best solution is:
Create a table with statistical data with fields:
id
date (store date m-d-y)
day
month
year
views (store number of visits)
page (store blog post)
Keep one unique row per day, and update it programmatically as needed.
Then you can query using the day, month, and year fields; you can even add a weeknum field to obtain statistics grouped by week.
In addition, you can add a second table to store the full timestamp (m-d-y h:m:s) of each visit, along with fields such as browser, IP, etc.
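The one-row-per-post-per-day counter discussed above can be sketched as follows. This is only an illustration under assumptions: it uses Python 3's built-in sqlite3 module so that it runs anywhere, and the table name daily_views, the column names, and the helpers record_view and top_posts are hypothetical, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE daily_views (
        post_id   INTEGER NOT NULL,
        view_date TEXT    NOT NULL,        -- ISO date, e.g. '2014-07-11'
        views     INTEGER NOT NULL,
        PRIMARY KEY (post_id, view_date)   -- one row per post per day
    )
    """
)


def record_view(conn, post_id, view_date):
    """Increment the day's counter, or insert it with views = 1 if missing."""
    cur = conn.execute(
        "UPDATE daily_views SET views = views + 1 "
        "WHERE post_id = ? AND view_date = ?",
        (post_id, view_date),
    )
    if cur.rowcount == 0:  # no row yet for this post and day
        conn.execute(
            "INSERT INTO daily_views (post_id, view_date, views) VALUES (?, ?, 1)",
            (post_id, view_date),
        )
    conn.commit()


def top_posts(conn, start, end, limit=10):
    """Most viewed posts with start <= view_date < end (a day or a month)."""
    return conn.execute(
        """
        SELECT post_id, SUM(views) AS total
        FROM daily_views
        WHERE view_date >= ? AND view_date < ?
        GROUP BY post_id
        ORDER BY total DESC, post_id
        LIMIT ?
        """,
        (start, end, limit),
    ).fetchall()


# Hypothetical sample traffic: post 1 viewed 3 times on the 11th, and so on.
sample = [(1, "2014-07-11")] * 3 + [(2, "2014-07-11")] + [(1, "2014-07-10")] * 2
for post_id, day in sample:
    record_view(conn, post_id, day)

daily = top_posts(conn, "2014-07-11", "2014-07-12")
monthly = top_posts(conn, "2014-07-01", "2014-08-01")
print(daily)    # [(1, 3), (2, 1)]
print(monthly)  # [(1, 5), (2, 1)]
```

Because the monthly figures are just a sum over at most 31 daily rows per post, a separate monthly table may not be needed. In MySQL the update-or-insert step can be done in a single statement with INSERT ... ON DUPLICATE KEY UPDATE.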
Related
I have a table with the following features: Invoice ID, billing_period_start, billing_period_end, and items_purchased during that period.
I'm looking to break out a date range into individual dates. A date range can be contained within one month, but it can also be spread across two months, unequally. This will effectively create many more records than are currently in the table. Once I have done that, I need to break out the number of purchased items equally among the dates of the date range.
billing_period_start billing_period_end
-------------------- ------------------
2010-03-05 2010-03-07
2010-04-29 2010-05-05
2010-06-29 2010-08-12
billing_date
------------
2010-03-05
2010-03-06
2010-03-07
2010-04-29
2010-04-30
2010-05-01
...
2010-05-05
2010-06-29
2010-06-30
...
2010-08-12
Now that the date range is broken into individual dates, I need to take items_purchased and divide it by the number of days in the billing period for each date, so that I have the items_purchase_per_date.
select
invoice_line_id AS invoice_id
,items_purchased
,billing_period_start
,billing_period_end
,date_from_parts(YEAR(billing_period_start), MONTH(billing_period_start), 1) AS period1_month_start
,last_day(billing_period_start, month) AS period1_month_end
,datediff(day, billing_period_start, billing_period_end) + 1 AS billing_period_length
from "INVOICE_DATA"
order by 1;
I'm running this on Snowflake, but can easily convert from MySQL, if someone knows that DBMS better.
The best way to handle this in a data warehouse is using a date dimension table. That is, a table that contains all the dates you need for analysis, plus any date attributes that are interesting as well, such as which week/month/quarter etc the date belongs to and so on.
Once you have table with unique rows for all relevant dates, you can more easily tackle date spine challenges like this.
For example, in your case you'd write the following (assuming dates is the name of your date dimension and calendar_date is the name of the column containing the unique dates):
select
d.calendar_date,
i.*
from
dates d
join
invoice_data i
on d.calendar_date between i.billing_period_start and i.billing_period_end
Now you have one row per date between those start/end dates and you can do your daily billing allocation.
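As a rough end-to-end illustration of the join above plus the per-day allocation, here is a small runnable sketch. It uses SQLite through Python 3's sqlite3 module purely so it can be executed locally; it is not Snowflake syntax. The items_purchased values (30 and 14) are made-up sample data, and the day count uses SQLite's julianday() where Snowflake would use datediff.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dates (calendar_date TEXT PRIMARY KEY);
    CREATE TABLE invoice_data (
        invoice_id           INTEGER PRIMARY KEY,
        items_purchased      REAL,
        billing_period_start TEXT,
        billing_period_end   TEXT
    );
    """
)

# Populate the date dimension with every day of 2010 (365 days).
first_day = date(2010, 1, 1)
conn.executemany(
    "INSERT INTO dates VALUES (?)",
    [((first_day + timedelta(days=i)).isoformat(),) for i in range(365)],
)

# Two of the billing periods from the question, with invented item counts.
conn.executemany(
    "INSERT INTO invoice_data VALUES (?, ?, ?, ?)",
    [
        (1, 30, "2010-03-05", "2010-03-07"),
        (2, 14, "2010-04-29", "2010-05-05"),
    ],
)

# One row per invoice per day, with the items spread evenly over the period.
rows = conn.execute(
    """
    SELECT i.invoice_id,
           d.calendar_date AS billing_date,
           i.items_purchased
             / (julianday(i.billing_period_end)
                - julianday(i.billing_period_start) + 1) AS items_per_date
    FROM dates d
    JOIN invoice_data i
      ON d.calendar_date BETWEEN i.billing_period_start AND i.billing_period_end
    ORDER BY i.invoice_id, d.calendar_date
    """
).fetchall()

for row in rows:
    print(row)
```

Invoice 1 spans 3 days and invoice 2 spans 7, so the result has 10 rows, and the per-day amounts for each invoice add back up to its items_purchased.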
I have designed a database schema for a subscription product. A user can select a subscription starting from a given date for a certain number of days. The user can cancel the subscription for some days while still keeping it active afterwards. For example, if a user subscribes for a month, she can cancel it on, say, days 10, 15 and 20, thus paying for only 27 days (30 minus 3).
So far I have come up with this schema.
each user has one profile.
user can select a plan.
once user selects a plan it is noted as transaction which also stores information about start date and duration of that plan.
each transaction has payment (focusing on that part later)
Now, since a user can cancel the subscription on any day, how should I keep track of the different users and the days on which they had a subscription?
The solution I have in mind is to create a new table, Plan_Transaction_user, which will keep track of each date and the transaction ID for that date. This way, if a user cancels her subscription on a particular date, there will be no record of that date for that transaction ID.
The table will look like this:
Date Transaction ID
1-1-2017 1
1-1-2017 2
1-1-2017 3
1-2-2017 1
1-2-2017 3
Since the user associated with transaction ID 2 cancelled for day 2, her transaction record for that day is not present in this table.
Now, if I have a customer base of, say, 5000, then in the best case I will have 5000 * 365 ≈ 1.8m rows within one year. I am sure this is not the best way to go about it. Could you please suggest a better schema, or some changes to the existing schema, that would be more efficient? In case it matters, I will be using MariaDB (AWS RDS) as the database and Python 2 as my language.
Thank you,
Ojas
You can add an end_date field to the Transaction table instead of duration. You can define end_date as start_date plus the number of days granted by the selected plan. When the user cancels some days, reduce it: end_date = end_date - number of cancelled days. You can then check how many subscriptions are valid on any given day by testing end_date >= today.
Similar to your Plan_Transaction_user design, if you only need to know how many subscribers there are on any particular date, but not who they are, you can aggregate the table by day. Like
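A minimal sketch of the end_date idea, using SQLite via Python 3's sqlite3 module as a stand-in for MariaDB. The transactions table layout and the helpers subscribe, cancel_days and active_count are hypothetical names chosen for illustration. The extra start_date <= today check is an assumption added here so that subscriptions that have not started yet are not counted.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        user_id        INTEGER NOT NULL,
        start_date     TEXT NOT NULL,   -- ISO dates
        end_date       TEXT NOT NULL
    )
    """
)


def subscribe(conn, user_id, start, plan_days):
    """Store end_date = start_date + plan length instead of a duration."""
    end = start + timedelta(days=plan_days)
    cur = conn.execute(
        "INSERT INTO transactions (user_id, start_date, end_date) VALUES (?, ?, ?)",
        (user_id, start.isoformat(), end.isoformat()),
    )
    return cur.lastrowid


def cancel_days(conn, transaction_id, n_days):
    """Shorten the subscription: end_date = end_date - cancelled days."""
    (end,) = conn.execute(
        "SELECT end_date FROM transactions WHERE transaction_id = ?",
        (transaction_id,),
    ).fetchone()
    new_end = date.fromisoformat(end) - timedelta(days=n_days)
    conn.execute(
        "UPDATE transactions SET end_date = ? WHERE transaction_id = ?",
        (new_end.isoformat(), transaction_id),
    )


def active_count(conn, today):
    """Number of subscriptions valid on the given day."""
    return conn.execute(
        "SELECT COUNT(*) FROM transactions WHERE start_date <= ? AND end_date >= ?",
        (today.isoformat(), today.isoformat()),
    ).fetchone()[0]


t1 = subscribe(conn, user_id=1, start=date(2017, 1, 1), plan_days=30)
t2 = subscribe(conn, user_id=2, start=date(2017, 1, 1), plan_days=30)
cancel_days(conn, t1, 3)  # user 1 now ends on 2017-01-28

mid_month = active_count(conn, date(2017, 1, 15))
late_month = active_count(conn, date(2017, 1, 29))
print(mid_month, late_month)  # 2 1
```

This keeps one row per transaction rather than one per day, but note that it does not record which specific days were cancelled.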
Date user_count
1-1-2017 1
1-2-2017 2
I have a table with the following structure:
Entry ID | Date | Approved
Whenever a new entry is made, Entry ID auto increments and date is set to whenever the entry was made through the web application. These entries are not necessarily made every day, so there are gaps between dates.
I need to find all "missing" entries, meaning that there is no entry for that date. For instance, if there was an entry for 2015-06-01 and the next one didn't come until 2015-06-07, I need a query that returns the list of dates from 2015-06-02 to 2015-06-06 and an indication of their approved status from that field. I've been looking for a while but can't seem to find a method to get a list of entries that don't exist. Is there a method for this, or should I restructure?
Create a temp table with all possible dates and run:
SELECT Date FROM temp_table WHERE Date NOT IN (SELECT Date FROM your_table);
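Here is one way the temp-table approach might look end to end, sketched with SQLite via Python 3's sqlite3 module. The table name entries, the column names, and the sample rows are assumptions for illustration; the calendar is filled from the earliest to the latest existing entry date.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entries (
        entry_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        entry_date TEXT NOT NULL,   -- ISO date
        approved   INTEGER
    )
    """
)
# Hypothetical sample data with a gap between 2015-06-01 and 2015-06-07.
conn.executemany(
    "INSERT INTO entries (entry_date, approved) VALUES (?, ?)",
    [("2015-06-01", 1), ("2015-06-07", 0), ("2015-06-08", 1)],
)

# Build the calendar: every date between the first and last entry, inclusive.
lo, hi = conn.execute(
    "SELECT MIN(entry_date), MAX(entry_date) FROM entries"
).fetchone()
first, last = date.fromisoformat(lo), date.fromisoformat(hi)
conn.execute("CREATE TEMP TABLE all_dates (d TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO all_dates VALUES (?)",
    [((first + timedelta(days=i)).isoformat(),)
     for i in range((last - first).days + 1)],
)

# Dates in the calendar that have no matching entry.
missing = [
    row[0]
    for row in conn.execute(
        "SELECT d FROM all_dates "
        "WHERE d NOT IN (SELECT entry_date FROM entries) "
        "ORDER BY d"
    )
]
print(missing)
# ['2015-06-02', '2015-06-03', '2015-06-04', '2015-06-05', '2015-06-06']
```

One caveat with NOT IN: if the subquery can return NULL, the query returns no rows at all, which is why entry_date is declared NOT NULL here. Missing dates have no row, so they carry no approved value of their own.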
I'm a little lost on how I should do this; any help or guidance would be appreciated. I have a table with 3 columns and multiple values inserted into it, basically a log_book table of events. Say it has an order_id, an event_status, and a datetime. I need to get the difference in days between the datetime values and sum them together where order_id=? and event_status=?. I know how to limit my queries to get the data I want, but what would be the best way to get the differences in days and add them together? There could be only one entry with the same order_id and event_status, or there could be multiple entries with the same order_id and event_status.
Event Status codes
1 = initially assigned
2 = submitted for review
3 = sent back for more work
because 2015-01-01 to 2015-01-15 is 15 days
and 2015-01-17 to 2015-01-18 is 1 day
so the total days would be: 16 Days
I'm trying to create a weekly (Monday - Sunday) schedule/agenda, similar to Google Calendar, using MySQL, where users can fill out and display a schedule of the tasks they have on a given day during a given hour interval. For example, a user task could be stored in the schedule as
Username | Day | Time | Task
Jimbob, Tuesday, 13:00, Eat super delicious spaghetti.
I'm wondering what is the best way to design the tables?
I could create a table for every day of the week, or have one big table that stores info for any day of the week. But if I have a million users, would one big table be a performance issue?
Also, for the fields of the tables, I was planning to have one row store only one task for each hour, but I could also store all the tasks for each hour of the day in one row. The latter could result in a lot of null values and take up more memory, but if the users fill out a lot of the calendar it seems like it could cut down on rows and on redundant username entries. It also makes it easier to print out the schedule. Any thoughts?
Username | Hour | Task|
or
Username | 12am |Task1 | 1am | Task2 | 2am ....
Thanks.
The best way to do this, imo, would be to have two tables: one for users and one for tasks.
Users would have a user ID, a username plus whatever else.
Tasks would have the date and time, the description of the task, a task ID and the user ID of the user whose task it is. You shouldn't make a table storing each day of the week since you can just query the dates of the tasks. This is how you should design the tables. The actual calendar and the queries should be done in PHP or whatever you want to use for it.
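The two-table layout described above might look roughly like this. It is a sketch only, written against SQLite via Python 3's sqlite3 module rather than MySQL, and the column names, the index, and the sample dates are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE tasks (
        task_id     INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL REFERENCES users(user_id),
        starts_at   TEXT NOT NULL,      -- date and time, 'YYYY-MM-DD HH:MM'
        description TEXT NOT NULL
    );
    -- Lets a per-user, per-week lookup avoid scanning the whole table.
    CREATE INDEX idx_tasks_user_time ON tasks (user_id, starts_at);
    """
)

conn.execute("INSERT INTO users (user_id, username) VALUES (1, 'Jimbob')")
conn.executemany(
    "INSERT INTO tasks (user_id, starts_at, description) VALUES (?, ?, ?)",
    [
        (1, "2024-01-02 13:00", "Eat super delicious spaghetti."),  # a Tuesday
        (1, "2024-01-09 09:00", "Task in the following week"),
    ],
)

# All of Jimbob's tasks for the week Monday 2024-01-01 to Sunday 2024-01-07.
week = conn.execute(
    """
    SELECT t.starts_at, t.description
    FROM tasks t
    JOIN users u ON u.user_id = t.user_id
    WHERE u.username = ?
      AND t.starts_at >= ? AND t.starts_at < ?
    ORDER BY t.starts_at
    """,
    ("Jimbob", "2024-01-01", "2024-01-08"),
).fetchall()
print(week)  # [('2024-01-02 13:00', 'Eat super delicious spaghetti.')]
```

The day of the week is derived from the stored date, so no per-day tables or per-hour columns are needed; laying the rows out as a weekly grid is left to the application code.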