MySQL - Selecting a specific date range to get "current" popular screensavers

Let's say I have a screensaver website. I want to display the CURRENT top 100 screensavers on the front page of the website.
What I mean is, "RECENT" top 100 screensavers. What would be an example query to do this?
My current one is:
SELECT * FROM tbl_screensavers WHERE WEEK(tbl_screensavers.DateAdded) = WEEK('".date("Y-m-d H:i:s",strtotime("-1 week"))."') ORDER BY tbl_screensavers.ViewsCount, tbl_screensavers.DateAdded
This will select the most viewed ("tbl_screensavers.ViewsCount") screensavers that were added ("tbl_screensavers.DateAdded") in the last week.
However, in some cases there are no screensavers, or fewer than 100 screensavers, submitted in that week.
So, how can I perform a query which would select the "RECENT" top 100 screensavers? Hopefully you have an idea of what I'm trying to accomplish when I say "RECENT" or "CURRENT" top screensavers, i.e. the most viewed recently, not the most viewed of all time.

Given no other algorithm to weigh the value of a view vs. a recent view, you would just simply want
SELECT * FROM tbl_screensavers ORDER BY ViewsCount DESC LIMIT 100
However, to capture the concept of "recent" you may want to introduce an algorithm to weigh the recent-ness of a particular view. One way to do that is to assign a daysOld score to each view and show the 100 with the lowest score (with this mechanism, low score is good like in golf).
I'm not enough of a MySQL guru to write the query for that, but it would involve summing up the score, computed based on daysOld=today-dateOfScore and then ordering the result set based on that score, with a limit of 100.
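A minimal sketch of a recency-weighted ranking, written in Python with the built-in sqlite3 module so that it runs standalone. It is a variation on the idea above, not the exact scheme described: each view contributes 1 / (1 + daysOld) and the highest total wins, because simply summing daysOld would favour screensavers with very few views. The per-view table tbl_views and the columns ScreensaverID and ViewedOn are assumptions (the question only mentions ViewsCount and DateAdded); in MySQL, DATEDIFF would take the place of the julianday arithmetic.

```python
import sqlite3


def recent_top(conn, today, limit=100):
    """Return screensaver IDs ordered by recency-weighted views, best first."""
    sql = """
        SELECT s.ScreensaverID,
               -- each view counts 1 / (1 + daysOld): fresh views weigh the most
               SUM(1.0 / (1 + julianday(?) - julianday(v.ViewedOn))) AS score
        FROM tbl_screensavers AS s
        JOIN tbl_views AS v ON v.ScreensaverID = s.ScreensaverID
        GROUP BY s.ScreensaverID
        ORDER BY score DESC, s.ScreensaverID
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (today, limit))]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schema: one row per view
    CREATE TABLE tbl_screensavers (ScreensaverID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE tbl_views (ScreensaverID INTEGER, ViewedOn TEXT);
    INSERT INTO tbl_screensavers VALUES (1, 'old favourite'), (2, 'new hit');
    -- three views a year ago versus two views yesterday
    INSERT INTO tbl_views VALUES
        (1, '2023-06-01'), (1, '2023-06-01'), (1, '2023-06-01'),
        (2, '2024-05-31'), (2, '2024-05-31');
""")
print(recent_top(conn, "2024-06-01"))
```

Because nothing is filtered out by date, a quiet week still yields a full list: older screensavers simply sink as their views age.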

Related

Order by then select incrementally

I have a table of > 250k rows of 'names' (and ancillary info) which I am displaying using jQuery Datatables.
My Users can choose any 'name' (Row), which is then flagged as 'taken' (and timestamped).
A (very) cut down version of the table is:
Key, Name, Taken, Timestamp
I would like to be able to display the 'taken' rows (in timestamp order) first and then the untaken records in their key order [ASC] next.
The problem would be simple, but, because of size constraints (both visual UI & data set size) My display mechanism paginates - 10 / 20 / 50 / 100 rows (user choice)
Which means a) the total number of 'taken' will vary and b) the pagination length varies.
Thus I can see no obvious method of keeping track of the pagination.
(My Datatable tells me the count of the start record and the length of the displayed records)
My SQL (MySQL) at this level is weak, and I have no idea how to return a record set that accounts for the 'taken' offset without some kind of new (or internal MySQL) numeric indices to paginate to.
I thought of:
1) Creating a temporary table with the key and a new numeric indices on each pagination.
2) Creating a trigger that re-ordered the table when the row was 'taken'.
3) Having a "Running order" column that was updated on each new 'taken'
4) Some sort of cursor based procedure (at this point my hair was ruffled as the explanations shot straight over the top of my head!)
All seem excessive.
I also thought of doing a lot of manipulation in PHP (involving separate queries, dependent on the pagination size and the number of names already taken, and keeping a running record of the pagination position).
To the Human Computer (Brain) the problem is untaxing - but translating it into SQL has foxed me, as has coming up with a fast alternative to 1-3 (the test case on updating the "Running order" solution took almost three minutes to complete!)
It 'feels' like there should be a smart SQL query answer to this, but all efforts with ORDER BY, LIMITS, and the like fall over unless I return the whole dataset and do a lot of nasty counting.
Is there something like a big elephant in the room I am missing, or am I stuck with the hard slog to get what I need?
A query that displays the 'taken' rows (in timestamp order) first and then the untaken records in their key order [ASC] next:
SELECT *
FROM `table_name`
ORDER BY `taken` DESC, IF(`taken` = 1, `Timestamp`, `Key`) ASC
LIMIT 50, 10
The LIMIT values: 10 is the page size, 50 is the index of the first element on page 6.
Change the condition on IF(taken = 1,Timestamp,Key) with the correct condition to match the values you store in column taken. I assumed you store 1 when the row is 'taken' and 0 otherwise.
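The paginated ordering can be tried out with a small sketch like the one below, written in Python with the built-in sqlite3 module. It is a variation rather than the exact query above: the timestamp and the key are kept as two separate sort keys, on the assumption that mixing a timestamp and an integer key inside one IF may make the database compare them as strings. The table name names and the sample rows are made up for illustration.

```python
import sqlite3


def fetch_page(conn, start, length):
    """Return the keys of one page: 'taken' rows by timestamp, then the rest by key."""
    sql = """
        SELECT "Key"
        FROM names
        ORDER BY Taken DESC,                                   -- taken rows first
                 CASE WHEN Taken = 1 THEN "Timestamp" END ASC, -- oldest pick first
                 "Key" ASC                                     -- untaken rows by key
        LIMIT ? OFFSET ?
    """
    return [row[0] for row in conn.execute(sql, (length, start))]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (
        "Key" INTEGER PRIMARY KEY,
        Name TEXT,
        Taken INTEGER NOT NULL DEFAULT 0,
        "Timestamp" TEXT
    );
    INSERT INTO names VALUES
        (1, 'a', 0, NULL),
        (2, 'b', 1, '2024-01-03'),
        (3, 'c', 0, NULL),
        (4, 'd', 1, '2024-01-02'),
        (5, 'e', 1, '2024-01-01'),
        (6, 'f', 0, NULL);
""")
# start and length are the two values the Datatable reports
print(fetch_page(conn, 0, 2), fetch_page(conn, 2, 2), fetch_page(conn, 4, 2))
```

The start and length values map directly onto OFFSET and LIMIT, so no running count of 'taken' rows has to be kept between requests.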

Design for 'Total' field in a database

I am trying to find an optimal solution for my Database (MySQL), but I'm stuck over the decision whether or not to store a Total column.
This is the simplified version of my database :
I have a Team table, a Game table and a 'Score' table. Game will have {teamId, scoreId,...} while Score table will have {scoreId, Score,...} (Here ... indicates other columns in the tables).
On the home page I need to show the list of Teams with their scores. Over time the number of Teams will grow to 100s while the list of Score(s) will grow to 100000s. Which is the preferred way:
1) Should I sum up the scores and show them along with the teams every time the page is requested? (I don't want to cache because the scores will keep changing.) OR
2) Should I have a total_score field in the Team table where I update the total_score of a team every time a new score is added to the Scores table for that team?
Which of the two is a better option or is there any other better way?
I use two guidelines when deciding to store a calculated value. In the best of all worlds, both of these statements will be true.
1) The value must be computationally expensive.
2) The value must have a low probability of changing.
If the cost of calculating the value is very high, but it changes daily, I might consider making a nightly job that updates the value.
Start without the total column and only add it if you start having performance issues.
Calculating sum at request time is better for accuracy but worse for efficiency.
Caching total in a field (dramatically) improves performance of certain queries, but increases code complexity or may show stale data (if you update cached value not at the same time, but via cron job).
It's up to you! :)
I agree that computed values should not be used except for special situations such as month end snapshots of databases.
I would simply create a view with one column in the view equal to your computed total column. Then you can query the view instead of the base tables.
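As a rough illustration of the view approach, here is a sketch in Python using the built-in sqlite3 module. The table and column names follow the question's simplified schema; the view name team_totals, the name column, and the sample rows are assumptions made for the example. CREATE VIEW exists in MySQL as well, though details of the syntax may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team  (teamId  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Score (scoreId INTEGER PRIMARY KEY, Score INTEGER);
    CREATE TABLE Game  (teamId INTEGER, scoreId INTEGER);

    -- the total is computed on every read, so it can never be stale
    CREATE VIEW team_totals AS
    SELECT t.teamId, t.name, COALESCE(SUM(s.Score), 0) AS total_score
    FROM Team AS t
    LEFT JOIN Game  AS g ON g.teamId  = t.teamId
    LEFT JOIN Score AS s ON s.scoreId = g.scoreId
    GROUP BY t.teamId, t.name;

    INSERT INTO Team  VALUES (1, 'Reds'), (2, 'Blues'), (3, 'Greens');
    INSERT INTO Score VALUES (10, 3), (11, 5), (12, 2);
    INSERT INTO Game  VALUES (1, 10), (1, 11), (2, 12);
""")


def team_totals(conn):
    """Read the home-page list straight from the view."""
    query = "SELECT name, total_score FROM team_totals ORDER BY teamId"
    return conn.execute(query).fetchall()


print(team_totals(conn))
```

The LEFT JOINs keep teams that have not played yet in the list with a total of 0. A view only hides the aggregation; it does not make it cheaper, so the performance trade-off discussed above still applies.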
It depends on how often your scores get updated and what exactly the "score" means.
Case 1: Score is a LIVE score
If the "score" is a live score, like runs scored in cricket or baseball, or the score of a volleyball or table tennis match, then I don't really understand the need to show the sum of the running scores. However, this may be a requirement in some cases, such as showing the total runs scored by a team so far plus the runs scored in the ongoing (live) game.
In this case I suggest another option, which is a combination of your 1st and 2nd options.
A total_score column in the team table would work well with a slight change to your data model:
Add a new column to the scores table called LIVE, which is 0 for a finished match and 1 for a live match (and optionally -1 for a match that is about to start but whose scores are not being updated yet).
Now union the two tables, something like:
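select team_id, sum(total_score) as total_score from (
select team_id, total_score from team
-- union all, so a live sum that happens to equal the stored total is not dropped as a duplicate
union all
select team_id, sum(score) as total_score from scores where live = 1 group by team_id) subquery
group by team_id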
Case 2: Score is just a RESULT
Just query the database directly (your 1st option), because a result is written only after the game ends, and even then it is a new entry in the score table rather than an update.
If my assumption is correct, scores are recorded only after a game is finished. Moreover, writes are even less frequent when you consider how often a given team plays.

How to make turnover on Mysql database records

I am a website developer and I need help with some analysis: my (future) website is more or less a villa directory. People can add their villas there. Each villa will be stored in a database.
I need to show 15 villas per page but I want a "turn over" (not sure it's the correct word in English) of the villas: every hour the villa that appears first on the first page becomes the last villa of the last page (so every villa moves up by one rank, except the first one, which becomes the last). I want every villa to have the same chance (more or less) to appear on the first page. I don’t want a totally random system.
I need help on how to make a simple system that would not take a lot of resources (should be working with a few millions of records).
Note: I don’t want to use the ID of the villa because if a person posts 3 different villas at the same time, they will be all shown next to each other.
My proposition:
I create a field (INTEGER) called “random_order” for each villa and I put a random number between 0 and Max(INTEGER) and I create an Index on the column “random_order”.
Then to get the records in the order I want, I store (dunno where yet) a variable that points to a record in the index. Then every hour, I increase this variable by 1 (with a modulo).
I’m not an expert on indexes so I’m not really sure if it’s possible to do it and how to do it. I don’t know if there is a better way to do it as well…
Could you please tell me if this is correct or if you have better ideas?
Thank you
Another thing you could do, is store a count variable - from 0 to MAX, and constantly update that. Then query the server for the top 15 villas (using ORDER BY ASC/DESC) on (random_order + count). This will prevent the need to update the column every hour - only the count variable needs to be updated.
EDIT:
First you would get the count (from where you have stored it) and store it in a variable - count.
Then execute a query like
SELECT *, (random_order + <count>)%MAX_VAL AS villa_order
FROM villa_table
ORDER BY villa_order ASC
LIMIT 15
This avoids constant, unnecessary updates to your indexed column.
EDIT 2:
OK, after further analysis, this is how I would do this.
Execute a simple select query
SELECT * FROM villa_table
WHERE random_order > <count>
ORDER BY random_order
LIMIT 15
If the number of rows in the result set is < 15, then fill in the remaining records from the beginning using:
SELECT *
FROM villa_table
ORDER BY random_order ASC
LIMIT <number of rows to be filled>
Even on 20m rows on an indexed column this takes < .5s.
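The two-step wrap-around can be sketched as follows, in Python with the built-in sqlite3 module. The function name rotating_page, the id column, and the sample data are assumptions for illustration. One small addition to the queries above: the fill query is restricted to random_order <= pointer, so a villa cannot appear twice when the table holds fewer rows than one page.

```python
import sqlite3


def rotating_page(conn, pointer, page_size=15):
    """Return villa ids starting just after `pointer`, wrapping to the start."""
    rows = conn.execute(
        "SELECT id FROM villa_table WHERE random_order > ? "
        "ORDER BY random_order LIMIT ?",
        (pointer, page_size),
    ).fetchall()
    missing = page_size - len(rows)
    if missing > 0:
        # wrap around: top up from the lowest random_order values
        rows += conn.execute(
            "SELECT id FROM villa_table WHERE random_order <= ? "
            "ORDER BY random_order LIMIT ?",
            (pointer, missing),
        ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE villa_table (id INTEGER PRIMARY KEY, random_order INTEGER);
    CREATE INDEX idx_random_order ON villa_table (random_order);
    INSERT INTO villa_table VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);
""")
print(rotating_page(conn, 35, page_size=3))
```

Both queries are range scans on the indexed random_order column, and only the stored pointer changes every hour, never the villa rows themselves.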

How do I get the ID of the rows which have MAX and MIN values in SQL

I am trying to make the queries my website uses more efficient.
Being a bit vague about SQL, I've not really learnt how to use nested queries, but I have just managed to get something that is pretty near what I want.
I sell guitars, I have a big database with all the products with different finish options listed individually. Items have unique IDs in the dB but are grouped by their title, for example, a Gibson Les Paul Standard is listed in my dB 7 times with 7 different finish options. Not all the finish options will necessarily have the same price, and not all finish options will necessarily be in stock.
In the search results page of my website I want to be able to show:
1) Just one record per product, ie 1 record for Gibson LP Std, which can then be sub-linked to the different finishes.
2) The actual product displayed must either be the cheapest finish option, OR, the cheapest in stock.
This is currently working on my website, but it's using N+1 queries and seems to be running dreadfully slowly, but for an example of what I mean, click here: http://www.hartnollguitars.co.uk/search.asp?subcat=Gibson-Les-Pauls (if the bloody thing works)
Part one is fine, I can just group the title in SQL, it's getting part 2 out that's the problem.
Using the following SQL query I can get the lowest price and the highest price and I have counted how many variants there are, I also have the max and min stock levels.
results.Open "SELECT * FROM
(SELECT *, count(id) as Variants, MAX(price) as highestPrice, MIN(price) as
lowestPrice, MAX(shopstock) as highestStock, MIN(shopstock) as lowestStock FROM
products WHERE item LIKE '%"& replace([searchterm]," ","%") &"%' GROUP BY item)
AS UnknownVar LIMIT 40", conn, 3, &H0001
What I need to be able to do is get the ID value for the rows representing the max and min stock and price values.
I basically need to be able to run if/or logic on it and I am not sure if this is possible.
So, I need to be able to say
if Item_With_Cheapest_Price is in stock, display this as the thumbnail & link
else
display first item in price sorted list where stock >=1
I also need a fall back, if none of the finishes are in stock, display the cheapest one.
The database is MySQL using ODBC connections, I am currently scripting in Classic ASP but aim to upgrade to .NET, once I've worked out how!!! :-)
I think for the order by part you should use something like
order by case
when shopstock > 0 then 0
else 1
end asc,
price asc
I didn't check the syntax but that's the idea. You can google case in order by for more info.
As for the rest of your requirements, I would need the table structure to better understand...
Do you know the concept of dense_rank? If not, I could explain it to you. The KEEP (DENSE_RANK ...) syntax below is Oracle's and is not available in MySQL, but it shows the idea. Have a look at this.
SELECT id,
MIN(stock) KEEP (DENSE_RANK FIRST ORDER BY stock,price) "Lowest"
,MAX(stock) KEEP (DENSE_RANK LAST ORDER BY stock,price) "Highest"
FROM products
GROUP BY id;
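For comparison, here is a sketch of one way to pick a single representative row per product without vendor-specific syntax, written in Python with the built-in sqlite3 module. It combines the "in stock first, then cheapest" ordering with a correlated subquery; it is an illustration under assumed sample data, not a tuned replacement for the search query. The column names item, price and shopstock come from the question.

```python
import sqlite3


def representative_rows(conn):
    """One row per item: the cheapest in-stock finish, else the cheapest overall."""
    sql = """
        SELECT p.item, p.id, p.price, p.shopstock
        FROM products AS p
        WHERE p.id = (
            SELECT p2.id
            FROM products AS p2
            WHERE p2.item = p.item
            ORDER BY CASE WHEN p2.shopstock > 0 THEN 0 ELSE 1 END,  -- in stock first
                     p2.price,                                      -- then cheapest
                     p2.id                                          -- stable tie-break
            LIMIT 1
        )
        ORDER BY p.item
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, item TEXT, price INTEGER, shopstock INTEGER);
    INSERT INTO products VALUES
        (1, 'Les Paul', 1500, 0),  -- cheapest, but out of stock
        (2, 'Les Paul', 1600, 2),  -- cheapest in stock
        (3, 'Les Paul', 1700, 1),
        (4, 'SG', 900, 0),         -- nothing in stock: fall back to cheapest
        (5, 'SG', 950, 0);
""")
print(representative_rows(conn))
```

This returns the id of the chosen finish directly, so the thumbnail and link can be built from a single query instead of N+1.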

Need help with a database design for Top 10

I am trying to come up with a database design to hold the "Top 10" results for some calculations that are being done. Basically, when all is said and done, there will be 3 "Top 10" categories, which I am fine with all being separate tables. However, I need to be able to go back later and pull historical data about what was in the Top 10 at certain times, hence the need for a database; a flat file would work, but this has the potential to hold years' worth of data.
Now, it's been a while since I have done anything serious with a database, other than something that had a couple of simple tables, so I am having some issues thinking through this design. If someone could help me with the design of it, I know enough MySQL to get the rest done.
So, in essence, I need to store: A group of 10 names, a % of the total points each name had, the rank they held in the Top 10 and a time associated with that Top 10 (So I can later query for that time)
I would think I need a table for the Top 10 with 11 columns, one for the ID and 10 for the foreign keys into the 'Names' table, which holds every name ever used with a PK, Name, %, and Rank. This seems clunky to me; does anyone else have a suggestion?
Edit: The 'Top 10' is associated with a specific set of data for 5-minute intervals, and each interval is completely independent of the previous or future intervals.
I don't recommend your solution, because then if you want to ask the database "How often has Joe been in the top 10," you have to write 10 queries of the form
SELECT Date FROM Top10 WHERE FirstPlace = 'joe'
SELECT Date FROM Top10 WHERE SecondPlace = 'joe'
...
Instead, how about a Rankings table, with fields:
id
Date
Person
Rank
Then if you want the Top 10 list for a certain date, the query is
SELECT * FROM Rankings WHERE Date = ...
and if you want to know someone's historical ranking, the query is
SELECT * FROM Rankings WHERE Person = ...
and if you want to know all the historical leaders, the query is
SELECT * FROM Rankings WHERE Rank = 1
The downside to this is that you might accidentally make two different people 8th place, and your database would allow the anomaly. But I have good news for you -- people might actually tie for 8th place, so you might actually want that to be possible!
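A small sketch of the Rankings table, in Python with the built-in sqlite3 module, showing how the "how often has Joe been in the top 10" question becomes a single query. The UNIQUE constraint, the helper function names and the sample rows are assumptions added for illustration. Note that RANK is a reserved word in MySQL 8.0, so there the column would need backticks or a different name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Rankings (
        id     INTEGER PRIMARY KEY,
        Date   TEXT NOT NULL,
        Person TEXT NOT NULL,
        Rank   INTEGER NOT NULL,
        UNIQUE (Date, Person)  -- a person appears once per snapshot; ties in Rank stay possible
    );
    INSERT INTO Rankings (Date, Person, Rank) VALUES
        ('2024-06-01 12:00', 'joe', 1), ('2024-06-01 12:00', 'ann', 2),
        ('2024-06-01 12:05', 'ann', 1), ('2024-06-01 12:05', 'joe', 2),
        ('2024-06-01 12:10', 'ann', 1), ('2024-06-01 12:10', 'bob', 2);
""")


def appearances(conn, person):
    """How many snapshots this person appears in: one query, not ten."""
    query = "SELECT COUNT(*) FROM Rankings WHERE Person = ?"
    return conn.execute(query, (person,)).fetchone()[0]


def top_list(conn, date):
    """The stored list for one snapshot time, best rank first."""
    query = "SELECT Person FROM Rankings WHERE Date = ? ORDER BY Rank"
    return [row[0] for row in conn.execute(query, (date,))]


print(appearances(conn, "joe"), top_list(conn, "2024-06-01 12:05"))
```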
I assume that your "Top 10" is a snapshot of the data at a certain time, and that your business logic takes one every 5 minutes, so the time is the parent entity in the table design:
top_10_history
th_id - the primary key
th_time - the time point when taking the snapshot data of "Top 10"
top_10_detail
td_th_id - the FK to top_10_history
td_name_id - the FK to name
td_percentage - the "%"
td_rank - the rank
If the sequence of "Top 10" could be calculated from columns in "top_10_detail", you don't need a column to keep the sequence of it. Otherwise, you need a column to persist the sequence for it.
If you need more complicated queries, such as "the top 10 at 12:00 AM over the last 30 days", using individual columns for "day", "hour", and "minute" would be a better idea for performance (with suitable indexes).