Hello all, and thanks in advance.
I have the tables accounts, votes and contests.
A vote consists of an author ID, a winner ID, and a contest ID, so as to stop people voting twice.
I'd like to show, for any given account, how many times they've won a contest, how many times they've come second, and how many times they've come third.
What's the fastest (execution time) way to do this? (I'm using MySQL.)
After using MySQL for a long time I'm coming to the conclusion that virtually any use of GROUP BY is really bad for performance, so here's a solution with a couple of temporary tables.
CREATE TEMPORARY TABLE VoteCounts (
accountid INT,
contestid INT,
votecount INT DEFAULT 0
);
INSERT INTO VoteCounts (accountid, contestid)
SELECT DISTINCT v2.accountid, v2.contestid
FROM votes v1 JOIN votes v2 USING (contestid)
WHERE v1.accountid = ?; -- the given account
Make sure you have an index on votes(accountid, contestid).
Now you have a table of every contest that your given user was in, with all the other accounts who were in the same contests.
UPDATE Votes AS v JOIN VoteCounts AS vc USING (accountid, contestid)
SET vc.votecount = vc.votecount+1;
Now you have the count of votes for each account in each contest.
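The MySQL manual says that a multiple-table UPDATE updates each matching row once, even when it matches the join condition several times, so it is worth checking the counts this step actually produces. One way to cross-check them is a correlated COUNT(*) subquery. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE votes (accountid INT, contestid INT);
    CREATE TABLE VoteCounts (accountid INT, contestid INT, votecount INT DEFAULT 0);

    -- invented sample data: account 1 has 3 votes in contest 10, account 2 has 1
    INSERT INTO votes VALUES (1, 10), (1, 10), (1, 10), (2, 10);

    INSERT INTO VoteCounts (accountid, contestid)
        SELECT DISTINCT accountid, contestid FROM votes;

    -- for each VoteCounts row, count the votes rows with the same account/contest
    UPDATE VoteCounts
    SET votecount = (
        SELECT COUNT(*) FROM votes AS v
        WHERE v.accountid = VoteCounts.accountid
          AND v.contestid = VoteCounts.contestid
    );
""")

vote_counts = conn.execute(
    "SELECT accountid, contestid, votecount FROM VoteCounts ORDER BY accountid"
).fetchall()
print(vote_counts)
```

The same correlated subquery form should be accepted by MySQL, but it has not been tested there.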
CREATE TEMPORARY TABLE Placings (
accountid INT,
contestid INT,
placing INT
);
SET @prevcontest := 0;
SET @placing := 0;
INSERT INTO Placings (accountid, placing, contestid)
SELECT accountid,
IF(contestid=@prevcontest, @placing:=@placing+1, @placing:=1) AS placing,
@prevcontest:=contestid AS contestid
FROM VoteCounts
ORDER BY contestid, votecount DESC;
Now you have a table with each account paired with their respective placing in each contest. It's easy to get the count for a given placing:
SELECT accountid, COUNT(*) AS count_first_place
FROM Placings
WHERE accountid = ? AND placing = 1;
And you can use a MySQL trick to do all three in one query. A boolean expression always returns an integer value 0 or 1 in MySQL, so you can use SUM() to count up the 1's.
SELECT accountid,
SUM(placing=1) AS count_first_place,
SUM(placing=2) AS count_second_place,
SUM(placing=3) AS count_third_place
FROM Placings
WHERE accountid = ?; -- the given account
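To see the SUM() trick in action, here is a small sketch. It runs against SQLite through Python's sqlite3 module (an assumption for the sake of a self-contained example; SQLite also evaluates a comparison to 0 or 1), and the Placings rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Placings (accountid INT, contestid INT, placing INT);
    -- invented data: account 7 placed 1st twice, 2nd once, 3rd once, 4th once
    INSERT INTO Placings VALUES
        (7, 1, 1), (7, 2, 1), (7, 3, 2), (7, 4, 3), (7, 5, 4),
        (8, 1, 2);
""")

# Each comparison yields 0 or 1, so SUM() counts the rows where it is true.
podium = conn.execute(
    """
    SELECT accountid,
           SUM(placing = 1) AS count_first_place,
           SUM(placing = 2) AS count_second_place,
           SUM(placing = 3) AS count_third_place
    FROM Placings
    WHERE accountid = ?
    """,
    (7,),
).fetchone()
print(podium)
```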
Re your comment:
Yes, it's a complex task no matter what to go from the normalized data you have to the results you want. You want it aggregated (summed), ranked, and aggregated (counted) again. That's a heap of work! :-)
Also, a single query is not always the fastest way to do a given task. It's a common misconception among programmers that shorter code is implicitly faster code.
Note I have not tested this so your mileage may vary.
Re your question about the UPDATE:
It's a tricky way of getting the COUNT() of votes per account without using GROUP BY. I've added table aliases v and vc so it may be clearer now. In the Votes table, there are N rows for a given account/contest. In the VoteCounts table, there's one row per account/contest. When I join, the UPDATE is evaluated against the N rows, so if I add 1 for each of those N rows, I get the count of N stored in VoteCounts in the row corresponding to each respective account/contest.
If I'm interpreting things correctly, to stop people voting twice I think you only need a unique index on the votes table over author (account?) ID and contest ID. It won't prevent people from having multiple accounts and voting twice, but it will prevent anyone from casting a vote in a contest twice from the same account. To prevent fraud (sock puppet accounts) you'd need to examine voting patterns and detect when an account votes for another account more often than is statistically likely. Unless you have a lot of contests, that might actually be hard.
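A minimal sketch of the unique-index idea follows. The column names author_id, winner_id and contest_id and the helper cast_vote are made up for illustration, and SQLite (via Python's sqlite3) stands in for MySQL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE votes (author_id INT, winner_id INT, contest_id INT);
    -- at most one vote per author per contest
    CREATE UNIQUE INDEX one_vote_per_contest ON votes (author_id, contest_id);
""")


def cast_vote(author_id, winner_id, contest_id):
    """Return True if the vote was stored, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO votes VALUES (?, ?, ?)",
                (author_id, winner_id, contest_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = cast_vote(1, 2, 10)
second = cast_vote(1, 3, 10)  # same author, same contest: rejected
other = cast_vote(1, 2, 11)   # same author, different contest: accepted
stored = conn.execute("SELECT COUNT(*) FROM votes").fetchone()[0]
```

The database does the checking, so the application doesn't need a separate "has this account already voted?" query.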
Say I want to get all the followers of a certain content in my project. Here is my db:
table
contents
users
Now, every time I want to get a content's number of followers, I use this table, called content-followers, which links contents to users.
table
contents
users
content-followers <
columns
user_id
content_id
My concern is that this count of a content's followers will run alongside the other queries, and I understand it may slow the SQL down.
Every time people visit the content I'll have to show that count, but that count (as I imagine it) will run through the entire table just to count.
Is there another way to make it simple? Like counting only once in a while and saving the result to the contents table?
I have had no proper database lessons, so thanks in advance for your help, guys!
CREATE TABLE ContentFollowers (
user_id ...,
content_id ...,
PRIMARY KEY(user_id, content_id),
INDEX(content_id, user_id)
) ENGINE=InnoDB;
SELECT ...,
( SELECT COUNT(*) FROM ContentFollowers
WHERE content_id = c.id
) AS follower_count
FROM Contents AS c
JOIN Users AS u ON ...
WHERE ...
The COUNT(*) will efficiently use INDEX(content_id, user_id) of ContentFollowers. The added time taken will be a few milliseconds, even with many millions of users and contents.
If you want to discuss further, please provide the SHOW CREATE TABLE for each relevant table and your tentative SELECT (which will have more than what I specified). So "... counting only once ..." should be unnecessary (and a hassle).
Is it possible for a "user" to "follow" a "content" more than once? This is a potential hack to mess up your numbers, but I think what I say here avoids that possibility. (A PRIMARY KEY includes a 'uniqueness' constraint.) Without this, a user could repeatedly click on [Follow] to inflate the number of 'followers'.
In what you have specified so far, I don't see the need for a TRIGGER. Furthermore, a Trigger would reopen the possibility of the above 'hack'.
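Here is a rough sketch of how the composite PRIMARY KEY keeps the count honest. It uses SQLite through Python's sqlite3 as a stand-in for MySQL/InnoDB, and the helpers follow and follower_count are invented names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ContentFollowers (
        user_id INT NOT NULL,
        content_id INT NOT NULL,
        PRIMARY KEY (user_id, content_id)
    );
    CREATE INDEX by_content ON ContentFollowers (content_id, user_id);
""")


def follow(user_id, content_id):
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO ContentFollowers VALUES (?, ?)",
        (user_id, content_id),
    )


def follower_count(content_id):
    # Answered from the (content_id, user_id) index, one content at a time.
    row = conn.execute(
        "SELECT COUNT(*) FROM ContentFollowers WHERE content_id = ?",
        (content_id,),
    ).fetchone()
    return row[0]


for _ in range(5):  # user 1 clicks [Follow] five times
    follow(1, 100)
follow(2, 100)
follow(2, 200)
```

However many times user 1 clicks, content 100 ends up with two followers, so no trigger or cached counter is needed to keep the number correct.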
What is the least resource-intensive way to calculate a sum of points from two tables? The total point tally is calculated by adding points from a table points and subtracting points from a table points_redeemed.
points:
CREATE TABLE IF NOT EXISTS points(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
user__id INT,
tx__id INT,
points INT
) ENGINE=MyISAM;
points_redeemed:
CREATE TABLE IF NOT EXISTS points_redeemed(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
user__id INT,
points INT
) ENGINE=MyISAM;
(Both tables above are heavily simplified.)
points is populated upon a transaction (recorded in a different table). When transaction values are changed or voided, the corresponding row in points is updated as well.
points_redeemed is populated when user redeems their accumulated points for a reward.
Use cases:
show stats to user and admin: total, redeemed, and unredeemed points
check unredeemed points upon user-initiated redeem request
The options I've come up with are:
a) Triggers.
Create a table points_sum with one row per user.id and add three triggers:
on insert into points
on update of points
on insert into points_redeemed
I've heard that MySQL triggers are not that performant though, so I'm simply not sure if this is a good idea.
b) View.
Create a view that calculates points.points - points_redeemed.points. Not sure if this is any better than just doing it on the fly.
c) Sum table.
Create a table points_sum and update it per separate query each time points and points_redeemed is inserted into and updated. This feels like the least effective way, but then again I could be wrong and it might be the best way.
d) On the fly.
Query points from both tables on the fly and calculate the difference. This is the easiest and probably the most accurate way, but it can potentially clog up the pipes a lot when the tables grow in size. Then again, are any of the other options any better in that regard?
Edit: These are the current on-the-fly queries.
First, a very straight-forward query from points_redeemed:
SELECT *
FROM points_redeemed
WHERE user__id = 1
Second, the points table is queried:
(
SELECT p.*,
tx.*
FROM points p
INNER JOIN tx ON p.tx__id = tx.id
WHERE p.user__id = '1'
AND p.tx_is_external IS NULL
ORDER BY p.date DESC
)
UNION
(
SELECT p.*,
tx.*
FROM points p
INNER JOIN tx_external tx ON p.tx__id = tx.id
WHERE p.user__id = '1'
AND p.tx_is_external = '1'
ORDER BY p.date DESC
)
(There are several named columns SELECTed that I abbreviated as * here. In the second query, about 40 columns are fetched per row.)
After this, I'm looping through both result sets and adding/subtracting points on the app layer.
My worry is that the two separate queries, and the joins in the second query, might "clog the pipes" when the tx tables grow in size (and the points table too). That's why I'm trying to figure out a better way that will save resources at runtime.
The more I think about it though... transactions and points inserts will probably happen a lot more frequently than a user looking up their current point status. In that scenario, a trigger would probably have the opposite effect.
I'd appreciate any kind of insight. Thank you!
WHERE user__id = 1 needs INDEX(user__id) on the table.
( SELECT ... ORDER BY ... ) UNION ( SELECT ... ORDER BY ... ) will not have a particular order; do you need to move the ORDER BY outside?
tx and tx_external need id to be indexed (PRIMARY KEY?)
Did you really want UNION DISTINCT? That's the default. UNION ALL is faster.
Fix those, then see if you still need to discuss triggers, etc.
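For the stats use case specifically, once user__id is indexed the database can do the summing itself instead of the app looping over result sets. This is a sketch only: it uses SQLite via Python's sqlite3 in place of MySQL, the index names and the point_stats helper are made up, and the simplified tables from the question are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points (id INTEGER PRIMARY KEY, user__id INT, tx__id INT, points INT);
    CREATE TABLE points_redeemed (id INTEGER PRIMARY KEY, user__id INT, points INT);
    -- the indexes that WHERE user__id = ... needs
    CREATE INDEX points_user ON points (user__id);
    CREATE INDEX redeemed_user ON points_redeemed (user__id);

    -- invented sample data
    INSERT INTO points (user__id, tx__id, points) VALUES (1, 1, 50), (1, 2, 30), (2, 3, 10);
    INSERT INTO points_redeemed (user__id, points) VALUES (1, 20);
""")


def point_stats(user_id):
    """Return total, redeemed and unredeemed points for one user."""
    total, redeemed = conn.execute(
        """
        SELECT
            (SELECT COALESCE(SUM(points), 0) FROM points WHERE user__id = ?),
            (SELECT COALESCE(SUM(points), 0) FROM points_redeemed WHERE user__id = ?)
        """,
        (user_id, user_id),
    ).fetchone()
    return {"total": total, "redeemed": redeemed, "unredeemed": total - redeemed}


print(point_stats(1))
```

COALESCE turns the NULL that SUM() gives for a user with no rows into 0, so the subtraction is always defined.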
I'm implementing a voting system for a php project which uses mysql. The important part is that I have to store every voting action separately for statistic reasons. The users can vote for many items multiple times, and every vote has a value (think of it like a donation kinda stuff).
So far I have a table votes in which I'm planning to store the votes with the following columns:
user_id - ID of the voting user, foreign key from users table
item_id - ID of the item which the user voted for, foreign key from items table
count - # of votes spent
created - date and time of voting
I'll need to get things out of the table like: Top X voters for an item, all the items that a user has voted for.
My questions are:
Is this table design suitable for the task? If it is, how should I index it? If not, where did I go wrong?
Would it be more rewarding to create another table beside this one, which has unique rows for the user-item relationship (not storing every vote separately, but update the count row)?
Each base table holds the rows that make a true statement from some fill-in-the-(named-)blanks statement aka predicate.
-- user [user_id] has name ...
-- User(user_id, ...)
SELECT * FROM User
-- user [user_id] voted for item [item_id] spending [count] votes on [created]
-- Votes(user_id, item_id, count, created)
SELECT * FROM Votes
(Notice how the shorthand for the predicate is like an SQL declaration for its table. Notice how in the SQL query a base table predicate becomes the table's name.)
Top X voters for an item, all the items that a user has voted for.
Is this table design suitable for the task?
That query can be asked using that design. But only you can know what queries "like" that one are. You have to define sufficient tables/predicates to describe everything you care about in every situation. If Votes records the history of all relevant info about all events then it must be suitable. The query "all the items that user User has voted for" returns rows satisfying predicate
-- user User voted for item [item] spending some count on some date.
-- for some count & created,
user User voted for item [item_id] spending [count] votes on [created]
-- for some count & created, Votes(User, item_id, count, created)
-- for some user_id, count & created,
Votes(user_id, item_id, count, created) AND user_id = User
SELECT item_id FROM Votes WHERE user_id = User
(Notice how in the SQL the condition turns up in the WHERE and the columns you keep are the ones that you care about.)
If it is, how should I index it?
MySQL automatically indexes primary keys. Generally, index the column sets that you JOIN ON, otherwise test, GROUP BY or ORDER BY. See the MySQL 5.7 Reference Manual, section 8.3, Optimization and Indexes.
Would it be more rewarding to create another table beside this one, which has unique rows for the user-item relationship
If you mean a user-item table for some count & created, [user_id] voted for [item_id] spending [count] votes on [created] and you still want all the individual votings then you still need Votes, and that user-item table is just SELECT user_id, item_id FROM Votes. But if you want to ask about people who haven't voted, you need more.
(not storing every vote separately, but update the count row)
If you don't care about individual votings then you can have a table with user, item and the sum of count for user-item groups. But if you want Votes then that user-item-sum table is expressible in terms of Votes using GROUP BY user_id, item_id & SUM(count).
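As a sketch of how both queries from the question fall out of the single Votes table, here is a small example. It runs on SQLite through Python's sqlite3 rather than MySQL; the index, the sample rows and the helper names top_voters and items_voted_for are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Votes (user_id INT, item_id INT, "count" INT, created TEXT);
    -- one possible index for the WHERE item_id / GROUP BY user_id query
    CREATE INDEX votes_item_user ON Votes (item_id, user_id);

    -- invented sample votings
    INSERT INTO Votes VALUES
        (1, 10, 5, '2024-01-01'), (1, 10, 2, '2024-01-02'),
        (2, 10, 6, '2024-01-01'), (3, 10, 1, '2024-01-03'),
        (1, 20, 9, '2024-01-04');
""")


def top_voters(item_id, limit):
    """Top voters for an item: the user-item sum, expressed from Votes."""
    return conn.execute(
        """
        SELECT user_id, SUM("count") AS total
        FROM Votes
        WHERE item_id = ?
        GROUP BY user_id
        ORDER BY total DESC, user_id
        LIMIT ?
        """,
        (item_id, limit),
    ).fetchall()


def items_voted_for(user_id):
    """All the items that a user has voted for."""
    rows = conn.execute(
        "SELECT DISTINCT item_id FROM Votes WHERE user_id = ? ORDER BY item_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Nothing is stored twice: the per-user totals are derived on demand, and the individual votings stay available for statistics.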
I am working on extracting financial information from a couple of tables and summarizing it into another table. What I want is to select several account items from a balancesheet table by accountID, summing the items and then saving the result into another table. I need to do this for several clients. I have worked out part of the problem in this bit of code:
;with
T1 AS (
SELECT CompanyID, QEndDate, Qtr, [50], [76] FROM (
SELECT CompanyID, ItemID, CAST(Amount AS DECIMAL(18,4)) AS Amount, QEndDate, Qtr from BalanceSheet
WHERE CompanyID = 2335 AND (ItemID = 50 OR ItemID = 76) AND Amount <> '-'
AND QA = 'Q' )as s
PIVOT (MAX(Amount) FOR ItemID IN ([50], [76])) AS P
)
UPDATE Funds SET Funds.EV = (@mCap - ([50] + [76])) / @EB
FROM T1
INNER JOIN Funds ON T1.CompanyID = Funds.CompanyID
The above works fine for one Company, but I need to do several at a time.
A little added info:
The Balancesheet table contains all information as VARCHAR, hence the <> '-', which some companies (but not all) use to indicate Not Applicable as opposed to zero.
The 50 and 76 are item numbers from the Accounts table and indicate which account the amount belongs to.
I am picking up the amounts and items from the balancesheet table and assembling them on one line so that I can then access the items, perform some math and generate a result to be stored in the Funds table. I hope that all makes sense.
So how can I turn this into something that can perform the operations for as many customers as I need.
Thanks, and also special thanks to all the good folks who contributed ideas and code that allowed me to get this far.
What's stopping this from working over many companies, as is?
You are limiting the query to one company with "WHERE CompanyID = 2335"; just broaden the scope of the data going into the pivot. If this is unhelpful, then I'm not sure I understand where the limitation is.
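To illustrate what "broaden the scope" could look like, here is a sketch that assembles items 50 and 76 on one line per company for a list of companies. It is not the original T-SQL: it runs on SQLite via Python's sqlite3, which has no PIVOT, so conditional aggregation (MAX over a CASE) is swapped in. The sample rows, the second company ID and the item50/item76 aliases are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BalanceSheet (
        CompanyID INT, ItemID INT, Amount TEXT, QEndDate TEXT, QA TEXT
    );
    -- invented rows; '-' marks Not Applicable, as described in the question
    INSERT INTO BalanceSheet VALUES
        (2335, 50, '100.5', '2024-03-31', 'Q'),
        (2335, 76, '20',    '2024-03-31', 'Q'),
        (2336, 50, '7',     '2024-03-31', 'Q'),
        (2336, 76, '-',     '2024-03-31', 'Q');
""")

company_ids = [2335, 2336]  # hypothetical list of companies to process
placeholders = ", ".join("?" for _ in company_ids)

# One output row per company and quarter end, with each item in its own column.
rows = conn.execute(
    f"""
    SELECT CompanyID, QEndDate,
           MAX(CASE WHEN ItemID = 50 THEN CAST(Amount AS REAL) END) AS item50,
           MAX(CASE WHEN ItemID = 76 THEN CAST(Amount AS REAL) END) AS item76
    FROM BalanceSheet
    WHERE CompanyID IN ({placeholders})
      AND ItemID IN (50, 76) AND Amount <> '-' AND QA = 'Q'
    GROUP BY CompanyID, QEndDate
    ORDER BY CompanyID
    """,
    company_ids,
).fetchall()
print(rows)
```

A company whose amount is '-' simply gets NULL in that column, so the arithmetic that follows has to decide how to treat it.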
Please forgive my ignorance here. SQL is decidedly one of the biggest "gaps" in my education that I'm working on correcting, come October. Here's the scenario:
I have two tables in a DB that I need to access certain data from. One is users, and the other is conversation_log. The basic structure is outlined below:
users:
id (INT)
name (TXT)
conversation_log
userid (INT) // same value as id in users - actually the only field in this table I want to check
input (TXT)
response (TXT)
(note that I'm only listing the structure for the fields that are {or could be} relevant to the current challenge)
What I want to do is return a list of names from the users table that have at least one record in the conversation_log table. Currently, I'm doing this with two separate SQL statements, with the one that checks for records in conversation_log being called hundreds, if not thousands of times, once for each userid, just to see if records exist for that id.
Currently, the two SQL statements are as follows:
select id from users where 1; (gets the list of userid values for the next query)
select id from conversation_log where userid = $userId limit 1; (checks for existing records)
Right now I have 4,000+ users listed in the users table. I'm sure that you can imagine just how long this method takes. I know there's an easier, more efficient way to do this, but being self-taught, this is something that I have yet to learn. Any help would be greatly appreciated.
You have to do what is called a 'Join'. This, um, joins the rows of two tables together based on values they have in common.
See if this makes sense to you:
SELECT DISTINCT users.name
FROM users JOIN conversation_log ON users.id = conversation_log.userid
Now JOIN by itself is an "inner join", which means that it only returns rows that have a match in both tables. In other words, if a user has no conversation_log row with a matching userid, nothing is returned for that user at all.
Also, +1 for having a clearly worded question : )
EDIT: I added a "DISTINCT", which filters out all of the duplicates. If a user appeared in more than one conversation_log row, and you didn't have DISTINCT, you would get the user's name more than once. This is because JOIN returns every combination of rows from the two tables that matches your JOIN ON criteria.
Something like this:
SELECT *
FROM users
WHERE EXISTS (
SELECT *
FROM conversation_log
WHERE users.id = conversation_log.userid
)
In plain English: select every row from users, such that there is at least one row from conversation_log with the matching userid.
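Here is a small, self-contained sketch comparing the EXISTS form with the JOIN ... DISTINCT form. It uses SQLite via Python's sqlite3 instead of the asker's database, with made-up users and log rows; the index on userid is an assumption that the lookup would benefit from one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conversation_log (userid INT, input TEXT, response TEXT);
    CREATE INDEX log_userid ON conversation_log (userid);

    -- invented data: ann has two log rows, bob has none, cy has one
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO conversation_log VALUES
        (1, 'hi', 'hello'), (1, 'bye', 'later'), (3, 'hey', 'yo');
""")

# One query replaces the per-user loop from the question.
via_exists = [row[0] for row in conn.execute("""
    SELECT name FROM users
    WHERE EXISTS (
        SELECT 1 FROM conversation_log
        WHERE conversation_log.userid = users.id
    )
    ORDER BY name
""")]

via_join = [row[0] for row in conn.execute("""
    SELECT DISTINCT users.name
    FROM users JOIN conversation_log ON users.id = conversation_log.userid
    ORDER BY users.name
""")]

print(via_exists, via_join)
```

Both queries list each user with at least one record exactly once; EXISTS can stop at the first matching log row, while the JOIN needs DISTINCT to collapse the duplicates.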
What you need to read is JOIN syntax.
SELECT COUNT(conversation_log.userid) AS log_count, users.name
FROM users LEFT JOIN conversation_log ON users.id = conversation_log.userid
GROUP BY users.name
You could add at the end, if you only wanted users with at least one record:
HAVING COUNT(conversation_log.userid) > 0