Is it possible to "group by" without losing the original rows? - MySQL

I have a table like this:
ID | name                  | commentsCount
1  | mysql for dummies     | 33
2  | mysql beginners guide | 22
and a query like this:
SELECT
...,
commentsCount -- will return 33 for the first row, 22 for the second one
FROM
mycontents
WHERE
name LIKE "%mysql%"
I also want to know the total of comments across all matching rows:
SELECT
...,
SUM(commentsCount) AS commentsCountAggregate -- should return 55
FROM
mycontents
WHERE
name LIKE "%mysql%"
But this one obviously returns a single row with the total.
Now I want to merge these two queries into a single one, because my actual query is very heavy to execute (it uses boolean full-text search, substring offset search, and sadly a lot more), so I don't want to execute it twice.
Is there a way to get the total of comments without running the SELECT twice?
Custom functions are welcome!
Using variables is also welcome; I have never used them...

You can cache the intermediate result in a temporary table, and then do the sum over that table.
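A minimal sketch of that approach, assuming the mycontents table from the question and a hypothetical temporary table name tmp_results:
-- run the heavy query once and cache its result
CREATE TEMPORARY TABLE tmp_results AS
SELECT ID, name, commentsCount
FROM mycontents
WHERE name LIKE "%mysql%";

-- per-row results come straight from the cached table
SELECT * FROM tmp_results;

-- the aggregate is computed over the cache, not the heavy query
SELECT SUM(commentsCount) AS commentsCountAggregate FROM tmp_results;

DROP TEMPORARY TABLE tmp_results;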

One obvious solution is storing the intermediate results within another 'temporary' table, and then performing the aggregation in a second step.
Another solution is preparing a lookup table containing the sums you need (but there obviously needs to be some grouping ID; I call it MASTER_ID), like this:
CREATE TABLE comm_lkp AS
SELECT MASTER_ID, SUM(commentsCount) as cnt
FROM mycontents
GROUP BY MASTER_ID
Also create an index on that table on the MASTER_ID column. Later, you can modify your query like this:
SELECT
...,
commentsCount,
cnt as commentsSum
FROM
mycontents as a
JOIN comm_lkp as b ON (a.MASTER_ID=b.MASTER_ID)
WHERE
name LIKE "%mysql%"
It also shouldn't hurt your performance as long as the lookup table stays relatively small.

A GROUP BY on one of the ID fields might do the trick. This will then give you the SUM(commentsCount) for each ID.
The query in your question is not detailed enough to know which of your fields/tables the ID field should come from.
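For illustration only, a hedged sketch of that idea, assuming a hypothetical grouping column category_id on mycontents (the question does not say which ID to group on):
SELECT category_id, SUM(commentsCount) AS commentsCountAggregate
FROM mycontents
WHERE name LIKE "%mysql%"
GROUP BY category_id;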

Related

Generate a query that shows the number of times questions are answered wrong

I have a table named countwronganswer with columns cwa_id and question_num. How can I write a query that returns two columns: one column that lists all the question_num values, and a second column that lists the number of times a cwa_id is related to each question_num?
Question Number | Total # of Mistakes
1               | 12
2               | 22
...etc
ATTENTION: This question was asked without awareness of the existence of COUNT() or GROUP BY, because of my knowledge level at the time. COUNT() with GROUP BY is the key to generating the second column of totals, which I was not fully aware of, so any attempt at that point to write the code for the data would have been close to meaningless. Vote up if you think it is useful or it resolved your issue.
Probably something like this:
SELECT question_num, COUNT(cwa_id) total_mistakes
FROM countwronganswer
GROUP BY question_num
select question_num, count(cwa_id)
from tableName
group by question_num

SQL query to check if any cell in a column matches a set of given strings.

I have this data in a column
Steffi | ND Baumecker | Cassy
I would like to write a query to find out whether any of the above exist in another column.
Example of the other column (Artist being the column name):
Artist
Steffi
Derrick Carter
Ben Klock
Craig Richards
I don't think a LIKE will work here, so I'm wondering what query I can use to return the artist name from the 'Artist' column when a match is made - so in the above example 'Steffi' would be returned.
Would I also need to remove the spaces before and after the | in the first column?
Thanks!
If I understand your problem properly: you want to filter rows using values from one column, searching for those values in another column?
SELECT a.first_name, a.last_name, a.nickname
FROM artist AS a
WHERE a.related_nickname IN (
SELECT sa.nickname
FROM artist AS sa
WHERE sa.popularity > 30
)
MySQL documentation: http://dev.mysql.com/doc/refman/5.1/en/any-in-some-subqueries.html
It seems that you are trying to achieve a complicated task, and I'd advise you to try a couple of things.
Subqueries are useful but can make your queries much slower, so using two queries might speed things up: the first query picks the values that will be used for filtering, and the second query searches the rows (see the sketch below).
If you filter by string, consider using indexes on your table: http://dev.mysql.com/doc/refman/5.1/en/create-index.html
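A rough sketch of that two-query idea, reusing the hypothetical artist table from the query above; the hard-coded nickname list in the second statement stands in for whatever the first query returned:
-- query 1: fetch the values that will drive the filter
SELECT sa.nickname
FROM artist AS sa
WHERE sa.popularity > 30;

-- query 2: search rows using those values (collected by the application)
SELECT a.first_name, a.last_name, a.nickname
FROM artist AS a
WHERE a.related_nickname IN ('Steffi', 'ND Baumecker', 'Cassy');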

MySQL find the last occurrence of a string pattern in a specific column

I have a MySQL table with a column called, e.g., name. The column data follows a specific pattern, nameBase+number. E.g.
name
----------
test0
test1
test2
stack0
stack1
stack2
Each time I want to add data to the column, I have to find the last number for the specific nameBase and add the new entry with number+1.
For example, if a new test entry comes in now, I have to add test3 to the db.
My question: what is the best way to 1. check whether the nameBase already exists in the db (something like contains), and 2. find the last nameBase number? E.g. here, for test, the last number is 2, so the new entry is test3.
Update: I finally used the Java Pattern class. So cool and easy; it made everything so simple. I could just add \d to the pattern, check whether it matches the name, and use the pattern group to easily access the second part.
The real solution here is to change the database schema to split this into two columns, the name and its number. It becomes trivial then to get the aggregate MAX() via
SELECT name, MAX(num) AS num FROM tbl GROUP BY name
However, if changing it is not an option, I would recommend using REPLACE() to remove the name portion from the column value, leaving only the number portion when querying, and getting the aggregate MAX() of that to find the highest existing number for it:
SELECT
MAX(CAST(REPLACE(name, <the name to search>, '') AS UNSIGNED)) AS maxnum
FROM tbl
WHERE
name LIKE '<the name to search>%'
Or, instead of LIKE, use a regular expression, which is more accurate than LIKE (if one name contains another name, the LIKE might match) but more expensive:
SELECT
MAX(CAST(REPLACE(name, <the name to search>, '') AS UNSIGNED)) AS maxnum
FROM tbl
WHERE
name REGEXP '^<the name to search>[0-9]+$'
I would do this with an additional two-column table that stores each name and the last assigned number. Then replace the nameBase+number column in your original table with a name column that is a foreign key to the additional table, and a number column holding the appropriate count for that entry.
This will be much easier and more efficient to manipulate.
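A minimal sketch of that additional table, with assumed table and column names (name_counters, last_num):
-- one row per base name, remembering the last number handed out
CREATE TABLE name_counters (
  name     VARCHAR(64) NOT NULL PRIMARY KEY,
  last_num INT NOT NULL DEFAULT 0
);

-- bump the counter for 'test' (creates the row with 1 if it is new)
INSERT INTO name_counters (name, last_num)
VALUES ('test', 1)
ON DUPLICATE KEY UPDATE last_num = last_num + 1;

-- read back the number to use for the new entry
SELECT last_num FROM name_counters WHERE name = 'test';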
If possible, I would restructure the table to place these in either 2 tables (better) or at least two columns (medium). The structure you have is not normalized at all :-/
Without knowing too much about your schema, here is my recommendation for the two-table solution (note: this is normalized and also follows the idiom "Do not store that which can be calculated"):
names
------
id | name
01 | test
02 | stack
name_hits
-------
name_id | date
01 | 01/01/2001
01 | 01/15/2001
01 | 04/03/2001
02 | 01/01/2001
...
and then select like this:
SELECT names.name, count(name_hits.name_id) as hits
FROM names JOIN name_hits ON names.id=name_hits.name_id
GROUP BY names.id
and insert like this:
INSERT INTO name_hits SELECT id, NOW() FROM names WHERE name = "stack";
Presuming that you are unable to change the structure of the table, you can do what you want. However, it is rather expensive.
What you would like to do is something like:
select name
from t
where left(name, length(<name parameter>)) = <name parameter>
order by name desc
limit 1
Unfortunately, your naming probably does not allow this, because you are not left padding the numeric portion with zeroes.
So, the following gets around this:
select name,
cast(substring(name, length(<name parameter>) + 1) as unsigned) as number
from t
where left(name, length(<name parameter>)) = <name parameter>
order by 2 desc
limit 1
This is not particularly efficient. Also, indexes cannot really help with this because the collating sequence for strings is different than for numbers (test0, test1, test10, test100, test11, etc. versus 0, 1, 2, 3, 4 . . .).
If you can, I would follow the advice of the others who suggest multiple columns or tables. I only offer this as a method where you don't have to modify the current table.
If you cannot change the schema, try this:
INSERT INTO names (name)
SELECT CONCAT("stack", CAST(TRIM(LEADING "stack" FROM name) AS UNSIGNED) + 1)
FROM names
WHERE name LIKE "stack%" ORDER BY name DESC LIMIT 1;
The idea is:
select the "highest" previous value,
chop off the name,
cast the remaining string as an int,
add one to it,
then put the name back on it.
I have not tested this... I hope it leads you in the right direction.
Note that I have used the constant string "stack" as an example; you will likely want to make that dynamic.

optimize SELECT query, knowing that we are dealing with a limited range

I am trying to include a limitation in a MySQL SELECT query.
My database is structured in such a way that, if a record is found in column one, then at most 5000 records with the same name can be found after that one.
Example:
mark
..mark repeated 5000 times
john
anna
..other millions of names
So in this table it would be more efficient to find the first mark, and continue to search at most 5000 rows down from that one.
Is it possible to do something like this?
Just make a btree index on the name column:
CREATE INDEX name ON your_table(name) USING BTREE
and MySQL will silently do exactly what you want each time it looks for a name.
Try with:
SELECT name
FROM table
ORDER BY (name = 'mark') DESC
LIMIT 5000
Basically you sort 'mark' first, then the rest follow and get limited.
It's actually quite difficult to understand your desired output, but I think this might be heading in the right direction:
(SELECT name
FROM table
WHERE name = 'mark'
LIMIT 5000)
UNION
(SELECT name
FROM table
WHERE name != 'mark'
ORDER BY name)
Using UNION, this will first get up to 5000 records where the name is 'mark', then get the remainder - you can add a LIMIT to the second query if required.
For performance, you should ensure that the columns used by ORDER BY and WHERE are indexed accordingly.
If you make sure that the column is properly indexed, MySQL will take care of optimisation for you.
Edit:
Thinking about it, I figured that this answer is only useful if I specify how to do that. User nobody beat me to the punch: CREATE INDEX name ON your_table(name) USING BTREE
This is exactly what database indexes are designed to do; this is what they are for. MySQL will use the index itself to optimise the search.

randomizing large dataset

I am trying to find a way to get a random selection from a large dataset.
We expect the set to grow to ~500K records, so it is important to find a way that keeps performing well while the set grows.
I tried a technique from http://forums.mysql.com/read.php?24,163940,262235#msg-262235, but it's not exactly random and it doesn't play well with a LIMIT clause: you don't always get the number of records that you want.
So I thought, since the PK is auto_increment, I could just generate a list of random IDs and use an IN clause to select the rows I want. The problem with that approach is that sometimes I need a random set of data with records having a specific status, a status that is found in at most 5% of the total set. To make that work I would first need to find out which IDs have that specific status, so that's not going to work either.
I am using mysql 5.1.46, MyISAM storage engine.
It might be important to know that the query to select the random rows is going to be run very often and the table it is selecting from is appended to frequently.
Any help would be greatly appreciated!
You could solve this with some denormalization:
Build a secondary table that contains the same pkeys and statuses as your data table
Add and populate a status group column, which will be a kind of sub-pkey that you auto-number yourself (a 1-based auto-increment relative to a single status):
Pkey Status StatusPkey
1 A 1
2 A 2
3 B 1
4 B 2
5 C 1
... C ...
n C m (where m = # of C statuses)
When you don't need to filter, you can generate random numbers against the pkey as you mentioned above. When you do need to filter, generate random numbers against the StatusPkeys of the particular status you're interested in.
There are several ways to build this table. You could have a procedure that you run on an interval, or you could do it live. The latter would be a performance hit, though, since calculating the StatusPkey could get expensive.
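A rough sketch of the interval-rebuild variant, with assumed names (data_table, status_lookup) and the user-variable numbering trick; treat it as an outline rather than a drop-in solution:
-- secondary table: same pkeys and statuses, plus the per-status counter
CREATE TABLE status_lookup (
  pkey        INT NOT NULL PRIMARY KEY,
  status      CHAR(1) NOT NULL,
  status_pkey INT NOT NULL,
  KEY status_idx (status, status_pkey)
);

-- populate it, numbering rows 1..n within each status
SET @prev := NULL, @n := 0;
INSERT INTO status_lookup (pkey, status, status_pkey)
SELECT pkey, status, status_pkey
FROM (
  SELECT pkey,
         status,
         @n := IF(status = @prev, @n + 1, 1) AS status_pkey,
         @prev := status AS prev_status
  FROM data_table
  ORDER BY status, pkey
) AS numbered;

-- pick one random row with status 'C'
SET @r := FLOOR(1 + RAND() * (SELECT COUNT(*) FROM status_lookup WHERE status = 'C'));
SELECT pkey FROM status_lookup WHERE status = 'C' AND status_pkey = @r;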
Check out this article by Jan Kneschke... It does a great job at explaining the pros and cons of different approaches to this problem...
You can do this efficiently, but you have to do it in two queries.
First get a random offset scaled by the number of rows that match your 5% conditions:
SELECT ROUND(RAND() * (SELECT COUNT(*) FROM MyTable WHERE ...conditions...))
This returns an integer. Next, use the integer as an offset in a LIMIT expression:
SELECT * FROM MyTable WHERE ...conditions... LIMIT 1 OFFSET ?
Not every problem must be solved in a single SQL query.