Writing an SQL query that uses COUNT - MySQL

I'm trying to write a query that uses COUNT in SQL. The query I am trying to write is:
Find users that reviewed at least 2 restaurants.
Here are the tables that I am using:
explain is_a_restaurant;
+--------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| business_id | int(11) | NO | PRI | NULL | |
| cuisine_type | varchar(20) | YES | | NULL | |
| total_seats | int(11) | YES | | 1 | |
+--------------+-------------+------+-----+---------+-------+
explain reviews;
+-------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| business_id | int(11) | NO | PRI | NULL | |
| user_id | int(11) | NO | PRI | NULL | |
| review_id | int(11) | NO | PRI | NULL | |
| review_date | date | YES | | NULL | |
| star_rating | int(1) | YES | | 1 | |
+-------------+---------+------+-----+---------+-------+
explain users;
+------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| user_id | int(11) | NO | PRI | NULL | |
| name | varchar(50) | YES | | NULL | |
| user_since | date | YES | | NULL | |
+------------+-------------+------+-----+---------+-------+
Here is what I've tried (I've tried a lot more than this, but here's one):
SELECT reviews.user_id FROM reviews JOIN is_a_restaurant ON
(reviews.business_id = is_a_restaurant) WHERE (count(*).is_a_restaurant > 1)
GROUP BY reviews.user_id ASC;
Here's the error that I get:
You have an error in your SQL syntax; check the manual that corresponds to
your MySQL server version for the right syntax to use
near '.is_a_restaurant > 1) GROUP BY reviews.user_id ASC' at line 1

I'm not a MySQL guy so my syntax might be a little out, but you probably want to use a HAVING clause.
SELECT reviews.user_id
FROM reviews
JOIN is_a_restaurant ON reviews.business_id = is_a_restaurant.business_id
GROUP BY reviews.user_id ASC
HAVING COUNT(*) > 1;
The HAVING clause is like a WHERE clause but is used for aggregated values (the COUNT in this case).
You were also missing the column name from is_a_restaurant in the JOIN expression.
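As a minimal sketch of the HAVING approach, the following uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; the sample rows are made up). It counts distinct restaurants per user, on the assumption that "at least 2 restaurants" should not be satisfied by reviewing the same restaurant twice. Sorting is done with ORDER BY rather than GROUP BY ... ASC, which newer MySQL versions no longer accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE is_a_restaurant (business_id INTEGER PRIMARY KEY, cuisine_type TEXT);
CREATE TABLE reviews (
    business_id INTEGER, user_id INTEGER, review_id INTEGER,
    PRIMARY KEY (business_id, user_id, review_id)
);
-- Hypothetical sample data: businesses 1 and 2 are restaurants, 3 is not.
INSERT INTO is_a_restaurant VALUES (1, 'thai'), (2, 'pizza');
INSERT INTO reviews VALUES
    (1, 10, 100), (2, 10, 101),   -- user 10: two different restaurants
    (1, 20, 102), (1, 20, 103),   -- user 20: the same restaurant twice
    (1, 30, 104), (3, 30, 105);   -- user 30: one restaurant, one non-restaurant
""")

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
rows = conn.execute("""
    SELECT r.user_id
    FROM reviews AS r
    JOIN is_a_restaurant AS ir ON r.business_id = ir.business_id
    GROUP BY r.user_id
    HAVING COUNT(DISTINCT r.business_id) >= 2
    ORDER BY r.user_id
""").fetchall()

user_ids = [row[0] for row in rows]
print(user_ids)
```

With HAVING COUNT(*) > 1 instead, user 20 in this sample data would also be returned, because that condition counts reviews rather than restaurants.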

You need the HAVING clause.
SELECT reviews.user_id
FROM reviews
JOIN is_a_restaurant ON (reviews.business_id = is_a_restaurant.business_id)
GROUP BY reviews.user_id ASC
HAVING count(*) > 1

count() is an aggregate function. SQL aggregate functions are those which return a single value, calculated from values in a column. You cannot use a WHERE clause with aggregate functions. Instead, you have to use a HAVING clause.
Below are the most commonly used aggregate functions.
AVG() - Returns the average value
COUNT() - Returns the number of rows
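FIRST() - Returns the first value (MS Access; not available as an aggregate in MySQL)
LAST() - Returns the last value (MS Access; not available as an aggregate in MySQL)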
MAX() - Returns the largest value
MIN() - Returns the smallest value
SUM() - Returns the sum
So, use HAVING instead of WHERE.
Take care and good luck!

Related

Nested queries in mysql which return primary key?

Q. Print the complete details of the product that is ordered by the maximum number of customers and whose price is greater than 3.0.
products
+--------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+----------------+
| productID | int | NO | PRI | NULL | auto_increment |
| Name | varchar(30) | NO | | NULL | |
| Price | double(3,2) | NO | | NULL | |
| CoffeeOrigin | varchar(30) | YES | | NULL | |
+--------------+-------------+------+-----+---------+----------------+
orders
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| orderID | int | NO | PRI | NULL | auto_increment |
| productID | int | YES | MUL | NULL | |
| customerID | int | YES | MUL | NULL | |
| Date_Time | DateTime | NO | | NULL | |
+------------+----------+------+-----+---------+----------------+
query:
select * from products where productID=y.id (
select y.id from (
select products.productsID as id, count(*) as counter
from orders join products on orders.productID=products.productID
group by productID order by counter desc limit 1
) y
);
What am I doing that is not correct?
Two things:
First, if you use a scalar subquery, you don't need to give it a table alias y. You only need to assign a table alias if you use a subquery as a derived table. That is, in the FROM clause.
Second, if you compare productID to the result of the scalar subquery, you don't need to reference the y.ID. The subquery expression itself can be the right hand side of the comparison.
You can write expressions to compare to a scalar subquery like this:
WHERE productID = ( ... subquery... )
No table alias following the subquery, and no need to reference y.ID.
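A minimal sketch of that shape, run through Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for the sake of a self-contained example). The sample rows are made up, and reading "maximum number of customers" as the highest count of distinct customerID values, with the price filter applied in the outer query, is only one interpretation of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (productID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE orders (orderID INTEGER PRIMARY KEY, productID INTEGER, customerID INTEGER);
-- Hypothetical sample data.
INSERT INTO products VALUES (1, 'Espresso', 3.5), (2, 'Latte', 4.0);
INSERT INTO orders (productID, customerID) VALUES (1, 7), (1, 8), (1, 9), (2, 7);
""")

# The scalar subquery returns exactly one value, so it can sit directly on the
# right-hand side of "=": no table alias after it and no y.id reference.
rows = conn.execute("""
    SELECT *
    FROM products
    WHERE Price > 3.0
      AND productID = (
          SELECT productID
          FROM orders
          GROUP BY productID
          ORDER BY COUNT(DISTINCT customerID) DESC
          LIMIT 1
      )
""").fetchall()
print(rows)
```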

Why does this query return an intermediate record?

I ran a somewhat nonsense query on MySQL, but because its output is the same each time, I'm wondering if someone can help me understand the underlying algorithm.
Here's the table Orders on which we'll execute the query (database taken from here, just in case someone's interested):
+----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| orderNumber | int(11) | NO | PRI | NULL | |
| orderDate | date | NO | | NULL | |
| requiredDate | date | NO | | NULL | |
| shippedDate | date | YES | | NULL | |
| status | varchar(15) | NO | | NULL | |
| comments | text | YES | | NULL | |
| customerNumber | int(11) | NO | MUL | NULL | |
+----------------+-------------+------+-----+---------+-------+
There are 326 records for now, with the largest orderNumber being 10425.
Now here's the query I ran (basically removed GROUP BY from a sensible query):
mysql> select count(1), orderNumber, status from orders;
+----------+-------------+---------+
| count(1) | orderNumber | status |
+----------+-------------+---------+
| 326 | 10100 | Shipped |
+----------+-------------+---------+
1 row in set (0.00 sec)
So I'm asking for the total number of rows, along with status and orderNumber, which can be just about anything under the given circumstances. But the query always returns orderNumber 10100, even if I log out and run it again.
Is there a predictable answer for this?
There's no predictable answer that you should rely on in your design. In general, the DB returns the values from the first row it happens to read that matches the query, but nothing guarantees which row that is. If you want predictability, apply an aggregate to every column (e.g. MIN or MAX to always get the smallest or largest value).
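A minimal sketch of the "aggregate every column" advice, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for the sake of a runnable example) and a cut-down, made-up version of the orders table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT NOT NULL);
-- Hypothetical sample rows, deliberately inserted out of order.
INSERT INTO orders VALUES (10102, 'Shipped'), (10100, 'Cancelled'), (10101, 'Shipped');
""")

# Every selected column is an aggregate, so the single result row no longer
# depends on which physical row the engine happens to read first.
row = conn.execute("""
    SELECT COUNT(1), MIN(orderNumber), MAX(orderNumber), MIN(status)
    FROM orders
""").fetchone()
print(row)
```

Note that each aggregate is computed independently, so MIN(orderNumber) and MIN(status) do not necessarily come from the same row.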

Calculate average of values between 2 columns sql

I have a table called validation_errors that looks like this:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| link | varchar(200) | NO | MUL | NULL | |
| message | varchar(500) | NO | | | |
| explanation | mediumtext | NO | | NULL | |
| type | varchar(50) | NO | | | |
| subtype | varchar(50) | NO | | | |
| message_id | varchar(50) | NO | | | |
+-------------+--------------+------+-----+---------+----------------+
The links table looks like this:
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| link | varchar(200) | NO | PRI | NULL | |
| visited | tinyint(1) | NO | | 0 | |
| validated | tinyint(1) | NO | | 0 | |
+-----------+--------------+------+-----+---------+-------+
I wish to calculate the average number of validation errors per page per topdomain.
I have a query that can fetch the amount of pages per topdomain:
SELECT substr(link, - instr(reverse(link), '.')) as domain , count(*) as count
FROM links
GROUP BY domain
ORDER BY count desc
limit 30;
And have a sql query that can fetch the amount of validation errors per top domain:
SELECT substr(link, - instr(reverse(link), '.')) as domain ,count(*) as count
FROM validation_errors
GROUP BY domain
ORDER BY count desc
limit 30;
What I now need to do is combine them into one query and divide the results of one column by the other, and I can't figure out how to do it.
Any help would be greatly appreciated.
First, use substring_index(), rather than your construct. Here is the query to join them together:
select domain, sum(numviews) as numviews, sum(numerrors) as numerrors,
sum(numerrors) / nullif(sum(numviews), 0) as error_rate
from ((SELECT substring_index(link, '.', -1) as domain , count(*) as numviews, 0 as numerrors
FROM links
GROUP BY domain
) UNION ALL
(SELECT substring_index(link, '.', -1) as domain , 0, count(*)
FROM validation_errors
GROUP BY domain
)
) d
GROUP BY domain;
With both variables, I don't know which 30 you want to choose, so I haven't included an order by.
Note that this doesn't use a join, it uses union all with aggregation. This ensures that you will get all domains, even those with no views and those with no errors.
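A small runnable sketch of the UNION ALL plus aggregation pattern, using Python's built-in sqlite3 module as a stand-in for MySQL. Several things here are assumptions: SQLite has no substring_index(), so a hypothetical helper named tld is registered in its place; the sample links are bare host names; and the 1.0 * factor is only there because SQLite would otherwise do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical helper standing in for MySQL's substring_index(link, '.', -1).
conn.create_function("tld", 1, lambda link: link.rsplit(".", 1)[-1])
conn.executescript("""
CREATE TABLE links (link TEXT PRIMARY KEY);
CREATE TABLE validation_errors (id INTEGER PRIMARY KEY, link TEXT);
-- Hypothetical sample data: 'org' has no errors, 'net' has no links row.
INSERT INTO links VALUES ('a.com'), ('b.com'), ('c.org');
INSERT INTO validation_errors (link) VALUES ('a.com'), ('a.com'), ('b.com'), ('x.net');
""")

# Each branch fills one counter and pads the other with 0; the outer query
# sums both per domain, so domains present in only one table are kept.
rows = conn.execute("""
    SELECT domain,
           SUM(numviews)  AS numviews,
           SUM(numerrors) AS numerrors,
           1.0 * SUM(numerrors) / NULLIF(SUM(numviews), 0) AS error_rate
    FROM (
        SELECT tld(link) AS domain, COUNT(*) AS numviews, 0 AS numerrors
        FROM links GROUP BY tld(link)
        UNION ALL
        SELECT tld(link), 0, COUNT(*)
        FROM validation_errors GROUP BY tld(link)
    ) AS d
    GROUP BY domain
    ORDER BY domain
""").fetchall()

result = {domain: (views, errors, rate) for domain, views, errors, rate in rows}
print(result)
```

The 'net' domain has errors but no pages, so NULLIF turns the zero divisor into NULL and the rate comes back as None instead of raising a division error.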

MySQL Limit query by time when there's not enough results

I have a big table with 670k rows, and I'm running a SELECT with a lot of WHERE conditions to search and filter useful results. The problem is that sometimes there are NO results for the selected filters, and the query scans the whole table and takes a lot of time. I'd like to stop the query if no results are found within, say, 30 seconds.
This is my query:
SELECT date, s.name, l.id, l.title,ratingsum,numvotes,keyword,tag
from news_links l
LEFT JOIN sources s on s.id = l.source
WHERE
l.date BETWEEN STR_TO_DATE(?,'%Y-%m-%d')
AND STR_TO_DATE(?,'%Y-%m-%d')
AND s.name like ?
AND ((numvotes-1) *?) <= l.ratingsum
AND numvotes > ?
AND matches = 1
AND tag >= ?
AND tag <= ?
AND (l.title like ? or l.keyword like ?)
AND category >= ?
AND category <= ?
order by date desc
limit ?,15
I tried running a sub-query instead of joining but it didn't speed up the query.
News table(640k rows)
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | UNI | NULL | auto_increment |
| link | varchar(450) | NO | PRI | NULL | |
| date | datetime | NO | MUL | NULL | |
| title | varchar(145) | NO | MUL | NULL | |
| source | int(11) | NO | MUL | NULL | |
| text | mediumtext | YES | | NULL | |
| numvotes | int(3) | NO | MUL | 0 | |
| ratingsum | int(3) | NO | | 0 | |
| matches | int(1) | NO | | 0 | |
| keyword | varchar(45) | YES | | NULL | |
| tag | int(1) | NO | | 0 | |
+-----------+--------------+------+-----+---------+----------------+
I have indexes set up on date,title,source,numvotes as well as the primary key on link
670k rows should run VERY fast in MySQL. You should have a closer look at your indexes. Start by adding a combined HASH index on news_links.source and news_links.matches:
ALTER TABLE news_links ADD INDEX myIdx1 USING HASH (source, matches)
What does EXPLAIN SELECT ... give you with that?
After that you can try to improve the performance further by including more information in your index (note that MySQL will generally use only one index per table in a query). Add a BTREE index:
ALTER TABLE news_links ADD INDEX myIdx2 USING BTREE (source, matches, `date`)
BTREE is good for range queries (e.g. with a BETWEEN in them). HASH is only good for equality conditions, and InnoDB and MyISAM tables do not support it (USING HASH is accepted there, but a BTREE index is created instead). If you want to index several columns with mixed conditions (range and equality), use BTREE.
What does EXPLAIN SELECT ... give you now?

'Unknown column in where clause' in Delphi but not MySQL

I'm about ready to rip my hair out and take up poop flinging as a living!
I have a MySQL query which runs fine in MySQL
SELECT
p.ID AS DataID,
p.timestamp AS Timestamp,
sum(p.Value * v.Factor) AS Value,
v.VirtualProfiles_id AS VProfileID
FROM
profiledata p
JOIN
profilevirtualjoin v
ON
p.Profile_ID=v.Profile_ID
WHERE
v.VirtualProfiles_id = 5
GROUP BY
v.Profile_ID,
p.timestamp
But when I try to run this as a query in a SQLDataSet in Delphi
SQLDataSet2.Active := False;
SQLDataSet2.CommandText := 'SELECT p.ID AS DataID, p.timestamp AS Timestamp, sum(p.Value * v.Factor) AS Value,' +
'v.VirtualProfiles_id AS VProfileID FROM profiledata p JOIN profilevirtualjoin v ON ' +
'p.Profile_ID=v.Profile_ID WHERE v.VirtualProfiles_id = ' + InttoStr(5)
+' GROUP BY v.Profile_ID, p.timestamp';
SQLDataSet2.Active := True;
I get an error
First chance exception at $765BC41F. Exception class TDBXError with message 'Unknown column 'v.VirtualProfiles_id' in 'where clause''. Process EMVS.exe (7556)
If anyone can offer any insight, I would be most appreciative.
EDIT:
I am using the MySQL server 5.5 and Delphi XE
What I am trying to do is this:
I have tables as follows:
Profile:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Designation | varchar(255) | YES | | NULL | |
| Description | text | YES | | NULL | |
| UnitID | int(11) | NO | PRI | NULL | |
+-------------+--------------+------+-----+---------+----------------+
profiledata
+------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| TimeStamp | datetime | YES | | NULL | |
| Value | double | YES | | NULL | |
| Profile_ID | int(11) | NO | PRI | NULL | |
+------------+----------+------+-----+---------+----------------+
Virtualprofiles
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Designation | varchar(45) | NO | | NULL | |
| Description | varchar(255) | YES | | NULL | |
| Unit_ID | int(11) | NO | PRI | 0 | |
+-------------+--------------+------+-----+---------+----------------+
profilevirtualjoin
+--------------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| VirtualProfiles_id | int(11) | NO | PRI | NULL | |
| Profile_ID | int(11) | NO | PRI | NULL | |
| Factor | double | NO | | 1 | |
+--------------------+---------+------+-----+---------+----------------+
What I need to do is to "produce" a new profile which is the sum of a set of existing profiles. So the data from the profiledata table must be summed where the Profile_ID is included in the virtual profile and the timestamp values are equal.
The Problem
So, the problem is this. The DBExpress driver provided with Delphi XE can only process Dynamic SQL queries, not MySQL Queries. Although Dynamic SQL is compatible with MySQL, it is not compatible the other way around.
Quoting from the MySQL Manual (sec 12.16.3):
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause.
The updated DBExpress driver included with Delphi XE3 includes specific support for MySQL code, and so this limitation is not applicable.
The Workaround
The solution to this problem is to create a view in MySQL server and to call it from Delphi using only Dynamic SQL compatible code. In the end the following workaround did the trick:
In MySQL:
CREATE VIEW `VirtualProfileData` AS
SELECT
p.ID AS DataID,
p.timestamp AS Timestamp,
sum(p.Value * v.Factor) AS Value,
v.VirtualProfiles_id AS VProfileID
FROM
profiledata p
JOIN
profilevirtualjoin v
ON
p.Profile_ID=v.Profile_ID
GROUP BY
v.Profile_ID,
p.timestamp
Then in Delphi
SQLDataSet2.Active := False;
SQLDataSet2.CommandText := 'SELECT * FROM VirtualProfileData WHERE VProfileID = ' + InttoStr(5);
SQLDataSet2.Active := True;
You changed the name of the column here:
v.VirtualProfiles_id AS VProfileID
After that point, in most cases (the exception being those involving grouping or aggregation), you need to refer to the column by the new name. I think that's the case here.
Try changing your WHERE clause to use the alias instead:
WHERE VProfileID = ' + InttoStr(5)
The problem is the compatibility between MySQL types and Delphi types. Try to use Delphi's basic types.
In my case changing from
SELECT * FROM table_name WHERE column_name = 'VALUE';
to
SELECT * FROM table_name WHERE column_name LIKE 'VALUE';
solved the problem and return the same result.
I didn't dig deeper to figure out what was happening, but it is a weird bug because it works fine with every other column except one specific column.