I'm not entirely sure if this is possible, but I suspect it is.
I'm trying to gather some very basic statistics, so I have a 'tracker' table that stores info on an ongoing basis, like so:
ID, IP, itemid
Each time an item is viewed, the visitor's IP address and the item ID are logged.
On a daily basis, I'd like to summarize this data and insert it into another table, like so:
ID, itemid, views
Now, I want the 'views' value to count unique visitors only, so duplicate IP addresses for an item are counted just once.
I know I could simply loop through them all and do it that way, but is it possible to do the entire process with just a single query?
I'm using MySQL
If you group the tracker table by itemid, the number of distinct IP addresses should be the number of views you want:
INSERT INTO newtable (itemid, views)
SELECT itemid, COUNT(DISTINCT IP)
FROM tracker
GROUP BY itemid;
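As a quick way to check the idea, here is a minimal sketch that runs the same INSERT ... SELECT against an in-memory SQLite database from Python. The sample rows and the column types are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with made-up sample data (assumption, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracker (id INTEGER PRIMARY KEY, ip TEXT, itemid INTEGER);
    CREATE TABLE newtable (id INTEGER PRIMARY KEY, itemid INTEGER, views INTEGER);
    INSERT INTO tracker (ip, itemid) VALUES
        ('10.0.0.1', 1), ('10.0.0.1', 1), ('10.0.0.2', 1),
        ('10.0.0.1', 2), ('10.0.0.1', 2);
""")

# One statement summarizes the log: each IP counts once per item.
conn.execute("""
    INSERT INTO newtable (itemid, views)
    SELECT itemid, COUNT(DISTINCT ip)
    FROM tracker
    GROUP BY itemid
""")

summary = conn.execute(
    "SELECT itemid, views FROM newtable ORDER BY itemid"
).fetchall()
print(summary)
```

Item 1 was viewed three times from two addresses and item 2 twice from one address, so the summary holds 2 and 1 unique views respectively.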
In other RDBMSs it can be done in the same manner:
insert into othertable (field_views, field_itemid)
select count(distinct t.IP), t.itemid from tracker t
group by t.itemid;
See also http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
Note that this solution assumes othertable.id is an auto-increment column.
Try this,
insert into newtable(itemid,views)
select itemid,count(*)
from (
select itemid
from tracker
group by itemid,ip
)
as a
group by a.itemid;
I have 2 tables (1) Updates (2) Companies
Updates Table Columns: ID, Title, inserted_at, updated_at, revisions, published_at, archived_at, versions
Companies Table columns: id, name, host, email, inserted_at, updated_at, features.
How do I write a query to show how many posts have been made by a company?
What I know so far is that I need to use COUNT in the query, but how can I get the number of updates per company with it?
SELECT COUNT (column_name)
FROM TABLE (table_name)
condition??
Thanks in advance.
You will have to add a "company_id" column to the Updates table that references the ID column in the Companies table. Only then can you identify which update was made by which company.
So the new structure will be as follows
Companies (ID, name, host, email, inserted_at, updated_at, features)
Updates (ID, title, inserted_at, updated_at, revisions, published_at, archived_at, company_id)
Then use the following command to select the count of updates
SELECT COUNT(*) FROM updates WHERE company_id = 1;
Or, if you want the count for all companies, use the following command
SELECT u.company_id, c.name, COUNT(u.company_id) FROM companies c, updates u WHERE c.id = u.company_id GROUP BY u.company_id, c.name;
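A small runnable sketch of the per-company count, using Python's sqlite3 module. The company names and update titles are invented for illustration, and the company_id column is the one proposed above. This variant swaps the inner join for a LEFT JOIN, which is a deliberate change so that companies without any updates still show up with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE updates (
        id INTEGER PRIMARY KEY,
        title TEXT,
        company_id INTEGER REFERENCES companies(id)  -- the proposed new column
    );
    -- Made-up sample rows (assumption).
    INSERT INTO companies (id, name) VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO updates (title, company_id) VALUES ('a', 1), ('b', 1), ('c', 2);
""")

# COUNT(u.id) ignores the NULLs produced by the LEFT JOIN,
# so a company with no updates gets 0 rather than 1.
per_company = conn.execute("""
    SELECT c.id, c.name, COUNT(u.id) AS update_count
    FROM companies c
    LEFT JOIN updates u ON u.company_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()
print(per_company)
```

With the inner join from the answer, the third company would simply be missing from the result instead of being listed with 0.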
I am creating a query for a log system. The table contains 100,000 rows or so, and I would like to remove duplicates for the following columns and only return the latest entry.
Columns to avoid duplicates,
user
ip
time_accessed
mainlocation
secondlocation
thirdlocation
did_user_have_access
The purpose of this is to see which portions of the site a user has visited. We do not need to know that they have visited a particular page 100 times; we only need to know that they visited it once.
The table has the following columns,
id
user
ip
time_accessed
mainlocation
secondlocation
thirdlocation
task
did_user_have_access
My question is, why do the following queries return such drastically different results? The MAX(id) query returns 450 results and the MAX(time_accessed) query returns 835. Shouldn't they return the same amount?
SELECT DISTINCT mainlocation, secondlocation, thirdlocation, ip, user, did_user_have_access, time_accessed
FROM `log_table`
WHERE `id` IN (SELECT MAX(id) AS id
FROM `log_table`
GROUP BY `mainlocation`, `secondlocation`, `thirdlocation`, `ip`, `user`, `did_user_have_access`)
ORDER BY `log_table`.`time_accessed` DESC;
SELECT DISTINCT mainlocation, secondlocation, thirdlocation, ip, user, did_user_have_access, time_accessed
FROM `log_table`
WHERE `time_accessed` IN (SELECT MAX(`time_accessed`) AS time_accessed
FROM `log_table`
GROUP BY `mainlocation`, `secondlocation`, `thirdlocation`, `ip`, `user`, `did_user_have_access`)
ORDER BY `log_table`.`time_accessed` DESC;
Without knowing how you populate the two fields you are applying MAX() to, this can hardly be answered, only guessed at.
Although, does it matter, as long as you get a proper result?
Also, you don't have to split this into two queries. Since you are grouping by exactly the fields you want de-duplicated, you are guaranteed unique combinations if you put MAX() in the main query:
SELECT DISTINCT mainlocation, secondlocation, thirdlocation,
ip, user, did_user_have_access,
MAX(`time_accessed`) AS last_accessed
FROM log_table
GROUP BY mainlocation, secondlocation, thirdlocation,
ip, user, did_user_have_access
In other words, each six-tuple will appear exactly once, together with its last_accessed value.
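One possible explanation for the differing row counts, offered only as a guess since the real data is not shown: `time_accessed IN (...)` matches every row whose timestamp equals the maximum of any group, not just its own group, so it can let extra rows through, whereas ids are unique. The sketch below illustrates this with Python's sqlite3 module on a simplified, made-up log_table (only user, mainlocation and time_accessed, with integer timestamps), and compares both original approaches with the single grouped query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in for the real table; the rows are invented.
    CREATE TABLE log_table (
        id INTEGER PRIMARY KEY, user TEXT, mainlocation TEXT, time_accessed INTEGER
    );
    INSERT INTO log_table (id, user, mainlocation, time_accessed) VALUES
        (1, 'alice', 'home', 100),
        (2, 'alice', 'home', 200),   -- alice's latest visit
        (3, 'bob',   'home', 200),   -- same timestamp as alice's latest
        (4, 'bob',   'home', 300);   -- bob's latest visit
""")

by_id = conn.execute("""
    SELECT DISTINCT user, mainlocation, time_accessed FROM log_table
    WHERE id IN (SELECT MAX(id) FROM log_table GROUP BY user, mainlocation)
    ORDER BY user, time_accessed
""").fetchall()

by_time = conn.execute("""
    SELECT DISTINCT user, mainlocation, time_accessed FROM log_table
    WHERE time_accessed IN (SELECT MAX(time_accessed) FROM log_table
                            GROUP BY user, mainlocation)
    ORDER BY user, time_accessed
""").fetchall()

grouped = conn.execute("""
    SELECT user, mainlocation, MAX(time_accessed) AS last_accessed
    FROM log_table
    GROUP BY user, mainlocation
    ORDER BY user
""").fetchall()

print(len(by_id), len(by_time), len(grouped))
```

Here bob's row with timestamp 200 slips into the second result because 200 happens to be alice's maximum. The grouped query avoids the problem by never comparing values across groups.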
I am trying to retrieve the first row among the duplicate rows, i.e. the one that occurred first.
--Table--
Order_No Product User
1 Book Student
2 Book Student
3 Book Student
I want to get the Order_No of the first duplicate row in Java. I have tried DISTINCT, DISTINCT TOP 1 and so on, but nothing worked. Please help.
SELECT min(order_no), product, user
FROM `table`
GROUP BY user, product
This is basic SQL:
SELECT min(order_no), product, user FROM table GROUP BY product, user
See also more information on GROUP BY
Every field that is not part of your GROUP BY needs an aggregate that decides which of the n potentially different values to pick. min() picks the lowest value (this also works for strings and dates), while max() picks the highest. Some dialects, such as MS Access, also offer First() and Last() to grab a value according to the order in which rows show up; MySQL does not.
Supposing you had other values to pick from, you might see something like:
SELECT min(order_no), product, user, min(creation_date),
sum(quantity), first(billing_address)
FROM orders GROUP BY product, user
SELECT t.*
FROM table t
WHERE NOT EXISTS ( SELECT 1
FROM table t2
WHERE t2.Product = t.Product
AND t2.User = t.User
AND t2.Order_No < t.Order_No
)
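To compare the two approaches from this thread side by side, here is a minimal sketch using Python's sqlite3 module. The table is called orders here (an assumed name, since the question only says "table"), and the extra 'Pen' row is made up so that there is more than one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY, product TEXT, user TEXT);
    INSERT INTO orders (order_no, product, user) VALUES
        (1, 'Book', 'Student'),
        (2, 'Book', 'Student'),
        (3, 'Book', 'Student'),
        (4, 'Pen',  'Student');   -- extra row, not in the question
""")

# Approach 1: aggregate, keeping the lowest order number per (product, user).
first_by_min = conn.execute("""
    SELECT MIN(order_no), product, user
    FROM orders
    GROUP BY product, user
    ORDER BY 1
""").fetchall()

# Approach 2: keep a row only if no earlier row has the same product and user.
first_by_not_exists = conn.execute("""
    SELECT t.order_no, t.product, t.user
    FROM orders t
    WHERE NOT EXISTS (SELECT 1 FROM orders t2
                      WHERE t2.product = t.product
                        AND t2.user = t.user
                        AND t2.order_no < t.order_no)
    ORDER BY t.order_no
""").fetchall()

print(first_by_min)
print(first_by_not_exists)
```

Both return the same rows on this data. The NOT EXISTS form has the advantage of returning the whole original row, which matters once the table has columns that are neither grouped nor aggregated.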
Within my J2EE web application, I need to generate a bar chart representing the percentage of users in the system with specific alerts. (EDIT - I forgot to mention, the graph only deals with alerts associated with the first situation of each user, thus the min(date) ).
A simplified (but structurally similar) version of my database schema is as follows :
users { id, name }
situations { id, user_id, date }
alerts { id, situation_id, alertA, alertB }
where users to situations are 1-n, and situations to alerts are 1-1.
I've omitted datatypes but the alerts (alertA and B) are booleans. In my actual case, there are many such alerts (30-ish).
So far, this is what I have come up with :
select sum(alerts.alertA), sum(alerts.alertB)
from alerts, (
select id, min(date)
from situations
group by user_id) as situations
where situations.id = alerts.situation_id;
and then divide these sums by
select count(users.id) from users;
This seems far from ideal.
Your recommendations/advice as to how to improve this query would be most appreciated (or maybe I need to re-think my database schema)...
Thanks,
Anthony
PS. I was also thinking of using a trigger to refresh a chart specific table whenever the alerts table is updated but I guess that's a subject for a different query (if it turns out to be problematic).
First, think about your schema again. You will have a lot of different alerts and you probably don't want to add a single column for every one of those.
Consider changing your alerts table to something like { id, situation_id, type, value } where type would be (A,B,C,....) and value would be your boolean.
Your task to calculate the percentages would then split up into:
(1) Count the total number of users:
SELECT COUNT(id) AS total FROM users
(2) Find the "first" situation for each user:
SELECT MIN(situations.id) AS id, situations.user_id
-- selects the minimum date for every user_id
FROM (SELECT user_id, MIN(date) AS min_date
FROM situations
GROUP BY user_id) AS first_situation
-- gets the situations.id for user with minimum date
JOIN situations ON
first_situation.user_id = situations.user_id AND
first_situation.min_date = situations.date
-- limits number of situations per user to 1 (possible min_date duplicates)
GROUP BY situations.user_id
(3) Count users for whom an alert is set in at least one of the situations in the subquery:
SELECT
alerts.type,
COUNT(situations.user_id)
FROM ( ... situations.user_id, situations.id ... ) AS situations
JOIN alerts ON
situations.id = alerts.situation_id
WHERE
alerts.value = 1
GROUP BY
alerts.type
Put those three steps together to get something like:
SELECT
alerts.type,
COUNT(situations.user_id)/users.total
FROM (SELECT MIN(situations.id) AS id, situations.user_id
FROM (SELECT user_id, MIN(date) AS min_date
FROM situations
GROUP BY user_id) AS first_situation
JOIN situations ON
first_situation.user_id = situations.user_id AND
first_situation.min_date = situations.date
GROUP BY situations.user_id
) AS situations
JOIN alerts ON
situations.id = alerts.situation_id
CROSS JOIN (SELECT COUNT(id) AS total FROM users) AS users
WHERE
alerts.value = 1
GROUP BY
alerts.type, users.total
All queries written from my head without testing. Even if they don't work exactly like that, you should still get the idea!
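For anyone who wants to try the combined query, here is a rough sketch against an in-memory SQLite database from Python. It assumes the normalized alerts layout { id, situation_id, type, value } suggested above; all sample rows are invented. Two details are adaptations rather than part of the answer: the inner aliases are shortened, and the count is multiplied by 1.0 because SQLite, unlike MySQL, performs integer division on integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE situations (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
    CREATE TABLE alerts (id INTEGER PRIMARY KEY, situation_id INTEGER,
                         type TEXT, value INTEGER);
    -- Made-up data: user 1 has two situations, the others have one each.
    INSERT INTO users (id, name) VALUES (1, 'u1'), (2, 'u2'), (3, 'u3'), (4, 'u4');
    INSERT INTO situations (id, user_id, date) VALUES
        (1, 1, '2020-01-01'), (2, 1, '2020-02-01'),
        (3, 2, '2020-01-05'), (4, 3, '2020-01-07'), (5, 4, '2020-01-09');
    INSERT INTO alerts (situation_id, type, value) VALUES
        (1, 'A', 1), (1, 'B', 0),
        (2, 'A', 0), (2, 'B', 1),   -- not a first situation, must be ignored
        (3, 'A', 1), (3, 'B', 1),
        (4, 'A', 0), (4, 'B', 0),
        (5, 'A', 0), (5, 'B', 0);
""")

shares = conn.execute("""
    SELECT a.type, COUNT(fs.user_id) * 1.0 / u.total AS share
    FROM (SELECT MIN(s.id) AS id, s.user_id
          FROM (SELECT user_id, MIN(date) AS min_date
                FROM situations
                GROUP BY user_id) AS f
          JOIN situations s
            ON s.user_id = f.user_id AND s.date = f.min_date
          GROUP BY s.user_id) AS fs
    JOIN alerts a ON a.situation_id = fs.id
    CROSS JOIN (SELECT COUNT(id) AS total FROM users) AS u
    WHERE a.value = 1
    GROUP BY a.type, u.total
    ORDER BY a.type
""").fetchall()
print(shares)
```

Alert A is set in the first situation of two of the four users and alert B in one, so the shares come out as 0.5 and 0.25. An alert type that is set for nobody will not appear in the result at all, which the charting code would need to treat as 0.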
I have a products table which contains duplicate products by a column id_str and not id. We use the id_str to track each product. This is what I tried thus far:
Created a temp table and truncated it, then ran the following query
INSERT INTO products_temp SELECT DISTINCT id_str, id, title, url, image_url, long_descr, mp_seller_name, customer_rating, curr_item_price, base_item_price, item_num, rank, created_at, updated_at, published, publish_ready, categories, feed_id, category_names, last_published_at, canonical_url, is_curated, pr_attributes, gender, rating, stock_status, uploadedimage_file_name, updated_by, backfill_text, image_width, image_height, list_source, list_source_time, list_category, list_type, list_image, list_name, list_domain, notes, street_date, list_product_rank, created_by from products
This moved everything over; however, when I searched the new table for duplicate id_str values:
SELECT id_str, COUNT(*) C FROM PRODUCTS GROUP BY id_str HAVING C > 1
I get the same result as I do on the original table. What am I missing?
One or more of the other columns make the rows being inserted unique.
You are only testing id_str in the count query.
Using SELECT DISTINCT only removes duplicated entire rows. It doesn't remove a row if only one of the values is the same and the others are different.
Assuming that id is unique, try this instead:
INSERT INTO products_temp
SELECT id_str, id, title, url, -- etc
FROM products
WHERE id IN (SELECT MIN(id) FROM products GROUP BY id_str)
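The difference between the two approaches can be seen in a small sketch using Python's sqlite3 module. The products table is cut down to three columns and filled with invented rows for illustration; the real table has many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down stand-in for the real tables (assumption).
    CREATE TABLE products (id INTEGER PRIMARY KEY, id_str TEXT, title TEXT);
    CREATE TABLE products_temp (id INTEGER PRIMARY KEY, id_str TEXT, title TEXT);
    INSERT INTO products (id, id_str, title) VALUES
        (1, 'p-1', 'Widget'), (2, 'p-1', 'Widget v2'),
        (3, 'p-2', 'Gadget'), (4, 'p-2', 'Gadget'),
        (5, 'p-3', 'Gizmo');
""")

# SELECT DISTINCT compares whole rows; since id is unique, nothing is removed.
distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT id_str, id, title FROM products)"
).fetchone()[0]

# Keep only the row with the lowest id for each id_str.
conn.execute("""
    INSERT INTO products_temp
    SELECT id, id_str, title
    FROM products
    WHERE id IN (SELECT MIN(id) FROM products GROUP BY id_str)
""")

kept = conn.execute(
    "SELECT id, id_str FROM products_temp ORDER BY id"
).fetchall()

# The duplicate check must run against the new table, not the original one.
remaining_dupes = conn.execute("""
    SELECT id_str, COUNT(*) FROM products_temp
    GROUP BY id_str HAVING COUNT(*) > 1
""").fetchall()
print(distinct_count, kept, remaining_dupes)
```

Note that this keeps whichever duplicate has the lowest id. If the newest row is the one worth keeping, MAX(id) would be the analogous choice.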
Try SELECT id_str, COUNT(*) C FROM PRODUCTS_TEMP GROUP BY id_str HAVING C > 1
In your case you are selecting again from the original table.
This is the simplest way I found to find and delete duplicates.
Note: Because of a bug with the InnoDB engine, for this to work you need to change your engine to MyISAM:
ALTER TABLE <table_name> ENGINE MyISAM
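then add a unique index to the column you are trying to find duplicates in, using IGNORE (note that ALTER IGNORE TABLE was removed in MySQL 5.7.4, so this only works on older versions):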
ALTER IGNORE TABLE <table_name> ADD UNIQUE INDEX(`<column_name>`)
and change your db engine back:
ALTER TABLE <table_name> ENGINE InnoDB
and if you want you can delete the index you just created, but I would suggest also looking into what caused the duplicates in the first place.