How to Insert Data into Tempdb Until Amount of Distinct Values Met - mysql

I have a main table named tblorder.
It contains CUID(Customer ID), CuName(Customer Name) and OrDate(Order Date) that I care about.
It is currently ordered by date in ascending order(ex. 2001 before 2002).
Objective:
Trying to retrieve most recent 1 Million DISTINCT Customer's CUID and CuNameS, and Insert them Into a Tempdb(#Recent1M) for Later Joining Uses.
So I:
Would Need Order By desc to flip the date to retrieve most recent 1 Million Customers
Only want first 1 Million DISTINCT Customer Information(CUID, CuName)
I know following code is not correct, but it is the main idea. I just can't figure out the correct syntax. So far I have the While Loop with Select Into as the most plausible solution.
SQL Platform: SSMS
Declare #DC integer
Set #DC = Count(distinct(CUID)) from #Recent1M))
While (#DC <1000000)
Begin
Select CuID,CuName into #Recent1MCus from tblorder
End
Thank you very much, I appreciate any help!

TOP 1000000 is the way to go, but you're going to need an ORDER BY clause or you will get arbitrary results. In your case, you mentioned that you wanted the most recent ones, so:
ORDER BY OrderDate DESC
Also, you might consider using GROUP BY rather than DISTINCT. I think it looks cleaner and keeps the select list a select list so you have the option to include whatever else you might want (as I took the liberty of doing). Notice that, because of the grouping, the ORDER BY now uses MAX(ordate) since customers can presumably have multiple ordate's and we are interested in the most recent. So:
select top 1000000 cuid, cuname, sum(order_value) as ca_ching, count(distinct(order_id)) as order_count
into #Recent1MCus
from tblorder
group by cuid, cuname
order by max(ordate) desc
I hope this helps.

Wouldn't you just do this?
select distinct top 1000000 cuid, cuname
into #Recent1MCus
from tblorder;
If the names might not be distinct, you can do:
select top 1000000 cuid, cuname
into #Recent1MCus
from (select o.*, row_number() over (partition by cuid order by ordate desc) as seqnum
from tblorder o
) o
where seqnum = 1;

Use DISTINCT and ORDER BY <colname> DESC to get latest unique records.
Try this SQL query:
SELECT DISTINCT top 1000000
cuid,
cuname
INTO #Recent1MCus
FROM tblorder
ORDER BY OrDate DESC;

Related

MySQL Count the rows of which there are most

I have a businesses table. I want to get the owners name who has the most businesses. So far all I only know that I need to use GROUP BY and HAVING.
The problem is I only know the most basic queries...
Maybe something like this can help:
select owner, count(*) cntx
from businesses
group by owner
order by cntx desc
limit 1
Or executing the query without limit 1 clause, and then iterate the result till your needs are satisfied.
Use GROUP BY and order descending and then take the top one record which is the one that has most businesses:
select OwnerId, count(*) from businesses
group by OwnerId order by count(*) desc
limit 1

MySQL - Get row and average of rows

First of all I'll just warn everyone that I'm something of a rookie with MySQL. Additionally I haven't tested the example queries below so they might not be perfect.
Anyway, I have a table of items, each one with a name, a category and a score. Every 12 hours the top item is taken, used and then removed.
So far I've simply been grabbing the top item with
SELECT * FROM items_table ORDER BY score DESC LIMIT 1
The only issue with this is that some categories are biased and have generally higher scores. I'd like to solve this by sorting by the score divided by the average score instead of simply sorting by the score. Something like
ORDER BY score/(GREATEST(5,averageScore))
I'm now trying to work out the best way to find averageScore. I have another table for categories so obviously I could add an averageScore column to that and run a cronjob to keep them updated and retrieve them with something like
SELECT * FROM items_table, categories_table WHERE items_table.category = categories_table.category ORDER BY items_table.score/(GREATEST(5,categories_table.averageScore)) DESC LIMIT 1
but this feels messy. I know I can find all the averages using something like
SELECT AVG(score) FROM items_table GROUP BY category
What I'm wondering is if there's some way to retrieve the averages right in the one query.
Thanks,
YM
You can join the query that calculates the averages:
SELECT i.*
FROM items_table i JOIN (
SELECT category, AVG(score) AS averageScore
FROM items_table
GROUP BY category
) t USING (category)
ORDER BY i.score/GREATEST(5, t.averageScore) DESC
LIMIT 1

MySQL GROUP_CONCAT DISTINCT ordering anomaly

I've been reading about this for the past day (even here), and have not found a suitable resource, so I'm popping it out there again :)
Check out these two queries:
SELECT DISTINCT transactions.StoreNumber FROM transactions WHERE PersonID=2 ORDER BY transactions.transactionID DESC;
and
SELECT GROUP_CONCAT(DISTINCT transactions.StoreNumber ORDER BY transactions.transactionID DESC SEPARATOR ',') FROM transactions WHERE PersonID=2 ORDER BY transactions.transactionID DESC;
From everything I've read I would expect the two queries to return the same results, with the second set grouped into CSV. They're not though.
Result set for query 1 (picture each value in its own row, formatting results here is cumbersome):
'611'
'345'
'340'
'310'
'327'
'323'
'362'
'360'
'330'
'379'
'356'
'367'
'375'
'306'
'354'
'389'
'343'
'346'
'357'
'733'
'370'
'347'
'703'
'355'
'341'
'342'
'358'
'351'
'319'
'365'
'372'
'368'
'353'
'363'
'349'
'369'
'336'
'364'
'202'
'366'
'416'
'731'
Result Set for query 2:
611,379,375,389,703,355,351,372,368,362,342,365,353,341,733,347,336,319,354,306,345,364,202,358,370,343,366,349,356,367,369,416,323,346,731,360,363,330,310,357,340,327
If I remove the DISTINCT clause, the results line up.
Can anyone point out what I'm doing wrong with the difference between the queries above?
The fact that removing DISTINCT from each query returns the same result indicates that DISTINCT is problematic within GROUP_CONCAT. Performing a GROUP BY outside the GROUP_CONCAT causes multiple rows to be returned, which isn't what I'm after.
Any ideas on how I can get a GROUP_CONCAT DISTINCT list of StoreNumber, in order of TransactionID DESC?
Thanks all
Consider your first query:
SELECT DISTINCT transactions.StoreNumber
FROM transactions
WHERE PersonID=2
ORDER BY transactions.transactionID DESC;
This is equivalent to:
SELECT transactions.StoreNumber
FROM transactions
WHERE PersonID=2
group by transactions.StoreNumber
ORDER BY transactions.transactionID DESC;
You are ordering by something that is not in the select list. So, MySQL chooses an arbitrary transactionid for each store number. This may differ from one execution to another.
I believe the same thing is happening in the group_concat(). The issue is that the arbitrary number chosen is different for each one.
If you want consistency, consider these two queries:
SELECT transactions.StoreNumber
FROM transactions
WHERE PersonID=2
group by transactions.StoreNumber
ORDER BY min(transactions.transactionID) DESC;
and:
SELECT GROUP_CONCAT(DISTINCT t.StoreNumber ORDER BY t.mintransactionID DESC SEPARATOR ',')
FROM (select t.StoreNumber, min(TransactionId) as minTransactionId
from transactions t
WHERE PersonID=2.transactionID
group by t.StoreNumber
) t
These should produce the same results.
Before you complain too loudly about MySQL, any other database would return an error on the first query, because, when using select distinct, you can only order by columns in the select list (or expressions composed of them).

I want to use max and count aggregate functions together in mysql query

I am using mysql 5.0.51b.
I have one table named xyz.
xyz table has a columns abc,location,pqr and lmn
Everytime an information is sent to particular location, its entry is done in xyz table.
I want to have the name of the location to which maximum information is sent.
The way i tried:
First of all i count the number of entries sent to each location using count and group by.
Now, the problem is to have the name(s) of the location with maximum values.
I have used temporary solution:
I use order by clause and limit to get the first record that has max values.
But this has one problem
If two locations has same count then above solution will give only one location and the other with same count will not be returned.
I want to solve this problem
Any hint will be very helpful
Thanks in anticipation
Thank you very much to everyone who has responded to my question and spared time to solve my problem.
However, i have got the solution:
SELECT count( * ) AS cnt2, location
FROM sms
GROUP BY location
HAVING cnt2 = (
SELECT count( * ) AS cnt
FROM sms
GROUP BY location
ORDER BY cnt DESC
LIMIT 1 );
very important hint on http://lists.mysql.com/mysql/203074
The inner query gives you the max count and outer query compares each count with max count.
Select MAX(cnt.Total) from
(select count( Name)as Total from Gk_RegUser_answer_rel group by Reg_UserId) As cnt
Try this interesting solution -
SELECT x.* FROM xyz x
JOIN (
SELECT GROUP_CONCAT(location) locations FROM (
SELECT location, COUNT(*) cnt FROM xyz GROUP BY location ORDER BY COUNT(*)) t
GROUP BY cnt DESC
LIMIT 1
) t
ON FIND_IN_SET(x.location, t.locations);
SELECT COUNT(*), `location` FROM `xyz` GROUP BY `location`
Above query will give you the information count to a specific location.
If this is not what you were looking for, can you provide some sample data and expected output?

Mysql Groupby and Orderby problem

Here is my data structure
alt text http://luvboy.co.cc/images/db.JPG
when i try this sql
select rec_id, customer_id, dc_number, balance
from payments
where customer_id='IHS050018'
group by dc_number
order by rec_id desc;
something is wrong somewhere, idk
I need
rec_id customer_id dc_number balance
2 IHS050018 DC3 -1
3 IHS050018 52 600
I want the recent balance of the customer with respective to dc_number ?
Thanx
There are essentially two ways to get this
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id IN (
select s.rec_id
from payments s
where s.customer_id='IHS050018' and s.dc_number = p.dc_number
order by s.rec_id desc
limit 1);
Also if you want to get the last balance for each customer you might do
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id IN (
select s.rec_id
from payments s
where s.customer_id=p.customer_id and s.dc_number = p.dc_number
order by s.rec_id desc
limit 1);
What I consider essentially another way is utilizing the fact that select rec_id with order by desc and limit 1 is equivalent to select max(rec_id) with appropriate group by, in full:
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id IN (
select max(s.rec_id)
from payments s
group by s.customer_id, s.dc_number
);
This should be faster (if you want the last balance for every customer), since max is normally less expensive then sort (with indexes it might be the same).
Also when written like this the subquery is not correlated (it need not be run for every row of the outer query) which means it will be run only once and the whole query can be rewritten as a join.
Also notice that it might be beneficial to write it as correlated query (by adding where s.customer_id = p.customer_id and s.dc_number = p.dc_number in inner query) depending on the selectivity of the outer query.
This might improve performance, if you look for the last balance of only one or few rows.
I don't think there is a good way to do this in SQL without having window functions (like those in Postgres 8.4). You probably have to iterate over the dataset in your code and get the recent balances that way.
ORDER comes before GROUP:
select rec_id, customer_id, dc_number, balance
from payments
where customer_id='IHS050018'
order by rec_id desc
group by dc_number