Populate second table with averages from first - mysql

I've got two tables, transactions and listings.
transactions lists all of my transactions that occur each day, and it has a column that lists the price of each transacted item. Several transactions occur each day. I would like to take the median or mean of all transactions in each day and populate a new column in listings with this information.
So my end result would have a column in listings called daily_price_average, that takes the average price of individual transaction information from transactions.
Any thoughts on how to do this?
Or how could I do this using a view?

You can do this in a view as:
create view v_listings as
    select l.*,
           (select avg(price)
            from transactions t
            where date(t.transactiondate) = l.date
           ) as daily_price_average
    from listings l;
To do the update, you would first make sure daily_price_average is a column in listings:
update listings join
       (select date(t.transactiondate) as tdate, avg(price) as avgprice
        from transactions t
        group by date(t.transactiondate)
       ) td
       on listings.date = td.tdate
set daily_price_average = td.avgprice;
Both of these assume that listings has a column called date to match the transaction dates against.

Use INSERT ... SELECT.
For the average:
INSERT INTO averages (day, average)
SELECT date, AVG(price)
FROM transactions
GROUP BY date;
Getting the median is... complicated.
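If you are on MySQL 8.0 or later, one way to get a per-day median is with window functions: number the rows within each day by price, then average the middle one or two rows. A sketch, assuming the transactions table from the question with columns transactiondate and price:

```sql
SELECT tdate, AVG(price) AS daily_price_median
FROM (
    SELECT DATE(transactiondate) AS tdate,
           price,
           ROW_NUMBER() OVER (PARTITION BY DATE(transactiondate)
                              ORDER BY price)               AS rn,
           COUNT(*)     OVER (PARTITION BY DATE(transactiondate)) AS cnt
    FROM transactions
) ranked
-- keep the middle row (odd count) or the two middle rows (even count)
WHERE rn IN (FLOOR((cnt + 1) / 2), CEIL((cnt + 1) / 2))
GROUP BY tdate;
```

For an odd count the two expressions pick the same row; for an even count they pick the two middle rows, whose AVG is the conventional median.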

Related

SQL: Obtain X, Y attributes for the greatest number of ID's (items sold)

I have 2 tables, one containing tickets and the other routes. I want to produce 2 attributes, flight_DATE and route_CODE, for the greatest number of tickets sold. Since there is no attribute that stores the number of tickets sold, I have to perform a query finding the max() of count(ticket_ID), as each ticket_ID represents a ticket sold... right? I have no actual database to try this out, so here's my query:
SELECT routes.ROUTE_CODE , tickets.FLIGHT_DATE
FROM routes JOIN tickets ON routes.ROUTE_CODE = tickets.ROUTE_CODE
WHERE count(ticket.TICKET_ID) = (
SELECT max(count(ticket.TICKET_ID)
)
I am not so confident with SQL, so is this even correct? Thanks in advance!
The idea behind your query is correct, but you wrote the max calculation without a grouping level, so you would get the count of all tickets sold.
You also cannot put a condition on an aggregated column in your where clause (as you did with count(ticket.TICKET_ID)); that kind of condition goes in the having clause.
This one should do what you need:
SELECT ROUTE_CODE, FLIGHT_DATE
FROM tickets
GROUP BY ROUTE_CODE, FLIGHT_DATE
HAVING count(TICKET_ID) = (          /* filter the rows with count = max count */
    SELECT max(CNT)                  /* get the max of the counts */
    FROM (
        SELECT count(TICKET_ID) as CNT   /* get the counts at the desired level */
        FROM tickets
        GROUP BY FLIGHT_DATE, ROUTE_CODE
    ) counts                         /* MySQL requires an alias on the derived table */
)
I removed the join with the routes table because the only column you were using (ROUTE_CODE) is available in the tickets table too. The join was not an error, though, and may be useful if you want to select more data from that table.
By the way, if you don't have a database available for testing, you can try your queries on sites like Rextester.

SQL Query: How to use sub-query or AVG function to find number of days between a new entry?

I have two tables. One, called entities, has these relevant columns:
id, company_id, and integration_id. The other table is transactions, with columns id, entity_id, and created_at. The foreign keys linking the two tables are integration_id and entity_id.
The transactions table shows the number of transactions received from each company from the entities table.
Ultimately, I want to find date range with highest volume of transactions occurring and then from that range find the average number of days between transaction for each company.
To find the date range I used this query.
SELECT DATE_FORMAT(t.created_at, '%Y/%m/%d'), COUNT(t.id)
FROM entities e
JOIN transactions t
  ON e.id = t.entity_id
GROUP BY DATE_FORMAT(t.created_at, '%Y/%m/%d');
I get this:
DATE_FORMAT(t.created_at, '%Y/%m/%d') | COUNT(t.id)
--------------------------------------+------------
2015/11/09                            | 4
...
From that I determine the range I want to use as 2015/11/09 to 2015/12/27
and I made this query
SELECT company_id, COUNT(t.id)
FROM entities e
INNER JOIN transactions t
ON e.integration_id = t.entity_id
WHERE t.created_at BETWEEN '2015/11/09' AND '2015/12/27'
GROUP BY company_id;
I get this:
company_id | COUNT(t.id)
-----------+------------
1234       | 17
and so on
Which gives me the total transactions made by each company over this date range. What's the best way now to query for the average number of days between transactions by company? How can I sub-query or is there a way to use the AVG function on dates in a WHERE clause?
EDIT:
Playing around with the query, I'm wondering if there is a way I can do
SELECT company_id, (49 / COUNT(t.id)) ...
(49 because that is the number of days in that date range) in order to get the average number of days between transactions?
I think this might be it; does that make sense?
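The span-divided-by-count idea can work, with one refinement: n transactions define only n - 1 gaps between them, so dividing the span by COUNT(t.id) - 1 gives the true average gap. A sketch, assuming the schema and date range from the question:

```sql
SELECT e.company_id,
       DATEDIFF(MAX(t.created_at), MIN(t.created_at))
           / (COUNT(t.id) - 1) AS avg_days_between
FROM entities e
JOIN transactions t ON e.integration_id = t.entity_id
WHERE t.created_at BETWEEN '2015-11-09' AND '2015-12-27'
GROUP BY e.company_id
-- a company with a single transaction has no gap to average
HAVING COUNT(t.id) > 1;
```

Using each company's own MAX/MIN dates rather than the fixed 49-day window avoids penalizing a company whose first transaction fell mid-range.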
I think this may work:
select z.company_id,
       datediff(max(y.created_at), min(y.created_at)) / count(y.id) as avg_days_between_orders,
       max(y.created_at) as latest_order,
       min(y.created_at) as earliest_order,
       count(y.id) as orders
from (select entity_id, max(t.created_at) latest, min(t.created_at) earliest
      from entities e, transactions t
      where e.id = t.entity_id
      group by entity_id
      order by count(t.id) desc
      limit 1) x,
     transactions y,
     entities z
where z.id = x.entity_id
  and z.integration_id = y.entity_id
  and y.created_at between x.earliest and x.latest
group by company_id;
It's tough without the data. There's a possibility that I have reference to integration_id incorrect in the subquery/join on the outer query.

MySQL: WHERE COUNT(*) = 0

I am trying to get all the customer_ids for which no rows are found, for example:
SELECT customer_id FROM transaction WHERE count(*) = '0'
I have tried this as well:
SELECT customer_id, count(*) as total_rows FROM transaction WHERE total_rows = '0'
But I get the error that total_rows is not a column.
The easiest way to do this is to think about it in a bit of a different way: "how do I get a list of all customers who have no transaction history?"
Simple! You get a list of all of the customers, join it against their transactions and filter out any customers who have a non-empty list of transactions. Or, in SQL:
SELECT
customer.customer_id
FROM customer
LEFT JOIN transaction
ON transaction.customer_id = customer.customer_id
WHERE
transaction.transaction_id IS NULL
Note that you cannot simply use the transaction table the way you're attempting: it is not a complete list of customer_ids; rather, it contains only the IDs of customers who have an order.
Instead of operating on transaction and finding customers with no transactions (which you literally cannot do), you must find all customers and then filter by those who have no transactions. Similar concept, just the opposite order.
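The same anti-join can also be written with NOT EXISTS, which some find reads more directly as "customers with no transactions". A sketch using the table and column names from the answer above:

```sql
SELECT c.customer_id
FROM customer c
WHERE NOT EXISTS (
    SELECT 1
    FROM transaction t
    WHERE t.customer_id = c.customer_id  -- no matching transaction row
);
```

On modern MySQL versions the optimizer typically executes both forms as the same anti-join, so the choice is mostly a matter of readability.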

complex query in mysql for generating a report datewise

I have following tables in mysql:
salesinvoices (with invoice number, ledgerid and date)
salesinvoiceitems (with invoice number)
payments( with invoice number, ledgerid and date)
Now I need to calculate the total amount from the salesinvoiceitems table against a specific invoice number for a particular ledgerid, using the tax calculations etc. (columns included in the table).
Then I have a payments table that maintains the records of all the payments made against specific invoices, date-wise. This table also contains the invoice number and ledger id.
I need to generate a report for a specific ledger id showing the end balance. I am clueless about how to write such a query. Please shed some light.
I assume your salesinvoiceitems table has an "amount" field so that we can calculate the sum of the item amounts per invoice. In the example below I call that column "item_amt". We could try doing something like this....
select ledgerid, sum(balance) as total_balance
from
    -- subquery to calculate the balance per invoice and ledgerid
    (select distinct
            invoices.invoice_number,
            invoices.ledgerid,
            tot_item_amt,
            tot_payment_amt,
            (tot_item_amt - tot_payment_amt) as balance
     from salesinvoices as invoices
     -- subquery to get the sum of all the items on an invoice
     inner join (select invoice_number, sum(item_amt) as tot_item_amt
                 from salesinvoiceitems
                 group by invoice_number) as items
             on invoices.invoice_number = items.invoice_number
     -- subquery to get the sum of all the payments made against an invoice
     inner join (select invoice_number, sum(payment_amt) as tot_payment_amt
                 from payments
                 group by invoice_number) as payments
             on invoices.invoice_number = payments.invoice_number
    ) t
-- sum the balances, grouped by ledgerid
group by ledgerid;
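One caveat with the inner joins above: an invoice that has no payment rows yet disappears from the report entirely. If unpaid invoices should count at their full balance, the payments join can become a left join with COALESCE. A sketch using the same assumed column names:

```sql
select ledgerid, sum(balance) as total_balance
from
    (select invoices.invoice_number,
            invoices.ledgerid,
            -- an invoice with no payments keeps its full item total as balance
            (tot_item_amt - coalesce(tot_payment_amt, 0)) as balance
     from salesinvoices as invoices
     inner join (select invoice_number, sum(item_amt) as tot_item_amt
                 from salesinvoiceitems
                 group by invoice_number) as items
             on invoices.invoice_number = items.invoice_number
     left join (select invoice_number, sum(payment_amt) as tot_payment_amt
                from payments
                group by invoice_number) as payments
            on invoices.invoice_number = payments.invoice_number
    ) t
group by ledgerid;
```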

MySQL huge tables JOIN makes database collapse

Following my recent question Select information from last item and join to the total amount, I am having some memory problems while generating tables.
I have two tables sales1 and sales2 like this:
id | dates | customer | sale
With this table definition:
CREATE TABLE sales (
id int auto_increment primary key,
dates date,
customer int,
sale int
);
sales1 and sales2 have the same definition, but sales2 has sale = -1 in every row. A customer can be in neither, one, or both tables. Both tables have around 300,000 records and many more fields than indicated here (around 50). They are InnoDB.
I want to select, for each customer:
number of purchases
last purchase value
total amount of purchases, when it has a positive value
The query I am using is:
SELECT a.customer, count(a.sale), max_sale
FROM sales a
INNER JOIN (SELECT customer, sale max_sale
from sales x where dates = (select max(dates)
from sales y
where x.customer = y.customer
and y.sale > 0
)
)b
ON a.customer = b.customer
GROUP BY a.customer, max_sale;
The problem is:
I have to get the results, which I need for certain calculations, separated by dates: information on year 2012, information on year 2013, but also information from all the years together.
Whenever I do just one year, it takes about 2-3 minutes to store all the information.
But when I try to gather information from all the years, the database crashes and I get messages like:
InternalError: (InternalError) (1205, u'Lock wait timeout exceeded; try restarting transaction')
It seems that joining such huge tables is too much for the database. When I explain the query, almost all of the time goes to creating a tmp table.
I thought of splitting the data gathering into quarters: get the results for every three months, then join and sort them. But I guess this final join and sort would be too much for the database again.
So, what would you experts recommend to optimize these queries, given that I cannot change the table structure?
300k rows is not a huge table. We frequently see 300 million row tables.
The biggest problem with your query is that you're using a correlated subquery, so it has to re-execute the subquery for each row in the outer query.
It's often the case that you don't need to do all your work in one SQL statement. There are advantages to breaking it up into several simpler SQL statements:
Easier to code.
Easier to optimize.
Easier to debug.
Easier to read.
Easier to maintain if/when you have to implement new requirements.
Number of Purchases
SELECT customer, COUNT(sale) AS number_of_purchases
FROM sales
GROUP BY customer;
An index on sales(customer,sale) would be best for this query.
Last Purchase Value
This is the greatest-n-per-group problem that comes up frequently.
SELECT a.customer, a.sale as max_sale
FROM sales a
LEFT OUTER JOIN sales b
ON a.customer=b.customer AND a.dates < b.dates
WHERE b.customer IS NULL;
In other words, try to match row a to a hypothetical row b that has the same customer and a greater date. If no such row is found, then a must have the greatest date for that customer.
An index on sales(customer,dates,sale) would be best for this query.
If you might have more than one sale for a customer on that greatest date, this query will return more than one row per customer. You'd need to find another column to break the tie. If you use an auto-increment primary key, it's suitable as a tie breaker because it's guaranteed to be unique and it tends to increase chronologically.
SELECT a.customer, a.sale as max_sale
FROM sales a
LEFT OUTER JOIN sales b
ON a.customer=b.customer AND (a.dates < b.dates OR a.dates = b.dates and a.id < b.id)
WHERE b.customer IS NULL;
Total Amount of Purchases, When It Has a Positive Value
SELECT customer, SUM(sale) AS total_purchases
FROM sales
WHERE sale > 0
GROUP BY customer;
An index on sales(customer,sale) would be best for this query.
You should consider using NULL to signify a missing sale value instead of -1. Aggregate functions like SUM() and COUNT() ignore NULLs, so you don't have to use a WHERE clause to exclude rows with sale < 0.
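With NULL in place of -1, the first and third queries above collapse into one pass, since COUNT(col) and SUM(col) both skip NULL rows. A sketch against the sales definition from the question:

```sql
SELECT customer,
       COUNT(sale) AS number_of_purchases,  -- COUNT(col) ignores NULLs
       SUM(sale)   AS total_purchases       -- no WHERE sale > 0 needed
FROM sales
GROUP BY customer;
```

Note that COUNT(*) would still count every row; it is the column form, COUNT(sale), that skips the NULLs.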
Re: your comment
What I have now is a table with fields year, quarter, total_sale (relating to the pair (year, quarter)) and sale. What I want to gather is information for a certain period: this quarter, other quarters, year 2011... The info has to be split into top customers, ones with bigger sales, etc. Would it be possible to get the last purchase value from customers with total_purchases bigger than 5?
Top Five Customers for Q4 2012
SELECT customer, SUM(sale) AS total_purchases
FROM sales
WHERE (year, quarter) = (2012, 4) AND sale > 0
GROUP BY customer
ORDER BY total_purchases DESC
LIMIT 5;
I'd want to test it against real data, but I believe an index on sales(year, quarter, customer, sale) would be best for this query.
Last Purchase for Customers with Total Purchases > 5
SELECT a.customer, a.sale as max_sale
FROM sales a
INNER JOIN sales c ON a.customer=c.customer
LEFT OUTER JOIN sales b
ON a.customer=b.customer AND (a.dates < b.dates OR a.dates = b.dates and a.id < b.id)
WHERE b.customer IS NULL
GROUP BY a.id
HAVING COUNT(*) > 5;
As in the other greatest-n-per-group query above, an index on sales(customer,dates,sale) would be best for this query. It probably can't optimize both the join and the group by, so this will incur a temporary table. But at least it will only do one temporary table instead of many.
These queries are complex enough. You shouldn't try to write a single SQL query that can give all of these results. Remember the classic quote from Brian Kernighan:
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
I think you should try adding an index on sales(customer, date). The subquery is probably the performance bottleneck.
You can make this puppy scream. Dump the whole inner join query. Really. This is a trick virtually no one seems to know about.
Assuming dates is a datetime, convert it to a sortable string, concatenate the values you want, take the max (or min), substring, cast. You may need to adjust the date conversion function (this one works in MS-SQL), but the idea will work anywhere:
SELECT customer,
       count(sale),
       max_sale = cast(substring(max(convert(char(19), dates, 120) + str(sale, 12, 2)), 20, 12) as numeric(12, 2))
FROM sales a
GROUP BY customer
Voilà. If you need more result columns, do:
SELECT yourkey
     , maxval  = left(val, N1)                 -- you often won't need this
     , result1 = substring(val, N1+1, N2)
     , result2 = substring(val, N1+N2+1, N3)   -- etc. for more values
FROM (SELECT yourkey
           , val = max(cast(maxval as char(N1))
                     + cast(resultCol1 as char(N2))
                     + cast(resultCol2 as char(N3)))
      FROM yourtable
      GROUP BY yourkey) t
Be sure that you have fixed lengths for all but the last field. This takes a little work to get your head around, but it is very learnable and repeatable. It will work on any database engine, and even if you have rank functions, it will often significantly outperform them.
More on this very common challenge here.
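Since the question is about MySQL, here is the same trick translated into MySQL syntax. A sketch that assumes dates is a DATETIME and sale is a non-negative integer (negative values would break the string ordering of the padded column):

```sql
SELECT customer,
       COUNT(sale) AS num_sales,
       -- DATE_FORMAT yields a fixed 19-char sortable prefix; LPAD makes the
       -- sale column fixed-width, so MAX() of the concatenation picks the
       -- sale belonging to the latest date
       CAST(SUBSTRING(MAX(CONCAT(DATE_FORMAT(dates, '%Y-%m-%d %H:%i:%s'),
                                 LPAD(sale, 12, '0'))),
                      20, 12) AS UNSIGNED) AS last_sale
FROM sales
GROUP BY customer;
```

This scans the table once with no join at all, which is the point of the technique.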