I have a database containing certain strings, such as '{TICKER|IBM}', which I will refer to as ticker-strings. My goal is to count the number of ticker-strings per day, for multiple ticker-strings.
My database table 'tweets' has the columns 'tweet_id', 'created_at' (dd/mm/yyyy hh/mm/ss) and 'processed_text'. The ticker-strings, such as '{TICKER|IBM}', appear in the 'processed_text' column.
At this moment, I have a working SQL query for counting one ticker-string (thanks to the help of other Stackoverflow-ers). What I would like to have is a SQL query in which I can count multiple strings (next to '{TICKER|IBM}' also '{TICKER|GOOG}' and '{TICKER|BAC}' for instance).
The working SQL query for counting one ticker-string is as follows:
SELECT d.date, IFNULL(t.count, 0) AS tweet_count
FROM all_dates AS d
LEFT JOIN (
SELECT COUNT(DISTINCT tweet_id) AS count, DATE(created_at) AS date
FROM tweets
WHERE processed_text LIKE '%{TICKER|IBM}%'
GROUP BY date) AS t
ON d.date = t.date
The eventual output should thus give a column with the date, a column with {TICKER|IBM}, a column with {TICKER|GOOG} and one with {TICKER|BAC}.
I was wondering whether this is possible and whether you have a solution for this? I have more than 100 different ticker-strings. Of course, doing them one-by-one is an option, but it is a very time-consuming one.
If I understand correctly, you can do this with conditional aggregation:
SELECT d.date, coalesce(IBM, 0) as IBM, coalesce(GOOG, 0) as GOOG, coalesce(BAC, 0) AS BAC
FROM all_dates d LEFT JOIN
(SELECT DATE(created_at) AS date,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%' then tweet_id
END) as IBM,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%' then tweet_id
END) as GOOG,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%' then tweet_id
END) as BAC
FROM tweets
GROUP BY date
) t
ON d.date = t.date;
I'd return the specified resultset like this, adding expressions to the SELECT list for each "ticker" I want returned as a separate column:
SELECT d.date
, IFNULL(SUM(t.processed_text LIKE '%{TICKER|IBM}%' ),0) AS `cnt_ibm`
, IFNULL(SUM(t.processed_text LIKE '%{TICKER|GOOG}%'),0) AS `cnt_goog`
, IFNULL(SUM(t.processed_text LIKE '%{TICKER|BAC}%' ),0) AS `cnt_bac`
, IFNULL(SUM(t.processed_text LIKE '%{TICKER|...}%' ),0) AS `cnt_...`
FROM all_dates d
LEFT
JOIN tweets t
ON t.created_at >= d.date
AND t.created_at < d.date + INTERVAL 1 DAY
GROUP BY d.date
NOTES: The expressions within the SUM aggregates above are evaluated as booleans, so they return 1 (if true), 0 (if false), or NULL. I'd avoid wrapping the created_at column in a DATE() function and use a range scan instead, especially if a predicate (WHERE clause) is added that restricts the values of `date` being returned from `all_dates`.
As an alternative, expressions like this will return an equivalent result:
, SUM(IF(t.processed_text LIKE '%{TICKER|IBM}%' ,1,0)) AS `cnt_ibm`
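With more than 100 ticker-strings, the SELECT list in either answer can be generated instead of typed by hand. A minimal sketch in Python, assuming the tickers are plain alphanumeric symbols and reusing the table and column names from the question; build_ticker_query and its tickers argument are hypothetical names, not part of any library:

```python
import re


def build_ticker_query(tickers):
    """Build one conditional-aggregation query with a count column per ticker."""
    columns = []
    for ticker in tickers:
        # Only allow simple symbols, since the value is pasted into the SQL
        # text and is also used as part of a column alias.
        if not re.fullmatch(r"[A-Za-z0-9]+", ticker):
            raise ValueError(f"unexpected ticker symbol: {ticker!r}")
        columns.append(
            f"  , IFNULL(SUM(t.processed_text LIKE '%{{TICKER|{ticker}}}%'), 0)"
            f" AS `cnt_{ticker.lower()}`"
        )
    return "\n".join(
        ["SELECT d.date"]
        + columns
        + [
            "FROM all_dates d",
            "LEFT JOIN tweets t",
            "  ON t.created_at >= d.date",
            " AND t.created_at < d.date + INTERVAL 1 DAY",
            "GROUP BY d.date",
        ]
    )


print(build_ticker_query(["IBM", "GOOG", "BAC"]))
```

The generated text has the same shape as the hand-written query above, so it can be checked on three tickers before running it with the full list.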
I am trying to speed up a MySQL query.
From a column called "MISC", I first have to extract a "traceID" value, which is then used to match rows in another table.
Example of the MISC column:
PFFCC_Strip/fkk49322/PMethod=Diners/CardType=Diners/9999******9999/2010/TraceId=7122910
I extract the value "7122910" as traceID and find the corresponding row with a left join. Since the traceId value is unique, there should be only one matching row in each table.
I cannot add indexes to the tables to speed this up. Is there any approach that could make this query run faster? As it is, it takes a few seconds to run, which is not acceptable.
select *
from
(select TraceID,PP,UDef2, Payment_Method, Approved, TransactionID, Amount
from pr) pr
left join
(select
PAYMENT_ID as Payment_ID_omega, TRANSACTION_TYPE,
REQUESTED_AMOUNT, AMOUNT, `STATUS` as StatusRef_omega,
REQUEST_DATE, Agent,
if (locate('TraceId=',MISC)>0, SUBSTRING_INDEX(MISC,'TraceId=',-1),'') as traceID
from BankingActivity ) omega
on pr.TraceID = omega.traceID
having
(REQUEST_DATE BETWEEN DATE_ADD(DATE(NOW()), INTERVAL -1 DAY) AND NOW())
ORDER BY pr.TraceID DESC
You can place the filter inside the subquery, before the join; that should make a difference. Indexes on pr(TraceID) and BankingActivity(REQUEST_DATE) would also help if you are able to add them (traceID itself is computed from MISC, so it cannot be indexed as an ordinary column). For a more optimised query, please post the execution plan.
select * from(select TraceID
,PP
,UDef2
,Payment_Method
,Approved
,TransactionID
,Amount
from pr) pr
left join (select PAYMENT_ID as Payment_ID_omega
,TRANSACTION_TYPE
,REQUESTED_AMOUNT
,AMOUNT
,`STATUS` as StatusRef_omega
,REQUEST_DATE
,Agent
,if (locate('TraceId=', MISC) > 0, SUBSTRING_INDEX(MISC,'TraceId=',-1),'') as traceID
from BankingActivity
WHERE REQUEST_DATE BETWEEN DATE_ADD(DATE(NOW()), INTERVAL -1 DAY) AND NOW()) omega
on pr.TraceID = omega.traceID
ORDER BY pr.TraceID DESC
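For reference, the expression that builds traceID in both queries keeps everything to the right of the last 'TraceId=' marker, or an empty string when the marker is absent. A small Python sketch of the same logic, using the sample MISC value from the question; extract_trace_id is a hypothetical helper written only for illustration:

```python
def extract_trace_id(misc):
    """Mimic IF(LOCATE('TraceId=', MISC) > 0, SUBSTRING_INDEX(MISC, 'TraceId=', -1), '')."""
    marker = "TraceId="
    if marker not in misc:
        return ""
    # SUBSTRING_INDEX(..., -1) keeps everything to the right of the last marker.
    return misc.rsplit(marker, 1)[-1]


misc = "PFFCC_Strip/fkk49322/PMethod=Diners/CardType=Diners/9999******9999/2010/TraceId=7122910"
print(extract_trace_id(misc))  # 7122910
```

Because the value is computed for each row, filtering BankingActivity by REQUEST_DATE first should reduce the number of rows on which the expression has to be evaluated.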
Hi, I want to sum two columns (type double) from two different tables. My SQL query works until I add the "where" clauses. If every "where" clause matches some rows, everything is fine and the correct result is returned. But if even one subquery returns null, the whole result is null. How should I change my code so that a subquery returns 0 when no matching record exists?
select (select sum(amount) from change_graphic where month(change_date)=4 and year(change_date)=2019)+(select SUM(provision) from contracts where accepted=0 and month(date)=4 and year(date)=2019);
Use coalesce():
select (select coalesce(sum(amount), 0)
from change_graphic
where month(change_date) = 4 and year(change_date) = 2019) +
(select coalesce(sum(provision), 0)
from contracts
where accepted = 0 and month(date) = 4 and year(date) = 2019
);
The subqueries are guaranteed to return one row, because they are aggregation queries with no GROUP BY. Hence, you can convert NULL generated by the SUM() into 0 for the addition.
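The NULL behaviour is easy to reproduce. A small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made for convenience; SUM() over zero rows and NULL arithmetic behave the same way in both), with made-up rows in tables named as in the question. The date filter is written as a plain range because SQLite has no MONTH() or YEAR():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE change_graphic (amount REAL, change_date TEXT);
    CREATE TABLE contracts (provision REAL, accepted INTEGER, date TEXT);
    INSERT INTO change_graphic VALUES (10.5, '2019-04-03'), (4.5, '2019-04-20');
    -- contracts has no rows for April 2019, so its SUM() is NULL
    INSERT INTO contracts VALUES (99.0, 0, '2019-03-15');
    """
)

where_cg = "change_date >= '2019-04-01' AND change_date < '2019-05-01'"
where_c = "accepted = 0 AND date >= '2019-04-01' AND date < '2019-05-01'"

# Without COALESCE: 15.0 + NULL is NULL
plain = conn.execute(
    f"SELECT (SELECT SUM(amount) FROM change_graphic WHERE {where_cg})"
    f" + (SELECT SUM(provision) FROM contracts WHERE {where_c})"
).fetchone()[0]

# With COALESCE inside each subquery: 15.0 + 0
fixed = conn.execute(
    f"SELECT (SELECT COALESCE(SUM(amount), 0) FROM change_graphic WHERE {where_cg})"
    f" + (SELECT COALESCE(SUM(provision), 0) FROM contracts WHERE {where_c})"
).fetchone()[0]

print(plain, fixed)  # None 15.0
```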
I would recommend that you approach the date comparisons as:
select (select coalesce(sum(amount), 0)
from change_graphic
where change_date >= '2019-04-01' and
change_date < '2019-05-01'
) +
(select coalesce(sum(provision), 0)
from contracts
where accepted = 0 and
date >= '2019-04-01' and
date < '2019-05-01'
);
This enables MySQL to use an index on the date column, if an appropriate index is available.
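If the month is chosen at run time, the two boundary dates can be computed by the calling code and passed to the query. A minimal sketch in Python; month_range is a hypothetical helper, and the half-open convention (start inclusive, end exclusive) follows the query above:

```python
from datetime import date


def month_range(year, month):
    """Return (first day of month, first day of next month) as ISO strings."""
    start = date(year, month, 1)
    # December rolls over to January of the following year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


start, end = month_range(2019, 4)
print(start, end)  # 2019-04-01 2019-05-01
```

Using an exclusive upper bound avoids having to know the last day of the month, and it keeps working if the column also stores a time of day.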
I have the following query I'm trying to use to spit out each day in a date range and show the # of leads, assignments, & returns:
select
date_format(from_unixtime(date_created), '%m/%d/%Y') as date_format,
(select count(distinct(id_lead)) from lead_history where (date_format(from_unixtime(date_created), '%m/%d/%Y') = date_format) and (id_vertical in (2)) and (id_website in (3,8))) as leads,
(select count(id) from assignments where deleted=0 and (date_format(from_unixtime(date_assigned), '%m/%d/%Y') = date_format) and (id_vertical in (2)) and (id_website in (3,8))) as assignments,
(select count(id) from assignments where deleted=1 and (date_format(from_unixtime(date_deleted), '%m/%d/%Y') = date_format) and (id_vertical in (2)) and (id_website in (3,8))) as returns
from lead_history
where date_created between 1509494400 and 1512086399
group by date_format
The date_created, date_assigned, and date_deleted fields are integers representing timestamps. id, id_lead, id_vertical and id_website are already indexed.
Would adding indexes to date_created, date_assigned, date_deleted, and deleted help make this faster? The issue I'm having is that it is very slow, and I'm not sure an index will help when using date_format(from_unixtime(...
Here is the EXPLAIN:
Looking at your code, you could rewrite the query along these lines (untested; the joins compare raw timestamps, so check the counts against your original query):
select
date_format(from_unixtime(h.date_created), '%m/%d/%Y') as date_format
, count(distinct h.id_lead) as leads
, sum(case when a.deleted = 0 then 1 else 0 end) as assignments
, sum(case when b.deleted = 1 then 1 else 0 end) as returns
from lead_history h
inner join assignments a on a.date_assigned = h.date_created
and a.id_vertical = 2
and a.id_website in (3,8)
inner join assignments b on b.date_deleted = h.date_created
and b.id_vertical = 2
and b.id_website in (3,8)
where h.date_created between 1509494400 and 1512086399
group by date_format
In any case, you should avoid unnecessary and nested parentheses, avoid unnecessary date conversions, and use joins instead of subselects, or at least reduce the similar subselects using CASE.
PS: regarding the index, remember that applying a conversion to a column value prevents the related index from being used.
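The index remark can be applied by converting the day boundaries to integers once, outside the query, and comparing the raw column against them. A minimal sketch in Python, assuming the stored timestamps are UTC seconds (the question's lower bound 1509494400 is 2017-11-01 00:00:00 UTC); day_bounds is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone


def day_bounds(year, month, day):
    """Return [start, end) Unix timestamps for one UTC calendar day."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


print(day_bounds(2017, 11, 1))  # (1509494400, 1509580800)
```

A condition such as date_assigned >= 1509494400 and date_assigned < 1509580800 leaves the column untouched, so an index on date_assigned, if one is added, can be considered by the optimizer.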
I have a single table with rows like this: (Date, Score, Name)
The Date field has two possible dates, and it's possible that a Name value will appear under only one date (if that name was recently added or removed).
I'm looking to get a table with rows like this: (Delta, Name), where delta is the score change for each name between the earlier and later dates. In addition, only a negative change interests me, so if Delta>=0, it shouldn't appear in the output table at all.
My main challenge is calculating the Delta field.
As stated in the title, it should be an SQL query.
Thanks in advance for any help!
I assumed that each name can have its own start/end dates. It can be simplified significantly if there are only two possible dates for the entire table.
I tried this out in SQL Fiddle here
SELECT (score_end - score_start) delta, name_start
FROM
( SELECT date date_start, score score_start, name name_start
FROM t t
WHERE NOT EXISTS
( SELECT 1
FROM t x
WHERE x.date < t.date
AND x.name = t.name
)
) AS start_date_t
JOIN
( SELECT date date_end, score score_end, name name_end
FROM t t
WHERE NOT EXISTS
( SELECT 1
FROM t x
WHERE x.date > t.date
AND x.name = t.name
)
) end_date_t ON start_date_t.name_start = end_date_t.name_end
WHERE score_end-score_start < 0
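The query can be tried on a few rows without a MySQL server. A small sketch using Python's built-in sqlite3 module as a stand-in (an assumption; the query uses only standard constructs), with made-up sample data covering a drop, a rise, and a name that appears under only one date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (date TEXT, score INTEGER, name TEXT);
    INSERT INTO t VALUES
      ('2020-01-01', 10, 'A'), ('2020-02-01', 5, 'A'),  -- dropped by 5
      ('2020-01-01', 3, 'B'),  ('2020-02-01', 7, 'B'),  -- went up
      ('2020-02-01', 4, 'C');                           -- only one date
    """
)

query = """
SELECT (score_end - score_start) delta, name_start
FROM
  ( SELECT date date_start, score score_start, name name_start
    FROM t t
    WHERE NOT EXISTS
      ( SELECT 1 FROM t x WHERE x.date < t.date AND x.name = t.name )
  ) AS start_date_t
JOIN
  ( SELECT date date_end, score score_end, name name_end
    FROM t t
    WHERE NOT EXISTS
      ( SELECT 1 FROM t x WHERE x.date > t.date AND x.name = t.name )
  ) end_date_t ON start_date_t.name_start = end_date_t.name_end
WHERE score_end - score_start < 0
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(-5, 'A')]
```

A name with a single date gets the same row as its start and its end, so its delta is 0 and it is filtered out along with the increases.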
Let's say you have a table with the columns date_value and sum_value.
Then it should be something like this:
select t.date_value,sum_value,
sum_value - COALESCE((
select top 1 sum_value
from tmp_num
where date_value > t.date_value
order by date_value
),0) as sum_change
from tmp_num as t
order by t.date_value
The following uses a "trick" in MySQL that I don't really like using, because it turns the score into a string and then back into a number. But, it is an easy way to get what you want:
select t.name, (lastscore - firstscore) as diff
from (select t.name,
substring_index(group_concat(score order by date asc), ',', 1) as firstscore,
substring_index(group_concat(score order by date desc), ',', 1) as lastscore
from table t
group by t.name
) t
where lastscore - firstscore < 0;
With window functions, which MySQL supports from version 8.0 onwards, such tricks wouldn't be necessary.
I have a column inside my table: tbl_customers that distinguishes a customer record as either a LEAD or a CUS.
The column is simply: recordtype, which is a char(1). I populate it with either C, or L.
Obviously C = customer, while L = lead.
I want to run a query that groups by the day the record was created, so I have a column called: datecreated.
Here's where I get confused with the grouping.
I want to display a result (in one query) the COUNT of customers and the COUNT of leads for a particular day, or date range. I'm successful with only pulling the number for either recordtype:C or recordtype:L , but that takes 2 queries.
Here's what I have so far:
SELECT COUNT(customerid) AS `count`, datecreated
FROM `tbl_customers`
WHERE `datecreated` BETWEEN '".$startdate."' AND '".$enddate."'
AND `recordtype` = 'C'
GROUP BY `datecreated` ASC
As expected, this displays 2 columns (the count of customer records and the datecreated).
Is there a way to display both in one query, while still grouping by the datecreated column?
You can group by multiple columns.
SELECT COUNT(customerid) AS `count`, datecreated, `recordtype`
FROM `tbl_customers`
WHERE `datecreated` BETWEEN '".$startdate."' AND '".$enddate."'
GROUP BY `datecreated` ASC, `recordtype`
SELECT COUNT(customerid) AS `count`,
datecreated,
SUM(`recordtype` = 'C') AS CountOfC,
SUM(`recordtype` = 'L') AS CountOfL
FROM `tbl_customers`
WHERE `datecreated` BETWEEN '".$startdate."' AND '".$enddate."'
GROUP BY `datecreated` ASC
See Is it possible to count two columns in the same query
There are two solutions, depending on whether you want the two counts in separate rows or in separate columns.
In separate rows:
SELECT datecreated, recordtype, COUNT(*)
FROM tbl_customers
WHERE datecreated BETWEEN '...' AND '...'
GROUP BY datecreated, recordtype
In separate columns (this is called pivoting the table):
SELECT datecreated,
SUM(recordtype = 'C') AS count_customers,
SUM(recordtype = 'L') AS count_leads
FROM tbl_customers
WHERE datecreated BETWEEN '...' AND '...'
GROUP BY datecreated
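The pivot form can be checked on a handful of rows. A small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; both evaluate recordtype = 'C' to 1 or 0 inside SUM), with made-up sample rows and the date range passed as bound parameters rather than pasted into the SQL text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_customers (customerid INTEGER, recordtype TEXT, datecreated TEXT);
    INSERT INTO tbl_customers VALUES
      (1, 'C', '2012-05-01'), (2, 'L', '2012-05-01'), (3, 'L', '2012-05-01'),
      (4, 'C', '2012-05-02');
    """
)

pivot = conn.execute(
    """
    SELECT datecreated,
           SUM(recordtype = 'C') AS count_customers,
           SUM(recordtype = 'L') AS count_leads
    FROM tbl_customers
    WHERE datecreated BETWEEN ? AND ?
    GROUP BY datecreated
    ORDER BY datecreated
    """,
    ("2012-05-01", "2012-05-02"),
).fetchall()

print(pivot)  # [('2012-05-01', 1, 2), ('2012-05-02', 1, 0)]
```

Each date produces exactly one row, with a 0 rather than a missing row when one of the record types has no entries that day.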
Use:
$query = sprintf("SELECT COUNT(c.customerid) AS count,
c.datecreated,
SUM(CASE WHEN c.recordtype = 'C' THEN 1 ELSE 0 END) AS CountOfC,
SUM(CASE WHEN c.recordtype = 'L' THEN 1 ELSE 0 END) AS CountOfL
FROM tbl_customers c
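WHERE c.datecreated BETWEEN STR_TO_DATE('%s', '%%Y-%%m-%%d %%H:%%i')
AND STR_TO_DATE('%s', '%%Y-%%m-%%d %%H:%%i')
GROUP BY c.datecreated",
$startdate, $enddate);
You need to fill out the date format to match your input; see STR_TO_DATE for details. The percent signs in the format are doubled so that sprintf passes them through literally instead of treating them as its own placeholders.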