I have a very large csv log file with the following header:
CustomerID , Date , URL , ....
I want to find all the customers who visited at least 2 distinct URLs on exactly 2 distinct days within the last 3 days.
What would the SQL command be?
I thought of this one (how the date part looks, e.g. GETDATE() - 4, is not important at the moment):
SELECT CustomerID FROM log
WHERE DATE > (GETDATE() - 4)
GROUP BY (CustomerID, DATE, URL)
HAVING COUNT(DISTINCT(DATE)) = 2
AND HAVING (COUNT(DISTINCT(URL))) > 2
Just leave out the second HAVING keyword and combine the conditions, like
HAVING condition1 > val1 AND condition2 > val2
Sorry, I'm on a phone so I can't copy and paste that well.
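A minimal sketch of what the combined query could look like, assuming the table is called log as in the attempt above. Note that the GROUP BY also has to shrink to just CustomerID, otherwise the distinct counts per group can never exceed 1:

SELECT CustomerID
FROM log
WHERE DATE > (GETDATE() - 4)        -- date arithmetic kept as in the question
GROUP BY CustomerID                 -- one group per customer
HAVING COUNT(DISTINCT DATE) = 2     -- visits fall on exactly 2 distinct days
   AND COUNT(DISTINCT URL) >= 2     -- and cover at least 2 distinct URLs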
I am trying to write a query in a module for Dolibarr ERP, but the module has a part of the code that is predefined and cannot be changed, and I need to insert a SUM() function into it that combines rows with the same id. I know how to do that in regular MySQL:
SELECT fk_product AS prod, SUM(value) AS qty
FROM llx_stock_mouvement
WHERE type_mouvement = 2 AND label LIKE 'SH%'
GROUP BY fk_product
ORDER BY 1 DESC
LIMIT 26
That gives me what I want:
prod qty
1 13
2 10
BUT the module has predefined, unchangeable code.
This part is predefined; the module writes it itself based on the values provided to it:
SELECT DISTINCT
c.fk_product AS com,
c.value AS qty
THIS PART I CAN WRITE IN THE MODULE'S GUI:
FROM
llx_stock_mouvement AS c
WHERE
type_mouvement = 2
AND label LIKE 'SH%'
And this part is predefined:
ORDER BY 1 DESC
LIMIT 26
I would appreciate any help and advice: is there any workaround that would make my desired end result appear, as it would using the first query I posted?
If you can only modify the bit in the middle box then you might need to use a subquery:
--fixed part
SELECT DISTINCT
c.fk_product AS com,
c.value AS qty
--begin your editable part
FROM
(
SELECT fk_product,
SUM(value) AS value
FROM llx_stock_mouvement
WHERE type_mouvement = 2 AND label LIKE 'SH%'
GROUP BY fk_product
) c
--end your editable part
--fixed part
ORDER BY 1 DESC
LIMIT 26
I have a table with the following fields:
id, type, date, changelog.
The changelog field has 10 useful pieces of information I would like to split out into their own fields, both new and old: name, month, year, zipcode, status.
So I would like to create a table with the following fields:
id, type, date, old_name, new_name, old_month, new_month, old_year, new_year, old_zipcode, new_zipcode, old_status, new_status.
When all 5 pieces of information exist it is easy, but when some are missing I can't get it to work. Any help is appreciated.
A typical changelog field doesn't have all of these pieces of information, just what is being updated.
For example:
id type date changelog
101 upd 1/1/2019 ---!hash:ActiveSupport
name:
- Adam
- Chris
month:
- 7
- 12
status:
- 1
- 3
Which would translate to:
id   type  date    old_name  new_name  old_month  new_month  old_year  new_year  old_zipcode  new_zipcode  old_status  new_status
101  upd   1/1/19  Adam      Chris     7          12         NULL      NULL      NULL         NULL         1           3
This is not a complete solution (it assumes you can already parse out the values when you know they are present), but it addresses how to handle when those values are missing:
INSERT INTO tableV2 (id, type, date, old_name, new_name, and so on....)
SELECT id, type, date
, CASE WHEN INSTR(changelog, 'name:') = 0 THEN NULL
ELSE (parse the value out here)
END AS old_name
, CASE WHEN INSTR(changelog, 'name:') = 0 THEN NULL
ELSE (parse the value out here)
END AS new_name
, and so on....
FROM tableV1
;
The parsing, while not trivial, probably won't be too difficult beyond the tediousness of it. You'll need to take the location of the found "tag", find the 3 newlines following it (the first ends the tag line, the next two end each value line), and then use those positions with other string functions such as SUBSTR, LEFT... and maybe CHAR_LENGTH(tag string), e.g. CHAR_LENGTH('name:'), to make the parsing repeatable for each tag with minor modification.
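A rough sketch of what that parsing could look like for one tag, assuming MySQL string functions and that each value line starts with "- " as in the example (the same pattern repeats for each tag by swapping out 'name:'):

SELECT id, type, date
     , CASE WHEN INSTR(changelog, 'name:') = 0 THEN NULL
            ELSE TRIM(LEADING '- ' FROM
                   SUBSTRING_INDEX(SUBSTRING_INDEX(
                     SUBSTR(changelog, INSTR(changelog, 'name:')), '\n', 2), '\n', -1))
       END AS old_name   -- second line of the tag block: the old value
     , CASE WHEN INSTR(changelog, 'name:') = 0 THEN NULL
            ELSE TRIM(LEADING '- ' FROM
                   SUBSTRING_INDEX(SUBSTRING_INDEX(
                     SUBSTR(changelog, INSTR(changelog, 'name:')), '\n', 3), '\n', -1))
       END AS new_name   -- third line of the tag block: the new value
FROM tableV1;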
I was given a task to show the CPU usage trend as part of a build process which also does regression tests.
Each individual test case run has a record in the table RegrCaseResult. The RegrCaseResult table looks something like this:
id projectName ProjectType returnCode startTime endTime totalMetrics
1 'first' 'someType' 16 'someTime' 'someOtherTime' 222
The RegrCaseResult.totalMetrics is a special key which links to another table called ThreadMetrics through ThreadMetrics.id.
Here is what ThreadMetrics looks like:
id componentType componentName cpuTime linkId
1 'Job Totals' 'Job Totals' 'totalTime' 34223
2 'parser1' 'parser1' 'time1' null
3 'parser2' 'generator1' 'time2' null
4 'generator1' 'generator1' 'time3' null
------------------------------------------------------
5 'Job Totals' 'Job Totals' 'totalTime' 9899
...
The rows with componentName 'Job Totals' are what totalMetrics from the RegrCaseResult table links to, and the 'totalTime' is what I really want to get for a given projectType. The 'Job Totals' row is actually a summation of the other records - in the above example, the summation of time1 through time3. The linkId at the end of the ThreadMetrics table links back to RegrCaseResult.id.
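A sketch of the join that linkage implies, assuming the link really is RegrCaseResult.totalMetrics = ThreadMetrics.id and that 'someType' stands in for the projectType of interest:

SELECT r.id, r.projectName, r.ProjectType, m.cpuTime
FROM RegrCaseResult r
JOIN ThreadMetrics m ON m.id = r.totalMetrics   -- only the 'Job Totals' rows are linked this way
WHERE m.componentName = 'Job Totals'
  AND r.ProjectType = 'someType';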
The requirements also state that I should have a way to enforce a condition which only includes those projects that have a consistent return code during a certain period. That's where my initial question comes from, as follows:
I created the following simple table to show what I am trying to achieve:
id projectName returnCode
1 'first' 16
2 'second' 16
3 'third' 8
4 'first' 16
5 'second' 8
6 'first' 16
Basically I want to get all the projects which have a consistent returnCode, no matter what the returnCode values are. In the above sample, I should only get one project, which is "first". I think this should be simple, but I am bad when it comes to databases. Any help would be great.
I tried my best to make it clear. Hope I have achieved my goal.
Here is an easy way:
select projectname
from table t
group by projectname
having min(returncode) = max(returncode);
If the min() and max() values are the same, then all the values are the same (unless you have NULL values).
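If NULL return codes are possible and should also disqualify a project, one possible variant is to compare the NULL-skipping count against the row count:

select projectname
from table t
group by projectname
having min(returncode) = max(returncode)
   and count(returncode) = count(*);   -- count(returncode) ignores NULLs, count(*) does not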
EDIT:
To keep 'third' out, you need some other rule, such as having more than one return code. So, you can do this:
select projectname
from table t
group by projectname
having min(returncode) = max(returncode) and count(*) > 1;
select projectName from projects
group by projectName having count(distinct returnCode) = 1
This would also return projects which have only one entry.
How do you want to handle them?
Working example: http://www.sqlfiddle.com/#!2/e7338/8
This should do it:
SELECT COUNT(ProjectName) AS numCount, ProjectName FROM (
SELECT ProjectName FROM Foo
GROUP BY ProjectName, ReturnCode
) AS Inside
GROUP BY Inside.ProjectName
HAVING numCount = 1
This groups all the ProjectNames by their names and return codes, then selects those that only have a single return code listed.
SQLFiddle Link: http://sqlfiddle.com/#!2/c52b6/11/0
You can try something like this with Not Exists:
Select Distinct ProjectName
From Table A
Where Not Exists
(
Select 1
From Table B
Where B.ProjectName = A.ProjectName
And B.ReturnCode <> A.ReturnCode
)
I'm not sure exactly what you're selecting, so you can change the Select statement to what you need.
Let's say I have a table 'shares' with the following columns:
company price quantity
Microsoft 100 10
Google 99 5
Google 99 20
Google 101 15
I'd like to run the equivalent of a SQL statement like this:
select price,
sum(quantity) as num
from shares
where company='Google'
group by price;
The closest I've come is:
result = (dbsession.query(Shares.price, func.sum(Shares.quantity))
.filter(Shares.company == 'Google')
.group_by(Shares.price)
.all())
I'm having trouble with setting up the 'sum(quantity) as num' in sqlalchemy. It appears I need to use alias() but I can't figure out how by looking at the documentation.
You actually want the label method.
result = dbsession.query(
Shares.price,
func.sum(Shares.quantity).label("Total sold")
) \
.filter(Shares.company == 'Google') \
.group_by(Shares.price).all()
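For reference, label() is what produces the AS clause in the emitted SQL; with a mapped table named shares, the query above renders roughly as follows (the exact aliasing and parameter style depend on the mapping and SQLAlchemy version):

SELECT shares.price AS shares_price,
       sum(shares.quantity) AS "Total sold"
FROM shares
WHERE shares.company = :company_1
GROUP BY shares.price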
I have a table tbl_usertests from which I want to retrieve the user who has the maximum testscore for each test.
Note: "User" here means usertestid, which is unique.
Its columns are:
pk_usertestid attemptdate uploaddate fk_tbl_tests_testid fk_tbl_users_userid testscore totalquestionsnotattempted totalquestionscorrect totalquestionsincorrect totalquestions timetaken iscurrent
data :
1;NULL;"2010-06-24 22:48:07";"11";"3";"1";"53";"1";"21";"75";"92";"1"
2;NULL;"2010-06-25 01:21:37";"11";"4";"13";"0";"13";"62";"75";"801";"1"
3;NULL;"2010-06-25 01:21:50";"10";"4";"17";"5";"17";"53";"75";"640";"1"
4;NULL;"2010-06-25 01:24:23";"11";"4";"13";"0";"13";"62";"75";"801";"1"
5;NULL;"2010-06-25 01:24:47";"10";"4";"17";"5";"17";"53";"75";"640";"1"
6;NULL;"2010-06-25 01:36:04";"11";"5";"13";"0";"13";"62";"75";"801";"1"
7;NULL;"2010-06-25 01:47:26";"7";"5";"10";"1";"10";"49";"60";"302";"1"
My query is:
SELECT max(`testscore`) , `fk_tbl_tests_testid` , `fk_tbl_users_userid` , `pk_usertestid`
FROM `tbl_usertests`
GROUP BY `fk_tbl_tests_testid`
This query outputs:
max(`testscore`) fk_tbl_tests_testid fk_tbl_users_userid pk_usertestid
10 7 5 7
17 10 4 3
13 11 3 1
But the problem is that if there are two users with the same score, it displays only one of them, because I have used a GROUP BY clause.
For example, for testid = 10 I have two records (pk_usertestid 3 and 5), but it displays 3 only.
In case two users have the same testscore, I want the one whose upload date is earlier. It should display usertestid = 3, since 3's upload date is earlier than 5's.
Right now it does display 3, but only as a side effect of the GROUP BY clause.
I am unable to construct the query.
Please help me on this
Thanks
Try this:
SELECT t.`fk_tbl_tests_testid`, t.`fk_tbl_users_userid`, t.`pk_usertestid`, maxscores.maxscore
FROM `tbl_usertests` t
JOIN (SELECT `fk_tbl_tests_testid`, MAX(`testscore`) AS maxscore
      FROM `tbl_usertests`
      GROUP BY `fk_tbl_tests_testid`) maxscores
  ON t.`fk_tbl_tests_testid` = maxscores.`fk_tbl_tests_testid`
 AND t.`testscore` = maxscores.maxscore
The logic behind this is to separate the problem into two parts: get the maximum (or any other aggregate) value for each group (that is the subquery part), then JOIN that aggregate back to the original table so each row can be matched against its group's maximum.
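If you also need the tie-break the question describes (when several rows share a test's top score, keep only the one with the earliest uploaddate), one possible sketch adds a correlated condition; it assumes tied rows never share the same uploaddate:

SELECT t.`pk_usertestid`, t.`fk_tbl_tests_testid`, t.`fk_tbl_users_userid`, t.`testscore`, t.`uploaddate`
FROM `tbl_usertests` t
JOIN (SELECT `fk_tbl_tests_testid`, MAX(`testscore`) AS maxscore
      FROM `tbl_usertests`
      GROUP BY `fk_tbl_tests_testid`) maxscores
  ON t.`fk_tbl_tests_testid` = maxscores.`fk_tbl_tests_testid`
 AND t.`testscore` = maxscores.maxscore
WHERE t.`uploaddate` = (SELECT MIN(u.`uploaddate`)            -- earliest upload among the tied top scores
                        FROM `tbl_usertests` u
                        WHERE u.`fk_tbl_tests_testid` = t.`fk_tbl_tests_testid`
                          AND u.`testscore` = t.`testscore`);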