MySQL single column n-gram split and count

Given a column of strings (passwords) in MySQL and a value N, I'm looking for an SQL way to count the frequency of each n-gram (substring of length N).
It's important to keep the code inside MySQL, because in the other environments I have, it runs out of memory.
The only working approach I have found so far assumes a limited string length (a legitimate assumption): select the substrings at each starting position separately, combine them with UNION ALL, and then group and count, like this (for 9-grams out of strings of up to 13 characters):
Select
    nueve,
    count(*) as density,
    avg(location) as avgloc
From
    (select mid(pass, 1, 9) as nueve, 1 as location
     from passdata
     where length(pass) >= 9 and length(pass) <= 13
     UNION ALL
     select mid(pass, 2, 9), 2 as location
     from passdata
     where length(pass) >= 10 and length(pass) <= 13
     UNION ALL
     select mid(pass, 3, 9), 3 as location
     from passdata
     where length(pass) >= 11 and length(pass) <= 13
     UNION ALL
     select mid(pass, 4, 9), 4 as location
     from passdata
     where length(pass) >= 12 and length(pass) <= 13
     UNION ALL
     select mid(pass, 5, 9), 5 as location
     from passdata
     where length(pass) = 13) as nueves
group by nueve
order by density DESC
The results look like this:
nueve density avgloc
123456789 1387 2.4564
234567890 193 2.7306
987654321 141 2.0355
password1 111 1.7748
123123123 92 1.913
liverpool 89 1.618
111111111 86 2.2791
Where nueve is the 9-gram, density is the number of occurrences, and avgloc is the mean starting position in the string.
Any suggestions to improve the query? I'm doing the same for other n-grams too.
Thanks!

Create a table that contains all the numbers from 1 to the maximum length of passwords. You can then join with this to get the substring positions.
SET @N = 9; -- the n-gram length
SELECT nueve, COUNT(*) AS density, AVG(location) as avgloc
FROM (
SELECT MID(p.pass, n.num, @N) AS nueve, n.num AS location
FROM passdata AS p
JOIN numbers_table AS n ON LENGTH(p.pass) >= (@N + n.num - 1)
) AS x
GROUP BY nueve
ORDER BY density DESC
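To try the numbers-table join without a MySQL server, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module. Assumptions: SQLite stands in for MySQL, so substr() replaces MID() and a named parameter :n replaces @N. The helper name ngram_counts, the max_len default of 13 and the sample passwords used below are hypothetical, chosen for illustration.

```python
import sqlite3


def ngram_counts(passwords, n, max_len=13):
    """Count n-grams with a numbers-table join (SQLite standing in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE passdata (pass TEXT)")
    con.executemany("INSERT INTO passdata VALUES (?)", [(p,) for p in passwords])
    # Helper table holding every candidate start position 1..max_len.
    con.execute("CREATE TABLE numbers_table (num INTEGER PRIMARY KEY)")
    con.executemany(
        "INSERT INTO numbers_table VALUES (?)", [(i,) for i in range(1, max_len + 1)]
    )
    rows = con.execute(
        """
        SELECT nueve, COUNT(*) AS density, AVG(location) AS avgloc
        FROM (
            -- substr() is SQLite's counterpart of MySQL's MID()
            SELECT substr(p.pass, n.num, :n) AS nueve, n.num AS location
            FROM passdata AS p
            -- keep only start positions where a full n-gram still fits
            JOIN numbers_table AS n ON length(p.pass) >= :n + n.num - 1
        ) AS x
        GROUP BY nueve
        ORDER BY density DESC, nueve  -- second key only makes ties deterministic
        """,
        {"n": n},
    ).fetchall()
    con.close()
    return rows
```

For example, ngram_counts(["abcd", "xabc"], 3) counts "abc" twice, with an average start position of 1.5. The ON condition discards start positions that would run past the end of a string. That is why the numbers table only needs to cover the longest password, whatever N is.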

Related

SQL Row wise total value

I have a table named calcu:
id date name s1 s2 s3 s4 min_value
1 02/10/2017 dicky 7 4 8 9 4
2 02/10/2017 acton 12 15 17 19 15
3 02/10/2017 adney 28 13 19 10 13
This is my table in SQL Fiddle
I need a row-wise total. I mean a new column, total, computed as (s1 + s2 + s3 + s4), i.e. (7+4+8+9) = 28 for id=1, (12+15+17+19) = 63 for id=2, and (28+13+19+10) = 70 for id=3.
The result should look like this:
id date name s1 s2 s3 s4 min_value Total
1 02/10/2017 dicky 7 4 8 9 4 28
2 02/10/2017 acton 12 15 17 19 15 63
3 02/10/2017 adney 28 13 19 10 13 70
See my problem here.
It returns a single total of 161, and the 3 rows collapse into 1 row.
How do I write this SQL query?

The SUM() function is an aggregate function. As with other aggregates, use it only to compute values across multiple rows.
You want to add up values within one row, so just use the + operator (the parentheses are optional).
As for finding the minimum value in the row, use CASE WHEN, where each branch makes 3 tests comparing one of s1, s2, s3 and s4 against the other three.
This should work:
select
c.id, c.date, c.name, c.s1, c.s2, c.s3, c.s4,
(c.s1 + c.s2 + c.s3 + c.s4) as total,
case
when c.s1 <= c.s2 and c.s1 <= c.s3 and c.s1 <= c.s4 then c.s1
when c.s2 <= c.s1 and c.s2 <= c.s3 and c.s2 <= c.s4 then c.s2
when c.s3 <= c.s2 and c.s3 <= c.s1 and c.s3 <= c.s4 then c.s3
when c.s4 <= c.s2 and c.s4 <= c.s3 and c.s4 <= c.s1 then c.s4
end as min_value
from calcu c
;
See SQLFiddle
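Here is a minimal sketch that runs this answer's row-wise total and CASE-based minimum through Python's sqlite3 module. It also shows SQLite's multi-argument min(), which plays the role of the MySQL LEAST() function used in the next answer. Assumptions: SQLite stands in for MySQL, the date and min_value columns are left out, and the function name row_totals is hypothetical.

```python
import sqlite3


def row_totals():
    """Row-wise total and minimum of s1..s4 (SQLite standing in for MySQL)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE calcu (id INTEGER, name TEXT, s1 INT, s2 INT, s3 INT, s4 INT)"
    )
    con.executemany(
        "INSERT INTO calcu VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "dicky", 7, 4, 8, 9),
            (2, "acton", 12, 15, 17, 19),
            (3, "adney", 28, 13, 19, 10),
        ],
    )
    rows = con.execute(
        """
        SELECT id,
               s1 + s2 + s3 + s4 AS total,  -- plain +, no aggregate, so rows stay separate
               CASE
                 WHEN s1 <= s2 AND s1 <= s3 AND s1 <= s4 THEN s1
                 WHEN s2 <= s1 AND s2 <= s3 AND s2 <= s4 THEN s2
                 WHEN s3 <= s1 AND s3 <= s2 AND s3 <= s4 THEN s3
                 WHEN s4 <= s1 AND s4 <= s2 AND s4 <= s3 THEN s4
               END AS min_case,
               min(s1, s2, s3, s4) AS min_multi  -- multi-argument min(), like LEAST()
        FROM calcu
        ORDER BY id
        """
    ).fetchall()
    con.close()
    return rows
```

Both minimum expressions agree on every row. Because no aggregate function is used, the three rows stay separate instead of collapsing into one.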

select c.id,
c.date, c.name, c.s1, c.s2, c.s3, c.s4,
least(s1,s2,s3,s4) Minvalue,
(s1+s2+s3+s4) Total
from calcu c
I tried to simplify the query. You are looking for the minimum value among s1, s2, s3 and s4, which you can get with the LEAST() function. You also need the total of all four s columns, so just add them.

SELECT *,s1+s2+s3+s4 as Total FROM calcu

select: result based on occurrence of explicit value

Given the following MySQL table:
CREATE TABLE fonts
(`id` int, `fontName` varchar(22), `price` int,`reducedPrice` int,`weight` int)
;
INSERT INTO fonts
(`id`, `fontName`, `price`,`reducedprice`,`weight`)
VALUES
(1, 'regular', 50,30,1),
(2, 'regular-italic', 50,20,1),
(3, 'medium', 60,30,2),
(4, 'medium-italic', 50,30,2),
(5, 'bold', 50,30,3),
(6, 'bold-italic', 50,30,3),
(7, 'bold-condensed', 50,30,3),
(8, 'super', 50,30,4)
;
For example, a user chooses the following ids: 1,2,3,5,6,7
which results in the following query and result:
> select * from fonts where id in(1,2,3,5,6,7);
id fontName price reducedPrice weight
1 regular 50 30 1
2 regular-italic 50 20 1
3 medium 60 30 2
5 bold 50 30 3
6 bold-italic 50 30 3
7 bold-condensed 50 30 3
Is it possible to have a kind of "if statement" in a query that returns a new field based on the weight column? Where a weight value occurs more than once, reducedPrice should be returned as newPrice; otherwise, price:
id fontName price reducedPrice weight newPrice
1 regular 50 30 1 30
2 regular-italic 50 20 1 20
3 medium 60 30 2 60
5 bold 50 30 3 30
6 bold-italic 50 30 3 30
7 bold-condensed 50 30 3 30
That means ids 1, 2, 5, 6 and 7 should be reduced, but id 3 should not, because its weight "2" occurs only once among the selected ids.
Please find a fiddle here: http://sqlfiddle.com/#!9/73f5db/1
And thanks for your help!

Write a subquery that gets the number of occurrences of each weight, and join with this. Then you can test the number of occurrences to decide which field to put in NewPrice.
SELECT f.*, IF(weight_count = 1, Price, ReducedPrice) AS NewPrice
FROM fonts AS f
JOIN (SELECT weight, COUNT(*) AS weight_count
FROM fonts
WHERE id IN (1, 2, 3, 5, 6, 7)
GROUP BY weight) AS w ON f.weight = w.weight
WHERE id IN (1, 2, 3, 5, 6, 7)
Updated fiddle
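Here is a minimal sketch of the count-and-join idea, run through Python's sqlite3 module with the fonts rows from the question. Assumptions: SQLite stands in for MySQL, so a CASE expression replaces MySQL's IF(). The function name new_prices is hypothetical, and the selected ids are passed as bound parameters instead of being written into the query.

```python
import sqlite3

FONTS = [
    (1, "regular", 50, 30, 1),
    (2, "regular-italic", 50, 20, 1),
    (3, "medium", 60, 30, 2),
    (4, "medium-italic", 50, 30, 2),
    (5, "bold", 50, 30, 3),
    (6, "bold-italic", 50, 30, 3),
    (7, "bold-condensed", 50, 30, 3),
    (8, "super", 50, 30, 4),
]


def new_prices(selected_ids):
    """Return (id, newPrice) pairs for the selected fonts."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE fonts (id INT, fontName TEXT, price INT, reducedPrice INT, weight INT)"
    )
    con.executemany("INSERT INTO fonts VALUES (?, ?, ?, ?, ?)", FONTS)
    # One "?" per id: only placeholders are formatted into the SQL, never values.
    marks = ", ".join("?" for _ in selected_ids)
    sql = f"""
        SELECT f.id,
               CASE WHEN w.weight_count = 1 THEN f.price ELSE f.reducedPrice END AS newPrice
        FROM fonts AS f
        JOIN (SELECT weight, COUNT(*) AS weight_count
              FROM fonts
              WHERE id IN ({marks})   -- count weights among the selected ids only
              GROUP BY weight) AS w ON f.weight = w.weight
        WHERE f.id IN ({marks})
        ORDER BY f.id
    """
    # The id list is bound twice, once for each IN (...) list.
    rows = con.execute(sql, list(selected_ids) * 2).fetchall()
    con.close()
    return rows
```

Passing [3, 4] instead returns the reduced price for both rows, because weight 2 then occurs twice among the selected ids.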

select *,if(occurences>=2,reducedPrice,price) as newPrice from fonts
left join (Select count(id) as occurences, weight from fonts
where fonts.id in(1,2,3,5,6,7) group by weight) t on t.weight = fonts.weight
where fonts.id in(1,2,3,5,6,7);
The MySQL IF() function reference is here: https://dev.mysql.com/doc/refman/5.1/en/control-flow-functions.html#function_if
Edit: Added a fiddle and changed to counting instances, as a comment requested.
Updated fiddle: http://sqlfiddle.com/#!9/a93ef/14

SELECT DISTINCT x.*
, CASE WHEN y.weight = x.weight THEN x.reducedPrice ELSE x.price END newPrice
FROM fonts x
LEFT
JOIN
( SELECT * FROM fonts WHERE id IN(1,2,3,5,6,7) )y
ON y.weight = x.weight
AND y.id <> x.id
WHERE x.id IN(1,2,3,5,6,7)
ORDER
BY id;

Mysql best students in every class in a school

In MySQL I need to select the top student in every class in a school for termid=10, so that they get a discount on next term's enrollment.
Please note that total is not a column in the table (I included it below only to clarify the problem).
I have this workbook table holding all students' results:
id studentid classid exam1 exam2 total termid
1 2 11 20 40 60 10
2 1 22 40 20 60 10
3 4 11 40 20 60 10
4 5 33 10 60 70 10
5 7 22 10 40 50 10
6 8 11 10 30 40 10
7 9 33 20 45 65 10
8 11 11 null null null 10
9 12 54 null null null 02
10 13 58 null null null 02
1st challenge: exam1 and exam2 are VARCHAR, and total is not in the table (as I explained).
2nd challenge: as you can see at id=8, student #11 has no scores.
3rd challenge: two students may share the top score, so both must be in the result.
I need a result like:
id studentid classid exam1 exam2 total termid
1 2 11 20 40 60 10
3 4 11 40 20 60 10
4 5 33 10 60 70 10
2 1 22 40 20 60 10
I have this query, but it doesn't work well, for the reasons above.
SELECT DISTINCT id,studentid,classid,exam1,exam2,total,termid ,(CAST(exam1 AS DECIMAL(9,2))+CAST(exam2 AS DECIMAL(9,2))) FROM workbook WHERE ClassId = '10';

You can get the total for the students by just adding the values (MySQL will convert the values to numbers). The following gets the max total for each class:
select w.classid, max(coalesce(w.exam1, 0) + coalesce(w.exam2, 0)) as maxtotal
from workbook w
group by w.classid;
You can then join this back to the original data to get information about the best students:
select w.*, coalesce(w.exam1, 0) + coalesce(w.exam2, 0) as total
from workbook w join
(select w.classid, max(coalesce(w.exam1, 0) + coalesce(w.exam2, 0)) as maxtotal
from workbook w
group by w.classid
) ww
on w.classid = ww.classid and (coalesce(w.exam1, 0) + coalesce(w.exam2, 0)) = ww.maxtotal;
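The same max-per-class join can be tried through Python's sqlite3 module with the question's workbook rows. Assumptions: SQLite stands in for MySQL. exam1 and exam2 are stored as text to mimic the VARCHAR columns and are converted with an explicit CAST. A termid filter is added, because the question asks for termid = 10 and the answer above does not filter on it. The function name best_students is hypothetical.

```python
import sqlite3

# Rows from the question: (id, studentid, classid, exam1, exam2, termid).
WORKBOOK = [
    (1, 2, 11, "20", "40", 10), (2, 1, 22, "40", "20", 10),
    (3, 4, 11, "40", "20", 10), (4, 5, 33, "10", "60", 10),
    (5, 7, 22, "10", "40", 10), (6, 8, 11, "10", "30", 10),
    (7, 9, 33, "20", "45", 10), (8, 11, 11, None, None, 10),
    (9, 12, 54, None, None, 2), (10, 13, 58, None, None, 2),
]

# Total of the two VARCHAR scores for table alias {t}; a missing score counts as 0.
TOTAL = "COALESCE(CAST({t}.exam1 AS INTEGER), 0) + COALESCE(CAST({t}.exam2 AS INTEGER), 0)"


def best_students(term_id):
    """Return (id, classid, total) for every student tied for the best total in their class."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE workbook (id INT, studentid INT, classid INT,"
        " exam1 VARCHAR(10), exam2 VARCHAR(10), termid INT)"
    )
    con.executemany("INSERT INTO workbook VALUES (?, ?, ?, ?, ?, ?)", WORKBOOK)
    w_total = TOTAL.format(t="w")
    x_total = TOTAL.format(t="x")
    sql = f"""
        SELECT w.id, w.classid, {w_total} AS total
        FROM workbook AS w
        JOIN (SELECT x.classid, MAX({x_total}) AS maxtotal   -- best total per class
              FROM workbook AS x
              WHERE x.termid = :term
              GROUP BY x.classid) AS ww
          ON w.classid = ww.classid AND {w_total} = ww.maxtotal  -- keeps ties
        WHERE w.termid = :term
        ORDER BY w.id
    """
    rows = con.execute(sql, {"term": term_id}).fetchall()
    con.close()
    return rows
```

Ids 1 and 3 both come back for class 11 because they tie at 60. Calling best_students(2) returns ids 9 and 10 with a total of 0, since those classes have no scores entered. The asker's follow-up below guards against exactly this case with a minimum-total condition.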

Another approach is to join the table with itself. You find out the max for each class and then join all students of this class which match the class max:
max for each class (included in the final statement already):
SELECT classid, MAX(CAST(exam1 AS UNSIGNED) + CAST(exam2 AS UNSIGNED)) as 'maxtotal'
FROM students
WHERE NOT ISNULL(exam1)
AND NOT ISNULL(exam2)
GROUP BY classid
The complete statement:
SELECT s2.*, s1.maxtotal
FROM (SELECT classid, MAX(CAST(exam1 AS UNSIGNED) + CAST(exam2 AS UNSIGNED)) as 'maxtotal'
FROM students
WHERE NOT ISNULL(exam1)
AND NOT ISNULL(exam2)
GROUP BY classid) s1
JOIN students s2 ON s1.classid = s2.classid
WHERE s1.maxtotal = (CAST(s2.exam1 AS UNSIGNED) + CAST(s2.exam2 AS UNSIGNED));
SQL Fiddle: http://sqlfiddle.com/#!2/9f117/1

Use a simple Group by Statement:
SELECT
studentid,
classid,
max(coalesce(exam1,0)) as max_exam_1,
max(coalesce(exam2,0)) as max_exam_2,
sum(coalesce(exam1,0) + coalesce(exam2,0)) as sum_exam_total,
termid
FROM
workbook
WHERE
termid=10
GROUP BY
1,2
ORDER BY
5

Try something like this:
SELECT id,studentid,classid,exam1,exam2,(CAST(exam1 AS DECIMAL(9,2))+CAST(exam2 AS DECIMAL(9,2))) AS total,termid FROM `workbook` WHERE ((CAST(exam1 AS DECIMAL(9,2))+CAST(exam2 AS DECIMAL(9,2)))) > 50

Thanks, all my friends.
I think combining two of the answers above works best:
SELECT s2.*, s1.maxtotal
FROM (SELECT ClassId, MAX(
coalesce(exam1,0)+
coalesce(exam2,0)
) as 'maxtotal'
FROM workbook
WHERE
(
termid = '11'
)
GROUP BY ClassId) s1
JOIN workbook s2 ON s1.ClassId = s2.ClassId
WHERE s1.maxtotal = (
coalesce(exam1,0)+
coalesce(exam2,0)
) AND (s1.maxtotal >'75');
The last condition handles s1.maxtotal = 0 (sometimes student scores have not been entered yet, so all totals equal 0 and every student would show up as a best student). It also covers cases where we need a minimum score (to enroll in the next term).
So thanks, all.

select columns from one table using count function of another table in sql

run_software
runID Release
1 X
2 X
3 Y
4 Z
5 Y
6 X
7 Y
8 Z
9 X
10 Z
testcase
testID runID Result
T_1 1 PASS
T_2 1 FAIL
T_3 1 PASS
T_4 2 PASS
T_5 2 FAIL
T_6 3 PASS
T_7 4 FAIL
T_8 3 PASS
T_9 3 FAIL
T_10 5 PASS
T_11 5 FAIL
T_12 3 PASS
1) From the run_software table we can see that release X ran on runIDs 1, 2, 6, 9.
2) Take runID 1 and look it up in the testcase table.
Here we have 3 testIDs with runID 1.
From these testIDs we need to measure the test case (TC) count and the PASS/FAIL percentage, grouping by Result and runID.
AIM: The ultimate aim is to find the latest 3 releases and, for each, the runID with the highest test case count, along with its PASS percentage.
E.g. if release 'X' ran on runIDs 1, 2, 3, 4 with 10, 12, 9, 21 test cases respectively, we should use runID 4 for release 'X' to measure the PASS %.
Desired output
(treating PASS% > 60 as PASS):
Release runID Result PASS %
X 1 PASS 66.66
Y 3 PASS 75
Z 4 FAIL 0
To explain:
Release 'X' has runIDs 1, 2, 6, 9 with 3, 2, 0, 0 testIDs respectively.
Hence, X uses runID 1, with a PASS% of 66.66 (2 PASS and 1 FAIL).

Your question is not very clear, actually, but I guess it should be something like this:
SELECT r.Release, r.runID, (CASE WHEN subq.PassPerc >= 60 THEN 'PASS' ELSE 'FAIL' END) AS Result, subq.PASSPerc
FROM run_software AS r
INNER JOIN (SELECT p.runID, (100 * p.passed / t.total) AS PASSPerc
FROM (SELECT runID, COUNT(*) AS passed FROM testcase WHERE Result = 'PASS' GROUP BY runID) AS p
INNER JOIN (SELECT runID, COUNT(*) AS total FROM testcase GROUP BY runID) AS t ON t.runID = p.runID
GROUP BY p.runID, p.passed, t.total) AS subq
ON subq.runID = r.runID
GROUP BY r.Release, r.runID, subq.PASSPerc
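The answer above inner-joins the passed and total counts, so a run with no PASS rows (runID 4 for release Z in the sample data) drops out. Below is a minimal sketch, run through Python's sqlite3 module against the question's sample data. It counts passes with conditional aggregation instead and keeps, per release, the run with the most test cases, as the question's AIM describes.

Assumptions:
- SQLite stands in for MySQL (a WITH clause needs MySQL 8.0 or later).
- "Release" is quoted because RELEASE is an SQL keyword.
- PASS means a pass rate above 60%, per the question.
- "Latest 3 releases" is not attempted, since the sample data gives no ordering for releases.
- The function name release_summary is hypothetical.

```python
import sqlite3

RUNS = [(1, "X"), (2, "X"), (3, "Y"), (4, "Z"), (5, "Y"),
        (6, "X"), (7, "Y"), (8, "Z"), (9, "X"), (10, "Z")]
TESTS = [("T_1", 1, "PASS"), ("T_2", 1, "FAIL"), ("T_3", 1, "PASS"),
         ("T_4", 2, "PASS"), ("T_5", 2, "FAIL"), ("T_6", 3, "PASS"),
         ("T_7", 4, "FAIL"), ("T_8", 3, "PASS"), ("T_9", 3, "FAIL"),
         ("T_10", 5, "PASS"), ("T_11", 5, "FAIL"), ("T_12", 3, "PASS")]


def release_summary(threshold=60):
    """Per release, the run with the most test cases and its PASS percentage."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE run_software (runID INT, "Release" TEXT)')
    con.execute("CREATE TABLE testcase (testID TEXT, runID INT, Result TEXT)")
    con.executemany("INSERT INTO run_software VALUES (?, ?)", RUNS)
    con.executemany("INSERT INTO testcase VALUES (?, ?, ?)", TESTS)
    rows = con.execute(
        """
        WITH per_run AS (
            -- LEFT JOIN keeps runs without tests; SUM(CASE ...) keeps runs without passes
            SELECT r."Release" AS rel, r.runID AS run_id,
                   COUNT(t.testID) AS total,
                   SUM(CASE WHEN t.Result = 'PASS' THEN 1 ELSE 0 END) AS passed
            FROM run_software AS r
            LEFT JOIN testcase AS t ON t.runID = r.runID
            GROUP BY r."Release", r.runID
        )
        SELECT rel, run_id,
               CASE WHEN 100.0 * passed / total > :threshold THEN 'PASS' ELSE 'FAIL' END AS result,
               ROUND(100.0 * passed / total, 2) AS pass_pct
        FROM per_run AS p
        WHERE total > 0
          -- the run with the highest test count in its release (ties return several rows)
          AND total = (SELECT MAX(q.total) FROM per_run AS q WHERE q.rel = p.rel)
        ORDER BY rel, run_id
        """,
        {"threshold": threshold},
    ).fetchall()
    con.close()
    return rows
```

With the sample data this gives X/1/PASS at 66.67, Y/3/PASS at 75.0 and Z/4/FAIL at 0.0, matching the desired output up to rounding.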

Return sums of multiples of nth

Now here's a fun MySQL question; I wonder if it's even possible!
Disclaimer: Although it's very similar to a question I asked before, it actually is COMPLETELY different. Just saying this before anyone says I've asked it before.
For this example, let's say I want the SUM() at each multiple of 20.
I want to SUM() the score column and return the date at which each multiple is reached.
Let's say I have the following table, sorted by date ASC:
Data
score | date
4 2000-01-01
2 2000-01-02
6 2000-01-03
1 2000-01-04 //Score 4+2+6+1 = 13
7 2000-01-05 //Score 4+2+6+1+7 = 20 so return this date
1 2000-01-06
2 2000-01-07
1 2000-01-08
5 2000-01-09
1 2000-01-10
9 2000-01-11 //Score = 39 so far.
7 2000-01-12 //Score = 46 It's not 40 but is the closest number above 40 so return it.
3 2000-01-13
4 2000-01-14
7 2000-01-15 //Score = 60, return this date.
Expected results:
score | date
20 2000-01-05
40 2000-01-12
60 2000-01-15
And etcetera. Is it possible to do this in MySQL?

By using MySQL user variables, you don't have to repeat the aggregation for every subsequent row to compute the running total. The inner query walks the rows in date order and sets a flag on each row that reaches the next multiple of 20. The outer query then keeps only the rows where the "ThisOne" flag is 1.
select
M20.*
from
( select
TransDate,
score,
if( @runTotal + Score >= 20 * @multCnt, 1, 0 ) as ThisOne,
@multCnt := @multCnt + if( @runTotal + Score >= 20 * @multCnt, 1, 0 ) as nextSeq,
@runTotal := @runTotal + Score
from Mult20s,
( select @multCnt := 1,
@runTotal := 0 ) sqlvars
order by transdate ) M20
where
M20.ThisOne = 1
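To see why the flag works, here is a plain-Python emulation of the same row-by-row logic. It assumes the select-list expressions are evaluated left to right for each row in date order, which is what this query relies on. MySQL documents the evaluation order of expressions involving user variables as undefined, so this models the intent, not a guarantee. The function name flag_multiples is hypothetical.

```python
def flag_multiples(rows, step=20):
    """Emulate the user-variable query: rows are (date, score) pairs in date order."""
    mult_cnt = 1   # plays the role of @multCnt (next multiple to reach is step * mult_cnt)
    run_total = 0  # plays the role of @runTotal
    flagged = []
    for trans_date, score in rows:
        this_one = run_total + score >= step * mult_cnt  # the ThisOne flag
        if this_one:
            mult_cnt += 1  # advances by at most one multiple per row
            flagged.append((trans_date, score))
        run_total += score
    return flagged
```

With the question's rows this flags 2000-01-05, 2000-01-12 and 2000-01-15. Because the counter advances by at most one multiple per row, a row that jumps past two multiples at once also causes the next row to be flagged. A first score of 45 is an example.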

Sure, anything's possible :)
select
floor(partial / 20) * 20, min(date)
from
(select
(select sum(score) from Scores s2
where s2.date <= s.date) as partial,
score,
date
from
Scores s) p
where
floor(partial / 20) > 0
group by
floor(partial / 20)
Demo: http://www.sqlfiddle.com/#!2/d44cf/3
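Here is a minimal sketch of the correlated-subquery approach, run through Python's sqlite3 module on the question's data. Assumptions: SQLite stands in for MySQL. Integer division of the running total (an integer, and non-negative here) replaces FLOOR(partial / 20). The function name multiples_reached and its step parameter are hypothetical.

```python
import sqlite3

# Rows from the question: (score, date).
SCORES = list(zip(
    [4, 2, 6, 1, 7, 1, 2, 1, 5, 1, 9, 7, 3, 4, 7],
    [f"2000-01-{day:02d}" for day in range(1, 16)],
))


def multiples_reached(step=20):
    """For each multiple of step, the first date whose running total reaches it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Scores (score INT, date TEXT)")
    con.executemany("INSERT INTO Scores VALUES (?, ?)", SCORES)
    rows = con.execute(
        """
        SELECT bucket * :step AS reached, MIN(date) AS first_date
        FROM (
            -- running total up to each date, integer-divided into buckets of :step
            SELECT (SELECT SUM(s2.score) FROM Scores AS s2
                    WHERE s2.date <= s.date) / :step AS bucket,
                   s.date AS date
            FROM Scores AS s
        ) AS p
        WHERE bucket > 0
        GROUP BY bucket
        ORDER BY bucket
        """,
        {"step": step},
    ).fetchall()
    con.close()
    return rows
```

On the question's data this returns 20 at 2000-01-05, 40 at 2000-01-12 and 60 at 2000-01-15, the expected result. The correlated subquery recomputes the sum for every row, which is the repeated aggregation the user-variable answer avoids.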
