MySQL group by URL path pattern - mysql

How to group rows with URLs by path pattern? E.g. we have these addresses:
1 http://example.com
2 http://example.com/products
3 http://example.com/products/some-product
4 http://example.com/categories
5 http://example.com/categories/cat1
6 http://example.com/categories/cat2
7 http://example.com/categories/cat3
8 http://example.com/tags
9 http://example.com/tags/tag1
10 http://example.com/tags/tag2
11 http://example.com/tags/tag3
12 http://example.com/about
So results would be:
1 http://example.com
2 http://example.com/products
3 http://example.com/products/some-product
4 http://example.com/categories
5 http://example.com/categories/cat1
8 http://example.com/tags
9 http://example.com/tags/tag1
12 http://example.com/about
We know the domain, http://example.com. We need all distinct path types. Basically, we want to know what different kinds of pages the website has, so the pattern is something like http://example.com/ * / * / * ...

Try this
select * from tbl1 t1
where exists
(select 1
from tbl1 t2
group by substring_index(concat(url,'#'),'/',4)
having t1.id=min(t2.id)
);
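MySQL lets you select columns that are not in the GROUP BY clause as long as the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default since MySQL 5.7.5). In that case you can use the following as well, although which row's values are returned for each group is not guaranteed.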
select *
from tbl1
group by
substring_index(concat(url,'#'),'/',4)
order by id
;
Role of substring_index(concat(url,'#'),'/',4)
concat first appends an extra character, here #, to the end of the URL. substring_index then keeps everything before the 4th /, or the whole string if it contains fewer than four. Without the # at the end, http://example.com/categories/cat1 and http://example.com/categories would be put in the same group, which you don't want.
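To make the grouping key concrete, here is a small Python sketch that emulates the expression outside the database. The helper substring_index and the function first_per_group are illustrative names, not MySQL APIs, and the helper only covers the positive-count case used above.

```python
def substring_index(s, delim, count):
    """Emulate MySQL SUBSTRING_INDEX for a positive count only (assumption)."""
    # Keep everything before the count-th delimiter, or the whole string
    # if it contains fewer than `count` delimiters.
    return delim.join(s.split(delim)[:count])


urls = {
    1: "http://example.com",
    2: "http://example.com/products",
    3: "http://example.com/products/some-product",
    4: "http://example.com/categories",
    5: "http://example.com/categories/cat1",
    6: "http://example.com/categories/cat2",
    7: "http://example.com/categories/cat3",
    8: "http://example.com/tags",
    9: "http://example.com/tags/tag1",
    10: "http://example.com/tags/tag2",
    11: "http://example.com/tags/tag3",
    12: "http://example.com/about",
}


def first_per_group(rows):
    """Return the smallest id of every group, like min(t2.id) in the query."""
    first_id = {}
    for id_, url in sorted(rows.items()):
        key = substring_index(url + "#", "/", 4)
        first_id.setdefault(key, id_)  # keep only the first id seen per key
    return sorted(first_id.values())


representatives = first_per_group(urls)
print(representatives)
```

The trailing # is what keeps http://example.com/categories (key ends in #) apart from http://example.com/categories/cat1 (key is cut before the 4th slash, so it has no #).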

Related

why isn't this window function working? I can't use a group by

I am working in MySQL and I have this table:
url x y count *
abc.com 1 1 2
abc.com 2 1 2
xyz.com 1 56 5
xyz.com 2 71 5
xyz.com 3 59 5
xyz.com 4 67 5
xyz.com 5 59 5
and I'm trying to run this code:
count(dh.x) over(partition by dh.url)
in order to get a count of 2 for url of abc.com and a count of 5 for url of xyz.com to be the denominator I divide by in my query below...
that window function is nested in this larger CTE:
, stdev_estimates AS (
SELECT dh.url
, CASE SUM(SQUARE(dh.x - xmean)) WHEN 0 THEN 1 ELSE SQRT(SUM(SQUARE(dh.x - xmean)) / (count(dh.x) over(partition by dh.url)) - 1) END AS xstdev
, SQRT(SUM(SQUARE(dh.y - ymean)) / (count(dh.x) over(partition by dh.url)) - 1) AS ystdev
, count(dh.x) over(partition by dh.url) 'denom'
FROM date_hits dh
INNER JOIN mean_estimates m ON m.url = dh.url
GROUP BY dh.url, m.xmean, m.ymean, dh.x, dh.y
)
select *
from stdev_estimates
but when I try to run the inner subselect, I keep getting this error:
Msg 8120, Level 16, State 1, Line 49
Column 'date_hits.x' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
But it doesn't make sense, because I thought the window function was a workaround for needing a GROUP BY on dh.x? I thought you don't need a GROUP BY if you're using an aggregating function.
Bottom line is I need to not group by dh.x because it breaks out my table into 7 rows when I'm aggregating everything down to:
url xstdev ystdev
abc.com .12345 .5679
xyz.com .23455 .79023
What's within the window function must be an expression that complies with the outer group by.
If you remove dh.x from the group by clause, you cannot use it alone, you need an aggregate function on it.
Presumably, you want to replace this:
count(dh.x) over(partition by dh.url)
With:
sum(count(dh.x)) over(partition by dh.url)

Selecting unique lines based on two columns in mysql

I've been trying to figure out a way to select lines that are unique based on the values of two columns. For example, here is a sample/example of my file:
s r
10 12
10 13
14 10
10 12
14 10
12 10
And this is what I want my output to look like:
s r
10 12
10 13
14 10
I've tried to use this code
SELECT * FROM `message` WHERE (`s_id`=$b_id or 'r_id'=$b_id) GROUP BY
r_id
However, it only selects 10,12 and 10,11, and I also need 14,10 (the third row).
In this case $b_id is obtained from the session, and it may match either s or r.
You can try to use DISTINCT
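SELECT DISTINCT s_id, r_id FROM `message` WHERE (s_id = $b_id OR r_id = $b_id);
Always list the columns you need instead of using * (all); it can improve performance. Note that r_id must not be wrapped in single quotes, otherwise it is compared as a string literal rather than as a column.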
Do it like this
select distinctrow s_id,r_id from table where s_id = 10 or r_id = 10
or else use "having" clause and group by both columns
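A quick way to check what DISTINCT and GROUP BY on both columns return for the sample data is the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), with the column names s_id and r_id taken from the query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (s_id INTEGER, r_id INTEGER)")
conn.executemany(
    "INSERT INTO message VALUES (?, ?)",
    [(10, 12), (10, 13), (14, 10), (10, 12), (14, 10), (12, 10)],
)

b_id = 10  # stands in for the session value $b_id

# DISTINCT removes rows that are identical in both selected columns.
distinct_rows = conn.execute(
    "SELECT DISTINCT s_id, r_id FROM message "
    "WHERE s_id = ? OR r_id = ? ORDER BY s_id, r_id",
    (b_id, b_id),
).fetchall()

# Grouping by both columns yields the same set of pairs.
grouped_rows = conn.execute(
    "SELECT s_id, r_id FROM message "
    "WHERE s_id = ? OR r_id = ? GROUP BY s_id, r_id ORDER BY s_id, r_id",
    (b_id, b_id),
).fetchall()

print(distinct_rows)
print(grouped_rows)
```

Both queries keep (12, 10), because it is a different ordered pair from (10, 12). The desired output in the question omits that row, so treating the two columns as an unordered pair would need extra logic that neither query provides.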
After two days of struggling, I succeeded in finding a solution to my problem:
SELECT * FROM `message` WHERE s=$b_id or r=$b_id AND (r and
s)<>(s_ and r) GROUP BY s,r

MySQL multi-step GROUP BY without subquery

I'm working on improving some queries I inherited, and was curious if it was possible to do the following - given a table the_table that looks like this:
id uri
---+-------------------------
1 /foo/bar/x
1 /foo/bar/y
1 /foo/boo
2 /alpha/beta/carotine
2 /alpha/delic/ipa
3 /plastik/man/spastik
3 /plastik/man/krakpot
3 /plastik/man/helikopter
As an implicit intermediate step I'd like to group these by the 1st + 2nd tuple of uri. The results of that step would look like:
id base
---+---------------
1 /foo/bar
1 /foo/boo
2 /alpha/beta
2 /alpha/delic
3 /plastik/man
And the final result would reflect the number of unique tuple1 + tuple2 values, per unique id:
id cnt
---+-----
1 2
2 2
3 1
I can achieve these results, but not without doing a subquery (to get the results of the implicit step mentioned above), and then select/grouping out of that. Something like:
SELECT
id,
count(base) cnt
FROM (
SELECT
id,
substring_index(uri, '/', 3) AS base
FROM the_table
GROUP BY id, base
)
GROUP BY id;
My reason for wanting to avoid the subquery is that I'm working with a fairly large (20M rows) data set, and the subquery gets very expensive. Gut tells me it's not doable, but figured I'd ask SO...
There's no need for a subquery -- you can use count with distinct to achieve the same result:
SELECT
id,
count(distinct substring_index(uri, '/', 3)) AS cnt
FROM the_table
GROUP BY id
BTW -- this returns count of 1 for id 3 -- I assume that was a typo in your posting.
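The effect of count(distinct ...) grouped by id can be reproduced in a few lines of Python, which may help when checking the expected counts by hand. This is only a sketch: substring_index is a hypothetical helper that mimics the MySQL function for positive counts, and the rows are the sample data from the question.

```python
from collections import defaultdict


def substring_index(s, delim, count):
    """Emulate MySQL SUBSTRING_INDEX for a positive count only (assumption)."""
    return delim.join(s.split(delim)[:count])


the_table = [
    (1, "/foo/bar/x"),
    (1, "/foo/bar/y"),
    (1, "/foo/boo"),
    (2, "/alpha/beta/carotine"),
    (2, "/alpha/delic/ipa"),
    (3, "/plastik/man/spastik"),
    (3, "/plastik/man/krakpot"),
    (3, "/plastik/man/helikopter"),
]

# One set of distinct bases per id; the set plays the role of DISTINCT.
bases_per_id = defaultdict(set)
for id_, uri in the_table:
    bases_per_id[id_].add(substring_index(uri, "/", 3))

counts = {id_: len(bases) for id_, bases in sorted(bases_per_id.items())}
print(counts)
```

Because each uri starts with a slash, the first piece is empty, so a count of 3 keeps the first two path segments (for example /foo/bar).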

Find emails in the same table that having %40 in MySQL

I have a table with two columns - id, email
What query can I run to show only single emails having %40 ?
e.g. my table is
id email
-------------
1 stevemartin140%40gmail.com
2 stevemartin141%40gmail.com
3 stevemartin140#gmail.com
4 stevemartin141#gmail.com
5 stevemartin148%40gmail.com
6 andymartin%40ymail.com
So result will be:
id email
-------------
5 stevemartin148%40gmail.com
6 andymartin%40ymail.com
I would like to find those emails and replace the %40 in them with #,
so my final output will be:
id email
-------------
1 stevemartin140%40gmail.com
2 stevemartin141%40gmail.com
3 stevemartin140#gmail.com
4 stevemartin141#gmail.com
5 stevemartin148#gmail.com
6 andymartin#ymail.com
Thanks in advance
If you want to replace them, then:
UPDATE t SET email=REPLACE(email, '%40', '#')
-in MySQL.
This may not be the most performant query, but it finds the %40 emails that have no duplicate:
SELECT * FROM table GROUP BY REPLACE(email, '%40', '#') HAVING COUNT(*) < 2 AND email LIKE '%\%40%';
MySQL compliant
http://www.sqlfiddle.com/#!2/6808c/3/0
You should use like '%\%40%' in your where condition to escape the wildcard, like this:
select * from t1 where email like '%\%40%';
here is the SqlFiddle
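The difference between the escaped and unescaped pattern, and the "replace only the emails without a # twin" step, can be tried out with the sketch below. It uses Python's built-in sqlite3 module instead of MySQL, which is an assumption with one visible consequence: SQLite has no default LIKE escape character, so the ESCAPE clause has to be written explicitly, whereas MySQL treats the backslash as the escape character by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "stevemartin140%40gmail.com"),
        (2, "stevemartin141%40gmail.com"),
        (3, "stevemartin140#gmail.com"),
        (4, "stevemartin141#gmail.com"),
        (5, "stevemartin148%40gmail.com"),
        (6, "andymartin%40ymail.com"),
    ],
)

# Unescaped: the middle % is a wildcard, so any email containing "40" matches.
unescaped_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM t WHERE email LIKE '%%40%' ORDER BY id")
]

# Escaped: only emails containing the literal text "%40" match.
encoded = conn.execute(
    r"SELECT id, email FROM t WHERE email LIKE '%\%40%' ESCAPE '\' ORDER BY id"
).fetchall()

# Keep only the encoded emails whose decoded form is not already in the table.
existing = {email for (email,) in conn.execute("SELECT email FROM t")}
singles = [
    (id_, email)
    for id_, email in encoded
    if email.replace("%40", "#") not in existing
]

for id_, _ in singles:
    conn.execute(
        "UPDATE t SET email = REPLACE(email, '%40', '#') WHERE id = ?", (id_,)
    )

final = conn.execute("SELECT id, email FROM t ORDER BY id").fetchall()
print(final)
```

Row 3 (stevemartin140#gmail.com) shows why the escape matters: it contains "40" but not "%40", so only the unescaped pattern picks it up.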

Oracle/MYSQL: Sort records from a select query on a column that contains alphanumeric values

I know that this question has been asked in various forms but my requirement happens to be a bit different.
Suppose I have a table that contains data as follows:
ID NAME VALUE
-----------------------------
1 ABC-2-2 X
2 PQRS-1-3 Y
3 ABC-3-2 Z
4 PQRS-1-4 A
5 PQRS-3-4 B
6 MNO-2-1 C
7 AAA-1 D
8 BBB-2 E
9 CCC-3 F
Now, the output that I'm expecting should look something like this:
ID NAME VALUE
-----------------------------
7 AAA-1 D
2 PQRS-1-3 Y
4 PQRS-1-4 A
8 BBB-2 E
6 MNO-2-1 C
1 ABC-2-2 X
9 CCC-3 F
3 ABC-3-2 Z
5 PQRS-3-4 B
Note that this is not a direct alpha-numeric sort. Instead, the value before the first "-" is ignored and the fields are sorted on what is after the first "-" in the name.
I'm not very familiar with PL/SQL and any kind of help on this would be appreciated.
Thanks.
PS: Note that this should work on both Oracle and MySQL.
If the prefix before the first hyphen always has three characters (as in ABC-2-2), this would suffice (Oracle syntax):
ORDER BY SUBSTR(name,4)
If the number of characters before the first hyphen can vary, you can do this (again Oracle syntax):
ORDER BY SUBSTR(name,INSTR(name,'-')+1)
However that won't work if you have codes like:
AAA-10-1
AAA-8-1
AAA-9-1
and expect AAA-10-1 to appear after AAA-9-1. Then you will need to parse it further:
ORDER BY LPAD(SUBSTR(name,INSTR(name,'-')+1, INSTR(name,'-',1,2)-INSTR(name,'-')-1),10,'0'),
LPAD(SUBSTR(name,INSTR(name,'-',1,2)+1),10,'0')
(NB I have used LPAD(x,10,'0') to turn a value like '1' into '0000000001' and so on, rather than use TO_NUMBER since this could fail if there are any non-numerics in your data.)
Example:
with data as
(
select 'AAA-1' name from dual
union all
select 'PQR-1-4' name from dual
union all
select 'PQR-1-3' name from dual
union all
select 'AAA-10-10' name from dual
union all
select 'AAA-10-1' name from dual
union all
select 'AAA-9-10' name from dual
union all
select 'AAA-9-1' name from dual
)
select *
from data
ORDER BY LPAD(SUBSTR(name,INSTR(name,'-')+1, INSTR(name,'-',1,2)-INSTR(name,'-')-1),10,'0'),
LPAD(SUBSTR(name,INSTR(name,'-',1,2)+1),10,'0');
Output:
NAME
---------
PQR-1-3
PQR-1-4
AAA-9-1
AAA-9-10
AAA-10-1
AAA-10-10
AAA-1
And if AAA-1 should come first:
ORDER BY LPAD(SUBSTR(name,INSTR(name,'-')+1, INSTR(name||'-','-',1,2)-INSTR(name,'-')-1),10,'0'),
LPAD(SUBSTR(name,INSTR(name||'-','-',1,2)+1),10,'0') nulls first
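The zero-padding idea is easy to test outside the database. The following Python sketch builds the same kind of sort key: it drops the prefix before the first hyphen and left-pads each remaining part to 10 characters. The function name sort_key and the width parameter are illustrative, and unlike Oracle's LPAD the padding here never truncates a part longer than the width.

```python
def sort_key(name, width=10):
    """Sort key that ignores the prefix and compares zero-padded parts."""
    # Drop everything up to and including the first hyphen.
    _, _, rest = name.partition("-")
    # Left-pad each part with zeros so that '9' sorts before '10'.
    return [part.rjust(width, "0") for part in rest.split("-")]


names = [
    "AAA-1", "PQR-1-4", "PQR-1-3",
    "AAA-10-10", "AAA-10-1", "AAA-9-10", "AAA-9-1",
]
ordered = sorted(names, key=sort_key)
print(ordered)

# The sample rows from the question, in table order.
question_names = [
    "ABC-2-2", "PQRS-1-3", "ABC-3-2", "PQRS-1-4", "PQRS-3-4",
    "MNO-2-1", "AAA-1", "BBB-2", "CCC-3",
]
question_ordered = sorted(question_names, key=sort_key)
print(question_ordered)
```

A name with a single number, such as AAA-1, produces a shorter key and therefore sorts before names that share the same first number, which corresponds to the nulls first variant of the query.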
Not sure about mysql syntax, but you can do this in oracle:
select * from <your_table>
order by substr(name, 5)
In MS SQL Server, the equivalent syntax is:
select * from mytable order by substring(name,PATINDEX('%-%',name)+1,len(name)-PATINDEX('%-%',name))