Retrieving Tree in Hierarchical data in MySQL - mysql

I have stored some hierarchical category data in which each category is related to others. The catch is that a single category can have multiple parents (maximum 3, minimum 0).
The table structures are:
category table
id - Primary Key
name - Name of the Category
ref_id - Reference ID that is being used for relationship
id   name                           ref_id
---- ------------------------------ ------
1    everything                     -1
2    computing                      0
3    artificial intelligence        1
4    data science                   2
5    machine learning (ML)          3
6    programming                    4
7    web technologies               5
8    programming languages          7
9    content technologies           8
10   operating systems              9
11   algorithms                     10
12   software development systems   102
category_relation table
id   child_ref_id   parent_ref_id
---- -------------- -------------
1    0              -1
2    1              0
3    2              0
4    3              1
5    3              2
6    4              102
7    5              0
8    7              4
9    8              0
10   9              0
11   10             0
12   10             4
13   102            0
As you can see in the diagram, the relationship is fairly complicated: algorithms has two parents, computing and programming; similarly, machine learning (ML) has two parents, artificial intelligence and data science.
How can I retrieve all the children of a specific category, e.g. computing? I need to retrieve all the children down to the third level, i.e. programming languages and algorithms.
MySQL dump of the database: https://github.com/codersrb/multi-parent-hierarchy/blob/main/taxonomy.sql

Assuming the data structure is fixed with a good PK, in MySQL 8.x you can do:
with recursive
n (id, name, ref_id, lvl) as (
  select id, name, ref_id, 1 from category where id = 2 -- starting node
  union all
  select c.id, c.name, c.ref_id, n.lvl + 1
  from n
  join category_relation r on r.parent_ref_id = n.ref_id
  join category c on c.ref_id = r.child_ref_id
)
select * from n where lvl <= 3
Result:
id name ref_id lvl
---- --------------------------------------- ------- ---
2 computing 0 1
3 artificial intelligence 1 2
4 data science 2 2
7 web technologies 5 2
9 content technologies 8 2
10 operating systems 9 2
11 algorithms 10 2
62 information science 61 2
103 software / systems development 102 2
165 scientific computing 165 2
296 image processing 316 2
297 text processing 317 2
301 Google 321 2
322 computer vision 343 2
5 machine learning (ML) 3 3
5 machine learning (ML) 3 3
6 programming 4 3
18 models 17 3
21 classification 20 3
27 data preparation 26 3
28 data analysis 27 3
29 imbalanced datasets 28 3
50 visualization 49 3
61 information retrieval 60 3
68 k-means 67 3
71 Random Forest algorithm 70 3
104 project management 103 3
105 software development methodologies 104 3
107 web development 106 3
113 kNN model 112 3
132 CRISP-DM methodology 131 3
143 data 142 3
153 SMOTE 153 3
154 MSMOTE 154 3
157 backward feature elimination 157 3
158 forward feature selection 158 3
176 deep feature synthesis (DFS) 177 3
196 unsupervised learning 197 3
210 mean-shift 211 3
212 DBSCAN 213 3
246 naïve Bayes algorithm 247 3
248 decision tree algorithm 249 3
249 support vector machine (SVM) algorithm 250 3
251 neural networks 252 3
252 artificial neural networks (ANN) 253 3
281 deep learning 300 3
281 deep learning 300 3
285 image classification 304 3
285 image classification 304 3
286 natural language processing (NLP) 305 3
286 natural language processing (NLP) 305 3
288 text representation 307 3
294 visual recognition 314 3
295 optical character recognition (OCR) 315 3
295 optical character recognition (OCR) 315 3
296 image processing 316 3
298 machine translation (MT) 318 3
299 speech recognition 319 3
300 TensorFlow 320 3
302 R 322 3
304 Android 324 3
322 computer vision 343 3
323 object detection 344 3
324 instance segmentation 345 3
325 edge detection 346 3
326 image filters 347 3
327 feature maps 348 3
328 stride 349 3
329 padding 350 3
335 text preprocessing 356 3
336 tokenization 357 3
337 case normalization 358 3
338 removing punctuation 359 3
339 stop words 360 3
340 stemming 361 3
341 lemmatization 362 3
342 Porter algorithm 363 3
350 word2vec 371 3
351 Skip-gram 372 3
364 convnets 385 3
404 multiplicative update algorithm 716 3
If you want to remove duplicates you can use DISTINCT. For example:
with recursive
n (id, name, ref_id, lvl) as (
  select id, name, ref_id, 1 from category where id = 2 -- starting node
  union all
  select c.id, c.name, c.ref_id, n.lvl + 1
  from n
  join category_relation r on r.parent_ref_id = n.ref_id
  join category c on c.ref_id = r.child_ref_id
)
select distinct * from n where lvl <= 3
See running example at DB Fiddle.
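A possible refinement, offered as a sketch rather than as part of the answer above: the depth limit can be moved inside the recursive member, so the recursion stops at the requested level instead of walking the whole graph and filtering afterwards. The example below runs the query through Python's built-in sqlite3 module as a stand-in for MySQL 8 (an assumption made only so the sketch is self-contained), loads just the rows listed in the question, and keeps the shallowest level for categories reachable through several parents. The helper name descendants and the named placeholders are illustrative.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8 here (an assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT, ref_id INTEGER);
CREATE TABLE category_relation (
    id INTEGER PRIMARY KEY, child_ref_id INTEGER, parent_ref_id INTEGER);
-- Only the rows shown in the question, not the full dump.
INSERT INTO category VALUES
    (1, 'everything', -1), (2, 'computing', 0),
    (3, 'artificial intelligence', 1), (4, 'data science', 2),
    (5, 'machine learning (ML)', 3), (6, 'programming', 4),
    (7, 'web technologies', 5), (8, 'programming languages', 7),
    (9, 'content technologies', 8), (10, 'operating systems', 9),
    (11, 'algorithms', 10), (12, 'software development systems', 102);
INSERT INTO category_relation VALUES
    (1, 0, -1), (2, 1, 0), (3, 2, 0), (4, 3, 1), (5, 3, 2),
    (6, 4, 102), (7, 5, 0), (8, 7, 4), (9, 8, 0), (10, 9, 0),
    (11, 10, 0), (12, 10, 4), (13, 102, 0);
""")

BOUNDED_TREE = """
WITH RECURSIVE n (id, name, ref_id, lvl) AS (
    SELECT id, name, ref_id, 1 FROM category WHERE id = :start
    UNION ALL
    SELECT c.id, c.name, c.ref_id, n.lvl + 1
    FROM n
    JOIN category_relation r ON r.parent_ref_id = n.ref_id
    JOIN category c ON c.ref_id = r.child_ref_id
    WHERE n.lvl < :max_lvl          -- stop recursing at the depth limit
)
SELECT id, name, MIN(lvl) AS lvl    -- one row per category, shallowest level
FROM n
GROUP BY id, name
ORDER BY lvl, id
"""

def descendants(conn, start_id, max_lvl):
    """Return (id, name, lvl) for the start node and its descendants."""
    params = {"start": start_id, "max_lvl": max_lvl}
    return conn.execute(BOUNDED_TREE, params).fetchall()

rows = descendants(conn, 2, 3)
for row in rows:
    print(row)
```

Bounding the recursion this way also keeps the query from running away if the relation table ever contains a cycle, which an unbounded recursive member would not survive.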

Related

mysql: how to select group by first character and top 5 by counter

My table looks like the following:
id person counter
1 Ona 4946
2 Mayra 15077
3 Claire 496
4 Rita 13929
5 Demond 579
6 Winnifred 13580
7 Green 1734
8 Jacquelyn 19092
9 Aisha 5572
10 Kian 8826
11 Alexandrea 7514
12 Dalton 14151
13 Rossie 18403
14 Carson 19537
15 Mason 2022
16 Emie 2394
17 Jonatan 6655
18 June 5037
19 Jazmyn 10856
20 Mittie 18928
Here is the fiddle.
I would like to group the rows by first character and select the top 5 by counter within each group. Here is the SQL that I tried:
SELECT SUBSTR(person,1,1) AS Alpha, person, counter
FROM myTable
GROUP BY SUBSTR(person,1,1)
ORDER BY SUBSTR(person,1,1) ASC, counter DESC;
How can I select the desired result, as shown below?
alpha person counter
a Arvid 9236
a Aisha 5572
a Alf 4000
a Ahmad 3500
a Alvin 2100
b Brandon 13000
b Ben 8230
b Bonny 7131
b Bella 4120
b Bun 1200
c Connie 9320
c Calvin 8310
c Camalia 6123
c Cimon 3419
c Clay 2515
I'm using MySQL 8.0.
You can do:
select *
from (
  select *, row_number() over(partition by substr(person, 1, 1)
                              order by counter desc) as rn
  from myTable
) x
where rn <= 5
order by substr(person, 1, 1), rn
Result:
id person counter rn
---- ---------- -------- --
153 Alf 19758 1
283 Alycia 19706 2
260 Abe 19463 3
223 Assunta 18808 4
300 Ari 18031 5
210 Bennie 18309 1
159 Barry 18281 2
128 Beulah 18080 3
314 Benny 16795 4
474 Barry 15789 5
342 Casandra 19656 1
14 Carson 19537 2
67 Chaim 19429 3
280 Colin 18507 4
500 Corbin 18433 5
380 Daphney 19138 1
234 Dejah 18781 2
241 Derrick 18722 3
49 Dasia 18562 4
312 Darrel 17903 5
163 Evalyn 19847 1
79 Ernestine 19523 2
344 Emilie 19520 3
371 Eva 19119 4
469 Emma 18403 5
140 Fiona 19522 1
216 Flo 18314 2
356 Frieda 16082 3
254 Floy 15942 4
54 Florencio 12739 5
447 Geoffrey 19858 1
327 Geoffrey 19223 2
335 Grant 19100 3
454 Giuseppe 16175 4
83 Gardner 15235 5
373 Hilario 19507 1
35 Hanna 19276 2
200 Halle 18150 3
491 Hailee 17521 4
411 Hermann 17018 5
21 Idella 7440 1
177 Izabella 5536 2
115 Isai 4164 3
412 Izabella 2112 4
275 Imani 573 5
195 Joannie 19374 1
8 Jacquelyn 19092 2
48 Jalon 18861 3
251 Jamie 18768 4
367 Joanny 17600 5
282 Kendra 19278 1
421 Kendra 19213 2
363 Kaylin 18977 3
96 Kaylie 18423 4
310 Katrine 17754 5
146 Lonzo 19778 1
194 Leonora 18258 2
399 Laurine 16847 3
137 Leslie 16718 4
190 Luther 16318 5
87 Maegan 19112 1
20 Mittie 18928 2
271 Mariana 18149 3
317 Mary 18043 4
305 Maybelle 17666 5
281 Noelia 19203 1
176 Nickolas 19047 2
408 Nelson 15901 3
142 Nasir 13700 4
366 Nicole 10694 5
423 Ova 19759 1
487 Osborne 19539 2
438 Ozella 18911 3
375 Ora 18270 4
414 Onie 17358 5
52 Pascale 19658 1
39 Pearlie 17621 2
364 Price 14177 3
161 Precious 10337 4
294 Paula 9162 5
70 Quincy 18343 1
73 Quincy 16631 2
192 Quentin 13578 3
131 Rodger 19776 1
231 Royal 19033 2
313 Rocky 19008 3
13 Rossie 18403 4
45 Rosanna 15992 5
418 Sydnee 19810 1
470 Sadie 19189 2
123 Shanna 18862 3
485 Savanah 18664 4
302 Steve 16412 5
406 Toney 18283 1
28 Tremaine 16400 2
98 Taurean 15911 3
278 Tremaine 14391 4
311 Treva 14026 5
239 Ubaldo 11630 1
78 Valentina 17736 1
458 Vita 17527 2
170 Vergie 16971 3
158 Vance 15089 4
272 Veronica 12027 5
102 Willis 18155 1
329 Ward 14919 2
156 Westley 14867 3
136 Winnifred 14315 4
6 Winnifred 13580 5
323 Yolanda 17920 1
155 Yesenia 6164 2
402 Zachary 19129 1
37 Zaria 5398 2
See running example at DB Fiddle.
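As a variation on the query above, the sketch below exposes the first character as its own lower-cased alpha column (matching the desired output in the question) and adds id as a tie-breaker so that rows with equal counters are ranked deterministically. It is run through Python's sqlite3 module as a stand-in for MySQL 8.0 (an assumption; window functions need SQLite 3.25 or later), using only the 20 sample rows from the question. The helper name top_per_letter is illustrative, and the limit is a parameter so the cut-off is visible on this small sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (id INTEGER PRIMARY KEY, person TEXT, counter INTEGER)")
# The 20 sample rows listed in the question.
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", [
    (1, "Ona", 4946), (2, "Mayra", 15077), (3, "Claire", 496),
    (4, "Rita", 13929), (5, "Demond", 579), (6, "Winnifred", 13580),
    (7, "Green", 1734), (8, "Jacquelyn", 19092), (9, "Aisha", 5572),
    (10, "Kian", 8826), (11, "Alexandrea", 7514), (12, "Dalton", 14151),
    (13, "Rossie", 18403), (14, "Carson", 19537), (15, "Mason", 2022),
    (16, "Emie", 2394), (17, "Jonatan", 6655), (18, "June", 5037),
    (19, "Jazmyn", 10856), (20, "Mittie", 18928),
])

TOP_PER_LETTER = """
SELECT alpha, person, counter
FROM (
    SELECT LOWER(SUBSTR(person, 1, 1)) AS alpha, person, counter,
           ROW_NUMBER() OVER (
               PARTITION BY LOWER(SUBSTR(person, 1, 1))
               ORDER BY counter DESC, id   -- id breaks ties deterministically
           ) AS rn
    FROM myTable
) x
WHERE rn <= ?
ORDER BY alpha, rn
"""

def top_per_letter(conn, top_n):
    """Return up to top_n (alpha, person, counter) rows per first letter."""
    return conn.execute(TOP_PER_LETTER, (top_n,)).fetchall()

rows = top_per_letter(conn, 2)
for row in rows:
    print(row)
```

Calling top_per_letter(conn, 5) gives the top 5 per letter that the question asks for; with this sample no letter has more than four names, so a limit of 2 is used above to show rows actually being dropped.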

Select target number of records by groups

I have a database in which each record has a rank and is associated to a certain group.
Also, there is a target number of IDs for each group.
I need to select this target number in each group with the highest ranked records.
This is an example of this data:
Group Id Rank
--------------------------------
GUADALAJARA 1 356
GUADALAJARA 2 387
PUEBLA 3 431
TIJUANA 4 315
PUEBLA 5 315
MONTERREY 6 315
MONTERREY 7 263
PUEBLA 8 356
PUEBLA 9 447
GUADALAJARA 10 356
MONTERREY 11 356
TIJUANA 12 447
PUEBLA 13 356
PUEBLA 14 387
MONTERREY 15 431
MONTERREY 16 412
MONTERREY 17 447
TIJUANA 18 263
And the targets for each group are:
Group Records Goal
----------------------------
GUADALAJARA 4 2
MONTERREY 6 3
PUEBLA 6 3
TIJUANA 3 2
For example, group Guadalajara has 4 records, and I need to select the two highest-ranked ones, which would be 100630487 and 133255369:
ID Rank
----------------
100630487 447
133255369 387
138314098 356
114194869 356
I will appreciate any ideas to make this query.
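One way this could be approached, sketched under assumptions since the question names no tables: number the records inside each group by descending rank, join the per-group goal, and keep the rows whose number does not exceed the goal. The table names records and targets and the column names grp, rnk and goal are hypothetical (Group and Rank are renamed because both are reserved words in MySQL 8). The sketch runs on Python's sqlite3 module as a stand-in (window functions need SQLite 3.25 or later) and uses the 18 sample rows from the first table above, so the ids differ from the ones in the Guadalajara example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table and column names; data copied from the question.
CREATE TABLE records (grp TEXT, id INTEGER PRIMARY KEY, rnk INTEGER);
CREATE TABLE targets (grp TEXT PRIMARY KEY, records INTEGER, goal INTEGER);
INSERT INTO records VALUES
    ('GUADALAJARA', 1, 356), ('GUADALAJARA', 2, 387), ('PUEBLA', 3, 431),
    ('TIJUANA', 4, 315), ('PUEBLA', 5, 315), ('MONTERREY', 6, 315),
    ('MONTERREY', 7, 263), ('PUEBLA', 8, 356), ('PUEBLA', 9, 447),
    ('GUADALAJARA', 10, 356), ('MONTERREY', 11, 356), ('TIJUANA', 12, 447),
    ('PUEBLA', 13, 356), ('PUEBLA', 14, 387), ('MONTERREY', 15, 431),
    ('MONTERREY', 16, 412), ('MONTERREY', 17, 447), ('TIJUANA', 18, 263);
INSERT INTO targets VALUES
    ('GUADALAJARA', 4, 2), ('MONTERREY', 6, 3),
    ('PUEBLA', 6, 3), ('TIJUANA', 3, 2);
""")

TOP_BY_GOAL = """
SELECT grp, id, rnk
FROM (
    SELECT r.grp, r.id, r.rnk, t.goal,
           ROW_NUMBER() OVER (
               PARTITION BY r.grp
               ORDER BY r.rnk DESC, r.id   -- highest rank first, id breaks ties
           ) AS rn
    FROM records r
    JOIN targets t ON t.grp = r.grp
) x
WHERE rn <= goal                        -- per-group limit taken from targets
ORDER BY grp, rn
"""

rows = conn.execute(TOP_BY_GOAL).fetchall()
for row in rows:
    print(row)
```

Ties on rank are broken by id here purely for determinism; whether tied records should instead all be kept (RANK or DENSE_RANK in place of ROW_NUMBER) depends on requirements the question does not state.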

MySQL: SUM data from two different tables

I have a db for a menu which tracks clicks on it. The menu has categories and subcategories, and I'm trying to get the number of clicks for each category. In the db, clicks are registered to the subcategory if the item is in one; otherwise they are counted in the category. I have a query that gets the clicks for all subcategories (category_type 3), but I need to add them to the clicks from their parent category (category_type 2). There is a table called CategoryHierarchy that maps each category to its parent category. This is what I have:
SELECT IFNULL(SUM(`MenuEntryAnalytics`.`opened`), 0) AS `clicks`,
`Categories`.`id`,
`Categories`.`name`,
`Categories`.`category_type`,
`CategoryHierarchy`.`parent_id` AS `parent`
FROM `MenuEntryAnalytics`
INNER JOIN `MenuEntries`
ON `MenuEntryAnalytics`.`menu_entry_id` = `MenuEntries`.`id`
LEFT JOIN `MenuEntryToCategory`
ON `MenuEntryAnalytics`.`menu_entry_id` = `MenuEntryToCategory`.`menu_entry_id`
RIGHT JOIN `Categories`
ON `MenuEntryToCategory`.`category_id` = `Categories`.`id`
RIGHT JOIN `CategoryHierarchy`
ON `Categories`.`id` = `CategoryHierarchy`.`category_id`
WHERE `Categories`.`category_type` = 3
GROUP BY `id`;
Results:
clicks id name type parent
=============================================
2032 3 Appetizers 3 2
455 4 Salads 3 2
680 6 Sandwiches 3 5
424 7 Burgers 3 5
584 9 Pizza 3 8
466 10 Kids Menu 3 8
1445 12 Soda 3 11
1089 13 Signature Cocktails 3 11
391 14 Bottled Beer 3 11
167 15 Wine 3 11
0 17 Events 3 16
0 18 Sponsors 3 16
186 19 Dessert 3 11
621 26 Restaurants 3 22
263 27 Bars 3 22
112 28 Services 3 25
254 29 Amenities 3 25
67 30 Exclusive Benefits 3 25
190 31 Area Attractions 3 24
14 32 Entertainment 3 24
2 33 Shopping 3 24
117 34 Transportation & Tours 3 24
471 35 Mixed Drinks 3 11
541 36 Draft Beer 3 11
If I GROUP BY parent, I get most of what I need (all the clicks from the subcategories of each category), but this doesn't include the clicks counted directly against categories (as opposed to subcategories, i.e. category_type 2). I'm stuck trying to add that part in. All I can think of is a subquery, but there's no way of identifying which category I'm looking at, so the subquery returns multiple rows.
PS I do not have permission to restructure the db.
Since the parent ID is in the same namespace as the ID, you can simply use IFNULL to pick the parent ID if it exists, or the category's own ID otherwise, and use that expression as your grouping key. Note that IFNULL takes exactly two arguments.
You may also want to select the same expression as an actual column.
GROUP BY
IFNULL(CategoryHierarchy.parent_id, Categories.id)
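To make the grouping idea concrete, here is a minimal sketch on a deliberately simplified schema. The tables categories, hierarchy and clicks are hypothetical stand-ins for the question's five tables, top-level categories are assumed to have no row in the hierarchy table (so the LEFT JOIN yields NULL for them), and the 100 direct clicks on category 2 are made up for illustration; the subcategory figures are taken from the results above. It runs on Python's sqlite3 module, which also provides a two-argument IFNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical schema (not the question's real tables).
CREATE TABLE categories (id INTEGER PRIMARY KEY, category_type INTEGER);
CREATE TABLE hierarchy (category_id INTEGER, parent_id INTEGER);
CREATE TABLE clicks (category_id INTEGER, opened INTEGER);
INSERT INTO categories VALUES (2, 2), (3, 3), (4, 3), (5, 2), (6, 3), (7, 3);
-- Assumption: only subcategories have a parent row.
INSERT INTO hierarchy VALUES (3, 2), (4, 2), (6, 5), (7, 5);
-- Subcategory clicks come from the question; the 100 on category 2 is invented.
INSERT INTO clicks VALUES (3, 2032), (4, 455), (6, 680), (7, 424), (2, 100);
""")

CLICKS_PER_PARENT = """
SELECT IFNULL(h.parent_id, c.id) AS bucket,     -- parent if any, else own id
       IFNULL(SUM(k.opened), 0) AS clicks
FROM categories c
LEFT JOIN hierarchy h ON h.category_id = c.id
LEFT JOIN clicks k ON k.category_id = c.id
GROUP BY IFNULL(h.parent_id, c.id)
ORDER BY bucket
"""

rows = conn.execute(CLICKS_PER_PARENT).fetchall()
for row in rows:
    print(row)
```

Category 2 ends up with its own clicks plus those of subcategories 3 and 4, and category 5 with those of 6 and 7. If top-level categories do carry a parent row in the real CategoryHierarchy table, the expression would need a condition on category_type instead of a NULL check.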

How to find the last six rows(dynamic) sum of one column based on another column

BallByBallID Deliveries RunsScored BowlPlayerId BatPlayerId
109 0 1 127 4
110 0.1 2 127 6
111 0.2 3 127 6
112 0.3 4 127 4
113 0.4 6 127 4
114 0.5 6 127 4
230 0 1 162 4
231 0.1 2 162 6
232 0.2 3 162 6
233 0.3 4 162 4
234 0.4 5 162 4
235 0.5 6 162 6
236 1 1 169 4
237 1.1 2 169 6
238 1.2 3 169 6
239 1.3 4 169 4
240 1.4 5 169 4
241 1.5 6 169 6
I have data in the format shown above. I want to find the sum of RunsScored per BowlPlayerId over the last six inserted rows, ordered by BallByBallID (the data is dynamic, so the last six rows may change at any time).
I tried the following:
SELECT SUM(RunsScored) from (select BallByBallId from BallByBall ORDER BY BallByBallId DESC LIMIT 6);
This gives the total sum. I also tried:
SELECT SUM(RunsScored) from (SELECT top 6 * from BallByBall ORDER BY BallByBallId DESC) A GROUP BY A.BowlPlayerID
SELECT SUM(RunsScored+BowlPlayerID)
FROM from BallByBall group by BallByBallId limit 6;
http://sqlfiddle.com/#!2/f3ab78/10
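A sketch of one possible approach, not a confirmed fix for the fiddle above: take the six rows with the highest BallByBallID in an aliased derived table that selects the columns needed later, then group that result by bowler. The example uses Python's sqlite3 module as a stand-in for MySQL and loads only the three relevant columns of the sample data; the helper name runs_in_last_rows is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE BallByBall (
    BallByBallID INTEGER PRIMARY KEY, RunsScored INTEGER, BowlPlayerId INTEGER)
""")
# Sample rows from the question (Deliveries and BatPlayerId omitted).
conn.executemany("INSERT INTO BallByBall VALUES (?, ?, ?)", [
    (109, 1, 127), (110, 2, 127), (111, 3, 127),
    (112, 4, 127), (113, 6, 127), (114, 6, 127),
    (230, 1, 162), (231, 2, 162), (232, 3, 162),
    (233, 4, 162), (234, 5, 162), (235, 6, 162),
    (236, 1, 169), (237, 2, 169), (238, 3, 169),
    (239, 4, 169), (240, 5, 169), (241, 6, 169),
])

LAST_ROWS_SUM = """
SELECT BowlPlayerId, SUM(RunsScored) AS runs
FROM (
    SELECT BowlPlayerId, RunsScored     -- the inner query must expose both columns
    FROM BallByBall
    ORDER BY BallByBallID DESC
    LIMIT ?
) AS last_rows                          -- derived tables need an alias in MySQL
GROUP BY BowlPlayerId
ORDER BY BowlPlayerId
"""

def runs_in_last_rows(conn, n):
    """Sum RunsScored per bowler over the n most recently inserted rows."""
    return conn.execute(LAST_ROWS_SUM, (n,)).fetchall()

last_six = runs_in_last_rows(conn, 6)
print(last_six)
```

Because the inner query is re-evaluated on every call, the result follows the table as new rows are inserted. With the sample data the last six rows all belong to bowler 169; a window of eight rows would also pick up two deliveries by bowler 162.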

Webscraping the data using R

Aim: I am trying to scrape the historical daily stock prices for all companies from the webpage http://www.nepalstock.com/datanepse/previous.php. The following code works; however, it always returns the daily stock prices for the most recent date (Feb 5, 2015) only. In other words, the output is the same irrespective of the date I enter. I would appreciate any help with this.
library(RHTMLForms)
library(RCurl)
library(XML)
url <- "http://www.nepalstock.com/datanepse/previous.php"
forms <- getHTMLFormDescription(url)
# we are interested in the second list with date forms
# forms[[2]]
# HTML Form: http://www.nepalstock.com/datanepse/
# Date: [ ]
get_stock<-createFunction(forms[[2]])
#create sequence of dates from start to end and store it as a list
date_daily<-as.list(seq(as.Date("2011-08-24"), as.Date("2011-08-30"), "days"))
# determine the number of elements in the list
num<-length(date_daily)
daily_1<-lapply(date_daily,function(x){
show(x) #displays the particular date
readHTMLTable(htmlParse(get_stock(Date = x)), which = 7)
})
#18 tables out of which 7 is one what we desired
# change the colnames
col_name<-c("SN","Traded_Companies","No_of_Transactions","Max_Price","Min_Price","Closing_Price","Total_Share","Amount","Previous_Closing","Difference_Rs.")
daily_2<-lapply(daily_1,setNames,nm=col_name)
Output:
> head(daily_2[[1]],5)
SN Traded_Companies No_of_Transactions Max_Price Min_Price Closing_Price Total_Share Amount
1 1 Agricultural Development Bank Ltd 24 489 471 473 2,868 1,359,038
2 2 Arun Valley Hydropower Development Company Limited 40 365 360 362 8,844 3,199,605
3 3 Alpine Development Bank Limited 11 297 295 295 150 44,350
4 4 Asian Life Insurance Co. Limited 10 1,230 1,215 1,225 898 1,098,452
5 5 Apex Development Bank Ltd. 23 131 125 131 6,033 769,893
Previous_Closing Difference_Rs.
1 480 -7
2 363 -1
3 303 -8
4 1,242 -17
5 132 -1
> tail(daily_2[[1]],5)
SN Traded_Companies No_of_Transactions Max_Price Min_Price Closing_Price Total_Share Amount Previous_Closing
140 140 United Finance Ltd 4 255 242 242 464 115,128 255
141 141 United Insurance Co.(Nepal)Ltd. 3 905 905 905 234 211,770 915
142 142 Vibor Bikas Bank Limited 7 158 152 156 710 109,510 161
143 143 Western Development Bank Limited 35 320 311 313 7,631 2,402,497 318
144 144 Yeti Development Bank Limited 22 139 132 139 14,355 1,921,511 134
Difference_Rs.
140 -13
141 -10
142 -5
143 -5
144 5
Here's one quick approach. Note that the site uses a POST request to send the date to the server.
library(rvest)
library(httr)
page <- "http://www.nepalstock.com/datanepse/previous.php" %>%
POST(body = list(Date = "2015-02-01")) %>%
read_html()  # html() is deprecated in newer rvest releases
page %>%
html_node(".dataTable") %>%
html_table(header = TRUE)