I am trying to reconstruct data that has a tree structure.
Example - Country / City:
1) USA
1.1) New York
1.2) Chicago
2) France
2.1) Paris
2.2) Lyon
3) China
In my database it looks like this:
| Element | Level | Row |
|:--------:|:-----:|:---:|
| USA | 1 | 1 |
| New York | 2 | 2 |
| Chicago | 2 | 3 |
| France | 1 | 4 |
| Paris | 2 | 5 |
| Lyon | 2 | 6 |
| China | 1 | 7 |
Based on the sequence (Row) of my entries I can reconstruct the tree structure. For each row I look for the nearest previous row whose Level is one less than the current row's Level.
In other words: max(pre.Row) where pre.Row < cur.Row and pre.Level = cur.Level - 1
The following code works and returns the right results. My problem is that the table has 7 million rows, so it takes a lot of time. It is like comparing 7 million rows against 7 million rows...
SELECT cur.`Row`, (
SELECT max(pre.`Row`)
FROM `abc`.`def` AS pre
WHERE pre.`Row` < cur.`Row`
AND pre.`Level`=cur.`Level`-1
) AS prev_row
FROM `abc`.`def` AS cur
;
Is there a faster way to implement this?
Maybe with loops or user variables? I could imagine starting from the current row and testing whether the previous row meets the conditions, otherwise moving on to the row before that, and so on. This would reduce the operations to 7 million times ~5. I have never worked with loops, so I have no clue whether this is possible in SQL. Any ideas?
Here's my attempt with 3 levels; you can add more levels if you have them. I'm not sure why it returns odd values that look encoded, but CAST(... AS UNSIGNED) gives you prev_row just like your query.
SELECT Row,
CAST(ELT(level-1,@level_1,@level_2,@level_3) AS UNSIGNED) AS prev_row,
@level_1 := IF(`level` = 1, row, @level_1),
@level_2 := IF(`level` = 2, row, @level_2),
@level_3 := IF(`level` = 3, row, @level_3)
FROM `def`
ORDER BY Row ASC
http://sqlfiddle.com/#!9/719b2/22
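The single-pass idea behind the user-variable query can be sketched outside SQL as well. The following is only an illustration in Python, assuming the rows arrive sorted by Row; the function name `find_parent_rows` and the tuple layout are made up for this example and are not part of the original answer.

```python
def find_parent_rows(entries):
    """entries: iterable of (row, level) pairs, sorted by row ascending.

    Returns a list of (row, parent_row); parent_row is None when no
    earlier row with level - 1 exists (for example, top-level rows).
    """
    last_row_at_level = {}  # level -> most recent row number seen at that level
    result = []
    for row, level in entries:
        # The nearest previous row one level up is simply the last one seen.
        result.append((row, last_row_at_level.get(level - 1)))
        last_row_at_level[level] = row
    return result


# The Country / City sample from the question: (Row, Level)
rows = [(1, 1), (2, 2), (3, 2), (4, 1), (5, 2), (6, 2), (7, 1)]
parents = find_parent_rows(rows)
```

Each row is visited once, so the work grows linearly with the table size instead of comparing every row with every earlier row.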
I need to select all data so that no ID appears more than once across rows.
Here's my sample table:
----------------------------------------------------------------------------------
ID | Zip-Code | Search Query | ID_LIST
----------------------------------------------------------------------------------
1 | 1000 | Query Sample 1 | 13,14,15,
----------------------------------------------------------------------------------
2 | 2000 | Query Sample 2 | 16,13,17,
----------------------------------------------------------------------------------
3 | 3000 | Query Sample 3 | 18,17,13,
----------------------------------------------------------------------------------
4 | 4000 | Query Sample 4 | 15,16,17,18,
----------------------------------------------------------------------------------
5 | 5000 | Query Sample 5 | 19, 20,
You can see that rows 1 and 2 share a duplicate in ID_LIST, which is 13.
Rows 2 and 3 also share duplicates, which are 13 and 17.
What I want is a result like this:
----------------------------------------------------------------------------------
ID | Zip-Code | Search Query | ID_LIST
----------------------------------------------------------------------------------
1 | 1000 | Query Sample 1 | 13,14,15,
----------------------------------------------------------------------------------
2 | 2000 | Query Sample 2 | 16,17,
----------------------------------------------------------------------------------
3 | 3000 | Query Sample 3 | 18,
----------------------------------------------------------------------------------
5 | 5000 | Query Sample 5 | 19,20,
What query would work for this? Any help?
The best approach is to normalize your data, as mentioned in the comments. But if you absolutely have to do it this way, it will be very difficult to do in a single MySQL query.
I would suggest creating a stored procedure for it. As you develop each step, you can look up a solution for that particular step, test it, and build on it. Let me know if any step sounds confusing or unclear.
1. Create a string variable, say v_vals, and initialize it to null. At the end of the procedure it will contain all the distinct values of id_list (13,14...20).
2. Iterate through each row.
3. Count the number of commas in id_list.
4. Loop from 1 to the number of commas.
5. In every iteration, use the substring and instr functions to find the position of each comma and extract the values from id_list (13,14...).
6. Use another variable, v_id_list, and set it to null.
7. Search for the values (from step 5) in v_vals. If a value already exists in v_vals, skip it; otherwise append it to both v_vals and v_id_list.
8. Now run an UPDATE statement to set id_list to v_id_list.
9. Repeat steps 3 to 8 for each row.
Note that v_id_list is reinitialized for each row, whereas v_vals accumulates all the distinct values of id_list.
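To make the flow of the steps above easier to follow, here is a rough sketch of the same logic in Python rather than as a MySQL procedure. The function name `dedupe_id_lists` is hypothetical, and dropping rows whose list ends up empty is an assumption taken from the desired output in the question (row 4 disappears there), not something the steps state.

```python
def dedupe_id_lists(rows):
    """rows: list of (row_id, id_list) in processing order,
    where id_list looks like '13,14,15,'.

    Returns a list of (row_id, new_id_list). Rows whose list becomes
    empty are dropped (assumption based on the question's desired output).
    """
    seen = set()  # plays the role of v_vals: every value kept so far
    result = []
    for row_id, id_list in rows:
        kept = []  # plays the role of v_id_list: reset for every row
        for token in id_list.split(","):
            value = token.strip()
            if value and value not in seen:
                seen.add(value)
                kept.append(value)
        if kept:
            result.append((row_id, ",".join(kept) + ","))
    return result


sample = [
    (1, "13,14,15,"),
    (2, "16,13,17,"),
    (3, "18,17,13,"),
    (4, "15,16,17,18,"),
    (5, "19, 20,"),
]
cleaned = dedupe_id_lists(sample)
```

On the sample table this keeps `13,14,15,` for row 1, `16,17,` for row 2, `18,` for row 3, and `19,20,` for row 5.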
Thanks for taking a look at this question. I'm kind of lost and hope someone can help me. Below is an update query I would like to run.
The query currently returns an error:
1054 - Unknown column 'spi.et_cross_rank' in 'where clause'
Some background: from the table tmp_ranking_tbl I would like to get the nth record (spi.et_return_rank) for a group with value x (spi.et_cross_rank).
SET @rownum=0;
UPDATE STRToer_Poule_indeling spi
SET spi.team_id = (SELECT R.team_poule_id
FROM (SELECT @rownum:=@rownum+1 AS rownum, trt.team_poule_id
FROM tmp_ranking_tbl trt
WHERE trt.overal_rank = spi.et_cross_rank
ORDER BY trt.punten DESC, (trt.goals_voor - trt.goals_tegen) DESC, trt.goals_voor DESC) R
WHERE R.rownum = spi.et_return_rank)
WHERE spi.et_ronde = v_et_ronde
AND spi.poule_id IN (SELECT row_id FROM STRToer_Poules WHERE toernooi_onderdeel_id=v_onderdeel_id) ;
Data in tmp_ranking_tbl looks like:
team_poule_id | punten | goals_voor | goals_tegen | overal_rank
65 | 6 | 10 | 10 | 2
69 | 6 | 9 | 10 | 2
75 | 7 | 11 | 4 | 2
84 | 6 | 6 | 8 | 2
112 | 5 | 7 | 7 | 2
Thanks in advance for the help!
Update, after a question in the comments about the goal. I'll try to keep it short. :-)
This query is used on a website that keeps the scores of a tournament. Sometimes an odd number of teams goes to the next round. At that point I want to select the best number-3 team (spi.et_cross_rank) across poules. This setting is saved in STRToer_Poule_indeling: which rank per poule, and whether to take the 1st, 2nd or nth team (spi.et_return_rank). The table tmp_ranking_tbl is filled with all rank-3 teams across the poules. Once it is filled, I would like the query to return the 1st or 2nd record, depending on the setting in STRToer_Poule_indeling.
Subset of the structure of the STRToer_Poule_indeling table:
row_id | team_id | et_ronde | et_cross_rank | et_return_rank
1 | null | 1 | 3 | 1
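First check that your table STRToer_Poule_indeling really has a column named et_cross_rank.
The error means that MySQL can't resolve that column. If the column does exist, as the structure above suggests, the likely cause is that spi.et_cross_rank is referenced inside the derived table R (the subquery in the FROM clause), and in older MySQL versions a derived table cannot see columns of the outer query.
Hope it helps.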
I have a table with a bit over a million timestamped rows. Is there a way to select, say, 30 rows that are evenly distributed?
For example, if my table contains five rows and I need three, I want rows 1, 3 and 5 returned.
Is there a way to do this in SQL?
Edit:
More specifically, I have a table with a list of different URLs and another table where data about the URLs are fetched and stored with regular intervals (in my case hourly).
What I want to do is be able to fetch a limited number of data rows (in my case 30) with an even interval between the dates. In a sense I want to filter out data points at a dynamic interval.
Does that make sense?
I guess you could consider something like this..
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
Now let's say I wanted to return approximately 5 evenly distributed results from across this table...
SELECT x.i
FROM ints x
JOIN ints y
ON y.i <= x.i
GROUP BY x.i
HAVING MOD(COUNT(y.i),ROUND((SELECT COUNT(*)/5 FROM ints),0)) = 0; -- where '5' equals the approximate number of results to be returned.
+---+
| i |
+---+
| 1 |
| 3 |
| 5 |
| 7 |
| 9 |
+---+
Note that at roughly 1 million rows this solution is NOT going to scale well, because the self-join compares every row with every earlier row. Use user variables for the ranking step instead.
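Once the rows are numbered in a single pass, picking evenly spaced ones is plain index arithmetic. The sketch below shows that arithmetic in Python; `evenly_spaced` is a hypothetical helper, and it assumes the rows are already sorted by timestamp and held in a list.

```python
def evenly_spaced(items, n):
    """Return up to n items spread evenly over the sorted list `items`,
    always including the first and the last one when n >= 2."""
    total = len(items)
    if n <= 0 or total == 0:
        return []
    if n >= total:
        return list(items)
    if n == 1:
        return [items[0]]
    # Position i maps to index i * (total - 1) / (n - 1), rounded down.
    return [items[i * (total - 1) // (n - 1)] for i in range(n)]


picked = evenly_spaced([1, 2, 3, 4, 5], 3)
```

For the five-row example from the question this selects rows 1, 3 and 5. Because the index is rounded down, the gaps can differ by one when the row count does not divide evenly.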
I want to write a stored procedure, but I don't know what the right approach is, or whether this is even possible in MySQL.
Let me describe my problem. Let's say I have a table with columns like this:
TABLE A
id | Hotel | city_name | region_name | country
1 | A | Amsterdam | North-Holland | Netherlands
2 | B | Amsterdam | North-Holland | Netherlands
3 | C | Leiden | North-Holland | Netherlands
4 | D | Katwijk | North-Holland | Netherlands
5 | E | Leiden | North-Holland | Netherlands
6 | F | Katwijk | North-Holland | Netherlands
I would like to get only 3 results each time I execute this query, and the results need to be built in this order:
If there are 3 or more records for the selected city (say the user selects Amsterdam, so city_name = Amsterdam), return 3 random records from Amsterdam.
If there are fewer than 3 records in Amsterdam, return all the Amsterdam records plus random records where region_name = North-Holland, so that the total number of returned records is always 3 (example 1: we have 2 records where city_name = Amsterdam, so we take one random record from region_name = North-Holland;
example 2: we have one record where city_name = Amsterdam, so we take 2 random records where region_name = North-Holland).
Is it possible to do this in SQL, or should I fetch all the records in PHP and then iterate over them?
I probably need to pass 2 arguments to the procedure (city_name, region_name).
So far I have tried some basic SQL queries, but I couldn't get it to work.
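You should try something like this:
-- priority is a helper column: 0 = same city, 1 = same region
(SELECT A.*, 0 AS priority
FROM A
WHERE city_name = 'Amsterdam'
ORDER BY RAND()
LIMIT 3)
UNION ALL
(SELECT A.*, 1 AS priority
FROM A
WHERE region_name = 'North-Holland' AND city_name <> 'Amsterdam'
ORDER BY RAND()
LIMIT 3)
ORDER BY priority
LIMIT 3;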
I would recommend doing the logical part of this primarily in PHP. While it is possible in SQL, I've found that logic structures in SQL tend to be hard to follow, and that is less of an issue in PHP.
Doing the logic in PHP could require two separate queries (but only if you don't get 3 rows initially).
I'd run a query to get the initial three (use LIMIT 3 in the SQL). Check whether you got three results. If you didn't, subtract the number you did get from 3, then use that as the LIMIT in a second query to get the remaining random results.
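The two-step logic described above might look roughly like this. The sketch is in Python over an in-memory copy of TABLE A instead of PHP with two real queries; `pick_hotels` and `HOTELS` are illustrative names only, and each list filter stands in for one query.

```python
import random

# In-memory stand-in for TABLE A from the question.
HOTELS = [
    {"id": 1, "hotel": "A", "city_name": "Amsterdam", "region_name": "North-Holland"},
    {"id": 2, "hotel": "B", "city_name": "Amsterdam", "region_name": "North-Holland"},
    {"id": 3, "hotel": "C", "city_name": "Leiden", "region_name": "North-Holland"},
    {"id": 4, "hotel": "D", "city_name": "Katwijk", "region_name": "North-Holland"},
    {"id": 5, "hotel": "E", "city_name": "Leiden", "region_name": "North-Holland"},
    {"id": 6, "hotel": "F", "city_name": "Katwijk", "region_name": "North-Holland"},
]


def pick_hotels(hotels, city, region, wanted=3):
    """Pick up to `wanted` random hotels, preferring the city, then the region."""
    # First "query": random hotels in the requested city.
    in_city = [h for h in hotels if h["city_name"] == city]
    picked = random.sample(in_city, min(wanted, len(in_city)))

    # Second "query" only runs when the first one came up short.
    missing = wanted - len(picked)
    if missing > 0:
        in_region = [
            h for h in hotels
            if h["region_name"] == region and h["city_name"] != city
        ]
        picked += random.sample(in_region, min(missing, len(in_region)))
    return picked


result = pick_hotels(HOTELS, "Amsterdam", "North-Holland")
```

With the sample data, Amsterdam has only two hotels, so the result holds both of them plus one random hotel from elsewhere in North-Holland.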
I've checked out a few of the stackoverflow questions and there are similar questions, but didn't quite put my fingers on this one.
If you have a table like this:
uid cat_uid itm_uid
1 1 4
2 1 5
3 2 6
4 2 7
5 3 8
6 3 9
where the uid column is auto-incremented, cat_uid references the
category to filter on, and the itm_uid values are the ones
we're seeking
I would like to get a result set that contains the following sample results:
array (
0 => array (1 => array(4,5)),
1 => array (2 => array(6,7)),
2 => array (3 => array(8,9))
)
An example issue is - select 2 records from each category (however many categories there may be) and make sure they are the last 2 entries by uid in those categories.
I'm not sure how to structure the question to allow an answer, and any hints on a method for the solution would be welcome!
EDIT:
This wasn't a very clear question, so let me extend the scenario to something more tangible.
I have a set of records being entered into categories, and I would like to select, with as few queries as possible, the latest 2 records entered per category, so that when I list the contents of those categories I have at least 2 records per category (assuming there are 2 or more already in the database). A similar query was in place that selected the last 100 records and filtered them into categories. But with a small number of categories, some updated faster than others, the top 100 may not contain members from every category. To resolve that, I was looking for a way to select 2 records from each category (or N records, assuming N is the same per category), with those 2 records being the last entered. A date field is available to sort on, but the itm_uid itself could be used to indicate insertion order.
SELECT cat_uid, itm_uid,
IF( @cat = cat_uid, @cat_row := @cat_row + 1, @cat_row := 0 ) AS cat_row,
@cat := cat_uid
FROM my_table
JOIN (SELECT @cat_row := 0, @cat := 0) AS init
HAVING cat_row < 2
ORDER BY cat_uid, uid DESC
You will have two extra columns in the results, just ignore them.
This is the logic:
We sort the table by cat_uid, then by uid descending. Then we start from the top and give each row a "row number" (cat_row), resetting it to zero whenever cat_uid changes:
---------------------------------------
| uid | cat_uid | itm_uid | cat_row |
| 45 | 4 | 34 | 0 |
| 33 | 4 | 54 | 1 |
| 31 | 4 | 12 | 2 |
| 12 | 4 | 51 | 3 |
| 56 | 6 | 11 | 0 |
| 20 | 6 | 64 | 1 |
| 16 | 6 | 76 | 2 |
| ... | ... | ... | ... |
---------------------------------------
Now, if we keep only the rows that have cat_row < 2, we get the results we want:
---------------------------------------
| uid | cat_uid | itm_uid | cat_row |
| 45 | 4 | 34 | 0 |
| 33 | 4 | 54 | 1 |
| 56 | 6 | 11 | 0 |
| 20 | 6 | 64 | 1 |
| ... | ... | ... | ... |
---------------------------------------
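The numbering-and-filtering logic shown in the tables above can also be sketched in Python. The function name `latest_per_category` is made up for this illustration, and the last sample row (uid 7) is an extra hypothetical record added to show the cut-off; it is not in the question's table.

```python
from collections import defaultdict


def latest_per_category(records, n=2):
    """records: iterable of (uid, cat_uid, itm_uid) tuples.

    Returns {cat_uid: [itm_uid, ...]} holding the n newest items per
    category, newest first, using uid as the insertion order.
    """
    grouped = defaultdict(list)
    # Walk from the newest uid to the oldest, like ORDER BY uid DESC.
    for uid, cat_uid, itm_uid in sorted(records, key=lambda r: r[0], reverse=True):
        # Equivalent to keeping only rows with cat_row < n.
        if len(grouped[cat_uid]) < n:
            grouped[cat_uid].append(itm_uid)
    return dict(grouped)


table = [
    (1, 1, 4), (2, 1, 5),
    (3, 2, 6), (4, 2, 7),
    (5, 3, 8), (6, 3, 9),
    (7, 1, 10),  # hypothetical extra row: category 1 now has three items
]
latest = latest_per_category(table)
```

Category 1 ends up with items 10 and 5 only; its oldest item, 4, falls outside the two newest.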
This is called an adjacency list model, or parent-child tree model. It's one of the simpler tree models, where each row carries only a single pointer. You would solve your query with recursion or with a self join. MySQL versions before 8.0 don't support recursive queries (8.0 added recursive common table expressions); it might work with prepared statements. I would suggest a self join. With a self join you can get all the rows from the right side and the left side that match a given condition.
-- my_table stands for the table from the question, aliased twice as t1 and t2
SELECT t1.cat_uid, t2.cat_uid, t1.itm_uid, t2.itm_uid FROM my_table AS t1 INNER JOIN my_table AS t2 ON t1.cat_uid = t2.cat_uid