MySQL help, updating fields based on a calculation - mysql

I've been trying to work out this problem, and though I can easily think of a brute-force PHP + MySQL solution, I'd like some guidance on solving it without iterating through rows in PHP.
So, with that, here's the problem.
I want to precalculate each room's size relative to all other rooms with a single query (or three), such that rooms bigger than 66% of all other rooms get the category Large, rooms in the 33%-66% range get Medium, and the rest are considered Small.
I have a general idea of how to do this, but I'm hoping that someone more adept at SQL queries can at least point me in the right direction.
The hardest part for me is simultaneously updating every row that falls within a given range :(.
Here's an example of the table
Rooms
ID | Length | Width | Relative Size [Expected Values]
------------------------------------
1 | 15 | 12 | Large
2 | 15 | 12 | Large
3 | 10 | 10 | Medium
4 | 10 | 10 | Medium
5 | 8 | 9 | Small
6 | 8 | 8 | Small
7 | 8 | 7 | Small
8 | 10 | 9 | Medium
I'd be perfectly happy with any resources or clues that can assist me with this; I've been going in circles.
Thanks for any attempts at helping me out.
Edit: I ended up going with a standard deviation approach, since it makes a lot more sense in this case. I'll post it as an answer.

I couldn't think of a way to do it in one query, but here is my attempt to do it in three, assuming size is NULL before populating. Note that TOP ... PERCENT is SQL Server syntax, which MySQL does not support, so treat this as an outline of the logic:
Update table set size = 'Large' where ID in (select TOP 33 PERCENT ID from table order by length*width Desc)
Update table set size = 'Small' where ID in (select TOP 33 PERCENT ID from table order by length*width Asc)
Update table set size = 'Medium' where size IS NULL
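For comparison, here is a minimal sketch of the "bigger than N% of all other rooms" rule from the question, done as a single UPDATE. It runs against an in-memory SQLite database through Python's `sqlite3` module purely so that it is self-contained; the table name `rooms`, the column `relative_size`, and the 0.66 / 0.33 cut-offs are assumptions taken from the question's wording, not a tested MySQL solution.

```python
import sqlite3

# Sample rows from the question: (id, length, width).
ROOMS = [(1, 15, 12), (2, 15, 12), (3, 10, 10), (4, 10, 10),
         (5, 8, 9), (6, 8, 8), (7, 8, 7), (8, 10, 9)]


def classify_rooms():
    """Label each room by the fraction of *other* rooms with a smaller area."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rooms (id INTEGER PRIMARY KEY, length INTEGER, "
        "width INTEGER, relative_size TEXT)"
    )
    conn.executemany(
        "INSERT INTO rooms (id, length, width) VALUES (?, ?, ?)", ROOMS
    )
    # The correlated subquery counts the rooms strictly smaller than the
    # row being updated and divides by the number of other rooms.
    conn.execute("""
        UPDATE rooms
        SET relative_size = (
            SELECT CASE
                     WHEN COUNT(*) * 1.0 / (SELECT COUNT(*) - 1 FROM rooms) > 0.66
                       THEN 'Large'
                     WHEN COUNT(*) * 1.0 / (SELECT COUNT(*) - 1 FROM rooms) >= 0.33
                       THEN 'Medium'
                     ELSE 'Small'
                   END
            FROM rooms AS other
            WHERE other.length * other.width < rooms.length * rooms.width
        )
    """)
    return dict(conn.execute("SELECT id, relative_size FROM rooms ORDER BY id"))


if __name__ == "__main__":
    for room_id, label in classify_rooms().items():
        print(room_id, label)
```

MySQL will not accept this statement as written, because it refuses to update a table that the same statement also selects from in a subquery (error 1093). There the usual workaround is to compute the labels in a derived table and join it to the table being updated.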

Related

phpmyadmin is this Index Size?

Good day, PHP newbie here.
I use phpMyAdmin with MySQL. My problem is that I don't know what to put in the circled field shown in the picture below (or what it is and how to use it).
I proceeded without giving it a value. The field appears whenever I create a primary key or unique key on a table. Is this what they call the index size? I tried searching the internet and looked at other tutorials, but I don't see any mention of it (maybe I'm googling it wrong?).
So what does this do?
What value should I put here?
What is the default value?
When creating a unique key, what do veterans put as the index name?
I hope you can enlighten me, because it's quite vague now that I'm self-studying. Thanks :)
The index size is quite simple. Imagine that you create an index on a VARCHAR(16) column. In this case each index entry is created with 16 characters. Now it may be that the strings already differ within the first few characters. In such a case, the index length can be set to e.g. 8.
This makes the index shorter, so it uses less memory and is therefore faster. If several entries in the column share the same first 8 characters, all of those rows are found via the index, and the row that really matches is then determined by comparing the individual rows. So if the number of entries found is very high, the whole thing will be slower.
You can check how many entries in the table would be equal under a shorter index:
+----+-----+---------------------------------------------------+
| id | rev | content |
+----+-----+---------------------------------------------------+
| 2 | 1 | One hundred angels can dance on the head of a pin |
| 1 | 1 | The earth is flat |
| 3 | 2 | The earth is flat and rests on a bull's horn |
| 5 | 5 | The earth is flat type |
| 4 | 3 | X The earth is like a ball. |
+----+-----+---------------------------------------------------+
SELECT d.*,count(*) as cnt
FROM docs as d
GROUP BY SUBSTRING(d.content,1,8)
ORDER BY cnt DESC;
+----+-----+---------------------------------------------------+-----+
| id | rev | content | cnt |
+----+-----+---------------------------------------------------+-----+
| 1 | 1 | The earth is flat | 3 |
| 2 | 1 | One hundred angels can dance on the head of a pin | 1 |
| 4 | 3 | X The earth is like a ball. | 1 |
+----+-----+---------------------------------------------------+-----+
3 rows in set (0.00 sec)
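The same check can be repeated for several candidate lengths by counting distinct prefixes. The sketch below does that against an in-memory SQLite copy of the `docs` sample above, using Python's `sqlite3` module only to keep it runnable; the helper names `make_docs` and `distinct_prefixes` are made up for illustration.

```python
import sqlite3

# The sample rows from the table above: (id, rev, content).
DOCS = [
    (1, 1, "The earth is flat"),
    (2, 1, "One hundred angels can dance on the head of a pin"),
    (3, 2, "The earth is flat and rests on a bull's horn"),
    (4, 3, "X The earth is like a ball."),
    (5, 5, "The earth is flat type"),
]


def make_docs():
    """Create an in-memory docs table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, rev INTEGER, content TEXT)"
    )
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?)", DOCS)
    return conn


def distinct_prefixes(conn, n):
    """Number of distinct values left when content is cut to n characters."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT substr(content, 1, ?)) FROM docs", (n,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = make_docs()
    for n in (4, 8, 19):
        print(n, distinct_prefixes(conn, n), "distinct prefixes of", len(DOCS))
```

The closer the distinct count gets to the total row count, the fewer rows have to be compared individually after the index lookup. In this sample a prefix of 8 leaves 3 distinct values out of 5, and only at 19 characters are all 5 rows distinguishable.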

database design - MySQL: How to store and split time series

I have a table where I store historical data and add a record for items I'm tracking every 5 mins.
This is an example using just 2 items:
+----+-------------+
| id | timestamp |
+----+-------------+
| 1 | 1533209426 |
| 2 | 1533209426 |
| 1 | 1533209726 |
| 2 | 1533209726 |
| 1 | 1533210026 |
| 2 | 1533210026 |
+----+-------------+
The problem is that I'm actually tracking 4k items and the table keeps getting bigger. Also, I don't need 5-minute data if I want to look at the last month. What I'm trying to understand is whether there's a way to keep 5-minute records for the last 24 hours, 1-hour records for the last 7 days, etc. Maybe every hour I could get the first 12 records from the 5-minute table and store the average in the 1-hour table? But what if some records are missing because there were errors? Is this the correct way to solve this problem, or are there better alternatives?
You are on the right track.
There are multiple issues to decide on how to handle -- missing entries, timestamps skewed by 1 second (or whatever), etc.
By providing a count (which should always be 12), you can discover some hiccups:
SELECT FLOOR(timestamp / 3600) AS hr, -- MEDIUMINT UNSIGNED
COUNT(*), -- TINYINT UNSIGNED
AVG(metric) -- FLOAT
FROM tbl
GROUP BY 1;
Yes, every hour, do the previous hour's worth of data. Add WHERE timestamp BETWEEN ... AND ... + 3599 to constrain the range in question. Then purge the same set of data.
The table would have PRIMARY KEY(hr).
Unless you are talking about millions of rows in a table, I would not recommend any use of PARTITION.
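A minimal sketch of the hourly roll-up and purge described above, run against in-memory SQLite through Python's `sqlite3` module so that it is self-contained. The `metric` column, the `hourly` table, and the function names are assumptions for illustration. Because several items are tracked, this sketch groups by `id` as well and keys the summary table on `(id, hr)` rather than on `hr` alone.

```python
import sqlite3

# (id, timestamp, metric); the metric values are invented for the example.
SAMPLES = [
    (1, 1533209426, 10.0), (2, 1533209426, 1.0),
    (1, 1533209726, 20.0), (2, 1533209726, 2.0),
    (1, 1533210026, 30.0), (2, 1533210026, 3.0),
    (1, 1533211200, 99.0),  # first sample of the following hour
]


def make_db():
    """Create the 5-minute table and an empty hourly summary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, timestamp INTEGER, metric REAL)")
    conn.execute(
        "CREATE TABLE hourly (id INTEGER, hr INTEGER, n INTEGER, "
        "avg_metric REAL, PRIMARY KEY (id, hr))"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", SAMPLES)
    return conn


def roll_up_hour(conn, hour_start):
    """Summarise one hour of 5-minute rows, then purge those same rows."""
    hour_end = hour_start + 3599
    with conn:  # both statements commit together or not at all
        conn.execute(
            """
            INSERT INTO hourly (id, hr, n, avg_metric)
            SELECT id, timestamp / 3600, COUNT(*), AVG(metric)
            FROM tbl
            WHERE timestamp BETWEEN ? AND ?
            GROUP BY id, timestamp / 3600
            """,
            (hour_start, hour_end),
        )
        conn.execute(
            "DELETE FROM tbl WHERE timestamp BETWEEN ? AND ?",
            (hour_start, hour_end),
        )


if __name__ == "__main__":
    conn = make_db()
    roll_up_hour(conn, 1533207600)
    print(conn.execute("SELECT * FROM hourly ORDER BY id").fetchall())
```

The stored count `n` is what lets you spot hours with missing samples later: anything other than 12 means the average was built from an incomplete hour. SQLite's integer division plays the role of `FLOOR(timestamp / 3600)` here.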

Advanced Average Date Difference with unique ids

I'm back on Stack Overflow with another headache that I have been trying to get to the bottom of with no success at all, no matter how many times I use avg(datediff) functions.
I have an SQL table like the below:
ID | PersonID | Start | End | Status
1 | 1 | 2006-03-21 00:00:00 | 2007-05-19 00:00:00 | Active
2 | 1 | 2007-05-19 00:00:00 | 2007-05-20 00:00:00 | Active
3 | 2 | 2016-08-24 00:00:00 | 2016-08-25 00:00:00 | Active
4 | 2 | 2016-08-25 00:00:00 | 2016-08-28 00:00:00 | Active
5 | 2 | 2016-08-28 00:00:00 | 2017-10-05 00:00:00 | Active
I'm trying to find the average active stay (in days) across all unique people.
I.e. the average number of days based on their EARLIEST start date and LATEST end date (as a single person ID can have multiple active statuses).
For example, for person ID 1, the earliest start date is 2006-03-21 and the latest end date is 2007-05-20. Their stay has therefore been 425 days.
Repeat this for person ID 2, whose stay is 407 days.
After doing this for everyone in the table, I want to get the average length of stay. The average for the above 5 rows, with 2 unique people, is 416. Doing a simple datediff average across all rows gives me a very inaccurate average of 102.
Hope this makes sense. As always, any help you could give is very much appreciated.
So why not try that:
SELECT
AVG(DATEDIFF(PersonEnd, PersonStart))
FROM
(SELECT
MIN(Start) AS PersonStart,
MAX(End) AS PersonEnd
FROM
table
GROUP BY
PersonID) PeriodsPerPerson
Of course, you should have proper indexes so that MySQL can compute MAX and MIN fast and can group fast as well, which means indexes at least on PersonID, Start and End.
Please note that you really need the alias for the inner query, although I don't use it anywhere. If you leave it out, you'll run into an error, at least with MySQL 5.5 (I don't know about later versions).
If you have millions or even billions of rows, you might be better off moving the calculation into a stored procedure or a back-end application instead of doing it as shown above.
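To check the arithmetic, here is a small sketch that runs the same inner-grouping query on the sample rows, using in-memory SQLite through Python's `sqlite3` module. SQLite has no DATEDIFF, so the difference of two `julianday()` values stands in for it. The table name `stays` and the columns `start_date` / `end_date` are hypothetical, and row 4's start is taken as 2016-08-25, which is what the stated 407-day stay implies.

```python
import sqlite3

# (id, person_id, start_date, end_date) from the question's sample.
STAYS = [
    (1, 1, "2006-03-21 00:00:00", "2007-05-19 00:00:00"),
    (2, 1, "2007-05-19 00:00:00", "2007-05-20 00:00:00"),
    (3, 2, "2016-08-24 00:00:00", "2016-08-25 00:00:00"),
    (4, 2, "2016-08-25 00:00:00", "2016-08-28 00:00:00"),
    (5, 2, "2016-08-28 00:00:00", "2017-10-05 00:00:00"),
]


def average_stay_days():
    """Average, over people, of (latest end - earliest start) in days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stays (id INTEGER PRIMARY KEY, person_id INTEGER, "
        "start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO stays VALUES (?, ?, ?, ?)", STAYS)
    # Inner query: one row per person. Outer query: average of the spans.
    row = conn.execute("""
        SELECT AVG(julianday(person_end) - julianday(person_start))
        FROM (SELECT MIN(start_date) AS person_start,
                     MAX(end_date)   AS person_end
              FROM stays
              GROUP BY person_id) AS periods_per_person
    """).fetchone()
    return row[0]


if __name__ == "__main__":
    print(average_stay_days())
```

Person 1 spans 425 days and person 2 spans 407 days, so the result is the 416 quoted in the question.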

mySQL database: Separating/clustering(?) data

Currently I'm dealing with a fairly large MySQL transactional database for an e-commerce project. We obtain data from e-shops, including products sold. Each e-shop adds information about similarities between products and lists them as groups. So, for instance, shop A sends this information:
Group 1: iPhone blue, iPhone black, iPhone green
Group 2: iPad blue, iPad black, iPad green, etc.
Another e-shop sends this kind of information:
Group 3: iPhone pink, iPhone black
Group 4: iPad blue, iPad pink
Each product is stored in table Products: (Important: This table has about 150 000 000 rows)
Id | Name
------------------
1 | iPhone blue
2 | iPhone black
3 | iPhone green
4 | iPhone pink
5 | iPad blue
6 | iPad black
7 | iPad green
8 | iPad pink
Also, there is a table Groups with groups stated above: (M:N relationship)
Id | Id_product | Group
--------------------------
1 | 1 | 1
2 | 2 | 1
3 | 3 | 1
4 | 5 | 2
5 | 6 | 2
6 | 7 | 2
7 | 4 | 3
8 | 1 | 3
9 | 5 | 4
10 | 8 | 4
Now, the problem is that groups 1 + 3 and groups 2 + 4 should be merged together.
The current (horrible) solution to this problem is based on obtaining all groups for a product (via the GROUP_CONCAT function in a query) and then all products from those groups, then updating the Groups table to merge those groups into one.
The main problems with this approach are:
Very problematic computational complexity.
Groups obtained from e-shops can be wrong(!). Imagine this group:
Group 5: iPhone Black, iPad Black. If this group is taken into account, the whole separation process goes wrong: you end up with one group containing iPhones and iPads together.
So, now, finally, the question:
Any ideas how to approach this problem? Just hints/tips will be enough, I'm just totally stuck with lack of my knowledge.
I was playing around with fuzzy-hashing algorithms / k-means clustering, but they don't seem suitable for this problem. Fuzzy hashing seems to take the product names into account (which can work for iPhones, but I cannot imagine it for T-shirts; their names are not very "well-prepared", so it's hard to guess differences just from the name). Am I missing something?
So, any idea?
Anyway, just for the purpose of solving this particular problem, it's possible to introduce different database solution, there's no problem in that.
Thanks in advance:)
Chmelda
An idea might be to add a table "group_conversion" which translates each external group number into your own group number.
In this case the table would look like:
Group_external | NameMatch | ID_my_group
----------------------------------------
1 | null | 1
2 | null | 2
3 | null | 1
4 | null | 2
5 | "IPhone%" | 1
5 | "IPad%" | 2
When inserting new data coming from an e-shop, you should first translate the incoming group number to your own group numbering, before adding it to the Groups table.
The NameMatch field is only used if you want to separate products within an incoming group (the Group5 you mentioned).
So if this field is null, just convert the ID. Otherwise only convert the ID if the name of the product matches NameMatch.
To convert your current data it might help to create a new table (e.g. Groups2) which has the same fields as Groups, with the only difference that Group is a reference to the new group numbering.
You can then fill the new table by converting each record of Groups.
After conversion is done, drop the Groups table and rename the Groups2 table.
In this way you will get a much smaller table size for Groups and the table already contains merged data, so no separate queries are needed for merging.
Hope this will help!
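The lookup step could be sketched roughly as follows, using in-memory SQLite through Python's `sqlite3` module so the example is self-contained. The snake_case column names and the function `translate_group` are hypothetical. The pattern match relies on LIKE being case-insensitive, which holds for SQLite with ASCII text and for MySQL under its default case-insensitive collations.

```python
import sqlite3

# (group_external, name_match, id_my_group), as in the table above.
CONVERSIONS = [
    (1, None, 1), (2, None, 2), (3, None, 1), (4, None, 2),
    (5, "IPhone%", 1), (5, "IPad%", 2),
]


def make_db():
    """Create an in-memory group_conversion table with the sample mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE group_conversion (group_external INTEGER, "
        "name_match TEXT, id_my_group INTEGER)"
    )
    conn.executemany("INSERT INTO group_conversion VALUES (?, ?, ?)", CONVERSIONS)
    return conn


def translate_group(conn, group_external, product_name):
    """Map an e-shop group number to our own group, or None if unmapped."""
    row = conn.execute(
        """
        SELECT id_my_group
        FROM group_conversion
        WHERE group_external = ?
          AND (name_match IS NULL OR ? LIKE name_match)
        LIMIT 1
        """,
        (group_external, product_name),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = make_db()
    print(translate_group(conn, 3, "iPhone pink"))
    print(translate_group(conn, 5, "iPad Black"))
```

Returning None for an unmapped group gives the import code a clear signal that a new external group needs a conversion row before its products are added to Groups.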

SQL: Issue retrieving data via Select query in a specific (static) order

I am currently trying to create a MySQL query, which outputs data from a specific table (images) using a static order. I am currently using UNION ALL to do this, but I'm kinda stuck now.
Here is what I want it to do:
I have a table named images with the following fields: user_id, image, image_dimension (this field values: small, wide, tall, large).
From the images table, I want to retrieve images with specific dimensions in a static order, to fit these into my static image grid. Now it gets complicated: I only want one (1) image per user_id.
This is my try so far:
SELECT
*
FROM
(
(SELECT * FROM `artworks` WHERE img = 'Large' LIMIT 1)
UNION ALL
(SELECT * FROM `artworks` WHERE img = 'Small' LIMIT 2)
UNION ALL
(SELECT * FROM `artworks` WHERE img = 'Large' LIMIT 1)
UNION ALL
(SELECT * FROM `artworks` WHERE img = 'Wide' LIMIT 1)
UNION ALL
(SELECT * FROM `artworks` WHERE img = 'Small' LIMIT 2)
UNION ALL
(SELECT * FROM `artworks` WHERE img = 'Tall' LIMIT 2)
UNION ALL
(SELECT * FROM `artworks` WHERE img = 'Large' LIMIT 1)
) AS ordered_images
This query pulls images with a specific dimension from the table "images" and with LIMIT - I get the amount I need for the order. This works so far, but it doesn't allow me to only get one image per user_id while keeping the order.
How can I make sure I get the order (see query) but only get one image per user_id?
The order again is: 1x Large, 2x Small, 1x Large, 1x Wide, 2x Small, 2x Tall and 1x Large.
Sorry for my bad English; let me know if you need any further description.
Edit:
Table format artworks:
id | user_id | img_name | img
Sample data from artworks:
1 | 1 | Test 1 | Large
2 | 1 | Test 2 | Large
3 | 2 | Test 3 | Small
4 | 2 | Test 4 | Small
5 | 2 | Test 5 | Small
6 | 3 | Test 6 | Small
7 | 3 | Test 7 | Small
8 | 3 | Test 8 | Small
9 | 4 | Test 9 | Large
10 | 4 | Test 10 | Large
11 | 5 | Test 11 | Small
12 | 5 | Test 12 | Wide
13 | 6 | Test 13 | Small
14 | 7 | Test 14 | Small
My expected result (ORDER BY id DESC) to get latest img of each user:
2 | 1 | Test 2 | Large
5 | 2 | Test 5 | Small
8 | 3 | Test 8 | Small
10 | 4 | Test 10 | Large
12 | 5 | Test 12 | Wide
13 | 6 | Test 13 | Small
14 | 7 | Test 14 | Small
One method, which I've yet to test in this circumstance but the idea is correct, would be to construct a helper table in the database containing your image arrangement like so:
----- --------
size ordering
----- --------
Large 1
Small 2
Small 3
Large 4
Wide 5
Small 6
Small 7
Tall 8
Tall 9
Large 10
Then you'll need to construct a query that will pull out a list of users paired with an ordering number, something like the following:
SET @ordering = 0;
SELECT artworks.*
FROM arrangement
JOIN (SELECT @ordering := @ordering + 1 AS ordering, user_id
FROM users) AS ordered_users
ON arrangement.ordering = ordered_users.ordering
JOIN artworks ON ordered_users.user_id = artworks.user_id AND
arrangement.size = artworks.img
GROUP BY arrangement.ordering ASC;
The trick here is that the @ordering variable is incremented and set for each row queried from the users table, and this can then be matched with entries from the arrangement table by their ordering number to pair a user, an image size, and their order in the arrangement. Once you have that, you can match them with an image. (Ab)using GROUP BY (rather than ORDER BY) with some MySQL-specific behaviour eliminates any images of the same size that the user might have, and effectively chooses one of them by some arbitrary method.
This method is far from ideal, especially as it abuses some MySQL-specific behaviour, though that abuse could possibly be eliminated with a correlated subquery. However, it might give you some ideas as to how to solve the problem at the very least.
That said, this kind of thing isn't exactly the sort of thing you'd do with SQL.
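If the arrangement is done outside SQL, one possible split is to let the database pick the latest image per user and fill the grid slots in application code. This is a different technique from the helper-table join above: a greedy fill, sketched in Python with in-memory SQLite via the `sqlite3` module. The sample rows and the slot order come from the question; the function names are made up for illustration.

```python
import sqlite3

# (id, user_id, img_name, img) from the question's sample data.
ARTWORKS = [
    (1, 1, "Test 1", "Large"), (2, 1, "Test 2", "Large"),
    (3, 2, "Test 3", "Small"), (4, 2, "Test 4", "Small"),
    (5, 2, "Test 5", "Small"), (6, 3, "Test 6", "Small"),
    (7, 3, "Test 7", "Small"), (8, 3, "Test 8", "Small"),
    (9, 4, "Test 9", "Large"), (10, 4, "Test 10", "Large"),
    (11, 5, "Test 11", "Small"), (12, 5, "Test 12", "Wide"),
    (13, 6, "Test 13", "Small"), (14, 7, "Test 14", "Small"),
]

# Slot order of the static grid, as in the helper table above.
ARRANGEMENT = ["Large", "Small", "Small", "Large", "Wide",
               "Small", "Small", "Tall", "Tall", "Large"]


def make_db():
    """Create an in-memory artworks table with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artworks (id INTEGER PRIMARY KEY, user_id INTEGER, "
        "img_name TEXT, img TEXT)"
    )
    conn.executemany("INSERT INTO artworks VALUES (?, ?, ?, ?)", ARTWORKS)
    return conn


def latest_per_user(conn):
    """One row per user: the artwork with that user's highest id."""
    return conn.execute("""
        SELECT a.id, a.user_id, a.img_name, a.img
        FROM artworks AS a
        JOIN (SELECT MAX(id) AS id FROM artworks GROUP BY user_id) AS latest
          ON a.id = latest.id
        ORDER BY a.id
    """).fetchall()


def fill_grid(rows, arrangement):
    """Give each slot the first unused row of the right size, else None."""
    remaining = list(rows)
    grid = []
    for size in arrangement:
        match = next((r for r in remaining if r[3] == size), None)
        if match is not None:
            remaining.remove(match)
        grid.append(match)
    return grid


if __name__ == "__main__":
    for slot, row in zip(ARRANGEMENT, fill_grid(latest_per_user(make_db()), ARRANGEMENT)):
        print(slot, row)
```

With the sample data the first seven slots are filled by artworks 2, 5, 8, 10, 12, 13 and 14, and the two Tall slots and the final Large slot stay empty because no remaining user has an image of that size. Whether an empty slot should be skipped or filled with a fallback size is a design decision the question leaves open.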