MySQL: group a variable number of rows into one row with multiple columns

I'm looking for a query that will turn several rows into columns without knowing the number of rows beforehand. I've searched, and the only solutions I found require knowing how many rows there are.
Here's an example table:
parentID colA colB
2 aaaaaa 1000.00
2 bbbbbb 1500.00
3 cccccc 500.00
3 dddddd 700.00
3 eeeeee 2000.00
and I need it to look like:
parentID colA(n) colB(n) colA(n+1) colB(n+1) colA(n+2) colB(n+2)
2 aaaaaa 1000.00 bbbbbb 1500.00 NULL NULL
3 cccccc 500.00 dddddd 700.00 eeeeee 2000.00
I realize this should be done in PHP, but I need it in MySQL for a third-party Excel exporter plugin I'm using.
Edit: Is there a way to do this if I know the maximum number of columns I'll need?

You cannot write a query in SQL without knowing the number of columns.
The columns of a SELECT-list must be fixed at the time of parsing the query. What you're asking for is that the state of data, which is not known until the query executes, determines the number of columns. That is not the way SQL works.
To accomplish a pivot-type operation, or any query where the data determines the columns, you have two choices:
Do a preparatory query to discover how many distinct groups you want to fetch, and use this to build the query with a matching number of columns.
Query all the data in rows, fetch it back into your application, and then transform the result set wholly within data structures (i.e. arrays) within your code.
Either way, you need to write application code, either before or after fetching the data.
Re your comment: You're right, this isn't a traditional pivot, but it's similar in that the data drives the number of columns. That is, you need twice as many columns as there are rows in the largest group. But they won't all be filled, because the number of rows per group varies. You don't have a fixed number of rows per group, so you can't fill a fixed number of columns per group.
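A minimal sketch of the first strategy (preparatory query, then a generated query), using Python's built-in sqlite3 module as a stand-in for MySQL. The table name t is hypothetical. Each row's position inside its group is numbered with a correlated COUNT(*) subquery, which assumes colA values are unique within each parentID. The first query finds the size of the largest group. The second query is then built with two MAX(CASE ...) columns per position. If you already know the maximum number of columns, you can skip the first query and hard-code n.

```python
import sqlite3


def build_pivot_sql(n):
    """Return a pivot query with one (colA_i, colB_i) column pair per position 1..n."""
    pairs = []
    for i in range(1, n + 1):
        pairs.append(f"MAX(CASE WHEN rn = {i} THEN colA END) AS colA_{i}")
        pairs.append(f"MAX(CASE WHEN rn = {i} THEN colB END) AS colB_{i}")
    # rn is the row's position inside its parentID group, computed with a correlated
    # COUNT(*); this assumes colA values are unique within each parentID.
    return (
        "SELECT parentID, " + ", ".join(pairs) + " FROM ("
        " SELECT parentID, colA, colB,"
        " (SELECT COUNT(*) FROM t AS t2"
        "  WHERE t2.parentID = t.parentID AND t2.colA <= t.colA) AS rn"
        " FROM t"
        ") AS numbered"
        " GROUP BY parentID ORDER BY parentID"
    )


def pivot(conn):
    # Step 1: preparatory query to find how many rows the largest group has.
    (n,) = conn.execute(
        "SELECT MAX(cnt) FROM (SELECT COUNT(*) AS cnt FROM t GROUP BY parentID) AS g"
    ).fetchone()
    if not n:  # empty table: nothing to pivot
        return []
    # Step 2: build and run a query with exactly 2 * n value columns.
    return conn.execute(build_pivot_sql(n)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (parentID INTEGER, colA TEXT, colB REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(2, "aaaaaa", 1000.00), (2, "bbbbbb", 1500.00),
     (3, "cccccc", 500.00), (3, "dddddd", 700.00), (3, "eeeeee", 2000.00)],
)
for row in pivot(conn):
    print(row)
```

With the sample data, this prints (2, 'aaaaaa', 1000.0, 'bbbbbb', 1500.0, None, None) and (3, 'cccccc', 500.0, 'dddddd', 700.0, 'eeeeee', 2000.0). The missing positions come back as NULL, as in the desired output. Only the integer n is interpolated into the SQL string, and values are always passed as parameters.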
I'd really recommend you use the latter strategy: fetch the data as rows, as they are stored in the database. Then post-process it in your application code. Loop over the result set and build your data structure incrementally as you fetch rows from the database.
For example, in PHP:
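$data = array(); // parentID => flat list of colA, colB values
while ($row = $stmt->fetch()) {
    // append this row's values to its parent's list
    $data[$row['parentID']][] = $row['colA'];
    $data[$row['parentID']][] = $row['colB'];
}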
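The same post-processing loop, sketched in Python for illustration. Here rows is a hypothetical list of (parentID, colA, colB) tuples standing in for the fetched result set. The sketch also pads shorter groups with None, so every output row has the same width. This matches the NULLs in the desired output.

```python
from collections import defaultdict


def rows_to_wide(rows):
    """Turn (parentID, colA, colB) rows into one flat list per parentID."""
    data = defaultdict(list)
    for parent_id, col_a, col_b in rows:
        # Same idea as the PHP loop: append colA, then colB, to the parent's list.
        data[parent_id].extend([col_a, col_b])
    # Pad shorter groups with None (NULL) so every output row has the same width.
    width = max((len(vals) for vals in data.values()), default=0)
    return [[pid] + vals + [None] * (width - len(vals)) for pid, vals in data.items()]


rows = [(2, "aaaaaa", 1000.00), (2, "bbbbbb", 1500.00),
        (3, "cccccc", 500.00), (3, "dddddd", 700.00), (3, "eeeeee", 2000.00)]
for wide in rows_to_wide(rows):
    print(wide)
```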

Related

Update a row if a field is a subsequence of a string

I have a string S = "1-2-3-4-5-6-7-8"
This is what my database table rows look like:
id SubSequence
1 1-2-4-5
2 1-3-4-5
3 2-5-7-8
4 5-8-9-10
5 6-7-10-11
and so on...
I want to write a query that would update (in this example) only the first 3 rows because they're a subsequence of string S.
The current solution I have is to programmatically go through each row, check whether it's a subsequence, and update it. But I'm wondering if there's a way to do it at the MySQL level for performance.
Update: I don't mind changing the way data is stored. For example, String S could be an array holding those numbers, and the "SubSequence" column can hold those numbers as an array.
No, there is not a way to do the query you describe with good performance in SQL when you store the subsequences as strings like you have done. The reason is that doing substring comparisons cannot be optimized with indexes, so your query will be forced to do the comparisons row by row.
In general, when you store sets of values as a string but want SQL to treat them as discrete values, it's bound to be awkward and difficult to code, and ultimately to perform badly.
In this case, what I would do is make two tables: one that numbers your entities, and a second in which each value of your subsequence is stored on a row by itself.
SubSequences:
id
1
2
SubSequenceElements:
id SubSequenceElement
1 1
1 2
1 4
1 5
2 1
2 3
2 4
2 5
And so on.
Then you can use relational-division techniques to find cases where every element of this set exists in the set you want to compare it to.
Here's an example:
SELECT s.id
FROM SubSequences AS s
LEFT OUTER JOIN (
SELECT id
FROM SubSequenceElements
WHERE SubSequenceElement NOT IN (1,2,3,4,5,6,7,8)
) AS invalid USING (id)
WHERE invalid.id IS NULL;
In other words, you want to return rows from SubSequences such that no match is found in SubSequenceElements with an element value that is not in the set you're trying to match.
It's a bit confusing, because you have to think about the problem as a double negative: keep a subsequence only if none of its elements fails to match the set. But once you get the hang of relational division, it can be very powerful.
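A sketch of the relational-division query above, run against the example data. Python's sqlite3 stands in for MySQL. The join is written with ON instead of USING (id), which is equivalent here, and ORDER BY is added only to make the output deterministic. Like the query it mirrors, this tests set membership only. It does not check that the elements appear in the same order as in S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SubSequences (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE SubSequenceElements (id INTEGER, SubSequenceElement INTEGER)")

# Load the example rows: one SubSequenceElements row per element of each subsequence.
examples = {1: "1-2-4-5", 2: "1-3-4-5", 3: "2-5-7-8", 4: "5-8-9-10", 5: "6-7-10-11"}
for sid, seq in examples.items():
    conn.execute("INSERT INTO SubSequences (id) VALUES (?)", (sid,))
    conn.executemany(
        "INSERT INTO SubSequenceElements (id, SubSequenceElement) VALUES (?, ?)",
        [(sid, int(n)) for n in seq.split("-")],
    )

# Keep the subsequences that have no element outside S = {1..8}.
matching = [row[0] for row in conn.execute("""
    SELECT s.id
    FROM SubSequences AS s
    LEFT OUTER JOIN (
        SELECT id
        FROM SubSequenceElements
        WHERE SubSequenceElement NOT IN (1,2,3,4,5,6,7,8)
    ) AS invalid ON invalid.id = s.id
    WHERE invalid.id IS NULL
    ORDER BY s.id
""")]
print(matching)
```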
If the set can be represented by the numbers 0 through 63 (or some subset of that), then you can use a column like this:
elements BIGINT UNSIGNED NOT NULL DEFAULT '0'
Then "2-5-7-8" could be put into it thus:
UPDATE ...
SET elements = (1<<2) | (1<<5) | (1<<7) | (1<<8);
Then various operations can be done in a single expression:
WHERE elements = (1<<2) | (1<<5) | (1<<7) | (1<<8) -- Test for exactly that set
WHERE (elements & ~( (1<<2) | (1<<5) | (1<<7) | (1<<8) )) != 0
-- checks whether any other bits are turned on
This last example is close to what you need. One side of the "and not" would have the 1..8 of your example; the other would have the row's elements column.
Your example has S represented as 0x1FE (bits 1 through 8), so
WHERE elements & ~0x1FE
will be 0 (false) for ids 1, 2, 3 and non-zero (true) for ids 4 and 5. To select ids 1, 2, 3, test for zero instead: WHERE (elements & ~0x1FE) = 0.
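A quick way to check the bit arithmetic is to sketch the same test in Python. to_mask is a hypothetical helper that builds the (1<<n) | ... value for a stored string. The 0..63 range check mirrors the 64 bits of a BIGINT UNSIGNED column.

```python
def to_mask(seq):
    """'2-5-7-8' -> (1<<2) | (1<<5) | (1<<7) | (1<<8)."""
    mask = 0
    for part in seq.split("-"):
        n = int(part)
        # BIGINT UNSIGNED has 64 bits, so only values 0..63 can be represented.
        if not 0 <= n <= 63:
            raise ValueError(f"{n} does not fit in a 64-bit mask")
        mask |= 1 << n
    return mask


S = to_mask("1-2-3-4-5-6-7-8")
print(hex(S))  # 0x1fe

rows = {1: "1-2-4-5", 2: "1-3-4-5", 3: "2-5-7-8", 4: "5-8-9-10", 5: "6-7-10-11"}
# Same test as WHERE (elements & ~0x1FE) = 0: no bit outside S is set.
subsets = [rid for rid, seq in rows.items() if (to_mask(seq) & ~S) == 0]
print(subsets)
```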

MySQL: adding columns dynamically, as many as rows in another table

Transport table
id name
1 T1
2 T2
Pallets table
id name
1 P1
2 P2
Transport Pallet Capacity table
id transport_id pallet_id capacity
1 1 1 10
2 1 2 null
3 2 1 20
4 2 2 24
How to generate table like this:
id transport_id pallet_id_1_capacity pallet_id_2_capacity
1 1 10 null
2 2 20 24
Problem: pallets and transports can be added, so neither quantity is known in advance.
For example, manager adds another pallet type and 'pallet_id_3_capacity' column should be generated (and can show null if no capacity data is yet available).
Another manager can fill 'transport pallet capacity' table later when notified.
Is there a way to build SQL in MySQL that handles the above, specifically a dynamic number of pallets?
The SQL select-list must be fixed at the time you write the query. You can't make SQL that auto-expands its columns based on the data it finds.
But your request is common; it's called a pivot table or a crosstab table.
The only solution is to do this in multiple steps:
Query to discover the distinct pallet ids.
Use application code to build a dynamic SQL query with as many columns as distinct pallet id values found in the first query.
Run the resulting dynamic SQL query.
This is true for all SQL databases, not just MySQL.
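A minimal sketch of those three steps, with Python's sqlite3 standing in for MySQL. The table names transport, pallets and transport_pallet_capacity are hypothetical versions of the tables in the question. The pallet ids are read from the pallets table, not from the capacity table, so a newly added pallet type gets a column of NULLs before anyone fills in its capacity. The sketch returns transport_id only, not the separate id column shown in the desired output.

```python
import sqlite3


def pallet_capacity_pivot(conn):
    # Step 1: discover the pallet ids.
    pallet_ids = [row[0] for row in conn.execute("SELECT id FROM pallets ORDER BY id")]

    # Step 2: build the select-list in application code, one column per pallet id.
    select_list = ["t.id AS transport_id"]
    for pid in pallet_ids:
        pid = int(pid)  # ids are interpolated into the SQL, so insist they are integers
        select_list.append(
            f"MAX(CASE WHEN c.pallet_id = {pid} THEN c.capacity END) AS pallet_id_{pid}_capacity"
        )
    sql = (
        "SELECT " + ", ".join(select_list) + " "
        "FROM transport AS t "
        "LEFT JOIN transport_pallet_capacity AS c ON c.transport_id = t.id "
        "GROUP BY t.id ORDER BY t.id"
    )

    # Step 3: run the generated query.
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transport (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pallets (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transport_pallet_capacity (
        id INTEGER PRIMARY KEY, transport_id INTEGER, pallet_id INTEGER, capacity INTEGER);
    INSERT INTO transport VALUES (1, 'T1'), (2, 'T2');
    INSERT INTO pallets VALUES (1, 'P1'), (2, 'P2');
    INSERT INTO transport_pallet_capacity VALUES
        (1, 1, 1, 10), (2, 1, 2, NULL), (3, 2, 1, 20), (4, 2, 2, 24);
""")
print(pallet_capacity_pivot(conn))
```

Adding a row such as (3, 'P3') to pallets makes the next call return an extra pallet_id_3_capacity column filled with NULL.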
See "MySQL pivot row into dynamic number of columns" for a highly voted solution for producing a pivot-table query in MySQL.
I'm not voting to close your question as a duplicate of that one, because your query also involves transport_id, which makes the solution a bit different. But reading about other pivot-table solutions should get you started.

Storing csv in MySQL field – bad idea?

I have two tables, one user table and an items table. In the user table, there is the field "items". The "items" table only consists of a unique id and an item_name.
Now each user can have multiple items. I wanted to avoid creating a third table that would connect the items with the user but rather have a field in the user_table that stores the item ids connected to the user in a "csv" field.
So any given user would have a field "items" that could have a value like "32,3,98,56".
It may be worth mentioning that the maximum number of items per user is rather limited (<5).
The question: Is this approach generally a bad idea compared to having a third table that contains user->item pairs?
Wouldn't a third table create considerable overhead when I want to find all items of a user? I would have to iterate through all the rows returned by MySQL individually.
You don't want to store the value in the comma separated form.
Consider the case when you decide to join this column with some other table.
Suppose you have:
x items
1 1, 2, 3
1 1, 4
2 1
and you want to find the distinct values for each x, i.e.:
x items
1 1, 2, 3, 4
2 1
or maybe you want to check whether it contains 3,
or maybe you want to convert them into separate rows:
x items
1 1
1 2
1 3
1 1
1 4
2 1
It will be a HUGE PAIN.
At least follow first normal form: have a separate row for each value.
Now, say you originally had this as your table:
x item
1 1
1 2
1 3
1 1
1 4
2 1
You can easily convert it into CSV values:
select x, group_concat(distinct item order by item) items
from t
group by x
If you want to check whether x = 1 has item 3, it's easy:
select * from t where x = 1 and item = 3
whereas with the CSV column you would need the horrible FIND_IN_SET():
select * from t where x = 1 and find_in_set(3, items);
If you think you can use LIKE to search CSV values: first, LIKE '%x%' can't use indexes. Second, it will produce wrong results.
Say you want to check whether item ab is present: LIKE '%ab%' will also return rows containing abc, abcd, abcde, and so on.
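The false positive is easy to reproduce outside the database. This Python sketch uses two hypothetical helpers. like_contains does a plain substring test, which is what LIKE '%ab%' amounts to. set_contains splits the list and compares whole values, the way one row per value would.

```python
def like_contains(csv_value, item):
    # Mimics LIKE '%item%': a plain substring test.
    return item in csv_value


def set_contains(csv_value, item):
    # Compares whole values, as if each value were stored on its own row.
    return item in [v.strip() for v in csv_value.split(",")]


value = "abc,abcd,xyz"
print(like_contains(value, "ab"))  # True (false positive)
print(set_contains(value, "ab"))   # False: "ab" is not actually in the list
```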
If you have many users and items, then I'd suggest creating a users table with primary key userid, an items table with primary key itemid, and lastly a mapping table user_item with userid and itemid columns.
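A minimal sketch of that three-table layout, with Python's sqlite3 standing in for MySQL. The user and item names are hypothetical sample data. The sketch also addresses the overhead concern in the question. All of a user's items come back from one JOIN query as one result set, so there is no per-item round trip to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (itemid INTEGER PRIMARY KEY, item_name TEXT);
    -- One row per (user, item) pair; the composite key prevents duplicates.
    CREATE TABLE user_item (
        userid INTEGER NOT NULL REFERENCES users(userid),
        itemid INTEGER NOT NULL REFERENCES items(itemid),
        PRIMARY KEY (userid, itemid)
    );
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO items VALUES (3, 'hammer'), (32, 'saw'), (56, 'drill'), (98, 'tape');
    INSERT INTO user_item VALUES (1, 32), (1, 3), (1, 98), (1, 56);
""")

# All items of one user in a single query.
items = [name for (name,) in conn.execute("""
    SELECT i.item_name
    FROM user_item AS ui
    JOIN items AS i ON i.itemid = ui.itemid
    WHERE ui.userid = ?
    ORDER BY i.itemid
""", (1,))]
print(items)
```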
If you know you'll only ever store and retrieve these values, and never join, search, deduplicate, or split them into separate rows, then maybe, just maybe, you can (I still wouldn't).
Storing complex data directly in a relational database is a nonstandard use of a relational database. Normally they are designed for normalized data.
There are extensions which vary according to the brand of software which may help. Or you can normalize your CSV file into properly designed table(s). It depends on lots of things. Talk to your enterprise data architect in this case.
Whether it's a bad idea depends on your business needs. I can't assess your business needs from way out here on the internet. Talk to your product manager in this case.

Parameters for nested MySQL views

Here is a table with barcodes that belong to different warehouses.
Barcode | Warehouse
_____________________________
1111111 | A
2222222 | B
1111111 | C
3333333 | A
And here is a table with boxes containing barcodes.
Barcode | Box
_____________________________
1111111 | 0001
2222222 | 0002
Each warehouse's available stock is its own barcodes in the first table plus all the barcodes in boxes.
Example for warehouse A:
Barcode
_________
1111111 (from its warehouse)
3333333 (from its warehouse)
1111111 (from a box)
2222222 (from a box)
This is a simplified example. After retrieving the total amount of barcodes, I cross it with a lot of other queries and tables to transform it into a human-readable report.
The idea is to have a server-side query.
Every client (VBA in MS Access) would retrieve the query and filter it using its warehouse code.
Warehouse A would call it like this:
select * from finalQuery where warehouse like 'A'
But this won't work, because the boxes' barcodes don't have a warehouse field, so they would be excluded.
The WHERE clause needs to be applied before the UNION ALL.
Would it be possible to use parameters to retrieve only one warehouse's barcodes plus all boxes' barcodes in a server-side query? Even though the user calls the outermost query with its code, the parameter should be pushed down to the first nested query.
Or any other trick? Maybe my scheme is wrong?
The problem with manipulating queries on the client side is that it becomes painfully SLOW, because, as I said, after joining barcodes I use the resulting query to build other queries.
Hope I explained it clearly. It is somewhat complex to explain. I would appreciate any suggestion, trick, idea, etc.
Thank you.
I think what you're looking for is a JOIN statement. You can join the Barcode-Warehouse table with the Barcode-Box table using the common Barcode column. This article is a great explanation: http://www.tutorialspoint.com/mysql/mysql-using-joins.htm.
Your server side query will end up being something like this:
SELECT Barcode, Box, Warehouse FROM `Barcode-Warehouse` LEFT JOIN `Barcode-Box` USING (Barcode);
This should result in a result set that has Barcode, Box, and Warehouse on each line. Your users would then be able to filter that result by Warehouse and retrieve only the records that they are interested in.
Found a way to solve it:
Can I create view with parameter in MySQL?
I filter the first query by adding:
"where warehouse=function()"
When I call the final query, I add the parameter for the function as explained in the post. Easy, simple.
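The view-plus-function trick works because the warehouse filter ends up inside the first branch of the UNION ALL, not on top of the finished result. Below is a minimal sketch of that query shape. It uses Python's sqlite3 with an ordinary bound parameter in place of the MySQL function, and the table names barcode_warehouse and barcode_box are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE barcode_warehouse (Barcode TEXT, Warehouse TEXT);
    CREATE TABLE barcode_box (Barcode TEXT, Box TEXT);
    INSERT INTO barcode_warehouse VALUES
        ('1111111', 'A'), ('2222222', 'B'), ('1111111', 'C'), ('3333333', 'A');
    INSERT INTO barcode_box VALUES ('1111111', '0001'), ('2222222', '0002');
""")


def available_stock(warehouse):
    # The filter applies to the first branch only, before UNION ALL,
    # so box barcodes (which have no warehouse) are always included.
    return [barcode for (barcode,) in conn.execute("""
        SELECT Barcode FROM barcode_warehouse WHERE Warehouse = ?
        UNION ALL
        SELECT Barcode FROM barcode_box
    """, (warehouse,))]


print(sorted(available_stock("A")))
```

For warehouse A this yields the same four barcodes as the example in the question.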
thank you

MySQL optimize data content: multi-column or single-column hash data

I currently have a table with 30 columns. In one day this table can get around 3,000 new records!
The column data looks like this:
IMG Name Phone etc..
http://www.site.com/images/image.jpg John Smith 123456789 etc..
http://www.site.com/images/image.jpg Smith John 987654321 etc..
I'm looking for a way to optimize the size of the table but also the response time of the SQL queries. I was thinking of doing something like:
Column1
http://www.site.com/images/image.jpg|John Smith|123456789|etc..
And then in PHP I would put each value into an array.
Would it be faster ?
Edit
So, to take an example of the structure, let's say I have two tables:
package
package_content
Here is the structure of the table package :
id | user_id | package_name | date
Here is the structure of the table package_content :
id | package_id | content_name | content_description | content_price | content_color | etc. (30+ columns)
The thing is, for each package I can get up to 16 rows of content. For example:
id | user_id | package_name | date
260 11 Package 260 2013-7-30 10:05:00
id | package_id | content_name | content_description | content_price | content_color | etc. (30+ columns)
1 260 Content 1 Content 1 desc 58 white etc..
2 260 Content 2 Content 2 desc 75 black etc..
3 260 Content 3 Content 3 desc 32 blue etc..
etc...
Then in PHP I do something like this:
select * from package
while not EOF {
    show package name, date, etc.
    select * from package_content where package_content.package_id = (current package's id)
    while not EOF {
        show package_content name, desc, price, color, etc.
    }
}
Would it be faster? Definitely not. If you needed to search by Name or Phone or etc... you'd have to pull those values out of Column1 every time. You'd never be able to optimize those queries, ever.
If you want to make the table smaller it's best to look at splitting some columns off into another table. If you'd like to pursue that option, post the entire structure. But note that the number of columns doesn't affect speed that much. I mean it can, but it's way down on the list of things that will slow you down.
Finally, 3,000 rows per day is about 1 million rows per year. If the database is tolerably well designed, MySQL can handle this easily.
Addendum: partial table structures plus sample query and pseudocode added to question.
The pseudocode shows the package table being queried all at once, then matching package_content rows being queried one at a time. This is a very slow way to go about things; better to use a JOIN:
SELECT
package.id,
user_id,
package_name,
date,
package_content.*
FROM package
INNER JOIN package_content ON package.id = package_content.package_id
WHERE whatever
ORDER BY whatever
That will speed things up right away.
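A minimal sketch of the JOIN approach, with Python's sqlite3 standing in for MySQL. The schema is trimmed to a few of the 30+ columns, and these columns are an assumption based on the question. One query returns every package with its content rows. Because the rows arrive sorted by package, they can be grouped for display in a single pass, with no per-package query. The sketch also selects specific columns instead of package_content.*.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE package (id INTEGER PRIMARY KEY, user_id INTEGER, package_name TEXT, date TEXT);
    CREATE TABLE package_content (
        id INTEGER PRIMARY KEY, package_id INTEGER, content_name TEXT, content_price INTEGER);
    INSERT INTO package VALUES (260, 11, 'Package 260', '2013-7-30 10:05:00');
    INSERT INTO package_content VALUES
        (1, 260, 'Content 1', 58), (2, 260, 'Content 2', 75), (3, 260, 'Content 3', 32);
""")

# One round trip: every package together with its content rows.
rows = conn.execute("""
    SELECT package.id, package.package_name, package.date,
           package_content.content_name, package_content.content_price
    FROM package
    INNER JOIN package_content ON package.id = package_content.package_id
    ORDER BY package.id, package_content.id
""").fetchall()

# Consecutive rows with the same package columns belong to the same package.
for (pkg_id, name, date), contents in groupby(rows, key=lambda r: r[:3]):
    print(pkg_id, name, date)
    for content in contents:
        print("   ", content[3], content[4])
```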
If you're displaying on a web page, be sure to limit results with a WHERE clause - nobody will want to see 1,000 or 3,000 or 1,000,000 packages on a single web page :)
Finally, as I mentioned before, the number of columns isn't a huge worry for query optimization, but...
Having a really wide result row means more data has to go across the wire from MySQL to PHP, and
It isn't likely you'll be able to display 30+ columns of information on a web page without it looking terrible, especially if you're reading lots of rows.
With that in mind, you'll be better off picking specific package_content columns in your query instead of selecting them all with SELECT *.
Don't combine any columns; it's of no use and might even be slower in the end.
You should index the columns you query on. I have a website with a table of about 30 columns that currently holds around 600,000 rows. If you put EXPLAIN before a query, you can see whether it uses any indexes. If you have a JOIN on 2 columns and a WHERE on the same table, you should make a combined index on those 3 columns, ordered with the JOIN columns first, then the WHERE column. If you join the same table twice, treat each join as needing a separate index.
For example:
SELECT p.name, p.id, c.name, c2.name
FROM product p
JOIN category c ON p.cat_id=c.id
JOIN category c2 ON c.parent_id=c2.id AND c2.name='Niels'
WHERE p.filterX='blaat'
You should have a combined index on category:
parent_id, name
AND
id (probably the AUTO_INCREMENT primary key)
An index on product:
cat_id
filterX
With this simple approach you can take queries from unusably slow to 0.10 seconds, or even faster.
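A sketch of checking index usage, using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN. The output format differs from MySQL's, and the index name idx_product_cat_filter is hypothetical. Following the JOIN-then-WHERE rule above, the sketch assumes one combined index on product (cat_id, filterX). The plan's detail text names the index when the query can use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, cat_id INTEGER, filterX TEXT);
    -- Combined index: join column first, then the WHERE column.
    CREATE INDEX idx_product_cat_filter ON product (cat_id, filterX);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM product WHERE cat_id = ? AND filterX = ?",
    (1, "blaat"),
).fetchall()
for row in plan:
    print(row[-1])  # the human-readable 'detail' column of each plan step
```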
If you use MySQL 5.6, you should switch to InnoDB, because MySQL is better at optimizing JOINs and subqueries with it. MySQL will also try to keep the data in memory, which makes it a lot faster as well. Keep in mind that backing up InnoDB tables might need some extra attention.
You might also think about using MEMORY tables for super-fast querying (you still need indexes).
You can also optimize by sizing columns sensibly: use integer types of the size you need (an INT is 4 bytes, not 11 characters), and don't use VARCHAR(255) for everything.