MySQL - How to pivot NVP?

Say I have a product_attribute table with the following rows:
================================================================
| product_attribute_id | product_id | name | value |
================================================================
| 1 | 25 | Author | John Doe |
----------------------------------------------------------------
| 2 | 25 | Author | Jane Doe |
----------------------------------------------------------------
| 3 | 55 | Publisher | ABC Corp |
----------------------------------------------------------------
| 4 | 55 | Release Date | 20100125 |
----------------------------------------------------------------
I'm looking into implementing Solr for full-text searching and I think this table potentially has important information that should be indexed. So, I think this table needs to be pivoted (using product_id as the pivot point) so I can combine it with other tables that have information that should be indexed.
Questions:
How do I pivot this in MySQL?
I do not know in advance what all the name/value pairs are going to be. Will this be a problem?
Some attributes have identical names (e.g. "Author" in the example above). Will this be a problem?

That's a pretty standard implementation:
SELECT
product_id,
GROUP_CONCAT(IF(name = 'Author', value, NULL)) AS 'Author',
GROUP_CONCAT(IF(name = 'Publisher', value, NULL)) AS 'Publisher'
FROM product_attribute
GROUP BY product_id;
Because you don't know the names in advance, you first have to run
SELECT DISTINCT name FROM product_attribute;
so that you can build the query above, with one GROUP_CONCAT column per name.
But no, identical names will not be kept as separate values: GROUP_CONCAT concatenates all values that share a name within a product into one comma-separated string.
I've seen an implementation that adds a column and populates it with incrementing values, so that the table can then be pivoted using variables and a counter, but I don't have a MySQL version of that.
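The two steps described above (discover the names, then build one GROUP_CONCAT column per name) can be sketched in client code. This is only a minimal sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the helper names quote_ident and pivot_attributes are made up for illustration. Against MySQL you would use a MySQL driver, its placeholder style, and backticks for the aliases.

```python
import sqlite3

def quote_ident(name):
    # Double any embedded double quote, then wrap the alias in double quotes
    # (SQLite/ANSI style; MySQL would normally use backticks instead).
    return '"' + name.replace('"', '""') + '"'

def pivot_attributes(conn):
    # Step 1: discover which attribute names exist right now.
    names = [r[0] for r in conn.execute(
        "SELECT DISTINCT name FROM product_attribute ORDER BY name")]
    if not names:
        return ["product_id"], []
    # Step 2: one GROUP_CONCAT column per name; the names themselves are
    # bound as parameters, only the aliases are spliced into the SQL text.
    cols = ", ".join(
        "GROUP_CONCAT(CASE WHEN name = ? THEN value END) AS " + quote_ident(n)
        for n in names)
    sql = ("SELECT product_id, " + cols +
           " FROM product_attribute GROUP BY product_id ORDER BY product_id")
    cur = conn.execute(sql, names)
    headers = [d[0] for d in cur.description]
    return headers, cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_attribute ("
             "product_attribute_id INTEGER PRIMARY KEY, product_id INTEGER, "
             "name TEXT, value TEXT)")
conn.executemany("INSERT INTO product_attribute VALUES (?, ?, ?, ?)", [
    (1, 25, "Author", "John Doe"),
    (2, 25, "Author", "Jane Doe"),
    (3, 55, "Publisher", "ABC Corp"),
    (4, 55, "Release Date", "20100125"),
])

headers, rows = pivot_attributes(conn)
for row in rows:
    print(dict(zip(headers, row)))
```

With the sample rows, product 25 ends up with both authors merged into a single Author column. Duplicate names are therefore not lost, but they are no longer separate values, which may or may not be acceptable for the index you are building.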

Related

MySQL - How to order duplicate rows in a key value pair table based on multiple columns?

So I have the following key/value pair table, where users submit data through a form and each question on the form is added to the table here as an individual row. Submission_id identifies each form submission.
+----+---------------+--------------+--------+
| id | submission_id | key | value |
+----+---------------+--------------+--------+
| 1 | 10 | manufacturer | Apple |
| 2 | 10 | model | 5s |
| 3 | 10 | firstname | Paul |
| 4 | 15 | manufacturer | Apple |
| 5 | 15 | model | 5s |
| 6 | 15 | firstname | Paul |
| 7 | 20 | manufacturer | Apple |
| 8 | 20 | model | 5s |
| 9 | 20 | firstname | Andrew |
+----+---------------+--------------+--------+
From the data above you can see that the submissions with IDs 10 and 15 have the same values (only the submission ID differs). This is because a user submitted the same form twice, so the second submission is a duplicate.
I'm trying to find a way to order this table so that duplicate submissions appear next to each other. Given the table above, I am trying to build a query that gives me the result below:
+---------------+
| submission_id |
+---------------+
| 10 |
| 15 |
| 20 |
+---------------+
So I want to check whether submissions have the same values for the manufacturer, model and firstname keys. If they do, I want to get their submission IDs and place them next to each other in the result. In the actual table there are other keys, but I only want to match duplicates based on these 3 keys (manufacturer, model, firstname).
I’ve been going back and forth to the drawing board quite some time now and have tried looking for some possible solutions but cannot get something reliable.
That's not a key value table. It's usually called an Entity-Attribute-Value table/relation/pattern.
Looking at the problem, it would be trivial if the table were laid out in conventional 1st + 2nd Normal form - you just do a join on the values, group by those and take a count....
SELECT manufacturer, model, firstname, COUNT(DISTINCT submission_id)
FROM atable
GROUP BY manufacturer, model, firstname
HAVING COUNT(DISTINCT submission_id)>1;
Or a join....
SELECT a.manufacturer, a.model, a.firstname
, a.submission_id, b.submission_id
FROM atable a
JOIN atable b
ON a.manufacturer=b.manufacturer
AND a.model=b.model
AND a.firstname=b.firstname
WHERE a.submission_id<b.submission_id
;
Or using sorting and comparing adjacent rows....
-- Uses MySQL user variables (@name) to carry the previous row's values.
-- Note: MySQL does not guarantee the evaluation order of such assignments.
SELECT *
FROM
(
SELECT @prev_submission_id AS prev_submission_id
, @prev_manufacturer AS prev_manufacturer
, @prev_model AS prev_model
, @prev_firstname AS prev_firstname
, a.submission_id
, a.manufacturer
, a.model
, a.firstname
, @prev_submission_id := a.submission_id AS currsid
, @prev_manufacturer := a.manufacturer AS currman
, @prev_model := a.model AS currmodel
, @prev_firstname := a.firstname AS currname
FROM atable a
ORDER BY manufacturer, model, firstname, submission_id
) ilv
WHERE prev_manufacturer=manufacturer
AND prev_model=model
AND prev_firstname=firstname
AND prev_submission_id<>submission_id;
So the solution is to simply make your data look like a normal relation....
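-- VALUES is a reserved word in MySQL, so the alias is kv_list instead.
-- ORDER BY inside GROUP_CONCAT makes the concatenated string deterministic.
SELECT ilv.kv_list
, COUNT(ilv.submission_id)
, GROUP_CONCAT(ilv.submission_id)
FROM
(SELECT a.submission_id
, GROUP_CONCAT(CONCAT(a.`key`, '=', a.value) ORDER BY a.`key`) AS kv_list
FROM atable a
GROUP BY a.submission_id
) ilv
GROUP BY ilv.kv_list
HAVING COUNT(ilv.submission_id)>1;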
Hopefully the join and sequence based solutions should now be obvious.
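For the ordering the question actually asks for (duplicates placed next to each other), one option is to pivot only the three keys and then sort every submission by the smallest submission_id in its group. The following is a sketch under assumptions, not the answerer's own query: it runs on Python's built-in sqlite3 as a stand-in for MySQL, submission 12 is a made-up extra row added so that the reordering is visible, and it assumes every submission has all three keys (a missing key would become NULL and drop out of the join).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "key" is quoted because it is a keyword; MySQL would use backticks.
conn.execute('CREATE TABLE atable (id INTEGER PRIMARY KEY, '
             'submission_id INTEGER, "key" TEXT, value TEXT)')
data = {
    10: ("Apple", "5s", "Paul"),
    12: ("Samsung", "S5", "Maria"),   # hypothetical extra submission
    15: ("Apple", "5s", "Paul"),
    20: ("Apple", "5s", "Andrew"),
}
for sid, vals in data.items():
    for k, v in zip(("manufacturer", "model", "firstname"), vals):
        conn.execute('INSERT INTO atable (submission_id, "key", value) '
                     'VALUES (?, ?, ?)', (sid, k, v))

SQL = """
WITH pivoted AS (          -- one row per submission, three keys only
  SELECT submission_id,
         MAX(CASE WHEN "key" = 'manufacturer' THEN value END) AS manufacturer,
         MAX(CASE WHEN "key" = 'model'        THEN value END) AS model,
         MAX(CASE WHEN "key" = 'firstname'    THEN value END) AS firstname
  FROM atable
  GROUP BY submission_id
),
grouped AS (               -- first submission_id of each duplicate group
  SELECT manufacturer, model, firstname, MIN(submission_id) AS first_id
  FROM pivoted
  GROUP BY manufacturer, model, firstname
)
SELECT p.submission_id
FROM pivoted p
JOIN grouped g
  ON g.manufacturer = p.manufacturer
 AND g.model = p.model
 AND g.firstname = p.firstname
ORDER BY g.first_id, p.submission_id
"""
order = [r[0] for r in conn.execute(SQL)]
print(order)
```

Submission 15 is pulled up next to submission 10 because both belong to the group whose first submission is 10; everything else keeps its original relative order.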

How to check whether some records share the same name but have different types in MySQL

There is a table with 97972561 rows (records) and 4 columns (attributes). The format looks like:
+------+-------------+-------------+-------------+
| PMID | SUBJECT_NAME| SUBJECT_TYPE| Sentence_ID |
+------+-------------+-------------+-------------+
I would like to check whether any subjects share the same name but have different types.
For example, there are three records in a table:
+------+-------------+-------------+-------------+
| PMID | SUBJECT_NAME| SUBJECT_TYPE| Sentence_ID |
+------+-------------+-------------+-------------+
| 1 | Bob | F | 1 |
+------+-------------+-------------+-------------+
| 2 | Bob | B | 2 |
+------+-------------+-------------+-------------+
| 3 | Bob | F | 3 |
+------+-------------+-------------+-------------+
I do not care how many such cases there are; I just want to check whether there are two records with the same SUBJECT_NAME but a different SUBJECT_TYPE. Any help would be appreciated!
I would aggregate by subject name and then assert that the max and min types are different:
SELECT SUBJECT_NAME
FROM yourTable
GROUP BY SUBJECT_NAME
HAVING MIN(SUBJECT_TYPE) <> MAX(SUBJECT_TYPE);
Note the way I wrote the HAVING clause leaves it sargable, meaning that any index on SUBJECT_TYPE could potentially be used. The following index might speed up this query:
CREATE INDEX idx ON yourTable (SUBJECT_NAME, SUBJECT_TYPE);
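To see the MIN/MAX check at work on the question's sample rows, here is a small sketch. It is an illustration under assumptions: Python's built-in sqlite3 stands in for MySQL, the "Alice" row is made up to show a name with a single type, and the EXISTS wrapper is my addition for the case where only a yes/no answer is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable ("
             "PMID INTEGER, SUBJECT_NAME TEXT, "
             "SUBJECT_TYPE TEXT, Sentence_ID INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", [
    (1, "Bob", "F", 1),
    (2, "Bob", "B", 2),
    (3, "Bob", "F", 3),
    (4, "Alice", "F", 4),   # hypothetical row: a name with only one type
])
conn.execute("CREATE INDEX idx ON yourTable (SUBJECT_NAME, SUBJECT_TYPE)")

# Names that occur with more than one type.
mixed = [r[0] for r in conn.execute(
    "SELECT SUBJECT_NAME FROM yourTable "
    "GROUP BY SUBJECT_NAME "
    "HAVING MIN(SUBJECT_TYPE) <> MAX(SUBJECT_TYPE)")]

# Yes/no variant: 1 if at least one such name exists, otherwise 0.
any_mixed = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM yourTable "
    "GROUP BY SUBJECT_NAME "
    "HAVING MIN(SUBJECT_TYPE) <> MAX(SUBJECT_TYPE))").fetchone()[0]

print(mixed, any_mixed)
```

Only "Bob" is reported, because "Alice" has the same minimum and maximum type.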

Confusion in creating table design

I am using Mysql and I have two tables-
BusDetails
+-------+-----------+
| busId | BusName |
+-------+-----------+
| 1 | A TRAVELS |
| 2 | B TRAVELS |
| 3 | C TRAVELS |
+-------+-----------+
AreaDetails
+--------+----------+
| cityId | cityName |
+--------+----------+
| 1 | ABC |
| 2 | DEF |
| 3 | GHI |
| 4 | JKL |
+--------+----------+
Now I have to create a third table that maps the bus table to the city table. Suppose busId 1 stops at cityId 2 and 3, and busId 2 stops at cityId 1 and 4. To model this I have 2 options-
first option-
+-------+--------+
| busId | areaId |
+-------+--------+
| 1 | 3,2 |
| 2 | 4,1 |
+-------+--------+
second option-
+-------+--------+
| busId | areaId |
+-------+--------+
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 4 |
+-------+--------+
In the future, when there are a large number of records, which table will give better performance, and why?
The first option is poor because comma-separated lists do not get indexed. If you want to find all the busses in area 2, you would have to use
SELECT busID
FROM bus_areas
WHERE FIND_IN_SET('2', areaID)
This will have to perform a full table scan, parse the areaID column on each row, and test whether 2 is a member of the resulting array.
With the second version you can do:
SELECT busID
FROM bus_areas
WHERE areaID = 2
If you have an index on areaID, this will be extremely efficient.
If you wanted to know how many busses are in each area, it's easy with the second option:
SELECT areaID, COUNT(*)
FROM bus_areas
GROUP BY areaID
With the first option it would be more cumbersome:
SELECT cityID, COUNT(*)
FROM areaDetails a
JOIN bus_areas ba ON FIND_IN_SET(a.cityID, ba.areaID)
GROUP BY cityID
This will be very inefficient because it has to perform M*N FIND_IN_SET operations, and as I explained above this cannot be indexed. Notice that I had to join with the areaDetails table because there's no way to enumerate all the areas in the comma-separated lists in SQL.
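As a rough illustration of the second option, the sketch below converts the comma-separated rows of the first option into one row per (bus, area) pair and then runs the two simple queries. It is a sketch under assumptions: Python's built-in sqlite3 stands in for MySQL, and the index name idx_bus_areas_area and the composite primary key are my choices, not part of the question.

```python
import sqlite3

# Rows as they would be stored under the first option.
option_one = [(1, "3,2"), (2, "4,1")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bus_areas ("
             "busId INTEGER NOT NULL, areaId INTEGER NOT NULL, "
             "PRIMARY KEY (busId, areaId))")
# Secondary index so lookups by area do not have to scan every row.
conn.execute("CREATE INDEX idx_bus_areas_area ON bus_areas (areaId)")

# Split each list into one row per (bus, area) pair: the second option.
pairs = [(bus_id, int(area))
         for bus_id, area_list in option_one
         for area in area_list.split(",")]
conn.executemany("INSERT INTO bus_areas (busId, areaId) VALUES (?, ?)", pairs)

# Which buses stop in area 2? A plain equality test, no string parsing.
buses_in_area_2 = [r[0] for r in conn.execute(
    "SELECT busId FROM bus_areas WHERE areaId = ? ORDER BY busId", (2,))]

# How many buses stop in each area?
buses_per_area = conn.execute(
    "SELECT areaId, COUNT(*) FROM bus_areas "
    "GROUP BY areaId ORDER BY areaId").fetchall()

print(buses_in_area_2, buses_per_area)
```

The composite primary key also prevents the same bus from being linked to the same area twice, which a comma-separated list cannot enforce.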
The answer depends on your use case.
Although the first option is not recommended, if you have very large data and you are not planning to perform a wide range of database operations (probably for a personal or small project), you can use it.
The second option has its own advantages and is what the relational model recommends. It will give you more flexibility and scalability, as it minimizes redundancy.
The second table is better in every respect. Over time, as the data grows, the second design stores many more rows, but it makes reporting and SQL queries easier, and you can use any type of join with it.

MySQL IN() Operator not working

How do I use the IN() operator? It is not working for me.
These tables are examples, but they look the same as the real database I have. I don't have permission to add tables or change them.
Those are the tables:
students
+------+------+
| id | name |
+------+------+
| 1 | ali |
| 2 | man |
| 3 | sos |
+------+------+
Classes
+------+---------+
| c_id | students|
+------+---------+
| 1 | 1,2,3,4 |
| 2 | 88,33,55|
| 3 | 45,23,72|
+------+---------+
When I use this query, it returns only the student with id = 1, because "id IN (students)" matches only when the first value in the list is equal:
select name,c_id from students,classes where id IN (students);
When I fetch the list in PHP and then put it into the query, it works fine. But this solution needs a loop and costs many queries:
select name,c_id from students,classes where id IN (1,2,3,4);
I also tried FIND_IN_SET(), and the same thing happened: it returns 1 only for the first value, and returns 0 if the value is at any other position.
The IN operator works just fine, where it's applicable for what it does.
First, consider restructuring your data to be normalized, and avoid storing values as comma separated lists.
Second, if you absolutely have to deal with columns containing comma separated lists of values, MySQL provides the FIND_IN_SET() function.
FOLLOWUP
Ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead. And relocate the join predicates from the WHERE clause to the ON clause. Fully qualify column references, eg.
SELECT s.name
, c.c_id
FROM students s
JOIN classes c
ON FIND_IN_SET(s.id,c.students)
ORDER BY s.name, c.c_id
To reiterate, storing a "comma separated list" in a column is an anti-pattern; it flies against relational theory and normalization, and disregards the best practices around relational databases.
One might argue for improved performance, but this pattern doesn't improve performance; rather it adds unnecessary complexity in query and DML operations.
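To check the FIND_IN_SET join against the question's sample tables without a MySQL server, here is a sketch under loud assumptions: Python's built-in sqlite3 stands in for MySQL, and because SQLite has no FIND_IN_SET, the find_in_set function below is a rough hand-written imitation (1-based position of an exact match, 0 if absent) registered under that name. It is not MySQL's implementation. The join uses the id column from the question's students table.

```python
import sqlite3

def find_in_set(needle, strlist):
    """Rough stand-in for MySQL's FIND_IN_SET: 1-based position, or 0."""
    if needle is None or strlist is None:
        return None
    items = strlist.split(",") if strlist else []
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
# Register the imitation so the SQL text can call FIND_IN_SET(...).
conn.create_function("FIND_IN_SET", 2, find_in_set)

conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE classes (c_id INTEGER, students TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(1, "ali"), (2, "man"), (3, "sos")])
conn.executemany("INSERT INTO classes VALUES (?, ?)",
                 [(1, "1,2,3,4"), (2, "88,33,55"), (3, "45,23,72")])

# Same shape as the query above: the list lookup is the join predicate.
enrolled = conn.execute(
    "SELECT s.name, c.c_id "
    "FROM students s "
    "JOIN classes c ON FIND_IN_SET(s.id, c.students) "
    "ORDER BY s.name, c.c_id").fetchall()

print(enrolled)
```

All three students are matched to class 1, including those whose id is not the first entry in the list. The predicate still has to be evaluated for every student/class pair, which is the cost the normalized design avoids.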
You need three tables.
One table students, one table classes, and then one table, say, students_to_classes containing something like
c_id | student_id
1 | 1
1 | 2
1 | 3
1 | 4
2 | 88
and so on.
Then you can query
select c_id from students_to_classes where student_id in (1,2,3,4)
Google "n:m relationship" for background on this.
EDIT
I know you're not specifically asking for another table structure, but this is a way of having a data type (a single number) that works with IN. Please believe me that this is the right way to do it. The reason you run into trouble with something as simple as IN is that you're using a non-standard approach, which, for such a standard problem, is typically not a good idea.
That's not how the IN operator is supposed to work. You use IN when you have a list of possible matches, so that instead of:
WHERE id=1 or id=2 or id=3 or id=4
you use:
WHERE id IN (1,2,3,4)
Anyhow, your logic is not correct. The relation between Class and Student is many-to-many, so a third table is needed. Let's call it student_class, where you can store the students of each class.
student
+------+------+
| id | name |
+------+------+
| 1 | ali |
| 2 | man |
| 3 | sos |
+------+------+
class
+------+---------+
| id | name |
+------+---------+
| 1 | math |
| 2 | english |
| 3 | science |
+------+---------+
student_class
+------------+-------------+
| class_id | student_id |
+------------+-------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 3 | 3 |
+------------+-------------+
In the example above, all students are in the math class and sos is also in the science class.
Finally, if you want to know which students are in a given class, let's say math, you can use:
SELECT s.id, s.name, c.name
FROM student s
INNER JOIN student_class sc ON sc.student_id=s.id
INNER JOIN class c ON sc.class_id = c.id
WHERE c.name='math';

How to split CSVs from one column to rows in a new table in MSSQL 2008 R2

Imagine the following (very bad) table design in MSSQL2008R2:
Table "Posts":
| Id (PK, int) | DatasourceId (PK, int) | QuotedPostIds (nvarchar(255)) | [...]
| 1 | 1 | | [...]
| 2 | 1 | 1 | [...]
| 2 | 2 | 1 | [...]
[...]
| 102322 | 2 | 123;45345;4356;76757 | [...]
So, the column QuotedPostIds contains a semicolon-separated list of self-referencing PostIds (Kids, don't do that at home!). Since this design is ugly as hell, I'd like to extract the values from the QuotedPostIds column to a new n:m relationship table like this:
Desired new table "QuotedPosts":
| QuotingPostId (int) | QuotedPostId (int) | DatasourceId (int) |
| 2 | 1 | 1 |
| 2 | 1 | 2 |
[...]
| 102322 | 123 | 2 |
| 102322 | 45345 | 2 |
| 102322 | 4356 | 2 |
| 102322 | 76757 | 2 |
The primary key for this table could either be a combination of QuotingPostId, QuotedPostId and DatasourceID or an additional artificial key generated by the database.
It is worth noticing that the current Posts table contains about 6,300,000 rows but only about 285,000 of those have a value set in the QuotedPostIds column. Therefore, it might be a good idea to pre-filter those rows. In any case, I'd like to perform the normalization using internal MSSQL functionality only, if possible.
I have already read other posts on this topic, which mostly dealt with split functions, but I could not find out how exactly to create the new table while also copying the appropriate value from the DatasourceId column, nor how to filter the rows to touch accordingly.
Thank you!
Edit: I thought it through and finally solved the problem using an external C# program instead of internal MSSQL functionality. Since it seems that it could have been done using Mikael Eriksson's suggestion, I will mark his post as the answer.
From the comments, you say you have a string split function that you don't know how to use with a table.
The answer is to use cross apply, something like this:
select P.Id,
S.Value
from Posts as P
cross apply dbo.Split(';', P.QuotedPostIds) as S
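For comparison, the same row-by-row expansion can be sketched outside the database, which is roughly what an external program would do. This is a minimal sketch in Python, not the asker's C# program: the function name explode_quoted_posts is made up, and the input tuples are the sample rows from the question in (Id, DatasourceId, QuotedPostIds) order. Rows with an empty QuotedPostIds are skipped, which corresponds to the pre-filtering idea mentioned in the question.

```python
def explode_quoted_posts(posts):
    """Yield one (QuotingPostId, QuotedPostId, DatasourceId) per list entry."""
    for post_id, datasource_id, quoted in posts:
        if not quoted:
            continue  # nothing quoted: skip the row entirely
        for part in quoted.split(";"):
            part = part.strip()
            if part:  # tolerate stray or trailing semicolons
                yield (post_id, int(part), datasource_id)

# Sample rows from the question: (Id, DatasourceId, QuotedPostIds)
posts = [
    (1, 1, ""),
    (2, 1, "1"),
    (2, 2, "1"),
    (102322, 2, "123;45345;4356;76757"),
]

quoted_posts = list(explode_quoted_posts(posts))
for row in quoted_posts:
    print(row)
```

Each output tuple maps directly to one row of the desired QuotedPosts table, with DatasourceId carried over from the quoting post.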