I am clustering news articles: each article is assigned to one of several clusters. From each cluster, I want to take the one article whose content is the longest text.
I have two tables. I want to join them and show only the record with the longest text per cluster.
My tables:
Table newscontent
news_id title content category
1 abcd abcd a
2 abcd abcdefg a
3 abcd abcdefghij a
4 efgh efgh a
5 efgh efghijk a
6 efgh efghijklmn a
7 ijkl ijkl b
8 ijkl ijklmn b
Table newscluster
newscluster_id news_id category cluster
1 1 a 0
2 2 a 0
3 3 a 0
4 4 a 1
5 5 a 1
6 6 a 1
Desired output:
news_id title content category cluster
3 abcd abcdefghij a 0
6 efgh efghijklmn a 1
How can I do that?
You can accomplish what you want by using a series of joins. However, I have the feeling that your schema is not completely normalized.
SELECT t2.news_id,
t2.title,
t2.content,
t2.category,
t1.cluster
FROM newscluster t1
INNER JOIN newscontent t2
ON t1.news_id = t2.news_id
INNER JOIN
(
SELECT t1.cluster, MAX(CHAR_LENGTH(t2.content)) AS max_content_length
FROM newscluster t1
INNER JOIN newscontent t2
ON t1.news_id = t2.news_id
GROUP BY t1.cluster
) t3
ON t1.cluster = t3.cluster AND
CHAR_LENGTH(t2.content) = t3.max_content_length
-- WHERE t2.category = 'a'
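To check the join approach above against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database from Python. Two things are assumptions rather than part of the original answer: the function name longest_per_cluster is made up for illustration, and CHAR_LENGTH is swapped for SQLite's LENGTH, since SQLite has no CHAR_LENGTH.

```python
import sqlite3


def longest_per_cluster():
    """Return the longest-content article per cluster for the sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE newscontent (
            news_id INTEGER PRIMARY KEY, title TEXT, content TEXT, category TEXT);
        CREATE TABLE newscluster (
            newscluster_id INTEGER PRIMARY KEY, news_id INTEGER,
            category TEXT, cluster INTEGER);
        INSERT INTO newscontent VALUES
            (1, 'abcd', 'abcd', 'a'), (2, 'abcd', 'abcdefg', 'a'),
            (3, 'abcd', 'abcdefghij', 'a'), (4, 'efgh', 'efgh', 'a'),
            (5, 'efgh', 'efghijk', 'a'), (6, 'efgh', 'efghijklmn', 'a'),
            (7, 'ijkl', 'ijkl', 'b'), (8, 'ijkl', 'ijklmn', 'b');
        INSERT INTO newscluster VALUES
            (1, 1, 'a', 0), (2, 2, 'a', 0), (3, 3, 'a', 0),
            (4, 4, 'a', 1), (5, 5, 'a', 1), (6, 6, 'a', 1);
    """)
    rows = con.execute("""
        SELECT t2.news_id, t2.title, t2.content, t2.category, t1.cluster
        FROM newscluster t1
        INNER JOIN newscontent t2 ON t1.news_id = t2.news_id
        INNER JOIN (
            -- longest content length found in each cluster
            SELECT c.cluster, MAX(LENGTH(n.content)) AS max_content_length
            FROM newscluster c
            INNER JOIN newscontent n ON c.news_id = n.news_id
            GROUP BY c.cluster
        ) t3
            ON t1.cluster = t3.cluster
           AND LENGTH(t2.content) = t3.max_content_length
        ORDER BY t1.cluster
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in longest_per_cluster():
        print(row)
```

Note that if two articles in the same cluster tie for the longest content, this query returns both of them.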
Try this:
select * from (
select a.*, cluster from newscontent a
join newscluster b on a.news_id =b.news_id
order by length(content) desc) x
group by cluster
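Some people will complain, but if it works, it works! Be aware that this relies on MySQL's permissive GROUP BY handling: it fails when ONLY_FULL_GROUP_BY is enabled, and even when it runs, MySQL does not guarantee that the row kept for each cluster is the first one from the ordered subquery.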
I'm trying to count multiple columns as one column. For example:
Table 1 (Books):
ID BookName Genre SubGenre
1 Name1 1 3
2 Name2 2 1
3 Name3 4 2
Table 2 (Genre):
ID Genre
1 Horror
2 Drama
3 Romance
4 Sci-Fi
I want to be able to count the genre and subgenre as one to create a table of:
Result:
Genre Count
Horror 2
Drama 2
Romance 1
Sci-Fi 1
Any help would be really appreciated. Thanks
Try the query below:
SELECT t.Genre, Sum(t.cg) AS Count
FROM (
SELECT t2.Genre, Count(t1.Genre) AS cg
FROM Table2 as t2 LEFT JOIN Table1 as t1 ON t2.ID = t1.Genre
GROUP BY t2.Genre
UNION ALL
SELECT t2.Genre, Count(t1.SubGenre) AS cg
FROM Table2 as t2 LEFT JOIN Table1 as t1 ON t2.ID = t1.SubGenre
GROUP BY t2.Genre
) as t GROUP BY t.Genre;
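As an alternative sketch to the UNION ALL query above, a single LEFT JOIN whose condition matches either column can produce the same counts on the sample data. The table names Books and Genre are taken from the question's labels (the answer calls them Table1 and Table2), the function name genre_counts is hypothetical, and the query is run here in SQLite via Python rather than in the asker's database.

```python
import sqlite3


def genre_counts():
    """Count how often each genre appears as either Genre or SubGenre."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Books (
            ID INTEGER PRIMARY KEY, BookName TEXT, Genre INTEGER, SubGenre INTEGER);
        CREATE TABLE Genre (ID INTEGER PRIMARY KEY, Genre TEXT);
        INSERT INTO Books VALUES
            (1, 'Name1', 1, 3), (2, 'Name2', 2, 1), (3, 'Name3', 4, 2);
        INSERT INTO Genre VALUES
            (1, 'Horror'), (2, 'Drama'), (3, 'Romance'), (4, 'Sci-Fi');
    """)
    rows = con.execute("""
        SELECT g.Genre, COUNT(b.ID) AS cnt
        FROM Genre AS g
        -- a book matches a genre if it uses it as Genre or as SubGenre
        LEFT JOIN Books AS b ON g.ID IN (b.Genre, b.SubGenre)
        GROUP BY g.ID, g.Genre
        ORDER BY g.ID
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for name, cnt in genre_counts():
        print(name, cnt)
```

One difference from the UNION ALL version: a book whose Genre and SubGenre are the same would be counted once here, but twice there. Which is right depends on what the count is meant to express.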
I need to find a way to find what conversions are missing.
I have three tables.
Table 1: type, which holds the different types.
id name
1 typeA
2 typeB
3 typeC
Table 2: section, which holds the different sections.
id name
1 section1
2 section2
3 section3
4 section4
Table 3: conversions, which contains all the combinations to go from one type to another for each of the sections.
id section_id type_convert_from type_convert_to
1 1 1 2
2 2 1 2
3 3 1 2
4 4 1 2
5 1 1 3
6 2 1 3
7 3 1 3
8 4 1 3
9 1 2 1
10 2 2 1
11 3 2 1
12 4 2 1
Some combinations are missing from the table above. How can I identify them with a SQL query? Thanks!
Try this. The cross join of table type with itself generates all possible combinations of type ids. I've excluded combinations where id_from = id_to (i.e. you're not interested in conversions from a type to itself).
select * from conversions C
right join (
select T1.id as id_from, T2.id as id_to
from type T1 cross join type T2
where T1.id <> T2.id
) X on X.id_from = C.type_convert_from and X.id_to = C.type_convert_to
where C.type_convert_from is null
If you want to check missing type conversions by section, extend the cross join by adding the section table to include section.id as follows. It will list missing type conversions within each section.
select X.section_id, X.id_from, X.id_to from conversions C
right join (
select S.id as section_id, T1.id as id_from, T2.id as id_to
from type T1 cross join type T2 cross join section S
where T1.id <> T2.id
) X
on X.id_from = C.type_convert_from and X.id_to = C.type_convert_to
and C.section_id = X.section_id
where C.type_convert_from is null
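The same per-section check can also be written with NOT EXISTS instead of a RIGHT JOIN. The sketch below runs that variant against the sample data in SQLite from Python; the function name missing_conversions is made up for illustration, and the switch to NOT EXISTS is a substitution of mine, chosen because older SQLite versions lack RIGHT JOIN.

```python
import sqlite3


def missing_conversions():
    """Return (section_id, id_from, id_to) triples absent from conversions."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE "type" (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE section (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE conversions (
            id INTEGER PRIMARY KEY, section_id INTEGER,
            type_convert_from INTEGER, type_convert_to INTEGER);
        INSERT INTO "type" VALUES (1, 'typeA'), (2, 'typeB'), (3, 'typeC');
        INSERT INTO section VALUES
            (1, 'section1'), (2, 'section2'), (3, 'section3'), (4, 'section4');
        INSERT INTO conversions VALUES
            (1, 1, 1, 2), (2, 2, 1, 2), (3, 3, 1, 2), (4, 4, 1, 2),
            (5, 1, 1, 3), (6, 2, 1, 3), (7, 3, 1, 3), (8, 4, 1, 3),
            (9, 1, 2, 1), (10, 2, 2, 1), (11, 3, 2, 1), (12, 4, 2, 1);
    """)
    rows = con.execute("""
        SELECT s.id AS section_id, t1.id AS id_from, t2.id AS id_to
        FROM section AS s
        CROSS JOIN "type" AS t1
        CROSS JOIN "type" AS t2
        WHERE t1.id <> t2.id              -- skip type-to-itself pairs
          AND NOT EXISTS (                -- keep only pairs with no conversion row
              SELECT 1
              FROM conversions AS c
              WHERE c.section_id = s.id
                AND c.type_convert_from = t1.id
                AND c.type_convert_to = t2.id)
        ORDER BY s.id, t1.id, t2.id
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in missing_conversions():
        print(row)
```

On the sample data, every section lacks the same three conversions: 2 to 3, 3 to 1, and 3 to 2.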
I have results:
item_id subitem_id
----------------------
1 35
1 25
1 8
2 10
2 25
3 60
4 35
4 25
4 44
5 1
5 23
5 15
5 13
5 9
and I have two lists of subitem_id values:
(25,44,1)
(8,9)
How do I write the WHERE clause to filter the result and return this?
item_id subitem_id
----------------------
1 35
1 25 <-- first set
1 8 <-- second set
-----------------
5 1 <-- first set
5 23
5 15
5 13
5 9 <-- second set
because these item_ids contain subitem_ids from both lists.
SELECT
`item_id`
FROM table
WHERE `subitem_id` in (25,44,1)
AND `subitem_id` in (8,9)
This did not work, because each row holds a single subitem_id, which cannot be in both lists at once.
P.S.
This is a simplified example; in reality we have more than 100k records and a query with several joins.
http://sqlfiddle.com/#!9/71c28e5/3
SELECT t1.*
FROM (
SELECT DISTINCT(t1.item_id)
FROM t1
INNER JOIN t1 t2
ON t1.item_id = t2.item_id
AND t2.subitem_id in (8,9)
WHERE t1.subitem_id in (25,44,1)
) t
LEFT JOIN t1
ON t.item_id = t1.item_id
Another approach, which may make MySQL examine fewer rows:
http://sqlfiddle.com/#!9/71c28e5/10
SELECT t1.*
FROM t1
WHERE item_id in (
SELECT DISTINCT(t1.item_id)
FROM t1
INNER JOIN t1 t2
ON t1.item_id = t2.item_id
AND t2.subitem_id in (25,44,1)
WHERE t1.subitem_id in (8,9)
)
I think you're trying to make sure an item_id has subitems in two different sets:
Select * from table A
where exists (Select 1 from table B where A.Item_Id = B.Item_ID and subitem_ID in (25,44,1))
and exists (Select 1 from table C where A.Item_Id = C.Item_ID and subitem_ID in (8,9))
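Another way to express "has at least one subitem from each list" is conditional aggregation with GROUP BY and HAVING. The sketch below is an assumption-laden illustration, not one of the answers above: the function name rows_for_items_in_both_lists is hypothetical, the table is called t1 as in the fiddle queries, and it runs in SQLite via Python. It relies on the IN expression evaluating to 0 or 1, which holds in both SQLite and MySQL.

```python
import sqlite3


def rows_for_items_in_both_lists():
    """Return all rows of items that have a subitem in each of the two lists."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (item_id INTEGER, subitem_id INTEGER)")
    con.executemany(
        "INSERT INTO t1 VALUES (?, ?)",
        [(1, 35), (1, 25), (1, 8), (2, 10), (2, 25), (3, 60),
         (4, 35), (4, 25), (4, 44),
         (5, 1), (5, 23), (5, 15), (5, 13), (5, 9)],
    )
    rows = con.execute("""
        SELECT item_id, subitem_id
        FROM t1
        WHERE item_id IN (
            SELECT item_id
            FROM t1
            GROUP BY item_id
            -- each SUM counts how many of the item's rows fall in that list
            HAVING SUM(subitem_id IN (25, 44, 1)) > 0
               AND SUM(subitem_id IN (8, 9)) > 0)
        ORDER BY item_id, subitem_id
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in rows_for_items_in_both_lists():
        print(row)
```

This reads the table once inside the subquery, and adding a third list only means adding one more HAVING condition.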
Suppose I have a table1 like the one shown below:
table1
id product
2 chocolate
1 chocolate
2 pepsi
3 fanta
2 pepsi
4 chocolate
5 chips
3 pizza
1 coke
2 chips
6 burger
7 sprite
0 pepsi
and I want to arrange the above table as shown below, using only MySQL:
table2
id product
0 pepsi
1 chocolate, coke
2 chocolate,pepsi,chips
3 fanta,pizza
4 chocolate
5 chips
6 burger
7 sprite
This can be done using:
select id, group_concat(distinct product) as products
from table1
group by id
order by id
After that, I want to copy the product column from table2 into another table, table3, shown below. The product values from table2 should go into the things column of table3, matching number in table3 to id in table2.
table3
number name things
0 hi null
1 hello null
2 hehe null
3 wow null
4 hi null
5 hi null
6 hi null
7 hi null
and I want the final table3 to look like this:
table3
number name things
0 hi pepsi
1 hello chocolate, coke
2 hehe chocolate,pepsi,chips
3 wow fanta,pizza
4 hi chocolate
5 hi chips
6 hi burger
7 hi sprite
Use MySQL's UPDATE ... JOIN syntax with a derived table:
UPDATE table3
JOIN (SELECT id,
Group_concat(DISTINCT product) AS products
FROM table1
GROUP BY id) b
ON table3.number = b.id
SET table3.things = b.products
You have a nice normalized data structure and you want to start putting in comma-separated lists. That is a bad idea. Doable, but probably a bad idea.
You would use an update with a join:
update table3 t3 join
(select id, group_concat(distinct product) as products
from table1
group by id
) tt
on t3.number = tt.id
set t3.things = tt.products;
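For readers who want to try the update on the sample data without a MySQL server, here is a rough sketch in SQLite driven from Python. SQLite does not accept MySQL's UPDATE ... JOIN form, so this version swaps in a correlated subquery; that substitution, and the function name fill_things, are my assumptions and not part of the answers above.

```python
import sqlite3


def fill_things():
    """Fill table3.things from table1 and return {number: things}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id INTEGER, product TEXT)")
    con.execute("CREATE TABLE table3 (number INTEGER, name TEXT, things TEXT)")
    con.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [(2, "chocolate"), (1, "chocolate"), (2, "pepsi"), (3, "fanta"),
         (2, "pepsi"), (4, "chocolate"), (5, "chips"), (3, "pizza"),
         (1, "coke"), (2, "chips"), (6, "burger"), (7, "sprite"), (0, "pepsi")],
    )
    con.executemany(
        "INSERT INTO table3 VALUES (?, ?, NULL)",
        [(0, "hi"), (1, "hello"), (2, "hehe"), (3, "wow"),
         (4, "hi"), (5, "hi"), (6, "hi"), (7, "hi")],
    )
    # Correlated subquery: for each table3 row, concatenate the distinct
    # products of the table1 rows whose id equals that row's number.
    con.execute("""
        UPDATE table3
        SET things = (
            SELECT GROUP_CONCAT(DISTINCT product)
            FROM table1
            WHERE table1.id = table3.number)
    """)
    result = dict(con.execute("SELECT number, things FROM table3").fetchall())
    con.close()
    return result


if __name__ == "__main__":
    for number, things in sorted(fill_things().items()):
        print(number, things)
```

The order of products inside each concatenated string is not guaranteed, so compare them as sets rather than as exact strings. A number with no matching rows in table1 would end up with things set to NULL.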
I need help with a relatively simple query. For a table:
A | B | C
----------
2 1 6
2 2 5
3 3 4
4 4 3
5 5 2
6 6 1
I need to have an output like so:
A | B | C
----------
2 1 6
3 3 4
4 4 3
5 5 2
6 6 1
So that each value in A is distinct, but I also get the corresponding values in B and C. I know "select distinct(A) from table" but that only returns the values 2,3,4,5,6 and I need the values in columns B and C, too. Please help. I have a deadline fast approaching. This question is stupid and trivial, but one must walk before they can run. Thanks so much.
Try this:
SELECT T1.A, T1.B, MIN(T1.C) AS C
FROM yourtable T1
JOIN (
SELECT A, MIN(B) AS B
FROM yourtable
GROUP BY A
) T2
ON T1.A = T2.A AND T1.B = T2.B
GROUP BY T1.A, T1.B
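The question does not say which row to keep when a value of A repeats. Assuming, as the query above does, that the row with the smallest B should win, the same result can be sketched with NOT EXISTS. The function name first_row_per_a is hypothetical, the table name yourtable is borrowed from the answer, and the example runs in SQLite via Python.

```python
import sqlite3


def first_row_per_a():
    """Keep, for each A, the row with the smallest B (an assumed tie-break rule)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourtable (A INTEGER, B INTEGER, C INTEGER)")
    con.executemany(
        "INSERT INTO yourtable VALUES (?, ?, ?)",
        [(2, 1, 6), (2, 2, 5), (3, 3, 4), (4, 4, 3), (5, 5, 2), (6, 6, 1)],
    )
    rows = con.execute("""
        SELECT t.A, t.B, t.C
        FROM yourtable AS t
        WHERE NOT EXISTS (
            -- drop the row if another row with the same A has a smaller B
            SELECT 1
            FROM yourtable AS o
            WHERE o.A = t.A AND o.B < t.B)
        ORDER BY t.A
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in first_row_per_a():
        print(row)
```

If two rows share both A and the smallest B, this keeps both of them, so a further tie-break on C would be needed in that case.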
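SELECT DISTINCT(A), B, C
FROM table
-- Note: DISTINCT applies to the whole (A, B, C) row, not just A, so on the sample data this still returns both rows with A = 2.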
Is there a specific logic behind which distinct A rows you want to select when considering columns B and C?