So I have a weird scenario where I want the following data to be ordered in a certain way. Let the table data be:
abc 111 2 priority
abc 111 blah data
abc 222 1 priority
abc 222 blah data
abc 333 3 priority
abc 333 blah data
I want to order that data based on column three (taken from the rows where column four is "priority"), but keep the result grouped by column two. So the expected query result would look like:
abc 222 1 priority
abc 222 blah data
abc 111 2 priority
abc 111 blah data
abc 333 3 priority
abc 333 blah data
What's the best possible way of doing this? I can think of doing a query up front and an IN clause, but then I would have to account for all possible priorities.
Thanks in advance. FYI, it's MySQL that I am using.
I think this is what you need:
http://sqlfiddle.com/#!2/48057/14
DDL
CREATE TABLE my_data(
name VARCHAR(20),
num NUMERIC(5),
c NUMERIC(3),
priority NUMERIC(3)
);
DML
INSERT INTO my_data VALUES("abc", 111, 2, 1);
INSERT INTO my_data VALUES("abc", 222, 3, 4);
INSERT INTO my_data VALUES("abc", 222, 1, 9);
INSERT INTO my_data VALUES("abc", 111, 4, 2);
It would be better if you had included actual column names, even with bogus data; however, I've named them corresponding to the type of content I think you are presenting.
It appears your second column is some "ID" column and you want to keep those rows grouped together. So, pre-query that in the order you want, but only for the "priority" rows, ignoring the rest of the records. THEN, using MySQL variables, you can assign each one a sequential value for the final output. Then join to the regular data on the "ID" column for all other values... Something like...
select
md2.ABCColumn,
SortSeqQuery.IDColumn,
md2.DescripColumn,
md2.ColumnWith123Sequence
from
my_Data md2
LEFT JOIN ( select
md1.IDColumn,
@SeqVal := @SeqVal + 1 as FinalSortOrder
from
my_Data md1,
( select @SeqVal := 0 ) sqlvars
where
md1.DescripColumn = "priority"
order by
md1.ColumnWith123Sequence,
md1.IDColumn ) SortSeqQuery
on md2.IDColumn = SortSeqQuery.IDColumn
order by
case when SortSeqQuery.IDColumn is null then 2 else 1 end,
coalesce( SortSeqQuery.FinalSortOrder, 0 )
With the "Order By", it pre-sorts the qualified data BEFORE it actually applies the #SeqVal which in turn should come back as 1, 2, 3, 4, etc...
The "SortSeqQuery" will be run first to pre-qualify any "priority" records and have them sorted and available. Then, your "my_data" table is basis of the query and joins to that result set and grabs the appropriate sort sequence based on matching ID column. If there are IDColumn entries that DO NOT have a priority, then they will also be included, but with the order by based on a case/when, any that are null are pre-sorted to the bottom, those found will be sorted to the top. Within that sort, it will keep all IDColumn entries grouped together since the same "FinalSortOrder" is directly associated to the IDColumn in the preliminary query.
ANSWER -
CREATE TABLE my_data( name VARCHAR(20),groupid INT(5), attributekey VARCHAR(10),attributevalue VARCHAR(10));
INSERT INTO my_data VALUES("abc", 111, "priority", "2");
INSERT INTO my_data VALUES("abc", 111, "someattr", "blah");
INSERT INTO my_data VALUES("abc", 222, "priority", "1");
INSERT INTO my_data VALUES("abc", 222, "someattr", "blah");
INSERT INTO my_data VALUES("abc", 333, "priority", "3");
INSERT INTO my_data VALUES("abc", 333, "someattr", "blah");
solution -
SELECT m1.*,
(select attributevalue from my_data m2
where m1.groupid=m2.groupid
AND m2.attributekey='priority'
) groupidpriority
FROM my_data m1
order by groupidpriority;
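With the sample data this returns the 222 rows first (priority "1"), then 111, then 333, matching the expected result. Two caveats: groupidpriority is a varchar, so priorities of two or more digits would sort lexicographically ("10" before "2"), and the relative order of rows within one groupid is not guaranteed. A variation addressing both (the +0 cast and the tie-break columns are my additions):
SELECT *
FROM (
  SELECT m1.*,
         (select attributevalue from my_data m2
           where m1.groupid = m2.groupid
             AND m2.attributekey = 'priority'
         ) AS groupidpriority
  FROM my_data m1
) t
ORDER BY t.groupidpriority + 0, t.groupid, t.attributekey;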
I am trying to figure out a JSONB query in PostgreSQL 11:
SELECT id, my_json_field #>> '{field_depth_1, field_depth_2}'
FROM my_data
WHERE my_json_field @> '{"other_field": 3}'::jsonb
If other_field is a key-value pair, this works perfectly and I get every row with other_field = 3. However, if other_field is a list of values, e.g. [2,3,6,8,10], and I want to find out for every row whether the value 3 exists in the list represented by other_field, how should I write the query?
Use the operator @>. Per the documentation:
@> jsonb Does the left JSON value contain the right JSON path/value entries at the top level?
Example:
with my_data(id, my_json_field) as (
values
(1, '{"field_depth_1": {"field_depth_2": "something 1"}, "other_field": 3}'::jsonb),
(2, '{"field_depth_1": {"field_depth_2": "something 2"}, "other_field": 4}'),
(3, '{"field_depth_1": {"field_depth_2": "something 3"}, "other_field": [2,3,6,8,10]}'),
(4, '{"field_depth_1": {"field_depth_2": "something 4"}, "other_field": [2,4,6,8,10]}')
)
select id, my_json_field #>> '{field_depth_1, field_depth_2}' as value
from my_data
where my_json_field->'other_field' @> '3';
id | value
----+-------------
1 | something 1
3 | something 3
(2 rows)
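An alternative that keeps whole-document containment (and can therefore use a GIN index on my_json_field) is to test both shapes explicitly; a sketch against the same sample data:
select id, my_json_field #>> '{field_depth_1, field_depth_2}' as value
from my_data
where my_json_field @> '{"other_field": 3}'
   or my_json_field @> '{"other_field": [3]}';
In jsonb containment, a nested array contains the array [3] whenever 3 is among its elements, while the scalar shape is matched by the first predicate.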
I have an existing table:
id, name, user
15, bob, 1
25, alice, 2
30, ann, 1
55, bob, 2
66, candy, 1
We want the name records for user 1 to now be set to the values in this string:
"ann, candy, dave"
If I do it the easy way:
delete from table where user = 1
insert into table (name, user) values ('ann', 1), ('candy', 1), ('dave', 1)
then the table now looks like this
id, name, user
25, alice, 2
55, bob, 2
67, ann, 1
68, candy, 1
69, dave, 1
i.e. new rows are created. I don't want the new identities; over time, in huge tables, this causes fragmentation, identity holes, and so on. What is the most efficient way in SQL to reduce this to just the two actually required operations:
delete from table where user = 1 and name is not in the string "ann, candy, dave", so that the table is then:
25, alice, 2
30, ann, 1
55, bob, 2
66, candy, 1
insert into table, with user = 1, any name from "ann, candy, dave" that does not already exist for that user, so that the table is then:
25, alice, 2
30, ann, 1
55, bob, 2
66, candy, 1
67, dave, 1
It sounds like you have a list and want to process it twice, once for deletes and once for inserts. Store the list in a temporary table and use that for processing.
Along the way, start with a unique index on user, name to prevent duplicate rows in the table:
create unique index idx_table_user_name on table(user, name);
This seems to be a requirement for your data, so let the database enforce it. Then the code for processing is like:
create temporary table toprocess as (
select 1 as user, 'ann' as name union all
select 1, 'candy' union all
select 1, 'dave'
);
create index idx_toprocess_user_name on toprocess(user, name);
delete t
from table t
where t.user in (select p.user from toprocess p) and
not exists (select 1 from toprocess p where p.user = t.user and p.name = t.name);
insert into table(user, name)
select user, name
from toprocess
on duplicate key update user = values(user);
Although this might look a bit complicated, it lets you handle multiple users at the same time. And, the list for processing is only entered once, which reduces the scope for error.
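For example, to reconcile a second user's list in the same pass, you would just seed more rows into the temporary table before running the delete and insert (the values here are made up):
insert into toprocess (user, name) values
    (2, 'alice'),
    (2, 'zoe');
The delete and insert statements above then handle both users at once.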
It is not so clear, but maybe this is what you want:
delete from table where user = 1 and name not in('ann', 'candy', 'dave')
insert into table(name, user)
select t.name, 1
from(select 'ann' as name
union all
select 'candy'
union all
select 'dave') t
where t.name not in(select name from (select name from table where user = 1) existing)
A short question about the GROUP BY statement in MySQL:
My current db structure looks like:
CREATE TABLE TableName
(
ID int primary key,
name varchar(255),
number varchar(255)
);
INSERT INTO TableName
(ID, name, number)
VALUES
(1, "Test 1", "100000"),
(2, "Apple", "200000"),
(3, "Test 1 beta", "100000"),
(4, "BLA", "300000"),
(5, "ABU", "400000"),
(6, "CBA", "700000"),
(7, "ABC", "600000"),
(8, "Orange - Test", "400000"),
(9, "ABC", "");
My current statement looks like:
SELECT name, number, count(*) as Anzahl
FROM TableName
group by name,number
With this statement the result looks like:
NAME    NUMBER  ANZAHL
ABC             1
Test 1  100000  2
Apple   200000  1
BLA     300000  1
ABU     400000  2
ABC     600000  1
CBA     700000  1
But the value "ABC" wouldn't merged.
the result should look like:
NAME NUMBER ANZAHL
Test 1 100000 2
Apple 200000 1
BLA 300000 1
ABU 400000 2
ABC 600000 2
CBA 700000 1
Any ideas how this could work?
SQLFiddle:
http://sqlfiddle.com/#!2/dcbee/1
The solution must perform well for something like 1,000,000+ rows.
First of all, IMHO it's bad design to store numbers in a character column; working with integers is faster than working with characters. That being said, I assume all values in the number column will be numbers. Here's a query to avoid multiple ABC rows:
SELECT name,
SUM(convert(number, SIGNED INTEGER)) as number,
count(*) as Anzahl
FROM TableName
GROUP BY name ;
This is what I suggest (SQL Fiddle Link: http://sqlfiddle.com/#!2/c6f83b/5/0)
Like @Parag said, I strongly urge you to change the table definition (e.g. store number as a nullable integer).
Then the SQL is easy:
SELECT name, number, COUNT(*) AS anzahl
FROM tablename
WHERE number IS NOT NULL
GROUP BY name, number
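If changing the table definition is not an option, the empty string can also be normalized to NULL on the fly. A minimal sketch against the original varchar schema, assuming each name carries at most one distinct non-empty number:
SELECT name, MAX(NULLIF(number, '')) AS number, COUNT(*) AS anzahl
FROM TableName
GROUP BY name;
NULLIF(number, '') turns the empty string into NULL, MAX() then picks the one real number for the group, so the two "ABC" rows collapse into one.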
I have a list of unigrams (single word), bigrams (two words), and trigrams (three words) I have pulled out of a bunch of documents. My goal is a statistical analysis report and also a search I can run over these documents.
John Doe
Xeon 5668x
corporate tax rates
beach
tax plan
Porta San Giovanni
The ngrams are tagged by date and document. So, for example, I can find relations between bigrams and when their phrases first appeared, as well as relations between documents. I can also search for documents that contain X number of these uni/bi/trigram phrases.
So my question is how to store them to optimize these searches.
The simplest approach is just a simple string column for each phrase and then I add relations to the document_ngram table each time I find that word/phrase in the document.
table document
{
id
text
date
}
table ngram
{
id
ngram varchar(200);
}
table document_ngram
{
id
ngram_id
document_id
date
}
However, this means that if I want to search through trigrams for a single word, I have to use string searching. For example, let's say I wanted all trigrams with the word "summer" in them.
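With that layout the search presumably comes down to a leading-wildcard pattern match, something like:
SELECT * FROM ngram WHERE ngram LIKE '%summer%';
which cannot use an ordinary B-tree index and would also match words such as "summertime".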
What if I instead split the words up, so that the only thing stored in ngram is a single word, and added three columns so that all 1, 2, and 3 word chains could fit inside document_ngram?
table document_ngram
{
id
word1_id NOT NULL
word2_id DEFAULT NULL
word3_id DEFAULT NULL
document_id
date
}
Is this the correct way to do it? Are there better ways? I am currently using PostgreSQL and MySQL, but I believe this is a generic SQL question.
This is how I would model your data (note that 'the' is referenced twice). You could also add weights to the single words.
DROP SCHEMA ngram CASCADE;
CREATE SCHEMA ngram;
SET search_path='ngram';
CREATE table word
( word_id INTEGER PRIMARY KEY
, the_word varchar
, constraint word_the_word UNIQUE (the_word)
);
CREATE table ngram
( ngram_id INTEGER PRIMARY KEY
, n INTEGER NOT NULL -- arity
, weight REAL -- payload
);
CREATE TABLE ngram_word
( ngram_id INTEGER NOT NULL REFERENCES ngram(ngram_id)
, seq INTEGER NOT NULL
, word_id INTEGER NOT NULL REFERENCES word(word_id)
, PRIMARY KEY (ngram_id,seq)
);
INSERT INTO word(word_id,the_word) VALUES
(1, 'the') ,(2, 'man') ,(3, 'who') ,(4, 'sold') ,(5, 'world' );
INSERT INTO ngram(ngram_id, n, weight) VALUES
(101, 6, 1.0);
INSERT INTO ngram_word(ngram_id,seq,word_id) VALUES
( 101, 1, 1)
, ( 101, 2, 2)
, ( 101, 3, 3)
, ( 101, 4, 4)
, ( 101, 5, 1)
, ( 101, 6, 5)
;
SELECT w.*
FROM ngram_word nw
JOIN word w ON w.word_id = nw.word_id
WHERE ngram_id = 101
ORDER BY seq;
RESULT:
word_id | the_word
---------+----------
1 | the
2 | man
3 | who
4 | sold
1 | the
5 | world
(6 rows)
Now, suppose you want to add a 4-gram to the existing (6-gram) data:
INSERT INTO word(word_id,the_word) VALUES
(6, 'is') ,(7, 'lost') ;
INSERT INTO ngram(ngram_id, n, weight) VALUES
(102, 4, 0.1);
INSERT INTO ngram_word(ngram_id,seq,word_id) VALUES
( 102, 1, 1)
, ( 102, 2, 2)
, ( 102, 3, 6)
, ( 102, 4, 7)
;
SELECT w.*
FROM ngram_word nw
JOIN word w ON w.word_id = nw.word_id
WHERE ngram_id = 102
ORDER BY seq;
Additional result:
INSERT 0 2
INSERT 0 1
INSERT 0 4
word_id | the_word
---------+----------
1 | the
2 | man
6 | is
7 | lost
(4 rows)
BTW: adding a document-type object will add two additional tables to this model: one for the document, and one for document×ngram (or, in another approach, for document×word). A recursive model would also be a possibility.
UPDATE: the above model needs an additional constraint, which would have to be implemented via triggers (or a rule plus an additional table). Pseudocode:
ngram_word.seq >0 AND ngram_word.seq <= (select ngram.n FROM ngram ng WHERE ng.ngram_id = ngram_word.ngram_id)
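A rough sketch of how that constraint could look as a PostgreSQL trigger (the function and trigger names are mine):
CREATE FUNCTION check_ngram_word_seq() RETURNS trigger AS $$
BEGIN
    -- reject seq values outside the range 1..n of the parent ngram
    IF NEW.seq < 1 OR NEW.seq > (SELECT n FROM ngram WHERE ngram_id = NEW.ngram_id) THEN
        RAISE EXCEPTION 'seq % out of range for ngram %', NEW.seq, NEW.ngram_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ngram_word_seq_check
    BEFORE INSERT OR UPDATE ON ngram_word
    FOR EACH ROW EXECUTE PROCEDURE check_ngram_word_seq();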
One idea would be to modify your original table layout a bit. Change the ngram varchar(200) column to contain only one word of the ngram, add a word_no (1, 2, or 3) column, and add a grouping column so that, for example, the two records for the two words of a bigram are related (give them the same word_group). (In Oracle, I'd pull the word_group numbers from a sequence; I think Postgres has something similar.)
table document
{
id
text
date
}
table ngram
{
id
word_group
word_no
ngram varchar(200);
}
table document_ngram
{
id
ngram_id
document_id
date
}
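Under this layout, finding every n-gram that contains the word "summer" becomes an indexable self-join instead of a pattern match; a sketch, reusing the column names above:
SELECT g2.word_group, g2.word_no, g2.ngram
FROM ngram g1
JOIN ngram g2 ON g2.word_group = g1.word_group
WHERE g1.ngram = 'summer'
ORDER BY g2.word_group, g2.word_no;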
My knowledge of MySQL is basic. I want to build a query that returns all rows whose values sum up to a given value, in ascending order. I can't figure out how to do that. Using sum() only returns one row. I've tried a subquery, but it returns all rows. I don't want anybody to do my work, I just want you to help me figure this out.
Anybody have an idea?
How do I retrieve all rows whose field "value" sums to 30?
Example:
given value: 30
field to sum: value
table:
id name value order
1 name1 3 1
2 name2 10 6
3 name3 13 3
4 name4 5 8
5 name5 20 25
So, the query must return:
id 1, id 3, id 2, id 4
Thanks in advance.
set @total := 0;
select id, name, value, `order`
from
(select
    id, name, value, `order`,
    @total := if(@total is null, 0, @total) + value as total
from THE_TABLE
order by `order`
) as derived
where total - value < 30;
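With the sample data, the running totals over value (in `order` sequence) are 3, 16, 26, 31 and 51. The filter total - value < 30 keeps every row whose running total before adding it is still under 30, i.e. ids 1, 3, 2 and 4, which matches the expected result.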
Using Postgres as the database, I think this does what you want. I'm not sure if it works similarly in MySQL:
CREATE TABLE test (
id int,
name varchar(50),
value int,
order_ int
);
INSERT INTO test values (1, 'name1', 3, 1);
INSERT INTO test values (3, 'name3', 13, 3);
INSERT INTO test values (2, 'name2', 10, 6);
INSERT INTO test values (4, 'name4', 5, 8);
INSERT INTO test values (5, 'name5', 20, 25);
SELECT * FROM (SELECT *, SUM(value) OVER (ORDER BY order_) sumvalues FROM test) a WHERE sumvalues - value < 30;
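For what it's worth, MySQL 8.0+ supports the same window-function syntax, so an equivalent sketch there (using the THE_TABLE placeholder from the answer above, with `order` backquoted because it is a reserved word) would be:
SELECT *
FROM (SELECT t.*, SUM(value) OVER (ORDER BY `order`) AS sumvalues
      FROM THE_TABLE t) a
WHERE sumvalues - value < 30;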