Iterate through a column and summarize findings - mysql

I have a table (t1) in MySQL that looks like this:
type time full
0 11 yes
1 22 yes
0 11 no
3 13 no
I would like to create a second table (t2) from this that will summarize the information found in t1 like the following:
type time num_full total
0 11 1 2
1 22 1 1
3 13 0 1
I want to be able to iterate through the type column in order to be able to start this summary, something like a for-loop. The types can be up to a value of n, so I would rather not write n+1 WHERE statements, then have to update the code every time more types are added.
Notice how t2 skipped the type of value 2? This has also been escaping me when I try looping. I only want the types actually found in t1 to have rows created in t2.
While a direct answer would be nice, it would be much more helpful to be pointed to some sources where I could figure this out, or both.

This may do what you want
create table if not exists t2 as
select type, time, sum(`full` = 'yes') as num_full, count(*) as total
from t1
group by type, time
order by type, time;
depending on how you want to aggregate the time column.
This is a starting point for reference on the GROUP BY (aggregate) functions: https://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
and here is the CREATE TABLE syntax:
https://dev.mysql.com/doc/refman/5.6/en/create-table.html
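As a self-contained illustration of why GROUP BY needs no loop, the sketch below rebuilds the sample data and derives t2 from it. The column types, and storing full as the strings 'yes'/'no', are assumptions, since the question does not show the table definition.

```sql
-- Assumed definition of t1; the real column types are not given in the question.
CREATE TABLE t1 (
  type   INT,
  time   INT,
  `full` VARCHAR(3)
);

INSERT INTO t1 (type, time, `full`) VALUES
  (0, 11, 'yes'),
  (1, 22, 'yes'),
  (0, 11, 'no'),
  (3, 13, 'no');

-- One output row per (type, time) pair that actually occurs in t1.
-- In MySQL the comparison `full` = 'yes' evaluates to 1 or 0,
-- so SUM() over it counts the 'yes' rows in each group.
CREATE TABLE t2 AS
SELECT type,
       time,
       SUM(`full` = 'yes') AS num_full,
       COUNT(*)            AS total
FROM t1
GROUP BY type, time;

SELECT * FROM t2 ORDER BY type, time;
```

Type 2 never appears in t1, so it forms no group and gets no row in t2. Types added to t1 later are picked up the next time the query runs, with no change to the SQL.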

Related

How this query can be answered ? Select SUM(1) FROM table

select * from "Test"."EMP"
id
1
2
3
4
5
Select SUM(1) FROM "Test"."EMP";
Select SUM(2) FROM "Test"."EMP";
Select SUM(3) FROM "Test"."EMP";
Why is the output of these queries the following?
5
10
15
Also, I don't understand why the table name is written like this: "Test"."EMP"
Your table has 5 records. The statement select 1 from test.emp returns 5 records, each with the value 1:
id
1
1
1
1
1
This is because the database engine simply returns 1 for each existing record without reading the contents of the row. The same happens for select <any static value> from test.emp.
The same happens for 2 and 3; for example, for 2:
id
2
2
2
2
2
Hence 5 records are returned with the static value, and the sum of those values is the product of the static number passed in the select statement and the total number of records in the table.
Additional fact: count(1) is sometimes recommended over count(*) on the grounds that it uses fewer resources, but in MySQL (InnoDB) the two are handled the same way, so there is no performance difference between them.
I don't think it's "Test"."EMP" with double quotes; in MySQL it's probably `Test`.`EMP` with backticks instead (double quotes only quote identifiers when the ANSI_QUOTES SQL mode is enabled). The notation means database_name.table_name. Qualifying the table name this way makes sure you query the EMP table in the Test database, whichever database is currently selected. Read more about identifier qualifiers.
As for SUM(x), the x gets repeated once for each row present in the table. So SUM(1) on 5 rows is 1+1+1+1+1, SUM(2) on 5 rows is 2+2+2+2+2, and so on.
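The arithmetic described in these answers can be checked with a small sketch. The unqualified table name emp and its single INT column are assumptions made to keep the example self-contained.

```sql
-- Five rows, as in the question.
CREATE TABLE emp (id INT);
INSERT INTO emp (id) VALUES (1), (2), (3), (4), (5);

-- A constant inside SUM() is added once per row,
-- so SUM(k) equals k multiplied by the row count.
SELECT SUM(1)       AS sum_1,       -- 1+1+1+1+1 = 5
       SUM(2)       AS sum_2,       -- 2+2+2+2+2 = 10
       SUM(3)       AS sum_3,       -- 3+3+3+3+3 = 15
       COUNT(*)     AS row_count,   -- 5
       3 * COUNT(*) AS three_times  -- 15, same as SUM(3)
FROM emp;
```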

MySql: adding columns dynamically, as many as rows in another table

Transport table
id name
1 T1
2 T2
Pallets table
id name
1 P1
2 P2
Transport Pallet Capacity table
id transport_id pallet_id capacity
1 1 1 10
2 1 2 null
3 2 1 20
4 2 2 24
How to generate table like this:
id transport_id pallet_id_1_capacity pallet_id_2_capacity
1 1 10 null
2 2 20 24
Problem: pallets and transports can be added, so, neither quantity is known in advance.
For example, manager adds another pallet type and 'pallet_id_3_capacity' column should be generated (and can show null if no capacity data is yet available).
Another manager can fill 'transport pallet capacity' table later when notified.
Is there a way to build sql in mysql that will care about the above: specifically - dynamic number of pallets?
The SQL select-list must be fixed at the time you write the query. You can't make SQL that auto-expands its columns based on the data it finds.
But your request is common; it's called a pivot table or a crosstab.
The only solution is to do this in multiple steps:
1. Query to discover the distinct pallet ids.
2. Use application code to build a dynamic SQL query with as many columns as distinct pallet id values found in the first query.
3. Run the resulting dynamic SQL query.
This is true for all SQL databases, not just MySQL.
See MySQL pivot row into dynamic number of columns for a highly-voted solution for producing a pivot-table query in MySQL.
I am not voting your question as a duplicate of that question, because your query also involves transport_id, which will make the query solution a bit different. But reading about other pivot-table solutions should get you started.
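The three steps can be sketched as follows. This variation builds the dynamic query with MySQL user variables and a prepared statement instead of application code, which is a swapped-in technique rather than what the answer prescribes. The table names pallets and transport_pallet_capacity and the column types are assumptions based on the question's sample data.

```sql
-- Assumed schema, reduced to the columns the pivot needs.
CREATE TABLE pallets (id INT PRIMARY KEY, name VARCHAR(20));
CREATE TABLE transport_pallet_capacity (
  id INT PRIMARY KEY,
  transport_id INT,
  pallet_id INT,
  capacity INT
);
INSERT INTO pallets VALUES (1, 'P1'), (2, 'P2');
INSERT INTO transport_pallet_capacity VALUES
  (1, 1, 1, 10), (2, 1, 2, NULL), (3, 2, 1, 20), (4, 2, 2, 24);

-- GROUP_CONCAT truncates its result at group_concat_max_len (1024 by default),
-- so raise the limit before generating a long column list.
SET SESSION group_concat_max_len = 100000;

-- Step 1: discover the pallet ids and emit one pivot column expression per id.
SELECT GROUP_CONCAT(
         CONCAT('MAX(CASE WHEN pallet_id = ', id,
                ' THEN capacity END) AS pallet_id_', id, '_capacity')
         ORDER BY id SEPARATOR ', ')
  INTO @cols
  FROM pallets;

-- Step 2: assemble the full query text around the generated column list.
SET @sql = CONCAT('SELECT transport_id, ', @cols,
                  ' FROM transport_pallet_capacity',
                  ' GROUP BY transport_id ORDER BY transport_id');

-- Step 3: run the dynamic query.
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

With the sample data this should return transport 1 with capacities 10 and NULL, and transport 2 with 20 and 24. Because the column list is generated from pallets, a newly added pallet gets its own column (all NULL) before any capacity rows exist. Two limitations of this sketch: a transport with no capacity rows at all does not appear, and an empty pallets table leaves @cols NULL, so PREPARE fails.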

what does this sql query do? SELECT column_1 FROM table_1,table_2;

SELECT column_1 FROM table_1,table_2;
When I ran this on my database it returned a huge number of rows with duplicate column_1 values. I could not understand why I got these results. Please explain what this query does.
It gives you the cross product of table_1 and table_2.
In more layman's terms, it means that for each record in Table A, you get every record from Table B (all possible combinations).
TableA with 3 records and Table B with 3 records gives 9 total records in the result:
TableA-1/B-1
TableA-1/B-2
TableA-1/B-3
TableA-2/B-1
TableA-2/B-2
TableA-2/B-3
TableA-3/B-1
TableA-3/B-2
TableA-3/B-3
Often used as a basis for Cartesian Queries (which themselves are the means to generate, say, a list of future dates based on a recurrence schedule: give me all possible results for the next 6 months, then restrict that set to those whose factor matches my day of the week)
This is a valid way of cross joining two tables, but it is not the preferred one; an explicit CROSS JOIN would be much clearer. If the tables are related, a join with an ON condition would then limit the results to the matching rows.
Imagine that I have a table of 3 friends named Jhon, Ana and Nick, and another table with 2 T-shirts, a red one and a yellow one, and I want to know which T-shirt belongs to whom.
With tableA being Friends and tableB being Tshirts, the query returns:
1|JHON | t-shirt_YELLOW
2|JHON | t-shirt_RED
3|ANA | t-shirt_YELLOW
4|ANA | t-shirt_RED
5|NICK | t-shirt_YELLOW
6|NICK | t-shirt_RED
As you can see, this join has no relational logic between Friends and Tshirts, so it evaluates every possible combination, which generates what you call duplicates.
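The friends and T-shirts example can be written out as a runnable sketch. The table definitions are assumptions, and the owner column with its values is hypothetical, added only to show how a join condition removes the unwanted combinations.

```sql
CREATE TABLE friends (name VARCHAR(20));
-- owner is a hypothetical column linking each shirt to a friend.
CREATE TABLE tshirts (color VARCHAR(20), owner VARCHAR(20));

INSERT INTO friends (name) VALUES ('Jhon'), ('Ana'), ('Nick');
INSERT INTO tshirts (color, owner) VALUES ('yellow', 'Ana'), ('red', 'Nick');

-- Comma join with no condition: every friend paired with every shirt, 3 x 2 = 6 rows.
SELECT f.name, t.color
FROM friends f, tshirts t;

-- The same result, with the intent stated explicitly.
SELECT f.name, t.color
FROM friends AS f
CROSS JOIN tshirts AS t;

-- With a join condition only the related pairs remain: 2 rows.
SELECT f.name, t.color
FROM friends AS f
INNER JOIN tshirts AS t ON t.owner = f.name;
```

In the first two queries each name appears once per shirt, which is the duplication the question describes for column_1.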

When comparing current row with previous row the query is too slow

When subtracting the previous row from the current row the query is too slow, is there a more efficient way to do this?
I am trying to create a data filter that can distinguish events which occur sequentially from those that do not. I have a table of machine operational data, 'source', which is ordered chronologically. Using a WHERE clause I filter out the data which is of less relevance to this particular analysis. The remaining data is inserted into a new table, 'filtered'. Using the inserted ID numbers from 'source', I compare each row with its preceding row to find the difference in value: if the difference is 1 then the events have occurred in sequence, and if the difference is null then they have not.
My problem is with the length of time it takes to compare a row with the previous row. I have reduced my data volume to just 2.5% (275000 rows) of what its full volume will be, and the query takes 3012 seconds according to the MySQL Workbench action output. I have experimented with structuring the query differently but have ultimately reached dead ends. So my question is: is there a more efficient way to compare a row with its previous row?
OK – here are some more details.
/*First I create the table for the filtered data */
drop table if exists filtered_dta;
create table filtered_dta
(
ID int (11) not null auto_increment,
IDx1 int (11),
primary key (ID)
);
/* Then I insert the filtered data */
insert into filtered_dta (IDx1)
select seq from source
WHERE range_value < -1.75
and range_value > -5 ;
/* Then I compare each row with its previous */
select t1.ID, t1.IDx1,(t1.IDx1-t2.IDx1)
as seq_value
from filtered_dta t1
left outer join filtered_dta t2
on t1.IDx1 = t2.IDx1+1
order by IDx1
;
Here are sample tables.
Table - filtered_dta
| ID | IDx1 |
  1    3
  2    4
  3    7
  4    12
  5    13
  6    14
Results
| ID | IDx1 | seq_value |
  1    3      null
  2    4      1
  3    7      null
  4    12     null
  5    13     1
  6    14     1
A full data set from the source table is expected to be between 3 and 10 million rows. The database will create and use about 50 tables. This database is being used as a back end engine for simulation software which does not have the capacity to process this amount of data and give an appropriate analysis of the system which the data represents.
I have spent some time on the issue and have come across the following;
It may be that the find_seq table is created with MyISAM and requires converting to an InnoDB table. I tried setting the default engine to InnoDB but saw no noticeable difference.
This question was similar in its problem of a slow query MySQL query painfully slow on large data - but its issue lay in having a function in a where clause – from my action output I can see the where clause is not too slow.
I would appreciate any input anyone may have on this. Also I am not a proficient user of MySQL so if possible give details.
Kind regards.
You can use something like this template to identify sequential "islands" without a self-join:
SELECT @island := @island + IF(seqId <> @lastSeqId + 1, 1, 0) AS island
, orderingQ.[fieldsYouWant]
, @lastSeqId := seqId
FROM (
SELECT [fieldsYouWant], [sequentialIdentifier] AS seqId
FROM [theTable] AS t
, (SELECT @island := 0, @lastSeqId := [somethingItCannotBe]) AS init_dnr -- Initializes variables, do not reference
WHERE [filteringConditionsMet]
ORDER BY [orderingCriteria]
) AS orderingQ
;
I tried keeping it as generic as possible, but you'll note I had to revert to the assumption that seqId was numeric and expected to increment by one. Conditions in the island calculation can be much more complicated if needed (for cases such as where (A, 1), (A, 2), (B, 3) should be two islands based on the sequence not being defined by a single value).
You can take this template further, to identify "island" boundaries and sizes, by simply making the above query a subquery of something like:
SELECT island, MIN(seqId), MAX(seqId), COUNT(seqId)
FROM ([above query]) AS islandQ
GROUP BY island
;
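Two other options for the original row-to-previous-row comparison are sketched below, using the asker's filtered_dta table and sample values. The first assumes MySQL 8.0 or later, where the LAG() window function is available; it is a swapped-in technique, not the user-variable template above. The second keeps the self-join but adds an index, whose name idx_IDx1 is hypothetical, and moves the arithmetic off the joined table's column so that the index can be used for the lookup.

```sql
CREATE TABLE filtered_dta (
  ID   INT NOT NULL AUTO_INCREMENT,
  IDx1 INT,
  PRIMARY KEY (ID)
);
INSERT INTO filtered_dta (IDx1) VALUES (3), (4), (7), (12), (13), (14);

-- Option 1 (assumes MySQL 8.0+): compare each row with the previous one in a single pass.
-- seq_value is 1 when the previous IDx1 is exactly one less, otherwise NULL.
SELECT ID,
       IDx1,
       CASE WHEN IDx1 - LAG(IDx1) OVER (ORDER BY IDx1) = 1 THEN 1 END AS seq_value
FROM filtered_dta
ORDER BY IDx1;

-- Option 2 (any version): index the compared column and keep t2.IDx1 bare
-- in the join condition so each lookup can use the index.
ALTER TABLE filtered_dta ADD INDEX idx_IDx1 (IDx1);

SELECT t1.ID,
       t1.IDx1,
       t1.IDx1 - t2.IDx1 AS seq_value
FROM filtered_dta AS t1
LEFT OUTER JOIN filtered_dta AS t2
  ON t2.IDx1 = t1.IDx1 - 1
ORDER BY t1.IDx1;
```

For the sample values both queries should reproduce the Results table from the question: NULL, 1, NULL, NULL, 1, 1. How much either option helps at 275000 rows or more would need to be measured on the real data.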

Obtain running frequency distribution from previous N rows of MySQL database

I have a MySQL database where one column contains status codes. The column is of type int and the values will only ever be 100,200,300,400. It looks like below; other columns removed for clarity.
id | status
----------------
1 300
2 100
3 100
4 200
5 300
6 300
7 100
8 400
9 200
10 300
11 100
12 400
13 400
14 400
15 300
16 300
The id field is auto-generated and will always be sequential. I want to have a third column displaying a comma-separated string of the frequency distribution of the status codes of the previous 10 rows. It should look like this.
id | status | freq
-----------------------------------
1 300
2 100
3 100
4 200
5 200
6 300
7 100
8 400
9 300
10 300
11 100 300,100,200,400 -- from rows 1-10
12 400 100,300,200,400 -- from rows 2-11
13 400 100,300,200,400 -- from rows 3-12
14 400 300,400,100,200 -- from rows 4-13
15 300 400,300,100,200 -- from rows 5-14
16 300 300,400,100 -- from rows 6-15
I want the most frequent code listed first. And where two status codes have the same frequency it doesn't matter to me which is listed first but I did list the smaller code before the larger in the example. Lastly, where a code doesn't appear at all in the previous ten rows, it shouldn't be listed in the freq column either.
And to be very clear the row number that the frequency string appears on does NOT take into account the status code of that row; it's only the previous rows.
So what have I done? I'm pretty green with SQL. I'm a programmer and I find this SQL language a tad odd to get used to. I managed the following self-join select statement.
select *, avg(b.status) freq
from sample a
join sample b
on (b.id < a.id) and (b.id > a.id - 11)
where a.id > 10
group by a.id;
Using the aggregate function avg, I can at least demonstrate the concept. The derived table b provides the correct rows to the avg function but I just can't figure out the multi-step process of counting and grouping rows from b to get a frequency distribution and then collapse the frequency rows into a single string value.
Also I've tried using standard stored functions and procedures in place of the built-in aggregate functions, but it seems the b derived table is out of scope or something. I can't seem to access it. And from what I understand writing a custom aggregate function is not possible for me as it seems to require developing in C, something I'm not trained for.
Here's sql to load up the sample.
create table sample (
id int NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
status int
);
insert into sample(status) values(300),(100),(100),(200),(200),(300)
,(100),(400),(300),(300),(100),(400),(400),(400),(300),(300),(300)
,(100),(400),(100),(100),(200),(500),(300),(100),(400),(200),(100)
,(500),(300);
The sample has 30 rows of data to work with. I know it's a long question, but I just wanted to be as detailed as I could be. I've worked on this for a few days now and would really like to get it done.
Thanks for your help.
The only way I know of to do what you're asking is to use a BEFORE INSERT trigger. It has to be BEFORE INSERT because you want to update a value in the row being inserted, which can only be done in a BEFORE trigger. Unfortunately, that also means it won't have been assigned an ID yet, so hopefully it's safe to assume that at the time a new record is inserted, the last 10 records in the table are the ones you're interested in. Your trigger will need to get the values of the last 10 ID's and use the GROUP_CONCAT function to join them into a single string, ordered by the COUNT. I've been using SQL Server mostly and I don't have access to a MySQL server at the moment to test this, but hopefully my syntax will be close enough to at least get you moving in the right direction:
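-- Assumes the sample table already has a freq varchar(50) column.
DELIMITER //
create trigger sample_trigger BEFORE INSERT ON sample
FOR EACH ROW
BEGIN
DECLARE _freq varchar(50);
-- Count each status among the 10 most recent rows, then join the codes, most frequent first.
SELECT GROUP_CONCAT(tbl.status ORDER BY tbl.Occurrences DESC, tbl.status) INTO _freq
FROM (SELECT last10.status, COUNT(*) AS Occurrences
FROM (SELECT status FROM sample ORDER BY id DESC LIMIT 10) AS last10
GROUP BY last10.status) AS tbl;
SET new.freq = _freq;
END//
DELIMITER ;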
SELECT id, GROUP_CONCAT(status ORDER BY freq desc) FROM
(SELECT a.id as id, b.status, COUNT(*) as freq
FROM
sample a
JOIN
sample b ON (b.id < a.id) AND (b.id > a.id - 11)
WHERE
a.id > 10
GROUP BY a.id, b.status) AS sub
GROUP BY id;
SQL Fiddle