MySQL recursive self join

create table test(
  container varchar(1),
  contained varchar(1)
);
insert into test values('X','A');
insert into test values('X','B');
insert into test values('X','C');
insert into test values('Y','D');
insert into test values('Y','E');
insert into test values('Y','F');
insert into test values('A','P');
insert into test values('P','Q');
insert into test values('Q','R');
insert into test values('R','Y');
insert into test values('Y','X');
mysql> select * from test;
+-----------+-----------+
| container | contained |
+-----------+-----------+
| X         | A         |
| X         | B         |
| X         | C         |
| Y         | D         |
| Y         | E         |
| Y         | F         |
| A         | P         |
| P         | Q         |
| Q         | R         |
| R         | Y         |
| Y         | X         |
+-----------+-----------+
11 rows in set (0.00 sec)
Can I find out all the distinct values contained under 'X' using a single self join?
EDIT
For example, here:
X contains A, B and C
A contains P
P contains Q
Q contains R
R contains Y
Y contains D, E and F (and X, which closes a cycle)...
So I want to display A, B, C, D, E, F, P, Q, R and Y when I query for X.
EDIT
Got it working programmatically.
package com.catgen.helper;

import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import com.catgen.factories.Nm2NmFactory;

public class Nm2NmHelper {
    private List<String> fetched;
    private List<String> fresh;

    // Breadth-first walk over the containment graph: "fresh" holds ids still
    // to be expanded, "fetched" holds ids already visited (and is the result).
    public List<String> findAllContainedNMByMarketId(Connection conn, String marketId) throws SQLException {
        fetched = new ArrayList<String>();
        fresh = new ArrayList<String>();
        fresh.add(marketId.toLowerCase());
        while (!fresh.isEmpty()) {
            // Move the next id from the work list into the visited list.
            fetched.add(fresh.remove(0));
            List<String> tempList = Nm2NmFactory.getContainedNmByContainerNm(conn, fetched.get(fetched.size() - 1));
            if (tempList != null) {
                for (String item : tempList) {
                    String current = item.toLowerCase();
                    // Skip anything already visited or queued: this check is
                    // what stops the X -> ... -> Y -> X cycle from looping forever.
                    if (!fetched.contains(current) && !fresh.contains(current)) {
                        fresh.add(current);
                    }
                }
            }
        }
        return fetched;
    }
}
It's not the same table and fields, but I hope you get the concept.
Thanks guys.

You can't get all the contained objects recursively using a single join with that data structure. You would need a recursive query, but MySQL doesn't yet support that.
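For readers on MySQL 8.0 or later: WITH RECURSIVE has since been added, so a recursive query is now possible. A minimal sketch against this exact table, using a path string to guard against the X -> ... -> Y -> X cycle in the sample data (note it also returns X itself, since X is reachable from X through that cycle):
WITH RECURSIVE reach (contained, path) AS (
  SELECT contained, CAST(CONCAT(',', contained, ',') AS CHAR(200))
  FROM test
  WHERE container = 'X'
  UNION ALL
  SELECT t.contained, CONCAT(r.path, t.contained, ',')
  FROM test t
  JOIN reach r ON t.container = r.contained
  WHERE r.path NOT LIKE CONCAT('%,', t.contained, ',%')
)
SELECT DISTINCT contained FROM reach;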
You could however construct a closure table; then you can do it with a simple query. See Bill Karwin's slideshow Models for hierarchical data for more details and other approaches (for example, nested sets). Slide 69 compares the different designs for ease of implementing 'Query subtree'. Your chosen design (adjacency list) is the most awkward of all four designs for this type of query.
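For illustration, a minimal sketch of the closure-table idea (the table name is hypothetical; a closure table stores one row per ancestor/descendant pair and must be maintained whenever a containment link is added or removed):
CREATE TABLE test_closure (
  ancestor   varchar(1),
  descendant varchar(1),
  PRIMARY KEY (ancestor, descendant)
);

-- once populated, "everything contained under X" is a single simple query:
SELECT descendant FROM test_closure WHERE ancestor = 'X';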

What about reading the whole table into a PHP array and determining the children via a function that calls itself, as sketched below?
This is not a good solution if the table has more than 10000 rows, though...
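A sketch of that idea, in Python for brevity (names are illustrative); the visited set is what keeps the X -> ... -> Y -> X cycle from recursing forever:
def contained_under(pairs, root):
    # pairs: the whole table, loaded once as (container, contained) tuples
    children = {}
    for container, contained in pairs:
        children.setdefault(container, []).append(contained)

    seen = {root}
    order = []

    def walk(node):
        for child in children.get(node, []):
            if child not in seen:  # skip anything already collected
                seen.add(child)
                order.append(child)
                walk(child)

    walk(root)
    return order

# with the sample data: contained_under(rows, 'X')
# -> ['A', 'P', 'Q', 'R', 'Y', 'D', 'E', 'F', 'B', 'C']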

Related

SQLAlchemy - filtering rows before today with autoloaded DATETIME column?

I have a MariaDB database radio_progs with a table FUTUREEPISODE. I'm using SQLAlchemy and am trying to add a function that selects all entries in the table with an end time before today.
I'm having problems with the datetime field, though. Is this because I'm autoloading the fields? In my real-world example I have a number of columns, so I would prefer to autoload rather than specify each one individually.
The error is:
eps = self.query.filter_by(IN_LIST=1, EP_ENDTIME < todays_datetime).all()
^
SyntaxError: positional argument follows keyword argument
The table has the following columns:
| Column | Type |
| ---------- | ---------- |
| ID | int(11) |
| EP_ENDTIME | datetime |
| IN_LIST | tinyint(1) |
from datetime import datetime
from sqlalchemy import and_, func
from .dbmgr import db

class FutureEpisode(db.Model):
    __bind_key__ = 'radio_progs'
    __tablename__ = 'FUTUREEPISODE'
    __table_args__ = {
        'autoload': True,
        'autoload_with': db.engine
    }

    def get_expired(self):
        todays_datetime = datetime(datetime.today().year, datetime.today().month, datetime.today().day)
        eps = self.query.filter_by(IN_LIST=1, EP_ENDTIME < todays_datetime).all()
        return eps
Using filter rather than filter_by works, i.e. changing the query to:
eps = self.query.filter(FutureEpisode.IN_LIST == 1, FutureEpisode.EP_ENDTIME < todays_datetime).all()
filter_by() only accepts keyword arguments (simple column = value equality tests), so the expression EP_ENDTIME < todays_datetime is parsed as a positional argument following a keyword argument, and Python raises the SyntaxError before SQLAlchemy ever sees the query. filter() accepts full SQL expressions, including comparisons.
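For completeness, the whole method with that change applied (a sketch, using the same model as above):
def get_expired(self):
    # midnight at the start of today
    todays_datetime = datetime(datetime.today().year,
                               datetime.today().month,
                               datetime.today().day)
    # filter() accepts full SQL expressions, so comparisons like < work;
    # filter_by() only accepts keyword equality tests (column=value)
    return self.query.filter(
        FutureEpisode.IN_LIST == 1,
        FutureEpisode.EP_ENDTIME < todays_datetime,
    ).all()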

Loading quoted numbers into a Snowflake table from CSV with COPY INTO <table>

I have a problem loading CSV data into a Snowflake table. Fields are wrapped in double quote marks, and hence there is a problem importing them into the table.
I know that COPY INTO has the CSV-specific option FIELD_OPTIONALLY_ENCLOSED_BY = '"', but it's not working at all.
Here are some pieces of the table definition and the copy command:
CREATE TABLE ...
(
GamePlayId NUMBER NOT NULL,
etc...
....);
COPY INTO ...
FROM '...csv.gz'
FILE_FORMAT = (TYPE = CSV
               STRIP_NULL_VALUES = TRUE
               FIELD_DELIMITER = ','
               SKIP_HEADER = 1
               ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
               FIELD_OPTIONALLY_ENCLOSED_BY = '"')
ON_ERROR = "ABORT_STATEMENT";
The CSV file looks like this:
"3922000","14733370","57256","2","3","2","2","2019-05-23 14:14:44",",00000000",",00000000",",00000000",",00000000","1000,00000000","1000,00000000","1317,50400000","1166,50000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000",",00000000"
I get this error:
Numeric value '"3922000"' is not recognized
I'm pretty sure it's because the NUMBER value is interpreted as a string when Snowflake reads the "" marks, but since I use
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
they shouldn't even be there... Does anyone have a solution to this?
Maybe something is incorrect with your file? I was just able to run the following without issue.
1. create the test table:
CREATE OR REPLACE TABLE
dbNameHere.schemaNameHere.stacko_58322339 (
    num1 NUMBER,
    num2 NUMBER,
    num3 NUMBER);
2. create a test file with the following contents:
1,2,3
"3922000","14733370","57256"
3,"2",1
4,5,"6"
3. create a stage and put the file in the stage:
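For example (run from the SnowSQL client; the local file path here is illustrative):
CREATE OR REPLACE STAGE stageNameHere;
-- PUT uploads the local file; AUTO_COMPRESS gzips it on the way up
PUT file:///tmp/stacko_58322339.csv @stageNameHere AUTO_COMPRESS = TRUE;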
4. run the following copy command
COPY INTO dbNameHere.schemaNameHere.STACKO_58322339
FROM @stageNameHere/stacko_58322339.csv.gz
FILE_FORMAT = (TYPE = CSV
               STRIP_NULL_VALUES = TRUE
               FIELD_DELIMITER = ','
               SKIP_HEADER = 0
               ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
               FIELD_OPTIONALLY_ENCLOSED_BY = '"')
ON_ERROR = "CONTINUE";
5. results:
+--------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file                                 | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|--------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| stageNameHere/stacko_58322339.csv.gz | LOADED |           4 |           4 |           4 |           0 | NULL        | NULL             | NULL                  | NULL                    |
+--------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 2.436s
6. view the records:
>SELECT * FROM dbNameHere.schemaNameHere.stacko_58322339;
+---------+----------+-------+
| NUM1    | NUM2     | NUM3  |
|---------+----------+-------|
|       1 |        2 |     3 |
| 3922000 | 14733370 | 57256 |
|       3 |        2 |     1 |
|       4 |        5 |     6 |
+---------+----------+-------+
Can you try a similar test with this?
EDIT: A quick look at your data shows that many of your numeric fields appear to start with commas, so something is definitely amiss with the data.
Assuming your numbers are European-formatted, with , as the decimal separator and . as the thousands separator: reading the numeric formatting help, it seems Snowflake does not support this as input. I'd open a feature request.
But if you read the column in as text and then use REPLACE, like
SELECT '100,1234'::text AS A
     , REPLACE(A, ',', '.') AS B
     , TRY_TO_DECIMAL(B, 20, 10) AS C;
gives:
A        B        C
100,1234 100.1234 100.1234000000
It would be safer to strip the thousands separators first, like:
SELECT '1.100,1234'::text AS A
     , REPLACE(A, '.') AS B
     , REPLACE(B, ',', '.') AS C
     , TRY_TO_DECIMAL(C, 20, 10) AS D;
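Putting the two answers together, one workable pattern is to land the quoted, comma-decimal fields as text and convert afterwards. A sketch with hypothetical staging-table and column names:
-- hypothetical staging table: everything lands as text first
CREATE OR REPLACE TABLE game_play_staging (GamePlayId VARCHAR, Amount VARCHAR);

COPY INTO game_play_staging
FROM @stageNameHere/data.csv.gz
FILE_FORMAT = (TYPE = CSV
               FIELD_DELIMITER = ','
               SKIP_HEADER = 1
               FIELD_OPTIONALLY_ENCLOSED_BY = '"');

-- convert while moving into the real table: strip '.' thousands separators,
-- turn the ',' decimal separator into '.', then parse
INSERT INTO game_play (GamePlayId, Amount)
SELECT TRY_TO_NUMBER(GamePlayId),
       TRY_TO_DECIMAL(REPLACE(REPLACE(Amount, '.'), ',', '.'), 20, 10)
FROM game_play_staging;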

Django query on related model

I have models like the ones below:
class Scheduler(models.Model):
    id = <this is the primary key>
    last_run = <foreign key referencing id in RunLogs below>

class RunLogs(models.Model):
    id = <primary key>
    scheduler = <foreign key referencing id in Scheduler above>
    overall_status = <String>
A RunLogs entry is created only when the scheduler reaches the job's scheduled time.
Now I am querying RunLogs to show running schedules, as below.
current = RunLog.objects\
    .filter(Q(overall_status__in=("RUNNING", "ON-HOLD", "QUEUED")) |
            Q(scheduler__last_run__isnull=True))
The above query gives me all records with a matching status from RunLogs, but does not give me the records from Scheduler whose last_run is null.
I understand why the query behaves this way, but is there a way to also get the Scheduler records whose last_run is null?
I just did the same steps you followed and found the reason why you were getting all the records after running your query. Here are the exact steps and a solution.
Steps
Created the models:
from django.db import models

class ResourceLog(models.Model):
    id = models.BigIntegerField(primary_key=True)
    resource_mgmt = models.ForeignKey('ResourceMgmt', on_delete=models.DO_NOTHING,
                                      related_name='cpe_log_resource_mgmt')
    overall_status = models.CharField(max_length=8, blank=True, null=True)

class ResourceMgmt(models.Model):
    id = models.BigIntegerField(primary_key=True)
    last_run = models.ForeignKey(ResourceLog, on_delete=models.DO_NOTHING, blank=True, null=True)
Added the following data:
resource_log
+----+----------------+------------------+
| id | overall_status | resource_mgmt_id |
+----+----------------+------------------+
| 1 | RUNNING | 1 |
| 2 | QUEUED | 1 |
| 3 | QUEUED | 1 |
+----+----------------+------------------+
resource_mgmt
+----+-------------+
| id | last_run_id |
+----+-------------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
| 4 | 3 |
+----+-------------+
According to the above tables, resource_mgmt(4) refers to resource_log(3). But note that resource_log(3) does not refer back to resource_mgmt(4).
Ran the following commands in the Python shell:
In [1]: resource_log = ResourceLog.objects.get(id=1)
In [2]: resource_log.resource_mgmt
Out[2]: <ResourceMgmt: ResourceMgmt object (1)>
In [3]: resource_log = ResourceLog.objects.get(id=2)
In [4]: resource_log.resource_mgmt
Out[4]: <ResourceMgmt: ResourceMgmt object (1)>
In [5]: resource_log = ResourceLog.objects.get(id=3)
In [6]: resource_log.resource_mgmt
Out[6]: <ResourceMgmt: ResourceMgmt object (1)>
From this we can see that all the resource_log objects refer to the 1st resource_mgmt object (i.e. id=1).
Q) Why do all the objects refer to the 1st object in resource_mgmt?
resource_mgmt is a non-nullable foreign key field, and in this test data its default value is 1. When you create a resource_log object without specifying resource_mgmt, that default of 1 is stored.
Run your query
In [60]: ResourceLog.objects.filter(resource_mgmt__last_run__isnull = True)
Out[60]: <QuerySet [<ResourceLog: ResourceLog object (1)>, <ResourceLog: ResourceLog object (2)>, <ResourceLog: ResourceLog object (3)>]>
This query returns all three ResourceLog objects because all three refer to the 1st resource_mgmt object, whose last_run is NULL.
Solution
You actually want to check the reverse relationship.
We can achieve this using two queries:
rm_ids = ResourceMgmt.objects.exclude(last_run=None).values_list('last_run', flat=True)
current = ResourceLog.objects.filter(overall_status__in=("RUNNING", "QUEUED")).exclude(id__in=rm_ids)
The output is:
<QuerySet [<ResourceLog: ResourceLog object (1)>, <ResourceLog: ResourceLog object (2)>]>
Hope that helps!
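For reference, the same exclusion can also be written as a single lazy queryset with a subquery (a sketch using the models above):
current = (
    ResourceLog.objects
    .filter(overall_status__in=("RUNNING", "QUEUED"))
    # exclude any log that some ResourceMgmt row already points at via last_run
    .exclude(id__in=ResourceMgmt.objects
                                .filter(last_run__isnull=False)
                                .values("last_run"))
)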

Copy previous values - Kettle (Pentaho)

I have an issue and I'm going around in circles with it! :| I hope someone can help me.
I have a simple input file (.xls), but there is a column (let's say it's "ROW1") that looks like this:
ROW1 | ROW2 | ROW3       | ROW_N
765  | 1    | AAAA-MM-DD | ...
null | 1    | AAAA-MM-DD | ...
null | 1    | AAAA-MM-DD | ...
944  | 2    | AAAA-MM-DD | ...
null | 2    | AAAA-MM-DD | ...
088  | 7    | AAAA-MM-DD | ...
555  | 2    | AAAA-MM-DD | ...
null | 2    | AAAA-MM-DD | ...
As you can see, there is no standard here. Some lines have a null ROW1, and ROW2 repeats the same numbers with different ROW1 associations (as in lines 5 and 6, and again in lines 8 and 9).
My objective is to copy the value of ROW1 down into the null rows that follow it, until the next non-null value. Basically, copy from the previous row whenever the value is null.
I'm trying to use the Formula step, with something like:
=IF(AND(ISBLANK([ROW1]);NOT(ISBLANK([ROW2]));ROW_n=ROW1;IF(AND(NOT(ISBLANK([ROW1]));NOT(ISBLANK([ROW2]));ROW_n=ROW1;ROW_n=""));
But nothing works yet..
I've also tried the Analytic Query step, but no luck there either.
I'm just streaming from an XLS file input.
Thanks very much, any help is very much appreciated!!
Best regards!
Well, I found a solution: adding a User Defined Java Class step with the code below:
private FieldHelper output_field, card_field;
private RowSet out;
private String previous_card = null;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    if (first)
    {
        first = false;
        out = findTargetRowSet("out");
        output_field = get(Fields.Out, "previous_card");
    } else {
        Object[] r = getRow();
        if (r == null) {
            setOutputDone();
            return false;
        }
        r = createOutputRow(r, data.outputRowMeta.size());
        // write the remembered value into the output field
        if (previous_card != null) {
            output_field.setValue(r, previous_card);
        }
        if (card_field == null) {
            card_field = get(Fields.In, "Grupo de Cartões");
        }
        String card = card_field.getString(r);
        // only a non-empty value replaces the remembered one
        if (card != null && !card.isEmpty()) {
            previous_card = card;
        }
        // Send the row on to the next step.
        putRowTo(data.outputRowMeta, r, out);
    }
    return true;
}
After this I had to add a few more steps, but this helped very much.
Thank you, mates!!
Finally I got the result. Please follow the steps below (the original answer illustrated each step with screenshots).
The transformation starts with a Data Grid holding the sample data. Sorry, I don't have Microsoft Excel on my machine, which is why I used a Data Grid; instead of the Data Grid you can drag and drop a Microsoft Excel Input step.
Next, drag and drop a JavaScript step and write the fill-down code.
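The script itself was shown only as a screenshot in the original answer; a minimal fill-down script for a Modified Java Script Value step might look like this (a sketch: it assumes the incoming column is named ROW1 and that the step keeps its script scope between rows):
// remember the last non-null ROW1 across rows; the typeof guard keeps
// the value from being reset when the script re-runs for each row
var prev_row1 = (typeof prev_row1 == 'undefined') ? null : prev_row1;

// filled_row1 becomes a new output field on the row
var filled_row1 = ROW1;
if (ROW1 == null) {
    filled_row1 = prev_row1;   // copy the previous value down
} else {
    prev_row1 = ROW1;          // remember the new value
}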
As the last step of the transformation, drag and drop a Select values step and select the columns (this step is not strictly necessary).
The final result will then have ROW1 filled down as described in the question.
Hope this helps.

Summarizing/aggregating a Scala Slick object into another

I'm essentially trying to recreate the following SQL query using Scala Slick:
select labelOne, labelTwo, sum(countA), sum(countB) from things where date > 'blah' group by labelOne, labelTwo;
As you can see, it takes a table of labeled things and aggregates them, summing various counts. Given a table with the following info:
ID | date | labelOne | labelTwo | countA | countB
-------------------------------------------------
0  | 0    | foo      | cheese   | 1      | 2
1  | 0    | bar      | wine     | 0      | 3
2  | 1    | foo      | cheese   | 3      | 4
3  | 1    | bar      | wine     | 2      | 1
4  | 2    | foo      | beer     | 1      | 1
Should yield the following result if queried across all dates:
labelOne | labelTwo | countA | countB
-------------------------------------
foo      | cheese   | 4      | 6
bar      | wine     | 2      | 4
foo      | beer     | 1      | 1
This is what my Scala code looks like:
import scala.slick.driver.MySQLDriver.simple._
import scala.slick.jdbc.StaticQuery
import StaticQuery.interpolation
import org.joda.time.LocalDate
import com.github.tototoshi.slick.JodaSupport._

case class Thing(
  id: Option[Long],
  date: LocalDate,
  labelOne: String,
  labelTwo: String,
  countA: Long,
  countB: Long)

// summarized version of "Thing": note there's no date in this object
// each distinct grouping of Thing.labelOne + Thing.labelTwo should become a "SummarizedThing", with summed counts
case class SummarizedThing(
  labelOne: String,
  labelTwo: String,
  countASum: Long,
  countBSum: Long)

trait ThingsComponent {
  val Things: Things

  class Things extends Table[Thing]("things") {
    def id = column[Long]("id", O.PrimaryKey, O.AutoInc)
    def date = column[LocalDate]("date", O.NotNull)
    def labelOne = column[String]("labelOne", O.NotNull)
    def labelTwo = column[String]("labelTwo", O.NotNull)
    def countA = column[Long]("countA", O.NotNull)
    def countB = column[Long]("countB", O.NotNull)
    def * = id.? ~ date ~ labelOne ~ labelTwo ~ countA ~ countB <> (Thing.apply _, Thing.unapply _)
    val byId = createFinderBy(_.id)
  }
}

object Things extends DAO {
  def insert(thing: Thing)(implicit s: Session) { Things.insert(thing) }

  def findById(id: Long)(implicit s: Session): Option[Thing] = Things.byId(id).firstOption

  // ???
  def summarizeSince(date: LocalDate)(implicit s: Session): Set[SummarizedThing] = {
    Query(Things).where(_.date > date).groupBy(x => (x.labelOne, x.labelTwo)).map {
      case (thing: Thing) => {
        // obviously this line below is wrong, but you can get an idea of what I'm trying to accomplish:
        // create a new SummarizedThing for each unique labelOne + labelTwo combo, summing the count columns
        new SummarizedThing(thing.labelOne, thing.labelTwo, thing.countA.sum, thing.countB.sum)
      }
    } // presumably need to run the query and map to SummarizedThing here, perhaps?
  }
}
The summarizeSince function is where I'm having trouble. I seem to be able to query Things just fine, filtering by date, and grouping by my fields... however, I'm having trouble summing countA and countB. With the summed results, I'd then like to create a SummarizedThing for each unique labelOne + labelTwo combination. Hopefully that makes sense. Any help would be greatly appreciated.
presumably need to run the query and map to SummarizedThing here, perhaps?
Exactly.
Query(Things).filter(_.date > date).groupBy(x => (x.labelOne, x.labelTwo)).map {
  // match on (key, group)
  case ((labelOne, labelTwo), things) => {
    // prepare results as a tuple (note: .sum returns an Option)
    (labelOne, labelTwo, things.map(_.countA).sum.get, things.map(_.countB).sum.get)
  }
}.run.map(SummarizedThing.tupled) // run, then map each tuple into the case class
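Dropped into the question's DAO, the whole method would look roughly like this (a sketch; .toSet matches the declared Set[SummarizedThing] return type):
def summarizeSince(date: LocalDate)(implicit s: Session): Set[SummarizedThing] =
  Query(Things).filter(_.date > date)
    .groupBy(x => (x.labelOne, x.labelTwo))
    .map { case ((labelOne, labelTwo), things) =>
      (labelOne, labelTwo, things.map(_.countA).sum.get, things.map(_.countB).sum.get)
    }
    .run.map(SummarizedThing.tupled).toSet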
Same as the other answer, but expressed as a for comprehension. Note that .get throws on an empty Option, so you probably want getOrElse.
val q = for {
  ((l1, l2), ts) <- Things.where(_.date > date).groupBy(t => (t.labelOne, t.labelTwo))
} yield (l1, l2, ts.map(_.countA).sum.getOrElse(0L), ts.map(_.countB).sum.getOrElse(0L))

// see the SQL that this generates
println(q.selectStatement)
// select x2.`labelOne`, x2.`labelTwo`, sum(x2.`countA`), sum(x2.`countB`)
// from `things` x2 where x2.`date` > '2013' group by x2.`labelOne`, x2.`labelTwo`

// run the query, then map each result tuple into the case class
q.list.map(SummarizedThing.tupled)