How to join 2 Arrow tables? - pyarrow

I want to join two Arrow tables on a common attribute. Does Arrow have a C++ API for this? I found something called HashJoin, but I am not sure whether it can be used to join two tables. Any pointers would be immensely helpful.

If you are working with the C++ API then a join can be achieved with an ExecPlan. The ExecPlan API is still marked experimental but it should have some updated documentation soon. An example is being added as part of this PR. The crux of the example is:
ARROW_ASSIGN_OR_RAISE(left_source,
                      cp::MakeExecNode("scan", plan.get(), {}, l_scan_node_options));
ARROW_ASSIGN_OR_RAISE(right_source,
                      cp::MakeExecNode("scan", plan.get(), {}, r_scan_node_options));
arrow::compute::HashJoinNodeOptions join_opts{arrow::compute::JoinType::INNER,
                                              /*in_left_keys=*/{"lkey"},
                                              /*in_right_keys=*/{"rkey"}};
ARROW_ASSIGN_OR_RAISE(
    auto hashjoin,
    cp::MakeExecNode("hashjoin", plan.get(), {left_source, right_source}, join_opts));
You can check out HashJoinNodeOptions here.

Jquery find - increase of number

I have a simple jQuery function and I am trying to find the best way to write it. Basically, I want each div to find its matching contact form: cf1 should show contactform1, cf2 should show contactform2, and so on. What is the best way to write this, rather than duplicating the code multiple times?
$('.cf1').parent().parent().parent().find('.contactform1').show();
$('.cf2').parent().parent().parent().find('.contactform2').show();
$('.cf3').parent().parent().parent().find('.contactform3').show();
Thank you
You can use the closest function.
Example code:
$('.cf1').closest('.contactform1').show();
$('.cf2').closest('.contactform2').show();
$('.cf3').closest('.contactform3').show();
You must use both the closest and find jQuery methods.
With closest you go up to an element that is an ancestor of both of your elements (e.g. .cf1 and .contactform1); then with find you go back down to the right form.
$('.cf1').closest('.selector-parent-of-both').find('.contactform1').show();
$('.cf2').closest('.selector-parent-of-both').find('.contactform2').show();
$('.cf3').closest('.selector-parent-of-both').find('.contactform3').show();

How to highlight a range of selected months

I'm new to Angular. All the answers that I came across use jQuery and JavaScript. A few weeks ago my team got a requirement for a month-only picker, with the condition that we cannot bring in the component from anywhere outside the organization, not even Bootstrap, Material or PrimeNG. So I decided to create a custom one from scratch using HTML and CSS. Here is the screenshot:
app-monthpicker is a component and on top of it there is a parent component app-timeselector.
The monthpicker is working perfectly, but I'm not able to implement the logic for highlighting the selected range of months. All the solutions on Stack Overflow and other websites use jQuery or plain JavaScript, but here we're talking TypeScript. I've created a minimal stackblitz, and here is one more stackblitz created by one of the answerers. Can someone please help me with this, using HTML, CSS and TypeScript only? I want this:
You can see 6 months from the previous year and all the months from next year also. And they also need to be highlighted if they're in range. For now I need this for 2017 to 2025 only. I don't mind even if you hard-code these values for now.
PS: I'm afraid that my whole implementation is incorrect. Please correct me.
Ideally, for such a use case you should not reinvent the wheel and should instead leverage a good library that has already solved this problem. But if you want to make your current code work for this use case, here is what can be done:
demo: https://angular-zedvjx.stackblitz.io
implementation: https://stackblitz.com/edit/angular-zedvjx
At a high level:
I used an approach where all months are represented by one large array (monthsData), since the use case needs to support month selection across years and this way it is easier to iterate over it.
Then each month view is just a "slice" into this big array, so switching between years is switching between the "views" (a view here is monthArray.slice(viewStart, viewFinish)).
I also introduced state for the range to keep track of it more easily.
Update: wrote an article with cleaner implementation here: https://medium.com/better-programming/month-range-picker-in-angular-8-4ce93ef7d76b
I'll take another approach. You have four variables: lboundMonth, lboundYear, uboundMonth and uboundYear.
I think you can have something like this; as an example, I use October 2020 to January 2021:
lbound:{year:2020,month:10,yearMonth:"202010"} //yearMonth is in the form yyyyMM
ubound:{year:2021,month:1,yearMonth:"202101"}
Furthermore, you create an array with the months. As @Sergey says, we can create an array of months, but in my case each element looks like
{monthName: "january",month:1,yearMonth:"202001"}
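So, when you change the year, rebuild the array:
// arr holds the twelve month names; displayYear is the year currently shown
months=arr.map((x,index)=>{
  return {
    monthName:x,
    month:(index+1),
    // string concatenation gives yyyyMM, e.g. "202001"
    yearMonth:displayYear+('00'+(index+1)).slice(-2)
  }
})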
You only need to compare yearMonth with lbound and ubound in the loop. Something like
<div *ngFor="let month of months">
  <span [ngClass]="{'ubound':month.yearMonth==ubound.yearMonth,
                    'lbound':month.yearMonth==lbound.yearMonth,
                    'range':month.yearMonth>lbound.yearMonth &&
                            month.yearMonth<ubound.yearMonth
                   }"
        (click)="click(month)">{{month.monthName}}</span>
</div>
When you click, you have
click(month:any)
{
  const my={
    year:this.displayYear,
    month:month.month,
    yearMonth:month.yearMonth
  }
  // ...assign `my` to lbound or ubound
  //you emit:
  this.ouputToparent({
    lbound:this.lbound,
    ubound:this.ubound,
  })
  //or
  this.ouputToparent({
    lbound:{year:this.lbound.year,month:this.lbound.month},
    ubound:{year:this.ubound.year,month:this.ubound.month},
  })
}

XrController.hitTest not returning any ESTIMATED_SURFACE or DETECTED_SURFACE results

I'm calling
XrController.hitTest(X, Y, ['FEATURE_POINT','ESTIMATED_SURFACE', 'DETECTED_SURFACE'])
But all the results I'm getting are of type 'FEATURE_POINT' only.
If I leave out 'FEATURE_POINT' from the included types
XrController.hitTest(X, Y, ['ESTIMATED_SURFACE', 'DETECTED_SURFACE'])
I'm not getting any results at all.
Are 'ESTIMATED_SURFACE' and 'DETECTED_SURFACE' not implemented yet, or do I need to do something specific in order to get them?
Thanks
The ESTIMATED_SURFACE and DETECTED_SURFACE options are in the spec but aren't currently features of 8th Wall Web. Only FEATURE_POINT is currently implemented.
Currently only 'FEATURE_POINT' is supported in 8th Wall. You can reference this document: https://www.8thwall.com/docs/web/#xr8xrcontrollerhittest

How can I use "Interpolated Absolute Discounting" for a bigram model in language modeling?

I want to compare two smoothing methods for a bigram model:
Add-one smoothing
Interpolated Absolute Discounting
For the first method, I found some code:
def calculate_bigram_probabilty(self, previous_word, word):
    bigram_word_probability_numerator = self.bigram_frequencies.get((previous_word, word), 0)
    bigram_word_probability_denominator = self.unigram_frequencies.get(previous_word, 0)
    if self.smoothing:
        bigram_word_probability_numerator += 1
        bigram_word_probability_denominator += self.unique__bigram_words
    return 0.0 if bigram_word_probability_numerator == 0 or bigram_word_probability_denominator == 0 else float(
        bigram_word_probability_numerator) / float(bigram_word_probability_denominator)
However, I found nothing for the second method except for some references for 'KneserNeyProbDist'. However, this is for trigrams!
How can I change my code above to calculate it? The parameters of this method must be estimated from a development-set.
In this answer I only clear up a few things that I found out about your problem; I can't provide a coded solution.
With KneserNeyProbDist you seem to refer to a Python implementation of that problem: https://kite.com/python/docs/nltk.probability.KneserNeyProbDist
There exists an article about Kneser–Ney smoothing on wikipedia: https://en.wikipedia.org/wiki/Kneser%E2%80%93Ney_smoothing
The article above links this tutorial: https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf but this has a small fault on the most important page 29, the clear text is this:
Modified Kneser-Ney
Chen and Goodman introduced modified Kneser-Ney:
Interpolation is used instead of backoff.
Uses a separate discount for one- and two-counts instead of a single discount for all counts.
Estimates discounts on held-out data instead of using a formula based on training counts.
Experiments show all three modifications improve performance.
Modified Kneser-Ney consistently had best performance.
Regrettably, the modified version is not explained in that document.
The original documentation by Chen & Goodman luckily is available, the Modified Kneser–Ney smoothing is explained on page 370 of this document: http://u.cs.biu.ac.il/~yogo/courses/mt2013/papers/chen-goodman-99.pdf.
I copy the most important text and formula here as screenshot:
So the Modified Kneser–Ney smoothing is now known and seems to be the best solution; translating the description beside the formula into running code is the one step still to do.
It might be helpful that the original linked document contains further explanation below the text shown in the screenshot, which might help in understanding the raw description.
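For the bigram case the question asks about, plain interpolated absolute discounting (not the modified Kneser-Ney variant discussed above) can be sketched without NLTK. This is only a minimal sketch under stated assumptions: the class and function names are made up for illustration, the lower-order distribution is the maximum-likelihood unigram estimate rather than Kneser-Ney continuation counts, a single discount d is used, and d is chosen by a simple grid search for the lowest perplexity on the development set. Words never seen in training get probability zero here, so in practice they would need to be mapped to an unknown-word token first.

```python
import math
from collections import Counter, defaultdict


class AbsoluteDiscountingBigram:
    """Bigram model with interpolated absolute discounting (illustrative sketch)."""

    def __init__(self, sentences, discount=0.75):
        self.discount = discount           # assumed default; tune on dev data
        self.unigrams = Counter()          # c(w) for predicted tokens
        self.bigrams = Counter()           # c(prev, w)
        self.history_counts = Counter()    # c(prev) as a bigram history
        self.followers = defaultdict(set)  # distinct words seen after prev
        for sentence in sentences:
            tokens = ["<s>"] + list(sentence) + ["</s>"]
            self.unigrams.update(tokens[1:])  # "<s>" is never predicted
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[(prev, word)] += 1
                self.history_counts[prev] += 1
                self.followers[prev].add(word)
        self.total = sum(self.unigrams.values())

    def unigram_prob(self, word):
        return self.unigrams[word] / self.total

    def prob(self, prev, word):
        history = self.history_counts[prev]
        if history == 0:
            # Unseen history: fall back to the unigram estimate.
            return self.unigram_prob(word)
        # Subtract the discount from every seen bigram count ...
        discounted = max(self.bigrams[(prev, word)] - self.discount, 0.0) / history
        # ... and give the freed mass to the unigram distribution.
        backoff_weight = self.discount * len(self.followers[prev]) / history
        return discounted + backoff_weight * self.unigram_prob(word)


def perplexity(model, sentences):
    log_sum, count = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            p = model.prob(prev, word)
            if p == 0.0:
                return math.inf  # out-of-vocabulary word
            log_sum += math.log(p)
            count += 1
    return math.exp(-log_sum / count)


def tune_discount(train, dev, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the discount with the lowest dev-set perplexity (grid is an assumption)."""
    return min(candidates,
               key=lambda d: perplexity(AbsoluteDiscountingBigram(train, d), dev))
```

With a discount between 0 and 1, the probabilities for a seen history sum to one over the training vocabulary plus "</s>", because the mass removed from the seen bigrams is exactly the weight given to the unigram term. Comparing this model with the add-one code from the question then amounts to computing perplexity for both on the same held-out data.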

MDX Children of Several Members

The Children function returns the set of children of a single member.
But I need the children of several members.
The problem is, that I can't use Union to make it work like that:
Union([Geography].[Geography].[USA].children,[Geography].[Geography].[Canada].children)
I don't know how many members there will be, so I actually need all children of a set of members,
like:
([Geography].[Geography].[USA],[Geography].[Geography].[Canada],[Geography].[Geography].[GB]).children
Is there a function like that?
I couldn't answer my own question, so I am editing it instead. With the help of DHN's answer and some brain work I found a solution I could use:
Except(DRILLDOWNLEVEL( {[Geography].[Geography].[USA],[Geography].[Geography].[Canada]},,0 ),
{[Geography].[Geography].[USA],[Geography].[Geography].[Canada]})
That does work for me.
Explanation: I drill down on the elements the tool provides me, which returns the children plus the parents, and then I use DHN's idea and remove the parents with Except to clean the list up a bit.
Hopefully this is understandable.
You can use the Descendants method (the fourth form in the linked description uses a set as its first argument). Thus,
Descendants( {
    [Geography].[Geography].[USA],
    [Geography].[Geography].[Canada],
    [Geography].[Geography].[GB]
  },
  1,
  SELF
)
should deliver exactly what you want.
Well actually, you could use a Crossjoin to get the set you want.
Something like
[Geography].[Geography].[USA] * [Geography].[Geography].[Canada] * [Geography].[Geography].[GB]
But this is only a proper solution, if you have only a few different search criteria.
Alternatively, you could use Except to remove those criteria you're not interested in. E.g.
Except([Geography].[Geography].children, [Geography].[Geography].[Germany])
This would give you the whole content of the [Geography] dimension, except the one of [Germany].
Hope this helps a bit.
Edit after comment of TO
Ok, this wasn't part of your question, but I think what you need is the MemberToStr() function. Please find the doc here.
I think something like this should do the trick.
with member [Measures].[Cities]
  as membertostr([Geography].[Geography].members.children)
select [Measures].[Cities] on 0
from [WhatEverYourCubeNameIs]
where (
  [Geography].[Geography].[USA],
  [Geography].[Geography].[Canada]
)
Please note that this query is totally untested. I may also have lost some of my skills, because it's been a while since I used MDX. You will also have to create the query dynamically, since the selection seems to be user-dependent. But I'm sure that you're aware of that. ;)