TabPy LDA Topic Modelling from Calculated Field - lda

I am trying to perform LDA on unstructured data to identify topics for each row using Tableau with Python (TabPy). An example of the data that arg_1 would be run against might read:
"vulnerable due to bereavement and has cancer"
I am struggling with the logic of how to do this from a calculated field and have tried all combinations of SCRIPT_STR, SCRIPT_INT, SCRIPT_REAL, etc.
Does anyone know how to do this from within a calculated field, please? Thanks in advance for the help. James

How do I write a measure in this power pivot table that will only sum values next to a unique value?

I want to sum 'hours' in this table. Every 'item's' hours should be counted once, even if it appears twice. So Group A has 12.25 hours, in the example below.
Here is the source table:
A PowerPivot gives me:
So it's double counting rows where 'item' occurs twice, of course.
Because the 'hours' for different 'items' aren't the same, I'm not sure how to write a DAX measure that makes this work in the PivotTable (this is just an example; the real dataset has the same problem but is much larger). I tried
=([Sum of Hours]/COUNT([Hours]))*DISTINCTCOUNT([Item])
However it's not the correct calculation. It gave me 9.84375 for group A (right answer 12.25) and 47.53125 for group B (44 is correct).
You can see this from a deduped list (for unrelated reasons, it's not feasible to dedupe the list).
What measure (or combo of them) is going to give me what I need?
Thanks!
CALCULATE( SUMX( VALUES( Table1[Item] ), CALCULATE( MIN( Table1[Hours] ) ) ) )
Sorry for the delay. Your last request helped distract me after I messed up in an interview.
I tried to make it as simple as possible:
The outer CALCULATE would be necessary only if you want to overwrite the filter contexts (slicers, row headers, column headers) present in your report.
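For readers less familiar with DAX, the logic of the measure can be sketched in plain Python: for every distinct item within a group, take the minimum of its hours, then add those minima up. This is only an illustration of the arithmetic, not a replacement for the measure. The rows below are hypothetical (the original tables are not reproduced here), and the function name `hours_per_group` is made up for the example.

```python
from collections import defaultdict

# Hypothetical rows (group, item, hours); these are NOT the asker's data.
rows = [
    ("A", "item1", 2.0),
    ("A", "item1", 2.0),   # duplicate item: must be counted once
    ("A", "item2", 3.5),
    ("B", "item3", 4.0),
    ("B", "item4", 1.0),
    ("B", "item4", 1.0),   # duplicate item: must be counted once
]

def hours_per_group(rows):
    """Sum hours per group, counting each distinct item once.

    Mirrors SUMX(VALUES(Item), CALCULATE(MIN(Hours))): for every distinct
    item in a group take the minimum of its hours, then add those up.
    """
    min_hours = {}  # (group, item) -> smallest hours seen for that item
    for group, item, hours in rows:
        key = (group, item)
        min_hours[key] = min(hours, min_hours.get(key, hours))

    totals = defaultdict(float)
    for (group, _item), hours in min_hours.items():
        totals[group] += hours
    return dict(totals)

print(hours_per_group(rows))  # {'A': 5.5, 'B': 5.0}
```

A plain sum over the same rows would give 7.5 for group A and 6.0 for group B, which is the double counting described in the question.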
Which route to take to learn DAX depends on your available time. I have always believed that your first approach to this type of language should be something practical, focused on scenarios and solutions, so that you do not lose the will to learn (Learn to Write DAX: A Practical Guide to Learning Power Pivot for Excel and Power BI by Matt Allington). Once you are interested in the topic, you can move on to mid-level books (Microsoft Excel 2013: Building Data Models with PowerPivot, Analyzing Data with Microsoft Power BI and Power Pivot for Excel ...) or jump directly to The Definitive Guide to DAX: Business Intelligence with Microsoft Power BI, SQL Server Analysis Services, and Excel (Second Edition). Finally, understanding the language is about 80% of the way; you also need to practice and learn to identify patterns. DAX Patterns (Marco Russo, Alberto Ferrari) and all the SQLBI resources will be very helpful for you.

Forecasting using LSTM-RNN Python

I am using LSTM - RNN for Sales forecasting.
I have already trained and validated my model with a 70/30 split. I have around 144 rows of data with 2 columns (Date (YYYY/MM) and Sales).
I am only using the sales column to predict values.
I need to know whether it is possible to train on 100% of the dataset and, if so, how to predict the immediate future values (out-of-sample data). What should be given as input to the model? Since I am fairly new to deep learning, this might be something easy that I am missing. Any help would be appreciated.

Batch file to verify a .csv file

I am hoping someone can point me in the right direction, in relation to the scenario I am faced with.
Essentially, I am given a CSV each day containing 200+ lines of payment information.
As the payment reference is input by the user at source, it isn't always in the format I need.
The process is currently done manually and can take considerable time, so I was hoping to come up with a batch file to isolate the reference I require, based on a set of parameters.
Each reference should be 11 digits long, be numeric only, and start with 1, 2 or 3.
I have attached a basic example with this post.
It may be that this isn't possible in batch, but any ideas would be appreciated.
Thanks in advance :-)
I'm not too sure about batch, but Python and regex can help you out here.
Here is a great tutorial on using CSVs with Python.
Once you have that down, you could use a regex to filter out the correct values.
Here is an expression to help you out: ^[123][0-9]{10}$
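A minimal sketch of that idea, assuming the reference may be buried in surrounding text typed by the user. It uses the character class `[123]` for the first digit and digit lookarounds instead of `^` and `$`, so that an 11-digit run inside a longer string is found while longer digit runs are rejected. The column name `Reference`, the helper names, and the sample rows are all hypothetical; adjust them to the real file.

```python
import csv
import io
import re

# 11 digits, first digit 1, 2 or 3, not touching any other digit.
REFERENCE = re.compile(r"(?<!\d)[123]\d{10}(?!\d)")

def extract_reference(text):
    """Return the first valid reference found in text, or None."""
    match = REFERENCE.search(text)
    return match.group(0) if match else None

def extract_references(csv_file, column="Reference"):
    """Yield (raw value, extracted reference) for each row of an open CSV file."""
    for row in csv.DictReader(csv_file):
        raw = row[column]
        yield raw, extract_reference(raw)

# Hypothetical sample standing in for the daily file.
sample = io.StringIO(
    "Payer,Reference\n"
    "Smith,REF 12345678901\n"
    "Jones,31234567890/inv\n"
    "Brown,41234567890\n"
    "Green,123456789012\n"
)
for raw, ref in extract_references(sample):
    print(f"{raw!r} -> {ref}")
```

Rows where the result is `None` (wrong leading digit, or too many digits) are the ones that would still need a manual look. For a real file, pass `open(path, newline="")` instead of the `io.StringIO` sample.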

Pattern match to identify date format

My source has different date formats, as shown below, and I'm looking for an algorithm to identify the source date pattern. I tried this in Pentaho Data Integration with the Select values and Fuzzy match steps.
Date Column (String)
"20150210"
"20050822--"
"2014-02-May"
"20051509--"
"02-May-2014"
"2013-May-12"
"12DEC2013"
"15050815"
"May-02-2014"
"12312015"
I know that in PDI we can achieve this in a JS step by writing an if condition for each pattern, but that is not a good idea: the approach makes the transformation grind to a halt when dealing with a huge number of records. I am looking for an efficient way to detect the date pattern.
I believe this is a very common issue in ETL projects. I'm also trying to understand how enterprise tools like SAS Data Integration, Informatica, and SSIS provide an easy way to handle it.
Is there an algorithm to identify the source pattern? If so, which one?
The formats listed above are not exhaustive.
One cannot simply determine a "monovalent" value as the format for any given input.
Consider all of the following formats completely valid:
MM-dd-yy
dd-MM-yy
yy-MM-dd
As stated in a comment by @billinkc, what would you call 01-02-05 in that case?
If at all, your problem would be solvable only if you took a data set into account (e.g. you know that the next X rows all share the same date format). Then you can look at it as a linear problem with some constraints that can help you determine the date format. Even then, you can't guarantee a definite answer; you only increase the probability of getting one.
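The batch idea can be sketched as follows, assuming all values in a batch share one format: try every candidate format against every value and keep only the formats that parse all of them. The three candidates are the ones listed above, translated to Python `strptime` codes; the function names and sample values are hypothetical, and a real list of candidates would be longer.

```python
from datetime import datetime

# Candidate formats from the answer above, in strptime notation.
CANDIDATES = {
    "MM-dd-yy": "%m-%d-%y",
    "dd-MM-yy": "%d-%m-%y",
    "yy-MM-dd": "%y-%m-%d",
}

def parses(value, fmt):
    """Return True if value is a valid date under the strptime format fmt."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def consistent_formats(values, candidates=CANDIDATES):
    """Return the names of candidate formats that parse every value in the batch."""
    return [name for name, fmt in candidates.items()
            if all(parses(v, fmt) for v in values)]

# A single ambiguous value rules nothing out.
print(consistent_formats(["01-02-05"]))              # ['MM-dd-yy', 'dd-MM-yy', 'yy-MM-dd']
# A second value from the same batch removes two candidates.
print(consistent_formats(["01-02-05", "25-12-99"]))  # ['dd-MM-yy']
```

If more than one format survives, the batch is still ambiguous; if none survives, the assumption that the batch shares a single format (or the candidate list itself) is wrong.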

Software to export Abs (Absolute database) to MySQL

Can anyone suggest any software to export Abs (Absolute Database) to MySQL? I have already tried http://www.componentace.com/bde_replacement_database_delphi_absolute_database.htm, which returns corrupted data when exporting Abs to MySQL, and ABC Amber Absolute Converter 1.03, which was unable to handle the data (900 MB). Can anyone suggest alternatives? The database contains only one table (entries) and has one column in it, WIDEMEMO.
Have you thought about writing your own? If you've got one table with just one column, this isn't a big programming project.
Just open the table and loop through the records, writing them to a temporary intermediate file. Then write another program to read them into MySQL.
But I agree with Radu: if you're in good standing with the Absolute people, they should be able to help you. Maybe they, like me, can't figure out why you wouldn't just write a quick and dirty program to do this.
Sorry if I've overlooked something that makes my suggestion unreasonable.
As I answered in your other question, have you tried contacting the vendor (the Absolute Database producer) and asking for advice?