Error writing an existing, populated Julia DataFrame to CSV output

I have a dataframe built out with various calculations performed on different rows. I'm trying to write the dataframe to a csv output, but keep getting the following error:
ArgumentError: `nothing` should not be printed; use `show`, `repr`, or custom output instead.
Here is the code:
filename = "Alldata.csv"
CSV.write(filename, bf)
Here is the stack trace:
[2] print_to_string(::Nothing) at .\strings\io.jl:123
[3] string(::Nothing) at .\strings\io.jl:156
[4] writecell(::Array{UInt8,1}, ::Int32, ::Int32, ::IOStream, ::Nothing, ::CSV.Options{UInt8,UInt8,Nothing,Tuple{}}) at C:\Users\haley.sims\.julia\packages\CSV\ztQqu\src\write.jl:281
[5] macro expansion at C:\Users\haley.sims\.julia\packages\CSV\ztQqu\src\write.jl:182 [inlined]
[6] eachcolumn at C:\Users\haley.sims\.julia\packages\Tables\FXXeK\src\utils.jl:49 [inlined]
[7] writerow(::Array{UInt8,1}, ::Base.RefValue{Int32}, ::Int32, ::IOStream, ::Tables.Schema{(:bid, :building_name, :constr_year, :floor_area, :primary_type, :primary_
[8] (::getfield(CSV, Symbol("##60#61")){getfield(CSV, Symbol("##53#54")){Bool,Tables.Schema{(:bid, :building_name, :constr_year, :floor_area, :primary_type, :primary_occupancy, :n_above, :period, :peak_pop, :eco_pop, :demo_cost, :renew_cost, :replacement_cost......
[9] #open#310(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(CSV, Symbol("##60#61")){getfie.....
[10] open at .\iostream.jl:367 [inlined]
[11] with(::getfield(CSV, Symbol("##53#54")){Bool,Tables.Schema{(:bid, :building_name, :constr_year, :floor_area, :primary_type, :primary_occupancy, :n_above, :period, :peak_pop, :eco_pop, :demo_cost, :renew_cost, :replacement_cost, :retro_cost, :retro_cost_notes, :perc_1, :perc_2, :perc_3, :perc_4, :perc_5, :perc_6, :perc_7, :perc_8, :perc_9, :perc_10, :perc_11, :n_col, :IM_43, :IM_200, :IM_475, :IM_2475, :PC_43, :PC_200, :PC_475, :PC_2475, :renew_cost3, :temp_cost, :tier3, :fr_HAZ, :cole_HAZ, :IM_975, :collapse_ext, :pcim_975, :theta3, :beta3, :theta4, :beta4, :tier4, :aaf4, :aafr, :pf4_50, :pfr_50, :annual_collapse, :pocc, :pf_col, :ri....
[12] #write#52(::Bool, ::Bool, ::Array{String,1}, ::Function, ::Tables.Schema{(:bid, :building_name, :constr_year, :floor_area, :primary_type, :primary_occupancy, :n_above, :period, :peak_pop, :eco_pop, :demo_cost, :renew_cost, :replacement_cost, :retro_cost, :retro_cost_notes....
[13] write(::Tables.Schema{(:bid, :building_name, :constr_year, :floor_area, :primary_type, :primary_occupancy, :n_above, :period, :peak_pop, :eco_pop, :demo_cost, :renew_cost, :replacement_cost, :retro_cost, :retro_cost_notes, :perc_1, :perc_2, :perc_3, :perc_4, :perc_5, :perc....
[14] #write#51(::Char, ::Char, ::Nothing, ::Nothing, ::Char, ::Char, ::Char, ::Nothing, ::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CSV.write), ::String, ::DataFrame) at C:\Users\haley.sims\.julia\packages\CSV\ztQqu\src\write.jl:60
[15] write(::String, ::DataFrame) at C:\Users\haley.sims\.julia\packages\CSV\ztQqu\src\write.jl:53
[16] top-level scope at In[177]:3
The dataframe exists and is populated since I can print it:
│ Row │ bid │ building_name │ constr_year │ floor_area │ primary_type │ primary_occupancy │ n_above │ period │ peak_pop │ eco_pop │ demo_cost │ renew_cost │ replacement_cost │ retro_cost │ retro_cost_notes │ perc_1 │ perc_2 │ perc_3 │ perc_4 │ perc_5 │ perc_6 │ perc_7 │ perc_8 │ perc_9 │ perc_10 │ perc_11 │ n_col │ IM_43 │ IM_200 │ IM_475 │ IM_2475 │ PC_43 │ PC_200 │ PC_475 │ PC_2475 │ renew_cost3 │ temp_cost │ tier3 │ fr_HAZ │ cole_HAZ │ IM_975 │ collapse_ext │ pcim_975 │ theta3 │ beta3 │ theta4 │ beta4 │ tier4 │ aaf4 │ aafr │ pf4_50 │ pfr_50 │ annual_collapse │ pocc │ pf_col │ risk_individual │ risk_individual_perc │
│ │ String │ String │ Int64⍰ │ Int64 │ String │ String⍰ │ Int64⍰ │ Float64⍰ │ Float64 │ Float64 │ Missing │ Float64⍰ │ Float64 │ Float64⍰ │ Union{Missing, String} │ Float64⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ Float64⍰ │ Int64⍰ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Float64 │ Float64 │ Float64 │ Float64⍰ │ Float64⍰ │ String │ Float64⍰ │ Float64⍰ │ Int64 │ Float64⍰ │ Float64⍰ │ Float64 │ Float64 │ Float64 │ Float64 │ Union… │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼────────┼─────────────────────────────────────┼─────────────┼────────────┼──────────────┼───────────────────┼─────────┼──────────┼──────────┼─────────┼───────────┼────────────┼──────────────────┼────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼──────────┼────────┼───────┼────────┼────────┼─────────┼───────┼─────────┼─────────┼─────────┼─────────────┼───────────┼────────┼──────────┼──────────┼────────┼──────────────┼──────────┼─────────┼──────────┼─────────┼──────────┼────────┼───────────┼────────────┼──────────┼──────────┼─────────────────┼──────────┼──────────┼─────────────────┼──────────────────────┤
│ 1 │ 22 │ LOWER MALL RESEARCH STATION │ 1960 │ 75229 │ C2 │ Laboratory │ 4 │ 0.37 │ 206.874 │ 68.9579 │ missing │ 47.8929 │ 56.3446 │ missing │ Lab wing 95% vulnerable - Office wing far less vulnerable at 5% collapse. New shear and frame ductility required. Non structural risk high. │ 0.85 │ 0.85 │ 0.9 │ 0.9 │ 0.8 │ 0.85 │ 0.9 │ 0.9 │ 0.9 │ 0.85 │ 0.9 │ 11 │ 43 │ 200 │ 475 │ 2475 │ 0 │ 0.0 │ 0.098 │ 0.42 │ 22.969 │ 47.8929 │ III │ missing │ missing │ 975 │ 0.872727 │ 0.9 │ 2294.63 │ 0.951074 │ 572.462 │ 0.398816 │ V │ 0.0970286 │ 0.00312973 │ 0.992183 │ 0.144857 │ 0.00179141 │ 0.333333 │ 0.785455 │ 0.000469024 │ 0.0469024 │
│ 2 │ 48 │ ANTHROPOLOGY AND SOCIOLOGY BUILDING │ 1975 │ 72384 │ C2 │ Office │ 3 │ 0.2 │ 267.931 │ 60.2845 │ missing │ 24.1623 │ 25.4339 │ 21.6189 │ Wings B and C highly vulnerable to collapse. Lateral strength weakness very high, connections of roofs minimal. Mitigation would need to address adjacent ANSO buildings. Modelling includes assumtions about soil stability and foundation response. │ 0.95 │ 0.5 │ 0.5 │ 0.5 │ 0.5 │ 0.5 │ 0.7 │ 0.7 │ 0.5 │ 0.5 │ 0.5 │ 11 │ 43 │ 200 │ 475 │ 2475 │ 0 │ 0.0 │ 0.016 │ 0.582 │ 16.414 │ 24.1623 │ IV │ missing │ missing │ 975 │ 0.577273 │ 0.9 │ 2111.45 │ 0.657641 │ 572.462 │ 0.398816 │ V │ 0.0561079 │ 0.00273608 │ 0.939517 │ 0.127859 │ 0.00179141 │ 0.225 │ 0.519545 │ 0.000209412 │ 0.0209412 │
│ 3 │ 148 │ CHEMISTRY B BLOCK, SOUTH WING │ 1959 │ 73590 │ C1 │ Laboratory │ 3 │ 0.51 │ 180.63 │ 60.21 │ missing │ 41.1035 │ 48.357 │ missing │ missing │ 0.05 │ 0.05 │ 0.2 │ 0.1 │ 0.1 │ 0.0 │ 0.3 │ 0.3 │ 0.3 │ 0.8 │ 0.3 │ 10 │ 43 │ 200 │ 475 │ 2475 │ 0 │ 0.0 │ 0.0 │ 0.517 │ 26.943 │ 48.357 │ IV │ missing │ missing │ 975 │ 0.25 │ 0.76 │ 2452.13 │ 0.217791 │ 753.719 │ 0.425775 │ V │ 0.018312 │ 0.0027327 │ 0.599724 │ 0.127711 │ 0.00135172 │ 0.333333 │ 0.225 │ 0.000101379 │ 0.0101379 │

You are probably on Julia 1.0. Try switching to Julia 1.3.1 (the current release) and the problem should disappear.
If you have to stick with Julia 1.0, the problem is that printing a nothing value is not allowed in that version (it is allowed in Julia 1.3.1, which is why switching solves the problem). The simplest fix is to replace nothing with missing. If your data contains nothing, it probably does not also rely on missing for a different meaning, so the replacement should be safe.
The code that will work in this case is:
filename = "Alldata.csv"
CSV.write(filename, something.(bf, missing))
After this operation, each occurrence of nothing in your original data frame is stored in the saved file as an empty field in the corresponding row (and when you read the file back using CSV.read, it becomes a missing value in the resulting data frame).

Automatically fill all available space in a CSS grid row

I have a dynamically generated CSS Grid layout with grid-row values which are set using a script. It looks something like this:
╭───────────────╮
│ grid-row: 1/2 │
│ │
╰───────────────╯
╭───────────────╮
│ grid-row: 2/4 │
│ │
│ │
│ │
╰───────────────╯
╭───────────────╮
│ grid-row: 4/5 │
│ │
╰───────────────╯
╭───────────────╮
│ grid-row: 6/7 │
│ │
╰───────────────╯
When two elements have the same grid-row value, they are automatically displayed on the same row, like so:
╭──────╮
│ :1/2 │
│ │
╰──────╯
╭──────╮ ╭──────╮
│ :3/4 │ │ :3/4 │
│ │ │ │
╰──────╯ ╰──────╯
╭──────╮
│ :4/6 │
│ │
│ │
│ │
╰──────╯
How can I get the elements to automatically fill all available horizontal space, like below?
╭───────────────╮
│ grid-row: 1/2 │
│ │
╰───────────────╯
╭──────╮ ╭──────╮
│ :3/4 │ │ :3/4 │
│ │ │ │
╰──────╯ ╰──────╯
╭───────────────╮
│ grid-row: 4/6 │
│ │
│ │
│ │
╰───────────────╯

Centre overflowing element around non-centre point

I have an element E that is 1000px wide, that is within a parent container P.
Of E's 1000 pixels of width, the columns at x positions 600–800px are more important than the others, so as the parent element gets narrower, I'd like to position E within P so that these pixels stay visible and centred within P (until this is no longer possible).
So, if P can fit the entire width of E, no problem! Just centre E within P:
┌──────────────────────────────────────────────────────────────┐
│P │
│ ┌─────────────────────────┬──────────────┬───────┐ │
│ │E │Important█████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ │ │██████████████│ │ │
│ └─────────────────────────┴──────────────┴───────┘ │
│ │
└──────────────────────────────────────────────────────────────┘
However, if the parent is resized to be e.g. 250px wide, E should be centred around x=700px:
┌───────────────────────────┐
│P │
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┼─────┬──────────────┬──────│
E │ │Important█████│ ││
│ │ │██████████████│ │
│ │██████████████│ ││
│ │ │██████████████│ │
│ │██████████████│ ││
│ │ │██████████████│ │
│ │██████████████│ ││
│ │ │██████████████│ │
│ │██████████████│ ││
│ │ │██████████████│ │
│ │██████████████│ ││
│ │ │██████████████│ │
│ │██████████████│ ││
│ │ │██████████████│ │
│ │██████████████│ ││
│ │ │██████████████│ │
│ │██████████████│ ││
│ │ │██████████████│ │
│ │██████████████│ ││
└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┼─────┴──────────────┴──────│
│ │
└───────────────────────────┘
Is this possible to achieve with CSS? (Maybe using calc()?)
Bonus: Is it possible to have the overflow be scrollable? (If so, I guess JS is needed?)
You could do this with absolute positioning. For larger screens, position the #Important div absolutely 200px from the right-end of E. For smaller screens, position the div absolutely from the left (e.g., 50px from the left).
See this CodePen for the solution.
With this method, the contents of #Important div will overlap the contents of E which I'm assuming is what you want based on the visuals in your question.
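The positioning rule described in the question can also be written down as plain arithmetic, which may help when choosing the offsets. The following is only a sketch of that arithmetic in Python, not the CodePen solution: the function name `left_offset` and its parameter names are hypothetical, and the defaults (1000px wide, important band at 600–800px) are taken from the question. It returns where E's left edge should sit relative to P's left edge.

```python
def left_offset(parent_width, element_width=1000.0,
                focus_start=600.0, focus_end=800.0):
    """Return the x offset of E's left edge relative to P's left edge."""
    # Wide parent: E fits entirely, so centre E as a whole.
    if parent_width >= element_width:
        return (parent_width - element_width) / 2

    # Narrow parent: the offset that would put the middle of the
    # important band exactly at the middle of P.
    focus_centre = (focus_start + focus_end) / 2
    ideal = parent_width / 2 - focus_centre

    # Do not shift E so far left that a gap opens on P's right side,
    # and do not shift it right of P's left edge.
    lowest = parent_width - element_width
    return min(0.0, max(lowest, ideal))


print(left_offset(250))   # parent from the second diagram
print(left_offset(1200))  # parent wider than E
```

For the 250px parent in the question this gives -575px, which places the 600–800px band at 25–225px inside P. The narrow-parent branch is a clamp between two bounds, so it could plausibly be expressed with CSS `calc()` and `clamp()` on a percentage of the parent width, though that mapping is an assumption and is not tested here.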

Add identifier of first created record to select statement with group_by

I have the following payments table
┌─name───────────────────────────┬─type────────────────────────────┐
│ payment_id │ UInt64 │
│ factory │ String │
│ user_id │ UInt64 │
│ amount_cents │ Int64 │
│ action │ String │
│ success │ UInt8 │
│ country │ FixedString(2) │
│ created_at │ DateTime │
│ finished_at │ Nullable(DateTime) │
└────────────────────────────────┴─────────────────────────────────┘
With sample data
┌─factory───┬─────────finished_at─┬─payment_id─┬─country─┬─action──┬─amount_cents─┬─user_id───┬
│ 0_factory │ 2021-01-18 00:00:01 │ 1 │ BY │ payment │ 1 │ 1 │
│ 0_factory │ 2021-01-18 00:00:02 │ 2 │ BY │ payment │ 1 │ 1 │
│ 1_factory │ 2021-01-18 00:00:02 │ 2 │ PL │ win │ 4 │ 1 │
│ 1_factory │ 2021-01-18 00:00:03 │ 3 │ PL │ win │ 7 │ 1 │
│ 2_factory │ 2021-01-18 00:00:01 │ 4 │ PL │ win │ 7 │ 1 │
│ 2_factory │ 2021-01-18 00:00:02 │ 1 │ PL │ payment │ 7 │ 1 │
│ 2_factory │ 2021-01-18 00:00:03 │ 2 │ PL │ win │ 7 │ 1 │
│ 2_factory │ 2021-01-18 00:00:04 │ 3 │ GR │ win │ 2 │ 1 │
└───────────┴─────────────────────┴────────────┴─────────┴─────────┴─────────┴────────────────┘
This is an example of what I have right now with
SELECT
factory,
user_id,
payment_id,
action,
created_at
FROM payments_all
WHERE (payments_all.action = 'payment') AND (payments_all.factory IN ('0_factory', '1_factory', '2_factory')) AND isNotNull(payments_all.created_at)
GROUP BY
factory,
user_id,
payment_id,
action
HAVING (min(created_at) >= toDate('2019-01-01 00:00:00')) AND (min(created_at) < toDate('2021-10-01 00:00:00'))
ORDER BY user_id
┌─factory───┬─user_id─┬─payment_id─┬─action──┬──────────created_at─┐
│ 1_factory │ 1 │ 1 │ payment │ 2021-02-04 09:00:00 │
│ 0_factory │ 1 │ 1 │ payment │ 2021-01-17 00:00:01 │
│ 0_factory │ 1 │ 2 │ payment │ 2021-01-17 00:00:06 │
└───────────┴─────────┴────────────┴─────────┴─────────────────────┘
I need to add a new column, first_payment.
first_payment takes the value 1 if action is payment and it is the first payment for a user. Otherwise it takes the value 0.
first_payment should be checked across the whole period.
So the expected result is:
┌─factory───┬─────────finished_at─┬─payment_id─┬─country─┬─action──┬─amount_cents─┬─user_id───┬first_payment─┐
│ 0_factory │ 2021-01-18 00:00:01 │ 1 │ BY │ deposit │ 1 │ 1 │ 1 │
│ 0_factory │ 2021-01-18 00:00:02 │ 2 │ BY │ deposit │ 1 │ 1 │ 0 │
│ 1_factory │ 2021-01-18 00:00:02 │ 2 │ PL │ win │ 4 │ 1 │ 0 │
│ 1_factory │ 2021-01-18 00:00:03 │ 3 │ PL │ win │ 7 │ 1 │ 0 │
│ 2_factory │ 2021-01-18 00:00:01 │ 4 │ PL │ win │ 7 │ 1 │ 0 │
│ 2_factory │ 2021-01-18 00:00:02 │ 1 │ PL │ deposit │ 7 │ 1 │ 1 │
│ 2_factory │ 2021-01-18 00:00:03 │ 2 │ PL │ win │ 7 │ 1 │ 0 │
│ 2_factory │ 2021-01-18 00:00:04 │ 3 │ GR │ win │ 2 │ 1 │ 0 │
└───────────┴─────────────────────┴────────────┴─────────┴─────────┴─────────┴────────────────┘
I couldn't find much about ClickHouse, but it doesn't appear to support window functions.
Your example output also seems to be exactly the same as your sample table, plus one additional column, so I'm not sure what your GROUP BY was meant to achieve.
So, I'd use a LEFT JOIN onto a sub-query.
SELECT
payments_all.*,
CASE WHEN user_summary.user_id IS NOT NULL THEN 1 ELSE 0 END AS first_payment
FROM
payments_all
LEFT JOIN
(
SELECT
user_id,
factory,
MIN(created_at) AS first_created_at
FROM
payments_all
WHERE
action = 'payment'
GROUP BY
user_id,
factory
)
AS user_summary
ON payments_all.user_id = user_summary.user_id
AND payments_all.factory = user_summary.factory
AND payments_all.created_at = user_summary.first_created_at
WHERE
(payments_all.factory IN ('0_factory', '1_factory', '2_factory'))
AND (payments_all.created_at >= toDate('2019-01-01 00:00:00'))
AND (payments_all.created_at < toDate('2021-10-01 00:00:00'))
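To see how the LEFT JOIN approach behaves, here is a small sketch that runs the same idea against an in-memory SQLite database from Python. It is an illustration under assumptions, not ClickHouse code: the five rows are made up (loosely based on the sample data, with the timestamps used as created_at), the table is cut down to five columns, and the extra join condition on `action = 'payment'` is an addition so that a non-payment row sharing the first payment's timestamp is not flagged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments_all ("
    "payment_id INTEGER, factory TEXT, user_id INTEGER, "
    "action TEXT, created_at TEXT)"
)

# Hypothetical rows, for illustration only.
rows = [
    (1, "0_factory", 1, "payment", "2021-01-18 00:00:01"),
    (2, "0_factory", 1, "payment", "2021-01-18 00:00:02"),
    (2, "1_factory", 1, "win",     "2021-01-18 00:00:02"),
    (4, "2_factory", 1, "win",     "2021-01-18 00:00:01"),
    (1, "2_factory", 1, "payment", "2021-01-18 00:00:02"),
]
conn.executemany("INSERT INTO payments_all VALUES (?, ?, ?, ?, ?)", rows)

query = """
SELECT p.factory, p.payment_id, p.action,
       CASE WHEN s.user_id IS NOT NULL THEN 1 ELSE 0 END AS first_payment
FROM payments_all AS p
LEFT JOIN (
    -- earliest payment per user and factory, over the whole table
    SELECT user_id, factory, MIN(created_at) AS first_created_at
    FROM payments_all
    WHERE action = 'payment'
    GROUP BY user_id, factory
) AS s
  ON  p.user_id    = s.user_id
  AND p.factory    = s.factory
  AND p.created_at = s.first_created_at
  AND p.action     = 'payment'   -- assumption: only payment rows are flagged
ORDER BY p.factory, p.created_at
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the sub-query has no date filter, "first" is decided over all periods, and any date range can then be applied in the outer WHERE clause without changing which row counts as the first payment.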
As far as I can see, the payment_id of the first payment is always 1, so I think you can use CASE WHEN payment_id = 1 THEN 1 ELSE 0 END AS first_payment. Please check the query below:
WITH CTE AS
(SELECT
factory,
user_id,
payment_id,
action,
created_at
FROM payments_all
WHERE (payments_all.action = 'payment') AND (payments_all.factory IN ('0_factory', '1_factory', '2_factory')) AND isNotNull(payments_all.created_at)
GROUP BY
factory,
user_id,
payment_id,
action
HAVING (min(created_at) >= toDate('2019-01-01 00:00:00')) AND (min(created_at) < toDate('2021-10-01 00:00:00'))
)
SELECT *, CASE WHEN payment_id = 1 THEN 1
ELSE 0 END AS first_payment
FROM CTE
ORDER BY user_id
NOTE: Query is written in SQL Server. Please check and let me know.

How do I normalize a survey site's database?

I am trying to make a survey site, but I have not been able to normalize the database. I do not know how to relate questions and answers in the database. There is a very strong relationship between questions and answers. Because QuestionID is the primary key of the Question table, the question ID numbers are unique across all surveys, and the answer table works the same way. How am I supposed to do that?
┌────────────────┬─────────────┬────────────┬────────────┬──────────────┐
│ Member │ Survey │ Question │ Choice │ Category │
├────────────────┼─────────────┼────────────┼────────────┼──────────────┤
│ ID │ SurveyID │ QuestionID │ ChoiceID │ ID │
│ FirstName │ SurveyorID │ Question │ Choice │ CategoryName │
│ LastName │ SurveyTitle │ SurveyID │ QuestionID │ │
│ Mail │ CategoryID │ │ │ │
│ Password │ │ │ │ │
│ NumberOfSurvey │ │ │ │ │
└────────────────┴─────────────┴────────────┴────────────┴──────────────┘
I have come up with a structure that might work for your case with few or no modifications. The database structure is shown in the table below. All the columns marked with * are the ones to be set as primary keys. If you are not sure how to set multiple columns as a primary key, please refer to Composite Keys.
┌────────────────┬─────────────┬────────────┬────────────┬──────────────┐
│ Member │ Survey │ Question │ Choice │ Category │
├────────────────┼─────────────┼────────────┼────────────┼──────────────┤
│ MemberID* │ SurveyID* │ QuestionID*│ ChoiceID* │ CategoryID* │
│ FirstName │ MemberID* │ Question │ Choice │ CategoryName │
│ LastName │ SurveyTitle │ SurveyID* │ QuestionID*│ │
│ Mail │ CategoryID* │ │ │ │
│ Password │ │ │ │ │
│ NumberOfSurvey │ │ │ │ │
└────────────────┴─────────────┴────────────┴────────────┴──────────────┘
Good Luck!
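As a rough illustration of what a composite key buys you here, the sketch below builds a cut-down version of the Survey, Question and Choice tables in an in-memory SQLite database from Python. It is one possible reading of the starred columns, not the exact schema from the table: the Member and Category tables are left out, and carrying SurveyID into Choice so that it can reference the composite key of Question is an assumption added for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Survey (
    SurveyID    INTEGER PRIMARY KEY,
    SurveyTitle TEXT NOT NULL
);
CREATE TABLE Question (
    SurveyID   INTEGER NOT NULL REFERENCES Survey(SurveyID),
    QuestionID INTEGER NOT NULL,
    Question   TEXT NOT NULL,
    PRIMARY KEY (SurveyID, QuestionID)            -- composite key
);
CREATE TABLE Choice (
    SurveyID   INTEGER NOT NULL,                  -- assumption: not in the table above
    QuestionID INTEGER NOT NULL,
    ChoiceID   INTEGER NOT NULL,
    Choice     TEXT NOT NULL,
    PRIMARY KEY (SurveyID, QuestionID, ChoiceID),
    FOREIGN KEY (SurveyID, QuestionID)
        REFERENCES Question(SurveyID, QuestionID)
);
""")

conn.executemany("INSERT INTO Survey VALUES (?, ?)",
                 [(1, "Food"), (2, "Travel")])

# Question number 1 can now exist once per survey.
conn.executemany("INSERT INTO Question VALUES (?, ?, ?)",
                 [(1, 1, "Favourite dish?"), (2, 1, "Favourite city?")])
conn.execute("INSERT INTO Choice VALUES (1, 1, 1, 'Pasta')")

question_count = conn.execute(
    "SELECT COUNT(*) FROM Question WHERE QuestionID = 1").fetchone()[0]

# A second question 1 inside the same survey violates the composite key.
try:
    conn.execute("INSERT INTO Question VALUES (1, 1, 'Duplicate?')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A choice pointing at a question that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO Choice VALUES (2, 9, 1, 'Nowhere')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(question_count, duplicate_rejected, orphan_rejected)
```

With the key on (SurveyID, QuestionID), question numbers can restart at 1 in every survey while each row is still uniquely identified, which seems to be the difficulty described in the question.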

How to read a non-standard space delimited data into a DataFrame and build a GLM model using it?

I am trying to read a tab-delimited file, with all data present, into Julia. It saves all the columns as NullableArrays.NullableArray{Int64,1} although I specified the types:
data = CSV.read("../datasets/baby.dat"; delim='\t', types=[Int, Float64, Float64, Float64, Float64, Float64])
The dataset is from http://stat.ethz.ch/Teaching/Datasets/baby.dat
I want to do a regression with the dataset, but the glm.jl Package gives an error with Nullable Arrays ...
Any ideas?
The complete error message is:
fit(GeneralizedLinearModel, @formula(Survival2 ~
Weight+Age+X1.Apgar+X5.Apgar+pH), data, Binomial(), ProbitLink())
ERROR: Non-call expression encountered
Stacktrace:
[1] dospecials(::Expr) at
/.julia/v0.6/DataFrames/src/statsmodels/formula.jl:97
[2] collect_to!(::Array{Symbol,1},
::Base.Generator{Array{Any,1},DataFrames.#dospecials}, ::Int64,::Int64) at
./array.jl:508
[3] collect_to_with_first!(::Array{Symbol,1}, ::Symbol,
::Base.Generator{Array{Any,1},DataFrames.#dospecials}, ::Int64) at
./array.jl:495
[4] _collect(::Array{Any,1},
::Base.Generator{Array{Any,1},DataFrames.#dospecials}, ::Base.EltypeUnknown,
::Base.HasShape) at ./array.jl:489
[5] map(::Function, ::Array{Any,1}) at ./abstractarray.jl:1868
[6] dospecials(::Expr) at
.julia/v0.6/DataFrames/src/statsmodels/formula.jl:101
[7] DataFrames.Terms(::DataFrames.Formula) at
.julia/v0.6/DataFrames/src/statsmodels/formula.jl:209
[8] #ModelFrame#127(::Array{Any,1}, ::Type{T} where T, ::DataFrames.Formula, ::DataFrames.DataFrame) at .julia/v0.6/DataFrames/src/statsmodels/formula.jl:333
[9] (::Core.#kw#Type)(::Array{Any,1}, ::Type{DataFrames.ModelFrame}, ::DataFrames.Formula, ::DataFrames.DataFrame) at ./<missing>:0
[10] #fit#153(::Dict{Any,Any}, ::Array{Any,1}, ::Function, ::Type{GLM.GeneralizedLinearModel}, ::DataFrames.Formula, ::DataFrames.DataFrame, ::Distributions.Binomial{Float64}, ::Vararg{Any,N} where N) at .julia/v0.6/DataFrames/src/statsmodels/statsmodel.jl:52
[11] fit(::Type{GLM.GeneralizedLinearModel}, ::DataFrames.Formula, ::DataFrames.DataFrame, ::Distributions.Binomial{Float64}, ::GLM.ProbitLink) at .julia/v0.6/DataFrames/src/statsmodels/statsmodel.jl:52
[12] eval(::Module, ::Any) at ./boot.jl:235
[13] eval(::Any) at ./boot.jl:234
[14] macro expansion at .julia/v0.6/Atom/src/repl.jl:186 [inlined]
[15] anonymous at ./<missing>:?
I assume that you want to get a DataFrame. Unfortunately your file is not tab-delimited. This is how you can load it into a DataFrame:
using DataFrames
data = split.(readlines("baby.dat"))
types = [Int, Float64, Float64, Float64, Float64, Float64]
df = DataFrame([parse.(t, getindex.(data[2:end], i)) for (i, t) in enumerate(types)],
Symbol.(replace.(data[1], ".", "")))
Note that I remove . from the column names, as the GLM package later has problems with them.
Now you can check that all is as desired:
julia> showcols(df)
247×6 DataFrames.DataFrame
│ Col # │ Name │ Eltype │ Missing │ Values │
├───────┼──────────┼─────────┼─────────┼──────────────────┤
│ 1 │ Survival │ Int64 │ 0 │ 1 … 0 │
│ 2 │ Weight │ Float64 │ 0 │ 1350.0 … 790.0 │
│ 3 │ Age │ Float64 │ 0 │ 32.0 … 27.0 │
│ 4 │ X1Apgar │ Float64 │ 0 │ 4.0 … 4.0 │
│ 5 │ X5Apgar │ Float64 │ 0 │ 7.0 … 8.0 │
│ 6 │ pH │ Float64 │ 0 │ 7.25 … 7.35 │
julia> head(df)
6×6 DataFrames.DataFrame
│ Row │ Survival │ Weight │ Age │ X1Apgar │ X5Apgar │ pH │
├─────┼──────────┼────────┼──────┼─────────┼─────────┼──────┤
│ 1 │ 1 │ 1350.0 │ 32.0 │ 4.0 │ 7.0 │ 7.25 │
│ 2 │ 0 │ 725.0 │ 27.0 │ 5.0 │ 6.0 │ 7.36 │
│ 3 │ 0 │ 1090.0 │ 27.0 │ 5.0 │ 7.0 │ 7.42 │
│ 4 │ 0 │ 1300.0 │ 24.0 │ 9.0 │ 9.0 │ 7.37 │
│ 5 │ 0 │ 1200.0 │ 31.0 │ 5.0 │ 5.0 │ 7.35 │
│ 6 │ 0 │ 590.0 │ 22.0 │ 9.0 │ 9.0 │ 7.37 │
julia> tail(df)
6×6 DataFrames.DataFrame
│ Row │ Survival │ Weight │ Age │ X1Apgar │ X5Apgar │ pH │
├─────┼──────────┼────────┼──────┼─────────┼─────────┼──────┤
│ 1 │ 1 │ 1120.0 │ 28.0 │ 7.0 │ 7.0 │ 7.33 │
│ 2 │ 1 │ 1020.0 │ 28.0 │ 5.0 │ 7.0 │ 7.34 │
│ 3 │ 1 │ 1320.0 │ 28.0 │ 6.0 │ 6.0 │ 7.24 │
│ 4 │ 0 │ 900.0 │ 27.0 │ 5.0 │ 6.0 │ 7.37 │
│ 5 │ 1 │ 1150.0 │ 27.0 │ 4.0 │ 7.0 │ 7.37 │
│ 6 │ 0 │ 790.0 │ 27.0 │ 4.0 │ 8.0 │ 7.35 │
Now the GLM part (notice the correct way to call GLM):
julia> using GLM
julia> glm(@formula(Survival ~ Weight+Age+X1Apgar+X5Apgar+pH), df, Binomial(), ProbitLink())
StatsModels.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Binomial{Float64},GLM.ProbitLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}
Formula: Survival ~ 1 + Weight + Age + X1Apgar + X5Apgar + pH
Coefficients:
Estimate Std.Error z value Pr(>|z|)
(Intercept) -0.563327 8.36692 -0.0673279 0.9463
Weight 0.00213458 0.000479601 4.45074 <1e-5
Age 0.0996481 0.0444713 2.24073 0.0250
X1Apgar 0.0698717 0.0646315 1.08108 0.2797
X5Apgar 0.0371294 0.0703724 0.527614 0.5978
pH -0.624956 1.11015 -0.562946 0.5735
You can check that the results are the same as in R for this model.