Suitability of mixed-effects models to explain sales in retail? - regression

I have weekly promotional sales data for different product clusters and different stores. In addition, I have information on the promotions (e.g. discount), weather, the economic situation, and more. Sales cannot always be observed at all locations.
I am interested in the key drivers of sales for each product cluster. Since I do not have many observations per product cluster, but at the same time I know that observations are not independent within a cluster, I thought of fitting a mixed-effects model with location and product cluster as random effects. I still have to read much more about the assumptions and implementation of mixed-effects models, but I would be very grateful for feedback on whether this makes sense.
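For concreteness, what I have in mind would look something like the sketch below (all column names are placeholders). I understand that fully crossed random effects for store and cluster are more natural in R's lme4 (sales ~ discount + ... + (1|store) + (1|product_cluster)); in Python's statsmodels, one grouping factor plus a variance component seems to be the closest equivalent:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: one row per week x store x product cluster, with columns
# sales, discount, weather_index, econ_index, store, product_cluster.
df = pd.read_csv("weekly_promo_sales.csv")

# Random intercept per product cluster, plus a variance component for stores
# (treated here as nested within cluster rather than fully crossed).
model = smf.mixedlm(
    "sales ~ discount + weather_index + econ_index",
    data=df,
    groups=df["product_cluster"],
    vc_formula={"store": "0 + C(store)"},
)
result = model.fit()
print(result.summary())
```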

Related

Scheduling Optimization for multi-step growth modeling

I recently had a Gaussian Process machine learning program built for my production department. This GP system has built a massive MySQL database that provides growth durations for each of the organisms we grow (lab environment) and the predicted yield for each of those combinations of growth steps.
I would like to build an optimization program in python (preferably) to assist me in scheduling what organisms to grow, when to grow them, and for how long at each step.
Here is some background:
4 steps to the process
Plate step (organism is plated; growth is started)
Seed step (organism transferred from plate to seed phase)
Incubation step (organism is transferred from seed to incubation phase)
Harvest step (organism is harvested; yield collected)
There are multiple organisms (>50) that are grown per year. Each has its own numerical ID.
There is finite space to grow organisms at the incubation step.
There is infinite space to grow organisms at the plate and seed steps.
Multiple 'lots' of the same organism are typically grown at a time. A lot is predefined by the number of containers being used at the incubation step.
Different organisms have very different maximum yields. Some yield 2000 grams max and others 600 g max.
The MySQL server has every combination of # of days at each step for each organism and the predicted yield for that combination. This data is what needs to be used for optimization.
The massive challenge we run into is scheduling which organisms to grow when. With the GP process, we know the theoretical maximums (and they work!), but it's hard to put them into practice due to constraints (see below).
Here would be my constraints:
Only one organism can be harvested per day.
No steps can be started on weekends. Organisms can grow over the weekend, but we can't start a new step on a weekend
If multiple 'lots' are being grown of the same mold, the plate and seed start dates should be the same for every 'lot'.
- What this typically looks like in practice is:
- plate and seed steps start on the same day
- next, incubation steps start day-after-day for as many lots as are being made
- finally, harvests occur in the same pattern (day-after-day)
- Therefore, what you typically get is identical # of days in the plate phase, identical # of incubation days, and differing # of seed days.
Objective Function: I don't know how to articulate this perfectly, but very broadly we need to maximize the yields for each organism. However, there needs to be a time balance too as the space to grow the organisms is finite and the time we have to grow them is finite as well.
I have created a metric known as lot-weeks that tries to capture that. It is a measure of the number of weeks (at the incubation phase) needed to grow the expected annual demand of a specific organism, based on the predicted yield from the SQL server. Therefore, a potential objective function would be to minimize the lot-weeks for each organism.
This is obviously more of a broad ask for help. I don't have a specific request. If this is not appropriate for this forum, I can take my question elsewhere. I feel comfortable with the scope of the project and can figure out how to write the code over time but I need assistance with what tools to use and what's possible.
I've seen that pyomo may be helpful but I also wanted to check here first. Thank you
I've tried looking into using Pyomo but stopped due to the complexity and didn't want to learn all of it if it wasn't appropriate for the problem.
Edit: This was too broad, I apologize. I've created another post with more concrete examples. Thank you for all that helped.
This is really too broad of a question for this forum, and it may likely get closed. That said...
You have a framework here that you could develop an optimization in. The database part is irrelevant. For an effective optimization model, what you really need is a known relationship between the variables and the outcomes, for instance, days in incubation ==> size of harvest, which it sounds like you have.
This isn't an entry-level model you are describing. Do you have any resources to help? A local university that might have a need for grad-student projects in the field, or such?
As you develop this, you should start small and focus the model on the key issues here... if they aren't known, then perhaps that is the place to start. For instance, perhaps the key issue is management of planting times vis-a-vis the weekends (that is one model). Or perhaps the key issue is the management of the limited space for growth, and the inability to start steps on the weekend just kinda works itself out (that is another model, for space management). Try one that seems to address the key management questions. Start very small and see if you can get something working as a proof of concept. If this is your first foray into linear programming, you will need help. You might also start with an introductory textbook on LP.
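To make "start very small" concrete, here's a toy Pyomo sketch of the kind of proof of concept I mean. Everything in it is made up (two organisms, a ten-day horizon, fabricated yield numbers standing in for your MySQL/GP predictions), and it only enforces the one-harvest-per-day rule; the weekend and incubation-space constraints would be added in the same style.

```python
import pyomo.environ as pyo

# Toy data: hypothetical organisms, a 10-day horizon, and a made-up
# yield-by-harvest-day table standing in for the MySQL/GP predictions.
days = list(range(1, 11))
organisms = ["org_A", "org_B"]
yield_est = {("org_A", d): 100 + 5 * d for d in days}
yield_est.update({("org_B", d): 200 + 2 * d for d in days})

model = pyo.ConcreteModel()
# harvest[o, d] = 1 if organism o is harvested on day d
model.harvest = pyo.Var(organisms, days, domain=pyo.Binary)

# Constraint: at most one harvest per day.
model.one_per_day = pyo.Constraint(
    days, rule=lambda m, d: sum(m.harvest[o, d] for o in organisms) <= 1)

# Constraint: each organism is harvested exactly once in the horizon.
model.harvest_once = pyo.Constraint(
    organisms, rule=lambda m, o: sum(m.harvest[o, d] for d in days) == 1)

# Objective: maximize total predicted yield.
model.total_yield = pyo.Objective(
    expr=sum(yield_est[o, d] * model.harvest[o, d] for o in organisms for d in days),
    sense=pyo.maximize)

# Solve with any installed MILP solver (CBC, GLPK, ...).
pyo.SolverFactory("cbc").solve(model)
for o in organisms:
    for d in days:
        if pyo.value(model.harvest[o, d]) > 0.5:
            print(o, "harvested on day", d)
```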

How to manage multiple simulations that continually create large datasets

My work is related to mathematical modelling and running computer simulations in fluid mechanics. I have a mathematical model that has, say, 5 parameters. Each of them has some range defined by us, and we would like to study how the model performs within these ranges.
We make a computer code, and start running simulations.
Very soon, I have an extremely large dataset, and it becomes increasingly difficult to keep track of which simulation I ran when...
...and if they are running on different computers, it is even more difficult to manage.
One simulation takes about 3-4 days to finish, so by the time one finishes, we have to go back through our lab notes to see what made us run that simulation in the first place.
The problem is compounded when the number of parameters is very large, obviously.
I want something that tracks all of this. An app, website, tool, code, software, anything that can tabulate all of these parameters. Maybe record dates, keep track of re-runs, and just show the 'status-board' of all my simulations.
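To make it concrete, even a small shared SQLite "status board" along the lines below would cover most of what I'm after (the table and column names are just an illustration; dedicated experiment trackers such as MLflow do the same thing with a web UI):

```python
import sqlite3
from datetime import datetime

# Minimal sketch of a shared status board: one SQLite file (e.g. on a network
# drive) that every machine writes to. Table and column names are invented.
DB = "simulations.sqlite"

def init_db():
    with sqlite3.connect(DB) as con:
        con.execute("""CREATE TABLE IF NOT EXISTS runs (
            run_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            started  TEXT, finished TEXT, host TEXT, status TEXT,
            reason   TEXT,   -- the lab-note entry: why this run was launched
            p1 REAL, p2 REAL, p3 REAL, p4 REAL, p5 REAL)""")

def register_run(host, reason, params):
    """Call just before launching a simulation; returns its run_id."""
    with sqlite3.connect(DB) as con:
        cur = con.execute(
            "INSERT INTO runs (started, host, status, reason, p1, p2, p3, p4, p5) "
            "VALUES (?, ?, 'running', ?, ?, ?, ?, ?, ?)",
            (datetime.now().isoformat(timespec="seconds"), host, reason, *params))
        return cur.lastrowid

def finish_run(run_id, status="done"):
    with sqlite3.connect(DB) as con:
        con.execute("UPDATE runs SET finished = ?, status = ? WHERE run_id = ?",
                    (datetime.now().isoformat(timespec="seconds"), status, run_id))

def status_board():
    with sqlite3.connect(DB) as con:
        for row in con.execute(
                "SELECT run_id, status, started, host, p1, p2, p3, p4, p5 FROM runs"):
            print(row)
```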

Some total newbie questions on NFT and Ethereum

I'm interested in the conceptual topic of creating rights management systems on the Ethereum blockchain, with digital assets represented by an NFT.
I am just reading up on how to write programs that run on Ethereum, but I have some very basic questions just to get started.
I read that NFTs are created on the Ethereum blockchain. I don't really understand if that is the same blockchain on which the currency Ether is maintained? It seems like the ledger will become impossibly large if every currency transaction and every digital asset (and copy thereof) that migrates to Ethereum is stored in one single giant ledger, and each miner on the chain has to download the entire ledger to a single machine in order to validate transactions? Have I got a big misunderstanding there? I know there is talk about "sharding" in the future, but it seems like that isn't coming very soon.
Cost of running a smart contract on the blockchain? Assuming that we are talking about the same blockchain, from what I can see the price of "gas" is quite high. I'm reading that the price of an ETH transfer from one party to another is 21,000 Gwei, about $0.03 today. Just trying to understand the basics: how much does it cost to create an NFT? And roughly how much does it cost to execute a simple function on the blockchain (without loops)? Let's say the equivalent of a five-statement function which takes a few simple params, reads a few blocks, doesn't write to the blockchain but just performs some simple math and a few if statements and returns a string? Does that also cost, like, more than a penny? Is the ETH2 switch from proof of work to proof of stake going to bring those costs down by orders of magnitude?
Any good resources or references on how to write programs which create and manipulate NFTs on Ethereum? Most of what I have seen in the bookstores seems to cover financial transactions with Ether.
Yes, it's the same blockchain.
You can see in the stats that a full node (which stores the current state) currently takes about 400 GB and an archive node (which stores historical states as well) takes about 6.6 TB.
My observation is that most web apps using blockchain data don't verify it themselves and instead trust a third-party service running a node (such as Infura). And I believe that most end users or businesses who want/need to verify usually have the capacity to store 400+ GB and are able to scale.
But whether this amount of data is okay or "impossibly large", I'll leave that to your decision. :)
Deployment of a token smart contract usually costs between 500k and 3M gas. My estimate is that most token contracts with basic features that were compiled with an optimizer cost around 1M gas to deploy. With current prices of ~200 Gwei/gas and $1800/ETH, that's about $350. But I remember that just a few months ago the average gas price was ~20 Gwei and ETH cost $500, so that would be around $10. So yea, the cost of deploying a contract is very volatile.
A simple function that performs validations and transformations in memory is going to cost the base 21k gas plus a few hundred more. (Working with memory data is cheap gas-wise; accessing storage is much more expensive.) So at current prices that's around $7; a few months ago it could have been $0.25.
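The conversion behind those dollar figures is plain arithmetic (gas used × gas price in Gwei × ETH price), for example:

```python
# Quick arithmetic behind the dollar figures above (prices are snapshots only).
GWEI_PER_ETH = 10**9

def cost_usd(gas_used, gas_price_gwei, eth_usd):
    """Convert a gas amount into USD at a given gas price and ETH price."""
    return gas_used * gas_price_gwei / GWEI_PER_ETH * eth_usd

print(cost_usd(1_000_000, 200, 1800))  # token deployment at current prices   -> ~$360
print(cost_usd(1_000_000, 20, 500))    # the same deployment a few months ago -> ~$10
print(cost_usd(21_000, 200, 1800))     # base cost of a simple call/transfer  -> ~$7.6
```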
As for the question whether ETH2.0 is going to bring lower gas prices: My opinion is that L2 (which should be released earlier than PoS) is going to have some effect on the price, since it allows for sidechain transactions (similar to the Lightning network on Bitcoin). But this is a development forum, so I'm not going to dive deeper into price speculation.
I recommend OpenZeppelin docs where they cover their opensource implementations of ERC standards (including ERC-721 NFTs) or googling the topic you're interested in and read articles that catch your eye (at least that's my current approach).
And if you're new to Solidity in general, I recommend at least a few chapters of the CryptoZombies tutorial. In my opinion, the first few chapters are great and you'll learn a lot, but then the quality slowly fades.

How to do Transfer Learning with LSTM for time series forecasting?

I am working on a project about time-series forecasting using LSTM layers. The dataset used for training and testing the model was collected from 443 persons who wore a sensor that samples a physical variable (1 variable/measurement) every 5 minutes; for each patient there are around 5000 records/readings.
Although I can train and test my model under different scenarios, I am having trouble finding information about how to apply transfer learning in such an architecture. I mean, I understand I can use inductive transfer learning by copying the matrix weights from the general model onto a secondary model (for an unknown person), and then re-train this model with person-specific data and evaluate the result.
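Concretely, what I have in mind for that weight-copying approach is something like this Keras sketch (layer sizes, the 60-step input window, and file names are just placeholders):

```python
import tensorflow as tf

def build_model():
    # Same architecture for the general and the person-specific model;
    # 60 timesteps (5 h at one sample per 5 minutes) is a placeholder.
    return tf.keras.Sequential([
        tf.keras.layers.LSTM(32, return_sequences=True, input_shape=(60, 1)),
        tf.keras.layers.LSTM(16),
        tf.keras.layers.Dense(1),
    ])

# 1) General model trained on the pooled multi-person dataset.
general = build_model()
general.compile(optimizer="adam", loss="mse")
# general.fit(X_all, y_all, epochs=20)
general.save_weights("general.weights.h5")

# 2) Person-specific model: copy the weights, freeze the recurrent layers,
#    and fine-tune only the output layer on the new person's data.
personal = build_model()
personal.load_weights("general.weights.h5")
for layer in personal.layers[:-1]:
    layer.trainable = False
personal.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
# personal.fit(X_new_person, y_new_person, epochs=10)
```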
But I would like to know if somebody knows other ways to apply transfer learning to this type of architecture, or where to find information about it, since there aren't many scientific papers talking about it; mostly they cover NLP and other types of applications, but what about time series?
Cheers X )

How often do Google Cloud preemptible instances get preempted (roughly)?

I see that Google Cloud may terminate preemptible instances at any time, but have any unofficial, independent studies been reported, showing "preempt rates" (number of VMs preempted per hour), perhaps sampled in several different regions?
Given how little information I'm finding (as with similar questions), even anecdotes such as: "Looking back the past 6 months, I generally see 3% - 5% instances preempt per hour in uswest1" would be useful (I presume this can be monitored similarly to instance count metrics in AWS).
Clients occasionally want to shove their existing, non-fault-tolerant code into the cloud for "cheap" (despite best practices), and without an expected rate of failure in hand they're often blinded by the cheapness of preemptible instances, so I'd like to share some typical experiences of the GCP community, even if people's experiences may vary, to help set safe expectations.
Considering the requests for "unofficial, independent studies" and "even anecdotes such as:", and the remark that "Clients occasionally want to shove their existing, non-fault-tolerant code in the cloud for 'cheap'", it ought to be said that no architect or sysadmin in their right mind would place production workloads with a defined SLA into an execution environment without an SLA. Hence the topic is rather speculative.
For those who are keen, Google provides a preemption rate expectation:
"For reference, we've observed from historical data that the average preemption rate varies between 5% and 15% per day per project, on a seven-day average, occasionally spiking higher depending on time and zone. Keep in mind that this is an observation only: Preemptible instances have no guarantees or SLAs for preemption rates or preemption distributions."
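Purely as a back-of-the-envelope exercise (and assuming, with no guarantee, that preemptions are spread evenly over the day), that quoted per-day range translates to roughly the following per-hour figures:

```python
# Back-of-the-envelope only: converting the quoted 5-15% per-day, per-project
# preemption rate into a rough hourly figure, assuming preemptions are spread
# evenly over the day (Google gives no guarantee about the distribution).
for daily_rate in (0.05, 0.10, 0.15):
    hourly_rate = 1 - (1 - daily_rate) ** (1 / 24)
    print(f"{daily_rate:.0%}/day  ->  roughly {hourly_rate:.2%} of instances per hour")
```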
Besides that, there is an interesting edutainment approach to the task of "how to make the inapplicable applicable".