Additional Posts in Data & Analytics Consultants
today I choose violence

Thought this was interesting. Across 160 teams of researchers, nearly all failed to make accurate predictions of life outcomes such as GPA, evictions, layoffs, and more. The data followed 4.5k families across 15 years, with 13k features (varying over time). I haven't looked at it directly yet, but I will be turning the docs and data inside out... In the meantime, the authors present this as showing the limits of ML. Oh, and it's published in PNAS, so you know there's some big publication energy there.
https://www.pnas.org/content/117/15/8398
Got messaged by a C3.ai recruiter. I've read that work-life balance is bad and that the interview process is absurdly long, but the Glassdoor rating is 4.2 and I can't find anyone posting actual hours worked. How's the culture really? I'd be aiming for DS consulting: something more functional, but with DS/ML concepts as my differentiator.
C3.ai, Inc.




You probably need to encode the qualitative variables numerically first. After that, it comes down to which model gives you the best accuracy: split the data into training and test sets, and try different models.
To do this you will need a decent understanding of statistical fundamentals.
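A minimal sketch of that workflow, assuming NumPy and a made-up setup: a categorical `discount_type` column, a numeric `price`, and SKU `quantity` as the target (all hypothetical names). It one-hot encodes the qualitative variable, holds out a test set, and compares a few least-squares models by mean absolute error on the held-out rows.

```python
import numpy as np

def one_hot(values):
    """Turn a list of category labels into a 0/1 indicator matrix (one column per level)."""
    levels = sorted(set(values))
    matrix = np.array([[1.0 if v == level else 0.0 for level in levels] for v in values])
    return matrix, levels

def fit_least_squares(X, y):
    """Ordinary least-squares coefficients for X @ coef ~= y."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def compare_models(discount_type, price, quantity, test_fraction=0.25, seed=0):
    """Score candidate feature sets on a held-out test split (hypothetical column names)."""
    rng = np.random.default_rng(seed)
    n = len(quantity)
    order = rng.permutation(n)
    n_test = int(n * test_fraction)
    test, train = order[:n_test], order[n_test:]

    dummies, _ = one_hot(discount_type)          # qualitative -> numeric indicators
    y = np.asarray(quantity, dtype=float)
    candidates = {
        "mean only": np.ones((n, 1)),
        "price only": np.column_stack([np.ones(n), price]),
        # The full set of dummies acts as per-level intercepts, so no separate constant.
        "price + discount type": np.column_stack([price, dummies]),
    }
    scores = {}
    for name, X in candidates.items():
        coef = fit_least_squares(X[train], y[train])   # fit on training rows only
        scores[name] = mae(y[test], X[test] @ coef)    # score on unseen rows
    return scores
```

Swap in whatever candidate models you like; the point is that every model is scored on rows it never saw during fitting.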
Thanks, will give this a shot. There's a lot of history in the data, so biases can be mitigated by expanding the sample.
SHAP values
Thx, can this be done without Python?
Sounds like your data is tabular. Fit an XGBoost model, then just take a look at the SHAP values.
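To show what a SHAP value actually measures, here is a from-scratch sketch of exact Shapley values for a single prediction. It assumes that a "missing" feature is represented by substituting a single baseline row's value; this is brute force over feature subsets, not the `shap` library's algorithms, and is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row x; absent features take baseline values (an assumption)."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their actual values; the rest are set to the baseline.
        row = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(row)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # subsets of the other features, sizes 0..n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi.append(total)
    return phi
```

For a model `2*a + 3*b*c` at `x = [1, 2, 3]` with an all-zero baseline, feature `a` gets 2 and the interaction term is split evenly between `b` and `c` (9 each); the values always sum to `predict(x) - predict(baseline)`.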
Try running a regression to see which independent variables have p-values lower than 0.05, and use them to build other forecast models. Alternatively, use PCA to find the components that explain the most variance in your data (and the variables that load most heavily on them), and use those to build further models.
Any thoughts on whether regression or PCA is superior or inferior to the other suggestions (e.g., SHAP, XGBoost) for my use case?
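Both screening ideas can be sketched with NumPy alone. Two assumptions to flag: the p-values below use a normal approximation to the t distribution (reasonable when the row count is much larger than the number of predictors; `statsmodels` OLS gives exact t-based p-values), and the features are standardized before PCA so that variables on large scales don't dominate the components.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """OLS slopes and two-sided p-values (normal approximation to the t distribution)."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])            # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    sigma2 = resid @ resid / (n - k - 1)             # residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)          # coefficient covariance matrix
    t = coef / np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])  # P(|Z| > |t|)
    return coef[1:], p[1:]                           # drop the intercept

def pca_explained_variance(X):
    """Explained-variance ratio and loadings of the standardized features."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    return ratio, vt                                 # row j of vt = loadings of component j
```

Note that PCA ranks components, not original variables; the variables that matter most are read off the largest absolute loadings in the top rows of `vt`.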
Coach
How do you define top?
Why are you forecasting with an undetermined dependent variable?
Coach
There are dozens of algos you can try, but your dependent variable needs to be driven by a clear business question or objective. Otherwise it's just DS for DS's sake, and the people funding us hate that.
What type of forecast model? Regression? Something else?
I'm surprised there's no time element to this if you're looking at product discounts. There could be a high degree of seasonality!
You mentioned forecasting; are these variables cross-sectional or time-series data? What's your dependent variable?
Cross-sectional; dependent variable = SKU quantity consumed.
CART (classification and regression trees)
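CART grows a tree by repeatedly picking the feature and threshold that most reduce prediction error, which also doubles as a rough variable-selection signal. Below is a minimal sketch of that core step for a regression tree, assuming numeric features and a squared-error criterion; it omits recursion, stopping rules, and pruning, and the feature names in the usage note are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Threshold on one numeric feature that minimizes left SSE + right SSE."""
    pairs = sorted(zip(xs, ys))
    best_cost, best_threshold = float("inf"), None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_cost, best_threshold = cost, threshold
    return best_threshold, best_cost

def best_feature_split(columns, ys):
    """Pick the feature (dict of name -> values) whose best split has the lowest error."""
    results = {name: best_split(xs, ys) for name, xs in columns.items()}
    name = min(results, key=lambda n: results[n][1])
    return name, results[name][0]
```

For example, with a hypothetical `price` column that cleanly separates low and high quantities and a `noise` column that doesn't, `best_feature_split` picks `price` and the midpoint threshold between the two groups.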
Always start from understanding what you are trying to predict, and the general structure of your data.
It sounds like you are trying to predict volume on each of ~100 SKUs for a set of customers? How many customers? What time horizon are you trying to predict?
Do your candidate variables describe the SKUs, the customers, the order history or all of the above? This will all influence what you can/should do.
Appreciate everyone’s thoughts btw. New to the group & learning DS for PD so this is great!