Thought this was interesting. Across 160 teams of researchers, just about all failed to make good predictions of life outcomes like GPA, evictions, layoffs, and others. The data followed 4.5k families across 15 years, with 13k features (varied over time). I haven't looked at it directly yet, but will be turning the docs and data inside out... In the meantime, the authors claim this shows the limits of ML. Oh, and it's published in PNAS, so you know there's some big publication energy there.
https://www.pnas.org/content/117/15/8398

When you say "sometimes working, sometimes not", I am guessing you mean that some of SpaCy's stopwords are not stopwords in your domain and vice versa? One possibility to generate stopwords for your own domain is to calculate the IDF of all words across your corpus and use a sensible threshold to mark words above the threshold as stopwords for your domain.
Sorry, I meant words below the IDF threshold. Stopwords are basically words with high document frequency, and therefore low IDF (i.e. they occur in so many docs that they lose their discriminative power).
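In case it helps, here is a rough sketch of that idea in plain Python. The function name `idf_stopwords`, the whitespace tokenizer, the toy corpus, and the 0.5 cutoff are all just assumptions for illustration; you would want your own tokenizer and a threshold tuned on your data.

```python
import math
from collections import Counter

def idf_stopwords(docs, threshold):
    """Return the set of words whose IDF falls below `threshold`."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each word at most once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    # Unsmoothed IDF: log(N / df). A word in every doc gets IDF 0.
    idf = {word: math.log(n_docs / df) for word, df in doc_freq.items()}
    # Low IDF means the word appears in most docs, so treat it as a stopword.
    return {word for word, score in idf.items() if score < threshold}

# Toy corpus (made up): "patient" is not a general-English stopword,
# but it shows up in most of these docs.
corpus = [
    "the patient was given the drug",
    "the drug reduced the fever",
    "the patient reported a headache",
    "the trial enrolled the patient",
]
stopwords = idf_stopwords(corpus, threshold=0.5)
```

On this toy corpus "patient" ends up flagged alongside "the", which is the kind of domain-specific stopword a generic list would miss.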
Coach
I've worked a little with this, though not extensively. If you put the code on GitHub and want me to look at it, I will.