Pro
Mine can't find an email based on simple criteria like who it was from or the subject matter. It was really good at giving me a five-paragraph explanation of what other kinds of emails it might be able to show me.
Chief
They still can’t make tacos
Chief
That’s a clanker taco
lol. sure.
Oh sweet summer child
Well, that’s the entire point of “generative” ai
The industry is not ready to hand over the keys to AI. Period.
Most companies flaunting AI are posturing.
I just want to know who you work for and what clown university you got your degree from at this point
Pro
but my water :(
Not sure if you are being facetious or serious, but we should be worried about our water quality, air quality, etc., specifically long term.
Nice try, AI
If you think LLMs write "flawless code," I have some serious doubts about your own ability.
I've worked with several of the mainstream LLMs and they all need a lot of hand-holding to keep the level of technical debt low. Even then, they still require very regular code reviews to catch stupid assumptions/hallucinations.
As a force-multiplier, they are practical. As a standalone expert, not even close.
Force multiplier is a great way to describe them. As a developer, they can make me a better developer because I understand what I’m building. I’m not a particularly good writer. If I use an LLM to help me write a novel, I’ll write a novel, but it won’t be very good because it’s multiplying a weakness instead of a strength.
Technically slightly overstated at present. Relevant benchmarks for hard agentic terminal use problems and non-hallucination rates max out in the 50% range for the top tier models presently. Given the established rate of change though, we could easily see mid-90% range by the end of the year. Between Gemini 3 Pro in November 2025 and Gemini 3.1 Pro released this week, the non-hallucination rate increased from 12% to 50% on the relevant benchmark.
The quality of the core LLM flow (take input, produce plausible tokens) has pretty much peaked, and now it’s mainly about building harnesses around the models. Hallucinations, lack of transparency, and context length/attention limitations are a feature, not a bug, and different tools and workflows need to be built to get around them and make LLMs useful for actual serious work.
It’s a shame the LLMs are superficially convincing. For example I see people ask “Analyse all the research in the world to define if X is true” and then actually believe the chatbot when it says seconds later “Based on analysis of all publications…”. In reality of course it did NOT analyse anything as that would have taken hours :) Even “Deep Research” is mainly 100-200 google searches and very little critical analysis.
Winners will be those who take a pragmatic approach to AI and learn to use a few great tools fit for their workflows. Check out NotebookLM, Claude Cowork, Skimle, Gamma etc. for typical consulting use cases. And spend time learning how LLMs work beyond the “it’s magic” hype :)
While I agree that an agentic harness and tool use greatly enhance models' capabilities, it is inaccurate to say the models themselves have peaked. Objective benchmarks demonstrate they continue to make rapid progress on multiple fronts, including on tasks that people typically cite as their shortcomings. This is why Google, OpenAI and Anthropic have increased their pace in releasing new SOTA models--the gains are evident at a faster and faster rate. Check out Artificial Analysis if you don't already.
Rising Star
There is still a ton of things it can’t do but in about two years 😳
Rising Star
We use different LLMs. The answers I get from Claude/Gemini/ChatGPT range from D+ to C+.
Plenty of leverage for sure, but even using the latest models and rigorous prompt design and criteria, I still have to pressure test and iterate anything I get through multiple cycles before I consider it of even moderately decent quality - this is for data analytics tasks or text generation. And even then I end up making significant adjustments and revisions before it’s client ready. As someone who actually holds a PhD, I question your estimation of the models’ capabilities. I’d put them at “over-eager undergrad intern” or worse yet “average foreigner paying the full cost of a 1-year Master’s Degree to access the U.S. job market”
What are you on? AI is still fecked up 😆 Can’t lie, it’s great in certain industries & use cases - but what it produces in minutes I then tend to spend time auditing for accuracy… like a job in itself