“Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.”

They gamed the benchmark: Gemini's headline number uses chain-of-thought prompting, while GPT-4's is benchmarked 5-shot (iirc). The technical paper shows minimal difference on an apples-to-apples comparison, but multimodal performance is significantly better for Gemini. Google needed a PR headline; “almost the same as GPT-4” isn’t good enough.
The benchmark comparison is also against standard GPT-4, not GPT-4 Turbo, which OpenAI claims is better.
My suspicion is that the multimodal gap is the easier one for OpenAI to close.
What is obvious is that there is a variety of low-hanging fruit for juicing model performance that is currently held back by compute, so these models are likely to get substantially better rapidly.