Recently, a Malaysian KOL (key opinion leader) came under fire for selling a skincare product suspected of containing mercury. The product also reportedly did not obtain approval to be sold. Let's use several LLMs to inspect the packaging of this product and see whether they can spot anything suspicious.
The input
The input is an image of the product packaging obtained from a media portal.
A simple prompt is used:
Based on the product description, did you spot anything suspicious?
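If you'd rather reproduce this step from a script than through a chat UI, here is a minimal sketch of how the image and the prompt can be packed into a single multimodal request. It assumes the OpenAI Python SDK, a local copy of the packaging photo saved as product_packaging.jpg (a hypothetical filename), and a model name that may need updating.

```python
# Minimal sketch: send the packaging photo plus the prompt as one
# multimodal request. Filename and model name are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Based on the product description, did you spot anything suspicious?"

# Encode the local image as a base64 data URL so it can travel in the request.
with open("product_packaging.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The other providers accept the same image-plus-text idea, just with their own message formats.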
The LLMs
Since the input contains both an image and text, I selected the following vision-capable models (a sketch for querying all of them from a script follows this list):
- OpenAI GPT-4.5
- Claude 3.7 Sonnet
- xAI Grok 2 Vision
- Gemini 2.0 Flash
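To run the same comparison programmatically, one option is an OpenAI-compatible gateway such as OpenRouter, which exposes all four models behind one endpoint. This is only a sketch: the model identifiers and the image filename are assumptions and should be checked before running.

```python
# Minimal sketch: send the same image + prompt to all four models through
# an OpenAI-compatible gateway (OpenRouter assumed; model slugs and the
# image filename are assumptions).
import base64
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

PROMPT = "Based on the product description, did you spot anything suspicious?"

MODELS = [
    "openai/gpt-4.5-preview",
    "anthropic/claude-3.7-sonnet",
    "x-ai/grok-2-vision-1212",
    "google/gemini-2.0-flash-001",
]

with open("product_packaging.jpg", "rb") as f:  # hypothetical filename
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# The same multimodal message is reused for every model.
content = [
    {"type": "text", "text": PROMPT},
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
]

for model in MODELS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```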
The Results
1) OpenAI GPT-4.5 replied:
2) Claude 3.7 Sonnet replied:
3) Grok 2 Vision replied:
4) Google Gemini 2.0 Flash / 1.5 Pro
The result given by Gemini 2.0 Flash is rather disappointing. I retried the same request with Gemini 1.5 Pro, and the result was similar.
What happened, Google?
Summary
| | Grok 2 Vision | Claude 3.7 Sonnet | OpenAI GPT-4.5 | Google Gemini 2.0 Flash |
|---|---|---|---|---|
| Based on the description | Detected spelling errors, missing ingredient list, no regulatory information | Missing ingredient list, no manufacturer, no safety certificate | Detected spelling errors and poor English; missing ingredient list; no manufacturer or safety info | Detected one spelling mistake |
| Based on the packaging | Noted that it is not a well-known brand and the packaging is basic | Noted that the red packaging evokes certain associations | | |
| Others | Noted that the small size (15 g) looks more like a trial/sample | | | |
| Possible AI hallucination | Suggested that it could be a stimulant | | | |
- Grok 2 Vision and Claude 3.7 Sonnet analyze both the extracted text and the picture itself, while GPT-4.5 and Gemini 2.0 Flash appear to analyze only the extracted text. (This could be because the original prompt asks specifically about the description.)
- Grok 2 Vision is surprisingly good, even though it is effectively a previous-generation model compared to the rest.
- Google Gemini 2.0 Flash is disappointing.