Recently, a Malaysian KOL (key opinion leader) came under fire for selling a skincare product suspected of containing mercury. The product also reportedly did not obtain approval to be sold. Let's use several LLMs to inspect the packaging of this product and see whether they can spot anything suspicious.
The input
The input is an image of the product packaging obtained from a media portal.
A simple prompt is used:
Based on the product description, did you spot anything suspicious?
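If you'd rather reproduce this step from a script than through a chat UI, here is a minimal sketch of how the image and the prompt can be packed into a single multimodal request. It assumes the OpenAI Python SDK, a local copy of the packaging photo saved as product_packaging.jpg (a hypothetical filename), and a model name that may need updating.

```python
# Minimal sketch: send the packaging photo plus the prompt as one
# multimodal request. Filename and model name are assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Based on the product description, did you spot anything suspicious?"

# Encode the local image as a base64 data URL so it can travel in the request.
with open("product_packaging.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The other providers accept the same image-plus-text idea, just with their own message formats.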
The LLMs
Since the input contains both an image and text, I selected the following vision-capable models (a sketch for querying all of them from a script follows this list):
- OpenAI GPT-4.5
- Claude 3.7 Sonnet
- xAI Grok 2 Vision
- Gemini 2.0 Flash
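To run the same comparison programmatically, one option is an OpenAI-compatible gateway such as OpenRouter, which exposes all four models behind one endpoint. This is only a sketch: the model identifiers and the image filename are assumptions and should be checked before running.

```python
# Minimal sketch: send the same image + prompt to all four models through
# an OpenAI-compatible gateway (OpenRouter assumed; model slugs and the
# image filename are assumptions).
import base64
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

PROMPT = "Based on the product description, did you spot anything suspicious?"

MODELS = [
    "openai/gpt-4.5-preview",
    "anthropic/claude-3.7-sonnet",
    "x-ai/grok-2-vision-1212",
    "google/gemini-2.0-flash-001",
]

with open("product_packaging.jpg", "rb") as f:  # hypothetical filename
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# The same multimodal message is reused for every model.
content = [
    {"type": "text", "text": PROMPT},
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
]

for model in MODELS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```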
The Results
1) OpenAI GPT-4.5 replied:
2) Claude 3.7 Sonnet replied:
3) Grok 2 Vision replied:
4) Google Gemini 2.0 Flash / 1.5 Pro
The result given by Gemini 2.0 Flash is rather disappointing. I retried the same request with Gemini 1.5 Pro, and the result was similar.
What happened, Google?
Summary
| | Grok 2 Vision | Claude 3.7 Sonnet | OpenAI GPT-4.5 | Google Gemini 2.0 Flash |
|---|---|---|---|---|
| Based on the description | Detected spelling errors, missing ingredient list, no regulatory information | Missing ingredient list, no manufacturer, no safety certificate | Detected spelling errors and poor English; missing ingredient list; no manufacturer or safety info | Detected one spelling mistake |
| Based on the packaging | Noted that it is not a well-known brand and the packaging is basic | Noted that the red packaging evokes certain associations | | |
| Others | Noted that the small size (15 g) looks more like a trial/sample | | | |
| Possible AI hallucination | Suggested that it could be a stimulant | | | |
- Grok 2 Vision and Claude 3.7 Sonnet analyze both the extracted text and the picture itself, while GPT-4.5 and Gemini 2.0 Flash appear to analyze only the extracted text. (This could be because the original prompt asks specifically about the description.)
- Grok 2 Vision is surprisingly good, even though it is effectively a previous-generation model compared to the rest.
- Google Gemini 2.0 Flash is disappointing.