Identifying suspicious products using AI

Recently, a Malaysian KOL faced criticism for selling a skincare product believed to contain mercury. The product is believed to have not obtained permission to be sold. Let's try to use various LLMs to inspect the packaging of this product and see if we can find anything suspicious.


The input

The input is an image we obtain from a media portal.

A simple prompt is used:

Based on the product description, did you spot anything suspicious?

The LLMs

Since the input contains image and text, I selected the following models:

  • Open AI GPT 4.5
  • Claude 3.7 Sonnet
  • X Grok 2 Vision
  • Gemini 2.0 Flash

The Results

1) Open AI GPT 4.5 replied:

2) Claude 3.7 Sonnet replied:

3) Grok 2 Vision replied:

4) Google Gemini 2.0 Flash / 1.5 Pro

The result given by Gemini 2.0 Flash is rather disappointing. I retried the same request using Gemini 1.5 Pro, the result was similar.

What has happened, Google?


Summary

  Grok 2 Vision Claude 3.7 Sonnet Open AI GPT 4.5 Google Gemini 2.0 Flash
Based on description Detect spelling errors, lack of ingredients, no regulatory information Lack of ingredients, no manufacturer, safety certificate Detect spelling errors and bad English. Lack of ingredients. Lack of manufacturer and safety info. Detect 1 spelling mistake
Based on the packaging Mentioning that it is not a well-known brand and the packaging is basic The red packaging evokes certain associations    
Others   The small size (15g) looks more like a trial/sample    
Possible AI halucination   Suggesting that it could be a stimulant    
  • Grok 2 Vision and Claude 3.7 Sonnet analyze from both the extracted text and the picture itself. While GPT 4.5 and Gemini 2.0 Flash seem to just analyze the extracted text. (This could be due to the original prompt asking to check description specifically)
  • Grok 2 Vision seems to be surprisingly good, although it is kind of a "previous" generation model compared to the rest.
  • Google Gemini Flash is disappointing.

 


AI Summary AI Summary
gpt-4o-2024-08-06 2025-04-17 00:38:14
The blog discusses using AI language models to inspect a skincare product suspected of containing mercury and lacking sales permission. Various models like OpenAI GPT 4.5, Claude 3.7 Sonnet, Grok 2 Vision, and Google Gemini 2.0 Flash were tested on identifying suspicious elements from the product's packaging and description. Grok 2 Vision and Claude 3.7 Sonnet excelled in analyzing both text and image, while Google Gemini 2.0 Flash provided unsatisfactory results.
Chrome On-device AI 2025-06-21 04:08:19
Writing

Share Share this Post