This is a really odd way to test the capabilities of an LLM. First, most photos of clocks show 10:10, since the watches in the training data are almost always set to 10:10 (the convention in watch advertising, meant to make them sell better).
Second, I don't think the image generation feature of ChatGPT is being marketed or presented as a problem-solving AI.
I love the premise of the article: one LLM can't draw a simple clock, yet another can accurately diagnose medical conditions from a hypothetical drawn image.
Here's the thing (which you probably knew going in): generative AI is notoriously bad at drawing specific times on clock faces.
This is down to the training data. These models have been trained on a huge number of images.
That includes advertising. For whatever reason, wrist watch manufacturers have a tendency to set watches to 10:10 in ads, almost without exception. Perhaps it's just a nice-looking time, or it's good for comparison purposes.
Simply Google "wrist watch" and you'll see.
So, these generative models have a huge bias towards 10:10 on clock faces, because that's what all the clocks they've been trained on look like.
There are signs of prefrontal cortex damage or early-stage dementia.
The prompt was just "create an svg of a clockface with the time being 10 past 11".
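For what it's worth, the arithmetic the model has to get right is trivial. Here's a minimal Python sketch (my own illustration, not from the article or the thread) of the hand angles for 10 past 11 and an SVG that encodes them:

    # Illustration only: compute clock-hand angles for 11:10 and emit an SVG.
    # Minute hand: 10/60 of a full turn = 60 degrees.
    # Hour hand: (11 + 10/60)/12 of a full turn = ~335 degrees.
    minute_angle = 10 / 60 * 360            # 60.0
    hour_angle = (11 + 10 / 60) / 12 * 360  # ~335.0

    svg = f"""<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200">
      <circle cx="100" cy="100" r="90" fill="none" stroke="black" stroke-width="4"/>
      <line x1="100" y1="100" x2="100" y2="45" stroke="black" stroke-width="6"
            transform="rotate({hour_angle} 100 100)"/>   <!-- hour hand -->
      <line x1="100" y1="100" x2="100" y2="25" stroke="black" stroke-width="3"
            transform="rotate({minute_angle} 100 100)"/> <!-- minute hand -->
    </svg>"""
    print(svg)

Both hands start pointing at 12 and are rotated clockwise, so the minute hand lands on the 2 and the hour hand sits just past the 11. The 10:10 training bias pulls generated images away from exactly this kind of placement.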