Image generation has become commoditised. Any platform can generate an image from a text prompt. The outputs are technically impressive and frequently useless for marketing. A beautiful image that does not align with the brand, does not suit the platform's dimensions, and does not communicate the intended message is just a beautiful distraction.
Visual intelligence - the ability to understand what makes an image effective in a specific marketing context - is a different and much harder problem than image generation. It is also the one that matters.
Comprehension before creation
Before Cleo generates an image, the system considers context that a generic image generator cannot access. What is the brand's visual identity? What platform is the image for? What is the surrounding content? What is the strategic goal of the piece? What has performed well visually for this brand in the past?
These considerations shape the generation prompt before a single pixel is produced. The image generation model sees a carefully constructed request that encodes brand knowledge, platform requirements, and strategic intent - not just a user's casual description.
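To make the idea concrete, here is a minimal sketch of how that context might be folded into a generation prompt. The `GenerationContext` fields and the `build_generation_prompt` function are hypothetical names chosen for illustration, not Cleo's actual interface. The sketch simply assumes each piece of context above is already available as text.

```python
from dataclasses import dataclass, field


@dataclass
class GenerationContext:
    # Hypothetical fields mirroring the questions above.
    brand_identity: str      # e.g. palette and style notes from brand guidelines
    platform: str            # target channel and placement
    surrounding_copy: str    # text the image will appear alongside
    goal: str                # strategic intent of the piece
    proven_traits: list[str] = field(default_factory=list)  # traits of past top performers


def build_generation_prompt(user_request: str, ctx: GenerationContext) -> str:
    """Combine the user's short request with brand, platform and strategy context."""
    parts = [
        f"Subject: {user_request.strip()}",
        f"Brand visual identity: {ctx.brand_identity}",
        f"Intended platform: {ctx.platform}",
        f"Must support this message: {ctx.surrounding_copy}",
        f"Goal: {ctx.goal}",
    ]
    # Only add performance-based steering when there is history to draw on.
    if ctx.proven_traits:
        parts.append("Favour: " + ", ".join(ctx.proven_traits))
    return "\n".join(parts)


ctx = GenerationContext(
    brand_identity="soft neutrals, teal accent, clean minimal style",
    platform="Instagram story",
    surrounding_copy="Spring collection launches Friday",
    goal="drive launch-day sign-ups",
    proven_traits=["warm tones", "natural light"],
)
print(build_generation_prompt("  a cafe table with our new mug ", ctx))
```

The user still only types a short description; everything else is supplied by the system.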
The classification layer
When images enter the system - whether user-uploaded or AI-generated - they pass through a classification layer that understands their content semantically. The system can identify whether an image contains people, products, text, logos, landscapes, or abstract patterns. It also understands visual composition, colour palette, and mood.
This classification feeds back into future generation. When the AI knows that a brand's best-performing content features warm, people-centric imagery with natural lighting, it can steer generation in that direction without the user having to specify these preferences every time.
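One simple way such a feedback loop could work is to average a performance metric per classification tag and keep the strongest tags as steering hints. The sketch below assumes each past image has a set of tags and an engagement rate; `preferred_traits`, `top_k` and `min_count` are hypothetical names, and plain averaging is a stand-in for whatever analysis a production system would actually use.

```python
from collections import defaultdict
from statistics import mean


def preferred_traits(history, top_k=3, min_count=2):
    """Rank classification tags by the average engagement of images carrying them.

    history: iterable of (tags, engagement_rate) pairs for past images.
    Tags seen fewer than min_count times are ignored to avoid one-off flukes.
    """
    by_tag = defaultdict(list)
    for tags, engagement in history:
        for tag in set(tags):
            by_tag[tag].append(engagement)
    averages = {tag: mean(values) for tag, values in by_tag.items() if len(values) >= min_count}
    # Highest average first; ties broken alphabetically for stable output.
    ranked = sorted(averages, key=lambda tag: (-averages[tag], tag))
    return ranked[:top_k]


history = [
    ({"people", "warm", "natural_light"}, 0.08),
    ({"people", "warm"}, 0.06),
    ({"product", "cool"}, 0.02),
    ({"product", "warm"}, 0.04),
    ({"abstract"}, 0.01),
]
print(preferred_traits(history, top_k=2))  # ['people', 'warm']
```

The returned tags could then populate the "proven traits" part of a generation prompt. Averaging treats tags independently, so it shows correlation with performance rather than cause.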
Platform-aware output
An Instagram story, a LinkedIn post header, a Google display ad, and an email hero image each have different requirements. Dimensions, safe zones, text overlay considerations, visual density - all vary by platform and placement.
Our visual pipeline is platform-aware. When generating an image for a specific channel, the system automatically applies the correct constraints. It is not just resizing - it is reconsidering composition for each aspect ratio and each context of use.
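The difference between resizing and recomposing can be illustrated with a focal-point crop: rather than squashing or centre-cropping, the crop for each aspect ratio is positioned around the subject. This is a minimal sketch of that one technique, not Cleo's pipeline. The `Placement` table and the sizes in it are illustrative assumptions (check each platform's current specifications), and the focal point is assumed to come from an upstream step such as the classification layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    name: str
    width: int
    height: int


# Illustrative placement sizes; assumptions for this example only.
PLACEMENTS = {
    "instagram_story": Placement("instagram_story", 1080, 1920),
    "linkedin_post": Placement("linkedin_post", 1200, 627),
    "display_rectangle": Placement("display_rectangle", 300, 250),
}


def focal_crop(src_w, src_h, placement, focal_x, focal_y):
    """Return (left, top, width, height) of the largest crop with the placement's
    aspect ratio, positioned so the focal point is as central as the image allows."""
    target_ratio = placement.width / placement.height
    if src_w / src_h > target_ratio:
        # Source is wider than the target: keep full height, trim width.
        crop_h = src_h
        crop_w = round(src_h * target_ratio)
    else:
        # Source is taller (or equal): keep full width, trim height.
        crop_w = src_w
        crop_h = round(src_w / target_ratio)
    # Centre on the focal point, then clamp so the crop stays inside the image.
    left = min(max(round(focal_x - crop_w / 2), 0), src_w - crop_w)
    top = min(max(round(focal_y - crop_h / 2), 0), src_h - crop_h)
    return left, top, crop_w, crop_h


# A landscape photo with the subject near the right edge.
print(focal_crop(4000, 3000, PLACEMENTS["instagram_story"], 3500, 1500))
```

The crop would then be scaled to the placement's exact size. A fuller version would also keep text and logos out of each placement's safe zones.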
Quality as taste
The hardest aspect of visual intelligence is taste - the subjective judgment of whether an image is good enough for a professional brand's marketing. We approach this through a quality evaluation step that considers technical quality, brand alignment, platform suitability, and compositional strength before presenting output to the user.
Not every generated image meets the bar. The system generates, evaluates, and selects rather than presenting the first result. The user sees the best output, not the only output.
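The generate, evaluate and select loop can be sketched as follows. The four criteria match the ones named above, but the weighted average, the `min_score` bar and the `generate` callable are assumptions for illustration; in practice the per-criterion scores would come from evaluation models not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CRITERIA = ("technical", "brand", "platform", "composition")


@dataclass
class Candidate:
    image_id: str
    scores: dict[str, float]  # each criterion scored 0..1 by a hypothetical evaluator


def overall(candidate: Candidate, weights: dict[str, float]) -> float:
    """Weighted average of the per-criterion scores."""
    total = sum(weights[k] * candidate.scores[k] for k in CRITERIA)
    return total / sum(weights[k] for k in CRITERIA)


def generate_and_select(
    generate: Callable[[int], Candidate],
    n: int,
    weights: dict[str, float],
    min_score: float,
) -> Optional[Candidate]:
    """Generate n candidates, drop those below the bar, return the best (or None)."""
    candidates = [generate(i) for i in range(n)]
    passing = [c for c in candidates if overall(c, weights) >= min_score]
    if not passing:
        return None  # nothing good enough: regenerate rather than show a weak result
    return max(passing, key=lambda c: overall(c, weights))


# Stand-in generator returning pre-scored candidates.
fake = [
    Candidate("a", {"technical": 0.5, "brand": 0.5, "platform": 0.5, "composition": 0.5}),
    Candidate("b", {"technical": 0.9, "brand": 0.9, "platform": 0.8, "composition": 0.8}),
    Candidate("c", {"technical": 0.9, "brand": 0.4, "platform": 0.9, "composition": 0.9}),
]
equal = {k: 1.0 for k in CRITERIA}
best = generate_and_select(lambda i: fake[i], len(fake), equal, min_score=0.7)
print(best.image_id if best else None)
```

Returning `None` when no candidate clears the bar lets the caller generate another batch instead of showing the user a weak image.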
Visual intelligence is not about generating images. It is about understanding what makes images work.
- Cleo's Team