Image Generation Workflow

The closed-loop image generation workflow

We built a closed-loop image generation workflow that produces images visually representing the core message of modern Chinese poems. The workflow consists of five key phases.

1. Poem Analysis: A text-based LLM (GPT-4o) analyzes the poem to identify its key elements, semantics, and core message.

2. Prompt Engineering: Based on this analysis, the LLM crafts a detailed image generation prompt, selecting a visual style that resonates with the poem’s themes.

3. Image Generation: The engineered prompt is processed by FLUX-pro-1.1 to create a visual representation.

4. Evaluation: A separate LLM evaluator examines the image against the following criteria:
   a. Appropriateness
   b. Representation of the key elements
   c. Representation of the semantics
   d. Representation of the core message
   e. Technical quality

5. Regeneration: If the evaluation indicates shortcomings, the feedback is incorporated into a revised prompt and the image is regenerated. This repeats for up to two regeneration cycles, or until the quality thresholds are met.
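
The five phases above can be sketched as a small control loop. This is only an illustrative sketch, not the project's actual code: the function and parameter names (run_workflow, analyze, write_prompt, generate, evaluate), the numeric score scale, the threshold of 7.0, and the rule that every criterion must reach the threshold are all assumptions made for the example. The model calls (GPT-4o, FLUX-pro-1.1, the evaluator) are passed in as plain callables so the loop itself stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# The five evaluation criteria from phase 4 (identifier names are illustrative).
CRITERIA = (
    "appropriateness",
    "key_elements",
    "semantics",
    "core_message",
    "technical_quality",
)


@dataclass
class WorkflowResult:
    prompt: str                # the last prompt sent to the image model
    image: str                 # e.g. a URL or file path returned by the image model
    scores: Dict[str, float]   # evaluator score per criterion
    regenerations: int         # how many regeneration cycles were used
    passed: bool               # whether the final image met the threshold


def run_workflow(
    poem: str,
    analyze: Callable[[str], str],                      # phase 1 (hypothetical wrapper)
    write_prompt: Callable[[str, Optional[str]], str],  # phase 2 (hypothetical wrapper)
    generate: Callable[[str], str],                     # phase 3 (hypothetical wrapper)
    evaluate: Callable[[str, str], Tuple[Dict[str, float], str]],  # phase 4
    threshold: float = 7.0,      # assumed value; the source gives no number
    max_regenerations: int = 2,  # "up to two regeneration cycles"
) -> WorkflowResult:
    analysis = analyze(poem)
    prompt = write_prompt(analysis, None)  # first prompt has no feedback yet
    regenerations = 0
    while True:
        image = generate(prompt)
        scores, feedback = evaluate(poem, image)
        # Assumption: every criterion must reach the threshold to pass.
        passed = all(scores[c] >= threshold for c in CRITERIA)
        if passed or regenerations == max_regenerations:
            return WorkflowResult(prompt, image, scores, regenerations, passed)
        # Phase 5: fold the evaluator's feedback into a revised prompt.
        prompt = write_prompt(analysis, feedback)
        regenerations += 1
```

In practice each callable would wrap a real API request. With max_regenerations=2, the image model is called at most three times per poem: once for the initial image and once for each regeneration cycle.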

Our prompt templates are publicly available in the template directory of the sam1037/CUHK-DAO repository (main branch).

Video demonstration

This video demonstrates the workflow for the poem 我病了還是這個社會病了 ("Am I Sick, or Is This Society Sick?") by 南山匹夫, which explores moral paralysis in modern society and the ethical questions that arise when fear prevents action.