Qwen-Image is a powerful, open source new AI image generator with support for embedded text in English & Chinese

Last updated: August 4, 2025 7:51 pm
Editorial Board Published August 4, 2025

After seizing the summer with a blitz of powerful, freely available new open source language and coding-focused AI models that matched or in some cases bested closed-source, proprietary U.S. rivals, Alibaba’s crack “Qwen Team” of AI researchers is back again today with the release of a highly ranked new AI image generator model, also open source.

Qwen-Image stands out in a crowded field of generative image models thanks to its emphasis on rendering text accurately within visuals, an area where many rivals still struggle.

Supporting both alphabetic and logographic scripts, the model is particularly adept at managing complex typography, multi-line layouts, paragraph-level semantics, and bilingual content (e.g., English-Chinese).

In practice, this allows users to generate content like movie posters, presentation slides, storefront scenes, handwritten poetry, and stylized infographics, with crisp text that aligns with their prompts.

Qwen-Image’s output examples include a wide variety of real-world use cases:

Marketing & Branding: Bilingual posters with brand logos, stylistic calligraphy, and consistent design motifs

Presentation Design: Layout-aware slide decks with title hierarchies and theme-appropriate visuals

Education: Generation of classroom materials featuring diagrams and precisely rendered instructional text

Retail & E-commerce: Storefront scenes where product labels, signage, and environmental context must all be readable

Creative Content: Handwritten poetry, scene narratives, anime-style illustration with embedded story text

Users can interact with the model on the Qwen Chat website by selecting “Image Generation” mode from the buttons below the prompt entry field.

However, my brief initial tests revealed the text and prompt adherence was not noticeably better than Midjourney, the popular proprietary AI image generator from the U.S. company of the same name. My session through Qwen Chat produced multiple errors in prompt comprehension and text fidelity, much to my disappointment, even after repeated attempts and prompt rewording.

Yet Midjourney only offers a limited number of free generations and requires subscriptions for any more, compared to Qwen-Image, which, thanks to its open source licensing and weights posted on Hugging Face, can be adopted by any enterprise or third-party provider free of charge.

Licensing and availability

Qwen-Image is distributed under the Apache 2.0 license, allowing commercial and non-commercial use, redistribution, and modification, though attribution and inclusion of the license text are required for derivative works.

But the fact that the model’s training data remains a tightly guarded secret, as with most other leading AI image generators, may sour some enterprises on the idea of using it.

The model and associated assets, including demo notebooks, evaluation tools, and fine-tuning scripts, are available through multiple repositories.

In addition, a live evaluation portal called AI Arena allows users to compare image generations in pairwise rounds, contributing to a public Elo-style leaderboard.
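
For readers unfamiliar with Elo-style ranking, here is a minimal sketch of how a single pairwise vote might move two models' ratings. It uses the classic Elo formula; the K-factor of 32, the starting rating of 1000, and the function names are illustrative assumptions, not details of how the leaderboard described above is actually computed.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return both ratings after one pairwise comparison.

    k (the K-factor) is an assumed value; larger k means faster rating swings.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two models start level; a voter prefers model A's image.
    a, b = update_elo(1000.0, 1000.0, a_won=True)
    print(a, b)  # 1016.0 984.0
```

Because the expected score depends on the rating gap, a win over a much higher-rated model moves the ratings more than a win over a peer, which is why thousands of votes are needed before the ranking settles.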

Training and development

Behind Qwen-Image’s performance is an extensive training process grounded in progressive learning, multi-modal task alignment, and aggressive data curation, according to the technical paper the research team released today.

The training corpus includes billions of image-text pairs sourced from four domains: natural imagery, human portraits, artistic and design content (such as posters and UI layouts), and synthetic text-focused data. The Qwen Team did not specify the size of the training data corpus, apart from “billions of image-text pairs.” They did provide a breakdown of the rough percentage of each category of content it included:

Nature: ~55%

Design (UI, posters, art): ~27%

People (portraits, human activity): ~13%

Synthetic text rendering data: ~5%
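
To make these proportions concrete, the sketch below converts the approximate shares into expected sample counts and draws a weighted random batch. The category keys, the batch sizes, and the idea of per-batch weighted sampling are assumptions for illustration only; the article does not describe the team's actual data loader.

```python
import random

# Approximate shares reported above; the key names are illustrative.
MIX = {
    "nature": 0.55,
    "design": 0.27,
    "people": 0.13,
    "synthetic_text": 0.05,
}


def expected_counts(batch_size: int, mix: dict = MIX) -> dict:
    """Expected number of samples per category for a given batch size."""
    return {name: round(batch_size * share) for name, share in mix.items()}


def sample_batch(batch_size: int, mix: dict = MIX, seed: int = 0) -> list:
    """Draw category labels at random, weighted by their share of the corpus."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=batch_size)


if __name__ == "__main__":
    print(expected_counts(1000))
    print(sample_batch(8))
```

At these shares, a 1,000-image batch would be expected to contain only about 50 synthetic text-rendering samples.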

Notably, Qwen emphasizes that all synthetic data was generated in-house, and no images created by other AI models were used. Despite the detailed curation and filtering stages described, the documentation does not clarify whether any of the data was licensed or drawn from public or proprietary datasets.

Unlike many generative models that exclude synthetic text due to noise risks, Qwen-Image uses tightly controlled synthetic rendering pipelines to improve character coverage, especially for low-frequency characters in Chinese.

A curriculum-style strategy is employed: the model begins with simple captioned images and non-text content, then advances to layout-sensitive text scenarios, mixed-language rendering, and dense paragraphs. This gradual exposure is shown to help the model generalize across scripts and formatting types.
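
As a rough illustration of what a curriculum schedule can look like in code, the sketch below maps a training step to the set of data types the model has been exposed to so far. The even split of steps across stages, the cumulative mixing of earlier stages, and all names here are assumptions; the article does not give the actual schedule.

```python
# Stage names paraphrase the progression described above.
STAGES = [
    "simple captioned and non-text images",
    "layout-sensitive text",
    "mixed-language rendering",
    "dense paragraphs",
]


def active_stages(step: int, total_steps: int) -> list[str]:
    """Return the data types in play at a given training step.

    Assumes training is split evenly across stages and that each new
    stage is added on top of the earlier ones rather than replacing them.
    """
    if total_steps <= 0 or not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    index = step * len(STAGES) // total_steps
    return STAGES[: index + 1]


if __name__ == "__main__":
    for step in (0, 25, 50, 99):
        print(step, active_stages(step, total_steps=100))
```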

Qwen-Image integrates three key modules:

Qwen2.5-VL, the multimodal language model, extracts contextual meaning and guides generation through system prompts.

VAE Encoder/Decoder, trained on high-resolution documents and real-world layouts, handles detailed visual representations, especially small or dense text.

MMDiT, the diffusion model backbone, coordinates joint learning across image and text modalities. A novel MSRoPE (Multimodal Scalable Rotary Positional Encoding) system improves spatial alignment between tokens.

Together, these components allow Qwen-Image to operate effectively in tasks that involve image understanding, generation, and precise editing.
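
The article does not detail how MSRoPE works, so the sketch below shows only plain rotary positional encoding, the general technique it builds on, applied to a single two-dimensional feature pair. The frequency `theta` and the function names are assumptions for illustration; this is not the Qwen Team's implementation.

```python
import math


def rotate_pair(x: float, y: float, position: int, theta: float = 0.5):
    """Rotate one (x, y) feature pair by an angle proportional to position."""
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


def dot(a, b) -> float:
    """Dot product of two 2-D vectors, i.e. an unnormalized attention score."""
    return a[0] * b[0] + a[1] * b[1]


if __name__ == "__main__":
    q, k = (1.0, 2.0), (0.5, -1.0)
    # Query/key at positions (3, 1) and (7, 5): same offset of 2 tokens.
    near = dot(rotate_pair(*q, position=3), rotate_pair(*k, position=1))
    far = dot(rotate_pair(*q, position=7), rotate_pair(*k, position=5))
    # The two scores agree up to floating-point error.
    print(near, far)
```

The point of the rotation is that the score between a query and a key depends only on their relative offset, not on their absolute positions, which is the property that makes rotary encodings attractive for aligning tokens spatially.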

Performance benchmarks

Qwen-Image was evaluated against several public benchmarks:

GenEval and DPG for prompt-following and object attribute consistency

OneIG-Bench and TIIF for compositional reasoning and layout fidelity

CVTG-2K, ChineseWord, and LongText-Bench for text rendering, especially in multilingual contexts

In nearly every case, Qwen-Image either matches or surpasses existing closed-source models like GPT Image 1 [High], Seedream 3.0, and FLUX.1 Kontext [Pro]. Notably, its performance on Chinese text rendering was significantly better than all compared systems.

On the public AI Arena leaderboard, based on 10,000+ human pairwise comparisons, Qwen-Image ranks third overall and is the top open-source model.

Implications for enterprise technical decision-makers

For enterprise AI teams managing complex multimodal workflows, Qwen-Image introduces several functional advantages that align with the operational needs of different roles.

Those managing the lifecycle of vision-language models, from training to deployment, will find value in Qwen-Image’s consistent output quality and its integration-ready components. The open-source nature reduces licensing costs, while the modular architecture (Qwen2.5-VL + VAE + MMDiT) facilitates adaptation to custom datasets or fine-tuning for domain-specific outputs.

The curriculum-style training data and clear benchmark results help teams evaluate fitness for purpose. Whether deploying marketing visuals, document renderings, or e-commerce product graphics, Qwen-Image allows rapid experimentation without proprietary constraints.

Engineers tasked with building AI pipelines or deploying models across distributed systems will appreciate the detailed infrastructure documentation. The model was trained using a Producer-Consumer architecture, supports scalable multi-resolution processing (256p to 1328p), and is built to run with Megatron-LM and tensor parallelism. This makes Qwen-Image a candidate for deployment in hybrid cloud environments where reliability and throughput matter.

Moreover, support for image-to-image editing workflows (TI2I) and task-specific prompts enables its use in real-time or interactive applications.

Professionals focused on data ingestion, validation, and transformation can use Qwen-Image as a tool to generate synthetic datasets for training or augmenting computer vision models. Its ability to generate high-resolution images with embedded, multilingual annotations can improve performance in downstream OCR, object detection, or layout parsing tasks.

Since Qwen-Image was also trained to avoid artifacts like QR codes, distorted text, and watermarks, it offers higher-quality synthetic input than many public models, helping enterprise teams preserve training set integrity.

Seeking feedback and opportunities to collaborate

The Qwen Team emphasizes openness and community collaboration in the model’s release.

Developers are encouraged to test and fine-tune Qwen-Image, offer pull requests, and participate in the evaluation leaderboard. Feedback on text rendering, editing fidelity, and multilingual use cases will shape future iterations.

With a stated goal to “lower the technical barriers to visual content creation,” the team hopes Qwen-Image will serve not just as a model, but as a foundation for further research and practical deployment across industries.
