Image Generation Basics | MinAI Learning


Learn how diffusion models work to generate images.

🎯 Lesson objectives

In this lesson, we look at how AI models generate images, from theory to practice.

After this lesson, you will:

✅ Understand how diffusion models work (forward/reverse process)
✅ Know the key parameters: CFG Scale, Steps, Seed
✅ Practice generating images with DALL-E 3 and the Replicate API
✅ Apply best practices for image generation

🔍 What are diffusion models?

Diffusion Process

Diffusion models learn to generate images in two stages:

Forward process: Noise is added to a training image step by step until only random noise remains
Reverse process: The model learns to remove that noise one step at a time, so it can start from random noise and arrive at an image
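
The forward process described above can be sketched in a few lines. This is a toy illustration, assuming the common DDPM-style formulation with a linear noise schedule; the schedule values (1000 steps, beta from 1e-4 to 0.02), the function names, and the four-pixel "image" are assumptions for illustration and do not come from this lesson.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(pixels, t, alpha_bars, rng):
    """Sample the noisy image at step t directly from the clean image."""
    a_bar = alpha_bars[t]
    signal_scale = math.sqrt(a_bar)        # how much of the image survives
    noise_scale = math.sqrt(1.0 - a_bar)   # how much Gaussian noise is mixed in
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]

alpha_bars = make_alpha_bars()
rng = random.Random(0)
image = [0.5, -0.25, 1.0, -1.0]  # toy "image": four pixel values in [-1, 1]

slightly_noisy = forward_diffuse(image, 10, alpha_bars, rng)
almost_pure_noise = forward_diffuse(image, 999, alpha_bars, rng)
```

Early steps keep most of the image, while at the last step the surviving signal is close to zero. The reverse process is the learned part: a neural network is trained to predict the noise that was added so it can be removed step by step.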


Checkpoint

Do you understand the forward process and the reverse process in diffusion models?

📊 Types of image generation models

1. DALL-E (OpenAI)

DALL-E 2: 1024x1024, inpainting, variations
DALL-E 3: Better text rendering, follows prompts more accurately

python.py

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A serene Vietnamese countryside with rice paddies at sunset, watercolor style",
    size="1024x1024",
    quality="hd",
    n=1,
)

# The response holds one entry per generated image
image_url = response.data[0].url
print(image_url)

📝 Prompt engineering for images

Structure of a good prompt

[Subject] + [Style] + [Details] + [Lighting] + [Quality Tags]

Example:

A Vietnamese woman in traditional áo dài,
watercolor painting style,
standing in a garden of lotus flowers,
soft morning light,
detailed, high quality, 4k
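
The structure above can be assembled programmatically, which helps when you want to vary one part (say, the style) while keeping the rest fixed. A minimal sketch; `build_prompt` is a hypothetical helper, not part of any library.

```python
def build_prompt(subject, style, details, lighting, quality_tags):
    """Join the prompt parts in the order Subject, Style, Details, Lighting, Quality Tags."""
    parts = [subject, style, details, lighting, ", ".join(quality_tags)]
    # Skip empty parts so an omitted field does not leave a stray comma
    return ", ".join(part.strip() for part in parts if part and part.strip())

prompt = build_prompt(
    subject="A Vietnamese woman in traditional áo dài",
    style="watercolor painting style",
    details="standing in a garden of lotus flowers",
    lighting="soft morning light",
    quality_tags=["detailed", "high quality", "4k"],
)
print(prompt)
```

The resulting string can be passed as the `prompt` argument of whichever generation API you use.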

Positive vs. negative prompts

Positive (what you want):

"detailed", "high quality", "sharp focus"
"beautiful lighting", "professional photo"

Negative (what you don't want; used with models that accept a separate negative prompt, such as Stable Diffusion):

"blurry", "low quality", "distorted"
"bad anatomy", "extra fingers"

Style keywords

Style	Keywords
Photo	photorealistic, photography, DSLR, 50mm
Art	oil painting, watercolor, digital art
3D	3D render, Blender, Unreal Engine
Anime	anime style, manga, Studio Ghibli

Checkpoint

Do you know how to structure a prompt with Subject, Style, Details, Lighting, and Quality Tags?

💻 Hands-on with Python

DALL-E 3 API

python.py

from io import BytesIO

import requests
from openai import OpenAI
from PIL import Image

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

def generate_image(prompt, size="1024x1024", quality="standard"):
    """Generate an image with DALL-E 3 and return its URL."""
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size=size,
        quality=quality,
        n=1,
    )

    image_url = response.data[0].url
    # DALL-E 3 may rewrite the prompt before generating; show what it used
    revised_prompt = response.data[0].revised_prompt

    print(f"Revised prompt: {revised_prompt}")
    return image_url

# Generate
prompt = "A cozy Vietnamese coffee shop (quán cà phê) with traditional decor, warm lighting, people enjoying cà phê sữa đá"
url = generate_image(prompt, quality="hd")

# Download the image and save it locally
download = requests.get(url, timeout=60)
download.raise_for_status()  # Stop here if the download failed
img = Image.open(BytesIO(download.content))
img.save("coffee_shop.png")

Replicate API (Stable Diffusion)

python.py

import replicate

# The client reads the API token from the REPLICATE_API_TOKEN environment variable

# SDXL
output = replicate.run(
    "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
        "prompt": "Vietnamese street food scene, pho restaurant, steam rising, warm evening light",
        "negative_prompt": "blurry, low quality",
        "width": 1024,
        "height": 1024,
        "num_inference_steps": 30,
        "guidance_scale": 7.5,
    },
)

# One item per generated image (a URL or file-like object, depending on the client version)
print(output)

Checkpoint

Have you tried generating images with both the DALL-E 3 API and the Replicate API?

⚡ Key parameters

Guidance scale (CFG)

Low (1-5): More creative, follows the prompt less closely
Medium (7-10): Balanced
High (15+): Follows the prompt strictly, may look oversaturated
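
To see why a higher scale means stricter prompt adherence, it helps to look at how the scale is typically applied. This is a sketch assuming the usual classifier-free guidance formulation, in which the model predicts noise once without the prompt and once with it; `apply_guidance` and the toy numbers are hypothetical and are not real model output.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Push the prediction away from the unprompted one, toward the prompted one."""
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]

# Toy noise predictions for three pixels (made-up numbers)
noise_uncond = [0.0, 1.0, -1.0]  # prediction without the prompt
noise_cond = [1.0, 1.0, 0.0]     # prediction with the prompt

no_guidance = apply_guidance(noise_uncond, noise_cond, 1.0)  # equals noise_cond
typical = apply_guidance(noise_uncond, noise_cond, 7.5)      # exaggerates the difference
```

At a scale of 1 the result is just the prompted prediction. Larger values amplify the difference between the two predictions, which is one way to understand why very high scales can produce oversaturated images.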

Steps

Low (20-30): Fast, less detail
Medium (40-50): Good balance
High (100+): Slow, with negligible improvement
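
One way to picture the step count: the model is trained over many noise levels, and at generation time the sampler visits only a subset of them. The sketch below assumes 1000 training timesteps and evenly spaced selection. Both are illustrative assumptions, `select_timesteps` is a hypothetical helper, and real samplers use their own spacing rules.

```python
def select_timesteps(num_inference_steps, num_train_steps=1000):
    """Pick evenly spaced timesteps, from the noisiest (highest) down to 0."""
    if num_inference_steps < 2:
        raise ValueError("num_inference_steps must be at least 2")
    last = num_train_steps - 1
    return [
        round(i * last / (num_inference_steps - 1))
        for i in reversed(range(num_inference_steps))
    ]

timesteps = select_timesteps(30)
print(timesteps[:3], "...", timesteps[-3:])
```

Each selected timestep costs one model call, so generation time grows roughly in proportion to the step count, while the visible gain shrinks as the steps get closer together.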

Seed

Same seed + same prompt = same output, provided the model and all other settings are unchanged:

python.py

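# Reproducible generation
# `generate` is a placeholder for whichever generation call you use;
# it must expose a seed parameter for this to work
output = generate(
    prompt="...",
    seed=12345  # Fixed seed
)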
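
The seed works because generation starts from random noise, and the seed fixes which noise is drawn. A toy sketch using only the standard library; `initial_noise` is a hypothetical stand-in for the starting noise of a real model, which would be a much larger tensor.

```python
import random

def initial_noise(seed, num_values=4):
    """Draw the starting noise from a generator initialised with the given seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_values)]

first = initial_noise(12345)
second = initial_noise(12345)
other = initial_noise(54321)

print(first == second)  # same seed, same starting noise
print(first == other)   # different seed, different starting noise
```

Fixing the seed is useful when comparing prompts or parameter values, because any change in the output then comes from the setting you changed rather than from the random start.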

Image size & aspect ratio

Use case	Aspect ratio (approx.)	Size (px)
Square (Instagram)	1:1	1024x1024
Portrait	2:3	832x1216
Landscape	3:2	1216x832
Wide	16:9	1344x768
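
The table can be kept as a small lookup so the width and height passed to the API always match one of these presets. A sketch; `SIZES` and `size_inputs` are hypothetical names, and the resulting dictionary mirrors the `width`/`height` inputs used in the Replicate example earlier in this lesson.

```python
# Presets copied from the table above: use case -> (width, height) in pixels
SIZES = {
    "square": (1024, 1024),
    "portrait": (832, 1216),
    "landscape": (1216, 832),
    "wide": (1344, 768),
}

def size_inputs(use_case):
    """Return the width/height inputs for a named preset."""
    width, height = SIZES[use_case]
    return {"width": width, "height": height}

inputs = {
    "prompt": "Vietnamese street food scene, pho restaurant, steam rising, warm evening light",
    **size_inputs("wide"),
}
print(inputs["width"], inputs["height"])
```

Note that not every API accepts arbitrary sizes, so check the size options of the model you call before reusing these presets.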

Checkpoint

Do you understand what CFG Scale, Steps, and Seed do in image generation?
