Custom script to create a GIF from a LoRA, sweeping its strength from 0 up to the value you like.
Unzip into (stable-diffusion-webui)\scripts
Your output GIF is in stable-diffusion-webui\outputs\txt2img-images\txt2gif
Examples:
Ksampler (Efficient)
A modded KSampler with the ability to preview and output images.
Re-outputs key inputs which helps promote a cleaner and more streamlined workflow look for ComfyUI.
Can force hold all of its outputs without regenerating by setting its state to "Hold".
note: when using multiple instances of this node, each instance must have a unique ID for the "Hold" state to function properly.
Efficient Loader
A combination of common initialization nodes.
Image Overlay
Node that allows for flexible image overlaying.
Evaluate Integers
A 3-integer-input node that gives the user the ability to write their own Python expression for an INT/FLOAT type output.
Evaluate Strings
A 3-string-input node that gives the user the ability to write their own Python expression for a string output.
This custom node provides face detection and detailer features. Using this, the DDetailer extension of the WebUI can be implemented in ComfyUI. Currently this is the main feature; additional features will be added in the future.
https://github.com/ltdrdata/ComfyUI-Impact-Pack
Install guide:
1. Download
2. Uncompress into ComfyUI/custom_nodes
3. Restart ComfyUI
ComfyUI is an advanced node-based UI for Stable Diffusion. It allows you to create customized workflows such as image post-processing or conversions.
When you run ComfyUI, the suite will generate a config file.
The file looks like this:
{
"autoUpdate": true,
"branch": "main",
"openAI_API_Key": "sk-#################################"
}
This file is used to control auto-update and to manage any other settings the tool requires.
File Description:
"autoUpdate": can be (true) or (false),
"branch": default is ("main")
other options for branch:
"v2.1.X": means it will only update bug fixes for v2 version.
"main" means it will always be on latest stable build, this may add new nodes suddenly (also usually it assume you update comfy)
"develop": it will contain latest stuff I'm working on now, but may contain bugs
"openAI_API_Key": if you want to use the ChatGPT or Dall-E2 features, you need to add your open-AI API key, you can get it from (Account API Keys - OpenAI API)
You must update ComfyUI before using this version,
as it relies heavily on a new ComfyUI feature: the ability to switch inputs to widgets and widgets to inputs.
Download the zip file.
Extract to ..\ComfyUI\custom_nodes, like in this image:
Restart ComfyUI if it was running (reloading the web page is not enough).
You will find my nodes under the new group O/…
You can check the workflow folder to find great examples of how to use the tool.
Note that you can load the images in the downloaded ZIP's workflows folder into ComfyUI to load the workflow that was used to generate them.
Current Nodes:
//7/4/2023 -----------------------------------------------------------------
selectLatentFromBatchNode
If you generate multiple images, it allows you to pick which one to use.
For example, if you generate 4 images, it allows you to select 1 of them for further processing,
or you can use it to process them sequentially.
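As a rough illustration (not the node's actual source; ComfyUI passes latents around as a dict holding a "samples" tensor, and the names here are hypothetical), picking one latent out of a batch boils down to slicing the batch dimension:
def select_latent_from_batch(latent, index):
    # latent["samples"] has shape [batch, channels, height, width]
    samples = latent["samples"]
    index = max(0, min(index, samples.shape[0] - 1))  # clamp to a valid batch index
    return {"samples": samples[index:index + 1]}      # slicing keeps the batch dimension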
NSP
This node allows you to select a random value from the SoupPrompts file.
equations
- this node allows you to perform math equations on the input
- there are two variants
- 1 input (X)
- 2 inputs (X,Y)
(you can convert x and y to inputs by right-clicking on them, so you can use values from another node)
If you like this node, tell me; I can enhance it so you can select the number of inputs.
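As a hedged sketch of the idea (the node's real expression parser may differ), evaluating a user expression with X and Y inputs in Python could look like this:
import math

def evaluate_equation(expression, x, y=0.0):
    # expose only math functions plus the x/y inputs to the expression
    allowed = {name: getattr(math, name) for name in dir(math) if not name.startswith("_")}
    allowed.update({"x": x, "y": y})
    return float(eval(expression, {"__builtins__": {}}, allowed))

print(evaluate_equation("sqrt(x**2 + y**2)", 3, 4))  # 5.0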
// 22/3/2023 -----------------------------------------------------------------
OpenAI Nodes
OpenAI ChatGPT and DALLE-2 API as nodes, so you can use them to enhance your workflow
ChatGPT-Advanced
Load_openAI
initializes OpenAI for the next nodes
Advanced ChatGPT nodes
chat_message :
creates a message to send to ChatGPT
combine_chat_messages:
used to group messages together before sending them to chatGPT
Chat_Completion:
the magic node: it sends the messages to ChatGPT and receives the response, which becomes the output string
debug_Completion:
helps you inspect the whole response
In this workflow, I used ChatGPT to create the prompt.
At the start, I send 2 messages to ChatGPT:
the first message tells ChatGPT how to behave and what prompt format I need from it;
in the second message I send what I want, in this case "young girl dancing" (I added "young" so her clothes stay decent XD, please don't misunderstand me).
After that I feed the messages to the completion node ("completion" is what it's called in their API, sorry),
and congrats, you have a nice input for your image. A rough sketch of this flow is shown below.
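Under the hood this maps onto the OpenAI chat API. A minimal sketch using the openai Python package of that era (the model name and messages are just examples, not the node's exact internals):
import openai

openai.api_key = "sk-..."  # your own key, as stored in the config file

messages = [
    # first message: tell ChatGPT how to behave and which prompt format to return
    {"role": "system", "content": "You write Stable Diffusion prompts as a comma-separated list of tags."},
    # second message: what we actually want
    {"role": "user", "content": "young girl dancing"},
]
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)  # the generated prompt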
DallE-2 Image nodes
create_image:
used to create an image using DALL-E 2; for now only 1 image at a time, I will update it in the next patch to allow multiple images
variation_image:
this node generates variations similar to the image you send it
This is a full workflow where I:
1- use ChatGPT to generate a prompt
2- send that prompt to DALLE-2
3- give the generated image to Stable Diffusion to paint over it
4- use DALLE-2 to create variations from the output
ChatGPT-simple
This node harnesses the power of chatGPT, an advanced language model that can generate detailed image descriptions from a small input.
You need to have an OpenAI API key, which you can find at https://beta.openai.com/docs/developer-apis/overview
Once you have your API key, add it to the api_key.txt file
I have made it a separate file, so that the API key doesn't get embedded in the generated images.
<you can load this image in comfyUI to load the workflow>
String Suit
Adds multiple nodes to support string manipulation, plus a tool to generate an image from text.
String:
a node that can hold a string (text)
Debug String
this node will write the string on the console
Concat string
this node is used to combine two strings together
Trim string
this is used to remove any extra spaces at the start or the end of a string
Replace string & replace string advanced
used to replace part of the text with another
>>>> String2image <<<<
this node generates an image based on a text, which can be used with ControlNet to add text to the image
— the tool supports fonts (add the font you want to the fonts folder)
(If you load the example image in ComfyUI, the workflow that generated it will be loaded.)
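Conceptually the node does something like the following Pillow sketch (the font path, size, and colors are placeholders; the real node reads fonts from its own fonts folder):
from PIL import Image, ImageDraw, ImageFont

def string_to_image(text, width=512, height=512, font_path="fonts/YourFont.ttf", font_size=96):
    image = Image.new("RGB", (width, height), "black")
    draw = ImageDraw.Draw(image)
    font = ImageFont.truetype(font_path, font_size)
    draw.text((width // 2, height // 2), text, fill="white", font=font, anchor="mm")
    return image

string_to_image("HELLO").save("text_for_controlnet.png")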
>>>>CLIPStringEncode <<<
The normal CLIPTextEncode node, but this one receives the text from the String node, so you don't have to retype your prompt twice anymore.
In this example I used a depth filter, but if you are using WAS nodes you can convert the text to canny using the WAS canny filter; it gives much better results with the canny ControlNet.
Other tools
LatentUpscaleMultiply:
a variant of the original LatentUpscale node, but instead of specifying width and height you specify multipliers
for example, if the original image's dimensions are (512,512) and the multiply values are (2,2), the resulting image will be (1024,1024)
you can also use it to downscale by using fractions, e.g. (512,512) mul (.5,.5) → (256,256)
Node Path: O/Latent/LatentUpscaleMultiply
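The arithmetic is simply a per-axis multiplication of the current size; a tiny illustrative helper (not the node's code):
def multiplied_size(width, height, mul_w, mul_h):
    return int(width * mul_w), int(height * mul_h)

print(multiplied_size(512, 512, 2, 2))      # (1024, 1024)
print(multiplied_size(512, 512, 0.5, 0.5))  # (256, 256)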
There are also many brilliant nodes in this package:
WAS's Comprehensive Node Suite - ComfyUI | Stable Diffusion Other | Civitai
Thanks for reading; I hope my tools will help you.
Discord: Omar92#3374
"Super Easy AI Installer Tool" is a user-friendly application that simplifies the installation process of AI-related repositories for users. The tool is designed to provide an easy-to-use solution for accessing and installing AI repositories with minimal technical hassle to none the tool will automatically handle the installation process, making it easier for users to access and use AI tools.
For Windows 10+ and Nvidia GPU-based cards
For more Info:
https://github.com/diStyApps/seait
Please note that VirusTotal and other antivirus programs may give a false positive when running this app. This is due to the use of PyInstaller to convert the Python file to an EXE, which can sometimes trigger false positives even for simple scripts; this is a known issue.
Unfortunately, I don't have the time to handle these false positives. However, please rest assured that the code is transparent on https://github.com/diStyApps/seait
I would rather add features and more AI tools at this stage of development.
Download the "Super Easy AI Installer Tool" at your own discretion.
Multi-language support
More AI-related repos
Pre installed auto1111 version
Pre installed python version
Locate repo
App updater
Remembering arguments
Adding arguments with input
Maybe arguments profiles
Better event handling
Support
https://www.patreon.com/distyx
https://coindrop.to/disty
The files are free; please subscribe to my channel if you like the content, or consider supporting me.
This If_ai SD prompt assistant helps you make good prompts to use directly in Oobabooga, as shown here: youtu.be/15KQnmll0zo. The prompt assistant was configured to produce prompts that work well and give varied results suitable for most subjects. To use it, you just give the input the name of a character or subject and a location or situation, like (Harry Potter, cast a spell). If you step outside that pattern, the AI starts to act normally and forgets it is a prompt generator. Tested and works well with the smallest Alpaca Native 4bit 7B and the llama 30b 4bit 128g.
I was having issues with an image that is not the typical multiple-of-8 resolution; the VAE encoder would crop the image, but that was simply not acceptable to me, so I figured something out. Use the images and drop them into ComfyUI.
I just padded the original images and turned them into latents, so only the black area gets cropped; then I did what I wanted with the latent and cropped the image back to its original size.
PS: this is not the image I needed uncropped, but that one was NSFW, so I used this one to post.
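For reference, a minimal Pillow sketch of the same pad-then-crop-back idea (the file names are placeholders; the actual padding and cropping in my case was done with ComfyUI nodes):
from PIL import Image

def pad_to_multiple_of_8(img):
    w, h = img.size
    new_w, new_h = (w + 7) // 8 * 8, (h + 7) // 8 * 8
    padded = Image.new("RGB", (new_w, new_h), "black")  # black border, cropped away later
    padded.paste(img, (0, 0))
    return padded, (w, h)

original = Image.open("input.png")
padded, original_size = pad_to_multiple_of_8(original)
padded.save("padded_for_vae.png")
# ... run the padded image through the latent workflow, then crop the result back:
result = Image.open("processed.png").crop((0, 0, *original_size))
result.save("restored.png")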
Waiting to be expanded: ComfyUI nodes built around OpenAI and GPT.
git
The newest version is probably fine!
Install Git for Windows.
python
Install Python 3.10.6.
After installation, run the following in PowerShell:
python -V
If the version number is displayed, Python is installed.
This version seems to be the stable choice for both webui and sd-scripts; older or newer versions reportedly cause problems.
Command Prompt and PowerShell are separate environments, so read Command Prompt instructions as PowerShell equivalents where needed.
Run Command Prompt as administrator:
Check the PyTorch page.
PyTorch page: https://pytorch.org/index.html
Run commands like the following (use the commands shown on the PyTorch page).
The following commands install PyTorch 2.0 (for NVIDIA CUDA 11.8).
Check your NVIDIA CUDA version beforehand (here, NVIDIA CUDA Toolkit 11.8 is assumed to be installed already).
https://developer.nvidia.com/cuda-11-8-0-download-archive
python -m pip install -U pip
python -m pip install -U torch torchvision torchaudio numpy numba --index-url https://download.pytorch.org/whl/cu118
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
Run step 1 below from PowerShell.
Normally this downloads the whole set of files into a folder.
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
webui-user.bat
Run it from Explorer.
Open http://127.0.0.1:7860 in a web browser (http://localhost:7860 should also work).
(You can also configure it to open the browser automatically.)
Example contents of webui-user.bat:
@echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--opt-sdp-attention --medvram --opt-channelslast --device-id 0
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:24
call webui.bat
Options for 4 GB VRAM or less
These reduce VRAM usage at the cost of speed.
set COMMANDLINE_ARGS=--medvram
If the above gives an out-of-memory error:
set COMMANDLINE_ARGS=--medvram --opt-split-attention
If you still get an out-of-memory error even then:
set COMMANDLINE_ARGS=--lowvram --always-batch-cond-uncond --opt-split-attention
Other options
--xformers (faster / lower VRAM usage)
--opt-channelslast (faster)
--no-half-vae (fixes all-black images)
--ckpt-dir (specifies where models are stored)
--autolaunch (opens the browser automatically)
--opt-sdp-no-mem-attention or --opt-sdp-attention
(Torch 2 only.
Like xformers, roughly 20% faster, with slight variation in the output; VRAM usage may increase.
Also works on AMD Radeon and Intel Arc.)
--device-id (specify when multiple GPUs are installed; numbering starts at 0, and 0 is used by default)
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:24
Settings for how PyTorch's CUDA allocator uses memory.
Once 60% of memory is in use, garbage collection runs in 24 MB units (unused data in memory gets cleaned up and memory usage drops, so hopefully CUDA stops crashing with out-of-memory errors).
Set permissions so PowerShell can run scripts
Open PowerShell with administrator rights
Set-ExecutionPolicy Unrestricted
Type the above and press A
Close PowerShell
Search for PowerShell in the Start menu, right-click it, and choose Run as administrator.
Open PowerShell and run the following one line at a time:
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
python -m venv venv
.\venv\Scripts\activate
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install --upgrade -r requirements.txt
pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
accelerate config
Release installers v6 · derrian-distro/LoRA_Easy_Training_Scripts (github.com)
Place installer.py in the folder where you want to install it,
then in PowerShell run
python installer.py
Various things get downloaded along the way, so wait.
Do you want to install the optional cudnn1.8 for faster training on high end 30X0 and 40X0 cards? [Y,N]?
When asked the above, enter Y if you are using a 30x0/40x0-series GPU; for any other GPU, enter N.
This installs sd-scripts, but the setup is not finished yet, so
run the following in PowerShell one line at a time:
cd sd-scripts
venv\Scripts\activate
accelerate config
Common
Answer accelerate config as follows:
- This machine
- No distributed training
- NO
- NO
- NO
- all
- fp16 (select it by pressing the number key 1 and then Enter; trying to use the arrow keys crashes with an error)
Prepare images.
(If you have few images, make full use of flipping, cropping, etc.) Taken to the extreme, apparently even a single image can be made to work?
Place the files in a folder. (I don't really understand regularization images, so I don't use them.)
Batch-tag the images with the wd1.4 tagger extension for webui AUTOMATIC1111 (I haven't used anything else, so I can't say whether it's the best).
Use the fine-tuning JSON creation batch file (https://wikiwiki.jp/sd_toshiaki/LoRA#b0cb0cc0) to turn the .txt files created by the tagger into a .json file.
Look at the contents of the .json file; if the tag you want as the trigger word is already there, leave it, and if not, add it at the very first position (this is needed because --keep_tokens=1 and --shuffle_caption are specified).
"C:\\Users\\watah\\Downloads\\kyousi_78\\siranami ramune\\100741149_p0.jpg": {
"tags": "siranami ramune,1girl, virtual youtuber, solo, v, fang, multicolored hair, blue jacket, blue hair, choker, hair behind ear, smile, crop top, bangs, streaked hair, hair ornament, jewelry, looking at viewer, earrings"
},
This is a sample.
The .json file is written in three-line sets like the one above; there is one set per image file.
"image file path": {
"tags": "token1,token2,...(etc.)"
}
Put the tag you want as the trigger word in token1 (confusing, I know).
I rewrite everything with find-and-replace in a text editor (a small script that does the same thing is sketched below).
Find -> Replace with
"tags": " -> "tags": "your trigger word,
--shuffle_caption
This supposedly shuffles the tags, which spreads out the weight given to each tag.
--keep_tokens=1
Keeps the tags up to the first one fixed (in this case token1, which sits in the first position).
I use this setup because I want a single trigger word to have a strong effect.
I'll leave the theoretical explanations to others.
Run training with sd-scripts.
Enter the venv virtual environment and type the command directly, or use a toml configuration file.
Right-click the sd-scripts folder and open a terminal.
Type venv/Scripts/activate to enter the venv (virtual environment).
Copy-paste the command and run it.
(Do not insert line breaks when running; the settings I reuse below only contain line breaks for readability. Also adjust the values as appropriate.)
Tweak epochs and repeats so the total comes to roughly 6000 steps.
There is no particular basis for that number; find the optimal values yourself.
On my setup it takes a little under an hour, at roughly 1.80 it/s.
Put the finished LoRA into webui's LoRA folder and launch webui.
Adjust the prompt.
Generate several images and pick the ones that turn out well.
If the results are just not good, use a lower-epoch file or continue training (pointing --network_weights="" at a LoRA file lets you train it further).
In most cases you can get by with adjusting the weight.
Post it to CIVITAI.
The pnginfo is posted unedited, so you should be able to reproduce the images just by changing the LoRA file name (CIVITAI renames the file). (I have ToMe installed, so background details may differ?)
https://github.com/kohya-ss/sd-scripts/blob/main/train_README-ja.md
It covers additional training beyond LoRA as well; give it a read.
I want to try LyCORIS next: LoCon, LoHa, ia3, lokr, and so on.
I'd also like to try LoRA resizing and block-weighted merging if I find the time.
When using LoCon
--network_module lycoris.kohya
--network_dim=16
--network_alpha=8
--network_args "conv_dim=8" "conv_alpha=1" "dropout=0.05" "algo=lora"
When using LoHa
--network_module lycoris.kohya
--network_dim=8
--network_alpha=4
--network_args "conv_dim=4" "conv_alpha=1" "dropout=0.05" "algo=loha"
When using ia3 (untested)
--network_module = lycoris.kohya
--network_dim = 32
--network_alpha=16
--network_args = "conv_rank=32", "conv_alpha=4", "algo=ia3"
--learning_rate = 1e-3
When using lokr (untested)
--network_module lycoris.kohya
--network_args = "conv_rank=16", "conv_alpha=16", "algo=lokr",”decompose_both=True”,”factor=-1”
--optimizer_type lion
It's also possible that the two extra parameters should instead be passed like this (I'll confirm as soon as I can):
--decompose_both=True
--factor=-1
From LyCORIS/Kronecker.md at b0d125cf573c99908c32c71a262ea8711f95b7f1 · KohakuBlueleaf/LyCORIS (github.com):
It is on experiment.
rank_lora, optimizer, learning rate, filesize. alpha=rank
16_loRA : lion, unet lr=1.5e-4, TE lr = 7.5e-5, 38,184KB (reference)
4_loRA : lion, unet lr=1.5e-4, TE lr = 7.5e-5, 9,665KB (-75%)
4_LoHa : lion, unet lr=1.5e-4, TE lr = 7.5e-5, 19,258KB (-50%)
4_LoKr : lion, unet lr=3.0e-4, TE lr = 1.5e-4, 633KB (-98%)
8_LoKr : lion, unet lr=3.0e-4, TE lr = 1.5e-4, 1,027KB (-97%)
16_LoKr : lion, unet lr=3.0e-4, TE lr = 1.5e-4, 1,817KB (-95%)
It seems you set the unet learning rate and the TextEncoder learning rate???
To use lion as the optimizer, install it beforehand with:
venv/Scripts/activate
pip install lion-pytorch
https://github.com/lucidrains/lion-pytorch
--optimizer_type lion
Using a toml file apparently makes things easier.
Specify a .toml file with --config_file. The file consists of key=value lines, where key is the same as the command-line option name. See #241 for details.
Any subsections inside the file are ignored.
Omitted arguments fall back to the command-line defaults.
Command-line arguments override the .toml settings.
If you pass the --output_config option, the current command-line arguments are written to the .toml file specified with --config_file. Use that as a template.
Futaba (may board): thread for having AI draw pictures, posting them, and chatting; irregular
Toshiaki wiki: summary of the above thread
NanJ "Somewhat Useful AI Club" (5ch)
Kurokuma Soft
Keizaiteki Seikatsu Nisshi (blog)
Gigazine
Genshin LoRA creation notes and tests
AI Monozukuri Research Group @ Discord
[Guide] Make your own Loras, easy and free @ CIVITAI
GitHub READMEs: please read at least the ones for sd-scripts, LyCORIS, and automatic1111
Rough translation of the left side:
What users are permitted to do when using this model:
You don't need to credit my name (in this case, watahan)
Please share merges of this model
Merges may use different permissions
Rough translation of the right side:
Commercial use:
all prohibited
Selling generated images
Using it in AI image-generation services
Selling this model or its merges
For derivative works, follow the original's fan-work guidelines if such guidelines exist.
I always put UnOfficial in the model title so that it is not mistaken for an official release.
I only change --max_train_epochs, --dataset_repeats, and --train_data_dir.
accelerate launch --num_cpu_threads_per_process 16 train_network.py
--pretrained_model_name_or_path=C:\stable-diffusion-webui\models\Stable-diffusion\hogehoge.safetensors
--train_data_dir=C:\Users\hogehoge\Downloads\kyousi\
--output_dir=I:\train\outputs
--reg_data_dir=I:\train\seisoku
--resolution=512,512
--save_every_n_epochs=1
--save_model_as=safetensors
--clip_skip=2
--seed=42
--network_module=networks.lora
--caption_extension=.txt
--mixed_precision=fp16
--xformers
--color_aug
--min_bucket_reso=320
--max_bucket_reso=512
--train_batch_size=1
--max_train_epochs=15
--network_dim=32
--network_alpha=16
--learning_rate=1e-4
--use_8bit_adam
--lr_scheduler=cosine_with_restarts
--lr_scheduler_num_cycles=4
--shuffle_caption
--keep_tokens=1
--caption_dropout_rate=0.05
--lr_warmup_steps=1000
--enable_bucket
--bucket_no_upscale
--in_json="C:\train\marge_clean.json"
--dataset_repeats=5
--min_snr_gamma=5
*1 Source:
Quoted from https://www.kkaneko.jp/ai/win/stablediffusion.html
If you want to make LoRAs on Colab, the link below should get you there somehow.
With a Google account you can train within the free tier.
If you look up the English words you don't know, you can probably muddle through.
Linaqruf/kohya-trainer: Adapted from https://note.com/kohya_ss/n/nbf7ce8d80f29 for easier cloning (github.com)
👓 Promptvision is a web application that allows users to view and browse images. It allows quickly browsing through generations and changing directories in the "web" app. It's running locally using Flask.
🌱 Updated EXIF parser - parses everything that is available in EXIF. Supports PNG and JPG. Aesthetic score evaluation of your images. Filtering based on prompts, rating, aesthetic score, categories and tags.
🔥 Executable for Windows available! No need to git, python, gradio... Just double click and you're rolling!
🥕 If you want the most up to date version you have to clone from Github!
git clone https://github.com/Automaticism/Promptvision.git
View all details of images created with Automatic1111
Positive prompt
Negative prompt
Steps
Sampler
CFG scale
Seed
Size
Model hash
Model
Eta
Postprocessing
Extras
And all other fields which are detected in EXIF data
Aesthetic score is also available as metadata now if you want to analyze your images. Note: GPU is recommended. The aesthetic score is based on this: AUTOMATIC1111/stable-diffusion-webui#1831. See the code in gallery_engine.
You can add metadata which are stored locally on your system
Tags
Categories
Rating
Favourite
Reviewed status
You can change image directory by just pasting the path in and pressing the button
Metadata, thumbnails and exif are read / created / initialized when you enter a new directory
You can even load a directory while you are generating images (although this can cause some issues, haven't tested this too much)
It will update the data on your next launch of the folder when it sees that the number of images in your folder is different than what is in your metadata
(Deletions are not yet covered by this logic)
Supports some keybindings
Left and right arrow for navigating
F for favorite
1-5 for rating
S for saving
Double click to open
Change directory by pasting in your directory and then pressing "Change image directory"
Open via terminal - supports same launch arguments as before (plus config file)
Sample config file is included
usage: promptvision.exe [-h] [--config CONFIG] [--imagedir IMAGEDIR] [--port PORT]
[--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
Image viewer built with Flask.
options:
-h, --help show this help message and exit
--config CONFIG Path to configuration file
--imagedir IMAGEDIR Path to image directory
--port PORT Port number for the web server
--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
Set the logging level
Source code available: https://github.com/Automaticism/Promptvision
(Use git to get source instead of downloading from here)
Feedback is welcome. Post it here in the comments or on GitHub as issues :)
Installing Conda / miniconda
Miniconda is a lightweight version of the Anaconda distribution, which is a popular data science platform. Conda is a package manager that allows you to install and manage packages and dependencies for various programming languages, including Python. Here are the steps to install Miniconda:
Go to the Miniconda website (https://docs.conda.io/en/latest/miniconda.html) and download the appropriate installer for your operating system. There are different installers for Windows, macOS, and Linux.
Once the installer is downloaded, run it and follow the instructions to complete the installation process. You can accept the default settings or customize them based on your preferences.
After the installation is complete, open a new terminal or command prompt window to activate the conda environment. You can do this by running the following command:
conda activate base
This will activate the base environment, which is the default environment that comes with Miniconda.
To verify that conda is installed correctly, you can run the following command:
conda --version
This should display the version number of conda.
That's it! You have now installed Miniconda and activated the base environment. You can use conda to install packages and manage your Python environments.
Setting up a virtual environment with Conda and running Promptvision
Open up any terminal program (CMD, Windows terminal, Bash, zsh, Powershell). Use the cd command to navigate to the "Documents" folder. Type cd Documents
and press enter. Use the git clone command to clone the repository. Type git clone [repository URL]
and press enter. Replace "[repository URL]" with the URL of the repository you want to clone. For example:
git clone https://github.com/Automaticism/Promptvision.git
Use the "cd" command to navigate to the cloned repository. Type cd repository and press enter. Replace "repository" with the name of the cloned repository. Create a new conda environment and activate it with the following commands:
conda create --name myenv
conda activate myenv
These commands will create a new environment named "myenv" and activate it.
Install the necessary dependencies using the following command:
pip install -r requirements.txt
This command will install the dependencies listed in the "requirements.txt" file.
Finally, run the Python script with the following command, replacing "[your image folder]" with the name of the folder containing your images:
python gallery.py --imagedir "[your image folder]"
Using aesthetic score
Based on this: AUTOMATIC1111/stable-diffusion-webui#1831 See the code in gallery_engine.
Required extras; this assumes you have set up NVIDIA CUDA 11.8. Adjust pytorch-cuda=<version>
according to what you have installed. If you run into any trouble, look at https://pytorch.org/get-started/locally/ to see how to install it on your specific system.
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
python gallery.py --imagedir "[your image folder]" --aesthetic True
This will calculate aesthetic score for all your images.
Run the application:
python .\gallery.py --imagedir "F:\stable-diffusion-webui\outputs\txt2img-images\2023-03-21\rpg"
Note: on launch it will extract EXIF data from all images and initialize metadata for all images. It will also create thumbnails. Everything will be placed in a metadata folder in the current working directory. Under this, a folder for the image directory will be created.
Note regarding sd webui plugin which has been discussed in the comments for a while:
Given that github.com/AlUlkesh/stable-diffusion-webui-images-browser exists I see no further point in making a sdwebui plugin.
I'll be continuing on with this standalone image viewer. Soon I'll be extending this with dataframe browsing that will enable users extensive insight into their own prompts and such based on their own metadata additions. I haven't yet landed on which framework since there is quite the extensive list of frameworks to choose from (e.g. Dash, Streamlit, Panel, and so on).
Latest exe: https://www.virustotal.com/gui/file/d48deef1e69425ce5d5b6cd350057180b72481f83ff611a69416b667ca62aeef?nocache=1 (Note that this has one false positive from Malwarebytes and their AI rules. This is most likely triggered because it's a "rare" file and because it trips "something" in their AI algorithm detection engine.)
https://www.virustotal.com/gui/file/290bb58559113d2224554bf1df856a799a4ff6ea2976d7b20c35ccd5ae7ced00
It's a script that generates a GIF with (I think) 40 images. It only took me about 3 minutes to make a gen (Euler A | 16 steps | RTX 3070).
THINGS TO NOTE:
Don't worry about this guy; that's for something in the future.
Don't put a comma or space at the end of your positive prompt (nothing bad will happen, but it's slightly annoying).
Make sure it looks like this.
Make sure you're using the same seed (otherwise you'll get a seizure from the changing colors).
And finally, IF YOU ARE USING CONTROLNET, TURN THIS STUPID THING ON (in settings).
Workflows: https://github.com/LEv145/images-grid-comfy-plugin/tree/main/workflows
cd custom_nodes
git clone https://github.com/LEv145/images-grid-comfy-plugin ImagesGrid
cd custom_nodes/ImagesGrid
git pull
More examples and help documents on github: https://github.com/wyrde/wyrde-comfyui-workflows
The recent changes to civit's UI make sharing these on civit a painful process.
Expand the About this Version box to the right → to see more.
For some reason I'm struggling with uploading context images for this, so I'm just not going to try anymore. Either they are getting deleted or they are not visible to viewers, and I am not being given any reason so that I could fix it.
If you decide to do this, please upload a GIF in the comments; this is something new I tried and I want to see what people can do with it.
There seems to be some confusion here, so to make it clear: the body-painted version images are not generated; they are the base photogrammetry images I originally used in instaNGP to generate the transforms.json.
Also
NVIDIA's instaNGP (also known as NeRF) is a neural photogrammetry application that instantly generates a dense 3D point cloud from 50-160 images, whereas producing a satisfactory result otherwise typically takes 300-500 images and 30 minutes to 1 hour. I just edited the photogrammetry images using ControlNet.
The download contains the instaNGP folders with the transforms.json files for both datasets, the samus bodypaint and samus nude (both transforms.json files are exactly the same).
Processed bodypaint images using instaNGP.
Copied the transforms.json file from the bodypaint folder to a new folder.
Used the controlnet m2m (it only supports mp4 videos) script for openpose, normal, depth controlnet, and generated text2image instead of image2image.
Placed the generated images in the images folder of the new folder.
I'm using the transforms.json file from a pre-calculated dataset on a new dataset with the same dimensions. The transforms.json file contains the calculated camera locations and extracted features of the provided dataset. If the new dataset has images with the same dimensions as the original dataset, using the transforms.json file will allow the same model to be built with the new images.
Although there were some unusual images, I think instaNGP disregards the pixels that do not match up and utilizes the matching portions, so I decided to keep them.
Tutorial for control net
1 . convert your base photogrammetry images into a mp4 video
2 . setting the prompt
3 . set width and height the same as your video
4 . set control model - 0 as open pose (leave the image empty)
5 . set control model - 1 as normal_map (leave the image empty)
6 . set control model - 2 as depth (leave the image empty)
7 . select the controlnet m2m script from the script section (you should have it if you have controlnet) and put your mp4 video in ControlNet-0
8 . put the same mp4 video in ControlNet-1
9 . put the same mp4 video in ControlNet-2
10 . click generate and your video frames will start processing. WARNING: make sure you are absolutely ready to start, because after starting it is very hard to stop.
11 . after all frames are generated, rename the generated images to match the original photogrammetry images using a program called "Advanced Renamer"
12 . copy the images into the images folder of the new folder referred to in the main bullet points
This is a *.pmd for MMD.
This is a V0.1. I did it for science.
I learned Blender/PMXEditor/MMD in 1 day just to try this.
It's clearly not perfect; there is still work to do:
- head/neck not animated
- body and leg joints are not perfect.
How to use in SD ?
- Export your MMD video to .avi and convert it to .mp4.
- In SD :
setup your prompt
setup controlnet openpose
enable script "controlnet m2m"
put your .mp4 in the ControlNet-M2M tab
Generate
How to install ?
- Extract .zip file in your "...\MMD\UserFile\Model" repository
- Open MikuMikuDance.exe and load the model
Credit :
https://toyxyz.gumroad.com/l/ciojz for the openpose blender model
Disclaimer, this is not my script, I did not make it and I can't take credit for it whatsoever (if you recognise the script and it's owner, please let me know so I can contact them and ask them for permission, if you recognise this as your own script and you would like it removed, please let me know!)
The initial script was designed for making a deepthroat animation, and admittedly I could never get it to work, but it piqued my curiosity so I've tampered with it several times, this being one of the better iterations! This doesn't do anything the original script wouldn't allow, so once again, the original author deserves all credit.
For anyone who knows how to edit the script, you'll be able to see what it does. This version has 18 frames, ranging from "topless, (small breasts:1.2), nipples" > "topless, (huge breasts:1.4), nipples" and exports them into a gif afterwards. I couldn't work out how to upload the file without choosing a .zip file, but just extract it into the 'Scripts' folder and it should show up where you'd choose the X/Y prompt option.
Advanced tips:
1: You should try to control the image as much as possible, posing your subject, their hands, and the background, so that as much as possible stays the same between frames.
2: Img2Img frames. If the gif turned out alright, save for one or two frames where it's a little too different, I've had decent luck using Img2Img with that frame, until it looks like it'll match with the rest. Then just use something like https://ezgif.com/maker to make it manually!
3: It prefers drawn models more than realistic!
Make a quick GIF animation using ControlNet to guide the frames in a stop motion pipeline
Add this extension through the extensions tab, Install from URL and paste this repository URL:
https://github.com/gogodr/sd-webui-stopmotion
Select the script named Stop Motion CN and you will be able to configure the interface
Select how many ControlNet Modules you want to use
Select which ControlNet model you will use for each tab
Add the corresponding frames for the animation **
Click on generate and it will generate all the frames ***
** As a recommendation use numbered files (Ex: 1.png, 2.png, 3.png ...)
*** The individual frames will be saved as normal in the corresponding txt2img or img2img output folder, but only the gif will be shown when the processing is done.
Handle output FPS
Handle batch img2img guide
Handle ControlNet preprocessing
This is a node based implementation of the cutoff extension for A1111. Cutoff is a method to limit the influence of specific tokens to certain regions of the prompt. This can be helpful if you want to e.g. specify exactly what colors certain things in the generated image should be.
For a detailed explanation of the method, the introduced nodes, or raise an issue, please see the github page for this project. You can take any of the example images listed in the gallery and load them into ComfyUI to have a closer look at an example node tree.
To install simply unzip into the custom_nodes folder.
This is a sample config JSON file.
On request, here's a script to turn your prompts into gifs.
I built this off the prompts_from_file script that comes with the webui.
If all you want is a script in the webui to turn a list of prompts into a gif, then this is the only file you need to worry about!
Grab the prompts_from_file_to_gif upload, unzip it, and put it into your webui/scripts directory, then restart your webui. You'll find it under the name "prompts from file or textbox with gif generation."
Grab the sample_prompts_to_get_you_started upload, unzip it, and then you can either open it up, and copy paste into the box, or you can click the upload_prompts_here button in the script to select the txt file.
Each prompt needs to be on one line, so if you have a bunch of prompts, you need to move them each to their own line.
To help with that, I also uploaded the parameter_grabber script.
If you don't want to, you don't need to worry about that, but what it does is this: it has a simple GUI, and it grabs the parameter data for all of the image files in a given directory, with options to remove newline characters and to write only your prompts, one per line, to a file.
Helps a lot. You can generate your images one at a time without needing to worry about saving the gen data separately, then just drag and drop them from the webui to a new folder when you find a new frame you like; at the end, you can use the parameter_grabber script to build the generation file for you.
It's particularly useful for img2img, and so that's why I uploaded the prompts_from_file_for_batch script.
Drop it into your webui scripts directory. It again uses the prompts-from-file script as a base, but what this one does is apply the prompts in the list you give it to the files in your batch.
So, if you go to the img2img tab, select batch, and choose the image folder that you put all of your images in? You can use the prompts file you got from parameter_grabber for those images, and then do whatever you want, batch to those files. ControlNet them, change the resolution, change cfg, anything.
It applies them in filename order, so line one applies to the first file in the batch, and so on.
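In other words, the pairing works roughly like this sketch (the file and folder names are placeholders):
import glob

with open("prompts.txt", encoding="utf-8") as f:
    prompts = [line.strip() for line in f if line.strip()]

# line one goes with the first file (in filename order), line two with the second, and so on
images = sorted(glob.glob("batch_folder/*.png"))
for image_path, prompt in zip(images, prompts):
    print(image_path, "->", prompt)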
A node that enables you to mix a text prompt with predefined styles in a styles.csv file. Each line in the file contains a name, positive prompt and a negative prompt. Positive prompts can contain the phrase {prompt} which will be replaced by text specified at run time.
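A rough sketch of what such a lookup does, assuming a plain three-column styles.csv (name, positive prompt, negative prompt); this is illustrative, not the node's actual code, and "cinematic" is just a hypothetical style name:
import csv

def apply_style(style_name, prompt, path="styles.csv"):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            name, positive, negative = row[0], row[1], row[2]
            if name == style_name:
                # {prompt} in the positive column is replaced by the run-time text
                return positive.replace("{prompt}", prompt), negative
    raise KeyError(style_name)

positive, negative = apply_style("cinematic", "a castle on a hill")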
Now that I've made a decent image, you can deduce what the VAE is for.
Reddit version of this guide: https://www.reddit.com/r/StableDiffusion/comments/11izvoj
LoRAs used as example: https://civitai.com/models/7649, https://civitai.com/models/9850
Extension name: sd-webui-lora-block-weight
Syntax: <lora:loraname:weight:blockweights>
This extension allows you to apply not the entire LoRA, but only individual blocks. This lets you use some overtrained models, find a fault in your model, or in some cases combine the best epochs.
For example, you can use it to take only the initial blocks from a LoRA, which influence the composition; the last blocks, which mostly determine the color hue; or the middle blocks, which are responsible for a little bit of everything. This can make it easier to generate things the LoRA wasn't particularly intended for, for example:
Lowering the weight of the initial blocks can give you your favorite Anime character with normal proportions.
Lowering the weight of the end blocks allows you to get the same character with eyes half a face, but in a normal color scheme.
Adding end blocks from other LoRAs can enhance strokes, reflections, or skin texture, or lighten or darken the image.
A style LoRA that sees everything as houses will slightly reduce its enthusiasm and start drawing characters.
It can also add all sorts of freaks, artifacts, extra eyes and fingers and such. After all, we are breaking the normal workings of the model by cutting off the pieces you don't like.
To install, find sd-webui-lora-block-weight in the add-on list and install it.
After restarting the UI, in txt2img and img2img you will see a new element: LoRA Block Weight.
Please note: there is currently a conflict with Composable Lora and Additional Networks. Additional Networks currently just breaks this extension. Composable Lora can be installed at the same time, but only one of them may be Enabled/Active at a time. Otherwise the effect of the LoRA can be applied twice (if not more), creating a scorched image or a mishmash of colors. This is most likely a webui problem, because prompt scheduling shows similar problems under some conditions.
Off topic, but let me explain. Prompt scheduling is changing the request at a certain step; for example, [cat:dog:0.4] will start drawing the cat, but once 40% of all steps have passed it will remove the cat from the prompt and put a dog in the same place. This can result in an animal that has features of both, as well as a separately standing, badly drawn cat and dog.
I'll give you a good starting point to start experimenting with block weights:
In the prompt, after the LoRA model name and weight, write another colon and the word XYZ; for the popular model in the example it would be <lora:yaeMikoRealistic_yaemikoMixed:1:XYZ>, or, as in the screenshot, <lora:HuaqiangLora_futaallColortest:1:XYZ>.
After this, make sure the add-on is enabled (Active), expand the add-on's XYZ plot section (not to be confused with the X/Y/Z plot in the scripts section), and check the XYZ plot option.
Select X Type "Original Weights", and in the X field enter:
INS,IND,INALL,MIDD,OUTD,OUTS,OUTALL
Preparation is finished, you will see a table like the one attached.
If you like any of the results, replace XYZ in the prompt with the tag that was at the top of the image, e.g. MIDD:
<lora:HuaqiangLora_futaallColortest:1:MIDD>
If you don't like any of the options, you can try inverting the query; all weights will turn into their opposites. To do this, write ZYX instead of XYZ and run the generation again. There is one small bug: at this point you need to add one more LoRA with weight 0 and the XYZ tag. For example, I took Paimon. I think Paimon was happy to have weight 0 no matter what. Maybe this will be fixed, maybe it won't; as the author of the add-on explained, it would require a change in the logic of the extension.
So example: <lora:HuaqiangLora_futaallColortest:1:ZYX> <lora:paimonGenshinImpact_v10:1:XYZ>
If you like one of the inverted options, you will need to expand the Weights setting list below, find the corresponding line in the list (for example MIDD), copy it into Notepad/Excel/Word, replace all 1s with any placeholder character, all 0s with 1, and the placeholder character with 0, then paste it directly into the prompt instead of ZYX; a small script that does this inversion is sketched below. Or you can find ready-made weights in the comments. Do not forget to remove Paimon from the prompt and disable the XYZ plot.
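For convenience, a tiny Python sketch of that inversion (the preset string here is just an example, not the extension's actual MIDD row):
def invert_weights(preset):
    # swap every 1 for 0 and every 0 for 1, which is what the manual find-and-replace does
    swapped = []
    for w in preset.split(","):
        w = w.strip()
        swapped.append("1" if w == "0" else "0" if w == "1" else w)
    return ",".join(swapped)

print(invert_weights("1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1"))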
Also available on Github
Download the .zip archive
extract ComfyUI_Dave_CustomNode
folder to ComfyUI/custom_nodes/
Start ComfyUI
All required files should be downloaded/copied from there.
No need to manually copy/paste .js files anymore.
Let you visualize the ConditioningSetArea node for better control
Right click menu to add/remove/swap layers
Display what node is associated with current input selected
Also comes with a ConditioningUpscale node, useful for hires-fix workflows.
Let you visualize the MultiLatentComposite node for better control
Right click menu to add/remove/swap layers
Display what node is associated with current input selected
Experimental Lycoris LoRA (LoHa) trained on pixiv artist with several configurations.
Decided to upload the most successful ones.
The poster image was done with the H2O_64-64-64-64_4e-4_COS3R-03 version.
Name format: network dim - network alpha - conv dim - conv alpha - unet lr - scheduler (all cosine with 3 restarts in this case) - epoch.
Seems Civitai bugged out again and did not allow attaching the model file, so I marked it as "other" and uploaded it zipped.
ComfyUI is an advanced node-based UI for Stable Diffusion. It allows you to create customized workflows such as image post-processing or conversions.
CLIPTextEncode (NSP): Parse Noodle Soup Prompts
Constant Number
Debug to Console (Debug pretty much anything to the console window)
Image Analyze
Black White Levels
RGB Levels
Depends on matplotlib, which it will attempt to install on first run
Image Blank: Create a blank image in any color
Image Blend by Mask: Blend two images by a mask
Image Blend: Blend two images by opacity
Image Blending Mode: Blend two images by various blending modes
Image Bloom Filter: Apply a high-pass based bloom filter
Image Canny Filter: Apply a canny filter to an image
Image Chromatic Aberration: Apply a chromatic aberration lens effect to an image, like in sci-fi films, movie theaters, and video games
Image Color Palette
Generate a color palette based on the input image.
Depends on scikit-learn, which it will attempt to install on first run.
Supports a color range of 8-256.
Utilizes the font in ./res/ unless unavailable, in which case it falls back to an internal better-than-nothing font.
Image Edge Detection Filter: Detect edges in an image
Image Film Grain: Apply film grain to an image
Image Filter Adjustments: Apply various image adjustments to an image
Image Flip: Flip an image horizontally or vertically
Image Gradient Map: Apply a gradient map to an image
Image Generate Gradient: Generate a gradient map with desired stops and colors
Image High Pass Filter: Apply a high frequency pass to the image, returning the details
Image Levels Adjustment: Adjust the levels of an image
Image Load: Load an image from any path on the system, or a url starting with http
Image Median Filter: Apply a median filter to an image, such as to smooth out details in surfaces
Image Mix RGB Channels: Mix together RGB channels into a single image
Image Monitor Effects Filter: Apply various monitor effects to an image
Digital Distortion
A digital breakup distortion effect
Signal Distortion
An analog signal distortion effect on vertical bands, like a CRT monitor
TV Distortion
A TV scanline and bleed distortion effect
Image Nova Filter: An image filter that uses a sinus frequency to break an image apart into RGB frequencies
Image Remove Background (Alpha): Remove the background from an image by threshold and tolerance
Image Remove Color: Remove a color from an image and replace it with another
Image Resize
Image Rotate: Rotate an image
Image Save: A save image node with format support and path support. (Bug: doesn't display image)
Image Select Channel: Select a single channel of an RGB image
Image Select Color: Return only the selected color of the image on a black canvas
Image Style Filter: Style an image with Pilgram Instagram-like filters
Depends on the pilgram module
Image Threshold: Return the desired threshold range of an image
Image Transpose
Image fDOF Filter: Apply a fake depth of field effect to an image
Image to Latent Mask: Convert an image into a latent mask
Input Switch (Disabled until * wildcard fix)
KSampler (WAS): A sampler that accepts a seed as a node input
Load Text File
Load Batch Images
Increment images in a folder, or fetch a single image out of a batch.
Will reset its place if the path or pattern is changed.
The pattern is a glob, which allows you to do things like **/* to get all files in the directory and subdirectories, or things like *.jpg to select only JPEG images in the specified directory.
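For example, in Python's own glob terms (the paths here are placeholders):
import glob

everything = glob.glob("input/**/*", recursive=True)  # all files in the directory and subdirectories
jpegs_only = glob.glob("input/*.jpg")                 # only JPEG images in the specified directory
print(len(everything), len(jpegs_only))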
Latent Noise Injection: Inject latent noise into a latent image
Latent Upscale by Factor: Upscale a latent image by a factor
MiDaS Depth Approximation: Produce a depth approximation of a single image input
MiDaS Mask Image: Mask an input image using MiDaS with a desired color
Number Operation
Number to Seed
Number to Float
Number to Int
Number to String
Number to Text
Random Number
Save Text File: Save a text string to a file
Seed: Return a seed
Tensor Batch to Image: Select a single image out of a latent batch for post processing with filters
Text Concatenate: Merge two strings
Text Find and Replace: Find and replace a substring in a string
Text Multiline: Write a multiline text string
Text Parse Noodle Soup Prompts: Parse NSP in a text input
Text Random Line: Select a random line from a text input string
Text String: Write a single line text string value
Text to Conditioning: Convert a text string to conditioning.
Text tokens can be used in the Save Text File and Save Image nodes. You can also add your own custom tokens with the Text Add Tokens node.
The token name can be anything excluding the : character, and it can also be a simple Regular Expression.
[time]
The current system microtime
[hostname]
The hostname of the system executing ComfyUI
[user]
The user that is executing ComfyUI
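As a rough illustration of what token substitution amounts to (the suite's own token list and implementation may differ):
import getpass
import socket
import time

tokens = {
    "[time]": str(time.time()),          # current system time
    "[hostname]": socket.gethostname(),  # machine running ComfyUI
    "[user]": getpass.getuser(),         # user executing ComfyUI
}
filename = "[user]_[hostname]_[time].txt"
for token, value in tokens.items():
    filename = filename.replace(token, value)
print(filename)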
If you're running on Linux, or a non-admin account on Windows, you'll want to ensure that /ComfyUI/custom_nodes, was-node-suite-comfyui, and WAS_Node_Suite.py have write permissions.
Navigate to your /ComfyUI/custom_nodes/
folder
git clone https://github.com/WASasquatch/was-node-suite-comfyui/
Start ComfyUI
WAS Suite should uninstall legacy nodes automatically for you.
Tools will be located in the WAS Suite menu.
If you're running on Linux, or a non-admin account on Windows, you'll want to ensure that /ComfyUI/custom_nodes and WAS_Node_Suite.py have write permissions.
Download WAS_Node_Suite.py
Move the file to your /ComfyUI/custom_nodes/ folder
Start, or Restart ComfyUI
WAS Suite should uninstall legacy nodes automatically for you.
Tools will be located in the WAS Suite menu.
Create a new cell and add the following code, then run the cell. You may need to edit the path to your custom_nodes folder.
!git clone https://github.com/WASasquatch/was-node-suite-comfyui /content/ComfyUI/custom_nodes/
Restart Colab Runtime (don't disconnect)
Tools will be located in the WAS Suite menu.
WAS Node Suite is designed to download dependencies on its own as needed, but what it depends on can be installed manually before use to prevent any script issues. The dependencies that are not required by ComfyUI are as follows:
opencv
scipy
timm (for MiDaS)
MiDaS Models (they will download automatically upon use and be stored in /ComfyUI/models/midas/checkpoints/; additional files may be installed by PyTorch Hub)
img2texture (for Image Seamless Texture node)
Used for the perlin noise. I tried writing three different perlin noise functions, but I couldn't get things as fast as this library, even with numpy, and that was really hard to figure out. Haha. I'm just terrible with math. Feel free to PR an in-house version so long as it doesn't take longer than a few seconds; the fastest I got was nearly a minute... Lol. This version renames some nodes and introduces new fields. Unfortunately ComfyUI doesn't handle these changes well, so you'll have to replace the dreaded "red nodes" manually.
These are a collection of nodes I have made to help me in my workflows. None of the nodes here require any external dependencies or packages that aren't part of the base ComfyUI install so they should be plug and play.
Download the node's .zip file
Extract it into your ComfyUI\custom_nodes
folder
Restart your ComfyUI server instance
Refresh the browser you are using for ComfyUI
Have fun!
Let me know if you see any issues.
Simple Windows PowerShell script; execute it from the directory you want to organize. It sorts models into different sub-directories by Person and NSFW flags based on the ".civitai.info" files created by Civitai Helper. AUTOMATIC1111 will show sub-directories when the show/hide extra network icon is used, so you can filter your results. You need to "Scan Models for Civitai" in the Civitai Helper tab prior to running the script in the model directory you want to organize.
Prerequisite: Civitai Helper
https://github.com/butaixianran/Stable-Diffusion-Webui-Civitai-Helper
Procedure:
Install Civitai Helper
Restart AUTOMATIC1111
Check the checkboxes (ti, hyper, ckp, lora) of the model types you want to organize in the "Scan Models for Civitai" section in the "Civitai Helper" tab
Run "Scan Models for Civitai" in the "Civitai Helper" tab by clicking the "Scan" button
Wait for scan to complete
Make sure that ".civitai.info" files have been created in the AUTOMATIC1111 model directories you selected
Download this Windows PowerShell script
Extract the Organize Script from the downloaded zip file
Place this Organize Script file in the AUTOMATIC1111 model directory you want to organize (".\models\hypernetworks", ".\models\lora", ".\models\Stable-diffusion", ".\embeddings")
Run the Organize Script file in Windows by right clicking on the script and selecting "Run with PowerShell" menu option
Wait for the script to finish organizing your models
Verify the models have been organized into sub-directories as expected
Loop the output of one generation into the next generation.
To use it, create a start node, an end node, and a loop node. The loop node should connect to exactly one start and one end node of the same type. The first_loop input is only used on the first run. Whatever was sent to the end node will be what the start node emits on the next run.
More loop types can be added by modifying loopback.py
A node suite for ComfyUI that allows you to load image sequence and generate new image sequence with different styles or content.
An opinionated take on stable-diffusion models-merging automatic-optimisation.
The main idea is to treat the models-merging procedure as a black-box model with 26 parameters: one for each block plus base_alpha (note that for the moment clip_skip is set to 0).
We can then try to apply black-box optimisation techniques, in particular we focus on Bayesian optimisation with a Gaussian Process emulator.
Read more here, here and here.
The optimisation process is split in two phases:
1. exploration: here we sample (at random for now, with some heuristic in the future) the 26-parameter hyperspace, our block-weights. The number of samples is set by the --init_points argument. We use each set of weights to merge the two models, and we use the merged model to generate batch_size * number of payloads images, which are then scored.
2. exploitation: based on the exploratory phase, the optimiser forms an idea of where (i.e. for which set of weights) the optimal merge is.
This information is used to sample more sets of weights, --n_iters number of times. This time we don't sample all of them in one go: we sample once, merge the models, generate and score the images, and update the optimiser's knowledge about the merging space. This way the optimiser can adapt its strategy step by step.
At the end of the exploitation phase, the set of weights with the highest score is deemed to be the optimal one.
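A hedged sketch of that exploration/exploitation loop using the bayes_opt package (the scoring function here is a dummy placeholder; in the real tool it merges the models, generates the payload images, and scores them):
from bayes_opt import BayesianOptimization

def score_merge(**weights):
    # placeholder: merge the two models with these block weights, generate
    # batch_size * payloads images, and return their aggregate score
    return -sum((w - 0.5) ** 2 for w in weights.values())

# 26 parameters: base_alpha plus one weight per block
pbounds = {"base_alpha": (0.0, 1.0)}
pbounds.update({f"block_{i}": (0.0, 1.0) for i in range(25)})

optimizer = BayesianOptimization(f=score_merge, pbounds=pbounds, random_state=1)
optimizer.maximize(init_points=10, n_iter=40)  # exploration phase, then exploitation phase
print(optimizer.max)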
- wildcards support
- TPE or Bayesian Optimisers. cf. Bergstra et al. 2011 for a comparison
- UNET visualiser
- convergence plot
Head to the wiki for all the instructions to get you started.
1. LR-Text Encoder
This information comes from personal tests and may not match your results; please test it yourself (via LoRA weight adjustment).
Sometimes a LoRA is trained only on the UNet, so what influence does the Text Encoder have alongside the UNet? It takes time to observe.
Questions:
How important is the TE compared to the UNet?
How many training steps give the best results without overfitting or underfitting?
DIM = 8 Alpha 4
example TE weight - Unet 1e-4 TE 5e-5 [x0.5]
example TE weight - Unet 1e-4 TE 1e-4 [x1]
example TE weight - Unet 1e-4 TE 2e-5 [x0.2]
example TE weight - Unet 1e-4 TE 1e-5 [x0.1]
example TE weight - Unet 1e-4 TE 3e-4 [x3]
Result https://imgur.com/Cs1As45
Reducing the TE LR too much results in the creation of non-existent objects and damages clothes.
If it is used equal to the UNet when the TE weight should be reduced, it results in strange images or a distorted clothing appearance.
The TE will not cause overfitting as long as its value does not exceed the UNet's (x1).
If using LR decay, then a UNet LR of 1e-4 can be used to keep the quality consistent.
Personal opinion: the TE acts as an indicator of what is happening in the training image and keeps the details in the picture.
If this value is too high, it will also pick up useless things; if it's too small, it will lack image details.
TE test results 5e-5 individual epochs
every epoch = 237 steps https://imgur.com/a/SdYq1ET
Good results in the 6 to 8 epoch range, or 1422 to 1896 steps.
It can go up to 3K steps if the training image data is enough.
2. LR-Unet https://imgur.com/lVilHf9
This changes the image the most. Using too many or too few steps greatly affects the quality of the LoRA.
Using a higher UNet LR than usual can produce a style LoRA [even if it's not intended to be a style]. This can happen when there are fewer than 100 training images.
It was found that at 3e-4 with TE 1e-4 [x0.3] there is a chance that details will be lost.
When using TE x0.5, even with an LR-UNet 2 times higher, TE and Alpha/2 will prevent the UNet from overfitting [but training too many steps can overfit as well].
At 5e-5 the white shirt tag is bad, because TE = 5e-5 causes poor tag retention;
it may need training for 10 epochs.
PS: Using a DIM higher than 16 or 32 might use more of the UNet? [idk]
3. Train TE vs Unet Only [WIP] https://imgur.com/pNgOthy
File size - TE 2,620KB | Both 9,325KB | Unet 6,705KB
The UNet itself can produce images even without a TE, but sometimes the details of the outfit are worse.
Training both causes less image deformation in the model. If you intend to train a style LoRA, train only the UNet.
4. min_snr_gamma [WIP]
It's a new parameter that reduces the loss and takes less time to train.
gamma test [Training] = 1 - 20
Loss/avg
top to bottom - no_gamma / 20 / 10 / 5 / 2 / 1
From the experiment, the number of steps needed was reduced by up to 30% when using gamma = 5.
4.1. DIM / Alpha [WIP]
?? Using a lower alpha, or 1, will require more UNet LR regardless of DIM ??
4.2 Bucket [WIP]
Based on my understanding of what is displayed in CMD,
bucketing splits the various image sizes into aspect-ratio buckets
by reducing their size according to the resolution setting. If the image aspect ratio exceeds the specified bucket, it will be cropped, so try to keep your character as centered as possible.
4.3 Noise_offset
Use this setting if the training images are too bright or too dark; set it to no more than 0.1.
In most cases, when training on anime images it is recommended to set it to 0.
PS: This setting makes overfitting easier.
4.4 Weight_Decay , betas
These are parameters that are quite difficult to pin down; using a Weight_Decay between 0.1 and 1 is recommended.
As for betas, just leave it unset.
5. LoRA training estimation [WIP]
This is an idealized picture, which is hard to achieve because many factors are involved.
With too little training or a high UNet LR, the Text Encoder doesn't get enough information and lacks detail.
With a low learning rate, training takes longer than usual; this makes overfitting very difficult but makes underfitting easier.
The TE is responsible for storing the information of the tag, i.e. what it is in the image, and for saving details in the tag.
The more the UNet changes, the more data it collects?
Inspired by the introduction of AnyLora by Lykon and an experiment done by Machi, I decided to further investigate the influence of the base model used for training.
Here is the full documentation
https://rentry.org/LyCORIS-experiments#a-certain-theory-on-lora-transfer
On the same entry page I also have other experiments
I focus on anime training here. To quick recapitulate,
If you want to switch style when switching model, you should use NAI or ACertainty. On the other hand, if you want the trained style to be retained on a family of models, you should use a model that is close to all these models (potentially a merge).
If you want the style of model X when using it, train on an ancestor of X that does not have this style. In particular, if you want to make cosplay images, you are better off training on NAI rather than directly on NeverEndingDream or ChilloutMix.
Don't use SD 1.4/1.5 for anime training in general unless you train something at the scale of WD.
General Advice
Dataset is the most important. Use a regularization set whenever possible. Make sure the data are diverse and properly captioned (remember that the trigger word learns what is in the image but not described in the caption).
Training on higher resolution can enhance background and details but it is not necessarily worth it.
I really see no difference training on clip 1 or 2. If you see it, please let me know.
I am not able to upload the full resolution images (more than 100 MB each), but you can download the zip and check yourself.
Images 2-6, made with final checkpoints with weight 1
Images 7-9, made with intermediate checkpoints
Images 10-12, made with final checkpoints with weight 0.65
Now, we finally have a Civitai SD webui extension!!
Update:
1.6.1.1 is here, to support the bilingual localization extension.
This extension works with both gradio 3.23.0 and 3.16.2.
Civitai Helper 2 is under development, you can watch its UI demo video at github page.
Note: This extension is very stable and works well for many people. So, if you have an issue, read its GitHub documentation and check the console log window for details.
Civitai Helper
Stable Diffusion Webui Extension for Civitai, to help you handle models much more easily.
The official SD extension for Civitai has taken months to develop and still has no good output. So, I developed this unofficial one.
Github project:
https://github.com/butaixianran/Stable-Diffusion-Webui-Civitai-Helper
(Github page has better document)
Scan all models to download model information and preview images from Civitai.
Link a local model to a Civitai model by a Civitai URL
Download a model (with info + preview) by Civitai URL into SD's model folder or a subfolder.
Downloads can resume from a break-point.
Check all your local models for new versions on Civitai
Download a new version directly into the SD model folder (with info + preview)
Modified Built-in "Extra Network" cards, to add the following buttons on each card:
🖼: Modified "replace preview" text into this icon
🌐: Open this model's Civitai url in a new tab
💡: Add this model's trigger words to prompt
🏷: Use this model's preview image's prompt
Also support thumbnail mode of Extra Network
Option to always show the additional buttons, so they now work with touch screens.
Every time you install or update this extension, you need to shut down SD Webui and relaunch it. Just "Reload UI" won't work.
First of all, update your SD Webui to the latest version!
This extension needs to get the Extra Network cards' IDs, which were added on 2023-02-06. If your SD webui is an earlier version, you need to update it!
After installing, go to the extension tab "Civitai Helper". There is a button called "Scan Model".
Click it, and the extension will scan all your models to generate SHA256 hashes, which it uses to get model information and preview images from Civitai.
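Conceptually, the scan step boils down to hashing each model file and asking Civitai about that hash. A minimal sketch (not the extension's actual code; the file path is a placeholder) might look like this:
```
# Hash a model file with SHA256 and look it up via Civitai's by-hash API endpoint.
import hashlib
import requests

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

model_hash = sha256_of("models/Stable-diffusion/example.safetensors")  # placeholder path
resp = requests.get(f"https://civitai.com/api/v1/model-versions/by-hash/{model_hash}")
print(resp.json().get("name"))  # model version name, if Civitai knows this hash
```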
After scanning has finished, open SD webui's built-in "Extra Network" tab to show the model cards.
Move your mouse to the bottom of a model card. It will show 4 icon buttons:
🖼: Modified "replace preview" text into this icon
🌐: Open this model's Civitai url in a new tab
💡: Add this model's trigger words to prompt
🏷: Use this model's preview image's prompt
If those buttons are not there, click the "Refresh Civitai Helper" button to get them back.
Everytime extra network tab refreshed, it will remove all additional buttons of this extension. You need to click Refresh Civitai Helper
button to bring them back.
Github repo + nodes description: LINK
Leave suggestions, and report errors if you run into them
What's new in 0.5.0:
CombiningArea scaler
More user-friendly UI names
All node descriptions moved to GitHub
Tuples and so on moved to their own directory in the UI
Automated calculations depending on image sizes or whatever you want
Easier (or not) editing of multiple values of various nodes
Math
Modded scalers
Installing: unzip files in ComfyUI/custom_nodes folder
Should look like this:
For example (v0.5.0), here is how a scaled ConditioningArea can improve the image after scaled latent combining:
Only LatentCombine:
Combining preview:
LatentCombine with scaled ConditioningArea (640*360 to 1360*768):
Example of workflow i made for this located in: /Derfuu_ComfyUI_ModdedNodes/workflow_examples/
model: hPANTYHOSENEKO (sorry, couldn't find link)
negative prompt: embedding:verybadimagenegative6400
If there are troubles with sizes that are not multiples of 64, this may solve the problem (found on GitHub):
This code is at the end of this file: /ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodules.py
NOTES#2:
Debug nodes count as OUTPUT nodes and can be used without image preview or save nodes to get results
P.S.:
You can find or post all fixes on GitHub; I look there too.
If you hit an error like "Calculated padded input size per channel: (2 x 82). Kernel size: (3 x 3). Kernel size can't be greater than actual input size", this MAY be because of a too high or too low offset given to a node.
🐣 Please follow me for new updates https://twitter.com/camenduru
🔥 Please join our discord server https://discord.gg/k5BwmmvJJU
https://github.com/lilly1987/ComfyUI_node_Lilly
```
ex : {3$$a1|{b2|c3|}|d4|{-$$|f|g}|{-2$$h||i}|{1-$$j|k|}}/{$$l|m|}/{0$$n|}
{1|2|3} -> 1 or 2 or 3
{2$$a|b|c} -> a,b or b,c or c,a or bb or ....
{9$$a|b|c} -> {3$$a|b|c} auto fix max count
{1-2$$a|b|c} -> 1~2 random choice
{-2$$a|b|c} -> {0-2$$a|b|c} 0-2
{1-$$a|b|c} -> {0-3$$a|b|c} 1-max
{-$$a|b|c} -> {0-3$$a|b|c} 0-max
{9$$ {and|or} $$a|b|c} -> a or b or c / c and b and a
```
install : ComfyUI\custom_nodes\ComfyUI_node_Lilly
txt folder :
ComfyUI\wildcards
or edit line
card_path=os.path.dirname(__file__)+"\\..\\wildcards\\**\\*.txt"
FaceRestore node for ComfyUI. To install copy the facerestore directory from the zip to the custom_nodes directory in ComfyUI.
I bodged this together in an afternoon. You might need to pip install a package if it doesn't work at first.
You'll need codeformer-v0.1.0.pth or GFPGANv1.4.pth in your models/upscale_models directory. The node uses another model for face detection, which it will download and put in models/facedetection.
Use this instead: https://civitai.com/models/24537/comfyui-visual-multiareaconditioning
At the moment, these workflows won't really give you anything above what can be done much easier and simpler with the above custom node. This will likely remain the case, though if I ever make something more complex, I may update this. For the unconvinced:
https://github.com/comfyanonymous/ComfyUI
It can generate multiple subjects. Each subject has its own prompt.
For now, ComfyUI doesn't have much in terms of automation, so custom nodes are required or setting it up will take a moment. Instructions can be found in a disconnected prompt box to the left.
There are two methods for multiple subjects included so far:
Latent Couple limits the areas affected by each prompt to just a portion of the image
Noisy Latent Composition generates each prompt on a separate image for a few steps (eg. 4/20) so that only rough outlines of major elements get created, then combines them together and does the remaining steps with Latent Couple
From my testing, Latent Couple seems to generally do better
Model: https://civitai.com/models/8019/smix-series (sMix series; ver 12122)
VAE: https://huggingface.co/hakurei/waifu-diffusion-v1-4 (kl-f8-anime2)
LORA: N/A
Embeddings: https://huggingface.co/datasets/gsdf/EasyNegative (EasyNegative), https://huggingface.co/yesyeahvh/bad-hands-5/blob/main/bad-hands-5.pt (bad-hands-5), https://huggingface.co/NiXXerHATTER59/bad-artist (bad-artist)
Upscale Model: https://drive.google.com/file/d/1lELx_WiA25_S8rYINm_DyMNpFOhfZAzt/view (4x_foolhardy_Remacri) OR LatentUpscale
There are two/three example images in each zip file. You can drag&drop them on webui to load their full workflows. This can be helpful in figuring out how to set it up.
a princess
Install https://github.com/Fannovel16/comfy_controlnet_preprocessors
thanks to Fannovel16
Download:
https://civitai.com/models/9251/controlnet-pre-trained-models
at least Canny, Depth is optional
or difference model (takes your model as input, might be more accurate)
https://civitai.com/models/9868/controlnet-pre-trained-difference-models
put those controlnet models into ComfyUI/models/controlnet
thanks to Ally
Download attached file and put the nodes into ComfyUI/custom_nodes
Included are some (but not all) nodes from
https://civitai.com/models/20793/was-node-suites-comfyui
Restart ComfyUI
Usage:
Disconnect latent input on the output sampler at first.
Generate your desired prompt. Adding "open sky background" helps avoid other objects in the scene.
Adjust the brightness on the image filter. During my testing a value of -0.200 and lower works. Flowing hair is usually the most problematic, and poses where people lean on other objects like walls.
A free standing pose and short straight hair works really well.
The point of the brightness is to limit the depth map somewhat to create a mask that fits your subject.
Choose your background image. It can either be the same latent image or a blank image created by a node, or even a loaded image.
Alternatively, you can add another image filter between the yellow Monochromatic Clip and ImageToMask nodes and add a little bit of blur to achieve some blending between the subject and the new background.
When you are satisfied with how the mask looks, connect the VAEEncodeForInpaint Latent output to the Ksampler (WAS) Output again and press Queue Prompt.
For this to work you NEED the canny controlnet. I have tried HED and normalmap as well, but canny seems to work the best.
Depending on your subject you might need another controlnet type.
You would have to switch the preprocessor from canny and install a different controlnet for your application.
Applying the depth controlnet is OPTIONAL. It will add a slight 3D effect to your output depending on the strength.
If you are strictly working with 2D like anime or painting you can bypass the depth controlnet.
Simply remove the condition from the depth controlnet and input it into the canny controlnet. Without the canny controlnet however, your output generation will look way different than your seed preview.
I added a lot of reroute nodes to make it more obvious what goes where.
Reproducing this workflow in automatic1111 requires a lot of manual steps, even using a 3rd-party program to create the mask, so this method with Comfy should be very convenient.
Disclaimer: Some of the color of the added background will still bleed into the final image.
https://github.com/Fannovel16/comfy_controlnet_preprocessors
https://civitai.com/models/9251/controlnet-pre-trained-models
(openpose and depth model)
optional but highly suggest:
https://civitai.com/api/download/models/25829
Tested with a few other models as well, like F222 and Protogen.
The following explanation and instruction can also be found in a text node inside the workflow:
I used different "masks" in the load addition node as well, with vastly different results, but all returned backgrounds. I also tried the same mask in different colors.
This one is strictly a gradient of white created on a completely black background.
I can only presume that the AI uses it as some sort of guidance to distribute noise.
The input order of the green condition combine node actually matters. The output of the green "Depth Strength" has to go into the lower input.
The upper input of that node comes from CLIP positive with the pose.
The blue sampler section does nothing more than to produce a depth map which is then encoded to latent and used as latent input for the cyan colored output sampler.
For the green image scale, I would suggest always matching it to your original image size, with crop DISABLED.
The DEPTH STRENGTH setting can change the final image quite a bit, and you will lose weight of the original positive prompt if it's too high.
You can start as low as 0 in some cases, but if a background appears you will want to increase it, even up to a strength of 1 (lower is better).
If you haven't already I suggest you download and install
Fannovel16's preprocessors, found here
https://github.com/Fannovel16/comfy_controlnet_preprocessors
The seed node and the Sampler with seed input you can download here
https://civitai.com/api/download/models/25829
The openpose and depth models are found here
https://civitai.com/models/9251/controlnet-pre-trained-models
You could also try using WAS's depth preprocessor, but I found it creates a depth map that is too detailed, or doesn't have the threshold that is useful for this.
The model I am using you can find here
Hey!
I'm TheAlly! You might have seen my content around here - I produce and host a diverse range of stuff to help boost your image creation capabilities. I've released some of the most popular content on Civitai, and am constantly pushing the boundaries with experimental and unusual projects.
Me!
This guide is aimed at the complete beginner - someone who is possibly computer-savvy, with an interest in AI art, but doesn’t know where to look to get started, or is overwhelmed by the jargon and huge number of conflicting sources.
This guide is not going to cover exactly how to start making images - but it will give you an overview of some key points you need to know, or consider, plus information to help you take the first steps of your AI art journey.
So what is “Generative AI”, and how does Stable Diffusion fit into it? You might have heard the term Generative AI in the media - it’s huge right now; it’s on the news, it’s on the app-stores, Elon Musk is Tweeting about it - it’s beginning to pervade our lives.
Generative AI refers to the use of machine learning algorithms to generate new data that is similar to the data fed into it. This technology has been used in a variety of applications, including art, music, and text generation. The goal of generative AI is to allow machines to create something new and unique, rather than simply replicating existing data.
Stable Diffusion is one example of generative AI that has gained popularity in the art world, allowing artists to create unique and complex art pieces by entering text “prompts”.
GPT-3/4 (Chat GPT) is another example of generative AI - a language model that can generate human-like text. It is capable of completing sentences, paragraphs, and even entire articles, given a short prompt. This technology is being used in a variety of applications, including chatbots, content creation, and even computer programming. I used it to write this paragraph in ~1 second.
This guide will specifically cover Stable Diffusion, but will touch on other Generative AI art services.
In mid-2022, the art world was taken by storm with the launch of several AI-powered art services, including Midjourney, Dall-E, and Stable Diffusion. These services and tools utilize cutting-edge machine learning technology to create unique and innovative art that challenge traditional forms and blur the lines between human and machine creation.
The impact of AI art on the industry has already been significant. Many artists and enthusiasts are exploring the possibilities of this new medium, while many fear the repercussions for established artists' careers. Many art portfolio websites have developed new policies that prohibit the display of AI-generated work. Some websites require artists to disclose if their work was created using AI, and others have even implemented software that can detect AI-generated art.
There are many big-players in the AI art world - here are a few names you'll often see mentioned;
OpenAI - A research laboratory with both for-profit and non-profit subsidiaries, focusing on the development of AI in an open and responsible manner. Founded by technology investors (including Peter Thiel and Elon Musk) in 2015, OpenAI has created some highly advanced generative AI models, such as GPT-3, and the recently announced GPT-4, which are highly regarded for their language processing and generation abilities.
Stability AI - The world’s leading open source generative AI company - the brainchild of CEO Emad Mostaque, Stability AI is a technology start-up, focused on open source releases of tools, models, and resources. Stability AI is behind the 2022 releases of the Stable Diffusion, and Stable Diffusion 2.0 text-to-image models.
RunwayML - One of the companies behind Stable Diffusion, RunwayML now provide a platform for artists to use machine learning tools in intuitive ways without any coding experience.
There are already a number of lawsuits challenging various aspects of the technology. Microsoft, GitHub and OpenAI are currently facing a class-action lawsuit, while Midjourney and Stability AI are facing a lawsuit alleging they infringed upon the rights of artists in the creation of their products.
Whatever the outcome, Generative AI is here to stay.
That is an incredibly complex topic, and we’ll just touch on it very briefly here at a very very high level;
(Forward) Diffusion is the process of slowly adding random pixels (noise) to an image until it no longer resembles the original image and is 100% noise - we've diffused, or diluted, the original image. By reversing that process, we can reproduce something similar to the original image. There is obviously a lot more going on in the process, but that's the general idea. We input text, the "model" processes that text, generates an image from the "diffused" noise, and displays an appropriate output image.
Simple! (because that's not really what's happening, don't @ me - I know)
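If you're curious, here's a toy sketch of the "adding noise" idea in Python; real diffusion models use a carefully designed noise schedule and a neural network to reverse it, so treat this purely as intuition:
```
# Toy forward diffusion: blend an image with more and more random noise.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))       # stand-in for a real image, values in [0, 1]
noise = rng.normal(size=image.shape)  # pure random noise

for t in [0.0, 0.25, 0.5, 0.75, 1.0]:     # t = how far along the diffusion we are
    noised = (1 - t) * image + t * noise  # at t = 1 the original image is gone
    print(f"t={t:.2f}  signal remaining: {1 - t:.0%}")
```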
There are a number of tools to generate AI art images, some more involved and complex to set up than others. The easiest method is to use a web-based image generation service, where the code and hardware requirements are taken care of for you but there’s often a fee involved.
Alternatively, if you have the required hardware (ideally an NVIDIA graphics card), you can create images locally, on your own PC, with no restriction, using Stable Diffusion.
When we talk about Stable Diffusion, we’re talking about the underlying mathematical/neural network framework which actually generates the images. We need some way to interface with that framework in a user-friendly way - that’s where the following tools come in;
This guide is extremely high level and won’t get into the deep technical aspects of installing (or using) any of these applications (I will be posting an extremely in-depth guide at a later date), but if you’d like to run Stable Diffusion on your own PC there are options!
Note that to get the most out of any local installation of Stable Diffusion you need an NVIDIA graphics card. Images can be generated using your computer’s CPU alone, or on some AMD graphics cards, but the time it will take to generate a single image will be considerable.
Automatic1111’s WebUI (Complexity factor ⭐⭐⭐⭐/5) - WebUI is the most commonly used Interface for Stable Diffusion. It is moderately complex, and has a wide range of plugins and extensions to extend the experience. There’s a great deal of community support available if you have problems.
ComfyUI (Complexity factor ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐/5) - ComfyUI is relatively new to the scene, and provides an exceedingly complex workflow/node based workspace which requires in-depth knowledge of the Stable Diffusion image generation process to make work. Definitely not a beginner interface, but extremely powerful for the experienced user.
Cmdr2’s Easy Diffusion (Complexity factor ⭐⭐/5) - A great option for those starting out with a local install. Easy Diffusion has a 1-click installer for Windows, and a popular Discord server full of extremely knowledgeable people to help you get up and running. The interface itself is limited in what it can do, compared to the other Interfaces, but it remains the easiest way to get started making your own images, locally.
InvokeAI (Complexity factor ⭐⭐⭐/5) - A popular open-source text-to-image and image-to-image interface with powerful tools, not yet as full featured as Automatic1111’s WebUI, but getting close.
Mac owners can run Automatic1111’s WebUI, InvokeAI, and also a popular, lightweight, and super simple to use Interface, DiffusionBee;
DiffusionBee (Complexity factor ⭐/5) - DiffusionBee is an extremely lightweight MacOS interface for Stable Diffusion. It allows for basic image generation, but has a very small feature-set, to keep it as simple as possible.
Draw Things App - (Complexity factor ?/5) - Draw Things is a popular and highly rated MacOS App. I don't know much about it, but from anecdotal evidence it seems to have some good features!
There are many websites appearing which allow you to create Stable Diffusion images if you don’t want the fuss of setting up an interface on your local PC, or if your computer hardware can’t support one of the above interfaces.
Prodia - Prodia is an easy to use interface for Stable Diffusion, with access to a few popular models. Images can be generated here for free without a cap on the number, but advanced features require a paid subscription.
Mage.space - Mage.space is a fully featured interface with a host of advanced settings. Images can be generated for free (with an account), but more in-depth control requires a paid subscription.
Nightcafe - Nightcafe Studio is a popular AI art generator with a large community of followers, offering a range of options for free, or for earnable credits.
Dall-E 2 - One of the first image generator tools, now overtaken a little in terms of functionality and image quality. Users get 15 free generation credits per month.
Midjourney - Not technically a Stable Diffusion implementation - slightly different technology, doing the same thing! Midjourney produces extremely distinctive images and has a huge following.
An example of Midjourney generated artworks.
Checkpoints, also known as “weights” or “models” are part of the brains which produce our images. Each model can produce a different style of image, or a particular theme or subject. Some are “multi-use” and can produce a mix of portrait, realistic, and anime (for example), and others are more focused, only reproducing one particular style of subject.
Models come in two file types. It’s important to know the distinction if running a local Stable Diffusion interface, as there are security implications.
Pickletensor (.ckpt extension) models may contain and execute malicious code when downloaded and used. Many websites, including Civitai, have "pickle scanners" which attempt to scan for malicious content. However, it's safer to download Safetensor (.safetensors) models when available. This file type cannot contain any malicious code and is inherently safe to download.
Note that if using a Generation Service you will only be able to use the models they provide. Some services provide access to some of the most popular models while others use their own custom models. It depends on the service.
Along with models there are many other files which can extend and enhance the images generated by the models, including LoRA, Textual Inversion, and Hypernetworks. We’ll look at those in a more in-depth guide.
Most Stable Diffusion interfaces come with the default Stable Diffusion models, SD1.4 and/or SD1.5, and possibly SD2.0 or SD2.1. These are the Stable Diffusion models from which most other custom models are derived, and they can produce good images with the right prompting.
Custom models can be downloaded from the two main model-repositories;
Civitai - You are here! Civitai is the leading model repository for Stable Diffusion checkpoints, and other related tools. There are tens of thousands of models to choose from, across many categories; something for everyone!
Huggingface Model Hub - Huggingface has a wide variety of txt2img models, but finding models you’d like to try is often a challenge, as the interface is not the most user friendly for browsing.
Generative AI is a huge field, with many applications. Some of the most popular and interesting tools right now are;
ChatGPT - Mentioned above, ChatGPT is what's known as an LLM (Large Language Model), designed to provide conversational responses to input text, understand and answer questions, provide recommendations, generate content, and more. It can solve problems, write code - it's extremely useful, and free (with limitations). The first local ChatGPT-like LLMs are now appearing, and I will post a tutorial on my Patreon soon, covering their use.
Riffusion - Riffusion generates music from text prompts, rather than images! You can ask for your favorite style - or instrument - or ambient sounds, in any combination or beat, and get some really wonderful outputs. You can run Riffusion from the website, or alternatively, there is a way to run it locally from the Automatic1111 WebUI interface.
The Definitive Stable Diffusion Glossary (which needs to be updated, like, yesterday). Volunteers?
I run a popular Patreon site with lots of in-depth material - patreon.com/theally
Primarily, tutorials! Text-based, extremely in-depth, with lots of illustrative pictures and easy to understand language. There are also a range of files - scraped data sets, data set prep scripts, embeddings and LoRAs I'm too embarrassed to release on Civitai, that sort of thing.
I have tutorials covering;
LoRA Creation with Kohya_SS
ControlNet and 3D OpenPose
Making 5 minute "no-train" Embeddings
ComfyUI introduction
DepthMap walkthrough
And a bunch more. Some of the content currently in development includes;
Absolute Beginner's Guide to Generative Art, which you're reading.
Civitai.com How-To: The Insider's Guide
A full overhaul of all the content, bringing it up to date with the latest developments - this is an ongoing process, as the tech changes and updates are released.
Have you ever paid for a Udemy course? Or paid for someone's help on Fiverr? The Generative AI space moves so quickly that it's easy to get overwhelmed, and sure, there're a lot of (conflicting) tutorials out there for free - but I'm consolidating, testing, and presenting my findings to you in a plain, comprehensible, way so you don't have to go wading through tons of sus info. They're timesavers.
Great! I look forward to interacting with you! It's over here - https://www.patreon.com/theally
The Loopback Scaler is an Automatic1111 Python script that enhances image resolution and quality using an iterative process. The code takes an input image and performs a series of image processing steps, including denoising, resizing, and applying various filters. The algorithm loops through these steps multiple times, with user-defined parameters controlling how the image evolves at each iteration. The result is an improved image, often with more detail, better color balance, and fewer artifacts than the original.
Note: This is a script that is only available on the Automatic1111 img2img tab.
Iterative enhancement: The script processes the input image in several loops, with each loop increasing the resolution and refining the image quality. The image result from one loop is then inserted as the input image for the next loop which continually builds on what has been created.
Denoise Change: The denoising strength can be adjusted for each loop, allowing users to strike a balance between preserving details and reducing artifacts.
Adaptive change: The script adjusts the amount of resolution increase per loop based on the average intensity of the input image. This helps to produce more natural-looking results.
Image filters: Users can apply various PIL Image Filters to the final image, including detail enhancement, blur, smooth, and contour filters.
Image adjustments: The script provides sliders to fine-tune the sharpness, brightness, color, and contrast of the final image.
Recommended settings for img2img processing are provided in the script, including resize mode, sampling method, width/height, CFG scale, denoising strength, and seed.
Please note that the performance of the Loopback Scaler depends on the gpu, input image, and user-defined parameters. Experimenting with different settings can help you achieve the desired results.
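For the curious, here is a rough conceptual sketch of the loop described above. It is not the script's actual code; run_img2img is a hypothetical stand-in for Automatic1111's img2img pass, and the numbers are made up:
```
# Conceptual loopback loop: grow the image a little, refine it with img2img,
# then apply PIL filters/enhancements, and repeat.
from PIL import Image, ImageEnhance, ImageFilter

def run_img2img(img: Image.Image, denoising_strength: float) -> Image.Image:
    return img  # stub: the real script hands the image to A1111's img2img here

def loopback_scale(img: Image.Image, loops: int = 4, grow: float = 1.15) -> Image.Image:
    denoise = 0.45
    for _ in range(loops):
        img = img.resize((int(img.width * grow), int(img.height * grow)), Image.LANCZOS)
        img = run_img2img(img, denoising_strength=denoise)   # diffusion refinement pass
        img = img.filter(ImageFilter.DETAIL)                 # optional PIL filter
        img = ImageEnhance.Sharpness(img).enhance(1.05)      # small sharpness touch-up
        denoise = max(0.1, denoise - 0.05)                   # lower denoise each loop
    return img
```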
Do NOT expect to recreate images with prompts using this method.
You can start from txt2img with a prompt. Generate your image and then send it over to img2img. When creating images for this process, shoot for lower resolution images (512x768, 340x512, etc)
ALWAYS have a prompt in your img2img tab when doing this process, unless you are interested in creating chaos :D. Your results will usually be poor, but you CAN put a different prompt in img2img than the one you created the source image with. Pretty interesting results come from this method.
When using models that require VAE keep the # of loops lower than normal because it will cause the image to fade each iteration. Luckily you can add Color and Sharpness back in with the PIL enhancements if you need.
Don't set your maximum Width/Height higher than what you can normally generate. This script is not an upscaler model and isn't intended to make giant images. It is intended to give you detailed quality images that you can send to an upscaler.
Once installed there is an Info panel at the bottom of the script interface to help you understand the settings and what they do.
Unzip the loopback_scaler.py script.
Move the script to the \stable-diffusion-webui\scripts folder.
Close the Automatic1111 webui console window.
Relaunch the webui by running the webui-user.bat file.
Open your web browser and navigate to the Automatic1111 page or refresh the page if it's already open.
In Automatic1111 navigate to your 'Extensions' tab
Click on the 'Install from URL' sub-tab
copy/paste https://github.com/Elldreth/loopback_scaler.git into the 'URL for extension's git repository' textbox
Click on the 'Install' button and wait for it to complete
Click on the 'Installed' sub-tab
Click the 'Apply and Restart UI' button
Even if you don't know where to start or don't have a powerful computer, I can guide you to making your first Lora and more!
In this guide we'll be using resources from my GitHub page. If you're new to Stable Diffusion I also have a full guide to generate your own images and learn useful tools.
I'm making this guide for the joy it brings me to share my hobbies and the work I put into them. I believe all information should be free for everyone, including image generation software. However I do not support you if you want to use AI to trick people, scam people, or break the law. I just do it for fun.
Also here's a page where I collect Hololive loras.
An internet connection. You can even do this from your phone if you want to (as long as you can prevent the tab from closing).
Knowledge about what Loras are and how to use them.
Patience. I'll try to explain these new concepts in an easy way. Just try to read carefully, use critical thinking, and don't give up if you encounter errors.
It has a reputation for being difficult. So many options and nobody explains what any of them do. Well, I've streamlined the process such that anyone can make their own Lora starting from nothing in under an hour. All while keeping some advanced settings you can use later on.
You could of course train a Lora in your own computer, granted that you have an Nvidia graphics card with 8 GB of VRAM or more. We won't be doing that in this guide though, we'll be using Google Colab, which lets you borrow Google's powerful computers and graphics cards for free for a few hours a day (some say it's 20 hours a week). You can also pay $10 to get up to 50 extra hours, but you don't have to. We'll also be using a little bit of Google Drive storage.
This guide focuses on anime, but it also works for photorealism. However I won't help you if you want to copy real people's faces without their consent.
As you may know, a Lora can be trained and used for:
A character or person
An artstyle
A pose or concept
etc
However there are also different types of Lora now:
LoRA: The classic. You can use it in your webui no problem.
LoCon: Has more learning layers, it is reportedly good at artstyles. You'll need the Lycoris extension for your webui to use them like a normal lora.
LoHa: Has more layers and new mathematical algorithms. Takes much longer to train but can learn complex things, such as styles and characters at the same time. I rarely recommend it. You'll need the Lycoris extension for your webui to use them like a normal lora.
This is the longest and most important part of making a Lora. A dataset is (for us) a collection of images and their descriptions, where each pair has the same filename (eg. "1.png" and "1.txt"), and they all have something in common which you want the AI to learn. The quality of your dataset is essential: You want your images to have at least 2 examples of: poses, angles, backgrounds, clothes, etc. If all your images are face close-ups for example, your Lora will have a hard time generating full body shots (but it's still possible!), unless you add a couple examples of those. As you add more variety, the concept will be better understood, allowing the AI to create new things that weren't in the training data. For example a character may then be generated in new poses and in different clothes. You can train a mediocre Lora with a bare minimum of 5 images, but I recommend 20 or more, and up to 1000.
As for the descriptions, for general images you want short and detailed sentences such as "full body photograph of a woman with blonde hair sitting on a chair". For anime you'll need to use booru tags (1girl, blonde hair, full body, on chair, etc.). Let me describe how tags work in your dataset: You need to be detailed, as the Lora will reference what's going on by using the base model you use for training. Anything you don't include in your tags will become part of your Lora. This is because the Lora absorbs details that can't be described easily with words, such as faces and accessories. Knowing this you can let those details be absorbed into an activation tag, which is a unique word or phrase that goes at the start of every text file, and which makes your Lora easy to prompt.
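As a purely illustrative example (the character name and tags are made up), a caption file such as "1.txt" with an activation tag at the start could look like this:
```
mycharacter, 1girl, solo, full body, sitting, on chair, white dress, smile, simple background
```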
You may gather your images online, and describe them manually. But fortunately, you can do most of this process automatically using my new 📊 dataset maker colab.
Here are the steps:
1️⃣ Setup: This will connect to your Google Drive. Choose a simple name for your project, then run the cell by clicking the floating play button to the left side. It will ask for permission, accept to continue the guide.
The folder structures are both fine, the old one organizes the files by type while the new one contains centralised project folders. Just make sure you use the same structure in the Lora trainer.
If you already have images to train with, upload them to your Google Drive's "lora_training/datasets/project_name" (old) or "Loras/project_name/dataset" (new) folder, and you may choose to skip step 2.
2️⃣ Scrape images from Gelbooru: In the case of anime, we will use the vast collection of available art to train our Lora. Gelbooru sorts images through thousands of booru tags describing everything about an image, which is also how we'll tag our images later. Follow the instructions on the colab for this step; basically, you want to request images that contain specific tags that represent your concept, character or style. When you run this cell it will show you the results and ask if you want to continue. Once you're satisfied, type yes and wait a minute for your images to download.
3️⃣ Curate your images: There are a lot of duplicate images on Gelbooru, so we'll be using the FiftyOne AI to detect them and mark them for deletion. This will take a couple minutes once you run this cell. They won't be deleted yet though: eventually an interactive area will appear below the cell, displaying all your images in a grid. Here you can select the ones you don't like and mark them for deletion too. Follow the instructions in the colab. It is beneficial to delete low quality or unrelated images that slipped their way in. When you're finished, send Enter in the text box above the interactive area to apply your changes.
4️⃣ Tag your images: We'll be using the WD 1.4 tagger AI to assign anime tags that describe your images, or the BLIP AI to create captions for photorealistic/other images. This takes a few minutes. I've found good results with a tagging threshold of 0.35 to 0.5. After running this cell it'll show you the most common tags in your dataset which will be useful for the next step.
5️⃣ Curate your tags: This step for anime tags is optional, but very useful. Here you can assign the activation tag (also called trigger word) for your Lora. If you're training a style, you probably don't want any activation tag so that the Lora is always in effect. If you're training a character, I myself tend to delete (prune) common tags that are intrinsic to the character, such as body features and hair/eye color. This causes them to get absorbed by the activation tag. Pruning makes prompting with your Lora easier, but also less flexible. Some people like to prune all clothing to have a single tag that defines a character outfit; I do not recommend this, as too much pruning will affect some details. A more flexible approach is to merge tags, for example if we have some redundant tags like "striped shirt, vertical stripes, vertical-striped shirt" we can replace all of them with just "striped shirt". You can run this step as many times as you want.
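If you prefer doing this outside the colab, here is a small hedged sketch of what pruning and merging amount to on disk; the folder name, activation tag, and tag lists are made-up examples:
```
# Prune intrinsic tags, merge redundant ones, and put the activation tag first
# in every caption .txt file of a (hypothetical) "dataset" folder.
from pathlib import Path

ACTIVATION_TAG = "mycharacter"                    # hypothetical trigger word
PRUNE = {"blonde hair", "blue eyes"}              # intrinsic traits to absorb
MERGE = {"vertical stripes": "striped shirt",
         "vertical-striped shirt": "striped shirt"}

for txt in Path("dataset").glob("*.txt"):
    tags = [t.strip() for t in txt.read_text().split(",") if t.strip()]
    tags = [MERGE.get(t, t) for t in tags if t not in PRUNE]
    ordered = list(dict.fromkeys([ACTIVATION_TAG] + tags))  # de-dupe, trigger word first
    txt.write_text(", ".join(ordered))
```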
6️⃣ Ready: Your dataset is stored in your Google Drive. You can do anything you want with it, but we'll be going straight to the second half of this tutorial to start training your Lora!
This is the tricky part. To train your Lora we'll use my ⭐ Lora trainer colab. It consists of a single cell with all the settings you need. Many of these settings don't need to be changed. However, this guide and the colab will explain what each of them do, such that you can play with them in the future.
Here are the settings:
▶️ Setup: Enter the same project name you used in the first half of the guide and it'll work automatically. Here you can also change the base model for training. There are 2 recommended default ones, but alternatively you can copy a direct download link to a custom model of your choice.
▶️ Files: Here are the settings that change how your dataset will be processed.
The folder structures are both fine, the old one organizes the files by type while the new one contains centralised project folders. Make sure your images are in the right place.
The resolution should stay at 512 this time, which is normal for Stable Diffusion. Increasing it makes training much slower, but it does help with finer details.
flip_aug is a trick to learn more evenly, as if you had more images, but makes the AI confuse left and right, so it's your choice.
keep_tokens is important, set it to 1 if you included an activation tag in the first half of the guide.
shuffle_tags should always stay active if you use anime tags, as it makes prompting more flexible and reduces bias.
activation_tags is important, set it to 1 if you added one during the dataset part of the guide. This is also called keep_tokens.
▶️ Steps: We need to pay attention here. There are 4 variables at play: your number of images, the number of repeats, the number of epochs, and the batch size.
These 4 values decide how long your Lora will take to train (for my method, it is usually 15 to 90 minutes, which is around 1000 to 6000 total steps). The default values are good if you have less than 50 images, and the colab gives some instructions to help you decide.
Too few steps will undercook the Lora and make it useless, and too many will overcook it and distort your images. This is why we choose to save the Lora every few epochs, so we can compare and decide later. For this reason, I recommend few repeats and many epochs.
There are many ways to train a Lora. The method I personally follow focuses on balancing the epochs, such that I can choose between 10 and 20 epochs depending on if I want a fast cook or a slow simmer (which is better for styles). Also, I have found that more images generally need more steps to stabilize. Thanks to the new min_snr_gamma option, Loras take less epochs to train. Here are some healthy values for you to try:
20 images × 10 repeats × 10 epochs ÷ 2 batch size = 1000 steps
100 images × 3 repeats × 10 epochs ÷ 2 batch size = 1500 steps
400 images × 1 repeat × 10 epochs ÷ 2 batch size = 2000 steps
1000 images × 1 repeat × 10 epochs ÷ 3 batch size = 3330 steps
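The step counts above come from a simple formula (images × repeats × epochs ÷ batch size); here is a tiny helper to check your own numbers:
```
# Total training steps for a given dataset/repeats/epochs/batch size combination.
def total_steps(images: int, repeats: int, epochs: int, batch_size: int) -> int:
    return images * repeats * epochs // batch_size

print(total_steps(20, 10, 10, 2))   # 1000
print(total_steps(100, 3, 10, 2))   # 1500
print(total_steps(1000, 1, 10, 3))  # 3333 (listed above as roughly 3330)
```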
▶️ Training: The most important settings. However, you don't need to change any of these your first time. In any case:
The unet learning rate dictates how fast your Lora will absorb information. Like with steps, if it's too small the Lora won't do anything, and if it's too large the Lora will deepfry every image you generate. There's a flexible range of working values, especially since you can change the intensity of the lora in prompts. Assuming you set dim between 8 and 32 (see below), I recommend 5e-4 unet for almost all situations. If you want a slow simmer, 1e-4 or 2e-4 will be better. Note that these are in scientific notation: 1e-4 = 0.0001
The text encoder learning rate is less important, especially for styles. It helps learn tags better, but they'll still be learned without it. It is generally accepted that it should be either half or a fifth of the unet; good values include 1e-4 or 5e-5. Use Google as a calculator if you find these small values confusing.
The scheduler guides the learning rate over time. This is not critical, but it still helps. I always use cosine with 3 restarts, which I personally feel keeps the Lora "fresh". Feel free to experiment with cosine, constant, and constant with warmup; you can't go wrong with those. There's also the warmup ratio, which helps the training start efficiently, and the default of 5% works well.
The dim/alpha mean the size and scaling of your Lora, and they are controversial: For months everyone taught each other that 128/128 was the best, and this is because of experiments wherein it resulted in the best details. However these experiments were flawed, as it was not known at the time that lowering the dim and alpha requires you to raise the learning rate to produce the same level of detail. This is unfortunate as these Lora files are 144 MB which is completely overkill. I personally use 16/8 which works great for characters and is only 18 MB. Nowadays the following values are recommended (although more experiments are welcome):
▶️ Lora Type: Here is where you choose the kind of Lora from the 3 I explained in the beginning. Personally I recommend you stick with LoRA for characters and LoCon Lycoris for styles. LoHas are hard to get right. This is also where you set the conv_dim and conv_alpha we just mentioned (which don't apply to LoRA); those are the size of the additional learning layers, which are each bigger than the last and so my recommended values all result in around 35 MB files. Finally you can choose to compress these additional layers but it might have a negative effect as well, we don't know yet.
▶️ Ready: Now you're ready to run this big cell which will train your Lora. It will take 5 minutes to boot up, after which it starts performing the training steps. In total it should be less than an hour, and it will put the results in your Google Drive.
You read that right. I lied! 😈 There are 3 parts to this guide.
When you finish your Lora you still have to test it to know if it's good. Go to your Google Drive inside the /lora_training/outputs/ folder, and download everything inside your project name's folder. Each of these is a different Lora saved at different epochs of your training. Each of them has a number like 01, 02, 03, etc.
Here's a simple workflow to find the optimal way to use your Lora:
Put your final Lora in your prompt with a weight of 0.7 or 1, and include some of the most common tags you saw during the tagging part of the guide. You should see a clear effect, hopefully similar to what you tried to train. Adjust your prompt until you're either satisfied or can't seem to get it any better.
Use the X/Y/Z plot to compare different epochs. This is a builtin feature in webui. Go to the bottom of the generation parameters and select the script. Put the Lora of the first epoch in your prompt (like "<lora:projectname-01:0.7>"), and on the script's X value write something like "-01, -02, -03", etc. These will perform replacements in your prompt, causing it to go through the different numbers of your lora so you can compare their quality. You can first compare every 2nd or every 5th epoch if you want to save time. You should ideally do batches of images to compare more fairly.
Once you've found your favorite epoch, try to find the best weight. Do an X/Y/Z plot again, this time with an X value like "0.5>, 0.6>, 0.7>, 0.8>, 0.9>, 1>". It will replace a small part of your prompt to go over different lora weights. Again it's better to compare in batches. You're looking for a weight that results in the best detail but without distorting the image. If you want you can do steps 2 and 3 together as X/Y, it'll take longer but be more thorough.
If you found results you liked, congratulations! Keep testing different situations, angles, clothes, etc, to see if your Lora can be creative and do things that weren't in the training data.
Finally, here are some things that might have gone wrong:
If your Lora doesn't do anything or very little, we call it "undercooked" and you probably had a unet learning rate too low or needed to train longer. Make sure you didn't just make a mistake when prompting.
If your Lora does work but it doesn't resemble what you wanted, again it might just be undercooked, or your dataset was low quality (images and/or tags). Some concepts are much harder to train, so you should seek assistance from the community if you feel lost.
If your Lora produces distorted images or artifacts, and earlier epochs don't help, or you even get a "nan" error, we call it "overcooked" and your learning rate or repeats were too high.
If your Lora is too strict in what it can do, we'll call it "overfit". Your dataset was probably too small or tagged poorly, or it's slightly overcooked.
If you got something usable, that's it, now upload it to Civitai for the world to see. Don't be shy. Cheers!
In this tutorial I would like to teach you how to get more consistent colors on your characters. Everything is based on this extension: hako-mikan/sd-webui-regional-prompter: set prompt to divided region (github.com)
Previously I did another tutorial to achieve a similar result: No more color contamination - Read Description | Stable Diffusion Other | Civitai
In positive prompt we put without quotes:
"blue hair twintail BREAK
yellow blouse BREAK
orange skirt"
In the negative prompt we must place one or more negative tokens; if we do not put at least a single negative token, Stable Diffusion will bug out:
"worst quality, low quality"
For the resolution I will use 572 x 768, and in Regional Prompter I will set "Divide mode" to vertical. If I choose 768 x 572 instead, then I must use horizontal and not vertical.
In "Divide ratio" I will put 1,1,1. This will divide our image into 3 equal parts. Below I place an image to better illustrate what happens.
In short, imagine that our image is 100%: if we put 1,1,1, it is divided into 33%, 33%, 33%. If we put 1,1, it is 50%, 50%. I have not tested the proportions much.
For this step we should have our Regional Prompter set up this way:
My result; if yours doesn't look right, I'm leaving a screenshot of the configuration I used at generation time: https://prnt.sc/q395bQl_y9z7
If checked, this extension is enabled.
Prompts for different areas are separated by "BREAK". Enter prompts from the left for horizontal prompts and from the top for vertical prompts. Negative prompts can also be set for each area by separating them with BREAK, but if BREAK is not entered, the same negative prompt will be set for all areas. Prompts delimited by BREAK should not exceed 75 tokens. If the number is exceeded, it will be treated as a separate area and will not work properly.
Check this if you want to use the base prompt, which is the same prompt for all areas. Use this option if you want the prompt to be consistent across all areas. When using the base prompt, the first prompt separated by BREAK is treated as the base prompt. Therefore, when this option is enabled, one more BREAK-separated prompt is required than the number of Divide ratios.
Sets the ratio of the base prompt; if 0.2 is set, the base ratio is 0.2. It can also be specified for each region, and can be entered as 0.2, 0.3, 0.5, etc. If a single value is entered, the same value is applied to all areas.
If you enter 1,1,1, the area will be divided into three parts (33.3%, 33.3%, 33.3%); if you enter 3,1,1, the area will be divided into 60%, 20%, and 20%. Decimal points can also be entered: 0.1,0.1,0.1 is equivalent to 1,1,1.
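If the ratio-to-percentage conversion is unclear, this small helper shows how a "Divide ratio" string maps to shares of the image:
```
# Convert a Regional Prompter "Divide ratio" string into percentages of the image.
def divide_ratio_to_percentages(ratio: str) -> list:
    parts = [float(x) for x in ratio.split(",") if x.strip()]
    total = sum(parts)
    return [round(100 * p / total, 1) for p in parts]

print(divide_ratio_to_percentages("1,1,1"))  # [33.3, 33.3, 33.3]
print(divide_ratio_to_percentages("3,1,1"))  # [60.0, 20.0, 20.0]
```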
Specifies the direction of division. Horizontal and vertical directions can be specified.
Updated 21.3.:
Support for multiple input files added
Extended sample range to 10,000 by default
A tool that helps with selecting a random number of prompts from a file that contains prompts. I use it when testing the different prompt packages I upload. I take a big enough sample to generate a few images, remove and fix obviously malformed prompts, rinse and repeat.
pip install gradio
gradio guitoolkit.py
or use python guitoolkit.py
Download this file / copy the code below into a file called guitoolkit.py (or whatever you want to call it)
Make/use a virtual environment python -m venv venv
Activate environment venv\Scripts\activate
Run the command pip install gradio to install the gradio library, which is required to use this.
When you have installed that, run either gradio guitoolkit.py or python guitoolkit.py.
You should now have the tool ready to use if you get the following:
gradio .\guitoolkit.py
launching in reload mode on: http://127.0.0.1:7861 (Press CTRL+C to quit)
You can now visit http://127.0.0.1:7861, where the tool is ready to use.
Input the file(s) you want to shuffle, select how many you want, copy the output, insert it into e.g. Automatic1111
import random

import gradio as gr


def shuffle_file(file_obj, no_prompts):
    """Collect unique prompt lines from the uploaded files and return a random sample."""
    prompts = []
    for file in file_obj:
        with open(file.name) as infile:
            in_prompts = infile.readlines()
        prompts.extend(list(set(in_prompts)))          # de-duplicate lines per file
    prompts = random.sample(prompts, int(no_prompts))  # slider value may be a float
    random.shuffle(prompts)
    return "".join(prompts)


demo = gr.Interface(
    fn=shuffle_file,
    inputs=["files", gr.Slider(5, 10000)],
    outputs=["code"],
)

if __name__ == "__main__":
    demo.launch(server_port=9800)
Windows Defender is reporting very common anime based VAE files to be malware and is automatically deleting them. This VAE file is a pruned version of that file using the A1111 ToolKit extension, and in testing it works the same. It will not trigger detection and has been scanned by the premium antivirus software SpyHunter 5 and found to be malware-free.
Sample images were made with the same seed, prompt, and model, but switching between the original VAE file and my version. I have also included a simple difference map using layering functions in The GIMP image editing software, and a screenshot of the alert I received from Windows Defender.
This extension provides a simple and easy-to-use way to denoise images using the cv2 bilateral filter and guided filter. Original script by: https://github.com/lllyasviel/AdverseCleaner
Installation
Go to Extensions > Install from URL and paste the following URL:
https://github.com/gogodr/AdverseCleanerExtension
Or unzip this file manually in your extensions folder.
Get in GitHub: https://github.com/kanjiisme/anything-model-batch-downloader
Anything Model Batch Downloader allows you to batch download models from Civitai and Hugging Face easily, just through the model URL.
Anything Model Batch Downloader is designed to run on cloud systems like Google Colab and Amazon SageMaker.
The download is driven by a JSON file.
The arguments system allows you to add download conditions to the downloader.
Anything Model Batch Downloader is written as modules, allowing you to use the source code in a simpler way.
{
    "urls": [
        {
            "model_url": "https://civitai.com/models/2583/grapefruit-hentai-model"
        },
        {
            "model_url": "https://civitai.com/models/11367/tifameenow",
            "args": "sub"
        },
        {
            "model_url": "https://civitai.com/api/download/models/12477",
            "args": "raw=\"arknights-suzuran.safetensors\" type=\"lora\" sub forcerewrite"
        },
        {
            "model_url": "https://civitai.com/models/4514/pure-eros-face",
            "args": "sub saveto=\"nsfw\""
        }
    ]
}
In there:
model_url is the model link (or a download link if using the raw argument).
args are the conditions required for the download.
python batch_download.py
Or if you have a custom JSON file:
python batch_download.py --listpath="your/path/to/json"
See it here.
These are workspaces to load into ComfyUI for various tasks such as HR-Fix with AI model upscaling.
HR-Fix Bloom Workspace depends on Filters Suite V3 and NSP CLIPTextEncode nodes from here: https://civitai.com/models/20793/was-node-suites-comfyui
Extract "ComfyUI-HR-Fix_workspace.json" (or whatever the workspace is called)
Load workspace with the "Load" button in the right-hand menu and select "ComfyUI-HR-Fix_workspace.json"
Select your desired diffusion model
Select a VAE model or use the diffusion model's VAE
Select your desired upscale model
Change the prompt and sampling settings as you see fit.
(currently v1 set to 512x768 x4= 2048x3072, v2 has a resize so final size is 1024x1536)
ComfyUI is a super powerful node-based, modular, interface for Stable Diffusion. I have a brief overview of what it is and does here. And full tutorial on my Patreon, updated frequently.
Please consider joining my Patreon! Advanced SD tutorials, settings explanations, adult-art, from a female content creator (me!) patreon.com/theally
ComfyUI is a super powerful node-based, modular, interface for Stable Diffusion. I have a brief overview of what it is and does here. And full tutorial content coming soon on my Patreon.
In this model card I will be posting some of the custom Nodes I create. Let me know if you have any ideas, or if there's any feature you'd specifically like to see added as a Node!
Please consider joining my Patreon! Advanced SD tutorials, settings explanations, adult-art, from a female content creator (me!) patreon.com/theally
This is my complete guide on how to generate sprites for 8-bit games or GIFs :) Enjoy the video
Use it with my toolkit to get similar results to the ones in the video: https://civitai.com/models/4118
or any other model that you like :)
A few other useful links:
My Artstation: https://www.artstation.com/spybg
My official Discord channel: https://discord.io/spybgtoolkit
Patreon: https://www.patreon.com/SPYBGToolkit
Do not download the LoRA (NOT NECESSARY)
This is a simple and powerful tutorial. I uploaded a LoRA file only because it was mandatory to upload something; it has nothing to do with the tutorial. Tribute and credit to hnmr293.
Tips:
0# Give priority to colors: put them first and then everything else (1girl, masterpiece...), but without going overboard; remember tip #3.
1# The last token of Target Token must end with "," like this: white, green, red, blue, yellow, pink, 👈 ATTENTION: For some people putting a comma at the end of the token works, for others it gives an error. If you see that it throws an error, delete it.
2# The color should always come before the clothes. Not knowing much English, it happened to me that I put the colors after the clothes or the eyes, and the changes were not applied.
3# Do not go over 75 tokens. It becomes a problem if you go to 150 or 200 tokens.
4# If you don't put any negative prompt, it can give an error.
5# Do not use token weights below 1, e.g. (red hoodie:0.5)
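Putting the tips together, a hypothetical setup (the colors and clothes are only an example) could look like this:
```
Target Token: white, green, red,
Positive prompt: white blouse, green skirt, red shoes, 1girl, masterpiece
Negative prompt: worst quality, low quality
```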
I always worked with 20 images, and in most of the tests it was 100%. If you put, for example, green pants, some jean pants (blue) can appear; likewise with skirts, a black skirt can appear. These "mistakes" can happen.
That's why I put 95% in the title: 1 or 2 images out of 20 may appear with this error.
This VAE makes all colors lively, and it's good for models that create a sort of mist over the picture; it works well with the kotosabbysphoto model, which sometimes creates mist and blends colors. I dropped it here because it's faster to download if you use Stable Diffusion on Hugging Face, so you don't have to upload the file to Google Colab and wait longer than you have to :D
Stable Diffusion = 2 GB, trained on 5B images.
LoRA = 128 MB, trained on 10/100/300?????
This image, for example, was trained at dim 1, alpha 1 - yes, 1 MB of file size.
And it was trained with only 3 images.
a portrait of a girl on red kimono, underwater, bubbles
And this too; the style is identical, and it changes with the prompt.
a portrait of a girl
a portrait of elon musk
unet_lr: 2e-3
network_train_on: unet_only [ for styles ]
100 repeats, 5 epochs, because it uses a low number of images.
//////////////// New training setup
My new training recipe is 1e-3, unet only, dim and alpha 1.
cosine with restart / 12 cycles.
10 repeats / 20 epochs.
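As a hedged sketch only (not my exact command), the recipe above could map to kohya-ss flags roughly like this; paths are placeholders and the 10 repeats would come from the dataset folder name (e.g. "10_style"):
```
# Hypothetical kohya-ss launch for the style recipe: unet only, dim/alpha 1,
# lr 1e-3, cosine with restarts (12 cycles), 20 epochs. Paths are placeholders.
import subprocess

args = [
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path", "models/base-model.safetensors",
    "--train_data_dir", "dataset/",            # contains e.g. a "10_style" image folder
    "--output_dir", "output/",
    "--network_module", "networks.lora",
    "--network_train_unet_only",
    "--unet_lr", "1e-3",
    "--network_dim", "1", "--network_alpha", "1",
    "--lr_scheduler", "cosine_with_restarts",
    "--lr_scheduler_num_cycles", "12",
    "--max_train_epochs", "20",
]
# subprocess.run(args, check=True)  # uncomment to actually start training
```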
⚠️ It was trained on an anime VAE, so it needs an anime VAE or it will look fried ⚠️
clip 2, VAE on, hypernetwork strength 1.
1-Install Monkeypatch Extension and reload the ui
https://github.com/aria1th/Hypernetwork-MonkeyPatch-Extension
2-Go to create Beta hypernetwork in your train section.
3-Use this layer structure: 1,0.1,0.1,1 // thanks queria! I personally like this one a lot.
4-Select activation function of hypernetwork:tanh
5-Select Layer weights initialization:xavier normal
6-and finally, create the hypernetwork.
7-now in Train_Gamma, select your new hypernetwork.
8-Hypernetwork learning rate: 6.5e-3 ("this is for the math", so it's perfectly normal); 6.5e-4 will cause less damage to the original image.
9-enable Show advanced learn rate scheduler options(for Hypernetworks) and Uses CosineAnnealingWarmupRestarts Scheduler.
10-Steps for cycle = number of images in your dataset.
11-Step multiplier per cycle: 1.1 or 1.2
12-Warmup step per cycle = half the number of images.
13-Minimum learning rate for beta scheduler = 1e-5 [ or 6.5e-7 , will get less style from dataset, but more control ]
14-Decays learning rate every cycle = 0.9 or 1
15a-Batch size 2, gradient accumulation 1, 1000 steps.
15b-You can also do this: batch size 2, gradient accumulation = (number of images in the dataset divided by two), but then you only need something like 250 steps; personally I don't like it.
16-Your prompt file needs to be style.txt.
17- you can also try to "Read parameters (prompt, etc...) from txt2img tab when making previews" to see results with the style in your prompt, for example, mine is "girl in a red kimono".
Note: I train with clip skip 2, hypernetwork set to None, and hypernetwork strength 1.
18-And that's it! A 5 MB hypernetwork trained in under 10-20 minutes.
ComfyUI Extension Nodes for Automated Text Generation.