🧵 Untitled Thread

Anonymous No. 858763

I've been working on a 3D VTuber with neural TTS for a few weeks. I'm not satisfied with the lip sync. Does anyone have experience with an easy workflow? What other software can I try? Any other criticism? It's quite fun to just write a script and have it all rendered out.

https://www.youtube.com/watch?v=8SB_GD1l6QU

Anonymous No. 858777

>>858763
I take it that's your video? Haven't watched it all the way through because I'm on my phone, but you might be onto something; the format looks like it could work one way or another. You didn't ask, I know. You could try NVIDIA's Audio2Face (A2F):
https://www.nvidia.com/en-us/omniverse/apps/audio2face/

Anonymous No. 858781

>>858763
>the lipsync is fine, most vtubers have abysmal lipsync but this is pretty good.
>the character design is interesting but it doesn't look like it would work in any other lighting conditions or situations
>no one is watching some fucking text to speech video for 20 minutes

Anonymous No. 858784

>>858781
>>the lipsync is fine, most vtubers have abysmal lipsync but this is pretty good.
It skips some words, like "Tesla" in the intro.

>>858777
I'll try it, looks promising.

>the character design is interesting but it doesn't look like it would work in any other lighting conditions or situations
I'm making a studio for him; it will be fun to test.

>no one is watching some fucking text to speech video for 20 minutes
The dream is to just write a script, have it translated into other languages that are big on YouTube, then render out videos for each language. I think it has promise, considering deepvoice software keeps getting better every month. If I had better GPUs, I could get a much better voice already.

Anonymous No. 859428

>>858784
Good vision, anon, keep it up.

Anonymous No. 859630

>>859428
Ty. Still very unsure whether I should continue with the neural voice or hire a voice actor... Average view time on YouTube is 4m31s. Not good. I think the editing and voice should at least be faster, and the overall theme less "dark". [/blog]

Anonymous No. 859631

>>859630
I probably boosted that the other day; I watched it from start to end. Don't let the average view time get to you either; it's a bad measure of viewership. Most of my animations run between 1 and 5 minutes, but average view time is somewhere around 50-60 seconds. It hurts to think of people dipping when things are heating up in my vid, but it has more to do with the people who zip off after 2 seconds (for whatever reason; I often do the same thing) bringing the whole average down.
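To make the "early drop-offs drag the average down" point concrete, here's a minimal Python sketch with made-up watch times for a hypothetical 300 s video (the numbers and the 10 s "bounce" cutoff are assumptions, not anyone's real stats). Three viewers who leave after 2 s pull the mean well below what the people who actually stayed watched.

```python
from statistics import mean, median

# Hypothetical watch times in seconds for a 300 s video (made-up numbers).
watch_times = [2, 2, 2, 180, 210, 240, 260, 280, 300, 300]

# Assumed cutoff: anyone gone within 10 s counts as a "bounce".
BOUNCE_CUTOFF_S = 10
stayed = [t for t in watch_times if t > BOUNCE_CUTOFF_S]

print(f"mean, all viewers:       {mean(watch_times):.1f} s")  # 177.6 s
print(f"median, all viewers:     {median(watch_times):.1f} s")  # 225.0 s
print(f"mean, excluding bounces: {mean(stayed):.1f} s")  # 252.9 s
```

The median is less sensitive to a few very short views than the mean, which is why a single "average view time" number can look worse than how the video plays for people who stick around.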

Meant to reply with encouragement, but I was too tired. I think the voice sounds good; the tech has advanced pretty far, yeah? Don't hire a voice actor, just move forward and you should find success. If the opportunity for a VO comes your way, maybe jump on the chance. But don't let it stop you from making more viddys.

Anonymous No. 859641

>>859631
Link to your channel? Machine-learning-based TTS is much better than the regular kind from a few years ago. A guy made one based on Jordan Peterson's e-book and it sounds nearly perfect; too bad it requires A LOT of computational power and storage. The video and image in the OP are upscaled using Topaz AI, btw. They were originally rendered at much lower quality. Magic.

Ty. I'll continue with the voice, just slightly faster, and tone down the effects a little. Making one on Howard Stern.

Anonymous No. 859735

>>859641
Cool, I look forward to it.

Also, in my last reply I meant to say:
1-minute vid: avg. view time 25 seconds
5-minute vid: avg. view time 45-60 seconds
Hope that makes more sense. I don't look into those stats very deeply, but I like checking the geography of viewership sometimes. But also, YouTube is my chill zone; those stats would mean more to somebody looking to grow/extend their reach.
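Dividing average view time by video length puts those two figures on the same scale, i.e. the fraction of the video an average viewer sees (simple arithmetic on the numbers quoted above, nothing more):

```latex
r = \frac{\bar{t}_{\mathrm{view}}}{T_{\mathrm{video}}}, \qquad
r_{1\,\mathrm{min}} = \frac{25\ \mathrm{s}}{60\ \mathrm{s}} \approx 0.42, \qquad
r_{5\,\mathrm{min}} \in \left[\frac{45\ \mathrm{s}}{300\ \mathrm{s}},\ \frac{60\ \mathrm{s}}{300\ \mathrm{s}}\right] = [0.15,\ 0.20]
```

So on these numbers the longer video gets more absolute watch time per viewer but a smaller fraction of it watched on average.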

https://www.youtube.com/c/world4jack

Anonymous No. 859747

>>859735
Ah, I've talked to you before. PS1 graphics guy.

https://www.youtube.com/watch?v=BbcW0aCSKiA
This is very, very good. You should be making music videos for Oneohtrix Point Never.

Anonymous No. 859934

>>859630
The neural voice makes it better.
You have to experiment with it.
The goal should be to give the program a text with sections in different colors for different emotions, and it does the rest by itself. Then you link shape keys to the text for the visuals.
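A minimal sketch of the "link shape keys to the text" idea, assuming the script is already split into (emotion, text) segments. The emotion names, the shape key names (`smile`, `brow_down`, etc.) and the fixed words-per-second timing are all hypothetical placeholders; in a real setup the segment start times would come from the TTS output rather than a word-count estimate.

```python
from dataclasses import dataclass

# Hypothetical mapping from emotion tag to shape key weights (0.0 to 1.0).
EMOTION_SHAPEKEYS = {
    "neutral": {},
    "happy": {"smile": 0.8, "brow_up": 0.3},
    "angry": {"brow_down": 0.9, "frown": 0.6},
    "sad": {"frown": 0.5, "brow_inner_up": 0.7},
}


@dataclass
class Keyframe:
    time_s: float
    shape_key: str
    weight: float


def keyframes_for_script(segments, words_per_second=2.5):
    """Turn (emotion, text) segments into shape key keyframes.

    At the start of each segment, every known shape key is keyed: keys used
    by that segment's emotion get their weight, all others are reset to 0.
    Segment length is estimated from word count (an assumption).
    An unknown emotion tag raises KeyError.
    """
    all_keys = sorted({k for weights in EMOTION_SHAPEKEYS.values() for k in weights})
    keyframes = []
    t = 0.0
    for emotion, text in segments:
        weights = EMOTION_SHAPEKEYS[emotion]
        for key in all_keys:
            keyframes.append(Keyframe(t, key, weights.get(key, 0.0)))
        t += len(text.split()) / words_per_second
    return keyframes


if __name__ == "__main__":
    script = [
        ("neutral", "Welcome back to the channel."),
        ("angry", "Today we talk about Tesla."),
    ]
    for kf in keyframes_for_script(script):
        print(kf)
```

In Blender, keyframes like these could then be written onto the mesh's shape key `value` channels; the mapping table is where the per-emotion "look" would be tuned.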

Anonymous No. 859938

>>859934
I just tried this demo from Azure.
https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#features

It has SSML (Speech Synthesis Markup Language), a markup language for controlling speech; Azure's version can tag sections with emotional speaking styles like you are describing. I can export the WAV with emotion markers for the animation.
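A rough sketch of generating that kind of SSML from (style, text) segments. It uses the standard SSML `<bookmark>` element as the "emotion marker" at the start of each section and Azure's `<mstts:express-as>` extension for the speaking style. The voice name `en-US-JennyNeural` and the style names are assumptions; which styles a given voice supports varies, so check Azure's voice list. Nothing here calls the Azure SDK; it only builds and reads back the markup.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(segments, voice="en-US-JennyNeural", lang="en-US"):
    """Build one SSML document from (style, text) segments.

    Each segment gets a <bookmark> (so the animation side can find where it
    starts) followed by an <mstts:express-as> element setting its style.
    """
    parts = [
        f'<speak version="1.0" xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" '
        f"xml:lang={quoteattr(lang)}>",
        f"<voice name={quoteattr(voice)}>",
    ]
    for i, (style, text) in enumerate(segments):
        mark = f"seg{i}_{style}"  # hypothetical naming scheme for markers
        parts.append(f"<bookmark mark={quoteattr(mark)}/>")
        parts.append(
            f"<mstts:express-as style={quoteattr(style)}>"
            f"{escape(text)}</mstts:express-as>"
        )
    parts.append("</voice></speak>")
    return "".join(parts)


def bookmark_names(ssml):
    """Return bookmark names in document order (handy for keying animation)."""
    root = ET.fromstring(ssml)
    return [b.get("mark") for b in root.iter(f"{{{SSML_NS}}}bookmark")]


if __name__ == "__main__":
    doc = build_ssml([
        ("cheerful", "Welcome back!"),
        ("sad", "Unfortunately, the stock crashed & burned."),
    ])
    print(doc)
    print(bookmark_names(doc))  # ['seg0_cheerful', 'seg1_sad']
```

Text is escaped so characters like `&` don't break the XML. When synthesizing, the service can report when each bookmark is reached, which is one way to line emotion changes up with the exported WAV.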

I also think I should try a female avatar for limbic appeal, maybe with a voice and soundbed similar to this:
https://vimeo.com/75534042