Kling AI

Video

Generate cinematic AI videos from text or images with start and end frames, lip sync, sound effects, and creative video tools.

AI video, image to video, text to video, lip sync, sound effects


Overview

Kling AI is an AI video generation platform for turning prompts and images into short cinematic clips. It supports text-to-video, image-to-video, start and end frame control, prompt enhancement, generated sound effects, lip sync, image generation, predefined effects, and virtual try-on workflows. It is strongest for creators who want to produce stylized scenes, animated characters, social videos, or short narrative clips without filming everything manually.

Platforms

Web
iOS
Android

Video review


Video transcript

Hey kid, do you want to learn how to make cool AI generated videos like this one with lip sync? Full character consistency across scenes, realistic sound effects, and AI generated voices. It's very easy with the tools I'm going to show you. You can use this to create explainer videos or documentaries for YouTube without ever showing your face or voice. Or just have some fun and make cool animations. My friend Florian will show you how. In this video, you will learn how to use Kling AI, Google's Nano Banana, and ElevenLabs to create full-blown movies with consistent characters, cinematic camera movements, expressive speech, and sound effects. And we will keep it simple. None of this overly complicated prompt engineering. You only need creativity. If you can think of a scene, you can create it. And the possibilities with this are really endless. You can create any kind of movie. You can start your own YouTube channel. And if your videos are good enough, you can make money from it. My name is Florian Walther and this is the AI tool corner where I review the latest AI software to find out which ones can actually improve our lives and businesses. So, the first tool we need is Kling AI, which is an amazing AI video generator. We create an account here, and when you sign up, you get free credits. So, you can try this out without paying anything. Go ahead and click on sign in and then create your account. And the first tool we need is not the video generator but the image generator. We use text to image to generate our base character for the movie. And here you simply describe your main character in a text prompt. I've already prepared a prompt and a bunch of different scenes. I will put all of these prompts and instructions and links into a PDF and I will put it into the video description below. You can download it from there if you want to follow along step by step. But of course, you can also create your very own movie with your own prompts. 
So, mine is: "Photorealistic image of a fit young muscular guy in a tank top. He has a very low body fat and a chiseled good-looking face. Full body shot, white background." The last part here is important because we want to see the whole character so that he wears the same clothes in different scenes, and a white background because we only want the character and nothing else so that we can reuse this base image later for different scenes. And I use a muscular guy here because this will be a video about intermittent fasting. Then down here we select the aspect ratio. For YouTube, you want to pick 16:9. If you want to use this for TikTok or Instagram Reels, 9:16. We can generate multiple outputs at once. For our base character, I like to generate multiple ones because these image generations are very cheap. And then we can pick the best one, right? So, let's create this. So, we get a bunch of different outputs. All of them are good, but I'm going to pick this one here because it's a full body shot and it looks exactly how I imagined it. Here, over the download button, we now download this to our computer. Right click and save this image. So, I have this image on my computer now. And this will be the base for the different scenes for our movie. Next, we go to ElevenLabs, which is the state-of-the-art AI text-to-speech generator. It's better than anything else. Creates super realistic voices. And again, I will put a link into the video description. Go ahead and create your account here. Again, when you sign up, you get a bunch of free credits, which are more than enough to create multiple videos. Here, we want to go to text to speech. For the model, we select Eleven v3, which is the latest one and the most expressive, and it gives us the most control. Then select a voice for the AI. Pick the ElevenLabs quality presets recommended for Eleven v3. And then select a voice that you like. You can hear a preview here. 
>> Architecture is the thoughtful arrangement of space to uplift the spirit. >> I'm going to choose Mark Convo AI. I like this one the most. And the first prompt will say this. Now, the cool thing about Eleven v3 is that we can add these text to describe how the AI should speak and what emotions it should use. Again, I will put all the prompts I use here into the PDF in the video description. You can copy it from there. Then we click on generate speech. This takes a few seconds. And then we get two different outputs. In this video, I'm going to teach you how to use intermittent fasting to stay lean all year round. In this video, I'm going to teach you how to use intermittent fasting to stay lean all year round. They're very similar, but I prefer this one here. Again, we download it. I just call it one for ordering because this is the first scene. If you want to have multiple speakers in your scene, that's also possible and it's explained in the ElevenLabs documentation. Again, the link to this will be in the PDF in the video description if you want to check this out. You can also enhance the speech with this button down here, which then adds text automatically. The third and last tool we need is Google AI Studio, which is where we can use Nano Banana, the image generation tool. Again, this comes with a free tier. You just have to create an account and then the UI should look something like this. You have a chat interface here. We can select Nano Banana. The pro one is paid but the default Nano Banana is free. And then we take the base image of our character. Drag it in here. And then we describe the scene where we want to put this character. Show this guy standing in a beautiful garden with a pool behind him. This is what I want to use for the first scene. We use Nano Banana and not Kling AI because Nano Banana is much better at character consistency. Putting the same character into different situations and we don't have to pay any credits. 
Result looks pretty good. This will be our start frame. Again, download this via the download button. I'm going to call this one_1 because this is the start frame of the first scene, but you don't have to follow the same naming convention. Then I want to create the end frame of our scene. Now, show him with the same shot, but make him pull up his shirt with his right hand, revealing his chiseled abs. If you're not happy with the result, you can either edit the previous message, change it a little bit, and try again, or just send a follow-up message, or refresh the page, and start over again. Pull up the shirt further. I want to see more of the abs. I refresh the page, tried again, and now I'm happy with the result. This should suffice. Now, when we create an image with Nano Banana, we have this watermark here on the bottom right. If this doesn't bother you, just leave it in. But I also found a free watermark remover which works really well. Again, I will put a link to this into the video description. So, we download this image here as well. Call it one_2 because this is the end frame of the first scene. Then, we upload it here. And voila, watermark removed. We don't lose any image quality. And this tool is free. So, we select this image again. Again, this is one_2. Now that we have our start and end frames, we go back to Kling AI and to the AI video generator. This is where things get interesting. Up here, we select the latest model, which at the time of this video is 2.5 Turbo. Then we can select a start and an end frame. And this is what we just generated. We want to pick the ones without the watermark. So the start frame is one_1, and the end frame is one_2. And the start and end frames are a really cool feature of Kling AI. The end frame is not necessary, but we can use it to guide the AI to achieve exactly the result we want. And you can create some cool transitions with this. More on that later. The prompt for the generated video goes here. 
Again, I have prepared it. The man looks confidently at the camera while pulling up his shirt, revealing his chiseled abs. We can send it like this, but we can also use this button here to let DeepSeek enhance our prompt. So it will look at both the start and end frame and then come up with three different enhanced prompts that we can use. They are more descriptive than what we wrote here. So for example, a man stands confidently by the pool in a white tank top and black shorts gradually pulling up his shirt to reveal toned abs while maintaining steady eye contact with the camera. The camera orbits around to showcase his physique capturing blah blah blah. I don't want to use this because I don't want this orbit movement. Actually, I'm just going to use my default prompt and none of these here. Down here, we can also add sound effects to the scene. The AI will actually do this automatically if you keep this empty. But if you have specific sounds in mind, you can describe them here. I'm going to use sounds of a garden with birds chirping in the background. Again, you can use DeepSeek to enhance this prompt, but I'm going to keep my default input. Then, we can select the duration, 5 seconds or 10 seconds. It depends on how long the script is that your character speaks in this time. For this scene, I'm going to keep 5 seconds. Again, you can generate multiple outputs, but to save credits, I'm just going to create one. And if we don't like it, we can still recreate it. And 2 minutes later, we get our result with sound effects and exactly the movement we described. This is perfect. But if you don't like it, you can click on regenerate here to try this again with the same prompt. Or you can modify the prompt and create a new scene. And then the last step is to make our character speak. For this, we click here on lip sync, which is a really cool feature. Here we click on upload local dubbing. We don't want to use one of these predefined voices. 
We want to use our own ElevenLabs voice because it's just better. I upload the MP3 of the first scene that I created earlier in ElevenLabs. After the upload has finished, we click on add speech, which adds the MP3 to this timeline down here. And then we position it where we want our character to speak it, at the very beginning or after a few seconds. Let's try here. >> In this video, I'm going to teach you how to use >> No, actually, let's put it at the very start. Now, the mouth of the character doesn't move with the voice. >> Fasting to stay lean. >> This only happens after we click on generate down here. This will now synchronize the lips to the voice that we uploaded. And then we get something like this. >> In this video, I'm going to teach you how to use intermittent fasting to stay lean all year round. >> And now we create a bunch of these different scenes. Stitch them together and then we have a full-blown movie. Now we will have some cool scene transitions later. Not all of the clips will look as static as this one. So make sure to watch the whole video to see the final result. We download this video. It doesn't have any watermark. Let's call it one because it's scene one. And then we repeat the same steps for our other scenes. We put a script for the next scene into ElevenLabs. >> Step one, wake up. Drink some water. Avoid caffeine too early because you will crash later. >> Download the MP3. We refresh Nano Banana with F5 because we want to start from a blank slate. Again, we insert our base character as the reference. We describe the scene and generate the image. Looks pretty good. Again, we download this. We don't need an end frame this time, so I'm just going to call this one two_1. Again, we remove the watermark with the watermark remover. We go back to Kling AI and the AI video generator. We have to clear this input here manually. Again, we upload the start frame. And this time, we don't need an end frame. 
I only add an end frame if I have a very specific result in mind. But with our next prompt, I leave it open to the AI how exactly it plays out. The man opens his eyes, sits up straight on his bed, puts his feet on the floor, then while still sitting on his bed, starts drinking the glass of water next to him. Again, we can let DeepSeek enhance the prompt. Let's try this one. This time, we keep the sound effects empty, which autogenerates sounds. We change the duration to 10 seconds this time because the scene is a bit longer. And then we generate it. Let's check it out. Now, the morning is a bit weird, but whatever. I think it's decent. Let's add lip sync. We upload our MP3. We add the speech and we position it where we want our character to speak it. Step one, wake up. Drink some water. Avoid caffeine too early because you will crash later. Yeah, maybe a little bit later. Like this. There's something important I forgot in the first clip. We want to activate sound from video to keep the original sound effects and add the speech on top like this. >> Step one, wake up. Drink some water. Avoid caffeine too early because you will crash later. >> Yeah, let's generate this. Let's check it out. >> Step one, wake up. Drink some water. Avoid caffeine too early because you will crash later. >> Yeah, looks good to me. Let's download this. Let's create the next script. >> After two to three hours, have your coffee. You want to drink it black to maximize the benefits of fasting. Download three. Show this guy next to a drip coffee machine holding a mug of coffee in his hands. He has a friendly smile on his face. And there we go. And again, the character consistency is really the strength of Nano Banana. It's the exact same guy in a different location with the same clothes on. But the kitchen looks a bit too clean for my taste. So, I'm going to try this again. Yeah, this looks better. 
So this time I'm going to use the same image as the start and end frame because I want our character to take a sip from his coffee and then go back to his original position because I want to use this end frame as the start frame of the next scene. More on that in a moment. The prompt says the man waits for a few seconds then takes a quick sip from his coffee mug. And the sound prompt is a coffee machine hissing and roaring quietly in the background. Let's use this enhanced prompt here. Sounds a bit better. 10 seconds generate. And by the way, did you know that coffee tastes much better when you drink it from a straw? I don't know why, but it's much better. Let's check out the result. Yeah, the break at the beginning is perfect because this is where I want him to speak. The hissing is a bit loud, but let's use this for our lip sync. And here's the lip synced video. >> After two to three hours, have your coffee. You want to drink it black to maximize the benefits of fasting. >> Yeah, pretty good. Next voice over. >> For lunch, eat only a small meal. We want to keep most of our calories for dinner. >> I'm going to pick this one. This time, I want to use the end frame of our previous scene as the start frame of the new scene because I want him to walk over to a table and sit down there. So, the prompt describes, "Show this guy sitting at a dinner table on the other side of the same kitchen. In front of him is a small bowl of yogurt." Nano Banana has trouble with this one. It always puts it at the same position, which is not what I want. I want him on the other side of the kitchen. Yeah, I made a slight change to the prompt. I added outside of the current frame. It's still not perfectly what I had in mind, but this is actually good because this is a few meters away from the original position, right? So he has to walk forward towards the screen. Let's try this one. So in Kling AI, I now use the end scene of the previous clip as the start scene of this one. 
And then our new end frame. And the prompt says, "The man puts his coffee mug down on the kitchen counter and then walks over to the dinner table while still looking at the camera. He sits down at the table with the yogurt bowl in front of him." I like this enhanced prompt here, so I'm going to use this one. No sound effect prompt. 10 seconds duration. Generate. Here's the lip-synced video. >> For lunch, eat only a small meal. We want to keep most of our calories for dinner. >> Yeah, sometimes these lip movements seem a bit over the top, but I think it's decent. And by using the end frame of the previous video as the start frame of our new video, we can create seamless transitions and clips that are much longer than 10 seconds. You will see the final result in a few minutes. The fifth script. >> Keeping calories low will also keep you more productive because you're not tired from digesting food. >> Show this guy working on a desktop computer with over-ear headphones and glasses on while looking at the camera. And we don't need an end frame this time. The man keeps looking at the camera while sitting in front of his workstation. And the sound prompt is birds chirping in the background. I enhanced both the video and the audio prompt because I liked what it generated. 10 seconds duration. Generate. And here's the lip-synced video. >> Keeping calories low will also keep you more productive because you're not tired from digesting food. >> Nice. Just one more scene to go and this one will have a cool transition. So, make sure to watch this video all the way to the end. >> Then enjoy your big dinner feast with family or friends, but stick to whole foods and stay within your calorie window. >> Show this guy sitting on a restaurant table surrounded by friends. Placed on the table is a variety of delicious foods. He wears a button shirt. There we go. But this will be our end frame. For the start frame, I say show this exact scene from the top. 
Keep people and their positions identical because this way we can generate a really cool transition. The camera moves down to the table in a cinematic motion, finally showing the faces of all protagonists. The main character looks directly at the camera the whole time while the other people are conversing with each other. And for the sound prompt, restaurant noise, people mumbling, glasses clinking, chairs moving. Here in the final lip sync, we have to select the character that we want to speak because there are multiple people in the scene. Character one is already the correct one. So, let's generate this. Here's our final scene. >> Then enjoy your big dinner feast with family or friends, but stick to whole foods and stay within your calorie window. >> Yeah, he looks a little bit like a psychopath in this scene, but I think it's fine. The last step is to put all these clips into a video editing tool. I use Premiere Pro, but you can use any video editing software. There are a lot of free alternatives like CapCut or DaVinci Resolve. Doesn't matter which one you use. And the editing we do here is very simple. We basically just stitch our different scenes together to get one long video. I trimmed the clips a little bit. I removed unnecessary parts to make the video a bit faster and more engaging. And we can also add some background music. For this, I like to use Pixabay because they have high quality music completely for free. Again, I will put the link into the video description. Let's use this one here. I dragged the music clip onto the timeline, adjusted the volume. It's all very simple. Let's render the video. And here is the final result. In this video, I'm going to teach you how to use intermittent fasting to stay lean all year round. Step one, wake up. Drink some water. Avoid caffeine too early because you will crash later. After two to three hours, have your coffee. You want to drink it black to maximize the benefits of fasting. 
For lunch, eat only a small meal. We want to keep most of our calories for dinner. Keeping calories low will also keep you more productive because you're not tired from digesting food. Then enjoy your big dinner feast with family or friends, but stick to whole foods and stay within your calorie window. >> I think this is super cool. If you created a cool video this way and uploaded it to YouTube, please leave the link in the comments below. I want to check them out. Kling AI also has a bunch of other AI tools like a text to video generator where you don't need a start frame. You have these predefined effects which you can use to create something like this. An old photo of my grandparents. I made them hug and the face consistency is actually pretty great. There is a virtual try on where you can upload your image and try on different clothes and a bunch of other tools. You can check them out on Kling AI. Again, the link is in the video description. Check out the PDF in the description with all the links and prompts and instructions. And again, if you created something cool, please show me in the comments below. I would love to check out your videos. Subscribe to the channel for more AI tool reviews in the future. Then I hope I see you in the next video. Take care.

Standout features

Image-to-video generation

Upload a starting image and describe the movement you want, then generate short clips with camera motion, character action, and scene animation.

Start and end frame control

Use an optional end frame to guide the clip toward a specific pose, camera angle, or transition, which helps when stitching multiple clips into a longer sequence.
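
If you plan several clips before generating them, it can help to write down which image each clip starts and ends on. The sketch below is a planning aid only, not part of Kling: the `Scene` class, the `resolve_start_frames` function, and the file names are hypothetical. It assumes that a scene without its own start frame should begin on the previous scene's end frame, which is the chaining approach used in the video review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scene:
    name: str
    start_frame: Optional[str] = None  # None: reuse the previous scene's end frame
    end_frame: Optional[str] = None    # None: let the generator decide how the clip ends


def resolve_start_frames(scenes: list[Scene]) -> list[Scene]:
    """Return a copy of the plan with every missing start frame filled in."""
    resolved = []
    previous_end = None
    for scene in scenes:
        start = scene.start_frame or previous_end
        if start is None:
            raise ValueError(f"scene {scene.name!r} has no start frame to work from")
        resolved.append(Scene(scene.name, start, scene.end_frame))
        previous_end = scene.end_frame
    return resolved


# Hypothetical file names, loosely following the reviewer's naming scheme.
scenes = [
    Scene("coffee", start_frame="three_1.png", end_frame="three_1.png"),
    Scene("lunch", end_frame="four_2.png"),  # starts where "coffee" ended
]
plan = resolve_start_frames(scenes)
```

A scene that has neither its own start frame nor a previous end frame raises an error, so a gap in the chain shows up before any credits are spent.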

Lip sync from uploaded audio

Add speech audio to a generated video, place it on the timeline, choose the speaking character when needed, and generate synchronized mouth movement.

AI sound effects

Let Kling generate ambient sound automatically or describe specific sounds such as birds, restaurant noise, or a coffee machine in the sound prompt.

Prompt enhancement

Use the built-in prompt enhancer to turn short motion or audio instructions into more descriptive prompts based on the selected frames.

Extra creative tools

Beyond the main video generator, Kling includes tools such as text-to-video, predefined effects, an image generator, and virtual try-on.

What it's great for

Create short AI movie scenes from character images and written prompts
Animate still images into social clips, explainers, or YouTube segments
Build longer videos by chaining clips with matching start and end frames
Add lip-synced dialogue or narration to generated characters
Generate quick visual experiments with effects, text-to-video, or virtual try-on

Pros & cons

Pros

What works especially well

Start and end frame control gives more direction than a simple one-shot text prompt
Lip sync can use uploaded audio instead of only built-in voices
Generated clips can include ambient sound effects and action sounds
Prompt enhancement is useful when a short prompt needs more visual detail
Free credits make it possible to test the workflow before paying
Includes several creative tools beyond basic AI video generation

Cons

Trade-offs to know upfront

Character consistency across very different scenes can still require careful source images and iteration
Generated motion is not always perfect, so weak clips may need regeneration
Lip movements can look exaggerated or unnatural in some scenes
Short clip durations mean longer videos need stitching in a separate editor
Credit usage can add up quickly when regenerating scenes or using higher-end models
Prompt enhancement may add camera moves or details you did not intend

Best for

Creators making AI-generated short films, animations, or story clips
YouTubers building faceless explainer or documentary-style videos
Social media creators testing cinematic AI scenes without filming
Marketers and educators prototyping video concepts from still images
Experimenters who enjoy iterating on prompts, frames, and transitions

Verdict

Kling AI is a strong creative video generator when you want more control than text-to-video alone, especially through image-to-video, start and end frames, sound prompts, and lip sync. It still rewards iteration and manual editing, but it can produce impressive short cinematic scenes from simple creative direction.

FAQ

Can Kling AI create videos from images?

Yes. Kling's image-to-video workflow lets you upload a still image as the starting frame and describe how the scene should move. You can also add an end frame when you want the clip to finish in a specific pose or composition.

Does Kling AI support lip sync?

Yes. Kling includes a lip sync workflow where you can upload audio, add it to the timeline, position it in the clip, and generate synchronized mouth movement for the selected character.
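
Because the speech has to fit inside the clip, it is worth checking the audio length before you generate the video. The helper below is a rough sketch, not a Kling feature: the function name is made up, and the 5 and 10 second options are the ones shown in the video review, so they may differ for other models or plans.

```python
def pick_clip_duration(speech_seconds, offset_seconds=0.0, allowed=(5, 10)):
    """Return the shortest allowed clip length that fits the speech.

    speech_seconds: length of the uploaded audio.
    offset_seconds: where on the timeline the speech should start.
    allowed: clip lengths offered by the generator (assumed, see above).
    """
    needed = offset_seconds + speech_seconds
    for duration in sorted(allowed):
        if needed <= duration:
            return duration
    raise ValueError(
        f"speech needs {needed:.1f}s but the longest clip is {max(allowed)}s; "
        "split the script across two scenes"
    )


short_line = pick_clip_duration(4.2)                        # fits a 5 second clip
delayed_line = pick_clip_duration(4.2, offset_seconds=1.5)  # needs the 10 second clip
```

If the function raises, the script is too long for a single clip and is better split across two scenes.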

Can Kling AI generate sound effects?

Yes. You can leave the sound prompt empty to let Kling generate audio automatically, or describe the exact ambience and sound effects you want for the scene.

Can Kling AI make longer videos?

Kling generates short clips, but you can create longer videos by producing multiple scenes and stitching them together in a video editor. Reusing the previous clip's end frame as the next start frame can help make transitions feel more continuous.
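
The review uses a graphical editor for this step, but plain stitching can also be done from the command line. As a minimal sketch, the code below prepares the input for FFmpeg's concat demuxer: a list file with one `file '...'` line per clip, plus the matching command. The clip names are hypothetical, and `-c copy` assumes all clips share the same codec, resolution, and frame rate. If they do not, re-encoding would be needed instead. The code only builds the text and the argument list; it does not run FFmpeg.

```python
def concat_list(clips):
    """Build the text of an FFmpeg concat list file for the given clip paths."""
    lines = []
    for clip in clips:
        # Inside a single-quoted entry, a literal quote is written as '\''.
        escaped = clip.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path, output_path):
    """Build the FFmpeg argument list that joins the clips without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow paths the demuxer would otherwise reject
        "-i", list_path,
        "-c", "copy",     # copy streams as-is (assumes matching clip formats)
        output_path,
    ]


# Hypothetical clip names in scene order.
list_text = concat_list(["one.mp4", "two.mp4", "three.mp4"])
command = concat_command("clips.txt", "movie.mp4")
```

To use it, write `list_text` to `clips.txt` and pass `command` to `subprocess.run`. Trimming clips or adding background music still calls for an editor or additional FFmpeg options.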

Is Kling AI good for consistent characters?

It can work well when you guide scenes with strong reference images and careful start or end frames, but consistency is still not automatic. Expect to regenerate some clips or adjust source images when a character needs to stay recognizable across many scenes.