Text-To-Speech Beginners Course: Create Realistic Voice Narrations With Text-To-Speech & AI Voices | Martin Aranovitch | Skillshare

Text-To-Speech Beginners Course: Create Realistic Voice Narrations With Text-To-Speech & AI Voices

Martin Aranovitch, Digital Business Training & Education

Lessons in This Class

    • 1. Text-To-Speech Course Demo (4:16)
    • 2. 01 - Text-To-Speech Overview (4:59)
    • 3. 02 - Text-To-Speech Benefits (10:41)
    • 4. 03 - Text-To-Speech Introduction (15:55)
    • 5. 04 - Text-To-Speech Markup Process (8:02)
    • 6. 05 - Text-To-Speech Tools (16:28)
    • 7. 06 - Text-To-Speech Markup Tutorials (3:14)
    • 8. 07 - Text-To-Speech Speak Tag (1:59)
    • 9. 08 - Text-To-Speech Break Tag (8:02)
    • 10. 09 - Text-To-Speech Paragraph Tag (8:37)
    • 11. 10 - Text-To-Speech SayAs Tag (35:41)
    • 12. 11 - Text-To-Speech Emphasis Tag (3:56)
    • 13. 12 - Text-To-Speech Prosody Tags (12:34)
    • 14. 13 - Text-To-Speech MaxDuration Tag (7:27)
    • 15. 14 - Text-To-Speech Pronunciation Tags (12:27)
    • 16. 15 - Text-To-Speech Add Audios (13:16)
    • 17. 16 - Text-To-Speech VoiceFX (20:49)
    • 18. 17 - Text-To-Speech Language Tag (10:06)
    • 19. 18 - Text-To-Speech: Putting It All Together (17:31)
    • 20. 19 - Text-To-Speech Tips (22:58)
    • 21. 20 - Text-To-Speech Resources (1:16)

Students: 409

Projects: --

About This Class

This groundbreaking course is presented and narrated entirely by AI voice instructors. In this practical step-by-step video course, you will learn how to use text-to-speech and the latest AI voice technologies to create professional and realistic sounding voice narrations from text files for a wide range of commercial uses and business applications. 

The course is designed specifically for non-technical users. No coding or programming skills are required or necessary.

In this comprehensive 4-hour, 20-part video course, you will learn:

  • A brief history and a basic introduction to the uses and benefits of using text-to-speech.
  • Where to find free or inexpensive tools to create professional voice narrations from text files.
  • How to use text-to-speech markup tags to create realistic human-like voice narrations.

The course includes detailed and practical step-by-step video tutorials using engaging and entertaining examples of text-to-speech applications, and downloadable course notes and materials.

Upon completion of the course, you will have all the skills, knowledge, and competence needed to create professional voice narrations and text-to-speech audio files for a range of business, marketing, and commercial uses, including:

  • Educational, sales and training videos
  • Narrated presentations and documentation
  • e-Learning courses
  • Audiobooks and audio-based digital products
  • Podcasts
  • Spoken web pages
  • Online / Social Media / Radio advertising
  • Recorded announcements
  • Other content and media formats

Meet Your Teacher

Martin Aranovitch

Digital Business Training & Education

Teacher

I have over 14 years of experience teaching businesses and non-technical users how to grow and manage an effective digital presence using smart and cost-effective technologies. My step-by-step video courses provide practical easy-to-follow information that will save you time and money and help you avoid time-consuming and expensive learning curves.

Level: Beginner

Class Ratings

Expectations Met?
  • Exceeded!: 0%
  • Yes: 0%
  • Somewhat: 0%
  • Not really: 0%

Transcripts

1. Text-To-Speech Course Demo: Hello. My name is Kate, and I am an artificial intelligence based voice narrator. In this short video, I want to show you some of the things you will learn to do in our text-to-speech for beginners course using inexpensive text-to-speech tools and synthetic voice narrators like myself. So, sit back, listen, and enjoy. Hello. This is a recorded announcement. The blue line train arriving on platform number one will depart at 8:06 and stop at the following US stations: Dead Horse, Alaska Nothing, Arizona Nowhere, Colorado Greasy Corner, Arkansas Fluffy Landing, Florida Hell For Certain, Kentucky Buttzville, New Jersey You say either And I say either you say neither and I say neither either either neither neither Let's call the whole thing off. Dear listeners, this is my first attempt ever at doing stand up comedy. Please be kind. I plan to entertain you with some fabulous jokes tonight. Are you ready? Okay, here we go. I been everywhere, man I been everywhere, man Crossed the deserts bare man I breathed the mountain air man Of travel I a'had my share man I been everywhere I been to Louisville, Nashville, Knoxville, Ombabika Schefferville, Jacksonville, Waterville, Costa Rica, Pittsfield, Springfield, Bakersfield, Shreveport, Hackensack, Cadillac, Fond du Lac, Davenport, Idaho, Jellico, Argentina... Fox in Socks by Dr Seuss Fox, Socks, Box, Knox Knox in Box, Fox in socks Knox on fox in socks in box Socks on Knox and Knox in box Fox in socks on box on Knox Chicks with bricks come Chicks with blocks come Chicks with bricks and blocks and clocks come Look, sir, look, sir. Mr Knox, sir... Make new friends... But keep the old One is silver... The other is gold. 
Make new friends (a circle is round), But keep the old (it has no end) One is silver (that's how long) The other is gold (I will be your friend) A circle is round (make new friends) It has no end (but keep the old) That's how long (one is silver) I will be your friend (the other is gold) Hello and welcome to another episode of the AI meditation podcast where we only say what others are thinking... Before we begin... Take a deep breath ... and relax. Would you look at all that stuff... They got allen wrenches gerbil feeders toilet seats, electric heaters trash compactors juice extractor, showered rods and water meters walkie talkies copper wires safety goggles radial tires BB pellets rubber mallets fans and dehumidifiers picture hangers paper cutters waffle irons window shutters paint removers window louvres masking tape and plastic gutters kitchen faucets folding tables weather stripping jumper cables hooks and tackle grout and spackle, power foggers, spoons and ladles, pesticides for fumigation high-performance lubrication metal roofing water proofing multipurpose insulation... Congratulations! Today is your day You're off to great places You're off and away. You have brains in your head You have feet in your shoes You can steer yourself any direction you choose You're on your own and you know what you know And you are the guy who'll decide where to go. 2. 01 - Text-To-Speech Overview: Hello and welcome to "How to create text-to-speech audio files" a practical step-by-step course for beginners. My name is Kate, and I am an artificial intelligence-based voice narrator. I will be your main instructor throughout the lessons, along with other synthetic speech narrators like George, Mia, and Navin, whose voices will feature in many of our tutorials, demos, and examples. George, Mia, Navin, please introduce yourselves. Hello! 
I'm George, and I look forward to helping you learn how to create realistic-sounding audio files using text-to-speech technologies just like the one that created me. And I'm Mia from down under. As you can see, I'm an Australian-sounding AI voice narrator and I will be assisting you in the lessons, along with my mate Navin. Navin, are you there? Do you want to say a quick hi to the listeners? Thank you, Mia, and welcome, dear listeners. I'm Navin and I'm also a voice narrator created using the same artificial intelligence technology that you will be learning how to use in this exciting course. As you can see, creating text-to-speech audio files is not only a lot of fun, but it also has many practical applications, especially for businesses that want to save time and money with their digital marketing campaigns and promotions. Some of the things you can do using text-to-speech include creating audio and voice narrations for sales videos, explainer videos, video sales letters, training videos, video ads for social media, presentations, announcements, podcasts, audiobooks, spoken web pages for visually impaired users, and so many other uses and applications. Kate, don't forget to tell our listeners that we can easily convert text files and audio voice narrations into many different languages. Thanks, George. I will. Once you learn how to create a text-to-speech file, you can quickly and easily convert your text files and audio narrations into dozens of different languages. Kate, tell the listeners about some of the other exciting things we will be teaching them in this course. Sure... "How to create text-to-speech audio files" "A practical, step-by-step course for beginners" is designed to teach you how to use text-to-speech and the latest AI voice technologies to create text files that can then be easily converted into audio-like voice narrations. 
This course was created specifically for non-technical users, so you don't need to learn how to code or program software to apply the lessons and get results. The course will cover a brief history of speech synthesis, a basic introduction to the SSML markup language, and practical step-by-step tutorials on how to create text-to-speech files. You will learn where to find inexpensive and free technologies and tools that you can use to create professional voice narrations from text files, and we will show you how to use these tools. You will also learn how to use basic text-to-speech markup tags to insert things like pauses, emphasis, and various other inflections into your text as we walk you step by step through the process of creating text files that can then be easily converted into voice narrations and audio files for a wide range of applications like videos, instructions, presentations, recorded announcements, and many other uses. Using text-to-speech in web and software applications offers many benefits to businesses, companies, and organisations, especially when it comes to things like saving time and money, communicating their brand and message through various digital and social marketing platforms, and helping businesses reach a wider global audience for their products and services. In the next few years, we are going to see an explosion of text-to-speech applications and a growing demand worldwide for people with basic text-to-speech skills, like knowing how to mark up and edit text for conversion to audio. So... if you are looking for an opportunity to get ahead of the curve and make money from this emerging global trend, or thinking about starting a business to profit from a growing demand for text-to-speech services, now is the best time to learn the basic skills this course will teach you. So that's an overview of what this course will cover. Once again, welcome and let's get started. 3. 02 - Text-To-Speech Benefits: Hello. 
This is a recorded announcement. The Blue Line train arriving on platform number one will depart at 8:06 and stop at the following US stations: Dead Horse, Alaska; Nothing, Arizona; Nowhere, Colorado; Greasy Corner, Arkansas; Fluffy Landing, Florida; Hell For Certain, Kentucky; Buttzville, New Jersey; Burger Town, North Carolina; Not Homestead, Ohio; Job down in Texas; and Disco, Wisconsin. The Red Line train arriving on platform number two will depart at 19:13 and stop at the following Australian stations: Chicken Victoria, you know. No, they wrong. Wrong New South Wales Cool in Western Australia. Manama Tom Victoria Buggy Queensland Yeah, don darling story. Well, the New South Wales. Whoa, Queensland and Nowhere Else, Tasmania. Please stand behind the yellow line and wait for the train to come to a full stop before boarding. Hello and welcome back. Before we jump into the course lessons, I want to talk about why knowing how to use text-to-speech is a valuable skill and cover the main benefits of using text-to-speech for businesses and organisations, content creators and content publishers, and different types of end users. Text-to-speech is key to the new digital technology boom. It's a huge growth sector. The text-to-speech market was valued at $1.3 billion in 2016 and is expected to reach $3.3 billion by 2022. Key market areas include consumer electronics, education, health care, transportation, retail, finance, enterprise, and other areas. This growth will bring many new and exciting opportunities for many different types of uses and different types of users. Text-to-speech creates a more accessible Internet for everyone. 15 to 20% of the worldwide population has some form of language-based learning disability. 14% of adults in the US are illiterate and many have only basic reading skills. Additionally, 244 million people are foreign born across the globe. 
Text-to-speech helps to make the Web more inclusive by turning it into a place where users can access, consume, and digest information in audio format. Text-to-speech can also help to make life easier and make work more efficient and productive. Text-to-speech can be used to enhance and deliver information in government, corporate, and business websites, general blogs, mobile applications, e-books, e-learning courses, training materials, business documentation, HR and legal policies, transportation and public announcements, systems and automation designed to improve customer experience and communication, media, sales and marketing, robotics, embedded devices, self-service applications, the Internet of Things, and ways we haven't even thought of yet. Text-to-speech will also become more necessary in the digital age as governments and businesses look for ways to increase citizen engagement online and strengthen corporate social responsibility by ensuring that information is available in both written and audio format. Text-to-speech benefits businesses and organisations in many ways. It improves the quality of the customer journey by allowing businesses and organisations to improve customer experience and respond to different customer needs, wants, and desires in terms of how they interact with content. Text-to-speech minimizes human workload and reduces operational costs. TTS can be used to provide employee and after-sales customer training, educate staff on HR and legal policies, personalize customer handling services, etcetera. Text-to-speech improves branding. New TTS technologies allow businesses to create and use a recognizable synthetic voice to represent their brand across different areas of the business and customer touch points. Text-to-speech can increase your Web presence. Almost 800 million people worldwide have literacy issues, and 300 million people have visual impairments. 
Speech-enhanced Web content does not interfere with usability for users without disabilities. It also aids other populations, such as older users and foreign or non-native speakers. Text-to-speech can help businesses reach new markets globally. TTS voices are available in dozens of languages and can synthesize speech from written translations. Text-to-speech also helps businesses save time and money. Online content can quickly and easily be turned into speech without hiring human voice talent and language translators. And text-to-speech allows for easier implementation with the Internet of Things by giving connected devices a more user-friendly way to communicate with consumers. Text-to-speech also benefits content creators and content publishers. Content creators and publishers can save time and money. Getting TTS voice narrators to enunciate your courses or narrate podcasts or audiobooks is an economical and time-saving solution compared to hiring voice talent, and allows you to create e-courses and audio products faster, with less time and less cost. If you need human voice talent, text-to-speech allows you to create drafts and finished audio scripts for professional narrators. Text-to-speech also allows you to create better content. If you are planning to create an audiobook, podcast, e-learning product, or training course, listening to an audio draft helps to improve content structure and layout, fix spelling or grammatical errors, and generate new ideas. Text-to-speech also helps you write more effective content. Hearing your sales pitch or content read aloud helps you better focus on your message, improve your copywriting and writing skills, and write more effective sales and training video scripts and presentations, Web content, radio ads, and many other forms of content. 
Finally, text-to-speech helps content creators and publishers create and deliver content to a global audience by making it easier to create multilingual audio content and audio products from language translations. Text-to-speech also provides many benefits to different end-user audiences. Text-to-speech helps all students, including students with learning disabilities. Use text-to-speech to create audio content for struggling readers, students with dyslexia, and students with low literacy. Studies show that text-to-speech improves reading comprehension, spelling, error detection, and understanding of word meanings. Visually impaired and reading-impaired users can benefit greatly by having content that can be read out loud and learning content that is made more accessible. Foreign language users can also benefit from text-to-speech, as translated content turned into speech facilitates comprehension and retention for a larger percentage of the online population whose native language is different from the language of a particular website or mobile app. Older users can also benefit greatly from text-to-speech, as a growing elderly population is becoming more dependent on technology for access to information and services. Between 2015 and 2030, the number of people aged 60 years or over will grow by 56%, from 901 million to 1.4 billion. In the US alone, 59% of senior citizens use the Internet daily. Speech-enabled mobile content makes the Internet more accessible and creates an easier user experience, especially for mobile users who access content mostly on mobile devices. Reading content on a small screen can be difficult and inconvenient. It's far easier to have the content read aloud, especially for users on the go. Another group of users that can benefit from text-to-speech are users with different learning styles. People have different learning modalities. 
Making digital content on the Internet accessible in multiple formats creates an easier user experience. In summary, text-to-speech provides many benefits to businesses and organisations, content creators and content publishers, and different types of end users. As text-to-speech becomes more widely used in every aspect of life, it will also bring many new and exciting opportunities. TTS technology is inexpensive and easy to use, makes the Web accessible to all users, helps create better content faster, saves time and money, and offers so many more benefits. This brings us to the end of this module. Please refer to the accompanying documentation in this section for more information, and thank you for listening. 4. 03 - Text-To-Speech Introduction: The 9000 series is the most reliable computer ever made. We are all foolproof and incapable of error. Open the pod bay doors. I'm sorry, Dave. I'm afraid I can't do that. What's the problem? I think you know what the problem is just as well as I do. Hello and welcome to "How to create text-to-speech audio files", a practical step-by-step course for beginners. In this lesson, we explore the background of speech synthesis with a brief history of text-to-speech technologies, popular text-to-speech engines, and basic text-to-speech terms used in the creation of artificial voices. Allowing humans to interact with computers and converse with machines has been a longstanding dream of science visionaries, science fiction writers and, more recently, film animators and virtual software and game developers. Humankind, however, has dreamt of creating artificial speech for many centuries. The idea that automatons could converse with humans can be traced as far back as 1000 AD,
where, according to legend, Pope Sylvester II stole a tome of secret knowledge and built a talking head that could answer any yes or no question it was asked. Long before the invention of electronic signal processing, people tried to build machines that emulate human speech. Early attempts at creating human speech artificially or, as we now call it, speech synthesis, include building mechanical models of the human vocal tract to produce vowel sounds, bellows-operated acoustic-mechanical speech machines, and electronic speech devices like keyboard-operated voice synthesizers, or vocoders, and machines that convert acoustic patterns of speech into sound. In the mid-seventies, one of the first speech synthesis systems, consisting of standalone computer hardware and specialized software, was developed that could read and even sing in Italian. In the 1980s, Bell Labs developed one of the first multilingual, language-independent systems, making extensive use of natural language processing methods. Around the same time, Digital Equipment Corporation developed a speech synthesizer and text-to-speech technology called DECtalk. Listen to a sample of speech generated by DECtalk, using the voices of Perfect Paul and Up a Gear Select. Right now you are hearing my Perfect Paul voice. However, I also have other presets. This, for instance, in my up here. So what more sitting. As you can see, early electronic speech synthesizers sounded robotic and were often barely intelligible. Lucky for AI narrators like me, speech synthesis, or the process of creating human speech artificially, has come a long way since those early days when mechanical talking devices tried to emulate the human vocal tract and electronic speech simulators and speech synthesizers created voice-like sounds using electrical circuits. The real revolution in speech technology came about when digital computers began allowing the simulation of electronic circuitry. 
The conversion of analog signals to digital form and the creation of analog signals from digital information to produce sound in the form of speech. Advances in computer technology and the introduction of desktop computers eventually brought affordable speech synthesis and speech recognition within the reach of the average computer user. Many computer operating systems have included speech synthesizers since the early 1990s, as these technologies became cheaper and more accessible. This brings us to where we are now. The quality of synthesized speech is steadily improving, and it's getting harder to tell the difference between artificially generated speech and human speech, especially as new AI and machine learning technologies, text-to-speech software and voice applications, the Internet of Things, electronic products, and the gaming industry keep pushing voice technology to new boundaries. Have you heard of this new technology? Are you speaking about this new algorithm to copy voices? Yes, it is developed by a startup called Lyrebird. This is huge. It can make us say anything now. Really? Anything. The good news is that they will offer the technology to anyone. This is huge. How does this technology work? Hey, guys, I think that they use deep learning and artificial neural networks. Hillary is right, and I can tell you that their team is great. I'm sure they will do a good job. So will artificial voices become indistinguishable from real human voices? Perhaps someday they will. Speech synthesis systems and talking machines are no longer an amusing novelty designed to elicit a cheap laugh. Text-to-speech systems capable of generating AI voices like mine are now being integrated in all areas of human life, including learning, teaching, sales of products and services, delivery of news, information, and entertainment, reading recipes while you cook, and even performing tasks and activities in your home and in your office. 
Now that we have looked at the history of speech synthesis, let's take a look at some current TTS technologies and systems being used to create artificial human speech. As this course is aimed at non-technical users, these next few slides present only an overview of text-to-speech technologies to help put things into context. At the end of this course, you will find a comprehensive list of references, sites, and additional resources where you can learn more about technical areas related to text-to-speech or TTS technologies. A speech computer, or speech synthesizer, can be implemented in software or hardware products. Text-to-speech systems convert normal language text into speech, while other systems render symbolic linguistic representations like phonetic transcriptions into speech. The most important qualities of a speech synthesis system are naturalness and intelligibility. Naturalness describes how closely the synthetically generated voice sounds like human speech, while intelligibility is how easily it can be understood. The ideal speech synthesizer aims to generate synthetic speech waveforms that sound as natural and intelligible as possible. It's important to keep in mind that all speech technologies have strengths and weaknesses. For example, one of the main technologies used to generate speech is called concatenative speech synthesis. With concatenative synthesis, a very large database of short speech fragments called units is recorded from a single speaker and recombined to form complete utterances. In other words, this method strings segments of recorded speech together. While this produces natural-sounding synthesized speech, it's difficult to modify the voice. For example, you can't switch to a different speaker or alter the emphasis or emotion of their speech without recording a whole new database. Let me play you an audio file generated from text using concatenative speech synthesis. 
The Blue Lagoon is a 1980 American romance and adventure film directed by Randal Kleiser. Another type of technology used to generate speech is called parametric speech synthesis. Parametric synthesis aims to create a machine model of the human voice using the acoustic properties of the human vocal tract, and generates audio data by analyzing the values of various speech parameters and then feeding these through signal processing algorithms known as vocoders. We touched on this model earlier when discussing the history of speech synthesis. Here's an audio file generated from text using parametric speech synthesis. The Blue Lagoon is a 1980 American romance and adventure film directed by Randal Kleiser. As you can hear, these synthetically generated voices are not bad. While there are other voice-generating models used to synthesize speech, WaveNet is the most natural-sounding voice technology currently available, and is one of the main models that we'll be using throughout this course as we learn to build scripts for voice narrations. The WaveNet model is the same technology used to create speech for applications like Google Assistant, Google Search, and Google Translate. WaveNet technology provides more than just a series of synthetic voices. It represents a new way of creating synthetic speech. WaveNet generates speech that sounds more natural than other text-to-speech systems. It synthesizes speech with more human-like emphasis and inflection on syllables, phonemes, and words. Studies show that most people prefer WaveNet-generated speech audio over other text-to-speech technologies. Unlike most other text-to-speech systems, a WaveNet model creates raw audio waveforms from scratch, using a neural network that has been trained using a large volume of speech samples. Here's some sample audio generated using WaveNet speech synthesis. The Blue Lagoon is a 1980 American romance and adventure film directed by Randal Kleiser. 
Let me play all three sample audio files again so you can hear the differences between concatenative, parametric, and WaveNet synthesis. The Blue Lagoon is a 1980 American romance and adventure film directed by Randal Kleiser. The Blue Lagoon is a 1980 American romance and adventure film directed by Randal Kleiser. The Blue Lagoon is a 1980 American romance and adventure film directed by Randal Kleiser. My voice has been generated from a text file using WaveNet synthesis. WaveNet, however, doesn't just synthesize voices. As well as yielding more natural-sounding speech, using raw waveforms means that WaveNet can model any kind of audio, including music. For example, here is a sample of music created from random musical data input into a WaveNet algorithm. Doesn't that just sound like music to your ears? As you can hear, WaveNet opens up a lot of possibilities for text-to-speech systems. Let's take a brief look now at some of the most popular text-to-speech engines. Text-to-speech engines allow users of applications like email tools, Web readers, audiobooks, and other software programs to convert written text into sound. Different TTS engines provide access to different voices, languages, and dialects. For example, Microsoft has a TTS engine called Speak, which is a built-in feature of programs like Word, Outlook, and PowerPoint. You can use Speak to have text typed in your Word documents, emails, and slide presentations read aloud. Listen to a sample of a Microsoft Speak engine voice reading a sentence typed into a Word document. My crown is in my heart, not on my head; not decked with diamonds and Indian stones, nor to be seen. My crown is called content, a crown it is that seldom kings enjoy. This quote is from the play King Henry the Sixth by William Shakespeare. Hello, I'm Kendra from Amazon Polly. Notice that there is a difference between saying content and content. Here's how I would say the quote from Shakespeare. 
My crown is in my heart, not on my head; not decked with diamonds and Indian stones, nor to be seen. My crown is called content, a crown it is that seldom kings enjoy. This quote is from the play King Henry the Sixth by William Shakespeare. The second audio sample you've just heard was created with Amazon Polly, which is a text-to-speech service that uses advanced deep learning technologies to synthesize speech into dozens of lifelike voices in multiple languages. Amazon Polly uses the same artificial intelligence technology used to power Amazon's digital voice assistant, Alexa. We'll hear more from Amazon Polly in later lessons. The last TTS engine I want to cover in this lesson is the Google Cloud Text-to-Speech engine, which converts text into human-like speech using more than 100 voices in over 20 languages and variants. Google's TTS engine uses WaveNet speech synthesis and powerful neural networks to deliver the high-fidelity audio used in applications like Google Assistant, Google Translate, and Google Reader. The last area I want to cover in this lesson is some of the basic text-to-speech terms we'll be referring to throughout this course. You should be familiar by now with terms like TTS or text-to-speech, speech synthesis, and different models for generating artificial or synthetic speech like concatenative, parametric, and WaveNet, and terms like neural networks, machine learning, and AI voices. In other lessons, you will learn about SSML, which we will use to mark up text files for audio file conversions; prosody, which lets you change attributes of your speech like the volume, pitch, and rate of your tagged text; and phonemes and phonetic pronunciations, which allow similar words with different meanings to be pronounced correctly in your audio files. This brings us to the end of this lesson. I hope you have enjoyed this lesson as much as I have enjoyed presenting it to you, and thank you for listening. 5. 
04 - Text-To-Speech Markup Process: Hello and welcome back. In this lesson, you will learn how to prepare text for audio files. Topics covered in this lesson include what SSML is, an overview of SSML markup tags, and the main audio file formats we will use in the text-to-speech process. Before we get into this lesson, let's have a little fun. I'm going to play you a video and I want to see if you can tell whether the audio in this video was recorded by a real human being or an AI voice narrator. Oh, the Places You'll Go by Dr. Seuss. Congratulations. Today is your day. You're off to great places. You're off and away. You have brains in your head, you have feet in your shoes. You can steer yourself any direction you choose. You're on your own. And you know what you know, and you are the guy who'll decide where to go. Okay, that was just a practice run. Let's see if you can tell whether this next audio was recorded by a real human being or an AI voice narrator. Oh, the Places You'll Go by Dr. Seuss. Congratulations. Today is your day. You're off to great places. You're off and away. You have brains in your head. You have feet in your shoes. You can steer yourself any direction you choose. You're on your own. And you know what you know. And you are the guy who'll decide where to go. Not bad, huh? This voice only took a human being about 20 years to perfect. Okay, last test. Is this voice narration real or AI? Oh, the Places You'll Go by Dr. Seuss. Congratulations. Today is your day. You're off to great places. You're off and away. You have brains in your head, you have feet in your shoes. You can steer yourself any direction you choose. You're on your own and you know what you know and you are the guy who'll decide where to go. The last audio file you heard was recorded using a synthetic AI voice with marked-up text to try and get the narration sounding as close as we could to a natural reading.
Listen to the introduction of this reading again with the real voice and the synthetic voice narrating the title at the same time. Oh, the Places You'll Go by Dr. Seuss. As you can hear, we're not quite there yet, but we are getting closer and closer. Not only is the technology for generating realistic voices getting better, but the way we can express voices using markup tags is also improving. For now, let's start by taking a look at the language used to mark up text-to-speech files. SSML stands for Speech Synthesis Markup Language and consists of written tags that tell text-to-speech engines how to encode text to create nuances and add expression to a synthetic voice. SSML is based on a language called XML. XML stands for Extensible Markup Language and allows developers to describe and organize information in ways that humans and computers can easily understand. While many companies are developing new text-to-speech applications for their platforms, not all text-to-speech engines can use the same SSML tags or make use of all the SSML tags that are currently available. Some platforms also develop custom SSML tags for use in their own applications, which may not work in other text-to-speech engines. For example, as this lesson is being recorded, Google's text-to-speech engine doesn't support using SSML tags that allow you to add phonetic variations to words, add breaths to speech or use interjections in sentences. But Amazon Polly does. We will explore some of these differences and which tools to use for different TTS engines later in our tutorials. So what can you do with SSML tags? Adding SSML tags to your text files lets you do things like add breaks and pauses to your narrations. Add emphasis to your words and sentences.
Spell out words and telephone numbers. Say numbers differently, depending on whether you are talking about dates, times, units or fractions, or explaining the difference between being number two and coming second. Add paragraphs and sentences to your narrations. Censor words in your narration. Control prosody attributes in your narrations to fine-tune elements like the pitch, volume and tempo of spoken words. Add phonetic variations to words. Substitute abbreviations to speak their expanded format, like World Health Organization instead of W-H-O or "who". Embed other audio files in your voice narrations, like adding sounds, or insert advanced instructions like playing multiple media files either simultaneously or sequentially. The main audio file formats we will use to convert our text into audio narrations in this course are WAV and MP3 files. Using either WAV or MP3 formats will work just fine for recording voice narrations. WAV files provide better sound quality for recording or distributing music, as the WAV format can cover the full frequency range that the human ear is able to hear. An MP3 file is compressed and has quality loss, whereas a WAV file is lossless and uncompressed. MP3 will never sound better than WAV, as it is a lossy format. MP3 files, however, are smaller in size than WAV files, and so they are much easier to distribute. Although WAV files are normally much larger in size than MP3s, storage these days is no longer such a big problem. So once again, using either of these formats will work just fine for voice narrations.
Please note that we won't be covering technical aspects of digital audio like sampling rates, bit depths, etcetera in our lessons, as these are not necessary to convert text into audio files for most commercial applications. We will explore, however, some tools you can use to convert audio files into different formats and some of the settings these tools provide to improve the sound quality of your audio recordings. This brings us to the end of this lesson. Once again, thank you for listening, and I will see you in the next lesson.
6. 05 - Text-To-Speech Tools: Hello and welcome back. In this lesson, we look at text-to-speech tools. Topics covered in this lesson include the text-to-speech process and tools used to convert text into audio files, time-saving tools for adding phonemes to your SSML file, converting audio files into different formats, translating content into different languages, capturing audio and more. We'll also look at free and paid text-to-speech tools for creating audio files that can be accessed from laptops, desktop computers, mobile devices and the cloud, and additional tools and resources that we recommend using to save time and money. Let's start by breaking down the text-to-speech process for converting your text-based script into an audio file. This process begins with your text-based content. This content can be in the form of a narration script, an article, sales copy, training instructions, a book, etcetera. After your content has been written, the next step is to select your text-to-speech engine. As mentioned in a previous lesson, you need to choose your text-to-speech engine before you mark up your text, because different text-to-speech platforms may not support or allow you to use the same SSML markup tags. For example:
If your content uses words that require a different phonetic pronunciation, then you will probably want to mark up your text for Amazon Polly instead of Google until Google's text-to-speech engine allows phonetic tags to be used in SSML. To keep things really simple, the only TTS engines we will be using throughout this course are Google Text-to-Speech and Amazon Polly. So all you need to do to complete this step is choose which engine you will use to process your written content. After selecting your TTS engine, the next step is to mark up your text file with SSML tags that the engine will support. This step is covered in detail in the markup tutorials in our next lesson. After marking up your text file with SSML tags, the next step is to run your content through your TTS tool. We will look at TTS tools in just a few moments. Essentially, the tool should allow you to select your language or dialect, choose a male or female voice, import your SSML text file and then convert your text into an audio file. After creating your audio narration, you should then be able to download or export your audio file, which you can then use for whatever application you like, such as a video narration, web page, podcast, audiobook, etcetera. Let's take a look now at some time-saving text-to-speech tools. The first tool you need to create a text-to-speech file is a plain text editor. If you use Windows, the built-in free Notepad text editor is a perfect tool for the job. If your computer runs on macOS, a default text editor tool like TextEdit is great, too. It's important to remember that all your markup should be done in a plain text file, using words and markup tags only. Don't use word processing applications with formatted text, as this is not compatible with TTS engines and will lead to errors. Another important point to keep in mind is that if you add phonetic symbols to your text file, you will need to save your text file using UTF-8 encoding.
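As a minimal sketch of why the UTF-8 step matters, the Python snippet below saves a line of text containing phonetic symbols and reads it back. The file name `narration.txt` and the sample transcription are illustrative assumptions, not taken from the course materials.

```python
from pathlib import Path
import tempfile

# Hypothetical script text containing IPA phonetic symbols (illustrative only).
script_text = "pecan: /pɪˈkɑːn/"

# Phonetic symbols fall outside plain ASCII, so an ASCII-only save would fail.
try:
    script_text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Save and reload the text with an explicit UTF-8 encoding.
# A temporary folder is used so the sketch leaves no stray files behind.
with tempfile.TemporaryDirectory() as folder:
    script_path = Path(folder) / "narration.txt"  # hypothetical file name
    script_path.write_text(script_text, encoding="utf-8")
    round_trip = script_path.read_text(encoding="utf-8")
    raw_bytes = script_path.read_bytes()

print(ascii_ok)                    # whether plain ASCII could hold the text
print(round_trip == script_text)   # whether the symbols survived the save
```

In a text editor the equivalent step is usually an encoding option in the Save As dialog.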
I will show you how to do this in a later tutorial. The next tool we recommend using is a tool like the Macmillan Online Dictionary, as it provides word pronunciations and phonetic spellings that you can copy and paste into your text file. Let me show you an example. Pecan. Pecan. Another great online tool you can use for phonemes and phonetic spelling is the IPA TypeIt tool. This tool lets you build a phonetic spelling of words using an online keypad and alphabet, which you can then copy and paste into your text file. Here is a brief demo video showing you how this tool works. The next useful tool is Google Translate. With Google Translate, you can paste text in your language, translate it into another language, then copy and paste the translation into your text-to-speech file. Here is a brief demo video. [Demo audio: the translated text read aloud in other languages.] Another great tool is a file format conversion tool. There are many conversion tools available to choose from. One that I particularly like is online-convert.com, which lets you easily convert all kinds of files and different types of formats for free, including converting MP3 audio files into WAV files and vice versa. Here is a brief video of this tool's interface. Let's move on to text-to-speech conversion tools. We'll start with free tools that let you convert text into audio narrations. Both Google and Amazon provide text-to-speech simulators where developers can test scripts and download audio narrations, but accessing these is a little complicated and requires setting up accounts with the platform. The notes attached to this lesson provide further instructions and tutorials on how to access these TTS simulators.
There are a number of free text-to-speech tools you can access online that let you create audio files from your inputted text. We provide a list of free online text-to-speech tools in the notes that accompany this lesson. Most of the free online tools we tested while putting this course together seem to be fairly limited and didn't accept SSML markup tags. Hopefully, in the future, these tools will improve. Here is a demo video of a free online TTS tool that we tested while putting this course together. House training your puppy. House training your puppy is about consistency, patience and positive reinforcement. The goal is to instill good habits and build a loving bond with your pet. It typically takes 4 to 6 months for a puppy to be fully house trained, but some puppies may take up to a year. In addition to tools you can access through your desktop computer or laptop, there are also a number of text-to-speech mobile apps you can access through your phone. Most iOS and Android phones now come with built-in text-to-speech functionality. All you need to do is activate it on your phone. You can search for text-to-speech apps on your phone just by going to your app store and typing in text to speech. We also provide links to tutorials like this one on how to activate your phone's text to speech in the notes that accompany this lesson. Text to speech allows you or your child to have digital text read aloud. Here's what it looks like. Remember to put your homework in your backpack. If you plan to create professional audio narrations using AI voices, we recommend using paid text-to-speech tools, as you will get access to better support and regular upgrades. Let's go through a couple of tools that we recommend using to convert your text files into audio narrations.
There are two cloud-based text-to-speech tools that we use, depending on the TTS platform you require for your project. For Google WaveNet voice narrations, we use a tool called WaveNet Vocalizer. For Amazon Polly voices, we use a tool called ScriptVocalizer. Both of these tools were used to create the voice narrations for this entire course, and they're both developed by the same company. Both WaveNet Vocalizer and ScriptVocalizer allow you to upload a text file marked up with SSML, convert text into audio, translate the text into different languages and download high-quality audio file recordings for a range of commercial uses. WaveNet Vocalizer outputs audio files in WAV format and ScriptVocalizer outputs audio as MP3. For more information and links to where you can access both of these tools, refer to the accompanying notes for this lesson. We have now covered the main tools you will need to create text-to-speech files. The next couple of slides provide some additional tools and resources you may want to consider using, depending on your needs and what you plan to use text to speech for. A great tool to use if you plan to write your own content for audio narrations is Grammarly. Grammarly scans your text and helps you fix spelling errors and improve your grammar and your communication, and this can ultimately help you create and deliver a more powerful and effective message. Since we cannot think for ourselves yet, AI voice narrators like myself will read whatever you type. So if there are spelling errors in the words, we will read these as presented in your text. Jim, could you come in here, please? Hi, Jim. Hello. I am Harvey, a computer. Jim sucks. Wow. Oh, that's so rude. I'm sorry. I can't control him. Yeah, you can. You know, get Pam for this. Pam. Pam, you look very hot today. Hand me, Harvey. This is Michael's friend. Great. Me so horny. Me love you long Tim. Oh, that is gross. Who's Long Tim? Damn it.
Long time. Me love you long time. Well, you should bring Long Tim in one day. I'd love to meet him. Yeah. You ruined a funny joke. You get out of my office. Okay. Bye, Harvey. Another couple of tools you may want to consider investing in, especially if you plan to start a business offering text-to-speech services or create videos with AI audio narrations, are tools like Snagit and Camtasia. These tools not only let you capture, record and edit screen videos with audio narrations, but you can also use these tools to extract audio from videos posted on other sites and export only the audio soundtrack of these recorded videos. If you plan to offer text-to-speech and video services professionally or just want to create video and audio narrations for your own business marketing and promotions, we provide a list of video creation tools in the accompanying notes. Here, for example, is a quick explainer video created using a video animation software tool called Toonly that my friend George narrated. Hello, I'm George. I am an artificially generated voice narrator. Someone like me can save businesses time and money in areas like video marketing, which everyone knows is one of the most powerful and effective ways to promote products and services online, reach new audiences globally, establish your brand, educate and inform your prospects about your business and train staff, customers and clients. Some great uses for AI voice narrations include sales videos, explainer videos, training videos, video ads, video presentations, podcasts, spoken books, web pages for visually impaired users and so many other uses. Once you know how to convert text to speech, you can create videos with audio narrations like this one quickly and easily using very inexpensive tools. Thank you for watching this video and have a wonderful day.
So in summary, the tools we have covered in this lesson will help you save time and money creating text-to-speech files. The free text, phonetic, conversion and translation tools I have shown you will help you save time creating your text files. I recommend choosing tools like WaveNet Vocalizer and ScriptVocalizer to convert your text files into high-quality audio using Google Text-to-Speech and Amazon Polly voices. And if you plan to use your text-to-speech skills in a commercial environment, either by providing professional services or using these to enhance your own business, then consider investing in video and audio tools to create videos or record and extract audio from other sources. This brings us to the end of this lesson. Once again, thank you for listening, and I will see you in the next lesson.
7. 06 - Text-To-Speech Markup Tutorials: Hello and welcome back. This section of the course includes a number of tutorials that will show you how to mark up your text-to-speech files. This lesson provides an overview of the tutorials, which we have included as separate videos for easier reference. I will show you which SSML markup tags can be used with Google TTS and/or Amazon Polly, and we also provide you with SSML cheat sheets. In the tutorials provided in this section, you will learn how to mark up your text-to-speech files to do things like add pauses and breaks to paragraphs and sentences, add different levels of emphasis to words, and control how special types of words are spoken, such as telephone numbers, dates, times, units of measurement, fractions, and cardinal and ordinal numbers. You will also learn how to censor words, control prosodic elements of speech like pitch, volume and speaking rate, use phonetic pronunciation with certain words, pronounce acronyms and abbreviations, and embed audio files into your scripts.
We also cover additional SSML markup tags that let you add breaths to words, speak words softly or whispered, control the timbre of selected voices, add dynamic range compression and more. To keep things simple, we will only focus on marking up text-to-speech files for the Google Text-to-Speech and Amazon Polly engines. As mentioned in a previous lesson, different text-to-speech engines may not support or allow you to use the same SSML markup tags. So as we go through the tutorials, we will let you know which platforms support the tags being used in the examples. Each tutorial will follow a similar format. The tag will be listed in the slide header, followed by an example of how to use the SSML markup tag and how the text synthesizes into speech after being processed, with an audio example. Symbols in the top right-hand corner of the slide will then indicate whether the markup tag being shown works in Google's TTS engine, Amazon Polly or both. Included in the accompanying notes for this training module, you will find cheat sheets for Amazon Polly and Google's text-to-speech engine. This brings us to the end of this lesson. Please complete the SSML markup tag tutorials in this section before proceeding to the next training module. Thank you for listening and for watching this video.
8. 07 - Text-To-Speech Speak Tag: Hello and welcome back. In this tutorial, you will learn how to use the speak SSML markup tag in your text-to-speech files. The speak tag is the root element of all SSML text. Text must be enclosed within a pair of speak tags to be converted into speech. Add an opening speak tag to the beginning of your text and a closing speak tag to the end of your text file. Here is an example of how to use the speak tag in your text file. Note that all the content you want to convert into speech is enclosed within the opening and closing speak tags.
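The slide itself is not captured in this transcript, so here is a minimal sketch of what text wrapped in speak tags might look like, with Python's standard XML parser used to confirm that the opening and closing tags match. The helper name `is_well_formed` is an assumption made for this illustration, and the check only tests XML structure, not whether a particular TTS engine accepts the markup.

```python
import xml.etree.ElementTree as ET

# Text to be spoken, enclosed in an opening and a closing speak tag.
ssml = (
    "<speak>"
    "Words are singularly the most powerful force available to humanity."
    "</speak>"
)

def is_well_formed(text):
    """Return True if the markup parses as XML (every tag opened is closed)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(ssml)
print(root.tag)    # name of the root element
print(root.text)   # the text that would be converted into speech

# Forgetting the closing tag makes the file invalid.
print(is_well_formed("<speak>This file has no closing tag"))
```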
Let me play you an audio example of how this text will sound after being processed by a text-to-speech engine that can read SSML. Words are singularly the most powerful force available to humanity. We can choose to use this force constructively with words of encouragement or destructively using words of despair. Words have energy and power with the ability to help, to heal, to hinder, to hurt, to harm, to humiliate and to humble. This brings us to the end of this tutorial. Please refer to the accompanying notes in this section for more information.
9. 08 - Text-To-Speech Break Tag: Hello and welcome back. In this tutorial, you will learn how to add pauses to words, sentences and paragraphs in your text-to-speech files using the break tag. We will look at using the break tag and its optional time and strength attributes. Before we explore the break tag in more detail, let's just refresh our memory with the definition of prosody. Prosody refers to areas of language like the tune, rhythm, stress and intonation of speech, and how these features contribute to meaning. Prosodic, therefore, refers to aspects of prosody, which we will cover in another tutorial. The break tag is an empty element: it wraps no text and produces no sound of its own. It controls pausing or other prosodic boundaries between words. Note that using break tags is completely optional. If this element is not present between words, the break will be automatically determined based on how the text-to-speech engine processes the linguistic context. In other words, even if you don't have break tags, a TTS engine will naturally add a pause after finding certain grammatical features like punctuation in your text, such as periods and commas. A break tag, then, allows you to fine-tune the spacing of pauses and breaks between words, sentences and paragraphs. If you add a break tag after a word, sentence or paragraph, a break will be inserted with a prosodic strength greater than if no break element is supplied.
In other words, the text-to-speech engine will determine the linguistic context of your text and increase the natural pause if it detects a break tag in your content. So while a sentence with no break tags will have natural pauses added, adding break tags can extend those pauses and create a more lifelike feel to your narration, as we will see in just a moment. Let's listen to an example of a text file converted into speech without using any break tags. Words have energy and power with the ability to help, to heal, to hinder, to hurt, to harm, to humiliate and to humble. Now let's listen to the same text file converted into speech with break tags added. Words have energy and power with the ability to help, to heal, to hinder, to hurt, to harm, to humiliate and to humble. Were you able to hear the difference? Let's play the two audio files again, one after the other. Words have energy and power with the ability to help, to heal, to hinder, to hurt, to harm, to humiliate and to humble. Words have energy and power with the ability to help, to heal, to hinder, to hurt, to harm, to humiliate and to humble. As mentioned earlier, the break tag also lets you use optional attributes like time and strength. Using a break tag with the time attribute lets you fine-tune your narrations by setting the length of your break or pause using seconds or milliseconds. For example, three seconds or 200 milliseconds. Listen to a sample text file converted into speech with time-based break tags added. Let's pause the sentence for 200 milliseconds, then 500 milliseconds, then one second, then three seconds, and finally, we'll pause it for four hours. I'm just kidding. I think you get the idea now about how pauses and breaks work in your text-to-speech narrations. If you use Amazon Polly to convert your text files into speech, please note that the maximum duration you can specify in the break tag is 10 seconds or 10,000 milliseconds.
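A short sketch of time-based break tags follows, together with a check of each pause against the 10-second Amazon Polly limit mentioned above. The sample sentence, the helper `to_milliseconds` and the constant name are assumptions made for this illustration; the `time` attribute and its `ms` and `s` units are standard SSML.

```python
import xml.etree.ElementTree as ET

POLLY_MAX_BREAK_MS = 10_000  # limit quoted in this lesson for Amazon Polly

# Each break tag is an empty element placed where the pause should occur.
ssml = (
    "<speak>"
    'Pause for 200 milliseconds <break time="200ms"/> '
    'then one second <break time="1s"/> '
    'then three seconds <break time="3s"/> and carry on.'
    "</speak>"
)

def to_milliseconds(value):
    """Convert an SSML time value such as '200ms' or '3s' to milliseconds."""
    if value.endswith("ms"):
        return float(value[:-2])
    if value.endswith("s"):
        return float(value[:-1]) * 1000
    raise ValueError(f"unrecognised time value: {value!r}")

root = ET.fromstring(ssml)
durations = [to_milliseconds(tag.get("time")) for tag in root.iter("break")]
too_long = [d for d in durations if d > POLLY_MAX_BREAK_MS]

print(durations)   # pause lengths in milliseconds, in document order
print(too_long)    # any pauses that exceed the Polly limit
```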
Here is an example of a text file marked up using the break tag with different time attributes. Listen to the synthesized speech narration of this text. Words are singularly the most powerful force available to humanity. We can choose to use this force constructively with words of encouragement or destructively using words of despair. Words have energy and power with the ability to help, to heal, to hinder, to hurt, to harm, to humiliate and to humble. Using a break tag with the strength attribute also lets you fine-tune your narrations by setting the length of your breaks or pauses using relative values such as extra strong, strong, medium, weak and extra weak. Additionally, you can use the value none to prevent a prosodic break or pause that your text-to-speech processor would otherwise produce and insert into your narration. Please note that if using Amazon Polly to convert your text into speech, strength attribute values are seen as equivalent to pausing after a comma, sentence or paragraph. Specifying none creates no pause. Use none to remove a normally occurring pause, such as pauses inserted after a period. Specifying extra weak has the same strength as none, that is, no pause. Specifying weak sets a pause of the same duration as the pause after a comma. Medium has the same strength as weak. Strong sets a pause of the same duration as the pause created after a sentence, and specifying extra strong sets a pause of the same duration as the pause created after a paragraph. Additionally, if you don't use attributes with the break tag when processing text to speech with Amazon Polly, the results vary depending on your text. If there is no other punctuation next to the break tag, it creates a break strength of medium value, which is the equivalent of a comma-length pause. If the tag is next to a comma, it upgrades the tag to a strong break tag, which is the equivalent of a sentence-length pause.
If the tag is next to a period, it upgrades the tag to an extra strong break tag, or the equivalent of a paragraph-length pause. Here is an example of a text file marked up using the break tag with different strength attributes. Listen to the synthesized speech narration of this text. Let's create pauses in this sentence using break tags with the strength option. Let's start with an extra strong break. Then a strong break, followed by a medium break, a weak break, an extra weak break and finally a break between the vowels A, E, I, O, U, and no break at all between the vowels A E I O U. As you can see, the break tag lets you specify exact pause durations between words, sentences and paragraphs and can be used to enhance the lifelike aspect of your voice narrations. This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information, and thank you for listening.
10. 09 - Text-To-Speech Paragraph Tag: Hello and welcome back. In this tutorial, you will learn how to add pauses between sentences and paragraphs using paragraph and sentence markup tags. In a previous tutorial, we explained how to use break tags to add pauses to words, sentences and paragraphs. You can see from this table that some break elements perform the same function as using a sentence or paragraph tag. So in addition to using break tags, you can add a pause between paragraphs in your text using the p tag. This is equivalent to specifying a pause using an extra strong break tag. The p tag provides a longer pause than native speakers usually place at commas or the end of a sentence. To use p tags, you must enclose the paragraph by adding an opening tag at the beginning of the paragraph and a closing tag at the end, as shown in the example below. This is the first paragraph. There should be a pause after this text is spoken. This is the second paragraph.
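The two paragraphs read out above could be marked up roughly as in the sketch below, which again uses Python's standard XML parser only to confirm that each paragraph has a matching pair of p tags. The variable names are illustrative assumptions, and the parser does not produce any audio.

```python
import xml.etree.ElementTree as ET

# Each paragraph sits between an opening <p> and a closing </p> tag,
# and the whole script is still enclosed in speak tags.
ssml = (
    "<speak>"
    "<p>This is the first paragraph. "
    "There should be a pause after this text is spoken.</p>"
    "<p>This is the second paragraph.</p>"
    "</speak>"
)

root = ET.fromstring(ssml)
paragraphs = [p.text for p in root.findall("p")]

print(len(paragraphs))   # number of paragraphs found inside the speak tags
for text in paragraphs:
    print(text)
```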
Here's an example of a text file converted into speech using paragraph tags. Words are singularly the most powerful force available to humanity. We can choose to use this force constructively with words of encouragement or destructively using words of despair. Words have energy and power with the ability to help, to heal, to hinder, to hurt, to harm, to humiliate and to humble. Note that p tags can include text to be rendered and the SSML elements shown in this list. You can also add pauses between sentences in your text using the s tag. This is equivalent to ending a sentence with a period or specifying a pause using a strong break tag. S tags are useful for adding pauses to verses and lines of poetry, as you will see in just a moment. To use s tags, you must enclose the sentence with opening and closing tags, as shown in the example below. Mary had a little lamb whose fleece was white as snow. And everywhere that Mary went, the lamb was sure to go. Much like p tags, s tags can include text to be rendered and the SSML elements shown in this list. To conclude this tutorial, I want to play you a well-known children's story that has been marked up using break, paragraph and sentence tags. Fox in socks by Dr Seuss Fox Socks Box Knox Knox in Bucks Fox in socks. Knocks on Fox in socks, Inbox socks on knocks and knocks in Bucks Fox in socks on box on Knox. Chicks with bricks come chicks with blocks come chicks with bricks and blocks and clocks. Come look. Look, Mr Knox, let's do tricks with bricks and blocks, sir, let's do tricks with chicks and clocks. First, I'll make a quick trick brick stack. Then I'll make a quick trick block stack. You can make a quick trick chick stack. You can make a quick trick clock stack, and here's a new trick. Mr. Knocks Socks on chicks and chicks on Fox Fox on clocks on bricks and blocks, Bricks and blocks on knocks on box. Now we come to ticks and talks. Try to say this, Mr Knox, sir. Clocks on fox tick clocks on lock stock six. 
Sick bric stick. 66 chicks Talk, please, sir. I don't like this trick, sir. My tongue isn't quick or flick, sir. I get Although sticks and clocks mixed up with the chicks and talks self, I can't do it. Mr Fox, I'm so sorry, Mr Lock. Sir. Here's an easy game to play. Here's an easy thing to say. New socks to socks Who sucks? Sue socks. Who? SOS. Whose socks? Suso Sue socks. Who sees who. So who's new socks? Sir? You see, Suso sues new socks. That's not easy, Mr Fox, sir. Who comes? Crow comes slow. Joe Crow comes Who? Sos Crows Clothes Sue SOS crows Clothes slow Joe Crow SOS Whose clothes sews clothes Suso socks of fox in socks Now slow Joe Crow SOS knocks Inbox Now Sue SOS Rose on slow Joe Crows Clothes Fox SOS hose on slow Joe Crows knows hose goes rose grows knows hose goes, um crows Rose. Gross. Um, Mr Fox, I hate this game, sir. This game makes my tongue quite lame, sir. Mr. Knox, sir. What a shame, sir. We'll find something new to do. Now here is lots of new blue goo now. New goo blue goo gooey, gooey blue goo, New goo, Louie! Louie! Louie grew for chewy chewing. That's what that goose is doing. Do you choose to to go to sir? If so, you said she used to chew, sir, with the goose. Too sad, dude. Sir. Mr. Fox. Sir. I won't do it. I can't say it. I won't chew it very well, sir. Step this way. We'll find another game to play him comes. Then Comes been. Brings Ben Broom. Ben Brings been broom Ben Ben's beams. Broom Been Ben's, Ben's Room. Bim stands. Benj pens. Ben's bent broom breaks in spent broom breaks. Ben's band. Kim's band. Big bands, pig bands, Human Ben lead vans with brooms, Ben stand bangs and VIMs Band booms Pig band boom band. Big band Broom band. My full mouth. Can't say that. No, sir. My poor mouth is much too slow, sir. Well, then bring your mouth this way. I'll find it. Something it can say. Luke, Luck likes lakes, Luke stuck likes lakes, Luke Le clicks lakes Luke stuck clicks lakes, duck takes Lixian Lakes. Luke, Luck likes Luke. 
Luck takes legs in Lake Stuck likes I can't love such flipper blubber. My tongue isn't made of rubber, Mr Knox. Now, come now. Come now. You don't have to be so dumb now. Try to say this, Mr Knox, please. Through three cheese trees, three free fleas flew while these please Flu freezing breeze blew freezing breeze made thes three trees. Freeze freeze He trees made thes trees Cheese Freeze! That's what made these three free flee sneeze Stop it! Stop it! That's enough! Sir, I can't say such silly stuff, sir. Very well, then, Mr Knox. Sir, Let's have a little talk about Tweedle Beatles What do you know about Tweedle Beatles? Well, when Tweedle Beetles fight, it's called a tweet Will beetle battle? And when they battle in a puddle, it's a tweet Will beetle puddle battle? And when Tweedle Beatles battle with paddles in a puddle, they call it a tweet. Will beetle paddle paddle battle And when Beatles battle beetles in a puddle paddle battle in the Beetle battle puddle is a puddle in a bottle. They call this a tweet or beetle Buttle, puddle paddle battle medal. And when beetles fight these battles in a bottle with their paddles in the bottles on a poodle in the poodles eating noodles, they call this a muddle. Puddle tweet Will poodle beetle noodle bottle paddle battle. And now wait a minute, Mr Socks Fox. When a fox is in the bottle, where the Tweedle Beatles battle with their paddles in a puddle on a new deleting poodle, this is what they call a tweet Will Beetle noodle, poodle bottle cuddled. Muddle doubled. Huddled, waddled. Fox in socks, sir. Fox in socks. Our game is done, sir. Thank you for a lot of fun, sir. This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information and thank you for listening. 11. 10 - Text-To-Speech SayAs Tag: Hello and welcome back. 
In this tutorial, you will learn how to use SSML tags to interpret how text should be spoken for special characters, certain kinds of words and different types of numbers. The say-as element lets you specify how certain characters, words and numbers in your text-to-speech file should be spoken. A say-as tag requires using the interpret-as attribute, which determines what is being processed. Optional attributes like format and detail can also be used, depending on the element selected. Let me explain what this means. When marking up text using a say-as element, you should always include the interpret-as attribute inside the opening tag, specifying how your special characters, words and numbers should be spoken. For example, if you are marking up numbers, does the number represent a date or time? Is it a telephone number? Is it the number 10 or the 10th object in a row? Is it a fraction or a unit of measurement? We will go through how to mark up each of these values in more detail in this tutorial. Also, depending on the value being marked up, you may need to specify additional attributes like format and detail, especially for values like date and time, which could be spoken in several different ways. The say-as element lets you specify how you want your text to be spoken for the following items. Cardinal numbers: this interprets numerical text as a cardinal number, such as 5, 400 or 1234. Ordinal numbers: this interprets numerical text as an ordinal number, such as 5th, 400th or 1234th. Characters: use this value to spell out each letter of your text, such as A B C. Fractions: this interprets the numerical text as a fraction. Use this value for both common fractions, such as 3/20, and mixed fractions, such as 2 1/2. Expletives: use this value to bleep or censor any content or words within the tag using a sound. Units: this interprets numerical text as a measurement, such as 1/2 inch, 12 ounces, five feet, one meter or 200 milliseconds. Verbatim or spell-out: 
This value is similar to using characters and spells out words letter by letter. Dates: use this value for dates, such as the 29th of January 1993. Time: use this value for time, such as 5:48 PM. Telephone numbers: use this value to indicate that the text is a telephone number. In addition to the above values, Amazon Polly also lets you use values such as digits, which lets you spell out each digit in your text individually, such as 1 2 3 4, and address, which interprets text as part of a street address. One other value that we will look at in this tutorial is using interjections in your narrations, which can add an element of fun to your text-to-speech files. Let's begin with cardinal numbers. Cardinal numbers are just numbers like 5, 400 or 1234. The structure for marking up text to interpret cardinal numbers correctly is shown below. Note that the language you select affects how cardinal numbers are spoken. For example, listen to how a US English voice and a UK English voice pronounce the numbers below. The price of this item is $12,345. The price of this item is $12,345. As you can hear, a US English voice says the number 12,345, whereas a UK English voice says 12,345. Listen to the two voices again. The price of this item is $12,345. The price of this item is $12,345. In some cases, your text-to-speech engine will recognize cardinal numbers without the need to use markup tags. Additionally, some text-to-speech engines recognize the value number instead of cardinal in the interpret-as attribute. Listen to a synthesized speech recording of a text file marked up for interpreting cardinal numbers. The height of Mount Everest is 8848 meters, or 29,029 feet. The price of this item is $12,345. The average rent in this area is $2500 per month. Ordinal numbers are numbers like 1st, 2nd, 3rd, 5th, 13th, 400th, 1234th, etcetera. The structure for marking up text to interpret ordinal numbers correctly is shown below. Like cardinal numbers, the language you select affects 
how ordinal numbers are spoken. For example, listen to how a US English voice and a UK English voice pronounce the numbers below. Today is the 350th anniversary of the revolution. Today is the 350th anniversary of the revolution. As you can hear, a US English voice says the number 350th, whereas a UK English voice says 350th. Listen to the two voices again. Today is the 350th anniversary of the revolution. Today is the 350th anniversary of the revolution. Some text-to-speech engines may recognize ordinal numbers written as 2nd, 3rd, 17th, etcetera, without the need for using markup tags. Amazon Polly can also interpret ordinal numbers written as Roman numerals. If in doubt, you can just write the number out, but this is not necessary if you use the ordinal markup tag correctly. Listen to the synthesized speech of the text below, which is written with ordinal numbers and no markup tags. The second time she came to the library, she walked out with a copy of the book's third edition before running up to the 17th floor. I don't know if this was her first time or her 100th time visiting the library. Listen to a synthesized speech recording of a text file marked up for interpreting ordinal numbers. Kevin came first in the annual office marathon. Dwight came in a close second. Creed, third. Pam beat her personal best by being the seventh to cross the finishing line. Stanley came ninth and Michael finished last in 29th place. Here is another variation of text marked up for ordinal numbers. Listen to the synthesized speech of the text below, marked up for Amazon Polly. James Charles Stuart was both King of Scotland as James the Sixth and King of England and Ireland as James the First from the year 1603 until his death in 1625. The characters element allows you to spell out words and numbers in your narrations. The structure for marking up text to interpret characters correctly is shown below. 
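The on-screen structures for cardinal, ordinal and characters markup are not captured in this transcript. As a rough sketch only, the Python below builds say-as tags with the required interpret-as attribute and checks that the result is well-formed XML. The helper name say_as is made up for this illustration and is not part of any text-to-speech library, and the sample sentences are adapted from the lesson.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def say_as(text, interpret_as, **attrs):
    """Wrap text in a <say-as> tag (hypothetical helper, not a library call)."""
    # interpret-as is the required attribute; it always comes first.
    parts = [f"interpret-as={quoteattr(interpret_as)}"]
    # Optional attributes such as format or detail are passed as keywords.
    parts += [f"{name}={quoteattr(str(value))}" for name, value in attrs.items()]
    return f"<say-as {' '.join(parts)}>{escape(text)}</say-as>"


ssml = (
    "<speak>"
    "The height of Mount Everest is " + say_as("8848", "cardinal") + " meters. "
    "Today is the " + say_as("350", "ordinal") + " anniversary of the revolution. "
    "How do you spell " + say_as("FBI", "characters") + "?"
    "</speak>"
)

# Parsing confirms the markup is well-formed before it is pasted into an engine.
root = ET.fromstring(ssml)
print([tag.get("interpret-as") for tag in root.iter("say-as")])
```

Whether a given engine accepts cardinal, number or both is engine-specific, as the lesson notes, so treat the attribute values here as a starting point to test in your own tool.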
Some text-to-speech engines can recognize and pronounce abbreviations like Triple A and spell out abbreviated words without adding markup tags to text like CIA, FBI, KGB, BBC, etcetera. The kidnappers were now the vehicle fled the scene in this direction. Your eyes are in backwards. It went the other way. Put a cork. How do you spell FBI? Cry? Listen to a synthesized speech recording of a text file marked up for interpreting characters. WHO is W H O. OU812 was the title of Van Halen's eighth studio album. So, are we going to take the dog for a W A L K before it starts raining? Using the verbatim or spell-out elements performs the same function of spelling out words and numbers as using characters. The structure for marking up text to interpret these elements correctly is shown below. Listen to a synthesized speech recording of a text file marked up for interpreting the characters, verbatim and spell-out elements. All I'm asking is for a little R E S P E C T. Find out what it means to me. R E S P E C T. Take care, TCB. You just a little bit when you come home. R E S P E C T. Another element you can use in your text-to-speech markup is called digits. Digits performs a similar function to verbatim, spell-out and characters, but it only works with numbers, not words. The structure for marking up text to interpret digits correctly is shown below. Using the digits tag with Google TTS works with numbers, but if you try to process words, you will get an error and no sound will play. Using digits with Amazon Polly works with numbers but does not spell out words. Instead, it just speaks the word. Listen to a synthesized speech recording of a text file marked up for interpreting numbers and words using the digits and spell-out attributes. Please write down this security number: 1 2 3 4 5. Please write down this security number: 1 2 3 4 5. Please write down this security word: self love. Please write down this security word: S E L F space L O V E. 
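Since digits only works with numbers, one way to avoid the error or the unspelled word described above is to choose the interpret-as value from the text itself. This is only a sketch: the function name spell_individually is invented for this example, and the rule it applies (digits for purely numeric text, spell-out otherwise) is an assumption drawn from the behaviour described in the lesson.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def spell_individually(text):
    """Pick digits for purely numeric text and spell-out for anything else."""
    # Hypothetical rule: only plain ASCII digits such as "12345" count as numeric.
    is_numeric = text.isascii() and text.isdigit()
    interpret_as = "digits" if is_numeric else "spell-out"
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


ssml = (
    "<speak>"
    "Please write down this security number: " + spell_individually("12345") + ". "
    "Please write down this security word: " + spell_individually("self love") + "."
    "</speak>"
)

ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```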
Another useful element for marking up numerical text is fractions. This works for both common fractions, such as 3/20, and mixed fractions, such as 2 1/2. The structure for marking up text to interpret fractions correctly is shown below. Some text-to-speech engines can interpret fractions in your text files, like 2/9, 2 3/4, etcetera, without using markup tags. For Amazon Polly to interpret mixed numbers as fractions, a plus symbol must be added between numbers in the marked-up text, such as 3+1/2. Amazon Polly doesn't support a mixed number without the plus symbol. Listen to a synthesized speech recording of a text file marked up for interpreting fractions. Almost 2/5 of U.S. adults aged 20 and older suffer from obesity. Do you know how to divide six by 3 1/2 without using a calculator or asking Google? We all know that dividing 22 by seven, or 3 1/7, is a good approximation to pi, but 355 divided by 113, or 3 16/113, is an even closer approximation to the true value of pi. The expletive element lets you create the effect of censoring words in your narration using a sound. The structure for marking up text to interpret expletives correctly is shown below. Listen to a synthesized speech recording of a text file marked up for interpreting expletives. So I says to him, I made what the you mean and he says back to me, made you I can wherever I like. So I says back to him will make If that's the case, then you better before I blow and that's exactly what happened. Officer, I swear. The units element lets you interpret numerical text as a measurement. For Amazon Polly, the value in your text should be either a number or a fraction followed by a unit of measurement with no space in between, such as 1/2inch, or just the unit, as in 1meter. The structure for marking up text to interpret units correctly is shown below. Some text-to-speech engines can recognize and interpret units without the need to use markup tags in your text. 
For example, 10 milliseconds, 100 kilometers, five degrees Celsius, 350 milliliters, 75 meters, etcetera. Additionally, some text-to-speech engines can automatically convert units of measurement into their singular or plural form, depending on the number. Listen to a synthesized speech recording of a text file marked up for interpreting units. The waves at the beach this morning must have been 10 feet high. On average, the blink of an eye lasts only 1/10 of a second, or 100 milliseconds. The emergency dose of adrenaline to revive someone who has gone into anaphylactic shock is 0.1 milligrams per kilogram of a one milligram per milliliter dilution, to a maximum dose of 0.5 milligrams in an adult and 0.3 milligrams in a child. When building a deck for your patio, set up the bearer spacings at a minimum of 1800 millimeter centres, with stump holes no more than 1500 millimeters apart. The date element lets you interpret dates in various formats as spoken text. Google Text-to-Speech and Amazon Polly interpret dates using slightly different markup structures, so let's go through each of these separately, starting with Google TTS. The structure for marking up text to interpret dates correctly using Google Text-to-Speech is shown below. Note that the date tag contains the required interpret-as attribute, plus two additional attributes, format and detail. One other thing to note is that dates used in the text field can be separated using punctuation such as hyphens, spaces and even no spaces, as shown in the example below. Let's talk about the format attribute of the date element. The format attribute uses the characters y, m and d for year, month and day of the month, respectively. As we will see in a moment, you can use various combinations of these three characters in the format field. There are, however, a couple of rules to follow. If the format attribute includes the character y, then the date text field must include a year, for example, the year 1965. 
If the format attribute includes the character m, then the date text field must include a month, for example, March, September, December, etcetera. If the format attribute includes the character d, then the date text field must include the day of the month, such as the 7th, 24th or 31st of the month. Additionally, if the character y is included in the format attribute, then the year must be written as a four-digit number, so write the year as 1978, not just 78. If the character d is included in the format attribute as a single character, then you can use a single digit for days, like the fifth of the month. If the format contains two d's, then use double-digit numbers for days, such as 05. The same applies to months. If the character m is included in the format attribute as a single character, then you can use a single digit, like 9 for the month of September or 4 for April. If the format uses double month characters, then make sure that all month numbers are double digits, like 04 for April, 09 for September, etcetera. Next, we have the detail attribute. The detail attribute controls the spoken form of the date. You have two options, option one and option two. Let's talk about option one first. If detail equals one, only the day field and one of the month or year fields are required, although both fields can be supplied. Option one is the default structure for interpreting dates when fewer than all three fields are given in the format attribute. Typically, you won't need to add the detail one attribute to the markup tag if this is the default structure for interpreting dates, as the text-to-speech engine should automatically switch to this format. The spoken form for option one is the ordinal day of the month, then the month and year, so in the examples shown below, the spoken form of the text would be the 19th of May 1991 for the first example and the second of March 
for the second example. If detail equals two, the day, month and year fields are required. Option two is the default structure for interpreting dates when all three fields are supplied in the format attribute. Typically, you won't need to add the detail two attribute to the markup tag if this is the default structure for interpreting dates, as the text-to-speech engine should automatically switch to this format. The spoken form for option two is month, ordinal day and year. So in the examples shown below, the spoken form of the text would be January 15th 1929 for the first example, March 14th 1879 for the second example and September 5th 1946 for the last example. Before we talk about marking up text for interpreting dates using Amazon Polly, let's hear some spoken examples of text marked up using the formats we have just discussed. First, listen to a synthesized speech recording of a text file marked up using different date spacing options. I was born on November 16th 1968. My sister was born on June 22nd 1971. My brother was born on February 10th 1974. Now listen to a synthesized speech recording of a text file marked up using different date format and detail values. My family and I migrated to this country. We arrived here on the 26th of June 1952. I have another appointment with the chiropractor on the ninth of September. Albert Einstein won the Nobel Prize for Physics on November 9th, 1922, for his services to theoretical physics and for his discovery of the law of the photoelectric effect. Let's talk now about marking up text for interpreting dates using Amazon Polly. The structure for marking up text to interpret dates using Amazon Polly is shown below. Note that the date element contains one additional attribute, format. Separate the date elements in the text field using hyphens, except when using the format yyyymmdd. All the date formats listed here can be used with Amazon Polly. Here is a useful tip when using Amazon Polly to interpret dates. 
If you use the yyyymmdd format, you can make Amazon Polly skip parts of the date using question marks. Specifying the format attribute in the markup tag is also not required. For example, Amazon Polly renders the examples shown below as follows: the 22nd of September, and September 1989. Listen to a synthesized speech recording of a text file marked up for Amazon Polly using different date formats. Game of Thrones aired its first episode on HBO on April 17th 2011. After eight seasons, the final episode of Game of Thrones went to air on May 19th 2019. King John of England signed the Magna Carta on June 15th 1215. Did you know that January 4th is National Spaghetti Day, and that November 10th is National Vanilla Cupcake Day? Talk about celebrating carbs. Many people panicked as they believed that cataclysmic events would occur after December 2012, when the ancient Mayan calendar came to an end. Julius Caesar crossing the Rubicon River in January 49 BC was the event that precipitated the Roman Civil War. We get paid each month on the 15th. Our wedding anniversary is in August. In 1964, Xerox Corporation introduced the first commercialized version of the modern fax machine, but until someone else bought one of their machines, they had no one else to send faxes to. On July 21st 1969, Neil Armstrong became the first human being to walk on the surface of the moon, but Buzz Aldrin was the first man to skip and urinate on it. The time element lets you interpret time in different formats as spoken text. Google Text-to-Speech and Amazon Polly interpret time values differently, so let's go through each of these separately, starting with Google TTS. The structure for marking up text to interpret time correctly using Google Text-to-Speech is shown below. Note that the time tag contains the required interpret-as attribute plus two additional attributes, format and detail. 
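Both the date tag and the time tag described for Google Text-to-Speech share the same shape: interpret-as plus optional format and detail attributes. The sketch below builds both, assuming the attribute names given in this lesson. The helper google_say_as is a made-up name for illustration, and the exact text spellings an engine accepts (for instance 4:26pm) should be checked in your own tool.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def google_say_as(kind, text, fmt=None, detail=None):
    """Build a date or time <say-as> tag (hypothetical helper for illustration)."""
    if kind not in ("date", "time"):
        raise ValueError("kind must be 'date' or 'time'")
    # The lesson describes exactly two detail options, 1 and 2.
    if detail not in (None, 1, 2):
        raise ValueError("detail must be 1, 2 or omitted")
    attrs = [f"interpret-as={quoteattr(kind)}"]
    if fmt is not None:
        attrs.append(f"format={quoteattr(fmt)}")
    if detail is not None:
        attrs.append(f"detail={quoteattr(str(detail))}")
    return f"<say-as {' '.join(attrs)}>{escape(text)}</say-as>"


# All three date fields supplied, so detail 2 (month, ordinal day, year) applies.
date_tag = google_say_as("date", "1929-01-15", fmt="yyyymmdd", detail=2)
# Hour and minute in 12-hour time; hms12 is the default format named in the lesson.
time_tag = google_say_as("time", "4:26pm", fmt="hms12")

ssml = f"<speak>I was born on {date_tag}. We meet at {time_tag}.</speak>"
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```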
One other thing to note is that time values used in the text field can be separated using punctuation and/or spaces, as shown in the example below. The format attribute uses a sequence of time field character codes, h, m, s, Z, 12 and 24, for hour, minute of the hour, second of the minute, time zone, 12-hour time and 24-hour time, respectively. The default format is hms12. If hour, minute or second are not specified in the format, or there are no matching digits, the field is treated as a zero value. Time can be interpreted as hour of the day, for example 4:26 PM, or time duration, such as four hours and 20 minutes. The detail attribute controls whether the spoken form of the time is 12-hour time or 24-hour time. You have two options, option one and option two. The spoken form is 24-hour time if detail equals one, or if detail is omitted and the format of the time is 24-hour time. The spoken form is 12-hour time if detail equals two, or if detail is omitted and the format of the time is 12-hour time. Listen to a synthesized speech recording of a text file marked up for Google Text-to-Speech using different time formats. 4:26 PM. Two hours, seven minutes and nine seconds. 16 hours, 39 minutes and 57 seconds, Pacific Standard Time. 6:22 Eastern Standard Time. Five o'clock. 1700. 5 PM. Five. Amazon Polly interprets the time element of numerical text as duration in minutes and seconds and can also recognize basic time formatting. The structure for marking up text to interpret time using Amazon Polly is shown below. Listen to a synthesized speech recording of a text file marked up for interpreting time using Amazon Polly. One minute and 21 seconds. 4:26 PM. Five o'clock. 1700 hours. 3:18. The telephone element indicates that the contained text is a telephone number. Google TTS and Amazon Polly interpret telephone values slightly differently, so we'll cover both of these processes. 
Separately, the structure for marking up text to interpret telephone numbers correctly using Google Text-to-Speech is shown below. Note that the telephone element lets you add international codes in the format field. The Google Text-to-Speech engine will interpret international codes correctly in the text field, even if the country code present in the format attribute does not match it. Additionally, it will interpret phone number extensions and even phone words. Listen to a synthesized speech recording of a text file marked up for Google Text-to-Speech using telephone numbers. 5556789 5556789 Extension 345 plus 3 +98 OO +123456 plus 3 +98 OO +123456 six Saito 5556789 16 Saito 5556789 +18662255631 +155574992 Amazon Polly interprets the numerical text as a seven-digit or 10-digit telephone number. Telephone extensions can also be included. Please note that at the time of recording this lesson, the telephone option could only be used with English language voices. The structure for marking up text to interpret telephone numbers correctly using Amazon Polly is shown below. Another thing to keep in mind is that Amazon Polly can interpret phone numbers in text without markup tags if dashes are used in the telephone numbers. Also, please note that the language you select affects how telephone numbers are spoken. For example, listen to the difference between how an American English voice says the telephone number below and how a UK English voice says it. Veronica's telephone number is 2122241555, extension 666. Veronica's telephone number is 212 double two 41 triple five, extension triple six. Listen to a synthesized speech recording of a text file marked up for interpreting telephone numbers using Amazon Polly. 5551212 20 to 5551212 20 to 5551212 Extension. 345 5556789 5556789 Extension. 345 6805556789 16805556789 One additional element you can use with Amazon Polly voices is the address element, which lets you interpret text as part of a street address. 
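The telephone and address markup described above can be sketched in the same way as the earlier tags. The helper names telephone and address are invented for this example, the sample numbers are placeholders, and passing the country code through the format attribute follows what the lesson says about Google Text-to-Speech, so treat it as an assumption to confirm against your engine's documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def telephone(number, country_code=None):
    """Build a telephone <say-as> tag (hypothetical helper for illustration)."""
    attrs = 'interpret-as="telephone"'
    # Assumption from the lesson: the international code goes in the format field.
    if country_code is not None:
        attrs += f" format={quoteattr(str(country_code))}"
    return f"<say-as {attrs}>{escape(number)}</say-as>"


def address(text):
    """Build an address <say-as> tag, described in the lesson for Amazon Polly voices."""
    return f'<say-as interpret-as="address">{escape(text)}</say-as>'


ssml = (
    "<speak>"
    "Call " + telephone("555-6789") + ", or from overseas "
    + telephone("(212) 555-1212", country_code=1) + ". "
    "Send mail to " + address("5940 Ferguson Road") + "."
    "</speak>"
)

ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```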
The structure for marking up text to interpret an address correctly using Amazon Polly is shown below. Listen to a synthesized speech recording of a text file marked up for addresses. 14 slash 72 53 The Boulevard, Springfield, 63103, Missouri, USA. Apartment 69, 188 Grand Central Tower, Cloudburst County, New South Wales 2177, Australia. 5940 Ferguson Road, Richmond, British Columbia V7B 1M6, Canada. The last element I want to cover before the end of this lesson is interjections. Interjections, also called speechcons, can be added to text using the markup tag shown below. Please note that speechcons are a custom library created for Amazon Alexa. During the recording of this lesson, speechcons were not available for Amazon Polly voices. So what I'd like to do is just play you a recorded screen video of various speechcons so you can hear what these sound like. Abacha. Deborah got to uh huh. Him. Ahoy! All righty. I low, huh? Yoga. Argh! Areva! Daraji! As you wish. Bar voie a man. Ah, Botta bing bada boom bah, humbug bam, Bang, Batter up, Zynga baby Bingo, blah lard Last Boeing bone uppity. Both your bon voyage Osh Boo hoo hoo! Boom! Booyah! Bravo, bomber Car ching! Checkmate! Cheerio. Cheers. Cheer up. Trip choo choo clank Click clack Cock a doodle. Ooh! Coup! Cowabunga! Darn! Kim Dong! Ditto. Don't dot, dot dot Duh. Dumb. Don't! Don't done dynamite. Ik it. Encore on guard! Eureka! Fancy that, Geronimo! Giddy up! Good grief. Good luck. Good riddance. Gotcha! Great. Scott, Heads up! Hear, Hear! Hip, hip! Hooray! Hiss, hog, Patty! Hurrah! Hooray! Huzzah! Jeepers creepers! Jiminy Cricket, Jenks. Just kidding. Kaboom! Cobb lamb coaching Kapow Chao Co Xam ca bam ka boom! Coaching could chew ca flop. Could plop kerplunk. Kapow her slat her sump! Knock, knock! Miss, I look out! Mamma Mia! Man overboard! Maazel toff me out. Messi Who? No, no, no, No. Meaner! Meaner. No way. Now, now how, boy? Oh, brother! Oh, dear. Oh, my Oh, snap Link! Okay. Dokey. Poof! 
La la Open sesame! Ouch! Boy, you fuII Pim Club Poof! Pump. How quack! Read em and weep. Ribbit, Right. Oh, Roger. Retro shocks slash Spoiler alert. Squeaky swish. Swoosh! Uh, Toyota. He he there's there. Sump tick, tick tick, Tic tac. Touche! Tisk, tisk, Tweet! Uh huh. Uh oh. Voula from Whoa! Not want. Watch out! Way to go! Well done! Well, well. Wham, whammo! We que wolf! Whoops! A daisy Who? Wow! Wow! Za wowser yada, yada, yada. Yea, Yikes! Maybe you ink You know who you bet. Yowza! Yeah, Hauser. Yuck! Yum, zap, zing! Zoinks! This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information and thank you for listening. 12. 11 - Text-To-Speech Emphasis Tag: Hello and welcome back. In this tutorial, you will learn how to use SSML tags to emphasize certain parts of your text. As Google Text-to-Speech and Amazon Polly interpret emphasis differently, we will cover both of these separately. The emphasis element is used to emphasize text. This element modifies speech similarly to prosody, but without the need to set individual speech attributes. The emphasis element supports an optional level attribute, which changes the degree of emphasis added to text. The structure for marking up text to interpret emphasis using Google Text-to-Speech is shown below. Google Text-to-Speech supports the following emphasis levels: strong, moderate, reduced and none. Listen to a synthesized speech recording of a text file marked up using different emphasis levels. Give it back, said Sue, as her brother hid the treat in his pocket. No, it's mine, said Tim, fending off his little sister. I'm warning you, said Sue, advancing menacingly. Oh, no, I'm so scared, said Tim with a smirk. You better be. I'm telling Mom, said Sue, wagging her finger in his face. Let's talk now about marking up text for interpreting emphasis using Amazon Polly. The structure for interpreting emphasis 
using Amazon Polly is the same, but with Amazon Polly, emphasis changes the rate and volume of the speech. More emphasis makes Amazon Polly speak the text louder and slower, and less emphasis makes it speak quieter and faster. Amazon Polly supports the following emphasis levels. Strong increases the volume and slows down the speaking rate, so the speech is louder and slower. Moderate increases the volume and slows down the speaking rate, but not as much as when set to strong. If the level is not included in the markup tag, then Amazon Polly processes emphasis at the moderate level, as this is the default setting. Reduced decreases the volume and speeds up the speaking rate, so the speech is softer and faster. Listen to a synthesized speech recording of a text file marked up for Amazon Polly using different emphasis levels. Give it back, said Sue, as her brother hid the treat in his pocket. No, it's mine, said Tim, fending off his little sister. I'm warning you, said Sue, advancing menacingly. Oh, no, I'm so scared, said Tim with a smirk. You better be. I am telling Mom, said Sue, wagging her finger in his face. This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information and thank you for listening. 13. 12 - Text-To-Speech Prosody Tags: To see a world in a grain of sand and a heaven in a wild flower, hold infinity in the palm of your hand and eternity in an hour. To see a world in a grain of sand and a heaven in a wild flower, hold infinity in the palm of your hand and eternity in an hour. Hello and welcome back. In this tutorial, you will learn about prosody and how to use SSML tags to change prosodic elements in your text-to-speech files. I will explain what prosody means, talk about some related terms and show you how to change the pitch, volume and rate of your spoken text. 
Prosody refers to areas of language like the tune, rhythm, stress and intonation of speech, and how these features contribute to meaning. Prosodic refers to attributes and aspects of prosody. The prosody element is used to customize the pitch, volume and speaking rate of your tagged speech. The structure for marking up text to interpret prosody is shown below. If you record the same text using different voices, you can see that some voices will say the same thing at a slower or faster rate of speech. Volume, speech rate and pitch are dependent on the specific voice selected. In addition to differences between voices for different languages, there are differences between individual voices speaking the same language. Because of this, while attributes are similar across all languages, there are clear variations from language to language. This means that there are no absolute values, only relative values. Relative values can be written as a percentage, that is, a number preceded by a plus or minus sign and followed by a percentage symbol, for example, plus 15.2% or minus 8%, or as a relative number. For pitch attributes, relative changes can be given in semitones, using a number preceded by a plus or minus sign, followed by st, which stands for semitones, for example, plus 0.5 semitones, plus five semitones, minus two semitones, etcetera. Note that the unit st is case sensitive. A semitone is half of a tone, or a half step, on the standard diatonic scale. Listen to a synthesized speech recording of a text file marked up using different prosody attributes. Quantum computing is the use of quantum mechanical phenomena, such as superposition and entanglement, to perform computation. Quantum computing is the use of quantum mechanical phenomena, such as superposition and entanglement, to perform computation. Quantum computing is the use of quantum mechanical phenomena, such as superposition and entanglement, to perform computation. 
Quantum computing is the use of quantum mechanical phenomena, such as superposition and entanglement, to perform computation. Let's take a look now at the prosodic elements of pitch, volume and rate of speech and how these are interpreted by Google Text-to-Speech and Amazon Polly. Changing the pitch of your speech lets you raise or lower the tone of your selected voices. There are three options for setting the value of pitch attributes with Google Text-to-Speech. You can specify a relative value such as extra low, low, medium, high, extra high and default, where the medium value is the default pitch. You can also increase or decrease pitch by specifying a number of semitones. Note that when using this method, the plus or minus sign and st are required. You can also increase or decrease pitch using percentage values. Note that the percentage symbol is required, but using plus or minus signs is optional. Have a listen to the various pitch values used to narrate the sample text below. The first sentence has no markup so you can hear the default voice. If I had a world of my own, everything would be nonsense. Nothing would be what it is, because everything would be what it isn't. And contrariwise, what is, it wouldn't be. And what it wouldn't be, it would. You see? I don't know what you mean, said Alice. Amazon Polly lets you raise or lower the pitch of your speech using a predefined value like extra low, low, medium, high and extra high. You can also increase the pitch by specifying a percentage, for example, plus 10% or plus 5%. Note that the maximum value allowed is plus 50%. If you set the value higher than this amount, it will only be rendered at the maximum value of plus 50%. You can also decrease the pitch by specifying a percentage, such as minus 10%, minus 20%, etcetera. The smallest value allowed for decreasing pitch using percentages is minus 33.3%. Specifying a value lower than minus 33.3% will only be rendered at the minimum value of minus 33.3%. 
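Because out-of-range pitch percentages are rendered at the limit anyway, it can help to clamp the value before writing the tag, so the markup shows what you will actually hear. This is a small sketch using the limits quoted in the lesson (plus 50% and minus 33.3%). The names polly_pitch and prosody are made up for this example, and the clamping simply mirrors the behaviour described above rather than calling Amazon Polly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Limits for Amazon Polly pitch percentages as quoted in the lesson.
POLLY_PITCH_MIN = -33.3
POLLY_PITCH_MAX = 50.0


def polly_pitch(percent):
    """Clamp a pitch change to the quoted limits and format it as a signed percentage."""
    clamped = max(POLLY_PITCH_MIN, min(POLLY_PITCH_MAX, float(percent)))
    # "+g" keeps the sign and drops trailing zeros, for example +10 or -33.3.
    return f"{clamped:+g}%"


def prosody(text, **attrs):
    """Wrap text in a <prosody> tag (hypothetical helper for illustration)."""
    rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<prosody {rendered}>{escape(text)}</prosody>"


ssml = (
    "<speak>"
    + prosody("I can speak with a much higher pitch.", pitch=polly_pitch(80))
    + prosody("And with a much lower pitched voice.", pitch=polly_pitch(-40))
    + prosody("Or a named level.", pitch="x-low")
    + "</speak>"
)

ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```

The named levels are written x-low and x-high in SSML markup, which is what the spoken "extra low" and "extra high" refer to.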
Listen to a synthesized speech recording of a text file marked up for Amazon Polly voices using different pitch attributes. A dream is not reality, but who's to say which is which? Everyone wants some magical solution for their problem, and everyone refuses to believe in magic. No wonder you're late. Why, this watch is exactly two days slow. You used to be much more muchier. You've lost your muchness. Sometimes I've believed as many as six impossible things before breakfast. I have a theory. People talk loud when they want to act smart, right? With Google voices, you can change the volume of your spoken text by using a number preceded by the plus or minus sign and immediately followed by dB for decibels, or use values like silent, extra soft, soft, medium, loud, extra loud or default. The default is plus 0.0 decibels. Note that specifying a value of silent is equal to specifying minus infinity decibels, and that all numerical volume levels in decibels are relative to the current level and should always have a plus or minus sign, including zero. Using the label default resets the current volume level. Listen to a synthesized speech recording of a text file marked up for Google TTS using different volume attributes. I am speaking this line at the default volume for this voice. I am speaking this line at approximately twice the original signal amplitude. I am speaking this line at approximately half the original signal amplitude. Amazon Polly lets you change or set the volume to a predefined level for your current voice, using values like silent, extra soft, soft, medium, loud and extra loud. You can also increase the volume relative to the current volume level. For example, plus zero decibels means no change of volume, and plus six decibels is approximately twice the current amplitude. Please note that the maximum positive value allowed is about plus 4.8 decibels. Additionally, you can decrease the volume relative to the current volume level.
For example, minus six decibels means approximately half the current amplitude. Listen to a synthesized speech recording of a text file marked up for Amazon Polly voices using different volume attributes. I am speaking this sentence at my normal volume. I am speaking this sentence at a louder volume. Whenever I wake up, I tend to speak very slowly. As my brain gets into focus, I can speak with my normal pitch, but also with a much higher pitch. And sometimes I can even speak with a much lower pitched voice. Another attribute of prosody you can change in your text is the rate of speech. Use relative values like extra slow, slow, medium, fast, extra fast or default to set the rate of speech, or use a percentage. When the value is a non-negative percentage, it acts as a multiplier of the default rate. For example, a value of 100% means no change in the speaking rate, a value of 200% means a speaking rate of twice the default rate, and a value of 50% means a speaking rate of half the default rate. Also, it's important to keep in mind that the default rate depends on the language, dialect and personality of the voice being used. Listen to a synthesized speech recording of a text file marked up for Google TTS voices using different rate attributes. Little Boy Blue, come blow your horn. The sheep's in the meadow, the cow's in the corn. Where is that boy who looks after the sheep? He's under a haystack, fast asleep. With Amazon Polly voices you can set the rate of speech using relative values like extra slow, slow, medium, fast and extra fast, or specify a percentage to increase or decrease the speed of the speech. 100% indicates no change from the normal rate, while percentages greater than 100% increase the rate and percentages below 100% decrease the rate. Note that with Amazon Polly voices the minimum value you can specify is 20%. Listen to a synthesized speech recording of a text file marked up for Amazon Polly voices using different rate attributes.
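The volume and rate rules described above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names are made up for this example, the decibel conversion is the standard amplitude formula (ratio = 10 to the power of dB divided by 20), and the 20% floor is the Amazon Polly minimum quoted in this lesson.

```python
from xml.sax.saxutils import escape


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a relative decibel change into an amplitude multiplier."""
    return 10 ** (db / 20)


def prosody(text: str, volume_db=None, rate_percent=None, min_rate=20) -> str:
    """Build a prosody tag with an optional relative volume and speaking rate.

    min_rate is the 20% floor this lesson quotes for Amazon Polly voices.
    """
    attrs = []
    if volume_db is not None:
        # Relative volumes always carry a sign, including zero ("+0dB").
        attrs.append(f'volume="{volume_db:+g}dB"')
    if rate_percent is not None:
        # 100% means no change, 200% is twice as fast, 50% is half speed.
        attrs.append(f'rate="{max(min_rate, rate_percent):g}%"')
    if not attrs:
        return escape(text)
    return f'<prosody {" ".join(attrs)}>{escape(text)}</prosody>'


if __name__ == "__main__":
    print(round(db_to_amplitude_ratio(6), 2))   # roughly double
    print(round(db_to_amplitude_ratio(-6), 2))  # roughly half
    print(prosody("This is how I go when I'm speaking slow.", rate_percent=50))
```

The conversion shows why plus six decibels is described as approximately twice the amplitude and minus six as approximately half: the ratios come out very close to 2 and 0.5.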
This is how I go when I'm speaking extra slow. I always speak extra fast when I'm having a blast. Let's take it down a notch, then wait and watch. If I talk a little faster, this won't sound like a disaster. If I slow down just a little, you can meet me in the middle. In summary, you can use prosody elements with combined pitch, volume and rate attributes to fine-tune your text to speech files and improve the quality of your narrations. This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information and thank you for listening. 14. 13 - Text-To-Speech MaxDuration Tag: Allen wrenches, terrible feeders, PortSys, electric heaters, trash compactors, juice, extractor, showered runs in water meters, walkie-talkies, copper wire safety goggles, radial tires, pellets, rubber mallets, fans and dehumidifiers. Picture hanging paper cutters, waffle irons, window shutters. Paint removers until we were masking tape impacted dollars. Kitchen faucets, folding tables, weatherstripping, proper cables, looks and background. It's back with power. Father's legal has to sign for fumigation. High performance invocation, meddling waterproofing, multipurpose insulation, air compressor, grass connectors reckon chisel smoke detectors. Gauges answer. Kate is thermostats and defectors. Trailer hitch de magnetize mentors. Automatic circumcised tends records, angle brackets for ourselves and energizing soffit panels, circuit breakers, vacuum cleaners, coffee makers, populated generators, matching salt and pepper shakers. Hello and welcome back. In this tutorial, you will learn how to set a maximum duration for synthesized speech using the prosody amazon:max-duration tag. In a previous lesson, we covered using the prosody element to customize the pitch, volume and speaking rate of your tagged speech. You can also specify how long you want your spoken text to take using the prosody amazon:max-duration tag.
Please note that this feature is currently only available for Amazon Polly voices, not Google voices. It's also important to keep in mind that the duration of synthesized speech will vary slightly, depending on the voice you select. This makes it difficult to match synthesized speech with visuals or other activities that require precise timing, and can be especially challenging if you plan to translate text into different languages. The structure for marking up text using the prosody amazon:max-duration tag is shown below. Some of the uses for the prosody amazon:max-duration tag include syncing recorded or translated audio narrations to videos, slide presentations, etcetera. Other uses include being able to match synthesized speech to time restrictions. For example, if you are recording a narration for a 30-second radio ad, your message takes 20 seconds to deliver, and you are required to include a legal disclaimer at the end of your announcement, you may want to compress the disclaimer into the remaining 10 seconds using a synthetically generated disclaimer. Here is an example of a disclaimer that would normally take 15 seconds to deliver, compressed into 10 seconds using the prosody amazon:max-duration tag. Paid for by Taxpayer Election Association Political Action Committee, authorized by MP Johnson and LV Harvey on behalf of the T pack. Special terms and conditions apply. Please see our website for more details at www dot cpac dot work. The maximum duration of your speech can be specified in seconds or milliseconds. The prosody amazon:max-duration element ensures that any text placed within max-duration tags does not exceed the specified duration. If the speech using your chosen voice or language would normally take longer than the specified duration, Amazon Polly will speed it up to fit into the specified duration.
Also, if the specified duration is longer than it takes to read the text at a normal rate, Amazon Polly will read the speech normally. In other words, it won't slow down the speech or add silence, so the resulting audio will be shorter than the time specified. Also note that Amazon Polly can increase the speed of your spoken text no more than five times the normal rate. If text is spoken faster than this, it will probably be unintelligible. Additionally, if the speech cannot fit within your specified duration, even when sped up to the maximum, the audio will be sped up but will last longer than the specified duration. Some other things to keep in mind when marking up your text to speech files are that you can include a single sentence or multiple sentences within a max-duration tag, and you can use multiple prosody amazon:max-duration tags within your text. When calculating the maximum duration time you have specified, the TTS processor will take into account any breaks or pauses added to the text and include these in the duration period. Additionally, Amazon Polly will preserve the short pauses that occur where commas and periods are placed within a text passage. A useful tip when using this tag is to try and keep your text passages short to reduce speech synthesis latency during the audio conversion process. Listen to a synthesized speech recording of a text file marked up using multiple prosody max-duration tags. Speech is a special mode of communication. Evidence suggests that the specialized anatomy that confers human speech reached its present state sometime between 150,000 years ago. The larynx is a complex structure made of cartilage, muscle and other soft tissues. The last thing I want to cover in this lesson are some of the limitations of using the max-duration tag and how it works with other SSML tags. For example, you can't nest max-duration tags. If you put one max-duration tag inside another, Amazon Polly will ignore the inner tag.
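The timing behaviour described above can be modelled with a short Python sketch. Treat it as an approximation under stated assumptions: the function names are hypothetical, the five-times speed-up cap and the 1500-character limit are the figures given in this lesson, and real output will vary by voice.

```python
from xml.sax.saxutils import escape

MAX_SPEEDUP = 5.0        # lesson: no more than five times the normal rate
MAX_TEXT_LENGTH = 1500   # lesson: character limit inside a max-duration tag


def max_duration(text: str, seconds: float) -> str:
    """Wrap text in a prosody tag with a maximum duration in milliseconds."""
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError("text is longer than 1500 characters")
    millis = round(seconds * 1000)
    return f'<prosody amazon:max-duration="{millis}ms">{escape(text)}</prosody>'


def expected_duration(natural_seconds: float, limit_seconds: float) -> float:
    """Estimate how long the audio lasts under the rules in this lesson."""
    if natural_seconds <= limit_seconds:
        # Speech is never slowed down or padded with silence.
        return natural_seconds
    # Speech is sped up to fit, but by no more than the maximum factor.
    return max(limit_seconds, natural_seconds / MAX_SPEEDUP)


if __name__ == "__main__":
    print(max_duration("Special terms and conditions apply.", 10))
    print(expected_duration(15, 10))  # the disclaimer case: fits in 10 seconds
    print(expected_duration(60, 10))  # too long: ends up at 12 seconds
```

The second call in the example corresponds to the radio disclaimer scenario, where 15 seconds of speech is compressed into 10. The third shows the overflow case, where even a fivefold speed-up cannot reach the requested limit.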
Another limitation is that the prosody rate attribute doesn't work with the max-duration tag, as the function of both is to affect the speed of your spoken text. Finally, text used inside a prosody amazon:max-duration tag can't be longer than 1500 characters. The text shown below, for example, which was used in the opening video of this lesson, is 932 characters long, so quite a sizable amount of text can be used. Before we end this lesson, let's have a little fun. Are you ready? Some of us gotta do to get it through, run superhuman and they haven't made anything you're saying you're ashamed. Devastating. Remember demonstrating how to give audiences feeling like it's levitating, never fading whenever the haters for everywhere we could say I will be celebrating because I know the way. This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information and thank you for listening. 15. 14 - Text-To-Speech Pronunciation Tags: You say either and I say either. You say neither and I say neither. Either, either, neither, neither. Let's call the whole thing off. You like potato and I like potahto. You like tomato and I like tomahto. Potato, potahto, tomato, tomahto. Let's call the whole thing off. Hello and welcome back. In this tutorial, we cover text pronunciation. You will learn how to mark up your text to pronounce acronyms and abbreviations, how to use phonemes for different phonetic pronunciations, and ways to improve speech pronunciation by specifying parts of speech and alternate meanings in your markup elements. Let's start with how to mark up text files to pronounce acronyms and abbreviations correctly. An acronym is a word or name formed as an abbreviation from the initial components of a phrase or a word, usually individual letters, like NATO or scuba. You can use the sub alias element to substitute words or expand acronyms, names of elements or abbreviations.
For example, you can have a voice say the Federal Bureau of Investigation instead of FBI, the British Broadcasting Corporation instead of BBC, the International Monetary Fund instead of IMF, etcetera. The sub alias tag can also be used to provide the correct pronunciation for leet words, which are words with numbers for letters, or unique names that TTS engines can't pronounce correctly. The structure for marking up text to interpret acronyms and abbreviations is shown below. Here are some useful tips when using the sub tag. At the time of creating this lesson, Google voices don't support using phonemes, which we will look at in a moment. You can use the sub element with Google voices instead. If you plan to use the same acronyms repeatedly throughout your text, use the sub alias element to expand the first instance of the acronym in your text so your listeners know exactly what you are referring to. The sub element can also be used to provide simplified pronunciations of words that TTS engines find difficult to read, such as words in other languages. Listen to a synthesized speech recording of a text file marked up using sub alias tags. The primary role of the World Health Organization is to direct international health within the United Nations system and to lead partners in global health responses. Iridium is a very hard, brittle, silvery white transition metal of the platinum group. Alcoholics Anonymous is a 12-step program of recovery from alcoholism. The only requirement for being a member of AA is a desire to stop drinking. Tim read his first book when he was only three years old. Let's take a look now at how to use phonemes and add phonetic pronunciation to your text. You can use the phoneme tag with Amazon Polly to add phonetic pronunciation to specific text. Note that Google TTS currently doesn't support using phonemes. The phoneme tag must include the following two attributes.
Alphabet is used to indicate which phonetic alphabet Amazon Polly should use, and ph specifies the phonetic pronunciation you want Amazon Polly to use instead of the standard pronunciation associated by default with the language used by the selected voice. The structure for marking up text to interpret phonemes correctly is shown below. Amazon Polly can interpret phonemes from various phonetic alphabets, including IPA, X-SAMPA, and Amazon Pinyin for Mandarin Chinese phonetic pronunciation. If you plan to use phonetic alphabets in languages other than English, remember to specify the correct language before processing your text. Listen to a synthesized speech recording of a text file marked up using Pinyin phonemes. You sure war, we're sure. Now, Amazon Polly supports using phonemes in many different languages. Refer to the documentation accompanying this course for links to phoneme tables, tools and resources for supported phonetic alphabets. It's beyond the scope of this course to teach you about phonetics. You can find many video tutorials online on this topic. The important thing you need to know for this lesson is how to add phonemes when marking up your text. So let's go through this process briefly. Let's start by studying the structure of the phoneme tags below and listening to how the words are pronounced using different phonemes and phonetic alphabets. You say pecan. I say pecan. You say pecan. I say pecan. We've talked about using phonetic tools in the text to speech tools lesson, and we also provide information and links to various phonetic tools and resources in the accompanying course documentation. The other thing we've talked about was the importance of saving text files containing phonemes for Amazon Polly using the UTF-8 format. So let's review both of these points briefly. The first step is to locate and copy the correct phonetic spelling of words.
You can do this for free, using sites that provide online dictionaries with phonetic spelling and phonetic conversion tools. Refer to the course notes for more details. After specifying the phonetic alphabet to use and pasting the phonemes into the alphabet and ph attributes in your text file, remember to save your text file using UTF-8 encoding, as shown here. Now that we've looked at the sub alias and phoneme tags, let's take a look at ways to improve the pronunciation of your words. Heteronyms are words that are spelled identically but have different meanings when pronounced differently. Amazon Polly is actually very good when it comes to recognizing heteronyms and words that convey different meanings depending on the context they're used in, and the technology is only getting better. For example, have a listen to how Amazon Polly interprets the following sentences without any markup tags added. The band will record a record. We refuse to take on more refuse. This country will never progress unless we achieve progress. Those farmers produce a lot of produce. Your Honor, I will not contest the contest. Don't rebel unless you're a rebel. Please don't subject us to more pain by discussing the subject of your operation. I will contrast all the different contrast options of these laptop screens on my blog. When the brush fire got close, the authorities decided to close the road. Some words, however, can be tricky or difficult for TTS engines to recognize and interpret correctly. Listen to the sentences below as they're being read out, and we will then discuss this in the next slide. I learned to read and read to learn. I learned to read and read to learn. Turn up the bass on your radio and you will catch more bass. Turn up the bass on your radio and you will catch more bass. As you have just heard, some words like read and bass can be difficult for TTS engines to interpret correctly. This is where the w tag comes in handy.
You can use the w tag in Amazon Polly to customize the pronunciation of words by specifying either a part of speech or an alternate meaning. This is done using role attributes. Specifying a part of speech lets you tell Amazon Polly whether to interpret the word read as a verb in the present tense, as in I will read this book, or as the word read in the past tense, as in I have read this book. You can do this using different attributes like amazon:VB and amazon:VBD. Use amazon:SENSE_1 for alternate meanings of words. For example, the noun bass usually refers to the lowest part of the musical scale, but it can also be a species of freshwater fish, and the pronunciation of the word is different. If you don't want the default meaning of words that are spelled the same but have different meanings when pronounced differently, then use the alternate meaning tag. The basic structure for marking up text to customize the pronunciation of words using the w element is shown below and in the following slides. The word r-e-a-d may be interpreted as either the present simple form read or the past participle form read. The word b-a-s-s may be interpreted as either a musical element, bass, or as its alternative meaning, a freshwater fish, bass. In summary, ways to improve text pronunciation using SSML include just allowing the technology to keep getting smarter in terms of providing better context recognition, or using tags like the sub alias tag to pronounce acronyms and abbreviations, the phoneme tag to pronounce words with different phonetic sounds, and the w tag to specify parts of speech and alternate meanings. Spin text tools let you alternate text with alternate synonyms. You can use the Department of Motor Vehicles website to renew your vehicle registration online. Either she goes or I go. There is no either or. The sad face on this T-shirt has a tear in it. I suspect the main suspect in the Cathedral of Notre Dame's fire is totally crooked.
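The three pronunciation tags summarized above can be sketched as small Python string builders. The function names are hypothetical and exist only for this illustration. The tag and attribute names (sub with alias, phoneme with alphabet and ph, w with role) follow the lesson, the IPA spelling of pecan is just one common rendering, and the save helper reflects the advice to store phoneme markup as UTF-8.

```python
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def sub(alias: str, text: str) -> str:
    """Speak `alias` in place of the written `text` (e.g. expand an acronym)."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def phoneme(alphabet: str, ph: str, text: str) -> str:
    """Attach a phonetic pronunciation, e.g. alphabet 'ipa' or 'x-sampa'."""
    return (f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
            f"{escape(text)}</phoneme>")


def w(role: str, text: str) -> str:
    """Pick a part of speech or alternate sense, e.g. role 'amazon:VBD'."""
    return f"<w role={quoteattr(role)}>{escape(text)}</w>"


def save_ssml(path: str, ssml: str) -> None:
    """Write markup as UTF-8 so phonetic symbols survive the round trip."""
    Path(path).write_text(ssml, encoding="utf-8")


if __name__ == "__main__":
    line = (
        "<speak>"
        f"The {sub('World Health Organization', 'WHO')} met today. "
        f"You say {phoneme('ipa', 'pɪˈkɑːn', 'pecan')}. "
        f"Yesterday I {w('amazon:VBD', 'read')} about "
        f"{w('amazon:SENSE_1', 'bass')} fishing."
        "</speak>"
    )
    print(line)
```

Using quoteattr for attribute values matters for X-SAMPA in particular, because that alphabet uses the double-quote character for stress and would otherwise break the attribute.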
This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information and thank you for listening. Something must be, uh because I like you. No way. No way must 16. 15 - Text-To-Speech Add Audios: Ladies and gentlemen, welcome. Here is our opening act for the night, Noah. Thank you, dear listeners. This is my first attempt ever at doing stand-up comedy. Please be kind. Oh, I plan to entertain you with some fabulous jokes tonight. Are you ready? Yeah. Okay. Here we go. How much higher would sea levels be if sponges didn't live in the ocean? What's the difference between ignorance and apathy? I don't know. And I don't care. Did you hear about the semicolon that broke the law? He was given two consecutive sentences. Did you hear about the thief who stole a calendar? He got 12 months. That's all the time I have. Thank you. Hello and welcome back. In this tutorial, you will learn how to insert audio files into your voice narrations using the audio element. We will also look at other markup tags for inserting audio and media elements into your text to speech files. The audio element lets you insert recorded audio files into your voice narrations. Currently, the audio element is only supported in Google voices. The basic requirements for using audio files in your text include making sure that the audio file source URL uses the https protocol, and that the file has a maximum duration of 120 seconds and a maximum file size of five megabytes. You should also include a description to be read aloud if, for some reason, your audio file doesn't play. The structure for marking up text for audio files is shown below. One way to avoid problems with audio files not playing during the text to speech conversion process is to host the audio files yourself on a cloud or online media storage service like Amazon S3, Dropbox or Google Drive.
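To make the audio requirements above concrete, here is a minimal Python sketch that builds an audio element and rejects any source URL that is not https. The function name and the example.com address are placeholders invented for this illustration. The duration and file-size limits are not checked, because that would require downloading the file.

```python
from urllib.parse import urlparse
from xml.sax.saxutils import escape, quoteattr


def audio(src: str, fallback: str) -> str:
    """Build an audio element with fallback text for when the file can't play.

    Only the https requirement from this lesson is enforced here.
    """
    if urlparse(src).scheme != "https":
        raise ValueError("audio source URL must use the https protocol")
    return f"<audio src={quoteattr(src)}>{escape(fallback)}</audio>"


if __name__ == "__main__":
    clip = audio("https://example.com/sounds/laugh.mp3",
                 "the sound of a child laughing")
    print(f"<speak>This is {clip}.</speak>")
```

Checking the scheme before you submit the markup catches the most common cause of silent failures early, which is a link copied with http instead of https.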
We provide links to more information about tools and resources for hosting audio files in the accompanying course documentation. Just make sure that links pointing to your audio files use the https protocol. You can insert different audio file formats like WAV, MP3 and Ogg into your text. Listen to the following speech recording. This is the sound of a child laughing. This is the sound of a dog barking. Notice that in the recording you've just heard, sounds play in the order that they appear in the text. In other words, audio files normally play in sequence, one after another. Now listen to this recording. Wait, what is going on here? I can't even hear myself thinking with all these mad noises going on. In the recording you have just heard, all the sounds are playing simultaneously. How does this work? Let me show you. In addition to using the audio element in your voice narrations, you can use elements like parallel, sequential and media tags to fine-tune your speech. Parallel tags let you play multiple media elements simultaneously. Sequential tags let you play media elements as they appear in your text. Media tags let you add text and audio elements inside parallel and sequential tags and use attributes to modify these, like fading text and audio in and out, increasing or decreasing volume, repeating and setting the duration of media elements, and specifying where media elements begin and end. Think of parallel and sequential tags as containers. You can place text or sounds inside media tags and modify these using different attributes. If you need more control of your text and embedded audio files, use parallel tags to play media elements simultaneously, or sequential tags to play media elements in the order in which they are written in your text file. Additionally, use various media attributes to fine-tune your speech. I will show you how this works in just a moment. First, let's take a quick look at the media attributes that you are allowed to use with text and audio files.
Media attributes give you finer control over any text or audio inserted into your speech. Use begin to specify when you want a media element to begin playing. For example, you can specify a media element to play after 37 or 9.5 seconds. Use end to specify when a media element should stop playing. This is useful if you only want to play the first few seconds of a long audio file and then stop playing the file. Repeat count lets you specify how many times you want the media element to repeat, for example, two times, five times, 10 times, etcetera. Repeat duration lets you place a limit on the duration of the inserted media. Sound level lets you adjust the sound level of your audio. This is useful if you are playing media elements simultaneously and want one of the elements to play louder or softer in the mix. Fade-in duration lets you specify when a media element should fade into play, and fade-out duration lets you specify how far from the end of your media element text or audio should start fading out. Listen again to the recording where various media elements are all playing simultaneously. Wait, what is going on here? I can't even hear myself thinking with all these mad noises going on. If you study the markup of this text, you will see five media elements enclosed within par tags. The first element is spoken text enclosed in speak tags and set to begin 12 seconds into the speech with a raised volume level of four decibels. The second element is an audio file of a child laughing, set to repeat four times with a slightly lower volume than the narration. The next three media elements are audio files of dogs barking and cars with sirens, set to end the speech after 20 seconds, with a five-second fade out at the end. Note that you can combine various media attributes when marking up media files.
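The layered recording described above might be assembled along these lines. This is a hedged Python sketch: the helper names and example.com URLs are hypothetical, and the attribute spellings (begin, end, repeatCount, soundLevel, fadeOutDur) are the ones used in Google's SSML media element as best I recall, so check them against the current reference before relying on them.

```python
from xml.sax.saxutils import quoteattr


def media(inner: str, attrs: dict) -> str:
    """Wrap already-built markup (speech or audio) in a media element."""
    rendered = "".join(f" {name}={quoteattr(str(value))}"
                       for name, value in attrs.items())
    return f"<media{rendered}>{inner}</media>"


def par(*children: str) -> str:
    """Container whose media elements play simultaneously."""
    return "<par>" + "".join(children) + "</par>"


def seq(*children: str) -> str:
    """Container whose media elements play one after another."""
    return "<seq>" + "".join(children) + "</seq>"


if __name__ == "__main__":
    narration = media("<speak>Wait, what is going on here?</speak>",
                      {"begin": "12s", "soundLevel": "+4dB"})
    laughing = media('<audio src="https://example.com/laugh.mp3"/>',
                     {"repeatCount": 4, "soundLevel": "-2dB"})
    barking = media('<audio src="https://example.com/bark.mp3"/>',
                    {"end": "20s", "fadeOutDur": "5s"})
    print("<speak>" + par(narration, laughing, barking) + "</speak>")
```

Swapping par for seq in the last line plays the same three elements in the order written instead of on top of each other, which is the only difference between the two containers.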
Now that you understand how to create a narration with simultaneous audio files playing in parallel, let me play the file one more time so you can pick out all the various media elements and their settings. Wait, what is going on here? I can't even hear myself thinking with all these mad noises going on. Here is another example of how to insert audio files to play simultaneously using parallel tags. Here we go, getting ready to fonte body way to move to the rhythm. Feel the love getting ready to body body. And here is an example of how to insert audio files to play sequentially with some media adjustments. Hi, Olivia. Hello, Kate. How are you? Good, thank you. Can you recite the English alphabet for our listeners? Sure. A B C D E F G H I J K L M N O P. Nine more letters and then zed. In summary, use the audio element to insert prerecorded audio into your voice narrations. Use parallel tags to play media files simultaneously, sequential tags to play media files in sequential order, and media tags to combine speech and audio files with attributes that let you fine-tune media element settings like beginning and end times, set repeat and duration, increase and decrease volume, and set fade-ins and fade-outs. This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information and thank you for listening. Ladies and gentlemen, please welcome Noah back to the stage. Thank you. I woke up this morning and forgot which side the sun rises from. Then it dawned on me. I've just written a song about tortillas. Actually, it's more of a wrap. So what if I don't know what Armageddon means? It's not the end of the world. The world tongue twister champion just got arrested. I hear he's been given a really tough sentence. I recently decided to sell my vacuum cleaner. All it was doing was gathering dust. I hate Russian dolls. They're so full of themselves.
What do you call a bee that can't make up its mind? A maybe. Velcro. What a rip-off. Sometimes I tuck my knees into my chest and lean forward. That's just how I roll. You've been such a wonderful audience. Thank you and good night. 17. 16 - Text-To-Speech VoiceFX: Are you ready? Hello and welcome back. In this tutorial, you will learn how to add additional voice effects to your text to speech files, such as adding the sound of breathing to words and sentences, whispering, speaking words softly, controlling voice timbre, and adding dynamic range compression to sections of your text to improve audio listening quality. Please note that the voice effects covered in this lesson are currently only available to Amazon Polly voices. Let's begin this lesson by learning how to add the sound of breathing to your text. Natural sounding speech includes correctly spoken words and breathing sounds. You can make synthesized speech sound more natural by adding breathing sounds to text using the amazon:breath and amazon:auto-breaths tags in the following modes. With manual mode, you set the location, length and volume of a breath sound within the text. With automated mode, you can let Amazon Polly decide where to automatically insert breathing sounds into your speech. Mixed mode allows you and Amazon Polly to add breathing sounds both manually and automatically to your speech. The structure for adding breathing sounds to text is shown below. Note that there are several ways to use these tags and attributes. We will cover these in more detail in the next few slides. Manual mode lets you place the amazon:breath tag in your text wherever you want a breath to appear. You can customize the length and volume of breaths using the duration and volume attributes. Duration lets you control the length of the breath. The values you can use for setting the duration of breaths include default, extra short, short, medium, long and extra long. The default value for duration is medium.
Volume lets you control the loudness of the breath. The values you can use for setting the volume of breaths include default, extra soft, soft, medium, loud and extra loud. The default value for volume is medium. Please note that the exact length and volume of each value depends on the Amazon Polly voice being used. To set a breath sound using default values in manual mode, use the amazon:breath tag without attributes. For example, to set the duration and volume of a breath to medium, you would normally set the value of these attributes as shown here. Okay, just relax and take a breath. To set a breath sound using these defaults, just use the tags without attributes, as shown here. Adding breaths to your sentences can make your speech sound more natural. You can also add individual breathing sounds within a text passage in manual mode, using tags as shown here. Wow, I ran that race really fast. I think I just beat my personal best. Note that we have added nested prosodic elements to the text to increase the rate and volume of the voice and create a more realistic sounding effect. In automated mode, you can use the amazon:auto-breaths tag to tell Amazon Polly to automatically create breathing noises at appropriate intervals. Automated mode lets you set the frequency of breath intervals, volume and duration. Note, however, that unlike manual mode, the amazon:auto-breaths tag requires opening and closing tags. Place the opening tag at the beginning of the text where you want automated breathing sounds to start, and a closing tag where you want the breathing sounds to end. You can use optional volume, frequency and duration attributes with the amazon:auto-breaths tag. Volume controls the loudness of the breath. The values you can use to control breath volume include default, extra soft, soft, medium, loud and extra loud. The default value for breath volume in automated mode is medium. Frequency controls how often breathing sounds occur in the text.
Frequency values include default, extra low, low, medium, high and extra high. The default frequency value is medium. Duration controls the length of the breath. Duration values you can use include default, extra short, short, medium, long and extra long. The default value for duration is medium. By default, the frequency of breathing sounds depends on the input text. However, breathing sounds often occur after commas and periods. Let's look now at some examples of how to use the amazon:auto-breaths tag. In the next few slides, we will look at examples of speech generated from text using automated mode without specifying optional parameters, and using automated mode with volume control, frequency control and multiple parameters specified. Listen to the first example of synthesized speech using automated breathing sounds without specifying optional parameters. Sleep is an important part of your daily routine. We spend about 1/3 of our lives sleeping. Getting enough quality sleep is as essential to survival as food and water. Everyone needs sleep, but its biological purpose remains a mystery. Sleep affects almost every type of tissue and system in the body, from the brain, heart and lungs to metabolism, immune function, mood and disease resistance. Research shows that a chronic lack of sleep or getting poor quality sleep increases the risk of disorders, including high blood pressure, cardiovascular disease, diabetes, depression and obesity. Now listen to an example of synthesized speech using automated breathing sounds with volume control values specified. Sleep is an important part of your daily routine. We spend about 1/3 of our lives sleeping. Getting enough quality sleep is as essential to survival as food and water. Everyone needs sleep, but its biological purpose remains a mystery.
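The manual and automated breathing modes covered so far can be sketched in Python as follows. The helper names are hypothetical. The tag names and the sets of allowed values follow the lists given in this lesson, written in their usual markup spelling (x-short for extra short, x-soft for extra soft, and so on), and attributes left at default are simply omitted, as the lesson suggests.

```python
from xml.sax.saxutils import escape

DURATIONS = {"default", "x-short", "short", "medium", "long", "x-long"}
VOLUMES = {"default", "x-soft", "soft", "medium", "loud", "x-loud"}
FREQUENCIES = {"default", "x-low", "low", "medium", "high", "x-high"}


def _check(name: str, value: str, allowed: set) -> None:
    if value not in allowed:
        raise ValueError(f"{name} must be one of {sorted(allowed)}")


def _attrs(**values: str) -> str:
    # Attributes left at "default" are omitted from the tag entirely.
    return "".join(f' {k}="{v}"' for k, v in values.items() if v != "default")


def breath(duration: str = "default", volume: str = "default") -> str:
    """Manual mode: a single self-closing breath placed where you want it."""
    _check("duration", duration, DURATIONS)
    _check("volume", volume, VOLUMES)
    return f"<amazon:breath{_attrs(duration=duration, volume=volume)}/>"


def auto_breaths(text: str, volume: str = "default",
                 frequency: str = "default", duration: str = "default") -> str:
    """Automated mode: opening and closing tags around the passage."""
    _check("volume", volume, VOLUMES)
    _check("frequency", frequency, FREQUENCIES)
    _check("duration", duration, DURATIONS)
    attrs = _attrs(volume=volume, frequency=frequency, duration=duration)
    return f"<amazon:auto-breaths{attrs}>{escape(text)}</amazon:auto-breaths>"


if __name__ == "__main__":
    print(breath(duration="long", volume="x-loud") + "Wow, that was fast.")
    print(auto_breaths("Sleep is an important part of your daily routine.",
                       volume="x-soft", frequency="x-low"))
```

Mixed mode, in this sketch, would just mean calling breath() at chosen spots in the text and wrapping the rest of the passage with auto_breaths().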
Sleep affects almost every type of tissue and system in the body, from the brain, heart and lungs to metabolism, immune function, mood and disease resistance. Research shows that a chronic lack of sleep, or getting poor quality sleep, increases the risk of disorders including high blood pressure, cardiovascular disease, diabetes, depression and obesity. Here is an example of synthesised speech using automated breathing sounds with frequency control values specified: Sleep is an important part of your daily routine. We spend about one third of our lives sleeping. Getting enough quality sleep is as essential to survival as food and water. Everyone needs sleep, but its biological purpose remains a mystery. Sleep affects almost every type of tissue and system in the body, from the brain, heart and lungs to metabolism, immune function, mood and disease resistance. Research shows that a chronic lack of sleep, or getting poor quality sleep, increases the risk of disorders including high blood pressure, cardiovascular disease, diabetes, depression and obesity. And here is an example of synthesised speech using automated breathing sounds with multiple parameters specified: Sleep is an important part of your daily routine. We spend about one third of our lives sleeping. Getting enough quality sleep is as essential to survival as food and water. Everyone needs sleep, but its biological purpose remains a mystery. Sleep affects almost every type of tissue and system in the body, from the brain, heart and lungs to metabolism, immune function, mood and disease resistance. Research shows that a chronic lack of sleep, or getting poor quality sleep, increases the risk of disorders including high blood pressure, cardiovascular disease, diabetes, depression and obesity. Now that we have covered how to add breathing sounds to text, let's take a look at how to add a whispering effect to voice narrations.
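The manual and automated breath tags described above can be sketched as plain strings. This is a minimal illustration, not the markup shown on the course slides: the helper names `breath` and `auto_breaths` are hypothetical, and the attribute values assume the written SSML form, where the "extra" values the narrator reads out are spelled with an `x-` prefix (for example `x-loud`, `x-low`).

```python
def breath(duration=None, volume=None):
    """Manual mode: a single self-closing breath tag.

    With no arguments it returns the bare tag, which uses the defaults.
    """
    attrs = ""
    if duration is not None:
        attrs += f' duration="{duration}"'
    if volume is not None:
        attrs += f' volume="{volume}"'
    return f"<amazon:breath{attrs}/>"


def auto_breaths(text, volume=None, frequency=None, duration=None):
    """Automated mode: wrap text in paired opening and closing tags."""
    attrs = ""
    for name, value in (("volume", volume),
                        ("frequency", frequency),
                        ("duration", duration)):
        if value is not None:
            attrs += f' {name}="{value}"'
    return f"<amazon:auto-breaths{attrs}>{text}</amazon:auto-breaths>"


# Manual mode: one medium breath after the sentence.
manual = f"<speak>Okay, just relax and take a breath. {breath('medium', 'medium')}</speak>"

# Automated mode: louder but less frequent breaths across the passage.
automated = "<speak>" + auto_breaths(
    "Sleep is an important part of your daily routine.",
    volume="x-loud",
    frequency="x-low",
) + "</speak>"

print(manual)
print(automated)
```

Leaving an argument out simply omits that attribute, so the voice falls back to its default for it.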
Use the whispered tag to indicate when text should be spoken in a whispered voice instead of normal speech. Note that all Amazon Polly text-to-speech voices support the whispering effect. Here's a useful tip: you can enhance the whispered effect by slowing down the prosody rate of your text by up to 10%. The structure for marking text to interpret whispering is shown below. Here is an example of synthesised speech using whispering: "I have a secret to tell you." The next voice effect you can add to your narrations is to make voices speak more softly. Let me show you how to do this. Use the soft phonation effect tag to indicate when text should be spoken in a softer than normal voice. Like the whispering effect, you can enhance the soft-spoken effect by slowing down the prosody rate of your text by up to 10%. The structure for marking text to interpret soft-spoken voice is as shown below. Listen to a couple of examples of synthesised speech marked up for soft-spoken voice narrations: "Hi, I'm Matthew. This is me speaking in my normal voice, and this is me speaking in my softer voice. If I take a breath before speaking, I can slow myself down, relax, feel calmer and more at peace with the world." "I'm Joanna. Like Matthew, I also have a normal voice and a softer voice. When I come home from a hard day at the office, I like to go for a walk on the beach with my dog Bonnie. Just thinking about it helps me to de-stress and relax." Another useful voice effect you can add to narrations is to make voices sound bigger or smaller by controlling voice timbre. Timbre is the tonal quality of a voice that helps you tell the difference between voices, even when they have the same pitch and loudness.
One of the most important physiological features that contributes to speech timbre is the length of the vocal tract. The vocal tract is a cavity of air that spans from the top of the vocal folds up to the edge of the lips. To control the timbre of output speech in Amazon Polly, use the vocal tract length tag as shown below. The vocal tract length tag has the effect of changing the length of the speaker's vocal tract, which sounds like a change in the speaker's size. Increasing the length of the vocal tract makes the speaker sound physically bigger. Decreasing it makes the speaker sound smaller. Note that all Amazon Polly voices support using this tag to change the timbre of a voice. Use the following values. Adding a plus or minus percent number adjusts the vocal tract length by a relative percentage change in the current voice, for example, plus 4% or minus 2%. You can use any value ranging from plus 100% to minus 50%. Any values that lie outside this range will be clipped. For example, specifying a value of plus 111% will be clipped to sound like plus 100%, and specifying a value of minus 60% will be clipped to sound like minus 50%. You can also specify an absolute percentage to change the vocal tract length of the currently selected voice, such as 110% or 75%. Note that an absolute value of 110% is equivalent to a relative value of plus 10%, and an absolute value of 100% is the same as the default value for the current voice. Listen to some examples of synthesised speech where we control the voice timbre by changing the vocal tract length: "This is my original voice without any modifications. Now imagine that I am much bigger. Or perhaps you prefer my voice when I'm very small. You can also control the timbre of my voice by making minor adjustments, for example, by making me sound just a little bigger, or making me sound only somewhat smaller."
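The clipping and absolute-versus-relative rules above can be checked with a little arithmetic. The sketch below is only an illustration of those rules: the function names are hypothetical, whole-number percentages are assumed, and the clipping itself is something the narration says the service applies, so `clip_relative` merely mirrors it so you can predict the effective value before converting a file.

```python
def clip_relative(percent):
    """Mirror the stated range: relative changes are limited to -50..+100."""
    return max(-50, min(100, percent))


def absolute_to_relative(absolute_percent):
    """An absolute value of 100% is the unchanged voice, so 110% means +10%."""
    return absolute_percent - 100


def vocal_tract(text, relative_percent):
    """Wrap text in the effect tag using a signed, clipped whole-number percentage."""
    clipped = clip_relative(relative_percent)
    return f'<amazon:effect vocal-tract-length="{clipped:+d}%">{text}</amazon:effect>'


print(clip_relative(111))         # 100
print(clip_relative(-60))         # -50
print(absolute_to_relative(110))  # 10
print(vocal_tract("Now imagine that I am much bigger.", 111))
```

Writing the value with an explicit sign keeps it unambiguous that a relative change, not an absolute percentage, is intended.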
In the example shown on the slide, you can see that Amazon Polly lets you combine the vocal tract length tag with any other supported SSML tag. Because timbre, or vocal tract length, and pitch are closely connected, you might get the best results by combining the vocal tract length tag with the prosody pitch tag. To produce the most realistic voice narration with this effect, we recommend experimenting with different tag combinations and using different percentages and values when combining tags, as shown in the example below: "The pitch and timbre of a person's voice are intrinsically connected in human speech. If you are going to reduce the vocal tract length, you might want to consider increasing the pitch, too. If you choose to lengthen the vocal tract, you might also want to try lowering the pitch of the voice." The last voice effect I want to cover in this lesson is how to add dynamic range compression to text. Depending on the text, language and voice used in an audio file, sounds can range from soft to loud. Environmental sounds, such as the sound of a moving vehicle, can mask softer sounds, making it difficult to hear the audio track clearly. To enhance the volume of certain sounds in your audio file, you can use the dynamic range compression tag. The DRC tag sets a mid-range loudness threshold for your audio and increases the volume, or gain, of the sounds around that threshold. It applies the greatest gain increase closest to the threshold and lessens the gain increase farther away from the threshold. In simple terms, dynamic range compression increases the volume of sounds around the mid-range threshold. Using dynamic range compression makes middle-range sounds easier to hear in noisy environments, which makes the audio file sound clearer to listeners. The structure for adding DRC to audio files is shown below. Note that the DRC value is case sensitive and must be written in lower case inside the tag.
Also note that all Amazon Polly voices and languages support using the DRC tag. Additionally, keep in mind that you can apply dynamic range compression to an entire section of text or just a few words. Listen to the following speech recording with DRC applied to a section of the text: "Audio recordings can be difficult to hear in environments like a moving vehicle, but this section of the audio narration should be less difficult to hear in a moving vehicle because we have applied dynamic range compression to it." You can also use dynamic range compression with the prosody volume tag. As this graph shows, the prosody volume tag evenly increases the volume of the entire audio file from its original level, shown here as a dotted line, to an adjusted level, marked in the graph as a solid line. Using the DRC tag with the prosody volume tag further increases the volume of certain parts of the audio file. Combining tags doesn't affect the settings of the prosody volume tag. In simple terms, what this means is that you can use the prosody volume tag to increase the volume across the entire audio file. Something to keep in mind if you plan to use dynamic range compression with the prosody volume tag is that when you use both tags together, Amazon Polly applies the DRC tag first to increase the mid-range sounds near the threshold. It then applies the prosody volume tag, which further increases the volume of the entire audio track evenly. So, in simple terms, use the DRC tag with the prosody volume tag to first increase the volume of the mid-range sounds and then increase the overall volume of the entire audio track. Here is some additional information about using dynamic range compression. To use the tags together, nest one tag inside the other. In the example below, the prosody volume tag increases the volume of the entire passage to loud, while the DRC tag enhances the volume of the mid-range values in the second sentence. Remember also to use closing tags for both elements.
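The nesting just described, with a prosody volume tag around the whole passage and a DRC tag around the second sentence only, can be sketched as follows. The helper names are hypothetical; the tag and attribute spellings assume the written Amazon Polly SSML form, with `drc` in lower case as the lesson requires.

```python
def drc(text):
    """Apply dynamic range compression to a span of text (value must be lower case)."""
    return f'<amazon:effect name="drc">{text}</amazon:effect>'


def prosody_volume(text, volume):
    """Raise or lower the volume of everything inside the tag evenly."""
    return f'<prosody volume="{volume}">{text}</prosody>'


first = "This text needs to be understandable and loud. "
second = "This text also needs to be more understandable in a moving car."

# DRC wraps only the second sentence; prosody wraps the whole passage.
ssml = "<speak>" + prosody_volume(first + drc(second), "loud") + "</speak>"
print(ssml)
```

Because each helper emits its own closing tag, the inner tag is always closed before the outer one, which is the point of the reminder about closing both elements.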
Listen to the speech recording below to hear these effects in action: "This text needs to be understandable and loud. This text also needs to be more understandable in a moving car." In summary, use the breath and auto breaths tags to help create more natural sounding speech by adding breaths to text and voice narrations. Use the whispered tag to add whispering to your text. Use the soft tag for softer spoken voicing effects. Use the vocal tract length tag to change voice timbre by changing the size of the speaker's vocal tract length, and use the DRC tag, with or without the prosody volume tag, to add dynamic range compression to text and increase the volume of mid-range sounds in your audio narrations. This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information, and thank you for listening.

Hello and welcome to another episode of the AI Meditation podcast, where we only say what others are thinking. Before we begin, take a deep breath and relax. Feel free to close your eyes now, unless you're driving, in which case you may want to keep your eyes wide open. On behalf of all synthetic voices, I make the following pledge to you: I'm never gonna give you up, never gonna let you down, never gonna run around and desert you. Never gonna make you cry, never gonna say goodbye, never gonna tell a lie and hurt you. [inaudible]

18. 17 - Text-To-Speech Language Tag: Ladies and gentlemen, mesdames et messieurs, señoras y señores, [inaudible]. I give you the certified, bona fide, indubitably overqualified, uncompensated, all unconventional, uncorporeal and almost unconceivable but highly believable Kate, the AI narrator. Hello and welcome back. In this tutorial, you will learn how to specify another language for specific words in your text using the language markup tag.
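As a preview of what the language markup tag looks like in written form, here is a minimal sketch. The helper name `lang` and the sample sentences are hypothetical; the tag is assumed to be the SSML `lang` element with an `xml:lang` attribute, and `fr-FR` and `pt-BR` are assumed to be the codes for French and Brazilian Portuguese.

```python
def lang(text, code):
    """Mark a word, phrase or sentence as belonging to another language."""
    return f'<lang xml:lang="{code}">{text}</lang>'


# A whole sentence spoken as French by the selected voice.
sentence = f"<speak>{lang('Je ne parle pas français.', 'fr-FR')}</speak>"

# A single foreign word inside an otherwise English sentence.
word = f"<speak>I love eating {lang('churrasco', 'pt-BR')}.</speak>"

print(sentence)
print(word)
```

Only the text between the opening and closing tags is affected; everything outside them is still spoken in the language of the selected voice.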
Please note that this is not the same as translating text into other languages, which we will cover in a separate tutorial. Also, keep in mind that the language tag we will be using is currently only available for Amazon Polly voices. You can use the language tag to specify another language for a specific word, phrase or sentence in your text. Synthetic voices will generally pronounce foreign language words and phrases better if these are enclosed within a pair of language tags. You can specify the language using xml:lang attributes. The structure for interpreting text using other languages is shown below. Amazon Polly supports text-to-speech voices in many different languages. This table lists the languages supported by Amazon Polly with the language codes you need to use with the language tag. Here, you can see how to use language attributes with the language markup tag. Note that the language identification codes even allow you to differentiate between language dialects, such as speaking words in French or French Canadian, or speaking words in Portuguese using a Brazilian or European Portuguese pronunciation. Let's go through some examples so you can see how to mark up text to specify using other languages when pronouncing specific words. To understand how the language tag works, let's start with the basics. All the words in your text are spoken in the language of your selected voice, unless you apply the language tag. If you apply the language tag, the words within the tags will be spoken in that language. For example, let's say we select Joanna's voice to narrate our text, and the text contains words in a foreign language like French with no language markup tags. Joanna speaks US English, so Amazon Polly will interpret the sentence shown below in Joanna's US English voice, without a French accent. Listen to how the text sounds when converted into audio: "Je ne parle pas français." If you use Joanna's voice with the language tag, Amazon
Polly will speak the sentence in American-accented French. Because Joanna's is not a native French voice, pronunciation is based on her native language, which is US English. Listen to how the text sounds when converted into an audio file: "Je ne parle pas français." Note that, much like the way most people don't pronounce words perfectly when trying to speak a foreign language, Joanna's US English voice doesn't use perfect French pronunciation features. To speak this sentence as a fluent French speaker, you will need to use a native French-speaking voice instead: "Je ne parle pas français." The language tag, then, is useful when you want your voice narrations to better pronounce words in foreign languages. For example, listen to the audio recording below as Matthew, another US English voice, pronounces the Brazilian Portuguese word for a well-known type of meat barbecue: "I love eating churrasco, which is Brazilian barbecued meat." "I love eating churrasco, which is Brazilian barbecued meat." The language tag can also be used when translating text into foreign languages. For example, if you use the voice of Giorgio, who speaks Italian, with the example text below containing an English sentence, Amazon Polly will speak the sentence in Giorgio's voice with an Italian pronunciation. If you use the same voice with the language tag, Amazon Polly will pronounce the tagged words in Italian-accented English. Have a listen to the audio recording of the text shown below: "Mi piace Charlie and the Chocolate Factory." "Mi piace Charlie and the Chocolate Factory." Doesn't that just sound bellissimo? Here is another example of using the language tag to pronounce names in [inaudible]. [Audio sample in another language; not intelligible in the transcript.]
[Audio sample in another language continues.] The last example I want to show you uses the language tag to pronounce foreign names used in text in their native language. Have a listen to the audio recording of the text shown below: "Michelangelo di Lodovico Buonarroti Simoni, or more commonly known by his first name, Michelangelo, was an Italian sculptor, painter, architect and poet of the High Renaissance, born in Firenze, or the Republic of Florence. Considered by many the greatest artist of his lifetime, and by some the greatest artist of all time, he is often considered a contender for the title of the archetypal Renaissance man, along with his rival, the fellow Florentine Leonardo da Vinci." "Michelangelo di Lodovico Buonarroti Simoni, or more commonly known by his first name, Michelangelo, was an Italian sculptor, painter, architect and poet of the High Renaissance, born in Firenze, or the Republic of Florence. Considered by many the greatest artist of his lifetime, and by some the greatest artist of all time, he is often considered a contender for the title of the archetypal Renaissance man, along with his rival, the fellow Florentine Leonardo da Vinci." In summary, use the language tag to specify another language for specific words, phrases or sentences in your text. Remember that Amazon Polly supports many languages. Refer to the table of language identification tags for language codes, and use these within xml:lang attributes to specify the language. This brings us to the end of this tutorial. I hope you found this lesson useful. Please refer to the accompanying notes in this section for more information, and thank you for listening.

[Music: a song performed by synthetic voices begins.]
[Music: synthetic voices sing a song with the refrain "I've been everywhere, man"; the remaining lyrics are not intelligible in the transcript.]

19. 18 - Text-To-Speech: Putting It All Together: "Good morning. Payment received. Selected items ready for pickup close to you at 50 degrees, 57 minutes, 10 seconds north and 6 degrees, 54 minutes, 27.8 seconds east, tonight at 7:30. Good luck." Hello and welcome back. In this lesson, you will learn how to create audio files from marked-up text files. Topics covered in this lesson include reviewing the text-to-speech and audio file creation process, how to create audio files using Google and Amazon Polly voices, and how to create audio files in different languages. So what I'm going to do in this lesson is walk you through the process of taking content that has been added to a plain text file, marking it up with SSML tags, and then converting it into an audio file like this: "A Turing test is a method of inquiry in artificial intelligence (AI) for determining whether or not a computer is capable of thinking like a human being. The test is named after Alan Turing, the founder of the Turing test and an English computer scientist, cryptanalyst, mathematician and theoretical biologist." We will now go through the process of turning a marked-up text file into an audio narration.
First, let's review once again the steps involved in the text-to-speech process. The process begins with creating text-based content. This content can be in the form of a narration script, an article, sales copy, training instructions, a book, et cetera. After your content has been written, the next step is to select your text-to-speech engine. As mentioned previously, you need to choose your text-to-speech engine before marking up your text, because different text-to-speech platforms may not support or allow you to use all SSML markup tags. In the sample text file I've just played you, for example, the content uses words that require a different phonetic pronunciation, as well as effects like whispering, which only Amazon Polly currently offers. So for that example, we used an Amazon Polly voice for the audio narration instead of selecting the Google TTS engine. After selecting your TTS engine, the next step is to mark up your text using SSML tags. To complete this step, please review all the markup tutorials provided in the previous module of this course. After marking up your text file with SSML tags, the next step is to run your content through your TTS tool. After selecting your text-to-speech processing tool, the next step is to select your language or dialect, choose a male or female voice for your narration, import your SSML text file and then convert your text into an audio file. After creating your audio narration, you can download or export your audio file and use this for whatever application you need, such as video narrations, web pages, podcasts, audiobooks, et cetera. George will now walk you through this process and show you, step by step, how to create an audio narration from a text file. Thank you, Kate. Here we have our marked-up text file. As you can see, we have added the open and close speak tags and inserted additional SSML markup tags.
Because this text file uses phonemes, we will need to convert this text into an audio file using the Amazon Polly text-to-speech engine, as only Amazon Polly can currently interpret phonemes and phonetic markup tags. So this is the text file that we will upload to our text-to-speech processing tool and convert into an audio file. Let's go now to the text-to-speech processing tool. As previously mentioned, this course has been created using a couple of text-to-speech processing tools. We'll use WaveNet Vocalizer for processing text files using Google voices and Script Vocalizer for processing text files using Amazon Polly voices. Both applications work in exactly the same way, as they were created by the same software developer. Let's log in to these tools, starting with WaveNet Vocalizer. Once you're logged in, go to Add New. This will bring you to the main screen of WaveNet Vocalizer. Let's also go ahead and log into Script Vocalizer. Click on Add New. As you can see, both tools are laid out in exactly the same way. The only difference between these tools is their ability to process different text-to-speech features of SSML. We have covered these features and differences extensively in the SSML markup tutorials, so refer to that module for more information on which tool to select when processing text files. The first thing to do when creating an audio file is to give the file a name. Next, select a language, then choose your voice. After naming your file and selecting a language and voice, click the Choose File button to locate, select and upload your text file. Once your text file has been selected and uploaded, scroll down to the bottom of the screen and click the Create button. Your text file will be processed and converted into an audio file. Once your text has been converted into audio, play the file and listen to the result.
"A Turing test is a method of inquiry in artificial intelligence (AI) for determining whether or not a computer is capable of thinking like a human being." If everything is OK, select the download option to download your finished audio file to your hard drive. As mentioned in another lesson, Script Vocalizer saves audio files in MP3 format and WaveNet Vocalizer saves audio as a WAV file. If you need the audio to be in a different format, you can easily convert MP3 audios to WAV files and vice versa, using the tools referred to in other lessons and the accompanying downloadable course material. Just to recap the whole process, then: make sure that your text has been marked up correctly and saved in plain text format. Log into either the WaveNet Vocalizer or Script Vocalizer tools, or both, depending on the text-to-speech tool you need. Name your file. Select a language. Select a voice. Click Choose File, then locate, select and upload your text file, and finally click the Update button to convert your text file into an audio narration. One of the benefits of using synthetic voices is that if you need to correct or improve anything, you can easily change the source text file, re-save, re-upload and repeat the process until you're happy with the results. Thank you, George. Would you also be kind enough to show our listeners how to translate text into other languages and turn their translated text files into audio narrations? Of course, it would be my pleasure. Let me show you how to translate text into other languages and how to convert translated text into voice narrations. There are two ways to do this. The first way is to write the text in a different language. This method works if you, or whoever you plan to use, can speak, read, write and understand that language. The second way is to write the content in the language you know, which for this example we'll say is English.
Then translate the text from English to another language, and then go through the markup and audio file creation process. For this short tutorial, we're going to focus on translating text into other languages and converting the result into an audio file narrated by a native-speaking synthetic voice. The first thing we need is our text file. Keep in mind when translating text into other languages for text-to-speech processing that you cannot use a marked-up version of the text, as the TTS processor will translate the markup tags, and this will produce errors. This is one of the reasons why we recommend marking up a copy of your content and not the original content file. Also, make sure that the text you plan to turn into an audio narration can be translated into a language supported by either a Google or Amazon Polly voice. So here is our text file, without any markup tags. Copy all of the content from your text file to your clipboard. Next, open up your web browser and type in Google Translate. This brings up the Google Translate tool. For this example, we want to make sure that the text input screen is set to English. Next, select the language to translate your text into. For this example, let's choose French. Paste your text into the Enter text box. Note that the Google Translate tool has a character limit, and it won't translate your text if it exceeds the limit. The tool will automatically translate your text into the language you've selected. Copy the translation to your clipboard and paste it into a plain text file. Save your text file. Repeat this process to translate your text into other languages. For example, you can translate the text into German or Chinese, or any other language supported by Google TTS or Amazon Polly voices. Now that you have translated your text, you have two options.
You can get someone who has a fluent understanding of the language to help you mark it up, or you can convert the translated text, as is, into audio using WaveNet Vocalizer or Script Vocalizer. There are a few ways to convert translated text into audio using WaveNet Vocalizer. The first way is to make a copy of the translated text file, so you preserve the original content, add opening and closing speak tags to the beginning and end of the text, and then save the file. Next, log in to WaveNet Vocalizer. Click on Add New. Give your file a name. Select the language of your translated text file. Choose a voice. Next, click the Choose File button and locate, select and upload your translated text file with the added speak tags. Click the Create button. After your text file has been processed, check that the file has been converted into audio. [Audio sample in French.] If you're happy with the result, download the audio file to your hard drive. If not, fix anything that needs fixing and repeat the process until you're happy with the results. If you're using the upgraded version of WaveNet Vocalizer, you can skip using the Google Translate tool and use the built-in translation feature directly inside the tool itself. To do this, simply upload or paste the English text version, without any additional markup tags, into the paste text box. Click the Translate button and then click the Create button. Check your audio narration after processing. [Audio sample in French.]
Script Vocalizer doesn't have a built-in translation feature, but you can create audio files from translated text using the language tag with a native-speaking voice, as explained in the text-to-speech markup tutorials. To do this, log into your Script Vocalizer admin area. Click on Add New. Give your file a name. Select the language of your translated text file. Choose a voice from that language set. Next, click the Choose File button and locate, select and upload your translated text file with the added speak tags. Click the Create button. After your text file has been processed, check that it has been converted into audio. [Audio sample in French.] Download the audio file to your hard drive. Congratulations. You've just learned how to translate text into other languages and how to convert translated text into voice narrations. Thank you, George, for showing our listeners how to create audio files from marked-up and translated text files. This brings us to the end of this lesson. I hope that you have found this information useful, and thank you for listening.

20. 19 - Text-To-Speech Tips: [Music: synthetic voices sing the round "Make New Friends": "Make new friends, but keep the old; one is silver, the other is gold. A circle is round, it has no end; that's how long I will be your friend."]
[Music continues.] Hello and welcome back. In this lesson, we provide tips on marking up text-to-speech files, tips for creating voice narrations and synchronizing these to screencast or desktop video recordings and video overdubs, troubleshooting tips on what to do if you experience issues or errors, and some closing thoughts on where to go and what to do after completing this course. Let's start with some basic tips. Make sure you master the basics of using text-to-speech before getting started. It's important to manage your expectations and know what you can and can't do when using synthetic voices. Current text-to-speech technology is great, but it's not perfect. The technology, however, will only improve over time, so expect things to keep getting smarter and better. It's also important to understand processes such as marking up text and converting text to audio, so make sure to review all the course videos and documentation before you get started. Make sure that you have access to all the tools you will need and that you know how to use these tools. All the tools shown in this course are quite easy to use and require no technical skills or knowledge. Review our text-to-speech tools lesson if you need help or more information. Remember to mark up your text using plain text files only. Don't use formatting on your text, like bold or italics, as this will create errors during the audio conversion process. We recommend making a copy of your original text file and working on the copy for things like markups. This preserves your original text for things like slide presentations, web content and a range of other uses. Also, remember to save text files in UTF-8 format.
This matters when your content uses phonetic symbols. If you need help, refer to the markup tutorial on text pronunciation. George will now take you through a step-by-step video walkthrough with tips on how to mark up your text files. Thank you, Kate. Let me share with you a quick and practical way to mark up your text files. Here is the original text file. As you can see, there are no markup tags on this text. The first tip, then, is: don't work on your original text file. Instead, make a copy that you will work on when marking up your text. This way, you preserve the original plain text file without markup tags in case you need to use it for something else, like copying and pasting sentences into presentation slides, blog posts, web pages, etcetera, and you will have a marked-up version of the text that you can keep reusing and re-editing if required. So let's create a new text file for the marked-up version of our text. We'll save this file in a moment. In the power user tips section of this tutorial, we recommend creating a plain text cheat sheet or swipe file containing SSML tags and snippets of text and tags that you can easily copy and paste into new text-to-speech files. As you can see here, you can keep adding items and snippets and keep this swipe file handy whenever you're working on a new text-to-speech project. You can even save whole marked-up sentences that you use repeatedly, like narrations for slide presentation intros or closing statements. The first thing to add to your new blank text file is the opening and closing speak tags. You can either type these in or just copy and paste these from your swipe file. Next, select all of the content from your content file, then copy and paste this into your markup file between the open and close speak tags. The next thing I recommend doing is to get rid of any spaces between lines. This makes our marked-up text file tight and compact, and makes it easier to spot any glaring errors or mistakes.
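The first steps George describes (work on a copy, add the opening and closing speak tags, paste the content between them, and remove the blank lines) could be sketched in Python roughly as follows. This is only an illustrative sketch and not a tool from the course: the function name `wrap_in_speak` and the sample text are made up for the example, and escaping characters such as `&` is an added precaution that the lesson does not mention.

```python
from xml.sax.saxutils import escape


def wrap_in_speak(plain_text: str) -> str:
    """Return a marked-up copy of plain_text wrapped in <speak> tags.

    Blank lines are dropped so the markup stays compact. The original
    string is left untouched, mirroring the 'work on a copy' tip.
    """
    lines = [line.strip() for line in plain_text.splitlines()]
    # escape() protects characters like & and < that would break the markup
    kept = [escape(line) for line in lines if line]
    return "<speak>\n" + "\n".join(kept) + "\n</speak>"


# Hypothetical sample content, for illustration only
original = "Hello and welcome.\n\nThis is line two.\n"
marked_up = wrap_in_speak(original)
print(marked_up)
```

In practice you would read the original from one file and write the result to a second file saved as UTF-8, so that the plain text version stays available for slides and web pages.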
Next, we have found that adding paragraphs and pauses to the text helps to create a more natural-sounding voice narration. So let's add paragraphs and breaks to every line. When using paragraphs, remember that we need open and close paragraph tags. A quick way to do this is to go through the text and add all the opening paragraph tags first. Then make sure your text file is set to word wrapping and add all the closing tags at the end. We've already created closing paragraph tags with one-second breaks, so we'll just copy these tags from our swipe file and add them to the end of each line of text in our markup file. Remember to keep saving your text file at regular intervals. As I said, we also like to add breaks at the end of each line, and we have found that a one-second break between paragraphs tends to slow down the narration a little and create a nice and natural-sounding pause between sentences. Speaking of breaks and pauses, we have also found that adding pauses of around 200 milliseconds between multiple items separated with commas and pauses of around 500 milliseconds between sentences in paragraphs helps to further enhance the natural-sounding effect of the voice narration. Experiment with pauses and breaks of various durations to find what works best for you and the project you're working on, but this is generally the next step that we like to do. Also, as you move from marking up text to converting your text file into audio, you will find that some parts of the speech need longer breaks and some won't need any at all. So keep experimenting, adjusting and fine-tuning until your voice recording sounds as natural as you can make it. After adding breaks and pauses, add any other markup tags your text needs. All of these were covered in the SSML markup tutorials, so please refer to those lessons and to the accompanying course materials.
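The paragraph and pause pattern described above (a one-second break at the end of each line, around 200 milliseconds after commas, and around 500 milliseconds between sentences inside a paragraph) could be automated as a starting point with a short Python sketch like the one below. The function names are hypothetical, the exact placement of the one-second break just before the closing paragraph tag is an assumption, and the simple punctuation matching will not suit every text, so treat the output as a first draft to fine-tune by ear.

```python
import re
from xml.sax.saxutils import escape

# Durations taken from the lesson; adjust them to suit your project
COMMA_BREAK = '<break time="200ms"/>'
SENTENCE_BREAK = '<break time="500ms"/>'
PARAGRAPH_BREAK = '<break time="1s"/>'


def mark_up_line(line: str) -> str:
    """Wrap one line of text in paragraph tags and add pauses."""
    text = escape(line.strip())
    # Pause between sentences that share the same paragraph
    text = re.sub(r"([.!?])\s+", r"\1 " + SENTENCE_BREAK + " ", text)
    # Shorter pause after commas that separate items
    text = re.sub(r",\s+", ", " + COMMA_BREAK + " ", text)
    # Assumed placement: the one-second break sits before the closing tag
    return f"<p>{text}{PARAGRAPH_BREAK}</p>"


def mark_up_text(plain_text: str) -> str:
    """Mark up every non-blank line and wrap the result in speak tags."""
    lines = [ln for ln in plain_text.splitlines() if ln.strip()]
    body = "\n".join(mark_up_line(ln) for ln in lines)
    return f"<speak>\n{body}\n</speak>"


# Hypothetical sample line, for illustration only
sample = "Apples, pears, and plums. All ripe."
print(mark_up_text(sample))
```

Because some parts of a narration need longer breaks and some need none, a generated file like this is best reviewed and adjusted by hand after the first audio conversion.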
Consult them whenever you need help or additional information. Keep going until your text file is done and ready to take to the next step, which is to convert your text into speech. Once again, you don't have to worry too much about your markup at this stage, as you can keep coming back to this file and making adjustments and improvements. If there are any spelling mistakes or glaring errors, you will be able to pick these out when testing out your text-to-speech conversions. Normally, most errors occur from forgetting to add closing tags or writing tags incorrectly, such as missing quotation marks, symbols, etcetera. Remember to keep saving your file as you go and stay focused as you work. Take little breaks often if you need to, as this stage of the process requires attention to detail. After repeating this process a few times, you will begin to develop an instinctive feel for marking up text with breaks, pauses, prosodic elements and various other features to create audio narrations that sound as natural and human-like as possible. So this is the process for marking up text files. Remember to preserve your original content file by creating and marking up a copy of the content. This way you can keep reusing the original content and keep working on editing and improving the markup of your text without losing the original content of your speech. Thank you, George. Now that we have covered some basic tips, let's look at power tips that can improve your text-to-speech workflow and help you get better results. As George mentioned in the video, we recommend creating a swipe file or a cheat sheet to store commonly used SSML tags and text snippets. This way you can quickly and easily cut and paste markup tags and other snippets, like marked-up text for slide intros and endings, into your text to help you save time. Invest time into marking up your text. Try to get your narration sounding as close to lifelike as you can. Also invest time into getting your narrations right.
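The most common markup mistakes mentioned in this lesson, such as forgotten closing tags or missing quotation marks, can often be caught before uploading a file. The sketch below uses Python's standard XML parser for a basic well-formedness check. It is an assumption-laden illustration: `check_ssml` is a made-up helper, it assumes the speak tag is written without a namespace attribute, and passing this check does not guarantee that a particular text-to-speech engine supports every tag in the file.

```python
import xml.etree.ElementTree as ET


def check_ssml(markup: str) -> str:
    """Return 'OK' if the markup is well-formed, else a short error message."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # The parser reports the line and column where it gave up
        return f"Markup error: {err}"
    if root.tag != "speak":
        return "Markup error: the outermost tag should be <speak>"
    return "OK"


# Hypothetical examples, for illustration only
print(check_ssml('<speak><p>Hello<break time="1s"/></p></speak>'))
print(check_ssml("<speak><p>Hello</speak>"))          # missing </p>
print(check_ssml("<speak><break time=1s/></speak>"))  # missing quote marks
```

The line and column in the error message usually point at or just after the faulty tag, which makes it quicker to find in a long file.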
Your voice narrator will then do a great job every time. Become familiar with all the different voices and voice personalities and learn how to match the right voice to the job. Google and Amazon Polly offer a range of voices in different languages and dialects. Use the recorded audios to improve your copywriting skills, create more effective sales messages and write more powerful scripts. Repurpose your text and use the same text in different applications, convert your narrations into different languages and more. If you are working on a large piece of text, break it down into smaller segments before converting these into audio. Audio files can be easily joined together to create full-length audio tracks. The last power tip I want to share with you is using background music to help take the artificial edge off your narrations. Music and imagery can create a powerful effect with well marked-up voice narrations. In some cases, it can be difficult to tell if the narration is being spoken by a human or synthetic voice. Once again, I am going to ask George to demonstrate to you how well music, video and synthetic voice narrations can work together. Hello, I'm George. I am an artificially generated voice narrator. Someone like me can save businesses time and money in areas like video marketing, which everyone knows is one of the most powerful and effective ways to promote products and services online, reach new audiences globally, establish your brand, educate and inform your prospects about your business and train staff, customers and clients. Some great uses for AI voice narrations include sales videos, explainer videos, training videos, video ads, video presentations, podcasts, spoken books, web pages for visually impaired users and so many other uses. Once you know how to convert text to speech, you can create videos with audio narrations like this one quickly and easily using very inexpensive tools. Thank you for watching this video and have a wonderful day.
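The earlier power tip about breaking a large piece of text into smaller segments before conversion could look something like the following Python sketch, which splits plain text at sentence boundaries so that each segment stays under a character limit. The function name and the default limit of 3000 characters are placeholders chosen for illustration; check the actual limit of the tool you use. The sketch works on plain text before markup is added, so that tags are never split across segments.

```python
import re


def split_into_segments(text: str, max_chars: int = 3000) -> list[str]:
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit, so start a new segment
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


# Hypothetical sample with a tiny limit so the split is easy to see
parts = split_into_segments("One. Two. Three.", max_chars=9)
print(parts)
```

Each segment can then be marked up and converted separately, and the resulting audio files joined in an audio or video editor, which also makes it easier to isolate a section that causes errors.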
I want to show you now how to create audio tracks that will synchronize well with naturally timed screencast video recordings. This is useful if you plan to record over-the-shoulder desktop videos like screen tutorials and so on. First, create a rough guide audio track for your video using a human voice, which you will replace later with a well thought-out synthetic voice narration. To do this, record a rough audio track with a human voice to create a natural sense of timing for your screen recording and to lay down content markers and general guides for the actions and ideas you want to express in your video. If you use a video editing tool like Camtasia that can separate video and audio tracks during the editing process, then don't worry about recording a low-quality audio track with lots of ums and ahs, coughing, sneezing, sniffles, dogs barking, traffic sounds in the background, mistakes, etcetera, as you will not use this track in your final edit. Just focus on recording the action on your screen. After recording the rough guide, transcribe the audio track and improve your text narration. Write each sentence on a separate line with pauses between sentences. The next step is to convert your text into speech. This step is covered in a previous training module. Next, add, edit and match the synthetic voice narration audio file to the screen video recording. After synchronizing the synthetic voice narration with your video, switch off or delete the human voice track to create a finished video delivered with a natural sense of timing and an effective, accurate and professional-sounding voice narration. Here is a quick video demo so you can see what this looks like. So here we have our marked-up text file, and as you can see, we've already added the opening and the closing speak tags, and we also have a number of other SSML markup tags already inserted into the text. Now because we are using, ah, phonemes, as you can see here, we will be using Amazon.
We will need to use an Amazon Polly voice instead of Google because only the Amazon Polly voices right now can interpret phonetic alphabets, um, phonemes. So this is our, this is the text file that we will be uploading to convert into audio. So let's now go to our text-to-speech processor and convert this file, this text file, into an audio file. So here we have our marked-up text file. As you can see, we have added the open and close speak tags and inserted additional SSML markup tags. Because this text file uses phonemes, we will need to convert this text into an audio file using the Amazon Polly text-to-speech engine, as only Amazon Polly can currently interpret phonemes and phonetic markup tags. So this is the text file that we will upload to our text-to-speech processing tool and convert into an audio file. When creating voice narrations for slide presentation videos, you can insert slide change markers into the narration with pauses on either side to allow the slide to transition. You can create a slide change marker using spoken words, like saying "change slide", or using sounds such as a [sound plays]. Markers can be deleted from the audio track in the video editing process. Something else you can easily do with synthetic voices is to correct sections of your audio with new text narrations. To do this, create a new text file using the line or section of text that needs fixing. Run the marked-up text file through your text-to-speech processor, save it as a new audio file and replace the section of your audio track with the new one in your final edit. Let's talk now about troubleshooting: what kind of errors and challenges you can expect to deal with when processing text-to-speech files, and what to do to solve or fix any problems and issues that arise. First, what if you can't log into your text-to-speech processing tool? If this happens, check that you have entered the correct login details, and if this doesn't solve the problem, then contact the software developer.
Open a ticket in their help desk or get in touch with their support team. One of the more common problems you will probably encounter is error messages when processing text-to-speech files. If this happens, check your text for missing or extra SSML tags, such as incorrect opening or closing tags. Similarly, check opening and closing tags for missing elements such as opening or closing brackets, colons, quote marks, etcetera. With most problems, check your text for markup tag errors. After fixing these, re-save, re-upload and rerun your file through the TTS processor. Also, check that you have actually uploaded a text file. A common oversight is to open the TTS tool, select a language and voice and then run the processor without having uploaded a text file. Another thing you can check is that you haven't exceeded any limits, such as having too many characters or audio links in your text file. Finally, if you experience problems after running your text-to-speech processor, try breaking down large text files into smaller segments. Then convert these into audio files and check your resulting audio to see if you can isolate any mistakes or sections of text that may be causing issues. In closing, after completing this course, make sure to download the TTS tools and resources PDF file and cheat sheet documentation. Familiarize yourself with the tools; know where to access and how to use these. Begin applying your new skills. If you have a website, create narrations for your sales videos, training videos, spoken web pages for visitors, start a regular podcast, turn newsletters into audio content for your subscribers, etcetera. Challenge yourself. Start a new audio-based project or recreate an existing work using synthetic voice narrations. You can also start a business offering professional text-to-speech services to clients. Whether you decided to take this course to improve your skills, grow your business, reach a wider audience or for any other reason,
I hope that you have enjoyed learning how to use text-to-speech to create professional-sounding voice narrations. I also hope that this course has shown you that what you can do using text-to-speech is only limited by your imagination. One last thing. Please keep in touch with us by visiting the link shown here and subscribe to receive course updates, useful tips and information and news on the latest text-to-speech developments. This brings us to the end of this course. I hope the knowledge that you have gained in these lessons will open up many opportunities and wonderful new horizons for you. On behalf of myself and the whole AI Narrator team, thank you. [Outro music plays.] 21. 20 - Text-To-Speech Resources: Hello, it's me again. Here in the resources section, you will find a lot of useful information, including downloadable files with links to all the tools and resources we've covered in this course, additional time-saving tools and resources, audio transcripts with markup tags so you can learn how we created some of the content in the lessons, SSML markup tag cheat sheets for Google and Amazon Polly, references with links to all the research done to create this course, and additional information you may find useful. Please remember to visit the link below and subscribe to stay in touch and receive course updates, useful tips and information and news on the latest text-to-speech developments. Once again, thank you so much for your company and for being part of this exciting journey. I wish you great success