Transcripts
1. About Course: In this video course, learn Hugging Face and its concepts. Hugging Face is a company and open-source community that focuses on natural language processing and artificial intelligence. It is best known for its Transformers library, which provides tools and pre-trained models for a wide range of NLP tasks, such as text classification, sentiment analysis, machine translation, and more. In this course, we have covered the following lessons with live running examples. Let us start with the first lesson.
2. Hugging Face - Introduction and Features: In this lesson, we will learn what Hugging Face is. With that, we will also understand its features. Let us start. Hugging Face is a widely known company and open-source community that focuses on NLP, that is, natural language processing. It also focuses on artificial intelligence. Hugging Face is best known for its Transformers library, which provides tools and pre-trained models so that a wide range of NLP tasks, such as sentiment analysis, machine translation, and text summarization, can be performed. The most widely used Hugging Face libraries are Transformers, Datasets, and Tokenizers. Let us see the features. Hugging Face includes lots of libraries. One of the key libraries is Transformers, which includes pre-trained models such as BERT. Hugging Face also includes the Model Hub, a platform where users can share and download pre-trained models. With that, users can also download datasets and other resources. Hugging Face also includes a library for a variety of datasets. The library is called the Datasets library and is used for NLP tasks. Hugging Face also has a platform for hosting and sharing machine learning demos and applications, which is called Spaces. With that, using Hugging Face, you can easily deploy and use models in production environments. Hugging Face has a strong community and collaboration, that is, a community of developers and AI enthusiasts that contribute to the ecosystem. Let us see some of the popular models on Hugging Face. The widely used BERT is used for understanding the context of words in a sentence. Its full form is Bidirectional Encoder Representations from Transformers. It is a powerful open-source machine learning framework developed by Google for NLP. It excels at understanding the context of words and sentences by analyzing relationships between them in a bidirectional manner, which allows computers to better understand the meaning of ambiguous language. The popular models also include GPT, that is, the Generative Pre-trained Transformer, and T5, the Text-to-Text Transfer Transformer. With that, there is RoBERTa, that is, the Robustly Optimized BERT Approach. RoBERTa is a transformer-based language model that employs self-attention to analyze input sequences. RoBERTa applies dynamic masking, where the masking pattern is changed. It offers enhanced performance on various NLP tasks. So you can also relate BERT with RoBERTa. Consider that the main goal of the RoBERTa model is to improve the performance of the BERT model by addressing its limitations. These were the popular models of Hugging Face. In this lesson, we saw what Hugging Face is, its introduction and features, and some popular models. Thank you for watching the video.
3. Hugging Face - Use Cases: In this lesson, we will understand the use cases of Hugging Face. Hugging Face supports a wide range of use cases across NLP, computer vision, and even multimodal applications. Let us see the use cases. Hugging Face is widely used; we have discussed some key use cases here, beginning with conversational AI, which you already know, that is, chatbots. Okay: build intelligent chatbots using models like GPT, BlenderBot, and others. These chatbots can be used for customer support, like virtual assistants. With that, you can also create interactive dialogue systems. These can be used as educational assistants as well as therapy bots. Next comes sentiment analysis. As the name suggests, you can easily analyze
customer feedback, analyze their social
media posts, or the responses to a survey so that you can determine whether the sentiment is positive, negative, or neutral. With that, easily generate text: generate articles, blogs, even poems using models like GPT. Code can also be easily generated: generate code snippets in any programming language. With that, content can be generated, including product descriptions, reviews, marketing plans, and others. Next comes text summarization. If you want to
summarize your text, let's say you want
to summarize news, you can easily do it. With that, let's say you have some PDF documents and you only want to summarize them. Those lengthy documents
can be easily summarized into
important points. With that, you can also
jot down meeting notes. Then comes named entity recognition: easily extract names, skills, and experience from resumes. It is also useful in healthcare to identify diagnoses, the names of patients, some medical terms, and others. With that, you can also extract the names of companies and how they are performing from their financial reports as well. Therefore, it is also used in the finance domain. Machine translation, as the name suggests: you can translate your website, app, and even documents from one language to another, let's say from English to Spanish. It can also be used for translation in low-resource languages. The question answering use case is mostly useful for customer support. With that, easily answer the questions asked by students based on a specific textbook or notes. FAQs can be easily answered, and when I said customer support, that itself means, like we have seen, support tickets on websites so that users can easily ask questions. With that, you can
easily retrieve answers from large documents
or databases. Also use it for speech recognition and synthesis. You can convert speech to text and build voice-controlled applications using speech-to-text and text-to-speech models. You can also provide real-time captioning and generate descriptions
for images. Let's say you have scanned documents or images and you want the text from them; you can easily achieve this. Also, if you want to read or scan images and answer questions about them, that can also be achieved; that means visual question answering. Then come recommendation systems. You must have seen them on Netflix or Amazon Prime. Easily recommend movies or web series based on what people are actually liking in their account. With that, enhance the search results by understanding the intent of users and their context. Also detect fraud easily. With that, regarding emails, easily detect and filter spam emails. Mental health can also be monitored using a model: easily analyze text or speech so that emotions can be detected, such as stress, anxiety, or even depression. With that, understand the emotions of customers during support calls or even chats. Text to speech and speech to text can also be achieved, and real-time translations can be easily worked upon. With multimodal applications, you can easily analyze not only text, but also video and audio. Augmented reality applications can also be built. Easily generate synthetic text data for training machine learning models. You can also paraphrase text and identify relationships
between entities in text, grade the essays or assignments
of students easily, build tools for
grammar correction, vocabulary, rephrasing
content, and others. Use in healthcare
and life sciences. From the medical
records of a patient, you can easily
extract the insights. From legal documents, easily
extract the key clauses, obligation or any possible risk. With that, you can also create AI driven narratives for games. Analyze social media
easily so that you can identify the trending
topics or even hash tags. Also analyze the impact of
post done by influences. Multilingual
applications can also be created so that you can enable search across
multiple languages. Hate speech and
harmful content is something that needs to work on. With this, you can easily
detect and moderate them. Easily predict the
stock market trends by analyzing news articles, social media sentiments,
Twitter posts and others. Predict events like
a product launch, personalized email content
for marketing campaigns, customize website or app content based on user preferences
and behavior. Fine-tune pre-trained models for your specific task. Also, you can compare the performance of different models on custom datasets. So, guys, we saw some of the great use cases of Hugging Face. In the upcoming lessons, we will implement some of them.
4. Transformers Library of Hugging Face: In this lesson, we will understand the Transformers library of Hugging Face. We will also learn how to install it. Let's see. The Transformers library is the core library for pre-trained models and pipelines. It is an open-source Python library. As I said before, Hugging Face developed the Transformers library, and it is modular and extensible. It includes thousands of pre-trained models for a wide range of NLP tasks, such as translation, text summarization, text classification, and others. So in this lesson, we will understand what the Transformers library is, why to use the Transformers library, and its use cases, as well as how to install it. Let us start. We already saw what the Transformers library is. Now we will see why to use the Transformers library. The Transformers library is
widely used because it is quite simple to use with complex NLP models. It provides you access to cutting-edge models. With that, it is backed by a large and active community. It supports customization and fine-tuning. With that, you can integrate the Transformers library with other tools. Here are some use cases of the Transformers library: classify texts into categories, as in text classification for spam detection in emails. Also identify entities like names, dates, and locations in text, which is called named entity recognition. Translate texts between different languages, from English to German. With that, generate text using models like GPT. Also implement question answering, that is, answering on the basis of a given context. Let us see how to install
the Transformers library. So here are different ways. Use pip to install the Transformers library; pip is a package manager to download, install, and manage Python packages and libraries. With that, you can also use Google Colab. Here you can find some difference in the syntax: there is an exclamation mark if you're installing it on Google Colab. With that, you can also install the Transformers library directly from the Hugging Face GitHub repository. All three variants are sketched below. So let us see how to install it. We will use Google Colab for it. We will add the following command. Let's see.
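As a quick reference, here is a sketch of the three install variants just described; the GitHub URL points at the official huggingface/transformers repository:

    # Standard install with pip (in a terminal)
    pip install transformers

    # In a Google Colab cell, prefix the command with an exclamation mark
    !pip install transformers

    # Install the latest version directly from the GitHub repository
    pip install git+https://github.com/huggingface/transformers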
Here is our browser. I'll type "Google Colab" and press Enter. Here is the link provided by Google: colab.research.google.com. Here you can see I have already logged into my Gmail account, so it will open directly. I clicked on it, and it is asking me to create a new notebook here. These are my already created notebooks. I'll click New Notebook. So it is a free web application. So now we will use the same command here to install it. I'll show you again; here is the command. Okay, let us type the same command: !pip install transformers. After that, what we need to do is just click this; "Run" is written here. Can you see? It runs. Okay, so in this way, we can install the Transformers library using Google Colab. You can add the name of your Python notebook here. So this created a Python notebook. If you know Anaconda, you can easily guess what a Python notebook is. Save it from here and rename it later. So here I've just implemented this syntax to install the Transformers library. In this lesson, we saw what the Transformers library is. We also saw why it is so popular. With that, we also saw some use cases and how to install it.
5. Datasets Library of Hugging Face: In this lesson, we will understand the Datasets library of Hugging Face. With that, we will also see how to install it. Let us start. The Datasets library provides easy access to a wide variety of datasets for NLP and other machine learning tasks. It is developed by Hugging Face and is a Python library. It makes it easier for developers and researchers to work with data for training and evaluating models. So in this lesson, we will see what the Datasets library is, why to use it, the use cases of the Datasets library, and with that, how to install it. Let us start. We already covered what the Datasets library is, so let us start with why to use the Datasets library. One of the reasons is efficiency: lazy loading and streaming make it easy to work with large datasets. Datasets can be huge, and we always need a library or a technology to ease the work of accessing and working on those datasets, so this library really helps. It has a unified API for processing datasets. You can also use the Transformers library and other ML frameworks alongside the Datasets library, so integration and interoperability are possible. Thousands of datasets are provided by Hugging Face. It supports custom datasets and preprocessing pipelines. So before installing the Datasets library, let me show you its website. So here is the link, huggingface.co/datasets, the official Hugging Face website. So you can see how
many datasets are provided: over 350K, and here they are. If you click on any one of them, you can get all the details. In this tutorial, we will also show you how to download and access a dataset easily using Hugging Face. Now let us see the use cases of the Datasets library. Easily load and preprocess datasets for tasks like spam detection, which comes under text classification. With that, you can also work with sentiment analysis and question answering; using this, you can easily build question answering systems. Some datasets are also provided for translation. With that, the named entity recognition purpose can also be fulfilled with some datasets already provided by Hugging Face. Load and preprocess your custom datasets using the Datasets library. Now let us see how to install
the Datasets library. So you can use pip, the package manager to download, install, and manage Python packages and libraries. Just use the command pip install datasets. With that, you can also use Google Colab easily, but there is a difference between the two syntaxes: you have an exclamation mark for Google Colab. We will see it later. With that, you can also install it directly from the Hugging Face GitHub repository using the provided syntax; the git+https://github.com/huggingface/datasets part tells pip to install the package from the Hugging Face datasets repository on GitHub. These variants are sketched below.
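Here is a sketch of the same three variants for the Datasets library:

    # Standard install with pip (in a terminal)
    pip install datasets

    # In a Google Colab cell
    !pip install datasets

    # Install directly from the Hugging Face GitHub repository
    pip install git+https://github.com/huggingface/datasets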
Now let us see how to install the Datasets library on Google Colab. We already saw Google Colab, so I'll just use the second syntax to install the Datasets library there. So this was our Google Colab. We already saw how to install the Transformers library. We can install the Datasets library here itself, but let me create a new notebook: go to File, click New Notebook. Now a new Python notebook has opened. Let us type the command to install datasets, !pip install datasets, and just run the cell from here. I've shown this before as well. Let's wait. The tick mark is visible; that means we successfully installed it. You can also save it from here, as I told you before, and let us add a name to our Python notebook. So in this way, guys, we can easily install the Datasets library. In the upcoming lessons, we will also see how to work with datasets and their use cases. Guys, we saw what the Datasets library is. We easily understood the concept and its use cases, and we also saw how to install the Datasets library.
6. Tokenizers Library of Hugging Face: In this lesson, we will understand the Tokenizers library of Hugging Face. With that, we will also see how to install it. The Tokenizers library is a fast and efficient library for tokenizing text, which is often used alongside the Transformers library. We already saw the Transformers library in the previous lessons. So the Tokenizers library is a fast, efficient, and flexible library designed for tokenizing text data, which is a crucial step in natural language processing. Tokenization involves splitting text into smaller units, such as words, subwords, or characters; these are then converted into numerical representations that ML models can process. In this lesson, we will understand what the Tokenizers library is, why to use it, its use cases, as well as how to install it. So let us start. We already saw what the Tokenizers library is. So now we will see why to use
the Tokenizers library. It is quite quick for tokenization; that means it is optimized for fast tokenization, even on large datasets. It also supports custom tokenizers, and it is flexible enough to support multiple tokenization algorithms. Integration is possible; that means you can use it with other Hugging Face libraries like Transformers. It has an easy API for tokenizing, decoding, and managing vocabularies. With that, you can easily access pre-trained tokenizers. Now let us see the use cases. Easily tokenize textual data for classifying text, for example for spam detection in emails. With that, you can also analyze the spam. Easily perform sentiment analysis, or align tokens with entity labels. It is also used for machine translation. Some of its other use cases include text generation and even question answering. With that, train and use tokenizers for domain-specific datasets. Now, let us see how to install
the Tokenizers library. We can use pip, the package manager to download, install, and manage Python packages; use the syntax pip install tokenizers to install it. With that, we can also install the Tokenizers library on Google Colab. We already saw how to install a library on Google Colab; similarly, we can use the exclamation mark, !pip install tokenizers, to install it. Also, as a third way, you can tell pip to install the package directly from the Hugging Face tokenizers GitHub repository: type pip install followed by git+ and the GitHub path to install it. These variants are sketched below.
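Here is a sketch of the three variants for the Tokenizers library:

    # Standard install with pip (in a terminal)
    pip install tokenizers

    # In a Google Colab cell
    !pip install tokenizers

    # Install directly from the Hugging Face GitHub repository
    pip install git+https://github.com/huggingface/tokenizers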
Now let us see how to install the Tokenizers library on Google Colab. We will open Google Colab again. So here is our Google Colab. We already installed the Transformers and Datasets libraries. We can install the Tokenizers library here itself, but let me create a new Python notebook: go to File, click New Notebook. Now, let us type the command, !pip install tokenizers, and click on the cell: click here, Run cell. Now the Tokenizers library will get installed. You can also save this; as I told you before, it will create a Python notebook. So here, I'll type a name starting with amith_; you can add any name. And this is our Python notebook. Okay, we will utilize all these libraries later on when we work on the use cases of Hugging Face. So, guys, we saw what the Tokenizers library is. We also saw its purpose, as well as its use cases. With that, we also installed the Tokenizers library on Google Colab.
7. Hugging Face Access Token (API Key) & How to Create: In this lesson, we will learn what a Hugging Face access token is. With that, we will also learn how to create it. Let us start. Consider an access token as a secure string of characters. This is mainly used to access Hugging Face services and resources. The Hugging Face API key and the Hugging Face access token are the same thing. So in this lesson, we will see what an access token, that is, an API key, is. With that, we will learn when we need a Hugging Face access token. Also, we will understand when the Hugging Face access token isn't required. In the end, we will learn how to create an API key. Let us start. So we covered what an API key, that is, an access token, is in Hugging Face. Now let us see when we need a Hugging Face access token. Here it is. When you're using a private or gated model
or an Inference API, you need a Hugging Face access token. You must have heard about Meta's LLaMA. It is a gated model; to access it, you need to authenticate. That means you need to create an API key; you need a token. With that, if you're using the Hugging Face Inference API, then you need an access token to make API calls. Also, if you're uploading models, datasets, or even Spaces to the Hugging Face Hub, you need an access token. Now let us see when you do not need a Hugging Face access token. Obviously, if you're accessing public models which are publicly available to download and use, you don't need an access token, just like GPT-2. Also, if you're using the models via the Transformers library of Hugging Face, you don't need an access token. These are publicly available and can be easily downloaded without any authentication, without any API key. Also, a lot of open-source models are available; to access these models, you don't need an API key, you don't need an access token, because they are freely available. So in the upcoming lessons, we'll be working on these public and open-source models only, so that there is no need to create a Hugging Face access token. Now, let us see how to create a Hugging Face access token. So we will go to the Hugging Face website and we will create an access token. So let us start. Open the official website, huggingface.co/join, and press Enter. So here it is: you need to join. That means you need to create
an account on Hugging Face. Here you can use your email address. So let me create my account. So here I have added my email ID. Now enter the password. Here it is; now click Next. Complete your profile here: add a username, add your name. You can also add your Twitter username; these are optional, the LinkedIn profile also. You can also upload your avatar. I'll click it. Also, you can add your GitHub username as well as your website. As you can see, these are optional. Click "I have read". And after that, click Create Account. We have created an account. You need to check your email for a confirmation link. Now your account is verified; your email address has been verified. Click on your profile. Go below; it's written Access Tokens. Here it is. Click on it. Now, you need to create a new token by clicking here. Remember, do not share your access tokens with anyone. Create New Token: add the token name. Let's say I'll type "Demo key". Okay. Now go below. Click Create Token. The key was created successfully. You can copy it and save it. Here it's written: save it somewhere safe; you will not be able to see it again after you close this modal. Click Done. Now all your keys are visible. Here it is: we created a single key just now, and when you click here, you can edit it. You can edit the permissions, and also delete it. Okay, we saw what access tokens, or API keys, are in Hugging Face. With that, we also learned how to create one.
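To connect this to code, here is a minimal sketch of authenticating a session with an access token, assuming the huggingface_hub library is installed; the token value below is a placeholder:

    from huggingface_hub import login

    # Authenticate this session with your access token (placeholder value).
    # Needed for gated/private models, the Inference API, and uploads to the Hub.
    login(token="hf_xxxxxxxxxxxxxxxxxxxx")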
8. Download a dataset on Hugging Face: In this lesson, we will learn how to download a dataset from Hugging Face. For that, we will use the Datasets library. Let us see. A dataset refers to a collection of structured data, which can be used for training, evaluating, or testing machine learning models. So Hugging Face has a lot of datasets on its platform, which can be used for various use cases like NLP. We will use the Datasets library to download a dataset from Hugging Face. Let us see. First, let us see the datasets. Go to the Hugging Face website, /datasets. So these are the datasets provided by Hugging Face. You can see a lot of them. Let us see how we
can download it. So we will go to the same platform, Google Colab, which we have used before in this tutorial. Here it is. Okay, so this is the notebook we already created. In this, first we installed the Datasets library; I already told you how to install it on Google Colab using the pip command. After that, we loaded a dataset using the load_dataset() function. This function can download datasets from the Hugging Face Hub or load them from local files. We are downloading a dataset from the Hugging Face Hub right now. Here it is. Okay, here we are loading the IMDB dataset. After that, I'm printing the dataset using the print() function. Here we are importing the load_dataset function; this provides access to various public datasets, like IMDB in this case. Here we are loading the IMDB dataset. The IMDB dataset contains movie reviews labeled as positive or negative for sentiment classification. When you run it, it will automatically download and process the dataset. Here we are printing the dataset. The dataset is split into train and test. Okay, it will display an overview of the dataset, including the number of samples in each split.
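Here is a minimal sketch of the code we just walked through, using the public "imdb" dataset identifier on the Hub:

    from datasets import load_dataset

    # Download the IMDB dataset from the Hugging Face Hub
    # (later runs load it from the local cache).
    dataset = load_dataset("imdb")

    # Print an overview: the splits and the number of samples in each.
    print(dataset)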
Let us see the output after running. Here it is: it is showing us the structure of the IMDB dataset as a DatasetDict, which organizes the dataset into different splits. The train split contains 25K rows, with the feature "text" for the movie reviews and "label" for the sentiment, like positive or negative. Here, the test split contains 25K rows for testing purposes, with the same features. The unsupervised split contains 50K rows, but this split doesn't have labels for sentiment analysis; it's often used for tasks like pre-training or semi-supervised learning. In this way, guys, we can download a dataset.
9. Download a model from Hugging Face: In this lesson, we will learn how we can download a model from Hugging Face. Let us see. So to download, we will use the Transformers library. With that, we can also download directly from the Hugging Face Hub. Let us see a step-by-step guide to download and use models from Hugging Face. We will use the Transformers library, which we already discussed. Let us see. Here is our Google Colab. We already created a notebook: File, Open notebook. So here we already created the notebook for downloading a model. In this, what we did first: we installed the Transformers library. We already discussed that Hugging Face developed this library. So we used pip to install it on Google Colab. After that, what we did here: we downloaded a model using the Transformers library. We have used the from_pretrained() method for this. This method downloads the model weights, configuration, and tokenizer from the Hugging Face Hub. We are downloading a pre-trained BERT model. Here it is.
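A minimal sketch of this step, assuming the public bert-base-uncased checkpoint; the exact notebook code isn't shown in the transcript, so treat this as an illustration:

    import torch
    from transformers import AutoModel, AutoTokenizer

    # from_pretrained() downloads the model weights, configuration, and
    # tokenizer from the Hugging Face Hub (and caches them locally).
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    # Tokenize an input sentence and run it through the model.
    inputs = tokenizer("Hello, Hugging Face!", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # (batch_size, sequence_length, hidden_size), e.g. torch.Size([1, 7, 768])
    print(outputs.last_hidden_state.shape)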
After running it, we got the shape. This shape is commonly seen in models like BERT, where each token in a sequence is represented by a 768-dimensional vector. When we use the bert-base-uncased model and pass the input "Hello, Hugging Face!", the last hidden state output shape represents the tensor dimensions. For this example, the shape you would typically see is the following. Here, 1 is visible: it is the batch size, since there is one input sentence. 7 is the sequence length; this corresponds to the tokenized version of "Hello, Hugging Face!" including special tokens. 768 is the hidden size: each token is represented as a 768-dimensional vector, standard for the BERT-base architecture. In this way, guys, we can easily download a model using the Transformers library with Google Colab.
10. Sentiment Analysis using Hugging Face: In this lesson, we will learn how to implement sentiment analysis with Hugging Face. We will understand what sentiment analysis is, with its types. After that, we will run a coding example on Google Colab. Let us see. So we already discussed the Transformers library provided by Hugging Face; it is a powerful tool for tasks like sentiment analysis. Now, what is sentiment analysis? As the name suggests, it includes determining the sentiment expressed in a piece of text, like positive, negative, or neutral. So let's say "I love cricket"; this is a positive sentence. Okay, "I don't like something, I hate something"; that is a negative sentiment. Similarly, when I explain the types of sentiment analysis, things will be more clear. The first one is polarity detection, that is, positive, negative, or neutral. "I love this product" is positive, obviously. "The service is terrible" is not good; it is negative. And when things are not clear, it will be neutral, like "The package arrived on time." Next comes emotion detection. Let's say you said, "This is not good, this is pathetic. This is so frustrating." That is anger. And joy is expressed by a sentence like "I'm thrilled about the results." So emotion detection includes happiness, frustration, and other emotions. Then comes aspect-based
sentiment analysis, like sentiment towards a specific product or service. Take "The food was great, but the service was slow." In this case, the food has a positive sentiment, obviously, but since the service was not good, that part is a negative sentiment. Then there is intent analysis, like the intent to purchase something or to complain. Let's say you said, "Where can I buy this product?"; that is a purchase intent. So these were the types of sentiment analysis. Now, let us see the coding example. In this, we will use a public model, so we won't be creating an access token, because for public models, as I already told you, we don't need it. We will run the code on Google Colab. For efficiency, we can also change the runtime on Google Colab, so I'll also show you that with the example. Let us start. Here is our Google Colab. Okay, let me open the code: File, Open notebook. I already created the project; here it is, Sentiment Analysis. Here it is. So for efficiency, we can change the runtime type. Click the Runtime menu, then click Change runtime type. Okay, we can see we already selected the T4 GPU; not a problem. If your project is quite complex or you have a large-scale project, you can select the v2-8 TPU also. I'll keep it the same, okay? So initially here,
what we did first: we installed the required libraries, that is, transformers and torch. Okay, we used pip; we already discussed how to install it in the previous lessons. After that, we ran it. In the next lines, we imported the necessary modules. Here we loaded the sentiment analysis pipeline. The pipeline() function provides a simple way to perform various NLP tasks, including sentiment analysis. You can load a pre-trained sentiment analysis model as sketched below.
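A minimal sketch of this step, assuming the pipeline's default sentiment checkpoint (the transcript doesn't name it); the second text is a placeholder negative example:

    from transformers import pipeline

    # Load a pre-trained sentiment analysis pipeline
    # (downloads a default checkpoint on first run).
    sentiment_analyzer = pipeline("sentiment-analysis")

    # Analyze multiple texts at once by passing a list of strings.
    texts = [
        "I love playing and watching cricket.",  # expected: POSITIVE
        "I hate missing the match.",             # placeholder negative example
    ]

    # Each result is a dictionary with a sentiment label and a confidence score.
    for text, result in zip(texts, sentiment_analyzer(texts)):
        print(text, "->", result["label"], round(result["score"], 4))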
So here, what we did: we loaded the following model. Okay. After that, we performed sentiment analysis. Since we have loaded the sentiment analysis pipeline, we use it to analyze the sentiment of a piece of text. So here, "I love playing and watching cricket" is one of my texts, and "I hate it when Virat Kohli misses a century" is the other. So obviously, you can guess this is a positive sentence, and this is a negative sentence. You can easily guess it. So this is the sentiment analysis. Here the output you can see is a list of dictionaries; each dictionary contains the sentiment label and the confidence score. Here they are: the label and the confidence score. So here we have analyzed multiple texts at once by passing a list of strings to the sentiment analyzer. Now let us understand the output completely. The score in the output of the Hugging Face sentiment analysis pipeline represents the confidence level, or probability, that the model assigns to the predicted sentiment label. It indicates how confident the model is that the given text corresponds to the predicted sentiment. The score is a value between 0 and 1. As you can see, a score closer to 1 means the model is very confident in its prediction. If the score were closer to 0, that would mean the model is less confident in its prediction. The label POSITIVE
indicates that the model predicts the sentiment of the text is positive.
That is the following. The label NEGATIVE means the opposite, that is, negative. Here it is. So here you might be wondering why the score is so high, close to 1. This is because the model we are using has been fine-tuned on a large dataset and is highly accurate for the sentiment analysis task. The input text likely contains strong, unambiguous language that makes it easy for the model to predict the sentiment with high confidence: "hate" signals negative and "love" signals positive. So in this way, guys, we can work on sentiment analysis with Hugging Face.
11. Text Classification using Hugging Face: In this lesson, we will learn how we can use Hugging Face for text classification. First, we will understand what text classification is. With that, we will also see the difference between sentiment analysis and text classification. After that, we will create and run an example on Google Colab. Let us start. Text classification, as the name suggests, can be used for spam detection. So in your email ID, you must have seen that some emails go to spam, and some emails are not considered spam. In a similar way, you can also classify news articles or documents, like a sports article under the sports category, or a tech-related article under the technology category. And with that, it also includes a use case for intent detection, like to cancel an order, to book a flight, and others. So let us see: till now, we have covered sentiment analysis. So here is the difference between sentiment analysis and text classification. As the name suggests, sentiment analysis labels are narrow, that is, specific to the sentiment; let's say positive sentiment for a text like "I love cricket". In a similar way, the labels for text classification depend on the task, like I just discussed: spam or not spam, or different topics. With that, for sentiment analysis, as we discussed before, it is mainly positive, negative, or neutral. Some use cases include classifying email as spam or not spam under text classification; under sentiment analysis, one of the use cases can be a positive product review. Now, let us see a coding example where we will detect spam or not spam based on a text. We will use a publicly available model, that is, the following, so we won't be needing any access token from Hugging Face for this. So let us see the example and classify text as spam or not spam. Here is our Google Colab. We created these notebooks till now. Let us open our text classification notebook:
File, Open notebook. Here it is; we already created it. Let us see the steps. First, we will install the required libraries: to begin with, the Hugging Face Transformers library as well as the torch library. So we have used the pip install command for this. Let's go below. After that, we will import the necessary modules; here we have imported the pipeline module. Then we have loaded a pre-trained model for spam detection, that is, the following one here. It is freely available, so we did not need any key for it from Hugging Face. Now the next step includes performing the spam detection. First, we have set multiple texts so that we can detect whether these texts are spam or not spam. We have classified multiple texts at once by passing a list of strings. Here it is. We have mapped labels to spam and not spam here; here it is, the label mapping: negative means spam, neutral means not spam, positive means not spam. Okay. To display the results, we have used a for-in loop. Here it is. What will happen? A score will be visible in the output. Okay, the output will also include the label, whether the text is spam or not spam. With that, the score will also be visible; these are the confidence scores. A minimal sketch of this code follows.
So, according to our model, the first text is spam, obviously, because it says "Congratulations! You have won a 500 INR Amazon gift card, click here to claim now." The second one is not spam, obviously: "Hi Amit, let's have a meeting tomorrow at 12:00 P.M." So obviously, this is not spam. The last one is also considered spam; we get a lot of such spam emails saying that your Gmail account has been compromised. Here, the confidence score is displayed. Low confidence scores indicate that the model is uncertain about its predictions. The model used here is fine-tuned for sentiment analysis, but not specifically for spam detection; we are adapting it for spam detection. Okay, that's why here it is showing not spam, but the confidence score is less than 0.7. I told you that low confidence scores indicate that the model is uncertain about its predictions. You can set a different model here from Hugging Face; here we are just showing an example. So in this way, we can use the Transformers library of Hugging Face to detect spam, that is, to perform text classification.
12. Text Summarization using Hugging Face: In this lesson, we will understand how to perform summarization using Hugging Face. First, we will understand why we need to summarize, and then we will see a coding example on Google Colab to summarize text. Let us start. The Hugging Face Transformers library, as you already know, is used for NLP tasks; that includes summarizing text as well. So why summarization? Summarization is actually used in a lot of real-world applications. You must have seen summarizing long articles into short snippets, and with that, summarizing documents or research papers. Chatbots also provide quick, concise responses. With that, you can extract key points and summaries from a document, and from large datasets also. Now let us see an example. Here we will use the following model, which is publicly available on Hugging Face, so we don't need to add the access token. Okay, we will run the code on Google Colab like we saw before. So let us see the code. So here is our Google Colab. We will open our code:
File, Open notebook. So here we are discussing summarization. Here is our code. First, what we did: we installed the required libraries. So we have installed transformers as well as the PyTorch library here, using the pip install command; we already discussed this command before. After that, we use AutoModelForSeq2SeqLM and AutoTokenizer for more control over the process, so that we can load the model and tokenizer directly. So this is what we have done here. Here we have loaded the following pre-trained model for summarization. So here we have set the input text to summarize. So this is our text; we will summarize this. First, we have tokenized the input text using the following call. Okay, so here you can see some parameters; these parameters will control the summary's length and quality. max_length is the maximum number of tokens; we have set 512, so here the text will be no longer than 512 tokens. We have tokenized the input text here. To generate the summary, we have used the generate() method. Here we have some parameters: the input, which is the tokenized text; then max_length, which is the maximum number of tokens in the summary; min_length, which is the minimum number of tokens in the summary; and length_penalty. What is this? It encourages longer or shorter summaries. Here it is 2; that means longer summaries. num_beams controls the beam search width; higher values improve quality but slow down inference. Here we have set it to 4; that means four beams for decoding. Okay, so here was our input, and this is the summary here. We have printed the summary here; we have summarized it, as sketched below. So in this way, guys, we can summarize text easily.
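A hedged sketch of the summarization flow just described; the transcript doesn't name the checkpoint, so sshleifer/distilbart-cnn-12-6 is used purely as a stand-in, and the generation max_length/min_length values are illustrative:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Stand-in summarization checkpoint (the course's exact model isn't named).
    model_name = "sshleifer/distilbart-cnn-12-6"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    text = "..."  # your long input text goes here

    # Tokenize; truncate so the input is no longer than 512 tokens.
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)

    # Generate the summary. length_penalty=2.0 encourages longer summaries;
    # num_beams=4 means four beams for decoding (better quality, slower).
    summary_ids = model.generate(
        inputs["input_ids"],
        max_length=150,
        min_length=30,
        length_penalty=2.0,
        num_beams=4,
    )
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))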
13. Text to Text (Translate) using Hugging Face: In this lesson, we will understand how we can perform translation using Hugging Face, that is, text-to-text generation. Let us see. For translation tasks, we will use the Hugging Face Transformers library; some models are already provided for this. So translation, as we all know, includes, let's say, translating English text to Spanish. This is part of the text-to-text models that require a task prefix to specify the type of task, for example, translation, summarization, and others. Text-to-text generation includes not only translation but also summarization, paraphrasing, question answering, and even sentiment classification. So let us see the difference between text-to-text generation and text generation. Text generation is used for autoregressive text generation, where the model generates text sequentially, one token at a time, as in dialogue systems, text completion, and others. The text-to-text generation class is used for sequence-to-sequence tasks, where the model takes an input sequence and generates an output sequence, like text summarization, paraphrasing, and even translation. Now, let us see an example to perform translation using the Hugging Face Transformers library. In this, we will use the model t5-small, which is publicly available on Hugging Face. This model is a smaller version of the T5 model and can be used for tasks like summarization, translation, and even question answering. Let us see the example
on Google Colab. Here is our Google Colab. We just saw the text summarization example. Now let us open the translation example. Here it is. First, we will install the required libraries, that is, the following here; we have used the pip install command. We already saw this command before. After that, we will load a pre-trained translation model; that is, here we have loaded the T5 model. It is a versatile text-to-text model that can handle translation by prefixing the input with a task-specific prompt. So we are loading a T5 model here. Here we have prepared the input text. So this is the text we will translate: "translate English to Spanish:" followed by the text that will get translated. Tokenize the input text into input IDs that the model can process. Use the model to generate the translated text. You can customize the generation process with parameters like max_length and num_beams, which we saw in the previous lesson also. Here the output tokens will be decoded to text, and after that, we will print the translated text. A minimal sketch of this code follows.
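A minimal sketch of the T5 translation flow, assuming the public t5-small checkpoint; the generation parameters are illustrative:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Load the publicly available t5-small model and tokenizer.
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # T5 is a text-to-text model: prefix the input with the task description.
    text = "translate English to Spanish: My name is Amit Diwan and I love cricket."

    # Tokenize the input text into input IDs the model can process.
    input_ids = tokenizer(text, return_tensors="pt").input_ids

    # Generate the translation and decode the output tokens back to text.
    output_ids = model.generate(input_ids, max_length=64, num_beams=4)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))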
So here is the output, the translated text: "My name is Amit Diwan and I love cricket," translated to Spanish. So in this way, guys, we can perform translation.
14. Question Answering using Hugging Face: In this lesson, we will understand how to use Hugging Face for question answering. We will also see an example. Let us start. So we will use the Transformers library of Hugging Face for performing question answering tasks. We will run the code on Google Colab. So here we will use the following model, which is publicly available on Hugging Face, so we don't need to create an access token for this. Let us see the example on Google Colab. So here we will open our code. First, we will install the required libraries, as we have shown. We have used the same pip install command, which we saw before, to install the
required libraries. After that, we will load a pre-trained QA model and tokenizer. Here are our model and tokenizer. Prepare the input for the QA task, that is, for the question answering task: we need a context as well as a question. What is the context? It is a paragraph or text where the answer might be found; that is the following. I'm providing a context, and here is the question. So the context is about me, and here is the question you want to answer. Okay, so we have set both. After that, we will tokenize the input: tokenize the context and question using the tokenizer. We have done both. Get the model's prediction: pass the tokenized input to the model to get the answer, that is, the following. It will extract the start and end scores. It will get the most likely start and end positions here, and it will use the same to convert token IDs back to words so that the answer is displayed. The answer's token span is selected here and decoded back to words. This will give your output. A minimal sketch of this code follows.
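A hedged sketch of extractive question answering as just described; the transcript doesn't name the checkpoint, so distilbert-base-cased-distilled-squad is a stand-in, and the question/context strings are illustrative:

    import torch
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    # Stand-in extractive QA checkpoint (the course's exact model isn't named).
    model_name = "distilbert-base-cased-distilled-squad"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    question = "Where is Amit Diwan based?"
    context = "Amit Diwan is a developer and trainer based in Delhi, India."

    # Tokenize the question and context together.
    inputs = tokenizer(question, context, return_tensors="pt")

    # The model returns start and end scores over the tokens.
    with torch.no_grad():
        outputs = model(**inputs)
    start = torch.argmax(outputs.start_logits)
    end = torch.argmax(outputs.end_logits) + 1

    # Convert the most likely token span back to words.
    answer = tokenizer.decode(inputs["input_ids"][0][start:end])
    print(answer)  # expected: a span containing "Delhi"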
So here the question was where Amit Diwan is based; the context was the following, and the answer is Delhi. So in this way, guys, we can perform question answering easily.
15. Text to Image using Hugging Face: In this lesson, we will understand how we can perform text to image using Hugging Face. Let us understand with an example. So here we will use the Hugging Face Diffusers library. This example will use the Stable Diffusion model, which is one of the most popular text-to-image models available in the Diffusers library. Now, what are the Diffusers library and Stable Diffusion? The Diffusers library is an open-source Python library focused on diffusion models for generating images, audio, and other types of data; diffusion models are a class of generative models. The library is developed by Hugging Face. What is Stable Diffusion? It is a latent diffusion model designed for high-quality image generation. So you can generate images from text prompts using it. It is also one of the most popular generative models. Let us see the example. Here we will use a publicly available model on Hugging Face. Let us see the example and convert text to image. The output will be generated as an image on Google Colab itself. So let us see. Here
is our Google Colab. Let us open our notebook for text to image. Here it is. First, we will install the required libraries using the same pip install command we already discussed. So now we will load the Stable Diffusion pipeline. The Diffusers library provides a StableDiffusionPipeline that makes it easy to generate images from text prompts. We will load the Stable Diffusion model here. Now generate an image from a text prompt: easily generate an image by passing a text prompt to the pipeline. Here is our prompt: "Flying cars soar over a futuristic cityscape at sunset." The following will generate the image (a minimal sketch follows at the end of this lesson). Okay, here is our image. This image will get saved on Google Colab using the save method, and it will also print "Image saved as generated_image.png." A PNG file will get generated, and it will be visible on Google Colab; click here.
Google Colab, click here. You can see files. Now, I'll run it. I'll run it. I'll run this now. Now, I'm running to generate
an image and save it. Okay, so here is our
image. It's written. Image saved as generated
Underscore image dot PNG. Okay, so it generated it. I'll just go here from here. You can download it. You
can also copy the path. I'll click Download. It downloaded. Okay, here it is. So we generated an image
that is text to image.
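A minimal sketch of the text-to-image flow, assuming the diffusers library with a public Stable Diffusion checkpoint such as runwayml/stable-diffusion-v1-5 (the lesson's exact model id isn't shown):

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a public Stable Diffusion checkpoint (downloads on first run;
    # use a GPU runtime in Colab for reasonable speed).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Generate an image by passing a text prompt to the pipeline.
    prompt = "Flying cars soar over a futuristic cityscape at sunset"
    image = pipe(prompt).images[0]

    # Save the image; in Colab it appears in the Files panel.
    image.save("generated_image.png")
    print("Image saved as generated_image.png")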
16. Text to Video using Hugging Face: In this lesson, we will understand how to perform text to video using Hugging Face. This is called text-to-video synthesis. We will understand what it is, and we will also run a sample example. So let us start. Text to video includes generating a video from textual descriptions, like typing a text and generating a video. Like we saw in the previous lesson on text to image, where we typed a text and generated an image, in this case, we will generate a video. So we have a lot of pre-trained models and tools for generating videos, and Hugging Face provides such models. The text-to-video synthesis term I just mentioned includes generating a sequence of frames based on a textual description. Since it's a complex task, it requires combining different NLP models with generative models or even diffusion models. Okay, diffusion models we saw in the previous lesson; they are used for generating images or videos. Let us see some video generation frameworks before moving towards the example. One of the most popular ones is Runway ML; it offers tools for video generation and editing. Mostly for AI-generated videos, you can use Pika Labs. With that, DeepMind's Perceiver IO can also be used to handle multimodal inputs; multimodal input can include text, images, and even videos. You need to use a library like PyTorch or TensorFlow so that you can build pipelines for generating video frames. Let us see an example. So here we will use the Diffusers library; we already discussed it. It is an open-source library developed by Hugging Face and used for generating images and even videos. We will use the publicly available Stable Diffusion model in our example. We will run the code on Google Colab like we saw before. Let us start. Here is our Google Colab. Let us open our code:
File, Open notebook. We will open our notebook for text to video; I'll type "video" to search. Here it is. First, we will install: so here we have used the pip install command to install the transformers as well as the diffusers library, along with PyTorch. After that, we will load a text-to-image model; so here we are loading it. This is the model I already told you about. We have used the Diffusers library to load a pre-trained text-to-image model like Stable Diffusion. Generate frames from text: first, we have set the prompt. Here, we will generate individual frames based on the text description. This is the text description: "a futuristic cityscape at night with flying cars." This will generate ten frames using the for loop; here it is, ten, and it will append each frame. Later on, we have used the OpenCV library to stitch the frames into a video. Here we are using OpenCV inside the for loop so that we can stitch it. We have also used the NumPy library; we are using a NumPy array in it. So this will save the frames as images, and this will stitch the frames into a video, and the following will display the output, gathering the frames using the for loop. The output will be displayed like this, in the form of frames. A minimal sketch of this code follows at the end of the lesson. So here, when I run it, it will display ten frames, because we are generating ten frames here, and after that, in the end, it will display the video. So here is the output: the output video will have the name output_video.mp4, but it will also generate frames. How many frames? Ten frames. The format of the frame names will be frame_ followed by the value of i; so the frames would be like frame_0.png, frame_1.png, and it will go until 9. That means ten frames. And the output will be here, as I told you. Now let us run it. Now we will click here, and here you can see, as I told you, it generated ten frames, frame_0.png till frame_9.png, and the output video is here. So this was the output.
I'll just click here and click Download. Download it. Right-click and open; here is our video. Okay, you can see the ten frames. So in this way, guys, we can generate a video from text with Hugging Face. Thank you for watching the video.
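A hedged sketch of the frames-then-stitch approach described above, assuming diffusers with a public Stable Diffusion checkpoint plus OpenCV and NumPy; the model id, frame count, and frame rate are illustrative:

    import cv2
    import numpy as np
    import torch
    from diffusers import StableDiffusionPipeline

    # Load a public Stable Diffusion text-to-image checkpoint (stand-in id).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a futuristic cityscape at night with flying cars"

    # Generate ten individual frames from the text description
    # and save each one as frame_<i>.png.
    frames = []
    for i in range(10):
        image = pipe(prompt).images[0]
        image.save(f"frame_{i}.png")
        frames.append(image)

    # Stitch the frames into a video with OpenCV. PIL images are RGB,
    # while OpenCV expects BGR, so convert each NumPy array first.
    width, height = frames[0].size
    writer = cv2.VideoWriter(
        "output_video.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 2, (width, height)
    )
    for frame in frames:
        writer.write(cv2.cvtColor(np.array(frame), cv2.COLOR_RGB2BGR))
    writer.release()
    print("Saved output_video.mp4")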