Machine Learning TV
Limitations of ChatGPT and LLMs - Part 3
If you haven't watched Part 1 and Part 2, I highly suggest watching them before Part 3.
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation of Transformers as well. This is the last episode of this amazing series. Thanks for watching.
Views: 675

Videos

Understanding ChatGPT and LLMs from Scratch - Part 2
Views: 861 • 1 year ago
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation abou...
Understanding ChatGPT and LLMs from Scratch - Part 1
Views: 3.4K • 1 year ago
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation abou...
Understanding BERT Embeddings and How to Generate them in SageMaker
Views: 4.5K • 1 year ago
Course link: www.coursera.org/learn/ml-pipelines-bert In this course, you will use BERT for the same purpose. Before diving into the BERT algorithm, I will highlight a few differences between BlazingText and BERT at a very high level. As you can see here, BlazingText is based on Word2Vec, whereas BERT is based on the transformer architecture. Both BlazingText and BERT generate word embeddings. Howe...
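The video works in SageMaker; as a rough illustration only, here is a minimal sketch of generating BERT embeddings with the Hugging Face transformers library and the standard bert-base-uncased checkpoint (an assumption, not the course's code):

    # A minimal sketch (not the course's SageMaker code): contextual BERT
    # embeddings via the Hugging Face transformers library.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    sentence = "Machine learning is fun."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.last_hidden_state: (batch, tokens, 768) contextual embeddings.
    # Mean-pool over tokens for a single fixed-size sentence vector.
    embedding = outputs.last_hidden_state.mean(dim=1)
    print(embedding.shape)  # torch.Size([1, 768])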
Understanding Coordinate Descent
Views: 7K • 1 year ago
Course link: www.coursera.org/learn/ml-regression Let's just have a little aside on the coordinate descent algorithm, and then we're gonna describe how to apply coordinate descent to solving our lasso objective. So, our goal here is to minimize some function g. So, this is the same objective that we have whether we are talking about our closed-form solution, gradient descent, or this coordinate d...
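A minimal sketch of the idea (not the course's own code), assuming the standard lasso objective (1/2)*||y - Xw||^2 + lam*||w||_1 and the soft-thresholding update for one coordinate at a time:

    # Cyclic coordinate descent for the lasso, one coordinate per update.
    import numpy as np

    def soft_threshold(rho, lam):
        if rho < -lam:
            return rho + lam
        elif rho > lam:
            return rho - lam
        return 0.0

    def lasso_coordinate_descent(X, y, lam, n_iters=100):
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iters):
            for j in range(d):
                # Residual excluding feature j's current contribution.
                r_j = y - X @ w + X[:, j] * w[j]
                rho = X[:, j] @ r_j
                z = X[:, j] @ X[:, j]
                w[j] = soft_threshold(rho, lam) / z
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=50)
    print(lasso_coordinate_descent(X, y, lam=1.0))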
Bootstrap and Monte Carlo Methods
Views: 7K • 1 year ago
Here we look at the two main concepts that are behind this revolution, the Monte Carlo method and the bootstrap. We will discuss the main principles behind these methods and then see how to apply them in various important contexts, such as in regression and for constructing confidence intervals. Course link: www.coursera.org/learn/stanford-statistics/
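As a rough illustration of the bootstrap idea (not the course's code), a percentile bootstrap confidence interval for a sample mean might look like this:

    # Percentile bootstrap CI for the mean of a sample.
    import numpy as np

    rng = np.random.default_rng(42)
    data = rng.normal(loc=170, scale=10, size=100)  # e.g. 100 heights

    # Resample with replacement many times; collect the resampled means.
    boot_means = np.array([
        rng.choice(data, size=len(data), replace=True).mean()
        for _ in range(10_000)
    ])
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print(f"95% bootstrap CI for the mean: ({lo:.2f}, {hi:.2f})")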
Maximum Likelihood as Minimizing KL Divergence
Views: 2.8K • 2 years ago
While the Bayes formula for the posterior probability of parameters given the data is very general, there are some interesting special cases that can be analyzed separately. Let's look at them in sequence. The first special case arises when the model is fixed once and for all. In this case, we can drop the conditioning on M in this formula. The Bayesian evidence, in this case, is ...
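A worked statement of the title's claim, in notation assumed here rather than taken from the lecture: maximizing the expected log-likelihood is the same as minimizing the KL divergence from the data distribution, because the data-distribution entropy term does not depend on the parameters.

    \hat{\theta}_{\mathrm{MLE}}
      = \arg\max_{\theta} \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_{\theta}(x)\right]
      = \arg\min_{\theta} \; \mathrm{KL}\!\left(p_{\mathrm{data}} \,\|\, p_{\theta}\right),
    \quad\text{since}\quad
    \mathrm{KL}(p_{\mathrm{data}} \,\|\, p_{\theta})
      = \underbrace{\mathbb{E}\left[\log p_{\mathrm{data}}(x)\right]}_{\text{independent of } \theta}
      - \mathbb{E}\left[\log p_{\theta}(x)\right].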
Understanding The Shapley Value
Views: 14K • 2 years ago
The Shapley Value is one of the most prominent ways of dividing up the value of a society, the productive value of some set of individuals, among its members. The Shapley Value is based on Lloyd Shapley's idea that members should basically receive things proportional to their marginal contributions. So, basically, we look at what a person adds when we add them to a group...
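A minimal sketch (not the lecture's code) of computing exact Shapley values by averaging each player's marginal contribution over all orderings, for a tiny hypothetical game:

    from itertools import permutations

    def shapley_values(players, value):
        """value: maps a frozenset of players to the coalition's worth."""
        totals = {p: 0.0 for p in players}
        orderings = list(permutations(players))
        for order in orderings:
            coalition = frozenset()
            for p in order:
                with_p = coalition | {p}
                # Marginal contribution of p in this ordering.
                totals[p] += value(with_p) - value(coalition)
                coalition = with_p
        return {p: t / len(orderings) for p, t in totals.items()}

    # Toy game: any coalition containing both "a" and "b" is worth 10,
    # otherwise a coalition is worth 0.
    v = lambda s: 10.0 if {"a", "b"} <= s else 0.0
    print(shapley_values(["a", "b", "c"], v))  # a: 5.0, b: 5.0, c: 0.0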
Kalman Filter - Part 2
Views: 26K • 2 years ago
Course Link: www.coursera.org/learn/state-estimation-localization-self-driving-cars Let's consider our Kalman Filter from the previous lesson and use it to estimate the position of our autonomous car. If we have some way of knowing the true position of the vehicle, for example, an oracle tells us, we can then use this to record a position error of our filter at each time step k. Since we're dea...
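A minimal 1D sketch of the idea described here (not the course's code): run a Kalman filter against a known "oracle" trajectory and record the position error at each time step:

    import numpy as np

    rng = np.random.default_rng(1)
    dt, n_steps = 0.1, 50
    true_pos = np.cumsum(np.full(n_steps, 1.0 * dt))  # constant velocity

    x, P = 0.0, 1.0       # position estimate and its variance
    Q, R = 0.01, 0.25     # process and measurement noise variances
    errors = []
    for k in range(n_steps):
        # Predict: move by the known velocity, inflate uncertainty.
        x, P = x + 1.0 * dt, P + Q
        # Update with a noisy position measurement.
        z = true_pos[k] + rng.normal(scale=np.sqrt(R))
        K = P / (P + R)                  # Kalman gain
        x, P = x + K * (z - x), (1 - K) * P
        errors.append(true_pos[k] - x)   # error vs. the "oracle" truth
    print(f"mean abs position error: {np.mean(np.abs(errors)):.3f}")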
Kalman Filter - Part 1
Views: 97K • 3 years ago
This course will introduce you to the different sensors and how we can use them for state estimation and localization in a self-driving car. By the end of this course, you will be able to: - Understand the key methods for parameter and state estimation used for autonomous driving, such as the method of least-squares - Develop a model for typical vehicle localization sensors, including GPS and I...
Recurrent Neural Networks (RNNs) and Vanishing Gradients
Views: 8K • 3 years ago
For one, the way plain or vanilla RNNs model sequences, by recalling information from the immediate past, allows you to capture dependencies, at least to a certain degree. They're also relatively lightweight compared to n-gram models, taking up less RAM and space. But there are downsides: the RNN architecture, optimized for recalling the immediate past, causes it to struggle with longer sequ...
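A small illustration of the vanishing-gradient effect behind that struggle (my own sketch, with an assumed contractive weight matrix, not the lecture's code): backpropagation through time multiplies one Jacobian factor per step, so the gradient norm can shrink exponentially with sequence length.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden = 16
    W = 0.5 * rng.normal(size=(hidden, hidden)) / np.sqrt(hidden)

    grad = np.eye(hidden)
    for t in range(1, 51):
        # One Jacobian factor per time step; tanh's derivative is at
        # most 1, approximated here by a constant 0.9.
        grad = (0.9 * W.T) @ grad
        if t % 10 == 0:
            print(f"step {t}: gradient norm = {np.linalg.norm(grad):.2e}")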
Transformers vs Recurrent Neural Networks (RNN)!
Views: 21K • 3 years ago
Course link: www.coursera.org/learn/attention-models-in-nlp/lecture/glNgT/transformers-vs-rnns Using an RNN, you have to take sequential steps to encode your input, and you start from the beginning of your input making computations at every step until you reach the end. At that point, you decode the information following a similar sequential procedure. As you can see here, you have to go throug...
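A minimal sketch of the contrast (not the course's code): scaled dot-product attention scores every pair of positions with a few matrix products, with no step-by-step recurrence over the input:

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    rng = np.random.default_rng(0)
    seq_len, d = 6, 8
    X = rng.normal(size=(seq_len, d))      # one embedded input sequence
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)          # all position pairs at once
    out = softmax(scores) @ V              # no sequential recurrence
    print(out.shape)  # (6, 8): one output per position, computed jointly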
Language Model Evaluation and Perplexity
Views: 18K • 3 years ago
Course Link: www.coursera.org/lecture/probabilistic-models-in-nlp/language-model-evaluation-SEO4T Transcript: In this video, I'll show you how to evaluate a language model. The metric for this is called perplexity, and I will explain what it is. First, you'll divide the text corpus into train, validation, and test data; then you will dive into the concepts of perplexity, an important metric used t...
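A minimal sketch of the metric (not the course's code), using hypothetical per-word probabilities: perplexity is the exponentiated average negative log-probability the model assigns to the true test words, and lower is better.

    import numpy as np

    # Hypothetical probabilities a model assigned to each word of a test text.
    word_probs = np.array([0.20, 0.05, 0.10, 0.30, 0.08])

    log_probs = np.log(word_probs)
    perplexity = np.exp(-log_probs.mean())
    print(f"perplexity: {perplexity:.2f}")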
Common Patterns in Time Series: Seasonality, Trend and Autocorrelation
Views: 8K • 4 years ago
Course link: www.coursera.org/learn/tensorflow-sequences-time-series-and-prediction Time series come in all shapes and sizes, but there are a number of very common patterns, so it's useful to recognize them when you see them. For the next few minutes, we'll take a look at some examples. The first is trend, where a time series has a specific direction that it's moving in. As you can see from th...
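As a rough illustration (not the course's code), a synthetic series combining the three patterns named here: trend, seasonality, and noise, plus a quick look at autocorrelation:

    import numpy as np

    t = np.arange(4 * 365)                         # four years, daily
    trend = 0.02 * t                               # steady upward direction
    seasonality = 5 * np.sin(2 * np.pi * t / 365)  # yearly cycle
    noise = np.random.default_rng(0).normal(scale=1.0, size=t.size)
    series = 10 + trend + seasonality + noise

    # Lag-1 autocorrelation: how strongly each value tracks the previous one.
    r1 = np.corrcoef(series[:-1], series[1:])[0, 1]
    print(f"lag-1 autocorrelation: {r1:.2f}")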
Limitations of Graph Neural Networks (Stanford University)
Views: 14K • 4 years ago
Understanding Metropolis-Hastings algorithm
Views: 69K • 4 years ago
Learning to learn: An Introduction to Meta Learning
Views: 27K • 4 years ago
Page Ranking: Web as a Graph (Stanford University 2019)
Views: 3.4K • 4 years ago
Deep Graph Generative Models (Stanford University - 2019)
Views: 19K • 4 years ago
Graph Node Embedding Algorithms (Stanford - Fall 2019)
Views: 67K • 4 years ago
Graph Representation Learning (Stanford University)
Views: 94K • 4 years ago
Understanding Word Embeddings
Views: 10K • 4 years ago
Variational Autoencoders - Part 2 (Modeling a Distribution of Images)
Views: 1.6K • 4 years ago
Variational Autoencoders - Part 1 (Scaling Variational Inference & Unbiased estimates)
Views: 2.9K • 4 years ago
DBSCAN: Part 2
Views: 21K • 5 years ago
DBSCAN: Part 1
Views: 29K • 5 years ago
Gaussian Mixture Models for Clustering
Views: 90K • 5 years ago
Understanding Irreducible Error and Bias (By Emily Fox)
Views: 7K • 5 years ago
Python Libraries for Machine Learning You Must Know!
Views: 1.9K • 5 years ago
Conditional Probability
Views: 1.4K • 5 years ago

COMMENTS

  • @homeycheese1 5 days ago

    Will coordinate descent always converge using LASSO, even if the ratio of the number of features to the number of observations/samples is large?

  • @muhammadaneeqasif572 15 days ago

    Amazing, great to see some good content again. Thank the YT algorithm. Keep it up!

  • @stewpatterson1369 18 days ago

    Best video I've seen on this. Great visuals & explanation.

  • @pnachtwey 19 days ago

    This works OK on nice functions like g(x,y) = x^2 + y^2, but real data often looks more like the Grand Canyon, where the path is very narrow and very windy.

  • @sELFhATINGiNDIAN 1 month ago

    No

  • @kacpersarnowski7969 1 month ago

    Great video, you are the best :)

  • @frielruambil6275 1 month ago

    Thanks very much. I was looking for such videos to answer my assignment questions, and you answered all of them at once within 3 minutes. I salute you; please keep on making more videos to help students pass their exams and assignments.

  • @NeverHadMakingsOfAVarsityAthle 2 months ago

    Hey! Thanks for the fantastic content :) I'm trying to understand the additivity axiom a bit better. Is this axiom the main reason why Shapley values for machine learning forecasts can just be added up for one feature over many different predictions? Say we have predictions for two different days in a time series, and each time we calculate the Shapley value for the price feature. Does the additivity axiom then imply that I can add up the Shapley values for price for these two predictions (assuming they are independent) to make a statement about the importance of price over multiple predictions?

  • @somerset006 3 months ago

    What about self-driving rockets?

  • @paaabl0. 4 months ago

    Shapley values are great, but they're not going to help you much with complex non-linear patterns, especially in terms of global feature importance.

  • @williamstorey5024 4 months ago

    What is text regression?

  • @yandajiang1744 4 months ago

    Awesome explanation

  • @user-vh9de5dy9q 5 months ago

    Why don't the given weights for the distributions really showcase the distributions on the graph? I mean, I would choose π1 = 45, π2 = 35, π3 = 20.

  • @thechannelwithoutanyconten6364 5 months ago

    Two things: 1. What the H matrix is has not been described. 2. One non-1x1 matrix cannot be smaller or greater than another. This is sloppy. Besides that, it is great work.

  • @obensustam3574 5 months ago

    I wish there was a Part 3 :(

  • @DenguBoom 5 months ago

    Hi, about the sample X1 to Xn: do X1 and Xn have to be different? Because you have a previous sample of 100 heights from 100 different people. Or can it be, as we treated in the bootstrap, that X1* to Xn* are drawn randomly from X1 to Xn, so we can basically draw the same person's height more than once?

  • @feriyonika7078 6 months ago

    Thanks, now I understand the KF better.

  • @usurper1091 6 months ago

    7:10

  • @lingfengzhang2943 7 months ago

    Thanks! It's very clear.

  • @user-uk2rv4kt8d 7 months ago

    Very good video. Perfect explanation!

  • @sadeghmirzaei9330 7 months ago

    Thank you so much for your explanation.🎉

  • @laitinenpp 7 months ago

    Great job, thank you!

  • @SCramah13 7 months ago

    Clean explanation. Thank you very much... cheers~

  • @felipela2227 8 months ago

    Your explanation was great, thx

  • @vambire02 8 months ago

    Disappointed ☹️ no Part 3

  • @Commonsenseisrare 9 months ago

    Amazing lecture on GNNs.

  • @cmobarry 9 months ago

    I like your term "word algebra". It might be an unintended side effect, but I have been pondering it for years!

  • @rakr6635 10 months ago

    No Part 3, sad 😥

  • @vgreddysaragada 10 months ago

    Great work!

  • @boussouarsari4482 10 months ago

    I believe there might be an issue with the perplexity formula. How can we refer to 'w' as the test set containing 'm' sentences, denoting 'm' as the number of sentences, and then immediately afterwards state that 'm' represents the number of all words in the entire test set? This description lacks clarity and coherence. Could you please clarify this part to make it more understandable?

  • @GrafBazooka 11 months ago

    I can't concentrate, she is too hot 🤔😰

  • @sunnelyeh 11 months ago

    This video means that the F/A-18 has the capability to lock onto a UFO!

  • @thefantasticman 11 months ago

    Hard to focus on the PPT, can anyone explain to me why?

  • @nunaworship 11 months ago

    Can you please share the links to the books you recommended!

  • @AoibhinnMcCarthy 1 year ago

    Hard to follow, not concise.

  • @jcorona4755 1 year ago

    They pay so that people see they have more followers. In fact, you pay $10 pesos for each video.

  • @g-code9821 1 year ago

    Isn't the positional encoding done with the sinusoidal function?

  • @homataha5626 1 year ago

    Hello, thank you for sharing. Do you have the code repository? I only learn after I have implemented it.

  • @because2022 1 year ago

    Great content.

  • @robinranabhat3125 1 year ago

    Anyone? At 31:25, shouldn't the final equation at the bottom right be about minimizing the loss? I think that's a typo.

  • @Karl_with_a_K 1 year ago

    I have run into token exhaustion while working with GPT-4, specifically when it is producing programming-language output. I'm assuming resolving this will be a component of GPT-5...

  • @yifan1342 1 year ago

    Sound quality is terrible.

    • @nehalkalita 10 months ago

      Turning on subtitles can be helpful to some extent.

  • @majidafra 1 year ago

    I deeply envy those who have been in your NN & DL class.

  • @josephzhu5129 1 year ago

    Great lecture, he knows how to explain complicated ideas. Thanks a lot!

  • @chris-dx6oh 1 year ago

    Great video

  • @ssvl2204 1 year ago

    Very nice and concise presentation, thanks!

  • @zhaobryan4441 1 year ago

    Super, super clear!

  • @lara6893 1 year ago

    Emily and Carlos rock, heck yeah!!

  • @StratosFair 1 year ago

    Great video! Are you guys planning to upload follow-up lectures on this topic?

  • @StratosFair 1 year ago

    Where is the video on recursive least squares, though?