So the talk today is about machine learning in the
bandit setting. This time I will
focus on algorithms and evaluation, as well as some case studies.
Feel free to ask me questions as we go.
So here's the outline. First
I'll introduce the problem, then explain a few
algorithms developed over the last few years, and also
offline evaluation, which turns out to be not so straightforward.
Then I'll show some experiments and
then conclude the talk. And at the end, if
there is time, I'll talk about some research.
So this is a graph that shows the
interaction between the user and the system.
The user comes with some information, like gender and age, and
then he comes to the website, and the website
has a serving strategy that has to decide what news
or ranking results to return to the user.
The user then provides feedback in the form of
clicks or revenue. For all of
this kind of interaction, we call
the user information the context, the recommendation
here the action, the strategy is called the
policy, and the feedback from the user is called
the reward.
The goal of this whole process
is to maximize the total reward by optimizing
the policy here. So you can imagine it's optimizing
the policy to maximize the clicks from the current
users. So let me give you a more concrete
example on the front page. We focus on
a particular module here; let's take a closer look.
The module shows the recommendations to
the user, as you can see here, which is a
small number of articles. The articles are handpicked by human
editors, and there is one position
that has a big image. So that's the highlighted
article that we recommend. This is the most prominent,
or one of the most prominent, positions, and therefore
we really want to recommend the most interesting items.
So, informally,
we want to recommend interesting, useful articles
to the user. More formally, we want
to maximize the number of clicks, or the click-through
rate (CTR). There are many challenges here.
One of them is that the content pool of articles
is changing all the time: the editors may
add or remove articles, because we
try to keep the content up to date.
Then there is also the sparsity issue:
we have a huge
number of users, but the data for each user is
small. So we need to transfer knowledge from one user to another,
so that I can use other users' information to identify your
interest when doing personalized recommendation.
And most important, this process also involves the
exploration/exploitation trade-off. So let me
elaborate on that.
Again, we want to recommend articles to the user,
trying to maximize the clicks. The
observation in this process is that we can only obtain feedback
for what we recommend. If we
don't recommend an article, the user doesn't see it, and therefore we
don't see whether the user likes the article. So the
data is only partially labeled. We want to recommend
the best article to the user.
To do that, we need to estimate each article's
click-through rate, but we don't know
the rate beforehand. So sometimes, especially when
an article is new to the pool, we need to show it
to users to see whether they like it.
So there are two goals in this
process. One is exploitation: use the knowledge we have
from the data to recommend what looks best. The
other is exploration: collect information to improve our estimates.
So there's a trade-off between these two conflicting
goals. And the problem is how to do that
trade-off with a dynamic pool, and with changing user
interests as well. So let me give you an
example of insufficient exploration. Let's say
there are two slot machines.
Usually when you play slot machines, in expectation you lose money.
But let's say these are nice machines and you can
earn money from them, and your goal is to
maximize the earnings. You can play one of these machines at a time.
You don't know which one has the higher expected
reward, so you need to try
each one of them and then converge to the one
that has the high reward.
Let's say you try the first one and it gives you $5,
and you try the second one three times and get nothing.
So at this point you think
that the first machine gives you $5 on average and
the second machine doesn't give you anything. So you think,
okay, maybe the first machine is better, so I stay with it.
This looks good.
However, we have
only three data points for the second machine,
so the estimate here can actually be inaccurate.
Let's say it turns out that the first machine pays $5
per round, but the second actually pays $100 a quarter
of the time, so on average it pays $25 per round.
You were just unlucky to miss all the $100
rounds, because you didn't try it enough.
If you stay with the first machine you
never see the $100, and therefore, compared
to the better machine, you suffer a regret of $20 per round.
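
The arithmetic of the slot machine example can be checked with a short sketch. The one-in-four chance of the $100 payoff is an assumption implied by the $25 average in the talk, and the variable names are made up for illustration.

```python
from fractions import Fraction

# Assumed payoffs, following the example in the talk:
# machine 1 pays $5 every round; machine 2 pays $100 a quarter of the time.
p_jackpot = Fraction(1, 4)
mean_1 = Fraction(5)
mean_2 = p_jackpot * 100            # expected payoff per round of machine 2

# Chance that three pulls of machine 2 all pay nothing.
p_all_miss = (1 - p_jackpot) ** 3

# Expected loss per round from staying with machine 1 forever.
loss_per_round = mean_2 - mean_1

print(mean_2, loss_per_round, p_all_miss)
```

Under these assumptions, three empty pulls of the better machine happen 27 times out of 64, so giving up on it after three tries is a mistake that is easy to make.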
So this is an example of insufficient exploration of
the second machine: three pulls do not give you a good estimate of its average
payoff. You need to explore enough to learn, and then converge
to the best machine. So we formulate this kind
of recommendation problem as something called the contextual bandit problem.
There are many names in the literature. I like
this one because it emphasizes the bandit nature and
also the contextual nature. So let me explain what
it is. It's a repeated process.
Here we have K candidates, which we call arms;
you can think of the front page module, where we
have candidate articles to recommend to the user.
Then there is also the context, the feature vector x.
It is a user summary, like
demographic information, personal interests, etcetera.
Then the decision maker chooses one of the
actions from the candidate set to display to the user, and
in return receives a numerical reward
from the user.
After observing this reward, you can use this
signal to update your policy, and this process repeats
over time. You want to maximize the sum of rewards
over this process: let's say T is the number of
steps, then you want to maximize the sum of
rewards over these T rounds.
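
The repeated process just described can be sketched as a loop. This is a minimal illustration, not the system from the talk: `run_bandit`, its arguments, and the click rates are all hypothetical names and numbers.

```python
import random

def run_bandit(policy, update, draw_context, draw_reward, num_arms, num_rounds):
    """Generic contextual bandit loop: observe context, act, get reward, learn."""
    total_reward = 0.0
    for _ in range(num_rounds):
        x = draw_context()             # user features for this round
        a = policy(x, num_arms)        # choose one arm to display
        r = draw_reward(x, a)          # feedback for the chosen arm only
        update(x, a, r)                # learn from the triple (x, a, r)
        total_reward += r
    return total_reward

# Toy usage: a uniformly random policy and made-up click rates per arm.
rng = random.Random(0)
click_rate = [0.02, 0.05, 0.10]
total = run_bandit(
    policy=lambda x, k: rng.randrange(k),
    update=lambda x, a, r: None,
    draw_context=lambda: [rng.random()],
    draw_reward=lambda x, a: 1.0 if rng.random() < click_rate[a] else 0.0,
    num_arms=3,
    num_rounds=1000,
)
```

Note that `draw_reward` is only ever called for the arm that was chosen, which is the partial-feedback property that makes exploration necessary.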
In terms of notation, K is the number of arms, x is the
user features, a is the arm that is chosen,
and r is the reward
corresponding to that arm. If what you care about is clicks, and you
want to maximize the number of clicks, then it's natural that
r is 1 or 0, and the expected reward is the click-through rate, CTR.
It turns out that this
formulation is quite general, capturing many applications on the Internet.
Let me show the example of advertising, where a lot
of the money in the industry is. This is a page from
Yahoo Answers, where you can post
questions. This question is about computers,
and the ads shown next to it are also about computers,
so the ads are matched to what the user is
interested in. So here in this example,
the arms are the candidate set
of ads that can be displayed in a small region.
The context is the features of the user, in
general any information about the user and this page, and then
there is the action, the ad displayed, and the reward. And
there's again the partial-label nature: for all the other ads
that are not displayed, you don't see the feedback.
Another example is ranking. A contextual bandit
here outputs a list of ranked results.
In this case, the arms are
a set of rankings, x_t is the
query and the documents, and the user if you want to be
personalized, a_t is the ranking of the
documents, and r_t can be defined in various ways, for example
as one or zero depending on how the session goes.
So these examples show
that the contextual bandit is a
natural formulation for many critical applications on the Internet. And
it's related to some other areas
in computer science and statistics. One
thing you might wonder about
is the connection to information retrieval and collaborative filtering, both
of which are popular in recommendation. In those settings
they usually assume that
there is an established set of articles or movies to
recommend, so there's no
dynamic content pool. Everything has been there a
long time, so you have some data for it,
and you don't need to worry about exploration and exploitation.
You just need to split the data set into a training part and a
test part, train on the first and evaluate on the
test part. So there is no exploration involved.
There's also reinforcement learning, which is
more general: the bandit is
a special case, but reinforcement learning has to tackle the temporal
dynamics as well. Hopefully that is not a big issue
in the kinds of applications I explained just now.
There are also the traditional multi-armed bandits.
They don't consider contextual information, so
you cannot do personalized recommendation.
So that was the
introduction to the bandit problem.
Next is the section on
the algorithms.
I will explain how it is organized a bit
later. One way to do exploration
is to try things at random. For example,
let's say you have three articles here,
and each has its own click rate, but you don't
know these numbers. By using
the logging system, you can estimate the CTRs;
say these are the estimated numbers at first. That's a
reasonable estimate of the CTR, but you know that
these estimates are not accurate. So you want to collect data
to refine the estimates, so that you
can make better decisions in the future.
One way, called epsilon-greedy, is to choose
the article that has the highest estimate with high probability,
one minus epsilon, and with a small probability,
epsilon, to explore at random. So this is a
very simple idea: just assign some probability to exploring
something. But this exploration is unguided:
you explore the articles at random. So as
you can imagine, it's not the most efficient way.
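
A minimal sketch of the epsilon-greedy rule as described: exploit the highest estimate with probability one minus epsilon, otherwise pick uniformly at random. The function name and the estimated CTRs are hypothetical.

```python
import random

def epsilon_greedy(estimates, epsilon, rng=random):
    """Pick the best-looking arm with prob. 1 - epsilon, else a uniformly random arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))        # unguided exploration
    return max(range(len(estimates)), key=lambda a: estimates[a])

ctr_estimates = [0.03, 0.07, 0.05]                  # made-up estimated click rates
chosen = epsilon_greedy(ctr_estimates, epsilon=0.1)
```

The random branch ignores how much data each article already has, which is why the exploration is called unguided.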
Another strategy, called UCB1, is to choose
articles according to an index computed this way.
Here mu is again the estimated click rate of
the article, and then there is this square-root
term here, where n_a is
the number of times this article has been shown to
users. At the beginning, when n_a
is very small, meaning that we don't
have enough data to build a good estimate for
that article, this term is big.
In other words, it encourages the system to
try this action. So you can think of it as an
exploration bonus term to encourage exploration. Over
time, as n_a becomes large, this
term becomes essentially zero, and you are essentially using
this estimate, which is already very accurate. These are
the two typical algorithms for traditional bandit problems that do
not consider context. It means the assumption is
that the click rate of an article does not depend
on user information,
which is not a reasonable assumption, since users with
different information have different interests. As a consequence, without considering the features,
there's no way to do personalization.
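
A small sketch of the UCB1 index described above. The exact bonus is not legible in the transcript; the form sqrt(2 ln t / n_a) used here is the standard UCB1 bonus and is an assumption, as are the function names.

```python
import math

def ucb1_index(mean, n_shown, t):
    """Estimated click rate plus an exploration bonus that shrinks as n_shown grows."""
    if n_shown == 0:
        return math.inf                 # never-shown articles are tried first
    return mean + math.sqrt(2.0 * math.log(t) / n_shown)

def ucb1_choose(means, counts, t):
    """Choose the article with the largest index at step t."""
    return max(range(len(means)), key=lambda a: ucb1_index(means[a], counts[a], t))

# An article with a slightly lower estimate but very few views wins the index.
chosen = ucb1_choose(means=[0.05, 0.04], counts=[1000, 5], t=1005)
```

Unlike epsilon-greedy, the exploration here is guided: the bonus is large only for articles with little data.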
So now, there have been other algorithms doing personalization
with contextual bandits. There is one in the literature back
in 2002. It
has strong
regret guarantees, but is computationally intractable in general. There is
also one from 2008 that, like the previous
one, uses randomized exploration, so the exploration is not guided.
So this is what I'm going to talk about:
contextual bandit algorithms that are both computationally efficient
and effective at maximizing total reward.
The rest of this section will focus on three
parts. First, I'll explain a generalization of the
UCB strategy for linear models, and then extend it
to generalized linear models. And then I'll talk about
randomized algorithms.
So let me start with the UCB
strategy for linear models. Let's say
here is the linear model. The assumption is that, given
the context x of the user, the
expected reward of an arm is linear in x. It means
that it can be computed as the inner product of
the feature vector and a coefficient vector corresponding to
that arm. But these coefficients are not known, so you
need to estimate them from data. Now collect everything
we have observed in the form of a matrix: the information
of the users we have seen, each a feature vector,
and the corresponding rewards that were observed.
In order to estimate the coefficients, it is very straightforward
to apply linear regression, or here ridge regression.
Here these are the sufficient statistics.
So once you have enough
data, you can use that to estimate the coefficients,
and the estimate will be close
to the true value. So far so
good. But this regression only gives you a point
estimate, in the sense that it gives you a number
but does not tell you how confident the estimate is. So we
need to quantify the uncertainty, so that we know: if I'm
sure about the CTR of an article, there is no need for exploration,
but for an article where I don't have enough data and I'm very
unconfident about my estimate, we need more
exploration. One way to quantify this uncertainty
is to use this inequality. We
can show that it holds with high probability.
The left-hand side here is the prediction error:
this is my regression estimate, this is the
ground truth that I don't know, and the absolute
difference, the prediction error, is bounded by the square root of this
quantity, multiplied by a suitable constant.
This term measures
how similar the new user x is to previous users.
If x is very close to the previous users seen with
arm a, then the term is small;
in other words, we have a good estimate
from this regression, so the confidence interval is small,
and there is not much need for exploration.
Okay, so this holds with high probability,
but then how are we going to use it?
So we have a UCB-style algorithm for
this linear model. Essentially, when you have
a user, it always chooses the arm that maximizes this index.
Remember the non-contextual UCB you
saw before. The first term is about exploitation:
it is a point estimate of the reward. The
second term is the confidence interval, which says how uncertain
your estimate is. Adding the two together,
the algorithm gives a trade-off
between the first part, exploitation, and the second, exploration.
I should have mentioned why it is called
UCB: it chooses arms according to
upper confidence bounds. I should also mention that
this confidence term behaves
similarly to what we want: when we have more and
more samples in the data set, this term
becomes small, and then you have an accurate estimate in
the first term and are essentially doing more exploitation rather than
exploration. There is a related algorithm back in 2002,
but it works in a more complicated way.
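
A minimal pure-Python sketch of the index rule just described, under stated assumptions: one ridge regression per arm with an identity regularizer, and a hand-picked constant `alpha`. The class and function names are hypothetical, and the inverse matrix is maintained with the Sherman-Morrison rank-one update rather than by re-solving the regression.

```python
import math

class LinearUCBArm:
    """Per-arm ridge regression state, stored as A^{-1} and b, with A = I at the start."""

    def __init__(self, dim):
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def index(self, x, alpha):
        theta = self._a_inv_times(self.b)                    # ridge estimate A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))        # exploitation term
        a_inv_x = self._a_inv_times(x)
        width = math.sqrt(sum(v * xi for v, xi in zip(a_inv_x, x)))  # sqrt(x^T A^{-1} x)
        return mean + alpha * width                          # mean plus exploration bonus

    def update(self, x, reward):
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(v * xi for v, xi in zip(a_inv_x, x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def choose(arms, x, alpha=1.0):
    """Pick the arm whose upper confidence bound is largest for context x."""
    return max(range(len(arms)), key=lambda a: arms[a].index(x, alpha))
```

After each displayed article, calling `update` on the chosen arm with the observed reward shrinks that arm's confidence width for similar users, so the rule drifts from exploration toward exploitation as data accumulates.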
So we started doing this work
with linear models. The linear model,
however, is not always perfect. For
instance, in applications where we want to
maximize clicks, we estimate the click rate,
which is a probability, so it has to be between
0 and 1. But with this linear model it can be the case
that the estimate falls outside that range, which doesn't make any
sense, right? So the
linear model is not always a suitable model for all applications,
and that motivates generalized linear models.
In generalized linear models we make the assumption
that the expected reward, given the user features, is
a linear combination of the features
followed by a link function.
There are two
examples of generalized linear models. One is logistic regression, where
the CTR is modeled this way:
you take the linear
combination here, then take an exponential and do
this. After this squashing effect, the
number is between 0 and 1. The other is the
probit model, with Phi here being the
cumulative distribution function of the Gaussian distribution. So again it is a squashing
function. Unfortunately, when you work with generalized
linear models, you cannot always get closed-form solutions like in
linear models. So what we can do here is
an approximation. If you want to
get a point estimate with generalized linear models, we
can do this regression numerically instead of using a closed-form
formula, to get the most likely parameter given the data, and that
gives us the point estimate. Then we
can do something similar to derive confidence intervals
for the estimate. And then the algorithm
combines the two and does something like UCB,
simply by choosing the arm with the highest mean plus a bonus
for doing exploration. So here again, the same
scheme covers the first case, the linear model, and
the second case, the logistic and probit models.
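
The three models mentioned can be compared with a short sketch. Here `z` stands for the linear combination of the features (the symbol is an assumption), and the function names are made up for illustration.

```python
import math

def linear_mean(z):
    """Linear model: the prediction is the score itself and can leave [0, 1]."""
    return z

def logistic_mean(z):
    """Logistic regression: squash the score with 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_mean(z):
    """Probit model: squash the score with the standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-3.0, 0.0, 3.0):
    print(z, linear_mean(z), logistic_mean(z), probit_mean(z))
```

For a score of -3 the linear model predicts a negative click rate, while both squashing functions return a small positive probability.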
The bounds are a little more involved on
the technical side, although the idea is the same as in the
linear case. It's more complicated,
so I'm not going to show them here, only
the ideas, and later we'll
show some comparisons where the generalized linear models do better
than the linear one. So upper
confidence bound exploration has been very popular in the literature
for a long time. People know that it's very
effective, and you can prove a lot of interesting results
here, showing that it converges very quickly to the optimal
solution. But
there are also limitations of this kind
of exploration. The first
thing is that it can explore too much.
By adding an exploration bonus to the index,
the algorithm can easily end up exploring all the
parts of the parameter space.
This could be inefficient,
especially when the space is large and high-dimensional:
even a part of the parameter space that is
unlikely to be helpful will still be explored,
because the estimate there is uncertain. So you do want to
use any prior knowledge you have to improve the performance
of the algorithm. The second thing is
that the exploration uses a deterministic index. UCB is
a deterministic algorithm that always chooses the arm
that has the maximum mean plus confidence.
There's a problem with that when the rewards are
delayed. In real systems
we don't get the reward immediately.
We show the page to a user, the user
may look at the page for a few seconds and then click,
and due to the infrastructure, that click
doesn't go back to the back end and update the model immediately.
So there is usually some kind of delay, and considering the
amount of traffic we have, a few seconds or minutes
of delay correspond to thousands, tens of thousands of
users, or maybe more. So in that sense there's
a delay of ten thousand to one million steps: you can
only use the rewards from up to one million steps
in the past. This is bad for deterministic
strategies, because during this delay UCB always
chooses the same arm over and over again in the same
situation. That may not be the
best thing to do, because you could have randomized strategies
so that you can explore different things in this
window. The third thing is that deriving
confidence bounds is not always easy.
We can do that exactly for linear models, or
approximately for generalized linear models after a
bit more work, but for other models it is more
difficult. So here I'm going to describe a
randomized strategy that addresses, in fact, all three.
It's called probability matching.
Given the user x, it chooses
an article with a probability that is the
probability that this article is optimal. So if the algorithm thinks
that an article has a 90% chance of being
optimal, the most interesting for this current user,
then we choose it with 90% probability. So
that's the idea. First of
all, this is a randomized strategy,
so it's more robust to reward delay:
within the delay window you can
do exploration with different users, and at the end
of the window you
have data for different arms and users.
And, more
importantly, it has a straightforward implementation. For
instance, if you have a logistic
model or probit model, you can just have a posterior
distribution over the parameter, and the posterior is computed by
any standard Bayesian inference given the data and the prior.
So you maintain the posterior here, and
then when a user comes and you want to recommend an
article, you just draw one parameter from this posterior,
so the parameter is randomized, and then
choose the arm according to this random parameter
vector. This works for linear models and logistic models.
You can see that, because the parameter is drawn from
the posterior, this process is randomized. You can
show that this procedure actually satisfies the probability matching
principle: an arm that has a 90% chance of being optimal
will be chosen with probability 90%. This
procedure can be combined with many other models, nonlinear
models and so on, and I think it's more general: you don't
have to derive new confidence bounds or anything. So this
is useful in practice.
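
A minimal sketch of the draw-from-the-posterior idea, which is commonly known as Thompson sampling. To keep it short, this swaps in the simplest possible model, a non-contextual Bernoulli click model with a Beta(1, 1) prior per arm, instead of the logistic or probit models mentioned in the talk. The counts and names are hypothetical.

```python
import random

def thompson_choose(clicks, views, rng=random):
    """Draw one CTR per arm from its Beta posterior, then act greedily on the draws."""
    samples = [
        rng.betavariate(1 + c, 1 + (v - c))     # Beta(1, 1) prior is an assumption
        for c, v in zip(clicks, views)
    ]
    return max(range(len(samples)), key=lambda a: samples[a])

rng = random.Random(0)
clicks, views = [5, 50], [100, 100]             # made-up logged counts per arm
picks = [thompson_choose(clicks, views, rng) for _ in range(1000)]
share_of_arm_1 = picks.count(1) / len(picks)
```

Because each call draws fresh samples, an arm is picked about as often as the posterior believes it is the best one, so repeated calls during a feedback delay still spread over plausible arms instead of repeating one choice.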
So I have described a few kinds of
algorithms in general terms. And you may
wonder how to compare different ones. In machine learning,
you do care about finding a mathematically rigorous way
to measure the performance and the learning speed of an algorithm.
In the bandit case, the
normal metric is called regret.
Here, this is the expectation of the
sum of rewards of an oracle algorithm,
assuming that the oracle knows the parameters of this problem.
So this is the highest possible reward
you can hope for, assuming you know
everything. On the other hand, you have
an algorithm that does not know the parameters beforehand.
It has to be adaptive, and this
is the sequence of arms it chooses for the users,
so we have the expectation of its total reward.
The difference between them is called regret.
It's always non-negative.
If the regret
is sublinear, then you
can say that the algorithm learns, because if it's sublinear,
then divide it by the number of users and you have the per-step regret.
So the per-step regret, if the regret is sublinear,
becomes smaller and smaller: the
per-step regret decreases to zero over time. In
other words, if we run the algorithm for
a long time, then the algorithm will converge to
the optimal policy, which you don't know beforehand.
So this is how you tell that an algorithm learns:
the algorithm learns when the regret is
small-o of T, so that the per-step average decreases
to zero and the algorithm converges. So this
is a mathematical metric
to measure the performance of a bandit algorithm.
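
Written out, the definition just given could look like the following. The symbols are assumptions chosen to match the earlier notation: a_t is the arm the algorithm picks at step t, a_t^* is the arm an oracle that knows the parameters would pick, and r_{t,a} is the reward of arm a at step t.

```latex
\begin{align*}
  R(T) &= \mathbb{E}\Bigl[\sum_{t=1}^{T} r_{t,a_t^{*}}\Bigr]
        - \mathbb{E}\Bigl[\sum_{t=1}^{T} r_{t,a_t}\Bigr] \;\ge\; 0, \\
  R(T) &= o(T) \;\Longrightarrow\; \frac{R(T)}{T} \to 0
        \quad \text{as } T \to \infty .
\end{align*}
```

The second line is the sense in which a sublinear regret means the algorithm learns: its average reward per step approaches that of the oracle.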
For instance, for the linear case you can show
that the regret grows on
the order of the square root of KdT, where K is the number of articles,
d is the number of features, and T is the number of steps.
So let me give you an idea of what
these numbers mean. It means that the algorithm
approaches the optimal solution at a rate where the
per-step regret is the square root of Kd over T.
So when you have 20 articles and 15 features,
and T is 10 million, which is just a small number for us,
then you can see that the per-step regret is
small, so the algorithm quickly goes to the optimal
solution. That is very good in practice.
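
The numbers quoted can be plugged in directly. This sketch assumes the bound has the form sqrt(K d T) and ignores constants and logarithmic factors; the function name is hypothetical.

```python
import math

def per_step_regret_rate(num_articles, num_features, num_steps):
    """sqrt(K * d * T) / T = sqrt(K * d / T), ignoring constants and log factors."""
    return math.sqrt(num_articles * num_features / num_steps)

rate = per_step_regret_rate(20, 15, 10_000_000)
print(round(rate, 4))
```

With 20 articles, 15 features, and 10 million steps the rate is the square root of 0.00003, a bit over half a percent per step.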
491
00:33:38.950 --> 00:33:43.359 A:middle L:90%
, and generalized model is still open. Uh,
492
00:33:43.369 --> 00:33:47.180 A:middle L:90%
there was a variant for generalized linear models last year. Uh,
493
00:33:47.190 --> 00:33:54.210 A:middle L:90%
people showed a regret bound, but they studied a slightly different, uh
494
00:33:54.220 --> 00:34:00.390 A:middle L:90%
, setting, so this is not comparable. And Thompson sampling, uh
495
00:34:00.400 --> 00:34:02.529 A:middle L:90%
, I think is an old algorithm that, uh
496
00:34:02.539 --> 00:34:07.240 A:middle L:90%
, there's not been much attention to the Microsoft,
497
00:34:07.619 --> 00:34:13.230 A:middle L:90%
uh, exploration strategy to be particularly helpful in practice
498
00:34:13.239 --> 00:34:15.920 A:middle L:90%
. Um, so now people are starting to analyze it
499
00:34:15.929 --> 00:34:21.309 A:middle L:90%
. Um, so So just, uh, last
500
00:34:21.309 --> 00:34:24.369 A:middle L:90%
week or three candidates. So something, uh,
501
00:34:25.599 --> 00:34:29.480 A:middle L:90%
I don't have to get the best possible way.
502
00:34:29.489 --> 00:34:36.059 A:middle L:90%
Uh, so the two days ago, Uh,
503
00:34:36.539 --> 00:34:39.119 A:middle L:90%
okay. So I hope that so far I have shown you
504
00:34:39.130 --> 00:34:43.800 A:middle L:90%
the kind of problems we are interested in solving, uh
505
00:34:43.809 --> 00:34:46.949 A:middle L:90%
, and typical algorithms that we can use in solving
506
00:34:46.960 --> 00:34:51.949 A:middle L:90%
problems. And now we switch to the, uh
507
00:34:51.960 --> 00:34:58.110 A:middle L:90%
, evaluation problem. Um, so So the goal
508
00:34:58.110 --> 00:35:00.670 A:middle L:90%
here is that you have an algorithm for a contextual bandit
509
00:35:00.670 --> 00:35:05.420 A:middle L:90%
problem, and you assume users come i.i.d. from
510
00:35:05.420 --> 00:35:09.949 A:middle L:90%
some fixed distribution. Um, and then the
511
00:35:09.960 --> 00:35:15.670 A:middle L:90%
algorithm is adaptive, meaning that it learns from past
512
00:35:15.679 --> 00:35:20.179 A:middle L:90%
data to act in the future. Then
513
00:35:20.179 --> 00:35:24.349 A:middle L:90%
you can define the value of the algorithm as the average per-step reward it would
514
00:35:24.360 --> 00:35:30.340 A:middle L:90%
collect. So this is, uh, compared
515
00:35:30.340 --> 00:35:31.900 A:middle L:90%
to regret, the same sort of, uh, quantity: regret is the
516
00:35:31.920 --> 00:35:37.500 A:middle L:90%
difference between, uh, this value and the optimal one, but this value we can actually measure.
517
00:35:37.510 --> 00:35:40.889 A:middle L:90%
We cannot measure the optimal because we don't have a
518
00:35:40.929 --> 00:35:45.969 A:middle L:90%
way to observe it. That's why we are not measuring regret here
519
00:35:45.340 --> 00:35:51.840 A:middle L:90%
. Uh, we look at this quantity instead, and then
520
00:35:51.849 --> 00:35:54.619 A:middle L:90%
for a static, uh, static policy, one that
521
00:35:54.619 --> 00:36:00.380 A:middle L:90%
doesn't learn, a policy that doesn't learn from data, say one that
522
00:36:00.380 --> 00:36:04.219 A:middle L:90%
always recommends the same article, we can define
523
00:36:04.230 --> 00:36:09.469 A:middle L:90%
the value in the same way. So these
524
00:36:09.469 --> 00:36:13.610 A:middle L:90%
two numbers are the numbers we are trying to estimate
525
00:36:13.619 --> 00:36:17.309 A:middle L:90%
in evaluation. Uh, the question is whether
526
00:36:17.309 --> 00:36:20.780 A:middle L:90%
we can do it offline. So let's say we have a
527
00:36:20.780 --> 00:36:24.179 A:middle L:90%
lot of data from click logs. We don't want to
528
00:36:24.190 --> 00:36:27.480 A:middle L:90%
do it online: when we have a new algorithm, we don't want
529
00:36:27.480 --> 00:36:31.579 A:middle L:90%
to run the algorithm in the real system. That is always expensive:
530
00:36:31.579 --> 00:36:34.820 A:middle L:90%
it takes a lot of time. It takes money. And
531
00:36:35.349 --> 00:36:40.920 A:middle L:90%
so offline evaluation, utilizing logged data to evaluate a new algorithm, has
532
00:36:40.920 --> 00:36:46.820 A:middle L:90%
a lot of benefits. That way uh, even
533
00:36:46.829 --> 00:36:51.000 A:middle L:90%
, uh, testing system. In a way,
534
00:36:51.010 --> 00:36:55.750 A:middle L:90%
this reuses past experience, and it's very common in machine
535
00:36:55.809 --> 00:37:00.369 A:middle L:90%
learning, where we have a lot of benchmark data sets.
536
00:37:00.369 --> 00:37:04.579 A:middle L:90%
Each one is organized as input features versus
537
00:37:04.590 --> 00:37:08.789 A:middle L:90%
labels, binary classification with zero or one in this case. So it's
538
00:37:08.909 --> 00:37:12.849 A:middle L:90%
divided into a training part and a test part. I
539
00:37:12.860 --> 00:37:17.219 A:middle L:90%
think we have done a very uh huh. But
540
00:37:17.230 --> 00:37:21.730 A:middle L:90%
for interactive machine learning problems it is not so straightforward. So
541
00:37:21.730 --> 00:37:22.780 A:middle L:90%
, for instance, in contextual bandits
542
00:37:22.780 --> 00:37:27.130 A:middle L:90%
, the data in this form contain the context, the user features,
543
00:37:27.170 --> 00:37:30.860 A:middle L:90%
the article we have recommended, and also the user's click on that
544
00:37:30.860 --> 00:37:35.280 A:middle L:90%
article. Uh, so when you use
545
00:37:35.289 --> 00:37:37.889 A:middle L:90%
this data, this historical data, to evaluate a new
546
00:37:37.900 --> 00:37:40.760 A:middle L:90%
algorithm, then you don't have the reward when the article
547
00:37:40.760 --> 00:37:45.199 A:middle L:90%
the algorithm recommends is not the one in the log, because
548
00:37:45.199 --> 00:37:47.599 A:middle L:90%
we only see the reward, we only see whether
549
00:37:47.610 --> 00:37:51.960 A:middle L:90%
the user clicked on the article that was shown, not on the other
550
00:37:51.969 --> 00:37:54.090 A:middle L:90%
articles. Uh, therefore, there's a
551
00:37:54.099 --> 00:37:58.019 A:middle L:90%
this is what we call the partial-label nature.
552
00:37:58.030 --> 00:38:00.530 A:middle L:90%
We don't see the reward for arms not
553
00:38:00.530 --> 00:38:04.570 A:middle L:90%
in the log. Uh, for this reason,
554
00:38:04.579 --> 00:38:08.409 A:middle L:90%
a straightforward way to use historical data is as follows.
555
00:38:08.420 --> 00:38:12.130 A:middle L:90%
Um, so let's say you have a lot of
556
00:38:12.139 --> 00:38:16.360 A:middle L:90%
historical data of this form. You can apply regression or
557
00:38:16.460 --> 00:38:22.300 A:middle L:90%
whatever statistical machine learning techniques to estimate the
558
00:38:22.300 --> 00:38:24.650 A:middle L:90%
reward function. So you can then have
559
00:38:24.659 --> 00:38:29.949 A:middle L:90%
a function that simulates what users do in practice. And
560
00:38:29.949 --> 00:38:31.260 A:middle L:90%
hopefully, if you can get this right, then
561
00:38:31.260 --> 00:38:35.690 A:middle L:90%
you can use this simulator to estimate how well the
562
00:38:35.699 --> 00:38:39.599 A:middle L:90%
algorithm performs. Unfortunately, um, the first step here is
563
00:38:39.610 --> 00:38:44.170 A:middle L:90%
very difficult. Bias is introduced by the modeling assumptions, such
564
00:38:44.170 --> 00:38:46.429 A:middle L:90%
as, uh, and then the second step here, the evaluation,
565
00:38:46.440 --> 00:38:50.210 A:middle L:90%
turns out to be unreliable. So that gives you a lot
566
00:38:50.210 --> 00:38:52.050 A:middle L:90%
of pain. When you use this kind of data
567
00:38:52.150 --> 00:38:58.079 A:middle L:90%
for evaluation, you see, let's say, a 10% improvement, so
568
00:38:58.090 --> 00:39:00.579 A:middle L:90%
that's a good number. But since these numbers are
569
00:39:00.590 --> 00:39:06.170 A:middle L:90%
unreliable, the real improvement could be 10%, 1%, or
570
00:39:06.179 --> 00:39:09.809 A:middle L:90%
20%. So it's not good for my evaluation purposes
571
00:39:09.820 --> 00:39:14.019 A:middle L:90%
. And in contrast, uh, the solution that we
572
00:39:14.030 --> 00:39:16.269 A:middle L:90%
propose avoids the modeling step, which
573
00:39:16.269 --> 00:39:22.130 A:middle L:90%
makes the procedure simple, and most importantly, we
574
00:39:22.139 --> 00:39:27.219 A:middle L:90%
try to design a method that is unbiased. So
575
00:39:27.219 --> 00:39:30.630 A:middle L:90%
it's reliable. So this is the so here's how
576
00:39:30.630 --> 00:39:36.809 A:middle L:90%
we're going to do it. Again, this is
577
00:39:36.809 --> 00:39:39.300 A:middle L:90%
the quantity we want to estimate. For simplicity, I just show
578
00:39:39.300 --> 00:39:44.019 A:middle L:90%
the, uh, static policy case, uh, the case
579
00:39:44.030 --> 00:39:46.440 A:middle L:90%
of an adaptive algorithm can be handled in the same way. Here
580
00:39:46.440 --> 00:39:52.230 A:middle L:90%
a key requirement in the data
581
00:39:52.239 --> 00:39:57.130 A:middle L:90%
collection is that we have, uh, in this
582
00:39:57.130 --> 00:40:00.329 A:middle L:90%
log, all the arms a have to be
583
00:40:00.340 --> 00:40:04.860 A:middle L:90%
chosen uniformly at random. So this ensures that
584
00:40:04.869 --> 00:40:08.909 A:middle L:90%
all arm candidates have a chance of being selected. So no arm
585
00:40:09.150 --> 00:40:15.820 A:middle L:90%
, no arm will be missing from the data. Um
586
00:40:16.059 --> 00:40:19.409 A:middle L:90%
, and then, if that assumption holds,
587
00:40:19.420 --> 00:40:22.019 A:middle L:90%
then we can go through the
588
00:40:22.030 --> 00:40:24.519 A:middle L:90%
data one by one. So you can look at
589
00:40:24.519 --> 00:40:30.230 A:middle L:90%
each data point and then reveal its context to the policy, which
590
00:40:30.230 --> 00:40:35.860 A:middle L:90%
is going to recommend an article, a-hat. So what we
591
00:40:35.869 --> 00:40:37.880 A:middle L:90%
have here is the recommended article, and then we can
592
00:40:37.889 --> 00:40:42.920 A:middle L:90%
compare this a-hat to the article in the log
593
00:40:44.619 --> 00:40:49.019 A:middle L:90%
. And when these two articles are the same
594
00:40:49.030 --> 00:40:51.389 A:middle L:90%
, then we call it a match, and then
595
00:40:51.400 --> 00:40:55.530 A:middle L:90%
we reveal the reward signal for that article to the algorithm
596
00:40:55.539 --> 00:41:00.460 A:middle L:90%
to allow it to learn. And if it was
597
00:41:00.469 --> 00:41:02.849 A:middle L:90%
not a match. Then we just simply ignore that
598
00:41:02.849 --> 00:41:07.639 A:middle L:90%
step. So we skip that point of the data
599
00:41:08.719 --> 00:41:12.000 A:middle L:90%
. And then finally, we add up all the
600
00:41:12.000 --> 00:41:16.880 A:middle L:90%
rewards in the matched events and normalize. So
601
00:41:16.880 --> 00:41:21.750 A:middle L:90%
this I is the indicator function, which is one when there
602
00:41:21.760 --> 00:41:27.380 A:middle L:90%
is a match and zero when there is no match. So
603
00:41:27.389 --> 00:41:30.010 A:middle L:90%
since the arm in this data is
604
00:41:30.019 --> 00:41:34.400 A:middle L:90%
chosen uniformly at random, the probability
605
00:41:34.400 --> 00:41:38.389 A:middle L:90%
that you see a match is one over K, uh
606
00:41:38.400 --> 00:41:40.949 A:middle L:90%
, and that is why you see K here, which
607
00:41:40.949 --> 00:41:46.739 A:middle L:90%
is to normalize the sum. And L is the size
608
00:41:46.739 --> 00:41:52.619 A:middle L:90%
of the data. So that is the
609
00:41:52.630 --> 00:41:59.309 A:middle L:90%
whole procedure. Okay, um, so, yes.
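The matching procedure described above can be sketched as follows for a static policy. This is a minimal illustration, not the speaker's implementation: it assumes the logged arm was chosen uniformly at random among K arms, and that the estimate is K/L times the sum of rewards over matched events, which is how the normalization in the talk reads. The toy click probabilities, the two-valued context, and all names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# One logged event: (context, logged_arm, reward).
# Assumption: logged_arm was drawn uniformly at random from the K arms.
LoggedEvent = Tuple[int, int, float]


def replay_estimate(log: List[LoggedEvent], policy: Callable[[int], int], num_arms: int) -> float:
    """Estimate a static policy's per-step reward as (K / L) * sum(reward * I[match])."""
    matched_reward = 0.0
    for context, logged_arm, reward in log:
        if policy(context) == logged_arm:  # a match: the logged reward is usable
            matched_reward += reward
        # no match: the event is ignored
    return num_arms * matched_reward / len(log)


# Hypothetical toy world: 2 contexts, 4 arms, known click probabilities.
CLICK_PROB = [[0.05, 0.20, 0.10, 0.02],
              [0.03, 0.04, 0.08, 0.30]]
NUM_ARMS = 4


def make_uniform_log(size: int, rng: random.Random) -> List[LoggedEvent]:
    """Simulate data collection with uniformly random arm choices."""
    log = []
    for _ in range(size):
        context = rng.randrange(2)
        arm = rng.randrange(NUM_ARMS)
        reward = 1.0 if rng.random() < CLICK_PROB[context][arm] else 0.0
        log.append((context, arm, reward))
    return log


def policy(context: int) -> int:
    """The static policy under evaluation: arm 1 for context 0, arm 3 for context 1."""
    return 1 if context == 0 else 3


log = make_uniform_log(200_000, random.Random(0))
estimate = replay_estimate(log, policy, NUM_ARMS)
true_value = 0.5 * (CLICK_PROB[0][1] + CLICK_PROB[1][3])  # contexts are equally likely
print(f"offline estimate {estimate:.3f}, true value {true_value:.3f}")
```

Only about one event in K matches, which is why the sum is scaled by K; with more logged data the estimate should settle near the true value.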
610
00:41:59.320 --> 00:42:01.250 A:middle L:90%
Uh, so this method satisfies what we
611
00:42:01.250 --> 00:42:05.860 A:middle L:90%
wanted it to. So the first one is that
612
00:42:05.869 --> 00:42:09.889 A:middle L:90%
this estimate is unbiased, which means that if you
613
00:42:09.900 --> 00:42:15.610 A:middle L:90%
use the estimator on the algorithm, then on average you
614
00:42:15.619 --> 00:42:19.880 A:middle L:90%
get the same number as if you ran the algorithm in the real
615
00:42:19.880 --> 00:42:22.170 A:middle L:90%
system. So this is nice because you don't have
616
00:42:22.170 --> 00:42:24.050 A:middle L:90%
to run the algorithm the expensive way in the real system.
617
00:42:24.050 --> 00:42:30.739 A:middle L:90%
But again, you can get a reliable estimate. And second,
618
00:42:30.750 --> 00:42:32.869 A:middle L:90%
it's about the estimation error. So you can
619
00:42:32.869 --> 00:42:36.289 A:middle L:90%
also show that when you have more and more data
620
00:42:36.300 --> 00:42:38.860 A:middle L:90%
then the estimation error goes away: it decreases
621
00:42:38.860 --> 00:42:43.349 A:middle L:90%
to zero at this rate. So again, K is the
622
00:42:43.349 --> 00:42:45.409 A:middle L:90%
number of arms and L is the size of the data. So if
623
00:42:45.409 --> 00:42:49.880 A:middle L:90%
you have a very large L, then this term is
624
00:42:49.889 --> 00:42:53.079 A:middle L:90%
small, and so the estimation error is very small
625
00:42:53.170 --> 00:42:57.409 A:middle L:90%
. So the second theorem gives you an idea of how confident
626
00:42:57.420 --> 00:43:00.079 A:middle L:90%
the estimate is. So now I can say to my
627
00:43:00.090 --> 00:43:05.510 A:middle L:90%
CEO that with this new algorithm I can
628
00:43:05.519 --> 00:43:10.239 A:middle L:90%
improve the system by 10%, plus or minus 1%, with this confidence
629
00:43:10.250 --> 00:43:15.619 A:middle L:90%
. Uh, so, yeah, so let me
630
00:43:15.619 --> 00:43:19.889 A:middle L:90%
show you something, case knowledge. Is that how
631
00:43:21.800 --> 00:43:22.960 A:middle L:90%
so? We have a lot more in the recent
632
00:43:23.409 --> 00:43:30.079 A:middle L:90%
, uh, recommendations. Um, we have to
633
00:43:30.090 --> 00:43:35.570 A:middle L:90%
evaluate static policies, each of which always recommends one article to users,
634
00:43:35.820 --> 00:43:39.329 A:middle L:90%
and then we have data for evaluation. So this
635
00:43:39.340 --> 00:43:44.590 A:middle L:90%
data satisfy the requirement that, uh, the article recommended
636
00:43:44.599 --> 00:43:50.139 A:middle L:90%
is chosen at random. Uh huh. And then we use this
637
00:43:50.150 --> 00:43:53.639 A:middle L:90%
data to do offline evaluation of these policies for this
638
00:43:53.650 --> 00:43:58.440 A:middle L:90%
problem. So now we have two numbers. The
639
00:43:58.449 --> 00:44:02.610 A:middle L:90%
first is that because Iran is also in the campaign
640
00:44:02.619 --> 00:44:06.159 A:middle L:90%
. So we have metrics to measure what the average
641
00:44:06.170 --> 00:44:09.110 A:middle L:90%
trip to a sexual using Rivera just a counter process
642
00:44:09.110 --> 00:44:13.139 A:middle L:90%
on the other hand, we have offline evaluation
643
00:44:13.139 --> 00:44:16.280 A:middle L:90%
metrics using the methodology I just described. And the question is
644
00:44:16.289 --> 00:44:20.539 A:middle L:90%
whether they are close. If they are close, that means the
645
00:44:20.550 --> 00:44:24.989 A:middle L:90%
offline evaluation is reliable. So let me
646
00:44:24.989 --> 00:44:28.900 A:middle L:90%
show you some numbers. So the first plot is
647
00:44:28.909 --> 00:44:32.969 A:middle L:90%
the plot of the online click-through rate of an
648
00:44:32.980 --> 00:44:37.730 A:middle L:90%
article versus the offline estimated CTR. So
649
00:44:37.739 --> 00:44:42.039 A:middle L:90%
every point corresponds to one of the articles in a 10
650
00:44:42.039 --> 00:44:45.590 A:middle L:90%
day period so you can see the articles here.
651
00:44:45.599 --> 00:44:51.599 A:middle L:90%
Um, and then ideally, if the offline estimated
652
00:44:51.610 --> 00:44:55.219 A:middle L:90%
CTR is the same as the online CTR, then
653
00:44:55.230 --> 00:45:00.320 A:middle L:90%
every point lies right on this y = x line, but
654
00:45:00.329 --> 00:45:01.949 A:middle L:90%
the data are finite, and there is some noise.
655
00:45:01.960 --> 00:45:07.139 A:middle L:90%
So you don't actually see, uh, in perfect
656
00:45:07.139 --> 00:45:09.250 A:middle L:90%
alignment, but you can see that they're very close to this
657
00:45:09.260 --> 00:45:14.579 A:middle L:90%
y = x line, meaning that this estimated CTR is reliable, so
658
00:45:14.579 --> 00:45:17.920 A:middle L:90%
it can give you a reliable estimate without running
659
00:45:19.219 --> 00:45:23.179 A:middle L:90%
online. So here's another plot that compares
660
00:45:23.190 --> 00:45:28.840 A:middle L:90%
the online and offline CTRs over
661
00:45:28.840 --> 00:45:31.239 A:middle L:90%
a ten-day period. So the green one, the
662
00:45:31.250 --> 00:45:36.409 A:middle L:90%
red one is the online and the background here,
663
00:45:36.420 --> 00:45:39.679 A:middle L:90%
is the offline CTR, and each point corresponds to the overall CTR of
664
00:45:39.679 --> 00:45:42.519 A:middle L:90%
the whole day, you can see that there's a
665
00:45:42.530 --> 00:45:45.059 A:middle L:90%
strong correlation between these two curves. Okay, so
666
00:45:45.059 --> 00:45:50.989 A:middle L:90%
it's also equally on the table on four applicator.
667
00:45:52.900 --> 00:45:57.760 A:middle L:90%
The second question is how the estimation error of the CTR
668
00:45:57.769 --> 00:46:00.000 A:middle L:90%
decays to zero when you have more and more data.
669
00:46:00.010 --> 00:46:01.929 A:middle L:90%
So we control the size of the data
670
00:46:02.039 --> 00:46:06.800 A:middle L:90%
, and look at the estimation error at different sizes.
671
00:46:07.099 --> 00:46:09.920 A:middle L:90%
And, uh, I should mention that they are normalized
672
00:46:10.260 --> 00:46:15.190 A:middle L:90%
because of business sensitivity. We are not allowed to reveal the
673
00:46:15.199 --> 00:46:17.670 A:middle L:90%
real numbers. So I multiply all these numbers
674
00:46:17.679 --> 00:46:25.460 A:middle L:90%
by a constant, and therefore these numbers are not absolute. You
675
00:46:25.460 --> 00:46:30.139 A:middle L:90%
can see that this estimation error is not allowed to
676
00:46:30.159 --> 00:46:37.010 A:middle L:90%
feel that destination to always, so you can actually
677
00:46:37.010 --> 00:46:42.570 A:middle L:90%
see it back. So there are many extensions of
678
00:46:42.579 --> 00:46:46.849 A:middle L:90%
this method. There are improvements for when
679
00:46:46.849 --> 00:46:51.909 A:middle L:90%
you don't have access to the data into something and
680
00:46:52.269 --> 00:46:55.340 A:middle L:90%
you can see the last estimate or projection to do
681
00:46:55.349 --> 00:47:01.650 A:middle L:90%
more fancy stuff with the data. So if you are interested
682
00:47:01.650 --> 00:47:07.360 A:middle L:90%
, I can explain more offline. Okay. So
683
00:47:07.360 --> 00:47:10.710 A:middle L:90%
far, to recap: in this offline evaluation
684
00:47:10.719 --> 00:47:15.710 A:middle L:90%
part, what we have shown is an unbiased
685
00:47:15.719 --> 00:47:19.349 A:middle L:90%
and reliable way to evaluate bandit algorithms,
686
00:47:19.360 --> 00:47:23.050 A:middle L:90%
driven by historical data, without having to run a
687
00:47:23.059 --> 00:47:27.489 A:middle L:90%
real system. So you can think of it as analogous
688
00:47:27.659 --> 00:47:30.019 A:middle L:90%
to, uh, benchmark data sets in supervised learning, for instance
689
00:47:30.659 --> 00:47:37.059 A:middle L:90%
, or consultation for, uh, interesting. So
690
00:47:37.070 --> 00:47:40.510 A:middle L:90%
this is the first benchmark it's just released by that
691
00:47:40.510 --> 00:47:45.099 A:middle L:90%
philosophy, interested in this kind of problem down.
692
00:47:45.289 --> 00:47:49.489 A:middle L:90%
Uh, and the second version is coming for a competition
693
00:47:49.500 --> 00:47:53.369 A:middle L:90%
. Probably fostered by hostile. And also, uh
694
00:47:53.380 --> 00:47:59.320 A:middle L:90%
, also, I send out workshop. Okay,
695
00:47:59.320 --> 00:48:01.519 A:middle L:90%
So let me show you some experiments. Um,
696
00:48:01.530 --> 00:48:06.070 A:middle L:90%
on how well they work. So I have
697
00:48:06.070 --> 00:48:10.539 A:middle L:90%
also described how to do evaluation. So
698
00:48:10.539 --> 00:48:14.719 A:middle L:90%
this section is about how to find a way to
699
00:48:14.730 --> 00:48:22.670 A:middle L:90%
evaluate the, uh, algorithms. So we have, uh,
700
00:48:22.679 --> 00:48:27.679 A:middle L:90%
about 20 articles. Uh huh. And then
701
00:48:27.690 --> 00:48:31.269 A:middle L:90%
we reduce the dimensionality of the user features, which are over 100
702
00:48:31.619 --> 00:48:36.949 A:middle L:90%
features, by principal component analysis, uh, to make the
703
00:48:36.960 --> 00:48:40.550 A:middle L:90%
features compact. Uh, and then, uh, so
704
00:48:40.559 --> 00:48:44.769 A:middle L:90%
remember, the model is updated with a delay because
705
00:48:44.769 --> 00:48:47.780 A:middle L:90%
of, uh, system infrastructure constraints. So
706
00:48:47.780 --> 00:48:52.739 A:middle L:90%
we also simulate that fact. In particular, we update the model every
707
00:48:52.750 --> 00:48:57.210 A:middle L:90%
five minutes. So finally use use the protected by
708
00:48:57.219 --> 00:49:00.250 A:middle L:90%
the last five minutes. Model used to model to
709
00:49:00.860 --> 00:49:05.130 A:middle L:90%
the next. Uh, hi. Would you like
710
00:49:05.659 --> 00:49:07.670 A:middle L:90%
to do a model here? They use the model
711
00:49:07.679 --> 00:49:14.599 A:middle L:90%
to use the existing time period and a model for
712
00:49:15.340 --> 00:49:20.079 A:middle L:90%
the next because it's great. And then the main
713
00:49:20.079 --> 00:49:22.250 A:middle L:90%
metric that will be compared in the next few slides
714
00:49:22.260 --> 00:49:28.900 A:middle L:90%
is the overall normalized CTR of each algorithm, uh,
715
00:49:28.909 --> 00:49:34.340 A:middle L:90%
in a bucket. So, uh, so that's where
716
00:49:34.340 --> 00:49:37.150 A:middle L:90%
you have, Uh, yeah, but a bucket
717
00:49:37.159 --> 00:49:40.489 A:middle L:90%
here that you can run your content outward and then
718
00:49:40.500 --> 00:49:45.300 A:middle L:90%
in asserting bucket you can use the model is used
719
00:49:45.300 --> 00:49:49.900 A:middle L:90%
for users in this bucket, but turning off all the
720
00:49:49.900 --> 00:49:52.949 A:middle L:90%
exploration components here. So when a user comes here
721
00:49:52.960 --> 00:49:54.769 A:middle L:90%
, uh, it's been a point inside of the
722
00:49:54.780 --> 00:49:59.130 A:middle L:90%
user. All this in your pocket, Uh,
723
00:49:59.139 --> 00:50:02.159 A:middle L:90%
this story, bucket and bucket. And, uh
724
00:50:02.170 --> 00:50:07.630 A:middle L:90%
, welcome to the user that you also an exploration
725
00:50:07.630 --> 00:50:10.920 A:middle L:90%
here. Users also this pocket, then, uh
726
00:50:10.960 --> 00:50:16.190 A:middle L:90%
, Children without doing so. Because the value of
727
00:50:16.199 --> 00:50:21.480 A:middle L:90%
this started back here that are most important measures How
728
00:50:21.489 --> 00:50:24.300 A:middle L:90%
? Well, how well the model converts to run
729
00:50:27.760 --> 00:50:30.710 A:middle L:90%
. Yeah. So the first one, the first
730
00:50:30.710 --> 00:50:37.420 A:middle L:90%
comparison is between linear models and generalized linear models. So
731
00:50:37.429 --> 00:50:40.610 A:middle L:90%
related models seeking a 31 structure store equation. And
732
00:50:40.619 --> 00:50:45.730 A:middle L:90%
this is the$20 1st, the left hand side
733
00:50:45.739 --> 00:50:49.090 A:middle L:90%
is epsilon-greedy. So epsilon is the
734
00:50:49.099 --> 00:50:52.809 A:middle L:90%
probability that you choose a random article for exploration. The
735
00:50:52.809 --> 00:50:57.329 A:middle L:90%
right one is UCB, where alpha controls the
736
00:50:57.340 --> 00:51:00.539 A:middle L:90%
exploration through the width of the confidence interval.
737
00:51:00.610 --> 00:51:04.820 A:middle L:90%
So basically, a larger alpha means a wider confidence interval
738
00:51:04.829 --> 00:51:09.010 A:middle L:90%
, so more exploration. And then
739
00:51:09.019 --> 00:51:14.429 A:middle L:90%
so you can see that all the usually all the
740
00:51:14.440 --> 00:51:17.309 A:middle L:90%
food or the green ground curved bars are much higher
741
00:51:17.309 --> 00:51:22.340 A:middle L:90%
than the blue one. Meaning that, uh,
742
00:51:22.349 --> 00:51:27.349 A:middle L:90%
generalized linear models provide a better model than linear ones because
743
00:51:27.360 --> 00:51:31.579 A:middle L:90%
the reward here is a binary click signal in this problem.
744
00:51:31.760 --> 00:51:37.260 A:middle L:90%
And also the second observation is that, comparing the
745
00:51:37.269 --> 00:51:38.809 A:middle L:90%
left-hand side to the right-hand side, you see
746
00:51:38.809 --> 00:51:42.400 A:middle L:90%
that UCB exploration is usually more effective when you
747
00:51:42.400 --> 00:51:45.480 A:middle L:90%
have the right alpha here, which is also
748
00:51:45.480 --> 00:51:50.420 A:middle L:90%
consistent with previous work that shows UCB is in general a
749
00:51:50.420 --> 00:51:54.489 A:middle L:90%
more effective exploration strategy than epsilon-greedy exploration.
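The two exploration rules being compared can be sketched in a context-free form. This is a simplification for illustration only: the talk applies them on top of linear and generalized linear models, whereas the sketch below uses plain per-arm averages and a standard square-root confidence bonus. The function names and the exact form of the bonus are assumptions.

```python
import math
import random
from typing import List


def epsilon_greedy(means: List[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random arm, otherwise the best-looking arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda arm: means[arm])


def ucb(means: List[float], counts: List[int], alpha: float) -> int:
    """Pick the arm with the highest mean plus alpha times a confidence width."""
    total = sum(counts)

    def score(arm: int) -> float:
        if counts[arm] == 0:
            return math.inf  # untried arms are tried first
        return means[arm] + alpha * math.sqrt(math.log(total) / counts[arm])

    return max(range(len(means)), key=score)


# Arm 0 looks slightly better but has been tried far more often than arm 1.
means = [0.12, 0.10]
counts = [1000, 10]
greedy_choice = epsilon_greedy(means, epsilon=0.0, rng=random.Random(0))
cautious_choice = ucb(means, counts, alpha=0.0)
exploring_choice = ucb(means, counts, alpha=1.0)
print(greedy_choice, cautious_choice, exploring_choice)
```

The contrast is that epsilon-greedy explores blindly, while UCB directs exploration toward arms whose estimates are still uncertain; alpha sets how strongly that uncertainty counts.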
750
00:51:55.050 --> 00:52:00.840 A:middle L:90%
A second plot that I will explain quickly is
751
00:52:00.849 --> 00:52:05.789 A:middle L:90%
the comparison between Thompson sampling and UCB. There are so many
752
00:52:05.789 --> 00:52:07.829 A:middle L:90%
algorithms here. I'm going to ignore most of
753
00:52:07.840 --> 00:52:14.250 A:middle L:90%
them and focus on Thompson sampling here, and
754
00:52:14.250 --> 00:52:16.389 A:middle L:90%
the UCB one, which is the green one here
755
00:52:16.400 --> 00:52:22.789 A:middle L:90%
. So the x-axis is the delay in minutes. So with a short delay UCB
756
00:52:22.789 --> 00:52:27.639 A:middle L:90%
is still slightly better. But
757
00:52:27.650 --> 00:52:29.920 A:middle L:90%
as you increase the delay, Thompson
758
00:52:29.920 --> 00:52:35.280 A:middle L:90%
sampling is competitive, and more
759
00:52:35.280 --> 00:52:38.079 A:middle L:90%
importantly, even at 60 minutes the randomized algorithm is more
760
00:52:38.079 --> 00:52:42.730 A:middle L:90%
robust: it doesn't seem to be affected by the delay.
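To make the delayed-update setting concrete, here is a minimal sketch of Thompson sampling with a Beta-Bernoulli model whose counts are refreshed only once per batch. It is an assumption-laden stand-in: the experiments in the talk use model-based learners on real traffic, while the click probabilities, the class name, and the batch mechanics below are hypothetical.

```python
import random
from typing import List, Tuple


class BetaThompson:
    """Context-free Thompson sampling with a Beta(1, 1) prior per arm."""

    def __init__(self, num_arms: int, rng: random.Random) -> None:
        self.successes = [1] * num_arms
        self.failures = [1] * num_arms
        self.rng = rng

    def choose(self) -> int:
        # Sample a plausible click rate per arm and play the best sample.
        samples = [self.rng.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=lambda arm: samples[arm])

    def update_batch(self, batch: List[Tuple[int, bool]]) -> None:
        for arm, clicked in batch:
            if clicked:
                self.successes[arm] += 1
            else:
                self.failures[arm] += 1


def run(click_probs: List[float], steps: int, delay: int, seed: int = 0) -> float:
    """Return the average click rate when feedback arrives only every `delay` steps."""
    rng = random.Random(seed)
    learner = BetaThompson(len(click_probs), rng)
    pending: List[Tuple[int, bool]] = []
    clicks = 0
    for step in range(steps):
        arm = learner.choose()
        clicked = rng.random() < click_probs[arm]
        clicks += clicked
        pending.append((arm, clicked))
        if (step + 1) % delay == 0:  # the model is refreshed only at batch boundaries
            learner.update_batch(pending)
            pending = []
    return clicks / steps


average_ctr = run([0.02, 0.05, 0.10], steps=20_000, delay=100)
print(f"average CTR with delayed updates: {average_ctr:.3f}")
```

Because each choice is a fresh random draw, the learner keeps spreading its choices across arms between model refreshes, which is one intuition for why a randomized rule can tolerate delay.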
761
00:52:42.739 --> 00:52:50.800 A:middle L:90%
That is in contrast to UCB. So, uh, to conclude the
762
00:52:50.809 --> 00:52:53.679 A:middle L:90%
first part: I showed how to use
763
00:52:53.679 --> 00:52:58.889 A:middle L:90%
contextual bandits to model, uh, a lot of
764
00:52:58.900 --> 00:53:02.880 A:middle L:90%
critical applications like news recommendation, ranking, and computational advertising.
765
00:53:04.150 --> 00:53:07.400 A:middle L:90%
I also showed you how to do offline evaluation
766
00:53:07.409 --> 00:53:14.260 A:middle L:90%
without implementing anything in a real system. I also showed you encouraging results
767
00:53:14.269 --> 00:53:21.809 A:middle L:90%
in communications is English recommendation on you later and particularly
768
00:53:21.820 --> 00:53:29.039 A:middle L:90%
I want to highlight the practices using the exploration exploration
769
00:53:30.480 --> 00:53:36.809 A:middle L:90%
future many interesting what to do offline violation many ways
770
00:53:36.820 --> 00:53:39.099 A:middle L:90%
to use non flavor. And also, when you
771
00:53:39.099 --> 00:53:45.159 A:middle L:90%
have prior knowledge devised a much better talk something along
772
00:53:45.159 --> 00:53:49.280 A:middle L:90%
that line by prior model, you can use prior
773
00:53:49.280 --> 00:53:51.610 A:middle L:90%
knowledge that way. Many other ways to find,
774
00:53:51.710 --> 00:53:55.489 A:middle L:90%
also in many various abandoned will be rectified. Reality
775
00:53:55.500 --> 00:54:01.250 A:middle L:90%
. Uh, so, uh okay, so that's
776
00:54:01.250 --> 00:54:05.760 A:middle L:90%
the, uh, the first part, on contextual
777
00:54:05.760 --> 00:54:09.139 A:middle L:90%
bandits. So I need a couple more minutes on
778
00:54:09.150 --> 00:54:15.570 A:middle L:90%
my other research. I have a background in reinforcement
779
00:54:15.579 --> 00:54:21.860 A:middle L:90%
learning. Uh, so unfortunately, uh, it's
780
00:54:21.860 --> 00:54:25.199 A:middle L:90%
, uh, a machine learning problem that optimizes strategies in
781
00:54:25.199 --> 00:54:29.409 A:middle L:90%
sequential decision making. So here's one example
782
00:54:29.409 --> 00:54:32.579 A:middle L:90%
that are working towards and, uh, in China
783
00:54:32.590 --> 00:54:37.570 A:middle L:90%
, so it's called a dialer, uh, it's a
784
00:54:37.579 --> 00:54:40.389 A:middle L:90%
system that calls the young system that says, there
785
00:54:40.400 --> 00:54:43.940 A:middle L:90%
you can hold it. We can get out of
786
00:54:44.449 --> 00:54:49.550 A:middle L:90%
China and valuable transfer the call to the person that
787
00:54:49.559 --> 00:54:52.809 A:middle L:90%
you want. So I get that you want to
788
00:54:52.820 --> 00:54:57.800 A:middle L:90%
call someone here each and then and then you say
789
00:54:57.800 --> 00:55:00.530 A:middle L:90%
something like that, Max Peter Johnson go home.
790
00:55:00.539 --> 00:55:04.289 A:middle L:90%
And then this is a sound signal and then the
791
00:55:04.300 --> 00:55:07.949 A:middle L:90%
speech recognition techniques transform it into some computer
792
00:55:07.949 --> 00:55:14.639 A:middle L:90%
readable representation, and then the dialog manager, depending on the signal,
793
00:55:14.639 --> 00:55:16.719 A:middle L:90%
decides whether it understands the request well. If it
794
00:55:16.719 --> 00:55:21.260 A:middle L:90%
understands, it transfers the call; if it was not
795
00:55:21.260 --> 00:55:22.639 A:middle L:90%
sure, then it can ask: can you confirm that you
796
00:55:22.650 --> 00:55:25.659 A:middle L:90%
really want to call this person? And then
797
00:55:25.670 --> 00:55:31.510 A:middle L:90%
correct question. Okay, so that repeat. So
798
00:55:31.510 --> 00:55:35.449 A:middle L:90%
in this kind of process, there is a notion of state,
799
00:55:35.449 --> 00:55:37.780 A:middle L:90%
there's a notion of actions and also rewards, uh, and then
800
00:55:37.780 --> 00:55:40.860 A:middle L:90%
we want to design a dialog policy so
801
00:55:40.860 --> 00:55:45.449 A:middle L:90%
that the system can succeed in, uh, as short a
802
00:55:45.460 --> 00:55:46.590 A:middle L:90%
conversation as possible. So if you do this,
803
00:55:46.590 --> 00:55:51.210 A:middle L:90%
we can define a reward function of minus one per response.
804
00:55:51.219 --> 00:55:54.690 A:middle L:90%
So response here, then is its success again,
805
00:55:54.699 --> 00:55:59.659 A:middle L:90%
minus 20. So I maximize the reward system and
806
00:55:59.670 --> 00:56:07.719 A:middle L:90%
devilish behavior to optimize by the objective. So cute
807
00:56:07.730 --> 00:56:12.949 A:middle L:90%
, teacher that that report with defined world cultures systems
808
00:56:12.960 --> 00:56:17.989 A:middle L:90%
do left trade reports seem to have a lot of
809
00:56:19.000 --> 00:56:25.449 A:middle L:90%
problems. Control problems gain, not computing Cuban introduction
810
00:56:25.460 --> 00:56:29.530 A:middle L:90%
patients. Um, so let's see that usually,
811
00:56:29.900 --> 00:56:40.469 A:middle L:90%
uh, position processes. Uh, so in my
812
00:56:40.469 --> 00:56:45.639 A:middle L:90%
dissertation I worked on efficient exploration algorithms for solving reinforcement
813
00:56:45.639 --> 00:56:49.170 A:middle L:90%
learning in Markov decision processes. The idea
814
00:56:49.170 --> 00:56:52.900 A:middle L:90%
is to distinguish the known from the unknown parts of the dynamics. And
815
00:56:52.900 --> 00:56:57.610 A:middle L:90%
then, if I can make that distinction, I know
816
00:56:57.610 --> 00:57:00.579 A:middle L:90%
where I should explore; at a state where I
817
00:57:00.590 --> 00:57:04.409 A:middle L:90%
am certain about the dynamics, I can do exploitation,
818
00:57:04.420 --> 00:57:07.369 A:middle L:90%
so that it would be useful for doing exploration in
819
00:57:07.380 --> 00:57:13.460 A:middle L:90%
reinforcement learning. In particular, I proposed a simple framework called
820
00:57:13.469 --> 00:57:15.000 A:middle L:90%
KWIK, uh, knows what it knows. And
821
00:57:15.010 --> 00:57:19.969 A:middle L:90%
from it, you can devise a principled algorithm called
822
00:57:19.980 --> 00:57:23.420 A:middle L:90%
KWIK-Rmax that utilizes this, uh, principle during
823
00:57:23.420 --> 00:57:29.599 A:middle L:90%
exploration and unifies many of the existing
824
00:57:29.610 --> 00:57:32.510 A:middle L:90%
algorithms for various kinds of
825
00:57:32.510 --> 00:57:38.070 A:middle L:90%
reinforcement learning problems. So this is my last slide.
826
00:57:38.079 --> 00:57:40.440 A:middle L:90%
So in the first part, I talked about the bandit
827
00:57:40.449 --> 00:57:45.610 A:middle L:90%
problem, which captures Internet applications. And then, uh, so
828
00:57:45.619 --> 00:57:49.369 A:middle L:90%
, uh, reinforcement learning, in the last two
829
00:57:49.369 --> 00:57:52.190 A:middle L:90%
minutes, which captures many sequential decision-making problems,
830
00:57:52.199 --> 00:57:55.969 A:middle L:90%
biologics and CI. Uh, so you can see
831
00:57:55.969 --> 00:58:00.119 A:middle L:90%
that the reinforcement camps a lot of, uh,
832
00:58:00.130 --> 00:58:04.570 A:middle L:90%
my research focuses on the exploration-exploitation tradeoff
833
00:58:04.989 --> 00:58:09.650 A:middle L:90%
, evaluation and also working on the selection and additional
834
00:58:09.650 --> 00:58:14.829 A:middle L:90%
confirmation of convergence. Trade analysis. Uh, unfortunately
835
00:58:15.610 --> 00:58:20.599 A:middle L:90%
, you also time right after learning, learning.
836
00:58:22.139 --> 00:58:23.550 A:middle L:90%
So, yeah. So that's the end of my
837
00:58:23.559 --> 00:58:30.530 A:middle L:90%
talk, and I'd be happy to take questions. Thanks. Yeah.