WEBVTT
1
00:00:06.240 --> 00:00:11.099 A:middle L:90%
So the talk today is very particular about mission,
2
00:00:11.099 --> 00:00:14.849 A:middle L:90%
really? Investors setting. And this time, the
3
00:00:14.859 --> 00:00:19.179 A:middle L:90%
focus on women's health evaluation as well as some cases
4
00:00:19.190 --> 00:00:24.839 A:middle L:90%
you get, uh, you ask me questions.
5
00:00:25.239 --> 00:00:27.839 A:middle L:90%
Um, so here's the outline of the talk. So
6
00:00:27.850 --> 00:00:32.450 A:middle L:90%
I'll introduce the problem, then explain the algorithms
7
00:00:32.450 --> 00:00:36.250 A:middle L:90%
we have developed over the past few years, and also
8
00:00:36.509 --> 00:00:40.619 A:middle L:90%
offline evaluation, which turns out to be not so straightforward,
9
00:00:40.630 --> 00:00:45.810 A:middle L:90%
and then show some experimental results and
10
00:00:45.810 --> 00:00:47.729 A:middle L:90%
then conclude. And at the end of
11
00:00:47.729 --> 00:00:50.670 A:middle L:90%
your time to talk about some research on the 24th
12
00:00:50.679 --> 00:00:54.850 A:middle L:90%
. So So this is a graph that shows the
13
00:00:54.859 --> 00:01:10.909 A:middle L:90%
direction between uh huh uhh! You sure? Okay
14
00:01:10.920 --> 00:01:12.930 A:middle L:90%
, this is user information like gender, age, and
15
00:01:12.969 --> 00:01:17.180 A:middle L:90%
then it comes to the system, and that one
16
00:01:17.180 --> 00:01:21.769 A:middle L:90%
has some strategy that has to decide what items
17
00:01:21.769 --> 00:01:26.000 A:middle L:90%
or ranking results to show to the user, and in return
18
00:01:26.010 --> 00:01:29.709 A:middle L:90%
the user can provide feedback in the form of
19
00:01:29.719 --> 00:01:33.890 A:middle L:90%
clicks, revenue. Um, and then all of this
20
00:01:33.900 --> 00:01:37.670 A:middle L:90%
in this kind of interaction, uh, we call
21
00:01:37.670 --> 00:01:41.579 A:middle L:90%
this observation the context, um, and then the recommendation
22
00:01:42.140 --> 00:01:49.719 A:middle L:90%
here, uh, and then the strategy is called the
23
00:01:49.719 --> 00:01:53.840 A:middle L:90%
policy, and then the feedback from the user is called
24
00:01:53.849 --> 00:01:57.120 A:middle L:90%
the reward. This is called the action, and we look at the total reward
25
00:01:57.129 --> 00:01:59.280 A:middle L:90%
. And in this whole process the goal, in my
26
00:01:59.420 --> 00:02:02.730 A:middle L:90%
view, is to maximize the total reward by optimizing
27
00:02:02.730 --> 00:02:07.129 A:middle L:90%
the policy here. So you can imagine it's optimizing
28
00:02:07.140 --> 00:02:09.150 A:middle L:90%
the page to maximize the click-through rate for the current
29
00:02:09.169 --> 00:02:13.530 A:middle L:90%
users. So let me give you a more concrete
30
00:02:13.530 --> 00:02:17.250 A:middle L:90%
example on the front page. So we are going to focus on
31
00:02:17.259 --> 00:02:23.610 A:middle L:90%
a particular module here. Here is a closer look
32
00:02:23.620 --> 00:02:27.169 A:middle L:90%
. Uh, so in this module, the recommendations we are going
33
00:02:27.180 --> 00:02:31.979 A:middle L:90%
to show users you can see here, which is a
34
00:02:31.990 --> 00:02:38.319 A:middle L:90%
small number of articles. Articles are handpicked by human
35
00:02:38.319 --> 00:02:44.259 A:middle L:90%
editors, and, uh, and then there's a position
36
00:02:44.270 --> 00:02:46.750 A:middle L:90%
that has a big image. So that's the highlighted
37
00:02:46.750 --> 00:02:49.810 A:middle L:90%
article that we recommend. This is the most prominent
38
00:02:49.819 --> 00:02:53.009 A:middle L:90%
, one of the most prominent positions, and therefore
39
00:02:53.009 --> 00:02:55.240 A:middle L:90%
, we really want to recommend the most interesting items
40
00:02:55.509 --> 00:03:00.069 A:middle L:90%
. So, um, so informally,
41
00:03:00.639 --> 00:03:05.430 A:middle L:90%
we want to recommend interesting, useful articles
42
00:03:05.439 --> 00:03:07.789 A:middle L:90%
to users, and more formally we want
43
00:03:07.789 --> 00:03:12.229 A:middle L:90%
to maximize the number of clicks, or the click-through
44
00:03:12.229 --> 00:03:15.370 A:middle L:90%
rate, CTR. Uh, there are many challenges here;
45
00:03:15.379 --> 00:03:19.849 A:middle L:90%
one of them is that the content pool of articles
46
00:03:19.939 --> 00:03:22.780 A:middle L:90%
is changing all the time, so the editors may
47
00:03:22.789 --> 00:03:24.699 A:middle L:90%
actually want to add or remove articles, because we
48
00:03:24.710 --> 00:03:29.490 A:middle L:90%
try to keep the content up to date. Um,
49
00:03:29.500 --> 00:03:31.409 A:middle L:90%
And then there's also the sparsity issue, where,
50
00:03:31.419 --> 00:03:36.099 A:middle L:90%
uh, users, uh, we have a huge
51
00:03:36.110 --> 00:03:38.719 A:middle L:90%
number of users every day, but the data for each user is
52
00:03:38.729 --> 00:03:42.729 A:middle L:90%
small. So we need to transfer knowledge from one user to another
53
00:03:42.740 --> 00:03:46.889 A:middle L:90%
so that I can use your information to identify your
54
00:03:46.900 --> 00:03:53.689 A:middle L:90%
interests for personalized recommendation. Okay. And
55
00:03:53.699 --> 00:03:57.469 A:middle L:90%
most important, there is also the issue of the
56
00:03:57.710 --> 00:04:00.210 A:middle L:90%
trade-off between exploitation and exploration. So let me
57
00:04:00.300 --> 00:04:04.580 A:middle L:90%
elaborate on that. Uh, so again, um
58
00:04:04.590 --> 00:04:06.919 A:middle L:90%
, so we want to recommend articles to users,
59
00:04:06.930 --> 00:04:11.219 A:middle L:90%
uh, trying to maximize the clicks. So the
60
00:04:11.229 --> 00:04:15.550 A:middle L:90%
observation in this process is that we can only obtain feedback
61
00:04:15.560 --> 00:04:17.620 A:middle L:90%
for what we recommend. An article that we
62
00:04:17.629 --> 00:04:21.040 A:middle L:90%
don't recommend, you don't see it. So therefore we
63
00:04:21.050 --> 00:04:25.759 A:middle L:90%
don't see whether the user likes the article. So this
64
00:04:25.759 --> 00:04:30.740 A:middle L:90%
is only partially observed. So we want to recommend, this
65
00:04:30.740 --> 00:04:32.779 A:middle L:90%
is what we want to recommend to the user.
66
00:04:32.790 --> 00:04:36.279 A:middle L:90%
To do that, we need to estimate each article's
67
00:04:36.290 --> 00:04:40.750 A:middle L:90%
click-through rate, but we don't know
68
00:04:40.750 --> 00:04:45.339 A:middle L:90%
the rate beforehand. So therefore sometimes, when
69
00:04:45.350 --> 00:04:47.959 A:middle L:90%
the article is new, we need to show it
70
00:04:47.970 --> 00:04:51.699 A:middle L:90%
to the user to see whether they like it
71
00:04:53.139 --> 00:04:55.970 A:middle L:90%
. So there's a, there's a trade-off in this
72
00:04:55.980 --> 00:05:00.490 A:middle L:90%
process. One goal is to utilize the knowledge we have
73
00:05:00.500 --> 00:05:01.980 A:middle L:90%
from the data, which is exploitation. The
74
00:05:01.980 --> 00:05:05.759 A:middle L:90%
other is to do exploration, to collect information from users
75
00:05:06.129 --> 00:05:10.759 A:middle L:90%
. So there's a trade-off between these two conflicting
76
00:05:10.769 --> 00:05:13.240 A:middle L:90%
goals. And the problem is how to do that
77
00:05:13.250 --> 00:05:17.180 A:middle L:90%
trade-off with a dynamic content pool, and when you try to consider user
78
00:05:17.220 --> 00:05:23.360 A:middle L:90%
interests as well. So let me give you an
79
00:05:23.360 --> 00:05:28.060 A:middle L:90%
example of insufficient exploration. So let's
80
00:05:28.139 --> 00:05:34.129 A:middle L:90%
say you have two bandit machines, slot machines, there.
81
00:05:34.139 --> 00:05:39.839 A:middle L:90%
So usually when you play a slot machine, in expectation you lose money.
82
00:05:39.850 --> 00:05:43.160 A:middle L:90%
But, uh, let's be nice: you can
83
00:05:43.170 --> 00:05:46.000 A:middle L:90%
earn money here, and the goal is to
84
00:05:46.000 --> 00:05:48.509 A:middle L:90%
maximize the money you earn from these machines
85
00:05:48.519 --> 00:05:51.610 A:middle L:90%
. You don't know which one has the higher expected
86
00:05:51.620 --> 00:05:55.800 A:middle L:90%
reward, or money. So you need to try
87
00:05:55.800 --> 00:05:58.970 A:middle L:90%
each one of them and then converge to the one
88
00:05:58.970 --> 00:06:00.050 A:middle L:90%
that has the higher reward. So
89
00:06:00.050 --> 00:06:03.189 A:middle L:90%
let's say you try the first one, it gives $5,
90
00:06:03.199 --> 00:06:09.449 A:middle L:90%
and try the second, zero; first, five; second, zero; zero; and
91
00:06:09.459 --> 00:06:11.670 A:middle L:90%
five, right? So at this point, I think
92
00:06:11.670 --> 00:06:14.889 A:middle L:90%
that the first machine gives you $5 on average and
93
00:06:14.889 --> 00:06:17.240 A:middle L:90%
the second machine doesn't give you anything. So you think,
94
00:06:17.240 --> 00:06:20.459 A:middle L:90%
okay, maybe the first machine is better than the second
95
00:06:20.839 --> 00:06:24.209 A:middle L:90%
. So this looks good.
96
00:06:24.220 --> 00:06:29.810 A:middle L:90%
However, the thing is we have
97
00:06:29.810 --> 00:06:30.670 A:middle L:90%
only three data points for the second machine,
98
00:06:30.680 --> 00:06:34.560 A:middle L:90%
so the estimate here can actually be inaccurate. So
99
00:06:34.560 --> 00:06:38.980 A:middle L:90%
let's say it turns out that the first machine gives $5
100
00:06:38.980 --> 00:06:44.209 A:middle L:90%
per round, but the second actually gives $100 a quarter
101
00:06:44.209 --> 00:06:46.839 A:middle L:90%
of the time. So now you have $25 per round
102
00:06:46.850 --> 00:06:51.389 A:middle L:90%
. So you were just unlucky to miss all the $100
103
00:06:51.389 --> 00:06:55.850 A:middle L:90%
rounds here because you didn't try enough. If you tried it more,
104
00:06:55.860 --> 00:07:00.230 A:middle L:90%
you would see the $100 rounds, and therefore, compared
105
00:07:00.230 --> 00:07:03.620 A:middle L:90%
to the first one, you have a regret of $20 per round.
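NOTE
Editor's sketch, not part of the talk: the arithmetic of the two-machine example. It assumes the second machine pays $100 with probability 0.25, which is inferred from the quoted $100 payout and $25-per-round average rather than stated outright. All variable names are illustrative.
```python
# Hypothetical numbers reconstructed from the spoken example.
payoff_first = 5.0                 # first machine: $5 every round
p_jackpot, jackpot = 0.25, 100.0   # second machine: assumed $100 a quarter of the time
expected_second = p_jackpot * jackpot
regret_per_round = expected_second - payoff_first
# Chance that three pulls of the second machine all pay nothing.
p_three_misses = (1.0 - p_jackpot) ** 3
print(expected_second, regret_per_round, round(p_three_misses, 3))
```
Under these assumptions, three empty pulls happen a bit more than 40% of the time, so three data points say very little about the second machine.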
106
00:07:03.629 --> 00:07:06.720 A:middle L:90%
So this is an example of insufficient exploration of
107
00:07:06.720 --> 00:07:11.129 A:middle L:90%
the second machine to give you an estimate of the average
108
00:07:11.139 --> 00:07:14.750 A:middle L:90%
payoff afternoon money to learn, and then we converse
109
00:07:14.750 --> 00:07:19.079 A:middle L:90%
this off first. So we can formulate this problem, this kind
110
00:07:19.079 --> 00:07:23.819 A:middle L:90%
of recommendation problem, as something called the contextual bandit problem.
111
00:07:23.829 --> 00:07:28.089 A:middle L:90%
There are many names in the literature, but I picked
112
00:07:28.089 --> 00:07:30.310 A:middle L:90%
this one because it emphasizes the bandit nature and
113
00:07:30.319 --> 00:07:33.180 A:middle L:90%
also the contextual nature. So let me explain what
114
00:07:33.180 --> 00:07:35.790 A:middle L:90%
it is. Um, so here we have,
115
00:07:35.800 --> 00:07:39.850 A:middle L:90%
uh, so it's an iterative process. Um,
116
00:07:39.860 --> 00:07:43.089 A:middle L:90%
here we have a set of candidates, each of which we call an arm,
117
00:07:43.100 --> 00:07:46.519 A:middle L:90%
so you can imagine that on the front page module we
118
00:07:46.519 --> 00:07:48.160 A:middle L:90%
have candidate articles to recommend to the user.
119
00:07:48.170 --> 00:07:51.879 A:middle L:90%
Um, and then also, so this is the feature x
120
00:07:51.959 --> 00:07:55.180 A:middle L:90%
. So it's the user feature, which is, like,
121
00:07:55.189 --> 00:07:59.009 A:middle L:90%
age, gender, location and personal interests, etcetera. And
122
00:07:59.009 --> 00:08:03.829 A:middle L:90%
then the decision maker can choose one of the
123
00:08:03.829 --> 00:08:07.149 A:middle L:90%
actions from this candidate set to display to the user and
124
00:08:07.149 --> 00:08:11.410 A:middle L:90%
in return receive a numerical reward, uh,
125
00:08:11.420 --> 00:08:16.050 A:middle L:90%
a numerical reward from the user. Um, so, and then
126
00:08:16.139 --> 00:08:18.139 A:middle L:90%
, after observing this reward, you can use this
127
00:08:18.139 --> 00:08:22.529 A:middle L:90%
signal to update your policy, your strategy, and this process repeats
128
00:08:22.279 --> 00:08:26.709 A:middle L:90%
over time. You want to maximize the sum of rewards
129
00:08:26.720 --> 00:08:28.800 A:middle L:90%
over this process. Let's say you have
130
00:08:28.810 --> 00:08:31.639 A:middle L:90%
T steps; you want to maximize, uh, the sum of
131
00:08:31.639 --> 00:08:37.679 A:middle L:90%
rewards over these T rounds. Okay. Um, so, so
132
00:08:37.690 --> 00:08:41.860 A:middle L:90%
technology can then calculate. And the key is the
133
00:08:41.870 --> 00:08:46.240 A:middle L:90%
use of features, uh, is obtained with Spain
134
00:08:46.240 --> 00:08:48.120 A:middle L:90%
on growth. Thank you, sir, And reported
135
00:08:48.940 --> 00:08:52.659 A:middle L:90%
correspondent uh, corresponding. Pointing to care about you
136
00:08:52.659 --> 00:08:56.860 A:middle L:90%
want to maximize the number of clicks, then it's natural that
137
00:08:56.870 --> 00:09:05.460 A:middle L:90%
r_t is the click, so that the expected r_t is the click rate, the CTR
138
00:09:05.470 --> 00:09:09.440 A:middle L:90%
. So it turns out that this form, this
139
00:09:09.440 --> 00:09:13.539 A:middle L:90%
formulation, is general enough to capture many applications on the Internet
140
00:09:13.539 --> 00:09:18.750 A:middle L:90%
. So let me see the example of a lot
141
00:09:18.750 --> 00:09:20.450 A:middle L:90%
of money in the street. So this is one
142
00:09:20.450 --> 00:09:24.289 A:middle L:90%
of the Chinese yahoo answers. So you can post
143
00:09:24.299 --> 00:09:28.379 A:middle L:90%
questions. Has questions this one, uh, computers
144
00:09:28.389 --> 00:09:31.500 A:middle L:90%
, they have computers, great computers. And then
145
00:09:31.509 --> 00:09:39.299 A:middle L:90%
these electrons are also advertising. Uh, usually it's
146
00:09:39.309 --> 00:09:46.919 A:middle L:90%
interesting. Uh, then so here in this example
147
00:09:46.929 --> 00:09:50.960 A:middle L:90%
, you can take a traveled a candidate who set
148
00:09:50.960 --> 00:09:56.049 A:middle L:90%
of boxes set up I can display a small region
149
00:09:56.259 --> 00:10:00.110 A:middle L:90%
and then access to the feature of the users in
150
00:10:00.110 --> 00:10:03.940 A:middle L:90%
general, every information in this article, and then
151
00:10:03.950 --> 00:10:13.970 A:middle L:90%
they have to stay here and our state. And
152
00:10:13.980 --> 00:10:18.460 A:middle L:90%
then there's a possible nature where for all other at
153
00:10:18.519 --> 00:10:24.149 A:middle L:90%
that stage is what you don't see the, um
154
00:10:24.159 --> 00:10:26.759 A:middle L:90%
So, for example, is ranking I contractual value
155
00:10:26.769 --> 00:10:30.870 A:middle L:90%
here and see a list of rankings out. Um
156
00:10:30.879 --> 00:10:33.029 A:middle L:90%
, so usually so in this case, uh,
157
00:10:33.039 --> 00:10:37.360 A:middle L:90%
a set of rankings, U. S T is
158
00:10:37.370 --> 00:10:39.529 A:middle L:90%
pretty documents. Are you certain you want to be
159
00:10:39.539 --> 00:10:43.879 A:middle L:90%
personalized. Um, 80 is the ranking of the
160
00:10:43.889 --> 00:10:48.330 A:middle L:90%
breakfast. And are we can define various ways to
161
00:10:48.340 --> 00:10:52.929 A:middle L:90%
find one when the session is a zero. Just
162
00:10:52.980 --> 00:10:58.000 A:middle L:90%
, um So, um yeah, so all I
163
00:10:58.000 --> 00:11:01.220 A:middle L:90%
can. So these examples show,
164
00:11:01.230 --> 00:11:03.529 A:middle L:90%
uh, that the contextual bandit is, uh, a
165
00:11:03.529 --> 00:11:09.620 A:middle L:90%
model for many critical applications on the Internet. And
166
00:11:09.629 --> 00:11:13.139 A:middle L:90%
it's related to some other, uh, other areas
167
00:11:13.139 --> 00:11:16.389 A:middle L:90%
in computer science and statistics. So one
168
00:11:16.389 --> 00:11:18.610 A:middle L:90%
thing you might, uh, wonder about
169
00:11:18.610 --> 00:11:22.159 A:middle L:90%
is the connection to information retrieval and collaborative filtering, both
170
00:11:22.159 --> 00:11:28.740 A:middle L:90%
of which are about recommending. Um, in those algorithms,
171
00:11:28.750 --> 00:11:31.360 A:middle L:90%
usually, um, they assume that
172
00:11:31.360 --> 00:11:35.139 A:middle L:90%
there's a static set of articles, a static set of movies, for you to
173
00:11:35.139 --> 00:11:37.710 A:middle L:90%
recommend to users. So there's, uh, there's no
174
00:11:37.720 --> 00:11:41.090 A:middle L:90%
dynamic content pool. So everything has been there for a
175
00:11:41.370 --> 00:11:46.080 A:middle L:90%
long time, so you have some data for them
176
00:11:46.090 --> 00:11:48.519 A:middle L:90%
, so you don't need to mind exploration and exploitation.
177
00:11:48.519 --> 00:11:50.620 A:middle L:90%
You just need to split the dataset into a training part and
178
00:11:50.629 --> 00:11:54.669 A:middle L:90%
a test part, train on the first part, and evaluate on the
179
00:11:54.919 --> 00:11:58.409 A:middle L:90%
test part. So it doesn't need any exploration. Um
180
00:11:58.419 --> 00:12:01.389 A:middle L:90%
, there's also foreign personally, very building is at
181
00:12:01.399 --> 00:12:07.480 A:middle L:90%
the top, so it's more generally that it is
182
00:12:07.480 --> 00:12:09.639 A:middle L:90%
a special case but it has to tackle the Temple
183
00:12:09.720 --> 00:12:13.460 A:middle L:90%
Islamic model. But hopefully it's not a very soon
184
00:12:13.460 --> 00:12:16.690 A:middle L:90%
in the kind of applications that I explained just now.
185
00:12:16.700 --> 00:12:20.019 A:middle L:90%
Um, so there's also the traditional multi-armed bandit
186
00:12:20.029 --> 00:12:24.950 A:middle L:90%
, uh, which doesn't consider contextual information, so
187
00:12:24.960 --> 00:12:30.399 A:middle L:90%
you cannot, uh, do personalized recommendation. So, so
188
00:12:30.409 --> 00:12:37.250 A:middle L:90%
, actually, give you are last report. So
189
00:12:37.259 --> 00:12:41.350 A:middle L:90%
this is an introduction to get kind of problem,
190
00:12:41.830 --> 00:12:45.960 A:middle L:90%
too. So this is, uh, inspection of
191
00:12:45.970 --> 00:12:50.090 A:middle L:90%
the algorithms. Uh, so, uh, this
192
00:12:50.090 --> 00:12:52.950 A:middle L:90%
is the first time that the organization which explain more
193
00:12:52.950 --> 00:12:56.929 A:middle L:90%
later, um, so one way to do exploration
194
00:12:56.940 --> 00:13:01.289 A:middle L:90%
is just trying things. So, uh, for example,
195
00:13:01.299 --> 00:13:05.659 A:middle L:90%
uh, so let's say you have three articles here
196
00:13:05.669 --> 00:13:09.509 A:middle L:90%
, and each has its own click rate, but you don't
197
00:13:09.509 --> 00:13:16.990 A:middle L:90%
know these numbers. Um, and then by using
198
00:13:16.000 --> 00:13:22.070 A:middle L:90%
the logged data, you can estimate the CTRs; these
199
00:13:22.070 --> 00:13:26.259 A:middle L:90%
are just the numbers here, uh, so that's a
200
00:13:26.269 --> 00:13:31.090 A:middle L:90%
reasonable estimate of the CTR. But then you know that
201
00:13:31.090 --> 00:13:33.100 A:middle L:90%
these estimates are not accurate. So you need to collect data
202
00:13:33.110 --> 00:13:39.000 A:middle L:90%
to refine the estimates, so that better estimates
203
00:13:39.000 --> 00:13:41.440 A:middle L:90%
can allow us to make better decisions in the future
204
00:13:41.450 --> 00:13:46.110 A:middle L:90%
. So one way, called epsilon-greedy, is to choose the
205
00:13:46.120 --> 00:13:48.320 A:middle L:90%
article that has the highest estimate with high probability, one
206
00:13:48.320 --> 00:13:50.940 A:middle L:90%
minus epsilon, and then with a small probability,
207
00:13:50.940 --> 00:13:54.129 A:middle L:90%
epsilon, explore at random. So this is a
208
00:13:54.139 --> 00:13:58.370 A:middle L:90%
very straightforward idea: just assign a probability to exploring something
209
00:13:58.590 --> 00:14:03.399 A:middle L:90%
. Okay, uh, this exploration is unguided.
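NOTE
Editor's sketch, not part of the talk: the strategy just described is commonly known as epsilon-greedy. The sketch below assumes three articles with made-up click-rate estimates and an exploration probability of 0.1. The function name and the numbers are illustrative only.
```python
import random
def epsilon_greedy(estimates, epsilon, rng=random):
    """Return an article index: explore uniformly with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))   # unguided, uniform exploration
    # Otherwise exploit: show the article with the highest estimated click rate.
    return max(range(len(estimates)), key=lambda a: estimates[a])
# Hypothetical click-rate estimates for three articles.
estimates = [0.04, 0.06, 0.05]
rng = random.Random(0)
picks = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
print(picks.count(1) / len(picks))   # fraction of rounds showing the current best
```
The exploration step ignores how much data each article already has, which is the inefficiency raised next in the talk.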
210
00:14:03.409 --> 00:14:05.899 A:middle L:90%
You explore the articles at random. So as
211
00:14:05.899 --> 00:14:07.990 A:middle L:90%
you can imagine, it's not the most efficient way
212
00:14:09.000 --> 00:14:13.169 A:middle L:90%
. So another strategy, called UCB1, is to choose
213
00:14:13.169 --> 00:14:16.220 A:middle L:90%
articles according to an index computed this way.
214
00:14:16.230 --> 00:14:20.460 A:middle L:90%
So mu is, again, the estimated click rate of
215
00:14:20.460 --> 00:14:22.960 A:middle L:90%
the article. And then this, uh, term
216
00:14:22.970 --> 00:14:28.730 A:middle L:90%
here, n sub a, okay, it's
217
00:14:28.730 --> 00:14:31.009 A:middle L:90%
the number of times this article has been shown to
218
00:14:31.019 --> 00:14:33.570 A:middle L:90%
users. So at the beginning, when this
219
00:14:33.580 --> 00:14:37.909 A:middle L:90%
n is very small, that means that we don't
220
00:14:37.909 --> 00:14:41.870 A:middle L:90%
have enough data to build a good estimate for
221
00:14:41.870 --> 00:14:45.620 A:middle L:90%
that. So this term is big. And then
222
00:14:45.620 --> 00:14:48.149 A:middle L:90%
, in other words, it encourages the system to
223
00:14:48.240 --> 00:14:50.690 A:middle L:90%
try this action. So you can think of that as an
224
00:14:50.730 --> 00:14:54.799 A:middle L:90%
exploration bonus term to encourage exploration. Um, and over
225
00:14:54.799 --> 00:14:58.399 A:middle L:90%
time, when n becomes large, this
226
00:14:58.399 --> 00:15:03.080 A:middle L:90%
term essentially goes to zero, and you are essentially using
227
00:15:03.090 --> 00:15:09.610 A:middle L:90%
this estimate, which is already very accurate. So these are
228
00:15:09.610 --> 00:15:15.120 A:middle L:90%
the two typical algorithms in traditional bandit problems that do
229
00:15:15.120 --> 00:15:18.649 A:middle L:90%
not consider context. So it means that the assumption is
230
00:15:18.649 --> 00:15:20.940 A:middle L:90%
that the click rate of an article does not depend
231
00:15:20.940 --> 00:15:24.600 A:middle L:90%
on user information, it doesn't depend on age, gender, location,
232
00:15:24.610 --> 00:15:28.889 A:middle L:90%
which is not a reasonable assumption, right? Different users have
233
00:15:28.029 --> 00:15:33.360 A:middle L:90%
different interests. So the consequence is that we are not considering the features.
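NOTE
Editor's sketch, not part of the talk: the slide formula for the UCB1 index is not captured in the transcript, so this assumes the standard form, estimated click rate plus sqrt(2 ln t / n_a), where n_a is the number of times article a was shown and t is the total number of displays. Names and data are illustrative.
```python
import math
def ucb1_index(mean, n_shown, t):
    """Estimated click rate plus the exploration bonus of standard UCB1."""
    if n_shown == 0:
        return math.inf                    # articles never shown are tried first
    return mean + math.sqrt(2.0 * math.log(t) / n_shown)
def choose(means, counts):
    t = max(1, sum(counts))                # total number of displays so far
    return max(range(len(means)), key=lambda a: ucb1_index(means[a], counts[a], t))
# Hypothetical data: article 2 looks worse but has been shown only 5 times.
means, counts = [0.05, 0.06, 0.04], [1000, 1000, 5]
print(choose(means, counts))               # the large bonus favours article 2
```
With equal counts the bonus terms are equal and the index falls back to picking the highest estimate, matching the description of the bonus shrinking as n grows.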
234
00:15:33.899 --> 00:15:37.889 A:middle L:90%
And there's no way to do personalization. So now
235
00:15:37.899 --> 00:15:46.240 A:middle L:90%
so there have been a handful of other algorithms for doing personalization
236
00:15:46.240 --> 00:15:48.789 A:middle L:90%
with contextual multi-armed bandits in the literature, back
237
00:15:48.799 --> 00:15:54.120 A:middle L:90%
in 2002. Uh, so, like these people
238
00:15:54.129 --> 00:15:56.360 A:middle L:90%
, as long as the sample, uh, strong
239
00:15:56.360 --> 00:16:00.299 A:middle L:90%
regret guarantees, but computationally intractable in general; there's
240
00:16:00.309 --> 00:16:07.519 A:middle L:90%
also probably need 2008 to really explain the previous like
241
00:16:07.529 --> 00:16:11.210 A:middle L:90%
you randomize exploration, they do not motivation. Okay
242
00:16:11.220 --> 00:16:14.629 A:middle L:90%
, so this is what I'm going to talk about
243
00:16:14.639 --> 00:16:18.730 A:middle L:90%
. This more compact geometric models that both complications application
244
00:16:18.740 --> 00:16:23.639 A:middle L:90%
and the fact is optimizing maximizing total work with like
245
00:16:23.649 --> 00:16:30.700 A:middle L:90%
So the rest of this session will focus on three
246
00:16:30.710 --> 00:16:33.179 A:middle L:90%
parts. First, I'll explain a generalization of the
247
00:16:33.190 --> 00:16:37.009 A:middle L:90%
UCB strategy for linear models, and then extend it
248
00:16:37.019 --> 00:16:41.929 A:middle L:90%
to generalized linear models. And then, uh, talk about
249
00:16:41.940 --> 00:16:48.149 A:middle L:90%
randomize our, uh, conscious. Yeah. Yeah
250
00:16:48.740 --> 00:16:51.840 A:middle L:90%
. So let me start with the UCB
251
00:16:51.850 --> 00:16:57.039 A:middle L:90%
strategy for linear models. Uh, so, let's say,
252
00:16:57.049 --> 00:17:00.590 A:middle L:90%
so, here's the linear model assumption: x
253
00:17:00.590 --> 00:17:03.750 A:middle L:90%
is the context vector of the user, right? Uh
254
00:17:03.759 --> 00:17:07.690 A:middle L:90%
, the expected reward, what it means is that,
255
00:17:07.700 --> 00:17:14.470 A:middle L:90%
uh, it can be modeled by a linear combination of
256
00:17:14.480 --> 00:17:18.359 A:middle L:90%
this feature. And here theta is the coefficient vector corresponding to
257
00:17:18.369 --> 00:17:21.849 A:middle L:90%
that action. But these are not known, so you
258
00:17:21.849 --> 00:17:23.809 A:middle L:90%
need to estimate them from data. And now close everything
259
00:17:23.809 --> 00:17:30.650 A:middle L:90%
up. They in the form of making the information
260
00:17:30.660 --> 00:17:33.859 A:middle L:90%
that we have many years of age and a spacious
261
00:17:33.869 --> 00:17:37.619 A:middle L:90%
exhibition and the corresponding let's see that are reversed.
262
00:17:37.640 --> 00:17:41.900 A:middle L:90%
So, in order to estimate theta, it's very straightforward
263
00:17:41.950 --> 00:17:45.769 A:middle L:90%
to apply least squares, or ridge regression.
264
00:17:45.839 --> 00:17:51.089 A:middle L:90%
Mhm. Here, uh, this system statistics,
265
00:17:51.099 --> 00:17:55.069 A:middle L:90%
Um, now. So once you have an explainer
266
00:17:55.069 --> 00:18:00.589 A:middle L:90%
of data and reduce that to estimate when you count
267
00:18:00.799 --> 00:18:06.160 A:middle L:90%
estimated, uh, hold W is estimated close enough
268
00:18:06.170 --> 00:18:11.940 A:middle L:90%
to shoot estimates that, uh, so far so
269
00:18:11.950 --> 00:18:15.650 A:middle L:90%
good. But this regression only gives you a point
270
00:18:15.650 --> 00:18:17.670 A:middle L:90%
estimate, in the sense that it gives you a number
271
00:18:17.799 --> 00:18:21.509 A:middle L:90%
but doesn't tell you how confident this estimate is. So we
272
00:18:21.509 --> 00:18:26.140 A:middle L:90%
need to quantify the uncertainty, so that we know: if I'm
273
00:18:26.140 --> 00:18:29.690 A:middle L:90%
sure about the CTR of an article, there's no need for exploration
274
00:18:29.700 --> 00:18:32.890 A:middle L:90%
, but for an article where I don't have enough data, I'm very
275
00:18:32.900 --> 00:18:36.559 A:middle L:90%
unconfident about my, in my estimate, and we need more
276
00:18:36.569 --> 00:18:40.630 A:middle L:90%
exploration. So we need to quantify this uncertainty, so
277
00:18:40.630 --> 00:18:42.049 A:middle L:90%
one way to do it is to derive, uh
278
00:18:42.059 --> 00:18:45.619 A:middle L:90%
, to use this, uh, inequality. We
279
00:18:45.619 --> 00:18:51.680 A:middle L:90%
can show that with high probability, our prediction error, so
280
00:18:51.680 --> 00:18:53.710 A:middle L:90%
the left-hand side here is the prediction error
281
00:18:53.720 --> 00:18:56.099 A:middle L:90%
. So this is my regression estimate. This is the
282
00:18:56.109 --> 00:19:00.920 A:middle L:90%
true ground truth that I don't know, the absolute
283
00:19:00.930 --> 00:19:03.900 A:middle L:90%
difference, the prediction error, is bounded by the square root of this
284
00:19:03.900 --> 00:19:08.670 A:middle L:90%
guy, multiplied by some constant. Uh
285
00:19:10.140 --> 00:19:12.980 A:middle L:90%
, so, um, so this term measures
286
00:19:14.069 --> 00:19:18.349 A:middle L:90%
how similar the new user x is to previous users
287
00:19:18.359 --> 00:19:21.349 A:middle L:90%
. So if x is very close to the previous users for arm
288
00:19:21.359 --> 00:19:25.339 A:middle L:90%
a, then, um, then the term is small,
289
00:19:25.349 --> 00:19:26.430 A:middle L:90%
in other words, we have a good
290
00:19:26.440 --> 00:19:32.799 A:middle L:90%
estimate from this regression, so the confidence interval is small,
291
00:19:32.809 --> 00:19:37.259 A:middle L:90%
so there's no need for exploration. Uh
292
00:19:37.640 --> 00:19:45.259 A:middle L:90%
Okay, so, uh huh, for high priority
293
00:19:45.269 --> 00:19:47.349 A:middle L:90%
, but then how we're going to use it.
294
00:19:47.359 --> 00:19:52.250 A:middle L:90%
So we have an algorithm called LinUCB for, uh
295
00:19:52.329 --> 00:19:56.049 A:middle L:90%
, this linear model. So essentially, when you have
296
00:19:56.049 --> 00:20:02.769 A:middle L:90%
a user, it always chooses the arm that maximizes, so
297
00:20:02.779 --> 00:20:08.319 A:middle L:90%
remember the context-free UCB, you know,
298
00:20:08.329 --> 00:20:14.500 A:middle L:90%
remember this form. The first term is about exploitation,
299
00:20:14.509 --> 00:20:18.519 A:middle L:90%
it is a point estimate of the, uh, reward. The
300
00:20:18.529 --> 00:20:22.400 A:middle L:90%
second term is a confidence interval, how uncertain
301
00:20:22.400 --> 00:20:26.309 A:middle L:90%
your estimate is, and that is for exploration
302
00:20:27.309 --> 00:20:33.069 A:middle L:90%
, and combining the two, the algorithm gives a trade-off
303
00:20:33.079 --> 00:20:38.309 A:middle L:90%
between the first part, exploitation, and exploration. Uh,
304
00:20:38.380 --> 00:20:42.910 A:middle L:90%
I should have mentioned that the algorithm is named for UCB
305
00:20:42.910 --> 00:20:47.730 A:middle L:90%
, because it chooses arms according to
306
00:20:47.740 --> 00:20:51.579 A:middle L:90%
upper confidence bounds. So I should mention that
307
00:20:51.660 --> 00:20:56.279 A:middle L:90%
, uh, the same government. And then it's
308
00:20:56.279 --> 00:20:59.720 A:middle L:90%
similar to what we want: when we have more and
309
00:20:59.720 --> 00:21:02.609 A:middle L:90%
more samples in the data set, this term
310
00:21:02.619 --> 00:21:04.529 A:middle L:90%
becomes small, and then you have a good estimate for
311
00:21:04.529 --> 00:21:08.390 A:middle L:90%
the first term and are essentially doing more exploitation rather than
312
00:21:08.400 --> 00:21:15.400 A:middle L:90%
exploration. There was a related algorithm back in 2002 that,
313
00:21:15.410 --> 00:21:18.950 A:middle L:90%
uh, but it works in a more complicated way.
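NOTE
Editor's sketch, not part of the talk: one way the linear UCB rule described above could look, assuming one coefficient vector per arm, a ridge-regression estimate, and an exploration bonus alpha * sqrt(x^T A^-1 x). The class name, alpha and the Sherman-Morrison update (used only to stay in pure Python) are the editor's choices, not the speaker's implementation.
```python
import math
class LinUCBArm:
    """Ridge regression for one arm, scored by an upper confidence bound."""
    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha                 # hypothetical exploration constant
        # Inverse of A = I + sum of x x^T; the identity is the ridge term.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim               # running sum of reward * x
    def _a_inv_times(self, v):
        return [sum(r * c for r, c in zip(row, v)) for row in self.a_inv]
    def score(self, x):
        theta = self._a_inv_times(self.b)  # ridge estimate A^-1 b
        mean = sum(t * c for t, c in zip(theta, x))                  # exploitation term
        quad = sum(a * c for a, c in zip(self._a_inv_times(x), x))   # x^T A^-1 x
        return mean + self.alpha * math.sqrt(max(0.0, quad))         # plus exploration bonus
    def update(self, x, reward):
        ax = self._a_inv_times(x)          # Sherman-Morrison rank-one update of A^-1
        denom = 1.0 + sum(a * c for a, c in zip(ax, x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * c for bi, c in zip(self.b, x)]
def choose(arms, x):
    return max(range(len(arms)), key=lambda a: arms[a].score(x))
arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
x = [1.0, 0.0]                             # hypothetical user feature vector
arms[0].update(x, reward=1.0)              # arm 0 was shown once and clicked
print(choose(arms, x))
```
As an arm accumulates data for similar users, its bonus shrinks and the score approaches the plain regression estimate, which is the shift from exploration to exploitation described in the talk.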
314
00:21:19.339 --> 00:21:26.319 A:middle L:90%
So how How so Anymore we started doing this work
315
00:21:26.329 --> 00:21:30.660 A:middle L:90%
with linear, uh, models. The linear model
316
00:21:30.670 --> 00:21:33.980 A:middle L:90%
, however, is not always perfect. So,
317
00:21:33.990 --> 00:21:36.819 A:middle L:90%
for instance, in the case that we want to
318
00:21:36.819 --> 00:21:38.920 A:middle L:90%
maximize clicks, we estimate the click rate,
319
00:21:38.920 --> 00:21:41.670 A:middle L:90%
which is a probability, so it has to be between
320
00:21:41.680 --> 00:21:47.779 A:middle L:90%
0 and 1, but with this model here it can be the case
321
00:21:47.779 --> 00:21:51.750 A:middle L:90%
that this estimate is bigger than one, which doesn't make any
322
00:21:51.759 --> 00:21:55.359 A:middle L:90%
sense, right? So, so the point is that the
323
00:21:55.369 --> 00:21:59.009 A:middle L:90%
linear model is not always a suitable model for all applications
324
00:21:59.079 --> 00:22:04.029 A:middle L:90%
provide Everyone is like place generalizing enormous monusco. So
325
00:22:04.039 --> 00:22:07.509 A:middle L:90%
So in generalized linear models we make the assumption
326
00:22:07.519 --> 00:22:14.569 A:middle L:90%
that the expected reward, given the user feature, is
327
00:22:14.579 --> 00:22:17.450 A:middle L:90%
the linear, uh, combination of these features
328
00:22:17.460 --> 00:22:21.960 A:middle L:90%
. Uh, followed by an inverse link function here
329
00:22:21.970 --> 00:22:26.980 A:middle L:90%
. So there are two, uh
330
00:22:26.990 --> 00:22:30.799 A:middle L:90%
, generalized linear models here. One is logistic regression, where,
331
00:22:30.809 --> 00:22:36.410 A:middle L:90%
uh, the CTR is modeled this way:
332
00:22:36.420 --> 00:22:42.009 A:middle L:90%
, uh, you have the same linear, uh
333
00:22:42.089 --> 00:22:47.289 A:middle L:90%
, combination here and then take an exponential and do
334
00:22:47.289 --> 00:22:49.339 A:middle L:90%
this. So this has a squashing effect, and this
335
00:22:49.339 --> 00:22:57.029 A:middle L:90%
number falls between zero and one. There's also
336
00:22:57.259 --> 00:23:02.990 A:middle L:90%
the probit model, with, uh, Phi being here the
337
00:23:03.000 --> 00:23:07.140 A:middle L:90%
cumulative distribution function of the Gaussian distribution. So again, a
338
00:23:07.309 --> 00:23:15.779 A:middle L:90%
squashing function. Unfortunately, when you work with generalized
339
00:23:15.789 --> 00:23:21.299 A:middle L:90%
linear models you cannot always get closed-form solutions like in
340
00:23:21.309 --> 00:23:23.500 A:middle L:90%
linear models. So what we can do here is
341
00:23:23.509 --> 00:23:26.990 A:middle L:90%
to do an approximation. So if you want to
342
00:23:27.000 --> 00:23:30.349 A:middle L:90%
get a point estimate with generalized linear models, we
343
00:23:30.349 --> 00:23:34.400 A:middle L:90%
can do this regression by solving an optimization
344
00:23:34.410 --> 00:23:38.460 A:middle L:90%
to get the point estimate given the data, so that
345
00:23:38.539 --> 00:23:42.109 A:middle L:90%
gives us the optimal point estimate. And then we
346
00:23:42.109 --> 00:23:47.859 A:middle L:90%
can do something similar to derive a confidence interval
347
00:23:47.859 --> 00:23:51.930 A:middle L:90%
for the estimate. And then the two
348
00:23:51.940 --> 00:23:53.950 A:middle L:90%
combined can do something like UCB,
349
00:23:53.950 --> 00:23:59.160 A:middle L:90%
simply by choosing the arm with the highest mean plus a compensation
350
00:23:59.890 --> 00:24:03.720 A:middle L:90%
for doing exploration. So here again, all the
351
00:24:03.720 --> 00:24:07.720 A:middle L:90%
stations in the first case I'm in here, which
352
00:24:07.720 --> 00:24:12.200 A:middle L:90%
is important in the second case of logistic and probit
353
00:24:12.210 --> 00:24:17.190 A:middle L:90%
. And so the bones nails a little mysterious boarding
354
00:24:17.230 --> 00:24:21.500 A:middle L:90%
a technique side all the same to travel with the
355
00:24:21.500 --> 00:24:23.490 A:middle L:90%
loss of the being fixed up. It's more comfortable
356
00:24:23.500 --> 00:24:26.029 A:middle L:90%
, so I'm going to show that you don't hear
357
00:24:26.029 --> 00:24:33.259 A:middle L:90%
the ideas of person to indicate, and later we'll
358
00:24:33.269 --> 00:24:37.819 A:middle L:90%
show you some comparisons showing generalized linear models do better
359
00:24:37.819 --> 00:24:48.589 A:middle L:90%
than, uh, linear models. So upper
360
00:24:48.589 --> 00:24:52.039 A:middle L:90%
confidence bound exploration has been very popular in the literature
361
00:24:52.049 --> 00:24:53.279 A:middle L:90%
for a long time. People know that it's a very
362
00:24:53.289 --> 00:24:56.789 A:middle L:90%
nice topic; you can prove a lot of interesting
363
00:24:56.799 --> 00:25:00.329 A:middle L:90%
theory here, uh, show that it converges very quickly to the
364
00:25:00.480 --> 00:25:04.549 A:middle L:90%
optimal solution. Um, and then
365
00:25:04.559 --> 00:25:07.980 A:middle L:90%
, uh, there are also limitations of this kind
366
00:25:07.980 --> 00:25:12.420 A:middle L:90%
of exploration. So, uh, the first
367
00:25:12.430 --> 00:25:17.059 A:middle L:90%
thing is that UCB exploration may do too much exploration
368
00:25:17.069 --> 00:25:21.089 A:middle L:90%
by adding an exploration bonus to the index.
369
00:25:21.099 --> 00:25:23.680 A:middle L:90%
Uh, the algorithm can easily explore all the potential
370
00:25:23.690 --> 00:25:26.589 A:middle L:90%
piece part of the French stage. Explore all the
371
00:25:26.599 --> 00:25:30.799 A:middle L:90%
friends. Uh, this could be inefficient. Uh
372
00:25:30.809 --> 00:25:34.809 A:middle L:90%
, then it especially when you have high large,
373
00:25:34.819 --> 00:25:37.769 A:middle L:90%
that helps you. This part of the primary is
374
00:25:37.779 --> 00:25:41.720 A:middle L:90%
hopefully helping out parts that used to be still exploding
375
00:25:41.730 --> 00:25:45.470 A:middle L:90%
. Providence is uncertain. So you do want to
376
00:25:45.480 --> 00:25:49.069 A:middle L:90%
use any prior knowledge you have to improve the performance
377
00:25:49.119 --> 00:25:52.579 A:middle L:90%
of the algorithm. Uh, the second thing is
378
00:25:52.579 --> 00:25:56.549 A:middle L:90%
that UCB exploration is deterministic, so, as you can see, it is
379
00:25:56.549 --> 00:26:00.509 A:middle L:90%
a deterministic algorithm that always chooses the arm
380
00:26:00.519 --> 00:26:03.430 A:middle L:90%
that has the maximum mean plus confidence bonus. And,
381
00:26:03.470 --> 00:26:07.069 A:middle L:90%
uh, so there's a problem when the rewards are
382
00:26:07.069 --> 00:26:08.609 A:middle L:90%
delayed. So, you know, since
383
00:26:08.619 --> 00:26:11.420 A:middle L:90%
we don't really get the user's response immediately:
384
00:26:11.430 --> 00:26:15.430 A:middle L:90%
we show the page, and the user
385
00:26:15.440 --> 00:26:18.880 A:middle L:90%
may look at the page for a few seconds and click,
386
00:26:18.150 --> 00:26:22.059 A:middle L:90%
and due to the infrastructure of the system, that click
387
00:26:22.069 --> 00:26:27.329 A:middle L:90%
doesn't go back to the back end and update the model immediately,
388
00:26:27.339 --> 00:26:30.660 A:middle L:90%
so there is usually some kind of delay, and considering the
389
00:26:30.660 --> 00:26:34.410 A:middle L:90%
amount of traffic we have, a few seconds or minutes
390
00:26:34.410 --> 00:26:37.750 A:middle L:90%
of delay corresponds to thousands, tens of thousands of
391
00:26:37.759 --> 00:26:42.220 A:middle L:90%
users or maybe more. So in that way there's
392
00:26:42.230 --> 00:26:48.190 A:middle L:90%
a delay of 10,000 to 1 million steps, so you can
393
00:26:48.480 --> 00:26:52.259 A:middle L:90%
only get rewards from one million steps, uh,
394
00:26:52.269 --> 00:26:55.660 A:middle L:90%
in the past. So this is bad for deterministic
395
00:26:55.730 --> 00:27:00.549 A:middle L:90%
exploration strategies, because within this batch they always
396
00:27:00.549 --> 00:27:03.390 A:middle L:90%
choose the same arm over and over again, the same
397
00:27:03.400 --> 00:27:04.910 A:middle L:90%
arm. Um, and that may not be the
398
00:27:04.910 --> 00:27:07.990 A:middle L:90%
best thing to do, because you could have randomized strategies
399
00:27:07.990 --> 00:27:11.250 A:middle L:90%
so that you can explore different things in this batch
400
00:27:11.259 --> 00:27:15.569 A:middle L:90%
. Uh, the third thing is that deriving
401
00:27:15.579 --> 00:27:18.470 A:middle L:90%
a confidence interval is not always easy
402
00:27:18.529 --> 00:27:21.390 A:middle L:90%
, so we can do that exactly for linear models, or
403
00:27:21.400 --> 00:27:25.359 A:middle L:90%
approximately for generalized linear models after a
404
00:27:25.369 --> 00:27:27.180 A:middle L:90%
bit more work, but for other models it is more
405
00:27:27.180 --> 00:27:33.019 A:middle L:90%
difficult. So here I'm going to describe a socialistic
406
00:27:33.269 --> 00:27:37.329 A:middle L:90%
concepts. In fact, three. Um so,
407
00:27:37.329 --> 00:27:41.210 A:middle L:90%
yeah, it's called probability matching, um,
408
00:27:41.220 --> 00:27:47.349 A:middle L:90%
so given the user x, it picks an arm a, in this case
409
00:27:47.349 --> 00:27:52.019 A:middle L:90%
an article, according to a probability that is the
410
00:27:52.029 --> 00:27:56.230 A:middle L:90%
probability that this article is optimal. So if the algorithm thinks
411
00:27:56.230 --> 00:28:00.650 A:middle L:90%
that article a has a 90% probability, or chance,
412
00:28:00.700 --> 00:28:06.089 A:middle L:90%
of being optimal, or most interesting, for this current user
413
00:28:06.099 --> 00:28:08.259 A:middle L:90%
, then, uh, we pick it with 90% probability. So
414
00:28:08.259 --> 00:28:11.230 A:middle L:90%
that's the idea. Uh, um, so, first of
415
00:28:11.230 --> 00:28:14.539 A:middle L:90%
all, you know, this is a randomized strategy
416
00:28:14.549 --> 00:28:18.400 A:middle L:90%
, so it's more robust to reward delay.
417
00:28:18.410 --> 00:28:22.799 A:middle L:90%
Even when the reward is delayed, you can
418
00:28:22.809 --> 00:28:25.980 A:middle L:90%
do different exploration for different users. And at the end
419
00:28:25.980 --> 00:28:26.690 A:middle L:90%
of the delay period you can have,
420
00:28:26.700 --> 00:28:30.900 A:middle L:90%
you can have data for different sorts of users, and
421
00:28:30.910 --> 00:28:33.849 A:middle L:90%
that helps. Um, and then, more
422
00:28:33.849 --> 00:28:37.910 A:middle L:90%
importantly, it's straightforward to implement. So, for
423
00:28:37.910 --> 00:28:41.529 A:middle L:90%
instance, if, uh, you have a logistic
424
00:28:41.529 --> 00:28:45.950 A:middle L:90%
model or probit model, you can just have a posterior
425
00:28:45.950 --> 00:28:51.900 A:middle L:90%
distribution of the parameter you are trying to learn, and the posterior is computed by
426
00:28:51.900 --> 00:28:55.390 A:middle L:90%
any standard Bayesian inference technique given the data
427
00:28:55.400 --> 00:28:59.069 A:middle L:90%
. So you can maintain a posterior here, and
428
00:28:59.079 --> 00:29:02.569 A:middle L:90%
then when a user comes and you want to recommend an
429
00:29:02.579 --> 00:29:07.230 A:middle L:90%
article, you just draw one parameter from this posterior,
430
00:29:07.240 --> 00:29:12.250 A:middle L:90%
so each time it is a randomized parameter, and then you
431
00:29:12.259 --> 00:29:17.450 A:middle L:90%
choose the arm according to this random, uh, parameter
432
00:29:17.460 --> 00:29:22.650 A:middle L:90%
vector. So it works for linear models, logistic models,
433
00:29:22.920 --> 00:29:29.849 A:middle L:90%
so you can see, because this parameter is drawn from
434
00:29:29.849 --> 00:29:33.000 A:middle L:90%
this posterior, this process is randomized. You can
435
00:29:33.000 --> 00:29:37.089 A:middle L:90%
show that this procedure actually satisfies the probability matching
436
00:29:37.089 --> 00:29:41.609 A:middle L:90%
principle: an arm that has a 90% chance of being optimal
437
00:29:41.619 --> 00:29:47.099 A:middle L:90%
will be chosen with probability 90%. Uh, this
438
00:29:47.099 --> 00:29:52.640 A:middle L:90%
can be easily combined with many other kinds of
439
00:29:52.640 --> 00:29:56.710 A:middle L:90%
models, because you don't
440
00:29:56.720 --> 00:30:00.400 A:middle L:90%
have to derive new formulas or anything. So this
441
00:30:00.400 --> 00:30:07.700 A:middle L:90%
is, uh, this is useful in practice. Okay
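NOTE
A minimal sketch of the probability-matching step described above: keep a posterior, draw one sample from it, and act greedily on that sample. It swaps in an independent Beta posterior per (user segment, article) pair for the logistic or probit posterior mentioned in the talk; the class name, method names, and the notion of a discrete segment are all assumptions made for illustration.
```python
import random
from collections import defaultdict
class SegmentThompsonSampler:
    """Probability matching with an independent Beta posterior per (segment, arm)."""
    def __init__(self, arms, seed=0):
        self.arms = list(arms)
        self.rng = random.Random(seed)
        # Each entry is [clicks + 1, non-clicks + 1], i.e. a uniform Beta(1, 1) prior.
        self.counts = defaultdict(lambda: [1, 1])
    def choose(self, segment):
        # Draw one plausible click rate per arm from its posterior,
        # then pick the arm whose drawn rate is highest.
        draws = {a: self.rng.betavariate(*self.counts[(segment, a)]) for a in self.arms}
        return max(draws, key=draws.get)
    def update(self, segment, arm, clicked):
        # Index 0 counts clicks, index 1 counts non-clicks.
        self.counts[(segment, arm)][0 if clicked else 1] += 1
```
Because the choice depends on a fresh random draw each time, users arriving within the same delayed batch can be shown different articles even though the posterior has not been updated yet.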
442
00:30:07.710 --> 00:30:12.769 A:middle L:90%
, so I have reviewed, uh, a few kinds of
443
00:30:12.769 --> 00:30:18.279 A:middle L:90%
algorithms in general. And so you might
444
00:30:18.289 --> 00:30:22.009 A:middle L:90%
wonder how we can tell whether they differ. So in machine learning,
445
00:30:22.019 --> 00:30:26.549 A:middle L:90%
we do care about finding a mathematically rigorous way
446
00:30:26.559 --> 00:30:30.170 A:middle L:90%
to measure the performance and the speed of an algorithm
447
00:30:30.180 --> 00:30:34.500 A:middle L:90%
. In this case, for bandits, the
448
00:30:34.500 --> 00:30:37.950 A:middle L:90%
normal, uh, metric is called regret. So
449
00:30:37.950 --> 00:30:41.200 A:middle L:90%
here, uh, this is the expectation of the
450
00:30:41.210 --> 00:30:47.539 A:middle L:90%
sum of rewards. Uh, so this is the expectation
451
00:30:47.539 --> 00:30:51.500 A:middle L:90%
of the sum of rewards of, uh, an oracle algorithm,
452
00:30:51.509 --> 00:30:55.960 A:middle L:90%
assuming that the oracle knows the parameters of this problem
453
00:30:56.140 --> 00:31:00.299 A:middle L:90%
. So this is the best possible, highest possible result
454
00:31:00.329 --> 00:31:03.670 A:middle L:90%
you can actually hope for, assuming you knew
455
00:31:03.950 --> 00:31:07.990 A:middle L:90%
them. But on the other hand, you have
456
00:31:07.990 --> 00:31:10.549 A:middle L:90%
an algorithm that does not know the parameters beforehand.
457
00:31:10.559 --> 00:31:14.369 A:middle L:90%
So it has to be adaptive and try to take actions, and this
458
00:31:14.369 --> 00:31:21.140 A:middle L:90%
is the sequence of arms it chooses for the users. So
459
00:31:21.150 --> 00:31:25.529 A:middle L:90%
here we have the expectation of its total reward.
460
00:31:25.539 --> 00:31:29.299 A:middle L:90%
And the difference between them is called, uh,
461
00:31:29.309 --> 00:31:33.450 A:middle L:90%
regret. It's always nonnegative here, uh,
462
00:31:33.460 --> 00:31:40.299 A:middle L:90%
and then, uh, so, uh, if the regret
463
00:31:40.690 --> 00:31:45.940 A:middle L:90%
is sublinear in the number of steps, then you
464
00:31:45.940 --> 00:31:48.720 A:middle L:90%
can say that the algorithm learns, because if it's sublinear,
465
00:31:48.779 --> 00:31:53.569 A:middle L:90%
then you divide by the number of users and get the per-step regret.
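NOTE
The regret described above can be written out as follows. The symbols are notation assumed here rather than taken from the slides: T is the number of steps, r is the reward, a_t is the arm the algorithm chooses at step t, and a_t^* is the arm the oracle would choose.
```latex
R(T) \;=\; \mathbb{E}\Big[\sum_{t=1}^{T} r_{t,a_t^*}\Big] \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} r_{t,a_t}\Big],
\qquad
\frac{R(T)}{T} \to 0 \ \text{ as } T \to \infty \ \text{ whenever } R(T) = o(T).
```
The first term is the oracle's expected total reward, the second is the algorithm's, and the second statement is the sense in which sublinear regret means the per-step regret vanishes.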
466
00:31:53.579 --> 00:31:59.049 A:middle L:90%
So the per-step regret will decrease to zero
467
00:31:59.059 --> 00:32:00.190 A:middle L:90%
as this gets smaller and smaller. So the
468
00:32:00.200 --> 00:32:05.569 A:middle L:90%
per-step regret decreases to zero over time. In
469
00:32:05.569 --> 00:32:07.920 A:middle L:90%
other words, if we run the algorithm for
470
00:32:07.920 --> 00:32:09.960 A:middle L:90%
a long time, then the algorithm will converge to
471
00:32:09.970 --> 00:32:16.180 A:middle L:90%
the optimal point, which you don't know beforehand. So
472
00:32:16.190 --> 00:32:21.829 A:middle L:90%
whenever the regret is sublinear, the
473
00:32:21.829 --> 00:32:24.519 A:middle L:90%
algorithm learns, that is, when, um, the regret is
474
00:32:24.529 --> 00:32:30.170 A:middle L:90%
small-o of T, so the per-step average decreases
475
00:32:30.180 --> 00:32:37.650 A:middle L:90%
to zero. So the algorithm converges. So this
476
00:32:37.650 --> 00:32:42.980 A:middle L:90%
is a, uh, mathematical metric
477
00:32:42.990 --> 00:32:46.119 A:middle L:90%
to measure the performance of the algorithm. So
478
00:32:46.130 --> 00:32:49.130 A:middle L:90%
, for instance, in the linear case one can show
479
00:32:49.130 --> 00:32:51.910 A:middle L:90%
that, uh, the regret grows on
480
00:32:51.920 --> 00:32:55.269 A:middle L:90%
the order of square root of KdT, where K is the number of articles,
481
00:32:55.279 --> 00:33:01.579 A:middle L:90%
d is the number of features, and T is the number of, uh, steps
482
00:33:01.589 --> 00:33:06.460 A:middle L:90%
. So let me give you an idea of what
483
00:33:06.470 --> 00:33:07.799 A:middle L:90%
these numbers mean. So that means that the algorithm
484
00:33:07.880 --> 00:33:13.849 A:middle L:90%
approaches the optimal solution at the rate of square root
485
00:33:13.859 --> 00:33:16.660 A:middle L:90%
of Kd over T in per-step regret. Um
486
00:33:16.670 --> 00:33:22.000 A:middle L:90%
, so when you have 20 articles and 15 features
487
00:33:22.009 --> 00:33:25.569 A:middle L:90%
and 10 million steps, this is just a small quantity
488
00:33:25.750 --> 00:33:29.559 A:middle L:90%
, and then you can see that it is kind of
489
00:33:29.569 --> 00:33:34.440 A:middle L:90%
small, uh, so it quickly goes to the optimal
490
00:33:34.440 --> 00:33:37.799 A:middle L:90%
solution. Uh, that is very good in practice
491
00:33:38.950 --> 00:33:43.359 A:middle L:90%
, and for generalized linear models this is still open. Uh,
492
00:33:43.369 --> 00:33:47.180 A:middle L:90%
there's a valiant general model last year. Uh,
493
00:33:47.190 --> 00:33:54.210 A:middle L:90%
people should regret that study a slightly different, Uh
494
00:33:54.220 --> 00:34:00.390 A:middle L:90%
, this is not comparable and constant something, uh
495
00:34:00.400 --> 00:34:02.529 A:middle L:90%
, I think is a whole algorithm that, uh
496
00:34:02.539 --> 00:34:07.240 A:middle L:90%
, there's not been much attention to the Microsoft,
497
00:34:07.619 --> 00:34:13.230 A:middle L:90%
uh, exploration strategy to be particularly helpful in practice
498
00:34:13.239 --> 00:34:15.920 A:middle L:90%
. Um, so now people are starting to analyze it
499
00:34:15.929 --> 00:34:21.309 A:middle L:90%
. Um, so So just, uh, last
500
00:34:21.309 --> 00:34:24.369 A:middle L:90%
week or three candidates. So something, uh,
501
00:34:25.599 --> 00:34:29.480 A:middle L:90%
I don't have to get the best possible way.
502
00:34:29.489 --> 00:34:36.059 A:middle L:90%
Uh, so the two days ago, Uh,
503
00:34:36.539 --> 00:34:39.119 A:middle L:90%
okay, so I hope that so far I have shown you
504
00:34:39.130 --> 00:34:43.800 A:middle L:90%
the kind of problems we are interested in solving, uh
505
00:34:43.809 --> 00:34:46.949 A:middle L:90%
, and typical algorithms that we can use in solving
506
00:34:46.960 --> 00:34:51.949 A:middle L:90%
these problems. And now we switch to the, uh
507
00:34:51.960 --> 00:34:58.110 A:middle L:90%
, evaluation problem. Um, so the goal
508
00:34:58.110 --> 00:35:00.670 A:middle L:90%
here is that you have an algorithm for a contextual bandit
509
00:35:00.670 --> 00:35:05.420 A:middle L:90%
problem, and you should become either come to the
510
00:35:05.420 --> 00:35:09.949 A:middle L:90%
side of the distribution. Um, and then the
511
00:35:09.960 --> 00:35:15.670 A:middle L:90%
algorithm is adaptive, meaning that it learns from past
512
00:35:15.679 --> 00:35:20.179 A:middle L:90%
data and then for dessert, nature. Even then
513
00:35:20.179 --> 00:35:24.349 A:middle L:90%
you can define the value as the average per-step reward
514
00:35:24.360 --> 00:35:30.340 A:middle L:90%
it collects. So this is, uh, compared
515
00:35:30.340 --> 00:35:31.900 A:middle L:90%
to regret the same set of, uh, aggressive
516
00:35:31.920 --> 00:35:37.500 A:middle L:90%
difference between, uh, we can actually measure.
517
00:35:37.510 --> 00:35:40.889 A:middle L:90%
We cannot measure the optimal because we don't have a
518
00:35:40.929 --> 00:35:45.969 A:middle L:90%
conservation. That's why you're not fishing for religious emergency
519
00:35:45.340 --> 00:35:51.840 A:middle L:90%
. Uh, look at the financial quantity and then
520
00:35:51.849 --> 00:35:54.619 A:middle L:90%
for study, uh, study How high? It
521
00:35:54.619 --> 00:36:00.380 A:middle L:90%
doesn't learn something policy doesn't learn from when the savings
522
00:36:00.380 --> 00:36:04.219 A:middle L:90%
accounts always recommend the market. So so now can
523
00:36:04.230 --> 00:36:09.469 A:middle L:90%
this vulgar the first one can achieve. So these
524
00:36:09.469 --> 00:36:13.610 A:middle L:90%
are these two numbers are the numbers trying to estimate
525
00:36:13.619 --> 00:36:17.309 A:middle L:90%
prevented the way? Uh, the question is whether
526
00:36:17.309 --> 00:36:20.780 A:middle L:90%
we can do it offline. So say we have a
527
00:36:20.780 --> 00:36:24.179 A:middle L:90%
lot of data from click logs. We don't want to
528
00:36:24.190 --> 00:36:27.480 A:middle L:90%
do this: when we have a new algorithm, we don't want
529
00:36:27.480 --> 00:36:31.579 A:middle L:90%
to run the algorithm in the real system. It always takes
530
00:36:31.579 --> 00:36:34.820 A:middle L:90%
a lot of time. It takes money. And
531
00:36:35.349 --> 00:36:40.920 A:middle L:90%
so offline evaluation, utilizing data to evaluate a new algorithm, has
532
00:36:40.920 --> 00:36:46.820 A:middle L:90%
a lot of benefits. That way, uh, even
533
00:36:46.829 --> 00:36:51.000 A:middle L:90%
, uh, testing system. In a way,
534
00:36:51.010 --> 00:36:55.750 A:middle L:90%
this is like the experiments that are very common in machine
535
00:36:55.809 --> 00:37:00.369 A:middle L:90%
learning, where we have a lot of benchmark
536
00:37:00.369 --> 00:37:04.579 A:middle L:90%
data sets organized as input, a vector of features, versus
537
00:37:04.590 --> 00:37:08.789 A:middle L:90%
label, for binary classification zero or one in this case. So it's
538
00:37:08.909 --> 00:37:12.849 A:middle L:90%
divided into a training part and a test part. I
539
00:37:12.860 --> 00:37:17.219 A:middle L:90%
think we have all done that. But
540
00:37:17.230 --> 00:37:21.730 A:middle L:90%
for interactive machine learning problems it is not so straightforward. So
541
00:37:21.730 --> 00:37:22.780 A:middle L:90%
, for instance, the data in the log for bandits
542
00:37:22.780 --> 00:37:27.130 A:middle L:90%
, especially in this setting, has the context, the user features,
543
00:37:27.170 --> 00:37:30.860 A:middle L:90%
the article we recommended, and also the user's reward for
544
00:37:30.860 --> 00:37:35.280 A:middle L:90%
that article. Uh, so when you use
545
00:37:35.289 --> 00:37:37.889 A:middle L:90%
this data, this historical data, to evaluate a new
546
00:37:37.900 --> 00:37:40.760 A:middle L:90%
algorithm, then you don't have the rewards in the case where
547
00:37:40.760 --> 00:37:45.199 A:middle L:90%
the algorithm recommends an arm that is not in the log, because
548
00:37:45.199 --> 00:37:47.599 A:middle L:90%
we only see the reward, we only see it
549
00:37:47.610 --> 00:37:51.960 A:middle L:90%
for the arm in the log, not for other
550
00:37:51.969 --> 00:37:54.090 A:middle L:90%
arms in this step. Uh, therefore, there's a,
551
00:37:54.099 --> 00:37:58.019 A:middle L:90%
this is what we call the partial-label nature.
552
00:37:58.030 --> 00:38:00.530 A:middle L:90%
We don't see the reward for arms that are not
553
00:38:00.530 --> 00:38:04.570 A:middle L:90%
in the log. Uh, for this reason,
554
00:38:04.579 --> 00:38:08.409 A:middle L:90%
a straightforward way to use historical data is as follows.
555
00:38:08.420 --> 00:38:12.130 A:middle L:90%
Um, so let's say you have a lot of
556
00:38:12.139 --> 00:38:16.360 A:middle L:90%
historical data of these interactions. You can use
557
00:38:16.460 --> 00:38:22.300 A:middle L:90%
whatever statistical machine learning techniques to estimate this
558
00:38:22.300 --> 00:38:24.650 A:middle L:90%
reward function. So you can say you have
559
00:38:24.659 --> 00:38:29.949 A:middle L:90%
a function that simulates what users do in practice. And
560
00:38:29.949 --> 00:38:31.260 A:middle L:90%
hopefully, if you can get this right, then
561
00:38:31.260 --> 00:38:35.690 A:middle L:90%
you can use this simulator to estimate how well the
562
00:38:35.699 --> 00:38:39.599 A:middle L:90%
algorithm does. Unfortunately, um, the first step here is
563
00:38:39.610 --> 00:38:44.170 A:middle L:90%
very difficult, and bias is often introduced by the modeling assumptions,
564
00:38:44.170 --> 00:38:46.429 A:middle L:90%
uh, and then the second step here, evaluation,
565
00:38:46.440 --> 00:38:50.210 A:middle L:90%
will be unreliable. So that gives you a lot
566
00:38:50.210 --> 00:38:52.050 A:middle L:90%
of pain. When you use this kind of
567
00:38:52.150 --> 00:38:58.079 A:middle L:90%
evaluation, you see, let's say, a 10% improvement, so
568
00:38:58.090 --> 00:39:00.579 A:middle L:90%
that's a good number. But since these numbers are
569
00:39:00.590 --> 00:39:06.170 A:middle L:90%
unreliable, that 10% may really be 1% or
570
00:39:06.179 --> 00:39:09.809 A:middle L:90%
20%. So it's not good for my evaluation purposes
571
00:39:09.820 --> 00:39:14.019 A:middle L:90%
. In contrast, uh, the solution that we
572
00:39:14.030 --> 00:39:16.269 A:middle L:90%
adopt has no modeling step, which
573
00:39:16.269 --> 00:39:22.130 A:middle L:90%
makes the procedure simple, and most importantly, we
574
00:39:22.139 --> 00:39:27.219 A:middle L:90%
try to design a method that is unbiased, so
575
00:39:27.219 --> 00:39:30.630 A:middle L:90%
it's reliable. So here's how
576
00:39:30.630 --> 00:39:36.809 A:middle L:90%
we're going to do it. Again, this is
577
00:39:36.809 --> 00:39:39.300 A:middle L:90%
the quantity we want to estimate. For simplicity, I just show
578
00:39:39.300 --> 00:39:44.019 A:middle L:90%
the, uh, case, uh, in case
579
00:39:44.030 --> 00:39:46.440 A:middle L:90%
you have done the same way better. Here we
580
00:39:46.440 --> 00:39:52.230 A:middle L:90%
have an algorithm. A key requirement in the data
581
00:39:52.239 --> 00:39:57.130 A:middle L:90%
collection is that, uh, in this
582
00:39:57.130 --> 00:40:00.329 A:middle L:90%
log, all these arms have to be
583
00:40:00.340 --> 00:40:04.860 A:middle L:90%
chosen uniformly at random. So this ensures that
584
00:40:04.869 --> 00:40:08.909 A:middle L:90%
all the candidates have equal chances. So no one
585
00:40:09.150 --> 00:40:15.820 A:middle L:90%
, no one will be starting from 2016. Um
586
00:40:16.059 --> 00:40:19.409 A:middle L:90%
, and then, if that assumption holds,
587
00:40:19.420 --> 00:40:22.019 A:middle L:90%
then we can go through the
588
00:40:22.030 --> 00:40:24.519 A:middle L:90%
data one by one. So you can look at
589
00:40:24.519 --> 00:40:30.230 A:middle L:90%
the data and then ask the algorithm which arm
590
00:40:30.230 --> 00:40:35.860 A:middle L:90%
it is going to recommend. So what we
591
00:40:35.869 --> 00:40:37.880 A:middle L:90%
have here is the recommended article, and then we can
592
00:40:37.889 --> 00:40:42.920 A:middle L:90%
compare this a-hat to the article in the log
593
00:40:44.619 --> 00:40:49.019 A:middle L:90%
. And when these two articles are the same
594
00:40:49.030 --> 00:40:51.389 A:middle L:90%
, then we call it a match, and then
595
00:40:51.400 --> 00:40:55.530 A:middle L:90%
we reveal the reward signal of that article to the algorithm
596
00:40:55.539 --> 00:41:00.460 A:middle L:90%
to allow it to learn more, and if it was
597
00:41:00.469 --> 00:41:02.849 A:middle L:90%
not a match, then we just simply ignore that
598
00:41:02.849 --> 00:41:07.639 A:middle L:90%
step. So we skip that point of the data
599
00:41:08.719 --> 00:41:12.000 A:middle L:90%
. And then finally, we add up all the
600
00:41:12.000 --> 00:41:16.880 A:middle L:90%
rewards in the matched events and normalize them. So
601
00:41:16.880 --> 00:41:21.750 A:middle L:90%
this I is the indicator function, which is one when there
602
00:41:21.760 --> 00:41:27.380 A:middle L:90%
is a match and zero when there is no match. So
603
00:41:27.389 --> 00:41:30.010 A:middle L:90%
since the arms in this data are
604
00:41:30.019 --> 00:41:34.400 A:middle L:90%
chosen uniformly at random, the probability
605
00:41:34.400 --> 00:41:38.389 A:middle L:90%
that you see a match is one over K, uh
606
00:41:38.400 --> 00:41:40.949 A:middle L:90%
, and so you see K here, which
607
00:41:40.949 --> 00:41:46.739 A:middle L:90%
is to normalize the sum. And L is the size
608
00:41:46.739 --> 00:41:52.619 A:middle L:90%
of the data. So that is the whole
609
00:41:52.630 --> 00:41:59.309 A:middle L:90%
procedure. Okay, um, so, the properties.
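NOTE
The replay procedure described above can be sketched as follows. The function name, the (context, logged_arm, reward) tuple layout, and the choose/update policy interface are assumptions for illustration; the K over L scaling follows the description in the talk and relies on the logged arms having been chosen uniformly at random.
```python
def replay_evaluate(policy, log, num_arms):
    """Estimate a policy's average per-step reward from uniformly random logs.
    log: iterable of (context, logged_arm, reward) tuples.
    policy: object with choose(context) and update(context, arm, reward).
    """
    total_reward = 0.0
    size = 0
    for context, logged_arm, reward in log:
        size += 1
        if policy.choose(context) == logged_arm:
            # Match: count the reward and reveal it so the policy can learn.
            total_reward += reward
            policy.update(context, logged_arm, reward)
        # No match: the event is skipped and the policy sees nothing.
    # A match occurs with probability 1 / K, so the sum is scaled by K / L.
    return num_arms * total_reward / size if size else 0.0
```
If the logged arms were not chosen uniformly, the 1 over K match probability no longer holds and this scaling would not be appropriate.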
610
00:41:59.320 --> 00:42:01.250 A:middle L:90%
Uh, so this method satisfies what we
611
00:42:01.250 --> 00:42:05.860 A:middle L:90%
wanted. So the first one is that
612
00:42:05.869 --> 00:42:09.889 A:middle L:90%
this estimator is unbiased, which means that if you
613
00:42:09.900 --> 00:42:15.610 A:middle L:90%
use the estimator on the algorithm, then on average
614
00:42:15.619 --> 00:42:19.880 A:middle L:90%
you get the same number as if you ran the algorithm in the real
615
00:42:19.880 --> 00:42:22.170 A:middle L:90%
system. So this is nice because you don't have
616
00:42:22.170 --> 00:42:24.050 A:middle L:90%
to run the algorithm the expensive way in the real system.
617
00:42:24.050 --> 00:42:30.739 A:middle L:90%
But still, you can get a reliable estimate. Second,
618
00:42:30.750 --> 00:42:32.869 A:middle L:90%
it's about the estimation error. So you can
619
00:42:32.869 --> 00:42:36.289 A:middle L:90%
also show that when you have more and more data,
620
00:42:36.300 --> 00:42:38.860 A:middle L:90%
the estimation error goes away. It decreases
621
00:42:38.860 --> 00:42:43.349 A:middle L:90%
to zero at this rate. So again, K is the
622
00:42:43.349 --> 00:42:45.409 A:middle L:90%
number of arms and L is the size of the data. So
623
00:42:45.409 --> 00:42:49.880 A:middle L:90%
if you have a very large L, then this term is
624
00:42:49.889 --> 00:42:53.079 A:middle L:90%
small, and that means the estimation error is very small
625
00:42:53.170 --> 00:42:57.409 A:middle L:90%
. So the second theorem gives an idea of how confident
626
00:42:57.420 --> 00:43:00.079 A:middle L:90%
the estimate is. So now I can tell my
627
00:43:00.090 --> 00:43:05.510 A:middle L:90%
CEO that with this new algorithm, I can
628
00:43:05.519 --> 00:43:10.239 A:middle L:90%
improve the system by 10%, plus or minus 1%
629
00:43:10.250 --> 00:43:15.619 A:middle L:90%
. Uh, so, yeah, so let me
630
00:43:15.619 --> 00:43:19.889 A:middle L:90%
show you something, case knowledge. Is that how
631
00:43:21.800 --> 00:43:22.960 A:middle L:90%
so? We have a lot more in the recent
632
00:43:23.409 --> 00:43:30.079 A:middle L:90%
, uh, recommendations. Um, we have to
633
00:43:30.090 --> 00:43:35.570 A:middle L:90%
evaluate, start much power with she recommends producers,
634
00:43:35.820 --> 00:43:39.329 A:middle L:90%
and then we have data for evaluation. So this
635
00:43:39.340 --> 00:43:44.590 A:middle L:90%
data satisfies the requirement that, uh, the article recommended
636
00:43:44.599 --> 00:43:50.139 A:middle L:90%
this. Uh huh. And then we use this
637
00:43:50.150 --> 00:43:53.639 A:middle L:90%
data to do offline evaluation for these policies in this
638
00:43:53.650 --> 00:43:58.440 A:middle L:90%
problem. So now we have two numbers. The
639
00:43:58.449 --> 00:44:02.610 A:middle L:90%
first is that, because we also ran these policies online
640
00:44:02.619 --> 00:44:06.159 A:middle L:90%
, we have metrics that measure what the average
641
00:44:06.170 --> 00:44:09.110 A:middle L:90%
click-through rate actually is, using real user traffic.
642
00:44:09.110 --> 00:44:13.139 A:middle L:90%
On the other hand, we have offline evaluation
643
00:44:13.139 --> 00:44:16.280 A:middle L:90%
metrics using the methodology I sketched just now. And the question is
644
00:44:16.289 --> 00:44:20.539 A:middle L:90%
whether they are close. If they are close, that means the
645
00:44:20.550 --> 00:44:24.989 A:middle L:90%
offline evaluation is reliable. So let me
646
00:44:24.989 --> 00:44:28.900 A:middle L:90%
show you some numbers. So the first plot is
647
00:44:28.909 --> 00:44:32.969 A:middle L:90%
a plot of the online click-through rate of an
648
00:44:32.980 --> 00:44:37.730 A:middle L:90%
article versus the offline estimated CTR. So
649
00:44:37.739 --> 00:44:42.039 A:middle L:90%
every point corresponds to one of the articles in a
650
00:44:42.039 --> 00:44:45.590 A:middle L:90%
10-day period, so you can see the articles here.
651
00:44:45.599 --> 00:44:51.599 A:middle L:90%
Um, and ideally, if the offline estimated CTR
652
00:44:51.610 --> 00:44:55.219 A:middle L:90%
is the same as the online CTR, then
653
00:44:55.230 --> 00:45:00.320 A:middle L:90%
every point lies right on this y = x line, but
654
00:45:00.329 --> 00:45:01.949 A:middle L:90%
the data is finite, and there is some noise.
655
00:45:01.960 --> 00:45:07.139 A:middle L:90%
So you don't actually see, uh, perfect
656
00:45:07.139 --> 00:45:09.250 A:middle L:90%
alignment, but you can see that they're very close to this y =
657
00:45:09.260 --> 00:45:14.579 A:middle L:90%
x line, meaning that this estimate is accurate, so
658
00:45:14.579 --> 00:45:17.920 A:middle L:90%
it can give you a reliable estimate without running online
659
00:45:19.219 --> 00:45:23.179 A:middle L:90%
. So here's another plot that compares
660
00:45:23.190 --> 00:45:28.840 A:middle L:90%
the daily CTR, online and offline, over
661
00:45:28.840 --> 00:45:31.239 A:middle L:90%
a 10-day period. So one curve, the
662
00:45:31.250 --> 00:45:36.409 A:middle L:90%
red one, is the online CTR, and the other one here is
663
00:45:36.420 --> 00:45:39.679 A:middle L:90%
the offline CTR, and each point corresponds to the overall CTR of
664
00:45:39.679 --> 00:45:42.519 A:middle L:90%
the whole day. You can see that there's a
665
00:45:42.530 --> 00:45:45.059 A:middle L:90%
strong correlation between these two curves. Okay, so
666
00:45:45.059 --> 00:45:50.989 A:middle L:90%
it's also equally on the table on four applicator.
667
00:45:52.900 --> 00:45:57.760 A:middle L:90%
The second question is how the estimation error of the CTR
668
00:45:57.769 --> 00:46:00.000 A:middle L:90%
decays to zero when you have more and more data.
669
00:46:00.010 --> 00:46:01.929 A:middle L:90%
So we control the size of the data
670
00:46:02.039 --> 00:46:06.800 A:middle L:90%
and look at the estimation error at different sizes.
671
00:46:07.099 --> 00:46:09.920 A:middle L:90%
And, uh, I should mention that they are normalized
672
00:46:10.260 --> 00:46:15.190 A:middle L:90%
because of business sensitivity. We are not allowed to reveal the
673
00:46:15.199 --> 00:46:17.670 A:middle L:90%
real numbers. So I multiply all these numbers
674
00:46:17.679 --> 00:46:25.460 A:middle L:90%
by a constant factor, so these numbers are not the absolute values. You
675
00:46:25.460 --> 00:46:30.139 A:middle L:90%
can see that this estimation error is not allowed to
676
00:46:30.159 --> 00:46:37.010 A:middle L:90%
feel that destination to always, so you can actually
677
00:46:37.010 --> 00:46:42.570 A:middle L:90%
see it back. So there are many extensions of
678
00:46:42.579 --> 00:46:46.849 A:middle L:90%
the considerations of them. There are improvements for when
679
00:46:46.849 --> 00:46:51.909 A:middle L:90%
you don't have access to the data into something and
680
00:46:52.269 --> 00:46:55.340 A:middle L:90%
you can see the last estimate or projection to do
681
00:46:55.349 --> 00:47:01.650 A:middle L:90%
more fancy stuff. So if you're interested,
682
00:47:01.650 --> 00:47:07.360 A:middle L:90%
I can explain more offline. Okay. So
683
00:47:07.360 --> 00:47:10.710 A:middle L:90%
far, to recap: in this offline evaluation
684
00:47:10.719 --> 00:47:15.710 A:middle L:90%
section, what we have shown is that unbiased
685
00:47:15.719 --> 00:47:19.349 A:middle L:90%
and reliable evaluation of algorithms can be
686
00:47:19.360 --> 00:47:23.050 A:middle L:90%
done from historical data without having to run a
687
00:47:23.059 --> 00:47:27.489 A:middle L:90%
real system. So you can think of it as analogous
688
00:47:27.659 --> 00:47:30.019 A:middle L:90%
to, uh, data sets in supervised learning, for instance
689
00:47:30.659 --> 00:47:37.059 A:middle L:90%
, or consultation for, uh, interesting. So
690
00:47:37.070 --> 00:47:40.510 A:middle L:90%
this is the first benchmark it's just released by that
691
00:47:40.510 --> 00:47:45.099 A:middle L:90%
philosophy, interested in this kind of problem down.
692
00:47:45.289 --> 00:47:49.489 A:middle L:90%
Uh, And the second version is coming for competition
693
00:47:49.500 --> 00:47:53.369 A:middle L:90%
. Probably fostered by hostile. And also, uh
694
00:47:53.380 --> 00:47:59.320 A:middle L:90%
, also, I send out workshop. Okay,
695
00:47:59.320 --> 00:48:01.519 A:middle L:90%
So let me show you some experiments. Um,
696
00:48:01.530 --> 00:48:06.070 A:middle L:90%
how well they work. And so I have
697
00:48:06.070 --> 00:48:10.539 A:middle L:90%
also described how to do evaluation. So
698
00:48:10.539 --> 00:48:14.719 A:middle L:90%
this section is about how to find a way to
699
00:48:14.730 --> 00:48:22.670 A:middle L:90%
evaluate the, uh So we have, uh,
700
00:48:22.679 --> 00:48:27.679 A:middle L:90%
about 20 upwards her. Uh huh. And then
701
00:48:27.690 --> 00:48:31.269 A:middle L:90%
we reduce the dimensionality of the user features, which is over 100
702
00:48:31.619 --> 00:48:36.949 A:middle L:90%
features, by principal component analysis, uh, to make the
703
00:48:36.960 --> 00:48:40.550 A:middle L:90%
features low-dimensional. Uh, and then, uh, so
704
00:48:40.559 --> 00:48:44.769 A:middle L:90%
remember, there's a delay in model updates because of
705
00:48:44.769 --> 00:48:47.780 A:middle L:90%
the system, uh, system infrastructure constraints. So
706
00:48:47.780 --> 00:48:52.739 A:middle L:90%
we also simulate that fact. In particular, we update the model every
707
00:48:52.750 --> 00:48:57.210 A:middle L:90%
five minutes. So finally use use the protected by
708
00:48:57.219 --> 00:49:00.250 A:middle L:90%
the last five minutes. Model used to model to
709
00:49:00.860 --> 00:49:05.130 A:middle L:90%
the next. Uh, hi. Would you like
710
00:49:05.659 --> 00:49:07.670 A:middle L:90%
to do a model here? They use the model
711
00:49:07.679 --> 00:49:14.599 A:middle L:90%
to use the existing time period and a model for
712
00:49:15.340 --> 00:49:20.079 A:middle L:90%
the next because it's great. And then the main
713
00:49:20.079 --> 00:49:22.250 A:middle L:90%
metric that will be compared in the next few slides
714
00:49:22.260 --> 00:49:28.900 A:middle L:90%
is the overall normalized CTR of each algorithm, uh,
715
00:49:28.909 --> 00:49:34.340 A:middle L:90%
in a bucket. So, uh, so that's where
716
00:49:34.340 --> 00:49:37.150 A:middle L:90%
you have, uh, yeah, a bucket
717
00:49:37.159 --> 00:49:40.489 A:middle L:90%
here where you can run your bandit algorithm, and then
718
00:49:40.500 --> 00:49:45.300 A:middle L:90%
in a serving bucket the model is used
719
00:49:45.300 --> 00:49:49.900 A:middle L:90%
for users in this bucket by turning off all the
720
00:49:49.900 --> 00:49:52.949 A:middle L:90%
exploration components here. So when a user comes here
721
00:49:52.960 --> 00:49:54.769 A:middle L:90%
, uh, it's been a point inside of the
722
00:49:54.780 --> 00:49:59.130 A:middle L:90%
user. All this in your pocket, Uh,
723
00:49:59.139 --> 00:50:02.159 A:middle L:90%
this story, bucket and bucket. And, uh
724
00:50:02.170 --> 00:50:07.630 A:middle L:90%
, welcome to the user that you also an exploration
725
00:50:07.630 --> 00:50:10.920 A:middle L:90%
here. Users also this pocket, then, uh
726
00:50:10.960 --> 00:50:16.190 A:middle L:90%
, Children without doing so. Because the value of
727
00:50:16.199 --> 00:50:21.480 A:middle L:90%
this started back here that are most important measures How
728
00:50:21.489 --> 00:50:24.300 A:middle L:90%
? Well, how well the model converts to run
729
00:50:27.760 --> 00:50:30.710 A:middle L:90%
. Yeah. So the first one, the first
730
00:50:30.710 --> 00:50:37.420 A:middle L:90%
comparison is between linear models and generalized linear models. So everyone's
731
00:50:37.429 --> 00:50:40.610 A:middle L:90%
related models seeking a 31 structure store equation. And
732
00:50:40.619 --> 00:50:45.730 A:middle L:90%
this is epsilon-greedy first, the left-hand side
733
00:50:45.739 --> 00:50:49.090 A:middle L:90%
, epsilon-greedy. So epsilon is the
734
00:50:49.099 --> 00:50:52.809 A:middle L:90%
probability that you choose to do random exploration. The
735
00:50:52.809 --> 00:50:57.329 A:middle L:90%
right one is UCB, where we control the amount
736
00:50:57.340 --> 00:51:00.539 A:middle L:90%
of exploration by the width of the confidence interval.
737
00:51:00.610 --> 00:51:04.820 A:middle L:90%
So basically, a larger alpha means a wider confidence interval
738
00:51:04.829 --> 00:51:09.010 A:middle L:90%
, so more alpha, more exploration. And then even
739
00:51:09.019 --> 00:51:14.429 A:middle L:90%
so you can see that all the usually all the
740
00:51:14.440 --> 00:51:17.309 A:middle L:90%
food or the green ground curved bars are much higher
741
00:51:17.309 --> 00:51:22.340 A:middle L:90%
than the blue one, meaning that, uh,
742
00:51:22.349 --> 00:51:27.349 A:middle L:90%
generalized linear models provide a better model than linear ones because
743
00:51:27.360 --> 00:51:31.579 A:middle L:90%
the clicks are binary reward signals in this problem.
744
00:51:31.760 --> 00:51:37.260 A:middle L:90%
And also, the second observation is that, comparing the
745
00:51:37.269 --> 00:51:38.809 A:middle L:90%
left-hand side and the right-hand side, you see
746
00:51:38.809 --> 00:51:42.400 A:middle L:90%
the UCB exploration is usually more efficient when you
747
00:51:42.400 --> 00:51:45.480 A:middle L:90%
have the right alpha here, which is also
748
00:51:45.480 --> 00:51:50.420 A:middle L:90%
consistent with previous work that shows UCB is in general
749
00:51:50.420 --> 00:51:54.489 A:middle L:90%
a more effective exploration strategy than epsilon-greedy exploration.
750
00:51:55.050 --> 00:52:00.840 A:middle L:90%
The second plot I'll explain quickly is
751
00:52:00.849 --> 00:52:05.789 A:middle L:90%
the comparison between Thompson sampling and UCB. So many
752
00:52:05.789 --> 00:52:07.829 A:middle L:90%
algorithms here. I'm going to ignore most of
753
00:52:07.840 --> 00:52:14.250 A:middle L:90%
them and focus on Thompson sampling here and
754
00:52:14.250 --> 00:52:16.389 A:middle L:90%
the UCB one, which is the green one here
755
00:52:16.400 --> 00:52:22.789 A:middle L:90%
. So the x-axis is the delay in minutes. So at a short delay, the UCB
756
00:52:22.789 --> 00:52:27.639 A:middle L:90%
one is still slightly better at that delay. But
757
00:52:27.650 --> 00:52:29.920 A:middle L:90%
as you increase the delay, I think Thompson
758
00:52:29.920 --> 00:52:35.280 A:middle L:90%
sampling is competitive with UCB, and more
759
00:52:35.280 --> 00:52:38.079 A:middle L:90%
importantly, with a 60-minute delay, the randomized one is more
760
00:52:38.079 --> 00:52:42.730 A:middle L:90%
robust; it doesn't seem to be affected by the delay,
761
00:52:42.739 --> 00:52:50.800 A:middle L:90%
in contrast to UCB. So, uh, to conclude the
762
00:52:50.809 --> 00:52:53.679 A:middle L:90%
first part: I showed how to use
763
00:52:53.679 --> 00:52:58.889 A:middle L:90%
contextual bandits in principle for, uh, a lot of
764
00:52:58.900 --> 00:53:02.880 A:middle L:90%
critical applications like news recommendation, ranking, or computational advertising.
765
00:53:04.150 --> 00:53:07.400 A:middle L:90%
I also showed you how to do offline evaluation
766
00:53:07.409 --> 00:53:14.260 A:middle L:90%
without implementing something in a real system. I also showed you encouraging results
767
00:53:14.269 --> 00:53:21.809 A:middle L:90%
in applications such as news recommendation, and particularly
768
00:53:21.820 --> 00:53:29.039 A:middle L:90%
I want to highlight the practice of using exploration/exploitation.
769
00:53:30.480 --> 00:53:36.809 A:middle L:90%
In the future, there are many interesting things to do: offline evaluation, many ways
770
00:53:36.820 --> 00:53:39.099 A:middle L:90%
to use non flavor. And also, when you
771
00:53:39.099 --> 00:53:45.159 A:middle L:90%
have prior knowledge devised a much better talk something along
772
00:53:45.159 --> 00:53:49.280 A:middle L:90%
that line by prior model, you can use prior
773
00:53:49.280 --> 00:53:51.610 A:middle L:90%
knowledge that way. Many other ways to find,
774
00:53:51.710 --> 00:53:55.489 A:middle L:90%
also in many various abandoned will be rectified. Reality
775
00:53:55.500 --> 00:54:01.250 A:middle L:90%
. Uh, so, uh okay, so that's
776
00:54:01.250 --> 00:54:05.760 A:middle L:90%
the, uh, the first part, on contextual
777
00:54:05.760 --> 00:54:09.139 A:middle L:90%
bandits. So we need a couple more minutes on
778
00:54:09.150 --> 00:54:15.570 A:middle L:90%
research, so I have a background in reinforcement
779
00:54:15.579 --> 00:54:21.860 A:middle L:90%
learning. Uh, so unfortunately, uh, it's
780
00:54:21.860 --> 00:54:25.199 A:middle L:90%
, uh, a much more general problem that optimizes strategies in
781
00:54:25.199 --> 00:54:29.409 A:middle L:90%
sequential decision making, which is important. So here's one example
782
00:54:29.409 --> 00:54:32.579 A:middle L:90%
that are working towards and, uh, in China
783
00:54:32.590 --> 00:54:37.570 A:middle L:90%
, so it's called a dire uh, it's a
784
00:54:37.579 --> 00:54:40.389 A:middle L:90%
system that calls the young system that says, there
785
00:54:40.400 --> 00:54:43.940 A:middle L:90%
you can hold it. We can get out of
786
00:54:44.449 --> 00:54:49.550 A:middle L:90%
China and valuable transfer the call to the person that
787
00:54:49.559 --> 00:54:52.809 A:middle L:90%
you want. So I get that you want to
788
00:54:52.820 --> 00:54:57.800 A:middle L:90%
call someone here each and then and then you say
789
00:54:57.800 --> 00:55:00.530 A:middle L:90%
something like that, Max Peter Johnson go home.
790
00:55:00.539 --> 00:55:04.289 A:middle L:90%
And then this is a sound signal and then the
791
00:55:04.300 --> 00:55:07.949 A:middle L:90%
speech recognition techniques, and then transfer to some computer
792
00:55:07.949 --> 00:55:14.639 A:middle L:90%
recommended representation to take antiviral that depending on the signal
793
00:55:14.639 --> 00:55:16.719 A:middle L:90%
and decide whether you understand the question well, even
794
00:55:16.719 --> 00:55:21.260 A:middle L:90%
understanding and transfer the call to jump it was not
795
00:55:21.260 --> 00:55:22.639 A:middle L:90%
sure. Then you can Can you confirm that you
796
00:55:22.650 --> 00:55:25.659 A:middle L:90%
really want you really want to join me and then
797
00:55:25.670 --> 00:55:31.510 A:middle L:90%
correct question. Okay, so that repeat. So
798
00:55:31.510 --> 00:55:35.449 A:middle L:90%
in this kind of process, there's a notion of state,
799
00:55:35.449 --> 00:55:37.780 A:middle L:90%
there's a notion of actions, and also, uh, and then
800
00:55:37.780 --> 00:55:40.860 A:middle L:90%
we want to design a dialog policy so
801
00:55:40.860 --> 00:55:45.449 A:middle L:90%
that the system can succeed in, uh, as short a
802
00:55:45.460 --> 00:55:46.590 A:middle L:90%
conversation as possible. So to do this,
803
00:55:46.590 --> 00:55:51.210 A:middle L:90%
we can define a reward function of minus one per response.
804
00:55:51.219 --> 00:55:54.690 A:middle L:90%
So response here, then is its success again,
805
00:55:54.699 --> 00:55:59.659 A:middle L:90%
minus 20. So by maximizing the reward, the system can
806
00:55:59.670 --> 00:56:07.719 A:middle L:90%
develop behavior that optimizes the objective. So cute
807
00:56:07.730 --> 00:56:12.949 A:middle L:90%
, teacher that that report with defined world cultures systems
808
00:56:12.960 --> 00:56:17.989 A:middle L:90%
do left trade reports seem to have a lot of
809
00:56:19.000 --> 00:56:25.449 A:middle L:90%
problems. Control problems gain, not computing Cuban introduction
810
00:56:25.460 --> 00:56:29.530 A:middle L:90%
patients. Um, so let's see that usually,
811
00:56:29.900 --> 00:56:40.469 A:middle L:90%
uh, Markov decision processes. Uh, so in my
812
00:56:40.469 --> 00:56:45.639 A:middle L:90%
dissertation I worked on efficient exploration and algorithms for solving reinforcement
813
00:56:45.639 --> 00:56:49.170 A:middle L:90%
learning in Markov decision processes. The idea
814
00:56:49.170 --> 00:56:52.900 A:middle L:90%
is to distinguish known parts from unknown parts of the dynamics. And
815
00:56:52.900 --> 00:56:57.610 A:middle L:90%
then, so if I can make that distinction, then I know
816
00:56:57.610 --> 00:57:00.579 A:middle L:90%
where I should explore, and at this stage, if I
817
00:57:00.590 --> 00:57:04.409 A:middle L:90%
know where I'm certain about the dynamics, then I can do exploitation
818
00:57:04.420 --> 00:57:07.369 A:middle L:90%
, so that would be useful for doing exploration in
819
00:57:07.380 --> 00:57:13.460 A:middle L:90%
reinforcement learning. In particular, I proposed a simplified framework
820
00:57:13.469 --> 00:57:15.000 A:middle L:90%
called, uh, knows what it knows. And
821
00:57:15.010 --> 00:57:19.969 A:middle L:90%
from this, you can devise a principled algorithm called
822
00:57:19.980 --> 00:57:23.420 A:middle L:90%
KWIK-Rmax that utilizes this, uh, principle during
823
00:57:23.420 --> 00:57:29.599 A:middle L:90%
exploration and then unifies many of the existing
824
00:57:29.610 --> 00:57:32.510 A:middle L:90%
Nothing. You want various kinds of various kinds of
825
00:57:32.510 --> 00:57:38.070 A:middle L:90%
reinforcement problems, So it's not like a slight.
826
00:57:38.079 --> 00:57:40.440 A:middle L:90%
So in the first part, I talked about bandit
827
00:57:40.449 --> 00:57:45.610 A:middle L:90%
problems, which capture Internet applications. And then, uh, so
828
00:57:45.619 --> 00:57:49.369 A:middle L:90%
, uh, briefly, reinforcement learning, in two
829
00:57:49.369 --> 00:57:52.190 A:middle L:90%
minutes, which captures many sequential decision-making problems,
830
00:57:52.199 --> 00:57:55.969 A:middle L:90%
biologics and CI. Uh, so you can see
831
00:57:55.969 --> 00:58:00.119 A:middle L:90%
that the reinforcement camps a lot of, uh,
832
00:58:00.130 --> 00:58:04.570 A:middle L:90%
my research focuses on the exploration/exploitation tradeoff
833
00:58:04.989 --> 00:58:09.650 A:middle L:90%
, evaluation and also working on the selection and additional
834
00:58:09.650 --> 00:58:14.829 A:middle L:90%
confirmation of convergence. Trade analysis. Uh, unfortunately
835
00:58:15.610 --> 00:58:20.599 A:middle L:90%
, you also time right after learning, learning.
836
00:58:22.139 --> 00:58:23.550 A:middle L:90%
So, yeah. So that's the end of my
837
00:58:23.559 --> 00:58:30.530 A:middle L:90%
talk, and I appreciate your attention. Thanks. Yeah.