WEBVTT
1
00:00:09.140 --> 00:00:12.580 A:middle L:90%
Thanks for the introduction. So I hope everybody can
2
00:00:12.580 --> 00:00:15.240 A:middle L:90%
hear me; please do say if not. And
3
00:00:15.250 --> 00:00:18.129 A:middle L:90%
I hope this is visible. It is? Great.
4
00:00:18.140 --> 00:00:20.690 A:middle L:90%
Thanks. So, as I said, I'm
5
00:00:20.690 --> 00:00:22.609 A:middle L:90%
on today, I'll be talking about dynamical processes on
6
00:00:22.609 --> 00:00:27.410 A:middle L:90%
large networks today. And first, I hope
7
00:00:27.410 --> 00:00:29.339 A:middle L:90%
I don't have to convince you that networks
8
00:00:29.339 --> 00:00:31.949 A:middle L:90%
are really everywhere. For example, the social
9
00:00:31.949 --> 00:00:34.420 A:middle L:90%
network, for example, where people are friends
10
00:00:34.429 --> 00:00:36.579 A:middle L:90%
of each other. And you can see the Facebook
11
00:00:36.579 --> 00:00:39.090 A:middle L:90%
network here in 2010. This is a very interesting network.
12
00:00:39.090 --> 00:00:43.210 A:middle L:90%
It's a human disease network where nodes are diseases and edges
13
00:00:43.210 --> 00:00:45.969 A:middle L:90%
are between diseases that share genes. So the nice thing
14
00:00:45.969 --> 00:00:48.250 A:middle L:90%
about networks is that they give you the local structure
15
00:00:48.250 --> 00:00:50.079 A:middle L:90%
as well as the global information. Right? Like
16
00:00:50.090 --> 00:00:52.609 A:middle L:90%
you can quickly see how the social network is distributed
17
00:00:52.619 --> 00:00:55.270 A:middle L:90%
throughout the world. You can see how diseases interact
18
00:00:55.280 --> 00:00:58.039 A:middle L:90%
and how they cluster, and how similar diseases share similar
19
00:00:58.039 --> 00:01:02.600 A:middle L:90%
genes. So, understanding that, what I
20
00:01:02.600 --> 00:01:04.239 A:middle L:90%
wish to convey in this talk is that dynamical processes
21
00:01:04.239 --> 00:01:07.950 A:middle L:90%
over networks are also everywhere. And what I mean
22
00:01:07.950 --> 00:01:10.480 A:middle L:90%
by dynamical process will be made clear soon. But
23
00:01:10.480 --> 00:01:12.530 A:middle L:90%
essentially it is some kind of propagation or spreading process on
24
00:01:12.530 --> 00:01:17.090 A:middle L:90%
the network. So why do we care? So
25
00:01:17.090 --> 00:01:19.120 A:middle L:90%
why do we care about these dynamical processes? We
26
00:01:19.120 --> 00:01:21.159 A:middle L:90%
care because they occur in lots of different domains and
27
00:01:21.159 --> 00:01:25.049 A:middle L:90%
lots of fields. For example, online information diffusion is
28
00:01:25.049 --> 00:01:27.079 A:middle L:90%
a kind of dynamical process. Viral marketing, where people
29
00:01:27.079 --> 00:01:30.500 A:middle L:90%
recommend products to each other on Amazon, on Twitter,
30
00:01:30.500 --> 00:01:34.370 A:middle L:90%
and sales propagate, is also a dynamical process. Then
31
00:01:34.370 --> 00:01:37.700 A:middle L:90%
they are also in cybersecurity, where viruses propagate, and in
32
00:01:37.700 --> 00:01:38.879 A:middle L:90%
epidemiology and public health. It's a big application area
33
00:01:38.879 --> 00:01:42.170 A:middle L:90%
for those kinds of processes and so on. So
34
00:01:42.170 --> 00:01:44.859 A:middle L:90%
I'll try to give you a sense of two main
35
00:01:44.859 --> 00:01:47.079 A:middle L:90%
application areas in my talk: one is epidemiology
36
00:01:47.079 --> 00:01:49.109 A:middle L:90%
and one is social media. So in epidemiology, the
37
00:01:49.120 --> 00:01:53.060 A:middle L:90%
dynamical process is essentially disease spreading over contact networks, right
38
00:01:53.069 --> 00:01:55.170 A:middle L:90%
? Like for example in this contact network, you
39
00:01:55.170 --> 00:01:59.239 A:middle L:90%
have this person infected with some disease, say flu; he is ill and will
40
00:01:59.250 --> 00:02:00.859 A:middle L:90%
probably spread it to one of his neighbors. And what
41
00:02:00.859 --> 00:02:02.950 A:middle L:90%
I mean by contact networks is essentially people who come
42
00:02:02.950 --> 00:02:05.530 A:middle L:90%
in contact with each other. Right. Who can
43
00:02:05.530 --> 00:02:08.469 A:middle L:90%
actually spread the disease to one another. So yeah
44
00:02:08.479 --> 00:02:10.620 A:middle L:90%
, happens. This is a very interesting network actually
45
00:02:10.620 --> 00:02:14.490 A:middle L:90%
. It was the american general Health. It was
46
00:02:14.500 --> 00:02:16.050 A:middle L:90%
this is CDC data; CDC is the Center for
47
00:02:16.050 --> 00:02:19.219 A:middle L:90%
Disease Control in Atlanta. And this is the
48
00:02:19.229 --> 00:02:22.479 A:middle L:90%
visualization of the first 35 tuberculosis patients. So what
49
00:02:22.479 --> 00:02:23.645 A:middle L:90%
it does is it shows you how patient zero
50
00:02:23.645 --> 00:02:25.865 A:middle L:90%
was there and how he or she actually spread
51
00:02:25.865 --> 00:02:29.745 A:middle L:90%
the disease to other people
52
00:02:29.754 --> 00:02:30.514 A:middle L:90%
in the population. And by the way, the
53
00:02:30.514 --> 00:02:34.685 A:middle L:90%
gray ones denote people, cases where it's fatal,
54
00:02:34.685 --> 00:02:36.525 A:middle L:90%
and the big ones denote those where it's not.
55
00:02:36.534 --> 00:02:38.064 A:middle L:90%
So you can quickly realize which nodes you
56
00:02:38.064 --> 00:02:40.935 A:middle L:90%
should have immunized, which helps to control the disease
57
00:02:40.935 --> 00:02:45.914 A:middle L:90%
for example. So just to give you an example
58
00:02:45.914 --> 00:02:46.905 A:middle L:90%
of the type of questions we try to answer using
59
00:02:46.905 --> 00:02:50.955 A:middle L:90%
such terminology and abstraction: this is one
60
00:02:50.955 --> 00:02:53.495 A:middle L:90%
of my works, which I did with people at
61
00:02:53.495 --> 00:02:55.104 A:middle L:90%
Michigan. Each circle is a hospital,
62
00:02:55.104 --> 00:03:00.425 A:middle L:90%
and this is like 3,000 hospitals across the U
63
00:03:00.425 --> 00:03:00.525 A:middle L:90%
. S. in the U. S. Medicare
64
00:03:00.534 --> 00:03:04.104 A:middle L:90%
network. And this is like more than 30,000
65
00:03:04.104 --> 00:03:06.955 A:middle L:90%
patient transports. So the question here really was that
66
00:03:06.965 --> 00:03:08.004 A:middle L:90%
we had some k units of some infection control
67
00:03:08.004 --> 00:03:12.055 A:middle L:90%
resource. And we wanted to just decide how to
68
00:03:12.055 --> 00:03:15.294 A:middle L:90%
spread them out across the hospitals to minimize
69
00:03:15.294 --> 00:03:17.314 A:middle L:90%
the patients infected, right? So like we could
70
00:03:17.324 --> 00:03:21.405 A:middle L:90%
, so we developed an algorithm. So this is
71
00:03:21.405 --> 00:03:23.194 A:middle L:90%
the current practice, you can see like the red
72
00:03:23.194 --> 00:03:24.995 A:middle L:90%
ones denote the hospitals that are infected, and you
73
00:03:24.995 --> 00:03:28.125 A:middle L:90%
can see our method has almost six
74
00:03:28.125 --> 00:03:30.925 A:middle L:90%
times fewer hospitals infected. So that's the nice
75
00:03:30.925 --> 00:03:32.344 A:middle L:90%
thing, right? So these kinds of abstractions
76
00:03:32.344 --> 00:03:36.740 A:middle L:90%
help you solve such real problems. So the
77
00:03:36.740 --> 00:03:38.439 A:middle L:90%
second application area, which I want to convey, is
78
00:03:38.439 --> 00:03:40.740 A:middle L:90%
online diffusion, which is information diffusion in the social
79
00:03:40.750 --> 00:03:44.110 A:middle L:90%
sphere. For example, this is just a snapshot
80
00:03:44.110 --> 00:03:46.259 A:middle L:90%
of the startups and companies which are in the social
81
00:03:46.259 --> 00:03:50.120 A:middle L:90%
sphere in like 2007. So Facebook, Twitter,
82
00:03:50.120 --> 00:03:52.330 A:middle L:90%
and LinkedIn are already really big companies with a lot
83
00:03:52.330 --> 00:03:54.349 A:middle L:90%
of revenue and also projected earnings. So it's a
84
00:03:54.349 --> 00:03:57.300 A:middle L:90%
big, it has a huge economic impact as well
85
00:03:57.310 --> 00:04:00.620 A:middle L:90%
. So for example, in social viral marketing
86
00:04:00.620 --> 00:04:01.539 A:middle L:90%
, what would be the scenario? So you can
87
00:04:01.550 --> 00:04:04.319 A:middle L:90%
think of this as a mega celebrity who has thousands
88
00:04:04.319 --> 00:04:08.550 A:middle L:90%
of followers, all these little birds and suppose the
89
00:04:08.550 --> 00:04:10.419 A:middle L:90%
celebrity says "buy X", either because X
90
00:04:10.419 --> 00:04:12.610 A:middle L:90%
has paid him or her, or he says it of
91
00:04:12.610 --> 00:04:15.040 A:middle L:90%
his own volition, but the result is that the followers go ahead
92
00:04:15.040 --> 00:04:18.550 A:middle L:90%
and buy something, and everyone makes money. So
93
00:04:18.560 --> 00:04:21.420 A:middle L:90%
this is essentially the premise of social media marketing
94
00:04:21.500 --> 00:04:25.360 A:middle L:90%
. And of course, given the events of the
95
00:04:25.360 --> 00:04:27.970 A:middle L:90%
past year or so, you might imagine that
96
00:04:27.970 --> 00:04:30.759 A:middle L:90%
social networks and these dynamical processes can be used
97
00:04:30.759 --> 00:04:32.620 A:middle L:90%
for collaborative action of some sort, for changing the world
98
00:04:32.740 --> 00:04:36.569 A:middle L:90%
, right? So what I also want to
99
00:04:36.569 --> 00:04:40.370 A:middle L:90%
convey is that you have different settings, multiple really
100
00:04:40.370 --> 00:04:43.149 A:middle L:90%
high-impact settings, but also similar questions, or similar core
101
00:04:43.149 --> 00:04:45.829 A:middle L:90%
questions arising in different areas. For example, in
102
00:04:45.829 --> 00:04:47.649 A:middle L:90%
the social media setting, it can be like
103
00:04:47.740 --> 00:04:50.250 A:middle L:90%
squashing rumors, right? Like somebody spreads some bad rumor
104
00:04:50.250 --> 00:04:53.389 A:middle L:90%
about untrue things on Twitter. How do you squash
105
00:04:53.389 --> 00:04:55.779 A:middle L:90%
them? How do you seed the true information on
106
00:04:55.790 --> 00:04:58.389 A:middle L:90%
different people in the network? How do opinions
107
00:04:58.389 --> 00:05:00.050 A:middle L:90%
spread? So how do rumors spread might change into
108
00:05:00.050 --> 00:05:03.689 A:middle L:90%
epidemic spreading in an epidemiological setting. And how do opinions
109
00:05:03.689 --> 00:05:06.370 A:middle L:90%
spread on Facebook or in a group:
110
00:05:06.370 --> 00:05:09.089 A:middle L:90%
a similar question can be asked. And how do
111
00:05:09.089 --> 00:05:12.230 A:middle L:90%
products or viruses spread over, say, a contact network or
112
00:05:12.240 --> 00:05:14.920 A:middle L:90%
influence network? And how to market better in the Twitter
113
00:05:14.920 --> 00:05:16.500 A:middle L:90%
setting can also be how to transmit software patches
114
00:05:16.509 --> 00:05:18.629 A:middle L:90%
. For example, Windows is working on ways to
115
00:05:18.639 --> 00:05:21.410 A:middle L:90%
transmit software patches as efficiently as possible to
116
00:05:21.410 --> 00:05:26.689 A:middle L:90%
prevent attacks. So again, high impact with
117
00:05:26.699 --> 00:05:30.139 A:middle L:90%
multiple settings. So what's the research theme? So
118
00:05:30.139 --> 00:05:31.540 A:middle L:90%
my research has really been concentrated on these three main
119
00:05:31.540 --> 00:05:34.610 A:middle L:90%
areas. Essentially, there's the
120
00:05:34.620 --> 00:05:38.069 A:middle L:90%
data which is the data about the large real world
121
00:05:38.079 --> 00:05:42.410 A:middle L:90%
applications and processes, and you build models out of
122
00:05:42.410 --> 00:05:44.829 A:middle L:90%
it and then you analyze the models which is the
123
00:05:44.829 --> 00:05:46.839 A:middle L:90%
analysis and understanding part. And once you do the
124
00:05:46.839 --> 00:05:49.879 A:middle L:90%
analysis, the next part is actually using the understanding
125
00:05:49.879 --> 00:05:54.399 A:middle L:90%
from the data to actually develop policy and action on
126
00:05:54.399 --> 00:05:57.379 A:middle L:90%
these processes. So for example, how would this look
127
00:05:57.389 --> 00:06:00.459 A:middle L:90%
in an epidemiological setting? For example, in the public
128
00:06:00.459 --> 00:06:01.129 A:middle L:90%
health setting, this would be modeling the number of patient
129
00:06:01.129 --> 00:06:03.569 A:middle L:90%
transports, right? How diseases spread among patients
130
00:06:03.569 --> 00:06:06.660 A:middle L:90%
in hospitals. Then the analysis part would be like
131
00:06:06.689 --> 00:06:09.800 A:middle L:90%
analyzing the epidemic models to say when
132
00:06:09.800 --> 00:06:12.439 A:middle L:90%
an epidemic happens. And the policy and action part
133
00:06:12.449 --> 00:06:15.240 A:middle L:90%
would be essentially how to control outbreaks. I mean
134
00:06:15.250 --> 00:06:16.149 A:middle L:90%
, so once you have understood how the epidemics happened
135
00:06:16.149 --> 00:06:18.529 A:middle L:90%
, you can use that information to actually control the
136
00:06:18.529 --> 00:06:24.339 A:middle L:90%
outbreaks, right, by say distributing disinfectant. And similarly,
137
00:06:24.350 --> 00:06:26.560 A:middle L:90%
in the social media setting, you can think of
138
00:06:26.569 --> 00:06:29.120 A:middle L:90%
the data as modeling tweets spreading,
139
00:06:29.129 --> 00:06:31.709 A:middle L:90%
people retweeting information or topics or some
140
00:06:31.720 --> 00:06:35.220 A:middle L:90%
interesting news item. And the analysis would be: what
141
00:06:35.220 --> 00:06:38.069 A:middle L:90%
would be the number of cascades in the future; can you predict it
142
00:06:38.069 --> 00:06:39.959 A:middle L:90%
? Can you build a model for it? And essentially
143
00:06:39.959 --> 00:06:41.879 A:middle L:90%
then using it for settings like how to market better
144
00:06:41.889 --> 00:06:46.910 A:middle L:90%
? Because once you have that are right. So
145
00:06:46.910 --> 00:06:49.029 A:middle L:90%
in this talk, I'll try to give you three
146
00:06:49.040 --> 00:06:51.720 A:middle L:90%
concrete questions which we are trying to answer. One of them
147
00:06:51.720 --> 00:06:54.100 A:middle L:90%
would be in the analysis part, which is like
148
00:06:54.100 --> 00:06:56.670 A:middle L:90%
given propagation models, can you actually predict when an
149
00:06:56.670 --> 00:06:59.730 A:middle L:90%
epidemic happens? So these are really well understood,
150
00:06:59.740 --> 00:07:02.319 A:middle L:90%
well-established propagation models in disease spreading, for example
151
00:07:02.329 --> 00:07:04.709 A:middle L:90%
. The second question would be in the policy and action
152
00:07:04.709 --> 00:07:08.220 A:middle L:90%
part: once you have understood how these epidemics happen
153
00:07:08.230 --> 00:07:11.089 A:middle L:90%
, can you immunize and control these outbreaks better? So
154
00:07:11.100 --> 00:07:13.920 A:middle L:90%
that's a bit more algorithmic, right?
155
00:07:13.930 --> 00:07:16.170 A:middle L:90%
Because it's like you're developing algorithms to control the
156
00:07:16.180 --> 00:07:18.939 A:middle L:90%
epidemics. And the final thing I'll try to spend
157
00:07:18.939 --> 00:07:21.139 A:middle L:90%
some time on is how hashtags spread, for example
158
00:07:21.149 --> 00:07:24.949 A:middle L:90%
, to give you a flavor of a different application
159
00:07:24.959 --> 00:07:27.139 A:middle L:90%
domain. Here, the data is from Twitter, where tweets
160
00:07:27.139 --> 00:07:29.930 A:middle L:90%
are spreading among the people active on Twitter and you
161
00:07:29.930 --> 00:07:31.110 A:middle L:90%
want to understand how different topics spread; hashtags are just like
162
00:07:31.110 --> 00:07:34.209 A:middle L:90%
topics attached to the tweets and you want to understand
163
00:07:34.220 --> 00:07:39.300 A:middle L:90%
how different topics spread. Right? So as I
164
00:07:39.300 --> 00:07:41.069 A:middle L:90%
said, this is the outline of the talk. First,
165
00:07:41.069 --> 00:07:43.050 A:middle L:90%
I'll go over the theoretical part, which is epidemics:
166
00:07:43.050 --> 00:07:45.839 A:middle L:90%
what happens? Then the action part, which is how to immunize;
167
00:07:45.850 --> 00:07:47.899 A:middle L:90%
then learning some models from the Twitter data. And
168
00:07:47.910 --> 00:07:49.879 A:middle L:90%
if time permits, I'll try to cover some other
169
00:07:49.889 --> 00:07:53.759 A:middle L:90%
other work which I have been interested in and doing.
170
00:07:53.769 --> 00:07:59.339 A:middle L:90%
So, in epidemic spreading, the fundamental question is
171
00:07:59.339 --> 00:08:01.420 A:middle L:90%
when does an epidemic happen? So imagine
172
00:08:01.420 --> 00:08:05.610 A:middle L:90%
you have a strong virus on this contact network and
173
00:08:05.620 --> 00:08:07.550 A:middle L:90%
probably this guy gets infected, and he spreads
174
00:08:07.550 --> 00:08:11.529 A:middle L:90%
the infection to his neighbors and then so slowly because
175
00:08:11.529 --> 00:08:13.379 A:middle L:90%
the virus is so virulent and strong, it spreads
176
00:08:13.379 --> 00:08:15.230 A:middle L:90%
to everyone in the network and you have a gigantic
177
00:08:15.230 --> 00:08:16.430 A:middle L:90%
epidemic. So almost everybody in the network is
178
00:08:16.430 --> 00:08:20.709 A:middle L:90%
infected, right? Suppose now you have a
179
00:08:20.709 --> 00:08:22.319 A:middle L:90%
weak virus. You would imagine that the infection
180
00:08:22.319 --> 00:08:24.199 A:middle L:90%
would spread just to probably a couple of people and
181
00:08:24.199 --> 00:08:26.620 A:middle L:90%
it dies out, right? And so the footprint
182
00:08:26.620 --> 00:08:28.470 A:middle L:90%
is small, the number of infected people at the
183
00:08:28.470 --> 00:08:31.419 A:middle L:90%
end is small. So really you have these two
184
00:08:31.429 --> 00:08:35.850 A:middle L:90%
regimes of the epidemic and you want to understand what
185
00:08:35.850 --> 00:08:37.759 A:middle L:90%
separates them. So more concretely suppose you have this
186
00:08:37.769 --> 00:08:41.379 A:middle L:90%
number of infected versus time. This is just the
187
00:08:41.379 --> 00:08:43.759 A:middle L:90%
number of infections per unit time and this is the
188
00:08:43.759 --> 00:08:46.649 A:middle L:90%
"above" regime, which is the epidemic regime,
189
00:08:46.649 --> 00:08:48.799 A:middle L:90%
where a lot of people get infected. And you
190
00:08:48.799 --> 00:08:50.629 A:middle L:90%
also have a "below" regime, which is extinction.
191
00:08:52.039 --> 00:08:54.100 A:middle L:90%
And essentially the question is can you find a condition
192
00:08:54.100 --> 00:09:00.470 A:middle L:90%
which separates these two regimes? So right. So
193
00:09:00.480 --> 00:09:01.490 A:middle L:90%
again, just to reiterate you are given the epidemic
194
00:09:01.490 --> 00:09:03.120 A:middle L:90%
model. So of course you need to assume a
195
00:09:03.120 --> 00:09:05.460 A:middle L:90%
model here, which you have learned from the data
196
00:09:05.460 --> 00:09:09.490 A:middle L:90%
analysis, and then you have the virus and an underlying
197
00:09:09.490 --> 00:09:11.049 A:middle L:90%
graph which is essentially the contact network on which the
198
00:09:11.049 --> 00:09:13.399 A:middle L:90%
virus is spreading. And you want to find a
199
00:09:13.399 --> 00:09:16.889 A:middle L:90%
condition for virus extinction. So I call this
200
00:09:16.889 --> 00:09:18.179 A:middle L:90%
the static version right now because the graph we assume
201
00:09:18.179 --> 00:09:20.509 A:middle L:90%
is static, it doesn't change. So essentially it's
202
00:09:20.509 --> 00:09:24.990 A:middle L:90%
the same given graph that you have. So,
203
00:09:26.000 --> 00:09:26.490 A:middle L:90%
, of course, this is a fundamental question. You might
204
00:09:26.490 --> 00:09:28.309 A:middle L:90%
think that is interesting in itself, but why is
205
00:09:28.309 --> 00:09:31.649 A:middle L:90%
it important? Right? So it's important for
206
00:09:31.659 --> 00:09:33.080 A:middle L:90%
, many reasons. So one of them is that
207
00:09:33.090 --> 00:09:35.220 A:middle L:90%
it can accelerate simulations because these simulations are really expensive
208
00:09:35.220 --> 00:09:37.850 A:middle L:90%
and are done on a lot of different machines, using
209
00:09:37.860 --> 00:09:41.269 A:middle L:90%
really huge contact networks and so on.
210
00:09:41.279 --> 00:09:43.049 A:middle L:90%
So it would be really nice to actually be able
211
00:09:43.049 --> 00:09:45.409 A:middle L:90%
to predict analytically, with certainty, what happens,
212
00:09:45.409 --> 00:09:46.309 A:middle L:90%
right? So if a simulation will
213
00:09:46.309 --> 00:09:48.509 A:middle L:90%
lead to a condition where the epidemic doesn't happen,
214
00:09:48.509 --> 00:09:50.590 A:middle L:90%
you don't really need to simulate right? It probably
215
00:09:50.590 --> 00:09:54.440 A:middle L:90%
is not that useful. And also forecasting what-if
216
00:09:54.440 --> 00:09:56.039 A:middle L:90%
scenarios: what if the virus was two times
217
00:09:56.039 --> 00:09:58.399 A:middle L:90%
stronger, or half as strong? So what will
218
00:09:58.399 --> 00:10:01.120 A:middle L:90%
happen? What will change? How will the distribution of
219
00:10:01.120 --> 00:10:03.379 A:middle L:90%
the epidemic change? And so on. And
220
00:10:03.379 --> 00:10:05.120 A:middle L:90%
finally, as I'll show later in the talk as well
221
00:10:05.120 --> 00:10:07.350 A:middle L:90%
, it's a great handle to manipulate the spreading which
222
00:10:07.350 --> 00:10:09.440 A:middle L:90%
is controlling outbreaks, or maximizing the spreading, like
223
00:10:09.450 --> 00:10:15.500 A:middle L:90%
for example, maximizing collaboration. So in this part
224
00:10:15.509 --> 00:10:16.730 A:middle L:90%
, the outline is essentially I'll try to give you
225
00:10:16.730 --> 00:10:20.470 A:middle L:90%
quickly a bit of background of the epidemic models.
226
00:10:20.470 --> 00:10:22.820 A:middle L:90%
And then the result and the intuition on
227
00:10:22.820 --> 00:10:26.159 A:middle L:90%
static graphs, and some ideas of the
228
00:10:26.169 --> 00:10:28.090 A:middle L:90%
proof. Of course I won't be able to
229
00:10:28.090 --> 00:10:28.870 A:middle L:90%
give you the full proof, but I'll try to
230
00:10:28.870 --> 00:10:31.590 A:middle L:90%
give you a sense of what we did. And
231
00:10:31.590 --> 00:10:33.049 A:middle L:90%
as a bonus using similar methodology, you can even
232
00:10:33.049 --> 00:10:37.274 A:middle L:90%
get powerful results in other, different areas, like what if
233
00:10:37.274 --> 00:10:37.965 A:middle L:90%
the graphs were changing over time, for example,
234
00:10:37.975 --> 00:10:41.284 A:middle L:90%
which is the more realistic scenario. Right? And
235
00:10:41.284 --> 00:10:45.375 A:middle L:90%
also in the domain of competing viruses. So let's
236
00:10:45.375 --> 00:10:48.034 A:middle L:90%
go. So this is the background. So SIR
237
00:10:48.034 --> 00:10:48.845 A:middle L:90%
is essentially a very simple model, one of
238
00:10:48.845 --> 00:10:52.575 A:middle L:90%
the most common basic epidemic models. It's
239
00:10:52.575 --> 00:10:56.695 A:middle L:90%
called the susceptible-infected-recovered model, which models
240
00:10:56.695 --> 00:10:58.705 A:middle L:90%
immunity like the one you gain in mumps: once you get
241
00:10:58.705 --> 00:11:01.105 A:middle L:90%
mumps, you'll never get it again in your life.
242
00:11:01.115 --> 00:11:03.975 A:middle L:90%
So the assumption here is that each node
243
00:11:03.975 --> 00:11:07.315 A:middle L:90%
in the network is essentially in one of three states: one
244
00:11:07.315 --> 00:11:09.335 A:middle L:90%
of them is susceptible, which just means healthy.
245
00:11:09.345 --> 00:11:11.264 A:middle L:90%
One of them is infected, you are infected with
246
00:11:11.264 --> 00:11:15.215 A:middle L:90%
the virus; and the third one is removed: either, fortunately,
247
00:11:15.225 --> 00:11:18.274 A:middle L:90%
you can't get infected again, or unfortunately you passed
248
00:11:18.274 --> 00:11:20.495 A:middle L:90%
away. So this is represented by the
249
00:11:20.495 --> 00:11:22.184 A:middle L:90%
state diagram here. So you can think of this
250
00:11:22.184 --> 00:11:26.455 A:middle L:90%
as the susceptible state S, then I and R. So
251
00:11:26.465 --> 00:11:28.085 A:middle L:90%
what are these parameters? So you assume that the
252
00:11:28.095 --> 00:11:31.554 A:middle L:90%
virus spreads on the graph in this way. For example
253
00:11:31.554 --> 00:11:33.745 A:middle L:90%
, here I have shown three snapshots of the network
254
00:11:33.754 --> 00:11:35.085 A:middle L:90%
. So in the first snapshot this guy has been
255
00:11:35.085 --> 00:11:37.424 A:middle L:90%
infected, and he spreads the virus to each of his
256
00:11:37.424 --> 00:11:41.054 A:middle L:90%
neighbors independently with probability beta. So that's an assumption
257
00:11:41.054 --> 00:11:43.445 A:middle L:90%
we make that the virus is essentially spreading independently on
258
00:11:43.445 --> 00:11:46.945 A:middle L:90%
the edges from an infected person. So sometime in
259
00:11:46.945 --> 00:11:50.539 A:middle L:90%
the future, say t equals two, this guy spreads the
260
00:11:50.539 --> 00:11:52.500 A:middle L:90%
virus to one of his neighbors. At the same
261
00:11:52.500 --> 00:11:54.799 A:middle L:90%
time there is a competing process and the competing process
262
00:11:54.799 --> 00:11:58.129 A:middle L:90%
is the curing rate, the probability that an infected person
263
00:11:58.139 --> 00:12:01.600 A:middle L:90%
cures themselves. Right? And that's delta. So
264
00:12:01.610 --> 00:12:05.320 A:middle L:90%
say for time t equals three, this guy cures
265
00:12:05.320 --> 00:12:07.700 A:middle L:90%
himself, and this person is infected. So
266
00:12:07.700 --> 00:12:09.149 A:middle L:90%
note that the epidemic has died out, right?
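NOTE
Editorial sketch, not part of the talk: a minimal discrete-time simulation of the SIR process just described, assuming one infection attempt per infected-susceptible edge per step (probability beta) and one cure attempt per infected node per step (probability delta). The names simulate_sir and footprint and the dict-of-lists graph are hypothetical choices for illustration only.
```python
import random
def simulate_sir(adjacency, seed_node, beta, delta, rng, max_steps=1000):
    """Run one SIR cascade; adjacency maps each node to a list of neighbors."""
    state = {node: "S" for node in adjacency}  # everyone starts susceptible
    state[seed_node] = "I"                     # except the initially infected node
    for _ in range(max_steps):
        infected = [n for n, s in state.items() if s == "I"]
        if not infected:
            break  # no infected nodes left: the epidemic has died out
        newly_infected = set()
        for node in infected:
            for neighbor in adjacency[node]:
                # each edge transmits independently with probability beta
                if state[neighbor] == "S" and rng.random() < beta:
                    newly_infected.add(neighbor)
        for node in infected:
            # competing process: each infected node is cured with probability delta
            if rng.random() < delta:
                state[node] = "R"
        for node in newly_infected:
            state[node] = "I"
    return state
def footprint(state):
    """Number of nodes that were ever infected (currently I or already R)."""
    return sum(1 for s in state.values() if s != "S")
```
Calling simulate_sir repeatedly with random.Random(seed) objects and averaging footprint gives a rough picture of whether a given beta and delta put the cascade in the large or the small regime on a particular graph.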
267
00:12:09.159 --> 00:12:11.509 A:middle L:90%
There's no way the epidemic can spread. There are
268
00:12:11.509 --> 00:12:13.669 A:middle L:90%
only two people who have been affected by the
269
00:12:13.669 --> 00:12:16.059 A:middle L:90%
virus, and that's it, because there are no other neighbors for
270
00:12:16.059 --> 00:12:20.120 A:middle L:90%
this person. So essentially we want to identify
271
00:12:20.120 --> 00:12:24.700 A:middle L:90%
when this happens in a general way. As
272
00:12:24.700 --> 00:12:26.029 A:middle L:90%
you can imagine there are a lot of different epidemic
273
00:12:26.029 --> 00:12:30.320 A:middle L:90%
models. I call them virus propagation models
274
00:12:30.320 --> 00:12:33.759 A:middle L:90%
in this talk. And so one variant of
275
00:12:33.759 --> 00:12:35.649 A:middle L:90%
the SIR model, which is also a very popular model, is
276
00:12:35.649 --> 00:12:37.500 A:middle L:90%
the flu-like model. In this you don't recover;
277
00:12:37.509 --> 00:12:41.220 A:middle L:90%
you don't get any immunity; you become susceptible again.
278
00:12:41.230 --> 00:12:45.100 A:middle L:90%
So there's the SIRS model, where you have temporary
279
00:12:45.100 --> 00:12:46.940 A:middle L:90%
immunity, like pertussis, which is whooping cough; and the SEIR
280
00:12:46.940 --> 00:12:50.580 A:middle L:90%
models, with virus incubation and all those things
281
00:12:50.679 --> 00:12:52.250 A:middle L:90%
And the underlying contact network is essentially who can infect whom.
282
00:12:52.539 --> 00:12:56.279 A:middle L:90%
Okay, so again, so this is a very
283
00:12:56.279 --> 00:13:00.419 A:middle L:90%
old problem as you can think of because it's very
284
00:13:00.429 --> 00:13:03.409 A:middle L:90%
easy to state, and a lot of work
285
00:13:03.409 --> 00:13:03.659 A:middle L:90%
has been done on it, right? So some of
286
00:13:03.659 --> 00:13:07.679 A:middle L:90%
the papers you can see are from '69, '91, and
287
00:13:07.799 --> 00:13:09.590 A:middle L:90%
so on. But really the key thing to take
288
00:13:09.590 --> 00:13:13.509 A:middle L:90%
away here is that all are about structured topologies, where
289
00:13:13.570 --> 00:13:15.830 A:middle L:90%
the topology is given; it's all assumed. So it can
290
00:13:15.830 --> 00:13:18.590 A:middle L:90%
be like fully connected cliques, where everybody is connected
291
00:13:18.590 --> 00:13:22.470 A:middle L:90%
to everybody else, or block diagonals, or hierarchies, where the
292
00:13:22.480 --> 00:13:26.539 A:middle L:90%
population is distributed hierarchically, or random graphs; or
293
00:13:26.539 --> 00:13:28.980 A:middle L:90%
they are about specific virus propagation models: they
294
00:13:28.980 --> 00:13:31.789 A:middle L:90%
assume the virus propagation model to have some specific structure;
295
00:13:31.909 --> 00:13:33.860 A:middle L:90%
and/or static graphs, where the graphs don't change.
296
00:13:33.980 --> 00:13:37.710 A:middle L:90%
So in this talk, I'll try to generalize in all
297
00:13:37.710 --> 00:13:41.399 A:middle L:90%
these three directions, right? So
298
00:13:41.409 --> 00:13:43.059 A:middle L:90%
yeah, what would the answer look like? Right
299
00:13:43.070 --> 00:13:46.580 A:middle L:90%
. What should the answer depend on? So because the
300
00:13:46.580 --> 00:13:48.600 A:middle L:90%
inputs to the problem are essentially the graph and the virus
301
00:13:48.600 --> 00:13:50.320 A:middle L:90%
propagation model. It's reasonable to assume that it should
302
00:13:50.320 --> 00:13:52.120 A:middle L:90%
depend on both of them. Right? That if
303
00:13:52.120 --> 00:13:54.879 A:middle L:90%
you change the graph, the answer should change. If
304
00:13:54.879 --> 00:13:56.539 A:middle L:90%
you change the model, the answer should change again
305
00:13:56.549 --> 00:14:00.279 A:middle L:90%
. But how? It's clear that because it's
306
00:14:00.279 --> 00:14:01.600 A:middle L:90%
a spreading process, there should be some connectivity metric
307
00:14:01.600 --> 00:14:05.009 A:middle L:90%
of the graph, right? So how should
308
00:14:05.009 --> 00:14:07.809 A:middle L:90%
the graph play a role? Should it be average degree, or expected
309
00:14:07.809 --> 00:14:09.860 A:middle L:90%
degree, or max degree, or the diameter? All these
310
00:14:09.860 --> 00:14:13.659 A:middle L:90%
are connectivity metrics. And for the virus propagation model
311
00:14:13.669 --> 00:14:15.570 A:middle L:90%
, which parameter is important? For example,
312
00:14:15.570 --> 00:14:16.379 A:middle L:90%
in the SIR model, you saw beta and delta,
313
00:14:16.379 --> 00:14:18.779 A:middle L:90%
two parameters. Which one should be important? Are beta
314
00:14:18.779 --> 00:14:22.570 A:middle L:90%
and delta both important, or not? And finally
315
00:14:22.570 --> 00:14:24.389 A:middle L:90%
the question is of course, how to combine them
316
00:14:24.399 --> 00:14:26.759 A:middle L:90%
? I mean, should it be linear, quadratic, exponential,
317
00:14:26.759 --> 00:14:28.379 A:middle L:90%
in some form? So the nice thing is that
318
00:14:28.389 --> 00:14:33.000 A:middle L:90%
what we found in our work was that
319
00:14:33.009 --> 00:14:37.620 A:middle L:90%
it's a really simple result. So informally, for any arbitrary
320
00:14:37.620 --> 00:14:39.360 A:middle L:90%
topology, where we represent the topology by an
321
00:14:39.360 --> 00:14:41.809 A:middle L:90%
adjacency matrix A, which is an
322
00:14:41.820 --> 00:14:45.164 A:middle L:90%
N by N matrix, N being the number of nodes, and any
323
00:14:45.164 --> 00:14:48.865 A:middle L:90%
virus propagation model in the standard literature, and
324
00:14:48.875 --> 00:14:50.365 A:middle L:90%
if you represent the graph by the adjacency matrix, you need only one
325
00:14:50.365 --> 00:14:54.345 A:middle L:90%
parameter, lambda, which is the largest eigenvalue of
326
00:14:54.345 --> 00:14:56.235 A:middle L:90%
the adjacency matrix. I'll try to give you an
327
00:14:56.235 --> 00:14:58.254 A:middle L:90%
intuition about what it exactly means. And also a
328
00:14:58.254 --> 00:15:01.154 A:middle L:90%
constant C_VPM, which we call C V P
329
00:15:01.154 --> 00:15:03.735 A:middle L:90%
M. And it's a constant depending on the virus
330
00:15:03.735 --> 00:15:05.115 A:middle L:90%
propagation model and it's an explicit constant. We give
331
00:15:05.115 --> 00:15:07.695 A:middle L:90%
the constant in the proof. And given these two
332
00:15:07.705 --> 00:15:11.995 A:middle L:90%
things, there's no epidemic if lambda times C_VPM
333
00:15:11.995 --> 00:15:13.144 A:middle L:90%
is less than one, that's it. So the
334
00:15:13.144 --> 00:15:18.664 A:middle L:90%
graph interacts with the threshold question. Uh The threshold
335
00:15:18.664 --> 00:15:20.955 A:middle L:90%
question, uh, through only one parameter, and it's linear
336
00:15:20.965 --> 00:15:24.424 A:middle L:90%
in combination. Right? So it's lambda times C
337
00:15:24.424 --> 00:15:24.245 A:middle L:90%
V P M. And C V P
338
00:15:24.245 --> 00:15:26.105 A:middle L:90%
M is essentially a constant which depends on the
339
00:15:26.105 --> 00:15:30.365 A:middle L:90%
virus propagation model. So if you multiply that and
340
00:15:30.365 --> 00:15:33.445 A:middle L:90%
it's less than one you're done. So how does
341
00:15:33.445 --> 00:15:37.245 A:middle L:90%
this uh threshold actually instantiate in particular models? So
342
00:15:37.254 --> 00:15:39.355 A:middle L:90%
for some of these models, just to standardize
343
00:15:39.355 --> 00:15:41.205 A:middle L:90%
the discussion, I'll try to use this term, which
344
00:15:41.205 --> 00:15:43.514 A:middle L:90%
is effective strength. I denote it by s
345
00:15:43.524 --> 00:15:48.144 A:middle L:90%
. And so this is essentially this product. So
346
00:15:48.144 --> 00:15:50.375 A:middle L:90%
if this product is less than one, you're below threshold
347
00:15:50.384 --> 00:15:52.644 A:middle L:90%
, otherwise you're above threshold. So for, for
348
00:15:52.644 --> 00:15:54.115 A:middle L:90%
a whole bunch of models S. I. R
349
00:15:54.125 --> 00:15:54.514 A:middle L:90%
. S. I. S. This is flu
350
00:15:54.514 --> 00:15:58.164 A:middle L:90%
like, this is mumps like, and all the alphabet
351
00:15:58.164 --> 00:16:00.014 A:middle L:90%
soup, you can think that it's just lambda times
352
00:16:00.014 --> 00:16:03.634 A:middle L:90%
beta over delta. So the important thing to note here
353
00:16:03.634 --> 00:16:06.434 A:middle L:90%
is that uh all these models actually have much more
354
00:16:06.434 --> 00:16:07.965 A:middle L:90%
parameters. For example, S. I. R
355
00:16:07.965 --> 00:16:10.195 A:middle L:90%
. S. has a forgetting factor. It's temporary
356
00:16:10.195 --> 00:16:11.424 A:middle L:90%
immunity. So you gain immunity, then you lose
357
00:16:11.424 --> 00:16:14.225 A:middle L:90%
it. So there are a lot of different factors
358
00:16:14.225 --> 00:16:15.865 A:middle L:90%
which actually are in the model, but they don't
359
00:16:15.865 --> 00:16:18.174 A:middle L:90%
play a role in the threshold. And this gives
360
00:16:18.174 --> 00:16:18.325 A:middle L:90%
you a sense of the power of the result,
361
00:16:18.325 --> 00:16:21.264 A:middle L:90%
right? Because you can see these interactions play out
362
00:16:21.264 --> 00:16:23.575 A:middle L:90%
in the uh result. And for all these cases
363
00:16:23.585 --> 00:16:29.095 A:middle L:90%
uh, the graph interacts only,
364
00:16:29.105 --> 00:16:30.875 A:middle L:90%
uh, through the parameter lambda. Uh, and there are
365
00:16:30.875 --> 00:16:33.065 A:middle L:90%
much more complicated models, as I said. For example
366
00:16:33.075 --> 00:16:36.725 A:middle L:90%
, this model has two infected states. So it's
367
00:16:36.725 --> 00:16:38.144 A:middle L:90%
it's a bit more complex model for all of them
368
00:16:38.144 --> 00:16:40.649 A:middle L:90%
. The threshold is S is equal to one.
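NOTE
A minimal sketch of the threshold check just described, assuming an undirected graph stored as a symmetric NumPy adjacency matrix and a model whose effective strength is s = lambda * beta / delta. The function name effective_strength and the toy triangle graph are illustrative assumptions, not taken from the talk.
```python
import numpy as np
#
def effective_strength(adj, beta, delta):
    """Return s = lambda_1(adj) * beta / delta for a symmetric adjacency matrix."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    lam = np.linalg.eigvalsh(adj)[-1]
    return lam * beta / delta
#
# Toy graph (assumed for illustration): a triangle, whose largest eigenvalue is 2
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
s = effective_strength(triangle, beta=0.1, delta=0.4)
print("effective strength:", s)
print("below threshold" if s < 1 else "at or above threshold")
```
Only the largest eigenvalue of the graph enters the check; every other detail of the topology is ignored.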
369
00:16:40.659 --> 00:16:45.649 A:middle L:90%
That's it. So what's the intuition for lambda?
370
00:16:45.649 --> 00:16:48.519 A:middle L:90%
Right. So the official linear algebra definition is essentially
371
00:16:48.519 --> 00:16:51.649 A:middle L:90%
it's the root of largest magnitude of
372
00:16:51.659 --> 00:16:53.730 A:middle L:90%
the characteristic polynomial. So sure, it doesn't give
373
00:16:53.740 --> 00:16:57.389 A:middle L:90%
too much intuition. So uh what's the unofficial intuition
374
00:16:57.389 --> 00:17:00.669 A:middle L:90%
? It essentially captures the number of paths in the
375
00:17:00.669 --> 00:17:03.240 A:middle L:90%
graph. So imagine this adjacency matrix
376
00:17:03.240 --> 00:17:04.349 A:middle L:90%
, right? And if you take it to the
377
00:17:04.349 --> 00:17:07.470 A:middle L:90%
k-th power, then the i, j
378
00:17:07.480 --> 00:17:11.670 A:middle L:90%
element in this matrix, this is a square matrix, and
379
00:17:11.680 --> 00:17:14.319 A:middle L:90%
the i, j element is essentially the number of length-k paths from i
380
00:17:14.319 --> 00:17:15.309 A:middle L:90%
to j in the network. So of course these paths
381
00:17:15.309 --> 00:17:18.880 A:middle L:90%
are repeated and there are loops. But the rough intuition
382
00:17:18.880 --> 00:17:19.519 A:middle L:90%
is that it captures the connectivity of the graph in
383
00:17:19.519 --> 00:17:22.319 A:middle L:90%
that sense. And if you take the spectral decomposition
384
00:17:22.319 --> 00:17:23.829 A:middle L:90%
, so you don't need to know much about the
385
00:17:23.829 --> 00:17:26.519 A:middle L:90%
decomposition, apart from the fact that it involves the eigenvalues
386
00:17:26.519 --> 00:17:30.380 A:middle L:90%
uh, and the eigenvectors. So if we truncate
387
00:17:30.380 --> 00:17:32.339 A:middle L:90%
it to just the first eigenvector and
388
00:17:32.339 --> 00:17:33.759 A:middle L:90%
eigenvalue, you can see that lambda to the power k
389
00:17:33.769 --> 00:17:36.950 A:middle L:90%
, we don't care about these other things, the lambda
390
00:17:36.950 --> 00:17:38.279 A:middle L:90%
to the power k governs, essentially, the magnitude of
391
00:17:38.289 --> 00:17:41.309 A:middle L:90%
this matrix in some sense. So how does it
392
00:17:41.309 --> 00:17:44.210 A:middle L:90%
look for some toy graphs? Right, so for example
393
00:17:44.210 --> 00:17:45.960 A:middle L:90%
, look at this, uh, there's a chain, a star, and
394
00:17:45.960 --> 00:17:48.630 A:middle L:90%
a clique, uh, all of them have the same number of nodes
395
00:17:48.640 --> 00:17:52.210 A:middle L:90%
, but, uh, different numbers of edges. In particular,
396
00:17:52.210 --> 00:17:53.329 A:middle L:90%
these two actually have the same number of edges. But
397
00:17:53.329 --> 00:17:56.220 A:middle L:90%
intuitively you can imagine that star is better for the
398
00:17:56.220 --> 00:17:59.140 A:middle L:90%
virus, right? Because once you infect the center
399
00:17:59.150 --> 00:18:00.650 A:middle L:90%
it will quickly spread. So how does it look
400
00:18:00.660 --> 00:18:03.390 A:middle L:90%
in terms of the eigenvalue? So, well, so
401
00:18:03.400 --> 00:18:06.960 A:middle L:90%
the differences are not too large here. But uh
402
00:18:06.970 --> 00:18:08.230 A:middle L:90%
, if you make it n nodes, for example
403
00:18:08.240 --> 00:18:11.579 A:middle L:90%
, then how would it look? The eigenvalue of
404
00:18:11.579 --> 00:18:12.259 A:middle L:90%
the star is essentially the uh, sorry, the
405
00:18:12.269 --> 00:18:15.519 A:middle L:90%
chain is essentially constant. So it, uh,
406
00:18:15.529 --> 00:18:18.000 A:middle L:90%
corresponds to intuition, right? So a 10-node
407
00:18:18.000 --> 00:18:19.549 A:middle L:90%
chain is probably as bad as a 100-node chain
408
00:18:19.559 --> 00:18:22.230 A:middle L:90%
. But the 10-node star is much better than
409
00:18:22.279 --> 00:18:26.170 A:middle L:90%
a 100-node star. And in particular it grows as root n
410
00:18:26.170 --> 00:18:27.079 A:middle L:90%
here, and it grows as n minus one for the clique, which is the
411
00:18:27.079 --> 00:18:29.930 A:middle L:90%
worst. Right? Because once you infect any one of
412
00:18:29.930 --> 00:18:30.309 A:middle L:90%
them, you can quickly infect the rest of the
413
00:18:30.309 --> 00:18:33.759 A:middle L:90%
world. So uh, yeah, I've given also
414
00:18:33.759 --> 00:18:36.430 A:middle L:90%
values for them for n equal to 1000.
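NOTE
The chain, star, and clique comparison can be checked numerically. This is a small sketch, assuming undirected graphs built as dense NumPy adjacency matrices; the helper names chain, star, clique, and largest_eigenvalue are illustrative assumptions. It uses 10 and 100 nodes to stay fast.
```python
import numpy as np
#
def chain(n):
    """Path graph: node i is linked to node i + 1."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1
    adj[idx + 1, idx] = 1
    return adj
#
def star(n):
    """Star graph: node 0 is the center, linked to every other node."""
    adj = np.zeros((n, n))
    adj[0, 1:] = 1
    adj[1:, 0] = 1
    return adj
#
def clique(n):
    """Complete graph: every pair of distinct nodes is linked."""
    return np.ones((n, n)) - np.eye(n)
#
def largest_eigenvalue(adj):
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(adj)[-1]
#
for n in (10, 100):
    print(n, largest_eigenvalue(chain(n)), largest_eigenvalue(star(n)), largest_eigenvalue(clique(n)))
```
The chain stays below 2 no matter how long it gets, the star grows as the square root of n minus one, and the clique grows as n minus one.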
415
00:18:36.440 --> 00:18:37.859 A:middle L:90%
Right? So what I wish to claim is that
416
00:18:37.869 --> 00:18:41.519 A:middle L:90%
for this problem, better connectivity means higher lambda. So
417
00:18:41.529 --> 00:18:44.970 A:middle L:90%
like and uh, I'll give you some examples of
418
00:18:44.980 --> 00:18:48.069 A:middle L:90%
uh, this result. So for that we use a
419
00:18:48.069 --> 00:18:49.759 A:middle L:90%
really huge graph, which was actually developed by the folks at
420
00:18:49.759 --> 00:18:52.529 A:middle L:90%
NDSSL here at Virginia Tech, and this is a
421
00:18:52.539 --> 00:18:55.819 A:middle L:90%
really huge graph like 31 million links and six million
422
00:18:55.819 --> 00:18:59.740 A:middle L:90%
nodes. And, uh, it's been used in lots of outbreak
423
00:18:59.740 --> 00:19:03.000 A:middle L:90%
studies, like smallpox and all those things. And so
424
00:19:03.000 --> 00:19:03.819 A:middle L:90%
you can see two graphs here, we simulated the
425
00:19:03.819 --> 00:19:07.329 A:middle L:90%
S. I. R. model here. So the two graphs are the infection
426
00:19:07.329 --> 00:19:08.609 A:middle L:90%
profile and the takeoff plot. The infection profile is
427
00:19:08.609 --> 00:19:11.940 A:middle L:90%
essentially the number of people infected, the number of nodes infected,
428
00:19:11.940 --> 00:19:15.190 A:middle L:90%
per unit time. So you can clearly see the
429
00:19:15.200 --> 00:19:17.529 A:middle L:90%
two different regimes. What it doesn't show is exactly
430
00:19:17.529 --> 00:19:19.460 A:middle L:90%
where the regimes separate, and this is what
431
00:19:19.460 --> 00:19:22.769 A:middle L:90%
this plot shows. So in the takeoff plot,
432
00:19:22.779 --> 00:19:25.200 A:middle L:90%
you can see that the x-axis is the effective
433
00:19:25.200 --> 00:19:26.339 A:middle L:90%
strength, if you remember the effective strength was s,
434
00:19:26.349 --> 00:19:30.710 A:middle L:90%
the product, and the y-axis is the footprint, essentially how many people
435
00:19:30.710 --> 00:19:33.049 A:middle L:90%
were infected at the end of the infection.
436
00:19:33.059 --> 00:19:34.630 A:middle L:90%
So how bad the infection was. So you can
437
00:19:34.630 --> 00:19:37.799 A:middle L:90%
see that this is the predicted threshold at 10^0,
438
00:19:37.799 --> 00:19:41.480 A:middle L:90%
which is one. And when the effective strength is
439
00:19:41.480 --> 00:19:44.809 A:middle L:90%
one, suddenly the footprint takes off. So you can
440
00:19:44.809 --> 00:19:47.970 A:middle L:90%
see that we can demarcate these two regimes. Uh
441
00:19:47.980 --> 00:19:48.690 A:middle L:90%
And just to give you a further example, there's
442
00:19:48.690 --> 00:19:51.819 A:middle L:90%
another plot with the S. I. R. S.
443
00:19:51.819 --> 00:19:55.049 A:middle L:90%
model, which is the temporary immunity model, for pertussis, and you
444
00:19:55.049 --> 00:19:57.750 A:middle L:90%
can see similar behavior there. Uh, so note
445
00:19:57.750 --> 00:20:00.619 A:middle L:90%
that effective strength incidentally means the same thing here.
446
00:20:00.619 --> 00:20:03.859 A:middle L:90%
Right? It's still lambda beta over delta
447
00:20:03.869 --> 00:20:06.039 A:middle L:90%
. It doesn't matter, uh, even though S. I. R. S. has
448
00:20:06.039 --> 00:20:11.440 A:middle L:90%
an extra parameter. So yeah. Right now that
449
00:20:11.440 --> 00:20:12.880 A:middle L:90%
I've given you a sense of the result and what
450
00:20:12.880 --> 00:20:15.220 A:middle L:90%
it means in different models uh I will try to
451
00:20:15.230 --> 00:20:19.059 A:middle L:90%
go a bit over the proof. Right. So
452
00:20:19.069 --> 00:20:22.119 A:middle L:90%
what's the proof sketch? So there are two main
453
00:20:22.119 --> 00:20:23.470 A:middle L:90%
ingredients in our model, in our proof which gives
454
00:20:23.470 --> 00:20:26.440 A:middle L:90%
rise to this nice separability between these two parameters,
455
00:20:26.440 --> 00:20:27.980 A:middle L:90%
which is, one on the model, one on,
456
00:20:27.980 --> 00:20:30.700 A:middle L:90%
uh, one on the model, one on the graph. So
457
00:20:30.700 --> 00:20:33.400 A:middle L:90%
what are these two ingredients? One is the generalized
458
00:20:33.410 --> 00:20:34.910 A:middle L:90%
virus propagation model structure. That we have tried to
459
00:20:34.920 --> 00:20:38.579 A:middle L:90%
generalize all these cascade style models into some coherent structure
460
00:20:38.579 --> 00:20:41.039 A:middle L:90%
which is uh which captures every one of them and
461
00:20:41.039 --> 00:20:45.019 A:middle L:90%
is still tractable. And at the same time we used
462
00:20:45.019 --> 00:20:48.250 A:middle L:90%
ideas from stability theory, equilibrium and stability points, to actually
463
00:20:48.259 --> 00:20:52.619 A:middle L:90%
get some handle on the, uh, on the lambda, or
464
00:20:52.630 --> 00:20:56.650 A:middle L:90%
how to involve the graph in the computation. So
465
00:20:56.660 --> 00:20:57.829 A:middle L:90%
these are the two ingredients which give rise to these
466
00:20:57.829 --> 00:21:02.039 A:middle L:90%
two parameters of the proof. So, for the
467
00:21:02.039 --> 00:21:03.150 A:middle L:90%
first part, the models, so there are all these
468
00:21:03.150 --> 00:21:06.029 A:middle L:90%
models. Right? And what we were able to
469
00:21:06.039 --> 00:21:07.710 A:middle L:90%
do, well we were able to convert all of
470
00:21:07.710 --> 00:21:10.049 A:middle L:90%
them into one big generalized model which we call S
471
00:21:10.049 --> 00:21:11.799 A:middle L:90%
star, I squared, V star. So this, uh,
472
00:21:11.809 --> 00:21:15.299 A:middle L:90%
essentially represents, uh, there are three different conceptual
473
00:21:15.299 --> 00:21:18.309 A:middle L:90%
types of states in these kinds of models, one
474
00:21:18.309 --> 00:21:19.500 A:middle L:90%
of them is susceptible, one of them is infected and
475
00:21:19.500 --> 00:21:22.799 A:middle L:90%
one of them is vigilant. And in our model,
476
00:21:22.799 --> 00:21:23.470 A:middle L:90%
in our generalized model, there can be any number of
477
00:21:23.470 --> 00:21:26.410 A:middle L:90%
states in any of these classes, right
478
00:21:26.420 --> 00:21:27.859 A:middle L:90%
? Because this is a fairly general state diagram, right.
479
00:21:27.869 --> 00:21:30.299 A:middle L:90%
So and so the interesting part here is that there
480
00:21:30.299 --> 00:21:34.019 A:middle L:90%
is a big red arrow. So these are what
481
00:21:34.019 --> 00:21:37.559 A:middle L:90%
we term graph-based transitions. So intuitively
482
00:21:37.559 --> 00:21:38.720 A:middle L:90%
it just means that you can get infected only by
483
00:21:38.720 --> 00:21:41.650 A:middle L:90%
your neighbor. You can't just get infected on your
484
00:21:41.650 --> 00:21:44.190 A:middle L:90%
own. If you can get infected on your own
485
00:21:44.200 --> 00:21:45.750 A:middle L:90%
then it's not really a cascade style model.
486
00:21:45.750 --> 00:21:48.460 A:middle L:90%
Right? So this is this is one of the
487
00:21:48.460 --> 00:21:51.200 A:middle L:90%
only assumptions that we use and we were able to
488
00:21:51.200 --> 00:21:52.400 A:middle L:90%
generalize the whole, you know, big set of
489
00:21:52.400 --> 00:21:56.630 A:middle L:90%
models. So I won't go over this. Essentially
490
00:21:56.630 --> 00:21:57.789 A:middle L:90%
what I want to say here is that the huge
491
00:21:57.789 --> 00:22:00.980 A:middle L:90%
big bubbles, uh, hide all these complexities, right?
492
00:22:00.980 --> 00:22:03.609 A:middle L:90%
There can be any number of transitions from any number
493
00:22:03.609 --> 00:22:06.329 A:middle L:90%
of states. You can have transition between states across
494
00:22:06.329 --> 00:22:08.839 A:middle L:90%
classes and so on. So uh yeah. So
495
00:22:08.839 --> 00:22:11.170 A:middle L:90%
what's the special case? So if you have just
496
00:22:11.170 --> 00:22:15.339 A:middle L:90%
one susceptible, uh, node, one, uh, infected node, one vigilant
497
00:22:15.339 --> 00:22:18.869 A:middle L:90%
state, then it's just your plain old S. I.
498
00:22:18.869 --> 00:22:19.599 A:middle L:90%
R. model, right, that you have already seen before
499
00:22:19.609 --> 00:22:23.029 A:middle L:90%
. So another example is this, in which there are two
500
00:22:23.029 --> 00:22:26.039 A:middle L:90%
different uh infected states. So here essentially this means
501
00:22:26.039 --> 00:22:29.970 A:middle L:90%
non terminal and this is a terminal case for HIV
502
00:22:30.089 --> 00:22:32.329 A:middle L:90%
and what it shows is multiple, vigilant and multiple
503
00:22:32.329 --> 00:22:36.240 A:middle L:90%
infectious states. Uh The second ingredient in the proof
504
00:22:36.240 --> 00:22:37.470 A:middle L:90%
, once you have generalized the model, is essentially a
505
00:22:37.480 --> 00:22:41.950 A:middle L:90%
uh, nonlinear dynamical system and stability theory. So the
506
00:22:41.950 --> 00:22:44.650 A:middle L:90%
key idea here is to view the whole system,
507
00:22:44.650 --> 00:22:47.299 A:middle L:90%
to view the whole evolution of the epidemic system, as
508
00:22:47.299 --> 00:22:49.450 A:middle L:90%
an NLDS, which is a nonlinear dynamical system
509
00:22:49.569 --> 00:22:52.480 A:middle L:90%
. And here essentially you have a big huge vector
510
00:22:52.490 --> 00:22:56.660 A:middle L:90%
P(t+1), which is a function
511
00:22:56.660 --> 00:22:57.970 A:middle L:90%
of the previous state, which is P(t).
512
00:22:59.049 --> 00:23:02.349 A:middle L:90%
And what's the, what's the relation between these two states
513
00:23:02.359 --> 00:23:04.539 A:middle L:90%
? It's essentially given by the function g, which is
514
00:23:04.539 --> 00:23:07.660 A:middle L:90%
a huge nonlinear function. So
515
00:23:07.660 --> 00:23:08.640 A:middle L:90%
it's discrete time as you can see. So the
516
00:23:08.640 --> 00:23:11.440 A:middle L:90%
key idea here is that P(t)
517
00:23:11.450 --> 00:23:14.569 A:middle L:90%
is a probability vector. It just states the, what
518
00:23:14.579 --> 00:23:15.299 A:middle L:90%
specifies the state of the system at time t,
519
00:23:15.309 --> 00:23:18.799 A:middle L:90%
what's the probability of each node in the graph,
520
00:23:18.809 --> 00:23:21.609 A:middle L:90%
being in each state of the system and so on
521
00:23:21.619 --> 00:23:22.269 A:middle L:90%
. And g is, as I said, a huge
522
00:23:22.269 --> 00:23:25.789 A:middle L:90%
big nonlinear function. So it's a huge messy function
523
00:23:25.789 --> 00:23:27.019 A:middle L:90%
for the generalized model, which I won't try to
524
00:23:27.019 --> 00:23:29.690 A:middle L:90%
write here. But the idea is that it
525
00:23:29.700 --> 00:23:32.230 A:middle L:90%
explicitly gives the evolution of the system. So once
526
00:23:32.230 --> 00:23:33.140 A:middle L:90%
you're given g, once you're given P, you can
527
00:23:33.150 --> 00:23:37.319 A:middle L:90%
explicitly evolve the system. Now, now that
528
00:23:37.319 --> 00:23:40.000 A:middle L:90%
you have a nonlinear dynamical system, which is
529
00:23:40.009 --> 00:23:41.789 A:middle L:90%
given by P and g, what do you do with it?
530
00:23:41.799 --> 00:23:45.670 A:middle L:90%
The next idea is that you transform the threshold question
531
00:23:45.680 --> 00:23:48.440 A:middle L:90%
, which is when an epidemic will happen, the
532
00:23:48.440 --> 00:23:52.359 A:middle L:90%
tipping point question, into a stability question of this system. And
533
00:23:52.369 --> 00:23:53.740 A:middle L:90%
uh we should analyze the stability. But at which
534
00:23:53.740 --> 00:23:56.630 A:middle L:90%
point? So the fixed point at which you analyze
535
00:23:56.630 --> 00:24:00.160 A:middle L:90%
the stability is essentially given by when nobody is infected
536
00:24:00.160 --> 00:24:02.380 A:middle L:90%
, right? Because that's when you, when you infect
537
00:24:02.380 --> 00:24:03.109 A:middle L:90%
a few people initially, that's when you want to
538
00:24:03.119 --> 00:24:06.559 A:middle L:90%
uh understand whether the system will take off or die
539
00:24:06.559 --> 00:24:07.690 A:middle L:90%
out. And what does it mean by stable and
540
00:24:07.690 --> 00:24:11.859 A:middle L:90%
unstable uh equilibrium points? So imagine this to be
541
00:24:11.859 --> 00:24:15.319 A:middle L:90%
a contour as given by g, by the system.
542
00:24:15.329 --> 00:24:18.240 A:middle L:90%
And imagine the whole epidemic to be
543
00:24:18.240 --> 00:24:21.609 A:middle L:90%
this ball. So you can see that a small
544
00:24:21.609 --> 00:24:22.750 A:middle L:90%
push to the system will effectively roll it down,
545
00:24:22.759 --> 00:24:25.470 A:middle L:90%
right, it will quickly go down and the epidemic
546
00:24:25.470 --> 00:24:27.269 A:middle L:90%
will take off, essentially. Whereas when it's stable, it's
547
00:24:27.269 --> 00:24:30.559 A:middle L:90%
below threshold, the epidemic will try to actually come
548
00:24:30.559 --> 00:24:32.339 A:middle L:90%
back. So the system will try to come back
549
00:24:32.339 --> 00:24:33.109 A:middle L:90%
to the state where there was no one infected.
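NOTE
To make the stable versus unstable picture concrete, here is a small sketch that iterates a discrete-time system P(t+1) = g(P(t)) from a small perturbation of the all-healthy fixed point. The update rule is a common mean-field S. I. S. approximation and is an assumption here, not necessarily the g used in the talk; step, final_footprint, and the 20-node clique are likewise illustrative.
```python
import numpy as np
#
def step(p, adj, beta, delta):
    """One step P(t+1) = g(P(t)) of an assumed mean-field S. I. S. update."""
    # Probability that node i is infected by at least one neighbor in this step
    hit = 1.0 - np.prod(1.0 - beta * adj * p[None, :], axis=1)
    # Either stay infected (no cure), or be healthy and get infected
    return (1.0 - delta) * p + (1.0 - p) * hit
#
def final_footprint(adj, beta, delta, steps=500):
    """Expected number of infected nodes after iterating from a small perturbation."""
    p = np.full(adj.shape[0], 0.05)
    for _ in range(steps):
        p = step(p, adj, beta, delta)
    return p.sum()
#
n = 20
adj = np.ones((n, n)) - np.eye(n)  # clique, largest eigenvalue n - 1
lam, delta = n - 1, 0.5
below = final_footprint(adj, beta=0.5 * delta / lam, delta=delta)  # s = 0.5
above = final_footprint(adj, beta=2.0 * delta / lam, delta=delta)  # s = 2.0
print("s = 0.5 footprint:", below)
print("s = 2.0 footprint:", above)
```
Below threshold the perturbation decays back to the all-healthy state; above threshold it grows and settles at a nonzero infected level.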
550
00:24:33.480 --> 00:24:36.740 A:middle L:90%
And at threshold which is the neutral equilibrium. You
551
00:24:36.740 --> 00:24:37.940 A:middle L:90%
can say that there's no no inclination to go either
552
00:24:37.940 --> 00:24:41.839 A:middle L:90%
way. So what we have done is we cast
553
00:24:41.839 --> 00:24:44.000 A:middle L:90%
it as an NLDS and then posed it as
554
00:24:44.000 --> 00:24:45.329 A:middle L:90%
a stability and equilibrium point question rather than a threshold
555
00:24:45.329 --> 00:24:48.970 A:middle L:90%
question initially. So I won't try to go here
556
00:24:48.970 --> 00:24:49.910 A:middle L:90%
. But for an S. I. R
557
00:24:49.910 --> 00:24:52.789 A:middle L:90%
. for example, P(t) will be just three blocks, right
558
00:24:52.799 --> 00:24:55.599 A:middle L:90%
? Which is the probability of each node in the graph
559
00:24:55.609 --> 00:24:56.599 A:middle L:90%
being in each of the three states. That's it
560
00:24:56.670 --> 00:25:00.109 A:middle L:90%
. And you can have this g. Then the standard
561
00:25:00.109 --> 00:25:02.500 A:middle L:90%
idea is, the fixed point is, again, when no node
562
00:25:02.500 --> 00:25:04.380 A:middle L:90%
is infected; the question we're asking is, is that stable
563
00:25:04.390 --> 00:25:06.680 A:middle L:90%
? And in special case of S. I.
564
00:25:06.680 --> 00:25:07.140 A:middle L:90%
R., you can think of two nodes in the
565
00:25:07.140 --> 00:25:11.089 A:middle L:90%
graph, say i1 and i2, and, uh, essentially
566
00:25:11.089 --> 00:25:11.569 A:middle L:90%
this is P of i1 and this is P
567
00:25:11.569 --> 00:25:15.519 A:middle L:90%
of i2, the probability node one is infected, the probability
568
00:25:15.529 --> 00:25:18.579 A:middle L:90%
node two is infected, and so on. Under the stable regime
569
00:25:18.589 --> 00:25:19.619 A:middle L:90%
, which is the below-threshold regime, you can see
570
00:25:19.619 --> 00:25:22.259 A:middle L:90%
that if you perturb the system, if you
571
00:25:22.259 --> 00:25:23.569 A:middle L:90%
make a few nodes infected in the system, it will
572
00:25:23.569 --> 00:25:27.059 A:middle L:90%
try to come back because it's below threshold. And
573
00:25:27.069 --> 00:25:30.019 A:middle L:90%
on the other hand, if you do the same
574
00:25:30.019 --> 00:25:32.069 A:middle L:90%
thing to an unstable system, it will just take
575
00:25:32.069 --> 00:25:33.880 A:middle L:90%
off and go. So what we've done is we
576
00:25:33.880 --> 00:25:37.019 A:middle L:90%
were able to separate out these two regimes based on
577
00:25:37.019 --> 00:25:40.019 A:middle L:90%
this condition. So again, please see the paper
578
00:25:40.019 --> 00:25:41.059 A:middle L:90%
for the whole proof. But these are the two
579
00:25:41.059 --> 00:25:44.339 A:middle L:90%
essential ingredients in the proof, which give rise to
580
00:25:44.339 --> 00:25:48.730 A:middle L:90%
this nice linear separability. So, right, coming back to
581
00:25:48.730 --> 00:25:51.710 A:middle L:90%
the outline as I said, I promise you that
582
00:25:51.720 --> 00:25:55.309 A:middle L:90%
this kind of analysis and proof technique can actually be
583
00:25:55.319 --> 00:25:57.289 A:middle L:90%
uh extended to give you even more powerful results in
584
00:25:57.289 --> 00:26:00.250 A:middle L:90%
other cases, for example in dynamic graphs. Why dynamic
585
00:26:00.250 --> 00:26:03.190 A:middle L:90%
graphs? So essentially the idea is that you
586
00:26:03.200 --> 00:26:06.930 A:middle L:90%
want to capture alternating behavior, right? And human, uh,
587
00:26:06.940 --> 00:26:08.309 A:middle L:90%
mobility, uh, people go to work in the
588
00:26:08.309 --> 00:26:10.900 A:middle L:90%
morning. So essentially you come in contact with the
589
00:26:10.900 --> 00:26:11.769 A:middle L:90%
co workers and then you go back to your home
590
00:26:11.859 --> 00:26:15.660 A:middle L:90%
and children probably go to school and playground and come
591
00:26:15.660 --> 00:26:18.049 A:middle L:90%
in contact with each other and and they come back
592
00:26:18.049 --> 00:26:19.049 A:middle L:90%
at night. So the adjacency matrix has changed.
593
00:26:19.059 --> 00:26:22.549 A:middle L:90%
So the nodes are the same, the same people;
594
00:26:22.559 --> 00:26:23.660 A:middle L:90%
it's just that their behavior changes in the day and
595
00:26:23.660 --> 00:26:27.109 A:middle L:90%
night. So you have two different adjacency matrices, and
596
00:26:27.119 --> 00:26:30.779 A:middle L:90%
you can ask the same question here, right
597
00:26:30.789 --> 00:26:33.119 A:middle L:90%
? That if you have so for concreteness, I've
598
00:26:33.119 --> 00:26:34.380 A:middle L:90%
used just the S. I. S. model here. Uh, if
599
00:26:34.380 --> 00:26:36.619 A:middle L:90%
you have the S. I. S. model and a set of
600
00:26:36.630 --> 00:26:38.089 A:middle L:90%
T arbitrary graphs. So the nice thing here is
601
00:26:38.089 --> 00:26:41.059 A:middle L:90%
that we assume the graphs can be arbitrary. They
602
00:26:41.059 --> 00:26:41.950 A:middle L:90%
are just given; we are given just a set
603
00:26:41.950 --> 00:26:45.220 A:middle L:90%
of T graphs, which represent, say, day and night. It
604
00:26:45.220 --> 00:26:48.279 A:middle L:90%
can be weekend. You can have any granularity.
605
00:26:48.289 --> 00:26:48.960 A:middle L:90%
You're just given this set of T graphs, and you
606
00:26:48.960 --> 00:26:51.910 A:middle L:90%
want to ask the same question: will an epidemic take off
607
00:26:51.910 --> 00:26:53.569 A:middle L:90%
or not? So again, uh, we were able
608
00:26:53.569 --> 00:26:56.400 A:middle L:90%
to prove that, informally, there is no epidemic
609
00:26:56.410 --> 00:26:59.609 A:middle L:90%
if the eigenvalue of, uh, of a matrix
610
00:26:59.619 --> 00:27:00.750 A:middle L:90%
, which we call the system matrix, is less than
611
00:27:00.750 --> 00:27:03.390 A:middle L:90%
one. And again, it's just a single number
612
00:27:03.390 --> 00:27:06.210 A:middle L:90%
, right? It's just the eigenvalue of a matrix
613
00:27:06.210 --> 00:27:08.009 A:middle L:90%
. And this matrix is just a product of some
614
00:27:08.009 --> 00:27:11.009 A:middle L:90%
matrices. Uh and the the important thing to note
615
00:27:11.009 --> 00:27:14.640 A:middle L:90%
here is that the matrices involved come from the
616
00:27:14.640 --> 00:27:17.910 A:middle L:90%
virus propagation model as well as the, uh, adjacency
617
00:27:17.910 --> 00:27:19.839 A:middle L:90%
matrices, which is very reasonable and intuitive, right?
618
00:27:19.849 --> 00:27:22.700 A:middle L:90%
Because it should depend on the, uh, changing
619
00:27:22.700 --> 00:27:26.339 A:middle L:90%
adjacency matrices as well as the virus propagation model
620
00:27:26.349 --> 00:27:29.809 A:middle L:90%
. So, right and again, this gives you
621
00:27:29.809 --> 00:27:30.970 A:middle L:90%
the time plots. Uh, so on the left
622
00:27:30.970 --> 00:27:33.609 A:middle L:90%
side, I've shown you a synthetic uh network and
623
00:27:33.619 --> 00:27:36.890 A:middle L:90%
this is MIT Reality Mining, which is a, which
624
00:27:36.890 --> 00:27:37.990 A:middle L:90%
was a famous project by uh Sandy Pentland at M
625
00:27:37.990 --> 00:27:42.930 A:middle L:90%
I T, which tried to, uh, follow, like, track
626
00:27:42.940 --> 00:27:45.809 A:middle L:90%
undergraduates at M I T campus. And you can
627
00:27:45.809 --> 00:27:48.480 A:middle L:90%
see dips when there are weekends, because there's no connectivity
628
00:27:48.490 --> 00:27:52.869 A:middle L:90%
between people, between their mobile phones via Bluetooth, and so
629
00:27:52.869 --> 00:27:53.539 A:middle L:90%
on. So you can see the three again,
630
00:27:53.539 --> 00:27:56.609 A:middle L:90%
three different regimes, right? In all these three
631
00:27:56.609 --> 00:27:59.200 A:middle L:90%
cases and more concretely, if you look at the
632
00:27:59.210 --> 00:28:00.349 A:middle L:90%
takeoff plots, you can clearly see the difference.
633
00:28:00.359 --> 00:28:03.990 A:middle L:90%
Uh, so the, uh, the x-axis is again effective
634
00:28:03.990 --> 00:28:07.420 A:middle L:90%
strength. Here the effective strength is the eigenvalue of
635
00:28:07.420 --> 00:28:10.569 A:middle L:90%
that huge big nasty product of matrices, right?
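NOTE
A hedged sketch of how such a product of matrices could be formed for the S. I. S. model on alternating graphs. The talk does not spell out the factors; the per-graph factor (1 - delta) * I + beta * A used below is an assumption, chosen because it is the linearization of a mean-field S. I. S. update on one graph. The function name and the toy day and night graphs are also illustrative.
```python
import numpy as np
#
def system_matrix_eigenvalue(graphs, beta, delta):
    """Largest-magnitude eigenvalue of the product of per-graph factors."""
    n = graphs[0].shape[0]
    product = np.eye(n)
    for adj in graphs:
        # Assumed per-graph factor for S. I. S.: (1 - delta) * I + beta * A
        product = ((1.0 - delta) * np.eye(n) + beta * adj) @ product
    # The product is generally not symmetric, so use the general eigenvalue solver
    return np.max(np.abs(np.linalg.eigvals(product)))
#
# Toy alternating graphs on 4 nodes (assumed for illustration)
day = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]], dtype=float)
night = np.array([[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0]], dtype=float)
s = system_matrix_eigenvalue([day, night], beta=0.1, delta=0.4)
print("effective strength:", s, "-> below threshold" if s < 1 else "-> above threshold")
```
With a single graph this reduces to 1 - delta + beta * lambda, which is below one exactly when lambda * beta / delta is below one, so it agrees with the static threshold.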
636
00:28:10.619 --> 00:28:11.680 A:middle L:90%
And the y-axis is, again, the footprint. You
637
00:28:11.680 --> 00:28:15.009 A:middle L:90%
can see the predicted threshold is here and these are
638
00:28:15.009 --> 00:28:22.779 A:middle L:90%
the two separate regions. So so yeah, the
639
00:28:22.779 --> 00:28:25.190 A:middle L:90%
second bonus I wanted to tell you about is competing
640
00:28:25.190 --> 00:28:27.440 A:middle L:90%
viruses. So, so far, right,
641
00:28:27.440 --> 00:28:30.990 A:middle L:90%
we only argued about a single virus, that there's one virus
642
00:28:30.990 --> 00:28:32.920 A:middle L:90%
and one, uh, or a bunch of contact networks
643
00:28:32.920 --> 00:28:34.690 A:middle L:90%
probably. So what if there are two viruses? And
644
00:28:34.700 --> 00:28:37.920 A:middle L:90%
these are really common scenarios. For example, you
645
00:28:37.920 --> 00:28:40.779 A:middle L:90%
can think of iPhone versus Android or BlackBerry. This
646
00:28:40.779 --> 00:28:45.200 A:middle L:90%
is really common. Or even more biological situations, like common
647
00:28:45.200 --> 00:28:47.950 A:middle L:90%
flu versus, uh, pneumococcal infections, and so on
648
00:28:47.960 --> 00:28:51.829 A:middle L:90%
. Uh, and the simple model that
649
00:28:51.829 --> 00:28:52.799 A:middle L:90%
we use here, so, it's an extension of the
650
00:28:52.809 --> 00:28:53.650 A:middle L:90%
S. I. S. Model which you have
651
00:28:53.650 --> 00:28:56.059 A:middle L:90%
all seen. The important thing to note here is
652
00:28:56.059 --> 00:28:59.500 A:middle L:90%
that because you feel mutual immunity. So what it
653
00:28:59.500 --> 00:29:00.380 A:middle L:90%
means is that if you have an iphone you won't
654
00:29:00.390 --> 00:29:04.160 A:middle L:90%
buy an Android unless you ditch the iPhone. Right
655
00:29:04.170 --> 00:29:07.190 A:middle L:90%
? So that's what this model tries to capture.
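NOTE Editorial sketch, not from the talk: a minimal mean-field (fully mixed population) version of the two-virus SIS model with mutual immunity described above. The talk's model runs on an arbitrary contact graph; the fully mixed simplification, the function name `simulate_competing_sis`, and all parameter values are assumptions chosen only for illustration.
```python
def simulate_competing_sis(beta1, delta1, beta2, delta2,
                           x1=0.01, x2=0.01, dt=0.05, steps=20000):
    """Euler integration of a mean-field competing-SIS model.
    x1 and x2 are the fractions infected by virus 1 and virus 2;
    a node holds at most one virus at a time (mutual immunity)."""
    for _ in range(steps):
        susceptible = 1.0 - x1 - x2  # only uninfected nodes can be infected
        dx1 = beta1 * x1 * susceptible - delta1 * x1
        dx2 = beta2 * x2 * susceptible - delta2 * x2
        x1 += dt * dx1
        x2 += dt * dx2
    return x1, x2
# Hypothetical strengths beta/delta: 2.0 for virus 1 and 1.6 for virus 2.
final_x1, final_x2 = simulate_competing_sis(0.5, 0.25, 0.4, 0.25)
```
With these assumed parameters both viruses start above the mean-field threshold (beta/delta > 1), so printing `final_x1` and `final_x2` shows which footprint is left at the steady state.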
656
00:29:07.200 --> 00:29:10.980 A:middle L:90%
Once you're infected with one of the viruses, you won't
657
00:29:10.990 --> 00:29:12.589 A:middle L:90%
be infected with another virus. So given such a
658
00:29:12.599 --> 00:29:15.529 A:middle L:90%
very simple extension of the classic SIS
659
00:29:15.529 --> 00:29:17.220 A:middle L:90%
model, what do we want to
660
00:29:17.220 --> 00:29:19.390 A:middle L:90%
ask? So clearly our previous
661
00:29:19.390 --> 00:29:22.140 A:middle L:90%
work answers the question of whether one of the viruses
662
00:29:22.140 --> 00:29:22.910 A:middle L:90%
will survive or not. Right? I mean if
663
00:29:22.910 --> 00:29:26.759 A:middle L:90%
it's above threshold it will survive, otherwise not.
664
00:29:26.769 --> 00:29:29.299 A:middle L:90%
But what we really want to answer is that if
665
00:29:29.299 --> 00:29:30.950 A:middle L:90%
both of them are above threshold, what happens?
666
00:29:30.960 --> 00:29:33.109 A:middle L:90%
What's the end state? Right. And what happens
667
00:29:33.109 --> 00:29:34.869 A:middle L:90%
in the end? And this can be thought
668
00:29:34.869 --> 00:29:37.069 A:middle L:90%
of as the footprint at the steady state, the footprint at
669
00:29:37.079 --> 00:29:40.460 A:middle L:90%
the steady state of the, of the second virus
670
00:29:40.470 --> 00:29:41.859 A:middle L:90%
. So just for sake of clarity, just assume
671
00:29:41.859 --> 00:29:44.910 A:middle L:90%
that one of the virus is stronger than the other
672
00:29:44.920 --> 00:29:47.809 A:middle L:90%
. And uh, you're given just the first few
673
00:29:47.819 --> 00:29:49.069 A:middle L:90%
time ticks of the evolution, right? It's
674
00:29:49.069 --> 00:29:52.259 A:middle L:90%
the same infection profile. So what happens in the
675
00:29:52.259 --> 00:29:53.490 A:middle L:90%
end? You think it will go like
676
00:29:53.490 --> 00:29:56.710 A:middle L:90%
this? Or go like this? And
677
00:29:56.720 --> 00:29:57.980 A:middle L:90%
the ratio of the uh, end states will be
678
00:29:57.980 --> 00:30:00.779 A:middle L:90%
some factor of the strength. Like if you,
679
00:30:00.789 --> 00:30:02.740 A:middle L:90%
it's reasonable to assume that if the virus is two
680
00:30:02.750 --> 00:30:04.559 A:middle L:90%
times as strong, the end steady state will be
681
00:30:04.559 --> 00:30:07.329 A:middle L:90%
two times as bad. Right? The market share
682
00:30:07.329 --> 00:30:10.690 A:middle L:90%
would probably split on that basis, or as the square, or
683
00:30:10.690 --> 00:30:12.500 A:middle L:90%
some other thing. So the really interesting thing which
684
00:30:12.500 --> 00:30:15.470 A:middle L:90%
we found is that all of this is not true. And
685
00:30:15.470 --> 00:30:17.660 A:middle L:90%
essentially winner takes all. So even if one
686
00:30:17.660 --> 00:30:18.890 A:middle L:90%
of the viruses is even a bit stronger than the second
687
00:30:18.890 --> 00:30:21.470 A:middle L:90%
one, it will just wipe it out. So
688
00:30:21.470 --> 00:30:25.029 A:middle L:90%
the weaker virus always dies off. So what's the
689
00:30:25.029 --> 00:30:26.710 A:middle L:90%
result? The result is given our model and any
690
00:30:26.710 --> 00:30:30.279 A:middle L:90%
graph again the graph is totally arbitrary. So uh
691
00:30:30.289 --> 00:30:33.289 A:middle L:90%
compared with previous stuff, the weaker virus always dies out.
692
00:30:33.299 --> 00:30:36.299 A:middle L:90%
And of course the caveat is that the stronger virus
693
00:30:36.299 --> 00:30:37.640 A:middle L:90%
survives only if it itself is above threshold. And what's
694
00:30:37.640 --> 00:30:40.960 A:middle L:90%
the threshold? The threshold, which we already computed in
695
00:30:40.960 --> 00:30:42.230 A:middle L:90%
our previous work, is just the effective strength. Right? It's
696
00:30:42.230 --> 00:30:47.700 A:middle L:90%
the same as before. Right? So we try
697
00:30:47.700 --> 00:30:49.099 A:middle L:90%
to look at some data and real examples. So this
698
00:30:49.109 --> 00:30:52.180 A:middle L:90%
shows you Google Trends, which is like a proxy
699
00:30:52.180 --> 00:30:55.339 A:middle L:90%
for the interest in, or the market share of, a product.
700
00:30:55.349 --> 00:30:56.410 A:middle L:90%
And you can see that the search if you plot
701
00:30:56.410 --> 00:30:59.230 A:middle L:90%
the search percentage of it versus time. This
702
00:30:59.230 --> 00:31:00.210 A:middle L:90%
is real data, right? For these two pairs
703
00:31:00.210 --> 00:31:02.890 A:middle L:90%
of products. So you can see some bursts around these,
704
00:31:02.890 --> 00:31:06.119 A:middle L:90%
like, Christmases, for example; you
705
00:31:06.119 --> 00:31:07.579 A:middle L:90%
can see when the sales just pick up there.
706
00:31:07.589 --> 00:31:12.059 A:middle L:90%
But if you see, the broad
707
00:31:12.059 --> 00:31:15.329 A:middle L:90%
qualitative behavior is still there. For example that it
708
00:31:15.329 --> 00:31:17.000 A:middle L:90%
was just a you can see that the strong viruses
709
00:31:17.000 --> 00:31:22.769 A:middle L:90%
coming down. Right. So, so far I
710
00:31:22.769 --> 00:31:25.599 A:middle L:90%
have concentrated on essentially the theoretical part right? Which
711
00:31:25.599 --> 00:31:27.230 A:middle L:90%
is analyzing these models and seeing what happens, predicting
712
00:31:27.240 --> 00:31:30.519 A:middle L:90%
and answering some metrics of things. But how do
713
00:31:30.519 --> 00:31:32.220 A:middle L:90%
you actually go ahead and use this to do some
714
00:31:32.220 --> 00:31:33.829 A:middle L:90%
tasks? So the action part, which I'll go
715
00:31:33.829 --> 00:31:37.789 A:middle L:90%
over, is who-to-immunize algorithms. So what's
716
00:31:37.799 --> 00:31:41.970 A:middle L:90%
the problem? So concretely, the problem is that
717
00:31:41.970 --> 00:31:44.009 A:middle L:90%
you're given a virus propagation model and you're given a
718
00:31:44.009 --> 00:31:45.859 A:middle L:90%
budget, right? The budget you can think of as the
719
00:31:45.859 --> 00:31:48.180 A:middle L:90%
number of nodes you want to remove and you want
720
00:31:48.180 --> 00:31:49.730 A:middle L:90%
to find the best k nodes for removal. For
721
00:31:49.730 --> 00:31:52.599 A:middle L:90%
example, k is two in this network, and you remove
722
00:31:52.599 --> 00:31:55.460 A:middle L:90%
these two nodes: are these two nodes better, or
723
00:31:55.460 --> 00:31:57.059 A:middle L:90%
these two nodes? Intuitively you might imagine this
724
00:31:57.059 --> 00:32:00.160 A:middle L:90%
removal is better because it makes the graph a
725
00:32:00.160 --> 00:32:04.460 A:middle L:90%
chain, and as we have seen before, a chain is bad
726
00:32:04.460 --> 00:32:07.329 A:middle L:90%
for the virus because it's hard to spread. So
727
00:32:07.329 --> 00:32:09.630 A:middle L:90%
if you can guess what's coming up that uh so
728
00:32:09.640 --> 00:32:12.819 A:middle L:90%
I'll talk about static graphs and then I try to
729
00:32:12.819 --> 00:32:15.319 A:middle L:90%
give you a flavor of an application. But what
730
00:32:15.319 --> 00:32:16.529 A:middle L:90%
are the challenges in such a question? Right,
731
00:32:16.539 --> 00:32:20.099 A:middle L:90%
so one challenge is the metric: how
732
00:32:20.099 --> 00:32:22.289 A:middle L:90%
do you measure the goodness value of a set of
733
00:32:22.289 --> 00:32:23.599 A:middle L:90%
nodes? You have to measure that: which
734
00:32:23.599 --> 00:32:27.430 A:middle L:90%
of two sets of nodes is better. And an algorithm
735
00:32:27.440 --> 00:32:29.980 A:middle L:90%
. How do you actually go and quickly find these
736
00:32:29.980 --> 00:32:31.829 A:middle L:90%
best k-set of nodes with the highest goodness value
737
00:32:31.839 --> 00:32:36.299 A:middle L:90%
. So uh and given my previous work, the
738
00:32:36.299 --> 00:32:37.750 A:middle L:90%
proposed vulnerability measure is actually lambda. And why
739
00:32:37.750 --> 00:32:40.460 A:middle L:90%
is that? It's because lambda
740
00:32:40.460 --> 00:32:44.190 A:middle L:90%
is the epidemic threshold. So clearly what you want to
741
00:32:44.190 --> 00:32:45.039 A:middle L:90%
really do is to drop the eigenvalue as fast as
742
00:32:45.039 --> 00:32:46.880 A:middle L:90%
possible. Because if it's above the threshold, you
743
00:32:46.880 --> 00:32:49.980 A:middle L:90%
know, there is an epidemic, while below the threshold
744
00:32:49.980 --> 00:32:51.990 A:middle L:90%
there is not. So if you want to remove
745
00:32:51.990 --> 00:32:53.150 A:middle L:90%
nodes, remove nodes in a way that drops the
746
00:32:53.150 --> 00:32:55.890 A:middle L:90%
eigenvalue as fast as possible. And so what I
747
00:32:55.890 --> 00:32:59.109 A:middle L:90%
mean by eigen-drop, again, is the
748
00:32:59.119 --> 00:33:00.960 A:middle L:90%
change in the eigenvalue. Right? So you have
749
00:33:00.970 --> 00:33:02.279 A:middle L:90%
these two graphs: this is the original graph, and here you have
750
00:33:02.279 --> 00:33:05.920 A:middle L:90%
removed nodes two and six. You can see the eigenvalue
751
00:33:05.920 --> 00:33:07.730 A:middle L:90%
for this one is this, and the eigenvalue for this one is this.
752
00:33:07.740 --> 00:33:09.450 A:middle L:90%
The eigenvalue is, again, the largest eigenvalue of
753
00:33:09.450 --> 00:33:12.890 A:middle L:90%
the adjacency matrix. And what you want to find
754
00:33:12.890 --> 00:33:15.950 A:middle L:90%
is that set of nodes which maximizes the eigen-drop
755
00:33:15.049 --> 00:33:20.569 A:middle L:90%
within the budget. So uh as you can guess
756
00:33:20.569 --> 00:33:22.859 A:middle L:90%
the direct algorithm is really expensive. You can prove
757
00:33:22.859 --> 00:33:24.720 A:middle L:90%
that it's NP-hard. Just to give you an example:
758
00:33:24.720 --> 00:33:28.210 A:middle L:90%
if you have just 1000 nodes and you run the
759
00:33:28.220 --> 00:33:30.109 A:middle L:90%
brute-force method, you can see that it takes
760
00:33:30.119 --> 00:33:34.319 A:middle L:90%
like almost 2600 years just to find the five best nodes
761
00:33:34.329 --> 00:33:37.230 A:middle L:90%
. So what's our answer? So our
762
00:33:37.230 --> 00:33:39.859 A:middle L:90%
method involves two parts, again. Part one is
763
00:33:39.859 --> 00:33:42.430 A:middle L:90%
the Shield-value, which is the goodness value of a
764
00:33:42.430 --> 00:33:44.920 A:middle L:90%
set of nodes. What
765
00:33:44.920 --> 00:33:46.170 A:middle L:90%
we did was carefully approximate the eigen-drop, which
766
00:33:46.170 --> 00:33:50.089 A:middle L:90%
is the drop in the eigenvalue, using matrix perturbation theory
767
00:33:50.099 --> 00:33:52.950 A:middle L:90%
. So once you get this formula for the
768
00:33:52.950 --> 00:33:54.720 A:middle L:90%
eigen-drop, you can actually use it. And it turns
769
00:33:54.720 --> 00:33:59.500 A:middle L:90%
out that this function which you
770
00:33:59.500 --> 00:34:01.390 A:middle L:90%
get from the matrix perturbation theory is essentially submodular
771
00:34:01.400 --> 00:34:05.440 A:middle L:90%
. What it really means is that, because
772
00:34:05.440 --> 00:34:07.680 A:middle L:90%
it's submodular, you can do just a greedy approximation
773
00:34:07.680 --> 00:34:09.090 A:middle L:90%
to quickly find the best k nodes. And this gives
774
00:34:09.090 --> 00:34:12.869 A:middle L:90%
you a near-optimal solution whose running
775
00:34:12.880 --> 00:34:15.230 A:middle L:90%
time is linear in both nodes and
776
00:34:15.230 --> 00:34:19.599 A:middle L:90%
edges. So this was nice because uh as you
777
00:34:19.599 --> 00:34:21.710 A:middle L:90%
can see, the direct algorithm is really expensive,
778
00:34:21.719 --> 00:34:22.659 A:middle L:90%
but we were able to get a really near optimal
779
00:34:22.659 --> 00:34:25.929 A:middle L:90%
linear time solution. And this is just an experiment
780
00:34:25.940 --> 00:34:29.260 A:middle L:90%
. So this is a sort of contact graph again.
781
00:34:29.260 --> 00:34:30.760 A:middle L:90%
So this gives you the again the time profile,
782
00:34:30.769 --> 00:34:34.809 A:middle L:90%
this is the log of the fraction of infected nodes versus
783
00:34:34.809 --> 00:34:37.469 A:middle L:90%
time. And these are the different uh immunization metrics
784
00:34:37.469 --> 00:34:40.050 A:middle L:90%
and algorithms currently in use. So you can see
785
00:34:40.050 --> 00:34:42.969 A:middle L:90%
that NetShield is the lowest of all of them
786
00:34:42.969 --> 00:34:44.989 A:middle L:90%
, so it quickly drops to zero and the infection
787
00:34:44.989 --> 00:34:46.239 A:middle L:90%
dies off. But the interesting thing to note here
788
00:34:46.239 --> 00:34:49.039 A:middle L:90%
is that, if you look at acquaintance
789
00:34:49.039 --> 00:34:51.980 A:middle L:90%
immunization, this is a really popular uh method where
790
00:34:51.989 --> 00:34:53.420 A:middle L:90%
you choose a random person out of a phone book.
791
00:34:53.429 --> 00:34:55.420 A:middle L:90%
You don't immunize that random person; you immunize a
792
00:34:55.429 --> 00:34:59.019 A:middle L:90%
random neighbor of that person. What it does is
793
00:34:59.019 --> 00:35:00.980 A:middle L:90%
that it gives you a handle on the degree,
794
00:35:00.989 --> 00:35:01.869 A:middle L:90%
the degree distribution of the graph. So you
795
00:35:01.880 --> 00:35:05.670 A:middle L:90%
tend to immunize hubs, the highly connected nodes,
796
00:35:05.679 --> 00:35:07.570 A:middle L:90%
but you can see that our method is clearly much
797
00:35:07.570 --> 00:35:09.280 A:middle L:90%
better than that. Uh This is because we optimize
798
00:35:09.280 --> 00:35:13.139 A:middle L:90%
the real Shield-value, right? Which is the
799
00:35:13.150 --> 00:35:14.960 A:middle L:90%
eigenvalue which we got from our previous results.
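NOTE Editorial sketch, not from the talk: the eigen-drop objective described above, computed exactly on a small undirected graph and plugged into a plain greedy selection of k nodes. The talk approximates the eigen-drop with matrix perturbation theory to get linear running time; this sketch instead recomputes the largest eigenvalue by power iteration for every candidate, which is only practical for toy graphs. All function names are hypothetical.
```python
import math
def largest_eigenvalue(adj, iters=500):
    """Largest adjacency eigenvalue via power iteration.
    adj maps each node to the set of its neighbours. Iterating on A + I
    (and subtracting 1 at the end) avoids oscillation on bipartite graphs."""
    nodes = list(adj)
    if not nodes:
        return 0.0
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    # Rayleigh quotient of (A + I) at the unit vector x, minus the shift.
    return sum(x[v] * (x[v] + sum(x[u] for u in adj[v])) for v in nodes) - 1.0
def remove_nodes(adj, removed):
    """Return a copy of the graph without the removed (immunized) nodes."""
    removed = set(removed)
    return {v: nbrs - removed for v, nbrs in adj.items() if v not in removed}
def eigen_drop(adj, removed):
    """Decrease of the largest eigenvalue caused by deleting `removed`."""
    return largest_eigenvalue(adj) - largest_eigenvalue(remove_nodes(adj, removed))
def greedy_immunize(adj, k):
    """Pick k nodes one at a time, each maximizing the exact eigen-drop so far."""
    chosen = []
    for _ in range(k):
        candidates = [v for v in adj if v not in chosen]
        best = max(candidates, key=lambda v: eigen_drop(adj, chosen + [v]))
        chosen.append(best)
    return chosen
# Toy example: a star with hub 0 and five leaves.
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
```
Calling `greedy_immunize(star, 1)` on the toy star is a quick way to check that the hub is the node whose removal lowers the eigenvalue most.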
800
00:35:14.969 --> 00:35:21.000 A:middle L:90%
So, as promised, this is
801
00:35:21.010 --> 00:35:23.710 A:middle L:90%
a variant of the node-removal problem, for which
802
00:35:23.710 --> 00:35:27.610 A:middle L:90%
I actually had the good fortune of working with real
803
00:35:27.610 --> 00:35:30.630 A:middle L:90%
doctors at Michigan. In fact, it was interesting
804
00:35:30.630 --> 00:35:32.570 A:middle L:90%
working with the person who was a real doctor because
805
00:35:32.570 --> 00:35:35.980 A:middle L:90%
sometimes he had to come out of surgery for
806
00:35:35.980 --> 00:35:37.280 A:middle L:90%
a meeting and used to tell us things happening
807
00:35:37.280 --> 00:35:42.489 A:middle L:90%
there. So uh the problem here is that you
808
00:35:42.489 --> 00:35:45.389 A:middle L:90%
have a network of hospitals, and hospitals transfer patients,
809
00:35:45.579 --> 00:35:50.130 A:middle L:90%
and these patients are critical, these patients are
810
00:35:50.139 --> 00:35:52.550 A:middle L:90%
critically ill. Right? There are drug resistant bacteria
811
00:35:52.559 --> 00:35:55.230 A:middle L:90%
like this extensively drug-resistant tuberculosis. And what's
812
00:35:55.230 --> 00:35:58.570 A:middle L:90%
happening is that you have a fixed budget
813
00:35:58.570 --> 00:36:00.280 A:middle L:90%
of resources: it can be money, it can be
814
00:36:00.280 --> 00:36:04.679 A:middle L:90%
disinfectant, it can be a specialized medication, which a centralized
815
00:36:04.690 --> 00:36:07.510 A:middle L:90%
agency is trying to distribute among these hospitals. And how do
816
00:36:07.510 --> 00:36:08.469 A:middle L:90%
you do that? So suppose you have this
817
00:36:08.469 --> 00:36:10.730 A:middle L:90%
bunch of disinfectants and you give it to one of
818
00:36:10.730 --> 00:36:14.610 A:middle L:90%
the hospitals, and what really happens is that the hospital
819
00:36:14.610 --> 00:36:17.579 A:middle L:90%
becomes more robust, Right? So the strength of
820
00:36:17.579 --> 00:36:22.050 A:middle L:90%
the infection decreases. And how the strength decreases
821
00:36:22.050 --> 00:36:23.150 A:middle L:90%
is given by a function. So of course there are
822
00:36:23.150 --> 00:36:25.469 A:middle L:90%
some specialized functions used in literature, but the key
823
00:36:25.469 --> 00:36:30.510 A:middle L:90%
idea everywhere is that it's diminishing returns because the more
824
00:36:30.510 --> 00:36:32.019 A:middle L:90%
you give, the more infection-control resources, you
825
00:36:32.019 --> 00:36:35.420 A:middle L:90%
don't get a proportionately higher impact right? Which is
826
00:36:35.429 --> 00:36:37.010 A:middle L:90%
very reasonable. And under such a setting, you
827
00:36:37.010 --> 00:36:39.559 A:middle L:90%
really want to find out how you distribute these and
828
00:36:39.559 --> 00:36:45.019 A:middle L:90%
it can be any grand laboratory uh right to maximize
829
00:36:45.019 --> 00:36:46.809 A:middle L:90%
these hospitals so that these hospital starting a huge technical
830
00:36:46.809 --> 00:36:51.699 A:middle L:90%
picture, have already seen introduction. Right? So
831
00:36:51.699 --> 00:36:52.440 A:middle L:90%
these are the medical network, as I think you
832
00:36:52.440 --> 00:36:53.780 A:middle L:90%
have already seen the graph. So this is the
833
00:36:53.780 --> 00:36:57.710 A:middle L:90%
current practice. The current practice is essentially giving every hospital
834
00:36:57.719 --> 00:37:00.809 A:middle L:90%
an equal amount, because they don't want to discriminate among hospitals
835
00:37:00.820 --> 00:37:02.079 A:middle L:90%
. So you can see SMART-ALLOC, which is our method,
836
00:37:02.079 --> 00:37:05.969 A:middle L:90%
and uh you can see that it has substantially fewer
837
00:37:05.969 --> 00:37:07.940 A:middle L:90%
infections and uh, this is the running time.
838
00:37:07.949 --> 00:37:10.699 A:middle L:90%
So before we came, before we started this collaboration
839
00:37:10.699 --> 00:37:13.690 A:middle L:90%
with the doctors, they were actually running simulations, Monte
840
00:37:13.690 --> 00:37:15.710 A:middle L:90%
Carlo simulations and they used to take more than three
841
00:37:15.710 --> 00:37:17.269 A:middle L:90%
weeks to actually figure out any kind of distribution.
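NOTE Editorial sketch, not from the talk: a toy greedy allocation under diminishing returns, contrasted with the uniform split described as current practice. The objective (sum of per-hospital infection strengths), the curve `diminishing`, and every name and number below are assumptions for illustration; the speaker's actual method and objective are not reproduced here.
```python
def diminishing(strength, units):
    """Assumed diminishing-returns curve: each extra unit helps less."""
    return strength / (1.0 + units)
def greedy_allocate(base_strength, budget):
    """Hand out `budget` unit resources one at a time, each to the hospital
    whose infection strength falls the most from one extra unit."""
    alloc = {h: 0 for h in base_strength}
    def gain(h):
        s, r = base_strength[h], alloc[h]
        return diminishing(s, r) - diminishing(s, r + 1)
    for _ in range(budget):
        best = max(alloc, key=gain)
        alloc[best] += 1
    return alloc
def total_strength(base_strength, alloc):
    """Sum of remaining infection strengths under a given allocation."""
    return sum(diminishing(s, alloc[h]) for h, s in base_strength.items())
# Hypothetical two-hospital example with a budget of two units.
hospitals = {"A": 10.0, "B": 1.0}
greedy = greedy_allocate(hospitals, 2)
uniform = {"A": 1, "B": 1}
```
Comparing `total_strength(hospitals, greedy)` with `total_strength(hospitals, uniform)` shows, for this made-up instance, how a non-uniform split can beat an equal one.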
842
00:37:17.280 --> 00:37:21.500 A:middle L:90%
So this is in contrast to the current practice on
843
00:37:21.500 --> 00:37:22.429 A:middle L:90%
the ground in the hospital, which is just uniform
844
00:37:22.440 --> 00:37:25.349 A:middle L:90%
. So you can see that this is greater than
845
00:37:25.349 --> 00:37:30.139 A:middle L:90%
one week versus just like 14 seconds, more than 30,000
846
00:37:30.150 --> 00:37:31.840 A:middle L:90%
x speed-up. So yeah, I mean, it's a
847
00:37:31.840 --> 00:37:37.400 A:middle L:90%
totally different way of doing these things here. Uh
848
00:37:37.409 --> 00:37:39.840 A:middle L:90%
just to give you a further example of this problem
849
00:37:40.030 --> 00:37:43.230 A:middle L:90%
, coming back to the theme that similar
850
00:37:43.230 --> 00:37:45.190 A:middle L:90%
problems occur in many different areas: this problem also occurs
851
00:37:45.190 --> 00:37:49.250 A:middle L:90%
in social graphs. So this is an online
852
00:37:49.250 --> 00:37:52.969 A:middle L:90%
virtual game, Second Life, and there the administrators have some
853
00:37:52.969 --> 00:37:53.590 A:middle L:90%
resource, which you can think of as
854
00:37:53.590 --> 00:37:55.699 A:middle L:90%
time, and you want
855
00:37:55.699 --> 00:37:59.539 A:middle L:90%
to see which users are misbehaving and which users you should
856
00:37:59.539 --> 00:38:00.670 A:middle L:90%
give time to. And this is the Penn network
857
00:38:00.679 --> 00:38:04.480 A:middle L:90%
. This is just the hospital network of Pennsylvania and
858
00:38:04.480 --> 00:38:06.239 A:middle L:90%
it's an all pairs. So it's kinda was all
859
00:38:06.239 --> 00:38:07.940 A:middle L:90%
kinds of things. Uh again you can see more
860
00:38:07.940 --> 00:38:12.829 A:middle L:90%
than five x or 2.5 X difference between the current
861
00:38:12.829 --> 00:38:16.840 A:middle L:90%
practice and ours. So lower is better.
862
00:38:16.849 --> 00:38:23.090 A:middle L:90%
Right, so the final part which I
863
00:38:23.099 --> 00:38:25.550 A:middle L:90%
would like to talk about the standard processes, learning
864
00:38:25.550 --> 00:38:29.940 A:middle L:90%
models from Twitter, which is kind of a study
865
00:38:29.940 --> 00:38:31.400 A:middle L:90%
on huge data. So this was work done
866
00:38:31.400 --> 00:38:34.170 A:middle L:90%
with Yahoo. What's the problem? We are given
867
00:38:34.170 --> 00:38:37.079 A:middle L:90%
an action log of people tweeting hashtags, right? So
868
00:38:37.090 --> 00:38:38.789 A:middle L:90%
these are people who are tweeting a hashtag, and
869
00:38:38.789 --> 00:38:43.119 A:middle L:90%
you're just recording what they tweeted, and you have
870
00:38:43.119 --> 00:38:45.289 A:middle L:90%
an underlying network of users so this network can be
871
00:38:45.289 --> 00:38:47.960 A:middle L:90%
defined in many different ways, and I'll come to
872
00:38:47.960 --> 00:38:51.590 A:middle L:90%
what we actually use in our work and what we
873
00:38:51.590 --> 00:38:52.980 A:middle L:90%
want to find is how external influence varies with the
874
00:38:52.980 --> 00:38:55.800 A:middle L:90%
hashtag. What does this concretely mean
875
00:38:55.809 --> 00:39:00.389 A:middle L:90%
? So so you have a network which is this
876
00:39:00.389 --> 00:39:04.380 A:middle L:90%
person following this one and so on, and she tweets something
877
00:39:04.389 --> 00:39:06.619 A:middle L:90%
about some topic. It can be Egypt
878
00:39:06.619 --> 00:39:10.150 A:middle L:90%
or Justin Bieber. He picks it up and he
879
00:39:10.150 --> 00:39:13.500 A:middle L:90%
also tweets about it and you see even this person
880
00:39:13.500 --> 00:39:15.760 A:middle L:90%
tweeting about it. So the question we really wanted
881
00:39:15.760 --> 00:39:17.650 A:middle L:90%
to ask is: what really happened? And,
882
00:39:17.650 --> 00:39:20.690 A:middle L:90%
statistically can you figure out what part of it would
883
00:39:20.699 --> 00:39:22.070 A:middle L:90%
be due to this: because he actually saw her
884
00:39:22.079 --> 00:39:24.539 A:middle L:90%
tweeting it, or she just saw something on TV and
885
00:39:24.539 --> 00:39:27.389 A:middle L:90%
went ahead and tweeted about it. So why is
886
00:39:27.389 --> 00:39:29.550 A:middle L:90%
this important? If you are a marketer and
887
00:39:29.559 --> 00:39:31.010 A:middle L:90%
have a bunch of dollars, where should you give the dollars: you
888
00:39:31.010 --> 00:39:34.579 A:middle L:90%
should give the money to this person because she caused
889
00:39:34.579 --> 00:39:37.139 A:middle L:90%
the whole cascade or you should just run an advertisement
890
00:39:37.139 --> 00:39:39.159 A:middle L:90%
on because that will give you much more mileage for
891
00:39:39.159 --> 00:39:43.179 A:middle L:90%
your money. Uh And we wanted to see how
892
00:39:43.179 --> 00:39:45.400 A:middle L:90%
this varies with the hashtag, right? Because as you
893
00:39:45.400 --> 00:39:46.739 A:middle L:90%
can imagine this might depend on the actual content of
894
00:39:46.750 --> 00:39:52.849 A:middle L:90%
the tag. So the data we used was like
895
00:39:52.860 --> 00:39:54.030 A:middle L:90%
as I said, this was at Yahoo, and this
896
00:39:54.030 --> 00:39:57.989 A:middle L:90%
was like almost 15 terabytes of data
897
00:39:58.000 --> 00:40:00.699 A:middle L:90%
. Yahoo had the Twitter firehose, where
898
00:40:00.699 --> 00:40:01.420 A:middle L:90%
they just used to get this whole big dump of
899
00:40:01.429 --> 00:40:05.119 A:middle L:90%
the data: more than 750 million tweets. And
900
00:40:05.130 --> 00:40:07.840 A:middle L:90%
we ran our algorithms on the Hadoop and
901
00:40:07.840 --> 00:40:09.159 A:middle L:90%
Pig system. It's a huge
902
00:40:09.159 --> 00:40:12.210 A:middle L:90%
, they have a really excellent infrastructure, more than
903
00:40:12.210 --> 00:40:15.650 A:middle L:90%
6000 machines. Uh what we did was we took
904
00:40:15.659 --> 00:40:17.630 A:middle L:90%
find hashtags and we just saw how they value their
905
00:40:17.639 --> 00:40:22.050 A:middle L:90%
behavior on a network of users, and in the network a person is connected
906
00:40:22.050 --> 00:40:23.469 A:middle L:90%
to another person if he or she can influence the other
907
00:40:23.469 --> 00:40:25.730 A:middle L:90%
person. Right? So what influence means is,
908
00:40:25.730 --> 00:40:29.510 A:middle L:90%
essentially we are assuming that either you're a follower, or
909
00:40:29.510 --> 00:40:30.719 A:middle L:90%
you have at least actually directed messages to her.
910
00:40:30.730 --> 00:40:34.329 A:middle L:90%
So yeah there are many different ways of defining this
911
00:40:34.329 --> 00:40:36.260 A:middle L:90%
network, this is just one of the more acceptable
912
00:40:36.269 --> 00:40:39.809 A:middle L:90%
ways in the literature right now. So the model
913
00:40:39.809 --> 00:40:42.849 A:middle L:90%
here is this again details like the market here is
914
00:40:42.849 --> 00:40:44.929 A:middle L:90%
that we developed a model where the propagation is
915
00:40:44.929 --> 00:40:46.219 A:middle L:90%
part influence and part external. So the influence part
916
00:40:46.219 --> 00:40:49.960 A:middle L:90%
is essentially how you get influence from the neighbors.
917
00:40:50.110 --> 00:40:52.820 A:middle L:90%
And the external part is what percentage of it is
918
00:40:52.829 --> 00:40:55.239 A:middle L:90%
through externalities rather than the network itself. So what
919
00:40:55.239 --> 00:40:58.300 A:middle L:90%
we did was we developed a model which takes the
920
00:40:58.300 --> 00:41:00.210 A:middle L:90%
previous observations into account. And there are some parameters
921
00:41:00.210 --> 00:41:02.489 A:middle L:90%
which explicitly represent the influence. So I won't
922
00:41:02.489 --> 00:41:05.960 A:middle L:90%
go into the detail of the model itself. But
923
00:41:05.969 --> 00:41:07.150 A:middle L:90%
the key thing to note here is that there are
924
00:41:07.150 --> 00:41:09.980 A:middle L:90%
parameters which actually represent external influence directly. And then
925
00:41:09.980 --> 00:41:13.539 A:middle L:90%
we also developed an EM-style alternating minimization
926
00:41:13.539 --> 00:41:15.519 A:middle L:90%
algorithm to learn these models. But once you have
927
00:41:15.519 --> 00:41:15.880 A:middle L:90%
learned the models, when you have this set of
928
00:41:15.880 --> 00:41:19.460 A:middle L:90%
parameters on different hashtags, what we did was we
929
00:41:19.460 --> 00:41:22.320 A:middle L:90%
went ahead and grouped these tags according to the parameters
930
00:41:22.329 --> 00:41:23.780 A:middle L:90%
. And that's where we got interesting results on the
931
00:41:23.780 --> 00:41:27.599 A:middle L:90%
different behavior of hashtags. To give you a flavor of
932
00:41:27.599 --> 00:41:29.679 A:middle L:90%
the results, you can see that. So
933
00:41:29.679 --> 00:41:31.280 A:middle L:90%
if you think about these being external effects, so
934
00:41:31.280 --> 00:41:37.289 A:middle L:90%
more the external effect is essentially more externalities in the
935
00:41:37.300 --> 00:41:39.989 A:middle L:90%
hashtag, and this is what is sustained. And note that these
936
00:41:39.989 --> 00:41:43.409 A:middle L:90%
are parameters which are learned by the model, not
937
00:41:43.420 --> 00:41:45.989 A:middle L:90%
given in the data itself. So these are parameters as
938
00:41:45.989 --> 00:41:47.579 A:middle L:90%
inferred by the model. So there are a bunch of different
939
00:41:47.579 --> 00:41:51.570 A:middle L:90%
behaviors; I'll just go over two of them. So
940
00:41:51.579 --> 00:41:53.150 A:middle L:90%
these are hashtags which represent the long-running tags
941
00:41:53.159 --> 00:41:54.599 A:middle L:90%
. So you can see that they have a really high
942
00:41:54.599 --> 00:41:57.510 A:middle L:90%
external component. So these are tags which, I
943
00:41:57.510 --> 00:41:59.550 A:middle L:90%
mean, have been on Twitter for almost two years. So there
944
00:41:59.550 --> 00:42:01.369 A:middle L:90%
is really no local component there, there is no network
945
00:42:01.380 --> 00:42:04.199 A:middle L:90%
effect there. People know about the tags and they
946
00:42:04.199 --> 00:42:05.739 A:middle L:90%
tweet it whenever they want. For example, this
947
00:42:05.739 --> 00:42:07.630 A:middle L:90%
just says now watching: this is, what are they
948
00:42:07.630 --> 00:42:09.380 A:middle L:90%
watching now? It doesn't really depend on your friend
949
00:42:09.389 --> 00:42:13.730 A:middle L:90%
. Right? And so these are tags with a really
950
00:42:13.730 --> 00:42:15.110 A:middle L:90%
high external component. On the other hand,
951
00:42:15.119 --> 00:42:17.610 A:middle L:90%
these are really word-of-mouth processes. These are
952
00:42:17.619 --> 00:42:22.949 A:middle L:90%
hashtags which actually grow organically in local networks and then
953
00:42:22.960 --> 00:42:25.599 A:middle L:90%
take off, and then they are mentioned in the Twitter trending
954
00:42:25.610 --> 00:42:29.329 A:middle L:90%
topics list, and these trending topics essentially capture the
955
00:42:29.329 --> 00:42:31.739 A:middle L:90%
Slashdot effect, because suddenly they're on the trending list,
956
00:42:31.750 --> 00:42:34.690 A:middle L:90%
everybody knows about it and then there is no local
957
00:42:34.690 --> 00:42:37.030 A:middle L:90%
component again because now everybody knows about it and they
958
00:42:37.039 --> 00:42:37.829 A:middle L:90%
talk about it. So what we were able to
959
00:42:37.829 --> 00:42:39.579 A:middle L:90%
do is, we were able to capture these kinds
960
00:42:39.579 --> 00:42:42.659 A:middle L:90%
of different behaviors, right? And if you have
961
00:42:42.670 --> 00:42:44.849 A:middle L:90%
money, so these are just pastilles, external events
962
00:42:44.849 --> 00:42:46.039 A:middle L:90%
and trending. So if you have a bunch of
963
00:42:46.039 --> 00:42:49.619 A:middle L:90%
money, you would give it to hashtags that behave like
964
00:42:49.619 --> 00:42:52.139 A:middle L:90%
this, right? Because essentially you are just seeding
965
00:42:52.139 --> 00:42:55.440 A:middle L:90%
the community with these hashtags; they're talking about it
966
00:42:55.480 --> 00:42:57.969 A:middle L:90%
and then they suddenly take off and a lot of
967
00:42:57.980 --> 00:42:59.880 A:middle L:90%
people know about it. So this gives you the
968
00:42:59.880 --> 00:43:01.900 A:middle L:90%
greatest bang for your buck. And the nice thing
969
00:43:01.900 --> 00:43:04.809 A:middle L:90%
here is that you can use this for forecasting and anomaly
970
00:43:04.809 --> 00:43:06.969 A:middle L:90%
detection, because if you have a hashtag which is expected
971
00:43:06.969 --> 00:43:07.440 A:middle L:90%
to behave in a certain way but behaves
972
00:43:07.590 --> 00:43:12.920 A:middle L:90%
differently, then you can do something there. Right?
973
00:43:12.929 --> 00:43:15.889 A:middle L:90%
So this concludes the uh dynamical process part of my
974
00:43:15.900 --> 00:43:19.280 A:middle L:90%
talk. So I'll try to quickly go over,
975
00:43:19.289 --> 00:43:22.280 A:middle L:90%
for five minutes or so, some of the other interesting
976
00:43:22.280 --> 00:43:22.860 A:middle L:90%
work I have done. So as I said,
977
00:43:22.860 --> 00:43:24.610 A:middle L:90%
I won't get too much time to discuss the details
978
00:43:24.610 --> 00:43:25.760 A:middle L:90%
, but I will be happy to talk about it
979
00:43:25.769 --> 00:43:30.099 A:middle L:90%
offline. So the first thing is like uh community
980
00:43:30.110 --> 00:43:32.110 A:middle L:90%
detection. This was work I did with Sprint Research
981
00:43:32.119 --> 00:43:34.710 A:middle L:90%
. So the nice thing is that they had really
982
00:43:34.710 --> 00:43:37.239 A:middle L:90%
huge amounts of data on mobile call graph users, right
983
00:43:37.239 --> 00:43:38.699 A:middle L:90%
? So essentially like we had, we collected data
984
00:43:38.699 --> 00:43:42.289 A:middle L:90%
from a switch in a large US city, which should remain
985
00:43:42.289 --> 00:43:45.670 A:middle L:90%
anonymous, and we had 200,000 users and millions of calls
986
00:43:45.679 --> 00:43:49.559 A:middle L:90%
. So what we wanted to do is essentially understand
987
00:43:49.570 --> 00:43:52.000 A:middle L:90%
what these graphs look like. So,
988
00:43:52.010 --> 00:43:54.400 A:middle L:90%
to cut a long story short, we did
989
00:43:54.400 --> 00:43:58.260 A:middle L:90%
some exploratory data analysis, but the key thing is
990
00:43:58.260 --> 00:44:00.050 A:middle L:90%
that, if the graphs look
991
00:44:00.050 --> 00:44:04.449 A:middle L:90%
like this, which is essentially a core with a
992
00:44:04.449 --> 00:44:07.130 A:middle L:90%
lot of different small communities connected to the core,
993
00:44:07.139 --> 00:44:09.019 A:middle L:90%
then you can quickly identify these small communities and then
994
00:44:09.019 --> 00:44:12.480 A:middle L:90%
a really interesting thing is that these communities can be
995
00:44:12.480 --> 00:44:15.030 A:middle L:90%
cliques or bipartite cores. So even finding
996
00:44:15.039 --> 00:44:16.869 A:middle L:90%
planted cliques is a really hard problem, right?
997
00:44:16.880 --> 00:44:21.119 A:middle L:90%
But, uh, finding bipartite cores is even harder, and
998
00:44:21.119 --> 00:44:22.809 A:middle L:90%
bipartite cores are not really communities in the
999
00:44:22.809 --> 00:44:24.650 A:middle L:90%
traditional sense because there's no real connection between them,
1000
00:44:24.659 --> 00:44:27.670 A:middle L:90%
there's a connection across them to the other side.
1001
00:44:27.679 --> 00:44:30.309 A:middle L:90%
So, and what we found is that this kind
1002
00:44:30.309 --> 00:44:32.760 A:middle L:90%
of pattern, the pattern which we developed, occurs in a
1003
00:44:32.760 --> 00:44:35.409 A:middle L:90%
lot of different data sets, a lot of different
1004
00:44:35.489 --> 00:44:37.940 A:middle L:90%
uh, switches and users and months in the Sprint
1005
00:44:37.940 --> 00:44:39.000 A:middle L:90%
dataset as well. So using this we were able
1006
00:44:39.000 --> 00:44:43.269 A:middle L:90%
to find really interesting communities. For example, this
1007
00:44:43.269 --> 00:44:45.320 A:middle L:90%
is a patent graph. This is just who cited
1008
00:44:45.320 --> 00:44:46.590 A:middle L:90%
whom in a patent citation network. You can see
1009
00:44:46.590 --> 00:44:49.500 A:middle L:90%
that you can find, we could quickly find these
1010
00:44:49.500 --> 00:44:51.179 A:middle L:90%
two kinds of communities, which are, like, patents from the
1011
00:44:51.179 --> 00:44:53.119 A:middle L:90%
same inventors. And this shows the curtain based
1012
00:44:53.230 --> 00:44:55.809 A:middle L:90%
geographic, essentially all patents were on the same thing
1013
00:44:55.809 --> 00:44:58.699 A:middle L:90%
and you just go ahead and pick the reference from
1014
00:44:58.699 --> 00:45:00.809 A:middle L:90%
the previous patent and just copy it. So you're
1015
00:45:00.809 --> 00:45:01.880 A:middle L:90%
citing all these previous patents which the other one cited
1016
00:45:02.480 --> 00:45:07.199 A:middle L:90%
. And so uh, for the Sprint for example
1017
00:45:07.199 --> 00:45:08.099 A:middle L:90%
, these kinds of communities, which are near-cliques and
1018
00:45:08.099 --> 00:45:12.400 A:middle L:90%
near-bipartite cores, were representative of much more business
1019
00:45:12.409 --> 00:45:15.980 A:middle L:90%
. Uh, many phenomena which are important to business
1020
00:45:15.980 --> 00:45:17.150 A:middle L:90%
right? For example, all these people who
1021
00:45:17.150 --> 00:45:20.420 A:middle L:90%
are talking to each other, they leave together
1022
00:45:20.420 --> 00:45:21.780 A:middle L:90%
. Right? One of them leaves the company,
1023
00:45:21.780 --> 00:45:23.199 A:middle L:90%
one of them switches service providers, and the
1024
00:45:23.199 --> 00:45:25.389 A:middle L:90%
others switch also. So that was one thing
1025
00:45:25.389 --> 00:45:27.909 A:middle L:90%
interesting. We were able to actually go ahead and
1026
00:45:27.909 --> 00:45:30.349 A:middle L:90%
talk to the marketing department of Sprint and validate that
1027
00:45:30.360 --> 00:45:34.090 A:middle L:90%
. So that was this work. And so yeah,
1028
00:45:34.099 --> 00:45:37.190 A:middle L:90%
the other thing is time series analysis. Uh,
1029
00:45:37.199 --> 00:45:39.030 A:middle L:90%
so here the, one of the data sets which
1030
00:45:39.030 --> 00:45:42.719 A:middle L:90%
I used was, like, BGP router updates. Yeah,
1031
00:45:42.730 --> 00:45:45.360 A:middle L:90%
BGP routers, you can imagine, are just some networking routers
1032
00:45:45.360 --> 00:45:47.130 A:middle L:90%
which propagate path information across the network. And the
1033
00:45:47.130 --> 00:45:50.380 A:middle L:90%
network we used was the Abilene network. It's a
1034
00:45:50.380 --> 00:45:52.349 A:middle L:90%
famous research network all throughout the United States. So
1035
00:45:52.349 --> 00:45:55.119 A:middle L:90%
we had like almost 80 million updates over two years
1036
00:45:55.250 --> 00:45:58.079 A:middle L:90%
. And so what we have is that you have
1037
00:45:58.079 --> 00:46:00.469 A:middle L:90%
time series at each of these routers, essentially
1038
00:46:00.480 --> 00:46:02.239 A:middle L:90%
the amount of, uh, traffic per unit time. That's
1039
00:46:02.239 --> 00:46:05.250 A:middle L:90%
it. And you want to find patterns and anomalies
1040
00:46:05.260 --> 00:46:07.610 A:middle L:90%
. So very open ended question here. So what
1041
00:46:07.610 --> 00:46:08.789 A:middle L:90%
we did was, we were able to concentrate
1042
00:46:08.789 --> 00:46:10.809 A:middle L:90%
on two patterns which are important even from a networking
1043
00:46:10.809 --> 00:46:14.940 A:middle L:90%
point of view. And uh, so one of
1044
00:46:14.940 --> 00:46:16.920 A:middle L:90%
them is for example, this is just the uh
1045
00:46:16.929 --> 00:46:21.630 A:middle L:90%
same time series after taking the logarithm, plotted over time
1046
00:46:21.639 --> 00:46:23.579 A:middle L:90%
. You can find that there's something really steady and
1047
00:46:23.579 --> 00:46:25.659 A:middle L:90%
constant happening here. Right. And why is this
1048
00:46:25.659 --> 00:46:28.760 A:middle L:90%
important? This just means that there is a constant
1049
00:46:28.760 --> 00:46:30.179 A:middle L:90%
steady traffic for a long period of time. And
1050
00:46:30.179 --> 00:46:32.519 A:middle L:90%
what it means is that it relates to
1051
00:46:32.519 --> 00:46:36.460 A:middle L:90%
a real networking phenomenon called route flapping, which is
1052
00:46:36.469 --> 00:46:38.420 A:middle L:90%
uh, like people advertising an IP and then
1053
00:46:38.420 --> 00:46:40.590 A:middle L:90%
taking it back, and they do this for a long
1054
00:46:40.590 --> 00:46:43.349 A:middle L:90%
period of time. And this really points to
1055
00:46:43.349 --> 00:46:45.050 A:middle L:90%
inefficiencies in the network. So we were actually using
1056
00:46:45.050 --> 00:46:47.510 A:middle L:90%
our method, we were actually able to find it, and, uh, we
1057
00:46:47.510 --> 00:46:52.019 A:middle L:90%
pointed it out to the Alabama supercomputing network, which confirmed that one of
1058
00:46:52.019 --> 00:46:54.340 A:middle L:90%
the routers was actually flapping. And, uh, this went
1059
00:46:54.340 --> 00:46:57.699 A:middle L:90%
undetected and unresolved for almost 30 days. Like, this is a
1060
00:46:57.699 --> 00:47:00.539 A:middle L:90%
professionally managed network. And so this shows that it's
1061
00:47:00.539 --> 00:47:01.780 A:middle L:90%
really important to do these kinds of pattern analysis on
1062
00:47:01.780 --> 00:47:05.170 A:middle L:90%
the historical data as well. Right? And the
1063
00:47:05.170 --> 00:47:07.090 A:middle L:90%
other thing is also if you just if I just
1064
00:47:07.090 --> 00:47:08.909 A:middle L:90%
give you this bursty sequence, it's really hard to
1065
00:47:08.909 --> 00:47:13.219 A:middle L:90%
find anything useful there. Right? And it turns
1066
00:47:13.219 --> 00:47:14.909 A:middle L:90%
out that if you just magnify this part of the
1067
00:47:14.909 --> 00:47:17.269 A:middle L:90%
portion there is a really huge short burst of traffic
1068
00:47:17.280 --> 00:47:19.980 A:middle L:90%
and it's a short burst, right, just eight
1069
00:47:19.980 --> 00:47:22.800 A:middle L:90%
hours, compared to the month-long steady activity of
1070
00:47:22.809 --> 00:47:27.260 A:middle L:90%
uh the other event. And what we were able
1071
00:47:27.260 --> 00:47:29.670 A:middle L:90%
to show was, uh, it was because there
1072
00:47:29.670 --> 00:47:31.340 A:middle L:90%
were spammers in some middle schools in China, and,
1073
00:47:31.349 --> 00:47:34.530 A:middle L:90%
uh, what was happening is that they used to spam
1074
00:47:34.539 --> 00:47:37.010 A:middle L:90%
quickly for eight hours and then go and take a
1075
00:47:37.039 --> 00:47:37.550 A:middle L:90%
second IP block and do the same
1076
00:47:37.550 --> 00:47:39.960 A:middle L:90%
thing again. So and we were able to do
1077
00:47:39.960 --> 00:47:43.940 A:middle L:90%
this using a multi-scale, uh, analysis which uses wavelets.
1078
00:47:43.940 --> 00:47:45.010 A:middle L:90%
I won't go into details, but there's an
1079
00:47:45.010 --> 00:47:47.500 A:middle L:90%
algorithm which we developed, an algorithm which quickly identifies
1080
00:47:47.510 --> 00:47:51.940 A:middle L:90%
the spikes from the data. So right. The
1081
00:47:51.940 --> 00:47:54.159 A:middle L:90%
last time series analysis question which we tried to answer
1082
00:47:54.159 --> 00:47:58.329 A:middle L:90%
is, uh, answering similarity queries. So this was motivated
1083
00:47:58.329 --> 00:48:00.260 A:middle L:90%
again by the BGP data which we had, and also
1084
00:48:00.260 --> 00:48:01.849 A:middle L:90%
many different kinds of data, like data center monitoring
1085
00:48:01.849 --> 00:48:04.869 A:middle L:90%
data where you have time series from different sensors on
1086
00:48:04.869 --> 00:48:08.699 A:middle L:90%
the uh data center uh physio physiotherapy data, healthcare
1087
00:48:08.699 --> 00:48:13.429 A:middle L:90%
data for example heartbeats or some sensors on your body
1088
00:48:13.440 --> 00:48:15.940 A:middle L:90%
. Or even motion capture data where you can have
1089
00:48:15.940 --> 00:48:19.039 A:middle L:90%
markers which track the movement in time. And what you
1090
00:48:19.039 --> 00:48:20.440 A:middle L:90%
really want to answer is that if you have a
1091
00:48:20.440 --> 00:48:22.739 A:middle L:90%
database of such types of time series can you quickly
1092
00:48:22.739 --> 00:48:25.449 A:middle L:90%
find other similar ones, given a query? So
1093
00:48:25.449 --> 00:48:28.039 A:middle L:90%
what are the challenges? I'll just try to give
1094
00:48:28.039 --> 00:48:29.960 A:middle L:90%
you a challenge in the BGP setting. So if
1095
00:48:29.960 --> 00:48:31.550 A:middle L:90%
you have, say, the time series from the Washington router
1096
00:48:31.550 --> 00:48:36.519 A:middle L:90%
and you have a time series from uh Salt Lake
1097
00:48:36.519 --> 00:48:37.820 A:middle L:90%
City router, then how do you say whether
1098
00:48:37.820 --> 00:48:42.159 A:middle L:90%
these are similar? So people have used the traditional
1099
00:48:42.159 --> 00:48:45.119 A:middle L:90%
classical methods of Euclidean distance or dynamic time warping,
1100
00:48:45.119 --> 00:48:46.670 A:middle L:90%
which captures like. But the problem with all these
1101
00:48:46.670 --> 00:48:49.559 A:middle L:90%
methods, like for example, for the Euclidean distance,
1102
00:48:49.570 --> 00:48:52.219 A:middle L:90%
if the spikes align then clearly there's nothing else you
1103
00:48:52.219 --> 00:48:52.800 A:middle L:90%
need to do. Most of the series will be
1104
00:48:52.800 --> 00:48:55.880 A:middle L:90%
classified as similar. So because these sequences are so bursty
1105
00:48:55.880 --> 00:48:59.530 A:middle L:90%
, you need something else. And there are uh
1106
00:48:59.539 --> 00:49:02.070 A:middle L:90%
specific problems with other distance functions as well. So
1107
00:49:02.070 --> 00:49:05.300 A:middle L:90%
what we did was we developed a complex, extended
1108
00:49:05.300 --> 00:49:07.630 A:middle L:90%
the classic real-valued Kalman filters, which are like graphical
1109
00:49:07.630 --> 00:49:12.449 A:middle L:90%
models, uh like hidden Markov models which try to
1110
00:49:12.449 --> 00:49:15.250 A:middle L:90%
capture the statistical evolution of the data. So what
1111
00:49:15.250 --> 00:49:17.440 A:middle L:90%
we were able to do is, we extended these, uh,
1112
00:49:17.449 --> 00:49:21.309 A:middle L:90%
real-valued Kalman filters to the complex domain. So we
1113
00:49:21.309 --> 00:49:24.159 A:middle L:90%
have now complex variables and complex distributions. And
1114
00:49:24.159 --> 00:49:27.369 A:middle L:90%
we were able to develop an EM-
1115
00:49:27.369 --> 00:49:30.590 A:middle L:90%
style algorithm again here, which quickly learns these features
1116
00:49:30.599 --> 00:49:31.659 A:middle L:90%
. And once you're given these features, you can quickly
1117
00:49:31.659 --> 00:49:34.860 A:middle L:90%
classify, cluster, or do anything with them. And the
1118
00:49:34.860 --> 00:49:37.250 A:middle L:90%
nice thing about these features is that it learns dynamics; it
1119
00:49:37.250 --> 00:49:39.710 A:middle L:90%
learns the underlying uh evolution system. Like for example
1120
00:49:39.710 --> 00:49:42.550 A:middle L:90%
in the motion capture data, it learns how you
1121
00:49:42.550 --> 00:49:45.070 A:middle L:90%
walk. If there are two walking motions, most of them will
1122
00:49:45.070 --> 00:49:45.570 A:middle L:90%
remain the same, right? If you're walking a
1123
00:49:45.579 --> 00:49:47.630 A:middle L:90%
little bit faster, the dynamics are similar. Just
1124
00:49:47.630 --> 00:49:50.360 A:middle L:90%
the parameters are slightly different, but you want them
1125
00:49:50.360 --> 00:49:52.559 A:middle L:90%
to be clustered together. So. So yeah,
1126
00:49:52.570 --> 00:49:53.699 A:middle L:90%
so it captures all the nice things. And the
1127
00:49:53.710 --> 00:49:57.139 A:middle L:90%
really nice thing also is that it includes all these workhorses
1128
00:49:57.139 --> 00:49:59.429 A:middle L:90%
, right? Like PCA, autoregression, and the
1129
00:49:59.429 --> 00:50:00.369 A:middle L:90%
DFT, as special cases of the model.
1130
00:50:00.380 --> 00:50:04.039 A:middle L:90%
Uh, right. And if you apply it on BGP
1131
00:50:04.039 --> 00:50:07.380 A:middle L:90%
data, you can quickly see these clusters like which
1132
00:50:07.380 --> 00:50:08.829 A:middle L:90%
are, which makes sense because they're geographical clusters, and
1133
00:50:08.840 --> 00:50:12.889 A:middle L:90%
because of the BGP routing protocol you would expect geographically closer
1134
00:50:12.889 --> 00:50:14.869 A:middle L:90%
cities to be essentially similar in traffic.
1135
00:50:14.880 --> 00:50:19.340 A:middle L:90%
So right, right. Finally, future plans.
1136
00:50:19.349 --> 00:50:22.130 A:middle L:90%
So my research theme has essentially touched upon all
1137
00:50:22.130 --> 00:50:23.199 A:middle L:90%
these three things which you have already seen, like the data,
1138
00:50:23.199 --> 00:50:25.909 A:middle L:90%
analysis and the policy action part. So my research
1139
00:50:25.909 --> 00:50:29.550 A:middle L:90%
future plans also are more in line with these three right
1140
00:50:29.559 --> 00:50:31.409 A:middle L:90%
now. So the first challenge in the data part
1141
00:50:31.409 --> 00:50:35.099 A:middle L:90%
is of course scalability, given the unprecedented amount of data
1142
00:50:35.110 --> 00:50:37.300 A:middle L:90%
. Like you really need algorithms and techniques for massive
1143
00:50:37.300 --> 00:50:39.909 A:middle L:90%
graphs, and the interesting thing here
1144
00:50:39.909 --> 00:50:43.760 A:middle L:90%
is that you have high dimensionality which is you have
1145
00:50:43.760 --> 00:50:45.989 A:middle L:90%
much richer data as well as a large sample size
1146
00:50:45.989 --> 00:50:46.730 A:middle L:90%
. So a lot of data. So you need
1147
00:50:46.730 --> 00:50:49.840 A:middle L:90%
scalable algorithms for both learning models on the
1148
00:50:49.840 --> 00:50:52.510 A:middle L:90%
data if you remember the model learning part and also
1149
00:50:52.510 --> 00:50:55.809 A:middle L:90%
developing policies because if you develop a policy for actually
1150
00:50:55.820 --> 00:50:59.340 A:middle L:90%
manipulating the process, you need to make it act
1151
00:50:59.340 --> 00:51:00.369 A:middle L:90%
on really large data. So it's important in both of
1152
00:51:00.369 --> 00:51:02.860 A:middle L:90%
these cases. So as part of my at least
1153
00:51:02.869 --> 00:51:06.519 A:middle L:90%
like, what I use are, like, MapReduce clusters
1154
00:51:06.519 --> 00:51:07.500 A:middle L:90%
and, like, Hadoop for the data analysis
1155
00:51:07.510 --> 00:51:09.420 A:middle L:90%
part, and also, uh, if you have compute-
1156
00:51:09.420 --> 00:51:14.849 A:middle L:90%
intensive simulations, use, like, parallelized systems. So,
1157
00:51:14.860 --> 00:51:19.269 A:middle L:90%
the second thrust is on understanding and analysis. Like,
1158
00:51:19.280 --> 00:51:22.599 A:middle L:90%
it's using models to forecast, which is essentially building predictive
1159
00:51:22.599 --> 00:51:24.800 A:middle L:90%
models. So this is one thing for example and
1160
00:51:24.809 --> 00:51:29.159 A:middle L:90%
actually forecasting and back casting the processes. For example
1161
00:51:29.159 --> 00:51:30.670 A:middle L:90%
this is a nice example you have all these people
1162
00:51:30.670 --> 00:51:32.780 A:middle L:90%
infected, right? Can you actually reverse-engineer the epidemic and
1163
00:51:32.780 --> 00:51:36.269 A:middle L:90%
figure out who started it? So that's one way
1164
00:51:36.269 --> 00:51:37.460 A:middle L:90%
of using the model to actually back-cast and predict
1165
00:51:37.460 --> 00:51:39.590 A:middle L:90%
who are the culprits. And the other thing is
1166
00:51:39.590 --> 00:51:42.489 A:middle L:90%
like merging models. So, suppose
1167
00:51:42.489 --> 00:51:44.960 A:middle L:90%
you have this google trends data right? This essentially
1168
00:51:44.960 --> 00:51:45.429 A:middle L:90%
gives you the number of flu queries per unit
1169
00:51:45.440 --> 00:51:49.409 A:middle L:90%
time and you have also CDC data which gives you
1170
00:51:49.409 --> 00:51:52.010 A:middle L:90%
the population density and the distribution of population in the United States
1171
00:51:52.019 --> 00:51:53.210 A:middle L:90%
. And if you have a model learned on
1172
00:51:53.210 --> 00:51:55.059 A:middle L:90%
this time series for example, you can say that
1173
00:51:55.059 --> 00:51:58.309 A:middle L:90%
the slope is a 1.5. So can you actually
1174
00:51:58.309 --> 00:52:00.380 A:middle L:90%
predict which part of the network it actually originated from
1175
00:52:00.389 --> 00:52:01.579 A:middle L:90%
? Can you actually identify which part of the network
1176
00:52:01.579 --> 00:52:06.079 A:middle L:90%
actually gave you that, uh, post, and figure out how
1177
00:52:06.079 --> 00:52:08.440 A:middle L:90%
the epidemic is actually spreading. So essentially combining different
1178
00:52:08.449 --> 00:52:12.420 A:middle L:90%
kinds of models. And the third challenge is active
1179
00:52:12.420 --> 00:52:15.070 A:middle L:90%
policy which is on more online and timely intervention.
1180
00:52:15.079 --> 00:52:17.539 A:middle L:90%
So right now I showed you immunization algorithms where,
1181
00:52:17.550 --> 00:52:20.860 A:middle L:90%
once you make the decision, you don't really change it
1182
00:52:20.860 --> 00:52:22.559 A:middle L:90%
. Right? So how do you update these decisions
1183
00:52:22.559 --> 00:52:23.929 A:middle L:90%
on time? So for example if you have a
1184
00:52:23.929 --> 00:52:27.070 A:middle L:90%
lot of money, should you spend all the money at once,
1185
00:52:27.070 --> 00:52:29.980 A:middle L:90%
doing a marketing campaign, say on Twitter? Or
1186
00:52:29.980 --> 00:52:31.559 A:middle L:90%
should you try to spread the money over a month, given
1187
00:52:31.559 --> 00:52:35.150 A:middle L:90%
how it goes? Or should you just, uh, have
1188
00:52:35.150 --> 00:52:37.030 A:middle L:90%
a constant, uh, supply? What do you do if the
1189
00:52:37.030 --> 00:52:40.059 A:middle L:90%
vaccination campaign is not going well. Uh how do
1190
00:52:40.059 --> 00:52:43.269 A:middle L:90%
you change the parameters, how do you target, and
1191
00:52:43.269 --> 00:52:45.690 A:middle L:90%
change the immunization algorithms? So, uh, the final thing
1192
00:52:45.690 --> 00:52:47.300 A:middle L:90%
is like you want to tighten this, I want
1193
00:52:47.300 --> 00:52:51.159 A:middle L:90%
to tighten, really tie together, the analysis and policy
1194
00:52:51.159 --> 00:52:53.659 A:middle L:90%
and action. So for example, right now the
1195
00:52:53.670 --> 00:52:57.460 A:middle L:90%
objectives are probably not transparent in the design
1196
00:52:57.460 --> 00:52:59.699 A:middle L:90%
and analysis of these actions. Right? So for
1197
00:52:59.699 --> 00:53:00.980 A:middle L:90%
example, when you want to do immunization, do
1198
00:53:00.980 --> 00:53:02.840 A:middle L:90%
you want to minimize the expected number of people,
1199
00:53:02.849 --> 00:53:07.539 A:middle L:90%
uh, infected, or do you want to minimize economic damage? How
1200
00:53:07.539 --> 00:53:09.409 A:middle L:90%
do you actually bring them into the analysis part and actually
1201
00:53:09.420 --> 00:53:13.269 A:middle L:90%
do an analysis and give optimal or near-optimal algorithms
1202
00:53:13.280 --> 00:53:15.110 A:middle L:90%
for this? Uh, I'm also collaborating with the MIDAS group
1203
00:53:15.110 --> 00:53:17.130 A:middle L:90%
at the University of Pittsburgh, which is, like, which
1204
00:53:17.130 --> 00:53:21.239 A:middle L:90%
is also an agent based simulations group, which where
1205
00:53:21.239 --> 00:53:22.550 A:middle L:90%
we're trying to study: can we actually, uh, develop optimal
1206
00:53:22.550 --> 00:53:27.150 A:middle L:90%
algorithms for such on the fly campaigns. So,
1207
00:53:27.389 --> 00:53:29.659 A:middle L:90%
uh hopefully, finally, I've given you a sense
1208
00:53:29.659 --> 00:53:31.239 A:middle L:90%
of that: dynamical processes on networks are a
1209
00:53:31.239 --> 00:53:34.570 A:middle L:90%
really rich area. There are common problems in multiple
1210
00:53:34.570 --> 00:53:37.730 A:middle L:90%
settings, and not only CS, uh, areas are really
1211
00:53:37.730 --> 00:53:43.570 A:middle L:90%
heavily involved, like machine learning and statistics, uh, computer systems
1212
00:53:43.570 --> 00:53:45.630 A:middle L:90%
for data analysis, uh, like Hadoop, big data analysis, and
1213
00:53:45.630 --> 00:53:49.369 A:middle L:90%
theory and algorithms, but they also have real outreach and
1214
00:53:49.369 --> 00:53:52.940 A:middle L:90%
applications in many different areas like biology, like ecology
1215
00:53:52.949 --> 00:53:54.690 A:middle L:90%
, neuroscience, epidemiology, public health, physics, like
1216
00:53:54.690 --> 00:53:59.070 A:middle L:90%
money in the systems, uh social sciences, understanding
1217
00:53:59.070 --> 00:54:01.690 A:middle L:90%
human behavior, mobility, and economics, like viral marketing, and
1218
00:54:01.699 --> 00:54:05.599 A:middle L:90%
a lot of different things. So just a bit
1219
00:54:05.599 --> 00:54:07.780 A:middle L:90%
of shameless self-promotion: this is the list of publications I
1220
00:54:07.780 --> 00:54:10.010 A:middle L:90%
have and uh the stars represent the publications I talked
1221
00:54:10.010 --> 00:54:13.280 A:middle L:90%
about in the talk; the double stars represent the ones I
1222
00:54:13.280 --> 00:54:15.559 A:middle L:90%
mentioned in a bit more detail. Uh, a couple
1223
00:54:15.559 --> 00:54:20.050 A:middle L:90%
of patents in summit advance. Uh I wish to
1224
00:54:20.050 --> 00:54:22.639 A:middle L:90%
thank my collaborators from the different universities and research labs
1225
00:54:22.639 --> 00:54:27.840 A:middle L:90%
and also graduate students and uh also the funding agencies
1226
00:54:27.849 --> 00:54:32.760 A:middle L:90%
. Thank you. Uh Mhm. Okay. Mhm
1227
00:54:34.150 --> A:middle L:90%
. Yeah.