WEBVTT

1
00:00:09.140 --> 00:00:12.580 A:middle L:90%
Thanks for the introduction. So I hope everybody can

2
00:00:12.580 --> 00:00:15.240 A:middle L:90%
hear me; please do say if not. And uh

3
00:00:15.250 --> 00:00:18.129 A:middle L:90%
I hope this is visible. It is? Great.

4
00:00:18.140 --> 00:00:20.690 A:middle L:90%
Thanks. So, uh as I said, I'm

5
00:00:20.690 --> 00:00:22.609 A:middle L:90%
on today, I'll be talking about dynamical processes on

6
00:00:22.609 --> 00:00:27.410 A:middle L:90%
large networks today. And uh uh first I hope

7
00:00:27.410 --> 00:00:29.339 A:middle L:90%
I don't have to convince you that networks

8
00:00:29.339 --> 00:00:31.949 A:middle L:90%
are really everywhere. For example, uh the social

9
00:00:31.949 --> 00:00:34.420 A:middle L:90%
network, for example, where people are friends uh

10
00:00:34.429 --> 00:00:36.579 A:middle L:90%
of each other. And you can see the Facebook

11
00:00:36.579 --> 00:00:39.090 A:middle L:90%
network here in 2010. This is a very interesting network.

12
00:00:39.090 --> 00:00:43.210 A:middle L:90%
It's a human disease network where nodes are diseases and edges

13
00:00:43.210 --> 00:00:45.969 A:middle L:90%
are between diseases that share genes. So the nice thing

14
00:00:45.969 --> 00:00:48.250 A:middle L:90%
about networks is that they give you the local structure

15
00:00:48.250 --> 00:00:50.079 A:middle L:90%
as well as the global information. Right? Like

16
00:00:50.090 --> 00:00:52.609 A:middle L:90%
you can quickly see how the social network is distributed

17
00:00:52.619 --> 00:00:55.270 A:middle L:90%
throughout the world. You can see how diseases interact

18
00:00:55.280 --> 00:00:58.039 A:middle L:90%
and how they cluster and how similar diseases share similar

19
00:00:58.039 --> 00:01:02.600 A:middle L:90%
genes. So, uh, understanding that what I

20
00:01:02.600 --> 00:01:04.239 A:middle L:90%
wish to convey in this talk is that dynamical processes

21
00:01:04.239 --> 00:01:07.950 A:middle L:90%
over networks are also everywhere. And what I mean

22
00:01:07.950 --> 00:01:10.480 A:middle L:90%
by dynamical process will be made clear soon. But

23
00:01:10.480 --> 00:01:12.530 A:middle L:90%
essentially some kind of propagation or spreading kind of process on

24
00:01:12.530 --> 00:01:17.090 A:middle L:90%
the network. So why do we care? So

25
00:01:17.090 --> 00:01:19.120 A:middle L:90%
why do we care about these dynamical processes? We

26
00:01:19.120 --> 00:01:21.159 A:middle L:90%
care because they occur in lots of different domains and

27
00:01:21.159 --> 00:01:25.049 A:middle L:90%
lots of fields. For example, online information diffusion is

28
00:01:25.049 --> 00:01:27.079 A:middle L:90%
kind of like a dynamical process. Viral marketing, where people

29
00:01:27.079 --> 00:01:30.500 A:middle L:90%
recommend products to each other on Amazon, on Twitter,

30
00:01:30.500 --> 00:01:34.370 A:middle L:90%
and sales propagate, is also a dynamical process. Then

31
00:01:34.370 --> 00:01:37.700 A:middle L:90%
they are also in cybersecurity, where virus processes propagate. Then

32
00:01:37.700 --> 00:01:38.879 A:middle L:90%
epidemiology and public health is a big application area

33
00:01:38.879 --> 00:01:42.170 A:middle L:90%
for those kinds of processes and so on. So

34
00:01:42.170 --> 00:01:44.859 A:middle L:90%
I'll try to give you a sense of two main

35
00:01:44.859 --> 00:01:47.079 A:middle L:90%
application areas in my talk, one is the epidemiology

36
00:01:47.079 --> 00:01:49.109 A:middle L:90%
and one is social media. So in epidemiology, the

37
00:01:49.120 --> 00:01:53.060 A:middle L:90%
dynamical process is essentially disease spreading over contact networks, right?

38
00:01:53.069 --> 00:01:55.170 A:middle L:90%
Like for example in this contact network, you

39
00:01:55.170 --> 00:01:59.239 A:middle L:90%
have this uh node infected with some disease, say flu; he will

40
00:01:59.250 --> 00:02:00.859 A:middle L:90%
probably spread it to one of his neighbors. And what

41
00:02:00.859 --> 00:02:02.950 A:middle L:90%
I mean by contact networks is essentially people who come

42
00:02:02.950 --> 00:02:05.530 A:middle L:90%
in contact with each other. Right. Who can

43
00:02:05.530 --> 00:02:08.469 A:middle L:90%
actually spread the disease to one another. So yeah,

44
00:02:08.479 --> 00:02:10.620 A:middle L:90%
happens. This is a very interesting network actually.

45
00:02:10.620 --> 00:02:14.490 A:middle L:90%
It was the american general Health. It was

46
00:02:14.500 --> 00:02:16.050 A:middle L:90%
uh this is CDC data. CDC is the Center for

47
00:02:16.050 --> 00:02:19.219 A:middle L:90%
Disease Control in Atlanta. And this is the

48
00:02:19.229 --> 00:02:22.479 A:middle L:90%
visualization of the first 35 tuberculosis patients. So what

49
00:02:22.479 --> 00:02:23.645 A:middle L:90%
it does is it shows you how the patient zero

50
00:02:23.645 --> 00:02:25.865 A:middle L:90%
was there and how he actually he or she spread

51
00:02:25.865 --> 00:02:29.745 A:middle L:90%
the uh disease to other parts of the uh,

52
00:02:29.754 --> 00:02:30.514 A:middle L:90%
in the population. And by the way, the

53
00:02:30.514 --> 00:02:34.685 A:middle L:90%
gray ones denote people, uh, cases which are fatal

54
00:02:34.685 --> 00:02:36.525 A:middle L:90%
and the big ones denote those which are not.

55
00:02:36.534 --> 00:02:38.064 A:middle L:90%
So you can quickly realize which nodes you

56
00:02:38.064 --> 00:02:40.935 A:middle L:90%
should have immunized, which would help to control the disease

57
00:02:40.935 --> 00:02:45.914 A:middle L:90%
for example. So just to give you an example

58
00:02:45.914 --> 00:02:46.905 A:middle L:90%
of the type of questions we try to answer using

59
00:02:46.905 --> 00:02:50.955 A:middle L:90%
such uh terminology and abstraction: this is one

60
00:02:50.955 --> 00:02:53.495 A:middle L:90%
of my works, which I did with uh people at

61
00:02:53.495 --> 00:02:55.104 A:middle L:90%
Michigan. Each circle is a hospital

62
00:02:55.104 --> 00:03:00.425 A:middle L:90%
and this is like 3,000 hospitals across the

63
00:03:00.425 --> 00:03:00.525 A:middle L:90%
U.S., in the U.S. Medicare

64
00:03:00.534 --> 00:03:04.104 A:middle L:90%
uh network. And this is like more than 30,000

65
00:03:04.104 --> 00:03:06.955 A:middle L:90%
patient transports. So the question here really was that

66
00:03:06.965 --> 00:03:08.004 A:middle L:90%
we had some k units of some infection control

67
00:03:08.004 --> 00:03:12.055 A:middle L:90%
resource. And we wanted to just decide how to

68
00:03:12.055 --> 00:03:15.294 A:middle L:90%
spread them out across the hospitals to immunize to minimize

69
00:03:15.294 --> 00:03:17.314 A:middle L:90%
the patients infected, right? So like we could,

70
00:03:17.324 --> 00:03:21.405 A:middle L:90%
so we developed an algorithm, which was like... this is

71
00:03:21.405 --> 00:03:23.194 A:middle L:90%
the current practice; you can see like the red

72
00:03:23.194 --> 00:03:24.995 A:middle L:90%
ones denote the hospitals that are infected, and you

73
00:03:24.995 --> 00:03:28.125 A:middle L:90%
can see that with our method there are almost six

74
00:03:28.125 --> 00:03:30.925 A:middle L:90%
times fewer hospitals infected. So that's the nice

75
00:03:30.925 --> 00:03:32.344 A:middle L:90%
thing, right? So uh these kinds of abstractions

76
00:03:32.344 --> 00:03:36.740 A:middle L:90%
help you solve such real problems. Uh so the

77
00:03:36.740 --> 00:03:38.439 A:middle L:90%
second application area, which I want to convey, is

78
00:03:38.439 --> 00:03:40.740 A:middle L:90%
online diffusion which is like information diffusion in the social

79
00:03:40.750 --> 00:03:44.110 A:middle L:90%
sphere. For example, this is just a snapshot

80
00:03:44.110 --> 00:03:46.259 A:middle L:90%
of the startups and companies which are in the social

81
00:03:46.259 --> 00:03:50.120 A:middle L:90%
sphere in like 2007. So uh like Facebook, Twitter,

82
00:03:50.120 --> 00:03:52.330 A:middle L:90%
and LinkedIn are already really big companies with a lot

83
00:03:52.330 --> 00:03:54.349 A:middle L:90%
of revenue and also projected earnings. So it's a

84
00:03:54.349 --> 00:03:57.300 A:middle L:90%
big, it has a huge economic impact as well.

85
00:03:57.310 --> 00:04:00.620 A:middle L:90%
Uh So for example, in social viral marketing,

86
00:04:00.620 --> 00:04:01.539 A:middle L:90%
what would be the scenario? So you can

87
00:04:01.550 --> 00:04:04.319 A:middle L:90%
think of this as the mega celebrity who has thousands

88
00:04:04.319 --> 00:04:08.550 A:middle L:90%
of followers, all these little birds and suppose the

89
00:04:08.550 --> 00:04:10.419 A:middle L:90%
celebrity says "buy", say. Either the U.S.

90
00:04:10.419 --> 00:04:12.610 A:middle L:90%
has paid him or her, or he tells of

91
00:04:12.610 --> 00:04:15.040 A:middle L:90%
his own volition, but the result is that the followers go ahead

92
00:04:15.040 --> 00:04:18.550 A:middle L:90%
and buy something, and everyone makes money. So

93
00:04:18.560 --> 00:04:21.420 A:middle L:90%
uh this is essentially the promise of social media marketing.

94
00:04:21.500 --> 00:04:25.360 A:middle L:90%
Uh And of course given the events of the

95
00:04:25.360 --> 00:04:27.970 A:middle L:90%
past year or so like uh you might imagine the

96
00:04:27.970 --> 00:04:30.759 A:middle L:90%
social networks and uh these dynamical processes can be used

97
00:04:30.759 --> 00:04:32.620 A:middle L:90%
for collaborative action of some sort, for changing the world,

98
00:04:32.740 --> 00:04:36.569 A:middle L:90%
right? So uh what I also want to

99
00:04:36.569 --> 00:04:40.370 A:middle L:90%
convey is that you have different settings, multiple really

100
00:04:40.370 --> 00:04:43.149 A:middle L:90%
high impact settings, but also similar questions, or similar core

101
00:04:43.149 --> 00:04:45.829 A:middle L:90%
questions arising in different areas. For example, in

102
00:04:45.829 --> 00:04:47.649 A:middle L:90%
the uh social media setting, you can be like

103
00:04:47.740 --> 00:04:50.250 A:middle L:90%
squashing rumors right? Like somebody spread some bad rumor

104
00:04:50.250 --> 00:04:53.389 A:middle L:90%
about untrue things on Twitter. How do you squash

105
00:04:53.389 --> 00:04:55.779 A:middle L:90%
them? How do you seed the true information on

106
00:04:55.790 --> 00:04:58.389 A:middle L:90%
different people in the network? How do opinions

107
00:04:58.389 --> 00:05:00.050 A:middle L:90%
spread? So how do rumors spread might change into

108
00:05:00.050 --> 00:05:03.689 A:middle L:90%
an epidemic spreading epidemiological setting. And how do opinions

109
00:05:03.689 --> 00:05:06.370 A:middle L:90%
spread on Facebook or in a group can be

110
00:05:06.370 --> 00:05:09.089 A:middle L:90%
a similar question that can be asked. And how do

111
00:05:09.089 --> 00:05:12.230 A:middle L:90%
products or viruses spread over, say, a contact network or

112
00:05:12.240 --> 00:05:14.920 A:middle L:90%
influence network, and how to market better in the Twitter

113
00:05:14.920 --> 00:05:16.500 A:middle L:90%
setting, can also be how to transmit software patches.

114
00:05:16.509 --> 00:05:18.629 A:middle L:90%
For example, Windows is working on ways to

115
00:05:18.639 --> 00:05:21.410 A:middle L:90%
transmit software patches as efficiently as possible to

116
00:05:21.410 --> 00:05:26.689 A:middle L:90%
prevent attacks. So uh again, high impact with

117
00:05:26.699 --> 00:05:30.139 A:middle L:90%
multiple settings. So what's the research theme? So

118
00:05:30.139 --> 00:05:31.540 A:middle L:90%
my research has really been concentrated on these three main

119
00:05:31.540 --> 00:05:34.610 A:middle L:90%
areas, like essentially you have to, there's the

120
00:05:34.620 --> 00:05:38.069 A:middle L:90%
data which is the data about the large real world

121
00:05:38.079 --> 00:05:42.410 A:middle L:90%
applications and processes, and you build models out of

122
00:05:42.410 --> 00:05:44.829 A:middle L:90%
it and then you analyze the models which is the

123
00:05:44.829 --> 00:05:46.839 A:middle L:90%
analysis and understanding part. And once you do the

124
00:05:46.839 --> 00:05:49.879 A:middle L:90%
analysis, the next part is actually using the understanding

125
00:05:49.879 --> 00:05:54.399 A:middle L:90%
from the data to actually develop policy and action on

126
00:05:54.399 --> 00:05:57.379 A:middle L:90%
these processes. So for example, how would this look

127
00:05:57.389 --> 00:06:00.459 A:middle L:90%
in an epidemiological setting? For example, in the public

128
00:06:00.459 --> 00:06:01.129 A:middle L:90%
health setting, this would be modeling the number of patient

129
00:06:01.129 --> 00:06:03.569 A:middle L:90%
transports, right? How how diseases spread among patients

130
00:06:03.569 --> 00:06:06.660 A:middle L:90%
in hospitals. Then the analysis part would be like

131
00:06:06.689 --> 00:06:09.800 A:middle L:90%
analyzing the epidemic models to say whether

132
00:06:09.800 --> 00:06:12.439 A:middle L:90%
an epidemic will happen. And the policy and action part

133
00:06:12.449 --> 00:06:15.240 A:middle L:90%
would be essentially how to control outbreaks. I mean,

134
00:06:15.250 --> 00:06:16.149 A:middle L:90%
so once you have understood how the epidemics happened,

135
00:06:16.149 --> 00:06:18.529 A:middle L:90%
you can use that information to actually control the

136
00:06:18.529 --> 00:06:24.339 A:middle L:90%
outbreaks, right, by uh distributing disinfectants. And similarly

137
00:06:24.350 --> 00:06:26.560 A:middle L:90%
in the social media setting, you can think of

138
00:06:26.569 --> 00:06:29.120 A:middle L:90%
the data part as modeling tweets, uh tweets spreading,

139
00:06:29.129 --> 00:06:31.709 A:middle L:90%
where people are retweeting uh information or topics or some

140
00:06:31.720 --> 00:06:35.220 A:middle L:90%
interesting news, memes, items. And the analysis would be: what

141
00:06:35.220 --> 00:06:38.069 A:middle L:90%
would be the number of cascades in the future? Can you predict it?

142
00:06:38.069 --> 00:06:39.959 A:middle L:90%
Can you build a model for it? And essentially

143
00:06:39.959 --> 00:06:41.879 A:middle L:90%
then using it for settings like how to market better?

144
00:06:41.889 --> 00:06:46.910 A:middle L:90%
Because once you have that, all right. So

145
00:06:46.910 --> 00:06:49.029 A:middle L:90%
in this talk, I'll try to give you three

146
00:06:49.040 --> 00:06:51.720 A:middle L:90%
concrete questions which I'm trying to answer. One of them

147
00:06:51.720 --> 00:06:54.100 A:middle L:90%
would be in the analysis part, which is like

148
00:06:54.100 --> 00:06:56.670 A:middle L:90%
given propagation models, can you actually predict whether an

149
00:06:56.670 --> 00:06:59.730 A:middle L:90%
epidemic will happen? So these are really well understood,

150
00:06:59.740 --> 00:07:02.319 A:middle L:90%
well established propagation models in disease spreading, for example.

151
00:07:02.329 --> 00:07:04.709 A:middle L:90%
The second question would be in the policy and action

152
00:07:04.709 --> 00:07:08.220 A:middle L:90%
part, that once you have understood how these epidemics happened,

153
00:07:08.230 --> 00:07:11.089 A:middle L:90%
can you immunize and control these outbreaks better? So

154
00:07:11.100 --> 00:07:13.920 A:middle L:90%
that's that that's a bit more algorithmic, right?

155
00:07:13.930 --> 00:07:16.170 A:middle L:90%
Because it's like you're developing algorithms to control the uh

156
00:07:16.180 --> 00:07:18.939 A:middle L:90%
epidemics. And the final thing I'll try to spend

157
00:07:18.939 --> 00:07:21.139 A:middle L:90%
some time on is how do hashtags spread, for example.

158
00:07:21.149 --> 00:07:24.949 A:middle L:90%
That will give you a flavor of different application

159
00:07:24.959 --> 00:07:27.139 A:middle L:90%
domains. Here, the data is from Twitter, where tweets

160
00:07:27.139 --> 00:07:29.930 A:middle L:90%
are spreading among the people active on Twitter, and you

161
00:07:29.930 --> 00:07:31.110 A:middle L:90%
want to understand how different topics... hashtags are just like

162
00:07:31.110 --> 00:07:34.209 A:middle L:90%
topics attached to the tweets and you want to understand

163
00:07:34.220 --> 00:07:39.300 A:middle L:90%
how different topics spread. Right? So as I

164
00:07:39.300 --> 00:07:41.069 A:middle L:90%
said, this is the outline of the talk. First,

165
00:07:41.069 --> 00:07:43.050 A:middle L:90%
I'll go over the theoretical part which is epidemics,

166
00:07:43.050 --> 00:07:45.839 A:middle L:90%
what happens; then the action, which is how to immunize;

167
00:07:45.850 --> 00:07:47.899 A:middle L:90%
then some learning models from the Twitter data. And

168
00:07:47.910 --> 00:07:49.879 A:middle L:90%
if time permits, I'll try to cover some other

169
00:07:49.889 --> 00:07:53.759 A:middle L:90%
uh other work which I have been interested in.

170
00:07:53.769 --> 00:07:59.339 A:middle L:90%
So uh in epidemic spreading, the fundamental question is

171
00:07:59.339 --> 00:08:01.420 A:middle L:90%
like uh when an epidemic happens. So imagine like

172
00:08:01.420 --> 00:08:05.610 A:middle L:90%
you have a strong virus on this contact network and

173
00:08:05.620 --> 00:08:07.550 A:middle L:90%
uh this probably this guy gets infected and he spreads

174
00:08:07.550 --> 00:08:11.529 A:middle L:90%
the infection to his neighbors and then so slowly because

175
00:08:11.529 --> 00:08:13.379 A:middle L:90%
the virus is so virulent and strong, it spreads

176
00:08:13.379 --> 00:08:15.230 A:middle L:90%
to everyone in the network and you have a giant

177
00:08:15.230 --> 00:08:16.430 A:middle L:90%
epidemic. So almost everybody in the network is

178
00:08:16.430 --> 00:08:20.709 A:middle L:90%
infected, right? Uh suppose now you have a

179
00:08:20.709 --> 00:08:22.319 A:middle L:90%
weak virus. Uh you would imagine that the infection

180
00:08:22.319 --> 00:08:24.199 A:middle L:90%
would spread just to probably a couple of people and

181
00:08:24.199 --> 00:08:26.620 A:middle L:90%
it dies out, right? And so the footprint

182
00:08:26.620 --> 00:08:28.470 A:middle L:90%
is small, the number of infected people at the

183
00:08:28.470 --> 00:08:31.419 A:middle L:90%
end is small. So really you have these two

184
00:08:31.429 --> 00:08:35.850 A:middle L:90%
regimes of the epidemic and you want to understand what

185
00:08:35.850 --> 00:08:37.759 A:middle L:90%
separates them. So more concretely suppose you have this

186
00:08:37.769 --> 00:08:41.379 A:middle L:90%
number infected versus time. This is just the

187
00:08:41.379 --> 00:08:43.759 A:middle L:90%
number of infections per unit time and this is the

188
00:08:43.759 --> 00:08:46.649 A:middle L:90%
above, the above regime, which is the epidemic regime,

189
00:08:46.649 --> 00:08:48.799 A:middle L:90%
where a lot of people get infected. And you

190
00:08:48.799 --> 00:08:50.629 A:middle L:90%
also have a below regime, which is extinction.

191
00:08:52.039 --> 00:08:54.100 A:middle L:90%
And essentially the question is can you find a condition

192
00:08:54.100 --> 00:09:00.470 A:middle L:90%
which separates these two regimes? So right. So

193
00:09:00.480 --> 00:09:01.490 A:middle L:90%
again, just to reiterate you are given the epidemic

194
00:09:01.490 --> 00:09:03.120 A:middle L:90%
model. So of course you need to assume a

195
00:09:03.120 --> 00:09:05.460 A:middle L:90%
model here, which you have learned from the data

196
00:09:05.460 --> 00:09:09.490 A:middle L:90%
analysis, and then you have the virus and an underlying

197
00:09:09.490 --> 00:09:11.049 A:middle L:90%
graph which is essentially the contact network on which the

198
00:09:11.049 --> 00:09:13.399 A:middle L:90%
virus is spreading. And you want to find a

199
00:09:13.399 --> 00:09:16.889 A:middle L:90%
condition for virus extinction. So and I call this

200
00:09:16.889 --> 00:09:18.179 A:middle L:90%
the static version right now because the graph we assume

201
00:09:18.179 --> 00:09:20.509 A:middle L:90%
is static, it doesn't change. So essentially it's

202
00:09:20.509 --> 00:09:24.990 A:middle L:90%
the same given graph which you have. So, uh,

203
00:09:26.000 --> 00:09:26.490 A:middle L:90%
of course this is a fundamental question. You might

204
00:09:26.490 --> 00:09:28.309 A:middle L:90%
think that is interesting in itself, but why is

205
00:09:28.309 --> 00:09:31.649 A:middle L:90%
it important? Right? So it's important for uh,

206
00:09:31.659 --> 00:09:33.080 A:middle L:90%
many reasons. So one of them is that

207
00:09:33.090 --> 00:09:35.220 A:middle L:90%
it can accelerate simulations because these simulations are really expensive

208
00:09:35.220 --> 00:09:37.850 A:middle L:90%
and are done on a lot of different machines and using

209
00:09:37.860 --> 00:09:41.269 A:middle L:90%
really huge contact networks and uh, so on.

210
00:09:41.279 --> 00:09:43.049 A:middle L:90%
So it would be really nice to actually be able

211
00:09:43.049 --> 00:09:45.409 A:middle L:90%
to predict analytically, with certainty, what happens,

212
00:09:45.409 --> 00:09:46.309 A:middle L:90%
right? If so if, if a simulation will

213
00:09:46.309 --> 00:09:48.509 A:middle L:90%
lead to a condition where the epidemic doesn't happen,

214
00:09:48.509 --> 00:09:50.590 A:middle L:90%
you don't really need to simulate right? It probably

215
00:09:50.590 --> 00:09:54.440 A:middle L:90%
is not that useful. And also forecasting what-if

216
00:09:54.440 --> 00:09:56.039 A:middle L:90%
scenarios: what if the virus was two times

217
00:09:56.039 --> 00:09:58.399 A:middle L:90%
stronger, or uh half times as weak? So what will

218
00:09:58.399 --> 00:10:01.120 A:middle L:90%
happen? What will change? How will the distribution of

219
00:10:01.120 --> 00:10:03.379 A:middle L:90%
the epidemic change? And so on. Uh, and

220
00:10:03.379 --> 00:10:05.120 A:middle L:90%
finally, as I'll show later in the talk as well,

221
00:10:05.120 --> 00:10:07.350 A:middle L:90%
it's a great handle to manipulate the spreading, which

222
00:10:07.350 --> 00:10:09.440 A:middle L:90%
is controlling outbreaks or maximizing the spreading, like

223
00:10:09.450 --> 00:10:15.500 A:middle L:90%
for example, maximizing collaboration. So in this part,

224
00:10:15.509 --> 00:10:16.730 A:middle L:90%
the outline is essentially I'll try to give you

225
00:10:16.730 --> 00:10:20.470 A:middle L:90%
quickly a bit of background of the epidemic models.

226
00:10:20.470 --> 00:10:22.820 A:middle L:90%
And so and then the result and the intuition on

227
00:10:22.820 --> 00:10:26.159 A:middle L:90%
static graphs. Some uh, some ideas of the

228
00:10:26.169 --> 00:10:28.090 A:middle L:90%
proof. Of course I won't be able to

229
00:10:28.090 --> 00:10:28.870 A:middle L:90%
give you the full proof, but I'll try to

230
00:10:28.870 --> 00:10:31.590 A:middle L:90%
give you a sense of what we did. And

231
00:10:31.590 --> 00:10:33.049 A:middle L:90%
as a bonus using similar methodology, you can even

232
00:10:33.049 --> 00:10:37.274 A:middle L:90%
get powerful results in other different areas, like what if

233
00:10:37.274 --> 00:10:37.965 A:middle L:90%
the graphs were changing over time, for example,

234
00:10:37.975 --> 00:10:41.284 A:middle L:90%
which is the more realistic scenario. Right? And

235
00:10:41.284 --> 00:10:45.375 A:middle L:90%
also in the domain of competing viruses. So let's

236
00:10:45.375 --> 00:10:48.034 A:middle L:90%
go. So, the background. So SIR

237
00:10:48.034 --> 00:10:48.845 A:middle L:90%
is essentially a very simple, one of

238
00:10:48.845 --> 00:10:52.575 A:middle L:90%
the most common basic epidemic models, which is like, it's

239
00:10:52.575 --> 00:10:56.695 A:middle L:90%
called the susceptible-infected-recovered model, which uh models

240
00:10:56.695 --> 00:10:58.705 A:middle L:90%
like immunity which you gain in mumps: once you get

241
00:10:58.705 --> 00:11:01.105 A:middle L:90%
mumps you'll never get it again in your life.

242
00:11:01.115 --> 00:11:03.975 A:middle L:90%
So uh so the assumption here is that each node

243
00:11:03.975 --> 00:11:07.315 A:middle L:90%
in the network is essentially in one of three states. One

244
00:11:07.315 --> 00:11:09.335 A:middle L:90%
of them is susceptible, which just means healthy.

245
00:11:09.345 --> 00:11:11.264 A:middle L:90%
One of them is infected, you are infected with

246
00:11:11.264 --> 00:11:15.215 A:middle L:90%
the virus, and the third one is removed: either fortunately,

247
00:11:15.225 --> 00:11:18.274 A:middle L:90%
where you can't get infected again, or unfortunately you passed

248
00:11:18.274 --> 00:11:20.495 A:middle L:90%
away. So uh this this is represented by the

249
00:11:20.495 --> 00:11:22.184 A:middle L:90%
state diagram here. So you can think of this

250
00:11:22.184 --> 00:11:26.455 A:middle L:90%
as the susceptible state S, then I, and R. So

251
00:11:26.465 --> 00:11:28.085 A:middle L:90%
what are these parameters? So you assume that the

252
00:11:28.095 --> 00:11:31.554 A:middle L:90%
graph... the virus spreads in this way. For example,

253
00:11:31.554 --> 00:11:33.745 A:middle L:90%
here I have shown three snapshots of the network.

254
00:11:33.754 --> 00:11:35.085 A:middle L:90%
So in the first snapshot this guy has been

255
00:11:35.085 --> 00:11:37.424 A:middle L:90%
infected and he spreads the virus to each of his

256
00:11:37.424 --> 00:11:41.054 A:middle L:90%
neighbors independently with probability beta. So that's an assumption

257
00:11:41.054 --> 00:11:43.445 A:middle L:90%
we make that the virus is essentially spreading independently on

258
00:11:43.445 --> 00:11:46.945 A:middle L:90%
the edges from an infected person. So sometime in

259
00:11:46.945 --> 00:11:50.539 A:middle L:90%
the future, at t equal to two, this guy spreads the

260
00:11:50.539 --> 00:11:52.500 A:middle L:90%
virus to one of his neighbors. At the same

261
00:11:52.500 --> 00:11:54.799 A:middle L:90%
time there is a competing process, and the competing process

262
00:11:54.799 --> 00:11:58.129 A:middle L:90%
is the curing rate, the probability that an infected person

263
00:11:58.139 --> 00:12:01.600 A:middle L:90%
cures themselves. Right? And that's delta. So

264
00:12:01.610 --> 00:12:05.320 A:middle L:90%
say for time t equal to three, this guy cures

265
00:12:05.320 --> 00:12:07.700 A:middle L:90%
himself and uh this person is infected. So

266
00:12:07.700 --> 00:12:09.149 A:middle L:90%
note that the epidemic has died out, right?
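
NOTE
Editor's sketch, not from the talk: a minimal discrete-time simulation of the
SIR process described above. It assumes that, at every step, each infected
node infects each susceptible neighbor independently with probability beta,
and then cures itself with probability delta. The names simulate_sir,
neighbors, seeds and rng are hypothetical choices for illustration.
```python
import random
def simulate_sir(neighbors, seeds, beta, delta, rng, max_steps=10_000):
    """Run SIR on a graph given as {node: [neighbor, ...]}; return final states."""
    # Every node starts susceptible ("S"), except the initially infected seeds.
    state = {v: "S" for v in neighbors}
    for v in seeds:
        state[v] = "I"
    for _ in range(max_steps):
        infected = [v for v in state if state[v] == "I"]
        if not infected:
            break  # no infected node left: the epidemic has died out
        newly_infected = set()
        for v in infected:
            # Infection attempts are independent on each edge (assumption above).
            for u in neighbors[v]:
                if state[u] == "S" and rng.random() < beta:
                    newly_infected.add(u)
        for v in infected:
            # Competing curing process: an infected node is removed w.p. delta.
            if rng.random() < delta:
                state[v] = "R"
        for u in newly_infected:
            state[u] = "I"
    return state
def footprint(state):
    """Number of nodes that were ever infected (currently I or already R)."""
    return sum(1 for s in state.values() if s != "S")
```
With a weak virus (small beta relative to delta) the footprint tends to stay
small; with a strong one it tends to cover most of the graph.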

267
00:12:09.159 --> 00:12:11.509 A:middle L:90%
There's no way the epidemic can spread. There are

268
00:12:11.509 --> 00:12:13.669 A:middle L:90%
only two people who who have been affected by the

269
00:12:13.669 --> 00:12:16.059 A:middle L:90%
virus and that's it because there's no other neighbors for

270
00:12:16.059 --> 00:12:20.120 A:middle L:90%
this person. So so essentially we want to identify

271
00:12:20.120 --> 00:12:24.700 A:middle L:90%
when this happens in a general way. Uh as

272
00:12:24.700 --> 00:12:26.029 A:middle L:90%
you can imagine there are a lot of different epidemic

273
00:12:26.029 --> 00:12:30.320 A:middle L:90%
models. Uh I call them virus propagation models

274
00:12:30.320 --> 00:12:33.759 A:middle L:90%
in uh this talk. And uh so one variant is

275
00:12:33.759 --> 00:12:35.649 A:middle L:90%
the SIS model. It's also a very popular model; it is

276
00:12:35.649 --> 00:12:37.500 A:middle L:90%
the flu-like model. In this you don't recover,

277
00:12:37.509 --> 00:12:41.220 A:middle L:90%
you don't get any immunity, you become susceptible again.

278
00:12:41.230 --> 00:12:45.100 A:middle L:90%
So uh there's the SIRS model, where you have temporary

279
00:12:45.100 --> 00:12:46.940 A:middle L:90%
immunity, like pertussis, which is whooping cough, and the SEIR

280
00:12:46.940 --> 00:12:50.580 A:middle L:90%
model, mumps-like, with virus incubation and all those things.

281
00:12:50.679 --> 00:12:52.250 A:middle L:90%
And the underlying contact network is essentially who can infect whom.

282
00:12:52.539 --> 00:12:56.279 A:middle L:90%
Okay, so again, so this is a very

283
00:12:56.279 --> 00:13:00.419 A:middle L:90%
old problem as you can think of because it's very

284
00:13:00.429 --> 00:13:03.409 A:middle L:90%
easy to state, and uh a lot of work

285
00:13:03.409 --> 00:13:03.659 A:middle L:90%
has been done on it, right? So some of

286
00:13:03.659 --> 00:13:07.679 A:middle L:90%
the papers you can see are like from '69, '91, and

287
00:13:07.799 --> 00:13:09.590 A:middle L:90%
so on. But really the key thing to take

288
00:13:09.590 --> 00:13:13.509 A:middle L:90%
away here is that all are about structured topologies, where the

289
00:13:13.570 --> 00:13:15.830 A:middle L:90%
topology is given, it's all assumed. So it can

290
00:13:15.830 --> 00:13:18.590 A:middle L:90%
be like fully connected cliques, which is everybody is connected

291
00:13:18.590 --> 00:13:22.470 A:middle L:90%
to everybody else, or block diagonals, or hierarchies, uh

292
00:13:22.480 --> 00:13:26.539 A:middle L:90%
the population is distributed hierarchically, or random graphs. Or

293
00:13:26.539 --> 00:13:28.980 A:middle L:90%
they give the specific virus propagation models, like they

294
00:13:28.980 --> 00:13:31.789 A:middle L:90%
assume the virus propagation model to be some specific structure

295
00:13:31.909 --> 00:13:33.860 A:middle L:90%
and/or static graphs, where the graphs don't change.

296
00:13:33.980 --> 00:13:37.710 A:middle L:90%
So in this talk, I'll try to generalize along all

297
00:13:37.710 --> 00:13:41.399 A:middle L:90%
these three directions. Right? So how would so

298
00:13:41.409 --> 00:13:43.059 A:middle L:90%
yeah, what would the answer look like? Right?

299
00:13:43.070 --> 00:13:46.580 A:middle L:90%
What should the answer depend on? So because the

300
00:13:46.580 --> 00:13:48.600 A:middle L:90%
inputs to the problem are essentially the graph and the virus

301
00:13:48.600 --> 00:13:50.320 A:middle L:90%
propagation model. It's reasonable to assume that it should

302
00:13:50.320 --> 00:13:52.120 A:middle L:90%
depend on both of them. Right? That if

303
00:13:52.120 --> 00:13:54.879 A:middle L:90%
you change the graph, the answer should change. If

304
00:13:54.879 --> 00:13:56.539 A:middle L:90%
you change the model, the answer should change again.

305
00:13:56.549 --> 00:14:00.279 A:middle L:90%
But how? Like uh it's clear that because it's

306
00:14:00.279 --> 00:14:01.600 A:middle L:90%
a spreading process, there should be some connectivity metric

307
00:14:01.600 --> 00:14:05.009 A:middle L:90%
of the graph. Right? Uh So how should

308
00:14:05.009 --> 00:14:07.809 A:middle L:90%
the graph play a role? Should it be average degree, or uh expected

309
00:14:07.809 --> 00:14:09.860 A:middle L:90%
degree, or max degree, or the diameter? All these

310
00:14:09.860 --> 00:14:13.659 A:middle L:90%
are connectivity metrics. And for the virus propagation model,

311
00:14:13.669 --> 00:14:15.570 A:middle L:90%
which parameter is important? For example,

312
00:14:15.570 --> 00:14:16.379 A:middle L:90%
in the SIR model, you saw beta and delta,

313
00:14:16.379 --> 00:14:18.779 A:middle L:90%
two parameters. Which ones should be important? Like beta

314
00:14:18.779 --> 00:14:22.570 A:middle L:90%
and delta: are both important? And finally

315
00:14:22.570 --> 00:14:24.389 A:middle L:90%
the question is of course, how to combine them?

316
00:14:24.399 --> 00:14:26.759 A:middle L:90%
I mean, should it be linear, quadratic, exponential,

317
00:14:26.759 --> 00:14:28.379 A:middle L:90%
in some form? So the nice thing is that

318
00:14:28.389 --> 00:14:33.000 A:middle L:90%
uh what we found in our uh work was that

319
00:14:33.009 --> 00:14:37.620 A:middle L:90%
it's easily separable, the result. So informally for any arbitrary

320
00:14:37.620 --> 00:14:39.360 A:middle L:90%
topology, where we represent the topology by an

321
00:14:39.360 --> 00:14:41.809 A:middle L:90%
adjacency matrix A, it's an N by N matrix, uh,

322
00:14:41.820 --> 00:14:45.164 A:middle L:90%
N being the number of nodes, uh, and any

323
00:14:45.164 --> 00:14:48.865 A:middle L:90%
virus propagation model in the standard literature. And

324
00:14:48.875 --> 00:14:50.365 A:middle L:90%
to represent the adjacency matrix, you need only one

325
00:14:50.365 --> 00:14:54.345 A:middle L:90%
parameter, lambda, which is the largest eigenvalue of

326
00:14:54.345 --> 00:14:56.235 A:middle L:90%
the adjacency matrix. I'll try to give you an

327
00:14:56.235 --> 00:14:58.254 A:middle L:90%
intuition about what it exactly means. And also a

328
00:14:58.254 --> 00:15:01.154 A:middle L:90%
constant, C_VPM, which we call C V P

329
00:15:01.154 --> 00:15:03.735 A:middle L:90%
M. And it's a constant depending on the virus

330
00:15:03.735 --> 00:15:05.115 A:middle L:90%
propagation model and it's an explicit constant. We give

331
00:15:05.115 --> 00:15:07.695 A:middle L:90%
the constant in the proof. And given these two

332
00:15:07.705 --> 00:15:11.995 A:middle L:90%
things, there's no epidemic if lambda times C_VPM

333
00:15:11.995 --> 00:15:13.144 A:middle L:90%
is less than one, that's it. So the

334
00:15:13.144 --> 00:15:18.664 A:middle L:90%
graph interacts with the threshold question. Uh The threshold

335
00:15:18.664 --> 00:15:20.955 A:middle L:90%
question, uh, through only one parameter, and it's linear

336
00:15:20.965 --> 00:15:24.424 A:middle L:90%
in combination. Right? So it's lambda times C

337
00:15:24.424 --> 00:15:24.245 A:middle L:90%
V P M. And C V P.

338
00:15:24.245 --> 00:15:26.105 A:middle L:90%
M is essentially a constant which is depending on the

339
00:15:26.105 --> 00:15:30.365 A:middle L:90%
virus propagation model. So if you multiply that and

340
00:15:30.365 --> 00:15:33.445 A:middle L:90%
it's less than one you're done. So how does

341
00:15:33.445 --> 00:15:37.245 A:middle L:90%
this, uh, threshold actually instantiate in particular models? So

342
00:15:37.254 --> 00:15:39.355 A:middle L:90%
for some of these models, so just to standardize

343
00:15:39.355 --> 00:15:41.205 A:middle L:90%
the discussion, I'll try to use this term, which

344
00:15:41.205 --> 00:15:43.514 A:middle L:90%
is effective strength. I denote it by S

345
00:15:43.524 --> 00:15:48.144 A:middle L:90%
. And so this is essentially this product. So

346
00:15:48.144 --> 00:15:50.375 A:middle L:90%
if this product is less than one, you're below threshold

347
00:15:50.384 --> 00:15:52.644 A:middle L:90%
, otherwise you're above threshold. So for, for

348
00:15:52.644 --> 00:15:54.115 A:middle L:90%
a whole bunch of models S. I. R

349
00:15:54.125 --> 00:15:54.514 A:middle L:90%
. S. I. S. This is the flu-like,

350
00:15:54.514 --> 00:15:58.164 A:middle L:90%
this is the mumps-like, and all the alphabet

351
00:15:58.164 --> 00:16:00.014 A:middle L:90%
soup, you can think that it's just lambda,

352
00:16:00.014 --> 00:16:03.634 A:middle L:90%
times beta over delta. So the important thing to note here

353
00:16:03.634 --> 00:16:06.434 A:middle L:90%
is that, uh, all these models actually have many more

354
00:16:06.434 --> 00:16:07.965 A:middle L:90%
parameters. For example, S. I. R

355
00:16:07.965 --> 00:16:10.195 A:middle L:90%
. S. has a forgetting factor. It's temporary

356
00:16:10.195 --> 00:16:11.424 A:middle L:90%
immunity. So you gain immunity, then you lose

357
00:16:11.424 --> 00:16:14.225 A:middle L:90%
it. So there are a lot of different factors

358
00:16:14.225 --> 00:16:15.865 A:middle L:90%
which actually are in the model, but they don't

359
00:16:15.865 --> 00:16:18.174 A:middle L:90%
play a role in the threshold. And this gives

360
00:16:18.174 --> 00:16:18.325 A:middle L:90%
you a sense of the power of the result,

361
00:16:18.325 --> 00:16:21.264 A:middle L:90%
right? Because you can see these interactions play out

362
00:16:21.264 --> 00:16:23.575 A:middle L:90%
in the uh result. And for all these cases

363
00:16:23.585 --> 00:16:29.095 A:middle L:90%
lambda is essentially, uh, the graph interacts only

364
00:16:29.105 --> 00:16:30.875 A:middle L:90%
uh, through the parameter lambda. Uh, and there are

365
00:16:30.875 --> 00:16:33.065 A:middle L:90%
much more complicated models, as I said. But for example

366
00:16:33.075 --> 00:16:36.725 A:middle L:90%
, this model has two infected states. So it's

367
00:16:36.725 --> 00:16:38.144 A:middle L:90%
it's a bit more complex model for all of them

368
00:16:38.144 --> 00:16:40.649 A:middle L:90%
. The threshold is S is equal to one.
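
NOTE
A minimal sketch of the threshold test described above, assuming the
form s = lambda * beta / delta quoted for the SIR/SIS family. The helper
names and the example values of beta and delta are illustrative
assumptions, not taken from the talk.

```python
def largest_eigenvalue(adj, iters=2000):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Power iteration is run on (A + I); the shift keeps the iteration from
    oscillating on bipartite graphs such as stars and chains.
    """
    n = len(adj)
    v = [1.0] * n
    shifted = 1.0
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        shifted = max(abs(x) for x in w)
        v = [x / shifted for x in w]
    return shifted - 1.0


def effective_strength(adj, beta, delta):
    """s = lambda * beta / delta (model constant beta/delta is assumed)."""
    return largest_eigenvalue(adj) * beta / delta


# Hypothetical example: a 5-node star with node 0 as the hub.
n = 5
star = [[1 if (i == 0) != (j == 0) else 0 for j in range(n)] for i in range(n)]
s = effective_strength(star, beta=0.1, delta=0.4)
below_threshold = s < 1.0
print(f"effective strength s = {s:.3f}, below threshold: {below_threshold}")
```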

369
00:16:40.659 --> 00:16:45.649 A:middle L:90%
That's it. So what's the intuition for lambda?

370
00:16:45.649 --> 00:16:48.519 A:middle L:90%
Right. So the official linear algebra definition is essentially

371
00:16:48.519 --> 00:16:51.649 A:middle L:90%
it's the root of largest magnitude of

372
00:16:51.659 --> 00:16:53.730 A:middle L:90%
the characteristic polynomial. So sure, it doesn't give

373
00:16:53.740 --> 00:16:57.389 A:middle L:90%
too much intuition. So, uh, what's the unofficial intuition

374
00:16:57.389 --> 00:17:00.669 A:middle L:90%
? This is essentially the number of paths in the

375
00:17:00.669 --> 00:17:03.240 A:middle L:90%
graph. So imagine this adjacency matrix

376
00:17:03.240 --> 00:17:04.349 A:middle L:90%
, right? And if you take it to the

377
00:17:04.349 --> 00:17:07.470 A:middle L:90%
k-th power, then the (i, j)

378
00:17:07.480 --> 00:17:11.670 A:middle L:90%
element in this matrix, this is the A-squared matrix, and

379
00:17:11.680 --> 00:17:14.319 A:middle L:90%
the (i, j) element is essentially the number of paths from i

380
00:17:14.319 --> 00:17:15.309 A:middle L:90%
to j in the network. So of course these paths

381
00:17:15.309 --> 00:17:18.880 A:middle L:90%
are repeated and there are loops. But the rough intuition

382
00:17:18.880 --> 00:17:19.519 A:middle L:90%
is that it captures the connectivity of the graph in

383
00:17:19.519 --> 00:17:22.319 A:middle L:90%
that sense. And if you take the spectral decomposition

384
00:17:22.319 --> 00:17:23.829 A:middle L:90%
, so you don't need to know much about the

385
00:17:23.829 --> 00:17:26.519 A:middle L:90%
decomposition, apart from the fact that it involves the eigenvalues

386
00:17:26.519 --> 00:17:30.380 A:middle L:90%
, uh, and the eigenvectors. So if we take

387
00:17:30.380 --> 00:17:32.339 A:middle L:90%
it just to the first eigenvector and eigenvalue

388
00:17:32.339 --> 00:17:33.759 A:middle L:90%
, you can see lambda to the power k

389
00:17:33.769 --> 00:17:36.950 A:middle L:90%
. We don't care about these things. The lambda

390
00:17:36.950 --> 00:17:38.279 A:middle L:90%
to the power k governs essentially what's the magnitude of

391
00:17:38.289 --> 00:17:41.309 A:middle L:90%
this matrix in some sense? So how does it

392
00:17:41.309 --> 00:17:44.210 A:middle L:90%
look for some graphs? Right. So for example

393
00:17:44.210 --> 00:17:45.960 A:middle L:90%
, look at this, uh, there's a chain, a star, and

394
00:17:45.960 --> 00:17:48.630 A:middle L:90%
a clique, uh, all of them have the same number of nodes

395
00:17:48.640 --> 00:17:52.210 A:middle L:90%
, but uh different number of edges in particular.

396
00:17:52.210 --> 00:17:53.329 A:middle L:90%
These two actually have the same number of edges. But

397
00:17:53.329 --> 00:17:56.220 A:middle L:90%
intuitively you can imagine that star is better for the

398
00:17:56.220 --> 00:17:59.140 A:middle L:90%
virus, right? Because once you infect the center

399
00:17:59.150 --> 00:18:00.650 A:middle L:90%
it will quickly spread. So how does it look

400
00:18:00.660 --> 00:18:03.390 A:middle L:90%
in terms of the eigenvalue? So, well, so

401
00:18:03.400 --> 00:18:06.960 A:middle L:90%
the differences are not too large here. But uh

402
00:18:06.970 --> 00:18:08.230 A:middle L:90%
, if you make it N nodes, for example

403
00:18:08.240 --> 00:18:11.579 A:middle L:90%
, then how would it look? The eigenvalue of

404
00:18:11.579 --> 00:18:12.259 A:middle L:90%
the star is essentially the uh, sorry, the

405
00:18:12.269 --> 00:18:15.519 A:middle L:90%
chain is essentially constant. So it's uh, the

406
00:18:15.529 --> 00:18:18.000 A:middle L:90%
corresponds to intuition, right? So a 10-node

407
00:18:18.000 --> 00:18:19.549 A:middle L:90%
chain is probably as bad as a 100-node chain

408
00:18:19.559 --> 00:18:22.230 A:middle L:90%
. But the 10-node star is much better than

409
00:18:22.279 --> 00:18:26.170 A:middle L:90%
a 100-node star. And in particular it grows as root N

410
00:18:26.170 --> 00:18:27.079 A:middle L:90%
here, and it grows as N minus one for the clique, which is the

411
00:18:27.079 --> 00:18:29.930 A:middle L:90%
worst. Right? Because once you infect any one of

412
00:18:29.930 --> 00:18:30.309 A:middle L:90%
them, you can quickly infect the rest of the

413
00:18:30.309 --> 00:18:33.759 A:middle L:90%
world. So uh, yeah, I've given also

414
00:18:33.759 --> 00:18:36.430 A:middle L:90%
values for them when N equals 1000.
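
NOTE
A small sketch comparing the three toy graphs from the slide. It relies on
standard closed forms for the largest adjacency eigenvalue: 2*cos(pi/(N+1))
for an N-node chain (always below 2), sqrt(N-1) for a star, and N-1 for a
clique. The function names and the choice N = 10 are illustrative
assumptions.

```python
import math


def largest_eigenvalue(adj, iters=3000):
    """Power iteration on (A + I); the shift avoids oscillation on bipartite graphs."""
    n = len(adj)
    v = [1.0] * n
    shifted = 1.0
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        shifted = max(abs(x) for x in w)
        v = [x / shifted for x in w]
    return shifted - 1.0


def chain(n):
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def star(n):
    # Node 0 is the hub; an edge exists when exactly one endpoint is the hub.
    return [[1 if (i == 0) != (j == 0) else 0 for j in range(n)] for i in range(n)]


def clique(n):
    return [[1 if i != j else 0 for j in range(n)] for i in range(n)]


n = 10
lam_chain = largest_eigenvalue(chain(n))
lam_star = largest_eigenvalue(star(n))
lam_clique = largest_eigenvalue(clique(n))
print(f"chain:  {lam_chain:.4f}  (closed form {2 * math.cos(math.pi / (n + 1)):.4f})")
print(f"star:   {lam_star:.4f}  (closed form {math.sqrt(n - 1):.4f})")
print(f"clique: {lam_clique:.4f}  (closed form {n - 1})")
```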

415
00:18:36.440 --> 00:18:37.859 A:middle L:90%
Right? So what I wish to claim is that

416
00:18:37.869 --> 00:18:41.519 A:middle L:90%
for this problem, better connectivity is higher lambda. So

417
00:18:41.529 --> 00:18:44.970 A:middle L:90%
like and uh, I'll give you some examples of

418
00:18:44.980 --> 00:18:48.069 A:middle L:90%
uh, this result. So here we use a

419
00:18:48.069 --> 00:18:49.759 A:middle L:90%
really huge graph which was actually developed by the folks at

420
00:18:49.759 --> 00:18:52.529 A:middle L:90%
NDSSL here at Virginia Tech, and this is a

421
00:18:52.539 --> 00:18:55.819 A:middle L:90%
really huge graph like 31 million links and six million

422
00:18:55.819 --> 00:18:59.740 A:middle L:90%
nodes. And, uh, it's been used in lots of outbreak

423
00:18:59.740 --> 00:19:03.000 A:middle L:90%
studies like smallpox and all those things. And so

424
00:19:03.000 --> 00:19:03.819 A:middle L:90%
you can see two graphs here, we simulated the

425
00:19:03.819 --> 00:19:07.329 A:middle L:90%
SIR model here. So the two plots are the infection

426
00:19:07.329 --> 00:19:08.609 A:middle L:90%
profile and the takeoff plot. The infection profile,

427
00:19:08.609 --> 00:19:11.940 A:middle L:90%
is essentially the number of people infected, the number of nodes infected,

428
00:19:11.940 --> 00:19:15.190 A:middle L:90%
per unit time. So you can clearly see the

429
00:19:15.200 --> 00:19:17.529 A:middle L:90%
two different regimes. What it doesn't show is exactly

430
00:19:17.529 --> 00:19:19.460 A:middle L:90%
where the regimes separate, and this is what

431
00:19:19.460 --> 00:19:22.769 A:middle L:90%
this plot shows. So in the takeoff plot,

432
00:19:22.779 --> 00:19:25.200 A:middle L:90%
you can see that the X axis is the effective

433
00:19:25.200 --> 00:19:26.339 A:middle L:90%
strength, if you remember the effective strength was s

434
00:19:26.349 --> 00:19:30.710 A:middle L:90%
the product, and the Y axis is the footprint, essentially how many people

435
00:19:30.710 --> 00:19:33.049 A:middle L:90%
were infected at the end of the infection.

436
00:19:33.059 --> 00:19:34.630 A:middle L:90%
So how bad the infection was. So you can

437
00:19:34.630 --> 00:19:37.799 A:middle L:90%
see that this is the predicted threshold at 10^0,

438
00:19:37.799 --> 00:19:41.480 A:middle L:90%
which is one. And when the effective strength is

439
00:19:41.480 --> 00:19:44.809 A:middle L:90%
one. Suddenly the footprint takes off so you can

440
00:19:44.809 --> 00:19:47.970 A:middle L:90%
see that we can demarcate these two regimes. Uh

441
00:19:47.980 --> 00:19:48.690 A:middle L:90%
And just to give you a further example, there's

442
00:19:48.690 --> 00:19:51.819 A:middle L:90%
another plot with the S. I. R. S.

443
00:19:51.819 --> 00:19:55.049 A:middle L:90%
model, which is the temporary immunity model for pertussis, and you

444
00:19:55.049 --> 00:19:57.750 A:middle L:90%
can see similar behavior there. Uh, so note

445
00:19:57.750 --> 00:20:00.619 A:middle L:90%
that effective strength incidentally means the same thing here.

446
00:20:00.619 --> 00:20:03.859 A:middle L:90%
Right? It's still lambda times beta over delta

447
00:20:03.869 --> 00:20:06.039 A:middle L:90%
. It doesn't matter, uh, even though the S. I. R. S. has

448
00:20:06.039 --> 00:20:11.440 A:middle L:90%
an extra parameter. So yeah. Right now that

449
00:20:11.440 --> 00:20:12.880 A:middle L:90%
I've given you a sense of the result and what

450
00:20:12.880 --> 00:20:15.220 A:middle L:90%
it means in different models uh I will try to

451
00:20:15.230 --> 00:20:19.059 A:middle L:90%
go a bit over the proof. Right. So

452
00:20:19.069 --> 00:20:22.119 A:middle L:90%
what's the proof sketch? So there are two main

453
00:20:22.119 --> 00:20:23.470 A:middle L:90%
ingredients in our model, in our proof which gives

454
00:20:23.470 --> 00:20:26.440 A:middle L:90%
rise to this nice separability between these two parameters,

455
00:20:26.440 --> 00:20:27.980 A:middle L:90%
which is, one on the model, one on,

456
00:20:27.980 --> 00:20:30.700 A:middle L:90%
one on the model, one on the graph. So

457
00:20:30.700 --> 00:20:33.400 A:middle L:90%
what are these two ingredients? One is the generalized

458
00:20:33.410 --> 00:20:34.910 A:middle L:90%
virus propagation model structure. That we have tried to

459
00:20:34.920 --> 00:20:38.579 A:middle L:90%
generalize all these cascade style models into some coherent structure

460
00:20:38.579 --> 00:20:41.039 A:middle L:90%
which is uh which captures every one of them and

461
00:20:41.039 --> 00:20:45.019 A:middle L:90%
yet is tractable. And at the same time we used

462
00:20:45.019 --> 00:20:48.250 A:middle L:90%
ideas from stability theory, equilibrium and stability points, to actually

463
00:20:48.259 --> 00:20:52.619 A:middle L:90%
get some handle on the, uh, on the lambda, or

464
00:20:52.630 --> 00:20:56.650 A:middle L:90%
how to involve the graph in the computation. So

465
00:20:56.660 --> 00:20:57.829 A:middle L:90%
these are the two ingredients which give rise to these

466
00:20:57.829 --> 00:21:02.039 A:middle L:90%
two parameters of the proof. So, for the

467
00:21:02.039 --> 00:21:03.150 A:middle L:90%
first part, our models, so there are all these

468
00:21:03.150 --> 00:21:06.029 A:middle L:90%
models. Right? And what we were able to

469
00:21:06.039 --> 00:21:07.710 A:middle L:90%
do, well we were able to convert all of

470
00:21:07.710 --> 00:21:10.049 A:middle L:90%
them into one big generalized model which we call S

471
00:21:10.049 --> 00:21:11.799 A:middle L:90%
star, I squared, V star. So this, uh,

472
00:21:11.809 --> 00:21:15.299 A:middle L:90%
essentially represents these, uh, there are three different conceptual

473
00:21:15.299 --> 00:21:18.309 A:middle L:90%
types of states in these kinds of models, one

474
00:21:18.309 --> 00:21:19.500 A:middle L:90%
of them is susceptible, one of them is infected and

475
00:21:19.500 --> 00:21:22.799 A:middle L:90%
one of them is vigilant. And in our model

476
00:21:22.799 --> 00:21:23.470 A:middle L:90%
, in our generalized model, there can be any number of

477
00:21:23.470 --> 00:21:26.410 A:middle L:90%
states in any of these classes, right

478
00:21:26.420 --> 00:21:27.859 A:middle L:90%
? So this is a fairly general state diagram, basically.

479
00:21:27.869 --> 00:21:30.299 A:middle L:90%
So and so the interesting part here is that there

480
00:21:30.299 --> 00:21:34.019 A:middle L:90%
is a big red arrow. So these are what

481
00:21:34.019 --> 00:21:37.559 A:middle L:90%
we term as graph-based transitions. So intuitively

482
00:21:37.559 --> 00:21:38.720 A:middle L:90%
it just means that you can get infected only by

483
00:21:38.720 --> 00:21:41.650 A:middle L:90%
your neighbor. You can't just get infected on your

484
00:21:41.650 --> 00:21:44.190 A:middle L:90%
own. If you can get infected on your own

485
00:21:44.200 --> 00:21:45.750 A:middle L:90%
then it's it's not really a cascade style model.

486
00:21:45.750 --> 00:21:48.460 A:middle L:90%
Right? So this is this is one of the

487
00:21:48.460 --> 00:21:51.200 A:middle L:90%
only assumptions that we use, and we were able to

488
00:21:51.200 --> 00:21:52.400 A:middle L:90%
generalize the whole, you know, big set of

489
00:21:52.400 --> 00:21:56.630 A:middle L:90%
models. So I won't go over this. Essentially

490
00:21:56.630 --> 00:21:57.789 A:middle L:90%
what I want to say here is that the huge

491
00:21:57.789 --> 00:22:00.980 A:middle L:90%
big bubbles, uh, hide all these complexities, right?

492
00:22:00.980 --> 00:22:03.609 A:middle L:90%
There can be any number of transitions from any number

493
00:22:03.609 --> 00:22:06.329 A:middle L:90%
of states. You can have transitions between states across

494
00:22:06.329 --> 00:22:08.839 A:middle L:90%
classes and so on. So uh yeah. So

495
00:22:08.839 --> 00:22:11.170 A:middle L:90%
what's the special case? So if you have just

496
00:22:11.170 --> 00:22:15.339 A:middle L:90%
one susceptible, uh, one, uh, infected and one vigilant

497
00:22:15.339 --> 00:22:18.869 A:middle L:90%
state, then it's just your plain old S. I.

498
00:22:18.869 --> 00:22:19.599 A:middle L:90%
R. model, right? That you have already seen before

499
00:22:19.609 --> 00:22:23.029 A:middle L:90%
. So another example is this, in which you have two

500
00:22:23.029 --> 00:22:26.039 A:middle L:90%
different uh infected states. So here essentially this means

501
00:22:26.039 --> 00:22:29.970 A:middle L:90%
non terminal and this is a terminal case for HIV

502
00:22:30.089 --> 00:22:32.329 A:middle L:90%
and what it shows is multiple, vigilant and multiple

503
00:22:32.329 --> 00:22:36.240 A:middle L:90%
infectious states. Uh The second ingredient in the proof

504
00:22:36.240 --> 00:22:37.470 A:middle L:90%
, once you have generalized the model, is essentially

505
00:22:37.480 --> 00:22:41.950 A:middle L:90%
uh nonlinear dynamical system and stability theory. So the

506
00:22:41.950 --> 00:22:44.650 A:middle L:90%
key idea here is to view the whole system,

507
00:22:44.650 --> 00:22:47.299 A:middle L:90%
to view the whole evolution of the epidemic system, as

508
00:22:47.299 --> 00:22:49.450 A:middle L:90%
an NLDS, which is a nonlinear dynamical system

509
00:22:49.569 --> 00:22:52.480 A:middle L:90%
. And here essentially you have a big huge vector

510
00:22:52.490 --> 00:22:56.660 A:middle L:90%
P(t+1), which is a function

511
00:22:56.660 --> 00:22:57.970 A:middle L:90%
of the previous state, which is P(t).

512
00:22:59.049 --> 00:23:02.349 A:middle L:90%
And what's the what's the relation between these two states

513
00:23:02.359 --> 00:23:04.539 A:middle L:90%
? It's essentially given by the function G, which is

514
00:23:04.539 --> 00:23:07.660 A:middle L:90%
a huge big nonlinear function. So

515
00:23:07.660 --> 00:23:08.640 A:middle L:90%
it's discrete time as you can see. So the

516
00:23:08.640 --> 00:23:11.440 A:middle L:90%
key idea here is that the P. T.

517
00:23:11.450 --> 00:23:14.569 A:middle L:90%
is a probability vector. It just, uh,

518
00:23:14.579 --> 00:23:15.299 A:middle L:90%
specifies the state of the system at time, t

519
00:23:15.309 --> 00:23:18.799 A:middle L:90%
what's the probability of each node in the graph,

520
00:23:18.809 --> 00:23:21.609 A:middle L:90%
being in each state of the system and so on

521
00:23:21.619 --> 00:23:22.269 A:middle L:90%
. And G is, as I said, a huge

522
00:23:22.269 --> 00:23:25.789 A:middle L:90%
big nonlinear function. So it's a huge messy function

523
00:23:25.789 --> 00:23:27.019 A:middle L:90%
for the generalized model, which I won't try to

524
00:23:27.019 --> 00:23:29.690 A:middle L:90%
write here. But the idea is that it

525
00:23:29.700 --> 00:23:32.230 A:middle L:90%
explicitly gives the evolution of the system. So once

526
00:23:32.230 --> 00:23:33.140 A:middle L:90%
you're given G, once you're given P, you can

527
00:23:33.150 --> 00:23:37.319 A:middle L:90%
explicitly evolve the system. Now that

528
00:23:37.319 --> 00:23:40.000 A:middle L:90%
you have a nonlinear dynamical system, which is

529
00:23:40.009 --> 00:23:41.789 A:middle L:90%
given by P and G, what do you do with it?

530
00:23:41.799 --> 00:23:45.670 A:middle L:90%
The next idea is that you transform the threshold question

531
00:23:45.680 --> 00:23:48.440 A:middle L:90%
, which is when an epidemic will happen, the

532
00:23:48.440 --> 00:23:52.359 A:middle L:90%
tipping point question, into a stability question of this system, and

533
00:23:52.369 --> 00:23:53.740 A:middle L:90%
uh we should analyze the stability. But at which

534
00:23:53.740 --> 00:23:56.630 A:middle L:90%
point? So the fixed point at which you analyze

535
00:23:56.630 --> 00:24:00.160 A:middle L:90%
the stability is essentially given by when nobody is infected

536
00:24:00.160 --> 00:24:02.380 A:middle L:90%
, right? Because that's when, when you infect

537
00:24:02.380 --> 00:24:03.109 A:middle L:90%
a few people initially. That's when you want to

538
00:24:03.119 --> 00:24:06.559 A:middle L:90%
uh, understand whether the system will take off or die

539
00:24:06.559 --> 00:24:07.690 A:middle L:90%
out. And what does it mean by stable and

540
00:24:07.690 --> 00:24:11.859 A:middle L:90%
unstable uh equilibrium points? So imagine this to be

541
00:24:11.859 --> 00:24:15.319 A:middle L:90%
uh, a contour as given by G, by the system.

542
00:24:15.329 --> 00:24:18.240 A:middle L:90%
And you can see imagine the whole epidemic to be

543
00:24:18.240 --> 00:24:21.609 A:middle L:90%
this ball. So you can see that a small

544
00:24:21.609 --> 00:24:22.750 A:middle L:90%
push to the system will effectively roll it down,

545
00:24:22.759 --> 00:24:25.470 A:middle L:90%
right, It will quickly go down and the epidemic

546
00:24:25.470 --> 00:24:27.269 A:middle L:90%
will take off, essentially. Whereas when it's stable, that is,

547
00:24:27.269 --> 00:24:30.559 A:middle L:90%
below threshold, the epidemic will try to actually come

548
00:24:30.559 --> 00:24:32.339 A:middle L:90%
back. So the system will try to come back

549
00:24:32.339 --> 00:24:33.109 A:middle L:90%
to the state where there was no one infected.
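
NOTE
A toy illustration of the stability picture above. The talk does not write
out g, so the update rule here is an assumed SIS-style form:
p_i(t+1) = (1 - delta) * p_i(t) + (1 - p_i(t)) * (1 - prod_j (1 - beta * A_ij * p_j(t))).
Linearising this rule at the all-healthy fixed point gives (1 - delta) I + beta A,
whose largest eigenvalue is below 1 exactly when lambda * beta / delta < 1.
All names and parameter values are hypothetical.

```python
def sis_step(p, adj, beta, delta):
    """One step of the assumed nonlinear map p(t+1) = g(p(t))."""
    n = len(p)
    nxt = []
    for i in range(n):
        escape = 1.0  # probability that no infected neighbour transmits
        for j in range(n):
            if adj[i][j]:
                escape *= 1.0 - beta * p[j]
        nxt.append((1.0 - delta) * p[i] + (1.0 - p[i]) * (1.0 - escape))
    return nxt


def mean_infection_after(adj, beta, delta, p0=0.01, steps=300):
    """Perturb the all-healthy fixed point slightly and iterate the map."""
    p = [p0] * len(adj)
    for _ in range(steps):
        p = sis_step(p, adj, beta, delta)
    return sum(p) / len(p)


n = 10
clique = [[1 if i != j else 0 for j in range(n)] for i in range(n)]
lam = n - 1  # largest eigenvalue of an n-node clique
delta = 0.5

below = mean_infection_after(clique, beta=0.5 * delta / lam, delta=delta)  # s = 0.5
above = mean_infection_after(clique, beta=2.0 * delta / lam, delta=delta)  # s = 2.0
print(f"s = 0.5: mean infection probability {below:.2e}")
print(f"s = 2.0: mean infection probability {above:.3f}")
```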

550
00:24:33.480 --> 00:24:36.740 A:middle L:90%
And at threshold which is the neutral equilibrium. You

551
00:24:36.740 --> 00:24:37.940 A:middle L:90%
can say that there's no no inclination to go either

552
00:24:37.940 --> 00:24:41.839 A:middle L:90%
way. So what we have done is we cast

553
00:24:41.839 --> 00:24:44.000 A:middle L:90%
it as an NLDS and then pose it as

554
00:24:44.000 --> 00:24:45.329 A:middle L:90%
a stability and equilibrium point question rather than a threshold

555
00:24:45.329 --> 00:24:48.970 A:middle L:90%
question initially. So I won't try to go here

556
00:24:48.970 --> 00:24:49.910 A:middle L:90%
. But for an S. I. R

557
00:24:49.910 --> 00:24:52.789 A:middle L:90%
, for example, P(t) will be just three blocks, right

558
00:24:52.799 --> 00:24:55.599 A:middle L:90%
? Which is the probability of each node in the graph

559
00:24:55.609 --> 00:24:56.599 A:middle L:90%
being in each of the three states. That's it

560
00:24:56.670 --> 00:25:00.109 A:middle L:90%
. And you can have this G, and then the standard

561
00:25:00.109 --> 00:25:02.500 A:middle L:90%
idea is, the fixed point is again when no node

562
00:25:02.500 --> 00:25:04.380 A:middle L:90%
is infected, and the question we're asking is: is that stable

563
00:25:04.390 --> 00:25:06.680 A:middle L:90%
? And in special case of S. I.

564
00:25:06.680 --> 00:25:07.140 A:middle L:90%
R. you can think of two nodes in the

565
00:25:07.140 --> 00:25:11.089 A:middle L:90%
graph, say one and two, and essentially

566
00:25:11.089 --> 00:25:11.569 A:middle L:90%
this is P. I. one and this is P

567
00:25:11.569 --> 00:25:15.519 A:middle L:90%
. I. two, the probability node one is infected, the probability

568
00:25:15.529 --> 00:25:18.579 A:middle L:90%
node two is infected, and so on. Under the stable regime

569
00:25:18.589 --> 00:25:19.619 A:middle L:90%
, which is the below-threshold regime, you can see

570
00:25:19.619 --> 00:25:22.259 A:middle L:90%
that if you perturb the system, if you

571
00:25:22.259 --> 00:25:23.569 A:middle L:90%
make a few nodes infected in the system, it will

572
00:25:23.569 --> 00:25:27.059 A:middle L:90%
try to come back because it's below threshold. And

573
00:25:27.069 --> 00:25:30.019 A:middle L:90%
on the other hand, if you do the same

574
00:25:30.019 --> 00:25:32.069 A:middle L:90%
thing to an unstable system, it will just take

575
00:25:32.069 --> 00:25:33.880 A:middle L:90%
off and go. So what we've done is we

576
00:25:33.880 --> 00:25:37.019 A:middle L:90%
were able to separate out these two regimes based on

577
00:25:37.019 --> 00:25:40.019 A:middle L:90%
this condition. So again, please see the paper

578
00:25:40.019 --> 00:25:41.059 A:middle L:90%
for the whole proof. But these are the two

579
00:25:41.059 --> 00:25:44.339 A:middle L:90%
essential ingredients in the proof, which gives rise to

580
00:25:44.339 --> 00:25:48.730 A:middle L:90%
this nice linear separability. So, right, coming back to

581
00:25:48.730 --> 00:25:51.710 A:middle L:90%
the outline, as I said, I promised you that

582
00:25:51.720 --> 00:25:55.309 A:middle L:90%
this kind of analysis and proof technique can actually be

583
00:25:55.319 --> 00:25:57.289 A:middle L:90%
uh extended to give you even more powerful results in

584
00:25:57.289 --> 00:26:00.250 A:middle L:90%
other cases, for example in dynamic graphs. Why dynamic

585
00:26:00.250 --> 00:26:03.190 A:middle L:90%
graphs? So essentially the idea is that you

586
00:26:03.200 --> 00:26:06.930 A:middle L:90%
want to capture alternating behavior, right? And human, uh,

587
00:26:06.940 --> 00:26:08.309 A:middle L:90%
mobility, uh, people go to work in the

588
00:26:08.309 --> 00:26:10.900 A:middle L:90%
morning. So essentially you come in contact with the

589
00:26:10.900 --> 00:26:11.769 A:middle L:90%
co workers and then you go back to your home

590
00:26:11.859 --> 00:26:15.660 A:middle L:90%
and children probably go to school and playground and come

591
00:26:15.660 --> 00:26:18.049 A:middle L:90%
in contact with each other, and they come back

592
00:26:18.049 --> 00:26:19.049 A:middle L:90%
at night. So the adjacency matrix has changed.

593
00:26:19.059 --> 00:26:22.549 A:middle L:90%
So the nodes are the same, the same people

594
00:26:22.559 --> 00:26:23.660 A:middle L:90%
, it's just that their behavior changes in the day and

595
00:26:23.660 --> 00:26:27.109 A:middle L:90%
night. So you have two different adjacency matrices and

596
00:26:27.119 --> 00:26:30.779 A:middle L:90%
you can ask the same question here, right

597
00:26:30.789 --> 00:26:33.119 A:middle L:90%
? That if you have so for concreteness, I've

598
00:26:33.119 --> 00:26:34.380 A:middle L:90%
used just the SIS model here. Uh, if

599
00:26:34.380 --> 00:26:36.619 A:middle L:90%
you have the S. I. S. model and a set of

600
00:26:36.630 --> 00:26:38.089 A:middle L:90%
T arbitrary graphs. So the nice thing here is

601
00:26:38.089 --> 00:26:41.059 A:middle L:90%
that we assume the graphs can be arbitrary. They

602
00:26:41.059 --> 00:26:41.950 A:middle L:90%
are just given, we are given just a set

603
00:26:41.950 --> 00:26:45.220 A:middle L:90%
of T graphs, which can represent day, night. It

604
00:26:45.220 --> 00:26:48.279 A:middle L:90%
can be weekends. You can have any granularity.

605
00:26:48.289 --> 00:26:48.960 A:middle L:90%
You're just given this set of T graphs and you

606
00:26:48.960 --> 00:26:51.910 A:middle L:90%
want to ask the same question: will an epidemic take off

607
00:26:51.910 --> 00:26:53.569 A:middle L:90%
or not? So again, uh, we were able

608
00:26:53.569 --> 00:26:56.400 A:middle L:90%
to prove that, informally, there is no epidemic

609
00:26:56.410 --> 00:26:59.609 A:middle L:90%
if the eigenvalue of, uh, of a matrix

610
00:26:59.619 --> 00:27:00.750 A:middle L:90%
, which is a huge risk matrix is less than

611
00:27:00.750 --> 00:27:03.390 A:middle L:90%
one. And again, it's just a single number

612
00:27:03.390 --> 00:27:06.210 A:middle L:90%
, right? It's just the eigenvalue of a matrix

613
00:27:06.210 --> 00:27:08.009 A:middle L:90%
. And this matrix is just a product of some

614
00:27:08.009 --> 00:27:11.009 A:middle L:90%
matrices. Uh and the the important thing to note

615
00:27:11.009 --> 00:27:14.640 A:middle L:90%
here is that the matrices involved come from the

616
00:27:14.640 --> 00:27:17.910 A:middle L:90%
virus propagation model as well as the A_i, the adjacency

617
00:27:17.910 --> 00:27:19.839 A:middle L:90%
matrices, which is very reasonable and intuitive, right?

618
00:27:19.849 --> 00:27:22.700 A:middle L:90%
Because it should depend on the, uh, actually, the changing

619
00:27:22.700 --> 00:27:26.339 A:middle L:90%
adjacency matrices as well as the virus propagation model

620
00:27:26.349 --> 00:27:29.809 A:middle L:90%
. So, right and again, this gives you

621
00:27:29.809 --> 00:27:30.970 A:middle L:90%
the time plots. Uh, so on the left

622
00:27:30.970 --> 00:27:33.609 A:middle L:90%
side, I've shown you a synthetic uh network and

623
00:27:33.619 --> 00:27:36.890 A:middle L:90%
this is the MIT Reality data, which is, which

624
00:27:36.890 --> 00:27:37.990 A:middle L:90%
was a famous project by uh Sandy Pentland at M

625
00:27:37.990 --> 00:27:42.930 A:middle L:90%
I T. which tried to, uh, follow, like, track

626
00:27:42.940 --> 00:27:45.809 A:middle L:90%
undergraduates at M I T campus. And you can

627
00:27:45.809 --> 00:27:48.480 A:middle L:90%
see dips when there are weekends, because there's no connectivity

628
00:27:48.490 --> 00:27:52.869 A:middle L:90%
between people, between their mobile phones over Bluetooth, and so

629
00:27:52.869 --> 00:27:53.539 A:middle L:90%
on. So you can see the three again,

630
00:27:53.539 --> 00:27:56.609 A:middle L:90%
three different regimes, right? In all these three

631
00:27:56.609 --> 00:27:59.200 A:middle L:90%
cases and more concretely, if you look at the

632
00:27:59.210 --> 00:28:00.349 A:middle L:90%
takeoff plots, you can clearly see the difference.

633
00:28:00.359 --> 00:28:03.990 A:middle L:90%
Uh, so the, uh, the X axis is again effective

634
00:28:03.990 --> 00:28:07.420 A:middle L:90%
strength here. The effective strength is the eigenvalue of

635
00:28:07.420 --> 00:28:10.569 A:middle L:90%
that huge big nasty product of matrices, right?
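
NOTE
A hedged sketch of the time-varying threshold for the SIS case. The talk
only says the matrix is a product of matrices built from the virus model
and the adjacency matrices A_i; the per-snapshot form
S_i = (1 - delta) I + beta A_i used below is an assumption, as are all
names and parameter values. The day graph is a 5-node clique and the
night graph has no contacts at all.

```python
def system_matrix(adj, beta, delta):
    """Assumed per-snapshot matrix S_i = (1 - delta) I + beta A_i."""
    n = len(adj)
    return [[(1.0 - delta if i == j else 0.0) + beta * adj[i][j]
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_radius(m, iters=1000):
    """Power iteration for a non-negative matrix with positive diagonal."""
    n = len(m)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(abs(x) for x in w)
        if rho == 0.0:
            return 0.0
        v = [x / rho for x in w]
    return rho


n = 5
day = [[1 if i != j else 0 for j in range(n)] for i in range(n)]
night = [[0] * n for _ in range(n)]
beta, delta = 0.1, 0.5

product = matmul(system_matrix(day, beta, delta),
                 system_matrix(night, beta, delta))
s = spectral_radius(product)
print(f"effective strength of the day/night cycle: {s:.3f}, below threshold: {s < 1.0}")
```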

636
00:28:10.619 --> 00:28:11.680 A:middle L:90%
And the Y axis is, again, the footprint. You

637
00:28:11.680 --> 00:28:15.009 A:middle L:90%
can see the predicted threshold is here and these are

638
00:28:15.009 --> 00:28:22.779 A:middle L:90%
the two separate regions. So so yeah, the

639
00:28:22.779 --> 00:28:25.190 A:middle L:90%
second bonus I wanted to tell you about is competing

640
00:28:25.190 --> 00:28:27.440 A:middle L:90%
viruses. So, so far, right,

641
00:28:27.440 --> 00:28:30.990 A:middle L:90%
we only argued about a single virus, that there's one virus

642
00:28:30.990 --> 00:28:32.920 A:middle L:90%
and one. Uh and a bunch of contact networks

643
00:28:32.920 --> 00:28:34.690 A:middle L:90%
probably. So what if there are two viruses? And

644
00:28:34.700 --> 00:28:37.920 A:middle L:90%
these are really common scenarios. For example, you

645
00:28:37.920 --> 00:28:40.779 A:middle L:90%
can think of iPhone versus Android or BlackBerry. This

646
00:28:40.779 --> 00:28:45.200 A:middle L:90%
is the really or even more biological situations like common

647
00:28:45.200 --> 00:28:47.950 A:middle L:90%
flu versus pneumococcal infections and so on

648
00:28:47.960 --> 00:28:51.829 A:middle L:90%
. And the simple model that

649
00:28:51.829 --> 00:28:52.799 A:middle L:90%
we use here is an extension of the

650
00:28:52.809 --> 00:28:53.650 A:middle L:90%
SIS model, which you have

651
00:28:53.650 --> 00:28:56.059 A:middle L:90%
all seen. The important thing to note here is

652
00:28:56.059 --> 00:28:59.500 A:middle L:90%
that there is full mutual immunity. So what it

653
00:28:59.500 --> 00:29:00.380 A:middle L:90%
means is that if you have an iPhone you won't

654
00:29:00.390 --> 00:29:04.160 A:middle L:90%
buy an Android unless you ditch the iPhone. Right

655
00:29:04.170 --> 00:29:07.190 A:middle L:90%
? So that's what this model tries to capture.

656
00:29:07.200 --> 00:29:10.980 A:middle L:90%
Once you're infected with one of the viruses, you won't

657
00:29:10.990 --> 00:29:12.589 A:middle L:90%
be infected with another virus. So given such a

658
00:29:12.599 --> 00:29:15.529 A:middle L:90%
very simple extension of the classic SIS

659
00:29:15.529 --> 00:29:17.220 A:middle L:90%
model, what do we want to

660
00:29:17.220 --> 00:29:19.390 A:middle L:90%
ask? So, clearly, our previous

661
00:29:19.390 --> 00:29:22.140 A:middle L:90%
work answers the question of whether one of the viruses

662
00:29:22.140 --> 00:29:22.910 A:middle L:90%
will survive or not. Right? I mean if

663
00:29:22.910 --> 00:29:26.759 A:middle L:90%
it's above threshold it will survive, otherwise not.

664
00:29:26.769 --> 00:29:29.299 A:middle L:90%
But what we really want to answer is that if

665
00:29:29.299 --> 00:29:30.950 A:middle L:90%
both of them are above threshold, what happens?

666
00:29:30.960 --> 00:29:33.109 A:middle L:90%
What's the end state? Right. And what happens

667
00:29:33.109 --> 00:29:34.869 A:middle L:90%
in the end? And this can be thought

668
00:29:34.869 --> 00:29:37.069 A:middle L:90%
of as the footprint at the steady state: the footprint at

669
00:29:37.079 --> 00:29:40.460 A:middle L:90%
the steady state of the second virus

670
00:29:40.470 --> 00:29:41.859 A:middle L:90%
. So just for the sake of clarity, just assume

671
00:29:41.859 --> 00:29:44.910 A:middle L:90%
that one of the viruses is stronger than the other

672
00:29:44.920 --> 00:29:47.809 A:middle L:90%
. And uh, you're given just the first few

673
00:29:47.819 --> 00:29:49.069 A:middle L:90%
time ticks of the evolution, right? It's

674
00:29:49.069 --> 00:29:52.259 A:middle L:90%
the same infection profile. So what happens in the

675
00:29:52.259 --> 00:29:53.490 A:middle L:90%
end? Do you think it will go like

676
00:29:53.490 --> 00:29:56.710 A:middle L:90%
this? Or go like this? And

677
00:29:56.720 --> 00:29:57.980 A:middle L:90%
the ratio of the end states will be

678
00:29:57.980 --> 00:30:00.779 A:middle L:90%
some factor of the strength. Like if you,

679
00:30:00.789 --> 00:30:02.740 A:middle L:90%
it's reasonable to assume that if the virus is two

680
00:30:02.750 --> 00:30:04.559 A:middle L:90%
times as strong, the end steady state will be

681
00:30:04.559 --> 00:30:07.329 A:middle L:90%
two times as bad, right? The market share

682
00:30:07.329 --> 00:30:10.690 A:middle L:90%
would probably split on that basis, or its square, or

683
00:30:10.690 --> 00:30:12.500 A:middle L:90%
some other thing. So the really interesting thing which

684
00:30:12.500 --> 00:30:15.470 A:middle L:90%
we found is that all of this is not true. And

685
00:30:15.470 --> 00:30:17.660 A:middle L:90%
essentially, winner takes all. So even if one

686
00:30:17.660 --> 00:30:18.890 A:middle L:90%
of the viruses is even a bit stronger than the second

687
00:30:18.890 --> 00:30:21.470 A:middle L:90%
one, it will just wipe it out. So

688
00:30:21.470 --> 00:30:25.029 A:middle L:90%
the weaker virus always dies off. So what's the

689
00:30:25.029 --> 00:30:26.710 A:middle L:90%
result? The result is given our model and any

690
00:30:26.710 --> 00:30:30.279 A:middle L:90%
graph again the graph is totally arbitrary. So uh

691
00:30:30.289 --> 00:30:33.289 A:middle L:90%
compared with previous work, the weaker virus always dies out.

692
00:30:33.299 --> 00:30:36.299 A:middle L:90%
And of course the caveat is that the stronger virus

693
00:30:36.299 --> 00:30:37.640 A:middle L:90%
survives only if it is itself above threshold. And what's

694
00:30:37.640 --> 00:30:40.960 A:middle L:90%
the threshold? The threshold which we already computed in

695
00:30:40.960 --> 00:30:42.230 A:middle L:90%
our previous work: just the effective strength. Right? It's

696
00:30:42.230 --> 00:30:47.700 A:middle L:90%
the same as before. Right? So we tried

697
00:30:47.700 --> 00:30:49.099 A:middle L:90%
to look at some data and real examples. So this

698
00:30:49.109 --> 00:30:52.180 A:middle L:90%
shows you Google Trends, which is like a proxy

699
00:30:52.180 --> 00:30:55.339 A:middle L:90%
for the interest in, or the market share of, a product.

700
00:30:55.349 --> 00:30:56.410 A:middle L:90%
And you can see, if you plot

701
00:30:56.410 --> 00:30:59.230 A:middle L:90%
the search percentage versus time. This

702
00:30:59.230 --> 00:31:00.210 A:middle L:90%
is real data, right? For these two pairs

703
00:31:00.210 --> 00:31:02.890 A:middle L:90%
of products. So you can see some around these

704
00:31:02.890 --> 00:31:06.119 A:middle L:90%
like, bursts; it's like Christmas, for example: you

705
00:31:06.119 --> 00:31:07.579 A:middle L:90%
can see when the sales just pick up there.

706
00:31:07.589 --> 00:31:12.059 A:middle L:90%
But if you look at the broad picture, the

707
00:31:12.059 --> 00:31:15.329 A:middle L:90%
qualitative behavior is still there. For example that it

708
00:31:15.329 --> 00:31:17.000 A:middle L:90%
was just a you can see that the strong viruses

709
00:31:17.000 --> 00:31:22.769 A:middle L:90%
coming down. Right. So, so far, I

710
00:31:22.769 --> 00:31:25.599 A:middle L:90%
have concentrated on essentially the theoretical part right? Which

711
00:31:25.599 --> 00:31:27.230 A:middle L:90%
is analyzing these models and seeing what happens, predicting

712
00:31:27.240 --> 00:31:30.519 A:middle L:90%
and answering some metrics of things. But how do

713
00:31:30.519 --> 00:31:32.220 A:middle L:90%
you actually go ahead and use this to do some

714
00:31:32.220 --> 00:31:33.829 A:middle L:90%
tasks? So the action part, which I'll go

715
00:31:33.829 --> 00:31:37.789 A:middle L:90%
over, is who-to-immunize algorithms. So what's

716
00:31:37.799 --> 00:31:41.970 A:middle L:90%
the problem? So, concretely, the problem is that

717
00:31:41.970 --> 00:31:44.009 A:middle L:90%
you're given a virus propagation model and you're given a

718
00:31:44.009 --> 00:31:45.859 A:middle L:90%
budget, right? The budget you can think of as the

719
00:31:45.859 --> 00:31:48.180 A:middle L:90%
number of nodes you want to remove and you want

720
00:31:48.180 --> 00:31:49.730 A:middle L:90%
to find the best k nodes for removal. For

721
00:31:49.730 --> 00:31:52.599 A:middle L:90%
example, if k is two in this network and you remove

722
00:31:52.599 --> 00:31:55.460 A:middle L:90%
these two nodes, are these two nodes better, or

723
00:31:55.460 --> 00:31:57.059 A:middle L:90%
these two nodes? Intuitively you might imagine this

724
00:31:57.059 --> 00:32:00.160 A:middle L:90%
node removal is better because it makes the graph a

725
00:32:00.160 --> 00:32:04.460 A:middle L:90%
chain, and as we have seen before, a chain is bad

726
00:32:04.460 --> 00:32:07.329 A:middle L:90%
for the virus because it can't easily spread. So

727
00:32:07.329 --> 00:32:09.630 A:middle L:90%
you can guess what's coming up:

728
00:32:09.640 --> 00:32:12.819 A:middle L:90%
I'll talk about static graphs and then I'll try to

729
00:32:12.819 --> 00:32:15.319 A:middle L:90%
give you a flavor of an application. But what

730
00:32:15.319 --> 00:32:16.529 A:middle L:90%
are the challenges in such a question? Right,

731
00:32:16.539 --> 00:32:20.099 A:middle L:90%
so one challenge is the metric: how

732
00:32:20.099 --> 00:32:22.289 A:middle L:90%
do you measure the goodness value of a set of

733
00:32:22.289 --> 00:32:23.599 A:middle L:90%
nodes? You have to measure which

734
00:32:23.599 --> 00:32:27.430 A:middle L:90%
set of nodes is better. And then the algorithm

735
00:32:27.440 --> 00:32:29.980 A:middle L:90%
. How do you actually go and quickly find these

736
00:32:29.980 --> 00:32:31.829 A:middle L:90%
best k nodes with the highest goodness value

737
00:32:31.839 --> 00:32:36.299 A:middle L:90%
. So, given my previous work, the

738
00:32:36.299 --> 00:32:37.750 A:middle L:90%
proposed vulnerability measure is actually lambda. And why

739
00:32:37.750 --> 00:32:40.460 A:middle L:90%
is it vulnerability? It's because lambda

740
00:32:40.460 --> 00:32:44.190 A:middle L:90%
is the epidemic threshold. So clearly what you want to

741
00:32:44.190 --> 00:32:45.039 A:middle L:90%
really do is to drop the eigenvalue as fast as

742
00:32:45.039 --> 00:32:46.880 A:middle L:90%
possible. Because if it's above the threshold, you

743
00:32:46.880 --> 00:32:49.980 A:middle L:90%
know, there is an epidemic, while below the threshold

744
00:32:49.980 --> 00:32:51.990 A:middle L:90%
there is not. So if you want to remove

745
00:32:51.990 --> 00:32:53.150 A:middle L:90%
nodes, remove nodes in a way that drops the

746
00:32:53.150 --> 00:32:55.890 A:middle L:90%
eigenvalue as fast as possible. And so what I

747
00:32:55.890 --> 00:32:59.109 A:middle L:90%
mean by eigen-drop is the

748
00:32:59.119 --> 00:33:00.960 A:middle L:90%
change in the eigenvalue. Right? So you have

749
00:33:00.970 --> 00:33:02.279 A:middle L:90%
these two graphs: this is the original graph, and here you have

750
00:33:02.279 --> 00:33:05.920 A:middle L:90%
removed nodes two and six. You can see the eigenvalue

751
00:33:05.920 --> 00:33:07.730 A:middle L:90%
is this here, and the eigenvalue here is this

752
00:33:07.740 --> 00:33:09.450 A:middle L:90%
eigenvalue. Again, the largest eigenvalue of

753
00:33:09.450 --> 00:33:12.890 A:middle L:90%
the adjacency matrix. And what you want to find

754
00:33:12.890 --> 00:33:15.950 A:middle L:90%
is the set of nodes which maximizes this eigen-drop

755
00:33:15.049 --> 00:33:20.569 A:middle L:90%
within the budget. So, as you can guess,

756
00:33:20.569 --> 00:33:22.859 A:middle L:90%
the direct algorithm is really expensive. You can prove

757
00:33:22.859 --> 00:33:24.720 A:middle L:90%
that it's NP-hard. Just to give you an example:

758
00:33:24.720 --> 00:33:28.210 A:middle L:90%
if you have just 1000 nodes and you run the

759
00:33:28.220 --> 00:33:30.109 A:middle L:90%
brute-force method, you can see that it takes

760
00:33:30.119 --> 00:33:34.319 A:middle L:90%
almost 2600 years just to find the five best nodes

761
00:33:34.329 --> 00:33:37.230 A:middle L:90%
. So what's our answer? So our

762
00:33:37.230 --> 00:33:39.859 A:middle L:90%
method involves two parts, again. Part one is for

763
00:33:39.859 --> 00:33:42.430 A:middle L:90%
the shield value, which is the goodness value of a

764
00:33:42.430 --> 00:33:44.920 A:middle L:90%
network, or rather, of a set of nodes. What

765
00:33:44.920 --> 00:33:46.170 A:middle L:90%
we did was carefully approximate the eigen-drop, which

766
00:33:46.170 --> 00:33:50.089 A:middle L:90%
is the drop in the eigenvalue, using matrix perturbation theory

767
00:33:50.099 --> 00:33:52.950 A:middle L:90%
. So once you get this formula for the

768
00:33:52.950 --> 00:33:54.720 A:middle L:90%
eigen-drop, you can actually use it. And it turns

769
00:33:54.720 --> 00:33:59.500 A:middle L:90%
out that this function which you

770
00:33:59.500 --> 00:34:01.390 A:middle L:90%
get from the matrix perturbation theory is essentially submodular

771
00:34:01.400 --> 00:34:05.440 A:middle L:90%
. What it really means is that because

772
00:34:05.440 --> 00:34:07.680 A:middle L:90%
it's submodular, you can use a simple greedy approximation

773
00:34:07.680 --> 00:34:09.090 A:middle L:90%
to quickly find the best k nodes. And this gives

774
00:34:09.090 --> 00:34:12.869 A:middle L:90%
you a near-optimal solution, with a

775
00:34:12.880 --> 00:34:15.230 A:middle L:90%
running time that is linear in both nodes and

776
00:34:15.230 --> 00:34:19.599 A:middle L:90%
edges. So this was nice because uh as you

777
00:34:19.599 --> 00:34:21.710 A:middle L:90%
can see, the direct algorithm is really expensive,

778
00:34:21.719 --> 00:34:22.659 A:middle L:90%
but we were able to get a really near optimal

779
00:34:22.659 --> 00:34:25.929 A:middle L:90%
linear time solution. And this is just an experiment

780
00:34:25.940 --> 00:34:29.260 A:middle L:90%
. So this is sort of contact graph again.

781
00:34:29.260 --> 00:34:30.760 A:middle L:90%
So this gives you the again the time profile,

782
00:34:30.769 --> 00:34:34.809 A:middle L:90%
this is the log of the fraction of infected nodes versus

783
00:34:34.809 --> 00:34:37.469 A:middle L:90%
time. And these are the different immunization metrics

784
00:34:37.469 --> 00:34:40.050 A:middle L:90%
and algorithms currently in use. So you can see

785
00:34:40.050 --> 00:34:42.969 A:middle L:90%
that NetShield is the lowest of all of them

786
00:34:42.969 --> 00:34:44.989 A:middle L:90%
, so it quickly drops to zero and the infection

787
00:34:44.989 --> 00:34:46.239 A:middle L:90%
dies off. But the interesting thing to note here

788
00:34:46.239 --> 00:34:49.039 A:middle L:90%
is that, if you look at acquaintance

789
00:34:49.039 --> 00:34:51.980 A:middle L:90%
immunization, this is a really popular method where

790
00:34:51.989 --> 00:34:53.420 A:middle L:90%
you choose a random person out of a phone book.

791
00:34:53.429 --> 00:34:55.420 A:middle L:90%
You don't immunize that random person; you immunize a

792
00:34:55.429 --> 00:34:59.019 A:middle L:90%
random neighbor of that person. What it does is

793
00:34:59.019 --> 00:35:00.980 A:middle L:90%
that it gives you a handle on the degree,

794
00:35:00.989 --> 00:35:01.869 A:middle L:90%
the degree distribution of the graph. So you

795
00:35:01.880 --> 00:35:05.670 A:middle L:90%
tend to immunize hubs, highly connected nodes,

796
00:35:05.679 --> 00:35:07.570 A:middle L:90%
but you can see that our method is clearly much

797
00:35:07.570 --> 00:35:09.280 A:middle L:90%
better than that. Uh This is because we optimize

798
00:35:09.280 --> 00:35:13.139 A:middle L:90%
the real shield value, right, which is the

799
00:35:13.150 --> 00:35:14.960 A:middle L:90%
eigenvalue which we got from our previous results.

800
00:35:14.969 --> 00:35:21.000 A:middle L:90%
So as promised, this is

801
00:35:21.010 --> 00:35:23.710 A:middle L:90%
a variant of the node removal problem, on which

802
00:35:23.710 --> 00:35:27.610 A:middle L:90%
I actually had the good fortune of working with real

803
00:35:27.610 --> 00:35:30.630 A:middle L:90%
doctors at Michigan. In fact, it was interesting

804
00:35:30.630 --> 00:35:32.570 A:middle L:90%
working with the person who was a real doctor because

805
00:35:32.570 --> 00:35:35.980 A:middle L:90%
sometimes he had to come out of surgery for

806
00:35:35.980 --> 00:35:37.280 A:middle L:90%
a meeting and used to tell us about things happening

807
00:35:37.280 --> 00:35:42.489 A:middle L:90%
there. So uh the problem here is that you

808
00:35:42.489 --> 00:35:45.389 A:middle L:90%
have a network of hospitals, and hospitals transfer patients,

809
00:35:45.579 --> 00:35:50.130 A:middle L:90%
and these patients are critically ill; these patients are

810
00:35:50.139 --> 00:35:52.550 A:middle L:90%
critically ill. Right? There are drug-resistant bacteria

811
00:35:52.559 --> 00:35:55.230 A:middle L:90%
like extensively drug-resistant tuberculosis. And what's

812
00:35:55.230 --> 00:35:58.570 A:middle L:90%
happening is that you have a fixed budget

813
00:35:58.570 --> 00:36:00.280 A:middle L:90%
of resources: it can be money, it can be

814
00:36:00.280 --> 00:36:04.679 A:middle L:90%
disinfectant, it can be a specialized medication, which a centralized

815
00:36:04.690 --> 00:36:07.510 A:middle L:90%
agency is trying to distribute among these hospitals. And how do

816
00:36:07.510 --> 00:36:08.469 A:middle L:90%
you do that? So suppose you have this

817
00:36:08.469 --> 00:36:10.730 A:middle L:90%
bunch of disinfectants and you give it to one of

818
00:36:10.730 --> 00:36:14.610 A:middle L:90%
the hospitals. What really happens is that the hospital

819
00:36:14.610 --> 00:36:17.579 A:middle L:90%
becomes more robust, right? So the strength of

820
00:36:17.579 --> 00:36:22.050 A:middle L:90%
the infection decreases. And how the strength decreases is

821
00:36:22.050 --> 00:36:23.150 A:middle L:90%
given by a function. So of course there are

822
00:36:23.150 --> 00:36:25.469 A:middle L:90%
some specialized functions used in literature, but the key

823
00:36:25.469 --> 00:36:30.510 A:middle L:90%
idea everywhere is that it's diminishing returns because the more

824
00:36:30.510 --> 00:36:32.019 A:middle L:90%
you give, the more infection-control resources, you

825
00:36:32.019 --> 00:36:35.420 A:middle L:90%
don't get a proportionately higher impact right? Which is

826
00:36:35.429 --> 00:36:37.010 A:middle L:90%
very reasonable. And under such a setting, you

827
00:36:37.010 --> 00:36:39.559 A:middle L:90%
really want to find out how you distribute these and

828
00:36:39.559 --> 00:36:45.019 A:middle L:90%
it can be at any granularity, right, to maximize

829
00:36:45.019 --> 00:36:46.809 A:middle L:90%
these hospitals so that these hospital starting a huge technical

830
00:36:46.809 --> 00:36:51.699 A:middle L:90%
picture you have already seen in the introduction. Right? So

831
00:36:51.699 --> 00:36:52.440 A:middle L:90%
this is the medical network; I think you

832
00:36:52.440 --> 00:36:53.780 A:middle L:90%
have already seen the graph. So this is the

833
00:36:53.780 --> 00:36:57.710 A:middle L:90%
current practice. The current practice is essentially giving every hospital

834
00:36:57.719 --> 00:37:00.809 A:middle L:90%
an equal amount because they don't want to discriminate among hospitals

835
00:37:00.820 --> 00:37:02.079 A:middle L:90%
. So you can see SMART-ALLOC, which is our method,

836
00:37:02.079 --> 00:37:05.969 A:middle L:90%
and you can see that it has substantially fewer

837
00:37:05.969 --> 00:37:07.940 A:middle L:90%
infections. And this is the running time.

838
00:37:07.949 --> 00:37:10.699 A:middle L:90%
So before we came, before we started this collaboration

839
00:37:10.699 --> 00:37:13.690 A:middle L:90%
with the doctors, they were actually running simulations, Monte

840
00:37:13.690 --> 00:37:15.710 A:middle L:90%
Carlo simulations and they used to take more than three

841
00:37:15.710 --> 00:37:17.269 A:middle L:90%
weeks to actually figure out any kind of distribution.

842
00:37:17.280 --> 00:37:21.500 A:middle L:90%
So this is in contrast to the current practice on

843
00:37:21.500 --> 00:37:22.429 A:middle L:90%
the ground in the hospital, which is just uniform

844
00:37:22.440 --> 00:37:25.349 A:middle L:90%
. So you can see that this is greater than

845
00:37:25.349 --> 00:37:30.139 A:middle L:90%
one week versus just 14 seconds, more than 30,000

846
00:37:30.150 --> 00:37:31.840 A:middle L:90%
x speedup. So yeah, I mean, it's a

847
00:37:31.840 --> 00:37:37.400 A:middle L:90%
totally different way of doing these things here. Uh

848
00:37:37.409 --> 00:37:39.840 A:middle L:90%
just to give you a further example of this problem

849
00:37:40.030 --> 00:37:43.230 A:middle L:90%
, coming back to the theme that similar

850
00:37:43.230 --> 00:37:45.190 A:middle L:90%
problems occur in many different areas, this problem also occurs

851
00:37:45.190 --> 00:37:49.250 A:middle L:90%
in social graphs. So this is an online

852
00:37:49.250 --> 00:37:52.969 A:middle L:90%
virtual game, Second Life, and there the administrators have some

853
00:37:52.969 --> 00:37:53.590 A:middle L:90%
resources, which you can think of as

854
00:37:53.590 --> 00:37:55.699 A:middle L:90%
time and you want to do this, you want

855
00:37:55.699 --> 00:37:59.539 A:middle L:90%
to see which users are misbehaving and which users you should

856
00:37:59.539 --> 00:38:00.670 A:middle L:90%
give time to. And this is the Penn network

857
00:38:00.679 --> 00:38:04.480 A:middle L:90%
. This is just the hospital network of Pennsylvania and

858
00:38:04.480 --> 00:38:06.239 A:middle L:90%
it's an all pairs. So it's kinda was all

859
00:38:06.239 --> 00:38:07.940 A:middle L:90%
kinds of things. Again, you can see more

860
00:38:07.940 --> 00:38:12.829 A:middle L:90%
than a 5x or 2.5x difference between the current

861
00:38:12.829 --> 00:38:16.840 A:middle L:90%
practice and ours. So lower is better here.

862
00:38:16.849 --> 00:38:23.090 A:middle L:90%
Right, so the final part which I

863
00:38:23.099 --> 00:38:25.550 A:middle L:90%
would like to talk about is understanding processes: learning

864
00:38:25.550 --> 00:38:29.940 A:middle L:90%
models from Twitter, which is kind of a study

865
00:38:29.940 --> 00:38:31.400 A:middle L:90%
on huge data. So this was work done

866
00:38:31.400 --> 00:38:34.170 A:middle L:90%
with Yahoo. What's the problem? We are given

867
00:38:34.170 --> 00:38:37.079 A:middle L:90%
an action log of people tweeting hashtags, right? So

868
00:38:37.090 --> 00:38:38.789 A:middle L:90%
these are people who are tweeting a hashtag, and

869
00:38:38.789 --> 00:38:43.119 A:middle L:90%
you're just recording what they tweeted. And you have

870
00:38:43.119 --> 00:38:45.289 A:middle L:90%
an underlying network of users. So this network can be

871
00:38:45.289 --> 00:38:47.960 A:middle L:90%
defined in many different ways, and I'll come to

872
00:38:47.960 --> 00:38:51.590 A:middle L:90%
what we actually use in our work and what we

873
00:38:51.590 --> 00:38:52.980 A:middle L:90%
want to find is how external influence varies with the

874
00:38:52.980 --> 00:38:55.800 A:middle L:90%
hashtag. What does this concretely mean

875
00:38:55.809 --> 00:39:00.389 A:middle L:90%
? So you have a network, which is this

876
00:39:00.389 --> 00:39:04.380 A:middle L:90%
person following this one and so on, and she tweets something

877
00:39:04.389 --> 00:39:06.619 A:middle L:90%
about some topic. It can be about Egypt

878
00:39:06.619 --> 00:39:10.150 A:middle L:90%
or Justin Bieber. He picks it up and he

879
00:39:10.150 --> 00:39:13.500 A:middle L:90%
also tweets about it and you see even this person

880
00:39:13.500 --> 00:39:15.760 A:middle L:90%
tweeting about it. So the question we really wanted

881
00:39:15.760 --> 00:39:17.650 A:middle L:90%
to ask is: what really happened, and what part,

882
00:39:17.650 --> 00:39:20.690 A:middle L:90%
statistically can you figure out what part of it would

883
00:39:20.699 --> 00:39:22.070 A:middle L:90%
be due to this: because he actually saw her

884
00:39:22.079 --> 00:39:24.539 A:middle L:90%
tweeting it, or he just saw something on TV and

885
00:39:24.539 --> 00:39:27.389 A:middle L:90%
went ahead and tweeted about it. So why is

886
00:39:27.389 --> 00:39:29.550 A:middle L:90%
this important? If you are a marketer and have

887
00:39:29.559 --> 00:39:31.010 A:middle L:90%
a bunch of dollars, where should you give the dollars?

888
00:39:31.010 --> 00:39:34.579 A:middle L:90%
Should you give the money to this person because she caused

889
00:39:34.579 --> 00:39:37.139 A:middle L:90%
the whole cascade, or should you just run an advertisement

890
00:39:37.139 --> 00:39:39.159 A:middle L:90%
on TV because that will give you much more mileage for

891
00:39:39.159 --> 00:39:43.179 A:middle L:90%
your money? And we wanted to see how

892
00:39:43.179 --> 00:39:45.400 A:middle L:90%
this varies with the hashtag, right? Because as you

893
00:39:45.400 --> 00:39:46.739 A:middle L:90%
can imagine this might depend on the actual content of

894
00:39:46.750 --> 00:39:52.849 A:middle L:90%
the tag. So the data we used was,

895
00:39:52.860 --> 00:39:54.030 A:middle L:90%
as I said, this was at Yahoo, and this

896
00:39:54.030 --> 00:39:57.989 A:middle L:90%
was almost 15 terabytes of data

897
00:39:58.000 --> 00:40:00.699 A:middle L:90%
. Yahoo had the Twitter firehose, where

898
00:40:00.699 --> 00:40:01.420 A:middle L:90%
they just used to get this whole big dump of

899
00:40:01.429 --> 00:40:05.119 A:middle L:90%
the data: more than 750 million tweets. And

900
00:40:05.130 --> 00:40:07.840 A:middle L:90%
we ran our algorithms on the Hadoop and

901
00:40:07.840 --> 00:40:09.159 A:middle L:90%
Pig system. It's huge

902
00:40:09.159 --> 00:40:12.210 A:middle L:90%
, they have a really excellent infrastructure, more than

903
00:40:12.210 --> 00:40:15.650 A:middle L:90%
6000 machines. What we did was we took

904
00:40:15.659 --> 00:40:17.630 A:middle L:90%
hashtags and we looked at how they vary in their

905
00:40:17.639 --> 00:40:22.050 A:middle L:90%
behavior on a network of users. In the network, a person is connected

906
00:40:22.050 --> 00:40:23.469 A:middle L:90%
to another person if he or she can influence the other

907
00:40:23.469 --> 00:40:25.730 A:middle L:90%
person. Right? So what influence means is,

908
00:40:25.730 --> 00:40:29.510 A:middle L:90%
essentially, we are assuming that either you're a follower, or

909
00:40:29.510 --> 00:40:30.719 A:middle L:90%
you have at least directed messages to her.

910
00:40:30.730 --> 00:40:34.329 A:middle L:90%
So yeah there are many different ways of defining this

911
00:40:34.329 --> 00:40:36.260 A:middle L:90%
network, this is just one of the more acceptable

912
00:40:36.269 --> 00:40:39.809 A:middle L:90%
ways in the literature right now. So the model

913
00:40:39.809 --> 00:40:42.849 A:middle L:90%
here is this. Again, skipping details, the main point here is

914
00:40:42.849 --> 00:40:44.929 A:middle L:90%
that we developed a model where the propagation has a

915
00:40:44.929 --> 00:40:46.219 A:middle L:90%
part due to influence and a part that is external. So the influence part

916
00:40:46.219 --> 00:40:49.960 A:middle L:90%
is essentially how you get influence from the neighbors.

917
00:40:50.110 --> 00:40:52.820 A:middle L:90%
And the external part is what percentage of it is

918
00:40:52.829 --> 00:40:55.239 A:middle L:90%
through externalities rather than the network itself. So what

919
00:40:55.239 --> 00:40:58.300 A:middle L:90%
we did was we developed a model which takes the

920
00:40:58.300 --> 00:41:00.210 A:middle L:90%
previous observations into account. And there are some parameters

921
00:41:00.210 --> 00:41:02.489 A:middle L:90%
which explicitly represent the influence. So I won't

922
00:41:02.489 --> 00:41:05.960 A:middle L:90%
go into the details of the model itself. But

923
00:41:05.969 --> 00:41:07.150 A:middle L:90%
the key thing to note here is that there are

924
00:41:07.150 --> 00:41:09.980 A:middle L:90%
parameters which actually represent external influence directly. And then

925
00:41:09.980 --> 00:41:13.539 A:middle L:90%
we also developed an EM-style alternating minimization

926
00:41:13.539 --> 00:41:15.519 A:middle L:90%
algorithm to learn these models. But once you have

927
00:41:15.519 --> 00:41:15.880 A:middle L:90%
learned the models, when you have this set of

928
00:41:15.880 --> 00:41:19.460 A:middle L:90%
parameters on different hashtags, what we did was we

929
00:41:19.460 --> 00:41:22.320 A:middle L:90%
went ahead and grouped these tags according to the parameters

930
00:41:22.329 --> 00:41:23.780 A:middle L:90%
. And that's where we got interesting results on the

931
00:41:23.780 --> 00:41:27.599 A:middle L:90%
different behavior of hashtags. To give you a flavor of

932
00:41:27.599 --> 00:41:29.679 A:middle L:90%
the results, you can see this. So

933
00:41:29.679 --> 00:41:31.280 A:middle L:90%
if you think about these being external effects, so

934
00:41:31.280 --> 00:41:37.289 A:middle L:90%
the more the external effect, essentially the more externalities in the

935
00:41:37.300 --> 00:41:39.989 A:middle L:90%
hashtag. And this is versus time. And note that these

936
00:41:39.989 --> 00:41:43.409 A:middle L:90%
are parameters which are learned by the model, not

937
00:41:43.420 --> 00:41:45.989 A:middle L:90%
given in the data itself. So these are parameters as

938
00:41:45.989 --> 00:41:47.579 A:middle L:90%
inferred by the model. So there are a bunch of different

939
00:41:47.579 --> 00:41:51.570 A:middle L:90%
behaviors; I'll just go over two of them. So

940
00:41:51.579 --> 00:41:53.150 A:middle L:90%
these are hashtags which represent long-running tags

941
00:41:53.159 --> 00:41:54.599 A:middle L:90%
. So you can see that there is a really high

942
00:41:54.599 --> 00:41:57.510 A:middle L:90%
external component. So these are tags which have

943
00:41:57.510 --> 00:41:59.550 A:middle L:90%
been on Twitter for almost two years. So there

944
00:41:59.550 --> 00:42:01.369 A:middle L:90%
is really no local component there; there is no network

945
00:42:01.380 --> 00:42:04.199 A:middle L:90%
effect there. People know about the tags and they

946
00:42:04.199 --> 00:42:05.739 A:middle L:90%
tweet them whenever they want. For example, this

947
00:42:05.739 --> 00:42:07.630 A:middle L:90%
just says "now watching": what are they

948
00:42:07.630 --> 00:42:09.380 A:middle L:90%
watching now? It doesn't really depend on your friends

949
00:42:09.389 --> 00:42:13.730 A:middle L:90%
. Right? And so these are tags with a really

950
00:42:13.730 --> 00:42:15.110 A:middle L:90%
high external component. On the other hand,

951
00:42:15.119 --> 00:42:17.610 A:middle L:90%
these are really word-of-mouth processes. These are

952
00:42:17.619 --> 00:42:22.949 A:middle L:90%
hashtags which actually grow organically in local networks and then

953
00:42:22.960 --> 00:42:25.599 A:middle L:90%
take off, and then they are mentioned in the Twitter trending

954
00:42:25.610 --> 00:42:29.329 A:middle L:90%
topics. And these trending topics essentially capture the

955
00:42:29.329 --> 00:42:31.739 A:middle L:90%
Slashdot effect, because suddenly they're on the trending list,

956
00:42:31.750 --> 00:42:34.690 A:middle L:90%
everybody knows about it and then there is no local

957
00:42:34.690 --> 00:42:37.030 A:middle L:90%
component again because now everybody knows about it and they

958
00:42:37.039 --> 00:42:37.829 A:middle L:90%
talk about it. So what we were able to

959
00:42:37.829 --> 00:42:39.579 A:middle L:90%
do is, we were able to capture these kinds

960
00:42:39.579 --> 00:42:42.659 A:middle L:90%
of different behaviors, right? And if you have

961
00:42:42.670 --> 00:42:44.849 A:middle L:90%
money, so these are just bursty, external events

962
00:42:44.849 --> 00:42:46.039 A:middle L:90%
and trending. So if you have a bunch of

963
00:42:46.039 --> 00:42:49.619 A:middle L:90%
money, you would give it to hashtags that behave like

964
00:42:49.619 --> 00:42:52.139 A:middle L:90%
this, right? Because essentially you are just seeding

965
00:42:52.139 --> 00:42:55.440 A:middle L:90%
the community with these hashtags; they're talking about it

966
00:42:55.480 --> 00:42:57.969 A:middle L:90%
and then they suddenly take off and a lot of

967
00:42:57.980 --> 00:42:59.880 A:middle L:90%
people know about it. So this gives you the

968
00:42:59.880 --> 00:43:01.900 A:middle L:90%
greatest bang for your buck. And the nice thing

969
00:43:01.900 --> 00:43:04.809 A:middle L:90%
here is that you can use this for forecasting and anomaly

970
00:43:04.809 --> 00:43:06.969 A:middle L:90%
detection, because if you have hashtags which are expected

971
00:43:06.969 --> 00:43:07.440 A:middle L:90%
to behave in a certain way and they behave so

972
00:43:07.590 --> 00:43:12.920 A:middle L:90%
differently, then you can do something there. Right?

973
00:43:12.929 --> 00:43:15.889 A:middle L:90%
So this concludes the uh dynamical process part of my

974
00:43:15.900 --> 00:43:19.280 A:middle L:90%
talk. So I'll try to quickly go over, uh,

975
00:43:19.289 --> 00:43:22.280 A:middle L:90%
for five minutes also, uh, some of the other interesting

976
00:43:22.280 --> 00:43:22.860 A:middle L:90%
work I have done. So as I said,

977
00:43:22.860 --> 00:43:24.610 A:middle L:90%
I won't get too much time to discuss the details

978
00:43:24.610 --> 00:43:25.760 A:middle L:90%
, but I will be happy to talk about it

979
00:43:25.769 --> 00:43:30.099 A:middle L:90%
offline. So the first thing is like uh community

980
00:43:30.110 --> 00:43:32.110 A:middle L:90%
detection. This was work I did with Sprint research

981
00:43:32.119 --> 00:43:34.710 A:middle L:90%
. So the nice thing is that they had really

982
00:43:34.710 --> 00:43:37.239 A:middle L:90%
huge amounts of data on mobile call-graph users, right

983
00:43:37.239 --> 00:43:38.699 A:middle L:90%
? So essentially like we had, we collected data

984
00:43:38.699 --> 00:43:42.289 A:middle L:90%
from a switch in a large US city, which should remain

985
00:43:42.289 --> 00:43:45.670 A:middle L:90%
anonymous and you have 200,000 users and millions of calls

986
00:43:45.679 --> 00:43:49.559 A:middle L:90%
. So what we wanted to do is essentially understand

987
00:43:49.570 --> 00:43:52.000 A:middle L:90%
what these graphs look like. So,

988
00:43:52.010 --> 00:43:54.400 A:middle L:90%
to cut a long story short, we did

989
00:43:54.400 --> 00:43:58.260 A:middle L:90%
some extra data analysis, but the key thing is

990
00:43:58.260 --> 00:44:00.050 A:middle L:90%
that we were able to, if the graphs look

991
00:44:00.050 --> 00:44:04.449 A:middle L:90%
like this, which is essentially a core with a

992
00:44:04.449 --> 00:44:07.130 A:middle L:90%
lot of different small communities connected to the core,

993
00:44:07.139 --> 00:44:09.019 A:middle L:90%
then you can quickly identify these small communities and then

994
00:44:09.019 --> 00:44:12.480 A:middle L:90%
a really interesting thing is that these communities can be

995
00:44:12.480 --> 00:44:15.030 A:middle L:90%
cliques or bipartite cores. So even finding

996
00:44:15.039 --> 00:44:16.869 A:middle L:90%
planted cliques is a really hard problem, right?

997
00:44:16.880 --> 00:44:21.119 A:middle L:90%
But, uh, finding bipartite cores is even harder, and

998
00:44:21.119 --> 00:44:22.809 A:middle L:90%
bipartite cores are not really communities in the

999
00:44:22.809 --> 00:44:24.650 A:middle L:90%
traditional sense because there's no real connection between them,

1000
00:44:24.659 --> 00:44:27.670 A:middle L:90%
there's a connection across them to the other side.

1001
00:44:27.679 --> 00:44:30.309 A:middle L:90%
So, and what we found is that this kind

1002
00:44:30.309 --> 00:44:32.760 A:middle L:90%
of pattern, the pattern which we developed, occurs in a

1003
00:44:32.760 --> 00:44:35.409 A:middle L:90%
lot of different data sets, a lot of different

1004
00:44:35.489 --> 00:44:37.940 A:middle L:90%
uh, switches and users and months on the Sprint

1005
00:44:37.940 --> 00:44:39.000 A:middle L:90%
dataset as well. So using this we were able

1006
00:44:39.000 --> 00:44:43.269 A:middle L:90%
to find really interesting communities. For example, this

1007
00:44:43.269 --> 00:44:45.320 A:middle L:90%
is a patent graph. This is just who cited

1008
00:44:45.320 --> 00:44:46.590 A:middle L:90%
whom in a patent citation network. You can see

1009
00:44:46.590 --> 00:44:49.500 A:middle L:90%
that you can find, we could quickly find these

1010
00:44:49.500 --> 00:44:51.179 A:middle L:90%
two kinds of communities, which are like patents from the

1011
00:44:51.179 --> 00:44:53.119 A:middle L:90%
same inventors. And this shows the curtain based

1012
00:44:53.230 --> 00:44:55.809 A:middle L:90%
geographic, essentially all patents were on the same thing

1013
00:44:55.809 --> 00:44:58.699 A:middle L:90%
and you just go ahead and pick the reference from

1014
00:44:58.699 --> 00:45:00.809 A:middle L:90%
the previous patent and just copy it. So you're

1015
00:45:00.809 --> 00:45:01.880 A:middle L:90%
citing all these previous patents which the other one cited

1016
00:45:02.480 --> 00:45:07.199 A:middle L:90%
. And so uh, for the Sprint for example

1017
00:45:07.199 --> 00:45:08.099 A:middle L:90%
, these kinds of communities, which are near-cliques and

1018
00:45:08.099 --> 00:45:12.400 A:middle L:90%
near-bipartite cores, were representative of much more business

1019
00:45:12.409 --> 00:45:15.980 A:middle L:90%
. Uh, many phenomena which are important to business

1020
00:45:15.980 --> 00:45:17.150 A:middle L:90%
right? For example, all these people who

1021
00:45:17.150 --> 00:45:20.420 A:middle L:90%
are talking to each other, they leave together

1022
00:45:20.420 --> 00:45:21.780 A:middle L:90%
. Right? One of them leaves the company,

1023
00:45:21.780 --> 00:45:23.199 A:middle L:90%
one of them switches service providers, and the

1024
00:45:23.199 --> 00:45:25.389 A:middle L:90%
others switch also. So that was one thing

1025
00:45:25.389 --> 00:45:27.909 A:middle L:90%
interesting. We were able to actually go ahead and

1026
00:45:27.909 --> 00:45:30.349 A:middle L:90%
talk to the marketing department of Sprint and validate that

1027
00:45:30.360 --> 00:45:34.090 A:middle L:90%
. So down this book. And so yeah,

1028
00:45:34.099 --> 00:45:37.190 A:middle L:90%
the other thing is time series analysis. Uh,

1029
00:45:37.199 --> 00:45:39.030 A:middle L:90%
so here the, one of the data sets which

1030
00:45:39.030 --> 00:45:42.719 A:middle L:90%
I used was, like, BGP router updates. Yeah,

1031
00:45:42.730 --> 00:45:45.360 A:middle L:90%
BGP routers, you can imagine, are just some networking routers

1032
00:45:45.360 --> 00:45:47.130 A:middle L:90%
which propagate path information across the network. And the

1033
00:45:47.130 --> 00:45:50.380 A:middle L:90%
network we used was the Abilene network. It's a

1034
00:45:50.380 --> 00:45:52.349 A:middle L:90%
famous research network all throughout the United States. So

1035
00:45:52.349 --> 00:45:55.119 A:middle L:90%
we had like almost 80 million updates over two years

1036
00:45:55.250 --> 00:45:58.079 A:middle L:90%
. And so what we have is that you have

1037
00:45:58.079 --> 00:46:00.469 A:middle L:90%
time series at each of these routers, essentially

1038
00:46:00.480 --> 00:46:02.239 A:middle L:90%
the number of uh traffic per unit time. That's

1039
00:46:02.239 --> 00:46:05.250 A:middle L:90%
it. And you want to find patterns and anomalies

1040
00:46:05.260 --> 00:46:07.610 A:middle L:90%
. So very open ended question here. So what

1041
00:46:07.610 --> 00:46:08.789 A:middle L:90%
we did was, we were able to concentrate

1042
00:46:08.789 --> 00:46:10.809 A:middle L:90%
on two patterns which are important even from a networking

1043
00:46:10.809 --> 00:46:14.940 A:middle L:90%
point of view. And uh, so one of

1044
00:46:14.940 --> 00:46:16.920 A:middle L:90%
them is for example, this is just the uh

1045
00:46:16.929 --> 00:46:21.630 A:middle L:90%
same time series after taking the logarithm, plotted over time

1046
00:46:21.639 --> 00:46:23.579 A:middle L:90%
. You can find that there's something really steady and

1047
00:46:23.579 --> 00:46:25.659 A:middle L:90%
constant happening here. Right. And why is this

1048
00:46:25.659 --> 00:46:28.760 A:middle L:90%
important? This just means that there is a constant

1049
00:46:28.760 --> 00:46:30.179 A:middle L:90%
steady traffic for a long period of time. And

1050
00:46:30.179 --> 00:46:32.519 A:middle L:90%
what it means is that it relates to

1051
00:46:32.519 --> 00:46:36.460 A:middle L:90%
a real networking phenomenon called route flapping, which is,

1052
00:46:36.469 --> 00:46:38.420 A:middle L:90%
uh, like, people advertising an IP and then

1053
00:46:38.420 --> 00:46:40.590 A:middle L:90%
taking it back, and they do this for a long

1054
00:46:40.590 --> 00:46:43.349 A:middle L:90%
period of time. And this really points to

1055
00:46:43.349 --> 00:46:45.050 A:middle L:90%
inefficiencies in the network. So, actually, using

1056
00:46:45.050 --> 00:46:47.510 A:middle L:90%
a method we were actually able to find uh well

1057
00:46:47.510 --> 00:46:52.019 A:middle L:90%
appointed Alabama supercomputing network, which confirmed that one of

1058
00:46:52.019 --> 00:46:54.340 A:middle L:90%
the routers was actually flapping. And, uh, this went

1059
00:46:54.340 --> 00:46:57.699 A:middle L:90%
undetected and unresolved for almost 30 days. Like, this is a

1060
00:46:57.699 --> 00:47:00.539 A:middle L:90%
professionally managed network. And so this shows that it's

1061
00:47:00.539 --> 00:47:01.780 A:middle L:90%
really important to do these kinds of pattern analysis on

1062
00:47:01.780 --> 00:47:05.170 A:middle L:90%
the historical data as well. Right? And the

1063
00:47:05.170 --> 00:47:07.090 A:middle L:90%
other thing is also if you just if I just

1064
00:47:07.090 --> 00:47:08.909 A:middle L:90%
give you this bursty sequence, it's really hard to

1065
00:47:08.909 --> 00:47:13.219 A:middle L:90%
find anything useful there. Right? And it turns

1066
00:47:13.219 --> 00:47:14.909 A:middle L:90%
out that if you just magnify this part of the

1067
00:47:14.909 --> 00:47:17.269 A:middle L:90%
portion there is a really huge short burst of traffic

1068
00:47:17.280 --> 00:47:19.980 A:middle L:90%
and it's a short burst, right, just eight

1069
00:47:19.980 --> 00:47:22.800 A:middle L:90%
hours, compared to the month-long steady activity of

1070
00:47:22.809 --> 00:47:27.260 A:middle L:90%
uh the other event. And what we were able

1071
00:47:27.260 --> 00:47:29.670 A:middle L:90%
to show, like, this was, uh, it was because there

1072
00:47:29.670 --> 00:47:31.340 A:middle L:90%
were spammers in some middle schools in China, which was,

1073
00:47:31.349 --> 00:47:34.530 A:middle L:90%
uh, what was happening is that they used to spam

1074
00:47:34.539 --> 00:47:37.010 A:middle L:90%
quickly for eight hours and then go and take a

1075
00:47:37.039 --> 00:47:37.550 A:middle L:90%
second IP block and do the same

1076
00:47:37.550 --> 00:47:39.960 A:middle L:90%
thing again. So and we were able to do

1077
00:47:39.960 --> 00:47:43.940 A:middle L:90%
this using a multi-scale, uh, analysis which uses wavelets.

1078
00:47:43.940 --> 00:47:45.010 A:middle L:90%
I won't go into details, but there's an

1079
00:47:45.010 --> 00:47:47.500 A:middle L:90%
algorithm which we developed, an algorithm which quickly identifies

1080
00:47:47.510 --> 00:47:51.940 A:middle L:90%
the spikes from the data. So right. The

1081
00:47:51.940 --> 00:47:54.159 A:middle L:90%
last time series analysis question which we tried to answer

1082
00:47:54.159 --> 00:47:58.329 A:middle L:90%
is, uh, answering similarity queries. So this was motivated

1083
00:47:58.329 --> 00:48:00.260 A:middle L:90%
again by the BGP data which we had, and also

1084
00:48:00.260 --> 00:48:01.849 A:middle L:90%
many different kinds of data, like data center monitoring

1085
00:48:01.849 --> 00:48:04.869 A:middle L:90%
data where you have time series from different sensors on

1086
00:48:04.869 --> 00:48:08.699 A:middle L:90%
the, uh, data center; uh, physiotherapy data; healthcare

1087
00:48:08.699 --> 00:48:13.429 A:middle L:90%
data for example heartbeats or some sensors on your body

1088
00:48:13.440 --> 00:48:15.940 A:middle L:90%
. Or even motion capture data where you can have

1089
00:48:15.940 --> 00:48:19.039 A:middle L:90%
markers which track the movement in time. And what you

1090
00:48:19.039 --> 00:48:20.440 A:middle L:90%
really want to answer is that if you have a

1091
00:48:20.440 --> 00:48:22.739 A:middle L:90%
database of such types of time series can you quickly

1092
00:48:22.739 --> 00:48:25.449 A:middle L:90%
find other similar ones, given a query? So

1093
00:48:25.449 --> 00:48:28.039 A:middle L:90%
what are the challenges? I just try to give

1094
00:48:28.039 --> 00:48:29.960 A:middle L:90%
you a challenge in the BGP setting. So if

1095
00:48:29.960 --> 00:48:31.550 A:middle L:90%
you have, say, a time series from the Washington router

1096
00:48:31.550 --> 00:48:36.519 A:middle L:90%
and you have a time series from uh Salt Lake

1097
00:48:36.519 --> 00:48:37.820 A:middle L:90%
City router, then how do you say whether

1098
00:48:37.820 --> 00:48:42.159 A:middle L:90%
these are similar? So people have used the traditional

1099
00:48:42.159 --> 00:48:45.119 A:middle L:90%
classical methods of Euclidean distance or dynamic time warping,

1100
00:48:45.119 --> 00:48:46.670 A:middle L:90%
which captures like. But the problem with all these

1101
00:48:46.670 --> 00:48:49.559 A:middle L:90%
methods, like, for example, for the Euclidean distance,

1102
00:48:49.570 --> 00:48:52.219 A:middle L:90%
if the spikes align, then clearly there's nothing else you

1103
00:48:52.219 --> 00:48:52.800 A:middle L:90%
need to do. Most of the series will be

1104
00:48:52.800 --> 00:48:55.880 A:middle L:90%
classified as similar. So because these sequences are so bursty

1105
00:48:55.880 --> 00:48:59.530 A:middle L:90%
, you need something else. And there are uh

1106
00:48:59.539 --> 00:49:02.070 A:middle L:90%
specific problems with other distance functions as well. So

1107
00:49:02.070 --> 00:49:05.300 A:middle L:90%
what we did was we developed a complex, extended

1108
00:49:05.300 --> 00:49:07.630 A:middle L:90%
the classic real-valued Kalman filters, which are like graphical

1109
00:49:07.630 --> 00:49:12.449 A:middle L:90%
models, uh like hidden Markov models which try to

1110
00:49:12.449 --> 00:49:15.250 A:middle L:90%
capture the statistical evolution of the data. So what

1111
00:49:15.250 --> 00:49:17.440 A:middle L:90%
we were able to do is that we extended these, uh,

1112
00:49:17.449 --> 00:49:21.309 A:middle L:90%
real-valued Kalman filters to the complex domain. So we

1113
00:49:21.309 --> 00:49:24.159 A:middle L:90%
have now complex hidden variables and complex distributions. And

1114
00:49:24.159 --> 00:49:27.369 A:middle L:90%
we were able to develop an

1115
00:49:27.369 --> 00:49:30.590 A:middle L:90%
EM-style algorithm again here, which quickly learns these features

1116
00:49:30.599 --> 00:49:31.659 A:middle L:90%
. And once you're given these features, you can quickly

1117
00:49:31.659 --> 00:49:34.860 A:middle L:90%
classify, cluster, or do anything with them. And the

1118
00:49:34.860 --> 00:49:37.250 A:middle L:90%
nice thing about these features is that it learns dynamics; it

1119
00:49:37.250 --> 00:49:39.710 A:middle L:90%
learns the underlying uh evolution system. Like for example

1120
00:49:39.710 --> 00:49:42.550 A:middle L:90%
in the motion capture data, it learns how you

1121
00:49:42.550 --> 00:49:45.070 A:middle L:90%
walk. If there are two walking motions, most of them will

1122
00:49:45.070 --> 00:49:45.570 A:middle L:90%
remain the same, right? If you're walking a

1123
00:49:45.579 --> 00:49:47.630 A:middle L:90%
little bit faster, the dynamics are similar. Just

1124
00:49:47.630 --> 00:49:50.360 A:middle L:90%
the parameters are slightly different, but you want them

1125
00:49:50.360 --> 00:49:52.559 A:middle L:90%
to be clustered together. So. So yeah,

1126
00:49:52.570 --> 00:49:53.699 A:middle L:90%
so it captures all the nice things. And the

1127
00:49:53.710 --> 00:49:57.139 A:middle L:90%
really nice thing also is that it includes all these workhorses

1128
00:49:57.139 --> 00:49:59.429 A:middle L:90%
, right? Like PCA or autoregression or

1129
00:49:59.429 --> 00:50:00.369 A:middle L:90%
DFT, as special cases of the model.

1130
00:50:00.380 --> 00:50:04.039 A:middle L:90%
Uh, right. And if you apply it on BGP

1131
00:50:04.039 --> 00:50:07.380 A:middle L:90%
data, you can quickly see these clusters like which

1132
00:50:07.380 --> 00:50:08.829 A:middle L:90%
make sense because they're geographical clusters, and

1133
00:50:08.840 --> 00:50:12.889 A:middle L:90%
because BGP is a routing protocol, you would expect geographically closer

1134
00:50:12.889 --> 00:50:14.869 A:middle L:90%
cities to be essentially similar in traffic.

1135
00:50:14.880 --> 00:50:19.340 A:middle L:90%
So right, right. Finally, future plans.

1136
00:50:19.349 --> 00:50:22.130 A:middle L:90%
So my research theme has essentially touched upon all

1137
00:50:22.130 --> 00:50:23.199 A:middle L:90%
these three things which you have already seen, like the data,

1138
00:50:23.199 --> 00:50:25.909 A:middle L:90%
analysis and the policy action part. So my research

1139
00:50:25.909 --> 00:50:29.550 A:middle L:90%
future plans also are more in line with these three right

1140
00:50:29.559 --> 00:50:31.409 A:middle L:90%
now. So the first challenge in the data part

1141
00:50:31.409 --> 00:50:35.099 A:middle L:90%
is of course scalability, given the unprecedented amount of data

1142
00:50:35.110 --> 00:50:37.300 A:middle L:90%
. Like you really need algorithms and techniques for massive

1143
00:50:37.300 --> 00:50:39.909 A:middle L:90%
graphs, and the interesting thing here

1144
00:50:39.909 --> 00:50:43.760 A:middle L:90%
is that you have high dimensionality which is you have

1145
00:50:43.760 --> 00:50:45.989 A:middle L:90%
much richer data as well as a large sample size

1146
00:50:45.989 --> 00:50:46.730 A:middle L:90%
. So a lot of data. So you need

1147
00:50:46.730 --> 00:50:49.840 A:middle L:90%
to get scalable algorithms for both learning models on the

1148
00:50:49.840 --> 00:50:52.510 A:middle L:90%
data if you remember the model learning part and also

1149
00:50:52.510 --> 00:50:55.809 A:middle L:90%
developing policies because if you develop a policy for actually

1150
00:50:55.820 --> 00:50:59.340 A:middle L:90%
manipulating the process, you need to make it act

1151
00:50:59.340 --> 00:51:00.369 A:middle L:90%
on really large data. So it's interesting in both of

1152
00:51:00.369 --> 00:51:02.860 A:middle L:90%
these cases. So as part of my at least

1153
00:51:02.869 --> 00:51:06.519 A:middle L:90%
like what the law I use like mattresses, clusters

1154
00:51:06.519 --> 00:51:07.500 A:middle L:90%
and, like, Hadoop for the data analysis

1155
00:51:07.510 --> 00:51:09.420 A:middle L:90%
part and also compute uh if you have a computer

1156
00:51:09.420 --> 00:51:14.849 A:middle L:90%
in terms of simulations, use, like, parallelized systems. So,

1157
00:51:14.860 --> 00:51:19.269 A:middle L:90%
so the second thrust is on understanding and analysis. Like,

1158
00:51:19.280 --> 00:51:22.599 A:middle L:90%
it's using models to forecast, which is essentially building predictive

1159
00:51:22.599 --> 00:51:24.800 A:middle L:90%
models. So this is one thing for example and

1160
00:51:24.809 --> 00:51:29.159 A:middle L:90%
actually forecasting and back casting the processes. For example

1161
00:51:29.159 --> 00:51:30.670 A:middle L:90%
this is a nice example you have all these people

1162
00:51:30.670 --> 00:51:32.780 A:middle L:90%
infected, right? Can you actually reverse engineer the epidemic and

1163
00:51:32.780 --> 00:51:36.269 A:middle L:90%
figure out who started it? So that's one way

1164
00:51:36.269 --> 00:51:37.460 A:middle L:90%
of using the model to actually backcast and predict

1165
00:51:37.460 --> 00:51:39.590 A:middle L:90%
who are the culprits. And the other thing is

1166
00:51:39.590 --> 00:51:42.489 A:middle L:90%
like merging models. So, suppose

1167
00:51:42.489 --> 00:51:44.960 A:middle L:90%
you have this google trends data right? This essentially

1168
00:51:44.960 --> 00:51:45.429 A:middle L:90%
gives you the number of flu queries per unit

1169
00:51:45.440 --> 00:51:49.409 A:middle L:90%
time and you have also CDC data which gives you

1170
00:51:49.409 --> 00:51:52.010 A:middle L:90%
the population density and the distribution of population in the United States

1171
00:51:52.019 --> 00:51:53.210 A:middle L:90%
. And if you have a model learned from

1172
00:51:53.210 --> 00:51:55.059 A:middle L:90%
this time series for example, you can say that

1173
00:51:55.059 --> 00:51:58.309 A:middle L:90%
the slope is 1.5. So can you actually

1174
00:51:58.309 --> 00:52:00.380 A:middle L:90%
predict which part of the network it actually originated from

1175
00:52:00.389 --> 00:52:01.579 A:middle L:90%
? Can you actually identify which part of the network

1176
00:52:01.579 --> 00:52:06.079 A:middle L:90%
actually gave you that, uh, post, and figure out how

1177
00:52:06.079 --> 00:52:08.440 A:middle L:90%
the epidemic is actually spreading. So essentially combining different

1178
00:52:08.449 --> 00:52:12.420 A:middle L:90%
kinds of models. And the third challenge is active

1179
00:52:12.420 --> 00:52:15.070 A:middle L:90%
policy which is on more online and timely intervention.

1180
00:52:15.079 --> 00:52:17.539 A:middle L:90%
So right now I showed you immunization algorithms where,

1181
00:52:17.550 --> 00:52:20.860 A:middle L:90%
once you make the decision, you don't really change it

1182
00:52:20.860 --> 00:52:22.559 A:middle L:90%
. Right? So how do you update these decisions

1183
00:52:22.559 --> 00:52:23.929 A:middle L:90%
on time? So for example if you have a

1184
00:52:23.929 --> 00:52:27.070 A:middle L:90%
lot of money, should you put all the money in one

1185
00:52:27.070 --> 00:52:29.980 A:middle L:90%
go in a marketing campaign, say on Twitter, or

1186
00:52:29.980 --> 00:52:31.559 A:middle L:90%
should you try to spread the money over a month, given

1187
00:52:31.559 --> 00:52:35.150 A:middle L:90%
how it goes? Or should you just, uh, have

1188
00:52:35.150 --> 00:52:37.030 A:middle L:90%
a constant, uh, supply? What to do if the

1189
00:52:37.030 --> 00:52:40.059 A:middle L:90%
vaccination campaign is not going well. Uh how do

1190
00:52:40.059 --> 00:52:43.269 A:middle L:90%
you change the parameters, how do you target and

1191
00:52:43.269 --> 00:52:45.690 A:middle L:90%
change the immunization algorithms? So, uh, the final thing

1192
00:52:45.690 --> 00:52:47.300 A:middle L:90%
is like you want to tighten this, I want

1193
00:52:47.300 --> 00:52:51.159 A:middle L:90%
to tighten really try to describe the analysis and policy

1194
00:52:51.159 --> 00:52:53.659 A:middle L:90%
and action. So for example, right now the

1195
00:52:53.670 --> 00:52:57.460 A:middle L:90%
objectives are probably not being transparent in the design

1196
00:52:57.460 --> 00:52:59.699 A:middle L:90%
and analysis of these actions. Right? So for

1197
00:52:59.699 --> 00:53:00.980 A:middle L:90%
example, when you want to do immunization, do

1198
00:53:00.980 --> 00:53:02.840 A:middle L:90%
you want to minimize the expected number of people

1199
00:53:02.849 --> 00:53:07.539 A:middle L:90%
infected, or do you want to minimize economic damage? How

1200
00:53:07.539 --> 00:53:09.409 A:middle L:90%
do you actually bring them into the analysis part and actually

1201
00:53:09.420 --> 00:53:13.269 A:middle L:90%
do an analysis and give optimal or near-optimal algorithms

1202
00:53:13.280 --> 00:53:15.110 A:middle L:90%
for this? Uh, also collaborating with the MIDAS group

1203
00:53:15.110 --> 00:53:17.130 A:middle L:90%
at the University of Pittsburgh, which is, like, which

1204
00:53:17.130 --> 00:53:21.239 A:middle L:90%
is also an agent based simulations group, which where

1205
00:53:21.239 --> 00:53:22.550 A:middle L:90%
we're trying to study, can we actually, uh, develop optimal

1206
00:53:22.550 --> 00:53:27.150 A:middle L:90%
algorithms for such on the fly campaigns. So,

1207
00:53:27.389 --> 00:53:29.659 A:middle L:90%
uh hopefully, finally, I've given you a sense

1208
00:53:29.659 --> 00:53:31.239 A:middle L:90%
of the fact that dynamical processes on networks are a

1209
00:53:31.239 --> 00:53:34.570 A:middle L:90%
really rich area. The comments, problems, incompatible

1210
00:53:34.570 --> 00:53:37.730 A:middle L:90%
settings, and not only CS, uh, areas are really

1211
00:53:37.730 --> 00:53:43.570 A:middle L:90%
heavily involved, like machine learning and statistics, uh, computer systems

1212
00:53:43.570 --> 00:53:45.630 A:middle L:90%
for data analysis, uh, like Hadoop, big data analysis, and

1213
00:53:45.630 --> 00:53:49.369 A:middle L:90%
theory and algorithms but also they have really outreach and

1214
00:53:49.369 --> 00:53:52.940 A:middle L:90%
applications in many different areas like biology, like ecologies

1215
00:53:52.949 --> 00:53:54.690 A:middle L:90%
, you're always in epidemiology, Public health physics like

1216
00:53:54.690 --> 00:53:59.070 A:middle L:90%
money in the systems, uh social sciences, understanding

1217
00:53:59.070 --> 00:54:01.690 A:middle L:90%
human behavior, mobility and economic like money marketing and

1218
00:54:01.699 --> 00:54:05.599 A:middle L:90%
a lot of different things. So just a bit

1219
00:54:05.599 --> 00:54:07.780 A:middle L:90%
of shameless self-promotion: this is the list of publications I

1220
00:54:07.780 --> 00:54:10.010 A:middle L:90%
have and uh the stars represent the publications I talked

1221
00:54:10.010 --> 00:54:13.280 A:middle L:90%
about in the talk; the double stars represent ones I

1222
00:54:13.280 --> 00:54:15.559 A:middle L:90%
mentioned in a bit more detail. Uh, a couple

1223
00:54:15.559 --> 00:54:20.050 A:middle L:90%
of patents in summit advance. Uh I wish to

1224
00:54:20.050 --> 00:54:22.639 A:middle L:90%
thank my collaborators from the different universities and research labs

1225
00:54:22.639 --> 00:54:27.840 A:middle L:90%
and also graduate students and uh also the funding agencies

1226
00:54:27.849 --> 00:54:32.760 A:middle L:90%
. Thank you. Uh Mhm. Okay. Mhm

1227
00:54:34.150 -->  A:middle L:90%
. Yeah.

