WEBVTT

1
00:00:18.539 --> 00:00:23.649 A:middle L:90%
Morning, everybody. Morning, thanks. Thanks

2
00:00:23.649 --> 00:00:27.300 A:middle L:90%
for coming. We had a little bit of difficulty with

3
00:00:27.300 --> 00:00:28.620 A:middle L:90%
some of the new microphones, but I think we

4
00:00:28.620 --> 00:00:30.539 A:middle L:90%
got them working. I'm gonna try

5
00:00:30.539 --> 00:00:32.600 A:middle L:90%
hard not to trip over this cord. Our table

6
00:00:32.600 --> 00:00:36.590 A:middle L:90%
apparently needs to be rebooted. And now

7
00:00:36.590 --> 00:00:39.850 A:middle L:90%
tables get rebooted! But thanks for coming.

8
00:00:43.039 --> 00:00:45.740 A:middle L:90%
That's data management in a nutshell. And we wanted to

9
00:00:45.740 --> 00:00:48.390 A:middle L:90%
talk a little bit this morning before we go straight

10
00:00:48.390 --> 00:00:50.439 A:middle L:90%
into data management, we wanted to talk a little bit

11
00:00:50.450 --> 00:00:56.350 A:middle L:90%
about data management before computers.

12
00:00:57.250 --> 00:01:03.000 A:middle L:90%
Yeah, this is old-fashioned data management. Not

13
00:01:03.000 --> 00:01:04.120 A:middle L:90%
done very well. This is a Supreme Court records

14
00:01:04.120 --> 00:01:08.969 A:middle L:90%
room in the Gambia. These are Supreme Court

15
00:01:08.969 --> 00:01:11.980 A:middle L:90%
records that date all the way back to the 17th

16
00:01:11.980 --> 00:01:15.150 A:middle L:90%
century, and there was some civil unrest, and this is

17
00:01:15.150 --> 00:01:19.829 A:middle L:90%
the result. So before we start talking about

18
00:01:19.829 --> 00:01:23.260 A:middle L:90%
digital data and data management, I want to emphasize

19
00:01:23.260 --> 00:01:26.530 A:middle L:90%
that we know how to do analog and we think

20
00:01:26.530 --> 00:01:29.090 A:middle L:90%
we know how to do digital, but we don't

21
00:01:29.090 --> 00:01:30.480 A:middle L:90%
always get it right and it's when we don't get

22
00:01:30.480 --> 00:01:33.859 A:middle L:90%
it right that management really comes into play

23
00:01:33.870 --> 00:01:37.219 A:middle L:90%
and can help us out. That's just another

24
00:01:37.219 --> 00:01:41.170 A:middle L:90%
example. Yeah. This is here in our own

25
00:01:41.170 --> 00:01:42.700 A:middle L:90%
country. Where was this again, Nathan? University of

26
00:01:42.700 --> 00:01:48.049 A:middle L:90%
Maryland Libraries, after the earthquake. Yeah, two years

27
00:01:48.049 --> 00:01:49.159 A:middle L:90%
ago. You can Google this, and it gets worse

28
00:01:49.170 --> 00:01:51.150 A:middle L:90%
the more you look at it, the more your

29
00:01:51.150 --> 00:01:53.409 A:middle L:90%
heart breaks. And then you think,

30
00:01:53.409 --> 00:01:55.629 A:middle L:90%
oh man, this is not what we

31
00:01:55.640 --> 00:01:59.480 A:middle L:90%
ever hoped to have to deal with. But if

32
00:01:59.480 --> 00:02:01.980 A:middle L:90%
we keep going along the disaster lane, I'll read

33
00:02:01.980 --> 00:02:06.030 A:middle L:90%
you the quote here: Jackson County, Mississippi, estimated they would

34
00:02:06.030 --> 00:02:10.780 A:middle L:90%
spend at least $2 million after Hurricane Katrina. A systematic

35
00:02:10.780 --> 00:02:14.710 A:middle L:90%
program of producing microfilm or some other form of backup

36
00:02:14.719 --> 00:02:16.330 A:middle L:90%
would have eliminated the need for such recovery and probably

37
00:02:16.330 --> 00:02:21.650 A:middle L:90%
would have proved far less expensive. Which county was

38
00:02:21.650 --> 00:02:27.189 A:middle L:90%
that? Jackson County? Yeah. Where was Katrina

39
00:02:27.189 --> 00:02:30.430 A:middle L:90%
again? Katrina was, help me out, New Orleans and

40
00:02:30.430 --> 00:02:34.360 A:middle L:90%
a whole bunch of places. That's right, person type.

41
00:02:35.539 --> 00:02:37.060 A:middle L:90%
Yeah. All the way over in Mississippi.

42
00:02:38.539 --> 00:02:40.539 A:middle L:90%
This is one county, not even

43
00:02:40.539 --> 00:02:44.849 A:middle L:90%
close to the center of the damage. It's going to

44
00:02:44.849 --> 00:02:46.620 A:middle L:90%
spend $2 million on recovery, and that's just their

45
00:02:46.620 --> 00:02:51.099 A:middle L:90%
court records. Other places got it worse: Tulane

46
00:02:51.099 --> 00:02:53.400 A:middle L:90%
and so forth. And it's not just county

47
00:02:53.409 --> 00:02:54.699 A:middle L:90%
court records. Just think, you know,

48
00:02:54.699 --> 00:02:57.560 A:middle L:90%
a lot of people keep their servers in the basement.

49
00:02:59.340 --> 00:03:01.069 A:middle L:90%
Yeah. And when we were looking for examples

50
00:03:01.080 --> 00:03:04.060 A:middle L:90%
of this, Tulane was one of the places we looked

51
00:03:04.060 --> 00:03:06.770 A:middle L:90%
at, and I think we just chose

52
00:03:06.770 --> 00:03:07.400 A:middle L:90%
not to use those pictures; it was so heartbreaking.

53
00:03:07.400 --> 00:03:12.340 A:middle L:90%
It was people in hazmat suits down in the mold,

54
00:03:12.419 --> 00:03:16.250 A:middle L:90%
and so when people start saying, well, you know,

55
00:03:16.250 --> 00:03:19.400 A:middle L:90%
if you go digital, you've got the

56
00:03:19.400 --> 00:03:23.750 A:middle L:90%
data management problem. Yeah, which is true. You

57
00:03:23.750 --> 00:03:29.650 A:middle L:90%
do. I always like to go to the example:

58
00:03:29.650 --> 00:03:31.750 A:middle L:90%
the Library of Alexandria burned. And then

59
00:03:31.750 --> 00:03:35.840 A:middle L:90%
the second time around, going digital, we at least

60
00:03:35.840 --> 00:03:40.479 A:middle L:90%
have the advantage that, yeah, well, we at least have the

61
00:03:40.479 --> 00:03:45.219 A:middle L:90%
advantage that backups are a little easier. But that's

62
00:03:45.219 --> 00:03:46.770 A:middle L:90%
just one part of the sort of matrix that has

63
00:03:46.770 --> 00:03:51.080 A:middle L:90%
to come together to do a good data management program.

64
00:03:51.090 --> 00:03:53.000 A:middle L:90%
As an example of hitting all

65
00:03:53.000 --> 00:03:55.129 A:middle L:90%
of the different squares that have to be checked in

66
00:03:55.129 --> 00:03:58.860 A:middle L:90%
the matrix, I thought we'd play a little short

67
00:03:58.860 --> 00:04:00.560 A:middle L:90%
example, about four or five minutes. Right.

68
00:04:00.560 --> 00:04:12.659 A:middle L:90%
Hello, my name is Dr. Judy Benign. I'm

69
00:04:12.659 --> 00:04:14.800 A:middle L:90%
an oncologist at NYU School

70
00:04:14.800 --> 00:04:16.810 A:middle L:90%
of Medicine. Hello, Dr. Judy Benign. I read

71
00:04:16.810 --> 00:04:19.860 A:middle L:90%
your article on B-cell function. I think that

72
00:04:19.860 --> 00:04:23.439 A:middle L:90%
I could use the data for my work on pancreatic

73
00:04:23.439 --> 00:04:27.240 A:middle L:90%
cancer. I am not an oncologist. I know

74
00:04:27.240 --> 00:04:29.199 A:middle L:90%
but I think I could use the data for my

75
00:04:29.199 --> 00:04:31.709 A:middle L:90%
work on pancreatic cancer. Do you have the data?

76
00:04:31.720 --> 00:04:34.100 A:middle L:90%
Everything you need to know is in the article.

77
00:04:34.110 --> 00:04:38.170 A:middle L:90%
No. What I need is the data. Will

78
00:04:38.170 --> 00:04:41.329 A:middle L:90%
you share your data? I am not sure that

79
00:04:41.329 --> 00:04:43.879 A:middle L:90%
will be possible. But your work is in PubMed

80
00:04:43.879 --> 00:04:46.199 A:middle L:90%
Central and was funded by NIH. That is true,

81
00:04:46.209 --> 00:04:49.350 A:middle L:90%
and it was published in Science, which requires that you

82
00:04:49.350 --> 00:04:53.319 A:middle L:90%
share your data. I did publish in Science. Then

83
00:04:53.319 --> 00:04:56.970 A:middle L:90%
I am requesting your data. Can I have a

84
00:04:56.970 --> 00:05:01.230 A:middle L:90%
copy of your data? I am not sure where

85
00:05:01.230 --> 00:05:04.160 A:middle L:90%
my data is. But surely you saved your data.

86
00:05:04.170 --> 00:05:08.250 A:middle L:90%
I did. I saved it on a USB drive.

87
00:05:08.259 --> 00:05:10.050 A:middle L:90%
Where is the USB drive?

88
00:05:11.139 --> 00:05:15.019 A:middle L:90%
It is in the box. It is in

89
00:05:15.019 --> 00:05:18.060 A:middle L:90%
a box at home. I just moved. But can

90
00:05:18.060 --> 00:05:21.449 A:middle L:90%
I use your data? There are many boxes,

91
00:05:21.839 --> 00:05:28.449 A:middle L:90%
so many boxes. I forgot to label the boxes.

92
00:05:35.939 --> 00:05:39.899 A:middle L:90%
Hello again. Thank you for sending me a copy

93
00:05:39.899 --> 00:05:43.180 A:middle L:90%
of your data on a USB drive. I received

94
00:05:43.180 --> 00:05:46.439 A:middle L:90%
the envelope yesterday. You are welcome, but I will

95
00:05:46.439 --> 00:05:48.209 A:middle L:90%
need that back when you are finished. That is

96
00:05:48.209 --> 00:05:51.709 A:middle L:90%
my only copy. I did have a question.

97
00:05:51.720 --> 00:05:56.000 A:middle L:90%
What is your question? You might find the answer

98
00:05:56.000 --> 00:05:59.500 A:middle L:90%
in my article. No, I received the data, but

99
00:05:59.500 --> 00:06:01.230 A:middle L:90%
when I opened it up, it was in hexadecimal.

100
00:06:01.240 --> 00:06:04.350 A:middle L:90%
Yes, that is right. I cannot read

101
00:06:04.350 --> 00:06:08.300 A:middle L:90%
hexadecimal. You asked for my data and I gave

102
00:06:08.300 --> 00:06:11.060 A:middle L:90%
it to you. I have done what you asked.

103
00:06:12.040 --> 00:06:13.930 A:middle L:90%
But is there a way to read the

104
00:06:13.930 --> 00:06:15.730 A:middle L:90%
hexadecimal? You will need the program that created the

105
00:06:15.730 --> 00:06:19.120 A:middle L:90%
hexadecimal file. Yes, I will. What

106
00:06:19.120 --> 00:06:21.180 A:middle L:90%
is the name of the program site? Oh,

107
00:06:21.180 --> 00:06:24.899 A:middle L:90%
synth. I do not know this program. It

108
00:06:24.899 --> 00:06:28.069 A:middle L:90%
was a very good program. The company that made

109
00:06:28.069 --> 00:06:30.610 A:middle L:90%
the program went bankrupt in 2007. Do you have

110
00:06:30.610 --> 00:06:32.819 A:middle L:90%
a copy of the program? I do not use

111
00:06:32.819 --> 00:06:35.740 A:middle L:90%
this program anymore because the company that made it went

112
00:06:35.740 --> 00:06:40.850 A:middle L:90%
bankrupt. Maybe you can buy a copy on eBay.

113
00:06:48.040 --> 00:06:50.360 A:middle L:90%
I have good news. You again? I

114
00:06:50.360 --> 00:06:54.339 A:middle L:90%
talked to my colleague. She knew a person with

115
00:06:54.339 --> 00:06:56.670 A:middle L:90%
a copy of the software. Then why do you

116
00:06:56.670 --> 00:06:59.329 A:middle L:90%
need me? Everything you need to know about

117
00:06:59.329 --> 00:07:00.779 A:middle L:90%
the data is in the article. I opened the data

118
00:07:00.779 --> 00:07:03.300 A:middle L:90%
and I could not understand it. If you have

119
00:07:03.300 --> 00:07:06.060 A:middle L:90%
the program, you will find it is clear. Well,

120
00:07:06.439 --> 00:07:11.009 A:middle L:90%
I noticed that you called your data fields SAM. Is

121
00:07:11.009 --> 00:07:14.180 A:middle L:90%
that an abbreviation? Yes, it is an abbreviation

122
00:07:14.180 --> 00:07:16.449 A:middle L:90%
of my co-author's name. His name is Samuel

123
00:07:16.449 --> 00:07:19.560 A:middle L:90%
Lee. We call him Sam. I see.

124
00:07:19.939 --> 00:07:23.379 A:middle L:90%
And what is the content of the field called

125
00:07:23.389 --> 00:07:26.100 A:middle L:90%
SAM1? Ah yes. SAM1 is

126
00:07:26.100 --> 00:07:28.430 A:middle L:90%
the level of CXCR4 expression.

127
00:07:28.439 --> 00:07:30.920 A:middle L:90%
And what is the content of the field called

128
00:07:30.930 --> 00:07:32.459 A:middle L:90%
SAM2? That is logical if you think

129
00:07:32.459 --> 00:07:35.670 A:middle L:90%
about it. What is the content of the field

130
00:07:35.670 --> 00:07:43.339 A:middle L:90%
called SAM3? I don't remember. What about

131
00:07:43.350 --> 00:07:47.930 A:middle L:90%
SAM4? Is there a guide to the data

132
00:07:47.930 --> 00:07:50.980 A:middle L:90%
anywhere? Yes, of course, it is the

133
00:07:50.980 --> 00:07:56.170 A:middle L:90%
article that was published in Science. The article does

134
00:07:56.170 --> 00:07:58.660 A:middle L:90%
not tell me what the field names mean. Is

135
00:07:58.660 --> 00:08:01.379 A:middle L:90%
there any record of what these field names mean?

136
00:08:01.389 --> 00:08:03.649 A:middle L:90%
Yes. My co-author knows what the content of

137
00:08:03.660 --> 00:08:11.529 A:middle L:90%
SAM2 is. And SAM3. And SAM4.

138
00:08:11.540 --> 00:08:15.569 A:middle L:90%
Can I talk to your co-author? I'm

139
00:08:15.569 --> 00:08:18.000 A:middle L:90%
not sure. I would very much like to talk to

140
00:08:18.000 --> 00:08:20.879 A:middle L:90%
your co-author. Well, he was a graduate

141
00:08:20.879 --> 00:08:24.160 A:middle L:90%
student. He went back to China two years ago.

142
00:08:24.240 --> 00:08:28.889 A:middle L:90%
Can I have his contact information? He is

143
00:08:28.889 --> 00:08:33.019 A:middle L:90%
in China. His name is Sam Lee. I

144
00:08:33.019 --> 00:08:35.519 A:middle L:90%
think I cannot use your data. You could check

145
00:08:35.519 --> 00:08:37.850 A:middle L:90%
the article to see if what you need is there.

146
00:08:37.870 --> 00:08:48.539 A:middle L:90%
Please stop talking now. Yeah,

147
00:08:48.039 --> 00:08:50.649 A:middle L:90%
it sort of makes you want to face palm,

148
00:08:50.649 --> 00:08:54.159 A:middle L:90%
but I mean, how often do you run into that?

149
00:08:54.240 --> 00:08:58.360 A:middle L:90%
It's not terribly uncommon, sadly.

150
00:09:01.940 --> 00:09:09.350 A:middle L:90%
successfully transitioned back without Texan work. That's right.

151
00:09:11.039 --> 00:09:16.590 A:middle L:90%
Got a link up. No, there's a link

152
00:09:16.590 --> 00:09:18.590 A:middle L:90%
on the bottom of this slide where everything above that

153
00:09:18.590 --> 00:09:22.169 A:middle L:90%
link comes from. You're probably familiar with the

154
00:09:22.179 --> 00:09:28.250 A:middle L:90%
recent White House directive that says... well, I'll read

155
00:09:28.250 --> 00:09:31.850 A:middle L:90%
the key part here. Federal agencies with more than

156
00:09:31.850 --> 00:09:35.320 A:middle L:90%
$100 million in research and development expenditures must

157
00:09:35.320 --> 00:09:37.049 A:middle L:90%
develop plans to make the published results of federally funded

158
00:09:37.049 --> 00:09:39.990 A:middle L:90%
research freely available to the public within one year of

159
00:09:39.990 --> 00:09:43.710 A:middle L:90%
publication, and researchers must better account for and manage the

160
00:09:43.710 --> 00:09:48.399 A:middle L:90%
digital data resulting from federally funded scientific research. So

161
00:09:48.399 --> 00:09:52.980 A:middle L:90%
it's not just for publishing in Science or NIH; it's

162
00:09:52.980 --> 00:09:56.309 A:middle L:90%
anyone with a federal grant from an agency that

163
00:09:56.309 --> 00:09:58.679 A:middle L:90%
has more than $100 million in research and development

164
00:09:58.679 --> 00:10:03.889 A:middle L:90%
expenditures. It went from just NIH, possibly, and then to

165
00:10:03.889 --> 00:10:07.830 A:middle L:90%
a few journals, NSF, some subagencies or

166
00:10:07.830 --> 00:10:11.509 A:middle L:90%
subdirectorates of NIH, and now it's all federal agencies

167
00:10:11.519 --> 00:10:16.370 A:middle L:90%
with large research and development expenditures. How many

168
00:10:16.370 --> 00:10:20.940 A:middle L:90%
is that? Well, let's see, at Virginia Tech

169
00:10:20.950 --> 00:10:26.409 A:middle L:90%
alone, we get funding from NIH, NSF,

170
00:10:26.139 --> 00:10:31.860 A:middle L:90%
the Department of Defense, NASA, the Department

171
00:10:31.860 --> 00:10:37.649 A:middle L:90%
of Agriculture, the Department of Energy.

172
00:10:37.940 --> 00:10:43.159 A:middle L:90%
Let's see, there's Department of the Interior funding, there's a

173
00:10:45.539 --> 00:10:48.259 A:middle L:90%
Department of Health, which NIH is part of. So

174
00:10:48.549 --> 00:10:52.659 A:middle L:90%
the list is alphabet soup. Can you name any

175
00:10:52.659 --> 00:10:56.370 A:middle L:90%
more federal agencies? I mean pretty much all of

176
00:10:56.370 --> 00:10:58.960 A:middle L:90%
them. So all of them have $100 million?

177
00:10:58.970 --> 00:11:01.860 A:middle L:90%
I don't know if they all do, but those

178
00:11:01.860 --> 00:11:03.789 A:middle L:90%
are the ones that apply. I mean, when

179
00:11:03.789 --> 00:11:05.889 A:middle L:90%
you look at the list of funding agencies in terms

180
00:11:05.889 --> 00:11:09.580 A:middle L:90%
of research dollars that we get at Virginia Tech

181
00:11:09.590 --> 00:11:13.850 A:middle L:90%
from various funding agencies, you know, it

182
00:11:13.850 --> 00:11:16.960 A:middle L:90%
applies to, yeah, like two-thirds of the money

183
00:11:16.960 --> 00:11:22.019 A:middle L:90%
that we get from federal programs. Interesting, because the

184
00:11:22.019 --> 00:11:28.110 A:middle L:90%
White House specifically said scientific research. So what does that

185
00:11:28.110 --> 00:11:33.750 A:middle L:90%
mean? A lot of that directive was actually

186
00:11:33.750 --> 00:11:39.009 A:middle L:90%
pretty open to interpretation. Okay. I

187
00:11:39.009 --> 00:11:41.240 A:middle L:90%
like the first part: citizens deserve easy access to the

188
00:11:41.240 --> 00:11:50.559 A:middle L:90%
results, you know, that they paid for. So

189
00:11:50.940 --> 00:11:52.240 A:middle L:90%
"I would have to go..." So this is

190
00:11:52.250 --> 00:11:54.620 A:middle L:90%
an answer to a question I posed to an environmental studies

191
00:11:54.620 --> 00:11:58.100 A:middle L:90%
researcher, and he said, "I'd have to go in and

192
00:11:58.100 --> 00:12:01.629 A:middle L:90%
make the data a little more... I know the best

193
00:12:01.629 --> 00:12:03.289 A:middle L:90%
practices; I should keep it all better than I

194
00:12:03.289 --> 00:12:05.250 A:middle L:90%
do, and my students keep it better because I teach

195
00:12:05.250 --> 00:12:05.820 A:middle L:90%
them to, but I do as I do rather than

196
00:12:05.820 --> 00:12:09.220 A:middle L:90%
what I teach." So this is an environmental studies

197
00:12:09.220 --> 00:12:11.250 A:middle L:90%
researcher here and he told me this when I was

198
00:12:11.250 --> 00:12:16.929 A:middle L:90%
interviewing him for my dissertation research. And

199
00:12:16.929 --> 00:12:18.610 A:middle L:90%
as we were putting this together,

200
00:12:18.610 --> 00:12:22.120 A:middle L:90%
I recalled this one in particular, but there's

201
00:12:22.120 --> 00:12:24.350 A:middle L:90%
a whole bunch of other people who said pretty much the

202
00:12:24.350 --> 00:12:28.929 A:middle L:90%
same thing. It's a lot of work to

203
00:12:28.940 --> 00:12:31.809 A:middle L:90%
make your data manageable, too. You know, it's easy

204
00:12:31.809 --> 00:12:33.649 A:middle L:90%
to write SAM1 and SAM2, SAM3, SAM4,

205
00:12:33.700 --> 00:12:35.860 A:middle L:90%
and especially when you're working with several other people.

206
00:12:37.840 --> 00:12:39.490 A:middle L:90%
And you know, it's easy to have a

207
00:12:39.490 --> 00:12:43.480 A:middle L:90%
spreadsheet where even if you label it, do you

208
00:12:43.659 --> 00:12:48.240 A:middle L:90%
say what that label means somewhere? Think about

209
00:12:48.240 --> 00:12:50.000 A:middle L:90%
the spreadsheets that you have right now in your office

210
00:12:50.000 --> 00:12:52.509 A:middle L:90%
that you work on. If you got hit by

211
00:12:52.509 --> 00:12:54.740 A:middle L:90%
a bus today, would one of your colleagues be

212
00:12:54.740 --> 00:12:56.740 A:middle L:90%
able to take over what you're doing and roll with

213
00:12:56.740 --> 00:13:00.190 A:middle L:90%
it? Would they have what they need

214
00:13:00.840 --> 00:13:03.870 A:middle L:90%
to be able to continue your work and is that

215
00:13:03.870 --> 00:13:09.139 A:middle L:90%
important? So when you think about the

216
00:13:09.149 --> 00:13:13.259 A:middle L:90%
little cartoon, the two bears, so

217
00:13:13.259 --> 00:13:16.110 A:middle L:90%
they were doing this practice; this person knows that

218
00:13:16.110 --> 00:13:18.320 A:middle L:90%
he should, that the bear didn't know that he

219
00:13:18.320 --> 00:13:20.279 A:middle L:90%
should be doing that, but this person knew that

220
00:13:20.279 --> 00:13:22.730 A:middle L:90%
he should. But let's think about what his incentives

221
00:13:22.730 --> 00:13:26.440 A:middle L:90%
are at Virginia Tech for doing this

222
00:13:26.440 --> 00:13:30.840 A:middle L:90%
or not doing it. His performance, he's

223
00:13:30.840 --> 00:13:33.659 A:middle L:90%
an associate professor, by the way, his performance is measured

224
00:13:33.039 --> 00:13:37.220 A:middle L:90%
by the articles he publishes, by

225
00:13:37.220 --> 00:13:41.990 A:middle L:90%
his teaching, and by his service.

226
00:13:41.990 --> 00:13:43.570 A:middle L:90%
There's no line in the faculty annual report.

227
00:13:43.580 --> 00:13:48.240 A:middle L:90%
There's no line in the promotion and tenure requirements that

228
00:13:48.240 --> 00:13:52.399 A:middle L:90%
say "manage data well." Since

229
00:13:52.399 --> 00:13:54.799 A:middle L:90%
it's brand new, you know, this directive

230
00:13:54.799 --> 00:13:58.799 A:middle L:90%
is brand new, and it applies to research grants. You

231
00:13:58.799 --> 00:14:01.600 A:middle L:90%
don't have to have a research grant to get tenure,

232
00:14:01.610 --> 00:14:03.190 A:middle L:90%
although it depends; probably you do

233
00:14:03.190 --> 00:14:05.769 A:middle L:90%
in some departments on campus. But so

234
00:14:05.769 --> 00:14:09.759 A:middle L:90%
this person has very little incentive to actually

235
00:14:09.139 --> 00:14:11.399 A:middle L:90%
follow through and manage his data in a way that

236
00:14:11.399 --> 00:14:13.049 A:middle L:90%
it could be used by other people. He knows

237
00:14:13.049 --> 00:14:16.029 A:middle L:90%
what his fields mean. He's not as bad as

238
00:14:16.029 --> 00:14:18.379 A:middle L:90%
the bear who doesn't know what his fields mean.

239
00:14:18.389 --> 00:14:20.759 A:middle L:90%
His grad student, who he can't contact anymore,

240
00:14:20.759 --> 00:14:22.190 A:middle L:90%
knows what the fields mean. This guy knows what

241
00:14:22.190 --> 00:14:24.820 A:middle L:90%
his fields mean. But in order to make it

242
00:14:24.820 --> 00:14:28.029 A:middle L:90%
so that other people can read and understand

243
00:14:28.029 --> 00:14:31.039 A:middle L:90%
his data, it would take

244
00:14:31.039 --> 00:14:33.009 A:middle L:90%
a lot of extra work and what benefit would there

245
00:14:33.009 --> 00:14:39.919 A:middle L:90%
be for him to do that? Pretty much,

246
00:14:39.929 --> 00:14:41.779 A:middle L:90%
there'd be no benefit. Maybe somebody

247
00:14:41.779 --> 00:14:46.220 A:middle L:90%
else would be able to use his research, but how

248
00:14:46.220 --> 00:14:50.259 A:middle L:90%
does Virginia Tech recognize that? They don't. If

249
00:14:50.259 --> 00:14:52.159 A:middle L:90%
somebody else uses your data, you

250
00:14:52.159 --> 00:14:54.490 A:middle L:90%
get zero credit for that. As far as having

251
00:14:54.490 --> 00:15:01.029 A:middle L:90%
a financial or professional incentive, you get maybe warm fuzzies

252
00:15:01.029 --> 00:15:03.879 A:middle L:90%
and brownie points in your community.

253
00:15:03.879 --> 00:15:07.980 A:middle L:90%
But that's not the same

254
00:15:07.990 --> 00:15:09.549 A:middle L:90%
incentive as his promotion at Virginia Tech. Right?

255
00:15:11.440 --> 00:15:16.100 A:middle L:90%
Remember that? Yeah. So when we're

256
00:15:16.100 --> 00:15:20.200 A:middle L:90%
talking about data management and how to do it, beyond

257
00:15:20.210 --> 00:15:24.509 A:middle L:90%
incentivizing it, there are some ways we

258
00:15:24.509 --> 00:15:28.059 A:middle L:90%
can talk with faculty about maintaining their data and making

259
00:15:28.059 --> 00:15:31.509 A:middle L:90%
it available that can standardize and make it easier for

260
00:15:31.509 --> 00:15:35.259 A:middle L:90%
others to use. And that starts with open

261
00:15:35.259 --> 00:15:39.659 A:middle L:90%
standards, and what those open standards are changes from

262
00:15:39.740 --> 00:15:43.850 A:middle L:90%
data format to data format. So we

263
00:15:43.850 --> 00:15:46.350 A:middle L:90%
didn't list everything here; it just wouldn't fit on the

264
00:15:46.350 --> 00:15:48.700 A:middle L:90%
slide. There's no one standard, which is sort

265
00:15:48.700 --> 00:15:50.590 A:middle L:90%
of the next point there. Part of the next

266
00:15:50.590 --> 00:15:52.659 A:middle L:90%
point is that there's a lot of open standards based

267
00:15:52.659 --> 00:15:54.669 A:middle L:90%
on the data, and part of it is that keeping data

268
00:15:54.669 --> 00:15:58.120 A:middle L:90%
in multiple formats, in multiple ways, is a big part

269
00:15:58.129 --> 00:16:00.190 A:middle L:90%
of making sure that it is available to other people to

270
00:16:00.200 --> 00:16:04.750 A:middle L:90%
use, as is geographically distributed preservation. This gets

271
00:16:04.759 --> 00:16:08.340 A:middle L:90%
difficult with data that can't really be shared with the

272
00:16:08.350 --> 00:16:11.779 A:middle L:90%
public. It has to be anonymized after

273
00:16:11.779 --> 00:16:15.649 A:middle L:90%
they're finished with the research.

274
00:16:15.039 --> 00:16:22.570 A:middle L:90%
Those faculty generally know that, and they're often exempt

275
00:16:22.570 --> 00:16:25.820 A:middle L:90%
anyway; they can say that they're exempt, because

276
00:16:25.830 --> 00:16:27.080 A:middle L:90%
with a lot of the Department of Defense grants, some

277
00:16:27.080 --> 00:16:30.210 A:middle L:90%
of them, you know, the Department of Defense

278
00:16:30.210 --> 00:16:33.779 A:middle L:90%
does oceanographic research. You know, if

279
00:16:33.779 --> 00:16:36.039 A:middle L:90%
you're plotting the floor of the ocean or finding a

280
00:16:36.049 --> 00:16:37.980 A:middle L:90%
better way to make more accurate maps of the floor

281
00:16:37.980 --> 00:16:40.269 A:middle L:90%
of the ocean. This is really important for naval

282
00:16:40.269 --> 00:16:42.279 A:middle L:90%
research. That's not top-secret stuff, though, as

283
00:16:42.279 --> 00:16:49.269 A:middle L:90%
opposed to designing propulsion for weapons systems. So,

284
00:16:51.340 --> 00:16:53.960 A:middle L:90%
so some data is not to be shared.

285
00:16:55.840 --> 00:16:59.980 A:middle L:90%
But with the open standards, that's multiple

286
00:16:59.980 --> 00:17:02.470 A:middle L:90%
formats, so you don't write something in a program

287
00:17:02.470 --> 00:17:03.890 A:middle L:90%
that's no longer supported when

288
00:17:03.890 --> 00:17:07.009 A:middle L:90%
the company goes bankrupt in two years. Or let's

289
00:17:07.009 --> 00:17:11.440 A:middle L:90%
say you write your data in a standard format.

290
00:17:11.450 --> 00:17:12.420 A:middle L:90%
Let's say you save it in Excel. Everyone

291
00:17:12.420 --> 00:17:15.470 A:middle L:90%
uses Excel, right? Well, can you open

292
00:17:15.470 --> 00:17:19.160 A:middle L:90%
an Excel file? And can you open Excel files

293
00:17:19.170 --> 00:17:23.950 A:middle L:90%
from 1998 in your current software environment? Not

294
00:17:23.950 --> 00:17:26.849 A:middle L:90%
without a little bit of monkey-wrenching around.

295
00:17:27.539 --> 00:17:30.230 A:middle L:90%
And oftentimes, for me, yes,

296
00:17:30.240 --> 00:17:36.279 A:middle L:90%
it loses formatting. So we talk about open standards

297
00:17:36.279 --> 00:17:38.059 A:middle L:90%
and we talk about migration, which is up there:

298
00:17:38.069 --> 00:17:42.660 A:middle L:90%
migration and obsolescence. I have a colleague who

299
00:17:42.660 --> 00:17:47.460 A:middle L:90%
does a lot of data preservation work with the government.

300
00:17:47.839 --> 00:17:49.759 A:middle L:90%
And I asked him once at a conference,

301
00:17:49.759 --> 00:17:51.880 A:middle L:90%
I said, you know, how do you measure

302
00:17:51.880 --> 00:17:53.609 A:middle L:90%
the market and know what you... how do

303
00:17:53.609 --> 00:17:56.339 A:middle L:90%
you measure what you've got stored and look at the

304
00:17:56.339 --> 00:17:59.990 A:middle L:90%
formats you have and whether or not some of your

305
00:17:59.990 --> 00:18:02.269 A:middle L:90%
data should be migrated forward based on the software that

306
00:18:02.269 --> 00:18:03.240 A:middle L:90%
created it. And I thought he was gonna have

307
00:18:03.240 --> 00:18:07.240 A:middle L:90%
this big, long, fancy answer about documentation and research, and

308
00:18:07.240 --> 00:18:07.829 A:middle L:90%
he said they do a lot of research,

309
00:18:07.839 --> 00:18:10.490 A:middle L:90%
but the key factor is whether or not you can

310
00:18:10.500 --> 00:18:11.980 A:middle L:90%
go down to Best Buy and buy a program that

311
00:18:11.980 --> 00:18:15.279 A:middle L:90%
will open it. And I thought that's actually a pretty

312
00:18:15.279 --> 00:18:15.579 A:middle L:90%
simple way of keeping track of it. If the

313
00:18:15.579 --> 00:18:18.660 A:middle L:90%
market is still supporting an application that will open that

314
00:18:18.660 --> 00:18:22.049 A:middle L:90%
data, you don't have the obsolescence problem yet.

315
00:18:22.539 --> 00:18:23.200 A:middle L:90%
But he said it is something that they have to

316
00:18:23.210 --> 00:18:29.710 A:middle L:90%
keep on top of all the time. And privacy and

317
00:18:29.710 --> 00:18:30.180 A:middle L:90%
security. I think we talked a little bit about

318
00:18:30.190 --> 00:18:37.799 A:middle L:90%
that. And on to library services for data management:

319
00:18:37.819 --> 00:18:41.049 A:middle L:90%
what we here in the library can provide.

320
00:18:41.920 --> 00:18:44.460 A:middle L:90%
This is the part where you pay attention in case

321
00:18:44.460 --> 00:18:47.339 A:middle L:90%
somebody asks. Well, we hope you pay attention all along.

322
00:18:47.339 --> 00:18:48.579 A:middle L:90%
But if somebody asks you later, "What

323
00:18:52.079 --> 00:18:55.170 A:middle L:90%
is the library doing for us about this data management

324
00:18:55.170 --> 00:19:00.819 A:middle L:90%
stuff that I've been hearing about?" This is

325
00:18:55.170 --> 00:19:00.819 A:middle L:90%
the answer right here for today. Right. So

326
00:19:00.819 --> 00:19:03.710 A:middle L:90%
we do data management plan consulting. DMP is data

327
00:19:03.710 --> 00:19:07.589 A:middle L:90%
management planning. We do that through FDI sessions,

328
00:19:07.599 --> 00:19:10.630 A:middle L:90%
we do one-on-one consultations, and we support

329
00:19:10.640 --> 00:19:15.140 A:middle L:90%
the DNP tool which is a um something you can

330
00:19:15.140 --> 00:19:18.259 A:middle L:90%
log into with your, with your kid and you

331
00:19:18.259 --> 00:19:23.190 A:middle L:90%
can choose the agency that you're applying

332
00:19:23.190 --> 00:19:26.049 A:middle L:90%
for funding with and you can see the requirements for

333
00:19:26.049 --> 00:19:29.589 A:middle L:90%
data management planning for that agency. And it will

334
00:19:29.589 --> 00:19:33.410 A:middle L:90%
help you um answer all those questions by just giving

335
00:19:33.410 --> 00:19:34.799 A:middle L:90%
you one question at a time, and then it puts out your

336
00:19:34.799 --> 00:19:37.829 A:middle L:90%
answers in a text file. Then you can format

337
00:19:37.829 --> 00:19:42.400 A:middle L:90%
it for your proposal. Right? We have a

338
00:19:42.400 --> 00:19:48.059 A:middle L:90%
storage and access through VTechWorks. Um, yes,

339
00:19:49.339 --> 00:19:51.960 A:middle L:90%
do you want to talk about it? Sure, I've

340
00:19:52.140 --> 00:19:56.549 A:middle L:90%
heard of it. Um So with storage we can't

341
00:19:56.549 --> 00:19:59.900 A:middle L:90%
just store anything that somebody throws at us;

342
00:19:59.900 --> 00:20:00.509 A:middle L:90%
we have to budget for things in advance or to

343
00:20:00.509 --> 00:20:04.250 A:middle L:90%
plan. Um, so if somebody says, "Nathan, I've

344
00:20:04.259 --> 00:20:07.069 A:middle L:90%
got some data that I'd like to archive," I say,

345
00:20:07.079 --> 00:20:10.119 A:middle L:90%
"What is it, and how much?" That's my first

346
00:20:10.119 --> 00:20:11.000 A:middle L:90%
question I ask, and I don't even care what it's

347
00:20:11.000 --> 00:20:15.670 A:middle L:90%
about. But basically, if they say they

348
00:20:15.670 --> 00:20:19.740 A:middle L:90%
have tabular data, spreadsheets, I say okay, and

349
00:20:19.740 --> 00:20:22.450 A:middle L:90%
those can be small. They typically are;

350
00:20:22.460 --> 00:20:25.900 A:middle L:90%
they can be really large, but typically I don't

351
00:20:25.900 --> 00:20:29.559 A:middle L:90%
see them getting into the gigabytes. Um So I

352
00:20:29.559 --> 00:20:33.579 A:middle L:90%
say okay, um if somebody says they have,

353
00:20:33.589 --> 00:20:37.940 A:middle L:90%
you know, Landsat files, and that they've

354
00:20:37.950 --> 00:20:40.009 A:middle L:90%
got, you know, four gigs or four terabytes

355
00:20:40.009 --> 00:20:45.470 A:middle L:90%
of imagery or of laser output,

356
00:20:45.480 --> 00:20:47.420 A:middle L:90%
you know, which I can't even imagine what that

357
00:20:47.430 --> 00:20:48.809 A:middle L:90%
looks like or how it works, but um I

358
00:20:48.809 --> 00:20:51.940 A:middle L:90%
know that I can't upload four terabytes through a web

359
00:20:51.940 --> 00:20:55.069 A:middle L:90%
interface, and that they can't either. Um

360
00:20:55.079 --> 00:20:56.720 A:middle L:90%
And I know that if we put that anywhere on

361
00:20:56.720 --> 00:21:00.390 A:middle L:90%
a library system, Curtis or Paul or somebody

362
00:21:00.390 --> 00:21:03.690 A:middle L:90%
will call with a question, what's this about?

363
00:21:03.700 --> 00:21:07.599 A:middle L:90%
So we like to inform our partners here in

364
00:21:07.599 --> 00:21:11.730 A:middle L:90%
the library about what's happening well in advance so that

365
00:21:11.890 --> 00:21:15.240 A:middle L:90%
they can prepare for it. Um and access is

366
00:21:15.250 --> 00:21:18.380 A:middle L:90%
through VTechWorks, so we can index that data

367
00:21:18.390 --> 00:21:21.579 A:middle L:90%
and say what it means, provided that someone,

368
00:21:21.579 --> 00:21:25.619 A:middle L:90%
the researcher, actually knows and can

369
00:21:25.630 --> 00:21:29.859 A:middle L:90%
describe that. And once it's in VTechWorks,

370
00:21:29.869 --> 00:21:32.730 A:middle L:90%
even if you can't just download the set,

371
00:21:32.740 --> 00:21:34.529 A:middle L:90%
you can at least contact the researcher. Maybe

372
00:21:34.539 --> 00:21:36.950 A:middle L:90%
we've got it. We might have put the data

373
00:21:36.950 --> 00:21:37.380 A:middle L:90%
in there. Maybe we're storing it somewhere else in

374
00:21:37.380 --> 00:21:40.950 A:middle L:90%
the library. Maybe it's in a cloud storage that

375
00:21:40.960 --> 00:21:44.490 A:middle L:90%
the university's got a contract with. So we

376
00:21:44.490 --> 00:21:47.259 A:middle L:90%
can handle larger data sets but it just takes a

377
00:21:47.259 --> 00:21:49.980 A:middle L:90%
bit of planning in advance. Which is sort of

378
00:21:49.980 --> 00:21:53.859 A:middle L:90%
the next point. As part of data management consulting, the

379
00:21:53.859 --> 00:21:57.500 A:middle L:90%
library likes to work with faculty early on, as

380
00:21:57.500 --> 00:22:03.119 A:middle L:90%
they're developing their data, and help them

381
00:22:03.119 --> 00:22:04.509 A:middle L:90%
avoid Sam1, Sam2, Sam3

382
00:22:04.509 --> 00:22:08.420 A:middle L:90%
field construction, as well as plan where it's going

383
00:22:08.420 --> 00:22:10.960 A:middle L:90%
to end up when they're done with their projects.

384
00:22:11.539 --> 00:22:15.970 A:middle L:90%
These projects usually have termination dates and then they're responsible

385
00:22:15.970 --> 00:22:18.089 A:middle L:90%
a lot of times to the granting agency or to

386
00:22:18.089 --> 00:22:19.059 A:middle L:90%
the funder for maintaining the data for a while.

387
00:22:19.539 --> 00:22:23.170 A:middle L:90%
So all that falls under data management consulting, and

388
00:22:25.240 --> 00:22:26.970 A:middle L:90%
do you want to talk about versioning? Sure. I'll say

389
00:22:26.970 --> 00:22:30.069 A:middle L:90%
something about data management consulting, though. I have yet to

390
00:22:30.069 --> 00:22:33.099 A:middle L:90%
meet a faculty member who can, in advance, estimate

391
00:22:33.099 --> 00:22:37.259 A:middle L:90%
within 100 GB, or within five gigabytes, of how

392
00:22:37.259 --> 00:22:40.950 A:middle L:90%
much data they're going to produce with their research.

393
00:22:40.960 --> 00:22:42.940 A:middle L:90%
Um And I also haven't met one yet who will

394
00:22:42.940 --> 00:22:45.859 A:middle L:90%
say what the fields will be before they start collecting

395
00:22:45.859 --> 00:22:48.450 A:middle L:90%
the data, and I can relate. So it's

396
00:22:48.450 --> 00:22:52.569 A:middle L:90%
not easy. Data management consulting isn't easy.

397
00:22:52.579 --> 00:22:55.130 A:middle L:90%
Even if you know the fields, it's not

398
00:22:55.130 --> 00:22:56.500 A:middle L:90%
a one-time sit-down either. It's an

399
00:22:56.500 --> 00:23:02.549 A:middle L:90%
ongoing thing as the project develops. So, for versioning,

400
00:23:03.279 --> 00:23:07.220 A:middle L:90%
basically, if you create your data in a proprietary

401
00:23:07.220 --> 00:23:10.869 A:middle L:90%
format, because that's all that exists for your field,

402
00:23:10.880 --> 00:23:11.480 A:middle L:90%
or maybe not all that exists, but maybe

403
00:23:11.480 --> 00:23:15.640 A:middle L:90%
all of your colleagues use the same thing. Um

404
00:23:15.650 --> 00:23:18.710 A:middle L:90%
We start versioning. Well, that's one reason to

405
00:23:18.720 --> 00:23:21.960 A:middle L:90%
do versioning. Uh So we don't want to lose

406
00:23:21.960 --> 00:23:22.750 A:middle L:90%
that. We don't want to lose anything vital when

407
00:23:22.750 --> 00:23:26.329 A:middle L:90%
we transfer it to an open standard, to a

408
00:23:26.329 --> 00:23:29.460 A:middle L:90%
"good" standard. I was using finger quotes

409
00:23:29.460 --> 00:23:32.670 A:middle L:90%
in case the video didn't catch that.

410
00:23:32.680 --> 00:23:36.980 A:middle L:90%
So we create multiple

411
00:23:36.980 --> 00:23:40.710 A:middle L:90%
versions because when you transfer information from one file format

412
00:23:40.710 --> 00:23:42.960 A:middle L:90%
to another, you can always lose bits of information.

413
00:23:42.970 --> 00:23:45.920 A:middle L:90%
A really simple example of that is, say,

414
00:23:45.920 --> 00:23:48.880 A:middle L:90%
this PowerPoint presentation that we're going through. I don't

415
00:23:48.880 --> 00:23:52.299 A:middle L:90%
think we used any notes at the bottom. We

416
00:23:52.299 --> 00:23:53.640 A:middle L:90%
didn't add any lecture notes. But if you

417
00:23:53.640 --> 00:23:56.180 A:middle L:90%
do that in the PowerPoint file and then you save

418
00:23:56.180 --> 00:23:59.619 A:middle L:90%
that as a PDF, you lose all the lecture

419
00:23:59.619 --> 00:24:03.700 A:middle L:90%
notes. So you want to really save both versions

420
00:24:03.710 --> 00:24:06.519 A:middle L:90%
because PowerPoint is not going to be supported, this version

421
00:24:06.519 --> 00:24:08.049 A:middle L:90%
of PowerPoint. There's very little likelihood

422
00:24:08.049 --> 00:24:10.559 A:middle L:90%
that you'll be able to open this in 20 years

423
00:24:10.559 --> 00:24:15.200 A:middle L:90%
without downloading something else. So we'll create

424
00:24:15.200 --> 00:24:18.640 A:middle L:90%
a PDF, which is more likely to be

425
00:24:18.640 --> 00:24:18.990 A:middle L:90%
able to be opened in 20 years. But

426
00:24:18.990 --> 00:24:22.420 A:middle L:90%
we lose information. But that's

427
00:24:22.420 --> 00:24:25.349 A:middle L:90%
just a very simple example with PowerPoint.

428
00:24:25.349 --> 00:24:30.140 A:middle L:90%
It could happen with any file format or let's say

429
00:24:30.140 --> 00:24:33.609 A:middle L:90%
you have your cells, your columns in

430
00:24:33.609 --> 00:24:36.440 A:middle L:90%
your spreadsheet formatted one way; you've got these

431
00:24:36.440 --> 00:24:38.269 A:middle L:90%
equations that apply to everything in this row. Well,

432
00:24:38.269 --> 00:24:41.750 A:middle L:90%
if you reformat that to

433
00:24:41.750 --> 00:24:45.960 A:middle L:90%
another file, if you save it in another file format,

434
00:24:45.970 --> 00:24:48.390 A:middle L:90%
maybe that format doesn't support

435
00:24:48.390 --> 00:24:51.259 A:middle L:90%
the equations that you wrote. So that's

436
00:24:51.259 --> 00:24:55.960 A:middle L:90%
one kind of versioning. The other kind is:

437
00:24:56.539 --> 00:24:57.990 A:middle L:90%
okay, I've written an article about this research that

438
00:24:57.990 --> 00:25:03.599 A:middle L:90%
I did, and I'm going to not just uh

439
00:25:03.609 --> 00:25:04.640 A:middle L:90%
I'm going to archive the data for the research,

440
00:25:04.640 --> 00:25:07.089 A:middle L:90%
and I'm going to archive the software that I used

441
00:25:07.089 --> 00:25:10.599 A:middle L:90%
to process the data. I designed this software myself.

442
00:25:10.609 --> 00:25:11.829 A:middle L:90%
By the way, I didn't actually do this;

443
00:25:11.829 --> 00:25:12.490 A:middle L:90%
I'm speaking, you know, as if I

444
00:25:12.490 --> 00:25:15.750 A:middle L:90%
was someone who did. Um So I've got this

445
00:25:15.750 --> 00:25:18.960 A:middle L:90%
piece, I've got a Python program that I wrote,

446
00:25:18.140 --> 00:25:21.789 A:middle L:90%
and the article that I wrote is based on

447
00:25:21.789 --> 00:25:26.089 A:middle L:90%
the Python program. Now, three years later,

448
00:25:26.099 --> 00:25:27.259 A:middle L:90%
let's say, you know,

449
00:25:27.309 --> 00:25:30.240 A:middle L:90%
based on reviews I've gotten from the article

450
00:25:30.240 --> 00:25:32.910 A:middle L:90%
and based on some of the things I've learned,

451
00:25:32.920 --> 00:25:37.890 A:middle L:90%
I'm going to adjust the Python program. I'm

452
00:25:37.890 --> 00:25:41.240 A:middle L:90%
gonna tweak it a little bit. And

453
00:25:41.250 --> 00:25:42.980 A:middle L:90%
also I've been continuing to gather data in the last

454
00:25:42.980 --> 00:25:45.819 A:middle L:90%
three years, so my data has changed. I

455
00:25:45.819 --> 00:25:47.569 A:middle L:90%
still have that old data, but now I've got

456
00:25:47.569 --> 00:25:49.029 A:middle L:90%
this new data. I'm not going to replace the

457
00:25:49.029 --> 00:25:52.059 A:middle L:90%
old file because there's an article written about that,

458
00:25:52.240 --> 00:25:56.769 A:middle L:90%
I'm going to create a new file and

459
00:25:56.779 --> 00:26:00.869 A:middle L:90%
write a new article about that. So if you

460
00:26:00.869 --> 00:26:03.779 A:middle L:90%
have one data set that changes and somebody asks for

461
00:26:03.779 --> 00:26:04.589 A:middle L:90%
your data set and you give it to them,

462
00:26:04.599 --> 00:26:07.670 A:middle L:90%
they're going to get different results than you did in your article three

463
00:26:07.670 --> 00:26:11.299 A:middle L:90%
years ago, and

464
00:26:11.309 --> 00:26:12.779 A:middle L:90%
then you don't have the same kind of transparency for

465
00:26:12.779 --> 00:26:15.619 A:middle L:90%
your research that you would have otherwise, if

466
00:26:15.619 --> 00:26:18.869 A:middle L:90%
you employ versioning. And then when

467
00:26:18.869 --> 00:26:21.259 A:middle L:90%
you change the software, you're changing how the data

468
00:26:21.259 --> 00:26:22.920 A:middle L:90%
is processed. And if you give that to somebody

469
00:26:22.920 --> 00:26:23.150 A:middle L:90%
else, they should be able to get the same

470
00:26:23.150 --> 00:26:26.880 A:middle L:90%
answers as you. So every time you

471
00:26:26.880 --> 00:26:29.660 A:middle L:90%
change the software, you need to track that

472
00:26:29.660 --> 00:26:33.390 A:middle L:90%
change. Um so we archive both versions of the

473
00:26:33.390 --> 00:26:36.900 A:middle L:90%
software and both versions of the data so that other

474
00:26:36.900 --> 00:26:38.039 A:middle L:90%
people can read your article and say,

475
00:26:38.049 --> 00:26:41.950 A:middle L:90%
I wonder if they fudged the

476
00:26:41.950 --> 00:26:42.880 A:middle L:90%
results at all? Well, if you show them

477
00:26:42.880 --> 00:26:45.769 A:middle L:90%
the data and you show them the software that

478
00:26:45.920 --> 00:26:49.549 A:middle L:90%
you designed to process the data, then they can

479
00:26:51.039 --> 00:26:52.930 A:middle L:90%
see that no, you're not full of it.

480
00:26:52.930 --> 00:26:56.549 A:middle L:90%
You actually did do your homework and that you're actually

481
00:26:56.940 --> 00:27:00.039 A:middle L:90%
uh the things that you wrote in the article are

482
00:27:00.039 --> 00:27:00.710 A:middle L:90%
actually true because they can do the same. They

483
00:27:00.710 --> 00:27:03.170 A:middle L:90%
can use the same tools to get the same results.

484
00:27:03.539 --> 00:27:06.000 A:middle L:90%
So that's why we do versioning, and that's how

485
00:27:06.000 --> 00:27:10.849 A:middle L:90%
we do versioning. Is that clear, or did that take

486
00:27:10.849 --> 00:27:15.980 A:middle L:90%
too long? Mhm. I thought we'd open it

487
00:27:15.980 --> 00:27:19.650 A:middle L:90%
up to questions. Yeah. Um I like got

488
00:27:19.650 --> 00:27:36.339 A:middle L:90%
imagine as well. Mhm. Mhm. Yeah.

489
00:27:38.339 --> 00:27:44.549 A:middle L:90%
You have to log in, right? Is this

490
00:27:44.549 --> 00:27:45.660 A:middle L:90%
an open-source tool or something that we've created here?

491
00:27:47.240 --> 00:27:51.349 A:middle L:90%
It's not purchased. It's

492
00:27:51.349 --> 00:27:55.700 A:middle L:90%
developed by the California Digital Library and Johns Hopkins and a

493
00:27:55.700 --> 00:27:57.359 A:middle L:90%
few other institutions. UVA is a partner,

494
00:28:00.039 --> 00:28:04.079 A:middle L:90%
right? Excuse me. And we are now a

495
00:28:04.079 --> 00:28:07.380 A:middle L:90%
contributing member. And I don't know what that means

496
00:28:07.380 --> 00:28:10.269 A:middle L:90%
either. I don't think we've actually contributed code

497
00:28:10.640 --> 00:28:15.289 A:middle L:90%
or money at this point. I think we

498
00:28:15.289 --> 00:28:18.460 A:middle L:90%
got on early enough that they weren't

499
00:28:18.460 --> 00:28:22.549 A:middle L:90%
asking for money, and I

500
00:28:22.549 --> 00:28:23.369 A:middle L:90%
don't know if they are yet, but I think

501
00:28:23.369 --> 00:28:26.099 A:middle L:90%
that might be part of their future. But

502
00:28:26.109 --> 00:28:29.430 A:middle L:90%
you log in with an institution, it goes into

503
00:28:29.430 --> 00:28:32.759 A:middle L:90%
your authentication system at your institution. So for us

504
00:28:32.759 --> 00:28:34.059 A:middle L:90%
it's the PID, and all those institutions have their own

505
00:28:34.069 --> 00:28:37.960 A:middle L:90%
authentication tools and then it gives you a drop down

506
00:28:37.960 --> 00:28:41.279 A:middle L:90%
list where you can choose what agency

507
00:28:41.289 --> 00:28:42.960 A:middle L:90%
you're applying for. Can you make it even bigger, Scott?

508
00:28:44.140 --> 00:28:48.359 A:middle L:90%
Yeah, so we've got both public and private

509
00:28:48.359 --> 00:28:52.119 A:middle L:90%
funding represented here in this list. Um there's about

510
00:28:52.130 --> 00:28:56.970 A:middle L:90%
12 or 16 different NSF ones. You see all those

511
00:28:56.970 --> 00:28:59.509 A:middle L:90%
NSF ones in there for different directorates, they each

512
00:28:59.509 --> 00:29:02.380 A:middle L:90%
have their own requirements. There's one for NEH,

513
00:29:02.380 --> 00:29:07.380 A:middle L:90%
there's one for NIH, IMLS,

514
00:29:07.390 --> 00:29:11.599 A:middle L:90%
Gordon and Betty Moore, so go down from there. Okay,

515
00:29:11.609 --> 00:29:15.980 A:middle L:90%
so almost all of them are for NSF.

516
00:29:15.990 --> 00:29:17.769 A:middle L:90%
I thought the list was longer than that. Actually,

517
00:29:18.039 --> 00:29:21.839 A:middle L:90%
that's all that they have enabled, I

518
00:29:21.839 --> 00:29:23.980 A:middle L:90%
guess, for the DMPTool. There are obviously other

519
00:29:23.980 --> 00:29:30.190 A:middle L:90%
agencies now that, as of February, require data management planning

520
00:29:30.210 --> 00:29:33.630 A:middle L:90%
or that announced in February; I can't remember when

521
00:29:33.640 --> 00:29:41.460 A:middle L:90%
that takes effect. So, in this plan,

522
00:29:41.240 --> 00:29:44.160 A:middle L:90%
actually could you go back to one of the NSF

523
00:29:44.160 --> 00:29:47.259 A:middle L:90%
ones? I think it's a good way to

524
00:29:47.259 --> 00:29:52.329 A:middle L:90%
show it; it shows up differently as well. All right. Just

525
00:29:52.329 --> 00:29:53.660 A:middle L:90%
put in a test under that required field there.

526
00:29:57.539 --> 00:30:00.930 A:middle L:90%
So you put the formal solicitation number in here.

527
00:30:00.930 --> 00:30:04.650 A:middle L:90%
That's not required. But if you

528
00:30:04.650 --> 00:30:07.789 A:middle L:90%
had one researcher who's applying for multiple programs, or

529
00:30:07.789 --> 00:30:11.380 A:middle L:90%
if you had one librarian who's doing all the grants

530
00:30:11.380 --> 00:30:12.609 A:middle L:90%
for an institution, or for all the grants that come to

531
00:30:12.609 --> 00:30:15.549 A:middle L:90%
them, they might organize, they might help organize

532
00:30:15.559 --> 00:30:22.170 A:middle L:90%
their data management planning services that way,

533
00:30:22.180 --> 00:30:23.930 A:middle L:90%
so they can go back and say, what did

534
00:30:23.930 --> 00:30:26.359 A:middle L:90%
I do last year? Okay. Here are all the

535
00:30:26.359 --> 00:30:29.119 A:middle L:90%
solicitation numbers from last year. Uh I'm gonna go

536
00:30:29.119 --> 00:30:30.849 A:middle L:90%
back and see how many of these got funded,

537
00:30:30.509 --> 00:30:33.630 A:middle L:90%
how many didn't. So that'd be a reason to use

538
00:30:33.630 --> 00:30:37.109 A:middle L:90%
a solicitation number. Comments is just notes. Why don't

539
00:30:37.109 --> 00:30:37.759 A:middle L:90%
you go on to the next page? You know,

540
00:30:37.759 --> 00:30:40.839 A:middle L:90%
before you do, you can see on the left there's

541
00:30:40.839 --> 00:30:42.650 A:middle L:90%
this number list that is kind of telling you where

542
00:30:42.650 --> 00:30:47.390 A:middle L:90%
you are in the data management plan.

543
00:30:47.400 --> 00:30:48.220 A:middle L:90%
So we're not even to the first one yet,

544
00:30:48.230 --> 00:30:52.980 A:middle L:90%
so I can go down. So can you

545
00:30:52.980 --> 00:30:57.599 A:middle L:90%
expand the help? That's helpful. So this is

546
00:30:57.599 --> 00:31:02.230 A:middle L:90%
basically the first question they're asking you to

547
00:31:02.230 --> 00:31:07.049 A:middle L:90%
answer, about the types of data.

548
00:31:07.640 --> 00:31:11.059 A:middle L:90%
And you see there are formatting tools in there, which

549
00:31:11.059 --> 00:31:12.849 A:middle L:90%
I've always kind of ignored because I figure I'm just

550
00:31:12.849 --> 00:31:15.559 A:middle L:90%
going to format it when I put it into

551
00:31:15.559 --> 00:31:19.450 A:middle L:90%
my Word document. Uh huh.

552
00:31:21.740 --> 00:31:22.500 A:middle L:90%
But it will take some time to get some of

553
00:31:22.500 --> 00:31:26.859 A:middle L:90%
this down in it, to get something to look at, at

554
00:31:26.859 --> 00:31:30.519 A:middle L:90%
least. All right. And we got the next

555
00:31:30.519 --> 00:31:37.890 A:middle L:90%
one. So, you know, we've answered

556
00:31:37.890 --> 00:31:41.660 A:middle L:90%
about types of data, then data standards.

557
00:31:42.039 --> 00:31:45.450 A:middle L:90%
So this is asking about how you're going to describe

558
00:31:45.450 --> 00:31:45.819 A:middle L:90%
the data. Are you going to call it Sam1,

559
00:31:45.819 --> 00:31:48.490 A:middle L:90%
Sam2, Sam3, or are you going to

560
00:31:48.490 --> 00:31:53.670 A:middle L:90%
call it, you know, mets dot title or whatever

561
00:31:53.670 --> 00:31:59.440 A:middle L:90%
it is? So

562
00:31:59.440 --> 00:32:02.829 A:middle L:90%
that's asking about metadata standards and then the next one

563
00:32:02.829 --> 00:32:07.059 A:middle L:90%
after that is what's your policy for access and sharing

564
00:32:07.440 --> 00:32:10.170 A:middle L:90%
? Is this a it's the graphic europe. Is

565
00:32:10.170 --> 00:32:15.450 A:middle L:90%
the research that you're proposing going to have

566
00:32:15.940 --> 00:32:20.460 A:middle L:90%
commercial value to yourself? Are you planning on

567
00:32:20.460 --> 00:32:22.329 A:middle L:90%
patenting something from your research? If so, then

568
00:32:22.329 --> 00:32:25.210 A:middle L:90%
you can say that I'm exempt from sharing this specific

569
00:32:25.210 --> 00:32:29.369 A:middle L:90%
data. Um or you can say I'm using lots

570
00:32:29.369 --> 00:32:32.240 A:middle L:90%
of human subjects in my research and therefore I should

571
00:32:32.240 --> 00:32:35.960 A:middle L:90%
not share my data, because that would violate their

572
00:32:36.440 --> 00:32:39.740 A:middle L:90%
privacy. You can see how this

573
00:32:39.740 --> 00:32:42.849 A:middle L:90%
is a really good way to help folks think through

574
00:32:42.849 --> 00:32:45.430 A:middle L:90%
this stuff up front, though, and see how it's related,

575
00:32:45.430 --> 00:32:47.019 A:middle L:90%
how the data management plan's related to the rest

576
00:32:47.019 --> 00:32:52.089 A:middle L:90%
of the proposal. Um because a lot of people

577
00:32:52.099 --> 00:32:54.130 A:middle L:90%
can skimp on certain elements of the proposal that

578
00:32:54.130 --> 00:32:57.849 A:middle L:90%
they want to get funded. They'll

579
00:32:58.240 --> 00:33:00.180 A:middle L:90%
put so much effort into the research question

580
00:33:00.190 --> 00:33:02.730 A:middle L:90%
and methods, but not into what they'll do with

581
00:33:02.730 --> 00:33:07.779 A:middle L:90%
the results and how they disseminate them. NSF requires

582
00:33:07.789 --> 00:33:10.759 A:middle L:90%
a broader impact statement. So how are you gonna

583
00:33:10.769 --> 00:33:14.380 A:middle L:90%
not just make this an ivory-tower project,

584
00:33:14.380 --> 00:33:15.420 A:middle L:90%
but how is it going to get out to the

585
00:33:15.430 --> 00:33:17.559 A:middle L:90%
people? Well, maybe they write in something about

586
00:33:17.569 --> 00:33:21.450 A:middle L:90%
I'm gonna work with an area high school teacher and

587
00:33:21.450 --> 00:33:23.769 A:middle L:90%
and create lessons to teach. Well, that should

588
00:33:23.779 --> 00:33:27.970 A:middle L:90%
be reflected in your access policies for access and sharing

589
00:33:27.980 --> 00:33:30.019 A:middle L:90%
as well. And a lot of people do that without

590
00:33:30.019 --> 00:33:30.730 A:middle L:90%
even having talked to the high school teacher

591
00:33:30.730 --> 00:33:32.549 A:middle L:90%
beforehand. They'll just make it up as they go.

592
00:33:32.559 --> 00:33:37.160 A:middle L:90%
But the policies for access and sharing tie

593
00:33:37.160 --> 00:33:40.180 A:middle L:90%
in directly to what their broader impact statement

594
00:33:40.180 --> 00:33:42.990 A:middle L:90%
is or how they're going to make this relevant for

595
00:33:42.990 --> 00:33:45.650 A:middle L:90%
society. Yeah. As well as the policies for

596
00:33:45.650 --> 00:33:49.130 A:middle L:90%
reuse and redistribution. If you're going to create lesson

597
00:33:49.130 --> 00:33:52.539 A:middle L:90%
plans out of this, um how will other people

598
00:33:52.539 --> 00:33:53.880 A:middle L:90%
be able to access those lesson plans? Will people

599
00:33:53.880 --> 00:33:58.359 A:middle L:90%
be able to create their own lesson plans from

600
00:33:58.940 --> 00:34:12.289 A:middle L:90%
your research? And archiving and preservation:

601
00:34:12.289 --> 00:34:14.550 A:middle L:90%
For this, we have kind of a blanket statement

602
00:34:14.550 --> 00:34:15.139 A:middle L:90%
we use if people want to use VTechWorks,

603
00:34:15.139 --> 00:34:19.059 A:middle L:90%
but it might

604
00:34:19.059 --> 00:34:22.380 A:middle L:90%
make more sense for certain disciplines to use a

605
00:34:22.389 --> 00:34:24.539 A:middle L:90%
disciplinary repository, because really you want to get it to

606
00:34:24.539 --> 00:34:28.320 A:middle L:90%
the audience where it's going to have the

607
00:34:28.329 --> 00:34:30.070 A:middle L:90%
biggest impact. People aren't going to come to

608
00:34:30.070 --> 00:34:35.469 A:middle L:90%
VTechWorks to find, like, the ultimate data set

609
00:34:35.480 --> 00:34:37.840 A:middle L:90%
on astronomy. There might be

610
00:34:37.840 --> 00:34:44.670 A:middle L:90%
a specific repository that's specifically for astronomy data,

611
00:34:45.039 --> 00:34:46.760 A:middle L:90%
and astronomy researchers go there to look for

612
00:34:46.760 --> 00:35:02.289 A:middle L:90%
it. Yeah, there it is. And you

613
00:35:02.289 --> 00:35:07.030 A:middle L:90%
can save this as plain text or rich text. Given the

614
00:35:07.030 --> 00:35:08.670 A:middle L:90%
formatting, I would have gone with plain. Mhm

615
00:35:09.239 --> 00:35:10.670 A:middle L:90%
. I haven't tried it with me for a minute

616
00:35:10.670 --> 00:35:14.710 A:middle L:90%
. I want to see what it does never from

617
00:35:14.710 --> 00:35:23.260 A:middle L:90%
anything in there, download it right on my desktop

618
00:35:25.440 --> 00:35:28.059 A:middle L:90%
. Yes, for many dozen off a little bit

619
00:35:30.039 --> 00:35:34.159 A:middle L:90%
. So that's how the DMPTool works. Um

620
00:35:34.940 --> 00:35:36.630 A:middle L:90%
Were there any other, it looks like a lot

621
00:35:36.630 --> 00:35:40.489 A:middle L:90%
of, it's not exactly self-service for the PIs,

622
00:35:40.489 --> 00:35:47.340 A:middle L:90%
knowing, say, some

623
00:35:47.340 --> 00:35:52.940 A:middle L:90%
of the formatting questions particularly they move more on to

624
00:35:52.949 --> 00:36:00.690 A:middle L:90%
the archiving and library side of things. Where, in

625
00:36:00.690 --> 00:36:05.409 A:middle L:90%
working with researchers, should we point them to the

626
00:36:05.420 --> 00:36:07.329 A:middle L:90%
templates for the individual funders that are available for the

627
00:36:07.340 --> 00:36:12.719 A:middle L:90%
funding requirements part on there and get them thinking about

628
00:36:12.730 --> 00:36:15.639 A:middle L:90%
those those questions. And has that been part of

629
00:36:15.639 --> 00:36:22.179 A:middle L:90%
the consultation process? But what if they're still

630
00:36:22.190 --> 00:36:30.119 A:middle L:90%
up against the deadline to submit things, and as they

631
00:36:30.130 --> 00:36:35.170 A:middle L:90%
try to translate the questions or the problems that are

632
00:36:35.170 --> 00:36:38.670 A:middle L:90%
asked in the tool from library jargon? Yeah. Or

633
00:36:38.679 --> 00:36:42.409 A:middle L:90%
jargon from any number of sources. The ones that

634
00:36:42.409 --> 00:36:45.909 A:middle L:90%
are not their jargon. What do you recommend

635
00:36:45.909 --> 00:36:52.539 A:middle L:90%
in terms of pointing people who don't work with us

636
00:36:52.539 --> 00:36:59.449 A:middle L:90%
in a timely fashion to? You mean, like, find

637
00:36:59.449 --> 00:37:00.579 A:middle L:90%
those things? Do we have or is there a

638
00:37:00.579 --> 00:37:07.090 A:middle L:90%
way, for example, that works as a complement

639
00:37:07.090 --> 00:37:10.260 A:middle L:90%
to the tool, some kind of utility

640
00:37:10.260 --> 00:37:14.349 A:middle L:90%
that will allow them to see the questions in advance

641
00:37:15.030 --> 00:37:17.920 A:middle L:90%
before they're actually filling in the boxes, and that

642
00:37:17.920 --> 00:37:22.030 A:middle L:90%
could say, and here are some of the standards:

643
00:37:22.039 --> 00:37:27.960 A:middle L:90%
Dublin Core does this, and these are

644
00:37:27.960 --> 00:37:30.679 A:middle L:90%
sufficient for whomever? So there's a number of

645
00:37:30.900 --> 00:37:37.400 A:middle L:90%
online tutorials that deconstruct the different parts

646
00:37:37.400 --> 00:37:39.619 A:middle L:90%
of the data management plan that you can go through.

647
00:37:39.630 --> 00:37:43.789 A:middle L:90%
We don't have anything up like that for

648
00:37:43.800 --> 00:37:47.969 A:middle L:90%
procedures yet with the research hub. But

649
00:37:47.980 --> 00:37:52.190 A:middle L:90%
we can customize our DMPTool presence so that

650
00:37:52.199 --> 00:37:54.599 A:middle L:90%
it shows up. We can also create an HTML web

651
00:37:54.599 --> 00:37:58.150 A:middle L:90%
page on the library side that would have that and

652
00:37:58.150 --> 00:38:00.809 A:middle L:90%
other libraries have done that. Um University Minnesota is

653
00:38:00.809 --> 00:38:06.260 A:middle L:90%
kind of famous for, there's um and although after

654
00:38:06.260 --> 00:38:07.739 A:middle L:90%
I was recommended to visit that the first time I

655
00:38:07.750 --> 00:38:09.230 A:middle L:90%
noticed there were lots of dead links, but I

656
00:38:09.230 --> 00:38:13.210 A:middle L:90%
hear lots of people talking about the university Minnesota say

657
00:38:13.219 --> 00:38:17.449 A:middle L:90%
. Um but Eva has good resources, California digital

658
00:38:17.449 --> 00:38:21.690 A:middle L:90%
Library has resources. So when people ask questions like that,

659
00:38:21.699 --> 00:38:24.849 A:middle L:90%
sometimes I'll refer them to the website of

660
00:38:24.989 --> 00:38:30.949 A:middle L:90%
another institution for those kinds of tutorials.

661
00:38:34.619 --> 00:38:38.250 A:middle L:90%
and that will be a part of makes them right

662
00:38:39.719 --> 00:38:51.900 A:middle L:90%
. Yes. So about the dress consumers. I

663
00:38:51.900 --> 00:38:54.610 A:middle L:90%
wonder, a question about VTechWorks: where is that actually

664
00:38:54.619 --> 00:38:59.650 A:middle L:90%
housed? Over at the computing center, or partially here, or how?

665
00:39:00.820 --> 00:39:04.510 A:middle L:90%
Where's the back end? The back end is in the

666
00:39:04.519 --> 00:39:06.809 A:middle L:90%
server room on the 4th floor?

667
00:39:06.820 --> 00:39:12.449 A:middle L:90%
It is, okay. At least it was in

668
00:39:12.920 --> 00:39:15.289 A:middle L:90%
September 2011 when they showed it to me. I

669
00:39:15.300 --> 00:39:17.989 A:middle L:90%
don't think they've moved it since. I wouldn't know, and

670
00:39:17.989 --> 00:39:24.559 A:middle L:90%
I don't particularly care where it lives, just

671
00:39:24.559 --> 00:39:28.840 A:middle L:90%
not in the basement. Yeah. What about the

672
00:39:29.320 --> 00:39:30.989 A:middle L:90%
sixth floor, either? I've worked

673
00:39:30.989 --> 00:39:34.949 A:middle L:90%
in libraries where there were leaks on the top floor.

674
00:39:36.320 --> 00:39:39.860 A:middle L:90%
And don't put it by the window, either.

675
00:39:40.519 --> 00:39:44.940 A:middle L:90%
Well, I'm curious, what kind of capacity do

676
00:39:45.690 --> 00:39:49.949 A:middle L:90%
we have? We don't want to fill it up. I

677
00:39:49.949 --> 00:39:52.969 A:middle L:90%
think they started us with two terabytes, and they didn't

678
00:39:52.969 --> 00:39:54.300 A:middle L:90%
say that that's your limit. They said that's what they

679
00:39:54.460 --> 00:39:58.539 A:middle L:90%
were giving you for this year, and if you need

680
00:39:58.539 --> 00:40:00.460 A:middle L:90%
more in six months, let us know. So

681
00:40:00.469 --> 00:40:02.969 A:middle L:90%
I think we have, and that was back in

682
00:40:02.980 --> 00:40:06.519 A:middle L:90%
the fall of 2011, so

683
00:40:07.010 --> 00:40:09.489 A:middle L:90%
we probably have more than that now, because

684
00:40:09.489 --> 00:40:13.360 A:middle L:90%
I don't think we've filled that. So in the

685
00:40:13.360 --> 00:40:16.289 A:middle L:90%
project, are the historical aerial photos up at this

686
00:40:16.289 --> 00:40:21.090 A:middle L:90%
point? You gave me... Right, right.

687
00:40:21.099 --> 00:40:22.730 A:middle L:90%
Some are, but I don't think there's been an

688
00:40:23.409 --> 00:40:27.849 A:middle L:90%
upload for that, so I don't

689
00:40:27.849 --> 00:40:31.239 A:middle L:90%
think those are deposited. Yeah, yeah.

690
00:40:34.610 --> 00:40:38.219 A:middle L:90%
On the terabytes question, you said, well,

691
00:40:38.230 --> 00:40:43.739 A:middle L:90%
if you've got four terabytes, you're... Hello? Mhm.

692
00:40:44.409 --> 00:40:46.980 A:middle L:90%
I just wondered where we're going. Things are going

693
00:40:46.980 --> 00:40:52.730 A:middle L:90%
so fast. It used to be that just a

694
00:40:52.730 --> 00:40:54.559 A:middle L:90%
small amount of data would fill a computer. Now,

695
00:40:54.559 --> 00:40:58.980 A:middle L:90%
of course, we've got 100 gigabytes or whatever in

696
00:40:58.980 --> 00:41:02.019 A:middle L:90%
these little computers. I saw a

697
00:41:02.030 --> 00:41:07.610 A:middle L:90%
news item about how some MIT students had constructed some

698
00:41:07.619 --> 00:41:10.849 A:middle L:90%
weather satellites the size of a shoebox, and I was just

699
00:41:10.849 --> 00:41:15.400 A:middle L:90%
wondering: is there anything out there where people are thinking about

700
00:41:15.409 --> 00:41:21.230 A:middle L:90%
data storage, about how pretty soon terabytes are going

701
00:41:21.239 --> 00:41:25.039 A:middle L:90%
to be the new megabytes? Well, when

702
00:41:25.039 --> 00:41:27.960 A:middle L:90%
I said terabytes can't be uploaded through us, that's

703
00:41:27.960 --> 00:41:30.579 A:middle L:90%
basically because you can't upload terabytes through any web interface. You

704
00:41:30.579 --> 00:41:36.900 A:middle L:90%
can use FTP or other internet

705
00:41:36.900 --> 00:41:40.599 A:middle L:90%
tools that can transfer larger amounts of data over longer

706
00:41:40.599 --> 00:41:44.869 A:middle L:90%
periods of time. But you can't do it through

707
00:41:44.869 --> 00:41:47.800 A:middle L:90%
an HTML page. It's not a question of

708
00:41:47.800 --> 00:41:51.309 A:middle L:90%
storage; it's a question of bandwidth, and other ways of

709
00:41:51.309 --> 00:41:53.969 A:middle L:90%
getting it in there. So is storage becoming a

710
00:41:53.980 --> 00:41:59.719 A:middle L:90%
problem for storing some data, like, for instance,

711
00:41:59.719 --> 00:42:06.130 A:middle L:90%
space exploration: all those pictures, terabytes and terabytes,

712
00:42:06.139 --> 00:42:09.340 A:middle L:90%
or not? It is. Yeah,

713
00:42:09.340 --> 00:42:14.980 A:middle L:90%
they've started talking about petabytes, which I think is 1,000

714
00:42:14.980 --> 00:42:20.699 A:middle L:90%
terabytes to a petabyte. But you'd actually be surprised that

715
00:42:20.710 --> 00:42:23.670 A:middle L:90%
the scientific data like the Sky Survey, the Sloan

716
00:42:23.670 --> 00:42:27.449 A:middle L:90%
Digital Sky Survey uses one of them. It takes

717
00:42:27.449 --> 00:42:29.360 A:middle L:90%
up an awful lot of room. It's

718
00:42:29.360 --> 00:42:31.309 A:middle L:90%
hundreds of gigabytes of data.

719
00:42:32.099 --> 00:42:37.320 A:middle L:90%
But it's video and audio that actually

720
00:42:37.329 --> 00:42:42.570 A:middle L:90%
eat up a lot of space. We're talking

721
00:42:42.570 --> 00:42:47.929 A:middle L:90%
about versioning and migration when you're dealing in

722
00:42:49.599 --> 00:42:54.440 A:middle L:90%
photos, video, and audio. There are different compression rates:

723
00:42:54.440 --> 00:42:59.260 A:middle L:90%
what we listen to on our phones versus what you

724
00:42:59.260 --> 00:43:00.159 A:middle L:90%
might want to have recorded for editing in the

725
00:43:00.159 --> 00:43:05.469 A:middle L:90%
studio later. And so, generally, best practice is to hang

726
00:43:05.469 --> 00:43:07.329 A:middle L:90%
onto an uncompressed version. When you start doing that,

727
00:43:07.340 --> 00:43:10.000 A:middle L:90%
you're talking a lot of data. And

728
00:43:10.010 --> 00:43:15.699 A:middle L:90%
one of my favorite examples is Steven Spielberg's

729
00:43:15.909 --> 00:43:19.539 A:middle L:90%
project, the Shoah Foundation. Thank you.

730
00:43:19.550 --> 00:43:24.099 A:middle L:90%
The Shoah Foundation is doing oral history, and their server

731
00:43:24.099 --> 00:43:29.929 A:middle L:90%
room could easily hold 10 of the Digital Sky Surveys,

732
00:43:30.699 --> 00:43:34.059 A:middle L:90%
and they're already at full capacity. So

733
00:43:34.059 --> 00:43:39.139 A:middle L:90%
they had to expand. It's something outrageous.

734
00:43:39.139 --> 00:43:40.480 A:middle L:90%
Later, they had to

735
00:43:40.480 --> 00:43:44.030 A:middle L:90%
invent robotic storage for them, you

736
00:43:44.030 --> 00:43:47.900 A:middle L:90%
know? Well, another thing is that storage

737
00:43:47.909 --> 00:43:51.500 A:middle L:90%
is less of a problem than bandwidth.

738
00:43:51.500 --> 00:43:55.190 A:middle L:90%
So, access. Yeah. Say, like, Amazon

739
00:43:55.190 --> 00:43:59.500 A:middle L:90%
Glacier is a cloud service. It costs a penny

740
00:43:59.500 --> 00:44:01.500 A:middle L:90%
per gigabyte per month. So, say, 100

741
00:44:01.500 --> 00:44:05.920 A:middle L:90%
gigabytes, that's a dollar a month for storage.

742
00:44:07.090 --> 00:44:09.000 A:middle L:90%
You don't really get charged for storing it. Well,

743
00:44:09.000 --> 00:44:10.619 A:middle L:90%
you get charged a dollar a month for storing it, but

744
00:44:10.889 --> 00:44:15.769 A:middle L:90%
you get charged for downloading it. That's when we

745
00:44:15.780 --> 00:44:20.710 A:middle L:90%
get you. And the same with Flickr,

746
00:44:20.719 --> 00:44:25.579 A:middle L:90%
if you're a photographer, I believe. No, well,

747
00:44:25.590 --> 00:44:29.230 A:middle L:90%
so maybe you have nice photos in your office.

748
00:44:29.239 --> 00:44:32.719 A:middle L:90%
Well, Flickr, for the Pro accounts, you're not

749
00:44:32.719 --> 00:44:37.929 A:middle L:90%
charged for storage. You're charged for uploading new things. You're

750
00:44:37.929 --> 00:44:39.539 A:middle L:90%
not charged for the things you uploaded last month.

751
00:44:39.539 --> 00:44:42.230 A:middle L:90%
You're charged for the things you're uploading this month.

752
00:44:42.420 --> 00:44:44.869 A:middle L:90%
I mean, you're not really charged; basically, you

753
00:44:44.880 --> 00:44:50.340 A:middle L:90%
pay an annual rate, and you are allowed so

754
00:44:50.340 --> 00:44:53.530 A:middle L:90%
many gigabytes per month to upload. It doesn't matter

755
00:44:53.539 --> 00:44:55.579 A:middle L:90%
how long you've been doing that. But it's

756
00:44:55.579 --> 00:44:58.760 A:middle L:90%
part of data management. We're talking about cloud services,

757
00:44:58.760 --> 00:45:00.130 A:middle L:90%
and when we're talking about that, sometimes we have to

758
00:45:00.130 --> 00:45:02.099 A:middle L:90%
point out to faculty that, depending on where their funding

759
00:45:02.099 --> 00:45:05.530 A:middle L:90%
comes from, they can't use these cloud services, because

760
00:45:05.539 --> 00:45:07.119 A:middle L:90%
Amazon is not going to tell you where

761
00:45:07.119 --> 00:45:07.809 A:middle L:90%
the computers are. They might not be in this country

762
00:45:08.389 --> 00:45:12.000 A:middle L:90%
where your data is sitting, and depending

763
00:45:12.000 --> 00:45:15.510 A:middle L:90%
on the faculty's research funding, they can't put it there,

764
00:45:15.190 --> 00:45:17.429 A:middle L:90%
you know, even if Amazon says it's well secured

765
00:45:17.429 --> 00:45:21.710 A:middle L:90%
and it's password protected. Yeah, they won't tell

766
00:45:21.710 --> 00:45:24.219 A:middle L:90%
you where it's sitting, where that physical server

767
00:45:24.219 --> 00:45:27.710 A:middle L:90%
is, and for some funding it can't be outside of

768
00:45:27.710 --> 00:45:31.230 A:middle L:90%
the country. Exactly. You know, it

769
00:45:31.230 --> 00:45:34.260 A:middle L:90%
might be on one of Amazon's servers in China, and

770
00:45:34.269 --> 00:45:35.840 A:middle L:90%
I don't think DoD will be very happy

771
00:45:35.840 --> 00:45:38.389 A:middle L:90%
about that. So it's actually the transport of data

772
00:45:38.400 --> 00:45:45.099 A:middle L:90%
that's more expensive, and

773
00:45:45.110 --> 00:45:45.880 A:middle L:90%
it's more of a technical

774
00:45:45.880 --> 00:45:50.099 A:middle L:90%
challenge. So when somebody wants to upload data to

775
00:45:50.099 --> 00:45:52.260 A:middle L:90%
VTechWorks, if they want to do something

776
00:45:52.260 --> 00:45:54.530 A:middle L:90%
very large, they have to basically upload it to

777
00:45:54.539 --> 00:46:00.800 A:middle L:90%
a different server through a file transfer protocol.

778
00:46:01.139 --> 00:46:02.619 A:middle L:90%
And there are secure ways of doing that. But

779
00:46:02.630 --> 00:46:06.199 A:middle L:90%
basically, they give them a

780
00:46:07.079 --> 00:46:08.849 A:middle L:90%
server address and they upload it. They log

781
00:46:08.849 --> 00:46:12.409 A:middle L:90%
into that space and upload it there,

782
00:46:12.780 --> 00:46:16.329 A:middle L:90%
just through a direct FTP address, or

783
00:46:16.329 --> 00:46:21.320 A:middle L:90%
SSH or something. But they don't

784
00:46:21.329 --> 00:46:22.300 A:middle L:90%
upload directly to VTechWorks if it's

785
00:46:22.300 --> 00:46:30.030 A:middle L:90%
very large, because basically their internet service provider

786
00:46:30.030 --> 00:46:31.849 A:middle L:90%
isn't going to handle that level of capacity through

787
00:46:31.849 --> 00:46:38.929 A:middle L:90%
the web. I'm kind of surprised how text-based

788
00:46:38.940 --> 00:46:44.300 A:middle L:90%
this whole thing is. Yeah. Hmm. I mean,

789
00:46:44.780 --> 00:46:47.269 A:middle L:90%
basically, people just write, okay,

790
00:46:47.269 --> 00:46:49.929 A:middle L:90%
well, here's my plan. Here's this, here's

791
00:46:49.929 --> 00:46:52.289 A:middle L:90%
that. I'm kind of surprised it's not a more

792
00:46:52.289 --> 00:46:57.300 A:middle L:90%
structured type of data, I don't know, in terms of

793
00:46:57.880 --> 00:47:00.210 A:middle L:90%
radio buttons, checkboxes, things like that. They're

794
00:47:00.219 --> 00:47:04.789 A:middle L:90%
a little bit easier, for instance, for whoever

795
00:47:05.480 --> 00:47:07.510 A:middle L:90%
to figure out a table. Well, this doesn't

796
00:47:07.519 --> 00:47:13.079 A:middle L:90%
the DMP Tool generates text that goes into

797
00:47:13.090 --> 00:47:16.210 A:middle L:90%
a supplementary section of your grant proposal, which they

798
00:47:16.219 --> 00:47:19.900 A:middle L:90%
receive as text. With the DMP Tool, when you

799
00:47:19.900 --> 00:47:21.929 A:middle L:90%
submit this, it doesn't go to NSF. It

800
00:47:21.929 --> 00:47:24.579 A:middle L:90%
goes to your downloads folder, and you

801
00:47:24.579 --> 00:47:28.130 A:middle L:90%
attach it to the grant as a supplementary

802
00:47:28.130 --> 00:47:30.250 A:middle L:90%
page to the grant. Well, still, the agency is

803
00:47:30.250 --> 00:47:35.699 A:middle L:90%
having to basically, what's that information? What?

804
00:47:36.579 --> 00:47:37.300 A:middle L:90%
Yeah, it seems like, I mean, it's

805
00:47:37.300 --> 00:47:40.199 A:middle L:90%
a huge burden on NSF and NIH or whatever.

806
00:47:40.880 --> 00:47:45.039 A:middle L:90%
Yes, it's a huge burden on producers as well.

807
00:47:45.050 --> 00:47:50.250 A:middle L:90%
Anyone who's on the student side around here

808
00:47:50.260 --> 00:47:52.960 A:middle L:90%
who has worked with folks trying to make sense of EndNote

808
00:47:52.960 --> 00:47:58.639 A:middle L:90%
and actually make those records conform to all the different

810
00:47:58.639 --> 00:48:00.949 A:middle L:90%
style guides. It's not EndNote's fault, it's not

811
00:48:00.960 --> 00:48:06.170 A:middle L:90%
the data's fault. It's just that there are a lot

812
00:48:06.170 --> 00:48:08.789 A:middle L:90%
of specifications that have to be made, and a lot

813
00:48:08.789 --> 00:48:13.050 A:middle L:90%
of people don't make them. And my sense is,

814
00:48:13.050 --> 00:48:14.530 A:middle L:90%
and this is really getting back to part of my

815
00:48:14.530 --> 00:48:19.989 A:middle L:90%
earlier question, that there are so many different questions out there

816
00:48:20.000 --> 00:48:22.670 A:middle L:90%
that what do you do with the things that are

817
00:48:22.679 --> 00:48:27.940 A:middle L:90%
outliers from the checkboxes, that the template doesn't

818
00:48:27.940 --> 00:48:32.059 A:middle L:90%
account for? But it's very hard to... You have five

819
00:48:32.070 --> 00:48:36.510 A:middle L:90%
radio buttons and then the "other" box, I think. I

820
00:48:36.510 --> 00:48:38.400 A:middle L:90%
would assume they feel it's easier to

821
00:48:38.400 --> 00:48:44.730 A:middle L:90%
let researchers put it in their own words than limiting

822
00:48:44.730 --> 00:48:47.380 A:middle L:90%
them to a certain set. Mhm. I guess

823
00:48:47.380 --> 00:48:51.869 A:middle L:90%
the disadvantage is a lot of people don't express what

824
00:48:51.880 --> 00:48:54.980 A:middle L:90%
they're doing very well, or they're vague or whatever. I

825
00:48:54.980 --> 00:48:59.530 A:middle L:90%
still don't understand where your data is coming from.

826
00:49:00.369 --> 00:49:07.210 A:middle L:90%
Yeah, you know, producer and

827
00:49:07.210 --> 00:49:09.869 A:middle L:90%
the reviewer, it ensures that money is kind of

828
00:49:09.880 --> 00:49:20.320 A:middle L:90%
people... what? Mm. He said, I think our

829
00:49:20.329 --> 00:49:22.750 A:middle L:90%
hour is just about up. Questions? I am,

830
00:49:22.989 --> 00:49:28.010 A:middle L:90%
yeah, we started really early. I'm only three

831
00:49:28.179 --> 00:49:34.789 A:middle L:90%
weeks early. Thanks for coming. Thank you.

832
00:49:35.469 --> 00:49:36.579 A:middle L:90%
Mhm. Mhm.

