Browsing by Author "Fox, Edward A."
Now showing 1 - 20 of 359
- 5SGraph: A Modeling Tool for Digital Libraries. Zhu, Qinwei (Virginia Tech, 2002-11-18). The high demand for building digital libraries by non-experts requires a simplified modeling process and rapid generation of digital libraries. To enable rapid generation, digital libraries should be modeled with descriptive languages. A visual modeling tool would be helpful to non-experts so they may model a digital library without knowing the theoretical foundations and the syntactical details of the descriptive language. In this thesis, we describe the design and implementation of a domain-specific visual modeling tool, 5SGraph, aimed at modeling digital libraries. 5SGraph is based on a metamodel that describes digital libraries using the 5S theory. The output from 5SGraph is a digital library model that is an instance of the metamodel, expressed in the 5S description language (5SL). 5SGraph presents the metamodel in a structured toolbox, and provides a top-down visual building environment for designers. The visual proximity of the metamodel and instance model facilitates requirements gathering and simplifies the modeling process. Furthermore, 5SGraph maintains semantic constraints specified by the 5S metamodel and enforces these constraints over the instance model to ensure semantic consistency and correctness. 5SGraph enables component reuse to reduce the time and effort of designers. The results from a pilot usability test confirm the usefulness of 5SGraph.
- 5SL: A Language for Declarative Specification and Generation of Digital Libraries. Goncalves, Marcos A.; Fox, Edward A. (2002-07-01). Digital Libraries (DLs) are among the most complex kinds of information systems, due in part to their intrinsic multi-disciplinary nature. Nowadays DLs are built within monolithic, tightly integrated, and generally inflexible systems, or by assembling disparate components together in an ad-hoc way, with resulting problems in interoperability and adaptability. More importantly, conceptual modeling, requirements analysis, and software engineering approaches are rarely supported, making it extremely difficult to tailor DL content and behavior to the interests, needs, and preferences of particular communities. In this paper, we address these problems. In particular, we present 5SL, a declarative language for specifying and generating domain-specific digital libraries. 5SL is based on the 5S formal theory for digital libraries and enables high-level specification of DLs in five complementary dimensions, including: the kinds of multimedia information the DL supports (Stream Model); how that information is structured and organized (Structural Model); different logical and presentational properties and operations of DL components (Spatial Model); the behavior of the DL (Scenario Model); and the different societies of actors and managers of services that act together to carry out the DL behavior (Societal Model). The practical feasibility of the approach is demonstrated by the presentation of a 5SL digital library generator for the MARIAN digital library system.
- ACM Venue Recommender System. Kodur Kumar, Harinni (Virginia Tech, 2020-06-17). A frequent goal of a researcher is to publish his/her work in appropriate conferences and journals. With a large number of options for venues in the microdomains of every research discipline, the task of selecting suitable venues for publication should not be underestimated. Further, the venues diversify themselves in the form of workshops, symposiums, and challenges. Several publishers, such as IEEE and Springer, have recognized the need to address this issue and have developed journal recommenders. In this thesis, our goal is to design and develop a similar recommendation system for the ACM dataset. We view this recommendation problem from a classification perspective. With the success of deep learning classifiers in recent times and their pervasiveness in several domains, we modeled several 1D convolutional neural network classifiers for the different venues. When given submission information about a paper, such as its title, keywords, and abstract, the recommender uses the predictions of these classifiers to recommend suitable venues to the user. The dataset used for the project is the ACM Digital Library metadata, which includes textual information for research papers and journal articles submitted to various conferences and journals over the past 60 years. We developed the recommender based on two approaches: 1) a binary CNN classifier per venue (single classifiers), and 2) group CNN classifiers for venue groups (group classifiers). Our system achieved a MAP of 0.55 and 0.51 for single and group classifiers, respectively. We also show that our system has a high recall rate.
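The per-venue binary classifier idea above (1D convolutions over token embeddings, then a binary "does this paper fit venue X" score) can be sketched as follows. The network sizes, random weights, and function names here are hypothetical illustrations, not the thesis's actual trained model:

```python
import numpy as np

def conv1d_text_scores(embeddings, filters):
    """Slide each filter over the token-embedding sequence and max-pool."""
    seq_len, dim = embeddings.shape
    n_filters, width, _ = filters.shape
    pooled = np.empty(n_filters)
    for f in range(n_filters):
        acts = [np.sum(embeddings[i:i + width] * filters[f])
                for i in range(seq_len - width + 1)]
        pooled[f] = max(acts)            # max-over-time pooling
    return pooled

def predict_venue(embeddings, filters, w, b):
    """Binary per-venue score via a logistic output unit."""
    z = conv1d_text_scores(embeddings, filters) @ w + b
    return 1.0 / (1.0 + np.exp(-z))      # probability-like score in [0, 1]

rng = np.random.default_rng(0)
tokens = rng.normal(size=(12, 16))       # 12 tokens, 16-dim embeddings
filters = rng.normal(size=(4, 3, 16))    # 4 convolution filters of width 3
w, b = rng.normal(size=4), 0.0
score = predict_venue(tokens, filters, w, b)
print(round(float(score), 3))
```

A recommender in this style would run one such classifier per venue and return the venues with the highest scores.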
- Advances in aircraft design: multiobjective optimization and a markup language. Deshpande, Shubhangi Govind (Virginia Tech, 2014-01-23). Today's modern aerospace systems exhibit strong interdisciplinary coupling and require a multidisciplinary, collaborative approach. Analysis methods that were once considered feasible only for advanced and detailed design are now available and even practical at the conceptual design stage. This changing philosophy for conducting conceptual design poses additional challenges beyond those encountered in a low fidelity design of aircraft. This thesis takes some steps towards bridging the gaps in existing technologies and advancing the state-of-the-art in aircraft design. The first part of the thesis proposes a new Pareto front approximation method for multiobjective optimization problems. The method employs a hybrid optimization approach using two derivative free direct search techniques, and is intended for solving blackbox simulation based multiobjective optimization problems with possibly nonsmooth functions, where the analytical form of the objectives is not known and/or the evaluation of the objective function(s) is very expensive (very common in multidisciplinary design optimization). A new adaptive weighting scheme is proposed to convert a multiobjective optimization problem to a single objective optimization problem. Results show that the method achieves an arbitrarily close approximation to the Pareto front with a good collection of well-distributed nondominated points. The second part deals with the interdisciplinary data communication issues involved in a collaborative multidisciplinary aircraft design environment. Efficient transfer, sharing, and manipulation of design and analysis data in a collaborative environment demands a formal structured representation of data. XML, a W3C recommendation, is one such standard, concomitant with a number of powerful capabilities that alleviate interoperability issues.
A compact, generic, and comprehensive XML schema for an aircraft design markup language (ADML) is proposed here to provide a common language for data communication, and to improve efficiency and productivity within a multidisciplinary, collaborative environment. An important feature of the proposed schema is the very expressive and efficient low-level schemata. As a proof of concept, the schema is used to encode an entire Convair B-58. As the complexity of models and the number of disciplines increases, the reduction in effort to exchange data models and analysis results in ADML also increases.
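The weighted-sum conversion of a multiobjective problem into a single objective, the general idea behind the scalarization mentioned above, can be sketched as follows. The thesis proposes an *adaptive* weighting scheme; the fixed weight sweep and toy objectives here are only an illustration:

```python
def scalarize(objectives, weights):
    """Combine objective values into one scalar to minimize."""
    return sum(w * f for w, f in zip(weights, objectives))

def f1(x):  # hypothetical objective 1
    return (x - 1.0) ** 2

def f2(x):  # hypothetical objective 2
    return (x + 1.0) ** 2

# Sweeping the weight vector traces out candidate Pareto-optimal points.
pareto_xs = []
for w1 in [0.1, 0.3, 0.5, 0.7, 0.9]:
    w2 = 1.0 - w1
    # Minimize w1*f1 + w2*f2 over a coarse grid (a stand-in for the
    # derivative-free direct search the thesis actually employs).
    best_x = min((x * 0.01 for x in range(-200, 201)),
                 key=lambda x: scalarize((f1(x), f2(x)), (w1, w2)))
    pareto_xs.append(round(best_x, 2))
print(pareto_xs)
```

For these convex objectives the minimizer of the weighted sum is 2*w1 - 1, so the sweep recovers a spread of points between the two single-objective optima.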
- AlcoZone: An Adaptive Hypermedia based Personalized Alcohol Education. Bhosale, Devdutta (Virginia Tech, 2006-05-08). In our knowledge based economy, demand for better and more effective learning has led to innovative instructional technologies. However, the one-size-fits-all approach taken by many e-Learning systems is not adequate for the differing requirements of people who have different goals, preferences, and previous knowledge about a subject. Many e-Learning systems have approached this problem with personalized and customized content. However, many of these systems are closely tied to the one particular subject that they are trying to teach; authoring courses on different subjects using the same framework is a difficult process. Adaptive Hypermedia is an approach in which content presentation and navigation assistance are personalized depending on the requirements of the user. The user requirements are represented using a user model, while the content is represented using a content model. By using a set of algorithms, an Adaptive Hypermedia based system is able to select the most appropriate content to present as the user interacts with the system. The objective of AlcoZone is to provide alcohol education to all of the 5,000 freshman students of Virginia Tech using Adaptive Hypermedia technology, as part of the mandatory university requirement. The course presents different content to different students based on their drinking pattern. AlcoZone integrates Curriculum Sequencing, Multimedia and Interactivity, Alternate Content Explanation, and Navigational Assistance to make the course interesting for students. This research investigates the design and implementation of AlcoZone and its Adaptive Hypermedia based reusable framework for course creation and delivery.
- The AlgoViz Project: Building an Algorithm Visualization Web Community. Alon, Alexander Joel Dacara (Virginia Tech, 2010-07-15). Algorithm visualizations (AVs) have become a popular teaching aid in classes on algorithms and data structures. The AlgoViz Project attempts to provide an online venue for educators, students, developers, researchers, and other AV users. The Project comprises two websites. The first, the AlgoViz Portal, provides two major informational resources: an AV catalog that provides both descriptive and evaluative metadata of indexed visualizations, and an annotated bibliography of research literature. Both resources have over 500 entries and are actively updated by the AV community. The Portal also provides field reports, discussion forums, and other community-building mechanisms. The second website, OpenAlgoViz, is a SourceForge site intended to showcase exemplary AVs, as well as provide logistical and hosting support to AV developers.
- Analysis and Modeling of World Wide Web Traffic. Abdulla, Ghaleb (Virginia Tech, 1998-04-27). This dissertation deals with monitoring, collecting, analyzing, and modeling of World Wide Web (WWW) traffic and client interactions. The rapid growth of WWW usage has not been accompanied by an overall understanding of models of information resources and their deployment strategies. Consequently, the current Web architecture often faces performance and reliability problems. Scalability, latency, bandwidth, and disconnected operations are some of the important issues that should be considered when attempting to adjust for the growth in Web usage. The WWW Consortium launched an effort to design a new protocol that will be able to support future demands. Before doing that, however, we need to characterize current users' interactions with the WWW and understand how it is being used. We focus on proxies since they provide a good medium for caching, filtering information, payment methods, and copyright management. We collected proxy data from our environment over a period of more than two years. We also collected data from other sources such as schools, information service providers, and commercial sites. Sampling times range from days to years. We analyzed the collected data looking for important characteristics that can help in designing a better HTTP protocol. We developed a modeling approach that considers Web traffic characteristics such as self-similarity and long-range dependency. We developed an algorithm to characterize users' sessions. Finally, we developed a high-level Web traffic model suitable for sensitivity analysis. As a result of this work we developed statistical models of parameters such as arrival times, file sizes, file types, and locality of reference. We describe an approach to model long-range dependent Web traffic, and we characterize the activities of users accessing a digital library courseware server or Web search tools.
Temporal and spatial locality of reference within examined user communities is high, so caching can be an effective tool to help reduce network traffic and to help solve the scalability problem. We recommend utilizing our findings to promote a smart distribution or push model to cache documents when there is likelihood of repeat accesses.
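Why high temporal locality makes caching effective, as the dissertation's findings suggest, can be illustrated with a small LRU cache simulation. The request trace and cache size below are hypothetical; a real proxy would cache documents keyed by URL:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)        # refresh recency on a hit
            self.hits += 1
        else:
            self.misses += 1
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict least recently used
            self.store[key] = True

# A trace with strong temporal locality: a few popular documents dominate.
trace = ["a", "b", "a", "c", "a", "b", "d", "a", "b", "c", "a", "e", "a", "b"]
cache = LRUCache(capacity=3)
for url in trace:
    cache.get(url)
hit_rate = cache.hits / len(trace)
print(round(hit_rate, 2))
```

Even a cache holding only 3 of the 5 distinct documents serves half of this skewed trace from memory, which is the effect that makes proxy caching pay off under high locality of reference.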
- Analysis of Moving Events Using Tweets. Patil, Supritha Basavaraj (Virginia Tech, 2019-07-02). The Digital Library Research Laboratory (DLRL) has collected over 3.5 billion tweets on different events for the Coordinated, Behaviorally-Aware Recovery for Transportation and Power Disruptions (CBAR-tpd), the Integrated Digital Event Archiving and Library (IDEAL), and the Global Event Trend Archive Research (GETAR) projects. The tweet collection topics include heart attack, solar eclipse, terrorism, etc. There are several collections on naturally occurring events such as hurricanes, floods, and solar eclipses. Such naturally occurring events are distributed across space and time. It would be beneficial to researchers if we could perform a spatial-temporal analysis to test some hypotheses, and to find any trends that tweets would reveal for such events. I apply an existing algorithm to detect locations from tweets, modifying it to work better with the type of datasets I work with. I use the time captured in tweets and also identify the tense of the sentences in tweets to perform the temporal analysis. I build a rule-based model for obtaining the tense of a tweet. The results from these two algorithms are merged to analyze naturally occurring moving events such as solar eclipses and hurricanes. Using the spatial-temporal information from tweets, I study whether tweets can be a relevant source of information for understanding the movement of an event. I create visualizations to compare the actual path of the event with the information extracted by my algorithms. After examining the results from the analysis, I noted that Twitter can be a reliable source for identifying places affected by moving events almost immediately. The locations obtained are at a more detailed level than in newswires. We can also identify, by date, the time that an event affected a particular region.
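A rule-based tense tagger of the kind described above can be sketched with simple keyword rules. The marker lists and rules below are illustrative guesses, not the actual rules developed in the thesis:

```python
# Hypothetical marker lists; a real model would use many more rules,
# verb morphology, and auxiliary-verb patterns.
PAST_MARKERS = {"was", "were", "hit", "struck", "passed", "ended"}
FUTURE_MARKERS = {"will", "going", "expected", "tomorrow", "forecast"}

def tag_tense(tweet):
    """Classify a tweet as past, present, or future by keyword markers."""
    words = {w.strip(".,!?").lower() for w in tweet.split()}
    if words & FUTURE_MARKERS:
        return "future"
    if words & PAST_MARKERS:
        return "past"
    return "present"   # default when no marker fires

print(tag_tense("The eclipse will reach Oregon at 10:15"))
print(tag_tense("The hurricane hit the coast last night"))
print(tag_tense("Heavy rain is falling downtown"))
```

Merging such tense labels with detected locations is what lets tweets be placed along an event's space-time path.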
- Analyzing and Navigating Electronic Theses and Dissertations. Ahuja, Aman (Virginia Tech, 2023-07-21). Electronic Theses and Dissertations (ETDs) contain scholarly information that can be of immense value to the scholarly community. Millions of ETDs are now publicly available online, often through one of many digital libraries. However, since a majority of these digital libraries are institutional repositories whose primary objective is content archiving, they often lack the end-user services needed to make this valuable data useful for the scholarly community. To effectively utilize such data to address the information needs of users, digital libraries should support various end-user services such as document search and browsing and document recommendation, as well as services to make navigation of long PDF documents easier. In recent years, with advances in the field of machine learning for text data, several techniques have been proposed to support such end-user services. However, limited research has been conducted towards integrating such techniques with digital libraries. This research is aimed at building tools and techniques for discovering and accessing the knowledge buried in ETDs, as well as at supporting end-user services for digital libraries, such as document browsing and long document navigation. First, we review several machine learning models that can be used to support such services. Next, to support a comprehensive evaluation of different models, as well as to train models that are tailored to the ETD data, we introduce several new datasets from the ETD domain. To minimize the resources required to develop the high quality training datasets needed for supervised training, a novel AI-aided annotation method is also discussed. Finally, we propose techniques and frameworks to support various digital library services such as search, browsing, and recommendation.
The key contributions of this research are as follows:
  - A system to help with parsing long scholarly documents such as ETDs by means of object-detection methods trained to extract digital objects from long documents. The parsed documents can be used for further downstream tasks such as long document navigation, figure and/or table search, etc.
  - Datasets to support supervised training of object detection models on scholarly documents of multiple types, such as born-digital and scanned. In addition to manually annotated datasets, a framework (along with the resulting dataset) for AI-aided annotation is also proposed.
  - A web-based system for information extraction from long PDF theses and dissertations, into a structured format such as XML, aimed at making scholarly literature more accessible to users with disabilities.
  - A topic-modeling based framework to support exploration tasks such as searching and/or browsing documents (and document portions, e.g., chapters) by topic, document recommendation, topic recommendation, and describing temporal topic trends.
- Analyzing Networks with Hypergraphs: Detection, Classification, and Prediction. Alkulaib, Lulwah Ahmad KH M. (Virginia Tech, 2024-04-02). Recent advances in large graph-based models have shown great performance in a variety of tasks, including node classification, link prediction, and influence modeling. However, these graph-based models struggle to capture high-order relations and interactions among entities effectively, leading them to underperform in many real-world scenarios. This thesis focuses on analyzing networks using hypergraphs for detection, classification, and prediction methods in social media-related problems. In particular, we study five specific applications with five proposed novel methods: detecting topic-specific influential users and tweets via hypergraphs; detecting spatiotemporal, topic-specific, influential users and tweets using hypergraphs; augmenting data in hypergraphs to mitigate class imbalance issues; introducing a novel hypergraph convolutional network model designed for the multiclass classification of mental health advice in Arabic tweets; and adapting a hypergraph convolutional network for multilingual sarcasm detection. For the first method, existing solutions for influential user detection did not consider topics, which could produce incorrect results and inadequate performance on that task. The proposed contributions of our work include: 1) Developing a hypergraph framework that detects influential users and tweets. 2) Proposing an effective topic modeling method for short texts. 3) Performing extensive experiments to demonstrate the efficacy of our proposed framework. For the second method, we extend the first method by incorporating spatiotemporal information into our solution. Existing influencer detection methods do not consider spatiotemporal influencers in social media, although influence can be greatly affected by geolocation and time. The contributions of our work for this task include: 1) Proposing a hypergraph framework that spatiotemporally detects influential users and tweets.
2) Developing an effective topic modeling method for short texts that geographically provides the topic distribution. 3) Designing a spatiotemporal topic-specific influencer user ranking algorithm. 4) Performing extensive experiments to demonstrate the efficacy of our proposed framework. For the third method, we address the challenge of bot detection on the social media platform X, where there is an inherent imbalance between genuine users and bots, a key factor leading to biased classifiers. Our approach leverages the rich structure of hypergraphs to represent X users and their interactions, providing a novel foundation for effective bot detection. The contributions of our work include: 1) Introducing a hypergraph representation of the X platform, where user accounts are nodes and their interactions form hyperedges, capturing the intricate relationships between users. 2) Developing HyperSMOTE to generate synthetic bot accounts within the hypergraph, ensuring a balanced training dataset while preserving the hypergraph's structure and semantics. 3) Designing a hypergraph neural network specifically for bot detection, utilizing node and hyperedge information for accurate classification. 4) Conducting comprehensive experiments to validate the effectiveness of our methods, particularly in scenarios with pronounced class imbalances. For the fourth method, we introduce a Hypergraph Convolutional Network model for classifying mental health advice in Arabic tweets. Our model distinguishes between valid and misleading advice, leveraging high-order word relations in short texts through hypergraph structures. Our extensive experiments demonstrate its effectiveness over existing methods. The key contributions of our work include: 1) Developing a hypergraph-based model for short text multiclass classification, capturing complex word relationships through hypergraph convolution.
2) Defining four types of hyperedges to encapsulate local and global contexts and semantic similarities in our dataset. 3) Conducting comprehensive experiments in which the proposed model outperforms several baseline models in classifying Arabic tweets, demonstrating its superiority. For the fifth method, we extended our previous Hypergraph Convolutional Network (HCN) model and tailored it for sarcasm detection across multiple low-resource languages. Our model excels in interpreting the subtle and context-dependent nature of sarcasm in short texts by exploiting the power of hypergraph structures to capture complex, high-order relationships among words. Through the construction of three hyperedge types, our model navigates the intricate semantic and sentiment differences that characterize sarcastic expressions. The key contributions of our research are as follows: 1) Adapting a hypergraph-based model for sarcasm detection in short texts across five low-resource languages, allowing the model to capture semantic relationships and contextual cues through advanced hypergraph convolution techniques. 2) Introducing a comprehensive framework for constructing hyperedges, incorporating short text, semantic similarity, and sentiment discrepancy hyperedges, which together enrich the model's ability to understand and detect sarcasm across diverse linguistic contexts. 3) Demonstrating through extensive evaluations that the proposed hypergraph model significantly outperforms a range of established baseline methods in multilingual sarcasm detection, establishing new benchmarks for accuracy and generalizability in detecting sarcasm within low-resource languages.
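The core operation behind the hypergraph models above, a convolution step over an incidence matrix, can be sketched in a few lines. This follows a common HGNN-style propagation rule over node and hyperedge degrees; the thesis's actual models may differ in details, and the toy hyperedges here are fabricated for illustration:

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One propagation step: node -> hyperedge -> node, then transform."""
    Dv = np.diag(H.sum(axis=1))                 # node degrees
    De = np.diag(H.sum(axis=0))                 # hyperedge degrees
    Dv_inv_sqrt = np.linalg.inv(np.sqrt(Dv))
    De_inv = np.linalg.inv(De)
    # Normalized two-step diffusion through the hyperedges.
    A = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta, 0.0)       # ReLU activation

# 4 nodes, 2 hyperedges: {0,1,2} and {2,3} (e.g., users co-tweeting a topic).
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]], dtype=float)
X = np.eye(4)        # one-hot node features
Theta = np.eye(4)    # identity weight matrix, for illustration only
out = hypergraph_conv(X, H, Theta)
print(out.shape)
```

Because a hyperedge connects all of its member nodes at once, one such step mixes information within whole groups, which is what lets these models capture the high-order relations that ordinary pairwise graphs miss.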
- Apache Solr: Indexing and Searching. Sethi, Iccha; Aslan, Serdar; Fox, Edward A. (2010-10-26). This module addresses the basic concepts of the open source Apache Solr platform that is specifically designed for indexing documents and executing searches.
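The index-then-search cycle that Solr automates at scale can be illustrated with a toy inverted index. This is a conceptual sketch only, not Solr's API; the documents are made up:

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND-search: documents containing every query term."""
    postings = [index.get(t.lower(), set()) for t in query.split()]
    return set.intersection(*postings) if postings else set()

docs = {
    1: "indexing documents with Apache Solr",
    2: "executing searches over indexed documents",
    3: "digital library services",
}
index = build_index(docs)
print(sorted(search(index, "documents")))
```

Solr adds tokenization, stemming, ranking, faceting, and distributed operation on top of this basic structure.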
- Application Software. Yang, Seungwon (2009-10-07). This module covers commonly used application software that is specifically designed for the creation and development of digital library (DL) systems and similar types of collections and services, such as open access archives.
- Applying GIS and Text Mining Methods to Twitter Data to Explore the Spatiotemporal Patterns of Topics of Interest in Kuwait. G. Almatar, Muhammad; Alazmi, Huda S.; Li, Liuqing; Fox, Edward A. (MDPI, 2020-11-25). Researchers have developed various approaches for exploring the spatial information, temporal patterns, and Twitter content in topics of interest in order to generate a better understanding of human behavior; however, few investigations have integrated these three dimensions simultaneously. This study analyzes the content of tweets to conduct a spatiotemporal exploration of the main topics of interest in Kuwait, providing a deeper understanding of the topics people think about, when they think about them, and where they tweet about them. To this end, we collect, process, and analyze tweets from nearly 120 areas in Kuwait over a 10-month period. The study’s results indicate that religion, emotions, education, and public policy are the most popular topics of interest in Kuwait. Regarding the spatiotemporal analysis, people post more tweets regarding religion on Fridays, a holy day for Muslims in Kuwait. Moreover, people are more likely to tweet about policy and education on weekdays rather than weekends. In contrast, people tweet about emotional expressions more often on weekends. From the spatial perspective, spatial clustering in topics occurs across the days of the week. The findings are applicable to further topic analysis and similar research in other countries.
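The kind of temporal aggregation behind findings like "more religion tweets on Fridays" can be sketched as counting topic mentions per weekday. The tweets, timestamps, and topic terms below are fabricated examples, not the study's data:

```python
from collections import Counter
from datetime import datetime

def weekday_topic_counts(tweets, topic_terms):
    """Count tweets mentioning any topic term, grouped by weekday."""
    counts = Counter()
    for timestamp, text in tweets:
        if any(term in text.lower() for term in topic_terms):
            day = datetime.fromisoformat(timestamp).strftime("%A")
            counts[day] += 1
    return counts

tweets = [
    ("2020-01-03T09:00:00", "Friday prayer reflections"),    # a Friday
    ("2020-01-03T12:30:00", "grateful after prayer today"),  # a Friday
    ("2020-01-06T08:00:00", "school starts again"),          # a Monday
]
counts = weekday_topic_counts(tweets, ["prayer"])
print(counts["Friday"])
```

In the study, topic labels come from text mining rather than a keyword list, and the counts are further broken down by the roughly 120 areas for the spatial dimension.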
- Applying the 5S Framework To Integrating Digital Libraries. Shen, Rao (Virginia Tech, 2006-04-17). We formalize the digital library (DL) integration problem and propose an overall approach based on the 5S (Streams, Structures, Spaces, Scenarios, and Societies) framework. We then apply that framework to integrate domain-specific (archaeological) DLs, illustrating our solutions for key problems in DL integration. An integrated Archaeological DL, ETANA-DL, is used as a case study to justify and evaluate our DL integration approach. We develop a minimum metamodel for archaeological DLs within the 5S theory. We implement the 5SSuite toolkit set to cover the process of union DL generation, including requirements gathering, conceptual modeling, rapid prototyping, and code generation. 5SSuite consists of 5SGraph, 5SGen, and SchemaMapper, which plays an important role during integration. SchemaMapper, a visual mapping tool, maps the schema of diverse DLs into a global schema for a union DL and generates a wrapper for each individual DL. Each wrapper transforms the metadata catalog of its DL to one conforming to the global schema. The converted catalogs are stored in the union catalog, so that the union DL has a global metadata format and union catalog. We also propose a formal approach to DL exploring services for integrated DLs based on 5S, which provides a systematic and functional method to design and implement DL exploring services. Finally, we propose a DL success model to assess integrated DLs from the perspective of DL end users by integrating 5S theory with diverse research on information systems success and adoption models, and information-seeking behavior models.
- Arabic News Text Classification and Summarization: A Case of the Electronic Library Institute SeerQ (ELISQ). Kan'an, Tarek Ghaze (Virginia Tech, 2015-07-21). Arabic news articles in heterogeneous electronic collections are difficult for users to work with. Two problems are: that they are not categorized in a way that would aid browsing, and that there are no summaries or detailed metadata records that could be easier to work with than full articles. To address the first problem, schema mapping techniques were adapted to construct a simple taxonomy for Arabic news stories that is compatible with the subject codes of the International Press Telecommunications Council. So that each article would be labeled with the proper taxonomy category, automatic classification methods were researched to identify the most appropriate one. Experiments showed that the best features to use in classification resulted from a new tailored stemming approach (i.e., a new Arabic light stemmer called P-Stemmer). When coupled with binary classification using SVM, the newly developed approach proved to be superior to state-of-the-art techniques. To address the second problem, i.e., summarization, preliminary work was done with English corpora. This was in the context of a new Problem Based Learning (PBL) course wherein students produced template summaries of big text collections. The techniques used in the course were extended to work with Arabic news. Due to the lack of high quality tools for Named Entity Recognition (NER) and topic identification for Arabic, two new tools were constructed: RenA, for Arabic NER, and ALDA, an Arabic topic extraction tool (using Latent Dirichlet Allocation). Controlled experiments with each of RenA and ALDA, involving Arabic speakers and a randomly selected corpus of 1000 Qatari news articles, showed the tools produced very good results (i.e., names, organizations, locations, and topics).
Then the categorization, NER, topic identification, and additional information extraction techniques were combined to produce approximately 120,000 summaries for Qatari news articles, which are searchable, along with the articles, using LucidWorks Fusion, which builds upon Solr software. Evaluation of the summaries showed high ratings based on the 1000-article test corpus. Contributions of this research with Arabic news articles thus include a new: test corpus, taxonomy, light stemmer, classification approach, NER tool, topic identification tool, and template-based summarizer – all shown through experimentation to be highly effective.
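Light stemming, the general technique behind the P-Stemmer mentioned above, strips common affixes rather than reducing words to morphological roots. The following sketch uses made-up, Latin-transliterated affix lists purely for illustration; the actual P-Stemmer's rules for Arabic script are more elaborate:

```python
# Hypothetical affix lists, transliterated for readability; not the
# actual P-Stemmer rule set.
PREFIXES = ["al", "wal", "bal"]      # definite-article-style prefixes
SUFFIXES = ["at", "an", "in", "ha"]  # common nominal endings

def light_stem(word, min_stem=3):
    """Strip at most one prefix and one suffix, keeping a minimum stem."""
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_stem:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_stem:
            word = word[:-len(s)]
            break
    return word

print(light_stem("walkitab"))   # prefix "wal" stripped
print(light_stem("maktabat"))   # suffix "at" stripped
```

Because light stemming conflates surface variants of the same word without over-merging distinct roots, it tends to produce better features for classification than aggressive root stemming, which is consistent with the experimental finding above.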
- Architecting a Cloud-native Data Analysis Application for ETDs. Chen, Yinlin; Fox, Edward A. (2018). In this paper, we present a Cloud-native data analysis application and its architecture. This application was developed for librarians to explore useful information from the ETDs preserved in the Virginia Tech digital repository, VTechWorks. We realized the Cloud-native concept with a serverless architecture using microservices and managed services as the backend, and deployed the entire application on Amazon Web Services (AWS). We detail our architecture strategies, the decisions we made, and the best practices we followed. Furthermore, we share the lessons learned and the cloud benefits we have gained. We believe that our proposed approach could be adopted by other ETD systems, e.g., NDLTD, and could be of benefit to the broader community.
- An Architecture for Collaborative Math and Science Digital Libraries. Krowne, Aaron Phillip (Virginia Tech, 2003-07-19). In this thesis I present Noosphere, a system for the collaborative production of digital libraries. Further, I describe the special features of Noosphere which allow it to support mathematical and scientific content, and how it applies an encyclopedic organizational style. I also describe how Noosphere frees the digital library maintainer from a heavy administrative burden by implementing the design pattern of zero content administration. Finally, I discuss evidence showing that Noosphere works and is sustainable, in both the a priori and empirical senses.
- An Architecture for Multischeming in Digital Libraries. Krowne, Aaron; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2003). In this paper we discuss the problem of concurrently handling many classification schemes within a single digital library, which we term multischeming. We discuss how to represent which category describes an object in the digital library in this system, as well as the workings of the browsing process performed by the user. We motivate this problem as related to digital library interoperability, and propose an architecture for representing classification schemes in the digital library which solves the problem. We also discuss its implementation in the CITIDEL project.
- Architecture of an Object-Oriented Expert System for Composite Document Analysis, Representation, and Retrieval. Fox, Edward A.; France, Robert K. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1986-04-01). The CODER project is a multi-year effort to investigate how best to apply artificial intelligence methods to increase the effectiveness of information retrieval systems when handling collections of composite documents. In order to ensure system adaptability and to allow reconfiguration for controlled experimentation, the project has been designed as an expert system. The use of individually tailored specialist experts, coupled with standardized blackboard modules for communication and internal and external knowledge bases for effective knowledge management, allows for quick prototyping, incremental development, and flexibility under change. The system as a whole is structured as a set of communicating modules, designed under an object-oriented paradigm and implemented under UNIX™ using pipes and the TCP/IP protocol. Inferential modules are being coded in MU-Prolog; non-inferential modules are being prototyped in MU-Prolog and will be re-implemented as needed in C++.
- Are Repositories Impeding Big Data Reuse? Xie, Zhiwu; Galad, Andrej; Chen, Yinlin; Fox, Edward A. (Virginia Tech, 2016-06-14). In this intentionally provocative presentation, we question the scalability of popular digital repositories and whether they are suitable for big data reuse. Are the layers of API these repositories have painted over file system primitives necessary? How essential is it for the repository to insist on being the sole manager of the content, and arranging files in ways to prevent access other than from their own APIs? We explore these questions from the perspective of big data reuse, and describe controlled reuse experiments against Fedora 4 to evaluate the cost of these practices.