Browsing by Author "van Reenen, Johann"
Now showing 1 - 4 of 4
Results Per Page
Sort Options
- Archiving the Relaxed Consistency WebXie, Zhiwu; Van de Sompel, Herbert; Liu, Jinyang; van Reenen, Johann; Jordan, Ramiro (ACM, 2013)The historical, cultural, and intellectual importance of archiving the web has been widely recognized. Today, all countries with high Internet penetration rate have established high-profile archiving initiatives to crawl and archive the fast-disappearing web content for long-term use. As web technologies evolve, established web archiving techniques face challenges. This paper focuses on the potential impact of the relaxed consistency web design on crawler driven web archiving. Relaxed consistent websites may disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in the web archives as historical records, such information will degrade the overall archival quality. To assess the extent of such quality degradation, we build a simplified feed-following application and simulate its operation with synthetic workloads. The results indicate that a non-trivial portion of a relaxed consistency web archive may contain observable inconsistency, and the inconsistency window may extend significantly longer than that observed at the data store. We discuss the nature of such quality degradation and propose a few possible remedies.
- Improving scalability by self-archivingXie, Zhiwu; Liu, Jinyang; Van de Sompel, Herbert; van Reenen, Johann; Jordan, Ramiro (ACM, 2011-06-13)The newer generation of web browsers supports the client-side database, making it possible to run the full web application stacks entirely in the web clients. Still, the server side database is indispensable as the central hub for exchanging persistent data between the web clients. Assuming this characterization, we propose a novel web application framework in which the server archives its database states at predefined periods then makes them available on the web. The clients then use these archives to synchronize their local databases. Although the main purpose is to reduce the database scalability bottleneck, this approach also promotes self-archiving and can be used for time traveling. We discuss the consistency properties provided by this framework, as well as the tradeoffs imposed.
- Poor Man's Social Network: Consistently Trade Freshness For ScalabilityXie, Zhiwu; Liu, Jinyang; Van de Sompel, Herbert; van Reenen, Johann; Jordan, Ramiro (USENIX Association, 2012-06)Typical social networking functionalities such as feed following are known to be hard to scale. Different from the popular approach that sacrifices consistency for scalability, in this paper we describe, implement, and evaluate a method that can simultaneously achieve scalability and consistency in feed following applications built on shared-nothing distributed systems. Timing and client-side processing are the keys to this approach. Assuming global time is available at all the clients and servers, the distributed servers publish a pre-agreed upon schedule based on which the continuously committed updates are periodically released for read. This opens up opportunities for caching and client-side processing, and leads to scalability improvements. This approach trades freshness for scalability. Following this approach, we build a twitter-style feed following application and evaluate it on a following network with about 200,000 users under synthetic workloads. The resulting system exhibits linear scalability in our experi-ment. With 6 low-end cloud instances costing a total of no more than $1.2 per hour, we recorded a peak timeline query rate at about 10 million requests per day, under a fixed update rate of 1.6 million new tweets per day. The maximum staleness of the responses is 5 seconds. The performance achieved sufficiently verifies the feasibility of this approach, and provides an alternative to build small to medium size social networking applications on the cheap.
- Web Archiving Inconsistency: A Research AgendaXie, Zhiwu; Van de Sompel, Herbert; Liu, Jinyang; van Reenen, Johann; Jordan, Ramiro (IEEE Technical Committee on Digital Libraries, 2015-10)Scaling web applications usually boils down to a tradeoff between consistency and latency. Very large web operations typically favor low latency, hence purposefully sacrifice strict consistency in the sense of serializability. By definition, the breakdown of serializability may cause the web applications to disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in the web archives as historical records, such information will degrade the overall archival quality. Despite its near omnipresent in the popular web, such relaxation in data consistency is not widely reported nor thoroughly studied by the web archiving community.