A Large-Scale Study of the Evolution of Web Pages - Microsoft


A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated with any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
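The checksum-based notion of change described above can be illustrated with a minimal sketch. It assumes that each download is available as raw bytes; the function names (`md5_checksum`, `has_changed`, `fraction_changed`) are hypothetical and are not taken from the study by Cho and Garcia-Molina.

```python
import hashlib


def md5_checksum(page_bytes: bytes) -> str:
    """Return the MD5 checksum of a downloaded page as a hex string."""
    return hashlib.md5(page_bytes).hexdigest()


def has_changed(previous: bytes, current: bytes) -> bool:
    """Count a page as changed if its checksum differs between two downloads.

    Any byte-level difference (even a changed timestamp or counter)
    flips the checksum, so this test is all-or-nothing.
    """
    return md5_checksum(previous) != md5_checksum(current)


def fraction_changed(snapshots: dict[str, tuple[bytes, bytes]]) -> float:
    """Fraction of URLs whose content changed between two downloads.

    `snapshots` maps a URL to a (previous, current) pair of page bodies.
    """
    if not snapshots:
        return 0.0
    changed = sum(1 for old, new in snapshots.values() if has_changed(old, new))
    return changed / len(snapshots)
```

A checksum comparison says only whether a page changed, not by how much, which is why a more sensitive measure is needed to distinguish small changes from large ones.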
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page and a feature vector of the words on the page, plus various other data such as the page length and the HTTP status code. Moreover, we pseudo-randomly selected 0.1% of all of our URLs, and saved the full text of each download of the corresponding pages.
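One way to select a pseudo-random 0.1% of URLs so that the same URLs are chosen on every weekly crawl is to hash each URL and keep those that fall into one bucket out of 1,000. The text does not say how the selection was actually made, so the sketch below is an assumption: the hash function (SHA-256), the bucket count, and the name `is_sampled` are all hypothetical.

```python
import hashlib

# Assumption: 1 bucket out of 1000 corresponds to the 0.1% sampling rate.
BUCKETS = 1000


def is_sampled(url: str, buckets: int = BUCKETS) -> bool:
    """Decide deterministically whether a URL belongs to the sample.

    The decision depends only on the URL, so a URL selected in one
    crawl is also selected in every later crawl.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and bucket it.
    return int.from_bytes(digest[:8], "big") % buckets == 0


def select_sample(urls: list[str]) -> list[str]:
    """Return the URLs whose full text would be saved on each download."""
    return [url for url in urls if is_sampled(url)]
```

Because the choice is a function of the URL alone, no list of selected URLs has to be stored or shared between crawls.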
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with


