A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
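As a minimal sketch of checksum-based change detection of the kind described above, the following compares MD5 digests of two successive downloads of a page. It assumes the raw page bytes are available; the function names are illustrative and not taken from either study.

```python
import hashlib


def md5_checksum(page_bytes: bytes) -> str:
    """Return the hexadecimal MD5 digest of a downloaded page."""
    return hashlib.md5(page_bytes).hexdigest()


def page_changed(previous: bytes, current: bytes) -> bool:
    """Report a change whenever the two downloads differ in any byte."""
    return md5_checksum(previous) != md5_checksum(current)


if __name__ == "__main__":
    old = b"<html><body>Hello, world</body></html>"
    new = b"<html><body>Hello, world!</body></html>"
    print(page_changed(old, old))  # identical downloads
    print(page_changed(old, new))  # one character added
```

A test of this form is binary: it reports that a page changed but cannot distinguish a one-character edit from a complete rewrite.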
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page, and a feature vector of the words on the page, plus various other data such as the page length, the HTTP status code, etc. Moreover, we pseudo-randomly selected 0.1% of all of our URLs, and saved the full text of each download of the corresponding pages.
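The text does not say how the 0.1% of URLs were selected. One common way to obtain a pseudo-random but repeatable sample is to hash each URL and keep those falling into a fixed bucket, so that the same URL is selected on every weekly crawl. The sketch below illustrates that idea under this assumption; the hash choice, bucket count, and names are hypothetical, not the authors' method.

```python
import hashlib

BUCKETS = 1000        # assumption: 1000 buckets, so one bucket is 0.1%
SAMPLED_BUCKETS = 1   # keep URLs that land in bucket 0 only


def in_sample(url: str) -> bool:
    """Deterministically decide whether a URL belongs to the 0.1% sample."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    return bucket < SAMPLED_BUCKETS


if __name__ == "__main__":
    # Hypothetical URLs, used only to show the sampling rate.
    urls = [f"http://example.com/page/{i}" for i in range(100_000)]
    kept = sum(in_sample(u) for u in urls)
    print(f"kept {kept} of {len(urls)} URLs")
```

Because the decision depends only on the URL, no list of sampled pages needs to be stored between crawls.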
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with
