A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
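The change test used by Cho and Garcia-Molina is binary: a page either has the same MD5 checksum as on its previous download or it does not. Below is a minimal Python sketch of that test. It assumes each crawl is held in memory as a dict mapping URL to page body. The function names are hypothetical, and this is not the original study's code.

```python
import hashlib


def md5_checksum(body: bytes) -> str:
    """Return the hex MD5 digest of a downloaded page body."""
    return hashlib.md5(body).hexdigest()


def has_changed(previous_body: bytes, current_body: bytes) -> bool:
    """A page counts as changed if its MD5 checksum differs between downloads.

    Any byte-level difference (a timestamp, a single space) flips the
    checksum, so this test says nothing about how much a page changed.
    """
    return md5_checksum(previous_body) != md5_checksum(current_body)


def fraction_changed(crawl_a: dict[str, bytes], crawl_b: dict[str, bytes]) -> float:
    """Fraction of URLs present in both crawls whose checksum changed."""
    common = crawl_a.keys() & crawl_b.keys()
    if not common:
        return 0.0
    changed = sum(has_changed(crawl_a[u], crawl_b[u]) for u in common)
    return changed / len(common)


if __name__ == "__main__":
    day1 = {"http://example.com/a": b"<html>hello</html>",
            "http://example.com/b": b"<html>static</html>"}
    day2 = {"http://example.com/a": b"<html>hello!</html>",
            "http://example.com/b": b"<html>static</html>"}
    print(fraction_changed(day1, day2))  # 0.5: page a changed, page b did not
```

Any byte-level edit flips the checksum. As a result, this test reports *that* a page changed but not *by how much*. That is the sensitivity limitation this paper sets out to address.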
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page and a feature vector of the words on the page, plus various other data such as the page length and the HTTP status code. Moreover, we pseudo-randomly selected 0.1% of all of our URLs and saved the full text of each download of the corresponding pages.
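The per-download data described above can be pictured as one small record per URL per week. The Python sketch below illustrates this under stated assumptions and is not the study's implementation. The following are all hypothetical choices:

- the field names;
- the bag-of-words `Counter` used as a stand-in for the word feature vector;
- the hash-based rule for choosing the 0.1% full-text sample;
- the Jaccard measure for comparing two downloads.

```python
import hashlib
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical sampling rate: keep full text for about 1 URL in 1000 (0.1%).
SAMPLE_MODULUS = 1000


@dataclass
class DownloadRecord:
    """Data kept for one weekly download of one URL (illustrative fields)."""
    url: str
    week: int
    http_status: int
    length: int                      # page length in bytes
    checksum: str                    # MD5 hex digest of the page body
    words: Counter                   # bag-of-words stand-in for the feature vector
    full_text: Optional[str] = None  # stored only for sampled URLs


def in_full_text_sample(url: str) -> bool:
    """Pseudo-randomly but deterministically select ~0.1% of URLs.

    Hashing the URL (instead of calling random()) puts the same URLs in the
    sample every week, so every download of a sampled page is kept.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SAMPLE_MODULUS == 0


def make_record(url: str, week: int, status: int, body: str) -> DownloadRecord:
    """Build the per-download record from an already-fetched page."""
    data = body.encode("utf-8")
    return DownloadRecord(
        url=url,
        week=week,
        http_status=status,
        length=len(data),
        checksum=hashlib.md5(data).hexdigest(),
        words=Counter(re.findall(r"\w+", body.lower())),
        full_text=body if in_full_text_sample(url) else None,
    )


def word_similarity(a: DownloadRecord, b: DownloadRecord) -> float:
    """Jaccard similarity of the distinct words of two downloads (1.0 = same set)."""
    sa, sb = set(a.words), set(b.words)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    r1 = make_record("http://example.com/", 1, 200, "The quick brown fox")
    r2 = make_record("http://example.com/", 2, 200, "The quick red fox")
    print(r1.checksum == r2.checksum)  # False: any edit changes the checksum
    print(word_similarity(r1, r2))     # 0.6: 3 shared words out of 5 distinct
```

The checksum alone only says that the page changed. The word-level comparison, by contrast, separates a one-word edit from a complete rewrite. This illustrates why recording word data alongside a checksum increases sensitivity to change.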
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with
