A Large-Scale Study of the Evolution of Web Pages - Microsoft

Excerpt from the file (text format):

A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
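The checksum-based notion of change described above can be illustrated with a minimal Python sketch. It is not the code used in either study; the function names (`md5_checksum`, `has_changed`, `fraction_changed`) and the in-memory snapshot dictionary are assumptions made purely for illustration.

```python
import hashlib


def md5_checksum(page_bytes: bytes) -> str:
    """Return the hex MD5 digest of a page's raw bytes."""
    return hashlib.md5(page_bytes).hexdigest()


def has_changed(previous: bytes, current: bytes) -> bool:
    """A page counts as changed if its checksum differs between downloads."""
    return md5_checksum(previous) != md5_checksum(current)


def fraction_changed(snapshots: dict) -> float:
    """Fraction of pages whose checksum changed.

    `snapshots` maps a URL to a (previous_bytes, current_bytes) pair;
    this layout is a hypothetical one chosen for the example.
    """
    if not snapshots:
        return 0.0
    changed = sum(1 for old, new in snapshots.values() if has_changed(old, new))
    return changed / len(snapshots)
```

Note that a checksum comparison is all-or-nothing: a single altered byte, such as an embedded timestamp, marks the page as changed, and the comparison says nothing about how much of the page differs.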
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page, and a feature vector of the words on the page, plus various other data such as the page length, the HTTP status code, etc. Moreover, we pseudo-randomly selected 0.1% of all of our URLs, and saved the full text of each download of the corresponding pages.
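One way to select a fixed pseudo-random 0.1% of URLs is to hash each URL and keep those whose hash falls into one bucket out of 1,000. The text does not say how the selection was actually made, so the sketch below is only an assumption: the use of SHA-256, the modulus test, and the names `SAMPLE_MODULUS` and `in_sample` are all hypothetical.

```python
import hashlib

# Keeping 1 bucket out of 1000 gives roughly 0.1% (an assumed scheme).
SAMPLE_MODULUS = 1000


def in_sample(url: str, modulus: int = SAMPLE_MODULUS) -> bool:
    """Decide deterministically whether a URL belongs to the sample.

    The decision depends only on the URL string, so the same URLs are
    selected on every weekly crawl without storing a list of them.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    bucket = int.from_bytes(digest[:8], "big") % modulus
    return bucket == 0
```

Because the decision is a pure function of the URL, each crawl pass can apply it independently and still save full text for the same pages every week.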
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with
