A Large-Scale Study of the Evolution of Web Pages - Microsoft

File excerpt (text format):

A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page and a feature vector of the words on the page, plus various other data such as the page length and the HTTP status code. Moreover, we pseudo-randomly selected 0.1% of all of our URLs, and saved the full text of each download of the corresponding pages.
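The checksum comparison and the 0.1% sampling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the hash-bucket rule used for sampling is an assumption, since the excerpt only says the URLs were selected pseudo-randomly.

```python
import hashlib


def page_checksum(body: bytes) -> str:
    """Return the MD5 hex digest of a downloaded page body."""
    return hashlib.md5(body).hexdigest()


def has_changed(previous_checksum: str, body: bytes) -> bool:
    """Count a page as changed if its checksum differs from the last crawl."""
    return page_checksum(body) != previous_checksum


def in_sample(url: str, per_thousand: int = 1) -> bool:
    """Deterministically pick roughly per_thousand/1000 of all URLs.

    Assumption: the URL is hashed and the hash is reduced to one of 1000
    buckets; the source does not describe the actual selection method.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 1000
    return bucket < per_thousand


# Two hypothetical weekly downloads of the same URL.
week1 = b"<html><body>hello</body></html>"
week2 = b"<html><body>hello!</body></html>"
baseline = page_checksum(week1)

print(has_changed(baseline, week1))  # False: identical bytes
print(has_changed(baseline, week2))  # True: one character differs
```

Because the sampling decision depends only on the URL, the same pages stay in the sample on every weekly crawl. A byte-level checksum flags any difference, however small, which is why a study that wants to measure the degree of change needs additional data such as the feature vector mentioned above.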
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with



User-Driven Access Control: Rethinking Permission ... - CiteSeerX
23/08/2018 - www.microsoft.com
User-Driven Access Control: Rethinking Permission Granting in Modern Operating Systems. Franziska Roesner, Tadayoshi Kohno ({franzi, yoshi}@cs.washington.edu), University of Washington; Alexander Moshchuk, Bryan Parno, Helen J. Wang ({alexmos, parno, helenw}@microsoft.com), Microsoft Research, Redmond; Crispin Cowan (crispin@microsoft.com), Microsoft. Abstract tionality and security for access to the user's data and resources. From a functionality standpoint, isolation inhibits the client-side manipulation...

MatrixExplorer: Un système pour l'analyse exploratoire de ... - Microsoft
22/05/2017 - www.microsoft.com
MatrixExplorer: Un système pour l'analyse exploratoire de réseaux sociaux. Nathalie Henry (INRIA Futurs/LRI/University of Sydney, Bât 490, Université Paris-Sud, 91405 Orsay Cedex, Nathalie.Henry@lri.fr); Jean-Daniel Fekete (INRIA Futurs/LRI, Bât 490, Université Paris-Sud, 91405 Orsay Cedex, Jean-Daniel.Fekete@inria.fr). Abstract: In this article, we present the MatrixExplorer system, intended for exploring social networks. It was designed for social science researchers...

MSFT Echo Microsoft Surface Pro 11th Edition Fact Sheet
12/02/2026 - www.microsoft.com
Surface Pro for Business. Pioneering versatility matched by intelligent power. Unlock high performance in a form factor that redefines what a laptop can do. The brilliant display with touch and inking, combined with an adjustable kickstand, makes work comfortable in more places. Choose from Wi-Fi+5G or Wi-Fi only. Snapdragon® X Elite and Plus processors deliver speed and efficiency with CPUs and industry-defining NPU driving up to 45 TOPS for seamless on-device AI. Adapts to changing workstyles. Exceptional...

Entanglement and Rigidity in Percolation Models ... - Alexander Holroyd
22/05/2017 - www.microsoft.com

DictaNum : système de dialogue incrémental pour la dictée ... - Microsoft
23/11/2017 - www.microsoft.com
DictaNum : système de dialogue incrémental pour la dictée de numéros. Conference Paper, July 2014. https://www.researchgate.net/publication/262881756 3 authors, including Hatim Khouzaimi (Orange Labs / Laboratoire Informatique d'Avi...) and Romain Laroche (Microsoft Maluuba). Uploaded by Hatim Khouzaimi on 06 June 2014. 21ème...

1 Introduction - Microsoft
11/04/2018 - www.microsoft.com
One-Way Accumulators: A Decentralized Alternative to Digital Signatures (Extended Abstract). Josh Benaloh, Clarkson University; Michael de Mare, Giordano Automation. Abstract: This paper describes a simple candidate one-way hash function which satisfies a quasi-commutative property that allows it to be used as an accumulator. This property allows protocols to be developed in which the need for a trusted central authority can be eliminated. Space-efficient distributed protocols are given for document time...

MSR Quantum applications - Microsoft
23/08/2018 - www.microsoft.com
What Can We Do with a Quantum Computer? Matthias Troyer, Station Q, ETH Zurich. Classical computers have come a long way: Antikythera mechanism, astronomical positions (100 BC); Difference Engine (1822); Kelvin's harmonic analyzer, prediction of tides (1878); ENIAC (1946); Titan, ORNL (2013). Is there anything that we cannot solve on future supercomputers? How long will Moore's law continue? Do we see signs of the end of Moore's law? Can we go below 7nm...

Cédric FOURNET LE JOIN-CALCUL : UN CALCUL POUR ... - Microsoft
11/04/2018 - www.microsoft.com
THESIS presented to the ÉCOLE POLYTECHNIQUE to obtain the title of DOCTEUR DE L'ÉCOLE POLYTECHNIQUE, specialty: COMPUTER SCIENCE, by Cédric FOURNET. Thesis subject: LE JOIN-CALCUL : UN CALCUL POUR LA PROGRAMMATION RÉPARTIE ET MOBILE (The Join-Calculus: a Calculus for Distributed Mobile Programming). Defended on 23 November 1998 before a jury composed of: Robin Milner, Roberto Amadio, Gérard Boudol, Jean-Jacques Lévy, Gérard Berry, Luca Cardelli, Georges Gonthier. Président Rapporteurs Directeur de th...
 
 

Pr Information Relating To Jacques Aschenbroichs Compensation
23/06/2024 - www.valeo.com
Free translation for information purposes only Information relating to Jacques Aschenbroich's compensation for his role as Chairman of the Board of Directors in anticipation of the separation of the roles of Chairman of the Board of Directors and Chief Executive Officer from January 2022 In accordance with the succession plan unanimously approved by the Board of Directors on October 27, 2020 and disclosed on the same day, Jacques Aschenbroich will continue to act as Chairman of the Board of Directors...

Rugged Combo 3 Product Brief Es
05/04/2025 - www.logitech.com
Logitech Rugged Combo 3 lets students type, create, and progress without neglecting the protection of their iPads at any time. TAKE LEARNING TO THE NEXT LEVEL. On a typical school day, students have different subjects and several kinds of assignments. This makes the flexibility to interact with the material in different ways a necessity. The Rugged Combo 3, a protective keyboard case for iPad (7th, 8th, and 9th generation), takes digital learning to another...

Squeezebox Radion lisäkäyttöohje - Logitech
05/12/2014 - www.logitech.com
Squeezebox Radion lisäkäyttöohje (Squeezebox Radio supplementary user guide), 10/14/2009. Contents: Thank you; User guides...

Mx Brio 705 For Business Datasheet
25/09/2024 - www.logitech.com
DATASHEET: MX BRIO 705 FOR BUSINESS. MX Brio 705 for Business is a premium 4K webcam for advanced employees and executives. Powered by our largest image sensor yet, custom-designed lens, and AI image enhancement, MX Brio 705 for Business provides a crisp, authentic video experience. Completely reimagined and designed for sustainability, MX Brio 705 for Business provides immersive, true-to-life video meetings on every major video platform, in various environments. Superior 4K video quality: Powered...

Fr 210331 Cp Thermofast
21/06/2024 - www.terraillon.com
Press release, 31 March 2021. NEW 3-IN-1 THERMOMETER, THERMO FAST: TAKING A TEMPERATURE BECOMES REAL CHILD'S PLAY! Terraillon, an iconic French brand and European leader in the well-being market, is extending its range of health products. After Thermo Distance, a thermometer that takes a temperature reading from a distance, Terraillon offers a new ergonomic solution: the Thermo Fast. This 3-in-1 thermometer simplifies temperature taking...

DON PHILANTHROPIQUE – MODALITÉS ET CONDITIONS Le ...
22/03/2012 - www.hp.com
Hewlett-Packard Company, 3000 Hanover Street, Bldg 20D, Mail Stop 1029, Palo Alto, CA 94304, www.hp.com. PHILANTHROPIC DONATION: TERMS AND CONDITIONS. The applicant (the "applicant" or "donee") will be considered to be the non-profit entity named in the online donation administration system of Hewlett-Packard Company ("HP"). The applicant hereby acknowledges and agrees that its donation request in no way obligates HP to grant, in whole or in part, the donation...