A Large-Scale Study of the Evolution of Web Pages - Microsoft

Excerpt from the file (text format):

A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
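The checksum-based notion of change described above can be sketched as follows. This is a minimal illustration, not the code used in either study: the function names and the in-memory snapshot dictionaries (URL mapped to page bytes) are assumptions made for the example.

```python
import hashlib


def md5_checksum(page_bytes: bytes) -> str:
    """Return the hex MD5 digest of a page body."""
    return hashlib.md5(page_bytes).hexdigest()


def fraction_changed(old: dict, new: dict) -> float:
    """Fraction of URLs present in both snapshots whose MD5 checksum differs.

    `old` and `new` are hypothetical snapshots mapping URL -> page bytes.
    """
    common = old.keys() & new.keys()
    if not common:
        return 0.0
    changed = sum(
        1 for url in common
        if md5_checksum(old[url]) != md5_checksum(new[url])
    )
    return changed / len(common)
```

Under this definition a single altered byte counts as a change just as much as a complete rewrite, so the measure says whether a page changed but not by how much.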
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page, and a feature vector of the words on the page, plus various other data such as the page length, the HTTP status code, etc. Moreover, we pseudo-randomly selected 0.1% of all of our URLs, and saved the full text of each download of the corresponding pages.
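One way to picture the per-download record and the 0.1% sample is the sketch below. It rests on several assumptions that are not taken from the paper: the sample is chosen by hashing the URL with SHA-256 and keeping one bucket in 1,000, the "feature vector of the words" is stood in for by simple word counts, and `PageRecord`, `in_sample`, and `make_record` are hypothetical names.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Optional

SAMPLE_BUCKETS = 1000  # keeping 1 bucket in 1000 gives roughly 0.1% of URLs


def in_sample(url: str) -> bool:
    """Deterministic pseudo-random choice: the same URL is always in or out."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SAMPLE_BUCKETS == 0


@dataclass
class PageRecord:
    url: str
    status: int               # HTTP status code
    length: int               # page length in bytes
    checksum: str             # MD5 of the page body
    word_counts: Counter      # stand-in for the word feature vector (assumption)
    full_text: Optional[str]  # kept only for sampled URLs


def make_record(url: str, status: int, body: str) -> PageRecord:
    """Build the record stored for one download of one page."""
    raw = body.encode("utf-8")
    return PageRecord(
        url=url,
        status=status,
        length=len(raw),
        checksum=hashlib.md5(raw).hexdigest(),
        word_counts=Counter(body.lower().split()),
        full_text=body if in_sample(url) else None,
    )
```

Hashing the URL, rather than drawing a fresh random number on each download, keeps the sampled set identical from one weekly crawl to the next, which is what makes it possible to compare the saved full text of the same page across weeks.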
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with
