A Large-Scale Study of the Evolution of Web Pages - Microsoft

Excerpt from the file (in text format):

A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
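The checksum criterion described above can be sketched as follows. This is a minimal illustration, not the code used in either study: the function names and the dictionary-of-snapshots layout are assumptions made for the example, and only the use of MD5 comes from the text.

```python
import hashlib


def md5_checksum(page_bytes: bytes) -> str:
    """Return the hex MD5 digest of a downloaded page body."""
    return hashlib.md5(page_bytes).hexdigest()


def has_changed(previous: bytes, current: bytes) -> bool:
    """A page counts as changed if its checksum differs between downloads."""
    return md5_checksum(previous) != md5_checksum(current)


def fraction_changed(old: dict, new: dict) -> float:
    """Fraction of URLs present in both snapshots whose checksum differs.

    `old` and `new` map URL -> page bytes (a hypothetical layout).
    """
    common = old.keys() & new.keys()
    if not common:
        return 0.0
    changed = sum(1 for url in common if has_changed(old[url], new[url]))
    return changed / len(common)
```

A checksum comparison of this kind is all-or-nothing: a one-byte edit and a complete rewrite both register as a change, so it cannot say whether a page changed a little or a lot.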
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page, and a feature vector of the words on the page, plus various other data such as the page length, the HTTP status code, etc. Moreover, we pseudo-randomly selected 0.1% of all of our URLs, and saved the full text of each download of the corresponding pages.
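The per-page bookkeeping and the 0.1% sample described above might look like the sketch below. Everything specific in it is an assumption for illustration: the excerpt does not say which checksum function, which kind of feature vector, or which pseudo-random selection rule the authors used. Here MD5 stands in for the checksum, a plain word-count table stands in for the feature vector, and a URL is sampled when a hash of it is divisible by 1000, which keeps the same URLs in the sample on every weekly download.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Optional

SAMPLE_ONE_IN = 1000  # 0.1% of URLs, i.e. roughly 1 in 1000


def is_sampled(url: str) -> bool:
    """Deterministic pseudo-random selection: hash the URL, keep 1 in 1000."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SAMPLE_ONE_IN == 0


@dataclass
class PageRecord:
    url: str
    status: int                # HTTP status code
    length: int                # page length in bytes
    checksum: str              # stand-in: MD5 of the body
    word_counts: Counter       # stand-in for the word feature vector
    full_text: Optional[str]   # kept only for sampled URLs


def make_record(url: str, status: int, body: bytes) -> PageRecord:
    """Build the record stored for one download of one page."""
    text = body.decode("utf-8", errors="replace")
    return PageRecord(
        url=url,
        status=status,
        length=len(body),
        checksum=hashlib.md5(body).hexdigest(),
        word_counts=Counter(text.lower().split()),
        full_text=text if is_sampled(url) else None,
    )
```

Selecting by a hash of the URL rather than by a fresh random draw means the choice does not depend on crawl order and needs no stored list of sampled URLs.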
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with
