A Large-Scale Study of the Evolution of Web Pages - Microsoft

Excerpt from the file (text format):

A Large-Scale Study of the Evolution of Web Pages
Dennis Fetterly
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
dennis.fetterly@hp.com

Mark Manasse
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
manasse@microsoft.com

Marc Najork
Microsoft Research
1065 La Avenida
Mountain View, CA 94043
najork@microsoft.com

Janet Wiener
Hewlett Packard Labs
1501 Page Mill Road
Palo Alto, CA 94304
janet.wiener@hp.com

ABSTRACT

1. INTRODUCTION

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them.
One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily.
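The checksum-based notion of change described above can be illustrated with a minimal sketch. It assumes two crawl snapshots are available as dictionaries mapping URLs to raw page bytes; the function names `md5_checksum` and `changed_fraction` are hypothetical and are not taken from either study.

```python
import hashlib


def md5_checksum(page_bytes: bytes) -> str:
    """Return the hex MD5 digest of a downloaded page."""
    return hashlib.md5(page_bytes).hexdigest()


def changed_fraction(before: dict, after: dict) -> float:
    """Fraction of URLs present in both snapshots whose checksum differs.

    `before` and `after` map URL -> page bytes (an assumed data layout).
    """
    common = before.keys() & after.keys()
    if not common:
        return 0.0
    changed = sum(
        1 for url in common
        if md5_checksum(before[url]) != md5_checksum(after[url])
    )
    return changed / len(common)
```

Because any single-byte difference alters the digest, this measure reports that a page changed but not by how much, which is the sensitivity limitation the present paper sets out to address.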
This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page, and a feature vector of the words on the page, plus various other data such as the page length, the HTTP status code, etc. Moreover, we pseudo-randomly selected 0.1% of all of our URLs, and saved the full text of each download of the corresponding pages.
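The text does not say how the 0.1% of URLs were selected. One common way to obtain a pseudo-random sample that stays the same across weekly crawls is to hash each URL and keep those whose hash falls into one of 1,000 buckets. The sketch below assumes that approach; `in_sample` and `SAMPLE_MODULUS` are hypothetical names, and the choice of SHA-256 is an assumption.

```python
import hashlib

# Keeping 1 URL in 1000 corresponds to a 0.1% sample.
SAMPLE_MODULUS = 1000


def in_sample(url: str, modulus: int = SAMPLE_MODULUS) -> bool:
    """Deterministically decide whether a URL belongs to the sample.

    The decision depends only on the URL, so the same pages are
    selected on every crawl without storing a list of sampled URLs.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    bucket = int.from_bytes(digest[:8], "big") % modulus
    return bucket == 0
```

A crawler could call `in_sample(url)` after each download and store the full page text only when it returns `True`, recording just the checksum and summary data otherwise.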
After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with
