DOI: 10.1145/2835776.2835832
research-article
Open access

Improving Website Hyperlink Structure Using Server Logs

Published: 08 February 2016

Abstract

Good websites should be easy to navigate via hyperlinks, yet maintaining a high-quality link structure is difficult. Identifying pairs of pages that should be linked may be hard for human editors, especially if the site is large and changes frequently. Further, given a set of useful link candidates, the task of incorporating them into the site can be expensive, since it typically involves humans editing pages. In the light of these challenges, it is desirable to develop data-driven methods for automating the link placement task. Here we develop an approach for automatically finding useful hyperlinks to add to a website. We show that passively collected server logs, beyond telling us which existing links are useful, also contain implicit signals indicating which nonexistent links would be useful if they were to be introduced. We leverage these signals to model the future usefulness of yet nonexistent links. Based on our model, we define the problem of link placement under budget constraints and propose an efficient algorithm for solving it. We demonstrate the effectiveness of our approach by evaluating it on Wikipedia, a large website for which we have access to both server logs (used for finding useful new links) and the complete revision history (containing a ground truth of new links). As our method is based exclusively on standard server logs, it may also be applied to any other website, as we show with the example of the biomedical research site Simtk.
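
The abstract names the problem of link placement under budget constraints but does not spell out the model or the algorithm. The sketch below only illustrates the general idea of picking links greedily under a budget. The function `place_links`, the `discount` parameter, and the toy scores are all hypothetical and are not taken from the paper; the geometric discount is an assumed stand-in for diminishing returns when several new links compete on the same source page.

```python
from collections import defaultdict


def place_links(candidates, budget, discount=0.5):
    """Greedily pick at most `budget` links by estimated marginal gain.

    candidates: dict mapping (source, target) -> estimated clicks the link
        would receive if it were the only new link on its source page.
    discount: hypothetical diminishing-returns factor; every link already
        placed on a source page scales the gain of the next one by this value.
    """
    remaining = dict(candidates)
    placed_on = defaultdict(int)  # source page -> number of links placed so far
    chosen = []

    def marginal_gain(link):
        source, _target = link
        return remaining[link] * discount ** placed_on[source]

    while remaining and len(chosen) < budget:
        # Re-evaluate gains each round, since earlier picks lower later ones.
        best = max(remaining, key=marginal_gain)
        chosen.append(best)
        placed_on[best[0]] += 1
        del remaining[best]
    return chosen


# Toy scores, invented for illustration only.
scores = {("A", "B"): 10.0, ("A", "C"): 8.0, ("D", "E"): 5.0}
print(place_links(scores, budget=2))  # [('A', 'B'), ('D', 'E')]
```

With a budget of 2, the second pick goes to page D rather than page A, because the assumed discount halves the value of a second new link on A (8.0 becomes 4.0, below 5.0). In practice the scores would come from server logs rather than from hand-written numbers.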

Published In

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
February 2016
746 pages
ISBN:9781450337168
DOI:10.1145/2835776
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. browsing
  2. link prediction
  3. log analysis
  4. navigation

Qualifiers

  • Research-article

Funding Sources

  • NSF
  • NIH

Conference

WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining
February 22 - 25, 2016
San Francisco, California, USA

Acceptance Rates

WSDM '16 Paper Acceptance Rate 67 of 368 submissions, 18%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 130
  • Downloads (last 6 weeks): 30
Reflects downloads up to 01 Jan 2025

Cited By

  • (2023) A Large-Scale Characterization of How Readers Browse Wikipedia. ACM Transactions on the Web 17:2 (1-22). DOI: 10.1145/3580318. Online publication date: 3-Apr-2023
  • (2022) Wikipedia Reader Navigation. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (16-26). DOI: 10.1145/3488560.3498496. Online publication date: 11-Feb-2022
  • (2022) Going Down the Rabbit Hole: Characterizing the Long Tail of Wikipedia Reading Sessions. Companion Proceedings of the Web Conference 2022 (1324-1330). DOI: 10.1145/3487553.3524930. Online publication date: 25-Apr-2022
  • (2021) Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia. Quantitative Science Studies 2:1 (1-19). DOI: 10.1162/qss_a_00105. Online publication date: 8-Apr-2021
  • (2021) Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (3818-3827). DOI: 10.1145/3459637.3481939. Online publication date: 26-Oct-2021
  • (2021) Predicting Links on Wikipedia with Anchor Text Information. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (1758-1762). DOI: 10.1145/3404835.3462994. Online publication date: 11-Jul-2021
  • (2021) How Inclusive Are Wikipedia’s Hyperlinks in Articles Covering Polarizing Topics? 2021 IEEE International Conference on Big Data (Big Data) (1300-1307). DOI: 10.1109/BigData52589.2021.9671943. Online publication date: 15-Dec-2021
  • (2021) Trajectories through temporal networks. Applied Network Science 6:1. DOI: 10.1007/s41109-021-00374-7. Online publication date: 3-May-2021
  • (2020) Distinct Web Search Engine with Reducing Ambiguity Word Complexity. International Journal of Scientific Research in Computer Science, Engineering and Information Technology (527-535). DOI: 10.32628/CSEIT2063128. Online publication date: 1-May-2020
  • (2020) Quantifying Engagement with Citations on Wikipedia. Proceedings of The Web Conference 2020 (2365-2376). DOI: 10.1145/3366423.3380300. Online publication date: 20-Apr-2020
