Medical articles on French Wikipedia have "high rate of veracity"
- Reviewed by Nicolas Jullien
A doctoral thesis at Aix-Marseille University examined the accuracy of medical articles on the French Wikipedia. From the English abstract: "we selected a sample of 5 items (stroke, colon cancer, diabetes mellitus, vaccination and interruption of pregnancy) which we compare, assertion by assertion, with reference sources to confirm or refute each assertion. Results: Of the 5 articles, we analyzed 868 assertions. Of this total, 82.49% were verified by the referentials, 15.55% not verifiable due to lack of information and 1.96% contradicted by the referentials. Of the contradicted results, 10 corresponded to obsolete notions and 7 to errors, but mainly dealing with epidemiological or statistical data, thus not leading to a major risk when used, not recommended, on health. Conclusion: ... This study of five medical articles finds a high rate of veracity with less than 2% incorrect information and more than 82% of information confirmed by scientific references. These results strongly argue that Wikipedia could be a reliable source of medical information, provided that it does not remain the only source used by people for that purpose."
This medical PhD thesis is a very well documented analysis of the questions raised by the publication of medical information on Wikipedia. Although the findings, summarized in the abstract, will not be new to those who know Wikipedia well, it presents a good review of the literature on the topic of medical accuracy, and also of the purpose of Wikipedia (not a professional encyclopedia, but a form of popular science, an introduction, and some links to go further). This document is in French.
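The percentages quoted in the abstract can be cross-checked against the stated total with a few lines of arithmetic. The sketch below is only a consistency check: the abstract gives the total (868), the three percentages, and the split of contradicted assertions (10 obsolete, 7 errors). The per-category counts are back-calculated here by rounding, which is an assumption and not a figure taken from the thesis.

```python
# Figures quoted in the thesis abstract.
TOTAL_ASSERTIONS = 868
PCT_VERIFIED = 82.49
PCT_UNVERIFIABLE = 15.55
PCT_CONTRADICTED = 1.96
OBSOLETE = 10  # contradicted assertions described as obsolete notions
ERRORS = 7     # contradicted assertions described as errors


def implied_count(pct: float, total: int) -> int:
    """Back-calculate a whole-number count from a rounded percentage.

    Assumption: the thesis rounded each percentage to two decimals, so
    rounding the product to the nearest integer recovers the count.
    """
    return round(pct * total / 100)


verified = implied_count(PCT_VERIFIED, TOTAL_ASSERTIONS)
unverifiable = implied_count(PCT_UNVERIFIABLE, TOTAL_ASSERTIONS)
contradicted = implied_count(PCT_CONTRADICTED, TOTAL_ASSERTIONS)

# The three categories should account for every assertion.
counts_add_up = verified + unverifiable + contradicted == TOTAL_ASSERTIONS

# The contradicted count should match the obsolete/error split.
split_matches = contradicted == OBSOLETE + ERRORS

print(verified, unverifiable, contradicted, counts_add_up, split_matches)
```

Under that rounding assumption the figures are internally consistent: the implied counts sum to the stated total, and the contradicted share matches the 10 obsolete notions plus 7 errors mentioned in the abstract.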
Assessing article quality and popularity across 44 Wikipedia language versions
- Reviewed by Nicolas Jullien
From the paper: Distribution of quality scores in 12 topic areas on English, German and French Wikipedia
This is the topic of a paper in the journal Informatics. From the English abstract: "Our research has showed that in language sensitive topics, the quality of information can be relatively better in the relevant language versions. However, in most cases, it is difficult for the Wikipedia readers to determine the language affiliation of the described subject. Additionally, each language edition of Wikipedia can have own rules in the manual assessing of the content’s quality. There are also differences in grading schemes between language versions: some use a 6–8 grade system to assess articles, and some are limited to 2–3. This makes automatic quality comparison of articles between various languages a challenging task, particularly if we take into account a large number of unassessed articles; some of the Wikipedia language editions have over 99% of articles without a quality grade. The paper presents the results of a relative quality and popularity assessment of over 28 million articles in 44 selected language versions. Comparative analysis of the quality and the popularity of articles in popular topics was also conducted. Additionally, the correlation between quality and popularity of Wikipedia articles of selected topics in various languages was investigated. The proposed method allows us to find articles with information of better quality that can be used to automatically enrich other language editions of Wikipedia."
Regarding the quality metrics, I salute the coverage in terms of languages, which makes it possible to go beyond the "official" automated evaluation provided by the Wikimedia Foundation (ORES), which is only available on some of the large language projects. As the authors explain, this part is mostly based on previously published work, but substantially extended. The paper also proposes some solutions for comparing quality across languages, and takes into account the variations in perspective between cultures.
It also opens a discussion about the popularity of articles, and how popularity can help decide which language version should be treated as the master version of an article. Although this part is still at an early stage, the discussion sets out the next step for the authors' work.
From the paper: Distribution of various article metrics by quality class on English Wikipedia
- Reviewed by FULBERT
This theoretical paper explored ambiguous relationships between credibility, trust, and authority in library and information sciences and how they are related to perceived accuracy in information sources. Credibility is linked to trust, necessary when we seek to learn from or convey information between people. This is complicated when the authority of a source is considered, as personal or institutional levels of expertise increase the ability to speak with greater credibility.
The literature about how this works with knowledge and information on the Web is inconsistent, and as a result this work sought to develop a unified approach through a new model. As credibility, trust, and authority are distinct concepts that are frequently used together inconsistently, they were explored through how Wikipedia is used and perceived. Although Wikipedia is considered highly accurate, trust in it is average and its credibility is at times suspect.
Sahut and Tricot developed the authority, trust and credibility (ATC) model, where “knowledge institutions confer authority to a source, this authority ensures trust, which ensures the credibility of the information.” As a result, “the credibility of the information builds trust, which builds the authority of the source.” This model can be useful when applied to the citation of sources in Wikipedia, as it helps explain how the practice of providing citations in Wikipedia increases credibility and thus encourages trust, “linking content to existing knowledge sources and institutions.”
The ATC model is a helpful framework for explaining why Wikipedia, despite its enormous readership, continues to struggle to be perceived as an authority because of inconsistencies in its article citations and references. The model theorizes that filling these gaps will increase authority and thus the reputation of Wikipedia itself.
Figure 2 from the paper, on Wikipedia authority, trust and credibility. ("The educational institution can spread a bad reputation on Wikipedia, which decreases its authority, has a negative influence on its trust, which negatively influences the credibility of the information. Conversely, a positive experience of credibility of Wikipedia information increases readers’ trust.")
Conferences and events
Academia and Wikipedia: Critical Perspectives in Education and Research
A call for papers has been published for a conference titled "Academia and Wikipedia: Critical Perspectives in Education and Research", to be held on June 18, 2018, at Maynooth University in the Republic of Ireland. The organizers describe it as "a one-day conference that aims to investigate how researchers and educators use and interrogate Wikipedia. The conference is an opportunity to present research into and from Wikipedia; research about Wikipedia, or research that uses Wikipedia as a data object".
Wiki Workshop 2018
The fifth edition of Wiki Workshop will take place in Lyon, France on April 24, 2018, as part of The Web Conference 2018. Wiki Workshop brings together researchers exploring all aspects of Wikimedia websites, such as Wikipedia, Wikidata, and Wikimedia Commons. The call for papers is now available. The submission deadline for papers to appear in the proceedings of the conference is January 28; for all other papers, it is March 11.
See the research events page on Meta-wiki for other upcoming conferences and events, including submission deadlines.
Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.
- Compiled by Tilman Bayer
- "What do Wikidata and Wikipedia have in common?: An analysis of their use of external references" From the abstract: "Our findings show that while only a small number of sources is directly reused across Wikidata and Wikipedia, references often point to the same domain. Furthermore, Wikidata appears to use less Anglo-American-centred sources."
- "A glimpse into Babel: An analysis of multilinguality in Wikidata" From the abstract: "we explore the state of languages in Wikidata as of now, especially in regard to its ontology, and the relationship to Wikipedia. Furthermore, we set the multilinguality of Wikidata in the context of the real world by comparing it to the distribution of native speakers. We find an existing language maldistribution, which is less urgent in the ontology, and promising results for future improvements."
- "Before the sense of 'we': Identity work as a bridge from mass collaboration to group emergence" From the paper: "... From these interviews, we identified that a Featured Article (FA) collaboration that had occurred in 2007 in the “Whooper Swan” Wikipedia article, was very important for the actions of later group work. The focus of this paper is around this foundational article."
Illustration from "Interpolating quality dynamics in Wikipedia and demonstrating the Keilana effect"
- "Interpolating quality dynamics in Wikipedia and demonstrating the Keilana effect" From the abstract: "I describe a method for measuring article quality in Wikipedia historically and at a finer granularity than was previously possible. I use this method to demonstrate an important coverage dynamic in Wikipedia (specifically, articles about women scientists) and offer this method, dataset, and open API to the research community studying Wikipedia quality dynamics." (see also research project page on Meta-wiki)
See also our earlier coverage of another OpenSym 2017 paper: "Improved article quality predictions with deep learning"
- "Mining team characteristics to predict Wikipedia article quality" From the abstract: "The experiment involved obtaining the Spanish Wikipedia database dump and applying different data mining techniques suitable for large data sets to label the whole set of articles according to their quality (comparing them with the Featured/Good Articles, or FA/GA). Then we created the attributes that describe the characteristics of the team who produced the articles and using decision tree methods, we obtained the most relevant characteristics of the teams that produced FA/GA. The team's maximum efficiency and the total length of contribution are the most important predictors."
- "Predicting the quality of user contributions via LSTMs" From the discussion section: "We have presented a machine-learning approach for predicting the quality of Wikipedia revisions that can leverage the complete contribution history of users when making predictions about the quality of their latest contribution. Rather than using ad-hoc summary features computed on the basis of user’s contribution history, our approach can take as input directly the information on all the edits performed by the user [e.g. features such as "Time interval to previous revision on page", the number of characters added or removed, "Spread of change within the page", "upper case/ lower case ratio", and "day of week"]. Our approach leverages the power of LSTMs (long-short term memory neural nets) for processing the variable-length contribution history of users."
Plot describing the change, from October 2014 to January 2016, in the absolute number of female biography articles (horizontal axis) and their share among all biographies (vertical axis), for various Wikipedia languages (appearing in similar form in the "Monitoring the Gender Gap ..." paper)
- "Monitoring the gender gap with Wikidata human gender indicators" From the abstract: "The gender gap in Wikipedia’s content, specifically in the representation of women in biographies, is well-known but has been difficult to measure. Furthermore the impacts of efforts to address this gender gap have received little attention. To investigate we use Wikidata, the database that feeds Wikipedia, and introduce the “Wikidata Human Gender Indicators” (WHGI), a free and open-source, longitudinal, biographical dataset monitoring gender disparities across time, space, culture, occupation and language. Through these lenses we show how the representation of women is changing along 11 dimensions. Validations of WHGI are presented against three exogenous datasets: the world’s historical population, “traditional” gender-disparity indices (GDI, GEI, GGGI and SIGI), and occupational gender according to the US Bureau of Labor Statistics." (see also Wikimedia Foundation grant page)
- "An empirical evaluation of property recommender systems for Wikidata and collaborative knowledge bases" From the abstract: "Users who actively enter, review and revise data on Wikidata are assisted by a property suggesting system which provides users with properties that might also be applicable to a given item. ... We compare the [recommendation] approach currently facilitated on Wikidata with two state-of-the-art recommendation approaches stemming from the field of RDF recommender systems and collaborative information systems. Further, we also evaluate hybrid recommender systems combining these approaches. Our evaluations show that the current recommendation algorithm works well in regards to recall and precision, reaching a recall of 79.71% and a precision of 27.97%."
- "Medical science in Wikipedia: The construction of scientific knowledge in open science projects" From the abstract: "The goal of my research is to build a theoretical framework to explain the dynamic of knowledge building in crowd-sourcing based environments like Wikipedia and judge the trustworthiness of the medical articles based on the dynamic network data. By applying actor–network theory and social network analysis, the contribution of my research is theoretical and practical as to build a theory on the dynamics of knowledge building in Wikipedia across times and to offer insights for developing citizen science crowd-sourcing platforms by better understanding how editors interact to build health science content."
- "Comparing OSM area-boundary data to DBpedia" From the abstract: "OpenStreetMap (OSM) is a well known and widely used data source for geographic data. This kind of data can also be found in Wikipedia in the form of geographic locations, such as cities or countries. Next to the geographic coordinates, also statistical data about the area of these elements can be present. ... in this paper OSM data of different countries are used to calculate the area of valid boundary (multi-) polygons and are then compared to the respective DBpedia (a large-scale knowledge base extract from Wikipedia) entries."
See also our earlier coverage of another OpenSym 2016 paper: "Making it easier to navigate within article networks via better wikilinks"
Diverse other papers, relating to structured data