Version 1.0 is available for download.
Size of the Science Hall of Fame. This first version of the Hall includes 4209 individuals classified in the fields of chemistry, physics, biology and mathematics. (The social sciences have been excluded from this version.) They are distributed as follows (about 5% of individuals are classified under more than one field):
- 860 in Chemistry
- 1213 in Physics
- 1437 in Biology
- 972 in Mathematics
- 115 Nobel laureates in Chemistry
- 128 Nobel laureates in Physics
- 161 Nobel laureates in Physiology or Medicine
Parsing string. The lists of scientists were assembled by parsing the categories of their Wikipedia entries for the following case-sensitive substring matches:
- Chemistry: 'chemi', 'Chemistry'
- Physics: 'physics', 'Physics', 'physicists', 'Physicists'
- Biology: 'physio', 'patho', ' ecol', 'zool', 'geneti', 'Biol', 'Physio', 'Patho', 'Ecol', 'Zool', 'Geneti'
- Maths: 'mathe', 'Mathe'
Parsing of Social Scientists. The offline dataset includes individuals tagged with “social sciences”. So far, these include scientists involved in psychology and linguistics. Sociology, anthropology, and other social science fields have been excluded because of the large number of scholars who do not seem to be scientists. The social sciences category will be expanded in future versions of the SHoF.
Creating the lists of scientists. Biographical pages within Wikipedia were identified using the DBPedia framework. Optimal full names were determined, and ambiguity conflicts resolved, using the method described in the supplementary online materials (SOM) of the Science paper. We also implemented a correction for the error that occurs when a compound last name is mistaken for a full name: for records where the optimal name corresponded to the last two names of the full name, the first name in the full name was added to the optimal name.
Manual exclusion. Putative non-scientists were identified by computing, for each individual, the ratio of science-related categories to the total number of categories. These were manually verified and, when appropriate, excluded from the Science Hall of Fame. The most notable individuals excluded were Margaret Thatcher (politician listed as a chemist), Mary Baker Eddy (religious leader listed as a metaphysicist) and Edward Gray (politician and fellow of the Zoological Society of London).
Known Issues.
- Categories are not exhaustive.
- Some scientists may be missing:
  - Fame signal may be elevated around birth, leading to exclusion (detailed in SOM). If the fame signal in a 20-year window centered at birth is within an order of magnitude of the lifetime fame signal, the individual is excluded. A notable example is George Smoot.
  - Name may not be unique, and homonym resolution (described in SOM) may not be possible. Where possible, these ambiguities are resolved by comparing the sizes of the Wikipedia entries of the conflicting individuals.
  - The dataset was generated using DBPedia 3.5.1 from March 2010.
  - Scientist page may not be associated with a “year_births” category and so may not be identified as a biographical entry.
  - Scientist page may not be associated with parsed categories.
- Some scientists are referred to by too long a name. This is a consequence of unresolvable name ambiguity. When in doubt, the algorithm will include a middle name in the full name of the individual. The use of the longer name (first + middle + last) may then reduce the scientist’s fame signal.
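
The category matching described under "Parsing string", and the category ratio used under "Manual exclusion", can be sketched in a few lines of Python. This is only an illustration, not the code used to build the dataset: the substrings are copied from the list above, but the function names, the sample category lists, and the choice to count a category as "science-related" when it matches any of those substrings are assumptions made here.

```python
# Substrings copied from the "Parsing string" list; matching is case-sensitive.
FIELD_SUBSTRINGS = {
    "Chemistry": ["chemi", "Chemistry"],
    "Physics": ["physics", "Physics", "physicists", "Physicists"],
    "Biology": ["physio", "patho", " ecol", "zool", "geneti",
                "Biol", "Physio", "Patho", "Ecol", "Zool", "Geneti"],
    "Maths": ["mathe", "Mathe"],
}


def fields_for_category(category):
    """Return the set of fields whose substrings occur in one category name."""
    return {
        field
        for field, needles in FIELD_SUBSTRINGS.items()
        if any(needle in category for needle in needles)
    }


def classify(categories):
    """Return every field matched by at least one of a page's categories."""
    matched = set()
    for category in categories:
        matched |= fields_for_category(category)
    return matched


def science_ratio(categories):
    """Fraction of categories matching any field substring (assumed definition)."""
    if not categories:
        return 0.0
    hits = sum(1 for category in categories if fields_for_category(category))
    return hits / len(categories)


# Hypothetical category lists, for illustration only.
physicist = ["American physicists", "Nobel laureates in Physics", "1945 births"]
politician = ["English chemists", "Prime Ministers of the United Kingdom",
              "Women prime ministers", "1925 births"]

print(sorted(classify(physicist)))   # ['Physics']
print(sorted(classify(politician)))  # ['Chemistry']
print(science_ratio(politician))     # 0.25
```

A low ratio, as in the second example, only marks a record as a candidate for manual review. The source does not state a cutoff, so none is hard-coded here.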
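
The compound-last-name correction described under "Creating the lists of scientists" might look roughly like the following. It is a minimal sketch under stated assumptions: names are split on whitespace, the function name `correct_compound_name` is hypothetical, and the first name is placed in front of the optimal name (the text does not say where it goes). The sample names are illustrative inputs, not records taken from the dataset.

```python
def correct_compound_name(full_name, optimal_name):
    """Add the first name when the optimal name is only the last two names.

    Assumption: names are whitespace-separated tokens and the first name
    goes in front of the optimal name.
    """
    parts = full_name.split()
    # At least three tokens are needed: a first name plus a two-token remainder.
    if len(parts) >= 3 and optimal_name.split() == parts[-2:]:
        return " ".join([parts[0]] + parts[-2:])
    return optimal_name


# A compound last name mistaken for a full name gets its first name back.
print(correct_compound_name("Louis Victor de Broglie", "de Broglie"))
# An optimal name that already starts with the first name is left untouched.
print(correct_compound_name("Marie Sklodowska Curie", "Marie Curie"))
```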
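
The birth-window exclusion rule from the list of known issues can be illustrated with a rough check. The actual procedure is the one detailed in the SOM; everything specific below is an assumption made for illustration: the fame signal is represented as a hypothetical mapping from year to frequency, each window is summarized by its mean, and "within an order of magnitude" is read as the birth-window mean being at least one tenth of the lifetime mean.

```python
def mean_signal(yearly, start, stop):
    """Mean yearly value over [start, stop); missing years count as zero."""
    years = range(start, stop)
    return sum(yearly.get(year, 0.0) for year in years) / len(years)


def elevated_at_birth(yearly, birth, death, half_window=10, factor=10.0):
    """True if the birth-window signal is within `factor` of the lifetime signal.

    Assumptions: the 20-year window is [birth - 10, birth + 10), the lifetime
    runs from birth through death inclusive, and both are summarized by means.
    """
    birth_level = mean_signal(yearly, birth - half_window, birth + half_window)
    life_level = mean_signal(yearly, birth, death + 1)
    if life_level == 0:
        return birth_level > 0
    return birth_level * factor >= life_level


# Hypothetical series: the name already appears in print around the birth year.
shared_name = {year: 1e-9 for year in range(1940, 1960)}
shared_name.update({year: 2e-9 for year in range(1960, 2001)})

# Hypothetical series: no mentions until well into the person's career.
clean_name = {year: 5e-9 for year in range(1970, 2001)}

print(elevated_at_birth(shared_name, birth=1945, death=2000))  # True
print(elevated_at_birth(clean_name, birth=1945, death=2000))   # False
```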