Victorianist and inveterate book buyer.  


Why you should look suspiciously at Google Scholar citation rankings: an ongoing saga

I previously discussed the oddities of Google Scholar in 2011 and 2015.  Another four years have passed, and yet...

1) Something there is that doesn't love a (moving) wall.  As I noted in a Twitter conversation a few weeks ago, it has become clear that GS scrapes a lot of its data from JSTOR.  But JSTOR's moving wall means that there is typically a delay of three years or more before journal issues become accessible in its database.  And, as it turns out, the knock-on effect for our purposes is that it can take a similar three years (or more) for citations to appear in your profile.  (One reference to Book Two from 2015, for example, cropped up only this year.)

2) When Springer and Palgrave Macmillan attack! Books published by Palgrave Macmillan and available through Springer show up in GS as both individual titles and individual chapters.  As a result, GS inadvertently generates duplicate citations--an article with six citations may actually only have one.  

3) GoogleBooks blues.  Not surprisingly, one of GS's primary sources for book citations is GoogleBooks.  But this means that GS replicates all of the notorious metadata errors in GB.  Moreover, if, as is sometimes the case, there are multiple versions of the same book in the GB archive in slightly different formats, GS will simply count them as separate citations.

4) Book review aggregation.  GS seems to have a hard time with the publication format of book reviews, as it is often the case that one ends and another begins on the same page.  It thus may count a review twice--once individually and once as part of a larger set of reviews treated as a single article.  (Moreover, it will do this both with reviews you have written and with reviews of your work.)

5) Amazingly, there are academics on this planet who do not write in English.  GS's access to non-Anglophone scholarship seems to be...scattershot.

My advice remains as it has been for the last eight years: anyone using the numbers for purposes of hiring, tenure, and promotion (or polemic, for that matter) needs to treat them as approximate, at best, and proceed with due caution.  

The Incredible Lightness of GoogleScholar Citations

If you're in the humanities, GoogleScholar's list of citations looks like a tempting way of quantifying your impact on the planet.  Or, at least, on fellow academics.  However, I'm a bit concerned about the prospect of administrations using GoogleScholar's citation count as a convenient shortcut, because those counts are frequently...odd.  For example, here's my own profile.  There are some weird things going on:

1) Wow, my edition of Robert Elsmere has fifty-one citations!

[beat]

And amazingly, many of these citations predate its publication!

In other words, what GoogleBooks has done is simply attach all citations of Ward's novel to my edition, which is not helpful.  (As some of you may recall, I had a long and ultimately losing battle with Amazon about their insistence on doing the same thing, which left me stuck with one-star reviews intended for entirely different editions.)  I would guess that this is not an isolated instance.

2) GoogleScholar does not believe I wrote an article about movies, which deflates my citation count by, um, one.  (They believe that the article exists, but don't have it attached to my profile.  The perils of branching out, I guess.)

3) There are a number of "nonsense" entries in the publication list, including a book review of the aforementioned RE by someone else (it would be awkward if I reviewed my own work, you understand...), entries in journal indexes, etc.  If one is carefully reading GoogleScholar, this is not an issue; if one is skimming GoogleScholar for quick data, however, it is.

4) Worse still, there are a number of "nonsense" citations, including, again, journal indexes, "books received," and annual bibliographies.  Moreover, book reviews are sometimes counted twice (once on their own, once as part of a book review section).  These are not "citations" in any sense of the term, and shouldn't be counted.  However, to find them, you have to double-check the citation list for each entry, which is the opposite of a quick fix. 

5) Finally, GoogleScholar is very spotty about catching citations in non-academic press outlets, which makes it difficult to detect impact outside a strictly academic audience. 

Short version: be wary of GoogleScholar citation counts. 

Unmediated

Stephanie Buckhanon Crowder makes one of those points that ought to be obvious, but isn't, namely that it's totes OK to not share everything and the kitchen sink with one's students.  At the same time, as someone pointed out in comments, it's pretty much a given that students will be curious and poke around a bit on the internets.  Which means that there is sharing and sharing.  

My own rules, developed by ye olde method of trial and error:

1) In general: discuss "students" in only the most general of terms; try to avoid anything that could be pinned to one student, or even a specific group of students, unless it's wholeheartedly positive.  I know that some people get huffy about this, but I'm not anonymous, and my students haven't asked or given permission to be identified.  If you're anonymous, then obviously you have more freedom in this respect, but even so, bear in mind that one day in the distant future, your anonymity may be breached (either because somebody snoops or, even more likely, you accidentally identify yourself, whether on- or off-line).

I also tend to be somewhat chary about discussing my personal life, because it's, well, personal, and not something I talk about much in the "real" world, either.  Photos of cats are OK, though.  Oh, and book acquisitions.

2) Facebook.  Completely locked down.  No requests accepted from current students, because a) if I am going to post something personal, it will be there and b) surely my students don't want me seeing their personal lives.  However, I decided some time back that rule #1 would also have to apply, simply because you never know who is going to cut and paste you.

3) Twitter.  I've followed and been followed by students on occasion, but I neither seek them out nor discourage them.  (I've done Twitter-y things in courses before, but many of our students don't use the app, and I'm not going to make them sign up for it.)

4) Here.  Obviously, students can find me here (and have, on occasion).  

And now, to make life even more complicated

For obvious reasons, I didn't feel like reading an already-digitized book while I was in the British Library--the whole point being to go to the British Library to read otherwise-inaccessible books--so I Googled each title before calling it up.  (Yes, this took a while.)  This led to a somewhat unexpected chain of events:

1) In the USA, I had already searched a bunch of titles and found no sign that they enjoyed a free existence.

2) In the UK, I searched some of these titles again, and...there they were in GoogleBooks.  Say what?

3) Back in the USA, on a whim I began looking up some books I had found available on Amazon in facsimile form, and which had not cropped up in GoogleBooks (or archive.org, or HathiTrust) when I searched in the UK.  And there they are.

Bear in mind that most of these books bring up only one or two pages of results--we're not talking about combing through thousands, hundreds, or even dozens of hits.  I didn't overlook them; they just weren't there.  It's possible that Google "learned" my preferences, but surely it could have brought up an item it hosts the first time around?

I've also found that some books cannot be found through GoogleBooks' own search at all; you have to use the regular Google search function.  To take a minor example: GoogleBooks pulls up volumes one and three of Mrs. O'Shea Dillon's triple-decker Dark Rosaleen.  Where's volume two? (You know, the middle of the book?) It turns out you can get it on Google Play or at the French GoogleBooks--but not at the American or British GoogleBooks sites!

The Way We GoogleBooks Now

Timothy Burke bemusedly observes that "of the texts I’ve looked at recently in Google Books, quite a few of them seem sloppy: lines at the bottom of the page distorted or unreadable, half-pages missing, weird noise or distortion."  I've been griping about this off and on for some time now, but the subject merits a return visit. 

As conceived, GoogleBooks is an obvious boon to readers, including impoverished graduate students and contingent faculty, moderately less impoverished full-time faculty, and independent scholars (the latter of whom can face real obstacles in gaining access to university libraries, let alone research funding).  Those of us who cannot afford to jump on a plane to read the only extant copy of a novel held at the Bodleian now have the opportunity to read it in the (relative) comfort of our homes and offices.  Moreover, the search function means that it is now possible to identify relevant materials--articles in Victorian Christian periodicals, for example--in texts that are scattered across multiple repositories in both Europe and the United States.  I don't think anyone would deny that these are positive results.

However:

1.  As any bibliographer can tell you, a digital copy of a text cannot replace the original.  Specialists in the history of the book lose relevant information (about specific print techniques, ink, paper composition, bookbindings, watermarks, etc.) when faced with digitized texts.  Moreover, even non-bibliographers discover that digitization sometimes worsens flaws in the paper original; for example, scanning a book printed on especially cheap paper sometimes results in excessive "bleed-through" from the verso.  Similarly, faded inks and stereotypes at the end of their lifespan reproduce badly under current conditions, as do some smaller fonts.  Frontispiece engravings may or may not scan properly.  Etcetera.

2.  A badly reproduced book is an only partially useful book.  Or a completely useless book, depending on how badly it was copied.  And it can be more difficult than you think to obtain photocopies of poorly scanned or missing pages, depending on the source library; some libraries, for example, require substantial minimum charges for photocopies that need to be sent via post.  (Someone in the USA who wants a single page from a book held by Cambridge, for example, quickly discovers that the minimum charge is ten pounds, before all of the other fees; right now, that's over US$20.)

Incidentally, switching from page view to plain text HTML frequently produces very scary results--and I'm not just talking about the lost paragraphing.   Punctuation disappears (including quotation marks), words partially vanish or are misspelled, etc.

3.  A poorly digitized text + deaccessioning = headaches.  Libraries will absolutely deaccession print materials if a digital version appears, even though the digital version may be inferior or offer less ease of use.  This can be a real problem with reference materials (years ago, I discovered one library that decided to chuck the entire Union Catalog...) and bound periodicals.  But older texts of all sorts suffer similar fates.

4.  Complementary search techniques.  Online searching works very well with definite targets; it doesn't work very well if a book is missing some obvious keyword (or suffers from typos).  A search for "Popery" won't necessarily net me anything that references the "Church of Rome."  In addition, searching the old-fashioned way--by shelfwalking, sitting down with a stack of periodicals, or working through a library's rare books collection--often yields essential data and/or connections that won't necessarily appear in an online search.  (To finish one of the articles I'm currently planning, for example, I'm still going to have to finance a trip to England, even though I can do a lot of preliminary work on GoogleBooks.)

5.  Editions, editions...  It goes without saying that up-to-date, copyrighted scholarly editions are largely inaccessible on GoogleBooks, given copyright restrictions.  It does not therefore follow that the editions available on GoogleBooks are necessarily editions anyone should cite.  For example, someone in search of John Foxe's Acts and Monuments will only find nineteenth-century editions, despite Thomas S. Freeman's stern warnings about just how appallingly bad they are [1].  Similarly, nineteenth-century translations from Greek and Latin do not always travel well, shall we say (as A. E. Housman so spectacularly pointed out).  Nineteenth-century (and earlier) editors have been known to bowdlerize, silently abridge, or just as silently update older texts, all of which can have dangerous results for the modern academic.

[1] E.g., Thomas S. Freeman, "Texts, Lies, and Microfilm: Reading and Misreading Foxe's 'Book of Martyrs,'" Sixteenth Century Journal 30.1 (1999): 23-46.  JSTOR.  This article focuses on the first and most popular of the four complete Victorian editions, edited by Stephen Cattley and George Townsend.