Consensus Transcriptions, or, What Is Our Hard Work Doing?

We’re 10 months into Decoding the Civil War, and some people may be wondering when they’ll see some fruits of their extensive (and greatly appreciated!) efforts. Well, we have published two of the ledgers to the Huntington Digital Library (3 and 24), and we hope to publish another one later this week.

But how do we get from the mass of data to the finished transcription?

It is a bit like sausage making. Our clever DCW developers take all of the data from your transcriptions, run it through an algorithm, figure out what the most common text for each and every word is, and then use those most common words to create the consensus text.
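For the curious, here is a minimal sketch of the "most common word wins" idea. It is not the DCW developers' actual code: the function name `consensus_line` and the sample transcriptions are made up for illustration, and it assumes every volunteer split the line into the same number of words, which real data often does not.

```python
from collections import Counter
from itertools import zip_longest


def consensus_line(transcriptions):
    """Pick the most common word at each position across transcriptions.

    Hypothetical helper: assumes words line up by position, which is a
    simplification of whatever alignment the real pipeline performs.
    """
    tokenized = [t.split() for t in transcriptions]
    consensus = []
    # zip_longest pads shorter transcriptions with None so nothing is dropped.
    for column in zip_longest(*tokenized):
        words = [w for w in column if w is not None]
        most_common_word, _count = Counter(words).most_common(1)[0]
        consensus.append(most_common_word)
    return " ".join(consensus)


# Three invented volunteer transcriptions of the same line.
volunteers = [
    "come to Alexandria I cannot",
    "come to alexandria I cannot",
    "Come to Alexandria I cannot",
]
print(consensus_line(volunteers))  # prints: come to Alexandria I cannot
```

Each word is decided by majority vote on its own, so the consensus line can combine choices that no single volunteer typed in exactly that form.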

Take this page, for example:

[Image: mssEC_24_087, demo of consensus transcription]

The text returned to us from the consensus algorithm was:

23 Di 23 Aug 1130am 30am axis Byron will Corps will will Come to alexandria. I cannot yet decide as to Belgrave Corps new Yoke will be sent to replace Barnard as soon as possible but just now we Have no time to make the Exchange we bought are momentarily Expecting handsome war near Warrenton and every a vessel must be immediately sent discharged for your youth sign applause Eleven back Mes Wash 7th for Sheldon agate send to Pagoda as promptly as possible the first China China velsmay & third & fourth Cuba stanhope now at nasty signed applause gertrude gertrude

This may look like a bit of a mess, but consider the transcription that we had before DCW:

 

That’s a big old empty spot, in case you were wondering. Once we receive this data, Project Leader Mario E. and I do a quick scan of the text, in order to correct obvious errors and identify words that might require further investigation. So, looking at that same text, we might come up with this:

23 Di 23 Aug 1130am 30am axis Byron will Corps will will Come to alexandria. I cannot yet decide as to Belgrave Corps new Yoke will be sent to replace Barnard as soon as possible but just now we Have no time to make the Exchange we bought are momentarily Expecting handsome war near Warrenton and every a vessel must be immediately sent discharged for your youth sign applause Eleven back Mes Wash 7th for Sheldon agate send to Pagoda as promptly as possible the first China China velsmay & third & fourth Cuba stanhope now at nasty signed applause gertrude gertrude

Some of the issues that pop up are merely a result of volunteers spacing the words in a variety of ways – obviously some people typed “1130am” while others typed “11 30am”. Others require a more critical examination of the text, such as the end of the first telegram, where “Eleven back Mes” is reinterpreted as “Eleven a M”. And still others, like that misplaced “a” in the middle of the text, are accurate, if odd, parts of the transcription.
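Spacing disagreements like these tend to leave a recognizable trace in the consensus text: a word repeated twice, or a word followed by its own tail. As an illustration only (this is not a tool we are describing, and the name `flag_suspect_pairs` is hypothetical), a quick scan for such pairs might be sketched like this:

```python
def flag_suspect_pairs(text):
    """Return (index, left, right) for adjacent tokens that repeat or overlap.

    Hypothetical reviewing aid: it only suggests places to look, and it
    cannot tell a transcription artifact from a genuine repetition.
    """
    tokens = text.split()
    flagged = []
    for i in range(len(tokens) - 1):
        left, right = tokens[i].lower(), tokens[i + 1].lower()
        # An exact repeat ("will will") or a token that ends with its
        # neighbour ("1130am 30am") hints at a spacing disagreement.
        if left == right or (len(right) > 1 and left.endswith(right)):
            flagged.append((i, tokens[i], tokens[i + 1]))
    return flagged


# The opening of the consensus text quoted earlier in this post.
sample = "23 Di 23 Aug 1130am 30am axis Byron will Corps will will Come to alexandria"
for index, left, right in flag_suspect_pairs(sample):
    print(index, left, right)
# prints:
# 4 1130am 30am
# 10 will will
```

A flagged pair is only a candidate for a human look at the page image; the odd but accurate “a” mentioned above would not be caught or corrected by a check like this.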

The transcription as it appears in the Huntington Digital Library looks like this:

Fr Di 23 Aug 11 30 am
for axis Byron Corps will
come to Alexandria I cannot
yet decide as to Belgrave
Corps new Yoke will be
sent to replace Barnard as
soon as possible but just
now we Have no time
to make the Exchange we <deletion>caught</deletion>
are momentarily Expecting handsome war near
Warrenton and every a vessel must
be immediately sent discharged for your
youth signed applause Eleven a M

Wash 7th
for Sheldon agate send to Pagoda as
promptly as possible the first China <unclear>velsmay</unclear>
& third & fourth Cuba <insertion>NY</insertion> Stanhope
now at nasty signed applause gertrude

We can’t guarantee that our interpretations are correct, but we hope to provide a faithful transcription of the text as it appears on the page. And we no longer have that big old empty spot, but rather usable, hard data. That is some pretty sweet fruit!

5 responses to “Consensus Transcriptions, or, What Is Our Hard Work Doing?”

  1. Marlys Sebasky says :

    Velsmay could be cavalry…

  2. SarahTheEntwife says :

    Neat; thanks for the behind-the-scenes look!

  3. Craig says :

    Question (as usual) – but first thanks for another great insight into where this fascinating project is going.

    The question: do you, Kate and Mario E., get to review all 10 transcriptions when you are making decisions? The thinking being that an obvious and correct transcription may have been dropped because the consensus went against it… could that be a big help or no?

    Good luck with the “challenge” which seems to be off to a good start!

    • katecpeck says :

      Hi Craig, thanks for the question! All of the data is preserved even after the consensus is arrived at, and one of our developers has created an interface for viewing the variations that is super cool. Unfortunately it would be extremely time-consuming to check every word, and our goal is to get the transcriptions out to the public as quickly as possible. We expect someone to do a fuller analysis of this data in the future, but for now we’re acting as stewards of all of the data resulting from all of our volunteers’ efforts!

  4. Craig says :

    Once again thank you Kate..it’s obviously quite a stewardship that you and the others are charged with…..and the results are very impressive!
