June 29, 2010
Tim Forsythe wrote:

The Henry Project

Stewart Baldwin has recently updated and greatly expanded The Henry Project. If you are not already aware, The Henry Project includes detailed pages for the ancestors of Henry II, King of England. How does this differ from the plethora of other websites out there, you ask? Stewart Baldwin has gone to great lengths to examine all the primary, contemporary and secondary sources. He cites these sources on each page and discusses at length their conclusions as well as his own. Before each page is put online, it is reviewed by one or more experts in the fields of medieval history and genealogy. The pages are updated periodically as new information comes to light. The Henry Project continues to be the best online resource for these ancestors. While it is true that not every conclusion he makes is widely accepted by his peers, he provides the reader with all the tools necessary to draw their own conclusions. I find it to be most useful in casting doubt on long-held conventional wisdom. After all, the most important claim to be made in genealogy is not who a person's parents are, but the certainty of that claim. My take on Mr. Baldwin's analyses is that he tends to be, if anything, perhaps a smidgen overcautious. Good for him.

Most people, of course, do not have Henry II lying around in their genealogy database waiting to be processed. The Henry Project can also be very useful to anyone with a hankering for medieval history. The detail provided will astound you.

Links

The Henry Project



June 8, 2010
Tim Forsythe wrote:

Adam and VGed Updated

I am pleased to announce the newly updated versions of Adam Version 8.02 and VGed Version 3.02. Check out the What's New sections on the websites for a list of changes.

Links

Adam
VGed


June 4, 2010
Tim Forsythe wrote:

The Power of Association

The GEDCOM Standard Release 5.5, the most popular database format used for transferring genealogical data between applications, provides many of the basic elements needed to properly document your ancestry. Included among these is the ability to link source citations to individual attributes and events. I wrote about the importance of this in my last post, so I won't be rehashing that here. Today I'll be covering a feature that, when used in conjunction with these source references, provides the most powerful tool in validating your ancestry.

All genealogical databases are basically a collection of claims, the most important of which are the relationships between individuals. Children, parents, and spouses can all be neatly linked together through the use of Family records and family references. Unfortunately, the boys over at GEDCOM did not provide any fields in the family record to link source citations to these relationships directly. Unless we can link these relationships to sources, like any other claim, it is not possible to validate these relationships.

The family record does have an indirect way of accomplishing this for spouses by using the marriage record. Any sources linked to a marriage record are usually equally valid for linking spouses because, by their very nature, any reference to the marriage between two individuals will always include both of their names. This, however, is not true of references to births. Perhaps if you are fortunate enough to get your hands on a birth certificate, you will have the parents' names, but in most cases we obtain birth dates by other means. We must find another way to link source citations directly between children and their parents. Fortunately, the girls at GEDCOM provided a method for doing this - the ASSOciation.

The GEDCOM ASSOciation record allows us to define a relationship between any two individuals, but most importantly, it allows us to directly link source citations to this relationship. Fortunately, the GEDCOM team did not define any relationship names, so applications using this feature are allowed to let their users enter the strings that indicate the associated relationships for a son, a daughter, a mother and a father.
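To make this concrete, here is a minimal sketch that writes out the lines for one association with a source citation attached. The layout (ASSO, then TYPE and RELA, then a SOUR citation with PAGE and QUAY beneath it) follows my reading of the 5.5 association structure, so check it against the standard before relying on it. The function name, the record IDs (I1, I2, S1) and the "father" string are made up for illustration.

```python
def asso_lines(level, target_xref, relation, source_xref, page=None, quay=None):
    """Return GEDCOM lines for one ASSO structure carrying a single source citation.

    All names here are hypothetical; the tag layout is an assumption based on
    the GEDCOM 5.5 association structure.
    """
    lines = [
        f"{level} ASSO @{target_xref}@",      # pointer to the associated individual
        f"{level + 1} TYPE INDI",             # record type of the target
        f"{level + 1} RELA {relation}",       # free-text relationship entered by the user
        f"{level + 1} SOUR @{source_xref}@",  # the citation that backs up the relationship
    ]
    if page is not None:
        lines.append(f"{level + 2} PAGE {page}")
    if quay is not None:
        if quay not in (0, 1, 2, 3):
            raise ValueError("QUAY must be 0, 1, 2 or 3")
        lines.append(f"{level + 2} QUAY {quay}")
    return lines


# Example: record on the child that I2 is the father, citing source S1.
record = ["0 @I1@ INDI", "1 NAME John /Smith/"]
record += asso_lines(1, "I2", "father", "S1", page="p. 12", quay=3)
print("\n".join(record))
```

Because the relationship name is just a string, the same helper would serve for a mother, son or daughter.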

When you are entering data into your database, using your editor of choice, you should always enter these associations along with their source references. If your genealogy editor does not provide this feature, I'd recommend finding an editor that does (See Pandora's Box). When presenting your data on the Internet as so many of us are apt to do, make sure that your family website generator also supports displaying these associations and linked sources. If not, ditto, more of the same.

The following image shows an example presentation


In this case the plus marks next to the father's name indicate the certainty of the relationship based on the source reference quality. No sources are listed next to the mother's name indicating that the relationship is not supported by evidence.
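The idea behind the plus marks can be sketched in a few lines. This is only an illustration, assuming one plus mark per point of the best quality score (0 to 3) among the citations linked to an association; the function names are hypothetical and a real generator may well do it differently.

```python
def certainty_marks(quays):
    """Return one '+' per point of the best quality score, or '' when there are no citations."""
    if not quays:
        return ""
    return "+" * max(quays)


def describe(name, quays):
    """Format a parent's name followed by its certainty marks, if any."""
    return f"{name} {certainty_marks(quays)}".rstrip()


# Father has three citations of differing quality; mother has none.
print(describe("Father", [1, 3, 2]))
print(describe("Mother", []))
```

One weakness of this simple mapping is that a relationship backed only by a quality 0 source looks the same as one with no source at all, so a real display would probably want to tell the two apart.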

If you are forward thinking enough to use the quality field to rate your associations (as you should) (See The Law of Averages), then your genealogy applications should be bright enough to determine the certainty of these relationships and display them for you. The following image shows how these database fields can be used together to determine uncertain ancestors and then indicate these in an ancestor list.


Links

The GEDCOM Standard Release 5.5


May 31, 2010
Tim Forsythe wrote:

The Law of Averages

There is a wealth of information on the Internet, coming in all shapes and sizes, that can aid us in our genealogy research. Probably the single most common format is the family website. The family websites that I am referring to here are those that contain a simple presentation of the genealogy data of Joe and Jane Average. By and large these presentations are either generated by a genealogy editor that provides an option to create a website or by standalone website creators that convert GEDCOM files. Unfortunately, most of these websites do not present their claims in a form that can be used reliably by others.

Anyone who has been doing research for very long learns fairly quickly that not all genealogy claims are alike. In order for a genealogy claim to be of any value to us, and that includes Joe and his lovely wife, we must be able to measure its quality or certainty. This is usually accomplished by checking the referenced sources, and this is the primary reason why family websites are of little value. I would estimate that 95% of them do not reference sources to back up their claims. It seems the Averages would like us to accept the fruits of their labor on faith alone. I would advise anyone doing research not to fall victim to The Law of Averages.

The Experiment

I thought I would test out my theory, so I decided to run a little experiment. I took one of my Colonial ancestors who was quite an important figure in his time and place, but not so important that he is mentioned in history books. In my data I make 210 claims and back these up by referencing 25 sources. He is obviously someone for whom it is not difficult to find data. My experiment was quite simple: I typed his name and birth date into the Google search bar and cataloged all the family websites in the order they were found over the period of one hour. I used only family websites where he was listed as the head of his family; in other words, I did not rate any site where he was incidentally listed. I gave each site a score based on the number of claims made that were backed up by source references. Each claim with a reference was worth 1 point. If the reference had a quality score of any sort it got an additional point, and if the reference linked to or included quoted citations it was worth an additional 5 points. Often when sources are referenced, they are not referenced at the claim level, but instead at the individual level. These score the same as other references, but only once per individual, not per claim made, because it is impossible to distinguish which claims are associated with that source. In all I found 27 sites whose average rating was 1.5 points. Only three of the websites listed any sources, none of them indicated quality, and all of them were at the individual level. Only one website showed quoted citations, earning it a score of 15 points. As a comparison, my website for this individual received 786 points.
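For anyone who wants to rate sites the same way, the scoring rules can be written down as a short sketch. The class and function names are hypothetical, and the code reflects one reading of the rules: 1 point for a referenced claim, 1 more if the reference carries a quality score, 5 more if it links to or includes quoted citations, with individual-level references counted once per individual.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """One referenced claim, or one individual whose sources are listed only at the individual level."""
    has_quality: bool = False     # the reference carries a quality score of some sort
    has_quotation: bool = False   # the reference links to or includes quoted citations


def evidence_points(ev):
    """1 point for the reference, +1 for a quality score, +5 for quoted citations."""
    return 1 + (1 if ev.has_quality else 0) + (5 if ev.has_quotation else 0)


def site_score(claim_level, individual_level):
    """Total for one site.

    claim_level: one Evidence per claim that has its own reference.
    individual_level: one Evidence per individual whose sources are not tied
    to specific claims (counted once per individual, not per claim).
    """
    return (sum(evidence_points(e) for e in claim_level)
            + sum(evidence_points(e) for e in individual_level))


# A site listing bare sources for one individual, with nothing at the claim level.
print(site_score([], [Evidence()]))
```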

Supporting Your Claims

I am almost completely positive that Jane and her knightly husband would, if they knew better, provide documentation to support the claims they've published, and therein lies the real purpose of this blog: to provide some insight for those who aren't sure how to go about this. The first step in documenting your sources is to understand the basic types of sources. The GEDCOM Standard Release 5.5 defines 4 basic types (I've gone much further in my stab at a better standard. See GREnDL). These are: primary, secondary, questionable and unreliable. Briefly, primary sources are the original documents relating to a person's life, e.g. a birth certificate or census record. Primary sources are the best source of reliable information. This is not to say that they are not questionable. Many primary sources have mistakes. Take census records, for example. I'd say about 80% of the census records in my data list the ages of individuals in error by more than 1 year. Secondary sources are basically direct written copies of primary sources, where someone has taken the time to manually duplicate the text. Secondary sources are subject to copying errors, intentional corrections and unintentional biases of the copier. For these reasons they are not as reliable, so should be backed up whenever possible by independently derived sources. Unreliable sources are those for which it is known the data therein is not dependable, usually because primary or secondary sources have been found in contradiction. Family websites almost always fall into this category even when they reference sources. All other sources are considered questionable and include the bulk of every genealogist's database. Some of these will be more questionable than others.

The GEDCOM standard provides several tools for documenting your sources. These include support for adding Source records with quoted citations, referencing these sources from claims and providing a Quality score along with the reference. The standard also allows you to indicate the page number in the referenced source where the claim is documented. These features provide more than sufficient coverage for most genealogists to document their sources. Why is it then that very few family websites use them? I think there are two basic reasons. Firstly, the Averages are a bit lazy. It takes work to add sources to a database, and referencing a source for each of its claims can be exhausting. The alternative, however, is to have a genealogy full of unreliable information, which makes it worth not much more than the paper it is printed on. The second reason that Jane and Joe don't bother documenting their sources is that the genealogy editor that they use does not make this easy, and may not even provide the ability to rate the quality of the reference. If this is the case, I'd recommend to anyone that they immediately go get themselves a new editor that provides these features. In my opinion, having the ability to cite, reference and rate sources is one of the most important aspects of creating a reliable genealogy. Changing genealogy editors can be a feat in itself. See my post Pandora's Box for a guide on how to go about this.
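To get a feel for how much documenting lies ahead, a short script can list the claims in a GEDCOM file that have no source reference beneath them. This is a rough sketch, not a proper parser: it assumes well-formed "level tag value" lines, looks only at individual and family records, and treats a hand-picked set of level 1 tags as claims. The function name and the default tag list are my own choices for illustration.

```python
def undocumented_claims(gedcom_text, claim_tags=("NAME", "BIRT", "DEAT", "MARR", "OCCU")):
    """Return (record_id, tag) for each level-1 claim with no level-2 SOUR line beneath it."""
    missing = []
    record = None    # id of the current INDI or FAM record, else None
    current = None   # [record_id, tag, has_source] for the claim being scanned

    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level, tag = int(parts[0]), parts[1]

        if level <= 1:
            # A new level 0 or level 1 line closes the claim being scanned.
            if current is not None and not current[2]:
                missing.append((current[0], current[1]))
            current = None

        if level == 0:
            is_person_or_family = len(parts) == 3 and parts[2] in ("INDI", "FAM")
            record = tag if tag.startswith("@") and is_person_or_family else None
        elif level == 1 and record is not None and tag in claim_tags:
            current = [record, tag, False]
        elif level == 2 and tag == "SOUR" and current is not None:
            current[2] = True

    if current is not None and not current[2]:
        missing.append((current[0], current[1]))
    return missing


sample = """0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1700
2 SOUR @S1@
3 QUAY 2
1 DEAT
2 DATE 1750
0 @S1@ SOUR
1 TITL Parish register
0 TRLR"""

for record_id, tag in undocumented_claims(sample):
    print(record_id, tag)
```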

Presenting Your Genealogy

Once you've gone through the task of throwing out all your undocumented claims and adding references and quality scores for those you can, you'll want to republish your website so that all your hard work can be appreciated. In order to do this you must make sure that whatever website creator you use has at a minimum the ability to display your source references and quality scores for each claim and link back to the sources where quoted citations can be viewed. I cannot stress enough how important it is to include all referenced claims, even those that you think are invalid. It is important that you leave it up to the viewer to decide for themselves a claim's validity. Weeding out invalid claims from your database is akin to censorship. One important advantage of showing these claims is that it gives your viewers the ability to judge the overall quality of the source, not just the reference. Any source with invalid claims will pretty quickly be recognized as unreliable.

The following screen capture shows an example of a good presentation and comes directly from my family website. The website was generated using my homegrown program: Adam. You'll notice one of the death dates is shown as disproved. The plus signs indicate the highest quality score for the referenced sources. If you click on the image and go to the web page, you will see that when you hover over a source reference, the quality of each reference is shown in a floating box. You can also click on any of the sources to see the quoted citations.


I encourage everyone to begin the process of documenting each claim made in their genealogical database using the methods described in this post. You won't regret it and your viewers will appreciate all the hard work you've done. More importantly, your children will not need to start over. It's time to quit being just another Average Joe.

Links

The GEDCOM Standard Release 5.5
GREnDL
Adam


May 30, 2010
Tim Forsythe wrote:

Adam 8.01 Released

I am pleased to announce the newly updated version of Adam Version 8.01. Adam is a family history website generator that parses GEDCOM 5.5 files. Adam was rewritten from scratch to be faster and was redesigned to create a better presentation for your genealogical data. Many features have been updated and new ones added. You can check out the What's New section on my website for a more detailed list.

Links

Adam


May 22, 2010
Tim Forsythe wrote:

VGed 3.01 Released

I am pleased to announce the newly updated version of VGed Version 3.01, a GEDCOM 5.5 validating program. It's fast, accurate and provides several options to enable and disable validation features that are commonly violated. Besides validating your file, it can locate both missing and dead records. New to this release is an option to validate the format of Gregorian dates. Also new is the option to validate CONTinuation tags when used to extend the length of data beyond the 255 character limit in places other than the commonly used places like notes and source citations. Unfortunately, few programs currently permit the use of these tags, so it may be necessary to detect when they are being used.

Links

VGed


May 21, 2010
Tim Forsythe wrote:

Pandora's Box

In Through the Looking-Glass, Lewis Carroll wrote "'The time has come,' the Walrus said, 'to talk of many things.'" He had no idea what kind of Pandora's box, or perhaps in his case it was an oyster's shell, he was opening up. The sheer number of blogs in the new blogosphere (what they used to call the Internet) is evidence enough of that. The way I figure it, it is my duty and obligation as a citizen of this blogosphere to take my rightful place and rant about whatever I like. Actually I just want to talk about genealogy. I've got quite a lot to say about that.

In this first post I'd like to "Begin at the beginning." Where's that, you ask? The data.

Data Security

We all collect data and store it away. Whether it is handwritten notes, photocopies, or just digital files on a thumb drive we carry from room to room, we've all got precious data we want to protect. I know that a lot of you out there are like me with boxes and boxes piled high in some out-of-the-way location like a seldom-used guest room or the stinky old attic. Mine are wedged into one side of my closet. My clothes are relegated to the other. Over the last 15 years or so I've gradually migrated the most critical of this into digital form which I have backed up onto any number of hard drives scattered about the country. I love my data and will do anything to protect it. And of all this data, the files that hold the pertinent information on my ancestors, the most precious of all, the stuff that if I lost it I would go crazy and tear what's left of my hair out - that data - it seems no one else cares. I'm referring to the myriad software manufacturers who will gladly load up my data into what appears at first glance to be a well-oiled machine and then throw half of it away because it doesn't meet some arbitrary criteria of what they consider important.

Lack of Standards

There is not, unfortunately, a standard in which all genealogical data is kept. That would be too easy. The several dozen software companies who vie constantly for your genealogical dollar have an incentive to get you to use their products and then make it difficult for you to switch to a competitor later. I get the logic in that, but what I don't understand is why they are willing to throw away years of hard work, yours not theirs, to ensure you cannot switch. What am I talking about, you ask? I am talking about exporting your data from one application and then importing it into another. The only way you can switch from one company's product to another is if this process is smooth and error free. There are two ways in which this can be accomplished. The first is to transfer the data using the proprietary formats of the applications themselves. These databases are not viewable in the traditional sense so you have no recourse but to trust they behaved correctly. If you are changing applications, you are probably unhappy with the first and have no history with the second, so it is unlikely you have that trust built in. The only way to reliably make the switch is to use the second approach. Export your data from the old application into a GEDCOM file and then import the file into the new application. I have done a lot of testing on many versions of genealogical software over the years, and almost without exception this process is anything but smooth or easy.

GEDCOM To The Rescue

This problem could be easily solved if makers of genealogical software would come together to create an internationally recognized standard and adhere to it. Although there is no official standard out there, there is a recognized de facto standard in Release 5.5 of the GEDCOM Standard. Most software packages today will let you import and export GEDCOM files. Many of them will make bold claims about being 100% compliant. This gives customers the impression they can safely export their data to a GEDCOM file and then import it into another program and keep on rolling. Unhappily this is not the case. Chances are if you do a fair amount of research and have a complex database with notes, cited sources, photographs, etc., when you export your file you'll find it is not GEDCOM compliant at all. Even if it is, there is a good chance that some of your data is either missing entirely or has been placed into application specific tags that are not readily understood by other products. Let's say you are serious and really pissed and don't feel like giving up at this point, so you go through the rigorous task of straightening out your precious data. What happens next? First you throw out the old software package that screwed you and then import your GEDCOM file into the latest version of 100% compliant GEDCOM software only to find that much of your data has gone down the rabbit hole. Worse yet, you may not even know that this has happened for months. A typical database of 5000 persons may have 50,000 pieces of data. How long do you think it will take you to find out that all of your occupation claims are missing? Once a few weeks have gone by and you have used the product to enter new data, there is usually no going back. Time to repair the damage yourself - again. Unfortunately, there is no solution that is easy for everyone. 
At a minimum though, it is imperative if you plan on changing products, and this includes upgrading, that you verify both packages' ability to import and export data. Remember the key here is not 100% GEDCOM compliant, but instead 100% data reliant.

Getting It Right ... Again

There are several things you must do. First you must become familiar with the GEDCOM standard. It is really not difficult. The key is to know what tags there are and which ones you use regularly. When you export a GEDCOM file from your old application you are going to have to examine it manually for errors. No one else will be able to help you with this, at least not for free. Depending on how seriously you want to protect your data, you can at this point manually scrub your file and get it put right, or accept this version as your starting place and choose to go forward from here. Either way, you'll need to validate that the file is GEDCOM compliant before proceeding. As you would expect, there are a plethora of GEDCOM validators out there. You can download a copy of mine from my web site at VGed. VGed is easy to set up and use and best of all it's free. It is a standalone Windows program so you can move it to your favorite utility folder and double-click on it to run. Press the Open button to open a file and then Validate to validate it. By default VGed will validate the file for compliance. There are several settings, as explained on the web site, that give you some expanded control of what errors or warnings you would like displayed. You can save the log file for later reference and reload it back into VGed to display. VGed does not correct errors for you and does not modify your GEDCOM file. You will need to fix these errors manually with a text editor. This may take several hours, or perhaps days, but when you are through you will have a GEDCOM compliant and data reliant digital file with all your precious data restored. Back it up now and don't let anyone muck with it ever again - unless you enjoyed this process.
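A quick way to learn which tags you actually use is to count them. The following is a minimal sketch and no substitute for a validator: it assumes each line looks like "level tag value" (or "level @id@ tag" for record headers) and simply tallies the tags. The function name is made up for illustration.

```python
from collections import Counter


def tag_counts(lines):
    """Count how often each GEDCOM tag appears in an iterable of lines."""
    counts = Counter()
    for raw in lines:
        parts = raw.strip().split(" ", 3)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        tag = parts[1]
        if tag.startswith("@") and len(parts) > 2:
            tag = parts[2]  # record header such as "0 @I1@ INDI"
        counts[tag] += 1
    return counts


before = ["0 @I1@ INDI", "1 NAME John /Smith/", "1 OCCU Farmer", "1 _URL http://example.com", "0 TRLR"]
after = ["0 @I1@ INDI", "1 NAME John /Smith/", "1 _URL http://example.com", "0 TRLR"]

# Tags you use, most common first.
for tag, n in tag_counts(before).most_common():
    print(tag, n)

# Subtracting two tallies shows tags that went missing in a later export.
print(tag_counts(before) - tag_counts(after))
```

To run it against a real file, pass an open file object in place of the list, for example `tag_counts(open("myfile.ged", encoding="utf-8"))`, where the file name and encoding are whatever applies to your export.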

Testing It Out

Now comes the really educational part. The process of testing new software packages. You can try them all yourself or browse various blogs like this one and the one at http://www.tamurajones.net to read reviews that specifically cover products' ability to import and export GEDCOM files. In the end, you'll want to test them yourself before giving them complete control of your data. You want to be absolutely sure they'll treat it safely.

The procedure goes like this: You want to first import your GEDCOM compliant file into the new software package and then immediately, before making any changes, export the data back out to another GEDCOM file. You'll know you are in trouble if on import they generate a log telling you they found problems. Since your file is compliant, any problems they found are theirs, not yours. Some applications won't actually tell you they found a problem on import. Some of them will create an import log file but not show it to you. You should, at a minimum, check the install directory of this program to see if they've created a file and view it if found. I think it's safe to say that if they cannot import your file correctly it's time to move on. Hopefully they can and you can now check the export file. First run it through the validator to make sure it is GEDCOM compliant. If the file is compliant you next need to check that the data is identical to your import file. This does not necessarily mean that the files are themselves identical. Common differences you can expect are line length changes for note and citation text records and the addition of application specific tags. Neither of these is critical and both can be expected. To compare the files, you can obtain any of the text difference engines that are freely downloadable off the Internet. KDiff3 will do the job. Look at every difference in the file to make sure your data is intact. If this worked, you are a third of the way there. The next step is easy. Import the exported file and export a second time. I've found, oddly enough, that some programs cannot reliably read their own exported files. Now validate the exported file and compare the differences; hopefully they're identical. The last step is the most crucial. Spend a few hours adding data to every field you can find, particularly those you know you use a lot. Try to keep at least an outline of the types of operations you performed for comparison later. 
Don't worry about messing up your data, because you have backed up your original file. When you're ready, export it a third time, then import it and export it again. Validate and compare to be sure they handle all the new changes properly. I know this seems like overkill, but you are looking for a package that won't destroy your data. At this point you are probably safe to go ahead and start using the package. If you are a power user you will probably want to make one more pass. Before making that last pass, edit your GEDCOM file and add user defined tags, some with line values, some without, and do this at every level. Don't be shy. User defined tags are any tags that begin with an underscore (e.g. 3 _URL http://www.rumblefische.com). Import the file to make sure they did not have any issues with these and then export. If the files compare, I think you are good to go. I've never found a single package that would do all of these operations perfectly, so you are going to have to compromise. Every package will have some limitations. I would look for the one with the fewest issues, and issues you know you can work around. For instance if a package will not accept user defined tags, but you never use them, this is probably not an issue. Or perhaps you'll have to move all your web links into a note instead of a _URL tag. So be it. As long as the data is preserved, that's the key.
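If you would rather script the comparison than eyeball it in a diff tool, here is a rough sketch using Python's standard difflib module. Since line length changes in notes and citations are expected, it first folds CONC lines back into the line they extend and then diffs the result. The function names are hypothetical, and the folding is deliberately simple: it assumes every line is "level tag value" and does nothing special for application specific tags.

```python
import difflib


def fold_conc(lines):
    """Join CONC lines onto the line they extend so re-wrapped text compares equal."""
    folded = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        parts = line.split(" ", 2)
        if len(parts) >= 2 and parts[1] == "CONC" and folded:
            prev = folded[-1]
            if len(prev.split(" ", 2)) < 3:
                prev += " "  # the extended line had no value of its own yet
            folded[-1] = prev + (parts[2] if len(parts) == 3 else "")
        else:
            folded.append(line)
    return folded


def compare_exports(before_lines, after_lines):
    """Return unified-diff lines between two exports, ignoring CONC re-wrapping."""
    return list(difflib.unified_diff(
        fold_conc(before_lines), fold_conc(after_lines),
        fromfile="before", tofile="after", lineterm="", n=0))


before = ["1 NOTE Hello wor", "2 CONC ld", "1 OCCU Farmer"]
rewrapped = ["1 NOTE Hel", "2 CONC lo world", "1 OCCU Farmer"]
lossy = ["1 NOTE Hello world"]

print(compare_exports(before, rewrapped))      # only the wrapping differs
print("\n".join(compare_exports(before, lossy)))  # the occupation has been lost
```

An empty result means the two exports carry the same data once the wrapping is ignored. Anything else still needs to be looked at by hand, one difference at a time.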

The Scary Details

The following is a list of notes I took when I went through this process several years ago with a file I exported from PAF and tried to import into 6 of the most popular packages. If you're thinking well, they've probably all fixed their issues by now, I tried this on another package yesterday and never got past the first step, and the problems were worse. I'll be posting on this experience soon.

  • 3 sources are missing
  • ABBR gobbles first return
  • ADDR info lost except PHON moved to RESI
  • ADDR is lost
  • CAUS lost
  • CHAN reset to import date
  • DD MMM YYY1/YYY2 changed to DD MM YYY2
  • DD MMM/MMM YYYY changed to DD MMM-MMM YYYY
  • EVEN are lost
  • EVEN type "Ship" changed to "Event-Misc"
  • NICK is lost
  • NICK is moved to a new name
  • SURN and GIVN are lost
  • Sometimes notes are concatenated to title and text is garbled
  • Sources combine notes and citations into notes using a semicolon to separate them
  • TIME is lost
  • _AKA is moved to a new name
  • _TAGs are lost
  • _TAGs are lost including _AKA
  • added DEAT Y to living persons
  • changes YYYY to YYYY/yy if it falls between 1583 and 1752
  • chopped off citation if longer than 255 characters
  • convert "Invalid Date" to Jan 0 B.C.
  • converted 12/15 Mar 1941 and 27/28 Jun 1877 and 1889/1890 to "Invalid Date"
  • converted 8/9 Jul 1974 to "Or:1974/07/08-1974/07/09"
  • converted 8/9 Jul 1974 to 8 Sep 0 B.C.
  • converted PUBL to ABBR
  • date day is made to be 2 digits
  • double quotes in notes are changed to single quotes - breaks HTML
  • family notes are lost
  • if no day or month YYYY/YYYY changed to BET YYYY AND YYYY
  • if no day or month YYYY/YYYY changed to YYYY-YYYY
  • if only one spouse, family note moved to person note
  • if short PUBL, moved start of note to ABBR (gobbled 1st char) - rest of note in note
  • loses Gender, some family refs, and notes
  • loses all CHAN tags
  • loses all formatting on source notes and citations
  • lost 2 sources unless 2 of original weren't used
  • lost about 30 sources
  • lost one source
  • moved citation to NOTE in quotes
  • moves "Person Source" notes back to root sources
  • notes lose some tabs
  • notes spacing is messed up
  • person sources are moved to name sources
  • removed more tabs from notes
  • root sources are moved to the source of a note called "Person Source"
  • source notes are appended to publisher or title/abbr
  • suffixes are removed from name and placed in NSFX tags
  • tabs in notes are lost

Final Words

I do apologize for the long-windedness of this blog, but preserving your data should be every genealogist's primary goal. Be the Walrus, not the oyster.

You may be asking yourself if I know of a package that will work for you. The short answer is no. Every genealogist has a different set of needs. I have, however, found a package that works for me. I have been using it for about 7 years and it has never once in all that time lost so much as 1 bit of data, and believe me I check. I'll be discussing this package in a future blog, but so as not to keep you in suspense, the program is Family Historian.

Links

The GEDCOM Standard Release 5.5
VGed
http://www.tamurajones.net
KDiff3
Family Historian