The Windows GEDCOM Validator 2.0


What is The Windows GEDCOM Validator?

The Windows GEDCOM Validator is an updated version of The Gedcom Validator 1.0. It is a fully compliant GEDCOM 5.5 parser that will open a file, parse it, validate it based on the options configured, print appropriate warning messages to a file or the screen, build a database of GEDCOM records, and then optionally save the resulting database back to a file for comparison.


History

GEDCOM 5.5 is the standard format used to transfer genealogy data between different programs or applications. The idea is that if everyone can import and export GEDCOM files, then users can switch programs at will to meet their needs. [I personally feel the GEDCOM format has way too many limitations (see the XML specification for GREnDL 1.1 for an alternate approach - and yes, I wrote it).] For people who have spent years, or in some cases decades, entering their genealogy data into a program, the idea of switching to another program may seem daunting. It shouldn't be. It should just be a matter of export/import. Voilà. Right? Wrong! For some reason unknown to me, most genealogy programs DO NOT support the GEDCOM format correctly. Intentionally or not, they either provide incomplete support or inject errors during the import and export processes. This leads to data loss in all but a very few cases. It would not be so bad if a user knew which data had been lost (or, more appropriately, destroyed), but unfortunately most programs simply do not inform the user except when gross import errors occur.

I recently found myself in this exact predicament. I had been using Personal Ancestral File 5.0 (PAF) for years, and though I really liked the price (free), I was tired of the program's inability to handle basic GEDCOM support. So in an attempt to modernize, I started downloading free, demo, and trial versions of other genealogy programs to see what kinds of trouble I would have in porting my genealogy data. As it turns out, I was going to have lots of trouble. To test the programs, I first had to correct the GEDCOM errors in the PAF export. To accomplish this, I wrote The Windows GEDCOM Validator. The Windows GEDCOM Validator is a Windows MFC application using WGV 1.02 (The GEDCOM Parser) for its GEDCOM engine. WGV 1.02 is a fully compliant GEDCOM parser and validator - and yes, I wrote that too. WGV does not validate enumerations. I also used The Windows GEDCOM Validator to create a test.ged file containing at least one of every record type. I then imported these GEDCOM files into several of the available genealogy programs and exported them back again. With only a single exception, none of the programs were able to do this successfully, not even many of the most popular, fancy, high-priced ones. Each had its own problems, but all of them introduced errors severe enough to require many, many hours of manual fixing - that is, supposing you knew the problems existed in the first place. Most of these errors were not announced by the programs during import. I spotted them because I ran the imported and exported files through a difference engine to compare the results.

The following is a list of some of the errors found:

  • 3 sources are missing
  • ABBR gobbles first return
  • ADDR info lost except PHON moved to RESI
  • ADDR is lost
  • CAUS lost
  • CHAN reset to import date
  • DD MMM YYY1/YYY2 changed to DD MM YYY2
  • DD MMM/MMM YYYY changed to DD MMM-MMM YYYY
  • EVEN are lost
  • EVEN type "Ship" changed to "Event-Misc"
  • NICK is lost
  • NICK is moved to a new name
  • SURN and GIVN are lost
  • Sometimes concats notes to title and garbles text
  • Source combine notes and citations into notes using a semicolon to separate them
  • TIME is lost
  • _AKA is moved to a new name
  • _TAGs are lost
  • _TAGs are lost including _AKA
  • added DEAT Y to living persons
  • changes YYYY to YYYY/yy if it falls between 1583 and 1752
  • chopped off citation if long
  • convert "Invalid Date" to Jan 0 B.C.
  • converted 12/15 Mar 1941 and 27/28 Jun 1877 and 1889/1890 to "Invalid Date"
  • converted 8/9 Jul 1974 to "Or:1974/07/08-1974/07/09"
  • converted 8/9 Jul 1974 to 8 Sep 0 B.C.
  • converted rest of PUBL to ABBR
  • date day is made to be 2 digits
  • double quotes in notes are changed to single quotes - breaks HTML
  • family notes are lost
  • if no day or month YYYY/YYYY changed to BET YYYY AND YYYY
  • if no day or month YYYY/YYYY changed to YYYY-YYYY
  • if only one spouse, family note moved to person note
  • if short PUBL, moved start of note to ABBR (gobbled 1st char) - rest of note in note
  • loses Gender, some family refs, and notes
  • loses all CHAN tags
  • loses all formatting on source notes and citations
  • lost 2 sources unless 2 of original weren't used
  • lost about 30 sources
  • lost one source
  • moved citation to NOTE in quotes
  • moves "Person Source" notes back to root sources
  • notes lose some tabs
  • notes spacing is messed up
  • person sources are moved to name sources
  • removed more tabs from notes
  • root sources are moved to the source of a note called "Person Source"
  • source notes are appended to publisher or title/abbr
  • suffixes are removed from name and placed in NSFX tags
  • tabs in notes are lost

To test a genealogical application:

1. import test.ged into the genealogy application
2. export a GEDCOM file - export1.ged
3. reimport export1.ged
4. re-export - export2.ged
5. now validate export1.ged
6. now validate export2.ged
7. now compare test.ged to export1.ged
8. now compare export1.ged to export2.ged

Importing and exporting from your application should show no errors. The validations should show no errors. The compared files should be identical. You can use WGV to accomplish the validation checks. There are numerous difference engines available on the internet for doing the comparison checks.
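
The comparison steps can be sketched in a few lines of Python using the standard difflib module. This is only an illustration of the idea, not how any particular difference engine works; the function name compare_gedcom is made up for this example, and the files are assumed to have already been read into lists of lines.

```python
import difflib

def compare_gedcom(a_lines, b_lines, a_name="test.ged", b_name="export1.ged"):
    """Return unified-diff lines between two GEDCOM files given as lists of lines.

    compare_gedcom is a hypothetical helper name, not part of WGV.
    """
    # Strip only the line terminators, so lost tabs or trailing spaces
    # inside the line content still show up as differences.
    a = [line.rstrip("\r\n") for line in a_lines]
    b = [line.rstrip("\r\n") for line in b_lines]
    return list(difflib.unified_diff(a, b, fromfile=a_name, tofile=b_name, lineterm=""))

# Example: the exporting program silently dropped a NICK line.
original = ["0 HEAD\n", "0 @I1@ INDI\n", "1 NAME John /Doe/\n", "2 NICK Jack\n", "0 TRLR\n"]
exported = ["0 HEAD\n", "0 @I1@ INDI\n", "1 NAME John /Doe/\n", "0 TRLR\n"]
for diff_line in compare_gedcom(original, exported):
    print(diff_line)
```

An empty result means the two files match line for line, which is what a clean round trip should produce.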


Where can I download WGV?

The Windows GEDCOM Validator Download

Unzip the above file and move any DLLs to your windows/system32 folder.

What options does The Windows GEDCOM Validator support?

Validate Line Length - This forces checking of line value lengths, which are often not adhered to by genealogy programs. This may need to be disabled out of necessity, because the GEDCOM standard made some of these limits too small.

Validate Max Occurances - This forces checking for records that occur more often than the maximum number the GEDCOM standard allows in a given context. In some cases, genealogy programs allow this limit to be violated, for instance, if a genealogy program allows more than one citation record per source - duh! When this occurs, it may be necessary to ignore these messages.

Detect Unused Records - This forces checking that all record ids are referenced. An unreferenced record id is an indication of a dead record.
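
As a rough illustration of this kind of check (not WGV's actual implementation), the Python sketch below collects the ids defined on level-0 lines such as "0 @I1@ INDI" and the ids used as pointer values such as "1 HUSB @I1@", then reports the ids that are never pointed to. The function name and the simplified regular expressions are assumptions made for this example.

```python
import re

# Simplified patterns, assumed for this sketch only.
DEFINITION = re.compile(r"^0\s+@([^@]+)@\s+\w+")             # e.g. "0 @I1@ INDI"
POINTER = re.compile(r"^[1-9]\d*\s+\w+\s+@([^@]+)@\s*$")     # e.g. "1 HUSB @I1@"

def unused_record_ids(lines):
    """Return record ids that are defined but never referenced, in file order."""
    defined, referenced = [], set()
    for raw in lines:
        line = raw.strip()
        match = DEFINITION.match(line)
        if match:
            defined.append(match.group(1))
            continue
        match = POINTER.match(line)
        if match:
            referenced.add(match.group(1))
    return [record_id for record_id in defined if record_id not in referenced]

sample = [
    "0 HEAD",
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 FAMS @F1@",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "0 @S1@ SOUR",
    "1 TITL Parish register",
    "0 TRLR",
]
print(unused_record_ids(sample))  # the source S1 is never cited
```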

Detect Trailing Spaces - This forces checking for trailing spaces for each GEDCOM file line.
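
A check of this kind is easy to approximate outside WGV. The following Python sketch, with a made-up function name, reports the 1-based numbers of lines whose content ends in a space once the line terminator is set aside; it is a minimal illustration, not the validator's own code.

```python
def find_trailing_spaces(lines):
    """Return 1-based line numbers whose content ends with a space."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Ignore the line terminator itself; only the content matters.
        content = line.rstrip("\r\n")
        if content.endswith(" "):
            hits.append(number)
    return hits

print(find_trailing_spaces(["0 HEAD\n", "1 SOUR PAF \n", "0 TRLR"]))
```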

Find User Defined Records - This forces detection of user defined tags in the GEDCOM file. User defined tags are not an indication of an error, but warnings may be a useful indication of a non-compliant GEDCOM file. Some genealogy programs support a large number of user defined tags as a normal course, so turning them off is sometimes necessary to see the real warnings.
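
In GEDCOM 5.5, user defined tags begin with an underscore (for example _AKA), so they are simple to spot. The Python sketch below counts them per tag; the function name and the simplified line pattern (level, optional record id, tag) are assumptions for this example and do not describe WGV's parser.

```python
import re

# Simplified GEDCOM line pattern: level, optional @id@, then the tag.
LINE = re.compile(r"^\s*(\d+)\s+(?:@[^@]+@\s+)?(\w+)")

def user_defined_tags(lines):
    """Return a dict mapping each underscore-prefixed tag to its number of uses."""
    counts = {}
    for line in lines:
        match = LINE.match(line)
        if match and match.group(2).startswith("_"):
            tag = match.group(2)
            counts[tag] = counts.get(tag, 0) + 1
    return counts

sample = [
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 _AKA Johnny",
    "1 _AKA Jack",
    "0 @X1@ _CUSTOM",
    "0 TRLR",
]
print(user_defined_tags(sample))
```

A per-tag count like this makes it easier to judge whether a program relies on a handful of extensions or on a large number of them.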

Find Skipped Records - When an error is found in the GEDCOM file, WGV skips adding the record to the database. If this option is enabled, each item contained in the skipped record will also get a "container skipped" warning.

Add Missing Required Records - Some records are required by the GEDCOM standard. If WGV finds a missing record, it can optionally add the record to the database in the correct position. When it does this, it fills in the required fields with dummy data. This is done recursively, so if adding a record requires another record to be added, this will occur as expected. A missing Header record is always prepended to the database. A missing Trailer record will always be appended to the database. This is only really useful when creating a test GEDCOM file which would then include at least one of each record in each possible position. It was used to create the test.ged file.

Add Missing Optional Records - Some records are optionally allowed by the GEDCOM standard. If WGV finds a missing record, it can optionally add the record to the database in the correct position. When it does this, it fills in the required fields with dummy data. This is done recursively, so if adding a record requires another record to be added, this will occur as expected. This will only be done for the minimum number of records allowed. This is only really useful when creating a test GEDCOM file which would then include at least one of each record in each possible position. It was used to create the test.ged file.

Remove Unused Records - Forces the removal from the database of records that are unused.

Remove Trailing Spaces - Forces the removal of trailing spaces from records.


How do you use The Windows GEDCOM Validator?


Syntax: wgv.exe