Assessing Writer Performance
As with readers, you can't improve the performance of a writer unless you can first assess how well it is already performing. But assessing the speed of writing has the same complexity as reading: FME starts writing data as soon as it becomes available, and doesn’t necessarily wait until processing is done.
False Writer Performance
Take the previous workspace, which read a set of address points and found their nearest neighbor, but now has a writer too:
According to the log file, we find that the output dataset is open for writing before the source dataset has even finished being read!
Opened Shape File C:\FMEData2017\Output\PostalAddress.shp for output Opened DBF File 'C:\FMEData2017\Output\PostalAddress.dbf' for output DBF Writer: Writing DBF File using character encoding 'ANSI' Reading source feature # 5000 Reading source feature # 7500 Reading source feature # 10000 Reading source feature # 12500
If you examined this log you would get quite a false representation of how well the writer is performing.
True Writer Performance
So, how can we assess writer performance? Plainly this is different to readers. If you isolate the writers by disabling everything else, there won't be any data to write!
The easiest way is to disable the writer itself!
If the result is this:
Translation was SUCCESSFUL with 0 warning(s) (0 feature(s) output) FME Session Duration: 14.2 seconds. (CPU: 13.4s user, 0.4s system)
...whereas previously it was this:
Translation was SUCCESSFUL with 0 warning(s) (0 feature(s) output) FME Session Duration: 16.0 seconds. (CPU: 15.2s user, 0.4s system)
...then we can easily calculate that the writing process is taking 1.8 seconds.
Another method is a two-step process. Firstly add a Recorder transformer – after all other transformation has taken place – to preserve the data in FFS format at the moment it is about to be written:
Now replace the Recorder with a Player transformer – to re-read the preserved FFS data – and disable everything else up to that point:
Now the data will be played back into the workspace, and is followed up by being written to the output.
This second method is useful because it allows you to make changes to the writer and re-run just the writer part of the process; the first method would require you to run the entire translation all over again to get a new result.
|Jake Speedie says…|
Be aware that fanouts will complicate your view of the log, not to mention slow the process somewhat.
For example, if I do a Dataset Fanout on the above workspace, FME creates a separate writer for each fanned-out dataset. Firstly this causes a performance hit – because FME has to cache all the output data and create multiple writers – and additionally I get a section of log for each output dataset.