DeSpace

This program fixes missing-space errors in an SRT subtitle file.
Example: 'What areyou doing?' is fixed to 'What are you doing?'
Please contact me if there is anything you would like added, or if you would like a version of this program that works on another text file format.

    Usage: DeSpace in.srt out.srt
The pre-alphabetized text file 'words' must be located in the same directory for this program to function properly.
An optional file 'words2' may also be included. This file is read in unsorted and is used as is (with no sorting or optimizing).
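
As a rough illustration of how the two dictionary files could be used, here is a minimal Python sketch. It is not the program's actual code: the function names load_words and in_dictionary are made up for this example, one word per line is assumed, and the 'words' file is assumed to be sorted in the same order Python uses when comparing strings. The sorted list is searched with a binary search, and the optional list is scanned from start to end, since it is neither sorted nor optimized.

```python
import bisect


def load_words(lines):
    """Return the non-empty lines as a list of words, keeping their order.

    Hypothetical helper: assumes one word per line.
    """
    return [line.strip() for line in lines if line.strip()]


def in_dictionary(word, sorted_words, extra_words):
    """Look a word up in both lists.

    sorted_words is assumed to be pre-alphabetized (like 'words'),
    so a binary search is enough. extra_words (like 'words2') is
    unsorted, so it is scanned linearly.
    """
    i = bisect.bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i] == word:
        return True
    return word in extra_words  # plain linear scan of the list


# Small in-memory stand-ins for the two files.
main_words = load_words(["are\n", "doing\n", "what\n", "you\n"])
extra_words = load_words(["isn\n", "aren\n", "don\n"])
```

With real files, the lists would be built with something like load_words(open("words")), and load_words(open("words2")) only if that file exists.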

General notes:

  • When it finds a word not in the dictionary, it looks at all possible splitting points to see if it can make a set of words. It will always choose the splitting that yields the longest possible first word. This causes problems once in a while: for example, "beastairway" is fixed as "beast airway" instead of "be a stairway" :).
  • When there is more than one possibility, it will display an Ambiguous Splitting warning. If your console is not tall enough to hold all of the messages it spits out (i.e. if you're using Win9x and can't change the number of lines in the screen buffer), then you should launch the program like this:
      DeSpace in.srt out.srt > res.txt
    And you will have a file "res.txt" that contains all of the messages displayed while parsing.
  • If something is not in the dictionary, the program may split it in weird ways using words that are in the dictionary, like "a", "I", "re", and "ow". It is recommended to add missing words (a lot of compound words are missing, it seems) and names of characters to the supplemental dictionary file. This will also reduce the number of warnings that are displayed.
  • Contractions aren't handled very well, and therefore the roots (like "aren", "don", and "isn") are in the supplemental dictionary.
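
The splitting rule described in the notes above can be sketched in Python roughly like this. This is only an illustration under some assumptions, not the program's real code: the names split_longest_first and count_splits are invented here, the dictionary is a plain set, and it is assumed that the program backs up and tries a shorter first word when the rest of the text cannot be split. Counting every possible splitting is one guess at how the Ambiguous Splitting warning could be decided.

```python
def split_longest_first(text, words):
    """Split text into dictionary words, trying the longest first word first.

    Returns a list of words, or None if no splitting exists.
    Assumption: if the rest cannot be split, a shorter first word is tried.
    """
    if text == "":
        return []
    for end in range(len(text), 0, -1):  # longest candidate first
        head = text[:end]
        if head in words:
            rest = split_longest_first(text[end:], words)
            if rest is not None:
                return [head] + rest
    return None


def count_splits(text, words):
    """Count how many different ways text splits into dictionary words."""
    ways = [0] * (len(text) + 1)
    ways[0] = 1  # the empty prefix can be split in exactly one way
    for end in range(1, len(text) + 1):
        for start in range(end):
            if ways[start] and text[start:end] in words:
                ways[end] += ways[start]
    return ways[len(text)]


# A tiny stand-in dictionary, just for the examples from this page.
words = {"a", "be", "beast", "airway", "stairway", "are", "you"}

print(split_longest_first("areyou", words))       # ['are', 'you']
print(split_longest_first("beastairway", words))  # ['beast', 'airway']
if count_splits("beastairway", words) > 1:
    print("Ambiguous Splitting: beastairway")
```

With this small dictionary, "beastairway" has two splittings ("beast airway" and "be a stairway"), so the warning is printed, and the longest-first rule picks "beast airway", which matches the behavior described above.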

Download:

Credits

  • Jimb Esser: programming
  • The included dictionary is fairly old, but it is currently in the public domain
  • The supplemental dictionary was compiled while running this program on the first 4 DVDs of Trigun, so there is a section in the supplemental dictionary that contains a lot of names and words from those.