Dungeon Master: Swoosh Construction Kit - A data extractor




"Oh, wow--that totally rocks!
Thanks a million. You've saved me an incredible amount of work."

Mon Ful Ir, Aug 29 2008

"So I checked out the XSLT converter and my goodness: this is an amazing piece of work!
It's fantastic that you can convert DUNGEON.DAT into XML because this info can be used by anyone now (well, anyone who knows how to use XML files).
Excellent work!"

Gambit, Feb 13 2008

"Greatstone's work looks interesting."
Paul Stevens, Feb 14, 2008

i'm gathering info enough to render complete dungeon viewport (DM snes style) in some ways: (i) using sck (swoosh construction kit) to extract viewport information. it includes decoder for "558 item", very excellent!
Kentaro, Mar 11 2007


Technical documentation - PAK file format

This section describes the PAK file format, which is supported by the sck tool. If the Encyclopedia site reworks its "technical documentation" part so that more descriptions can be added easily, this section may be included there.
1.0 (05 July 2006): initial release.

Summary

PAK files are used in the Atari versions of DM and CSB to store the main program. The known PAK files are named "START.PAK".
PAK is a proprietary compressed file format designed by FTL. A PAK file can be thought of as something like a zip-compressed file.
For that reason, the associated mapfile describes the decompressed data rather than the PAK file itself.

Global structure:

  • PAK header
  • Atari ST executable header
  • Compressed data
The uncompressed data can be described by a mapfile and corresponding items can be extracted by the sck tool.

The following notation has been used:

0x1234   hexadecimal word 1234
1011b    1011 in binary, 11 in decimal
26:      offset 26 in decimal
nibble   4 bits
byte     8 bits = 2 nibbles
word     16 bits = 2 bytes
dword    32 bits = 2 words

Structure

PAK header (4 bytes):
00: 1 dword: file size in words = file size in bytes / 2.
Atari ST executable header (28 bytes):
04+00: 2 bytes: magic identifier (0x601A).
04+02: 1 dword: size of the text segment, which is the uncompressed data part.
04+06: 1 dword: size of the data segment.
04+10: 1 dword: size of the BSS segment (see note 1 about BSS).
04+14: 1 dword: size of the symbol table.
04+18: 1 dword: reserved.
04+22: 1 dword: bit vector that defines additional process characteristics, as follows:
Bit 0 PF_FASTLOAD - if set, only the BSS area is cleared; otherwise, the program's whole memory is cleared before loading.
Bit 1 PF_TTRAMLOAD - if set, the program will be loaded into TT RAM.
Bit 2 PF_TTRAMMEM - if set, the program will be allowed to allocate memory from TT RAM.
Bits 4 and 5, read as a two-bit value with the following meanings:
0 PF_PRIVATE - the process's entire memory space is considered private.
1 PF_GLOBAL - the process's memory is readable and writable by others.
2 PF_SUPER - the memory is readable and writable by the process itself and by any supervisor process.
3 PF_READ - the memory will be readable by others.
04+26: 1 word (ABSFLAG): non-zero if the program does not need to be relocated, zero if the program needs to be relocated.
Note: since some TOS versions handle files with ABSFLAG>0 incorrectly, this value should also be set to zero for programs that do not need to be relocated, and the FIXUP_offset should be set to 0.
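The two headers described above can be read in one pass. The following is a minimal Python sketch, not the sck implementation: it assumes big-endian byte order (the Atari ST uses a Motorola 68000), and the names `PakHeader`, `parse_pak_header` and the field names are illustrative only.

```python
import struct
from dataclasses import dataclass

# 4-byte PAK header + 28-byte Atari ST executable header = 32 bytes.
# ">" selects big-endian with no padding (assumed byte order).
_HEADER = struct.Struct(">IH6IH")

# Meaning of the two-bit value stored in bits 4 and 5 of the flags dword.
MEMORY_MODES = ("PF_PRIVATE", "PF_GLOBAL", "PF_SUPER", "PF_READ")


@dataclass
class PakHeader:
    file_size_in_words: int   # offset 00
    magic: int                # offset 04+00, expected 0x601A
    text_size: int            # offset 04+02
    data_size: int            # offset 04+06
    bss_size: int             # offset 04+10
    symbol_table_size: int    # offset 04+14
    reserved: int             # offset 04+18
    flags: int                # offset 04+22
    absflag: int              # offset 04+26

    @property
    def fastload(self) -> bool:
        return bool(self.flags & 0x01)

    @property
    def ttram_load(self) -> bool:
        return bool(self.flags & 0x02)

    @property
    def ttram_mem(self) -> bool:
        return bool(self.flags & 0x04)

    @property
    def memory_mode(self) -> str:
        return MEMORY_MODES[(self.flags >> 4) & 0x3]


def parse_pak_header(data: bytes) -> PakHeader:
    """Decode the first 32 bytes of a PAK file."""
    if len(data) < _HEADER.size:
        raise ValueError("file too short for PAK and executable headers")
    header = PakHeader(*_HEADER.unpack_from(data, 0))
    if header.magic != 0x601A:
        raise ValueError(f"unexpected magic identifier 0x{header.magic:04X}")
    return header
```

Calling `parse_pak_header(open("START.PAK", "rb").read())` would give the text segment size needed to allocate the output buffer in note 2.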
Compressed data (x bytes):
32: 1920 words: table of the most frequent words used by the decompression.
1952: x bytes: compressed code, using the same algorithm as the FTL HUNK_CODE part. See note 2 for how to decompress it.
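Loading the table of most frequent words might look like the Python sketch below. It assumes big-endian 16-bit entries starting right after the 32 bytes of headers; `read_word_table` and its parameters are hypothetical names. Note that 1920 entries of 2 bytes each end at offset 3872, not at the listed offset 1952; which figure matches real files has not been checked here, so the function returns the end offset it computes and lets the caller override `entries`.

```python
import struct

HEADER_SIZE = 32       # 4-byte PAK header + 28-byte executable header
TABLE_ENTRIES = 1920   # entry count given in the structure listing


def read_word_table(data: bytes, offset: int = HEADER_SIZE,
                    entries: int = TABLE_ENTRIES):
    """Return (table, end_offset) for `entries` big-endian 16-bit words."""
    end = offset + 2 * entries
    if len(data) < end:
        raise ValueError("file too short for the word table")
    table = list(struct.unpack_from(f">{entries}H", data, offset))
    return table, end
```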
Notes:
Note 1: BSS
A BSS (Block Started by Symbol) section contains all reserved and uninitialized space in memory.
Note 2: How to decompress the data part
To decompress the data part, allocate a byte array whose size is the size of the text segment, found in the Atari ST executable header (offset 04+02).
The number of iterations can be computed from the file size stored in the PAK header (offset 00).
The decompression algorithm is the same as the one used for the HUNK_CODE part of the FTL file format.
  word[] most_frequent_words = new word[1920];
  // ... this array is filled from the most frequent words table stored just before the compressed code
  byte[] uncompressed_code = new byte[uncompressed_code_size];
  int uncompressed_code_index = 0;
  int number_of_iteration = (file_size_in_words * 2) - 28; // 28: size of the previously decoded Atari ST executable header
  for (int i = 0; i < number_of_iteration; i++) {
      nibble = get_nibble(compressed_code);
      if (nibble == 0xF) { // 1111b: the next 4 nibbles are copied as 2 literal bytes
        nibble_1 = get_nibble(compressed_code);
        nibble_2 = get_nibble(compressed_code);
        uncompressed_code[uncompressed_code_index] = byte(nibble_1, nibble_2);
        uncompressed_code_index++;
        nibble_1 = get_nibble(compressed_code);
        nibble_2 = get_nibble(compressed_code);
        uncompressed_code[uncompressed_code_index] = byte(nibble_1, nibble_2);
        uncompressed_code_index++;
      } else if (nibble >= 0x8) { // 1000b..1110b: 3-nibble table index
        nibble_1 = get_nibble(compressed_code);
        nibble_2 = get_nibble(compressed_code);
        index = word(nibble, nibble_1, nibble_2);
        // maximum value for index is reached with nibble = 1110b, nibble_1 = 1111b, nibble_2 = 1111b
        // (nibble must be less than 1111b in this branch), so the maximum is 3839
        // minimum value for index is reached with nibble = 1000b, nibble_1 = 0000b, nibble_2 = 0000b
        // (nibble must be >= 1000b in this branch), so the minimum is 2048
        // to be usable as an index in the most frequent words table, index must satisfy 0 <= index < 1920
        // so...
        index = index - 1920;
        // maximum value for index is 3839 - 1920 = 1919
        // minimum value for index is 2048 - 1920 = 128
        // in this branch, the table indexes used are: 128 <= index <= 1919
        uncompressed_code[uncompressed_code_index] = get_most_significant_byte(most_frequent_words[index]);
        uncompressed_code_index++;
        uncompressed_code[uncompressed_code_index] = get_least_significant_byte(most_frequent_words[index]);
        uncompressed_code_index++;
      } else { // 0000b..0111b: 2-nibble table index
        nibble_1 = get_nibble(compressed_code);
        index = byte(nibble, nibble_1);
        // maximum value for index is reached with nibble = 0111b, nibble_1 = 1111b
        // (nibble must be < 1000b in this branch, so <= 0111b), so the maximum is 127
        // minimum value for index is reached with nibble = 0000b, nibble_1 = 0000b, so the minimum is 0
        // in this branch, the table indexes used are: 0 <= index <= 127
        uncompressed_code[uncompressed_code_index] = get_most_significant_byte(most_frequent_words[index]);
        uncompressed_code_index++;
        uncompressed_code[uncompressed_code_index] = get_least_significant_byte(most_frequent_words[index]);
        uncompressed_code_index++;
      }
  }
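
For reference, the loop above can be written as a short runnable Python sketch. It rests on assumptions the pseudocode does not state: nibbles are read high nibble first, table entries are integers in the range 0..65535, and decoding stops once `out_size` bytes (the text segment size) have been produced rather than after the iteration count derived from the PAK header. The names `iter_nibbles` and `decompress` are illustrative.

```python
def iter_nibbles(data: bytes):
    """Yield the 4-bit halves of each byte, high nibble first (assumed order)."""
    for value in data:
        yield value >> 4
        yield value & 0x0F


def decompress(compressed: bytes, table, out_size: int) -> bytes:
    """Expand nibble-coded data using a table of 1920 frequent 16-bit words."""
    nibbles = iter_nibbles(compressed)
    out = bytearray()
    try:
        while len(out) < out_size:
            nibble = next(nibbles)
            if nibble == 0xF:
                # 1111b: the next four nibbles are two literal bytes.
                first = (next(nibbles) << 4) | next(nibbles)
                second = (next(nibbles) << 4) | next(nibbles)
                out += bytes((first, second))
                continue
            if nibble >= 0x8:
                # 1000b..1110b: 12-bit value 2048..3839 maps to index 128..1919.
                index = ((nibble << 8) | (next(nibbles) << 4) | next(nibbles)) - 1920
            else:
                # 0000b..0111b: 8-bit value is the index 0..127 directly.
                index = (nibble << 4) | next(nibbles)
            # Most significant byte first, as in the pseudocode.
            out += table[index].to_bytes(2, "big")
    except StopIteration:
        raise ValueError("compressed stream ended before out_size bytes") from None
    return bytes(out[:out_size])
```

Short indexes (two nibbles) cover the first 128 table entries and long indexes (three nibbles) cover the rest, so a table sorted by frequency gives the shortest codes to the most common words.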

Credits

Christophe Fontanel, who was the first to understand this format.
DaFi, for his note about the Atari ST executables format.