===== Universal Floppy Format =====

This document describes a new low-level format for floppy images that aims to represent everything that has existed, including copy protections, in a sufficiently unified way that multi-system emulators and hardware floppy emulators only have to implement the format once. It should also provide enough metadata to make tools for classifying and exploring images easier to write.

The main characteristics are:
  * Low-level format, can represent any information that was usefully put on a floppy disk
  * Information about the type of media, to check for physical compatibility
  * (As yet untested) support for single spiral track (Quickdisk)
  * Optional information about:
    * Targeted system(s)
    * Encoding characteristics (floppy rpm per track, cell size, data rate, encoding)
    * Type of filesystem, if any
    * Methodology to rewrite the image to a physical floppy
  * Optional pictures for front, back, title screen, etc.
  * Extensible in the optional information category, fixed in the data encoding, to avoid needing too many updates

The format is not designed for easy writing by emulators (even though it is perfectly possible).

==== Related work ====

A number of types of formats for floppy images exist.

=== Sector/data based formats ===

A large number of individual formats exist, per-system or multi-system, which only store the data present and some metadata such as sector numbers and CRC errors. They require a lot of guessing by any low-level emulator and can encode only the most basic of protections, which makes them unsuitable for preservation.

=== Observational formats ===

This category covers the formats that try to store the results of all the commands that can be given to the floppy controller. Examples are the Pasti format (stx, for Atari ST) and the nib format for the Apple II.
They still lose information about timings and relative synchronization, which can be quite important (count of sync bits for nib, actual read duration for Pasti), and they do not cope well with results that change from one read to the next. They also require a lot of guessing to work out what actually was on the floppy, making low-level emulation difficult and rewriting floppies very hard.

=== Unresolved flux formats ===

These formats (scp, a2r...) store the raw stream of flux changes for a number of rotations of the disk. They tend not to be directly usable and require "solving" what is actually on the track by aligning flux changes between rotations, a complicated operation that needs a lot of heuristics and computation time.

=== Resolved flux formats ===

MAME's mfi and the single-rotation resolved a2r variant could be usable as low-level formats. The main problems are that they tend to be big (a2r) or require a compression library (mfi), and that they don't store nearly enough metadata to make a good archival format.

=== Mono-system low-level-compatible formats ===

The woz and moof formats belong to this category. They are perfectly usable for the system they are designed for, but following that path would require creating a new format for every new system, which quickly creates an annoying maintenance burden.

=== Multi-system low-level-compatible formats ===

The only format we know of in this category is ipf. It has a number of problems. The first is that it is a closed, proprietary format (even if it has already been thoroughly reverse-engineered), which disqualifies it from the start. From a technical point of view, its handling of non-standard flux timing is extremely limited (to Rob Northen protections, to be precise) and can't (or couldn't at the time of the reverse-engineering) handle protections like the Dungeon Master/Oids one.
It also has issues when trying to make it work on non-MFM systems, has a data/gap distinction that is not very useful, and has other, more minor issues that cannot be fixed given its proprietary nature.

==== Format description ====

An image is a binary file composed of:
  * A fixed header
  * An index of information blocks
  * The information blocks

It is semantically equivalent to a tagged format, but with the tags replaced by an index at the start, so that finding a given piece of information does not require non-sequential access to the contents.

All multi-byte values are stored in little-endian order. Bit numbering starts with the lowest bit.

FOURCCs (sequences of 4 characters) are used to type a number of things. They are written msb-first, in reading order, so that they are directly readable when looking at the file, which makes their representation as 32-bit values byte-inverted.

All information blocks start at an offset that is a multiple of 4, all 32-bit values and FOURCCs are 4-byte aligned, and all 16-bit values are 2-byte aligned.

=== Header ===

^ Offset ^ Value ^ Purpose ^
| 0 | 55 46 46 31 | The ASCII string ‘UFF1’, 0x31464655 |
| 4 | FF | Make sure that high bits are valid (no 7-bit data transmission) |
| 5 | 0A 0D 0A | LF CR LF: file translators will often try to convert these |

=== Index ===

The index directly follows the header. Each entry is 12 bytes long; i is the zero-based entry number.

^ Offset ^ Value ^ Purpose ^
| 8 | 32-bit value | Number of entries in the index |
| 12+12i | FOURCC | Information block type |
| 16+12i | 32-bit value | Offset of the block in the image (multiple of 4) |
| 20+12i | 32-bit value | Length of the block |

Some blocks are optional, some are required, some can appear multiple times. If the length is not a multiple of 4, the gap between a block and the following one is filled with zeroes. The index entries are in no particular order. Blocks have no header.
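As an illustration of the header and index layout, here is a minimal Python sketch that checks the 8-byte signature and lists the index entries. It assumes that index entries are 12 bytes each and packed back to back (FOURCC, offset, length); the names ''UFF_MAGIC'', ''fourcc_to_u32'' and ''parse_index'' are hypothetical and not part of the format.

```python
import struct

# Header bytes: 'UFF1', then FF, then LF CR LF.
UFF_MAGIC = b"UFF1\xff\n\r\n"

def fourcc_to_u32(tag: str) -> int:
    """Pad a tag with spaces to 4 characters and return its little-endian 32-bit value."""
    return struct.unpack("<I", tag.ljust(4).encode("ascii"))[0]

def parse_index(image: bytes) -> list[tuple[str, int, int]]:
    """Return (fourcc, offset, length) for every index entry of the image."""
    if image[:8] != UFF_MAGIC:
        raise ValueError("not a UFF image (bad header)")
    (count,) = struct.unpack_from("<I", image, 8)
    entries = []
    for i in range(count):
        # Assumption: entry i starts at offset 12 + 12 * i.
        tag, offset, length = struct.unpack_from("<4sII", image, 12 + 12 * i)
        if offset % 4:
            raise ValueError(f"block {tag!r} is not 4-byte aligned")
        if offset + length > len(image):
            raise ValueError(f"block {tag!r} extends past the end of the image")
        entries.append((tag.decode("ascii"), offset, length))
    return entries
```

For instance, ''fourcc_to_u32("INFO")'' gives 0x4f464e49, which shows the byte inversion between the readable tag and its 32-bit value.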
List of defined block types:

^ FOURCC ^ 32-bit value ^ Optional ^ Multiple ^ Purpose ^
| INFO | 0x4f464e49 | No | No | General media information |
| CREA | 0x41455243 | Yes | No | Image creator information |
| SYST | 0x54535953 | Yes | No | General target system information |
| TLST | 0x54534c54 | No | No | Tracks list (points into TDAT and TTYP) |
| TTYP | 0x50595454 | Yes | No | Track types |
| TDAT | 0x54414454 | No | No | Tracks contents |
| INDX | 0x58444e49 | Depends | No | Sector positions for hard-sectored media |
| CSUM | 0x4d555343 | Yes | Yes | Image checksum |
| PICT | 0x54434950 | Yes | Yes | Embedded pictures |

=== INFO - General media information ===

^ Offset ^ Value ^ Purpose ^
| 0 | FOURCC | Media form-factor |
| 4 | FOURCC | Media variant |
| 8 | 32-bit value | Flags |

The form-factor indicates in which kind of drive the floppy can physically fit. The variant indicates which kind of media it is within that form-factor. Some specific variants can be detected by the floppy drive and the information given to the host (DD vs. HD vs. ED 3.5" floppies, single or double-sided 8" floppies), others can't (single vs. double-sided 5.25" floppies).

Defined form-factors:

^ FOURCC ^ 32-bit value ^ Form-factor ^
| 28 | 0x20203832 | 2.8" spiral-track floppy (Quickdisk, Thomson, Famicom) |
| 3 | 0x20202033 | 3" floppy (Amstrad, Oric...) |
| 35 | 0x20203533 | 3.5" floppy (Lots) |
| 525 | 0x20353235 | 5.25" floppy (Even more) |
| 8 | 0x20202038 | 8" floppy (Verrrry old systems) |

Defined variants:

^ FOURCC ^ 32-bit value ^ Variant ^
| SSSD | 0x44535353 | Single-sided single-density |
| SSDD | 0x44445353 | Single-sided double-density |
| SSQD | 0x44515353 | Single-sided quad-density |
| DSSD | 0x44535344 | Double-sided single-density |
| DSDD | 0x44445344 | Double-sided double-density (720K in 3.5, 360K in 5.25) |
| DSQD | 0x44515344 | Double-sided quad-density (720K in 5.25, means DD+80 tracks) |
| DSHD | 0x44485344 | Double-sided high-density (1200K/1440K) |
| DSED | 0x44455344 | Double-sided extra-density (2880K) |

Flags:

^ Bit ^ Meaning ^
| 0 | 1 if the media is supposed to be write-protected, 0 otherwise |
| 2-1 | Track resolution, 0 = full track, 1 = half, 2 = quarter, 3 = eighth |
| 3 | 1 if the information for rewriting to a real floppy is present |
| 4 | 1 if writing to a real floppy is not possible without unavailable hardware |
| 5 | 1 if the disk has a spiral track |
| 31-6 | Must be zero |

Bit 4 is used in particular for the Electronic Arts "Wide head" protection, where a special extra-wide write head was used to cover multiple tracks at the same time with, as a result, perfect sync between them.

The spiral track case is special: the emulation must automatically jump to the next track when reaching the end of a track, instead of wrapping. There is no seek; instead a "rewind" signal is used to restart at the beginning of track zero.

=== CREA - Image creator information ===

TBD, that's where we'd put Applesauce or MAME or whatever, and probably a time of creation too.

=== SYST - General target system information ===

TBD. That's where we'd put the list of systems that are supposed to handle it (some floppies are bootable on multiple systems, like Atari/PC, or Atari/Amiga, or some other combinations).
Perhaps also indicate which models actually support it when it's not all of them; I'm thinking Mac categories here. We'd also put the filesystem information, if we've detected one, like DOS, PRODOS, AFS, MFS, HFS, FAT12, STFAT, AMIGADOS, etc. In general I don't know yet whether that block should use FOURCCs or strings.

=== TLST - Track list ===

This block holds the list of tracks present on the disk. Each track has a fixed-size structure. The block size divided by the structure size gives the number of formatted tracks. Unformatted tracks do not appear in the list.

The per-track structure (12 bytes long):

^ Offset ^ Value ^ Purpose ^
| 0 | 8-bit value | Track number |
| 1 | 8-bit value | Head number |
| 2 | 8-bit value | Sub-track number (0-1, 0-3 or 0-7) |
| 3 | 8-bit value | Track type (0 if no TTYP block) |
| 4 | 32-bit value | Offset in the TDAT block of the specific track data, multiple of 4 |
| 8 | 32-bit value | Length of the specific track data |

Tracks are in no particular order, and track data in the TDAT block is in no particular order either.

=== TTYP - Track type ===

This block holds a vector of meta-information about the tracks on the disk. This information is not necessary to manage the data but can be useful for higher-level processes, or for better hardware usage in the case of floppy emulators. Each entry is a fixed-size structure, and the vector is indexed by the TLST track type. Multiple tracks can share the same type; in practice the number of different track types is typically one and usually does not go over 8 (for Lisa systems).
The per-type structure (16 bytes long):

^ Offset ^ Value ^ Purpose ^
| 0 | 32-bit value | Normal floppy drive rpm used to access this track |
| 4 | 32-bit value | Theoretical minimal flux separation for the track, in nanoseconds (the image can have some faster successive changes though) |
| 8 | 32-bit value | Theoretical cell duration, in nanoseconds (usually equal to or half of the previous value) |
| 12 | FOURCC | Encoding of the track, if known, 0x20202020 otherwise |

TBD: Define the encodings (Apple GCR, Commodore GCR, IBM FM, IBM MFM, M2FM, IBM FM/MFM mix, Amiga MFM, others?).

=== TDAT - Track contents ===

This block holds the flux information present in the tracks. It is further divided into segments which are pointed to by the TLST block.

The contents of a given track are represented as a series of blocks encoding the information present. There are three block types:
  * Bitstream block, where the flux changes are nicely regular over an area
  * Flux block, where the flux changes have been tweaked, usually for a protection, and require precise irregular timing
  * Damaged area (Laserlock), which behaves as an unformatted region that cannot be written to

Regions not covered by one of those blocks are unformatted.

A typical mastered track will have one bitstream block covering the whole rotation. A normal formatted then written-to track will have multiple bitstream blocks, because each write will have a different phase compared to the others. Some of the toughest protections will also have flux blocks in addition to the bitstream ones, or may even be full flux. The damaged area is specific to the Laserlock protection.

The blocks are ordered following each other as the disk rotates. The first block can start from anywhere, except in the spiral track case where it is required to start at position zero.
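The fixed-size TLST and TTYP records described in the two previous sections can be decoded in a few lines. The following Python sketch is only an illustration of the layouts given in the tables; the class and function names (''Track'', ''TrackType'', ''parse_tlst'', ''parse_ttyp'') are hypothetical, and the input is assumed to be the raw bytes of the corresponding information block.

```python
import struct
from typing import NamedTuple

class Track(NamedTuple):
    track: int        # track number
    head: int         # head number
    subtrack: int     # sub-track number
    type_index: int   # index into the TTYP vector (0 if no TTYP block)
    offset: int       # offset of the track data inside the TDAT block
    length: int       # length of the track data

class TrackType(NamedTuple):
    rpm: int          # normal drive rpm for this track
    min_flux_ns: int  # theoretical minimal flux separation, in ns
    cell_ns: int      # theoretical cell duration, in ns
    encoding: str     # FOURCC, four spaces when unknown

def parse_tlst(block: bytes) -> list[Track]:
    """Decode a TLST block into its 12-byte per-track records."""
    if len(block) % 12:
        raise ValueError("TLST size is not a multiple of 12")
    return [Track(*fields) for fields in struct.iter_unpack("<BBBBII", block)]

def parse_ttyp(block: bytes) -> list[TrackType]:
    """Decode a TTYP block into its 16-byte per-type records."""
    if len(block) % 16:
        raise ValueError("TTYP size is not a multiple of 16")
    return [TrackType(rpm, flux, cell, enc.decode("ascii"))
            for rpm, flux, cell, enc in struct.iter_unpack("<III4s", block)]
```

The number of formatted tracks is then simply ''len(parse_tlst(block))'', matching the "block size divided by structure size" rule.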
== Common content block header ==

Each content block starts with a fixed header which gives its type, its start position and length, and a number of flags that help when re-mastering the track to a floppy.

Positions are angles (i.e. rpm-independent), in 1/200,000,000th of a turn. Angle 0 is at the index, if the system has an index pulse, or at an arbitrary position on the disk that must be common to all tracks. They can be converted to times using either a known rpm (when emulating a known floppy drive) or the one present in the TTYP block.

^ Offset ^ Value ^ Purpose ^
| 0 | 8-bit value | Block type, 'b', 'f' or 'd' |
| 1 | 8-bit value | Flags |
| 2-3 | 0 | Padding |
| 4 | 32-bit value | Start position of the block (0-199,999,999) |
| 8 | 32-bit value | Length of the block, as an angle (1-200,000,000) |

Blocks can wrap around position 0 (position of the index pulse). Blocks must not overlap, with one exception: consecutive bitstream blocks can overlap if the overlapping zone includes no flux change.

Flags:

^ Bit ^ Meaning ^
| 0 | 1 = When remastering a disk, a write sequence should start at that block |
| 1 | 1 = When remastering a disk, a write sequence should end at that block |
| 2 | 1 = When remastering a disk, that block should be ignored |
| 7-3 | Must be zero |

Remastering is explained later in the document.

== 'b' - 0x62 - Bitstream block ==

^ Offset ^ Value ^ Purpose ^
| 12 | 32-bit value | Number of cells in the block |
| 16+ | Bitstream | Flux change info, one bit per cell |

The bitstream is a series of bits packed lsb-first in a sequence of bytes. The block is divided into cells of equal size (up to rounding), and for each cell a flux change happens exactly in the middle if the corresponding bitstream bit is 1. The bitstream is padded with zeroes until reaching a multiple of four bytes, to ensure every block is 4-byte aligned.
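To make the cell and angle arithmetic concrete, here is a minimal Python sketch that expands a bitstream block into the angular positions of its flux changes and converts an angle to a time. The function names are hypothetical, and the rounding used for the middle of a cell (integer floor) is an assumption, since the text only says cells are equal "up to rounding".

```python
import struct

TURN = 200_000_000  # angular units in one full rotation

def bitstream_flux_positions(data: bytes, offset: int = 0) -> list[int]:
    """Return the angles of the flux changes of the 'b' block found at `offset`."""
    # Common header (type, flags, 2 padding bytes, start, length) + cell count.
    kind, _flags, start, length, cells = struct.unpack_from("<BB2xIII", data, offset)
    if kind != ord("b"):
        raise ValueError("not a bitstream block")
    bits = data[offset + 16 : offset + 16 + (cells + 7) // 8]
    positions = []
    for i in range(cells):
        if (bits[i // 8] >> (i % 8)) & 1:  # bits are packed lsb-first
            # Middle of cell i is (i + 0.5) * length / cells, in integer math
            # (assumed rounding); the block may wrap around position 0.
            positions.append((start + (2 * i + 1) * length // (2 * cells)) % TURN)
    return positions

def angle_to_ns(angle: int, rpm: int) -> float:
    """Convert an angle to a duration in nanoseconds at the given rpm."""
    return angle * 60e9 / (rpm * TURN)
```

At 300 rpm one rotation lasts 200 ms, so one angular unit corresponds to exactly one nanosecond; at 360 rpm it corresponds to 5/6 of a nanosecond.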
== 'f' - 0x66 - Flux block ==

^ Offset ^ Value ^ Purpose ^
| 12 | 32-bit value | Number of flux changes in the block |
| 16+ | 32-bit values | Positions of the flux changes, as angles relative to the index |

== 'd' - 0x64 - Damaged block ==

A damaged block has no additional information.

=== INDX - Hard sector positions ===

This block provides pulse position information for hard-sectored floppies. The main index pulse is at position zero.

^ Offset ^ Value ^ Purpose ^
| 0 | 8-bit value | Number of hard sectors |
| 1 | 8-bit value | Flags |
| 2-3 | 0 | Padding |
| 4+ | 32-bit values | Pulse positions |

Flags:

^ Bit ^ Meaning ^
| 0 | 1 = The index pulse is special and is distinguished as such by the drive (this usually means it is at a different distance from the center) |
| 7-1 | Must be zero |

=== CSUM - Checksums ===

These blocks store the checksum of the whole image, computed with the hashes of all checksum blocks simultaneously set to zero. Multiple checksum blocks can be present in the file, if wanted.

^ Offset ^ Value ^ Purpose ^
| 0 | FOURCC | Checksum type |
| 4+ | Variable | Checksum result |

Defined checksum types:

^ FOURCC ^ 32-bit value ^ Hash length ^ Checksum type ^
| CRC | 0x20435243 | 4 | Insert AS crc32 here |
| SHA1 | 0x31414853 | 20 | SHA-1 |
| S256 | 0x36353253 | 32 | SHA-256 |

Checking the checksums, if present, is not in any way required.

=== PICT - Embedded pictures ===

These blocks store pictures useful for the image user. The block size minus 8 is the embedded picture size.

^ Offset ^ Value ^ Purpose ^
| 0 | FOURCC | Picture usage |
| 4 | FOURCC | Picture format |
| 8+ | Data | Picture data |

Defined picture usages:

^ FOURCC ^ 32-bit value ^ Usage ^
| FRNT | 0x544e5246 | Front view of the floppy |
| BACK | 0x4b434142 | Back view of the floppy |
| TITL | 0x4c544954 | Title screen of the application |

Defined picture formats:

^ FOURCC ^ 32-bit value ^ Format ^
| PNG | 0x20474e50 | PNG file |
| JPEG | 0x4745504a | JPEG/JFIF file |
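The CSUM rule described above (hash the whole image with every stored hash zeroed) can be sketched as follows in Python. This is a hedged illustration, not a reference implementation: the function name ''verify_checksums'' is hypothetical, the hash is assumed to occupy bytes 4 to the end of each CSUM block, and only the SHA1 and S256 types are handled since the CRC variant is not defined yet.

```python
import hashlib

# Checksum FOURCCs that map directly onto hashlib constructors.
HASHES = {"SHA1": hashlib.sha1, "S256": hashlib.sha256}

def verify_checksums(image: bytes, csum_blocks: list[tuple[int, int]]) -> dict[str, bool]:
    """Check the CSUM blocks of an image.

    `csum_blocks` holds the (offset, length) index entries of the CSUM blocks.
    Returns a mapping from checksum type to whether the stored hash matches.
    """
    # Zero the hash field of every CSUM block simultaneously, keeping the FOURCC.
    blanked = bytearray(image)
    for offset, length in csum_blocks:
        blanked[offset + 4 : offset + length] = bytes(length - 4)

    results = {}
    for offset, length in csum_blocks:
        kind = image[offset : offset + 4].decode("ascii")
        if kind in HASHES:
            stored = image[offset + 4 : offset + length]
            results[kind] = HASHES[kind](blanked).digest() == stored
    return results
```

Because all hash fields are blanked before any hash is computed, several CSUM blocks of different types can coexist without depending on each other.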