Some of Tuesday's design decisions and issues.
Some of these will be subject to change/refinement, however...
Let me know if you notice anything that looks wrong.
- Sections of the site included in initial implementation:
-
- News and Analysis, including site's Daily Update page.
- Features.
- Start work on a Glossary.
- Some information for the InfoBank, although it's not yet
clear what access will be given to this initially.
- Contact details for Webmaster, Editor, etc.
- Begin to collect data for Site Index.
- Home Page, with links to all sections, Site DISCLAIMER, plus
site blurb (possibly with link to longer blurb). Sunil will
need to supply suitable text for latter two items.
- Possibly a section for reports --- we'll have more idea
exactly what's involved here when we actually see one!
- Glossary: the structure of an entry
-
- The TERM to be glossed.
- Synonyms, Antonyms, related terms, see alsos (will this be
structured?).
- Brief definition.
- Longer definition, if possible or necessary (via a link on the
full Glossary page, but shown immediately if the entry is reached
via a glossary link). This might include links to other Glossary
entries, or to a definitive definition (which could be on an
external site), but not to anywhere else (unless anybody's got a
better case here).
- Links: to a page for this term if there is one (e.g. in
InfoBank), or perhaps to its Site Index entry.
- Full glossary entry may be integrated with page for
entity in InfoBank. This might also solve some problems of
deciding where a link goes to.
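As a rough illustration of the entry structure listed above, here is a minimal Python sketch. The class name, field names and the `best_link` helper are assumptions made for this example only; nothing here is a settled format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlossaryEntry:
    """One glossary entry. All field names are illustrative assumptions."""
    term: str                              # the TERM to be glossed
    brief_definition: str
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    related_terms: List[str] = field(default_factory=list)
    see_also: List[str] = field(default_factory=list)
    long_definition: Optional[str] = None  # only if possible or necessary
    page_link: Optional[str] = None        # page for the term, e.g. in InfoBank
    index_link: Optional[str] = None       # the term's Site Index entry

    def best_link(self) -> Optional[str]:
        """Prefer the term's own page, falling back to its Site Index entry."""
        return self.page_link or self.index_link
```

An entry with neither kind of link simply returns `None` from `best_link()`, which leaves the "where does the link go" decision to the caller.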
- Indexing Specification
-
Indexing data should be collected from very early on. The
(human generated) Site Index may well become one of the most
valuable ways to access the site.
- Each indexed phrase requires a named anchor in the
document, and a code pointing to this (which will convert
into a URL) is stored with the entry.
- The Index term or terms for the entry (defaults to the phrase).
- We should probably record the actual phrase indexed with
the entry.
- Some kind of descriptor will be needed to describe the
reference's context (e.g. a code for News Item + date).
- A caption for the entry.
- Some blurb for the index, e.g. giving the context
sentence in which the reference occurs. By default, this
will be the actual phrase itself.
- A (very rough) percentage score giving the relevance or
usefulness of this particular reference to the term (don't
think hard about this, just pick a number).
- Some very rough idea of an expiry date; e.g. a reference
to a News item might be less relevant a year later than one
in a Market Survey. The reference won't necessarily
disappear after this date, but it may be displayed after
more current references. Of course this is a very crude way
to indicate that something has gradually declining
significance, but anything more sophisticated will be too
complicated.
Eventually it would be nice to produce classified indices
(e.g. index of Names, etc.) but some of this will be covered
by the database. A method to identify cognate terms would be useful.
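The index entry fields and the crude expiry rule described above could be sketched as follows. This is only an illustration: the class, the field names and the `display_order` function are assumptions, and ordering by relevance within each group is one plausible reading of the notes, not a decision.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class IndexEntry:
    """One Site Index reference. Field names are illustrative assumptions."""
    anchor_code: str   # code for the named anchor (converts into a URL)
    phrase: str        # the actual phrase indexed
    context: str       # descriptor, e.g. a code for News Item + date
    caption: str
    relevance: int     # very rough percentage, 0-100
    expires: date      # very rough expiry date
    terms: Optional[List[str]] = None  # index terms; default to the phrase
    blurb: Optional[str] = None        # context sentence; defaults to the phrase

    def __post_init__(self) -> None:
        if self.terms is None:
            self.terms = [self.phrase]
        if self.blurb is None:
            self.blurb = self.phrase


def display_order(entries: List[IndexEntry], today: date) -> List[IndexEntry]:
    """Current references first, expired ones after; most relevant first in each group."""
    return sorted(entries, key=lambda e: (e.expires < today, -e.relevance))
```

Expired references are kept, not dropped: they only move behind the current ones, which matches the "displayed after more current references" rule.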
- Some security and administrative details
-
- UserIDs will be based on a short representation of the
subscriber's e-mail address. If possible, the full e-mail
address will be accepted as an alternative.
- Passwords for the initial release of the site will be
set and issued by the site administrator, although the
subscriber can of course request the password of their
choice. (They cannot change their own password.)
- The whole site is readable by anybody with a web account
on NetLink's server, as is our password file! The passwords
are scrambled, of course, but the scrambling is easy to
crack. Subscribers must NOT under any
circumstances use a password that is the same as one that
they use on any other system. I don't know exactly how
this news should be explained to them, but we must in any
case indemnify ourselves against liabilities arising from
this problem in the site disclaimer, which should be on
paper and signed before any account is activated!
- NetLink's error log files are similarly public! A smart
compliance officer will take subscribers off the site
faster than you can say "click here".
- The service agreement, including disclaimer, must
stipulate that a UserId is for the use of one person
only. This is not simply a neat scam to increase revenue
--- when the site becomes interactive, those operating it
could, for example, be held liable for any libellous
or illegal material posted whose source could not be
firmly identified. Subscribers must therefore agree to
indemnify us against anything arising from use of their
account details. However, it's very hard to see how
we could claim to have taken due care to keep these
details secret at present! (Flexible licences with
concurrent user options will make this no longer a problem).
- Ditto user data.
- The disclaimer will also appear on the HomePage.
As it will not be necessary to go through this to access
the site, the paper version is essential. In the longer
term, a brief copy could be displayed every time a
subscriber logs in to the site.
- While very little on the web is truly secure,
all of the above shortcomings are due to flaws in
NetLink's system, and would not apply in the same way to a
better configured server under our control.
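The UserID rule at the top of this section (a short form of the e-mail address, with the full address accepted as an alternative) might look something like the sketch below. The notes do not say what the "short representation" is, so the rule used here (the lower-cased part before the `@`) is purely a hypothetical stand-in, as are the function names and the shape of the `accounts` table.

```python
from typing import Dict, Optional


def short_user_id(email: str) -> str:
    """Hypothetical rule: the lower-cased part of the address before the '@'."""
    local, sep, domain = email.strip().partition("@")
    if not (local and sep and domain):
        raise ValueError(f"not an e-mail address: {email!r}")
    return local.lower()


def resolve_login(name: str, accounts: Dict[str, str]) -> Optional[str]:
    """Map a typed login name to a UserID.

    `accounts` maps short UserID -> full e-mail address (assumed layout).
    Accepts either the short UserID or the full address; returns None if
    neither matches.
    """
    name = name.strip().lower()
    if name in accounts:
        return name
    for user_id, email in accounts.items():
        if email.lower() == name:
            return user_id
    return None
```

Two subscribers with the same local part at different domains would collide under this stand-in rule, so whatever short form is finally chosen needs a uniqueness check when the account is created.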
- Document Identifiers
-
Should work as filenames on any system (beware Mac limit of
31? characters --- pray that IMM's Macs' OS isn't so old as to
make this smaller!).
- A code (a couple of letters, say) for the type of
document. This will probably consist of one letter for the
smallest section in which the document lives, plus a code
for its type (e.g. NI and NS could stand for News Item and
Daily News page (with links to items) respectively).
- A six-digit number unique to the main document of which
this is a part; e.g. if the nodes being numbered are chapters
of a report, this number belongs to the whole report. The
number is padded with leading zeroes.
- One letter code describing the level of the main
document. It probably makes sense to start with, say, B or
C, for Section (Level 1), then D for Chapter, etc. N.B: this
is the opposite order to the original proposal. I think this
code may simply be for convenience.
- A dash-separated list of the sub-section identifiers
at their various levels, i.e. in a book, -1-2
would stand for Section 2 of Chapter 1.
- Extra components of a document are given an abbreviation
to add on, e.g. -abs for Abstract.
- On a UNIX system, we may well replace some or all dashes
with slashes, i.e. create a hierarchy of small
sub-directories. This is a lousy idea on a Mac!
- What do we do with included figures, tables, etc.? For
now, I think we use a sub-directory.
- These Document identifiers should work somewhat like
URLs, although they are designed to be simple and foolproof
for humans to use. For example, #-notation should be used to
refer to a named section in a document in this system. The
identifier structure may yet be altered to facilitate this
URL-like behaviour, or simply to make it look more like URL structure.
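To make the identifier scheme above concrete, here is a small Python sketch that assembles an identifier from its parts and maps it to a UNIX-style path. The order in which the components are joined, the function names and the hard 31-character check are all assumptions for illustration; the notes themselves say the structure may still change.

```python
from typing import Optional, Sequence

MAX_LEN = 31  # the Mac filename limit the notes worry about (assumed exact)


def make_doc_id(type_code: str, number: int, level_code: str,
                subsections: Sequence[int] = (),
                extra: Optional[str] = None) -> str:
    """Build an identifier such as NI000042C-1-2-abs (assumed component order)."""
    if not (type_code.isalpha() and type_code.isupper()):
        raise ValueError("type code must be upper-case letters, e.g. NI")
    if not 0 <= number <= 999999:
        raise ValueError("document number must fit in six digits")
    if not (len(level_code) == 1 and level_code.isalpha()):
        raise ValueError("level code must be a single letter")
    # Six digits, padded with leading zeroes.
    doc_id = f"{type_code}{number:06d}{level_code}"
    # Dash-separated sub-section identifiers, outermost level first.
    doc_id += "".join(f"-{n}" for n in subsections)
    if extra:
        doc_id += f"-{extra}"  # e.g. "abs" for Abstract
    if len(doc_id) > MAX_LEN:
        raise ValueError(f"identifier longer than {MAX_LEN} characters")
    return doc_id


def to_unix_path(doc_id: str) -> str:
    """Replace every dash with a slash to get a sub-directory hierarchy."""
    return doc_id.replace("-", "/")
```

For example, `make_doc_id("NI", 42, "C", [1, 2], "abs")` gives `NI000042C-1-2-abs`, and `to_unix_path` turns that into `NI000042C/1/2/abs`. On a Mac the dashed form would be used unchanged as the filename.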
- Some brief notes on Document and Node structure:
-
- Documents' text is generally split into nodes, the
largest being a Chapter (or equivalent), and the smallest a
sub-section.
- For example, a big article might correspond roughly to
our conceptual idea of Chapter, whereas a small article
might be regarded as a Section.
- A text node which is sub-divided starts with a mini-ToC:
this has only 2 levels, and only goes down as far as
sub-subsections at the very worst. Usually it will only
reach down to subsections.
- ToCs generally have at least two levels (if they exist).
I can't imagine circumstances where they should have more
than three.
- If a division of a doc is large enough to require
its own ToC, then it should have a node of its
own. Occasionally a ToC may occur at the start of a division
within the body text of a node --- this should be a one-level
ToC at most.
- If it doesn't have a heading, then we can't see it!
- There is a case for ToCs to include sections at a higher
level, at least along the prefix to this node in the
document tree, e.g. a Section ToC might carry the Chapter
headings for the main document, and the Section names within
its own chapter.
- The following design decision might change with
experience, however:
- The primary ToC for a main document usually
has at most two levels.
- It will only ever have three if this is necessary to
provide direct links to nodes of body text (we are
talking big document here), and even then it might
not.
- Abstract ToCs (i.e. those without body text in the node)
usually all live on sections of one page for all levels, so
they will all download in one go.
- Sub-subsections (level 3) never feature in ToCs of documents with
level 0 (Chapter) nodes, and only rarely in documents with
level 1 nodes.
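The mini-ToC rules above (two levels by default, nothing listed without a heading) can be sketched as a short recursive walk over a node tree. The `Node` class and `mini_toc` function are hypothetical names for this example, and indentation by two spaces per level is just a convenient way to show the result.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A division of a document. The heading may be missing."""
    heading: Optional[str]
    children: List["Node"] = field(default_factory=list)


def mini_toc(node: Node, max_levels: int = 2, _depth: int = 1) -> List[str]:
    """List the headings below `node`, going at most `max_levels` deep."""
    lines: List[str] = []
    for child in node.children:
        if child.heading is None:
            continue  # no heading, so it cannot be seen in a ToC
        lines.append("  " * (_depth - 1) + child.heading)
        if _depth < max_levels:
            lines.extend(mini_toc(child, max_levels, _depth + 1))
    return lines
```

Passing `max_levels=3` covers the rare big-document case where a third level is needed to link directly to nodes of body text. Note that in this sketch a division without a heading also hides everything beneath it.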
Tim Heap
Last modified: Thu Nov 13 18:59:52 GMT 1997