The DATABASE - principles and practical hints

 
A good database structure is a necessary for best use of the data. It can be very simple, but som basic principles should alway be followed. Below we list these principles and some practical hints. We also enclose a template Excel file that you can copy and use as a starting point. The recommendations given below relate to this template. 
 
Excel is not ideal as database due to risk of messing up data during sorting etc..  Access is better as database, however if you know that you will be using a specific statistical software like JMP or STATA, you can as well use them as databases right from the start. Regardless which one you use, the data can easily be converted from one to another (but not the log and the info - see below).
 
We strongly recommend that you follow the principles below. Note also that a well organized database is prerequisite for getting assistance from EPI the centre in the analysis of data.

The database consists of (at least) three sheets: (i) the datatable  (ii) the log sheet and (iii) the info sheet.

In the example Excel file below we have also included a sheet with practical hints and warnings that are specific to use of Excel.

The datatable

  • This is where the  data are punched and stored.
  • Main organization is a table with variables as columns and observations as rows.
  • An ID variable containing a unique identifier for each individual unit (e.g. animal) must be included.
  • If you have repeated observations on the same unit, then punching one new row - with same ID - for each repetition is usually better than puncing repeated observations as new variables.
  • Keep all original data values as detailed as possible. Do not group the values. This can be done by use of software after data has been punched.
  • Some statistical software (like STATA) require that all variable values to be used in analysis are numerical i.e. numbers only.
  • Make sure to have backup copies of the database while entering data.
  • Store a mother copy of the database and lock it when the database is complete, and use a working copy for analysis etc.
  • Variable values in the mother database computed by the database software, should be changed to and stored as actual values (by "copy special") to prevent problems with formulas when using/copying to statistical software.
  • If you discover errors in the mother database, then save a copy as version "old" of the mother file and make sure to document (in the log) the corrections you make.
  • For date and time variables, make sure that these are entered and understood as date and time values (and not as text or numbers!)

The log sheet

  • The purpose is to keep track of the history of the database.
  • Write the date and info about changes in database structure and/or about editing of data + your initials if several persons edit the same database.

The info sheet

  • The purpose is to have the basic info about the database and the variables orderly available in one place.
  • Give a general discription of the dataset
  • List all variables and give info about name, describe what it is ("label") and list the codes with value if any.

 

The spread sheet template