Explain different data representation methods.
Answers
Answer:
Explanation:
There are many different ways of representing longitudinal data structures
Some depend on the nature of the data ...
...or on the nature of the analysis
Others are largely equivalent
Let's take pure panel information: discrete, evenly spaced state observations. There are three common layouts:
1. One file per wave, with identical structure, identified by PID (the person identifier)
2. One file, one record per respondent, with the wave identified by variable name/position
3. One file, one record per respondent per wave, identified by PID and a wave-number index variable
These are equivalent in their information content:
We can move between them relatively easily (especially between types 2 and 3, in Stata and later versions of SPSS)
But they differ in their ease of use for different purposes.
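As a rough illustration of moving between types 2 and 3, here is a minimal sketch in plain Python. The variable names (`pid`, `status1`, `status2`, `status3`), the helper functions and the data are all hypothetical; in practice this is the kind of job Stata's `reshape` command does.

```python
# Hypothetical wide file (type 2): one record per respondent,
# with the wave encoded in the variable name (status1, status2, status3).
wide = [
    {"pid": 101, "status1": "employed", "status2": "employed", "status3": "unemployed"},
    {"pid": 102, "status1": "student", "status2": "employed", "status3": "employed"},
]

def wide_to_long(records, stub, waves):
    """Return one record per respondent per wave (type 3)."""
    return [
        {"pid": rec["pid"], "wave": w, stub: rec[f"{stub}{w}"]}
        for rec in records
        for w in waves
    ]

def long_to_wide(records, stub):
    """Rebuild one record per respondent from person-wave records."""
    by_pid = {}
    for rec in records:
        row = by_pid.setdefault(rec["pid"], {"pid": rec["pid"]})
        row[f"{stub}{rec['wave']}"] = rec[stub]
    return list(by_pid.values())

long_file = wide_to_long(wide, "status", [1, 2, 3])
round_trip = long_to_wide(long_file, "status")
```

The round trip returns the original records, which is what "equivalent in their information content" means here.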
For example, to cross-tabulate a variable for a given pair of waves, type 2 is clearly better.
However, if you want to cross-tabulate current status with last year's status, pooling across waves, type 3 is better.
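The pooled cross-tabulation of current status against last year's status can be sketched on a long file as follows. This is only an illustration: the records and the variable names (`pid`, `wave`, `status`) are made up.

```python
from collections import Counter

# Hypothetical long file (type 3): one record per respondent per wave.
long_file = [
    {"pid": 101, "wave": 1, "status": "employed"},
    {"pid": 101, "wave": 2, "status": "employed"},
    {"pid": 101, "wave": 3, "status": "unemployed"},
    {"pid": 102, "wave": 1, "status": "student"},
    {"pid": 102, "wave": 2, "status": "employed"},
    {"pid": 102, "wave": 3, "status": "employed"},
]

# Index status by (pid, wave) so the previous wave can be looked up directly.
status_at = {(r["pid"], r["wave"]): r["status"] for r in long_file}

# Count (last wave's status, current status) pairs, pooled over all waves.
transitions = Counter()
for (pid, wave), current in status_at.items():
    previous = status_at.get((pid, wave - 1))
    if previous is not None:  # wave 1, or a missed wave, has no lagged value
        transitions[(previous, current)] += 1
```

Because every person-wave is its own record, the same lag logic applies to every wave at once; in a wide file the equivalent would need a separate comparison for each pair of adjacent wave variables.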
Status history data is relatively simple in principle: there is an observation for each time unit per person
An easy way to represent it is as a wide (horizontal) file: one variable per time unit
Broadly equivalent is a long vertical file: one record per person-time-unit
The practical complication is combining waves
If (as in the ECHP design) the reference period is a calendar year, some respondents never report their most recent experience: the time after the end of that year is covered only if they are interviewed again
If (as in the BHPS) a variable-length reference period is used, the periods reported at successive waves will overlap
With overlap, the analyst has to decide which report to accept
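A minimal sketch of that decision, assuming monthly calendars keyed by a month index. The data, the state codes (`E`, `U`) and the `prefer` parameter are hypothetical, and neither rule is offered as the correct one.

```python
# Hypothetical monthly calendars for one respondent, keyed by month index.
# Wave 1 covers months 1-6; wave 2 covers months 5-10, so months 5-6 overlap.
wave1 = {1: "E", 2: "E", 3: "E", 4: "E", 5: "E", 6: "U"}
wave2 = {5: "E", 6: "E", 7: "U", 8: "U", 9: "E", 10: "E"}

def combine(earlier, later, prefer="later"):
    """Merge two wave calendars; `prefer` settles months reported twice."""
    if prefer not in ("earlier", "later"):
        raise ValueError("prefer must be 'earlier' or 'later'")
    first, second = (earlier, later) if prefer == "later" else (later, earlier)
    merged = dict(first)
    merged.update(second)  # the preferred report overwrites the other
    return dict(sorted(merged.items()))

def conflicts(earlier, later):
    """Months where the two reports overlap and disagree."""
    return sorted(m for m in earlier.keys() & later.keys() if earlier[m] != later[m])

history_later = combine(wave1, wave2, prefer="later")
history_earlier = combine(wave1, wave2, prefer="earlier")
disputed = conflicts(wave1, wave2)
```

Listing the disputed months before merging shows how much the choice of rule matters for a given respondent.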
In SPSS, handling wide 'calendars' with VECTOR/LOOP is straightforward
In Stata, handling long vertical files is easy
Event history data is a little more complicated
An efficient representation is to record the dates and destinations of all transitions: this is a pure event history (the act of observation must be recorded as an event)
Closely related is spell or episode history: store the start of the spell, the state and the end date (including 'on-going at time of observation' or 'censored')
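The link between the two representations can be sketched as below. This is an illustrative simplification: the event list, the `OBSERVED` marker and the field names are hypothetical, and it assumes a single observation event at the end of the history.

```python
# Hypothetical pure event history for one person: (month, destination state).
# The interview is itself recorded as an event so the last spell can be closed.
events = [
    (1, "employed"),
    (14, "unemployed"),
    (20, "employed"),
    (30, "OBSERVED"),  # act of observation, not a change of state
]

def events_to_spells(events, observed="OBSERVED"):
    """Turn dated transitions into spells with start, state, end and a censoring flag."""
    spells = []
    for (start, state), (end, nxt) in zip(events, events[1:]):
        if state == observed:
            continue  # an interview does not start a new spell
        spells.append({
            "start": start,
            "state": state,
            "end": end,
            "censored": nxt == observed,  # still on-going when observed
        })
    return spells

spells = events_to_spells(events)
```

Without the observation event there would be no way to tell how long the final spell had been under observation, which is why it has to be recorded.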
However, for many purposes event/episode data can be transformed into state histories, with a variable per time unit
This can be wasteful, if the average spell length is much greater than one time unit: long strings of the same data
A bigger problem is that it loses information: for instance two successive jobs with the same characteristics look like one long job.
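Both drawbacks show up in a small sketch. The spell data and function names are hypothetical, and spells are assumed to run from `start` up to, but not including, `end`.

```python
from itertools import groupby

# Hypothetical spell file: two successive jobs with identical characteristics.
# Spells run from `start` up to, but not including, `end` (months).
spells = [
    {"start": 1, "end": 7, "state": "full-time clerical"},   # first job
    {"start": 7, "end": 13, "state": "full-time clerical"},  # second job
    {"start": 13, "end": 16, "state": "unemployed"},
]

def to_state_history(spells):
    """One state per month: the wide 'calendar' form."""
    return {m: s["state"] for s in spells for m in range(s["start"], s["end"])}

def to_spells(history):
    """Recover spells from a state history by grouping runs of equal states."""
    months = sorted(history)
    recovered = []
    for state, run in groupby(months, key=history.get):
        run = list(run)
        recovered.append({"start": run[0], "end": run[-1] + 1, "state": state})
    return recovered

history = to_state_history(spells)
recovered = to_spells(history)
```

Three spells become fifteen monthly values, and converting back yields only two spells: the job change at month 7 has disappeared.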
It's also harder to answer spell-level questions from a state history (how long did a spell last, when did it start or end)
But if you need to relate status in many domains, it's very convenient (e.g., you want to know job status and marital status at a particular time)
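A minimal sketch of why the state-history form is convenient here, using made-up monthly histories in two domains. The names `job`, `marital` and `status_at` are hypothetical.

```python
from collections import Counter

# Hypothetical monthly state histories for one person in two domains.
job = {1: "employed", 2: "employed", 3: "unemployed", 4: "unemployed", 5: "employed"}
marital = {1: "single", 2: "single", 3: "single", 4: "married", 5: "married"}

def status_at(month, **domains):
    """Look up the state in every domain at one point in time."""
    return {name: history[month] for name, history in domains.items()}

snapshot = status_at(4, job=job, marital=marital)

# Months spent in each joint (job, marital) state, using the shared time index.
joint = Counter((job[m], marital[m]) for m in sorted(job.keys() & marital.keys()))
```

Because both domains share the same time index, relating them is a direct lookup; with spell files, each domain's spells would first have to be searched for the one covering the month in question.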