2007-01-08

Reading surveys

A couple of weeks ago, a celebrated columnist (I honestly forget who) in either Apple Daily or Next Weekly commented on an internet survey about the now-infamous clock tower at the Star Ferry pier (Island side; my mom still confuses it with the train terminal clock tower on the TST side).

The results were something like this (I don't remember them exactly, and I don't have time to dig up the old issue of Next):
  • 50% agree it should be preserved
  • 33.33% against
  • 16.67% neutral

The columnist went on to say that this shows what the Hong'ers think.

The numbers look a bit fishy to me. If you know a little number theory or arithmetic, you may have noticed that the percentages are exactly 1/2, 1/3 and 1/6, i.e. a ratio of exactly 3:2:1. That means the number of people who answered the internet survey is most likely a multiple of six, and the answers fall into the three categories in exactly that ratio.
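To make the arithmetic concrete, here is a small Python sketch (my own illustration, not anything from the column). It snaps each reported percentage to the nearest simple fraction with `Fraction.limit_denominator`, then takes the least common multiple of the denominators to get the smallest sample size that could produce all three numbers. The percentages are the ones I recalled above, and the denominator cap of 12 is an arbitrary choice.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

# Percentages as I remember them from the column.
reported = {"agree": 50.0, "against": 33.33, "neutral": 16.67}

# Snap each percentage to the closest fraction whose denominator is at most 12
# (the cap is an arbitrary assumption for this sketch).
snapped = {k: Fraction(v / 100).limit_denominator(12) for k, v in reported.items()}

# The smallest sample size that can produce every fraction with whole people.
n = lcm(*(f.denominator for f in snapped.values()))
counts = {k: int(f * n) for k, f in snapped.items()}

print(snapped)
print(n, counts)
```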

My gut feeling is that the sample size was very likely EXACTLY 6. As people into probability and statistics can tell you, it is extremely unlikely, and would be quite a coincidence, for survey results to come out at exactly that ratio when the numbers are larger. For example, suppose the sample size were sixty and opinion really were split roughly 3:2:1. You could run a Monte Carlo simulation (or work through the trinomial distribution) and you would find that an exact 30:20:10 outcome comes up only rarely. More likely you would get 29:22:9 or something like that; there are so many neighbouring combinations that you rarely hit the exact case.
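Here is a rough sketch of both approaches in Python. It assumes the true split of opinion really is exactly 1/2, 1/3 and 1/6, which is the most favourable case for getting an exact ratio; the function names, the number of trials and the random seed are my own arbitrary choices. The exact trinomial figure and the simulated estimate should roughly agree, and both shrink as the sample grows (roughly in proportion to 1/n).

```python
import random
from fractions import Fraction
from math import factorial

# Assumption: the "true" opinion split is exactly 1/2 agree, 1/3 against,
# 1/6 neutral (3:2:1), the most favourable case for an exact-ratio result.
PROBS = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))


def exact_ratio_probability(n: int) -> Fraction:
    """Trinomial probability that n answers split exactly n/2 : n/3 : n/6."""
    if n % 6 != 0:
        return Fraction(0)  # an exact 3:2:1 split needs n to be a multiple of 6
    counts = (n // 2, n // 3, n // 6)
    # Multinomial coefficient n! / (a! b! c!), kept as an exact integer.
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)
    prob = Fraction(coeff)
    for p, c in zip(PROBS, counts):
        prob *= p ** c
    return prob


def monte_carlo_probability(n: int, trials: int = 10_000, seed: int = 2007) -> float:
    """Simulate `trials` surveys of n people and count exact 3:2:1 splits."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [n // 2, n // 3, n // 6]
    hits = 0
    for _ in range(trials):
        # Each respondent answers 0 (agree), 1 (against) or 2 (neutral).
        answers = rng.choices(range(3), weights=[3, 2, 1], k=n)
        if [answers.count(i) for i in range(3)] == target:
            hits += 1
    return hits / trials


for n in (6, 12, 30, 60):
    print(n, round(float(exact_ratio_probability(n)), 4), monte_carlo_probability(n))
```

Comparing n = 6 with n = 60 shows the point: an exact 3:2:1 split is several times more likely with six respondents than with sixty.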

I have seen too many results like these in the Far Eastern Economic Review Executive Surveys over the years. They are all "round fractions", and in some cases I would safely bet the sample size was 3 or even 2. (To be fair, it could quite plausibly have been 6 or 9 rather than 3, but it is extremely unlikely to have been 12 or larger.)

Now, if the Star Ferry Clock Tower survey had a sample size of only 6, what can it tell us about what our people think? And remember that it was an internet survey: those who feel very strongly about the subject may simply vote from several IP addresses (home, a neighbor's unsecured Wi-Fi, work, Starbucks, airports).

Here are my tips for spotting small samples in a survey:

  • .50: multiples of 2
  • .33/.67: most likely multiples of 3
  • .14/.29/.57/etc.: multiples of 7
  • .11/.22/.33/etc.: multiples of 9
  • etc.

When I say "multiples", I mean the sample size is most likely just that number itself, not a multiple of it. Even if it is a multiple, it is at best 2x, not 3x or 4x.
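The tip above can be turned into a brute-force check: try every sample size n and keep those for which whole-number counts reproduce all the reported percentages after rounding. This is a sketch under my own assumptions (percentages rounded to two decimal places, a search limit of 100, and only the nearest whole count tried for each answer, which is enough when n is small); the function name is made up for illustration.

```python
def consistent_sample_sizes(percentages, decimals=2, max_n=100):
    """Return every sample size n <= max_n whose whole-number counts, summing
    to n, reproduce all the reported percentages when rounded to `decimals`."""
    # Half a unit in the last reported digit, plus a little float slack.
    tol = 0.5 * 10 ** (-decimals) + 1e-9
    sizes = []
    for n in range(1, max_n + 1):
        # Nearest whole count for each answer; with small n no other count
        # could round to the same percentage.
        counts = [round(p * n / 100) for p in percentages]
        if sum(counts) != n:
            continue
        if all(abs(100 * k / n - p) <= tol for k, p in zip(counts, percentages)):
            sizes.append(n)
    return sizes


print(consistent_sample_sizes([50, 33.33, 16.67])[:3])  # → [6, 12, 18]
```

The smallest size returned is the one I would bet on; the larger entries are the 2x, 3x, ... multiples, which become less and less plausible.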
