Central limit theorem

(I talk about little insights or aha moments I’ve had while learning concepts. The concepts themselves are best learned from sources far wiser than me, so I do not try to be comprehensive. Instead, I prod you to think by presenting the crisp, joyful little moments of clarity I’ve had, and I invite corrections to my thought process.)

I encountered the central limit theorem many times while studying probability and statistics without quite understanding it, and as a result I had a fundamental lack of clarity when it came to hypothesis testing. Why are we using the normal distribution to talk about the average number of heads in a series of coin tosses? What is so ‘normal’ about tossing a coin? What about those light bulb failure rates? Why are they so faulty, and how do I know they all fall on a bell curve? Maybe the distribution of time to failure looks like a dinosaur tail, so why a bell curve? Maybe I should just get a beer.

So today we’ll understand a few things about the central limit theorem, twiddle around with it with our own hands, and as a result understand a thing or two about hypothesis testing. There are many versions of this theorem, but I will restrict this discussion to the classical central limit theorem, which concerns the mean of independent and identically distributed random variables with finite variance. For a large enough number of such random variables, the distribution of their mean approaches a normal distribution.

Before talking about what the parameters of that distribution would be, I’ll talk about the beauty of this result, which makes it applicable to such a wide range of problems. Remember the dinosaur-tail-looking distribution of time to failure for light bulbs? That may actually be so! But if I sample enough light bulbs, the mean of their failure times will follow an approximately normal distribution. The same goes for the average number of heads in a sample of coin tosses. You can see how the convergence of all these distributions to the normal distribution is at once frightfully wonderful and useful.

To be a little more specific: if we sample from any probability distribution with mean \mu and finite variance \sigma^2, then as the sample size n increases, the distribution of the sample mean tends to a normal distribution with mean \mu and variance \sigma^2 / n.
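To get a feel for this claim, here is a minimal sketch that uses only Python's standard library. It assumes an exponential distribution with rate 1 as a stand-in for the "dinosaur tail" (so \mu = 1 and \sigma^2 = 1). The function name `sample_means`, the seed, and the sample sizes are my own choices for illustration.

```python
import random
import statistics


def sample_means(n, trials, rate=1.0, seed=0):
    """Draw `trials` samples of size `n` from an exponential distribution
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(trials)
    ]


# Exponential(rate=1) is heavily skewed, with mu = 1 and sigma^2 = 1.
for n in (1, 10, 100):
    means = sample_means(n, trials=5000)
    observed_mean = statistics.fmean(means)
    observed_var = statistics.variance(means)
    # The theorem suggests observed_mean is near mu and observed_var near sigma^2 / n.
    print(n, round(observed_mean, 3), round(observed_var, 4), round(1.0 / n, 4))
```

As n grows, the spread of the sample means should shrink roughly like 1 / n, and a histogram of `means` should look more and more like a bell, even though the underlying distribution is nothing like one.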

So we already get an idea of how this may be useful in testing hypotheses, given that the normal distribution is well understood (as compared to dino tails). But before delving into that, let us play around with what we know. Observe, tinker, be silly. The Jupyter notebook linked below lets you simulate coin tosses and observe how, for larger sample sizes, the distribution of the number of heads in a sample approximates the well-known bell curve. (The number of heads in a sample also approaches a normal distribution because the sum is just the sample size, a constant, times the mean. This concept, called the normal approximation to the binomial distribution, can be explored in detail in the sources below.)

Press the play button on the left of the notebook cell to run the tool and observe the animation.

(Opens in a new tab, give it a bit to load the environment)

Coin Toss Notebook
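If the notebook does not load, the same experiment can be roughly sketched in plain Python. This is not the notebook's code, just a stand-in under my own assumptions: a fair coin, 100 tosses per sample, and 20000 samples. The helper name `head_counts` is made up for illustration.

```python
import random
from statistics import NormalDist, fmean, pvariance


def head_counts(n, trials, p=0.5, seed=1):
    """Toss a coin `n` times, `trials` times over, and return the
    number of heads seen in each run."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]


n, p = 100, 0.5
counts = head_counts(n, trials=20000, p=p)

# Normal approximation to the binomial: mean n*p, variance n*p*(1-p).
approx = NormalDist(mu=n * p, sigma=(n * p * (1 - p)) ** 0.5)

k = 55
empirical = sum(c <= k for c in counts) / len(counts)

# Compare the simulated mean and variance with n*p and n*p*(1-p).
print(fmean(counts), pvariance(counts))
# Compare P(heads <= k): simulation versus the bell curve.
# The +0.5 is the usual continuity correction for a discrete count.
print(empirical, approx.cdf(k + 0.5))
```

Try a small n such as 5 and the two probabilities drift apart; the bell curve only becomes a good stand-in as the sample size grows.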

Misleading Through Charts and Graphs – How you are made to buy organic food and sold other scams.

(Alberto Cairo’s paper “Graphics Lies, Misleading Visuals: Reflections on the Challenges and Pitfalls of Evidence-Driven Visual Communication” guided the analysis below.)

Humans love visual representation of data. A computer may look at long rows of data, or unstructured data even, and draw insights from it. For us humans though, that information needs to be presented as graphics we can understand, often with various shapes and colors added to drive home a key point. While I’m all for making information and trends visually insightful to humans, we must proceed with caution as often such representations can be misleading or downright dishonest. I highly recommend reading Cairo’s paper to gain a deeper understanding of this problem.
Here, I’d like to provide a quick analysis of a graph I saw in a Medium article titled ‘Why We Need to Recognize and Consider Organic Foods’ [1].

[1] https://medium.com/@mcmahonadam2/why-we-need-to-recognize-and-consider-organic-foods-f127f69261df

I’m leaving out the statistical information at the top of the graph, including debates on the relevance of p-values and R-squared goodness-of-fit values, and even the fact that correlation doesn’t imply causation, to focus simply on the visual deception of the graphic.

The deceptive tricks used fall into two categories:

  1. Too much data is represented to obscure reality.
  2. Using graphic forms in inappropriate ways.

Too much data is represented to obscure reality

The graph claims to plot two different correlations:

between glyphosate usage and death rates from end stage renal disease

between the percentage of US corn and soy crops that are GE and death rates from end stage renal disease.

What does it show in reality, though? Three time series superimposed on one another on the same chart.

Note that the x axis is time, meaning the graph doesn’t show the correlation between any two series. Instead, it simply shows how three different series of data change with time!

Need I point out that the series all start at different points in time? For example, death rates from renal disease are plotted from 1985 to 1991 even though nothing is plotted for those years about the supposedly causal glyphosate usage or the percentage of soy and corn crops that are GE.

Using graphic forms in inappropriate ways.

Now look at the Y axes.

For one, they are both truncated. Also, why are there two axes? Is there a third axis for the % GE soy and corn series? (By the way, how does the same percentage apply to both soy and corn?)

Truncating the y axis magnifies, and hence distorts, the magnitude of change in a series.

For a series (40, 50), say, if the y axis is truncated at 40, the first point sits at zero height, so the point with value 50 would look like infinite growth from the previous point!
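The arithmetic behind that exaggeration is easy to check. Here is a small sketch, where `apparent_ratio` is a hypothetical helper of my own that compares the drawn heights of two points above wherever the y axis starts:

```python
def apparent_ratio(first, second, baseline):
    """Ratio of the drawn heights of two values when the y axis
    starts at `baseline` instead of zero."""
    first_height = first - baseline
    second_height = second - baseline
    if first_height == 0:
        # The first point sits on the axis, so any rise looks unbounded.
        return float("inf")
    return second_height / first_height


print(apparent_ratio(40, 50, baseline=0))   # honest axis: 1.25
print(apparent_ratio(40, 50, baseline=39))  # truncated just below the data: 11.0
print(apparent_ratio(40, 50, baseline=40))  # truncated at the first value: inf
```

The data never changed; a real 25% increase is drawn as an elevenfold or unbounded jump purely by moving the axis origin.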

Including multiple y axes in a chart is a way to suggest correlations or overlaps in values that don’t really exist. If I’m allowed to change the scale of the y axis and its origin, I can make almost any two series look like they correlate.

To illustrate, I constructed two series of numbers, Random1 and Random2, with one data point per year from 1991 to 2009. Both series are the sum of a random number and a linear time trend.

In the above figure, the two series are plotted against time, with a common Y axis starting at the origin 0.

Above, I’ve included two y axes with truncated origins.

Above, I hid some of the values of Random1, which suggests at first glance that the sudden appearance of the blue line caused the changes in the orange line.
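For anyone who wants to rebuild a pair of series like Random1 and Random2, here is a rough sketch. The slopes, noise ranges, and seed are assumptions of mine, not the values behind the figures. The `rescale` helper mimics what two independently truncated y axes do: it stretches each series to fill its own axis.

```python
import random


def make_series(slope, noise, rng, years):
    """A linear time trend plus independent uniform random noise."""
    return [slope * (year - years[0]) + rng.uniform(0, noise) for year in years]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def rescale(values):
    """Stretch a series to span 0..1, as a private truncated y axis would."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


years = list(range(1991, 2010))  # one data point per year, 1991 to 2009
rng = random.Random(42)
random1 = make_series(slope=2.0, noise=5.0, rng=rng, years=years)
random2 = make_series(slope=0.5, noise=3.0, rng=rng, years=years)

# The noise terms are independent, yet the shared time trend alone
# produces a high correlation between the two series.
print(round(pearson(random1, random2), 2))

# On their own rescaled axes, the two lines sit nearly on top of each other.
for year, a, b in zip(years, rescale(random1), rescale(random2)):
    print(year, round(a, 2), round(b, 2))
```

Nothing connects the two series except that both grow with time, which is exactly why plotting against time on two tailored axes says so little about cause.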

So, in conclusion, graphs are great, but they are worth pondering over beyond the initial aha moment they might create in us. 

When it hurts

Today marks 365 days spent in Canada. While I celebrate this day, frankly it hasn’t been easy at all. Around the end of March 2018, I woke up with terrible pain in my forearms. It soon spread to my legs as well. At one point, I couldn’t walk to the bathroom without severe pain, and my partner had to feed me as I couldn’t bend my elbow to feed myself. Several blood draws and rounds of internal imaging later, we still don’t have a definite answer about what went wrong, just some good guesses based on intense observation. I was forced to take time off work because of my illness.

I haven’t been in a place to talk about the fallout of my illness, especially the mental health components and the fear of judgment that comes with sharing such news with friends and acquaintances. However, now I must, because I sense learning here that I must make available, and social discourse that needs kindling. Also, it involves love, support and silly laughs, and we all need more of that. Don’t worry, it isn’t a tearjerker. The ending isn’t too bad, maybe because we don’t know it yet. I must also emphatically add that falling ill has been one of the loudest wake-up calls I have had to the immense privilege I have: to be well fed and housed while not actively earning, and to have family and friends supporting me. The unearned advantages of my caste and the wealth of my parents have benefitted me greatly.

Since extensive testing did not show any major illness, save a minor vitamin D deficiency, most doctors thought I must have fibromyalgia. That is a pretty devastating diagnosis at thirty, but oh well, in the words of that greatly missed gadfly, Christopher Hitchens, “To the question ‘Why me?’ the cosmos barely bothers to reply: why not?” One thing that gave me hope, though, was that the fatigue associated with that medical life sentence did not show up. I am still very unsure of this diagnosis. I was always up and about, except when overwhelmed by my post-traumatic stress disorder. As soon as I could, I was swimming, the least impactful of exercises and a great one to get me going. Impact is still painful, but not as bad as when I couldn’t take a walk or type a line. Today I walked all over the city, taking public transit at times, and here I am typing these lines out, though with regular breaks.

I want to talk about the times I shied away and made myself small because of insecurities, often stemming from an inability to concisely communicate the nuanced nature of my condition. I do this in the hope that we humans feel less alone with these feelings, that discourse starts which makes these disabilities easier to talk about, relate to and get support for, that we stop beating ourselves up about terrible numbers thrown by giant uncaring cosmic dice, and that we might go on to accept ourselves and treat ourselves with kindness. So here’s to needing a seat on a crowded train because you can barely hold on to anything, but not knowing how to ask for it because you look young and healthy. Here’s to not being able to quickly explain that I can post on social media because I can type in short bursts but can’t sustain it at the level required for a full-time job, and to wondering whether I owe someone an explanation for why I don’t use my voice instead. No, I don’t, and I will not offer one here, just to make that point. It took me time to realize that I’m valid even when I take time for myself to heal, and that even when I lean on my spouse, I’m still an active part of the unit. I hope it takes others less time to reaffirm their self-worth when they need to seek help. I am also aware of those who need help and don’t get it at all, even in a compassionate nation like Canada, where society offers helping hands rather than talk about invisible ones. Sartre said, “Hell is other people.” I hope we don’t turn into our own hell, with that judgmental voice forever in our head.

I also want to talk about times of anguish, suicidal thoughts, the interaction of mental and physical health conditions that exist simultaneously and amplify one another, but that will take several more posts to delve into. Here I’ll just express gladness that when I thought it was all over, it wasn’t. While I thought I’d never get up from bed, there were days of punching out push ups with the very same arms and long walks along the sea, waiting, just a few weeks away.

Let me now talk about happiness: that feeling I get when I wake up and see my partner’s face, the knowledge that we’ll face it all together and still make our time on this rock in these decaying bodies worthwhile. There has been some fortunate fallout here. I made peace with my family, or at least started the long process of healing. We have eight dogs in Bangalore, our family has grown so much, and in the past months I have received more love than I thought possible. I learned that swiping is less impactful than typing and hence started my love affair with my Huion graphics tablet, creating silly art that I am now going to share with all of you. I created, in my head and with my pen, the character of a little girl with the attitude I wish I had and the nerdiness I wish I’d explored earlier. I hope you’ll check out my page, have some laughs and even hang out there to share your thoughts.

I want to say a big thanks to my family. My mother who has cooked nourishing meals for me, my dad who I am very proud of, for joining me in a journey of healing, along with our four-legged little kids. My sister for her patience and love through all this. Sidhartha, oh gosh, what can I even say about the support he’s given me, I know running out of words is cliché, but I really have, let’s just say, I never knew I could love someone so much. Varsha, for being that rock-solid backbone through all times, she’s one of the strongest people I have known and I have greatly leaned on that strength.

I also want to say a big thanks to my friends, everyone at Nirmukta who heard me vent and gave me unconditional understanding. Special thanks to Joy, without whose advice and support I wouldn’t have known where to begin. I know I haven’t won an award to thank so many people, but for a person who could barely eat by herself in April 2018, typing this out is an incredible personal achievement, and I relish that.

In coming posts, I will talk about the specifics that I feel helped my health. While I’m no doctor and certainly don’t advocate self-treatment, if you read up on repetitive strain injuries, you’ll learn that doctors can only help to an extent, that many non-evidence-based practitioners are out to scam you, and that a lot of help has traditionally come from sufferers themselves, in my case largely from computer programmers.

Hope you keep reading, and more importantly, responding with your thoughts and sharing this, so I can reach out to many others.