Tutorial: Age distribution of Stack Overflow users


Update: I decided to rerun the script since we now have about twice as many users. The distribution trend is much more pronounced in this version. Though the following text is from the original version of this post, the data and graph are new.

Update: Here is a StackExchange Data Explorer query

Since I was curious about the average age of the users here, I decided to write a script to answer the question in the vein of Grant's profile scraper. The script downloaded the profile page of every user and cataloged the age for each person that chose to provide it. The script then printed out the aggregate data.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use LWP::Simple;

    my $MAX_USER_ID = 15_000;

    my %Ages;

    $| = 1;  # print immediately

    print "Fetching...";
    for my $uid (1..$MAX_USER_ID)
    {
        my $page = get "http://stackoverflow.com/users/$uid";
        next unless $page;

        # Count the age if the profile lists one
        $Ages{$1}++ if $page =~ m/Age\s*<\/td>\s*<td>\s*(\d+)\s*<\/td>/;
        print "$uid..." if $uid % 10 == 0;
    }
    print "\nDone.\n";

    # Sort numerically so that 8 and 9 come before 11
    for my $age (sort { $a <=> $b } keys %Ages)
    {
        print "$age: $Ages{$age}\n";
    }

The output is as follows:

    8: 13
    9: 3
    11: 3
    12: 1
    15: 2
    16: 14
    17: 8
    18: 21
    19: 49
    20: 58
    21: 91
    22: 118
    23: 173
    24: 221
    25: 255
    26: 284
    27: 276
    28: 289
    29: 225
    30: 224
    31: 212
    32: 167
    33: 164
    34: 142
    35: 112
    36: 123
    37: 111
    38: 121
    39: 78
    40: 65
    41: 60
    42: 36
    43: 30
    44: 23
    45: 30
    46: 20
    47: 12
    48: 9
    49: 12
    50: 8
    51: 10
    52: 3
    53: 2
    54: 4
    55: 3
    56: 1
    58: 4
    60: 1
    61: 3
    63: 1
    66: 1
    68: 1
    88: 12

The average age of a Stack Overflow user is 30.1, assuming everyone is honest in their profile (though we do have more 8-year-olds than I would have expected). Finally, here is a graph showing the distribution more clearly. That outlier to the right is our twelve 88-year-olds still going strong. If we remove the 8- and 88-year-olds from the average, it drops by 0.1 years to 30.0. (Actually, it's a drop of about 0.105.)
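For anyone who wants to check the averages, here is a minimal sketch that recomputes them from the output listed above. The helper name mean_age is made up for illustration and is not part of the original script; the counts are copied from the listing.

```perl
use strict;
use warnings;

# Age => number of users, copied from the output listed above.
my %ages = (
     8 => 13,   9 => 3,   11 => 3,   12 => 1,   15 => 2,   16 => 14,
    17 => 8,   18 => 21,  19 => 49,  20 => 58,  21 => 91,  22 => 118,
    23 => 173, 24 => 221, 25 => 255, 26 => 284, 27 => 276, 28 => 289,
    29 => 225, 30 => 224, 31 => 212, 32 => 167, 33 => 164, 34 => 142,
    35 => 112, 36 => 123, 37 => 111, 38 => 121, 39 => 78,  40 => 65,
    41 => 60,  42 => 36,  43 => 30,  44 => 23,  45 => 30,  46 => 20,
    47 => 12,  48 => 9,   49 => 12,  50 => 8,   51 => 10,  52 => 3,
    53 => 2,   54 => 4,   55 => 3,   56 => 1,   58 => 4,   60 => 1,
    61 => 3,   63 => 1,   66 => 1,   68 => 1,   88 => 12,
);

# Weighted mean of an age histogram, optionally skipping some ages.
# (Hypothetical helper, not from the original post.)
sub mean_age {
    my ($hist, @skip) = @_;
    my %skip = map { $_ => 1 } @skip;
    my ($sum, $count) = (0, 0);
    for my $age (keys %$hist) {
        next if $skip{$age};
        $sum   += $age * $hist->{$age};
        $count += $hist->{$age};
    }
    return $count ? $sum / $count : undef;
}

my $all     = mean_age(\%ages);
my $trimmed = mean_age(\%ages, 8, 88);
printf "all ages:         %.3f\n", $all;
printf "without 8 and 88: %.3f\n", $trimmed;
printf "difference:       %.3f\n", $all - $trimmed;
```

With the counts above this should come out at roughly 30.1 for everyone and 30.0 once the 8- and 88-year-olds are left out.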

[Image: graph of the age distribution]


Cool. Now let's see a distribution of reputation by age...


I always think of Mark Twain when I see things like this:

There are three kinds of lies: lies, damned lies and statistics.


I was impressed by this, so I have taken it further, to include reputation and membership length:



Eight- and nine-year-olds seem unlikely, but the 11s and 12s could be real. Jeff & co probably should look into the Children's Online Privacy Protection Act and take appropriate action.


Since there's no value in entering a valid age, I imagine many of the ages are invalid. Users have also done a lot of testing of the birthdate field's validation.


As for the 88-year-olds: when I tried to set my birthdate to 1919, it showed this:

There were errors when updating your profile

* Birthday must be after 1920/01/01  

When I tried to set it to 2020, it showed this:

There were errors when updating your profile

* Birthday must be before 2000/12/06  

The two edge cases make sense to me.
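To see how those two limits turn into the ages at both ends of the distribution, here is a small sketch. It assumes the check was made on 2008-12-06, which is only inferred from the "before 2000/12/06" limit being exactly eight years earlier. The helper age_on is hypothetical.

```perl
use strict;
use warnings;

# Age in whole years on a reference date; dates are [year, month, day].
# (Hypothetical helper for illustration.)
sub age_on {
    my ($birth, $today) = @_;
    my $age = $today->[0] - $birth->[0];
    # One year younger if this year's birthday has not happened yet.
    $age-- if $today->[1] < $birth->[1]
           || ($today->[1] == $birth->[1] && $today->[2] < $birth->[2]);
    return $age;
}

# ASSUMPTION: the date of the test, inferred from the 2000/12/06 limit.
my @today = (2008, 12, 6);

# Earliest and latest birthdays that satisfy the two error messages.
printf "oldest accepted:   %d\n", age_on([1920, 1, 2],  \@today);
printf "youngest accepted: %d\n", age_on([2000, 12, 5], \@today);
```

Under that assumption the accepted range is 8 to 88, which matches the two odd spikes in the data.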


The 88-year-olds are most likely false as well, since I think that is the maximum allowable age.


Damn, I feel (relatively) old now.


All the 8-year-olds must be people that have "2000-01-01" as default date of birth.


What's interesting is what the data possibly reflects. I mean, does the curve reflect the "prime-time" portion of a programmer's career?


Can those of us to the right of the bell have a Veterans Badge (or Old Gits)?

*** Edit: I'm gobsmacked that someone would vote this down. I'm really starting to wonder about the reputation system of SO. It's a great idea, but in practice rep has nothing to do with expertise: if people downmark (or upmark) threads like this, what does it really say?


At 29 you begin to stop looking for answers. Is that because you now have the answers, or because you no longer care?


I'd be more interested in "When did you start programming?" instead of age.

I started programming 30 years ago, but I'm not interested in putting my age on my profile. Sadly, age discrimination is a factor in this industry...


I wrote up a version that uses random sampling over what appears to be the full range of UIDs at the time of this writing.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use LWP::Simple;

    my $MAX_USER_ID = 55_000;       # Maximum range of UIDs to search
    my $MAX_HITS = 100;             # Maximum UIDs to randomly try
    my $Count_No_Age_As_Hit = 1;    # Count profiles without an age as hits

    my %Ages;   # Store the age distribution

    $| = 1;  # print immediately

    print "Fetching $MAX_HITS users.\n";

    my $hits = 0;
    until( $hits >= $MAX_HITS ) {
        # Pick a random UID in 1..$MAX_USER_ID
        my $uid = 1 + int(rand($MAX_USER_ID));

        my $page = get "http://stackoverflow.com/users/$uid";
        unless( $page ) {
            warn "$uid not found.\n";
            next;
        }

        if( $page =~ m/Age\s*<\/td>\s*<td>\s*(\d+)\s*<\/td>/ ) {
            $Ages{$1}++;
            $hits++;
        }
        else {
            $Ages{"no age given"}++;
            $hits++ if $Count_No_Age_As_Hit;
        }

        print "#$hits $uid... ";
    }
    print "\nDone.\n";

    for my $age (sort keys %Ages)
    {
        print "$age: $Ages{$age}\n";
    }

And the results of one run...

    19: 1
    20: 1
    21: 1
    23: 1
    25: 2
    26: 2
    27: 3
    28: 4
    29: 3
    32: 1
    34: 1
    36: 1
    43: 1
    44: 1
    no age given: 77

From which the only responsible conclusion I can draw is that you can't draw a conclusion. Too many users don't specify an age.
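To put a number on "too many", here is a rough sketch that summarizes the run above. The summarize helper is invented for this example, and it only describes this one sample of 100 users.

```perl
use strict;
use warnings;

# Results of the single run listed above (age => users).
my %sample = (
    19 => 1, 20 => 1, 21 => 1, 23 => 1, 25 => 2, 26 => 2, 27 => 3,
    28 => 4, 29 => 3, 32 => 1, 34 => 1, 36 => 1, 43 => 1, 44 => 1,
    'no age given' => 77,
);

# Split a histogram into users with and without an age.
# (Hypothetical helper for illustration.)
sub summarize {
    my ($hist) = @_;
    my ($with_age, $without, $sum) = (0, 0, 0);
    for my $key (keys %$hist) {
        if ($key =~ /^\d+$/) {
            $with_age += $hist->{$key};
            $sum      += $key * $hist->{$key};
        }
        else {
            $without += $hist->{$key};
        }
    }
    my $total = $with_age + $without;
    my %summary = (
        total    => $total,
        with_age => $with_age,
        share    => $total    ? $with_age / $total : 0,
        mean_age => $with_age ? $sum / $with_age   : undef,
    );
    return \%summary;
}

my $s = summarize(\%sample);
printf "%d of %d sampled users gave an age (%.0f%%)\n",
    $s->{with_age}, $s->{total}, 100 * $s->{share};
printf "mean age of those %d users: %.1f\n", $s->{with_age}, $s->{mean_age};
```

Only 23 of the 100 sampled users gave an age, and their mean is about 28.4, which is far too thin a basis to say anything about the whole site.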


Would be interesting to see the corresponding graph for Yahoo! Answers. Hah.


  • The gap between 66 and 88, with 8 people at 88 years, is quite strange! I think 1920 is the default year when signing up.
  • 8 years old? WOW!!


IMHO it would be much better for performance if you'd replace the call to curl with LWP::UserAgent. Interesting idea though.
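A minimal sketch of what that could look like, assuming LWP::UserAgent is installed. The URL and the age pattern are taken from the scripts above and will probably not match the site's current markup. The helper names extract_age and fetch_age are made up, and the keep_alive and timeout settings are just example values.

```perl
use strict;
use warnings;

# Pull the age out of a profile page, using the same pattern as the scripts above.
sub extract_age {
    my ($html) = @_;
    return $html =~ m{Age\s*</td>\s*<td>\s*(\d+)\s*</td>} ? $1 : undef;
}

# Fetch one profile through a shared user agent; returns undef on any failure.
sub fetch_age {
    my ($ua, $uid) = @_;
    my $res = $ua->get("http://stackoverflow.com/users/$uid");
    return undef unless $res->is_success;
    return extract_age($res->decoded_content);
}

# Only touch the network when user ids are given on the command line.
if (@ARGV) {
    require LWP::UserAgent;
    # One agent for all requests, so connections can be reused.
    my $ua = LWP::UserAgent->new(keep_alive => 1, timeout => 10);
    for my $uid (@ARGV) {
        my $age = fetch_age($ua, $uid);
        print "$uid: ", (defined $age ? $age : "no age given"), "\n";
    }
}
```

Reusing one agent avoids starting a new process or connection for each profile, which is presumably where the performance gain would come from.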


Damn, on a site like this I used to be one of those kids in the 13-18 range... now I'm all of 20 years old, but at least this graph lets me still feel like I'm on the young end. LOL, I guess I don't get to complain.


Graphs are only as good as the data going in!

8 & 88 should be discounted, seeing as there are a couple of very probable explanations.

On the other hand, I can't remember the last time I gave the right year as part of my profile - on at least half a dozen registrations, I'm well into my 200s...



It's very interesting that the graph you display follows a normal distribution, aka the bell curve.


Given the age distribution here, I'll have to keep my Frankie Goes to Hollywood references to a bare minimum.


Perhaps at a more advanced age you know all you need to know to maintain your Cobol and Fortran code, and you are not willing to learn yet another language or methodology. Or maybe people in my age group (47) have moved on to other positions e.g. management, sales etc.


I've listed my birthday in my profile as the latest that the system will accept, purely to test the "Autobiographer" badge. 8 years of age is the youngest that the system will allow. Curiously, my question as to why this is so has vanished from the system entirely.

(No, I'm not 8; I just don't want to give my age.)


Here is a data.stackexchange.com query for age distribution. Here is a graph.
