Pregnancy Weight Linear Regression with Octave (1)

I had my first idea for a data science project about six months into my last pregnancy. Like the obsessive INTJ I am, I had carefully recorded my morning weighings (on a balance beam scale accurate to the 1/4 pound) in my journal, and knowing that pregnancy weight gain is (supposed to be) roughly linear, I was curious whether I could do a linear regression at that point, use it to predict my end weight, and then see whether it was accurate. Kind of like those booths that will tell you what your toddler will look like in 20 years.

I hunted down the numbers and put them into a two-column list that eventually totaled … 23 entries. Okay, so not exactly big data. But a decent first project nonetheless. Now for the regression.

My first inclination was to use R, seeing as it’s the flagship language of the academic data science community. However, I could not for the life of me get a single R linear regression tutorial to work right out of the box (that is, without slogging through the 80 pages of intro before that chapter). Perusing the documentation in tandem proved fruitless — the arguments in the tutorial examples often didn’t match the function description! Plus, the hubby would make fun of me viciously every time he caught me using it.

Anyway, I abandoned R.

At this point, I succumbed to third-trimester nesting and by the time I surfaced from the haze, I was several months postpartum. This time, I had a better idea: Octave.

You know how sometimes you bang your head against a problem for what seems like ever … and then it gives, just like that? Well, Octave was my breakthrough for regression — the very first tutorial I found worked right out of the gate! Remember, never let go.

By now, I had weight data for my entire pregnancy, which both swelled my file to an impressive 49 entries and made the original purpose of my inquiry moot. Nevertheless, I decided to do the project anyway — what’s an education without some pointless busywork? ;)

So let me show you how I did it.

Gather The Data

The first thing I did was gather my data into a file, which looks like this:

13 January 2015,151.75
23 January 2015,152
27 January 2015,148.5
15 February 2015,149.5
25 February 2015,147.75

29 August 2015,174.5
1 September 2015,175.25

You’ll notice that the dates are written in “dd Month YYYY” format, which I favor for its exceptional human readability. Unfortunately, this is a rather unfriendly format for the computer, a most unhuman creature. After a brief attempt to wrangle dates in Octave plots, I pulled out my Swiss Army Chainsaw and converted them to Unix timestamps (an approach later approved of by the husband).

Here’s the (rather uninspired and utilitarian) Perl code:

use strict; use warnings;
use Date::Parse;

# Read each "dd Month YYYY,weight" line and convert the date
# to a Unix timestamp.
open(my $fh, '<', 'weights.csv') || die "cannot open weights.csv: $!";
my $weight_list = "";
while (my $line = <$fh>) {
  chomp $line;
  my ($date, $weight) = split(/,/, $line);
  $weight_list .= str2time($date) . ",$weight\n";
}
close($fh);

open(my $nf, '>', 'data.csv') || die "cannot open data.csv: $!";
print $nf $weight_list;
close($nf);

I put that in a file (named something creative, I know) and ran it with perl in the same directory as weights.csv. And now we have a parsed data file called data.csv that looks like this:

1421107200,151.75
1421971200,152
1422316800,148.5
…

(The exact timestamp values depend on your local timezone; the ones above are the UTC conversions of the first few dates.)
Now, clearly the most pressing question at this point is, what does a 175 pound pregnancy look like on a 5-foot-2 woman? (For science, of course.) Well, since you asked:

38 weeks 2 days pregnant

No, it was not terribly comfortable. Yes, I do recommend losing more weight in between pregnancies if you can possibly manage it.


Exploratory Graph

You know how they’re always saying that data science is all about visualization, et cetera, et cetera? Well, this is our moment. Let’s go ahead and fire up Octave, load our data, and plot it:

> data = load('data.csv')
> plot(data(:,1), data(:,2), 'linestyle', 'none', 'marker', 'o')

You should be seeing something like this,

Exploratory graph of data

which looks quite plausibly linear. Well, at least we’re barking up the right tree. :) In the next post, we’ll linear regress (is that a phrase?) that data.
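If you can’t wait for the next post, here’s a rough sketch of the idea in Python with numpy (the real thing will be done in Octave). The numbers below are made-up stand-ins shaped like data.csv, not my actual measurements:

```python
import numpy as np

# Made-up stand-in data (NOT my real measurements), shaped like data.csv:
# Unix timestamp, weight in pounds, roughly every two months.
data = np.array([
    [1421107200, 150.0],   # 13 Jan 2015
    [1425168000, 154.0],   #  1 Mar 2015
    [1430438400, 159.5],   #  1 May 2015
    [1435708800, 164.5],   #  1 Jul 2015
    [1441065600, 170.0],   #  1 Sep 2015
])

# Ordinary least squares: fit weight = m * timestamp + b.
m, b = np.polyfit(data[:, 0], data[:, 1], 1)

# Extrapolate to a hypothetical due date two weeks past the last entry.
due = 1442275200  # 15 Sep 2015
predicted = m * due + b
print(f"{predicted:.1f} lb")
```

With these fake numbers the fit predicts a little over 171 pounds at the “due date” — which is exactly the kind of mid-pregnancy extrapolation I originally wanted to check against reality.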