Tracking the Things I Make with GitHub

I spend my days writing. In my day job, I write emails, I write code, I write requirements documents. Outside my day job, I write stories, articles, and blog posts. Whether I am writing code or writing a blog post, the result is something to which I can point and say, “Hey, I made that!”

For a few months now, I have been fascinated by the idea of tracking the lifecycle of my work products. I can do this because all of my work products are essentially digital, and essentially plain text. When I write code, the files containing the code are just text files. When I write blog posts, stories, or articles, the files containing them are, at their heart, text files. This means that at the most basic level, all of the things I make share the same format. The text within the files may include markup, like HTML or RTF, but they are all readable with simple text editors.

I have frequently used Git to track the things I make at the day job, but for a long time, my Git checkins were limited to just code. Recently, I have started to use Git to track other things I make, like requirements documents and specifications. In my writing, I used Google Docs for a long time. When my 825-consecutive-day writing streak came to an end, I decided I wanted a change, and moved back to Scrivener. At its heart, the text I write in Scrivener is stored as RTF files. I have been using GitHub to track my work on these things.

I like GitHub because it provides a single place to track everything I make. Like most revision control systems, GitHub captures just the changes between checkins, which is efficient. It also makes it easy to see how the things I write evolve over time. I can look at an initial draft of a blog post and easily compare it to the final draft by running a command. I can see any intermediate steps. I can see what text I decided to delete, or what words I decided to change or add at the last minute.
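As a rough illustration of that kind of comparison, the sketch below diffs the first committed version of a file against the latest committed one. The function name `compare_drafts` and the example file name are hypothetical. It assumes the command is run from inside the repository and that the file was never renamed.

```shell
# Hypothetical helper: show word-level changes between the first committed
# version of a file and its current committed version (HEAD).
# Assumes the current directory is inside the Git repository.
compare_drafts() {
  local file="$1" first
  # Oldest commit that added the file (git log lists newest first).
  first=$(git log --diff-filter=A --format=%H -- "$file" | tail -n 1)
  [ -n "$first" ] || { echo "no history for $file" >&2; return 1; }
  # --word-diff marks removals as [-old-] and additions as {+new+}.
  git diff --word-diff "$first" HEAD -- "$file"
}

# Example call (the file name is illustrative only):
#   compare_drafts github-post.rtf
```

Swapping HEAD for any other commit hash would show an intermediate step instead of the final draft.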

Using GitHub with Scrivener

Experimenting, I decided to see how well GitHub would work with Scrivener. At its core, the main writing files within a Scrivener package (a .scriv file) are rich-text format files. I created a Scrivener project for my blog. Each post I write is a new file within the Scrivener project.

When I complete a draft, I check it in to GitHub. I am a command-line person and so I use Git commands to perform the check-in, but there are GUI tools available for this as well. For instance, for this post:
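A check-in of this kind could look something like the sketch below. The helper name `checkin_draft`, the project path, the commit message, and the remote and branch names are all illustrative assumptions, not the exact commands used for this post.

```shell
# Hypothetical helper: stage every change in a project directory and commit
# it with a short message. Assumes the directory is already a Git repository.
checkin_draft() {
  local project_dir="$1" message="$2"
  git -C "$project_dir" add --all &&
    git -C "$project_dir" commit --quiet -m "$message"
}

# Example (path, message, remote, and branch are assumptions):
#   checkin_draft ~/Documents/Blog.scriv "First draft of GitHub tracking post"
#   git -C ~/Documents/Blog.scriv push origin master
```

The push is left out of the helper so that a commit can still be made while offline and sent to GitHub later.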

The short comment lets me know what I was working on. If I want to see what I worked on in a particular day, I can look at the Git log for the project. If I want to see what I’d done since yesterday, for instance:
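One way to ask Git for that is the `--since` option of `git log`. The following is a minimal sketch; the function name `commits_since` and the example path are hypothetical.

```shell
# Hypothetical helper: list commits made since a given point in time.
# $1 = repository directory
# $2 = any date expression git's --since accepts (defaults to "yesterday")
commits_since() {
  local repo_dir="$1" since="${2:-yesterday}"
  # Print one line per commit: short date followed by the commit message.
  git -C "$repo_dir" log --since="$since" --date=short --format='%ad %s'
}

# Example (the path is an assumption):
#   commits_since ~/Documents/Blog.scriv
#   commits_since ~/Documents/Blog.scriv "1 week ago"
```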

By default, all GitHub repositories are public. In order to keep these things private until I am ready to publish them, I have a paid GitHub account which gives me access to private repositories. Private repositories are not necessary for everyone, but as someone who writes fiction and nonfiction for other markets, it is important that my work has not been previously published elsewhere. That includes previously published on the web. Private repos ensure that my writing is not published until I am ready to publish it.

All of my “work products” are now captured the same way, and that makes it easy to track what I make. And because all I really care about are the text files that Scrivener produces, I have created a .gitignore file that ignores other types of files in a Scrivener project. You can find a gist of my .gitignore file for Scrivener here.

One nice thing about my Google Docs Writing Tracker was that it automated the process of tracking my daily word counts, streaks and other statistics about my writing. I am working on a similar tool that does for Scrivener what my Google Docs Writing Tracker does for Google Docs—except it uses my GitHub repos as the source of data. I’m still in the early stages of this, but I’ll post more when I have something that works consistently.

I am hoping that by the end of 2016, I’ll have an automated report that I can point people to that will summarize everything I made in 2016, along with all of the interesting stats. I can do this because everything related to my writing now goes into GitHub.

12 responses to “Tracking the Things I Make with GitHub”

Ben Wilson

December 23, 2015

Bitbucket (yet-another-git-repository) has private accounts for free (under 5 concurrent users). That said, you’ll lose Travis CI support in BB, and the interface doesn’t feel as friendly.

Rampant Badger

December 24, 2015

I, too, use Bitbucket for the free private repos. But I was writing because I like that command prompt. I looked in your dotfiles repo, but it is empty. Can you share that prompt?

Jamie Todd Rubin

December 24, 2015

I just pushed my .bash_profile out to the dotfiles repo. The prompt is the line that begins “PS1=”.

Ben Wilson

December 25, 2015

Jamie, this post really irks me. I thought I couldn’t manipulate Scrivener the way I needed. You challenged me to think otherwise. With the information below, and realizing I can manipulate the raw RTF, I have very few reasons to revert to my cobbled toolchain.

You mentioned wanting to pull statistics out of Scrivener. Check out the XML file in the project folder (project.scriv/project.scrivx) and search for:

When you look at your git history, you’ll see this is updated daily but there does not appear to be a long-term history…just the last session. I notice that this record is updated at every auto-save (for me, every 5 seconds). In the Project Targets Panel (⇧⌘-T), you can have this reset at midnight. Then, all you would need is a cron job that runs a few minutes before midnight (like https://gist.github.com/Merovex/38a5d12dcf043be97c9e).

Then, you can parse to your heart’s content in your preferred way, even google JS.

I tested, and it looks like the session setting remains and increments between closing Scrivener. You can set to reset midnightly or “next day opened,” either of which should preserve the Session count until you’ve harvested the data.

There is also a Draft Target that tracks total progress on the draft, which could also lend itself to tracking.

Jamie Todd Rubin

December 26, 2015

Ben, see my reply to Michael above. Most of what I want to do can be achieved with simple git commands and a few UNIX commands to parse the results. No need to read the XML files in Scrivener to get this to work.

Michael Cummings

December 26, 2015

FWIW – as a stopgap, I still have the scripts I used to put Scrivener word counts into Google so that your original Google wordcount scripts could work with them (mostly just making sure to set up text exports and a few checkboxes). It needs a few tweaks (renaming the files along the way to avoid namespace collisions, etc.), but mostly it can slide into place without too much effort. I realize there’s a joy in making it work a new way via git, only offering in case you need to count words in the interim 🙂

I of course went off the rails, switched back to Linux when my Mac couldn’t cope any more, and drank the Markdown Kool-Aid 🙂

Jamie Todd Rubin

December 26, 2015

I am taking a somewhat simpler approach. By running a git diff on the last commit of each day, I can easily parse the output for the words I added and removed each day across multiple projects. For instance, I just ran a simple test in Scrivener where I added a file with a single line of text:

This is a test.

I committed this change to GitHub, and then changed the file to look as follows:

This was a test. Here is a new line.

I committed the change, then ran a git diff command against the two commits. Here is the output of that command:

From this output, it would be trivial to parse the word count differences each day. A simple example of how many words were added in the diff might look like this:

git diff 0d71bb 3ae023 | grep '^+' | grep -v '^+++' | wc -w

This gives me 6 words, because there is some RTF in the file. If you parsed out the RTF from the file first, you’d get the raw counts. The beauty of this approach is that it works across all projects and any date range you choose. You just have to be able to identify the first and last dates of the range.
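The same idea can be extended to report additions and removals together. The sketch below is one possible way to do it, not the tool described in the post. The function name `diff_word_counts` is hypothetical, and it reads a unified diff from standard input. It does not strip RTF markup, so counts on raw Scrivener files will still be inflated.

```shell
# Hypothetical helper: count words on added (+) and removed (-) lines of a
# unified diff read from standard input.
diff_word_counts() {
  awk '
    # Skip the file-header lines of the diff.
    /^\+\+\+ / || /^--- / { next }
    # Added line: drop the leading "+" and add the number of fields (words).
    /^\+/ { sub(/^\+/, ""); added += NF; next }
    # Removed line: drop the leading "-" and add the number of fields.
    /^-/  { sub(/^-/, "");  removed += NF; next }
    END { printf "added=%d removed=%d\n", added, removed }
  '
}

# Example, using the two commits from the comment above:
#   git diff 0d71bb 3ae023 | diff_word_counts
```

One known limitation of this sketch: a removed line whose own text begins with "-- " would be mistaken for a file header and skipped.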

1. Michael Cummings
  
  December 26, 2015
  
  Ah – I went for the lame synch to external folder option (synching as text), then just pointing my local scripts at the raw text files. Are you using a local git repo or private hosting on github? (Just curious – I know github has some nice features you can’t do locally, vs having it all on-hand).
  
  1. Jamie Todd Rubin
    
    December 26, 2015
    
    I’m using a private repo in GitHub for my writing since that prevents it from being “published” and preserves “first rights” when I go to sell pieces.
    
Ben Wilson

December 26, 2015

Jamie, the gist I provide is one line of Unix grep that gives you exactly what you really want, which is a daily word count.

My example greps out the one line in the XML that Scrivener uses to store session progress, and populates a logfile. By Scrivener configuration, that value can reset at midnight. By using the cron example that runs right before midnight, you’re always catching daily progress…even when you’re sick and forget to commit. (Another grep could pull out the cumulative progress, but that’s outside scope.)

My example assumes that Scrivener has calculated the word count without RTF artifacts, implicitly trusting the tool. It also recognizes that there are non-manuscript RTF files that you would not want to count and that Scrivener automatically excludes from the calculation (e.g., the Research section), and that you can use Scrivener to fine-tune which other files are included in the word count calculation (by including the compile list only). Thus, you use Scrivener instead of a script to manage what is counted.

In your example, you’re looking at all RTF files, which could include copy/paste research, frontmatter, and other false-positives. And without parsing the same XML to get the manifest, you won’t know which files to include/exclude.

Come to the dark side and try my gist.

The Scrivener developer(s) have already done 95 percent of what you’re going to code in Unix. Don’t make it harder than it needs to be. 🙂

This is coming from a guy who wrote an entire toolchain in Ruby to publish LaTeX to PDF/ePUB/MOBI. One of my justifications was “better able to manage wordcount.” Your article challenged me to find that I was wrong.

Jamie Todd Rubin

December 26, 2015

You’ve convinced me! I will take a look and give it a try.

1. Ben Wilson
  
  December 26, 2015
  
  http://cdn.meme.am/instances/42724613.jpg
  
