
Winter Cleaning

black classic car inside the garage
Photo by Mike B on Pexels.com

While on winter break I decided to tackle some winter cleaning that I’ve put off for years. I decided to clean up my files and data and organize them into something more useful. This was part of the personal automation effort that I mentioned in my goals for 2023.

I have files that go back to my college days. Looking at my data, I see that the oldest document file I have in my archives goes back to March 10, 1993, when I was a junior in college. The data that I have spans just about 30 years. And it was something of a mess.

In order to avoid organizing things, I created “temp” folders to store old stuff. Within those temp folders, I also had “bak” folders with still older stuff. Sometimes, I had multiple copies of these things spread across various parts of my filesystem. Over the break, I decided it was finally time to clean this all up.

A profusion of storage options

To clean things up, I first needed to figure out where to put things. Over the years, I’ve built up a profusion of storage options. Locally, I’ve got a laptop, a Mac Mini and my iPhone. The Mac Mini hosts 2 external disks with 6 TB of storage capacity.

I have an iCloud account with 2 TB of storage. I have a Dropbox subscription giving me an additional 2 TB of storage. I have an Office 365 account for the family which gives me somewhere around 5 TB of storage. And I have a Google account with Google Drive and some amount of storage there. And I had files scattered about all of these data sources.

Simplifying my file system

I decided to start by simplifying my file system. I’ve had an interesting arc over the years. For a long time, I was a strong proponent of cloud storage for the obvious benefits of accessibility. But time and experience have taught me that local access with sufficient backup is most desirable for me, as you never know when a service might go away or be priced out of reach. Also, it’s nice to have everything centralized in one place so that I don’t have to remember which data I store where.

I decided, therefore, that the primary source for my data would be local and that I’d use cloud services as a mechanism for making the data accessible between systems, but not as primary storage.

I decided that there are really three types of data that I work with on a regular basis:

  1. Working documents (source code, spreadsheets, etc.)
  2. Notes (Obsidian, all of my writing, daily notes, diaries, etc.)
  3. Archive (all of my data that is not “active” in the sense that I don’t work with it regularly, but is of great historical value to me)

Working documents

I decided that my working documents would be stored on my local machine, in my Documents folder, and that folder is part of my iCloud Drive, so that whatever is stored there is synchronized with other devices that I use.

Given that one of my goals is to see if I can pare down the tools I use to the smallest possible set [1], I found that I was able to consolidate my Documents folder down to just three top-level folders:

Folder in my working documents

The Repos folder contains code repositories for projects I am actively working on–the key word being “actively.”

The Settings folder is where I store various configuration settings, templates, custom fonts I use, and various branding artifacts like profile photos, etc.

Inside my Settings folder

Finally, the Wolfram Mathematica folder contains Wolfram Language notebooks I am actively working on. This folder may go away, however. I found that if I enable the “Detect all file extensions” option in Obsidian, I can store my notebooks there, link to the notebooks from other notes, and open the notebooks directly from the links, which is far more useful to me.

For working documents, this is pretty much all I have.

Notes and writing

All of my notes and writing are stored in Obsidian. These days, this is where the vast majority of my daily output goes. I use the Obsidian Sync service to sync my notes between devices. I’ve found the service to be virtually flawless, fast, and extremely reliable. I’ve currently got about 4,000 notes in my Obsidian vault and Obsidian Sync has been perfect in keeping my devices up-to-date. It works so well that I basically forget it is even running.

I don’t keep my Obsidian vaults on iCloud. There have been issues with vault synchronization when vaults are stored in cloud services like iCloud or Dropbox, and it can occasionally lead to some odd behavior when those services sync files. Instead, I have a separate Vaults folder on my local machine that is not part of a cloud sync service–except for Obsidian Sync–and all of my vaults are stored within that folder.

Archive


As I mentioned, I have files going back 30 years. The most time-consuming part of my winter cleaning was cleaning and organizing that archive, eliminating duplicates, moving things into the archive that belong there and getting rid of things that don’t.
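
For readers who want to script the duplicate-elimination step, here is a minimal sketch in Python. It is an illustration only, not the process actually used for this cleanup. The function names `file_digest` and `find_duplicates` are hypothetical, and the approach (bucket files by size, then compare SHA-256 hashes of the contents) is one common technique among several.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root that have byte-identical contents."""
    # Files of different sizes cannot be identical, so bucket by size first
    # and only hash files that share a size with at least one other file.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[file_digest(path)].append(path)

    # Keep only groups with two or more files: those are the duplicates.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates(Path("/path/to/archive"))` returns groups of identical files and deletes nothing, so each group can be reviewed by hand before anything is removed. The path shown is a placeholder.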

There are really two parts to my archive: Data and Installers. The data portion of my archive is about 15 GB and contains all of the files I’ve worked on over the last 30 years or so. The Installers portion is much newer than that. Generally, I try to keep copies of older versions of software installers whenever I get a new version, so that if I ever run into compatibility issues, I can go back to an older version of a piece of software. This makes up about 8 GB of my archive, so that the total archive currently stands at about 24 GB.

It is important to mention what is not in the archive. Photos and videos I store in Apple Photos, which is part of iCloud. I don’t have the time or patience to go through the 30,000 photos and videos I’ve accumulated over the years and pare them down, so I just leave them alone.

I debated where to store my archive. Should I include it in iCloud so that it is accessible from all my devices? Or should I keep it in one location? After considerable thought, I decided on the latter. My archive is stored on an external 3 TB drive connected to my Mac Mini. This drive, along with my Mac Mini, and indeed, all of our household computers, is backed up using CrashPlan, so that if something were to happen to the drive, I wouldn’t lose any data. (There are additional redundancies in place for the archive, but that is a topic for a separate post.) Given that I don’t access the archive regularly, it didn’t seem necessary to keep it in an active cloud service.

Next, I had to figure out how to organize the archive. Over the years, I’ve played around with all kinds of organizational structures, including, most recently, PARA, or even no organization and relying on search functionality to find what I am looking for. But I decided to go old-school and use a more traditional, hierarchical structure to organize the archive instead. The reason is that more and more, I think about how my family would access this data if I wasn’t around. Structures like PARA or arbitrary searches might not work for them. A more obvious hierarchy of topics would be more useful.

Ultimately, I ended up with the following structure:

Structure of my archive

I tend to sort things from most recently to least recently modified, which explains the order in which the folders appear. The “Photos” folder is not my Apple photos, but rather curated photos that I’ve specifically moved into the archive. Many of these folders contain sub-folders. Here, for instance, is what it looks like inside my Writing folder:

Inside my Writing folder in my archive

The archive contains all of my personal and work email. I tend to perform these archival functions annually and then zip up the resulting email archives for storage. The earliest email message I have in my archive dates to October 17, 1994.
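
An annual zip-up like this could be scripted along the following lines. This is a hedged sketch, not the actual workflow: the function name `zip_year`, the one-subfolder-per-year layout, and the `email-<year>.zip` naming are all assumptions made for illustration. It relies only on `shutil.make_archive` from Python’s standard library.

```python
import shutil
from pathlib import Path


def zip_year(email_dir: Path, year: int, dest_dir: Path) -> Path:
    """Zip the folder email_dir/<year> into dest_dir/email-<year>.zip."""
    # Assumed layout: one subfolder per year, e.g. Email/1994.
    source = email_dir / str(year)
    if not source.is_dir():
        raise FileNotFoundError(f"No email folder for {year}: {source}")

    # dest_dir should live outside source so the zip does not include itself.
    dest_dir.mkdir(parents=True, exist_ok=True)

    # make_archive appends ".zip" to the base name and returns the full path.
    archive = shutil.make_archive(
        str(dest_dir / f"email-{year}"), "zip", root_dir=source
    )
    return Path(archive)
```

The original folder is left in place; whether to delete it after verifying the zip is a separate decision.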

The archive also contains big social media archives. For instance, when I stopped using Evernote in favor of Obsidian, I did an export of all of my Evernote data to an archive. Similarly, when I stopped using Facebook, I archived all of my Facebook data. And when things were looking iffy with Twitter, I grabbed an archive of my Twitter data as well. These are all within the Cloud Services section of my archive.

I rely heavily on the file meta-data for finding what I am looking for, particularly filename, modification and creation dates. It is helpful that I’ve kept the original files in many cases because it makes it easier to search the archive in a given timeframe. As I mentioned, some of my files go back as far as 30 years. Here is an example of a few of my files from 30 years ago:

Some 30-year-old files in my archive
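
Searching by timeframe can also be done with a few lines of Python, for anyone who prefers a script to a file browser. The sketch below is illustrative only; the function name `modified_between` is hypothetical, and it looks only at modification times, since creation times are not exposed the same way on every operating system.

```python
from datetime import datetime
from pathlib import Path


def modified_between(root: Path, start: datetime, end: datetime) -> list[Path]:
    """Return files under root last modified in [start, end), oldest first."""
    matches = []
    for path in root.rglob("*"):
        if path.is_file():
            # st_mtime is seconds since the epoch; convert to local time.
            mtime = datetime.fromtimestamp(path.stat().st_mtime)
            if start <= mtime < end:
                matches.append((mtime, path))
    return [path for _, path in sorted(matches)]
```

For example, `modified_between(Path("/path/to/archive"), datetime(1993, 1, 1), datetime(1994, 1, 1))` would list everything last touched in 1993, where the path is a placeholder. This only works if the original modification dates have been preserved, which is one reason to keep original files rather than re-saving them.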

Cleaning up cloud services

In centralizing my files locally, I also took the opportunity to clean up what I had on the various cloud services I use. I had a lot of random stuff on Dropbox that I moved to my archive because I rarely access it these days. Instead, I now use Dropbox as a convenient way to share files, and for some application settings where the application prefers Dropbox over other services for its settings. Also, Dropbox is where my writer’s group stores its stuff, so it is convenient to keep it around. But what I have there is mostly ephemeral now.

We have a family OneDrive from Office 365 and there are a few files I’ve stored there for convenience, but I rarely use Office tools these days outside of work. I moved many of the writing-related documents I had in OneDrive to my archive. What is left there are a few things that are shared between family members.

Still to-do

For several years, between 2013 and 2016, I used Google Drive almost exclusively for my writing. This is one place where I have yet to tackle cleanup. It is a mess and I imagine it will be a challenge to get it all cleaned up. It should be made easier by the fact that much of what I wrote there should already be in my archive. But it will take time, and I may end up putting off this task until next winter. One indicator of whether I need something is how frequently I access it, and I haven’t needed to access my files in Google Drive in a long time.


I’ve had a robust strategy for backing up my data, and indeed, all of my family’s data for a long time now. But as it is somewhat off-topic for this post, I’ll save the details of the backup strategy for later.

A feeling of relief

It is amazing what a winter cleaning like this does for the soul. When I completed it, when I had everything set up the way I wanted it, a feeling of relief washed over me. It was a similar feeling to looking over a freshly mown lawn, or a recently cleaned desk surface. Everything was in its place, and everything had a place to go. It’s nice to know that when I create something, there is a clear and obvious place to put it.

It was also a relief to know that I’d finally organized my archive and eliminated all of the duplicate files there.

I am now working on a README file with a target audience of my family that should make it quick and easy for them to find something in the archive in my absence.

Written on 15 January 2023.


Follow Jamie Todd Rubin on WordPress.com

  1. I mentioned two tools in my Goals post: Obsidian and Mathematica/Wolfram Language.

Automation and the Power of Process Improvement

Three recent experiences remind me that automation for the sake of automation doesn’t really do much. But if automation can be used to improve processes and eliminate repetition and redundancy, then it is well worth investing the time to improve the automation. It is a personal pet peeve of mine whenever I have to supply the same piece of information more than once in a given transaction, especially when that data is available somewhere else. Here are two small failures, and one small success to illustrate where we stand with automation and process improvement in day-to-day tasks.

“Would you like to apply for a Target Red Card?”

The complex we live in borders a Target and Safeway shopping center. This has been very convenient. We can walk to the store. It means we probably go more frequently than we need to. And because Target has pretty much everything, we go there quite a bit. Eventually, we decided to get one of their Red Card credit cards because we save 5% on every purchase. Once we had the card, I set it up to pay the bill in full each month. There is no point in saving 5% on purchases if you are paying 19% interest. So we get a nice benefit on every purchase we make from Target. So far, so good.

Recently, however, it seems that Target is really pushing the use of the Red Card to the exclusion of all logic. For instance, on several occasions, I’d put my items on the conveyor, slide my Red Card and wait for the total.

“Would you like to apply for a Target Red Card?” the cashier asked.

I blinked. “Well,” I said, holding up my Target Red Card, “I just paid with mine. Do I need another?”

The cashier laughed and we each went about our day.

But it happened again on the next visit with a different cashier. And then again. And again.

Finally, I said to the cashier, “You guys have been asking this quite a bit. I pay with my Red Card every time I come here and you always ask me if I want to apply for a Red Card. Isn’t there something on your screen that tells you that I have paid with a Red Card?”

The cashier said, “We are told to ask everyone, even if you already have a Red Card.”

“What sense does that make? If I say, ‘yes’ and apply for a second Red Card what would happen?”

The cashier just shrugged.

I don’t mind being asked this once or twice, but every single time I come to Target, and when I have already swiped my Red Card? That seems like a major breakdown, not just in a logical process, but in customer service.

“Can you fill out these forms?”

I took the Little Man to the doctor the other day. There is a nice touch-screen system to check in, and pay your co-pay, if one is due. It’s a nice piece of automation. But it failed this time. The system told me to see the desk. So I saw the receptionist and he told me that since it was the Little Man’s first visit this year, I had to fill out some forms. He handed me three forms.

The forms were all standard information. Parent names, addresses, phone numbers, insurance company information. I filled them out at a slow burn because I knew what would happen. I’d turn the papers back to the receptionist and he would key in my responses to the central system. So not only was I entering the information, but he was entering the information. He had to parse my handwriting, increasing the chance that some of the data would be entered incorrectly.
