`ark`: A Personal Archive System, Part 2: A Day with the Archives

This is the second in a new series describing ark, a command-line-based personal archive system I have developed and am using as my primary archiving tool. If you are not interested in this type of tech post, feel free to skip it. For those who are interested, I plan to provide a new entry in the series each Tuesday for the next 13 weeks.

“Pivot! Pivot! PIVOT!” — Ross Geller

I had planned to write about ark’s store and how it manages the items in the archive today. But it occurred to me that in writing about ark last week, I didn’t really show any of the system. Instead, I told a lot about the system. And as they say in the writing world: show, don’t tell. So this week I’ve opted to show what ark looks like so you get a tour before seeing the plumbing. Part 1 made the case for the archivist’s lens. Part 2 shows what it looks like. I’ve cast this as seven “scenes” through a day: not necessarily a literal day, but seven realistic moments, stitched into one. The post looks long only because of the archive output for the examples I’ve provided.

Scene 1 – Morning: Catching Up

ark has a wide range of items throughout my life. I need an easy way to see what’s there for a given day, and this is where the ark day command comes in handy. Here is what it looks like for the day before I began writing this post:

			
ark day yesterday
# Saturday, May 30, 2026  ·  yesterday
📍 Arlington, VA  ·  🌤 Partly cloudy  73°/63°F
💪 7,405 steps  6h 37m sleep  72 bpm  HRV 38 ms
   SpO2 95.4%  VO2max 35.3
## Calendar (2 events)
[1]  dare to dream  *Family*
[2]  zach soccer  *Family*
## Diary (1 entry)
[3] Evening Notes
    A beautiful day today. From the time I woke up and headed out for my walk, through dinner on the deck with Kelly and the…
## Reading (3)
[4] *The Complete Essays* by Michel de Montaigne  (started today)
[5] *In Memory Yet Green* by Isaac Asimov  (finished today ✓)  ✎4
[6] *Proto* by Laura Spinney  (started 2 days ago)
## Reminders (1 open)
[7] Check into my flight  *My Tasks*  ⚠ overdue 1 day
## Email (37 messages — none from known contacts, 30 filtered)
… 37 other messages  (use --all-emails to see all)
## People
[8] Kelly R.  (in 3 documents today)
[9] Eric S.  (in 2 documents today)
[10] Lisa K.  (in 1 document today)
[11] Zachary R.  (in 1 document today)
## Messages (7 messages)
[12] Kelly R. —   (+2 messages)
[13] Eric S. — Yeah but mine's not on your...  (+1 message, group)
[14] Zachary R. — https://www.youtube.com/watch?v=0h2jZtuRuic
[15] Lisa K. — Maybe he’s role playing  (group)

		

A few things to note here. First, ark is pulling information from a variety of sources in the archive. The header information contains the date (and how long it has been from that date to the time the command was run). It has my location, which ark captures either explicitly through artifacts in the archive, or through an inferred default location over time. The health information (steps, sleep, etc.) comes from Apple Health.

It has seven sections that follow that try to provide a high level picture of the day: what was on the calendar and diary entries lead the way. The numbers ([1]) are sigils that can be used with other ark commands in place of longer document IDs. So I can type ark show @3 to see the full diary entry.
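To make the sigil idea concrete, here is a minimal sketch in Python of how a sigil table might work. This is not ark's actual code: the `SigilTable` class and its methods are hypothetical names, and the document IDs are only illustrative. The assumption is that each new listing replaces the previous mapping, so `@3` always means item [3] of the most recent output.

```python
import re


class SigilTable:
    """Maps the [N] numbers from the most recent listing to document IDs."""

    def __init__(self):
        self._ids = {}

    def load(self, doc_ids):
        # A new listing replaces the old mapping, so sigils only
        # ever refer to the most recent output.
        self._ids = {n: doc_id for n, doc_id in enumerate(doc_ids, start=1)}

    def resolve(self, ref):
        # "@3" is a sigil; anything else is treated as a full document ID.
        match = re.fullmatch(r"@(\d+)", ref)
        if match is None:
            return int(ref)
        n = int(match.group(1))
        if n not in self._ids:
            raise KeyError(f"no item [{n}] in the last listing")
        return self._ids[n]


table = SigilTable()
table.load([557281, 151377, 152738])  # illustrative IDs, in listing order
print(table.resolve("@3"))       # 152738
print(table.resolve("725665"))   # 725665 (a full ID passes straight through)
```

The design point is that a command accepting a document reference can take either form, so short sigils and full IDs are interchangeable.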

Reading is important to me so I can see what I was reading on that day, including what I started reading and what I finished. Any reminders from Apple Reminders show up for that day. Also email, but note that the email is filtered in this view to show only mail from known contacts, and despite getting 37 messages, none of them were from my known contacts that I’d want appearing here.

There is a list of people I interacted with on that day, or who appear in documents that were ingested on that day, followed by a list of text messages (or message groups) that took place on that day.

The command provides an easy way to get a view of any day in the archive. I could also have ark summarize the day for me via ark day 2026-05-30 --summarize. Like most ark commands, there are plenty of options for filtering and controlling the output. Here is what ark day --help looks like:

			
usage: ark day [-h] [--on-this-day] [--deep] [--all-emails] [--all-calendars] [--verbose] [--full]
               [--format {table,json}] [--bundle] [--summarize] [--regenerate]
               [DATE]
Compact daily digest — every item gets a [N] sigil for follow-on commands.
positional arguments:
  DATE                  Date to view (YYYY-MM-DD, 'today', 'yesterday'). Default: today
options:
  -h, --help            show this help message and exit
  --on-this-day         Include documents from the same date in other years
  --deep                Include low-priority items (music plays, browser visits)
  --all-emails          Show all emails, ignoring contact and blocklist filters
  --all-calendars       Show events from all calendars, ignoring the day.calendar.include filter
  --verbose             Full narrative output (old default behavior)
  --full                Show full document text instead of excerpts (--verbose mode)
  --format {table,json}
                        Output format: table (default) or json
  --bundle              Output as ark bundle format (pipe to ark task)
  --summarize           Display or generate a narrative summary of the day via LLM
  --regenerate          Force regeneration of the day summary (implies --summarize)
Subcommands:
  ark day rank [DATE] <-2..+2> [--note NOTE]   # set the Jim-Collins-style day-rank
  ark day rank [DATE] --clear                  # remove a rank
  ark day rank [DATE]                          # show the rank for a date
  ark day ranks [--since 30d|--year YYYY]      # chart + summary of recorded ranks
See 'ark day rank --help' or 'ark day ranks --help' for details.

		

Scene 2 – Triage: What Wants Attention

With a critical mass in the archive, and new items always coming in, it was a logical next step to make use of all of that information. After reviewing the day, I might make use of ark’s “stewardship” layer. ark reviews items in the archive, as well as items coming into the archive, and identifies things that require my attention or action. From the CLI I can get a digest of this as follows:

			
ark prepare digest
# Stewardship digest — 2026-05-31
_act today: 0 · aging: 2 · outbox: 0 · stalled: 0 · review: 1_
**Yesterday (2026-05-30) unranked.**  Rate it: `ark day rank yesterday <-2..+2>` (or skip — auto-clears once ranked).
## Aging into Q1
- **[orange] Pay Arlington County utilities bill due 06/16/26** — Arlington County Utilities Services Office
  age *10d* · Q2 · *reminder*  (`ark prepare show 722866`)
- **[orange] Complete patient experience survey for my doctor's office** — (medical provider)
  age *8d* · Q4 · *reminder*  (`ark prepare show 724994`)
## Review (low-confidence classifications)
- **Prepare for upcoming trip to Boston per American Airlines reminder** — American Airlines
  conf *0.70* · *reminder*  (`ark prepare show 727451`)

		

Or in the TUI:

[Screenshot: the same digest in the TUI view]

Here, I can see at a glance things that require my attention and take action on them¹. ark uses the Eisenhower matrix to put actions into quadrants around (1) Do First, (2) Schedule, (3) Delegate, and (4) Delete. That is what those Q1, Q2, etc., represent.
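For readers who have not met the Eisenhower matrix, it boils down to two yes/no questions: is the item urgent, and is it important? The sketch below is in Python and takes both answers as inputs. How ark actually decides urgency and importance is not covered here, so the `quadrant` function is a hypothetical illustration of the matrix itself, not of ark's classifier.

```python
def quadrant(urgent: bool, important: bool) -> str:
    """Classic Eisenhower matrix: two yes/no answers give four quadrants."""
    if important:
        return "Q1 Do First" if urgent else "Q2 Schedule"
    return "Q3 Delegate" if urgent else "Q4 Delete"


print(quadrant(urgent=True, important=True))    # Q1 Do First
print(quadrant(urgent=False, important=True))   # Q2 Schedule
print(quadrant(urgent=True, important=False))   # Q3 Delegate
print(quadrant(urgent=False, important=False))  # Q4 Delete
```

Read this way, an item sitting in Q2 is important but not yet urgent, which is why it makes sense for the digest to track how items age.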

ark knows how to dedupe these items as they come into the archive so that if I get 3 reminder emails about the same thing, the thing only shows up once in the digest (even though all 3 emails are stored in the archive).
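One simple way to picture this kind of deduplication is to group incoming items by a normalized key. The Python sketch below is an assumption about the general technique, not a description of how ark does it: the `action_key` and `dedupe` functions, the email fields, and the sample data are all made up for illustration.

```python
import re
from collections import defaultdict


def action_key(sender: str, action: str) -> tuple:
    # Normalize case, punctuation, and whitespace so that
    # near-identical reminders produce the same key.
    norm = re.sub(r"[^a-z0-9]+", " ", action.lower()).strip()
    return (sender.lower().strip(), norm)


def dedupe(emails):
    """Group emails by action. Every email is kept, but each action appears once."""
    groups = defaultdict(list)
    for email in emails:
        groups[action_key(email["sender"], email["action"])].append(email["id"])
    return groups


# Hypothetical sample data: three reminder emails about the same thing.
emails = [
    {"id": 1, "sender": "American Airlines", "action": "Check in for your flight"},
    {"id": 2, "sender": "American Airlines", "action": "Check in for your flight!"},
    {"id": 3, "sender": "american airlines", "action": "CHECK IN for your flight"},
]
digest = dedupe(emails)
print(len(digest))                   # 1 action in the digest
print(list(digest.values())[0])      # [1, 2, 3] all three emails still stored
```

A real system would likely need fuzzier matching than string normalization, but the shape is the same: many stored items, one digest entry.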

Working through the day, ark prepare digest is a nice follow-up to looking at the day or the previous day to get the context of what happened and what needs to happen.

Scene 3 – Looking Back: Finding What’s In the Archive

Let’s face it, a big reason for having an archive is so that you can search it. ark has sophisticated search capabilities that allow for a rich search experience. First, ark can do full-text searches of anything in the archive, whether text, PDFs, Office documents, email, text messages, annotations, even photos. That’s because when something enters the archive, it goes through a process that “enriches” what comes in, making it searchable.

Let’s start with a regular full-text search. Here, I am going to search for items in the archive with the words “tax 2025”. The other parameters in the command tell it to limit the results to 10 items, to leave out snippets of the items, and to show how long it took the search to run:

			
ark search "tax 2025" --limit 10 --no-snippet --timing
Searching: 'tax 2025'
[1]   557281  imessage  2025-12-19  iMessage to High School Gang — 2025-12-19: He can now claim his tax-free…
[2]   151377  pdf       2025-08-09  Holiday Inn Express Neptune Receipt — 2025-08-07
[3]   152738  pdf       2025-01-07  Holiday Inn Express Florence Hotel Receipt — 2025-01-06
[4]   413583  pdf       2026-03-01  2025 St. Ann Faith Formation Donation Tax Statement
[5]   149508  pdf       2026-01-02  Holiday Inn Express and Suites Boynton Beach West — 2025-12-31
[6]   151497  pdf       2025-08-02  Holiday Inn Express and Suites Dickson City-Scranton Receipt — 2025-08-01
[7]   165270  email     2026-03-01  2025 Tax documents have been uploaded!
[8]   165269  email     2026-03-01  Re: 2025 Tax documents have been uploaded!
[9]   413659  pdf       2026-03-25  2025 Tax Summary
[10]  146864  email     2026-03-19  Prepare for tax time - access your 2025 prescription spending report
# 10 results
# query: 295ms

		

Not a bad mix of items, but probably not quite what I was looking for, either. The search ran in about 0.3 seconds, but the results were meh. I was really looking for tax-related documents, not documents with the words “tax” and “2025” in them.

This is where a semantic search comes in. Semantic searches allow searching for meaning.

			
ark search "documents I'll need at year's end" --semantic --from 2026-01-01 --timing --limit 10
Semantic search: "documents I'll need at year's end"
[1]   148828  email           2026-01-23  Your 2025 tax documents are ready
[2]   413602  pdf             2026-03-01  2025 Tax Organizer - Completed
[3]   165270  email           2026-03-01  2025 Tax documents have been uploaded!
[4]   708408  pdf             2026-05-12  Brokerage Tax Package 10 — 2024 Tax Year
[5]   145936  email           2026-01-08  Important information about the year-end tax statement for your ho…
[6]   165269  email           2026-03-01  Re: 2025 Tax documents have been uploaded!
[7]   165261  email           2026-03-26  Re: 2025 Tax documents have been uploaded!
[8]   614083  calendar_event  2026-02-11  discuss CDMS remaining use cases
[9]   413598  pdf             2026-02-04  Form 1099-G — 2024 Tax Year
[10]  413659  pdf             2026-03-25  2025 Tax Summary
# 10 results
# query: 1052ms

		

In this case, I searched for “documents I’ll need at year’s end”, added a filter starting from the beginning of the year, and added the --semantic parameter to tell ark to perform a semantic search. The results are much closer to what I was looking for: compared with the simple full-text search, nearly all of them are tax documents or tax-adjacent documents. The trade-off is (a) more time during ingestion to give each item embeddings² that can be used by semantic search, and (b) a slightly longer time to execute the query (because it uses AI): just over 1 second compared to about 0.3 seconds. But even against an archive with roughly 700,000 items, it is a negligible change in speed, and the improvement in search quality far outweighs it.
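If embeddings are new to you, the core idea is that each item and each query becomes a list of numbers, and items whose numbers point in a similar direction are treated as similar in meaning. Here is a minimal Python sketch of that ranking step using cosine similarity. The three-dimensional vectors are toy values invented for illustration (real embeddings have hundreds or thousands of dimensions), and `semantic_search` is a hypothetical helper, not ark's implementation.

```python
import math


def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, index, limit=10):
    # Score every item against the query and return the best titles first.
    scored = [(cosine(query_vec, vec), title) for title, vec in index.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:limit]]


# Toy vectors, made up for illustration only.
index = {
    "2025 Tax Summary": [0.9, 0.1, 0.0],
    "Holiday Inn Express Neptune Receipt": [0.2, 0.9, 0.1],
    "discuss CDMS remaining use cases": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.1]  # stands in for an embedded query
print(semantic_search(query, index, limit=2))
# ['2025 Tax Summary', 'Holiday Inn Express Neptune Receipt']
```

Notice that nothing here compares words. The query never has to contain “tax” for a tax document to rank first, which is the whole appeal.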

Scene 4 – A Conversation with My Archive via Claude Desktop (MCP)

With more than 7,300 posts here written over 20+ years, I can’t always remember if I’ve written about something before. For instance, as I write this, I am about to head up to Boston for a conference. In reviewing the profiles of the various attendees, I couldn’t help being bemused by the wild garden of jargon on display. Immediately I thought: great topic for the blog. But I’ve written a lot over the years and it was entirely possible I’d written about this before. Since all of my blog posts are captured in ark, I decided to check there first.

One of the great things about the design of the archive is that many of the commands in the system are exposed as tools to LLMs via an MCP server. That means I don’t always have to sit at the command line. I can go into Claude Desktop, for instance, and ask it directly: “I want to write a blog post about how business jargon annoys me. What have I written about jargon in the past?”

If you can’t read the screenshot, here is what Claude said in answer to my question:

Searched memory

The conversation search didn’t turn up anything specifically about jargon. Let me try the ark search directly, since that’s where your blog posts live.

Used ark integration, loaded tools

There’s a very promising hit: “Meaningless Statements, Vague Gibberish, and Useless Qualifications” (2021) and “What not to do in a resume” (2006). Also that “AI jargon” email from 2025. Let me pull the relevant ones.

Used ark integration

You’ve got two solid posts in the archive that directly touch on jargon. Here’s what I found:

“Meaningless Statements, Vague Gibberish, and Useless Qualifications” (March 10, 2021) — Your most directly relevant piece. It was prompted by a spam email pitching IT services that was so thick with buzzwords it meant nothing. You used the Foundation passage where Holk’s symbolic logic analysis of a politician’s speech finds, after eliminating all the meaningless statements and vague gibberish, that nothing is left. You particularly skewered “talk synergies,” looked it up in Merriam-Webster, and imagined an email parser that auto-replies with Strunk & White’s Rule 17: Omit needless words. The post got three comments.

“What not to do in a resume” (November 30, 2006) — This is a broader piece about resume writing, but item #4 is explicitly about jargon: “Don’t overuse jargon.” You wrote that you grow annoyed when bullet points read like a foreign language, and gave the example “Implemented best-of-breed, open-standards ITIL-based WEP-encryption application” as something that sounds ridiculous even if true.

So you’ve touched on this twice — once obliquely (resumes) and once head-on (the spam email). The 2021 post has the richest material, with the Asimov hook being particularly strong. For a new blog post, you might want to build on that Foundation framing more broadly, or shift the lens from spam email specifically to workplace communication generally. The 2021 post was reactive; a new one could be more systematic — here’s what jargon actually costs us, here are the worst offenders, here’s what plain language looks like instead.

So right away, I know that I’ve already written about jargon on the blog at least twice before, the last time in 2021 — so maybe it is time for something new.

I wanted to point out the line where Claude said “Used ark integration, loaded tools.” When Claude couldn’t find anything in our conversations about jargon, it immediately knew to use the ark tool provided via the MCP integration to check the archive, and it used the correct search terms to find previous blog posts about jargon. All I had to do was ask my question.

This is easier to use than the search commands on the CLI, but they also serve different purposes. The CLI tools can be used in conjunction with other tools and commands through pipes, redirects, and scripts. Using Claude lets me interact with the archive as part of other work I am doing. It also allows me to follow up on things. For instance, I might wonder how my writing style changed in the 15 years that separate the two posts.

It is cut off in the screenshot, but here is the short version at the end of Claude’s response:

Short version: 2006 Jamie is a practical manager sharing hard-won tips. 2021 Jamie is a writer using an annoyance as a lens to say something about language, meaning, and the value of clarity — and reaching for Asimov to make the point stick.

Keep in mind, I could do this with any of the 700,000 documents in the archive. That said, there are some limits with the MCP, all intentionally imposed.

All documents in the archive get a sensitivity level. Any documents with a “Restricted” sensitivity are never returned to the MCP. These documents never leave the local machine. If I want LLM summaries of these documents, I can get them from the CLI, which will fall back to a local LLM that runs on my laptop.
Documents marked “Sensitive” are swept for sensitive information and that information is redacted before returning it to the MCP. For instance, if an SSN, account number, phone number, etc., appears in a document, those all get replaced with “[REDACTED]” before the document is returned to the MCP.
All documents are swept for a known list of sensitive information (SSNs, etc.) and are redacted before leaving the local machine, regardless of sensitivity.
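The three rules above can be sketched as a single gate that every document passes through before it reaches the MCP. The Python below is a rough illustration under stated assumptions: the function name `prepare_for_mcp`, the lowercase sensitivity labels, and the two regular expressions are mine, and a real sweep would need far more patterns than these.

```python
import re

# Illustrative patterns only. A real sweep would need many more.
ALWAYS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]                 # SSN-shaped numbers
SENSITIVE_EXTRA = [re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")]  # US-style phone numbers


def prepare_for_mcp(text, sensitivity):
    """Return text that may leave the machine, or None if it must stay local."""
    if sensitivity == "restricted":
        return None  # restricted documents are never returned
    patterns = list(ALWAYS)  # every document gets the baseline sweep
    if sensitivity == "sensitive":
        patterns += SENSITIVE_EXTRA  # sensitive documents get a broader sweep
    for pattern in patterns:
        text = pattern.sub("[REDACTED]", text)
    return text


sample = "SSN 123-45-6789, call 703-555-0100"
print(prepare_for_mcp(sample, "normal"))      # SSN [REDACTED], call 703-555-0100
print(prepare_for_mcp(sample, "sensitive"))   # SSN [REDACTED], call [REDACTED]
print(prepare_for_mcp(sample, "restricted"))  # None
```

Putting the check in one place means there is exactly one path by which text can leave the machine, which makes the guarantee easier to reason about.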

Still, I can converse with about 90% of the documents that are in the archive, and that can be fun. For instance, I asked Claude what music I was listening to on past May 31sts³, and it came back with this:

The best part about the MCP integration, for me, is that I use Claude as a kind of personal assistant and with access to my archive, Claude is much more effective in doing things for me because it has a rich context to draw from.

Scene 5 – Adding My Voice: The Annotation System

It is one thing to have artifacts in an archive. But real value is added to the archive when you can layer in context to those artifacts. This is where the annotation system comes in. When items are pulled into the archive, they become read-only and cannot be changed⁴. The annotation system provides a layer atop the item itself to provide context to that item. When showing that item in ark, any annotations that are associated with the item are displayed as part of the output. Annotations are available in searches, and there are multiple ways of making annotations.

Directly through the CLI via the ark annotate command. This command uses the default editor set in your shell⁵.
Redirected into the ark annotate command. For instance, if I made a bunch of notes in a file called notes.txt, I could do this at the command line: cat notes.txt | ark annotate document 1234 to add the annotation to document 1234.
Via the MCP. Yes, I can ask Claude to annotate for me.

I have a book club meeting this evening. Our book was Wild Dark Shore by Charlotte McConaghy. ark captures my books and reading so I can search for the book in ark as follows:

			
ark search "wild dark shore" --type book
Searching: 'wild dark shore'
[1]  725665  book    Wild Dark Shore  ✎2
   ↳ Wild Dark Shore Charlotte McConaghy A family on a remote island. A mysterious woman washed ashore. A rising storm …
# 1 result

		

That [1] on the line with the matching record is a “sigil”. It is a shortcut reference to the item that I can use in future commands (until another search is run) instead of having to provide the full id (725665 in this case) each time I want to refer to it. When a book is added, ark searches sources on the Internet to pull in metadata about the book. When I look at the record via ark show here is what I see:

			
ark show @1
id:                725665
title:             Wild Dark Shore
doc_type:          book
record_class:      document
date_authored:
date_precision:    undated
date_ingested:     2026-05-24T14:29:23.876608+00:00
metadata_updated:  2026-05-24T14:31:12.699497+00:00
series:
sub_series:
sensitivity:       normal
priority:          normal
starred:           no
tags:              bookclub
slug:
physical_location:
source:
source_path:       ark:/books/work
store_path:
sha256:            7b3a2e46382a6c857413db215adb76e6192fe47d9f0c2658dd36b7b5406d2366
doc_status:        active
record_origin:     unknown
origin:
last_accessed:     2026-05-31 14:24:32
access_count:      8
author:            Charlotte McConaghy
published:         2025
genre:             Mystery
isbn:              9789025477356
Wild Dark Shore Charlotte McConaghy A family on a remote island. A mysterious woman washed ashore. A rising storm on the horizon.
Dominic Salt and his three children are caretakers of Shearwater, a tiny island not far from Antarctica. Home to the world’s largest seed bank, Shearwater was once full of researchers, but with sea levels rising, the Salts are now its final inhabitants. Until, during the worst storm the island has ever seen, a woman mysteriously washes ashore.
Isolation has taken its toll on the Salts, but as they nurse the woman, Rowan, back to strength, it begins to feel like she might just be what they need. Rowan, long accustomed to protecting herself, starts imagining a future where she could belong to someone again.
But Rowan isn’t telling the whole truth about why she set out for Shearwater. And when she discovers sabotaged radios and a freshly dug grave, she realizes Dominic is keeping his own secrets. As the storms on Shearwater gather force, they all must decide if they can trust each other enough to protect the precious seeds in their care before it’s too late―and if they can finally put the tragedies of the past behind them to create something new, together.
A novel of breathtaking twists, dizzying beauty, and ferocious love, Wild Dark Shore is about the impossible choices we make to protect the people we love, even as the world around us disappears. Mystery Thriller Literary Fiction
--- annotations ---
[23] 2026-05-31 11:21
  Initial thoughts after reading:
- characters: Rowan, Dom, Raff Orly, Fen
- a kind of Robinson Crusoe or Swiss Family Robinson about the story, only darker
- did a very good job for the first two-thirds at playing out the mystery
- last third of the book took too many zigs and zags and was on the verge of becoming parody
- loved the setting of the book
- wondered why there would only be one place in the world where the seeds would be kept for rebuilding civilization; one would think there would be multiple places; also why one an island as wracked and ravaged as this one.
- odd to have one chapter from Alex POV -- it throws off the rhythm of the story; as if author could not come up with any other way of conveying that info
[24] 2026-05-31 11:30
  **Book Club Discussion Notes — May 31, 2026**
*Initial reactions:* Strong first two-thirds that sustains mystery well; final third overloaded with reversals, verging on parody. The island setting is the book's most consistently successful element, grounded in McConaghy's research on the real Macquarie Island.
...

		

After all of that metadata for the book record, note the section called --- annotations ---. This is where I have gone and added annotations to this particular book record. I’ve added two, as you can see, but I can add as many as I want. These are not part of the book record, but rather are linked to the book record. The ark show command is pulling all of this together when it shows the book to me.

Adding the annotation is as simple as running the command. Or, asking Claude to do it for you.

Here, not only was Claude easily able to find what I was looking for, but once found, it summarized it as requested, and then added the annotation that I asked for. For me, this turned annotation from a chore into something easy. I can sit with my book and as I read, tell Claude to add various annotations to the book for the thoughts or passages I want to comment on.

Scene 6 – `ark quality`: The Archive That Knows Where it is Weak

Archivists work to ensure the quality of the archives they manage. That not only means preserving the archive but enriching it with as much information as they can about the contents of the archive. Each item in ark has a certain set of metadata. It has the date of the item, the date it came into the archive, the source it came from, its location in the real world (if it is not digital in the first place). Each item has a title, a series and sub-series, a format, a sensitivity level. Items can have OCR data, embeddings, generated descriptions, and more. With all of that possible data surrounding an item in the archive, some way is needed to assess the quality of items in the archive.

Enter the ark quality command.

This command provides a way of looking at the archive as a whole and seeing the overall quality of its items. It also allows for looking at a specific item and seeing the quality of that item. Quality is a score from 0 to 100 that measures how complete the picture of the item (or archive) is. An item with no metadata at all rates a 0. An item with every possible element filled in rates 100. Here is a quality summary of the entire archive:

			
ark quality summary
=== ark quality — Archive Completeness ===
  Total documents:  681,576
  Median score:     76/100
  Score distribution:
       0-25                                   11,116  (1.6%)
      26-50  ███                              85,298  (12.5%)
      51-75  ███████                         178,002  (26.1%)
     76-100  █████████████████               407,160  (59.7%)
  Median score by type:
    attachment        ████                   24/100
    subscription      ███████                35/100
    reading_started   ████████               40/100
    calendar_event    █████████              47/100
    cli_command       █████████              47/100
    reminder          █████████              47/100
    note              ██████████             53/100
    video             ██████████             53/100
    reading_finished  ████████████           60/100
    code_file         █████████████          65/100
    conversation      █████████████          65/100
    imessage          █████████████          65/100
    outbox_draft      █████████████          65/100
    text              █████████████          65/100
    image             ██████████████         70/100
    browser_visit     ██████████████         73/100
    timeline_event    ██████████████         73/100
    watch_event       ██████████████         73/100
    blog_comment      ███████████████        76/100
    blog_page         ███████████████        76/100
    book              ███████████████        76/100
    day_summary       ███████████████        76/100
    email             ███████████████        76/100
    facebook-post     ███████████████        76/100
    message           ███████████████        76/100
    office            ███████████████        76/100
    pdf               ███████████████        76/100
    purchase          ███████████████        76/100
    review            ███████████████        76/100
    tweet             ███████████████        76/100
    git_commit        █████████████████      87/100
    music_play        █████████████████      87/100
    weather_snapshot  █████████████████      87/100
    action_item       █████████████████      88/100
    blog_post         █████████████████      88/100
    health_day        █████████████████      88/100
    playlist          █████████████████      88/100
    diary_entry       ████████████████████  100/100

		

Here you can see that about 60% of the items in the archive fall into the top band of quality (76-100). The highest-quality class of items is diary entries. The lowest-quality class is attachments. This makes sense because attachments are pulled in from email messages and, outside of getting descriptions from vision models and embeddings on those descriptions, they have almost no other metadata.
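A completeness score like this can be pictured as the share of metadata fields that are filled in. The Python sketch below assumes every field counts equally, which is almost certainly simpler than what ark does (the real scores suggest some kind of weighting). The field names are borrowed from the ark show output earlier in this post, and `quality_score` is a hypothetical function.

```python
# Field names taken from the `ark show` output. Equal weighting is an assumption.
FIELDS = ["title", "date_authored", "date_ingested", "source",
          "series", "sub_series", "sensitivity", "tags"]


def quality_score(item: dict) -> int:
    """Percentage of known fields that have a non-empty value."""
    filled = sum(1 for f in FIELDS if item.get(f) not in (None, "", []))
    return round(100 * filled / len(FIELDS))


empty = {}
partial = {"title": "Nuv.doc", "date_ingested": "2026-05-24"}
complete = {f: "x" for f in FIELDS}

print(quality_score(empty))     # 0
print(quality_score(partial))   # 25
print(quality_score(complete))  # 100
```

Under this model, an attachment that arrives with little more than a filename scores low simply because most of its fields are blank.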

Anything that comes into the archive has a title. Oftentimes it is simply the name of the file that was ingested, and we all know that filenames can be ambiguous or provide little useful information about what the file contains. ark can assess the quality of titles as well, for instance:

			
ark quality titles --limit 20
=== ark quality titles — 20 document(s) with bad titles ===
       ID  Reason              Type            Date        Title
  ───────  ──────────────────  ──────────────  ──────────  ──────────────────────────────────────────────────
filename w/ ext     attachment      —           Suggestions_for_OS_8_image.doc
filename w/ ext     attachment      —           Nuv.doc
filename w/ ext     attachment      —           database.doc
filename w/ ext     attachment      —           OS_8_seminar_outline.doc
filename w/ ext     attachment      —           Proposal.doc
filename w/ ext     attachment      —           g970527.doc
filename w/ ext     attachment      —           g970515.doc
filename w/ ext     attachment      —           updaterr.doc
filename w/ ext     attachment      —           REV980617ACOSTEDBOM1.XLS
filename w/ ext     attachment      —           adopters2.xls
filename w/ ext     attachment      —           Development.xls
filename w/ ext     attachment      —           kpatel.doc
filename w/ ext     attachment      —           BDY.RTF
filename w/ ext     attachment      —           98issues.doc
filename w/ ext     attachment      —           michael.doc
filename w/ ext     attachment      —           Invoice1.xls
filename w/ ext     attachment      —           Dun12.doc
filename w/ ext     attachment      —           Lanrover.doc
filename w/ ext     attachment      —           %adopters.xls
filename w/ ext     attachment      —           adopters.xls
document(s) shown.
  Use 'ark enrich titles' to fix with LLM-proposed replacements.

		

Note that in addition to listing items in the archive with poor quality titles, the command also tells you how you can raise the quality of those titles using ark enrich titles. Automated enrichment is a key element of ark. It uses LLMs to read the document and give a descriptive title to the document based on the context. Each document class has its own template of title format. So for instance a letter might have the format “Sender-to-recipient (date)” and the LLM will produce titles in that format so that they are consistent across that class of items, raising the quality and consistency of the archive.
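The “filename w/ ext” reason in the listing above is the kind of check that is easy to express as a rule. Here is a small Python sketch of one way to do it. The regular expression and the `title_problem` function are my own guess at such a rule, not ark's actual logic, and this version deliberately ignores filenames that contain spaces.

```python
import re

# Hypothetical rule: a title that is just "name.ext" was probably never enriched.
FILENAME_RE = re.compile(r"^[^\s/\\]+\.[A-Za-z0-9]{2,4}$")


def title_problem(title: str):
    """Return a reason string if the title looks like a bare filename, else None."""
    if FILENAME_RE.match(title.strip()):
        return "filename w/ ext"
    return None


for title in ["Proposal.doc", "REV980617ACOSTEDBOM1.XLS", "2025 Tax Summary"]:
    print(title, "->", title_problem(title))
# Proposal.doc -> filename w/ ext
# REV980617ACOSTEDBOM1.XLS -> filename w/ ext
# 2025 Tax Summary -> None
```

A cheap rule like this finds the candidates. The expensive step, reading the document and proposing a better title, is only needed for the items it flags.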

ark also uses a controlled vocabulary for its metadata like series and sub-series. When something is ingested it may get a series that falls outside that controlled vocabulary. ark quality vocab can help to identify such items:

			
ark quality vocab
=== ark quality vocab — Controlled Vocabulary Check ===
  Vocabulary: 15 series configured
  Violations: 4 unknown series value(s), 10 unknown sub-series value(s)
  Unknown series (4):
    Series                      Docs
    ────────────────────────  ──────
    health                      9392
    shopping                    5813
    diary                        624
    environmental                 43
  Unknown sub-series (10):
    Series              Sub-series              Docs
    ──────────────────  ────────────────────  ──────
    media               watch                  26149
    shopping            digital                 2642 (unknown series)
    shopping            audible                 2049 (unknown series)
    shopping            retail                  1122 (unknown series)
    media               reviews                   83
    correspondence      amazon                    22
    legal               identity                  11
    legal               power-of-attorney          1
    legal               will                       1
    professional        systems administration       1
  Sample documents with unknown series (up to 10):
         ID  Series              Sub-series        Title
    ───────  ──────────────────  ────────────────  ────────────────────────────────────────
     728106  environmental       —                 Weather: Arlington, VA on May 30, 2026
     728102  health              —                 Fitbit: May 30, 2026
     727759  environmental       —                 Weather: Arlington, VA on May 29, 2026
     727758  health              —                 Health: May 30, 2026
     727751  health              —                 Fitbit: May 29, 2026
     727457  environmental       —                 Weather: Arlington, VA on May 28, 2026
     727456  health              —                 Health: May 29, 2026
     727444  health              —                 Fitbit: May 28, 2026
     727087  environmental       —                 Weather: Arlington, VA on May 27, 2026
     727086  health              —                 Health: May 28, 2026
  Fix: ark set document <id> --series <valid-series>
  Or:  ark normalize-series  (if renaming a whole class)

		

Once identified, ark provides commands for fixing the items, either one document at a time or across an entire series.

All of this helps to raise the quality of the archive, making searching more accurate.
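The controlled-vocabulary check shown above boils down to counting documents whose series is not on an approved list. Here is a minimal sketch of that idea in Python. It is an illustration under assumptions, not ark's implementation: the dictionary-shaped documents and the `unknown_series` function are hypothetical, and the vocabulary lists only four of the 15 configured series (the ones the output above treats as valid).

```python
from collections import Counter

# A partial vocabulary for illustration. The report above treats these four
# series as known; the other 11 configured series are not shown in the post.
VOCABULARY = {"media", "correspondence", "legal", "professional"}


def unknown_series(docs, vocabulary):
    """Count documents per series for every series outside the vocabulary.

    Returns (series, count) pairs, most frequent first, which is the same
    ordering the report uses.
    """
    counts = Counter(
        doc["series"] for doc in docs if doc["series"] not in vocabulary
    )
    return counts.most_common()


# Hypothetical documents shaped like rows from the sample table.
docs = [
    {"id": 728106, "series": "environmental"},
    {"id": 728102, "series": "health"},
    {"id": 727758, "series": "health"},
    {"id": 728089, "series": "media"},
]

for series, count in unknown_series(docs, VOCABULARY):
    print(f"{series:<24}{count:>6}")
```

The same approach extends to sub-series by counting (series, sub-series) pairs against a per-series list of allowed values.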

Scene 7 – Closing the Day

We’ll conclude the tour with one way of looking at the work I got done yesterday. This is a good example because it shows how ark was designed around core Unix principles: each command does one thing, and its output can be used as input to other commands.

Let’s begin by looking at what the archive ingested yesterday. The easy way to do this is as follows (I’m limiting this to just 10 entries for space):

			
ark recent --days 1 --limit 10
[1]   728102  health_day    2026-05-31  2026-05-30  Fitbit: May 30, 2026
[2]   728101  diary_entry   2026-05-31  2026-05-30  Evening Notes
[3]   728100  git_commit    2026-05-31  2026-05-30  feat(cli): auto-follow supersedes chain to head on reads…
[4]   728099  git_commit    2026-05-31  2026-05-30  feat(search): identify multi-version docs via --has-vers…
[5]   728098  git_commit    2026-05-31  2026-05-30  feat(vim): :BlogShot — SnagIt → Vim → WordPress screensh…
[6]   728097  git_commit    2026-05-31  2026-05-30  fix(reading): ark reading finish <id|@N> stored raw CLI …
[7]   728096  git_commit    2026-05-31  2026-05-30  fix(blog): standalone images publish as wp:image blocks;…
[8]   728095  outbox_draft  2026-05-31  2026-05-31  Attempts  ✎1
[9]   728094  playlist      2026-05-31  2026-05-31  Recently Added (v5)
[10]  728089  music_play    2026-05-31  2026-05-28  On Your Shore — Enya
# 10 results

		

But if we wanted to know the total number of things ingested yesterday, we could do this:

			
ark recent --days 1 --limit 200 | wc -l
# 81 results
      82

Next, we can use the ark bundle command to generate a “bundle” of one or more documents from ark. This one bundle would contain the contents of all of the documents we put into it. So for instance, if we wanted to generate a bundle with the full contents of everything that came into ark yesterday we could do this:

ark recent --days 1 --limit 200 | ark bundle

which would produce output containing the contents of all 81 items ingested yesterday — a total of 799 lines so I won’t show them all here.
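For a pipeline like this to work, the downstream command has to recover document IDs from the listing it receives on standard input. Here is a minimal sketch in Python of how such a filter might do that. It is a guess at the general technique, not how ark bundle is actually written: the regular expression is inferred from the ark recent output shown above, and `document_ids` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# Matches listing rows such as:
#   [1]   728102  health_day    2026-05-31  2026-05-30  Fitbit: May 30, 2026
# The layout (index, ID, document class, ...) is inferred from the sample output.
ROW = re.compile(r"^\[(\d+)\]\s+(\d+)\s+(\S+)\s+")


def document_ids(lines: Iterable[str]) -> Iterator[int]:
    """Yield the document ID from each listing row, skipping everything else."""
    for line in lines:
        match = ROW.match(line)
        if match:
            yield int(match.group(2))
```

A filter built this way would call `document_ids(sys.stdin)` and then fetch each document's contents. Lines that are not rows, such as the "# 10 results" summary, simply fail to match and are ignored.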

Finally, we can pipe that bundle to an ark task. A task is a prompt for an LLM to do something with the data. We’ll use my “shipped-yesterday” task, which summarizes everything I “shipped” (code commits, posts written, emails sent, etc.). Putting it all together, it looks like this:

			
ark recent --days 1 --limit 200 | ark bundle | ark task shipped-yesterday
# 81 results
ark: warning: Redacted 1 sensitive pattern(s) before sending.
Running task: 'shipped-yesterday' (standard)
## May 30, 2026 — Daily Output Report
### Shipped
Five commits to the **ark** repo:
- feat(cli): auto-follow supersedes chain to head on reads (#94 Phase 1); bump to 1.68.0 (728100)
- feat(search): identify multi-version docs via --has-versions + (vN of M) annotation (#95); bump to 1.69.0 (728099)
- feat(vim): :BlogShot — SnagIt → Vim → WordPress screenshot workflow; bump to 1.69.1 (728098)
- fix(reading): ark reading finish stored raw CLI arg as event title; bump to 1.69.2 (728097)
- fix(blog): standalone images publish as wp:image blocks; caption via title (#96); bump to 1.69.3 (728096)
### Written
**Diary** — "Evening Notes" (May 30) covers a pleasant day at home, mentions writing an internal company blog post on AI use, notes a traffic spike (2,300+ views) on his public blog, and reflects on starting Montaigne's *Essays* and a desire to re-read *The Stand*.
**Self-email** — "Tomorrow's Plan — Sunday, May 31, 2026" sent from j**.**@icloud.com to himself; a detailed day-plan covering the DCA→BOS flight, the dare-to-dream volunteer event, open action items, and upcoming occasions.
### In Flight
**"Attempts"** (outbox draft, WordPress destination) — a personal essay connecting the etymology of "essay" to the author's own writing history, from junior high through college, prompted by a morning walk and an archive search. The draft is complete in the outbox but not yet published.

		

The first thing to note is the warning that ark gives on that second line after the command:

ark: warning: Redacted 1 sensitive pattern(s) before sending

This is an example of what I was talking about earlier. Sensitive information never leaves the local machine; it is either blocked, or, as in this case, redacted before sending it to an LLM.
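Redaction of this kind can be sketched in a few lines of Python. To be clear, this is an illustration of the general technique, not ark's code: the two patterns (a US Social Security number and a US-style phone number) and the `redact` function are assumptions, and a real pattern list would be longer and more careful. The function returns the count alongside the cleaned text so the caller can print a warning like the one above.

```python
import re

# Illustrative patterns only. A real list would cover more formats.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # US SSN, e.g. 123-45-6789
    re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),    # phone, e.g. 703-555-0100
]


def redact(text: str) -> tuple[str, int]:
    """Replace every sensitive match with [REDACTED].

    Returns the cleaned text and the number of replacements made.
    """
    total = 0
    for pattern in SENSITIVE_PATTERNS:
        text, replaced = pattern.subn("[REDACTED]", text)
        total += replaced
    return text, total


clean, count = redact("SSN 123-45-6789, call 703-555-0100")
if count:
    print(f"warning: Redacted {count} sensitive pattern(s) before sending.")
```

The important design point is where this runs: on the local machine, before any text is handed to the LLM, so the original values never appear in the outgoing request.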

The rest summarizes the work I did yesterday, based on how I specified the way I wanted it summarized in the “shipped-yesterday” task.

Seven scenes, but really seven angles on one habit. Most days I use four or five of these without thinking about it: the archive surfaces the day, I notice what wants attention, I look for something I half-remember, I add a note to something I just read, I check what shipped. None of those scenes are unique to ark taken in isolation — search engines search, task managers triage, note apps annotate. What’s unusual is that they sit on top of the same archive, share the same discipline, and the parts compound. That’s the payoff Part 1 promised the archivist’s lens would deliver.

The next post takes the floor out from under all of this and shows what holds it up — ark‘s data model. A single SQLite database, a clean graph connecting documents to people to one another, and a schema disciplined enough that it should still answer the same questions thirty years from now. Plumbing always sounds duller than the demo, but it’s where this stops being a pile of clever commands and starts being an archive built to last.

Notes:

Eventually, `ark` will be able to take some of these actions for me! ↩︎
An embedding is a way of converting text (or other data) into a list of numbers — a vector — that captures its meaning. The key property is that things with similar meanings end up with similar numbers, so you can do math on meaning. When you ask an LLM a question, your question is turned into an embedding and that math is applied to see how “similar” it is to other embeddings. ↩︎
Yes, `ark` has a history of everything I’ve listened to (via Apple Music) and watched (via YouTube, Amazon, Netflix and a variety of sources). More on this in the next post in the series. ↩︎
They can be versioned and superseded, but that is a topic for a future post. ↩︎
MacVim in my case. ↩︎

Tagged as essay

One response to “`ark`: A Personal Archive System, Part 2: A Day with the Archives”

Mark Plutowski

June 11, 2026

“All documents in the archive get a sensitivity level.”

Great idea, given that you’re allowing access to family members.

“if an SSN, account number, phone number, etc., appears in a document, those all get replaced with “[REDACTED]””

Another great idea… super easy to implement too.


Jamie Todd Rubin
