## Calculating public burden using OIRA data -- Part Two

An experiment in using open data to make government better

Published on: Feb 13, 2017

Yesterday, I published an article about using open government data to hunt for paper-based information requests by the government. Based on the data, it looked like there are still a lot of hours spent filling out paper-based forms. As I noted, though, I ran out of time to do careful analysis. So, today, let's explore deeper.

First, we'll create a histogram to look at how burden hours are distributed across requests. To do so, we'll use pandas to examine the results data, and specifically its histogram method.

In [1]:
```
# Set up the graphing environment. Because I'm using jupyter notebooks, first I need to tell
# it to show the graphs inline. I also use the `ggplot` style, because it's less hideous.
%matplotlib inline
import matplotlib
matplotlib.style.use('ggplot')
```
In [2]:
```
import pandas as pd

# `data` is assumed to be the results DataFrame from yesterday's post (it is not reloaded here).
data.burden.plot.hist()
```
Out[2]:
`<matplotlib.axes._subplots.AxesSubplot at 0x7fca4799d358>`

Wait. Hold on right there. That's not what you'd expect to see. That looks like there's an outlier. Let's see what that might be... To do so, we look for the top ten burdens.

In [3]:
```
data[["burden", "title"]].sort_values('burden', ascending=False).head(10)
```
Out[3]:
| | burden | title |
|---:|---:|:---|
| 720 | 2997500000 | U. S. Business Income Tax Return |
| 729 | 48731780 | IRA Contribution Information |
| 719 | 34115874 | Form 1099-DIV--Dividends and Distributions |
| 718 | 24951529 | Return of Organization Exempt From Income Tax ... |
| 248 | 20036012 | 2017-2018 Free Application for Federal Student... |
| 509 | 13500230 | National Fire Incident Reporting System (NFIRS... |
| 717 | 10880812 | Employer's Annual Tax Return for Agricultural ... |
| 497 | 9902378 | Arrival and Departure Record |
| 449 | 7736084 | Physician Quality Reporting System (PQRS) (CMS... |
| 713 | 7041290 | Customer Due Diligence Requirements for Financ... |

Oh dear. Looks like we've got a pretty obvious mistake here: "U.S. Business Income Tax Return" can definitely be filed electronically. Same with the other things on the list. And that one outlier accounts for 3 billion of the 3.3 billion hours. Oof. So what gives?
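
To put a number on how much one row can dominate, here is a minimal sketch. It uses a small stand-in frame built from three rows of the table above plus one made-up small request, so the total is not the real total; the helper name `top_share` is hypothetical.

```python
import pandas as pd

# Stand-in for the real `data` frame: three rows copied from the table above
# plus one made-up small request. The total here is NOT the real total.
sample = pd.DataFrame({
    "title": [
        "U. S. Business Income Tax Return",
        "IRA Contribution Information",
        "Form 1099-DIV--Dividends and Distributions",
        "Hypothetical small request",
    ],
    "burden": [2997500000, 48731780, 34115874, 200],
})


def top_share(df, column="burden"):
    """Return the fraction of the column total contributed by the largest row."""
    return df[column].max() / df[column].sum()


print("{:.1%} of the sample hours come from a single request".format(top_share(sample)))
```

Running the same helper against the real `data` frame should show roughly the same picture as the 3 billion out of 3.3 billion hours noted above.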

Well, it turns out that OIRA reports burden per information collection request, not per form. If any one of the forms in a request is not electronically available, the whole request is flagged, and the burden reported is the aggregate for all of its forms. Unfortunately, there doesn't seem to be an obvious way to back out the other forms, so the number isn't very useful for this purpose.
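
To see why that roll-up hides the answer, here is a minimal sketch with entirely made-up form-level data. The column names (`icr`, `form`, `hours`, `electronic`) are assumptions for illustration only; the real data does not expose this breakdown, which is exactly the problem.

```python
import pandas as pd

# Entirely hypothetical form-level data, invented for illustration.
forms = pd.DataFrame({
    "icr": ["A", "A", "A", "B"],
    "form": ["A-1", "A-2", "A-3", "B-1"],
    "hours": [1000, 5000, 10, 300],
    "electronic": [True, True, False, True],
})

# Roll up to one row per request: hours are summed across every form, and the
# whole request counts as paper-based if any single form lacks an electronic option.
icr = forms.groupby("icr").agg(
    burden=("hours", "sum"),
    all_electronic=("electronic", "all"),
)
icr["paper_based"] = ~icr["all_electronic"]

# What the request-level view reports versus what the form-level view would show.
reported_paper_hours = icr.loc[icr["paper_based"], "burden"].sum()
actual_paper_hours = forms.loc[~forms["electronic"], "hours"].sum()

print("reported:", reported_paper_hours, "actual:", actual_paper_hours)
```

In this toy case a single small paper form drags the whole request's hours into the paper-based total, which is the same effect that inflates the list above.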

Let's see what the total burden is if you remove the top 20% of information collection requests.

In [4]:
```
"{:,} hours".format(data.burden.sum() - data.sort_values('burden', ascending=False).head(220).burden.sum())
```
Out[4]:
`'5,589,316 hours'`

So, that feels a lot more sane, and a lot less exciting. There are only 5,589,316 hours of public burden for everything but the top 20% of information collection requests.
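
The cell above hard-codes 220 rows as the top 20%. A minimal sketch of the same calculation driven by a fraction instead is below; the function name `burden_excluding_top` and the sample numbers are made up for illustration.

```python
import pandas as pd


def burden_excluding_top(df, fraction=0.2, column="burden"):
    """Sum `column` after dropping the largest `fraction` of rows."""
    # Number of rows to treat as "the top", rounded down.
    n_top = int(len(df) * fraction)
    return df[column].sum() - df[column].nlargest(n_top).sum()


# Made-up burdens: two large requests and eight small ones.
sample = pd.DataFrame({"burden": [1000, 500, 40, 30, 20, 10, 5, 4, 3, 2]})
remaining = burden_excluding_top(sample)

print("{:,} hours".format(remaining))
```

With the real frame, `burden_excluding_top(data)` would stand in for the expression in the cell above, assuming 220 rows is about 20% of the data.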

In the end, this is a great lesson in how a data schema can lead to incorrect conclusions.

Still, we have some good data near the bottom of the chart.

In [5]:
```
data.sort_values('burden').head(890).burden.plot.hist(bins=30)
```
Out[5]:
`<matplotlib.axes._subplots.AxesSubplot at 0x7fca467f8080>`

In other words, there are a lot of information requests that each account for only a couple hundred hours of public burden. Not a surprising result, but perhaps even more useful in the end. This result means that there are about 200 forms in the middle that account for much of the remaining burden hours. Now, that seems like a good place to start.
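
One way to check a reading like this without squinting at a histogram is to bucket the requests and total the hours per bucket. This is a minimal sketch on made-up numbers; the bucket edges and labels are arbitrary assumptions, not anything taken from the OIRA data.

```python
import pandas as pd

# Made-up burden hours, one value per request.
sample = pd.Series([5, 40, 150, 220, 300, 900, 4000, 12000], name="burden")

# Arbitrary bucket edges; each bucket includes its upper edge.
edges = [0, 100, 1000, 10000, float("inf")]
labels = ["<=100", "101-1,000", "1,001-10,000", ">10,000"]
buckets = pd.cut(sample, bins=edges, labels=labels)

# How many requests fall in each bucket, and how many hours they add up to.
counts = buckets.value_counts(sort=False)
hours = sample.groupby(buckets, observed=False).sum()

print(pd.DataFrame({"requests": counts, "hours": hours}))
```

Swapping `sample` for `data.burden` would give the same two columns for the real requests.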

## Calculating public burden using OIRA data

An experiment in using open data to make government better

Published on: Feb 12, 2017

Recently, the new Administration issued an Executive Order aimed at Reducing Regulation and Controlling Regulatory Costs. As part of this effort, the Administration is supposed to offset regulatory costs.

So, that got me thinking. The Office of Information and Regulatory Affairs (OIRA) is charged with reviewing not only regulations but also agencies' information-collection requests under the Paperwork Reduction Act. And as part of that review, OIRA and the agencies are supposed to track the public burden associated with each information collection.

As a thought experiment, I decided to see whether we could find some low-hanging fruit, namely paper-based information requests. And the results were interesting...

## Should lawyers learn to code?

Yes, but we should not strive to be coders

Published on: Aug 14, 2016

For the past several years, I’ve been asked one question many times: “should lawyers learn to code?” Over those years, my view has been mostly consistent… “yes, lawyers should learn to code.” Probably unsurprising, given that I wrote Coding for Lawyers several years ago.

But, there’s always been a lingering bit of doubt. “Should all lawyers learn to code?” I would quietly ask myself. “Why?” I’d wonder. What specifically about coding did I think lawyers should learn?

Recently, the parlor game has been played out many times over amongst the #legaltech set, and folks are taking sides. So now, despite my previous reservations, here is my full-throated argument for why lawyers should learn to code.

## When a micro-purchase doesn’t work out, we try to learn from it

Lessons from the trenches

Published on: Jul 09, 2016

This week, I co-authored a blog post for the 18F blog, entitled: When a micro-purchase doesn’t work out, we try to learn from it. It discussed a thing that is rarely discussed in government: failure. Here’s the opening graf:

Two months ago, the 18F acquisitions team ran a public micro-purchase auction to find a vendor to develop a small new feature for 18F’s cloud.gov, and for the first time after several successful micro-purchases for other products, the contracted vendor didn’t deliver the code on time. This was very interesting to us — we’re early in the life of the micro-purchase platform, and we believe that failure is a great way to learn. In the spirit of experimentation and sharing our lessons, here’s how we went about analyzing this, and here’s what we learned.

I encourage you to read it!

## The Code of the District of Columbia is now available online

And I couldn't be happier

Published on: Jun 26, 2016

At long last, the Code of the District of Columbia has a permanent URL, within the dccouncil.us domain. This may not seem like a big deal, but this simple event is the culmination of years of effort, and I couldn’t possibly be happier.

## DC's Voter Rolls are on the Internet

Is this 'Shocking' or is it 'same old same old'?

Published on: Jun 19, 2016

Earlier this week, the Washington Post ran an article with a headline destined to scare the crap out of DC’s voters: “D.C. makes it shockingly easy to snoop on your fellow voters.” But behind this hyperbole was a simple act; the DC Board of Elections posted the voter roll on the internet for public inspection. For those who might not know any better, this must have been quite a surprise. But for close observers of DC’s elections, this was, well… a nothingburger. Here’s why.

## My theory about The Americans

Prepare for your mind to be blown

Published on: Jun 11, 2016

This week marked the finale of season 4 of The Americans. Like almost everyone else, I loved it. Already, I can barely wait until the next season starts. But as I prepared to watch the finale, I had a nagging thought. I just couldn’t let it go. And now, I am absolutely convinced that … [Warning, serious spoiler alert ahead!]

## Storytelling and federal procurement

A lesson in how to explain complicated things

Published on: Jun 05, 2016

Last week, after chatting about challenges in federal procurement, a colleague suggested a book entitled the “Free Enterprise Patriot.” The opening statement of the book sets the stage:

## On links to court filings

Journalists should link to court filings by default

Published on: Feb 20, 2016

Dear Media,

It’s time we had a talk. Because you’re hurting democracy.

## 6 months into 18F

An update

Published on: Sep 10, 2015

Several months ago, I described my intention to leave a happy job in the law and join the emerging government technology office known as 18F. Today, 6 months after starting at 18F, I want to give an update about how it’s going.

tl;dr It’s better than I could have ever imagined.

## Mailmerge for Word Docs... in Python?

A neat trick for document automation

Published on: Jan 25, 2015

I’m going to say something nice about Microsoft Word: there’s a simple loophole to its impossibly ornate OOXML schema that allows for document templating. If you are trying to do some document automation for Word documents from Python (or other languages, I suppose), listen up.

## Joining 18F

Why I'm leaving the greatest job in the world for another one

Published on: Jan 20, 2015

Today, I informed the members of the Council and my colleagues that I will be leaving the District government at the beginning of March and joining the growing ranks of public servants at 18F. One question I have heard from friends, colleagues, and family is “Why?” It’s a fair question. Those who know me know that I love my job: my staff is amazing, the work is fascinating, and I have been given extraordinary opportunities to serve the District of Columbia. So what gives?

## Dogfooding with Jekyll

Using the new `data_source` configuration to serve mankind

Published on: Nov 29, 2014

Yesterday, I learned that Jekyll, the well-known and powerful static-site generator, has a little-known feature that is kind of a big deal for open-data sites hosted on GitHub.

tl;dr: Jekyll can let you consume and publish data files with the `data_source` configuration setting

## Client Confidentiality on Trello?

Why two-factor authentication matters for lawyers who want to use the agile tool

Published on: Nov 29, 2014

This weekend I signed up for Trello. I started playing around with it, started liking it, and then I hit a snag. There’s no two-factor authentication (“2FA”).

As a practicing lawyer obligated to protect client confidentiality, this is a major barrier to entry. Fortunately, Trello has announced that 2FA is In Progress. This is a great development. Read on for why this matters, especially for lawyers like me.

## In Praise of Commoditization

Open source takes a village

Published on: Nov 28, 2014

Earlier this week, Dr. Robert Read and Eric Mill penned an article for the 18F blog, entitled How to Use More Open Source in Your Next Federal IT Acquisition. It’s an important article for a variety of reasons. Most of it is a pitch-perfect explanation of why open-source tools are more important than ever, and why federal (and, ahem, local and state) governments should be looking for opportunities to use open-source tools.

## Court Statistics: Part I

Why we may need to open a "floodgate" of judicial data

Published on: Nov 15, 2014

This weekend, I spent approximately 16 hours sitting in a windowless meeting room in a Chicago hotel discussing specific processes for arbitrating family-law disputes. This is how Uniform Law Commissioners like me get our kicks.

During the weekend, I learned of a recent article entitled Let’s Stop Spreading Rumors About Settlement and Litigation: A Comparative Study of Settlement and Litigation in Hawaii Courts, written by one of my fellow commissioners, the multi-talented Elizabeth Kent, and her co-author John Barkai.

Based on this article, I plan on writing three blog posts about judicial data and, hopefully, making the case for lawyers and the courts to think more critically about the need for good judicial data.

## "Dumb" Government Data

Doing small things well

Published on: Nov 12, 2014

I recently was named a member of the Mayor’s Open Government Advisory Group. Among the things that the Group will be tasked with is “[e]stablish[ing] specific criteria for agency identification of additional datasets.”

## An RSS feed for LIMS

Syndicating data for better legislation tracking

Published on: Nov 07, 2014

Recently, I built an RSS feed for LIMS.

The URL for the RSS feed is here: https://esq.io/lims-rss.xml, and the source code is available here: https://gist.github.com/vzvenyach/757fa97fd99c3a14e798. This post explains my reasons for doing it.

## Hello World

Just what the world needs: another blog...

Published on: Nov 06, 2014

I’ve decided to start a blog. I’ll explain more later.