Saturday, January 12, 2013

academics want their work to be available

This is a minor point in the face of a tragedy, but I think it's worth making.

Aaron Swartz, the young information rights activist who was indicted for downloading millions of academic journal articles from MIT's network, has committed suicide. The charges against him were, frankly, insane. Yes, it was a mistake to break into a networking closet and access MIT's network illegally-- the kind of mistake that should get you, say, a fine, community service, and probation. The feds were apparently out to make an example of Swartz at a time when they are under great pressure from media companies to enforce IP laws. It's impossible to say if his suicide was a direct result of his prosecution; mental health is enormously complicated and does not operate on a simplistic system of cause and effect. But it would be absurd not to assume that the prospect of years and years in prison weighed on his mental state at the time of his suicide.

Here's the point I want to make about journal archive access: I don't know a single academic who is opposed to free and open access to their work. And I know more than my fair share of academics. My father was a professor and his father was a professor, my family's social circle growing up was full of academics, I am a grad student who maintains friendships and connections with other grad students and professors at many universities, and I spend an awful lot of time talking about the university and its culture. And I have brought up the question of gated journal access constantly, because it's a subject of considerable interest to me. I know that this isn't a very rigorous standard of evidence, but my own experience is all I've got. I have never talked to anyone-- arts, professional schools, humanities, social sciences, or STEM-- who was opposed in theory to the idea of free access. You've got to do something to replace the revenue streams of the academic journals, many of which already operate at a loss. But as a principle, giving people free access to journal articles is as close to a universal stance as I can think of among academics. Why wouldn't it be? Researchers believe that their research has value, that it matters, and they want it to be read.

It's important to say: I am very far from a piracy apologist or advocate for totally free media. "Information wants to be free" is an entirely empty statement, an attempt to use a profound-sounding aphorism in the place of actual intellectual work. In my experience, those who advocate the freedom to pirate simply want whatever they want, whenever they want it, at no cost. That's not an adult stance, and to date I have never heard a piracy advocate articulate a system that would achieve that universal free access while still making the media we love practically possible, to say nothing of compensating creators for their talent and their work. I am not someone inclined to an "anything goes" attitude towards intellectual property. But this prosecution was ridiculous, this outcome tragic, and this restriction on the free dissemination of academic articles an affront not just to the ideals of scholarship but to the actual desires of most academics. Who was the government protecting in this prosecution? Who was it for?

15 comments:

mord said...

you wrote: "to date I have never heard a piracy advocate articulate a system that would achieve that universal free access while still making the media we love practically possible"

it's interesting how in one post you're willing to suggest that human rights cannot progress w/out a radical resistance to capitalism, and in this post you want piracy advocates to develop a system that would keep capitalism running smoothly. maybe their position is that ppl will continue to produce scholarship even w/out the financial incentives promised by jstor subscriptions.

Freddie said...

First, you're misreading me-- I think the public should have unfettered free access to scholarship. Scholars themselves have no direct financial incentives to publish, and certainly not from JSTOR. I'm talking about media.

Second, my resistance is to socializing the production of culture without socializing the economic system in which those who produce it live. It is cruel and unworkable to tell artists "you need to provide your work for free, but you need to pay for everything you need to live." You socialize the system from the material needs, not from the cultural ones. And this is part of my frustration with piracy activists: they pose as socialist revolutionaries when it suits them, but they have no actual commitment to a genuinely social system. They just want to watch The Avengers for free.

I hate current IP law and I want reform. That can't happen until we get an IP reform movement that is made up of grown-ups.

Rasmus Xera said...

I think you're buying into a sad caricature of the piracy movement, Freddie.

Sure, the average person torrenting movies might just want to watch them for free, in the comfort of their own home, usually before they would even be able to buy them. But the average piracy activist (at least in my experience) stands on solid empirical ground in staunch opposition to the same system we decry.

It's the same with groups like Anonymous. They might seem on the surface to be a bunch of kids with a mild sense of justice, but in reality they're expressing a genuine anti-capitalist message, and have the tools to accomplish more than we can by pontificating about it. They may not be rooted in Marx and Mill and the like, but their criticisms are pretty damn close, and they resonate a lot more with people today than centuries-old prose does.

Freddie said...

I hope you're right.

Charles said...

Sort of in response to comments 1 and 2:

What almost always goes unnoticed in discussions about access to JSTOR and other similar walled gardens is that digitizing and maintaining academic journals is REALLY FUCKING EXPENSIVE. This isn't, like, "let's feed some journals through a scanner and put them online." This is, like, capturing terabytes of data over a span of decades, developing software to capture complicated and inconsistent metadata, having a large staff of humans QC that metadata, maintaining expensive hardware in data centers around the globe, hiring system administrators to... Anyways, this is a massive endeavor that costs many tens of millions of dollars to execute even at a small-to-medium scale.

My point is that what academics want with regard to access to their journal articles isn't the only relevant question here. The work done to get their writings online is also, you know, work. The people who do it have to get paid or the work won't get done.

I also want to point out that while academics may not see any direct financial incentives from online access to their articles, academic publishers certainly do. The paywalled digital collections of academic journals pay money directly to publishers based on access -- x number of people read your journals, you get y dollars. Many of these are small publishers. Nearly all of them have lost some of their funding from their universities over the last ten years. This revenue matters.

I don't mean any of this to suggest that Swartz's prosecution was justified. It was, as you pointed out, insanely overboard, especially since both MIT and JSTOR made it clear to the prosecutor that they were uninterested in pursuing legal action. But the broader issues surrounding the case aren't simple.

Bart said...

What I keep running up against as an Old is how hard it is to do online research in medical journals concerning the various ills that afflict me.

Tom Allen said...

Perhaps if you write a very kind note to Elsevier they'll spend a buck or two of their billion-dollar yearly profits and send you Xerox copies of the articles you need. If you enclose a SASE, naturally.

Freddie said...

Naturally.

VL said...

The thing that often gets left out of this conversation--and which makes the government's prosecution of the case especially bizarre--is that, at least in the sciences, most of the research is funded by the federal government in the first place, i.e., by your tax dollars. Moreover, when you receive an NIH or NSF grant, you explicitly agree to make your work available to the public -- including expensive reagents such as genetically engineered mice.

So, the government pays for the research, the lab head pays the journal to publish the work (yes, scientists have to pay for color figures, etc, so publishing a paper usually costs $1-2K or sometimes more), then the university library pays thousands for a subscription to the journal (some major scientific journals cost libraries $20K+ each year), and anyone outside the university system pays $35 for access to the thing. So researchers are paying at every step to get their work out, the public pays for both the research _and_ the privilege of reading it, and the journal itself keeps raking in the dough. This is particularly egregious when the university itself is supposed to be state supported (yes, I know that's basically a fiction by now). And you know who makes the money? Not the poor souls who have to make online databases work, but the CEOs of the major publishing companies.

This unhappy state of affairs is what led Nobel Laureate Harold Varmus and colleagues to start PLoS (Public Library of Science) a decade or so ago, a truly open-access journal. PNAS, the Proceedings of the National Academy of Sciences, is also freely available to the public. If these venerable journals can do it, I see no reason for it not to become the default model. Except for the problem of capitalism, of course.

VL said...

Freddie, you and your readers might enjoy Lawrence Lessig's comments here:

http://lessig.tumblr.com/post/40347463044/prosecutor-as-bully

Cian said...

"What almost always goes unnoticed in discussions about access to JSTOR and other similar walled gardens is that digitizing and maintaining academic journals is REALLY FUCKING EXPENSIVE."

No it isn't. I'm sure Elsevier and the rest would like people to believe this, which is why they pump this propaganda out there for the gullible. Sure it costs money, but so does running a university web site, running a department, organizing a library.

And when you compare it to the cost of the subscription to a journal like Nature... Oh boy.

cian said...

FWIW: Aaron probably wasn't trying to liberate the JSTOR documents. We'll never know for sure (he was unable to comment due to the case), but it seems likely he just wanted mass access to the documents.

And the case was actually worse than we knew:
http://unhandled.com/2013/01/12/the-truth-about-aaron-swartzs-crime/

zmil said...

"Moreover, when you receive an NIH or NSF grant, you explicitly agree to make your work available to the public -- including expensive reagents such as genetically engineered mice.

So, the government pays for the research, the lab head pays the journal to publish the work (yes, scientists have to pay for color figures, etc, so publishing a paper usually costs $1-2K or sometimes more), then the university library pays thousands for a subscription to the journal (some major scientific journals cost libraries $20K+ each year), and anyone outside the university system pays $35 for access to the thing."

Current US law requires that papers from NIH-funded research be made available for free within something like a year of publication, so this has been partially solved already. Doesn't help with older papers, though.

If you really really want a paper there's always the option of emailing the corresponding author. Most will be glad to email you a pdf.

Lastly, on open-access journals: they're great and all, but they do tend to charge much higher publication fees in order to cover costs. Publication does cost money, and somebody has to pay for that.

Charles said...

Cian,

Yes, many things do indeed cost money. Digitizing and hosting journals on a centralized platform with almost no downtime for access by millions of users is one of the things that costs huge amounts of money, at least when done at the level of quality the academic community seems to want. Forgetting the costs of long-term maintenance of a digital archive (which we can't, because it's way more expensive than storing paper, which is itself way more expensive than anyone outside the library community thinks it is), the labor required to digitize back runs of paper journals is just huge.

You need people to track down copies of the journals (usually in large sets), which means staff specialized in dealing with many institutions. You need people to go through those sets and make sure everything is intact, which of course it isn't. Then you need to hunt down individual volumes to find replacement pages for the ones that were bad in the large sets you were given. Then you need to pay bulk scanning companies to make scans (and pay for shipping to/from those companies). Then OCR the scans. Then do QC on the scans and the OCR. Then capture lots of metadata and QC that. We haven't even gotten into developing a frontend and backend for millions of users and keeping it updated over time; buying and maintaining the hardware all of this will run on; and a thousand other things.

I have no interest in defending Elsevier, who I agree are a horrible company. My point is that a "this should all be free and the paywall is immoral" attitude fails to account for how this work would be done if the current model fell apart. I worked in large-scale academic IT for nine years and know many people who have worked on digitization projects, and I can tell you from first-hand experience that this shit costs. It has to be paid for somehow. I wish it were high enough on the list of our societal priorities to be paid for through taxes, but it isn't and won't be any time soon.

cian said...

"Forgetting the costs of long-term maintenance of a digital archive (which we can't, because it's way more expensive than storing paper, which is itself way more expensive than anyone outside the library community thinks it is), the labor required to digitize back runs of paper journals is just huge."

JSTOR currently spends about $3 million a year on it. They spent rather more on it in previous years, though still relatively trivial amounts ($12 million). To put this in perspective, they spend $8 million on publisher fees (which is a sick joke, but whatever). JSTOR will have vastly higher digitization costs than most publishers (because that's mostly what they do). Obviously digitization is irrelevant for any article published in the last ten years.

They spend $3 million on IT, and around $11 million on staff costs. Even if all of those staff costs were IT (they won't be), we're still talking trivial amounts of money.

arXiv costs $400,000 a year to run. Which is ludicrously cheap. I'm not sure how much the economics one costs, but I doubt it's significant.

"Then do QC on the scans and the OCR. Then capture lots of metadata and QC that. We haven't even gotten into developing a frontend and backend for millions of users and keeping it updated over time; buying and maintaining the hardware all of this will run on; and a thousand other things."

Even if you were to produce a good interface/system (none of the publishers do), this would not be hugely expensive. It would be considerably cheaper if you did away with the security requirements. Security is expensive.

"My point is that a 'this should all be free and the paywall is immoral' attitude fails to account for how this work would be done if the current model fell apart."

Public purse. Hell it could come out of the research budget if necessary. The savings in public expenditure by doing it that way would be significant. Anyway, nobody buys single articles from JSTOR. That's simply not a serious part of their budget.

Incidentally, it's going to happen one way or another. The current model is not financially sustainable.