Write. ArXiv. Repeat.

As discussed in earlier posts, I believe that two simple and effective ways to improve the publication process are to require that only papers available on the arxiv can be submitted for publication and to eliminate all formatting requirements. (Throughout this blog I use the arxiv for concreteness only; several other repositories should work just as well.) In this post I want to consider a broader, radical publishing reform, and discuss several related issues.

Here’s how I would like the publication process to be:

As an author, you write your paper. When you are done, you post it on the arxiv. Period. You now move to your next paper.

Forget reverse-engineering the chronology of progress: there would now exist a unique citation for your paper: its arxiv entry. Forget bibtex and its BeaST. And also forget trying to pick the best venue. Forget “are they going to invite me for the special issue? In fact, is there even going to be a special issue?” Forget the conference vs. journal debate. Forget a lengthy camera-ready production process whose goal is to put your paper in an electronic format that is only read by library computers.

Papers would be ranked by a system of badges. For starters, the badges will correspond to the current entities. So we have the STOC badge, the JACM badge, etc. We also have some badges like ECCC, which are assigned to papers that satisfy minimal requirements, such as not making sweeping unsupported claims. Badges cannot be removed but can be added. This last aspect makes the new system more flexible. Today, it is a bit funny to find out that a seminal paper appeared in an obscure venue, but it is hard to update that paper’s status. With the new system one could just add another badge.

Q: Which papers are the committees supposed to evaluate?

A: Committees will need to monitor papers like many people already do. Note that for the year 2014 the ECCC repository lists 184 reports, and for the year 2013, 191. These are fairly small numbers, comparable to the number of submissions to a top conference.

Q: What if a paper does not get noticed? What mechanism would there be for giving it additional chances?

A: The current default mechanism is that the author resubmits the paper to a venue, to signal to that venue’s committee that they should give the paper an (n+1)-th chance. The same can be done with the new system, for example by posting an arxiv revision with the comment “no changes from previous version”. In both systems, what prevents authors from flooding committees with resubmissions is the reputation loss, so I expect this aspect to work in roughly the same way. A rarer current mechanism is that the paper gets invited or is selected for an award. This would work in the same way in the new system.

Q: What about the cycle of getting feedback from the reviewers and revising the paper accordingly?

A: In the spirit of “only papers available on the arxiv can be submitted for publication”, I would like the public to have access to the same information that is given to authors and referees. So I would like this cycle to take place in a public forum. If there is a serious issue with an arxiv paper, I would like to see a comment pointing this out right away, instead of having to wait for authors and referees to converge on a new version to release to the public. I also believe that the feedback/revision cycle is less prominent in theoretical fields than it is elsewhere. In other fields, it is common to receive feedback of the type: “Result X is interesting but not enough. Please run experiment Y. If you get outcome Z then we’ll talk.” With theoretical papers you hardly get a request for obtaining better results. If there is any feedback it is mostly about presentation, references, and correctness. Also, especially with conferences, it is not uncommon to get inessential feedback.

As a first step towards implementation, one could keep the publication venues as they are, but replace the cumbersome submission process with an email containing a link to the arxiv record. The production of a camera-ready version and copyright transfers are eliminated. Conferences going solo have a great opportunity to implement this first step. Alas, the Computational Complexity Conference did not quite go for it; think about it next time you get an email about an overfull hbox. Once this step is taken, one can ask if even the submission email is required.

NSF now requires its grantees to make their peer-reviewed research papers freely available within 12 months of publication in a journal. This move by NSF is the answer, at least in part, to this petition, which I signed. (Incidentally, my inability to advertise this petition through the available channels is one of the factors that eventually led me to start my own blog.) However, I don’t find this change very significant. For one thing, 12 months after the publication time is a very long time for research.

31 thoughts on “Write. ArXiv. Repeat.”

  1. @Emanuele: I couldn’t get the point. Do you say that NO paper should be submitted to a journal until it is loaded in arxiv? Or this only concerns conference submissions?

    In my eyes, both are overdone. Will the papers get “arxiv referee reports”? I have never seen any. Will this finally break the “paying firewall”? Definitely. So, I would only recommend that authors load their papers in open access repositories (arxiv, eccc, etc.) to be read, if they want. To be “estimated” – try to submit elsewhere, STOC, FOCS or to some other “knowers of what is important”…

    1. Yes, that is what I am saying. Both conferences and journals should only accept submissions that are on the arxiv. I am not making any distinction between conferences and journals since in computer science conferences are very influential. Regarding the referee reports, I think those should be public too.

  2. Nice post. I don’t understand some of your sentences though. What do you mean by “forget reverse-engineering…”?

    1. Thanks. Re reverse-engineering: in many cases the chronology of papers as given by conference and journal publications is NOT valid, and you cannot use it when adding citations. You are supposed to know what really happened, which is especially difficult if you are citing something outside your bailiwick.

  3. I recently saw a process for writing and publishing papers on Github. See https://github.com/PeerJ/paper-now and the example paper at https://read-lab-confederation.github.io/nyc-subway-anthrax-study/

    It seems like it’s a little early and maybe not polished for math/tcs needs, but it fits a lot of your requirements (kills formatting, single id with detailed revision history, searchable index, etc.). And one of the advantages of github is the ability for anyone to submit in-line comments, which can be used to provide immediate (and public) feedback. I think it would not be hard to implement an authenticated badge system either.

    One small downside is that their math support does not include document-wide TeX macros (just stackexchange-style math support), or bibliography formatting, or knowledge of theorem “environments”. But if you want to get rid of bibtex anyway I’m sure one could get around the other two issues.

    As a graduate student the idea of me demanding of anyone that they read/review my papers in this format is silly, but if the conference organizers were willing to try it out then I’d happily submit a paper using this tool.

    1. Thanks for the link. It would be good to try to keep things as simple as possible. I am not sure who would be willing to learn github, but I suspect that few researchers would object to using public repositories such as the arxiv. A bottom line is that I would like authors to be almost exclusively concerned with the production of a .pdf file (and in this respect I find the arxiv slightly suboptimal — I think I would prefer a system where you submit both .pdf and .tex, the .pdf is displayed as is, and the .tex is used to extract information such as title and authors, in case this cannot be extracted from the .pdf itself).

      1. I agree github is a bit overwhelming at first, but I’m fairly certain it’s no more complicated than the TeX mess we have learned to deal with. If you want to get rid of the issues that come with physical print, then ditching pdf in favor of a translatable format makes more sense. But I have to agree, asking someone else to learn a new technology just to keep doing what they’ve been doing is a big hurdle.

  4. :idea: :!: :star: like the ideas & the time is ripe. the whole “system” is quite patchwork in its current form. stackexchange also has a lot of innovative ideas on/ mechanisms for a scientific commons. it needs a Big ReThink. see also some musings on NIPs peer review & arxiv 1M papers etc …. one drawback is that arXiv has avoided peer review so far. its the natural place to have it, but any system would likely be controversial. hope that some different approaches are tried & experimented with to find different solutions & explore the possible “spaces” to find what works best.

  5. O.k. let’s not make any difference between journal and conference publications (although your mentioning of “formatting requirements” targets exactly conferences, not journals). But, as I understood, you are suggesting that it will be ONLY arxiv (or whatever repository) which will count as the “published in …”. Journals/conferences will remain just badges? But what then to do with:

    1. Job applications: would hiring committees just count badges?

    2. Tons of P!=NP proofs already “published” in arxiv?

    3. Reviewers not willing to serve as just “badge givers”?

    4. Putting “likes” to arxiv papers being unable to replace a (hard) work of referees?

    Albeit very congenial (in view of these paid firewalls), your suggestion seems a bit too extreme. Since even NSF, as well as European science foundations, are starting to (softly) require to make papers “freely accessible”, we could just rely on researchers themselves what to do with their papers. Existence of repositories makes this decision easier.

    1. @Stasys. Thanks for your thoughts! There seems to be some confusion so let me try to clarify. First, regarding your parenthetical remark that my mentioning of “formatting requirements” targets exactly conferences, not journals. No: it targets both. Even publishing in journals requires authors to prepare a camera-ready version. To be even more clear, I include in “formatting requirement” any additional work beyond the preparation of say a .pdf in whatever style the authors choose. Sometimes conferences go out of their way to increase this burden, but in general I don’t see a qualitative difference between conferences and journals. In fact, some journal webpages even ask authors to put *submissions* in a particular style file, though I learned the hard way that this is not typically enforced.

      Appearance in the arxiv does not give any value or ranking to a paper. Only badges do.

      Regarding your points:

      1. Yes, they will count badges exactly to the extent that they now count conferences and journals.

      2. These papers will receive no badges. Note here the ECCC badge is very useful and practical.

      3. The power to give a badge should belong to the same entities that now accept papers in conferences and journals, so I am not sure what would be the problem here.

      4. I am not at all suggesting that they can! I am not thinking of “likes” but of badges that are based on decisions of no lower quality than the current ones.

      1. 1. What then happens with conference proceedings and journal issues? What role should they play then? There seems then no need in them?

        2. But (most) papers on arxiv without any badges will then still be counted as “published”. For experienced researchers, this is no problem. But what about the beginners? Being “published or not” is a bit different from “having badges or not”. Had arxiv at least some filtering mechanism (at least at the rather low level, needless to say, of the one in ECCC), this would not be “the” problem.

        3. See my concerns with item 1.

        But I am most pessimistic about item 1: this whole crazy publishing system (funded by tax payers, but purely commercial) can exist only because of our silly hiring system. Counting system. Remember all our petitions against Elsevier, all attempts of NSF and other funding societies to break down this system. Not much changed … And all this despite the facts that refereeing is voluntary, that authors themselves prepare almost “print ready” versions, and that only “fat guys” from big publishing houses are happy with this system. So, item 1 may be the crucial bar for your (interesting, needless to say) suggestion.

      2. Re 1: I think that the role of current conferences and journals should be to run a badge system. (Conferences may have the additional goal of organizing meetings.)

      3. Only to “run a badge system” is a very cardinal suggestion. Much more cardinal than to force journals/conferences to at least make their issues/proceedings open after a fair delay. For publishing companies, this would mean a total loss of their (fat) profits! What should libraries then buy, and why? After some pressure, Elsevier now at least gives access to > 4 years old publications (for some journals). But that was it. Springer, say, hasn’t done this so far.

        So, just to “run a badge system” will definitely not work. What could work is the “submission via arxiv” type procedures. Some journals/conferences already do this, I think (could not quickly recall examples). But this depends on editorial boards, and is suggested as an alternative only.

      4. Re your question: Recall the new policy by NSF. So, should we now all stock up on water and brace for the apocalyptic dent in the library-publisher market?

        Seriously, I am curious what you think the role of conferences and journals should be.

      5. Re your question: they should be what they were intended to be. Journals as *archivists* of checked knowledge, conferences as *meeting places* of scientists to exchange hot, not yet “archived” ideas. [B.t.w. I like the current ideas to stress the “event aspect” of STOC/FOCS – this is in the right direction.]

        The role and reputation of J&C (journals & conferences) is determined by people running them, and by papers already published in them, not by publishers or umbrella associations. Since J&C are run by volunteers (editorial boards and PCs), it should be no big problem for them to become independent (of publishing houses). It would be enough to turn a small part of the huge “library budgets” into the creation of something like “virtual journal platform(s)” (not to be confused with the “open journal” madness). Editorial boards and PCs of established J&C could just run their journals from these platforms. (Instead of retiring from editorial boards as a protest; the former J. of Algorithms would, e.g., have survived in this case.) Even the names of J&C could remain the same (in case of any copyright issues, just replace “Journal of …” by “Free Journal of …”, or similar.) Of course, long-time archiving is a serious deal, and costs money. But, in view of huge budgets for buying J&C, these costs would be incomparably smaller. If such giants as NSF or other funders were really interested, I see no crucial problems. Temporary dangers, technical difficulties – yes. But, in principle, science publication, just as its production, *could* be run entirely by scientists themselves, without any (extremely costly) intermediaries. Also tax payers would be much happier.

        Unfortunately, the roots of the “library-publisher market” lie deeper: these constitute the core of the whole science organization system. Starting with hiring, and finishing with funding. Be this system bad or not, it works. And nobody is interested in destroying this status quo. After all, this is about a very cardinal change of the entire system, of the whole “publish and perish” ideology, not just of some technicalities. This is why I am very skeptical concerning any changes in this issue. [ Sorry for being so lengthy … ]

      6. Thanks! So it seems you don’t have deep objections but are skeptical. Incidentally, it looks to me as if your sentence “tax payers would be much happier” contradicts “nobody is interested in destroying this status quo”.

  6. I like how practical your proposal is. We often hear about open publishing, but nobody comes up with concrete steps like the ones in your post.

    I don’t know if you are aware of the experiment pursued by Aram Harrow & the TQC committee in 2014. They posted all referee reports on Scirate, which is an ArXiv overlay: https://scirate.com/tqc-2014-program-committee/comments
    I don’t know how successful it was, but it is definitely worth noting.

    1. Thanks! And thanks for pointing out that very interesting experiment, too. I wasn’t aware of it. Here’s what their call for papers says:
      “A new and experimental policy for TQC 2014 will be partial open reviewing. For accepted talks, a digest of the PC discussion will be posted on the website SciRate. The reviews will be signed by the PC as a whole and may not include the entire PC discussion. One of the main purposes of publishing reviews in this way is to help create a resource for quantum information that is similar to the AMS’s MathSciNet. ”
      I like this, though I think that my priority would be to require that submissions are on the arxiv.

  7. This is a bit naive, in my opinion. While this strategy may be best in 90% of cases, you have to allow for other cases. Making a rule “only arXiv papers can be submitted” changes a culture in some cases for the worse. Roughly, it benefits the “rich” at the expense of the “poor”. Let me explain.

    Imagine a graduate student or a mathematician from a third world country who has been working on a problem for a while and has achieved solid progress, but not enough to completely resolve the problem. Should she/he publish the paper? Yes, I think so. People in sciences need to write papers to justify their effort and positions to themselves and others (to get funding, to graduate, etc.) Should he/she give a talk on the result? Sure. The talks rarely contain enough technical details for the listener to reproduce the paper and more exposure wouldn’t hurt. But should she/he publish the paper on the arXiv? Not necessarily…

    It would be better in some cases to continue working on the problem, look for new tools or coauthors who are experts in other fields, which when combined with existing ideas/techniques would resolve the problem. But the arXiv posting effectively makes the paper available to everyone and if the problem is worth the effort, there are excellent people who might already be experts with the missing tools who might be able to resolve the problem and post on the arXiv a week earlier. In fact, even if they don’t follow arXiv, Google Scholar will ring a bell – hey, go read that paper, it’s for you! I have seen this happen to several graduate students – it’s incredibly demoralizing…

    In summary, for people like me who are not all that worried about the competition (and have tenure, anyway), the system you propose is great. It’s nice to have a searchable free database of all existing knowledge. But for the most vulnerable researchers in fields with a high entrance barrier – this system has heavy downsides.

    1. Thank you for your comment. Actually, an intended effect of the reform is to close the gap between “rich” and “poor,” so I am very interested in your further reaction. Before explaining further let me say that the community I have in mind, and the one I am mostly familiar with, is theoretical computer science. Other communities may have different dynamics, though I suspect they do not.

      I claim the current system hurts the “poor” very badly. Suppose you are an outsider from a third world country and you want to start working in hot area X.

      “To learn the background, typically you have to read papers. However, for every paper that you read, it is not uncommon that there is another one which is or was under submission. Indeed, the community is producing great results the majority of which is rejected due to capacity constraints. So unless these works are on electronic archives such as the arXiv, you don’t have access to them.

      Who does? The experts of area X, to whom these papers are sent so that they can be properly evaluated. But it may be hard for reviewers to ignore submissions until publication. Suppose for example you have been working on problem Y for months and now you are asked to review a paper that solves Y. Are you going to ignore this information and keep working on Y despite knowing that you will be beaten? Also, when the paper does come out you’ve had a long time to internalize its implications.

      The edge currently given to an insider over an outsider is months if the paper is accepted right away; it may be years otherwise.”

      (The three paragraphs above are taken from the earlier post.)

      Moreover, I think virtually every researcher is hurt to some extent by the current system, because they are “poor” in at least some area. So the current system reinforces the partition of research into (sub)areas, making it hard for an outsider to leave their own.

      Finally, I don’t understand why, in the current system, appearance in the arxiv would attract people’s attention but publication in a conference or journal would not. Also the “this paper is for you” bell is rung for the experts when they are called to review the paper. I don’t see what should prevent them from working on the problem if the paper gets published in a conference or journal.

      1. Let me reply point by point, in no particular order.

        1) You made a global claim which may or may not be applicable to your field. I am warning against it in my field. Either you have to be very specific about which field you are talking about, or admit there might be problems with other fields where the cultures are different.

        2) In mathematics, conferences play a completely different role. They serve as a means of advertising, usually but not always post-arXiv publication, and job searches are independent of them. No papers are submitted and investigated by a large committee. So no experts are looking at a student’s work because even post-conference they still don’t have access to it. The papers remain non-public when the authors choose so (a minority of them, but a vulnerable minority which needs and deserves some protection).

        3) In many areas of mathematics, the cycle from submission to publication takes 1 to 3 years, enough to finish working on a problem. A paper is refereed by 1-2 people, and only the referees plus the handling editor have access to it. It is considered to be extremely unethical (and easily verifiable) to use the results of a non-public paper you are refereeing to make your own mathematical advances.

        4) “You’ve been beaten” issue. That’s why people in my area like to give talks on an unfinished or otherwise unavailable paper, to make some kind of claim on the problem. If your arXiv+conference scheme is the only way to stake a claim, and your job prospects depend on it, you really don’t leave the “poor” much of a choice. Arguably, this system has its own downsides since occasionally people announce solutions of problems which turn out to be invalid, and it takes the community a couple of years to understand this.

        5) You can’t see the papers that were rejected which the authors don’t want to put on the arXiv for whatever reason – tough bananas! You also can’t hear what people are really thinking when they say “I really liked your talk”. Maybe some papers have bad mistakes, or are awfully written in the hours before the submission deadline, or have underdeveloped ideas. Doesn’t matter. If people want to keep their results private it’s their right and forcing them to do something they really don’t want is unseemly (as I said, it would be ok’ish if the conferences played no role in job or grant applications).

        6) “This creates partition” issue. That’s why your community might want to consider moving away from super competitive STOC/FOCS style conferences, which accommodated the growth of interest by walling themselves off from a lot of neighboring areas, not unlike zoning laws in urban planning. They are the ones who are hurting the poor the most. And I hear some senior people already advocate for changing that system. You cure the illness, not the negative effects.

        In summary, you might be right that in some ways the current system also hurts the “poor”, at least in your field. That’s clear and beside my point. My point is that changing it will solve some issues but create some others. There are always unintended consequences, but in fact the scenario I presented is completely expected. Since your blog post is adamant about having no downsides without being specific about the field of study, I thought I would point this one out.

    2. Igor, I would slightly disagree: when the “rich” (experienced but not fair) scholars want to “get it” after reviewing the paper – this is even easier when staying “anonymous”. After a partial result is loaded to arxiv, they have a problem then.

      But I fully agree with your claim that it is much better to look for experts *directly*, not via arxiv. At least because then the expert with new ideas and I with my old ones are producing a *new* result, not just an improvement of an arxiv … version.

      Yet another technical problem I see is the impossibility to REMOVE the paper from arxiv. So, if I *must* load the submission to arxiv, and conference/journal referees find a crucial bug – what then? Should I then not better put my paper for 2-5 years in a drawer, check, re-check many times … to avoid such a “disgrace” ( a label “the paper is buggy”)?

      1. Re the technical problem: I am not sure why that would be a disgrace: there are many examples of serious research where bugs were later discovered, even on the arxiv. On the other hand, finding a bug is immensely easier if the paper is public.

      2. Re your last sentence: agree. But being “public” and being “published” is still different, I think. Say after journal/conference referees have found a bug, I still have an option to withdraw the paper. But no withdrawal is allowed in repositories like arxiv, ECCC, etc. Frankly, I do not understand this behavior: why not just leave a message “the paper had a bug and is withdrawn by the author”, instead of keeping this buggy paper “published” forever? What is the rationale here?

      3. Under “nobody” I, of course, meant “those who could change”. But also researchers themselves, to some extent: people have already adapted themselves to this system. So, even your much more “innocent” suggestion (all publications in repositories, J&C just give badges) will, I am afraid, be seen as an extremely radical change … Even signed boycotts by prominent scientists were able to achieve much less. Still, now we at least can read, say, TCS or JCSS older papers for free. I hope such actions will not stop, and we will achieve a bit more changes, step by step. Most importantly – in the entire ideology of current “science organization”.
