My original PWLConf reading list on patch generation

I loved speaking at the inaugural Papers We Love conference, co-located with Strange Loop, in the disarmingly cool city of St. Louis. I’d never been to or spoken at a PWL event before, and I had a great time getting to know the community. The basic idea is a bunch of meetups where participants (a mix of industry and academic types) present and discuss academic papers that they, well, love.

I of course violated the ethos by speaking in part about my own work in the context of an introduction to patch generation research. Whoops. In my defense, I sang the praises of not-my-work when I discussed recent results from Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury (all from the National University of Singapore) on Angelix, a new technique for multiline semantic program repair (a paper that I do, indeed, love).

Anyway. When I was first asked which papers I would touch on in my talk, I was like HMMMMMM what is the ESSENTIAL reading list for a person who has no prior exposure to this area and yet wants to learn ALL OF THE THINGS I MIGHT EVER TALK ABOUT. And so I produced a 10.5-item bibliography (that left a lot out!). Brevity: Not My Thing.

The organizers gently suggested that I slow my roll, and so I cut it down to the 2.5 papers I actually discussed in any detail, though I did touch on the work in many others. I’ve been asked to post the more extensive list, so here we are. 

Caveats: other bibliographies track work in automated program repair, are kept up to date, and are significantly more complete than my list. My list was constructed at a particular moment in time to support a particular type of talk on source-level patch generation for bug repair (ignoring dynamic/web languages and contract-based repair entirely); it thus should not be treated as the gospel word on “the Important Papers in Program Repair.” Instead, think of them as “the papers Claire pointed to as a reasonable overview that fit well within the context of a talk she gave on the subject.” That is: if I left your favorite paper out, it’s not because I think it’s not important.

Finally: if you like this stuff, consider grad school. And check the box next to PhD in Software Engineering when you apply to CMU.

Reading strategies: As context, patch generation is generally conceptually divided into “heuristic” strategies and “semantic” strategies, but there is a general sentiment in the community that they are beginning to merge. The challenges in the space traditionally break down into scalability, output quality (are the patches good?), and expressive power.

Based on interest: if a reader wants…

  • …to read a SINGLE paper to get the flavor, read (1), which overviews one of the first salvos in heuristic repair in a relatively accessible way. If she plans to read anything else, though, (1) is probably not worth the trouble.
  • …a more comprehensive picture of heuristic repair, read (2) and/or (3) (2 gives a complete story; 3 is more recent and has cooler experiments) and then at least (6) (and 6a, if she’s feeling completist). If the reader is on a tear she should read (7), too, especially if she cares about Java.
  • …an introduction to semantic repair, read at least (5), and possibly (4). If both are read, read (4) first. I included (4) at least in part because it has a really beautiful background section on the synthesis technique that underlies the family of approaches, so it may be worth referencing that background section while reading (5) instead of reading them both. They link other papers in this line of work at their website.
  • …human-rated assessments of patch quality, read (7), and/or (8).   
  • …to know about using human patches to inform the construction of patches, read (6) and (7). Possibly in the other order. There’s more like this, especially recently, but I’m trying (and failing) to be selective!
  • …to know about the ways in which semantic and heuristic approaches are starting to come together, read (9) and maybe also (5). The evaluation in (9) is based on the work in (10), which may not be quite worth reading on its own for this hypothetical reader, but as I included some results from it, I thought it worthwhile to list.

Based on number of papers to read (these suggestions shouldn’t be taken as gospel, but I did my best to make it sensible):

  1. (1)
  2. (2) or (3), (5)
  3. (2) or (3), (5), (7)
  4. (2) or (3), (5), (7), (9)
  5. (2) or (3), (5), (7), (6), (9)
  6. (2) or (3), (5), (7), (6), (9), (8)
  7. (2) or (3), (4), (5), (7), (6), (9), (8)
  8. (2) or (3), (4), (5), (7), (6), (9), (8), (10)

Papers (I’ll add links when I’m not on my iPad!): 

  1. Westley Weimer, Stephanie Forrest, Claire Le Goues and ThanhVu Nguyen. Automatic Program Repair with Evolutionary Computation. Communications of the ACM (CACM) Vol. 53 No. 5, May 2010, pp. 109-116.
  2. Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest and Westley Weimer. GenProg: A Generic Method for Automated Software Repair. IEEE Transactions on Software Engineering (TSE) 38(1): 54-72 (January/February 2012)
  3. Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest and Westley Weimer. A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each. International Conference on Software Engineering (ICSE), 2012: 3-13.
  4. Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. SemFix: program repair via semantic analysis. International Conference on Software Engineering (ICSE), 2013: 772-781.
  5. Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. Angelix: scalable multiline program patch synthesis via symbolic analysis. International Conference on Software Engineering (ICSE), 2016: 691-701.
  6. Fan Long and Martin Rinard. Automatic patch generation by learning correct code. Principles of Programming Languages (POPL), 2016: 298-312.
    (This builds on (6a) Fan Long and Martin Rinard. Staged program repair with condition synthesis. Joint Meeting on Foundations of Software Engineering (ESEC/FSE), 2015: 166-178. …Which is also what the Angelix paper compares to, so if you’re feeling adventurous/completist, you might read both.)
  7. Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. Automatic patch generation learned from human-written patches. International Conference on Software Engineering (ICSE), 2013: 802-811.
  8. Zachary P. Fry, Bryan Landau, and Westley Weimer. A Human Study of Patch Maintainability. International Symposium on Software Testing and Analysis (ISSTA), 2012: 177-187.
  9. Yalin Ke, Kathryn T. Stolee, Claire Le Goues, and Yuriy Brun. Repairing Programs with Semantic Code Search. In Automated Software Engineering (ASE), 2015: 532-543.
  10. Edward K. Smith, Earl Barr, Claire Le Goues, and Yuriy Brun. Is the Cure Worse than the Disease? Overfitting in Automated Program Repair. Joint Meeting of the European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2015: 532-543.

Things I Keep Repeating About Writing

I often write papers with students, or read students’ papers to provide comments, and I find myself saying the same things over and over, especially the first time out.*  So: here’s a blog post I can point them to that will (hopefully!) save us all some time and trouble. I plan to update it as I remember more things I say repeatedly.

I’m happy to argue these points, and take suggestions to expand the list.  I’m not claiming that I’m the world’s foremost writing expert, and some/many of these are the product of relatively arbitrary preference.  But, (A) this is targeted first and foremost at my  own students, so my preferences matter, and (B) I’ll try to justify when I can.

This list isn’t a complete delineation of all the rules of English grammar.  Follow those rules too, even if they’re not on this list.

* These are not the kinds of comments I typically make when reviewing, where I focus less on style.

Use clear and precise language.

Use short, declarative, active sentences.  BANISH THE PASSIVE VOICE.  If you went to an American high school, you probably need to retrain your instincts.

Use adverbs and pronouns judiciously:

  • Adverbs are often imprecise: what does “incredibly” add to the phrase “incredibly important” that the word “important” lacked on its own?  How much more important than important is something that is incredibly important?
  • Pronouns are often unclear with respect to their antecedents, which can confuse the reader.

Be as explicit/concrete in your statements as you can.  This is perhaps best illustrated by example (courtesy Yuriy Brun): Instead of “The dataset has a few attributes.”, say “The dataset has 22 attributes.”  Avoid descriptors like “a number of” or “several”, which rarely add meaning.  Instead of “We performed a number of experiments.” or “The cat had a number of lives.”, try “We performed four experiments.”, “The cat had nine lives.”

(To highlight the point, consider the sentence(s) without “a number of”: “We performed experiments.”/”The cat had lives.”  See how the meaning didn’t really change?)

Related: do not use more syllables than necessary. 

Two easy manifestations of this rule are the following transformations that can be applied universally to your draft:

  • “In order to” –> “To”
  • “Utilize” –> “Use” (unless in the context a discussion of CPU utilization, where it’s reasonable).

The point of writing is to communicate an idea.  Using more syllables than necessary obscures the idea without adding meaning.

Present numbers properly.

Write out in letters all positive numbers less than or equal to 10, unless they are in a sentence with a number greater than 10 (ETA: like 110, which makes this sentence comply with my rule).  I don’t know why.

Right justify columns of numbers.  I will repeat this in all-caps, because I really mean it: RIGHT JUSTIFY COLUMNS OF NUMBERS.  Ensure that the correct number of significant digits is used (your stats package is giving you waaaaay more than is appropriate), and that decimal points align.

You will argue with me about this, because you really want to left-justify or center them.  I don’t know why.  A reader should be able to quickly scan a column of numbers to get a sense of magnitude, and cannot do that if they are left-justified unless they are all (coincidentally) the same order of magnitude.

Text in columns should be left justified.  Never center anything that’s not a column header.

Typesetting/copy-editing minutiae.

(On all of these, the answer to Why? is usually: Because.)

Capitalize Table, Figure, and Section.  Refer to sections only, never subsections, even when you’re referencing an actual subsection (e.g., Section 4.1, not Subsection 4.1).  Include a non-breaking space (~) between the word Figure/Section/etc. and the \ref.
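A minimal sketch of what this looks like in latex (the label names are invented for illustration):

```latex
Further details appear in Section~\ref{sec:details}, and
Figure~\ref{fig:overview} summarizes the approach.
% The ~ keeps "Section"/"Figure" and its number on the same line.
```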

Capitalize and punctuate section/paragraph headings/captions consistently.  If one ends with a period, they all should.

Do not use citations as nouns.  No: “In [14], Hazelwood et al. describe facts.” Yes: “Hazelwood et al. [14] describe facts.” (H/T Kim Hazelwood)

Citations go before punctuation, with a non-breaking space between the word and the citation.  Footnotes go after the punctuation, with no space.
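For instance (the citation key and footnote text are invented for illustration):

```latex
Prior work studies patch maintainability~\cite{fry2012issta}.
This matters in practice.\footnote{The footnote marker attaches
directly to the punctuation, with no intervening space.}
```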

An em-dash is three dashes in latex.  You use these to offset text, like a parenthetical but without parentheses (I’d give an example but wordpress converts my triple-dash into an em-dash automatically so it’s hard to see!).  An en-dash is two dashes and is only used for ranges (like page numbers).  A single dash is used in hyphenated words.  You probably don’t need to hyphenate compound words nearly as often as you think you do. No spaces around dashes; sometimes a space after a hyphen, depends on the circumstance (e.g., pre- vs. post-condition).
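Since wordpress eats the triple dash in body text, here is a preformatted sketch instead (the sentences are invented for illustration):

```latex
Repair tools---at least the heuristic ones---search a space of edits.
See pages 54--72 for details.
We distinguish pre- and post-conditions.
```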

(The actual rules for dashes and hyphens and compound phrases are complex, so beyond that I’ll punt to another website instead of typing them all out.)

Abbreviations should include appropriately placed periods, that is, after every shortened version of a word.  So “also known as” is abbreviated “a.k.a.”; versus is abbreviated v. or vs.; “et cetera” is abbreviated “etc.” (a mistake I made in the first draft of this document!).  Et al. is another one, and a pet peeve (period after the al., which is short for alii, not the et, which just means “and” and isn’t shortened). Et al. should not be italicized, though I took some convincing on this. It should be separated from the preceding name with a non-breaking space.

Always put a comma after i.e. and e.g., and use them properly (i.e. means “put differently” or “in other words”, e.g. means “for example”).

It didn’t initially occur to me to include “use the Oxford/serial comma,” because doing so is so obviously correct.

Make your figures and tables maximally readable.

Do not hit the page limit by shrinking your tables and figures.  Assume your reader is old, blind, lazy, and also colorblind.  Print out your paper at least once on physical paper and make sure you can read the figures and tables.  I do actually complain about this when reviewing.

Choose colors for graphs and figures that show up when your paper is printed in greyscale.  Go to ColorBrewer and choose “colorblind safe” and “print friendly” to find color combinations that work.

Use booktabs for tables.  They look so much nicer and internal rules do not actually increase readability.
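A sketch of what this looks like, with numbers right-justified per the earlier rule (the numbers themselves are made up for illustration):

```latex
\usepackage{booktabs} % in the preamble

\begin{tabular}{lrr}  % l = left-justified text, r = right-justified numbers
  \toprule
  Technique & Bugs fixed & Cost (\$) \\
  \midrule
  Tool A    &         55 &      8.00 \\
  Tool B    &          5 &     12.50 \\
  \bottomrule          % note: no vertical rules, no internal horizontal rules
\end{tabular}
```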

The default font size for labels on graphs coming out of basically any package (Excel, R, etc.) is too small.  Don’t let the defaults boss you around.

Use latex, bibtex, and version control in a way that makes your advisor happy.

There are myriad differing opinions on this; of all the “rules” on this page, these are almost certainly the most CLG-specific.

Naming.  Name your .tex file (and project/directory) something more informative than “paper”.  Reasonable schemes include but are not limited to:  “lastname-projectname-year”, “projectname-venue-year”, “lastname-venue-year”.

Version control. I prefer to collaborate using git, mercurial, or svn, through a hosted repository.  github or bitbucket are fine.  My username basically everywhere is clegoues.

Do not check in byproducts of the build process, including the PDF. If you do, we will conflict every time we commit, which is annoying.
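One way to enforce this is a .gitignore along these lines (a sketch; adjust for your particular build):

```
# latex build byproducts -- do not commit
*.aux
*.log
*.bbl
*.blg
*.out
*.fls
*.fdb_latexmk
*.pdf
```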

Because I like to use git/hg/svn, I strongly prefer hard line breaks throughout a document.  My editor default is 80 characters.  Fewer than that is fine; longer gets silly.  Some people like to line break at the end of sentences, which I think is weird but preferable to no line breaks at all.  Note that I don’t “rewrap” unless things get crazy.  The point is just that if lines are roughly 80 characters, line-based diffing and merging (as done by git/hg/svn) works pretty well and simplifies collaborative editing.  If paragraphs are all one long line, merging becomes substantially more difficult.

Tools. I prefer to write papers in emacs and will add a Makefile to your directory, and then build the paper using “make”.  You can do the same, or use whatever other editor/tool you like.

I tend to dislike shared latex editing sites like sharelatex, but make allowances, especially when there are fewer than three collaborators.  I prefer those options to emailing a Word document around.  I prefer that to those WYSIWYGs that generate latex, which I won’t use. Google Drive is OK for early drafts, but I’d generally prefer we just skip to the latex.

Latex. I prefer latex documents to be structured as “all one file” rather than having sections or subsections in multiple latex documents and inserted via \input.  Dissertations/theses are a reasonable exception.  I compromise on this based on the preferences of my colleagues, but given a choice…

Leave space/a subsection/a paragraph for acknowledgements at the end so we can acknowledge sponsors without having to panic to make space right before the camera ready.

Bibtex. Give your bibtex entries reasonably indicative names.  If you cut and paste an entry from the web somewhere, ensure that it’s done properly (some sites make everything a @misc, which is almost always wrong) and modify the bibtex so that it’s reasonable.  Definition of reasonable: special characters are copied properly; authors’ names and title are spelled/capitalized correctly (don’t forget non-breaking spaces where relevant, like in your advisor’s last name…); includes venue, preferably spelled out along with its acronym, but you can drop the “Proceedings of the 23rd Annual ACM/IEEE blah blah” in favor of just “International Conference on Software Engineering”; includes year and page numbers.  The rest is mostly optional.
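As a sketch, here is what a reasonable entry for paper (2) in the reading list above might look like (the citation key is my own invention; note the braces protecting the compound last name and the tool-name capitalization):

```bibtex
@article{legoues2012genprog,
  author  = {Claire {Le Goues} and ThanhVu Nguyen and Stephanie Forrest
             and Westley Weimer},
  title   = {{GenProg}: A Generic Method for Automated Software Repair},
  journal = {IEEE Transactions on Software Engineering (TSE)},
  volume  = {38},
  number  = {1},
  pages   = {54--72},
  year    = {2012},
}
```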

Fair warning: I tend to insert broken bibtex cites as I write to remind myself/you to put references in appropriately.


(Shout out to the numerous others who commented/made suggestions/nit-picked my own copy editing, with especial thanks to Kim Hazelwood and Yuriy Brun, two of the only computer scientists I’ve ever met who are bigger sticklers than I am on grammar/typesetting.)

The ICSE submission cap, 2016 Data Edition.

[ETA: the Townhall will start at 5:45 in the Glass Oaks room, and will include time for discussion and questions, as well as a panel of experts and senior members of the community who will address the issue.]

I’ve been corresponding with Tom Zimmermann of Microsoft Research, who has analyzed ICSE 2016 submission data along the lines I explored in my previous post. Tom received permission to share his analysis, and subsequently gave me the same, as an update to augment some of my previous observations with more recent numbers (one year’s worth of data is not exactly satisfying).  As Tom has more information from 2016 than I did for 2015, he could make somewhat more nuanced observations about the potential impact of the cap (though naturally we don’t know what would actually happen; incentives are weird).

For completeness, note that the official number of submissions to ICSE 2016 is 530; this/Tom’s analysis is based on the data for approximately 500 of them, omitting at least the desk rejects.  That said, in 2016:

  • 42 of ~1482 authors (approximately 2.8%) submitted 4+ papers. Those 42 people co-authored 142 submissions in total.
  • The 42 “high volume” submitters  co-authored with 364 other people, or 25% of the submitting authors. This totals 406 submitting authors (~27%) potentially affected in some way by the 3-paper policy.

There are several high-level possibilities regarding how many papers would be blocked by the three-submission cap[1]:

  • In the most conservative scenario, the cap prevents 16 submissions from ICSE 2016 (142 − 3×42), saving 0.53 reviews for each of 90 PC members, and 1.14 for each of 28 PB members (assuming each submission gets to the final round). This scenario counts each paper once, even if its author list includes multiple high-volume submitters.
  • If we account for multiple frequent submitters in the author lists, the policy could save up to 75 submissions.
  • The true value (assuming no other changes in behavior, which is dangerous) is likely somewhere in between (unless we all  start submitting up to the cap, which could happen).  If we assume that high volume submitters would have perfect oracular knowledge to only submit those papers that were ultimately accepted, the number of papers saved is around 50 (or a paper and a half per PC member).

Tom also divided submissions into two groups:

  • Group A: Submissions where ALL authors have at most three submissions (unaffected by ICSE 3-paper policy)
  • Group B: Submissions where at least ONE author has four or more submissions  (affected)

Interestingly, it looks as though submitters with more papers (those with papers in group B) are more successful at getting papers accepted (ICSE 2016 had a 19% acceptance rate overall, for context):

Group                       Accepted  Submitted  Rate
A (not affected by policy)        64        358   18%
B (affected by policy)            37        142   26%

IMPORTANT CAVEAT: CORRELATION DOES NOT EQUAL CAUSATION. [ETA: and I’m not making any claims about statistical significance, not that one conference’s worth of anecdata should be enough to set any conclusions in stone anyway.] That is: I’m not saying that you should submit more papers to increase your acceptance rate. Rather, it seems plausible that high-volume submitters are established, active members of the community with many collaborators, and are thus perhaps more likely to write high-quality papers. Indeed, the overall picture suggests a positive correlation between number of papers submitted and acceptance rate: authors submitting 4-5 papers had the highest success rate in 2016.

One final point of interest, at least to me, is that the policy seems to affect submitters from some countries more than others [2]. In particular, submitters from China, Canada, Singapore, and Hong Kong were more likely in 2016 to participate in papers with high-volume submitters.

I’m not drawing firm conclusions, here; I’m just adding more recent data to the previously discussed numbers.  I’m frankly too tired to do anything beyond that at the moment (I blame my very minor jet lag!).  I thought it relevant to post now, however, given the panel scheduled during ICSE dedicated to this issue.  I believe (though am uncertain) that this panel is scheduled during the Town Hall session, Wednesday evening.  I will update this post again if I hear otherwise.

[1] Note that this doesn’t reduce the reviewing load overall; these papers will still be submitted/reviewed by the community, just not by the ICSE PC.

[2] If I get the time in the next day (keynote tomorrow morning!), I’ll update this post with a table to that effect; I have the data, I just need to make it legible in blog form!


A reluctant ICSE submission cap post or: an exploration of primary sources

(Context: I was on the ICSE 2016 PC and I am on the ICSE 2017 PC.  I have never submitted more than three papers to ICSE.)

There has been much recent brouhaha in the software engineering research community about the new 3-submissions-per-any-individual-author cap imposed by the ICSE 2017 organizing committee.  I’ve been resisting wading into this, but the recent email sent by the PC chairs (for whom I sincerely have nothing but the absolute highest respect) to the PC notably invites/welcomes respectful discussion on this and any other policy.  It also includes the following:

Some detractors have been vigorous in their opposition, but we can point out that the policy has both its justification and (mostly silent) supporters.

This sentence has been rattling around in my head. Honestly, I’ve been finding those noisy detractors fairly convincing, though I’m fundamentally sympathetic to the plight of PC chairs in general. That said, I was bothered by the “silent supporters” and “existence of justification” claims absent further expansion.

There are various types of evidence that could be presented based on data from previous ICSEs that might convince me that the cap is a good solution to an important problem.[1] For example, if this policy might cut the review load by a third, or if the majority of submissions receiving Cs or below came from bulk submissions, I might come around.

Reflecting on this, I thought, well, maybe that data is available. I wouldn’t want to claim otherwise if I simply hadn’t looked hard enough. The email to the PC, which I’m trying not to quote too liberally because even bulk emails are private correspondence, includes:

The potential problem of bulk submission to ICSE was first documented by the chairs of ICSE 2015, and the idea of limiting the number of submissions was then suggested as a potential solution. All this information is publicly available:

To the primary sources, then.  The basic claim is that the review load is increasing beyond the bounds of scalability and that this cap is important to improving either the quality of submissions or the sustainability of the ICSE review load. From the blog post articulating the policy publicly:

One of the main reasons for this policy is that every year more people submit more papers, but the pool of qualified reviewers willing to make the necessary commitment does not grow in proportion… We are stuck in a vicious cycle.

Long story short: it’s not evident to me, based on the 2015 Technical Report (which details a system with an actual physical meeting, which should scale less well than 2016/2017’s Program Board model), that we are in such a cycle.  Three bullets stand out from the Executive Summary:

  • Page 2, under expertise and quality: “- Reviewing expertise was higher than previous ICSEs (as reported by authors and reviewers)”
  • Page 2, under reviewing load: “-90% [of reviewers] agreed that the load and schedule was manageable”
  • Page 3, under “Brief reflection from the chairs”, which includes suggestions on what to “keep”, “refine”, “explore”, or “drop”: “Drop: Panic about scalability of reviewing process.”

That is: the reviews were good, the load was manageable, and all the Sturm und Drang about review process scalability is unwarranted.  These points are expanded in the document proper.  From Page 32, Section 4, Reflections from the Chairs (emphasis original):

Balanced process. We believe that, given the number of submissions and its rate of growth, the reviewing model we used this year struck a good balance between maintaining a manageable load for reviewers…Note that simply adding more RC members to contribute reviews in the first phase would make the process scale further.


No reason to panic about scalability of process. …when considering the 452 submissions for this ICSE, the growth may be linear but with a very small coefficient. This clearly requires close monitoring in the future but no dire measures.

This sounds like we have/had a sustainable model that may be scaled by adding a small number of committee members[2] and shifting the load somewhat.[3] It does not sound like a vicious cycle.

2015’s chairs do propose an exploration on bounding the number of submissions per author. On page 35:

Bounding the number of submissions per author. It may be worth exploring whether defining a maximum number of submissions per author would help to curb the abusive shotgun approach to submissions and encourage authors with multiple collaborations to submit just their best work…enforcing a limit of three submissions per author would reduce the number of submissions by approximately 8% and we conjecture that the program will not suffer.

That amounts to 1.44 fewer papers on average, for those of us reviewing the max of 18.  This is basically marginal. I have served on three PCs with review loads of 18+ (and several smaller venues) over the last 12 months.[4]  I wouldn’t complain about a reduction, but the difference between 18 and 17 does not determine whether I accept an invitation.

The final claim is that this policy will encourage authors to submit their best work.  But: 2015’s chairs say that only 34 papers would have been blocked by the cap.  Even if they were all rejected, so were 334 others.  Table 3 outlines the score distribution: 220 submissions overall received only Cs or Ds. If we assume that every single one of those 34 papers is terrible (which the report authors do not), they could constitute no more than 15% of the terrible papers, leaving 186 other terrible papers to review. That’s a third of a terrible paper less to review per PC member, which again is marginal (and represents a best case).  I’d bet that most terrible papers are submitted by authors on only one submission, simply based on the underlying distribution.

Somewhat on that subject, from Section 2.4:

The large number of submissions for some authors may be a reflection of an undesired and costly shotgun submission pattern, but it also seems to be associated with authors that carry many active collaborations. We have not enough data to tease this out further.

This section does not provide information about the number of submissions per co-author, nor the number of acceptances per bulk-submitting author. It does say that the vast majority of authors (their words, not mine) are on only one submission, which concerns me because it means that capping prolific submitters can outright block submissions by their collaborators.

tl;dr: Based on the 2015 report, which is the only ICSE- or even SE-specific document I’ve seen cited in this discussion, I do not see the strong motivation for a hard submission cap.  I’m interested to know if the 2016 data is notably different.  Adding overhead to bulk submission may be reasonable, and other venues have done this.  As a person who serves on multiple PCs, I like the idea of having authors who resubmit a paper include the previous reviews and a list of changes (…on the other hand, reviewing an identical paper a second time has the definite benefit of being easy…).

The PC chairs have stated that there is justification for this policy and have promised an FAQ on it.  I believe them, and if my opinion is worth anything, it would be wonderful  if that FAQ contains empirical evidence to substantiate that this policy will (A) be impactful in improving the quality of submissions/review experience, while (B) not harming the submission prospects of graduate students or other junior collaborators. The evidence cited so far is incomplete, but what it contains is not fully convincing on these points.

[1] As part of our decision to implement double-blind review, my PC co-chair for SSBSE 2014 (a much smaller symposium, it’s worth mentioning; I’m not at all claiming it’s equivalent to ICSE. Though I do think the latter should also do double-blind, FWIW) insisted we write a blog post justifying the decision. My first reaction was “…uh, because we’re the PC chairs and we said so?” But I did it, because Shin was right.  My opinion on double-blind review is informed by the substantive body of work on implicit bias in general and the study of academic double-blind review in particular, and I highlighted some of that literature in the post. I don’t believe we convinced everyone, but I do know anecdotally that members of the community appreciated our data-driven approach to conference organization decision making.

[2] This year’s chairs mention that it is difficult to fill a PC.  2015’s chairs speak favorably of the benefits of recruiting and cultivating junior community members (like myself, which I greatly appreciate, bloviating blog posts notwithstanding), mentioning that “This is, in our opinion, a key way to educate future leaders in the ICSE community and, as such, it has a great value for the community.”

[3] I do like a two-phase review system. I reviewed the same number of papers for ISSTA as I did for ICSE in 2016, but in two phases, and it felt much more manageable.

[4] I review a lot.  I’m learning to say no.  It’s a process.

Opinions my own, not my employers’, etc etc.

What The Bachelor Teaches Us About Choosing a PhD Advisor

Two of my ongoing professional quests are to provide insight into the processes of CS academia to those who would benefit from it and to increase the number of people who meet me and say “Oh, I know about you!  Jean Yang mentioned you on her blog!”  To those ends, over at her blog, Jean and I collaborated on some advice to prospective CS PhD students choosing between potential advisors, with lessons from our favorite reality TV show.

Free/Fair? Or: A Somewhat Bizarre Request to Fellow Harvard Alums

Hey, fellow Harvard alums: This year, when you get a ballot for the Harvard Board of Overseers Election in the snail-mail, instead of throwing it away without looking at it: don’t. Instead: vote.

The Board of Overseers is a group of 30 individuals each serving 8-year terms. Harvard says:

Drawing on the wide-ranging experience and expertise of its members, the Board exerts broad influence over the University’s strategic directions, provides counsel to the University leadership on priorities and plans, and has the power of consent to certain actions of the Corporation.

The “Corporation” is the president and fellows, who approve the university budget, major capital projects, endowment spending and tuition charges, etc. The board is elected by anyone with a Harvard degree who actually votes on those ballots they send to us every year that we then throw out.

This year, there is a slate of 5 individuals running on a platform they’ve dubbed “Free Harvard, Fair Harvard.” It’s confusing for approximately 155 reasons, not the least of which being that it includes Ralph Nader. On face, they argue for the following:

  1. Increased use of endowment income to make Harvard more accessible
  2. More transparency in Harvard admissions

Plank 1: they want to eliminate tuition.  This is the major point of coverage in the popular media.  The argument is: (A) Harvard can afford it, (B) doing so would increase access and applicant diversity, because students from under-represented socio-economic groups would be more likely to apply, and (C) eliminating tuition would have wide-reaching social effects by spreading to other institutions.

(A) is mostly false. Endowments aren’t liquid. The endowment is not a checking account, and the size of the endowment/return is not indicative of the amount of money available for tuition support. (I acknowledge the lack of objectivity on the part of the University spokesperson, but it’s still true.) Yes, the University should restructure the endowment to free up more money for tuition support and outright reduce tuition, and solicit donations just for this purpose. But: that’s not the same as using endowment returns to make Harvard free, overnight.

(B) is almost certainly true.

(C) is unlikely. The potential impact is limited to schools with sufficient resources to do the same, of which there are only a handful. Offhand (based on no information) just lowering tuition might have more impact, because it’s more realistic for other schools to follow while still being a legitimate kick to the system (fun fact: scholarship dollars spent is a factor in USN&WR’s rankings algorithm, so raising tuition and scholarships simultaneously is good for rankings).

Increasing access and encouraging the underrepresented to apply are awesome goals. Current stated tuition is obscene. It was obscene when I attended, and it’s worse now.

That said: Why on Earth should Harvard be free for everyone? It’s often reasonable to exchange money for services.  More than 30% of the class of 2019 comes from families making more than $250k a year.  Even though the financial aid office certainly makes strange need calculations at the boundaries, I’m willing to bet that many students supported by families making >= $250k/year can afford to pay some amount of money for their Harvard educations.

On the one hand, this is a chicken/egg problem: if there were greater access, less of the Freshman class would be so wealthy. On the other hand, it’s total overkill: It makes a Harvard education free for people who don’t need it to be, to solve what is effectively a marketing problem. Instead, Harvard could spend that money on reducing tuition overall, performing aggressive outreach, and ensuring that students from underserved populations are able to take full, effective advantage of their Harvard educations.

Plank 2 is more complicated: “more transparency in admissions.”

(I’ve found it difficult to write about this in a brief or linear fashion, so I’m trying to scope my remarks, with sadly limited success.)

First, note that comments from the slate tie this plank into a call to end legacy preferences (this is apparently Nader’s concern, though I can only find second-hand quotes from him on it), which would be awesome.  Legacy is ridiculous, a fact that is so obviously true I won’t bother citing it. However, based on no information, it’s not clear to me how Harvard can maintain the alumni giving network that sustains the endowment without an implicit/explicit understanding about legacy.  This sounds awesome on paper but is probably impossible and not the real point anyway.

Back to that real point: At a high level, “more transparency in admissions” is related to evidence that Harvard and other elite schools discriminate against Asian and Asian American candidates. Ron Unz, the slate’s organizer, says this outright, and his language exactly matches that of the plaintiffs in an ongoing lawsuit against the university demanding admissions transparency and asserting discrimination against Asian applicants.  Various members of the slate are involved in some way in that lawsuit.  This is a legitimate concern, or at least the evidence that substantiates it is pretty compelling (Unz’s link to the Economist summarizing the claim is broken, but you can find some summary data on his Free and Fair blog post).

Indeed, if discrimination were the only concern, I wouldn’t be so alarmed. But the evidence suggests that it’s not the only issue at stake. Additional fuel for this suspicion comes a bit sideways, but consider the way the campaign tries to explain the combination of planks via the NYTimes:

If Harvard omits tuition fees, more highly qualified students from all strata of the society will find opportunity to apply. Similarly, the university authority will find ease in balancing classes for racial or ethnic diversity and the Asian-Americans won’t lose out.

Uh…what? The conclusion only follows if Harvard were discriminating against Asian applicants because of a lack of socio-economically diverse applicants. I’m not saying it’s OK to discriminate against Asian/Asian-American applicants, but rather that there’s no obvious connection between that discrimination and a homogenous applicant pool. Increasing access and pool diversity can’t help on its own, because the two thoughts are disconnected. Either the candidates are stupid (unlikely), or something else is going on.

Going with the latter: With the exception of Nader, the public intellectual history of the candidates demonstrates strong opposition to affirmative action/consideration of diversity in college admissions. I’ll just give immediate and obvious evidence to that effect, in the interest of (dubious) brevity:

  • Unz writes in an article on The American Conservative:

    Conservatives have denounced “affirmative action” policies which emphasize race over academic merit, and thereby lead to the enrollment of lesser qualified blacks and Hispanics over their more qualified white and Asian competitors; they argue that our elite institutions should be color-blind and race-neutral. Meanwhile, liberals have countered that the student body of these institutions should “look like America,” at least approximately, and that ethnic and racial diversity intrinsically provide important educational benefits, at least if all admitted students are reasonably qualified and able to do the work.

    My own position has always been strongly in the former camp, supporting meritocracy over diversity in elite admissions. But based on the detailed evidence I have discussed above, it appears that both these ideological values have gradually been overwhelmed and replaced by the influence of corruption and ethnic favoritism, thereby selecting future American elites which are not meritocratic nor diverse, neither being drawn from our most able students nor reasonably reflecting the general American population.

    I could unpack that paragraph for days, but instead I’ll just assert the obvious: Ron Unz doesn’t like affirmative action. Which I refuse to scare quote even though he does.

  • Lee Cheng filed a brief in Fisher v. UT Austin supporting the challenge to the UT’s system, arguing against taking race into account in admissions decision making in almost all cases.
  • Stuart Taylor Jr. is the author of “Mismatch: How Affirmative Action Hurts Students It’s Intended to Help, and Why Universities Won’t Admit It.” Which is exactly what it sounds like.
  • Stephen Hsu, who gets my non-sarcastic vote as the most reasonable of the non-Nader bunch, has argued repeatedly for “Merit, not Race in College Admissions.”  I tangent on merit below.

Regardless of the stated platform, evidence in the form of their actions and writings in other forums suggests that these candidates are strongly anti-affirmative action/admissions diversity consideration.  Their calls for transparency in admissions are closely linked both intellectually and legally to their challenges to the use of race in admissions in this and other contexts.  Although aspects of the current admissions process are legitimately problematic, swinging to the other extreme that these candidates have advocated in many other places is probably a bad move.

However, even if they truly only want transparency and are not setting out to end diversity considerations in admissions, or even if I agreed with that goal, I still wouldn’t vote for them. Here’s why: Unz in particular has strongly intimated that he seeks to enact dramatic, destabilizing change from a position on the board.  An overactive/radical board is basically the worst possible way to run a university.  In a stance that is admittedly influenced by my position as a faculty member at a major academic institution (and a UVA grad), I believe that shared governance, consensus building, and careful, nuanced reasoning by conscientious people is a much healthier way to enact change in the university environment than a few angry bros dictating from on high how Harvard should do anything.  To be honest, I also don’t mind grass-roots revolution; I’m just opposed to the “angry bros on high” model.

Executive summary: the Free Harvard, Fair Harvard slate is advocating free tuition and more transparency in Harvard admissions. Free tuition is impossible and also stupid, and the “transparency” claim is very likely a smokescreen for a vehemently anti-affirmative action/diversity agenda. Even if it’s not (or even if it is and you think they’re right), the slate is advocating for destabilizing change to come from the Overseers, which is no way to run a university. Thus, I advocate that you vote for someone else in the upcoming board election.  Or, educate yourself and decide you disagree with me and vote for them anyway, but regardless: vote.  And either way, don’t be misled by Ralph Nader’s presence on the ballot.

A meritocratic aside: “Meritocracy” makes my skin crawl because tl;dr Silicon Valley Dudebros.  But skin crawlies is no basis for policy, so: Hsu advocates in the above-linked NYTimes article for a “strictly meritocratic” admissions model based solely on GPA and SAT scores.  On its own, though, merit just means “the quality of being particularly good or worthy, especially so as to deserve praise or reward.” (google define: merit). Why that means “grades and test scores and nothing else” eludes me.  Admissions based on scores certainly is more objective than any “holistic” approach, but objectivity isn’t intrinsically praiseworthy.  Consider the following alternative admissions policy: admit all applicants above 5’10”; reject everyone else. This is very objective, but probably not the best way to admit a Harvard class.  Similarly, test scores do provide a more objective metric than many other factors, but that doesn’t make them more fair or a better way to assess merit than anything else. They just reward/are correlated with a different set of factors. For example, SAT scores are strongly correlated with socioeconomic status. As are grades, number of AP tests taken, ACT scores, etc (and height, actually.  And height is also strongly correlated with success! So maybe they should only admit tall people?).

This is a deeply complicated issue that gets into what kinds of students Harvard should admit and how they should be evaluated, and what kinds of graduates Harvard wants to produce (Steven Pinker weighed in a couple of years ago). But this is probably best left to another post.  Suffice to say: discrimination is bad, but it’s fallacious on face to equate merit with just scores.  And also, neither the ENS model (which Hsu advocates) nor the resulting insane testing culture that dominates the French HS experience is necessarily to be aspired to.

As usual, all opinions expressed here are my own and should not be considered representative of the opinions of my employer, spouse, neighborhood, family, dog, etc.  Wish I could figure out how to put this at the bottom of every post automagically, but so it goes.

CS PhDs and US Immigration Policy, a Long and Pointlessly Insane Saga

(Should I be working on a grant proposal? Yes, yes I should. Am I writing a blog post instead? Yes, yes I am.)

I’m beginning to wonder if anyone at the NYT actually knows anything about higher education/research in STEM.

Exhibit N: an ENTIRE ARTICLE about whether STEM graduates should get visas, without a SINGLE SOLITARY MENTION of the fact that your tax dollars are, by and large, paying for their PhDs [1].

I’m becoming a bit of a broken record on this, but: The US government funds the most successful scientific enterprise in the world. This is a major driver of economic growth/innovation (e.g., much of the technology in your cell phone came out of publicly-funded basic science). A large proportion of the money the US government gives us as research grants, especially in CS where we have fewer expensive infrastructure needs than, say, experimental physics, pays for graduate students’ tuition and living expenses. Without the students, we can’t do the science [2].

tl;dr: 60.1% of the CS PhDs awarded in 2014 were to nonresident aliens [3]. The pipeline distribution looks similar. So: we bring in students on visas, pay for their PhDs, and then threaten to send them home to compete with us. Note that the Taulbee survey suggests that this doesn’t happen much in practice, as many of the people they track seem to have found North American employment. But still: there’s always the stress, and the visa situation dampens the entrepreneurial spirit, because such graduates typically need to be sponsored by big companies or universities in order to stay.

Many foreign students I’ve spoken to are understandably mystified by the total insanity of this “system.” My own mother, a naturalized American citizen who experienced the post-PhD can-I-get-a-greencard? stress (30 years ago! Times, they do not change), regularly comments that the US government, having funded her PhD, should have insisted she stay! Indeed, other (normal) governments usually attach riders along those lines to the scholarships they give their own students, stipulating policies like “you must come back to work for at least as many years as we paid for you to study abroad.”

Someone somewhere (Trump?) is liable to say something like “Admit more US graduate students!” Listen, there simply aren’t enough of them. My admittedly limited experience on graduate admissions committees strongly suggests that virtually any CS graduate department would struggle to fill their cohorts with US students, even if they totally ignored applicants’ qualifications.

People (including commenters on the NYT article) periodically get up in arms and claim that the visa lobbying done by companies like MS constitutes a nefarious strategy to pay foreign workers a lower salary (which is basically demented, because last I checked they pay more or less the same salary to starting engineers regardless of country of origin), but you really can’t say that about us. We pay all graduate students the same stipend regardless. We have literally no economic incentive at all to admit a foreign vs. a local grad student.

tl;dr part 2: As a taxpayer, I would like the record to show that I strongly favor government policy that encourages people with PhDs in Computer Science, especially (though not limited to) those that I helped pay for, to stay in this country.

[1] I’m setting aside the Master’s question, since students who get terminal MS degrees are significantly more likely to pay for them, for reasons that mystify me but are rightfully the subject of another post.

[2] There are many fields in which we might discuss whether there are too many PhDs for the amount of work available for them, insert various words about the faculty hiring crisis in the humanities here. CS is really not one of them. There is a fuzzy boundary between theory and math where a graduate with a PhD in Computer Science might have a harder time finding a job either in industry or academia. I’m caveating those people away, because they’re a small fraction of the total CS PhD population.

[3] CRA Taulbee survey is your friend.

One Motto, Three Rules of Thumb

SSSG[1], the ISR SE program’s weekly research symposium, typically features the SE PhD students (and, less commonly, faculty) presenting either their work, or surveys of the work of others.  From time to time, though, we go a bit meta. This week, I gave a talk on talks, specifically on how I approach the task of structuring a research presentation [2].  I was asked, and am happy, to send out/link to/otherwise circulate the slides. The tricky bit is that they’re heavy on illustrative/goofy pictures and light on, you know, content.  I therefore added a bit of blog-based commentary to go with them (not the full talk, but enough to hopefully render the slides a bit more sensible). The notes roughly follow the slides.

(The irony of course is that an overall theme of my talk is/was that effective written communication is very different from effective oral communication, and the strategies required for each differ accordingly.  Whoops.)

Slide 2:  Take everything I say with a bucket of salt.

Slides 3-4: Although many of the principles apply broadly, the context for my comments is a conference presentation.  But even in introducing this context, I committed a (verbal) conceptual error, saying “I will be presenting this paper at ASE…”

Slides 5-9: Motto: I will not be presenting the paper.  I will be presenting the work.  Many presentation antipatterns arise from trying to translate a paper directly into a verbal presentation.  This distinction is subtle but important, because the strategies that inform effective written communication are  very different from those that guide effective oral communication.

Slide 11: I have three rules of thumb:

(1) Your audience will only remember three things.  Fortunately, you get to decide which three things you want them to remember.   One general set of goals for me in a conference presentation is to get my average audience member to remember the first three things on Slide 14 [3].  I illustrate a potential mapping from those high-level goals to the aforementioned ASE paper on Slides 15-19.

(2) Tell a story.[4]  We can assume your work is interesting, since it’s being published.  But even interesting stories are boring if overly detailed.  This is another of the important differences between a talk and a paper:  Papers should support reproducibility.  Talks should not.  Drop all the detail.  Focus on the narrative arc.

My point on slides 22-26 is that a story should have a discrete beginning, middle, and end, and a guiding conflict.  Don’t introduce three conflicts and only resolve one of them.  A single narrative thread should run through the whole thing. Slides 27-36 are then mostly self-explanatory.

(3) Never confuse your listeners.  This one’s kind of subtle. You might think “Of course I don’t want to confuse my listeners! I don’t want to confuse my readers, either!”  But really, we don’t write papers with a primary goal of avoiding reader confusion.  I’m not saying that it’s OK to confuse your readers, just that you don’t subordinate all other things in a paper to the goal of avoiding confusion.  That’s what I’m telling you to do in a talk. This again comes down to the differences between reading and listening (Slide 39).

I discuss two implications of this rule.

First, your listener should always know where you are, and why.  I learned post-talk that my remarks on Slide 41 are related to the psychological concept of “chunking” as related to memory.  And a note on Slide 42: unlike some, I’m not opposed to outline slides.  I am, however, opposed to the practice of introducing an outline on Slide 3, and then never revisiting it.  This serves no purpose verbally. It can, however, be very effective to revisit an outline to signpost.

Second, never visually overwhelm your listeners.  This problem shows up most commonly in presentation of results.  It’s better to gradually introduce viewers to the results such that they can understand them as quickly as possible [5]; start by explaining what axes mean and how to interpret graphs/tables/etc.   The instant you blast them with a giant table/complicated chart, they stop listening to you and start trying to understand it.

For example, I had a big table in my job talk that I introduced in pieces (selected slides included). I think (hope?) it worked better than just throwing the whole thing up on screen would have.  There was narrative to accompany it, of course.

Speaking of tables, one of the Firm Rules in my research group is as follows: “No talk shall include screen-captured tables, even those presenting someone else’s work; if you must present tabular data, you shall make it readable and pretty.”  Screen-capping graphs can be OK if they’re from someone else’s work, because it’s hard to make a new one without the data, and I’m not a complete tyrant [5].  Same rules about explaining how to read them apply, though.

Listen: graphs are awesome.  They provide a great way to illustrate complex data and relationships and trends and so on.  But in a paper, we’re trying both to use them effectively and squeeze everything into the page limit.  This is OK, because readers can take time to scrutinize them in a paper without losing the narrative thread.  Not so in a talk.  Avoid translating a paper directly to the screen.

(I warned you it was a theme!)

Then I concluded.  Pithily, one hopes.

There’s way more to be said on this topic, and many different opinions; this is just what I said in one SSSG presentation. For example, halfway through preparing these slides I was directed to some good commentary from Dave Evans on this subject.  Turns out we agree on a lot.  Maybe that’s why I like it?  Anyway, he links to the wise words of others, as well, if you’re on a tear on this subject.

Happy talking!

[1] One (and not the only one!) of those SCS CMU abbreviations whose original expansion has evidently been lost to the sands of time.

[2] With much less of an emphasis on slide design; I’m kind of bad at it, and Christian totally nailed it with his earlier metatalk.  Also, his recommendation of The Non-Designer’s Design Book is spot-on.

[3] The fourth thing flows naturally from the first three, so I get it for free.  Whee!

[4] I guarantee you’ve heard this before.

[5] Credit where it’s due: Slide 49, which should be animated, is from a presentation my student Mauricio Soto gave about code repetitiveness, which included some of this work from Gabel and Su in 2008.  Slide 49 walked viewers through how to read graphs that he grabbed from the Gabel/Su paper.

Industry vs. Academia, or: the Grey Lady Misses the Point

(Emery peer pressured me to blog, presumably rather than ranting entirely in Facebook posts, which is where a moderately shorter version of this first appeared.  I admit this does feel a hair more legitimate.  At least, I feel less compelled to apologize for the length.)

This article:

Uber Would Like to Buy Your Robotics Department (NYTimes)

…is frustrating. To the best of my (limited) knowledge, it more or less accurately represents the facts of the Uber-CMU-NREC deal (and is notably less inflammatory than early reporting on the subject), and yet still totally misses the point:

To start, it treats an event that it acknowledges was both foreseeable and anomalous (i.e., the NREC staff were already eyeing the private sector; everyone was surprised at the scale of the hire) as indicative of a trend.

More importantly, while it does sort of acknowledge the benefits to Pittsburgh, it mostly just conflates Pittsburgh/CMU with Silicon Valley/Stanford, which is rote nonsense for reasons that would take at least another several blog posts to explain. Setting aside concerns about Uber’s management, this is mostly just a good thing for Pittsburgh, a town with a real estate surplus and remarkably healthy town/gown/tech relations (…I doubt it would be nearly so positive for SV).

Anecdata: Not long after the technology center was first announced, an actual Uber driver said to me, even though the goal is to eliminate his job: “…but that’s years away, and something like that is great for the city.”

It’s the same story with Google, which only gets a passing (though again, accurate!) reference: Andrew Moore did leave CMU for Google. He then built a multi-hundred-person [1] satellite lab in PGH that is substantively contributing to the revitalization (…ignoring issues of gentrification for the purposes of this discussion) of the East End.


One datapoint, but still: Individual faculty members leaving for industry isn’t really the problem (Maybe it’s worse in CA?). The real issue, which the article doesn’t touch on at all, is the draw on students. Why engage in a summer research project for $15 an hour when you can make $20k interning at Facebook? No research experience → no exposure to research as a career option, and a weak grad school application. And why bother, frankly, when fbook converts the internship into a $100k+ offer for a fresh BA/BS?

(We’re also probably the only field in the history of academia to have such a surfeit of grad students who want to be professors, no matter what it feels like from the job search side. This doesn’t bother me much, though, since they at least give research a fair shake.)

Lest this sound like First World Problems, let’s return to the elided benefits to the city and think about how deeply positive this story really is: Taxpayer-funded pie-in-the-sky ML and robotics research turns, over 20-30 years, into Google labs and advanced technology centers in a beautiful, vibrant city that by all rights should have collapsed with the steel industry, but didn’t. University research creates jobs. Sustained public investment in levels 1-4 research, which the private sector increasingly does not fund, is fundamental to the US economy and technological success.

We need to improve the way we, academics, communicate that fact to the public, since politicians are conveniently ignorant of what we do all day (LOOKING AT YOU WISCONSIN).

Upshot: rather than a “woe is CMU” story, why can’t someone write a “good job, NSF/NASA/DARPA” story?  I’m not that close to the situation, being neither a roboticist nor, like, the Dean, but that’s more my sense of the sentiment on the ground, for whatever that’s worth. And also: The tech boom may be posing a risk to academic computer science, but probably not through the acquisition and commercialization of industry-ready technologies.

(All opinions are my own, etc etc.)

[1] Original post had a 600 here, but I realized that I actually totally made up that number, which is probably fine for a Facebook post but misleading given the (relative) legitimacy of a blog post.