# Things I Keep Repeating About Writing

I often write papers with students, or read students’ papers to provide comments, and I find myself saying the same things over and over, especially the first time out.*  So: here’s a blog post I can point them to, to (hopefully!) save us all some time and trouble. I plan to update it as I remember more things I say repeatedly.

I’m happy to argue these points, and take suggestions to expand the list.  I’m not claiming that I’m the world’s foremost writing expert, and some/many of these are the product of relatively arbitrary preference.  But, (A) this is targeted first and foremost at my  own students, so my preferences matter, and (B) I’ll try to justify when I can.

This list isn’t a complete delineation of all the rules of English grammar.  Follow those rules too, even if they’re not on this list.

* These are not the kinds of comments I typically make when reviewing, where I focus less on style.

Use clear and precise language.

Use short, declarative, active sentences.  BANISH THE PASSIVE VOICE. If you went to an American high school you probably need to retrain your instincts.

• Adverbs are often imprecise: what does “incredibly” add to the phrase “incredibly important” that the word “important” lacked on its own?  How much more important than important is something that is incredibly important?
• Pronouns are often unclear with respect to their antecedents, which can confuse the reader.

Be as explicit/concrete in your statements as you can.  This is perhaps best illustrated by example (courtesy Yuriy Brun): Instead of “The dataset has a few attributes.”, say “The dataset has 22 attributes.”  Avoid descriptors like “a number of” or “several”, which rarely add meaning.  Instead of “We performed a number of experiments.” or “The cat had a number of lives.”, try “We performed four experiments.”, “The cat had nine lives.”

(To highlight the point, consider the sentence(s) without “a number of”: “We performed experiments.”/“The cat had lives.”  See how the meaning didn’t really change?)

Related: do not use more syllables than necessary.

Two easy manifestations of this rule are the following transformations that can be applied universally to your draft:

• “In order to” –> “To”
• “Utilize” –> “Use” (unless in the context of a discussion of CPU utilization, where it’s reasonable).

The point of writing is to communicate an idea.  Using more syllables than necessary obscures the idea without adding meaning.

Present numbers properly.

Write out in letters all positive numbers less than or equal to 10, unless they are in a sentence with a number greater than 10 (ETA: like 110, which makes this sentence comply with my rule).  I don’t know why.

Right justify columns of numbers.  I will repeat this in all-caps, because I really mean it: RIGHT JUSTIFY COLUMNS OF NUMBERS.  Ensure that the correct number of significant digits is used (your stats package is giving you waaaaay more than is appropriate), and that decimal points align.

You will argue with me about this, because you really want to left-justify or center them.  I don’t know why.  A reader should be able to quickly scan a column of numbers to get a sense of magnitude, and cannot do that if they are left-justified unless they are all (coincidentally) the same order of magnitude.

Text in columns should be left justified.  Never center anything that’s not a column header.

Typesetting/copy-editing minutiae.

(On all of these, the answer to Why? is usually: Because.)

Capitalize Table, Figure, and Section.  Refer to sections only, never subsections, even when you’re referencing an actual subsection (e.g., Section 4.1, not Subsection 4.1).  Include a non-breaking space (~) between the words Figure/Section/etc. and the \ref.
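A minimal sketch of what that looks like in the latex source. The label names (sec:eval, sec:eval-setup, fig:overview, tab:results) are hypothetical, chosen only for illustration:

```latex
% Hypothetical labels; substitute the ones defined in your own paper.
We describe the evaluation in Section~\ref{sec:eval}.
Figure~\ref{fig:overview} gives an overview, and
Table~\ref{tab:results} summarizes the results.

% Still "Section", even though the target is a \subsection.
See Section~\ref{sec:eval-setup} for the experimental setup.
```

The tilde keeps the word and its number from being split across a line break.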

Capitalize and punctuate section/paragraph headings/captions consistently.  If one ends with a period, they all should.

Do not use citations as nouns.  No: “In [14], Hazelwood et al. describe facts.” Yes: “Hazelwood et al. [14] describe facts.” (H/T Kim Hazelwood)

Citations go before punctuation, with a non-breaking space between the word and the citation.  Footnotes go after the punctuation, with no space.
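In latex source, these placement rules might look like the following sketch. The citation keys and the footnote text are placeholders, not real references:

```latex
% Citation as a parenthetical, not a noun; ~ keeps it attached to the name.
Hazelwood et~al.~\cite{hazelwood-key} describe facts.

% Citation goes before the period, joined by a non-breaking space.
Prior work reports the same effect~\cite{some-key}.

% Footnote goes after the period, with no space.
We omit the proof.\footnote{Placeholder footnote text.}
```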

An em-dash is three dashes in latex.  You use these to offset text, like a parenthetical but without parentheses (I’d give an example but wordpress converts my triple-dash into an em-dash automatically so it’s hard to see!).  An en-dash is two dashes and is only used for ranges (like page numbers).  A single dash is used in hyphenated words.  You probably don’t need to hyphenate compound words nearly as often as you think you do. No spaces around dashes; sometimes a space after a hyphen, depends on the circumstance (e.g., pre- vs. post-condition).

(The actual rules for dashes and hyphens and compound phrases are complex, so beyond that I’ll punt to another website instead of typing them all out.)
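Since a code block survives the automatic conversion, here is a small sketch of the three dashes as they would appear in latex source. The sentences are made up for illustration:

```latex
% Em-dash: three hyphens, no surrounding spaces, offsets an aside.
The second approach---the one we adopt---is simpler.

% En-dash: two hyphens, ranges only.
See pages 110--125.

% Hyphen: a single dash inside a hyphenated word.
We use a well-known benchmark.

% Suspended hyphen: here a space follows the hyphen.
We check pre- and post-conditions.
```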

Abbreviations should include appropriately placed periods, that is, after every shortened version of a word.  So “also known as” is abbreviated “a.k.a.”; versus is abbreviated v. or vs.; “et cetera” is abbreviated “etc.” (a mistake I made in the first draft of this document!).  Et al. is another one, and a pet peeve (period after the al., which is short for alii, not the et, which just means “and” and isn’t shortened). Et al. should not be italicized, though I took some convincing on this. It should be separated from the preceding name with a non-breaking space.

Always put a comma after i.e. and e.g., and use them properly (i.e. means “put differently” or “in other words”, e.g. means “for example”).

It didn’t initially occur to me to include “use the Oxford/serial comma,” because doing so is so obviously correct.

Do not hit the page limit by shrinking your tables and figures.  Assume your reader is old, blind, lazy, and also colorblind.  Print out your paper at least once on physical paper and make sure you can read the figures and tables.  I do actually complain about this when reviewing.

Choose colors for graphs and figures that show up when your paper is printed in greyscale.  Go to http://colorbrewer2.org/ and choose “colorblind safe” and “print friendly” to find color combinations that work.

Use booktabs for tables.  They look so much nicer and internal rules do not actually increase readability.
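A minimal booktabs sketch that also follows the justification rules above: text left-justified, numbers right-justified, and the same number of decimal places in each row so the decimal points line up. The benchmark names and values are invented, for layout only:

```latex
\documentclass{article}
\usepackage{booktabs}

\begin{document}

\begin{table}
  \caption{Placeholder results (made-up values, for layout only).}
  \label{tab:example}
  % l = left-justified text column, r = right-justified number columns.
  \begin{tabular}{lrr}
    \toprule
    Benchmark & Lines of code & Time (s) \\
    \midrule
    alpha     &           950 &     3.20 \\
    beta      &        12,400 &    41.75 \\
    gamma     &       108,000 &   512.08 \\
    \bottomrule
  \end{tabular}
\end{table}

\end{document}
```

The only rules are the three horizontal ones from booktabs (\toprule, \midrule, \bottomrule); there are no vertical or per-row rules.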

The default font size for labels on graphs coming out of basically any package (Excel, R, etc.) is too small.  Don’t let the defaults boss you around.

Use latex, bibtex, and version control in a way that makes your advisor happy.

There are myriad differing opinions on this; of all the “rules” on this page, these are almost certainly the most CLG-specific.

Naming.  Name your .tex file (and project/directory) something more informative than “paper”.  Reasonable schemes include but are not limited to:  “lastname-projectname-year”, “projectname-venue-year”, “lastname-venue-year”.

Version control. I prefer to collaborate using git, mercurial, or svn, through a hosted repository.  github or bitbucket are fine.  My username basically everywhere is clegoues.

Do not check in byproducts of the build process, including the PDF. If you do, we will conflict every time we commit, which is annoying.

Because I like to use git/hg/svn, I strongly prefer hard line breaks throughout a document.  My editor default is 80 characters.  Fewer than that is fine; longer gets silly.  Some people like to line break at the end of sentences, which I think is weird but preferable to no line breaks at all.  Note that I don’t “rewrap” unless things get crazy.  The point is just that if lines are roughly 80 characters, line-based diffing and merging (as done by git/hg/svn) works pretty well and simplifies collaborative editing.  If paragraphs are all one long line, merging becomes substantially more difficult.

Tools. I prefer to write papers in emacs and will add a Makefile to your directory, and then build the paper using “make”.  You can do the same, or use whatever other editor/tool you like.

I tend to dislike shared latex editing sites like sharelatex, but make allowances, especially when there are fewer than three collaborators.  I prefer those options to emailing a Word document around.  I prefer that to those WYSIWYGs that generate latex, which I won’t use. Google Drive is OK for early drafts, but I’d generally prefer we just skip to the latex.

Latex. I prefer latex documents to be structured as “all one file” rather than having sections or subsections in multiple latex documents and inserted via \input.  Dissertations/theses are a reasonable exception.  I compromise on this based on the preferences of my colleagues, but given a choice…

Leave space/a subsection/a paragraph for acknowledgements at the end so we can acknowledge sponsors without having to panic to make space right before the camera ready.

Bibtex. Give your bibtex entries reasonably indicative names.  If you cut and paste an entry from the web somewhere, ensure that it’s done properly (some sites make everything a @misc, which is almost always wrong) and modify the bibtex so that it’s reasonable.  Definition of reasonable: special characters are copied properly; authors’ names and title are spelled/capitalized correctly (don’t forget non-breaking spaces where relevant, like in your advisor’s last name…); includes venue, preferably spelled out along with its acronym, but you can drop the “Proceedings of the 23rd Annual ACM/IEEE blah blah” in favor of just “International Conference on Software Engineering”; includes year and page numbers.  The rest is mostly optional.

Fair warning: I tend to insert broken bibtex cites as I write to remind myself/you to put references in appropriately.

(Shout out to the numerous others who commented/made suggestions/nit-picked my own copy editing, with especial thanks to Kim Hazelwood and Yuriy Brun, two of the only computer scientists I’ve ever met who are bigger sticklers than I am on grammar/typesetting.)

# A reluctant ICSE submission cap post or: an exploration of primary sources

(Context: I was on the ICSE 2016 PC and I am on the ICSE 2017 PC.  I have never submitted more than three papers to ICSE.)

There has been much recent brouhaha in the software engineering research community over the new 3-submissions-per-any-individual-author cap imposed by the ICSE 2017 organizing committee.  I’ve been resisting wading into this, but the recent email sent by the PC chairs (for whom I sincerely have nothing but the absolute highest respect) to the PC notably invites/welcomes respectful discussion on this and any other policy.  It also includes the following:

Some detractors have been vigorous in their opposition, but we can point out that the policy has both its justification and (mostly silent) supporters.

This sentence has been rattling around in my head. Honestly, I’ve been finding those noisy detractors fairly convincing, though I’m fundamentally sympathetic to the plight of PC chairs in general. That said, I was bothered by the “silent supporters” and “existence of justification” claims absent further expansion.

There are various types of evidence that could be presented based on data from previous ICSEs that might convince me that the cap is a good solution to an important problem.[1] For example, if this policy might cut the review load by a third, or if the majority of submissions receiving Cs or below came from bulk submissions, I might come around.

Reflecting on this, I thought, well, maybe that data is available. I wouldn’t want to claim otherwise if I simply hadn’t looked hard enough. The email to the PC, which I’m trying not to quote too liberally because even bulk emails are private correspondence, includes:

The potential problem of bulk submission to ICSE was first documented by the chairs of ICSE 2015, and the idea of limiting the number of submissions was then suggested as a potential solution. All this information is publicly available:
http://www.icse-conferences.org/sc/ICSE/2015/ICSE2015-Technical-Track-Report-Canfora-Elbaum.pdf.

To the primary sources, then.  The basic claim is that the review load is increasing beyond the bounds of scalability and that this cap is important to improving either the quality of submissions or the sustainability of the ICSE review load. From the blog post articulating the policy publicly:

One of the main reasons for this policy is that every year more people submit more papers, but the pool of qualified reviewers willing to make the necessary commitment does not grow in proportion… We are stuck in a vicious cycle.

Long story short: it’s not evident to me, based on the 2015 Technical Report (which details a system with an actual physical meeting, which should scale less well than 2016/2017’s Program Board model), that we are in such a cycle.  Three bullets stand out from the Executive Summary:

• Page 2, under expertise and quality: “- Reviewing expertise was higher than previous ICSEs (as reported by authors and reviewers)”
• Page 2, under reviewing load: “-90% [of reviewers] agreed that the load and schedule was manageable”
• Page 3, under “Brief reflection from the chairs”, which includes suggestions on what to “keep”, “refine”, “explore”, or “drop”: “Drop: Panic about scalability of reviewing process.”

That is: The reviews were good, the load was manageable, and all the Sturm und Drang about review process scalability is unwarranted.  These points are expanded in the document proper.  From Page 32, Section 4, Reflections from the Chairs (emphasis original):

Balanced process. We believe that, given the number of submissions and its rate of growth, the reviewing model we used this year struck a good balance between maintaining a manageable load for reviewers…Note that simply adding more RC members to contribute reviews in the first phase would make the process scale further.

and

No reason to panic about scalability of process. …when considering the 452 submissions for this ICSE, the growth may be linear but with a very small coefficient. This clearly requires close monitoring in the future but no dire measures.

This sounds like we have/had a sustainable model that may be scaled by adding  a small number of committee members[2] and shifting the load somewhat.[3] It does not sound like a vicious cycle.

2015’s chairs do propose an exploration on bounding the number of submissions per author. On page 35:

Bounding the number of submissions per author. It may be worth exploring whether defining a maximum number of submissions per author would help to curb the abusive shotgun approach to submissions and encourage authors with multiple collaborations to submit just their best work…enforcing a limit of three submissions per author would reduce the number of submissions by approximately 8% and we conjecture that the program will not suffer.

That amounts to 1.44 fewer papers on average, for those of us reviewing the max of 18.  This is basically marginal. I have served on three PCs with review loads of 18+ (and several smaller venues) over the last 12 months.[4]  I wouldn’t complain about a reduction, but the difference between 18 and 17 does not determine whether I accept an invitation.
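The arithmetic behind that figure, taking the report’s approximate 8% reduction and a maximum load of 18 papers as given:

```latex
\[
  0.08 \times 18 = 1.44
\]
```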

The final claim is that this policy will encourage authors to submit their best work.  But: 2015’s chairs say that only 34 papers would have been blocked by the cap.  Even if they were all rejected, so were 334 others.  Table 3 outlines the score distribution: 220 submissions overall received only Cs or Ds. If we assume that every single one of those 34 papers is terrible (which the report authors do not), they could constitute no more than 15% of the terrible papers, leaving 186 other terrible papers to review. That’s a third of a terrible paper less to review per PC member, which again is marginal (and represents a best case).  I’d bet that most terrible papers are submitted by authors on only one submission, simply based on the underlying distribution.
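A quick check of those proportions, using only counts quoted in this post (34 papers blocked by the cap, 220 submissions receiving only Cs or Ds, 452 submissions overall):

```latex
\[
  \frac{34}{220} \approx 0.155, \qquad
  220 - 34 = 186, \qquad
  \frac{34}{452} \approx 0.075
\]
```

So the blocked papers are at most roughly 15% of the low-scoring pool, and the last ratio is consistent with the report’s “approximately 8%” estimate.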

Somewhat on that subject, from Section 2.4:

The large number of submissions for some authors may be a reflection of an undesired and costly shotgun submission pattern, but it also seems to be associated with authors that carry many active collaborations. We have not enough data to tease this out further.

This section does not provide information on the number of submissions per co-author, nor the number of acceptances per bulk-submitting author. It does say that the vast majority of authors (their words, not mine) are only on one submission, which concerns me because it means that selecting down for prolific submitters will prevent submission outright for their collaborators.

tl;dr: Based on the 2015 report, which is the only ICSE- or even SE-specific document I’ve seen cited in this discussion, I do not see the strong motivation for a hard submission cap.  I’m interested to know if the 2016 data is notably different.  Adding overhead to bulk submission may be reasonable, and other venues have done this.  As a person who serves on multiple PCs, I like the idea of having authors who resubmit a paper include the previous reviews and a list of changes (…on the other hand, reviewing an identical paper a second time has the definite benefit of being easy…).

The PC chairs have stated that there is justification for this policy and have promised an FAQ on it.  I believe them, and if my opinion is worth anything, it would be wonderful if that FAQ contains empirical evidence to substantiate that this policy will (A) be impactful in improving the quality of submissions/review experience, while (B) not harming the submission prospects of graduate students or other junior collaborators. The evidence cited so far is incomplete, and what it does contain is not fully convincing on these points.

[1] As part of our decision to implement double-blind review, my PC co-chair for SSBSE 2014 (a much smaller symposium, it’s worth mentioning; I’m not at all claiming it’s equivalent to ICSE. Though I do think the latter should also do double-blind, FWIW) insisted we write a blog post justifying the decision. My first reaction was “…uh, because we’re the PC chairs and we said so?” But I did it, because Shin was right.  My opinion on double-blind review is informed by the substantive body of work in implicit bias in general and the study of academic double-blind review in particular, and I highlighted some of that literature in the post. I don’t believe we convinced everyone, but I do know anecdotally that members of the community appreciated our data-driven approach to conference organization decision making.

[2] This year’s chairs mention that it is difficult to fill a PC.  2015’s chairs speak favorably of the benefits of recruiting and cultivating junior community members (like myself, which I greatly appreciate, bloviating blog posts notwithstanding), mentioning that “This is, in our opinion, a key way to educate future leaders in the ICSE community and, as such, it has a great value for the community.”

[3] I do like a two-phase review system. I reviewed the same number of papers for ISSTA as I did for ICSE in 2016, but in two phases, and it felt much more manageable.

[4] I review a lot.  I’m learning to say no.  It’s a process.

Opinions my own, not my employers’, etc etc.

# Free/Fair? Or: A Somewhat Bizarre Request to Fellow Harvard Alums

Hey, fellow Harvard alums: This year, when you get a ballot for the Harvard Board of Overseers Election in the snail-mail, instead of throwing it away without looking at it: don’t. Instead: vote.

The Board of Overseers is a group of 30 individuals each serving 8-year terms. Harvard says:

Drawing on the wide-ranging experience and expertise of its members, the Board exerts broad influence over the University’s strategic directions, provides counsel to the University leadership on priorities and plans, and has the power of consent to certain actions of the Corporation.

The “Corporation” is the president and fellows, who approve the university budget, major capital projects, endowment spending and tuition charges, etc. The board is elected by anyone with a Harvard degree who actually votes on those ballots they send to us every year that we then throw out.

This year, there is a slate of 5 individuals running on a platform they’ve dubbed “Free Harvard, Fair Harvard.” It’s confusing for approximately 155 reasons, not the least of which being that it includes Ralph Nader. On face, they argue for the following:

1. Increased use of endowment income to make Harvard more accessible
2. More transparency in Harvard admissions

Plank 1: they want to eliminate tuition.  This is the major point of coverage in the popular media.  The argument is: (A) Harvard can afford it, (B) doing so would increase access and applicant diversity, because students from under-represented socio-economic groups would be more likely to apply, and (C) eliminating tuition would have wide-reaching social effects by spreading to other institutions.

(A) is mostly false. Endowments aren’t liquid. The endowment is not a checking account, and the size of the endowment/return is not indicative of the amount of money available for tuition support. (I acknowledge the lack of objectivity on the part of the University spokesperson, but it’s still true.) Yes, the University should restructure the endowment to free up more money for tuition support and outright reduce tuition, and elicit donations just for this purpose. But: that’s not the same as using endowment returns to make Harvard free, overnight.

(B) is almost certainly true.

(C) is unlikely. The potential impact is limited to schools with sufficient resources to do the same, of which there are only a handful. Offhand (based on no information) just lowering tuition might have more impact, because it’s more realistic for other schools to follow while still being a legitimate kick to the system (fun fact: scholarship dollars spent is a factor in USN&WR’s algorithms, so raising tuition and scholarships simultaneously is good for rankings).

Increasing access and encouraging the underrepresented to apply are awesome goals. Current stated tuition is obscene. It was obscene when I attended, and it’s worse now.

That said: Why on Earth should Harvard be free for everyone? It’s often reasonable to exchange money for services.  More than 30% of the class of 2019 comes from families making more than $250k a year. Even though the financial aid office certainly makes strange need calculations at the boundaries, I’m willing to bet that many students supported by families making >=$250k/year can afford to pay some amount of money for their Harvard educations.

On the one hand, this is a chicken/egg problem: if there were greater access, less of the Freshman class would be so wealthy. On the other hand, it’s total overkill: It makes a Harvard education free for people who don’t need it to be, to solve what is effectively a marketing problem. Instead, Harvard could spend that money on reducing tuition overall, performing aggressive outreach, and ensuring that students from underserved populations are able to take full, effective advantage of their Harvard educations.

Plank 2 is more complicated: “more transparency in admissions.”

(I’ve found it difficult to write about this in a brief or linear fashion, so I’m trying to scope my remarks, with sadly limited success.)

First, note that comments from the slate tie this plank into a call to end legacy preferences (this is apparently Nader’s concern, though I can only find second-hand quotes from him on it), which would be awesome.  Legacy is ridiculous, a fact that is so obviously true I won’t bother citing it. However, based on no information, it’s not clear to me how Harvard can maintain the alumni giving network that sustains the endowment without an implicit/explicit understanding about legacy.  Ending legacy preferences sounds awesome on paper but is probably impossible and not the real point anyway.

Back to that real point: At a high level, “more transparency in admissions” is related to evidence that Harvard and other elite schools discriminate against Asian and Asian American candidates. Unz says this outright, and his language exactly matches that of the plaintiffs in an ongoing lawsuit against the university demanding admissions transparency and asserting discrimination against Asian applicants.  Various members of the slate are involved in some way in that lawsuit.  This is a legitimate concern, or at least the evidence that substantiates it is pretty compelling (Unz’s link to the Economist summarizing the claim is broken, but you can find some summary data on his Free and Fair blog post).

Indeed, if discrimination were the only concern, I wouldn’t be so alarmed. But the evidence suggests that it’s not the only issue at stake. Additional fuel for this suspicion comes a bit sideways, but consider the way the campaign tries to explain the combination of planks via the NYTimes:

If Harvard omits tuition fees, more highly qualified students from all strata of the society will find opportunity to apply. Similarly, the university authority will find ease in balancing classes for racial or ethnic diversity and the Asian- Americans won’t lose out.

Uh…what? The conclusion only follows if Harvard were discriminating against Asian applicants because of a lack of socio-economically diverse applicants. I’m not saying it’s OK to discriminate against Asian/Asian-American applicants, but rather that there’s no obvious connection between that discrimination and a homogenous applicant pool. Increasing access and pool diversity can’t help on its own, because the two thoughts are disconnected. Either the candidates are stupid (unlikely), or something else is going on.

Going with the latter: With the exception of Nader, the public intellectual history of the candidates demonstrates strong opposition to affirmative action/consideration of diversity in college admissions. I’ll just give immediate and obvious evidence to that effect, in the interest of (dubious) brevity:

• Unz writes in an article on The American Conservative:

Conservatives have denounced “affirmative action” policies which emphasize race over academic merit, and thereby lead to the enrollment of lesser qualified blacks and Hispanics over their more qualified white and Asian competitors; they argue that our elite institutions should be color-blind and race-neutral. Meanwhile, liberals have countered that the student body of these institutions should “look like America,” at least approximately, and that ethnic and racial diversity intrinsically provide important educational benefits, at least if all admitted students are reasonably qualified and able to do the work.

My own position has always been strongly in the former camp, supporting meritocracy over diversity in elite admissions. But based on the detailed evidence I have discussed above, it appears that both these ideological values have gradually been overwhelmed and replaced by the influence of corruption and ethnic favoritism, thereby selecting future American elites which are not meritocratic nor diverse, neither being drawn from our most able students nor reasonably reflecting the general American population.

I could unpack that paragraph for days, but instead I’ll just assert the obvious: Ron Unz doesn’t like affirmative action. Which I refuse to scare quote even though he does.

• Lee Cheng filed a brief in Fisher v. UT Austin supporting the challenge to UT’s system, arguing against taking race into account in admissions decision making in almost all cases.
• Stuart Taylor Jr. is the author of “Mismatch: How Affirmative Action Hurts Students It’s Intended to Help, and Why Universities Won’t Admit It.” Which is exactly what it sounds like.
• Stephen Hsu, who gets my non-sarcastic vote as the most reasonable of the non-Nader bunch, has argued repeatedly for “Merit, not Race in College Admissions”.  I tangent on merit below.

Regardless of the stated platform, evidence in the form of their actions and writings in other forums suggests that these candidates are strongly anti-affirmative action/admissions diversity consideration.  Their calls for transparency in admissions are closely linked both intellectually and legally to their challenges to the use of race in admissions in this and other contexts.  Although aspects of the current admissions process are legitimately problematic, swinging to the other extreme that these candidates have advocated in many other places is probably a bad move.

However, even if they truly only want transparency and are not setting out to end diversity considerations in admissions, or even if I agreed with that goal, I still wouldn’t vote for them. Here’s why: Unz in particular has strongly intimated that he seeks to enact dramatic, destabilizing change from a position on the board.  An overactive/radical board is basically the worst possible way to run a university.  In a stance that is admittedly influenced by my position as a faculty member at a major academic institution (and a UVA grad), I believe that shared governance, consensus building, and careful, nuanced reasoning by conscientious people is a much healthier way to enact change in the university environment than a few angry bros dictating from on high how Harvard should do anything.  To be honest, I also don’t mind grass-roots revolution; I’m just opposed to the “angry bros on high” model.

Executive summary: the Free Harvard, Fair Harvard slate is advocating free admission and more transparency in Harvard admissions. Free admission is impossible and also stupid, and the “transparency” claim is very likely a smokescreen for a vehemently anti-affirmative action/diversity agenda. Even if it’s not (or even if it is and you think they’re right), the slate is advocating for destabilizing change to come from the Overseers, which is no way to run a university. Thus, I advocate that you vote for someone else in the upcoming board election.  Or, educate yourself and decide you disagree with me and vote for them anyway, but regardless: vote.  And either way, don’t be misled by Ralph Nader’s presence on the ballot.

This is a deeply complicated issue that gets into what kinds of students Harvard should admit and how they should be evaluated, and what kinds of graduates Harvard wants to produce (Steven Pinker weighed in a couple of years ago). But this is probably best left to another post.  Suffice to say: discrimination is bad, but it’s fallacious on face to equate merit with just scores.  And also, neither the ENS model (which Hsu advocates) nor the resulting insane testing culture that dominates the French HS experience is necessarily to be aspired to.

**As usual, all opinions expressed here are my own and should not be considered representative of the opinions of my employer, spouse, neighborhood, family, dog, etc.  Wish I could figure out how to put this at the bottom of every post automagically, but so it goes.

# CS PhDs and US Immigration Policy, a Long and Pointlessly Insane Saga

(Should I be working on a grant proposal? Yes, yes I should. Am I writing a blog post instead? Yes, yes I am.)

I’m beginning to wonder if anyone at the NYT actually knows anything about higher education/research in STEM.

Exhibit N: an ENTIRE ARTICLE about whether STEM graduates should get visas, without a SINGLE SOLITARY MENTION of the fact that your tax dollars are, by and large, paying for their PhDs [1].

I’m becoming a bit of a broken record on this, but: The US government funds the most successful scientific enterprise in the world. This is a major driver of economic growth/innovation (e.g., much of the technology in your cell phone came out of publicly-funded basic science). A large proportion of the money the US government gives us as research grants, especially in CS where we have fewer expensive infrastructure needs than, say, experimental physics, pays for graduate students’ tuition and living expenses. Without the students, we can’t do the science [2].

tl;dr: 60.1% of the CS PhDs awarded in 2014 went to nonresident aliens [3]. The pipeline distribution looks similar. So: we bring in students on visas, pay for their PhDs, and then threaten to send them home to compete with us. Note that the Taulbee survey suggests that this doesn’t happen much in practice, as many of the people they track seem to have found North American employment. But still: there’s always the stress, and the visa situation dampens the entrepreneurial spirit, because such graduates typically need to be sponsored by big companies or universities in order to stay.

Many foreign students I’ve spoken to are understandably mystified by the total insanity of this “system.” My own mother, a naturalized American citizen who experienced the post-PhD can-I-get-a-greencard? stress (30 years ago! Times, they do not change), regularly comments that the US government, having funded her PhD, should have insisted she stay! Indeed, other (normal) governments usually attach riders along those lines to the scholarships they give their own students, that stipulate policies like “you must come back to work for at least as many years as we paid for you to study abroad.”

Someone somewhere (Trump?) is liable to say something like “Admit more US graduate students!” Listen, there simply aren’t enough of them. My admittedly limited experience on graduate admissions committees strongly suggests that virtually any CS graduate department would struggle to fill their cohorts with US students, even if they totally ignored applicants’ qualifications.

People (including commenters on the NYT article) periodically get up in arms and claim that the visa lobbying done by companies like MS constitutes a nefarious strategy to pay foreign workers a lower salary (which is basically demented, because last I checked they pay more or less the same salary to starting engineers regardless of country of origin), but you really can’t say that about us. We pay all graduate students the same stipend regardless. We have literally no economic incentive at all to admit a foreign vs. a local grad student.

tl;dr part 2: As a taxpayer, I would like the record to show that I strongly favor government policy that encourages people with PhDs in Computer Science, especially (though not limited to) those that I helped pay for, to stay in this country.

[1] I’m setting aside the Master’s question, since students who get terminal MS degrees are significantly more likely to pay for them, for reasons that mystify me but are rightfully the subject of another post.

[2] There are many fields in which we might discuss whether there are too many PhDs for the amount of work available for them (insert various words about the faculty hiring crisis in the humanities here). CS is really not one of them. There is a fuzzy boundary between theory and math where a graduate with a PhD in Computer Science might have a harder time finding a job either in industry or academia. I’m caveating those people away, because they’re a small fraction of the total CS PhD population.

[3] CRA Taulbee survey is your friend.

(Emery peer pressured me to blog, presumably rather than ranting entirely in Facebook posts, which is where a moderately shorter version of this first appeared.  I admit this does feel a hair more legitimate.  At least, I feel less compelled to apologize for the length.)

…is frustrating. To the best of my (limited) knowledge, it more or less accurately represents the facts of the Uber-CMU-NREC deal (and is notably less inflammatory than early reporting on the subject), and yet still totally misses the point:

To start, it treats an event that it acknowledges was both foreseeable and anomalous (i.e., the NREC staff were already eyeing the private sector; everyone was surprised at the scale of the hire) as indicative of a trend.

More importantly, while it does sort of acknowledge the benefits to Pittsburgh, it mostly just conflates Pittsburgh/CMU with Silicon Valley/Stanford, which is rote nonsense for reasons that would take at least another several blog posts to explain. Setting aside concerns about Uber’s management, this is mostly just a good thing for Pittsburgh, a town with a real estate surplus and remarkably healthy town/gown/tech relations (…I doubt it would be nearly so positive for SV).

Anecdata: Not long after the technology center was first announced, an actual Uber driver said to me that, even though the goal is to eliminate his job: “…but that’s years away, and something like that is great for the city.”

It’s the same story with Google, which only gets a passing (though again, accurate!) reference: Andrew Moore did leave CMU for Google. He then built a multi-hundred-person [1] satellite lab in PGH that is substantively contributing to the revitalization (…ignoring issues of gentrification for the purposes of this discussion) of the East End.

AND THEN HE CAME BACK.

One datapoint, but still: Individual faculty members leaving for industry isn’t really the problem (Maybe it’s worse in CA?). The real issue, which the article doesn’t touch on at all, is the draw on students. Why engage in a summer research project for $15 an hour when you can make $20k interning at Facebook? No research experience–>no exposure to research as a career option, and a weak grad school application. And why bother, frankly, when fbook converts the internship into a $100k+ offer for a fresh BA/BS?

(We’re also probably the only field in the history of academia to have such a surfeit of grad students who want to be professors, no matter what it feels like from the job search side. This doesn’t bother me much, though, since they at least give research a fair shake.)

Lest this sound like First World Problems, let’s return to the elided benefits to the city and think about how deeply positive this story really is: Taxpayer funded pie-in-the-sky ML and robotics research turns, over 20-30 years, into Google labs and advanced technology centers in a beautiful, vibrant city that by all rights should have collapsed with the steel industry, but didn’t. University research creates jobs. Sustained public investment in levels 1-4 research, which the private sector increasingly does not fund, is fundamental to the US economy and technological success.

We need to improve the way we, academics, communicate that fact to the public, since politicians are conveniently ignorant of what we do all day (LOOKING AT YOU WISCONSIN).

Upshot: rather than a “woe is CMU” story, why can’t someone write a “good job, NSF/NASA/DARPA” story?  I’m not that close to the situation, being neither a roboticist nor, like, the Dean, but that’s more my sense of the sentiment on the ground, for whatever that’s worth. And also: The tech boom may be posing a risk to academic computer science, but probably not through the acquisition and commercialization of industry-ready technologies.

(All opinions are my own, etc etc.)

[1] Original post had a 600 here, but I realized that I actually totally made up that number, which is probably cool for Facebook but maybe misleading in the legitimacy of a blog post.