An Unexpected Twist
When I reviewed the daily 2020 data yesterday morning, I noticed another substantial bump in mentions for Elizabeth Warren. Since it ran contrary to the trends of the week, I immediately wondered what was behind this boost in her numbers.

Harris and Castro both held campaign events this past week, and each event was followed by a bump in mentions. Harris (green) was the most-mentioned candidate over the past week, and Twitter users also mentioned Castro (yellow) at a much higher rate than usual.1

The Bull in the China Shop
But a pair of tweets by Trump on Sunday evening caused a spike in Warren mention counts. The first re-circulated an online video from Warren’s campaign, along with racist commentary.

The second was some sort of commentary about her home life.

The actual substance of the tweets is not worth dissecting. But these sudden attacks offer a marvelous opportunity to sanity check our approach.
An Economy of Trial Balloons
Linguist George Lakoff outlines four main purposes for which Trump deploys tweets.

Personally, I think most of his tweets blend all four. So why would Trump pick on Warren out of the blue like this? Plausibly for all four reasons at once:
- Trump wants to preemptively frame Elizabeth Warren (presumably as a liar, but also with an undeniably racist frame)
- He wants to divert attention from a raft of bad political and government stories
- He wants to deflect from Warren’s attacks on his honesty and integrity
- And he wants to test how viral his anti-Warren messages are with trial balloons
The final usage is especially interesting for our purposes. Trump uses this tactic all the time, not just on Twitter. But Twitter makes it easy for everyone to see how far each of his messages spreads.
Kicking the Tires
These tweets give us an opportunity to sanity check the Marvelous tweet sampling and visualization components. Specifically, we can take a closer look at:
- What proportion of the retweets is our sampling method collecting?
- Does the distribution of retweets skew to the right, as expected?
- What are the limitations in counting quoted tweets?
Sampling Rate
We search for each of our candidates once every two minutes, and the Twitter search API returns at most 100 tweets per search. This means we sample at most 50 tweets a minute for any given candidate, regardless of the total volume, so during periods of high volume we drop tweets. How many?
The first Trump tweet was retweeted about 18,000 times and the second about 25,000 times. Of those retweets, we collected 10,000 and 13,000, respectively (roughly 56% and 52%). So during this surge in mentions, we collected about half of the total tweets mentioning Elizabeth Warren. A tolerable spill rate for sure.2
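The sampling arithmetic above is easy to reproduce. This is a minimal Python sketch, not the actual Marvelous pipeline: the polling interval, the per-search cap, and the rounded retweet counts are the figures quoted in this post, and the function names are hypothetical.

```python
# Hypothetical sketch reproducing the sampling arithmetic from this post.
# All figures are the rounded numbers quoted above, not pipeline output.

SEARCH_INTERVAL_MIN = 2      # one search per candidate every two minutes
MAX_TWEETS_PER_SEARCH = 100  # per-request cap of the search API


def max_sample_rate_per_minute(cap: int, interval_min: float) -> float:
    """Upper bound on tweets collected per minute for one candidate."""
    return cap / interval_min


def capture_fraction(collected: int, total: int) -> float:
    """Share of all retweets that ended up in our sample."""
    return collected / total


# (collected, total retweets) for the two Trump tweets, rounded
surges = {"first tweet": (10_000, 18_000), "second tweet": (13_000, 25_000)}

if __name__ == "__main__":
    rate = max_sample_rate_per_minute(MAX_TWEETS_PER_SEARCH, SEARCH_INTERVAL_MIN)
    print(f"ceiling: {rate:.0f} tweets/minute per candidate")
    for name, (got, total) in surges.items():
        print(f"{name}: {capture_fraction(got, total):.0%} collected")
    got_all = sum(g for g, _ in surges.values())
    total_all = sum(t for _, t in surges.values())
    print(f"combined: {capture_fraction(got_all, total_all):.0%} collected")
```

Pooling both tweets gives 23,000 of 43,000, or about 53%, which is where "about half" comes from.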
Retweet Bias
We would expect each of these tweets to be primarily of interest to Trump’s base. Looking at the distribution of users retweeting the first tweet, we see that this is basically the case.

Similarly for the second tweet:

The vast majority of users retweeting these comments are clearly right-of-center. Although I would expect the portion of the MAGA crowd drawn to “messages” like these to sit somewhat lower on the Commitment to Facts dimension,3 clicking through a number of these accounts confirms that they display the typical US right-wing (pro-Trump) iconography.
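One way to put a number on “skews to the right” is to place each retweeting account on a left-right axis and report the share on each side. The sketch below assumes a hypothetical lean score in [-1, 1] per user (negative = left, positive = right); it does not describe how Marvelous actually scores accounts, and the sample values in the test are made up purely to show the calculation.

```python
from __future__ import annotations

from statistics import median
from typing import NamedTuple, Sequence


class LeanSummary(NamedTuple):
    n: int
    share_right: float
    share_left: float
    median_lean: float


def summarize_lean(scores: Sequence[float]) -> LeanSummary:
    """Summarize hypothetical per-user lean scores.

    Negative scores count as left, positive as right; exact zeros
    count toward neither share.
    """
    if not scores:
        raise ValueError("no scored users")
    right = sum(1 for s in scores if s > 0)
    left = sum(1 for s in scores if s < 0)
    return LeanSummary(
        n=len(scores),
        share_right=right / len(scores),
        share_left=left / len(scores),
        median_lean=median(scores),
    )
```

The median is reported alongside the shares because a handful of extreme accounts can pull a mean around, while the median reflects where a typical retweeter sits.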
Replies and Quoted Tweets Require Mentions
Intuitively, I would expect tweets quoting (or replying to) these two Trump tweets to skew further to the left than the retweets do. When this happens with replies, the kids call it “the ratio.” Unfortunately, unlike retweets, quoted tweets are only collected by our process if they directly mention one of the potential 2020 presidential candidates. So we collect far fewer of them, and the ones we do collect are not particularly representative of “quotes (or replies) in general.”
For the first tweet, we do see a higher proportion of quote tweets located on the left, but the total number is too small to generalize from.

And the same applies to the second tweet.

The only real conclusion we can draw is that we are not collecting quoted tweets and replies in a manner that allows for their use in distribution experiments like this. If the political bias of these engagements becomes a matter of substantive interest, we would have to devise a strategy to collect them.
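To see why a small number of quote tweets can't support conclusions about their distribution, consider the uncertainty around any proportion computed from them. This sketch uses the Wilson score interval (our choice of method for illustration, not something the pipeline does), and the counts are hypothetical because the actual quote-tweet numbers aren't reported here.

```python
from __future__ import annotations

import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical counts: the same 60% "left" share at two sample sizes.
    for left, total in [(12, 20), (1_200, 2_000)]:
        lo, hi = wilson_interval(left, total)
        print(f"{left}/{total} on the left: {lo:.2f} to {hi:.2f}")
```

With 20 quote tweets, a 60% left share is compatible with anything from roughly 39% to 78%; with 2,000, the interval narrows to roughly 58% to 62%.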
Given the limitations of the Twitter search API and the technical cost of maintaining crawlers, we will content ourselves with counting retweets for the time being. The good news is that we seem to be doing that quite well.
1. We will return to the regularly scheduled programming about the Harris-Warren-Castro race tomorrow.
2. A fun side note: this puts the odds of collecting each of the original Trump tweets at roughly a coin flip. We came up tails both times and have neither of the original tweets in our collection.
3. Media Bias Fact Check has updated its ratings for hundreds of sites this year. Based on a cursory look at these changes, we suspect many of the sources popular with this crowd have been downgraded, typically from MIXED to LOW. Marvelous will incorporate these updates in the next few weeks.
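The coin-flip side note in footnote 2 can be made slightly more precise. A minimal Python sketch, assuming (our assumption, not something measured) that each original tweet was exactly as likely to be captured as its retweets and that the two captures were independent:

```python
# Hypothetical sketch: odds of missing both original Trump tweets.
# Assumes each original was captured at the same rate as its retweets
# and that the two captures are independent; neither is verified.

capture = {"first tweet": 10_000 / 18_000, "second tweet": 13_000 / 25_000}

p_miss_both = 1.0
for name, p in capture.items():
    p_miss_both *= 1 - p  # multiply the per-tweet miss probabilities
    print(f"{name}: {p:.0%} chance of capture")

print(f"chance of missing both: {p_miss_both:.0%}")
```

Under those assumptions, coming up tails twice had roughly a one-in-five chance (about 0.44 × 0.48 ≈ 0.21).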