Yesterday the New York Times published an article called Spate of Mock News Sites With Russian Ties Pop Up in U.S. (that link goes to a gift article.) It’s about how Russia is planting disinformation sites on the Web and trying to disguise them as American news.
I was unhappy to read this story but I wasn’t surprised that there’s gunk in my search results. Google News has included dodgy garbage for years. And even outside of Russian activity, pink-slime journalism was and is a thing.
IT DOESN’T HAVE TO BE THIS WAY. There are available systems of authority and transparency we can use to keep the detritus of information warfare from clogging up our search results, along with time-constrained search strategies and local-context techniques you can use to find genuinely local news.
Four Strategies to Avoid Disinformation In Your Search Results
In this article I will show you how to apply four strategies for avoiding disinformation in your search results:
1. Use authority
2. Use transparency
3. Use time constraints
4. Use local context
These strategies can be applied with SearchTweaks.com, a set of 18 tools I made to make your search results better. SearchTweaks.com is free, has no advertising, and uses SimpleAnalytics so it’s privacy-friendly. It was designed with the desktop in mind, though, so it might not look great on your phone. Let’s get started!
1. Use Authority
Marion’s Monocle
When you search Google News, you’re trusting Google. You’re trusting Google to provide you with search results for verified, valid outlets.
But why are you doing that? Google is a lot less particular about what is “news” content than it used to be. And Google hasn’t made any promises about the quality of its content. It’s not like Google is some kind of government agency with the authority to issue licenses for news-producing entities like television stations. That would make it the Federal Communications Commission.
So why don’t we use the FCC when we search for news?
One of the jobs of the FCC is to issue licenses for local television stations. When I watch WRAL, I know that the TV station is located in North Carolina, serves North Carolina, and was issued an operating license by an American government agency. I don’t have to trust Google or try to parse a domain name. Therefore, if I wanted to search for news relevant to North Carolina, it makes sense to use the FCC’s authority to inform the location and authenticity of the news outlets I’m searching.
The SearchTweaks tool to do that is called Marion’s Monocle TV Search:
Marion’s Monocle uses the FCC’s Licensing & Databases Public Inspection Files to find TV stations by state. You can then bundle the states into a Google News search or browse them for recent news. Let’s use a North Carolina news example — the recent nomination of Mark Robinson as GOP candidate for North Carolina governor.
The first step for Marion’s Monocle is to use the drop-down menu to choose a state. We’ll pick North Carolina. Marion’s Monocle will ask the FCC for its North Carolina TV station information and display it for you, with the cities having the most TV stations listed first.
(PLEASE NOTE: Occasionally, especially when getting a station listing for a large state like Texas, the FCC will return a 502 error and the program will fail. Please reload and try again. This started in late 2023 — I wrote to the FCC at the specified address to let them know about this on January 8, but they didn’t respond, and I can’t fix it from this end.)
Once you click on a checkbox to choose a TV station, the two buttons on top will change and give you the option to either search that site on Google News or browse the last 24 hours’ news on that site (as indexed by Google, of course.)
Click on one of the options and a Google News search result page will open in a new tab. Here’s recent news from the stations I selected:
From this search you can change the timespan, add additional search terms, change the sorting, etc. But no matter how you change the basic parameters of these search results the core principle remains the same: all the results you see here are from news sites whose location and operating authority you can verify. A Russian disinformation site would have to get an FCC license to appear in these search results. That’s a tough barrier!
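The bundling step is worth making concrete. Here’s a minimal sketch of how a set of station domains could be combined into a single site-restricted Google News search URL — the station sites are illustrative placeholders; Marion’s Monocle gets its actual domains from the FCC data:

```python
from urllib.parse import quote_plus

def bundle_news_search(query, station_sites):
    """Combine several station domains into one Google News search URL
    by OR-ing together site: operators."""
    sites = " OR ".join(f"site:{s}" for s in station_sites)
    return "https://news.google.com/search?q=" + quote_plus(f"{query} {sites}")

# Two example North Carolina station domains (illustrative, not
# pulled from the FCC file).
url = bundle_news_search("Mark Robinson", ["wral.com", "wcnc.com"])
```

Every result that search returns has to come from one of the listed domains — that’s the whole trick.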
A tough barrier, but also a fairly small data pool. There are only so many TV stations in the United States, after all. But the government also has authority over other Web spaces, like the restricted top-level domains used by universities and other higher-education institutions.
Super Edu Search
You can register a .com domain name, a .org domain name, or even a novelty domain name like .me or .club. But you can’t go to your favorite domain name registrar and register a domain name with .edu or .mil or .gov. That’s because those are top-level domains restricted to specific kinds of entities.
That gives .edu Web sites their own kind of authority that you can take advantage of in your Web search. It’s not as good as Marion’s Monocle because university Web sites host all kinds of content, some of it spammy or junky. But as a first-level filter it works great to get rid of huge swathes of garbage without much tweaking. Making your search queries really specific can help avoid the rest.
Super Edu Search uses the Department of Education College Scorecard API to search American college and university Web sites in a way that goes miles beyond the traditional site:edu option.
A series of dropdown menus allows you to select the demographic characteristics of the higher education institutions you want to search. That can be by location (all universities in North Carolina) or ownership (all private universities) or religion (all Catholic universities) or minority emphasis (all HBCUs.) You can mix and match the options as well — for example, I might want to search all HBCUs in North Carolina for Women’s History Month. I put in the query, select the demographics, and then click the big green button. Super Edu Search generates a Google search result URL.
Click on the URL generated and it’ll open a Google search result in a new tab:
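To show the shape of what Super Edu Search is doing, here’s a sketch that filters a couple of made-up institution records by demographics and bundles the survivors into a site-restricted Google search URL. The record fields and school names are my own illustrative assumptions, not the College Scorecard API’s actual response format:

```python
from urllib.parse import quote_plus

# Illustrative records shaped loosely like College Scorecard results;
# the real tool fetches these from the Department of Education API.
schools = [
    {"name": "Example A&T University", "state": "NC", "hbcu": True,
     "url": "exampleaandt.edu"},
    {"name": "Example Tech", "state": "VA", "hbcu": False,
     "url": "exampletech.edu"},
]

def edu_search_url(query, state=None, hbcu=None):
    """Filter institutions by the chosen demographics, then bundle
    their domains into one site-restricted Google search URL."""
    matches = [s for s in schools
               if (state is None or s["state"] == state)
               and (hbcu is None or s["hbcu"] == hbcu)]
    sites = " OR ".join(f"site:{s['url']}" for s in matches)
    return "https://www.google.com/search?q=" + quote_plus(f"{query} {sites}")

# All HBCUs in North Carolina, searched for Women's History Month.
url = edu_search_url('"women\'s history month"', state="NC", hbcu=True)
```

The mix-and-match of demographic filters is just a chain of conditions; the authority comes from the fact that only real, registered .edu institutions are in the pool to begin with.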
When you make a particular Web space restricted, either through licensing of the entity or via access to the space itself, you’re creating a data pool that’s very resistant to casual attempts at disinformation. Of course, the disadvantage of an authority-restricted Web space is that it’s only going to allow so much content and so many entities.
When the data pools afforded by authority are too small for you to search, your next option is to use transparency.
2. Use Transparency
Instead of directly searching Google News, the idea with using transparency is to use an outside source to discover news sources and then bundle the ones you find most useful into a Google News search. You won’t have the full confidence that you’d get searching FCC-licensed entities, but the ability to easily aggregate and change the selection of news sources means the data pool is much more open to scrutiny than the results from a pool of non-curated sources.
But what outside source can you use to find news sources? Wikipedia, of course! Let me show you how Non-Sketchy News Search works.
Non-Sketchy News Search
Non-Sketchy News Search keyword-searches Wikipedia for news outlets then presents them to you in a list along with a description and a direct link to the outlet’s Web site. You can then choose the ones you want to include in a Google search. Click on the Generate Google Search button and your search with the selected sites will open in a new window.
Let’s do an example with the default search on the site. Say you want to do a Google search for “banned books” but you want to find news outlets in Florida. Click on the Search News Sources button and after a moment you’ll get a list of search results:
As you see, I have ticked some of the checkboxes. After I tick the ones I choose and click the “Generate Google Search” button (you can see that in the screenshot before this one) I’ll get a set of Google results in a new window:
No, you won’t get umpty-billion search results like you will with an open Google search, but if you’re careful about your topic and your sources you can generate a useful, substantial set of results. And you will know why each source is in your search results, because you’re the one who put them there!
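For the curious, the source-discovery step can be sketched against the MediaWiki search API. This is a simplified version of the kind of lookup Non-Sketchy News Search performs; appending a term like “newspaper” to nudge results toward outlet articles is my own heuristic, not necessarily the tool’s exact query:

```python
from urllib.parse import urlencode

def wikipedia_outlet_search(keywords):
    """Build a MediaWiki search API request URL that looks for
    Wikipedia articles about news outlets matching the keywords."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"{keywords} newspaper",  # nudge toward outlet articles
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

url = wikipedia_outlet_search("Florida")
```

Fetching that URL returns JSON search results with titles and snippets — the descriptions you see in the tool’s list — which a human then vets before anything goes into the Google search. That human step is the transparency.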
Of course, using transparency presupposes you can clearly identify a theme for the media you want to search. In the example above I’m looking for banned book news from Florida. There are ways I could change that via searching for different keywords (maybe business or Spanish) but I’m still trying to build a relevant set of sources to search.
Sometimes though, the topic you’re searching defies that kind of categorization. Or it’s a big story that’s spanned a long period of time and a simple source search isn’t going to work. In that case, you can change the data pool you’re getting results from by employing time constraint in your search. Let me show you Back that Ask Up and TimeCake along with a little thing I whipped up and put on GitHub.
3. Use Time Constraints
When Web search engines first came on the scene, they usually took a while to index new Web pages. Like, four to six weeks. I know that seems pretty unbelievable in these days when Web pages are indexed almost instantly (I wrote an article in 2016 that Google indexed within five minutes) but it’s true.
Since pages are indexed much more quickly nowadays, it’s easy for current event disinformation/propaganda/similar garbage to make it into your search results. That’s the bad news. The good news is that because Web pages are indexed so quickly, you can use time-bounded searching to shape the information spaces you’re searching in a meaningful way. It’s not all about searching the last hour/day/month/year, however — instead, I find time-bounded searching useful to help me exclude content from my search results as well as slice my results into meaningful sets. Let’s look at Back That Ask Up and TimeCake.
Back That Ask Up
Back that Ask Up and TimeCake are both part of SearchTweaks’ time-related search tools:
Back that Ask Up makes it really easy to remove recent results from your search results — up to the last 7 days, last 12 months, or last 20 years. I find Back that Ask Up most useful when there’s a current event that’s just overwhelming your search results and you want to get rid of it. Let’s use Joe Biden as an example. Last night he gave a State of the Union speech, and if you search for him on Google News right now that’s pretty much all you’re going to get — news about the speech, reactions to the speech, people who were at the speech, etc. But all the results you get are oriented toward AFTER his speech. What if you were looking for news on the run-up to the speech? You might decide to search for “Joe Biden” “State of the Union” and then remove the most recent two days’ worth of results with Back That Ask Up. Suddenly your search results are transported back to before the speech:
There are results from 2022 and 2023 in the initial set of results, but you can make it more immediately relevant using sort by date.
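The mechanics behind “remove the most recent two days” come down to Google’s custom date range parameter. Here’s a sketch of the idea — the `tbs=cdr` URL format is the one Google’s own date-range picker generates, though Google doesn’t formally document it, so treat this as an observed convention rather than a guaranteed API:

```python
from datetime import date, timedelta
from urllib.parse import quote_plus

def back_that_ask_up(query, days_to_remove, today=None):
    """Build a Google search URL whose custom date range ends
    `days_to_remove` days before today, hiding the newest results.
    Google's cdr parameter takes M/D/YYYY dates."""
    today = today or date.today()
    cutoff = today - timedelta(days=days_to_remove)
    cd_max = f"{cutoff.month}/{cutoff.day}/{cutoff.year}"
    return (f"https://www.google.com/search?q={quote_plus(query)}"
            f"&tbs=cdr:1,cd_max:{cd_max}")

# Pin "today" for a reproducible example.
url = back_that_ask_up('"Joe Biden" "State of the Union"', 2,
                       today=date(2024, 3, 8))
```

Setting only `cd_max` (and no `cd_min`) means you get everything up to the cutoff — the whole pre-speech conversation, nothing after.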
Back that Ask Up works great when there are temporal markers you can easily identify — you want to find news from before a specific incident happened, before a particular administration was in office, before a certain person joined a sports team, etc. But sometimes you’re less interested in avoiding mention of an event and more interested in seeing how search results have changed over time. For that you can use TimeCake.
TimeCake
Whereas Back that Ask Up is for generating one search at a time, TimeCake is designed to create a set of time-bounded Google News searches. Enter a query along with a starting year (earliest is 1999) and ending year along with a number of years to use as an interval, and TimeCake will spit out a list of Google search URLs.
Say I want to search for state of the union address information by year. TimeCake isn’t really designed to make search URLs year by year, but you can do it if you choose a search interval of 0.
Click on one of the Google search URLs and it’ll open in a new window. You’ll see that even for a general search like politics State of the Union you can get a specific type of results when viewed through the lens of time.
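Generating the slices is simple date arithmetic. Here’s a minimal sketch of the TimeCake idea — again leaning on Google’s informal `tbs=cdr` date-range convention, with the interval-of-0 behavior producing one search per year:

```python
from urllib.parse import quote_plus

def timecake(query, start_year, end_year, interval):
    """Generate a list of year-bounded Google search URLs, one per
    time slice. An interval of 0 yields a search for every year."""
    urls = []
    year = start_year
    while year <= end_year:
        slice_end = min(year + interval, end_year)
        urls.append(
            f"https://www.google.com/search?q={quote_plus(query)}"
            f"&tbs=cdr:1,cd_min:1/1/{year},cd_max:12/31/{slice_end}")
        year = slice_end + 1
    return urls

# One search URL per year, 2020 through 2023.
urls = timecake("politics State of the Union", 2020, 2023, 0)
```

Open the slices side by side and you’re effectively paging through the topic’s history one window at a time.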
I actually made TimeCake because of the swimming giraffe question. Apparently there’s been some controversy over the idea of whether giraffes can swim, and when I made TimeCake and created a set of seven time slices over 20ish years, I was able to review the results and see, to a certain extent, the evolution of the discussion. I encourage you to try slicing your search into time-bounded search results and reviewing it at least once; I think you’ll discover that there are specific keywords and concepts that you can use to refine your search in particular temporal directions without having to set date parameters.
(I’m working on a program to automatically extract temporal-contextual concepts and apply them to Web searches in the same way that you might extract topical-contextual concepts, but it’s slow going. Stay tuned.)
Before we look at the last search strategy, Use Local Context, let me give you a little bonus tool. A few months ago Search Engine Journal reported that Google was under a spam attack and a lot of junk was ending up in Google’s search results. Google was getting rid of the garbage, but it was taking a little while. In response to that I made the search-spam-skimmer, which you can get from GitHub. It’s a bookmarklet — a bookmark with a little JavaScript added. Save it somewhere like your bookmarks bar where you can click on it to open it. When you click it, it’ll ask you for your Google query, which it will then “translate” to a date-bounded Google query that ends two days before today. The idea is you’re removing the last 48 hours’ worth of indexed content in the hope that Google will have cleared out any spam older than that.
So far in this article I’ve shown you how to find and search sources in a number of different ways. In this last part I’m going to show you how to use local context to find news sources that are actually local using Street Scoop and School Scoop.
4. Use Local Context
Raleigh, being the capital of North Carolina, has easily-findable media. There are TV stations and newspapers in the city — you know what media covers Raleigh. But what about Apex, North Carolina?
If you live in North Carolina, you probably know that Apex and Raleigh are both in Wake County, and that Raleigh’s media often covers Apex. But if you don’t live in North Carolina you’d have to go look up Apex, find which metro area it’s in, then find its media. That’s a lot of work, and you might get a bum steer somewhere!
But what if you could enter an Apex street address, determine the nearest metro area, and automatically do a search of the nearest TV stations’ Web space? You can, thanks to the FCC information file, SimpleMaps, and Street Scoop.
Street Scoop
Street Scoop can be a bit finicky, but when it works it’s lots of fun. Enter an American street address, but don’t enter suite/apartment numbers, commas, etc. Click Submit and the program will use the FCC information file to find the closest metro area to your address, get the TV stations for that area, and then search those stations’ Web space for the street name you specified. In the example above I’m searching for news about 911 South Hughes Street in Apex. I click the button and this result opens in a new tab.
I get three results and can confirm that Apex is in the Raleigh metro area, so using Raleigh media sources should be relevant to Apex.
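The “nearest metro” step is classically done with a great-circle distance calculation over a table of city coordinates, which is the kind of data SimpleMaps provides. Here’s a sketch with two hand-entered metro coordinates; I don’t know Street Scoop’s exact internals, so take the haversine approach as an assumption about how such a lookup could work:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 3956 * 2 * asin(sqrt(a))  # 3956 mi ≈ Earth's radius

# Illustrative metro coordinates, SimpleMaps-style.
metros = {
    "Raleigh, NC": (35.7796, -78.6382),
    "Charlotte, NC": (35.2271, -80.8431),
}

def nearest_metro(lat, lon):
    """Return the metro whose coordinates are closest to the point."""
    return min(metros, key=lambda m: haversine_miles(lat, lon, *metros[m]))

# Apex, NC sits at roughly these coordinates.
metro = nearest_metro(35.7327, -78.8503)
```

Apex resolves to the Raleigh metro, which is exactly why Raleigh TV stations are the right pool to search for an Apex street.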
I also got a little bit of information about South Hughes Street, and that’s another thing Street Scoop is good for — giving you a sense of a particular area’s profile: whether it’s residential, whether it sees much crime, etc. You might do a search and get only a few results like you do here. On the other hand, here’s the results of a search for 6005 Glenwood Avenue in Raleigh:
One search is all you need to know that Glenwood Avenue is a major street in Raleigh. It even has its own section on the WRAL Web site!
Street Scoop is cool for finding hyperlocal news, but it’s still limited in its results because it uses the FCC to define its search resources. It has to do that because street names are repeated constantly — how many “Main Street”s do you think there are in America? If Street Scoop weren’t using the FCC to define metro areas and the media therein, it would generate results from all kinds of different places, even if you included a city and state in the search query.
In order to do the kind of local search that lets you identify media sources in an area, you’d have to do a search of such detail that it would be unlikely to produce results from more than one area. A street name isn’t specific enough for that. But school names often are! Let me show you School Scoop.
School Scoop
School Scoop uses information from the Department of Education to list K-12 schools by city and state, and then bundles them into search results. In the example above I have chosen to look at schools in Apex, North Carolina. Each school has the city, school name, level, and four news links. The links are all Google News searches for the city name and the school name, along with some modifiers for the non-“Full News” searches.
Here’s the Full News search result for West Lake Academy:
In this case you can see that WRAL and other TV stations are represented in the results, but the first result is from the Raleigh News & Observer, a newspaper.
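The query construction here is the simplest of all the tools: pair the school name with the city name and the combination is usually unique enough to stay local. A sketch — the modifier behavior for the non-“Full News” links is my guess at how the variants differ, not a confirmed detail:

```python
from urllib.parse import quote_plus

def school_news_url(city, school, modifier=""):
    """Build a Google News search pairing a school name with its city —
    a combination that's usually specific enough to stay local.
    `modifier` stands in for the extra terms the non-"Full News"
    link variants would add (my assumption)."""
    query = f'"{school}" "{city}" {modifier}'.strip()
    return "https://news.google.com/search?q=" + quote_plus(query)

url = school_news_url("Apex", "West Lake Academy")
```

No site: restriction at all this time — the local context itself does the filtering.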
I’m surprised at how well School Scoop works. When I first started working on it I worried that the school names wouldn’t be unique enough, but between the school names and the city names the queries are usually specific enough to provide results from the same area. I’ve found that often news articles use schools as landmarks when reporting on other events and because of that School Scoop has occasionally found me some “nook and cranny” local media that might have been crushed by SEO otherwise.
Conclusion
Search engines like Google and Bing have made it the fashion to present search results in large blocks — here is the Web block, here is the news block. The problem is that it’s easy to plant information in such a huge data space, and the tools we have for filtering those are very much as they were decades ago, except in some cases they’re worse (Oh, location: syntax, how I miss you.) Using third-party data for authority, transparency, and additional context, however, can whittle those big polluted data oceans into smaller, better-filtered, better-vetted spaces, making it harder for disinformation, propaganda, and similar infosewage to sneak in.