Channel: Rants – ResearchBuzz
Four Strategies to Avoid Disinformation In Your Search Results

Yesterday the New York Times published an article called Spate of Mock News Sites With Russian Ties Pop Up in U.S. (that link goes to a gift article.) It’s about how Russia is planting disinformation sites on the Web and trying to disguise them as American news.

I was unhappy to read this story but I wasn’t surprised that there’s gunk in my search results. Google News has included dodgy garbage for years. And even outside of Russian activity, pink-slime journalism was and is a thing.

IT DOESN’T HAVE TO BE THIS WAY. There are available systems of authority and transparency that we can use to avoid having the detritus of information warfare clogging up our search results. There are strategies of time-constraint searching and applications of local context you can use to find local news.

In this article I will show you how to apply four strategies for avoiding disinformation in your search results:

1. Use authority
2. Use transparency
3. Use time constraints
4. Use local context

These strategies can be applied with SearchTweaks.com, a set of 18 tools I made for making your search results better. SearchTweaks.com is free, has no advertising, and uses SimpleAnalytics so it’s privacy-friendly. It was designed with the desktop in mind, though, so it might not look great on your phone. Let’s get started!

1. Use Authority

Marion’s Monocle

When you search Google News, you’re trusting Google. You’re trusting Google to provide you with search results for verified, valid outlets.

But why are you doing that? Google is a lot less particular about what is “news” content than it used to be. And Google hasn’t made any promises about the quality of its content. It’s not like Google is some kind of government agency with the authority to issue licenses for news-producing entities like television stations. That would make it the Federal Communications Commission.

So why don’t we use the FCC when we search for news?

One of the jobs of the FCC is to issue licenses for local television stations. When I watch WRAL, I know that the TV station is located in North Carolina, serves North Carolina, and was issued an operating license by an American government agency. I don’t have to trust Google or try to parse a domain name. Therefore, if I want to search for news relevant to North Carolina, it makes sense to use the FCC’s authority to inform the location and authenticity of the news outlets I’m searching.

The SearchTweaks tool to do that is called Marion’s Monocle TV Search:

Screenshot of the search tool Marion's Monocle. It's just the home page, not much happening here.

 

Marion’s Monocle uses the FCC’s Licensing & Databases Public Inspection Files to find TV stations by state. You can then bundle the stations into a Google News search or browse them for recent news. Let’s use a North Carolina news example: the recent nomination of Mark Robinson as GOP candidate for North Carolina governor.

The first step for Marion’s Monocle is to use the drop-down menu to choose a state. We’ll pick North Carolina. MM will ask FCC for its North Carolina TV station information and display it for you, with the cities having the most TV stations listed first.

Screenshot of Marion's Monocle. North Carolina has been chosen from the dropdown menu and there are several cities and their TV stations listed, starting with Charlotte and going to Wilmington, Raleigh, Goldsboro, and Greenville.

 

(PLEASE NOTE: Occasionally, especially when getting a station listing for a large state like Texas, the FCC will return a 502 error and the program will fail. Please reload and try again. This started in late 2023. I wrote to the FCC at the specified address to let them know about this on January 8, but they didn’t respond, and I can’t fix it from this end.)

Once you click on a checkbox to choose a TV station, the two buttons on top will change and give you the option to either search that site on Google News or browse the last 24 hours’ news on that site (as indexed by Google, of course.)

A screenshot showing Marion's Monocle in action. Several NC TV stations have been selected and there are now buttons to search those stations' Web space or browse recent news.

Click on one of the options and a Google News search result page will open in a new tab. Here’s recent news from the stations I selected:

From this search you can change the timespan, add additional search terms, change the sorting, etc. But no matter how you change the basic parameters of these search results, the core principle remains the same: all the results you’ll see here are from news sites whose location and operating authority you can verify. A Russian disinformation site would have to get an FCC license to appear in these search results. That’s a tough barrier!
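Here is a minimal sketch of the bundling idea, not the actual Marion’s Monocle code. The station domains are the Raleigh ones that show up later in this article; the news.google.com URL pattern, the when:1d operator for "last 24 hours," and all the function names are my assumptions for illustration.

```python
from urllib.parse import quote_plus

# Assumed URL pattern for a Google News search; not taken from the tool itself.
NEWS_BASE = "https://news.google.com/search?q="


def site_filter(domains):
    """Join domains into one parenthesized group of site: operators."""
    return "(" + " | ".join(f"site:{d}" for d in domains) + ")"


def search_url(query, domains):
    """Search the selected stations' Web space for a query."""
    return NEWS_BASE + quote_plus(f"{query} {site_filter(domains)}")


def browse_url(domains):
    """Browse recent items from the selected stations.

    when:1d is assumed here to mean "the last 24 hours".
    """
    return NEWS_BASE + quote_plus(f"{site_filter(domains)} when:1d")


stations = ["wral.com", "fox50.com", "raleighcw.com"]
print(search_url('"Mark Robinson"', stations))
print(browse_url(stations))
```

Because the site: group is built from a list you chose, you can always look at the query and see exactly which outlets are allowed into the results.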

A tough barrier but also a fairly small data pool. There are only so many TV stations in the United States, after all. But the government also has authority over other Web spaces, like the .edu top-level domain used by universities and other higher-education institutions.

Super Edu Search

You can register a .com domain name, a .org domain name, or even a novelty domain name like .me or .club. But you can’t go to your favorite domain name registrar and register a domain name with .edu or .mil or .gov. That’s because those are top-level domains restricted to specific kinds of entities.

That gives .edu Web sites their own kind of authority that you can take advantage of in your Web search. It’s not as good as Marion’s Monocle because university Web sites host all kinds of content and some of it is spammy, junk, etc. But as a first-level filter it works great to get rid of huge swathes of garbage without much tweaking. Making your search queries really specific can help avoid the rest.

Super Edu Search uses the Department of Education College Scorecard API to search American college and university Web sites in a way that goes miles beyond the traditional site:edu option.

A screenshot of Super Edu Search from SearchTweaks.com . A series of dropdown menus allow you to select the characteristics of the higher education institutions you want to search.

A series of dropdown menus allows you to select the demographic characteristics of the higher education institutions you want to search. That can be by location (all universities in North Carolina) or ownership (all private universities) or religion (all Catholic universities) or minority emphasis (all HBCUs.) You can mix and match the options as well — for example, I might want to search all HBCUs in North Carolina for Women’s History Month. I put in the query, select the demographics, and then click the big green button. Super Edu Search generates a Google search result URL.

A screenshot of Super Edu Search at work. The query is Women's History Month and demographics of HBCUs in North Carolina have been selected. One search URL has been generated.

Click on the URL generated and it’ll open a Google search result in a new tab:

Screenshot of a Google search result for "Women's History Month (site:bennett.edu | site:ecsu.edu | site:uncfsu.edu | site:jcsu.edu | site:livingstone.edu | site:ncat.edu | site:nccu.edu | site:st-aug.edu | site:shawu.edu | site:wssu.edu)"
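The select-then-bundle step behind a query like that one can be sketched as follows. This is an illustration under assumptions, not Super Edu Search itself: the records are hand-made stand-ins (the real tool pulls its data from the College Scorecard API), the two "example-" domains are hypothetical, and the field names state and hbcu are mine.

```python
from urllib.parse import quote_plus

# Hand-made sample records. ncat.edu and nccu.edu appear in the search above;
# the two "example-" entries are hypothetical and only there to be filtered out.
SCHOOLS = [
    {"domain": "ncat.edu", "state": "NC", "hbcu": True},
    {"domain": "nccu.edu", "state": "NC", "hbcu": True},
    {"domain": "example-private.edu", "state": "NC", "hbcu": False},
    {"domain": "example-outofstate.edu", "state": "VA", "hbcu": True},
]


def select_domains(schools, **criteria):
    """Keep the domains of schools matching every requested characteristic."""
    return [
        s["domain"]
        for s in schools
        if all(s.get(field) == wanted for field, wanted in criteria.items())
    ]


def edu_search_url(query, domains):
    """Bundle the selected domains into one Google search URL."""
    sites = " | ".join(f"site:{d}" for d in domains)
    return "https://www.google.com/search?q=" + quote_plus(f"{query} ({sites})")


nc_hbcus = select_domains(SCHOOLS, state="NC", hbcu=True)
print(nc_hbcus)
print(edu_search_url("Women's History Month", nc_hbcus))
```

Mixing and matching options is just a matter of passing more criteria; every extra one narrows the list of domains before the search URL is built.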

When you make a particular Web space restricted, either through licensing of the entity or via access to the space itself, you’re creating a data pool that’s very resistant to casual attempts at disinformation. Of course, the disadvantage of an authority-restricted Web space is that it’s only going to allow so much content and so many entities.

When the data pools afforded by authority are too small for you to search, your next option is to use transparency.

2. Use Transparency

Instead of directly searching Google News, the idea with using transparency is to use an outside source to discover news sources and then bundle the ones you find most useful into a Google News search. You won’t have the full confidence that you’d get searching FCC-licensed entities, but the ability to easily aggregate and change the selection of news sources means the data pool is much more open to scrutiny than the results from a pool of non-curated sources.

But what outside source can you use to find news sources? Wikipedia, of course! Let me show you how Non-Sketchy News Search works.

Non-Sketchy News Search

A screenshot of Non-Sketchy News Search. Nothing's happening at the moment, it's just a form to specify a Google search and a keyword by which you want to search Wikipedia for news outlets.

Non-Sketchy News Search keyword-searches Wikipedia for news outlets then presents them to you in a list along with a description and a direct link to the outlet’s Web site. You can then choose the ones you want to include in a Google search. Click on the Generate Google Search button and your search with the selected sites will open in a new window.
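If you are curious what a keyword search of Wikipedia looks like under the hood, here is a rough sketch using the public MediaWiki search API. I am not claiming this is how Non-Sketchy News Search does it: appending "newspaper" to the keyword is my own guess at a way to narrow results, the function names are hypothetical, and the sample response is hand-written so the example runs without a network connection.

```python
from urllib.parse import urlencode

WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"


def outlet_search_url(keyword, limit=20):
    """Build a MediaWiki full-text search request for a keyword."""
    params = {
        "action": "query",
        "list": "search",
        # Adding "newspaper" is an assumption, not the tool's actual query.
        "srsearch": f"{keyword} newspaper",
        "srlimit": limit,
        "format": "json",
    }
    return WIKIPEDIA_API + "?" + urlencode(params)


def titles_from_response(response):
    """Pull page titles out of a decoded search response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]


# Hand-written stand-in for a decoded JSON response.
sample = {
    "query": {
        "search": [
            {"title": "Sarasota Herald-Tribune"},
            {"title": "The Palm Beach Post"},
        ]
    }
}
print(outlet_search_url("Florida"))
print(titles_from_response(sample))
```

The point of the transparency strategy is the step that comes next: a human looks at the list and decides which outlets belong in the search.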

Let’s do an example with the default search on the site. Say you want to do a Google search for “banned books” but you want to find news outlets in Florida. Click on the Search News Sources button and after a moment you’ll get a list of search results:

Screenshot of Non-Sketchy News Search at work. A list of media sources containing the keyword "Florida" have been displayed along with a description and Web site. Each source has a checkbox beside it and a few of them have been ticked.

As you see, I have ticked some of the checkboxes. After I tick the ones I choose and click the “Generate Google Search” button (you can see that in the screenshot before this one), I’ll get a set of Google results in a new window:

A screenshot of search results for (site:heraldtribune.com | site:southfloridagaynews.com | site:palmbeachpost.com | site:www.dailycommercial.com) "banned books"

No, you won’t get umpty-billion search results like you will with an open Google search, but if you’re careful about your topic and your sources you can generate a useful, substantial set of results. And you will know why each source is in your search results, because you’re the one who put them there!

Of course, using transparency presupposes you can clearly identify a theme for the media you want to search. In the example above I’m looking for banned book news from Florida. There are ways I could change that via searching for different keywords (maybe business or Spanish) but I’m still trying to build a relevant set of sources to search.

Sometimes, though, the topic you’re searching defies that kind of categorization. Or it’s a big story that’s spanned a long period of time and a simple source search isn’t going to work. In that case, you can change the data pool you’re getting results from by employing time constraints in your search. Let me show you Back That Ask Up and TimeCake along with a little thing I whipped up and put on GitHub.

3. Use Time Constraints

When Web search engines first came on the scene, they usually took a while to index new Web pages. Like, four to six weeks. I know that seems pretty unbelievable in these days when Web pages are indexed almost instantly (I wrote an article in 2016 that Google indexed within five minutes) but it’s true.

Since pages are indexed much more quickly nowadays, it’s easy for current event disinformation/propaganda/similar garbage to make it into your search results. That’s the bad news. The good news is that because Web pages are indexed so quickly, you can use time-bounded searching to shape the information spaces you’re searching in a meaningful way. It’s not all about searching the last hour/day/month/year, however — instead, I find time-bounded searching useful to help me exclude content from my search results as well as slice my results into meaningful sets. Let’s look at Back That Ask Up and TimeCake.

Back That Ask Up

Back that Ask Up and TimeCake are both part of SearchTweaks’ time-related search tools:

Screenshot of Back That Ask Up showing the form for a Google news query and then three dropdowns to eliminate recent search results by day, month, or year.

Back that Ask Up makes it really easy to remove recent results from your search results — up to the last 7 days, last 12 months, or last 20 years. I find Back that Ask Up most useful when there’s a current event that’s just overwhelming your search results and you want to get rid of it. Let’s use Joe Biden as an example. Last night he gave a State of the Union speech, and if you search for him on Google News right now that’s pretty much all you’re going to get — news about the speech, reactions to the speech, people who were at the speech, etc. But all the results you get are oriented toward AFTER his speech. What if you were looking for news on the run-up to the speech? You might decide to search for “Joe Biden” “State of the Union” and then remove the most recent two days’ worth of results with Back That Ask Up. Suddenly your search results are transported back to before the speech:

Screenshot of Google News search results for "Joe Biden" "State of the Union" between January 1, 1900 and March 6, 2024.

There are results from 2022 and 2023 in the initial set of results, but you can make them more immediately relevant by sorting by date.
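The date arithmetic behind "remove the last N days, months, or years" can be sketched like this. It is an illustration, not the Back That Ask Up source: I am using Google’s before: search operator to express the cutoff, the function name is made up, and whether the cutoff day itself is included in the results is something I have not verified.

```python
import calendar
from datetime import date, timedelta


def back_up(query, today, days=0, months=0, years=0):
    """Append a before: cutoff that drops the most recent results."""
    cutoff = today - timedelta(days=days)
    # Step back whole months and years on the calendar.
    total_months = cutoff.year * 12 + (cutoff.month - 1) - months - years * 12
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    # Clamp the day so that, say, March 31 minus one month lands on a real date.
    day = min(cutoff.day, calendar.monthrange(year, month)[1])
    return f"{query} before:{date(year, month, day).isoformat()}"


# Two days back from March 8, 2024 gives the March 6 cutoff in the screenshot.
print(back_up('"Joe Biden" "State of the Union"', date(2024, 3, 8), days=2))
```

Passing the current date in as an argument, rather than reading the clock inside the function, makes it easy to reproduce an old search later.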

Back that Ask Up works great when there are temporal markers you can easily identify — you want to find news from before a specific incident happened, before a particular administration was in office, before a certain person joined a sports team, etc. But sometimes you’re less interested in avoiding mention of an event and more interested in seeing how search results have changed over time. For that you can use TimeCake.

TimeCake

Screenshot of TimeCake. There's a text form for a Google query and then selectors for a starting and ending year and years of interval.

Whereas Back that Ask Up is for generating one search at a time, TimeCake is designed to create a set of time-bounded Google News searches. Enter a query along with a starting year (earliest is 1999) and ending year along with a number of years to use as an interval, and TimeCake will spit out a list of Google search URLs.

Say I want to search for state of the union address information by year. TimeCake isn’t really designed to make search URLs year by year, but you can do it if you choose a search interval of 0.
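Slicing a span of years into intervals can be sketched as below. This is my reading of the behavior described here, not TimeCake’s code: I assume each slice covers the interval plus one calendar year (so an interval of 0 gives one slice per year), I express the bounds with Google’s after: and before: operators, and I have not checked whether those operators include the boundary days.

```python
def time_slices(start_year, end_year, interval):
    """Split start_year..end_year into consecutive (first, last) year pairs."""
    slices = []
    year = start_year
    while year <= end_year:
        last = min(year + interval, end_year)  # never run past the ending year
        slices.append((year, last))
        year = last + 1
    return slices


def slice_queries(query, start_year, end_year, interval):
    """Turn each slice into a date-bounded query string."""
    return [
        f"{query} after:{first}-01-01 before:{last}-12-31"
        for first, last in time_slices(start_year, end_year, interval)
    ]


for q in slice_queries("politics State of the Union", 2016, 2024, 0):
    print(q)
```

With an interval of 2, a span like 2000 to 2020 comes out as seven slices, which is a manageable number of result pages to review side by side.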

Screenshot of TimeCake in action. The query is politics State of the Union, the starting year is 2016, and the ending year is 2024. Underneath the search form is the start of the list of search result URLs.

Click on one of the Google search URLs and it’ll open in a new window. You’ll see that even for a general search like politics State of the Union you can get a specific type of results when viewed through the lens of time.

A screenshot of Google search results for "politics state of the union" restricted to the year 2016.

I actually made TimeCake because of the swimming giraffe question. Apparently there’s been some controversy over the idea of whether giraffes can swim, and when I made TimeCake and created a set of seven time slices over 20ish years, I was able to review the results and see, to a certain extent, the evolution of the discussion. I encourage you to try slicing your search into time-bounded search results and reviewing it at least once; I think you’ll discover that there are specific keywords and concepts that you can use to refine your search in particular temporal directions without having to set date parameters.

(I’m working on a program to automatically extract temporal-contextual concepts and apply them to Web searches in the same way that you might extract topical-contextual concepts, but it’s slow going. Stay tuned.)

Before we look at the last search strategy, Use Local Context, let me give you a little bonus tool. A few months ago Search Engine Journal reported that Google was under a spam attack and a lot of junk was ending up in Google’s search results. Google was getting rid of the garbage, but it was taking a little while. In response to that I made the search-spam-skimmer, which you can get from GitHub. It’s a bookmarklet: a bookmark with a little JavaScript added. Save it somewhere like your bookmarks bar where you can click on it to open it. When you click it, it’ll ask you for your Google query, which it will then “translate” to a date-bounded Google query that ends two days before today. The idea is you’re removing the last 48 hours’ worth of indexed content in the hope that Google will have cleared out any spam older than that.

So far in this article I’ve shown you how to find and search sources in a number of different ways. In this last part I’m going to show you how to use local context to find news sources that are actually local using Street Scoop and School Scoop.

4. Use Local Context

Raleigh, being the capital of North Carolina, has easily findable media. There are TV stations and newspapers in the city; you know what media covers Raleigh. But what about Apex, North Carolina?

If you live in North Carolina, you probably know that Apex and Raleigh are both in Wake County, and that Raleigh’s media often covers Apex. But if you don’t live in North Carolina you’d have to go look up Apex, find which metro area it’s in, then find its media. That’s a lot of work, and you might get a bum steer somewhere!

But what if you could enter an Apex street address, determine the nearest metro area, and automatically do a search of the nearest TV stations’ Web space? You can, thanks to the FCC information file, SimpleMaps, and Street Scoop.

Street Scoop

A screenshot of Street Scoop. There's a text form to enter Street and City and then specify American state.

Street Scoop can be a bit finicky, but when it works it’s lots of fun. Enter an American street address, but don’t enter suite/apartment numbers, commas, etc. Click Submit and the program will find the closest metro area to your address, use the FCC information file to get the TV stations for that area, and then search those stations’ Web space for the street name you specified. In the example above I’m searching for news about 911 South Hughes Street in Apex. I click the button and this result opens in a new tab.

Screenshot of Google search results for "South Hughes Street" (site:raleighcw.com | site:wral.com | site:fox50.com)

I get three results and can confirm that Apex is in the Raleigh metro area, so using Raleigh media sources should be relevant to Apex.
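The "closest metro area" step can be sketched with a great-circle distance calculation. This is an illustration under assumptions rather than Street Scoop’s implementation: I assume the address has already been turned into latitude and longitude, the metro coordinates are rounded approximations I typed in by hand, and the lookup tables and function names are hypothetical. The Raleigh station domains are the ones from the search above.

```python
from math import asin, cos, radians, sin, sqrt

# Approximate city-center coordinates, rounded; illustrative only.
METROS = {
    "Raleigh": (35.78, -78.64),
    "Charlotte": (35.23, -80.84),
    "Wilmington": (34.23, -77.94),
}

# Hypothetical metro-to-station lookup; only Raleigh is filled in.
METRO_SITES = {"Raleigh": ["raleighcw.com", "wral.com", "fox50.com"]}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_metro(lat, lon):
    """Name of the metro whose center is closest to the given point."""
    return min(METROS, key=lambda name: haversine_km(lat, lon, *METROS[name]))


def street_query(street, lat, lon):
    """Search the nearest metro's station sites for a quoted street name."""
    sites = METRO_SITES.get(nearest_metro(lat, lon), [])
    return f'"{street}" (' + " | ".join(f"site:{s}" for s in sites) + ")"


# Roughly where Apex, North Carolina sits.
print(nearest_metro(35.73, -78.85))
print(street_query("South Hughes Street", 35.73, -78.85))
```

Quoting the street name keeps it together as a phrase, and the site: group is what stops every other South Hughes Street in the country from showing up.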

I also got a little bit of information about South Hughes Street, and that’s another thing Street Scoop is good for: telling you about the prominence of a particular area, such as whether it’s residential or not, whether it has crime, etc. You might do a search and get only a few results like you do here. On the other hand, here are the results of a search for 6005 Glenwood Avenue in Raleigh:

One search is all you need to know that Glenwood Avenue is a major street in Raleigh. It even has its own section on the WRAL Web site!

Street Scoop is cool for finding hyperlocal news, but it’s still limited in its results because it uses the FCC to define its search resources. It has to do that because street names are repeated constantly — how many “Main Street”s do you think there are in America? If Street Scoop weren’t using the FCC to define metro areas and the media therein, it would generate results from all kinds of different places, even if you included a city and state in the search query.

In order to do the kind of local search that lets you identify media sources in an area, you’d have to do a search of such detail that it would be unlikely to produce results from more than one area. A street name isn’t specific enough for that. But school names often are! Let me show you School Scoop.

School Scoop

School Scoop in action. the dropdown menus are set to Apex, North Carolina, and a table of results shows city, school level, and news search information for Peak Charter Academy, The Math and Science Academy of Apex, and West Lake Elementary.

School Scoop uses information from the Department of Education to list K-12 schools by city and state, and then bundles them into search results. In the example above I have chosen to look at schools in Apex, North Carolina. Each school has the city, school name, level, and four news links. The links are all Google News searches for the city name and the school name, along with some modifiers for the non-“Full News” searches.

Here’s the Full News search result for West Lake Elementary:

A screenshot of a Google search result for Apex "West Lake Elementary". There are ten results and they're all appropriately local.

In this case you can see that WRAL and other TV stations are represented in the results, but the first result is from the Raleigh News & Observer, a newspaper.
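A Full News query of this kind is simple enough to sketch in a few lines. This is not School Scoop’s code: the news.google.com URL pattern and the function names are my assumptions, and I have left out the modifiers used by the other three links because I don’t want to guess at them. The school names are the Apex ones listed above.

```python
from urllib.parse import quote_plus


def full_news_query(city, school):
    """City name plus the school name as an exact phrase."""
    return f'{city} "{school}"'


def full_news_url(city, school):
    """Assumed Google News URL pattern for the query."""
    return "https://news.google.com/search?q=" + quote_plus(full_news_query(city, school))


apex_schools = [
    "Peak Charter Academy",
    "The Math and Science Academy of Apex",
    "West Lake Elementary",
]
for school in apex_schools:
    print(full_news_url("Apex", school))
```

Notice there is no site: group here at all. The combination of city and quoted school name does the filtering, which is why newspapers can show up alongside the TV stations.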

I’m surprised at how well School Scoop works. When I first started working on it I worried that the school names wouldn’t be unique enough, but between the school names and the city names the queries are usually specific enough to provide results from the same area. I’ve found that often news articles use schools as landmarks when reporting on other events and because of that School Scoop has occasionally found me some “nook and cranny” local media that might have been crushed by SEO otherwise.

Conclusion

Search engines like Google and Bing have made it the fashion to present search results in large blocks: here is the Web block, here is the news block. The problem is that it’s easy to plant information in such a huge data space, and the tools we have for filtering those blocks are very much as they were decades ago, except in some cases they’re worse (Oh, location: syntax, how I miss you.) Using third-party data for authority, transparency, and additional context, however, can whittle those big polluted data oceans into smaller, better-filtered, better-vetted spaces, making it harder for disinformation, propaganda, and similar infosewage to sneak in.

 

