Why It’s Hard to Identify a Cancer Cluster and How It’s Similar to Problems in Econ

Last week, the Pitt News had a nice article on an October 7 meeting between the Pennsylvania Department of Health and residents of Washington County.

If you’re local to Western PA, you’ve probably heard about the recent increase in childhood cancer in Washington and Westmoreland counties (and if you’re not familiar, this is still interesting).

In the last decade, there have been 18 cases of Ewing sarcoma – a bone cancer most often found in children and young adults – in Washington and Westmoreland counties. To put that in perspective, there are roughly 200 Ewing sarcoma cases across the US each year, meaning the rate in Washington and Westmoreland has been about five times that of the rest of the US (we’ll go through the math in a little bit).

Naturally, this has caused plenty of worry, with residents fearing that they live in a cancer cluster: a place where something in the environment is giving everyone cancer.

The Oct 7 meeting was a chance for the state to explain why the increase in Ewing cases was insufficient to label the area a cancer cluster. This article illustrates why it’s so difficult to label an area a cancer cluster, and how those same difficulties also show up in many questions we, as economists, try to answer.

(**Note**: I’m not an epidemiologist or a public health official. So don’t take any of this analysis as gospel about what is and what is not a cancer cluster. This article just goes through some of the difficulty in identifying one, and how those difficulties are present in much of economic research as well.)

(**Also note**: none of this is to dismiss the very legitimate concerns of people who live in Washington and Westmoreland; I would do the exact same thing in their shoes.)

What’s a Cancer Cluster?

The Centers for Disease Control and Prevention (CDC) is the nation’s public health institute. It defines a cancer cluster as: “a greater-than-expected number of cancer cases that occurs within a group of people in a geographic area over a period of time.”

Going through this definition will illustrate why identifying a cancer cluster is so difficult. Let’s start with the “greater-than-expected” part.

Is the presence of cancer “greater-than-expected”?

As mentioned, Washington and Westmoreland have had 18 Ewing cases over the past decade – an average of 1.8 cases per year. For an area with that population (about 570k people), that’s a rate of 1 Ewing case a year for every 320k people. Is this greater than expected?

Nationally, the 200 or so Ewing cases a year give a rate of 1 case a year for every 1.6 million Americans. So the rate in Washington and Westmoreland over the past ten years has been about 5 times the national average. Surely that’s “greater-than-expected,” right?
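For readers who want to check the arithmetic, here is a minimal Python sketch using the figures quoted above. The US population of 320 million is an assumption: it is simply what the “1 case for every 1.6 million” figure implies (200 × 1.6 million), not an official count.

```python
# Figures quoted in the article.
local_cases = 18            # Ewing cases over the past decade
years = 10
local_population = 570_000  # Washington + Westmoreland, approximate

national_cases_per_year = 200
# Assumption: US population implied by "1 case per 1.6 million" (200 * 1.6M).
us_population = 320_000_000

local_cases_per_year = local_cases / years
people_per_local_case = local_population / local_cases_per_year
people_per_national_case = us_population / national_cases_per_year

# How many times higher the local rate is than the national rate.
rate_ratio = people_per_national_case / people_per_local_case

print(f"Local: 1 case a year per {people_per_local_case:,.0f} people")
print(f"National: 1 case a year per {people_per_national_case:,.0f} people")
print(f"Local rate is about {rate_ratio:.1f}x the national rate")
```

The local figure comes out to roughly 1 case a year per 317k people, which the text rounds to 320k.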

Actually, the answer is just “maybe.” It’s hard to say, in a statistically significant way, that the rate of Ewing in Washington and Westmoreland counties is higher than the national rate.

How could this be? Without getting into the math, the basic intuition is the following. If Washington and Westmoreland counties had experienced – just by chance! – a few more or a few fewer cases of Ewing over the past ten years, then the rate of 1 Ewing case a year for every 320k people would fluctuate wildly.

This is because the rate of contracting Ewing in Washington and Westmoreland – 1 case a year for every 320 thousand people – is tiny. Therefore, small changes caused by random chance can have a huge effect on the rate we measure. The size of this (statistical) uncertainty is huge relative to the rate itself, which makes it hard to assert that residents in Washington and Westmoreland are definitely experiencing a “greater-than-expected” number of Ewing cases.
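To see how sensitive the measured rate is to a handful of cases, here is a small Python sketch. The alternative counts of 14 and 22 are hypothetical, chosen only to represent “a few fewer” and “a few more” than the 18 cases actually observed; the helper name `people_per_case` is also just for illustration.

```python
years = 10
population = 570_000  # Washington + Westmoreland, approximate (from the article)

def people_per_case(cases):
    """People per one case a year, given total cases over the decade."""
    return population / (cases / years)

# Hypothetical counts: a few fewer or a few more than the 18 observed.
for cases in (14, 18, 22):
    print(f"{cases} cases -> 1 case a year per {people_per_case(cases):,.0f} people")
```

Four cases in either direction moves the measured rate from about 1 in 407k to about 1 in 259k. This is only the sensitivity of the point estimate, not a significance test.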

So even though the number of Ewing cases in Washington and Westmoreland looks drastically higher than nationally, it’s hard to rule that this is definitely the result of a cancer cluster, and not just caused by random chance.

[I said I didn’t want to get into the math here, but if you do (and you remember your Intro to Stats class), you can try calculating confidence intervals and judging the “statistical significance” yourself. This video has a refresher on how to do this, although it takes a somewhat simplified approach to the actual problem. If you have questions, feel free to email me: maa232@pitt.edu.]

What “group of people”, what “geographic area”, and what “period of time”?

While the causes of most cancers – including Ewing sarcoma – are not well-understood, it’s known that cancer incidence varies with a person’s characteristics, including gender, race, age, occupation, and socio-economic status. Identifying a cancer cluster needs to account for how a group of people, such as Washington and Westmoreland residents, compare with other groups experiencing different incidence rates.

This is challenging, and it’s made more difficult by the long period many cancers take to develop. For example, smoking is known to cause lung cancer, but it usually takes decades for the cancer to develop.

Similarly, a person living in a cancer cluster who is being exposed to harmful air or drinking water may not develop cancer for decades. During this period, many people will move into and out of the area. As a result, the group of people in a cancer study is often changing in ways that researchers are unable to account for. (For what it's worth: Pennsylvania is better than many states at collecting data on cancer cases, as the state keeps a Cancer Registry that tracks all cases within the state.)

In addition to deciding the correct group to analyze, researchers studying cancer clusters also need to determine the geographic area to focus on.

For example, should the PA Dept of Health only study the rates of Ewing sarcoma in Washington and Westmoreland counties, or should they also include their neighbor, Allegheny County? What about another neighbor, Fayette County?

Each choice of a geographic area will lead to different rates of cancer being calculated, different sets of population being studied, and possibly a different conclusion as to whether or not there is a cancer cluster.

The same goes for choosing a period of time to study. Should a potential cancer cluster be studied over a period of ten years, or should it be studied using cancer rates over twenty years? There’s rarely a good answer when choosing the study’s time period.

How is this like Economics?

Obviously, seeing cancer pop up all over your community and getting barely a shrug from the state is frustrating, and the Pitt News article does well to explain this.

It’s also natural to think that diagnosing a cancer cluster should be as black and white as diagnosing cancer itself. But really, the challenges we went through have a lot less in common with medical diagnoses than they do with the problems economists encounter in their research.

For example, suppose we wanted to determine what effect immigration was having on the wages of natives. And to simplify the problem, let’s say we’re only looking at immigrants in Allegheny County.

We would first have to define what wages should be expected, absent any immigration to the county. This gives us something to compare actual wages to, and a way of measuring effects, the same way the number of Ewing cases elsewhere in the US gives health researchers something to compare with Washington and Westmoreland counties.

Then we’d have to decide what group of people to look at. Does it make sense to only look at the effect immigrants have on the wages of workers in healthcare or construction? What about the wages of schoolteachers or custodians? Or should we look at the effect on everyone?

And should we look everywhere in Allegheny County, or limit ourselves to just Pittsburgh? Or to just Oakland and Downtown? Or will immigrants affect wages across all of Western PA?

Finally, over what time period should we look for effects? You wouldn’t expect immigration to affect wages overnight, but what about a year after immigrants arrive? Is that too short?

Economists have tried to solve these problems by using economic theory, and also by using a research design called “natural experiments”. More on what those are in a future Everyday Econ.