EXCLUSIVE Google is looking for a new measure of skin tones to help reduce bias in products
June 18 (Reuters) – Alphabet Inc's (GOOGL.O) Google told Reuters this week that it is developing an alternative to the industry-standard method of classifying skin tones, which a growing chorus of technology researchers and dermatologists says is inadequate for assessing whether products are biased against people of color.
At issue is a six-color scale known as Fitzpatrick Skin Type (FST), which dermatologists have used since the 1970s. Technology companies now rely on it to categorize people and measure whether products such as facial recognition systems or smartwatch heart rate sensors work equally well across skin tones.
Critics say that FST, which includes four categories for “white” skin and one apiece for “black” and “brown” skin, disregards diversity among people of color. Researchers at the US Department of Homeland Security recommended abandoning FST for evaluating facial recognition during a federal technology standards conference last October because it poorly represents the range of colors in diverse populations.
In response to Reuters’ questions about FST, Google said, for the first time and ahead of its peers, that it has been quietly pursuing better measures.
“We are working on alternative, more inclusive measures that could be useful in developing our products and will be working with scientific and medical experts as well as groups that work with communities of color,” said the company, declining to provide details on the effort.
The controversy is part of a larger reckoning over racism and diversity in the tech industry, where the workforce is whiter than in sectors like finance. Making sure technology works well for all skin colors, ages and genders is increasingly important as new products, often based on artificial intelligence (AI), extend into sensitive and regulated areas like healthcare and law enforcement.
Companies know their products can be flawed for groups that are underrepresented in research and test data. The concern about FST is that its limited scale for darker skin could lead to technology that works on golden brown skin, for example, but fails on espresso red tones.
Numerous types of products offer palettes far richer than FST. Crayola launched 24 skin-tone crayons last year, and Mattel Inc’s (MAT.O) Barbie Fashionistas dolls span nine tones this year.
The issue is far from academic for Google. When the company announced in February that cameras on some Android phones could measure pulse rates from a fingertip, it said readings would err by 1.8% on average, regardless of whether users had light or dark skin.
The company later gave similar assurances that skin type would not noticeably affect the results of a feature for filtering backgrounds in video meetings, nor those of an upcoming web tool for identifying skin conditions, informally called Derm Assist.
These conclusions were drawn from tests with the six-tone FST.
The late Harvard University dermatologist Dr. Thomas Fitzpatrick invented the scale to personalize ultraviolet radiation treatment for psoriasis, an itchy skin condition. He grouped the skin of “white” people as Roman numerals I through IV by asking how much sunburn or tan they developed after certain periods in the sun.
A decade later came Type V for “brown” skin and VI for “black” skin. Still part of U.S. sunscreen product testing regulations, the scale remains a popular dermatological standard for assessing cancer risk in patients and more.
Some dermatologists say the scale is a poor and overused measure for care and is often conflated with race and ethnicity.
“A lot of people would assume I have skin type V, which seldom if ever burns, but I burn,” said Dr. Susan Taylor, a University of Pennsylvania dermatologist who founded the Skin of Color Society in 2004 to advance research on marginalized communities. “Looking at my skin tone and saying I’m Type V is doing me a disservice.”
Technology companies have not been concerned until recently. Unicode, an industry association that oversees emojis, named FST in 2014 as the basis for introducing five skin tones beyond yellow and said the scale was “devoid of negative associations.”
A 2018 study titled Gender Shades, which found that facial analysis systems more often misgendered people with darker skin, popularized the use of FST to evaluate AI. The research described FST as a “starting point,” but scientists on similar studies that came later told Reuters they used the scale to stay consistent.
“As a first step in a relatively immature market, it serves its purpose to help us spot red flags,” said Inioluwa Deborah Raji, a Mozilla Fellow with a focus on testing AI.
In an April study that tested AI for detecting deepfakes, researchers at Facebook Inc (FB.O) wrote that FST “clearly does not encompass the diversity within brown and black skin tones.” Still, they released videos of 3,000 people to be used for evaluating AI systems, with FST tags attached based on the assessments of eight human raters.
The judgment of the raters is central. Facial recognition software startup AnyVision gave its raters celebrity examples last year: former baseball star Derek Jeter as Type IV, model Tyra Banks as V and rapper 50 Cent as VI.
AnyVision told Reuters that it agreed with Google’s decision to reconsider its use of FST, and Facebook said it was open to better measures.
Microsoft Corp (MSFT.O) and smartwatch makers Apple Inc (AAPL.O) and Garmin Ltd (GRMN.O) use FST when they work on health-related sensors.
But the use of FST could fuel “false reassurance” about heart rate readings from smartwatches on darker skin, clinicians at the University of California San Diego, inspired by the Black Lives Matter movement for social equality, wrote in the journal Sleep last year.
Microsoft has acknowledged FST's imperfections. Apple said it tests on people with different skin tones using a variety of measures, with FST among them only at times. Garmin said that, based on extensive testing, its readings are reliable.
Victor Casale, who founded makeup company Mob Beauty and helped Crayola with the new crayons, said he came up with 40 shades for foundation, each differing from the next by about 3%, or enough for most adults to distinguish.
Color accuracy on electronics suggests that tech standards should have 12 to 18 tones, he said, adding, “You can’t just have six.”
Reporting by Paresh Dave; Editing by Jonathan Weber and Lisa Shumaker