Transcript from YouTube E.A. Draffan 3.00.26 - 3.18.50mins Multilingual Symbolic Support for Low Levels of Literacy on the Web https://youtu.be/wFn0kY2ssno?t=10798

Speaker: E.A. Draffan
Hello, can you hear me? Perfect. Wonderful. I have to say that was an amazing talk, and I've really enjoyed this morning because it's been such a variety. But I do feel the digital divide matters from the point of view of accessibility in both senses of that word: not just getting to the information and being able to afford it, because of the cost of broadband or whatever, but also the fact that those with disabilities may have problems getting to that information because it's not readable, it's not in a language they understand, or it has become so complex that the use of the system itself prevents them from accessing whatever information a government may provide, or whatever we need. I think that all came together in your talk, which was really very interesting, and following on is quite hard, because from our point of view we're looking at symbolic support, which was actually mentioned back in one of the first talks this morning when we discussed emergencies. I found that equally fascinating. What we have discovered is that low levels of literacy across the world are enormous, and we know that from World Health Organization (WHO) figures. In actual fact, we know that is probably going to become even worse in the Western world, where we consider the fact that we teach reading fairly early on in schools to mean we shouldn't be too worried. But at the upper end of our statistics we have the 60-plus age group, with dementia, Alzheimer's and mild cognitive impairments, finding complex reading on the web harder and harder, and there is more and more of it. Government, in our case in the UK, has recently pumped out masses of information.
I mean, COVID-19: the one thing it has done is make screen time so much a part of our lives, for those who can afford it, but as Amelia said, that divide has really grown. So we were looking at it from the point of view of: can we simplify that text? And there is a group of individuals who have particular difficulty, those with low literacy. By low literacy, I was absolutely amazed when I discovered what UNESCO meant: it is being able to read and write just a short, simple statement about your everyday life. Now that can be very, very basic. In the UK we consider a reading level of about 12 years old to be what we strive for when writing our public information, so that at least most people can read it. But this isn't always so. And I have to say, one of the groups we are really interested in are those users who very often depend on symbols for reading and for communication. That was mentioned earlier: there is a complete group of people who use symbols in different ways. Symbols in the form of everyday objects is one way, where you match a particular communication phrase to a particular object, but very often nowadays we're looking at concepts. And it's not easy to simplify our text in a way that can then have a picture to, so to speak, say a thousand words; it really is not an easy task. We think AI is going to help; we think natural language processing will help. What we forget is that we need to do an awful lot of cleaning up of that language before we actually come to a sentence that we can use with symbols. Several groups, at MIT, IBM and elsewhere, have been looking at text simplification. We started looking at symbol labels because we were particularly interested in those people who use augmentative and alternative forms of communication, called AAC in English.
And we discovered that very often the actual symbols they use are not culturally sensitive, which we have been discussing in a separate piece of research, but the languages we are using make things equally hard, because obviously the syntax, the semantics and the morphology of our words are not the same in each language. Not only that: although an emoji may have a Unicode code point and be standardised, AAC symbols do not; there is no ISO or Unicode or any form of standardisation (other than Blissymbolics, which is ideographic rather than pictographic and is seen as a language, as explained in an answer to a question). So you can make a set of symbols and you don't have to apply a certain ID to it. That means that matching symbol sets against different languages is really difficult. Google has managed phenomenal automatic translation between languages, and the work on mapping symbols started back in 2012 with Mats Lundälv and his team. But still, we've had to do most of the work manually, by adding metadata, by adding information, and there really isn't enough data at the moment to achieve accurate automatic mapping of symbols. We use ConceptNet, and that's around 70-77% accurate. We've managed to put a unique ID map in place for single symbols, and if you do a search you will now find a word that is pretty well related to the symbol, but you'll also find an awful lot of labels or words that aren't anything like what the symbol should be. And if you then wanted to put it into a sentence, you'll see from the image here that the very position of that symbol changes depending on which language you're in. So it's a very complex thing to try to achieve. We've got some way, in that we've tried to encode and embed a form of semantic word finding for the symbols, and we now have a relatedness measurement: what we found is that if we use more of that metadata, if we can get more metadata, the results improve.
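The relatedness measurement mentioned here can be sketched as cosine similarity between word embeddings. This is a minimal toy illustration: the vectors and the small vocabulary below are invented for the example, whereas the project's actual measure builds on ConceptNet and richer symbol metadata.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration. A real system
# would use pretrained vectors (e.g. ConceptNet-derived embeddings).
EMBEDDINGS = {
    "dog":  [0.90, 0.80, 0.10],
    "wolf": [0.85, 0.75, 0.20],
    "car":  [0.10, 0.20, 0.90],
    "cart": [0.20, 0.30, 0.80],
}

def cosine(u, v):
    """Cosine similarity between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_labels(query, candidates):
    """Rank candidate symbol labels by semantic relatedness to the query."""
    q = EMBEDDINGS[query]
    scores = {c: cosine(q, EMBEDDINGS[c]) for c in candidates}
    return sorted(scores, key=scores.get, reverse=True)

print(rank_labels("dog", ["wolf", "car", "cart"]))
```

With such a measure, a symbol whose label is semantically close to the search term can be ranked above labels that merely share surface features, which is the kind of improvement the talk describes.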
We can start to relate the word on the left, where it says concept vocab, and then we come to a concept, which is the label for the symbol, and you will see the symbols in the third column. Those symbols come from various symbol sets, and they vary enormously. So 'I', for instance, can be the letter of the alphabet, it can be an individual, or it can be part of a sentence such as 'I am fine'. And you can see on the right that where you just link by ConceptNet, without the semantic word embedding, the results are nothing like as good. In other words, by using embedding we managed to get better results: 'she' did not appear as high in our search list for 'he', and nor did 'I am fine', when we actually looked at it manually; but we had to cross-check those things manually. So semantic reasoning to provide related symbols based on common knowledge can improve that search. Why is the search important? Well, it's the start of making a text-to-symbol process happen, because at the moment what happens is you just have a direct word-on-word link, and the problem is we need the context, just as was said before. I think the other thing is that we could possibly look at image recognition, but we do have a problem with it. If we decide that a car is what we're going to look up, we don't always get a successful result, because the tagging of that particular image has been done by a lot of people in various countries, as we well know, so that when Google does its search it comes up with something that relates to what we want. It's not always very accurate for symbols; there's a fuzziness that is missing, so we get wheels on a cart rather than wheels on a car. So image recognition is something we want to look at next, but it's not always very good for symbol recognition. Symbols don't have the background detail to help you find that image.
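The direct word-on-word link described here can be sketched in a few lines. The lexicon and symbol IDs below are invented for illustration; the point is precisely what the sketch cannot do: without context it has no way to tell which sense of 'I' is meant.

```python
# A naive word-on-word text-to-symbol mapper. The lexicon and symbol IDs
# are invented for illustration; real AAC symbol sets carry far richer
# metadata (part of speech, concept links, language).
LEXICON = {
    "i":    ["sym:pronoun_I", "sym:letter_I", "sym:phrase_I_am_fine"],
    "am":   ["sym:be"],
    "fine": ["sym:fine"],
}

def naive_map(sentence):
    """Map each word independently to its first listed symbol.

    This reproduces the limitation described in the talk: with no
    context, 'I' is always resolved the same way, whether it means
    the pronoun, the letter, or a whole-phrase symbol.
    """
    return [LEXICON.get(word.lower(), ["sym:UNKNOWN"])[0]
            for word in sentence.split()]

print(naive_map("I am fine"))   # each word resolved in isolation
```

Moving beyond this requires exactly the contextual and semantic information the talk argues for.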
And I think there's this famous one of the dog and the wolf: the reason the dog is found has nothing to do with the look of the dog, it's to do with the collar, which the wolf doesn't wear. We will be putting all our results on the Global Symbols website eventually. At the moment this is all part of an Alan Turing pilot project that we will be writing up in the coming weeks. What I would say is that when we have finished this project, we very much hope to have many more symbols, including emojis, and we do actually have those on our website at the moment. So if you are interested in the thought of having an emoji set, you will find that they are there, and these are the present symbol sets that we've managed to link up in the various languages. You will also find Croatian and Serbian linked symbol sets. This is an ongoing project, but as I said it's quite exciting, because it's touching on something to do with communication, and I think it's also touching on trying to stop this digital divide that is happening between us all in the various languages as well. We are incredibly lucky in English, because we have access to various enormous data-driven sets, which you don't necessarily have with other languages. So I'm going to stop sharing my screen now and leave you to ask questions.

Host Speaker 11:13
Wonderful.

E.A. Draffan Speaker 11:14
I hope we caught up with a bit of time there. I tried, as a speech therapist, not to talk too quickly, but I know that...

Host Speaker 11:23
That was never a problem.

Host Speaker 11:26
But that does mean that we have some time for questions.

Host Speaker 11:32
Roberts has a question for you.

Question Speaker 11:36
Yeah, yeah. Thank you very much for the talk, it was quite interesting. I just wanted to ask regarding the simplification, or the selection, of the languages.
Because human language has developed over centuries and millennia, and I don't think we have the same timeframe for developing the symbolic languages suggested. So, considering that history, would it make sense to develop symbolic language based on language families rather than individual languages, so that there would be some interoperability and shared understanding across related languages?

E.A. Draffan Speaker 12:27
You've used a wonderful word there: interoperability. One of the problems we have is, as you may or may not know, that Blissymbolics came out of this very idea, that we need an Esperanto, something that is universal, to cross the languages, and in your case you're saying, to cross language types. From our point of view, from a symbol point of view, Blissymbolics is the nearest we've got to it, in that you can generate any character for any language because it has an ISO standard, so we can map Blissymbolics to any language, if that makes sense to you. The problem we've got, and I think the one you're possibly getting at, is that each language grouping, if you are using it for communication, also needs text-to-speech. In other words, it has to be spoken, and in the digital world that requires the language to also have a synthetic voice. One of the things we are discovering more and more is that the number of languages that actually have very good synthetic voices is probably only about 140 or so, and that can be another block to grouping languages, because people want their own accents, they want their own dialect, and they also need their own orthography: Cyrillic versus Arabic versus Hebrew. So it's not just the type of language, it's also what goes with that language that is quite a blocker on having something that works universally across all languages.
Host Speaker 14:10
There are two more questions, so one is from the Q&A and I'm just going to read it. It says: common knowledge can be different, sometimes even different amongst cultures, so how can we manage that?

E.A. Draffan Speaker 14:26
Sorry, how can we manage the different cultures?

Host Speaker 14:30
I think the point is that the common knowledge, and therefore probably also the common icon set, may differ.

E.A. Draffan Speaker 14:38
Oh, absolutely, yes, sorry, common icon sets. One of the things we did was a project, funnily enough, with Pat on, thanks to the funding that they had. They were very concerned about this, because in some cultural and social settings you do not have bald heads, and you want to feel that the food you enjoy eating is represented in the symbol set. So what we've encouraged on Global Symbols is that people can take a large symbol set that's already been made, because they're all freely available under a Creative Commons licence, and then add their own set of culturally sensitive symbols, so that they're more appropriate to their setting. That has happened in Urdu in Pakistan, it's happened in Arabic for Qatar already, and it's now happening for Serbia and Croatia. We're very excited about this idea that if we can use Creative Commons, in other words openly licensed symbols, we can create any symbols to match a larger symbol set. So an apple will always be an apple, it may be green or red, whatever, but on the whole you don't change that one. An elephant: you may have an Indian elephant or an African elephant, and we don't have elephants in the UK except in our zoos or safari parks, but the elephant image doesn't necessarily have to change across cultures. So you have a set of, normally, nouns that don't need to change, but you do have this other culturally specific set.
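The 'base set plus local additions' approach described here can be sketched as a lookup with local precedence. All concept names and symbol IDs below are invented; Global Symbols' actual data model is not shown.

```python
# Sketch of layering a local, culturally adapted symbol set over an
# openly licensed base set. Names and IDs are invented for illustration.
BASE_SET = {
    "apple":    "base:apple",        # stable across cultures
    "elephant": "base:elephant",     # stable across cultures
    "bread":    "base:sliced_loaf",  # often culturally specific
}

LOCAL_SET = {
    "bread": "local:flatbread",  # locally appropriate replacement
}

def resolve(concept, base=BASE_SET, local=LOCAL_SET):
    """Return the symbol for a concept, preferring the local set."""
    merged = {**base, **local}   # local entries override base entries
    return merged.get(concept)

print(resolve("bread"))      # the local symbol wins
print(resolve("elephant"))   # the base symbol is unchanged
```

The design choice mirrors the talk: most nouns come unchanged from the shared base set, while a smaller culturally sensitive layer overrides only the concepts that need to differ.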
And I think from pasa we ended up with 700 unique symbols that they felt were necessary as an adjunct to the symbol set, which was made in Spain. So yes, we very much want that to happen, and we are really trying to build on that whole idea, so I'm thrilled you asked that question.

Question Speaker 16:41
Thanks for a wonderful presentation; it's an exciting project you're doing. A bit of a technical question: are the concepts represented multilingually? I mean, language-use communities can have different concepts; is that difference represented in your knowledge base?

E.A. Draffan Speaker 17:05
I think this is something that is quite an issue, because basing it on ConceptNet and WordNet, and getting more metadata, which we intend to do over the coming months, is going to be absolutely vital if we're going to be able to do it multilingually. ConceptNet has multilingual concepts, but sometimes they're not correct, and this is why we've had to do some cleaning, and in fact a lot of cleaning. This is a concern with some of the databases we are using: when we ask a local speaker whether a translation is correct, we find it isn't. And you've raised a really good point. Also, we have a lot of homophones and we have polysemy: words that have the same meaning but different symbols, words that have the same sound but different symbols, and words whose spelling is different but which sound the same, which is a real problem for text-to-speech in the symbol world. So yes, this is something we've got to overcome, and that is where I am hopeful about AI in the digital world allowing us to try to take that next step. I think we've come a very long way, whereas previously, in 2011 and 2012 when we were dealing with it to begin with, it was an impossibility: we couldn't map those concepts correctly, and manual input was required.
I think we're edging nearer and nearer as the translations improve, and as, hopefully, crowdsourcing of accurate sentence construction improves, with databases such as Tatoeba that you can use.

Question Speaker 18:55
Very brave.

E.A. Draffan Speaker 18:59
Thank you for asking the question.

Transcribed by https://otter.ai