There are more than 7,000 languages in the world, around 4,000 of which are written. Yet only about 100 of them can be translated with automatic translation tools such as Google Translate. New research now under way aims to help us communicate in many more languages.
Suppose you came across a message containing information that could help save someone's life. The problem is that you don't understand a single word of it. Worse still, you don't even know which of the world's thousands of languages it is written in. What would you do?
If the message were written in French or Spanish, the problem would be easily solved: you would type it into an automatic translation engine and immediately get a clear answer in English. But many languages remain difficult to translate, including some spoken by millions of people, such as Wolof, Twi and Ewe in Africa. This is because the algorithms behind these engines learn from human translations, analyzing millions of words of translated text to improve their accuracy.
For some languages, such as English, French and Spanish, the supply of such texts is almost inexhaustible. This is thanks to the many human translators working in multinational institutions such as the Canadian Parliament, the United Nations and the European Union, which produce huge volumes of translated documents. The European Parliament alone produced 1.37 billion words in 23 languages over ten years.
Other languages, even widely spoken ones, are not translated on anything like this scale, so few texts exist in them. For this reason they are known as "low-resource languages". Artificial intelligence systems trained on these languages rely on religious texts, such as the Bible, which has been translated into many languages. But this material is not enough to train machines to produce translations across a wide range of subjects.
Google Translate lets people communicate in about 108 languages, and Microsoft's Bing Translator covers about 70. But more than 7,000 languages are spoken worldwide, and at least 4,000 of them have writing systems.
This language barrier can get in the way of anyone who needs to gather accurate information quickly, such as intelligence agencies.
"The more interested someone is in understanding the world, the greater their need to access data that isn't written in English. We now face many challenges that know no borders, such as economic and political instability, the coronavirus pandemic and climate change, and all of these challenges are inherently multilingual."
Training a translator or intelligence analyst in a new language can take many years, and even then they may not gain enough expertise to do the job. "There are more than 500 languages spoken in Nigeria alone, for example. Even our experts may understand only a few of them, the best known ones in that country."
IARPA, the US Intelligence Advanced Research Projects Activity, is funding research to develop an automated system that can find any written or spoken information in a low-resource language, then translate and summarize it.
The project takes the form of a search engine. The user types a query, in English for example, and is immediately shown a list of relevant documents that have been translated from a foreign language and summarized in English. Clicking on one of these documents displays its full translation. Competing teams of computer science researchers are taking part in the project, and much of their work has already been published.
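The search workflow described above (English query in, translated and summarized foreign documents out) can be illustrated with a toy sketch. It is not the project's system. The hypothetical `LEXICON` is a word-by-word stand-in for a neural translator, ranking is simple query-term overlap, and the "summary" is just the first few translated words. The names `LEXICON`, `translate` and `search` are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy Swahili -> English lexicon standing in for a trained
# neural translation model.
LEXICON = {
    "mvua": "rain",
    "kubwa": "heavy",
    "mafuriko": "flood",
    "mji": "town",
    "shule": "school",
    "imefungwa": "closed",
}


@dataclass
class Result:
    doc_id: str
    score: int        # number of query terms found in the translation
    summary: str      # crude stand-in for a real summarizer
    translation: str  # full English translation shown when the user clicks


def translate(text: str) -> str:
    """Translate word by word; words missing from the lexicon are kept unchanged."""
    return " ".join(LEXICON.get(word, word) for word in text.lower().split())


def search(query: str, docs: dict[str, str], summary_words: int = 3) -> list[Result]:
    """Translate each document into English, keep those sharing at least one
    term with the English query, and rank them by the number of shared terms."""
    query_terms = set(query.lower().split())
    results = []
    for doc_id, text in docs.items():
        english = translate(text)
        score = len(query_terms & set(english.split()))
        if score > 0:
            summary = " ".join(english.split()[:summary_words])
            results.append(Result(doc_id, score, summary, english))
    return sorted(results, key=lambda r: r.score, reverse=True)


docs = {"d1": "mvua kubwa mafuriko mji", "d2": "shule imefungwa"}
for hit in search("flood rain", docs):
    print(hit.doc_id, hit.score, hit.summary)  # d1 2 rain heavy flood
```

A real system would replace each piece, the lexicon, the overlap score and the truncation, with a trained model. The overall shape of the pipeline is the same: translate, retrieve, rank, summarize.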
Kathleen McKeown, a computer scientist at Columbia University who leads one of the competing teams, believes the project's purpose is to make it easier for people from different cultures to interact and to share more information about their cultures.
The research teams use artificial neural networks, a form of artificial intelligence that mimics some aspects of human thinking. Neural network models have transformed language processing in recent years. Rather than simply memorizing words and sentences, these networks learn what they mean. From context, they can work out that many different words can express the same concept, even when the words look quite different.
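One common way to picture "learning meanings" is that a neural model maps each word to a vector, and words used in similar contexts end up with vectors pointing in similar directions. The sketch below assumes hand-picked three-dimensional vectors in a hypothetical `EMBEDDINGS` table, purely for illustration. Real models learn vectors with hundreds of dimensions from data.

```python
import math

# Hypothetical, hand-picked 3-dimensional word vectors (illustrative only).
EMBEDDINGS = {
    "car": [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.00, 0.20, 0.95],
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(word):
    """Return the other vocabulary word whose vector is closest to `word`'s."""
    target = EMBEDDINGS[word]
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine(target, EMBEDDINGS[w]))


print(most_similar("car"))  # automobile
```

"Car" and "automobile" share no letters in common positions, yet their vectors are nearly parallel. This is the sense in which such models can treat different-looking words as expressing the same concept.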
But these models usually need to analyze millions of texts to learn a language. The researchers on this project are trying to develop models that can learn a language from much less data. After all, humans do not need to read official documents compiled over many years in order to learn a language.
"When humans learn a language, they need to read only a small fraction of the data that today's automated translation systems need for training. That is why we are trying to develop a new generation of automatic translation systems that produce accurate translations without needing such huge amounts of data."
Each research team includes groups of specialists working on one of the system's main components, such as information retrieval, speech recognition, machine translation and text summarization.
Since 2017, the teams have focused on eight languages, including Swahili, Tagalog, Somali and Kazakh.
The teams have collected written and spoken material in low-resource languages from the internet, in the form of articles, forum posts and videos. This material is available online thanks to users around the world who post content in their mother tongues.
"If you want information in Somali, you will find hundreds of millions of words. You can now find large amounts of text in almost any language on the internet."
But these texts are mostly monolingual: Somali articles, for example, are not accompanied by English translations. However, Miller says neural network models can be pre-trained on many different languages by analyzing texts written in just one language.
During training, neural networks are thought to learn the features and structures of each language, which they then draw on when translating. "Nobody knows what linguistic structures these models learn," Miller says. "There are millions of parameters."
After this pre-training on many languages, the neural network models learn to translate from one language to another using only a small amount of translated text. A few hundred thousand words in the new language, paired with their equivalents in other languages, may be enough.
The multilingual search engine can then search both spoken and written material, although this poses many challenges. Speech recognition technology, which converts speech into text, struggles to recognize sounds, names and places it has not encountered before.
Peter Bell, a speech technology expert at the University of Edinburgh who is part of one of the teams, gives an example. In a country that may be relatively unknown in the West, a politician might be assassinated. Finding that politician's name in audio clips would be difficult.
Bell tackled this problem by going back to the text transcribed from the audio clips and searching for words the system seemed unsure of because it had not encountered them before. Examining these words might reveal that one of them is the name of the assassinated politician.
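The idea of combing a transcript for words the recognizer was unsure of can be sketched as follows. This assumes the recognizer reports a confidence score for each word. Many do, but the `Token` format, the threshold and the transcript below are hypothetical, and this is not Bell's actual method.

```python
from __future__ import annotations

from typing import NamedTuple


class Token(NamedTuple):
    word: str
    confidence: float  # assumed recognizer score between 0.0 and 1.0


def flag_uncertain(tokens: list[Token], threshold: float = 0.5) -> list[tuple[int, str]]:
    """Return (position, word) for every token below the confidence threshold.
    Low-confidence stretches are candidates for names the system never saw."""
    return [(i, t.word) for i, t in enumerate(tokens) if t.confidence < threshold]


# Hypothetical transcript: the unfamiliar surname was recognized with low confidence.
transcript = [
    Token("minister", 0.96),
    Token("obare", 0.31),
    Token("was", 0.97),
    Token("shot", 0.92),
    Token("today", 0.88),
]
print(flag_uncertain(transcript))  # [(1, 'obare')]
```

The flagged positions are only leads. A person or a later processing step still has to decide whether a flagged word really is a name.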
Once information has been found and translated, the search engine summarizes it for the user. But neural networks can make errors during summarization, which computer scientists call "hallucinations".
Suppose you were looking for a news report about protesters who stormed a building on a Monday, but the summary said they had stormed it on a Thursday. This happens because, when neural network models summarize a report, they draw on the millions of pages they analyzed during training. Those texts may include many examples of protesters storming buildings on Thursdays, so the network predicted that the same applied in this case too.
Neural network models may also insert dates or numbers of their own into a summary. These are "hallucinations" too.
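One simple way to catch this kind of hallucination is to check whether every weekday and number in a summary actually appears in the source. The sketch below is a hypothetical check, not a method attributed to the researchers. The example sentences are invented to mirror the Monday/Thursday case.

```python
import re

WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}


def facts(text):
    """Collect the weekday names and numbers mentioned in a text (lowercased)."""
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?", text.lower())
    return {t for t in tokens if t in WEEKDAYS or t[0].isdigit()}


def unsupported_facts(source, summary):
    """Facts in the summary that never appear in the source: likely hallucinations."""
    return facts(summary) - facts(source)


source = "Protesters stormed the building on Monday; 40 people were arrested."
summary = "Protesters stormed the building on Thursday; 40 arrested."
print(unsupported_facts(source, summary))  # {'thursday'}
```

A check like this catches only facts it knows how to recognize, such as days and numbers. It cannot tell whether a correctly copied fact is used in the right context.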
"Neural network models are extremely advanced. They can memorize a great deal of language and add words that are not in the source," says Mirella Lapata, a computer scientist at the University of Edinburgh.
Lapata tackled this problem by extracting keywords from each document instead of having the machine write a summary in full sentences. This prevents the neural models from adding information of their own.
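Why keywords avoid hallucinations can be seen in the simplest possible extractor, which only ever returns words taken from the document itself. The sketch below uses plain word frequency with a small hypothetical stop-word list. This is a deliberately basic technique, swapped in for illustration, and not Lapata's actual model.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "was", "were",
              "is", "by", "at", "for", "with"}


def extract_keywords(document, k=3):
    """Return the k most frequent non-stop-words; ties are broken alphabetically."""
    words = re.findall(r"[a-z]+", document.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


doc = ("Protesters stormed the parliament building on Monday. "
       "Police said the protesters left the building by evening.")
print(extract_keywords(doc, k=2))  # ['building', 'protesters']
```

Every keyword is copied from the document, so the output cannot contain a day or number that the source lacks. The trade-off is that a list of keywords is less readable than a fluent summary.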
The project also includes a team working on languages that died out thousands of years ago. These ancient languages are clearly low-resource, since only fragments of their texts survive. Experts use them as a testing ground for new techniques that may later be applied to modern low-resource languages.
Jiaming Luo, a PhD student at the Massachusetts Institute of Technology, and his team have developed algorithms that can identify the modern languages descended from ancient ones. The team gives the algorithms basic information about these languages and a general outline of the changes they have undergone.
Working from only a little information, the neural network model found that the ancient Ugaritic language of the Near East is related to Hebrew. It also found that Iberian, an ancient European language, is closer to Basque than to any other European language.
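The underlying question, which known language is closest to an ancient one, can be illustrated with a much simpler technique than the neural model described above. The sketch uses plain edit distance between words. For each ancient word it finds the nearest word in each candidate language and averages those distances. The candidate language with the lowest average wins. All words and language names here are invented, and this is not Luo's method.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def closeness(ancient_words, modern_words):
    """Average distance from each ancient word to its nearest modern word (lower = closer)."""
    return sum(min(edit_distance(a, m) for m in modern_words)
               for a in ancient_words) / len(ancient_words)


def closest_language(ancient_words, candidates):
    """Name of the candidate language whose vocabulary is closest overall."""
    return min(candidates, key=lambda name: closeness(ancient_words, candidates[name]))


# Hypothetical vocabularies, invented for illustration.
ancient = ["tabor", "keshi", "nur"]
candidates = {
    "language_a": ["tabur", "keshe", "nura"],
    "language_b": ["ploss", "vinda", "gom"],
}
print(closest_language(ancient, candidates))  # language_a
```

Real decipherment is far harder. Sound changes follow systematic patterns, and words in the two languages may not line up one-to-one. That is why the researchers feed their models information about the changes the languages have undergone, not just raw spellings.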
"Relying on huge quantities of translated documents is one of the system's weaknesses. Producing effective technological tools, whether for deciphering ancient scripts or for translating less widespread languages, will therefore help advance the field of automatic translation," said Regina Barzilay, a computer scientist at MIT.
The teams have built prototype multilingual search engines, and these improved as new languages were added. "These tools could revolutionize the way analysts gather data from texts written in foreign languages. They will allow analysts who speak only English to analyze data they could not previously read or understand."
Speakers of low-resource languages are also taking part in these efforts. They too need important information written in other languages, not for espionage but to improve their daily lives.
"When the coronavirus spread, we urgently needed to translate essential health advice into many languages. At the time, we realized how important it is to have technological tools that help us translate into low-resource languages."
Adelani built a Yoruba-to-English database as part of a project aimed at breaking the language barrier. Adelani and his team added film scripts, news, literary works and everyday conversations translated into Yoruba to the database.
In parallel, community members in Africa are building databases for other African languages, such as Fon, Twi and Luganda.
Perhaps one day we will all use multilingual search engines in our daily lives, discovering information from all over the world at the click of a button. For now, though, if you want to understand texts in a low-resource language, you will have to learn the language yourself, or join the multilingual speakers who are building databases to improve automatic translation tools.