OPEN TALK (AI): Multilingual Natural Language Processing Using Python


Gajendra Deshpande
KLS Gogte Institute of Technology, Assistant Professor

Mr. Gajendra Deshpande holds a masters degree in Computer Science and Engineering and working as Assistant Professor at the Department of Computer Science and Engineering, KLS Gogte Institute of Technology, Belagavi, Karnataka, India.Under his mentor-ship teams have won Smart India Hackathon 2018 and Smart India Hackathon 2019 . He is Technical Director for Sestoauto Networks Pvt. Ltd. and Founder of Thingsvalley. His areas of Interest include Programming, Web Designing, Cyber Security, Artificial Intelligence, Machine Learning, Brain Computer Interface, Internet of Things and Virtual Reality. He has presented papers at NIT Goa, Scipy India 2017 IIT Bombay, JuliaCon 2018 London, Scipy India 2018 IIT Bombay, Scipy 2019 USA and PyCon FR 2019, Bordeaux, France.


Natural Language Processing(NLP) is an interesting and challenging field. It becomes even more interesting and challenging when we take into consideration more than one human language. when we perform an NLP on a single language there is a possibility that the interesting insights from another human language might be missed out. The interesting and valuable information may be available in other human languages such as Spanish, Chinese, French, Hindi, and other major languages of the world. Also, the information may be available in various formats such as text, images, audio, and video.

In this talk, I will discuss techniques and methods that will help perform NLP tasks on multi-source and multilingual information. The talk begins with an introduction to natural language processing and its concepts. Then it addresses the challenges with respect to multilingual and multi-source NLP. Next, I will discuss various techniques and tools to extract information from audio, video, images, and other types of files using PyScreenshot, SpeechRecognition, Beautiful Soup, and PIL packages. Also, extracting the information from web pages and source code using pytessaract. Next, I will discuss concepts such as translation and transliteration that help to bring the information into a common language format. Once the language is in a common language format it becomes easy to perform NLP tasks. Next, I will explain with the help of a code walkthrough generating a summary from multi-source and multi-lingual information into a specific language using spacy and stanza packages.

Outline
1. Introduction to NLP and concepts (05 Minutes)
2. Challenges in Multi source multilingual NLP (02 Minutes)
3. Tools for extracting information from various file formats (04 Minutes)
4. Extract information from web pages and source code (04 Minutes)
5. Methods to convert information into common language format (05 Minutes)
6. code walkthrough for multi-source and multilingual summary generation (10 Minutes)