Project Proposal: Language Filter for Diaspora by Abhineet Agarwal

From SMC Wiki

Language Filter For Diaspora



Personal Information

Email Address: agarwal.abhi93@gmail.com
Telephone: +919966551158
Freenode IRC Nick: Abhineet
Github: abhineet08[1]
University and current education:

       4th year, B.Tech [Computer Science & Engineering] and M.S [Computational Natural Sciences],
IIIT Hyderabad, Hyderabad, India.

Why do you want to work with the Swathanthra Malayalam Computing?

       I have always wanted to contribute to open source applications and to help improve them. SMC has projects which provide me a platform to do so.
A GSoC mentoring organization focused on Indian languages is an even more compelling cause.

Do you have any past involvement with the Swathanthra Malayalam Computing or another open source project as a contributor?

       I have recently been working with Diaspora’s codebase and added support for a wide range of emoticons (Emoji) to Diaspora [2].
(Interestingly, Diaspora, despite being a social networking platform, didn’t have emoticon support until now.)

Do you have other obligations between May and August?

       I don’t have any other obligations between May and August and can work 40 hours a week with full commitment.

Will you continue contributing to/supporting Swathanthra Malayalam Computing after the GSoC 2014 program? If yes, which area(s) are you interested in?

       Yes, I will continue to contribute to projects which focus on cross-language experiments and use technologies I am familiar with (RoR, C, Python, Java, JavaScript etc.).

Why should we choose you over other applicants?

       I have decent experience with RoR. I worked as a summer intern at Groupon, where I wrote the backend script Nightcrawler (which performs insert, upsert and update operations on required fields and imports data
from MySQL to MongoDB) and was the UI developer for the HAWK tool (used for curation of deals and plotting using the Highcharts API). I did my B.Tech project at LTRC (Linguistics Centre) at IIIT Hyderabad.
The project customizes an NLIDB system to work on the Hindi language (a cross-language experiment).
Since this project lies at the intersection of RoR and applied linguistics, I think I’ll be a good fit.


Proposal Overview

Project Summary
Diaspora is a social networking platform. It has features which enable its users to follow their interests, but it doesn’t have a language filter, i.e. a way for posts to be recognized and tagged by the language they are written in. A user who is comfortable with certain languages may not want posts in other languages to show up on his wall. This project is about adding this feature and allowing the user to view translated posts.


Cause it fulfills
Social networking platforms are built for active involvement of a user in the language he/she is comfortable with. Language should not be a barrier on such platforms. If users want a better understanding of posts in other languages, they should be able to read them in the language they’re used to. This project fulfills both needs.


Implementation Overview

Language Tagging :
All posts and comments in Diaspora will be tagged automatically with their respective languages, and the user will be able to modify the tags manually. While posting, the user will be given a “Tag Automatically” option, which would be the ideal case; otherwise the user will use a hashtag (with auto-complete, as on the new user page) to choose the language. Comments will always be tagged automatically.

We will use the “whatlanguage” gem for automatic detection of languages. This gem performs statistical language identification: it analyzes the post or comment and scores it against each candidate language, so speed-wise it should be a good fit for language identification.
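To make the scoring idea concrete, here is a minimal, self-contained sketch of word-list based language scoring. It is an illustrative stand-in and not the gem's actual implementation: the constant WORD_LISTS, its tiny sample vocabularies, and the method name detect_language are all hypothetical.

```ruby
# Illustrative stand-in for a word-list based detector. The word lists
# below are tiny hypothetical samples, not data shipped with any gem.
WORD_LISTS = {
  english: %w[the and is of to in that it with for],
  german:  %w[der die und ist das nicht ich mit ein zu],
  french:  %w[le la et est les des une je que pas]
}.freeze

# Returns the language whose word list matches the most tokens, or nil
# when nothing matches or when the top score is shared by two languages.
def detect_language(text)
  tokens = text.downcase.scan(/\p{L}+/)
  scores = WORD_LISTS.transform_values do |words|
    tokens.count { |token| words.include?(token) }
  end
  best_lang, best_score = scores.max_by { |_, score| score }
  return nil if best_score.zero?
  return nil if scores.values.count(best_score) > 1
  best_lang
end
```

Returning nil for ambiguous or unmatched text leaves room for the manual tagging path, which seems safer than guessing a language for very short posts.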

The “acts-as-taggable-on” gem will be used to tag the post with the language given by detection or the language specified by the user. It is already used by Diaspora, so this follows the existing convention.
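The choice between a user-specified language and a detected one could be resolved along these lines. This is a sketch under assumptions: KNOWN_LANGUAGES, language_tags_for and the auto_detect callable are hypothetical names, and the actual tagging call into the gem is left out.

```ruby
# Hypothetical list of languages that may be used as language tags.
KNOWN_LANGUAGES = %w[english hindi malayalam].freeze

# Returns the language tags for a post. Hashtags naming a known language
# take priority; otherwise the result of the detector (any callable that
# returns a language name or nil) is used.
def language_tags_for(text, auto_detect:)
  manual = text.scan(/#(\p{L}+)/).flatten.map(&:downcase) & KNOWN_LANGUAGES
  return manual unless manual.empty?

  detected = auto_detect.call(text)
  detected ? [detected.to_s] : []
end
```

For example, language_tags_for("Namaskaram #Malayalam", auto_detect: detector) yields ["malayalam"] regardless of what the detector would have returned.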

Show posts with preferred languages:
While signing up for a new account, the user chooses his preferred languages by entering them into a text box which will be provided in addition to the existing “What are you into” box. This box will also offer auto-complete.

Signup.png



Besides this, the user can add or delete preferred languages in the “Account Settings” section. A new field labelled “Preferred Languages” will be added to that section.

Settings.png




Using the above settings, only the posts in the languages listed in the user’s preferred language settings will be visible in his stream by default.
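The default filtering could look roughly like the following sketch. The Post struct and filter_stream method are hypothetical and independent of Diaspora's real models. Two behaviours are assumptions the proposal does not settle: an empty preference list disables filtering, and untagged posts are hidden.

```ruby
# Hypothetical in-memory stand-in for a stream post.
Post = Struct.new(:id, :text, :language_tags)

# Keeps the posts that carry at least one of the user's preferred
# languages. An empty preference list is treated as "no filtering".
def filter_stream(posts, preferred_languages)
  return posts if preferred_languages.empty?

  wanted = preferred_languages.map(&:downcase)
  posts.select { |post| (post.language_tags.map(&:downcase) & wanted).any? }
end
```

In the real application this selection would more likely be done in the database query that builds the stream, so that hidden posts are never loaded.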

Translate:
Due to the complexity involved in setting up a translation system, we can use the Google Translate or Bing API for translation. Since the Google Translate API has recently become unavailable, we prefer the Bing API, with which we can translate 1 million characters in total, after which we would need to wait for a month before translating further.
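Because of this limit, translation requests would need some form of character accounting. The sketch below assumes the limit resets every calendar month. The class TranslationQuota and its methods are hypothetical, and the counter is kept in memory only, whereas a real deployment would persist it.

```ruby
require 'date'

# Tracks characters sent for translation per calendar month.
class TranslationQuota
  MONTHLY_LIMIT = 1_000_000

  def initialize(limit = MONTHLY_LIMIT)
    @limit = limit
    @used = Hash.new(0) # "YYYY-MM" => characters consumed
  end

  # Characters still available in the month containing the given date.
  def remaining(date)
    @limit - @used[month_key(date)]
  end

  # Reserves characters for the text and returns true, or returns false
  # without changing the counter when the text would exceed the quota.
  def consume(text, date)
    return false if text.length > remaining(date)

    @used[month_key(date)] += text.length
    true
  end

  private

  def month_key(date)
    date.strftime('%Y-%m')
  end
end
```

Checking consume before calling the translation service lets the application show the original post, instead of failing, once the quota is used up.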

Apart from the above feature, as suggested by Praveen, we could add a crowd-sourced translation feature using the globalize gem. This feature would need two UI changes. The first is a translation stream where the user can translate posts from the input language to a desired language. The second is a modification to the sign-up page, where the user is asked if he is willing to volunteer for translation.

Translate.png



SignTran.png



Note: In case any of the steps turns out to be resource-intensive and affects performance, we will use job scheduling with Redis.


Tentative Timeline

 Up to May 19th (Pre-project research) :         This will include creating a rake task for language tagging of all the posts.
                                                 However, this task has to be scheduled (run periodically), so it is not a real-time
                                                 implementation but a prototype for the whole application. During the creation of the prototype, all the
                                                 potential problems and possibilities will be discussed with the mentors.
 May 19th - May 30th                   :          Real-time implementation of language identification for posts and comments
 May 30th - June 11th                  :          Enabling automatic tagging and manual edit options
 June 11th - June 20th                 :          Creation of UIs for entering preferred languages and their edit options in user account settings.
 June 23rd - June 27th                 :          Mid-term evaluations
 June 28th - July 15th                 :          Showing only the posts and comments in the user's preferred languages
                                                  and hiding the ones in other languages. Testing the implemented feature.
 July 15th - July 31st                 :          Adding the translate feature using the Bing API
 July 31st - August 11th               :          Implementation of the UI to show translated posts (previously hidden) to the user
 August 11th - August 18th             :          Look for compatibility issues and protocol breaches and fix them. Formalize all RSpec tests and
                                                  properly write the documentation for all the code written during the summer.
 August 18th - August 22nd             :          Rigorously run test cases to check and improve all the features. Fix minor bugs
                                                  and submit the final code.

 


Communication with mentor
I have been in regular contact with Ershad K (IRC nick: ershad) about the implementation details and possible challenges. I have kept him updated on my progress and ideas so far.

Other Programming Activities
I worked as a summer intern at Groupon last summer. Groupon is built on RoR.