Profile Information
Name: Chaitanya Mittal
IRC nickname on Freenode: chtnnh
Web Profile: https://www.github.com/chtnnh
Resume
Location: Dubai, AE
Typical working hours: 18:00 - 02:00 (UTC+4)
Synopsis
- Short summary describing your project and how it will benefit Wikimedia projects
The current automatic classification system on ptwiki is very naive: it simply checks a few if-conditions and labels articles accordingly. It places articles into one of 6 _target labels, 2 of which require editor approval. This system will be replaced with the improved ‘articlequality’ model, which automatically labels articles by quality, and the ‘draftquality’ model, which filters out drafts that are spam and/or vandalism.
This proposal elaborates on implementing the ‘articlequality’ and ‘draftquality’ models for the Portuguese Wikipedia (ptwiki), following a design similar to the one used for the English Wikipedia, which is based largely on the work of Morten Warncke-Wang et al.
Such an implementation would require extracting features from ptwiki, training various models on these features, and evaluating (fitness testing) these models to find the best fit.
The immediate use cases of these models would be:
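The train-and-compare step above can be illustrated with a toy sketch. This is not the ORES/revscoring training pipeline: the two candidate models (a majority-class baseline and a nearest-centroid classifier), the feature vectors, and all function names are hypothetical, and a real setup would use stronger learners and richer metrics. It only shows the shape of the loop: train each candidate on k-1 folds, score it on the held-out fold, and keep the candidate with the best mean accuracy.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# A labeled example: (feature vector, quality label). Toy data only.
Example = Tuple[Sequence[float], str]
Model = Callable[[Sequence[float]], str]


def majority_model(train: List[Example]) -> Model:
    """Hypothetical baseline: always predict the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


def nearest_centroid_model(train: List[Example]) -> Model:
    """Hypothetical candidate: predict the label whose mean feature vector
    is closest to the input (squared Euclidean distance)."""
    rows_by_label: Dict[str, List[Sequence[float]]] = {}
    for features, label in train:
        rows_by_label.setdefault(label, []).append(features)
    centroids = {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_label.items()
    }

    def predict(features: Sequence[float]) -> str:
        return min(
            centroids,
            key=lambda label: sum(
                (f - c) ** 2 for f, c in zip(features, centroids[label])
            ),
        )

    return predict


def cross_val_accuracy(train_fn: Callable[[List[Example]], Model],
                       data: List[Example], k: int = 3) -> float:
    """Mean accuracy over k folds; fold i holds every k-th example from index i."""
    scores = []
    for i in range(k):
        test = data[i::k]
        if not test:
            continue  # fewer examples than folds
        train = [row for j, row in enumerate(data) if j % k != i]
        model = train_fn(train)
        correct = sum(model(features) == label for features, label in test)
        scores.append(correct / len(test))
    return mean(scores)


def select_best(candidates: Dict[str, Callable[[List[Example]], Model]],
                data: List[Example], k: int = 3) -> Tuple[str, float]:
    """Return the name and score of the candidate with the best CV accuracy."""
    scored = {name: cross_val_accuracy(fn, data, k) for name, fn in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

Keeping a trivial baseline among the candidates is a cheap sanity check: any model worth deploying should clearly beat "always predict the most common label".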
- Help increase the quality of automated article classification for ptwiki
- Streamline work for ptwiki editors in identifying articles that need expansion or improvement, as well as articles that could be featured.
The implementation would also pave the way for further work on automating various wiki tasks on ptwiki.
- Mentor(s): @Halfak @Darwinius
- Have you contacted your mentors already? Yes!
Deliverables
Days/Dates | Milestone/Deadline/Subtask Accomplished |
---|---|
Apr 27 - May 17 | Community bonding period: spend time interacting with analytics team at Wikimedia, understand common practices and norms |
May 18 - May 24 | Preliminary research on features to be extracted from ptwiki |
May 25 - May 31 | Completion and Integration of extractors for ptwiki |
Jun 1 - Jun 7 | Testing for Extractors and Implementation of feature_lists |
Jun 8 - Jun 14 | Testing feature_lists |
Jun 15 - Jun 19 | Phase 1 Evaluations |
Jun 22 - Jun 28 | Research various models for implementing articlequality |
Jun 29 - Jul 5 | Implement top few models to benchmark performance |
Jul 6 - Jul 12 | Testing and Implementation of top few models |
Jul 13 - Jul 17 | Phase 2 Evaluations |
Jul 20 - Jul 26 | Selection of top performing model |
Jul 27 - Aug 2 | Reducing the footprint of the selected model |
Aug 3 - Aug 9 | Streamlining selected model and completing subtasks. Documenting the process and model for future reference in ORES engineering |
Aug 10 - Aug 24 | Final Evaluation |
In addition to code, I plan to start a blog on my portfolio website, where I will write about my work on this project every two weeks. This will help with documentation and give Wikimedia AI projects some additional exposure.
Participation
In terms of participation, I plan to communicate mainly through five channels: Phabricator for documented information, IRC for general queries, Zulip for task-specific queries, and email and team meetings for official communication regarding progress.
As far as source code is concerned, I have learnt that the best way to share code is through commits. Where that is not practical, services like https://codeshare.io can be handy.
About Me
Hi! I am Chaitanya Mittal, a first-year undergraduate in Computer Science and Engineering. I am an algorithmic coder and machine learning enthusiast, and I qualified for the Asia Regionals of the ACM ICPC 2018. I have previously worked with the Mozilla Foundation and the Mifos Foundation, though only for a short period of time. I am an open source enthusiast and truly believe in the power it holds to influence the world.
In particular, though, I have fallen in love with Wikimedia's vision, "Imagine a world where we can all share freely in the sum of all knowledge", and with the fact that it stays true to it. In the spirit of free knowledge and collaborative code, I believe Wikimedia leads by example.
The time frame for the project is from June to August. My summer break runs from July until the end of August. During the first two weeks of the project I will have only minor college commitments, and I will make sure they do not affect my work on the project.
This proposal has been selected for GSoC 2020
What does making this project happen mean to you?
Having relied on Wikimedia since childhood, without even realizing it, I understand the role that Wikimedia has played and continues to play in shaping how knowledge is shared around the world. The successful completion of this project would directly improve wiki quality for a language with more than 200 million native speakers. Being able to make even a small difference in how 200 million people access knowledge would mean the world to me.
It would help a 19-year-old realize that collaboration can lead to great things. That is what making this project happen means to me.
Past Experience
I have been actively working in open source for a year now, looking for a welcoming community working towards a cause I can relate to. In this process, I have encountered multiple projects (Mozilla, Mifos), developers and tasks. Although it is difficult for me to describe this experience quantitatively, I can affirm that it has helped me become a better developer. I have also helped with some tasks here in the Wikimedia community!
T245068 is the first task that I have completed.
T246438 and T246663 are tasks I am currently working on with @Halfak; as of writing this proposal, I have made significant progress on both.
At a personal level, I actively do competitive programming and keep myself up to date on the latest machine learning algorithms. I love both Python and C, although competitive programming does make me use C++ quite often. I use Linux and Bash as my everyday environment and prefer coding in vim or Visual Studio Code.
Any Other Info
References: T246663
Related Projects/Microtasks:
- T246438 could be used as a microtask: the text-complexity features implemented there could be used for all wikis instead of just enwiki.
- Convert the extractors for all wikis into generators so they can handle zero or more labels per template (currently, every extractor expects exactly one label per template).
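The generator idea in the last item can be sketched as follows. The template representation (a dict of parameter names to values), the "rating"/"rating2" parameter convention, the `LABELS` mapping, and both function names are hypothetical and do not reflect the real articlequality extractor code; the sketch only contrasts a function that returns at most one label with a generator that yields zero or more.

```python
from typing import Dict, Iterator, Optional

# Hypothetical mapping from raw rating values in a talk-page template
# to normalized quality labels. Real extractors use wiki-specific mappings.
LABELS = {"1": "stub", "2": "start", "3": "c", "4": "b", "5": "ga", "6": "fa"}


def extract_one(params: Dict[str, str]) -> Optional[str]:
    """Single-label style: returns one label or None, so a template that
    carries several ratings loses all but one of them."""
    return LABELS.get(params.get("rating", "").strip())


def extract_all(params: Dict[str, str]) -> Iterator[str]:
    """Generator style: yields zero or more labels from one template.

    Hypothetical convention: every parameter whose name starts with
    "rating" (rating, rating2, ...) holds one rating value.
    """
    for name, value in params.items():
        if name.startswith("rating"):
            label = LABELS.get(value.strip())
            if label is not None:  # skip unrecognized values
                yield label
```

Because a generator that finds nothing simply yields nothing, callers can handle "no label", "one label" and "several labels" with the same `for` loop instead of special-casing `None`.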
Relevant Links: