Content Translation Recommendations API
Open, Needs Triage · Public

Assigned To

None

Authored By

	• sdkim
	Oct 18 2021, 2:58 PM

Description

Request Status: New Request
Request Type: project support request

Request Title: Content Translation Recommendations API

Request Description: Content Translation integrates the existing recommendation API, which cannot support important needs of our users. Research showed that users need more control over the topic areas from which they get suggestions (currently the only customization is showing suggestions related to their recent edits). In addition, as part of the Section Translation project we need to suggest not only missing articles, but also articles that exist in both languages and can be expanded by translating a new section. These requirements are not supported by the current API, which is also generally unmaintained.
Indicate Priority Level: Choose: Critical, High, Medium, Low
Main Requestors: @Pginer-WMF
Ideal Delivery Date: 2021-22 Fiscal year (ideally Q3)
Stakeholders: <list stakeholder, team/org>

Request Documentation

Document Type	Required?	Document/Link
Related PHAB Tickets	Yes	T113257: Custom translation suggestions: Find opportunities to translate in topic areas selected by the user
Product One Pager	Yes	<add link here>
Product Requirements Document (PRD)	Yes	document
Product Roadmap	Yes	Roadmap: "Q3/KR 1.4: Translate specific topic areas" was planned for Q3 of Fiscal year 2022-23

Product Planning/Business Case	No	<add link here>
Product Brief	No	<add link here>
Other Links	No	- Current recommendation-api description

Related Objects

Status	Subtype	Assigned	Task
Open		None	T296994 Observations from research study for Section Translation on Thai Wikipedia
Open		None	T293648 Content Translation Recommendations API
Resolved		kevinbazira	T308164 Migrate Content Translation Recommendation API to Lift Wing
Declined		calbon	T308165 Explore what would be required to migrate the content translation recommendation model to Lift Wing
Resolved		kevinbazira	T338805 Containerize Content Translation Recommendation API
Resolved		kevinbazira	T339890 Host the recommendation-api container on LiftWing
Resolved		hashar	T341582 Limitations on CI fetching files from the wikimedia public datasets archive
Resolved		kevinbazira	T342084 Post-merge build failed due to Internal Server Error
Resolved		kevinbazira	T343576 Store and fetch the recommendation-api embedding from Swift
Resolved		elukey	T343951 Post-merge build succeeded but image not published to docker-registry
Resolved		kevinbazira	T346218 Adapt the recommendation-api to use float32 preprocessed numpy arrays from swift
Resolved		kevinbazira	T346411 Upload recommendation-api preprocessed numpy binaries to Swift
Resolved		kevinbazira	T347015 Deploy the recommendation-api-ng on LiftWing
Resolved		klausman	T347262 Set SLO for the recommendation-api-ng service hosted on LiftWing
Resolved		klausman	T347263 Create external endpoint for recommendation-api-ng hosted on LiftWing
Resolved		kevinbazira	T354601 Fix rec-api-ng relative paths handling
Resolved		kevinbazira	T347475 Investigate recommendation-api-ng internal endpoint failure
Resolved		kevinbazira	T348607 Configure envoy settings to enable rec-api-ng container to access endpoints external to k8s/LiftWing
Stalled		None	T340854 Verify if the Python recommendation API can support the use cases of the nodejs one
Resolved		ngkountas	T365347 Update endpoints used in Content and Section Translation to use the LiftWing version of the Recommendation API
Resolved		kevinbazira	T365554 Run load tests for the rec-api-ng and update production resources to meet expected load
Open	BUG REPORT	None	T377331 the error message from gapfinder service refers to a deleted rev
Resolved		diego	T333893 Support recommendation API improvements

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 18 2021, 2:58 PM

Pginer-WMF updated the task description. · Oct 19 2021, 10:22 AM

Pginer-WMF updated the task description.

• sdkim claimed this task. · Oct 25 2021, 1:56 PM

• sdkim moved this task from Backlog to Investigate on the Foundational Technology Requests board.

• sdkim updated the task description. · Oct 26 2021, 4:38 PM

Used by Content Translation; it also supports the Android suggested edits feature.

Does this require data persistence of the suggestions, or are they generated on the fly?

Flagging for planned January conversation with Research, ML/AI team, and PET to discuss ownership

• lbowmaker claimed this task. · Dec 10 2021, 4:35 PM

Pginer-WMF added a parent task: T296994: Observations from research study for Section Translation on Thai Wikipedia. · Feb 8 2022, 10:12 AM

In T293648#7557585, @DAbad wrote:

Flagging for planned January conversation with Research, ML/AI team, and PET to discuss ownership

Please let us know if there is any update. Recent research with Thai editors (T296994) confirms the need for better and more personalized suggestions, and the Recommendation API improvements are key to support work in this area (T113257).
Thanks!

Isaac subscribed. · Mar 7 2022, 8:04 PM

@Pginer-WMF do we have a Product Requirements Document with details such as user stories, more detailed requirements, etc.?

In T293648#7835366, @DAbad wrote:

@Pginer-WMF do we have a Product Requirements Document with details such as user stories, more detailed requirements, etc.?

Thanks for looking into this, @DAbad. Most of the needs are illustrated in T113257, but I created a document to capture the requirements as separate stories, so that it is easy to comment in context. Please feel free to add any comments/questions, or let me know if you were expecting something different.

Pginer-WMF updated the task description. · Apr 7 2022, 10:20 AM

April 13, 2022

was created and owned by Research
need to figure out where this sits for the long term
- ex: would this sit on liftwing and exposed via liftwing API?
if it is made by research, we could likely deploy to liftwing
we can move this over to liftwing so we can test while API gateway is under development as an option

Next Steps

Identify if additional data is required or if all data required exists
Move over the topic model first (easiest to move over)

MMiller_WMF subscribed. · May 9 2022, 4:37 PM

RHo subscribed. · May 11 2022, 3:38 PM

We've created a ticket on our team's board to explore how we might migrate the modeling functionality of the application to Lift Wing (T308165).

Nikerabbit subscribed. · May 18 2022, 10:55 AM

I added some notes to the description and document to clarify that although the initial and main focus is on Wikipedia articles, there are plans for supporting translation on other projects (T322537) such as Wikivoyage or Wikibooks. It would be great if the API were generic enough to provide suggestions for the current wiki even if it is not Wikipedia. Maybe it was already assumed to be generic in this way, but I thought it would be better to make it explicit.

Pginer-WMF mentioned this in T113257: Custom translation suggestions: Find opportunities to translate in topic areas selected by the user. · Dec 16 2022, 11:11 AM

Pginer-WMF updated the task description. · Dec 16 2022, 11:13 AM

Pginer-WMF updated the task description. · Dec 16 2022, 11:16 AM

Pginer-WMF updated the task description.

Pginer-WMF mentioned this in T317056: Content Translation- suggest top 10 most viewed articles to content translation. · Dec 16 2022, 11:21 AM

Pginer-WMF mentioned this in T321529: Design: adding a Translation task to the Newcomer Homepage. · Jan 11 2023, 3:48 PM

kostajh subscribed. · Jan 12 2023, 9:56 AM

Some Progress We've Made Related To this Work:

We've recently been able to develop and deploy a liftwing api to the API gateway (which was a blocker to migration)
We can now work on migrating the model to Liftwing so we can work on it there. Based on internal discussions with the ML/AI team, we believe this should be a relatively small effort. (Working on scheduling)
We'd then have to work with Research and the ML/AI on improvements to the model

Pginer-WMF added a subtask: T308164: Migrate Content Translation Recommendation API to Lift Wing. · Mar 27 2023, 11:14 AM

In today's prioritization meeting we decided to prioritize this request for Q4. ML Platform (@calbon) and Research (myself) will be working with @Arrbee and the teams in the coming 2 weeks to figure out the details of what will get done in Q4.

leila added a project: Research. · Mar 29 2023, 10:39 PM

Update on the Research end: @diego will work on the Research portion of support for this task. We are tracking his contributions through T333893. The Research portion of the task may require research contributions related to topic models. This is @Isaac's area of expertise and Diego will ask for Isaac's support if needed. (cc @Miriam)

leila added a subscriber: Miriam. · Apr 3 2023, 10:47 PM

leila removed a project: Research. · Apr 4 2023, 12:10 AM

calbon added a project: Machine-Learning-Team. · Apr 4 2023, 2:22 PM

calbon moved this task from Unsorted to Watching on the Machine-Learning-Team board. · Apr 11 2023, 2:58 PM

Hi folks!

The ML team has been working to add the Python service outlined in https://recommend.wmflabs.org to production on K8s, but we realized that another service called "recommendation-api" was already deployed and exposed via Restbase:

https://en.wikipedia.org/api/rest_v1/#/

The above points to a nodejs application that somewhat overlaps with the code running on https://recommend.wmflabs.org, as outlined in T308165#7983559, so I am wondering whether the Content Translation team has tried the Restbase API.

I am asking since, for a lot of technical reasons, we cannot have two Kubernetes services with the same name, and in T338471 we were wondering whether the current Restbase API was in use and whether it differed from the Python one. Since both services are very old we lost track/knowledge about them, but the one thing that is certain is that they are both running a very old model from years ago.

To summarize:

1) Is https://en.wikipedia.org/api/rest_v1/#/ a possible replacement for https://recommend.wmflabs.org/ for Content Translation? If so the API is already there and can be used. The API Platform is deprecating Restbase so they may need to relocate the endpoint somewhere else, but nothing terrible. If you only need access to a WMF-internal service, hence no need for Restbase, then the API is available at recommendation-api.discovery.wmnet.
2) Are we aware that the recommendations from recommend.wmflabs.org are from a very old model? Is it an issue? We may need to follow up on this after 1).

@Pginer-WMF tagging you since you are mentioned in the task's description as requester, lemme know if I should ping other folks :)

Thanks in advance!

In T293648#8949413, @elukey wrote:

Is https://en.wikipedia.org/api/rest_v1/#/ a possible replacement for https://recommend.wmflabs.org/ for Content Translation? If so the API is already there and can be used. The API Platform is deprecating Restbase so they may need to relocate the endpoint somewhere else, but nothing terrible.

We are aware of this API exposed via restbase. In 2021, I tried to do a migration to this service, and noticed several issues outlined in this ticket: T190034: Update Recommend tool to adapt Production service. And that attempt was abandoned.

If you only need access to a WMF-internal service, hence no need for Restbase, then the API is available at recommendation-api.discovery.wmnet.

The API is accessed from browser, so we cannot use internal service name.

Note that we have not tested it recently to see if those issues are resolved now. We can do an evaluation if that helps. But restbase APIs are going to be deprecated. So we will require another migration soon right?

In T293648#8954712, @santhosh wrote:

In T293648#8949413, @elukey wrote:

Is https://en.wikipedia.org/api/rest_v1/#/ a possible replacement for https://recommend.wmflabs.org/ for Content Translation? If so the API is already there and can be used. The API Platform is deprecating Restbase so they may need to relocate the endpoint somewhere else, but nothing terrible.

We are aware of this API exposed via restbase. In 2021, I tried to do a migration to this service, and noticed several issues outlined in this ticket: T190034: Update Recommend tool to adapt Production service. And that attempt was abandoned.

Thanks a lot for the confirmation! I am trying to figure out if we can use the already deployed app, because on k8s we cannot have two services with the same name (in this case, recommendation api). Since the nodejs service is already set up, deployed and used by the Android team, I wanted to know whether we needed to add another similar API or whether we could avoid it.

If you only need access to a WMF-internal service, hence no need for Restbase, then the API is available at recommendation-api.discovery.wmnet.

The API is accessed from browser, so we cannot use internal service name.

Ack, instead of using Restbase (that is going away) we could expose the service via API gateway or elsewhere, shouldn't be a big issue.

In T293648#8954740, @santhosh wrote:

Note that we have not tested it recently to see if those issues are resolved now. We can do an evaluation if that helps. But restbase APIs are going to be deprecated. So we will require another migration soon right?

Sadly the API has not been updated in ages, so I guess that no new functionality has been added, but it is worth a quick check if you have time. From T190034 I didn't get exactly what the blockers/missing features are; could you please add a quick list in there so we (ML/Research) can reason about what needs to be added?

Thanks in advance :)

The REST API does not accept multiple seed articles. This is important for CX. CX requires previously translated/edited titles to be used as seeds for new suggestions. If the API accepts just one article, it affects the quality of suggestions. We had discussed overcoming this limitation by using one seed, then a second seed, and so on, in multiple API calls triggered by a "Refresh" button. This is a client-side change we need to do. However, recommendations based on a single title may not be as appealing as ones based on all previous contributions by the user.
The system behind https://recommend.wmflabs.org/ exposes multiple algorithms, but the API parameter to choose between them is missing in the REST API. I can't say it is a big blocker now, as the quality of suggestions in general is not good these days (from personal testing).
We used to get several 500 errors from the API in 2021. In quick testing now I don't see them, but that requires more testing.

As time passes, the major issue is none of these. It is the age of the model and the unmaintained core algorithm. These have not been updated for the last 7 years or so.

@santhosh thanks a lot, and I agree that both services (rest/nodejs and python/cloud) have not been updated in ages, so something will need to be done (the plan is also to figure out what service to add and how to maintain it).

So far my understanding is that both APIs are doing almost the same things, but with different variations, so we should keep only one of the services, not both if possible (to reduce TOIL and tech debt). Is this a fair assumption?

Yes. The Python-based service is supposed to be the right (latest) one, if my memory is correct: the system behind recommend.wmflabs.org. If that can be in Lift Wing with a public endpoint, we (CX) can switch to that. That also solves the upcoming REST API deprecation issue. Running it in a new system might be tricky, from my quick reading of the code. There is a model file of about 1.1 GB hosted on figshare.com that needs to be downloaded!

We, the Language team, need more sophisticated recommendations from this system for our upcoming plans. Since there is no owner for the core algorithm and model, I started a discussion in the team to see what we can do about it.

Perfect let's keep our teams in sync with this issue and the future steps, I agree that the service needs some serious attention.

Regarding the support for more recent needs, these are described in the product requirements document included in the ticket description above.

One of the main limitations of the current API is that it focuses on article creation (i.e., looking for articles that exist in the source language but are missing in the target one). However, since 2021 we have supported the translation of specific article sections to expand existing articles. This scenario (expanding existing articles) is not supported by the current recommendation APIs.

@Pginer-WMF thanks! So the main issue currently is that the restbase/nodejs version of the recommendation-api is used by the Android team, meanwhile Content Translation uses the one in Python running on WMF-cloud. We cannot have two services called the same on k8s, so we'll need to find a new name before adding the Python recommendation-api. We cannot deprecate the restbase/nodejs version because it is used, so we'll probably have to live with two services for the moment.

I'll talk with my team about next steps, and will update the task!

In T293648#8955031, @elukey wrote:

@Pginer-WMF thanks! So the main issue currently is that the restbase/nodejs version of the recommendation-api is used by the Android team, meanwhile Content Translation uses the one in Python running on WMF-cloud. We cannot have two services called the same on k8s, so we'll need to find a new name before adding the Python recommendation-api. We cannot deprecate the restbase/nodejs version because it is used, so we'll probably have to live with two services for the moment.

I'll talk with my team about next steps, and will update the task!

Thanks for the context. Based on my interpretation of the comments Santhosh made above, the main difference in terms of functionality is that one service supports multiple seed articles, so maybe that could be used to make the different names more meaningful (e.g., calling them "recommendations" and "recommendations-multiseed"). Otherwise, with more arbitrary names we may end up trying to figure out again what the difference between them was.

The REST API does not accept multiple seed articles.

Chiming in -- looks like the restbase/nodejs version does accept multiple seed articles, it's just not documented as far as I can tell. Example: https://es.wikipedia.org/api/rest_v1/data/recommendation/article/creation/translation/en/Cerro_Tuzgle%7CChicago?count=8 (where I'm passing the name of an article about a volcano and the city of Chicago delimited by a | and getting back results related to Chicago or volcanoes so clearly both seeds are being accounted for)
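For anyone who wants to try the multi-seed form, here is a minimal sketch that rebuilds the example URL above from a list of seed titles. The function name and the parameter names (`domain_lang`, `seed_lang`) are hypothetical and only reflect my reading of the example; the `|` delimiter is the undocumented behaviour described above, so treat it as an observation rather than a guaranteed contract.

```python
from urllib.parse import quote


def build_multiseed_url(domain_lang, seed_lang, seeds, count=8):
    """Build a restbase/nodejs recommendation URL with several seed articles.

    domain_lang and seed_lang are hypothetical names for the two language
    codes that appear in the example URL (the wiki host and the path segment).
    """
    # Join the seed titles with "|" and percent-encode it ("|" becomes "%7C").
    seed_segment = quote("|".join(seeds), safe="")
    return (
        f"https://{domain_lang}.wikipedia.org/api/rest_v1/data/recommendation"
        f"/article/creation/translation/{seed_lang}/{seed_segment}?count={count}"
    )


url = build_multiseed_url("es", "en", ["Cerro_Tuzgle", "Chicago"])
print(url)
```

The sketch only builds the URL string; fetching it and handling errors is left to the client.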

And also adding this in reference to the original request around e.g., adding topic functionality. Currently my understanding is that these endpoints are mainly making queries to the morelike endpoint of CirrusSearch and then doing some additional filtering -- e.g., checking if the article exists or not in the target language. Search already provides the functionality to filter by topic, which is how e.g., Growth handles topic filters in Newcomer Tasks. You could update the API to do a single query like the below examples to both provide a seed article(s) and restrict results to a certain topic:

Seed is en:bee; topic=chemistry: https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=morelikethis:bee%20articletopic:chemistry&format=json
Seed is en:bee; topic=music: https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=morelikethis:bee%20articletopic:music&format=json

You can use | to separate additional morelikethis seeds or articletopic topics, though if one topic/seed has much more relevant results, you'll predominantly see those.
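A rough sketch of how a client could assemble the combined query from the examples above. The helper names are hypothetical, and it assumes single-word seed titles like the `bee` example; titles with spaces may need extra quoting that I have not checked.

```python
from urllib.parse import urlencode


def build_srsearch(seeds, topics):
    """Combine morelikethis seeds and articletopic filters into one srsearch string."""
    return f"morelikethis:{'|'.join(seeds)} articletopic:{'|'.join(topics)}"


def build_search_url(wiki_lang, seeds, topics):
    """Build an action=query&list=search URL like the two examples above."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": build_srsearch(seeds, topics),
        "format": "json",
    }
    return f"https://{wiki_lang}.wikipedia.org/w/api.php?{urlencode(params)}"


print(build_search_url("en", ["bee"], ["chemistry"]))
```

Note that `urlencode` writes the space as `+` and the colon as `%3A`, so the printed URL looks slightly different from the hand-written examples but carries the same `srsearch` value.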

@Pginer-WMF @santhosh - one extra question - do we have any plan/agreement about who will maintain the "new" Python API? Will it be a collaboration between Research and CTX? Or something different? The ML Team will own the infrastructure bit, etc.. of course.

kevinbazira mentioned this in T340531: Grant ML Team members +2 rights to the recommendation-api repository. · Jun 27 2023, 9:23 AM

@elukey I acknowledge your comment and am working on it. We will get back to you.

@elukey the short answer is that we don't have an agreement, yet, and we should. : )

can you please provide a high level list of items that need maintenance from your perspective and mark the parts that ML Platform will be responsible for? We can then take the other components and discuss maintenance accountability on our end.

In T293648#8978054, @leila wrote:

@elukey the short answer is that we don't have an agreement, yet, and we should. : )

Nice :)

can you please provide a high level list of items that need maintenance from your perspective and mark the parts that ML Platform will be responsible for? We can then take the other components and discuss maintenance accountability on our end.

In my opinion these are the main steps:

1) Figure out what we want to do with the nodejs recommendation API, and see if the Python one (that supports GAP Finder etc..) can be used instead (once it is deployed on Lift Wing). Maintaining a Python API is way easier for us and Research, so we should concentrate our efforts on it. It will also give us some good karma since the nodejs recommendation api is currently holding off the Restbase deprecation, so getting rid of it would make a lot of teams happy :) I opened T340854 for this, but I'd need some help from Research to understand the various use cases. [ML + RESEARCH]

2) Deploy the Python recommendation API on Lift Wing, and possibly improve its state/code. The ML team will deploy the new service (probably named recommendation-api-ng) and we'll try to figure out if it can be migrated to a faster/better framework like FastAPI. [Mostly ML but help from RESEARCH would be good in reviews etc..].

3) Establish ownership of the service. Ideally the ML team will run it on behalf of Research (so monitoring/availability/reliability/upgrades/etc..) but new features should be developed by Research or Content Translation (we don't have a lot of bandwidth to improve it, aside from what is written in 2). [RESEARCH]

4) Allow the Android app to migrate its code to the Python API on Lift Wing, allow Content Translation to use the API, etc.. [ML]

@leila lemme know if the above makes sense, we can do a quick chat/meeting if you want to discuss it further :)

In T293648#8964443, @elukey wrote:

@Pginer-WMF @santhosh - one extra question - do we have any plan/agreement about who will maintain the "new" Python API? Will it be a collaboration between Research and CTX? Or something different? The ML Team will own the infrastructure bit, etc.. of course.

@Pginer-WMF @santhosh - sorry for the extra ping, do you think that your team could maintain the new API? The ML team will support its deployment and reliability on K8s, but we don't have the bandwidth to do any development on it (and the Research team has limited resources too). We'd need to have some ownership before starting, otherwise this service will end up unowned/abandoned very soon.

@elukey not an answer to your question, but trying to assess the effort required here. Like everybody else we are also constrained by people capacity :-)

I was looking at the code to assess how much effort it requires to modernise it and get it running with LiftWing. I assume you are expecting the Language team to do this LiftWing adaptation of the current Python service, am I right?

The code at https://github.com/wikimedia/research-recommendation-api is a Flask (flask-restplus) application. It did not run with Python 3.10 when I tried, so a lot of dependency updates are needed. But since LiftWing does not require Flask, it would mostly be a matter of making it very lean and keeping only the recommendation part. I have not looked into the algorithm yet.

kevinbazira mentioned this in T339890: Host the recommendation-api container on LiftWing. · Jul 10 2023, 3:15 PM

@santhosh By ownership we are mostly referring to implementing new features. We are already working on deploying this flask app as a service on the Lift Wing - keep in mind that this one has nothing to do with kserve applications and it is a standard web application on kubernetes.
So our proposal is to deploy it (perhaps even transform it into a FastAPI app for more efficient async requests) and support the service more like an SRE team, as @elukey stated above (monitoring/availability/reliability/upgrades/etc.). The owners will then be the ones implementing/removing/changing features.

fkaelin subscribed. · Jul 19 2023, 3:04 PM

We are already working on deploying this flask app as a service on the Lift Wing

@isarantopoulos Could you please update the status on this work?

Pginer-WMF mentioned this in T309603: Analyze topic diversity of published translations. · Sep 18 2023, 1:31 PM

Adding another note (on top of T293648#8956259) around improving functionality of the API when we get to that stage: when a seed article is provided (e.g., find articles like "en:Basketball" to translate into my language), the API grabs the top-500 most-similar articles to en:Basketball per morelike but then it returns them in reverse order (the least similar first). I think this is because there used to be settings to sort by # of pageviews or # of sitelinks, where you'd want to sort highest to lowest. The default, however, is just search result rank, which has the opposite semantics (lower rank is better). It'd be very easy to fix so the most-similar articles are returned (or better yet, diversified to be e.g., every 10th article with some randomness about where the counting starts).
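To make the two ideas concrete, here is a small sketch under the assumption that each candidate is a dict with a `rank` field where lower means more similar. This is not the service's actual code; the field name, function names, and the generated sample data are all hypothetical.

```python
import random


def order_by_similarity(results):
    """Sort candidates so the most-similar one (lowest search rank) comes first."""
    return sorted(results, key=lambda item: item["rank"])


def diversify(ranked, step=10, rng=None):
    """Keep every `step`-th candidate, starting the count at a random offset."""
    if not ranked:
        return []
    rng = rng or random.Random()
    start = rng.randrange(min(step, len(ranked)))
    return ranked[start::step]


# Made-up input: 500 candidates delivered least-similar first, as described above.
results = [{"title": f"Article {i}", "rank": i} for i in range(499, -1, -1)]
ordered = order_by_similarity(results)
picked = diversify(ordered, step=10, rng=random.Random(42))
print(ordered[0]["title"], len(picked))  # Article 0 50
```

Sorting ascending by rank fixes the reversed order, and the random starting offset means repeated calls do not always return the same 50 articles.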

Nikerabbit mentioned this in T208197: ContentTranslation relies on recommendation-api running on Cloud VPS. · Dec 4 2023, 11:53 AM

leila closed subtask T333893: Support recommendation API improvements as Resolved. · Dec 7 2023, 8:17 PM

Pginer-WMF mentioned this in T333893: Support recommendation API improvements. · Dec 11 2023, 11:12 AM

After discussions with @Isaac and @Pginer-WMF: one feature that I don't think we are paying attention to yet is the "relative importance" of the article in the domains that it's related to. We use a lot of shortcuts (such as pageviews and interrelated links), but there is already a lot of curated information on the wikis about editors' evaluation of importance that we could be using.

Suggestion: we could experiment with recommender criteria for the "importance" of the article as it relates to areas of knowledge by using the "importance" field used by WikiProjects in the PageAssessments extension ( https://www.mediawiki.org/wiki/Extension:PageAssessments ). The extension is only installed on a handful of the major wikis (and principally used by En, Fr, Ar, and Tr: https://extloc.toolforge.org/extensions/PageAssessments ), and the ratings information is only available on 10s of millions of articles (enwiki has about 10 million). However, because the WikiProject infrastructure across many wikis relies heavily on the model and information coming from the En and Fr Wikipedias, this should not massively hurt scoring of diversity topics. The approach could be something like: retrieve the Wikidata item, look for the wikis where the extension is installed, then retrieve and average the WikiProject importance scores and count the # of WikiProjects on these pages. Together these would give a signal of the article's relative importance to different domains of knowledge.
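A minimal sketch of the averaging step, assuming the assessments have already been fetched per wiki as (WikiProject, importance) pairs. The numeric scale, the function name, and the sample data are all made up for illustration; how to weight the classes would be part of the experiment.

```python
# Assumed numeric scale for importance classes; not defined by the extension.
IMPORTANCE_SCORES = {"Top": 4, "High": 3, "Mid": 2, "Low": 1}


def importance_signal(assessments_by_wiki):
    """Return (mean importance score, number of WikiProject assessments).

    assessments_by_wiki maps a wiki name to a list of
    (wikiproject, importance_class) pairs for one Wikidata item.
    """
    scores = []
    project_count = 0
    for assessments in assessments_by_wiki.values():
        for _project, importance in assessments:
            project_count += 1
            score = IMPORTANCE_SCORES.get(importance)
            if score is not None:  # skip unrated or unknown classes
                scores.append(score)
    mean_score = sum(scores) / len(scores) if scores else None
    return mean_score, project_count


# Made-up example data for a single article.
sample = {
    "enwiki": [("Volcanoes", "High"), ("Argentina", "Mid")],
    "frwiki": [("Géologie", "Top")],
}
print(importance_signal(sample))
```

Unrated assessments still count towards the number of WikiProjects but are left out of the average, so an article tagged by many projects without ratings is not scored as unimportant.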

A good experiment with this as an element of the recommender would also help us follow up further on @Isaac's recommendation that we provide something like PageAssessments to more wikis to allow greater crowdsourcing of information about topics.

ldelench_wmf subscribed. · Apr 17 2024, 2:49 PM

Removing inactive task assignee. (Please do so as part of offboarding - thanks.)

A recent report shows how translations based on longer source articles are more likely to be deleted. It may be useful to incorporate factors such as the article length and whether it meets the standard quality criteria that have an influence on the resulting content quality as part of the recommendation filtering options.

More details about some of the takeaways of this study in T356765.

Isaac mentioned this in T360455: Add Article Quality Model to LiftWing. · May 27 2024, 2:32 PM

Pginer-WMF mentioned this in T367873: Technical exploration to support topic-based suggestions with the current Recommendation API. · Jun 18 2024, 12:43 PM

kevinbazira closed subtask T308164: Migrate Content Translation Recommendation API to Lift Wing as Resolved. · Sep 17 2024, 6:30 AM

eamedina subscribed. · Oct 10 2024, 7:04 PM

ppelberg mentioned this in T379405: Enable people to generate arbitrary feeds of edits. · Nov 25 2024, 9:43 PM