Description
The caption addition endpoint exhibits surprisingly high latency, even compared to the caption translation endpoint. This needs investigation and remediation.
Status | Assigned | Task
---|---|---
Resolved | Mholloway | T212793 Build infrastructure required to support the Suggested Edits feature
Resolved | Mholloway | T209997 Create a new API endpoint which returns Commons images in need of a caption or caption translation
Resolved | Mholloway | T225646 Caption addition endpoint is slow
Event Timeline
OK, after some quick-and-dirty profiling, I can say with confidence that the imageinfo (unstructured captions) query dominates the latency for both endpoints, regardless of the language(s) requested, and that is what's killing performance here. For example, on a single run for /caption/addition/es:
CirrusSearch time: 1504
imageinfo time: 11492
wbgetentities time: 227
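To put the figures above in proportion, here is a minimal sketch that parses the timings and computes each stage's share of the total. It assumes the values are milliseconds (consistent with the roughly 1.5-2 second CirrusSearch cost mentioned in this task) and that the three stages run one after another; `parse_timings` is a hypothetical helper, not part of the service.

```python
import re

# Timings from the single profiling run quoted above.
# ASSUMPTION: values are milliseconds and the stages run sequentially.
line = "CirrusSearch time: 1504 imageinfo time: 11492 wbgetentities time: 227"


def parse_timings(text):
    """Return {stage: duration} parsed from '<stage> time: <n>' pairs."""
    return {name: int(value) for name, value in re.findall(r"(\w+) time: (\d+)", text)}


timings = parse_timings(line)
total = sum(timings.values())
shares = {name: value / total for name, value in timings.items()}
without_imageinfo = total - timings["imageinfo"]

for name, value in timings.items():
    print(f"{name}: {value} ms ({shares[name]:.1%})")
print(f"total: {total} ms, without imageinfo: {without_imageinfo} ms")
```

Under those assumptions, imageinfo accounts for most of the request time on this run, and dropping it would leave the CirrusSearch and wbgetentities calls as the remaining cost.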
Ideally, we'd have a proper in-memory queuing system to support these endpoints, which would eliminate the client-facing latency, but for a variety of reasons that's not really possible at least in the near term. I'd again recommend dropping the unstructured captions from this endpoint, if the app team can live with that.
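For illustration only, the in-memory queuing idea mentioned above could look roughly like the following: a background worker pays the slow upstream cost ahead of time, so a client request only pops a ready result. This is a toy sketch, not the service's design; `PrefetchQueue` and `slow_fetch_candidate` are hypothetical names, and the sleep merely stands in for the slow queries.

```python
import queue
import threading
import time


def slow_fetch_candidate(n):
    """Hypothetical stand-in for the slow upstream queries."""
    time.sleep(0.05)
    return {"title": f"File:Example_{n}.jpg"}


class PrefetchQueue:
    """Keep up to `size` results ready so requests do not wait on upstream calls."""

    def __init__(self, fetch, size=5):
        self._fetch = fetch
        self._items = queue.Queue(maxsize=size)
        self._counter = 0
        # Daemon thread: it refills in the background and exits with the process.
        self._worker = threading.Thread(target=self._refill, daemon=True)
        self._worker.start()

    def _refill(self):
        while True:
            item = self._fetch(self._counter)
            self._counter += 1
            self._items.put(item)  # blocks while the queue is full

    def get(self, timeout=None):
        """Return the next prefetched result, waiting only if the queue is empty."""
        return self._items.get(timeout=timeout)


if __name__ == "__main__":
    suggestions = PrefetchQueue(slow_fetch_candidate, size=3)
    time.sleep(0.3)  # give the worker time to warm the queue
    print(suggestions.get(timeout=1))
```

The trade-off is that prefetched results can go stale and the queue has to be kept per language, which is part of why this kind of setup needs real infrastructure rather than a quick patch.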
We determined that extmetadata should be dropped for this reason. Even so, querying CirrusSearch will always incur a penalty of roughly 1.5-2 seconds.