
Decide whether creating Phester is actually worthwhile
Closed, ResolvedPublic1 Estimated Story Points

Description

Now that we have a better understanding of what Phester should do and how much work it would take to create it, we should reconsider using existing tools instead.

An initial survey turned up the following options:

  • strest (typescript, new/small project, yaml based, lacks some features)
  • tavern (python, new/small project, yaml based, lacks some features)
  • behat (php, established, feature rich & complex, focus on cucumber style logic tests)
  • codeception (php, established, feature rich & complex, focus on fluent style logic tests)

These should be discussed, and more should perhaps be found and considered.

Decision matrix (tentative): https://docs.google.com/spreadsheets/d/1G50XPisubSRttq4QhakSij8RDF5TBAxrJBwZ7xdBZG0/edit#gid=0

Requirements that seem particular to (or especially important for) the Wikimedia use case:

  • HTTP-centric paradigm, focusing on specifying the headers and body of requests, and running assertions against headers and body of the response.
  • Support for running assertions against parts of a structured (JSON) response (JSON-to-JSON comparison, with the ability to use the more human friendly YAML syntax)
  • filtering by tags (because we expect to have a large number of tests)
  • parallel execution (because we expect to have a large number of tests)
  • yaml based declarative tests and fixtures: tests should be language agnostic, so that people involved with different language ecosystems and code bases can easily write them. This also avoids lock-in to a specific tool, since yaml is easy to parse and convert (see the sketch after this list).
  • generalized fixture creation, entirely API based, without the need to write "code" other than specifying requests in yaml.
  • randomized fixtures, so we can create privileged users on potentially public test systems.
  • control over cookies and sessions
  • ease of running in dev environments without the need to install additional tools / infrastructure (this might be a reason to switch to python for implementation; node.js is also still in the race).
  • discovery of tests defined by extensions.
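
To make these requirements concrete, here is a rough sketch of what such a declarative test could look like. This is purely illustrative and not actual Phester syntax; all keys (suite, tags, fixtures, request, response) are hypothetical.

```
# Hypothetical declarative test spec; field names are illustrative only,
# not actual Phester syntax.
suite: action-api-smoke
tags: [ editing, smoke ]

fixtures:
  # a fixture created entirely via the API, with a randomized name
  alice:
    type: user
    name: "Alice-{{random}}"
    groups: [ sysop ]

tests:
  - description: parse an existing page
    request:
      method: GET
      path: /api.php
      params:
        action: parse
        page: Test
        format: json
      session: alice          # cookies / session taken from the fixture
    response:
      status: 200
      headers:
        content-type: application/json
      body:                   # JSON-to-JSON comparison against part of the response
        parse:
          title: Test
```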


Event Timeline

Very VERY brief assessment of the tools suggested by corey:

In terms of functionality, dredd seems to be a good fit at a glance. Requiring node.js for running tests doesn't seem ideal, but it will probably become a lot less annoying with better containerization of the development and testing environment. If dredd were written in python or php, I'd probably go for it.

This week I was tasked with testing Dredd and I wanted to provide a full summary since we’ll be making the final decision soon.

Dredd accepts two different file types, OpenAPI and API Blueprint. OpenAPI has strict guidelines against duplicating endpoints and methods, so it's not surprising that Dredd errors out when you attempt to bypass them. This is not workable for the Action API, as there is only one endpoint. Even if the Action API were a REST API, it would still limit the tests to targeting each HTTP method only once.
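
To make that constraint concrete: in an OpenAPI document, paths and the HTTP methods under them are YAML mapping keys, so each path/method pair can only be described once. A minimal, purely illustrative fragment (not our actual API description):

```
openapi: "3.0.0"
info:
  title: Action API (sketch)
  version: "1.0"
paths:
  /api.php:
    # only one "post:" key is allowed under this path, so there is no way
    # to describe several different POST test cases against the single
    # Action API endpoint
    post:
      summary: the one and only POST description for /api.php
      responses:
        "200":
          description: OK
```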

The other option is API Blueprint. I've wrestled with the specification over the last couple of days (mainly because it has VERY limited documentation and low adoption). Unlike OpenAPI/Dredd, API Blueprint doesn't error out when you provide duplicate endpoints or methods, but it does emit a warning. Also, note that although API Blueprint seems to be more forgiving of certain violations, it would still tie the monitoring and integration tests to a specification that doesn't seem to have added much in 3 years.

To get a sense of how our simple ActionAPI test looks with Dredd/API Blueprint, look here. Compared to Phester it's over 4x larger, and it requires a schema of the responses in order to compare results and provide regex support. For monitoring this isn't really a con, since you'll only be testing a handful of endpoints, but when testing a large and complex API like the Action API this can quickly grow into large files that are prone to errors.

One of the earlier selling points of Dredd for me was the variable extraction, as shown here. On the one hand it provides a lot of flexibility, letting you run unique instructions before/after each endpoint is tested; on the other hand it doesn't lend itself to DRY practices when you have to explicitly extract and insert local variables for multiple requests, as I expect we'd be doing for the Action API.

Overall, I think Dredd is a workable solution for small APIs but not the best tool for testing large APIs like the Action API. In this case, I’d lean more towards further developing Phester.

It's obvious a few people have put a lot of thought and effort into this. Existing options were considered and the decision was made to build a prototype. As far as I can tell, it's working well enough to consider further development.

In general, I dislike tests that are not written in a programming language. I can see the value in this case, but we should be very careful.

Advantages

The biggest advantage of tests not written in a programming language is that people not familiar with that language can read and write them. That is only valuable if the people writing tests are not familiar with the code. If all or most of the people writing and reading tests already know the language (PHP), then writing the tests in that language might make more sense.

Disadvantages

I do like yaml in general, but I am not sure it's a good choice for a big test suite. (As far as I understood it, phester would be used to create a big test suite.) To get a taste of why it might not be a good idea to write a big project in yaml, take a look at the [[ https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master | integration/config ]] repository, [[ https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/jjb/ | jjb ]] folder. It contains Jenkins job definitions written in yaml, which are transformed into the xml that Jenkins uses internally (as far as I understand it). It gets complicated quickly. I fear the same could happen to tests written in yaml.

If a lot of tests are similar, with very small differences (as is the case in the jjb example), reusing existing code might get complicated.
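
The failure mode is roughly this: once many entries differ only slightly, the YAML grows anchors, merge keys and overrides, and reading a single entry means chasing references through the file. A generic illustration (not taken from integration/config):

```
# Generic example of "almost the same, but slightly different" in YAML:
# reuse via anchors and merge keys quickly obscures what a single entry
# actually contains.
defaults: &job-defaults
  timeout: 30
  retries: 2
  env: ci

jobs:
  - <<: *job-defaults
    name: test-core
  - <<: *job-defaults
    name: test-extensions
    timeout: 60          # overrides the anchored value
  - <<: *job-defaults
    name: test-core-php74
    env: ci-php74        # yet another small variation
```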

Recommendation

My recommendation would be to implement a small but representative test suite in both yaml and a programming language (like php), using a testing framework like phpunit. The suite should have a small number of very simple tests (list all pages, list all users...), a small number of tests of usual complexity (create account > log in > edit page...), and the vast majority would be complicated tests, running multiple fixtures and testing complicated workflows, trying to reuse existing code with small differences. That is where I think the tool will either shine or crash. The duplicated effort of creating two test suites initially should not take too much time, and comparing how easy it is to write and read the tests in each form would, I think, be very valuable.

Even if phester doesn't support all the features that more complicated tests require, I think the tests should be written anyway, to develop the test format.
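
As a rough idea of what the "usual complexity" category might look like in a declarative format — multiple steps, with values carried from one response into the next request — something along these lines (again a hypothetical format, not actual Phester syntax; the {{...}} placeholders and the capture key are invented, and the account-creation details are simplified):

```
tests:
  - description: create account, log in, edit a page (details simplified)
    steps:
      - request:
          method: POST
          path: /api.php
          form: { action: createaccount, username: "Tester-{{random}}", format: json }
        response:
          status: 200
          body: { createaccount: { status: PASS } }

      - request:
          method: POST
          path: /api.php
          form: { action: edit, title: "Sandbox-{{random}}", text: hello, format: json }
        response:
          status: 200
          body: { edit: { result: Success } }
        capture:
          pageid: edit.pageid   # extracted for use in later steps
```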

Thanks to @thcipriani for a humorous page explaining why you should use yaml for everything https://noyaml.com/

> Decision matrix (tentative): https://docs.google.com/spreadsheets/d/1G50XPisubSRttq4QhakSij8RDF5TBAxrJBwZ7xdBZG0/edit#gid=0

It doesn't look like Behat got scored on all criteria; I've used it extensively over the past few years and would be happy to talk through its strengths and weaknesses if anyone on CPT would like.


> It doesn't look like Behat got scored on all criteria; I've used it extensively over the past few years and would be happy to talk through its strengths and weaknesses if anyone on CPT would like.

I'd love to hear your impression. Can you start by putting comments into the matrix? Or values, if you feel like it.

So I played with codeception a bit. Here's the basic CRUD tests:

<?php

class CRUDCest {
    public function _before( ApiTester $I ) {
    }

    // tests
    public function testCreateEditDelete( ApiTester $I ) {
        $I->wantTo( 'Create, edit, and delete' );

        $I->haveHttpHeader( 'Content-Type', 'application/x-www-form-urlencoded' );
        $I->sendPOST( 'api.php',
            [ 'action' => 'edit',
                'title' => 'Test',
                'createonly' => 'true',
                'format' => 'json',
                'summary' => 'some test',
                'text' => 'test text',
                'token' => '+\\',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseContainsJson( [
            'edit' => [
                'result' => 'Success'
            ]
        ] );

        $I->sendGET( 'api.php',
            [ 'action' => 'parse',
                'page' => 'Test',
                'format' => 'json',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseIsJson();
        $I->seeResponseMatches( '/test text/' );

        $I->haveHttpHeader( 'Content-Type', 'application/x-www-form-urlencoded' );
        $I->sendPOST( 'api.php',
            [ 'action' => 'edit',
                'title' => 'Test',
                'format' => 'json',
                'summary' => 'some edit',
                'text' => 'edited test text',
                'token' => '+\\',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseContainsJson( [
            'edit' => [
                'result' => 'Success'
            ]
        ] );

        $I->sendGET( 'api.php',
            [ 'action' => 'parse',
                'page' => 'Test',
                'format' => 'json',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseIsJson();
        $I->seeResponseMatches( '/edited test text/' );

        $I->haveHttpHeader( 'Content-Type', 'application/x-www-form-urlencoded' );
        $I->sendPOST( 'api.php',
            [ 'action' => 'delete',
                'title' => 'Test',
                'format' => 'json',
                'token' => '+\\',
            ]
        );
        $I->seeResponseCodeIs( \Codeception\Util\HttpCode::OK ); // 200
        $I->seeResponseContainsJson( [
            'delete' => [
                'title' => 'Test'
            ]
        ] );

        $I->sendGET( 'api.php',
            [ 'action' => 'parse',
                'page' => 'Test',
                'format' => 'json',
            ]
        );
        $I->seeResponseContainsJson( [
            'error' => [
                'code' => 'missingtitle'
            ]
        ] );
    }
}

It's not too bad, but I find YAML more convenient for representing JSON structures and HTTP headers.
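
For comparison, the data from the seeResponseContainsJson() and header calls above, written as YAML rather than nested PHP arrays (the headers/body wrapper keys are just for illustration):

```
headers:
  Content-Type: application/x-www-form-urlencoded
body:
  edit:
    result: Success
```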

Also, codeception produces over 5000 (!) lines of generated code as scaffolding around this.

Codeception is pretty flexible. I'm wondering how hard it would be to implement the functionality we want for phester (including YAML based test specs) into codeception.

If we were to use codeception for other things as well (perhaps instead of selenium, or as a wrapper around phpunit, both of which it supports), this would make sense. But if all we want to do is validate HTTP request/response pairs, codeception seems to be overkill, and gets in the way more than it helps. And I'm not sure how easy (or hard) it would be to make it play nicely with tests defined by extensions.

Also, codeception doesn't seem like a good fit for monitoring live services, though it's probably possible.

> I'd love to hear your impression. Can you start by putting comments into the matrix? Or values, if you feel like it.

It’s on my TODO list :)

Meanwhile you could have a look at https://github.com/deminy/behat-rest-testing

> Meanwhile you could have a look at https://github.com/deminy/behat-rest-testing

Thanks, played a bit with it. Here's the CRUD test in Behat:

Feature: action API
  In order to confidently refactor code
  as a developer
  I want to see if the action API works as expected

  Scenario: CRUD
    When I send a POST request to "/api.php?action=edit&format=json" with form data:
      """
      title=BehatTest
      createonly=1
      summary=testing
      text=some+text
      token=%2B%5C
      """
    Then response code should be 200
    And field "edit/result" in the response should be "Success"

    When I send a GET request to "/api.php?action=parse&page=BehatTest&format=json"
    Then response code should be 200
    And the response should contain "some text"

    When I send a POST request to "/api.php?action=edit&format=json" with form data:
      """
      title=BehatTest
      summary=testing
      text=different+text
      token=%2B%5C
      """
    Then response code should be 200
    And field "edit/result" in the response should be "Success"

    When I send a GET request to "/api.php?action=parse&page=BehatTest&format=json"
    Then response code should be 200
    And the response should contain "different text"

    When I send a POST request to "/api.php?action=delete&format=json" with form data:
      """
      title=BehatTest
      token=%2B%5C
      """
    Then response code should be 200
    And field "delete/title" in the response should be "BehatTest"
It's pretty compact, but only because I added logic to the RestContext.

Encoding the POST data as a string is awkward, especially because it requires manual URL encoding, but this could probably be fixed. There actually is a version that uses table syntax, but that has JSON encoding for the POST body hard coded, so I didn't use it.

Overall, the Cucumber approach of matching natural language with regular expressions and mapping that to PHP code seems error prone for the use case of API tests. Does `the response should contain "different text"` do a substring match or a regular expression match? Is it case sensitive? How does `I send a POST request to "/api.php?action=edit&format=json" with form data` differ from `I send a POST request to "/api.php?action=edit&format=json" with values`?

While being able to read the scenarios as English sentences is nice, it hides what's actually going on underneath. When testing an API, the high level "behavior" is not the only thing under test; the other thing is compliance in the nitty gritty: cache control headers, content negotiation, all that. It's possible to do all of that in Cucumber, but the extra layer of indirection seems to get in the way more than it helps. But maybe it's just a matter of getting used to it?...

One thing that isn't clear to me is how I'd pass variables within a scenario. E.g. after I created a page, I want to extract the page's ID from the response and use it in the next step of the scenario. The only option I found was using state in the context object. That's fine for a login or something, but different scenarios may need completely different things to be passed between steps. If that would require a specialized Context class, that would be a show stopper, I'm afraid.

I found another tool that is relatively close to phester conceptually: htt, the HTTP Test Tool (hosted on SourceForge - I didn't know that was still a thing). As far as I can tell, htt was created by the FSF in 2011 and has seen little activity between 2013 and 2019, but had a version 2.4 release this year.

Like phester, htt is purely declarative and centered on modeling HTTP requests and responses. It's very low level though, and I don't see anything like fixtures or variables. I don't think it's a viable alternative, but it's similar enough that we should look to it for inspiration and for pitfalls.

In other news, I'm investigating the possibility of making a codeception plugin for phester style tests. Seems quite doable, but I have only just started to poke around.

Fjalapeno changed the point value for this task from 2 to 1. Jul 23 2019, 1:47 PM