This is the Database scanner subsection of the user manual for AutoWikiBrowser.

It is community-maintained outside of the development team.
It may contain information that is out of date with the latest AutoWikiBrowser releases.
Feel free to edit, add, or remove content to improve its clarity and quality.

Start — Searches the selected database dump using the settings chosen in the other option boxes
Pause —
Reset —

Parameters

Database

Database file — Use the Browse button to select the database dump (an XML file) that you have downloaded to your machine.
- The following are automatically read from the header of the XML file specified.
  - Site name — Example: "Wikipedia".
  - Base — Homepage of the site. Example: "https://en.wikipedia.org/wiki/Main_Page".
  - Generator — Software version that created the dump file. Example: "MediaWiki 1.44.0-wmf.4 (a8dd895)".
  - Case — Casing configuration of site. Example: "first-letter".
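
The header fields listed above can be read without loading the whole dump. The following is a minimal Python sketch, not AWB's own code: the sample XML is a shortened, hypothetical header, and the helper names `local` and `read_siteinfo` are invented for illustration. It assumes the usual MediaWiki export layout, in which these values sit inside a `<siteinfo>` element at the top of the file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened dump header used only for this illustration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/" xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>https://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.44.0-wmf.4</generator>
    <case>first-letter</case>
  </siteinfo>
  <page><title>Dog</title></page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_siteinfo(stream):
    """Return the header fields, stopping as soon as </siteinfo> is parsed."""
    wanted = {"sitename", "base", "generator", "case"}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) == "siteinfo":
            return {local(c.tag): (c.text or "") for c in elem if local(c.tag) in wanted}
    return {}


info = read_siteinfo(io.BytesIO(SAMPLE))
print(info["sitename"], info["case"])
```

Because the parser returns at the end of `<siteinfo>`, only the first few kilobytes of a real dump would need to be read.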

Namespaces

Select the namespaces you want to search within. If none are selected, the search will include all available namespaces. Please note that your dump file might not contain data for every namespace available on your wiki.
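
The "none selected means all" rule can be expressed in one line. This is an illustrative Python sketch only; the function name is hypothetical, and the numbers are the standard MediaWiki namespace IDs (0 for articles, 10 for templates, 14 for categories).

```python
def in_selected_namespaces(page_ns, selected):
    """True if the page should be scanned; an empty selection means every namespace."""
    return not selected or page_ns in selected


print(in_selected_namespaces(10, set()))      # nothing ticked: templates are included
print(in_selected_namespaces(10, {0, 14}))    # only articles and categories ticked
```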

Title matching

Title does contain — Restrict the search to titles containing the text, or matching the text if the Regex option is used.
Title does not contain — Restrict the search to titles NOT containing the text, or NOT matching the text if the Regex option is used.
Regex — Treat the text as a regular expression (see the AWB Regex help).
Case sensitive — Whether the text/matching pattern should be case sensitive.
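
How the four title options combine can be sketched as follows. This is a rough Python approximation under stated assumptions, not AWB's implementation: the function `title_passes` and its parameters are hypothetical, and a single Regex and Case sensitive setting is assumed to apply to both text boxes.

```python
import re


def title_passes(title, contains="", not_contains="", regex=False, case_sensitive=False):
    """Apply the 'does contain' and 'does not contain' title filters."""
    flags = 0 if case_sensitive else re.IGNORECASE

    def hit(needle):
        # Plain text is escaped so it is matched literally, not as a pattern.
        pattern = needle if regex else re.escape(needle)
        return re.search(pattern, title, flags) is not None

    if contains and not hit(contains):
        return False
    if not_contains and hit(not_contains):
        return False
    return True


print(title_passes("List of dogs", contains="list of"))
print(title_passes("List of dogs", contains="list of", case_sensitive=True))
print(title_passes("List of dogs", contains=r"^List of .*s$", regex=True))
```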

Revision

Last edited date

Search date — Tick to restrict the search to pages whose revision (last edited) date falls within a date range.
- From — Start date of range.
- To — End date of range.
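
The date filter amounts to a range check on each page's revision timestamp. The sketch below assumes the timestamp format normally found in MediaWiki dumps (for example `2024-05-01T12:34:56Z`) and assumes both ends of the range are inclusive; the function name is hypothetical.

```python
from datetime import date, datetime


def edited_within(timestamp, start, end):
    """True if the revision timestamp falls on a day between start and end (inclusive)."""
    edited = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").date()
    return start <= edited <= end


print(edited_within("2024-05-01T12:34:56Z", date(2024, 1, 1), date(2024, 12, 31)))
print(edited_within("2023-12-31T23:59:59Z", date(2024, 1, 1), date(2024, 12, 31)))
```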

Text

Text searching

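Contains — Restrict the search to pages whose text contains the given text. The keywords %%title%%, %%key%%, %%titlename%% and %%namespace%% work only if the search is not a regex.
Not contains — Restrict the search to pages whose text does NOT contain the given text. The keywords %%title%%, %%key%%, %%titlename%% and %%namespace%% work only if the search is not a regex.
Regex — Treat the text as a regular expression (see the AWB Regex help).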

Singleline — Changes meaning of "." so it matches all characters, as opposed to all apart from newlines
Case sensitive — Enables case sensitivity
Multiline — Changes meaning of "^" and "$" so they represent the beginning and end respectively of every line, rather than just of the entire string
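
The effect of these three options is easiest to see on a short multi-line string. The example below uses Python's `re` flags as an analogue (`re.DOTALL` for Singleline, `re.MULTILINE` for Multiline, and the absence of `re.IGNORECASE` for Case sensitive). AWB uses a different regex engine, so treat this as an illustration of the concepts rather than of AWB's exact behaviour.

```python
import re

text = "Intro\nFirst line.\nSecond line."

# Singleline: "." also matches newlines, so a match can span several lines.
across_default = re.search(r"Intro.*Second", text)
across_singleline = re.search(r"Intro.*Second", text, re.DOTALL)

# Multiline: "^" matches at the start of every line, not only the start of the text.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# Case sensitive: without it, "intro" also matches "Intro".
sensitive = re.search(r"intro", text)
insensitive = re.search(r"intro", text, re.IGNORECASE)

print(across_default is None, across_singleline is not None)
print(starts_default, starts_multiline)
print(sensitive is None, insensitive is not None)
```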

Ignore  —

Page text properties

Characters —
Links —
Words —

Searching

AWB specific

None — will just list all the pages in the database dump (that match other scan filter criteria)
Has title AWB will embolden
Has links AWB will simplify — allows you to search a DB dump for links that can be simplified, e.g.:

Simplifies links like [[Dog|Dog]] to [[Dog]]
Simplifies links like [[Dog|Dogs]] to [[Dog]]s

Has bad links AWB will fix
Has HTML entries
Section error
Unbulleted links — will search a database dump for any pages that have external links which are not bullet pointed
Typo — allows you to search a database dump for spelling mistakes, in the same way that AWB can when RegexTypoFix is enabled
Missing {{defaultsort}}
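
As an illustration of the "Has links AWB will simplify" check, the two link examples given above can be detected with a small regex. This Python sketch is an approximation written for this manual, not AWB's actual rule set: it only handles the two cases shown (identical label, and label equal to the target plus a lowercase suffix), and the names `PIPED`, `simplify_links` and `has_simplifiable_links` are hypothetical.

```python
import re

# A piped link [[target|label]] with no nested brackets or extra pipes.
PIPED = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def _simplify(match):
    target, label = match.group(1), match.group(2)
    if label == target:
        return f"[[{target}]]"
    suffix = label[len(target):]
    if label.startswith(target) and suffix.isalpha() and suffix.islower():
        return f"[[{target}]]{suffix}"
    return match.group(0)  # leave every other piped link untouched


def simplify_links(text):
    return PIPED.sub(_simplify, text)


def has_simplifiable_links(text):
    """A page matches the scan option if simplifying would change its text."""
    return simplify_links(text) != text


print(simplify_links("A [[Dog|Dog]] and two [[Dog|Dogs]], but a [[Dog|canine]]."))
```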

Other options

Start from page — Starts from an entered page name. The dump is scanned until the specified page is found, then the scan continues as normal using the other search settings. Scanning until a page is found is faster than scanning with the full settings. However, the dump file up to that page still has to be read, so this takes time (approximately 30 seconds per gigabyte of XML data, depending on your system's CPU speed).
Limit results to — Limits the number of results that will be found and displayed from the database dump. If the limit is reached, the scan stops early.
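
The interaction of these two options can be sketched with lazy iteration: pages before the start page are skipped without being searched, and the scan stops once enough results are found. This is a hypothetical Python illustration; `scan`, its parameters, and the choice to include the start page itself in the scan are all assumptions.

```python
from itertools import dropwhile, islice


def scan(pages, matches, start_from=None, limit=None):
    """pages: iterable of (title, text); matches: predicate applied to each page."""
    if start_from is not None:
        # Skip cheaply until the start page is reached; nothing is searched yet.
        pages = dropwhile(lambda page: page[0] != start_from, pages)
    hits = (title for title, text in pages if matches(title, text))
    # islice stops pulling pages once the limit is reached, ending the scan early.
    return list(islice(hits, limit))


pages = [("Ant", "insect"), ("Bee", "insect"), ("Cat", "mammal"), ("Dog", "mammal"), ("Eel", "fish")]
print(scan(pages, lambda title, text: text != "fish", start_from="Bee", limit=2))
```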

Restriction

Allows searching for pages with edit restrictions (semi-protected, fully protected, etc.).

Help

Links to relevant help pages about database dumps.

Output

Performance

The speed of the database scanner mainly depends on two factors of the system it's run on:

- CPU single-threaded performance
- Hard disk read speed

Example performance: on an Intel Core i5 520M mobile CPU, the scanner runs at maximum CPU usage while reading sequentially from disk at about 30 MB/s.

So, with a reasonable 2010-era or later CPU, AWB reads the database XML dump file at around 30 MB/s and is CPU-limited. If the file is read from networked storage, scan performance drops when the network transfer speed is below this rate. When the file is read from a local disk, modern mechanical hard disks normally provide sequential read speeds well above 30 MB/s, so the scan speed remains CPU-limited.
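
As a back-of-the-envelope check of these figures, the scan time can be estimated from the dump size and the slower of the two rates. The sketch below is illustrative only: the function names, the 100 MB/s default for the storage, and the 1 GB = 1000 MB convention are assumptions, and 30 MB/s is simply the example figure quoted above.

```python
def effective_throughput(cpu_limit_mb_s=30.0, source_mb_s=100.0):
    """The scan runs at the slower of the CPU-limited rate and the storage or network rate."""
    return min(cpu_limit_mb_s, source_mb_s)


def estimated_scan_seconds(dump_size_gb, cpu_limit_mb_s=30.0, source_mb_s=100.0):
    rate = effective_throughput(cpu_limit_mb_s, source_mb_s)
    return dump_size_gb * 1000 / rate


local_minutes = estimated_scan_seconds(20) / 60                      # local disk, CPU-limited
network_minutes = estimated_scan_seconds(20, source_mb_s=10.0) / 60  # slow network share
print(round(local_minutes, 1), round(network_minutes, 1))
```

At 30 MB/s one gigabyte takes about 33 seconds, which agrees with the "approximately 30 seconds per gigabyte" figure given for the Start from page option.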

The database scanner is multi-threaded. It uses the main thread to read the database XML file from disk and additional threads to search the articles against the user's search criteria. The total number of threads equals the number of CPU cores (e.g. a quad-core CPU without hyperthreading gives 1 main and 3 secondary threads). The main thread pauses XML reading and contributes to article searching if the secondary threads fall too far behind. This happens when searching an article is slower than reading it from the XML file, which is typically the case. It does occur on the example Core i5 520M, so scanner performance there is limited by how fast all the threads can search the articles, that is, by the multi-threaded performance of the CPU.

A CPU with more cores, better per-core performance, or both would improve database scanner performance.
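
The threading scheme described above (one reader, several searchers, and a reader that helps out when the searchers fall behind) can be sketched with a bounded queue. This is a structural illustration in Python, not AWB's code: `scan_parallel`, `backlog` and the other names are hypothetical, and because of Python's global interpreter lock this sketch shows the coordination pattern rather than a real speed-up.

```python
import os
import queue
import threading


def scan_parallel(pages, matches, workers=None, backlog=64):
    """pages: iterable of (title, text). Returns the sorted titles that match."""
    if workers is None:
        workers = (os.cpu_count() or 2) - 1   # one thread is kept for reading
    workers = max(1, workers)
    work = queue.Queue(maxsize=backlog)       # a full queue means searchers are behind
    hits, lock = [], threading.Lock()

    def search(page):
        title, text = page
        if matches(title, text):
            with lock:
                hits.append(title)

    def worker():
        while True:
            page = work.get()
            if page is None:                  # sentinel: no more pages
                return
            search(page)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for page in pages:                        # the main thread acts as the reader
        try:
            work.put_nowait(page)
        except queue.Full:
            search(page)                      # backlog too large: search it here instead
    for _ in threads:
        work.put(None)
    for thread in threads:
        thread.join()
    return sorted(hits)


sample = [(f"Page {n:03d}", "typo teh" if n % 7 == 0 else "clean") for n in range(200)]
found = scan_parallel(sample, lambda title, text: "teh" in text, workers=3, backlog=8)
print(len(found), found[:3])
```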

Results

Filter — allows you to filter the results found from the DB dump. The options are the same as for the normal AWB list filter
Save — saves the list as a text document
Clear — clears the list of pages

Convert

Add headings every — adds a heading every x lines
Alphabetised headings —
# — makes a list with # before each page name; if placed on a wiki page, this will number the lines
* — makes a list with * before each page name; if placed on a wiki page, this will bullet point the lines
A B C... headings — adds a == heading == for each letter, above the page names beginning with that letter
Make — makes the list
Copy — copies the list to the user's clipboard for pasting into another document
Save — saves the list as a text document
Clear — removes all pages from the page list
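
The kind of wikitext the Convert options produce can be approximated as follows. This Python sketch makes several assumptions that the manual does not confirm: that each page name is wrapped in [[ ]], that periodic headings are simply numbered, and that alphabetised headings take precedence over periodic ones. The function `make_list` and its parameters are hypothetical.

```python
def make_list(titles, marker="#", heading_every=0, alphabetised=False):
    """Build a wikitext list with optional periodic or A B C... headings."""
    lines, current_letter = [], None
    for index, title in enumerate(titles):
        if alphabetised:
            letter = title[:1].upper()
            if letter != current_letter:      # first page name starting with this letter
                lines.append(f"== {letter} ==")
                current_letter = letter
        elif heading_every and index % heading_every == 0:
            lines.append(f"== {index // heading_every + 1} ==")
        lines.append(f"{marker} [[{title}]]")
    return "\n".join(lines)


print(make_list(["Apple", "Avocado", "Banana"], marker="*", alphabetised=True))
```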

Wikipedia:AutoWikiBrowser/Database Scanner