Mini Project Phase I

Details

Challenges

  1. Need to include fields as well

Take all the text out, then split

Getting bits

Export the text into a raw txt.

While going line by line, extract

Title

The data in a <title> context
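
A minimal sketch of the title step, assuming the opening and closing <title> tags sit on the same line of the exported text. The names `TITLE_RE` and `extract_title` are made up for illustration.

```python
import re

# Assumption: <title> and </title> appear on one line of the raw text.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def extract_title(line):
    """Return the text inside <title>...</title> on this line, or None."""
    match = TITLE_RE.search(line)
    return match.group(1).strip() if match else None
```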

Body

Categories

Are all on separate lines in the form [[Category:<category>]]
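
The category rule could be sketched with one regular expression per line. `CATEGORY_RE` and `extract_category` are hypothetical names. Dropping an optional `|sort key` suffix is an assumption about the markup, not something these notes specify.

```python
import re

# Matches a whole line of the form [[Category:<category>]].
# Assumption: an optional "|sort key" suffix may follow the name; it is dropped.
CATEGORY_RE = re.compile(r"^\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]\s*$")


def extract_category(line):
    """Return the category name if the line is a category link, else None."""
    match = CATEGORY_RE.match(line.strip())
    return match.group(1).strip() if match else None
```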

Infobox

If a line starts with {{Infobox, enable the context and set the count to 2.

For each following line, add count({) - count(}) to the count.

If it gets to zero, disable the context
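
The brace-counting idea could look roughly like the sketch below. It assumes an infobox opens on a line starting with `{{Infobox`. That marker and the name `extract_infoboxes` are assumptions. Instead of hard-coding the initial count of 2, it counts the braces on the opening line too, which gives the same result when that line holds only the two opening braces.

```python
def extract_infoboxes(lines, start_marker="{{Infobox"):
    """Collect infobox blocks by tracking the balance of { and }."""
    infoboxes = []
    current = []
    depth = 0  # number of currently unclosed braces; 0 means context is off
    for line in lines:
        if depth == 0 and not line.lstrip().startswith(start_marker):
            continue  # outside the infobox context
        # Inside the context (or on the opening line): update the balance.
        depth += line.count("{") - line.count("}")
        current.append(line)
        if depth <= 0:
            # Balance is back to zero: close the context.
            infoboxes.append("\n".join(current))
            current = []
            depth = 0
    # An infobox still open at the end of the input is dropped.
    return infoboxes
```

Nested templates inside the infobox raise and lower the count on the same line, so they do not close the context early.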

Within infobox

Check for citations.

References

Inside <ref>, there is either a `{{cite}}` template or not. If there is no cite, take the text inside as is. Otherwise, split on ‘|’ and, from each part, take what comes after the ‘=’.
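
A rough sketch of the <ref> rule, assuming the citation template takes the form `{{cite ...}}` with `|key=value` fields. `REF_RE` and `extract_reference_text` are illustrative names. Self-closing tags such as `<ref name="x" />` are skipped.

```python
import re

# Matches <ref ...>body</ref>; self-closing <ref ... /> tags do not match.
REF_RE = re.compile(r"<ref[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)


def extract_reference_text(wikitext):
    """Return one text string per <ref>...</ref> found in wikitext."""
    results = []
    for body in REF_RE.findall(wikitext):
        body = body.strip()
        if body.lower().startswith("{{cite"):
            # Strip the outer braces, then split into |-separated fields.
            inner = body[2:-2] if body.endswith("}}") else body[2:]
            fields = inner.split("|")[1:]  # first piece is the template name
            # Keep only what comes after the '=' in each field.
            values = [f.split("=", 1)[1].strip() for f in fields if "=" in f]
            results.append(" ".join(v for v in values if v))
        else:
            # No cite template: take the value inside as is.
            results.append(body)
    return results
```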

Also, `{{cite}}` on its own

Anything under ==External Links==
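
One way to collect that section line by line is sketched below. The stopping rules are assumptions, since the notes only say "anything under": the section ends at the next level-2 heading or at the first category line. `extract_external_links` is a hypothetical name.

```python
import re

EXT_HEADING_RE = re.compile(r"^==\s*external links\s*==\s*$", re.IGNORECASE)
# A level-2 heading such as ==References== (not ===Subsection===).
LEVEL2_HEADING_RE = re.compile(r"^==[^=].*==\s*$")


def extract_external_links(lines):
    """Return the non-empty lines under the ==External Links== heading."""
    collected = []
    in_section = False
    for line in lines:
        stripped = line.strip()
        if EXT_HEADING_RE.match(stripped):
            in_section = True
            continue
        if not in_section:
            continue
        # Assumption: the next level-2 heading or a category line ends the section.
        if LEVEL2_HEADING_RE.match(stripped) or stripped.startswith("[[Category:"):
            break
        if stripped:
            collected.append(stripped)
    return collected
```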

Submission portal