Show HN: Generate a quiz from a Wikipedia page (github.com/alexgreene)
388 points by alex_g on Feb 19, 2017 | 85 comments



Have you thought about using dbpedia?

https://en.wikipedia.org/wiki/DBpedia


Has someone tried to host a DBpedia copy offline?

The supported Virtuoso database is quite esoteric ( https://en.wikipedia.org/wiki/Virtuoso_Universal_Server ).

Has one succeeded writing a script to import the data to Postgres, MySQL or Lucene?


I've hosted it in Jena, and I think most graph databases include importers.

You need a RDF/Graph database (unless you are up for a lot of re-engineering)


I hadn't but I will now. Thanks for sharing!


How does DBpedia compare with WikiData?


There was Freebase with a community around the project. Sadly Google took it offline and now only uses it internally, preventing other AI projects and competitors like IBM (Watson), Apple (Siri), and Microsoft (Cortana) from using it. So IBM bought the Blekko semantic search engine, Apple now has its own web crawler, and Microsoft had already bought various SV startups like Powerset to enhance MSN/Live/Bing search.

DBpedia is big as well.

WikiData is several orders of magnitude smaller than Freebase and DBpedia at the moment, but has an active, healthy community.

What's the best way to run an offline copy of WikiData these days? Does it run on MySQL (or Postgres) like all other Wikimedia properties?


> There was Freebase with a community around the project. Sadly Google took it offline

This isn't what happened. Freebase is still available to download and is slowly (with Google support) being migrated to WikiData[1].

There are some other pretty large Knowledge Graphs around. ConceptNet and Probase/MS Concept Graph are two that are worth looking at.

[1] https://static.googleusercontent.com/media/research.google.c...


I'd assume so, because it acts like a highly specialized MediaWiki.


Built this last week as part of the interview process for a job. I know it's flawed, but in my opinion neat nonetheless!


Excellent idea. There are a lot of interesting ways to improve this, but you have an MVP running, which is a good start.

Regarding your codebase: clear and to-the-point code, well commented, and helpful commit messages. Including a `requirements.txt` is a plus.

Good job, keep it up!


Well, I believe that any code should be readable enough that comments like the one below wouldn't be required.

        # splits a Wikipedia section into sentences

        # and then chunks/tokenizes each sentence

If I had interviewed the author, I would have asked him what the purpose of commenting like that is.


I believe it's easier to read a native language than code. I would also counter that there is no harm in comments like this, so just because you don't find it useful doesn't mean someone else won't.


Okay. My point is that in a real job you don't have time for writing this type of comment. Instead you have your current task to work on, the issue that was re-opened and needs to be revisited, the bug to argue with QA about, the deadline to discuss with the PM, the code review to do ASAP. You simply don't have time to write perfect code that is full of comments in the "native language".


This rather sounds like you don't have the time to not do it.

Imagine spending the time you spent on "re"-visiting "re"-opened bugs that are vague enough to be argued about on writing code that doesn't need these "re"s in the first place.

I concede that it might be a difficult place to get to, especially because it's a team effort as well, but I feel it's more productive and less stressful to work like that.


I am too long in IT to imagine anything like that.


That particular comment seems OK. In general, one should make the code readable to the point of not needing a comment; comments can rot over time in legacy systems. Some things really benefit from a comment. However, I feel the following snippet is a better example of a comment that should not exist, as it does not serve to clarify the code and is just a direct English translation of the simple code:

        # Iterate through article's sections

        for section in self.page.sections:


Yeah that's clearly redundant


One of the purposes is to write down what you need to do as comments, and then implement each part, like pseudo code. Saying you don't have time for it is like saying you don't have time to think through what you're trying to do.
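For example, a sketch of that comment-first workflow (the function and steps here are hypothetical, not taken from the project):

```python
# Comment-first workflow: write the steps as comments, then fill
# each one in. Function name and steps are made up for illustration.
def fill_in_blank(sentence, answer):
    # 1. locate the answer phrase in the sentence
    # 2. replace it with a blank to form the question
    question = sentence.replace(answer, '____')
    # 3. return the question/answer pair
    return question, answer

print(fill_in_blank("Paris is the capital of France.", "Paris"))
# ('____ is the capital of France.', 'Paris')
```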


It's entirely possible to think through what you're trying to do without writing the code comments. I prefer good old-fashioned pen and paper for example.


Of course, but you don't share those papers with us :) I like those kinds of comments a lot. In a good editor it's so easy to just scan a lot of code and understand what's what. You could look at something you or someone else wrote years ago, in a different style, a different paradigm, a different language, and still have a perfectly clear picture of what the code does in no time.

To each his own I guess.


The project is cool and I know the code is not necessarily the point. That said, if I were being picky, I'd ask why the author chose not to use docstrings. The code itself is fine but not very Pythonic. There are small inconsistencies* that running pycodestyle [1] once would have caught and could be fixed quickly — I recommend OP consider that.

*Mostly related to: whitespace / spacing, indentation, mixing single quotes and double quotes, magic numbers, naming conventions

[1]: https://pypi.python.org/pypi/pycodestyle


Super super awesome, what a brilliant idea. You might want to do pattern matching such that the answer to the question doesn't match the text of the question. Your example image shows the immediate flaw there.


Thanks, and good point. It's a very simple approach, so there are numerous weaknesses that will be improved upon with a bit more knowledge of NLP.


Did you get the job?


As far as I know they're still reviewing it.


Good luck! If it doesn't work out for any reason let me know if you're interested in Poll Everywhere.


They should have given you an offer... you clearly delivered.

I bet you're going to get some offers from making this post on HN. Make sure you have your contact info in your profile. :)


Good luck!


Neat and simple implementation. Consider docstrings for describing methods; these tend to integrate with IDEs a lot better than comments.


It's a nice idea

One potential improvement is to remove the common parts of the answer and the question (as in your Triumph example).
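A minimal sketch of that check (the helper name is hypothetical, not from the repo): reject a candidate answer that already appears in the question text.

```python
# Hypothetical helper: reject a candidate answer if it already
# appears in the question text (case-insensitive substring check).
def is_valid_answer(question, answer):
    return answer.lower() not in question.lower()

print(is_valid_answer("The Triumph Bonneville is made by ____.", "Triumph"))  # False
print(is_valid_answer("The ____ Bonneville is a motorcycle.", "Triumph"))    # True
```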


Very cool. Can you please add more info or talk about how the grammar/parsing is set up?


Sure! Take a look at: https://github.com/alexgreene/WikiQuiz/blob/master/python/Ar...

I used nltk (Natural Language Toolkit), which takes care of most of the hard work. It tokenizes whatever text you pass it, and even assigns each word a part of speech (noun, adjective, etc.).

The grammar is where I tinkered the most. You can see I have 3 grammar rules set up (NUMBER, LOCATION, PROPER). nltk will go through the tokenized words and see if any sequences of words match any of the rules. If it finds a match, it groups/chunks those words together into a phrase with the tag you've specified (i.e. LOCATION).

As for the rules themselves, they're very easy to write once you understand the syntax. For example, let's look at my PROPER rule, {<NNP|NNPS><NNP|NNPS>+}

Everything in the {} is the rule. The tags inside of the <> are the parts-of-speech assigned by nltk. Translating the rule literally would be: match any sequence that has: [an NNP or an NNPS] followed by one or more of [an NNP or an NNPS]. In other words, any sequence of two or more NNP or NNPS words.
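The rule's behaviour can be sketched in plain Python, nltk machinery aside (the sentence below is made up, pre-tagged the way nltk.pos_tag would tag it with Penn Treebank tags):

```python
# Plain-Python sketch of the PROPER rule {<NNP|NNPS><NNP|NNPS>+}:
# group runs of two or more consecutive NNP/NNPS tokens into phrases.
def chunk_proper(tagged):
    phrases, run = [], []
    for word, tag in tagged + [(None, None)]:  # sentinel flushes the last run
        if tag in ('NNP', 'NNPS'):
            run.append(word)
        else:
            if len(run) >= 2:
                phrases.append(' '.join(run))
            run = []
    return phrases

# Made-up sentence, pre-tagged as nltk.pos_tag would tag it
tagged = [('Alan', 'NNP'), ('Turing', 'NNP'), ('worked', 'VBD'),
          ('at', 'IN'), ('Bletchley', 'NNP'), ('Park', 'NNP'), ('.', '.')]
print(chunk_proper(tagged))  # ['Alan Turing', 'Bletchley Park']
```

A single NNP on its own (e.g. just 'Paris') is deliberately not chunked, because the rule requires at least two consecutive proper-noun tokens.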


Thanks a lot for the explanation. NNP would be Noun-Noun-Phrase and NNPS would be Noun-Noun-Phrase-Sentence I believe? I will play around with the syntax more.


Good luck mate, this is a really cool little project.


Awesome work! How long did this take you?


Those interested in this may also be interested in https://en.wikipedia.org/wiki/Incremental_reading which gradually converts reading material into flashcards that are memorized using spaced repetition software.


This is so cool!

Let me know if you plan on continuing it. I'd love to collaborate.


Thanks! let's chat: @alexg473 (twitter) or alexgrn7 (gmail)


For the lazy, hosted: https://wiki-quiz.herokuapp.com/


Thank you kind fellow!


No worries, cool project. Here are the changes I had to make to make it hostable, if you're interested: https://github.com/lutherism/WikiQuiz/commits/master


Ha! For a class in school we had to create a web app that allowed people to create quizzes and challenge others. Among the question types, we implemented a sort of fill-in-the-blank using Wikipedia's random article feature (https://en.m.wikipedia.org/wiki/Special:Random)


Can you tell us more about that web app? Are the quizzes generated automatically, and if so, how did you extract the information?


UPDATE: I'm truly excited about all of the feedback this project has received. Credit to Volley (http://volley.com) for requesting/inspiring this project!


> In Australian aboriginal mythology, ? is a god of earthly knowledge and physical might, created by Altjira to ensure that people did not get too arrogant or self-conceited.

[Jar'Edo Wens]

> Correct!


It appears that there is an open issue[1] in the Wikipedia python library where it does not list the different sections in a wikipedia page. So right now, this would only generate questions from the "Summary" section of any wikipedia page.

[1] https://github.com/goldsmith/Wikipedia/issues/119


Would be awesome if this could be used to generate a Kahoot quiz: https://kahoot.it/


Thanks for sharing this. I've done a similar thing to help study for exams except with pattern matching instead of nltk. I'm looking forward to understanding the natural language part.

It's very buggy though... I get more invalid questions than good ones, haha


I cloned the repository and installed the requirements, but after starting 'python python/server.py' I get a 404 when I try to open index.html as described. Anyone else having that problem?


Don't go to localhost:5000/, like you might think. Open file://.../WikiQuiz/index.html as a file in Chrome. It will make ajax requests to localhost:5000


That's the idea behind http://github.com/divbit/grimoire as well, except more for private notes.


"grok" mode does a quiz on all notes of a certain topic


Sorry, the readme is a bit out of date.


Nice, but it still has some room for improvement:

http://i.imgur.com/EVToWfI.png


The hosted demo isn't looking past the summary of the wiki article. If you follow the instructions on the README and run it locally it will have a much larger pool of answers/choices.


I really could have used this when I was teaching English at high schools. Thanks for the link, I will defiantly be sharing with mates still in the industry.


[flagged]


Give them some benefit of the doubt. Not saying it's the case, but phones and auto-correct can do the most amazing things.


Yeah. The worst thing is, I miss that autocorrect so often -.-


Yeah, autocorrect can be so helpful…until it's not!


A novel idea, but the example screenshot looks really trivial. I wouldn't have known it was Triumph on my own, but it appearing in the question narrows it down.


Nice! I am guessing you don't want to hardcode the domain/hostname in script.js (line 40)?


Right, it's just localhost right now, so I don't think it matters.


Btw I am unable to get it working locally. I only see 404s. Do I need anything special to serve the static files?


Happy to help. The static files should only be accessed by opening index.html in your browser. From another user: "Don't go to localhost:5000/, like you might think. Open file://.../WikiQuiz/index.html as a file in Chrome. It will make ajax requests to localhost:5000"


Ah I see, thanks for the help. Now I see 500s even though I have installed both the nltk packages. Hmm.


The exception appears to be:

        <class 'TypeError'>, TypeError("a bytes-like object is required, not 'str'",)
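That message is the classic Python 3 bytes/str split; code written for Python 2 often triggers it. A minimal reproduction and fix (zlib is chosen just for illustration; it may not be the exact call failing in the app):

```python
import zlib

data = "hello"
try:
    zlib.compress(data)  # Python 3 requires bytes here, not str
except TypeError as e:
    print(e)  # a bytes-like object is required, not 'str'

# Fix: encode the str to bytes before passing it along
compressed = zlib.compress(data.encode('utf-8'))
print(zlib.decompress(compressed).decode('utf-8'))  # hello
```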


Try reverting the changes from: https://github.com/alexgreene/WikiQuiz/commit/9696fe29b413a6...

Please report back if that worked or not with an Issue on the repo, so I can follow up with a fix. Thanks!


Seems like a good idea, it would be great if there was a demo available.


Sorry for that, I wasn't releasing this as a product, and certainly didn't expect it to get this much attention! Maybe a v2 of this will have its own server :)


I had the same thought. The Flask app could be a pretty trivial deploy on the Heroku free tier.


That's interesting!


Really cool idea!

Unfortunately, I'm getting a 500 error on every request.

What did I do wrong?


I got this problem too.

I solved it by downloading 'averaged_perceptron_tagger' from nltk.

        >>> import nltk

        >>> nltk.download('averaged_perceptron_tagger')


[deleted]


I tried both 2.7.10 and 3.6.0.


I've just updated the instructions on the README, let me know if you still can't get it working after that.


Hey. Can't get it working. Still get a 500 error on every request. Followed all instructions on the README. Checked for the presence of averaged_perceptron_tagger and punkt, too.


I have the same issue with both 2.7.10 and 3.5.1 after following the steps in your readme.


Managed it by doing nltk.download('punkt')


Fixed it for me too, thanks. How did you debug the issue?


It needs a better example.


Can you make it a multiplayer game played on phones, tablets, watches, TVs, and cars? Okay, maybe not cars.


You never know, people are going to have a lot of free time in cars pretty soon.


People already play along with radio quizzes while driving, so make it voice controlled and it's no different than that.


It's certainly possible.


That picture is not that great an example; the answer is right in the question.



