There was Freebase, with a community around the project. Sadly, Google took it offline and now only uses it internally, which keeps other AI projects and competitors like IBM (Watson), Apple (Siri), and Microsoft (Cortana) from using it. So IBM bought the Blekko semantic search engine, Apple now has its own web crawler, and Microsoft already bought various SV startups like Powerset to enhance MSN/Live/Bing search.
DBpedia is big as well.
WikiData is several orders of magnitude smaller than Freebase and DBpedia at the moment, but has an active, healthy community.
What's the best way to run an offline copy of WikiData these days? Does it run on MySQL (or Postgres) like all other Wikimedia properties?
I believe it's easier to read natural language than code. I would also counter that there is no harm in comments like this, so just because you don't find them useful doesn't mean someone else won't.
Okay. My point is that in a real job you don't have time to write these kinds of comments. Instead you have your current task to work on, the issue that was re-opened and needs to be revisited, the bug to argue about with QA, the deadline to discuss with the PM, the code review to do ASAP. You simply don't have time to write perfect code that is full of comments in "natural language".
This rather sounds like you don't have the time to not do it.
Imagine spending the time now spent on "re"-visiting "re"-opened bugs that are vague enough to be argued about on writing code that doesn't need those "re"s in the first place.
I concede that might be a difficult place to get to, especially because it's a team effort as well, but I feel it's more productive and less stressful to work like that.
That particular comment seems OK. In general, one should make the code readable to the point of not needing a comment; comments can rot over time in legacy systems. Some things really benefit from a comment. However, I feel the following snippet is a better example of a comment that should not exist, as it does not serve to clarify the code and is just a direct English translation of simple code:
# Iterate through article's sections
for section in self.page.sections:
One of the purposes is to write down what you need to do as comments, and then implement each part, like pseudocode. Saying you don't have time for it is like saying you don't have time to think through what you're trying to do.
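For example, a hypothetical sketch (the function is mine, not from any project) of what that comment-first approach looks like:

# Sketch: the comments were written first, then the code filled them in.
def average_score(scores):
    # 1. Drop entries with no score
    valid = [s for s in scores if s is not None]
    # 2. Nothing valid means no average
    if not valid:
        return None
    # 3. Average what's left
    return sum(valid) / len(valid)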
It's entirely possible to think through what you're trying to do without writing the code comments. I prefer good old-fashioned pen and paper for example.
Of course, but you don't share those papers with us :) I like those kinds of comments a lot. In a good editor it's so easy to just scan a lot of code and understand what's what. You could look at something you or someone else wrote years ago, in a different style, a different paradigm, a different language, ... and still have a perfectly clear picture of what the code does in no time.
The project is cool and I know the code is not necessarily the point. That said, if I were being picky, I'd ask why the author chose not to use docstrings. The code itself is fine but not very Pythonic. There are small inconsistencies* that running pycodestyle [1] once would have caught, and they could be fixed quickly; I recommend OP consider that.
*Mostly related to: whitespace / spacing, indentation, mixing single quotes and double quotes, magic numbers, naming conventions
Super super awesome, what a brilliant idea. You might want to do pattern matching such that the answer to the question doesn't match the text of the question. Your example image shows the immediate flaw there.
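Something as simple as a substring check might already help (a hypothetical sketch; the function name is mine, not from the project):

def answer_leaks(question, answer):
    # Skip a candidate question if the answer already appears in its text
    return answer.lower() in question.lower()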
I used nltk (natural language toolkit), which takes care of most of the hard work. It tokenizes whatever text you pass it, and even assigns each word a part-of-speech (noun, adjective, etc).
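The basic flow is just a couple of calls (a minimal sketch, assuming the punkt and averaged_perceptron_tagger models have been downloaded; the sentence is my own example):

import nltk

tokens = nltk.word_tokenize("Nikola Tesla was born in Smiljan.")
tagged = nltk.pos_tag(tokens)
# tagged -> [('Nikola', 'NNP'), ('Tesla', 'NNP'), ('was', 'VBD'), ...]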
The grammar is where I tinkered the most. You can see I have 3 grammar rules set up (NUMBER, LOCATION, PROPER). nltk will go through the tagged words and see if any sequences of words match any of the rules. If it finds a match, it groups/chunks those words together into a phrase with the tag you've specified (e.g. LOCATION).
As for the rules themselves, they're very easy to write once you understand the syntax. For example, let's look at my PROPER rule, {<NNP|NNPS><NNP|NNPS>+}
Everything in the {} is the rule.
The tags inside of the <> are the parts-of-speech assigned by nltk. Translating the rule literally would be: match any sequence that has: [an NNP or an NNPS] followed by one or more of [an NNP or an NNPS]. In other words, any sequence of two or more NNP or NNPS words.
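Hooking that rule into nltk's chunker looks roughly like this (a minimal sketch; only the PROPER rule is shown, and the sentence is my own example):

import nltk

grammar = r"PROPER: {<NNP|NNPS><NNP|NNPS>+}"
parser = nltk.RegexpParser(grammar)

tagged = nltk.pos_tag(nltk.word_tokenize("Nikola Tesla was born in Smiljan."))
tree = parser.parse(tagged)

# Matching sequences get chunked into subtrees labeled PROPER
for subtree in tree.subtrees(filter=lambda t: t.label() == "PROPER"):
    print(" ".join(word for word, tag in subtree.leaves()))
# prints: Nikola Tesla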
Thanks a lot for the explanation. NNP would be a singular proper noun and NNPS a plural proper noun, I believe? I will play around with the syntax more.
Those interested in this may also be interested in https://en.wikipedia.org/wiki/Incremental_reading which gradually converts reading material into flashcards that are memorized using spaced repetition software.
Ha! For a class in school we had to create a web app that allowed people to create quizzes and challenge others. Among the question types, we implemented a sort of fill-in-the-blank using Wikipedia's random article feature (https://en.m.wikipedia.org/wiki/Special:Random)
UPDATE: I'm truly excited about all of the feedback this project has received. Credit to Volley (http://volley.com) for requesting/inspiring this project!
> In Australian aboriginal mythology, ? is a god of earthly knowledge and physical might, created by Altjira to ensure that people did not get too arrogant or self-conceited.
It appears there is an open issue[1] in the Wikipedia Python library where it does not list the different sections of a Wikipedia page. So right now, this only generates questions from the "Summary" section of any Wikipedia page.
Thanks for sharing this. I've done a similar thing to help study for exams except with pattern matching instead of nltk. I'm looking forward to understanding the natural language part.
It's very buggy though... I get more invalid questions than good ones, haha
I cloned the repository and installed the requirements but after starting 'python python/server.py' I get a 404 if I try to open index.html as described. Anyone else having that problem?
Don't go to localhost:5000/, like you might think. Open file://.../WikiQuiz/index.html as a file in Chrome. It will make ajax requests to localhost:5000
The hosted demo isn't looking past the summary of the wiki article. If you follow the instructions on the README and run it locally it will have a much larger pool of answers/choices.
I really could have used this when I was teaching English at high schools. Thanks for the link, I will definitely be sharing it with mates still in the industry.
Happy to help. The static files should only be accessed by opening index.html in your browser. From another user: "Don't go to localhost:5000/, like you might think. Open file://.../WikiQuiz/index.html as a file in Chrome. It will make ajax requests to localhost:5000"
Sorry for that, I wasn't releasing this as a product, and certainly didn't expect it to get this much attention! Maybe a v2 of this will have its own server :)
Hey.
Can't get it working. Still get a 500 error on every request.
Followed all the instructions in the README.
Checked for the presence of averaged_perceptron_tagger and punkt, too.
https://en.wikipedia.org/wiki/DBpedia