What do you think of Zhihu's god-tier answers?

Author | Qiang


Zhihu is full of "god-tier" answers that make you slap the table: brilliant at first glance, and even more memorable the more you think about them. This article describes how to crawl Zhihu's god-tier answers and reveals the principle behind them.

What do Zhihu's god-tier answers have in common? Let's look at a few first:

Do you see the pattern? They are short and punchy, and they have lots of upvotes. So to crawl Zhihu's god-tier answers, we only need to collect short answers with many upvotes. That takes two simple steps: first crawl the answers, then filter them. Easy, right?

Crawling Zhihu answers

The first step is to crawl Zhihu answers. Zhihu has far too many answers to crawl them all in one go; it would take a very long time. Instead, we select a few topics and crawl only the content under those topics.
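Crawling a handful of topics page by page can be driven by a small loop. The sketch below assumes a hypothetical `get_page(topic_id, page)` callable that returns a list of answer dicts (empty once the topic runs out of pages); the page limit and delay are illustrative values, not the author's.

```python
import time
from typing import Callable, Iterable, List


def crawl_topics(
    topic_ids: Iterable[str],
    get_page: Callable[[str, int], List[dict]],
    max_pages: int = 50,
    delay: float = 1.0,
) -> List[dict]:
    """Collect answers from a few selected topics, one page at a time.

    `get_page` is a hypothetical fetcher: get_page(topic_id, page) returns the
    answers on that page, or an empty list when there are no more pages.
    """
    answers: List[dict] = []
    for topic_id in topic_ids:
        for page in range(1, max_pages + 1):
            items = get_page(topic_id, page)
            if not items:
                break  # this topic has no more pages
            answers.extend(items)
            time.sleep(delay)  # pause between requests to avoid hammering the server
    return answers
```

Capping `max_pages` keeps each run short, which is the point of crawling a few chosen topics instead of the whole site.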

The following function crawls the content of a specific topic:

The get_answers_by_page function takes two parameters: the first is the topic ID, and the second is the page number to crawl.
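One way `get_answers_by_page` might look, assuming Zhihu's JSON API pages through a topic with `offset`/`limit` query parameters. The endpoint URL, page size, User-Agent header, and the `data` key in the response are all assumptions for illustration; the real API may require other parameters, cookies, or headers. The optional `fetch` argument is a hypothetical addition so the function can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Optional

# Assumed endpoint for a topic's answer feed; not confirmed by the article.
TOPIC_FEED_URL = "https://www.zhihu.com/api/v4/topics/{topic_id}/feeds/essence"
PAGE_SIZE = 10  # assumed number of items per page


def build_page_url(topic_id: str, page: int, page_size: int = PAGE_SIZE) -> str:
    """Translate a 1-based page number into offset/limit query parameters."""
    query = urllib.parse.urlencode({"offset": (page - 1) * page_size, "limit": page_size})
    return TOPIC_FEED_URL.format(topic_id=topic_id) + "?" + query


def get_answers_by_page(
    topic_id: str, page: int, fetch: Optional[Callable[[str], str]] = None
) -> List[dict]:
    """Return the items on one page of a topic feed.

    `fetch` (url -> response body) can be injected for testing; by default the
    page is downloaded with urllib. The "data" key is an assumed response shape.
    """
    url = build_page_url(topic_id, page)
    if fetch is None:
        request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(request, timeout=10) as response:
            body = response.read().decode("utf-8")
    else:
        body = fetch(url)
    return json.loads(body).get("data", [])
```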

The returned content contains several fields we need, highlighted with yellow boxes in the following figure:

The meaning of these fields is as follows:

  • question.title - the title of the question;

  • content - the body of the answer;

  • voteup_count - the number of upvotes.

These fields are used in the next step to filter the answers.
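Before storing each answer, it can help to keep only these three fields. A minimal sketch, assuming each crawled item is an answer object with those keys and that `content` holds HTML; stripping the tags with a simple regex (an assumption, not necessarily the author's approach) means a later length check measures the visible text rather than the markup. The nested `question.title` shape is kept so the stored documents match the field names above.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # matches HTML tags such as <p> or </p>


def extract_answer(item: dict) -> dict:
    """Keep only question.title, content, and voteup_count from one answer."""
    raw = item.get("content") or ""
    text = html.unescape(TAG_RE.sub("", raw)).strip()  # drop tags, then decode entities
    return {
        "question": {"title": (item.get("question") or {}).get("title", "")},
        "content": text,
        "voteup_count": int(item.get("voteup_count") or 0),
    }
```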

Filtering the answers

Once the data is crawled, let's filter the results.

We use MongoDB's aggregation pipeline to filter the answers (for details on the aggregation pipeline, see the quick reference at https://docs.mongodb.com/manual/meta/aggregation-quick-reference/). The code is as follows:

The code above selects all answers with more than 1,000 upvotes and fewer than 50 characters. What remains is a set of short, witty answers.
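A sketch of such a filter with pymongo, assuming answers are stored in a collection named `answers` in a database named `zhihu` (both names hypothetical) with the fields listed earlier. It counts characters with `$strLenCP`, which suits Chinese text where each character is one code point; if `content` still contains HTML, the markup is counted as well.

```python
from typing import List


def build_god_answer_pipeline(min_votes: int = 1000, max_chars: int = 50) -> List[dict]:
    """Aggregation stages that keep highly upvoted, very short answers."""
    return [
        # Compute the answer length in code points; $ifNull guards a missing content field.
        {"$project": {
            "question.title": 1,
            "content": 1,
            "voteup_count": 1,
            "length": {"$strLenCP": {"$ifNull": ["$content", ""]}},
        }},
        # Keep answers with more than `min_votes` upvotes and fewer than `max_chars` characters.
        {"$match": {"voteup_count": {"$gt": min_votes}, "length": {"$lt": max_chars}}},
        # Show the most upvoted answers first.
        {"$sort": {"voteup_count": -1}},
    ]


def find_god_answers(uri: str = "mongodb://localhost:27017",
                     db_name: str = "zhihu", collection: str = "answers") -> List[dict]:
    """Run the pipeline against a (hypothetical) zhihu.answers collection."""
    from pymongo import MongoClient  # imported here so building the pipeline needs no driver

    client = MongoClient(uri)
    try:
        return list(client[db_name][collection].aggregate(build_god_answer_pipeline()))
    finally:
        client.close()
```

Keeping the pipeline in its own function makes the thresholds easy to tweak and inspect without a running database.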

That is the core code; the full code has been uploaded to GitHub: https://github.com/pythonml/answer.

Zhihu's god-tier answers

With the code written, let's run it and see. Programmers' Day was two days ago, so let's filter for god-tier answers related to programmers. Some of the more interesting results:


Q: What "lies" do code monkeys often tell?

A: // TODO


Q: What's it like to keep your GitHub contribution graph green for 365 days?

A: I kept it green for more than 200 days while leaving my girlfriend out in the cold, and I've been green ever since.


Q: I've suddenly had the idea of opening a programming-themed restaurant called Programmer Court, with each room named after a keyword from a programming language. Any advice? Does it have a future?

A: As you walk in, there's a big "Hello World"; the signature dish is "Braised Product Manager", and the place is always packed.


Q: What is recursion?

A: The definition and scope of "political content not suitable for public discussion" is itself "political content not suitable for public discussion".


Q: How should the most basic programming term, "bug", be translated?

A: Moth, as in: "Your program has a moth again."


Q: What is the fun of programming?

A: A person's sense of achievement comes from two things: creation and destruction.


Q: As a programmer, what has math cost you while programming?

A: While reading a paper, I once spent an entire afternoon working out a single "obviously".


Q: What are a programmer's must-have gadgets?

A: A girlfriend.


Q: Which god should I worship to keep bugs out of my code?

A: Worship Yongzheng; he specialized in taking down the Eighth Brother.


Q: In China, is getting into a good university the only way for children from poor families to reach the middle class?

A: Yes, there are only four options: write code, work in finance, work in finance within the coding world, or write code within the finance world.


Q: Why do programmers like to carry a computer bag everywhere even if there is no computer in it?

A: Because they don't have any other pockets.


Q: Why are programmers' girlfriends or wives generally better-looking than the programmers themselves? Or are programmers already considered blue-chip stocks in the dating market?

A: Programmers' girlfriends really are impressive. I'm convinced, because if you ask ten programmers who their girlfriend is, nine will answer Aragaki.


Q: What should be engraved on a couple's wedding rings?

A: 0 errors, 0 warnings.


Q: Do IT engineers mind being called "code farmers"?

A: At least we're still human; product managers and designers are already dogs...


Q: How do I find a girlfriend who likes programmers?

A: Among so many users, you happened to follow me. That must be fate.


Q: How should a programmer celebrate a programmer friend's birthday?

A: Tell him the interface is ready.


Q: What can you say to provoke a programmer?

A: Walk past his computer and say, "Hey, you typed something wrong again!"


Q: One of my teachers said Java is for large software and C# is for small and medium-sized software. Is that true?

A: Java has a knack for turning small and medium-sized software into large software.


Q: Why are programmers paid so much?

A: Their hourly pay isn't high.


Q: Do most programmers complain about low wages?

A: Who ever complains about high wages?


Q: What should a single programmer do when he solves a technical problem but has no girl to show off to?

A: Now you understand why so many programmers write technical blogs.


Q: Do Chinese programmers prefer a "Jack + jacket + sneakers" outfit? If so, why?

A: Does looking good help you write programs?


Q: Why do programmers seem bad at talking?

A: Just think of it as us having low EQ; then you'll be happy, and we'll be happy.


Q: In China, the oldest programmers are only around 40. What else can Chinese programmers do later on?

A: It's the same reason no one born in the 1990s has turned 30 yet.


Q: How should I reply to a programmer's text message that says "Hello World"?

A: hello nerd.


Q: Why shouldn't you ask programmers to fix computers?

A: Does Fan Bingbing have to repair TVs?


Q: A colleague claims his C++ skill is the best in China. How can I make him see that he isn't that good?

A: Honestly, no pretending: my C++ ranks No. 0 in the country.


Q: Why do all the icons shake when you delete apps on an iPhone?

A: The third-party apps are scared, and the built-in apps are just enjoying the show.


Q: iPhone processor performance now doubles every year. Will it soon catch up with or even surpass desktop processors?

A: When I was little, I always thought that in two years I'd be the same age as my brother, who is two years older than me.


Q: What are some anti-human technological inventions or designs?

A: The computer can't connect to the internet; you run the diagnostics, and it asks you to connect to the internet.


Q: Since my thoughts are my own, why can't I sometimes control my negative emotions?

A: The operating system does not allow users to access, modify, or delete core system files, as doing so can damage the system and cause it to malfunction.


Q: What do you make of people who insist on downloading software from the official website?

A: Classmate, have you never been hit by Baidu's bundled software?


Q: Why do many people buy laptops for gaming instead of better-performing desktops?

A: Because I can't afford a house...


Q: Is Chrome really a power hog?

A: It doesn't use much power at all. I've been using Chrome for ages now, and my laptop battery is still at 50%. I think


Q: What's it like to install Windows on a MacBook?

A: It feels like taking off your armor and exposing your soft underbelly.


Q: Why are some people willing to spend thousands of dollars on an iPhone but not tens of dollars on legitimate iPhone apps and games?

A: Because you can't download an iPhone.


Q: Is there an app with an amazing name?

A: Water Meter Wizard... it tracks parcels...


Q: How can I shut down my PC remotely with an iPad?

A: Aim at the PC's power button and throw the iPad at it.


Q: How would you rate Internet Explorer?

A: A browser for downloading other browsers.

----- one year later -----

Anything below IE8 is terrible; front-end developers are on the verge of tears.


Q: What gives you the biggest headache when putting together a complete PPT deck?

A: How to hide your real ability from the executives.


Q: What can Vim do that Emacs can't?

A: Help the poor children in Uganda ...


Q: Why do Apple users choose Apple?

A: Because users who don't use Apple are not Apple users.


Q: Will wired mice be replaced by wireless mice?

A: Not in internet cafés, I think.


Q: What are the classic lies of the computing world?

A: I have read and agree to the terms and conditions.


Q: What are computer science students' catchphrases?

A: It works fine on my computer.


Q: What do you think of Baidu's official blog addressing rumors about Li Yanhong's family affairs?

A: "Chinese people are not that sensitive about privacy and are willing to trade privacy for convenience." - Li Yanhong


Q: How should I chat with Jack Ma on a plane?

A: Hello Jack, my name is Jackson.


Q: What do you think of Baidu quietly dimming the color of its ad labels after the Wei Zexi incident?

A: Please don't attack Baidu. I do front-end development: it's been so long that the site's CSS has simply faded.

Author: Qiang Ge, senior Python programmer who has worked at Morgan Stanley and eBay; specializes in web crawlers, web development, and data analysis.

Disclaimer: This article first appeared on the author's personal WeChat public account, Python and Data Analysis, and was contributed by the author. Copyright belongs to the author.

