Python Tutorial: From Python Crawler to Data Analysis


Data analysis is one of the most widely used applications of Python programming. Thanks to Python's simple, clear syntax and broad ecosystem, data analysts can combine libraries and techniques such as web crawling and data integration to work more capably and efficiently.

In this tutorial, students will build on the Python crawling skills they learned earlier and go one step further: storing the crawled data in a CSV file.

Learn Python crawlers and start crawling web page information through Python:

https://zhuanlan.zhihu.com/p/...

> What is a CSV file?
>
> CSV, short for Comma-Separated Values, is a file format commonly used to store tabular data. The format is very common in machine learning and can also be opened by everyday spreadsheet software such as Excel. In Python, the csv and pandas libraries are all you need to store data in CSV format for later analysis.
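As a quick illustration (not part of the tutorial's main code), a CSV file produced this way can be loaded straight back into a table with pandas; the file name is the one this tutorial generates below:

# a minimal sketch: load a CSV file into a pandas DataFrame for analysis
import pandas as pd

df = pd.read_csv('pythonjobs.csv')  # the file generated later in this tutorial
print(df.head())                    # preview the first few rows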

This tutorial uses Lightly for the hands-on walkthrough. Students only need to copy the project to their personal account, then open the WebAnalyser.py file in the project and code online in the browser: https://538cd3972a-share.ligh...

How do I open and edit projects others have shared with Lightly?

Install dependencies

The dependencies required for this tutorial are requests, bs4, csv and pandas. Readers following along in Lightly can simply import them on the project page, then hover the mouse over the corresponding dependency name to install any missing dependency with one click.

import requests

from bs4 import BeautifulSoup

import csv

import pandas as pd

Find what you need with BeautifulSoup

After installing the dependencies, we can fetch a page's HTML with requests and then use BeautifulSoup to pick out the content we need.
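The crawling details are covered in the linked crawler tutorial; as a reminder, a minimal sketch of this step is shown below. The URL and the tag/class names are placeholders assumed for illustration and must be adapted to the site you actually crawl (this continues from the imports above):

# fetch and parse a page (the URL and selectors below are placeholder assumptions)
url = 'https://example.com/python-jobs'         # hypothetical page to crawl
page = requests.get(url)                        # download the HTML
soup = BeautifulSoup(page.text, 'html.parser')  # parse it with BeautifulSoup

# collect the elements we are interested in (the class name is an assumption)
jobs = soup.find_all('div', class_='job-listing')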

Open CSV file

The csv and pandas code that opens the CSV file is as follows:

# create csv file
csvfile = open('pythonjobs.csv', 'w+')

# Define the data frame using pandas
df = pd.DataFrame(columns=['Title', 'Company', 'Location', 'Link'])

Variable names such as csvfile and df can be renamed freely, and the file name pythonjobs.csv and the column titles passed to columns can likewise be changed to match the data you are storing.

Write to CSV file

With the CSV file opened, you can prepare to write the data into it using the following code:

# enable csv writer
writer = csv.writer(csvfile)

Then combine what was learned in the Python crawler tutorial to write the scraped content into the file, as in the sketch below:
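The exact writing logic depends on the site being scraped; a hedged sketch, assuming the jobs list from the BeautifulSoup step above and placeholder tag/class names for each field, might look like this:

# write each scraped job into the CSV writer and the DataFrame
# (the tags and class names used to extract each field are assumptions)
writer.writerow(['Title', 'Company', 'Location', 'Link'])        # header row

for job in jobs:
    title = job.find('h2').get_text(strip=True)
    company = job.find('span', class_='company').get_text(strip=True)
    location = job.find('span', class_='location').get_text(strip=True)
    link = job.find('a')['href']

    writer.writerow([title, company, location, link])            # csv module route
    df.loc[len(df)] = [title, company, location, link]           # pandas route

In practice only one of the two routes (the csv writer or the pandas DataFrame) is needed; this tutorial demonstrates both.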

Close the CSV file

After confirming that everything has been written, close the CSV file in code, and the whole crawler-plus-CSV-export Python program is complete:

df.to_csv('pythonjobs.csv') # output as csv file

csvfile.close() # close the csv file

After clicking Run in the upper right corner of the Lightly IDE, you will find the generated pythonjobs.csv file in the project panel on the left. You can also right-click the file to download it and view the result in Excel.

Python data analysis project code: https://538cd3972a-share.ligh...

That concludes this Python crawler and data analysis tutorial. Feel free to leave a comment with your questions and the topics you would like covered next, and check out Lightly's previous Python articles:

Lightly: a new generation of Python IDE

Learn Python crawlers and crawl web pages through Python

