
Curiosity Bits Tutorial: Mining Twitter User Profile on Python V2

This tutorial shows how to use Python code to collect profile information for a list of Twitter users.

Published in: Education

  1. Mining Twitter User Profile on Python. Created by The Curiosity Bits Blog (curiositybits.com). Download the Python code used in the tutorial. Code provided by Dr. Gregory D. Saxton.
  2. Prerequisites: Setting up API keys: pg. 4-6. Installing necessary Python libraries: pg. 7-8. Creating a list of Twitter screen names: pg. 9. Setting up a SQLite database to store Twitter data: pg. 10-14. But if you are a Python newbie, let's start with the very basics.
  3. We assume you are a Python newbie, so let's start with the very basics. • Choosing the right Python platform: Python is a programming language, but you can use different software packages to write, edit, and run Python code. We chose Anaconda, which is free to download; the Python version is 2.7. • Once you install Anaconda, you can experiment with Python code in Spyder.
  4. Setting up API keys • We need keys to get Twitter data through the Twitter API (https://dev.twitter.com/). You need: API key, API secret, access token, access token secret. • First, go to https://dev.twitter.com/ and sign in to your Twitter account. Go to the My applications page to create an application.
  5. Setting up API keys: Enter any name that makes sense to you. Enter any text that makes sense to you. You can enter any legitimate URL; here, I put in the URL of my institution. Same as above: you can enter any legitimate URL; here, I put in the URL of my institution.
  6. Setting up API keys • After creating the app, go to the API Keys page, scroll down to the bottom, and click Create my access token. Wait a few minutes and refresh the page, and you will have all your keys. You need: API key, API secret, access token, access token secret.
  7. Installing necessary Python libraries: Think of Python libraries as the apps running on your operating system. To use our code, you need the following libraries:
     • simplejson (https://pypi.python.org/pypi/simplejson)
     • sqlite3 (http://sqlite.org/)
     • SQLAlchemy (http://www.sqlalchemy.org/)
     • Twython (https://twython.readthedocs.org/en/latest/index.html)
  8. Installing necessary Python libraries: To install the libraries, go to the Start menu, type CMD, and run the CMD file as administrator. At the command prompt, type pip install followed by the name of the Python library. For example, to install Twython, type pip install twython and press Enter. Use this procedure to install all the necessary libraries.
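
Once the installs finish, a quick way to confirm that everything is in place is to try importing each library. The snippet below is a minimal sketch, not part of the tutorial's code; the import names in REQUIRED_MODULES are assumed from the library list above, and find_missing is a hypothetical helper. Note that sqlite3 ships with Python itself, so it normally needs no pip install.

```python
import importlib

# Import names assumed from the library list in this tutorial.
REQUIRED_MODULES = ["simplejson", "sqlite3", "sqlalchemy", "twython"]


def find_missing(modules):
    """Return the names of the modules that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Still to install: " + ", ".join(missing))
    else:
        print("All required libraries can be imported.")
```

If a name is reported as missing, run pip install for that library and try again.
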
  9. Creating a list of Twitter screen names • Our Python code gathers profile information for multiple Twitter users, so first let's create a list of users. The list should be in .csv format and contain three columns (in accordance with the configuration in our Python code). Specifically, it looks like this: the first column lists sequential numbers; the second column lists the Twitter screen names you are interested in; for the third column, I entered 1 throughout, but you can leave it blank.
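
If you would rather generate the list with code than type it by hand, the three-column layout described above can be sketched as follows. This is an illustration written for Python 3, not part of the tutorial's code: write_accounts_csv and the screen names in the usage note are hypothetical, and starting the numbering at 1 is an assumption.

```python
import csv


def write_accounts_csv(path, screen_names, user_type="1"):
    """Write one row per screen name: sequential number, screen name, user type."""
    rows = [(i, name, user_type) for i, name in enumerate(screen_names, start=1)]
    # newline="" stops the csv module from adding blank lines on Windows.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

For example, write_accounts_csv("accounts.csv", ["example_user_a", "example_user_b"]) produces a two-row file with 1 in the third column of each row.
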
  10. Setting up a SQLite Database to store Twitter data: You need storage for the incoming data from the Twitter API. That is what databases are for. We use SQLite, a lightweight relational database engine that Python accesses through the sqlite3 library. SQL is the standard query language used by relational database management systems (RDBMS). The sqlite3 library from the previous step is already available, because it ships with Python. On top of that, you can download a database browser to view and edit the database just like an Excel file. Go to http://sqlitebrowser.sourceforge.net/ and download SQLite Database Browser.
  11. Setting up a SQLite Database to store Twitter data: Once you have the files downloaded, run the following file.
  12. Setting up a SQLite Database to store Twitter data: Now we need to import the Twitter users list into a SQLite database. To do that, create a new database. Remember the database file name, because we need to write it into the Python code. The default file extension for SQLite is .sqlite; to prevent future complications, add the .sqlite extension when you save a file in SQLite Database Browser.
  13. Setting up a SQLite Database to store Twitter data: Stay on the database file you just created. Go to File-Import-Table From CSV File and import the .csv file you saved. Name the imported table accounts. This table name corresponds to the one we will use in the Python code. After you click Create, the CSV list will be loaded into the database, and you can view it under Browse Data. Lastly, remember to save the database.
  14. Setting up a SQLite Database to store Twitter data: Now we need to modify the imported table. Go to Edit-Modify Tables, then use Edit field to change the column names. To correspond to our Python code, name the first column rowid, with Field type Integer; the second column screen_name, with Field type String; and the third column user_type, with Field type String. In the end, the database table is defined as shown in the screenshot.
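
The same table can also be created without the database browser. The sketch below uses Python's built-in sqlite3 module and assumes the first column is meant to be named rowid (alongside screen_name and user_type in a table called accounts). The load_accounts helper is hypothetical, and TEXT is used here as the SQLite counterpart of the browser's String type.

```python
import sqlite3


def load_accounts(db_path, rows):
    """Create the accounts table if needed, insert the rows, and return the row count."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts ("
            "rowid INTEGER PRIMARY KEY, screen_name TEXT, user_type TEXT)"
        )
        conn.executemany(
            "INSERT INTO accounts (rowid, screen_name, user_type) VALUES (?, ?, ?)",
            rows,
        )
    count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
    conn.close()
    return count
```

Calling load_accounts("mydata.sqlite", [(1, "example_user_a", "1")]) would create the file mydata.sqlite (a made-up name) in the current folder if it does not already exist.
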
  15. Now, moving on to the actual Python code… Download the Python code and open it in Anaconda.
  16. There are only a few places you need to change, but let's walk through the code first… The first block of code imports the necessary Python libraries. Make sure you have installed all of them.
  17. The second block is where you enter the keys we obtained at the beginning. Just copy and paste each key inside the quotation marks: API key, API secret, access token, access token secret.
  18. The third block is where we define the columns in the SQLite database. For now, we do not need to edit anything here.
  19. The fourth block is where we ask the Python code to get Twitter user profile information based on the list of users already saved in the SQLite database. Here, you will see that the table name and the column names correspond to the ones we previously saved in SQLite.
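
To make the idea concrete, here is a minimal sketch of reading the saved screen names back out of the accounts table. It uses the built-in sqlite3 module rather than SQLAlchemy, which the tutorial's own code relies on, so treat it as an illustration of the lookup only; fetch_screen_names is a hypothetical helper.

```python
import sqlite3


def fetch_screen_names(conn):
    """Return the screen names stored in the accounts table, in row order."""
    cursor = conn.execute("SELECT screen_name FROM accounts ORDER BY rowid")
    return [row[0] for row in cursor]
```

With conn = sqlite3.connect("mydata.sqlite") (a made-up file name), fetch_screen_names(conn) returns a plain Python list that the rest of a script can loop over.
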
  20. The fifth block is where we make the specific request through the Twitter API to get data. Here, we ask Python to get one recent status from each listed user. This procedure also returns the user's profile information. We will discuss what profile information is available later on.
  21. The raw output from the Twitter API is in JSON format. JSON is a standardized way of storing information. Now we need to map the information in JSON format to the columns of the database table. Notice that each column in the database represents a Twitter output variable. For example, a Twitter user's profile description is stored as description under user in JSON. This line of code maps the profile description in JSON to the database column named from_user_description.
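
The mapping step can be sketched in a few lines. The example below is not the tutorial's code: it assumes each column is named by adding the from_user_ prefix to a field of the user object, flatten_user is a hypothetical helper, and the sample JSON is made up for illustration.

```python
import json

# Fields of the "user" object to copy into database columns (assumed selection).
PROFILE_FIELDS = [
    "screen_name", "description", "location", "followers_count",
    "friends_count", "listed_count", "favourites_count",
    "statuses_count", "created_at",
]


def flatten_user(status):
    """Map the nested user object of one status to from_user_* column names."""
    user = status.get("user", {})
    # Fields missing from the JSON become None, which SQLite stores as NULL.
    return {"from_user_" + field: user.get(field) for field in PROFILE_FIELDS}


# Made-up sample standing in for one status returned by the API.
SAMPLE = '{"text": "hello", "user": {"screen_name": "example_user", "description": "A made-up bio", "followers_count": 10}}'

row = flatten_user(json.loads(SAMPLE))
print(row["from_user_description"])
```

Each key of the resulting dictionary lines up with one database column, so the dictionary can be handed to whatever code writes the row.
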
  22. You need to change the file path and file name here (RECOMMENDED). If the Python file and your SQLite database are in the same folder, just paste your database file name here.
  23. Now you are ready to run the code. Go to Run and choose Execute in a new dedicated Python interpreter. The first option, Execute in current Python or IPython interpreter, does not work on my end, but it may work on your computer.
  24. Now, look at the right-side pane in Anaconda. Oops, looks like I am getting error messages! Don't panic! It's likely you will hit roadblocks when you run Python code, so it is important to learn to debug. This error is probably because I saved the Python file in a folder that is not a default Python folder. But what is a default Python folder?
  25. The simple way to find your default Python folders: • On a Windows machine, in the Start menu, right-click Computer and choose Properties.
  26. Folders listed here are your default Python folders.
  27. In my case, C:\Anaconda\Lib\site-packages is my default Python folder. So I moved the Python code there, edited the file path in the code, and ran it. There you go: the code is running and getting what we want! If you check the database file, you will see that a new table named typhoon has been created (you can change the table name in the Python code); it includes the listed users' recent tweets and profile information.
  28. Oops! Error again! The Twitter API has a rate limit. With the version of the Twitter API used in our Python code, you can get roughly 300 users per 15 minutes. Once you hit the limit, you will see the error message shown in the screenshot. There are two ways to deal with the restriction: 1. wait 15 minutes before another run; 2. create multiple Twitter apps and get multiple sets of keys. Once you use up the quota in one run, paste in a new set of keys to start a new run!
  29. If you put 0 here, the code starts with the user listed in the first row. Because we will hit the rate limit, you will need to run the code multiple times to finish crawling all users on the list. Make sure to change the starting row number! For example, if the first run gets user (0) through user (150) and then hits the rate limit, put 151 in the second run to start with the next user on the list.
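
The bookkeeping for the starting row can be sketched as below. This is an illustration of the zero-based arithmetic only and makes no API calls; next_batch is a hypothetical helper, and the batch size of 151 is chosen just to mirror the example of user (0) through user (150).

```python
def next_batch(screen_names, start_row, batch_size):
    """Return the users for this run and the starting row for the next run."""
    batch = screen_names[start_row:start_row + batch_size]
    next_start = start_row + len(batch)
    return batch, next_start


# Made-up list of 400 users to show how the starting row advances.
users = ["user%d" % i for i in range(400)]

first_run, start = next_batch(users, 0, 151)      # covers user0 through user150
second_run, start = next_batch(users, start, 151)  # begins at row 151
print(second_run[0], start)
```

Writing down the returned starting row after each run makes it easy to resume once the 15-minute window has passed.
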
  30. A list of Twitter output variables: Go to SQLite Database Browser and select the table typhoon (again, this is the name we gave in the Python code). You will see the output variables across the columns.
  31. A list of Twitter output variables: Some key variables related to the user profile:
     • from_user_screen_name: the user's Twitter screen name
     • from_user_followers_count: how many people are following the user
     • from_user_friends_count: how many people the user is following
     • from_user_listed_count: how many public lists the user appears on
     • from_user_favourites_count: how many tweets the user has favorited (liked)
     • from_user_statuses_count: how many tweets the user has sent
     • from_user_description: the user's profile bio
     • from_user_location: the user's location
     • from_user_created_at: when the account was created
  32. A list of Twitter output variables: Go to File-Export-Table as CSV to export the data in .csv format. Make sure to add the .csv file extension.
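
The export can also be done from Python. The following is a minimal sketch written for Python 3 with the built-in sqlite3 and csv modules; export_table is a hypothetical helper, and the file names in the usage note are made up.

```python
import csv
import sqlite3


def export_table(conn, table, csv_path):
    """Write every row of a table to a CSV file, with column names as the header."""
    # Table names cannot be passed as SQL parameters, so only use trusted names here.
    cursor = conn.execute('SELECT * FROM "{}"'.format(table))
    headers = [column[0] for column in cursor.description]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cursor)
    return headers
```

For instance, export_table(sqlite3.connect("mydata.sqlite"), "typhoon", "typhoon.csv") would write the typhoon table to typhoon.csv in the current folder.
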
  33. Please send your questions and comments to weiaixu [at] buffalo dot edu