Data Scraping off a Private Slack Channel

It has been a while since I published my first post, ‘TinyML Speech Recognition for Virtual Assistant, Part 1’. Before finishing Part 2, I’d like to freshen up my blog with a mini data science project, ‘Slack Data Scraping’, built from scratch for beginners with no experience required.

On the final day of my data science course, our class was invited out and treated to delicious milkshakes by our lead trainer, Dr Chaitanya Rao, an ex-data scientist and researcher at IBM and Telstra. In the next couple of months he will become a lecturer at the University of Melbourne, one of Australia’s top universities (a big congrats to you again if you are reading this post!). While we waited for our milkshakes, Chaitanya asked for a volunteer to scrape all the data off our private Slack channel, where we often shared and stored our learning materials. Even though I had only done data scraping once or twice, as the room fell into that awkward silence where everyone avoids eye contact while waiting for a hand to go up, I decided to take the task on to sharpen my data scraping skills, and that’s how the project started.

What is Data Scraping?

Data scraping refers to a technique in which a computer program extracts data from the output generated by another program. Connecting to an API and using the Beautiful Soup library are the two most common methods of data scraping.
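To give a flavour of the Beautiful Soup route, here is a minimal sketch that parses a page and pulls out every link; the HTML string here is a made-up stand-in for a page you would normally download:

```python
# A minimal Beautiful Soup sketch: parse a page and collect every link.
# The HTML below is a made-up example standing in for a downloaded page.
from bs4 import BeautifulSoup

html = """
<html><body>
  <a href="https://example.com/post-1">Post 1</a>
  <a href="https://example.com/post-2">Post 2</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
links = [a["href"] for a in soup.find_all("a")]
print(links)
```

In a real scrape you would fetch the HTML first (e.g. with the requests library) and feed the response text into BeautifulSoup the same way.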

Has anyone else scraped data in different ways? I’m curious to hear how it worked for you. Share your experiences in the comments below!

Slack Data Scraping

Slack is a cloud-based instant messaging application, commonly used as a communication platform between co-workers. There are three types of channel: public channels, private channels and direct messages. This post shows how to scrape data off a private Slack channel through the API.

Before going through the steps below, you MUST have a Slack account that has joined at least one Slack channel in order to start scraping data. If you don’t, set that up before reading on.

1. Sign in to your Slack account > go to ‘https://api.slack.com/apps’ > click ‘Create New App’.

2. Fill in your ‘App Name’ and choose the ‘Slack Workspace’ that contains the channel you’d like to scrape data from.

3. Go to ‘OAuth & Permissions’ > under ‘Scopes’, click ‘Add an OAuth Scope’ under ‘User Token Scopes’.

Scroll up and click ‘Install App to Workspace’.

Your ‘OAuth Access Token’ and ‘Bot User OAuth Access Token’ will be generated automatically. Copy and paste them into a txt file, then save it somewhere safe!

  • To find out more about the permissions of each OAuth scope, you can paste the scope’s name into the search bar of the Slack documentation page for more information.

4. Install the Python ‘Slack Client’ package with ‘pip install slackclient’; find out more in the library’s documentation here.

5. Open your favourite Python working environment: Jupyter Notebook/Spyder/Google Colab/etc. In this post, I use Jupyter Notebook as it is the most commonly used platform.

6. Import the necessary libraries and connect ‘Slack Client’ with your ‘OAuth Access Token’ or ‘Bot User OAuth Access Token’.

  • The ‘Bot User OAuth Access Token’ is commonly used for chatbot development. Using that token shows your app name in the Slack workspace, and the app needs permission from the channel’s admins to gain access.
  • The ‘OAuth Access Token’ allows you to scrape data without asking for any permission, as long as you’re already a member of the channel.

# Connect 'Slack Client' with your 'OAuth Access Token'
import os
from slackclient import SlackClient

os.environ['SLACK_API_TOKEN'] = 'OAuth_Access_Token'
slack_token = os.environ["SLACK_API_TOKEN"]
sc = SlackClient(slack_token)

# Fetch the channel's message history
a = sc.api_call("conversations.history", channel="Channel_ID")
a.keys()  # inspect the response fields, e.g. 'ok' and 'messages'
b = a.get('messages')

# Keep only the messages that carry attachments
f = [i for i in b if 'attachments' in i]

Convert the data to a pandas DataFrame, then save it as a .csv file or continue with exploratory data analysis.

import pandas as pd

df = pd.DataFrame(f)
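To sketch the save step end to end, here is the same conversion with a couple of made-up messages standing in for the real list scraped above (the field names mimic Slack’s message format, but the values are invented):

```python
# Sketch of the DataFrame/save step, using made-up messages in place
# of the real list of scraped Slack messages.
import pandas as pd

f = [
    {"type": "message", "user": "U123", "text": "Week 1 slides",
     "attachments": [{"title": "slides.pdf"}], "ts": "1580000000.000100"},
    {"type": "message", "user": "U456", "text": "Practice dataset",
     "attachments": [{"title": "data.csv"}], "ts": "1580000100.000200"},
]

df = pd.DataFrame(f)               # one row per message
df.to_csv("slack_messages.csv", index=False)  # save for later analysis
print(df.shape)
```

With the real scraped list, the shape and columns will of course depend on which fields Slack returns for your channel.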

Mini Demo of Slack Text Visualisation.

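The word counts behind such a visualisation can be sketched in a few lines; the message texts here are made up for the demo:

```python
# A tiny sketch of the word counting behind a text visualisation,
# using made-up message texts in place of the scraped ones.
import re
from collections import Counter

texts = [
    "Sharing the regression notes from today",
    "Great notes, thanks! Sharing mine too",
]

words = []
for t in texts:
    words += re.findall(r"[a-z']+", t.lower())  # simple tokenisation

top = Counter(words).most_common(3)  # the most frequent words
print(top)
```

The resulting counts can be fed straight into a bar chart or word cloud library for the actual plot.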
  • View details of the source code on my Github page here.

Conclusion

In general, data/web scraping is not difficult if we are willing to spend time learning the API documentation or the Beautiful Soup documentation; I have scraped data from Wikipedia, Reddit, Twitter, Amazon e-commerce pages and several stock market websites via APIs or using Beautiful Soup. In addition, mastering regular expressions (regex) is an advantage for text analysis after data extraction.
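As a small taste of regex in post-scrape cleaning, this sketch strips Slack’s user-mention markup (the `<@U...>` pattern) out of a message string; the IDs and text are made up:

```python
# Regex sketch: remove Slack user mentions (<@U...>) from a message,
# a typical cleaning step before text analysis. The string is made up.
import re

raw = "Thanks <@U02ABC123>! The notes from <@U09XYZ456> were super helpful."
clean = re.sub(r"<@U[A-Z0-9]+>", "", raw)   # drop the mention markup
clean = re.sub(r"\s+", " ", clean).strip()  # collapse leftover whitespace
print(clean)
```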

Did I miss something? If you have any extra tips, please share them in the comments below.

Happy Data Analysing!

PYTHON DEVELOPER

My name is James Xuoi. I’m a Machine Learning Engineer, Data Scientist, Software Developer and Data Science Trainer who has been working in AI, data science and software development for almost two years.

As a self-taught developer, I did not take any data structures and algorithms courses at university, and I never had the chance to learn the foundations of computer science. I only started coding and building machine learning projects at the end of 2018, when I was in my third year of university.

I have especially loved maths since I was a kid; I still remember that the reason I chose to enrol in an Electrical & Electronics Engineering course was simply that it was the only course at Deakin University offering ten maths subjects. After a few months of university, I was doing well in all my maths subjects, and I soon caught the attention of Peter Huff, who was my maths lecturer as well as the unit chair of the Science faculty at Deakin University at the time. Peter offered me a part-time maths tutoring position at his private maths and music tutoring company. After becoming a maths tutor, I found that I'm good at simplifying complex problems, and I have helped many students understand their lessons much faster than before. I went on to work at multiple local tutoring companies in Australia, giving maths tutorials to students of all ages both offline and online. As a result, I started my own private online English and maths tutoring company when I was in my second year of university.

I only started to learn programming in earnest after I had completed my final honours project at university, which was also my first machine learning project: Amazon review classification. I have loved the feeling of building things with machine learning ever since, so after graduation I attended a 12-week full-time Data Science & AI coding bootcamp. The bootcamp was all about machine learning and statistics, and that's how I got into data science. I soon became good at machine learning modelling and data analysis, landed a Data Analyst position at KPMG, and then worked at ANZ as a Junior Data Scientist. I was able to build and optimise machine learning models using the scikit-learn library and deep neural networks (e.g. CNNs) using TensorFlow and Keras. However, the work did not fulfil my ambitions, as I didn't know how to deploy my machine learning models into production at the time. I wanted to be able to build an end-to-end AI robotics project or machine learning software, so I started working as a full-time Machine Learning Engineer and part-time Software Developer for multiple software startups.

Besides work, I frequently attend hackathon events to sharpen my programming skills and to learn from talented people who work at top tech companies (e.g. FAANG). I also love the idea of sharing and helping others, so I created this blog to give free tutorials on basic Python programming and on building hands-on Python projects from scratch, including web scraping, data analysis, machine learning, deep learning, NLP, computer vision, and web and desktop app development.

I hope you enjoy the materials!

All the best, James.