Scraping Social Media Comments
Scraping Facebook and Instagram comments is fairly common, especially for social media companies. Since Facebook acquired Instagram, you can query comments, insights, etc. for both platforms from the Graph API, given the right endpoints.
Facebook Comments
There are four easy steps:
1. Get an access token
You have to be an admin or editor of the page that owns the post, and generate a token with the correct permissions.
Get your access token in the Graph API Explorer, then extend its validity in the Access Token Debugger to up to 60 days (about 2 months).
2. Find the Facebook post ID
Get the correct URL by clicking on the timestamp of the post (e.g. 3 hrs ago, or March 7; usually found underneath the author's name).
Wrong URL
Correct
The post ID is then the last digits in the URL: 3417379501610347
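As a rough sketch of that step, the trailing digits can be pulled out of the URL with a small helper. The name extractPostId and the example URL shape are my own illustration (only the ID itself comes from the post above); it simply assumes the post ID is the last run of digits in the URL path.

```javascript
// Hypothetical helper: returns the last run of digits in the URL path,
// or null when the path contains no digits at all.
function extractPostId(postUrl) {
  const { pathname } = new URL(postUrl);
  const digitRuns = pathname.match(/\d+/g);
  return digitRuns ? digitRuns[digitRuns.length - 1] : null;
}

// The URL shape here is an assumption made for the example.
console.log(extractPostId("https://www.facebook.com/examplepage/posts/3417379501610347"));
// prints 3417379501610347
```

Only the path is inspected, so digits in query parameters are ignored.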
3. Get comments endpoint
Endpoint:
https://graph.facebook.com/v5.0/{postId}/comments?fields=comment_count,like_count,created_time,message,permalink_url&summary=1&access_token={FBTOKEN}
I added these fields because I found them helpful; adjust the list to suit your needs.
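If you would rather not concatenate the query string by hand, here is a minimal sketch that assembles the same endpoint with the built-in URL class. The helper name buildCommentsUrl and its optional after argument are assumptions of mine, not part of the Graph API.

```javascript
// Hypothetical helper: assembles the comments endpoint for one post.
// `after` is optional and carries the cursor for the next page.
function buildCommentsUrl(postId, accessToken, after) {
  const url = new URL(`https://graph.facebook.com/v5.0/${postId}/comments`);
  url.searchParams.set("fields", "comment_count,like_count,created_time,message,permalink_url");
  url.searchParams.set("summary", "1");
  url.searchParams.set("access_token", accessToken);
  if (after) url.searchParams.set("after", after);
  return url.toString();
}

// "YOUR_TOKEN" is a placeholder, not a real token.
console.log(buildCommentsUrl("3417379501610347", "YOUR_TOKEN"));
```

The commas in fields come out percent-encoded, which is the same request as the hand-written URL.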
4. Paginate
Check whether the response includes a cursor under paging > cursors > after and, if so, request the next page with it. Here's a short snippet written in JS:
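const axios = require("axios"); // HTTP client, installed with: npm install axios

// Collects every comment on a post by following the `after` cursor page by page.
async function getAllComments(postId) {
  const url = `https://graph.facebook.com/v5.0/${postId}/comments?fields=comment_count,like_count,created_time,message,permalink_url&summary=1&access_token=${process.env.FBTOKEN}`;
  try {
    let response = await axios.get(url);
    let data = response.data.data;
    // `paging.next` is only present while more pages remain.
    while (response.data.paging && response.data.paging.next) {
      response = await axios.get(url + "&after=" + response.data.paging.cursors.after);
      data = [...data, ...response.data.data];
    }
    return data; // USE data
  } catch (error) {
    // Errors that are not HTTP errors have no `response`, so fall back to the error itself.
    console.log(error.response || error);
  }
}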
Yay! No scraping libraries, just the Graph API with a token that expires in 60 days ¯\_(ツ)_/¯
Instagram Comments
The process of getting Instagram post comments with the Graph API is very similar. The only problem was that I spent too much time finding the post ID.
Instagram post URLs all end with an 11-character shortcode, e.g. https://www.instagram.com/p/B9qN5JFhfWz/. My first idea was to find a way to convert the shortcode B9qN5JFhfWz into a numeric ID. A simple Google search led me to this article. The problem is that the 19-digit ID decoded from the shortcode B9qN5JFhfWz was not the post ID!
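For reference, the decoding itself can be sketched like this: treat the shortcode as a base-64 number written in the URL-safe alphabet. The function name shortcodeToIgId is mine, and this is just one way to write it. BigInt is used because the result is larger than JavaScript's safe integer range.

```javascript
// URL-safe base64 alphabet; a character's position is its digit value (0-63).
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Hypothetical helper: reads the shortcode as a base-64 number, most significant digit first.
function shortcodeToIgId(shortcode) {
  let id = 0n;
  for (const char of shortcode) {
    const digit = ALPHABET.indexOf(char);
    if (digit === -1) {
      throw new Error(`Unexpected character in shortcode: ${char}`);
    }
    id = id * 64n + BigInt(digit);
  }
  return id.toString(); // returned as a string so no precision is lost
}

console.log(shortcodeToIgId("B9qN5JFhfWz")); // prints 2263682864078255539
```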
(This assumes the access token was acquired with the instagram_basic permission.)
1. Getting Instagram User ID
In the Graph API Explorer, find the ig-user-id with:
/{fb-page-name}?fields=instagram_business_account
2. Getting Instagram Media
In the Graph API Explorer, fetch the list of media:
/{ig-user-id}/media
TADA! This returns a list of Facebook-assigned post IDs for those Instagram posts.
For extra clarity, request the ig_id and shortcode fields as well:
/{ig-user-id}/media?fields=ig_id,shortcode
{
"data": [
{
"ig_id": "2263774239079764498",
"shortcode": "B9qiq0sB5IS",
"id": "17870822104623583"
},
{
"ig_id": "2263727042707413494",
"shortcode": "B9qX8BpBVn2",
"id": "17953709776315597"
},
{
"ig_id": "2263682864078255539",
"shortcode": "B9qN5JFhfWz",
"id": "17859657886776538"
}
]
}
The id is the post ID assigned by Facebook, and it is the one that returns the comments. The ig_id is the 19-digit number that can be decoded from the 11-character shortcode. It took me hours to work out why my base64 conversion of an 11-character shortcode never produced the 17-digit id.
The lesson learnt was not to second-guess myself and to think through the problem rationally, step by step. The shortcode is made of the characters A-Z, a-z, 0-9, - and _ (which is the URL-safe base64 alphabet). That part was correct.
I reached my epiphany when I queried https://api.instagram.com/oembed?url=http://instagr.am/p/B9qN5JFhfWz/ and got:
{
"version": "1.0",
"media_id": "2263682864078255539_293188374",
...
}
I found out that the media_id returned (the part before the underscore) really is my 19-digit number decoded from the shortcode. AHA!
I suddenly understood that even though FB acquired IG, the ig_id is not used by FB. FB assigns a new ID of its own to each IG post.
The only way to query IG comments starting from an IG URL is to query for all media with the shortcode field, then match the shortcode from the URL against the results to find the post ID.
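Here is a minimal sketch of that matching step, run against the sample response shown earlier. Both helper names (shortcodeFromUrl and findMediaId) are made up for this example, and the sketch assumes the media list has already been fetched.

```javascript
// Hypothetical helper: pulls the shortcode out of an instagram.com/p/<shortcode>/ URL.
function shortcodeFromUrl(postUrl) {
  const match = new URL(postUrl).pathname.match(/^\/p\/([A-Za-z0-9_-]+)/);
  return match ? match[1] : null;
}

// Hypothetical helper: returns the Facebook-assigned id for a shortcode, or null.
// `mediaList` is the `data` array from /{ig-user-id}/media?fields=ig_id,shortcode.
function findMediaId(mediaList, shortcode) {
  const found = mediaList.find((item) => item.shortcode === shortcode);
  return found ? found.id : null;
}

// Sample data copied from the response above.
const sampleMedia = [
  { ig_id: "2263774239079764498", shortcode: "B9qiq0sB5IS", id: "17870822104623583" },
  { ig_id: "2263727042707413494", shortcode: "B9qX8BpBVn2", id: "17953709776315597" },
  { ig_id: "2263682864078255539", shortcode: "B9qN5JFhfWz", id: "17859657886776538" },
];

const wanted = shortcodeFromUrl("https://www.instagram.com/p/B9qN5JFhfWz/");
console.log(findMediaId(sampleMedia, wanted)); // prints 17859657886776538
```

The media list is paginated too, so for an account with many posts you would have to walk the pages the same way as for Facebook comments before giving up on a match.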
¯\_(ツ)_/¯
A painful lesson nonetheless hahaha.