Web Scraping: Get to Know the Data Collection Technique

In early April, a new data leak exposed the information of 533 million Facebook users worldwide, including the social network's founder, Mark Zuckerberg, and about 8 million Brazilians who have profiles on the service.

According to the platform, this exposure was not caused by an intrusion into its servers. The information, which ended up on a hacking forum, was obtained through a technique known as scraping.

The method, used by marketing agencies, journalists and data scientists, has made headlines before, as in September 2020, when data from 235 million YouTube, Instagram and TikTok users was leaked. Perhaps the best-known case, however, is the Cambridge Analytica scandal, in which information from Facebook profiles was used to build maps of voter behavior.

What is scraping?

Also called web scraping, scraping is a technique for automatically collecting information from the Internet, drawing on public data available on websites, social networks and other online services.

In general, scraping is used to speed up the search for and collection of this information, work that would take much longer if done manually. The speed comes from specialized applications, programming languages or scripts that copy data on a large scale.
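To illustrate what such a script does, here is a minimal sketch in Python using only the standard library. It parses an HTML page and pulls out every link along with its text. The names `LinkCollector`, `extract_links` and `fetch`, and the User-Agent string, are illustrative assumptions rather than any specific tool mentioned in this article; real scrapers usually add error handling, rate limiting and checks against a site's terms of use.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; anchors without href are ignored
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return the list of (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Download a page as text. The User-Agent value is a hypothetical placeholder."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    sample = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'
    print(extract_links(sample))
```

Running the file prints the two links found in the sample HTML. To scrape a live page you would call `extract_links(fetch(url))`, where `url` is a page you are permitted to collect.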

Scraping comes into play when a researcher, scientist, journalist or other professional needs to collect a large amount of data for a study, investigation or report, with the collection automated from a public database maintained by the federal government or some other source.
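Collecting from a public database often means downloading many pages of records in a row. The sketch below, again in Python with only the standard library, assumes a hypothetical endpoint that returns CSV text and accepts a `page` query parameter; the URL, the parameter and the column names are assumptions for illustration only. The pause between requests keeps the load on the source server low.

```python
import csv
import io
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_text(url):
    """Download a URL and return its body decoded as UTF-8 text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def collect_pages(base_url, pages, fetch=fetch_text, delay=1.0):
    """Fetch pages 1..pages of a hypothetical CSV endpoint and merge the rows.

    Each page is assumed to be CSV with a header line; rows are returned
    as dictionaries keyed by column name.
    """
    rows = []
    for page in range(1, pages + 1):
        url = f"{base_url}?{urlencode({'page': page})}"
        text = fetch(url)
        rows.extend(csv.DictReader(io.StringIO(text)))
        if page < pages:
            time.sleep(delay)  # pause between requests to avoid overloading the server
    return rows


if __name__ == "__main__":
    # Offline demo: a dictionary of fake pages stands in for the hypothetical endpoint.
    fake_pages = {
        "https://data.example.org/records?page=1": "name,amount\nA,10\nB,20\n",
        "https://data.example.org/records?page=2": "name,amount\nC,30\n",
    }
    rows = collect_pages("https://data.example.org/records", 2,
                         fetch=fake_pages.__getitem__, delay=0)
    print(len(rows), rows[0])
```

Passing the fetcher as a parameter keeps the paging logic testable offline; against a real source you would leave the default `fetch_text` in place and respect any limits the publisher documents.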

Scraping also makes it possible to obtain publicly visible information from social network profiles (name, photo, address, telephone, e-mail, etc.) and from Google for a wide range of purposes, such as targeting advertising campaigns and monitoring competitors.

Is data scraping legal?

Collecting data through scraping is not considered illegal as long as it is done on public sources. In other words, the information obtained is accessible to any Internet user: just as visiting someone’s profile and viewing the data made available there is not a crime, using an automated tool to do the same work is also not against the law.

However, Facebook, Instagram, YouTube and TikTok, among others, currently treat the automated copying of data stored on their platforms as a violation of their terms of service.

Are there any risks for those whose data was copied?

Through scraping, people and businesses can access the public information of anyone included in a given database, such as phone number, e-mail, profile picture, age and gender, depending on what the automated tool has access to.

In the case of a social network, scrapers can also obtain details such as the number of followers, engagement and even shared links, in addition to public posts and other content open to other users, if the platform allows such access.
