2012-05-29

Scraper site, web scraping (spam sites, content collection, "scraping" techniques), spamming and spamdexing (spam indexing)

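This is too complicated. The English is a mess, and the longer I read, the more confused I got. In the end I understood less and less how "scraper site" should actually be translated into Chinese.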

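scrape vt. to scrape (刮); to graze (擦伤); to dig out (挖成) n. a scraping-off (刮掉); a scratch mark (擦痕); a predicament (困境); a scraping sound (刮擦声)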

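Rendering "scraper site" literally as 刮擦 ("scraping, scratching") doesn't seem quite right. At first I wanted to translate it as 垃圾网站 ("spam site"), but that strays far from the original meaning. After reading the English context for a while, I felt that 采集 ("collecting, harvesting") works fairly well, and Chinese already has the term 采集站 ("collection site"). But then the phrase "designed to "scrape" search-engine results pages" made me doubt again. It may well mean scraping in the literal sense after all, only somewhat extended. Google was no help with this question: a search turns up almost no Chinese pages, and the few that exist contain no real information.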

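Web scraping seems to refer to a neutral technique, covering both the collection of web content by search engines and the kind of collection that spam sites do.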

 

A scraper site is a spam website that copies all of its content from other websites using web scraping.
In the last few years[when?] scraper sites have proliferated at an amazing rate for spamming search engines. Open content is a common source of material for scraper sites.
A search engine is not a scraper site itself; sites such as Yahoo and Google gather content from other websites and index it so that the index can be searched with keywords. Search engines then display snippets of the original site content in response to a user's search.
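My translation: A scraper site (采集网站) is a kind of spam website that uses web scraping (网络采集) to copy all of its content from other websites. In recent years, scraper sites have multiplied at a surprising rate in order to spam search engines. Open content is a common source of material for scraper sites.

A search engine is not itself a scraper site; sites such as Yahoo and Google gather content from other websites and index it so that users can search the index by keyword. The search engine then shows excerpts of the original site's content in response to a user's search.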
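
To make the difference from a scraper site more concrete for myself, here is a minimal sketch of the "index it, search it by keyword, show a snippet" idea from the passage above. It is my own toy illustration in Python, not how Yahoo or Google actually work; the page URLs and texts in `PAGES` and the function names are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for content gathered from other websites.
PAGES = {
    "http://example.com/a": "A scraper site copies content from other websites.",
    "http://example.com/b": "A search engine indexes content and shows snippets.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, pages, query, width=30):
    """Return (url, snippet) pairs for pages containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    urls = set.intersection(*(index.get(w, set()) for w in words))
    results = []
    for url in sorted(urls):
        text = pages[url]
        # Cut a short excerpt around the first query word.
        pos = text.lower().find(words[0])
        start = max(pos - width, 0)
        results.append((url, text[start:pos + len(words[0]) + width]))
    return results

index = build_index(PAGES)
for url, snippet in search(index, PAGES, "content"):
    print(url, "->", snippet)
```

The point of the sketch is that the engine only stores an index and shows a short excerpt pointing back to the original page, whereas a scraper site republishes the copied content as its own.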

Made for advertising
Some scraper sites are created to make money by using advertising programs. In such case, they are called Made for AdSense sites or MFA[citation needed]. This derogatory term refers to websites that have no redeeming value[citation needed] except to lure visitors to the website for the sole purpose of clicking on advertisements.
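My translation: Made for advertising (为广告而制作)
Some scraper sites are created to make money through advertising programs. In that case they are called Made for AdSense sites (广告联盟站点), or MFA. This derogatory term refers to websites with no redeeming value other than luring visitors in for the sole purpose of clicking on advertisements.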

http://en.wikipedia.org/wiki/Scraper_site

Web scraping (also called web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox.
Web scraping is closely related to web indexing, which indexes information on the web using a bot and is a universal technique adopted by most search engines. In contrast, web scraping focuses more on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to web automation, which simulates human browsing using computer software. Uses of web scraping include online price comparison, weather data monitoring, website change detection, research, web mashup and web data integration.

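My translation: Web scraping (网络采集; also called web harvesting, 网络收割, or web data extraction, 网络数据提取) is a computer software technique for collecting information from websites. Such programs usually simulate a person exploring the World Wide Web, either by implementing the low-level Hypertext Transfer Protocol (HTTP) or by embedding a full web browser such as Internet Explorer or Mozilla Firefox.

Web scraping is closely related to web indexing, the general technique, adopted by most search engines, of indexing information on the web with a bot. By contrast, web scraping concentrates more on converting unstructured data on the web, typically in HTML, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to web automation, which uses computer software to simulate human browsing. Uses of web scraping include online price comparison, weather data monitoring, website change detection, research, web mashups and web data integration.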

http://en.wikipedia.org/wiki/Web_scraping
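
The "unstructured HTML into structured data" part of the definition is easier to see with a small example. The following is a rough sketch using only `html.parser` from the Python standard library; the `TableScraper` class, the `scrape_table` helper and the `SAMPLE_HTML` price table are all invented for illustration. It parses a fixed string and does not fetch anything over HTTP, which a real scraper would have to do first.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed HTML

def scrape_table(html):
    """Turn an HTML table into a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]

# Hypothetical page fragment, e.g. for an online price comparison.
SAMPLE_HTML = """
<table>
  <tr><th>item</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
print(records)
```

Each row becomes a plain dictionary, which is the kind of structured record that could then be written to a database or spreadsheet as the definition describes.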

 

Scraper sites
Scraper sites are created using various programs designed to "scrape" search-engine results pages or other sources of content and create "content" for a website.[5] The specific presentation of content on these sites is unique, but is merely an amalgamation of content taken from other sources, often without permission. Such websites are generally full of advertising (such as pay-per-click ads[5]), or they redirect the user to other sites. It is even feasible for scraper sites to outrank original websites for their own information and organization names.

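My translation: Scraper sites (采集站) are created using various programs designed to "scrape" (刮擦) search-engine results pages or other content sources and to create "content" for a website. The specific presentation of content on these sites is unique, but it is merely a reassembly of content from other sources, often without permission. Such sites are generally full of advertising (such as pay-per-click ads), or they redirect the user to other sites. Scraper sites can even rank above the original websites for those sites' own information and organization names.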

http://en.wikipedia.org/wiki/Spamdexing#Scraper_sites

 

Types of spamdexing (垃圾索引), a form of spamming (垃圾技术)
Keyword stuffing (关键词堆砌)
Google bomb (谷歌炸弹)
Scraper site
Link farm (链接工厂)
Cloaking (隐形)
Doorway page (门页)
URL redirection (网址重定向)
Spam blogs (垃圾博客)
Sping
Forum spam (论坛垃圾)
Blog spam (博客垃圾)
Social networking spam (社交网络垃圾)
Referrer spam (引用垃圾)
Parasite hosting (寄生虫主机)

 

Spamming
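Usually refers to sending unsolicited commercial e-mail, but in search engine optimization it usually means using disreputable methods to achieve better search-engine rankings. For example, submitting large numbers of doorway pages that are stuffed with keywords but carry no relevant meaning.
From the SEO glossary provided by Netconcepts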
http://www.netconcepts.cn/resources/seo-glossary/s/
