I wrote a small program for screen scraping any site using the XmlHttp object and VBScript. I know it isn't rocket science :) but I still thought I'd share the code with you all.
XmlHttp -- Extensible Markup Language Hypertext Transfer Protocol
One advantage is that the XmlHttp object queries the server and retrieves the latest information without reloading the page.
Source code:
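<html>
<head>
<script language="vbscript">
' Create the XmlHttp object once, when the page loads
Dim objXmlHttp
Set objXmlHttp = CreateObject("Msxml2.XMLHttp")

' Called by the button's onclick handler (name must match onclick="ScreenScraping()")
Sub ScreenScraping()
    Dim url
    url = "UR site URL comes here"   ' replace with the URL of the site to scrape
    ' GET fetches the page; True = asynchronous, so the page is not blocked while waiting
    objXmlHttp.Open "GET", url, True
    ' Register the handler after Open; it is called every time readyState changes
    objXmlHttp.onreadystatechange = GetRef("HandleStateChange")
    objXmlHttp.Send
End Sub

Sub HandleStateChange()
    ' readyState 4 = the complete response has been received
    If objXmlHttp.readyState = 4 Then
        MsgBox "Screen scraping completed."
        document.getElementById("divShowContent").innerHTML = objXmlHttp.responseText
    End If
End Sub
</script>
</head>
<body>
<input id="divResult" onclick="ScreenScraping()" type="button" value="Click here to start screen scraping" name="btnScreenScraping">
<div id="divShowContent"></div>
</body>
</html>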
Points to note:
1. Many sites have policies against screen scraping, so before scraping any particular site, check and respect its policy.
2. I check for readyState = 4, which means the complete response has been received in responseText. The handler is called every time readyState changes, so without this check we might read incomplete data if the site takes a long time to process our request.
3. In objXmlHttp.Open, the third parameter takes a Boolean value. True means the request is asynchronous: the script keeps running without waiting for a response from the server we are hitting, and the onreadystatechange handler is called as the response arrives. False means the request is synchronous: Send waits for the server's response before the script continues.
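Point 3 can be seen in a small standalone sketch. This is my own illustration under some assumptions, not part of the page above: it runs under Windows Script Host (cscript) rather than in the browser, and the URL is a placeholder. With the third argument of Open set to False, Send blocks until the whole response has arrived, so no onreadystatechange handler is needed and readyState is already 4 when Send returns.

```vbscript
' Minimal sketch (assumption): a standalone script run with
'   cscript //nologo fetch.vbs
' The URL is a placeholder; use a site whose policy allows scraping.
Option Explicit

Dim http, url
url = "http://www.example.com/"

Set http = CreateObject("Msxml2.XMLHTTP")

' Third argument False = synchronous: Send does not return until
' the response has been fully received.
http.Open "GET", url, False
http.Send

' Because the call was synchronous, readyState is already 4 (completed),
' so there is no need for an onreadystatechange handler.
WScript.Echo "readyState: " & http.readyState
WScript.Echo "HTTP status: " & http.status

If http.status = 200 Then
    ' Show only the first 200 characters of the page
    WScript.Echo Left(http.responseText, 200)
End If
```

In asynchronous mode (True), Send returns immediately and the response has to be picked up in an onreadystatechange handler once readyState reaches 4, which is what point 2 is about.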
To understand the basics of the XmlHttp object, see:
1. http://www.w3schools.com/dom/dom_http.asp
2. http://jibbering.com/2002/4/httprequest.html
Comments
Do you have any idea how to scrape content from an active application such as MS Word and paste (feed) it into another MS Word document open inside a Remote Desktop session?
Many thanks !