How Can I Parse a Remote HTML Page Using Pure JavaScript
Solution 1:
Ordinary browser JavaScript cannot read the contents of remote pages from any server except its own; this is the browser's same-origin policy.
You can:
Have a cooperating script on your own server fetch the remote content.
With the cooperation of the remote server, you may be able to access the content through an appropriate CORS (http://en.wikipedia.org/wiki/Cross-origin_resource_sharing) arrangement.
Again with the cooperation of the remote server, if it makes its content available as JavaScript, you can access it by creating script elements that load it. "JSONP" is an example of this approach.
If you write a browser plugin or add-on (for browsers that permit such things to be written in JavaScript), you are not bound by the browser security model in the same way.
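As a rough illustration of the JSONP option, here is a minimal sketch. The helper names buildJsonpUrl and loadJsonp are made up for this example, and the query parameter name "callback" is an assumption; each server decides what parameter it expects, so check the remote service's own documentation.

```javascript
// Hypothetical helper: append the callback name as a query parameter.
// The parameter name "callback" is an assumption; servers choose their own.
function buildJsonpUrl(baseUrl, callbackName) {
  var url = new URL(baseUrl);
  url.searchParams.set('callback', callbackName);
  return url.toString();
}

// Browser-only: load the remote script and pass its payload to onData.
// The remote server must respond with something like: callbackName({...});
function loadJsonp(baseUrl, onData) {
  var callbackName = 'jsonpCallback' + Date.now();
  var script = document.createElement('script');
  window[callbackName] = function (data) {
    delete window[callbackName];            // one-shot callback
    script.parentNode.removeChild(script);  // remove the temporary tag
    onData(data);
  };
  script.src = buildJsonpUrl(baseUrl, callbackName);
  document.head.appendChild(script);
}
```

This only works when the remote server deliberately wraps its response in a call to the named function, and it means running whatever script that server sends, so it is only suitable for servers you trust.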
Solution 2:
Assuming the origin issue is fixed, etc., here is the approach I use (txt holds the fetched HTML as a string):
// get the body part of the html (assumes a bare <body> tag with no attributes)
txt = txt.substr( txt.indexOf('<body>') + 6 );
txt = txt.substr( 0, txt.indexOf('</body>') );
// stick the body into a div
var div = document.createElement('div');
div.innerHTML = txt;
// extract textContent from each element (or something more interesting)
Array.prototype.slice.call( div.querySelectorAll('*') ).forEach( function(el) {
  if( el.textContent ) console.log( el.textContent );
});
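The snippet above looks for a literal <body> tag, so it misses pages whose body tag carries attributes such as a class. A possible variation, not part of the original answer, is sketched below. The names extractBodyHtml and textOfEachElement are hypothetical, and the regular expression is a simplification that would be confused by a ">" inside an attribute value.

```javascript
// Hypothetical helper: return the markup between <body ...> and </body>,
// or the whole input when no body element is found.
function extractBodyHtml(html) {
  var match = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  return match ? match[1] : html;
}

// Browser-only alternative: let the browser's own parser build a document
// instead of slicing the string by hand.
function textOfEachElement(html) {
  var doc = new DOMParser().parseFromString(html, 'text/html');
  return Array.prototype.slice.call(doc.body.querySelectorAll('*'))
    .map(function (el) { return el.textContent; })
    .filter(function (text) { return text; });
}
```

The result of extractBodyHtml(txt) can be assigned to div.innerHTML as before. Using DOMParser avoids the string handling entirely, and scripts in a document parsed this way are not executed.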