Monday, September 4, 2017

Readability Redux

I recently posted about using a Python module to convert HTML to usable text. Since then, a new package dubbed htm2txt has hit CRAN; it is 100% R and uses regular expressions to strip tags from text. I gave it a spin so folks could compare some basic output, but you should definitely give htm2txt...
