C#: Parsing HTML Table and Loading HTML Webpage using Html Agility Pack


Add Html Agility Pack to your application with NuGet. To install it in your project, run the following command in the Package Manager Console.

> Install-Package HtmlAgilityPack

After installing the package via NuGet, import its namespace in your source file with the following using directive.

> using HtmlAgilityPack;

The function below loads a web page and converts one of its HTML tables into a list of rows, where each row is a list of cell strings. Just pass it the table's class name and the page URL.

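// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
public List<List<string>> ScrapHtmlTable(string className, string pageURL)
{
    HtmlWeb web = new HtmlWeb();
    HtmlDocument document = web.Load(pageURL);

    // SelectSingleNode returns null when no table has exactly this class attribute.
    HtmlNode table = document.DocumentNode.SelectSingleNode("//table[@class='" + className + "']");
    if (table == null)
        return new List<List<string>>();

    return table
      .Descendants("tr")
      .Skip(1) // Skip the table header row
      .Where(tr => tr.Elements("td").Count() > 1) // Keep only rows with more than one cell
      .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
      .ToList();
}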
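
To try the same parsing logic without a network call, the query can be run against an HTML string loaded with HtmlDocument.LoadHtml. The following is a minimal sketch, not part of the original function: the class TableParsingSketch, the helper ParseTableFromHtml, and the sample "prices" table are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableParsingSketch
{
    // Hypothetical helper: same query as ScrapHtmlTable, but reads from a string
    // instead of downloading a page.
    public static List<List<string>> ParseTableFromHtml(string className, string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        HtmlNode table = document.DocumentNode
            .SelectSingleNode("//table[@class='" + className + "']");
        if (table == null)
            return new List<List<string>>(); // no matching table

        return table.Descendants("tr")
            .Skip(1) // skip the header row
            .Where(tr => tr.Elements("td").Count() > 1)
            .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
            .ToList();
    }

    public static void Main()
    {
        // Sample markup invented for this sketch.
        string html =
            "<table class='prices'>" +
            "<tr><th>Item</th><th>Price</th></tr>" +
            "<tr><td> Apple </td><td>1.20</td></tr>" +
            "<tr><td>Pear</td><td>0.95</td></tr>" +
            "</table>";

        // Print each data row with its cells separated by " | ".
        foreach (List<string> row in ParseTableFromHtml("prices", html))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

Each inner list holds the trimmed text of one row's td cells, so rows[0][1] would be the second cell of the first data row.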

Example call:

ScrapHtmlTable("className1 className2", "https://www.abc.xz");
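
The XPath in ScrapHtmlTable compares the whole class attribute, so the call above only matches a table whose attribute is exactly "className1 className2". If you would rather match a table by a single class among several, one common XPath 1.0 idiom is to test for a space-delimited token. The sketch below assumes the token contains no quote characters; ClassTokenSketch and TableWithClassToken are hypothetical names, not part of Html Agility Pack.

```csharp
using System;
using HtmlAgilityPack;

public static class ClassTokenSketch
{
    // Hypothetical helper: builds an XPath that matches a table whose class
    // attribute contains the given token, whatever other classes are present.
    // Assumes the token contains no single quotes.
    public static string TableWithClassToken(string token)
    {
        return "//table[contains(concat(' ', normalize-space(@class), ' '), ' " + token + " ')]";
    }

    public static void Main()
    {
        var document = new HtmlDocument();
        document.LoadHtml("<table class='grid striped'><tr><td>a</td><td>b</td></tr></table>");

        // Exact attribute comparison: "striped" is not the whole attribute value.
        HtmlNode exact = document.DocumentNode.SelectSingleNode("//table[@class='striped']");

        // Token comparison: "striped" is one of the table's classes.
        HtmlNode byToken = document.DocumentNode.SelectSingleNode(TableWithClassToken("striped"));

        Console.WriteLine(exact == null);   // True
        Console.WriteLine(byToken != null); // True
    }
}
```

Padding both the attribute and the token with spaces keeps "striped" from matching a longer class name such as "striped-dark".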