Over the past few weeks I’ve been throwing this together in my spare time: a class library that pulls HTML nodes from a page using a CSS selector. A good comparison would be hpricot for Ruby. It uses the HTML Agility Pack behind the scenes to clean up the source document and to provide a means of reading the document's nodes.
I’ve got a limited suite of unit tests which document the current level of support: most CSS 2.1 selectors are in there, plus a couple of CSS 3 ones. The unit tests are partially pinched from jQuery’s selector engine unit tests, so thank you to the jQuery team.
It works like this:
SelectorEngine engine = new SelectorEngine(htmlString);
IList nodes = engine.Parse("#p>a");
Pretty simple stuff. There are no binaries yet, as I consider it alpha-quality, but you can check it out over at Google Code. Contributions would be appreciated.
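To make the `#p>a` example concrete, here is a minimal, self-contained sketch of what that child-combinator selector means: find the element whose id is `p`, then keep only its direct children whose tag name is `a`. It does not use Fizzler or the HTML Agility Pack. The `Node` class and the `ChildSelector.SelectChildren` method are hypothetical stand-ins written for illustration, and the sketch only understands the single `#id>tag` shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy element type; not the HTML Agility Pack's HtmlNode.
class Node
{
    public string Tag;
    public string Id;
    public List<Node> Children = new List<Node>();

    public Node(string tag, string id = null, params Node[] children)
    {
        Tag = tag;
        Id = id;
        Children.AddRange(children);
    }

    // Depth-first walk over this node and all of its descendants.
    public IEnumerable<Node> DescendantsAndSelf()
    {
        yield return this;
        foreach (var child in Children)
            foreach (var d in child.DescendantsAndSelf())
                yield return d;
    }
}

static class ChildSelector
{
    // Hypothetical helper: evaluates a selector of the form "#id>tag" (child combinator only).
    public static List<Node> SelectChildren(Node root, string selector)
    {
        string[] parts = selector.Split('>');
        string id = parts[0].Trim().TrimStart('#');
        string tag = parts[1].Trim();

        return root.DescendantsAndSelf()
                   .Where(n => n.Id == id)          // left-hand side: #id
                   .SelectMany(n => n.Children)     // '>' means direct children only
                   .Where(c => string.Equals(c.Tag, tag, StringComparison.OrdinalIgnoreCase))
                   .ToList();
    }
}

class Program
{
    static void Main()
    {
        // Equivalent of <div><p id="p"><a/><span><a/></span><a/></p></div>
        var doc = new Node("div", null,
            new Node("p", "p",
                new Node("a"),
                new Node("span", null, new Node("a")),
                new Node("a")));

        var nodes = ChildSelector.SelectChildren(doc, "#p>a");
        Console.WriteLine(nodes.Count); // 2: the <a> nested inside <span> is not a direct child
    }
}
```

The nested `<a>` is what distinguishes `#p>a` from the descendant selector `#p a`, which would match all three links.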
This rocks (from what I’ve seen so far)!
Will be using it for all my web spidering from now on.
Mikael
Hi Colin,
Just been playing with your CSS engine: top notch stuff. I am actually using it as a mechanism for selecting elements within a WPF visual tree. I placed an interface between your CSS engine and the HTMLNode, then swapped in a WPF VisualTreeNode implementation behind that interface. It works a treat.
I think I have found a bug in your selector, though. Are you still actively developing it? If so, can I send you a new test case? You could probably fix it far quicker than I can!
I will be blogging about my code shortly, so I will give you a shout.
Regards, Colin E.
Colin Eberhardt
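The approach described in the comment above, putting an interface between the selector engine and the node type so the same matching code can run over a different tree, can be sketched roughly as follows. This is neither Fizzler's code nor the commenter's: `ISelectableNode`, `Matcher` and the two toy tree classes are hypothetical names, and no real WPF types are used, so `ToyVisualNode` only stands in for a visual-tree wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction: the only things the matcher needs to know about a node.
interface ISelectableNode
{
    string Name { get; }
    IEnumerable<ISelectableNode> Children { get; }
}

// Stand-in for an adapter over an HTML element.
class ToyHtmlNode : ISelectableNode
{
    public string Name { get; }
    public List<ToyHtmlNode> Kids { get; } = new List<ToyHtmlNode>();
    public IEnumerable<ISelectableNode> Children => Kids; // IEnumerable<T> covariance
    public ToyHtmlNode(string tagName) { Name = tagName; }
}

// Stand-in for an adapter over a UI visual tree (no real WPF types used).
class ToyVisualNode : ISelectableNode
{
    public string Name { get; }
    public List<ToyVisualNode> Kids { get; } = new List<ToyVisualNode>();
    public IEnumerable<ISelectableNode> Children => Kids;
    public ToyVisualNode(string typeName) { Name = typeName; }
}

static class Matcher
{
    // Every node in the tree, depth first, starting with the root.
    static IEnumerable<ISelectableNode> Walk(ISelectableNode node)
    {
        yield return node;
        foreach (var child in node.Children)
            foreach (var d in Walk(child))
                yield return d;
    }

    // Descendant selector "ancestor descendant", written once against the interface.
    public static List<ISelectableNode> Descendants(ISelectableNode root, string ancestor, string descendant)
    {
        return Walk(root)
            .Where(n => n.Name == ancestor)
            .SelectMany(n => Walk(n).Skip(1)) // Skip(1) drops the ancestor itself
            .Where(n => n.Name == descendant)
            .Distinct()                        // nested ancestors could yield duplicates
            .ToList();
    }
}

class Program
{
    static void Main()
    {
        var div = new ToyHtmlNode("div");
        var p = new ToyHtmlNode("p");
        p.Kids.Add(new ToyHtmlNode("a"));
        div.Kids.Add(p);

        var grid = new ToyVisualNode("Grid");
        var panel = new ToyVisualNode("StackPanel");
        panel.Kids.Add(new ToyVisualNode("Button"));
        panel.Kids.Add(new ToyVisualNode("Button"));
        grid.Kids.Add(panel);

        // The same matcher runs unchanged over both tree types.
        Console.WriteLine(Matcher.Descendants(div, "div", "a").Count);        // 1
        Console.WriteLine(Matcher.Descendants(grid, "Grid", "Button").Count); // 2
    }
}
```

The design point is that the matcher only depends on a name and a list of children, so supporting a new kind of tree means writing one small adapter rather than touching the selector logic.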