Geeks With Blogs
A Technical Debtor Toward continuous improvement

All right, all you developers out there... let's see a show of hands. How many of you delight in finding new ways to solve a problem?

You. Yes, you in the back. Get your hand up. You can't call yourself a developer if you don't enjoy finding a new (preferably somewhat convoluted) way to solve a problem.

I've been doing some work that involves converting C# code to VB.NET code. I was sitting in the speaker lounge at VS Live, shortly after getting into San Francisco. I'd played a little bit on the plane with the idea of creating a tool that would walk the directory tree and do some of the preliminary work to convert C# to VB.NET.

Of course, there's a great code translator available online (even if it has problems with LINQ). So I thought it would be kind of cool to leverage that, rather than doing the conversions by hand. I mentioned this to Beth Massi, and she said something about

XML literals rock my world!

while whipping up a little code sample to fix the HTML from the translator site. Something like this:

    ' Strip markup that gets in the way of loading the page as XML
    input = input.Replace("<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">", "")
    input = input.Replace("<?xml version=""1.0"" encoding=""utf-16""?>", "")
    input = input.Replace("&nbsp;", "")
    ' Escape the remaining ampersands so other entity references load as plain text
    input = input.Replace("&", "&amp;")
    input = input.Replace("/*]]>*/", "")

    Dim html As XElement
    Using sr As New StringReader(input)
        html = XElement.Load(sr)
    End Using

    ' Find the <ul id="code-result"> element that holds the translated code
    Dim code = (From data In html...<ul> Where data.@id = "code-result").FirstOrDefault()

Turns out that Beth had already blogged about how to use XML literals for screen scraping. There's also tidy.exe that we could have used.

Now, I just needed to figure out how to post a request to the code converter site, and get the returned code. Or, to be honest, I needed to figure out how to borrow code to do this. Thanks to Google, this didn't take long.

One problem I ran into was that I needed to specify the name of the form field I was passing as a parameter. Fiddler to the rescue!
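Here's a minimal sketch of what "naming the parameter" means for a form post: the body is just `name=value`, with the value URL-encoded. `BuildFormBody` is a hypothetical helper for illustration, not something from the translator site or the .NET Framework; the field name `Code` is the one used in this post's POST body.

```vbnet
Imports System

Public Module FormPost
    ' Hypothetical helper: builds an application/x-www-form-urlencoded body
    ' that carries a single named field.
    Public Function BuildFormBody(ByVal fieldName As String, ByVal value As String) As String
        ' Escape both parts so characters such as &, + and = are not
        ' mistaken for form delimiters by the server.
        Return Uri.EscapeDataString(fieldName) & "=" & Uri.EscapeDataString(value)
    End Function

    Public Sub Main()
        ' "Code" is the field name the post sends to the translator page.
        Console.WriteLine(BuildFormBody("Code", "int total = a + b;"))
    End Sub
End Module
```

Without the escaping, a `+` in the C# source would typically be read as a space on the server side, and an `&` would look like the start of another field.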

(Does anyone else find it ironic that I found a C# code sample to demonstrate a concept I needed to use to automate conversion from C# to VB.NET?)

So, I ended up with a nice little class of helper methods that I could leverage while walking through a directory tree and converting all of the .cs files I find there. (And, yes, I'm looking at trying to convert csproj files to vbproj as well.)

The helper class looks something like this:

    Imports <xmlns="http://www.w3.org/1999/xhtml">
    Imports System
    Imports System.IO
    Imports System.Linq
    Imports System.Net
    Imports System.Text
    Imports System.Xml.Linq

    Public Class ScreenScraper
        ' POSTs the source code to the translator page and returns the raw HTML response.
        Public Shared Function GetHtmlPageWithPost(ByVal strURL As String, ByVal postContent As String) As String
            Dim httpRequest As HttpWebRequest = CType(WebRequest.Create(strURL), HttpWebRequest)
            httpRequest.Method = "POST"
            httpRequest.ContentType = "application/x-www-form-urlencoded"

            ' "Code" is the form field name the page expects (found with Fiddler).
            ' URL-encode the value so characters such as & and + survive the form post.
            Dim arrRequest As Byte() = (New UTF8Encoding).GetBytes("Code=" & Uri.EscapeDataString(postContent))
            httpRequest.ContentLength = arrRequest.Length

            Using requestStream As Stream = httpRequest.GetRequestStream
                requestStream.Write(arrRequest, 0, arrRequest.Length)
            End Using

            Using reader As New StreamReader(httpRequest.GetResponse.GetResponseStream(), Encoding.UTF8)
                Return reader.ReadToEnd()
            End Using
        End Function

        ' Pulls the translated code out of the HTML returned by the translator page.
        Public Shared Function GetCodeFromHTML(ByVal input As String) As String
            ' Strip markup that gets in the way of loading the page as XML
            input = input.Replace("<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">", "")
            input = input.Replace("<?xml version=""1.0"" encoding=""utf-16""?>", "")
            input = input.Replace("&nbsp;", "")
            ' Escape the remaining ampersands so other entity references load as plain text
            input = input.Replace("&", "&amp;")
            input = input.Replace("/*]]>*/", "")

            Dim html As XElement
            Using sr As New StringReader(input)
                html = XElement.Load(sr)
            End Using

            ' Find the <ul id="code-result"> element that holds the translated code
            Dim code = (From data In html...<ul> Where data.@id = "code-result").FirstOrDefault()

            ' No result list on the page (for example, an error page came back)
            If code Is Nothing Then Return String.Empty

            Dim codeText As String = code.ToString
            codeText = codeText.Replace(vbCrLf, "")
            codeText = codeText.Replace("<ul id=""code-result"" xmlns=""http://www.w3.org/1999/xhtml"">", "")
            codeText = codeText.Replace("<li>", "")
            ' replace keyword tag with a space to fix parsing issues
            codeText = codeText.Replace("<span class=""keyword"">", " ")
            codeText = codeText.Replace("</span>", "")
            ' each list item is one line of code
            codeText = codeText.Replace("</li>", vbCrLf)
            codeText = codeText.Replace("</ul>", "")
            ' minimum effort removal of white space (up to 28 spaces)
            codeText = codeText.Replace(New String(" "c, 16), " ")
            codeText = codeText.Replace(New String(" "c, 12), " ")
            codeText = codeText.Replace(New String(" "c, 8), " ")
            codeText = codeText.Replace(New String(" "c, 6), " ")
            codeText = codeText.Replace(New String(" "c, 4), " ")
            codeText = codeText.Replace(New String(" "c, 2), " ")
            codeText = codeText.Replace(New String(" "c, 2), " ")

            Return codeText.Trim
        End Function
    End Class

Isn't that a whole bunch more fun than buying a commercial code translator or using Reflector?

There are a few interesting things to note here.

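First, in VB.NET, you can import an XML namespace. This is required for the LINQ to XML query to work properly: the elements on the page live in the XHTML namespace, so without the import the html...<ul> query looks for a <ul> in no namespace and finds nothing.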
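A small sketch of the difference, assuming a cut-down XHTML page of my own invention (`NamespaceDemo` and `CountLists` are hypothetical names for illustration). The XML axis property picks up the imported default namespace; a plain string passed to `Descendants` does not.

```vbnet
Imports <xmlns="http://www.w3.org/1999/xhtml">
Imports System
Imports System.Linq
Imports System.Xml.Linq

Public Module NamespaceDemo
    ' Hypothetical helper: counts <ul> elements with or without the imported namespace.
    Public Function CountLists(ByVal html As XElement, ByVal useImportedNamespace As Boolean) As Integer
        If useImportedNamespace Then
            ' The axis property resolves <ul> in the default namespace imported above.
            Return html...<ul>.Count()
        End If
        ' A plain string name means "ul in no namespace", which XHTML elements are not in.
        Return html.Descendants("ul").Count()
    End Function

    Public Sub Main()
        ' Made-up sample page; the real translator output is much larger.
        Dim page = XElement.Parse("<html xmlns=""http://www.w3.org/1999/xhtml""><body><ul id=""code-result""><li>Dim x</li></ul></body></html>")
        Console.WriteLine(CountLists(page, True))   ' prints 1
        Console.WriteLine(CountLists(page, False))  ' prints 0
    End Sub
End Module
```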

Second, the code returned from this helper class is not pretty -- indentation isn't preserved. The web page returns HTML with lots of <span> tags to provide keyword coloring and other formatting. We dropped these tags, and all of the associated CSS formatting. I don't see this as a big deal, since I'm not editing code in Notepad. The IDE will take care of making the code look good.

Third, I'm sure there's some really cool way to parse the HTML tree to make the string manipulation much simpler. Maybe a regular expression; that would be sweet. I didn't worry too much about it -- brute force worked well enough.
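One possible way to lean on the tree instead of string replacement, offered as a sketch rather than a drop-in replacement: read each list item's `Value`, which returns the text of the element with the nested `<span>` tags left out. `ListTextExtractor` and `GetCodeFromList` are hypothetical names, and the sketch assumes the input has already been cleaned up enough to parse as XML. It also assumes that loading with `LoadOptions.PreserveWhitespace` is wanted, so that a lone space between two adjacent keyword spans is kept.

```vbnet
Imports <xmlns="http://www.w3.org/1999/xhtml">
Imports System
Imports System.Linq
Imports System.Xml.Linq
Imports Microsoft.VisualBasic

Public Module ListTextExtractor
    ' Hypothetical alternative to the Replace chain: one line of code per <li>.
    Public Function GetCodeFromList(ByVal xhtml As String) As String
        ' Keep whitespace-only text nodes so "<span>A</span> <span>B</span>" reads "A B".
        Dim html = XElement.Parse(xhtml, LoadOptions.PreserveWhitespace)

        Dim list = (From ul In html...<ul> Where ul.@id = "code-result").FirstOrDefault()
        If list Is Nothing Then Return String.Empty

        ' Value concatenates the text inside each item, skipping the span markup.
        Dim lines = From li In list.<li> Select li.Value.Trim()
        Return String.Join(vbCrLf, lines.ToArray())
    End Function
End Module
```

Like the brute-force version, this trims each line, so indentation is still left for the IDE to sort out.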

Most importantly, in my mind, this is a cool little way to use HTTP POST to send C# to a website, then screen scrape the results, and then get VB.NET code out. (And, yes, you could equally well use the VB.NET -> C# version of the translator web page.)
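For the directory walk itself, here's a rough sketch under a few assumptions: `TreeConverter` and `ConvertTree` are hypothetical names, the translation step is passed in as a delegate so the sketch stays independent of any particular site, and each `.vb` file is written next to its `.cs` source.

```vbnet
Imports System
Imports System.IO

Public Module TreeConverter
    ' Hypothetical driver: translates every .cs file under root and
    ' returns the number of files converted.
    Public Function ConvertTree(ByVal root As String, ByVal translate As Func(Of String, String)) As Integer
        Dim converted As Integer = 0
        For Each csPath As String In Directory.GetFiles(root, "*.cs", SearchOption.AllDirectories)
            ' Same folder and file name, with a .vb extension.
            Dim vbPath As String = Path.ChangeExtension(csPath, ".vb")
            File.WriteAllText(vbPath, translate(File.ReadAllText(csPath)))
            converted += 1
        Next
        Return converted
    End Function
End Module
```

With the helper class from this post, the delegate would be something along the lines of `Function(cs) ScreenScraper.GetCodeFromHTML(ScreenScraper.GetHtmlPageWithPost(converterUrl, cs))`, where `converterUrl` is a placeholder for the translator page's address.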

Posted on Wednesday, February 25, 2009 4:51 PM | Tips and Tricks, VB, DevCenter


Comments on this post: The Developer's Way to Convert Code

No comments posted yet.
Copyright © Jeff Certain | Powered by: GeeksWithBlogs.net