tag:blogger.com,1999:blog-74231097719804102732024-03-05T12:12:33.011-08:00xecretsSvantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.comBlogger40125tag:blogger.com,1999:blog-7423109771980410273.post-33402550118834546692021-05-10T10:50:00.005-07:002023-09-21T08:34:41.917-07:00ConcurrentDictionary.GetOrAdd() may not work as you think!<h2 style="text-align: left;">It's concurrent - not lazy</h2><div>We had a problem with random crashes in a customer's web application. It was not catastrophic, since the site would recover on its own, but it was still there in the logs, and we want every page visit to work.</div><div><br /></div><div>A bit of investigation with an IL-code decompiler followed, which by the way is absolutely the best thing since sliced bread! I found code equivalent to the following in a third-party vendor's product:</div><pre><code>private readonly ConcurrentDictionary<string, Type> _dictionary
    = new ConcurrentDictionary<string, Type>();
private readonly object _lock = new object();
public Type GetType(string typename)
{
return _dictionary.GetOrAdd(typename,
(t) =>
{
lock (_lock)
{
return DynamicallyGenerateType(t);
}
});
}
</code></pre><div>The thing here is that <span style="font-family: courier;">DynamicallyGenerateType()</span> can only be called once per typename, since what it does is emit code into an assembly, and if you do that twice you get a <span style="font-family: courier;">System.ArgumentException: Duplicate type name within an assembly</span>.</div><div><br /></div><div>No-one wants that, so the author thought that it would be cool to use a <span style="font-family: courier;">ConcurrentDictionary<TKey, TValue> </span>since the <span style="font-family: courier;">GetOrAdd()</span> method guarantees that it will get an existing value from the dictionary, or add a new one using the provided value factory and then return the value.</div><div><br /></div><div>It looks good, reasonable, and works almost all of the time. The key word here is: almost.</div><div><br /></div><div>What the concurrent dictionary does is use efficient and light-weight locking to ensure that the dictionary can be concurrently accessed and updated in a consistent and thread safe manner.</div><div><br /></div><div>It does not guarantee a single one-time lazy call to the value factory used to add a value if it's not in the dictionary.</div><div><br /></div><div>Sometimes, under heavy initial load, the value factory passed as the second argument to <span style="font-family: courier;">GetOrAdd()</span> will be called twice (or more). What the concurrent dictionary guarantees is that the value for the provided key will only be set once, but the value factory may be called multiple times with the result thrown away for all calls except the race-winning one!</div><div><br /></div><div>This is clearly <a href="https://docs.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2.getoradd" target="_blank">documented</a> but it's easy to miss. This is not a case of the implementation not being thread safe, as is stated in some places. It's very thread safe. But the value factory may indeed be called multiple times!</div><div><br /></div><div>To fix it, add a Lazy-layer on top, because <span style="font-family: courier;">Lazy<T></span> by default does guarantee that its value factory is only called once!</div><pre><code>private readonly ConcurrentDictionary<string, Lazy<Type>> _dictionary
= new ConcurrentDictionary<string, Lazy<Type>>();
private readonly object _lock = new object();
public Type AddType(string typename)
{
return _dictionary.GetOrAdd(typename,
(t) =>
{
lock (_lock)
{
return DynamicallyGenerateTypeLazy(t);
}
}).Value;
}
</code></pre><div><br /></div><div>Now, although we may instantiate more than one <span style="font-family: courier;">Lazy<Type></span> instance, and then throw it away if we lose the <span style="font-family: courier;">GetOrAdd</span>-race, that's a minor problem and it works as it should.</div><div><br /></div><div>Please note that this is only true as long as you use the default <span style="font-family: courier;">LazyThreadSafetyMode.ExecutionAndPublication</span> mode.</div><div><br /></div><div>The additional <span style="font-family: courier;">lock</span> may look confusing, but it was in the original code and makes sense in this context, because while the concurrent dictionary and lazy layer guarantee that only one call per value of '<span style="font-family: courier;">typename</span>' is made to <span style="font-family: courier;">DynamicallyGenerateTypeLazy()</span>, they do not guarantee that multiple threads do not call it concurrently with different type names, and this may wreak havoc with the shared assembly that the code is generated to.</div><div><br /></div>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-46020034173270814692021-05-04T12:33:00.003-07:002021-05-05T01:14:48.231-07:00SSH with TortoiseGit and Bitbucket or GitHub on Windows 10<h2 style="text-align: left;">Memo to self</h2><p>It's always complicated to remember the steps necessary to get SSH working, and there are some idiosyncrasies as well. This guide may help you, I'm sure it'll help me the next time I need to do this myself.</p><p>Password-based login with HTTPS is starting to be obsolete, and it's less secure. Also, with the nice SSH agent in Windows 10, you only need to enter the password once - ever.</p><h3 style="text-align: left;">Generate a key pair</h3><p>Open a command prompt and run the ssh-keygen command to generate a private and a public key file. Accept the defaults.</p><p>Enter a strong password for the private key file when asked, and ensure that you store it securely in your password manager.</p><p>This should create files in <span style="font-family: courier;">%USERPROFILE%\.ssh</span> named <span style="font-family: courier;">id_rsa</span> (the private key file) and <span style="font-family: courier;">id_rsa.pub</span><span style="font-family: inherit;"> (the public key file).</span></p><p><span style="font-family: inherit;"></span></p><div class="separator" style="clear: both; text-align: center;"><span style="font-family: inherit;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjus0ZSbWrgRkwPGv2FaVPyH_Js1sSzHSvxcuUJKj4_TZW1sfByfwy-5xy-zIXOQnn-VTx-U-tJSKYtCaaR_CIl4QnZwrNjs07g4c23kaCavbnlOIg0SEesP0zZGz-eq-7CzwIHu_gd3ZGl/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="406" data-original-width="549" height="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjus0ZSbWrgRkwPGv2FaVPyH_Js1sSzHSvxcuUJKj4_TZW1sfByfwy-5xy-zIXOQnn-VTx-U-tJSKYtCaaR_CIl4QnZwrNjs07g4c23kaCavbnlOIg0SEesP0zZGz-eq-7CzwIHu_gd3ZGl/" width="320" /></a></span></div><h3><br /></h3><h3>Enable and start the OpenSSH Authentication Agent Service</h3><h3><div style="font-size: medium; font-weight: 400;">Nowadays it is shipped with Windows 10, but it's not enabled by default. 
So start your Services gadget and ensure the service is set to start up automatically and that it's running.</div><div style="font-size: medium; font-weight: 400;"><br /></div><div style="font-size: medium; font-weight: 400;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAnpcNcl2y4vcqOUYNZ0XhZGzE-x4vOakLMcVCUj7g9OCwQ3zhuOQN979lMEgj3NWpd8pcMoi14grNLFsfOb4zjue3knaNHqq-zft5EhfuhIeHf_ZzysgNhECYi_V8kVMMS_jjI6hFqmDa/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="581" data-original-width="699" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAnpcNcl2y4vcqOUYNZ0XhZGzE-x4vOakLMcVCUj7g9OCwQ3zhuOQN979lMEgj3NWpd8pcMoi14grNLFsfOb4zjue3knaNHqq-zft5EhfuhIeHf_ZzysgNhECYi_V8kVMMS_jjI6hFqmDa/" width="289" /></a></div></div></h3><h3 style="text-align: left;"><br /></h3><h3 style="text-align: left;">Add the private key to the SSH Authentication Agent</h3><div>In the command prompt, type <span style="font-family: courier;">ssh-add</span><span style="font-family: inherit;">. It should select the default ssh key </span><span style="font-family: courier;">id_rsa</span><span style="font-family: inherit;">, and ask for the password you entered previously.</span></div><div><br /></div><div>(If you get the error message "<span style="font-family: courier;">Can't add keys to ssh-agent, communication with agent failed</span>", there seems to be an issue with certain Windows distributions. For whatever reason, the following workaround appears to work. Open a new command prompt but elevated with Run As Administrator. Then type:<br /><br /><span style="font-family: courier;"><span> </span>sc.exe create sshd binPath=C:\Windows\System32\OpenSSH\ssh.exe</span> .<br /><br />Then exit the elevated command prompt and try again to do the <span style="font-family: courier;">ssh-add</span> in your normal command prompt.)</div><h3 style="text-align: left;"><br /></h3><h3 style="text-align: left;">Save the public key to Bitbucket...</h3><div>Open the file <span style="font-family: courier;">%USERPROFILE%\.ssh\id_rsa.pub</span> in Notepad, Select All (Ctrl-A) and Copy (Ctrl-C). Paste it into this dialog in Bitbucket, Your Profile and Settings -> Personal Settings -> SSH keys -> Add key:</div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVhAdb5avPiT-04dqerZniRWIEQgrR2P4NF2pQI4EvUCGvwleKSYZTr_FOpvHRZzSQUfW0p9ZbuQ9qAzN6TIVkLCvcG2WwMCWNQo2RCJejHWVuOBIzbJUY0Y2IeUrd0j4eVs6QN5lcD1Hv/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="669" data-original-width="1026" height="209" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVhAdb5avPiT-04dqerZniRWIEQgrR2P4NF2pQI4EvUCGvwleKSYZTr_FOpvHRZzSQUfW0p9ZbuQ9qAzN6TIVkLCvcG2WwMCWNQo2RCJejHWVuOBIzbJUY0Y2IeUrd0j4eVs6QN5lcD1Hv/" width="320" /></a></div><br /><br /></div><div>The Label is just anything that makes it easy for you to remember what key it is. Perhaps today's date and the name of the computer you have the private key on can be a good start. 
Or just "My public SSH key" works too.</div><h3 style="text-align: left;"><br /></h3><h3 style="text-align: left;">...and/or save the public key to GitHub</h3><div>Go to Settings -> SSH keys -> New SSH key<br /><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAzk3r3J26MbVPAKlzgMxEXCFpdj-U7gJEz371Lf0YRzV95BF-m8ZF8PcleXPg6aVlKrqxNrnHIYoglMrGGY2tw0mM7tkk8v-ucd4KQfqVTqk7K1Act9IFthkiouBx6BFHaN0urkC9nxzv/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="382" data-original-width="921" height="133" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAzk3r3J26MbVPAKlzgMxEXCFpdj-U7gJEz371Lf0YRzV95BF-m8ZF8PcleXPg6aVlKrqxNrnHIYoglMrGGY2tw0mM7tkk8v-ucd4KQfqVTqk7K1Act9IFthkiouBx6BFHaN0urkC9nxzv/" width="320" /></a></div></div><div style="text-align: left;"><br />The Title has the same meaning as Label for Bitbucket, see above.</div><h3 style="text-align: left;"><br /></h3><h3 style="text-align: left;">Remove any Credential Helpers</h3><div>Git credential helpers may conflict with the use of SSH keys, and there is no need for them anyway, so remove them from TortoiseGit in the Settings -> Git -> Credential menu so it looks like this:</div><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6zurGpfLK8ygy6LRMikvO8fDF1cOUABwKTdxNsr6YQ0NBrJMFXopNh1leA27IWNSbMEEls3FBrKYzWpereNnweg7lytQrJr4wXX-XI3BRdeozZqjyLyOdvThpdkaa1FRcq5SB7yuIbvcG/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="593" data-original-width="764" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6zurGpfLK8ygy6LRMikvO8fDF1cOUABwKTdxNsr6YQ0NBrJMFXopNh1leA27IWNSbMEEls3FBrKYzWpereNnweg7lytQrJr4wXX-XI3BRdeozZqjyLyOdvThpdkaa1FRcq5SB7yuIbvcG/" width="309" /></a></div><br /><h3 style="text-align: left;"><br /></h3><h3 style="text-align: left;">Tell Git where to find SSH</h3><div>Set the environment variable <span style="font-family: courier;">GIT_SSH</span> to <span style="font-family: courier;">C:\Windows\System32\OpenSSH\ssh.exe</span><span style="font-family: inherit;"> . Right-click "This PC" -> Properties -> Advanced system settings -> Environment Variables... -> New...<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1Lj5vsAB-dXMy-0BAFuWpBoXCxIYBJ26hjHYBjD2Rd-5N5XV5VMvENZqKGK2uD-ka4OZld7knDklm2sicq5ko2WVPBm6T4dJH_g63CyVeUnTmf-yijeW1vdwjxSZS5fRRHmL64rzWpKT-/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="166" data-original-width="650" height="82" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1Lj5vsAB-dXMy-0BAFuWpBoXCxIYBJ26hjHYBjD2Rd-5N5XV5VMvENZqKGK2uD-ka4OZld7knDklm2sicq5ko2WVPBm6T4dJH_g63CyVeUnTmf-yijeW1vdwjxSZS5fRRHmL64rzWpKT-/" width="320" /></a></div><div><span style="font-family: inherit;"><br /></span></div>Restart explorer (Task Manager -> Details -> explorer.exe right-click -> End Task, then File -> Run new Task -> Open: explorer -> OK) , or logout and login, or restart your computer.</span></div><h3 style="text-align: left;"><br /></h3><h3 style="text-align: left;">Update origin URL to use SSH</h3></div><div>Finally, update your repos origin to use SSH instead of HTTPS. 
The easiest way is to copy the part after 'git clone' in the Bitbucket "Clone" feature.<br /><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdZwF-ctH3JTY2PwksL3IAWDRRDo1V4GqaxptuDhqRDoloBES07yJvxNaeatID_1JXm6mK5eZdJclVSUBkpw01iSpCNH102Ptasao9BLTMCojw50tR6kgar5XxPZkmyjcKS1N_wcnXJ0bV/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="311" data-original-width="1103" height="90" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdZwF-ctH3JTY2PwksL3IAWDRRDo1V4GqaxptuDhqRDoloBES07yJvxNaeatID_1JXm6mK5eZdJclVSUBkpw01iSpCNH102Ptasao9BLTMCojw50tR6kgar5XxPZkmyjcKS1N_wcnXJ0bV/" width="320" /></a></div><br />Click the "Clone" button, Select SSH and then the URL part of the git clone command suggested, and paste it in TortoiseGit Remote origin for the repo:<br /><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTdXPpiA2Vqcyqt-oU-HWkh212A-7NDeEMB3XcOe-MIn0v72kXE5yN-jofzORxrZZM2RyW2ACv48hB9ZpeDgZ3TN_C_S2TFAz0zM2h5BPcur7OFfVzCQuDr5doy89pbs8qPFnlcC4OYgqQ/" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="591" data-original-width="766" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTdXPpiA2Vqcyqt-oU-HWkh212A-7NDeEMB3XcOe-MIn0v72kXE5yN-jofzORxrZZM2RyW2ACv48hB9ZpeDgZ3TN_C_S2TFAz0zM2h5BPcur7OFfVzCQuDr5doy89pbs8qPFnlcC4OYgqQ/" width="311" /></a></div><br /><br /></div><div>Done! Now you can enjoy password-less use of git with Bitbucket and/or GitHub.</div><div><br /></div>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-18571970018584510262021-05-03T23:24:00.000-07:002021-05-03T23:24:40.596-07:00Thinking about cacheability<h2 style="text-align: left;">Performance, Cacheability and Thinking Ahead</h2><p>Although it's very true that "premature optimization is the root of all evil" (Hoare/Knuth), this should never be taken to mean that writing inefficient code is a good thing. Nor does it mean that we should ignore the possible need for future optimizations. As all things, it's a question of balance.</p><p>When designing APIs for example, just how you define them may impact even the possibility for future optimizations, specifically caching. Although caching is no silver bullet, often it is the single most effective measure to increase performance so it's definitively a very important tool in the optimization toolbox.</p><p>Let's imagine a search API, that searches for locations of gas stations within a given distance from a given point, and also filters on the brand of the station.<br /><br />(In a real life application, the actual search terms will of course be more complex, so let's assume that using an underlying search service really is relevant which is perhaps not necessarily the case in this literal example. Also, don't worry about the use of static methods and classes in the sample code, it's just for show.)</p><pre><code>[Route("v1/[controller]")]
public class GasStationV1Controller : ControllerBase
{
[HttpGet]
public IEnumerable<GasStation> Search(string brand, double lat, double lon, double distance)
{
return SearchService.IndexSearch(brand, lat, lon, distance);
}
}</code></pre><p>We're exposing a REST API that delegates the real work to a fictitious search service, accessing an index and providing results based on the search parameters, possibly adding relevance, sponsoring and other soft parameters to the search. That's not important here.</p><p>What is important is that we've decided to let the search index handle the geo location part of the search as well, so we're indexing locations and letting the search index handle distance and nearness calculations, which on the surface of things appears to make sense. The less we do in our own code, and the more we can delegate, the better!</p><p>But unfortunately, it turns out this is a little too slow, and we're also overloading the back end search service, which has a rate limiting function as well as a per-call pricing schedule, so it's expensive too. What to do? The obvious thing is to cache. Said and done.</p><pre><code>[Route("v2/[controller]")]
public class GasStationV2Controller : ControllerBase
{
[HttpGet]
public IEnumerable<GasStation> CachedSearch(string brand, double lat, double lon, double distance)
{
string cacheKey = $"{brand}-{lat}-{lon}-{distance}";
return ObjectCache.GetOrAdd(cacheKey, () => SearchService.IndexSearch(brand, lat, lon, distance));
}
}
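
// ObjectCache above is not a built-in .NET type; it is assumed here to be an application-level
// helper. A minimal sketch of such a helper (an assumption for illustration, not actual
// production code) could combine ConcurrentDictionary and Lazy so that the value factory
// runs at most once per key:
public static class ObjectCache
{
    private static readonly ConcurrentDictionary<string, Lazy<object>> _cache
        = new ConcurrentDictionary<string, Lazy<object>>();

    public static T GetOrAdd<T>(string cacheKey, Func<T> valueFactory)
    {
        // A real cache would also need expiration and size limits; they are left out here.
        return (T)_cache.GetOrAdd(cacheKey, _ => new Lazy<object>(() => valueFactory())).Value;
    }
}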
</code></pre><p>Now we're using our awesome <span style="font-family: courier;">ObjectCache</span> to either get it from the cache, or if need be, call the back end service. All set, right?</p><p>Not quite.</p><p>The location that we're looking to find near matches to is essentially where the user is, which means there'll be quite a bit of variation of the search parameters. In fact, there is very little chance that anything in the cache will ever be re-used. The net effect of our caching layer is just to fill server memory. We're not reducing the back end search service load, and we're not speeding anything up for anyone.</p><p>The thing to consider here is that when we're designing an API that has the potential of being a bottleneck in one way or another, we should consider making it possible to add a caching layer, even if we don't add one to begin with (remember that thing about premature optimizations).</p><p>Avoid designing low-level APIs that take essentially open-ended parameters, i.e. parameters that have effectively infinite variation, and where a given set of parameters is very seldom used twice. It's not always possible, it depends on the situation, but consider it.</p><p>As it turns out, our only option was to redesign what we use the search index for, and move some functionality into our own application. This is often a memory/time tradeoff, but in this case, keeping up to 100 000 gas stations in memory is not a problem, and filtering them in memory in the web server is an excellent option.</p><p>This is what it looks like now, and although we're obliged to do some more work on our own, we'll be fast and we're offloading the limited and expensive search service quite a bit.</p><pre><code>[Route("v3/[controller]")]
public class GasStationV3Controller : ControllerBase
{
[HttpGet]
public IEnumerable<GasStation> Search(string brand, double lat, double lon, double distance)
{
string cacheKey = $"{brand}";
return ObjectCache.GetOrAdd(cacheKey, () => SearchService.IndexSearch(brand))
.Select(g => g.SetDistance(lat, lon))
.Where(g => g.Distance <= distance)
.OrderBy(g => g.Distance);
}
}
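
// GasStation and SetDistance() are not shown in the original; this is just a sketch of what
// such a type might look like, assuming coordinates in degrees and distances in kilometers
// computed with the haversine formula. The names and units are illustrative assumptions.
public class GasStation
{
    public string Brand { get; set; }
    public double Lat { get; set; }
    public double Lon { get; set; }
    public double Distance { get; private set; }

    public GasStation SetDistance(double lat, double lon)
    {
        const double earthRadiusKm = 6371.0;
        double dLat = ToRadians(lat - Lat);
        double dLon = ToRadians(lon - Lon);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
                   Math.Cos(ToRadians(Lat)) * Math.Cos(ToRadians(lat)) *
                   Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        Distance = 2 * earthRadiusKm * Math.Asin(Math.Sqrt(a));
        return this;
    }

    private static double ToRadians(double degrees) => degrees * Math.PI / 180.0;
}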
</code></pre><p>Now we have a more manageable set of search parameters to cache, and we can still serve the user rapidly, without overloading the search service or the budget.</p><p>Taking this a step further, we'd consider moving this logic to the client if that's reasonable, since then we can even let the HTTP response become cacheable, which can further increase the scalability and speed for the users.</p><p>In the end, performance is always about compromises, but the lesson learned here is that even if we don't think we need optimization and caching at the start, we should at least consider leaving the path for it open.</p>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-90763481375542636202021-04-29T08:34:00.002-07:002021-04-29T08:36:23.961-07:00Weird random constants in GetHashCode()<p>Random integers in GetHashCode() for C#/.NET</p><p>When you ask Visual Studio to generate Equals() and GetHashCode() for C#/.NET, as well as when you inspect other code, you often see addition and multiplication of various seemingly random constants as part of the calculation. Here's an example of how it can look:</p><pre><code>public class MyTypeA
{
public MyTypeA(string value) => ValueA = value;
public string ValueA { get; }
public override int GetHashCode() => 250924889 + EqualityComparer<string>.Default.GetHashCode(ValueA);
}
public class MyTypeB
{
public MyTypeB(string value) => ValueB = value;
public string ValueB { get; }
public override int GetHashCode() => -1007312712 + EqualityComparer<string>.Default.GetHashCode(ValueB);
}
public class MyFixedByVsHashCode
{
public MyFixedByVsHashCode(string value)
{
A = new MyTypeA(value);
B = new MyTypeB(value);
}
public MyTypeA A { get; }
public MyTypeB B { get; }
public override int GetHashCode()
{
int hashCode = -1817952719;
hashCode = hashCode * -1521134295 + EqualityComparer<MyTypeA>.Default.GetHashCode(A);
hashCode = hashCode * -1521134295 + EqualityComparer<MyTypeB>.Default.GetHashCode(B);
return hashCode;
}
}
</code></pre>
<p>The above example was generated using Visual Studio 2019 for .NET Framework and contains a number of seemingly random strange integer constants: 250924889, -1007312712, -1817952719 and -1521134295. If you generate code for .NET Core or .NET 5 it may look a little different, but under the hood it's similar.</p><p>Executive summary: The reason for these numbers is to reduce the risk of collisions, i.e. the number of situations where two different instances with different values get the same hash code.</p><p>So what's up with these magic numbers? First of all: No, they're not random. Let's go through them.</p><pre><code>public override int GetHashCode() => 250924889 + EqualityComparer<string>.Default.GetHashCode(ValueA);
public override int GetHashCode() => -1007312712 + EqualityComparer<string>.Default.GetHashCode(ValueB);<br /></code></pre>
<p>These values are derived from the name of the property. Exactly how is not documented and not clear, but it's essentially equivalent to <span style="font-family: courier;">"NameOfProperty".GetHashCode()</span>. The purpose is to add the name of the property to the equation, reducing the risk that two properties with the same value get the same hash code.</p><p>Then we have the integer constants from the multiple property implementation:</p><pre><code>int hashCode = -1817952719;
hashCode = hashCode * -1521134295 + EqualityComparer<MyTypeA>.Default.GetHashCode(A);
hashCode = hashCode * -1521134295 + EqualityComparer<MyTypeB>.Default.GetHashCode(B);
</code></pre>
<p>These are fixed, and do not vary. A little bit of analysis shows they are far from random. The first one, -1817952719, is actually the product of two relatively large primes, 16363 * 151379 = 2477014577, and is thus a nice semiprime; when this is interpreted as a signed 32-bit integer we get -1817952719.</p><p>The second one, -1521134295, when interpreted as an unsigned 32-bit integer is 2773833001 - and that is a nice large prime!</p><p>Using primes and semiprimes as factors and constants in polynomials has been shown to produce numbers with better distribution than other constants.</p><p>So it's all about reducing the risk of collisions.</p><p>But how bad can it get? Actually, very bad... Here follows a seemingly good enough implementation that is similar to many real-world manual implementations. In fact, I've written a good number of similar ones, although hopefully not with as catastrophic a result as this.</p>
<pre><code>public class MyTypeA
{
public MyTypeA(string value) => ValueA = value;
public string ValueA { get; }
public override int GetHashCode() => ValueA.GetHashCode();
}
public class MyTypeB
{
public MyTypeB(string value) => ValueB = value;
public string ValueB { get; }
public override int GetHashCode() => ValueB.GetHashCode();
}
public class MyBrokenHashCode
{
public MyBrokenHashCode(string value)
{
A = new MyTypeA(value);
B = new MyTypeB(value);
}
public MyTypeA A { get; }
public MyTypeB B { get; }
public override int GetHashCode() => A.GetHashCode() ^ B.GetHashCode();
}
internal class Program
{
private static void Main()
{
Console.WriteLine($"Hashcode is: {new MyBrokenHashCode("Something").GetHashCode()}");
Console.WriteLine($"Hashcode is: {new MyBrokenHashCode("Other").GetHashCode()}");
}
}
</code></pre>
<p>The above example produces the following output:</p><pre>Hashcode is: 0
Hashcode is: 0</pre><p>That's not good! Two different instances with two different values not only produce the same hashcode, it's 0! In fact it's worse than not good, it's potentially catastrophic for performance. The scary thing is that everything will still work and look fine during testing, but if these objects are placed in a Hashtable or Dictionary or similar, and in production they grow to a larger number of elements, then indexing these collections degenerates into linear searches in a linked list.</p><p>So what happens?</p><p>Two different types happen to generate the same hashcode for the same underlying value ("Something" or "Other"). That's actually not that unusual. Then we use XOR to combine the hashes, but XOR has the known weakness that XOR-ing identical values will always result in zero, regardless of the values.</p><p>This example is slightly contrived, but it demonstrates that seemingly good-looking code can have subtle pitfalls causing really bad effects.</p><p>Conclusion - Trust the tools and use Visual Studio's generation of GetHashCode code! Even if you don't notice any problems with your own implementations, do regenerate the code when you have the chance.</p>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-18677383295695459842021-04-29T08:33:00.001-07:002021-04-30T01:28:36.330-07:00Iterations and the squaring factor<h2 style="text-align: left;">The power of 2</h2><div>I recently found code that was functionally equivalent to the following:</div>
<pre><code>public class Filter
{
private readonly IEnumerable<string> _old;
public Filter(IEnumerable<string> old) => _old = old;<br />
public IEnumerable<string> WhatsNew(IEnumerable<string> updated) => updated.Where(s => !_old.Contains(s));<br />}
</code></pre><p>Nice, compact and easily understandable. We keep track of an original list of strings, and when we get an updated list we'd like to know what's new in it.</p><p>Or is it, really?</p><p>As I mentioned, I found this type of code, but why did I notice it? Because during debugging, the call to the <span style="font-family: courier;">WhatsNew()</span> method took significant time; it was boring to sit there and wait for it to complete!</p><p>The problem is that if the two collections are of approximately the same size, for example if updated contains a single new string, the typical number of calls to the string comparer is <span style="font-family: courier;">_old.Length * _old.Length / 2</span>.</p><p>In other words, the number of operations grows quadratically with the length of the list; this is typically expressed as O(N**2), read as "order of N-squared". That it's actually on the average divided by 2 doesn't matter for the O() notation, it just means that the number of operations is proportional to N squared.</p><p>In the real-world situation, the number of elements was on the order of 20,000. That's not extraordinarily large in any way, but 20,000 * 20,000 / 2 is 200,000,000!</p><p>That's 200 million operations! That can take real time even on a pretty fast machine.</p><p>The problem is the lookup in the _old list. We need to enumerate the updated one in any case; there's really no way to get around that, given the assumptions here.</p><p>This is where hashtables, and dictionaries (which use hashtables under the hood), and similar collections come into play. A lookup using a hashtable is enormously more efficient, and the total time will approach a linear increase rather than a quadratic one. Here's how it could have been (and subsequently was) coded using a <span style="font-family: courier;">HashSet</span>:</p><pre><code>public class Filter
{
private readonly HashSet<string> _old = new HashSet<string>();
public Filter(IEnumerable<string> old)
{
foreach (string value in old)
{
_ = _old.Add(value);
}
}
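    // The loop above could equally well be written as: _old = new HashSet<string>(old);
    // Either way, Contains() in WhatsNew() below is now a hash lookup rather than a linear scan.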
public IEnumerable<string> WhatsNew(IList<string> updated) => updated.Where(s => !_old.Contains(s));
}
</code></pre><p>Now, our <span style="font-family: courier;">WhatsNew()</span> method will operate in O(N), i.e. the time taken will be proportional to the number of elements, not the square of the number of elements! For larger sizes of the collection, that's a huge gain.</p><p>Obviously there are many variations both to the problem and the solution, but the message here is to be aware of the cost of doing effectively nested iterations of large collections.</p><p>This is also one of those examples of things that might not bite you until it's too late and your application is running in the real world. During testing and unit testing, which is usually done with smaller data sets, all will look well (even though we know we should be testing with expected data sizes, somehow it often doesn't happen). Then, when it scales up in the real world, performance can deteriorate dramatically and quickly!</p><p>This is similar to the old fable of the <a href="https://en.wikipedia.org/wiki/Wheat_and_chessboard_problem">reward in grains of rice</a>. Doubling the list does not decrease performance by half, as many would expect. It decreases performance proportional to the square of the increase! It gets progressively worse, quicker and quicker, and can surprisingly quickly become a critical problem.</p><p>With the updated solution, doubling the list will only decrease performance by roughly half, since the work merely doubles, which is much easier to handle and scale with.</p>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-25723525404405983972021-04-27T22:39:00.002-07:002021-04-27T22:39:53.548-07:00The strange git love of the command line<h2 style="text-align: left;">Git users' love of the command line</h2><div style="text-align: left;"> Why do a vast majority of git users love the command line so fervently? And why do I care?</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Git is a great tool, but as has been said of "C", it's like a sharp knife. Excellent in the hands of an expert, but easy to cut your fingers if you're not.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Using git inexpertly can get you into all kinds of difficult-to-get-out-of trouble. Most developers' main job is to code, not maintain super-complex repositories with 100s or 1000s of contributors. You'd think they would welcome any tool that made the use of git easier and safer, perhaps forgoing some complex scenarios.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">But, sorry to say, nope. Almost all of the developers I know and work with use git from the bash command line. It's also the only thing they use from bash, or any command line. Git has a zillion commands and options, but 9 out of 10 days the only things we do are:</div><div style="text-align: left;"><ul style="text-align: left;"><li>Pull updates from the remote.</li><li>Create feature branches.</li><li>Commit changes locally.</li><li>Push changes to the remote.</li><li>Stash changes temporarily locally.</li><li>Switch branch.</li><li>Merge branches.</li><li>Rebase feature branches.</li><li>Inspect the commit history of a file.</li></ul><div>There are excellent graphical user interface frontends for git, in all environments. 
I'm on Windows, and I happen to be fond of <a href="https://tortoisegit.org/" target="_blank">TortoiseGit</a>, but there are others.</div><div><br /></div><div>The great thing about these tools is that, for one, they are 100% compatible with each other and the command line, since they all use the same underlying implementation of git.</div><div><br /></div><div>Nobody, not even the most hard-core developers, uses a command line editor or a command line debugger for day-to-day development. We're simply more productive with graphical full screen integrated development environments like Visual Studio, and less prone to mistakes. There's a reason we're not using <a href="https://almy.us/teco.html" target="_blank">TECO</a> any more, even if it was awesome in its time and still is in some ways - I loved it!</div><div><br /></div><div>The same applies to source code control. It's simply faster, more convenient and less error-prone to use a tool that integrates with your graphical environment, be it Visual Studio or Explorer or whatever your local equivalent is.</div><div><br /></div><div>But, for some strange reason, with git it suddenly must be done from the bash prompt. I try to explain, I try to show, but the command line fixation remains. Command line git users repeatedly get into trouble and have to reset their local repository clones. Command line users frequently take much longer to inspect history, and also often just skip doing things like rebasing the feature branch before merging to the main development branch, causing the commit history to become much more complex. Just because it's too complicated to do from the command line.</div><div><br /></div><div>Still, they persist.</div><div><br /></div><div>I get that there are some rare cases where the command line is needed. I get that for batch operations, such as merging a bunch of repositories from perhaps the development branch to the main branch, a script using the command line is perfect.</div><div><br /></div><div>But the command line is not gone just because you use a modern tool for day-to-day use. It's still there when you need it.</div><div><br /></div><div>Why do young, gifted, competent developers insist on using a tool model that originates in the 60's when there are so many better alternatives - alternatives they happily use in every other case?</div><div><br /></div><div>I don't get it. But it is a problem, because it affects productivity both directly in daily use and indirectly because it tends to cause more complex commit histories that must be untangled in the future.</div></div>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-62378750492795920242021-04-26T00:11:00.000-07:002021-04-26T00:11:42.561-07:00Code should work every time<h2 style="text-align: left;">The Philosophy of Black and White</h2><div>I have a background in real time operating system kernels, compilers and encryption.</div><div><br /></div><div>In all of these contexts, it is self-evident that a piece of code either works every time, or else it's broken and it must be fixed immediately.</div><div><br /></div><div>Code that works most of the time, or even almost every time, just won't cut it.</div><div><br /></div><div>If there are special circumstances during heavy load, or during startup or shutdown or just with bad enough "luck" that the software does not work as expected, it needs fixing. Perhaps it controls heavy machinery, compiles your code or encrypts your data. 
Right?</div><div><br /></div><div>Would you feel comfortable with a compiler that emitted faulty code every now and then, depending on how heavy the system load was? Of course not.</div><div><br /></div><div>I have taken this view to heart, and in my mind there are really only two types of software:</div><div><ol style="text-align: left;"><li>Software that behaves consistently every time, regardless of load or timing.</li><li>Broken software.</li></ol><div>(Software of type 1 can of course still be broken in other ways - but it should then be consistently broken!)</div></div><div><br /></div><div>Here I'd like to argue that this is just as applicable to a national informational website, a booking system, or even a game.</div><div><br /></div><div>Software should behave consistently every time, else it's broken.</div><div><br /></div><div>Unfortunately it's often hard to achieve, especially as so much code today runs in a web server environment, which is an inherently multi-threaded environment with a high degree of parallelism. Most issues with software that behaves inconsistently come from parallelism, and race conditions.</div><div><br /></div><div>A race condition is when two different actors invoke code at the same time, and the outcome depends on who wins the race.</div><div><br /></div><div>There are numerous mechanisms, and even whole books, dedicated to solving these problems.</div><div><br /></div><div>My purpose here is not to talk about all those ways, but rather to argue that as developers we should feel that it's important that our code works consistently every time.</div><div><br /></div><div>In real life, I often have to argue about these matters, and I'm not infrequently met with the basic opinion "well, it really doesn't happen that often and it looks like hard work to fix so we'll live with it".</div><div><br /></div><div>The problem is that these things tend to grow worse exponentially with increased load. So, if you're working on something that is expecting fewer and fewer users and lower and lower load, well maybe you can get away with "it's not worth fixing".</div><div><br /></div><div>If you're like most of us, working on something that expects increased load and continued development, remember that bugs do not age well. They just get worse and worse, until it's really bad.</div><div><br /></div><div>So, while perfectionism overall is not always a winning strategy, for consistency under load always strive for perfection. Otherwise you're building software broken by design.</div><div><br /></div><div>For these matters, it really is black or white. Either the code works every time, or it's broken.</div>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-39736626105787574122021-04-23T07:36:00.000-07:002021-04-23T07:36:36.640-07:00Using a search index as a database is a bad idea<h2 style="text-align: left;">The case</h2><p>I am currently working on a project where we are using a CMS product, in conjunction with a search service based on Elasticsearch as well as some back-end APIs.</p><p>Elasticsearch is an incredibly powerful search service, and you can do almost anything with it.</p><p>But should you?</p><h2 style="text-align: left;">What we did</h2><p>With a CMS that does not support storing of arbitrary data particularly well, it is tempting to look for creative alternatives. It's always a big step for example to add support for an OR mapper or any other custom table or database. 
It's not necessarily a good idea, if it can be avoided.</p><p>In this situation we hit upon the idea of using the search index to store data fetched from a back-end API. After all, it can serialize and index just about any .NET type - so why not add some data carrying properties to our custom index object?</p><p>I had a bad feeling about this; my spidey-sense started tingling... I was thinking that an index is something fairly approximate, and that it's only intended to make it as easy as possible to find data. Not to store it. Hmm...</p><p>In the team I tried to argue along these lines, but I had no luck. So off we went, starting to store more than searchable text and back-references to the actual data.</p><h2 style="text-align: left;">What happened</h2><p>So now we're in trouble. Not really deep trouble as yet, but it's just not a good idea as it turns out. We're getting inconsistent states and the code can't trust what it sees. It works, sort of, kind of, most of the time but...</p><p>The problem is basically that when you store your data in a database or a file, you expect consistent and reproducible behavior, every time. If you don't get that, the assumption is that something is broken.</p><p>When you store your data in a search index, this just does not apply; here are some of the reasons:</p><p></p><ul style="text-align: left;"><li>The index never promised to give a consistent view! Two reads can give different results; in Elasticsearch there's a concept of shards, for example, that can cause this behavior.</li><li>The index never promised that a write is immediately or deterministically reflected in a subsequent read. This is due both to caching and to queueing behavior in the index, since the assumption is that you're basically requesting an index update that you'd like to be effective asap - but not guaranteed immediately.</li><li>The index has a rate limit; it's perfectly ok for it to say that it's too busy, since the assumption is that at worst you lost an index update. No data is lost. With a database or a file etc., if that happens you simply have to gear up; it's a fatal error situation.</li></ul><div>Specifically, we're now in a situation where our code won't always work, mostly if we're "too fast". If we wait a few minutes between writing and expecting to read it back, it'll often, but not always, work.</div><h2 style="text-align: left;">Conclusion</h2><div>Don't use a search index as a data store. Even if it looks cool and easy, don't. I don't know exactly what problems you'll run into, but I'll bet a beer or soda that it'll cause you some unexpected grief.</div><p></p>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-6807500971208878392014-04-14T01:19:00.000-07:002014-04-14T01:19:07.291-07:00AxCrypt, Xecrets and the OpenSSL Heartbleed security issue<div dir="ltr" style="text-align: left;" trbidi="on">
<h2 style="text-align: left;">
Information about Heartbleed</h2>
On April 7, a <a href="https://www.openssl.org/news/secadv_20140407.txt">security advisory</a> was published concerning OpenSSL. The security vulnerability described has been given the popular name 'Heartbleed'. OpenSSL is a software library component commonly used in web servers supporting encrypted communication with clients using SSL.<br />
<br />
This issue probably affects the majority of web servers in the world, and is about as serious as a security issue can be. It's arguably the most dangerous vulnerability the Internet has seen.<br />
<br />
However, it does not in any way affect the security of <a href="http://www.axantum.com/AxCrypt/">AxCrypt</a> file encryption or <a href="http://www.axantum.com/Xecrets/">Xecrets</a> online password manager.<br />
<br />
In the case of AxCrypt, simply because AxCrypt is not a web server, and does not use SSL in any way.<br />
<br />
Xecrets is an online service, using a web server, and does use SSL but it is still not vulnerable because OpenSSL is not used, i.e. the faulty component is not part of the software used by Xecrets. There is no indication that the Certificate Authority used by Xecrets has been compromised, so connections to https://www.axantum.com/ are still to be trusted fully as before.<br />
<br />
You do not need to change passwords or passphrases for AxCrypt-encrypted files or your Xecrets account <b>unless you use that same or similar password somewhere else</b>.<br />
<br />
<br /></div>
Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com7tag:blogger.com,1999:blog-7423109771980410273.post-83604615651616724302013-10-02T03:30:00.001-07:002013-10-02T03:30:27.455-07:00The Lesser Evil, avoiding Copy and Paste<div dir="ltr" style="text-align: left;" trbidi="on">
I'm a great supporter of <a href="http://www.pearsonhighered.com/educator/product/Clean-Code-A-Handbook-of-Agile-Software-Craftsmanship/9780132350884.page">clean code</a>. <a href="http://blog.axantum.com/2013/04/the-shortest-book-on-good-programming.html">My own take</a> on this can be found in an earlier post. The most common issue that I find is Copy and Paste-programming, and the most common explanation is lack of time. The problem is that it seldom saves time, even in the short run.<br />
<br />
Copy and Paste-programming is a time thief. Every time. Even that deadline you have in 2 weeks, 2 days or 2 hours is endangered.<br />
<br />
I'm hoping this post may inspire some hard-pressed-for-time developers to find the resolve to tackle the demon that is Copy and Paste, and come out feeling a little bit better and actually delivering more in less time.<br />
<h4 style="text-align: left;">
The rationale</h4>
You probably know that Copy and Paste is a bad thing. Unfortunately most focus is on long-term maintenance aspects when explaining why it's bad. This makes for a perfect rationale when you're in a hurry. "<i>I'll Copy and Paste this now, just to get the functionality done in time. I know I'll have to pay for this later during maintenance, but I don't see any other choice.</i>".<br />
<br />
I'd like to point out that the problems with Copy and Paste are much more severe than this, and that there really are other choices! Remember that evils are seldom equal, and if you have two evils to choose from, go for the lesser evil!<br />
<h4 style="text-align: left;">
The maintenance argument</h4>
The main problem with the maintenance argument is that maintenance is not strictly separated from development. If you're lucky enough to work in a really agile environment, there's really no such thing as maintenance anyway, just a continuous number of releases. So which release/sprint/deploy should pay the cost of later, in favor of now?<br />
<br />
It's seldom old code that you're Copying and Pasting. It's typically relatively new code, which means that it's still in rapid movement. Chances are that within those 2 weeks/days/hours to release or end of the sprint, you'll have revisited that Copy and Pasted code more than once. In which case you'll have to propagate the changes to both copies, or forget one, and then have to bugfix it during acceptance testing - or worse, hotfix it after deployment.<br />
<br />
In either case, you're paying the cost for the Copy and Paste, not in that magical later slow phase of maintenance, but right now, when time's most at a premium. You're not saving time. You're losing time, and the reason you have so little time to lose is partially because of many small decisions like this.<br />
<br />
Why is it likely that it's new code you're Copying and Pasting? Because otherwise it's unlikely that you know that the code is there to Copy! The other reason is that the most stressed-for-time releases are the major ones, especially the first. These are times where there is lots and lots of new code, also increasing the chances that it's new code. Finally, if the code is really old and stable, there's at least a chance that the functionality in question really has been factored out to common ground and there's no need any longer to Copy and Paste.<br />
<br />
So, the "<i>we'll gain time now, and pay for it later but that's ok</i>"-argument is simply wrong. You're not gaining time now. You're losing time now.<br />
<h4 style="text-align: left;">
The lesser evil</h4>
<div>
There are however options. Copy and Paste is the greater evil, but what are the lesser evils? For example:</div>
<div>
<ul style="text-align: left;">
<li>Extract the code snippet to a general miscellaneous utility type (see the sketch just after this list). So, now you have a type with various disconnected pieces of code. A bucket full of... That's a lesser evil.</li>
<li>Move it to a common base class. So, now you're putting support code in a base class, and using inheritance to extend rather than specialize. That's a lesser evil.</li>
<li>Make the code snippet in question a public method just where it happens to be. So, now you have an unholy dependency between two types that probably should not have the dependency. That's a lesser evil.</li>
</ul>
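<div>
As a minimal sketch of the first option (the helper and its name are made up purely for illustration): instead of pasting the same snippet into a second place, park it in a miscellaneous utility type and call it from both call sites.</div>
<div>
<br /></div>
<pre><code>// A bucket full of disconnected helpers - not pretty, but there is only one copy of the logic.
public static class MiscUtil
{
    // The snippet that was about to be Copied and Pasted (hypothetical example).
    public static string NormalizeOrderNumber(string orderNumber)
    {
        return orderNumber == null
            ? null
            : orderNumber.Trim().ToUpperInvariant().Replace(" ", string.Empty);
    }
}</code></pre>
<div>
Both call sites now call MiscUtil.NormalizeOrderNumber(), and when a pattern eventually emerges it is a safe, compiler-assisted refactoring to move the helper to a better home.</div>
<div>
<br /></div>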
<div>
All of these lesser evils are explicitly known to the compiler and fairly easy to safely refactor later, and it might even be a good strategy since usually a pattern emerges after a while and it becomes clearer just where the common code should reside. There's only one copy of the code in either of these lesser evil alternatives, and that has to be a good thing!</div>
</div>
<div>
<br /></div>
<div>
Even if you never get to the refactoring stage, these are still lesser evils than the alternative - Copy and Paste.</div>
<div>
<br /></div>
<div>
Hopefully some readers will find some supporting arguments here, and useful techniques for reducing the amount of Copy and Paste.</div>
<div>
<br /></div>
</div>
Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-60030946730766438712013-06-09T10:03:00.000-07:002013-06-09T10:03:29.194-07:00Aggressive Coding<div dir="ltr" style="text-align: left;" trbidi="on">
<h2 style="text-align: left;">
Why you should code aggressively, not defensively</h2>
<div>
Defensive coding is a concept that has its origin in the very first ideas about programming as a craft, at least 40 years ago. <a href="http://en.wikipedia.org/wiki/Defensive_programming">Wikipedia</a> describes it as something "<i>intended to ensure the continuing function of a piece of software in spite of unforeseeable usage of said software</i>". Apparently principles that can be characterized as defensive coding techniques are still being taught, or at least not actively discouraged.</div>
<div>
<br /></div>
<div>
Defensive coding sounds good in theory, but in practice it tends to exacerbate the problems in question, and clutter up the code, making it harder to understand and refactor.</div>
<div>
<br /></div>
<div>
A typical idea in defensive coding is the <a href="http://en.wikipedia.org/wiki/Defensive_programming">Wikipedia </a>example of copying strings from one buffer to another. The idea is that if the caller provides a longer source string than expected, this might, in the case of C/C++, open the door to the classic buffer overrun security vulnerability.</div>
<div>
<br /></div>
<div>
The defensive coder will, as <a href="http://en.wikipedia.org/wiki/Defensive_programming">the example shows</a>, provide a function that checks the maximum buffer length and silently refuses to copy more than that to the destination buffer. <i>This is bad and dangerous!</i></div>
<div>
<i><br /></i></div>
<div>
The defensive coder has now just hidden a serious bug in the calling code. If the contract states that 1000 characters is the maximum length of the input, the caller must ensure this <i>and the callee must refuse to accept anything that violates the contract</i>.</div>
<div>
<br /></div>
<div>
The aggressive coder will instead throw an exception or simply terminate the program if the function is called with a source larger than the allowed 1000 characters. <i>This is safe and secure programming</i>!</div>
<div>
<br /></div>
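<div>
In C# terms, the difference might look something like the following sketch. The names are made up for illustration, and the destination buffer length plays the role of the 1000 character contract in the example above.</div>
<div>
<br /></div>
<pre><code>// Defensive: silently copies at most what fits, hiding the caller's contract violation.
public static void CopyDefensive(char[] destination, string source)
{
    int count = Math.Min(source.Length, destination.Length);
    source.CopyTo(0, destination, 0, count);
}

// Aggressive: the contract says the source must fit; a violation is reported immediately.
public static void CopyAggressive(char[] destination, string source)
{
    if (source.Length > destination.Length)
    {
        throw new ArgumentException("Source does not fit in the destination buffer.", nameof(source));
    }
    source.CopyTo(0, destination, 0, source.Length);
}</code></pre>
<div>
<br /></div>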
<div>
In terms of my current preferred language C#, I see this principle violated in a variety of ways. One frequent pattern is checking return results from other library functions for NULL, or empty strings etc, and then attempting to silently do something even though it was unexpected. This typically indicates that the programmer does not know the contract of the method s(he) is calling. <i>Do know the contract and make sure to follow it when providing input, and assume that it is followed for the produced output!</i></div>
<div>
<br /></div>
<div>
I recently rewrote a major piece of functionality for a client, and this also involved updating and refactoring the dependent code. To my horror, a huge amount of code was devoted to checking the outputs of other methods, even to the extent of catch-all clauses silently ignoring any and all problems.</div>
<div>
<br /></div>
<div>
Instead of checking outputs from other code because '<i>maybe </i>it can return a NULL' - find out if it can, and what the appropriate action is! If you can't find out with reasonable effort and there doesn't seem to be a useful response, just use the return value and let your code blow up with a <span style="font-family: Courier New, Courier, monospace;">NullReferenceException </span>should it in fact happen. This will alert you to the problem, and you can then find out what you really should do about it. Do check the inputs to your own code, and when the caller violates your contract, report this with an exception.</div>
<div>
<br /></div>
<div>
Controlled crashing is good when it's because a caller violates a contract!</div>
<div>
<br /></div>
<div>
Aggressive coding increases the chance of problems being caught and fixed early, and reduces the amount of clutter in the code immensely. This in turn lets you concentrate on what your code should do, instead of what someone else's code should not.</div>
</div>
Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com2tag:blogger.com,1999:blog-7423109771980410273.post-89502993055562487722013-04-25T11:17:00.000-07:002013-04-25T23:19:39.240-07:00The shortest book on good programming, ever!<div dir="ltr" style="text-align: left;" trbidi="on">
<h2 style="text-align: left;">
The Coders Decalogue</h2>
This text is about making software work better and saving huge amounts of time, irritation and frustration for developers, users, customers and other stakeholders in the software business. Which likely means you.<br />
<br />
When not developing my own software <a href="http://www.axantum.com/AxCrypt/">AxCrypt </a>and <a href="http://www.axantum.com/Xecrets/">Xecrets</a>, I work as a contractor and consultant. I deal with both new and old software, and I do quite a bit of advanced troubleshooting and performance optimization in the .NET area.<br />
<br />
<b><i>Over the years, I've come to realize that I spend most of my time as a developer doing things I wouldn't need to do if just a few simple rules were followed</i></b>. I'd still have more than enough to do, no worries, but I'd be delivering much more real value to my customers for each hour spent. And so would millions of other developers. Come on - this is really not that hard!<br />
<br />
I won't explain the rationale here, or give lots of pedagogical examples. That would turn this into a real book, which would be nice. But I don't have time to write a book, and you probably don't have time to read one.<br />
<br />
So just trust me on this ;-) Really.<br />
<br />
<ul style="text-align: left;">
<li><b style="font-style: italic;">Do write code for humans. </b>Smart one-liners, compact code, use of sneaky language constructs etc may not break your program. But it's not enough that the compiler understands the code. <b><i>Don't write for the compiler</i></b>, write your code in a style to make it as easy to read for humans as possible.</li>
</ul>
<ul style="text-align: left;">
<li><i><b>Don't copy and paste code</b></i> with any kind of logic (ifs, loops, selects etc). <b><i>Do always factor out common snippets</i></b>. Even when you're in a hurry. Single-liners without logic are ok. That's called a statement in most languages, and you do need a few of those to make something happen and they can't all be unique.</li>
</ul>
<ul style="text-align: left;">
<li><b><i>Don't check in commented-out code</i></b>. It's ok when you're trying out the new code - but when you're done, you're done. Since the code resides in a version control system anyway (right?), the old code is still available in the history. <b><i>Do check in clean code frequently</i></b>, always improving it slightly at the very least.</li>
</ul>
<ul style="text-align: left;">
<li><b><i>Do use long and descriptive class, method and member names</i></b>. Letters in your source code are cheap. Use them freely. <b><i>Don't abbreviate</i> </b>unless it's an industry or domain standard. </li>
</ul>
<ul style="text-align: left;">
<li><b><i>Don't comment code to explain what it does</i></b>. If you need comments to explain the code, fix the code instead so it's understandable. If you release libraries, use structured comments for public classes and methods to document intended usage patterns, assumptions and contract details. <b><i>Do comment why</i></b> the code does what it does, when it's not obvious.</li>
</ul>
<ul style="text-align: left;">
<li><b><i>Don't nest if-statements or loops</i></b>. In some special cases, one-liners inside a nested if may be ok. <b><i>Do use early-exit and write small methods</i></b> to remove the need for nesting inside a method (a small sketch follows after this list).</li>
</ul>
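<div>A small sketch of the early-exit style (illustration only, with a made-up example):</div><pre><code>// Nested style - the interesting logic hides inside the braces:
public static string ClassifyNested(int? age)
{
    string result = "unknown";
    if (age.HasValue)
    {
        if (age.Value >= 18)
        {
            result = "adult";
        }
        else
        {
            result = "minor";
        }
    }
    return result;
}

// Early-exit style - each case is handled and then we're done with it:
public static string Classify(int? age)
{
    if (!age.HasValue)
    {
        return "unknown";
    }
    if (age.Value < 18)
    {
        return "minor";
    }
    return "adult";
}
</code></pre>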
<ul style="text-align: left;">
<li><b><i>Don't catch exceptions unless you know why you're catching them</i></b> and what to do about it. Never catch all exceptions, except at the top of a given thread's call hierarchy and then only if consequences of not catching it dictate the need. If you do, log it! <b><i>Do program to avoid exceptions</i></b> when you know the conditions to prevent it happening in the first place.</li>
</ul>
<ul style="text-align: left;">
<li><b><i>Do write short methods that do one thing and are named accordingly</i></b>. If a method does not fit on a screen of a reasonable size, then it does too much. If you have trouble naming it properly, it probably does too many things. <b><i>Don't write long methods</i></b> that you need to scroll to see all of.</li>
</ul>
<ul style="text-align: left;">
<li><b><i>Don't try to be smart</i></b>. When there is no known need for advanced or smart solutions, <i style="font-weight: bold;">do keep it simple </i>and use simple standard patterns<i style="font-weight: bold;"> </i>until you know it needs special treatment. </li>
</ul>
<ul style="text-align: left;">
<li><b><i>Don't optimize unless you know you need to</i></b>. You'll know by measurements using performance profilers. This is not the same thing as writing inefficient code. <b><i>Do write efficient code according to best practices</i></b> that avoids known pitfalls and bad design. Performance optimizations come on top of that, for example caching or special-purpose thread-synchronization constructs, and are to be avoided until the need is proven.</li>
</ul>
<ul style="text-align: left;">
<li><i><b>Do always step through your code at least once</b></i> to verify your assumptions about its behavior. <b><i>Don't trust just running the application</i></b> and being satisfied when it appears to work.</li>
</ul>
<div>
This is in no way the complete zen of good programming, nor is it revolutionary or unique. All of this has been said before. I'm sure you'll have your own pet peeves you'd like to add to the list. I have a few of my own, but the idea here is to list important things that are really super-simple to do. Now.</div>
<div>
<br /></div>
<div>
I am absolutely convinced that if these rules are followed, overall productivity in the software industry will rise dramatically.</div>
<div>
<br /></div>
<div>
If you're a developer, are there any of these rules you honestly disagree with? Do you work like this already? If not, try it out! Use peer-reviews to discuss your check-ins with this list as a guideline.</div>
<div>
<br /></div>
<div>
It's really this simple.</div>
<div>
<br /></div>
<div>
PS - There are 11 rules here. I'd like to get it down to 10 as the title indicates. Cast your vote on which one should go! Or perhaps what needs to be added, but then you'll have to drop two... ;-)</div>
</div>
Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com4tag:blogger.com,1999:blog-7423109771980410273.post-60166020482614288682012-08-04T13:50:00.000-07:002012-08-04T13:50:01.628-07:00Security, compatibility and backup<br />
Users of AxCrypt are obviously concerned about the security of their files. However, there is some confusion about just what security means.<br />
<br />
Encryption means security from others reading the data. In the case of AxCrypt, it also means that undetected modification of the data is not possible.<br />
<br />
Encryption does not mean security from data loss for any number of reasons, such as accidental deletion, ransom attacks by hackers where AxCrypt even has been known to be used by the black hats, or hard disk crashes.<br />
<br />
In fact, encryption adds another level of processing to the files, actually increasing (albeit very slightly, but still) the risk of something going wrong. If you think about it - the more you do, the higher the risk of a snafu. That doesn't mean AxCrypt is dangerous, it just means what it means - the more operations you perform the higher the risk is, as counted in number of failures per million for example.<br />
<br />
In this day of rapid development on all fronts, there's always the question of data compatibility across computers and program versions.<br />
<br />
All AxCrypt-versions from 1.0 to the current 1.7 in both x86 and x64 bit versions are compatible with each other, so no worries there. AxCrypt will always be upwards compatible, so version 2.0 may in fact in the future produce encrypted files 1.7 can't read - but version 2.0 will always be able to decrypt anything an older version has produced. But, at this time, all versions are in fact compatible.<br />
<br />
Also, AxCrypt-encrypted files are not tied to any particular installation in any particular computer, and uninstalling AxCrypt won't decrypt any files any more than uninstalling Word converts your documents to Notepad text files. If you have the file, and know the password, you can always decrypt it in any computer where you can get one of the various versions of AxCrypt running.<br />
<br />
Now to the most important message about security, in the meaning of keeping your data safe not only from prying eyes - but from any number of catastrophes.<br />
<br />
<b>Your most important and powerful protection against data loss is spelled 'BACKUP'.</b><br />
<br />
Please ensure that you have backups of all your data, encrypted or otherwise, and that you keep a reasonably recent version of the copy off-site, and that you periodically do check that you in fact can read the backup and that the expected data is really on the backup media.<br />
<br />
Personally I backup to two USB-drives that I swap once every few weeks, always keeping at least one drive off-site. It's cheap, it's effective and it's very safe since all the data on the backup is encrypted.<br />Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-45417183683777718862012-07-23T14:15:00.001-07:002015-06-05T06:25:35.974-07:00Anti-Malware Vendors - here we go again with another round of FUD...Over the years, I've been <a href="http://blog.axantum.com/2011/08/concerning-false-positive-reports-about.html">periodically plagued</a> by false positives reported for <a href="http://www.axantum.com/AxCrypt/">AxCrypt </a>by various anti-malware vendors. These small-time, opportunistic, shady vendors like Microsoft, ESET, McAfee, Avast et al. have a long history of just flagging anything they please as malware, and be damned the consequences.<br />
<br />
I am a small one-person operation providing free strong encryption software for personal privacy and security. I have over a decade and perhaps 20 million downloads of faultless operation on record. Nevertheless, at least once a year, these companies start reporting my software as malicious, causing me and my users no end of grief.<br />
<br />
Why will not a single one of them just for once take responsibility for their actions? I have not received as much as one single communication from them. Not once. Not when they flag my software falsely as malicious. Not when they rescind that flagging, as they inevitably do when enough users get suspicious and start questioning the reports.<br />
<br />
Now, in 2012, it's starting again. This time because I'm trying to make some small revenue using bundled advertisements for other software with the installer, in order to be able to spend some more thousands of hours developing free software. For more specifics about that particular choice read <a href="http://www.axantum.com/axcrypt/freeware.html">here</a>.<br />
<br />
As a current example, take a recent <a href="http://www.microsoft.com/security/portal/Threat/Encyclopedia/Entry.aspx?name=Adware%3aWin32%2fOpenCandy&threatid=159633">report from Microsoft</a> concerning the <a href="http://www.axantum.com/AxCrypt/Freeware.html">adware bundle</a> that <a href="http://www.axantum.com/AxCrypt/">AxCrypt </a>uses - which at the time of writing is actually a <i>disclaimer of a recent false positive</i>. This causes uncertainty and fear for my users, but what does Microsoft care? Did they ask before flagging? Did they report when they removed the flag?<br />
<br />
A different example is some recent reports about <a href="https://www.virustotal.com/url/fd67ea99f3492374ca32a911415fd12eb8e0ec5b3b4ff463d57a5420d649d125/analysis/1343076298/">my site</a> and <a href="https://www.virustotal.com/file/40eb871fa5e9efcec1103fb5105563151316449a9bcf470d85cef6c651786779/analysis/1342958999/">my software</a> from virustotal.com, which is even worse, because these guys hide behind the additional screen of being an aggregator - so they don't even have to take any responsibility at all, they're just forwarding information uncritically. This is a free service, so you can't even complain.<br />
<br />
What can you as a user do? I don't really know - missing out on great, safe and free software because of fear, uncertainty and doubt seems the most likely outcome. Or, you may start to at least make your voice heard when these situations arise.<br />
<br />
When your Anti-Malware software reports a false positive - demand your money back!<br />
<br />
What can I do? I don't know that either. If you have any ideas on how I can protect my reputation and continue to provide free, safe security software - do let me know.<br />
<br />
I'm getting tired of this. How much cr*p must I take to write and publish free software for your security and integrity?Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com3tag:blogger.com,1999:blog-7423109771980410273.post-71157290395231423862012-07-23T05:58:00.000-07:002012-08-06T07:38:26.402-07:00AxCrypt used for ransom attacksIn October 2011 I got an e-mail from a Turkish corporation, claiming that someone had hacked their server to the extent of getting full administrator access. Thereafter the hacker had installed AxCrypt and encrypted all or most of the files on the server, and subsequently demanded a ransom from the company owning it.<br />
<br />
At first I was very sceptical - how could someone get that kind of access to a server, and then hit on the idea to use AxCrypt to encrypt the files (for which it is workable, but not really well suited since it for example requires full administrative permissions to install, not just write permission to the files). On top of that - no backups, the only copy of the files was apparently the files on the server.<br />
<br />
It seemed just too bad to be true. A file server wide open to remote login with administrator permissions and a guessable password, with no backup routines? My first guess was that this was some kind of scheme to see if I would respond, "Sure, there's a backdoor into AxCrypt - just pay me a small amount and promise not to tell anyone and I'll help you out.". Sorry, no such (bad) luck. AxCrypt does not have any backdoors, and I can't be of help.<br />
<br />
Now, in July 2012, I've had an additional few similar e-mails and even a few phone calls, in total about 10. All of them from Turkey. Strangely enough the contacts have escalated; at first it was only e-mails which were not responded to when answered, then the e-mails started getting answered, then English-speaking persons were calling from Turkey - now most recently Swedish-speaking persons are calling from Sweden, still referring to problems originating in Turkey.<br />
<br />
I'm still at a loss to really explain the phenomenon, but I'm now tending towards actually believing that the basic facts are true. Servers and perhaps also personal computers are being hacked (it's not entirely clear just what kind of computers have been hacked). That so far every single incident has been in Turkey is, I believe, due to the simple fact that the hacker is likely to be Turkish. A significant number of these hacks seem to occur during the weekend, so it's also likely that the hacker has a day job too, which is somewhat comforting since it implies that the 'business' is not very profitable.<br />
<br />
<b>If you happen to be the victim of a ransom attack</b>, in Turkey or elsewhere, I am very sorry for your sake but please understand that I cannot be of any help whatsoever. You must contact your local police authorities and get them to investigate. They should be motivated to do so, since apparently this is not that infrequent - once again assuming that the stories I hear are actually true as told.<br />
<br />
I've tried to come up with some way to make AxCrypt even less suitable for the purpose of ransoming, but I really can't think of anything. It's just a tool, and if you let the hacker into your system with full administrator permissions, I don't think there's anything anyone can do - except you, and that is to have backups!<br />
<br />
This is not an AxCrypt issue. This is a security policy issue at the victim's site.<br />
<br />
The hackers are not even being that smart by using AxCrypt. To perform the attack they don't really have to install anything - all they have to do is encrypt the file system with EFS, the Encrypting File System which is an integral part of all modern Windows editions, export a recovery certificate and then reset the administrator password. Done. No need for extra tools such as AxCrypt. On top of that, there are literally hundreds of alternative encryption tools out there, all of them potentially 'useful' in this context. I guess in a twisted kind of way I should regard it as a compliment that AxCrypt is so easy to use and secure that even hackers want to use it!<br />
<br />
Remember that <b>backups are your final protection against data loss</b>, regardless of the cause. Go check your backup routines now - and validate that you actually can read the backups regularly as well!Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com7tag:blogger.com,1999:blog-7423109771980410273.post-37303748698382823572011-10-29T12:00:00.000-07:002011-10-29T12:00:03.633-07:00About Xecrets and the XML Encryption VulnerabilityOn October 19, researchers at the Ruhr-Universität Bochum <a href="http://aktuell.ruhr-uni-bochum.de/pm2011/pm00330.html.en">announced </a>a flaw in W3C XML Encryption.<br />
<br />
The <a href="http://www.axantum.com/Xecrets/">Axantum Password Manager Xecrets</a> uses XML Encryption to store data on our servers.<br />
<br />
This <u>does not mean</u> that Xecrets is vulnerable to attack.<br />
<br />
The flaw only works in an attack against a server that knows the encryption key, and that can be queried about the result of attempted decryption of partially modified encrypted data. It is based on the fact that most implementations will happily decrypt the provided data using the secret key and then give different error messages if the decrypted data cannot be parsed as XML. These varying error messages can then be used to infer the original data, but not the actual encryption key.<br />
<br />
Xecrets on the other hand never accepts encrypted XML in this way, nor does it know any user's encryption key except briefly during the user's visit.<br />
<br />
The XML Encryption flaw does not affect Xecrets.Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com3tag:blogger.com,1999:blog-7423109771980410273.post-35260936682111162982011-08-23T03:04:00.000-07:002011-09-30T03:05:30.611-07:00Why you should install programs to the default locationAxCrypt since version 1.7 does not have an option for the user to select installation directory during the installation process.<br />
<br />
Some like to change the installation directory, typically to D: or E:, instead of the standard location typically on C: and on English versions of Windows in C:\Program Files\ or C:\Program Files (x86) . This is no longer directly possible from the installation graphical user interface of AxCrypt, and sometimes I get asked why.<br />
<br />
The main reason is to avoid trouble, and to minimize user options where I as a developer believe I can make a better informed choice. AxCrypt is built around many such decisions based on that premise; we choose the algorithms to use instead of providing you with a bewildering array of choices, for example. This is simply because I as an encryption expert believe that I can make this choice better in at least 99.9% of the cases and thus spare all those users a strange question they don't really know how to or even want to answer.<br />
<br />
With several millions of installations of AxCrypt, just about anything that can possibly go wrong has gone wrong at least once or twice. More than twice I've had to help users with trouble caused by not understanding the interaction between Windows, the registry, fixed, removable and network drives and the AxCrypt installation. AxCrypt has been installed to network drives, on remote VPN-mounted drives, on USB drives, on CDs and just about anything you can imagine. Often it works, but sometimes it does not.<br />
<br />
With AxCrypt 1.7 and the upgrade to use Windows Installer technology, a major motivation was increased robustness. Part of this is to minimize the risk of a user mistakenly making a bad choice, and the safest and easiest way to do this is to make the choice automatically. Thus the option to select an installation directory was removed from the installer graphical user interface. It's still there - but you need to know a bit about Windows Installer in order to force it to do your bidding. The idea here is that any user skilled and knowledgeable enough to do this is also skilled enough to make that decision with small or no risk of mistake. It's also very clear that if something does go wrong, it's something that needs to be fixed by the user, and it does not wind up as an error report about AxCrypt.<br />
<br />
The thing that cinched the decision to remove the installation directory option was that I could not, try as I might, think of a single valid functional reason for changing it from the system default! Aesthetic, arguably yes. Functional, no. AxCrypt is tiny, has no performance impact on the drive it is installed to, and does not produce any growing data there. When we do change the installation directory, we also break some assumptions that are made by other software. We also become responsible for ensuring the right file system permissions; we might for example open a vector for a malware infection by installing to a directory that allows writes with non-administrative rights. Please note that AxCrypt, due to Windows design limitations, requires administrator elevation to install anyway. Other assumptions we break are the locations of 32-bit vs. 64-bit software in the various virtualized environments offered.<br />
<br />
So, we wind up with a situation where I can find no situation where it's bad to install to the system default location, but several where it's bad to install to a different location. By making the installation easier for the user by removing one decision, we also make it safer and more robust. It's an easy call I think.<br />
<br />
Finally, if you can provide me with a valid functional reason for not installing AxCrypt to the system default location, please do so and I will try to accommodate that reason in the best way I can think of.Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com4tag:blogger.com,1999:blog-7423109771980410273.post-70619544416936149712011-08-19T03:03:00.000-07:002012-08-01T11:31:57.864-07:00Concerning false positive reports about AxCrypt from antivirus softwareFrom time to time I get user reports about warnings from antivirus software concerning either the installer or one or more of the components of AxCrypt.<br />
<br />
This causes great trouble both for me and the user. The user often winds up with inoperable software, and I get a lot of extra work defending myself against unfounded allegations by software companies that take no responsibility whatsoever. They will not guarantee anything about the 'security' they provide, and they will not in any way assume responsibility for harm caused by flagging clean software falsely as malicious. In a normal legal context this would be called slander, and be cause for legal action.<br />
<br />
Some facts about AxCrypt and AxCrypt distributions. AxCrypt is always built completely from source, we do not statically or dynamically link to any third party code except those libraries that are part of the Visual Studio development environment and which come directly from Microsoft.<br />
<br />
Distributions are not built on a developer PC, they are built on a special purpose build server which only does that. No software other than that required to build our various programs is installed there. This server is stationed behind double firewalls, and is never used for any general purpose.<br />
<br />
As part of the automated build process, each executable is digitally signed with an authenticode certificate, issued to 'Axantum Software AB'. The issuer of this certificate does certify that such an entity exists, and that it is in good standing. I have provided them with proof of the company's registration etc. This signing process then ensures that any bits distributed with that signature are traceable back to me and my company, and we would thus potentially be legally accountable for any malware intentionally placed there.<br />
<br />
To sum it up: <b><i>There is no infection in a distribution from me which is digitally signed with my authenticode certificate in the name 'Axantum Software AB'</i></b>.<br />
<br />
It is a continuing effort trying to defend oneself as an independent developer against the so-called anti-virus companies' unfounded allegations.<br />
<br />
It is beyond belief that a serious anti-virus vendor still in 2011 will flag a properly digitally signed executable as malicious.<br />
<br />
If I had the financial resources I would take strong legal action, since this causes harm to my good standing, and to that of my programs, that is sometimes hard or impossible to repair.<br />
<br />
Please check that you have the properly digitally signed versions of both the installer and the executable components if you are in doubt, instructions on how to do this are found <a href="http://www.axantum.com/AxCrypt/Downloads.html">here</a>.<br />
<br />
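If you want to script the check, something along these lines can be used (a minimal sketch; the file path is hypothetical, and note that this only inspects the signer certificate and its chain - it is not the complete Authenticode validation that Windows itself performs):<br />
<pre><code>using System;
using System.Security.Cryptography.X509Certificates;

public static class SignerCheck
{
    public static void PrintSigner(string path)
    {
        // Extract the signer certificate from the digitally signed file.
        X509Certificate2 signer = new X509Certificate2(X509Certificate.CreateFromSignedFile(path));
        Console.WriteLine("Signed by: " + signer.Subject);

        // Check that the certificate chains to a trusted root.
        X509Chain chain = new X509Chain();
        Console.WriteLine("Certificate chain valid: " + chain.Build(signer));
    }
}
</code></pre>
<br />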
Please help the community by reporting your findings as a false positive to your anti-virus vendor. Although the vendors emphatically deny this, they do share signatures (or 'borrow' from each other). This is clearly evidenced by the fact that these false-positive situations usually come in swarms where I get a few reports first from one vendor, and then most of the other vendors follow suit. That can't be a coincidence...<br />
<div>
<br /></div>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com1tag:blogger.com,1999:blog-7423109771980410273.post-27724227246047994362010-09-20T02:58:00.000-07:002011-09-30T03:03:05.919-07:00About the ASP.NET Padding Oracle Attack<h2>About the Padding Oracle Attack</h2><br />
You may have read about the Padding Oracle Attack, risking exposure of sensitive information in millions of ASP.NET sites.<br />
<br />
This site is not one of them in any real sense, and never was.<br />
<br />
The ASP.NET Padding Oracle Attack exploits a vulnerability published as early as 2002 by Serge Vaudenay in a paper entitled "Security Flaws Induced by CBC Padding - Applications to SSL, IPSEC, WTLS...". As usual it's amazing how long it takes for these things to come to the attention of the large vendors, such as Microsoft.<br />
<br />
This attack is in no way specific to ASP.NET - just about every major web platform is likely to be potentially vulnerable. For the technical details, please read the paper by Vaudenay as well as the more recent paper entitled "Practical Padding Oracle Attacks" by Juliano Rizzo and Thai Duong. Here I'll just try to explain the factors that cause the vulnerability, and what the consequences may be, as well as describe why this site never was vulnerable in any real sense.<br />
<br />
Padding is used in a block cipher to make the clear text about to be encrypted an even multiple of the block length. In other words, if the encryption algorithm is designed such that it encrypts 16 bytes at a time, and your clear text is not a multiple of 16 bytes long, we need to add a few dummy bytes at the end to make it an even multiple of 16 in this example. These 'dummy bytes' are called padding.<br />
<br />
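As an illustration, here is a minimal sketch of PKCS#7-style padding (the scheme .NET's symmetric algorithms use by default); it is written for clarity, not as production code:<br />
<pre><code>using System;

public static class Pkcs7Padding
{
    // If, say, 3 bytes are missing to fill the last 16-byte block, append
    // three bytes each with the value 3. A complete final block of padding
    // is added when the clear text is already an even multiple of the block size.
    public static byte[] Pad(byte[] clearText, int blockSize)
    {
        int padLength = blockSize - (clearText.Length % blockSize);
        byte[] padded = new byte[clearText.Length + padLength];
        Buffer.BlockCopy(clearText, 0, padded, 0, clearText.Length);
        for (int i = clearText.Length; i < padded.Length; ++i)
        {
            padded[i] = (byte)padLength;
        }
        return padded;
    }

    // The padding is self-verifying: the last byte says how many padding bytes
    // there are, and they must all have that value. This is exactly the check
    // a padding oracle lets an attacker probe.
    public static bool IsValid(byte[] decrypted)
    {
        int padLength = decrypted[decrypted.Length - 1];
        if (padLength < 1 || padLength > decrypted.Length)
        {
            return false;
        }
        for (int i = decrypted.Length - padLength; i < decrypted.Length; ++i)
        {
            if (decrypted[i] != padLength)
            {
                return false;
            }
        }
        return true;
    }
}
</code></pre>
<br />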
Most encryption schemes use padding that follows a pattern so that the decryption logic can recognize and remove it. Since such a padding scheme is self-verifying, the decryption program can determine if the padding is correct or not - and also give a specific error if the padding is wrong.<br />
<br />
An attack requires access to an application that uses a block encryption cipher and actually knows the decryption key, and which an attacker can 'ask' if a given encrypted text contains a padding error or not.<br />
<br />
The idea is to send in encrypted text to the application, and then determine if it specifically has a padding error after decryption or not. Obviously, if an attacker sends in bad encrypted text, an error is likely to occur, but the attack requires that an attacker can distinguish the very specific error 'padding error' from other errors reported.<br />
<br />
<h2>What's a Padding Oracle?</h2><br />
There are basically two ways an attacker can determine if a padding error has occurred as the result of the manipulated encrypted text: The easy way is if the application actually says exactly this. With ASP.NET you can for example get the quite clear message "CryptographicException: Padding is invalid and cannot be removed". It does not get any clearer. The harder way is if the application shows different timing characteristics between reporting this error and other possible errors. This is a much harder attack, and much more likely to take significantly longer time since the timing is determined by many other factors as well that are likely to be unknown and uncontrollable by the attacker.<br />
<br />
The way to defend against the attack is then to A) ensure that no specific message or error code is returned when a padding error occurs, and B) ensure that timing cannot be used by an attacker as an indirect distinguisher.<br />
<br />
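A minimal sketch of the first defense (illustration only, not the actual code of any site mentioned here): whatever goes wrong while handling manipulated encrypted input, the caller gets one and the same response.<br />
<pre><code>using System;

public static class UniformErrors
{
    // Collapse every failure - padding errors, parse errors, anything - into the
    // same outcome, so an attacker cannot use the error as a padding oracle.
    public static byte[] TryDecryptOrNull(byte[] cipherText, Func<byte[], byte[]> decrypt)
    {
        try
        {
            return decrypt(cipherText);
        }
        catch (Exception)
        {
            // Deliberately indistinguishable: no message, no error code, no details.
            return null;
        }
    }
}
</code></pre>
Timing differences are harder to hide and need to be considered separately, as noted above.<br />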
A Padding Oracle is something we can ask a question about a given encrypted text, and receive an answer stating either 'Yes, the padding is correct' or 'No, the padding is incorrect'. The trick is to ensure our application is not a Padding Oracle!<br />
<br />
<h2>The consequences of an attack and why it's so serious for ASP.NET</h2>
<br />
What's the worst that can happen? Well, anything that is protected by the encryption key used to encrypt the data is potentially vulnerable to both inspection and undetected modification by the attacker.<br />
<br />
In the case of ASP.NET, this usually means that the 'machine key' is vulnerable. This is the ASP.NET machine key used to encrypt ViewState and cookies etc; it's not the Windows machine key. In the case of this site, we generate a new key every time the site is started, so even a successful attack has a very short time of validity.<br />
<br />
Gaining access to the ASP.NET machine key typically means being able to impersonate a logged on user, and possibly gain access to files and other information available to that logged on user. In the case of ASP.NET 3.5 SP 1 and later, it means being able to access all files accessible to the web application via a virtual path. In actual practice, the attack is practical with only a few thousand tries on a typical web site.<br />
<br />
The problem with ASP.NET is that a security researcher found a pretty much universal 'Padding Oracle' that is almost entirely independent of the application in question. It uses the 'WebResource.axd' handler as an attack vector. This handler seems to have the bad taste to respond 404 Not Found when the coded resource has correct padding, but is wrong - and 500 Server Error when the coded resource has incorrect padding. There's your padding oracle.<br />
<br />
This is pretty bad, so we certainly should take this seriously.<br />
<br />
<h2>The status for www.axantum.com</h2><br />
The Xecrets on-line password storage has never been vulnerable to this attack for the simple reason that we don't know the encryption key users use, so there's no possibility that our application can be used as a padding oracle for the purpose of breaching the Xecrets password encryption.<br />
<br />
However, the Xecrets site as such does use ASP.NET and can theoretically be used as a padding oracle, in the sense that if it should fall to such an attack it would be possible to act as an administrator of the application (not the system). This will still not enable anyone to access stored Xecrets, because the system does not know the encryption key for those files. There is no sensitive information available that is protected by the ASP.NET machine key. It could in theory enable someone to get free access to the Xecrets service though!<br />
<br />
Also, because we create a new machine key every time we restart or recycle the application, even a successful attack would only be valid for a rather short time. Then again, there are rumours that a followup to the attack could lead to code injection. <br />
<br />
The Xecrets site uses custom handling of both server errors and not found errors, but it's still probable that it was vulnerable to the WebResource.axd attack. The Xecrets site has from the start employed a number of strategies to give away as little information as possible and reasonable in the face of errors, and has thus always conformed to the first criterion for avoiding vulnerability - it returns the same message and page regardless of what kind of error manipulated encrypted text sent to the site causes.<br />
<br />
The problem here is that Microsoft has once again failed to follow that maxim, and also failed to follow general good cryptology practices and confused encryption with authentication. Encrypted data should always be verified for authenticity before use, for example by employing a Message Authentication Code, or a digital signature. All encryption from Axantum uses the well-known 'Encrypt-then-HMAC' construction or other mechanisms to ensure the authenticity of encrypted data. If ASP.NET had done the same, this would never have happened.<br />
<br />
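A minimal sketch of the verify-before-decrypt idea (illustration only, not AxCrypt's actual file format): the HMAC over the cipher text is checked first, and manipulated data is rejected before decryption is even attempted, so no padding error ever becomes observable.<br />
<pre><code>using System;
using System.Security.Cryptography;

public static class EncryptThenHmac
{
    // The input is assumed to be the HMAC followed by the cipher text.
    public static byte[] VerifyAndGetCipherText(byte[] macAndCipherText, byte[] macKey)
    {
        using (HMACSHA256 hmac = new HMACSHA256(macKey))
        {
            int macLength = hmac.HashSize / 8;
            byte[] expectedMac = new byte[macLength];
            Buffer.BlockCopy(macAndCipherText, 0, expectedMac, 0, macLength);

            byte[] cipherText = new byte[macAndCipherText.Length - macLength];
            Buffer.BlockCopy(macAndCipherText, macLength, cipherText, 0, cipherText.Length);

            if (!AreEqualConstantTime(expectedMac, hmac.ComputeHash(cipherText)))
            {
                // Authenticity fails - refuse before any decryption takes place.
                throw new CryptographicException("Data has been tampered with.");
            }
            return cipherText; // Only now is it safe to decrypt this.
        }
    }

    private static bool AreEqualConstantTime(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
        {
            return false;
        }
        int difference = 0;
        for (int i = 0; i < a.Length; ++i)
        {
            difference |= a[i] ^ b[i];
        }
        return difference == 0;
    }
}
</code></pre>
<br />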
Once again it is shown that following established security and encryption practices will mitigate the situation even in the face of future attacks, impossible to know at the original time of construction. It is also shown that even today, it can take up to 8 years(!) for billion dollar companies to react to a published threat affecting some of the world's most widely deployed platforms.<br />
<br />
As of today, the Xecrets site is also updated to avoid even the ASP.NET Padding Oracle attack via WebResource.axd - or any other similar vector for that matter.Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-45310416694805773162009-12-05T02:57:00.000-08:002011-09-30T02:58:45.710-07:00Password Expiration is a Meaningless RitualThere are many examples throughout history where a once meaningful rule over time outlives its original usefulness and becomes meaningless ritual. Password changing policies in a modern network of independent computers like a typical corporate network are such an example today.<br />
<br />
A password changing policy is that annoyance that you are faced with every 3, 6 or 12 months typically when you get a notification stating that your password is about to expire and you have to change it.<br />
<br />
Now, why is that a meaningless ritual? Because the original justification no longer applies. This practice originated in a system of time-shared central computers, your IBM mainframe or VAX/VMS mini(!) computer. You connected to this central beast using a fairly dumb synchronous or asynchronous terminal. The distinctive feature of these terminals was that they did not load and execute arbitrary code. They just displayed information as it was sent to them. To gain access to a system or an application, you started this application on the central computer, and it then asked you for your credentials, i.e. user name and password. If it was ok, it let you access the system. It was all fairly similar to the DOS-box we have in today's Windows computers.<br />
<br />
In these days of central IT departments and a limited number of terminals, it was common practice that if you went for a vacation, or had to take sick leave, you'd let your colleague use your login information to help complete the tasks that needed completing. This of course led to a situation over time where you essentially lost control over who actually had access to your logon credentials and could use your account. So, to minimize the effect of this, IT departments invented password aging and expiration, forcing you to change it every now and then. This actually had an effect, because if someone with bad intentions actually had gained access to your password, it now became worthless (unless of course, you do as most people did then and still do - use a consistent theme for your passwords, since you can't be bothered to invent and remember a really new one every time).<br />
<br />
So, back to why this practice now is a meaningless ritual. Because the password is no longer limited to giving access to a central system via a non-programmable terminal. Today, the password typically gives you the right to install and execute arbitrary code on the actual computer used to access the systems in question! Anyone with any kind of security training knows that if someone once has had access to a computer with enough privileges to run and install software, that computer is forever potentially compromised until it is reinstalled from original operating system media.<br />
<br />
Does changing the password actually enable you to regain control over your system? Is that the recommended practice if you've had a virus or other malware in your computer? Change the password? Of course not! It's a meaningless gesture changing nothing. Your system will remain potentially compromised until you reinstall the original software from scratch.<br />
<br />
So, if you're an IT department manager, why would you want to implement a password expiration policy? The only reason I can think of is because it feels good, and because it's the way we've always done it. It doesn't actually improve the security of your network one single bit. Not at all. It does annoy the users, and gives you a certain sense of power of course! That's always something.<br />
<br />
What should you do instead, provided you're constrained to passwords?<br />
<br />
<br />
<ul><li>Set up a password complexity policy that is tough enough that a dictionary attack is unlikely to succeed. Go for length rather than require special characters etc. 15 characters or more is probably a good idea.</li>
<li>Set up a password change policy to the effect that the password never expires and cannot be changed by the user - yes, the opposite of what is probably the most common policy today!</li>
<li>The best is to generate passwords for your users - yes, you select them! Use a password generator that produces passwords that are not just random collections of characters, but rather combinations of characters that are possible to remember (a small sketch follows after this list). Give the new user the password on a piece of paper, and keep no copy for yourself.</li>
<li>Explain to the new user that this is the password, it's ok to keep the paper in the wallet for a few weeks until it sticks to memory. In return for this rather tough password complexity, the user will never need to remember another password while employed by this company. That's a fair tradeoff!</li>
<li>Also explain that this password may not be re-used at any other location, that it's a breach of company security IT policy to do so. The password is in effect company confidential and privileged information that may not be disclosed to any third party.</li>
</ul><br />
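A minimal sketch of such a generator (illustration only; a real one should also remove the small modulo bias and preferably mix in digits or separators):<br />
<pre><code>using System;
using System.Security.Cryptography;
using System.Text;

public static class MemorablePassword
{
    private const string Consonants = "bdfghjklmnprstv";
    private const string Vowels = "aeiou";

    // Alternating consonants and vowels gives pronounceable, rememberable passwords.
    // Entropy per character is lower than with fully random strings, so go for length.
    public static string Generate(int length)
    {
        byte[] buffer = new byte[length];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(buffer);
        }
        StringBuilder password = new StringBuilder(length);
        for (int i = 0; i < length; ++i)
        {
            string alphabet = (i % 2 == 0) ? Consonants : Vowels;
            password.Append(alphabet[buffer[i] % alphabet.Length]);
        }
        return password.ToString();
    }
}
</code></pre>
<br />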
Now, if you get into the situation that the password is considered compromised, which will most likely be because of a malware infestation in your corporate network, it's fairly obvious that you both have to clean all the systems and change all the possibly compromised passwords. But only then! And the reverse applies too: if you have a suspicion that the password is compromised, you should consider all systems where this user has logged on as compromised and candidates for reinstallation.<br />
<br />
So, let's start to modernize our policies and actually make them mean something instead of going through old and meaningless rituals!<br />
<br />
Update your password policies today!<br />
<div><br />
</div>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-46805923263220396342009-03-01T02:55:00.000-08:002011-09-30T02:57:15.539-07:00How not to shuffle a deck of cards with LINQI’m an avid reader of MSDN Magazine, and seldom find any errors. However, in Ken Getz's article “<a href="http://msdn.microsoft.com/en-us/magazine/cc700332.aspx">The LINQ Enumerable Class, Part 1</a>” in the July 2008 issue, I found a rather glaring error that needs correction. I sent the following text to Ken, but unfortunately never got a response. Hopefully some will see this blog post, and we'll not be seeing the error illustrated here in production code.<br />
<br />
The following piece of code intended to solve the classic shuffle problem is very wrong:<br />
<br />
<span style="font-family: 'courier new', courier;">Dim rnd As new System.Random()<br />
Dim numbers = Enumerable.Range(1, 100).OrderBy(Function() rnd.Next)</span><br />
<br />
The error will manifest by making some shuffles more or less likely than others. It is not an unbiased shuffle.<br />
<br />
The problem lies in the fact that a list of 100 random numbers, independently chosen, are used to produce a random order of the numbers 1 to 100.<br />
<br />
If this code is used as a template for a simulation, the results will be skewed, because not all outcomes of the shuffle are equally likely. If the code is used (with appropriate substitution to a strong pseudo random number generator) for gaming software, either the players or the casino will get better odds than expected.<br />
<br />
This is rather serious, as code snippets from MSDN Magazine are likely to be used in many applications.<br />
<br />
Why is the code wrong?<br />
<br />
Because, when shuffling N numbers in random order, there are N! possible shuffles. But, when picking N random numbers independently, from a set of M numbers, there are M**N possible outcomes due to the possibility of the same number being drawn more than one time. As a tiny illustration, shuffling just 3 items by drawing 3 independent values from a set of 3 gives 27 equally likely outcomes, which cannot be spread evenly over the 3! = 6 possible orderings.<br />
<br />
For there to be a possibility of this resulting in all shuffles being equally likely, M**N must be evenly divisible by N!. But this is not possible because in this particular case M, 2**31-1 or 2,147,483,647, is prime! System.Random.Next() will return a value >= 0 and < Int32.MaxValue, so there are Int32.MaxValue possible outcomes, which is our M in this case.<br />
<br />
This is a variation of a classic implementation error of the shuffle algorithm, and I’m afraid that we’ll have to stick with <a href="http://en.wikipedia.org/wiki/Fisher-Yates_shuffle">Fisher-Yates shuffle</a> a while longer. Changing the code to use for example Random.NextDouble() does not remove the problem, it just makes it a bit harder to see. As long as the number of possible outcomes of the random number sequence is larger than the number of possible shuffles, the problem is very likely to be there although the proof will differ from case to case.<br />
<br />
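For reference, a minimal Fisher-Yates sketch in C# (System.Random is used here only for brevity; a gaming application needs a cryptographically strong generator and attention to the bias issues discussed below):<br />
<pre><code>using System;

public static class Shuffling
{
    // Fisher-Yates: each of the n! orderings is equally likely,
    // provided random.Next(i + 1) itself is unbiased.
    public static void Shuffle<T>(T[] items, Random random)
    {
        for (int i = items.Length - 1; i > 0; --i)
        {
            int j = random.Next(i + 1); // 0 <= j <= i
            T temp = items[i];
            items[i] = items[j];
            items[j] = temp;
        }
    }
}
</code></pre>
<br />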
There are many more subtle pitfalls in doing a proper shuffle, using the modulo function to reduce integer valued random number generator outputs or using multiplication and rounding to scale a floating point valued RNG just being two of the more well-known.<br />
<br />
By the way, the actual implementation of System.Random in the .NET Framework is quite questionable in this regard as well. It will not return an unbiased set of random numbers in some of the overloads, and the Random.NextDouble() implementation will in fact only return the same number of possible outcomes as the System.Next(), because it just scales System.Next() with 1.0/Int32.MaxValue.Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-49023569259296137292008-09-12T02:54:00.000-07:002011-09-30T02:55:22.323-07:00How to make a file read in Windows not become a writeA little known, and even less used, feature of all Windows versions from XP and forward is that they support a property called 'Last Access' on all files. On the surface, this seems neat, if not so useful. You can see whenever a file was last accessed using this property.<br />
<br />
But think about it. What does this mean? It means that every time you open a file for reading, Windows needs to write something somewhere on the disk! If you're in the process of enumerating, let's say 500,000 files, this equals slow! Does anyone ever use that property? Not that I know of.<br />
<br />
I'm working with file based persistent storage in my solutions, not with a database, so file access is pretty important to me. By disabling this 'feature', I sped up enumerating the file system by about a factor of 10! Generally speaking, you'll speed up any system with many file accesses by turning this feature off.<br />
<br />
It's really simple too. At a DOS-prompt write:<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">fsutil behavior set disablelastaccess 1 </span><br />
<br />
When you're at it, you might also want to do:<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">fsutil behavior set disable8dot3 1 </span><br />
<br />
This last command disables generation of 8-dot-3 legacy file names, effectively halving the size of directories in NTFS, which must be a good thing. Beware that there might be 16-bit software out there which actually needs those 8-dot-3 names to find your files...<br />
<div><br />
</div>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-69800770342642692032008-02-07T02:53:00.000-08:002011-09-30T02:54:10.464-07:00Book Review: Microsoft Windows Internals, Fourth EditionMicrosoft Windows Internals, Fourth Edition, by Mark E. Russinovich and David A. Salomon, Microsoft Press, LOCCN 2004115221<br />
<br />
Many years ago, before the release of NT 3.1, I read a book entitled "Inside Windows NT" by Helen Custer. It was a great book, basically a text-book on operating system theory - as exemplified by Windows NT. It covered the theory of how to implement an operating system kernel, showing how it was done in Windows NT. It did not talk about API's so much as about the data structures and logic behind the scenes and the theory of the basic functions of an operating system such as memory management and the IO system.<br />
<br />
As I'm now getting back into some heavy-duty C++ coding for the Windows environment, I thought this might be a good refresher for me to (re-)learn about internal structures and enable me to find the right places to implement the functionality I need.<br />
<br />
With these expectations I was a bit disappointed by "Windows Internals, Fourth Edition". It's a very different kind of book compared to the original first edition - in fact it's not the fourth edition of "Inside Windows NT" - it's really the second or third edition of "Windows Internals". So, what kind of book is it then?<br />
<br />
"Windows Internals" is a cross between a troubleshooting manual for very advanced system managers, a hackers memoirs, an applied users guide to sysinternals utilities and the documentation Microsoft didn't produce for Windows.<br />
<br />
It's almost like an independent black-box investigators' report of findings after many years of peering into the internals of Windows - from the outside. Instead of describing how Windows is designed from the designers point of view, it describes a process of external discovery based on reverse-engineering and observation. Instead of just describing how it works, the book focuses on "experiments" whereby with the help of a bunch of very nifty utilities from sysinternals you can "see" how it works.<br />
<br />
I find the approach a little strange, I was expecting a more authoritative text, not an experimental guide to 'discovery'. I don't think one should use experimental approaches to learning about a piece of commercial software. Software is an engineering practice - and it should be described, not discovered. It should not be a research project to find out how Windows works - it should be a matter of reading documentation and backgrounders, which was what I was hoping for when purchasing the book.<br />
<br />
Having read all 870 pages, what did I learn? I learnt that sysinternals (http://technet.microsoft.com/en-us/sysinternals/default.aspx) has some very cool utilities (which I already knew), and I learnt a bit about how they do what they do, and how to use them to inspect the state of a Windows system for troubleshooting purposes. As such, it should really be labelled "The essential sysinternals companion", because that's what it really is. It shows you a zillion ways to use the utilities for troubleshooting. Which is all well and good as it goes and very useful in itself.<br />
<br />
To summarize, this is not really the book to read if you want to get an authoritative reference about the Windows operating system, although you will learn quite a bit along the way - after all, there is quite a bit of information here. If you're a system manager and/or facing extremely complicated troubleshooting scenarios, then this book is indeed for you. Also, if you're a more practical-minded person, and just want to discover the 'secrets' of Windows, you'll find all the tools here. I would have preferred that Microsoft documented things, instead of leaving it for 'discovery' (and then hiring the people doing the discovering if they're too good at it, and then making them write a book about it - which is what happened here).Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-59809261053021890792008-01-08T02:47:00.000-08:002011-09-30T02:53:18.766-07:00Lock object sharing with hashesIn web applications we frequently need to serialize access to resources in concurrent applications, since ASP.NET is inherently concurrent. A typical scenario is that we have several loosely connected objects that all apply to the same user, and we need to ensure single-threaded access during a read-read or read-modify-write cycle to get a consistent view or update.<br />
A user is usually represented via some sort of string identifier, perhaps an e-mail address. In C# what we want to do is something like:<br />
<pre><div style="border: #000080 1px solid; color: black; font-family: 'Courier New', Courier, Monospace; font-size: 10pt;"><div style="background-color: white; max-height: 300px; overflow: auto; padding: 2px 5px;"><span style="color: blue;">lock</span> (<span style="color: #010001;">user</span>)
{
<span style="color: green;">// Do something that needs single-thread access</span>
}</div></div></pre>The problem is what are we to use as a lock object? C# can use any object as a lock, but which one to pick? We must ensure that multiple threads will always get the right object instance, regardless of when in the application life-time the need arises, so in effect these objects must live for the life of the application. This can lead to massive memory consumption: assume a system with one million users - after a while we'll have to keep one million objects around, probably in a hash table indexed by the e-mail string. That can mean some serious memory problems.<br />
<br />
One approach would be to clean up the table when no-one is using a specific lock object, but this is complicated and fraught with its own threading problems.<br />
<br />
After a few false starts, I came up with the following scheme which has now been tested in the wild and been found quite effective as a trade-off between memory and lock contention.<br />
<br />
In actual fact, there are usually a rather limited number of possible concurrent actions, limited basically by the number of threads that are active. This number is typically 100 per processor in ASP.NET, and in most applications even with many users the number of actual concurrent requests at any given time is even fewer. So, assuming a 100 concurrent threads, and assuming that they will only acquire one user lock (our example here) at a time, we really only need at most 100 lock objects - not a million. But how to do this?<br />
<br />
The algorithm I've come up with is probably not new, but I've not seen it before in the literature nor when actively searching on the web, so it's at least a bit unusual. Here's how it works:<br />
<br />
1. Allocate an array with a fixed number of elements, perhaps twice the number of estimated concurrent accesses.<br />
2. Fill the array with objects to be used as locks.<br />
3. To acquire a lock for a given user, generate an index into the lock object array by taking a hash of the user identifier, typically with the GetHashCode() method, and then take that modulo the number of lock objects. This is the index into the lock table; use the indexed object and lock.<br />
<br />
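In code, the usage ends up looking something like this (a minimal sketch, using the LockObjects helper class shown further down):<br />
<pre><code>// All threads asking for the same user identifier get the same lock object,
// so the read-read or read-modify-write cycle is serialized per user.
private static readonly LockObjects _userLocks = new LockObjects(200);

public void UpdateUserProfile(string email)
{
    lock (_userLocks.GetLockObject(email))
    {
        // Read, modify and write the user's loosely connected objects here.
    }
}
</code></pre>
<br />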
At best, you'll get a free lock and acquire the lock. <br />
<br />
At second best, another thread is actually holding the lock for the same user, and your thread is put on hold as it should be.<br />
<br />
At worst, another thread is actually holding the lock but for a different user that happens to use the same lock when calculating the index via the hash modulo the lock array size. By having good hash algorithms and an appropriate number of locks in relation to the number of concurrent accesses, this should be a very infrequent occurrence. But even if it happens, nothing bad happens except that your thread will have to wait a little longer than was absolutely necessary.<br />
<br />
This simple algorithm will require a fixed number of locks in relation to the level of concurrency instead of the number of potential objects that require locking, and at a very low cost. Sample code follows:<br />
<pre><div style="border: #000080 1px solid; color: black; font-family: 'Courier New', Courier, Monospace; font-size: 10pt;"><div style="background-color: white; max-height: 300px; overflow: auto; padding: 2px 5px;"><span style="color: blue;">public</span> <span style="color: blue;">class</span> <span style="color: #2b91af;">LockObjects</span>
{
<span style="color: blue;">private</span> <span style="color: blue;">object</span>[] <span style="color: #010001;">_lockObjects</span>;
<span style="color: blue;">public</span> <span style="color: #010001;">LockObjects</span>(<span style="color: blue;">int</span> <span style="color: #010001;">numberOfLockObjects</span>)
{
<span style="color: #010001;">_lockObjects</span> = <span style="color: #010001;">GetInitialLockObjects</span>(<span style="color: #010001;">numberOfLockObjects</span>);
}
<span style="color: blue;">public</span> <span style="color: #010001;">LockObjects</span>()
: <span style="color: blue;">this</span>(20)
{
}
<span style="color: blue;">private</span> <span style="color: blue;">object</span>[] <span style="color: #010001;">GetInitialLockObjects</span>(<span style="color: blue;">int</span> <span style="color: #010001;">numberOfLockObjects</span>)
{
<span style="color: blue;">object</span>[] <span style="color: #010001;">lockObjects</span> = <span style="color: blue;">new</span> <span style="color: blue;">object</span>[<span style="color: #010001;">numberOfLockObjects</span>];
<span style="color: blue;">for</span> (<span style="color: blue;">int</span> <span style="color: #010001;">i</span> = 0; <span style="color: #010001;">i</span> < <span style="color: #010001;">lockObjects</span>.<span style="color: #010001;">Length</span>; ++<span style="color: #010001;">i</span>)
{
<span style="color: #010001;">lockObjects</span>[<span style="color: #010001;">i</span>] = <span style="color: blue;">new</span> <span style="color: blue;">object</span>();
}
<span style="color: blue;">return</span> <span style="color: #010001;">lockObjects</span>;
}
<span style="color: blue;">public</span> <span style="color: blue;">virtual</span> <span style="color: blue;">object</span> <span style="color: #010001;">GetLockObject</span>(<span style="color: blue;">params</span> <span style="color: blue;">string</span>[] <span style="color: #010001;">keys</span>)
{
<span style="color: blue;">int</span> <span style="color: #010001;">lockHash</span> = 0;
<span style="color: blue;">foreach</span> (<span style="color: blue;">string</span> <span style="color: #010001;">key</span> <span style="color: blue;">in</span> <span style="color: #010001;">keys</span>)
{
<span style="color: #010001;">lockHash</span> += <span style="color: #010001;">key</span>.<span style="color: #010001;">ToLowerInvariant</span>().<span style="color: #010001;">GetHashCode</span>();
}
<span style="color: #010001;">lockHash</span> = <span style="color: #010001;">Math</span>.<span style="color: #010001;">Abs</span>(<span style="color: #010001;">lockHash</span>) % <span style="color: #010001;">_lockObjects</span>.<span style="color: #010001;">Length</span>;
<span style="color: blue;">return</span> <span style="color: #010001;">_lockObjects</span>[<span style="color: #010001;">lockHash</span>];
}
}</div></div></pre>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0tag:blogger.com,1999:blog-7423109771980410273.post-48729438448426759252007-11-26T02:46:00.000-08:002011-09-30T02:47:45.478-07:00Book Review: XSLT 2.0 Programmer's Reference Third EditionXSLT 2.0 Programmer's Reference Third Edition, by Michael Kay, Wiley Publishing, Inc., ISBN 0-7645-6909-0<br />
<br />
XSLT, or XSL, is a subject that I'm no expert in, but I've come across it from time to time and generally have had a hard time really grasping the how and the why of it. In most cases I can program, or at least tweak, just about anything with very little introduction. Fixing and tweaking the XSLT stylesheets that I've come upon has been a tougher experience where I've felt myself reduced to guesswork and magic. That's not a feeling I like, so I decided to do some background studying.<br />
<br />
A Programmer's Reference is perhaps not the first choice as an introduction to a subject, but in this case it was hard to find just where to start, and I felt that I was experienced enough to go for some core literature from the start, which also would have the benefit of being useful in a real situation as reference literature.<br />
<br />
Since I'm a newcomer to XSLT this review will have to be both about the book as such, and also about the subject matter. Let's start with the book.<br />
<br />
Michael Kay is certainly an authority, being the editor of the XSLT 2.0 Working Group. The book is also authoritative and extremely carefully written with an extraordinary focus on details. I did find a few typos, errors and editorial mistakes but taking the amount of text into account it's still a very, very good piece of work.<br />
<br />
This book is not written to be read cover to cover, which is nevertheless what I did, and it's not a bad way to get a thorough introduction to XSLT. Be prepared for quite a few hours though; I spent about 20 hours reading it. It's entitled XSLT 2.0, and was written before XSLT 2.0 was actually approved as an official recommendation, which happened on 23 January 2007. I've not checked, but there are sure to be some minor differences between the final recommendation and the drafts upon which the book was based. Being such a recent standard, there are as a consequence very few XSLT 2.0 compliant implementations in existence, so XSLT 1.0 is still very much in use. The book is careful to keep track of differences and changes, and should work well for XSLT 1.0 use as well.<br />
<br />
It's very heavy reading indeed, but if you only want to get one book about XSLT 2.0 this is very probably the one to get.<br />
<br />
The real question, though, that I must raise after reading this and getting a good feel for XSLT is: do you want to get any book about XSLT at all?<br />
<br />
XSLT is about XML transformation, or actually transformation in general. It doesn't really have to be from or to XML; it can be from plain text to HTML or any number of other combinations, depending on the requirements and capabilities of the parsers and processors available. This is obviously extremely useful - being able to massage data between different forms - and it's frequently used in one-off applications and in various integration projects. XSLT is also intended to fit in alongside CSS 2.0, as a way to perform formatting for presentation that is not possible with CSS alone - that's why it's called XSL Transformations, XSL being the Extensible Stylesheet Language.<br />
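<br />
To make the transformation idea concrete, here is a minimal sketch of my own - not an example from the book - of a stylesheet that turns a hypothetical <span style="font-family: courier;"><people></span> document into an HTML list; the element names are invented for the illustration:<br />
<pre><code><?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative only: transform <people><person><name>...</name></person></people>
     into an HTML bullet list. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>

<!-- Match the document element and emit the surrounding list. -->
<xsl:template match="/people">
  <ul>
    <xsl:apply-templates select="person"/>
  </ul>
</xsl:template>

<!-- One list item per person element. -->
<xsl:template match="person">
  <li><xsl:value-of select="name"/></li>
</xsl:template>
</xsl:stylesheet>
</code></pre>
Any XSLT 1.0 processor should accept something along these lines; the interesting part is that the whole program is a set of template rules, with no statements executed in sequence.<br />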
<br />
So XSLT certainly addresses an important area. However, sadly, I must conclude that it's not a very good tool, in my opinion. Even when supplemented with good development environments, with color-coded and syntax-checking editors, it's still simply not very friendly to the human eye. Too many angle brackets and colons, one might say. Syntax does matter! The real problem, though, is that it's a functional programming language, not a procedural one, and this simply does not lend itself to performing complex tasks in the real world.<br />
<br />
Functional languages focus on defining the program in terms of functions that are stateless and free of mutable variables. Everything is defined as functions without side effects, that is to say, each call to a function with the same parameters will always return the same result. Iteration is replaced by recursion - even when iteration is the natural way to address a problem - because a loop counter would have to be updated on each pass, and you can't do that without mutable state. This means that while anything can be programmed in a functional language, it must frequently be done in ways that are not well known to the majority of developers.<br />
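<br />
To show what that means in XSLT terms, here is a small sketch of my own (not taken from the book) of the classic workaround: a named template that "loops" by calling itself with an updated parameter, since a variable cannot be reassigned once it is bound:<br />
<pre><code><!-- Illustrative only: output the numbers $n down to 1.
     There is no mutable loop counter, so the template recurses
     with n - 1 instead of updating a variable. -->
<xsl:template name="count-down">
  <xsl:param name="n"/>
  <xsl:if test="$n > 0">
    <xsl:value-of select="$n"/>
    <xsl:text> </xsl:text>
    <xsl:call-template name="count-down">
      <xsl:with-param name="n" select="$n - 1"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
</code></pre>
XSLT does have xsl:for-each for walking a node-set, but anything that needs a running, accumulated value still tends to end up as recursion along these lines.<br />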
<br />
There's a reason why functional languages like Lisp, ML and Scheme have not become commercially successful, although loved by the academic community for decades. Basically I think it's a question of maintenance and complexity. In the real world of commercial programming, systems must be maintained for decades by perhaps thousands of different developers over the years. This has always been an uphill task, and no functional language, with the possible exception of Erlang, has succeeded in combining expressive power with robustness, documentability and maintainability.<br />
<br />
XSLT is certainly expressive, but I place it in the class of write-once, write-only languages. Integrated with XPath 2.0, it's possible to write programs so clever that even the author will have trouble understanding them the next day.<br />
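<br />
As an illustration of the kind of one-liner I mean - an invented example, not one from the book, over some hypothetical person data - consider an XPath 2.0 expression along these lines:<br />
<pre><code>(: compact, yes - self-explanatory, hardly :)
string-join(
  for $p in //person[number(@age) ge 30]
  return concat(upper-case($p/surname), ', ', $p/given-name),
  '; ')
</code></pre>
It is short and elegant, and a week later even its author may need a minute to work out what it does.<br />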
<br />
There's nothing wrong with the basic concept of defining a standard way of transforming documents between different representations, and making it possible to choose between doing the processing in the browser or on the server. It's neat and it's cool. However, doing anything but trivial transformations is a maintenance nightmare. The fact that the functional programming model is little known among mainstream developers does not make it any better.<br />
<br />
Somehow, it feels like XSLT is 90% geared towards the internal needs of the W3C - it's used extensively to format and publish the various specifications the W3C produces. But as far as I can judge, this means that those specifications are written in raw XML in plain text editors, with the markup done by hand - something that won't exactly work for most other organizations.<br />
<br />
So, unfortunately, in the end I feel that XSLT 2.0 is a technology that's elegant but will never be used on a wider scale. However, if you do have many, many documents in some kind of structured format (not necessarily XML, surprisingly enough) and want to transform them to XML or an XML-like format such as HTML, then XSLT may well be just what you need. Be prepared for a very high entry cost though, and rest assured that as the author of the stylesheets you'll enjoy a very high level of job security.<br />
<br />
There are also serious performance issues with XSLT: because of the functional style of programming, compilers and optimizers have a hard time generating efficient code for the underlying procedural architecture of our computers. In theory, functional programming could come into its own performance-wise as multi-core architectures become more common, since it makes parallel computation easier to realize, but today other problems overshadow that, and I'm fairly sure that in many cases XSLT performance will be unacceptable.<br />
<br />
So, to summarize: if you want to learn and use XSLT 1.0 or 2.0, this book is probably the one to get, but you should not assume that XSLT is a silver bullet for XML transformation; there are many caveats.<br />
<div><br />
</div>Svantehttp://www.blogger.com/profile/13946027974134920903noreply@blogger.com0