Monday, November 25, 2024

Installing Ubuntu 24.04.1 on Windows 11 Hyper-V

Ubuntu 24.04.1 on Hyper-V

Amazingly enough, it turns out to be non-trivial to get Ubuntu 24.04.1 running on a 4K high-resolution display under Hyper-V. During the process I experienced crashes during installation, disk read errors, hangs, black screens during boot and login, and difficulties achieving a connection at full resolution on my 4K display.

Since our file encryption app Xecrets Ez is built for Linux in addition to Windows and macOS, we need an environment where we can test and develop it for Linux. The choice fell on Ubuntu, as it's touted as perhaps the most common desktop distribution for ordinary users.

In some configurations, the high resolution was simply not available; specifically, this was when the linux-azure cloud package was installed. Apparently it only supports lower resolutions out of the box. I'm sure it's possible to tweak this, but as with everything else in Linux, it's practically impossible to find useful documentation.

When I finally got an image working reasonably well, it turned out that the screen scaling was not saved between reboots. I have no idea why, but I did find a remedy.

What follows is a basic step-by-step action list that at least got things working for me with no hitches during the process. Your mileage may vary, but there should be some useful hints here.

Here goes.

Download a .iso image

Get the current image from https://ubuntu.com/download/desktop

"The latest LTS version of Ubuntu, for desktop PCs and laptops."

Download 24.04.1 LTS (the current version at the time of writing)

This downloads for example: ubuntu-24.04.1-desktop-amd64.iso

Once downloaded, start Hyper-V Manager, and under the Actions pane select:

Quick Create...

Select the ISO image:

Local Installation Source -> ubuntu-24.04.1-desktop-amd64.iso

Uncheck "This virtual machine will run on Windows (Enables Windows Secure Boot)"

Under More options:

Name: "Ubuntu 24.04 LTS" (...or whatever you want to name it)

Click: Create Virtual Machine

Edit the virtual machine settings: uncheck "Use Automatic Checkpoints" (these may be the cause of black screens and hangs), and under Integration Services check "Guest services" (ensuring it then says "All services offered").
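If you prefer scripting the host side, the same tweaks can also be made from an elevated PowerShell prompt using the standard Hyper-V module cmdlets (a sketch; the VM name is the one chosen above):

```powershell
# Turn off automatic checkpoints, a suspected cause of black screens and hangs
Set-VM -VMName "Ubuntu 24.04 LTS" -AutomaticCheckpointsEnabled $false

# Enable the guest services integration service
Enable-VMIntegrationService -VMName "Ubuntu 24.04 LTS" -Name "Guest Service Interface"

# List the integration services to confirm they are enabled
Get-VMIntegrationService -VMName "Ubuntu 24.04 LTS"
```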

In the window "Virtual machine created successfully", click "Connect", and then start the virtual machine.

"Try or Install Ubuntu", accept all default choices except language and keyboard according to your situation.

Reboot as instructed, and complete the installation

In Ubuntu: Software Updater - install all updates, "Restart now"

Open terminal:

# This was part of the hard-to-find secret sauce...
sudo apt update
sudo apt install linux-image-extra-virtual

sudo nano /etc/default/grub

Edit

# Use your native resolution....
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash video=hyperv_fb:3840x2160"

Save

sudo update-grub

Shutdown

On the Windows host, open PowerShell as administrator and, using your VM name and your native resolution as above, run:

Set-VMVideo -VMName "Ubuntu 24.04 LTS" -HorizontalResolution 3840 -VerticalResolution 2160 -ResolutionType Single

Start the Ubuntu virtual machine again and log in.

If scaling is needed, set scaling in Settings to, for example, 200%. To also make it "stick" after a reboot,
open a terminal and run:

# This was also non-trivial to figure out...
gsettings set org.gnome.desktop.interface scaling-factor 2

Reboot and confirm that it all works as expected and no settings have reverted.

Create a checkpoint

Now is probably a good time to shut down again and create a manual checkpoint.

Credits

A lot of the above is found in various parts of this thread: https://superuser.com/questions/1660150/change-screen-resolution-of-ubuntu-vm-in-hyper-v . But it covers various versions and distributions, and there are also some digressions. There's no mention of crashes and black screens, which did appear to be reduced (but still happen) once automatic checkpoints were disabled. (Dynamic memory may be another culprit here; I'm still trying to verify this.) This post is specifically about Windows 11, Hyper-V and Ubuntu 24.04.1.

Other notes

Slightly off-topic, but worth mentioning if the goal is to set up a development and/or test environment under Ubuntu 24.04 in a Hyper-V virtual machine:
  • Don't install the snap version of VS Code. It messes up the environment for .NET apps when debugging, moving the location of special folders into the 'snap' folder, etc. Download the binary and install it instead.
  • Don't install the snap version of the .NET SDK. If you require higher bands than the .1xx band (which is the only one supported by the Canonical repository), download the binary and install it manually.
  • If you want to enable enhanced session mode when connecting to the VM, thus getting copy/paste to work as well as higher screen resolutions, you can actually forego most of the above, as you need to run xrdp (or equivalent) for this. There are quite a few how-to sites available, some with scripts. I've found that this script actually works fine for 24.04 also. It's referenced on this site. It turns out you may actually need to explicitly set the transport type, even with Windows 11 and generation 2 VMs, like this: Set-VM -VMName <Name-of-VM> -EnhancedSessionTransportType HvSocket in a PowerShell window run as administrator.

Wednesday, July 24, 2024

How to avoid being asked for a passphrase for SSH local signing of git commits on Windows 11

This is not about how to authenticate to GitHub or any other git repository. That works fine; see my other blog post about that. It's a little dated, but it's still mainly relevant.

I want to sign my commits to, for example, https://github.com/xecrets/xecrets-cli, and I'd like to do this using an SSH key - not a GPG key. I already have an SSH key, signing works fine, and commits show up as 'Verified' on GitHub.

Using commonly found instructions on the Internet: after creating an SSH key and uploading it to GitHub (you need to upload it once again specifically for use as a signing key, even if it's the same one you use for authentication), do the following:

git config --global user.signingkey /PATH/TO/.SSH/KEY.PUB
git config --global commit.gpgsign true
git config --global gpg.format ssh

However, I don't want to have to type the passphrase for the private key file every time, and this was much harder to find.

As it turns out, git will use its own bundled ssh tooling for signatures unless told otherwise. On Windows 11, this does the trick:

git config --global gpg.ssh.program "C:\Windows\System32\OpenSSH\ssh-keygen.exe"

To be able to verify commits locally, you also need to create a file with allowed signers. It can be named anything and placed anywhere, but the following seems like a good place:

git config --global gpg.ssh.allowedSignersFile c:/Users/[YourUserName]/.ssh/allowed_signers

The allowed_signers file itself consists of one or more lines, each with a list of comma-separated principals (typically e-mail addresses) followed by a public key, i.e. something like:

you@somewhere.com ssh-ed25519 AAAA...
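The same signing machinery can be exercised directly with ssh-keygen, which is handy for checking that a key and an allowed_signers file agree before wiring them into git. A self-contained sketch with a throwaway key (paths and the principal are made up; "git" is the namespace git uses for commit signatures):

```shell
# Generate a throwaway ed25519 key pair with no passphrase
ssh-keygen -t ed25519 -N "" -f /tmp/demo_key -q

# Build an allowed_signers line: principal, then key type and public key
printf '%s %s\n' "you@somewhere.com" "$(cut -d' ' -f1-2 /tmp/demo_key.pub)" > /tmp/allowed_signers

# Sign a file in the "git" namespace; this writes /tmp/msg.sig
echo "test message" > /tmp/msg
ssh-keygen -Y sign -f /tmp/demo_key -n git /tmp/msg

# Verify against the allowed_signers file; prints "Good ... signature" on success
ssh-keygen -Y verify -f /tmp/allowed_signers -I you@somewhere.com -n git -s /tmp/msg.sig < /tmp/msg
```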

Thanks to the StackOverflow community, and specifically this answer found there.

Tuesday, July 23, 2024

What everyone is missing from the CrowdStrike Falcon incident

Introduction

25 years ago, I said (really, I did!) that automatic software updates pose a greater risk than malware (ok, at that time we really only had viruses).

Many incidents since then have proven this right, but none more so than the CrowdStrike Falcon Blue Screen of Death (BSOD) incident on July 19, 2024.

Since, as usual, the company won't release any detailed information on what really happened, we'll have to rely on other sources. I found Dave Plummer's account on YouTube to be very good and trustworthy.

What really happened?

In summary, after looking at crash dumps and based on his own knowledge of how the Windows kernel works, Dave Plummer explains that what happened was probably the following.

CrowdStrike has a need to check not only file signatures, but the behavior in general of software on the system. To do this they have created a device driver that doesn't actually interact with any hardware, but that has achieved a WHQL release signature. This means that it's very likely reliable, and certifiably from a known source. They also flagged it as a boot-start driver, meaning it's deemed necessary to boot the system. This is of course to make sure it really does get loaded, which is great - as long as it doesn't crash the system. Which, unfortunately, it did.

So how did this signed, certified driver crash the system? In short, by CrowdStrike hacking the protocol and Microsoft allowing it to happen. We don't know exactly what they do, but the challenge they have is that they feel the need to frequently update what behavior to watch for, and in order to do so, they essentially need to be able to update program logic frequently. They could do so by building a new driver with the new logic and getting it WHQL signed. There are two problems for them here. First, it takes time to get a driver certified. Second, updating a driver is not done on the fly, so a reboot is required.

The "solution"? To provide the driver with instructions that define the logic to execute, in other words, one form or another of P-code - or even machine code! Thus, they can keep the same driver, but update the logic by conceptually having the driver "call out" to external logic. This is what CrowdStrike misleadingly calls a "content update". I would call it a code update.

Essentially, by allowing the driver to read and perform instructions based on external "content", regardless of what you call it, they effectively bypass the whole point of WHQL certification of kernel mode drivers.

In the end though, it appears that it was just a trivial, embarrassing bug in the driver that caused the crash itself, in turn triggered by an equally trivial, embarrassing process error in CrowdStrike's deployment of "content updates".

The "content update" that CrowdStrike sent out was full of zeroes. Nothing else. Obviously not the intended content. And this simple data caused the driver to crash, in turn causing the system to crash since it doesn't have much choice in this situation.

As the driver is also marked as a boot-start driver, it'll always get loaded on reboot, even if it crashed last time. This is what makes the problem so time-consuming to fix; the driver can't just be flagged as faulty.

This is the mark of just plain really bad software. This software runs in kernel mode with full privileges and can do anything. And giving it a bunch of zeroes as input crashes it. Just. Not. Good. Enough.

Imagine what a creative hacker could do with that, inserting something more malicious than zeroes. How does full system control in kernel mode sound? This is of course speculation, but it's highly likely that it's possible with the current version of the driver.

What everyone seems to be missing...

In all the aftermath and all the comments, there are, in my view, two glaring omissions in the analyses.

The big failure 

How is it possible that someone sends out an update affecting the behavior of kernel mode code, all at once, simultaneously, to millions and millions of systems around the whole globe!?

I've participated in many rollouts, and never would I allow a big-bang rollout like this. CrowdStrike should be charged with negligence for having this type of process. It's just plain irresponsible.

The only reasonable way to do global rollouts, especially for kernel mode code, is to stagger them. Start with 10 systems. What happens? Do they respond properly? Then 100, then 1,000, etc. And since it's a global rollout, take time zones into consideration! As it was, we saw the problems roll around the globe, starting in Australia, with reports of downed systems as the working day started there.

I understand they want to get updates out at speed, but this is ridiculous. In the end, this caused way more damage than any possible threat they could have stopped by this procedure.

The smaller failure


The smaller failure, still not mentioned by analysts, is the fact that this kernel mode driver, which accepts external input, apparently does no input validation at all worth the name. Perhaps the content is digitally signed (I don't know, but no one has mentioned it either), but even if it is, this type of software must assume that it can't trust external content. According to Dave Plummer, the "content update" in question was all zeroes, so apparently there was at least no embedded digital signature.

If it's not - all an attacker would have to do is deposit a file in %WINDIR%\System32\drivers\CrowdStrike with a name such as C-00000291*.sys containing zeroes - and the system becomes unbootable without manual intervention!
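The principle here is basic defensive programming: treat the content file as untrusted input and verify its shape before acting on it. Purely as an illustration of the idea (the real content format is unknown to us; the magic marker and layout below are invented):

```csharp
using System;
using System.Linq;

public static class ContentValidator
{
    // Invented 4-byte magic marker for a hypothetical content format.
    private static readonly byte[] Magic = { 0x43, 0x53, 0x43, 0x55 };

    public static bool IsPlausibleContent(byte[] blob)
    {
        // Reject anything too short to even contain a header.
        if (blob == null || blob.Length < 16) return false;

        // Reject files that don't start with the expected marker;
        // an all-zeroes file fails here immediately.
        if (!blob.Take(4).SequenceEqual(Magic)) return false;

        // A real validator would go on to check version fields, lengths
        // against the actual file size, and ideally a digital signature.
        return true;
    }
}
```

With checks like these up front, a file full of zeroes would be rejected and logged instead of dereferenced.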

Once again, this is just not good enough, and should be cause for some lawsuits in the coming months. A manufacturer of critical security software executing in kernel mode should not be allowed to sell code this bad without financial consequences.

Wednesday, May 15, 2024

Localizing a .NET console or desktop application

Having gone through countless different solutions to the ever-present problem of localizing .NET applications, I think I've finally found a good one.

The main driver this time was the need to provide a convenient way to translate texts used in the Xecrets Ez desktop app for easier file encryption. It's developed with .NET 8 and Avalonia UI, which apart from C# uses its own XAML format, AXAML, with a need to reference translatable texts there too.

.NET has of course since forever supported localization, sort of, through the resource compiler, .resx files and satellite assemblies.

The problem has always been, and remains to this day: how do I get the actual translations done? Translation is often not done by the original developer, and often by people who definitely don't want to fire up Visual Studio and the resource editor. Never mind that it doesn't even have a proper translation view, with the source language and target language in the same view.

There have been various attempts at resx editors for translators, but they have typically been Windows-only desktop applications. Translators often like Apples.

For maybe ten years now, the number of web-based translation services has been increasing year by year, but their support for the resx file format is often spotty, limited, or non-existent.

I've long been looking at somehow leveraging the huge number of services and tools that use the .po format for texts, all based on the gettext utilities and specifications. However, the old C-style extraction of text from the source has never appealed to me, and it also requires a lot of parsing of C#/.NET code that is just bound to go wrong. Not to mention the various XAML dialects that might also need parsing for the gettext extraction process to work.

Then, while looking for a library to work with .po files, thinking to roll my own somehow, deep inside a related sample project I found POTools!

This little hidden gem can do a gettext-style extraction, but from a .resx file! Now things started to fall into place.

Doing the extraction, or discovery if you like, of translatable strings from a structured source like a .resx file is easy, solid and dependable - and fits the general development model of .NET better than wrapping all texts in T() method calls, similar to the original C-style gettext mode of operations. It also enables easy re-use of strings in multiple places without any problems.

With POTools I could now easily extract a template .pot file from the already existing Resources.resx file, and start looking for a suitable translation tool. There are many out there, but for my purposes I went with Loco, which offers just the right set of capabilities at the right price. The real point is that once you go with the .po/.pot format for your translations, there's a huge ecosystem out there to help you out!

As a minor issue, I decided to create a separate resource file, InvariantResources.resx, to hold the texts that by definition should never be translated, but still are suitable to keep in a resource file.

So, now I have a bunch of translated .po files, as well as the original .resx file with the original texts. How to tie all this together?

Using another part of the .NET building blocks for localization, IStringLocalizer offers a nice interface that fits well. Using sample code from the Karambolo.PO library, which can read .po files, it was trivial to create a POStringLocalizer implementation.
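The core idea can be sketched in a few lines: parse msgid/msgstr pairs from a .po file into a dictionary, and look texts up with the original string as the key, falling back to the key itself when no translation exists. This is a toy illustration only; the real POStringLocalizer uses Karambolo.PO for full .po syntax support (multi-line strings, plurals, headers, and so on):

```csharp
using System;
using System.Collections.Generic;

// Toy localizer: maps original texts (msgid) to translations (msgstr).
public class TinyPoLocalizer
{
    private readonly Dictionary<string, string> _translations = new();

    // Handles only the simplest single-line msgid "..." / msgstr "..." pairs.
    public TinyPoLocalizer(IEnumerable<string> poLines)
    {
        string? msgid = null;
        foreach (string line in poLines)
        {
            if (line.StartsWith("msgid \""))
            {
                msgid = line.Substring(7, line.Length - 8);
            }
            else if (line.StartsWith("msgstr \"") && msgid != null)
            {
                string msgstr = line.Substring(8, line.Length - 9);
                if (msgid.Length > 0 && msgstr.Length > 0)
                {
                    _translations[msgid] = msgstr;
                }
                msgid = null;
            }
        }
    }

    // Like IStringLocalizer: fall back to the key when untranslated.
    public string this[string key] =>
        _translations.TryGetValue(key, out string? value) ? value : key;
}
```

The fallback behavior is the important part: an untranslated text simply shows up in the original language instead of breaking the application.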

This is now available as a NuGet package for your convenience! There's sample code there on how to wire it all up, as well as in the GitHub repository.

So, we now can use the Portable Object file format and the gettext universe of tools and users, while still maintaining a good fit with .NET code and libraries.

Thanks go to the authors of the Karambolo.PO library and POTools!

Monday, May 10, 2021

ConcurrentDictionary.GetOrAdd() may not work as you think!

It's concurrent - not lazy

We had a problem with random crashes on a customer's web site. It was not catastrophic; it would get going again, but it was still there in the logs, and we want every page visit to work.

A bit of investigation with an IL-code decompiler followed (which, by the way, is absolutely the best thing since sliced bread!). I found code equivalent to the following in a third-party vendor's product:
private readonly ConcurrentDictionary<string, Type> _dictionary
    = new ConcurrentDictionary<string, Type>();

private readonly object _lock = new object();

public Type GetType(string typename)
{
    return _dictionary.GetOrAdd(typename,
        (t) =>
        {
            lock (_lock)
            {
                return DynamicallyGenerateType(t);
            }
        });
}
The thing here is that DynamicallyGenerateType() can only be called once per typename, since what it does is emit code into an assembly, and if you do that twice you get a System.ArgumentException: Duplicate type name within an assembly.

No one wants that, so the author thought it would be cool to use a ConcurrentDictionary<TKey, TValue>, since the GetOrAdd() method guarantees that it will get an existing value from the dictionary, or add a new one using the provided value factory and then return the value.

It looks good, reasonable, and works almost all of the time. Key word here is: almost.

What the concurrent dictionary does is it uses efficient and light-weight locking to ensure that the dictionary can be concurrently accessed and updated in a consistent and thread safe manner.

It does not guarantee a single one-time lazy call to the value factory used to add a value if it's not in the dictionary.

Sometimes, under heavy initial load, the value factory passed as the second argument to GetOrAdd() will be called twice (or more). What the concurrent dictionary guarantees is that the value for the provided key will only be set once, but the value factory may be called multiple times with the result thrown away for all calls except the race-winning one!

This is clearly documented but it's easy to miss. This is not a case of the implementation not being thread safe, as is stated in some places. It's very thread safe. But the value factory may indeed be called multiple times!
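The effect is easy to demonstrate: hammer an empty ConcurrentDictionary from many threads at once and count the factory calls. A small illustration (the deliberately slow factory widens the race window; the exact count varies from run to run):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class GetOrAddDemo
{
    public static void Main()
    {
        var dictionary = new ConcurrentDictionary<string, int>();
        int factoryCalls = 0;

        // Many threads race to add the same key.
        Parallel.For(0, 100, _ =>
            dictionary.GetOrAdd("key", k =>
            {
                Interlocked.Increment(ref factoryCalls);
                Thread.Sleep(10);
                return 42;
            }));

        // Only one value ever ends up in the dictionary...
        Console.WriteLine(dictionary["key"]);      // 42
        // ...but the factory has typically been invoked more than once.
        Console.WriteLine(factoryCalls);
    }
}
```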

To fix it, add a Lazy layer on top, because Lazy<T> by default does guarantee that its value factory is only called once!
private readonly ConcurrentDictionary<string, Lazy<Type>> _dictionary
    = new ConcurrentDictionary<string, Lazy<Type>>();

private readonly object _lock = new object();

public Type AddType(string typename)
{
    return _dictionary.GetOrAdd(typename,
        (t) =>
        {
            lock (_lock)
            {
                return DynamicallyGenerateTypeLazy(t);
            }
        }).Value;
}

Now, although we may instantiate more than one Lazy<Type> instance, and then throw it away if we lose the GetOrAdd race, that's a minor problem, and it works as it should.

Please note that this is only true as long as you use the default LazyThreadSafetyMode.ExecutionAndPublication mode.

The additional lock may look confusing, but it was in the original code and makes sense in this context: while the concurrent dictionary and lazy layer guarantee that only one call per value of 'typename' is made to DynamicallyGenerateTypeLazy(), they do not guarantee that multiple threads don't call it concurrently with different type names, and this may wreak havoc with the shared assembly that the code is generated to.

Tuesday, May 4, 2021

SSH with TortoiseGit and Bitbucket or GitHub on Windows 10

Memo to self

It's always complicated to remember the steps necessary to get SSH working, and there are some idiosyncrasies as well. This guide may help you; I'm sure it'll help me the next time I need to do this myself.

Password-based login with HTTPS is becoming obsolete, and it's less secure. Also, with the nice SSH agent in Windows 10, you only need to enter the password once - ever.

Generate a key pair

Open a command prompt and run the ssh-keygen command to generate a private and a public key file. Accept the defaults.

Enter a strong password for the private key file when asked, and ensure that you store it securely in your password manager.

This should create files in %USERPROFILE%\.ssh named id_rsa (the private key file) and id_rsa.pub (the public key file).


Enable and start the OpenSSH Authentication Agent Service

Nowadays it's shipped with Windows 10, but it's not enabled by default. So open the Services applet, and ensure the service is set to start automatically and that it's running.


Add the private key to the SSH Authentication Agent

In the command prompt, type ssh-add. It should select the default SSH key id_rsa, and ask for the password you entered previously.

(If you get the error message "Can't add keys to ssh-agent, communication with agent failed", there seems to be an issue with certain Windows distributions. For whatever reason, the following workaround appears to work. Open a new command prompt, elevated with Run As Administrator. Then type:

    sc.exe create sshd binPath=C:\Windows\System32\OpenSSH\ssh.exe .

Then exit the elevated command prompt and try again to do the ssh-add in your normal command prompt.)


Save the public key to Bitbucket...

Open the file %USERPROFILE%\.ssh\id_rsa.pub in Notepad, Select All (Ctrl-A) and Copy (Ctrl-C). Paste it into this dialog in Bitbucket, Your Profile and Settings -> Personal Settings -> SSH keys -> Add key:


The Label is just anything that makes it easy for you to remember what key it is. Perhaps today's date and the name of the computer you have the private key on is a good start. Or just "My public SSH key" works too.


...and/or save the public key to GitHub

Go to Settings -> SSH keys -> New SSH key


The Title has the same meaning as Label for Bitbucket, see above.


Remove any Credential Helpers

Git credential helpers may conflict with the use of SSH keys, and there is no need for them anyway, so remove them from TortoiseGit in the Settings -> Git -> Credential menu so it looks like this:



Tell Git where to find SSH

Set the environment variable GIT_SSH to C:\Windows\System32\OpenSSH\ssh.exe. Right-click "This PC" -> Properties -> Advanced system settings -> Environment Variables... -> New...


Restart Explorer (Task Manager -> Details -> explorer.exe right-click -> End Task, then File -> Run new Task -> Open: explorer -> OK), or log out and log in again, or restart your computer.


Update origin URL to use SSH

Finally, update your repo's origin to use SSH instead of HTTPS. The easiest way is to copy the part after 'git clone' from the Bitbucket "Clone" feature.


Click the "Clone" button, select SSH, then copy the URL part of the suggested git clone command, and paste it into TortoiseGit's Remote origin for the repo:



Done! Now you can enjoy password-less use of git with Bitbucket and/or GitHub.

Monday, May 3, 2021

Thinking about cacheability

Performance, Cacheability and Thinking Ahead

Although it's very true that "premature optimization is the root of all evil" (Hoare/Knuth), this should never be taken to mean that writing inefficient code is a good thing. Nor does it mean that we should ignore the possible need for future optimizations. As all things, it's a question of balance.

When designing APIs, for example, just how you define them may affect even the possibility of future optimizations, specifically caching. Although caching is no silver bullet, it is often the single most effective measure to increase performance, so it's definitely a very important tool in the optimization toolbox.

Let's imagine a search API, that searches for locations of gas stations within a given distance from a given point, and also filters on the brand of the station.

(In a real-life application, the actual search terms will of course be more complex, so let's assume that using an underlying search service really is relevant, which is perhaps not necessarily the case in this literal example. Also, don't worry about the use of static methods and classes in the sample code; it's just for show.)

[Route("v1/[controller]")]
public class GasStationV1Controller : ControllerBase
{
    [HttpGet]
    public IEnumerable<GasStation> Search(string brand, double lat, double lon, double distance)
    {
        return SearchService.IndexSearch(brand, lat, lon, distance);
    }
}

We're exposing a REST API that delegates the real work to a fictitious search service, accessing an index and providing results based on the search parameters, possibly adding relevance, sponsoring and other soft parameters to the search. That's not important here.

What is important is that we've decided to let the search index handle the geo-location part of the search as well, so we're indexing locations and letting the search index handle distance and nearness calculations, which on the surface of things appears to make sense. The less we do in our own code, and the more we can delegate, the better!

But unfortunately it turns out this is a little too slow, and we're also overloading the back-end search service, which has a rate-limiting function as well as a per-call pricing schedule, so it's expensive too. What to do? The obvious thing is to cache. Said and done.

[Route("v2/[controller]")]
public class GasStationV2Controller : ControllerBase
{
    [HttpGet]
    public IEnumerable<GasStation> CachedSearch(string brand, double lat, double lon, double distance)
    {
        string cacheKey = $"{brand}-{lat}-{lon}-{distance}";
        return ObjectCache.GetOrAdd(cacheKey, () => SearchService.IndexSearch(brand, lat, lon, distance));
    }
}

Now we're using our awesome ObjectCache to either get it from the cache, or if need be, call the back end service. All set, right?

Not quite.

The location we're looking to find near matches to is essentially where the user is, which means there'll be quite a bit of variation in the search parameters. In fact, there is very little chance that anything in the cache will ever be re-used. The net effect of our caching layer is just to fill server memory. We're not reducing the load on the back-end search service, and we're not speeding anything up for anyone.

The thing to consider here is that when we're designing an API that has the potential of being a bottleneck in one way or another, we should consider making it possible to add a caching layer later, even if we don't add one to begin with (remember that thing about premature optimizations).

Avoid designing low-level APIs that take essentially open-ended parameters, i.e. parameters that have effectively infinite variation, where very seldom is a set of parameters used twice. It's not always possible, it depends on the situation, but consider it.

As it turns out, our only option was to redesign what we use the search index for, and move some functionality into our own application. This is often a memory/time tradeoff, but in this case, keeping up to 100,000 gas stations in memory is not a problem, and filtering them in memory in the web server is an excellent option.

This is how it looks now, and although we're obliged to do some more work of our own, we'll be fast, and we're offloading the limited and expensive search service quite a bit.

[Route("v3/[controller]")]
public class GasStationV3Controller : ControllerBase
{
    [HttpGet]
    public IEnumerable<GasStation> Search(string brand, double lat, double lon, double distance)
    {
        string cacheKey = $"{brand}";
        return ObjectCache.GetOrAdd(cacheKey, () => SearchService.IndexSearch(brand))
            .Select(g => g.SetDistance(lat, lon))
            .Where(g => g.Distance <= distance)
            .OrderBy(g => g.Distance);
    }
}

Now we have a more manageable set of search parameters to cache, and we can still serve the user rapidly, without overloading the search service or the budget.
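The SetDistance step in the sample is straightforward to do in application code; a standard haversine great-circle calculation is usually accurate enough for near-me filtering. A sketch (the GasStation shape and the SetDistance name are modeled on the sample; the haversine details are mine):

```csharp
using System;

public class GasStation
{
    public double Lat { get; init; }
    public double Lon { get; init; }
    public double Distance { get; private set; }

    // Haversine great-circle distance in kilometers from (lat, lon) to this station.
    public GasStation SetDistance(double lat, double lon)
    {
        const double EarthRadiusKm = 6371.0;
        double dLat = ToRadians(Lat - lat);
        double dLon = ToRadians(Lon - lon);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(ToRadians(lat)) * Math.Cos(ToRadians(Lat))
                 * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
        Distance = EarthRadiusKm * 2 * Math.Asin(Math.Sqrt(a));
        return this;
    }

    private static double ToRadians(double degrees) => degrees * Math.PI / 180.0;
}
```

Returning this from SetDistance is what lets the Select/Where/OrderBy chain in the controller stay fluent.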

Taking this a step further, we'd consider moving this logic to the client, if that's reasonable, since then we can even make the HTTP response cacheable, which can further increase scalability and speed for the users.

In the end performance is always about compromises, but the lesson learned here is that even if we don't (think) we need optimization and caching at the start, we should at least consider leaving the path for it open.