<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Vorakl's notes</title><link href="https://vorakl.com/" rel="alternate"></link><link href="https://vorakl.com/atom.xml" rel="self"></link><id>https://vorakl.com/</id><updated>2024-05-19T20:32:42-07:00</updated><entry><title>How to destroy your OS with tar</title><link href="https://vorakl.com/articles/tar-curdir/" rel="alternate"></link><published>2024-05-19T20:32:42-07:00</published><updated>2024-05-19T20:32:42-07:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-05-19:/articles/tar-curdir/</id><summary type="html">&lt;p class="first last"&gt;A dangerous case of tar archive unpacking&lt;/p&gt;
</summary><content type="html">&lt;p&gt;This is a short story about how dangerous a trivial tar unpacking might be, and what can be done to minimize the risk or completely avoid it.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="the-mistake"&gt;
&lt;h2&gt;The mistake&lt;/h2&gt;
&lt;p&gt;Recently, I was practicing an installation of &lt;a class="reference external" href="https://voidlinux.org/"&gt;Void Linux&lt;/a&gt; via chroot &lt;a class="reference external" href="https://docs.voidlinux.org/installation/guides/chroot.html"&gt;using the XBPS method&lt;/a&gt;. I needed the &lt;a class="reference external" href="https://docs.voidlinux.org/xbps/index.html"&gt;XBPS Package Manager&lt;/a&gt; installed on my Fedora Linux host to prepare Void Linux's base system. One option is to download an archive of statically built tools from the official repository. I chose &lt;a class="reference external" href="https://repo-default.voidlinux.org/static/xbps-static-latest.x86_64-musl.tar.xz"&gt;https://repo-default.voidlinux.org/static/xbps-static-latest.x86_64-musl.tar.xz&lt;/a&gt;.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ tar -tf xbps-static-latest.x86_64-musl.tar.xz &lt;span class="p"&gt;|&lt;/span&gt; head

./
./usr/
./usr/bin/
./usr/bin/xbps-uunshare
./usr/bin/xbps-uhelper
./usr/bin/xbps-uchroot
./usr/bin/xbps-rindex
./usr/bin/xbps-remove
./usr/bin/xbps-reconfigure
./usr/bin/xbps-query
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;I had gotten so used to seeing 0:0 (root:root) as the owner and group of every file in such archives that I didn't even check the actual permissions and owners. I only looked at the directory structure and noticed that all the executables were conveniently located under the relative path &lt;em&gt;&amp;quot;./usr/bin/&amp;quot;&lt;/em&gt;. I quickly decided to extract them straight into my root directory, so they would be immediately available in my $PATH. This was a big mistake: had I checked, I would have seen the non-standard permissions (700) of the current directory &amp;quot;.&amp;quot; and a non-standard user:group on the entire archive content:&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ tar -tvf xbps-static-latest.x86_64-musl.tar.xz &lt;span class="p"&gt;|&lt;/span&gt; head

drwx------ duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./
drwxr-xr-x duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./usr/
drwxr-xr-x duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./usr/bin/
lrwxrwxrwx duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./usr/bin/xbps-uunshare -&amp;gt; xbps-uunshare.static
lrwxrwxrwx duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./usr/bin/xbps-uhelper -&amp;gt; xbps-uhelper.static
lrwxrwxrwx duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./usr/bin/xbps-uchroot -&amp;gt; xbps-uchroot.static
lrwxrwxrwx duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./usr/bin/xbps-rindex -&amp;gt; xbps-rindex.static
lrwxrwxrwx duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./usr/bin/xbps-remove -&amp;gt; xbps-remove.static
lrwxrwxrwx duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./usr/bin/xbps-reconfigure -&amp;gt; xbps-reconfigure.static
lrwxrwxrwx duncaen/netusers  &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2023&lt;/span&gt;-09-18 &lt;span class="m"&gt;06&lt;/span&gt;:37 ./usr/bin/xbps-query -&amp;gt; xbps-query.static
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
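&lt;p&gt;That check is easy to script. The sketch below assumes GNU tar and a POSIX &lt;em&gt;awk&lt;/em&gt;. It builds a throwaway archive with a 700 top directory (the &lt;em&gt;demo.tar&lt;/em&gt; name and its layout are hypothetical) and then looks for a &amp;quot;./&amp;quot; or &amp;quot;.&amp;quot; member in the verbose listing, i.e. the entry whose mode and owner would be applied to the &lt;em&gt;-C&lt;/em&gt; target.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;
```shell
#!/bin/sh
# Sketch (assumes GNU tar): detect whether an archive stores its own top
# directory ("./" or "."), whose metadata would be applied to the -C target.
set -eu

# Hypothetical demo archive, built in a throwaway directory
work=$(mktemp -d)
mkdir "$work/src" "$work/src/usr"
chmod 700 "$work/src"                       # mimic the 700 top directory
tar -C "$work/src" -cf "$work/demo.tar" .   # archiving "." records "./" itself
archive="$work/demo.tar"

# In GNU tar's verbose listing, the member name is the last field
top_entry=$(tar -tvf "$archive" | awk '$NF == "./" || $NF == "."')

if [ -n "$top_entry" ]; then
    echo "WARNING: $archive stores its top directory:"
    echo "$top_entry"
else
    echo "OK: no top-directory entry in $archive"
fi
```
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;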
&lt;p&gt;But not knowing that, I ran...&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ sudo tar -C / -xvfp xbps-static-latest.x86_64-musl.tar.xz
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In the seconds that followed, I watched my system rapidly fall apart. The windows of my XFCE session stopped redrawing, and then the X server itself shut down. I couldn't run sudo. I couldn't even boot my system again. It happened so quickly and unexpectedly that I could hardly believe my last command had caused the crash. Fortunately, booting into single-user mode and a detailed analysis of the tar archive revealed the root cause.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="the-root-cause"&gt;
&lt;h2&gt;The root cause&lt;/h2&gt;
&lt;p&gt;The tar archive contains the current directory &amp;quot;./&amp;quot;. Because &amp;quot;tar -C / ...&amp;quot; changes to the root directory before extracting, that &amp;quot;./&amp;quot; entry referred to &amp;quot;/&amp;quot; itself. Restoring the owner and permissions of the archive's top directory set 700 permissions and 2002:2000 as owner:group on my root directory, which broke its expected state. As a result, my own user completely lost access to the entire file system. Who could have expected that? ;)&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For this little demo, I spun up a new VM. Don't try this on your running system!&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ sudo chmod &lt;span class="m"&gt;700&lt;/span&gt; /

$ ls -ld /
drwx------ &lt;span class="m"&gt;17&lt;/span&gt; root root &lt;span class="m"&gt;4096&lt;/span&gt; Mar &lt;span class="m"&gt;27&lt;/span&gt; &lt;span class="m"&gt;11&lt;/span&gt;:24 /

$ sudo chown &lt;span class="m"&gt;2000&lt;/span&gt;:2000 /

$ sudo chown &lt;span class="m"&gt;2000&lt;/span&gt;:2000 /usr
-bash: /usr/bin/sudo: Permission denied

$ sudo -s
-bash: /usr/bin/sudo: Permission denied

$ ls -ld /
-bash: /usr/bin/ls: Permission denied
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
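&lt;p&gt;The same mechanism can also be reproduced without a VM, as a regular user, inside a scratch directory. In this sketch (assuming GNU tar and GNU coreutils &lt;em&gt;stat&lt;/em&gt;; the &lt;em&gt;payload&lt;/em&gt; and &lt;em&gt;target&lt;/em&gt; names are hypothetical), &lt;em&gt;target&lt;/em&gt; plays the role of &amp;quot;/&amp;quot;. A non-root user can't change ownership, so only the mode change is visible, but it comes from the same restore of the &amp;quot;./&amp;quot; metadata.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;
```shell
#!/bin/sh
# Sketch (assumes GNU tar, GNU stat): reproduce the root cause safely.
# "target" is a hypothetical stand-in for "/"; run as a regular user.
set -eu

work=$(mktemp -d)
mkdir "$work/payload" "$work/payload/usr" "$work/target"
chmod 700 "$work/payload"    # the archive's top directory: 700
chmod 755 "$work/target"     # the extraction target starts as 755

# Archiving "." stores the top directory itself as the "./" member
tar -C "$work/payload" -cf "$work/demo.tar" .

before=$(stat -c '%a' "$work/target")
# Default --overwrite-dir: the "./" metadata is applied to the target itself
tar -C "$work/target" -xpf "$work/demo.tar"
after=$(stat -c '%a' "$work/target")

echo "target mode: $before -> $after"
```
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;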
&lt;/div&gt;
&lt;div class="section" id="what-can-be-done-to-prevent-it"&gt;
&lt;h2&gt;What can be done to prevent it?&lt;/h2&gt;
&lt;p&gt;In general, it is convenient to create a new archive with a relative directory tree using a command similar to&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ tar -C /path/to/rootfs -czf myarchive.tar.gz .
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;because you don't have to worry about the internal directory structure, and it's just one command: all files are addressed with a single &lt;em&gt;&amp;quot;.&amp;quot;&lt;/em&gt;. It is also handy during extraction, since &lt;em&gt;&amp;quot;-C /some/path/&amp;quot;&lt;/em&gt; lets you choose any destination directory. On the other hand, this approach adds the current directory itself to the archive (the top &amp;quot;./&amp;quot; entry in the listing above), which cancels out all that convenience. The default behavior of GNU tar is to &lt;em&gt;&amp;quot;overwrite metadata of existing directories when extracting&amp;quot;&lt;/em&gt;, which is equivalent to the &lt;em&gt;--overwrite-dir&lt;/em&gt; option. For example, if an archive contains a backup of users' home directories with all the necessary permissions, it could be super easy to restore them by running something like &lt;em&gt;&amp;quot;tar -C /home -xpf homes.tar.gz&amp;quot;&lt;/em&gt;. But this is only safe if the archive doesn't contain the current directory; otherwise the owner and permissions of &lt;em&gt;&amp;quot;/home/&amp;quot;&lt;/em&gt; itself are overwritten as well.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A good way to avoid such pitfalls is to add the &lt;strong&gt;--no-overwrite-dir&lt;/strong&gt; option, which &lt;em&gt;&amp;quot;preserves metadata of existing directories&amp;quot;&lt;/em&gt;. So, if you run something like &lt;em&gt;&amp;quot;tar -C /home --no-overwrite-dir -xpf homes.tar.gz&amp;quot;&lt;/em&gt;, all existing directories (including the current one) will remain unchanged!&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
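&lt;p&gt;The effect of &lt;em&gt;--no-overwrite-dir&lt;/em&gt; can be checked in a scratch directory as a regular user. This is a sketch assuming GNU tar and GNU &lt;em&gt;stat&lt;/em&gt;, with hypothetical directory names: the archive's top directory is 700, the new &lt;em&gt;usr/&lt;/em&gt; subdirectory is still extracted, but the existing target keeps its 755 mode.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;
```shell
#!/bin/sh
# Sketch (assumes GNU tar, GNU stat): --no-overwrite-dir keeps the
# metadata of directories that already exist, including the target itself.
set -eu

work=$(mktemp -d)
mkdir "$work/payload" "$work/payload/usr" "$work/target"
chmod 700 "$work/payload"                       # top directory in the archive
chmod 755 "$work/target"                        # pre-existing target
tar -C "$work/payload" -cf "$work/demo.tar" .   # archive includes "./"

tar -C "$work/target" --no-overwrite-dir -xpf "$work/demo.tar"

mode=$(stat -c '%a' "$work/target")
echo "target mode after extraction: $mode"
```
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;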
&lt;p&gt;There are also a few ways to create an archive without the current directory entry, but most of them require either changing the directory beforehand or listing every file and directory that should go into the archive. However, I found a way that, although it looks odd, does the job in one command:&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ tar --transform&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;s|tmp/rootfs|.|&amp;#39;&lt;/span&gt; --show-transformed-names -cvf myarchive.tar /tmp/rootfs/*

&lt;span class="c1"&gt;# or without a verbose mode&lt;/span&gt;

$ tar --transform&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;s|tmp/rootfs|.|&amp;#39;&lt;/span&gt; -cf myarchive.tar /tmp/rootfs/*
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Thanks to &lt;a class="reference external" href="http://eradman.com/"&gt;Eric Radman&lt;/a&gt; for pointing out that BSD tar has another option, &lt;a class="reference external" href="https://man.openbsd.org/tar#s"&gt;-s&lt;/a&gt;, for similar functionality.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
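&lt;p&gt;To confirm that the &lt;em&gt;--transform&lt;/em&gt; trick really produces an archive without a &amp;quot;./&amp;quot; member, here is a sketch that assumes GNU tar and uses a hypothetical scratch tree instead of &lt;em&gt;/tmp/rootfs&lt;/em&gt;. Two caveats apply to the original command as well: the shell glob &lt;em&gt;*&lt;/em&gt; skips hidden entries directly under the source directory, and GNU tar prints a notice that it is removing the leading &amp;quot;/&amp;quot; from member names.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;
```shell
#!/bin/sh
# Sketch (assumes GNU tar): archive a tree's contents without its top
# directory entry, using --transform instead of a prior "cd".
set -eu

work=$(mktemp -d)
src="$work/rootfs"                  # hypothetical stand-in for /tmp/rootfs
mkdir -p "$src/usr/bin"
touch "$src/usr/bin/tool"

# The source prefix (without its leading "/") is rewritten to "."
# Note: the glob does not match hidden entries directly under $src.
tar --transform="s|${src#/}|.|" -cf "$work/myarchive.tar" "$src"/*

tar -tf "$work/myarchive.tar"       # no "./" entry, no "rootfs" prefix
```
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;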
&lt;p&gt;Another fairly typical way to create such archives (packages) is to use &lt;a class="reference external" href="https://wiki.debian.org/FakeRoot"&gt;fakeroot&lt;/a&gt;. It runs as an unprivileged user and makes all files appear to be owned by root. In fact, it's just an illusion: the real owners on disk stay the same. Let's have a look at the directory with the original xbps tools extracted:&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ tree -agpu xbps-tools/ &lt;span class="p"&gt;|&lt;/span&gt; head
&lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-tools/
├── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  usr
│   └── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  bin
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-alternatives -&amp;gt; xbps-alternatives.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-alternatives.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-checkvers -&amp;gt; xbps-checkvers.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-checkvers.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-create -&amp;gt; xbps-create.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-create.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-dgraph -&amp;gt; xbps-dgraph.static
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And this is how it looks under &lt;em&gt;fakeroot&lt;/em&gt;:&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ fakeroot /bin/bash

root@localhost&amp;gt; tree -agpu xbps-tools/ &lt;span class="p"&gt;|&lt;/span&gt; head
&lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-tools/
├── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  usr
│   └── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  bin
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx root     root    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-alternatives -&amp;gt; xbps-alternatives.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-alternatives.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx root     root    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-checkvers -&amp;gt; xbps-checkvers.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-checkvers.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx root     root    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-create -&amp;gt; xbps-create.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-create.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx root     root    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-dgraph -&amp;gt; xbps-dgraph.static
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This fake environment allows you to create a tar archive with files owned by root without changing their real owners.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
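&lt;p&gt;If &lt;em&gt;fakeroot&lt;/em&gt; isn't at hand, GNU tar can record a different owner by itself. This is a different technique from fakeroot's illusion, swapped in here only as an alternative. The sketch below assumes GNU tar's &lt;em&gt;--owner&lt;/em&gt;, &lt;em&gt;--group&lt;/em&gt; and &lt;em&gt;--numeric-owner&lt;/em&gt; options and uses a hypothetical scratch tree. Every member is stored as 0/0 while the files on disk keep their real owner. Archiving the &lt;em&gt;usr&lt;/em&gt; subtree by name also keeps a &amp;quot;./&amp;quot; entry out of the archive.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;
```shell
#!/bin/sh
# Sketch (assumes GNU tar): store root ownership in the archive without
# fakeroot and without changing the owners of the files on disk.
set -eu

work=$(mktemp -d)
mkdir -p "$work/xbps-tools/usr/bin"        # hypothetical stand-in tree
touch "$work/xbps-tools/usr/bin/tool"

# Archive only the "usr" subtree, so no "./" entry is stored either
tar -C "$work/xbps-tools" --owner=0 --group=0 --numeric-owner \
    -cf "$work/root-owned.tar" usr

tar -tvf "$work/root-owned.tar"            # owner column shows 0/0
ls -ln "$work/xbps-tools/usr/bin"          # files on disk keep your UID
```
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;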
&lt;p&gt;One more nice solution is to use the &lt;em&gt;cpio&lt;/em&gt; tool to create or extract &lt;a class="reference external" href="https://vorakl.com/articles/posix/"&gt;POSIX&lt;/a&gt; tar archives. This format can be selected during archive creation by adding &lt;em&gt;&amp;quot;-H ustar&amp;quot;&lt;/em&gt;. During extraction, the format is detected automatically, and &lt;em&gt;cpio&lt;/em&gt; doesn't change the permissions of the current directory, even if it exists in the archive! If you add the &lt;em&gt;&amp;quot;-d&amp;quot;&lt;/em&gt; option and run &lt;em&gt;cpio&lt;/em&gt; with &lt;em&gt;sudo&lt;/em&gt;, any missing leading directories that aren't stored in the archive will be created as root:root, which is also very convenient.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ tree -agpu newroot/
&lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  newroot/

$ xz -cd xbps-static-latest.x86_64-musl.tar.xz &lt;span class="p"&gt;|&lt;/span&gt; sudo cpio -D newroot -idv
.
./usr
./usr/bin
./usr/bin/xbps-uunshare
./usr/bin/xbps-uhelper
./usr/bin/xbps-uchroot
./usr/bin/xbps-rindex
./usr/bin/xbps-remove
./usr/bin/xbps-reconfigure
./usr/bin/xbps-query
./usr/bin/xbps-pkgdb
./usr/bin/xbps-install
./usr/bin/xbps-fetch
./usr/bin/xbps-fbulk
./usr/bin/xbps-digest
./usr/bin/xbps-dgraph
./usr/bin/xbps-create
./usr/bin/xbps-checkvers
./usr/bin/xbps-alternatives
./usr/bin/xbps-alternatives.static
./usr/bin/xbps-checkvers.static
./usr/bin/xbps-create.static
./usr/bin/xbps-dgraph.static
./usr/bin/xbps-digest.static
./usr/bin/xbps-fbulk.static
./usr/bin/xbps-fetch.static
./usr/bin/xbps-install.static
./usr/bin/xbps-pkgdb.static
./usr/bin/xbps-query.static
./usr/bin/xbps-reconfigure.static
./usr/bin/xbps-remove.static
./usr/bin/xbps-rindex.static
./usr/bin/xbps-uchroot.static
./usr/bin/xbps-uhelper.static
./usr/bin/xbps-uunshare.static
./var
./var/db
./var/db/xbps
./var/db/xbps/keys
./var/db/xbps/keys/60:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
./var/db/xbps/keys/3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
&lt;span class="m"&gt;179893&lt;/span&gt; blocks


$ tree -agpu newroot/ &lt;span class="p"&gt;|&lt;/span&gt; head
&lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  newroot/
├── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  usr
│   └── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  bin
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-alternatives -&amp;gt; xbps-alternatives.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-alternatives.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-checkvers -&amp;gt; xbps-checkvers.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-checkvers.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-create -&amp;gt; xbps-create.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-create.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-dgraph -&amp;gt; xbps-dgraph.static
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Note that &lt;em&gt;newroot/&lt;/em&gt; was left untouched and is still owned by root:root with 755 permissions. But &lt;em&gt;cpio&lt;/em&gt; can do even more. You can create a POSIX tar archive and easily control which files go into it, because &lt;em&gt;cpio&lt;/em&gt; reads the list of files to archive from its standard input. So you can get the file list with &lt;em&gt;find&lt;/em&gt; and then filter out (for this particular example) &lt;em&gt;.&lt;/em&gt;, &lt;em&gt;./usr&lt;/em&gt;, &lt;em&gt;./usr/bin&lt;/em&gt;, &lt;em&gt;./var&lt;/em&gt;, and &lt;em&gt;./var/db&lt;/em&gt;, and that's it. Extracting such an archive leaves the owner and permissions of these system directories alone, while the relative directory structure inside is preserved. Here is an example of how I created a tar archive with &lt;em&gt;cpio&lt;/em&gt;, without any &amp;quot;system&amp;quot; directories, and then extracted it with &lt;em&gt;tar&lt;/em&gt; in the usual way:&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Create a tar archive with &amp;#39;cpio&amp;#39; of previously unpacked xbps tools&lt;/span&gt;
$ &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; xbps-tools &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; find . &lt;span class="p"&gt;|&lt;/span&gt; grep -v -e &lt;span class="s1"&gt;&amp;#39;^\.$&amp;#39;&lt;/span&gt; -e &lt;span class="s1"&gt;&amp;#39;^\./usr$&amp;#39;&lt;/span&gt; -e &lt;span class="s1"&gt;&amp;#39;^\./usr/bin$&amp;#39;&lt;/span&gt; -e &lt;span class="s1"&gt;&amp;#39;^\./var$&amp;#39;&lt;/span&gt; -e &lt;span class="s1"&gt;&amp;#39;^\./var/db$&amp;#39;&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; cpio -ov -H ustar &amp;gt; ../myxbps.tar&lt;span class="o"&gt;)&lt;/span&gt;
./var/db/xbps/
./var/db/xbps/keys/
./var/db/xbps/keys/3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
./var/db/xbps/keys/60:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
./usr/bin/xbps-uunshare.static
./usr/bin/xbps-uhelper.static
./usr/bin/xbps-uchroot.static
./usr/bin/xbps-rindex.static
./usr/bin/xbps-remove.static
./usr/bin/xbps-reconfigure.static
./usr/bin/xbps-query.static
./usr/bin/xbps-pkgdb.static
./usr/bin/xbps-install.static
./usr/bin/xbps-fetch.static
./usr/bin/xbps-fbulk.static
./usr/bin/xbps-digest.static
./usr/bin/xbps-dgraph.static
./usr/bin/xbps-create.static
./usr/bin/xbps-checkvers.static
./usr/bin/xbps-alternatives.static
./usr/bin/xbps-alternatives
./usr/bin/xbps-checkvers
./usr/bin/xbps-create
./usr/bin/xbps-dgraph
./usr/bin/xbps-digest
./usr/bin/xbps-fbulk
./usr/bin/xbps-fetch
./usr/bin/xbps-install
./usr/bin/xbps-pkgdb
./usr/bin/xbps-query
./usr/bin/xbps-reconfigure
./usr/bin/xbps-remove
./usr/bin/xbps-rindex
./usr/bin/xbps-uchroot
./usr/bin/xbps-uhelper
./usr/bin/xbps-uunshare
&lt;span class="m"&gt;179889&lt;/span&gt; blocks

$ file myxbps.tar
myxbps.tar: POSIX tar archive

&lt;span class="c1"&gt;# Check with &amp;#39;tar&amp;#39; that all files have non root user/group and the archive doesn&amp;#39;t contain . /usr /usr/bin /var /var/db&lt;/span&gt;
$ tar -tvf myxbps.tar
drwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 var/db/xbps/
drwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 var/db/xbps/keys/
-rw-r--r-- &lt;span class="m"&gt;2002&lt;/span&gt;/2000      &lt;span class="m"&gt;1410&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 var/db/xbps/keys/3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
-rw-r--r-- &lt;span class="m"&gt;2002&lt;/span&gt;/2000      &lt;span class="m"&gt;1410&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 var/db/xbps/keys/60:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5623104&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-uunshare.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5643584&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-uhelper.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5631296&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-uchroot.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;6414144&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-rindex.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5779264&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-remove.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5643904&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-reconfigure.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5685440&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-query.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5643904&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-pkgdb.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5787648&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-install.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5639488&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-fetch.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5631296&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-fbulk.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5623104&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-digest.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5640384&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-dgraph.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;6402240&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-create.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5644032&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-checkvers.static
-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;/2000   &lt;span class="m"&gt;5643904&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-alternatives.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-alternatives -&amp;gt; xbps-alternatives.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-checkvers -&amp;gt; xbps-checkvers.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-create -&amp;gt; xbps-create.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-dgraph -&amp;gt; xbps-dgraph.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-digest -&amp;gt; xbps-digest.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-fbulk -&amp;gt; xbps-fbulk.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-fetch -&amp;gt; xbps-fetch.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-install -&amp;gt; xbps-install.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-pkgdb -&amp;gt; xbps-pkgdb.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-query -&amp;gt; xbps-query.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-reconfigure -&amp;gt; xbps-reconfigure.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-remove -&amp;gt; xbps-remove.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-rindex -&amp;gt; xbps-rindex.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-uchroot -&amp;gt; xbps-uchroot.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-uhelper -&amp;gt; xbps-uhelper.static
lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;/2000         &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;2024&lt;/span&gt;-05-21 &lt;span class="m"&gt;16&lt;/span&gt;:04 usr/bin/xbps-uunshare -&amp;gt; xbps-uunshare.static

&lt;span class="c1"&gt;# Created a new directory to emulate a root file system&lt;/span&gt;
$ tree -agpu newroot2/
&lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  newroot2/
├── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  usr
│   └── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  bin
└── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  var
    └── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  db

&lt;span class="c1"&gt;# Extract with &amp;#39;tar&amp;#39; in a usual way&lt;/span&gt;
$ sudo tar -C newroot2 -xvf myxbps.tar
var/db/xbps/
var/db/xbps/keys/
var/db/xbps/keys/3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
var/db/xbps/keys/60:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
usr/bin/xbps-uunshare.static
usr/bin/xbps-uhelper.static
usr/bin/xbps-uchroot.static
usr/bin/xbps-rindex.static
usr/bin/xbps-remove.static
usr/bin/xbps-reconfigure.static
usr/bin/xbps-query.static
usr/bin/xbps-pkgdb.static
usr/bin/xbps-install.static
usr/bin/xbps-fetch.static
usr/bin/xbps-fbulk.static
usr/bin/xbps-digest.static
usr/bin/xbps-dgraph.static
usr/bin/xbps-create.static
usr/bin/xbps-checkvers.static
usr/bin/xbps-alternatives.static
usr/bin/xbps-alternatives
usr/bin/xbps-checkvers
usr/bin/xbps-create
usr/bin/xbps-dgraph
usr/bin/xbps-digest
usr/bin/xbps-fbulk
usr/bin/xbps-fetch
usr/bin/xbps-install
usr/bin/xbps-pkgdb
usr/bin/xbps-query
usr/bin/xbps-reconfigure
usr/bin/xbps-remove
usr/bin/xbps-rindex
usr/bin/xbps-uchroot
usr/bin/xbps-uhelper
usr/bin/xbps-uunshare

$ tree -agpu newroot2/
&lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  newroot2/
├── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  usr
│   └── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  bin
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-alternatives -&amp;gt; xbps-alternatives.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-alternatives.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-checkvers -&amp;gt; xbps-checkvers.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-checkvers.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-create -&amp;gt; xbps-create.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-create.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-dgraph -&amp;gt; xbps-dgraph.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-dgraph.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-digest -&amp;gt; xbps-digest.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-digest.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-fbulk -&amp;gt; xbps-fbulk.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-fbulk.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-fetch -&amp;gt; xbps-fetch.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-fetch.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-install -&amp;gt; xbps-install.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-install.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-pkgdb -&amp;gt; xbps-pkgdb.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-pkgdb.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-query -&amp;gt; xbps-query.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-query.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-reconfigure -&amp;gt; xbps-reconfigure.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-reconfigure.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-remove -&amp;gt; xbps-remove.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-remove.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-rindex -&amp;gt; xbps-rindex.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-rindex.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-uchroot -&amp;gt; xbps-uchroot.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-uchroot.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-uhelper -&amp;gt; xbps-uhelper.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-uhelper.static
│       ├── &lt;span class="o"&gt;[&lt;/span&gt;lrwxrwxrwx &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-uunshare -&amp;gt; xbps-uunshare.static
│       └── &lt;span class="o"&gt;[&lt;/span&gt;-rwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps-uunshare.static
└── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  var
    └── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x root     root    &lt;span class="o"&gt;]&lt;/span&gt;  db
        └── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  xbps
            └── &lt;span class="o"&gt;[&lt;/span&gt;drwxr-xr-x &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  keys
                ├── &lt;span class="o"&gt;[&lt;/span&gt;-rw-r--r-- &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  3d:b9:c0:50:41:a7:68:4c:2e:2c:a9:a2:5a:04:b7:3f.plist
                └── &lt;span class="o"&gt;[&lt;/span&gt;-rw-r--r-- &lt;span class="m"&gt;2002&lt;/span&gt;     &lt;span class="m"&gt;2000&lt;/span&gt;    &lt;span class="o"&gt;]&lt;/span&gt;  &lt;span class="m"&gt;60&lt;/span&gt;:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d.plist
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Note that all &amp;quot;system&amp;quot; directories, such as &lt;em&gt;/usr&lt;/em&gt;, &lt;em&gt;/usr/bin&lt;/em&gt;, or &lt;em&gt;/var/db&lt;/em&gt;, are left unmodified, keeping their original owners and permissions.
In fact, you can get the same result with &lt;em&gt;tar&lt;/em&gt; as well:&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; xbps-tools &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; find . &lt;span class="p"&gt;|&lt;/span&gt; grep -v -e &lt;span class="s1"&gt;&amp;#39;^\.$&amp;#39;&lt;/span&gt; -e &lt;span class="s1"&gt;&amp;#39;^\./usr$&amp;#39;&lt;/span&gt; -e &lt;span class="s1"&gt;&amp;#39;^\./usr/bin$&amp;#39;&lt;/span&gt; -e &lt;span class="s1"&gt;&amp;#39;^\./var$&amp;#39;&lt;/span&gt; -e &lt;span class="s1"&gt;&amp;#39;^\./var/db$&amp;#39;&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; tar --verbatim-files-from -T - -cvf ../myxbps.tar&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;That's how I would create an archive whose files are meant to be extracted into the root filesystem.&lt;/p&gt;
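&lt;p&gt;Another option is to skip filtering the &lt;em&gt;find&lt;/em&gt; output and name only the entries that should be created on the target. The following is a minimal sketch, not the method used above. The function name &lt;em&gt;make_xbps_tar&lt;/em&gt; is hypothetical. The sketch assumes the &lt;em&gt;xbps-tools&lt;/em&gt; directory used above has the binaries in usr/bin/xbps-* and the keys under var/db/xbps/.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```shell
# Hypothetical helper (not from the original article): archive only the
# entries that should be created on the target system, naming them
# explicitly instead of archiving the whole tree.
# Assumes the layout SRC/usr/bin/xbps-* and SRC/var/db/xbps/.
make_xbps_tar() {
    src=$1
    # Resolve a relative output path before changing directory.
    case $2 in
        /*) out=$2 ;;
        *)  out=$PWD/$2 ;;
    esac
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$src" || exit 1
        # var/db/xbps is not expected to exist on the target yet, so it is
        # archived as a whole. The binaries and symlinks are named one by
        # one, so the archive gets no entries for usr/, usr/bin/, var/ or
        # var/db/.
        tar -cf "$out" var/db/xbps usr/bin/xbps-*
    )
}

# Usage: make_xbps_tar xbps-tools myxbps.tar
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Listing the files explicitly ties the command to this particular tool tree. In exchange, no &amp;quot;system&amp;quot; directory entries can end up in the archive.&lt;/p&gt;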
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Do not blindly extract an archive if you don't know what it contains! It could be fatal to your system.&lt;/p&gt;
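&lt;p&gt;A simple precaution is to inspect the archive listing before extracting it as root. The sketch below only checks whether the archive contains an entry for the extraction directory itself, listed as &lt;em&gt;.&lt;/em&gt; or &lt;em&gt;./&lt;/em&gt;. The function name &lt;em&gt;has_curdir_entry&lt;/em&gt; is hypothetical and not part of tar. The check does not catch other entries that may overwrite existing directories, so reviewing the full &lt;em&gt;tar -tvf&lt;/em&gt; output is still worthwhile.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```shell
# Hypothetical pre-flight check (not from the original article): return
# status 0 if a tar archive contains an entry for the extraction directory
# itself, listed as "." or "./". If the archive is extracted as root with -C /,
# such an entry may change the owner and mode of / itself.
has_curdir_entry() {
    tar -tf "$1" | grep -qx -e '\.' -e '\./'
}

# Usage:
# if has_curdir_entry xbps-static-latest.x86_64-musl.tar.xz; then
#     echo "contains a '.' entry: extract into an empty directory first"
# fi
```
&lt;/pre&gt;&lt;/div&gt;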
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="os"></category><category term="linux"></category><category term="tools"></category></entry><entry><title>A few facts about POSIX</title><link href="https://vorakl.com/articles/posix/" rel="alternate"></link><published>2024-04-23T10:45:58-07:00</published><updated>2024-04-23T10:45:58-07:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-04-23:/articles/posix/</id><summary type="html">&lt;p class="first last"&gt;A journey to portable software&lt;/p&gt;
</summary><content type="html">&lt;p&gt;&lt;a class="reference internal" href="#summary"&gt;TLDR: quick summary of the article&lt;/a&gt;&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="how-did-we-get-there"&gt;
&lt;h2&gt;How did we get there?&lt;/h2&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In the early days of computing, programmers could only dream of portability. All programs were written directly in machine code for each computer architecture they were intended to run on. &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Assembly_language"&gt;Assembly languages&lt;/a&gt;, with mnemonic names for each CPU instruction and other goodies, made programmers' lives a little easier, but programs were still architecture-specific. Operating systems (OS) had not yet been invented, so a program not only controlled the entire computer system, it also had to initialize and manage the peripherals. In fact, such bare-metal programs implemented drivers for every device they used. And every time a program needed to run on hardware with a different architecture, it was literally rewritten to accommodate differences in the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Instruction_set_architecture"&gt;CPU instruction&lt;/a&gt; set, memory layout, and so on.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is exactly what happened with Unix, which was originally written in assembly language by Ken Thompson over 50 years ago. The first versions of Unix were written for the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/PDP-7"&gt;PDP-7&lt;/a&gt; platform, and porting it to the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/PDP-11"&gt;PDP-11&lt;/a&gt; meant rewriting the code. When Dennis Ritchie created the C programming language and &lt;a class="reference external" href="https://www.invent.org/sites/default/files/2019-02/Inductee-UNIX_Thompson_Ritchie.jpg"&gt;together they&lt;/a&gt; rewrote most of the Unix code in it, software portability suddenly became possible. There are two main reasons for this. First, code written in a high-level programming language is platform-agnostic, because compilers translate it into instructions for the target architecture. This matters even more for target systems based on &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Reduced_instruction_set_computer"&gt;RISC CPUs&lt;/a&gt;, as they typically require significantly more assembly instructions for the same task than &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Complex_instruction_set_computer"&gt;CISC CPU&lt;/a&gt; architectures. Even porting Unix itself to another platform became mostly a matter of adapting the architecture-dependent parts of the code. Second, the operating system abstracts away all hardware specifics from user programs. Programmers no longer have to implement multitasking, memory management, or drivers for different devices as they used to, because all of that is part of the OS kernel and runs in the kernel address space. In contrast, user programs run in the user address space and access all of the features provided by the OS through the system call interface.
In &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Real-time_operating_system"&gt;Real-time OSes&lt;/a&gt;, such as &lt;a class="reference external" href="https://www.zephyrproject.org/"&gt;Zephyr OS&lt;/a&gt;, it's &lt;a class="reference external" href="https://www.youtube.com/watch?v=4_uL43V79xw"&gt;slightly different&lt;/a&gt;, but the idea of memory isolation and protection for user programs is preserved. This leads to two conclusions:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;em&gt;User programs become portable when they are written in a high-level programming language for a particular OS&lt;/em&gt;. Once both requirements are met, programs are compiled into instructions for a target CPU and linked with system functions provided by the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/C_standard_library"&gt;libc&lt;/a&gt; and OS-specific libraries to access the underlying hardware.&lt;/li&gt;
&lt;li&gt;Portability is intended to be achieved &lt;strong&gt;at the source code level&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="the-birth-of-posix"&gt;
&lt;h2&gt;The birth of POSIX&lt;/h2&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This could have been the end of the story, but something fateful happened. Due to a legal restriction, AT&amp;amp;T was not allowed to sell Unix, so there was no money to be made from the newly born OS, even though it became increasingly popular after it was introduced to the world. However, AT&amp;amp;T could distribute Unix to any interested organization for the cost of the media. That's how Unix reached Berkeley in 1974, along with many other places, leading to the creation of a number of OS derivatives. Some of the best known and still popular today are OSes based on the software distributed by Berkeley (BSD), e.g. FreeBSD and OpenBSD. Despite sharing the same ancestors and principles, each operating system followed its own path. Each had its own interface (API) and its own implementation of kernel subsystems, syscalls, system tools, etc. Even libc, which provides common functionality and wrappers on top of syscalls, used to be very OS-specific. All of these OSes were Unix-like, but at the same time, it wasn't possible to take the source code of a program written for one OS and recompile it on another.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Over 35 years ago, these software portability problems led to the emergence of the first &lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html"&gt;POSIX standard&lt;/a&gt; in 1988. The acronym &lt;a class="reference external" href="https://opensource.com/article/19/7/what-posix-richard-stallman-explains"&gt;was coined by Richard Stallman&lt;/a&gt;, who added &amp;quot;X&amp;quot; to the end of &lt;em&gt;Portable Operating System Interface&lt;/em&gt;. The &lt;em&gt;POSIX™&lt;/em&gt; trademark is currently owned by the &lt;a class="reference external" href="https://www.ieee.org/about/index.html"&gt;IEEE&lt;/a&gt;, and &lt;em&gt;UNIX®&lt;/em&gt; is a registered trademark of &lt;a class="reference external" href="https://www.opengroup.org/about-us"&gt;The Open Group&lt;/a&gt;. The standard is meant to provide a &lt;a class="reference external" href="https://www.techtarget.com/whatis/definition/POSIX-Portable-Operating-System-Interface"&gt;specification of the interface&lt;/a&gt; that different Unix operating systems should have in common, including &lt;a class="reference external" href="https://stackoverflow.com/a/31865755"&gt;programming languages and tools&lt;/a&gt;. It's important to note that &lt;strong&gt;the interface is portable&lt;/strong&gt;, not the implementation.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This was the common ground that made it possible to compile the same source code of a user program on any OS without modification, as long as both sides strictly followed the same standard. This is still true today, but only to some extent: most modern and widely used Unix-like systems, such as Linux and &lt;cite&gt;*BSD&lt;/cite&gt;, do not strictly and completely follow the POSIX standard, but rather use it as a guide. In addition to POSIX, there is also the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Single_UNIX_Specification"&gt;Single UNIX Specification&lt;/a&gt; (SUS), which was consolidated with several POSIX standards in 2001. Today, the latest SUS (SUSv4 2018) extends the latest POSIX standard (POSIX.1-2017), which is essentially its base specification, with the X/Open Curses specification. There are &lt;a class="reference external" href="https://en.wikipedia.org/wiki/POSIX#POSIX-oriented_operating_systems"&gt;a number of operating systems, such as macOS&lt;/a&gt;, that are fully compliant with the POSIX and SUS standards and pass The Open Group conformance tests. They can therefore be called &lt;a class="reference external" href="https://www.opengroup.org/openbrand/register/"&gt;Unix operating systems&lt;/a&gt;, not just Unix-like.
Originally, POSIX was created only for Unix-like OSes, but over time it became so popular that its specification was partially implemented in non-Unix OSes in the form of an &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Operating_system_abstraction_layer"&gt;Operating System Abstraction Layer (OSAL)&lt;/a&gt;. Such a layer covers only the subset of the interface applicable to the target system. Examples include &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Cygwin"&gt;Windows&lt;/a&gt; (via Cygwin), &lt;a class="reference external" href="https://www.freertos.org/FreeRTOS-Plus/FreeRTOS_Plus_POSIX/index.html"&gt;FreeRTOS&lt;/a&gt;, and &lt;a class="reference external" href="https://docs.zephyrproject.org/latest/services/portability/posix/index.html"&gt;Zephyr&lt;/a&gt;.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="the-posix-spec"&gt;
&lt;h2&gt;The POSIX spec&lt;/h2&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The very first standard was ratified by the IEEE in 1988 as IEEE Std 1003.1-1988, so it's called &lt;em&gt;POSIX.1-1988&lt;/em&gt;. Since then, the standard has gone through several revisions, with different subsets of the specification being ratified under different names. For example, &lt;em&gt;POSIX.1-1990&lt;/em&gt; (IEEE Std 1003.1-1990) defined &lt;em&gt;the system interface and computing environment&lt;/em&gt;, &lt;em&gt;POSIX.2&lt;/em&gt; (IEEE Std 1003.2-1992) defined &lt;em&gt;command language (shell) and tools&lt;/em&gt;, etc. A good, brief overview of the standard's revisions can be found in the &lt;a class="reference external" href="https://man7.org/linux/man-pages/man7/standards.7.html"&gt;standards(7)&lt;/a&gt; Linux man page. You may even come across references to old revisions such as POSIX.2 when reading the &lt;a class="reference external" href="https://git.savannah.gnu.org/cgit/bash.git/tree/jobs.c#n4269"&gt;Bash source code&lt;/a&gt;. In 2001, POSIX.1, POSIX.2, and the Single UNIX Specification (SUS) were merged into a single document called &lt;em&gt;POSIX.1-2001&lt;/em&gt;. Despite the somewhat misleading name, it does include the shell and tools specifications from POSIX.2. &lt;strong&gt;The latest version of the standard is POSIX.1-2017&lt;/strong&gt;, also known as &lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/nframe.html"&gt;IEEE Std 1003.1-2017&lt;/a&gt;, which is almost identical to POSIX.1-2008.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The standard describes a specification that spans two environments (build time and runtime). It is divided into four volumes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/toc.html"&gt;Base Definitions&lt;/a&gt;: defines common to all volumes general terms and concepts, conformant requirements (symbolic constants, options, option groups), computing environment (locales, regexp, directory structure, tty, environment variables, etc), and C-language header files which need to be implemented by the compliant systems.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/idx/xsh.html"&gt;System Interfaces&lt;/a&gt;:  defines the C language standard (&lt;a class="reference external" href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf"&gt;ISO C99, ISO/IEC 9899:1999&lt;/a&gt;), system service functions, and the extension of the C standard library (libc) in terms of header files and functions.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/idx/xcu.html"&gt;Shell &amp;amp; Utilities&lt;/a&gt;: defines a source code-level interface to the Shell Command Language (sh) and the system utilities (awk, sed, wc, cat, ...), including behavior, command line parameters, exit statuses, etc.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/idx/xrat.html"&gt;Rationale&lt;/a&gt;: includes considerations for portability, subprofiling, option groups, and additional rationale that didn't fit any other volumes.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The current POSIX standard defines source code-level compatibility for &lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html#tag_02_04"&gt;only two programming languages&lt;/a&gt;: &lt;em&gt;the C language (C99)&lt;/em&gt; and &lt;em&gt;the shell command language&lt;/em&gt;. However, some of the programs defined under &amp;quot;Utilities&amp;quot;, such as &lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html"&gt;awk&lt;/a&gt;, also have their own language. Strictly speaking, a C standard library (libc) doesn't have to implement any functionality (functions and headers) beyond what the C standard (ISO C99 in this case) defines, but most implementations do. For example, the ISO C99 standard defines 24 header files, including math functions (&amp;lt;math.h&amp;gt;), standard input/output (&amp;lt;stdio.h&amp;gt;), date and time (&amp;lt;time.h&amp;gt;), signal management (&amp;lt;signal.h&amp;gt;), string operations (&amp;lt;string.h&amp;gt;), and so on. The latest POSIX standard, in contrast, defines 82 header files. While remaining fully compliant with ISO C99, it extends it with POSIX threads (&amp;lt;pthread.h&amp;gt;), semaphores (&amp;lt;semaphore.h&amp;gt;), and many others. Modern libc implementations, e.g. &lt;a class="reference external" href="https://musl.libc.org/about.html"&gt;musl libc&lt;/a&gt;, are also very OS-specific, providing library functions to access operating system services (wrappers for system calls). Sometimes this overlap with the POSIX specification makes it difficult to implement a POSIX abstraction layer in non-Unix operating systems. Such systems may also use portable standalone libc implementations with their own POSIX support, e.g. &lt;a class="reference external" href="https://keithp.com/picolibc/"&gt;picolibc&lt;/a&gt; used together with &lt;a class="reference external" href="https://docs.zephyrproject.org/latest/services/portability/posix/implementation/index.html"&gt;Zephyr's POSIX library&lt;/a&gt;.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="options-and-option-groups"&gt;
&lt;h2&gt;Options and Option Groups&lt;/h2&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;While POSIX standardizes the system interface (C language headers and functions), shell, and utilities, a system does not need to implement the entire specification to be &lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html#tag_02_01_03"&gt;POSIX conformant&lt;/a&gt;. Some features in &amp;quot;POSIX System Interfaces&amp;quot;, &amp;quot;POSIX Shell and Utilities&amp;quot;, and &amp;quot;XSI System Interfaces&amp;quot; are optional. The &lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/unistd.h.html"&gt;&amp;lt;unistd.h&amp;gt; header file&lt;/a&gt; contains definitions of the &lt;em&gt;standard symbolic constants&lt;/em&gt; for &lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html#tag_02_01_06"&gt;Options&lt;/a&gt;, each of which reflects a particular feature, and &lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap02.html#tag_02_01_05"&gt;Option Groups&lt;/a&gt;, which define a set of related functions or options. Names of option groups, unlike options, typically do not begin with an underscore. POSIX-conformant systems are required to implement a set of mandatory options and may support one or more additional options. The symbolic constants for mandatory options must have specific values, e.g. &lt;em&gt;200809L&lt;/em&gt;. The constants for other options may be:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;em&gt;undefined or contain -1&lt;/em&gt;, which means that the option is not supported for compilation&lt;/li&gt;
&lt;li&gt;&lt;em&gt;0&lt;/em&gt;, which means the option might or might not be supported at runtime&lt;/li&gt;
&lt;li&gt;&lt;em&gt;a value greater than zero&lt;/em&gt;, which means the option is always supported&lt;/li&gt;
&lt;/ul&gt;
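&lt;p&gt;For the mandatory &lt;em&gt;_POSIX_VERSION&lt;/em&gt; constant, a shell script can compare the value reported at runtime with the expected one. The following is a minimal sketch. It assumes the local &lt;em&gt;getconf&lt;/em&gt; accepts the &lt;em&gt;_POSIX_VERSION&lt;/em&gt; name, as GNU libc's does. The function name is hypothetical.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```shell
# Hypothetical helper: succeed if the running system reports a POSIX
# version at least as new as the one given, e.g. 200809 for POSIX.1-2008
# and POSIX.1-2017 (getconf prints the value without the L suffix).
# Assumes getconf accepts the _POSIX_VERSION name.
posix_version_at_least() {
    v=$(getconf _POSIX_VERSION 2>/dev/null) || return 1
    case $v in
        ''|*[!0-9]*) return 1 ;;   # "undefined" or anything non-numeric
    esac
    [ "$v" -ge "$1" ]
}

# Usage: posix_version_at_least 200809 || echo "older than POSIX.1-2008"
```
&lt;/pre&gt;&lt;/div&gt;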
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;User applications use these symbolic constants to check the availability of a particular feature. In C source code, the constants may be checked either at build time (in #if preprocessing directives) or at runtime, by calling one of the &lt;em&gt;sysconf()&lt;/em&gt;, &lt;em&gt;pathconf()&lt;/em&gt;, &lt;em&gt;fpathconf()&lt;/em&gt;, or &lt;em&gt;confstr()&lt;/em&gt; functions. In shell scripts, the &lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/getconf.html"&gt;getconf&lt;/a&gt; utility should be used for runtime checks. The &lt;a class="reference external" href="https://man7.org/linux/man-pages/man7/posixoptions.7.html"&gt;posixoptions(7)&lt;/a&gt; Linux man page has a good collection of the POSIX options, their corresponding sysconf(3) parameter names, and the header files and functions that these options cover.&lt;/p&gt;
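&lt;p&gt;A runtime check from a shell script might look like the following sketch. The function name and its output strings are hypothetical. It assumes &lt;em&gt;getconf&lt;/em&gt; behaves as GNU libc's does: it prints &lt;em&gt;undefined&lt;/em&gt; for an option the system does not support, and it fails for a name it does not recognize. Because the check runs at runtime, the reported value reflects the running system rather than a compile-time constant.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```shell
# Hypothetical helper: report whether a POSIX option is supported by the
# running system, based on the value printed by getconf. The option name
# (e.g. _POSIX_THREADS) is passed through unchanged; which names getconf
# accepts depends on the system.
posix_option_status() {
    if ! value=$(getconf "$1" 2>/dev/null); then
        echo "unknown"      # getconf failed, e.g. it does not know this name
        return 2
    fi
    case $value in
        undefined|-1) echo "unsupported" ;;
        *)            echo "supported ($value)" ;;
    esac
}

# Usage: posix_option_status _POSIX_THREADS
```
&lt;/pre&gt;&lt;/div&gt;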
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a class="reference external" href="https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_subprofiles.html"&gt;Subprofiling Option Groups&lt;/a&gt; are intended for use within the systems where implementing a full POSIX specification is not reasonable. For example, real-time embedded systems are typically resource-constrained, do not have shells, user interfaces, and OS kernels are often designed to run as a single process (with multiple threads). Such systems may only implement subsets of related functions defined by option groups.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The development of high-level programming languages like C, along with operating systems that abstract away hardware details, enabled software portability at the source code level.&lt;/li&gt;
&lt;li&gt;The POSIX standard emerged in 1988 to provide a portable interface specification for Unix-like operating systems, allowing programs to be compiled across different platforms.&lt;/li&gt;
&lt;li&gt;The POSIX standard has evolved over time, with the latest version being POSIX.1-2017 (IEEE Std 1003.1-2017).&lt;/li&gt;
&lt;li&gt;Modern Unix-like systems like Linux and &lt;cite&gt;*BSD&lt;/cite&gt; do not strictly follow the POSIX standard, but rather use it as a guide.&lt;/li&gt;
&lt;li&gt;POSIX standardizes a C API (header files and functions), the shell, and utilities.&lt;/li&gt;
&lt;li&gt;POSIX-compliant systems are expected to implement mandatory options and may support additional optional features.&lt;/li&gt;
&lt;li&gt;Applications can check for POSIX feature availability at both compile-time and runtime using symbolic constants and system functions.&lt;/li&gt;
&lt;li&gt;For resource-constrained systems like real-time embedded platforms, POSIX allows for the implementation of subsets of the full specification through &amp;quot;subprofile&amp;quot; option groups.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="it"></category><category term="os"></category><category term="programming"></category></entry><entry><title>How to sort arrays natively in Bash</title><link href="https://vorakl.com/articles/bash-sort/" rel="alternate"></link><published>2024-02-20T18:37:45-08:00</published><updated>2024-02-20T18:37:45-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-02-20:/articles/bash-sort/</id><summary type="html">&lt;p class="first last"&gt;Sorting arrays in pure Bash with the asort built-in command&lt;/p&gt;
</summary><content type="html">&lt;p&gt;&lt;a class="reference internal" href="#summary"&gt;TLDR: quick summary of the article&lt;/a&gt;&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;What would you do if, while implementing some solution in Bash, you suddenly needed an array in sorted order? You might think of the &lt;em&gt;sort&lt;/em&gt; tool from the &lt;em&gt;coreutils&lt;/em&gt; package. Or you might even decide it's a good time to switch to Python or some other language. But it turns out that Bash can sort arrays natively! All you need is the &lt;strong&gt;asort&lt;/strong&gt; built-in command. However, it is usually not loaded by default, and many modern Linux distributions do not even package it. In this article I'll show you how to build and install Bash with all loadable modules from source and load them. You can then start writing faster, more advanced Bash scripts that rely less on external commands.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;First of all, check your Bash version. Version 5.2-release is the target of this article:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BASH_VERSION&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Loadable built-in commands are loaded with the &lt;strong&gt;enable&lt;/strong&gt; command. Bash looks for loadable modules in the directories listed in the &lt;strong&gt;BASH_LOADABLES_PATH&lt;/strong&gt; environment variable, which is a colon-separated list. Setting this variable and enabling all the necessary commands can be done, for example, in &lt;em&gt;.bashrc&lt;/em&gt;. If you are currently running a pre-installed Bash, first check whether the &lt;em&gt;asort&lt;/em&gt; command is already loaded. If it is not, check whether it can be loaded at all:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nb"&gt;enable&lt;/span&gt; -p &lt;span class="p"&gt;|&lt;/span&gt; grep asort &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;enable&lt;/span&gt; -f asort asort &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;enable&lt;/span&gt; -p &lt;span class="p"&gt;|&lt;/span&gt; grep asort&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you see &amp;quot;&lt;em&gt;enable asort&lt;/em&gt;&amp;quot; on the screen, the &lt;em&gt;asort&lt;/em&gt; builtin is loaded and you can start using it. For example, you can check its help message:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;asort --help
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Otherwise, let's build it from source. First of all, clone the project's official git repository and enter its directory:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;git clone https://git.savannah.gnu.org/git/bash.git &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; bash
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The following procedure is pretty standard for software written in C. You &lt;em&gt;configure&lt;/em&gt; the build for the specific system, build the software, and then install it. During the configuration step you can, for example, change the default installation path prefix (/usr/local). Here I set it explicitly to that same default directory. The loadable built-in commands can only be built after Bash itself has been built:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;./configure --prefix&lt;span class="o"&gt;=&lt;/span&gt;/usr/local
make
make -C examples/loadables all others
sudo make install
sudo make -C examples/loadables install
sudo cp -v examples/loadables/&lt;span class="o"&gt;{&lt;/span&gt;necho,hello,cat,pushd,asort&lt;span class="o"&gt;}&lt;/span&gt; /usr/local/lib/bash/
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Loadable built-in commands are installed in &lt;em&gt;/usr/local/lib/bash/&lt;/em&gt;, and Bash itself in &lt;em&gt;/usr/local/bin/&lt;/em&gt;. The extra copying step is needed because &lt;em&gt;asort&lt;/em&gt; is one of the extra commands built by the &lt;em&gt;others&lt;/em&gt; target. As of this writing (Bash 5.2.26), the Makefile doesn't install these extra commands. If all commands finished without errors, you'll find the loadable commands in the &lt;em&gt;/usr/local/lib/bash/&lt;/em&gt; directory. They are &lt;em&gt;shared objects&lt;/em&gt; that can be inspected with the usual tools:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /usr/local/lib/bash
ldd asort
file asort
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To load built-in commands from these files, you need to know the name of the structure defined for each command in the source code. Some files contain only one command, and therefore only one such structure. Others contain two commands and two structures. You can find these names by checking the symbol table for the pattern &lt;em&gt;&amp;lt;name&amp;gt;_struct&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ objdump -t asort &lt;span class="p"&gt;|&lt;/span&gt; grep _struct
00000000000040c0 g     O .data      &lt;span class="m"&gt;0000000000000030&lt;/span&gt;              asort_struct

$ objdump -t truefalse &lt;span class="p"&gt;|&lt;/span&gt; grep _struct
&lt;span class="m"&gt;0000000000004020&lt;/span&gt; g     O .data      &lt;span class="m"&gt;0000000000000030&lt;/span&gt;              false_struct
&lt;span class="m"&gt;0000000000004060&lt;/span&gt; g     O .data      &lt;span class="m"&gt;0000000000000030&lt;/span&gt;              true_struct
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Make sure the &lt;em&gt;BASH_LOADABLES_PATH&lt;/em&gt; environment variable is set and contains &lt;em&gt;/usr/local/lib/bash&lt;/em&gt;, the directory where we installed the built-in commands. Now everything is ready for testing. Let's run the newly built Bash and load &lt;em&gt;asort&lt;/em&gt;, plus a few other useful commands as examples, using the names we found in the symbol table:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;/usr/local/bin/bash
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BASH_VERSION&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BASH_LOADABLES_PATH&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="nb"&gt;enable&lt;/span&gt; -f asort asort
&lt;span class="nb"&gt;enable&lt;/span&gt; -f truefalse &lt;span class="nb"&gt;true&lt;/span&gt;
&lt;span class="nb"&gt;enable&lt;/span&gt; -f truefalse &lt;span class="nb"&gt;false&lt;/span&gt;
&lt;span class="nb"&gt;enable&lt;/span&gt; -f dsv dsv
dsv --help
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Finally, we can perform a reverse numerical sort in place, using only the built-in command:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ &lt;span class="nb"&gt;declare&lt;/span&gt; -a &lt;span class="nv"&gt;arr&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

$ &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[*]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="m"&gt;3&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

$ asort -nr arr

$ &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[*]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="m"&gt;15&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Having commands loaded as shared objects allows Bash to call them directly, instead of creating a new process just to run an external tool with the same functionality. Let's do a quick experiment with &lt;em&gt;mkdir&lt;/em&gt;, first as an external tool and then loaded into Bash:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ strace -e execve /usr/local/bin/bash -c &lt;span class="s1"&gt;&amp;#39;mkdir /tmp/mydir&amp;#39;&lt;/span&gt;

execve&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/usr/local/bin/bash&amp;quot;&lt;/span&gt;, &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/usr/local/bin/bash&amp;quot;&lt;/span&gt;, &lt;span class="s2"&gt;&amp;quot;-c&amp;quot;&lt;/span&gt;, &lt;span class="s2"&gt;&amp;quot;mkdir /tmp/mydir&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;, 0x7ffd7723d6f0 /* &lt;span class="m"&gt;68&lt;/span&gt; vars */&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
execve&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/usr/bin/mkdir&amp;quot;&lt;/span&gt;, &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;mkdir&amp;quot;&lt;/span&gt;, &lt;span class="s2"&gt;&amp;quot;/tmp/mydir&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;, 0x1e2c010 /* &lt;span class="m"&gt;67&lt;/span&gt; vars */&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ strace -e execve /usr/local/bin/bash -c &lt;span class="s1"&gt;&amp;#39;enable -f mkdir mkdir; mkdir /tmp/mydir2&amp;#39;&lt;/span&gt;

execve&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/usr/local/bin/bash&amp;quot;&lt;/span&gt;, &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/usr/local/bin/bash&amp;quot;&lt;/span&gt;, &lt;span class="s2"&gt;&amp;quot;-c&amp;quot;&lt;/span&gt;, &lt;span class="s2"&gt;&amp;quot;enable -f mkdir mkdir; mkdir /tm&amp;quot;&lt;/span&gt;...&lt;span class="o"&gt;]&lt;/span&gt;, 0x7ffd37695000 /* &lt;span class="m"&gt;68&lt;/span&gt; vars */&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can see that two programs are executed when &lt;em&gt;mkdir&lt;/em&gt; is called as an external tool: Bash itself and &lt;em&gt;/usr/bin/mkdir&lt;/em&gt;. When &lt;em&gt;mkdir&lt;/em&gt; is enabled as a built-in command, no external program is executed, because Bash calls the function directly. Besides being faster, the &lt;em&gt;asort&lt;/em&gt; command has another big advantage over the external &lt;em&gt;sort&lt;/em&gt; tool. Because &lt;em&gt;asort&lt;/em&gt; operates directly on the array in memory, you don't have to worry about which characters the array elements contain. They are simply sorted in place. Elements can contain newlines &lt;cite&gt;(0x0a or \n)&lt;/cite&gt; or characters that are special to the shell, such as &lt;cite&gt;*&lt;/cite&gt; or &lt;cite&gt;?&lt;/cite&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ &lt;span class="nb"&gt;declare&lt;/span&gt; -a &lt;span class="nv"&gt;arr&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;**&amp;#39;&lt;/span&gt; &lt;span class="s1"&gt;$&amp;#39;abc\nxyz&amp;#39;&lt;/span&gt; &lt;span class="s1"&gt;$&amp;#39;abc\nefg&amp;#39;&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;*&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

$ &lt;span class="nb"&gt;declare&lt;/span&gt; -p arr
&lt;span class="nb"&gt;declare&lt;/span&gt; -a &lt;span class="nv"&gt;arr&lt;/span&gt;&lt;span class="o"&gt;=([&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="o"&gt;]=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;**&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]=&lt;/span&gt;&lt;span class="s1"&gt;$&amp;#39;abc\nxyz&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="o"&gt;]=&lt;/span&gt;&lt;span class="s1"&gt;$&amp;#39;abc\nefg&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="o"&gt;]=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;*&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

$ &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[1]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;
abc
xyz

$ asort arr

$ &lt;span class="nb"&gt;declare&lt;/span&gt; -p arr
&lt;span class="nb"&gt;declare&lt;/span&gt; -a &lt;span class="nv"&gt;arr&lt;/span&gt;&lt;span class="o"&gt;=([&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="o"&gt;]=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;*&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;**&amp;quot;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="o"&gt;]=&lt;/span&gt;&lt;span class="s1"&gt;$&amp;#39;abc\nefg&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="o"&gt;]=&lt;/span&gt;&lt;span class="s1"&gt;$&amp;#39;abc\nxyz&amp;#39;&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;It's also worth checking out other loadable commands such as &lt;em&gt;id&lt;/em&gt;, &lt;em&gt;ln&lt;/em&gt;, &lt;em&gt;mkfifo&lt;/em&gt;, &lt;em&gt;cut&lt;/em&gt;, &lt;em&gt;cat&lt;/em&gt;, &lt;em&gt;stat&lt;/em&gt;, &lt;em&gt;tee&lt;/em&gt;, &lt;em&gt;uname&lt;/em&gt;, and others (see the &lt;em&gt;examples/loadables&lt;/em&gt; directory). These are fairly common tools in Bash scripting, and all of them can be loaded into Bash itself. This can noticeably improve the performance of scripts that call them often, because no external process has to be started each time.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Bash supports sorting arrays natively using the built-in &lt;strong&gt;asort&lt;/strong&gt; command.&lt;/li&gt;
&lt;li&gt;The asort and other loadable commands are not enabled by default and may need to be compiled from source.&lt;/li&gt;
&lt;li&gt;To build Bash and loadable commands from source, you clone the git repository, configure, make, and install it on your system.&lt;/li&gt;
&lt;li&gt;The enable command is used to load builtin commands using their struct names found in the symbol table.&lt;/li&gt;
&lt;li&gt;Common loadable commands include &lt;em&gt;asort&lt;/em&gt;, &lt;em&gt;truefalse&lt;/em&gt;, &lt;em&gt;dsv&lt;/em&gt;, &lt;em&gt;id&lt;/em&gt;, &lt;em&gt;ln&lt;/em&gt;, &lt;em&gt;mkdir&lt;/em&gt;, &lt;em&gt;uname&lt;/em&gt;, and many others.&lt;/li&gt;
&lt;li&gt;Loading builtins avoids running external commands, improving performance.&lt;/li&gt;
&lt;li&gt;Builtin commands are shared objects that can be analyzed with &lt;em&gt;ldd&lt;/em&gt;, &lt;em&gt;file&lt;/em&gt;, &lt;em&gt;objdump&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Loadable commands are installed in &lt;em&gt;/usr/local/lib/bash&lt;/em&gt;. That directory must be listed in &lt;em&gt;BASH_LOADABLES_PATH&lt;/em&gt; for &lt;em&gt;enable -f&lt;/em&gt; to find them by name.&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="bash"></category><category term="programming"></category></entry><entry><title>Availability calculation in "nines" notation</title><link href="https://vorakl.com/articles/availability/" rel="alternate"></link><published>2024-02-18T20:50:49-08:00</published><updated>2024-02-18T20:50:49-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-02-18:/articles/availability/</id><summary type="html">&lt;p class="first last"&gt;Estimating one of SRE's most common SLO&lt;/p&gt;
</summary><content type="html">&lt;p&gt;&lt;a class="reference internal" href="#summary"&gt;TLDR: quick summary of the article&lt;/a&gt;&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The rapidly growing interest in clouds, distributed systems, microservice architecture, and service-oriented applications has led to the emergence of a new branch of computer systems engineering: &lt;em&gt;Site Reliability Engineering&lt;/em&gt; (SRE). One of the primary goals of SRE is to ensure that a service meets certain requirements for production readiness. Services are generally considered &lt;em&gt;production&lt;/em&gt; services when they can be trusted and relied upon. A service provider and its customers, who usually pay for the service, document a common understanding of trust in a &lt;em&gt;Service Level Agreement&lt;/em&gt; (SLA). The SLA states all expectations in the form of &lt;em&gt;Service Level Objectives&lt;/em&gt; (SLO), together with penalties if these expectations are not met. SLOs are &lt;strong&gt;performance&lt;/strong&gt; and &lt;strong&gt;availability&lt;/strong&gt; goals for a production service, defined on an annual time scale. They cover the system characteristics that are most valuable to customers and that the provider is willing to commit to keeping within the defined expectations. SLOs are carefully quantified using &lt;em&gt;Service Level Indicators&lt;/em&gt; (SLI). SLIs are chosen specifically for SLOs as measurable forms of some property. An SLI can be a metric or a value derived from logs. SLIs are typically sampled over much shorter periods of time, from tens of seconds to a few minutes. The samples are then aggregated (for example, averaged) to obtain a value.&lt;/p&gt;
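&lt;p&gt;To illustrate the aggregation step, here is a minimal Python sketch that turns raw SLI samples into an availability ratio. It assumes a simple probe-based SLI, where each sample records whether a health check succeeded. The probe schedule (one every 30 seconds) and the number of failures are hypothetical:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;def sli_availability(probe_results):
    """Aggregate boolean probe samples (True = healthy) into a ratio in [0, 1]."""
    if not probe_results:
        raise ValueError("no SLI samples to aggregate")
    healthy = sum(1 for ok in probe_results if ok)
    return healthy / len(probe_results)

# Hypothetical day of probes: one every 30 s (2880 samples), 3 of them failed.
samples = [True] * 2877 + [False] * 3
print("availability: %.4f%%" % (100 * sli_availability(samples)))
&lt;/pre&gt;&lt;/div&gt;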
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Site Reliability Engineers&lt;/em&gt;, in turn, are responsible for ensuring that production services meet all target SLOs defined in the SLA. They do this by focusing on reliability through a set of practices that are more or less standardized across the industry. Some of the most common practices include:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Continuous monitoring of availability and performance characteristics;&lt;/li&gt;
&lt;li&gt;Troubleshooting failures and eliminating degradation issues;&lt;/li&gt;
&lt;li&gt;Improving overall stability and scalability through automation to keep all key metrics within expected ranges;&lt;/li&gt;
&lt;li&gt;Preparing for disaster recovery through continuous stress testing using the error budget, an agreed-upon amount of time during which a service can be degraded or unavailable.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Performance SLOs are important goals, but they only matter if a service is available. Availability is so important that it's sometimes &lt;em&gt;mistakenly&lt;/em&gt; considered the only SLA component. Finding the right SLI to measure availability can be challenging. It is service-specific and depends on a variety of factors, such as the underlying infrastructure and architecture. In SLO form, availability is expressed as a percentage in what is called &amp;quot;nines&amp;quot; notation. For example, in the cloud, the most common availability SLO is 99.9%, which is called &amp;quot;3-nines&amp;quot;. However, you are unlikely to find one higher than 99.999%, or &amp;quot;5-nines&amp;quot;. The actual availability of a service, in percent, is calculated as the ratio of the time the service was available to the total time (uptime plus downtime) over the past year.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Interestingly, people who use the nines notation are actually referring to the time when a service is, in a sense, allowed to be down. This downtime, which the SLA explicitly permits, forms what is called the &lt;em&gt;error budget&lt;/em&gt;. Targeting 100% availability is hardly feasible. From a practical point of view, it also turns out to be more beneficial to commit to a lower availability, even when the technical means exist to provide more &amp;quot;nines&amp;quot;. Beyond a certain level, higher availability goes unnoticed by most customers, so it's probably not worth the effort. Meanwhile, having an error budget opens the door to experimentation and less stressful deployments of new product features.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;It is also useful to know how to estimate potential downtime, the amount of time your system may be out of service. To do this, remember that the availability SLO is defined over a one-year period. Therefore, &lt;em&gt;60 s × 60 min × 24 h × 365 days&lt;/em&gt; gives us &lt;em&gt;31536000&lt;/em&gt; seconds of total time. If the availability is &amp;quot;five-nines&amp;quot; (99.999%), the allowed downtime is 0.001%, or &lt;cite&gt;31536000 * 0.001% =&amp;gt; 31536000 * 0.00001 = 315.36&lt;/cite&gt; sec. That is about &lt;em&gt;5.256&lt;/em&gt; minutes per year that the service can be down. A similar calculation for &amp;quot;three-nines&amp;quot; (99.9%) availability, where the allowed downtime is 0.1%, gives &lt;cite&gt;31536000 * 0.001 = 31536&lt;/cite&gt; seconds. That is 525.6 minutes, or &lt;em&gt;8.76&lt;/em&gt; hours per year.&lt;/p&gt;
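&lt;p&gt;The same arithmetic can be expressed as a small Python function. This is only a sketch. Like the calculation above, it assumes a 365-day (non-leap) year, and &lt;em&gt;allowed_downtime()&lt;/em&gt; is a hypothetical helper name:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31536000 seconds in a 365-day year

def allowed_downtime(availability_percent):
    """Return the seconds per year a service may be down under the given SLO."""
    downtime_fraction = (100 - availability_percent) / 100
    return SECONDS_PER_YEAR * downtime_fraction

for slo in (99.9, 99.99, 99.999):
    seconds = allowed_downtime(slo)
    print("%s%%: %.2f s = %.3f min = %.3f h"
          % (slo, seconds, seconds / 60, seconds / 3600))
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The loop reproduces the 8.76 hours for &amp;quot;three-nines&amp;quot; and the 5.256 minutes for &amp;quot;five-nines&amp;quot; computed above. It also shows the &amp;quot;four-nines&amp;quot; level in between. Floating-point rounding is hidden by the formatted output.&lt;/p&gt;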
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;em&gt;Site Reliability Engineering&lt;/em&gt; (SRE) focuses on ensuring production services meet requirements for production readiness and can be trusted and relied upon.&lt;/li&gt;
&lt;li&gt;A &lt;em&gt;Service Level Agreement&lt;/em&gt; (SLA) contains expectations in the form of &lt;em&gt;Service Level Objectives&lt;/em&gt; (SLOs) and penalties if not met.&lt;/li&gt;
&lt;li&gt;SLOs define annual &lt;em&gt;performance&lt;/em&gt; and &lt;em&gt;availability&lt;/em&gt; goals for production services.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Service Level Indicators&lt;/em&gt; (SLIs) are metrics chosen to measure SLOs, sampled over short periods like seconds to minutes.&lt;/li&gt;
&lt;li&gt;SREs ensure services meet SLOs through standardized practices like monitoring, troubleshooting, automation, and disaster-recovery testing.&lt;/li&gt;
&lt;li&gt;Availability is a key SLA component and is expressed as a percentage in &amp;quot;nines&amp;quot; notation, which translates into the amount of downtime allowed per year.&lt;/li&gt;
&lt;li&gt;The 99.9% availability SLO allows 8.76 hours of annual downtime while 99.999% allows 5.256 minutes.&lt;/li&gt;
&lt;li&gt;Allowing some downtime forms an &amp;quot;&lt;em&gt;error budget&lt;/em&gt;&amp;quot; even if 100% uptime is technically possible.&lt;/li&gt;
&lt;li&gt;Higher availability beyond a certain level may not be noticeable to most customers.&lt;/li&gt;
&lt;li&gt;Calculating allowed downtime involves determining the total seconds in a year and applying the percentage downtime allowed.&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="sre"></category></entry><entry><title>A little mess with function parameters in Python</title><link href="https://vorakl.com/articles/py-params/" rel="alternate"></link><published>2024-02-17T11:03:29-08:00</published><updated>2024-02-17T11:03:29-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-02-17:/articles/py-params/</id><summary type="html">&lt;p class="first last"&gt;A variety of ways to define function parameters&lt;/p&gt;
</summary><content type="html">&lt;p&gt;&lt;a class="reference internal" href="#summary"&gt;TLDR: quick summary of the article&lt;/a&gt;&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;At first glance, Python functions look like those in most other languages, and they behave just as you'd expect. They take arguments, have default values, and can also return a value. This is intentional, of course. But once you dive deeper, you'll see how many specific nuances are hidden internally, providing a programmer with a number of features that make using functions in Python a much more powerful experience. Knowing the differences is critical to understanding why they behave the way they do, so you can get the most out of them.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;One of the key features is that functions in Python are objects, created when their definition (the &lt;em&gt;def&lt;/em&gt; statement) is executed. This allows you to pass functions as arguments to other functions or return them as values, just like any other Python object. A function's lifetime is independent of its execution: it exists before it is called and after each call has finished. Being objects, functions also have a set of predefined attributes that can be extended at any time, and this state is maintained between executions. Parameters, in contrast, become local variables that exist only while the function is executing. They are completely different entities from function attributes. Default values in the function definition can be expressions, but they are evaluated only once, when the function is defined. Function arguments are always passed by value, but those values are references to objects. This is why the model is sometimes called pass-by-object-reference. It also means that parameters, like any other variable in Python, are untyped and hold a copy of a reference to an object. Assigning to a parameter (a local variable) doesn't change the object passed as an argument. It only makes the parameter refer to another object. However, you can still change an object passed as an argument if it is mutable and the change is made to the object itself rather than to the variable. For example, you can update elements of a list or a dictionary.&lt;/p&gt;
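&lt;p&gt;A short sketch can make several of these points concrete. It shows a default value evaluated only once, the difference between rebinding and mutating a parameter, and a function attribute that persists between calls. The function names are made up for illustration. The mutable default demonstrates a well-known pitfall; it is not a recommended pattern:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;def append_item(item, bucket=[]):
    # The default list is created once, when "def" runs, and is then shared
    # by every call that omits "bucket".
    bucket.append(item)
    return bucket

def rebind(lst):
    lst = [0]          # rebinds the local parameter; the caller's list is untouched

def mutate(lst):
    lst.append(0)      # changes the object the parameter refers to; the caller sees it

first = append_item(1)
second = append_item(2)
print(first, second, first is second)    # [1, 2] [1, 2] True

data = [1]
rebind(data)
print(data)                              # [1]
mutate(data)
print(data)                              # [1, 0]

append_item.note = "set outside any call"    # function attributes persist
print(append_item.note, append_item.__defaults__)   # set outside any call ([1, 2],)
&lt;/pre&gt;&lt;/div&gt;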
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This tutorial focuses only on parameters, their different kinds, and the various ways to define them. Let's start with the most common case: a function definition with 4 parameters (a, b, c, d). There are no types, just names, which exist as local variables on the stack only while the function is executing. When the function is called, it gets 4 arguments (w, x, y, z). These are also local variables living on the stack, but in the calling environment, and they contain references to some objects. Python takes the references stored in the arguments (w, x, y, z) and copies them into the parameters (a, b, c, d), which live as local variables on the stack of the called environment:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;
    &lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# 10 20 30 40&lt;/span&gt;

&lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
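&lt;p&gt;To see that only references are copied, here is a small variation of this example that compares object identities with the built-in &lt;em&gt;id()&lt;/em&gt;. It is a sketch, and the lists are used only to get distinct objects. Each parameter refers to the very same object as the corresponding argument:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
def myfunc(a, b, c, d):
    # return the identities of the objects the parameters refer to
    return id(a), id(b), id(c), id(d)

def caller():
    w, x, y, z = [10], [20], [30], [40]
    ids = myfunc(w, x, y, z)
    # each parameter got a copy of the reference, not a copy of the object
    return ids == (id(w), id(x), id(y), id(z))

print(caller())             # True
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;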
&lt;p&gt;When you call &lt;em&gt;myfunc()&lt;/em&gt; this way, references to objects stored in arguments are copied as values to parameters according to their position, e.g. the value of &lt;em&gt;w&lt;/em&gt; is copied to &lt;em&gt;a&lt;/em&gt;, the value of &lt;em&gt;x&lt;/em&gt; is copied to &lt;em&gt;b&lt;/em&gt;, and so on. This is why such parameters are also called &lt;strong&gt;positional parameters&lt;/strong&gt; - their position defines the value they get. However, you can assign values to parameters in any order by using &lt;strong&gt;keyword arguments&lt;/strong&gt;, i.e. parameter_name=argument:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;
    &lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 40 30 20 10&lt;/span&gt;

&lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
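&lt;p&gt;Positional and keyword arguments can also be mixed in one call, as long as the positional ones come first. The quick sketch below also shows that leaving out a parameter that has no default value raises a &lt;em&gt;TypeError&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
def myfunc(a, b, c, d):
    return a, b, c, d

# positional arguments first, then keyword arguments in any order
print(myfunc(10, 20, d=40, c=30))    # (10, 20, 30, 40)

# omitting a parameter without a default value fails at call time
try:
    myfunc(10, 20, d=40)
except TypeError as err:
    print("TypeError:", err)
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;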
&lt;p&gt;However, all 4 parameters must still be passed each time the function is called. This can be avoided by setting default values for parameters in the function definition. Parameters with default values (name=value pairs) must always come after parameters without them:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;
    &lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;         &lt;span class="c1"&gt;# 10 30 20 2&lt;/span&gt;
    &lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# 10 40 30 2&lt;/span&gt;

&lt;span class="n"&gt;caller&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
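&lt;p&gt;This ordering rule is enforced when the definition is compiled. The sketch below wraps an invalid definition in the built-in &lt;em&gt;compile()&lt;/em&gt; only so that the &lt;em&gt;SyntaxError&lt;/em&gt; can be caught at run time (the exact message differs between Python versions). It then uses &lt;em&gt;inspect.signature()&lt;/em&gt; to list the parameters that may be omitted:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import inspect

# a parameter without a default cannot follow one with a default
try:
    compile("def f(a, b=1, c): pass", "demo", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)

def myfunc(a, b, c, d=2):
    return a, b, c, d

# parameters whose default is not Parameter.empty are optional
optional = [name for name, p in inspect.signature(myfunc).parameters.items()
            if p.default is not inspect.Parameter.empty]
print(*optional)            # d
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;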
&lt;p&gt;Default values of parameters are stored in the function's &lt;strong&gt;__defaults__&lt;/strong&gt; attribute as a tuple. Python allows you to do neat tricks here, because this attribute is writable: you can assign a new tuple of default values directly to it. This is even possible for parameters that don't have default values in the function definition and would normally have to be passed on every call:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__defaults__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# (2,)&lt;/span&gt;

&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__defaults__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__defaults__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# (100, 200, 300, 400)&lt;/span&gt;

&lt;span class="c1"&gt;# note that arguments are not passed at all!&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                        &lt;span class="c1"&gt;# 100 200 300 400&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
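&lt;p&gt;Default values are matched against the parameters from the right, so a shorter tuple only covers the trailing parameters. The sketch below uses arbitrary values and also shows that the attribute accepts only a tuple or &lt;em&gt;None&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
def myfunc(a, b, c, d=2):
    return a, b, c, d

# two defaults are applied to the last two parameters, c and d
myfunc.__defaults__ = (30, 40)
print(myfunc(1, 2))             # (1, 2, 30, 40)

# anything other than a tuple or None is rejected
try:
    myfunc.__defaults__ = [1, 2, 3, 4]
except TypeError as err:
    print("TypeError:", err)

# None removes all default values again
myfunc.__defaults__ = None
print(myfunc.__defaults__)      # None
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;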
&lt;p&gt;Default values can also be expressions, but they are evaluated only once, when the function is defined. For example, if a list is used as a default value, the list object is created once. A reference to that same object is then assigned to the parameter each time the default value is used. This may not be the behavior you expect, since a list mutated during a previous call will still be passed as the default value on the next call:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt;
    &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                 &lt;span class="c1"&gt;# 1 2 3 [1, 2, 3]&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="c1"&gt;# 10 20 30 [1, 2, 3, 10, 20, 30]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
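&lt;p&gt;Both effects can be observed directly. In this sketch, all names are made up for the example. A helper function counts how many times the default expression runs, and the shared list is inspected through &lt;strong&gt;__defaults__&lt;/strong&gt; between calls:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
evaluations = []

def make_default():
    # records each evaluation of the default expression
    evaluations.append(1)
    return len(evaluations)

def myfunc(x=make_default()):    # make_default() runs once, right here
    return x

print(myfunc(), myfunc(), len(evaluations))   # 1 1 1

def append_to(item, bucket=[]):
    bucket.append(item)
    return bucket

append_to(1)
append_to(2)
# the single default list object is kept in __defaults__
print(append_to.__defaults__)    # ([1, 2],)
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;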
&lt;p&gt;A common workaround for having an empty list as the default value is to use &lt;em&gt;None&lt;/em&gt; instead. It is a singleton, so there is always exactly one instance. In the function body, check whether the parameter is &lt;em&gt;None&lt;/em&gt; using the identity operator &lt;em&gt;is&lt;/em&gt;. If it is, assign a new empty list to it on each call:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                 &lt;span class="c1"&gt;# 1 2 3 [1, 2, 3]&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="c1"&gt;# 10 20 30 [10, 20, 30]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
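&lt;p&gt;If &lt;em&gt;None&lt;/em&gt; is itself a meaningful value for a parameter, a common variant of the same idea is a private sentinel object created once at module level. In this sketch, the name &lt;em&gt;_MISSING&lt;/em&gt; is an arbitrary choice:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
_MISSING = object()   # hypothetical sentinel: unique, just like None

def myfunc(a, b, c, d=_MISSING):
    # None can now be passed explicitly as an ordinary value
    if d is _MISSING:
        d = []
    return d

print(myfunc(1, 2, 3))          # []
print(myfunc(1, 2, 3, None))    # None
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;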
&lt;p&gt;&lt;em&gt;Positional&lt;/em&gt; and &lt;em&gt;keyword&lt;/em&gt; parameters can easily coexist in a relatively free form. The caveat is that keyword parameters always come after positional ones. In general, when calling a function, arguments can be passed in various combinations of positional and keyword forms, or omitted if the parameter has a default value:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# 3 30 20 2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
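&lt;p&gt;The ordering rule also applies at the call site. The sketch below shows what is accepted and what is rejected. The invalid call is wrapped in &lt;em&gt;compile()&lt;/em&gt; only so that its &lt;em&gt;SyntaxError&lt;/em&gt; can be caught:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
def myfunc(a, b, c=1, d=2):
    return a, b, c, d

print(myfunc(3, 30, d=5))       # (3, 30, 1, 5)

# in a call, a positional argument cannot follow a keyword argument
try:
    compile("myfunc(a=3, 30)", "demo", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)

# a keyword that matches no parameter is rejected when the call runs
try:
    myfunc(3, 30, e=5)
except TypeError as err:
    print("TypeError:", err)
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;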
&lt;p&gt;However, there are ways to force some parameters to be strictly positional, and others to be keyword-only. Keyword-only parameters are made possible by another nice feature: a variable number of arguments. Python supports &lt;em&gt;packing&lt;/em&gt; and &lt;em&gt;unpacking&lt;/em&gt; of arguments during a function call, which can be used to pass an arbitrary number of positional and keyword arguments. There is a special syntax for each case. Extra positional arguments are packed into a &lt;em&gt;tuple&lt;/em&gt; if there is a parameter prefixed with an asterisk, e.g. &lt;strong&gt;*params&lt;/strong&gt;. Extra keyword arguments are packed into a &lt;em&gt;dictionary&lt;/em&gt; if there is a parameter prefixed with a double asterisk, e.g. &lt;strong&gt;**kwparams&lt;/strong&gt;. Note that &lt;cite&gt;**kwparams&lt;/cite&gt;, if defined, must always be the last parameter. Also, parameters placed after &lt;cite&gt;*params&lt;/cite&gt; can only be filled by keyword:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwparams&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# 1 2 20 30&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="c1"&gt;# (3, 4)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kwparams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# {&amp;#39;e&amp;#39;: 50, &amp;#39;f&amp;#39;: 60}&lt;/span&gt;

&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
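&lt;p&gt;The collecting parameters only receive what is left over after the named parameters have been matched. They are simply empty when nothing is left. Here is a sketch reusing the definition above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
def myfunc(a, b, *params, c=1, d=2, **kwparams):
    # only arguments that match no named parameter are collected
    return params, kwparams

params, kwparams = myfunc(1, 2)
print(params, kwparams)             # () {}

params, kwparams = myfunc(1, 2, 3, d=4, e=5)
print(type(params).__name__, params)                            # tuple (3,)
print(type(kwparams).__name__, len(kwparams), kwparams["e"])    # dict 1 5
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;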
&lt;p&gt;Also, note that inside the function body the &lt;em&gt;params&lt;/em&gt; tuple and the &lt;em&gt;kwparams&lt;/em&gt; dictionary are both used without asterisks. It even works the other way around. If you have a tuple or a dictionary with some values, you can unpack it into positional or keyword arguments in a call by prefixing it with * or **, respectively. Just keep an eye on the number of elements:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;kwargs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;c&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;d&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;               &lt;span class="c1"&gt;# 1 2 10 40&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# 1 20 30 40&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
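&lt;p&gt;Keeping an eye on the number of elements matters because mismatches are detected only when the call runs. The sketch below shows two common mistakes, followed by a call that combines several unpackings (allowed since Python 3.5):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
def myfunc(a, b, c=3, d=4):
    return a, b, c, d

# more elements than parameters
try:
    myfunc(*(1, 2, 3, 4, 5))
except TypeError as err:
    print("TypeError:", err)

# the same parameter filled both by position and by an unpacked keyword
try:
    myfunc(1, 2, **{"b": 20})
except TypeError as err:
    print("TypeError:", err)

# several unpackings can be combined in one call
print(myfunc(*(1,), *(2,), **{"d": 40}))    # (1, 2, 3, 40)
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;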
&lt;p&gt;To define a universal function that can take any number of arguments of any kind, its definition should collect both kinds of arguments, e.g. &lt;em&gt;myfunc(*params, **kwparams)&lt;/em&gt;. In addition, this syntax strictly separates keyword and positional parameters. Any named parameters defined after &lt;em&gt;*params&lt;/em&gt; are &lt;em&gt;keyword-only parameters&lt;/em&gt;, i.e. they can be passed only by keyword. Their default values are not stored in &lt;strong&gt;__defaults__&lt;/strong&gt;, but in a separate attribute called &lt;strong&gt;__kwdefaults__&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwparams&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__defaults__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# None&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__kwdefaults__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# {&amp;#39;c&amp;#39;: 1, &amp;#39;d&amp;#39;: 2}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
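&lt;p&gt;Keyword-only parameters don't need default values. Without one, the parameter becomes required and can only be passed by name, and it doesn't appear in &lt;strong&gt;__kwdefaults__&lt;/strong&gt;. Here is a sketch with hypothetical parameter names:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
def myfunc(*params, key, flag=False):
    # key has no default, so it is a required keyword-only parameter
    return params, key, flag

print(myfunc(1, 2, key=3))       # ((1, 2), 3, False)
print(myfunc.__kwdefaults__["flag"], "key" in myfunc.__kwdefaults__)   # False False

try:
    myfunc(1, 2, 3)    # 3 is absorbed by *params, so key is missing
except TypeError as err:
    print("TypeError:", err)
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;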
&lt;p&gt;This syntax also allows a simpler function definition when there is no need for an arbitrary number of positional arguments. Just put a bare asterisk between the positional and the keyword-only parameters:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# this doesn&amp;#39;t work anymore&lt;/span&gt;
&lt;span class="c1"&gt;# myfunc(1, 3, 4, 5)&lt;/span&gt;

&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# 1 3 1 2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
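&lt;p&gt;The commented-out call fails with a &lt;em&gt;TypeError&lt;/em&gt;, because there is no &lt;em&gt;*params&lt;/em&gt; parameter left to absorb the extra values. The sketch below shows that failure. It also shows the related rule that a bare asterisk must be followed by at least one named parameter:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
def myfunc(a, b, *, c=1, d=2):
    return a, b, c, d

# extra positional arguments are no longer collected anywhere
try:
    myfunc(1, 3, 4, 5)
except TypeError as err:
    print("TypeError:", err)

# a bare asterisk must be followed by at least one named parameter
try:
    compile("def f(a, *): pass", "demo", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;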
&lt;p&gt;Nevertheless, there is still some room for improvisation. Parameters before the asterisk can be passed either by position or as keyword arguments:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# 4 3 1 2&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;           &lt;span class="c1"&gt;# 4 3 1 20&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# 4 3 10 2&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                    &lt;span class="c1"&gt;# 4 3 10 20&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
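&lt;p&gt;The standard &lt;em&gt;inspect&lt;/em&gt; module reports how each parameter may be passed, which is a convenient way to check the effect of the asterisk. Here is a sketch using the same definition:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import inspect

def myfunc(a, b, *, c=10, d=20):
    pass

# Parameter.kind tells how each parameter may be passed
for name, param in inspect.signature(myfunc).parameters.items():
    print(name, param.kind.name)
# a POSITIONAL_OR_KEYWORD
# b POSITIONAL_OR_KEYWORD
# c KEYWORD_ONLY
# d KEYWORD_ONLY
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;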
&lt;p&gt;Fortunately, Python (since version 3.8) also has syntax to strictly separate positional-only parameters (which cannot be passed by keyword) from positional-or-keyword parameters (which can be passed either by position or by keyword). Both kinds can have default values, by the way. Just put a slash between them:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# this doesn&amp;#39;t work anymore: &amp;#39;a&amp;#39; is positional-only (TypeError)&lt;/span&gt;
&lt;span class="c1"&gt;# myfunc(a=1, b=2, c=4, d=3)&lt;/span&gt;

&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# 4 3 1 2&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# 4 3 1 2&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# 4 30 1 2&lt;/span&gt;
&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                       &lt;span class="c1"&gt;# 4 30 10 20&lt;/span&gt;

&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__defaults__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# (30,)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;myfunc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__kwdefaults__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# {&amp;#39;c&amp;#39;: 10, &amp;#39;d&amp;#39;: 20}&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
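&lt;p&gt;Here is a small sketch of what happens when these rules are broken. It assumes Python 3.8 or newer, where the slash syntax was introduced, and uses the standard &lt;em&gt;inspect&lt;/em&gt; module to list the kind of each parameter. The exact &lt;em&gt;TypeError&lt;/em&gt; messages vary between Python versions, so they are not reproduced here.&lt;/p&gt;

```python
import inspect


def myfunc(a, /, b=30, *, c=10, d=20):
    # Returns the values instead of printing them, which makes them easy to check
    return (a, b, c, d)


# 'a' is positional-only, so passing it by keyword is rejected
try:
    myfunc(a=1, b=2, c=4, d=3)
except TypeError as exc:
    print("TypeError:", exc)

# 'c' and 'd' are keyword-only, so a third positional argument is rejected
try:
    myfunc(4, 3, 1)
except TypeError as exc:
    print("TypeError:", exc)

# The signature reports the kind of every parameter
for name, param in inspect.signature(myfunc).parameters.items():
    print(name, param.kind.name)
# a POSITIONAL_ONLY
# b POSITIONAL_OR_KEYWORD
# c KEYWORD_ONLY
# d KEYWORD_ONLY
```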
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As a good example, let's take a look at the signature of the built-in &lt;em&gt;sorted&lt;/em&gt; function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This means that the first argument must always be passed positionally. You can't pass it as the &lt;cite&gt;iterable=&amp;lt;something&amp;gt;&lt;/cite&gt; keyword. All subsequent arguments, on the other hand, are keyword-only and must always be passed by name. It also means that their order doesn't matter, and that any of them can be omitted, since both have default values.&lt;/p&gt;
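&lt;p&gt;Here is a quick way to check this behavior of &lt;em&gt;sorted&lt;/em&gt;. It is a small sketch, and the list of words is arbitrary.&lt;/p&gt;

```python
words = ["pear", "Apple", "fig"]

# Keyword-only arguments can be passed in any order
by_name_desc = sorted(words, reverse=True, key=str.lower)
same_result = sorted(words, key=str.lower, reverse=True)
print(by_name_desc)                 # ['pear', 'fig', 'Apple']
print(by_name_desc == same_result)  # True

# iterable is positional-only, so it can't be passed by keyword
try:
    sorted(iterable=words)
except TypeError as exc:
    print("TypeError:", exc)

# key is keyword-only, so it can't be passed positionally
try:
    sorted(words, str.lower)
except TypeError as exc:
    print("TypeError:", exc)
```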
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Another good example is the &lt;em&gt;pop&lt;/em&gt; method of the &lt;em&gt;list&lt;/em&gt; class:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;index&lt;/em&gt; is a positional-only parameter, but it has a default value: if it's omitted, -1 is used, so the last item is removed.&lt;/p&gt;
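&lt;p&gt;The same can be observed with &lt;em&gt;list.pop&lt;/em&gt;, shown here in another small sketch:&lt;/p&gt;

```python
items = [10, 20, 30, 40]

print(items.pop())   # 40: the default index=-1 removes the last item
print(items.pop(0))  # 10: an explicit positional index
print(items)         # [20, 30]

# index is positional-only, so passing it by keyword fails
try:
    items.pop(index=0)
except TypeError as exc:
    print("TypeError:", exc)

print(items)         # [20, 30]: the failed call left the list unchanged
```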
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Functions in Python are objects that are created when defined, allowing them to be used as arguments or return values like any other object.&lt;/li&gt;
&lt;li&gt;Parameters become local variables during function execution, while function attributes exist outside of execution.&lt;/li&gt;
&lt;li&gt;Arguments are passed by value, where the value is a reference to an object, so parameters hold a copy of the reference. Rebinding a parameter doesn't change the original object, but modifying a mutable object passed as an argument does.&lt;/li&gt;
&lt;li&gt;Arguments can be passed by position or by keyword. Default value expressions are evaluated only once, when the function is defined.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;__defaults__&lt;/em&gt; attribute stores the default values of positional parameters as a tuple. The attribute is writable, so it can be reassigned directly.&lt;/li&gt;
&lt;li&gt;An asterisk followed by a name (&lt;cite&gt;*var&lt;/cite&gt;) packs positional arguments into a tuple, while a double asterisk followed by a name (&lt;cite&gt;**kwvar&lt;/cite&gt;) packs keyword arguments into a dictionary.&lt;/li&gt;
&lt;li&gt;Keyword arguments always follow positional arguments, with defaults filling in omitted values.&lt;/li&gt;
&lt;li&gt;The use of an asterisk and a slash together could be described in the following way: &lt;cite&gt;&amp;lt;positional-only parameters&amp;gt;&lt;/cite&gt; / &lt;cite&gt;&amp;lt;positional or keyword parameters&amp;gt;&lt;/cite&gt; * &lt;cite&gt;&amp;lt;keyword-only parameters&amp;gt;&lt;/cite&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;__kwdefaults__&lt;/em&gt; attribute stores the default values of keyword-only parameters, which are defined after the asterisk.&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="python"></category><category term="programming"></category></entry><entry><title>Using udp-link to enhance TCP connections stability</title><link href="https://vorakl.com/articles/udp-link/" rel="alternate"></link><published>2024-01-16T18:44:53-08:00</published><updated>2024-01-16T18:44:53-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2024-01-16:/articles/udp-link/</id><summary type="html">&lt;p class="first last"&gt;A UDP transport layer implementation for proxying TCP connections&lt;/p&gt;
</summary><content type="html">&lt;p&gt;&lt;a class="reference internal" href="#summary"&gt;TLDR: quick summary of the article&lt;/a&gt;&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;I recently discovered &lt;a class="reference external" href="https://github.com/pgul/udp-link"&gt;udp-link&lt;/a&gt;, a very useful project for anyone who, like
me, spends most of their working time in terminals over SSH connections.
The tool implements a UDP transport layer that acts as a proxy for
TCP connections. It's designed to be integrated into the OpenSSH configuration.
However, with a little trick, it can also be used as a general-purpose
TCP-over-UDP proxy. &lt;em&gt;udp-link&lt;/em&gt; greatly improves the stability of connections
over unreliable networks that experience packet loss and intermittent
connectivity. It also supports IP roaming, which allows TCP connections
to remain alive even if an IP address changes.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;udp-link&lt;/em&gt; is written in C by &lt;a class="reference external" href="https://gul.kiev.ua"&gt;Pavel Gulchuk&lt;/a&gt;, who has a lot of experience
in running unreliable networks. Despite being a young project, version
&lt;a class="reference external" href="https://github.com/pgul/udp-link/releases/tag/v0.4"&gt;v0.4&lt;/a&gt; already behaves quite reliably. Once configured, you won't think about it
anymore, except to be surprised every time SSH connections don't break and
survive a laptop's sleep mode or a switch between
different Wi-Fi networks.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In the current architecture, the client-side tool takes data from its standard
input and sends it to the server side via UDP. Another instance of the same tool,
running on the server, receives that data on a specific UDP port and forwards it to a TCP service
(local or remote from the server's perspective).
The destination TCP service and the UDP listening port on the server
side can be specified on the client at startup. Otherwise, a TCP connection
will be established with &lt;em&gt;127.0.0.1:22&lt;/em&gt;, and a port will be randomly chosen from
a predefined port range. Note that the server's firewall must allow incoming
UDP traffic to this port range. The TCP service can also reside on a different
host if the server side is used as a jumpbox. In my opinion, one of the greatest
features of &lt;em&gt;udp-link&lt;/em&gt; is that it requires zero server-side configuration: all
configuration tweaks happen only on the client side.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;udp-link&lt;/em&gt; on the server side does not run as a daemon or listen on a UDP port
all the time. Instead, the client starts the tool on the server side on demand,
in listening mode and with a randomly generated key. This key is used to
authenticate the client connection. To start the tool, the client temporarily
establishes a normal SSH connection over TCP with the server, just to run the tool
in the background, and then closes that connection.
This is where secure client authentication comes into play. Note that &lt;em&gt;udp-link&lt;/em&gt; &lt;strong&gt;doesn't
encrypt the transferred data&lt;/strong&gt;. This is useful when it's used together with SSH
because it avoids double encryption, but it needs to be kept in mind
in other configurations.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To start using &lt;em&gt;udp-link&lt;/em&gt;, you need to clone the repository, compile, and install
the tool on both sides&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;git clone https://github.com/pgul/udp-link.git
&lt;span class="nb"&gt;cd&lt;/span&gt; udp-link
make
sudo make install
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;and then make an SSH connection on the client side by executing a command
similar to&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;ssh -o &lt;span class="nv"&gt;ProxyCommand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;udp-link %r@%h&amp;quot;&lt;/span&gt; user@host
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;ProxyCommand&lt;/em&gt; makes SSH send all its data to the standard input of
a specified command, and read the replies from its standard output, instead of using a TCP connection. This command is
responsible for transporting the data to the server side in some way, eventually
delivering it to the target SSH service.
OpenSSH also supports a number of tokens, such as &lt;em&gt;%r&lt;/em&gt;, &lt;em&gt;%h&lt;/em&gt;, and &lt;em&gt;%p&lt;/em&gt;, which are described
in its documentation. Personally, I use SSH in a slightly different way and
never send out my public SSH keys to unknown hosts. More details on this topic
can be found in a great article '&lt;a class="reference external" href="https://tim.siosm.fr/blog/2023/01/13/openssh-key-management/"&gt;OpenSSH client side key management for better privacy and security&lt;/a&gt;',
written by Timothée Ravier. So I rely heavily on &lt;em&gt;ssh_config&lt;/em&gt; files, where
I specify all connection-specific details, such as hostname, username, SSH key,
and, in this case, &lt;strong&gt;ProxyCommand&lt;/strong&gt;. My typical &lt;em&gt;ssh_config&lt;/em&gt; file looks
something like this&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;Host some-server
    user some-user
    hostname some-IP
    IdentityFile ~/.ssh/ssh-some-server.key
    ProxyCommand udp-link some-IP

Host some-IP
    user some-user
    IdentityFile ~/.ssh/ssh-some-server.key

Host *
    IdentitiesOnly yes
    IdentityFile /dev/null
    GSSAPIAuthentication no
    HostbasedAuthentication no
    SendEnv no
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;and then to connect I just run&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;ssh some-server
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The second &lt;strong&gt;Host some-IP&lt;/strong&gt; block is needed to provide the correct SSH key to
the temporary SSH connection (without &lt;em&gt;ProxyCommand&lt;/em&gt;) that &lt;em&gt;udp-link&lt;/em&gt; establishes
at the beginning of a new session. To debug the connection, add the &lt;em&gt;--debug&lt;/em&gt; option&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;ssh -o &lt;span class="nv"&gt;ProxyCommand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;udp-link --debug some-IP&amp;quot;&lt;/span&gt; some-server
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If I need to bind a connection to a specific UDP port on the server side,
I initiate a connection like this&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;ssh -o &lt;span class="nv"&gt;ProxyCommand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;udp-link -b 1234 some-IP&amp;quot;&lt;/span&gt; some-server
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You can also bind it to a privileged port (1-1023), but &lt;em&gt;udp-link&lt;/em&gt; needs root
permissions to do this. These can be granted in a number of ways, for example
by making the server-side copy of the binary root-owned and turning on
its setuid bit.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;chown root /usr/local/bin/udp-link
chmod u+s /usr/local/bin/udp-link
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Unlike other projects with a similar goal, e.g. &lt;a class="reference external" href="https://github.com/mobile-shell/mosh"&gt;Mosh&lt;/a&gt;, &lt;em&gt;udp-link&lt;/em&gt; doesn't
allocate a pseudo-terminal. I consider this a feature, because it makes it possible
to use the tool not only for accessing remote terminals, but
also for proxying arbitrary TCP connections. However, &lt;em&gt;udp-link&lt;/em&gt; cannot
currently listen on a local TCP port on the client
side. Fortunately, this can be worked around with &lt;em&gt;socat&lt;/em&gt; and its exceptional
ability to connect things. Note that &lt;em&gt;socat&lt;/em&gt; cannot be paired with &lt;em&gt;udp-link&lt;/em&gt; via
an unnamed pipe. Pipes provide unidirectional interprocess
communication, while here we need bidirectional communication to get data
back from the network. The trick is to let &lt;em&gt;socat&lt;/em&gt; invoke &lt;em&gt;udp-link&lt;/em&gt; itself. The following example
opens a listening &lt;em&gt;2525/TCP&lt;/em&gt; port on the client side. It then
proxies any incoming TCP connection over a UDP channel to a remote host and connects
it to the &lt;em&gt;25/TCP&lt;/em&gt; port on the server's localhost, in debug mode&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;socat TCP-LISTEN:2525 SYSTEM:&lt;span class="s2"&gt;&amp;quot;udp-link -t 127.0.0.1\:25 --debug some-IP&amp;quot;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
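&lt;p&gt;Both the &lt;em&gt;ProxyCommand&lt;/em&gt; setup and the &lt;em&gt;socat&lt;/em&gt; trick rely on the same contract: a program that moves bytes in both directions between its standard input/output and the network. The minimal Python sketch below makes that contract concrete. It is &lt;strong&gt;not&lt;/strong&gt; how &lt;em&gt;udp-link&lt;/em&gt; works internally. It simply forwards data over a plain TCP connection (roughly what &lt;em&gt;nc&lt;/em&gt; would do), whereas &lt;em&gt;udp-link&lt;/em&gt; replaces that hop with its own UDP transport. The script name &lt;em&gt;relay.py&lt;/em&gt; is hypothetical, and the sketch assumes a POSIX system where pipes can be used with &lt;em&gt;selectors&lt;/em&gt;.&lt;/p&gt;

```python
import os
import selectors
import socket
import sys

CHUNK = 65536


def write_all(fd, data):
    # os.write() may write fewer bytes than requested, so loop until done
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def relay(sock, in_fd, out_fd):
    """Copy in_fd -> sock and sock -> out_fd until both directions reach EOF."""
    sel = selectors.DefaultSelector()
    sel.register(in_fd, selectors.EVENT_READ, "in")
    sel.register(sock, selectors.EVENT_READ, "sock")
    while sel.get_map():
        for key, _ in sel.select():
            if key.data == "in":
                data = os.read(in_fd, CHUNK)
                if data:
                    sock.sendall(data)
                else:
                    # The local side (e.g. ssh) closed our stdin: half-close the socket
                    sel.unregister(in_fd)
                    sock.shutdown(socket.SHUT_WR)
            else:
                data = sock.recv(CHUNK)
                if data:
                    write_all(out_fd, data)
                else:
                    # The remote side has finished sending
                    sel.unregister(sock)
    sel.close()


# Hypothetical usage: python3 relay.py HOST PORT
if __name__ == "__main__" and len(sys.argv) == 3:
    host, port = sys.argv[1], int(sys.argv[2])
    with socket.create_connection((host, port)) as conn:
        relay(conn, sys.stdin.fileno(), sys.stdout.fileno())
```

&lt;p&gt;It could then be plugged in as &lt;cite&gt;ssh -o ProxyCommand="python3 relay.py %h %p" user@host&lt;/cite&gt;, where &lt;em&gt;%h&lt;/em&gt; and &lt;em&gt;%p&lt;/em&gt; are the OpenSSH tokens for the remote host name and port.&lt;/p&gt;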
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;udp-link&lt;/em&gt; is a small, flexible and very useful tool. I hope to see further
development, adding new features and maturing the code base.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;em&gt;udp-link&lt;/em&gt; is a tool that implements a UDP transport layer acting as a proxy for TCP connections over unreliable networks.&lt;/li&gt;
&lt;li&gt;It is designed to be integrated into the OpenSSH configuration to improve the stability of SSH connections.&lt;/li&gt;
&lt;li&gt;Thanks to its IP roaming feature, udp-link allows TCP connections to remain alive even if an IP address changes.&lt;/li&gt;
&lt;li&gt;On the client side, udp-link takes data from standard input and sends it to the server side via UDP, where it is then sent to the target TCP service.&lt;/li&gt;
&lt;li&gt;The server side of udp-link does not run as a daemon and instead is invoked on demand by the client through a temporary SSH connection.&lt;/li&gt;
&lt;li&gt;Authentication relies on a randomly generated key that is passed to the server side during the temporary SSH connection.&lt;/li&gt;
&lt;li&gt;udp-link doesn't encrypt the transferred data, which is useful when it is used together with SSH to avoid double encryption.&lt;/li&gt;
&lt;li&gt;Installation is done by cloning the GitHub repo, compiling, and installing it on both the client and the server.&lt;/li&gt;
&lt;li&gt;All configuration is done only on the client side.&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="networking"></category><category term="tools"></category></entry><entry><title>The zoo of binary-to-text encoding schemes</title><link href="https://vorakl.com/articles/stream-encoding/" rel="alternate"></link><published>2020-05-13T20:01:53-07:00</published><updated>2020-05-13T20:01:53-07:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2020-05-13:/articles/stream-encoding/</id><summary type="html">&lt;p class="first last"&gt;A stream encoding algorithm with a variable base (16, 32, 36, 64, 58, 85, 94)&lt;/p&gt;
</summary><content type="html">&lt;p&gt;In &lt;a class="reference external" href="https://vorakl.com/articles/base94/"&gt;the previous article&lt;/a&gt;, I discussed the use of &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Positional_notation"&gt;the positional numeral system&lt;/a&gt; for the purpose of &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Binary-to-text_encoding"&gt;binary-to-text translation&lt;/a&gt;. That method represents a binary file as a single big number with &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Radix"&gt;radix&lt;/a&gt; 256 and then converts this big number to another one with an arbitrary radix (base) in the range from 2 to 94. Although this approach gives the minimum possible size overhead, it unfortunately also has a number of downsides that make it hardly usable in real-world situations. In this article, I'll show what is used in practice, which encodings can be found in the wild, and how to build your own encoder.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="what-s-wrong-with-the-positional-single-number-encoding"&gt;
&lt;h2&gt;What's wrong with the positional single number encoding?&lt;/h2&gt;
&lt;p&gt;The main issue with converting a file, treated as a big number in &lt;em&gt;radix 256&lt;/em&gt;, to another big number with a smaller radix is that you need to read the whole file, load it into memory, and actually build that big number from every byte of the file. To construct the number, even the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Bit_numbering#Least_significant_byte"&gt;Least Significant Byte&lt;/a&gt; (LSB), which is the last byte of the file, has to be read and loaded. However, there is not always enough memory to load a whole file, and the whole file is not always available at any given time. For instance, the file may still be in transit over a network, with only a small number of bytes from the beginning (starting from the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Bit_numbering#Most_significant_byte"&gt;Most Significant Byte&lt;/a&gt;, MSB) received so far. This issue is usually addressed by processing a file as a &lt;strong&gt;stream of bytes&lt;/strong&gt;, in chunks, each of which is then converted in the same way (by converting a number from one base to another). These chunks are much smaller and, ideally, fit into a CPU register (up to 8 bytes). The only question is how to choose the size and ratio of such chunks (input and output) to keep the size overhead as close as possible to the minimum achievable by treating files as big numbers.&lt;/p&gt;
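&lt;p&gt;The chunked approach can be sketched in Python as a generator. It reads a fixed number of bytes at a time, interprets each chunk as a radix-256 number, and converts only that chunk to a fixed number of digits in the target radix. The names are made up for illustration. To keep the sketch short, it assumes that the input length is a multiple of the chunk size, so handling a shorter last chunk (padding) is left out.&lt;/p&gt;

```python
import io


def stream_to_digits(stream, in_size, out_digits, radix):
    """Yield one list of digit values (most significant first) per input chunk."""
    while True:
        chunk = stream.read(in_size)
        if not chunk:
            return
        if len(chunk) != in_size:
            raise ValueError("a short last chunk (padding) is not handled in this sketch")
        number = int.from_bytes(chunk, "big")  # the chunk as a radix-256 number
        digits = []
        for _ in range(out_digits):
            number, digit = divmod(number, radix)
            digits.append(digit)
        if number:
            raise ValueError("out_digits is too small for this radix")
        yield digits[::-1]


# Only 3 bytes are held in memory at a time, however long the stream is
source = io.BytesIO(b"ManMan")
print(list(stream_to_digits(source, 3, 4, 64)))  # [[19, 22, 5, 46], [19, 22, 5, 46]]
```

&lt;p&gt;With 3-byte input chunks, 4 output digits and radix 64, these digit values are exactly the Base64 positions for 'Man', which correspond to the letters 'TWFu' in the RFC 4648 alphabet.&lt;/p&gt;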
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="what-s-the-essence-of-a-positional-numeral-system"&gt;
&lt;h2&gt;What's the essence of a positional numeral system?&lt;/h2&gt;
&lt;p&gt;In positional numeral systems, everything revolves around a &lt;em&gt;radix&lt;/em&gt; (base), which shows how many different symbols are used to represent values. The actual glyphs don't matter, only their quantity. All these symbols are grouped in an alphabet (a table) where every symbol is defined by its own position, and this position represents its value. Since counting starts from 0, the maximum symbol value in any numeral system is always &lt;em&gt;radix - 1&lt;/em&gt;. For instance, in the numeral system with &lt;em&gt;radix 10&lt;/em&gt; (Decimal), the maximum value is represented by the symbol '9', while in a system with &lt;em&gt;radix 2&lt;/em&gt; (Binary), it is represented by the symbol '1'. When symbols from an alphabet appear as part of a number, they are called &lt;em&gt;digits&lt;/em&gt;. In this case, a digit's position within the number is called its &lt;em&gt;index&lt;/em&gt; and defines the power of the radix, while the digit's value (its position in the alphabet) defines the coefficient of that power of the radix.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;The first crucial conclusion&lt;/em&gt; here is that any number represented in some positional numeral system gets its meaning only when its radix is known.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;The second conclusion&lt;/em&gt; is not so obvious. Nowadays, humans mostly use the Decimal numeral system. Numbers make more sense to them when represented as Decimal numbers, and this is the system used the most for calculations. Every symbol in an alphabet is assigned a certain position, which is a number with some radix. In most cases, this radix is 10 (Decimal). The Decimal numeral system serves as a temporary system for converting one numeral system to another. Whenever a number is defined by its radix, this radix is written in Decimal, no matter what the radix of the number itself is. Whenever a number X with radix M needs to be converted to a number Y with radix N, both numbers (X and Y) are represented by certain alphabets (which define symbols with values), but their radixes (M and N) are always represented in the Decimal system. Thus, the Decimal system is used as an intermediate numeral system: the number X is converted to it first, and then the intermediate number is converted to the number Y. The intermediate numeral system could have had any radix, but &lt;em&gt;radix 10&lt;/em&gt; is what people use for calculations, and that's what can be found in most converter implementations.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;The third conclusion&lt;/em&gt; is even more important. Symbols themselves carry no value; only their positions in the alphabet do. This means we need to know not only a number's actual representation but also its radix and its alphabet: the table that assigns symbols to values (positions within the table). A good example is &lt;a class="reference external" href="https://tools.ietf.org/html/rfc4648#page-11"&gt;an alphabet of 16 symbols for Hexadecimal numbers&lt;/a&gt; (&lt;em&gt;radix 16&lt;/em&gt;). Its first 10 symbols are the digits linked to their equivalent values, so the symbol '0' is linked to 0, '1' to 1, and so on up to the symbol '9' linked to 9. The remaining 6 values (from 10 to 15) are linked to the English letters 'A' to 'F'. And again, these values (positions in the table) are all Decimal numbers (&lt;em&gt;radix 10&lt;/em&gt;). By the way, the table could have been different, but this one is used by convention, so anyone can interpret Hexadecimal numbers in the same way.&lt;/p&gt;
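&lt;p&gt;These three conclusions can be combined in a short Python sketch. A radix is simply the length of an alphabet, a symbol's value is its position in that alphabet, and every conversion goes through an intermediate number. Here, that intermediate number is a Python &lt;tt class="docutils literal"&gt;int&lt;/tt&gt;, which CPython does not store in Decimal internally. This shows that the intermediate radix really could be anything. The function names and alphabet constants are made up for illustration.&lt;/p&gt;

```python
DEC = "0123456789"
OCT = "01234567"
BIN = "01"
HEX = "0123456789ABCDEF"  # the RFC 4648 base16 alphabet


def to_int(text, alphabet):
    """Interpret text as a number whose radix is len(alphabet)."""
    radix = len(alphabet)
    value = 0
    for symbol in text:
        # The symbol itself means nothing: its position in the alphabet is its value
        value = value * radix + alphabet.index(symbol)
    return value


def from_int(value, alphabet):
    """Represent a non-negative int with the symbols of the given alphabet."""
    radix = len(alphabet)
    symbols = []
    while True:
        value, position = divmod(value, radix)
        symbols.append(alphabet[position])
        if value == 0:
            break
    return "".join(reversed(symbols))


def convert(text, src_alphabet, dst_alphabet):
    # X (radix M) -> intermediate number -> Y (radix N)
    return from_int(to_int(text, src_alphabet), dst_alphabet)


print(convert("123", OCT, DEC))  # 83
print(convert("83", DEC, BIN))   # 1010011
print(convert("FF", HEX, DEC))   # 255
```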
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="where-does-the-overhead-come-from"&gt;
&lt;h2&gt;Where does the overhead come from?&lt;/h2&gt;
&lt;p&gt;Let's take a look at a few examples. Here is a number '123' represented by three symbols, but until we know its radix, it is not possible to understand its value. If the radix is 10, then it is 'one hundred twenty-three' in the Decimal system, and it can be calculated by &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Positional_notation#Base_of_the_numeral_system"&gt;the formula&lt;/a&gt; for converting a number with any radix to &lt;em&gt;radix 10&lt;/em&gt; (because all numbers in this formula have radix 10): &lt;tt class="docutils literal"&gt;1*10^2 + 2*10^1 + 3*10^0 = 123&lt;/tt&gt;. If the radix is 8, then it is an Octal number, and it is constructed as &lt;tt class="docutils literal"&gt;1*8^2 + 2*8^1 + 3*8^0&lt;/tt&gt;, which gives us the Decimal number 83. So, &lt;em&gt;'123 base 8'&lt;/em&gt; equals &lt;em&gt;'83 base 10'&lt;/em&gt;. It is worth noticing that converting a number to a higher radix tends to reduce the number of symbols needed to represent it. The converse is also true. If the number 83 with &lt;em&gt;radix 10&lt;/em&gt; is converted to &lt;em&gt;radix 2&lt;/em&gt;, it takes the form '1010011'. Notice that the radix changed from 10 to 2, and the number of symbols changed from 2 to 7! The lower the radix, the more symbols appear in the representation.&lt;/p&gt;
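&lt;p&gt;A minimal Python sketch of both observations follows. The first helper evaluates a list of coefficients with the positional formula. The second counts how many digits a value needs in a given radix, using only integer arithmetic. Both helper names are hypothetical and used only for illustration.&lt;/p&gt;

```python
def positional_value(coefficients, radix):
    """Sum coefficient * radix**index, where the last coefficient has index 0."""
    return sum(c * radix ** i for i, c in enumerate(reversed(coefficients)))


def digits_needed(value, radix):
    """Number of digits needed to write a non-negative integer in the given radix."""
    count = 1
    while value >= radix:
        value //= radix
        count += 1
    return count


print(positional_value([1, 2, 3], 10))  # 123
print(positional_value([1, 2, 3], 8))   # 83

# The lower the radix, the more digits the same value needs
for radix in (2, 8, 10, 16):
    print(radix, digits_needed(83, radix))
# 2 7
# 8 3
# 10 2
# 16 2
```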
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Let's get back to binary files. In the structure of an ordinary file, what can we consider 'symbols' or 'digits', an 'alphabet', and a 'radix'? Any file consists of bytes, since a byte (8 bits) is the minimum addressable group of bits. So, we can think of a number's representation as a sequence of bytes. Such chunks can vary in size from 1 byte to the whole file. For example, if there is only one byte, then the number consists of only one digit. One byte, or 8 bits (binary digits with &lt;em&gt;radix 2&lt;/em&gt;), allows one to represent &lt;tt class="docutils literal"&gt;2^8 = 256&lt;/tt&gt; different numbers. That means we can have 256 different symbols with their positions to build an alphabet. The good news is that such tables were standardized many years ago: &lt;a class="reference external" href="https://www.ascii-code.com/"&gt;ASCII&lt;/a&gt; defines the first 128 symbols, and 8-bit 'extended ASCII' code pages define the remaining 128. And the last thing: as the alphabet size is 256 symbols, the radix is also 256. Here is our number: the number of bytes in the chunk that we are going to process is the number of digits, the radix is 256, and each coefficient ranges from 0 to 255. For example, suppose a group of bytes read from a stream and processed at once consists of 4 bytes (from MSB to LSB), &lt;em&gt;[13, 200, 3, 65]&lt;/em&gt;. Then our number can be represented as a Decimal number (&lt;em&gt;radix 10&lt;/em&gt;) as &lt;tt class="docutils literal"&gt;13*256^3 + 200*256^2 + 3*256^1 + 65*256^0 = 231211841&lt;/tt&gt;.&lt;/p&gt;
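&lt;p&gt;In Python, &lt;tt class="docutils literal"&gt;int.from_bytes&lt;/tt&gt; with big-endian byte order (MSB first) performs exactly this calculation, so it is easy to double-check the example. Here is a short sketch:&lt;/p&gt;

```python
chunk = bytes([13, 200, 3, 65])  # MSB first, LSB last

# The positional formula with radix 256
manual = 13 * 256**3 + 200 * 256**2 + 3 * 256**1 + 65 * 256**0

# The same number built by the standard library
built_in = int.from_bytes(chunk, "big")

print(manual, built_in)                      # 231211841 231211841
print(built_in.to_bytes(4, "big") == chunk)  # True: the conversion is lossless
```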
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As discussed in &lt;a class="reference external" href="https://vorakl.com/articles/base94/"&gt;the previous article&lt;/a&gt;, we can use no more than 94 different symbols to reliably represent text. Thus, the desired radix lies somewhere in the range from 2 to 94. Even 94 is much less than 256, so a number's representation in the new radix is likely to need more symbols. This means, in turn, that the output group will have more bytes, because a byte is the minimum amount of data we can operate on, even if a digit represented by a symbol needs fewer bits. You'll still need to allocate a whole byte for each symbol in the new radix representation, and some bits in such bytes will never be used. This is the root of the inefficiency, and that's why it's so important to find a good ratio of output to input byte groups. For instance, &lt;a class="reference external" href="https://tools.ietf.org/html/rfc4648#section-4"&gt;Base64&lt;/a&gt;, the most widely used of these encodings nowadays, converts binary files to text by reading 3-byte groups from the input stream. It represents each group as a 3-digit number with &lt;em&gt;radix 256&lt;/em&gt; (&lt;tt class="docutils literal"&gt;log[256^3, 2] = 24&lt;/tt&gt; bits) and then converts this number to a 4-digit number with &lt;em&gt;radix 64&lt;/em&gt; (&lt;tt class="docutils literal"&gt;log[64^4, 2] = 24&lt;/tt&gt; bits), which in turn is written to the output stream as a group of 4 bytes. So, the ratio of output to input is &lt;tt class="docutils literal"&gt;4/3 = 1.333333&lt;/tt&gt;. In other words, the size overhead is 33.(3)%. The exact combination of input and output group sizes for a streaming conversion is chosen based on a few considerations, including the target radix, the desired/available alphabet, the ability to compute natively on a CPU, etc.&lt;/p&gt;
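&lt;p&gt;The Base64 numbers can be checked with Python's standard &lt;tt class="docutils literal"&gt;base64&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;math&lt;/tt&gt; modules. In this sketch, the input data is arbitrary, and its length is chosen as a multiple of 3 so that no padding is added.&lt;/p&gt;

```python
import base64
import math

# Both groups carry exactly the same amount of information: 24 bits
print(math.log2(256 ** 3), math.log2(64 ** 4))  # 24.0 24.0

data = bytes(range(256)) * 3  # 768 bytes, a multiple of 3
encoded = base64.b64encode(data)

print(len(data), len(encoded))            # 768 1024
print(len(encoded) * 3 == len(data) * 4)  # True: the 4/3 ratio
```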
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
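&lt;p&gt;Here is a minimal Python sketch of that single Base64 step, for illustration only: three bytes become one 24-bit number, which is then rewritten as four radix-64 digits. The alphabet is the standard one from RFC 4648; the function name &lt;tt class="docutils literal"&gt;encode_group64&lt;/tt&gt; is hypothetical, and padding of a short last group is deliberately left out.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;import base64

# Standard Base64 alphabet (RFC 4648, section 4)
ALPHABET64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group64(group):
    # Read 3 bytes as one 3-digit radix-256 number (24 bits)
    n = int.from_bytes(group, "big")
    digits = []
    for _ in range(4):              # 64**4 == 256**3, so 4 digits are exact
        n, r = divmod(n, 64)        # remainders are radix-64 digits, least significant first
        digits.append(ALPHABET64[r])
    return "".join(reversed(digits))

print(encode_group64(b"Man"))               # TWFu
print(base64.b64encode(b"Man").decode())    # TWFu, same result from the standard library
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;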
&lt;/div&gt;
&lt;div class="section" id="how-to-calculate-a-minimal-overhead"&gt;
&lt;h2&gt;How to calculate a minimal overhead?&lt;/h2&gt;
&lt;p&gt;First, let's calculate how many digits in a target base (radix) are needed to represent exactly the same number given in the initial base. For instance, take the number 123 in radix 10. How many bits (binary digits, radix 2) are needed to represent the same number? Every digit is the coefficient of a power of the base; when the existing positions are not enough, one more position with the next power of the base is added. Keeping in mind that counting starts from 0, saying that some number needs 8 bits means that the powers of the base from 0 to 7, each multiplied by its coefficient, have to be summed up. Thus, to find the number of digits needed to represent a number in some radix, we need to find the exponent to which the new radix has to be raised. In our case, for the base-10 number 123, we calculate the exponent of base 2 with a logarithm: &lt;tt class="docutils literal"&gt;log[123, 2] = 6.9425145&lt;/tt&gt;. This means that a little less than 7 bits is enough to represent 123. All computer systems operate on a set of &lt;a class="reference external" href="https://vorakl.com/articles/numbers/"&gt;natural numbers&lt;/a&gt; only, so it is not possible to use 6.9425145 bits: this value is only an approximation of the number of needed bits. 6 bits are clearly not enough (&lt;tt class="docutils literal"&gt;2^6 = 64&lt;/tt&gt;, which is much less than 123), so the only correct approach is to always round non-integer values up (with the &lt;em&gt;ceil&lt;/em&gt; function). Unfortunately, 7 bits can represent more values than needed (&lt;tt class="docutils literal"&gt;2^7 = 128&lt;/tt&gt;), and this again contributes to the final overhead.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
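&lt;p&gt;A small Python sketch can compare the logarithm estimate with an exact count. The helper &lt;tt class="docutils literal"&gt;digit_count&lt;/tt&gt; is hypothetical: it counts digits by repeated integer division, which avoids floating-point rounding. One subtlety is worth keeping in mind: &lt;tt class="docutils literal"&gt;ceil(log[N, b])&lt;/tt&gt; is the number of digits needed for all N values from 0 to N-1 (which is what matters when N = 256^k), while the number N itself needs one more digit when N is an exact power of b (e.g. 128 needs 8 bits).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;from math import ceil, log

def digit_count(n, base):
    # How many digits in the given base it takes to write the number n itself
    count = 1
    n //= base
    while n:
        n //= base
        count += 1
    return count

print(log(123, 2))              # roughly 6.94
print(ceil(log(123, 2)))        # 7
print(digit_count(123, 2))      # 7, same as len(bin(123)) - 2
print(digit_count(128, 2))      # 8: an exact power of the base needs one more digit
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;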
&lt;p&gt;Let's look at Base64 again. We already know (though not yet why) that this streaming scheme takes 3 input bytes (a 3-digit number with a &lt;em&gt;base 256&lt;/em&gt;) and converts them to a number with a &lt;em&gt;base 64&lt;/em&gt;. How many base-64 digits will this number contain? The answer is &lt;tt class="docutils literal"&gt;log[256^3, 64] = 4&lt;/tt&gt;: four digits, hence 4 symbols from the Base64 alphabet.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;While looking for good input and output group sizes, it helps to know the theoretical minimum of the overhead. To find it, we do a similar calculation but take the smallest possible amount of input data, which is one byte (8 bits, &lt;tt class="docutils literal"&gt;2^8 = 256&lt;/tt&gt; values). For Base64, it is &lt;tt class="docutils literal"&gt;log[256, 64] = 1.33(3)&lt;/tt&gt;, which is again 33.(3)%. For &lt;a class="reference external" href="https://tools.ietf.org/html/rfc4648#section-6"&gt;Base32&lt;/a&gt; it is &lt;tt class="docutils literal"&gt;log[256, 32] = 1.6&lt;/tt&gt;, which is 60%. And for &lt;a class="reference external" href="https://tools.ietf.org/html/rfc4648#section-8"&gt;Base16&lt;/a&gt; it is &lt;tt class="docutils literal"&gt;log[256, 16] = 2&lt;/tt&gt;, which is 100%. Wow! These theoretical numbers are exactly the same as the ratios of output bytes to input bytes used in practice: for Base64 it is &lt;tt class="docutils literal"&gt;4 / 3 = 1.33(3)&lt;/tt&gt;, for Base32 it is &lt;tt class="docutils literal"&gt;8 / 5 = 1.6&lt;/tt&gt;, and for Base16 it is &lt;tt class="docutils literal"&gt;2 / 1 = 2&lt;/tt&gt;. All three bases (16, 32, 64) have one thing in common: they are all powers of two! This leads to the conclusion that converting numbers between &amp;quot;power of two&amp;quot; bases makes it possible to reach the best possible ratio and to match an input group of bits precisely to an output group of bits. However, this is not always desirable or even possible. Sometimes a specific alphabet is needed, e.g. in &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Base36"&gt;Base36&lt;/a&gt;, or the minimal overhead, e.g. in &lt;a class="reference external" href="https://www.johndcook.com/blog/2019/03/05/base85-encoding/"&gt;Base85&lt;/a&gt; or &lt;a class="reference external" href="https://gist.github.com/iso2022jp/4054241"&gt;Base94&lt;/a&gt;. None of these bases is a &amp;quot;power of two&amp;quot;, so a tradeoff has to be found to minimize the overhead.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
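&lt;p&gt;The &amp;quot;power of two&amp;quot; observation can also be checked with integers only, without floating-point logarithms. In this minimal Python sketch (the name &lt;tt class="docutils literal"&gt;exact_groups&lt;/tt&gt; and the 8-byte limit are assumptions), an input group of i bytes fits exactly into j output digits when &lt;tt class="docutils literal"&gt;256^i = base^j&lt;/tt&gt;. For a base that is not a power of two, such as 85 or 94, no group size ever works, because &lt;tt class="docutils literal"&gt;256^i&lt;/tt&gt; has no odd prime factors.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;def exact_groups(base, max_bytes=8):
    # Smallest input group of i bytes whose 256**i values fill exactly
    # base**j output digits; None if there is no such group (base must be 2 or more)
    for i in range(1, max_bytes + 1):
        n, j = 256 ** i, 0
        while n % base == 0:
            n //= base
            j += 1
        if n == 1:
            return i, j          # (input bytes, output digits)
    return None

for base in (16, 32, 64, 85, 94):
    print(f"Base{base}: {exact_groups(base)}")
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;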
&lt;/div&gt;
&lt;div class="section" id="how-to-calculate-optimal-input-and-output-groups"&gt;
&lt;h2&gt;How to calculate optimal input and output groups?&lt;/h2&gt;
&lt;p&gt;Alright, we've calculated the number of digits needed to represent a number in another base. But why is that only a theoretical minimum? Why would it take more in practice? And why would we still need to find a good ratio of output to input byte groups? To answer these questions, let's look at the &lt;strong&gt;Base85&lt;/strong&gt; encoding. To represent 1 byte (Base256) of information in Base85, it needs &lt;tt class="docutils literal"&gt;log[256, 85] = 1.24816852&lt;/tt&gt; digits. But we can't use 1.248 digits: only positive whole numbers are available! One digit is not enough, so 2 digits are the only way to go. In other words, to represent 1 byte (a number in Base256), we would in fact need 2 bytes (a number in Base85), and about 75% of the second byte would be wasted. The ratio is &lt;tt class="docutils literal"&gt;2/1 = 2&lt;/tt&gt;, a 100% overhead instead of the theoretical 24.8%. There is no point in using a 1-byte input group with a 2-byte output group. Thus, we need input and output groups whose ratio comes as close as possible to the calculated minimum, or even matches it!&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
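&lt;p&gt;For comparison, here is a minimal Python sketch of a 4-byte input group encoded with 5 Base85 digits. It assumes the Ascii85-style alphabet (digit value + 33, i.e. the symbols from '!' to 'u') and skips Ascii85's special cases, such as the 'z' shortcut for an all-zero group; the function names are hypothetical. Since &lt;tt class="docutils literal"&gt;85^4 = 52200625&lt;/tt&gt; is less than &lt;tt class="docutils literal"&gt;256^4 = 4294967296&lt;/tt&gt;, but &lt;tt class="docutils literal"&gt;85^5 = 4437053125&lt;/tt&gt; is greater, 5 digits are always enough, which gives the 5/4 ratio.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;import base64

def encode_group85(group):
    # 4 input bytes form one radix-256 number below 256**4
    n = int.from_bytes(group, "big")
    digits = []
    for _ in range(5):                  # 85**5 covers all 256**4 values
        n, r = divmod(n, 85)
        digits.append(chr(33 + r))      # Ascii85-style symbol for digit r
    return "".join(reversed(digits))

def decode_group85(text):
    # Inverse: read 5 radix-85 digits back into a 4-byte group
    n = 0
    for ch in text:
        n = n * 85 + (ord(ch) - 33)
    return n.to_bytes(4, "big")

print(encode_group85(b"Man "))                  # 9jqo^
print(base64.a85encode(b"Man ").decode())       # 9jqo^
print(decode_group85("9jqo^"))                  # b'Man '
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;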
&lt;p&gt;The following approach starts with a 1-byte input group and, using the same formula, checks how many digits the destination base needs. If the result is not close enough to the next whole number, it increments the input group by 1 byte and checks again. It is up to you to decide how large an input group is acceptable and how close the digit count needs to be to the next whole number (the result of the ceil function).&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This code goes through all bases from 2 to 94 and, for each of them, prints the first input/output group (if any) whose digit count is within 0.1 of the next whole number, that is, &lt;tt class="docutils literal"&gt;ceil(x) - x &amp;lt;= 0.1&lt;/tt&gt;. I limited the input group to 19 bytes (&lt;tt class="docutils literal"&gt;range(1, 20)&lt;/tt&gt;), but in reality, groups larger than 8 bytes (64 bits) require either a &lt;a class="reference external" href="https://gist.github.com/iso2022jp/4054241"&gt;more complicated implementation&lt;/a&gt; that is still based on 64-bit variable types, or big-number arithmetic, which would bring us back to the solution from &lt;a class="reference external" href="https://vorakl.com/articles/base94/"&gt;the previous article&lt;/a&gt;.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;math&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ceil&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_dec_fractions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;b_in&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b_out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;find_dec_fractions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="ne"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Base{i}: output/input {b_out} / {b_in}; Ratio: {ceil(b_out)} / {b_in} = {ceil(b_out)/b_in}&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;Base2: output/input 8.0 / 1; Ratio: 8 / 1 = 8.0
Base3: output/input 95.90132254286152 / 19; Ratio: 96 / 19 = 5.052631578947368
Base4: output/input 4.0 / 1; Ratio: 4 / 1 = 4.0
Base6: output/input 30.948224578763327 / 10; Ratio: 31 / 10 = 3.1
Base7: output/input 19.94760247804924 / 7; Ratio: 20 / 7 = 2.857142857142857
Base8: output/input 8.0 / 3; Ratio: 8 / 3 = 2.6666666666666665
Base9: output/input 42.9032232428591 / 17; Ratio: 43 / 17 = 2.5294117647058822
Base10: output/input 40.940079410301436 / 17; Ratio: 41 / 17 = 2.411764705882353
Base11: output/input 6.937555831629307 / 3; Ratio: 7 / 3 = 2.3333333333333335
Base12: output/input 8.926174260836154 / 4; Ratio: 9 / 4 = 2.25
Base13: output/input 12.971431412511347 / 6; Ratio: 13 / 6 = 2.1666666666666665
Base14: output/input 18.910766522677935 / 9; Ratio: 19 / 9 = 2.111111111111111
Base15: output/input 38.905619771091956 / 19; Ratio: 39 / 19 = 2.0526315789473686
Base16: output/input 2.0 / 1; Ratio: 2 / 1 = 2.0
Base17: output/input 1.957204336945808 / 1; Ratio: 2 / 1 = 2.0
Base18: output/input 1.9184997325450517 / 1; Ratio: 2 / 1 = 2.0
Base19: output/input 16.949441762397953 / 9; Ratio: 17 / 9 = 1.8888888888888888
Base20: output/input 12.957179936946513 / 7; Ratio: 13 / 7 = 1.8571428571428572
Base21: output/input 10.928171937453742 / 6; Ratio: 11 / 6 = 1.8333333333333333
Base22: output/input 8.969752968703016 / 5; Ratio: 9 / 5 = 1.8
Base23: output/input 15.916660520940269 / 9; Ratio: 16 / 9 = 1.7777777777777777
Base24: output/input 6.97933734353701 / 4; Ratio: 7 / 4 = 1.75
Base25: output/input 18.949768555229294 / 11; Ratio: 19 / 11 = 1.7272727272727273
Base26: output/input 11.913778998988336 / 7; Ratio: 12 / 7 = 1.7142857142857142
Base27: output/input 26.919669485715517 / 16; Ratio: 27 / 16 = 1.6875
Base28: output/input 4.992350344236227 / 3; Ratio: 5 / 3 = 1.6666666666666667
Base29: output/input 4.940323979050427 / 3; Ratio: 5 / 3 = 1.6666666666666667
Base30: output/input 17.933964143964545 / 11; Ratio: 18 / 11 = 1.6363636363636365
Base31: output/input 12.91834154125439 / 8; Ratio: 13 / 8 = 1.625
Base32: output/input 8.0 / 5; Ratio: 8 / 5 = 1.6
Base33: output/input 7.929594526822421 / 5; Ratio: 8 / 5 = 1.6
Base35: output/input 10.917705226052034 / 7; Ratio: 11 / 7 = 1.5714285714285714
Base36: output/input 13.926701060443497 / 9; Ratio: 14 / 9 = 1.5555555555555556
Base37: output/input 19.963706880682256 / 13; Ratio: 20 / 13 = 1.5384615384615385
Base38: output/input 25.91499209004118 / 17; Ratio: 26 / 17 = 1.5294117647058822
Base41: output/input 2.9864385798230937 / 2; Ratio: 3 / 2 = 1.5
Base42: output/input 2.9671843746459023 / 2; Ratio: 3 / 2 = 1.5
Base43: output/input 2.9486213303792987 / 2; Ratio: 3 / 2 = 1.5
Base44: output/input 2.930708014618138 / 2; Ratio: 3 / 2 = 1.5
Base45: output/input 2.913406407519012 / 2; Ratio: 3 / 2 = 1.5
Base46: output/input 15.93174851664354 / 11; Ratio: 16 / 11 = 1.4545454545454546
Base47: output/input 12.96225551928187 / 9; Ratio: 13 / 9 = 1.4444444444444444
Base48: output/input 22.918685664133292 / 16; Ratio: 23 / 16 = 1.4375
Base49: output/input 9.97380123902462 / 7; Ratio: 10 / 7 = 1.4285714285714286
Base50: output/input 9.922293927591243 / 7; Ratio: 10 / 7 = 1.4285714285714286
Base51: output/input 16.92397770133268 / 12; Ratio: 17 / 12 = 1.4166666666666667
Base53: output/input 6.983337201921797 / 5; Ratio: 7 / 5 = 1.4
Base54: output/input 6.9506137148575995 / 5; Ratio: 7 / 5 = 1.4
Base55: output/input 6.918787617803083 / 5; Ratio: 7 / 5 = 1.4
Base56: output/input 17.9083251145862 / 13; Ratio: 18 / 13 = 1.3846153846153846
Base57: output/input 10.97226243673046 / 8; Ratio: 11 / 8 = 1.375
Base58: output/input 10.925265898478088 / 8; Ratio: 11 / 8 = 1.375
Base59: output/input 14.959262233248435 / 11; Ratio: 15 / 11 = 1.3636363636363635
Base60: output/input 18.960906451063522 / 14; Ratio: 19 / 14 = 1.3571428571428572
Base61: output/input 22.93138142177215 / 17; Ratio: 23 / 17 = 1.3529411764705883
Base64: output/input 4.0 / 3; Ratio: 4 / 3 = 1.3333333333333333
Base65: output/input 3.9851435091825076 / 3; Ratio: 4 / 3 = 1.3333333333333333
Base66: output/input 3.9706212940573997 / 3; Ratio: 4 / 3 = 1.3333333333333333
Base67: output/input 3.9564205613318486 / 3; Ratio: 4 / 3 = 1.3333333333333333
Base68: output/input 3.942529199089205 / 3; Ratio: 4 / 3 = 1.3333333333333333
Base69: output/input 3.9289357306851747 / 3; Ratio: 4 / 3 = 1.3333333333333333
Base70: output/input 3.9156292724042583 / 3; Ratio: 4 / 3 = 1.3333333333333333
Base71: output/input 3.9025994945192193 / 3; Ratio: 4 / 3 = 1.3333333333333333
Base72: output/input 12.966121951449782 / 10; Ratio: 13 / 10 = 1.3
Base73: output/input 12.92443739543971 / 10; Ratio: 13 / 10 = 1.3
Base74: output/input 21.90208895887644 / 17; Ratio: 22 / 17 = 1.2941176470588236
Base75: output/input 8.990468784305198 / 7; Ratio: 9 / 7 = 1.2857142857142858
Base76: output/input 8.962972102269996 / 7; Ratio: 9 / 7 = 1.2857142857142858
Base77: output/input 8.935999277516537 / 7; Ratio: 9 / 7 = 1.2857142857142858
Base78: output/input 8.909533240680473 / 7; Ratio: 9 / 7 = 1.2857142857142858
Base79: output/input 13.959876384572452 / 11; Ratio: 14 / 11 = 1.2727272727272727
Base80: output/input 13.919804002700841 / 11; Ratio: 14 / 11 = 1.2727272727272727
Base81: output/input 18.927892607143722 / 15; Ratio: 19 / 15 = 1.2666666666666666
Base82: output/input 23.908573597131127 / 19; Ratio: 24 / 19 = 1.263157894736842
Base85: output/input 4.9926740807112 / 4; Ratio: 5 / 4 = 1.25
Base86: output/input 4.979564524879807 / 4; Ratio: 5 / 4 = 1.25
Base87: output/input 4.966674008644963 / 4; Ratio: 5 / 4 = 1.25
Base88: output/input 4.953996247544582 / 4; Ratio: 5 / 4 = 1.25
Base89: output/input 4.941525209635524 / 4; Ratio: 5 / 4 = 1.25
Base90: output/input 4.929255102536434 / 4; Ratio: 5 / 4 = 1.25
Base91: output/input 4.917180361275656 / 4; Ratio: 5 / 4 = 1.25
Base92: output/input 4.905295636885699 / 4; Ratio: 5 / 4 = 1.25
Base93: output/input 15.904186303494539 / 13; Ratio: 16 / 13 = 1.2307692307692308
Base94: output/input 10.984670683283468 / 9; Ratio: 11 / 9 = 1.2222222222222223
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusions"&gt;
&lt;h2&gt;Conclusions&lt;/h2&gt;
&lt;p&gt;This output provides several interesting insights:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;All the &amp;quot;power of two&amp;quot; bases, e.g. Base16/32/64, reach an exact whole number of output digits at some input group size, because the source base (256) is also a &amp;quot;power of two&amp;quot;! This simple fact makes it even easier to calculate the optimal groups by finding the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Least_common_multiple"&gt;LCM (Least Common Multiple)&lt;/a&gt;, as shown in &lt;a class="reference external" href="https://vorakl.com/articles/base94/"&gt;the previous article&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;There are a few groups of adjacent bases that require the same number of digits but differ in the size of their alphabets. It seems reasonable to prefer the smaller alphabet, as fewer special symbols lead to better readability, e.g. when an encoded text is used as the value of a variable in a programming language, or is read aloud over a voice channel (encoded license keys).&lt;/li&gt;
&lt;li&gt;The sizes of binary files, and especially executable files, often turn out to be evenly divisible by 4. This makes it reasonable to use bases with 4-byte input groups: there will be fewer cases where the last byte group doesn't contain all the data needed for the conversion. Even when it happens, it is usually handled by padding with NUL bytes. &lt;a class="reference external" href="https://tools.ietf.org/html/rfc4648#section-3.2"&gt;Base32 and Base64&lt;/a&gt; use one extra symbol outside the alphabet, '=', for padding, and &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Ascii85#Adobe_version"&gt;Ascii85 uses an even smarter approach&lt;/a&gt;, with no extra symbols in the output stream.&lt;/li&gt;
&lt;li&gt;Among all the bases in the list, one stands out: Base85. It uses 4-byte input groups, which align with the common case of binary file sizes, and its 5-byte output groups give only a 25% overhead, which is more efficient than Base64 (with its 33.(3)%). Both groups fit into the registers of all modern CPUs. All these factors make Base85 a better fit for binary-to-text conversion than Base64, which is commonly used on the Internet today, or &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Uuencoding"&gt;UUEncode&lt;/a&gt;, used some time ago on &lt;a class="reference external" href="https://en.wikipedia.org/wiki/FidoNet"&gt;FidoNet&lt;/a&gt; (it uses the same 3-to-4 byte scheme as Base64, with a different alphabet). With differing alphabets, Base85 is used in &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Ascii85"&gt;PDF&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/git/git/blob/53f9a3e157dbbc901a02ac2c73346d375e24978c/base85.c"&gt;Git&lt;/a&gt;, and &lt;a class="reference external" href="https://rfc.zeromq.org/spec/32/"&gt;ZeroMQ&lt;/a&gt;, and it is also implemented in the &lt;a class="reference external" href="https://github.com/python/cpython/blob/3.8/Lib/base64.py#L416"&gt;base64 module of the Python Standard Library&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://www.crockford.com/base32.html"&gt;Crockford-Base32&lt;/a&gt;, &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Base36"&gt;Base36&lt;/a&gt;, and &lt;a class="reference external" href="https://www.johndcook.com/blog/2019/03/04/base-58-encoding-and-bitcoin-addresses/"&gt;Base58&lt;/a&gt; are also known to be used in special applications, where efficiency is not the main consideration and these encodings meet other requirements.&lt;/li&gt;
&lt;/ol&gt;
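&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The size difference between Base64 and Base85, and the padding behavior mentioned above, can be observed with the &lt;tt class="docutils literal"&gt;base64&lt;/tt&gt; module of the Python Standard Library. This is just a quick sketch: &lt;tt class="docutils literal"&gt;b85encode()&lt;/tt&gt; uses the alphabet of git-style binary diffs (not the Adobe Ascii85 one), and the sample data is arbitrary.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;import base64

data = bytes(range(1, 241)) * 5     # 1200 bytes: divisible by both 3 and 4

print(len(base64.b64encode(data)))  # 1600 = 1200 / 3 * 4 (33.(3)% overhead)
print(len(base64.b85encode(data)))  # 1500 = 1200 / 4 * 5 (25% overhead)

# A 5-byte input does not fill the last group of either encoding
print(base64.b64encode(b"hello"))        # b'aGVsbG8=': padded with '='
print(len(base64.b85encode(b"hello")))   # 7: the partial group is truncated, no padding symbol
print(base64.b85decode(base64.b85encode(b"hello")))  # b'hello'
&lt;/pre&gt;&lt;/div&gt;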
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="cs"></category><category term="programming"></category><category term="binary-to-text"></category><category term="encoding"></category></entry><entry><title>Convert binary data to a text with the lowest overhead</title><link href="https://vorakl.com/articles/base94/" rel="alternate"></link><published>2020-04-18T23:10:29-07:00</published><updated>2020-04-18T23:10:29-07:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2020-04-18:/articles/base94/</id><summary type="html">&lt;p class="first last"&gt;A binary-to-text encoding with any radix from 2 to 94&lt;/p&gt;
</summary><content type="html">&lt;p&gt;&lt;a class="reference internal" href="#summary"&gt;TLDR: quick summary of the article&lt;/a&gt;&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This article discusses &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Binary-to-text_encoding"&gt;binary/text converters&lt;/a&gt;, the most popular implementations, and a non-standard approach that uses &lt;a class="reference external" href="https://merrigrove.blogspot.com/2014/04/what-heck-is-base64-encoding-really.html"&gt;place-based single-number encoding&lt;/a&gt; by representing a file as a large number and then converting it to another large number with any non-256 (1-byte/8-bit) radix. To make it practical, it makes sense to limit a radix (base) to 94 for matching numbers to all possible printable symbols within the 7-bit &lt;a class="reference external" href="https://www.ascii-code.com/"&gt;ASCII&lt;/a&gt; table. It is probably a theoretical prototype and has a purely academic flavor, as the time and space complexities make it applicable only to small files (up to a few tens of kilobytes), although it allows one to choose any base with no dependencies on powers of two, e.g. 7 or 77.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="background"&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;The main purpose of such converters is to convert a binary file, represented by 256 different symbols (radix-256, 1 byte, 2^8), into a form suitable for transmission over a channel with a limited range of supported symbols. A good example is any text-based network protocol, such as HTTP (before version 2) or SMTP, where all transmitted binary data must be reversibly converted to a pure text form without control symbols. As you may know, ASCII codes from 0 to 31 are control characters, and therefore they will definitely be lost during transmission over any logical channel that doesn't allow endpoints to transmit full 8-bit bytes (binary) with codes from 0 to 255. This limits the number of allowed symbols to at most 224 (256-32), but in practice the limit is even lower: only the first 128 symbols (2^7, 7 bits) of the ASCII table are standardized, and not all of them are usable either.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The standard solution today is the Base64 algorithm defined in &lt;a class="reference external" href="https://tools.ietf.org/html/rfc4648"&gt;RFC 4648&lt;/a&gt; (easy to read and understand). It also describes Base32 and Base16 as possible variants. The key point here is that they all share the same property: their bases are powers of two. The wider the range of supported symbols (codes), the more space-efficient the result of the conversion. The output will be larger anyway; the question is how much larger. For example, Base64 encoding produces about 33% larger output, because 3 input bytes (8 significant bits each) are translated into 4 output bytes (6 significant bits each, 2^6=64). So the ratio is always 4/3, i.e. the output is larger by 1/3 or 33.(3)%. Practically speaking, Base32 is very inefficient, because it translates 5 input bytes (8 significant bits each) into 8 output bytes (5 significant bits each, 2^5=32), and the ratio is 8/5, i.e. the output is larger by 3/5 or 60%. In this context, it is hard to call Base16 efficient at all, since its output is 100% larger (each byte of 8 significant bits is represented by two bytes of 4 significant bits, also known as &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Nibble"&gt;nibbles&lt;/a&gt;, 2^4=16). By the way, this is the widely used representation of 8-bit bytes called hexadecimal.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you're curious how these input/output byte ratios were calculated for
the Base64/32/16 encodings, the answer is the LCM (Least Common Multiple). Let's
calculate it ourselves; for that we need another function, the GCD (Greatest
Common Divisor):&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;dl class="first docutils"&gt;
&lt;dt&gt;Base64 (Input: 8 bits, Output: 6 bits):&lt;/dt&gt;
&lt;dd&gt;&lt;ul class="first last"&gt;
&lt;li&gt;LCM(8, 6) = 8*6/GCD(8,6) = 24 bits&lt;/li&gt;
&lt;li&gt;Input: 24 / 8 = 3 bytes&lt;/li&gt;
&lt;li&gt;Output: 24 / 6 = 4 bytes&lt;/li&gt;
&lt;li&gt;Ratio (Output/Input): 4/3&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;li&gt;&lt;dl class="first docutils"&gt;
&lt;dt&gt;Base32 (Input: 8 bits, Output: 5 bits):&lt;/dt&gt;
&lt;dd&gt;&lt;ul class="first last"&gt;
&lt;li&gt;LCM(8, 5) = 8*5/GCD(8,5) = 40 bits&lt;/li&gt;
&lt;li&gt;Input: 40 / 8 = 5 bytes&lt;/li&gt;
&lt;li&gt;Output: 40 / 5 = 8 bytes&lt;/li&gt;
&lt;li&gt;Ratio (Output/Input): 8/5&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;li&gt;&lt;dl class="first docutils"&gt;
&lt;dt&gt;Base16 (Input: 8 bits, Output: 4 bits):&lt;/dt&gt;
&lt;dd&gt;&lt;ul class="first last"&gt;
&lt;li&gt;LCM(8, 4) = 8*4/GCD(8,4) = 8 bits&lt;/li&gt;
&lt;li&gt;Input: 8 / 8 = 1 byte&lt;/li&gt;
&lt;li&gt;Output: 8 / 4 = 2 bytes&lt;/li&gt;
&lt;li&gt;Ratio (Output/Input): 2/1&lt;/li&gt;
&lt;/ul&gt;
&lt;/dd&gt;
&lt;/dl&gt;
&lt;/li&gt;
&lt;/ol&gt;
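&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The same calculation takes only a few lines of Python. This is a sketch for illustration; the helper names &lt;tt class="docutils literal"&gt;lcm&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;groups&lt;/tt&gt; are hypothetical, and &lt;tt class="docutils literal"&gt;lcm&lt;/tt&gt; is defined via &lt;tt class="docutils literal"&gt;math.gcd&lt;/tt&gt; exactly as in the formulas above (Python 3.9+ also ships &lt;tt class="docutils literal"&gt;math.lcm&lt;/tt&gt;).&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;from math import gcd

def lcm(a, b):
    # LCM(a, b) = a * b / GCD(a, b)
    return a * b // gcd(a, b)

def groups(bits_per_symbol):
    # Smallest bit block that splits into whole input bytes and whole output symbols
    bits = lcm(8, bits_per_symbol)
    return bits // 8, bits // bits_per_symbol   # (input bytes, output bytes)

for name, bits in (("Base64", 6), ("Base32", 5), ("Base16", 4)):
    inp, out = groups(bits)
    print(f"{name}: {inp} bytes in, {out} bytes out, ratio {out}/{inp}")
&lt;/pre&gt;&lt;/div&gt;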
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="the-key-problem"&gt;
&lt;h2&gt;The key problem&lt;/h2&gt;
&lt;p&gt;What if a channel can transmit only a handful of different symbols, such as 7, 9, or 17? Say we have a file represented by a 256-symbol alphabet (a normal 8-bit byte), we are not really limited by computing power or memory on either side, but we can send only 7 different symbols instead of 256. Base64/32/16 are not a solution here; Base7 is the only possible output format.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Another example: what if the amount of transmitted data is a concern for a channel? As shown above, Base64 always increases the data by 33%, no matter what is transmitted. Base94, for example, increases the output by only about 22%.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
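&lt;p&gt;The theoretical minimum overhead of any base b can be estimated as &lt;tt class="docutils literal"&gt;log[256, b] - 1&lt;/tt&gt;: the number of base-b digits needed per input byte, minus that byte itself. A quick Python sketch (the function name &lt;tt class="docutils literal"&gt;overhead&lt;/tt&gt; is hypothetical) shows where Base7 from the example above and Base94 land. Real encoders can only approach these numbers, since they work with whole digits.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;from math import log

def overhead(base):
    # Extra output size in percent: log[256, base] digits per input byte, minus 1
    return (log(256, base) - 1) * 100

for base in (7, 16, 32, 64, 85, 94):
    print(f"Base{base}: {overhead(base):.1f}%")
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;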
&lt;p&gt;It may seem that Base94 is not the limit. If the first 32 ASCII codes are control characters, and there are 256 codes in total, what stops you from using an alphabet of 256 - 32 = 224 symbols? It turns out that there is a limit. Not all of the remaining 224 codes are printable characters or have a standard representation. In general, only the 7-bit range (0..127) is standardized as ASCII, and the rest (128..255) is used by a variety of locale-specific encodings, e.g. KOI8-R, Windows-1251, etc. This means that only 128 - 32 = 96 codes are available in the standardized range. In addition, code 32 is the space character, and code 127 (DEL) doesn't have a visible character either. So 96 - 2 gives us the 94 printable characters (codes 33..126) that map to the same symbols on most machines.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
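&lt;p&gt;The resulting 94-symbol alphabet is easy to build and check in Python. In this short sketch the codes 33..126 are taken directly, and the result should match the letters, digits, and punctuation defined in the standard &lt;tt class="docutils literal"&gt;string&lt;/tt&gt; module; the name &lt;tt class="docutils literal"&gt;ALPHABET94&lt;/tt&gt; is just for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;import string

# Printable 7-bit ASCII without the space (32) and DEL (127)
ALPHABET94 = "".join(chr(code) for code in range(33, 127))

print(len(ALPHABET94))   # 94
print(set(ALPHABET94) == set(string.ascii_letters + string.digits + string.punctuation))  # True
print(all(ch.isprintable() and not ch.isspace() for ch in ALPHABET94))                    # True
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;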
&lt;/div&gt;
&lt;div class="section" id="solution"&gt;
&lt;h2&gt;Solution&lt;/h2&gt;
&lt;p&gt;Although &lt;a class="reference external" href="https://github.com/vorakl/base94"&gt;this solution&lt;/a&gt; is quite simple, its simplicity comes with a significant computational cost. The entire input file can be treated as one large number with a base of 256. It can easily be a really big number, requiring thousands of bits. Then all we have to do is convert that big number to a different base. That's it. And Python3 makes it even easier! By hand, conversions between bases are normally done via an intermediate base 10, but Python3 has built-in support for arbitrarily large integers: the int class has a method, &lt;tt class="docutils literal"&gt;int.from_bytes()&lt;/tt&gt;, that reads any number of bytes and turns them into a single big integer, using the byte order (endianness) you ask for. So essentially all of this complexity fits in just two lines of code!&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;inpit_file&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;rb&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;in_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;big&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
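&lt;p&gt;where &lt;em&gt;in_data&lt;/em&gt; is the big integer (Python displays it in base 10, although it is stored in binary internally). That's only two lines, but they build a huge number, and all the following math operates on it, which is where most of the time goes. Now convert it to any other base, just as you would with an ordinary small decimal number: repeatedly divide it by the target base and collect the remainders as digits, from the least significant one up.&lt;/p&gt;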
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
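&lt;p&gt;Putting the pieces together, a minimal Python sketch of the whole-file conversion might look as follows. It is not the code from the linked repository, just an illustration under a few assumptions: the alphabet is the first &lt;em&gt;base&lt;/em&gt; printable ASCII symbols starting from code 33 (so the base must be between 2 and 94), the functions &lt;tt class="docutils literal"&gt;encode&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;decode&lt;/tt&gt; are hypothetical, and the decoder is given the original length, because a big number silently drops leading zero bytes.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;# Hypothetical alphabet: printable ASCII from code 33; a base uses its first symbols
SYMBOLS = "".join(chr(code) for code in range(33, 127))

def encode(data, base=94):
    # Treat the whole input as one radix-256 number, then emit it in the target radix
    alphabet = SYMBOLS[:base]
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, r = divmod(n, base)          # remainders give digits, least significant first
        digits.append(alphabet[r])
    return "".join(reversed(digits)) or alphabet[0]

def decode(text, length, base=94):
    # Rebuild the big number from its digits, then restore exactly length bytes
    alphabet = SYMBOLS[:base]
    n = 0
    for ch in text:
        n = n * base + alphabet.index(ch)
    return n.to_bytes(length, "big")

data = b"\x00\x01 any binary data \xff"
for base in (7, 77, 94):
    text = encode(data, base)
    assert decode(text, len(data), base) == data
    print(f"Base{base}: {len(data)} bytes to {len(text)} symbols")
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;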
&lt;p&gt;Read more about stream encoding algorithms with a variable base (16, 32, 36, 58, 64, 85, 94) in my next article &lt;a class="reference external" href="https://vorakl.com/articles/stream-encoding/"&gt;The zoo of binary-to-text encoding schemes&lt;/a&gt;.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The article discusses converting binary data to text using different encoding schemes like Base64, Base32, Base16 and a non-standard Base94.&lt;/li&gt;
&lt;li&gt;To allow transmission over text-based protocols like HTTP and SMTP, standard encoding schemes such as Base64 and Base32 use bases that are powers of two.&lt;/li&gt;
&lt;li&gt;Base64 increases file size by around 33% (every 3 input bytes become 4 output characters), while Base32 increases it by 60% (every 5 input bytes become 8 output characters).&lt;/li&gt;
&lt;li&gt;For practical purposes, Base94 may be used to cover all 94 printable ASCII characters (codes 33 to 126, excluding the space).&lt;/li&gt;
&lt;li&gt;Base94 is very efficient and increases file size by only around 22%, compared to Base64's 33% increase.&lt;/li&gt;
&lt;li&gt;Python makes the approach easy to implement, thanks to its arbitrary-precision integers and byte conversion methods.&lt;/li&gt;
&lt;li&gt;The key steps of the described approach are converting the binary file to a large Base10 number, then converting that number to the desired output base.&lt;/li&gt;
&lt;/ul&gt;
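&lt;p&gt;The percentages above follow from how many bits each output character carries: a base-&lt;em&gt;b&lt;/em&gt; digit holds log2(b) bits, so every 8-bit input byte needs 8 / log2(b) output characters. A short Python sketch, assuming the ideal case without padding or line breaks, reproduces them:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import math


def overhead(base):
    """Ideal size increase when encoding 8-bit bytes as base-b digits."""
    return 8 / math.log2(base) - 1


for base in (16, 32, 64, 85, 94):
    print(f"Base{base}: +{overhead(base):.1%}")
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Real encoders can be slightly worse on short inputs, since Base64 and Base32, for example, pad their output to whole blocks.&lt;/p&gt;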
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="cs"></category><category term="binary-to-text"></category><category term="encoding"></category></entry><entry><title>My notes for the "Pragmatic Thinking and Learning" book</title><link href="https://vorakl.com/articles/learning/" rel="alternate"></link><published>2020-01-18T22:01:10-08:00</published><updated>2020-01-18T22:01:10-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2020-01-18:/articles/learning/</id><summary type="html">&lt;p class="first last"&gt;Notes in the form of mindmaps&lt;/p&gt;
</summary><content type="html">&lt;div class="section" id="new-skill-acquisition"&gt;
&lt;h2&gt;New Skill Acquisition&lt;/h2&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/learning/new-skill-acquisition.png"&gt;&lt;img alt="New skill acquisition" class="img" src="https://vorakl.com/files/learning/new-skill-acquisition.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="pragmatic-learning-plan"&gt;
&lt;h2&gt;Pragmatic Learning Plan&lt;/h2&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/learning/pragmatic-learning-plan.png"&gt;&lt;img alt="Pragmatic learning plan" class="img" src="https://vorakl.com/files/learning/pragmatic-learning-plan.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="dreyfus-model"&gt;
&lt;h2&gt;Dreyfus model&lt;/h2&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/dreyfus/dreyfus.png"&gt;&lt;img alt="Dreyfus model" class="img" src="https://vorakl.com/files/dreyfus/dreyfus.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="mastering-knowledge"&gt;
&lt;h2&gt;Mastering Knowledge&lt;/h2&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/learning/mastering-knowledge.png"&gt;&lt;img alt="Mastering knowledge" class="img" src="https://vorakl.com/files/learning/mastering-knowledge.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="gaining-experience"&gt;
&lt;h2&gt;Gaining Experience&lt;/h2&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/learning/gaining-experience.png"&gt;&lt;img alt="Gaining experience" class="img" src="https://vorakl.com/files/learning/gaining-experience.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="how-to-start-learning"&gt;
&lt;h2&gt;How to start learning&lt;/h2&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/learning/how-to-start-learning.png"&gt;&lt;img alt="How to start learning" class="img" src="https://vorakl.com/files/learning/how-to-start-learning.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="see-also"&gt;
&lt;h2&gt;See also&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="https://vorakl.com/articles/dreyfus/"&gt;Dreyfus model of skill acquisition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://vorakl.com/articles/smart/"&gt;Managing your plans in the S.M.A.R.T. way&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://vorakl.com/articles/sq3r/"&gt;SQ3R&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="learning"></category><category term="mindmap"></category></entry><entry><title>Computer Science vs Information Technology</title><link href="https://vorakl.com/articles/cs-vs-it/" rel="alternate"></link><published>2019-12-20T15:26:50-08:00</published><updated>2019-12-20T15:26:50-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2019-12-20:/articles/cs-vs-it/</id><summary type="html">&lt;p class="first last"&gt;Differences between two computer-related studies&lt;/p&gt;
</summary><content type="html">&lt;p&gt;If you ever thought about getting a computer-related (graduated) education, you
probably came across a variety of similar disciplines, more or less connected
to each other, but grouped under two major fields of study: Computer Science (CS)
and &lt;a class="reference external" href="https://en.m.wikipedia.org/wiki/Information_technology"&gt;Information Technology&lt;/a&gt; (IT). The latest one sometimes comes in a broader
meaning - &lt;a class="reference external" href="https://en.m.wikipedia.org/wiki/Information_and_communications_technology"&gt;Information Communications Technology&lt;/a&gt; (ICT), and Computer Science,
in turn, is highly linked to Electrical Engineering.
But what exactly makes them all different?&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Briefly, Computer Science creates computer software technologies, Electrical
Engineering creates hardware to run this software in an efficient way, while
Information Technology uses them later to create Information Systems for
storing, processing and transmitting data.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;CS is a study of &lt;strong&gt;using computation&lt;/strong&gt; and computer systems for
solving real-world problems. Dealing mostly with software, the study includes
the theory of computation and computer architecture, design, development, and
application of software systems. The most common problems are organized in
groups in particular areas, such as Distributed Systems, Artificial
Intelligence, Data Science, Programming Languages and Compilers, Algorithms
and Data Structures, etc. Summarizing, CS mainly focuses on
finding answers to the following questions (by &lt;a class="reference external" href="https://www.youtube.com/watch?v=CK4xrHi-IrQ"&gt;John DeNero&lt;/a&gt;, cs61a):&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;which real-world problems can be solved using computation&lt;/li&gt;
&lt;li&gt;how to solve these problems&lt;/li&gt;
&lt;li&gt;how to solve them efficiently&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The fact that CS is all about software makes it tightly coupled to
&lt;em&gt;Electrical Engineering&lt;/em&gt;, which deals with hardware and focuses on designing
computer systems and electronic devices for running software in the most
efficient way.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Unlike CS, IT is a study of &lt;strong&gt;using computers&lt;/strong&gt; to design, build, and operate
&lt;em&gt;Information Systems&lt;/em&gt; which are used for storing and processing information (data).
ICT extends it by applying telecommunications for receiving and transmitting data.
It is crucial to note that IT applies &lt;em&gt;existing technologies&lt;/em&gt; (e.g. hardware,
operating systems, systems software, middleware applications, databases,
networks) for creating Information Systems. Hence, IT professionals are users
of technologies and utilize existing solutions (hardware and software) to create
larger systems for solving a specific business need.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/cs-vs-it/cs.png"&gt;&lt;img alt="Turing completeness" class="img" src="https://vorakl.com/files/cs-vs-it/cs.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/cs-vs-it/it.png"&gt;&lt;img alt="Turing completeness" class="img" src="https://vorakl.com/files/cs-vs-it/it.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;!-- Links --&gt;
</content><category term="cs"></category><category term="it"></category><category term="mindmap"></category></entry><entry><title>Who is an engineer</title><link href="https://vorakl.com/articles/engineering/" rel="alternate"></link><published>2019-12-19T17:29:14-08:00</published><updated>2019-12-19T17:29:14-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2019-12-19:/articles/engineering/</id><summary type="html">&lt;p class="first last"&gt;What's the crucial difference between engineers and scientists&lt;/p&gt;
</summary><content type="html">&lt;p&gt;With the coming of the Industrial Age (approx. 1760-1950), an agricultural
society transitioned to an economy based primarily on mass industrial
production. It was the time of the rise of specialized educational centers,
where people could get deep knowledge in many different fields of science and
become either scientists or engineers.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Briefly, the main difference between scientists and engineers is that
scientists &lt;strong&gt;discover&lt;/strong&gt;, but engineers &lt;strong&gt;invent&lt;/strong&gt;. That is, engineers use
the discoveries of scientists to invent &lt;em&gt;systems&lt;/em&gt;, &lt;em&gt;devices&lt;/em&gt;, and &lt;em&gt;processes&lt;/em&gt;, which they
&lt;em&gt;design&lt;/em&gt;, &lt;em&gt;develop&lt;/em&gt;, &lt;em&gt;implement&lt;/em&gt;, &lt;em&gt;build&lt;/em&gt;, &lt;em&gt;manage&lt;/em&gt;, &lt;em&gt;maintain&lt;/em&gt;, and &lt;em&gt;improve&lt;/em&gt;
as different stages of the Engineering process. Engineering is a practical
application of scientific knowledge, integrated with business and management.
In other words, engineers act as a bridge between science and society by making
inventions for the real world and people.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In the modern Information Age, the role of an engineer has been
extended by non-technical skills, as a result of globalization and the spread
of trade relationships across the globe. These include intellectual skills
(communication, foreign languages, critical thinking), management skills
(time management, self-organization, planning), and standards awareness
(tech certifications, best practices).&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/engineering/engineers.png"&gt;&lt;img alt="Engineers" class="img" src="https://vorakl.com/files/engineering/engineers.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/engineering/engineering.png"&gt;&lt;img alt="Engineering" class="img" src="https://vorakl.com/files/engineering/engineering.png" style="width: 100%;" /&gt;&lt;/a&gt;
</content><category term="mindmap"></category></entry><entry><title>Algorithm is...</title><link href="https://vorakl.com/articles/algorithm/" rel="alternate"></link><published>2019-12-15T16:58:11-08:00</published><updated>2019-12-15T16:58:11-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2019-12-15:/articles/algorithm/</id><summary type="html">&lt;p class="first last"&gt;Common properties of algorithms&lt;/p&gt;
</summary><content type="html">&lt;p&gt;Despite the obvious expectation to find some sort of a definition of the term
&lt;em&gt;&amp;quot;Algorithm&amp;quot;&lt;/em&gt; here, I have to disappoint you, as there isn't any general or
well-accepted definition. But this is not a unique situation! Take mathematics,
for example, which doesn't have one either. And although plenty of different
&amp;quot;definitions&amp;quot; of an algorithm can be found in the literature, they are all
just oversimplified attempts to explain what an algorithm really means.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In general, an algorithm is a way of describing logic. That's why it's
so hard to cover all of its possible forms with common rules or
definitions. The most prominent mathematicians began seriously thinking about
computability, and what can be computed, at the beginning of the 20th century.
But it was so hard to generalize all the cases that eventually they had to limit
their consideration to functions defined only on the set of &lt;a class="reference external" href="https://vorakl.com/articles/numbers/"&gt;Natural numbers&lt;/a&gt;.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The most famous works were done by &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis"&gt;Alan Turing&lt;/a&gt; (related to algorithms) and
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis"&gt;Alonzo Church&lt;/a&gt; (related to computable functions). Alan Turing came up with the
thesis which basically says that if a function is computable, then it has
an algorithm, and that algorithm can be implemented on a Turing machine (TM).
In other words, &lt;a class="reference external" href="https://vorakl.com/articles/turing/"&gt;Turing's thesis&lt;/a&gt; makes it clear what can be computed and what
is needed to compute it.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a class="reference external" href="https://www.youtube.com/watch?v=dNRDvLACg5Q"&gt;Turing machine&lt;/a&gt; is an abstract system that has a finite set of states and
symbols, a few certain operations, and an endless tape (consisted of cells).
The behavior of a TM is controlled by a program that defines a state transition
and a next tape movement depending on a symbol that was read. Although, there
is no a real-world analog of the TM as it is unlikely possible to have infinite
memory. So, to get it more realistic, for a real analog of TM, it means two things:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;to have enough memory, at least as much as needed (analog of the tape)&lt;/li&gt;
&lt;li&gt;to have conditional branching, some sort of if/else and goto statements
(analog of state transitions)&lt;/li&gt;
&lt;/ol&gt;
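&lt;p&gt;To make the idea more tangible, here is a minimal Turing machine simulator sketched in Python. The transition table format, the state names, and the binary-increment program are hypothetical examples chosen for illustration. The tape is a dictionary that grows on demand, which mirrors the &amp;quot;enough memory&amp;quot; requirement rather than a truly infinite tape:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
BLANK = "_"


def run_tm(program, tape_str, state, head, accept="halt", max_steps=10_000):
    """Run a TM whose program maps (state, symbol) to (write, move, next_state).

    move is -1 (left), +1 (right) or 0 (stay). The step limit only guards
    against programs that never halt.
    """
    tape = dict(enumerate(tape_str))   # sparse tape: grows only when needed
    steps = 0
    while state != accept:
        if steps == max_steps:
            raise RuntimeError("step limit reached without halting")
        steps += 1
        symbol = tape.get(head, BLANK)
        # The "conditional branching": the next action depends only on
        # the current state and the symbol under the head.
        write, move, state = program[(state, symbol)]
        tape[head] = write
        head += move
    cells = [tape.get(i, BLANK) for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(BLANK)


# Hypothetical program: add 1 to a binary number, starting at its last digit.
INCREMENT = {
    ("carry", "1"): ("0", -1, "carry"),   # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("1", 0, "halt"),     # 0 + carry = 1, done
    ("carry", BLANK): ("1", 0, "halt"),   # ran past the left edge: new digit
}

print(run_tm(INCREMENT, "1011", "carry", head=3))   # 1100
print(run_tm(INCREMENT, "111", "carry", head=2))    # 1000
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In this sketch a missing (state, symbol) pair simply raises a &lt;em&gt;KeyError&lt;/em&gt;; a formal TM definition would usually treat it as halting or rejecting.&lt;/p&gt;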
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;All algorithms share the same properties:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;deterministic (produces the same result for the same input)&lt;/li&gt;
&lt;li&gt;discrete (works with discrete data, like texts, integers, rational numbers)&lt;/li&gt;
&lt;li&gt;finite (represented by a finite text)&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/algorithm/algorithm-properties.png"&gt;&lt;img alt="Turing completeness" class="img" src="https://vorakl.com/files/algorithm/algorithm-properties.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;!-- Links --&gt;
</content><category term="cs"></category><category term="mindmap"></category></entry><entry><title>Turing: thesis, machine, completeness</title><link href="https://vorakl.com/articles/turing/" rel="alternate"></link><published>2019-12-15T15:01:47-08:00</published><updated>2019-12-15T15:01:47-08:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2019-12-15:/articles/turing/</id><summary type="html">&lt;p class="first last"&gt;A formal system in the computability theory&lt;/p&gt;
</summary><content type="html">&lt;p&gt;Alan Turing is one of the pioneers of the computability theory and logic
formalization. He came up with the hypothesis of which algorithms can be
implemented and computed by machines (&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis"&gt;Turing's thesis&lt;/a&gt;), created an abstract
model of such machine (&lt;a class="reference external" href="https://stackoverflow.com/a/127831/5673383"&gt;Turing machine&lt;/a&gt;), and described absolutely vital abilities
of any system for being able to realize any logic that can be computed
(&lt;a class="reference external" href="https://www.youtube.com/watch?v=RPQD7-AOjMI"&gt;Turing completeness&lt;/a&gt;).&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Turing's thesis is only one of the existing formal systems in computability
theory. There are also λ-calculus and Markov algorithms, but they can all be
simulated on a Turing machine, which is used today as a general computational model
to classify which real-world systems (mostly programming languages) are able
to compute mathematical functions or implement algorithms.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;All existing computability theories are defined on discrete values, and
the domain is the set of &lt;a class="reference external" href="https://vorakl.com/articles/numbers/"&gt;Natural numbers&lt;/a&gt;.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;I prepared several mindmaps to summarize basic ideas and statements:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Turing's thesis&lt;/strong&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/turing/turing-thesis.png"&gt;&lt;img alt="Turing's thesis" class="img" src="https://vorakl.com/files/turing/turing-thesis.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Turing machine&lt;/strong&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/turing/turing-machine.png"&gt;&lt;img alt="Turing machine" class="img" src="https://vorakl.com/files/turing/turing-machine.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Turing completeness&lt;/strong&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/turing/turing-completeness.png"&gt;&lt;img alt="Turing completeness" class="img" src="https://vorakl.com/files/turing/turing-completeness.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;!-- Links --&gt;
</content><category term="cs"></category><category term="mindmap"></category></entry><entry><title>Organizing Unstructured Data</title><link href="https://vorakl.com/articles/data-structure/" rel="alternate"></link><published>2019-08-21T17:08:40-07:00</published><updated>2019-08-21T17:08:40-07:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2019-08-21:/articles/data-structure/</id><summary type="html">&lt;p class="first last"&gt;Managing data complexity using types, structures, ADTs, and objects&lt;/p&gt;
</summary><content type="html">&lt;div class="section" id="topics"&gt;
&lt;h2&gt;Topics&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference internal" href="#type"&gt;Type&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#data-structure"&gt;Data Structure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#abstract-data-type-adt"&gt;Abstract Data Type (ADT)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#object"&gt;Object&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The main, if not the only, purpose of a computer is to compute information.
It doesn't always have to be a computation of mathematical formulas. In general,
it is a transformation of one piece of information into another. Computers only
work with information that can be represented as discrete data. The input and
output of a computer engine are always &lt;a class="reference external" href="https://vorakl.com/articles/numbers/"&gt;natural numbers&lt;/a&gt; or text (a sequence
of symbols from a dictionary that correspond to certain natural numbers).&lt;/p&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/data-structure/compute.png"&gt;&lt;img alt="computation diagram" class="img" src="https://vorakl.com/files/data-structure/compute.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;p&gt;As long as data is unstructured, it's hard to make some sense of it. But once
data is given a structured form, it becomes meaningful and suitable for further
transformation.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="type"&gt;
&lt;h2&gt;Type&lt;/h2&gt;
&lt;p&gt;The simplest form of data organization is &lt;strong&gt;Type&lt;/strong&gt;. In general, a &lt;em&gt;data type&lt;/em&gt;
defines a set of values with certain properties. It usually defines a size
in bytes. A &lt;strong&gt;primitive data type&lt;/strong&gt; is &lt;em&gt;an ordered set of bytes&lt;/em&gt;. When a variable
of a primitive data type has only one value (holds only one piece of information),
it's called a &lt;strong&gt;scalar&lt;/strong&gt;, and its type a &lt;strong&gt;scalar data type&lt;/strong&gt;. Well-known examples
are &lt;em&gt;integer, float, pointer, and char&lt;/em&gt;. A &lt;em&gt;collection of primitive (scalar)
data types&lt;/em&gt; is called an &lt;strong&gt;aggregate data type&lt;/strong&gt;, and it allows multiple values
to be stored. This can be a homogeneous collection, where all elements are of
the same type, such as an array, a string, or a file. Or it can be heterogeneous,
where elements are of different types, such as a structure or a class. The main
property is an ordered set of bytes. The internal organization is simple,
straightforward, and all actions (e.g. reading or modifying) are performed
directly on the data, according to a hardware architecture that defines
the byte order in memory (little-/big-endian).&lt;/p&gt;
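&lt;p&gt;A small Python sketch can make these ideas concrete. It uses the standard &lt;em&gt;int.to_bytes()&lt;/em&gt; method and the &lt;em&gt;struct&lt;/em&gt; module to show a scalar integer as an ordered set of bytes in both byte orders, and a heterogeneous aggregate (similar to a C structure) packed without padding. The values and the field layout are arbitrary examples:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import struct
import sys

value = 1
# A 32-bit integer is an ordered set of 4 bytes; only their order differs.
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")
print(little.hex(), big.hex())   # 01000000 00000001
print(sys.byteorder)             # byte order of the machine running this

# A heterogeneous aggregate: a 4-byte int, a 2-byte short and a 1-byte char.
# "!" selects big-endian ("network") order with standard sizes, no padding.
record = struct.pack("!ihc", 1, 2, b"A")
print(len(record))                     # 7
print(struct.unpack("!ihc", record))   # (1, 2, b'A')
&lt;/pre&gt;&lt;/div&gt;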
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="data-structure"&gt;
&lt;h2&gt;Data Structure&lt;/h2&gt;
&lt;p&gt;The next level of data abstraction is called &lt;strong&gt;Data Structure&lt;/strong&gt;. It brings more
complexity, but also more flexibility to make the right choice between access
speed, ability to grow, modification speed, etc. Internally, it's represented
by a collection of the scalar or aggregate data types. The main focus is &lt;em&gt;on
the details of the internal organization and a set of rules to control that
organization&lt;/em&gt;. There are two types of data structures that result from
a difference in the memory allocation of the underlying elements:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Array Data Structures&lt;/strong&gt; (static), based on physically contiguous elements
in memory, with no gaps between them;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Linked Data Structures&lt;/strong&gt; (dynamic), based on elements dynamically allocated
in memory and linked together using pointers (usually one or two per element)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Well-known examples are &lt;em&gt;linked list, hash (dictionary), set, list&lt;/em&gt;. These data
structures are defined only by their &lt;strong&gt;physical&lt;/strong&gt; organization in memory and
a set of rules for data modifications that are performed directly. All internal
implementation details are open. The actions performed on the data structures
(add, remove, update, etc.) and the ways in which they are used can vary.&lt;/p&gt;
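&lt;p&gt;As an illustration of a linked data structure, here is a minimal singly linked list sketched in Python. Python has no raw pointers, so object references play their role here; &lt;em&gt;Node&lt;/em&gt;, &lt;em&gt;prepend&lt;/em&gt; and &lt;em&gt;to_list&lt;/em&gt; are hypothetical names used only for this example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
class Node:
    """One element of a singly linked list: a value plus one "pointer"."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def prepend(head, value):
    # Constant time: allocate a new node and point it at the old head.
    return Node(value, head)


def to_list(head):
    # Follow the chain of references until the end (None).
    items = []
    while head is not None:
        items.append(head.value)
        head = head.next
    return items


head = None
for v in (3, 2, 1):
    head = prepend(head, v)
print(to_list(head))   # [1, 2, 3]
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Each node is allocated separately and found only through the previous one, which is exactly what distinguishes it from an array, whose elements sit next to each other in memory.&lt;/p&gt;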
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="abstract-data-type-adt"&gt;
&lt;h2&gt;Abstract Data Type (ADT)&lt;/h2&gt;
&lt;p&gt;A higher level of data abstraction is represented by an &lt;strong&gt;Abstract Data Type&lt;/strong&gt;
(ADT), which shifts the focus from &amp;quot;how to store data&amp;quot; to &amp;quot;how to work with
data&amp;quot;. An ADT represents a &lt;strong&gt;logical&lt;/strong&gt; organization, defined mainly by a
list of predefined operations (functions) for manipulating data and controlling
its consistency. Internally, data can be stored in any &lt;em&gt;data structure&lt;/em&gt; or
combination thereof. However, these internals are hidden and should not be
directly accessible. All interactions with data are done through an interface
(operations exposed to users). Most ADTs share a common set of &lt;em&gt;primitive
operations&lt;/em&gt;, such as&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;create&lt;/strong&gt; - a constructor of a new instance&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;destroy&lt;/strong&gt; - a destructor of an existing instance&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;add&lt;/strong&gt;, &lt;strong&gt;get&lt;/strong&gt; - the set-get functions for adding elements to an instance and retrieving (removing) them&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;is_empty&lt;/strong&gt;, &lt;strong&gt;size&lt;/strong&gt; - useful functions for managing existing data in an instance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The most common examples of ADTs are &lt;em&gt;stack&lt;/em&gt; and &lt;em&gt;queue&lt;/em&gt;. Both of these ADTs
can be implemented using either array or linked data structures, and both have
specific rules for adding and removing elements. All of these specifics are
abstracted as functions, which in turn, perform appropriate actions on internal
data. Dividing an ADT into operations and data structures creates an abstraction
barrier that allows you to maintain a solid interface with the flexibility
to change internals without side effects on the code using that ADT.&lt;/p&gt;
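&lt;p&gt;The abstraction barrier can be sketched in Python with two stacks that expose the same operations named above (&lt;em&gt;add&lt;/em&gt;, &lt;em&gt;get&lt;/em&gt;, &lt;em&gt;is_empty&lt;/em&gt;, &lt;em&gt;size&lt;/em&gt;) but keep different internals: one is backed by a dynamic array (a Python list), the other by linked nodes. Creating an instance plays the role of &lt;em&gt;create&lt;/em&gt;, and Python's garbage collector takes care of &lt;em&gt;destroy&lt;/em&gt;. The class names are hypothetical, and the leading underscore only marks the internals as private by convention; Python does not enforce it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
class ArrayStack:
    """Stack ADT backed by a dynamic array (a Python list)."""

    def __init__(self):
        self._items = []

    def add(self, value):
        self._items.append(value)

    def get(self):
        # Last in, first out.
        if self.is_empty():
            raise IndexError("get from an empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items

    def size(self):
        return len(self._items)


class LinkedStack:
    """Same interface, backed by linked (value, next) pairs."""

    def __init__(self):
        self._head = None
        self._size = 0

    def add(self, value):
        self._head = (value, self._head)
        self._size += 1

    def get(self):
        if self.is_empty():
            raise IndexError("get from an empty stack")
        value, self._head = self._head
        self._size -= 1
        return value

    def is_empty(self):
        return self._head is None

    def size(self):
        return self._size


def drain(stack):
    """Client code: relies only on the interface, not on the internals."""
    out = []
    while not stack.is_empty():
        out.append(stack.get())
    return out


for stack in (ArrayStack(), LinkedStack()):
    for v in (1, 2, 3):
        stack.add(v)
    print(stack.size(), drain(stack))   # 3 [3, 2, 1]
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because &lt;em&gt;drain()&lt;/em&gt; never touches the internals, either implementation can be swapped for the other without changing the client code.&lt;/p&gt;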
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="object"&gt;
&lt;h2&gt;Object&lt;/h2&gt;
&lt;p&gt;A more comprehensive way of abstracting data is represented by &lt;strong&gt;Objects&lt;/strong&gt;.
An object can be thought of as a container for a piece of data that has certain
properties. Similar to the ADT, this data is not directly accessible (known as
&lt;em&gt;encapsulation&lt;/em&gt; or &lt;em&gt;isolation&lt;/em&gt;), but instead each object has a set of tightly
bound methods that can be applied to operate on its data to produce an expected
behavior for that object (known as &lt;em&gt;polymorphism&lt;/em&gt;). All such methods are really
just functions collected under a &lt;em&gt;class&lt;/em&gt;. However, they become methods when
called to operate on a particular object. Methods can also be inherited from
another class, which is called a &lt;em&gt;superclass&lt;/em&gt;. Unlike an ADT, an object doesn't
represent a particular type of data, but rather a set of &lt;em&gt;attributes&lt;/em&gt;, and it
behaves as it should when its methods are invoked. Attributes are nothing more
than variables of any type (including ADTs). Formally speaking, classes act
as specifications of all of the object's attributes and the methods that can
be invoked to deal with those attributes.&lt;/p&gt;
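&lt;p&gt;These terms map directly onto Python classes. In the hypothetical example below, &lt;em&gt;Shape&lt;/em&gt; acts as a superclass, &lt;em&gt;name&lt;/em&gt;, &lt;em&gt;radius&lt;/em&gt; and &lt;em&gt;side&lt;/em&gt; are attributes, &lt;em&gt;describe()&lt;/em&gt; is inherited, and each subclass provides its own &lt;em&gt;area()&lt;/em&gt;, so the same call behaves appropriately for each object:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import math


class Shape:
    """Superclass: specifies attributes and methods shared by all shapes."""

    def __init__(self, name):
        self.name = name   # attribute (part of the object's state)

    def area(self):
        raise NotImplementedError

    def describe(self):
        # Inherited by subclasses; calls whichever area() the object has.
        return f"{self.name}: {self.area():.2f}"


class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        super().__init__("square")
        self.side = side

    def area(self):
        return self.side ** 2


for shape in (Circle(1), Square(2)):
    print(shape.describe())
# circle: 3.14
# square: 4.00
&lt;/pre&gt;&lt;/div&gt;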
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;strong&gt;Object-Oriented Programming&lt;/strong&gt; (OOP) paradigm uses objects as the central
elements of a program design. At program runtime, each object exists as
an instance of a class. The class, in turn, plays a dual role: it defines
the behavior (through a set of methods) of all objects instantiated from it,
and it declares a prototype of data that will carry some state within the object
once it's instantiated. Since the state is isolated (encapsulated) in
the objects, access to that state is organized by communication between
the objects via message passing. It's usually implemented by calling a method
of an object, which is equivalent to &amp;quot;passing&amp;quot; a message to that object.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This behavior is completely different from the &lt;a class="reference external" href="https://vorakl.com/articles/goto/"&gt;Structured Programming Paradigm&lt;/a&gt;,
which instead of maintaining a collection of interacting objects
with an embedded state, relies on dividing a project's code into
a sequence of mostly independent tasks (functions) that operate with
an externally (to them) stored &lt;em&gt;state&lt;/em&gt;.&lt;/p&gt;
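&lt;p&gt;The difference can be sketched in Python with a hypothetical bank account. In the structured version the state is a plain dictionary passed to independent functions; in the object-oriented version the state lives inside the object and changes only when the object receives a &amp;quot;message&amp;quot;, that is, when one of its methods is called:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
# Structured style: state is stored outside the functions that use it.
def make_account():
    return {"balance": 0}


def deposit(account, amount):
    account["balance"] += amount


# Object style: state is encapsulated and changed via method calls (messages).
class Account:
    def __init__(self):
        self._balance = 0

    def deposit(self, amount):
        self._balance += amount

    def balance(self):
        return self._balance


state = make_account()
deposit(state, 10)

account = Account()
account.deposit(10)

print(state["balance"], account.balance())   # 10 10
&lt;/pre&gt;&lt;/div&gt;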
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/data-structure/data-organization.png"&gt;&lt;img alt="Data Organization" class="img" src="https://vorakl.com/files/data-structure/data-organization.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;!-- Links --&gt;
&lt;/div&gt;
</content><category term="cs"></category><category term="programming"></category></entry><entry><title>Number Classification</title><link href="https://vorakl.com/articles/numbers/" rel="alternate"></link><published>2019-08-16T12:42:06-07:00</published><updated>2019-08-16T12:42:06-07:00</updated><author><name>vorakl</name></author><id>tag:vorakl.com,2019-08-16:/articles/numbers/</id><summary type="html">&lt;p class="first last"&gt;All number categories, from Complex to Counting&lt;/p&gt;
</summary><content type="html">&lt;p&gt;Mathematics is unique. The unique science if everyone could agree that it is Science. But, it's also hard to argue that it is not Art. Math is absolutely certain, except the cases when it is not (&amp;quot;&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Mathematics#cite_note-certain-39"&gt;as far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality&lt;/a&gt;&amp;quot;). Still having no one general definition, math doesn't even bother to have one opinion on such the fundamental building block as &lt;a class="reference external" href="https://www.mathsisfun.com/numbers/evolution-of-numbers.html"&gt;Numbers&lt;/a&gt;. Nevertheless, math is an important part of almost every field of science, engineering, and human life.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Here is the most common and well-accepted number classification tree:&lt;/p&gt;
&lt;a class="reference external image-reference" href="https://vorakl.com/files/numbers/numbers.png"&gt;&lt;img alt="Number classification" class="img" src="https://vorakl.com/files/numbers/numbers.png" style="width: 100%;" /&gt;&lt;/a&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;It also shouldn't be a surprise to find slight differences in the meaning of the same terms in &lt;em&gt;Math&lt;/em&gt; and &lt;em&gt;Computer Science&lt;/em&gt; (CS):&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Natural numbers&lt;/strong&gt;. In Math, they traditionally mean &lt;em&gt;Positive Integers&lt;/em&gt; (1, 2, 3, ...), but in CS they are &lt;em&gt;non-negative Integers&lt;/em&gt;, which include Zero (0, 1, 2, 3, ...)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mantissa&lt;/strong&gt;. In Math, it is the &lt;em&gt;fractional part&lt;/em&gt; of a logarithm. In CS, it is the &lt;em&gt;significant digits&lt;/em&gt; of a floating-point number (which is why other terms, such as &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Significand"&gt;significand&lt;/a&gt; and coefficient, are often used in this case)&lt;/li&gt;
&lt;/ul&gt;
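&lt;p&gt;The two meanings of mantissa can be compared in a short Python sketch that uses only the standard math module. The numbers 250 and 6.0 are arbitrary examples. Note that math.frexp normalizes the significand into the interval [0.5, 1), whereas the hexadecimal form produced by float.hex follows the IEEE 754 convention of a leading 1.&lt;/p&gt;

```python
import math

# Math: the mantissa of a common logarithm is its fractional part.
# log10(250) = 2 + log10(2.5), so the characteristic is 2
# and the mantissa is log10(2.5), roughly 0.398.
mantissa, characteristic = math.modf(math.log10(250))

# CS: the significand of a binary floating-point number.
# math.frexp() splits x into m * 2**e with |m| in [0.5, 1).
m, e = math.frexp(6.0)  # 6.0 == 0.75 * 2**3

# float.hex() shows the IEEE 754 form, 1.5 * 2**2 (hex 1.8 == 1.5).
h = (6.0).hex()

print(characteristic, round(mantissa, 3))  # 2.0 0.398
print(m, e)                                # 0.75 3
print(h)                                   # 0x1.8000000000000p+2
```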
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A closely related topic concerns the values that a variable can take on. In mathematics, there are two types of variables, &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Continuous_or_discrete_variable"&gt;continuous and discrete&lt;/a&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;A variable is &lt;strong&gt;continuous&lt;/strong&gt; when it can take on uncountably many values: between any two of its values, no matter how close they are, there is always another one.&lt;/li&gt;
&lt;li&gt;A variable is &lt;strong&gt;discrete&lt;/strong&gt; when, within any bounded range, there is a positive minimum distance between any two of its distinct values. Its set of values is finite or countably infinite (e.g. the Natural numbers).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Understanding discreteness is crucial in Computer Science, as all real-world computers internally work only with discrete data (which makes it challenging to represent Irrational numbers). Classical computability theory (see the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis"&gt;Church–Turing thesis&lt;/a&gt;) is defined on discrete values, with the set of Natural numbers as its domain.&lt;/p&gt;
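&lt;p&gt;The discreteness of floating-point numbers can be observed directly. This is a minimal Python sketch, assuming Python 3.9 or newer (for math.nextafter) and the usual IEEE 754 double-precision floats. It shows that there is a smallest gap between 1.0 and the next representable value, that nothing fits inside that gap, and that an irrational number such as the square root of 2 is stored as a nearby rational number with a power-of-two denominator.&lt;/p&gt;

```python
import math
import sys
from fractions import Fraction

# The next double after 1.0 is 1.0 + 2**-52 (machine epsilon).
gap = math.nextafter(1.0, 2.0) - 1.0  # math.nextafter needs Python 3.9+
print(gap == sys.float_info.epsilon == 2 ** -52)  # True

# A value halfway inside that gap cannot be represented and rounds back to 1.0.
print(1.0 + 2 ** -53 == 1.0)  # True

# sqrt(2) is irrational, so only an approximation is stored ...
root2 = math.sqrt(2)
print(root2 * root2)  # 2.0000000000000004

# ... and that approximation is itself a rational number.
approx = Fraction(root2)
print(approx.denominator == 2 ** 52)  # True
```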
</content><category term="math"></category><category term="cs"></category></entry></feed>