Thursday, December 14, 2023

Java: streaming WARC files

In a previous post, I published a minimal example of Java code that reads WARC data using the "jwat-warc" library. Afterwards, I wanted to compare the performance of the two major WARC parsing libraries ("jwat-warc" and "jwarc") in the context of a more complex process.

The difficulty in doing this is that the two libraries take different approaches to how they expect you to consume the data.

e.g.

  • "jwat-warc" exposes a stream of the underlying HTML, while "jwarc" exposes a stream containing both the underlying HTTP headers and the HTML.
  • "jwat-warc" exposes the HTTP response code directly, while with "jwarc" it has to be extracted from the headers.
  • etc. 

Here is the abstract class I came up with that represents a fully loaded WARC record, independent of the underlying implementation, together with two static factory methods for constructing an XWarcRecord instance from the library-specific WarcRecord instances:

public abstract class XWarcRecord {
    protected String _uri;
    protected String _payload;

    abstract public String responseCode();
    abstract public String payload();
    abstract public String uri();
 
    public static XWarcRecord from(org.jwat.warc.WarcRecord r) {
        return new XWarcRecord_JWat(r);
    }
    
    public static XWarcRecord from(org.netpreserve.jwarc.WarcRecord r) {
        return new XWarcRecord_JWarc(r);
    }
}

And the concrete implementations, which defer as much processing as possible until a particular data point is actually needed:

I. "jwat-warc"

For some reason, if you don't read the payload stream manually, the reader will throw an error when advancing to the next record, so we have to do that in the constructor.

public class XWarcRecord_JWat extends XWarcRecord {
    private final WarcRecord r;

    XWarcRecord_JWat(WarcRecord r) {
        this.r = r;

        try {
            InputStream contentStream = r.getPayloadContent();
            this._payload = new String(contentStream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException | NullPointerException e) {
            this._payload = null;
        }
    }

    @Override
    public String responseCode() {
        if (r.getHttpHeader() == null) {
            return "0"; 
        }
        return r.getHttpHeader().statusCodeStr;
    }

    @Override
    public String payload() {
        return _payload;
    }

    @Override
    public String uri() {
        if (_uri == null) {
            _uri = r.getHeader("WARC-Target-URI").value;
        }
        return _uri;
    }
}

II. "jwarc" 

The stream they expose includes both the HTTP headers and the HTML (or other content), so we have to extract them manually. 

public class XWarcRecord_JWarc extends XWarcRecord {
    protected String _headers = null;
    protected String _responseCode = null; 
    private final WarcRecord r;

    XWarcRecord_JWarc(WarcRecord r) {
        this.r = r;
    }

    @Override
    public String responseCode() {
        if (_responseCode == null) {
            _parseContent();
            _responseCode = _responseCode(this._headers);
        }
        return _responseCode;
    }

    @Override
    public String payload() {
        if (this._payload == null) {
            this._parseContent();
        }
        return this._payload;
    }

    @Override
    public String uri() {
        if (_uri == null) {
            _uri = r.headers().first("WARC-Target-URI").orElse(null);
        }
        return _uri;
    }

    private void _parseContent() {
        List<String> h = new ArrayList<>();
        List<String> p = new ArrayList<>();

        try (MessageBody body = r.body();
             BufferedReader reader = new BufferedReader(new InputStreamReader(body.stream(), StandardCharsets.UTF_8))) {

            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    break;
                }
                h.add(line);
            }

            while ((line = reader.readLine()) != null) {
                p.add(line);
            }

            this._headers = String.join("\n", h);
            this._payload = String.join("\n", p);
        } catch (IOException | NullPointerException e) {
            throw new RuntimeException(e);
        }
    }

    private static final Pattern P = Pattern.compile("HTTP/\\d\\.\\d\\s+(\\d{3})");

    private static String _responseCode(String input) {
        Matcher matcher = P.matcher(input);
        if (matcher.find()) {
            return matcher.group(1);
        } else {
            return "0";
        }
    }
}
 
I observed no performance difference between the two libraries, but coming up with a solution that abstracts away the underlying implementation was an interesting exercise.

Monday, December 11, 2023

Throttle vs Debounce

The following page offers a very nice JavaScript illustration of the difference between throttle and debounce: https://web.archive.org/web/20220128120157/http://demo.nimius.net/debounce_throttle/

The terms make sense when you want to control the moment an "effect" is triggered, based on the timing of its "cause" (I am using both terms very generally: "cause and effect").

You "throttle" when you want successive effects to be spaced apart by a minimum interval of X time.
You "debounce" when you want the effect to be triggered only after the cause has "cooled off" for enough (X) time.
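As a rough sketch of the two behaviors (a toy bash model I put together for illustration, not production code), treat each line arriving on stdin as a "cause" and an echo as the "effect":

```shell
#!/usr/bin/env bash
# Toy model of throttle vs debounce over a stream of "causes" (stdin lines).

INTERVAL=2  # seconds

# throttle: fire the effect at most once every INTERVAL seconds.
throttle() {
  local last=0 now
  while read -r _; do
    now=$(date +%s)
    if (( now - last >= INTERVAL )); then
      echo "effect at ${now}"
      last=$now
    fi
  done
}

# debounce: fire the effect only once causes stop arriving for INTERVAL seconds.
debounce() {
  while read -r _; do
    # keep draining causes; the timeout (the "cool-off" timer) resets on each one
    while read -r -t "${INTERVAL}" _; do :; done
    echo "effect at $(date +%s)"
  done
}
```

Piping several causes in quick succession (e.g. printf 'a\nb\nc\n' | throttle) produces a single effect either way: throttle fires on the first cause and suppresses the rest of the burst, while debounce waits until the causes stop.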

 

Friday, December 8, 2023

Install Node.js / npm on Linux

The easiest way to manage Node.js / npm on Linux is by using the Node Version Manager:
https://github.com/nvm-sh/nvm

I. Install NVM

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

II. Use NVM to install the latest LTS release of Node.js / npm

nvm install --lts

III. List versions

nvm ls

This will output a mix of both installed and available versions (including all LTS releases):

            N/A
default -> lts/* (-> N/A)
iojs -> N/A (default)
node -> stable (-> N/A) (default)
unstable -> N/A (default)
lts/* -> lts/iron (-> N/A)
lts/argon -> v4.9.1 (-> N/A)
lts/boron -> v6.17.1 (-> N/A)
lts/carbon -> v8.17.0 (-> N/A)
lts/dubnium -> v10.24.1 (-> N/A)
lts/erbium -> v12.22.12 (-> N/A)
lts/fermium -> v14.21.3 (-> N/A)
lts/gallium -> v16.20.2 (-> N/A)
lts/hydrogen -> v18.19.0 (-> N/A)
lts/iron -> v20.10.0 (-> N/A)

When installing and uninstalling specific versions, you can use either the numeric version or the release alias (e.g. v16.20.2 and lts/gallium are interchangeable).

IV. Uninstall a specific version

nvm uninstall lts/iron

V. Install a specific version

nvm install lts/hydrogen

VI. Use a specific version

nvm use lts/gallium

If you're running an older version of Linux, you may only have access to older Node.js versions, because of a dependency on the GNU C Library (glibc).

Trying to run anything newer than lts/gallium on Amazon Linux 2 will throw the following:

node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by node)
node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by node)

You can update npm independently of Node.js, but this too has a limit. Gallium ships with npm 8. At the time of this writing, npm prompts that version 10 is available, but on Gallium you can only upgrade up to version 9.

npm install -g npm@9

(It will complain about incompatible versions if you try to install version 10).

Saturday, November 25, 2023

IMDSv2

v2 of the AWS "Instance Metadata Service" has been around for a while.

  • It is optional for now, but they will make it mandatory some time during 2024 [1].
  • The IMDS is the mechanism through which EC2 instances obtain their metadata and IAM role credentials.
  • If you're using a recent version of cloud-init, AWS CLI and SDKs, these should support v2.
  • By default, most instances that have been around for a while still use v1.

To list instances that are still using v1 [2]:

aws ec2 describe-instances \
  --filters "Name=metadata-options.http-tokens,Values=optional" \
  --query "Reservations[*].Instances[*].[InstanceId]" \
  --output text

To enable v2 on a per-instance basis:

aws ec2 modify-instance-metadata-options \
  --instance-id "${EC2_ID}" \
  --http-endpoint enabled \
  --http-tokens required

To change an AMI so that instances launched from it have v2 enabled:

aws ec2 modify-image-attribute \
  --image-id "${AMI_ID}" \
  --imds-support v2.0

You will also have to change how you retrieve instance metadata [3].

v1:

curl -s http://169.254.169.254/latest/meta-data/instance-id

v2:

TOKEN="$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")"

curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id

Thursday, November 9, 2023

Debugging Linux input events

I recently had some issues with a laptop running Linux receiving spurious keyboard events.
The following two commands are useful in debugging this:

sudo tail -f /dev/input/event*

And:

xinput test-xi2 --root 5

You can replace the "5" with the id of any other device; running "xinput list" will show all devices and their ids.


Saturday, November 4, 2023

Is your AWS instance Xen or Nitro?

This command will let you know whether the AWS instance you are logged into is Xen or Nitro-based:

curl http://169.254.169.254/latest/meta-data/system/

AWS has a page about Nitro, but not one about Xen, as Xen is considered a legacy hypervisor:
https://aws.amazon.com/ec2/nitro/

Check for Percona 8.0 upgrades

One thing I wasn't aware of is that in order to fully check for Percona upgrades, I had to manually run the command:

sudo percona-release setup ps80

https://docs.percona.com/percona-server/8.0/installation.html

Shrink AWS EBS root drive with UEFI boot mode

Recently I wanted to create an Ubuntu 22.04 LTS image for the aarch64 (ARM) architecture with 3GB of space (instead of the default 8GB). I had done a similar procedure years ago, on an x64 Ubuntu installation, but this time the operation presented complications. Specifically, the default boot mode for Graviton instance types is UEFI (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ami-boot.html), so in addition to the Linux partition there is also an EFI partition that has to be handled.

The following should help you in case you want to achieve something similar.

Prerequisites:

  1. A working "source" instance, which contains the "source" volume (the one you want to shrink).
  2. A new empty volume, which we will call the "target" volume.
  3. A throwaway "staging" instance, to which we will attach both the "source" and the "target" volumes.
    (Even though the volume I was shrinking contained an ARM installation, I used an x64 image for the staging instance.)
The process doesn't actually shrink the volume. What happens instead is that the data will be mirrored to the smaller drive.
The reason a throwaway instance is preferred is that we don't want to risk accidentally affecting production machines and data.

Initial steps:
  • Before dismounting the source volume, clean the instance of any unnecessary files. You want the target volume to be able to fit all the data in the source volume (I removed Snap entirely and obsolete kernels to get below 3GB).
  • Detach the source volume from the source instance.
  • Attach the source volume to the staging instance (as /dev/sdf).
  • Attach the target volume to the staging instance (as /dev/sdg).
  • Boot the staging instance, SSH into it, and get a root bash prompt with sudo su.
Once inside, lsblk should list the attached volumes.
On my particular staging instance, they showed up as /dev/xvdf and /dev/xvdg respectively.
Other guides have them as /dev/nvme1n1 and /dev/nvme2n1. This has to do with whether the staging instance is a Xen (xvd*) or Nitro (nvme*) one. Adjust as needed, but I personally prefer the Xen nomenclature for this operation. You're not going to mistake "f" and "g".

Step 1
Inspect the partitioning of the source volume:
(The output below contains the values we're interested in.)

fdisk -l /dev/xvdf
Disk /dev/xvdf: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 51ED0604-9F4E-4D14-B8E7-4FC378A5E100

Device       Start      End  Sectors  Size Type
/dev/xvdf1  206848 16777182 16570335  7.9G Linux filesystem
/dev/xvdf15   2048   204800   202753   99M EFI System

Partition table entries are not in disk order.

Notice that there are two partitions:
  • Partition 15 is at the start of the volume, and holds the UEFI files.
  • Partition 1 is at the end of the volume, and holds the Linux installation.
Make note of the total # of sectors, as well as the "End" and "Sectors" values for the Linux filesystem partition.

Step 2
Inspect the target volume:

fdisk -l /dev/xvdg
Disk /dev/xvdg: 3 GiB, 3221225472 bytes, 6291456 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

This volume doesn't have any partitions yet, so just make note of the total # of sectors.

Step 3
Dump the partition table of the source volume:

sudo sfdisk -d /dev/xvdf > partitions.txt
label: gpt
label-id: 51ED0604-9F4E-4D14-B8E7-4FC378A5E100
device: /dev/xvdf
unit: sectors
first-lba: 34
last-lba: 16777182
sector-size: 512

/dev/xvdf1 : start=      206848, size=    16570335, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4, uuid=69E1997E-A903-4643-A4FA-491F49342E1C
/dev/xvdf15 : start=        2048, size=      202753, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=CC54DFDE-E8B4-43AA-B808-0CBE0BDB7C0D

Notice that the values for "last-lba" and "size" are exactly the "End" and "Sectors" values from Step 1.

Step 4
Calculate the new partition configuration.
We will leave partition 15 exactly as it is; what we're interested in is making sure partition 1 gets a valid configuration, so that the partition is created correctly the first time around.

If we look at Step 1, we notice that the "End" and "Sectors" values for /dev/xvdf1 are fixed offsets from the total number of sectors of the disk.

Specifically:
"End" = disk sectors (16777216) - 34 = 16777182
"Sectors" = disk sectors (16777216) - 206881 = 16570335

To get the values for /dev/xvdg we simply substitute the total number of disk sectors:
"End" = 6291456 - 34 = 6291422
"Sectors" = 6291456 - 206881 = 6084575

On your volume the deltas may be different, adjust as needed.
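The same arithmetic can be scripted. The constants below are the values from Steps 1 and 2 above; substitute your own when the layout differs:

```shell
#!/usr/bin/env bash
# Derive the target partition values from the source disk's layout.

SRC_SECTORS=16777216   # total sectors, source disk (Step 1)
SRC_END=16777182       # "End" of /dev/xvdf1 (Step 1)
SRC_SIZE=16570335      # "Sectors" of /dev/xvdf1 (Step 1)
TGT_SECTORS=6291456    # total sectors, target disk (Step 2)

END_DELTA=$(( SRC_SECTORS - SRC_END ))     # 34
SIZE_DELTA=$(( SRC_SECTORS - SRC_SIZE ))   # 206881

TGT_END=$(( TGT_SECTORS - END_DELTA ))
TGT_SIZE=$(( TGT_SECTORS - SIZE_DELTA ))

echo "last-lba = ${TGT_END}"   # 6291422
echo "size     = ${TGT_SIZE}"  # 6084575
```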

Step 5
Edit the partition table dump file (partitions.txt) by filling in the values calculated at the previous step. The values that need to be changed are:

  • "last-lba" = the calculated "End" value.
  • "size" = the calculated "Sectors" value.
I also substituted "xvdg" for "xvdf" to avoid confusion when looking at the file, but I think only the numerical values are relevant. Other guides also suggest changing the uuid of the partitions, but here it's critical that we keep them.

The edited "partitions.txt":

label: gpt
label-id: 51ED0604-9F4E-4D14-B8E7-4FC378A5E100
device: /dev/xvdg
unit: sectors
first-lba: 34
last-lba: 6291422
sector-size: 512

/dev/xvdg1 : start= 206848, size= 6084575, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4, uuid=69E1997E-A903-4643-A4FA-491F49342E1C
/dev/xvdg15 : start= 2048, size= 202753, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B, uuid=CC54DFDE-E8B4-43AA-B808-0CBE0BDB7C0D

Step 6
Write the new partition table to the target disk:

sudo sfdisk /dev/xvdg < partitions.txt

Step 7
Make a raw low-level copy of the EFI partition, which will remain unchanged:

sudo dd if=/dev/xvdf15 of=/dev/xvdg15 bs=512

Step 8
Mirror the Linux partition data by mounting the partition and copying the files with rsync:

mkdir /tmp/xvdf1
mkdir /tmp/xvdg1

mount /dev/xvdf1 /tmp/xvdf1
mount /dev/xvdg1 /tmp/xvdg1

rsync -av /tmp/xvdf1/ /tmp/xvdg1/

umount /dev/xvdf1
umount /dev/xvdg1

Step 9
Ensure source and target partitions have the same metadata.
First, query each of them using blkid:

blkid /dev/xvdf1
/dev/xvdf1: LABEL="cloudimg-rootfs" UUID="466bbba3-507d-44a0-989e-e25286198eba" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="69e1997e-a903-4643-a4fa-491f49342e1c"

blkid /dev/xvdg1
/dev/xvdg1: UUID="69e1997e-a903-4643-a4fa-491f49342e1c" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="69e1997e-a903-4643-a4fa-491f49342e1c" 

Notice that the source partition has a LABEL, and a different UUID.
Without mirroring these as well, the machine won't boot (the label is referenced by /etc/fstab, and the UUID is referenced by the UEFI configuration).

The following commands set the right values:

e2label /dev/xvdg1 cloudimg-rootfs
tune2fs /dev/xvdg1 -U 466bbba3-507d-44a0-989e-e25286198eba

One or both may ask you to first run an integrity check (e2fsck) on the partition.
Do so; it should complete without problems.

Step 10
Wrapping up:
  • Stop the staging instance.
  • On the EC2 "Volumes" page, detach both volumes from the staging instance.
  • Attach the target volume to the source machine as /dev/sda1.
  • Boot.

Friday, November 3, 2023

screen - tmux dictionary

The following lists equivalent commands I most commonly use for the screen and tmux programs.
This is relevant because Red Hat has stopped making screen easily available, first marking it as deprecated in v7.6, then no longer including it as of v8.0: https://access.redhat.com/solutions/4136481

1. List existing sessions:

  • screen -ls
  • tmux ls # tmux list-sessions
2. Create a new named session:
  • screen -S name
  • tmux new -s name # or: tmux new-session -s name
3. Attach to a named session:
  • screen -dr name
  • tmux a -t name # or: tmux attach -t name
4. Detach an attached session:
  • Ctrl+a d # screen
  • Ctrl+b d # tmux
5. Enter copy mode:
  • Ctrl+a Esc # screen
  • Ctrl+b [   # tmux
6. Exit copy mode:
  • Esc # screen
  • q   # tmux
7. Terminate a named session:
  • screen -XS name quit
  • tmux kill-session -t name
8. Start a command in a session:
  • screen -dmS name command param1 param2
  • tmux new -d -s name command param1 param2

Wednesday, November 1, 2023

Bing: disable scrolling from Bing AI chat to search results

For some reason, Microsoft introduced a UX feature whereby the page would switch:

  • from Bing AI to the search results page when scrolling down, and
  • from the search results page to Bing AI when scrolling up.

This happened even though the scrollbar gave no indication there was additional content to scroll to.
The following userscript, simplified from a version I found on the Microsoft Community forum, disables this behavior:

// ==UserScript==
// @name Bing scroll
// @match https://www.bing.com/*
// ==/UserScript==

window.addEventListener("wheel", e => {
  if (e.target.className.includes("cib-serp-main")) {
    e.stopPropagation();
  }
});

Original:
https://answers.microsoft.com/en-us/bing/forum/all/how-to-disable-scrolling-into-ai-chat/2c208d88-918f-4eed-bd8f-f04e7dcf5af1?page=5

Monday, October 30, 2023

AWS CLI sets wrong file type, file gets downloaded with the wrong extension

I recently experienced the following issue using the AWS CLI:

  1. I uploaded a .csv.gz file to S3.
  2. I generated a presigned link.
  3. The presigned link served the file with the wrong extension (.csv.csv instead of .csv.gz).
I think this is a bug in the AWS CLI, which misidentifies the content type when a file has multiple extensions.
Thankfully, this can be easily solved, both for existing files, and for future uploads:
  • For existing files: I clicked on the object in the web interface, and scrolled down to Metadata.
    • Sure enough, the "Content-Type" key had the wrong value (it was "text/csv").
    • I clicked the "Edit" button, and manually changed it to the correct type, namely "application/x-gzip". The existing presigned link also reflected the change.
  • For future uploads: setting the content type explicitly ensures I will always get the desired content type, e.g.: --content-type application/x-gzip
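For reference, here is what a full upload with an explicit content type might look like (the bucket and file names are hypothetical):

```shell
# The --content-type flag overrides the MIME type the CLI would otherwise guess.
aws s3 cp "data.csv.gz" "s3://my-bucket/data.csv.gz" \
  --content-type "application/x-gzip"
```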

Saturday, October 21, 2023

Bash: "get or default"

A useful construct for assigning a default value to a variable when an optional input (e.g. $1) is missing.

declare PARAMETER_VALUE="${1:-DEFAULT_VALUE}"
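A small sketch of the behavior (the variable names are made up). Note that with ":-" the default also applies when the input is set but empty; with "-" (no colon) it applies only when the input is unset:

```shell
#!/usr/bin/env bash
# Run with no arguments, so the default kicks in for $1.
declare PARAMETER_VALUE="${1:-DEFAULT_VALUE}"
echo "${PARAMETER_VALUE}"           # prints "DEFAULT_VALUE"

# ":-" vs "-": an empty-but-set value is replaced only by ":-".
EMPTY=""
echo "${EMPTY:-fallback}"           # prints "fallback"
echo "${EMPTY-fallback}"            # prints an empty line
```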


Friday, October 20, 2023

Java: streaming WARC file with jwat-warc

I had to write some Java code that reads WARC data piped in through stdin.
Here's some minimally functional working code, using the jwat-warc library:
https://mvnrepository.com/artifact/org.jwat/jwat-warc

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.jwat.warc.WarcReader;
import org.jwat.warc.WarcReaderFactory;
import org.jwat.warc.WarcRecord;

And the minimal code piece:

InputStream stdin = System.in;
WarcReader warcReader = WarcReaderFactory.getReader(stdin);
WarcRecord record;

while ((record = warcReader.getNextRecord()) != null) {
    InputStream contentStream = record.getPayloadContent();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(contentStream))) {
        StringBuilder builder = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            builder.append(line);
            builder.append("\n");
        }

        String uri = record.getHeader("WARC-Target-URI").value;
        String html = builder.toString();
        System.out.println(uri + "\t" + html.length());
    }
}

You need to read the full content of each WARC record, like I did (so you can't just skip records without reading them); otherwise it will throw the following exception:

java.io.IOException: Illegal seek
        at java.base/java.io.FileInputStream.skip0(Native Method)
        at java.base/java.io.FileInputStream.skip(Unknown Source)
        at java.base/java.io.BufferedInputStream.implSkip(Unknown Source)
        at java.base/java.io.BufferedInputStream.skip(Unknown Source)
        at java.base/java.io.FilterInputStream.skip(Unknown Source)
        at java.base/java.io.PushbackInputStream.skip(Unknown Source)
        at org.jwat.common.ByteCountingPushBackInputStream.skip(ByteCountingPushBackInputStream.java:134)
        at org.jwat.common.FixedLengthInputStream.skip(FixedLengthInputStream.java:115)
        at org.jwat.common.FixedLengthInputStream.close(FixedLengthInputStream.java:58)
        at java.base/java.io.BufferedInputStream.close(Unknown Source)
        at org.jwat.common.Payload.close(Payload.java:267)
        at org.jwat.warc.WarcRecord.close(WarcRecord.java:445)
        at org.jwat.warc.WarcReaderUncompressed.getNextRecord(WarcReaderUncompressed.java:123)

Java 21 on Ubuntu 22.04 LTS and Amazon Linux

I wanted to update the JRE to version 21 on Ubuntu 22.04 LTS, and on Amazon Linux 2.
I decided to go with the Adoptium® Eclipse Temurin™ OpenJDK release just because they make it really convenient to add apt and yum repositories.

The documentation page lists the steps needed to set up the repositories:
https://adoptium.net/installation/linux/

Here you can also find all the RPM-based Linux distributions they support:
https://packages.adoptium.net/ui/repos/tree/General/rpm

UPX Linux .so: "CantPackException: bad e_shoff"

I was trying to compress a .so file I had built from Go, but UPX threw an error:

upx --ultra-brute --lzma libname.so
upx: libname.so: CantPackException: bad e_shoff

After compression, the file could no longer be read by the following nm command, which lists the functions exported by a given library:

nm -D libname.so | grep my_function_name
nm: libname.so: file format not recognized

There's an issue from 2021 on the GitHub issue tracker for UPX, apparently still unsolved, which clarifies the problem:
https://github.com/upx/upx/issues/506#issuecomment-1168570219

This style of layout of the address space in the shared library, having 4 [PT_]LOAD segments [...] requires that the upx runtime de-compression stub be significantly enhanced from the upx stub that handles shared libraries with only 2 PT_LOAD segments (one R E and one RW). Upgrading the upx stub has been in progress for a while, and the code is getting close, but is not yet complete.

Running the command suggested in that thread confirms that my .so file also contains 4 LOAD segments, alongside the other program headers:

readelf --segments libname.so  

Elf file type is DYN (Shared object file)
Entry point 0x0
There are 10 program headers, starting at offset 64

Program Headers:
 Type           Offset             VirtAddr           PhysAddr
                FileSiz            MemSiz              Flags  Align
 LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                0x0000000000042f98 0x0000000000042f98  R      0x1000
 LOAD           0x0000000000043000 0x0000000000043000 0x0000000000043000
                0x000000000012faa9 0x000000000012faa9  R E    0x1000
 LOAD           0x0000000000173000 0x0000000000173000 0x0000000000173000
                0x00000000002ec30c 0x00000000002ec30c  R      0x1000
 LOAD           0x0000000000460208 0x0000000000461208 0x0000000000461208
                0x000000000014ee74 0x0000000000182988  RW     0x1000
 DYNAMIC        0x0000000000560dc8 0x0000000000561dc8 0x0000000000561dc8
                0x00000000000001f0 0x00000000000001f0  RW     0x8
 NOTE           0x0000000000000270 0x0000000000000270 0x0000000000000270
                0x0000000000000088 0x0000000000000088  R      0x4
 TLS            0x0000000000460208 0x0000000000461208 0x0000000000461208
                0x0000000000000000 0x0000000000000008  R      0x8
 GNU_EH_FRAME   0x000000000045e8d0 0x000000000045e8d0 0x000000000045e8d0
                0x00000000000001ac 0x00000000000001ac  R      0x4
 GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                0x0000000000000000 0x0000000000000000  RW     0x10
 GNU_RELRO      0x0000000000460208 0x0000000000461208 0x0000000000461208
                0x0000000000100df8 0x0000000000100df8  R      0x1


The only solution is to wait for UPX maintainers to address this.

Thursday, October 19, 2023

Gradle 8.4: "Convention type has been deprecated"

I upgraded Gradle to version 8.4 and started getting the following types of messages:

The org.gradle.api.plugins.ApplicationPluginConvention type has been deprecated. This is scheduled to be removed in Gradle 9.0. Consult the upgrading guide for further information: https://docs.gradle.org/8.4/userguide/upgrading_version_8.html#application_convention_deprecation

The org.gradle.api.plugins.Convention type has been deprecated. This is scheduled to be removed in Gradle 9.0. Consult the upgrading guide for further information: https://docs.gradle.org/8.4/userguide/upgrading_version_8.html#deprecated_access_to_conventions 

What's going on is that my existing build.gradle files contained some configuration options used in a way that got deprecated. The changes I had to make were largely cosmetic. I just had to group some of the existing configuration options into their own blocks:

I. Pre 8.4:

sourceCompatibility = 1.17
targetCompatibility = 1.17

mainClassName = "run.Main"
applicationDefaultJvmArgs = ["-Xmx1g"]

II. Post 8.4:

java {
  sourceCompatibility = 1.17
  targetCompatibility = 1.17
}

application {
  mainClass.set("run.Main")
  applicationDefaultJvmArgs = ["-Xmx1g"]
}

Tuesday, October 17, 2023

Remove duplicate pictures

Recently I wanted to remove duplicate photos I had based on image content, not just file hash.
This program, AllDup by MTSD, did a great job.

https://www.alldup.de/en_download_alldup.php

Sunday, October 15, 2023

"The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results."

I was playing with Huggingface transformers and kept getting the warning "The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.". I finally found a solution in a StackOverflow reply that will be credited at the end:

To fix this, first add this code after loading pre-trained tokenizer:

if tokenizer.pad_token is None:
  tokenizer.pad_token = tokenizer.eos_token

Then pass this in generate method like this:

gen_ids = model.generate(**encodings, pad_token_id=tokenizer.pad_token_id, max_new_tokens=200)

In short, there are two additions/changes you need to make:
  1. When initializing your tokenizer, set:
    tokenizer.pad_token = tokenizer.eos_token
  2. When using the model to generate an output, pass the following as a parameter to model.generate:
    pad_token_id=tokenizer.pad_token_id
Thank you user Shital Shah on StackOverflow:

"A decoder-only architecture is being used, but right-padding was detected!"

I was playing with Huggingface transformers and kept getting the warning "A decoder-only architecture is being used, but right-padding was detected!". I finally found a solution in a StackOverflow reply that will be credited at the end:

Padding in this context is referring to the "tokenizer.eos_token", and you are currently padding to the right of the user input and the error is saying that for correct results add padding to the left. You need to do this:

new_user_input_ids = tokenizer.encode(tokenizer.eos_token + input(">> User:"), return_tensors='pt')

While I originally thought it was about setting the parameter padding_side='left', it turned out to be about the order in which you concatenate the input and the eos_token.

Thank you user Travis Thayer on StackOverflow:
https://stackoverflow.com/questions/74748116/huggingface-automodelforcasuallm-decoder-only-architecture-warning-even-after/74972288#74972288

Wednesday, October 11, 2023

Update apt packages

#!/usr/bin/env bash
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo apt-get autoremove
sudo apt-get clean 

Wednesday, October 4, 2023

Install specific Go version on Linux

A bash script for installing a specific Go version on Linux (AMD64).

  • Expects a numeric version as a parameter (e.g. 1.21.1).
  • Downloads & extracts the Go archive under /usr/local/lib/go$VERSION
  • Creates a symlink for ./bin/go under /usr/local/bin/go

#!/usr/bin/env bash

declare VERSION="${1}"
declare PROGRAM="go${VERSION}"
declare ARCHIVE="${PROGRAM}.linux-amd64.tar.gz"
declare LIB_DIR="/usr/local/lib"
declare BIN_DIR="/usr/local/bin"
declare INSTALL="${LIB_DIR}/${PROGRAM}"
declare SYMLINK="${BIN_DIR}/go"

if [[ ! -d "${INSTALL}" ]]; then
  wget --timestamping "https://go.dev/dl/${ARCHIVE}"

  if [[ ! -f "${ARCHIVE}" ]]; then
    echo "File not found: ${ARCHIVE}"
    exit 1
  fi

  sudo tar -xvf "${ARCHIVE}"
  sudo mv -f go "${PROGRAM}"
  sudo mv -f "${PROGRAM}" "${LIB_DIR}"
else
  echo "${INSTALL} already exists"
fi

sudo rm -f "${SYMLINK}"
sudo ln -s "${LIB_DIR}/${PROGRAM}/bin/go" "${SYMLINK}"
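Assuming the script above is saved as install-go.sh, usage might look like this:

```shell
chmod +x install-go.sh
./install-go.sh 1.21.1   # downloads and installs under /usr/local/lib/go1.21.1
go version               # should now report go1.21.1
```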

Monday, October 2, 2023

Sub7

Interview with Sub7 creator, Mobman:
https://twitter.com/DarkCoderSc/status/1681208015255379968
https://darkcodersc.medium.com/a-malware-retrospective-subseven-d86fed0c88bf

Born and raised in Craiova, Romania, Mobman was drawn to the world of software and malware at an early age. His fascination led him to the creation of the infamous SubSeven Remote Access Trojan, a feat achieved under a pseudonym inspired by his enduring favorite band, B.U.G. Mafia. As he reflected, “The nickname was inspired from my favorite band (still to this day!), the Romanian rap group called B.U.G. Mafia. I wanted to pick something mob-related and mobman just had a nice ring to it.”.

Sub7 fun fact: mobman used to write feature ideas in notebooks
https://twitter.com/xillwillx/status/1708766696985575772




Wednesday, September 27, 2023

Save MySQL data to compressed CSV file using a FIFO named pipe

It's very easy to save MySQL data to a compressed file by using a named pipe.
You need two bash terminals open, A & B, presumably in screen/tmux sessions, since you are likely interested in large tables that take a long time to export.

Terminal A, Step 1: Create the FIFO named pipe (make sure MySQL can write to it):

mkfifo "/path/to/data.csv"
sudo chown mysql "/path/to/data.csv"

Terminal A, Step 2: Have zstd read from the FIFO pipe and write to an output file:

zstd -o "/tmp/data.csv.zst" < "/path/to/data.csv" &

Terminal B: Save the MySQL data:

SELECT * FROM your_table
 INTO OUTFILE '/path/to/data.csv'
 CHARACTER SET UTF8MB4
 FIELDS TERMINATED BY ','
        OPTIONALLY ENCLOSED BY '"'
        ESCAPED BY '"'
 LINES TERMINATED BY '\n';

The only remaining issue is NULL columns, which you can handle by wrapping them in IFNULL when constructing your query.
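For example, NULL-able columns can be exported as empty strings by wrapping them in IFNULL (the column names here are hypothetical placeholders):

```sql
-- Wrap any NULL-able column in IFNULL so the CSV contains an empty string
-- instead of the \N marker (id, name, email are example columns):
SELECT id,
       IFNULL(name, ''),
       IFNULL(email, '')
  FROM your_table
  INTO OUTFILE '/path/to/data.csv'
  CHARACTER SET UTF8MB4
  FIELDS TERMINATED BY ','
         OPTIONALLY ENCLOSED BY '"'
         ESCAPED BY '"'
  LINES TERMINATED BY '\n';
```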

Tuesday, September 19, 2023

Tamagotchi RakuRaku Dinokun

I owned one of these in the 90s; it was very fun. Several blogs are dedicated to this particular toy, and you can also find videos on YouTube.
https://dinkiepets.tumblr.com/
http://gotchi-garden.blogspot.com/p/dinkie-dino-care-sheet.html


And an old page dedicated to the first generation Tamagotchi:

Resource Hacker

A resource editor for Windows executables. It should be familiar to anyone who enjoys exploring the internals of computers and operating systems.

http://www.angusj.com/resourcehacker/

Windows Enabler

A very interesting Windows utility that helps you "hack" certain applications by allowing you to enable disabled buttons, checkboxes, menu options, and other UI elements, making them clickable.

https://iowin.net/en/windows-enabler/
https://www.softpedia.com/get/Others/Miscellaneous/Windows-Enabler.shtml

HideWindow by Adrian Lopez

This is an ancient piece of software, written back in the 90s by Adrian Lopez, that interestingly enough still works to this day, toggling the visibility state of windows in Windows.

The version hosted by the Wayback Machine is 1.31:

https://archive.org/details/HIDEWNDW_ZIP

Version 1.43 is also known to exist, but I could not find it online:
- http://assiste.com.free.fr/p/abc/b/liste_misc_tool.php
- https://web.archive.org/web/20071109174738/http://assiste.com.free.fr/p/abc/b/liste_misc_tool.php
- https://web.archive.org/web/20050113175440/http://pestpatrol.com/pestinfo/h/hidewindow_1_43.asp

While not malicious, it has been bundled together with malware and used to hide the application windows of third party executables, like mIRC operating as a "zombie" (bot).

mdm.exe is in reality HideWindow by Adrian Lopez, but he's quite innocent otherwise.

https://seclists.org/incidents/2003/Jan/86

Wednesday, August 23, 2023

wOne

I remember enjoying this game.

Use the arrow keys to control the wheel, with your task being a simple one: collect barrels (or coins) and stars by rolling through a myriad of levels full of ramps and platforms.

https://jayisgames.com/review/wone-and-two.php

The original website, "Sean Cooper Games", is available on the Wayback Machine:

http://web.archive.org/web/20121026161721/https://www.games.seantcooper.com/

Thursday, August 17, 2023

Using Chaos to Guide a Spacecraft to the Moon

Typically, chaotic dynamics exhibit highly irregular behavior and the sensitive dependence on initial conditions prevents long-term prediction of the state of the system. However, the inherent exponential sensitivity of chaotic time evolutions to perturbations can be exploited to direct trajectories to some desired final state by the use of a carefully chosen sequence of small perturbations to some control parameters. These perturbations can be so small that they do not significantly change the system dynamics, but enable the intrinsic system dynamics to drive the trajectory to the desired final state. This process has been called targeting.

https://www.sciencedirect.com/science/article/abs/pii/S0094576500001259


In his book, Belbruno tells the story of how he used chaos theory to get the world’s first spaceship (a Japanese spaceship named Hiten, which means “A Buddhist Angel that Dances in Heaven”) to the moon without using fuel.

http://treeoflifereview.com/spotlight_belbruno.php

https://www.edbelbruno.com/

Cookie Clicker

Some useful JavaScript snippets to hack Cookie Clicker if you want to explore the game without sitting through the whole game experience.

https://orteil.dashnet.org/cookieclicker/test/

I. Get just under Infinity cookies.
Note: having Infinity cookies will bug the game.

Game.Earn(9e+302);

II. Buy 1000 of everything.
Note: all prices start to approach Infinity with counts nearing the upper 4000s.

Game.storeBulkButton(4);

for (let i = 0; i <= 14; i++) {
  for (let j = 0; j < 10; j++) {
    document.querySelector(`#product${i}`).click();
  }
}

III. Buy all available upgrades:

document.querySelectorAll('#upgrades .upgrade.enabled').forEach(node => node.click());

Tuesday, August 15, 2023

Download from Wayback Machine

You can use the following tool to download an archived website from the Wayback Machine:
https://github.com/hartator/wayback-machine-downloader

I found it here:
https://superuser.com/questions/828907/how-to-download-a-website-from-the-archive-org-wayback-machine/957298#957298

It's a Ruby script. If you happen to have an SSH shell that has Ruby but doesn't let you install packages globally, you can do this:

gem install --user-install wayback_machine_downloader

And the script will be here:

~/.local/share/gem/ruby/2.7.0/gems/wayback_machine_downloader-2.3.1/bin/wayback_machine_downloader
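Alternatively, you can put the user gem bin directory on your PATH and invoke the tool by name. A sketch assuming the same Ruby 2.7 user-install layout as above (the target URL is a hypothetical example):

```shell
# Make user-installed gem executables callable by name (the ruby version
# segment in the path is from this machine; adjust it to match yours):
export PATH="$HOME/.local/share/gem/ruby/2.7.0/bin:$PATH"
echo "$PATH" | grep -q "gem/ruby/2.7.0/bin" && echo "gem bin dir on PATH"

# Then run, e.g. (hypothetical target site):
#   wayback_machine_downloader "https://example.com"
```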

Saturday, August 12, 2023

Acer Aspire One D255

I recently attempted to revive my old Acer Aspire One D255.
I fitted it with an affordable SSD, upgraded the RAM from 1 to 2 GB, and set out to get it up and running.

I. Operating system

  • Lubuntu 23.04 installation failed twice, each time with a different error.
  • Windows 10 installed, and would really be ideal due to ongoing support, but it is far too slow on this machine (even on the LTSC version, the Start menu takes a few seconds to show up).
  • Windows 7 64-bit installed, but I couldn't find a video driver.
  • Windows 7 32-bit works fine; I went with the Home Basic edition.
Note: the CPU (Intel Atom N455) is 64-bit, but the system is designed to run a 32-bit OS.

II. Drivers

The Acer website no longer lists the product, and the old product page link has been deleted.
The Wayback Machine doesn't have a copy.
https://us.acer.com/ac/en/US/content/drivers/2987;-;AOD255

However, their "global-download.acer.com" server still hosts all the driver files.
An anonymous benefactor links to them on a dedicated blog:
http://getacerdriver.blogspot.com/2015/04/acer-aspire-one-d255d255e-drivers.html

III. Software

Finding working software for Windows 7 is surprisingly challenging.

Google Chrome 109.0.5414.120 is the last version that runs on Windows 7.
I went for a PortableApps.com package which I got from SourceForge.
https://sourceforge.net/projects/portableapps/files/Google%20Chrome%20Portable/GoogleChromePortable_109.0.5414.120_online.paf.exe/download

IV. Troubleshooting

Immediately after installing Windows 7, I was hit with the Windows Update Error 80072EFE.
Luckily, there was an easy fix available on the Microsoft Support Community: downloading and installing "Windows Update Client for Windows 7 and Windows Server 2008 R2: March 2016". Windows Update worked after this, and offered to install hundreds of update packages.



Thursday, July 27, 2023

Daewoo AKF-7331V

Back in the 1990s we had a Daewoo radio cassette player in our car. Using Google image search, I tracked down the model: "Daewoo AKF-7331V" (the "AKF-7261V" is also very similar).

For some reason, this model was particularly tricky to find online.
Most breadcrumbs are on Eastern European websites, especially Polish ones.

Some online listings (expired or sold out):

  • Brand new AKF-7261V [link]
  • Used AKF-7331V [link]

1990s Daewoo AKF-9255V Car Cassette FM Radio Player + Bluetooth
https://www.youtube.com/watch?v=sISS7SWx4GM


Tuesday, July 25, 2023

Warez Wayback Machine

[WORK IN PROGRESS]

Starting points for Internet history research:

  • https://web.archive.org/web/20030801071523/http://www.thebugs.ws/ddl/
  • https://web.archive.org/web/20051101012155/http://crackz.ws/
  • https://web.archive.org/web/20040823015457/http://cracks.am/
  • https://web.archive.org/web/20051101005639/http://www.serials.ws/
  • https://web.archive.org/web/20040609203254/http://www.cracksearchengine.net/
  • https://web.archive.org/web/20040610162555/http://www.freeserials.com/
  • https://web.archive.org/web/20040729185658/http://phazeddl.com/
  • https://web.archive.org/web/20040804052535/http://astalavista.us/
  • https://web.archive.org/web/20051101010218/http://astalavista.box.sk/
  • https://web.archive.org/web/20040803110015if_/http://kickme.to:80/cosmocon/
  • https://web.archive.org/web/20040610083049/http://www.crackfound.com/
  • https://web.archive.org/web/20040608072032/http://directwarez.com/
  • https://web.archive.org/web/20071102080755/http://www.serialdevil.com/
  • https://web.archive.org/web/20051101010841/http://crackdb.com/
  • https://web.archive.org/web/20051022044438/http://nfodb.org/
  • https://web.archive.org/web/20051101013917/http://crackfind.com/
  • https://web.archive.org/web/20070719160119/http://piratebot.org/
  • https://web.archive.org/web/20060228174813/http://www.appzplanet.com/
  • https://web.archive.org/web/20070102233234/http://crackedappz.com/
  • https://web.archive.org/web/20050208083724/http://keygen.us/
  • https://web.archive.org/web/20070403022512if_/http://www.mylinkz.dl.am:80/
  • https://web.archive.org/web/20070503164157/http://quickstyle.ru/
  • https://web.archive.org/web/20070419043140/http://gfxworld.org/
  • https://web.archive.org/web/20050418104651/http://www.kiluminati.com/banner.htm
+
  • https://web.archive.org/web/20070614100729/http://www.logomaid.com/
  • https://web.archive.org/web/20070610031606/http://www.designgalaxy.net/
  • https://web.archive.org/web/20061106062333/http://www.hackersbook.com/index.php

BlaCk^D3v|L by maSSiccio

For researchers of internet history, here is the search for the "BlaCk^D3v|L by maSSiccio" mIRC script, popular in the early 2000s.

I.
A web search for "mIRC script devil by maSSiccio" returned:
http://web.tiscali.it/theblob/download-big.htm

It links to "http://web.interpuntonet.it/bdevil/", which is available as archived in 2001:
https://web.archive.org/web/20010827101902/http://web.interpuntonet.it/bdevil/

This site has a download page for version 3.666, offering both an English and an Italian mirror, as well as an "addons and libraries" page, but no actual files are archived.

The Hotmail email address mentioned on the archived website is no longer active ("blackdevilscript").

II.
Another search result, offering version 2.0:
https://www.oocities.org/hhh2000_999/MIRC.htm

The download URL "http://216.234.161.131/files/5.6/bdevil.zip" has not been archived.

III.
Another search result, offering version 1.666:
http://thelords.scriptmania.com/scripts/

The download page "http://www.mircscripts.com/cgibin/download.cgi?s=bdevil.zip&v=5.6" has been archived, but the actual file is not available.

IV.
The script is mentioned in old IRC logs from 2002, themselves worth keeping as part of internet history:

[22:38] charless (~19@195.113.65.246) left irc: BlaCk^D3v|L 4.0 S´SRvG by maSSiccio, http://www.blackdevilscript.net

[source][archive]

[00:04] smallfly (~smallfly@ip4.ktvprerov.cz) left irc: BlaCk^D3v|L 3.666 by maSSiccio, http://www.blackdevilscript.com
[source][archive]

The mentioned URL ("blackdevilscript.com") is archived and redirects to "blackdevilscript.areanews24.com" (also archived):

Again, there is a download page:
https://web.archive.org/web/20020213084635/http://blackdevilscript.areanews24.com/bdevil/download.php

The main installer is listed as "bdevil_40.exe", "bdevil40.exe" or "Bdevil30.exe", depending on the version. There are three versions: 3.0, 3.666, 4.0.

The installers for versions 3.666 and 4.0 served by this website have been archived on the Wayback Machine:
https://web.archive.org/web/*/http://blackdevilscript.areanews24.com/bdevil/download/*

Note that antivirus software can flag these files as malicious, because they contain various tools used for offensive actions on IRC. They would be largely harmless today, but to stay on the safe side, and for the most authentic experience, you should only run these files inside a virtual machine.

VirusTotal reports:

Wednesday, July 12, 2023

Acer Aspire 7741G

My Acer Aspire 7741G laptop died (GPU issue), so in a last-ditch effort to recover it (the other components worked fine), I ordered a replacement motherboard from China. Even the guy from the repair shop was surprised it worked! That was two years ago (summer 2021), and the laptop still works.

The part is no longer available to buy from the same seller, but anyone interested in reviving dead laptops can probably still track one down.

https://www.aliexpress.com/item/32932142925.html



Clear Twitter "interests" list

Twitter subscribes you to a list of interests:

https://twitter.com/settings/your_twitter_data/twitter_interests

Some code to help you clear it:

const sleep = t => new Promise(r => setTimeout(r, t));

async function purge() {
  const nodes = document.querySelectorAll('input[type="checkbox"]:checked');

  let i = 0;
  for (const node of nodes) {
    document.title = `${++i} / ${nodes.length}`;
    console.log(new Date().toISOString(), i, node.closest('label').querySelector('span').textContent);
    node.click();
    await sleep(3_000);
  }
}
purge().catch(console.error);