Is obfuscation still relevant these days?

LLMs and generative AI make very good pair programmers. They are trained on, and capable of understanding, source code written in various programming languages, even when the code contains bugs or syntax errors or lacks comments. We already see this in many products, including the well-known GitHub Copilot.

In this demo, we want to find out whether they can understand, explain, and reverse engineer a piece of obfuscated code.


The inputs

I wrote a random function that performs some array manipulation.

function a(arr) {
        // move last item of array to the front if array size is odd and longer than 3
        if (arr.length % 2 !== 0 && arr.length > 3) {
            arr.unshift(arr.pop());
        }
        // clone the first 3 items of the array into a new array
        const newArr = arr.slice(0, 3);
        
        // check if the last item is the same as the first item
        if (newArr[0] === newArr[newArr.length - 1]) {
            // if it is, remove the last item
            newArr.pop();
        }
        return newArr;
}
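
To make the behaviour concrete, here is a small sketch that calls the function on a few sample arrays. The sample inputs are illustrative assumptions and were not part of the original demo. Note that the function mutates the array it receives when it rotates the last item to the front.

```javascript
function a(arr) {
  // move last item of array to the front if array size is odd and longer than 3
  if (arr.length % 2 !== 0 && arr.length > 3) {
    arr.unshift(arr.pop());
  }
  // clone the first 3 items of the array into a new array
  const newArr = arr.slice(0, 3);
  // drop the last cloned item when it equals the first one
  if (newArr[0] === newArr[newArr.length - 1]) {
    newArr.pop();
  }
  return newArr;
}

// Odd length greater than 3: rotated to [5, 1, 2, 3, 4] first
console.log(a([1, 2, 3, 4, 5])); // [ 5, 1, 2 ]

// Even length: no rotation, first three are [1, 2, 1], last equals first
console.log(a([1, 2, 1, 4])); // [ 1, 2 ]

// Single item: it is both first and last, so it gets removed
console.log(a([7])); // []
```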

After minification and obfuscation, the code above becomes something like this:

function a(n){n.length%2!=0&&n.length>3&&n.unshift(n.pop());const t=n.slice(0,3);return t[0]===t[t.length-1]&&t.pop(),t}
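
The minified form leans on two JavaScript idioms: a short-circuit `&&` in place of an `if` statement, and the comma operator in the `return`. The sketch below isolates them; the helper names `withIf` and `withShortCircuit` are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: drops the last item when it equals the first one.
function withIf(list) {
  if (list[0] === list[list.length - 1]) {
    list.pop();
  }
  return list;
}

// Same behaviour, written the way the minified code writes it.
function withShortCircuit(list) {
  // `cond && expr` evaluates `expr` only when `cond` is truthy.
  // The comma operator then discards that result and yields `list`.
  return list[0] === list[list.length - 1] && list.pop(), list;
}

console.log(withIf([1, 2, 1]));           // [ 1, 2 ]
console.log(withShortCircuit([1, 2, 1])); // [ 1, 2 ]
console.log(withShortCircuit([1, 2, 3])); // [ 1, 2, 3 ]
```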

Hey Gemini, Help Me!

Next, I sent this code to Gemini and asked for some improvements. I gave it this prompt:

Refactor the code below for better readability and add code comment

Here is the response I got.

function processList(numberList) {
  // Check if the list has odd length and at least 4 elements
  if (numberList.length % 2 !== 0 && numberList.length > 3) {
    // Move the last element to the beginning of the list
    numberList.unshift(numberList.pop());
  }

  // Get the first three elements of the list
  const firstThree = numberList.slice(0, 3);

  // Check if the first and last elements of the first three are equal
  if (firstThree[0] === firstThree[firstThree.length - 1]) {
    // Remove the last element from the first three elements
    firstThree.pop();
  }

  // Return the modified first three elements
  return firstThree;
}

Gemini handles it very well. It understands what the code does, gives the variables meaningful names, and adds code comments. The result in fact looks better than my original code.


But is it correct?

I ran the Gemini-generated code through the same obfuscator. The output is identical to the earlier obfuscated code except for the function name, which is now processList.

function processList(n){n.length%2!=0&&n.length>3&&n.unshift(n.pop());const t=n.slice(0,3);return t[0]===t[t.length-1]&&t.pop(),t}
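
Matching obfuscator output is one check. Another is to run both versions on the same inputs and compare the results. The sketch below does that; the sample arrays and the `sameOutput` helper are illustrative assumptions, not part of the original experiment, and agreement on a handful of inputs is evidence rather than proof of equivalence.

```javascript
// Obfuscated function, as shown earlier in the article.
function a(n){n.length%2!=0&&n.length>3&&n.unshift(n.pop());const t=n.slice(0,3);return t[0]===t[t.length-1]&&t.pop(),t}

// Gemini's refactored version, as shown earlier in the article.
function processList(numberList) {
  if (numberList.length % 2 !== 0 && numberList.length > 3) {
    numberList.unshift(numberList.pop());
  }
  const firstThree = numberList.slice(0, 3);
  if (firstThree[0] === firstThree[firstThree.length - 1]) {
    firstThree.pop();
  }
  return firstThree;
}

// Hypothetical helper: true when both functions return the same array.
function sameOutput(input) {
  // Both functions mutate their argument, so each call gets its own copy.
  const fromObfuscated = a([...input]);
  const fromRefactored = processList([...input]);
  return JSON.stringify(fromObfuscated) === JSON.stringify(fromRefactored);
}

// Illustrative inputs covering empty, short, even-length, and odd-length arrays.
const samples = [
  [],
  [1],
  [1, 2, 1],
  [1, 2, 3, 4],
  [1, 2, 3, 4, 5],
  ['x', 'y', 'z', 'w', 'x'],
];

const allMatch = samples.every(sameOutput);
console.log(allMatch); // true
```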

