Overview

Requirements:

Correct
Efficient
Secure

Targeted/Intended program functionality vs possible behavior:

A program may behave beyond its intended functionality!

Possible Security Problems

Insecure implementation:
- Many Programs are not implemeted properly, allowing attacker (The person who invokes the process)
- Deviate from the programmer’s intents
Unanticipated input:
- The attacker may supply inputs in a form that is not anticipated by the programmer which can unintendedly cause process to: - Access sensitive resources
  - Execute some injected code; or
  - Deviate from the original intended execution path
    
    Either way, the attacker manages to elevate its privilege

Sample:

Buffer overflow:
- Morris worm
- Code red worm
- SQL Slammer worm
- Various attacks on game consoles so that unlicense software can run without the need for hardware
SQL injection
Integer overflow

Root causes of security Problems

Functionality: Still the primary concern during design and implementation
- Security is a secondary goal
- Features pays the bills (Typically)
Unavoidable human mistakes:
- (lack of) awareness of secuirty problems
- Careless programmers
Complex modern computing system:
- Many of the bugs are very simple and seem easy to prevent, but programs for complex system are large
- Large attack surface as wells

Computer architecture background

Code vs data, program counter

Modern computers are based on the von neumann computer architecture: - The code and data are stored together in the memory - There is no clear distinction of code and data - This is in contrast to the Harvard architecture, which has hardware components that separately store code and data

Serious implication:

Programs may be tricked into treated input data as code: Basis for all code-injection attacks!

Control flow

The program counter (Instruction pointer):
- A register (ie small & fast memory within the processor) that stored the address of the next instruction
After an instruction is completed, the processor fetches the next instruction from the address stored in the program counter
After the next instruction is fetched, the program counter automatically increase by 1 (Assuming a system with fixed-length instruction)
During execution, besides getting incremented, the program counter (PC) can also be changed, for example*, by:
1. Direct Jump: The PC is replaced with a constant value specified in the instruction
2. Indirect Jump: The PC is replaced with a value fetched from the memory or stored in a general purpose register (Note that there are many different forms of indirect jump)

Stack

Function: Break code into smaller pieces
- Can be called in many program location

How does the program knows where it should continue after it finished?

The location is stored on the call stack

Where are the function’s arguments and local variable stored?

On the call stack

Call stack

Call stack: A data structure in the memory (Not in a sep hardware) that stores important infomation of a running process
Last in, first out (LIFO): with push(), pop(), top() operations
The location of the top element is referred to by the stack pointer
During a program execution, a stack is used to keep tracks of:
- Control flow infomation: i.e address
- Parameter passed to functions
- Local variables of function
Each call of a function pushes an activiation record (stack frame) to stack, which contains:
- passed parameters,
- return address,
- pointer to the previous stack frame,
- adn function local variables

This is called call stack

Function call:

When test(5) is called, some values are pushed onto the stack
- Parameter
- Return address
- Prev frame pointer
- Local variable b
The control flows jump into the code of test
Execute test
After test is completed, the stack frame is popped from the stack
The control flows jumps into the restored return address

Control flow integrity

Treating code as data:

Call stack stores a return address as data in memory
Instruction itself is stored as data in memory
Flexibility of treating code as daata is good but many security issue
Attacker could compromise a process execution integrity by
- modifying the process code
- Modifying the control flow
Difficult for the system to distinguised those malicious pieces of code from benign data

Notes on memory integrity

Not so easy to comporomise memory
Some restrictions:
- The attacker can only write to a small number of memory
- Can write a seq of consecutive bytes
- Attacker has to trick the users

Attacks on software

Integer overflow

The integer arithmetic in many programming language are actually modulo arithmetic
Suppose a is a single byte unsigned integer.
- In the following C or java statements, what would be the final value of a?
  - a = 254
  - a = a + 2
Its value is -> since the addition is done in mod 256
Hence the following predicate is not always true
- a < a + 1

Data/string representation and security

Different parts if a program/system adopts different data representations
Inconsistencies could lead to vulnerability
String is a very important data representation type:
- It has variable length
- How can we represents a string?

String representations

In C, printf() adopts an efficient represention:
- The length is not explicitly stored
- The first occurance of the null character (Byte with value 0) indicated the end of the string, thus implicity giving the length

There is a security risk for strings such as strncpy because the user can fill up the entire buffer without putting a null at the end of the string

Null Byte Injection

A CA may accept a host name containing null character
For example: luminus.nus.edu.sg\0.attacker.com
The verifer who use both string representation convention to verify the cert might be vulnerable

ATTACK:

The attacker registered the following domain name, and purchased a valid certificate with the domain name from some CA: luminus.nus.edu.sg\0.attacker.com
The attacker set up a spoofed Luminus website on another web server
The attacker directed a victims to the spoofed web server (e.g by controlling the physical layer or social engineering)
When visiting the spoofed web server, the victim’s browser:
- Find the web server in the certificate is valid: based on the non NULL termination representation
- Compares anddisplay the address as luminus.nus.edu.sg: based on NULL termination represesntation

Comparision: Normal web spoofing attack

What if its jus normal web spoofing attack scenario?
Even if the attacker manages to redirect the victim to the spoofed webserver a careful user would notice that either
- The address displayed in the browsers address bar is not luminus
- The address displays but TLS/SSL authentication protocol reject the connection
  
  Hence the attack previously, is much more dangerous: It can trick all browsesr user!

UTF-8 Character exploits

UTF-8 is a character encoding capable of encoding all 1m valid code points in unicode using one to four 8 bit bytes

Variable length encoding: Code points that tend to occur more frequently are encoding with lower numerical values thus fewer bytes are used.

There is an inconsistency between

Character verification process
Character usuage: Operation using the character

Browser actually accept different types of representation for the backslash chara

Potential problem:

Files are typically organised inside a directory
Client can be any remote public user
Original intention: Client can retrieved only files under the directory
But attacker might send this string: ../cs2107report.pdf
This would be read by the server as /home/student/alice/public_html/../cs2107report.pdf
Violates the intended file access containment
Blacklisting based filtering might be useless due to the flexibility of the encoding
TO prevent this, server may add an input validation to ensure that ../ never appears as substring
- But remember there are many different ways of representing /

Example: IP Address

Vulnerable checks:

Get string s from user
Extract 4 int from string s, (a,b,c,d)
- If s doesnt follow correct input then quit
call BL() to check that abcd is not in the black list, else quit
Let ip = a * 2^24 + b * 2^16 + c * 2^8 + d where ip is 32 bit integer
Continue processing with filtered ip

Why is this vulnerable? Use a different representation of a,b,c,d since the decoding is different

Security guidelines

Never trust user input
ALways convert to standard representation immediately
Do not rely on the verification check done in the application
try to make use of the underlying system access control mechanism

Buffer overflow

C/C+ and memory access

ALlows the programmers to manage the memory: Pointer arithmetic, no array bounds checking
Flexibility is useful but prone to bugs which in turns lead to vulnerability

Buffer Overflow/Overrun

The previous example illustrates buffer overflow
A data buffer: “A contiguous region of memory use to temporarily store data, while its is being moved from one place to another” (Like a array of characters)
In general, a buffer overflow referes to a situation where data is written beyond a buffer boundary
strcpy() is prone to overflow

Avoid strcpy

Defence

Avoid strcpy, use strncpy
However, strncpy might be still prone to vulnerabilities

Stack smashing

Stack overflow
Special case of buffer overflow
return address is modify and the execution control flow will be changed
COuld also inkect the attackers shell code into the process memory and execute the shell code

Example

Vulnerable code:

Values are push into the stack
The buffer c grows towards the return address
If attacker manage to modify the return address, the control flow will jump to the address indicated by the attacker

If more characters are copied, the value will get overflow and the return address will get overwritten
It will also can jump to the executed code section

Code/script injection

Mixing code and data is unsafe
Many attacks inject malicous code as data which will then gets executed by the target system
Considered SQL injection attack
Scripting languages: Programming languages that can be interpreted by another program during runtime instead of being complied

SQL and Query

SQL is database query language
Code can be put as a string and executed in runtime
Consider a database (Which can be viewed as table): each column/field is associated with an attibute
Also allows variable:
- e.g a script may first get the user’s inputs and stores it in the variable $userinput and subsequently runs: SELECT * FROM Client WHERE name = ‘$userinput’

This can happen in login page, where the name is selected by the user

Example

In the example of the login page, web developers will take the user input and use the SELECT search query to retrieve the data

Attack can do this
- Origin: SELECT * FROM client WHERE '$userinput'
- Bob’ OR 1=1 –`
- this becomes SELECT * FROM client WHERE name = 'bob' OR 1=1 --
- All rows in this client table will be return this this case

Undocumented access points

For debugging, many programmers insert undocumented access point to inspect states
- Press certain key combo, values of certain variables will be display
- Certain input string would branch to debug mode
These access points may mistakenly remain in the final production system providing backdoors to the attackers
A backdoor: A covert method of bypassing normal authentication
Such access points are easter eggs
- Some are fun
- Some are put there by unhappy programmers
Important: Logic bombs, easter eggs, backdoors

Defence and preventive measures

Many bugs and vulnerability are due to programmer ignorance
It is diffucult to analysis a program to ensure that its bug free
There is no fool proof method

Input validation and using filtering

Filtering

Input must follow expected format
- Length
- Cannot contain certain meta characters
- Does not contain negative numbers
Filter the input or validate it

Problems:

Difficult to ensure filter is complete
- e.g UTF: Input validation intend to detect the substring
- There are multiple representations of “../” that the programmer is not afraid of
- A filter that completely blocks all bad inputs and accepts all legitimate inputs is very difficult to design

Approaches:

White list
- List of items are allowed to pass
- Regex
- Bad: Some legitamate input might be block
Black list
- A list of items that are known to be bad and has to be rejected
- To prevent SQL injects, if the input contains meta characters, reject it
- Malicious input may be passed

Which is more secure?

Safer functions

Avoid function that cause problems
For example, repolace strcpy with strncpy
There still might be a vul:
- combine strlen and strncpy

Bounds check and type safety

Bounds check:

Some programming language perform bounds checking at run time
Checks upper and lower bounds
process halt if both checks fail
reduces efficiency but prevent buffer overflow

Type safety:

Check to ensure args in operation get are always correct
e.g, if a= b but a is 8 bit while b is 64, type is wrong
Check done in runtime(dynamic type) or compiled (static type)

Memory protection

Canaries

Secret value inserted at carefully selected memory location at runtime
Checks are carried at at runtime to ensure that the values are not being modified
Canaries can detect overflow such as stack overflow:
- Typical bufferoverflow, consecutive mem location have to be over ran but canaries would be modified
Keep the values secret: If attacker knows, they can write the secret value to the canary while overrunning it
If the buffer was overflow, the canary would be overwritten

Another example

Buffer overflow occurs at strcpy
./program-too-wsp aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

If compiled with stack protector:

Stack smashing
Function exiting
Aborted (core dumpted)

Without stack protector:

Function exit
Seg fault but, the program might jump to a unwanted address

Example 2

./program-too-wsp aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

If compiled with stack protector:

Stack smashing
Function exiting
Aborted (core dumpted)

Without stack protector:

Function exit
Seg fault but, the program might jump to a unwanted address
main() is exiting is shown

Memory randominisation

ASLR (Address space layout randomisation) is a prevention technique that helps decrease the attacker’s chance
Randomly arranges the address space position of key data areas of a process including:
- THe base area of the executable and the position of the stack, heap and libraries

Code inspection

Manual checking: Manually check the progranm is tedious
Automated: Some automation and tools are posible
Taint analysis:
- Variables that contain input from potential malicious users are labeled as sources
- Critical functions are labeled as sinks
- Taint analysis checks whether any of the sinks arguments could potentially be affected by a source
- Source = user input
- Special check: Carried out
- Taint analysis can be static or dynamic
  Testing
Vulnerability can be discovered via testing
Types:
- White box testing: Tester has access to application source code
- Black box: Does not have access to source code
- Grey box: Combination above, reverse engineered binary/executable
Security testing attempts to discover intentional attack and hence would test for inputs that are rarely occured under normal circumstance
Examples: Long name, names containing numeric values, string containing meta characters
Fuzzing is a technique that sends the malformed inputs to discovered vulnerability:
- Some techniques are btter then this for random inputs
- Fuzzing can be automated or semi auto

Principle of least priviledge

When writing a prob, be conservative in elevating the priviledge
When deploying software system, do not give the users more access rights than neccessary and do not activate unnecesarry options
- Like setUID
  Terminology: Hardening
  - hardening is usually the process of securing a system by reducing its surface of vulnerability, which is larger when a system performs more functions;

Patching: Up to date

Life cycle of vulnerability:
1. Discovereed
2. Code fix
3. Revised tested
4. Patch made public
5. PAtch applied
Sometimes vulnerability can be announced without the techniqcal details
When patch release, the patch can be useful to attackers too: Inspect the patch and derive the vulnerability
Funnily, sometimes the number of successful attack increase when a patch is made
Cruicial to apply timely
Critical system: DOnt apply patch immediately before rigorous testing
Patches might affect the applications and thus affect an org operation

Overview

Possible Security Problems

Computer architecture background

Code vs data, program counter

Control flow

Stack

Call stack

Control flow integrity

Notes on memory integrity

Attacks on software

Integer overflow

Data/string representation and security

String representations

Null Byte Injection

UTF-8 Character exploits

Security guidelines

Buffer overflow

C/C+ and memory access

Buffer Overflow/Overrun

Defence

Stack smashing

Example

Code/script injection

SQL and Query

Example

Undocumented access points

Defence and preventive measures

Input validation and using filtering

Filtering

Safer functions

Bounds check and type safety

Memory protection

Canaries

Another example

Example 2

Memory randominisation

Code inspection

Testing

Principle of least priviledge

Patching: Up to date

Slides