Channel: Positive Technologies - learn and secure

Web Application Attack Statistics: Q2 2017

This report provides statistics on attacks performed against web applications during the second quarter of 2017. The data comes from pilot projects involving deployment of PT Application Firewall (PT AF), as well as Positive Technologies' own PT AF installations.

The report describes the most common types of attacks as well as the objectives, intensity, and time distribution of attacks. It also contains industry-by-industry statistics. With this up-to-date picture of attacks, companies and organizations can monitor trends in web application security, identify the most important threats, and focus their efforts during web application development and subsequent protection.

Automated vulnerability scanners (such as Acunetix) have been excluded from the data used here. The example attacks presented in this report have been manually verified to rule out false positives.

Protection data for Positive Technologies itself has been classified under the IT sector for reporting purposes.

Web application attacks: statistics


Attack types

In Q2 2017, Cross-Site Scripting was the most common type of attack. SQL Injection, used to access sensitive information or run OS commands for further penetration of a system, represented almost one fourth of the total number of attacks, the same as in the first quarter of 2017. Going forward, we expect that Cross-Site Scripting and SQL Injection will continue to make up at least half of all web application attacks. In addition, our list of frequent attacks for Q2 includes Information Leak and XML Injection, both of which entail disclosure of information.



Figure 1. Top 10 web application attacks

An interesting picture appears if we separate the attacked companies by sector. Companies included government entities, financial services companies, IT companies, educational and healthcare institutions, as well as energy and manufacturing companies.

As in the first quarter, a large portion of attacks on government entities were aimed directly at gaining access to data. Personal data is the most critical resource possessed by government entities, which is why attacks tend to focus either on databases or on application users directly. Although government websites are regarded by users as highly trustworthy, the users of these sites—more than in other sectors—are unlikely to know the basics of how to stay safe online. This makes government sites tempting targets for Cross-Site Scripting attacks, which can infect a user's computer with malware. Another common type of attack in Q2 was Information Leak, which exploits various web application vulnerabilities to obtain additional data about users, the system itself, and other sensitive information.


Figure 2. Top 5 attacks on web applications of government institutions

Attacks on healthcare were also mostly driven by theft of information: more than half of attacks were aimed at gaining access to data. Medical organizations have recently suffered from several major data leaks: for example, in May, the Dark Overlord hacking group posted the medical records of around 180,000 patients from three medical centers. Another incident occurred at a Lithuanian plastic surgery clinic: over 25,000 photos, including naked before and after pictures, were made public. Initially, the hackers demanded a ransom from both the clinic (equaling EUR 344,000) and its clients (up to EUR 2,000 from each to delete the data). One more company that suffered in May due to a web application vulnerability was Molina Healthcare, with about 5 million patient records made public.

Nearly a quarter of all attacks were aimed at denial of service. Modern healthcare web applications frequently give patients the opportunity to learn more about a clinic and its services, schedule an appointment or house call, buy an insurance policy or service package, and get advice online. For applications such as these, a successful DoS attack can not only damage a company's reputation and inconvenience patients, but also cause financial losses for the company.


Figure 3. Top 5 attacks on web applications of healthcare institutions

The most common attacks on IT companies, as in the first quarter, are Cross-Site Scripting and SQL Injection. If successful, such attacks may trigger significant reputational losses for IT companies in particular. SQL Injection can be used to obtain information as well as for other purposes, such as defacing websites. Cross-Site Scripting can be used to infect user workstations with malware.


Figure 4. Top 5 attacks on web applications of IT companies

Attacks on educational institutions are generally intended to access data (such as exam materials) or modify it (such as exam results). In Q2, more than half of attacks aimed to obtain access to information, Path Traversal being the most common method. Almost one in six attacks was targeted at OS Commanding.


Figure 5. Top 5 attacks on web applications of educational institutions

By contrast, in the case of energy and manufacturing companies, attackers’ objective is to obtain full control over company infrastructure. Therefore the most common attacks attempt to run arbitrary OS commands and gain control over the server or obtain information about the system; attacks on users are few and far between. By launching attacks against the target company’s internal network, an attacker can gain access to critical system components and interfere with operations.


Figure 6. Top 5 attacks on web applications of energy and manufacturing companies

The following screenshot gives an example of detection of remote command execution. The attempt involved an exploit of the CVE-2017-5638 vulnerability in Apache Struts, a free open-source framework used for creating Java web applications. The vulnerability allows attackers to execute arbitrary code on a server by changing the Content-Type HTTP header. This vulnerability became known to the public in March 2017 and the first attempts to exploit it against the web applications included in this report were recorded on April 3.


Figure 7. Example of attack detection: OS Commanding
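As an illustration of how such an attempt can be flagged, the sketch below applies a simple signature to the Content-Type header. This is a hypothetical rule written in Python, not PT AF's actual detection logic; a production WAF would use far more robust parsing.

```python
import re

# CVE-2017-5638 exploits smuggle an OGNL expression, such as
# %{(#_='multipart/form-data')...}, into the Content-Type header.
# A legitimate Content-Type value never contains expression markers
# like %{...} or ${...}, so their presence is a strong indicator.
OGNL_MARKER = re.compile(r"%\{.*\}|\$\{.*\}", re.DOTALL)

def is_suspicious_content_type(header_value: str) -> bool:
    """Flag a Content-Type header that looks like an OGNL injection."""
    return bool(OGNL_MARKER.search(header_value))
```

Such a signature flags a header like `%{(#cmd='whoami')}` while passing ordinary `multipart/form-data` values unchanged.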

Another example of OS Commanding demonstrates attackers’ efforts to exploit vulnerabilities not only in web applications, but in networking device firmware as well. Vulnerability CVE-2017-8220 was published on April 25 and attacks on devices started soon after on April 28.


Figure 8. Example of attack detection: OS Commanding

As these cases indicate, it may take only a few days for attackers to “weaponize” a newly published vulnerability. (More time may be required for exploiting more complex vulnerabilities.) Attackers primarily try to exploit vulnerabilities that have been discovered recently, because targets are less likely to have installed the corresponding updates.

Use of outdated software facilitates attackers’ activities, because the Internet is full of information about all known vulnerabilities as well as ready-made exploits for them. Attackers have multiple ways to find out which versions are in use on a particular system, whether by obtaining information with the help of application misconfiguration or by exploiting version-specific vulnerabilities. At one company hosting a Positive Technologies pilot project, an out-of-date Joomla version was in use. One attacker planned to take advantage of that with an exploit for a vulnerability discovered in 2015 that allows executing arbitrary code (CVE-2015-8562).


Figure 9. Example of attack detection: OS Commanding

In contrast to such one-off attempts, a dedicated attacker may employ an entire chain consisting of several targeted attacks against a single target. To prevent incidents, it is extremely important to quickly identify such chains and prevent them from progressing. For this purpose, a web application firewall should cross-check all events for correlations in real time. Attackers can disguise their activities in a number of ways, such as by using diverse hacking techniques, taking breaks between attacks, and changing their IP address. The following screenshots from the PT AF interface show an example of a detected SQL Injection attack chain. The chain included 38 related attacks, each of which was classified as having a high degree of risk.


Figure 10. Example of detected attack chain: SQL Injection


Figure 11. SQL Injection attacks that have been correlated into a single attack chain
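The correlation logic just described can be illustrated with a short sketch. The grouping criteria here (same target, bounded time gap between events, source IP deliberately ignored because attackers rotate addresses) are simplifying assumptions for illustration, not PT AF's actual algorithm.

```python
from datetime import datetime, timedelta

# Toy correlator: consecutive events against the same target belong to
# one chain when separated by less than GAP, regardless of source IP
# (attackers change addresses and take breaks between attacks).
GAP = timedelta(hours=6)  # illustrative threshold, not a PT AF setting

def correlate(events):
    """events: (timestamp, source_ip, target) tuples.
    Returns a list of chains, each a list of related events."""
    latest_chain = {}  # target -> most recent chain for that target
    chains = []
    for event in sorted(events):
        ts, _ip, target = event
        chain = latest_chain.get(target)
        if chain is not None and ts - chain[-1][0] <= GAP:
            chain.append(event)  # continues an existing chain
        else:
            chain = [event]      # starts a new chain
            latest_chain[target] = chain
            chains.append(chain)
    return chains
```

With this grouping, three probes hours apart from different IPs still land in one chain, while an event days later opens a new one.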

In terms of the average number of attacks per day, IT and government lead the pack. They were followed by healthcare, education, and energy and manufacturing companies. Compared to the previous quarters, the number of attacks on government web applications decreased. This trend is likely caused by the nature of the web applications included in this quarter’s research: most of the websites are intended to provide information and have no functionality of interest to attackers. Attacks on websites of manufacturing companies are generally targeted and are carried out by experienced hackers who act very carefully to escape notice. So despite the small number of attacks in this sector, these attacks are in fact the most dangerous ones.


Figure 12. Average number of attacks per day, by sector

In Q2, attackers showed more interest in attacks on application users. Most attacks were intended to access sensitive information.

As in Q1, hackers most frequently attacked the websites of government institutions and IT companies.

Attack trends


Let’s look at the distribution of attacks over time, specifically the number of attacks of each type encountered per day on average by a company. The following charts indicate the frequency and intensity of each web application attack method used by hackers. They also show which attacks are most common and which of them dominate malicious traffic, based on the number of requests sent by attackers.


Figure 13. Number of attacks per day, by type

Cross-Site Scripting attacks were consistently high throughout the quarter, with 100 to 250 of them recorded every day.

At 40 to 200 attacks per day, SQL Injection is highly visible on the chart as well. Attackers searching for vulnerabilities caused by insufficient filtering of SQL query input tend to probe intensively: the most powerful web application attack in Q2 was a search for SQL Injection vulnerabilities by bruteforcing all possible parameters, with a total of over 35,000 requests sent by the attacker.
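Such parameter bruteforcing is easy to spot in aggregate: a single source probes an abnormal number of distinct parameter names in a short period. A toy detector, with an illustrative threshold rather than a tuned production value:

```python
from collections import defaultdict

# Toy scan detector: an IP probing many *distinct* parameter names in
# one day is very likely bruteforcing for injectable inputs.
THRESHOLD = 100  # illustrative value, not a recommended setting

def scanning_ips(requests):
    """requests: iterable of (source_ip, parameter_name) pairs
    collected over one day. Returns the set of suspicious IPs."""
    probed = defaultdict(set)
    for ip, param in requests:
        probed[ip].add(param)
    return {ip for ip, params in probed.items() if len(params) >= THRESHOLD}
```

A scanner hammering 150 different parameters is flagged, while a user repeatedly requesting the same parameter is not.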

Information Leak demonstrated an upward trend caused by abrupt spikes in the number of malicious requests on certain days.

Overall, the average number of attacks of other types rarely exceeded 100 per day.

The following picture shows the overall intensity of attacks in Q2 for all industries, as measured by the average number of malicious requests per day directed at a company.

Figure 14. Distribution of attacks by day of week

Compared to the previous quarter, attackers became slightly less active. Web applications were hit by 300 to 800 attacks on average per day, dipping as low as 140 on the slowest day. The maximum number of attacks on a single company in a single day was 35,135, almost double the top value from the previous quarter. Practically all these attacks were from the same IP address. The attacker tried to find an SQL Injection vulnerability, apparently with the help of special scripts. The following figures show the number of attacks on this company on a timeline of three days, as well as the hour when the highest number of attacks was recorded.


Figure 15. High-intensity SQL Injection attack on May 3 (PT AF interface)


Figure 16. High-intensity SQL Injection attack during a single hour (PT AF interface)

The following chart shows the distribution of attacks by time of day, on average, for a single company. Data comes from all sectors, based on the local time of the company under attack.


Figure 17. Distribution of attacks by time of day: 0 = 12 a.m. (midnight), 12 = 12 p.m. (noon)

This picture resembles the one we had for the first quarter: the number of attacks is basically stable throughout the day, but increases during the afternoon and evening. As an example, below is a screenshot from the PT AF interface of one client company with data for April 17. We can see that attacks were mostly conducted in the afternoon, with the peaks corresponding to an increased number of requests.


Figure 18. Hour-by-hour graph of attacks on April 17, displayed in the PT AF interface

Such results, as in the previous quarter, are caused by the fact that users (who are the targets of around one third of attacks) are particularly active during these hours. Once again, we found that the intensity of attacks remains rather high both day and night.

One reason that may motivate attackers to strike at night is that the target’s security staff are less likely to notice, and therefore react to, an attack.

When designing corporate security measures, it is best to take into account the times during which attacker activity is at its peak. These times may be company- or sector-specific. While attack intensity was generally stable in Q2, certain time periods did see a rise in activity. Particularly when attacks are performed during non-working hours, timely reaction and prevention of incidents require smart web application protection tools, as well as qualified security incident reaction staff.

Conclusions

Attackers were consistently active throughout the entire period considered (Q2 2017). However, even these numbers, just as in the previous quarter, represent a slight drop compared to the number of attacks on web applications in 2016. Attempts to access sensitive information and attack web application users were the main techniques used. The websites of government institutions and IT companies are still the perennial “favorites” of attackers, and we forecast that the situation will remain the same in the next quarter. Moreover, we expect to see an increase in the number of attacks triggered by publication of new vulnerabilities in popular content management systems (such as Joomla).

After vulnerabilities have been detected and made public, many web applications remain vulnerable due to failure to stay up to date with system updates and patches. Our report clearly shows that attackers are quick to make use of newly published vulnerabilities, weaponizing them within days. Effective protection requires both timely software updates and proactive measures, such as a web application firewall, to detect and prevent attacks on web applications.


How hackers could negatively impact a country's entire economy


Despite enormous efforts, security is always a work in progress because of technical vulnerabilities and the human factor. In the modern digital economy, criminals are becoming ever more creative in finding ways to make off with millions without having to leave home. And the actions of cybercriminals could negatively impact a country's entire economy. Here are some scenarios of possible attacks.


Unchained malware 


We’ve already seen this with WannaCry-like epidemics. What is important here: even though the later NotPetya malware looked like ransomware, our analysis shows that monetization through ransom was not the main motivation for its creators. The malware was never designed to unlock victims' computers, even if they paid. So it is possible that NotPetya was used as a "smoke screen" to cover some other local operation, but the impact was international. In the future, such malware could devastate a country's economy even if that was never the intent.

Multi-stage bank attack 


The most infamous example is the Bangladesh Bank robbery in 2016, when instructions to steal almost a billion dollars from the central bank of Bangladesh were issued via the SWIFT network. A dangerous trend we have noticed recently is that hackers use multi-stage attacks passing through a number of organizations. An attack could start with a phishing email sent to an organization that is not financial itself but works with banks as a partner; once the contractor is hacked, the bank can be attacked from the contractor's accounts.

As our investigation shows, 75% of companies targeted by the Cobalt hacking group are from the financial sector, while 25% are banks' partners (including government, telecom, entertainment, and healthcare companies) used as stepping stones for further attacks on the financial sector.

Stock exchange attack 


We also see that the financial companies targeted by the Cobalt group now include not only banks but also financial exchanges, investment funds, and lenders. This widening range of targets suggests that attacks on diverse companies with major financial flows are underway. By attacking a financial exchange, a criminal group like Cobalt can "pump" or "dump" stocks, incentivizing the purchase or sale of shares in certain companies in a way that causes rapid fluctuations in share price. Such stock manipulations can affect the economies of entire countries.

These methods were employed by the Corkow group in its 2016 attack on Russia's Energobank, which caused a 15-percent swing in the ruble exchange rate and bank losses of RUB 244 million (over USD 4 million).

AI bots 


It's easy to blame hackers and criminals, but the fact is that the modern digital economy can be ruined "quite legally" with little human intervention. In 2013, the journal Nature published a research paper called "Abrupt rise of new machine ecology beyond human response time," which explains how high-frequency trading robots provoked the economic crisis of 2008.

Now, ten years later, bots have become cleverer and faster, while there are still no serious security rules or limitations on the development of machine intelligence. This could be a real danger. Human hackers usually don't want to shut down the entire financial system; they need it running so they can keep stealing money from it. Bots, on the other hand, don't care about humans or their financial systems at all. And bots don't have to sleep.

Critical KRACK Flaws in WPA Wi-Fi Security: Here’s How to Protect Yourself


Security researchers from the Belgian university KU Leuven revealed a key reinstallation attack (KRACK) vulnerability in the WPA2 Wi-Fi protocol. Using this flaw, an attacker within range of a person logged onto a wireless network could use key reinstallation attacks to bypass WPA2 network security and read information that should have been securely encrypted. What are the possible consequences of this revelation, and how can end users protect themselves?

Is Everything That Bad?


The short answer is ‘yes’. The vulnerabilities in WPA2 discovered by the researchers could potentially affect every Wi-Fi router, PC and mobile phone, since the flaws are in the Wi-Fi encryption protocol, and not specific to any device or other piece of hardware.

With this vulnerability, hackers can intercept any data a user types in after connecting to a Wi-Fi access point. That includes access credentials for email, websites and online banking resources, as well as credit card information, private messages and correspondence. Malicious actors can decrypt ALL traffic and read ALL of the information transmitted.

To take advantage of these flaws, a malicious actor would first need to be in close physical proximity to a vulnerable device. They can then exploit the WPA2 flaws to eavesdrop on network traffic, steal data such as credit card numbers, emails, and passwords, or even hijack connections and inject malware into websites. Moreover, hackers are able not only to steal data but to actively attack users, spreading malware, including crypto lockers, by injecting their code into HTTP traffic.

How To Protect Yourself


All users must install the appropriate security updates as soon as vendors release them. If updates are not yet available, use a VPN as a temporary security measure. This will reduce the likelihood of a successful attack.

Another option is to use only HTTPS-protected websites: the KRACK attack does not allow cybercriminals to decrypt HTTPS-encrypted traffic. To get at such traffic, hackers would need more sophisticated tools and attack methods, which could be spotted by users who are aware of basic information security measures.

Author: Leigh-Anne Galloway, Cyber Security Resilience Lead at Positive Technologies

A major flaw in a popular encryption library undermines security of millions of crypto keys



An international team of IT security researchers from Britain, Slovakia, the Czech Republic, and Italy found a critical vulnerability in RSA Library v1.02.013, a popular encryption library by Infineon. A weak factoring mechanism allows attackers to obtain secret crypto keys and use them for data breaches and theft.

This vulnerable library is used to secure national ID cards in various countries, as well as in popular software products used by both government and businesses.

What's wrong with it?


The weakness in the factoring mechanism enables attackers to deduce the secret part of any vulnerable crypto key using only the corresponding public key. Having obtained the secret key, the attacker may impersonate the key owner, decipher sensitive data, upload malicious code to software signed with the key, and bypass security on stolen PCs.

This vulnerable encryption library was developed by the German manufacturer Infineon, and the error has existed since 2012. The flaw is critical because the library is used by two international security standards, and therefore by many corporations and governments all over the world.

First implications


The researchers checked national ID cards of four countries and quickly found that cards of at least two countries—Estonia and Slovakia—relied on vulnerable 2,048-bit keys. Estonian authorities confirmed the vulnerability, stating that they had issued about 750,000 vulnerable cards since 2014. In 2015, an Ars Technica journalist obtained an Estonian e-residency card; the experiment showed that the key used in that card was subject to factoring.

In addition, Microsoft, Google, and Infineon warned that the weaknesses in the factoring mechanism may have a big impact on the effectiveness of security mechanisms embedded in TPM products. Ironically, such crypto chips are used to provide additional security for the users and organizations most vulnerable to hacking.

The researchers also checked 41 laptop models based on TPM chips and found Infineon's library in 10 of them. The vulnerability is most pronounced in TPM version 1.2, because the keys used to control the Microsoft BitLocker encryption feature are subject to factoring. This means that anyone who steals or obtains a vulnerable computer can overcome hard drive and boot loader security.

Further, the researchers detected 237 factorized keys that were used to sign software published on GitHub. The software includes quite popular packages. 

Among other findings, there were 2,892 PGP keys used to encrypt email correspondence, of which 956 were subject to factoring. According to the experts, most vulnerable PGP keys were generated with the Yubikey 4 USB device. Meanwhile, other functions of this USB key, including U2F authentication, contained no vulnerabilities.

Finally, the researchers managed to find 15 factorized keys used for TLS.  Most of them contained the word SCADA in the description string.

How to protect yourself


The researchers will present a full report on their findings at the ACM Conference on Computer and Communications Security. To give users enough time to replace their keys, no detailed description of the factoring method will be provided before the conference.

Still, the researchers have published a tool for detecting whether a given key was generated by the vulnerable library. For more details, please see their blog post. In addition, Infineon has released a firmware update fixing this vulnerability, and TPM producers are now working on their own patches.
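The core idea behind such a detection tool can be sketched compactly. Moduli produced by the vulnerable generator carry a telltale fingerprint: N mod p falls within the multiplicative subgroup generated by 65537 for many small primes p. The Python sketch below is a simplified approximation of that test (real detectors check dozens of primes to control the false-positive rate), not the researchers' actual tool:

```python
# Simplified sketch of the fingerprint test: primes from the vulnerable
# library have the form k*M + (65537^a mod M), so for many small primes
# p the modulus N satisfies N mod p ∈ <65537 mod p>. Five primes are
# enough to illustrate; real tools use far more.
SMALL_PRIMES = [11, 13, 17, 19, 37]

def subgroup(generator: int, prime: int) -> set:
    """Multiplicative subgroup generated by `generator` modulo `prime`."""
    elems, x = set(), 1
    while True:
        x = x * generator % prime
        if x in elems:
            return elems
        elems.add(x)

def has_fingerprint(n: int) -> bool:
    """True when n matches the vulnerable-generator fingerprint for
    every tested prime (i.e. n *may* come from the flawed library)."""
    return all(n % p in subgroup(65537, p) for p in SMALL_PRIMES)
```

A modulus that fails the test for even one prime cannot have come from the flawed generator; a modulus that passes for dozens of primes almost certainly did.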

The researchers have also contacted GitHub's administration, and the service is now informing users that they need to replace the keys used for signing software. In turn, Estonian authorities have closed their public key database, but we have seen no announcements regarding replacement of the vulnerable ID cards.

How-To: Obtaining Full System Access Via USB


Debugging mechanisms like JTAG (IEEE 1149.1) first appeared in the 1980s. Over time, microchip vendors extended the functionality of these interfaces. This allowed developers to obtain detailed information on power consumption, find bottlenecks in high-performance algorithms, and perform many other useful tasks.

Hardware debugging tools are also of interest to security researchers. These tools grant low-level system access and bypass important security protections, making it easier for researchers to study a platform's behavior and undocumented features. Unsurprisingly, these abilities have attracted the attention of intelligence services as well.

For many years only a limited audience had access to these technologies for Intel processors, due to the need to own expensive specialized equipment. But with Skylake processors, the situation changed in a big way: debugging mechanisms were built into the Platform Controller Hub (PCH), which opened up this powerful tool to ordinary users, including malicious ones, who could use it to gain total control over the processor. For security reasons, these mechanisms are not activated by default, but as we show in this article, they can be activated on equipment sold in common computer stores.
Read full research here: www.ptsecurity.com/upload/corporate/ww-en/analytics/Where-theres-a-JTAG-theres-a-way.pdf 

Do WAFs dream of static analyzers?


Virtual patching (VP) has been one of the most popular trends in application protection in recent years. Implemented at the level of a web application firewall, VP protects web applications against exploitation of previously identified vulnerabilities. (For our purposes, a web application firewall, or WAF, refers to a dedicated solution operating on a separate node between an external gateway and web server.)

In short, VP works by taking the results of static application security testing (SAST) and using them to create rules for filtering HTTP requests on the WAF. The problem, though, is that SAST and WAFs rely on different application presentation models and different decision-making methods. As a result, none of the currently available solutions do an adequate job of integrating SAST with WAFs. SAST is based on the white-box model, which applies formal approaches to detect vulnerabilities in code. Meanwhile, a WAF perceives an application as a black box, so it uses heuristics for attack detection. This state of affairs makes VP sub-optimal for preventing attacks when the exploitation conditions for a vulnerability go beyond the trivial http_parameter=plain_text_attack_vector.

But what if we could make SAST and a WAF "play nice" with each other? Perhaps we could obtain information about an application's internal structure via SAST but then make this information available to the WAF. That way we could detect attacks on vulnerabilities in a provable way, instead of by mere guessing.

Splendors and miseries of traditional VP


The traditional approach to automated virtual patching for web applications involves providing the WAF with information about each vulnerability that has been detected with SAST. This information includes:

  • Vulnerability class
  • Vulnerable entry point to the web application (full or partial URL)
  • Values of additional HTTP request parameters necessary for the attack
  • Values of the vulnerable parameter constituting the attack vector
  • Set of characters or words (tokens) whose presence in a vulnerable parameter will lead to exploitation of the vulnerability.

The set of HTTP request parameters and dangerous elements of a vulnerable parameter can be defined both by bruteforcing and by using a generic function (typically based on regular expressions). Let us look at a fragment of code from an ASP.NET page that is vulnerable to XSS attacks:


By analyzing this attack vector code, we can generate a symbolic formula for the set of attack vector values:

{condition = "secret" ⇒ param ∈ { XSShtml-text }}, where XSShtml-text is the set of possible vectors of an XSS attack in the context of TEXT, as described in the HTML grammar.

This formula may yield both an exploit and a virtual patch. The descriptor of the WAF virtual patch can be used to generate filtering rules to block all HTTP requests capable of exploiting the relevant vulnerability.
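A minimal sketch of this rule-generation step, using a hypothetical descriptor format (PT AF's real rule language is not shown in this article):

```python
import re

# Illustrative generation of a traditional virtual patch from a SAST
# finding: block requests to the vulnerable entry point whose vulnerable
# parameter contains any of the dangerous tokens.
def build_rule(descriptor):
    tokens = "|".join(re.escape(t) for t in descriptor["dangerous_tokens"])
    return {
        "url": descriptor["entry_point"],
        "param": descriptor["vulnerable_param"],
        "pattern": re.compile(tokens, re.IGNORECASE),
    }

def blocks(rule, url, params):
    """True if an HTTP request (URL plus parameter dict) must be blocked."""
    value = params.get(rule["param"], "")
    return url == rule["url"] and bool(rule["pattern"].search(value))
```

This captures the traditional approach in miniature: the rule fires only on the known-vulnerable URL and parameter, and only on the token list SAST reported, which is exactly why it misses vectors outside that list.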

Although this approach surely heads off certain attacks, it has some substantial drawbacks:

  • To demonstrate any given vulnerability, SAST needs to discover just one of the possible attack vectors. But to ensure true elimination of a vulnerability, it is necessary to address all possible attack vectors. Passing such information to the WAF is difficult, because the set of vectors is not only infinite but cannot even be expressed in regular expressions due to the irregularity of attack vector grammars. 
  • The same is true for values of all additional request parameters that are necessary for vulnerability exploitation.
  • Information regarding dangerous elements of a vulnerable parameter becomes useless if an attack vector, between the entry point and vulnerable execution point, undergoes intermediate transformations that change the context of its grammar or even its entire grammar (such as with Base64, URL, or HTML encoding, or string transformations). 

Due to these flaws, VP technology—which is designed for piecemeal protection—is incapable of offering protection against all possible attacks on SAST-detected vulnerabilities. Attempts to create such "all-encompassing" traffic filtering rules often lead to blocking of legitimate HTTP requests and disrupt operation of the web application. Let us slightly modify the vulnerable code:


The only difference from the previous example is that both request parameters now undergo a transformation, and the check on the "condition" parameter is weakened from strict equality with "secret" to mere inclusion of that substring. The attack vector formula, based on analysis of this new code, is as follows:

(CustomDecode condition) ⊃ "secret" ⇒ param ∈ (CustomDecode { XSShtml-text }).

The analyzer will derive a formula at the relevant computation flow graph (CompFG) node for the CustomDecode function to describe the Base64—URL—Base64 transformation chain:

(FromBase64Str (UrlDecodeStr (FromBase64Str argument))). 
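Because this formula is directly evaluable, a runtime engine can replay the decoding chain on concrete request values and only then test the exploitation condition. A Python sketch of that idea, with a deliberately crude stand-in for the XSS grammar check (a real engine would match against the full XSShtml-text grammar):

```python
import base64
from urllib.parse import unquote

# Evaluating (FromBase64Str (UrlDecodeStr (FromBase64Str argument)))
# at runtime, so the attack condition is checked against the *decoded*
# parameter values rather than raw traffic.
def custom_decode(argument: str) -> str:
    step1 = base64.b64decode(argument).decode()   # FromBase64Str
    step2 = unquote(step1)                        # UrlDecodeStr
    return base64.b64decode(step2).decode()       # FromBase64Str

def is_attack(condition_raw: str, param_raw: str) -> bool:
    """Exploitation condition from the modified code sample: the decoded
    "condition" must contain "secret", and the decoded "param" must be an
    XSS vector (simplified here to a <script tag check)."""
    try:
        condition = custom_decode(condition_raw)
        param = custom_decode(param_raw)
    except Exception:
        return False  # not decodable, so the formula cannot be satisfied
    return "secret" in condition and "<script" in param.lower()
```

No regular expression over the raw, doubly-encoded traffic could express this check, which is precisely the gap runtime virtual patching closes.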

It is still possible to build an exploit on the basis of such formulas (we have considered this issue in a previous article), but the classical approach to generating virtual patches cannot be applied here for the following reasons:

  • The vulnerability may be exploited only if the decoded "condition" parameter of the request contains the "secret" substring (line 12). However, this parameter's set of values is quite large, and expressing this set via regular expressions is infeasible due to the irregularity of decoding functions. 
  • A request parameter that is, in fact, an attack vector is also decoded (line 14). Therefore, SAST cannot describe that set of dangerous elements to the WAF. 

Since all the problems of traditional VP stem from the inability to interact with an application at the WAF level based on the white-box approach, the obvious solution is to implement this capability and make further improvements so that:

  • SAST provides the WAF with full information about all transformations to which a vulnerable parameter and variables of attack conditions are subjected, from entry point to vulnerable execution point. This enables the WAF to compute argument values based on the values of the parameters of a given HTTP request. 
  • For attack detection, heuristics are replaced with formal methods that are based on rigorous proof of all statements and describe the exploitation conditions for any particular vulnerability in the most general case, instead of haphazardly describing a limited number of cases. 
Thus was born runtime virtual patching.

Runtime virtual patching

Runtime virtual patching (RVP) is based on the computation flow graph model used in PT Application Inspector (PT AI). The model is built using abstract interpretation of an application's code, expressed in semantics similar to conventional symbolic computations. Nodes of this graph contain generating formulas in the target language. The formulas yield the set of all allowable values associated with all data flows at the relevant execution points:


These flows are called execution point arguments. CompFG is evaluable, and thus able to compute sets of specific values for all arguments at any execution point, based on the values that have been set for input parameters.

RVP occurs in two stages, which correspond to the application lifecycle: Deployment (D) and Run (R).


Deployment stage


Before a new version of an application is deployed, the application is analyzed by PT AI. Three formulas are computed for each CompFG node that describes a vulnerable execution point:

  • Conditions for reaching the vulnerable execution point
  • Conditions for reaching values of all its arguments
  • Sets of values of all arguments and corresponding grammars 

All formula sets are grouped by the application entry point to whose control flow the vulnerability relates. The very notion of entry point is specific to each web framework supported by PT AI and is defined in the analyzer's database.

Then a report containing the list of vulnerabilities and related formulas is extracted in the form of code written in a special language based on S-expression syntax. This language describes CompFG formulas in a form that does not depend on the target language. For instance, the formula describing the value of an argument for the vulnerable point in the above code sample is as follows:

(+ (+ "Parameter value is `" (FromBase64Str (UrlDecodeStr (FromBase64Str (GetParameterData (param)))))) "`")

The formula for reaching the vulnerable point is:

(Contains (FromBase64Str (UrlDecodeStr (FromBase64Str (GetParameterData (condition))))) "secret").
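To make this concrete, here is a minimal Python sketch (the function names are hypothetical, not PT AF's actual API) of computing this reachability formula for a concrete HTTP request:

```python
import base64
from urllib.parse import unquote

# Hypothetical interpreters for the formula's primitives.
def from_base64_str(s: str) -> str:
    return base64.b64decode(s).decode("utf-8", errors="replace")

def url_decode_str(s: str) -> str:
    return unquote(s)

def reach_condition(params: dict) -> bool:
    # (Contains (FromBase64Str (UrlDecodeStr (FromBase64Str
    #     (GetParameterData (condition))))) "secret")
    raw = params.get("condition", "")
    decoded = from_base64_str(url_decode_str(from_base64_str(raw)))
    return "secret" in decoded

# "secret" encoded Base64 -> URL -> Base64, i.e. the inverse of CustomDecode:
inner = base64.b64encode(b"secret").decode()       # 'c2VjcmV0'
outer = base64.b64encode(inner.encode()).decode()  # 'YzJWamNtVjA='
print(reach_condition({"condition": outer}))       # True
```

Note that the resulting value matches the `condition` parameter in the HTTP request shown later in this article.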

The report is then uploaded to PT Application Firewall (PT AF). On the basis of the report, a binary module is generated, which can compute all the formulas contained in the report. For example, the decompiled code for computing the condition for reaching the above-mentioned vulnerable point is as follows:


To make formula computation possible, PT AF must have one of the following:

  • A pre-computed database of all functions that may occur in the report
  • An isolated sandbox with runtime environment for the language or platform on which the web application runs (such as CLR, JVM, or PHP, Python, or Ruby interpreter), and libraries used in the application 

The first method ensures maximum speed but requires a huge amount of manual work by the WAF developers to describe the pre-computed database (even if the scope is restricted to standard library functions). The second method allows computing any function that may occur in the report, but increases the time needed to process each HTTP request, because the WAF must access the runtime environment to compute each function. The most practical solution is to use the first approach for the most common functions and the second approach for the rest.
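The first approach can be sketched as a dispatch table mapping formula symbols to native implementations, falling back to an "unknown" marker when a function is absent (a simplified model; real formulas are compiled into a binary module):

```python
import base64
from urllib.parse import unquote

# Hypothetical pre-computed database: formula symbol -> native implementation.
PRECOMPUTED = {
    "FromBase64Str": lambda s: base64.b64decode(s).decode("utf-8", errors="replace"),
    "UrlDecodeStr": unquote,
    "Contains": lambda s, sub: sub in s,
    "+": lambda a, b: a + b,
}

UNKNOWN = object()  # marker for functions the WAF cannot compute

def evaluate(sexpr, params):
    """Evaluate a formula given as nested tuples, e.g. ("Contains", ..., "secret")."""
    if isinstance(sexpr, str):
        return sexpr                      # literal
    head, *args = sexpr
    if head == "GetParameterData":
        return params.get(args[0], "")    # read the HTTP request parameter
    fn = PRECOMPUTED.get(head)
    if fn is None:
        return UNKNOWN                    # not in the database: flag "unknown"
    vals = [evaluate(a, params) for a in args]
    if any(v is UNKNOWN for v in vals):
        return UNKNOWN                    # unknowns propagate bottom-up
    return fn(*vals)

formula = ("Contains",
           ("FromBase64Str", ("UrlDecodeStr", ("FromBase64Str",
               ("GetParameterData", "condition")))),
           "secret")
print(evaluate(formula, {"condition": "YzJWamNtVjA="}))  # True
```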

It is quite possible for a formula to contain a function that the analyzer cannot process (for instance, calling a method that involves a missing project dependency or native code) and/or a function that PT AF is unable to compute (for instance, a function for reading data from external sources or the server environment). Such functions are flagged "unknown" in formulas and processed in a special way as described below.

Run stage


At the run stage, the WAF delegates processing of each HTTP request to the binary module. The module analyzes a request and detects the relevant entry point in the web application. For this point, formulas of all detected vulnerabilities are selected and then computed in a specific way.

First, formulas are computed for both conditions: 1) reaching the vulnerable point and 2) reaching values of all its arguments. In each formula, variables are substituted with values of the relevant request parameters, after which the formula value is computed. If a formula contains expressions that are flagged "unknown", it is processed as follows:


  • Each "unknown" flag spreads bottom-up through the formula expression tree until a Boolean expression is found. 
  • In the formula, such expressions ("unknown" regions) are substituted with Boolean variables, and the Boolean satisfiability problem is solved for the resulting formula. 
  • From each solution found in the previous step, the possible values of the unknown regions are substituted into the assumption formula, generating n candidate conditions. 
  • The value of each formula is computed. If at least one formula is satisfiable, the assumption is deemed satisfiable as well. 
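The steps above can be sketched as brute-force enumeration (a toy stand-in for a real SAT solver; for the handful of unknown regions that typically occur, exhaustive substitution suffices):

```python
from itertools import product

def assumption_satisfiable(formula, n_unknowns):
    """Substitute each "unknown" Boolean region with a variable and try
    every assignment; the assumption is satisfiable if any assignment
    makes the formula true."""
    if n_unknowns == 0:
        return bool(formula(()))
    return any(formula(bits) for bits in product([False, True], repeat=n_unknowns))

# Toy assumption: <known condition> AND <one unknown Boolean region>.
print(assumption_satisfiable(lambda u: True and u[0], 1))   # True (u[0]=True works)
print(assumption_satisfiable(lambda u: False and u[0], 1))  # False for any u[0]
```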

If computations show that the assumption is false, then the HTTP request in question cannot lead the application to a vulnerable point even with dangerous values of all request arguments. In this case, RVP simply returns request processing to the WAF's core module.

If attack conditions are satisfiable, the value of the argument of the vulnerable point is then computed. The algorithms used depend on the vulnerability class to which the analyzed point belongs. Their only common feature is the logic for processing formulas that contain unknown nodes: unlike assumption formulas, argument formulas with unknown nodes cannot be computed at all, which is immediately communicated to the WAF before processing moves on to the next vulnerable point. To make this concrete, we shall now review the most complicated algorithm, which is used for detecting injection attacks.

Detecting injections


Injections include any attacks that target the integrity of text written in a formal language (including HTML, XML, JavaScript, SQL, URLs, and file paths) on the basis of data controlled by the attacker. The attack is carried out by passing specifically formed input data to the application. When this data is "plugged in" to the target text, the boundaries of the token are exceeded and the text now includes syntactic constructions not intended by the application logic.

If a vulnerable point belongs to this attack class, its argument value is determined using incremental computation with abstract interpretation using taint analysis semantics. The idea behind this method is that each expression is computed separately, from bottom to top, while the computation results obtained at each step are additionally marked with taint intervals, given the semantics of each function and rules of traditional taint checking. This makes it possible to pinpoint all fragments that are the result of transformation of input data (tainted fragments).
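Under these assumptions (taint intervals attached to strings, whole-result tainting for decoding functions), the bottom-up computation for the example request can be sketched as follows; all helper names are invented:

```python
import base64
from urllib.parse import quote, unquote

class Tainted:
    """A string plus a list of (start, end) intervals marking fragments
    derived from user input -- a toy stand-in for taint semantics."""
    def __init__(self, text, spans):
        self.text, self.spans = text, spans

def t_concat(a, b):
    # Concatenation shifts the right operand's taint intervals.
    shift = len(a.text)
    return Tainted(a.text + b.text,
                   a.spans + [(s + shift, e + shift) for s, e in b.spans])

def t_apply(t, fn):
    # A decoding step keeps the whole result tainted if any input was.
    out = fn(t.text)
    return Tainted(out, [(0, len(out))] if t.spans else [])

b64dec = lambda s: base64.b64decode(s).decode()

# Build the request parameter: payload -> Base64 -> URL-encode -> Base64.
payload = "<script>alert(1)</script>"
outer = base64.b64encode(
    quote(base64.b64encode(payload.encode()).decode()).encode()).decode()

# Abstract interpretation of the argument formula, bottom to top.
param = Tainted(outer, [(0, len(outer))])          # wholly user-controlled
decoded = t_apply(t_apply(t_apply(param, b64dec), unquote), b64dec)
full = t_concat(t_concat(Tainted("Parameter value is `", []), decoded),
                Tainted("`", []))
print(full.text)    # Parameter value is `<script>alert(1)</script>`
print(full.spans)   # [(20, 45)] -- the tainted fragment
```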

For instance, for the code above and the following HTTP request parameter ?condition=YzJWamNtVjA%3d&param=UEhOamNtbHdkRDVoYkdWeWRDZ3hLVHd2YzJOeWFYQjBQZyUzRCUzRA%3d%3d, the result of applying this algorithm to the formula of a vulnerable point argument is as follows (tainted arguments are marked in red):


The value is then tokenized in accordance with the grammar of the vulnerable point argument. If any tainted fragment matches more than one token, this is a formal sign of an injection attack (based on the definition of injection given at the beginning of this section).
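A toy illustration of this check, with a deliberately rough HTML lexer standing in for the real grammar of the vulnerable point's argument:

```python
import re

def html_tokens(text):
    """Very rough HTML lexer: tags vs. text runs (toy grammar)."""
    return [(m.start(), m.end()) for m in re.finditer(r"<[^>]*>|[^<]+", text)]

def is_injection(text, tainted_spans):
    """Formal check from the definition above: a tainted fragment that
    overlaps more than one token has exceeded its token's boundaries."""
    tokens = html_tokens(text)
    for ts, te in tainted_spans:
        overlapped = [t for t in tokens if t[0] < te and ts < t[1]]
        if len(overlapped) > 1:
            return True
    return False

benign = "Parameter value is `hello`"
attack = "Parameter value is `<script>alert(1)</script>`"
print(is_injection(benign, [(20, 25)]))   # False: taint stays inside one text token
print(is_injection(attack, [(20, 45)]))   # True: taint spans tag and text tokens
```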


Once formulas have been computed for all vulnerabilities pertaining to the current entry point, request processing is passed on to the WAF's core module together with detection results.

RVP advantages and specific features


This approach to application protection based on code analysis has a range of substantial advantages as compared to traditional VP:

  • The shortcomings of traditional VP are addressed, thanks to the formal approach described above and the ability to take into account any and all intermediate transformations.
  • The formal approach also completely rules out the possibility of false positives, so long as the formulas do not contain unknown nodes.
  • There is no adverse impact on web application functionality, because protection is built on the functions of the application, as opposed to simply trying to work around them. 

For testing the technology and confirming its effectiveness, we have developed a prototype of an integration module for PT Application Inspector and PT Application Firewall, in the form of a .NET HTTP module for IIS web server. A video of the prototype handling the code example above is on YouTube. Performance tests on around fifteen open-source content management systems (CMSs) have shown great results: the time required for processing HTTP requests with RVP is comparable to the time that it takes to process such requests with traditional (heuristic) WAF methods. The average performance hit for web applications was as follows:

  • 0% for requests that do not lead to a vulnerable point
  • 6–10% for requests that lead to a vulnerable point and are not an attack (depending on complexity of the grammar of the vulnerable point)
  • 4–7% for requests that lead to a vulnerable point and are an attack

Despite obvious advantages over traditional VP, RVP still has several conceptual shortcomings:

  • It is not possible to compute formulas that contain data from external sources absent on the WAF (including file resources, databases, and server environment).
  • The quality of formulas directly depends on the quality of approximation of code fragments during analysis (including loops, recursion, and calls to external library methods).
  • To describe semantics of transformation functions for the pre-compute database, some engineering work from the developers is required. The description process is difficult to automate and is prone to human error. 

However, we have managed to mitigate these weaknesses by offloading some RVP functions to the application and by applying the technologies that underlie runtime application self-protection (RASP).

To be continued in Part 2. 

Author: Vladimir Kochetkov

Intel fixes vulnerability found by Positive Technologies researchers in Management Engine


Intel has issued a security advisory and released a patch for a vulnerability discovered in Intel ME by Positive Technologies researchers Mark Ermolov and Maxim Goryachy. Intel has also published a downloadable detection tool so that administrators of Windows and Linux systems can determine whether their hardware is at risk.

Intel Management Engine is a proprietary dedicated microcontroller integrated into the Platform Controller Hub (PCH) with a set of built-in peripherals. Since the PCH is the conduit for almost all communication between the CPU and external devices, Intel ME has access to practically all data on the computer. The researchers found a flaw that allows running unsigned code on the PCH on any chipset for Skylake processors and later.

For example, attackers could target computers with a vulnerable version of Intel ME in order to implant malware (such as spyware) in the Intel ME code. This would be invisible to most traditional protection methods, since the malware is running on its own microprocessor—one separate from the CPU, which is used to run most operating systems and security software. Because the malware would not visibly impact performance, victims may not even suspect the presence of spyware that is resistant both to OS reinstalls and BIOS updates.

The Intel security advisory gives a full list of vulnerable processors:

  • 6th, 7th & 8th generation Intel Core processor family
  • Intel Xeon Processor E3-1200 v5 & v6 product family
  • Intel Xeon Processor Scalable family
  • Intel Xeon Processor W family
  • Intel Atom C3000 processor family
  • Apollo Lake Intel Atom Processor E3900 series
  • Apollo Lake Intel Pentium
  • Celeron N and J series processors

Maxim Goryachy, one of the Positive Technologies researchers, described the situation as follows: “Intel ME is at the core of an enormous number of devices worldwide. This is why it was so important to test its security. This subsystem sits below the OS and has access to a huge range of data. An attacker could use this privileged access to evade detection by traditional protection tools, such as antivirus software. Our close partnership with Intel was aimed at responsible disclosure, as part of which Intel has taken preventive measures, such as creating a detection tool to identify affected systems. We recommend reading the Intel security advisory for full details. We plan to present this vulnerability in greater detail in an upcoming talk at Black Hat Europe.”

Positive Technologies on GitHub


An increasing number of companies, such as Google, Microsoft, Facebook, and JetBrains, are releasing the code of both small and large projects into open access. Positive Technologies is known not only for its skilled IT security professionals but also for its many professional developers. This enables us to contribute to the further development of open-source projects.

PT has the following GitHub groups that support our open projects:


Below, we give a detailed description of the first group and its projects, and a brief description of the others.

Contents

Groups


Positive Technologies

This is the main group, where we develop projects designed for open access from the start as well as projects that used to be internal. Educational and demo projects are also run here.

Open DevOps community

The community aims to build ready-made open solutions for managing the full cycle of development, testing, and related processes, as well as product delivery, deployment, and licensing.

Currently, the Community is at an early stage of development but it already provides some useful Python-based tools. Yes, we do love Python!

Active projects:

  1. Crosspm is a universal package manager for downloading packages to assemble multi-component products, based on rules set out in a manifest.
  2. Vspheretools is a tool for controlling vSphere machines straight from the console. It can also be used as an API library in your Python scripts.
  3. YouTrack Python 3 Client Library is a Python client for the YouTrack API.
  4. TFS API Python client is a Python client for the Microsoft Team Foundation Server API.
  5. A Python client for Artifactory is a Python client for the API of the Artifactory binary data storage.
  6. FuzzyClassificator is a universal neuro-fuzzy classifier of arbitrary objects whose properties can be measured on a fuzzy scale.

Each tool has an automated Travis CI build that uploads it to the PyPI repository, where it can be found and installed with a standard pip install.

Some other tools are getting ready to be published:

  1. CrossBuilder is a system for cross-platform builds (build as code). It is similar to Travis CI but independent of the CI system used (TeamCity, Jenkins, GitLab CI, etc.).
  2. ChangelogBuilder generates release notes describing changes to product features. The generator receives and aggregates data from various trackers (TFS, YouTrack, GitLab, etc.).
  3. polyglot.sketchplugin is a plugin for the Sketch app, used by designers, that simplifies working with multilingual text.

Everybody is welcome to contribute a new tool. If you wish to create your own project, please see our ExampleProject for the project structure and detailed guidelines on creating a project. All you need to do is copy ExampleProject and build your own project on its basis. If you have any ideas or tools for automation, you are welcome to share them with the community under an MIT license. This is fashionable, honorable, and prestigious :)

Positive research

This is a repository for publishing research articles, presentations, utilities (including those for detecting vulnerabilities), signatures, and methods of attack detection.

  • Pentest-Detections is a utility for quick network scanning (with IPv4 and IPv6 support) and detection of vulnerabilities exploitable by WannaCry or NotPetya. 
  • UnME11 is a set of tools for decoding the latest versions of Intel ME 11.x. 
  • Bad_Tuesday_Cryptor_SIEM is a MaxPatrol SIEM package for combating NotPetya. 
  • Me-disablement contains methods for disabling Intel ME. The repository contains only the old method. For a new method based on the High Assurance Platform (HAP), please see our article Disabling Intel ME 11 via undocumented mode. 

AttackDetection

To this repository, the attack detection team contributes rules for detecting vulnerability exploitation with the Snort and Suricata intrusion detection systems. The project's main goal is to create rules for vulnerabilities that are widespread and have high severity. The repository contains files for integration with oinkmaster, a script for updating and deploying rules in a designated IDS, as well as traffic files for testing the rules. Notably, the repository has been starred 100 times, and rules for about 40 new vulnerabilities have been added over the year, including BadTunnel, ETERNALBLUE, ImageTragick, EPICBANANA, and SambaCry. Announcements about new vulnerabilities are published on Twitter.

Positive JS

This is a community for developing tools (mainly web tools) used in PT products.

LibProtection

This is an organization uniting members of the Positive Development User Group who are currently adapting the LibProtection library to various languages and platforms. The library provides developers with safe tools for working with strings, ensuring sanitization of input data and automated protection of applications against injection attacks.

Projects


PT.PM

PT Pattern Matching Engine is a universal signature-based code analysis engine that accepts user patterns written in a domain-specific language (DSL). This engine is used to check web applications for vulnerabilities in Approof, as well as in the PT Application Inspector source code analyzer.

The analysis includes several stages:

  1. Parsing of the source code into the parse tree.
  2. Converting the tree into a unified format.
  3. Comparing the tree with user patterns.

The approach used in this project makes it possible to unify the development of universal patterns across different languages.

PT.PM uses continuous integration, with builds and tests of its modules on both Windows and Linux (Mono). Development is driven by labeled issues and pull requests. Alongside development, we also document the project and publish the results of all major builds both as NuGet packages and as raw artifacts. The way PT.PM is organized can serve as an example for future projects.

For the first stage, source code parsing, we use parsers based on ANTLR. The tool generates them for different runtimes on the basis of formal grammars contained in the repository, which our company is actively developing. Currently, the Java, C#, Python 2 and 3, JavaScript, C++, Go, and Swift runtimes are supported; support for the last three was added only recently.

Notably, ANTLR is used not only in PT application security projects but also in MaxPatrol SIEM, where it processes the domain-specific language used to describe dynamic asset groups. This knowledge exchange has prevented wasting time on previously solved tasks.

ANTLR grammars


Positive Technologies has helped to develop and improve grammars for PL/SQL, T-SQL, MySQL, PHP, Java 8, JavaScript, and C#.

PL/SQL

PL/SQL has a vast syntax with lots of keywords. Fortunately, a PL/SQL grammar already existed for ANTLR 3, and porting it to ANTLR 4 was not that difficult.

T-SQL

No reliable parsers were found for T-SQL, not even open-source ones, so it took us quite a lot of time and effort to reconstruct the grammar from MSDN documentation. In the end, we achieved a great result: the grammar covers many common syntactic constructions, looks neat, stays independent of the runtime, and has been tested (see the examples of SQL queries on MSDN). Since 2015, over 15 external users have contributed to the grammar. It is also now used in DBFW, a prototype network firewall for database management systems and a subproject of PT Application Firewall.

MySQL

The grammar was developed by the team mentioned above on the basis of T-SQL. It is also used in DBFW.

PHP

This grammar was translated from Bison to ANTLR. It is interesting for its simultaneous support of PHP, JavaScript, and HTML. To be more precise, JavaScript and HTML code sections are parsed as text, which is then processed by the parsers for those languages.

Java

The grammar supporting Java 8 was developed quite recently, based on the grammar for the previous version, Java 7. The new grammar introduces substantially expanded and improved test examples covering various syntax features (AllInOne7.java, AllInOne8.java) and performance test results for popular Java projects (including jdk8, Spring Framework, and Elasticsearch).

JavaScript

It was developed on the basis of the old ECMAScript grammar without strict compliance with the standard. When developing grammars, we focus primarily on practicality and simplicity rather than formal compliance. Another distinctive feature is almost full support of ECMAScript 6, as well as of outdated constructions (HTML comments, CDATA sections).

Not all syntax constructions can be described with grammar rules only. In some cases, it is convenient and important to use the code on a target runtime language. For instance, in JavaScript the token get is just an identifier in some cases while in other cases it can be a keyword describing a property getter. So, it is possible to parse this token as a common identifier and check token values in the parser when processing the property:

getter
    : Identifier{p("get")}? propertyName
    ;

This grammar is interesting because these code fragments are universal at least for C# and Java runtimes thanks to the superClass option.

This means that, in the C# code, the function p("get") is described in the parent class JavaScriptBaseParser.cs:

protected bool p(string str) => _input.Lt(-1).Text.Equals(str);

As for Java, this function looks as follows (JavaScriptBaseLexer.java):

protected boolean p(String str) {
    return _input.LT(-1).getText().equals(str);
}

C#

Being mostly experimental, this grammar was created to compare the speed of ANTLR and Roslyn parsers.

Developments and prospects


For more details on grammar development, please see our last year's article Theory and Practice of Source Code Parsing with ANTLR and Roslyn.

As can be seen from the change history and the numerous merged pull requests (tsql, plsql, mysql), these grammars are constantly being improved, not only by Positive Technologies but also by third-party developers. Over the course of this collaboration, the repository has grown not only in quantity but also in quality.

PT.SourceStats

PT.SourceStats collects statistics on projects written in different programming languages and is used in the free product Approof.

AspxParser

This project develops a parser for ASPX pages, used not only in the open PT.PM engine but also in our internal analyzer of .NET applications (AI.Net), which is based on abstract interpretation of code.

FP Community rules

In this repository, we develop rule sets in the YARA format. These rule sets are used in FingerPrint, the signature analysis module of Approof.

The FingerPrint engine runs against a set of source code (backend and frontend). Following the rules described, YARA searches for known versions of external components (for example, a version 3 bla-bla library). The rules are written so that they contain signatures of vulnerable library versions, with the problems described in text format.

Each rule includes several conditions for checking a file, for instance, that it contains certain strings. If the file meets the conditions, Approof includes in the final report information about vulnerabilities found in the component, indicating its version and describing all related CVEs.

Education and demo projects

At PHDays VII, the Appsec Outback master class was held in the PDUG section. For the master class, we developed educational and demo versions of the Mantaray static code analyzer and the Schockfish network firewall. These projects include all the main mechanisms used in mature security tools. Unlike mature tools, however, their main goal is to demonstrate algorithms and security methods, help users understand the process of application analysis and protection, and show the fundamental capabilities and limitations of the technologies.

The repository also contains examples of security tools implementation:

  • DOMSanitizer — a module for detecting XSS attacks against web browsers
  • DOMParanoid — a module (security linter) for assessing HTML security

License


Our projects use both permissive licenses (MIT, Apache) and our own license, which allows free non-commercial use.

Conclusion


Our move to GitHub has proved quite useful and has given us experience in various areas: setting up DevOps for Windows and Linux, writing documentation, and development.

Positive Technologies plans to expand Open Source projects even further.

Author: Ivan Kochurkin, Positive Technologies

Recovering Huffman tables in Intel ME 11.x

Today at the Black Hat conference in London, Positive Technologies expert Dmitry Sklyarov will explain how Intel ME 11.x stores its state on flash, as well as the other types of file systems supported by ME 11.x. Here is his article on recovering Huffman tables in Intel ME 11.x.


Many Intel ME 11.x modules are stored in Flash memory in compressed form [1]. Two different compression methods are used: LZMA and Huffman encoding [2]. LZMA can be decompressed using publicly available tools [3], but reversing Huffman encoding is a much more difficult challenge. Unpacking of Huffman encoding in ME is implemented in hardware, so writing the equivalent code in software is a far from trivial task—assuming it can be accomplished at all.

Previous ME versions

By reviewing sources from the unhuffme project [4], it is easy to see that previous versions of ME had two sets of Huffman tables, with each set containing two tables. The existence of two different tables (one for code and the other for data) is probably due to the fact that the code and data have very different statistical properties.

Other observed properties include:

  • The range of codeword lengths is different among the sets of tables (7–19 versus 7–15 bits inclusive).
  • Every sequence encodes a whole number of bytes (from 1 to 15 inclusive).
  • Both sets use Canonical Huffman Coding [5] (which allows quickly determining the length of a codeword during unpacking).
  • Within a given set, the lengths of encoded values for any codeword are the same in both tables (code table and data table).

Our task

We can assume that Huffman tables for ME 11.x have retained the latter three properties. So to fully recover the tables, what we need to find is:

  • Range of lengths of codewords
  • Shape (boundaries of values of codewords of the same length)
  • Lengths of encoded sequences for each codeword
  • Encoded values for each codeword in both tables

Splitting compressed data into pages

To learn about individual modules, we can start with what we already know about the internal structures of the firmware [6].

By looking at the Lookup Table, which is a part of the Code Partition Directory, we can determine which modules are affected by Huffman encoding, where their unpacked data starts, and what size the module will be once unpacked.

We can parse the Module Attributes Extension for a specific module to extract the size of the compressed/unpacked data and SHA256 value for the unpacked data.

A cursory look at several ME 11.x firmware images shows that the size of data after Huffman decoding is always a multiple of the page size (4096 == 0x1000 bytes). The start of the packed data contains an array of four-byte integer values; the number of array elements corresponds to the number of pages in the unpacked data.

For example, for a module measuring 81920 == 0x14000 bytes, the array will occupy 80 == 0x50 bytes and will consist of 20 == 0x14 elements.


The two most significant bits of each of the Little Endian values contain the table number (0b01 for code, 0b11 for data). The remaining 30 bits store the offset of the start of the compressed page relative to the end of the offset array. In the screenshot above, we see information describing 20 pages:



Offsets for the packed data for each page are arranged in ascending order; the size of the packed data for each page does not come into play directly. In the example above, the packed data for each particular page starts at a boundary that is a multiple of 64 = 0x40 bytes, with unused regions filled with zeroes. But judging by the other modules, we see that such alignment is not mandatory. This gives reason to suspect that the unpacker stops when the amount of data in the unpacked page reaches 4096 bytes.

Since we know the total size of the packed module (from the Module Attributes Extension), we can split the packed data into separate pages and work with each page individually. The start of the packed page is defined in the array of offsets; the page size is defined by the offset of the start of the next page or total size of the module. The packed data may be padded with an arbitrary number of bits (theoretically these bits could have any value, but in practice are usually zeros).
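The layout described above can be sketched in Python (a simplified reader, assuming the offset array starts at offset 0 of the packed module):

```python
import struct

def split_pages(blob: bytes, unpacked_size: int):
    """Split a Huffman-compressed ME module into per-page chunks.
    Layout (from the text above): an array of little-endian uint32s, one
    per 0x1000-byte unpacked page; the top 2 bits select the table
    (0b01 = code, 0b11 = data), and the low 30 bits give the offset of
    the packed page relative to the end of the array."""
    n_pages = unpacked_size // 0x1000
    entries = struct.unpack_from("<%dI" % n_pages, blob, 0)
    base = 4 * n_pages
    pages = []
    for i, e in enumerate(entries):
        table = "code" if (e >> 30) == 0b01 else "data"
        start = base + (e & 0x3FFFFFFF)
        end = (base + (entries[i + 1] & 0x3FFFFFFF)) if i + 1 < n_pages else len(blob)
        pages.append((table, blob[start:end]))
    return pages

# Toy module: two 0x1000-byte pages with packed payloads "AAAA" and "BBBBBB".
entries = ((0b01 << 30) | 0, (0b11 << 30) | 4)
blob = struct.pack("<2I", *entries) + b"AAAA" + b"BBBBBB"
print(split_pages(blob, 0x2000))  # [('code', b'AAAA'), ('data', b'BBBBBB')]
```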


As seen in the screenshot, the last compressed page (starting at offset 0xFA40) consists of byte 0xB2 == 0b10110010, which is followed by 273 bytes with the value 0xEC == 0b11101100, and then zeroes. Since the bit sequence 11101100 (or 01110110) is repeated 273 times, we can assume that it encodes 4096/273 == 15 identical bytes (most likely with the value 0x00 or 0xFF). In this case, the bit sequence 10110010 (or 1011001) encodes 4096-273*15 == 1 byte.

This is consistent with the idea that each code sequence encodes a whole number of bytes (from 1 to 15 inclusive). But such an approach will not suffice for completely recovering the Huffman tables.

Finding the Rosetta Stone


As shown previously [6], identically named modules in different versions of ME 11 firmware can be compressed with different algorithms. If we take the Module Attributes Extension for identically named modules that have been compressed with LZMA and Huffman encoding, and then extract the SHA256 value for each module version, we find that there is no LZMA/Huffman module pair that has the same hash values.

But one should remember that for modules compressed with LZMA, SHA256 is usually computed from compressed data. If we calculate SHA256 for modules after LZMA decompression, a large number of pairs appears. Each of these module pairs yields several pairs of matching pages, both with Huffman encoding and in unpacked form.

Shape, Length, Value

Having a large set of pages in both compressed and uncompressed form (separately for code and for data) allows recovering all of the code sequences used in those pages. The methods needed for this task combine linear algebra and search optimization. While it would likely be feasible to create a rigorous mathematical model taking all the relevant constraints into account, because this is a one-time task it was simpler to do part of the job manually and part with automated methods.

The main thing is to at least approximate the shape (points of change of code sequence lengths). For example, 7-bit sequences have values from 1111111 to 1110111, 8-bit sequences have values from 11101101 to 10100011, and so on. Since Canonical Huffman Coding is used, if we know the shape, we then can predict the length of the next code sequence (the shortest sequence consists of only ones, the longest one consists of only zeroes—the lower the value, the longer the sequence).
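As an illustration, the assignment of code values under this scheme can be sketched as follows (our own toy reconstruction; the code-length counts are derived from the example values above, not from Intel's actual tables):

```python
def canonical_codes(lengths):
    """Assign canonical codes per the scheme described above: lengths is a
    sorted list of codeword lengths (shortest first); the shortest code is
    all ones, and values decrease as lengths grow."""
    codes, std, prev_len = [], 0, lengths[0]
    for length in lengths:
        std <<= length - prev_len                 # standard canonical step
        codes.append(std ^ ((1 << length) - 1))   # complement: all-ones first
        std += 1
        prev_len = length
    return codes

# Toy shape matching the example above: nine 7-bit codes, then 75 8-bit codes.
codes = canonical_codes([7] * 9 + [8] * 75)
print(bin(codes[0]), bin(codes[8]), bin(codes[9]), bin(codes[-1]))
# 0b1111111 0b1110111 0b11101101 0b10100011
```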

Not knowing the exact size of the compressed data, we can discard all trailing sequences consisting only of null bits. These sequences are the longest, and it is unlikely that the rarest code sequence would occur in the final position.

When each compressed page can be represented as a set of code sequences, we can start determining the lengths of the values that are encoded by them. The lengths of encoded values for each page should add up to 4096 bytes. With this knowledge, we can set up a system of linear equations: the unknowns are the lengths of encoded values, the coefficients are the number of times a particular code sequence is found in the compressed page, and the constant equals 4096. Code and data pages can both be "plugged in" at the same time, since for identical code sequences, the lengths of encoded values should be the same.

Once we have enough pages (and equations), Gaussian elimination provides the one valid solution. And once we have the uncompressed plaintext, the length of each value, and their order, we can easily derive which sequence encodes which value.
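The equation setup can be sketched with made-up counts (the function and numbers are ours; the real system involved far more pages and unknown sequences):

```python
from fractions import Fraction

def solve_value_lengths(pages, names, page_size=4096):
    """Gauss-Jordan elimination over exact fractions: one equation per page,
    sum(count[seq] * value_len[seq]) == page_size."""
    n = len(names)
    m = [[Fraction(p.get(s, 0)) for s in names] + [Fraction(page_size)]
         for p in pages]
    for col in range(n):
        piv = next(r for r in range(col, len(m)) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]        # normalize pivot row
        for r in range(len(m)):
            if r != col and m[r][col] != 0:               # eliminate column
                m[r] = [a - m[r][col] * b for a, b in zip(m[r], m[col])]
    return {s: m[i][n] for i, s in enumerate(names)}

# Two pages, two unknown codeword value-lengths (invented counts):
pages = [{"A": 100, "B": 1032},   # 100*len(A) + 1032*len(B) == 4096
         {"A": 250, "B": 532}]    # 250*len(A) +  532*len(B) == 4096
result = solve_value_lengths(pages, ["A", "B"])
print(result)  # {'A': Fraction(10, 1), 'B': Fraction(3, 1)}
```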

Unknown sequences

After running these methods on all pages for which we had both encoded and plaintext equivalents, we could recover up to 70% of sequences from the code table and 68% from the data table. Lengths were now known for approximately 92% of sequences. Meanwhile, the shape remained a bit of a mystery: in some places, either one value of shorter length or two values of a longer length could have been present, making it impossible to determine the boundary until one of the values is encountered in the unpacked data.

With this knowledge in hand, we can proceed to recover values for code sequences for which we do not have the matching plaintext.

If we have a sequence of unknown length, another row is added to our system of equations and we can quickly determine its length. But if we don't have the plaintext, how can we determine the value?

Verification and brute force to the rescue

Fortunately for us, the metadata contains the SHA256 value for the unpacked module. So if we correctly guess all unknown code sequences on all the pages that make up a module, the SHA256 value should match the value from the Module Attributes Extension.

When the total length of unknown sequences is 1 or 2 bytes, simple brute-forcing is enough to "crack the code." This method can also work with 3 or even 4 unknown bytes (especially if they are located close to the end of the module), but brute-forcing can take several hours or days on a single CPU core (that said, computation can easily be split among multiple cores or machines). No attempts were made to brute-force 5 or more bytes.
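A sketch of this verification loop (toy data; the names are ours, and the real check runs over entire multi-page modules):

```python
import hashlib
from itertools import product

def brute_force_gaps(parts, n_unknown, expected_sha256):
    """parts: known byte fragments with one unknown byte between each pair.
    Tries all byte values until SHA256 of the reassembly matches."""
    for guess in product(range(256), repeat=n_unknown):
        candidate = parts[0]
        for byte, part in zip(guess, parts[1:]):
            candidate += bytes([byte]) + part
        if hashlib.sha256(candidate).hexdigest() == expected_sha256:
            return candidate
    return None

# Toy demo: recover two unknown bytes of a "module" from its hash.
secret = b"mod\x2aule\x07!"
recovered = brute_force_gaps([b"mod", b"ule", b"!"], 2,
                             hashlib.sha256(secret).hexdigest())
```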

This method was able to recover several more code sequences (and several modules). This left only the modules in which the total number of unknown bytes exceeded the capabilities of brute force.

Heuristics

Since many modules are only slightly different from each other, we can apply several heuristics in order to decode more sequences by analyzing the differences.

Use of second Huffman table

Since the unpacker has two Huffman tables, the packer tries to compress data with both tables, discarding the version that takes up more space. So the division between code and data is not clear-cut, and if a part of a page changes, the other table may turn out to be more efficient (yield a smaller result). By looking at other versions of the same module, we can find identical fragments that were compressed with the other table, thus recovering unknown bytes.

Reuse

When a single code sequence is found many times in a single module or in multiple modules (such as in code and in data), it is often easy to figure out the constraints governing the unknown values.

Constants and tables with offsets

The module data often contains constants and offsets (such as for text strings and functions). Constants may be module-unique or shared among modules (for example, hashing and encryption functions). Meanwhile, offsets are unique to each module version but must refer to the same (or very similar) fragments of data or code. As a result, the values can easily be recovered.

String constants from open-source libraries

Some fragments of ME code were obviously borrowed from open-source projects (such as wpa_supplicant), which makes it easy to reconstruct fragments of text strings based on context and available source code.

Code from open-source libraries

If we look at the source code and find the text of a function whose compiled code contains several unknown bytes, we can simulate the compiler to guess which values fit.

Similar functions in other module versions

Since versions of the same module may be only slightly different, sometimes we can find the equivalent function in another version of the module and, based on the code, figure out what the unknown bytes should mean.

Similar functions in prior ME versions

When code has not been taken from public sources, if we have a fragment that is unknown in all available versions of a module and is not found in any other module (which was the case with the amt module), we can find the same place in previous versions (such as in ME 10), pick apart the logic of the function, and then see how it works in the unknown spot in ME 11.x.

Finishing the job

Starting with the modules containing the highest percentage of known sequences, and combining the described heuristics, we gradually were able to improve our coverage of the Huffman tables (each time testing our work against the SHA256 hash). Modules that originally contained dozens of unknown sequences now had only a few remaining. The process looked to be soon complete—except for that pesky amt.

As the largest module (around 2 megabytes, or 30% of the total), amt contains many codewords that are not found in any other module but occur in all versions of amt. We were highly confident of several sequences, but the only way to be sure would be to guess all of them correctly (so as to match the SHA256 hash). Thanks to the invaluable assistance of Maxim Goryachy, we were able to bring down this barrier as well. We could now unpack any module in the firmwares available to us that had been compressed with Huffman encoding.

New firmware versions appeared over time, containing all-new codewords. But in each case, one of the above heuristics succeeded in solving the module's mysteries, thereby further improving our coverage of the tables.

Closing notes

As of mid-June 2017, we were successful in recovering approximately 89.4% of sequences for the code table and 86.4% for the data table. But the chances of successfully obtaining 100% table coverage in a reasonable period of time based on analysis of new modules were slim at best.
On June 19, user IllegalArgument published Huffman table fragments on GitHub [7], covering 80.8% of the code table and 78.1% of the data table. Most likely, the author (or authors) used a similar approach based on analysis of available firmware versions. Thus the published tables did not provide any new information.

On July 17, Mark Ermolov and Maxim Goryachy made a breakthrough and could now find plaintext values for any compressed data. We prepared four compressed pages (two pages for each of the Huffman tables), recovering all 1,519 sequences for both tables [8].

In the process, one quirk came to light. In the Huffman table for data, the value 00410E088502420D05 corresponds to both 10100111 (8 bits long) and 000100101001 (12 bits long). This is a clear case of redundancy, most likely caused by an oversight.

The final shape of the data is as follows:


References


  1. https://recon.cx/2014/slides/Recon%202014%20Skochinsky.pdf
  2. https://en.wikipedia.org/wiki/Huffman_coding
  3. http://www.7-zip.org/sdk.html
  4. http://io.netgarage.org/me/
  5. http://www.cs.uofs.edu/~mccloske/courses/cmps340/huff_canonical_dec2015.html
  6. https://www.troopers.de/troopers17/talks/772-intel-me-the-way-of-the-static-analysis/
  7. https://github.com/IllegalArgument/Huffman11
  8. https://github.com/ptresearch/unME11


Author: Dmitry Sklyarov, Head of Application Research Unit, Positive Technologies

MySQL grammar in ANTLR 4

The main purpose of a web application firewall is to analyze and filter traffic relevant to an application or a class of applications, such as web applications or database management systems (DBMS). A firewall needs to speak the language of the application it is protecting. For a relational DBMS, the language in question will be an SQL dialect.

Let us assume that the task is to build a firewall to protect a DBMS. In this case, the firewall must recognize and analyze SQL statements in order to determine whether they comply with the security policy. The depth of analysis depends on the task required (for example, detection of SQL injection attacks, access control, or correlation of SQL and HTTP requests). In any case, the firewall must perform lexical, syntactic, and semantic analysis of SQL statements.


Formal grammar

With a formal grammar of a language, we can obtain a full picture of how the language is structured and analyze it. Formal grammars help to create statements and recognize them using a syntactic analyzer.

According to the Chomsky hierarchy, there are four types of languages and, consequently, four types of grammars. Grammars are differentiated by the form of their production rules. MySQL is a context-sensitive language, but only a few of its constructs actually require context sensitivity. Generally, a context-free grammar is sufficient for generating the language patterns used in practice. This article describes the development of a context-free grammar for MySQL.

Key terms

A language is defined on the basis of its alphabet, that is, a set of symbols. Letters of the alphabet are combined into meaningful sequences called lexemes. There are different types of lexemes (for example, identifiers, strings, and keywords). A token is a tuple consisting of a lexeme and a type name. A phrase is a sequence of specifically arranged lexemes. Phrases can be used to create statements. A statement is a complete sequence of lexemes that has independent meaning in the context of a given language. The notion of a statement is significant only in an applied sense.

An application based on any given language makes use of statements in that language, such as by running or interpreting them. From an applied point of view, a phrase is an incomplete structure that can be part of a statement. However, phrases are more useful for building a grammar. From the grammar's point of view, phrases and statements are similar: both are described by rules that have nonterminals on their right-hand side.

Use of a language implies formation or recognition of statements. Recognition refers to the ability, for any sequence of lexemes received as input, to answer the question "Does this sequence constitute a set of valid statements in this language?"

MySQL language

The MySQL language is an SQL dialect used to write queries for the MySQL DBMS. SQL dialects follow the ISO/IEC 9075 Information technology – Database languages – SQL standard (or, strictly speaking, series of standards).

The MySQL dialect is a specific implementation of this standard with particular limitations and additions. Most MySQL statements can be described using a context-free grammar, but some statements require a context-sensitive grammar for their description. Put simply, if a lexeme can affect the recognition of subsequent phrases, then the phrase must be described by a rule of a context-sensitive grammar.

Some MySQL expressions are formed using this approach. For example:

DELIMITER SOME_LITERAL

In this case the recognizer must memorize SOME_LITERAL, because in subsequent statements this literal, rather than ;, marks the end of a statement.
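To see why this requires context, here is a toy statement splitter (entirely our own sketch, with a simplifying one-terminator-per-line assumption) that must carry the current delimiter as state:

```python
def split_statements(script):
    """Split a script into statements, honoring the client-side DELIMITER
    command (minimal sketch: at most one statement terminator per line)."""
    delim, statements, buf = ";", [], []
    for line in script.splitlines():
        if line.upper().startswith("DELIMITER "):
            delim = line.split(None, 1)[1].strip()  # remember the new delimiter
            continue
        buf.append(line)
        joined = "\n".join(buf).rstrip()
        if joined.endswith(delim):
            statements.append(joined[:-len(delim)].strip())
            buf = []
    return statements

print(split_statements("select 1;\nDELIMITER $$\nselect 2$$"))
# ['select 1', 'select 2']
```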

In a procedural extension, loop and compound statements can be tagged with labels having the following structure:

label somephrases label

In this case, label identifiers should be identical. Such a statement can be built only using a context-sensitive grammar.

ANTLR

We selected ANTLR as the parser generator for developing a MySQL parser. ANTLR has the following advantages to recommend it:



ANTLR uses a two-step algorithm for generating recognition code. The first step is to describe the lexical structure of a language, that is, to determine what the tokens are. The second step is to describe the syntactic structure of the language by grouping the tokens from the previous step into statements. Lexical and syntactic structures in ANTLR are both described by rules. A lexical rule is defined by a type (lexeme descriptor) and a value; values are described in a language with regular-expression elements that additionally supports recursion. A syntactic rule is composed of lexeme descriptors according to the statement composition rules of ANTLR 4, which define how lexemes are arranged in a statement or in a phrase within a statement.

When creating rules, the core principle of lexical analysis (to which ANTLR is no exception) should be taken into account: the lexer recognizes the longest sequence of symbols in the input stream that can be described by any lexical rule. If multiple rules match, the one with the highest precedence is applied.

Without using semantic predicates in ANTLR, only a context-free grammar can be built. An advantage in this case is that such a grammar does not depend on the runtime environment. The grammar proposed in this article for the MySQL dialect is built without using semantic predicates.

Lexer


Getting started

The first step in developing a grammar is to define a list of the lexeme types that occur in the language. The recognizer accepts alphabet symbols, from which it must form lexemes; symbols that are not used as a part of lexemes, such as spaces and comments, can be filtered out. Thanks to filtering, only meaningful lexemes of the language are set aside for further analysis. Spaces and comments can be filtered out as follows:

SPACE: [ \t\r\n]+ -> channel(HIDDEN);
COMMENT_INPUT: '/*' .*? '*/' -> channel(HIDDEN);
LINE_COMMENT: ('-- ' | '#') ~[\r\n]*('\r'? '\n' | EOF) -> channel(HIDDEN);

Potential lexical errors can also be taken into account, and unknown symbols can be omitted:

ERROR_RECOGNITION: . -> channel(ERRORCHANNEL);

If a symbol is not recognized by any lexical rule, it is recognized by the rule ERROR_RECOGNITION. This rule is placed at the end of the grammar, giving it the lowest priority.

Now we can start identifying lexemes. Lexemes can be classified under the following types:
  • Keywords
  • Identifiers
  • Literals
  • Special symbols

If there is no obvious (or implicit) intersection between these types of lexemes in the language, it is only required to describe all lexemes. However, if there are any intersections, they should be resolved. The situation becomes complicated because some lexemes require a regular grammar for recognition. In MySQL, this is an issue for identifiers with a dot (fully qualified name) and keywords that can be identifiers.

Identifiers with a dot

Recognition of such MySQL lexemes as identifiers starting with a numeral presents certain difficulties: the "." symbol can occur in both full column names and real literals:

select table_name.column_name as full_column_name ...

select 1.e as real_number ...

Therefore, it is required to recognize correctly a full column name in the first case, and a real literal in the second case. An intersection here is caused by the fact that identifiers in MySQL can start with numerals.

MySQL sees the phrase

someTableName.1SomeColumn

as a sequence of three tokens:

(someTableName, identifier), (., dot delimiter), (1SomeColumn, identifier)

In this example, it is quite natural to use the following rules for recognition:

DOT: '.';
ID_LITERAL: [0-9]*[a-zA-Z_$][a-zA-Z_$0-9]*;

and the following rule for numerals:

DECIMAL_LITERAL: [0-9]+;

Tokenization results in a sequence of four tokens:

(someTableName, identifier), (., dot delimiter), (1, number), (SomeColumn, identifier)

To avoid ambiguity, an auxiliary structure can be introduced to recognize identifiers:

fragment ID_LITERAL: [0-9]*[a-zA-Z_$][a-zA-Z_$0-9]*;

and prioritized rules can be defined:

DOT_ID: '.' ID_LITERAL;
...
ID: ID_LITERAL;
...
DOT: '.';

Since ANTLR recognizes sequences of maximum length, a "." followed by an identifier will never be recognized as a separate symbol.
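The interplay of maximal munch and rule priority can be simulated with a toy regex-based tokenizer (our illustration; ANTLR's actual machinery differs):

```python
import re

# Rules in priority order, mirroring the grammar fragment above.
RULES = [
    ("DOT_ID", re.compile(r"\.[0-9]*[a-zA-Z_$][a-zA-Z_$0-9]*")),
    ("DECIMAL_LITERAL", re.compile(r"[0-9]+")),
    ("ID", re.compile(r"[0-9]*[a-zA-Z_$][a-zA-Z_$0-9]*")),
    ("DOT", re.compile(r"\.")),
]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        # Longest match wins; on a tie, the earlier (higher-priority) rule.
        name, match = max(((n, rx.match(text, pos)) for n, rx in RULES),
                          key=lambda t: t[1].end() if t[1] else -1)
        tokens.append((name, match.group()))
        pos = match.end()
    return tokens

# DOT_ID consumes the dot together with the identifier, so "." is never
# mis-tokenized as a separate DOT followed by a number.
print(tokenize("someTableName.1SomeColumn"))
# [('ID', 'someTableName'), ('DOT_ID', '.1SomeColumn')]
```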

Strings

Strings illustrate one more rule of lexical analysis in ANTLR. A string in MySQL is a sequence of almost any characters enclosed in single or double quotation marks. A string in single quotation marks cannot contain an unescaped single quotation mark, because the lexer would then be unable to determine where the string ends; to include one, it must either be escaped with a backslash or doubled (two consecutive single quotes). Likewise, an escape character inside a string cannot stand alone, since there must be something for it to escape, so a standalone backslash must also be prohibited. As a result, we obtain the following fragment of a lexical rule:

fragment SQUOTA_STRING: '\'' ('\\'. | '\'\'' | ~('\'' | '\\'))* '\'';
  • '\\'. allows a backslash and the symbol it escapes.
  • '\'\'' allows a sequence of two single quotation marks.
  • ~('\'' | '\\') prohibits a standalone single quotation mark or escape character.
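For illustration, the same constraints can be expressed as a Python regular expression (our translation of the fragment, not part of the grammar):

```python
import re

# Python re equivalent of the SQUOTA_STRING fragment above.
SQUOTA_STRING = re.compile(r"'(\\.|''|[^'\\])*'", re.DOTALL)

assert SQUOTA_STRING.fullmatch("'plain'")
assert SQUOTA_STRING.fullmatch("'it''s'")           # doubled quote is allowed
assert SQUOTA_STRING.fullmatch(r"'a \' quote'")     # escaped quote is allowed
assert not SQUOTA_STRING.fullmatch("'dangling \\")  # lone backslash: no match
```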

Keywords

An ANTLR lexer, by contrast with a parser, applies rules in order of precedence. Rules that have been defined earlier have a higher priority than those described later. This approach gives a clear instruction for rule sorting: higher priority is given to specific rules (defining keywords and special symbols), followed by general rules (for recognizing literals, variables, identifiers, etc.).

Special comment type in MySQL

MySQL uses a special comment style that can span multiple lines. Such comments make it possible to write queries compatible with other DBMSs: other servers treat them as ordinary comments, while MySQL parses and executes the text inside them. To recognize special MySQL comments, we can use the following rule:

SPEC_MYSQL_COMMENT: '/*!' .+? '*/' -> channel(MYSQLCOMMENT);

However, using this rule by itself is not enough for correctly parsing queries.

Assume that a query of the following kind is received:

select name, info /*!, secret_info */ from users;

Applying the above-mentioned rule, we obtain the following sequence of tokens:

(SELECT, 'select') (ID, 'name') (COMMA, ',') (ID, 'info') (SPEC_MYSQL_COMMENT, '/*!, secret_info */') (FROM, 'from') (ID, 'users') (SEMI, ';')

Whereas the standard MySQL lexer recognizes slightly different tokens:

(SELECT, 'select') (ID, 'name') (COMMA, ',') (ID, 'info') (COMMA, ',') (ID, 'secret_info') (FROM, 'from') (ID, 'users') (SEMI, ';')

That is why correct recognition of comments written in the unique MySQL style requires additional processing:
  1. Source text is recognized by a special lexer for preprocessing.
  2. Values are extracted from the SPEC_MYSQL_COMMENT tokens and a new text is created, which will be processed only by a MySQL server.
  3. The newly created text is processed using an ordinary parser and lexer.

A lexer for preprocessing splits the input stream into phrases that are part of:

  • Special comments (SPEC_MYSQL_COMMENT)
  • Main queries (TEXT)

The rules can be arranged in the following way:

lexer grammar mysqlPreprocessorLexer;

channels { MYSQLCOMMENT }

TEXT: ~'/'+;
SPEC_MYSQL_COMMENT: '/*!' .+? '*/'; //-> channel(MYSQLCOMMENT);
SLASH: '/' -> type(TEXT);

A pre-lexer splits query code into a sequence of the SPEC_MYSQL_COMMENT and TEXT tokens. If a MySQL statement is processed, values extracted from the SPEC_MYSQL_COMMENT tokens are combined with values of the TEXT tokens. Then the resulting text is processed by the standard MySQL lexer. If another SQL dialect is used, the SPEC_MYSQL_COMMENT tokens are simply removed or set aside.
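Step 2 of this pipeline can be sketched as a simple substitution (our illustration; the actual pre-lexer is the ANTLR grammar above):

```python
import re

SPEC = re.compile(r"/\*!(.*?)\*/", re.DOTALL)

def preprocess(sql, target_is_mysql=True):
    """Inline the payload of /*! ... */ comments for MySQL;
    drop them entirely for any other dialect."""
    repl = (lambda m: m.group(1)) if target_is_mysql else ""
    return SPEC.sub(repl, sql)

query = "select name, info /*!, secret_info */ from users;"
print(preprocess(query, True))   # select name, info , secret_info  from users;
print(preprocess(query, False))  # select name, info  from users;
```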

Case insensitivity

Almost all lexemes in MySQL are case-insensitive, which means that the following two queries are identical:

select * from t;
SelECT * fROm t;

Unfortunately, ANTLR does not support case-insensitive tokens. The usual workaround is to build actual tokens out of per-letter fragment tokens:

SELECT: S E L E C T;
FROM: F R O M;
fragment S: [sS];
fragment E: [eE];

This makes a grammar less readable. Moreover, a lexer has to select one of two variants for each symbol—upper or lower case—which has a negative impact on performance.

To make lexer code cleaner and improve performance, the input stream should be normalized, meaning that all symbols should be in the same case (upper or lower). ANTLR supports a special stream that disregards case during lexical analysis, but retains the cases of original token values. These tokens can be used in tree traversal.

Implementation of such a stream for various runtimes has been suggested by @KvanTTT. The implementation can be found in the DAGE project, a cross-platform editor of ANTLR 4 grammars.
As a result, all lexemes are written either in lower or in upper case. Because normally SQL keywords in queries are written in upper case, it was decided to use upper case for the grammar:

SELECT: 'SELECT';
FROM: 'FROM';
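The idea of such a normalizing stream can be sketched in a few lines (names and structure are ours, not the actual runtime implementation):

```python
class CaseFoldingStream:
    """Feed the lexer upper-cased characters while token values
    keep their original case (a minimal sketch)."""
    def __init__(self, text):
        self.text, self.pos = text, 0

    def look(self):
        """Character the lexer matches against (normalized to upper case)."""
        return self.text[self.pos].upper()

    def consume(self):
        """Advance, returning the original-case character for the token value."""
        ch = self.text[self.pos]
        self.pos += 1
        return ch

stream = CaseFoldingStream("fROm")
seen, kept = [], []
while stream.pos < len(stream.text):
    seen.append(stream.look())     # what the lexer sees
    kept.append(stream.consume())  # what the token value keeps
print("".join(seen), "".join(kept))  # FROM fROm
```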

Parser

To describe the syntactic structure of a language, the sequence of the following components should be defined:
  • Statements in a text
  • Phrases in a statement
  • Lexemes and phrases inside larger phrases

Text structure in MySQL

There is an excellent description of the MySQL grammar, though it is spread across the whole reference guide. The arrangement of statements in a text is given in the section describing the MySQL client/server messaging protocol. We can see that all statements, except possibly the last one, use a semicolon (;) as a delimiter. There is also a peculiarity involving inline comments: the last statement in a text can end with such a comment. As a result, any valid sequence of statements in MySQL can be represented in the following form:

root
    : sqlStatements? MINUSMINUS? EOF
    ;
sqlStatements
    : (sqlStatement MINUSMINUS? SEMI | emptyStatement)*
      (sqlStatement (MINUSMINUS? SEMI)? | emptyStatement)
    ;
...
MINUSMINUS: '--';

A context-free grammar is not powerful enough to ensure full-fledged support of these rules, because a MySQL client can use the DELIMITER command to set the current delimiter. In this case, it is required to memorize and use the delimiter in other rules. Thus, if we use the DELIMITER directive, SQL statements that are written correctly will not be recognized by the grammar under discussion.

Types of MySQL statements

MySQL statements can be of the following types:
  • DDL statements
  • DML statements
  • Transaction statements
  • Replication statements
  • Prepared statements
  • Server administration statements
  • Utility statements
  • Procedural extension statements

The root rule for statements, based on the MySQL documentation, looks as follows:


sqlStatement
    : ddlStatement | dmlStatement | transactionStatement
    | replicationStatement | preparedStatement
    | administrationStatement | utilityStatement
    ;

There is also an empty statement that consists of a single semicolon:

emptyStatement
    : SEMI
    ;
SEMI: ';';

The subsequent chapters of the official documentation have been similarly transformed into ANTLR rules.

SELECT

The SELECT statement is probably the most interesting and wide-ranging statement in both MySQL and SQL in general. When writing the grammar, we focused on the following tasks:
  • Description of tables
  • Description of expressions
  • Combination using UNION

Let us start with description of tables. MySQL has a rather elaborate description of what can be used in the FROM field of SELECT queries (which here we'll call "table references"). Careful study and testing on actively used versions reveals that table references have the following structure:

Table object 1, Table object 2, …, Table object N

where a "table object" is one of four structures:
  • A separate table
  • Joined tables
  • A subquery
  • Table references in parentheses

Starting from the less general cases, a table object is inductively recognized as either a table or a table-based structure. The latter can be:

  • Joined tables
  • A subquery
  • A sequence of table objects in parentheses

Then a sequence of table objects, consisting of at least one table object, is detected in the FROM field. The grammar, of course, describes additional structures, such as join conditions and references to partitions (PARTITION), but the general structure is as follows:

tableSources
    : tableSource (',' tableSource)*
    ;
tableSource
    : tableSourceItem joinPart*
    | '(' tableSourceItem joinPart* ')'
    ;
tableSourceItem
    : tableName (PARTITION '(' uidList ')')? (AS? alias=uid)?
      (indexHint (',' indexHint)*)?                                 #atomTableItem
    | (subquery | '(' parenthesisSubquery=subquery ')') AS? alias=uid
                                                                    #subqueryTableItem
    | '(' tableSources ')'                                          #tableSourcesItem
    ;

Expressions

Expressions are widely used in MySQL wherever it is required to evaluate a value (value vector). Inductively, an expression can be defined as follows:
  • An expression is any lexeme that is:
    • a constant (literal) value
    • a variable
    • an object identifier
  • An expression is a superposition of expressions that have been united by transformations.
Transformations include operations, operators (including set-theoretic and comparison operators), functions, queries, and parentheses.

UNION

Unlike other dialects, MySQL has only two set-theoretic operations on tables. The first, JOIN, has already been considered. We found empirically that the description of UNION in the official documentation is incomplete, and we built upon it in the following way:

selectStatement
    : querySpecification lockClause?                     #simpleSelect
    | queryExpression lockClause?                        #parenthesisSelect
    | querySpecificationNointo unionStatement+
      (UNION (ALL | DISTINCT)? (querySpecification | queryExpression))?
      orderByClause? limitClause? lockClause?            #unionSelect
    | queryExpressionNointo unionParenthesis+
      (UNION (ALL | DISTINCT)? queryExpression)?
      orderByClause? limitClause? lockClause?            #unionParenthesisSelect
    ;

If UNION is used, individual queries can be enclosed in parentheses. Using parentheses is not essential, unless queries use ORDER BY and LIMIT. However, if the first query in UNION is in parentheses, they should be used for all subsequent queries as well.

Incorrect:

(select 1) union select 2;
(select 1) union (select 2) union select 3;

Correct:

(((select 1))) union (select 2);
(select 1) union ((select 2)) union (select 3);

Use of grammar

A grammar is written to solve tasks of syntactic and lexical analysis. On the one hand, recognition should be as fast as possible; on the other hand, applications built on top of the lexer and parser code should not have to compromise on functionality or performance.

An application that uses a parser most likely applies either the Visitor or the Observer design pattern. Both patterns involve analysis of a defined subset of the nodes of a parse tree. Nodes of a parse tree, other than leaf nodes, correspond to certain syntax rules. To analyze a node, we must look at its child nodes (individual nodes or groups of nodes) that correspond to fragments of the parent rule.

A critical condition for developing a good grammar is the ability to gain "easy" access to any part of the rule. Intuitively, "easy" access can be described as the possibility to get any given part as an object without searching and iterating. This is implemented in ANTLR by means of such entities as alternative and element labels. Alternative labels allow splitting a complex rule into alternative phrases and, if the Visitor pattern is used, processing each of them using a separate method. For example, a table object in MySQL can be defined by the following rule:

tableSourceItem
    : tableName (PARTITION '(' uidList ')')? (AS? alias=uid)?
      (indexHint (',' indexHint)*)?
    | (subquery | '(' parenthesisSubquery=subquery ')') AS? alias=uid
    | '(' tableSources ')'
    ;

We can see that a table object is defined as one of three possible variants:
  • A table
  • A subquery
  • A sequence of table objects in parentheses

Therefore, instead of processing the whole structure, alternative labels are used, creating the possibility to process each variant independently of the others:

tableSourceItem
    : tableName (PARTITION '(' uidList ')')? (AS? alias=uid)?
      (indexHint (',' indexHint)*)?                                 #atomTableItem
    | (subquery | '(' parenthesisSubquery=subquery ')') AS? alias=uid
                                                                    #subqueryTableItem
    | '(' tableSources ')'                                          #tableSourcesItem
    ;

Element labels are used to label individual nonterminals and their sequences. They allow accessing part of a rule context as a field with a certain name: instead of manually extracting a content element from the context, it is enough to read the labeled field. Extraction, by contrast, depends on the rule structure: the more elaborate a rule, the more complicated extraction becomes.

For example, the rule:

loadXmlStatement
    : LOAD XML (LOW_PRIORITY | CONCURRENT)? LOCAL?
      INFILE STRING_LITERAL
      (REPLACE | IGNORE)?
      INTO TABLE tableName
      (CHARACTER SET charsetName)?
      (ROWS IDENTIFIED BY '<' STRING_LITERAL '>')?
      (IGNORE decimalLiteral (LINES | ROWS))?
      ('(' assignmentField (',' assignmentField)* ')')?
      (SET updatedElement (',' updatedElement)*)?
    ;

requires extracting the tag name that identifies rows imported by the LOAD XML operator. We also need to identify the conditions that determine the specific form of the LOAD XML operator:
  • Is a priority explicitly set for the operator? If so, which one?
  • Which row append mode will the operator use?
  • Which syntax variant, if any, is used to ignore several initial rows during import?

To obtain the required values immediately in code without any extraction, element labels can be used:

loadXmlStatement
    : LOAD XML priority=(LOW_PRIORITY | CONCURRENT)? LOCAL?
      INFILE file=STRING_LITERAL
      violation=(REPLACE | IGNORE)?
      INTO TABLE tableName
      (CHARACTER SET charsetName)?
      (ROWS IDENTIFIED BY '<' tag=STRING_LITERAL '>')?
      (IGNORE decimalLiteral linesFormat=(LINES | ROWS))?
      ('(' assignmentField (',' assignmentField)* ')')?
      (SET updatedElement (',' updatedElement)*)?
    ;

Labels thus simplify the code of the target application and make the grammar itself more readable thanks to the names of the alternatives.

Conclusion

Developing grammars for SQL languages is quite challenging, because they are case-insensitive and contain a large number of keywords, ambiguities, and context-sensitive structures. In particular, while developing our MySQL grammar, we implemented processing of special types of comments, developed a lexer able to differentiate between identifiers with a dot and real literals, and wrote the parser grammar, which covers the majority of the MySQL syntax described in the documentation. The MySQL grammar developed by us can be used to recognize queries generated by WordPress and Bitrix, as well as other applications that do not require exact processing of context-sensitive cases. The grammar source files are available in the official grammar repository under the MIT license.

Author: Ivan Khudyashov, Positive Technologies

How to Hack a Turned-off Computer, or Running Unsigned Code in Intel ME

At the recent Black Hat Europe conference, Positive Technologies researchers Mark Ermolov and Maxim Goryachy spoke about the vulnerability in Intel Management Engine 11, which opens up access to most of the data and processes on the computer.

This level of access also means that any attacker exploiting this vulnerability, having bypassed traditional software-based protections, could conduct attacks even when the computer is turned off. This blog post presents new details of the study.

1. Introduction

Intel Management Engine (Intel ME) is a proprietary technology that consists of a microcontroller integrated into the Platform Controller Hub (PCH) chip and a set of built-in peripherals. The PCH carries almost all communication between the processor and external devices. Therefore, Intel ME has access to almost all data on the computer. The ability to execute third-party code on Intel ME would allow for a complete compromise of the platform.

We see increasing interest in Intel ME internals from researchers all over the world. One of the reasons is the transition of this subsystem to new hardware (x86) and software (modified MINIX as an operating system [1]). The x86 platform allows researchers to make use of the full power of binary code analysis tools. Previously, firmware analysis was difficult because earlier versions of ME were based on an ARCompact microcontroller with an unfamiliar set of instructions.
Analysis of Intel ME 11 was previously impossible because the executable modules are compressed by Huffman codes with unknown tables. However, our research team has succeeded in recovering these tables and created a utility for unpacking images [2].

After unpacking the executable modules, we proceeded to examine the software and hardware internals of Intel ME. Our efforts to understand the workings of ME were rewarded: ME was ultimately not so unapproachable as it had seemed.

1.1. Intel Management Engine 11 overview

A detailed description of Intel ME internals and components can be found in several papers: [1], [3], [4]. It should be noted that starting in 2015, the LMT processor core with the x86 instruction set has been integrated into the PCH. Such a core is used in the Quark SoC.


Figure 1. LMT2 IdCode of ME core

Many modern technologies by Intel are built around Intel Management Engine: Intel Active Management Technology, Intel Platform Trust Technology (fTPM), Intel Software Guard Extensions, and Intel Protected Audio Video Path. ME is also a root of trust for Intel Boot Guard, which prevents attackers from injecting their code into UEFI. The main purpose of ME is to initialize the platform and start the main processor. ME also has virtually unlimited access to data processed on the computer. ME can intercept and modify network packets as well as images on graphics cards; it has full access to USB devices. Such capabilities mean that if an attacker finds an opportunity to execute arbitrary code inside ME, this will spawn a new generation of malware that cannot be detected using current protection tools. Fortunately, only three (publicly known) vulnerabilities have been detected in the 17-year history of this technology.


1.2. Published vulnerabilities in Intel ME


1.2.1. Ring-3 rootkits

The first publicly known vulnerability was discovered in Intel ME in 2009. At Black Hat, Alexander Tereshkin and Rafal Wojtczuk gave a talk entitled "Introducing Ring-3 Rootkits". The attack involved injecting code into a special region of UMA memory into which ME unloads currently unused memory pages.

After the research was made public, Intel introduced UMA protection. Now this region is encrypted with AES and ME stores the checksum for each page, which is checked when the page is returned to the main memory of ME.

1.2.2. Zero-Touch Provisioning

In 2010, Vassilios Ververis presented an attack on the implementation of ME in GM45 [10]. By using "zero-touch" provisioning mode (ZTC), he was able to bypass AMT authorization.

1.2.3. Silent Bob is Silent

In May 2017, a vulnerability in the AMT authorization system (CVE-2017-5689) was published [11]. It allowed an unauthorized user to obtain full access to the main system on motherboards supporting the vPro technology.

Thus, to date only one vulnerability (Ring-3 rootkits) allowing execution of arbitrary code inside Intel ME has been found.

2. Potential attack vectors 

Virtually all data used by ME is either explicitly or implicitly signed by Intel. However, ME still allows some interaction with the user:
  • Local communication interface (HECI)
  • Network (vPro only)
  • Host memory (UMA)
  • Firmware SPI layout
  • Internal file system

2.1. HECI

HECI is a separate PCI device serving as a circular buffer to exchange messages between the main system and ME.

Applications located inside ME can register their HECI handlers. This increases the number of potential security issues (CVE-2017-5711). On Apple computers, HECI is disabled by default.

2.2. Network (vPro only)

AMT is a large module with a huge number of different network protocols of various levels integrated into it. This module contains a great deal of legacy code but can only be found in business systems.

2.3. Hardware attack on SPI interface

While we were studying ME, it occurred to us to attempt bypassing signature verification with the help of an SPI flash emulator. This specialized device would look like regular SPI flash to the PCH, but can send different data each time it is accessed. This means that if the data signature is checked in the beginning and then the data is reread, one can conduct an attack and inject code into ME. We did not find such errors in the firmware: first, data is read, and then the signature is verified. When accessed again, data is checked to make sure it is identical to the data obtained during the first read.

2.4. Internal file system

Intel ME uses SPI flash as primary file storage with its own file system. While the file system has a rather complicated structure [6], many privileged processes store their configuration files in it. Therefore, the file system seemed a very promising place for acting on ME.
The next step in searching for vulnerabilities was to choose a binary module.

2.5. Selecting a module for analysis

The ME operating system implements a Unix-like access control model, the difference being that controls are applied on a per-process basis. The user ID, group ID, list of accessible hardware, and allowed system calls are set statically for each process.


Figure 2. Example of static rules for a process

The result is that only some system processes are able to load and run modules. A parent process is responsible for verifying integrity and setting privileges for its child process. One risk, of course, is that a process can set high privileges for its child in order to bypass restrictions.

One process with the ability to spawn child processes is BUP (BringUP). In the process of reverse engineering the BUP module, we discovered a stack buffer overflow vulnerability in the function for Trace Hub device initialization. The file /home/bup/ct was unsigned, enabling us to slip a modified version into the ME firmware with the help of Flash Image Tool. Now we were able to cause a buffer overflow inside the BUP process with the help of a large BUP initialization file. But exploiting this required bypassing the mechanism for protection against stack buffer overflows.

Figure 3. Stack buffer overflow vulnerability

2.6. Bypassing stack buffer overflow protection

ME implements a classic method for protection from a buffer overflow in the stack—a stack cookie. The implementation is as follows:
  1. When a process is created, a 32-bit value is copied from the hardware random number generator to a special region (read-only for process).
  2. In the function prologue, this value is copied above the return address in the stack, thus protecting it.
  3. In the function epilogue, the saved value is compared with the known good value. If they do not match, a software interrupt (int 81h) terminates the process.
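The three steps above can be modeled with a short sketch (illustrative only; the frame layout and helper names are invented, and a bytearray stands in for the stack frame):

```python
import os

def make_cookie() -> bytes:
    # Step 1: at process creation, a 32-bit value comes from the hardware
    # RNG (modeled here with os.urandom) and is stored read-only.
    return os.urandom(4)

def call_with_canary(cookie: bytes, frame: bytearray, payload: bytes) -> bool:
    # Step 2: the prologue copies the cookie just below the return address
    # (here: the last 4 bytes of the frame).
    frame[-4:] = cookie
    # A vulnerable copy into the frame may run past the buffer.
    frame[:len(payload)] = payload
    # Step 3: the epilogue compares the saved value with the known good one;
    # in ME, a mismatch triggers int 81h and terminates the process.
    return frame[-4:] == cookie

cookie = make_cookie()
assert call_with_canary(cookie, bytearray(68), b"A" * 16)      # in-bounds write
assert not call_with_canary(cookie, bytearray(68), b"A" * 68)  # cookie smashed
```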

So exploitation requires either predicting the cookie value or taking control before cookie integrity is checked. Further research showed that any error in the random number generator is regarded by ME as fatal, causing it to fail.

Looking at the functions that are called after an overflow and before the integrity check, we found that the function we named bup_dfs_read_file indirectly calls memcpy. It, in turn, gets the destination address from the structure we named Thread Local Storage (TLS). Notably, BUP functions for file read/write use system library services for accessing shared memory. In other words, read and write functions obtain and record data via a shared memory mechanism. But this data is not used anywhere other than BUP, so use of this mechanism may raise eyebrows. In our view, memory is shared likely because the portion of BUP code responsible for MFS interaction was copied from another module (the file system driver), where use of shared memory is justified.


Figure 4. Calling the memcpy function


Figure 5. Getting address from the TLS

As we discovered later, in case of a buffer overflow this region of the TLS can be overwritten by a file read function, which could be used to bypass buffer overflow protection.

2.7. Thread Local Storage

Access to the TLS is mediated by the gs segment register. The structure looks as follows:

Figure 6. TLS structure


Figure 7. Getting TLS fields

The segment to which gs points is not write-accessible, but the TLS structure itself is at the bottom of the stack (!!!), which allows modifying it in spite of the restrictions. So in the case of a buffer overflow, we can overwrite the pointer to SYSLIB_CTX in the TLS with a pointer to a fake structure that we craft. Because of how the bup_dfs_read_file function works, this trick gives us an arbitrary write primitive.

2.8. Using the implementation of the read function to get an arbitrary write primitive

The bup_dfs_read_file function reads from SPI flash in 64-byte blocks, which makes it possible to overwrite the pointer to SYSLIB_CTX on one iteration; on the next iteration, the sys_write_shared_mem function extracts the address that we crafted and passes it to memcpy as the destination address. With this done, we have an arbitrary write primitive.


Figure 8. Iterative reading of file inside bup_dfs_read_file

The absence of ASLR enables us to overwrite a return address using the arbitrary write primitive and hijack the program control flow. But here lies an unpleasant surprise for the attacker—the stack is not executable. Remember, however, that BUP can spawn new processes and is responsible for checking module signatures. So with Return-Oriented Programming (ROP), we can create a new process with the rights we need.

2.9. Possible exploitation vectors

To successfully exploit the vulnerability, we need write access to the MFS or entire Intel ME region. Vendors are supposed to block access to the ME region, but many fail to do so [8]. Such a configuration error makes the system vulnerable.

By design, Intel ME allows write access to the ME region via special HMRFPO messages sent over HECI from the BIOS [9]. An attacker can send such a message by exploiting a BIOS vulnerability, directly from the OS if ME is in manufacturing mode, or via a DMA attack.

Attackers with physical access can always overwrite the ME region with their own image (via an SPI programmer or the Security Descriptor Override jumper), resulting in a complete compromise of the platform.

One of the most common questions regards the possibility of remote exploitation. We think that remote exploitation is possible if the following conditions are true:
  1. The target platform has AMT activated.
  2. The attacker knows the AMT administrator password or can use a vulnerability to bypass authorization.
  3. The BIOS is not password-protected (or the attacker knows the password).
  4. The BIOS can be configured to open up write access to the ME region.

If all these conditions are met, there is no reason why an attacker would not be able to obtain access to the ME region remotely.

Also note that during startup, the ROM does not check the version of firmware, leaving the possibility that an attacker targeting an up-to-date system could maliciously downgrade ME to a vulnerable version.

2.10. CVE-2017-5705/5706/5707 overview

The vulnerabilities were assigned Intel advisory number INTEL-SA-00086 (CVE-2017-5705, CVE-2017-5706, CVE-2017-5707). The advisory includes the following information:

CVSSv3 Vectors:

  • 8.2 High AV:L/AC:L/PR:H/UI:N/S:C/C:H/I:H/A:H

Affected products [12]: 
  • 6th, 7th & 8th Generation Intel® Core™ Processor Family
  • Intel® Xeon® Processor E3-1200 v5 & v6 Product Family
  • Intel® Xeon® Processor Scalable Family
  • Intel® Xeon® Processor W Family
  • Intel® Atom® C3000 Processor Family
  • Apollo Lake Intel® Atom Processor E3900 series
  • Apollo Lake Intel® Pentium™
  • Celeron™ N and J series Processors

2.11. Disclosure Timeline
  • June 27, 2017—Bug reported to Intel PSIRT
  • June 28, 2017—Intel started initial investigation
  • July 5, 2017—Intel requested proof-of-concept
  • July 6, 2017—Additional information sent to Intel PSIRT 
  • July 17, 2017—Intel acknowledged the vulnerability
  • July 28, 2017—Bounty payment received
  • November 20, 2017—Intel published SA-00086 security advisory

3. Conclusion

The most important finding of our research was a vulnerability that allows running arbitrary code in Intel ME. Such a vulnerability has the potential to jeopardize a number of technologies, including Intel Protected Audio Video Path (PAVP), Intel Platform Trust Technology (PTT / fTPM), Intel Boot Guard, and Intel Software Guard Extension (SGX).

By exploiting the vulnerability that we found in the bup module, we were able to turn on a mechanism, PCH red unlock, that opens full access to all PCH devices for their use via the DFx chain—in other words, using JTAG. One such device is the x86 ME processor itself, and so we obtained access to its internal JTAG interface. With such access, we could debug code executed on ME, read memory of all processes and the kernel, and manage all devices inside the PCH. We found a total of about 50 internal devices to which only ME has full access, while the main processor has access only to a very limited subset of them.

Our research does not claim to be the final word on ME security or free of errors. Nonetheless, we hope that this work will be of benefit to other researchers interested in platform security and ME internals.

Authors: Mark Ermolov, Maxim Goryachy

References


  1. Dmitry Sklyarov, "Intel ME: The Way of the Static Analysis", Troopers, 2017.
  2. Intel ME 11.x Firmware Images Unpacker, github.com/ptresearch/unME11.
  3. Xiaoyu Ruan, Platform Embedded Security Technology Revealed. Safeguarding the Future of Computing with Intel Embedded Security and Management Engine, Apress, ISBN 978-1-4302-6572-6, 2014.
  4. Igor Skochinsky, "Intel ME Secrets. Hidden code in your chipset and how to discover what exactly it does", RECON, 2014.
  5. Alexander Tereshkin, Rafal Wojtczuk, "Introducing Ring-3 Rootkits", Black Hat USA, 2009.
  6. Dmitry Sklyarov, "Intel ME: flash file system explained", Black Hat Europe, 2017.
  7. Alex Matrosov, "Who Watch BIOS Watchers?", 2017, medium.com/@matrosov/bypass-intel-boot-guard-cc05edfca3a9.
  8. Mark Ermolov, Maxim Goryachy, "How to Become the Sole Owner of Your PC", PHDays VI, 2016, 2016.phdays.com/program/51879.
  9. Vassilios Ververis, "Security Evaluation of Intel’s Active Management Technology", Sweden, TRITA-ICT-EX-2010:37, 2010.
  10. Dmitriy Evdokimov, Alexander Ermolov, Maksim Malyutin, "Intel AMT Stealth Breakthrough", Black Hat USA, 2017.
  11. Intel Management Engine Critical Firmware Update (Intel-SA-00086), intel.com/sa-00086-support

Apple fixes security hole in Intel ME discovered by Positive Technologies

Apple has released a security update for macOS High Sierra 10.13.2, macOS Sierra 10.12.6, and OS X El Capitan 10.11.6 that patches a vulnerability in Intel Management Engine found by Positive Technologies experts Mark Ermolov and Maxim Goryachy. Details are available in a security document on the Apple support website.

Intel Management Engine is a microcontroller integrated into the Platform Controller Hub (PCH) with a set of built-in peripherals. Since the PCH is the conduit for almost all communication between the CPU and external devices, Intel ME has access to practically all data on the computer. The researchers found a flaw that allows running unsigned code on the PCH on any chipset for Skylake processors and later. The vulnerability is detailed in a November 20 advisory on the Intel Security Center website. Vulnerable chipsets are used worldwide on an enormous number of devices, from consumer and business laptops to corporate servers.

Maxim Goryachy and Mark Ermolov gave a technical talk about Intel ME security at Black Hat Europe in London in December 2017. The full text of their research is available on the Positive Technologies blog.

The researchers also found that Intel's patch does not rule out the possibility of exploitation of vulnerabilities CVE-2017-5705, CVE-2017-5706, and CVE-2017-5707. An attacker possessing write access to the ME region can always write a vulnerable version of Intel ME firmware to SPI flash (in effect, downgrading Intel ME) in order to exploit the vulnerabilities.

New bypass and protection techniques for ASLR on Linux


0. Abstract


The Linux kernel is used on systems of all kinds throughout the world: servers, user workstations, mobile platforms (Android), and smart devices. Over the life of Linux, many new protection mechanisms have been added both to the kernel itself and to user applications. These mechanisms include address space layout randomization (ASLR) and stack canaries, which complicate attempts to exploit vulnerabilities in applications.

This whitepaper analyzes ASLR implementation in the current version of the Linux kernel (4.15-rc1). We found problems that allow bypassing this protection partially or in full. Several fixes are proposed. We have also developed and discussed a special tool to demonstrate these issues. Although all issues are considered here in the context of the x86-64 architecture, they are also generally relevant for most Linux-supported architectures.

Many important application functions are implemented in user space. Therefore, when analyzing the ASLR implementation mechanism, we also analyzed part of the GNU Libc (glibc) library, during which we found serious problems with stack canary implementation. We were able to bypass stack canary protection and execute arbitrary code by using ldd.

This whitepaper describes several methods for bypassing ASLR in the context of application exploitation.


1. ASLR


Address space layout randomization is a technology designed to impede exploitation of certain vulnerability types. ASLR, found in most modern operating systems, works by randomizing addresses of a process so that an attacker is unable to know their location. For instance, these addresses are used to:

  • Delegate control to executable code.
  • Make a chain of return-oriented programming (ROP) gadgets (1).
  • Read (overwrite) important values in memory.


The technology was first implemented for Linux in 2005. In 2007, it was introduced in Microsoft Windows and macOS as well. For a detailed description of ASLR implementation in Linux, see (2).


Since the appearance of ASLR, attackers have invented various methods of bypassing it, including:

  • Address leak: certain vulnerabilities allow attackers to obtain the addresses required for an attack, which enables bypassing ASLR (3).
  • Relative addressing: some vulnerabilities allow attackers to obtain access to data relative to a particular address, thus bypassing ASLR (4).
  • Implementation weaknesses: some vulnerabilities allow attackers to guess addresses due to low entropy or faults in a particular ASLR implementation (5).
  • Side channels of hardware operation: certain properties of processor operation may allow bypassing ASLR (6).


Note that ASLR is implemented very differently on different operating systems, which continue to evolve in their own directions. The most recent changes in Linux ASLR involved Offset2lib (7), which was released in 2014. Implementation weaknesses allowed bypassing ASLR because all libraries were in close proximity to the binary ELF file image of the program. The solution was to place the ELF file image in a separate, randomly selected region.
In April 2016, the creators of Offset2lib also criticized the then-current implementation, pointing out its lack of entropy when selecting a region address, and presented their own implementation, ASLR-NG (8). However, no patch has been published to date.
With that in mind, let's take a look at how ASLR currently works on Linux.

2. ASLR on Linux


First, let us have a look at Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-40-generic x86_64) with the latest updates installed. The workings are largely the same regardless of Linux distribution or kernel version after 3.18-rc7. If we run "less /proc/self/maps" from the Linux command line, we will see something resembling the following:



  • The base address of the binary application (/bin/less, in our case) is 5627a82bf000.
  • The heap start address is 5627aa2d4000, which is the address of the end of the binary application plus a random value (in our case, 1de7000 = 5627aa2d4000 − 5627a84ed000). The address is aligned to 2^12 due to the x86-64 architecture.
  • Address 7f3631293000 is selected as mmap_base. This address will serve as the upper boundary when a random address is selected for any memory allocation via the mmap system call.
  • Libraries ld-2.23.so, libtinfo.so.5.9, and libc-2.23.so are located consecutively.
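For illustration, the gaps between neighboring regions in such a listing can be computed with a short script. The binary base, its end, the heap start, and mmap_base below are the values quoted above; the remaining end addresses and the permission bits are invented:

```python
# Hypothetical /proc/<pid>/maps excerpt (simplified to three columns).
maps = """\
5627a82bf000-5627a84ed000 r-xp /bin/less
5627aa2d4000-5627aa2f5000 rw-p [heap]
7f3631293000-7f36314b0000 r-xp libc-2.23.so
"""

regions = []
for line in maps.strip().splitlines():
    span, _perms, name = line.split()
    start, end = (int(x, 16) for x in span.split("-"))
    regions.append((start, end, name))

# Gap between each region's end and the next region's start.
gaps = {(a[2], b[2]): b[0] - a[1] for a, b in zip(regions, regions[1:])}
assert gaps[("/bin/less", "[heap]")] == 0x1de7000  # the random value above
```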


If subtraction is applied to the neighboring memory regions, we will note the following: there is a substantial difference between the binary file, heap, stack, the lowest locale-archive address, and the highest ld address. There is not a single free page between the loaded libraries (files).

If we repeat the procedure several times, the picture will remain practically the same: the difference between pages will vary, while libraries and files will remain identical in location relative to one another. This fact will be crucial for our analysis.

3. Memory allocation: inner workings


Now we will look at the mechanism used to allocate virtual memory of a process. The logic is stored in the do_mmap kernel function, which implements memory allocation both on the part of the user (mmap syscall) and on the part of the kernel (when executing execve). In the first stage, an available address is selected (get_unmapped_area); in the second stage, pages are mapped to that address (mmap_region). We will start with the first stage.

The following options are possible when selecting an address:

  1. If the MAP_FIXED flag is set, the system will return the value of the addr argument as the address.
  2. If the addr argument value is not zero, this value is used as a hint and, in some cases, will be selected. 
  3. The largest address of an available region will be selected as the address, as long as it is suitable in length and lies within the allowed range of selectable addresses.
  4. The address is checked for security-related restrictions. (For details, see Section 7.3.)


If all is successful, the region of memory at the selected address will be allocated.
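The selection options above can be modeled roughly as follows. This is a toy model, not the kernel's get_unmapped_area; the hole bookkeeping is simplified, and the holes and boundaries in the example are invented:

```python
MAP_FIXED = 0x10  # same numeric value as the Linux flag

def pick_address(addr, length, flags, holes, mmap_min, mmap_base):
    # holes: list of (start, end) free ranges in ascending order.
    if flags & MAP_FIXED:                            # option 1: forced address
        return addr
    if addr and any(s <= addr and addr + length <= e for s, e in holes):
        return addr                                  # option 2: usable hint
    # Options 3 and 4: the highest free range that fits, within the
    # security-imposed bounds [mmap_min, mmap_base].
    fits = [min(e, mmap_base) - length for s, e in holes
            if min(e, mmap_base) - length >= max(s, mmap_min)]
    return max(fits, default=None)

holes = [(0x10000, 0x20000), (0x40000, 0x80000)]
assert pick_address(0x123, 0x1000, MAP_FIXED, holes, 0x1000, 0x70000) == 0x123
assert pick_address(0x11000, 0x1000, 0, holes, 0x1000, 0x70000) == 0x11000
assert pick_address(0, 0x1000, 0, holes, 0x1000, 0x70000) == 0x6f000
```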

Details of address selection algorithm


The structure underlying the manager of process virtual memory is vm_area_struct (or vma, for short).

This structure describes the start of the virtual memory region, the region end, and access flags for pages within the region.

vma is organized in a doubly linked list (9) of region start addresses, in ascending order, and also an augmented red-black tree (10) of region start addresses, in ascending order as well. A good rationale for this solution is given by the kernel developers themselves (11).


Example of a vma doubly linked list in the ascending order of addresses



The red-black tree augment is the amount of available memory for a particular node. The amount of available memory for a node is defined as whichever is the highest of:

  • The difference between the start of the current vma and end of the preceding vma in an ascending-ordered doubly linked list 
  • Amount of available memory of the left-hand subtree 
  • Amount of available memory of the right-hand subtree




Example of an augmented vma red-black tree

This structure makes it possible to quickly search (in O(log n) time) for the vma that corresponds to a certain address or select an available range of a certain length.
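The augment can be sketched as a minimal model (field names are invented, and red-black rebalancing is ignored): each node caches the largest free gap found in its subtree, so a search can skip any subtree whose cached gap is too small.

```python
class Node:
    def __init__(self, vm_start, vm_end, prev_end, left=None, right=None):
        self.vm_start, self.vm_end = vm_start, vm_end
        self.prev_end = prev_end      # end of the preceding vma in the list
        self.left, self.right = left, right
        self.gap = self._compute_gap()

    def _compute_gap(self):
        # Highest of: the hole just before this vma, the best hole in the
        # left subtree, and the best hole in the right subtree.
        gap = self.vm_start - self.prev_end
        for child in (self.left, self.right):
            if child is not None:
                gap = max(gap, child.gap)
        return gap

leaf = Node(0x7000, 0x8000, prev_end=0x2000)             # 0x5000 hole before it
root = Node(0x9000, 0xa000, prev_end=0x8000, left=leaf)  # own hole only 0x1000
assert root.gap == 0x5000
```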

During the address selection process, two important boundaries are identified as well: the minimum lower boundary and the maximum upper boundary. The lower boundary is determined by the architecture as the minimum allowable address or as the minimum value permitted by the system administrator. The upper boundary, mmap_base, is selected as stack − random, where stack is the maximum stack address and random is a random value with entropy of 28 to 32 bits, depending on the relevant kernel parameters. The Linux kernel cannot choose an address higher than mmap_base. In the process address space, addresses above mmap_base either correspond to the stack and special system regions (vvar and vdso), or will never be used unless explicitly requested with the MAP_FIXED flag.
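A rough numeric sketch of this choice of mmap_base (simplified; not the kernel's actual computation, and the stack-top value used below is invented):

```python
import random

PAGE = 1 << 12  # x86-64 page size

def choose_mmap_base(stack_top: int, entropy_bits: int = 28) -> int:
    # Subtract a page-aligned random offset carrying 28-32 bits of entropy
    # from the maximum stack address.
    rnd = random.getrandbits(entropy_bits) * PAGE
    return stack_top - rnd

for _ in range(100):
    base = choose_mmap_base(0x7ffffffff000)
    assert base <= 0x7ffffffff000 and base % PAGE == 0
```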

So in this whole scheme, the following values remain unknown: the address of the start of the main thread stack, the base address for loading the application binary file, the start address of the application heap, and mmap_base, which is the starting address for memory allocation with mmap.

4. Problems with current implementation


The memory allocation algorithm just described has a number of weaknesses.

4.1 Close proximity of memory location


An application uses virtual RAM. Common uses of memory by an application include the heap, code, and data (.rodata, .bss) of loaded modules, thread stacks, and loaded files. Any mistake in processing the data from these pages may affect nearby data as well. As more pages with differing types of contents are located in close proximity, the attack area becomes larger and the probability of successful exploitation rises.

Examples of such mistakes include out-of-bounds (4), overflow (integer (12) or buffer (13)), and type confusion (14).

A specific instance of this problem is that the system remains vulnerable to the Offset2lib attack, as described in (7). In short: the base address for loading the program is not allocated separately from the libraries, because the kernel derives it from mmap_base. If the application contains vulnerabilities, they become easier to exploit, because library images are located in close proximity to the binary application image.

A good example demonstrating this problem is a PHP vulnerability in (15) that allows reading or altering neighboring memory regions.

Section 5 will provide several examples.

4.2 Fixed method of loading libraries


In Linux, dynamic libraries are loaded practically without calling the Linux kernel. The ld library (from GNU Libc) is in charge of this process. The only way the kernel participates is via the mmap function (we will not yet consider open/stat and other file operations): this is required for loading the code and library data into the process address space. An exception is the ld library itself, which is usually written in the executable ELF file as the interpreter for file loading. As for the interpreter, it is loaded by the kernel.

If ld from GNU Libc is used as the interpreter, libraries are loaded in a way resembling the following:

  1. The program ELF file is added to the file queue for processing.
  2. The first ELF file is taken out of the queue (FIFO).
  3. If the file has not been loaded yet into the process address space, it is loaded with the help of mmap.
  4. Each library needed for the file in question is added to the queue of files for processing.
  5. As long as the queue is not empty, repeat step 2.
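The five steps above amount to a breadth-first pass over the dependency lists and can be sketched as follows (library names and the dependency data are invented; real dependencies would come from each file's DT_NEEDED entries):

```python
from collections import deque

def load_order(entry: str, deps: dict[str, list[str]]) -> list[str]:
    queue, loaded = deque([entry]), []   # step 1: start with the program file
    while queue:                         # step 5: repeat while non-empty
        elf = queue.popleft()            # step 2: take the first file (FIFO)
        if elf in loaded:                # step 3: load only if not yet mapped
            continue
        loaded.append(elf)
        queue.extend(deps.get(elf, []))  # step 4: enqueue its dependencies
    return loaded

deps = {"app": ["libfoo.so", "libc.so"], "libfoo.so": ["libc.so"]}
assert load_order("app", deps) == ["app", "libfoo.so", "libc.so"]
```

The assertion shows the point made below: for a fixed set of binaries, the order is always the same.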


This algorithm means that the order of loading is always deterministic and can be reproduced if all the required libraries (their binary files) are known. This makes it possible to recover the addresses of all libraries if the address of any single library is known:

  1. Assume that the address of the libc library is known.
  2. Add the length of the libc library to the libc loading address—this is the loading address of the library that was loaded before libc.
  3. Continuing in the same manner, we obtain mmap_base values and addresses of the libraries that were loaded before libc.
  4. Subtract from the libc address the length of the library loaded after libc. This is the address of the library loaded after libc.
  5. Iterating in the same manner, we obtain the addresses of all libraries that were loaded at program start with the ld interpreter.
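This recovery can be sketched as follows (library names, sizes, and the libc address are invented; real mapping sizes would come from the distribution's packaged binaries):

```python
def recover_bases(libc_base: int, layout: list[tuple[str, int]]) -> dict[str, int]:
    # layout: (name, mapping size) pairs ordered top-down as in the maps
    # output, i.e. highest address first.
    names = [n for n, _ in layout]
    size = dict(layout)
    i = names.index("libc")
    bases, addr = {"libc": libc_base}, libc_base
    for j in range(i - 1, -1, -1):       # libraries loaded before libc sit
        addr += size[names[j + 1]]       # above it: add the size of the
        bases[names[j]] = addr           # library just below each one
    addr = libc_base
    for j in range(i + 1, len(names)):   # libraries loaded after libc sit
        addr -= size[names[j]]           # below it: subtract their own size
        bases[names[j]] = addr
    return bases

layout = [("ld", 0x27000), ("libtinfo", 0x26000), ("libc", 0x1c0000)]
bases = recover_bases(0x7f0000000000, layout)
assert bases["libtinfo"] == 0x7f0000000000 + 0x1c0000
assert bases["ld"] == bases["libtinfo"] + 0x26000
```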


If a library is loaded while the program is running (for instance, via the dlopen function), its position in relation to other libraries may be unknown to attackers in some cases. For example, this may happen if there were mmap calls for which the size of allocated memory regions is unknown to attackers.

When it comes to exploiting vulnerabilities, knowledge of library addresses helps significantly: for instance, when searching for gadgets to build ROP chains. What's more, if any library contains a vulnerability that allows reading or writing values relative to the library address, such a vulnerability will be easily exploited, since the libraries are sequential.

Most Linux distributions contain compiled packages with the most widespread libraries (such as libc). This means that the length of libraries is known, giving a partial picture of the distribution of virtual address space of a process in such a case.

Theoretically, one could build a large database for this purpose. For Ubuntu, it would contain versions of libraries including ld, libc, libpthread, and libm; for each version of a library, multiple versions of libraries necessary for it (dependencies) may be analyzed. So by knowing the address of one library, one can know possible map versions describing the distribution of part of the process address space.

Examples of such databases are libcdb.com and libc.blukat.me, which are used to identify libc versions based on offsets for known functions.

All this means that a fixed method of loading libraries is an application security problem. The behavior of mmap, described in the previous section, compounds the problem. In Android, this problem is solved in version 7 and later (16) (17).


4.3 Fixed order of execution


Programs have an interesting property: there is a pair of certain points in the execution thread between which the program state is predictable. For example, once a client has connected to a network service, the service allocates some resources to the client. Part of these resources may be allocated from the application heap. In this case, the relative position of objects in the heap is usually predictable.

This property is useful for exploiting applications, by "building" the program state required by an attacker. Here we will call this state a fixed order of execution.

In some cases of this property, there is a certain fixed point in the thread of execution at which, from launch to launch, the program state remains identical except for some variables. For example, before the main function is executed, the ld interpreter must load and initialize all the libraries and then initialize the program. As noted in Section 4.2, the relative position of libraries will always be the same. During execution of the main function, the differences will consist in the specific addresses used for program loading, libraries, stack, heap, and objects allocated in memory. These differences are due to the randomization described in Section 6.

As a result, an attacker can obtain information on the relative position of program data. This position is not affected by randomization of the process address space.

At this stage, the only possible source of entropy is contention between threads: if the program creates several threads, their races on shared data may introduce entropy into the placement of objects. Creating threads before the main function executes is possible via the global constructors of the program or of its required libraries.

When the program starts using the heap and allocating memory from it (usually with the help of new/malloc), the mutual position of objects in the heap will remain constant for each launch up to a certain moment.

In some cases, the positions of thread stacks and heaps will also be predictable relative to library addresses.

If needed, it is possible to obtain these offsets to use in exploitation. One way is to simply execute "strace -e mmap" for this application twice and compare the difference in addresses.

4.4 Holes


If an application allocates memory with mmap and then frees up part of that memory, this can cause holes—free memory regions that are surrounded by occupied regions. Problems may come up if this free memory (hole) is again allocated for a vulnerable object (an object in whose processing the application exhibits a vulnerability). This brings us back to the problem of closely located objects in memory.

One illustrative example of such holes was found in the code for ELF file loading in the Linux kernel. When loading the ELF file, the kernel first reads the size of the file and tries to map it in full via do_mmap. Once the file has been fully loaded, the memory after the first segment is freed up. All following segments are loaded at a fixed address (MAP_FIXED) that is set relative to the first segment. All this is needed in order to load the entire file at the selected address and separate segments by rights and offsets in accordance with their descriptions in the ELF file. This approach can create memory holes if holes were present between segments in the ELF file.
In the same situation, during loading of an ELF file, the ld interpreter (GNU Libc) does not call munmap but changes permissions for the free pages (holes) to PROT_NONE, which forbids the process from having any access to these pages. This approach is more secure.

To fix the problem of ELF file loading and related holes, the Linux kernel features a patch implementing the same logic as in ld from GNU Libc (see Section 7.1).

4.5 TLS and thread stack


Thread Local Storage (TLS) is a mechanism whereby each thread in a multithread process can allocate locations for data storage (18). The mechanism is implemented differently on different architectures and operating systems. Here we consider the glibc implementation on x86-64; for x86, the differences are immaterial to the mmap problem in question.

In the case of glibc, mmap is also used to create TLS. This means that TLS is placed in the way already described here. If TLS is close to a vulnerable object, it can be altered.

What is interesting about TLS? In the glibc implementation, TLS is pointed to by the segment register fs (for the x86-64 architecture). Its structure is described by the tcbhead_t type defined in glibc source files:



This type contains the field stack_guard, which contains a so-called canary—a random or pseudorandom number for protecting an application from stack overflows (19).
This protection works in the following way: when a function is entered, a canary obtained from tcbhead_t.stack_guard is placed on the stack. At the end of the function, the value on the stack is compared to the reference value in tcbhead_t.stack_guard. If the two values do not match, the application reports an error and terminates.

Canaries can be bypassed in several ways:

  • If an attacker does not need to overwrite this value (20).
  • If an attacker has managed to read or anticipate this value, making a successful attack possible (20).
  • If an attacker can overwrite this value with a known one, making it possible to cause a stack overflow (20).
  • If an attacker can take control before the application terminates (21).

These bypasses highlight the importance of protecting TLS from being read or overwritten by an attacker.

Our research revealed that glibc has a problem in its TLS implementation for threads created with pthread_create. Suppose TLS must be set up for a new thread. After allocating memory for the thread stack, glibc initializes TLS in the upper addresses of that memory. On the x86-64 architecture considered here, the stack grows downward, which puts the TLS at the top of the stack. Subtracting a certain constant value from the TLS address yields the value used by the new thread for its stack register. The distance from the TLS to the stack frame of the function passed as an argument to pthread_create is less than one page. Now a would-be attacker does not need to guess or leak the canary value: the attacker can simply overwrite the reference value in TLS together with the value on the stack, bypassing the protection entirely. A similar problem was found in Intel ME (22).

4.6 malloc and mmap


When malloc is called, glibc sometimes uses mmap to allocate new memory if the requested size exceeds a certain threshold. In such cases, the memory will be allocated with mmap, so the resulting address will be close to libraries or other data allocated with mmap. Attackers pay close attention to mistakes in handling of heap objects, such as heap overflow, use after free (23), and type confusion (14).

An interesting behavior of the glibc library was found when a program uses pthread_create. At the first call of malloc from a thread created with pthread_create, glibc calls mmap to create a new heap for this thread. So, in this thread, all the addresses returned by malloc will be located close to the stack of this same thread. (For details, see Section 5.7.)

Some programs and libraries use mmap for mapping files to the address space of a process. The files may be used as, for example, cache or for fast saving (altering) of data on the drive.

Here is an abstract example: an application loads an MP3 file with the help of mmap. Let us call the load address mmap_mp3. Then the application reads, from the loaded data, the offset to the start of audio data. If the application fails to properly validate that offset, an attacker can craft an MP3 file that gains access to the memory region located after mmap_mp3.

4.7 MAP_FIXED and loading of ET_DYN ELF files


The mmap manual says the following regarding the MAP_FIXED flag:

MAP_FIXED

Don't interpret addr as a hint: place the mapping at exactly that address. addr must be a multiple of the page size. If the memory region specified by addr and len overlaps pages of any existing mapping(s), then the overlapped part of the existing mapping(s) will be discarded. If the specified address cannot be used, mmap() will fail. Because requiring a fixed address for a mapping is less portable, the use of this option is discouraged.

If the requested region with the MAP_FIXED flag overlaps existing regions, successful mmap execution will overwrite existing regions.

Therefore, if a programmer makes a mistake with MAP_FIXED, existing memory regions may be redefined.

An interesting example of such a mistake has been found both in the Linux kernel and in glibc.
As described in (24), ELF files are subject to the requirement that, in the Phdr header, ELF file segments must be arranged in ascending order of vaddr addresses:

PT_LOAD

The array element specifies a loadable segment, described by p_filesz and p_memsz. The bytes from the file are mapped to the start of the memory segment. If the segment's memory size (p_memsz) is larger than the file size (p_filesz), the "extra" bytes are defined to hold the value 0 and to follow the segment's initialized area. The file size may not be larger than the memory size. Loadable segment entries in the program header table appear in ascending order, sorted on the p_vaddr member.

However, this requirement is not checked. The current code for ELF file loading is as follows:


All segments are processed according to the following algorithm:

  1. Calculate the size of the loaded ELF file: the address of the end of the last segment minus the start address of the first segment.
  2. With the help of mmap, allocate memory of that size for the entire ELF file, thus obtaining the base address for ELF file loading.
  3. In the case of glibc, change access rights; if loading from the kernel, release the regions that create holes. Here the behavior of glibc and the Linux kernel differs, as described in Section 4.4.
  4. With the help of mmap and the MAP_FIXED flag, allocate memory for the remaining segments, using the address obtained when mapping the first segment plus the offset from the ELF file header.


This enables an intruder to create an ELF file, one of whose segments can fully overwrite an existing memory region—such as the thread stack, heap, or library code.

An example of a vulnerable application is the ldd tool, which is used to check whether required libraries are present in the system. The tool uses the ld interpreter. Taking advantage of the problem with ELF file loading just discussed, we succeeded in executing arbitrary code with ldd:


The issue of MAP_FIXED has also been raised in the Linux community previously (25). However, no patch has been accepted.



For informational purposes, the source code of this example is located in the folder evil_elf.

4.8 Cache of allocated memory


glibc maintains many different caches, of which two are interesting in the context of ASLR: the thread stack cache and the thread heap cache. The stack cache works as follows: on thread termination, the stack memory is not released but is instead placed in the corresponding cache. When creating a new thread stack, glibc first checks the cache; if it contains a region of the required length, glibc uses that region. In this case, mmap is not called, and the new thread reuses a previously used region at the same addresses. If an attacker has obtained the thread stack address and can control the creation and deletion of program threads, the attacker can use knowledge of that address for vulnerability exploitation. Further, if the application contains uninitialized variables, their values can also fall under the attacker's control, which in some cases may lead to exploitation.

The heap cache works as follows: on thread termination, its heap moves to the corresponding cache. When a heap is created again for a new thread, the cache is checked first. If the cache has an available region, this region will be used. In this case, everything about the stack in the previous paragraph applies here as well.

5. Examples


mmap is used in many other cases as well, which means this problem gives rise to a whole class of potentially vulnerable applications.
Here are some examples illustrating these problems.

5.1 Stacks of two threads


Using pthread_create, let us create two threads and calculate the difference between local variables of both threads. Source code:



Output after the first launch:



Output after the second launch:



As we can see, even though the addresses of the variables change between launches, the difference between them remains the same (labeled "Diff" in the output, alongside the address values). The example shows that vulnerable code in the stack of one thread may affect another thread or any neighboring memory region, whether or not ASLR is present.

5.2 Thread stack and large buffer allocated with malloc


Now, in the main thread of an application, let us allocate a large amount of memory with the help of malloc and launch a new thread. We then calculate the difference between the address obtained with malloc and the variable in the stack of the new thread. Here is the source code:



Output after the first launch:
Output after the second launch:



Again, the difference does not change. This example shows that when a large buffer allocated with malloc is processed, vulnerable code can affect the stack of a new thread despite ASLR protections.

5.3 mmap and thread stack


Here we will allocate memory with the help of mmap and launch a new thread with pthread_create. Then we calculate the difference between the address allocated with mmap and the address of the variable in the stack of the new thread. Here is the source code:


Output after the first launch:


Output after the second launch:





The difference remains unchanged. This example shows that when a buffer allocated with mmap is processed, vulnerable code can affect the stack of a new thread, regardless of ASLR.

5.4 mmap and TLS of the main thread


Let us allocate memory with mmap and obtain the TLS address of the main thread. Then we calculate the difference between the two and verify that the canary value on the main thread's stack matches the reference value in TLS. Here is the source code:



Output after the first launch:



Output after the second launch:





As seen here, the difference remains the same from launch to launch, and the canary values match. So if a suitable vulnerability is present, it is possible to alter the canary and bypass the protection: for example, a buffer overflow in the stack combined with a vulnerability that allows overwriting memory at an offset from the region allocated with mmap. In this example, the offset equals 0x85c8700. The example shows a method of bypassing both ASLR and the stack canary.


5.5 mmap and glibc


A similar example was discussed in Section 4.2, but here is one with a slightly different twist: let us allocate memory with mmap and obtain the difference between this address and the system and execv functions from the glibc library. Here is the source code:


Output after the first launch:



Output after the second launch:




As we can see, the difference between the allocated region and functions remains the same. This example shows a method of bypassing ASLR when vulnerable code interacts with a buffer allocated with mmap. The distances (in bytes) to library functions and data will remain constant, which can also be used for exploiting the application.

5.6 Buffer overflow in child thread stack


Let us create a new thread and overflow a stack buffer up to the TLS value. If there are no command-line arguments, we will not overwrite the TLS canary; otherwise, we will. This argument logic is simply a way of showing the difference in the program's behavior.
The overwriting is done with the 0x41 byte. Here is the source code:



In this example, protection was successful in detecting the stack overflow and terminating the application with an error before an attacker could seize control. Now we overwrite the reference canary value:



In the second example, we successfully overwrite the canary and execute the pwn_payload function with launch of the sh interpreter.

This example shows a method of bypassing stack overflow protections. To carry out exploitation successfully, an attacker needs to overwrite a sufficient number of bytes in order to overwrite the canary reference value. In this example, the attacker needs to overwrite at least 0x7b8+0x30, or 2024, bytes.

5.7 Thread stack and small buffer allocated with malloc


Let us now create a thread, allocate some memory with malloc, and calculate the difference from the local variable in this thread. Source code:



The first launch:




And the second launch:





In this case, the difference was not the same. Nor will it remain the same from launch to launch. Let us consider the reasons for this.


The first thing to note: the malloc-derived pointer address does not correspond to the process heap address.

glibc creates a new heap for each thread created with the help of pthread_create. The pointer to this heap lies in TLS, so each thread allocates memory from its own heap, which improves performance, since there is no need to synchronize threads when malloc is used concurrently.
But why then is the address "random"?

When allocating a new heap, glibc uses mmap; the size depends on the configuration. In this case, the heap size is 64 MB, and the heap start address must be aligned to 64 MB. So the system first allocates 128 MB and then carves out a 64 MB-aligned piece within that range; the unaligned remainders are released, creating a "hole" between the heap address and the closest region previously allocated with mmap.
Randomness is introduced by the kernel itself when selecting mmap_base: that address is not aligned to 64 MB, nor were the mmap allocations performed before the malloc call in question.
Regardless of why the address alignment is required, it leads to a very interesting effect: bruteforcing becomes possible.

The Linux kernel defines the process address space for x86-64 as "47 bits minus one guard page", which for simplicity we will round to 2^47 (omitting the one-page subtraction in our size calculations). 64 MB is 2^26, so the significant bits equal 47 – 26 = 21, giving us a total of 2^21 various heaps of secondary threads.

This substantially narrows the bruteforcing range.

Because the mmap address is selected in a known way, we can assume that the heap of the first thread created with pthread_create will be aligned to 64 MB and close to the upper end of the address range. To be more precise, it will be close to all the loaded libraries, mapped files, and so on.
In some cases, it is possible to calculate the total amount of memory allocated before the call to the malloc in question. In our case, we loaded only glibc and ld and created a stack for the thread. So this value is small.

Section 6 will show how the mmap_base address is selected. For now, here is some additional information: mmap_base is selected with 28 to 32 bits of entropy, depending on kernel settings at compile time (28 bits by default), so the upper boundary is offset by that same amount.
Thus, in many cases, the upper 7 bits of the address will equal 0x7f, and in rare cases 0x7e. That gives us another 7 bits of certainty. In total, there are 2^14 possible options for the heap of the first thread; the more threads are created, the smaller that number becomes for each subsequent heap.

Let us illustrate this behavior with the following C code:



Then let us launch the program a sufficient number of times with Python code for collecting address statistics:




The script launches the simple './t' program, which creates a new thread, a sufficient number of times.
The program allocates a buffer with malloc and prints the buffer address. Once each run completes, the script reads the printed address and counts how many times each address was encountered across runs. The script collects a total of 16,385 different addresses, which equals 2^14 + 1. This is the number of attempts an attacker would need, in the worst case, to guess the heap address of the program in question.

There is another option—thread stack and large buffer allocated with malloc—but this is rather similar to the one described above. The only difference is that if the buffer size is too large, mmap is called again, so it is difficult to say where the allocated region will be placed: it may fill a hole or stand in front of the heap.

5.8 Stack cache and thread heaps


In this example, we create a thread and allocate memory with malloc, recording the addresses of the thread stack and of the pointer obtained with malloc, and initializing a stack variable with the value 0xdeadbeef. We then terminate the thread, create a new one, and again allocate memory with malloc. We compare the addresses, along with the value of the variable, which this time is left uninitialized. Here is the source code:



Output:



As clearly seen, the addresses of local variables in the stacks of consecutively created threads remain the same, as do the addresses of buffers allocated for them via malloc; some values of the first thread's local variables are still accessible to the second thread. An attacker can use this to exploit uninitialized-variable vulnerabilities (26). Although the cache speeds up the application, it also enables attackers to bypass ASLR and carry out exploitation.

6. Mapping process address space


When a new process is created, the kernel follows the algorithm below to determine its address space:

  1. After a call to execve, the virtual memory of the process is completely cleared.
  2. This creates the very first vma, which describes the process stack (stack_base). Initially, its address is selected as 2^47 – pagesize (where pagesize is the page size; on x86-64 it equals 4096), then it is shifted by a random value random1 of up to 16 GB (this happens quite late, after the binary file base is selected, so some interesting effects are possible: if the application binary file occupies all of memory, the stack will end up next to the base address of the binary file).
  3. The kernel selects mmap_base, the address relative to which all libraries will later be loaded into the process address space. The address is determined as stack_base – random2 – 128 MB, where random2 is a random value whose upper boundary depends on the kernel configuration and ranges from 1 TB to 16 TB.
  4. The kernel tries to load the program binary file. If the file is PIE (loadable at an arbitrary base address), the base address is (2^47 – 1) * 2/3 + random3, where random3 is likewise determined by the kernel configuration and has an upper boundary of 1 TB to 16 TB.
  5. If the file needs dynamically loaded libraries, the kernel tries to load an interpreter to load all the required libraries and perform all initializations. Usually, the interpreter in ELF files is ld from glibc. The address is selected in relation to mmap_base.
  6. The kernel sets the new process heap as the end of the loaded ELF file plus a certain random4 value with an upper boundary of 32 MB.

After these stages, the process is launched. The start address is either the one from the ELF file of the interpreter (ld) or the one from the ELF file of the program if there is no interpreter (a statically linked ELF).

If ASLR is on and if there is a possibility of loading at an arbitrary address, the program file process will look as follows:



Each library, being loaded with the interpreter, will get control if a list of global constructors is defined in it. In this case, library functions for allocating resources (global constructors) required for this library will be called.

Thanks to the known sequence of library loading, it is possible to obtain a certain point in the program execution thread that allows "building" memory regions in terms of their relative locations to one another, regardless of whether ASLR is present. Increasing knowledge about the libraries, their constructors, and program behavior will push this point further away from the point of process creation.

To determine specific addresses, one still needs a vulnerability that allows obtaining the address of some mmap region, or reading (writing) memory relative to a particular mmap region:

  • If an attacker knows the address of some mmap region that was allocated from the start of the process to the constant execution point (Section 4.3), the attacker can successfully calculate mmap_base and the address of any loaded library or any other mmap region.
  • If it is possible to address relative to a certain mmap region from the point of constant execution, it is not necessary to know any additional address.


To prove the feasibility of mapping process memory this way, we wrote Python code simulating the kernel's behavior when searching for new regions. The method of loading ELF files and the order of library loading were recreated as well. To simulate a vulnerability that allows reading library addresses, the /proc file system was used: the script reads the ld address (thus recovering mmap_base) and, knowing the libraries, reconstructs the process memory map. It then compares the result with the original: the script completely reproduced the address space of all processes. Script code is available at: https://github.com/blackzert/aslur

6.1 Attack vectors


Let us review some vulnerabilities that have already become "classic" because of their prevalence.
1. Heap buffer overflows: There are various well-known vulnerabilities in application operation with the glibc heap, as well as methods for exploiting them. We can categorize these vulnerabilities under two types: they either allow modifying memory relative to the address of the vulnerable heap, or allow modifying memory addresses known to the attacker. In some cases, it is possible to read arbitrary data from objects on the heap. This fact gives rise to several vectors:

• In case of modifying (reading) memory relative to an object in the heap, we are primarily interested in the heap of the thread created with pthread_create, because the distance from it to any library (stack) of the thread will be less than the distance from the heap of the main thread.
• In case of reading (writing) memory relative to some address, it is first of all necessary to try to read the addresses from the heap itself, as they usually contain pointers to vtable or to libc.main_arena.

Knowledge of the libc.main_arena address yields the glibc address and, subsequently, the mmap_base address. To obtain the vtable address, it is required to know the address of either some library (and hence mmap_base as well) or the program loading address. An attacker who knows the program loading address can read library addresses from the .got.plt section containing addresses for required library functions.

2. Buffer overflow:

• At the stack, it leads to the canary scenario in question.
• At the heap, it leads to the scenario described in #1.
• At the mmap region, it leads to overwriting neighboring regions, depending on the context.

7. Fixes


In this article, we have reviewed several problems; now we can consider fixes for some of them. Let us start with the simplest solutions and then proceed to more complicated ones.

7.1 Hole in ld.so


As shown in Section 4.4, the ELF interpreter loader in the Linux kernel contains an error that allows releasing part of the interpreter library memory. A relevant fix was proposed to the community but was left without action:

https://lkml.org/lkml/2017/7/14/290

7.2 Order of loading ELF file segments

As noted above, neither the kernel nor the glibc library checks ELF file segments: the code simply trusts that they are in the correct order. Proof-of-concept code, as well as a fix, is enclosed: https://github.com/blackzert/aslur

The fix is quite simple: we go through the segments and ensure that the current one does not overlap the next one, and that the segments are sorted in the ascending order of vaddr.

7.3 Use of mmap_min_addr when searching for mmap allocation addresses

Once a fix was written so that mmap returns addresses with sufficient entropy, a problem arose: some mmap calls failed with a permission error. This happened even for root, or for the kernel itself when executing execve.

In the address selection algorithm (described earlier in Section 3), one of the listed options is checking addresses for security restrictions. In the current implementation, this check verifies that the selected address is larger than mmap_min_addr. This is a system variable that can be changed by an administrator through sysctl. The system administrator can set any value, and the process cannot allocate a page at an address less than this value. The default value is 65536.

The problem was that when the address search function for mmap was called on x86-64, the Linux kernel used 4096 as the minimal lower boundary, which is less than mmap_min_addr. The cap_mmap_addr function forbids the operation if the selected address falls between 4096 and mmap_min_addr.

cap_mmap_addr is called implicitly; this function is registered as a hook for security checking. This architectural solution raises questions: first, we choose the address without having the ability to test it with external criteria, and then we check its permissibility in accordance with the current system parameters. If the address does not pass the check, then even if the address is selected by the kernel, it can be "forbidden" and the entire operation will end with the EPERM error.

An attacker can use this fact to cause denial of service in the entire system: if the attacker can specify a very large value, no user process can start in the system. Moreover, if the attacker manages to store this value in the system parameters, then even rebooting will not help—all created processes will be terminated with the EPERM error.

Currently, the fix is to use the mmap_min_addr value as the lowest allowable address when making a request to the address search function. Such code is already used for all other architectures.
What happens if the system administrator changes this value on a running machine? This question remains unanswered: all new allocations after the change may fail with EPERM, yet no program code expects such an error or knows what to do with it. The mmap documentation states the following:

"EPERM The operation was prevented by a file seal; see fcntl (2)."

That is, according to the documentation, the kernel cannot return EPERM for MAP_ANONYMOUS, although in fact it can.

7.4 mmap


The main mmap problem discussed here is the lack of entropy in address selection. Ideally, the logical fix would be to select memory randomly: first build a list of all free regions of suitable size, then select a random region from that list and, within it, an address meeting the search criteria (the requested length and the allowable lower and upper boundaries).

To implement this logic, the following approaches can be applied:

1. Keep the list of free gaps in a descending-order array. Choosing a random element then takes a single operation, but maintaining the array requires many operations on every memory release (allocation) that changes the current virtual address space map of the process.
2. Keep the list of free gaps in a tree and a list, so as to find the boundary satisfying the length requirement and select a random element from it. If the element does not fit the minimum/maximum address restrictions, select the next one, and so on until one is found (or none remain). This approach requires complex list and tree structures similar to those that already exist for vma, updated on every address space change.
3. Use the existing structure of the augmented red-black vma tree to walk the list of allowed gaps and select a random address. In the worst case, each choice will have to traverse all the nodes, but rebuilding the tree incurs no additional performance penalty.

Our choice went to the last approach. We can use the existing vma organizational structure without adding redundancy and select an address using the following algorithm:
1. Use the existing algorithm to find a possible gap void with the largest valid address. Also, record the structure of vma following it. If there is no such structure, return ENOMEM.
2. Record the found gap as the result and vma as the maximum upper boundary.
3. Take the first vma structure from the doubly linked list. It will be a leaf in the red-black tree, because it has the smallest address.
4. Make a left-hand traversal of the tree from the selected vma, checking the permissibility of the free region between the vma in question and its predecessor. If the free region is allowed by the restrictions, obtain another bit of entropy; if the entropy bit is 1, redefine the current candidate gap.
5. Return a random address from the selected gap.
One way to optimize the fourth step of the algorithm is not to descend into subtrees whose gap extension size is smaller than the required length.

This algorithm selects an address with sufficient entropy, although it is slower than the current implementation.
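To make the selection logic concrete, here is a toy model in Python. It works on a flat, sorted list of mapped regions rather than the kernel's augmented red-black vma tree, and simply picks a qualifying gap uniformly at random; the function name and page size are illustrative assumptions, not kernel code.

```python
import random

PAGE = 0x1000  # assumed page size for the sketch

def pick_random_gap(vmas, length, low, high):
    """Toy model of the proposed allocator: enumerate every free gap
    between mapped regions (vmas, sorted by address, all within
    [low, high]) that can hold `length` bytes, then pick a random gap
    and a random page-aligned address inside it."""
    edges = [low] + [e for vma in vmas for e in vma] + [high]
    # Pair up (end of one region, start of the next) to get the gaps.
    gaps = [(s, e) for s, e in zip(edges[::2], edges[1::2])
            if e - s >= length]
    if not gaps:
        raise MemoryError("ENOMEM: no gap large enough")
    start, end = random.choice(gaps)
    slots = (end - start - length) // PAGE + 1
    return start + random.randrange(slots) * PAGE
```

The kernel patch instead traverses the vma tree once and decides with a fresh entropy bit whether to replace the current candidate gap, avoiding any separate gap list.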

An obvious drawback is the need to visit all vma structures that bound a sufficiently large gap. However, this is offset by the absence of any slowdown when the address space changes.

8. Testing fixes for ASLR


After applying the described fixes to the kernel, the process /bin/less looks as follows:



As seen in the example:

  1. All the libraries were allocated in random locations and are at a random distance from one another.
  2. The file /usr/lib/locale/locale-archive mapped with mmap is also located at random addresses.
  3. The hole in /lib/x86_64-linux-gnu/ld-2.26.so is not filled with any mmap mapping.


This patch was tested on Ubuntu 17.04 with Google Chrome and Mozilla Firefox running. No problems were found.

9. Conclusion


This research has demonstrated many interesting features of the kernel and glibc with respect to handling program code. The problem of closely located memory regions was articulated and considered in detail. The following problems were found:

  • The algorithm for choosing the mmap address does not contain entropy.
  • Loading of ELF files in the kernel and interpreter contains a segment processing error.
  • When searching for an address with do_mmap, the kernel does not take into account mmap_min_addr on the x86-64 architecture.
  • Loading an ELF file in the kernel allows creating memory holes in the program ELF file and the ELF file interpreter.
  • When allocating memory for libraries with mmap, the GNU Libc ELF interpreter (ld) loads them at mmap_base-dependent addresses. In addition, libraries are loaded in a fixed order.
  • When allocating thread stacks, heaps, and TLS with mmap, the GNU Libc library likewise places them at mmap_base-dependent addresses.
  • The GNU Libc library places the TLS of threads created with pthread_create at the top of the thread stack, which allows bypassing stack buffer overflow protections by overwriting the canary.
  • The GNU Libc library caches previously allocated heaps (stacks) of threads, which, in some cases, allows successful exploitation of a vulnerable application.
  • The GNU Libc library creates heaps for new threads aligned to 2^26, which substantially narrows the brute-force search range.


These problems help an attacker to bypass ASLR or protections against stack buffer overflows. For some of these problems, fixes (in the form of kernel patches) have been proposed here.
Proof-of-concept code has been presented for all problems mentioned. An algorithm ensuring sufficient entropy for address selection is proposed. The same approach can be used to analyze ASLR on other operating systems such as Windows and macOS.
A number of peculiarities of the GNU Libc implementation were reviewed; in some cases, these peculiarities inadvertently facilitate exploitation of vulnerable applications.

References

1. Erik Buchanan, Ryan Roemer, Stefan Savage, Hovav Shacham. Return-Oriented Programming: Exploits Without Code Injection. [Online] August 2008. https://www.blackhat.com/presentations/bh-usa-08/Shacham/BH_US_08_Shacham_Return_Oriented_Programming.pdf.
2. xorl. Linux Kernel ASLR Implementation. [Online] https://xorl.wordpress.com/2011/01/16/linux-kernel-aslr-implementation/.
3. Reed Hastings, Bob Joyce. Purify: Fast Detection of Memory Leaks and Access Errors. [Online] December 1992. https://web.stanford.edu/class/cs343/resources/purify.pdf.
4. Improper Restriction of Operations within the Bounds of a Memory Buffer. [Online] https://cwe.mitre.org/data/definitions/119.html.
5. AMD Bulldozer Linux ASLR Weakness: Reducing Entropy by 87.5%. [Online] http://hmarco.org/bugs/AMD-Bulldozer-linux-ASLR-weakness-reducing-mmaped-files-by-eight.html.
6. Dmitry Evtyushkin, Dmitry Ponomarev, Nael Abu-Ghazaleh. Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR. [Online] http://www.cs.ucr.edu/~nael/pubs/micro16.pdf.
7. Hector Marco-Gisbert, Ismael Ripoll. Offset2lib: Bypassing Full ASLR on 64-bit Linux. [Online] https://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html.
8. Hector Marco-Gisbert, Ismael Ripoll-Ripoll. ASLR-NG: ASLR Next Generation. [Online] 2016. https://cybersecurity.upv.es/solutions/aslr-ng/ASLRNG-BH-white-paper.pdf.
9. Doubly Linked List. [Online] https://en.wikipedia.org/wiki/Doubly_linked_list.
10. Bayer, Rudolf. Symmetric Binary B-Trees: Data Structure and Maintenance Algorithms. [Online] January 24, 1972. https://link.springer.com/article/10.1007%2FBF00289509.
11. Lespinasse, Michel. mm: use augmented rbtrees for finding unmapped areas. [Online] November 5, 2012. https://lkml.org/lkml/2012/11/5/673.
12. Integer Overflow or Wraparound. [Online] https://cwe.mitre.org/data/definitions/190.html.
13. Classic Buffer Overflow. [Online] https://cwe.mitre.org/data/definitions/120.html.
14. Incorrect Type Conversion or Cast. [Online] https://cwe.mitre.org/data/definitions/704.html.
15. CVE-2014-9427. [Online] https://www.cvedetails.com/cve/CVE-2014-9427/.
16. Security Enhancements in Android 7.0. [Online] https://source.android.com/security/enhancements/enhancements70.
17. Implement Library Load Order Randomization. [Online] https://android-review.googlesource.com/c/platform/bionic/+/178130/2.
18. Thread-Local Storage. [Online] http://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Thread-Local.html.
19. One, Aleph. Smashing The Stack For Fun And Profit. [Online] http://www.phrack.org/issues/49/14.html#article.
20. Fritsch, Hagen. Buffer Overflows on linux-x86-64. [Online] April 16, 2009. http://www.blackhat.com/presentations/bh-europe-09/Fritsch/Blackhat-Europe-2009-Fritsch-Buffer-Overflows-Linux-whitepaper.pdf.
21. Litchfield, David. Defeating the Stack Based Buffer Overflow Prevention. [Online] September 8, 2003. https://crypto.stanford.edu/cs155old/cs155-spring05/litch.pdf.
22. Maxim Goryachy, Mark Ermolov. How to Hack a Turned-Off Computer, or Running Unsigned Code in Intel Management Engine. [Online] https://www.blackhat.com/docs/eu-17/materials/eu-17-Goryachy-How-To-Hack-A-Turned-Off-Computer-Or-Running-Unsigned-Code-In-Intel-Management-Engine-wp.pdf.
23. Use After Free. [Online] https://cwe.mitre.org/data/definitions/416.html.
24. Executable and Linkable Format (ELF). [Online] http://www.skyfree.org/linux/references/ELF_Format.pdf.
25. Hocko, Michal. mm: introduce MAP_FIXED_SAFE. [Online] https://lwn.net/Articles/741335/.
26. Use of Uninitialized Variable. [Online] https://cwe.mitre.org/data/definitions/457.html.

The First Rule of Mobile World Congress Is: You Do Not Show Anyone Your Mobile World Congress Badge


The biggest event of the telecom industry attracted particularly wide media coverage this year: the King of Spain personally arrived in Barcelona for the opening of the annual Mobile World Congress (MWC 2018), prompting a wave of protests by supporters of the region's independence from Madrid. As a result, newspaper front pages and prime-time TV are filled with high-tech and telecom innovations against a backdrop of protesting crowds. For their own security, participants and visitors are advised not to wear their Congress badges outside the venue.


The Mobile World Congress has been held annually for more than 30 years.


Among the participants are mobile operators, manufacturers of all kinds of communication devices, application developers, and even auto giants and international payment systems. Without exaggeration, all industry players try to time their long-awaited announcements for the date of the event. The unspoken motto is: if you have your place in the mobile world, even if a small one, you must be in Barcelona! Even Apple, which is traditionally absent from shows, including the MWC, makes its presence felt here: when journalists describe new versions of gadgets by dozens of Asian vendors, they occasionally allow themselves comparisons—"like Apple," "no worse than Apple," "just like Apple."

What immediately caught the eye this year is the abundance of robots:



and the dominance of automotive brands, including Mercedes, Audi, Smart:




There was even a Bentley on the Visa stand, symbolizing the connected-car concept.


Robots not only attract visitors' attention but also do useful work at the stands and serve as a reminder of the expanding possibilities of the Internet of Things. With cars, it is a different story: on the one hand, more and more of their components have some form of Internet access; on the other hand, visitors take little interest in this and mostly use the cars as photo props. Yet it is precisely the presence of such high-tech devices at such a high-profile event that should make one think about how it all really works, how safe it is when your car is connected to something out there, and, most importantly, what for. This is exactly where all the horror stories about hacked IoT gadgets come to mind; examples are plentiful, and security threats to connected cars have been detailed time and again. Incidentally, one of the halls had a Ferrari on display, covered with sponsors' logos, including Kaspersky Lab's, which is gratifying: such a reminder may finally make participants think seriously about the security of the mobile solutions they offer.


In general, the main topics of the MWC 2018 and its key words are best summarized on the stand of the French corporation Atos:


Literally everything is mentioned there, including blockchain.

As far as security is concerned, AV vendors were the highlight of the MWC, although the number of information security companies at the show ought to be several times higher; that realization will inevitably come soon. Among AV vendors, noteworthy are the already mentioned Kaspersky Lab, which devoted its participation in this year's show to the security of the Internet of Things, and Avast with its new Smart Life solution for IoT device security.



By the way, one wall of Kaspersky Lab's stand is devoted to video infographics showing how attackers exploit holes in IoT security and what they can really do. These are notorious real-world cases, which should convince vendors of the importance of building security in when launching their smart devices.

Given the lack of attention to information security, we at Positive Technologies could not stay away and decided to fill this gap with a special event for key experts in the telecom industry. My London colleagues and I told representatives of the largest telecom operators how hackers attack SS7 networks and what operators can do to protect themselves.


For the past three years, we have not only analyzed possible threats and vectors of attacks via mobile networks but also detected real attacks using PT Telecom Attack Discovery.

It is no secret that today cybercriminals are not only aware of the security flaws of signaling networks but also actively exploit them. Our monitoring shows that attackers spy on subscribers, intercept calls, bypass billing systems, and block users' access. Just one large operator with several tens of millions of subscribers is attacked more than 4,000 times daily.

Security monitoring projects in SS7 networks were conducted for large telecom operators in Europe and the Middle East. Attacks aimed at fraud, disruption of subscriber availability, and interception of subscriber traffic (including calls and text messages) totaled less than two percent; however, these are the most dangerous threats for users.

According to our research, 100 percent of attacks aimed at intercepting text messages are successful. Theft of security codes sent in this way is fraught with compromising e-banking and mobile banking systems, online stores, e-government portals, and many other services. Another type of attack—denial of service—is a threat to electronic IoT devices. Today, not only individual user devices are connected to mobile communication networks but also elements of smart city infrastructure, modern industrial enterprises, transport, energy, and other companies.

Fraud against the operator or subscribers is also a matter of serious concern. A substantial share of such attacks (81%) involves unauthorized sending of USSD requests. Such requests can transfer money from a subscriber's account, enable premium-rate services for a subscriber, or send phishing messages on behalf of a trusted service.

We raise this issue year after year: our task is to warn about real threats so that operators pay significantly more attention to security, and so that ordinary subscribers stay alert and do not fall prey to even basic social engineering. It is gratifying to see operators growing aware of the existing risks and drawing conclusions: in 2017, all analyzed networks used SMS Home Routing, and one in three networks had signaling traffic filtering and blocking enabled. But this is not enough. Today we still see that all the networks we analyzed are prone to vulnerabilities caused both by occasional equipment misconfiguration and by architectural flaws of SS7 signaling networks that cannot be eliminated with existing tools.

Countering criminals takes a comprehensive approach to security. It is necessary to regularly assess the security of the signaling network in order to discover existing vulnerabilities and develop measures to reduce the risk of threats being realized, and to keep security settings up to date afterwards. It is also important to continuously monitor and analyze messages that cross the network perimeter in order to detect potential attacks. This task can be performed by a threat detection and response system, which discovers illegitimate activity in its early stages and blocks suspicious requests. This approach provides a high level of protection without disrupting the normal functioning of the mobile network.

Author: Dmitry Kurbatov, Head of Telecommunications Security, Positive Technologies

How to assemble a GSM phone based on SDR


The smartphones so familiar to most of us contain an entire communication module separate from the main CPU. This module is what makes a "smartphone" a "phone." Regardless of whether the phone's user-facing operating system is Android or iOS, the module usually runs a proprietary closed-source operating system and handles all voice calls, SMS messages, and mobile Internet traffic.

Of course, open-source projects are more interesting to security researchers than closed-source ones. The ability to look under the hood and see how a particular program component works makes it possible to find and fix errors, plus verify that undocumented functionality is not present. As a pleasant bonus, access to source code helps novice developers to learn from colleagues and make contributions of their own.

OsmocomBB project


These considerations inspired creation of the Open Source Mobile Communications (Osmocom) project all the way back in 2008. Initially, the developers' attention was focused on OpenBSC, a project for running a self-contained cellular network on hefty commercial base stations. OpenBSC was first presented in 2008 at the Chaos Communication Congress annual conference.

Over time, Osmocom branched out into more projects. Today it serves as the umbrella for dozens of initiatives, one of which is OsmocomBB—a free and open-source implementation of the mobile-side GSM protocol stack. Unlike its predecessors, such as TSM30, MADos for Nokia 33XX, and Airprobe, OsmocomBB caught the attention of researchers and developers, and continues to be developed.

OsmocomBB was initially envisioned as a full-fledged firmware for open-source cell phones, including a GUI and other components, focusing on an alternative implementation of the GSM protocol stack. However, this idea did not catch on among potential users, so OsmocomBB today serves as an indispensable set of research tools and learner’s aid for those new to GSM.

With OsmocomBB, researchers can assess the security of GSM networks and investigate how the radio interface (Um interface) functions on cellular networks. What kind of encryption is used, if any? How often are encryption keys and temporary subscriber IDs changed? What is the likelihood that a voice call or SMS message will be intercepted or forged by an attacker? OsmocomBB allows quickly finding the answers to these and many other questions. Some of the many other uses include launching a small GSM base station, investigating the security of SIM cards, and sniffing traffic.

Similar to the case with the Aircrack-ng project and network cards, OsmocomBB's primary hardware platform consists of mobile phones based on the Calypso chipset. Generally speaking, these are Motorola C1XX phones. At the start of OsmocomBB development, it was decided to use the phones in the interest of saving time—otherwise, the process of designing and manufacturing new equipment could drag on indefinitely. Another reason was that some parts of the source code and specifications for the Calypso chipset had been leaked to the Internet, which gave a head start to reverse engineering of the firmware and subsequent development.

However, this expedient came at a price. Phones based on the Calypso chipset are no longer produced, forcing researchers to search for secondhand models. Moreover, some parts of the current implementation of the GSM stack physical layer are heavily based on a Digital Signal Processor (DSP). The code of this processor is proprietary and not fully known to the public. Both factors create roadblocks for OsmocomBB, reducing its potential and increasing the barrier to entry for developers and project users at large. As just one example, implementing GPRS support is impossible without changing the DSP firmware.


Breathing in new life with a new hardware platform


The OsmocomBB software consists of a number of applications, each of which has a specific purpose. Some applications run directly on a computer in any UNIX-like environment. Other applications are provided in the form of firmware that must be loaded on to the phone. The applications interact through the phone serial port, which is combined with the headset connector. In other words, an unremarkable TRS connector (2.5-mm microjack) can be used for transmitting both sound and data! Similar technology is used in smartphones in order to support headphone remotes, selfie sticks, and other accessories.

Lack of other interfaces (such as USB) and the need to use a serial port also impose certain limitations, particularly on the data transfer rate. The low bandwidth of the serial interface limits the ability to sniff traffic and run a base station. Moreover, a ready-made cable for connecting the phone to USB is difficult to find; in most cases this cable must be DIY'ed, raising the barrier to entry higher still.

Eventually, the combination of these difficulties gave rise to the idea of switching to a different hardware platform that would remove these software and hardware limits. Such a platform should be available to everyone, in terms of both physical availability and price. Due to rapid growth in popularity and availability, Software-Defined Radio (SDR) technology perfectly meets these requirements.

The essence of SDR is to develop general-purpose radio equipment not tied to a specific communication standard. Thanks to this, SDR has become very popular both among radio amateurs and manufacturers of commercial equipment. Today, SDR is actively used in cellular communications for deployment of GSM, UMTS, and LTE networks.

That said, the idea of using SDR to develop and launch a GSM mobile phone with OsmocomBB is not new. Osmocom developers worked on this but abandoned their efforts. A Swiss laboratory also attempted to do so, unfortunately never advancing beyond the proof-of-concept stage. Nonetheless, we decided to resume work in this direction by implementing support for a new SDR-based hardware platform for OsmocomBB. The platform is identical to the Calypso chipset in terms of backward compatibility, while also being more open to modifications.

The remainder of this article will describe the process of developing the new platform, the problems encountered, and the solutions we found. In the conclusion, we will share our results, limitations of the current implementation, ideas for further development, and advice for how to get OsmocomBB working on SDR.

Project history


As mentioned already, OsmocomBB includes two types of applications: some run on a computer and others are loaded on to the phone as part of alternative firmware. The two sides interact via osmocon, a small program that connects them to each other through the serial port. Interaction occurs using the simple L1CTL (GSM Layer 1 Control) binary protocol, which supports only three types of messages: request (REQ), confirmation (CONF), and indication (IND).

We decided to preserve this structure, as well as the protocol itself, for transparent compatibility with existing applications. This resulted in a new application, trxcon (short for "transceiver connection"), which serves as a bridge between high-level applications (such as mobile and ccch_scan) and the transceiver (a separate application that manages the SDR).



The transceiver is a separate program that performs low-level tasks of the GSM physical layer, such as time and frequency synchronization with the network, signal detection and demodulation, and modulation and transmission of the outgoing signal. Among ready-made solutions, there are two suitable projects: OsmoTRX and GR-GSM. The first is an improved modification of the transceiver from the OpenBTS project (it is used by Osmocom projects for running base stations), while the second provides a set of GNU Radio blocks for receiving and decoding GSM signals.

Despite the completeness of its implementation and out-of-the-box support for signal transmission, OsmoTRX with its cocktail of C and C++ will not please the developer who values clean, readable source code. For example, altering a few lines of code in OsmoTRX may require studying the entire class hierarchy, while GR-GSM offers incomparable modularity and freedom of modification.

Nevertheless, OsmoTRX still has a number of advantages. The most important of these are performance, low resource requirements, and the small size of the executable code and its dependencies. All this makes the project fairly friendly to embedding on systems with limited resources; by comparison, GNU Radio looks positively gluttonous. Development targeted OsmoTRX exclusively at first, but ultimately the choice was made to use GR-GSM as the transceiver.

To ensure backward compatibility, a TRX interface was implemented in trxcon. This interface is also used in the OsmoTRX, OsmoBTS, and OpenBTS projects. The interface uses three UDP sockets for each connection; each socket has a separate purpose. One of them is the CTRL interface, which allows controlling the transceiver (setting frequency, gain, and so on). The second one is called DATA—as the name implies, it exchanges information that needs to be transmitted (Uplink) or that has already been received (Downlink). The last socket, CLCK, is used to pass on timestamps from the transceiver.
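As a sketch of how a client of this interface might set up its endpoints, the following Python fragment creates the three UDP sockets. The base port number and the ordering of socket roles here are illustrative assumptions, not the exact OsmoTRX port layout.

```python
import socket

BASE_PORT = 5700  # assumed base port, for illustration only

def open_trx_sockets(host="127.0.0.1", base=BASE_PORT):
    """Create the CLCK, CTRL, and DATA UDP sockets used to talk to the
    transceiver.  Each socket has a single, separate purpose: clock
    indications, control commands, and burst data respectively."""
    socks = {}
    for offset, name in enumerate(("CLCK", "CTRL", "DATA")):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind((host, 0))          # any free local port for the sketch
        s.connect((host, base + offset))  # remote transceiver endpoint
        socks[name] = s
    return socks
```

A real client binds fixed local ports so the transceiver knows where to send replies; this sketch only illustrates the three-socket layout.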



We also implemented a new application, grgsm_trx, for GR-GSM. The application initializes the basic set of blocks (flow graph) and provides the TRX interface for an external control application, which in our case is trxcon. The flow graph initially consisted only of blocks for reception, that is, detection and demodulation of bursts—the smallest information pieces of the GSM physical interface. Every burst that is output by the demodulator is a bit sequence consisting mainly of a payload and midamble that allows the receiver to synchronize with the transmitter but (unlike a preamble) is located in the middle.
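The 148-bit layout of a GSM normal burst, with the 26-bit midamble in the middle flanked by two 57-bit payload halves, can be sketched as follows (a simplified Python model; the field names are ours):

```python
# GSM normal burst: 3 tail + 57 data + 1 stealing + 26 midamble
#                 + 1 stealing + 57 data + 3 tail = 148 bits.
TAIL, DATA, STEAL, MIDAMBLE = 3, 57, 1, 26

def split_normal_burst(bits):
    """Split a 148-bit normal burst into its fields (simplified sketch)."""
    assert len(bits) == 2 * (TAIL + DATA + STEAL) + MIDAMBLE  # 148
    fields, i = {}, 0
    for name, n in (("tail_l", TAIL), ("data_l", DATA), ("steal_l", STEAL),
                    ("midamble", MIDAMBLE),
                    ("steal_r", STEAL), ("data_r", DATA), ("tail_r", TAIL)):
        fields[name] = bits[i:i + n]
        i += n
    return fields
```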



At this point in development, high-level applications such as ccch_scan were already able to set the SDR to a certain frequency, launch the synchronization process with the base station, and demodulate the received signal. However, these initial successes were accompanied by difficulties. Since most of the implementation of the OsmocomBB physical layer previously relied on the phone DSP, encoding and decoding of packets according to GSM 05.03 specifications was not implemented separately—it was performed with proprietary code.

The newly implemented transceiver would pass bursts on to the upper layers, while the current implementation of the upper layers expected LAPDm byte packets (mostly 23 bytes each) from the physical layer. Moreover, the transceiver needed accurate Time Division Multiple Access (TDMA) synchronization with the base station, even though the high-level applications were completely unaware of this and transmitted outgoing packets whenever needed.

To fix this, we implemented a TDMA scheduler that accepts LAPDm packets from high-level applications, encodes them into bursts, and passes them to the transceiver, determining the transmission time using frame and timeslot numbers. The scheduler reassembles bursts arriving from the transceiver, decodes them and passes them on to the upper layers. According to GSM 05.03, coding and decoding involve, respectively, adding redundant data to information bits and then recovering LAPDm packets from those padded sequences by using the Viterbi algorithm.
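As an illustration of the GSM 05.03 channel coding the scheduler relies on, here is a minimal sketch of the rate-1/2 convolutional encoder with generator polynomials G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4 used for GSM control channels; decoding is the inverse problem solved by the Viterbi algorithm. This is a didactic model, not the libosmocoding implementation.

```python
def conv_encode(bits):
    """Rate-1/2 convolutional encoder (constraint length 5) with the
    GSM 05.03 polynomials G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4.
    Four zero tail bits flush the shift register."""
    state = [0, 0, 0, 0]  # D^1..D^4, most recent bit first
    out = []
    for b in bits + [0, 0, 0, 0]:
        g0 = b ^ state[2] ^ state[3]             # 1 + D^3 + D^4
        g1 = b ^ state[0] ^ state[2] ^ state[3]  # 1 + D + D^3 + D^4
        out += [g0, g1]
        state = [b] + state[:3]  # shift the new bit in
    return out
```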



It may sound confusing, but a similar process of encoding and decoding LAPDm packets takes place both on the mobile phone and on the base station. Fortunately, a free open-source implementation already existed, in the form of Osmocom Base Transceiver Station (OsmoBTS). This project's code related to GSM 05.03 was reworked, documented, and moved to libosmocore (the Osmocom core library) as a child library called libosmocoding. Thanks to this, many projects—including OsmocomBB, GR-GSM, and OsmoBTS—can all take advantage of this implementation without duplicating code. The TDMA scheduler itself was also implemented in a way similar to OsmoBTS, but taking mobile phone workings into account.

After this, receiving was successful! But the most important feature for the functioning of a mobile phone—data transmission—was still missing. The problem was that initially there were no blocks in GR-GSM for modulating and transmitting the signal. Fortunately, project author Piotr Krysik supported implementing this functionality and collaborated with us on it.

To avoid wasting time while data transmission was being worked on, we devised a temporary workaround which, as it turned out later, was a very useful solution in its own right: a set of tools for emulating the transceiver as a virtual Um interface. Since both OsmocomBB and OsmoBTS now support the TRX interface, the two projects can easily be interconnected: each Downlink burst from OsmoBTS is passed on to the trxcon application, and every Uplink burst from OsmocomBB is passed on to OsmoBTS. A simple Python application called FakeTRX allowed running a virtual GSM network without any equipment!



Thanks to this set of tools, a large number of bugs in implementation of the TDMA scheduler were subsequently found and fixed. Support for dedicated channels, such as SDCCH and TCH, was also implemented. The first type of GSM logical channels is mainly used for sending SMS messages, USSD requests and (sometimes) establishing voice calls. The second type is used for voice transmission during a call. The GSM Audio Packet Knife (GAPK) project helped to provide basic support in OsmocomBB for recording and encoding, as well as decoding and reproduction of sound, since this task was previously performed by the phone DSP.


Meanwhile, Piotr Krysik developed and successfully implemented all the missing blocks necessary for signal transmission. Since GSM uses Gaussian Minimum Shift Keying (GMSK) modulation, he used the existing GMSK Modulator block from GNU Radio. However, the main problem was to ensure synchronization with the base station. Each burst must be transmitted on time, that is to say, within the timeslot allocated by the base station. Timing buffers in the TDMA system allow compensating for small discrepancies. The situation was complicated by the lack of an accurate clock generator in most SDR devices, due to which the whole system would tend to drift.



The solution we found uses the hardware clock of SDR devices such as the USRP: incoming bursts are stamped with the current hardware time. By comparing these timestamps with the current frame number decoded from the SCH burst, it is possible to adjust and set the exact time for transmitting outgoing data. The only problem was that the standard GNU Radio blocks for interacting with SDR hardware do not support timestamps, so we had to replace them with UHD Source and Sink, restricting support to USRP devices.


As a result, when the transceiver was ready for operation, the time had come to venture beyond the virtual Um interface. But some glitch is always bound to come up during the first trial run of something new—and sure enough, our attempt to run the project on real equipment was unsuccessful. We had not taken into account one aspect of timing in GSM: the time count for the signal transmitted by the phone (Uplink) is specially delayed by three timeslots relative to the received signal (Downlink), which gives phones with a half-duplex communication module the time needed to perform frequency hopping. One small adjustment later, the project was up and running! For the first time, with the help of OsmocomBB and SDR, we could send an SMS message and make a voice call.
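The three-timeslot uplink offset can be modeled with a few lines of arithmetic (a simplified sketch; real timing also involves the hardware clock and timing advance, and the function name is ours):

```python
SLOTS_PER_FRAME = 8  # a GSM TDMA frame has 8 timeslots

def uplink_slot(fn, tn, offset=3):
    """Map a downlink (frame number, timeslot) pair to the frame and
    timeslot at which the phone actually transmits: the uplink count
    lags the downlink by three timeslots (simplified sketch)."""
    abs_slot = fn * SLOTS_PER_FRAME + tn + offset
    return divmod(abs_slot, SLOTS_PER_FRAME)  # -> (frame, timeslot)
```

For example, a burst scheduled for downlink timeslot 6 of frame 0 is sent one frame later, in timeslot 1 of the uplink count.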

Results


Thanks to this work, we managed to build a bridge of sorts between OsmocomBB and SDR transceivers functioning through the Universal Hardware Driver (UHD). We implemented the main components of the GSM physical layer that are necessary for high-level applications, such as ccch_scan, cbch_scan, and mobile. All our work was made available to the public in the OsmocomBB main repository.

Now, by using SDR as a hardware platform for OsmocomBB, it has become possible to run a completely transparent GSM protocol stack, free of closed-source proprietary components such as the DSP of Calypso-based phones, while allowing on-the-fly debugging and modification of each component. In addition, developers and researchers gain a number of opportunities, for example:
  • Running the network and phone in other frequency bands (such as 2.4 GHz)
  • Integrating alternative audio codecs (for example, Speex or Opus)
  • Implementing the GPRS / EGPRS stack

The tools for creating a virtual Um interface, which we referred to previously, were also published in the project repository. These tools are useful both for experienced developers (for example, for simulating load levels for various components of cellular network infrastructure and testing their stability) and for novice users, who can begin studying GSM in practice without the need to search for and purchase equipment.

However, the current implementation of the new hardware platform for OsmocomBB still has certain limitations, most of which stem from SDR technology itself. For example, most available SDR devices, such as USRP, UmTRX, and LimeSDR, have relatively low transmit power compared to the maximum transmit power of ordinary phones. Another gap is the lack of support for frequency hopping, in which the phone and base station switch carrier frequencies in a pseudorandom sequence, reducing interference and complicating signal interception. Frequency hopping is deployed on the networks of most operators; in addition, the GSM specifications make support for it mandatory in every phone. While the power problem can be solved with amplifiers or simply by using a laboratory base station, implementing frequency hopping will require much more effort.

Further development plans include:
  • Supporting physical (non-virtual) SIM cards
  • Introducing support for a wider range of SDR devices
  • Supporting Circuit-Switched Data (CSD)
  • Implementing an embedded transceiver based on OsmoTRX
  • Supporting GPRS / EDGE
The project was also presented at 34th annual Chaos Computer Club conference:



Conclusion: tips for getting started with your own SDR
Here is our advice for how to run GSM on your own SDR. To start with, we suggest experimenting with the virtual Um interface with the help of our TRX Toolkit:


In addition to OsmocomBB, you will need a complete set of core network infrastructure components from Osmocom: either OsmoNiTB (Network in The Box) or all components separately, including BTS, BSC, MSC, MGW, and HLR. Instructions for compiling the source code can be found on the project website, or you can use ready-made packages for Debian, Ubuntu, or OpenSUSE.

To test the implementation on your own network, you can use any available implementation of the GSM network stack, such as Osmocom, OpenBTS, or YateBTS. Launching your own network requires a separate SDR device or a commercial base station, such as a nanoBTS. Because of the limitations described above and other possible flaws, we strongly recommend not testing the project on actual operator networks!

To build the transceiver, you will need to install GNU Radio and compile a separate branch of the GR-GSM project from source code. For details on installing and using the transceiver, visit the Osmocom project website.

Good luck!

Author: Vadim Yanitskiy, Positive Technologies


We need to talk about IDS signatures


The names Snort and Suricata are known to all who work in the field of network security. WAF and IDS are two classes of security systems that analyze network traffic, parse top-level protocols, and signal the presence of malicious or unwanted network activity. Whereas a WAF helps web servers detect and avoid attacks targeted only at them, an IDS detects attacks in all network traffic.

Many companies install an IDS to control traffic inside the corporate network. The DPI mechanism lets them collect traffic streams, peer inside packets at the IP, HTTP, DCE/RPC, and other levels, and identify both the exploitation of vulnerabilities and network activity by malware.

At the heart of both systems are signature sets used for detecting known attacks, developed by network security experts and companies worldwide.
We at the @attackdetection team also develop signatures to detect network attacks and malicious activity. Later in the article, we'll discuss a new approach we discovered that disrupts the operation of Suricata IDS systems and then hides all traces of such activity.

How does an IDS work

Before plunging into the technical details of this IDS bypass technique and the stage at which it is applied, let's refresh our concept of the operating principle behind IDS.



First of all, incoming traffic is divided into TCP, UDP, or other traffic streams, after which the parsers mark and break them down into high-level protocols and their related fields, normalizing them, if required. The decoded, decompressed, and normalized protocol fields are then checked against the signature sets that detect network attack attempts or malicious packets in the network traffic.
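In miniature, this pipeline can be sketched as follows. Every name here is illustrative, and none of this is actual Suricata code; the signature patterns mirror the RoR YAML example discussed later in the article:

```python
# Toy model of the IDS pipeline: reassemble a stream, normalize the
# protocol fields, then match signatures. Real engines are far more
# involved; this only illustrates the three stages described above.
from urllib.parse import unquote

def reassemble(packets):
    """Join TCP segments into a single application-layer stream."""
    return b"".join(packets)

def normalize_http(stream):
    """Decode percent-encoding so signatures see canonical content."""
    return unquote(stream.decode("latin-1", errors="replace"))

def match(normalized, signatures):
    """Return ids of signatures whose substring patterns all hit."""
    return [sid for sid, patterns in signatures.items()
            if all(p in normalized for p in patterns)]

signatures = {2016204: ["type", "yaml", "!Ruby"]}
stream = reassemble([b"POST / HTTP/1.1\r\n\r\n", b"type=%21Ruby+yaml"])
print(match(normalize_http(stream), signatures))  # [2016204]
```

Note that the normalization stage is exactly where many evasion techniques aim: if the IDS decodes traffic differently from the real server, the signature sees different content.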

Incidentally, the signature sets are the product of numerous individual researchers and companies. Among the vendors are such names as Cisco Talos and Emerging Threats, and the open set of rules currently counts more than 20,000 active signatures.

Common IDS evasion methods

IDS flaws and software errors sometimes mean that attacks go unspotted in network traffic. The following are fairly well-known bypass techniques at the stream-parsing stage:
  • Non-standard fragmentation of packets, including at the IP, TCP, and DCERPC levels, which the IDS is sometimes unable to cope with.
  • Packets with borderline or invalid TTL or MTU values can also be incorrectly processed by the IDS.
  • Ambiguous overlapping of TCP segments (overlapping TCP sequence numbers) can be handled by the IDS differently than by the server or client for which the TCP traffic was intended.
  • A dummy TCP FIN packet with an invalid checksum (so-called TCP un-sync), for instance, can be interpreted as the end of the session instead of being ignored.
  • A different timeout time for the TCP session between the IDS and the client can also serve as a tool for hiding attacks.

As for the protocol-parsing and field-normalization stage, many WAF bypass techniques can be applied to an IDS. Here are just some of them:
  • HTTP double-encoding.
  • A Gzip-compressed HTTP packet without a corresponding Content-Encoding header might remain uncompressed at the normalization stage; this technique can sometimes be detected in malware traffic.
  • The use of rare encodings, such as Quoted-Printable for POP3/IMAP, can also render some signatures useless.

And don't forget about bugs specific to every vendor of IDS systems or third-party libraries inside them, which are available on public bug trackers.
One such bug, which disabled signature checks under certain conditions, was discovered by the @attackdetection team in Suricata; the error could be exploited to conceal attacks such as BadTunnel.

During this attack, the vulnerable client opens an HTML page generated by the attacker, establishing a UDP tunnel through the network perimeter to the attacker's server for ports 137 on both sides. Once the tunnel is established, the attacker is able to spoof names inside the network of the vulnerable client by sending fake responses to NBNS requests. Although three packets went to the attacker's server, it was sufficient to respond to just one of them to establish the tunnel.

The error stemmed from the fact that when the response to the client's first UDP packet was an ICMP packet, for example ICMP Destination Unreachable, the imprecise stream-tracking algorithm caused the stream to be checked only against ICMP signatures. Any further attacks, including name spoofing, went unspotted by the IDS, as they were carried out on top of the UDP tunnel. Despite the lack of a CVE identifier for this vulnerability, it led to the evasion of IDS security functions.


The above-mentioned bypass techniques are well known and have been eliminated in modern and long-developed IDS systems, while specific bugs and vulnerabilities work only for unpatched versions.

Since our team investigates network security and network attacks, and develops and tests network signatures first hand, we couldn't fail to notice bypassing techniques linked to the signatures themselves and their flaws.

Bypassing signatures

Wait a sec, how can signatures be a problem?

Researchers study emerging threats and form an understanding of how an attack can be detected at the network level on the basis of operational features or other network artifacts, and then translate the resulting picture into one or more signatures in an IDS-friendly language. Due to the limited capabilities of the system or researcher error, some methods of exploiting vulnerabilities remain undetected.

If the protocol and message format of a particular malware family or generation remain unchanged, the signatures for them work just fine. With exploits, however, the more complex and variable the protocol, the easier it is for an attacker to modify the exploit without losing functionality and so bypass the signatures.
Although you can find many decent signatures from different vendors for the most dangerous and high-profile vulnerabilities, other signatures can be evaded by simple methods. Here's an example of a very common signature error for HTTP: at times it's enough just to change the order of the HTTP GET arguments to bypass a signature check.


And you'd be right to think that substring checks with a fixed order of arguments are encountered in signatures, for example "?action=checkPort" or "action=checkPort&port=". All that's needed is to carefully study the signature and check whether it contains such hardcoded strings.
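A toy illustration of this flaw, using the hardcoded pattern from the example above; the detection logic is a deliberate oversimplification of a real content match:

```python
# A signature that hardcodes the order of HTTP GET arguments misses the
# same attack with the arguments swapped. The pattern string mirrors the
# example in the text; the "signature" is just a substring check.
PATTERN = "action=checkPort&port="

def naive_signature(query: str) -> bool:
    return PATTERN in query

original  = "action=checkPort&port=8080"
reordered = "port=8080&action=checkPort"   # same request, same effect

print(naive_signature(original))   # True  - detected
print(naive_signature(reordered))  # False - bypassed
```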

Other protocols and formats that are equally complex to check include DNS, HTML, and DCERPC, all of which have extremely high variability. Therefore, to cover all attack variations and develop signatures that are not only high-quality but also fast, the developer must possess wide-ranging skills and solid knowledge of network protocols.

The inadequacy of IDS signatures is old hat, and you can find plenty of other opinions in various reports: 1, 2, 3.

How much does a signature weigh

As already mentioned, signature speed is the developer's responsibility, and naturally the more signatures there are, the more scanning resources are required. The "golden mean" rule for Suricata recommends adding one CPU core per thousand signatures or per 500 Mbps of network traffic.
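This rule of thumb can be written as a quick calculation; the formulation below is our own reading of the guideline, not an official Suricata sizing formula:

```python
# Back-of-the-envelope core sizing based on the "golden mean" rule
# quoted above: one core per 1,000 signatures or per 500 Mbps.
import math

def cores_needed(num_signatures: int, traffic_mbps: float) -> int:
    by_signatures = num_signatures / 1000  # one core per 1,000 signatures
    by_traffic = traffic_mbps / 500        # one core per 500 Mbps
    return math.ceil(max(by_signatures, by_traffic))

# e.g. 20,000 active signatures on a 2 Gbps link:
print(cores_needed(20_000, 2_000))  # 20
```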




The required number of cores thus depends on the number of signatures and the volume of network traffic. Although this formula looks good, it leaves out the fact that signatures can be fast or slow, and traffic can be extremely diverse. So what happens when a slow signature encounters bad traffic?

Suricata is able to log data on the performance of signatures. The log gathers data on the slowest signatures, specifying execution time in ticks of CPU time and the number of checks performed. The slowest signatures are at the top.


The highlighted signatures are described as slow. The list is constantly updated; a different traffic profile would surely surface other signatures. This is because a signature generally consists of a set of simple checks, such as substring searches or regular expressions, arranged in a certain order. When checking a network packet or stream, the signature tests its entire contents in all valid combinations. As such, the tree of checks for one and the same signature can have more or fewer branches, and the execution time will vary depending on the traffic analyzed. One of the developer's tasks, therefore, is to optimize the signature to perform well on any kind of traffic.

What happens if the IDS is deployed without enough capacity to check all network traffic? Generally, if the load on the CPU cores averages more than 80%, the IDS is already starting to skip some packet checks. The higher the load on the cores, the more network traffic checks are skipped, and the greater the chance that malicious activity will go unnoticed.

What if we deliberately amplify this effect by making signatures spend too much time checking network packets? Such an attack would sideline the IDS by forcing it to skip packets, and with them, attacks. For starters, we already have a top list of hot signatures on live traffic, and we'll try to amplify the effect on them.

Let's operate

One of these signatures detects attempts to exploit CVE-2013-0156 (RoR YAML Deserialization Code Execution) in traffic.


All HTTP traffic directed to corporate web servers is checked for the presence of three strings in a strict sequence ("type", "yaml", "!Ruby") and then checked with a regular expression.

Before we set about generating "bad" traffic, I'll present some hypotheses that might help our investigation:

  • It's easier to find a matching substring than prove there is no such match.
  • For Suricata, checking with a regular expression is slower than searching for a substring.

This means that if we want long checks from a signature, these checks should be unsuccessful and use regular expressions.


To reach the regex check, the packet must contain all three substrings, one after the other.

Let's try combining them in this order and running the IDS to perform a check. To construct files with HTTP traffic in Pcap format from the text, I used the Cisco Talos file2pcap tool:


Another log, keyword_perf.log, shows that the chain of checks successfully made it through the substring checks (content matches: 3) to the regular expression (PCRE), which then failed (PCRE matches: 0). If we want to benefit from resource-intensive PCRE checks later, we need to fully parse the expression and craft some effective traffic.


The task of reverse-parsing a regular expression, although easy to do manually, is poorly automated because of constructions such as backreferences and named capture groups: I found no method at all for automatically generating a string that successfully matches a given regular expression.


The following construction was the minimum string required to match the expression. To test the theory that an unsuccessful search is more resource-intensive than a successful one, we trim the rightmost character from the string and run the regex again.


It turns out that the same principle also applies to regular expressions: the unsuccessful check took more steps than its successful counterpart. In this case, the difference was greater than 50%. You can see this for yourself.

Further study of this regular expression produced another eye-opener. If we repeatedly duplicate the minimum required string without the last character, it is reasonable to expect an increase in the number of steps taken to complete the check, but the growth curve is explosive:


The scan time for several dozen such strings is already around one second, and increasing their number risks a timeout error. This effect in regular expressions is called catastrophic backtracking, and there are many articles devoted to it. Such errors are still encountered in common products; for example, one was recently found in the Apache Struts framework.
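The effect is easy to reproduce with a classic textbook pattern; note that this is not the actual PCRE from the signature, just the standard demonstration of the phenomenon:

```python
# A classic example of catastrophic backtracking: the nested quantifiers
# in (a+)+$ force the engine to retry exponentially many split points
# when the match fails, while a successful match returns immediately.
import re
import time

pattern = re.compile(r"^(a+)+$")

def timed_match(s):
    start = time.perf_counter()
    result = pattern.match(s)
    return result, time.perf_counter() - start

ok, t_ok = timed_match("a" * 20)            # matches almost instantly
fail, t_fail = timed_match("a" * 20 + "b")  # fails only after ~2^20 retries

print(ok is not None, fail is None)  # True True
print(t_fail > t_ok)                 # True: the failed match is far slower
```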

Let's take the strings obtained and check them with Suricata:

  Keyword    Ticks     Checks    Matches
  --------   -------   -------   --------
  content    19135     4         3
  pcre       1180797   1         0

However, instead of catastrophic backtracking, the IDS barely notices the load: only about 1 million ticks. After debugging and examining the Suricata IDS source code and the libpcre library used inside it, I stumbled upon these PCRE limits:

  • MATCH_LIMIT_DEFAULT = 3500
  • MATCH_LIMIT_RECURSION_DEFAULT = 1500

These limits protect against catastrophic backtracking in many regex libraries. The same limits can be found in WAFs, where regex checks predominate. Sure, these limits can be changed in the IDS configuration, but they are the shipped defaults, and changing them is not recommended.
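To see why such a limit saves the engine, here is a sketch that emulates the pattern (a+)+$ with a step counter. This is a simplification of how libpcre actually counts internal match() calls, but the capping principle is the same:

```python
# Emulate (a+)+$ by trying every way to split the input into non-empty
# runs of "a", counting each attempt as one matching step and aborting
# once the step counter exceeds the match limit, as libpcre would.
MATCH_LIMIT = 3500  # libpcre's default, as quoted above

class LimitExceeded(Exception):
    """Raised when the emulated engine gives up."""

def match_repeated_runs(s, pos=0, steps=None):
    steps = steps if steps is not None else [0]
    steps[0] += 1
    if steps[0] > MATCH_LIMIT:
        raise LimitExceeded
    if pos == len(s):
        return True  # consumed the whole string with runs of "a"
    for end in range(len(s), pos, -1):  # greedy: try longest run first
        if set(s[pos:end]) == {"a"} and match_repeated_runs(s, end, steps):
            return True
    return False

print(match_repeated_runs("aaaa"))  # True: matches within the limit
try:
    match_repeated_runs("a" * 30 + "b")  # would need ~2^29 attempts
except LimitExceeded:
    print("gave up: match limit reached")
```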

Using only a regular expression won't help us achieve the desired result. But what if we use the IDS to check a network packet with this content?


In this case, we get the following log values:

  Keyword    Avg. Ticks   Checks   Matches
  --------   ----------   ------   -------
  content    3338         7        6
  pcre       12052        3        0

There were 4 checks, which became 7 only because of duplication of the initial string. Although the mechanism remains unclear, we should expect the number of checks to snowball if we further duplicate the strings. In the end, I got the following values:

  Keyword    Checks   Matches
  --------   ------   -------
  content    1508     1507
  pcre       1492     0

In total, the number of substring and regular-expression checks does not exceed 3,000, no matter what content the signature inspects. Clearly, the IDS itself also has an internal limiter, which goes by the name of inspection-recursion-limit and is set by default to that same figure of 3,000. Working within all the PCRE and IDS limits and the restriction on the size of content checked at a time, by modifying the content and using snowballing regex checks, you get the result you're after:

  Keyword    Avg. Ticks   Checks   Matches
  --------   ----------   ------   -------
  content    3626         1508     1507
  pcre       1587144      1492     0

Although the complexity of one regex check has not changed, the number of such checks has shot up to the 1500 mark. Multiplying the number of checks by the average number of clock cycles spent on each check, we get the coveted figure of 3 billion ticks.

  Num   Rule      Avg Ticks
  ----  -------   ----------
  1     2016204   3302218139

That's more than a thousand-fold increase! The attack requires only the curl utility to generate the minimal HTTP POST request. It looks something like this:
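The original screenshot of the curl command is not reproduced here; as a stand-in, here is a hedged Python sketch that builds an equivalent minimal POST request. The repeating unit and host are illustrative assumptions, not the author's actual payload:

```python
# Hypothetical reconstruction of such a request. The body repeats an
# almost-matching string so that the signature's substring checks keep
# succeeding while the final PCRE keeps failing.
unit = "%type yaml !Ruby"          # illustrative repeating pattern
body = unit * 200                  # ~3 KB, roughly one inspected chunk

request = (
    "POST / HTTP/1.1\r\n"
    "Host: target\r\n"             # hypothetical target host
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}"
)
print(len(body))  # 3200
```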



The minimum set of HTTP fields and HTTP body with a repeating pattern.
Such content cannot simply be made infinitely large to force the IDS to spend unbounded resources checking it: although the TCP segments are joined into a single stream, the stream and the reassembled HTTP packets are not checked in their entirety, however big they are. Instead, they are checked in small chunks of about 3-4 kilobytes. The size of the inspected segments, like the depth of the checks and everything else in the IDS, is set in the configuration. The segment size "wobbles" slightly from launch to launch to resist fragmentation attacks against segment boundaries, in which an attacker who knows the default segment size splits the network packets so that the attack straddles two neighboring segments and cannot be detected by the signature.

So, we just got our hands on a powerful weapon that loads the IDS with more than 3,000,000,000 CPU ticks per request. What does that even mean?

The figure obtained corresponds to roughly 1 second of average CPU operation. In other words, by sending an HTTP request of about 3 KB, we keep one IDS core busy for a full second. The more cores the IDS has, the more data streams it can process simultaneously.


Remember that the IDS does not sit idle: it generally spends some resources monitoring background network traffic, which lowers the attack threshold further.

Taking measurements on a working IDS configuration with 8 of 40 Intel Xeon E5-2650 v3 (2.30 GHz) CPU cores and no background traffic, the attack traffic needed to load all 8 cores to 100% turns out to be only 250 Kbps. And that's for a system designed to process a multi-gigabit network stream, i.e. thousands of times more.

To exploit this particular signature, an attacker need only send about 10 HTTP requests per second to the protected web server to gradually fill the IDS's network packet queue. When the buffer is full, packets start to bypass the IDS, and the attacker can use any tools or carry out arbitrary attacks while remaining unnoticed by the detection system. A constant flow of malicious traffic can disable the IDS for as long as it keeps bombarding the internal network, while for short-term attacks the attacker can send a brief spike of such packets and blind the detection system for a short period.
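The arithmetic behind these figures can be checked in a few lines; this is a restatement of the numbers above, not new measurements:

```python
# Back-of-the-envelope arithmetic for the described setup.
cores = 8               # IDS worker cores in the test configuration
request_kb = 3          # size of one malicious HTTP request
core_seconds = 1.0      # CPU time one such request burns

saturating_rps = cores / core_seconds   # requests/s to load all cores
attack_kbps = 10 * request_kb * 8       # ~10 req/s expressed in kilobits/s
print(saturating_rps, attack_kbps)      # 8.0 240, close to the ~250 Kbps quoted
```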

Current mechanisms are unable to detect slow signatures: although the IDS has profiling code, it cannot distinguish a signature that is merely slow from one that is catastrophically slow on particular traffic, and it cannot signal this automatically. Nor does the signature itself raise an alert, since the crafted traffic never actually contains matching content.

Do you remember the unexplained rise in the number of checks? There was indeed an IDS error that led to an increase in the number of superfluous checks. The vulnerability was given the name CVE-2017-15377 and has now been fixed in Suricata IDS 3.2 and 4.0.

The above approach works well for one specific signature, which is distributed as part of an open signature set and is usually enabled by default. But new signatures keep emerging at the top of the hot list, while others continue waiting for their traffic. The signature description language for Snort and Suricata supplies the developer with many handy tools, such as base64 decoding, content jumps, and mathematical operations, and other combinations of checks can also cause explosive growth in resource consumption. Careful monitoring of performance data can thus be a springboard for exploitation. After the CVE-2017-15377 problem was remedied, we again launched Suricata to check our network traffic and saw exactly the same picture: a list of the hottest signatures at the top of the log, but this time with different numbers. This suggests that such signatures, and ways to exploit them, are numerous.

Not only IDS but also antiviruses, WAFs, and many other systems are based on signature-search methods, so the same approach can be applied to search for weaknesses in their operation and stealthily prevent them from doing their job of detecting malicious activity. The related network activity cannot be detected by security tools or anomaly detectors. As an experiment, enable the profiling setting in your detection system and keep an eye on the top of the performance log.



Author: Kirill Shipulin of the @attackdetection team, Twitter | Telegram

Is your Mobile API under silent attack?


How well protected are your mobile apps? Pretty secure? What about the mobile API they rely on? This could be the weakest link in your AppSec armor. Data from Positive Technologies' customers suggests as much as 15% of all traffic to the average mobile API comes from illegitimate sources.


Data scraping that attacks your bottom line


The more you secure your mobile apps, the more hackers use automated scripts (bots) and modified or faked apps to bypass them and make direct calls to your mobile API.

Their target may be to scrape your data for competitive purposes, commit fraud, create fake accounts for phishing or spamming, or even to hijack your customers’ genuine accounts. All of these have serious implications for your revenue, costs and reputation.

So how come you’re not aware of this illegitimate traffic? Because your existing security systems probably can’t even see it!

Get Practical Bot-Mitigation Advice


On April 12, join our free webinar to learn how you can protect your business from mobile API bot attacks and prevent damage to your revenue and brand.

Get top tips for bot-mitigation from the Application Security experts at Positive Technologies and software analysis firm CriticalBlue.

They’ll explain why existing security technologies like WAFs and API gateways struggle to identify this kind of fake traffic. And they’ll introduce a new, joint solution combining visionary WAF technology from PT Application Firewall with antibot intelligence from Approov to deliver 360-degree protection for both the mobile and web channels of your business.

This webinar is designed for anyone who is concerned about application security and the risk to their business. We won’t assume you are already an AppSec expert, and you won’t need extensive technical knowledge to benefit from the team’s insights.


Take Part in PHDays 8 Online CTF


Positive Hack Days 8 will start in a couple of days, and we have lots of exciting stuff, and not only for participants who will visit the event at the venue. Two online contests (HackQuest and Competitive Intelligence) have already finished, but there is more to come.

From May 15 until May 22, the PHDays online CTF will take place. Everyone can participate. The challenges vary in difficulty and are mainly aimed at beginners, but skilled professionals will find them interesting as well.

There will be a path of 20 challenges covering the OWASP Top 10 vulnerabilities and representing different attack scenarios. The CTF will be held on the Avatao platform.

The link for registration is www.avatao.com/events/phdays2018 (will be active on May 15).

Apple fixed firmware vulnerability found by Positive Technologies

The vulnerability allowed exploiting a critical flaw in Intel Management Engine and may still be present in equipment from vendors that use Intel processors.

Apple released an update for macOS High Sierra 10.13.4, which fixes the firmware vulnerability CVE-2018-4251 found by Positive Technologies experts Maxim Goryachy and Mark Ermolov. For more details, see Apple Support.

Maxim Goryachy notes: "The vulnerability allows an attacker with administrator rights to gain unauthorized access to critical parts of firmware, write a vulnerable version of Intel ME, and exploit it to secretly gain a foothold in the device. Next, it is possible to obtain full control over the computer and spy with no chance of being detected."

Manufacturing Mode

Intel ME has a Manufacturing Mode designed to be used exclusively by motherboard manufacturers. This mode provides additional capabilities, which an attacker can turn to their advantage. The risk posed by this mode and its impact on the operation of Intel ME have been discussed by many researchers, including Positive Technologies experts (How to Become the Sole Owner of Your PC), but numerous manufacturers still do not disable this mode.

When operating in Manufacturing Mode, Intel ME allows the execution of a specific command, after which the ME region becomes writable via the SPI controller built into the motherboard. With the ability to run code on the attacked system and send commands to Intel ME, an attacker can rewrite the Intel ME firmware to another version, including a version vulnerable to CVE-2017-5705, CVE-2017-5706, and CVE-2017-5707, and execute arbitrary code on Intel ME even if the system is patched.

This mode is enabled on MacBooks as well. Although the firmware itself is additionally protected from SPI flash region rewriting attacks (if write access to any region is open, the firmware does not allow the OS to boot), the researchers found an undocumented command that restarts Intel ME without restarting the main system, which allows bypassing this protection. And not only Apple computers can be attacked this way.

Positive Technologies has developed a special utility that checks the status of Manufacturing Mode. You can download it using this link. If the check shows that the mode is on, we recommend asking your computer's manufacturer for instructions on how to turn it off. The utility is designed for systems based on Windows and Linux. Apple users only need to install the above-mentioned update.

Intel Management Engine 

Intel Management Engine is a microcontroller integrated into the Platform Controller Hub (PCH) microchip with a set of built-in peripherals. The PCH manages almost all communication between the processor and peripherals; therefore, Intel ME has access to almost all data on the computer. The researchers found a flaw that allows executing unsigned code inside the PCH on any motherboard for Skylake and later processors.

The extent of the problem

Vulnerable Intel chipsets are used all over the world, from home and work laptops to enterprise servers. The update previously released by Intel does not prevent exploitation of vulnerabilities CVE-2017-5705, CVE-2017-5706, and CVE-2017-5707, because with write access to the ME region, an attacker can write a vulnerable version of ME firmware and exploit a vulnerability in it.

Intel patches new ME vulnerabilities

$
0
0

In early July, Intel issued security advisories SA-00112 and SA-00118 regarding fixes for vulnerabilities in Intel Management Engine. Both advisories describe vulnerabilities with which an attacker could execute arbitrary code on the Minute IA PCH microcontroller.

The vulnerabilities are similar to ones discovered by Positive Technologies security experts last November (SA-00086). But that was not the end of the story, as Intel has now released fixes for additional vulnerabilities in ME.

What happened?


CVE-2018-3627, the vulnerability at issue in advisory SA-00118, is described as a logic bug (not a buffer overflow) that may allow execution of arbitrary code. Ease of exploitation makes this vulnerability more dangerous than the one in SA-00086, which was exploitable only in the case of OEM configuration errors; here, an attacker simply needs local access.

Things are even worse with CVE-2018-3628, which is described in advisory SA-00112. This vulnerability enables full-blown remote code execution in the AMT process of the Management Engine. Moreover, all signs indicate that—unlike CVE-2017-5712 in advisory SA-00086—attackers do not need an AMT administrator account.

Intel characterizes the vulnerability as "Buffer overflow in HTTP handler" allowing remote code execution without authorization. This is the very scenario that used to be the stuff of nightmares for Intel users—and now has come to pass. This vulnerability is similar to CVE-2017-5689, which was found in May 2017 by Embedi, but with even worse consequences.

What now?


Perhaps the only consolation is that for CVE-2018-3628, Intel says that exploitation is possible only from the same subnet.

Positive Technologies plans to study these vulnerabilities more closely in future research. Notably, Intel indicates the same "resolved" firmware versions for the vulnerabilities as for SA-00086. In other words, it is possible that these latest vulnerabilities were found during security review of Intel ME code at the same time as SA-00086, but Intel delayed publication in order to head off the alarm and disruption that could have followed from packing such a large number of critical vulnerabilities in SA-00086.

More on Intel ME security:
