Skip to content

[LLDB] Add ScalarLiteralNode and literal parsing in DIL #152308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
3 changes: 3 additions & 0 deletions lldb/docs/dil-expr-lang.ebnf
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,9 @@ identifier = ? C99 Identifier ? ;

integer_literal = ? Integer constant: hexademical, decimal, octal, binary ? ;

numeric_literal = ? Integer constant: hexademical, decimal, octal, binary ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use 'integer_literal' here, rather than repeating the integer literal definition?

| ? Floating constant ? ;

register = "$" ? Register name ? ;

nested_name_specifier = type_name "::"
Expand Down
52 changes: 52 additions & 0 deletions lldb/include/lldb/ValueObject/DILAST.h
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,9 @@ enum class NodeKind {
eArraySubscriptNode,
eBitExtractionNode,
eErrorNode,
eFloatLiteralNode,
eIdentifierNode,
eIntegerLiteralNode,
eMemberOfNode,
eUnaryOpNode,
};
Expand Down Expand Up @@ -178,6 +180,52 @@ class BitFieldExtractionNode : public ASTNode {
int64_t m_last_index;
};

enum class IntegerTypeSuffix { None, Long, LongLong };

class IntegerLiteralNode : public ASTNode {
public:
IntegerLiteralNode(uint32_t location, llvm::APInt value, uint32_t radix,
bool is_unsigned, IntegerTypeSuffix type)
: ASTNode(location, NodeKind::eIntegerLiteralNode),
m_value(std::move(value)), m_radix(radix), m_is_unsigned(is_unsigned),
m_type(type) {}

llvm::Expected<lldb::ValueObjectSP> Accept(Visitor *v) const override;

const llvm::APInt &GetValue() const { return m_value; }
uint32_t GetRadix() const { return m_radix; }
bool IsUnsigned() const { return m_is_unsigned; }
IntegerTypeSuffix GetTypeSuffix() const { return m_type; }

static bool classof(const ASTNode *node) {
return node->GetKind() == NodeKind::eIntegerLiteralNode;
}

private:
llvm::APInt m_value;
uint32_t m_radix;
bool m_is_unsigned;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, as I don't know if this will look cleaner, but I am tempted to merge m_value and m_is_unsigned into an llvm::APSInt (which is basically an APInt+bool pair)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer these fields represent the parsed suffices, independent from the value itself.

IntegerTypeSuffix m_type;
};

class FloatLiteralNode : public ASTNode {
public:
FloatLiteralNode(uint32_t location, llvm::APFloat value)
: ASTNode(location, NodeKind::eFloatLiteralNode),
m_value(std::move(value)) {}

llvm::Expected<lldb::ValueObjectSP> Accept(Visitor *v) const override;

const llvm::APFloat &GetValue() const { return m_value; }

static bool classof(const ASTNode *node) {
return node->GetKind() == NodeKind::eFloatLiteralNode;
}

private:
llvm::APFloat m_value;
};

/// This class contains one Visit method for each specialized type of
/// DIL AST node. The Visit methods are used to dispatch a DIL AST node to
/// the correct function in the DIL expression evaluator for evaluating that
Expand All @@ -195,6 +243,10 @@ class Visitor {
Visit(const ArraySubscriptNode *node) = 0;
virtual llvm::Expected<lldb::ValueObjectSP>
Visit(const BitFieldExtractionNode *node) = 0;
virtual llvm::Expected<lldb::ValueObjectSP>
Visit(const IntegerLiteralNode *node) = 0;
virtual llvm::Expected<lldb::ValueObjectSP>
Visit(const FloatLiteralNode *node) = 0;
};

} // namespace lldb_private::dil
Expand Down
9 changes: 9 additions & 0 deletions lldb/include/lldb/ValueObject/DILEval.h
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,15 @@ class Interpreter : Visitor {
Visit(const ArraySubscriptNode *node) override;
llvm::Expected<lldb::ValueObjectSP>
Visit(const BitFieldExtractionNode *node) override;
llvm::Expected<lldb::ValueObjectSP>
Visit(const IntegerLiteralNode *node) override;
llvm::Expected<lldb::ValueObjectSP>
Visit(const FloatLiteralNode *node) override;

llvm::Expected<CompilerType>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave a comment saying that this is not copied from any particular integer promotion rules, and specific to the DIL DSL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not integer promotion, just type inference from a literal's value and suffixes. Technically, right now it's exactly C99 rules, so I can add a comment that as of now this follows C99 rules.

PickIntegerType(lldb::TypeSystemSP type_system,
std::shared_ptr<ExecutionContextScope> ctx,
const IntegerLiteralNode *literal);

// Used by the interpreter to create objects, perform casts, etc.
lldb::TargetSP m_target;
Expand Down
4 changes: 3 additions & 1 deletion lldb/include/lldb/ValueObject/DILLexer.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,14 @@ class Token {
arrow,
coloncolon,
eof,
floating_constant,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I prefer 'float_constant' to 'floating_constant'. But I don't feel strongly about this.

identifier,
integer_constant,
l_paren,
l_square,
minus,
numeric_constant,
period,
plus,
r_paren,
r_square,
star,
Expand Down
3 changes: 3 additions & 0 deletions lldb/include/lldb/ValueObject/DILParser.h
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,9 @@ class DILParser {
std::string ParseIdExpression();
std::string ParseUnqualifiedId();
std::optional<int64_t> ParseIntegerConstant();
ASTNodeUP ParseNumericLiteral();
ASTNodeUP ParseIntegerLiteral();
ASTNodeUP ParseFloatingPointLiteral();

void BailOut(const std::string &error, uint32_t loc, uint16_t err_len);

Expand Down
9 changes: 9 additions & 0 deletions lldb/source/ValueObject/DILAST.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,13 @@ BitFieldExtractionNode::Accept(Visitor *v) const {
return v->Visit(this);
}

llvm::Expected<lldb::ValueObjectSP>
IntegerLiteralNode::Accept(Visitor *v) const {
return v->Visit(this);
}

llvm::Expected<lldb::ValueObjectSP> FloatLiteralNode::Accept(Visitor *v) const {
return v->Visit(this);
}

} // namespace lldb_private::dil
123 changes: 123 additions & 0 deletions lldb/source/ValueObject/DILEval.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,9 @@
//===----------------------------------------------------------------------===//

#include "lldb/ValueObject/DILEval.h"
#include "lldb/Core/Module.h"
#include "lldb/Symbol/CompileUnit.h"
#include "lldb/Symbol/TypeSystem.h"
#include "lldb/Symbol/VariableList.h"
#include "lldb/Target/RegisterContext.h"
#include "lldb/ValueObject/DILAST.h"
Expand Down Expand Up @@ -402,4 +404,125 @@ Interpreter::Visit(const BitFieldExtractionNode *node) {
return child_valobj_sp;
}

static llvm::Expected<lldb::TypeSystemSP>
GetTypeSystemFromCU(std::shared_ptr<StackFrame> ctx) {
SymbolContext symbol_context =
ctx->GetSymbolContext(lldb::eSymbolContextCompUnit);
lldb::LanguageType language = symbol_context.comp_unit->GetLanguage();

symbol_context = ctx->GetSymbolContext(lldb::eSymbolContextModule);
return symbol_context.module_sp->GetTypeSystemForLanguage(language);
}

static CompilerType GetBasicType(lldb::TypeSystemSP type_system,
lldb::BasicType basic_type) {
if (type_system)
if (auto compiler_type = type_system.get()->GetBasicTypeFromAST(basic_type))
return compiler_type;

CompilerType empty_type;
return empty_type;
Comment on lines +419 to +424
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this boils down to

if (type_system)
  return type_system->GetBasicTypeFromAST(basic_type);
return {};

At which point, the only purpose of this function is to check for nullness of the type system. But I don't think you need that as you could check for its validity in the caller early on. After all, without a type system, you're not going to be able to create /any/ constant.

}

llvm::Expected<CompilerType>
Interpreter::PickIntegerType(lldb::TypeSystemSP type_system,
std::shared_ptr<ExecutionContextScope> ctx,
const IntegerLiteralNode *literal) {
// Binary, Octal, Hexadecimal and literals with a U suffix are allowed to be
// an unsigned integer.
bool unsigned_is_allowed = literal->IsUnsigned() || literal->GetRadix() != 10;
llvm::APInt apint = literal->GetValue();

// Try int/unsigned int.
llvm::Expected<uint64_t> int_size =
GetBasicType(type_system, lldb::eBasicTypeInt).GetBitSize(ctx.get());
if (!int_size)
return int_size.takeError();
if (literal->GetTypeSuffix() == IntegerTypeSuffix::None &&
apint.isIntN(*int_size)) {
if (!literal->IsUnsigned() && apint.isIntN(*int_size - 1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this handle INT_MIN correctly? -128 is fits into a 8 bit signed integer, but you need 8 bits to represent "128". Maybe you need isSignedIntN ?

return GetBasicType(type_system, lldb::eBasicTypeInt);
if (unsigned_is_allowed)
return GetBasicType(type_system, lldb::eBasicTypeUnsignedInt);
}
// Try long/unsigned long.
llvm::Expected<uint64_t> long_size =
GetBasicType(type_system, lldb::eBasicTypeLong).GetBitSize(ctx.get());
if (!long_size)
return long_size.takeError();
if (literal->GetTypeSuffix() != IntegerTypeSuffix::LongLong &&
apint.isIntN(*long_size)) {
if (!literal->IsUnsigned() && apint.isIntN(*long_size - 1))
return GetBasicType(type_system, lldb::eBasicTypeLong);
if (unsigned_is_allowed)
return GetBasicType(type_system, lldb::eBasicTypeUnsignedLong);
}
// Try long long/unsigned long long.
llvm::Expected<uint64_t> long_long_size =
GetBasicType(type_system, lldb::eBasicTypeLongLong).GetBitSize(ctx.get());
if (!long_long_size)
return long_long_size.takeError();
if (apint.isIntN(*long_long_size)) {
if (!literal->IsUnsigned() && apint.isIntN(*long_long_size - 1))
return GetBasicType(type_system, lldb::eBasicTypeLongLong);
// If we still couldn't decide a type, we probably have something that
// does not fit in a signed long long, but has no U suffix. Also known as:
//
// warning: integer literal is too large to be represented in a signed
// integer type, interpreting as unsigned [-Wimplicitly-unsigned-literal]
//
return GetBasicType(type_system, lldb::eBasicTypeUnsignedLongLong);
}
return llvm::make_error<DILDiagnosticError>(
m_expr,
"integer literal is too large to be represented in any integer type",
literal->GetLocation());
}

llvm::Expected<lldb::ValueObjectSP>
Interpreter::Visit(const IntegerLiteralNode *node) {
llvm::Expected<lldb::TypeSystemSP> type_system =
GetTypeSystemFromCU(m_exe_ctx_scope);
if (!type_system)
return type_system.takeError();

llvm::Expected<CompilerType> type =
PickIntegerType(*type_system, m_exe_ctx_scope, node);
if (!type)
return type.takeError();

Scalar scalar = node->GetValue();
// APInt from StringRef::getAsInteger comes with just enough bitwidth to
// hold the value. This adjusts APInt bitwidth to match the compiler type.
llvm::Expected<uint64_t> type_bitsize =
type->GetBitSize(m_exe_ctx_scope.get());
if (!type_bitsize)
return type_bitsize.takeError();
scalar.TruncOrExtendTo(*type_bitsize, false);
return ValueObject::CreateValueObjectFromScalar(m_target, scalar, *type,
"result");
}

llvm::Expected<lldb::ValueObjectSP>
Interpreter::Visit(const FloatLiteralNode *node) {
llvm::Expected<lldb::TypeSystemSP> type_system =
GetTypeSystemFromCU(m_exe_ctx_scope);
if (!type_system)
return type_system.takeError();

bool isFloat =
&node->GetValue().getSemantics() == &llvm::APFloat::IEEEsingle();
lldb::BasicType basic_type =
isFloat ? lldb::eBasicTypeFloat : lldb::eBasicTypeDouble;
CompilerType type = GetBasicType(*type_system, basic_type);

if (!type)
return llvm::make_error<DILDiagnosticError>(
m_expr, "unable to create a const literal", node->GetLocation());

Scalar scalar = node->GetValue();
return ValueObject::CreateValueObjectFromScalar(m_target, scalar, type,
"result");
}

} // namespace lldb_private::dil
59 changes: 44 additions & 15 deletions lldb/source/ValueObject/DILLexer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -28,18 +28,23 @@ llvm::StringRef Token::GetTokenName(Kind kind) {
return "coloncolon";
case Kind::eof:
return "eof";
case Kind::floating_constant:
return "floating_constant";
case Kind::identifier:
return "identifier";
case Kind::integer_constant:
return "integer_constant";
case Kind::l_paren:
return "l_paren";
case Kind::l_square:
return "l_square";
case Kind::minus:
return "minus";
case Kind::numeric_constant:
return "numeric_constant";
case Kind::period:
return "period";
return "l_square";
case Kind::plus:
return "plus";
case Kind::r_paren:
return "r_paren";
case Kind::r_square:
Expand Down Expand Up @@ -70,13 +75,34 @@ static std::optional<llvm::StringRef> IsWord(llvm::StringRef expr,
return candidate;
}

static bool IsNumberBodyChar(char ch) { return IsDigit(ch) || IsLetter(ch); }
static bool IsNumberBodyChar(char ch) {
return IsDigit(ch) || IsLetter(ch) || ch == '.';
}

static std::optional<llvm::StringRef> IsNumber(llvm::StringRef expr,
llvm::StringRef &remainder) {
if (IsDigit(remainder[0])) {
llvm::StringRef number = remainder.take_while(IsNumberBodyChar);
remainder = remainder.drop_front(number.size());
static std::optional<llvm::StringRef> IsNumber(llvm::StringRef &remainder,
bool &isFloat) {
llvm::StringRef tail = remainder;
llvm::StringRef body = tail.take_while(IsNumberBodyChar);
size_t dots = body.count('.');
if (dots > 1 || dots == body.size())
return std::nullopt;
if (IsDigit(body.front()) || (body[0] == '.' && IsDigit(body[1]))) {
isFloat = dots == 1;
char last = body.back();
tail = tail.drop_front(body.size());
if (last == 'e' || last == 'E' || last == 'p' || last == 'P') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be conditional on isFloat. (hex) integer numbers shouldn't consume the + after them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, 1e+1 and 0x1p+1 are valid double numbers... and also 1e1 and 0x1p1, which means I have to add more checks to the lexer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot about 1e+1. However, I still don't see how this would prevent 0xe+1 from being parsed as a number (when it should be parsed as an addition of two integers).

if (tail.consume_front("+") || tail.consume_front("-")) {
tail = tail.drop_while(IsNumberBodyChar);
isFloat = true;
}
}
size_t number_length = remainder.size() - tail.size();
llvm::StringRef number = remainder.take_front(number_length);
if (!isFloat) {
isFloat = number.contains('p') || // 0x1p1 = 2.0
(!number.contains('x') && number.contains('e')); // 1e1 = 10.0
}
remainder = remainder.drop_front(number_length);
return number;
}
return std::nullopt;
Expand Down Expand Up @@ -106,18 +132,21 @@ llvm::Expected<Token> DILLexer::Lex(llvm::StringRef expr,
return Token(Token::eof, "", (uint32_t)expr.size());

uint32_t position = cur_pos - expr.begin();
std::optional<llvm::StringRef> maybe_number = IsNumber(expr, remainder);
if (maybe_number)
return Token(Token::numeric_constant, maybe_number->str(), position);
bool isFloat = false;
std::optional<llvm::StringRef> maybe_number = IsNumber(remainder, isFloat);
if (maybe_number) {
auto kind = isFloat ? Token::floating_constant : Token::integer_constant;
return Token(kind, maybe_number->str(), position);
}
std::optional<llvm::StringRef> maybe_word = IsWord(expr, remainder);
if (maybe_word)
return Token(Token::identifier, maybe_word->str(), position);

constexpr std::pair<Token::Kind, const char *> operators[] = {
{Token::amp, "&"}, {Token::arrow, "->"}, {Token::coloncolon, "::"},
{Token::l_paren, "("}, {Token::l_square, "["}, {Token::minus, "-"},
{Token::period, "."}, {Token::r_paren, ")"}, {Token::r_square, "]"},
{Token::star, "*"},
{Token::amp, "&"}, {Token::arrow, "->"}, {Token::coloncolon, "::"},
{Token::l_paren, "("}, {Token::l_square, "["}, {Token::minus, "-"},
{Token::period, "."}, {Token::plus, "+"}, {Token::r_paren, ")"},
{Token::r_square, "]"}, {Token::star, "*"},
};
for (auto [kind, str] : operators) {
if (remainder.consume_front(str))
Expand Down
Loading
Loading