Skip to main content

DeAgent Development Log 1

· 6 min read

Currently, large models generally lack deep reasoning capabilities in the Web3 domain. In the DeAgent framework, the large model plays the most crucial role in the Lobe function. Most teams address this issue by either using RAG (Retrieval-Augmented Generation) or adding Web3 data during the fine-tuning phase. However, these methods do not significantly enhance the Lobe's trading and decision-making abilities in the Web3 space. To solve this problem, we have designed a Web3-specific transaction Playground that integrates both Onchain and OffChain environments.

I. Data Collection

(1) On-chain Data

1. Mainstream Token Transaction Data Domain

  • Data Sources: Obtain transaction data for mainstream tokens (e.g., BTC, ETH, USDT, etc.) via blockchain explorers (such as Etherscan).
  • Data Content: Includes transaction amount, transaction time, transaction addresses, transaction hash, gas fees, etc. For instance, for BTC, collect hourly transaction data for the past five years, including input/output addresses and amounts for each transaction.
  • Data Format: Stored in CSV or JSON format, with fields such as - timestamp, transaction hash, sender address, receiver address, transaction amount, transaction type (regular transaction / contract transaction).

2. Key Moments Macro Data

  • Data Sources: Gather macro data on key moments from major blockchain-related news outlets (e.g., CoinDesk, Coindance) and social media platforms (e.g., Twitter). For example, collect relevant reports during Ethereum's successful merge event.
  • Data Content: Includes descriptions of key events, occurrence time, relevant influencing factors, etc. For example, the market reaction before and after Ethereum's merge or the impact of policy changes on the cryptocurrency market.
  • Data Format: Text files storing event descriptions, with fields including event ID, event description, occurrence time, involved tokens, event type (e.g., technical upgrade, policy release).

(2) Off-chain Data

1. Hot Project Meme Twitter Data Domain

  • Data Sources: Use Twitter API to collect tweets related to popular Web3 projects.
  • Data Content: Includes tweet publish time, user ID, tweet content, retweets, likes, replies, etc. For instance, when a new version of Uniswap is released, gather discussions about the new features from users.
  • Data Format: Stored in JSON format, with fields including tweet ID, user ID, publish time, tweet content, retweets, likes, replies.

2. Data of Moments with Obvious Volatility Domain

  • Data Sources: Use financial data APIs (e.g., CoinGecko, Binance API) to collect data for tokens with notable popularity and high trading volume during moments of volatility.
  • Data Content: Includes price fluctuations, trading volume changes, transaction prices, transaction times, and trade direction. For instance, during a market crash, track Bitcoin's price movement from the high to the low.
  • Data Format: Stored in CSV format, with fields including - timestamp, token name, price, trading volume, 24-hour price change.

II. Model Training

(1) Format Training

1. Step-by-Step Thinking Training Domain

  • Training Data Construction: Organize the collected data into training samples in a specific format. For example, construct samples in the following format:
    <observations>
    Current BTC price is $40,000
    The Federal Reserve has announced a 50 basis point interest rate hike
    </observations>
    <thinking>
    The Federal Reserve's interest rate hike could cause funds to flow out of risk assets, potentially leading to a significant negative impact on Bitcoin's short-term price. However, in the long run, the market may gradually absorb this news, and Bitcoin's price might rebound.
    </thinking>
    <action>
    Buy 30%
    </action>
    <observations>
    The current BTC price has dropped to $38,000, and trading volume has surged
    </observations>
    <thinking>
    Despite the price drop, historical data suggests that within 24-48 hours after the interest rate hike announcement, it is often an opportune time to increase Bitcoin positions.
    </thinking>
    <action>
    Buy 50%
    </action>
  • Training Process: Use deepseek V3 or other pre-trained large models as the base model. Input the above-formatted data into the model for supervised fine-tuning. The goal is to teach the model how to make step-by-step decisions based on observed market information and produce corresponding actions (e.g., Buy/Sell).

2. Horizon Expansion Training Domain

  • Gradual Time Window Expansion: Gradually increase the time window of observed data during the training process. For example, start by training the model to make decisions based on only 1 hour of past data, then gradually expand to data from 1 day or 1 week.
  • Multi-Time Scale Integration: Integrate decisions made by the model over different time windows, allowing the model to consider both short-term and long-term market trends.

(2) Trajectory Collection and PRMs & ORM Training

1. Trajectory Collection Domain

  • Data Structured Storage: During interactions between the model and the Playground environment, store the model's thought process, observed data, and actions taken in a trajectory database. For example:
    timestamp: 2024-10-01 09:00:00
    observations: [{"BTC price": 38000, "News": "SEC announcement"}, ...]
    thinking: "Based on the SEC news release time and historical price behavior, I need to further analyze market reactions"
    action: {"buy": 0.3}
    next_observations: [{"BTC price": 37800, "Volume": 10000}, ...]
  • Trajectory Multi-Angle Analysis: Perform multi-dimensional analysis on the collected trajectories, such as the relationship between actions and returns, and the relationship between thinking processes and decision quality.

2. PRMs Training Domain

  • Designing the Reward Function: Based on trajectory data, design a process-based reward model (PRMs). For example, set reward weights for the following parameters:
    profit_loss_weight = 0.6
    thinking_rationality_weight = 0.4
    risk_weight = 0.2
  • Optimizing Reward Model Parameters: Continuously optimize the reward model parameters using reinforcement learning algorithms (e.g., PPO).

3. ORM Training Domain

  • Online Reinforcement Learning Integration: Dynamically combine reinforcement learning algorithms with trajectory data.
  • Memory Optimization Strategy: Optimize the model's memory mechanism to improve its ability to recall past decision-making experiences.

III. Evaluation & Optimization

(1) Evaluation Metrics Domain

1. Trading Profitability Domain

  • Profit Calculation: Calculate the total profit of the model's simulated trading in the Playground environment.
  • Sharpe Ratio: Measure the risk-adjusted return. The formula is: Sharpe Ratio=(Return−Risk-Free Rate)/Standard Deviation.

2. Reasoning Accuracy Domain

  • Thinking Process Quality Rating: Invite financial experts to rate the model's reasoning process based on criteria such as logical coherence and completeness of factors covered.
  • Action-Decision Consistency: Analyze the consistency between the model's actions and its thought process.

(2) Optimization Methods Domain

1. Weight Adjustment Domain

  • Fine-tuning Model Weights: Based on evaluation results, fine-tune the weights of various layers in the model.
  • Transfer Learning: Transfer knowledge from models trained in other domains to the Web3 trading decision model.

2. Reward Function Modification Domain

  • Adding Multiple Reward Factors: Adjust the reward function to include additional influencing factors, such as market liquidity and policy risks.
  • Dynamic Reward Adjustment: Dynamically adjust the reward function parameters based on market volatility.

Through these experimental approaches, we aim to systematically enhance the large model's trading and decision-making capabilities in the Web3 domain.