Migrating from WordPress to Hexo: Pitfalls and Fixes (November 2020 Edition)

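Ruan Yifeng says that people who enjoy blogging go through three stages:

Stage one: you have just discovered blogging, it all feels fresh, and you try writing on a free hosting platform.

Stage two: you find the free platform too restrictive, so you buy your own domain and hosting and set up an independent blog.

Stage three: you find managing an independent blog too much trouble, and would rather keep control while letting someone else run it, so that all you do is write.

I am probably in the third stage now. I started out writing on CSDN, but it went down fairly often and the pages were not pretty, so I moved to WordPress. After I started working I had even less time for blogging, and WordPress does not support Markdown: every time, I had to convert Markdown to HTML with pandoc and then paste it in. That was hard to maintain, and after a while I lost interest in writing. On top of that, the overseas server was slow to reach (and cost money), so I recently switched to Hexo. It has been working well, so here is a record of how I moved from WordPress to Hexo.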

Migration tool

Hexo provides a tool for migrating from WordPress, hexo-migrator-wordpress. It is used like this:

hexo migrate wordpress WordPress.2020-09-17.xml --paragraph-fix --import-image --skipduplicate

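The plugin has a few pitfalls, though, so I made some changes:

Handling <pre> tags

The original handling of code blocks can be problematic. Edit blog/node_modules/hexo-migrator-wordpress/migrator.js and add a rule: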

tomd.addRule('code_block', {
  // Match every <pre> element.
  filter: 'pre', //(node) => node.nodeName === 'PRE',
  // 'pre' || (node.innerHTML.toString().toLowerCase().indexOf('<pre ') != -1),
  replacement: function (content, node, options) {
    // Wrap the content in a fenced code block, with blank lines around it.
    return '\n\n' + options.fence + '\n' + content + '\n' + options.fence + '\n\n';
  }
});

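Preserving \n

By default turndown replaces \n with a space, which made the migrated posts look a bit odd. This can be fixed by editing the turndown source, turndown.cjs.js: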

var text = node.data.replace(/[ \r\n\t]+/g, ' ');
should be changed to
var text = node.data.replace(/[ \r\t]+/g, ' ');

Reference: https://github.com/domchristie/turndown/issues/264
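To see what this small change does, here is a Python sketch that mimics the two regular expressions. It only illustrates the regex behavior and is not turndown's code; the function names and the sample text are made up for the example.

```python
import re

def collapse_default(text):
    # Mirrors /[ \r\n\t]+/g: newlines are folded into a single space.
    return re.sub(r"[ \r\n\t]+", " ", text)

def collapse_keep_newlines(text):
    # Mirrors /[ \r\t]+/g: newlines are left untouched.
    return re.sub(r"[ \r\t]+", " ", text)

sample = "first line\nsecond   line\nthird\tline"  # made-up sample text
print(repr(collapse_default(sample)))        # 'first line second line third line'
print(repr(collapse_keep_newlines(sample)))  # 'first line\nsecond line\nthird line'
```

With the default pattern the three lines merge into one; with the modified pattern the line breaks of the original post survive.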

Rewriting the escape function in turndown.cjs.js

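escape: function (string) {
  return escapes.reduce(function (accumulator, escape) {
    return accumulator.replace(escape[0], function (s) {
      if (!s.startsWith('<pre')) {
        // Note: replace() returns a new string; its result is not assigned here,
        // so the matched text s is returned unchanged.
        s.replace(escape[0], escape[1]);
      }
      return s;
    })
  }, string)
}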

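Custom URLs

My old URLs were custom ones, which helps with SEO, whereas Hexo uses the file name as the URL by default, which did not suit me. So I added a custom field:

Add the line below before posts.push(data); when hexo calls the create function, the field is then added automatically: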

data.custom_url = parseUrl(link).pathname.replace(/\//g, '');

Then change the permalink field in _config.yml:

permalink: :custom_url/
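As a rough illustration of what ends up in custom_url, here is a Python sketch of the same transformation: take the path of the original post link and drop every slash. The function name and the example.com links are hypothetical, and non-ASCII paths are not covered.

```python
from urllib.parse import urlparse

def custom_url_from_link(link):
    # Keep only the path of the original post URL and remove every slash,
    # mirroring parseUrl(link).pathname.replace(/\//g, '').
    return urlparse(link).path.replace("/", "")

# Hypothetical links, for illustration only.
print(custom_url_from_link("https://example.com/about-me/"))         # about-me
print(custom_url_from_link("https://example.com/2020/11/my-post/"))  # 202011my-post
```

Note that a nested path is flattened into a single segment, so this works best when the old permalinks were a single slug.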

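The modified code is on GitHub: hrwhisper/hexo-migrator-wordpress

Converting image paths

To manage post images properly, you can set post_asset_folder in _config.yml and then run hexo migrate wordpress WordPress.2020-09-17.xml --paragraph-fix --import-image --skipduplicate. This also downloads the images of the old blog into source/_post/title_dic. However, that layout is not compatible with the Typora editor: Typora cannot display the images!

To work around this, the image paths in the posts can be changed to point into source/images. This is what I did: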

mv _post/* images
mv images/*.md _post
python3 image_path_fix.py

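The image_path_fix script mainly does the following:

Finds the image paths and rewrites them to the new location after the move
Sends a GET request to every image address in every file to check that it is valid (requires hexo s to be running)

The script is also in the hrwhisper/hexo-migrator-wordpress repository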
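The first step can be sketched roughly as follows. This is not the author's image_path_fix.py, only a minimal illustration under a few assumptions: images are referenced with the Markdown syntax ![alt](path), each post's images end up in source/images/<post folder>/ after the mv commands, and they are served from /images/<post folder>/. The names rewrite_image_paths and post_folder are made up for the example.

```python
import posixpath
import re

# Matches Markdown images of the form ![alt](target); link titles are not handled.
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")

def rewrite_image_paths(markdown, post_folder):
    """Point local image links at /images/<post_folder>/<file name>."""
    def fix(match):
        alt, target = match.group(1), match.group(2)
        # Leave remote images alone; only local paths are rewritten.
        if target.startswith(("http://", "https://", "//")):
            return match.group(0)
        file_name = posixpath.basename(target)
        return "![{}](/images/{}/{})".format(alt, post_folder, file_name)
    return IMAGE_RE.sub(fix, markdown)

sample = "![logo](my-post/logo.png) and ![remote](https://example.com/a.png)"
print(rewrite_image_paths(sample, "my-post"))
# ![logo](/images/my-post/logo.png) and ![remote](https://example.com/a.png)
```

The second step, requesting every rewritten address while hexo s is running, is left out here because it needs the local server.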

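Migrating comments

My old blog had a lot of comments, and it would have been a pity to lose them in the move to Hexo, so I migrated them too.

At first I installed the Disqus plugin in WordPress and imported into Valine with the code from https://github.com/taosky/disqus-to-valine. But the comments on WordPress pages were lost and the total was wrong (I had around 900 comments, and only 500-odd remained after the import).

So I decided to write my own. The idea is: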

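Read the comment data from MySQL
Save it through the LeanCloud API

The main thing to watch is that WordPress records replies with the comment_parent field, while Valine uses pid and rid, both of which are objectIds of LeanCloud comments. So I iterate over the comments in ascending comment_ID order and keep track of each one's objectId on LeanCloud; whenever comment_parent is not 0, I set pid to the objectId of the parent comment (the script writes the same value to rid).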
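The ordering argument can be tried offline with a small dry run. This is only a sketch: fake_save is a hypothetical stand-in for saving to LeanCloud and returns made-up objectIds, the sample comments are invented, and skipping a reply whose parent is missing is an addition of this example rather than something the description above specifies.

```python
def fake_save(payload):
    # Hypothetical stand-in for saving to LeanCloud: returns a made-up objectId.
    return "oid-{}".format(payload["comment_id"])

def migrate(comments, save):
    object_ids = {}  # WordPress comment_ID -> objectId of the saved copy
    saved = []
    # A reply is always created after its parent, so it has a larger comment_ID.
    for c in sorted(comments, key=lambda c: c["comment_ID"]):
        payload = {"comment_id": c["comment_ID"]}
        parent_oid = object_ids.get(c["comment_parent"])
        if c["comment_parent"] != 0 and parent_oid is not None:
            payload["pid"] = parent_oid
            payload["rid"] = parent_oid
        payload["objectId"] = save(payload)
        object_ids[c["comment_ID"]] = payload["objectId"]
        saved.append(payload)
    return saved

sample = [
    {"comment_ID": 3, "comment_parent": 1},
    {"comment_ID": 1, "comment_parent": 0},
    {"comment_ID": 5, "comment_parent": 3},
    {"comment_ID": 7, "comment_parent": 99},  # parent missing from the export
]
for row in migrate(sample, fake_save):
    print(row)
```

Because the comments are processed in ascending comment_ID order, the parent's objectId is already known by the time a reply is saved.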

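In addition, some post URLs changed during the migration, mainly ones that used to be in Chinese. I saved the URLs before and after the change in a CSV file, which the script reads to replace them. The CSV format is: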

messageboard,留言板,about-me

The code is also in the hrwhisper/hexo-migrator-wordpress repository:

from dataclasses import dataclass
import codecs
import pymysql
import leancloud

@dataclass
class CommentData:
    post_name: str
    comment_ID: int
    comment_post_ID: int
    comment_author: str
    comment_author_email: str
    comment_author_url: str
    comment_author_IP: str
    comment_date: str
    comment_date_gmt: str
    comment_content: str
    comment_karma: int
    comment_approved: int
    comment_agent: str
    comment_type: str
    comment_parent: int
    user_id: int
    comment_mail_notify: int

# Some post URLs changed during the migration: map old post_name -> new URL slug
title_change = {}
with codecs.open('title-change.csv', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:
            line = line.split(',')
            title_change[line[0]] = line[2]
print(title_change)

# Fetch every comment together with the post_name of its post.
# Keyword arguments are used because newer PyMySQL releases require them.
db = pymysql.connect(host="localhost", user="username", password="password", database="wordpress")
cursor = db.cursor()
cursor.execute("SELECT p.post_name, c.* FROM `wp_comments` c , `wp_posts` p WHERE c.comment_post_ID = p.ID")
results = cursor.fetchall()
comment_list = []
for row in results:
    data = CommentData(*row)
    # comment_date comes back as a datetime; store it as an ISO 8601 string
    data.comment_date = data.comment_date.isoformat()
    if data.post_name in title_change:
        data.post_name = title_change[data.post_name]
    comment_list.append(data)
db.close()

"""
{
  "nick": "12515",
  "ip": "39.155.192.81",
  "ACL": {
    "*": {
      "read": true
    }
  },
  "mail": "125125",
  "ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36",
  "insertedAt": {
    "__type": "Date",
    "iso": "2020-11-05T15:18:14.128Z"
  },
  "pid": "5fa417827e502a242397edc3",
  "link": "112",
  "comment": "1111\n",
  "url": "/about-me/",
  "QQAvatar": "",
  "rid": "5fa417827e502a242397edc3",
  "objectId": "5fa417b6ab05356ee26b9bfc",
  "createdAt": "2020-11-05T15:18:14.503Z",
  "updatedAt": "2020-11-05T15:18:14.503Z"
}
"""

leancloud.init("APPID", "APPKEY")  # app ID and app key of the LeanCloud application
TestObject = leancloud.Object.extend('Comment')
# WordPress comment_ID -> objectId of the comment saved on LeanCloud
comment_object_id = {}
# Ascending comment_ID, so a parent is always saved before its replies
comment_list.sort(key=lambda x: x.comment_ID)
for comment in comment_list:
    test_object = TestObject()
    test_object.set('nick', comment.comment_author)
    test_object.set('insertedAt', {
        "__type": "Date",
        "iso": comment.comment_date
    })
    test_object.set('status', 1)
    test_object.set('comment', comment.comment_content)
    test_object.set('comment_id', comment.comment_ID)
    test_object.set('mail', comment.comment_author_email)
    test_object.set('ua', comment.comment_agent)
    test_object.set('ip', comment.comment_author_IP)

    test_object.set('url', '/{}/'.format(comment.post_name))
    test_object.fetch_when_save = True
    if comment.comment_parent != 0:
        # Valine links replies through pid/rid, which hold LeanCloud objectIds
        test_object.set('pid', comment_object_id[comment.comment_parent])
        test_object.set('rid', comment_object_id[comment.comment_parent])
    try:
        test_object.save()
        comment_object_id[comment.comment_ID] = test_object.get('objectId')
    except leancloud.LeanCloudError as e:
        print('error', e, comment)

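MathJax

Hexo's MathJax support is not particularly good either, and the rendering engines all behave differently. I tried them one by one: hexo-renderer-kramed has not been maintained for a long time, and while it displayed MathJax fine, other things could then break. In the end hexo-renderer-pandoc solved most of the problems.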

npm uninstall hexo-renderer-marked --save
npm install hexo-renderer-pandoc --save

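HTTPS

How do you enable HTTPS for a custom domain?

I mainly followed this article: Enabling SSL/TLS for a Hexo Blog on GitHub (为Github的Hexo博客启用SSL/TLS)

The main steps are:

Sign up for Cloudflare, add your site, and get the nameservers Cloudflare provides;
At your domain registrar (e.g. Alibaba Cloud), change the site's nameservers to the ones provided by Cloudflare;
Wait until the site shows as active in Cloudflare, then open your site over https.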

Sitemap

Install the following plugins:

npm install hexo-generator-sitemap --save
npm install hexo-generator-baidu-sitemap --save

Then configure _config.yml in the blog's root directory:

baidusitemap:
  path: baidusitemap.xml

sitemap:
  path: sitemap.xml

Then submit the sitemap URL to Google and Baidu; mine, for example, is http://hrwhisper.me/sitemap.xml

细语呢喃