Digest Article View

From: xxxxx (因为寂寞), Board: Programming
Subject: [Collection] How do I read the contents of a UTF-8 encoded text file?
Posted at: 哈工大紫丁香 (Wed, 2002-02-06 09:54:28), on-site post

────────────────────────────────────────
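perfect (大睡虫)                     wrote on Wed, 2002-01-30 21:37:12:

    Hi all, how can I read the contents of a UTF-8 encoded file?
    When I open the file in Notepad, Notepad seems to recognize the encoding and
    opens it as UTF-8 automatically, so the text is correct. In UltraEdit, however,
    it is all garbage, and what my C program reads from the file matches exactly
    what UltraEdit shows. How can my program get the file contents correctly?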
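
    One plausible reason Notepad recognizes the file: when Notepad of that era
    saves as UTF-8, it prefixes the file with the byte order mark EF BB BF, which
    a reader can test for. The following is only a minimal sketch of such a check;
    the helper names has_utf8_bom and strip_utf8_bom are hypothetical, and a file
    without the mark can still be valid UTF-8.

```cpp
#include <string>

// Hypothetical helper: true if the buffer starts with the UTF-8 byte order
// mark (the three bytes EF BB BF).
bool has_utf8_bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Hypothetical helper: returns the content with a leading BOM removed,
// or the content unchanged if there is no BOM.
std::string strip_utf8_bom(const std::string& bytes) {
    return has_utf8_bom(bytes) ? bytes.substr(3) : bytes;
}
```

    A program would read the raw bytes in binary mode, call strip_utf8_bom on
    them, and only then decode the remainder as UTF-8.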

────────────────────────────────────────
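Gprs (Gprs)                          wrote on Thu, 2002-01-31 19:14:28:

Just read and write the string data as Unicode.
【 Quoting perfect (大睡虫): 】
:     Hi all, how can I read the contents of a UTF-8 encoded file?
:     When I open the file in Notepad, Notepad seems to recognize the encoding and
:     opens it as UTF-8 automatically, so the text is correct. In UltraEdit, however,
:     it is all garbage, and what my C program reads from the file matches exactly
:     what UltraEdit shows. How can my program get the file contents correctly?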

────────────────────────────────────────
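perfect (大睡虫)                     wrote on Thu, 2002-01-31 19:45:59:

  What I want is to display it correctly.
  On Win2000, I edit some Chinese text in Notepad
and save it with the UTF-8 encoding.
  Now I want to show the contents of that text file correctly in an edit box
of my program. Could you be more specific?
【 Quoting Gprs (Gprs): 】
: Just read and write the string data as Unicode.
: 【 Quoting perfect (大睡虫): 】
: :     Hi all, how can I read the contents of a UTF-8 encoded file?
: :     When I open the file in Notepad, Notepad seems to recognize the encoding and
: :     opens it as UTF-8 automatically, so the text is correct. In UltraEdit, however,
: :     it is all garbage, and what my C program reads from the file matches exactly
: :     what UltraEdit shows. How can my program get the file contents correctly?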

────────────────────────────────────────
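tianzhihong (天之鸿)                 wrote on Fri, 2002-02-01 13:17:20:

There are ready-made UTF-8 to Unicode conversion programs available online.
【 Quoting perfect (大睡虫): 】
:   What I want is to display it correctly.
:   On Win2000, I edit some Chinese text in Notepad
: and save it with the UTF-8 encoding.
:   Now I want to show the contents of that text file correctly in an edit box
: of my program. Could you be more specific?
: 【 Quoting Gprs (Gprs): 】
: : Just read and write the string data as Unicode.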
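
For readers wondering what such a UTF-8 to Unicode converter does internally,
here is a minimal portable sketch. It is not any particular program from the
net: the function name utf8_to_utf16 is hypothetical, the result is UTF-16 in
a std::u16string, and for brevity it does not reject overlong encodings or
encoded surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Sketch of a UTF-8 -> UTF-16 converter (hypothetical name).
// Throws std::runtime_error on malformed or truncated input.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;        // code point being assembled
        std::size_t extra;  // number of continuation bytes that follow
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
        }
        i += extra + 1;
        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        if (cp >= 0x10000) {
            // Outside the BMP: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

For example, the three bytes E4 B8 80 decode to the single 16-bit unit 0x4E00,
which is the character 一.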

────────────────────────────────────────
Gprs (Gprs)                          wrote on Fri, 2002-02-01 18:59:15:

//in VC
#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <wchar.h>
#include <conio.h>   // _getch
// UNICODE/_UNICODE only take effect when defined before <windows.h>,
// so the explicit A/W function names are used below instead.
int main(int argc, char* argv[])
{
  const wchar_t *s1 = L"1234一二三四壹贰叁肆"; // unicode (UTF-16) string
  const char *s3 = "1234一二三四壹贰叁肆";     // ansi string
  char *s = new char[101];                    // byte buffer, +1 for the terminator
  wchar_t *s2 = new wchar_t[101];             // empty unicode buffer
  FILE *fp = fopen("E:\\ut8.txt", "rb");      // binary mode keeps the bytes intact
  if (fp == NULL)
  {
    MessageBoxA(NULL, "File Open Error", "Warning", MB_OK);
    delete[] s;
    delete[] s2;
    return 1;
  }
  size_t n = fread(s, 1, 100, fp);            // raw UTF-8 bytes from the file
  fclose(fp);
  s[n] = '\0';                                // fread does not terminate the buffer
  // convert the UTF-8 bytes to a unicode string; the last argument is the
  // capacity of s2 in wide characters
  int rtn = MultiByteToWideChar(CP_UTF8, 0, s, (int)n, s2, 100);
  if (rtn == 0)                               // 0 means the call failed
    printf("MultiByteToWideChar failed, error %lu\n", GetLastError());
  s2[rtn] = L'\0';
  // convert the ansi string s3 to a unicode one (this overwrites s2);
  // printf("%s") cannot print a wide string, so only its length is shown
  MultiByteToWideChar(CP_ACP, 0, s3, -1, s2, 101);
  printf("the unicode string s2 has %d wide chars\n", (int)wcslen(s2));
  // convert unicode string to ansi one
  int ii = (int)wcslen(s1);
  printf("The unicode str size is:%d\n", ii);
  // first call: ask how many bytes are needed; second call: convert
  int j = WideCharToMultiByte(CP_ACP, 0, s1, ii, NULL, 0, NULL, NULL);
  printf("j=%d\n", j);
  j = WideCharToMultiByte(CP_ACP, 0, s1, ii, s, 100, NULL, NULL);
  s[j] = '\0';                                // j is 0 if the conversion failed
  printf("%s\n", s);
  delete[] s;
  delete[] s2;
  _getch();
  return 0;
}
//in VC, second version: UTF-8 file -> unicode -> ansi
#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <wchar.h>
int main(int argc, char* argv[])
{
  char *s = new char[101];                    // byte buffer, +1 for the terminator
  wchar_t *s2 = new wchar_t[101];             // empty unicode buffer
  FILE *fp = fopen("E:\\ut8.txt", "rb");      // binary mode keeps the bytes intact
  if (fp == NULL)
  {
    MessageBoxA(NULL, "File Open Error", "Warning", MB_OK);
    delete[] s;
    delete[] s2;
    return 1;
  }
  size_t n = fread(s, 1, 100, fp);            // raw UTF-8 bytes from the file
  fclose(fp);
  s[n] = '\0';                                // fread does not terminate the buffer
  // now we convert the UTF-8 string to a unicode string: with -1 as the
  // input length the result includes the terminating null
  int rtn = MultiByteToWideChar(CP_UTF8, 0, s, -1, s2, 101);
  if (rtn == 0)
  {
    printf("MultiByteToWideChar failed, error %lu\n", GetLastError());
    delete[] s;
    delete[] s2;
    return 1;
  }
  // Notepad writes a byte order mark at the start of a UTF-8 file; it
  // decodes to U+FEFF and has no ansi equivalent, so skip it if present
  const wchar_t *text = (s2[0] == 0xFEFF) ? s2 + 1 : s2;
  // convert unicode string to ansi one
  int ii = (int)wcslen(text);
  printf("The unicode str size is:%d\n", ii);
  int j = WideCharToMultiByte(CP_ACP, 0, text, ii, s, 100, NULL, NULL);
  s[j] = '\0';                                // j is 0 if the conversion failed
  printf("j=%d\n", j);
  printf("%s\n", s);
  delete[] s;
  delete[] s2;
  return 0;
}
//in CB
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
#include <stdio.h>
#include <conio.h>
#include "utilcls.h"   // WideToAnsi
//---------------------------------------------------------------------------
#pragma argsused
int main(int argc, char* argv[])
{
  String s3 = "1234一二三四壹贰叁肆";
  char *s = new char[101];            // byte buffer, +1 for the terminator
  WCHAR *s2 = new WCHAR[101];
  FILE *fp = fopen("E:\\ut8.txt", "rb");
  if (fp == NULL)
  {
    ShowMessage("File Open Error");
    delete[] s;
    delete[] s2;
    return 1;
  }
  size_t n = fread(s, 1, 100, fp);    // element size 1, so n is the byte count
  fclose(fp);
  s[n] = '\0';
  int rtn = MultiByteToWideChar(CP_UTF8, 0, s, (int)n, s2, 100);
  if (rtn == 0)                       // 0 means the call failed
    printf("MultiByteToWideChar failed, error %lu\n", GetLastError());
  s2[rtn] = L'\0';
  // The "wrong first char" is most likely the UTF-8 byte order mark that
  // Notepad writes at the start of the file; it decodes to U+FEFF, so skip it.
  WCHAR *text = (s2[0] == 0xFEFF) ? s2 + 1 : s2;
  printf("%s\n", WideToAnsi(text));
  printf("%s\n", s3.c_str());
  delete[] s;
  delete[] s2;
  getch();
  return 0;
}
//---------------------------------------------------------------------------
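【 Quoting perfect (大睡虫): 】
:     Hi all, how can I read the contents of a UTF-8 encoded file?
:     When I open the file in Notepad, Notepad seems to recognize the encoding and
:     opens it as UTF-8 automatically, so the text is correct. In UltraEdit, however,
:     it is all garbage, and what my C program reads from the file matches exactly
:     what UltraEdit shows. How can my program get the file contents correctly?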

────────────────────────────────────────
Gprs (Gprs)                          wrote on Fri, 2002-02-01 18:59:49:

I got this from MSDN; hope it helps.
【 Quoting Gprs (Gprs): 】
: //in VC
: #include "stdafx.h"
: #include "windows.h"
: #include "stdio.h"
: #include "TChar.h"  //used for unicode string function
: int main(int argc, char* argv[])
: {
:  #define _UNICODE
:  #define UNICODE

────────────────────────────────────────
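perfect (大睡虫)                     wrote on Sun, 2002-02-03 14:34:16:

  Thanks a lot for the help! The problem is solved!
  I did know that conversion between UNICODE and ANSI can be done with
  WideCharToMultiByte() and MultiByteToWideChar().
  But the UTF-8 encoding is still somewhat different from MS's UNICODE.
  I think MS's UNICODE corresponds to UTF-16, i.e. a fixed-width
  UNICODE format, whereas UTF-8 is a variable-width UNICODE encoding. So,
  to use WideCharToMultiByte() and MultiByteToWideChar(), the UTF-8 data
  has to be turned into UTF-16 first. I found an LDAP function in MSDN,
  LdapUTF8ToUnicode. It converts UTF-8 to UNICODE, and from there the text
  can be converted to ANSI.
【 Quoting Gprs (Gprs): 】
: I got this from MSDN; hope it helps.
: 【 Quoting Gprs (Gprs): 】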
: : //in VC
: : #include "stdafx.h"
: : #include "windows.h"
: : #include "stdio.h"
: : #include "TChar.h"  //used for unicode string function
: : int main(int argc, char* argv[])
: : {
: :  #define _UNICODE
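
The width difference described in the post above can be made concrete with a
little arithmetic. The sketch below is only an illustration with hypothetical
function names; it counts how many bytes UTF-8 and how many 16-bit units UTF-16
need per code point. Note that UTF-16 is fixed-width only for characters up to
U+FFFF, which covers the text in this thread; characters beyond that take two
units.

```cpp
#include <cstddef>
#include <string>

// Number of bytes UTF-8 needs for one code point (1 to 4).
std::size_t utf8_bytes(char32_t cp) {
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;   // rest of the BMP, including common Chinese
    return 4;
}

// Number of 16-bit units UTF-16 needs for one code point (1 or 2).
std::size_t utf16_units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;  // 2 means a surrogate pair
}

// Total UTF-8 bytes for a string of code points.
std::size_t total_utf8_bytes(const std::u32string& s) {
    std::size_t total = 0;
    for (char32_t cp : s) total += utf8_bytes(cp);
    return total;
}

// Total UTF-16 units for a string of code points.
std::size_t total_utf16_units(const std::u32string& s) {
    std::size_t total = 0;
    for (char32_t cp : s) total += utf16_units(cp);
    return total;
}
```

For the eight characters 1234一二三四 this gives 4 * 1 + 4 * 3 = 16 bytes in
UTF-8 but 8 units (16 bytes) in UTF-16, one unit per character.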

────────────────────────────────────────

Programming board (Digest)