
Hadoop Database & Analysis Engineer, Research & Development Resume


San Francisco, CA

SUMMARY

  • 2+ years of experience spanning big data, data analysis, and cloud computing
  • 3+ years of experience in Java development on the Unix platform
  • Experienced in Agile environments; familiar with all phases of the software development life cycle
  • Experienced in both Linux and Windows environments
  • Quick to grasp new technologies
  • Strong team spirit and communication skills
  • Hands-on experience building and configuring Hadoop environments
  • Regularly install, test, and certify the interoperability of multiple partner products
  • Strong Hadoop and ETL development experience
  • Data-driven professional with hands-on experience in Hadoop development and administration
  • Experience in HDFS, MapReduce, HBase, Hive, Pig, Sqoop, and Cloudera Hue
  • Expertise in writing Hive queries and Pig scripts
  • Hands-on experience writing MapReduce jobs and Hive and Pig UDFs
  • Experience loading/extracting data from HBase and configuring HBase clusters
  • Experience in performance tuning, storage capacity management, and high availability of Hadoop and HBase clusters
  • Manage and execute complex, multi-job test plans and procedures
  • Strong in mathematics, specifically statistics
  • Deep understanding of computing and storage systems on the hardware side

TECHNICAL SKILLS

Programming: Java, C++, C, VB, Unix shell scripting, PowerShell script, SQL, Matlab

Database: MySQL, SQL Server

NoSQL Databases: HBase, MongoDB, Cassandra

Hadoop Ecosystem: HDFS, HBase, MongoDB, Hive, Pig, Flume, Sqoop, ZooKeeper, Avro, Oozie

Cloud Computing: Amazon Web Services (EC2, S3), Microsoft Azure

Source Control & Build: TFS, SVN, Maven

PROFESSIONAL EXPERIENCE

Confidential, San Francisco, CA

Hadoop Database & Analysis Engineer, Research & Development

Responsibilities:

  • Involved in functional requirement reviews; worked closely with the product manager and business analyst
  • Worked in an Agile environment using TeamPulse project management software
  • Designed and replaced multiple dashboards using Web Forms and the MVC framework, extracting data from SQL Server; built proofs of concept using the Telerik API, Power BI, and the Birst BI platform
  • Set up and configured Cloudera Manager and Navigator on Microsoft Azure VMs from scratch; set up Kerberos (MIT) authentication using Windows Server 2012; wrote PowerShell scripts to automate Cloudera Manager installation on the Azure cloud
  • Worked with the DevOps team to push data packages into different environments, from Visual Studio Team Foundation Server (TFS) to Octopus Deploy (a deployment automation tool), by writing PowerShell scripts
  • Took part in database design, including member record matching, change-log tracking, and interface table design
  • Wrote complex stored procedures and queries to implement the ETL process from raw data (XML or JSON files) into different databases; used Dell Boomi to convert data formats and wrote stored procedures to validate data (checking data types, transaction types, nulls, and duplicates) and move it through a staging database to the target databases using SQL Server Management Studio 2012 Enterprise
  • Designed, deployed, and monitored SSIS packages and jobs for the ETL stored procedures based on business requirements
  • Gained solid experience working at a start-up company

Environment: SQL Server, Visual Studio, TFS, TeamPulse, Octopus Deploy, PowerShell, C#, VB, JavaScript, Cloudera CDH 5.3

Confidential, Woburn, MA

Hadoop Developer (Intern)

Responsibilities:

  • Involved in functional requirement reviews; worked closely with the Risk & Compliance team and business analysts
  • Set up and configured a 20-node Hadoop cluster using Cloudera Manager
  • Designed and configured Flume agents and Kafka to collect data from a variety of sources and store it in HDFS
  • Set up data ETL pipelines to transfer data between RDBMS and HDFS in both directions using Sqoop
  • Created custom Hadoop MapReduce jobs for cleaning and pre-processing click-stream data to produce training data for machine learning (see the sketch after this list)
  • Worked actively with the Hadoop administration team to debug slow-running jobs and apply the necessary optimizations (RPC, network bandwidth, HBase filters, Hive joins)
  • Created Hive tables and worked with them using HiveQL, including table schema design, query optimization, and data import/export between HDFS and MongoDB
  • Applied various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins when writing MapReduce jobs
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability
  • Worked with cloud services such as Amazon Web Services (AWS) and Microsoft Azure
  • Used different file formats such as text files, ORC/RC, XML, and JSON
  • Provided cluster coordination services through ZooKeeper
  • Worked with serialization frameworks including Avro, Thrift, and Protocol Buffers
  • Good knowledge of Hadoop YARN, Spark, and Storm architectures
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
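An illustrative sketch of the kind of click-stream cleaning mapper described above. This is a minimal example, not code from the actual project: the ClickStreamCleanMapper name and the tab-delimited field layout are assumptions.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper that drops malformed click-stream records and
    // keeps only the fields needed to build a machine-learning training set.
    public class ClickStreamCleanMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        private static final int EXPECTED_FIELDS = 5;
        private final Text cleaned = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed input layout: userId \t timestamp \t url \t referrer \t userAgent
            String[] fields = value.toString().split("\t", -1);
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("CLEANING", "MALFORMED").increment(1);
                return; // skip malformed records
            }
            // Keep only the columns the downstream model needs.
            cleaned.set(fields[0] + "\t" + fields[1] + "\t" + fields[2]);
            context.write(cleaned, NullWritable.get());
        }
    }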

Environment: Hadoop 0.20.2 MR1, CDH 4.7, HDFS, HBase 0.90.x, MongoDB 2.2, Hive 0.12.0, Impala, Pig 0.12.0, Flume 1.5.0, Sqoop, ZooKeeper, Avro, Oozie 3.3.0

Confidential

Hadoop Developer (Intern)

Responsibilities:

  • Installed and configured a Hadoop YARN environment
  • Imported and exported data into HDFS and Hive using Sqoop on a daily basis
  • Used HiveQL extensively to perform table joins and record matching
  • Extensively created, loaded, and queried Hive tables, and routinely debugged unhealthy queries
  • Hands-on experience writing Hive UDFs/UDAFs (see the UDF sketch after this list)
  • Experienced in defining job flows using Oozie
  • Experienced in managing and reviewing Hadoop log files and processing log files with the Storm stream processor
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using HBase (see the client sketch after this list)
  • Responsible for managing data coming from different sources; hands-on experience writing JDBC and ODBC data access code in core Java
  • Involved in periodically loading data from the UNIX file system into HDFS through an FTP server using Linux commands; set up and configured authentication for different users and groups
  • Gained very good business knowledge of health insurance, claim processing, fraud suspect identification, the appeals process, etc.
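A minimal sketch of a Hive UDF of the kind mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API that matches Hive of this era. The normalize_zip function name and its logic are hypothetical examples, not taken from the actual project.

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF that normalizes ZIP codes to their five-digit form.
    @Description(name = "normalize_zip",
                 value = "_FUNC_(str) - returns the first five digits of a ZIP code")
    public class NormalizeZip extends UDF {

        private final Text result = new Text();

        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String digits = input.toString().replaceAll("[^0-9]", "");
            if (digits.length() < 5) {
                return null; // not a usable ZIP code
            }
            result.set(digits.substring(0, 5));
            return result;
        }
    }

After packaging the class into a JAR, it would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.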
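And a minimal sketch of loading and reading back a row through the HBase Java client, as in the HBase bullet above. It uses the newer Connection/Table client API for clarity; the "events" table, "d" column family, and row-key format are hypothetical, not taken from the actual project.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical example: write one event row, then read one column back.
    public class HBaseLoadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("events"))) {

                // Load: one row keyed by userId#timestamp, column family "d".
                Put put = new Put(Bytes.toBytes("user42#1400000000"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("url"),
                              Bytes.toBytes("/checkout"));
                table.put(put);

                // Extract: read the same column back.
                Get get = new Get(Bytes.toBytes("user42#1400000000"));
                Result result = table.get(get);
                byte[] url = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("url"));
                System.out.println(Bytes.toString(url));
            }
        }
    }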

Environment: Hadoop YARN, HDFS, Hive, HBase, Oozie, AWS, CDH 4.7, Java, Scala
