Hadoop Developer Resume

SUMMARY

  • 3 years of experience setting up Big Data environments; delivered proofs of concept using HDFS, MapReduce, Pig, Hive, HBase, and Sqoop.
  • Extensively worked on capacity planning for Hadoop clusters and on setting up complete environments.
  • Skilled in writing shell and Sqoop scripts to migrate data into and out of HDFS, and Spark, Pig, and Hive scripts for data processing.
  • Experience supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud; performed export and import of data to and from S3.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, and data mining.
  • Experience optimizing ETL workflows.
  • Good team player with strong interpersonal and communication skills, combined with self-motivation, initiative, and the ability to think outside of the box.

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Python, Cloudera 6.2.x

ETL Tools: SSIS, KNIME

Languages: Python (PySpark), C/C++

Data Management: MS SQL, PL/SQL, NoSQL, Oracle, Hadoop, Sqoop, Hive

Web Technologies: HTML, CSS

Visualization/Others: MS Excel, Minitab, KNIME

Tools: Eclipse IDE, PyCharm, MS Visio, SQL Server Management Studio, Minitab, Excel

Platforms: Windows, Linux, VMware

Unix Tools: Shell scripts

PROFESSIONAL EXPERIENCE

Confidential

Hadoop developer

Responsibilities:

  • Configured the Sqoop client to import data into HDFS from external Teradata/Oracle relational database management systems
  • Participated in data modeling sessions to develop models for Hive tables
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from NoSQL sources and a variety of portfolios
  • Worked with Hive partitioning and bucketing concepts; created partitioned external and internal (managed) Hive tables (see the DDL sketch after this list)
  • Developed applications using Hadoop Big Data technologies: Pig, Hive, and Spark (Python)
  • Designed and developed batch processing of data ingested from various sources using Apache Spark; worked extensively with Spark SQL
  • Designed and developed data pipelines (ETL processes) in Apache Spark using Spark SQL and Spark Streaming
  • Created and managed storage format and data partitioning strategies for Hive tables
  • Involved in loading data into the Hadoop Distributed File System and using Pig to preprocess it
  • Collaborated with infrastructure, network, database, application, and BI teams to ensure data quality and availability
  • Created reports for the BI team, using Sqoop to export data into HDFS and Hive
  • Designed and developed a data ingestion framework to ingest streaming log and reference data from Customer Center applications and databases into HDFS using Sqoop and Spark Streaming (sketched below)
  • Worked on loading data from a Cassandra database into HDFS; configured a Hadoop environment in the cloud through Amazon Web Services (AWS) to provide a scalable distributed data solution
  • Experienced with Spark performance tuning options
  • Experienced with Spark analytics using Spark SQL
  • Administered Pig, Hive, and HBase, installing updates, patches, and upgrades
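
A minimal sketch of the partitioned external-table work described above, issued through a PySpark session with Hive support. The database, table, columns, and HDFS path are hypothetical placeholders, not the actual project objects:

```python
from pyspark.sql import SparkSession

# Hive-enabled session; assumes the cluster's Hive metastore is reachable.
spark = (SparkSession.builder
         .appName("hive-ddl-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sales")

# External table over data already landed in HDFS; the txn_date partition
# column lets Hive prune partitions at query time. Bucketing
# (CLUSTERED BY ... INTO N BUCKETS) would be declared analogously in
# Hive-native DDL.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS ORC
    LOCATION '/data/warehouse/sales/transactions'
""")

# Register any partition directories that already exist on disk.
spark.sql("MSCK REPAIR TABLE sales.transactions")
```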

Environment: Hadoop ecosystem, Spark, Python, Pig, Hive, Cloudera 6.2.x, Sqoop, Oracle, SQL Server, Unix
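
The streaming leg of the ingestion framework above could look roughly like this Structured Streaming sketch; the source directory, schema, and output paths are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("log-ingest-sketch").getOrCreate()

# Read newline-delimited JSON log files as they land in a staging directory.
# The schema is a placeholder for the real Customer Center log layout.
logs = (spark.readStream
        .format("json")
        .schema("ts TIMESTAMP, level STRING, source STRING, message STRING")
        .load("hdfs:///staging/customer-center/logs"))

# Partition output by event date so downstream Hive queries can prune.
query = (logs.withColumn("event_date", to_date(col("ts")))
         .writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/customer_logs")
         .option("checkpointLocation", "hdfs:///checkpoints/customer_logs")
         .partitionBy("event_date")
         .outputMode("append")
         .start())

query.awaitTermination()
```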

Confidential

Hadoop developer

Responsibilities:

  • Loaded data from relational database systems into HDFS using Sqoop (an import sketch follows the environment summary below)
  • Monitored and fine-tuned MapReduce programs running on the cluster
  • Solved performance issues in Hive and Pig by understanding joins, grouping, and aggregation and how they translate to MapReduce jobs
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics
  • Designed and developed data pipelines (ETL processes) in Apache Spark using Spark SQL and Spark Streaming
  • Extended Hive with custom scripts for data transformation and by writing user-defined functions (UDFs)
  • Worked extensively on troubleshooting Hive jobs
  • Used Flume to collect, aggregate, and push log data from different log servers
  • Supported code/design analysis, strategy development, and project planning
  • Developed multiple MapReduce jobs in Python for data cleaning and preprocessing (a mapper sketch follows this list)
  • Managed and reviewed Hadoop log files
  • Tested raw data and executed performance scripts
  • Shared responsibility for administration of Hadoop, Hive, Sqoop, Spark, and Python environments
  • Experienced in defining job flows
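
A representative mapper for the Python data-cleaning jobs above, written for Hadoop Streaming; the five-column tab-separated layout and the lower-cased third column are hypothetical:

```python
#!/usr/bin/env python
# Hadoop Streaming mapper: reads raw records from stdin, drops malformed
# rows, normalizes fields, and emits clean tab-separated records to stdout.
import sys

EXPECTED_FIELDS = 5  # assumed column count for the raw feed

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != EXPECTED_FIELDS:
        continue  # skip malformed rows rather than failing the job
    fields = [f.strip() for f in fields]
    fields[2] = fields[2].lower()  # normalize the (assumed) category column
    print("\t".join(fields))
```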

Environment: Hadoop ecosystem, Spark, Python, Pig, Hive, Cloudera 6.2.x, Sqoop, Microsoft SQL Server (SSIS, SSRS, SSAS), SQL, PL/SQL
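
One of the Sqoop imports above might be wrapped in Python for orchestration like this; the JDBC URL, credentials file, and table names are placeholders:

```python
import subprocess

# A single Sqoop import: pull one RDBMS table into HDFS in parallel.
# Every identifier below is an illustrative placeholder.
cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl/.sqoop.pwd",  # avoids a plain-text password
    "--table", "CUSTOMERS",
    "--target-dir", "/data/raw/customers",
    "--split-by", "CUSTOMER_ID",  # column Sqoop uses to divide work across mappers
    "--num-mappers", "4",
]
subprocess.run(cmd, check=True)
```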
