Big Data/Hadoop Developer Resume
Austin, TX
SUMMARY:
- Overall 6 years of professional IT experience in the analysis, design, development, deployment, and maintenance of critical software and big data applications.
- 3+ years of hands-on experience across the Hadoop ecosystem, including extensive work with Big Data technologies such as MapReduce, YARN, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- In-depth knowledge of HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
- Expertise in converting MapReduce programs into Spark transformations using Spark RDDs.
- Good business management knowledge, including business/organizational and operational design principles and customer and stakeholder management.
- Built distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala (a representative sketch follows this summary).
- Experience in implementing real-time event processing and analytics using Spark Streaming with messaging systems such as Kafka.
- Excellent understanding of the Hadoop architecture and underlying framework, including storage management.
- Hands-on experience in installing, configuring, and using Hadoop components such as MapReduce, HDFS, HBase 1.3.0, Hive 2.1.1, Sqoop 1.99.7, and Flume 1.7.0.
- Managed data coming from different sources and was involved in HDFS maintenance and the loading of structured and unstructured data.
- Experience in analyzing data using HiveQL 2.1.1 and custom MapReduce programs in Java.
- Experience working with the NoSQL database HBase 1.3.0 and the Impala 2.7.0 query engine.
- Hands-on experience in Linux shell scripting; worked with the Cloudera Big Data distribution.
- Expert in writing complex SQL queries and performing database analysis for good performance.
- Excellent analytical, interpersonal, and communication skills; fast learner, hardworking, and a good team player.
- Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
- Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources such as flat files, XML files, and databases.
- Supported various reporting teams and have experience with the data visualization tool Tableau.
- Implemented data quality in the ETL tool Talend and have good knowledge of data warehousing and ETL tools like IBM DataStage, Informatica, and Talend.
- Experienced, with in-depth knowledge, in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift, as well as Microsoft Azure.
- Detailed understanding of the Software Development Life Cycle (SDLC) and strong knowledge of project implementation methodologies such as Waterfall and Agile.
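A minimal sketch of the Kafka-to-HDFS Spark Streaming flow mentioned above, written in Scala; the broker, topic, consumer group, and output path are hypothetical and illustrate the general pattern only.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfs")
        val ssc  = new StreamingContext(conf, Seconds(30))      // 30-second micro-batches

        // Hypothetical broker and consumer settings, for illustration only
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "hdfs-ingest",
          "auto.offset.reset"  -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each non-empty micro-batch of message values to HDFS
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/events/batch-${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }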
TECHNICAL SKILLS:
Languages: Python, R, PL/SQL, Java, HiveQL, Pig Latin, Scala
Hadoop Ecosystem: HDFS, YARN, Scala, MapReduce, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Kafka, Impala, MongoDB, and HBase.
Databases: Oracle, MS-SQL Server, MySQL, NoSQL (HBase, MongoDB).
Tools: Eclipse, NetBeans, Talend.
Hadoop Platforms: Cloudera, Amazon Web services (AWS).
Operating Systems: Windows XP/2000/NT, Linux, UNIX.
Amazon Web Services: Redshift, EMR, EC2, S3, RDS, Cloud Search, Data Pipeline, Lambda.
Version Control: GitHub, SVN, CVS.
Packages: MS Office Suite, MS Visio, MS Project Professional.
PROFESSIONAL EXPERIENCE:
Confidential, Austin, TX
Big Data/ Hadoop Developer
Responsibilities:
- Developed real-time data processing applications using Scala and Python and implemented Apache Spark Streaming against streaming sources such as Kafka.
- Wrote live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline.
- Worked with AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
- Imported real-time data into Hadoop using Kafka and implemented an Oozie job that runs daily.
- Implemented Spark SQL to access Hive tables from Spark for faster data processing (see the sketch following this role's Environment line).
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed MapReduce jobs using the MapReduce Java API and HiveQL.
- Developed UDF, UDAF, and UDTF functions and used them in Hive queries.
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Used the Avro data serialization system to handle Avro data files in MapReduce programs.
- Optimized Hive queries and joins to handle different data sets.
- Configured Oozie schedulers to run different Hadoop actions on a timely basis.
- Performed ETL, data integration, and migration by writing Pig scripts.
- Used different file formats such as text files, SequenceFiles, and Avro through Hive SerDes.
- Integrated Hadoop with Solr and implemented search algorithms.
- Used Storm for real-time processing.
- Hands-on experience working with the Cloudera distribution.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked hands-on with NoSQL databases like MongoDB for a POC on storing images and URIs.
- Designed and implemented MongoDB storage and the associated RESTful web service.
- Worked on analyzing and examining customer behavioral data using MongoDB.
- Designed data aggregations on Hive for ETL processing on Amazon EMR to process data per business requirements.
- Wrote test cases and implemented test classes using MRUnit and mocking frameworks.
- Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
- Set up EMR to process large volumes of data stored in Amazon S3.
- Processed large volumes of data and executed processes in parallel using Talend functionality.
- Used Talend to create workflows for processing data from multiple source systems.
Environment: MapReduce, HDFS, Sqoop, Linux, Oozie, Hadoop, Pig, Hive, Solr, Spark Streaming, Kafka, Storm, Spark, Scala, Python, MongoDB, Hadoop Cluster, Amazon Web Services, Talend.
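A minimal, hypothetical sketch in Scala of the Spark SQL over Hive pattern referenced in this role; the database, table, and column names are invented for illustration.

    import org.apache.spark.sql.SparkSession

    object HiveAnalytics {
      def main(args: Array[String]): Unit = {
        // Submitted to the cluster with spark-submit --master yarn;
        // enableHiveSupport() lets Spark SQL read tables registered in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("HiveAnalytics")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Hive table and columns, purely for illustration
        val dailyCounts = spark.sql(
          """SELECT event_date, COUNT(*) AS events
            |FROM analytics.click_stream
            |GROUP BY event_date""".stripMargin)

        // Write the aggregate back as a Hive table for downstream reporting
        dailyCounts.write.mode("overwrite").saveAsTable("analytics.daily_click_counts")

        spark.stop()
      }
    }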
Confidential, Dallas, TX
Big Data Engineer
Responsibilities:
- Developed solutions to ingest data into HDFS (Hadoop Distributed File System), process it within Hadoop, and emit summary results from Hadoop to downstream systems.
- Developed a wrapper script around the Teradata Connector for Hadoop (TDCH) to support optional parameters.
- Used Sqoop extensively to ingest data from various source systems into HDFS.
- Used Hive to quickly produce results for the reports that were requested.
- Played a major role in working with the team to leverage Sqoop for extracting data from Teradata.
- Imported data from relational data sources such as Oracle and Teradata to HDFS using Sqoop.
- Integrated HiveServer2 with Tableau using the Cloudera Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
- Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed it through Hive-HBase integration.
- Implemented Hive-HBase integration by creating Hive external tables with HBase specified as the storage handler.
- Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
- Developed Pig UDFs for needed functionality, such as a custom Pig loader known as the timestamp loader.
- Wrote optimized Pig scripts and was involved in developing and testing Pig Latin scripts.
- Worked on custom Pig loader and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster respectively.
- Involved in moving log files generated from various sources to HDFS for further processing through Flume.
- Worked on different file formats like Text files, Parquet, Sequence Files, Avro, Record columnar files (RC).
- Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
- Kerberos security was implemented to safeguard the cluster.
- Worked on a stand-alone as well as a distributed Hadoop application.
- Tested the performance of the data sets on various NoSQL databases.
- Worked with complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
Environment: Hadoop, HDFS, Pig, Flume, Hive, MapReduce, Sqoop, Oozie, Zookeeper, HBase, Java, Eclipse, SQL Server, Shell Scripting.
Confidential
Data Engineer
Responsibilities:
- Involved in end-to-end design, development, integration, and testing of the historical data load and the incremental data load.
- Wrote COPY scripts to load data from Amazon S3 into Redshift.
- Wrote UNLOAD scripts to export data from Redshift tables to S3 buckets based on client requirements.
- Automated the incremental data load from source to target tables using data pipelines.
- Wrote shell scripts for file-level operations and other tasks.
- Validated sample data loaded into Redshift through count, column-level, and row-level checks.
- Trained internally on the Hadoop ecosystem and actively participated in installing Hadoop on a 24-node cluster.
- Involved in the requirement analysis phase, installing Hadoop, and setting up the Hadoop cluster.
- Loaded data into HDFS and moved large data sets from RDBMS to HDFS using Sqoop.
- Set up Flume agents for data collection from an HTTP source to an HDFS sink.
- Ran Hadoop shell commands, wrote MapReduce programs, and verified Hadoop log files.
- Wrote MapReduce programs in Java and Pig scripts for data processing on HDFS and created structured data.
- Created Hive external partitioned tables and Hive UDFs (see the sketch at the end of this role). Actively involved in configuring Hive server integration with Tableau.
- Knowledgeable in scheduling Hive jobs through the Azkaban scheduler on a weekly and monthly basis.
- Wrote Unix shell scripts and cron jobs for scheduling Pig scripts.
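Hive UDFs of the kind noted above are ordinarily written in Java, but any JVM language works; below is a hypothetical masking UDF sketched in Scala, with the class name, jar path, and function name invented purely for illustration.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Simple Hive UDF that masks the local part of an e-mail address.
    // After packaging into a jar it would be registered in Hive with, e.g.:
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
    class MaskEmail extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val s  = input.toString
        val at = s.indexOf('@')
        if (at <= 1) new Text(s)                                 // nothing meaningful to mask
        else new Text(s.substring(0, 1) + "***" + s.substring(at))
      }
    }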
Confidential
Support Analyst
Responsibilities:
- Debugging stored procedures.
- Creating and developing SSIS packages.
- Writing stored procedures to implement business logic.
- Writing views to produce the desired data outputs.
- Monitoring the jobs in Maestro.
- Resolving the issues in packages causing job failures.
- Following the process for IT alerts.
- Communicating with other teams to resolve issues across the multiple supported applications.
- Providing batch status updates.
- Attending end-user calls and providing solutions.