Big Data/Hadoop Developer Resume
Austin, TX
SUMMARY:
- Overall 6 years of professional IT experience in the analysis, design, development, deployment, and maintenance of critical software and big data applications.
- 3+ years of hands-on experience across the Hadoop ecosystem, including extensive work with Big Data technologies such as MapReduce, YARN, HDFS, HBase, Oozie, Hive, Sqoop, Pig, ZooKeeper, and Flume.
- In-depth knowledge of HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
- Expertise in converting MapReduce programs into Spark transformations using Spark RDDs.
- Good business management knowledge, including business/organizational and operational design principles and customer and stakeholder management.
- Built distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala (a representative sketch follows this summary).
- Experience in implementing real-time event processing and analytics using Spark Streaming with messaging systems such as Kafka.
- Excellent understanding of the Hadoop architecture and underlying framework, including storage management.
- Hands-on experience in installing, configuring, and using Hadoop components such as MapReduce, HDFS, HBase 1.3.0, Hive 2.1.1, Sqoop 1.99.7, and Flume 1.7.0.
- Managed data coming from different sources and was involved in HDFS maintenance and the loading of structured and unstructured data.
- Experience in analyzing data using HiveQL 2.1.1 and custom MapReduce programs in Java.
- Experience working with the NoSQL database HBase 1.3.0 and the Impala 2.7.0 query engine.
- Hands-on experience in Linux shell scripting; worked with the Cloudera Big Data distribution.
- Expert in writing complex SQL queries and performing database analysis for good performance.
- Excellent analytical, interpersonal, and communication skills; fast learner, hardworking, and a good team player.
- Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
- Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources such as flat files, XML files, and databases.
- Supported various reporting teams and have experience with the data visualization tool Tableau.
- Implemented data quality in the ETL tool Talend and have good knowledge of data warehousing and ETL tools like IBM DataStage, Informatica, and Talend.
- Experienced, with in-depth knowledge, in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift, as well as Microsoft Azure.
- Detailed understanding of the Software Development Life Cycle (SDLC) and strong knowledge of project implementation methodologies such as Waterfall and Agile.
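A minimal sketch of the Kafka-to-HDFS Spark Streaming flow mentioned above, written in Scala; the broker, topic, consumer group, and output path are hypothetical and illustrate the general pattern only.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfs")
        val ssc  = new StreamingContext(conf, Seconds(30))      // 30-second micro-batches

        // Hypothetical broker and consumer settings, for illustration only
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "hdfs-ingest",
          "auto.offset.reset"  -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each non-empty micro-batch of message values to HDFS
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/events/batch-${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }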
TECHNICAL SKILLS:
Languages: Python, R, PL/SQL, Java, HiveQL, Pig Latin, Scala
Hadoop Ecosystem: HDFS, YARN, Scala, MapReduce, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Kafka, Impala, MongoDB, and HBase.
Databases: Oracle, MS-SQL Server, MySQL, NoSQL (HBase, MongoDB).
Tools: Eclipse, NetBeans, Talend.
Hadoop Platforms: Cloudera, Amazon Web services (AWS).
Operating Systems: Windows XP/2000/NT, Linux, UNIX.
Amazon Web Services: Redshift, EMR, EC2, S3, RDS, Cloud Search, Data Pipeline, Lambda.
Version Control: GitHub, SVN, CVS.
Packages: MS Office Suite, MS Visio, MS Project Professional.
PROFESSIONAL EXPERIENCE:
Confidential, Austin, TX
Big Data/ Hadoop Developer
Responsibilities:
- Developed real-time data processing applications using Scala and Python and implemented Apache Spark Streaming against streaming sources such as Kafka.
- Wrote live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline.
- Worked with AWS services such as EMR and EC2 for fast and efficient processing of Big Data.
- Imported real-time data into Hadoop using Kafka and implemented an Oozie job that runs daily.
- Implemented Spark SQL to access Hive tables from Spark for faster data processing (see the sketch following this role's Environment line).
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed MapReduce jobs using the MapReduce Java API and HiveQL.
- Developed UDF, UDAF, and UDTF functions and used them in Hive queries.
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Used the Avro data serialization system to handle Avro data files in MapReduce programs.
- Optimized Hive queries and joins to handle different data sets.
- Configured Oozie schedulers to run different Hadoop actions on a timely basis.
- Performed ETL, data integration, and migration by writing Pig scripts.
- Used different file formats such as text files, SequenceFiles, and Avro through Hive SerDes.
- Integrated Hadoop with Solr and implemented search algorithms.
- Used Storm for real-time processing.
- Hands-on experience working with the Cloudera distribution.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked hands-on with NoSQL databases like MongoDB for a POC on storing images and URIs.
- Designed and implemented MongoDB storage and the associated RESTful web service.
- Worked on analyzing and examining customer behavioral data using MongoDB.
- Designed data aggregations on Hive for ETL processing on Amazon EMR to process data per business requirements.
- Wrote test cases and implemented test classes using MRUnit and mocking frameworks.
- Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
- Set up EMR to process large volumes of data stored in Amazon S3.
- Processed large volumes of data and executed processes in parallel using Talend functionality.
- Used Talend to create workflows for processing data from multiple source systems.
Environment: MapReduce, HDFS, Sqoop, Linux, Oozie, Hadoop, Pig, Hive, Solr, Spark Streaming, Kafka, Storm, Spark, Scala, Python, MongoDB, Hadoop Cluster, Amazon Web Services, Talend.
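A minimal, hypothetical sketch in Scala of the Spark SQL over Hive pattern referenced in this role; the database, table, and column names are invented for illustration.

    import org.apache.spark.sql.SparkSession

    object HiveAnalytics {
      def main(args: Array[String]): Unit = {
        // Submitted to the cluster with spark-submit --master yarn;
        // enableHiveSupport() lets Spark SQL read tables registered in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("HiveAnalytics")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Hive table and columns, purely for illustration
        val dailyCounts = spark.sql(
          """SELECT event_date, COUNT(*) AS events
            |FROM analytics.click_stream
            |GROUP BY event_date""".stripMargin)

        // Write the aggregate back as a Hive table for downstream reporting
        dailyCounts.write.mode("overwrite").saveAsTable("analytics.daily_click_counts")

        spark.stop()
      }
    }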
Confidential, Dallas, TX
Big Data Engineer
Responsibilities:
- Developed solutions to ingest data into HDFS (Hadoop Distributed File System), process it within Hadoop, and emit summary results from Hadoop to downstream systems.
- Developed a wrapper script around the Teradata Connector for Hadoop (TDCH) to support optional parameters.
- Used Sqoop extensively to ingest data from various source systems into HDFS.
- Used Hive to quickly produce results for the reports that were requested.
- Played a major role in working with the team to leverage Sqoop for extracting data from Teradata.
- Imported data from relational data sources such as Oracle and Teradata to HDFS using Sqoop.
- Integrated HiveServer2 with Tableau using the Cloudera Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
- Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed it through Hive-HBase integration.
- Implemented Hive-HBase integration by creating Hive external tables with HBase specified as the storage handler.
- Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
- Developed Pig UDFs for needed functionality, such as a custom Pig loader known as the timestamp loader.
- Wrote optimized Pig scripts and was involved in developing and testing Pig Latin scripts.
- Worked on custom Pig loader and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster respectively.
- Involved in moving log files generated from various sources to HDFS for further processing through Flume.
- Worked on different file formats like Text files, Parquet, Sequence Files, Avro, Record columnar files (RC).
- Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
- Kerberos security was implemented to safeguard the cluster.
- Worked on a stand-alone as well as a distributed Hadoop application.
- Tested the performance of the data sets on various NoSQL databases.
- Worked with complex data structures of different types (structured, semi-structured) and de-normalized them for storage in Hadoop.
Environment: Hadoop, HDFS, Pig, Flume, Hive, MapReduce, Sqoop, Oozie, Zookeeper, HBase, Java, Eclipse, SQL Server, Shell Scripting.
Confidential
Data Engineer
Responsibilities:
- Involved in end-to-end design, development, integration, and testing of the historical data load and the incremental data load.
- Wrote COPY scripts to load data from Amazon S3 into Redshift.
- Wrote UNLOAD scripts to export data from Redshift tables to S3 buckets based on client requirements.
- Automated the incremental data load from source to target tables using data pipelines.
- Wrote shell scripts for file-level operations and other tasks.
- Validated sample data loaded into Redshift through count, column-level, and row-level checks.
- Trained internally on the Hadoop ecosystem and actively participated in installing Hadoop on a 24-node cluster.
- Involved in the requirement analysis phase, installing Hadoop, and setting up the Hadoop cluster.
- Loaded data into HDFS and moved large data sets from RDBMS to HDFS using Sqoop.
- Set up Flume agents for data collection from an HTTP source to an HDFS sink.
- Ran Hadoop shell commands, wrote MapReduce programs, and verified Hadoop log files.
- Wrote MapReduce programs in Java and Pig scripts for data processing on HDFS and created structured data.
- Created Hive external partitioned tables and Hive UDFs (see the sketch at the end of this role). Actively involved in configuring Hive server integration with Tableau.
- Knowledgeable in scheduling Hive jobs through the Azkaban scheduler on a weekly and monthly basis.
- Wrote Unix shell scripts and cron jobs for scheduling Pig scripts.
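Hive UDFs of the kind noted above are ordinarily written in Java, but any JVM language works; below is a hypothetical masking UDF sketched in Scala, with the class name, jar path, and function name invented purely for illustration.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Simple Hive UDF that masks the local part of an e-mail address.
    // After packaging into a jar it would be registered in Hive with, e.g.:
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
    class MaskEmail extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val s  = input.toString
        val at = s.indexOf('@')
        if (at <= 1) new Text(s)                                 // nothing meaningful to mask
        else new Text(s.substring(0, 1) + "***" + s.substring(at))
      }
    }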
Confidential
Support Analyst
Responsibilities:
- Debugging stored procedures.
- Creating and developing SSIS packages.
- Writing stored procedures to implement business logic.
- Writing views to produce the desired data outputs.
- Monitoring the jobs in Maestro.
- Resolving the issues in packages causing job failures.
- Following the process for IT alerts.
- Communicating with other teams to resolve issues across the multiple supported applications.
- Providing batch status updates.
- Attending end-user calls and providing solutions.