
Big Data Developer Resume


Deerfield, IL

SUMMARY

  • 7 years of proactive IT experience in Analysis, Design, Development, Implementation, and Testing of software applications, including 4+ years of experience in Big Data, Hadoop, and the Hadoop Ecosystem
  • Strong skills in developing applications involving Big Data technologies like Hadoop, Spark, Elasticsearch, MapReduce, Yarn, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro, and Scala
  • Extensively worked on major components of Hadoop Ecosystem like Flume, HBase, Zookeeper, Oozie, Hive, Sqoop, PIG, and YARN
  • Developed various scripts and numerous batch jobs to schedule Big Data applications
  • Experience in analyzing data using HiveQL, PIG Latin, and custom MapReduce programs in Python
  • Hands on experience in importing and exporting data from different databases like Oracle, MySQL, PostgreSQL, Teradata into HDFS and Hive using Sqoop
  • Extensive experience in collecting and storing stream data like log data in HDFS using Apache Flume.
  • Extensively used MapReduce design patterns to build complex MapReduce programs
  • Developed Hive and PIG queries for data analysis to meet the business requirements
  • Experience in extending Hive and Pig core functionality by writing custom UDFs, UDAFs, and UDTFs
  • Experienced in implementing security mechanisms for Hive data
  • Experience with Hive query performance tuning
  • Strong experience in architecting real-time streaming applications and batch-style large-scale distributed computing applications using tools like Spark Streaming, Spark SQL, Flume, MapReduce, Hive, etc.
  • Experienced in improving the data cleansing process using Pig Latin operations, transformations, and join operations
  • Extensive knowledge of NoSQL databases like HBase, Cassandra, MongoDB, and Neo4j
  • Experience with the Oozie job scheduler to schedule Pig jobs and automate loading data into HDFS
  • Good experience in Spark architecture and its integrations like Spark SQL, DataFrames, and Datasets APIs (see the sketch below)
  • Experience in analyzing and processing streaming data into HDFS using Kafka with Spark
  • Ability to perform at a high level, meet deadlines, adaptable to ever-changing priorities
  • Exceptional ability to learn and master new technologies and to deliver outputs in short deadlines
  • Good Interpersonal skills and ability to work as part of a team
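
A minimal sketch of the Spark SQL / DataFrame / Dataset usage mentioned above. The table name retail.orders, the Order case class, and the column names are illustrative assumptions, not drawn from any specific project.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative record type for the typed Dataset API (hypothetical schema)
case class Order(orderId: Long, customerId: Long, amount: Double)

object DatasetExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-example")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // DataFrame: untyped rows read from a Hive table (table name is illustrative)
    val ordersDf = spark.table("retail.orders")

    // Dataset: the same data with a compile-time checked schema
    val orders = ordersDf.as[Order]

    // Spark SQL and the typed API can be mixed freely
    ordersDf.createOrReplaceTempView("orders")
    spark.sql("SELECT customerId, SUM(amount) AS total FROM orders GROUP BY customerId").show()
    orders.filter(_.amount > 100.0).show()

    spark.stop()
  }
}
```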

TECHNICAL SKILLS

Big Data Technologies: HDFS, Sqoop, Flume, MapReduce, Hive, Pig, Yarn, Hue, HBase, Oozie, Zookeeper, Impala, Kafka

Big Data Frameworks: HDFS, Spark

Hadoop Distributions: Cloudera CDH4 &5, Hortonworks, Amazon EMR

Programming Languages: Python, Scala, Java, SQL

Databases: RDBMS, Oracle DB, MongoDB, Teradata, HBase, Cassandra, MySQL

Operating Systems: Windows, Unix, CentOS

Scripting Languages: JavaScript, HTML, XML

PROFESSIONAL EXPERIENCE

Confidential

Big Data Developer

Responsibilities:

  • Defined the requirements for data lakes/pipelines
  • Developed end-to-end data pipelines
  • Created tables in Hive and integrated data between Hive and Spark
  • Performed Hive queries by extracting data from Hadoop into Hive
  • Developed Python scripts to collect data from source systems and store it on HDFS to run analytics
  • Involved in the complete Big Data flow of the application, from data ingestion from upstream into HDFS to processing and analyzing the data in HDFS
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables
  • Developed Spark API to import data into HDFS from Teradata and created Hive tables
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing
  • Developed scripts to perform business transformations on the data using Hive and Pig
  • Developed UDFs in Scala for Hive and Pig
  • Worked on reading multiple data formats on HDFS using Scala
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
  • Developed multiple POCs using Scala and deployed them on the YARN cluster, comparing the performance of Spark with Hive and SQL/Teradata
  • Analyzed the SQL scripts and designed the solution to implement using Scala
  • Performed data analysis through Pig, MapReduce, and Hive
  • Designed and developed the data ingestion component
  • Worked with Hive partitioning and bucketing of data to improve query performance across different kinds of data sources
  • Cluster coordination services through Zookeeper
  • Imported data using Sqoop from Oracle to HDFS
  • Imported and exported data using Sqoop between HDFS and the relational database Teradata
  • Developed POCs on Apache Spark and Kafka
  • Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing
  • Hands on experience in installing, configuring and using eco-System components like Hadoop MapReduce, HDFS, HBase, Pig, Flume, Hive, and Sqoop
  • Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance
  • Created Kafka based messaging system to create events for different systems
  • Received real-time data from Kafka and stored the stream data in HDFS using Spark Streaming (see the sketch below)
  • Worked with the Spark Web UI and Hue for streaming the data and checking job status
  • Developed an analytical component using Scala, Spark, and Spark Streaming

Environment: Hadoop, Sqoop, Hive, Pig, Hue, HBase, Spark, Kafka, Zookeeper, Oracle DB, HDFS
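
A minimal sketch of the Kafka-to-HDFS streaming pattern referenced above, written with Spark Structured Streaming for brevity; the broker address, topic name, and HDFS paths are illustrative assumptions, and the original work may equally have used the DStream API.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")
      .getOrCreate()

    // Subscribe to the raw event stream (broker and topic names are illustrative)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS payload")

    // Land each micro-batch on HDFS as Parquet; the checkpoint enables recovery
    events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/landing/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()
      .awaitTermination()
  }
}
```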

Confidential, Deerfield, IL

Big Data Developer

Responsibilities:

  • Extensively worked on Hive, Pig, MapReduce, Sqoop, HBase, and Oozie for optimized distributed processing
  • Created Partitioning, Bucketing, Map Join, etc. for optimizing the hive queries
  • Responsible for ETL operations on the data using Pig Scripts and developed custom UDFs
  • Found solutions to bottlenecks in high-latency Hive queries by analyzing log messages
  • Performed operations on data stored in HDFS and other NoSQL databases in both batch-oriented and ad-hoc contexts
  • Used HCatalog for accessing Hive tables through various applications
  • Worked with the Parquet and Avro data serialization systems to handle JSON data formats
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS
  • Developed Pig UDFs to pre-process the data for analysis
  • Extensively used Sqoop to import/export data between RDBMS and Hive tables, performed incremental imports, and created Sqoop jobs based on the last saved value
  • Collected the log data from web servers and integrated it into HDFS using Flume
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system directly or through Sqoop (see the sketch below)
  • Streamlined Hadoop Jobs and workflow operations using Oozie workflow and scheduled through AUTOSYS on a monthly basis
  • Involved in cluster coordination services using Zookeeper
  • Gathered requirements and designed data warehouse and data mart entities
  • Conducted peer design and code reviews and extensive documentation of standards, best practices, and ETL procedures

Environment: Hadoop, HDFS, Pig, Hive, Python, Spark, Scala, Cloudera Distribution, HBase, Web Services
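
A minimal sketch of the Spark-over-Hive aggregation with a custom UDF and a JDBC write-back referenced above; the table names, partition column, normalization rule, and connection details are hypothetical assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Simple UDF applied during aggregation (normalization rule is illustrative)
    val normalizeRegion = udf((r: String) => if (r == null) "UNKNOWN" else r.trim.toUpperCase)

    // Read a partitioned Hive table, restricting to a single partition for efficiency
    val daily = spark.table("sales.transactions")
      .where(col("load_date") === "2017-06-01")
      .withColumn("region", normalizeRegion(col("region")))
      .groupBy("region")
      .agg(sum("amount").as("total_amount"))

    // Write the aggregate back to an RDBMS staging table over JDBC
    daily.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "STG_DAILY_SALES")
      .option("user", sys.env.getOrElse("DB_USER", "etl"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .mode("overwrite")
      .save()

    spark.stop()
  }
}
```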

Confidential

Hadoop/ETL Developer

Responsibilities:

  • Involved in full life-cycle of the project from Design, Analysis, logical and physical architecture modeling, development, Implementation, testing
  • Developed multiple MapReduce jobs in Python for data cleaning and pre-processing
  • Designed Oozie workflows
  • Installed and configured Hive and wrote Hive UDFs (see the sketch below)
  • Involved in Installation of a cluster, monitoring/administration of cluster recovery, capacity planning, and slots configuration
  • Created HBase tables to store variable data formats of PII data coming from different portfolios
  • Implemented best income logic using Pig scripts
  • Imported data from relational databases to Hive using Sqoop for visualization and to generate reports for the BI team
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop
  • Wrote Hadoop MapReduce programs to collect logs and feed them into HBase for analytics
  • Built, packaged, and deployed the code to the Hadoop servers
  • Wrote Unix scripts to manage Hadoop operations
  • Wrote Stored Procedures, Functions, Packages, and triggers using PL/SQL to implement business rules and processes
  • Extensive ETL testing experience using Informatica 9.x (PowerCenter/Power Mart); worked on Informatica PowerCenter tools: Designer, Repository Manager, Workflow Manager, and Workflow Monitor
  • Worked on Storm for real-time data processing and aggregation pipelines
  • Used advanced SQL like analytical functions, aggregate functions for mathematical and statistical calculations
  • Optimized SQL used in reports to improve performance dramatically
  • Tuned and optimized the complex SQL queries
  • Worked with business users to gather requirements for developing new reports or changes to existing reports

Environment: Hadoop, MapReduce, HDFS, Hive, Python, SQL, Pig, Sqoop, CentOS, Cloudera, Oracle 10g/11g, AutoSys, Shell scripting, MongoDB, OBIEE 11g, Informatica 9.x
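
A minimal sketch of a custom Hive UDF as referenced above, using the classic org.apache.hadoop.hive.ql.exec.UDF API. The class name, jar path, and masking rule are illustrative assumptions, and the sketch is in Scala to keep one language across the examples.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// A minimal Hive UDF: masks all but the last four characters of a PII field.
// After packaging the jar, register it in Hive (paths/names are hypothetical):
//   ADD JAR hdfs:///udfs/mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_pii AS 'com.example.udf.MaskPii';
class MaskPii extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) {
      null
    } else {
      val s = input.toString
      val visible = 4
      val masked =
        if (s.length <= visible) s
        else "*" * (s.length - visible) + s.takeRight(visible)
      new Text(masked)
    }
  }
}
```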

Confidential

SQL Server Developer

Responsibilities:

  • Actively participated in SDLC processes including requirement gathering, analysis, development, implementation, testing and maintenance
  • Involved in the creation of database objects like tables, views, stored procedures, functions, packages, DB triggers, Indexes
  • Created SQL queries for data retrieval and optimized queries for maximum efficiency using SQL Profiler
  • Involved in the development of SQL Server maintenance plans, scheduling jobs, alerts, and troubleshooting
  • Migrated data from Oracle, Excel, flat files, and MS Access to MS SQL Server using DTS and SSIS
  • Used FTP task, ETL Script task, lookup transformation and Data flow task to load staging databases in SSIS
  • Created sub-reports and on-demand, custom ad-hoc reports using SSRS
  • Deployed SSIS packages into Production and used package configurations to export various package properties to make packages environment-independent
  • Developed dashboard reports using Reporting Services
  • Responsible for creating datasets using T-SQL and stored procedures
  • Participated in creating reports that deliver data based on stored procedures
  • Identified slow-running queries, optimized stored procedures, and tested applications for performance and data integrity using SQL Profiler
  • Created Views to reduce database complexities for the end users
  • Created constraints and wrote and executed T-SQL queries such as stored procedures and triggers using SQL Server Management Studio
  • Worked on import and export of data from Text and Excel to SQL Server
  • Contributed from design to implementation and maintenance phases of the application in an Agile environment

Environment: MS SQL Server, SQL Server Reporting Services (SSRS), SQL Server Integration Services (SSIS), Team Foundation Server (TFS)
