Big Data Developer Resume
Deerfield, IL
SUMMARY
- 7 years of IT experience in the analysis, design, development, implementation, and testing of software applications, including 4+ years of experience in Big Data, Hadoop, and the Hadoop ecosystem
- Strong skills in developing applications with Big Data technologies such as Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro, and Scala
- Extensively worked on major components of the Hadoop ecosystem such as Flume, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, and YARN
- Developed numerous scripts and batch jobs to schedule various Big Data applications
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Python
- Hands-on experience in importing and exporting data between databases such as Oracle, MySQL, PostgreSQL, and Teradata and HDFS/Hive using Sqoop
- Extensive experience in collecting and storing stream data such as log data in HDFS using Apache Flume
- Extensively used MapReduce design patterns to structure complex MapReduce programs
- Developed Hive and Pig queries for data analysis to meet business requirements
- Experience in extending Hive and Pig core functionality by writing custom UDFs, UDAFs, and UDTFs
- Experienced in implementing security mechanisms for Hive data
- Experience with Hive query performance tuning
- Strong experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Spark SQL, Flume, MapReduce, and Hive
- Experienced in improving data cleansing processes using Pig Latin operations, transformations, and joins
- Extensive knowledge of NoSQL databases like HBase, Cassandra, MongoDB, and Neo4j
- Experience with the Oozie job scheduler to schedule Pig jobs and automate loading data into HDFS
- Good experience with Spark architecture and its integrations such as Spark SQL, DataFrames, and the Dataset API
- Experience in analyzing and processing streaming data into HDFS using Kafka with Spark (see the sketch after this list)
- Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities
- Exceptional ability to learn and master new technologies and deliver results on short deadlines
- Good interpersonal skills and ability to work as part of a team
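The streaming bullet above describes moving Kafka data into HDFS with Spark. Below is a minimal, illustrative PySpark Structured Streaming sketch of that pattern, not code from any project listed here; it assumes the spark-sql-kafka-0-10 package is on the classpath, and the broker address, topic name, and HDFS paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to a Kafka topic (broker and topic names are placeholders).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(key AS STRING) AS key",
                      "CAST(value AS STRING) AS value",
                      "timestamp"))

# Persist each micro-batch to HDFS as Parquet; the checkpoint directory
# lets the query recover its Kafka offsets after a restart.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streaming/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .trigger(processingTime="30 seconds")
         .start())

query.awaitTermination()
```

The same flow can also be built with the older DStream API (KafkaUtils.createDirectStream) that Spark Streaming provides.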
TECHNICAL SKILLS
Big Data Technologies: HDFS, Sqoop, Flume, MapReduce, Hive, Pig, YARN, Hue, HBase, Oozie, ZooKeeper, Impala, Kafka
Big Data Frameworks: HDFS, Spark
Hadoop Distributions: Cloudera CDH4 & CDH5, Hortonworks, Amazon EMR
Programming Languages: Python, Scala, Java, SQL
Databases: RDBMS, Oracle DB, MongoDB, Teradata, HBase, Cassandra, MySQL
Operating Systems: Windows, Unix, CentOS
Web Technologies: JavaScript, HTML, XML
PROFESSIONAL EXPERIENCE
Confidential
Big Data Developer
Responsibilities:
- Defined requirements for data lakes and data pipelines
- Developed end-to-end data pipelines
- Created tables in Hive and integrated data between Hive and Spark
- Extracted data from HDFS into Hive and ran Hive queries against it
- Developed Python scripts to collect data from source systems and store it in HDFS to run analytics
- Involved in the complete Big Data flow of the application, from ingesting data from upstream systems into HDFS through processing and analyzing the data in HDFS
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from Avro-backed Hive tables
- Developed Spark jobs to import data into HDFS from Teradata and created Hive tables (see the sketch below)
- Developed Spark Core and Spark SQL scripts in Scala for faster data processing
- Developed scripts to perform business transformations on the data using Hive and Pig
- Developed UDFs in Scala for Hive and Pig
- Worked on reading multiple data formats from HDFS using Scala
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
- Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata
- Analyzed SQL scripts and designed solutions implemented in Scala
- Performed data analysis with Pig, MapReduce, and Hive
- Designed and developed the data ingestion component
- Worked with Hive partitioning and bucketing to improve query performance over data from different kinds of sources
- Provided cluster coordination services through ZooKeeper
- Imported data from Oracle into HDFS using Sqoop
- Imported and exported data between HDFS and a relational Teradata database using Sqoop
- Developed POCs on Apache Spark and Kafka
- Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing
- Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Pig, Flume, Hive, and Sqoop
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance
- Created a Kafka-based messaging system to publish events for different systems
- Received real-time data from Kafka and stored the streaming data in HDFS using Spark Streaming
- Worked with the Spark Web UI and Hue to monitor streaming jobs and check job status
- Developed analytical components using Scala, Spark, and Spark Streaming
Environment: Hadoop, Sqoop, Hive, Pig, Hue, HBase, Spark, Kafka, Zookeeper, Oracle DB, HDFS
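The Teradata-to-Hive import above was implemented in Scala; the following PySpark sketch shows the equivalent pattern for illustration only. The JDBC URL, credentials, table names, and partition bounds are placeholders, and the Teradata JDBC driver is assumed to be available on the Spark classpath.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("teradata-to-hive")
         .enableHiveSupport()   # register the resulting table in the Hive metastore
         .getOrCreate())

# Pull a source table from Teradata over JDBC (all connection details are placeholders).
source = (spark.read
          .format("jdbc")
          .option("url", "jdbc:teradata://td-host/DATABASE=sales")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "sales.transactions")
          .option("user", "etl_user")
          .option("password", "*****")
          .option("numPartitions", 8)          # parallel reads split on a numeric column
          .option("partitionColumn", "txn_id")
          .option("lowerBound", 1)
          .option("upperBound", 10000000)
          .load())

# Land the data in HDFS as a partitioned Hive table in Parquet with Snappy compression.
(source.write
 .mode("overwrite")
 .format("parquet")
 .option("compression", "snappy")
 .partitionBy("txn_date")
 .saveAsTable("analytics.transactions"))
```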
Confidential, Deerfield, IL
Big Data Developer
Responsibilities:
- Extensively worked on Hive, Pig, MapReduce, Sqoop, HBase, and Oozie in an optimized distributed-processing setup
- Created partitioning, bucketing, map joins, etc. to optimize Hive queries
- Responsible for ETL operations on the data using Pig scripts and developed custom UDFs
- Found solutions to bottlenecks in high-latency Hive queries by analyzing log messages
- Performed operations on data stored in HDFS and in NoSQL databases in both batch-oriented and ad hoc contexts
- Used HCatalog to access Hive tables from various applications
- Worked with the Parquet and Avro data serialization systems to handle JSON data formats
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS
- Developed Pig UDFs to pre-process the data for analysis (see the sketch below)
- Extensively used Sqoop to import/export data between RDBMS and Hive tables, including incremental imports, and created Sqoop jobs that resume from the last saved value
- Collected log data from web servers and integrated it into HDFS using Flume
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system directly or through Sqoop
- Streamlined Hadoop jobs and workflow operations using Oozie workflows, scheduled monthly through AutoSys
- Involved in cluster coordination services using ZooKeeper
- Gathered requirements and designed data warehouse and data mart entities
- Conducted peer design and code reviews and produced extensive documentation of standards, best practices, and ETL procedures
Environment: Hadoop, HDFS, Pig, Hive, Python, Spark, Scala, Cloudera Distribution, HBase, Web Services
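The Pig UDFs mentioned above can be written in several ways; one common approach is Jython functions registered from the Pig script. The sketch below is illustrative only: the function names, schemas, and the Pig statements shown in the comments are assumptions, not the original project code.

```python
# udfs.py -- Python (Jython) UDFs for Apache Pig.
# The outputSchema decorator is provided by Pig's Jython engine when the file is registered:
#   REGISTER 'udfs.py' USING jython AS cleanse;
#   logs  = LOAD '/data/raw/weblogs' USING PigStorage('\t') AS (url:chararray, agent:chararray);
#   clean = FOREACH logs GENERATE cleanse.normalize_url(url), cleanse.is_bot(agent);

@outputSchema("url:chararray")
def normalize_url(url):
    """Lower-case the URL and strip the query string and any trailing slash."""
    if url is None:
        return None
    url = url.strip().lower()
    if '?' in url:
        url = url.split('?', 1)[0]
    return url.rstrip('/')

@outputSchema("is_bot:int")
def is_bot(user_agent):
    """Flag obvious crawler user agents so they can be filtered out downstream."""
    if user_agent is None:
        return 0
    ua = user_agent.lower()
    return 1 if ('bot' in ua or 'crawler' in ua or 'spider' in ua) else 0
```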
Confidential
Hadoop/ETL Developer
Responsibilities:
- Involved in the full life cycle of the project: design, analysis, logical and physical architecture modeling, development, implementation, and testing
- Developed multiple MapReduce jobs in Python for data cleaning and pre-processing (see the sketch below)
- Designed Oozie workflows
- Installed and configured Hive and wrote Hive UDFs
- Involved in cluster installation, monitoring and administration of cluster recovery, capacity planning, and slot configuration
- Created HBase tables to store variable data formats of PII data coming from different portfolios
- Implemented best-income logic using Pig scripts
- Imported data from relational databases into Hive using Sqoop for visualization and to generate reports for the BI team
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop
- Wrote Hadoop MapReduce programs to collect logs and feed them into HBase for analytics
- Built, packaged, and deployed code to the Hadoop servers
- Wrote Unix scripts to manage Hadoop operations
- Wrote stored procedures, functions, packages, and triggers in PL/SQL to implement business rules and processes
- Extensive ETL testing experience using Informatica 9.x (PowerCenter/PowerMart); worked on the Informatica PowerCenter tools Designer, Repository Manager, Workflow Manager, and Workflow Monitor
- Worked on Storm for real-time data processing and aggregation pipelines
- Used advanced SQL such as analytical and aggregate functions for mathematical and statistical calculations
- Optimized SQL used in reports to dramatically improve performance
- Tuned and optimized complex SQL queries
- Worked with business users to gather requirements for developing new reports or changing existing reports
Environment: Hadoop, MapReduce, HDFS, Hive, Python, SQL, Pig, Sqoop, CentOS, Cloudera, Oracle 10g/11g, AutoSys, Shell scripting, MongoDB, OBIEE 11g, Informatica 9.x
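A minimal sketch of the kind of Python MapReduce (Hadoop Streaming) job described above for data cleaning; the record layout, field names, and filtering rules are hypothetical, not taken from the original project.

```python
#!/usr/bin/env python
# mapper.py -- Hadoop Streaming mapper: drop malformed records and emit (status, 1).
# Launched with something like:
#   hadoop jar hadoop-streaming.jar \
#     -input /data/raw/weblogs -output /data/clean/status_counts \
#     -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
import sys

for line in sys.stdin:
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:           # skip records missing required fields
        continue
    ip, status, url = parts[0], parts[1], parts[2]
    if not status.isdigit():     # skip records with a garbled status code
        continue
    print("%s\t%s" % (status, 1))
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop Streaming reducer: sum the counts per status code.
# Streaming sorts the mapper output by key, so a running total per key works.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key == current_key:
        count += int(value)
    else:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key, count = key, int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))
```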
Confidential
SQL Server Developer
Responsibilities:
- Actively participated in SDLC processes including requirement gathering, analysis, development, implementation, testing, and maintenance
- Involved in the creation of database objects such as tables, views, stored procedures, functions, packages, DB triggers, and indexes
- Created SQL queries for data retrieval and optimized queries for maximum efficiency using SQL Profiler
- Involved in the development of SQL Server maintenance plans, job scheduling, alerts, and troubleshooting
- Migrated data from Oracle, Excel, flat files, and MS Access to MS SQL Server using DTS and SSIS
- Used FTP tasks, Script tasks, Lookup transformations, and Data Flow tasks to load staging databases in SSIS
- Created subreports, on-demand reports, and custom ad hoc reports using SSRS
- Deployed SSIS packages into production and used package configurations to export package properties, making packages environment independent
- Developed dashboard reports using Reporting Services
- Responsible for creating datasets using T-SQL and stored procedures (see the sketch below)
- Participated in creating reports that deliver data based on stored procedures
- Identified slow-running queries, optimized stored procedures, and tested applications for performance and data integrity using SQL Profiler
- Created views to reduce database complexity for end users
- Created constraints and wrote and executed T-SQL queries such as stored procedures and triggers using SQL Server Management Studio
- Worked on importing and exporting data between Text/Excel files and SQL Server
- Contributed from design through implementation and maintenance of the application in an Agile environment/methodology
Environment: MS SQL Server, MS SQL Server Reporting Services (SSRS), MS SQL Server Integration Services (SSIS), Team Foundation Server (TFS)
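For consistency with the other sketches in this resume, here is a hedged Python (pyodbc) illustration of consuming the kind of T-SQL dataset described above; the connection string, stored procedure name, parameters, and column names are all hypothetical.

```python
import pyodbc

# Connection details are placeholders; the driver name depends on what is installed locally.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlhost;DATABASE=SalesDW;UID=report_user;PWD=*****"
)
cursor = conn.cursor()

# Call a (hypothetical) stored procedure that builds a report dataset,
# binding the date range as parameters.
cursor.execute(
    "EXEC dbo.usp_MonthlySalesSummary @StartDate = ?, @EndDate = ?",
    ("2015-01-01", "2015-01-31"),
)

for row in cursor.fetchall():
    print(row.Region, row.TotalSales)

cursor.close()
conn.close()
```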