Big Data Developer Resume
Deerfield, IL
SUMMARY
- 7 years of IT experience in the analysis, design, development, implementation, and testing of software applications, including 4+ years of experience in Big Data, Hadoop, and the Hadoop ecosystem
- Strong skills in developing applications with Big Data technologies such as Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro, and Scala
- Extensively worked on major components of the Hadoop ecosystem such as Flume, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, and YARN
- Developed numerous scripts and batch jobs to schedule various Big Data applications
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Python
- Hands-on experience in importing and exporting data between databases such as Oracle, MySQL, PostgreSQL, and Teradata and HDFS/Hive using Sqoop
- Extensive experience in collecting and storing stream data such as log data in HDFS using Apache Flume
- Extensively used MapReduce design patterns to structure complex MapReduce programs
- Developed Hive and Pig queries for data analysis to meet business requirements
- Experience in extending Hive and Pig core functionality by writing custom UDFs, UDAFs, and UDTFs
- Experienced in implementing security mechanisms for Hive data
- Experience with Hive query performance tuning
- Strong experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Spark SQL, Flume, MapReduce, and Hive
- Experienced in improving data cleansing processes using Pig Latin operations, transformations, and joins
- Extensive knowledge of NoSQL databases like HBase, Cassandra, MongoDB, and Neo4j
- Experience with the Oozie job scheduler to schedule Pig jobs and automate loading data into HDFS
- Good experience with Spark architecture and its integrations such as Spark SQL, DataFrames, and the Dataset API
- Experience in analyzing and processing streaming data into HDFS using Kafka with Spark (see the sketch after this list)
- Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities
- Exceptional ability to learn and master new technologies and deliver results on short deadlines
- Good interpersonal skills and ability to work as part of a team
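The streaming bullet above describes moving Kafka data into HDFS with Spark. Below is a minimal, illustrative PySpark Structured Streaming sketch of that pattern, not code from any project listed here; it assumes the spark-sql-kafka-0-10 package is on the classpath, and the broker address, topic name, and HDFS paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Subscribe to a Kafka topic (broker and topic names are placeholders).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .option("startingOffsets", "latest")
          .load()
          .selectExpr("CAST(key AS STRING) AS key",
                      "CAST(value AS STRING) AS value",
                      "timestamp"))

# Persist each micro-batch to HDFS as Parquet; the checkpoint directory
# lets the query recover its Kafka offsets after a restart.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streaming/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .trigger(processingTime="30 seconds")
         .start())

query.awaitTermination()
```

The same flow can also be built with the older DStream API (KafkaUtils.createDirectStream) that Spark Streaming provides.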
TECHNICAL SKILLS
Big Data Technologies: HDFS, Sqoop, Flume, MapReduce, Hive, Pig, YARN, Hue, HBase, Oozie, ZooKeeper, Impala, Kafka
Big Data Frameworks: HDFS, Spark
Hadoop Distributions: Cloudera CDH4 & CDH5, Hortonworks, Amazon EMR
Programming Languages: Python, Scala, Java, SQL
Databases: RDBMS, Oracle DB, MongoDB, Teradata, HBase, Cassandra, MySQL
Operating Systems: Windows, Unix, CentOS
Web Technologies: JavaScript, HTML, XML
PROFESSIONAL EXPERIENCE
Confidential
Big Data Developer
Responsibilities:
- Defined requirements for data lakes and data pipelines
- Developed end-to-end data pipelines
- Created tables in Hive and integrated data between Hive and Spark
- Extracted data from HDFS into Hive and ran Hive queries against it
- Developed Python scripts to collect data from source systems and store it in HDFS to run analytics
- Involved in the complete Big Data flow of the application, from ingesting data from upstream systems into HDFS through processing and analyzing the data in HDFS
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet tables from Avro-backed Hive tables
- Developed Spark jobs to import data into HDFS from Teradata and created Hive tables (see the sketch below)
- Developed Spark Core and Spark SQL scripts in Scala for faster data processing
- Developed scripts to perform business transformations on the data using Hive and Pig
- Developed UDFs in Scala for Hive and Pig
- Worked on reading multiple data formats from HDFS using Scala
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
- Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata
- Analyzed SQL scripts and designed solutions implemented in Scala
- Performed data analysis with Pig, MapReduce, and Hive
- Designed and developed the data ingestion component
- Worked with Hive partitioning and bucketing to improve query performance over data from different kinds of sources
- Provided cluster coordination services through ZooKeeper
- Imported data from Oracle into HDFS using Sqoop
- Imported and exported data between HDFS and a relational Teradata database using Sqoop
- Developed POCs on Apache Spark and Kafka
- Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing
- Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Pig, Flume, Hive, and Sqoop
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance
- Created a Kafka-based messaging system to publish events for different systems
- Received real-time data from Kafka and stored the streaming data in HDFS using Spark Streaming
- Worked with the Spark Web UI and Hue to monitor streaming jobs and check job status
- Developed analytical components using Scala, Spark, and Spark Streaming
Environment: Hadoop, Sqoop, Hive, Pig, Hue, HBase, Spark, Kafka, Zookeeper, Oracle DB, HDFS
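The Teradata-to-Hive import above was implemented in Scala; the following PySpark sketch shows the equivalent pattern for illustration only. The JDBC URL, credentials, table names, and partition bounds are placeholders, and the Teradata JDBC driver is assumed to be available on the Spark classpath.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("teradata-to-hive")
         .enableHiveSupport()   # register the resulting table in the Hive metastore
         .getOrCreate())

# Pull a source table from Teradata over JDBC (all connection details are placeholders).
source = (spark.read
          .format("jdbc")
          .option("url", "jdbc:teradata://td-host/DATABASE=sales")
          .option("driver", "com.teradata.jdbc.TeraDriver")
          .option("dbtable", "sales.transactions")
          .option("user", "etl_user")
          .option("password", "*****")
          .option("numPartitions", 8)          # parallel reads split on a numeric column
          .option("partitionColumn", "txn_id")
          .option("lowerBound", 1)
          .option("upperBound", 10000000)
          .load())

# Land the data in HDFS as a partitioned Hive table in Parquet with Snappy compression.
(source.write
 .mode("overwrite")
 .format("parquet")
 .option("compression", "snappy")
 .partitionBy("txn_date")
 .saveAsTable("analytics.transactions"))
```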
Confidential, Deerfield, IL
Big Data Developer
Responsibilities:
- Extensively worked on Hive, Pig, MapReduce, Sqoop, HBase, and Oozie in an optimized distributed-processing setup
- Created partitioning, bucketing, map joins, etc. to optimize Hive queries
- Responsible for ETL operations on the data using Pig scripts and developed custom UDFs
- Found solutions to bottlenecks in high-latency Hive queries by analyzing log messages
- Performed operations on data stored in HDFS and in NoSQL databases in both batch-oriented and ad hoc contexts
- Used HCatalog to access Hive tables from various applications
- Worked with the Parquet and Avro data serialization systems to handle JSON data formats
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS
- Developed Pig UDFs to pre-process the data for analysis (see the sketch below)
- Extensively used Sqoop to import/export data between RDBMS and Hive tables, including incremental imports, and created Sqoop jobs that resume from the last saved value
- Collected log data from web servers and integrated it into HDFS using Flume
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system directly or through Sqoop
- Streamlined Hadoop jobs and workflow operations using Oozie workflows, scheduled monthly through AutoSys
- Involved in cluster coordination services using ZooKeeper
- Gathered requirements and designed data warehouse and data mart entities
- Conducted peer design and code reviews and produced extensive documentation of standards, best practices, and ETL procedures
Environment: Hadoop, HDFS, Pig, Hive, Python, Spark, Scala, Cloudera Distribution, HBase, Web Services
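The Pig UDFs mentioned above can be written in several ways; one common approach is Jython functions registered from the Pig script. The sketch below is illustrative only: the function names, schemas, and the Pig statements shown in the comments are assumptions, not the original project code.

```python
# udfs.py -- Python (Jython) UDFs for Apache Pig.
# The outputSchema decorator is provided by Pig's Jython engine when the file is registered:
#   REGISTER 'udfs.py' USING jython AS cleanse;
#   logs  = LOAD '/data/raw/weblogs' USING PigStorage('\t') AS (url:chararray, agent:chararray);
#   clean = FOREACH logs GENERATE cleanse.normalize_url(url), cleanse.is_bot(agent);

@outputSchema("url:chararray")
def normalize_url(url):
    """Lower-case the URL and strip the query string and any trailing slash."""
    if url is None:
        return None
    url = url.strip().lower()
    if '?' in url:
        url = url.split('?', 1)[0]
    return url.rstrip('/')

@outputSchema("is_bot:int")
def is_bot(user_agent):
    """Flag obvious crawler user agents so they can be filtered out downstream."""
    if user_agent is None:
        return 0
    ua = user_agent.lower()
    return 1 if ('bot' in ua or 'crawler' in ua or 'spider' in ua) else 0
```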
Confidential
Hadoop/ETL Developer
Responsibilities:
- Involved in the full life cycle of the project: design, analysis, logical and physical architecture modeling, development, implementation, and testing
- Developed multiple MapReduce jobs in Python for data cleaning and pre-processing (see the sketch below)
- Designed Oozie workflows
- Installed and configured Hive and wrote Hive UDFs
- Involved in cluster installation, monitoring and administration of cluster recovery, capacity planning, and slot configuration
- Created HBase tables to store variable data formats of PII data coming from different portfolios
- Implemented best-income logic using Pig scripts
- Imported data from relational databases into Hive using Sqoop for visualization and to generate reports for the BI team
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop
- Wrote Hadoop MapReduce programs to collect logs and feed them into HBase for analytics
- Built, packaged, and deployed code to the Hadoop servers
- Wrote Unix scripts to manage Hadoop operations
- Wrote stored procedures, functions, packages, and triggers in PL/SQL to implement business rules and processes
- Extensive ETL testing experience using Informatica 9.x (PowerCenter/PowerMart); worked on the Informatica PowerCenter tools Designer, Repository Manager, Workflow Manager, and Workflow Monitor
- Worked on Storm for real-time data processing and aggregation pipelines
- Used advanced SQL such as analytical and aggregate functions for mathematical and statistical calculations
- Optimized SQL used in reports to dramatically improve performance
- Tuned and optimized complex SQL queries
- Worked with business users to gather requirements for developing new reports or changing existing reports
Environment: Hadoop, MapReduce, HDFS, Hive, Python, SQL, Pig, Sqoop, CentOS, Cloudera, Oracle 10g/11g, AutoSys, Shell scripting, MongoDB, OBIEE 11g, Informatica 9.x
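A minimal sketch of the kind of Python MapReduce (Hadoop Streaming) job described above for data cleaning; the record layout, field names, and filtering rules are hypothetical, not taken from the original project.

```python
#!/usr/bin/env python
# mapper.py -- Hadoop Streaming mapper: drop malformed records and emit (status, 1).
# Launched with something like:
#   hadoop jar hadoop-streaming.jar \
#     -input /data/raw/weblogs -output /data/clean/status_counts \
#     -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
import sys

for line in sys.stdin:
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 3:           # skip records missing required fields
        continue
    ip, status, url = parts[0], parts[1], parts[2]
    if not status.isdigit():     # skip records with a garbled status code
        continue
    print("%s\t%s" % (status, 1))
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop Streaming reducer: sum the counts per status code.
# Streaming sorts the mapper output by key, so a running total per key works.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key == current_key:
        count += int(value)
    else:
        if current_key is not None:
            print("%s\t%d" % (current_key, count))
        current_key, count = key, int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))
```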
Confidential
SQL Server Developer
Responsibilities:
- Actively participated in SDLC processes including requirement gathering, analysis, development, implementation, testing, and maintenance
- Involved in the creation of database objects such as tables, views, stored procedures, functions, packages, DB triggers, and indexes
- Created SQL queries for data retrieval and optimized queries for maximum efficiency using SQL Profiler
- Involved in the development of SQL Server maintenance plans, job scheduling, alerts, and troubleshooting
- Migrated data from Oracle, Excel, flat files, and MS Access to MS SQL Server using DTS and SSIS
- Used FTP tasks, Script tasks, Lookup transformations, and Data Flow tasks to load staging databases in SSIS
- Created subreports, on-demand reports, and custom ad hoc reports using SSRS
- Deployed SSIS packages into production and used package configurations to export package properties, making packages environment independent
- Developed dashboard reports using Reporting Services
- Responsible for creating datasets using T-SQL and stored procedures (see the sketch below)
- Participated in creating reports that deliver data based on stored procedures
- Identified slow-running queries, optimized stored procedures, and tested applications for performance and data integrity using SQL Profiler
- Created views to reduce database complexity for end users
- Created constraints and wrote and executed T-SQL queries such as stored procedures and triggers using SQL Server Management Studio
- Worked on importing and exporting data between Text/Excel files and SQL Server
- Contributed from design through implementation and maintenance of the application in an Agile environment/methodology
Environment: MS SQL Server, MS SQL Server Reporting Services (SSRS), MS SQL Server Integration Services (SSIS), Team Foundation Server (TFS)
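For consistency with the other sketches in this resume, here is a hedged Python (pyodbc) illustration of consuming the kind of T-SQL dataset described above; the connection string, stored procedure name, parameters, and column names are all hypothetical.

```python
import pyodbc

# Connection details are placeholders; the driver name depends on what is installed locally.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlhost;DATABASE=SalesDW;UID=report_user;PWD=*****"
)
cursor = conn.cursor()

# Call a (hypothetical) stored procedure that builds a report dataset,
# binding the date range as parameters.
cursor.execute(
    "EXEC dbo.usp_MonthlySalesSummary @StartDate = ?, @EndDate = ?",
    ("2015-01-01", "2015-01-31"),
)

for row in cursor.fetchall():
    print(row.Region, row.TotalSales)

cursor.close()
conn.close()
```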