Sr. Big Data Engineer Resume
Lakewood, NJ
SUMMARY OF QUALIFICATIONS:
- IT professional with 8+ years of experience in Java, J2EE, Big Data technologies, the Hadoop stack, and the Spark framework with Scala.
- Strong experience with HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Oozie, and HBase.
- Good understanding of distributed systems, HDFS architecture, internal working details of MapReduce and Spark processing frameworks.
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Expertise in inbound and outbound (importing/exporting) data from/to traditional RDBMS using Sqoop.
- Extensive experience with HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
- Experience implementing and working with various Hadoop distributions (Cloudera, Hortonworks, and Amazon AWS).
- Proficient in data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience with NoSQL and SQL databases.
- Versatile team player with excellent analytical and interpersonal skills and an ability to quickly adapt to new technologies and project environments.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, Spark, Scala, MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Kafka, HBase
Java Technologies: JSP, Servlets, JUnit, Spring, Hibernate
Database Technologies: MySQL, SQL Server, Oracle, MS Access
Programming Languages: Scala, Python, Java, and Linux shell scripting
Operating Systems: Windows, Linux
PROFESSIONAL EXPERIENCE:
Confidential - Lakewood, NJ
Sr. Big Data Engineer
Roles & Responsibilities:
- Involved in requirements gathering and building data lake on top of HDFS.
- Worked with GoCD (a CI/CD tool) to deploy the application and with the Munin framework for big data testing.
- Extensively involved in writing UDFs in Hive.
- Worked on AWS components such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
- Developed Sqoop scripts to migrate data from Oracle to the big data environment.
- Worked extensively with Avro and Parquet files, converting data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in Spark (a minimal sketch follows this list).
- Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
- Created Hive DDL on Parquet and Avro data files residing in both HDFS and S3 buckets.
- Created Airflow scheduling scripts in Python to automate the process of Sqooping a wide range of data sets.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Created data partitions on large data sets in S3 and DDL on partitioned data.
- Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
- Extensively used Stash (Bitbucket) Git repositories for code control.
- Monitored and troubleshot Hadoop jobs using the YARN Resource Manager and EMR job logs using Genie and Kibana.
- Created data pipelines for different events covering ingestion and aggregation, and loaded consumer response data from AWS S3 buckets into Hive external tables in HDFS to serve as feeds for Tableau dashboards.
- Worked with different file formats like JSON, Avro, and Parquet and compression techniques like Snappy.
- Developed Python code for different tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation using Airflow.
- Developed shell scripts for adding dynamic partitions to the Hive stage table, verifying JSON schema changes in source files, and checking for duplicate files in the source location.
- Converted Hive queries into Spark transformations using Spark RDDs.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to Hive and the AWS cloud (S3), making the data available in Athena and Snowflake.
- Imported data from different sources like AWS S3 and the local file system into Spark RDDs.
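For illustration, a minimal sketch of the JSON-to-Parquet conversion described above, using the Spark DataFrame API in Java; the bucket and path names are placeholders, and the production jobs were orchestrated through Airflow and EMR rather than run as a standalone class.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class JsonToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("JsonToParquet")
                .getOrCreate();

        // Read semi-structured JSON; Spark infers the schema from the records.
        // The S3 locations below are placeholders.
        Dataset<Row> events = spark.read().json("s3://example-bucket/raw/events/");

        // Write the same data out as Parquet so Hive, Athena, and Snowflake
        // external tables can be defined over the curated location.
        events.write()
              .mode(SaveMode.Overwrite)
              .parquet("s3://example-bucket/curated/events/");

        spark.stop();
    }
}
```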
Environment: Spark, AWS, EC2, EMR, Hive, SQL Workbench, Genie Logs, Kibana, Sqoop, Spark SQL, Spark Streaming, Scala, Python
Confidential - Detroit, MI
Sr. Big Data Developer
Roles & Responsibilities:
- Integrated Kafka with Spark Streaming for real-time data processing (see the sketch after this list).
- Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations.
- Imported data from different sources into Spark RDDs for processing.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slots configuration.
- Developed Spark applications for the entire batch processing using Scala.
- Automatically scaled up EMR instances based on data volume.
- Ran and scheduled Spark scripts in EMR pipelines.
- Utilized Spark DataFrames and Spark SQL extensively for all processing.
- Involved in managing and reviewing Hadoop log files.
- Worked on Hive partitioning and bucketing, performed joins on Hive tables, and utilized Hive SerDes such as RegEx, JSON, and Avro.
- Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
- Executed tasks for upgrading cluster on the staging platform before doing it on production cluster.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure that supports all Hadoop clusters.
- Installed and configured various components of Hadoop ecosystem.
- Optimized Hive analytics SQL queries, created tables/views, wrote custom UDFs, and implemented Hive-based exception processing.
- Involved in transferring data between legacy relational database tables and HDFS and HBase tables using Sqoop, in both directions.
- Replaced the default Derby metadata storage system (metastore) for Hive with MySQL.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig.
- Configured Fair Scheduler to provide fair resources to all the applications across the cluster.
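A minimal sketch of the Kafka-to-Spark-Streaming integration referenced in the first bullet above, shown with Spark Structured Streaming's Kafka source in Java; the broker address, topic, and output paths are placeholders, and the original applications were written in Scala and may have used the DStream API instead.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaStreamJob {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("KafkaStreamJob")
                .getOrCreate();

        // Subscribe to a Kafka topic (broker and topic names are placeholders).
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "events")
                .load();

        // Treat the Kafka value as a string and apply a trivial cleansing step.
        Dataset<Row> cleaned = stream
                .selectExpr("CAST(value AS STRING) AS payload")
                .filter("payload IS NOT NULL");

        // Land the cleaned records as Parquet with checkpointing for recovery.
        StreamingQuery query = cleaned.writeStream()
                .format("parquet")
                .option("path", "hdfs:///data/streaming/events")
                .option("checkpointLocation", "hdfs:///checkpoints/events")
                .start();
        query.awaitTermination();
    }
}
```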
Environment: Hadoop (Cloudera stack), Hue, Spark, Kafka, HBase, HDFS, Hive, Pig, Sqoop, Oracle
Confidential - Columbus, OH
Sr. Hadoop Developer
Roles & Responsibilities:
- Worked on AWS EMR, Spark installation, HDFS, and MapReduce architecture.
- Participated in Hadoop Deployment and infrastructure scaling.
- Involved in creating Hive tables, and loading and analyzing data using hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Parsed high-level design spec to simple ETL coding and mapping standards.
- Maintained warehouse metadata, naming standards and warehouse standards for future application development.
- Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
- Implemented Kafka consumers to move data from Kafka partitions into Cassandra for near real-time analysis (see the sketch after this list).
- Involved in Hadoop cluster task like adding and removing nodes.
- Managed and reviewed Hadoop log files and loaded log data into HDFS using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts, and Sqoop jobs.
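A minimal sketch of the kind of Kafka consumer used to land records in Cassandra, as described above; the broker, topic, keyspace, and table names are placeholders, and the real consumers additionally handled batching, error handling, and offset management.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class KafkaToCassandra {
    public static void main(String[] args) {
        // Consumer configuration; broker address and group id are placeholders.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        props.put("group.id", "events-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Cluster cluster = Cluster.builder().addContactPoint("cassandra-host").build();
             Session session = cluster.connect("analytics")) {

            consumer.subscribe(Collections.singletonList("events"));
            PreparedStatement insert =
                    session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)");

            // Poll Kafka and write each record into Cassandra for near real-time analysis.
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    session.execute(insert.bind(record.key(), record.value()));
                }
            }
        }
    }
}
```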
Environment: Hadoop (Hortonworks stack), HDFS, Oozie, Pig, Hive, MapReduce, Sqoop, Cassandra, Linux
Confidential - Denver, CO
Hadoop Developer
Roles & Responsibilities:
- Worked on analyzing, writing Hadoop MapReduce jobs using Java API, Pig and Hive.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from edge node to HDFS using shell scripting.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked with different compression techniques such as LZO, Snappy, and Bzip2 to save storage and optimize data transfer over the network.
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, Zookeeper, and Spark.
- Developed custom aggregate functions using Spark-SQL and performed interactive querying.
- Used Sqoop to store the data into HBase and Hive.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slots configuration.
- Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL (see the sketch after this list).
- Used Pig to parse the data and stored it in Avro format.
- Stored the data in tabular formats using Hive tables and Hive SerDes.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Implemented a script to transmit information from Oracle to HBase using Sqoop.
- Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for log files.
- Fine-tuned Pig queries for better performance.
- Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
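A minimal sketch of the dynamic-partition pattern used for the Hive tables above, issued here over Hive JDBC from Java; the HiveServer2 URL, table, and column names are placeholders, and in practice the equivalent HiveQL was run from scripts and Oozie workflows.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveDynamicPartitions {
    public static void main(String[] args) throws Exception {
        // Load the HiveServer2 JDBC driver; URL and database are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn =
                     DriverManager.getConnection("jdbc:hive2://hive-host:10000/default");
             Statement stmt = conn.createStatement()) {

            // Allow Hive to create partitions on the fly during the insert.
            stmt.execute("SET hive.exec.dynamic.partition=true");
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");

            stmt.execute("CREATE TABLE IF NOT EXISTS logs_part (msg STRING) "
                    + "PARTITIONED BY (dt STRING) STORED AS ORC");

            // The trailing dt column in the SELECT decides each row's target partition.
            stmt.execute("INSERT INTO TABLE logs_part PARTITION (dt) "
                    + "SELECT msg, dt FROM logs_stage");
        }
    }
}
```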
Environment: Hadoop, MapReduce, HDFS, YARN, Sqoop, Oozie, Pig, Hive, HBase, Java, Eclipse, UNIX shell scripting, Python, Hortonworks
Confidential - Richmond, TX
Java Developer
Roles & Responsibilities:
- Effectively interacted with team members and business users for requirements gathering.
- Involved in analysis, design, and implementation phases of the software development lifecycle (SDLC).
- Implemented Spring and core J2EE patterns such as MVC, Dependency Injection (DI), and Inversion of Control (IoC).
- Implemented REST web services with the Jersey API to handle customer requests (see the sketch after this list).
- Developed test cases using JUnit and used Log4j as the logging framework.
- Worked with HQL and the Criteria API for retrieving data elements from the database.
- Developed the user interface using HTML, Spring tags, JavaScript, jQuery, and CSS.
- Developed the application using Eclipse IDE and worked under Agile Environment.
- Designed and implemented front-end web pages using CSS, JSP, HTML, JavaScript, Ajax, and Struts.
- Utilized the Eclipse IDE as the development environment to design, develop, and deploy Spring components on WebLogic.
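A minimal sketch of a JAX-RS resource of the kind exposed through Jersey in this role; the path and response payload are placeholders for the actual customer-facing endpoints.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// GET /customers/{id} returns a simple JSON payload; field names are placeholders.
@Path("/customers")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public String getCustomer(@PathParam("id") String id) {
        return "{\"id\": \"" + id + "\", \"status\": \"ACTIVE\"}";
    }
}
```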
Environment: Java, J2EE, HTML, JavaScript, CSS, jQuery, Spring 3.0, JNDI, Hibernate 3.0, JavaMail, Web Services, REST, Oracle 10g, JUnit, Log4j, Eclipse, WebLogic 10.3
Confidential
Java Developer
Roles & Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, analysis, design and development.
- Involved in overall performance improvement by modifying third-party open-source tools like FCKeditor.
- Developed controllers for request handling using the Spring framework (see the sketch after this list).
- Worked with command controllers, handler mappings, and view resolvers.
- Designed and developed application components and architectural proof of concepts using Java, EJB, JSF, Struts, and AJAX.
- Participated in enterprise integration work using web services.
- Configured JMS, MQ, EJB, and Hibernate on WebSphere and JBoss.
- Focused on Declarative transaction management
- Developed XML files for mapping requests to controllers
- Extensively used Java Collection framework and Exception handling.
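A minimal sketch of a Spring MVC request handler illustrating the controller-to-view flow described above; the mapping, request parameter, and view name are placeholders, and the project itself used command controllers with XML handler mappings and view resolvers rather than annotations.

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

@Controller
public class OrderController {

    // The handler mapping routes GET /orders here; the returned view name
    // ("orderList") is resolved by the configured view resolver.
    @RequestMapping(value = "/orders", method = RequestMethod.GET)
    public String listOrders(@RequestParam(value = "status", required = false) String status,
                             Model model) {
        model.addAttribute("status", status == null ? "ALL" : status);
        return "orderList";
    }
}
```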
Environment: Core Java, XML, Servlets, Hibernate Criteria API, Web Services, WSDL, UML, EJB, JavaScript, jQuery, Hibernate, SQL, CVS, Agile, JUnit