Sr. Big Data Engineer / Hadoop Developer Resume
San Francisco, CA
PROFESSIONAL SUMMARY:
- Senior Software Engineer with 9+ years of professional IT experience, including 5+ years of expertise in the Big Data ecosystem: ingestion, storage, querying, processing, and analysis of big data.
- Extensive experience working in various key industry verticals including Banking, Finance, Insurance, Healthcare and Retail.
- Excellent exposure to understanding Big Data business requirements and delivering Hadoop-based solutions for them.
- Strong interest in developing and implementing new concepts in distributed computing technologies, Hadoop, MapReduce, and NoSQL databases.
- Well versed in installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure.
- Significant expertise in implementing real-time Big Data systems using Hadoop ecosystem tools such as MapReduce, Spark, HDFS, Hive, Pig, HBase, Pentaho, ZooKeeper, YARN, Sqoop, Kafka, Scala, Oozie, and Flume.
- Exposure to different Hadoop distributions such as Cloudera, Hortonworks, and Apache.
- Experience using the Amazon Web Services (AWS) cloud; performed export and import of data into S3 and the Amazon Redshift database. Working knowledge of Azure, including Data Factory and Azure Resource Manager.
- Expertise in designing scalable data stores using the Apache Cassandra NoSQL database.
- Extensive experience in data processing and analysis using HiveQL, Pig Latin, custom MapReduce programs in Java, and Scala and Python scripts with Spark and Spark SQL.
- Working knowledge of both Hadoop v1 and v2.
- Experience importing and exporting data between HDFS and databases such as MySQL, Oracle, Teradata, and DB2 using Sqoop.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Experience in managing and reviewing Hadoop Log files.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Deep experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka (a brief sketch follows this summary).
- Experience includes requirements gathering/analysis, design, development, versioning, integration, documentation, testing, build, and deployment.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
- Successfully loaded files to Hive and HDFS from HBase
- Experience supporting data analysts by administering and configuring Hive and assisting with Pig and Hive queries.
- Knowledge of Elasticsearch.
- Worked on a POC using EMR (Elastic MapReduce).
- Knowledge of graph database management systems such as Neo4j.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- In-depth understanding of Data Structures and Algorithms.
- Excellent shell scripting skills on Unix/Linux.
- Involved in fixing bugs and unit testing with JUnit test cases, as well as system testing during projects.
- Excellent Java development skills using J2EE, J2SE, JUnit, JSP, and JDBC.
- Good experience designing jobs and transformations and loading data sequentially and in parallel for initial and incremental loads.
- Consistently worked in teams and recognized for collaboration and problem-solving skills.
- Excellent verbal and written communication skills.
- Well versed with project lifecycle documentation with specialization in development of robust reusable designs and patterns.
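As a rough, hedged illustration of the real-time streaming work referenced in the summary above, the sketch below uses the Spark Streaming Kafka 0-10 Java API to consume a hypothetical clickstream topic and count events per micro-batch. The broker address, topic, group id, and batch interval are assumptions, and the actual solutions were built largely in Scala rather than the Java shown here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class ClickstreamJob {
    public static void main(String[] args) throws InterruptedException {
        // Local master only for illustration; a real job would be spark-submitted to YARN.
        SparkConf conf = new SparkConf().setAppName("ClickstreamJob").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");      // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "clickstream-consumer");       // placeholder group id

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("clickstream"), kafkaParams));

        // Count the events in each 30-second micro-batch and print the count to the driver log.
        stream.map(ConsumerRecord::value).count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

In a production pipeline the per-batch count would typically be replaced by the actual enrichment logic and a sink such as HBase or Hive.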
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, MapReduce, Spark, HDFS, HBase, Cassandra, Sqoop, Pig, Hive, Oozie, ZooKeeper, YARN, Flume, Kafka, Avro, Parquet, Cloudera
Operating Systems: Windows, Linux, Unix
Programming/Scripting Languages: C, Java, Python, Shell, R, SQL, JavaScript, Perl
Cloud: AWS EC2, S3, Redshift
Databases: AWS Redshift, MongoDB, Teradata, Oracle, DB2
Business Intelligence: Business Objects, Tableau
Tools & Utilities: Eclipse, NetBeans, Git, SVN, WinSCP, PuTTY, Autosys
Web Technologies: HTML5, CSS, XML, JavaScript
PROFESSIONAL EXPERIENCE:
Confidential, SFO, CA
Sr. Big Data Engineer / Hadoop Developer
Responsibilities:
- As part of the Data Services team, actively involved in moving data from relational databases (Teradata) into Hadoop.
- Performed data quality checks on data as per the business requirement.
- Performed data validation on the target tables against the source tables.
- Achieved high throughput and low latency for ingestion jobs by leveraging Sqoop.
- Transformed raw data from the traditional data warehouse and loaded it into stage and target tables.
- Fine-tuned Hive queries on very large tables to achieve low-latency inserts.
- Stored data in Hadoop efficiently using file formats such as Avro and Parquet.
- Automated the ingestion and transformation components by creating Oozie workflows.
- Worked closely with the data modeling team to realize business requirements.
- Performed incremental imports successfully and kept the Hive tables consistent.
- Performed Hive partitioning and bucketing to reduce disk I/O.
- Used windowing and analytical functions in Hive to optimize the transportation logistics metrics (a hedged example follows this list).
- Consumed flat files delivered by vendors over Apache Kafka and imposed a Hive schema on top of them to correlate with the tables in the data lake.
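A minimal, hypothetical sketch of the partitioned, bucketed Hive table and windowing query described above, issued here over Hive JDBC. The HiveServer2 endpoint, database, table, and column names are illustrative assumptions, not the project's actual objects.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShipmentMetrics {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/stage", "etl_user", "");   // placeholder endpoint
        Statement stmt = con.createStatement();

        // Partitioned, bucketed external table: partition pruning on load_dt limits disk I/O.
        stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS shipments ("
                + "  shipment_id BIGINT, route_id STRING, ship_ts TIMESTAMP, cost DECIMAL(10,2)) "
                + "PARTITIONED BY (load_dt STRING) "
                + "CLUSTERED BY (route_id) INTO 32 BUCKETS "
                + "STORED AS PARQUET "
                + "LOCATION '/data/stage/shipments'");

        // Windowed (analytical) query of the kind used for the transportation-logistics metrics.
        ResultSet rs = stmt.executeQuery(
                "SELECT route_id, ship_ts, "
                + "AVG(cost) OVER (PARTITION BY route_id ORDER BY ship_ts "
                + "                ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_cost "
                + "FROM shipments WHERE load_dt = '2017-06-01'");
        while (rs.next()) {
            System.out.println(rs.getString("route_id") + "\t" + rs.getDouble("rolling_cost"));
        }
        con.close();
    }
}
```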
Environment: Kafka 1.10.0, MariaDB 10.1.21, data lake, Teradata, data warehouse, HDFS 2.6.0, Hadoop 2.6.0, Spark 1.4.0, ZooKeeper 3.4.9, Hive 0.14.0, Java 8, Unix shell scripting.
Confidential, NC
Hadoop Data Engineer
Responsibilities:
- In addition to the developer role, served as a systems integrator: captured requirements for the project and got involved in infrastructure work, setting up EC2 instances and installing Mongo drivers to extract data from MongoDB.
- Gained a detailed understanding of NoSQL databases through MongoDB and of the key-value pair storage model.
- Researched different ways to parse the JSON data produced by the MongoDB extracts.
- Developed the process to extract data from MongoDB on an EC2 instance and SFTP it to the Hadoop data lake.
- Created a Pig script to parse the JSON data and added a custom UDF to convert certain hexadecimal fields to ASCII (see the sketch after this list).
- Developed MapReduce programs to cleanse the data in HDFS, making it suitable for ingestion into the Hive schema for analysis, and to perform business-specific transformations such as data field conversion, data validation, and other business logic.
- Loaded the final parsed and transformed data into Hive structures, creating Avro files partitioned on a load-date timestamp.
- Implemented Partitioning, Bucketing in Hive for better organization of the data.
- Implemented a view-based strategy to segregate sensitive data and assign different access roles to different views.
- Created a parallel branch to load the same data into Teradata using Sqoop utilities.
- Used Flume and Kafka to load log data from multiple sources into HDFS.
- Created a Tableau report on the Teradata solution to meet the business's day-to-day audit reporting needs.
- Worked with the system testers to manually query the data and perform source-to-target comparisons to ensure data integrity.
- Prepared test plans and wrote test cases.
- Worked with the business team to provide access to Hive and the Tableau reports, and supported UAT for the project.
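A hedged sketch of the kind of custom Pig UDF mentioned above for converting hexadecimal fields to ASCII; the class name and input handling are hypothetical, and the real UDF may have differed in detail.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF: decodes a hex-encoded string field into its ASCII text.
public class HexToAscii extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String hex = input.get(0).toString().replaceAll("\\s", "");
        StringBuilder ascii = new StringBuilder();
        // Decode two hex digits at a time into one ASCII character.
        for (int i = 0; i + 1 < hex.length(); i += 2) {
            ascii.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return ascii.toString();
    }
}
```

From the Pig script it would be registered and invoked roughly as `REGISTER udfs.jar;` followed by `parsed = FOREACH raw GENERATE HexToAscii(hex_field);` (jar and field names assumed).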
Environment: Hadoop, Hive, Pig, MongoDB, UNIX, MapReduce, Kafka, Flume, CDH, SQL, Teradata, Tableau, AWS EC2
Confidential, Nashville, TN
Hadoop Developer/ Big Data Engineer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Hive, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Used Hive for transformations, event joins, filtering bot traffic, and some pre-aggregations before storing the data in HDFS.
- Extensive experience in ETL (Talend) data ingestion, in-stream data processing, batch analytics, and data persistence strategy.
- Worked on designing and developing ETL (Talend) workflows using Java for processing data in HDFS/Cassandra, orchestrated with Oozie.
- Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Kafka, YARN, Oozie, and ZooKeeper, as well as Hadoop architecture and its components.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and streaming operations.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, PySpark, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (a producer sketch follows this list).
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Involved in developing Hive DDLs to create, alter, and drop Hive tables.
- Created scalable and high-performance web services for data tracking.
- Involved in loading data from the UNIX file system into HDFS. Installed and configured Hive, wrote Hive UDFs, and used ZooKeeper for cluster coordination services.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experienced in managing Bitbucket repositories for Java and Python code.
- Experienced in managing the Hadoop cluster using the Cloudera Manager tool.
- Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
- Involved in using HCatalog to access Hive table metadata from MapReduce.
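A minimal sketch of a Kafka producer of the sort referenced above, assuming string keys and JSON string values; the broker address, topic name, and payload are placeholders rather than the project's actual configuration.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one JSON event to a hypothetical topic; a matching consumer or a
            // Spark/MapReduce job would pick it up for downstream processing.
            producer.send(new ProducerRecord<>("customer-events", "cust-42",
                    "{\"event\":\"login\",\"ts\":\"2017-06-01T00:00:00Z\"}"));
        }
    }
}
```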
Environment: MapReduce, YARN, Hive, Pig, Cassandra, PySpark, Oozie, Talend, Sqoop, Splunk, Kafka, Oracle 11g, Core Java, Cloudera, Akka, Eclipse, Python, Scala, Spark, SQL, Tableau, Bitbucket, Unix shell scripting.
Confidential, OK
Java/ J2EE Developer
Responsibilities:
- Involved in various Software Development Life Cycle (SDLC) phases of the project, such as requirement gathering, development, and enhancements, using agile methodologies.
- Developed the user interface using Spring MVC, JSP, JSTL, JavaScript, custom tags, jQuery, HTML, and CSS.
- Used Spring MVC for implementing the Web layer of the application. This includes developing Controllers, Views and Validators.
- Developed the service and domain layer using Spring Framework modules like Core-IOC, AOP.
- Developed the Application Framework using Java, Spring, Hibernate and Log4J.
- Created DB tables, functions, and joins, and wrote prepared statements using SQL and PL/SQL.
- Configured the Hibernate session factory in applicationContext.xml to integrate Hibernate with Spring.
- Configured applicationContext.xml in Spring to set up communication between operations and their corresponding handlers.
- Developed Spring REST controllers to handle JSON data and wrote DAOs and services to process it (see the controller sketch after this list).
- Consumed and created REST web services using Spring and Apache CXF.
- Used Apache Camel as the integration framework.
- Developed MySQL stored procedures and triggers using SQL to calculate and update tables implementing business logic.
- Used Perl scripts to automate deployments to the application server.
- Used Maven to build the application and deployed on JBoss Application Server.
- Used IntelliJ for development and JBoss Application Server for deploying the web application.
- Performed unit testing for the application using JUnit.
- Monitored the error logs using log4j.
- Developed JUnit testing framework for Unit level testing.
- Implemented Spring JMS message listeners with JMS queues for consumption of Asynchronous requests.
- Used AOP concepts such as aspects, join points, advice, pointcuts, target objects, and AOP proxies.
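A minimal sketch of a Spring MVC REST controller in the Spring 3-era @Controller/@ResponseBody style used to return JSON; the /accounts endpoint and the returned fields are purely illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping("/accounts")
public class AccountController {

    // Returns account details as JSON; Jackson on the classpath handles serialization.
    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    @ResponseBody
    public Map<String, Object> getAccount(@PathVariable("id") long id) {
        Map<String, Object> account = new LinkedHashMap<String, Object>();
        account.put("id", id);
        account.put("status", "ACTIVE");   // placeholder data for illustration
        return account;
    }
}
```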
Environment: JDK 1.6, HTML, JSP, Spring, JBoss, Log4j, Perl, TortoiseSVN, Hibernate, SOAP web services, Maven, SoapUI, Eclipse Kepler, JavaScript, XML, MySQL v5