Big Data Engineer / Hadoop Developer Resume
SUMMARY
- 5+ years of experience with strong emphasis on Design, Development, Implementation, Testing and Deployment of Software Applications.
- Over 5 years of comprehensive IT experience in Big Data and Big Data Analytics: Hadoop, HDFS, MapReduce, YARN, the Hadoop ecosystem and shell scripting.
- Highly capable of processing large sets of structured, semi-structured and unstructured data and supporting Big Data applications.
- Expertise in transferring data between the Hadoop ecosystem and structured data stores in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
- Experience with Apache Spark clusters and stream processing using Spark Streaming.
- Expertise in moving large volumes of log, streaming event and transactional data using Flume.
- Experience in developing MapReduce jobs in Java for data cleaning and pre-processing.
- Expertise in writing Pig Latin and Hive scripts and extending their functionality using User Defined Functions (UDFs).
- AWS certified - AWS Certified Solutions Architect - Associate.
- Good knowledge of Hadoop, HBase, Hive, Pig Latin scripts, MapReduce, Sqoop, Flume and HiveQL.
- Experience in analyzing data using Pig Latin, HiveQL and HBase.
- Capturing data from existing databases that provide SQL interfaces using Sqoop.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice versa.
- Implemented Proofs of Concept on the Hadoop stack and different big data analytic tools, and migration from different databases (e.g. Teradata, Oracle, MySQL) to Hadoop.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Successfully loaded files into Hive and HDFS from MongoDB and HBase.
- Experience in configuring Hadoop Clusters and HDFS.
- Expertise in organizing data layouts in Hive using partitions and bucketing (see the sketch after this list).
- Expertise in preparing interactive data visualizations from different sources using Tableau.
- Hands-on experience in developing workflows that execute MapReduce, Sqoop, Pig, Hive and shell script actions using Oozie.
- Experience working with the Cloudera Hue interface and Impala.
- Expertise in developing SQL queries and Stored Procedures, and excellent development experience with Agile methodology.
- Excellent leadership, interpersonal, problem solving and time management skills.
- Excellent communication skills both Written (documentation) and Verbal (presentation).
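As an illustration of the Hive partitioning and bucketing expertise noted above, a minimal sketch follows; the database, table, column names and bucket count are hypothetical and not taken from any specific project below.

```python
# Minimal sketch of a partitioned, bucketed Hive layout; analytics.transactions,
# its columns and the bucket count are assumptions for illustration only.
import subprocess

DDL = """
CREATE TABLE IF NOT EXISTS analytics.transactions (
    txn_id      BIGINT,
    customer_id BIGINT,
    amount      DOUBLE
)
PARTITIONED BY (txn_date STRING)             -- filters on txn_date prune whole partitions
CLUSTERED BY (customer_id) INTO 32 BUCKETS   -- joins/sampling on customer_id read fewer files
STORED AS PARQUET
"""

# Run the DDL through the Hive CLI; beeline against HiveServer2 would work the same way.
subprocess.check_call(["hive", "-e", DDL])
```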
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, and Zookeeper.
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, Python, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R Programming
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
PROFESSIONAL EXPERIENCE
Confidential
Big Data Engineer / Hadoop Developer
Responsibilities:
- Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata.
- Responsible for creating the mapping document from source fields to destination fields.
- Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
- Developed Oozie workflows for executing Sqoop and Hive actions.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Involved in building the database model, APIs and views using Python in order to build an interactive web-based solution.
- Performed optimizations on Spark/Scala code; diagnosed and resolved performance issues.
- Responsible for developing Python wrapper scripts which extract a specific date range using Sqoop by passing the custom properties required for the workflow (see the sketch at the end of this entry).
- Developed scripts to run Oozie workflows, capture the logs of all jobs that run on the cluster and create a metadata table which specifies the execution times of each job.
- Developed Hive scripts to perform transformation logic and load the data from the staging zone to the final landing zone.
- Developed monitoring and notification tools using Python.
- Worked with the Parquet file format for better storage and performance of published tables.
- Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
- Developed Python utility to validate HDFS tables with source tables.
- Designed and developed UDFs to extend the functionality of both Pig and Hive.
- Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
- Managed datasets using Pandas data frames and MySQL; queried MySQL from Python using the Python-MySQL connector and the MySQLdb package to retrieve information.
- Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
- Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
- Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
Environment: Hadoop, HDFS 2.6.3, Hive 1.0.1, HBase 0.98.12.1, Zookeeper 3.5.1, Oozie, Impala 1.4.1, Java (JDK 1.6), Cloudera CDH3, Oracle, Teradata, SQL Server, UNIX shell scripting, Flume 1.6.0, Scala 2.11.6, Spark 1.5.0, Sqoop 1.4.6, Python 3.5.1.
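The Python wrapper pattern mentioned above (extracting a specific date range with Sqoop) can be sketched roughly as follows; the JDBC URL, credentials, table names and HDFS paths are placeholders, not details from the actual engagement.

```python
#!/usr/bin/env python
# Minimal sketch of a Python wrapper that Sqoops a specific date range into a
# staging directory; connection details, table and column names are assumptions.
import subprocess
import sys

def sqoop_date_range(table, date_col, start_date, end_date):
    target_dir = "/data/staging/{0}/{1}".format(table, start_date)
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",   # placeholder source
        "--username", "etl_user",
        "--password-file", "/user/etl/.pwd",                   # avoid plain-text passwords
        "--table", table,
        "--where", "{0} >= '{1}' AND {0} < '{2}'".format(date_col, start_date, end_date),
        "--target-dir", target_dir,
        "--num-mappers", "4",
        "--as-parquetfile",
    ]
    return subprocess.call(cmd)

if __name__ == "__main__":
    # e.g. ./sqoop_range.py ORDERS ORDER_DATE 2016-01-01 2016-02-01
    sys.exit(sqoop_date_range(*sys.argv[1:5]))
```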
Confidential
Big Data / Hadoop Developer
Responsibilities:
- Involved in end-to-end data processing: ingestion, processing, quality checks and splitting.
- Developed Spark scripts using Scala as per the requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Performed different types of transformations and actions on the RDDs to meet the business requirements (see the sketch at the end of this section).
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Also worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase and Sqoop.
- Involved in loading data from the UNIX file system into HDFS.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Responsible for managing data coming from various sources.
- Experience loading and transforming large sets of structured, semi-structured and unstructured data.
- Provided cluster coordination services through Zookeeper.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in managing and reviewing Hadoop log files.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Responsible for writing Hive queries for data analysis to meet the business requirements.
- Responsible for creating Hive tables and working on them using HiveQL.
- Responsible for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries which run internally as MapReduce jobs.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Importing the unstructured data into the HDFS using Flume.
- Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
- Involved in using the HBase Java API in a Java application.
- Automated all the jobs for extracting data from different data sources like MySQL and pushing the result sets to the Hadoop Distributed File System.
- Hands-on design and development of an application using Hive UDFs.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Supported data analysts in running Pig and Hive queries.
- Involved in writing HiveQL and Pig Latin scripts.
- Imported and exported data from MySQL/Oracle into Hive using Sqoop.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
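A rough sketch of the ingest-transform-publish pattern described in this section, using RDD transformations and actions and landing the result in a Hive table; the HDFS path, record format, schema and table name are assumptions.

```python
# Minimal sketch: ingest raw records from HDFS, clean them with RDD
# transformations, run an action, and publish the result to Hive.
from pyspark.sql import SparkSession, Row

spark = (SparkSession.builder
         .appName("rdd-pipeline-sketch")
         .enableHiveSupport()
         .getOrCreate())
sc = spark.sparkContext

raw = sc.textFile("hdfs:///data/landing/events/")             # ingestion
parsed = (raw.map(lambda line: line.split(","))               # transformation
              .filter(lambda f: len(f) == 3)                  # quality check
              .map(lambda f: Row(user=f[0], event=f[1], amount=float(f[2]))))

total_by_user = (parsed.map(lambda r: (r.user, r.amount))     # further transform
                        .reduceByKey(lambda a, b: a + b))
print(total_by_user.take(5))                                  # action

spark.createDataFrame(parsed).write.mode("overwrite") \
     .saveAsTable("analytics.events_clean")                   # publish to Hive
```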
Confidential
Big Data / Hadoop Developer
Responsibilities:
- Cluster monitoring and maintenance of the Cummins Cluster.
- Maintained multiple Hadoop clusters (min. 100 nodes), Hadoop ecosystems, third-party software, and databases with updates/upgrades, performance tuning and monitoring.
- Managed and supported Infoworks, the data ingestion and integration tool for the data lake.
- Supported, troubleshot and scheduled jobs running in the production cluster.
- Resolved issues, answered questions, and provided day-to-day support for users or clients related to Hadoop and its ecosystem.
- Managed and supported the Teradata EDW, including its client tools (Teradata Studio and SQL Assistant) connecting to the data lake.
- Installed, configured, and operated Zookeeper, Pig, Sqoop, Hive, HBase, Kafka, and Spark for business needs.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experienced in dealing with Avro, Parquet and ORC files.
- Worked on installing, deploying, maintaining and securing nodes and multi-node clusters.
- Worked with JUnit, Maven, GitHub, Gradle, EasyMock, Jenkins and IntelliJ.
- Developed Kafka data pipelines with the help of Spark 1.6 and Scala (see the sketch at the end of this entry).
- Expert in machine learning and NLP.
- Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
Environment: Hadoop, MapReduce 2.7.2, Hive 2.0, Pig 0.16, Sqoop 2, Java, Oozie, HBase 0.98.19, Kafka 0.10.1.1, Spark 2.0, Scala 2.12.0, Eclipse, Linux, Oracle, Teradata.
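The Kafka-to-Spark pipeline pattern mentioned above can be sketched as follows, using the Spark 1.6-era Python Streaming API; the broker address, topic name and record format are assumptions.

```python
# Minimal sketch of a direct Kafka stream processed in Spark Streaming micro-batches.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-pipeline-sketch")
ssc = StreamingContext(sc, batchDuration=30)   # 30-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ["sensor_events"], {"metadata.broker.list": "broker1:9092"})

# Each record is a (key, value) pair; keep only well-formed CSV values.
clean = (stream.map(lambda kv: kv[1].split(","))
               .filter(lambda fields: len(fields) == 4))

# Count events per device in each batch and print a sample to the driver log.
counts = clean.map(lambda f: (f[0], 1)).reduceByKey(lambda a, b: a + b)
counts.pprint()

ssc.start()
ssc.awaitTermination()
```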
Confidential
Big Data / Hadoop Developer
Responsibilities:
- Worked on the Hortonworks HDP 2.5 distribution.
- Responsible for building scalable, distributed data solutions using Hadoop.
- Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
- Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
- Wrote HiveQL queries integrating different tables and created views to produce result sets.
- Collected the log data from web servers and integrated it into HDFS using Flume.
- Experienced in loading and transforming large sets of structured and unstructured data.
- Used MapReduce programs for data cleaning and transformations and loaded the output into Hive tables in different file formats.
- Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files and sequence files for log files.
- Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into an HDFS location.
- Involved in loading data into the HBase NoSQL database.
- Built, managed and scheduled Oozie workflows for end-to-end job processing.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
- Analyzed large volumes of structured data using Spark SQL.
- Wrote shell scripts to execute HiveQL.
- Used Spark as an ETL tool.
- Wrote automated shell scripts in a Linux/Unix environment using Bash.
- Migrated HiveQL queries to Spark SQL to improve performance (see the sketch at the end of this entry).
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into HBase.
- Experienced in using the DataStax Spark connector to store data into and retrieve data from a Cassandra database.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into Cassandra.
Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, Zookeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL.
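A rough sketch of moving a HiveQL aggregation into Spark SQL and landing the result in Cassandra through the DataStax Spark connector, as mentioned above; the keyspace, table and column names, and the Cassandra host, are assumptions.

```python
# Minimal sketch: the same query that previously ran as HiveQL, executed by Spark SQL,
# with the result written to Cassandra. Requires the spark-cassandra-connector
# package on the classpath (e.g. submitted with --packages).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("sparksql-migration-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .enableHiveSupport()
         .getOrCreate())

daily_totals = spark.sql("""
    SELECT txn_date, customer_id, SUM(amount) AS total_amount
    FROM analytics.transactions
    GROUP BY txn_date, customer_id
""")

(daily_totals.write
    .format("org.apache.spark.sql.cassandra")
    .options(table="daily_totals", keyspace="analytics")
    .mode("append")
    .save())
```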
Confidential
Hadoop Developer
Responsibilities:
- Migrating an existing Java application into microservices using Spring Boot and Spring Cloud.
- Working knowledge of different IDEs such as Eclipse and Spring Tool Suite.
- Working knowledge of Git and ANT/Maven for project dependency management, builds and deployment.
- Developing simple and complex MapReduce programs in Java for data analysis on different data formats.
- Developing Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Working as part of the AWS build team.
- Creating, configuring and managing S3 buckets (storage).
- Experience with AWS EC2, EMR, Lambda and CloudWatch.
- Importing data from different sources like HDFS/HBase into Spark RDDs.
- Experience with batch processing of data sources using Apache Spark and Elasticsearch.
- Experience implementing Spark RDD transformations and actions to carry out business analysis.
- Migrating HiveQL queries on structured data into Spark SQL to improve performance.
- Optimizing MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Working on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Working on data serialization formats for converting complex objects into sequences of bits using Avro, Parquet, JSON and CSV formats.
- Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on the data.
- Administering, installing, upgrading and managing distributions of Hadoop, Hive and HBase.
- Involved in troubleshooting and performance tuning of Hadoop clusters.
- Creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Developing Spark scripts using Python shell commands as per the requirement to read/write JSON files (see the sketch at the end of this entry).
Environment: Hadoop, HDFS 2.6.3, Hive 1.0.1, HBase 0.98.12.1, Zookeeper 3.5.1, Oozie, Impala 1.4.1, Java (JDK 1.6), Cloudera CDH3, Oracle, Teradata, SQL Server, UNIX shell scripting, Flume 1.6.0, Scala 2.11.6, Spark 1.5.0, Sqoop 1.4.6, Python 3.5.1.
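A minimal sketch of the PySpark JSON read/write script mentioned above; the HDFS paths and field names are assumptions.

```python
# Minimal sketch: read raw JSON events, apply simple transformations, write JSON back.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("json-etl-sketch").getOrCreate()

events = spark.read.json("hdfs:///data/raw/events/")                 # read JSON files

cleaned = (events
           .filter(F.col("event_type").isNotNull())                  # drop bad records
           .withColumn("event_date", F.to_date(F.col("event_ts"))))  # derive a date column

(cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .json("hdfs:///data/curated/events/"))                           # write JSON back out
```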